MODELS AND DIAGNOSTICS FOR PARSIMONIOUS DEPENDENCE WITH APPLICATIONS TO MULTIVARIATE EXTREMES

by

David Lee

B.Sc.(ActuarSc), The University of Hong Kong, 2010
M.Phil., The University of Hong Kong, 2012

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in The Faculty of Graduate and Postdoctoral Studies (Statistics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

December 2016

© David Lee 2016

Abstract

Statistical models with parsimonious dependence are useful for high-dimensional modelling as they offer interpretations relevant to the data being fitted and may be computationally more manageable. We propose parsimonious models for multivariate extremes; in particular, extreme value (EV) copulas with factor and truncated vine structures are developed, through (a) taking the EV limit of a factor copula, or (b) structuring the underlying correlation matrix of existing multivariate EV copulas. Through data examples, we demonstrate that these models allow interpretation of the respective structures and offer insight on the dependence relationship among variables.

The strength of pairwise dependence for extreme value copulas can be described using the extremal coefficient. We consider a generalization of the F-madogram estimator for the bivariate extremal coefficient to the estimation of tail dependence of an arbitrary bivariate copula. This estimator is tail-weighted in the sense that the joint upper or lower portion of the copula is given a higher weight than the middle, thereby emphasizing tail dependence. The proposed estimator is useful when tail heaviness plays an important role in inference, so that choosing a copula with matching tail properties is essential.

Before using a fitted parsimonious model for further analysis, diagnostic checks should be done to ensure that the model is adequate. Bivariate extremal coefficients have been used for diagnostic checking of multivariate extreme value models.
We investigate the use of an adequacy-of-fit statistic based on the difference between low-order empirical and model-based features (dependence measures), including the extremal coefficient, for this purpose. The difference is computed for each of the bivariate margins and a quadratic form statistic is obtained, with large values relative to a high quantile of the reference distribution suggesting model inadequacy. We develop methods to determine the appropriate cutoff values for various parsimonious models, dimensions, dependence measures and methods of model fitting that reflect practical situations. Data examples show that these diagnostic checks are handy complements to existing model selection criteria such as the AIC and BIC, and provide the user with some idea about the quality of the fitted models.

Preface

The thesis is written under the supervision of Prof. Harry Joe. Chapter 3 is based on a submitted manuscript coauthored with the supervisor, who initiated the idea on the extension of theory about parsimonious structures to multivariate extremes. With guidance from the supervisor, the author derived the main results, identified numerical challenges in the fitting of the proposed extreme value factor copula, conducted data analysis and drafted the initial manuscript. Prof. Joe suggested revisions to the manuscript and the author implemented these revisions to bring the chapter to its present form. Ideas on Gaussian quadrature underlying the derivations in Chapter 4 were suggested by the supervisor. The author conducted most of the derivations and identified further difficulties in certain situations. The numerical procedures were implemented by the author in Fortran 90 for speed improvement. Some of the results in Chapter 4 were also included in the submitted manuscript.

The review of estimators for the extremal coefficient in Chapter 5 was conducted by the author.
The supervisor and the author jointly identified the potential extension of the F-madogram to general copulas. The author derived most of the properties of the estimator, while the supervisor suggested improvements to some of the more difficult proofs. This extension is based on a manuscript in preparation coauthored with the supervisor.

The supervisor advocated the use of the difference statistic as a measure of adequacy-of-fit in Chapters 6 and 7. The author derived most of the properties, including the separability result with empirical U-statistics and maximum likelihood estimation. The data analysis and simulation studies in Chapter 7 were conducted by the author, under helpful advice and insight from the supervisor. The author developed the efficient algorithm on the evaluation of bivariate empirical distribution functions at the observed data points, as described in Appendix E, and implemented it in Fortran 90. The algorithm is based on a submitted manuscript coauthored with the supervisor.

Throughout the preparation of the thesis, Prof. Joe made ample suggestions on the improvement of presentation, motivation and big picture viewpoints as well as some technical details. Many of the proposed changes were implemented to result in the current version that fits into a unified topic with improved connection within and between chapters.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Symbols and Notations
Acknowledgements
Dedication

1 Introduction
  1.1 Multivariate extreme value copula models with parsimonious dependence
  1.2 Bivariate extremal coefficient and extension to general copulas
  1.3 Adequacy-of-fit diagnostic checks for parsimonious models
  1.4 Research contributions and organization of thesis

2 Preliminaries
  2.1 Copula theory and examples
    2.1.1 Copulas as multivariate distributions on the unit hypercube
    2.1.2 Factor copulas
    2.1.3 Vine copulas
  2.2 Extreme value theory
    2.2.1 Univariate extreme value theory
    2.2.2 Multivariate extreme value theory
    2.2.3 Relationship between marginal and joint extremes
  2.3 Tail dependence functions and extreme value limit of copula models
    2.3.1 Tail dependence functions
    2.3.2 Extreme value limit of copula models
    2.3.3 Measures of dependence for bivariate extreme value copulas
  2.4 Dependence measures for general bivariate copulas
  2.5 Model estimation methods for multivariate copulas
    2.5.1 Composite likelihood
    2.5.2 Inference function for margins
    2.5.3 Marginal ranks
3 Parsimonious multivariate extreme value copula models
  3.1 Introduction
  3.2 Extreme value factor copula model
    3.2.1 Construction of extreme value factor copulas
      A 1-factor extreme value copula
      B 2-factor extreme value copula and higher-order generalization
      C Bi-factor extreme value copula
    3.2.2 Extreme value limit of vine copulas
    3.2.3 Bivariate dependence properties
    3.2.4 Examples of 1-factor and 2-factor extreme value copulas
      A 1-factor with Dagum (inverse Burr) conditional tail dependence functions
      B 1-factor with Burr (Singh-Maddala) conditional tail dependence functions
      C 2-factor with Dagum conditional tail dependence functions for factor 1
  3.3 Structured Hüsler-Reiss model
    3.3.1 Hüsler-Reiss copula with parsimonious dependence
    3.3.2 Bivariate dependence properties
  3.4 Comparison between the extreme value factor copula and the structured Hüsler-Reiss copula
  3.5 Statistical inference via composite likelihood methods
  3.6 Simulation study
  3.7 Data examples
    3.7.1 Fraser River flows data
    3.7.2 United States stock returns data
4 Numerical integration methods for extreme value factor copula models
  4.1 Burr 1-factor extreme value copula
  4.2 Dagum 1-factor extreme value copula

5 Extremal coefficient for extreme value copulas and its generalizations
  5.1 Empirical estimators of the extremal coefficient
    5.1.1 Estimators assuming known margins
    5.1.2 Rank-based estimators
    5.1.3 Asymptotic efficiency of the estimators
  5.2 Generalization of the F-madogram estimator to non-extreme-value copulas
    5.2.1 Dependence properties
    5.2.2 Interpretation and use for general copulas
    5.2.3 Boundary cases
    5.2.4 Asymptotic normality and variance
    5.2.5 Extension to higher dimensions
    5.2.6 Potential future research

6 Assessing model adequacy based on empirical and fitted features
  6.1 Introduction
    6.1.1 Vector of differences between empirical and model-based features
    6.1.2 Relevance to goodness-of-fit tests
  6.2 Asymptotics of the difference vector for a correctly specified model
    6.2.1 Behaviour of the difference between empirical and fitted distribution functions
    6.2.2 Generalization to the difference for U-statistics
    6.2.3 Separability of variance in the asymptotic normal distribution
  6.3 Asymptotics of the difference vector under model misspecification
  6.4 Some comments on properties of the difference statistic
  6.5 Decision criteria based on the adequacy-of-fit statistic

7 Adequacy-of-fit for multivariate copulas with parsimonious dependence
  7.1 Motivation and background
  7.2 Diagnostic checks based on the adequacy-of-fit statistic
    7.2.1 Issues and general strategies
    7.2.2 Strategies for different parsimonious models
    7.2.3 Relationship between the adequacy-of-fit statistic and SRMSR
  7.3 Results for the exchangeable Gaussian distribution and some copulas with exchangeable dependence
    7.3.1 Theoretical results for the exchangeable Gaussian distribution
    7.3.2 Patterns for some copulas with exchangeable dependence
  7.4 Method on evaluating Σ for factor and truncated vine models under maximum likelihood estimation
    7.4.1 Empirical covariance
    7.4.2 Model-based covariance
  7.5 Finding an approximate critical value using a matching Gaussian copula
    7.5.1 Behaviour of different copulas with the same dependence structure
    7.5.2 Gaussian copula approximation for factor and vine structures
    7.5.3 Improving computational efficiency
  7.6 Multivariate extreme value copulas — an example for pairwise likelihood estimation
    7.6.1 Finding a matching t-EV copula
    7.6.2 Simulation study
    7.6.3 Implementation for parsimonious extreme value copulas
  7.7 Adequacy-of-fit for tails of copulas
    7.7.1 Properties of the difference statistic for a single bivariate margin
    7.7.2 SRMSR critical values for parsimonious dependence structures
  7.8 Model estimation methods other than the maximum likelihood
    7.8.1 Estimation of marginal distributions
    7.8.2 Sequential estimation for vine structures
  7.9 Data examples
    7.9.1 European market returns data
    7.9.2 Fraser River flows data
    7.9.3 United States stock returns data
  7.10 Summary

8 Conclusion

Bibliography

Appendices

A Derivations of the asymptotic properties of the rank-based F-madogram measure of dependence

B Behaviour of the asymptotic variances of the empirical Kendall's τ and Spearman's ρ

C Maximum likelihood estimator for Gaussian vine structures when variances are estimated

D Derivations of the properties of correlation parameters for the exchangeable Gaussian distribution
  D.1 Variance of the difference for one bivariate margin
  D.2 Covariance between differences for two bivariate margins

E An O(N log₂ N) algorithm for evaluating the bivariate empirical distribution function at the N observations
  E.1 The modified merge sort algorithm
  E.2 The modified quicksort algorithm
  E.3 Some comments on the algorithms

List of Tables

2.1 Dependence properties of some commonly used copulas
2.2 Linking copulas for each tree of a C-vine rooted at variable 1
3.1 An example of the dependence structure between observed and latent variables in a bi-factor model
3.2 Matrices of pairwise Spearman's ρ for five copulas with parsimonious dependence
3.3 Pairwise Kendall's τ and extremal coefficient ϑ for the mixed dependence scenario
3.4 Estimated Kendall's τ and standard errors for the Dagum 1-factor extreme value copula simulation
3.5 Correlations of the normal scores for the Fraser River flows data
3.6 Fitting results for the Fraser River flows data using normal scores and Gaussian distribution
3.7 Loadings of the Gaussian 3-factor model for the normal scores of the Fraser River flows data
3.8 Fitting results for the Fraser River flows data using multivariate extreme value copulas
3.9 Loadings of the Hüsler-Reiss 1- and 2-factor models
3.10 Correlations of the normal scores for the US stocks data
3.11 Fitting results for the US stocks data using normal scores and Gaussian distribution
3.12 Loadings of the Gaussian 1- and 2-factor models for the normal scores of the US stocks data
3.13 Fitting results for the US stocks data using multivariate extreme value copulas
5.1 Asymptotic variances of the empirical extremal coefficient ϑ̂ for the rank-based estimators with the independence copula
5.2 Asymptotic variances of the empirical extremal coefficient ϑ̂ for various estimators with the Gumbel copula
5.3 Values of λ_{U,α} and λ_{L,α} for various bivariate copulas with Kendall's τ equal to 0.5
5.4 Comparison of the difference in λ_{L,α} for six pairs of copulas against their asymptotic standard errors for the sample size 400
5.5 Asymptotic variances of λ̂_{U,α} and λ̂_{L,α} for various bivariate copulas with Kendall's τ equal to 0.5
5.6 Asymptotic variances of λ̂_{U,α,r} and λ̂_{L,α,r} for various bivariate copulas with Kendall's τ equal to 0.5
6.1 Summary of empirical and model-based asymptotic variances under maximum likelihood estimation
6.2 Tail properties and asymptotic variances for different parametric copula families with true Kendall's τ equal to 0.5
6.3 Asymptotic variance estimates for Kendall's τ using different estimators for the copula parameter, with the parametric model having Pareto margins and MTCJ copula with Kendall's τ equal to 0.5
7.1 Matrix of pairwise differences between the empirical and model-based extremal coefficient for the Burr 1-factor EV copula and 1-factor t-EV copula with ν = 3 fitted to the US stock returns example
7.2 Possible approaches of obtaining critical values of the adequacy-of-fit statistic for different parsimonious dependence structures, assuming models are estimated via maximum likelihood with features being Kendall's τ or Spearman's ρ
7.3 Summary of asymptotic variances and covariances of the differences for the d-dimensional exchangeable Gaussian distribution, using modified correlation as the feature
7.4 Simulation scenarios for factor copulas to be used for different parametric linking copula families
7.5 Simulation scenarios for truncated vine copulas to be used for different parametric linking copula families
7.6 Summary of conservative SRMSR critical values (for sample size 100) for Gaussian copulas with factor and truncated vine structures
7.7 Pairwise Kendall's τ and extremal coefficient for the three simulation scenarios with d = 5
7.8 Average pairwise Kendall's τ and extremal coefficient for the three simulation scenarios with d = 10
7.9 SRMSR critical values (for sample size 100) using Hüsler-Reiss copula and t-EV copulas with matching pairwise Kendall's τ or extremal coefficient, under weak, moderate and strong dependence scenarios for d = 5 and 10
7.10 Summary of SRMSR critical values (for sample size 100) using tail-weighted dependence measures with factor and truncated vine structures
7.11 SRMSR critical values (for sample size 100) for some exchangeable, factor and truncated vine structures based on different estimation methods with Kendall's τ as the feature
7.12 SRMSR critical values (for sample size 100) for the exchangeable structure with Frank and MTCJ copulas with Kendall's τ value of 0.5, based on copula estimation and joint estimation of marginal and dependence parameters with Pareto margins
7.13 SRMSR critical values (for sample size 100) for the Hüsler-Reiss copula, using copula estimation, IFM with GEV marginal distributions and the marginal ranks methods
7.14 SRMSR critical values (for sample size 100) for some 2-truncated C-vines with the feature being Kendall's τ
7.15 Fitted models considered for the European market GARCH-filtered index returns data
7.16 European market returns data: SRMSR statistics and critical values for the fitted models with the feature being Kendall's τ
7.17 European market returns data: SRMSR statistics and critical values for the fitted models with the features being tail-weighted dependence measures
7.18 European market returns data: SRMSR critical values with Kendall's τ and tail-weighted dependence measures for sample size n = 484 obtained from Tables 7.6 and 7.10
7.19 Fraser River flows data: SRMSR statistics and critical values for some fitted models with the features being Kendall's τ and extremal coefficient
7.20 US stock returns data: SRMSR statistics and critical values for some fitted models with the features being Kendall's τ and extremal coefficient

List of Figures

2.1 Dependence between observed and latent variables for the 1-factor and 2-factor copula models
2.2 Vine diagram for the first tree of a C-vine rooted at variable 1
2.3 Locations of the gauging stations in eastern Vancouver Island
2.4 Q-Q plots for the marginal GEV fitting of the Vancouver Island streamflows data
2.5 Scatterplot of normal scores for the Vancouver Island streamflows data
2.6 Min-stable plot with exponential margin and max-stable plot with Fréchet margin for the Vancouver Island streamflows data
2.7 Density contours and lower tail dependence functions of the MTCJ and reflected Gumbel copulas with Kendall's τ equal to 0.5
3.1 Scatterplots of the normal scores of the data simulated from the Dagum 1-factor extreme value copula, with dependence parameters δ = (1, 1, 1, 1), (4, 4, 4, 4) and (1, 2, 3, 4) for the weak, strong and mixed dependence scenarios, respectively
3.2 Sampling distributions of the fitted Kendall's τ using full and pairwise likelihood for the strong dependence scenario, with δ = (4, 4, 4, 4) for the Dagum 1-factor extreme value copula
3.3 Locations of the gauging stations along the Fraser River and the pairwise scatterplot of the normal scores
3.4 Vine diagram for the fitted Hüsler-Reiss 1- and 2-truncated vine models using the vine suggested from Gaussian analysis
3.5 Vine diagram for the fitted Hüsler-Reiss 1- and 2-truncated vine models using a D-vine following the relative positions of the stations
3.6 Scatterplot of the normal scores for the US stocks data (negated minimum returns)
3.7 Diagram for the fitted Hüsler-Reiss 1-truncated vine model using the vine suggested from Gaussian analysis
5.1 Asymptotic variances of the CFG and F-madogram estimators (assuming known margins) for the Gumbel and Hüsler-Reiss copulas
6.1 Asymptotic variances of the empirical, model-based and difference estimators for different parametric copula families with various strengths of dependence
7.1 Plots of means and variances of the reference distribution Q for various dimensions, and the critical values (95% quantiles) of Q expressed in terms of SRMSR for sample size n, i.e., √((CV of Q)/n), for n = 100
7.2 Plots of SRMSR (sample size 100) critical values against the common strength of dependence between variables expressed in terms of Kendall's τ, for various parametric copula families and dimensions
7.3 Histograms of asymptotic empirical covariance estimates using different methods, based on a 1-truncated D-vine with 8 variables and Gumbel linking copulas with Kendall's τ = 0.5
7.4 Histograms of asymptotic model-based covariance estimates, based on a 1-truncated D-vine with 8 variables and Gumbel linking copulas with Kendall's τ = 0.5
7.5 Ratio of SRMSR critical values (for sample size 100) for various 1- and 2-factor copulas, with the feature being Kendall's τ
7.6 Ratio of SRMSR critical values (for sample size 100) for various 1- and 2-truncated C-vine copulas, with the feature being Kendall's τ
7.7 SRMSR critical values (for sample size 100) for Gaussian 1- and 2-factor copulas, with the feature being Kendall's τ
7.8 SRMSR critical values (for sample size 100) for Gaussian 1- and 2-truncated C- and D-vine copulas, with the feature being Kendall's τ
7.9 SRMSR critical values (for sample size 100) for various Gaussian factor copulas using the restricted maximum likelihood method, with the feature being Kendall's τ
7.10 SRMSR critical values (for sample size 100) for various Gaussian C-vine copulas using the restricted maximum likelihood method, with the feature being Kendall's τ
7.11 Asymptotic variances of the empirical, model-based and difference estimators using tail-weighted dependence measures for different parametric copula families with various strengths of dependence, plotted against the tail-weighted dependence measure for the respective tail
7.12 SRMSR critical values (for sample size 100) for some copulas having factor or truncated C- and D-vine structures with 5 or 8 variables, with the feature being the tail-weighted dependence measure
7.13 Scatterplot of the normal scores for the European market GARCH-filtered index returns data
D.1 Plots of various asymptotic variances and covariances for different dimensions and true correlations of the positive exchangeable Gaussian distribution
E.1 Illustration of the merge sort with 8 elements
E.2 Illustration of the modified merge sort with 8 elements
E.3 Illustration of the quick sort with 8 elements
E.4 Illustration of the modified quick sort with 8 elements

List of Symbols and Notations

The following lists the most common uses of symbols and notations in the thesis. Those not listed have one or more uses that are not the main emphasis of the thesis.

A, a: Stable tail dependence function (one form of the Pickands dependence function that is homogeneous of order 1 and appears in the exponent of a multivariate extreme value copula (the "exponent function")), or sometimes a generic function
a_S: Stable tail dependence function for the subset of variables S
A^(S): Derivative of A with respect to the subset of indices S
B: Brownian bridge
B: One form of the Pickands dependence function, related to A as B(w_1, ..., w_{d-1}) = A(w_1, ..., w_{d-1}, 1 − ∑_{j=1}^{d-1} w_j) for w_j ∈ [0, 1] and ∑_{j=1}^{d} w_j ≤ 1. For most of the thesis, we refer to B as the Pickands dependence function
b: Tail dependence function
b_S: Marginal tail dependence function for the subset of variables S
b_{S_j|S_k}: Conditional tail dependence function with subsets of indices S_j and S_k
C: Copula (i.e., multivariate distribution function with Uniform(0, 1) margins)
C⁺ / C⁻ / C⊥: Comonotonicity / (bivariate) Countermonotonicity / Independence copula
Ĉ: Reflected or survival copula of C
C̄: Survival function of C
C_S: Marginal copula of C for the subset of variables S
C_{S_j|S_k}: Conditional distribution for copulas with subsets of indices S_j and S_k
C_{jk;S}: Copula for the conditional distributions F_{j|S} and F_{k|S}
C_{j|k;S}: Conditional distribution of C_{jk;S}
C_n: Empirical copula for sample size n
c / c_S / c_{jk;S}: Copula density of C / C_S / C_{jk;S}
Cor: Correlation between two random variables
Cov: Covariance between two random variables or elements of a vector
D_n: Difference statistic (empirical minus model-based feature) for sample size n
d: Dimension or number of variables
E: Expectation of a random variable
F, G: Distribution function, not necessarily copula
F_{S_j|S_k}: Conditional distribution with subsets of indices S_j and S_k
F̂_n: Empirical distribution function (F̂_{jk} for bivariate version for margins (j, k))
f, g: Density function
G: Gaussian process
Ḡ: Survival function of G
H: Sensitivity matrix (component of the sandwich information matrix)
I_d: Identity matrix of dimension d
I_d: Index set {1, ..., d}
I: Fisher information matrix
i.i.d.: Independent and identically distributed
J_d: Matrix of 1's of dimension d
J: Variability matrix (component of the sandwich information matrix)
ℓ: Log-likelihood function
l: Inference function (for estimating equation); slowly varying function
N: Sample size for Monte Carlo simulation in the evaluation of integrals; Gaussian distribution if followed by arguments
n: Sample size (for data, parametric bootstrap, etc.)
O: Asymptotic order, e.g., h_1(x) = O(h_2(x)) as x → ∞ means that, for some constants M and x_0, |h_1(x)/h_2(x)| ≤ M for all x ≥ x_0; complexity of sorting algorithms
O_p: Stochastic version of O, e.g., X_n = O_p(Y_n) as n → ∞ for (possibly stochastic) sequences X_n, Y_n means that X_n/Y_n is bounded in probability
o: Asymptotic dominance, e.g., h_1(x) = o(h_2(x)) as x → ∞ means that lim_{x→∞} h_1(x)/h_2(x) = 0
o_p: Stochastic version of o, e.g., X_n = o_p(Y_n) as n → ∞ for (possibly stochastic) sequences X_n, Y_n means that X_n/Y_n converges to zero in probability
P: Probability of a set or event
p: Number of factors or truncation level for vines
Q_n: Quadratic form statistic for sample size n, with reference distribution Q
R^d / R^d_+: Reals / non-negative reals in d dimensions.
Superscript omitted if d = 1.Td,ν : Distribution function of the d-variate t distribution with ν degrees of freedomT (F ): Functional mapping from F to R with distribution F ∈ F as argumentU : Random variable of a copulaV : Latent variable for factor copulaVar: Variance of a random variablexviX, Y , Z: Random variableβ: Blomqvist’s β or sometimes a parameterδ: Dependence parameter of copula familiesθ: Dependence parameter of copula families; parameter of a general statistical modelθ0: True value of a parameterθˆn: Estimator of θ based on a sample of size nϑ: Extremal coefficientϑα: Extremal coefficient for general copula with exponentiation index α for the F-madogramκL / κU : Lower / Upper tail orderλL / λU : Lower / Upper tail dependence indexλα: F-madogram measure of dependence with exponentiation index α, with estimatorassuming known margins λˆα and rank-based λˆα,r. A subscript L or U preceding αindicates lower or upper tail-weighted, respectively.µ: Quantity related to the mean of a variable; location parameter; positive measureν: Degree of freedom parameter for the t and related distributionsνα: F-madogram of a bivariate copula with exponentiation index α, with estimatorassuming known margins νˆα and rank-based νˆα,rρ: Correlation parameterρjk;S : Partial correlation of variables j, k given the set of variables SρN : Correlation of normal scoresρS : Spearman’s ρ or rank correlationΣ: Correlation / covariance matrixσ / σ2: Quantity related to the standard deviation or variance of a variable; scale parameterτ : Kendall’s τΦd: Distribution function for the standard d-variate Gaussian distribution. 
Subscriptomitted if d = 1.∅: Empty set1{S}: Indicator function for event Sc: Concordance ordering (left argument more concordant than right; ≺c for reverse order)∨: Maximum operator∧: Minimum operator,: Left argument is defined as right argument (sometimes “is denoted as”)∇ / ∇2: Gradient / HessianSᵀ: Transpose of the matrix S∼: Asymptotic equality, e.g., h1(x) ∼ h2(x) as x→∞ means that limx→∞h1(x)/h2(x) = 1;left argument is distributed as right argumentxviip→: Convergence in probabilityd→: Convergence in distributionh′: Derivative of h|S|: Cardinality of a finite set SxviiiAcknowledgementsI would like to express my deepest gratitude to my supervisor, Prof. Harry Joe, for hisguidance and continuous support throughout my PhD studies. His insightful comments,ideas and approach to problems allowed me to explore many areas in the world of copulamodelling that would not have been possible otherwise. Thanks to his critical but construc-tive comments, I have identified some inadequacies and learned various skills that will beuseful in my future career.I am grateful to the members of the supervisory committee, Prof. Lang Wu and Dr.Natalia Nolde, for their suggestions and comments. Their input has been valuable inimproving the presentation and flow of ideas in the thesis. I would also like to thank Dr.Johanna Nesˇlehova´, who provided extensive feedback as the external examiner, and Prof.Jiahua Chen and Prof. Joel Friedman for serving as the university examiners.My heartfelt appreciation extends to other faculty members and the administrative staffof the Department of Statistics. I am in particular grateful for their thoughtful assistancewhich made my PhD journey a much smoother one. The support from my fellow graduatestudents has also been very helpful; I truly feel that we are in a big family and they arealways approachable whenever I need someone to talk to. 
I am also thankful for the financial support from UBC in the form of the Four Year Doctoral Fellowship.

Finally, I would like to thank my parents, relatives and friends for their constant encouragement. It is never easy to start a new chapter in life in another continent; their support and help have given me the motivation and courage to go forward.

Dedication

To my parents

Chapter 1
Introduction

The modelling of multivariate observations is important as variables often depend on each other, and such dependence cannot be accounted for if each variable is modelled separately. The multivariate normal or Gaussian distribution is arguably the most commonly used model for multivariate modelling. A fully parametrized d-variate Gaussian distribution has d mean parameters, d variance parameters and \binom{d}{2} = d(d − 1)/2 covariance parameters, for a total model complexity of O(d²) parameters. Despite the mathematical appeal of the Gaussian distribution, it lacks certain characteristics, making it an inappropriate choice in some situations. One aspect is tail dependence: the Gaussian distribution with correlation parameter ρ ∈ (−1, 1) is tail independent (Sibuya (1960)), meaning that pairs of variables are independent in the limit when they are jointly large or small. Using the Gaussian distribution to model variables that exhibit tail dependence will therefore underestimate joint tail probabilities. Another issue with the Gaussian distribution is that it is centrally or reflection symmetric; that is, if Y ∼ N(µ, Σ), then Y − µ has the same distribution as µ − Y. This is an unreasonable assumption for certain types of data, such as when it is more likely to observe joint large values than joint small values.

In light of the inadequacy of the Gaussian distribution, care should be taken when modelling the dependence structure among variables. This task is aided by Sklar's theorem (Sklar (1959)), which shows that the modelling of marginal and dependence characteristics can be separated.
That is, if (X_1, ..., X_d) ∼ F with marginal distributions F_1, ..., F_d, then there exists a distribution function C : [0, 1]^d → [0, 1] such that

F(x_1, ..., x_d) = C(F_1(x_1), ..., F_d(x_d)),

with C being unique if F is continuous. The distribution C, defined on the unit hypercube, is known as the copula of F and is a multivariate distribution function with Uniform(0, 1) margins. Note that the choice of C is independent of the choice of the marginal distributions F_1, ..., F_d. Multivariate copula families with properties different from the Gaussian, such as non-zero tail dependence and asymmetric behaviour, have been developed (see, e.g., Joe (1997, 2014) and Nelsen (2006)).

1.1 Multivariate extreme value copula models with parsimonious dependence

A general d-dimensional copula contains O(d²) pairwise bivariate margins and hence the same number of descriptions of pairwise dependence properties. Typically this means the same number of dependence parameters to be modelled as well, just like the case of a d-dimensional Gaussian distribution. The property that the number of parameters grows quadratically in the dimension may not be desirable for several reasons:

1. A model with too many parameters may be difficult to interpret. Although pairwise dependence summaries are useful, often the researcher is interested in the structure of the data as well. A fully parametrized, saturated model offers little insight into the dependence structure.

2. There is the risk of overfitting if the number of parameters is large compared to the sample size. When this happens, the model may pick up noise that is not part of the structure. This can affect the quality of the fit and the model's ability to predict future behaviour of the process.

3. When the objective function depends on many parameters to be optimized over, model estimation can become computationally inefficient or even impossible.
For example, in Newton's method it is necessary to obtain the inverse of the Hessian matrix at every iteration, and the time complexity of this operation depends directly on the number of parameters.

For these reasons, it makes sense to consider parsimonious dependence models, which try to explain patterns with a formulation as simple as possible. In multivariate statistical modelling, this means that a dependence structure is applied to an otherwise unconstrained model, in an attempt to reduce the number of parameters (model complexity) while still allowing both statistical and scientific interpretations to be made. Examples of parsimonious multivariate models include:

1. Exchangeable structure, which assumes the same dependence pattern for every pair of variables, so that the distribution is invariant to the permutation of variable indices. In the Gaussian case, the correlation matrix has identical non-diagonal elements. For copulas, the class of Archimedean copulas (see, e.g., Chapter 4 of Nelsen (2006)) is commonly used to model the exchangeable dependence structure with a fixed number of parameters for any dimension d. Due to the strong assumption of exchangeability, this structure alone is typically not very useful apart from small values of d. For higher dimensions, the exchangeable structure may be used for a smaller subset of variables within a cluster (see, e.g., Brechmann (2014)).

2. Factor structure, which assumes that the dependence among variables is driven by one or more latent factors (see, e.g., Chapter 9 of Johnson and Wichern (2007)). With the factor structure, variables are conditionally independent given the latent factor(s). The classical Gaussian p-factor model decomposes a d × d correlation matrix Σ_G as follows:

Σ_G = L Lᵀ + Ψ,

where L is a d × p matrix of loadings and Ψ a diagonal matrix of uniquenesses.
Factor copulas (Krupskii and Joe (2013)) are the copula analogue of the Gaussian factor model, where conditional independencies between the observed variables given the latent factors are preserved, while the dependence between latent and observed variables is specified by parametric bivariate copula families. The Gaussian factor model is a special case, in which all linking copulas are Gaussian with suitable correlations.

3. Markov tree and truncated vine structures (see Bedford and Cooke (2001, 2002); Brechmann et al. (2012)), in which the dependence between variables is propagated through bivariate linkages. In a Markov tree, only the dependence properties of neighbouring variables are specified. This results in variables being Markovian, in the sense that a variable is conditionally independent of another given those along the linkage. The restriction of conditional independence is relaxed in the truncated vine structure. A p-truncated vine has dependence properties up to order p (i.e., conditioning on at most p − 1 variables) specified, beyond which conditional independence is assumed. For the d-variate Gaussian distribution, creating a Markov tree amounts to specifying the correlations of d − 1 acyclic pairs (i.e., no three pairs form a closed loop among three variables), while the dependence for other pairs is driven by assuming zero higher-order partial correlations. A p-truncated vine is constructed by allowing partial correlations of at most order p − 1 to be non-zero. Vine copulas or pair-copula constructions (Aas et al. (2009); Kurowicka and Joe (2011)) are generalizations of such dependence structures beyond Gaussianity; a p-truncated vine copula specifies parametric bivariate copula families that are applied to univariate marginal or conditional distributions of up to order (tree) p.

These models are parsimonious because in each case the number of parameters has order smaller than O(d²) and the distribution is structured to allow specific interpretation.
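As a quick numerical illustration of the parameter savings, the Gaussian p-factor decomposition Σ_G = LLᵀ + Ψ reconstructs a valid d × d correlation matrix from O(dp) parameters. This is a sketch only; the loading matrix below is an arbitrary example, not taken from the thesis.

```python
import numpy as np

d, p = 8, 2                      # 8 variables, 2 latent factors
rng = np.random.default_rng(0)

# Arbitrary illustrative loadings, scaled so each row has squared norm < 1
L = rng.uniform(0.2, 0.7, size=(d, p))
L /= np.sqrt((L**2).sum(axis=1, keepdims=True)) * 1.2
Psi = np.diag(1.0 - (L**2).sum(axis=1))   # uniquenesses give a unit diagonal

Sigma = L @ L.T + Psi

# Sigma is a proper correlation matrix: unit diagonal, positive definite
assert np.allclose(np.diag(Sigma), 1.0)
assert np.all(np.linalg.eigvalsh(Sigma) > 0)

# Parameter counts: saturated O(d^2) versus p-factor O(dp)
saturated = d * (d - 1) // 2     # number of free correlations
factor = d * p                   # number of loadings
print(saturated, factor)         # prints "28 16"
```

For d = 8 the saturated model already needs 28 correlations against 16 loadings; the gap widens linearly versus quadratically as d grows.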
In particular, the exchangeable model has O(1) parameters, while the factor and truncated vine structures have O(d) parameters. Apart from continuous distributions, these modelling techniques have previously been applied to discrete and ordinal data (Panagiotelis et al. (2012, 2017); Nikoloulopoulos and Joe (2015); Stöber et al. (2015)). We extend the concept of parsimonious dependence structure to multivariate extremes. The modelling of extremes is of interest in many areas, such as finance, meteorology, hydrology and engineering, where the occurrence of extreme events can lead to catastrophic results (see, e.g., Embrechts et al. (1997); Coles (2001); Castillo et al. (2005)). Very often, joint modelling of extremes is needed when one wants to make valid inferences about the extremal behaviour of dependent random variables. We propose factor and truncated vine analogues for extreme value copula models; they are suitable for the modelling of joint extremal observations that are believed to possess such structures, for example extreme stock returns within the same sector (for which a factor model is plausible), or hydrological observations along a river, where stations have a natural (spatial) ordering and a vine structure is a reasonable assumption. The distribution of joint extremes has the max-stability property (de Haan and Resnick (1977)), meaning that the family of distributions is closed under componentwise extrema; this is not trivially satisfied by a factor or truncated vine copula with arbitrary or even extreme value linking copulas.
Direct generalization appears difficult, and therefore we take the route of obtaining the extreme value limit in the case of factor copulas, and of utilizing flexible multivariate extreme value copulas (which satisfy the max-stability property) where a factor or truncated vine structure can be applied.

1.2 Bivariate extremal coefficient and extension to general copulas

To explore the relationship among variables, it is useful to consider measures of dependence strength which quantify the degree of dependence. These are typically bivariate or pairwise in nature, although we note that higher-dimensional generalizations are possible. The extremal coefficient ϑ is such a measure, first proposed to describe the strength of dependence of a multivariate extreme value distribution (Sibuya (1960); Pickands (1981); Smith (1990)). For a random vector (X_1, ..., X_d) ∼ F with common marginal distribution F_m, the extremal coefficient measures the number of "effective" independent variables and is defined implicitly as follows:

F(x, ..., x) = P(X_1 ≤ x, ..., X_d ≤ x) = [P(X_1 ≤ x)]^ϑ = F_m^ϑ(x).   (1.1)

For a copula C(u, ..., u), the final term in (1.1) reduces to u^ϑ. For distributions with non-negative dependence, ϑ ∈ [1, d], where the boundaries are reached at the comonotonicity (ϑ = 1) and independence (ϑ = d) limits. Empirical estimators (i.e., based on a sample) have been proposed for the estimation of ϑ for bivariate extreme value distributions (see, e.g., Pickands (1981); Deheuvels (1991); Capéraà et al. (1997); Hall and Tajvidi (2000); Cooley et al. (2006); Bücher et al. (2011)). We examine these estimators and propose a generalization for arbitrary copulas.
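For the bivariate Gumbel copula (used here purely as an illustration), the diagonal satisfies C(u, u; δ) = u^{2^{1/δ}}, so ϑ = 2^{1/δ}; a minimal numerical check of the copula form of (1.1):

```python
import math

def gumbel_cdf(u, v, delta):
    # Bivariate Gumbel copula C(u, v; delta), delta >= 1
    return math.exp(-((-math.log(u))**delta + (-math.log(v))**delta)**(1.0/delta))

delta = 1.7
theta = 2.0**(1.0/delta)           # extremal coefficient: C(u, u) = u**theta

for u in (0.05, 0.3, 0.7, 0.95):
    assert abs(gumbel_cdf(u, u, delta) - u**theta) < 1e-12

# Boundary behaviour: delta = 1 gives independence (theta = 2);
# delta -> infinity approaches comonotonicity (theta -> 1)
print(2.0**(1.0/1.0), round(2.0**(1.0/1e6), 6))
```

The check follows directly from C(u, u; δ) = exp{−[2(−log u)^δ]^{1/δ}} = exp{−2^{1/δ}(−log u)}.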
This generalization allows one to put more weight on the tail portion of a copula and to estimate its tail dependence properties empirically.

1.3 Adequacy-of-fit diagnostic checks for parsimonious models

When multiple models are fitted to a data set, likelihood-based model selection criteria such as the Akaike and Bayesian information criteria (Akaike (1974); Schwarz (1978)) are commonly used. However, these criteria do not measure the quality of the individual model, and it may happen that even the best model among those considered has a poor fit. To ensure that a parsimonious model is adequate, it is intuitive to compare the closeness between empirical and model-based features. A feature is mathematically a functional applied to a distribution, and provides a summary of certain characteristics of the distribution. Bivariate dependence measures such as the extremal coefficient for d = 2 have been used as diagnostics for multivariate and spatial extremes (see, e.g., Smith (1990); an application to the modelling of extreme snow depth is given in Blanchet and Davison (2011)). For general copula models, dependence measures such as Kendall's τ and Spearman's ρ may be better features as they are widely used and are mathematically easier to handle. To assess the quality of tail fits, the tail-weighted dependence measures (Krupskii and Joe (2015)) may be an appropriate choice. For each of the \binom{d}{2} bivariate marginal distributions of a d-dimensional random vector, we compute the difference (or residual) between the empirical feature and the feature obtained from the assumed model based on the estimated parameters. An adequate model should be such that these differences are small in magnitude for all or most of the bivariate margins.

Statistics of this type, based on residuals, have been investigated previously in the discrete case. A notable example is Pearson's χ², in which a quadratic form is constructed based on the vector of residuals in each cell of a contingency table.
The test based on Pearson's χ² statistic is known to be unreliable when some cells have small expected counts, or equivalently when the ratio between the sample size and the number of cells is small (Cochran (1952)). When this happens, the asymptotic (or reference) distribution can differ significantly from the distribution of the finite-sample statistic, leading to inaccurate type I error rates in conducting hypothesis tests. This is especially a problem for higher-dimensional contingency tables; in this situation, Maydeu-Olivares and Joe (2005, 2006) propose limited-information statistics that use residuals based on low-order marginal tables. This effectively reduces the number of cells with small proportions, bypassing sparsity issues in higher dimensions. This idea is further generalized to linear combinations of cell residuals in Joe and Maydeu-Olivares (2010).

The method of limited-information statistics is applicable to √n-consistent and asymptotically normal estimators, including the maximum likelihood estimator. There exist analytic expressions for the calculation of the limiting variance of the limited-information statistic, although they are not always easy to use, such as when the number of low-order margins is large or when the marginal expected counts are intractable. This problem is much harder for continuous variables. In addition to the restrictions applicable to the discrete setting, here the behaviour of the limiting distribution depends on other factors such as the feature chosen as well as the method of fitting marginal and dependence parameters. Taking these challenges into consideration, we develop criteria for assessing the adequacy of a model. To facilitate interpretation, we make use of the standardized root-mean-square residuals (SRMSR) (see, e.g., Hu and Bentler (1998, 1999)), which have the same scale as the feature being considered.
The most sensible approach for a particular problem depends on the feasibility of model simulation, estimation and evaluation of the features for the estimated model. As we illustrate, the simplest solution of performing a parametric bootstrap to estimate the sampling distribution of the residuals is usually not possible in high dimensions. We bypass this problem by evaluating the elements of the asymptotic covariance matrix without first estimating the model-based features where possible, or by making approximate inference using surrogate models that have similar characteristics to the target model.

Our work on using residuals to determine model fit has connections to the Kolmogorov-Smirnov, Cramér-von Mises and Anderson-Darling-type statistics with estimated parameters (e.g., Chernoff and Lehmann (1954); Darling (1955); and an overview in Chapter 28 of DasGupta (2008)). Based on these statistics, goodness-of-fit tests for bivariate copulas have been developed (Genest et al. (2009); Berg (2009)). Our objective is to assess the adequacy-of-fit of parametric models, which is different from what these tests aim to offer. Goodness-of-fit procedures involve statistical tests that try to answer the question of whether the observed data come from the assumed class of models, and can be useful when this is exactly the purpose of investigation. However, there is generally no stochastic, physical or subject matter-based theoretical basis for a class of multivariate models or copulas, and hence we cannot hypothesize a "correct" (null) model before data analysis. Rather, we are interested in parsimonious models that reasonably capture certain features of the data that are the most relevant for inferences.
To this end, we do not formulate statistical hypotheses and instead use the adequacy-of-fit statistic as a diagnostic check.

Note that the Kolmogorov-Smirnov, Cramér-von Mises and Anderson-Darling-type statistics combine the information of the residuals (i.e., taking the supremum of, or integrating over, some functions of these residuals) in their constructions; the information about individual residuals is often lost and neglected in subsequent analysis. On the other hand, our proposed adequacy-of-fit statistic retains the residual for each bivariate margin. This has the advantage that, in the case that a model is found to be inadequate, one can look for the source of misfit and make improvements accordingly. It is therefore easier to eliminate structural inadequacy using these feature-based statistics.

1.4 Research contributions and organization of thesis

To summarize, our main research contributions include the following:

• Extension of the concept of parsimonious dependence structure to the modelling of multivariate extremes (Chapters 3 and 4). These dependence structures have been previously investigated for discrete (including binary or count data), non-extreme continuous, or a mixture of discrete/continuous data.

• Adaptation of the F-madogram empirical estimator of the extremal coefficient to measure the strength of tail dependence of non-extreme-value copulas (Chapter 5).

• Diagnostics for parametric models to guard against inadequacy of fit, using the vector of residuals or differences between empirical and model-based features.
In certain situations, the limiting distribution of such differences has a covariance matrix that can be separated into the empirical and model-based components, allowing simpler calculations (Chapter 6).

• Assessment of adequacy-of-fit for higher-dimensional models with parsimonious dependence structure, using differences between empirical and model-based features in bivariate marginal distributions (Chapter 7).

• An efficient (in the sense of having O(N log₂ N) complexity) algorithm for the evaluation of bivariate empirical distributions at the observed values. This is given in Appendix E as an algorithmic supplement to one of the methods we use in the assessment of model adequacy-of-fit.

The rest of the thesis is structured as follows. Chapter 2 provides a review of existing methods, and also serves partly as a literature review of the relevant areas. The construction of extreme value copulas with parsimonious dependence structures is dealt with in Chapter 3. We investigate two approaches to constructing extreme value copulas with factor or truncated vine structures: (a) through the extreme value limit of existing copula models, and (b) through structuring the underlying correlation matrix of flexible extreme value copulas. We illustrate through examples that these models offer intuitive interpretations. Chapter 4 includes the technical derivations pertaining to the numerical integration procedures in obtaining pairwise densities of the 1-factor extreme value copula. Chapter 5 focuses on the bivariate empirical extremal coefficient, originally designed for extreme value copulas. The bivariate extremal coefficient has been used for diagnostic purposes in modelling multivariate spatial extremes. We review some common empirical estimators, discuss the potential extension of one of these empirical estimators to the non-extreme context, and provide connections with the tail-weighted dependence measures of Krupskii and Joe (2015).
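One of the contributions listed above is an O(N log₂ N) algorithm (Appendix E, a modified quick sort) for evaluating bivariate empirical distributions at the observed values. That algorithm is not reproduced here; the sketch below uses a different, standard device with the same complexity, a sweep in sorted x-order with a Fenwick (binary indexed) tree over y-ranks. All names are illustrative.

```python
def biv_ecdf_at_points(xs, ys):
    """ecdf[i] = #{j : x_j <= x_i and y_j <= y_i} / N for every i, computed
    in O(N log N) by sweeping in (x, y)-order with a Fenwick tree on y-ranks.
    Assumes continuous data (no exactly duplicated points)."""
    n = len(xs)
    y_rank = {v: r + 1 for r, v in enumerate(sorted(set(ys)))}
    tree = [0] * (len(y_rank) + 1)

    def add(i):                      # Fenwick point update: insert one point
        while i < len(tree):
            tree[i] += 1
            i += i & (-i)

    def prefix(i):                   # points inserted so far with y-rank <= i
        s = 0
        while i > 0:
            s += tree[i]
            i -= i & (-i)
        return s

    counts = [0] * n
    for i in sorted(range(n), key=lambda k: (xs[k], ys[k])):
        add(y_rank[ys[i]])           # insert current point first so it counts itself
        counts[i] = prefix(y_rank[ys[i]])
    return [c / n for c in counts]

# Small check against the O(N^2) definition
xs, ys = [0.3, 0.9, 0.1, 0.6], [0.8, 0.2, 0.4, 0.7]
brute = [sum(xj <= xi and yj <= yi for xj, yj in zip(xs, ys)) / 4
         for xi, yi in zip(xs, ys)]
assert biv_ecdf_at_points(xs, ys) == brute
```

Processing points in increasing (x, y) order means that, by the time point i is queried, exactly the points with x_j ≤ x_i (and, among x-ties, y_j ≤ y_i) have been inserted, so the prefix query returns the desired count.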
Chapters 6 and 7 are devoted to the assessment of model adequacy using bivariate Kendall's τ, Spearman's ρ, the tail-weighted dependence measures and the extremal coefficient (for extreme value copulas) for copula models with parsimonious dependence. The asymptotic properties of the vector of residuals, defined as differences between empirical and model-based features, are derived in Chapter 6. Being based on residuals, this class of statistics is very general and can be applied in a variety of situations, such as a combination of different features for univariate or multivariate distributions. Chapter 7 focuses on the particular application of such statistics to assess the adequacy of a multivariate copula using residuals from bivariate marginal distributions. We demonstrate that the critical value of the adequacy-of-fit statistic depends quite heavily on the overall dependence strength of the data, with stronger dependence typically leading to smaller critical values. Various ways of determining critical values are proposed for different parsimonious dependence structures, features, dimensions and methods of model estimation. Chapter 8 contains a conclusion of the thesis and potential topics for further research. Some of the longer proofs and derivations are given in the Appendix.

Chapter 2
Preliminaries

In this chapter, we review existing results in the literature that form the basis we build upon. Section 2.1 provides an overview of copula theory and a list of some properties and parametric families of bivariate copulas. Building multivariate models with factor and vine copulas is also addressed. Section 2.2 has results in univariate and multivariate extreme value theory. Multivariate extremes are closely related to the concept of tail dependence; Section 2.3 has details on tail dependence functions and extreme value limits of copula models.
We focus on model adequacy in the latter part of the thesis; to facilitate such discussion with the use of dependence measures, an overview of these measures is provided in Section 2.4. Finally, in Section 2.5 we mention several model estimation methods for multivariate copulas. Because of model complexity, maximum likelihood is not always possible or computationally efficient. Methods such as composite likelihood, inference functions for margins and marginal ranks thus emerge as viable alternatives.

2.1 Copula theory and examples

We begin by introducing the concept of copulas and several examples of common bivariate copulas. An overview of factor and vine copulas is then given.

2.1.1 Copulas as multivariate distributions on the unit hypercube

Copulas are multivariate distributions with Uniform(0, 1) margins (hereafter referred to as the unit uniform distribution). For any multivariate distribution with arbitrary marginal distributions, Sklar (1959) shows that there exists a corresponding copula. Let (X_1, ..., X_d) ∼ F with marginal distributions F_1, ..., F_d. Then there exists a copula C such that

F(x_1, ..., x_d) = C(F_1(x_1), ..., F_d(x_d)).

Furthermore, C is unique if F is continuous. The copula C has unit uniform margins because of the probability integral transform in its arguments. This representation implies that the modelling of marginal and dependence components can be separated, and the choice of the dependence structure is independent of the marginal distributions.

Many parametric copula families with different tail properties exist in the literature (see, e.g., Chapter 4 of Joe (2014)). We list some commonly used parametric bivariate copula families below; they are used in various places of this thesis.
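The separation of margins and dependence can be seen numerically: monotone marginal transforms change the margins but leave the rank-based dependence, i.e., the copula, untouched. The sketch below (illustrative only; the Gaussian-copula construction is standard but the specific numbers are arbitrary) checks that Spearman's ρ is identical before and after changing to unit exponential margins.

```python
import math
import random

def phi(x):
    # Standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def spearman(a, b):
    # Rank correlation; assumes no ties (continuous simulated data)
    ra = {v: r for r, v in enumerate(sorted(a))}
    rb = {v: r for r, v in enumerate(sorted(b))}
    n = len(a)
    m = (n - 1) / 2.0
    num = sum((ra[x] - m) * (rb[y] - m) for x, y in zip(a, b))
    den = math.sqrt(sum((ra[x] - m)**2 for x in a)
                    * sum((rb[y] - m)**2 for y in b))
    return num / den

random.seed(1)
rho = 0.6
z1 = [random.gauss(0, 1) for _ in range(2000)]
z2 = [rho * a + math.sqrt(1 - rho**2) * random.gauss(0, 1) for a in z1]

u = [phi(a) for a in z1]            # probability integral transform: unit uniform
v = [phi(b) for b in z2]
x = [-math.log(1 - a) for a in u]   # new margins: unit exponential
y = [-math.log(1 - b) for b in v]

# The copula (rank) dependence is unchanged by the monotone transforms
s1, s2, s3 = spearman(z1, z2), spearman(u, v), spearman(x, y)
assert abs(s1 - s2) < 1e-12 and abs(s2 - s3) < 1e-12
print(round(s1, 3))
```

Because Φ and u ↦ −log(1 − u) are strictly increasing, the ranks (and hence any rank-based dependence measure) are preserved exactly.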
The support of all these copulas is [0, 1]².

• Gaussian (or normal) copula:

C(u, v; ρ) = Φ₂(Φ⁻¹(u), Φ⁻¹(v); ρ), ρ ∈ [−1, 1],

where Φ and Φ₂ are the univariate and bivariate standard Gaussian distribution functions, respectively.

• Student t copula:

C(u, v; ρ, ν) = T_{2,ν}(T_{1,ν}⁻¹(u), T_{1,ν}⁻¹(v); ρ), ν ∈ (0, ∞); ρ ∈ [−1, 1],

where T_{d,ν} is the distribution function of the d-dimensional Student t distribution with ν degrees of freedom.

• Frank copula (Frank (1979)):

C(u, v; δ) = −(1/δ) log[ (1 − e^{−δ} − (1 − e^{−δu})(1 − e^{−δv})) / (1 − e^{−δ}) ], δ ∈ (−∞, ∞).

• Bivariate Mardia-Takahasi-Clayton-Cook-Johnson (MTCJ) copula (Mardia (1962); Takahasi (1965); Clayton (1978); Cook and Johnson (1981)):

C(u, v; δ) = (u^{−δ} + v^{−δ} − 1)^{−1/δ}, δ ∈ [0, ∞).

• Gumbel copula (Gumbel (1960)):

C(u, v; δ) = exp{−[(−log u)^δ + (−log v)^δ]^{1/δ}}, δ ∈ [1, ∞).

It is an extreme value copula, to be defined in Section 2.2.

• Hüsler-Reiss copula (Hüsler and Reiss (1989)):

C(u, v; δ) = exp{ −w Φ(1/δ + (δ/2) log(w/x)) − x Φ(1/δ + (δ/2) log(x/w)) }, δ ∈ [0, ∞),   (2.1)

where w = −log u and x = −log v. It is also an extreme value copula.

• t-EV copula (Demarta and McNeil (2005)):

C(u, v; ρ, ν) = exp{ −w T_{1,ν+1}( [√(ν + 1)/√(1 − ρ²)] [(w/x)^{1/ν} − ρ] ) − x T_{1,ν+1}( [√(ν + 1)/√(1 − ρ²)] [(x/w)^{1/ν} − ρ] ) }, ν ∈ (0, ∞); ρ ∈ [−1, 1],

where w = −log u and x = −log v.
It is also an extreme value copula.

• BB1 copula (Joe and Hu (1996)):

C(u, v; θ, δ) = (1 + [(u^{−θ} − 1)^δ + (v^{−θ} − 1)^δ]^{1/δ})^{−1/θ}, θ ∈ (0, ∞); δ ∈ [1, ∞).

This is a two-parameter family with potentially different levels of upper and lower tail dependence, to be addressed later in this subsection.

• Independence copula, C⊥:

C⊥ = C⊥(u, v) ≜ uv.

As the name suggests, the copula corresponding to any bivariate distribution with independent variables is the independence copula.

• Comonotonicity copula, C+:

C+ = C+(u, v) ≜ min(u, v).

The two variables of a comonotonicity copula are perfectly positively dependent, i.e., if (U, V) ∼ C+, then U = V almost surely.

• Countermonotonicity copula, C−:

C− = C−(u, v) ≜ max(0, u + v − 1).

The two variables of a countermonotonicity copula are perfectly negatively dependent, i.e., if (U, V) ∼ C−, then U = 1 − V almost surely.

Copulas can be differentiated based on their dependence properties.¹ Some of these properties include:

¹ We focus on bivariate copulas in the following discussion of dependence properties.

• Symmetry. There are various types of symmetry properties. A copula has permutation symmetry (or exchangeability) if C(u, v) = C(v, u) for all (u, v) ∈ [0, 1]², and reflection symmetry if Ĉ(u, v) ≜ C̄(1 − u, 1 − v) = u + v − 1 + C(1 − u, 1 − v) is the same copula as C(u, v). Here C̄ is the survival function of C, and Ĉ is known as the reflected or survival copula of C. If (U, V) ∼ C, then Ĉ is the distribution function for the pair (1 − U, 1 − V). Graphically, a copula is permutation symmetric if the bivariate density is symmetric along the (0, 0)–(1, 1) diagonal.

• Tail dependence. Roughly speaking, a copula has lower (resp. upper) tail dependence if the probability of one variable being very small (resp. large), given that the other one is very small (resp. large), is non-zero. The extreme value limit of a copula without tail dependence is the independence copula. Section 2.3 has a detailed overview of tail dependence and extreme value limits.

• Quadrant dependence.
A copula has positive quadrant dependence (PQD) if $C(u,v) \ge uv$ for all $(u,v) \in [0,1]^2$. That is, a copula is PQD if the variables are more likely to be jointly large or jointly small than under the independence copula. Reversing the inequality gives negative quadrant dependence (NQD).

• Stochastically increasing positive dependence. For $(U,V) \sim C$, $V$ is said to be stochastically increasing (SI) in $U$ if the conditional distribution $C_{V|U}(v|u)$ is decreasing in $u$ for all $v$. That is, as $U$ increases, the conditional probability of $V \le v$ drops for fixed $v$ ($V$ is likely to be larger). Note that SI positive dependence implies PQD. Reversing the inequality results in stochastically decreasing random variables.

• Concordance ordering. A copula family $C(u,v;\theta)$ is increasing (resp. decreasing) in concordance ordering if $\theta_2 \ge \theta_1$ implies $C(u,v;\theta_2) \ge C(u,v;\theta_1)$ (resp. $C(u,v;\theta_2) \le C(u,v;\theta_1)$) for all $(u,v) \in [0,1]^2$. A copula that is increasing in concordance ordering with respect to $\theta$ is more likely to yield jointly small or jointly large values for larger $\theta$.

The properties of the copulas mentioned above are summarized in Table 2.1.

Table 2.1: Dependence properties of some commonly used copulas. Unless specified, properties correspond to the copulas with non-boundary parameter values.

Copula       | Symmetry    | Tail dep.   | Quad. dep.          | Stoc. ord.          | Conc. ord.
             | Perm.  Ref. | Lower Upper | +ve       -ve       | Incr.     Decr.     |
Gaussian     |  ✓      ✓   |  ✗     ✗    | ρ ≥ 0     ρ ≤ 0     | ρ ≥ 0     ρ ≤ 0     |  ✓
Student t    |  ✓      ✓   |  ✓     ✓    | ρ ≥ 0     ρ ≤ 0     |  ✗         ✗        | w.r.t. ρ
Frank        |  ✓      ✓   |  ✗     ✗    | δ ≥ 0     δ ≤ 0     | δ ≥ 0     δ ≤ 0     |  ✓
MTCJ         |  ✓      ✗   |  ✓     ✗    |  ✓         ✗        |  ✓         ✗        |  ✓
Gumbel       |  ✓      ✗   |  ✗     ✓    |  ✓         ✗        |  ✓         ✗        |  ✓
Hüsler-Reiss |  ✓      ✗   |  ✗     ✓    |  ✓         ✗        |  ✓         ✗        |  ✓
t-EV         |  ✓      ✗   |  ✗     ✓    |  ✓         ✗        |  ✓         ✗        | w.r.t. ρ
BB1          |  ✓      ✗   |  ✓     ✓    |  ✓         ✗        |  ✓         ✗        |  ✓
C⊥           |  ✓      ✓   |  ✗     ✗    |  —         —        |  —         —        |  —
C+           |  ✓      ✓   |  ✓     ✓    |  ✓         ✗        |  —         —        |  —
C−           |  ✓      ✓   |  ✗     ✗    |  ✗         ✓        |  —         —        |  —

2.1.2 Factor copulas

The factor copula model is a generalization of classical factor analysis to non-Gaussian dependence structures. It assumes that the observed variables $U_1, \dots, U_d$ depend on one or more latent variables $V_1, \dots, V_p$. Figure 2.1 presents a schematic diagram of the 1- and 2-factor copulas. With a 1-factor copula, the observed variables are assumed to depend on one common latent variable; in a 2-factor copula, dependence arises from two latent variables which are assumed to be independent of each other.

Each edge in Figure 2.1 represents one bivariate linking copula with the distributions indicated. For example, $V_1U_1$ indicates that the bivariate copula links the distributions $F_{V_1}$ and $F_{U_1}$, while $V_2U_2;V_1$ indicates that the copula links the conditional distributions $F_{V_2|V_1}$ and $F_{U_2|V_1}$.

Figure 2.1: Dependence between observed and latent variables for the 1-factor (left) and 2-factor (right) copula models. The observed variables are denoted by $U_1, \dots, U_d$ while the latent variables are denoted by $V_1$ and $V_2$.

As in the usual case for copulas, we assume that $U_1, \dots, U_d$ have been transformed to have unit uniform margins, and the latent variables $V_i$ are independent unit uniform variables. To specify a 1-factor copula model, one needs $d$ bivariate copulas to link the observed variables to the latent factor, i.e., $\{C_{U_i,V_1} : i = 1, \dots, d\}$. The joint distribution function of the observed variables, $C(u_1, \dots, u_d)$, can be found by integrating out the latent variable:

  C(u_1, \dots, u_d) = \int_0^1 C_{U_1,\dots,U_d|V_1}(u_1, \dots, u_d | v_1) f_{V_1}(v_1)\, dv_1 = \int_0^1 \prod_{i=1}^d C_{U_i|V_1}(u_i | v_1)\, dv_1,   (2.2)

due to the conditional independence of the observed variables given $V_1$, where $C_{U_i|V_1}$ is the conditional distribution of $U_i$ given $V_1$ and $f_{V_1}(v_1) = 1$ is the density function of $V_1$ for $v_1 \in [0,1]$. The copula density is

  c(u_1, \dots, u_d) = \frac{\partial^d C(u_1, \dots, u_d)}{\partial u_1 \cdots \partial u_d} = \int_0^1 \prod_{i=1}^d c_{U_i,V_1}(u_i, v_1)\, dv_1,

since $C_{U_i|V_1}(u_i|v_1) = \partial C_{U_i,V_1}(u_i, v_1)/\partial v_1$ for $i = 1, \dots, d$, and further differentiation with respect to $u_i$ gives the copula density of $U_i$ and $V_1$.

For the 2-factor copula model, in addition to the $d$ copulas $\{C_{U_i,V_1} : i = 1, \dots, d\}$, another $d$ are used to link the observed variables to the second latent variable given the first one.
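As an aside, the integral in (2.2) is one-dimensional and smooth for standard linking families, so Gauss–Legendre quadrature evaluates it accurately. The sketch below (not the thesis implementation, which is in Fortran 90) uses MTCJ (Clayton) linking copulas; the function names and the choice of 35 quadrature nodes are illustrative assumptions:

```python
import numpy as np

def clayton_cond(u, v, theta):
    # Conditional distribution C_{U|V}(u|v) = dC(u,v)/dv for the MTCJ (Clayton) copula
    return v ** (-theta - 1.0) * (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta - 1.0)

def one_factor_cdf(u, thetas, nq=35):
    # C(u_1,...,u_d) = int_0^1 prod_i C_{U_i|V_1}(u_i|v) dv, via Gauss-Legendre on (0,1)
    x, w = np.polynomial.legendre.leggauss(nq)
    v = 0.5 * (x + 1.0)   # map nodes from (-1,1) to (0,1)
    w = 0.5 * w
    prod = np.ones_like(v)
    for ui, th in zip(u, thetas):
        prod *= clayton_cond(ui, v, th)
    return float(np.sum(w * prod))
```

Setting $u_i = 1$ makes the $i$th conditional distribution identically 1, so the uniform-margin property $C(u_1, 1, \dots, 1) = u_1$ provides a quick correctness check.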
That is, a copula in this second level links the conditional distributions $C_{U_i|V_1}$ and $C_{V_2|V_1}$ (which is unit uniform), denoted by $C_{U_i,V_2;V_1}$. The set of copulas to be specified is therefore $\{C_{U_i,V_1}, C_{U_i,V_2;V_1} : i = 1, \dots, d\}$. In this case, the joint distribution function of the observed variables is

  C(u_1, \dots, u_d) = \int_0^1\!\int_0^1 C_{U_1,\dots,U_d|V_1,V_2}(u_1, \dots, u_d | v_1, v_2) f_{V_1,V_2}(v_1, v_2)\, dv_1 dv_2
                     = \int_0^1\!\int_0^1 \prod_{i=1}^d C_{U_i|V_2;V_1}\big(C_{U_i|V_1}(u_i|v_1)\,\big|\,v_2\big)\, dv_1 dv_2,   (2.3)

where $f_{V_1,V_2}(v_1, v_2) = 1$ is the joint density of $V_1$ and $V_2$, and $C_{U_i|V_2;V_1}$ is the conditional distribution of $U_i|V_1$ given $V_2|V_1$. The derivation of (2.3) is presented in Krupskii and Joe (2013). The copula density function, necessary for inference, is given by

  c(u_1, \dots, u_d) = \int_0^1\!\int_0^1 \prod_{i=1}^d \big[c_{U_i,V_2;V_1}\big(C_{U_i|V_1}(u_i|v_1), v_2\big) \cdot c_{U_i,V_1}(u_i, v_1)\big]\, dv_1 dv_2.

Numerical integration is usually needed for the evaluation of the integral. The factor copula model includes the Gaussian factor model as a special case; it can be constructed by taking the linking copulas as suitable Gaussian copulas.

2.1.3 Vine copulas

The pair-copula construction (or vine copula) approach (Bedford and Cooke (2001, 2002); Aas et al. (2009)) allows one to build multivariate copulas hierarchically, using only bivariate linking copulas in each hierarchy or tree to link variables directly. In the first tree of a $d$-dimensional vine copula, the $d$ variables are linked together through $d-1$ bivariate copulas. In each subsequent tree, the copulas link the conditional distributions implied by the copulas in the previous tree. Therefore a full regular vine on $d$ variables has $d-1$ trees and is completely specified by $\binom{d}{2}$ bivariate copulas. An example is the C-vine (canonical vine); Table 2.2 lists the linking copulas in each tree of such a vine rooted at the first variable, where each entry in the table corresponds to one bivariate copula linking the variables indicated.
Variables after the semicolon are those conditioned upon; e.g., 34;12 means that the copula models the dependence between the conditional distributions $F_{3|12}$ and $F_{4|12}$. Meanwhile, Figure 2.2 displays the vine diagram for the first tree of the same vine copula. By choosing the location of the connecting edges and the parametric copula families appropriately, one may introduce different dependence structures among the variables.

Table 2.2: Linking copulas for each tree of a C-vine rooted at variable 1. For brevity, some commas between variables are omitted. Variables after the semicolon are those conditioned upon.

Tree:      1    2     3      ...  d-1
Linking    12   23;1  34;12  ...  (d-1,d);(12...,d-2)
variables  13   24;1  35;12
           14   25;1  ...
           15   ...   3d;12
           ...  2d;1
           1d

Figure 2.2: Vine diagram for the first tree of a C-vine rooted at variable 1 (edges 12, 13, 14, ..., 1d).

The copula density of a $d$-dimensional regular vine is given by the product of all constituent bivariate copula densities, i.e.,

  c(u_1, \dots, u_d) = \prod_{[i_1,i_2|S(i_1,i_2)] \in E(\mathcal{V})} c_{i_1,i_2;S(i_1,i_2)}\big(C_{i_1|S(i_1,i_2)}(u_{i_1}|u_{S(i_1,i_2)}),\, C_{i_2|S(i_1,i_2)}(u_{i_2}|u_{S(i_1,i_2)})\big),   (2.4)

where $[i_1, i_2|S(i_1,i_2)]$ is an edge and $E(\mathcal{V})$ is the set of all edges of the vine $\mathcal{V}$ (see equation (3.41) of Joe (2014)). The distribution function is in general a $d$-dimensional integral, obtained by integrating (2.4) with respect to $u_1, \dots, u_d$. The existence of an explicit density function allows statistical inference using the likelihood function, but quantities involving the distribution function may be difficult to obtain and numerical integration is often required.

The dependence structure is sometimes well approximated by the first few trees. If high-order residual dependence can be ignored, a parsimonious vine model can be obtained by truncation. A $p$-truncated vine with $p < d-1$ is one with all linking copulas beyond the $p$th tree set to the independence copula. A truncated vine has fewer parameters and model estimation is less costly.
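To make (2.4) concrete, the following sketch evaluates the density of a 3-dimensional C-vine rooted at variable 1, with Gaussian linking copulas (copulas 12 and 13 in tree 1, and 23;1 in tree 2). In this special case the vine density reproduces the trivariate Gaussian copula density, which gives a built-in check; the helper names and parameter values are illustrative:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def gauss_cop_dens(u, v, rho):
    # Bivariate Gaussian copula density at (u, v)
    x, y = norm.ppf(u), norm.ppf(v)
    det = 1.0 - rho ** 2
    return np.exp(-(rho ** 2 * (x ** 2 + y ** 2) - 2 * rho * x * y) / (2 * det)) / np.sqrt(det)

def hfunc(u, v, rho):
    # Conditional distribution C_{U|V}(u|v) for the Gaussian pair copula
    return norm.cdf((norm.ppf(u) - rho * norm.ppf(v)) / np.sqrt(1 - rho ** 2))

def cvine3_dens(u1, u2, u3, r12, r13, r23_1):
    # Density (2.4): tree 1 has copulas 12 and 13; tree 2 has copula 23;1
    t1 = gauss_cop_dens(u1, u2, r12) * gauss_cop_dens(u1, u3, r13)
    t2 = gauss_cop_dens(hfunc(u2, u1, r12), hfunc(u3, u1, r13), r23_1)
    return t1 * t2

# Check against the trivariate Gaussian copula: r23_1 acts as a partial correlation
r12, r13, r23_1 = 0.5, 0.3, 0.4
r23 = r23_1 * np.sqrt((1 - r12 ** 2) * (1 - r13 ** 2)) + r12 * r13
R = np.array([[1, r12, r13], [r12, 1, r23], [r13, r23, 1]])
x = norm.ppf([0.3, 0.6, 0.8])
dens_mvn = multivariate_normal(mean=np.zeros(3), cov=R).pdf(x) / np.prod(norm.pdf(x))
dens_vine = cvine3_dens(0.3, 0.6, 0.8, r12, r13, r23_1)
```

Here the tree-2 parameter plays the role of the partial correlation $\rho_{23;1}$, with implied $\rho_{23} = \rho_{23;1}\sqrt{(1-\rho_{12}^2)(1-\rho_{13}^2)} + \rho_{12}\rho_{13}$.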
The special case of a 1-truncated vine is also known as a Markov tree; it is so named due to its conditional independence property given variables along the path. For example, the vine in Figure 2.2 is a Markov tree if the shown linkages are the only ones with non-independence copulas. In this case, variable $i$ is independent of variable $j$ given variable 1, for $2 \le i < j \le d$. Note that the $p$-factor copula model is an implementation of a $p$-truncated C-vine rooted at the latent variables (Krupskii and Joe (2013)); all linking copulas beyond tree $p$ are assumed to be independence copulas.

2.2 Extreme value theory

This section provides a brief overview of univariate and multivariate extreme value theory. An example is then given to demonstrate that a multivariate extreme value distribution may not be an appropriate model for an arbitrary combination of extreme variables.

2.2.1 Univariate extreme value theory

Let $X_1, \dots, X_n$ be $n$ independent and identically distributed random variables from the distribution $F$, and let $M_n \triangleq \bigvee_{i=1}^n X_i$ be the maximum. Fisher and Tippett (1928) and Gnedenko (1943) show that, if there exist location and scale parameters $a_n \in \mathbb{R}$ and $b_n > 0$ such that $(M_n - a_n)/b_n$ converges to a non-degenerate distribution as $n \to \infty$, then the limiting distribution must be either Gumbel (light-tailed), Weibull (finite upper end point) or Fréchet (heavy-tailed). (The Gumbel distribution here is different from the Gumbel copula mentioned in Section 2.1.) These three families of distributions can be condensed into the generalized extreme value (GEV) distribution (von Mises (1954); Jenkinson (1955)) with distribution function

  G(x; \theta) = \begin{cases} \exp\big\{-\big[1 + \gamma\big(\tfrac{x-\mu}{\sigma}\big)\big]_+^{-1/\gamma}\big\}, & \gamma \ne 0; \\ \exp\big\{-\exp\big\{-\tfrac{x-\mu}{\sigma}\big\}\big\}, & \gamma = 0, \end{cases}

where $[y]_+ = \max(y, 0)$ and $\theta = \{\mu, \sigma, \gamma\}$ is the set of parameters. In particular, the Gumbel, Weibull and Fréchet distributions are retrieved as $\gamma = 0$, $\gamma < 0$ and $\gamma > 0$, respectively.
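The piecewise form of $G(x;\theta)$ is straightforward to code; a small illustrative sketch, with the $\gamma = 0$ branch as the Gumbel limit and the support handled through $[y]_+$:

```python
import math

def gev_cdf(x, mu, sigma, gamma):
    # GEV distribution function; gamma = 0 is the Gumbel limit
    z = (x - mu) / sigma
    if gamma == 0.0:
        return math.exp(-math.exp(-z))
    t = 1.0 + gamma * z
    if t <= 0.0:
        # Outside the support: below the lower endpoint (gamma > 0, Frechet type)
        # or above the upper endpoint (gamma < 0, Weibull type)
        return 0.0 if gamma > 0 else 1.0
    return math.exp(-t ** (-1.0 / gamma))
```

As $\gamma \to 0$, $[1 + \gamma z]^{-1/\gamma} \to e^{-z}$, so the two branches agree in the limit.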
The above convergence in distribution somewhat resembles the central limit theorem for the mean, in which suitable standardization of the sample mean by the corresponding location and scale parameters (i.e., mean and standard deviation) leads to convergence to the standard normal distribution as $n \to \infty$. Since the minimum satisfies $m_n \triangleq \bigwedge_{i=1}^n X_i = -\bigvee_{i=1}^n (-X_i)$, it suffices to consider the theory for maxima.

Because each sample contains only one maximum, it is customary to use the block maxima approach for inference (see, e.g., Coles (2001); Beirlant et al. (2004)). Observations are divided into blocks of sufficiently large size, or based on natural separation in the measurement unit such as months or years, and the maximum is obtained in each block. Apart from maximum likelihood estimation of the parameters (Prescott and Walden (1980, 1983); Hosking (1985)), alternatives such as the probability-weighted moments (Greenwood et al. (1979); Hosking (1985)) and elemental percentile (Castillo and Hadi (1995)) methods have been proposed.

2.2.2 Multivariate extreme value theory

Some early literature on multivariate extremes includes de Haan and Resnick (1977) and Pickands (1981). Let $(X_{i1}, \dots, X_{id}) \sim F$ be the $i$th replicate of a random vector of dimension $d$, $i = 1, \dots, n$. The vector of componentwise maxima is defined as $(M_{n1}, \dots, M_{nd}) \triangleq (\bigvee_{i=1}^n X_{i1}, \dots, \bigvee_{i=1}^n X_{id})$. Suppose there exist standardizing constants $a_{nj} \in \mathbb{R}$, $b_{nj} > 0$ for $j = 1, \dots, d$ such that, for $y_j = a_{nj} + b_{nj} x_j$, we have

  P(M_{n1} \le y_1, \dots, M_{nd} \le y_d) = F^n(y_1, \dots, y_d) \triangleq H_n(y_1, \dots, y_d) \to H(y_1, \dots, y_d)

for some non-degenerate distribution function $H$ as $n \to \infty$. It can be shown that $H$ has GEV margins. It is customary to transform $H$ to have standardized margins in order to focus on the modelling of the dependence structure. Let $G_i$ be the marginal GEV distribution of the $i$th variable of $H$, and write $H(y_1, \dots, y_d) = C(G_1(y_1), \dots, G_d(y_d))$ for some copula $C$. Write

  C(u_1, \dots, u_d) = \exp\{-A(w_1, \dots, w_d)\},   (2.5)

where $w_j = -\log u_j$, $j = 1, \dots, d$. The representation of de Haan and Resnick (1977) and Pickands (1981) is that the exponent function can be written as

  A(w_1, \dots, w_d) = \int_{S_d} \bigvee_{j=1}^d (\omega_j w_j)\, d\mu(\omega),

where $S_d = \{\omega \in \mathbb{R}_+^d : \|\omega\| = 1\}$ is the unit simplex on $\mathbb{R}^d$ and $\mu$ is a positive measure on $S_d$ that satisfies the mean constraints

  \int_{S_d} \omega_j\, d\mu(\omega) = 1, \quad j = 1, \dots, d.

The exponent function $A$ is homogeneous of order 1 (see, e.g., Section 3.15 of Joe (2014)). The copula $C$ constructed above is max-stable, defined as

  C^n(u_1^{1/n}, \dots, u_d^{1/n}) = C(u_1, \dots, u_d)   (2.6)

for every positive integer $n$. A copula is an extreme value copula if and only if it satisfies (2.6) (Galambos (1987); Joe (1997)). In this case, $G(w_1, \dots, w_d) \triangleq C(e^{-w_1}, \dots, e^{-w_d})$ is a min-stable survival function with unit exponential margins. The definition of $G$ is useful in the derivations below.

The density function of an extreme value copula is obtained by differentiating (2.5) with respect to its arguments. Using Theorem 8.46 of Joe (2014) and the chain rule, the density of an extreme value copula $C_{EV}(u_1, \dots, u_d)$ is given by

  c_{EV}(u_1, \dots, u_d) = \frac{(-1)^d}{\prod_{i=1}^d u_i} \cdot \frac{\partial^d G(w_1, \dots, w_d)}{\prod_{i=1}^d \partial w_i}, \quad w_i = -\log u_i,

with the mixed partial derivatives of $G$ of order $k \le d$ equal to

  (-1)^k \frac{\partial^k G}{\prod_{i=1}^k \partial w_i} = e^{-A} \cdot \sum_{P = (S_1, \dots, S_{|P|})} (-1)^{k-|P|} \prod_{i=1}^{|P|} A_{(S_i)},   (2.7)

where the arguments of $G$ are omitted for brevity, $A_{(S)} = \partial^{|S|} A / \prod_{j \in S} \partial w_j$ for a non-empty subset $S$ of the differentiated indices $\{1, \dots, k\}$, and the summation is over all possible partitions $P$ of that index set.
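Max-stability (2.6) is an exact identity, not an asymptotic statement, and is easy to check numerically; a minimal sketch for the bivariate Gumbel copula with an arbitrarily chosen parameter $\theta = 2.5$:

```python
import math

def gumbel_cop(u1, u2, theta):
    # Bivariate Gumbel (logistic) extreme value copula, theta >= 1
    w1, w2 = -math.log(u1), -math.log(u2)
    return math.exp(-(w1 ** theta + w2 ** theta) ** (1.0 / theta))

# Check C^n(u1^{1/n}, u2^{1/n}) = C(u1, u2) for several n
u1, u2, theta = 0.3, 0.7, 2.5
base = gumbel_cop(u1, u2, theta)
for n in (2, 5, 50):
    assert abs(gumbel_cop(u1 ** (1 / n), u2 ** (1 / n), theta) ** n - base) < 1e-12
```

The identity holds because $-\log u_i^{1/n} = w_i/n$ and $A$ is homogeneous of order 1, so $C^n(u_1^{1/n}, u_2^{1/n}) = \exp\{-n \cdot A(w_1/n, w_2/n)\} = \exp\{-A(w_1, w_2)\}$.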
For $d = 1, 2, 3$, we can readily obtain the following expressions:

  -\frac{\partial G}{\partial w_1} = e^{-A} A_{(1)};
  \frac{\partial^2 G}{\partial w_1 \partial w_2} = -\frac{\partial\, e^{-A} A_{(1)}}{\partial w_2} = e^{-A}\big(A_{(1)} A_{(2)} - A_{(12)}\big);
  -\frac{\partial^3 G}{\partial w_1 \partial w_2 \partial w_3} = -\frac{\partial\, e^{-A}\big(A_{(1)} A_{(2)} - A_{(12)}\big)}{\partial w_3} = e^{-A}\big(A_{(1)} A_{(2)} A_{(3)} - A_{(12)} A_{(3)} - A_{(13)} A_{(2)} - A_{(23)} A_{(1)} + A_{(123)}\big).

Since the summation in (2.7) enumerates all possible partitions of the index set, the number of terms in the partial derivative (and hence the copula density) grows according to the Bell numbers (Bell (1934)).

The case $d = 2$ has been studied in detail. Examples of commonly used bivariate extreme value distributions include the Gumbel or logistic (Gumbel (1960)), asymmetric logistic (Tawn (1988)), Galambos or negative logistic (Galambos (1975)) and its asymmetric variant (Joe (1993)), and the Hüsler-Reiss (Hüsler and Reiss (1989)) model. However, not all of these have a direct multivariate generalization. The Hüsler-Reiss distribution is one that does: a random vector $(U_1, \dots, U_d)$, with unit uniform margins, follows the Hüsler-Reiss distribution if the distribution function is given by

  C_{U_1,\dots,U_d}(u_1, \dots, u_d; \{\delta_{jk}\}) = \exp\Big\{-\sum_{j=1}^d w_j\, \Phi_{d-1,\Gamma_j}(\mathbf{v}_j)\Big\},   (2.8)

where $\mathbf{v}_j = (v_1, \dots, v_{j-1}, v_{j+1}, \dots, v_d)$ with $v_k = \delta_{jk}^{-1} + \delta_{jk} \log(w_j/w_k)/2$, $w_j = -\log u_j$, and $\Phi_{d-1,\Gamma_j}$ is the $(d-1)$-dimensional Gaussian distribution with zero mean and correlation matrix

  \Gamma_j = (\gamma_{ik})_{i,k \ne j}, \quad \gamma_{ik} = \frac{\delta_{ij}^{-2} + \delta_{kj}^{-2} - \delta_{ik}^{-2}}{2\, \delta_{ij}^{-1} \delta_{kj}^{-1}}.   (2.9)

The $\binom{d}{2}$ parameters, one for each bivariate margin, are $\delta_{ij} \ge 0$ with $\delta_{ji} = \delta_{ij}$; for completeness, $\delta_{ii}^{-1}$ is defined to be 0 for all $i$. The distribution function in the compact form (2.8) is attributed to Nikoloulopoulos et al. (2009). The Hüsler-Reiss distribution is generalizable to higher dimensions because of the particular way it is constructed. In the bivariate case, it is obtained as the extreme value limit of a bivariate Gaussian vector whose correlation approaches 1 at a suitable rate as the sample size tends to infinity.
A direct generalization is obtained by letting all pairwise correlations of a multivariate Gaussian vector tend to 1. It is easy to verify that the distribution (2.8) satisfies the max-stability condition (2.6).

The Hüsler-Reiss distribution is used in the field of spatial extremes (see, e.g., Smith (1990)) and arises from a completely different framework known as the class of max-stable processes, which generalizes finite multivariate extreme value distributions to a continuous state space. In particular, consider a Poisson process $\{(\xi_i, \mathbf{U}_i) : i \in \mathbb{N}\}$ on $\mathbb{R}_+ \times \mathbb{R}^d$ with intensity measure $\xi^{-2} d\xi \times \nu(d\mathbf{u})$, where $\nu$ is a positive measure. Let $g : (\mathbb{R}^d \times \mathbb{R}^d) \to \mathbb{R}_+$ be the Gaussian density $g(\mathbf{u}, \mathbf{t}) = (2\pi)^{-d/2} |\Sigma|^{-1/2} \exp\{-\frac{1}{2}(\mathbf{u} - \mathbf{t})^\top \Sigma^{-1} (\mathbf{u} - \mathbf{t})\}$, centred at $\mathbf{u}$ and with covariance matrix $\Sigma$. The process defined by

  Z(\mathbf{t}) = \max_{i = 1, \dots} \{\xi_i\, g(\mathbf{U}_i, \mathbf{t})\}   (2.10)

has finite-dimensional distributions that are Hüsler-Reiss (Genton et al. (2011)). The process (2.10) is now known as the Smith model. The joint distribution of $Z(\mathbf{t}_1)$ and $Z(\mathbf{t}_2)$ for $\mathbf{t}_1, \mathbf{t}_2 \in \mathbb{R}^d$ has dependence parameter $\delta_{12} = 2[(\mathbf{t}_1 - \mathbf{t}_2)^\top \Sigma^{-1} (\mathbf{t}_1 - \mathbf{t}_2)]^{-1/2}$. Smith (1990) gives the process (2.10) a storm profile interpretation: consider an infinite number of storms with intensities $\xi_i$ and centres $\mathbf{u}_i$. The function $g(\mathbf{u}, \mathbf{t})$ reflects the effect, such as precipitation, at location $\mathbf{t}$ induced by a storm centred at $\mathbf{u}$. The overall effect depends on the size of the storm $\xi_i$ in a multiplicative manner. The finite-dimensional representation of the Smith process is a parsimonious model of the Hüsler-Reiss distribution because the dependence parameters $\delta_{ij}$ are completely specified by the covariance matrix $\Sigma$. In other words, $\Sigma$ determines the joint distribution of an arbitrary number of random variables in this random field.

A generalization of the multivariate Hüsler-Reiss distribution is the t-EV distribution (Demarta and McNeil (2005); Nikoloulopoulos et al. (2009)), obtained by taking the extreme value limit of the t distribution. A random vector $(U_1, \dots, U_d)$, with unit uniform margins, follows the t-EV distribution if the distribution function is given by

  C(u_1, \dots, u_d; \Omega, \nu) = \exp\Big\{-\sum_{j=1}^d w_j\, T_{d-1,\nu+1}(\mathbf{s}_j; \Omega_j)\Big\},   (2.11)

where $\mathbf{s}_j = (s_1, \dots, s_{j-1}, s_{j+1}, \dots, s_d)$ with $s_k = \sqrt{\nu + 1}\,\big[(w_j/w_k)^{1/\nu} - \rho_{jk}\big]/\sqrt{1 - \rho_{jk}^2}$, $w_j = -\log u_j$, $\nu$ is the degrees of freedom parameter, $\Omega_j$ is the correlation matrix $\Omega = (\rho_{rs})$ with the $j$th variable partialled out (that is, the $\rho_{rs}$'s become partial correlations), and $T_{d-1,\nu+1}(\cdot; \Omega_j)$ is the $(d-1)$-variate t distribution function with $\nu + 1$ degrees of freedom and correlation matrix $\Omega_j$ (Section 4.16 of Joe (2014)). The t-EV distribution is the finite-dimensional distribution of the max-stable process known as the extremal-t (Opitz (2013); see also Ribatet (2013)). The Hüsler-Reiss distribution is obtained in the limit $\nu \to \infty$.

2.2.3 Relationship between marginal and joint extremes

Through a data set, we illustrate in this section that the class of multivariate extreme value copulas may not be an appropriate family for an arbitrary collection of marginally extreme variables. In this example, the variables of interest are annual maxima of daily streamflows at 6 gauging stations in eastern Vancouver Island. The locations of the stations are shown in Figure 2.3. There are around 50 observations for each station except Mill Bay, which has 34. Inspection of the data set reveals that, in many cases, the extremes for different stations are recorded on vastly different dates. For instance, in 2003 extreme streamflows were recorded in mid-October for three of the stations, mid-March for two of them and early January for the remaining one. Given this feature, the componentwise maxima (with block sizes of one year) are unlikely to correspond to actual sample points from a multivariate extreme value distribution.

Figure 2.3: Locations of the gauging stations in eastern Vancouver Island.
The labels are: 1 – Cassidy; 2 – Westholme; 3 – Duncan (Bing's Creek); 4 – Duncan (Cowichan River); 5 – Cowichan Station; 6 – Mill Bay. The Google map is plotted using the ggmap package in R.

To analyze the data set, we first fit GEV distributions to the margins; the resulting Q-Q plots (Figure 2.4) show acceptable fit. Next, we examine the pairwise dependence structure by means of normal scores plots, in which the data are transformed to standard normal margins using the empirical quantiles. The normal scores plot is preferred over a plot of the raw data because variables may have different scales. It is also preferred over a plot of data transformed to unit uniform margins because the tail behaviour is more easily identified in a normal scores plot. From the pairwise normal scores plots in Figure 2.5, it is clear that the lower tail has stronger dependence than the upper tail for most pairs. Joint modelling using multivariate extreme value copulas is inappropriate, as such copulas for the maximum can only have upper tail dependence (see Section 2.3.3). Another way to illustrate the unsuitability of multivariate extreme value copulas is to produce empirical max-stable or min-stable plots as in Figure 2.6. These plots are based on the principle of max- or min-stability of the corresponding weighted maxima/minima: for a multivariate extreme value copula $C_{EV}(u_1, \dots, u_d) = \exp\{-A(w_1, \dots, w_d)\} = G(w_1, \dots, w_d)$, let $W_1, \dots, W_d$ be unit exponential random variables with joint survival function $G$. Define $W^* = \bigwedge_{i=1}^d W_i/\omega_i$, where the $\omega_i$'s are positive constants. Then,

  P(W^* > w) = P(W_1 > w\omega_1, \dots, W_d > w\omega_d) = \exp\{-w A(\omega_1, \dots, \omega_d)\}   (2.12)

due to the homogeneity property of $A$. Hence $W^*$ is exponentially distributed with rate $A(\omega_1, \dots, \omega_d)$. An empirical min-stable plot is a Q-Q plot of the realizations of $W^*$ versus the theoretical quantiles of an exponential distribution, for some arbitrarily chosen weights $\omega_i$.
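As a small sanity check of (2.12): under the independence copula, itself an extreme value copula with $A(\omega) = \omega_1 + \cdots + \omega_d$, the weighted minimum $W^*$ is exponential with rate $\sum_i \omega_i$. A simulation sketch with illustrative weights:

```python
import numpy as np

rng = np.random.default_rng(42)
d, n = 3, 200_000
omega = np.array([1.0, 2.0, 0.5])

# Under independence, A(omega) = omega_1 + ... + omega_d = 3.5, so by (2.12)
# W* = min_i W_i / omega_i should be Exp(rate 3.5).
W = rng.exponential(size=(n, d))   # unit exponential margins
Wstar = (W / omega).min(axis=1)
rate_hat = 1.0 / Wstar.mean()      # ML estimate of the exponential rate
```

For real data, the realizations of the $W_i$ come from the empirical probability integral transform rather than simulation.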
The realizations of $W^*$ are obtained from the empirical probability integral transform, so that each margin has a unit exponential distribution. The mean of the exponential distribution (2.12) can be estimated by maximum likelihood (i.e., by the sample mean in this case). If the data can be reasonably approximated by a multivariate extreme value distribution, the Q-Q plot should show no substantial departure from a straight line. Similarly, for the max-stable plot, we let $V_i = 1/W_i$, $i = 1, \dots, d$, be unit Fréchet random variables with joint distribution $F(v_1, \dots, v_d) = G(1/v_1, \dots, 1/v_d)$. We use $(\omega_1, \dots, \omega_6) = (1, \dots, 1)$ for the plots in Figure 2.6. The departure from the straight line in these plots suggests that a multivariate extreme value copula is not appropriate here.

Figure 2.4: Q-Q plots (model versus empirical quantiles) for the marginal GEV fitting of the Vancouver Island streamflows data, one panel per station.

The reason for the mismatch in the dates is that the extremes for the various stations are initiated by different mechanisms.
Unlike the Fraser River example we will introduce in Section 3.7, snowmelt is not the main driving force behind extreme streamflows in eastern Vancouver Island. Even though the stations are only tens of kilometres apart, heavy rainfall tends to be more localized, and thus an episode of extreme precipitation is less likely to affect all stations at the same time. As a result, joint extremes may not exhibit upper tail dependence, and multivariate extreme value copulas are unsuitable.

Figure 2.5: Scatterplot matrix of normal scores for the Vancouver Island streamflows data (stations 1-6).
For data of this kind, it is possible to conduct inference by fitting GEV distributions to the margins while using general copulas for the multivariate dependence structure. Examples of this hybrid modelling technique can be found in the hydrology literature, e.g., Klein et al. (2010) and Requena et al. (2013).

Figure 2.6: Min-stable plot with exponential margin (left) and max-stable plot with Fréchet margin (right) for the Vancouver Island streamflows data. For the max-stable plot, two observations have quantile values much larger than 60 and are beyond the boundaries of the plot.

2.3 Tail dependence functions and extreme value limit of copula models

In this section, we discuss the concept of tail dependence and outline its relationship with extreme value copulas. Some other quantities related to the dependence properties of a bivariate extreme value copula are then given.

2.3.1 Tail dependence functions

Tail dependence functions describe the behaviour of a copula at the joint upper or lower tail. The concept of tail dependence functions is introduced formally in Nikoloulopoulos et al. (2009); see also Joe et al. (2010) for their use in vine copulas. We first consider continuous bivariate copulas $C$; these concepts extend easily to higher dimensions.

To begin, the tail dependence index (or coefficient) is a commonly used measure in copula and extreme value theory to summarize the strength of dependence as one approaches the limit of a multivariate distribution (see, e.g., Joe (1997); Coles (2001)). The lower tail dependence index is defined as

  \lambda_L \triangleq \lim_{u \to 0^+} \frac{C(u, u)}{u} = \lim_{u \to 0^+} P(U_1 \le u \mid U_2 \le u) = \lim_{u \to 0^+} P(U_2 \le u \mid U_1 \le u),   (2.13)

provided the limit exists.
Similarly, the upper tail dependence index is defined as

  \lambda_U \triangleq \lim_{u \to 1^-} \frac{\overline{C}(u, u)}{1 - u} = \lim_{u \to 1^-} P(U_1 > u \mid U_2 > u) = \lim_{u \to 1^-} P(U_2 > u \mid U_1 > u),   (2.14)

where $\overline{C}(u, u) = P(U_1 > u, U_2 > u) = 1 - 2u + C(u, u)$ is the survival function of $(U_1, U_2)$ at $(u, u)$. It is easy to see that $\lambda_L, \lambda_U \in [0, 1]$; when $\lambda_L$ (resp. $\lambda_U$) is 0, we say that the bivariate copula has no lower (resp. upper) tail dependence. The strength of tail dependence increases with the magnitude of $\lambda_L$ or $\lambda_U$.

One limitation of the tail dependence index is that it only summarizes dependence along the main diagonal of the copula. The tail dependence function (Nikoloulopoulos et al. (2009); Joe et al. (2010)) is a generalization of the tail dependence index. For a bivariate copula $C$, the lower tail dependence function is given by

  b(w_1, w_2; C) \triangleq \lim_{u \to 0^+} \frac{C(uw_1, uw_2)}{u},   (2.15)

while the upper tail dependence function is

  b^*(w_1, w_2; C) \triangleq \lim_{u \to 0^+} \frac{\overline{C}(1 - uw_1, 1 - uw_2)}{u} = \lim_{u \to 0^+} \frac{\hat{C}(uw_1, uw_2)}{u},

where $\hat{C}(v_1, v_2) = \overline{C}(1 - v_1, 1 - v_2)$ is the reflected copula of $C$; thus the upper tail dependence function of $C$ is equal to the lower tail dependence function of $\hat{C}$. The lower and upper tail dependence indices are then given by $b(1, 1; C)$ and $b^*(1, 1; C)$, respectively. By varying the ratio $w_1/w_2$, we can obtain a measure of the tail dependence of $C$ along different directions. Figure 2.7 plots the lower tail dependence functions for two bivariate copulas: MTCJ and reflected Gumbel. For each copula, the plot on the left shows the density contour of the variables $(X_1, X_2)$ with standard Gaussian margins, while the plot on the right is the contour of the corresponding lower tail dependence function $b(w_1, w_2; C)$. The lower tail dependence index is also shown. Both copulas have Kendall's $\tau$ equal to 0.5, but note that the MTCJ copula has stronger lower tail dependence than the reflected Gumbel copula.

Figure 2.7: Density contours (left) and lower tail dependence functions (right) of the MTCJ and reflected Gumbel copulas with Kendall's $\tau$ equal to 0.5. The lower tail dependence index is highlighted as the value of $b(1, 1)$: 0.71 for MTCJ and 0.59 for reflected Gumbel.

The $b$ and $b^*$ functions can be extended to higher dimensions in a straightforward manner. For a $d$-dimensional copula $C_d$, we have

  b(w_1, \dots, w_d; C_d) = \lim_{u \to 0^+} \frac{C_d(uw_1, \dots, uw_d)}{u};
  b^*(w_1, \dots, w_d; C_d) = \lim_{u \to 0^+} \frac{\overline{C}_d(1 - uw_1, \dots, 1 - uw_d)}{u}.

A related quantity is the marginal tail dependence function for a subset of variables $S = \{k_1, \dots, k_m\} \subset \{1, \dots, d\}$, where $m \le d$. The definitions for the lower and upper tails are

  b_S(w_{k_1}, \dots, w_{k_m}; C_d) = \lim_{u \to 0^+} \frac{C_S(uw_{k_1}, \dots, uw_{k_m})}{u};
  b^*_S(w_{k_1}, \dots, w_{k_m}; C_d) = \lim_{u \to 0^+} \frac{\overline{C}_S(1 - uw_{k_1}, \dots, 1 - uw_{k_m})}{u},

where $C_S$ is the distribution function of $U_{k_1}, \dots, U_{k_m}$. In the following, the copula argument in the tail dependence function will be omitted when there is no ambiguity.

Another related quantity is the conditional tail dependence function, involved in the computation of tail dependence functions for factor and vine copulas. It is defined as

  b_{k_1|k_2,\dots,k_m}(w_{k_1} | w_{k_2}, \dots, w_{k_m}) = \lim_{u \to 0^+} C_{U_{k_1}|U_{k_2},\dots,U_{k_m}}(uw_{k_1} | uw_{k_2}, \dots, uw_{k_m}),

where $k_i \in \{1, \dots, d\}$ and $k_i \ne k_j$ if $i \ne j$. When $m = 2$, $b_{k_1|k_2}(w_{k_1}|w_{k_2})$ can be obtained by differentiating $b_{k_1,k_2}(w_{k_1}, w_{k_2})$ with respect to $w_{k_2}$ (Theorem 8.58 of Joe (2014)).

The tail dependence function is related to the joint probability of every component being small/large. A similar quantity, useful in the extreme value context, relates to the probability of some of the components being small/large, defined here as the $a(\cdot)$ and $a^*(\cdot)$ functions for the lower and upper tail, respectively:

  a(w_1, \dots, w_d) = \lim_{u \to 0^+} \frac{P\{\bigcup_{i=1}^d \{U_i \le uw_i\}\}}{u};
  a^*(w_1, \dots, w_d) = \lim_{u \to 0^+} \frac{P\{\bigcup_{i=1}^d \{U_i > 1 - uw_i\}\}}{u}.

Note that $a(\cdot)$ and $a^*(\cdot)$ are known as stable tail dependence functions in the extreme value literature (Huang (1992); Drees and Huang (1998)). By the inclusion-exclusion principle,

  a(w_1, \dots, w_d) = \lim_{u \to 0^+} \sum_{S \subset I_d} (-1)^{|S|-1} \frac{P\{\bigcap_{i \in S} \{U_i \le uw_i\}\}}{u} = \sum_{S \subset I_d,\, S \ne \emptyset} (-1)^{|S|-1} b_S(w_i, i \in S),   (2.16)

where $I_d = \{1, \dots, d\}$ is the index set and $S$ is a non-empty subset of $I_d$. A similar relationship holds between $a^*(\cdot)$ and $b^*(\cdot)$. As we will see below, the quantity $a(\cdot)$ appears in the extreme value limit of a copula; equation (2.16) therefore connects the tail properties of a copula to its extreme value limit.

2.3.2 Extreme value limit of copula models

Since extreme value copulas satisfy the max-stability condition (2.6), we can construct an extreme value copula from an arbitrary copula $C$ by computing its extreme value limit, i.e., $\lim_{n\to\infty} C^n(u_1^{1/n}, \dots, u_d^{1/n})$ for the upper tail or $\lim_{n\to\infty} \hat{C}^n(u_1^{1/n}, \dots, u_d^{1/n})$ for the lower tail, provided that the limit exists and is not the distribution function of a degenerate distribution. In the following, we quote the procedure outlined in Joe (2014) for the evaluation of such limits with the help of tail dependence functions.

The lower extreme value limit of $C$, $C_{LEV}$, is equivalent to the upper extreme value limit of $\hat{C}$, i.e., $\lim_{n\to\infty} \hat{C}^n(u_1^{1/n}, \dots, u_d^{1/n})$. (More precisely, $C_{LEV}$ has the tail properties of the lower extreme value limit of $C$. In terms of sampling, this means componentwise minima are taken from observations sampled from $C$. However, we focus on maxima for ease of presentation, and thus the $C_{LEV}$ constructed in the subsequent text is actually reflected and has upper tail dependence instead. For modelling purposes, componentwise minima are negated so that they exhibit upper tail dependence.) First, we have from the definition

  \hat{C}^n(u_1^{1/n}, \dots, u_d^{1/n}) = \big[P\big(1 - U_1 \le u_1^{1/n}, \dots, 1 - U_d \le u_d^{1/n}\big)\big]^n
    \sim \big[P\big(1 - U_1 \le 1 - (-\log u_1)/n, \dots, 1 - U_d \le 1 - (-\log u_d)/n\big)\big]^n
    = \Big[P\Big\{\bigcap_{i=1}^d \{U_i > -\log u_i / n\}\Big\}\Big]^n,

where the approximation follows from $u_i^{1/n} = \exp\{n^{-1} \log u_i\} \sim 1 - (-\log u_i)/n$ as $n \to \infty$. Recall that, as $t \downarrow 0$,

  t \cdot a(w_1, \dots, w_d) \sim P\Big\{\bigcup_{i=1}^d \{U_i \le tw_i\}\Big\} = 1 - P\Big\{\bigcap_{i=1}^d \{U_i > tw_i\}\Big\},

so that $P\{\bigcap_{i=1}^d \{U_i > tw_i\}\} \sim 1 - t \cdot a(w_1, \dots, w_d)$. Substituting $t = 1/n$ (which approaches zero from above as $n \to \infty$) and $w_i = -\log u_i$, we obtain

  \hat{C}^n(u_1^{1/n}, \dots, u_d^{1/n}) \sim [1 - a(-\log u_1, \dots, -\log u_d)/n]^n \to \exp\{-a(-\log u_1, \dots, -\log u_d)\}

as $n \to \infty$. Hence, $C_{LEV}(u_1, \dots, u_d) = \exp\{-a(-\log u_1, \dots, -\log u_d)\}$. The upper extreme value limit of $C$ can be obtained via a similar argument, with $C_{UEV}(u_1, \dots, u_d) = \exp\{-a^*(-\log u_1, \dots, -\log u_d)\}$.

An insight from this result is that the extreme value limit of a given copula $C$ can be expressed in terms of the exponent functions $a(\cdot)$ and $a^*(\cdot)$, which are in turn related to the tail dependence functions $b(\cdot)$ and $b^*(\cdot)$ via (2.16). A class of potentially useful extreme value copula models can thus be derived from the original copula.

2.3.3 Measures of dependence for bivariate extreme value copulas

There exist several measures that describe the strength of dependence of a bivariate extreme value copula $C_{EV}$. One widely used measure is the Pickands dependence function (Pickands (1981)). Write

  C_{EV}(u_1, u_2) = \exp\{-A(-\log u_1, -\log u_2)\}.   (2.17)

The Pickands dependence function is given by $B(w) = A(w, 1-w)$ with $0 \le w \le 1$. It can be shown that $B(0) = B(1) = 1$, $\max(w, 1-w) \le B(w) \le 1$ and $B$ is convex. The Pickands dependence function also serves to simplify the calculation of other dependence measures such as Kendall's $\tau$ and Spearman's $\rho$, avoiding the need to evaluate a two-dimensional integral as is usually needed for general bivariate distributions (see Theorem 8.44 of Joe (2014)). A summary of the strength of dependence using a single number, known as the
A summary of the strength of dependence using a single number, known as the28extremal coefficient, has origins dating back to Sibuya (1960) but is coined in Smith (1990).It is defined as the parameter ϑ that satisfiesCEV (u, u) = uϑ.The possible range of ϑ is [1, 2], with ϑ = 2 at independence and ϑ = 1 at comonotonic-ity. The extremal coefficient can be interpreted as the “effective” number of independentvariables. Since A in (2.17) is homogeneous of order 1, ϑ = A(1, 1) = 2B(1/2).There is a one-to-one mapping between the extremal coefficient and the upper taildependence index, while the lower tail dependence index of an extreme value copula isalways zero except for the comonotonicity limit. This can be derived from the definition ofthe various quantities:λEVL = limu→0+CEV (u, u)u= limu→0+uϑ−1 =0 if ϑ ∈ (1, 2],1 if ϑ = 1;λEVU = limu→1−CEV (u, u)1− u = limv→0+2v − 1 + (1− v)ϑv= limv→0+2v − 1 + 1− ϑv + o(v)v= 2− ϑ or 2−A(1, 1) or 2 [1−B(1/2)] .2.4 Dependence measures for general bivariate copulasIn this section, we state the definitions of several dependence measures used in the the-sis. Dependence measures (or measures of concordance) are quantities that summarize thestrength of association among variables. The Pearson correlation coefficient is not invariantunder monotone transformations (see, e.g., Section 2.12 of Joe (2014) for this and otherproperties of the Pearson correlation coefficient) and therefore not desirable for use in copulamodelling, as the dependence structure does not depend on marginal distributions.Throughout this section, we let U i = (Ui1, Ui2)ᵀ, i = 1, . . . , n, be an i.i.d. sample froma bivariate copula C with realizations ui = (ui1, ui2)ᵀ, and rij be the rank of uij amongu1j , . . . , unj , j = 1, 2. The subscript i is dropped when we refer to the distributional prop-erties of the random variables. Note that these measures are also defined for distributionswhose margins are not U(0,1).1. 
Kendall's τ (based on the number of concordant and discordant pairs in a sample), with population version given by

τ = 4 ∫_0^1 ∫_0^1 C(u, v) dC(u, v) − 1

and empirical version (without ties) given by

τ̂ = [4/(n(n − 1))] ∑_{i=1}^{n−1} ∑_{j=i+1}^n 1{(u_{i1} − u_{j1})(u_{i2} − u_{j2}) > 0} − 1.

2. Spearman's ρ (based on the correlation of marginal ranks), with population and empirical (without ties) versions given by

ρ_S = 12 ∫_0^1 ∫_0^1 uv dC(u, v) − 3 = Cor(U_1, U_2);    ρ̂_S = [∑_{i=1}^n r_{i1} r_{i2} − n((n + 1)/2)²] / [n(n² − 1)/12].

3. Blomqvist's β (based on the centre of the distribution), with population version β = 4C(0.5, 0.5) − 1. The most efficient way of estimating β from data (see Section 2.12.3 of Joe (2014)) is through the ranks:

β̂ = (2/n) ∑_{i=1}^n 1{(r_{i1} − (n + 1)/2)(r_{i2} − (n + 1)/2) ≥ 0} − 1.

4. Correlation of normal scores ρ_N (based on the normal scores of the observations), with population version ρ_N = Cor[Φ^{−1}(U_1), Φ^{−1}(U_2)]. Let

s_{i1} = (r_{i1} − 0.5)/n;    s_{i2} = (r_{i2} − 0.5)/n    (2.18)

be the ranks adjusted to the [0, 1] scale. The empirical version is given by

ρ̂_N = Ĉor[Φ^{−1}(s_{i1}), Φ^{−1}(s_{i2})],

where Ĉor denotes the sample correlation.

5. Tail-weighted dependence measures (Krupskii and Joe (2015)), defined as the correlation of a functional transform of the observations in a restricted region of the sample space. For a continuous function h(·): [0, 1] → (0, ∞) and truncation level p ∈ (0, 0.5], the population versions are given by

ϱ_L(h, p) = Cor[h(1 − U_1/p), h(1 − U_2/p) | U_1 < p, U_2 < p];
ϱ_U(h, p) = Cor[h(1 − (1 − U_1)/p), h(1 − (1 − U_2)/p) | 1 − U_1 < p, 1 − U_2 < p],

where the subscripts L and U denote the lower and upper measures, respectively. The empirical versions are obtained by replacing the correlation with the sample correlation, and the random variables U's with the adjusted ranks s_{i1} and s_{i2} defined in (2.18). Under certain regularity conditions, the empirical estimator is asymptotically normal.

All five measures defined above are invariant under monotone transformations.
In addition, the first four measures, i.e., Kendall's τ, Spearman's ρ, Blomqvist's β and the correlation of normal scores ρ_N, satisfy all desirable properties for measures of concordance suggested in Scarsini (1984) (see Definition 2.8 of Joe (2014)). These four measures can be considered central or global dependence measures, as they summarize the dependence strength of the whole copula. For some applications, it may be of interest to focus only on the tails. The tail dependence index is one such measure, but its definition (2.13) or (2.14) involves a limit argument and cannot be estimated reliably from data, except when one considers extreme value copulas, in which case the (upper) tail dependence index can be estimated via the extremal coefficient. The tail-weighted dependence measures serve as an alternative, for which the empirical version can easily be estimated from data. Note that the tail-weighted dependence measures do not satisfy all of the criteria in Scarsini (1984); for example, they are not defined for the countermonotonicity copula.

2.5 Model estimation methods for multivariate copulas

Despite the desirable properties of maximum likelihood estimation, it is not always feasible for parametric multivariate copula models due to model complexity. In this section, we provide an overview of several alternative approaches for inference.

2.5.1 Composite likelihood

Maximum likelihood estimation requires the joint density function to be known. In high-dimensional problems, it is often hard or even impossible to obtain the joint density. Composite likelihood methods (Lindsay (1988); Cox and Reid (2004); Varin et al. (2011)) provide an attractive alternative whereby likelihood contributions involve marginal or conditional densities of lower dimensions. Let F(·; θ) be the distribution function of the d-dimensional i.i.d. random vectors Y_1, …, Y_n with parameter θ; let f_{A_1}(y_{A_1}; θ), …
, f_{A_K}(y_{A_K}; θ) be a set of density or mass functions corresponding to marginal distributions indexed by A_1, …, A_K, and f_{B_1|C_1}(y_{B_1}|y_{C_1}; θ), …, f_{B_K|C_K}(y_{B_K}|y_{C_K}; θ) be a set of density or mass functions corresponding to conditional distributions, with the variables indexed by B_1, …, B_K to the left of the conditioning operator and those indexed by C_1, …, C_K to the right of it. The composite likelihood can be defined as the product of the marginal density or mass functions

L_{C,marg}(θ; y_1, …, y_n) = ∏_{i=1}^n ∏_{k=1}^K [f_{A_k}(y_{i,A_k}; θ)]^{w_k},

or that of the conditional density or mass functions

L_{C,cond}(θ; y_1, …, y_n) = ∏_{i=1}^n ∏_{k=1}^K [f_{B_k|C_k}(y_{i,B_k}|y_{i,C_k}; θ)]^{w_k},

where y_1, …, y_n are the observed values (with subsets y_{i,A_k}, y_{i,B_k} or y_{i,C_k} as in the above expressions) and the w_k are non-negative weights. That is, the distributions f_{A_k} or f_{B_k|C_k} are effectively treated as independent. Special emphasis is on the pairwise likelihood, obtained by using bivariate marginal densities in the likelihood construction. We use pairwise likelihoods in Chapter 3, as it is relatively easy to obtain bivariate densities of extreme value copulas. For simplicity, we also let all the weights be equal in the sequel, as our focus is not on the optimal choice of weights. Let c_{jk} be the marginal copula density for the observed variables j and k. The pairwise composite log-likelihood of the copula model is

ℓ_P(θ; u) = ∑_{i=1}^n ∑_{j<k} log c_{jk}(u_{ij}, u_{ik}; θ),    (2.19)

where θ is the collection of all parameters and u = (u_{rs}) for 1 ≤ r ≤ n and 1 ≤ s ≤ d is the collection of all data in the unit hypercube, assumed to have U(0,1) margins. The pairwise composite likelihood estimator θ̂_P is obtained by maximizing (2.19) with respect to θ.
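As an illustration of (2.19), the following sketch (ours, not the thesis's implementation) evaluates the pairwise composite log-likelihood with bivariate Clayton (MTCJ) copula densities and a single common dependence parameter, maximized over a coarse grid; a real fit would use a numerical optimizer.

```python
# Sketch: pairwise composite log-likelihood (2.19) with a common parameter
# delta, using bivariate Clayton (MTCJ) densities as the c_{jk}.
import math

def clayton_logpdf(u, v, delta):
    """Log density of the bivariate Clayton (MTCJ) copula, delta > 0."""
    s = u ** (-delta) + v ** (-delta) - 1.0
    return (math.log1p(delta)
            - (1.0 + delta) * (math.log(u) + math.log(v))
            - (2.0 + 1.0 / delta) * math.log(s))

def pairwise_loglik(U, delta):
    """Sum of log c_{jk}(u_ij, u_ik; delta) over all pairs j < k and rows i."""
    d = len(U[0])
    return sum(clayton_logpdf(row[j], row[k], delta)
               for row in U for j in range(d) for k in range(j + 1, d))

# Toy copula-scale data (n = 4 rows, d = 3 variables) and a grid search.
U = [[0.2, 0.25, 0.3], [0.7, 0.65, 0.8], [0.5, 0.45, 0.4], [0.9, 0.85, 0.95]]
grid = [0.1 * k for k in range(1, 60)]
delta_hat = max(grid, key=lambda dl: pairwise_loglik(U, dl))
```

The double product in L_{C,marg} becomes a double sum on the log scale, which is what makes the pairwise objective cheap: only bivariate densities are ever evaluated, never the full d-dimensional density.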
Given the usual regularity conditions, θ̂_P is consistent, with asymptotic distribution

√n (θ̂_P − θ_0) →_d N(0, G^{−1}(θ_0))

as n → ∞, where θ_0 is the true value and G(θ) = H(θ) J^{−1}(θ) H(θ) is the Godambe or sandwich information matrix (Godambe (1960)); here H(θ) = E[−∇²ℓ_P(θ; u_1)] and J(θ) = Var[∇ℓ_P(θ; u_1)] are the sensitivity and variability matrices, respectively, and u_1 = (u_{11}, …, u_{1d})ᵀ is the first observation. If ℓ_P were the full log-likelihood, H(θ) = J(θ) and G(θ) would reduce to the usual Fisher information matrix. Estimation using composite likelihood has lower efficiency in the sense that the variance of the estimator does not attain the Cramér–Rao lower bound.

The sensitivity and variability matrices can be estimated using sample averages:

Ĥ(θ̂_P) = −(1/n) ∑_{i=1}^n ∑_{j<k} ∂² log c_{jk}(u_{ij}, u_{ik}; θ̂_P) / ∂θ ∂θᵀ;
Ĵ(θ̂_P) = (1/n) ∑_{i=1}^n ∑_{j<k} [∂ log c_{jk}(u_{ij}, u_{ik}; θ̂_P) / ∂θ] [∂ log c_{jk}(u_{ij}, u_{ik}; θ̂_P) / ∂θᵀ].

The matrix Ĥ(θ̂_P) is easily obtained from modern numerical optimization packages used to maximize the composite log-likelihood.

2.5.2 Inference function for margins

Multivariate model fitting via copulas typically deals with both marginal and dependence modelling. Let Y_i = (Y_{i1}, …, Y_{id})ᵀ be i.i.d. random vectors from the distribution

G(y; ζ_1, …, ζ_d, δ) = C(G_1(y_1; ζ_1), …, G_d(y_d; ζ_d); δ),

where ζ_j is the marginal parameter vector for the jth margin with distribution function G_j, and δ is the copula parameter vector. The full (or joint) maximum likelihood method fits the parameters (ζ_1, …, ζ_d, δ) simultaneously, using the joint density g. Alternatively, if the marginal distributions are (assumed) known, then one can also regard the fitting of δ using the copula density c as maximum likelihood.

When the number of variables is large, a computationally more efficient approach is to first estimate the marginal parameters individually, obtaining maximum likelihood estimates ζ̃_1, …
, ζ̃_d, and then estimate the copula parameter using the copula density c(G_1(y_{i1}; ζ̃_1), …, G_d(y_{id}; ζ̃_d); δ). This amounts to solving the set of equations

(∂ℓ_1/∂ζ_1ᵀ, …, ∂ℓ_d/∂ζ_dᵀ, ∂ℓ/∂δᵀ) = 0ᵀ,

where ℓ_1, …, ℓ_d are the marginal log-likelihoods and ℓ is the log-likelihood for the joint distribution. This is known as the method of inference function for margins (IFM) (Joe and Xu (1996); Joe (2005)), or more simply as the two-stage approach. The estimates from this method are typically different from the joint maximum likelihood estimates⁴; in particular, the copula parameter estimator δ̃ is generally less efficient than the maximum likelihood estimator. The vector of estimators θ̃ = (ζ̃_1, …, ζ̃_d, δ̃)ᵀ is asymptotically normal; Joe (2005) provides a decomposition of the asymptotic covariance matrix.

2.5.3 Marginal ranks

When one does not want to assume a parametric family for the margins, it is possible to use only the marginal ranks as input data for copula parameter estimation, bypassing parametric modelling of the margins. Specifically, the copula parameter δ is estimated using the copula density c(s_{i1}, …, s_{id}; δ), where the s_{ij} are ranks adjusted to the [0, 1] scale as in (2.18). The marginal ranks method is also known as the semiparametric method, as the margins are "estimated" nonparametrically. Genest et al.
(1995) demonstrate that the resulting estimator of δ is asymptotically normal but less efficient than the maximum likelihood estimator.

⁴ The two estimates can be the same in certain cases; see Joe (2005).

Note that the IFM method works for discrete margins or when covariates are involved in the marginal modelling (e.g., univariate regression models), but the marginal ranks method works only for continuous margins without covariates.

Chapter 3

Parsimonious multivariate extreme value copula models

In multivariate statistics based on the Gaussian distribution, the factor model is a plausible parsimonious model when the dependence among the observed variables can be explained by latent variables. In the more general multivariate modelling based on copulas, factor copula models have been developed in recent years, as well as parsimonious copula models based on Markov tree structures and their vine extensions. For a d-dimensional distribution, each of these structures can be described by a total of O(d) parameters instead of the O(d²) parameters needed in a saturated dependence model. In this chapter, we develop factor and vine analogues for extreme value observations, and use the theory from copula modelling to construct multivariate extreme value copulas with such dependence structures. We show that these models offer insightful interpretations using data examples.

3.1 Introduction

To model the relationship among multivariate observations, the two simplest structures are factor and tree dependence (Markov trees). A 1-factor model assumes that the variables are linked to a single common latent factor, through which dependence is generated. One example is the returns on stocks in the same sector.
Meanwhile, a Markov tree structure can be considered when the dependence cannot be adequately explained by latent factors. In a Markov tree, the d observed variables are connected through d − 1 bivariate acyclic linkages, and non-neighbouring variables are assumed conditionally independent given the variables along the tree path. An example of a Markov tree for time-ordered observations is the autoregressive model of order 1. In spatial applications where data are recorded from stations at different geographic locations, the tree can be drawn according to their locations so that nearest neighbours are linked.

The factor structure can be extended to general p-factor models (p ≥ 2), as well as to the bi-factor model and its generalizations, where variables are linked to both common and group-specific factors. The Markov tree structure can be extended by adding layers of trees which relax the conditional independence assumption; one thus adds conditional dependence parameters to such additional layers. These dependence structures have been studied outside the Gaussian context through a copula approach; see, for example, Krupskii and Joe (2013) for factor copulas and Brechmann et al. (2012) for truncated vine copulas, where the multiple-tree dependence structure known as vines comes from Bedford and Cooke (2001, 2002). The cases with discrete or mixed continuous/discrete response types have also been dealt with (see Sections 3.9 and 3.10 of Joe (2014) and the references contained therein).

In this chapter, we extend these parsimonious concepts to multivariate extreme value copulas. The conditional independencies implied by the factor and tree structures are hard or may even be impossible to replicate in the extreme value context, as extreme value copula models must satisfy the max-stability property.
Instead, classes of multivariate extreme value models are constructed through: (a) extreme value limits of multivariate parametric copula families with specified parsimonious dependence structures, and (b) parsimonious representations of existing extreme value copulas with C(d,2) = d(d − 1)/2 dependence parameters, by expressing each as a function of other parameters of order O(d). For (b), we consider, in particular, the Hüsler–Reiss distribution (Hüsler and Reiss (1989)), obtained as a non-standard extreme value limit of the multivariate Gaussian distribution. This distribution is chosen for its flexibility in imposing various dependence structures.

The rest of this chapter is organized as follows. We introduce the extreme value factor copula model in Section 3.2. It is the extreme value limit of factor copulas, a class of structured copulas that assume that the dependence between the observed variables is driven by one or more latent variables. Another class of models, the structured Hüsler–Reiss copula, allows for both factor and vine structures among the variables and is discussed in Section 3.3. A comparison of these models is given in Section 3.4. The inference procedures via composite likelihood methods are addressed in Section 3.5, while Section 3.6 contains a simulation study. We illustrate the use and interpretation of these models in Section 3.7 with two data examples.

3.2 Extreme value factor copula model

The factor copula model described in Section 2.1.2 (Krupskii and Joe (2013)) can be considered an extension of the Gaussian factor model to general copulas. In this section, we extend the idea of factor structure to the field of extremes using a copula approach. We first derive the extreme value limits of factor copulas; this is followed by a description of the dependence properties of the resulting copulas.
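Since any valid extreme value copula must satisfy the max-stability constraint C_EV(u_1^t, …, u_d^t) = C_EV(u_1, …, u_d)^t for all t > 0, a quick numerical sanity check (ours) with the bivariate Gumbel copula, a standard extreme value family, illustrates the property:

```python
# Sketch: max-stability check for the bivariate Gumbel copula,
# C(u, v) = exp{-((-log u)^theta + (-log v)^theta)^(1/theta)}.
import math

def gumbel_cdf(u, v, theta):
    x, y = -math.log(u), -math.log(v)
    return math.exp(-((x ** theta + y ** theta) ** (1.0 / theta)))

theta = 1.7
for u, v in [(0.2, 0.9), (0.5, 0.5), (0.95, 0.1)]:
    for t in [0.5, 2.0, 7.3]:
        lhs = gumbel_cdf(u ** t, v ** t, theta)
        rhs = gumbel_cdf(u, v, theta) ** t
        assert abs(lhs - rhs) < 1e-10  # C(u^t, v^t) = C(u, v)^t
```

The identity follows from the homogeneity of order 1 of the exponent function; most non-extreme-value copulas fail this check, which is why a factor or vine copula must be passed through the limit operation above before it can serve as an extreme value model.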
Examples of 1- and 2-factor extreme value copula models with specific conditional tail dependence functions or linking copulas between the observed variables and latent factors are then given. We defer the technical details regarding numerical integration methods for likelihood calculation to Chapter 4.

3.2.1 Construction of extreme value factor copulas

By taking the extreme value limit of the factor copula using the procedure outlined in Section 2.3.2, we obtain a class of models (which we name extreme value factor copulas) suitable for data of extremes with a factor structure. In this subsection, we demonstrate the construction of various extreme value factor copulas. For brevity, we replace the subscript U_i by i in the following discussion, writing, for instance, C_{i|V} and b_{i|V} for C_{U_i|V} and b_{U_i|V}. All copulas are assumed to be continuous.

A. 1-factor extreme value copula

To obtain the lower extreme value limit of the 1-factor copula model C given by (2.2), we first obtain the lower tail dependence function b(·) of C as follows:

b(w_1, …, w_d) = lim_{u→0+} u^{−1} ∫_0^1 ∏_{i=1}^d C_{i|V_1}(u w_i | v_1) dv_1 = lim_{u→0+} ∫_0^{1/u} ∏_{i=1}^d C_{i|V_1}(u w_i | u z_1) dz_1
             = ∫_0^∞ lim_{u→0+} [1{z_1 ≤ u^{−1}} · ∏_{i=1}^d C_{i|V_1}(u w_i | u z_1)] dz_1
             = ∫_0^∞ ∏_{i=1}^d b_{i|V_1}(w_i | z_1) dz_1,    (3.1)

in which the substitution v_1 = u z_1 is applied. The validity of exchanging the limit and integral operators may be established through Lebesgue's dominated convergence theorem. Alternatively, using results from Joe et al. (2010), a sufficient condition is that the b_{i|V_1}'s are proper distributions on [0, ∞) (see Section 3.16 of Joe (2014); this condition is satisfied for the parametric families described in Section 3.2.4 below). Similarly, we have b_S(w_i, i ∈ S) = ∫_0^∞ ∏_{i∈S} b_{i|V_1}(w_i | z_1) dz_1 for any non-empty subset S of I_d. By letting m_i = b_{i|V_1}(w_i | z_1), we can arrive at an expression for a(·) using the relationship (2.16):

a(w_1, …, w_d) = ∑_i b_i(w_i) − ∑_{i<j} b_{i,j}(w_i, w_j) + ⋯ + (−1)^{d−1} b_{1,…,d}(w_1, …, w_d)
             = ∫_0^∞ (∑_i m_i − ∑_{i<j} m_i m_j + ⋯ + (−1)^{d−1} m_1 ⋯ m_d) dz_1
             = ∫_0^∞ [1 − ∏_{i=1}^d (1 − m_i)] dz_1.
(3.2)

The equality between the second and third lines can be established by considering a set of independent events E_1, …, E_d with respective occurrence probabilities m_1, …, m_d (note that m_i ∈ [0, 1], as it is the limit of a conditional probability). Then the inclusion–exclusion principle P(⋃_{i=1}^d E_i) = ∑_i P(E_i) − ∑_{i<j} P(E_i ∩ E_j) + ⋯ + (−1)^{d−1} P(⋂_{i=1}^d E_i) implies that 1 − ∏_{i=1}^d (1 − m_i) = ∑_i m_i − ∑_{i<j} m_i m_j + ⋯ + (−1)^{d−1} m_1 ⋯ m_d. The copula of the lower extreme value limit of C is then given by

C_LEV(u_1, …, u_d) = exp{−∫_0^∞ [1 − ∏_{i=1}^d (1 − b_{i|V_1}(w_i | z_1))] dz_1},

where w_i = −log u_i⁵. Note that the extreme value limit depends only on the conditional tail dependence functions of the linking copulas. Similarly, the upper extreme value limit of C can be obtained as the lower extreme value limit of its reflected copula Ĉ.

⁵ The relationship w_i = −log u_i emphasizes the min-stable representation of the distribution function; same for other occurrences in this chapter.

B. 2-factor extreme value copula and higher-order generalization

A similar technique can be applied to the 2-factor copula (2.3). By setting v_1 = u z_1, the tail dependence function is

b(w_1, …, w_d) = lim_{u→0+} u^{−1} ∫_0^1 ∫_0^1 ∏_{i=1}^d C_{i|V_2;V_1}(C_{i|V_1}(u w_i | v_1) | v_2) dv_1 dv_2
             = ∫_0^1 ∫_0^∞ lim_{u→0+} [1{z_1 ≤ u^{−1}} ∏_{i=1}^d C_{i|V_2;V_1}(C_{i|V_1}(u w_i | u z_1) | v_2)] dz_1 dv_2
             = ∫_0^1 ∫_0^∞ ∏_{i=1}^d C_{i|V_2;V_1}(b_{i|V_1}(w_i | z_1) | v_2) dz_1 dv_2,    (3.3)

as b_{i|V_1}(w_i | z_1) = lim_{u→0+} C_{i|V_1}(u w_i | u z_1), and the conditions needed for exchanging the limit and integral operators are similar to the 1-factor case. Note how (3.3) resembles (3.1), the only differences being that the integral is now two-dimensional and that the conditional tail dependence functions are themselves arguments of the conditional distribution functions of the second-layer copulas. Similarly, the marginal tail dependence functions are given by b_S(w_i, i ∈ S) = ∫_0^1 ∫_0^∞ ∏_{i∈S} C_{i|V_2;V_1}(b_{i|V_1}(w_i | z_1) | v_2) dz_1 dv_2. The relationship between a(·) and b(·) can be obtained in the same manner as (3.2) with m_i = C_{i|V_2;V_1}(b_{i|V_1}(w_i | z_1) | v_2). Here we only list the final expression: a(w_1, …
, w_d) = ∫_0^1 ∫_0^∞ (1 − ∏_{i=1}^d [1 − C_{i|V_2;V_1}(b_{i|V_1}(w_i | z_1) | v_2)]) dz_1 dv_2,    (3.4)

and the lower extreme value limit is C_LEV(u_1, …, u_d) = exp{−a(w_1, …, w_d)}. The limit thus involves (a) the conditional tail dependence functions for the copulas linked to the first latent factor, and (b) the (conditional distributions of) copulas that link the conditional distributions of the observed variables given the first latent factor and that of the second latent factor given the first.

These results can be generalized to p-factor copulas with p ≥ 3. With d observed variables, the specification of a p-factor copula requires d × p linking copulas. Note that the conditional distribution function of U_i given all latent factors can be written as

C_{i|V_1,…,V_p}(u_i | v_1, …, v_p) = P(U_i ≤ u_i | V_1 = v_1, …, V_p = v_p)
  = (∂/∂v_p) P(U_i ≤ u_i, V_p ≤ v_p | V_1 = v_1, …, V_{p−1} = v_{p−1})
  = (∂/∂v_p) C_{i,V_p;V_1,…,V_{p−1}}(C_{i|V_1,…,V_{p−1}}(u_i | v_1, …, v_{p−1}), v_p)
  = C_{i|V_p;V_1,…,V_{p−1}}(C_{i|V_1,…,V_{p−1}}(u_i | v_1, …, v_{p−1}) | v_p).

This recursion allows us to express C_{i|V_1,…,V_p}(u_i | v_1, …, v_p) in terms of the conditional distributions of the linking copulas at each level. For example, when p = 3, we have

C_{i|V_1,V_2,V_3}(u_i | v_1, v_2, v_3) = C_{i|V_3;V_1,V_2}(C_{i|V_2;V_1}(C_{i|V_1}(u_i | v_1) | v_2) | v_3),

where the three nested conditional distributions correspond to levels 3, 2 and 1, from outermost to innermost. For general p, we have

C_{i|V_1,…,V_p}(u_i | v_1, …, v_p) = C_{i|V_p;V_1,…,V_{p−1}}(⋯ C_{i|V_2;V_1}(C_{i|V_1}(u_i | v_1) | v_2) ⋯ | v_p).

The expression of C_{i|V_1,…,V_p}(u_i | v_1, …, v_p) in terms of C_{i|V_1}(u_i | v_1), the only term that contains v_1, allows us to obtain the corresponding extreme value factor copula using an approach similar to that for the 1- and 2-factor extreme value copulas⁶.

⁶ This assumes that the linking copulas C_{i,V_k;V_1,…,V_{k−1}} do not involve v_1, …, v_{k−1}, which is often the simplifying assumption used in vine copula construction.

The general p-factor copula is C(u_1, …
, u_d) = ∫_0^1 ⋯ ∫_0^1 ∏_{i=1}^d C_{i|V_p;V_1,…,V_{p−1}}(⋯ C_{i|V_2;V_1}(C_{i|V_1}(u_i | v_1) | v_2) ⋯ | v_p) dv_1 ⋯ dv_p,

and the tail dependence function, with the substitution v_1 = u z_1, is given by

b(w_1, …, w_d) = lim_{u→0+} u^{−1} ∫_0^1 ⋯ ∫_0^1 ∏_{i=1}^d C_{i|V_p;V_1,…,V_{p−1}}(⋯ C_{i|V_2;V_1}(C_{i|V_1}(u w_i | v_1) | v_2) ⋯ | v_p) dv_1 ⋯ dv_p
             = ∫_0^1 ⋯ ∫_0^∞ ∏_{i=1}^d C_{i|V_p;V_1,…,V_{p−1}}(⋯ C_{i|V_2;V_1}(lim_{u→0+} C_{i|V_1}(u w_i | u z_1) | v_2) ⋯ | v_p) dz_1 dv_2 ⋯ dv_p
             = ∫_0^1 ⋯ ∫_0^∞ ∏_{i=1}^d C_{i|V_p;V_1,…,V_{p−1}}(⋯ C_{i|V_2;V_1}(b_{i|V_1}(w_i | z_1) | v_2) ⋯ | v_p) dz_1 dv_2 ⋯ dv_p,

assuming that the conditions for the dominated convergence theorem, or those for the tail dependence functions b_{i|V_1}, are satisfied.

By setting m_i = C_{i|V_p;V_1,…,V_{p−1}}(⋯ C_{i|V_2;V_1}(b_{i|V_1}(w_i | z_1) | v_2) ⋯ | v_p), we obtain the a(·) function as

a(w_1, …, w_d) = ∫_0^1 ⋯ ∫_0^∞ [1 − ∏_{i=1}^d (1 − m_i)] dz_1 dv_2 ⋯ dv_p
             = ∫_0^1 ⋯ ∫_0^∞ [1 − ∏_{i=1}^d (1 − C_{i|V_p;V_1,…,V_{p−1}}(⋯ C_{i|V_2;V_1}(b_{i|V_1}(w_i | z_1) | v_2) ⋯ | v_p))] dz_1 dv_2 ⋯ dv_p,

and the extreme value limit is C_LEV(u_1, …, u_d) = exp{−a(w_1, …, w_d)}. Although quite flexible, numerical evaluation of the p integrals can be challenging even with Gaussian quadrature. Less complex subsets of the p-factor extreme value copula, such as the bi-factor model discussed below, can offer computational efficiency and may even improve the interpretability of the model when used appropriately.

C. Bi-factor extreme value copula

The bi-factor model with G groups of non-overlapping variables is a special case of the general p-factor model with p = G + 1, and is often a reasonable assumption in practice when not all variables depend on every latent factor. Each observed variable is related to the common latent variable, denoted now as V_0. Each variable in group g (for g = 1, …, G) is linked to latent variable V_g.
An example would be stocks in different sectors; a latent factor common to all stocks can be attributed to the overall state of the economy, while the other latent factors are sector-specific, with effects more local to the sectors concerned. Table 3.1 shows a possible dependence structure among the observed and latent variables in a bi-factor model. The presence of a checkmark indicates dependence between the elements of that pair, conditional on the previous latent variables. For example, the checkmark in the cell linking U_4 and V_2 means that U_4|V_0, V_1 and V_2|V_0, V_1 are dependent. Note that each observed variable depends on exactly two latent variables, one of them being V_0.

Observed |            Latent variable
variable | V_0   V_1   V_2   ···   V_G
U_1      |  ✓     ✓
U_2      |  ✓     ✓
U_3      |  ✓           ✓
U_4      |  ✓           ✓
 ⋮       |  ⋮                ⋱
U_d      |  ✓                       ✓

Table 3.1: An example of the dependence structure between observed and latent variables in a bi-factor model. A checkmark indicates dependence between the elements of that pair, conditional on the previous latent variables.

The latent variables V_0, V_1, …, V_G are assumed to be independent. Two variables in group g are conditionally independent given V_0, V_g, and two variables in different groups are conditionally independent given V_0. Suppose the indices for the group g variables are k_{g−1} + 1, …, k_g, for g = 1, …, G, where k_0 = 0 and k_G = d. Then U_{k_{g−1}+1}, …, U_{k_g} (conditionally) depend only on V_0 and V_g, g = 1, …, G. Note that, if U_i|V_0, …, V_{g−1} and V_g|V_0, …, V_{g−1} are independent (i.e., no checkmark in the U_i–V_g cell in Table 3.1), then C_{i|V_g;V_0,…,V_{g−1}}(x|y) = x for x ∈ [0, 1], as F_{U_i|V_0,…,V_{g−1}} is U(0,1). This implies

C_{i|V_G;V_0,V_1,…,V_{G−1}}(⋯ C_{i|V_1;V_0}(C_{i|V_0}(u_i | v_0) | v_1) ⋯ | v_G) = C_{i|V_{q_i};V_0}(C_{i|V_0}(u_i | v_0) | v_{q_i}),

where q_i ≥ 1 is the index of the latent factor on which U_i depends conditionally, i.e., q_i = g if and only if k_{g−1} + 1 ≤ i ≤ k_g. Hence the bi-factor copula is C(u_1, …
, u_d) = ∫_0^1 ⋯ ∫_0^1 ∏_{i=1}^d C_{i|V_{q_i};V_0}(C_{i|V_0}(u_i | v_0) | v_{q_i}) dv_0 ⋯ dv_G
      = ∫_0^1 ∏_{g=1}^G [∫_0^1 ∏_{i=k_{g−1}+1}^{k_g} C_{i|V_g;V_0}(C_{i|V_0}(u_i | v_0) | v_g) dv_g] dv_0,

where the first expression is a (G + 1)-fold integral whose integrand depends on v_0 and v_{q_i} only. The (G + 1)-dimensional integral is thus decomposed into a 1-dimensional outer integral whose integrand is a product of 1-dimensional integrals, so that, numerically, this has the complexity of a 2-dimensional nested integral. For example, consider the case where there are 3 latent factors and 7 observed variables such that, in addition to V_0, the first 3 observed variables depend on V_1 only while the rest depend on V_2 only. Then the bi-factor copula is

C_bifact(u_1, …, u_7) = ∫_0^1 [∫_0^1 ∏_{i=1}^3 C_{i|V_1;V_0}(C_{i|V_0}(u_i | v_0) | v_1) dv_1] [∫_0^1 ∏_{i=4}^7 C_{i|V_2;V_0}(C_{i|V_0}(u_i | v_0) | v_2) dv_2] dv_0,

whereas the full 3-factor copula is

C_full(u_1, …, u_7) = ∫_0^1 ∫_0^1 ∫_0^1 ∏_{i=1}^7 C_{i|V_2;V_0,V_1}(C_{i|V_1;V_0}(C_{i|V_0}(u_i | v_0) | v_1) | v_2) dv_0 dv_1 dv_2.

It can be seen that the bi-factor model is a simpler formulation.

Since the bi-factor copula is a special case of the general (G + 1)-factor copula model, so is its tail dependence function. We have

b(w_1, …, w_d) = ∫_0^1 ⋯ ∫_0^∞ ∏_{i=1}^d C_{i|V_{q_i};V_0}(b_{i|V_0}(w_i | z_0) | v_{q_i}) dz_0 dv_1 ⋯ dv_G
             = ∫_0^∞ ∏_{g=1}^G [∫_0^1 ∏_{i=k_{g−1}+1}^{k_g} C_{i|V_g;V_0}(b_{i|V_0}(w_i | z_0) | v_g) dv_g] dz_0,

and therefore

a(w_1, …, w_d) = ∫_0^1 ⋯ ∫_0^∞ [1 − ∏_{i=1}^d (1 − C_{i|V_{q_i};V_0}(b_{i|V_0}(w_i | z_0) | v_{q_i}))] dz_0 dv_1 ⋯ dv_G
             = ∫_0^∞ [1 − ∏_{g=1}^G ∫_0^1 ∏_{i=k_{g−1}+1}^{k_g} (1 − C_{i|V_g;V_0}(b_{i|V_0}(w_i | z_0) | v_g)) dv_g] dz_0,

and the extreme value factor copula is C_LEV(u_1, …, u_d) = exp{−a(w_1, …, w_d)}. Again, using the above 7-variable example, we obtain the b(·) and a(·) functions as

b(w_1, …, w_7) = ∫_0^∞ [∫_0^1 ∏_{i=1}^3 C_{i|V_1;V_0}(b_{i|V_0}(w_i | z_0) | v_1) dv_1] [∫_0^1 ∏_{i=4}^7 C_{i|V_2;V_0}(b_{i|V_0}(w_i | z_0) | v_2) dv_2] dz_0;

a(w_1, …
, w_7) = ∫_0^∞ [1 − (∫_0^1 ∏_{i=1}^3 [1 − C_{i|V_1;V_0}(b_{i|V_0}(w_i | z_0) | v_1)] dv_1)(∫_0^1 ∏_{i=4}^7 [1 − C_{i|V_2;V_0}(b_{i|V_0}(w_i | z_0) | v_2)] dv_2)] dz_0.

3.2.2 Extreme value limit of vine copulas

The extreme value limit of an arbitrary (truncated) vine copula usually involves a high-dimensional integral whose dimension depends on the number of variables, except for the case of a p-truncated C-vine, whose extreme value limit involves a p-dimensional integral. The p-factor copula model is simply a p-truncated C-vine rooted at the latent variable(s), and thus inherits this nice property. For illustration, we derive the distribution function of the 1-truncated C-vine rooted at variable 1:

C(u_1, …, u_d) = ∫_0^{u_1} ⋯ ∫_0^{u_d} ∏_{j=2}^d c_{1j}(v_1, v_j) dv_d ⋯ dv_1 = ∫_0^{u_1} ∏_{j=2}^d [∫_0^{u_j} c_{1j}(v_1, v_j) dv_j] dv_1
             = ∫_0^{u_1} ∏_{j=2}^d C_{j|1}(u_j | v_1) dv_1,

and the distribution function of the 2-truncated C-vine rooted at variable 1 in the first tree and the edge {1, 2} in the second tree:

C(u_1, …, u_d) = ∫_0^{u_1} ⋯ ∫_0^{u_d} ∏_{j=2}^d c_{1j}(v_1, v_j) ∏_{k=3}^d c_{2k;1}(C_{2|1}(v_2 | v_1), C_{k|1}(v_k | v_1)) dv_d ⋯ dv_1
             = ∫_0^{u_1} ∫_0^{u_2} c_{12}(v_1, v_2) ∏_{k=3}^d [∫_0^{u_k} {c_{2k;1}(C_{2|1}(v_2 | v_1), C_{k|1}(v_k | v_1)) / c_{2;1}(C_{2|1}(v_2 | v_1))} c_{1k}(v_1, v_k) dv_k] dv_2 dv_1
             = ∫_0^{u_1} ∫_0^{u_2} c_{12}(v_1, v_2) ∏_{k=3}^d [∫_0^{C_{k|1}(u_k|v_1)} c_{k|2;1}(z_k | C_{2|1}(v_2 | v_1)) dz_k] dv_2 dv_1
             = ∫_0^{u_1} ∫_0^{u_2} c_{12}(v_1, v_2) ∏_{k=3}^d C_{k|2;1}(C_{k|1}(u_k | v_1) | C_{2|1}(v_2 | v_1)) dv_2 dv_1,

where the marginal copula density c_{2;1}(C_{2|1}(v_2 | v_1)) = 1 and the substitution z_k = C_{k|1}(v_k | v_1) is applied, so that dz_k = c_{1k}(v_1, v_k) dv_k. A similar derivation generalizes to a p-truncated C-vine. The separability of the integrand into a product of tractable integrals allows a simple form for the copula, as well as for the b(·) and a(·) functions.

In complete generality, a regular vine copula has O(d) intractable integrals. Section 3.15 of Joe (2014) has an example of the d-dimensional 1-truncated D-vine copula. The tail dependence function, obtained by applying (2.15) to the individual components of the integrand, is a (d − 2)-dimensional integral.
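The 1-truncated C-vine representation above can be verified numerically in the bivariate case, where it reduces to C(u_1, u_2) = ∫_0^{u_1} C_{2|1}(u_2 | v_1) dv_1; this sketch (ours) uses Clayton (MTCJ) copulas and a simple midpoint rule:

```python
# Sketch: check C(u1, u2) = int_0^{u1} C_{2|1}(u2 | v1) dv1 for the
# bivariate Clayton (MTCJ) copula against its closed form.
import math

def clayton_cdf(u, v, delta):
    return (u ** (-delta) + v ** (-delta) - 1.0) ** (-1.0 / delta)

def clayton_cond(u2, v1, delta):
    """C_{2|1}(u2 | v1) = partial C(v1, u2) / partial v1 for Clayton."""
    return v1 ** (-delta - 1.0) * (
        v1 ** (-delta) + u2 ** (-delta) - 1.0) ** (-1.0 / delta - 1.0)

def cvine_cdf(u1, u2, delta, m=4000):
    """Midpoint-rule evaluation of the 1-truncated C-vine integral."""
    h = u1 / m
    return h * sum(clayton_cond(u2, (i + 0.5) * h, delta) for i in range(m))

delta = 2.0
for u1, u2 in [(0.3, 0.6), (0.8, 0.2), (0.5, 0.5)]:
    assert abs(cvine_cdf(u1, u2, delta) - clayton_cdf(u1, u2, delta)) < 1e-4
```

With Clayton links the transformed integrand is smooth and bounded on (0, u_1), so a crude midpoint rule already agrees with the closed form to four decimals; the thesis's point is that the same separability keeps the d-dimensional C-vine integral numerically tractable.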
Therefore, although a similar derivation of the extreme value limit applies to the general vine, it is computationally challenging to do likelihood inference. This is the motivation behind the structured Hüsler–Reiss model in Section 3.3.

3.2.3 Bivariate dependence properties

The stable tail dependence function for the (i, j) bivariate marginal distribution of a p-factor extreme value copula is

a_{ij}(w_i, w_j) = ∫_0^1 ⋯ ∫_0^∞ (m_i + m_j − m_i m_j) dz_1 dv_2 ⋯ dv_p = w_i + w_j − b_{ij}(w_i, w_j),

where m_i = C_{i|V_p;V_1,…,V_{p−1}}(⋯ C_{i|V_2;V_1}(b_{i|V_1}(w_i | z_1) | v_2) ⋯ | v_p) and b_{ij} is the bivariate marginal tail dependence function of the parent factor copula. This relationship allows us to easily extend some dependence results for the factor copula to its extreme value limit. The following list gives some bivariate dependence properties applicable to extreme value factor copulas.

• Concordance ordering. A bivariate distribution F is more concordant than G, written F ≻_c G, if F ≥ G, or equivalently F̄ ≥ Ḡ, pointwise. Increasing in the concordance ordering means that a larger dependence parameter of a copula leads to stronger dependence, i.e., the copula becomes more concordant. For the 1-factor case, it is shown in Krupskii and Joe (2013) that, assuming (a) the linking copula C_{jV_1} is fixed and C_{j|V_1} is stochastically increasing (i.e., 1 − C_{j|V_1}(u|v) = P(U_j > u | V_1 = v) is increasing in v for all u ∈ (0, 1)), and (b) C_{iV_1} increases in the concordance ordering, then the factor copula C_{ij} increases in the concordance ordering. By definition, its corresponding tail dependence function b_{ij} is then increasing in the parameter(s) of C_{iV_1}, meaning that the extreme value limit C_EV(u_i, u_j) = u_i u_j exp{b_{ij}(w_i, w_j)} increases in the concordance ordering.

• 1-Factor dependence structure. Consider variables i, j, k linked to a latent variable V_1 in the 1-factor copula in (2.2), and assume that the 1-factor copula has lower tail dependence.
Suppose that each linking copula with V_1 is stochastically increasing. If C_{iV_1} ≻_c C_{jV_1} ≻_c C_{kV_1} are ordered in the concordance ordering ≻_c, then the preceding item implies that C_{ij} ≻_c C_{ik} ≻_c C_{jk} for the bivariate margins of the 1-factor copula, and the same concordance ordering holds for the corresponding bivariate margins of the extreme value limit. If the variables are indexed so that the strength of dependence (concordance) decreases as i increases from 1 to d, then the bivariate dependence of the (i, j) margin of the 1-factor copula and its extreme value limit decreases as i, j increase. This is the typical pattern of dependence for the 1-factor structure. From a matrix of empirical bivariate dependence measures, it is possible to assess whether the 1-factor structure is a good approximation. If not, one could consider 2-factor or bi-factor structures, assuming there are plausible latent variables.

• Dependence measures. The tail dependence function of the extreme value factor copula is the same as that of the corresponding factor copula, and the tail dependence index is b_{ij}(1, 1). The Pickands dependence function is B(w) = a_{ij}(w, 1 − w) = 1 − b_{ij}(w, 1 − w) and the extremal coefficient is ϑ = 2 − b_{ij}(1, 1). The tail dependence function of the factor copula thus contains sufficient information about these quantities.

• Dependence boundaries. The full d-dimensional extreme value factor copula becomes the independence (resp. comonotonicity) copula when all linking copulas are tail independent (resp. comonotonic). The bivariate marginal copula attains these limits when all linking copulas related to the variables concerned do.
The extreme value limit is not joint independence if there is tail dependence for every linking copula connected to the first latent factor.

Therefore, to construct parametric extreme value copulas with factor structure and interpretable parameters, we start with factor copulas whose bivariate linking copulas are in parametric families that increase in concordance, cover the range from independence to comonotonicity, and are stochastically increasing. These properties are satisfied by the 1-parameter families of bivariate copulas mentioned in the next subsection.

3.2.4 Examples of 1-factor and 2-factor extreme value copulas

Here we present examples of the 1- and 2-factor extreme value copulas. In each case, d is the number of observed variables. To distinguish between the roles of the different copulas, we label the linking copulas, the factor copulas and their associated (lower) extreme value factor copulas as C̃, C and C_EV, respectively. A similar convention is applied to marginal and conditional tail dependence functions, so that the b̃ are those of the linking copulas and the b those of the factor copulas.

A. 1-factor with Dagum (inverse Burr) conditional tail dependence functions

The 1-factor extreme value copula is characterized by the conditional tail dependence functions, and we thus name such copulas based on these functions. Let b_{i|V_1}(w_i | z_1) = [1 + (w_i/z_1)^{−δ_i}]^{−1/δ_i − 1} with δ_i > 0, i = 1, …, d. This is a special case of the Dagum (Dagum (1975)) or inverse Burr distribution and can be derived as the lower conditional tail dependence function of the MTCJ copula, or the upper one of the Galambos copula, with dependence parameters δ_i that increase in the concordance ordering.

In the following, we derive the various quantities assuming that the linking copulas C̃_{i,V_1}, i = 1, …
, d, are MTCJ:

• Linking copula:
$$\tilde C_{i,V_1}(u_i, v_1) = \big(u_i^{-\delta_i} + v_1^{-\delta_i} - 1\big)^{-1/\delta_i},$$
where $\delta_i > 0$ controls the strength of dependence of the linking copula.

• Tail dependence function between Ui and V1:
$$\tilde b_{i,V_1}(w_i, z_1) = \lim_{u \to 0^+} u^{-1}\big[(uw_i)^{-\delta_i} + (uz_1)^{-\delta_i} - 1\big]^{-1/\delta_i} = \big(w_i^{-\delta_i} + z_1^{-\delta_i}\big)^{-1/\delta_i}.$$

• Conditional tail dependence function:
$$\tilde b_{i|V_1}(w_i|z_1) = \frac{\partial}{\partial z_1}\,\tilde b_{i,V_1}(w_i, z_1) = \big[1 + (w_i/z_1)^{-\delta_i}\big]^{-1/\delta_i - 1}.$$

With $\tilde b_{i|V_1}$ derived, we obtain the b(·) and a(·) functions for the factor copula as
$$b(w_1, \ldots, w_d) = \int_0^\infty \prod_{i=1}^d \tilde b_{i|V_1}(w_i|z_1)\, dz_1 = \int_0^\infty \prod_{i=1}^d \big[1 + (w_i/z_1)^{-\delta_i}\big]^{-1/\delta_i - 1}\, dz_1;$$
$$a(w_1, \ldots, w_d) = \int_0^\infty \bigg[1 - \prod_{i=1}^d \Big(1 - \big[1 + (w_i/z_1)^{-\delta_i}\big]^{-1/\delta_i - 1}\Big)\bigg]\, dz_1,$$
using (3.1) and (3.2). Note that this is valid as $\tilde b_{i|V_1}$ is a proper distribution on $[0, \infty)$. Finally, $C_{EV}(u_1, \ldots, u_d) = \exp\{-a(w_1, \ldots, w_d)\}$. Numerically stable evaluation of these types of integrals is discussed in Chapter 4.

B. 1-factor with Burr (Singh-Maddala) conditional tail dependence functions

Alternatively, let $\tilde b_{i|V_1}(w_i|z_1) = 1 - \big[(w_i/z_1)^{\theta_i} + 1\big]^{1/\theta_i - 1}$ with $\theta_i > 1$, $i = 1, \ldots, d$. This is a special case of the Burr Type XII (Burr (1942)) or Singh-Maddala (Singh and Maddala (1976)) distribution, and can be derived as the upper conditional tail dependence function of the Gumbel or Joe/B5 (Joe (1993)) copulas with dependence parameters $\theta_i$, again increasing in the concordance ordering.
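The one-dimensional integrals defining b(·) and a(·) above are evaluated by numerical quadrature; Chapter 4 treats this carefully, but a minimal Python sketch (our own function names; a simple midpoint rule after mapping (0, ∞) to (0, 1), not the Gaussian quadrature of Chapter 4) illustrates the computation for the Dagum case:

```python
import math

def b_cond_dagum(w, z, delta):
    # Conditional tail dependence function [1 + (w/z)^(-delta)]^(-1/delta - 1)
    return (1.0 + (w / z) ** (-delta)) ** (-1.0 / delta - 1.0)

def integrate_0_inf(f, n=4000):
    # Midpoint rule on (0,1) after the substitution z = t/(1-t), dz = dt/(1-t)^2
    s = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        s += f(t / (1.0 - t)) / (1.0 - t) ** 2
    return s / n

def b_fun(w, delta):
    # b(w_1,...,w_d) = int_0^inf prod_i b_{i|V1}(w_i | z_1) dz_1
    return integrate_0_inf(lambda z: math.prod(
        b_cond_dagum(wi, z, di) for wi, di in zip(w, delta)))

def a_fun(w, delta):
    # a(w_1,...,w_d) = int_0^inf [1 - prod_i (1 - b_{i|V1}(w_i | z_1))] dz_1
    return integrate_0_inf(lambda z: 1.0 - math.prod(
        1.0 - b_cond_dagum(wi, z, di) for wi, di in zip(w, delta)))
```

For $\delta_1 = \delta_2 = 1$ this gives $b(1,1) = 1/3$ and the pairwise extremal coefficient $a(1,1) = 2 - 1/3 \approx 1.667$, the value quoted for the weak-dependence scenario in Section 3.6. A single margin recovers $b(w) = w$, consistent with uniform margins.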
The following exposition uses reflected Gumbel linking copulas, and the lower extreme value limit is derived.

• Linking copula:
$$\tilde C_{i,V_1}(u_i, v_1) = \overline C^{\mathrm{Gum}}(1 - u_i, 1 - v_1) = u_i + v_1 - 1 + \exp\Big\{-\big[(-\log(1 - u_i))^{\theta_i} + (-\log(1 - v_1))^{\theta_i}\big]^{1/\theta_i}\Big\},$$
where $\theta_i > 1$ controls the strength of dependence of the linking copula, and $\overline C^{\mathrm{Gum}}$ is the survival function of the Gumbel copula.

• Tail dependence function between Ui and V1:
$$\tilde b_{i,V_1}(w_i, z_1) = \lim_{u \to 0^+} u^{-1}\Big[uw_i + uz_1 - 1 + \exp\Big\{-\big[(-\log(1 - uw_i))^{\theta_i} + (-\log(1 - uz_1))^{\theta_i}\big]^{1/\theta_i}\Big\}\Big]$$
$$= w_i + z_1 + \lim_{u \to 0^+} u^{-1}\Big(\exp\Big\{-u\big(w_i^{\theta_i} + z_1^{\theta_i}\big)^{1/\theta_i}\Big\} - 1\Big) = w_i + z_1 - \big(w_i^{\theta_i} + z_1^{\theta_i}\big)^{1/\theta_i},$$
using $-\log(1 - x) = x + O(x^2)$ and $e^{-x} - 1 = -x + O(x^2)$ as $x \to 0^+$.

• Conditional tail dependence function:
$$\tilde b_{i|V_1}(w_i|z_1) = \frac{\partial}{\partial z_1}\,\tilde b_{i,V_1}(w_i, z_1) = 1 - \big[(w_i/z_1)^{\theta_i} + 1\big]^{1/\theta_i - 1}.$$

The conditional tail dependence function $\tilde b_{i|V_1}(w_i|z_1)$ is again a proper distribution on $[0, \infty)$. Using (3.1) and (3.2), the b(·) and a(·) functions for this factor copula are
$$b(w_1, \ldots, w_d) = \int_0^\infty \prod_{i=1}^d \Big(1 - \big[(w_i/z_1)^{\theta_i} + 1\big]^{1/\theta_i - 1}\Big)\, dz_1;$$
$$a(w_1, \ldots, w_d) = \int_0^\infty \Big(1 - \prod_{i=1}^d \big[(w_i/z_1)^{\theta_i} + 1\big]^{1/\theta_i - 1}\Big)\, dz_1,$$
and $C_{EV}(u_1, \ldots, u_d) = \exp\{-a(w_1, \ldots, w_d)\}$.

C. 2-factor with Dagum conditional tail dependence functions for factor 1

Here we provide an example of a 2-factor extreme value copula with Dagum conditional tail dependence functions and dependence parameters $\delta_i$, $i = 1, \ldots, d$, for the first factor. From (3.3) and (3.4), the b(·) and a(·) functions are
$$b(w_1, \ldots, w_d) = \int_0^1 \int_0^\infty \prod_{i=1}^d \tilde C_{i|V_2;V_1}\Big(\big[1 + (w_i/z_1)^{-\delta_i}\big]^{-1/\delta_i - 1} \,\Big|\, v_2; \theta_i\Big)\, dz_1\, dv_2;$$
$$a(w_1, \ldots, w_d) = \int_0^1 \int_0^\infty \bigg(1 - \prod_{i=1}^d \Big[1 - \tilde C_{i|V_2;V_1}\Big(\big[1 + (w_i/z_1)^{-\delta_i}\big]^{-1/\delta_i - 1} \,\Big|\, v_2; \theta_i\Big)\Big]\bigg)\, dz_1\, dv_2,$$
where $\tilde C_{i|V_2;V_1}$ is the conditional distribution function of the linking copula between the ith observed variable and the second latent factor, with dependence parameter $\theta_i$. Numerical techniques similar to the 1-factor case can be used to compute a(·).

3.3 Structured Hüsler-Reiss model

In this section, we propose another class of parsimonious extreme value dependence models based on the multivariate Hüsler-Reiss copula (Hüsler and Reiss (1989)).
This is an alternative to extreme value limits of vine copulas, which involve high-dimensional integrals even in bivariate margins and are computationally intractable. For parsimonious submodels, we impose a structure on the correlation matrix of the Hüsler-Reiss copula according to the desired dependence structure. Although this is proposed mainly for vine structures, factor structures are also well covered. Note that parsimonious forms of the Hüsler-Reiss copula have been developed in the context of multivariate spatial extremes (see, e.g., Smith (1990); Davison et al. (2012)).

3.3.1 Hüsler-Reiss copula with parsimonious dependence

The Hüsler-Reiss copula (2.8) with parameters $\delta_{ij}$ is derived as the extreme value limit of the multivariate Gaussian distribution, under the assumption that the pairwise correlations approach 1 as the sample size $n \to \infty$ in the following fashion:
$$[1 - \rho_{ij}(n)] \log n \to \delta_{ij}^{-2} \in (0, \infty), \qquad 1 \le i \ne j \le d, \qquad (3.5)$$
where $\Sigma(n) = (\rho_{ij}(n))_{1 \le i,j \le d}$ is the correlation matrix, dependent on n, of a d-variate Gaussian distribution with zero mean and unit variance for all variables. Nikoloulopoulos et al. (2009) show that the Hüsler-Reiss copula can also be obtained as the limit of the t-EV copula when the dispersion matrix $\Sigma(\nu) = (\rho_{ij}(\nu))$ is such that $\rho_{ij}(\nu) = 1 - 2\delta_{ij}^{-2}/\nu$, with the limit $\nu \to \infty$. This and (3.5) provide a link between the correlations $\rho_{ij}$ of the underlying Gaussian variates and the parameters $\delta_{ij}$ of the Hüsler-Reiss distribution. We propose an extreme value model in which the $\rho_{ij}$'s are structured according to the dependence pattern assumed for the data. This structure is translated to the parameters $\delta_{ij}$ and is respected during the model fitting procedure. Specifically, (3.5) suggests that $\delta_{ij} = \gamma(1 - \rho_{ij})^{-1/2}$ for n large, where $\gamma > 0$ is a proportionality constant. Different structures can be applied to $\rho_{ij}$, for example:

• Factor structure. For a given number of factors p, define the $d \times p$ matrix of parameters
$$L = \begin{pmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1p} \\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha_{d1} & \alpha_{d2} & \cdots & \alpha_{dp} \end{pmatrix}, \qquad (3.6)$$
where $\alpha_{ij} \in [-1, 1]$ is the parameter for the ith variable and the jth factor. Its role is analogous to the loadings in classical factor analysis. The correlations are related to the parameters via $\rho_{ij} = \sum_{k=1}^p \alpha_{ik}\alpha_{jk}$ for $i \ne j$, and $\rho_{ii} = 1$. For practical model fitting, the following alternative parametrization is more useful:
$$L = \begin{pmatrix}
\alpha_{11} & \alpha_{12}\sqrt{1 - \alpha_{11}^2} & \alpha_{13}\sqrt{(1 - \alpha_{11}^2)(1 - \alpha_{12}^2)} & \cdots & \alpha_{1p}\sqrt{\prod_{j=1}^{p-1}(1 - \alpha_{1j}^2)} \\
\alpha_{21} & \alpha_{22}\sqrt{1 - \alpha_{21}^2} & \alpha_{23}\sqrt{(1 - \alpha_{21}^2)(1 - \alpha_{22}^2)} & \cdots & \alpha_{2p}\sqrt{\prod_{j=1}^{p-1}(1 - \alpha_{2j}^2)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\alpha_{d1} & \alpha_{d2}\sqrt{1 - \alpha_{d1}^2} & \alpha_{d3}\sqrt{(1 - \alpha_{d1}^2)(1 - \alpha_{d2}^2)} & \cdots & \alpha_{dp}\sqrt{\prod_{j=1}^{p-1}(1 - \alpha_{dj}^2)}
\end{pmatrix}.$$
With this construction, each $\alpha_{ij}$ can algebraically take values in $[-1, 1]$ independently, and the parametrization is better suited for numerical optimization. Refer to Section 6.16 of Joe (2014) for a partial correlation interpretation. Note that this model is different from the extreme value limit of factor copulas introduced in Section 3.2: the stable tail dependence functions of the latter are integrals that do not in general reduce to the sum of Gaussian distribution functions that characterizes the Hüsler-Reiss copula.

• Vine/tree structure. The number of parameters (or correlations) needed to specify a Markov tree is $d - 1$, each corresponding to an edge of the associated graphical model. A 2-truncated vine uses $d - 2$ additional partial correlation parameters to connect the $d - 1$ edges in the Markov tree, and similarly for higher-order truncated vines (see Section 3.9 of Joe (2014)). A p-truncated vine has $(d-1) + (d-2) + \cdots + (d-p) = p[d - (p+1)/2]$ parameters, so the number of parameters grows at the rate $O(d)$ for fixed p. Each parameter can be labelled according to the Gaussian pair-copula it represents. For example, the parameters for a d-dimensional D-vine (linear in tree 1) are $\{\{\alpha_{12}, \alpha_{23}, \ldots, \alpha_{(d-1),d}\}, \{\alpha_{13;2}, \alpha_{24;3}, \ldots, \alpha_{(d-2),d;(d-1)}\}, \ldots$
$, \{\alpha_{1d;23\cdots(d-1)}\}\}$. By construction of vine copulas, the $\alpha$'s lie in $[-1, 1]$ and are algebraically independent.

With this relationship between the dependence parameters $\delta_{ij}$ and the correlation parameters $\rho_{ij}$ of the underlying Gaussian variables, it is possible to specify certain $\delta_{ij}$'s instead and retrieve the corresponding $\alpha$ parameters and/or $\gamma$. For example, with the 1-truncated D-vine, the correlation and dependence parameters for neighbouring variables are related as $\delta_{i,i+1} = \gamma(1 - \alpha_{i,i+1})^{-1/2}$, $i = 1, \ldots, d-1$. Hence, by specifying $\delta_{12}, \ldots, \delta_{d-1,d}$, we can obtain $\alpha_{i,i+1}$ for any fixed $\gamma$. In this situation, it is possible to specify an extra $\delta$ parameter for the estimation of $\gamma$. The parameter set $\{\alpha_{12}, \ldots, \alpha_{d-1,d}, \gamma\}$ determines the correlations, and hence the dependence parameters, for the other pairs. A similar argument generalizes to any single tree or 1-truncated vine: specifying the $\delta$ parameters corresponding to the variables linked in the first tree determines all other dependence parameters given a fixed $\gamma$. Likewise, for a p-truncated vine, one needs to specify the $\delta$ parameters corresponding to the variables of the bivariate linking copulas in the p trees.

Such a relationship is more complex for the factor model, as every $\rho_{ij}$ depends on more than one $\alpha$ parameter. As an example, for the 1-factor model, we obtain the $\binom{d}{2}$ equations $\delta_{ij} = \gamma(1 - \alpha_i\alpha_j)^{-1/2}$ for $i < j$ using the parametrization (3.6). For a given $\gamma$, we can solve for the $\alpha$'s (unique up to sign) if $\delta_{12}, \ldots, \delta_{1d}$ and one other $\delta$ parameter are known. A p-factor model has a total of $pd + 1$ parameters in our parametrization, and can only be solved if the number of given dependence parameters is no more than this. Even so, the system of equations may not have real solutions, even if the $\delta$'s are compatible in the sense that they constitute a valid Hüsler-Reiss distribution.

The Hüsler-Reiss copula has been generalized to the t-EV copula (2.11) (Demarta and McNeil (2005); Nikoloulopoulos et al.
(2009)) and their skewed counterparts (Padoan (2011)). A multivariate t copula with factor structure has been defined with the correlation matrix having factor structure (Klüppelberg and Kuhn (2009)). It is possible to impose a factor structure directly on the correlation matrix of a t-EV copula. However, the bivariate strength of dependence corresponding to a correlation parameter of zero depends on the degrees of freedom $\nu$ used (Davison et al. (2012); Ribatet (2013)); it approaches the independence limit as $\nu \to \infty$. This may complicate interpretation of the model. Nevertheless, it may be useful when the dependence among all variables is rather strong, so that the aforementioned boundary is not a major concern. The stock returns data example in Section 3.7 contains an illustration of the 1-factor t-EV model.

3.3.2 Bivariate dependence properties

Since the structured Hüsler-Reiss model is a subset of the unconstrained Hüsler-Reiss copula, its dependence properties are restricted by those of the latter. In particular, using (2.16), the bivariate tail dependence function is
$$b_{ij}(w_i, w_j) = w_i\,\Phi\!\left(\frac{\delta_{ij}}{2}\log\frac{w_j}{w_i} - \frac{1}{\delta_{ij}}\right) + w_j\,\Phi\!\left(\frac{\delta_{ij}}{2}\log\frac{w_i}{w_j} - \frac{1}{\delta_{ij}}\right),$$
which is symmetric in $(w_i, w_j)$, so the bivariate marginal copula is permutation symmetric. The tail dependence index is $b_{ij}(1, 1) = 2\Phi(-1/\delta_{ij})$, and the extremal coefficient is $2 - b_{ij}(1, 1) = 2\Phi(1/\delta_{ij})$. The Pickands dependence function is
$$B(w) = w\,\Phi\!\left(\frac{1}{\delta_{ij}} + \frac{\delta_{ij}}{2}\log\frac{w}{1-w}\right) + (1-w)\,\Phi\!\left(\frac{1}{\delta_{ij}} + \frac{\delta_{ij}}{2}\log\frac{1-w}{w}\right).$$

The structured Hüsler-Reiss model is increasing in the concordance ordering with respect to both $\alpha$ and $\gamma$: increasing $\alpha$ and/or $\gamma$ leads to an increase in some or all of the $\rho$'s (and thus the $\delta$'s), and hence to stronger dependence.

For the d-dimensional copula, independence is obtained when all $\delta$ parameters are zero, i.e., when $\gamma = 0$. Note that, even when $\rho_{ij} = -1$ (the lower bound), $\delta_{ij}$ will not be zero unless $\gamma = 0$. Therefore care must be exercised when interpreting the fitted correlation parameters.
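These bivariate quantities are simple to compute from a structured parameter set. A minimal sketch (our own function names, assuming the 1-factor parametrization $\rho_{ij} = \alpha_i\alpha_j$ and $\delta_{ij} = \gamma(1 - \rho_{ij})^{-1/2}$):

```python
import math

def Phi(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def hr_delta_1factor(alpha, gamma):
    # delta_ij = gamma * (1 - rho_ij)^(-1/2) with rho_ij = alpha_i * alpha_j;
    # the diagonal is set to +inf (a variable is comonotonic with itself)
    d = len(alpha)
    return [[gamma / math.sqrt(1.0 - alpha[i] * alpha[j]) if i != j else float("inf")
             for j in range(d)] for i in range(d)]

def extremal_coeff(delta_ij):
    # Pairwise extremal coefficient 2 * Phi(1 / delta_ij), lying in (1, 2):
    # -> 2 as delta -> 0 (independence), -> 1 as delta -> inf (comonotonicity)
    return 2.0 * Phi(1.0 / delta_ij)
```

Increasing γ (or the α's) increases every $\delta_{ij}$ and moves each pairwise extremal coefficient from 2 (independence) towards 1 (comonotonicity), matching the concordance ordering noted above.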
Meanwhile, the comonotonicity copula is the limit as $\delta_{ij} \to \infty$ for every pair (i, j), or equivalently when $\rho_{ij} = 1$ for every pair. For the factor model, this happens when all the parameters for the first factor are 1; for the vine model, when all $\alpha$ parameters are 1. Regardless of the dependence structure, the same limit is obtained by letting $\gamma \to \infty$.

3.4 Comparison between the extreme value factor copula and the structured Hüsler-Reiss copula

Both the structured Hüsler-Reiss copula and the extreme value factor copula are potentially useful parsimonious models for multivariate extremes. In this section, we contrast the two classes of models.

• The extreme value factor copula model is suitable for data exhibiting a latent or observed factor structure, whereas the structured Hüsler-Reiss model is applicable to any parsimonious dependence structure that can be parametrized and represented in terms of the correlation matrix of a Gaussian distribution.

• Simulation of the extreme value factor copula can be approximated by taking the componentwise maxima of samples from the corresponding factor copula. However, there is currently no simple way to simulate from the multivariate Hüsler-Reiss copula in general.

• The parameters of the different linking copulas in the extreme value factor copula model are algebraically independent. For the Hüsler-Reiss model, the collection of parameters must be such that all d constituent (d−1)-dimensional correlation matrices in (2.9) are positive definite. It is, however, not difficult to implement parameter verification in numerical optimization procedures. Some parameters may approach the boundary, i.e., 1 or −1, during model fitting of the structured Hüsler-Reiss model, especially when the sample size is small.
If that occurs, one can fit again with such parameters fixed at the boundary.

• For the factor copula model, the strength of dependence is controlled by both the choice of linking copulas and the parameter values, while the dependence structure is fixed (i.e., factor). On the other hand, the Hüsler-Reiss model does not involve a choice of copula families; the dependence structure can be specified, and the parameter values affect the strength of dependence. The $\alpha$ parameters control the strength of dependence for specific pairs and other variables linked to them, while the proportionality constant $\gamma$ controls the overall strength. Since $\gamma$ acts multiplicatively on all $\delta_{ij}$'s, increasing $\gamma$ increases the strength of dependence among all variables, other parameters held constant.

• The Hüsler-Reiss copula with factor structure does not coincide with the class of extreme value factor copulas in terms of the pairwise strength of dependence, but with careful choice of parameters, the pairwise extremal coefficients can be well approximated. In general, the approximation is better when the $\rho_{ij}$'s are close to 1; this is not surprising given the limit argument used in deriving the Hüsler-Reiss copula.

• Model estimation, via the pairwise likelihood described in Section 3.5, is faster and generally more stable for the structured Hüsler-Reiss copula, as Gaussian densities and distribution functions are relatively easy to evaluate. With the extreme value factor copula model, one must obtain the exponent function and its derivatives through numerical integration methods (see Chapter 4).

The models can also be compared through their pairwise dependence characteristics. We give a representative example for illustration. Table 3.2 displays the Spearman's ρ for each bivariate margin of different parsimonious models for 5 observed variables. The top left panel corresponds to the 1-factor copula with MTCJ linkages and dependence parameters (δ1, . . .
, δ5) = (1, 2, 3, 4, 5) (dependence is stronger as the index increases), from which the factor structure is apparent in the sense that the strength of dependence is similarly ordered across columns and rows. This property is carried over to its extreme value limit, shown in the top right panel. In this case, the Spearman's ρ values are somewhat stronger than those of the parent factor copula. This extreme value 1-factor copula can be closely approximated by the Hüsler-Reiss 1-factor structure with parameters $(\tanh^{-1}\alpha_{11}, \ldots, \tanh^{-1}\alpha_{15}, \gamma) = (4.25, 4.83, 5.19, 5.45, 5.67, 0.0297)$.

1-factor MTCJ (top left):        EV 1-factor Dagum (top right):
 1  .36  .40  .43  .44            1  .51  .55  .56  .57
     1   .57  .60  .62                1   .75  .78  .79
          1   .69  .71                     1   .85  .87
               1   .77                          1   .90
                    1                                1

HR (D-vine, γ = 1) (bottom left): HR (D-vine, γ = .5) (bottom middle):
 1  .57  .46  .41  .39            1  .57  .39  .29  .23
     1   .57  .46  .41                1   .57  .39  .29
          1   .57  .46                     1   .57  .39
               1   .57                          1   .57
                    1                                1

HR (non-linear) (bottom right):
 1  .57  .39  .29  .29
     1   .57  .39  .39
          1   .57  .57
               1   .39
                    1

Table 3.2: Matrices of pairwise Spearman's ρ for five copulas with parsimonious dependence. They are: 1-factor with MTCJ linking copulas and parameters (δ1, . . . , δ5) = (1, 2, 3, 4, 5) (top left); its extreme value limit (top right); Hüsler-Reiss with 1-truncated D-vine structure, with all α parameters equal to 0.5 and γ = 1 (bottom left); same structure as the previous model but with α = 0.875 and γ = 0.5 (bottom middle); same parameters as the previous model but with variables 1 to 4 linearly connected and variable 5 connected to 3 (bottom right).

The bottom panel of Table 3.2 suggests that the dependence pattern of the structured Hüsler-Reiss copula follows that of the tree imposed on the underlying correlation matrix, with weaker dependence for variables farther apart in the tree. The matrix of pairwise Spearman's ρ for a linear tree dependence structure (1-truncated D-vine) with correlation parameters α all equal to 0.5 and γ = 1 is given in the bottom left panel, and it shows that adjacent variables are the most correlated, with dependence strength tapering off between variables that are farther apart.
The bottom middle panel results from the same vine but with all α = 0.875 and γ = 0.5 instead, so that the dependence between adjacent variables remains the same. We can see that a smaller γ results in more flexible coverage of the strength of dependence. Finally, the bottom right panel represents the dependence structure of a tree with variables 1 to 4 linearly connected, but with variable 5 connected to variable 3 instead; the parameters are again all α = 0.875 and γ = 0.5. Since variables 4 and 5 occupy symmetric positions in this tree, they have the same bivariate dependence with the other three variables. Because relative pairwise dependence strengths are preserved, one may apply the assumed structure directly to the proposed models. The magnitudes of the parameters, however, should not be interpreted directly. For example, even if α < 0, implying possible negative values in the underlying correlation matrix, the actual correlation of the observed extremes is still positive, albeit with a smaller magnitude.

3.5 Statistical inference via composite likelihood methods

As mentioned in Section 2.2.2, it is impractical to consider the full likelihood for multivariate extreme value copulas, as the number of terms in the density grows rapidly with the dimension. In this section, we describe the use of composite likelihood methods (Section 2.5.1) for the statistical inference of the two classes of parsimonious extreme value copulas introduced in this chapter. In particular, we focus on the pairwise likelihood with equal weight for each bivariate marginal density.

We suggest using the two-stage estimation procedure (i.e., the inference functions for margins method; see Section 2.5.2) for model fitting, where the marginal distributions are first estimated for each variable using the generalized extreme value distribution.
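The first stage amounts to a probability integral transform with the fitted generalized extreme value distribution; a minimal sketch (the parameter values passed in are placeholders for fitted estimates):

```python
import math

def gev_cdf(x, mu, sigma, xi):
    # Generalized extreme value CDF; xi = 0 is the Gumbel limit
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-(x - mu) / sigma))
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0.0:  # outside the support of the GEV
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def to_uniform(sample, mu, sigma, xi):
    # Stage 1: transform annual maxima to (approximately) U(0,1) margins
    return [gev_cdf(x, mu, sigma, xi) for x in sample]
```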
The observations are then transformed to unit uniform and are fitted by the copula.⁷

For extreme value copulas, the bivariate marginal densities in (2.19) have the form
$$c_{12}(u_1, u_2) = e^{-A}\big(A^{(1)}A^{(2)} - A^{(12)}\big)\big/(u_1 u_2),$$
where $A^{(1)}$, $A^{(2)}$ and $A^{(12)}$ are the partial derivatives of the exponent function $A(w_1, w_2)$⁸ with respect to the arguments in the indicated order. The pairwise density for the Hüsler-Reiss copula involves only univariate Gaussian densities and distribution functions and is thus easy to compute (see Section 4.10 of Joe (2014)). For the 1- and 2-factor extreme value copulas, the partial derivatives of A with respect to a non-empty set $S \subset \{1, 2\}$ are as follows, using (3.2) and (3.4):
$$A^{(S)}_{\text{1-fact}} = \int_0^\infty (-1)^{|S|-1} \prod_{j \in S} b'_{j|V_1}(w_j|z_1) \prod_{i \notin S} \big[1 - b_{i|V_1}(w_i|z_1)\big]\, dz_1,$$
$$A^{(S)}_{\text{2-fact}} = \int_0^1\!\int_0^\infty (-1)^{|S|-1} \bigg\{\prod_{j \in S} \Big[c_{j,V_2;V_1}\big(b_{j|V_1}(w_j|z_1), v_2\big) \cdot b'_{j|V_1}(w_j|z_1)\Big] \cdot \prod_{i \notin S} \Big[1 - C_{i|V_2;V_1}\big(b_{i|V_1}(w_i|z_1)\,\big|\,v_2\big)\Big]\bigg\}\, dz_1\, dv_2,$$
where $b'_{j|V_1}(w_j|z_1) = \partial b_{j|V_1}(w_j|z_1)/\partial w_j$. Similarly, for $p \ge 3$, we have
$$A^{(S)}_{p\text{-fact}} = \int_0^1 \cdots \int_0^\infty (-1)^{|S|-1} \bigg\{\prod_{j \in S} \Big[\prod_{k=3}^p c_{j,V_k;V_1,\ldots,V_{k-1}}\big(\cdots C_{j|V_2;V_1}(b_{j|V_1}(w_j|z_1)|v_2)\cdots, v_k\big) \cdot c_{j,V_2;V_1}\big(b_{j|V_1}(w_j|z_1), v_2\big) \cdot b'_{j|V_1}(w_j|z_1)\Big]$$
$$\cdot \prod_{i \notin S} \Big[1 - C_{i|V_p;V_1,\ldots,V_{p-1}}\big(\cdots C_{i|V_2;V_1}(b_{i|V_1}(w_i|z_1)|v_2)\cdots\big|v_p\big)\Big]\bigg\}\, dz_1\, dv_2 \cdots dv_p.$$

⁷ Standardization to unit Fréchet margins is also a popular choice in the extreme value literature. However, as we are fitting a copula model, the unit uniform is a more natural choice here.
⁸ Here the quantity $\exp\{-A(w_1, w_2)\}$ assumes the role of G, the min-stable survival function with unit exponential margins. The A(·) function is identical to a(·) in previous sections. In multivariate extreme value theory, it is customary to use the capital letter A to denote the exponent function; we adopt this convention here.

In most cases, $A^{(S)}$ has to be evaluated numerically. The pairwise likelihood reduces the accumulation of rounding errors, as it does not require as many terms as a higher-dimensional marginal density.
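As a concrete illustration, for the Hüsler-Reiss pair the exponent function is available in closed form, and the pairwise density can be assembled from A and its partial derivatives. The sketch below (our own function names) approximates the partials by central differences; in practice the analytic derivatives of Section 4.10 of Joe (2014) would be used:

```python
import math

def Phi(x):
    # Standard normal CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def A_hr(w1, w2, delta):
    # Bivariate Husler-Reiss exponent function A(w1, w2)
    return (w1 * Phi(1.0 / delta + 0.5 * delta * math.log(w1 / w2))
            + w2 * Phi(1.0 / delta + 0.5 * delta * math.log(w2 / w1)))

def hr_pair_density(u1, u2, delta, h=1e-5):
    # c(u1, u2) = exp(-A) * (A^(1) A^(2) - A^(12)) / (u1 u2), with the
    # partial derivatives approximated by central differences
    w1, w2 = -math.log(u1), -math.log(u2)
    A = A_hr(w1, w2, delta)
    A1 = (A_hr(w1 + h, w2, delta) - A_hr(w1 - h, w2, delta)) / (2.0 * h)
    A2 = (A_hr(w1, w2 + h, delta) - A_hr(w1, w2 - h, delta)) / (2.0 * h)
    A12 = (A_hr(w1 + h, w2 + h, delta) - A_hr(w1 + h, w2 - h, delta)
           - A_hr(w1 - h, w2 + h, delta) + A_hr(w1 - h, w2 - h, delta)) / (4.0 * h * h)
    return math.exp(-A) * (A1 * A2 - A12) / (u1 * u2)
```

Two quick checks: $A(1, 1) = 2\Phi(1/\delta)$ recovers the extremal coefficient of Section 3.3.2, and A is homogeneous of order 1, $A(tw_1, tw_2) = tA(w_1, w_2)$.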
In Chapter 4, we discuss issues in the numerical evaluation of A and $A^{(S)}$, and stabilize (and speed up) such evaluations through Gaussian quadrature and transformations of the integrands. With these techniques, we can obtain the accuracy required for the estimation of standard errors through the sensitivity and variability matrices.

3.6 Simulation Study

We conduct a simulation study on the Dagum 1-factor extreme value factor model, based on the extreme value limit of a 1-factor MTCJ copula with parameters δ = (δ1, . . . , δd), with the following intentions: (a) to verify that the numerical evaluation of the likelihood of the model is valid; (b) to compare the statistical efficiency of full and composite (pairwise) likelihood estimation in dimension d = 4; and (c) to show that the method of pairwise likelihood yields accurate model-based standard errors. Note that with d ≥ 5, maximum likelihood with the d-dimensional density is impractical, both in computational time and in obtaining an analytic expression for the density. We consider 3 sets of parameters for δ with d = 4, and let ζi = δi/(δi + 2) ∈ (0, 1) be a transform of δi that corresponds to the Kendall's τ value for a bivariate MTCJ copula with parameter δi. The parameter sets are:

• Weak dependence: δ = (1, 1, 1, 1), or ζ = (ζ1, ζ2, ζ3, ζ4) = (1/3, 1/3, 1/3, 1/3). This corresponds to a Kendall's τ of 0.268 between variables of the extreme value factor copula, and an extremal coefficient of 1.667.

• Strong dependence: δ = (4, 4, 4, 4), or ζ = (2/3, 2/3, 2/3, 2/3). This corresponds to a Kendall's τ of 0.707 between variables of the extreme value factor copula, and an extremal coefficient of 1.227.

• Mixed dependence: δ = (1, 2, 3, 4), or ζ = (1/3, 1/2, 3/5, 2/3).
The pairwise Kendall's τ and extremal coefficients implied by this parameter set are listed in Table 3.3.

Pair    τ      ϑ
(1,2)  0.358  1.565
(1,3)  0.387  1.534
(1,4)  0.399  1.521
(2,3)  0.562  1.358
(2,4)  0.588  1.334
(3,4)  0.667  1.263

Table 3.3: Pairwise Kendall's τ and extremal coefficient ϑ for the mixed dependence scenario

Some scatterplots of the normal scores of the data simulated according to the three dependence patterns given above are in Figure 3.1.

To explore the finite-sample performance, for each set of parameters we consider sample sizes n = 100 and 500, with R = 1,000 replications, where each simulated sample is fitted by both full and pairwise likelihood. In each case we compute the average values of the Kendall's τ of the estimated parameters and the associated estimated variances, obtained from the observed information matrix for the fits using full likelihood and from the sandwich estimator for the fits using pairwise likelihood. The sample variance is also obtained from the estimated Kendall's τ for each fitting method and compared against the estimated variance. All variability estimates are reported in the following as standard errors.

Table 3.4 summarizes the results of the simulation, while Figure 3.2 displays the sampling distributions for the strong dependence scenario. Estimates are reported in terms of the Kendall's τ corresponding to the parameters, to reduce the right-skewness of the sampling distributions. We observe that both estimators have small bias even with n = 100, and that the sampling distribution is skewed when the sample size is small. From the efficiency estimates, we observe that the pairwise likelihood estimators are in general more variable than their full likelihood counterparts. The difference is especially pronounced when the variables are strongly dependent; an explanation is that there is a greater loss of information from using pairwise likelihood when there is strong dependence between variables.
As expected, an increase in the sample size leads to a decrease in variability in all cases. With regard to the variability estimates, we observe some departure between the model-based and sampling standard errors for both methods when the sample size is small, but not when n = 500.

In summary, estimation using pairwise likelihood gives acceptable performance, and the model-based standard errors are an accurate representation of the sampling variability when the sample size is reasonably large. With small sample sizes, however, a more stable estimate of the variance under composite likelihood may be obtained through resampling methods such as the jackknife (see, e.g., Zhao and Joe (2005)), but this incurs a significant computational cost since it requires refitting the model multiple times.

Finally, simulation based on the structured Hüsler-Reiss model is not attempted, as the composite likelihood method is known to be feasible. In the context of spatial extremes with the Smith and Brown-Resnick parsimonious representations of the Hüsler-Reiss model, pairwise and triplewise likelihood estimators have been compared in Genton et al. (2011) and Huser and Davison (2013).
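The simulated samples used in this study come from the extreme value factor copula, which (as noted in Section 3.4) can be approximately sampled via componentwise maxima over blocks of factor-copula draws. A minimal sketch, assuming the standard conditional-inversion sampler for the 1-factor MTCJ copula (function names are ours; the block size controls the quality of the approximation):

```python
import math
import random

def rmtcj_1factor(n, delta, rng=random):
    # n draws (u_1,...,u_d) from the 1-factor copula with MTCJ linking
    # copulas: draw V ~ U(0,1), then invert C_{i|V}(u|v) at independent q_i
    out = []
    for _ in range(n):
        v = rng.random()
        row = []
        for dl in delta:
            q = rng.random()
            u = ((q ** (-dl / (1.0 + dl)) - 1.0) * v ** (-dl) + 1.0) ** (-1.0 / dl)
            row.append(u)
        out.append(row)
    return out

def rev_factor_approx(n, delta, block=50, rng=random):
    # Approximate draws from the (lower) extreme value limit: reflect to the
    # joint upper tail, take componentwise maxima over blocks of size `block`,
    # and rescale the margins back to U(0,1) via m -> m**block
    d = len(delta)
    out = []
    for _ in range(n):
        rows = rmtcj_1factor(block, delta, rng)
        out.append([max(1.0 - rows[k][i] for k in range(block)) ** block
                    for i in range(d)])
    return out
```

With strong dependence (e.g., δ = (4, 4)), the resulting uniform-scale samples show a clearly positive correlation, consistent with the Kendall's τ of about 0.707 reported for the strong dependence scenario.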
The performance of pairwise likelihood estimators is satisfactory except for cases of the Smith model where the spatial locations and the parametrization of the Hüsler-Reiss model lead to matrices that are singular or nearly singular.

Figure 3.1: Scatterplots of the normal scores of the data simulated from the Dagum 1-factor extreme value copula, with dependence parameters δ = (1, 1, 1, 1), (4, 4, 4, 4) and (1, 2, 3, 4) for the weak, strong and mixed dependence scenarios, respectively.

Model estimates and standard errors

                Full likelihood                 Pairwise likelihood
  n       ζ̂1     ζ̂2     ζ̂3     ζ̂4         ζ̂1     ζ̂2     ζ̂3     ζ̂4
Weak dependence: δ = (1, 1, 1, 1) and ζ = (1/3, 1/3, 1/3, 1/3)
 100    .338   .342   .339   .337        .338   .343   .340   .337
       (.048) (.048) (.048) (.047)      (.048) (.049) (.049) (.048)
 500    .333   .335   .334   .333        .334   .335   .334   .334
       (.020) (.021) (.021) (.020)      (.021) (.021) (.021) (.021)
Strong dependence: δ = (4, 4, 4, 4) and ζ = (2/3, 2/3, 2/3, 2/3)
 100    .664   .667   .666   .665        .669   .673   .672   .669
       (.025) (.025) (.025) (.025)      (.035) (.035) (.035) (.035)
 500    .665   .665   .665   .665        .667   .668   .667   .667
       (.011) (.011) (.011) (.011)      (.014) (.014) (.014) (.014)
Mixed dependence: δ = (1, 2, 3, 4) and ζ = (1/3, 1/2, 3/5, 2/3)
 100    .336   .505   .607   .657        .334   .503   .609   .672
       (.032) (.033) (.038) (.042)      (.031) (.032) (.042) (.053)
 500    .335   .502   .602   .656        .333   .500   .600   .669
       (.014) (.015) (.017) (.019)      (.014) (.015) (.019) (.028)

Sampling standard errors

              Full likelihood        Pairwise likelihood        Efficiency (%)
  n      ζ̂1    ζ̂2    ζ̂3    ζ̂4     ζ̂1    ζ̂2    ζ̂3    ζ̂4     ζ̂1   ζ̂2   ζ̂3   ζ̂4
Weak dependence: δ = (1, 1, 1, 1) and ζ = (1/3, 1/3, 1/3, 1/3)
 100    .054  .048  .051  .049    .055  .051  .054  .049     96   90   89  100
 500    .020  .021  .020  .020    .020  .021  .021  .021     96   97   94   98
Strong dependence: δ = (4, 4, 4, 4) and ζ = (2/3, 2/3, 2/3, 2/3)
 100    .022  .022  .022  .021    .037  .038  .037  .036     38   33   35   35
 500    .010  .010  .010  .010    .015  .015  .015  .014     46   48   47   49
Mixed dependence: δ = (1, 2, 3, 4) and ζ = (1/3, 1/2, 3/5, 2/3)
 100    .033  .031  .039  .041    .033  .031  .052  .061    100  100   57   46
 500    .015  .015  .017  .018    .014  .015  .019  .028    100   96   77   41

Table 3.4: Estimated Kendall's τ and standard errors for the Dagum 1-factor extreme value copula simulation. Standard errors are shown in brackets. Efficiency is calculated as the ratio of sampling variances between the estimators using full and pairwise likelihood, and is capped at 100%. The Kendall's τ transform ζ = δ/(δ + 2) is used because the sampling distribution of δ̂i is strongly skewed for the sample size n = 100.

Figure 3.2: Sampling distributions (histograms of τ̂δ1, . . . , τ̂δ4, for n = 100 and n = 500 with R = 1000) of the fitted Kendall's τ using full and pairwise likelihood for the strong dependence scenario, with δ = (4, 4, 4, 4), for the
Dagum 1-factor extreme value copula

3.7 Data examples

The proposed parsimonious extreme value copulas are applied to two data sets where there is a plausible latent factor: (a) the Fraser River flows data, where a common source of streamflow and spatial dependence among gauge stations is likely, and (b) the United States stock returns data for selected listed companies in the same sector. For example (a), a tree dependence structure is also plausible from neighbouring stations along the river.

3.7.1 Fraser River flows data

The Fraser River is the largest river in British Columbia (Canadian Heritage Rivers System (2016)); it originates from the Rocky Mountains near the British Columbia–Alberta border. It continues downstream through Prince George and turns south, eventually reaching the Lower Mainland and discharging into the Strait of Georgia south of Richmond. The rate or volume of the flow along the river is highly dependent on the climate conditions in central and southeastern BC. Snow accumulates during the winter and melts in the spring and summer, causing an increase in river flows. This is compounded by possible rainstorms in the late spring and early summer along the Fraser River; it is the time of the year when most extreme streamflows are recorded.

To study the dependence characteristics of river flows along the Fraser River, we have downloaded such data at 8 hydrometric gauging stations from WaterOffice, Environment Canada; their locations are plotted in the left panel of Figure 3.3. The observations are annual maxima of daily streamflows until 2013. We check that the autocorrelations are insignificant. Most stations commenced operations in the 1950s; the only exceptions are Hope (6), whose records date back to 1912, and Mission (8), with earliest observation in 1965.
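The block-maxima preprocessing described above can be sketched as follows; the synthetic daily series, the station-free setup and the lag-1 autocorrelation check are illustrative only, not the thesis's actual code or data.

```python
import numpy as np

def annual_maxima(daily_values, years):
    """Reduce a daily series to one block maximum per year."""
    years = np.asarray(years)
    return np.array([daily_values[years == y].max() for y in np.unique(years)])

def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation, to check approximate serial independence."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    return float(np.sum(xc[1:] * xc[:-1]) / np.sum(xc**2))

# Illustrative daily streamflow-like data: 60 years of 365 observations each.
rng = np.random.default_rng(1)
years = np.repeat(np.arange(1954, 2014), 365)
flows = rng.gamma(shape=2.0, scale=500.0, size=years.size)

maxima = annual_maxima(flows, years)
print(maxima.size)  # 60: one maximum per year
```

For independent daily values the annual maxima show negligible lag-1 autocorrelation, which is the property checked informally in the text.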
The pairwise likelihood approach thus allows us to utilize information between the 1950s and 1964, when data for all but one or two stations are available.

Station     3      2      1      4      5      7      6      8
3        1.000  0.749  0.659  0.570  0.546  0.580  0.694  0.768
2        0.749  1.000  0.781  0.696  0.701  0.752  0.713  0.663
1        0.659  0.781  1.000  0.925  0.886  0.895  0.842  0.671
4        0.570  0.696  0.925  1.000  0.934  0.897  0.817  0.669
5        0.546  0.701  0.886  0.934  1.000  0.932  0.855  0.715
7        0.580  0.752  0.895  0.897  0.932  1.000  0.923  0.758
6        0.694  0.713  0.842  0.817  0.855  0.923  1.000  0.920
8        0.768  0.663  0.671  0.669  0.715  0.758  0.920  1.000

Table 3.5: Correlations of the normal scores for the Fraser River flows data. The station labels are the same as in Figure 3.3, but are reordered according to the sequence of stations from source to mouth.

The right panel of Figure 3.3 shows a pairwise scatterplot of the data, transformed to standard normal margins, while Table 3.5 gives the correlations of the normal scores. Dependence among stations is evident from the plot. Both factor and vine are plausible structural assumptions for the data. For a factor interpretation, the latent factor could be rainfall in the interior of BC: higher rainfall contributes to increased river flows at all stations.
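The normal scores used throughout this section are rank-based transforms of the raw data; a minimal sketch follows, assuming the common rank convention r/(n + 1) (the thesis does not specify its convention, and the data here are simulated for illustration).

```python
import numpy as np
from scipy.stats import norm, rankdata

def normal_scores(data):
    """Columnwise rank-based transform to N(0,1): Phi^{-1}(rank / (n + 1))."""
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    u = np.apply_along_axis(rankdata, 0, data) / (n + 1.0)  # ranks scaled into (0,1)
    return norm.ppf(u)

# Illustrative data with dependence between the first two columns.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
x[:, 1] = 0.8 * x[:, 0] + 0.6 * rng.normal(size=200)

z = normal_scores(x)
corr = np.corrcoef(z, rowvar=False)  # analogue of the Table 3.5 matrix
```

The resulting correlation matrix of normal scores is what the exploratory Gaussian factor and vine analyses below operate on.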
Meanwhile, there is a natural ordering of the stations, namely from the source
[Figure 3.3: map of the 8 gauging stations along the Fraser River (left) and an 8 × 8 pairwise scatterplot matrix of the normal scores (right).]

Figure 3.3: Locations of the gauging stations along the Fraser River (left) and the pairwise scatterplot of the normal scores (right). The labels are: 1 – Hansard; 2 – McBride; 3 – Red Pass; 4 – Shelley; 5 – Marguerite; 6 – Hope; 7 – Texas Creek; 8 – Mission. The Google map is plotted using the ggmap package in R.

area such as Red Pass (3) and McBride (2) to downstream area such as Hope (6) and Mission (8). A vine structure is thus also possible as stations close to each other are likely to exhibit stronger dependence than those far apart.

We conduct an exploratory analysis using the correlation matrix of normal scores after probability integral transforms to standard Gaussian margins. This entails fitting the transformed data using the classical factor analysis and a Gaussian vine. With 8 variables, we can fit at most a 4-factor model and a 5-truncated vine⁹. A summary of the results is given in Table 3.6. For each fitted model, we obtain several measures that are helpful to assess the distance between the sample correlation matrix Robs and the fitted correlation matrix Rmod.
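The discrepancy measures defined in the next display can be computed directly from the two correlation matrices; the sketch below uses a small illustrative 3 × 3 pair rather than the thesis's 8 × 8 matrices.

```python
import numpy as np

def discrepancy_D(R_mod, R_obs):
    """D = log|R_mod| - log|R_obs| + tr(R_mod^{-1} R_obs) - d; nonnegative, 0 iff equal."""
    d = R_obs.shape[0]
    _, logdet_mod = np.linalg.slogdet(R_mod)
    _, logdet_obs = np.linalg.slogdet(R_obs)
    return logdet_mod - logdet_obs + np.trace(np.linalg.solve(R_mod, R_obs)) - d

def max_avg_abs_diff(R_mod, R_obs):
    """Maximum and average absolute elementwise differences."""
    diff = np.abs(R_mod - R_obs)
    return diff.max(), diff.mean()

# Illustrative "sample" and "fitted" correlation matrices.
R_obs = np.array([[1.0, 0.70, 0.50], [0.70, 1.0, 0.60], [0.50, 0.60, 1.0]])
R_mod = np.array([[1.0, 0.65, 0.55], [0.65, 1.0, 0.60], [0.55, 0.60, 1.0]])
print(discrepancy_D(R_obs, R_obs))  # ~0 up to floating-point error
```

D is a Kullback-Leibler-type divergence between Gaussian models, so it vanishes exactly when Rmod reproduces Robs.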
They include the maximum and average absolute difference across the elements of the matrices, and the D measure of discrepancy given by

\[ D = \log |R_{\mathrm{mod}}| - \log |R_{\mathrm{obs}}| + \operatorname{tr}\!\left(R_{\mathrm{mod}}^{-1} R_{\mathrm{obs}}\right) - d, \]

where d = 8 is the number of variables (see, e.g., Chapter 15 of Mulaik (2009)). Smaller values for these measures mean that Rmod is closer to Robs. The values of the Akaike and Bayesian information criteria are also reported. We observe that the performance of the best factor model (4-factor) and the best truncated vine model (2-truncated vine) are similar in terms of the BIC, but the factor models generally have smaller values for the discrepancy measures than the vine models. There is no clear winner in this case.

The loadings for the 3-factor model (Table 3.7) suggest that the stations can be divided into three groups, i.e., (a) those close to the source (stations 2 and 3), (b) those along the midstream (stations 1, 4, 5 and 7), and (c) those located downstream (stations 6 and 8). This supports the assertion regarding spatial dependence among the stations. For the vine model, the best 1- and 2-truncated Gaussian vines turn out to be D-vines. The first tree can be drawn as 2–1–4–5–7–6–8–3, which follows exactly the order of the stations from upstream to downstream except Red Pass (3). This also hints at a spatial dependence structure where stations closer together exhibit stronger dependence.

⁹ The sequential minimum spanning tree algorithm is used for fitting vine models. It is computationally feasible to run an exhaustive search for the best vine on 8 variables, but this is not attempted here.
Model        # param.   Discrepancy: Robs and Rmod    AIC     BIC
                        D       Maximum   Average
1-factor         8      3.608   0.251     0.058       717.8   733.8
2-factor        15      1.830   0.215     0.028       634.6   664.7
3-factor        21      0.890   0.035     0.009       595.3   637.3
4-factor        26      0.419   0.022     0.003       579.6   631.6
1-truncated      7      1.832   0.339     0.056       618.8   632.8
2-truncated     13      1.297   0.392     0.047       601.5   627.5
3-truncated     18      1.102   0.337     0.029       600.9   636.9
4-truncated     22      1.016   0.296     0.021       604.2   648.2
5-truncated     25      0.841   0.306     0.018       600.6   650.6

Table 3.6: Fitting results for the Fraser River flows data using normal scores and Gaussian distribution

Station   Factor 1   Factor 2   Factor 3
3          0.251      0.907      0.330
2          0.569      0.578      0.246
1          0.830      0.393      0.284
4          0.878      0.290      0.264
5          0.853      0.233      0.365
7          0.807      0.237      0.491
6          0.606      0.336      0.718
8          0.356      0.471      0.761

Table 3.7: Loadings of the Gaussian 3-factor model for the normal scores of the Fraser River flows data. Coefficients larger than 0.7 in absolute value are shown in boldface.

After conducting this exploratory investigation, we fit the data to the extreme value factor copula models and the structured Hüsler-Reiss models. Each margin is first fitted by the GEV distribution and the observations are transformed to unit uniform through the probability integral transform using the estimated parameters. Diagnostic checks (not shown) suggest that the GEV distribution fits well to the 8 univariate series. We consider the Dagum and Burr 1-factor extreme value copulas; for the structured Hüsler-Reiss model, we fit the 1- and 2-factor structures as well as 1- and 2-truncated vines. The vine obtained from the Gaussian analysis above is used. Since there is a natural ordering of the stations, we additionally fit 1- and 2-truncated D-vines that take such information into consideration.

The results of the fitting are given in Table 3.8. For each fitted model we present the parameter estimates as well as their standard errors estimated from the Godambe information matrix, which is conditional on the GEV univariate estimation stage.
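The first stage of the two-stage estimation (GEV margin, then probability integral transform to the uniform scale) can be sketched with scipy; note that scipy's `genextreme` uses a shape parameter `c` whose sign is the negative of the usual GEV ξ. The simulated sample is illustrative; the thesis's implementation is not in Python.

```python
import numpy as np
from scipy.stats import genextreme

# Illustrative annual-maxima-like sample from a GEV distribution.
rng = np.random.default_rng(42)
sample = genextreme.rvs(c=-0.1, loc=1000.0, scale=200.0, size=80, random_state=rng)

# Stage 1: fit the GEV margin by maximum likelihood.
c_hat, loc_hat, scale_hat = genextreme.fit(sample)

# Probability integral transform: approximately Uniform(0,1) scores,
# which are then passed to the copula-fitting (second) stage.
u = genextreme.cdf(sample, c_hat, loc=loc_hat, scale=scale_hat)
```

The standard errors reported in Table 3.8 are conditional on this first stage, i.e., the sampling variability of (c_hat, loc_hat, scale_hat) is not propagated into the copula-parameter SEs.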
We also obtain the composite likelihood information criterion (CLIC, Varin and Vidoni (2005)) and the composite likelihood Bayesian information criterion (CLBIC, Gao and Song (2010)), which are the composite likelihood analogues of the AIC and BIC. We observe that the two classes of models perform similarly in terms of the CLBIC. The Hüsler-Reiss model with a 2-truncated vine structure performs the best, where the vine is the one obtained from the minimum spanning tree algorithm (indicated in the table as "Gauss. vine"). However, both the Burr 1-factor extreme value copula and the Hüsler-Reiss model with 1-truncated D-vine structure based on the location of the stations (indicated in the table as "Stn. order") are close competitors. It is therefore useful to analyze the results for both classes of models.

Both 1-factor extreme value copula models suggest that sites 4, 5 and 7 have the strongest dependence with the latent factor. This is in agreement with the strong pairwise correlations among these three sites observed in Figure 3.3 and Table 3.5. These three stations are in the midstream of the Fraser River and this suggests that the river flows there are likely to be indicative of the overall flow. Meanwhile, sites 2, 3 and 8 have the weakest dependence with the latent factor. These stations are near the ends of the river and the flows there are more likely to be influenced by other factors.

For the Hüsler-Reiss models, there are cases where the α parameters reach the boundary (1 or −1). In this situation we optimize the pairwise likelihood with those parameter(s) fixed at the boundary. This can potentially have an impact on the magnitude of tr(H⁻¹J) and consequently the information criteria; investigation of such effects could be a future area of research and is not carried out in the present work. We transform the parameters of the Hüsler-Reiss factor models back to the loading matrix for easier interpretation (Table 3.9).
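The information criteria can be reproduced from the quantities reported in the tables. The sign conventions below, CLIC = −2ℓ + 2 tr(H⁻¹J) and CLBIC = −2ℓ + log(n) tr(H⁻¹J) with ℓ the pairwise log-likelihood, are an assumption on my part, but they are numerically consistent with the reported values (checked here against the Burr row of Table 3.13, where n = 119).

```python
import math

def clic(pllk, trace_hj):
    """Composite likelihood information criterion (AIC analogue)."""
    return -2.0 * pllk + 2.0 * trace_hj

def clbic(pllk, trace_hj, n):
    """Composite likelihood Bayesian information criterion (BIC analogue)."""
    return -2.0 * pllk + math.log(n) * trace_hj

# Burr 1-factor EV copula for the US stocks data (Table 3.13):
# pairwise log-likelihood 331 (npllk = -331), tr(H^{-1}J) = 22.34, n = 119.
print(round(clic(331.0, 22.34)))        # -617, matching the table
print(round(clbic(331.0, 22.34, 119)))  # -555, matching the table
```

The trace term replaces the parameter count of the ordinary AIC/BIC, penalizing the pairwise likelihood for the information lost by not using the full likelihood.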
Both factor models point to a contrast between the midstream stations (1, 4, 5, 7) and the source station 3. The 2-factor model suggests that downstream stations 6 and 8 are related to each other, but this is not apparent from the 1-factor model. The 1-

Model; estimates (SE) of θi, δi or αi, and γ; then tr(H⁻¹J), npllk, CLIC, CLBIC:

EV1f (Burr)
  F1:  3.928   2.147   1.854   4.618   5.237   3.613   5.484   2.492
  SE: (0.489) (0.222) (0.200) (0.995) (2.076) (0.468) (1.690) (0.280)
  tr(H⁻¹J) = 25.24; npllk = −790; CLIC = −1530; CLBIC = −1480

EV1f (Dagum)
  F1:  3.206   1.422   1.125   3.884   4.440   2.895   4.820   1.770
  SE: (0.513) (0.219) (0.201) (1.088) (2.119) (0.502) (1.894) (0.284)
  tr(H⁻¹J) = 27.14; npllk = −788; CLIC = −1521; CLBIC = −1467

HR1f
  F1:  0.690  −0.225  −1.000   0.831   0.870   0.648   0.870   0.192; γ = 1.855
  SE: (0.105) (0.179)   —     (0.108) (0.119) (0.177) (0.139) (0.290); (0.180)
  tr(H⁻¹J) = 33.05; npllk = −785; CLIC = −1503; CLBIC = −1437

HR2f
  F1:  0.060   0.140  −0.009   0.272  −0.081  −0.824  −0.406  −1.000
  SE: (0.258) (0.287) (0.303) (0.244) (0.182) (0.058) (0.161)   —
  F2:  0.721  −0.197  −1.000   1.000   0.835   0.968   0.920   0.000; γ = 1.838
  SE: (0.100) (0.196)   —       —     (0.086) (0.137) (0.074)   —   ; (0.192)
  tr(H⁻¹J) = 35.21; npllk = −808; CLIC = −1546; CLBIC = −1476

HR1t (Gauss. vine)
  T1:  0.826  −0.074   0.820   0.845   0.862   0.834   0.344; γ = 1.491
  SE: (0.066) (0.714) (0.053) (0.074) (0.078) (0.082) (0.346); (0.252)
  tr(H⁻¹J) = 31.70; npllk = −795; CLIC = −1527; CLBIC = −1464

HR2t (Gauss. vine)
  T1:  0.841   0.105   0.811   0.783   0.788   0.761   0.297
  SE: (0.047) (0.295) (0.045) (0.068) (0.093) (0.081) (0.153)
  T2: −1.000  −1.000  −0.481   0.221  −0.011  −0.836; γ = 1.781
  SE:   —       —     (0.339) (0.243) (0.215) (0.258); (0.170)
  tr(H⁻¹J) = 34.23; npllk = −810; CLIC = −1551; CLBIC = −1483

HR1t (Stn. order)
  T1:  0.625   0.588   0.892   0.912   0.902   0.887   0.887; γ = 1.212
  SE: (0.129) (0.229) (0.057) (0.056) (0.056) (0.057) (0.067); (0.282)
  tr(H⁻¹J) = 29.44; npllk = −798; CLIC = −1538; CLBIC = −1479

HR2t (Stn. order)
  T1:  0.335   0.380   0.772   0.798   0.791   0.818   0.843
  SE: (0.149) (0.160) (0.082) (0.086) (0.064) (0.045) (0.046)
  T2: −0.832  −0.905   0.027   0.213  −0.461  −1.000; γ = 1.743
  SE: (0.541) (0.352) (0.219) (0.263) (0.331)   —   ; (0.149)
  tr(H⁻¹J) = 35.68; npllk = −809; CLIC = −1547; CLBIC = −1476

Saturated HR
  There are a total of 28 parameters, which are omitted here.
  tr(H⁻¹J) = 41.29; npllk = −814; CLIC = −1544; CLBIC = −1462

Table 3.8: Fitting results for the Fraser River flows data using multivariate extreme value copulas.
The models are EV: extreme value factor copula and HR: structured Hüsler-Reiss copula, where 1f and 2f indicate the number of factors, and 1t and 2t the level of truncation for vines. The saturated Hüsler-Reiss copula (last row) is the unstructured distribution where all $\binom{8}{2} = 28$ parameters are allowed to vary independently. The SEs are based on the pairwise log-likelihood with the univariate parameters held fixed from the first stage GEV estimation. The last 4 columns are respectively: (1) trace of the matrix H⁻¹J, which can be considered as the penalty on the pairwise likelihood; (2) negative pairwise log-likelihood; (3) composite likelihood information criterion; (4) composite likelihood Bayesian information criterion.

factor Hüsler-Reiss model has the highest negative pairwise log-likelihood as well as both CLIC and CLBIC, which are even higher than those of the saturated model; the 1-factor Hüsler-Reiss model may thus be inferior to the others for this data set. Meanwhile, the vine diagrams for the Hüsler-Reiss vine models are plotted in Figures 3.4 and 3.5. The former is fitted using the vine obtained from the Gaussian analysis, while the latter is fitted using a D-vine that follows the order of the stations from upstream to downstream. In Figure 3.4, we observe that the dependence strengths between stations 8 and 3, as well as stations 2 and 1, are quite a lot weaker than those between the other neighbours. In Figure 3.5, where the stations are in their natural order, we observe somewhat weaker dependence between stations 3 and 2, as well as 2 and 1. This pattern is in agreement with the observation in Table 3.5. It should be noted that the correlations implied by the truncated vine models are not directly comparable to the correlations of the normal scores of the original data, due to the limit argument in the derivation of the Hüsler-Reiss copula and the role of the ρij's as correlations of the underlying normal variates instead.
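Partial correlations given a single conditioning variable, used in the comparison with the second-tree coefficients, follow from the standard recursion; a sketch using two triples of values from Table 3.5:

```python
import math

def partial_corr(r_jk, r_jm, r_km):
    """Partial correlation of j and k given m, from the three pairwise correlations."""
    return (r_jk - r_jm * r_km) / math.sqrt((1.0 - r_jm**2) * (1.0 - r_km**2))

# Table 3.5 values: stations (1,3 | 2): r13 = 0.659, r12 = 0.781, r23 = 0.749;
# stations (7,8 | 6): r78 = 0.758, r76 = 0.923, r86 = 0.920.
r13_2 = partial_corr(0.659, 0.781, 0.749)  # close to the reported 0.181
r78_6 = partial_corr(0.758, 0.923, 0.920)  # close to the reported -0.603
```

The small discrepancies from the reported values come only from the rounding of the correlations in Table 3.5.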
This can be illustrated through the fitted coefficients of the second tree, where multiple strong negative partial correlations are obtained. The partial correlations of the observed normal scores are (r13;2, r24;1, r15;4, r47;5, r56;7, r78;6) = (0.181, −0.110, 0.158, 0.204, −0.031, −0.603). These values are generally larger than the coefficients of the second tree in Figure 3.5, but the ranks are similar with the exception of r13;2, for which the fitted coefficient is strongly negative.

Finally, we observe that the saturated Hüsler-Reiss model is competitive against some of the poorer parsimonious models. It appears that the dependence structure of our data could be more complex than a 1-factor or 1-truncated vine structure can handle, but the gain of using a parsimonious model still outweighs the loss of information for some structures.

1-factor                    2-factor
Station   Factor 1          Station   Factor 1   Factor 2
3         −1.000            3         −0.009     −1.000
2         −0.225            2          0.140     −0.195
1          0.690            1          0.060      0.720
4          0.831            4          0.272      0.962
5          0.870            5         −0.081      0.832
7          0.870            7         −0.406      0.841
6          0.648            6         −0.824      0.549
8          0.192            8         −1.000      0.000

Table 3.9: Loadings of the Hüsler-Reiss 1- and 2-factor models. Coefficients larger than 0.7 in absolute value are shown in boldface.

[Figure 3.4: 1-truncated model (top), first tree 2 –(0.344)– 1 –(0.834)– 4 –(0.862)– 5 –(0.845)– 7 –(0.820)– 6 –(0.826)– 8 –(−0.074)– 3; 2-truncated model (bottom), first-tree weights 0.297, 0.761, 0.788, 0.783, 0.811, 0.841, 0.105 along the same path, second-tree weights −0.836, −0.011, 0.221, −0.481, −1.000, −1.000.]

Figure 3.4: Vine diagram for the fitted Hüsler-Reiss 1- (top) and 2-truncated (bottom) vine models using the vine suggested from the Gaussian analysis

[Figure 3.5: 1-truncated model (top), first tree 3 –(0.625)– 2 –(0.588)– 1 –(0.892)– 4 –(0.912)– 5 –(0.902)– 7 –(0.887)– 6 –(0.887)– 8; 2-truncated model (bottom), first-tree weights 0.335, 0.380, 0.772, 0.798, 0.791, 0.818, 0.843 along the same path, second-tree weights −0.832, −0.905, 0.027, 0.213, −0.461, −1.000.]

Figure 3.5: Vine diagram for the fitted Hüsler-Reiss 1- (top) and 2-truncated (bottom) vine models using a D-vine following the relative positions of the stations

3.7.2 United States stock returns data

The second example we consider is on the returns of selected stocks traded on the US stock exchanges.
Extreme value theory is widely used in the financial sector, one reason being that it provides the theoretical foundation for the modelling of value-at-risk (VaR) on stock returns. The VaR is an extreme quantile of the distribution for the returns; usually the quantile corresponding to a loss is of interest. Data for the extreme tails are often sparse, making empirical quantiles unreliable. Tsay (2010) illustrates the use of univariate extreme value theory on monthly minima of daily log returns. The log return for day t is defined as rt = log(Pt/Pt−1), where Pt is the closing stock price on day t. When Pt/Pt−1 is close to 1, rt is close to the percentage change (Pt − Pt−1)/Pt−1.

To highlight the use of the extreme value factor copula model, we selected 7 major stocks in the pharmaceutical sector. They are: (1) GlaxoSmithKline PLC (GSK); (2) Johnson & Johnson (JNJ); (3) Eli Lilly and Co (LLY); (4) Merck & Co., Inc. (MRK); (5) Mylan NV (MYL); (6) Novartis AG (NVS); and (7) Pfizer Inc. (PFE). The observations are bimonthly minima of daily log returns between January 1997 and October 2016, for a total of 119 observations. We choose bimonthly minima as the resulting series show weaker autocorrelations than monthly minima. Returns prior to 1997 are not used, to reduce the effect of potential nonstationarity. For example, the advent of modern computing technology in the 1980s has seen a rapid growth in trading volume and frequency (and thus volatility), and is thought to be a potential factor underlying the 1987 stock market crash (Carlson (2007)). A joint extreme treatment of stock returns may be of use when the weights of different assets are to be selected to achieve an objective, such as maximizing expected returns, subject to restrictions on the VaR of the portfolio (Smith (2002)).

We negate the minimum returns in the subsequent analysis. They generally exhibit upper tail dependence, i.e., extreme losses tend to occur together.
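The construction of the series (daily log returns, bimonthly block minima, then negation so that large values correspond to extreme losses) can be sketched as follows; the synthetic price path and the 42-day block length (roughly two months of trading days) are illustrative assumptions.

```python
import numpy as np

def log_returns(prices):
    """r_t = log(P_t / P_{t-1}) for a series of closing prices."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices))

def block_minima(x, block_size):
    """Minimum over consecutive non-overlapping blocks of the series."""
    n_blocks = len(x) // block_size
    return np.asarray(x[: n_blocks * block_size]).reshape(n_blocks, block_size).min(axis=1)

# Illustrative geometric-random-walk prices: 10 blocks of 42 trading days.
rng = np.random.default_rng(7)
prices = 100.0 * np.exp(np.cumsum(rng.normal(0.0005, 0.02, size=42 * 10)))

r = log_returns(np.concatenate([[100.0], prices]))
negated_minima = -block_minima(r, 42)  # negated: large values = extreme losses
```

The negated block minima are then treated as block maxima, so that the GEV/EV-copula machinery for maxima applies directly.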
The pairwise scatterplots of the normal scores are shown in Figure 3.6, while the correlations are given in Table 3.10.

Stock   GSK    JNJ    LLY    MRK    MYL    NVS    PFE
GSK    1.000  0.567  0.450  0.374  0.375  0.579  0.469
JNJ    0.567  1.000  0.501  0.464  0.477  0.651  0.636
LLY    0.450  0.501  1.000  0.410  0.434  0.458  0.542
MRK    0.374  0.464  0.410  1.000  0.393  0.485  0.498
MYL    0.375  0.477  0.434  0.393  1.000  0.492  0.398
NVS    0.579  0.651  0.458  0.485  0.492  1.000  0.502
PFE    0.469  0.636  0.542  0.498  0.398  0.502  1.000

Table 3.10: Correlations of the normal scores for the US stocks data

Model        # param.   Discrepancy: Robs and Rmod    AIC    BIC
                        D         Maximum   Average
1-factor         7      0.126     0.065     0.033     2050   2070
2-factor        13      0.040     0.066     0.016     2052   2088
3-factor        18      0.015     0.038     0.008     2059   2109
1-truncated      6      0.490     0.324     0.147     2092   2109
2-truncated     11      0.121     0.141     0.041     2058   2088
3-truncated     15      0.023     0.085     0.010     2054   2096
4-truncated     18      0.001     0.010     0.001     2058   2108
5-truncated     20      < 0.001   0.005     < 0.001   2062   2117

Table 3.11: Fitting results for the US stocks data using normal scores and Gaussian distribution

We then fit Gaussian factor and vine models to the normal scores; the results are shown in Table 3.11. Unlike the preceding example, here the factor models clearly outperform the vine models. The discrepancy between Robs and Rmod for the 1-factor model with 7 parameters is comparable to that for the 2-truncated vine model with 11 parameters.
This

[Figure 3.6: 7 × 7 pairwise scatterplot matrix of the normal scores for GSK, JNJ, LLY, MRK, MYL, NVS and PFE.]
Figure 3.6: Scatterplot of the normal scores for the US stocks data (negated minimum returns)

is also reflected by the superior AIC and BIC values
for the factor models; they suggest that a 1- or 2-factor structure may be good enough. The results here are consistent with intuition — there is no obvious ordering or tree structure among the stocks, but the returns of stocks in the same sector can be thought of as driven by some common factors such as the general economic environment and sector-specific drivers. The loadings of the 1- and 2-factor models are given in Table 3.12 and they seem to be consistent with the above explanation.

1-factor                2-factor
Stock   Factor 1        Stock   Factor 1   Factor 2
GSK      0.682          GSK      0.640      0.270
JNJ      0.823          JNJ      0.685      0.432
LLY      0.650          LLY      0.498      0.399
MRK      0.610          MRK      0.488      0.356
MYL      0.597          MYL      0.561      0.224
NVS      0.771          NVS      0.784      0.253
PFE      0.735          PFE      0.336      0.939

Table 3.12: Loadings of the Gaussian 1- and 2-factor models for the normal scores of the US stocks data. Coefficients larger than 0.7 in absolute value are shown in boldface.

Next, we fit a univariate GEV distribution to each margin and transform the observations to unit uniform before applying the extreme value factor copula and structured Hüsler-Reiss models. The truncated vines used for the Hüsler-Reiss model are those suggested from the partial correlation vines from the correlation matrix of normal scores. We additionally fit a 1-factor t-EV model with 3 degrees of freedom. The parameter estimates, standard errors (conditional on the GEV univariate estimation stage) and various auxiliary quantities are reported in Table 3.13. For this data set, the 1-factor extreme value copulas fit better than the Hüsler-Reiss models, as can be seen from their smaller CLIC and CLBIC values. However, the 1-factor t-EV model appears to be superior.
From the fitted parameters of the 1-factor models, the returns of Johnson & Johnson are more strongly correlated to the latent factor. Among the Hüsler-Reiss models, the 1-truncated vine (Figure 3.7) appears to be the best. Both Novartis and Pfizer connect to three other stocks due to their stronger dependence in extreme returns.

We run separate analyses on stocks in the banking and consumer staples sectors and find that the factor structure fits better than the vine structure in general. These results suggest that the factor structure for daily returns is a plausible modelling assumption among major stocks in the same sector, and hence diversification within one sector is less useful. With the fitted marginal and dependence models, it is then possible to arrive at a model-based VaR for a portfolio consisting of the stocks involved. The dependence model also allows us to gain insight on the joint tail behaviour, for all or a subset of the stocks.

[Figure 3.7: tree on the 7 stocks (nodes labelled 1–7) with edge weights 0.406, 0.600, 0.641, 0.289, 0.237 and 0.565; the exact topology is not recoverable from the extracted text, but Novartis (6) and Pfizer (7) each connect to three other stocks.]

Figure 3.7: Diagram for the fitted Hüsler-Reiss 1-truncated vine model using the vine suggested from the Gaussian analysis.

Model; estimates (SE) of θi, δi or αi, and γ; then tr(H⁻¹J), npllk, CLIC, CLBIC:

EV1f (Burr)
  F1:  1.827   2.744   1.673   1.578   1.589   2.179   2.017
  SE: (0.185) (0.758) (0.188) (0.165) (0.134) (0.283) (0.247)
  tr(H⁻¹J) = 22.34; npllk = −331; CLIC = −617; CLBIC = −555

EV1f (Dagum)
  F1:  1.093   1.857   0.936   0.830   0.853   1.470   1.277
  SE: (0.189) (0.613) (0.204) (0.179) (0.140) (0.288) (0.250)
  tr(H⁻¹J) = 24.29; npllk = −321; CLIC = −594; CLBIC = −526

HR1f
  F1:  0.720   0.891   0.580   0.500   0.540   0.824   0.773; γ = 0.768
  SE: (0.224) (0.089) (0.447) (0.451) (0.475) (0.183) (0.245); (0.308)
  tr(H⁻¹J) = 24.81; npllk = −306; CLIC = −562; CLBIC = −493

HR2f
  F1: −0.148  −0.428  −0.293  −1.000  −0.292  −0.394  −0.485
  SE: (0.288) (0.261) (0.161)   —     (0.230) (0.259) (0.164)
  F2:  0.768   0.856   0.508   0.000   0.459   0.785   0.692; γ = 0.775
  SE: (0.179) (0.046) (0.154)   —     (0.298) (0.108) (0.064); (0.151)
  tr(H⁻¹J) = 28.09; npllk = −306; CLIC = −555; CLBIC = −477

tEV1f (ν = 3)
  F1:  0.790   0.915   0.749   0.720   0.692   0.859   0.841
  SE: (0.058) (0.041) (0.069) (0.087) (0.077) (0.041) (0.047)
  tr(H⁻¹J) = 17.31; npllk = −345; CLIC = −656; CLBIC = −608

HR1t (Gauss. vine)
  T1:  0.641   0.600   0.565   0.406   0.237   0.289; γ = 0.918
  SE: (0.122) (0.130) (0.128) (0.164) (0.265) (0.283); (0.117)
  tr(H⁻¹J) = 21.26; npllk = −307; CLIC = −572; CLBIC = −513
HR2t (Gauss. vine)
  T1:  0.593   0.452   0.742   0.382   0.680   0.477
  SE: (0.269) (0.347) (0.172) (0.363) (0.181) (0.290)
  T2:  0.470   0.534   0.460   0.309   0.261; γ = 0.754
  SE: (0.153) (0.120) (0.242) (0.158) (0.213); (0.222)
  tr(H⁻¹J) = 29.79; npllk = −309; CLIC = −558; CLBIC = −475

Saturated HR
  There are a total of 21 parameters, which are omitted here.
  tr(H⁻¹J) = 34.88; npllk = −310; CLIC = −549; CLBIC = −452

Table 3.13: Fitting results for the US stocks data using multivariate extreme value copulas. The models are EV: extreme value factor copula, HR: structured Hüsler-Reiss copula, and tEV: structured t-EV copula, where 1f and 2f indicate the number of factors, and 1t and 2t the level of truncation for vines. The saturated Hüsler-Reiss copula (last row) is the unstructured distribution where all $\binom{7}{2} = 21$ parameters are allowed to vary independently. The SEs are based on the pairwise log-likelihood with the univariate parameters held fixed from the first stage GEV estimation. The last 4 columns are respectively: (1) trace of the matrix H⁻¹J, which can be considered as the penalty on the pairwise likelihood; (2) negative pairwise log-likelihood; (3) composite likelihood information criterion; (4) composite likelihood Bayesian information criterion.

Chapter 4

Numerical integration methods for extreme value factor copula models

This chapter is devoted to the technical details regarding the numerical integration involved in computing the bivariate densities of the 1-factor extreme value copula. The bivariate density for general extreme value copulas is given by

\[ c_{12}(u_1, u_2) = e^{-A} \left( A^{(1)} A^{(2)} - A^{(12)} \right) / (u_1 u_2), \]

where A is the exponent function and A^{(S)} is the partial derivative of A with respect to the element(s) in the set S.
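As a concrete check of this density formula, the sketch below uses the Gumbel (logistic) exponent function A(w₁, w₂) = (w₁^θ + w₂^θ)^{1/θ}, whose partial derivatives are available in closed form, and compares the formula against a finite-difference mixed partial of C(u₁, u₂) = exp{−A(−log u₁, −log u₂)}. The Gumbel choice is my illustrative stand-in; the thesis's factor-copula exponent functions require the numerical integrals developed in this chapter.

```python
import math

theta = 2.5  # Gumbel dependence parameter (> 1)

def A(w1, w2):
    return (w1**theta + w2**theta) ** (1.0 / theta)

def A1(w1, w2):  # partial derivative dA/dw1; A2 follows by symmetry of A
    return w1**(theta - 1) * (w1**theta + w2**theta) ** (1.0 / theta - 1)

def A12(w1, w2):  # mixed partial d^2 A / dw1 dw2 (negative for theta > 1)
    return (1 - theta) * (w1 * w2)**(theta - 1) * (w1**theta + w2**theta) ** (1.0 / theta - 2)

def C(u1, u2):
    """Extreme value copula: C(u1,u2) = exp(-A(-log u1, -log u2))."""
    return math.exp(-A(-math.log(u1), -math.log(u2)))

def density(u1, u2):
    """c12(u1,u2) = e^{-A} (A^(1) A^(2) - A^(12)) / (u1 u2)."""
    w1, w2 = -math.log(u1), -math.log(u2)
    return math.exp(-A(w1, w2)) * (A1(w1, w2) * A1(w2, w1) - A12(w1, w2)) / (u1 * u2)

# Finite-difference check of c12 = d^2 C / (du1 du2) at (0.4, 0.7).
h = 1e-5
u1, u2 = 0.4, 0.7
fd = (C(u1 + h, u2 + h) - C(u1 + h, u2 - h)
      - C(u1 - h, u2 + h) + C(u1 - h, u2 - h)) / (4 * h * h)
```

Agreement between `density` and `fd` confirms the sign conventions: A^{(12)} is negative, so A^{(1)}A^{(2)} − A^{(12)} is positive and the density is well defined.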
For the 1-factor extreme value copula, we have the following results in Section 3.5:

\[ A = A(w_1, w_2) = \int_0^\infty \left[ 1 - \left(1 - b_{1|V}(w_1|x)\right)\left(1 - b_{2|V}(w_2|x)\right) \right] dx; \]
\[ A^{(1)} = A^{(1)}(w_1, w_2) = \int_0^\infty b'_{1|V}(w_1|x) \left(1 - b_{2|V}(w_2|x)\right) dx; \]
\[ A^{(2)} = A^{(2)}(w_1, w_2) = \int_0^\infty b'_{2|V}(w_2|x) \left(1 - b_{1|V}(w_1|x)\right) dx; \]
\[ A^{(12)} = A^{(12)}(w_1, w_2) = -\int_0^\infty b'_{1|V}(w_1|x)\, b'_{2|V}(w_2|x)\, dx, \]

where b′ is the derivative of b with respect to the first argument, and wi = − log ui, i = 1, 2. Although these integrals can be evaluated by numerical adaptive integration, they are not precise enough to allow accurate evaluation of the gradient and Hessian of the composite likelihood, and thus the parameter and standard error estimates are unstable. This is partly due to the fact that the integrand often has a somewhat heavy tail, especially for A, and partly because of the algebraic operations on A and its derivatives that magnify numerical imprecision.

A first step to enhancing stability is to transform w1 and w2 so that they always sum to 1. For any w1 and w2, obtain w*₁ = w1/(w1 + w2) and w*₂ = w2/(w1 + w2), which are used for calculation. Because A, A^{(i)} (i = 1, 2) and A^{(12)} are respectively homogeneous functions of orders 1, 0 and −1, we have A(w1, w2) = (w1 + w2) A(w*₁, w*₂), A^{(i)}(w1, w2) = A^{(i)}(w*₁, w*₂) for i = 1, 2, and A^{(12)}(w1, w2) = (w1 + w2)⁻¹ A^{(12)}(w*₁, w*₂). Such a transformation has the best effect when w1 and w2 are close to 0, i.e., when u1 and u2 are close to 1, since this will bring them away from the boundary where numerical difficulties are the most likely to happen. In the rest of the discussion, we assume that w1 and w2 have already been transformed in this manner.

To further improve the stability of integral computation, we split the integral into the regions x ∈ [0, 1) and x ∈ [1, ∞), guided by the difference in asymptotic behaviour of the integrands as x → 0⁺ and x → ∞.
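The order-1 homogeneity behind the normalization above can be verified numerically for any exponent function; the sketch uses the closed-form Gumbel exponent function as an illustrative stand-in (the property A(cw₁, cw₂) = cA(w₁, w₂) holds for every extreme value copula).

```python
theta = 2.0

def A(w1, w2):
    """Gumbel exponent function; homogeneous of order 1, like any EV exponent function."""
    return (w1**theta + w2**theta) ** (1.0 / theta)

w1, w2 = 0.03, 0.09        # small w's, i.e. u's near 1
s = w1 + w2
w1s, w2s = w1 / s, w2 / s  # normalized so that w1* + w2* = 1

# Order-1 homogeneity: A(w1, w2) = (w1 + w2) * A(w1*, w2*).
lhs = A(w1, w2)
rhs = s * A(w1s, w2s)
```

Evaluating A only at arguments summing to 1 keeps the integrand away from the numerically delicate region where both w's are tiny.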
The Gauss-Laguerre quadrature (see, e.g., Salzer and Zucker (1949) and page 890 of Abramowitz and Stegun (1970)) is applied to these integrals. Gaussian quadrature methods approximate integrals of the form ∫ h(x) f(x) dx, where f(x) is a density function, by Σ_{k=1}^{n_q} w_k h(x_k), where the x_k's are the n_q quadrature nodes and the w_k's are the corresponding quadrature weights (Stroud and Secrest (1966)). The evaluation is exact if h is a polynomial of degree at most 2n_q − 1.

For an integrand h(x), with the transformation x = e^{ηy} for a constant η > 0, we obtain

    ∫_0^∞ h(x) dx = ∫_0^1 h(x) dx + ∫_1^∞ h(x) dx
                  = ∫_{−∞}^0 h(e^{ηy}) · ηe^{ηy} dy + ∫_0^∞ h(e^{ηy}) · ηe^{ηy} dy
                  = ∫_0^∞ h(e^{−ηy}) · ηe^{−(η−1)y} · e^{−y} dy + ∫_0^∞ h(e^{ηy}) · ηe^{(η+1)y} · e^{−y} dy,   (4.1)

so that each integral becomes an expectation with respect to a standard exponential random variable, to which the Gauss-Laguerre quadrature applies. Choosing η appropriately can stabilize the evaluation, and different choices of η can be made for the two integrals. One practical challenge associated with the transformation (4.1) is that e^{(η+1)y} can grow rapidly, especially when η is large; numerical overflow is very likely when the nodes are large enough, say y > 100. In this situation, a first-order approximation is applied directly to the term h(e^{ηy}) · ηe^{(η+1)y}. It is also found that, for the region x ∈ [1, ∞) of the original integrand, stability can be further improved by another transformation that maps the range of integration to [0, 1] and allows the use of the Gauss-Jacobi or Gauss-Legendre quadrature methods.
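A minimal sketch of the split-and-transform scheme (4.1) with Gauss-Laguerre nodes follows. The integrand e^{−x} is a toy stand-in whose integral over [0, ∞) is 1 (not the copula integrand), and η = 1 is used for both pieces; for larger η the overflow guard described above would be needed:

```python
import numpy as np
from numpy.polynomial.laguerre import laggauss

def integrate_0_inf(h, eta1=1.0, eta2=1.0, nq=60):
    """Approximate int_0^inf h(x) dx via the split at x = 1 in (4.1):
      x in [0,1):  x = exp(-eta1*y) -> h(e^{-eta1 y}) * eta1 * e^{-(eta1-1)y}
      x in [1,oo): x = exp(+eta2*y) -> h(e^{+eta2 y}) * eta2 * e^{+(eta2+1)y}
    Each transformed integrand is then summed against the Exp(1) weight
    using Gauss-Laguerre nodes/weights.  With large eta2 or many nodes the
    factor e^{(eta2+1)y} overflows, which is where the text's first-order
    approximation would take over."""
    y, w = laggauss(nq)
    part1 = np.sum(w * h(np.exp(-eta1 * y)) * eta1 * np.exp(-(eta1 - 1.0) * y))
    part2 = np.sum(w * h(np.exp(eta2 * y)) * eta2 * np.exp((eta2 + 1.0) * y))
    return part1 + part2

# sanity check on the toy integrand: int_0^inf e^{-x} dx = 1
val = integrate_0_inf(lambda x: np.exp(-x))
assert abs(val - 1.0) < 1e-2
```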
Bivariate copulas are assumed throughout the chapter.

4.1 Burr 1-factor extreme value copula

For the 1-factor extreme value copula with Burr conditional tail dependence functions and parameters θ_1, θ_2 > 1, we have, for i = 1, 2, that

    b_{i|V}(w_i|x) = 1 − [(w_i/x)^{θ_i} + 1]^{1/θ_i − 1};
    b′_{i|V}(w_i|x) = ((θ_i − 1)/x) (w_i/x)^{θ_i − 1} [(w_i/x)^{θ_i} + 1]^{1/θ_i − 2}.

The A function. The A function is given by

    A(w_1, w_2) = ∫_0^∞ (1 − [(w_1/x)^{θ_1} + 1]^{1/θ_1 − 1} [(w_2/x)^{θ_2} + 1]^{1/θ_2 − 1}) dx.   (4.2)

Let h(x) be the integrand of (4.2). First assume θ_1 < θ_2; results for the reverse case are readily obtained by interchanging the subscripts. We first consider the second integral of (4.1), corresponding to the region x ∈ [1, ∞). When x → ∞, we apply the Taylor series expansion to 1 − b_{i|V}(w_i|x) and obtain

    h(x) ≈ 1 − Π_{i=1}^2 [1 + (1/θ_i − 1)(w_i/x)^{θ_i} + (1/2)(1/θ_i − 1)(1/θ_i − 2)(w_i/x)^{2θ_i} + o(x^{−2θ_i})]
         = (1 − 1/θ_1)(w_1/x)^{θ_1} + o(x^{−θ_1}).   (4.3)

Hence the tail of h(x) behaves like x^{−θ_1}. The transformation h(e^{ηy}) · ηe^{(η+1)y} therefore behaves like C_0 e^{(η+1−ηθ_1)y}, where C_0 is a quantity not involving y. (Hereafter, we use C_i to denote various constants not depending on x or y.) This transformation is non-increasing in the limit (i.e., as y → ∞) if and only if η + 1 − ηθ_1 ≤ 0, or η ≥ (θ_1 − 1)^{−1}. By setting η = (θ_1 − 1)^{−1}, we can ensure that h(e^{ηy}) · ηe^{(η+1)y} tends to a constant as y → ∞, while at the same time reducing the possibility of numerical overflow in the factor e^{(η+1)y}. Note that this value of η also works when θ_1 = θ_2, as the analogous expansion of (4.3) has the same dominating term x^{−θ_1}.

Next, for the first integral of (4.1), which corresponds to the region x ∈ [0, 1) and transformed y > 0, note that

    h(e^{−ηy}) · ηe^{−(η−1)y} = (1 − [(w_1 e^{ηy})^{θ_1} + 1]^{1/θ_1 − 1} [(w_2 e^{ηy})^{θ_2} + 1]^{1/θ_2 − 1}) ηe^{−(η−1)y} ≤ ηe^{−(η−1)y},

since 0 ≤ [(w_1 e^{ηy})^{θ_1} + 1]^{1/θ_1 − 1} [(w_2 e^{ηy})^{θ_2} + 1]^{1/θ_2 − 1} ≤ 1. Using η = 1 ensures that 0 ≤ h(e^{−ηy}) · ηe^{−(η−1)y} ≤ 1, improving stability in the calculation of the integral.

The first derivatives A^{(1)} and A^{(2)}.
The A(1) function is given byA(1)(w1, w2) =ˆ ∞0{θ1 − 1x(w1x)θ1−1 [(w1x)θ1+ 1]1/θ1−2 [(w2x)θ2+ 1]1/θ2−1}dx.(4.4)Let h1(x) be this integrand and θ1 ≤ θ2. For the region x ∈ [1,∞), the Taylor seriesexpansion result above implies that, as x→∞,h1(x) ≈ C1x−θ1[1 + C2(w1x)θ1+ o(x−θ1)] [1 + C3(w2x)θ2+ o(x−θ2)]= C4x−θ1+o(x−θ1).Hence the tail behaviour of h1(x) is the same as that of h(x); we can thus use η = (θ1 − 1)−1.For x ∈ [0, 1) with transformed y > 0, the transformation h1(e−ηy) · ηe−(η−1)y is equalto(θ1 − 1)wθ1−11 eηθ1y[(w1eηy)θ1 + 1]1/θ1−2 [(w2eηy)θ2 + 1]1/θ2−1ηe−(η−1)y.As y →∞, the tail of this expression behaves likeeηθ1y · eηθ1y(1/θ1−2) · eηθ2y(1/θ2−1) · e−(η−1)y = e[η(1−θ1−θ2)+1]y. (4.5)Since 2− θ1 − θ2 ≤ 0, using η = 1 will ensure that h1(e−ηy) · ηe−(η−1)y is non-increasing iny in the limit.As for A(2), a swap of indices in (4.4) implies h2(x) = C5x−θ2 + o(x−θ2), where h2(x)is the integrand for A(2). As x → ∞ (i.e., y → ∞), the transformation h2(eηy) · ηe(η+1)yhas the same tail behaviour as e(η+1−ηθ2)y. The choice η = (θ1 − 1)−1 can again be usedas η + 1− ηθ2 = (θ1 − θ2)/(θ1 − 1) ≤ 0 by the assumption that θ1 ≤ θ2. The transformedintegrand is therefore non-increasing in y in the limit.For x ∈ [0, 1), the same result (4.5) applies as the expression is symmetric in θ1 and θ2.The second derivative A(12). The second derivative is given byA(12)(w1, w2) = −ˆ ∞02∏i=1{θi − 1x(wix)θi−1 [(wix)θi+ 1]1/θi−2}dx. (4.6)Let h12(x) be the integrand of (4.6). As x → ∞, a similar Taylor series expansion resultsinh12(x) = C6x−(θ1+θ2) + o(x−(θ1+θ2)).Hence the transformation h12(eηy) · ηe(η+1)y has the same tail behaviour as e[η(1−θ1−θ2)+1]y.Either η = (θ1 − 1)−1 or 1 will work here; the former choice leads to a more rapid declineof the transformed integrand when θ1 < 2, but is more prone to numerical overflow for the74term e(η+1)y. 
However, this can be overcome through a first-order approximation of thewhole integrand.As x → 0+, we have h(x) ≈ C7x−(θ1+θ2) · x2θ1−1 · x2θ2−1 = C7xθ1+θ2−2. The transfor-mation h12(e−ηy) · ηe−(η−1)y has the tail behaviour of e[η(1−θ1−θ2)+1]y, the same as that forthe transformation for the part x ∈ [1,∞). The choices of η in the preceding paragraph aretherefore also applicable to the part x ∈ [0, 1).First-order approximation of h(eηy) · ηe(η+1)y as y → ∞. As noted above, whenevaluating the integral for the part x ∈ [1,∞), the quantity e(η+1)y may be too big tohandle numerically when y is large. When this occurs, a first-order tail approximation ofthe whole transformed integrand is necessary. The following shows the derivation for eachintegrand.• For A, the expression (4.3) for h(x) implies that, if θ1 < θ2 and x = eηy is large,h(eηy) · ηe(η+1)y ≈(1− 1θ1)wθ11 e−ηθ1y · ηe(η+1)y = θ1 − 1θ1wθ11 ηe[η(1−θ1)+1]y =wθ11θ1with the choice of η = (θ1 − 1)−1. If θ1 = θ2 = θ, we haveh(x) ≈(1− 1θ)(wθ1 + wθ2xθ),and the approximation becomes h(eηy) · ηe(η+1)y ≈ (wθ1 + wθ2)/θ.• For A(1), the expansion of h1(x) (integrand of (4.4)) ish1(x) = (θ1 − 1)wθ1−11 x−θ1 [1 + o(1)] [1 + o(1)] ≈ (θ1 − 1)wθ1−11 x−θ1 .The resulting approximation ish1(eηy) · ηe(η+1)y ≈ (θ1 − 1)wθ1−11 ηe[η(1−θ1)+1]y = wθ1−11with the same choice of η as that for approximating A.• For A(2), the transformed integrand is decreasing in y. At the range of values wheree(η+1)y overflows, h2(eηy) · ηe(η+1)y is very close to zero for all practical purposes ifθ1 < θ2. If θ1 = θ2 = θ, the approximation for the integrand of A(1) remains thesame, while that for A(2) becomes wθ−12 instead of zero.• For A(12), h12(eηy) ·ηe(η+1)y has the same tail behaviour as e[η(1−θ1−θ2)+1]y, where h12is the integrand of (4.6). Since η(1− θ1 − θ2) + 1 < 0 when η = (θ1 − 1)−1 or 1, theapproximation is again zero. 
It remains zero even when θ1 = θ2.75In terms of programming implementation, for i = 1, 2 we check whether θi = min(θ1, θ2).If this is true, we increment the approximation for A and A(i) (for large y) by wθii /θi andwθi−1i , respectively.Gauss-Jacobi quadrature. Although a first-order approximation is sufficient in mostcases, the facts that h(eηy) → 0 (or hi(eηy) → 0) and e(η+1)y → ∞ as y → ∞ may causenumerical instability, especially when e(η+1)y is very large but does not overflow. Here wepresent another transformation that may improve the evaluation of the integral for theregion x ∈ [1,∞).Consider the transformation s = x−θ1 , so that dx = − (s−(θ1+1)/θ1/θ1) ds. Let A+,A(1)+ , A(2)+ and A(12)+ be the respective integrals with lower limit equal to 1 instead of 0.Then we haveA(1)+ =ˆ ∞1{(θ1 − 1)wθ1−11 x−θ1(wθ11 x−θ1 + 1)1/θ1−2 (wθ22 x−θ2 + 1)1/θ2−1}dx=ˆ 10{θ1 − 1θ1wθ1−11 s−1/θ1 (wθ11 s+ 1)1/θ1−2 (wθ22 sθ2/θ1 + 1)1/θ2−1}ds= B(1− 1θ1, 1)· θ1 − 1θ1wθ1−11 E[(wθ11 B1 + 1)1/θ1−2 (wθ22 Bθ2/θ11 + 1)1/θ2−1]= wθ1−11 E[(wθ11 B1 + 1)1/θ1−2 (wθ22 Bθ2/θ11 + 1)1/θ2−1], (4.7)where B(·, ·) is the Beta function and B1 is a Beta(1− θ−11 , 1)random variable. Theexpectation can be evaluated via Gauss-Jacobi quadrature. Using this transformation, thelikelihood of numerical overflow or imprecision due to rounding is greatly reduced, as therange of integration is [0, 1] rather than [0,∞) in the Gauss-Laguerre quadrature, and theintegrand(wθ11 B1 + 1)1/θ1−2 (wθ22 Bθ2/θ11 + 1)1/θ2−1is always in [0, 1]. Since there is noassumption about the relative magnitudes of θ1 and θ2 in the above derivation, we cansimply swap the indices in (4.7) to obtain an expression for A(2)+ . 
The expectation is thentaken with respect to a Beta(1− θ−12 , 1)random variable instead.Meanwhile, for the second derivative A(12)+ , we haveA(12)+=−ˆ ∞1{(θ1 − 1) (θ2 − 1)wθ1−11 wθ2−12 x−(θ1+θ2)(wθ11 x−θ1 + 1)1/θ1−2 (wθ22 x−θ2 + 1)1/θ2−2}dx=−ˆ 10{(θ1 − 1) (θ2 − 1)θ1wθ1−11 wθ2−12 s(θ2−1)/θ1 (wθ11 s+ 1)1/θ1−2 (wθ22 sθ2/θ1 + 1)1/θ2−2}ds= (1− θ2)wθ1−11 wθ2−12 E[Bθ2/θ11(wθ11 B1 + 1)1/θ1−2 (wθ22 Bθ2/θ11 + 1)1/θ2−2],76which involves an integral with bounded integrand and range.A similar derivation can be performed on A+ as follows:A+ =ˆ ∞1[1− (wθ11 x−θ1 + 1)1/θ1−1 (wθ22 x−θ2 + 1)1/θ2−1]dx=ˆ 101θ1s−1−1/θ1[1− (wθ11 s+ 1)1/θ1−1 (wθ22 sθ2/θ1 + 1)1/θ2−1]ds.However, this integral cannot be condensed into an expectation with finite range and finiteintegrand everywhere in that range. Experimentation with this transformation does notproduce better results than the Gauss-Laguerre transformation or the adaptive integrationfunction integrate in R.Behaviour of the exponent function and its derivatives at the independencelimit. The transformations above can help us discover the behaviour of A and its deriva-tives at the independence limit, i.e., when θ1, θ2 → 1+. These results can also serve to verifythe validity of the transformations. To simplify calculations, we assume θ1 = θ2 = θ → 1+in the following (a similar result applies to distinct θ’s by letting θ2 = kθ1 for k > 0).For the region x ∈ [0, 1), observe that 0 ≤ h(x) ≤ 1. For the first derivative, theintegrand of A(1) (see (4.4)) ish1(x) = (θ − 1)wθ−11 x−θ(wθ1x−θ + 1)1/θ−2 (wθ2x−θ + 1)1/θ−1= (θ − 1)w−θ1 w1−θ2 x2(θ−1)(w−θ1 xθ + 1)1/θ−2 (w−θ2 xθ + 1)1/θ−1,i.e., 0 ≤ h1(x) ≤ (θ − 1)w−θ1 w1−θ2 . A similar result holds for h2(x), the integrand of A(2).For the second derivative, the integrand of A(12) ish12(x) = (θ − 1)2wθ−11 wθ−12 x−2θ(wθ1x−θ + 1)1/θ−2 (wθ2x−θ + 1)1/θ−2= (θ − 1)2w−θ1 w−θ2 x2(θ−1)(w−θ1 xθ + 1)1/θ−2 (w−θ2 xθ + 1)1/θ−2,i.e., 0 ≤ h12(x) ≤ (θ − 1)2w−θ1 w−θ2 . 
Since the bounds of these integrands are all integrable,the dominated convergence theorem applies. Hencelimθ→1+ˆ 10h(x; θ) dx =ˆ 10limθ→1+h(x; θ) dx = 0;limθ→1+ˆ 10hi(x; θ) dx =ˆ 10limθ→1+hi(x; θ) dx = 0, i = 1, 2;limθ→1+ˆ 10h12(x; θ) dx =ˆ 10limθ→1+h12(x; θ) dx = 0,where the h functions are written in this way to emphasize their dependence on θ.For the region x ∈ [1,∞), we have the following results:77• For A, we haveA+ =ˆ 101θs−1−1/θ[1− (wθ1s+ 1)1/θ−1 (wθ2s+ 1)1/θ−1] ds=1θB(1− 1θ, 1)E[1− (wθ1B1 + 1)1/θ−1 (wθ2B1 + 1)1/θ−1B1]= E[1− (wθ1B1 + 1)1/θ−1 (wθ2B1 + 1)1/θ−1(θ − 1)B1], (4.8)where B1 ∼ Beta(1 − θ−1, 1). As θ decreases towards 1, B1 becomes increasinglyconcentrated at zero, eventually becoming a point mass there. If we expand theterms inside the expectation (4.8) about zero, we observe1− (wθ1B1 + 1)1/θ−1 (wθ2B1 + 1)1/θ−1(θ − 1)B1=1− [1 + (θ−1 − 1)wθ1B1 + o(B1)] [1 + (θ−1 − 1)wθ2B1 + o(B1)](θ − 1)B1=− (θ−1 − 1) (wθ1 + wθ2)B1 + o(B1)(θ − 1)B1 → w1 + w2as B1 → 0+. Hence, A+(w1, w2)→ w1 + w2 as θ → 1+.• For A(1), we obtained A(1)+ = wθ−11 E[(wθ1B1 + 1)1/θ−2 (wθ2B1 + 1)1/θ−1]above. Since(wθ1B1 + 1)1/θ−2 (wθ2B1 + 1)1/θ−1 → 1 as B1 → 0+, A(1)+ (w1, w2) → w01 = 1 as θ →1+. This is also true for A(2)+ .• ForA(12), we obtainedA(12)+ = (1− θ)wθ−11 wθ−12 E[B1(wθ1B1 + 1)1/θ−2 (wθ2B1 + 1)1/θ−2]above. AsB1 → 0+, B1(wθ1B1 + 1)1/θ−2 (wθ2B1 + 1)1/θ−2 → 0. HenceA(12)+ (w1, w2)→0 as θ → 1+.Combining the limits for the two regions, we obtainA(w1, w2)→ w1 + w2; A(i)(w1, w2)→ 1, i = 1, 2; A(12)(w1, w2)→ 0as θ → 1+. 
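The Gauss-Jacobi representation (4.7) can be cross-checked against direct adaptive quadrature of the integrand of (4.4) over [1, ∞). The sketch below uses hypothetical parameter values, with θ_2/θ_1 = 2 so the transformed integrand is smooth on [0, 1]; scipy's `roots_jacobi` supplies the nodes and weights for the Beta(1 − 1/θ_1, 1) weight:

```python
import numpy as np
from scipy.special import roots_jacobi
from scipy.integrate import quad

th1, th2 = 1.5, 3.0          # hypothetical Burr parameters, th1 < th2
w1, w2 = 0.4, 0.6

def A1_plus_jacobi(nq=40):
    """A^(1)_+ via (4.7): w1^(th1-1) * E[g(B1)] with B1 ~ Beta(1 - 1/th1, 1).
    After mapping s = (1+x)/2, the Beta(a,1) expectation is a Gauss-Jacobi
    sum with weight (1+x)^(a-1) on (-1, 1)."""
    a = 1.0 - 1.0 / th1
    x, wq = roots_jacobi(nq, 0.0, a - 1.0)
    s = (1.0 + x) / 2.0
    g = (w1**th1 * s + 1.0)**(1.0/th1 - 2.0) \
        * (w2**th2 * s**(th2/th1) + 1.0)**(1.0/th2 - 1.0)
    return w1**(th1 - 1.0) * a * 2.0**(-a) * np.sum(wq * g)

def A1_plus_quad():
    """Direct adaptive quadrature of the integrand of (4.4) over [1, inf)."""
    def h1(x):
        return (th1 - 1.0) / x * (w1/x)**(th1 - 1.0) \
               * ((w1/x)**th1 + 1.0)**(1.0/th1 - 2.0) \
               * ((w2/x)**th2 + 1.0)**(1.0/th2 - 1.0)
    val, _ = quad(h1, 1.0, np.inf)
    return val

assert abs(A1_plus_jacobi() - A1_plus_quad()) < 1e-6
```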
The limit of A is consistent with the construction of the extreme value factorcopula; if the linking copulas between the latent and observed variables are independencecopulas, the resulting bivariate extreme value distribution of the observed variables U1and U2 should be the product of the marginal distributions, i.e., FU1,U2(u1, u2) = u1u2 =exp {log u1 + log u2} = exp {− (w1 + w2)} , so that A(w1, w2) = w1 + w2.784.2 Dagum 1-factor extreme value copulaFor the 1-factor extreme value copula with Dagum conditional tail dependence functionsand parameters δ1, δ2 > 0, we have, for i = 1, 2, thatbi|V (wi|x) =[1 + (wi/x)−δi]−1/δi−1; b′i|V (wi|x) =1 + δix(wix)−δi−1 [1 +(wix)−δi]−1/δi−2.The A function. The A function is given byA(w1, w2) =ˆ ∞0{1−(1−[1 +(w1x)−δ1]−1/δ1−1)(1−[1 +(w2x)−δ2]−1/δ2−1)}dx.(4.9)First, we assume δ1 < δ2 in the following discussion. Let h(x) be the integrand of (4.9). Asx→∞,h(x) ≈ 1−(1−[(w1x)−δ1]−1/δ1−1)(1−[(w2x)−δ2]−1/δ2−1)= wδ1+11 x−δ1−1 + o(x−δ1−1)(4.10)i.e., the tail of h(x) behaves like x−1−δ1 . Consider the operations in (4.1). For the regionx ∈ [1,∞) and y > 0, the transformation h(eηy) · ηe(η+1)y therefore behaves like e(1−ηδ1)y,which is non-increasing at the limit (i.e., as y → ∞) when 1 − ηδ1 ≤ 0, or η ≥ δ−11 .Setting η = δ−11 implies that h(eηy) · ηe(η+1)y tends to a constant as y →∞, and minimizesthe possibility of numerical overflow for the part e(η+1)y. Note that this also works whenδ1 = δ2, and the subscripts can be interchanged if δ1 > δ2.A Taylor series expansion reveals that, as x→ 0+, we haveh(x) = 1+2∏i=1[1 + δiδi(wix)−δi+ o(xδi)]= 1+(1 + δ1δ1)(1 + δ2δ2)w−δ11 w−δ22 xδ1+δ2+o(xδ1+δ2).Hence for the region x ∈ [0, 1) and y > 0,h(e−ηy)·ηe−(η−1)y =[1 +(1 + δ1δ1)(1 + δ2δ2)w−δ11 w−δ22 e−η(δ1+δ2)y + o(e−η(δ1+δ2)y)]ηe−(η−1)y.The choice of η is fairly flexible here and makes little difference to the evaluation of thisintegral. When η = 1, the transformed integrand tends to 1 as y →∞.79The first derivatives A(1) and A(2). 
The A(1) function is given byA(1)(w1, w2)=ˆ ∞0{1 + δ1x(w1x)−δ1−1 [1 +(w1x)−δ1]−1/δ1−2(1−[1 +(w2x)−δ2]−1/δ2−1)}dx.Let h1(x) be this integrand. As x→∞,h1(x) ≈ (1 + δ1)w−δ1−11 xδ1(w1x)2δ1+1 [1−(w2x)1+δ2] ≤ (1 + δ1)wδ11 x−δ1−1. (4.11)The tail behaviour is thus the same as that of h(x) when δ1 < δ2, allowing the use of thesame η for the transformed integrand for the region x ∈ [1,∞).As for x ∈ [0, 1), we haveh1(x) = (1 + δ1)w−δ1−11 xδ1 ·[1− 2δ1 + 1δ1(w1x)−δ1+ o(xδ1)] [δ2 + 1δ2(w2x)−δ2+ o(xδ2)]= C8xδ1+δ2 + o(xδ1+δ2).Hence, as y → ∞ (i.e., x → 0+ with x = e−ηy), the tail of the transformed inte-grand h(e−ηy)e−(η−1)y behaves like e[1−η(δ1+δ2+1)]y. We can choose η = 1, so that 1 −η (δ1 + δ2 + 1) ≤ 0 for all δ1, δ2 > 0.For A(2) with x ∈ [1,∞), exchanging indices reveals that h2(x) has the same tail be-haviour as x−δ2−1, so that the transformation h(eηy) · ηe(η+1)y behaves like e(1−ηδ2)y. Withthe same choice η = δ−11 , 1− ηδ2 < 0 and the transformed integrand is decreasing in y (ornon-increasing, in the case δ1 = δ2) in the limit. For x ∈ [0, 1), the derivation is the sameas above, since the dominating term xδ1+δ2 is symmetric in δ1 and δ2. By interchangingsubscript, a similar result applies when δ1 > δ2.The second derivative A(12). The second derivative is given byA(12)(w1, w2) = −ˆ ∞02∏i=1{1 + δix(wix)−δi−1 [1 +(wix)−δi]−1/δi−2}dx.Let h12(x) be this integrand. As x→∞,h12(x) ≈ C92∏i=1xδi · x−2δi−1 = C9x−δ1−δ2−2.Hence the transformation h(eηy)·ηe(η+1)y (with x = eηy and y > 0) behaves like e[1−η(δ1+δ2+1)]y.With η = δ−11 , 1− η(δ1 + δ2 + 1) = −(δ2 + 1)/δ1 < 0 and the transformed integrand tends80to zero as y →∞. 
As in the case for Burr conditional tail dependence functions, η = 1 alsoworks and the integrand also declines to zero as y →∞.When x→ 0+, we haveh12(x) ≈2∏i=1{(1 + δi)w−δi−1i xδi ·[1− 2δi + 1δi(wix)−δi+ o(xδi)]}= C10xδi+δ2+o(xδi+δ2).The dominating term is the same as for h1(x), and the conclusion there applies to this caseas well.First-order approximation of h(eηy) · ηe(η+1)y as y → ∞. The following shows theappropriate first-order approximation of the transformed integrand for x ∈ [1,∞).• For A, we obtain from (4.10) that, if δ1 < δ2, h(eηy) · ηe(η+1)y ≈ wδ1+11 ηe(1−ηδ1)y =wδ1+11 /δ1 as y → ∞ with η = δ−11 . If δ1 = δ2 = δ, h(x) ≈ (wδ+11 + wδ+12 )x−δ−1 andthe approximation becomes h(eηy) · ηe(η+1)y ≈ (wδ+11 + wδ+12 )/δ.• For A(1), we obtain from (4.11) that h1(eηy) · ηe(η+1)y ≈ (1 + δ1)wδ11 ηe(1−ηδ1)y =(1 + δ1)wδ11 /δ1 with η = δ−11 if δ1 < δ2.• For A(2), the transformed integrand is decreasing to zero as y →∞ when δ1 < δ2. Ifδ1 = δ2 = δ, the first-order approximation becomes (1 + δ)wδ2/δ.• For A(12), the transformed integrand is decreasing to zero as y → ∞ regardlessof the magnitudes of δ1 and δ2, as the dominating term of h12(x) is x−δ1−δ2−2 =e−η(δ1+δ2−2)y.In terms of implementation, for i = 1, 2 we check whether δi = min(δ1, δ2). If this is true,we increment the approximation for A and A(i) (for large y) by wδi+1i /δi and (1 + δi)wδii /δi,respectively.Gauss-Jacobi quadrature. The transformation s = x−δ1 or dx = − (s−(θ1+1)/θ1/θ1) dscan be applied to the derivatives of A, so that the integrals are transformed to the unitinterval with a bounded integrand. Let A+, A(1)+ , A(2)+ and A(12)+ be the respective integrals81with lower limit equal to 1 instead of 0. 
Then we haveA(1)+=ˆ 10{1 + δ1δ1w−δ1−11 s−1/δ1−2 (wδ11 s+ 1)−1/δ1−2 (wδ11 s)1/δ1+2 [1− (1 + w−δ22 s−δ2/δ1)−1/δ2−1]}ds=1 + δ1δ1wδ11[ˆ 10(wδ11 s+ 1)−1/δ1−2ds−ˆ 10(wδ11 s+ 1)−1/δ1−2 (1 + w−δ22 s−δ2/δ1)−1/δ2−1 ds]=1− (wδ11 + 1)−1/δ1−1 − 1 + δ1δ1 wδ11 wδ2+12ˆ 10s(δ2+1)/δ1(wδ11 s+ 1)−1/δ1−2 (1 + wδ22 sδ2/δ1)−1/δ2−1ds=1− (wδ11 + 1)−1/δ1−1 − wδ11 wδ2+12 E [Bδ2/δ12 (wδ11 B2 + 1)−1/δ1−2 (1 + wδ22 Bδ2/δ12 )−1/δ2−1] ,where B2 ∼ Beta(1+δ−11 , 1). The quantity inside the expectation is bounded between 0 and1 as B2 takes values in the same range. Swapping indices result in an analogous formulafor A(2)+ .For A(12)+ , we obtainA(12)+= −ˆ 10[(1 + δ1) (1 + δ2)δ1wδ11 wδ22 s(δ2+1)/δ1(wδ11 s+ 1)−1/δ1−2 (wδ22 sδ2/δ1 + 1)−1/δ2−2]ds= − (1 + δ2)wδ11 wδ22 E[Bδ2/δ12(wδ11 B2 + 1)−1/δ1−2 (wδ22 Bδ2/δ12 + 1)−1/δ2−2].In this way, the term inside the expectation is bounded between 0 and 1.For A+, we haveA+=ˆ 101δ1s−1/δ1−1(1−[1− (1 + w−δ11 s−1)−1/δ1−1] [1− (1 + w−δ22 s−δ2/δ1)−1/δ2−1])ds=ˆ 101δ1wδ1+11(wδ11 s+ 1)−1/δ1−1ds+ˆ 101δ1wδ2+12 sδ2/δ1−1(wδ22 sδ2/δ1 + 1)−1/δ2−1ds−ˆ 101δ1wδ1+11 wδ2+12 s(1+δ2)/δ1(wδ11 s+ 1)−1/δ1−1 (wδ22 sδ2/δ1 + 1)−1/δ2−1ds (4.12)=w1[1− (wδ11 + 1)−1/δ1]+ 11 + δ1wδ2+12 E[B(δ2−δ1+1)/δ12(wδ22 Bδ2/δ12 + 1)−1/δ2−1]− 11 + δ1wδ1+11 wδ2+12 E[Bδ2/δ12(wδ11 B2 + 1)−1/δ1−1 (wδ22 Bδ2/δ12 + 1)−1/δ2−1].Unlike the Burr case, here we can write the transformed integral into expectations thatinvolve a finite range and bounded integrand. For δ1 ≤ δ2, B(δ2−δ1+1)/δ12 is bounded whenB2 ∈ [0, 1]. If δ1 > δ2, we can swap the indices so that the random variables for whichexpectation is taken are all distributed as Beta(1+δ−12 , 1) instead, and that the B(δ1−δ2+1)/δ22term inside one of the expectation operators will remain bounded.82Behaviour of the exponent function and its derivatives at the independencelimit. 
To obtain the limits of the various integrals when δ1, δ2 → 0+, for simplicity weassume δ1 = δ2 = δ (a similar result applies to distinct δ’s by letting δ2 = kδ1 for k > 0).For x ∈ [0, 1), it is obvious that 0 ≤ h(x) ≤ 1. Also, for the integrands of A(1)and A(12),h1(x) = (1 + δ)w−δ−11 xδ(1 + w−δ1 xδ)−1/δ−2 [1− (1 + w−δ2 xδ)−1/δ−1]≤ (1 + δ)w−δ−11 ;h12(x) = (1 + δ)2w−δ−11 w−δ−12 x2δ(1 + w−δ1 xδ)−1/δ−2 (1 + w−δ2 xδ)−1/δ−2≤ (1 + δ)2w−δ−11 w−δ−12 .The bounds are again integrable, and hence we can exchange the order of limit and inte-gration. Since(1 + w−δi xδ)−1/δ−1 → 0 as δ → 0+, i = 1, 2, we havelimδ→0+ˆ 10h(x; θ) dx =ˆ 10limδ→0+h(x; θ) dx = 0;limδ→0+ˆ 10hi(x; θ) dx =ˆ 10limδ→0+hi(x; θ) dx = 0, i = 1, 2;limδ→0+ˆ 10h12(x; θ) dx =ˆ 10limδ→0+h12(x; θ) dx = 0.Meanwhile, for x ∈ [1,∞), the derivations are listed as follows:• For A, note that the first two terms of (4.12) are the same when δ1 = δ2, except thatthe first term contains w1 while the second w2. HenceA+ = w1[1− (wδ1 + 1)−1/δ]+ w2 [1− (wδ2 + 1)−1/δ]− 11 + δ(w1w2)δ+1 E[B2(wδ1B2 + 1)−1/δ−1 (wδ2B2 + 1)−1/δ−1], (4.13)where B2 ∼ Beta(1 + δ−1, 1). When δ → 0+, the first two terms of (4.13) tend tow1 and w2, respectively, while B2 becomes increasingly concentrated at 1, eventu-ally becoming a point mass there. At B2 = 1, B2[(wδ1B2 + 1) (wδ2B2 + 1)]−1/δ−1=[(wδ1 + 1) (wδ2 + 1)]−1/δ−1 → 0 as δ → 0+. Therefore A+(w1, w2) → w1 + w2 asδ → 0+.• For A(1), we obtainedA(1)+ = 1−(wδ1 + 1)−1/δ−1 − wδ1wδ+12 E [B2 (wδ1B2 + 1)−1/δ−2 (1 + wδ2B2)−1/δ−1] .As δ → 0+, 1− (wδ1 + 1)−1/δ−1 → 1 and B2 (wδ1B2 + 1)−1/δ−2 (1 + wδ2B2)−1/δ−1 → 0as B2 → 1−. Hence A(1)+ (w1, w2)→ 1 as δ → 0+. A similar argument applies to A(2)+ .83• For A(12), the result isA(12)+ = − (1 + δ)wδ1wδ2E[B2(wδ1B2 + 1)−1/δ−2 (wδ2B2 + 1)−1/δ−2].As δ → 0+, B2 converges to a point mass at 1, at which the value inside the expec-tation tends to zero. 
Hence A^{(12)}_+(w_1, w_2) → 0 as δ → 0+.

In summary, we obtain the same limits as in the Burr case.

Chapter 5

Extremal coefficient for extreme value copulas and its generalizations

The extremal coefficient ϑ is a measure of the strength of dependence used to study the dependence properties of extreme value copulas. As noted in Chapter 1, it can be interpreted as the effective number of independent variables of a multivariate copula (see (1.1)). A d-dimensional extreme value copula can be written as

    C(u_1, . . . , u_d) = exp{−A(−log u_1, . . . , −log u_d)},

where the exponent function A is homogeneous of order 1 (Section 2.2.2). Because of this property, one has

    C(u, . . . , u) = exp{log u · A(1, . . . , 1)} = u^{A(1,...,1)},

and thus ϑ = A(1, . . . , 1). The possible range of ϑ is [1, d]; it equals 1 if C is the comonotonicity copula and d if C is the independence copula.

Because of the importance of this summary measure for extreme value copulas, there exist many estimators of ϑ with different properties. For the rest of this chapter, we focus on the bivariate case, as most of these estimators were proposed for bivariate copulas. Moreover, the matrix of extremal coefficients for all bivariate margins of a multivariate distribution is used in the diagnostic checks in Chapters 6 and 7, where model adequacy-of-fit is considered.

In this chapter, we review the properties of the different empirical estimators of ϑ. We focus on a particular type of estimator known as the F-madogram, and demonstrate its relationship with general copulas. Through a generalization of the F-madogram, it is possible to construct a class of estimators for general copulas that put more weight on the tail portion, arriving at a tail-weighted measure of dependence. Properties of this estimator are given, together with a note on the potential extension to higher dimensions.

5.1 Empirical estimators of the extremal coefficient

We first review several empirical estimators of the extremal coefficient in the literature.
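Before reviewing the estimators, the identity ϑ = A(1, . . . , 1) and the range [1, d] can be illustrated with a closed-form example; the sketch below uses the d-dimensional Gumbel (logistic) exponent function A(w) = (Σ_i w_i^θ)^{1/θ}, for which ϑ = d^{1/θ} (the Gumbel copula is only one example of an EV copula, and the parameter values are arbitrary):

```python
import numpy as np

def gumbel_exponent(w, theta):
    """Exponent function of the d-dimensional Gumbel (logistic) EV copula."""
    w = np.asarray(w, dtype=float)
    return np.sum(w**theta) ** (1.0 / theta)

def gumbel_copula_diag(u, d, theta):
    """C(u, ..., u) = exp{-A(-log u, ..., -log u)}, which equals u**vartheta."""
    w = -np.log(u) * np.ones(d)
    return np.exp(-gumbel_exponent(w, theta))

d, theta = 4, 2.0
vartheta = gumbel_exponent(np.ones(d), theta)     # extremal coefficient
assert np.isclose(vartheta, d ** (1.0 / theta))   # = 2 here
assert 1.0 <= vartheta <= d                       # range [1, d]
u = 0.9
assert np.isclose(gumbel_copula_diag(u, d, theta), u ** vartheta)
```

As θ → ∞ the model approaches comonotonicity (ϑ → 1), and θ = 1 gives independence (ϑ = d), matching the interpretation of ϑ as an effective number of independent variables.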
The study of the dependence between variables of a bivariate extreme value distribution dates back to Sibuya (1960), but the first proposal for estimating the dependence properties from data came much later, in Pickands (1981), where the notion of the Pickands dependence function was introduced. Empirical estimators are generally developed for the estimation of the whole Pickands dependence function B(w) = A(w, 1 − w) for w ∈ [0, 1], where A is the exponent function (stable tail dependence function) of the extreme value distribution, written in copula form as

    C(u, v) = exp{−A(−log u, −log v)}.   (5.1)

The Pickands dependence function is also commonly expressed as the constituent of a min-stable survival function with unit exponential margins:

    G(x, y) = exp{−(x + y) B(x/(x + y))},   (5.2)

or that of a max-stable distribution function with unit Fréchet margins:

    G(x, y) = exp{−(1/x + 1/y) B(y/(x + y))}.

The extremal coefficient is a multiple of the value of the Pickands dependence function at a single point, namely ϑ = 2B(1/2). The Pickands dependence function of an extreme value distribution is convex and satisfies the boundary condition max(w, 1 − w) ≤ B(w) ≤ 1 (which implies B(0) = B(1) = 1, B′(0) ∈ [−1, 0] with B′(w) = dB(w)/dw, B′(1) ∈ [0, 1] and B(1/2) ∈ [1/2, 1]). These conditions are not necessarily satisfied by the empirical estimator, and some estimators proposed in the literature are in fact modifications of the one in Pickands (1981) so that some of the conditions hold automatically. Although these restrictions are not directly related to the estimation of the extremal coefficient, the modifications may reduce the variability of the estimators and are thus desirable.

Most of the earlier estimators were proposed assuming known margins (i.e., each margin is transformed to a standard distribution using the known marginal distribution). Although the derivation of asymptotic properties is then easier, in most situations the marginal distributions are unknown.
One variant uses the empirical counterpart, i.e., the adjusted or scaled ranks for each margin, as input. For bivariate observations Y_i = (Y_i1, Y_i2)ᵀ, i = 1, . . . , n, the scaled ranks R_i = (R_i1, R_i2)ᵀ are defined as

    R_ik = (1/(n + 1)) Σ_{j=1}^n 1(Y_jk ≤ Y_ik),   k = 1, 2.

The denominator uses n + 1 to bypass boundary issues; other choices exist, but they do not affect the asymptotic properties. It is important to differentiate between these two situations because the resulting estimators can have quite different asymptotic variances, as pointed out in the thorough study of Genest and Segers (2009). In the following, we first provide the definitions and properties of the known-margin version of each estimator, and then proceed to the rank-based version. Throughout the discussion, we assume that Y_i = (Y_i1, Y_i2)ᵀ, i = 1, . . . , n, is an i.i.d. sequence of bivariate random vectors with survival function (5.2) (i.e., exponential margins) and that U_i = (U_i1, U_i2)ᵀ, i = 1, . . . , n, is an i.i.d. sequence of bivariate random vectors from the extreme value copula (5.1). The subscript i is dropped when we refer to the distributional properties of the vectors.

5.1.1 Estimators assuming known margins

Several existing empirical estimators of the extremal coefficient, when the margins are assumed known, are:

1. Pickands estimator (Pickands (1981)). This is the first proposed estimator for the Pickands dependence function. The corresponding estimator for the extremal coefficient is

    ϑ̂_P^{−1} = (1/n) Σ_{i=1}^n min(Y_i1, Y_i2).

The Pickands estimator for B(w) is not necessarily convex, and it does not satisfy the boundary conditions. The estimator ϑ̂_P need not be exactly 1 at comonotonicity.

2. Deheuvels estimator (Deheuvels (1991)). This estimator incorporates an adjustment to ensure that the endpoint conditions B(0) = B(1) = 1 are satisfied. The extremal coefficient is estimated by

    ϑ̂_D^{−1} = (1/n) Σ_{i=1}^n [min(Y_i1, Y_i2) − (1/4)(Y_i1 + Y_i2) + 1/2].

Similar to the Pickands estimator, ϑ̂_D need not be 1 at comonotonicity.

3.
Hall and Tajvidi’s (HT) estimator (Hall and Tajvidi (2000)). Instead of apply-ing an additive correction, the HT estimator uses a multiplicative correction. Thisconstruction allows the resulting estimator for B(w) to satisfy the boundary condi-tion max(w, 1− w) ≤ B(w) ≤ 1, although still not necessarily convex. The extremalcoefficient is estimated byϑˆ−1HT =1nn∑i=1min(Yi1Y 1,Yi2Y 2),where Y k = n−1∑ni=1 Yik for k = 1, 2. With the multiplicative correction, ϑˆHT = 1when (Y1, Y2) is comonotonic.4. Cape´raa`-Fouge`res-Genest (CFG) estimator (Cape´raa` et al. (1997)). The CFGestimator is based on the observation that the random variable Z = logU1/ log(U1U2)has distribution function H(z) = z + z(1 − z)B′(z)/B(z) for z ∈ [0, 1). This impliesthat the Pickands dependence function can be written asB(w) = exp{ˆ w0H(z)− zz(1− z) dz}= exp{−ˆ 1wH(z)− zz(1− z) dz}.Replacing H(z) by the empirical version Hˆn(z), the authors define their estimator aslog BˆCFG(w; p) = p(w)ˆ w0Hˆn(z)− zz(1− z) dz − [1− p(w)]ˆ 1wHˆn(z)− zz(1− z) dzfor suitable choices of the weight function p(w). Beirlant et al. (2004) provide a muchsimpler method to motivate the CFG estimator, in the sense thatE[log min(Y11− w,Y2w)]= − logB(w)− γ, (5.3)where γ = − ´∞0 log(x)e−x dx is Euler’s constant (see Segers (2007)). The CFGestimator can alternatively be written aslog BˆCFG(w; p) = − 1nn∑i=1[log min(Yi11− w,Yi2w)− p(w) log Yi1 − [1− p(w)] log Yi2].It satisfies the endpoint conditions if p is such as p(0) = 1 and p(1) = 0, and inpractice the choice p(w) = 1 − w is satisfactory (Segers (2007)). This leads to thefollowing estimator for the extremal coefficient:log ϑˆCFG = − 1nn∑i=1[log min (Yi1, Yi2)− 12(log Yi1 + log Yi2)]. (5.4)This corrected estimator has a value of 1 at comonotonicity, and is the version we usein this chapter (for both known-margin and rank-based estimators).885. F-madogram estimator (Cooley et al. (2006)). 
This estimator is inspired by the madogram used in spatial statistics as a measure of the dependence between two sites. The general empirical F-madogram is defined as

    ν̂_α = (1/(2n)) Σ_{i=1}^n |U_i1^α − U_i2^α|.   (5.5)

For bivariate extreme value copulas, it can be shown that E(ν̂_α) = α/(α + 1) − α/(α + ϑ), and thus the estimator for the extremal coefficient is given by

    ϑ̂_α = (α + α(1 + α)ν̂_α) / (α − (1 + α)ν̂_α).

This estimator is also always 1 at comonotonicity. The original formulation by Cooley et al. (2006) has α = 1, while Naveau et al. (2009) generalize the estimator to the estimation of the whole Pickands dependence function, with the powers in (5.5) being 1 − β and β for β ∈ (0, 1); their estimator for the extremal coefficient has α = 1/2 = β. More recently, Fonseca et al. (2015) consider the case where the powers in (5.5) can be any numbers α, β > 0.

5.1.2 Rank-based estimators

Because marginal distributions are rarely known, in practice one is not usually equipped with observations Y_i or U_i that follow the stipulated marginal distributions. The rank-based approach instead uses the scaled ranks as a proxy for uniformly distributed observations on [0, 1]. Before we review the rank-based estimators, define the empirical copula as

    C_n(u_1, u_2) = (1/n) Σ_{i=1}^n 1(R_i1 ≤ u_1, R_i2 ≤ u_2)

for u_1, u_2 ∈ [0, 1]. This representation is useful in determining the behaviour of the estimators.

1. Estimators by Pickands / Deheuvels / Hall and Tajvidi. Genest and Segers (2009) show that the rank-based versions of the endpoint-corrected Pickands estimators, i.e., the Deheuvels and HT estimators, are asymptotically equivalent to the rank-based Pickands estimator. Letting R̃_ik = −log R_ik for i = 1, . . . , n and k = 1, 2, the rank-based Pickands estimator for the extremal coefficient can be written as

    ϑ̂_{P,r}^{−1} = (1/n) Σ_{i=1}^n min(R̃_i1, R̃_i2) = (1/2) ∫_0^1 (1/u) C_n(u^{1/2}, u^{1/2}) du.

2. Capéraà-Fougères-Genest (CFG) estimator.
Genest and Segers (2009) also show that the rank-based, endpoint-corrected version of the CFG estimator is asymptotically equivalent to the uncorrected one, i.e., the direct empirical analogue of (5.3). For asymptotic properties, it is easier to work with the uncorrected version, whose corresponding estimator for the extremal coefficient is

    log ϑ̂_{CFG,r} = −γ − (1/n) Σ_{i=1}^n log min(R̃_i1, R̃_i2) = log 2 − γ + ∫_0^1 [C_n(u^{1/2}, u^{1/2}) − 1(u > e^{−1})] / (u log u) du.

3. F-madogram estimator. The rank-based F-madogram estimator for the extremal coefficient can be obtained by replacing the U's in (5.5) by the R's:

    ν̂_{α,r} = (1/(2n)) Σ_{i=1}^n |R_i1^α − R_i2^α|;   ϑ̂_{α,r} = (α + α(1 + α)ν̂_{α,r}) / (α − (1 + α)ν̂_{α,r}).   (5.6)

To write ϑ̂_{α,r} in terms of the empirical copula, we can proceed in a similar fashion to Appendix A of Genest and Segers (2009), using |a − b|/2 = max(a, b) − (a + b)/2:

    ν̂_{α,r} = (1/n) Σ_{i=1}^n [max(R_i1^α, R_i2^α) − (1/2)R_i1^α − (1/2)R_i2^α]
             = (1/n) Σ_{i=1}^n ∫_0^1 (1 − 1{R_i1 ≤ u^{1/α}, R_i2 ≤ u^{1/α}}) du
               − (1/(2n)) Σ_{i=1}^n ∫_0^1 (1 − 1{R_i1 ≤ u^{1/α}, R_i2 ≤ 1}) du
               − (1/(2n)) Σ_{i=1}^n ∫_0^1 (1 − 1{R_i1 ≤ 1, R_i2 ≤ u^{1/α}}) du
             = (1/2) ∫_0^1 [C_n(u^{1/α}, 1) + C_n(1, u^{1/α}) − 2C_n(u^{1/α}, u^{1/α})] du,   (5.7)

and ϑ̂_{α,r} is given by (5.6).

4. Minimum distance estimator (Bücher et al. (2011)). This estimator for the Pickands dependence function is proposed as the minimizer of a weighted L² distance, so that B̂_MD = argmin_B M_h(B, C), where

    M_h(B, C) = ∫_0^1 ∫_0^1 [log C(u^{1−w}, u^w) − log(u) B(w)]² h(u) du dw

for copulas C with positive quadrant dependence, with h(u) a continuous non-negative weight function. Bücher et al. (2011) show that the minimizer B̂_MD(w), which satisfies the boundary condition, is given by

    B̂_MD(w) = H_h^{−1} ∫_0^1 log C(u^{1−w}, u^w) log(u) h(u) du,

with H_h = ∫_0^1 log²(u) h(u) du being a normalizing constant. The authors point out that the particular class of weight functions h_k(u) = −u^k / log u, for k ≥ 0, yields the estimator

    B̂_MD(w, k) = −(k + 1)² ∫_0^1 log C(u^{1−w}, u^w) u^k du.   (5.8)

The rank-based estimator is obtained by replacing C in (5.8) by C_n.
For the extremal coefficient, this results in
\[
\hat\vartheta_{MD,r}(k) = -2(k+1)^2 \int_0^1 \log C_n(u^{1/2}, u^{1/2})\, u^k\, \mathrm{d}u.
\]
More sophisticated rank-based estimators for the Pickands dependence function (that satisfy the boundary and convexity conditions) exist in the literature, such as the projection estimator of Fils-Villetard et al. (2008) and one based on a transformation of the pair \((u_1, u_2) \mapsto (\log(u_2)/\log(u_1 u_2),\; \log(C(u_1,u_2))/\log(u_1 u_2))\) and B-spline smoothing, as illustrated in Cormier et al. (2014).

5.1.3 Asymptotic efficiency of the estimators

Under certain regularity conditions, each of the estimators for the extremal coefficient mentioned is \(\sqrt n\)-consistent and asymptotically normal. However, their asymptotic variances can be quite different, as observed in some articles that review the asymptotic properties of estimators for the Pickands dependence function, e.g., Naveau et al. (2009), Genest and Segers (2009) and Bücher et al. (2011). For most extremal coefficient estimators assuming known margins, it is possible to derive expressions for the asymptotic variance, or at least to estimate it by Monte Carlo simulation (without first obtaining a sampling distribution); the estimator by Hall and Tajvidi is the only one for which these methods appear difficult. Because the Pickands and Deheuvels estimators are not necessarily 1 under comonotonicity, we do not consider them in the sequel: they are more variable when dependence is strong than the estimators that evaluate to 1 under comonotonicity.
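Before turning to asymptotics, the rank-based F-madogram estimator is simple enough to sketch in code. The snippet below (function names, the rank convention R = rank/(n+1), and the integration grid are our own choices, not prescribed by the text) evaluates both the direct form (5.6) and the empirical-copula integral representation (5.7), which should agree up to numerical-integration error:

```python
import numpy as np

def scaled_ranks(x):
    """Scaled marginal ranks; the divisor n + 1 is one common convention (an assumption here)."""
    x = np.asarray(x)
    return (np.argsort(np.argsort(x)) + 1) / (len(x) + 1)

def theta_fmad_rank(x1, x2, alpha=1.0):
    """Rank-based F-madogram estimator of the extremal coefficient, direct form (5.6)."""
    r1, r2 = scaled_ranks(x1), scaled_ranks(x2)
    nu = 0.5 * np.mean(np.abs(r1 ** alpha - r2 ** alpha))
    return (alpha + alpha * (1 + alpha) * nu) / (alpha - (1 + alpha) * nu)

def theta_fmad_rank_copula(x1, x2, alpha=1.0, ngrid=4000):
    """Same estimator through the empirical-copula integral representation (5.7)."""
    r1, r2 = scaled_ranks(x1), scaled_ranks(x2)
    u = ((np.arange(ngrid) + 0.5) / ngrid) ** (1.0 / alpha)  # midpoint grid, transformed to u^{1/alpha}
    C_diag = np.mean((r1[:, None] <= u) & (r2[:, None] <= u), axis=0)  # C_n(u^{1/a}, u^{1/a})
    C_1 = np.mean(r1[:, None] <= u, axis=0)                            # C_n(u^{1/a}, 1)
    C_2 = np.mean(r2[:, None] <= u, axis=0)                            # C_n(1, u^{1/a})
    nu = 0.5 * np.mean(C_1 + C_2 - 2.0 * C_diag)
    return (alpha + alpha * (1 + alpha) * nu) / (alpha - (1 + alpha) * nu)
```

For comonotonic data the two rank vectors coincide, so the empirical F-madogram is 0 and both forms return exactly 1; for generic data they differ only by the discretization of the integral in (5.7).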
The expression for the asymptotic variance of the F-madogram estimator is derived in Section 5.2.4, while the representation of the CFG estimator as a transform of a sum of i.i.d. random variables (5.4) allows quick estimation of the asymptotic variance via simulation: the variance of \(\log\min(Y_{i1}, Y_{i2}) - \tfrac12(\log Y_{i1} + \log Y_{i2})\) in (5.4) can be estimated with a single sample of sufficient size, and then converted to the asymptotic variance of the CFG estimator using the delta method.

Figure 5.1 displays the asymptotic variances of the CFG and F-madogram estimators with known margins, using the Gumbel and Hüsler–Reiss copulas. For the F-madogram estimator, we consider the cases α = 1 (original formulation by Cooley et al. (2006)) and α = 1/2 (modification by Naveau et al. (2009)). The results show that the CFG and F-madogram (α = 1) estimators have very similar performance, with the CFG estimator having slightly smaller asymptotic variance at moderate dependence but larger near independence. The F-madogram estimator with α = 1/2 has noticeably higher asymptotic variance than for α = 1 in both cases.

[Figure 5.1: Asymptotic variances of the CFG and F-madogram estimators (assuming known margins) for the Gumbel and Hüsler–Reiss copulas. Two panels (Gumbel copula; Hüsler–Reiss copula); horizontal axis: true extremal coefficient (1.2–1.8); vertical axis: asymptotic variance (0–1.5); curves: CFG, F-madogram (α = 1), F-madogram (α = 0.5).]

Analogous computations for the rank-based estimators are more difficult. Writing the estimators as functions of C_n facilitates calculations, as a suitably scaled C_n converges in distribution to a Gaussian process \(\mathbb{G}_C\) (Fermanian et al.
(2004)):
\[
\sqrt n\left[C_n(u_1, u_2) - C(u_1, u_2)\right] \xrightarrow{d} \mathbb{G}_C(u_1, u_2), \qquad (5.9)
\]
where
\[
\begin{aligned}
\mathbb{G}_C(u_1, u_2) &= \mathbb{B}_C(u_1, u_2) - \mathbb{B}_C(u_1, 1)\,\frac{\partial C}{\partial u_1}(u_1, u_2) - \mathbb{B}_C(1, u_2)\,\frac{\partial C}{\partial u_2}(u_1, u_2) \\
&= \mathbb{B}_C(u_1, u_2) - \mathbb{B}_C(u_1, 1)\, C_{2|1}(u_2|u_1) - \mathbb{B}_C(1, u_2)\, C_{1|2}(u_1|u_2), \qquad (5.10)
\end{aligned}
\]
in which \(\mathbb{B}_C(u_1, u_2)\) is a Brownian bridge with covariance function \(\mathrm{E}[\mathbb{B}_C(u_1,u_2)\mathbb{B}_C(u_3,u_4)] = C(u_1\wedge u_3, u_2\wedge u_4) - C(u_1,u_2)C(u_3,u_4)\). The proof in Fermanian et al. (2004) requires that the copula have continuous partial derivatives on the closed square [0, 1]². This condition is not satisfied by many common parametric families (such as those with tail dependence), but the assumptions required for the convergence have been weakened over time (see, e.g., Tsukahara (2005); Omelka et al. (2009); Segers (2012)). Fermanian et al. (2004) also show that, if \(J : [0,1]^2 \to \mathbb{R}\) is of bounded variation, continuous from above and with discontinuities of the first kind (Neuhaus (1971)), then
\[
\frac{1}{\sqrt n}\sum_{i=1}^n \left\{J(R_{i1}, R_{i2}) - \mathrm{E}[J(R_{i1}, R_{i2})]\right\} \xrightarrow{d} \int_0^1\int_0^1 \mathbb{G}_C(u_1, u_2)\, \mathrm{d}J(u_1, u_2), \qquad (5.11)
\]
with the limiting distribution being Gaussian. These results permit the numerical computation of the asymptotic variance of the limiting Gaussian distribution. Expressions for the asymptotic variance are generally 2-dimensional integrals, but there is a simplification with the independence copula C(u_1, u_2) = u_1 u_2. The asymptotic variances for the rank-based extremal coefficient estimators when C is the independence copula are listed in Table 5.1. Because of the lack of analytic expressions, it is often hard to tell which estimator dominates another. However, based on simulations in Genest and Segers (2009), the rank-based CFG estimator usually has smaller asymptotic variance than the rank-based Pickands estimator, although this is not always true. The simulation study in Naveau et al. (2009) uses transformed observations from fitted marginal GEV distributions rather than ranks; their results on the Gumbel or logistic model suggest that the CFG and F-madogram estimators have similar performance.
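The independence-copula entries just mentioned can be checked by brute force: under independence, n times the sampling variance of the rank-based F-madogram estimator should approach \((2+\alpha)^2/[(1+\alpha)(3+2\alpha)]\), which equals 0.9 at α = 1. A small Monte Carlo sketch (sample sizes, seed and function names are our own choices):

```python
import numpy as np

def theta_fmad_rank(u1, u2, alpha=1.0):
    """Rank-based F-madogram estimator of the extremal coefficient, eq. (5.6)."""
    n = len(u1)
    r1 = (np.argsort(np.argsort(u1)) + 1) / (n + 1)  # scaled ranks (our convention)
    r2 = (np.argsort(np.argsort(u2)) + 1) / (n + 1)
    nu = 0.5 * np.mean(np.abs(r1 ** alpha - r2 ** alpha))
    return (alpha + alpha * (1 + alpha) * nu) / (alpha - (1 + alpha) * nu)

rng = np.random.default_rng(1)
m, n, alpha = 3000, 400, 1.0
estimates = np.array([
    theta_fmad_rank(rng.uniform(size=n), rng.uniform(size=n), alpha)
    for _ in range(m)
])
scaled_var = n * estimates.var()   # Monte Carlo estimate of the asymptotic variance
limit = (2 + alpha) ** 2 / ((1 + alpha) * (3 + 2 * alpha))   # theoretical value: 0.9
```

With these settings the Monte Carlo value lands near 0.9, matching the F-madogram row of Table 5.1.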
The minimum distance estimator has an asymptotic variance that depends on the value of k; a study of its performance with the asymmetric Galambos or negative logistic model can be found in Bücher et al. (2011), using k = 1 or 5. None of these results in asymptotic variances much smaller than that of the rank-based CFG estimator.

| Estimator | Asymptotic variance | Source |
| Pickands | 1.333 | Genest and Segers (2009) |
| CFG | 0.595 | Genest and Segers (2009) |
| F-madogram | \((2+\alpha)^2 / [(1+\alpha)(3+2\alpha)]\) | See Section 5.2.4 |
| Minimum distance | \(4(1+k)^2 / [(1+2k)(3+4k)]\) | Bücher et al. (2011) |

Table 5.1: Asymptotic variances of the empirical extremal coefficient ϑ̂ for the rank-based estimators with the independence copula.

Finally, it is interesting to note that the rank-based estimators can sometimes have smaller asymptotic variances than their known-margin counterparts. Genest and Segers (2009) show that this is always true for the Pickands and the uncorrected CFG estimators. This is, however, not always the case for the endpoint-corrected versions of the Pickands or CFG estimator, or for the F-madogram estimator. Table 5.2 shows some simulation results that compare the asymptotic variances of different empirical estimators for the extremal coefficient, for two Gumbel copulas with different strengths of dependence. The Pickands and Deheuvels estimators are not considered, for the same reason as mentioned at the beginning of the subsection. It is clear that none of the estimators dominates another, although the HT estimator has somewhat higher asymptotic variances than the rest when the dependence is weak.

ϑ = 1.7 (Kendall's τ = 0.23)
| Estimator | Known | Rank |
| HT | 0.92 | 0.86 |
| CFG | 0.61 | 0.52 |
| F-madogram (α = 1) | 0.57 | 0.61 |
| F-madogram (α = 0.5) | 0.72 | 0.65 |
| Min. dist. (k = 1) | — | 0.61 |

ϑ = 1.3 (Kendall's τ = 0.62)
| Estimator | Known | Rank |
| HT | 0.15 | 0.11 |
| CFG | 0.08 | 0.19 |
| F-madogram (α = 1) | 0.11 | 0.16 |
| F-madogram (α = 0.5) | 0.13 | 0.16 |
| Min. dist. (k = 1) | — | 0.18 |

Table 5.2: Asymptotic variances of the empirical extremal coefficient ϑ̂ for various estimators with the Gumbel copula.
“Known” stands for the estimator assuming known margins and “Rank” for the rank-based estimator.

With regard to the use of the extremal coefficient as a dependence measure for adequacy-of-fit diagnostics in Chapters 6 and 7, the rank-based F-madogram estimator is chosen primarily because of its computational simplicity compared with the minimum distance estimator, and its competitive performance relative to the CFG estimator, as suggested by the simulation study of Naveau et al. (2009). The original formulation by Cooley et al. (2006) with α = 1 is selected, as it appears to be less variable than that of Naveau et al. (2009) with α = 1/2, based on the results in Section 5.2.4.

5.2 Generalization of the F-madogram estimator to non-extreme-value copulas

Most of the estimators in the preceding section were originally motivated in the context of extreme value distributions. In this section, we show that one particular class, the F-madogram estimators, can be extended to a dependence measure that describes the tail properties of an arbitrary bivariate copula.

Consider the class of F-madogram estimators indexed by the parameter α > 0 as in (5.5). This form of ν̂_α has recently been considered in Fonseca et al. (2015), but their focus is also restricted to extreme value distributions.

Observe that the population version of ν̂_α is given by
\[
\nu_\alpha = \tfrac12\mathrm{E}\left|U_1^\alpha - U_2^\alpha\right| = \tfrac12\mathrm{E}\left[2\max(U_1^\alpha, U_2^\alpha) - U_1^\alpha - U_2^\alpha\right] = \mathrm{E}\left[\max(U_1^\alpha, U_2^\alpha)\right] - \frac{1}{\alpha+1}, \qquad (5.12)
\]
as \(U_1^\alpha \sim \mathrm{Beta}(\alpha^{-1}, 1)\) and \(\mathrm{E}(U_1^\alpha) = (\alpha+1)^{-1}\), and similarly for \(U_2^\alpha\). The distribution of the maximum can be obtained as follows:
\[
\mathrm{P}\left[\max(U_1^\alpha, U_2^\alpha) \le u\right] = \mathrm{P}\left(U_1 \le u^{1/\alpha}, U_2 \le u^{1/\alpha}\right) = C(u^{1/\alpha}, u^{1/\alpha}) =: F_M(u),
\]
and hence the expectation is
\[
\mathrm{E}\left[\max(U_1^\alpha, U_2^\alpha)\right] = \int_0^1 u\, \mathrm{d}F_M(u) = \int_0^1 \left[1 - F_M(u)\right] \mathrm{d}u = 1 - \int_0^1 C(u^{1/\alpha}, u^{1/\alpha})\, \mathrm{d}u.
\]
Let \(\gamma_\alpha = \gamma_\alpha(C) = \int_0^1 C(u^{1/\alpha}, u^{1/\alpha})\, \mathrm{d}u\). This means that
\[
\nu_\alpha = \frac{\alpha}{\alpha+1} - \gamma_\alpha. \qquad (5.13)
\]
Note that \(\gamma_\alpha = \int_0^1 u^{\vartheta/\alpha}\, \mathrm{d}u = \alpha/(\alpha+\vartheta)\) for extreme value copulas with extremal coefficient ϑ.
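The identity γ_α = α/(α+ϑ) for extreme value copulas is easy to verify numerically. A sketch using the diagonal of the Gumbel copula, C(v, v) = v^(2^(1/θ)), whose extremal coefficient is ϑ = 2^(1/θ) (function names and the grid size are our own choices):

```python
import numpy as np

def gamma_alpha(diag, alpha, ngrid=20000):
    """gamma_alpha(C) = int_0^1 C(u^{1/alpha}, u^{1/alpha}) du by the midpoint rule."""
    u = (np.arange(ngrid) + 0.5) / ngrid
    return np.mean(diag(u ** (1.0 / alpha)))

def gumbel_diag(theta):
    """Diagonal of the Gumbel copula: C(v, v) = v^(2^(1/theta))."""
    vartheta = 2.0 ** (1.0 / theta)
    return lambda v: v ** vartheta
```

With θ = 2 (so ϑ = √2), gamma_alpha(gumbel_diag(2.0), α) matches α/(α + √2) for any α, confirming that γ_α carries no information beyond ϑ for extreme value copulas.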
In general, ϑ depends on α (written ϑ_α below; i.e., γ_α = α/(α+ϑ_α)) and is related to these quantities by the relationship
\[
\vartheta_\alpha = \frac{\alpha + \alpha(1+\alpha)\nu_\alpha}{\alpha - (1+\alpha)\nu_\alpha} = \alpha\left(\gamma_\alpha^{-1} - 1\right). \qquad (5.14)
\]
For the comonotonicity copula, ϑ_α = 1, γ_α = α/(α+1) and ν_α = 0; for the independence copula, ϑ_α = 2, γ_α = α/(α+2) and ν_α = α/[(α+1)(α+2)].

We define the population version of the F-madogram measure of dependence as
\[
\lambda_\alpha = \lambda_\alpha(C) := 2 - \vartheta_\alpha = 2 + \alpha\left(1 - \gamma_\alpha^{-1}\right), \qquad (5.15)
\]
such that λ_α ∈ [0, 1] for C with positive dependence. The empirical version is
\[
\hat\lambda_\alpha = 2 - \hat\vartheta_\alpha, \qquad (5.16)
\]
where ϑ̂_α is obtained from (5.14), replacing ν_α by its sample counterpart ν̂_α in (5.5) when the margins are assumed known. The rank-based version ν̂_{α,r} is obtained by replacing the U's in (5.5) by the R's, the scaled marginal ranks. From the relationship (5.12), it is possible to define ν̂_α in a different way, with \(\tilde\nu_\alpha := n^{-1}\sum_{i=1}^n \max(U_{i1}^\alpha, U_{i2}^\alpha) - (\alpha+1)^{-1}\). This is, however, not as desirable as ν̂_α because, unlike ν̂_α, ν̃_α need not be 0 for observations from the comonotonicity copula.

Using the definition (5.15), λ_α is a decreasing function of ϑ_α and has a higher value when the copula is more strongly correlated. This direction is in agreement with other commonly used measures of dependence such as Kendall's τ. In particular, for extreme value copulas, λ_α coincides with the tail dependence index (for the appropriate tail) for any value of α > 0.

Note that, when α = 1, the F-madogram ν_1 (for non-extreme-value copulas) is a linear transformation of a measure of association known as Spearman's footrule (Spearman (1904, 1906)), with sample version
\[
\hat\varphi_r = 1 - \frac{3}{n^2-1}\sum_{i=1}^n |S_{i1} - S_{i2}| = 1 - \frac{3}{n-1}\sum_{i=1}^n |R_{i1} - R_{i2}|,
\]
where S_{i1} and S_{i2} are the marginal ranks of the ith observation, and the corresponding population version is φ = 1 − 3E|U_1 − U_2| = 1 − 6ν_1. The sample and asymptotic distributional properties of φ̂_r have been previously studied in Genest et al. (2010).
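Definitions (5.13)–(5.15) translate directly into a numerical routine: integrate the copula along the diagonal and transform. A minimal sketch (the function name and grid size are our own choices):

```python
import numpy as np

def lambda_alpha(copula_diag, alpha, ngrid=20000):
    """F-madogram measure of dependence lambda_alpha = 2 + alpha*(1 - 1/gamma_alpha),
    where gamma_alpha = int_0^1 C(u^{1/alpha}, u^{1/alpha}) du (eqs. 5.14-5.15);
    `copula_diag(v)` must return C(v, v)."""
    u = (np.arange(ngrid) + 0.5) / ngrid            # midpoint rule on (0, 1)
    gamma = np.mean(copula_diag(u ** (1.0 / alpha)))
    return 2.0 + alpha * (1.0 - 1.0 / gamma)
```

For the independence copula (C(v, v) = v²) this returns 0 for every α, and for an extreme value diagonal C(v, v) = v^ϑ it returns 2 − ϑ regardless of α, as stated above.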
Equations (5.14) and (5.15) imply
\[
\lambda_1 = 2 + (1 - \gamma_1^{-1}) = 2 - \frac{1 + 2\nu_1}{1 - 2\nu_1} = \frac{1 - 6\nu_1}{1 - 2\nu_1} = \frac{\varphi}{1 - 2\nu_1},
\]
i.e., both λ_1 and φ are 0 at independence and 1 at comonotonicity, but λ_1 ≥ φ (resp. λ_1 ≤ φ) for all copulas with positive (resp. negative) dependence, as 1 − 2ν_1 ≤ 1. At the countermonotonicity limit, we have λ_1 = −1 (see Section 5.2.1) and φ = −1/2.

5.2.1 Dependence properties

We first investigate the behaviour of λ_α as a measure of concordance of a bivariate copula. Scarsini (1984) has a list of criteria that a measure of concordance should satisfy; these are summarized in Definition 2.8 of Joe (2014). We check each of these items:

1. Domain: Satisfied, as λ_α is defined for all bivariate pairs with copula C.
2. Symmetry (permutation): Satisfied, as \(\gamma_\alpha = \int_0^1 C(u^{1/\alpha}, u^{1/\alpha})\, \mathrm{d}u\) is symmetric in the arguments. See below for comments on reflection symmetry.
3. Coherence: Satisfied, as \(C_1(u_1,u_2) \prec_c C_2(u_1,u_2)\) (i.e., C_2 is larger than C_1 in the concordance ordering, or equivalently C_2 ≥ C_1 pointwise) implies that γ_α is larger for C_2 than for C_1, and so is λ_α.
4. Range: The measure is constructed such that λ_α = 1 at comonotonicity. We show below that λ_α is not necessarily −1 at countermonotonicity, and hence this item is not completely satisfied in general.
5. Independence: Satisfied, as λ_α = 0 for the independence copula.
6. Sign reversal: As the range condition is not generally satisfied, it is impossible for λ_α for all (U_1, U_2) to be the negation of that for (−U_1, U_2).
7. Continuity: Satisfied, as λ_α is defined based on the copula.
8. Invariance: Satisfied, as monotonic marginal transformations do not affect the copula.

Thus, λ_α satisfies many desirable properties for a measure of concordance. For reflection symmetry (i.e., the same value of λ_α for a copula C and its reflection Ĉ), note that
\[
\gamma_\alpha(\hat C) = \int_0^1 \hat C(u^{1/\alpha}, u^{1/\alpha})\, \mathrm{d}u = \int_0^1 \left(2u^{1/\alpha} - 1 + C(1 - u^{1/\alpha}, 1 - u^{1/\alpha})\right) \mathrm{d}u = \frac{\alpha-1}{\alpha+1} + \int_0^1 C(v^{1/\alpha}, v^{1/\alpha})\left(v^{-1/\alpha} - 1\right)^{\alpha-1} \mathrm{d}v,
\]
where \(\gamma_\alpha(\hat C)\) denotes the value of γ_α for the copula Ĉ.
This quantity is equal to \(\gamma_\alpha(C)\) if C is reflection symmetric (then it holds for any α > 0), or when α = 1 (then it holds for any copula). Otherwise, it is not generally true that \(\gamma_\alpha(\hat C) = \gamma_\alpha(C)\).

For item 4 (range), because λ_α is a coherent dependence measure, the lower bound of its range can be obtained by considering the countermonotonicity copula, i.e., the Fréchet lower bound of a bivariate copula. The countermonotonicity copula is given by \(C^-(u_1, u_2) = \max(0, u_1 + u_2 - 1)\), and thus
\[
\gamma_\alpha^- = \int_0^1 \max(0, 2u^{1/\alpha} - 1)\, \mathrm{d}u = \int_{2^{-\alpha}}^1 (2u^{1/\alpha} - 1)\, \mathrm{d}u = \frac{2^{-\alpha} + \alpha - 1}{1+\alpha},
\]
where the minus sign in the superscript denotes the value for the countermonotonicity copula. This implies
\[
\lambda_\alpha^- = 2 + \alpha\left(1 - \frac{1}{\gamma_\alpha^-}\right) = \frac{2^{-\alpha}(\alpha+2) - 2}{2^{-\alpha} + \alpha - 1},
\]
an increasing function of α. When α → 0⁺, the limit of λ_α⁻ can be obtained using L'Hôpital's rule:
\[
\lim_{\alpha\to 0^+} \lambda_\alpha^- = \lim_{\alpha\to 0^+} \frac{2^{-\alpha}(\alpha+2) - 2}{2^{-\alpha} + \alpha - 1} = \lim_{\alpha\to 0^+} \frac{2^{-\alpha}(1 - \alpha\log 2 - \log 4)}{1 - 2^{-\alpha}\log 2} = \frac{1 - \log 4}{1 - \log 2} \approx -1.259.
\]
When α → ∞, λ_α⁻ → 0, and λ_α⁻ = −1 when α = 1. We thus observe that the range requirement of being −1 at countermonotonicity is only satisfied in the original formulation of the F-madogram, and λ_α⁻ can be slightly less than −1 when α ∈ (0, 1).

5.2.2 Interpretation and use for general copulas

Interpretation of λ_α in (5.15) for bivariate copulas can be made by focusing on the behaviour of γ_α. For α = 1, it is an integral along the diagonal of the copula at equal increments du. When α > 1, u^{1/α} > u and more emphasis is on the distribution function at the joint upper tail, whereas the opposite is true when α ∈ (0, 1).
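This weighting effect can be made concrete: for a copula with upper but no lower tail dependence, λ_α(C) should increase toward the upper tail dependence index as α grows. A self-contained sketch using the reflected MTCJ (Clayton) copula with parameter δ = 2, for which Kendall's τ = 0.5 and the upper tail dependence index is 2^{−1/2} ≈ 0.71 (function names and grid size are our own choices):

```python
import numpy as np

def lam(diag, alpha, ngrid=20000):
    """lambda_alpha = 2 + alpha*(1 - 1/gamma_alpha), with gamma_alpha computed
    by the midpoint rule; `diag(v)` returns C(v, v)."""
    u = (np.arange(ngrid) + 0.5) / ngrid
    gamma = np.mean(diag(u ** (1.0 / alpha)))
    return 2.0 + alpha * (1.0 - 1.0 / gamma)

def reflected_clayton_diag(v, delta=2.0):
    """Diagonal of the reflected MTCJ (Clayton) copula:
    C_hat(v, v) = 2v - 1 + C(1-v, 1-v), with C(w, w) = (2 w^{-delta} - 1)^{-1/delta}."""
    w = 1.0 - v
    return 2.0 * v - 1.0 + (2.0 * w ** (-delta) - 1.0) ** (-1.0 / delta)
```

As α increases, lam(reflected_clayton_diag, α) rises from roughly the level shared by copulas with the same Kendall's τ toward the upper tail dependence index, in line with the rMTCJ column of Table 5.3.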
The quantity λ_α can thus be thought of as a tail-weighted summary that puts different weights on the strength of dependence of a copula (in terms of the magnitude of C(u_1, u_2) along the diagonal) at different locations. The use of this estimator in this regard can be compared to that of the tail-weighted dependence measures proposed in Krupskii and Joe (2015).

To illustrate this, we compute the value of λ_α for several parametric copula families with a constant value of Kendall's τ at 0.5. They include the Gaussian (symmetric with no tail dependence, i.e., the tail dependence index is zero for both tails), Frank (symmetric with even lighter tails than Gaussian), Hüsler–Reiss (asymmetric with upper tail dependence, extreme value copula), Gumbel (asymmetric with upper tail dependence, extreme value copula), reflected MTCJ (very asymmetric with upper tail dependence), t (symmetric with dependence in both tails), and BB1 (asymmetric with dependence in both tails) copulas. The BB1 family has two parameters; we set the upper tail dependence index to 0.5 as well in order to arrive at a unique set of parameters. This results in a lower tail dependence index of 0.303. For a copula C, λ_α(C) puts more weight on the upper tail when α is large. For the reflected copula Ĉ, λ_α(Ĉ) puts more weight on the lower tail of C when α is large. Hereafter, when necessary, we make the distinction that λ_{U,α} = λ_α(C) and λ_{L,α} = λ_α(Ĉ), respectively. The results are shown in Table 5.3; note that the λ_α values are not very different among copulas with the same Kendall's τ when α is small. The difference becomes more pronounced as α increases and, more importantly, the magnitudes are indeed indicative of the strength of tail dependence expected of these copulas: for the panel with λ_{U,α} values, note that the reflected MTCJ has the heaviest upper tail, and is followed by the Hüsler–Reiss and Gumbel copulas; the BB1 copula and the t copula with small degrees of freedom come next; they have moderate upper tail dependence.
The Gaussian copula and the t copula with large degrees of freedom have even lighter upper tails, and the Frank copula has the lightest tails of all the parametric copula families considered. As for the lower tails, only the t and BB1 copulas have lower tail dependence, and this is also reflected by the values of λ_{L,α} for α large. By the construction of λ_{U,α}, it does not change with α when C is an extreme value copula (i.e., C(u, u) = u^ϑ for some ϑ that does not depend on u); in this case λ_{U,α} = λ_U, the upper tail dependence index, for all α. This can be seen for the Hüsler–Reiss and Gumbel copulas in the upper panel of Table 5.3. For general copulas, we show in Section 5.2.3 that λ_α converges to the tail dependence index for the appropriate tail as α → ∞.

λ_{U,α} — upper tail-weighted for large α
| α | Gaussian | Frank | HR | Gumbel | rMTCJ | t (ν = 3) | t (ν = 20) | BB1 |
| 0.2 | 0.61 | 0.59 | 0.58 | 0.59 | 0.54 | 0.64 | 0.62 | 0.62 |
| 0.5 | 0.60 | 0.59 | 0.58 | 0.59 | 0.56 | 0.62 | 0.60 | 0.60 |
| 1 | 0.58 | 0.58 | 0.58 | 0.59 | 0.59 | 0.60 | 0.58 | 0.59 |
| 2 | 0.55 | 0.56 | 0.58 | 0.59 | 0.62 | 0.57 | 0.55 | 0.56 |
| 5 | 0.50 | 0.47 | 0.58 | 0.59 | 0.65 | 0.53 | 0.50 | 0.54 |
| 10 | 0.45 | 0.37 | 0.58 | 0.59 | 0.68 | 0.51 | 0.46 | 0.52 |
| 20 | 0.40 | 0.26 | 0.58 | 0.59 | 0.69 | 0.49 | 0.42 | 0.51 |
| 50 | 0.34 | 0.14 | 0.58 | 0.59 | 0.70 | 0.48 | 0.36 | 0.50 |
| 100 | 0.29 | 0.08 | 0.58 | 0.59 | 0.70 | 0.47 | 0.33 | 0.50 |
| λ_U | 0.00 | 0.00 | 0.58 | 0.59 | 0.71 | 0.45 | 0.07 | 0.50 |

λ_{L,α} — lower tail-weighted for large α
| α | Gaussian | Frank | HR | Gumbel | rMTCJ | t (ν = 3) | t (ν = 20) | BB1 |
| 0.2 | 0.61 | 0.59 | 0.66 | 0.66 | 0.69 | 0.64 | 0.62 | 0.64 |
| 0.5 | 0.60 | 0.59 | 0.62 | 0.63 | 0.65 | 0.62 | 0.60 | 0.61 |
| 1 | 0.58 | 0.58 | 0.58 | 0.59 | 0.59 | 0.60 | 0.58 | 0.59 |
| 2 | 0.55 | 0.56 | 0.52 | 0.53 | 0.50 | 0.57 | 0.55 | 0.55 |
| 5 | 0.50 | 0.47 | 0.43 | 0.44 | 0.35 | 0.53 | 0.50 | 0.50 |
| 10 | 0.45 | 0.37 | 0.36 | 0.36 | 0.24 | 0.51 | 0.46 | 0.47 |
| 20 | 0.40 | 0.26 | 0.29 | 0.30 | 0.15 | 0.49 | 0.42 | 0.44 |
| 50 | 0.34 | 0.14 | 0.21 | 0.22 | 0.07 | 0.48 | 0.36 | 0.40 |
| 100 | 0.29 | 0.08 | 0.17 | 0.17 | 0.04 | 0.47 | 0.33 | 0.38 |
| λ_L | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.45 | 0.07 | 0.30 |

Table 5.3: Values of λ_{U,α} and λ_{L,α} for various bivariate copulas with Kendall's τ equal to 0.5. "HR" and "rMTCJ" stand for the Hüsler–Reiss and reflected MTCJ copulas, respectively. The values for λ_U and λ_L in the
final rows are the upper and lower tail dependence index, respectively.

The choice of α is important in using the F-madogram measure of dependence to assess the tail characteristics of a distribution. It is desirable to find a value of α such that the difference of λ_α between a model with tail dependence and one without is large relative to the variability of the empirical estimator, so as to obtain a higher differentiating power among copula models. To explore the range of desirable values of α, we run a simulation study similar to that of Table 2 of Krupskii and Joe (2015). Six scenarios are considered; in each of these scenarios, we estimate the difference between the λ_{L,α} values for a lower tail dependent copula (either t with 4 degrees of freedom or reflected Gumbel) and one without tail dependence (for the lower tail) to high accuracy, using large samples (20,000 replications, each of size 20,000) and the rank-based version of the estimator. These are compared against their variability (in terms of standard errors) for a sample size of 400, which reflects realistic scenarios.[11] Copula parameters are chosen so that Spearman's ρ is 0.5 or 0.7; these simulation settings are the same as those in Krupskii and Joe (2015), so as to allow comparisons between the two measures. Results of the simulation are given in Table 5.4; they appear to suggest that a value of α between 15 and 20 is more effective in differentiating between copulas with and without tail dependence. As with the tail-weighted dependence measures of Krupskii and Joe (2015), a sample size of 400 seems to be insufficient for detecting the difference between the tails of a Gaussian copula and a t copula with 4 degrees of freedom, but marginally sufficient for copulas with more different lower tails.
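In practice, the upper and lower tail-weighted versions come from the same routine: λ̂_{L,α} is simply λ̂_{U,α} applied to the reflected pseudo-observations. A minimal rank-based sketch (the function name and the rank/(n+1) scaling are our own choices):

```python
import numpy as np

def lambda_hat_rank(x1, x2, alpha, lower=False):
    """Rank-based tail-weighted measure: lambda_hat_{U,alpha}, or lambda_hat_{L,alpha}
    when lower=True (computed on the reflected pseudo-observations 1 - R)."""
    n = len(x1)
    r1 = (np.argsort(np.argsort(np.asarray(x1))) + 1) / (n + 1)
    r2 = (np.argsort(np.argsort(np.asarray(x2))) + 1) / (n + 1)
    if lower:
        r1, r2 = 1.0 - r1, 1.0 - r2     # reflect to weight the lower tail
    nu = 0.5 * np.mean(np.abs(r1 ** alpha - r2 ** alpha))
    theta = (alpha + alpha * (1 + alpha) * nu) / (alpha - (1 + alpha) * nu)
    return 2.0 - theta
```

For comonotonic data both versions equal 1 exactly, and for countermonotonic data the α = 1 value approaches −1, matching λ_1⁻ from Section 5.2.1.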
For α less than 50, we can also see that the standard errors of the differences are typically smaller than those of the tail-weighted dependence measures of Krupskii and Joe (2015), with their chosen functions and truncation level.

5.2.3 Boundary cases

We investigate the properties of the F-madogram measure of dependence λ_α = λ_{U,α} = λ_α(C) as α approaches its lower or upper limit, i.e., as α → 0⁺ and α → ∞, and show that λ_α tends to the upper tail dependence index λ_U in the latter case.

Limit of λ_α as α → 0⁺. When α → 0⁺, the integrand of γ_α, \(C(u^{1/\alpha}, u^{1/\alpha})\), tends to zero everywhere for u ∈ [0, 1) and is only 1 at u = 1. The integrand is also bounded in [0, 1], and thus the exchange of limit and integral is valid. This yields \(\lim_{\alpha\to 0^+}\gamma_\alpha = 0\) and thus
\[
\lim_{\alpha\to 0^+} \lambda_\alpha = 2 - \lim_{\alpha\to 0^+}\frac{\alpha}{\gamma_\alpha} = 2 - \lim_{\alpha\to 0^+}\left[\frac{1}{\alpha}\int_0^1 C(u^{1/\alpha}, u^{1/\alpha})\, \mathrm{d}u\right]^{-1} = 2 - \lim_{\alpha\to 0^+}\left[\int_0^1 v^{\alpha-1} C(v, v)\, \mathrm{d}v\right]^{-1} = 2 - \left[\int_0^1 v^{-1} C(v, v)\, \mathrm{d}v\right]^{-1}.
\]
The limit exists for non-comonotonicity copulas, as C(v, v) ≤ v and \(C(v,v)v^{\alpha-1} \le v^{-1}\).

Limit of λ_α as α → ∞, when the upper tail of C is well-behaved so that the upper tail dependence index λ_U exists.
When α → ∞, \(C(u^{1/\alpha}, u^{1/\alpha})\) tends to 1 everywhere for u ∈ (0, 1] but is undefined at u = 0; \(\lim_{\alpha\to\infty}\gamma_\alpha = 1\) and thus
\[
\lim_{\alpha\to\infty} \lambda_\alpha = 2 + \lim_{\alpha\to\infty} \frac{\alpha(\gamma_\alpha - 1)}{\gamma_\alpha} = 2 + \lim_{\alpha\to\infty} \alpha(\gamma_\alpha - 1).
\]

Footnote 11: Operationally, asymptotic variances σ² are estimated from the simulation result, and then converted to the standard error for sample size 400 via the relationship σ/√400.

Spearman's ρ = 0.5
| α | t4 − Gaussian | rGumbel − Gaussian | t4 − Gumbel | rGumbel − Gumbel | t4 − Frank | rGumbel − Frank |
| 1 | 0.03 (0.05) | 0.02 (0.05) | 0.02 (0.05) | 0.00 (0.05) | 0.02 (0.05) | 0.01 (0.05) |
| 3 | 0.03 (0.05) | 0.06 (0.05) | 0.07 (0.05) | 0.10 (0.05) | 0.04 (0.05) | 0.07 (0.05) |
| 5 | 0.04 (0.06) | 0.09 (0.06) | 0.09 (0.06) | 0.14 (0.06) | 0.06 (0.06) | 0.11 (0.06) |
| 10 | 0.07 (0.07) | 0.14 (0.07) | 0.13 (0.07) | 0.20 (0.07) | 0.12 (0.07) | 0.19 (0.07) |
| 15 | 0.08 (0.08) | 0.17 (0.08) | 0.15 (0.08) | 0.24 (0.08) | 0.16 (0.08) | 0.24 (0.08) |
| 20 | 0.09 (0.09) | 0.19 (0.09) | 0.17 (0.09) | 0.26 (0.09) | 0.18 (0.08) | 0.27 (0.08) |
| 50 | 0.13 (0.13) | 0.24 (0.13) | 0.20 (0.12) | 0.32 (0.12) | 0.24 (0.11) | 0.35 (0.11) |
| 100 | 0.15 (0.17) | 0.28 (0.17) | 0.22 (0.16) | 0.35 (0.16) | 0.26 (0.15) | 0.39 (0.15) |

Spearman's ρ = 0.7
| α | t4 − Gaussian | rGumbel − Gaussian | t4 − Gumbel | rGumbel − Gumbel | t4 − Frank | rGumbel − Frank |
| 1 | 0.03 (0.03) | 0.02 (0.04) | 0.01 (0.04) | 0.00 (0.04) | 0.02 (0.03) | 0.01 (0.03) |
| 3 | 0.03 (0.04) | 0.07 (0.04) | 0.06 (0.04) | 0.10 (0.04) | 0.04 (0.04) | 0.07 (0.04) |
| 5 | 0.04 (0.05) | 0.10 (0.05) | 0.09 (0.05) | 0.15 (0.05) | 0.07 (0.05) | 0.13 (0.05) |
| 10 | 0.06 (0.06) | 0.14 (0.06) | 0.14 (0.06) | 0.22 (0.06) | 0.15 (0.06) | 0.23 (0.06) |
| 15 | 0.08 (0.07) | 0.17 (0.07) | 0.17 (0.07) | 0.26 (0.07) | 0.20 (0.07) | 0.29 (0.07) |
| 20 | 0.09 (0.08) | 0.19 (0.08) | 0.19 (0.08) | 0.29 (0.08) | 0.23 (0.08) | 0.34 (0.08) |
| 50 | 0.13 (0.13) | 0.25 (0.12) | 0.24 (0.13) | 0.37 (0.12) | 0.33 (0.11) | 0.46 (0.10) |
| 100 | 0.15 (0.18) | 0.30 (0.17) | 0.28 (0.17) | 0.42 (0.16) | 0.38 (0.15) | 0.52 (0.13) |

Table 5.4: Comparison of the difference in λ_{L,α} for six pairs of copulas against their asymptotic standard errors for sample size 400 (in brackets).
The upper panel uses copulas with Spearman's ρ equal to 0.5 and the bottom panel 0.7; values that are significant at the 5% level are shown in boldface.

Note that
\[
\alpha(\gamma_\alpha - 1) = \int_0^1 \alpha\left[C(u^{1/\alpha}, u^{1/\alpha}) - 1\right]\mathrm{d}u = \int_0^1 \alpha\left[2u^{1/\alpha} - 2 + \bar C(u^{1/\alpha}, u^{1/\alpha})\right]\mathrm{d}u = \int_0^1 \alpha\,\bar C(u^{1/\alpha}, u^{1/\alpha})\, \mathrm{d}u - \frac{2\alpha}{\alpha+1}, \qquad (5.17)
\]
and thus we need to find
\[
\lim_{\alpha\to\infty} \int_0^1 \alpha\,\bar C(u^{1/\alpha}, u^{1/\alpha})\, \mathrm{d}u.
\]
First, note that the integral is bounded, as \(\bar C(u^{1/\alpha}, u^{1/\alpha}) = 1 - 2u^{1/\alpha} + C(u^{1/\alpha}, u^{1/\alpha}) \le 1 - u^{1/\alpha}\), meaning that
\[
\int_0^1 \alpha\,\bar C(u^{1/\alpha}, u^{1/\alpha})\, \mathrm{d}u \le \int_0^1 \alpha\left(1 - u^{1/\alpha}\right)\mathrm{d}u = \frac{\alpha}{\alpha+1} \le 1
\]
for any positive α, with the bound tending to 1 as α → ∞. Then, consider the tail expansion of the survival function of C using the definition of the upper tail dependence index λ_U:
\[
\bar C(1-v, 1-v) \sim \lambda_U\, v
\]
as v → 0⁺, where the operator ∼ is such that \(f_1(v) \sim f_2(v)\) means \(\lim_{v\to 0^+} f_1(v)/f_2(v) = 1\). Suppose 0 < λ_U < 1. Then for every (small) ε > 0, there exists δ > 0 such that for every 0 ≤ v < δ we have
\[
v(\lambda_U - \epsilon) \le \bar C(1-v, 1-v) \le v(\lambda_U + \epsilon). \qquad (5.18)
\]
Also, there exists some α* such that for every α > α*, \(1 - u^{1/\alpha} < \delta\) for every u > ε. Write
\[
\int_0^1 \alpha\,\bar C(u^{1/\alpha}, u^{1/\alpha})\, \mathrm{d}u = \int_0^\epsilon \alpha\,\bar C(u^{1/\alpha}, u^{1/\alpha})\, \mathrm{d}u + \int_\epsilon^1 \alpha\,\bar C(u^{1/\alpha}, u^{1/\alpha})\, \mathrm{d}u =: h_1(\epsilon,\alpha) + h_2(\epsilon,\alpha).
\]
We treat each integral separately. For \(h_1(\epsilon,\alpha)\),
\[
0 \le h_1(\epsilon,\alpha) \le \int_0^\epsilon \alpha\left(1 - u^{1/\alpha}\right)\mathrm{d}u = \alpha\left(\epsilon - \frac{\alpha}{\alpha+1}\,\epsilon^{(\alpha+1)/\alpha}\right).
\]
Because this upper bound tends to \(\epsilon(1 - \log\epsilon)\) as α → ∞, there exist some α** and a constant M > 1 such that for all α > α** we have
\[
0 \le h_1(\epsilon,\alpha) \le M\epsilon(1 - \log\epsilon).
\]
For \(h_2(\epsilon,\alpha)\), since \(1 - u^{1/\alpha} < \delta\) for all u > ε and α > α*, we use (5.18) to establish the bounds
\[
(\lambda_U - \epsilon)\int_\epsilon^1 \alpha\left(1 - u^{1/\alpha}\right)\mathrm{d}u \le h_2(\epsilon,\alpha) \le (\lambda_U + \epsilon)\int_\epsilon^1 \alpha\left(1 - u^{1/\alpha}\right)\mathrm{d}u
\]
\[
\Longrightarrow\quad (\lambda_U - \epsilon)\left[\frac{\alpha}{\alpha+1} - \alpha\left(\epsilon - \frac{\alpha}{\alpha+1}\,\epsilon^{(\alpha+1)/\alpha}\right)\right] \le h_2(\epsilon,\alpha) \le \lambda_U + \epsilon,
\]
where the upper bound uses the relationship \(\int_\epsilon^1 \alpha(1-u^{1/\alpha})\,\mathrm{d}u \le \int_0^1 \alpha(1-u^{1/\alpha})\,\mathrm{d}u \le 1\). Thus
\[
(\lambda_U - \epsilon)\left[\frac{\alpha}{\alpha+1} - M\epsilon(1-\log\epsilon)\right] \le h_2(\epsilon,\alpha) \le \lambda_U + \epsilon
\]
for all α > max(α*, α**), and, as α → ∞,
\[
(\lambda_U - \epsilon)\left[1 - M\epsilon(1-\log\epsilon)\right] \le h_1(\epsilon,\infty) + h_2(\epsilon,\infty) \le M\epsilon(1-\log\epsilon) + (\lambda_U + \epsilon),
\]
where \(h_j(\epsilon,\infty) = \lim_{\alpha\to\infty} h_j(\epsilon,\alpha)\), j = 1, 2.
Since ε > 0 can be arbitrarily small,
\[
\lim_{\alpha\to\infty} \int_0^1 \alpha\,\bar C(u^{1/\alpha}, u^{1/\alpha})\, \mathrm{d}u = \lambda_U.
\]
The proof applies to λ_U = 1 or 0 by taking (5.18) as \(v(\lambda_U - \epsilon) \le \bar C(1-v, 1-v) \le v\) or \(0 \le \bar C(1-v, 1-v) \le v(\lambda_U + \epsilon)\), respectively. Putting this result back into (5.17), we obtain
\[
\lim_{\alpha\to\infty} \lambda_\alpha = \lambda_U,
\]
thus reinforcing the interpretation of λ_α that more weight is put on the upper tail as α increases, eventually coinciding with the upper tail dependence index when α → ∞.

5.2.4 Asymptotic normality and variance

In this subsection, we explore the asymptotic properties of the F-madogram measure of dependence estimator as the sample size n → ∞. We deal with the two versions of the estimator separately: the estimator assuming known margins, λ̂_α, and the one using ranks, λ̂_{α,r}. We focus on the treatment of λ_α = λ_{U,α} = λ_α(C).

Estimator assuming known margins. For the estimator assuming known margins, the asymptotic normality of \(\sqrt n(\hat\nu_\alpha - \nu_\alpha)\) is immediate, noting from (5.5) that it is a sum of i.i.d. random variables with finite variance. We thus have
\[
\sqrt n\left(\hat\nu_\alpha - \nu_\alpha\right) \xrightarrow{d} N(0, \sigma_\nu^2),
\]
where \(\sigma_\nu^2 = \mathrm{Var}(|U_1^\alpha - U_2^\alpha|/2)\) is obtained as follows:
\[
\sigma_\nu^2 = \frac14\left\{\mathrm{E}\left[(U_1^\alpha - U_2^\alpha)^2\right] - \left(\mathrm{E}|U_1^\alpha - U_2^\alpha|\right)^2\right\} = \frac12\mathrm{E}\left(U_1^{2\alpha}\right) - \frac12\mathrm{E}\left[(U_1 U_2)^\alpha\right] - \nu_\alpha^2 = \frac{1}{2(1+2\alpha)} - \frac12\rho_\alpha - \nu_\alpha^2,
\]
where \(\rho_\alpha := \mathrm{E}[(U_1 U_2)^\alpha]\). To obtain the asymptotic variance of λ̂_α, we refer to (5.14) with λ_α = 2 − ϑ_α and let
\[
h(x) = 2 - \frac{\alpha + \alpha(1+\alpha)x}{\alpha - (1+\alpha)x} = \frac{\alpha - (1+\alpha)(2+\alpha)x}{\alpha - (1+\alpha)x}.
\]
This yields
\[
h'(\nu_\alpha) = -\frac{\alpha(1+\alpha)^2}{\left[\alpha(\nu_\alpha - 1) + \nu_\alpha\right]^2} = -\frac{(\alpha + 2 - \lambda_\alpha)^2}{\alpha},
\]
using the relationship
\[
\nu_\alpha = \frac{\alpha(1-\lambda_\alpha)}{(1+\alpha)(\alpha + 2 - \lambda_\alpha)}
\]
from (5.13) and (5.14). Then, by the delta method,
\[
\sqrt n\left(\hat\lambda_\alpha - \lambda_\alpha\right) \xrightarrow{d} N(0, \sigma^2),
\]
where
\[
\sigma^2 = \left[h'(\nu_\alpha)\right]^2 \sigma_\nu^2 = \frac{(\alpha + 2 - \lambda_\alpha)^4}{\alpha^2}\left[\frac{1}{2(1+2\alpha)} - \frac12\rho_\alpha - \nu_\alpha^2\right]. \qquad (5.19)
\]
Table 5.5 tabulates the values of the asymptotic variances for a combination of various parametric copula families and values of α; these are the same scenarios as those considered in Table 5.3. Note that \(\mathrm{E}[(U_1 U_2)^\alpha]\) can be obtained by the following relationship, in which V_1 and V_2 are i.i.d.
unit uniform random variables, independent of (U_1, U_2) ∼ C:
\[
\begin{aligned}
\int_0^1\int_0^1 (u_1 u_2)^\alpha\, \mathrm{d}C(u_1, u_2)
&= \int_0^1\int_0^1 \mathrm{P}\left(V_1^{1/\alpha} \le u_1, V_2^{1/\alpha} \le u_2 \,\middle|\, U_1 = u_1, U_2 = u_2\right) c(u_1, u_2)\, \mathrm{d}u_1 \mathrm{d}u_2 \\
&= \mathrm{P}\left(V_1^{1/\alpha} \le U_1,\, V_2^{1/\alpha} \le U_2\right) \\
&= \int_0^1\int_0^1 \mathrm{P}\left(U_1 > v_1^{1/\alpha}, U_2 > v_2^{1/\alpha} \,\middle|\, V_1 = v_1, V_2 = v_2\right) \mathrm{d}v_1 \mathrm{d}v_2 \\
&= \int_0^1\int_0^1 \bar C\left(v_1^{1/\alpha}, v_2^{1/\alpha}\right) \mathrm{d}v_1 \mathrm{d}v_2.
\end{aligned}
\]
The final integral can be evaluated numerically and is stable, as both the integrand and the limits of integration are bounded by [0, 1]; this is preferred over direct evaluation of \(\int_0^1\int_0^1 (u_1 u_2)^\alpha c(u_1, u_2)\, \mathrm{d}u_1 \mathrm{d}u_2\), as copula densities may asymptote to ∞ near (0, 0) or (1, 1).

From Table 5.5, we note that for most of these families, the asymptotic variances are smallest when α is small (e.g., 1 or less). This is in agreement with the behaviour of λ_α observed earlier, namely that it tends to be similar among copulas with the same overall strength of dependence. For given α, the asymptotic variance appears to be smaller for larger values of λ_α.

Rank-based estimator. Equation (5.7) expresses the rank-based estimator ν̂_{α,r} in terms of the empirical copula C_n; the population version ν_α can be obtained by replacing C_n with C in (5.7). The convergence result for the empirical copula process (5.9) then implies that
\[
\sqrt n\left(\hat\nu_{\alpha,r} - \nu_\alpha\right) \xrightarrow{d} X,
\]
where
\[
X = \frac12\int_0^1 \mathbb{G}_C(u^{1/\alpha}, 1)\, \mathrm{d}u + \frac12\int_0^1 \mathbb{G}_C(1, u^{1/\alpha})\, \mathrm{d}u - \int_0^1 \mathbb{G}_C(u^{1/\alpha}, u^{1/\alpha})\, \mathrm{d}u. \qquad (5.20)
\]
The normality of X is given by (5.11) with \(J(u, v) = |u^\alpha - v^\alpha|/2\).
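The survival-copula double integral for ρ_α and the variance formula (5.19) assemble into a short numerical routine. A sketch for the independence copula, where C̄(a, b) = (1−a)(1−b), ρ_α = (1+α)^{−2} and λ_α = 0 (function names and grid size are our own choices):

```python
import numpy as np

def rho_alpha(C_bar, alpha, ngrid=500):
    """rho_alpha = E[(U1 U2)^alpha], via the double integral of C_bar(v1^{1/a}, v2^{1/a})."""
    v = (np.arange(ngrid) + 0.5) / ngrid
    a = v ** (1.0 / alpha)
    return float(C_bar(a[:, None], a[None, :]).mean())

def avar_known_margins(alpha, lam, rho):
    """Asymptotic variance (5.19) of the known-margins estimator of lambda_alpha."""
    nu = alpha * (1 - lam) / ((1 + alpha) * (alpha + 2 - lam))
    s2_nu = 1.0 / (2 * (1 + 2 * alpha)) - rho / 2.0 - nu ** 2
    return (alpha + 2 - lam) ** 4 / alpha ** 2 * s2_nu

indep_bar = lambda a, b: (1 - a) * (1 - b)   # survival function of the independence copula
```

At α = 1 this gives ρ_1 = 1/4, and (5.19) evaluates to 81/72 = 1.125 for the known-margins estimator under independence.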
For the asymptotic distribution of λ̂_{α,r}, note that from (5.13) and (5.15) we have
\[
\lambda_\alpha = 2 + \alpha\left(1 - \frac{1}{\alpha/(\alpha+1) - \nu_\alpha}\right); \qquad \hat\lambda_{\alpha,r} = 2 + \alpha\left(1 - \frac{1}{\alpha/(\alpha+1) - \hat\nu_{\alpha,r}}\right),
\]
and therefore
\[
\sqrt n\left(\hat\lambda_{\alpha,r} - \lambda_\alpha\right) = -\sqrt n\,\alpha\left(\frac{1}{\alpha/(\alpha+1) - \hat\nu_{\alpha,r}} - \frac{1}{\alpha/(\alpha+1) - \nu_\alpha}\right) = -\alpha\,\frac{\sqrt n\left(\hat\nu_{\alpha,r} - \nu_\alpha\right)}{\left[\alpha/(\alpha+1) - \hat\nu_{\alpha,r}\right]\left[\alpha/(\alpha+1) - \nu_\alpha\right]} = r_\alpha\left[\sqrt n\left(\hat\nu_{\alpha,r} - \nu_\alpha\right)\right] + o_p(1),
\]
where \(r_\alpha = -\alpha\left[\alpha/(\alpha+1) - \nu_\alpha\right]^{-2} = -\alpha\gamma_\alpha^{-2}\). As a result,
\[
\sqrt n\left(\hat\lambda_{\alpha,r} - \lambda_\alpha\right) \xrightarrow{d} N\left(0,\; \frac{(\alpha + 2 - \lambda_\alpha)^4}{\alpha^2}\,\mathrm{Var}(X)\right),
\]
where the multiplicative constant in the variance is the same as \([h'(\nu_\alpha)]^2\) in (5.19).

Asymptotic variance of λ̂_{U,α} — upper tail-weighted for large α
| α | Gaussian | Frank | HR | Gumbel | rMTCJ | t (ν = 3) | t (ν = 20) | BB1 |
| 0.2 | 0.23 | 0.32 | 0.33 | 0.34 | 0.49 | 0.25 | 0.23 | 0.24 |
| 0.5 | 0.19 | 0.21 | 0.23 | 0.24 | 0.30 | 0.22 | 0.19 | 0.20 |
| 1 | 0.18 | 0.18 | 0.19 | 0.20 | 0.20 | 0.22 | 0.19 | 0.20 |
| 2 | 0.24 | 0.25 | 0.21 | 0.22 | 0.17 | 0.28 | 0.25 | 0.25 |
| 5 | 0.54 | 0.67 | 0.36 | 0.37 | 0.24 | 0.52 | 0.54 | 0.48 |
| 10 | 1.14 | 1.61 | 0.65 | 0.67 | 0.40 | 0.97 | 1.11 | 0.89 |
| 20 | 2.50 | 3.78 | 1.25 | 1.29 | 0.73 | 1.91 | 2.39 | 1.73 |
| 50 | 7.07 | 10.92 | 3.06 | 3.13 | 1.71 | 4.80 | 6.62 | 4.28 |
| 100 | 15.38 | 23.24 | 6.08 | 6.21 | 3.34 | 9.67 | 14.22 | 8.52 |

Asymptotic variance of λ̂_{L,α} — lower tail-weighted for large α
| α | Gaussian | Frank | HR | Gumbel | rMTCJ | t (ν = 3) | t (ν = 20) | BB1 |
| 0.2 | 0.23 | 0.32 | 0.15 | 0.16 | 0.11 | 0.25 | 0.23 | 0.21 |
| 0.5 | 0.19 | 0.21 | 0.15 | 0.17 | 0.14 | 0.22 | 0.19 | 0.19 |
| 1 | 0.18 | 0.18 | 0.19 | 0.20 | 0.20 | 0.22 | 0.19 | 0.20 |
| 2 | 0.24 | 0.25 | 0.30 | 0.30 | 0.36 | 0.28 | 0.25 | 0.27 |
| 5 | 0.54 | 0.67 | 0.72 | 0.72 | 0.97 | 0.52 | 0.54 | 0.55 |
| 10 | 1.14 | 1.61 | 1.54 | 1.53 | 2.11 | 0.97 | 1.11 | 1.08 |
| 20 | 2.50 | 3.78 | 3.36 | 3.34 | 4.52 | 1.91 | 2.39 | 2.23 |
| 50 | 7.07 | 10.92 | 9.36 | 9.29 | 11.94 | 4.80 | 6.62 | 5.87 |
| 100 | 15.38 | 23.24 | 20.00 | 19.86 | 24.41 | 9.67 | 14.22 | 12.20 |

Table 5.5: Asymptotic variances of λ̂_{U,α} and λ̂_{L,α} for various bivariate copulas with Kendall's τ equal to 0.5. The estimators are defined in (5.16), with the population version of λ̂_{U,α} being λ_α(C) for copula C, and that of λ̂_{L,α} being λ_α(Ĉ). "HR" and "rMTCJ" stand for the Hüsler–Reiss and reflected MTCJ copulas, respectively.

As we remarked in Section 5.1.3, the asymptotic variance is usually a 2-dimensional integral that must be evaluated numerically.
Table 5.6 displays the asymptotic variances of the rank-based estimators λ̂_{U,α,r} and λ̂_{L,α,r}, evaluated using the expressions in Appendix A. Compared to the estimator assuming known margins, the general pattern of the asymptotic variances attaining their smallest values at α ≈ 1 continues to hold. For smaller α, the asymptotic variance of the rank-based estimators is higher than that of the estimator assuming known margins, while the reverse is true when α is larger (e.g., α ≥ 5).

Asymptotic variance of λ̂_{U,α,r} — upper tail-weighted for large α
| α | Gaussian | Frank | HR | Gumbel | rMTCJ | t (ν = 3) | t (ν = 20) | BB1 |
| 0.2 | 0.26 | 0.24 | 0.31 | 0.31 | 0.35 | 0.29 | 0.27 | 0.29 |
| 0.5 | 0.25 | 0.21 | 0.28 | 0.28 | 0.31 | 0.28 | 0.25 | 0.27 |
| 1 | 0.25 | 0.21 | 0.27 | 0.27 | 0.28 | 0.28 | 0.25 | 0.27 |
| 2 | 0.29 | 0.25 | 0.28 | 0.29 | 0.27 | 0.33 | 0.30 | 0.31 |
| 5 | 0.47 | 0.46 | 0.41 | 0.42 | 0.33 | 0.50 | 0.47 | 0.48 |
| 10 | 0.78 | 0.76 | 0.65 | 0.67 | 0.46 | 0.83 | 0.79 | 0.79 |
| 20 | 1.43 | 1.19 | 1.14 | 1.17 | 0.76 | 1.49 | 1.45 | 1.43 |
| 50 | 3.33 | 1.82 | 2.64 | 2.71 | 1.65 | 3.52 | 3.44 | 3.34 |
| 100 | 6.36 | 2.22 | 5.13 | 5.26 | 3.14 | 6.95 | 6.73 | 6.53 |

Asymptotic variance of λ̂_{L,α,r} — lower tail-weighted for large α
| α | Gaussian | Frank | HR | Gumbel | rMTCJ | t (ν = 3) | t (ν = 20) | BB1 |
| 0.2 | 0.26 | 0.24 | 0.23 | 0.24 | 0.20 | 0.29 | 0.27 | 0.27 |
| 0.5 | 0.25 | 0.21 | 0.24 | 0.24 | 0.23 | 0.28 | 0.25 | 0.26 |
| 1 | 0.25 | 0.21 | 0.27 | 0.27 | 0.28 | 0.28 | 0.25 | 0.27 |
| 2 | 0.29 | 0.25 | 0.33 | 0.33 | 0.38 | 0.33 | 0.30 | 0.32 |
| 5 | 0.47 | 0.46 | 0.53 | 0.53 | 0.58 | 0.50 | 0.47 | 0.50 |
| 10 | 0.78 | 0.76 | 0.85 | 0.85 | 0.79 | 0.83 | 0.79 | 0.83 |
| 20 | 1.43 | 1.19 | 1.41 | 1.42 | 1.00 | 1.49 | 1.45 | 1.51 |
| 50 | 3.33 | 1.82 | 2.80 | 2.84 | 1.24 | 3.52 | 3.44 | 3.60 |
| 100 | 6.36 | 2.22 | 4.63 | 4.73 | 1.35 | 6.95 | 6.73 | 7.13 |

Table 5.6: Asymptotic variances of λ̂_{U,α,r} and λ̂_{L,α,r} for various bivariate copulas with Kendall's τ equal to 0.5. The estimators are defined in (5.16) but are based on marginal ranks; the population version of λ̂_{U,α,r} is λ_α(C) for copula C, and that of λ̂_{L,α,r} is λ_α(Ĉ). "HR" and "rMTCJ" stand for the Hüsler–Reiss and reflected MTCJ copulas, respectively.

The expressions for the asymptotic variance can be greatly simplified if C is the independence copula.
Here we outline the derivations; the more tedious parts of the proof are relegated to Appendix A. When $C(u_1,u_2) = u_1 u_2$, we have
\[
G_C(u_1,u_2) = B_C(u_1,u_2) - u_2 B_C(u_1,1) - u_1 B_C(1,u_2),
\]
where
\[
E\left[B_C(u_1,u_2)\,B_C(u_3,u_4)\right] = (u_1 \wedge u_3)(u_2 \wedge u_4) - u_1 u_2 u_3 u_4.
\]
With this, the covariance function of $G_C$ is given by
\[
E\left[G_C(u_1,u_2)\,G_C(u_3,u_4)\right] = (u_1 \wedge u_3 - u_1 u_3)(u_2 \wedge u_4 - u_2 u_4),
\]
which allows the computation of the variance of $X$ (after some simplification) as
\[
\mathrm{Var}(X) = E(X^2) = \int_0^1\!\!\int_0^1 E\left[G_C(u^{1/\alpha},u^{1/\alpha})\,G_C(v^{1/\alpha},v^{1/\alpha})\right] du\,dv
= \frac{\alpha^2}{(2+\alpha)^2(3+5\alpha+2\alpha^2)},
\]
so that $\sqrt n\,(\hat\lambda_{\alpha,r} - \lambda_\alpha) \overset{d}{\to} N(0,\sigma^2)$, where
\[
\sigma^2 = \frac{(2+\alpha)^4}{\alpha^2}\cdot\frac{\alpha^2}{(2+\alpha)^2(3+5\alpha+2\alpha^2)} = \frac{(2+\alpha)^2}{(1+\alpha)(3+2\alpha)}.
\]
Interestingly, this asymptotic variance is a decreasing function of $\alpha$, from $4/3$ as $\alpha \to 0^+$ to $1/2$ as $\alpha \to \infty$. From this, we can see that under independence the (rank-based) F-madogram measure of dependence has smaller variability than the tail-weighted dependence measures of Krupskii and Joe (2015), for which the asymptotic variance is proved to be $1/p^2$, with $0 < p \le 0.5$ being the truncation threshold.

5.2.5 Extension to higher dimensions

One advantage of the F-madogram estimator is that it can easily be extended to higher dimensions. Note that
\[
\hat\nu^{(2)}_\alpha = \frac{1}{2n}\sum_{i=1}^n \big|U_{i1}^\alpha - U_{i2}^\alpha\big|
= \frac{1}{n}\sum_{i=1}^n \Big[\max\big(U_{i1}^\alpha, U_{i2}^\alpha\big) - \tfrac12\big(U_{i1}^\alpha + U_{i2}^\alpha\big)\Big],
\]
where the bracketed superscript on $\hat\nu$ denotes the dimension of the underlying data. This latter form allows a generalization to higher dimensions, considered in Marcon et al. (2016). Let $m_i$ be the index of the largest observation in the vector $(U_{i1},\ldots,U_{id})$, then define
\[
\hat\nu^{(d)}_\alpha = \frac{1}{dn}\sum_{i=1}^n\sum_{j=1}^d \big(U_{i m_i}^\alpha - U_{ij}^\alpha\big)
= \frac{1}{n}\sum_{i=1}^n \Big[\max\big(U_{i1}^\alpha,\ldots,U_{id}^\alpha\big) - \frac1d\sum_{j=1}^d U_{ij}^\alpha\Big].
\]
This can be interpreted as the average distance between the maximum and the other components of the vector. For an arbitrary $d$-dimensional copula $C$, the population version of $\hat\nu^{(d)}_\alpha$ is given by
\[
\nu^{(d)}_\alpha = E\Big[\max\big(U_{i1}^\alpha,\ldots,U_{id}^\alpha\big) - \frac1d\sum_{j=1}^d U_{ij}^\alpha\Big]
= \frac{\alpha}{\alpha+1} - \int_0^1 C\big(u^{1/\alpha},\ldots,u^{1/\alpha}\big)\,du
\triangleq \frac{\alpha}{\alpha+1} - \gamma^{(d)}_\alpha,
\]
similar to (5.13).
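The multivariate statistic can be sketched directly from this "maximum minus average" form (a minimal Python sketch; the function name is ours, and the input is assumed to be copula-scale data):

```python
import numpy as np

def f_madogram_nu_d(u, alpha=1.0):
    """Multivariate F-madogram statistic nu_hat^{(d)}_alpha (Marcon et al., 2016).

    u : (n, d) array of copula-scale observations U_ij in (0, 1).
    Returns the average distance between max_j U_ij^alpha and the mean
    of the d components U_ij^alpha.
    """
    ua = u ** alpha
    return np.mean(ua.max(axis=1) - ua.mean(axis=1))
```

For $d = 2$ this reduces exactly to $(2n)^{-1}\sum_i |U_{i1}^\alpha - U_{i2}^\alpha|$, since the maximum minus the average of two numbers is half their absolute difference; for comonotone data it is identically zero.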
Note that $\gamma^{(d)}_\alpha = \alpha/(\alpha+d)$ for the independence copula, and by defining
\[
\lambda^{(d)}_\alpha = \frac{d + \alpha\big[1 - 1/\gamma^{(d)}_\alpha\big]}{d-1},
\]
a measure of dependence can be obtained; it satisfies the conditions $\lambda^{(d)}_\alpha = 0$ for the independence copula and $\lambda^{(d)}_\alpha = 1$ for the comonotonicity copula.

5.2.6 Potential future research

With a similar derivation, it is also possible to express the CFG estimator as an integral of the copula and its survival function along the diagonal. It may thus be possible to use that estimator as a dependence measure for general bivariate copulas. However, the expressions are more involved and not as easily interpretable as those for the F-madogram.

A direction for future work could be to modify estimators for the extremal coefficient into general measures of dependence that reveal other characteristics of a copula. For example, by considering different powers of exponentiation for each margin of the F-madogram estimator (e.g., $\alpha$ for the first margin, $\beta$ for the second), $\gamma$ becomes an integral not along the diagonal but along $(u^{1/\alpha}, u^{1/\beta})$ for $u \in [0,1]$. It may thus be possible to devise a measure of permutation asymmetry by considering the difference along $(u^{1/\alpha}, u^{1/\beta})$ and $(u^{1/\beta}, u^{1/\alpha})$.

Chapter 6

Assessing model adequacy based on empirical and fitted features

An important part of data modelling is to ensure that the fitted parsimonious models, which impose certain restrictions on the relationships between variables, capture sufficient characteristics expressed by the data. In Chapters 6 and 7, we turn our attention to model evaluation. In particular, we make use of various features that compare the characteristics exhibited by the data with those implied by the fitted models. We derive the limiting behaviour of this comparison statistic in this chapter and illustrate how to construct a range of flexible statistics that measure the amount and source of misfit in terms of the features desired.
This forms the theoretical background for the methods described in Chapter 7, where we consider the evaluation of the adequacy-of-fit of a general multivariate copula.

6.1 Introduction

We first provide an overview of, and the intuition behind, the discrepancy statistic based on the difference between the empirical and fitted parametric distributions, and their functional extensions. A connection to existing work on goodness-of-fit procedures using Kolmogorov-Smirnov and Cramér-von Mises-type tests is then made.

6.1.1 Vector of differences between empirical and model-based features

A statistical model is constructed as a tractable mathematical representation of a system that allows one to gain understanding of its mechanism and, in some cases, to assist in making reasonable predictions. With this objective in mind, one way to assess model adequacy is to compare how close the fitted model is to the data being modelled. This is especially relevant when the researcher thinks a decent model has been obtained, for instance through exploring the structure of the data or from subject knowledge, but would like guidance as to whether it is good enough.

For a sample of (possibly multivariate) i.i.d. observations $\boldsymbol Y_1,\ldots,\boldsymbol Y_n \sim F$ and parametric model $G$ with parameter vector $\boldsymbol\theta$, differences of the form $T(\hat F_n) - T[G(\cdot;\hat{\boldsymbol\theta}_n)]$ are an intuitive measure of discrepancy between the data and the fitted model, where $\hat F_n$ is the empirical distribution function of the sample, $\hat{\boldsymbol\theta}_n$ is a model-based estimate of the parameter vector for $G$, and $T$ is a functional taking a distribution function as its argument. This difference can also be called a residual, as it measures the lack of fit between the data and the model. We are interested in the difference statistic
\[
\boldsymbol D_n =
\begin{pmatrix}
T_1(\hat F_n) - T_1[G(\cdot;\hat{\boldsymbol\theta}_n)] \\
T_2(\hat F_n) - T_2[G(\cdot;\hat{\boldsymbol\theta}_n)] \\
\vdots \\
T_m(\hat F_n) - T_m[G(\cdot;\hat{\boldsymbol\theta}_n)]
\end{pmatrix}
\triangleq \boldsymbol T_{n,\mathrm{emp}} - \boldsymbol T_{n,\mathrm{mod}}, \tag{6.1}
\]
where $m$ denotes the number of functionals or features¹² being compared.
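For a single dependence feature, the difference in (6.1) can be sketched as follows (a minimal Python sketch under an assumed bivariate Gaussian copula model; the normal-scores correlation used as $\hat\theta_n$ and the function name are our illustrative choices, and $\tau = (2/\pi)\arcsin\rho$ is the model-based Kendall's tau of the Gaussian copula):

```python
import numpy as np
from scipy import stats

def diff_stat_tau(x1, x2):
    """One element of D_n: empirical Kendall's tau minus the model-based tau
    of a fitted bivariate Gaussian copula.

    The copula parameter rho is fitted by the normal-scores correlation,
    a simple root-n-consistent estimator; tau_model = (2/pi) * arcsin(rho).
    """
    n = len(x1)
    # normal scores of the ranks (pseudo-observations mapped through Phi^{-1})
    z1 = stats.norm.ppf(stats.rankdata(x1) / (n + 1.0))
    z2 = stats.norm.ppf(stats.rankdata(x2) / (n + 1.0))
    rho_hat = np.corrcoef(z1, z2)[0, 1]
    tau_emp = stats.kendalltau(x1, x2).correlation      # T(F_n)
    tau_mod = (2.0 / np.pi) * np.arcsin(rho_hat)        # T[G(.; theta_hat)]
    return tau_emp - tau_mod
```

Under a correctly specified Gaussian copula the difference is $O_p(n^{-1/2})$; a large value flags a mismatch between the data and the fitted dependence structure for this feature.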
In evaluating the adequacy of dependence modelling, some possible choices of $T$ are measures of dependence, such as Kendall's tau for the $(j,k)$ margin, $\tau_{jk}$, with $T(F) = 4\int_{\mathbb R^2} F_{jk}(\boldsymbol y)\,dF_{jk}(\boldsymbol y) - 1$, and Spearman's rho for the $(j,k)$ margin, $\rho_{jk}$, with $T(F) = 12\int_{\mathbb R^2} F_j(y_1)F_k(y_2)\,dF_{jk}(y_1,y_2) - 3$, where $F_S$ is the marginal distribution of the variable(s) in the set $S$. Other choices of $T$ that may be helpful in evaluating model adequacy include the $p$th moment, with $T(F) = \int_{\mathbb R^d} \boldsymbol y^p\,dF(\boldsymbol y)$, or the value of the distribution function at a given point $\boldsymbol t$, with $T(F) = F(\boldsymbol t)$.

In the discrete case of contingency tables with a total of $m$ cells, we can take $T_j(\hat F_n)$ as the sample proportion for cell $j$ and $T_j[G(\cdot;\hat{\boldsymbol\theta}_n)]$ as its model-based counterpart, $j = 1,\ldots,m$. This makes $\boldsymbol D_n$ the difference between observed and expected cell probabilities; it is the building block of Pearson's $\chi^2$ statistic. An adequate or good parametric statistical model should be such that the magnitude of the difference is small. In higher-dimensional problems, Pearson's $\chi^2$ statistic becomes impractical due to the sparsity of observations, which results in small expected cell probabilities. Maydeu-Olivares and Joe (2005, 2006) propose the use of low-order marginal tables to bypass sparsity issues. In this case, each $T_j$ is the proportion or probability for one cell of a low-order (typically bivariate) marginal table. This concept is further extended in Joe and Maydeu-Olivares (2010), where linear combinations of the (marginal) cell probabilities are considered.

¹²The functionals $T_j$ can be considered as the aspects of the distribution one wants to examine, and hence they can also be referred to as features.

The limiting distributions of those statistics (based on low-order marginal cell probabilities and their linear combinations) have been derived. In particular, there are expressions for the computation of the limiting covariance matrix of the difference statistic.
However, it is not always computationally easy to obtain this matrix, for example when the dimension of the table is high, resulting in a large number of low-order margins (i.e., when $m$ is large). The implementation of the theory is impractical for high-dimensional models that are not closed under margins, for which low-order marginal model-based cell probabilities cannot be obtained without aggregating the high-dimensional cell probabilities. These issues carry over to the continuous case, but there are extra factors that make the problem much harder for general continuous distributions. For instance, expressions for the limiting covariance matrix depend on the features $T_j$ used (these are no longer cell probabilities) and on the method of model estimation. It is generally undesirable to discretize continuous distributions, as important features (such as tail properties) can be lost and the resulting statistic may lose its ability to signal model inadequacy. As in the discrete case, we want to use features that take low-dimensional distributions as inputs, to bypass sparsity in higher dimensions. When the model is not closed under margins, it can be computationally intensive to obtain the model-based features due to the lack of a tractable low-dimensional distribution.

In this chapter, we show that $\sqrt n\,\boldsymbol D_n \overset{d}{\to} N(\boldsymbol 0, \boldsymbol\Sigma)$ for the class of features that are U-statistics, under correct model specification, when $\boldsymbol\theta$ is estimated by maximum likelihood as well as in the more general case of a $\sqrt n$-consistent estimator that solves a set of estimating equations. This is a fundamental result for the methods we develop to assess the adequacy-of-fit of multivariate distributions in Chapter 7.
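This convergence can be illustrated by simulation for a simple model (a sketch with the Poisson distribution and the second moment as the feature; the limiting variance $2\theta_0^2$ of the difference is derived in Example 6.1 below):

```python
import numpy as np

rng = np.random.default_rng(7)
theta0, n, reps = 2.0, 500, 3000

d = np.empty(reps)
for r in range(reps):
    y = rng.poisson(theta0, size=n)
    t_emp = np.mean(y ** 2)          # empirical feature T(F_n): second moment
    th = y.mean()                    # MLE of the Poisson parameter
    t_mod = th ** 2 + th             # model-based feature T[G(.; theta_hat)]
    d[r] = np.sqrt(n) * (t_emp - t_mod)

print(np.var(d))  # close to the limiting variance 2*theta0**2 for large n
```

The sample variance of the scaled differences approaches $2\theta_0^2 = 8$, which is much smaller than the variance of the empirical feature alone, consistent with the variance decomposition discussed next.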
If $\hat{\boldsymbol\theta}_n$ is the maximum likelihood estimator, then the asymptotic covariance matrix $\boldsymbol\Sigma$ depends on the Fisher information matrix and the gradient of the model-based features with respect to the parameter. It can also be shown that the asymptotic covariance matrix of $\sqrt n\,\boldsymbol T_{n,\mathrm{emp}}$ is "larger" than that of $\sqrt n\,\boldsymbol T_{n,\mathrm{mod}}$, in the sense that the difference between them is equal to $\boldsymbol\Sigma$, a positive semi-definite matrix. In plain terms, this means that the empirical feature is more variable than both its model-based counterpart and the difference, whose asymptotic variance can be decomposed as the difference of the respective asymptotic variances. This is an important observation because, by not having to consider the cross-covariance, the derivation of asymptotic properties for the difference statistic is greatly simplified when the model is correctly specified. It is useful to obtain the asymptotic behaviour of the differences because this tells us what variability to expect if the data generating mechanism reasonably follows the assumed model, thereby providing a guide as to whether the model is adequate.

Usually, obtaining the limiting behaviour of the vector of differences is an intermediate step in model evaluation. Using the vector $\boldsymbol D_n$, we can construct quadratic form statistics $Q_n$ that summarize the discrepancies over the different features. The following lists some possible constructions:

1. The first possibility is simply to use the square of each element, so that $Q_n = \frac{n}{m}\boldsymbol D_n^\top \boldsymbol D_n$ after scaling by the number of observations and features. This is the (scaled) mean squared discrepancy among the features considered, and it gives equal weight to each element. A closely related quantity is $\sqrt{m^{-1}\boldsymbol D_n^\top \boldsymbol D_n}$, known as the standardized root mean squared residual (SRMSR), which is explained in more detail in Section 7.1. With this formulation, it is necessary to approximate the limiting distribution (to which $Q_n$ converges as $n\to\infty$) using methods such as moment matching.

2.
If $\boldsymbol\Sigma$ is known or can be computed accurately, then another choice is $Q_n = n\boldsymbol D_n^\top \boldsymbol\Sigma^{-1}\boldsymbol D_n$, or $Q_n = n\boldsymbol D_n^\top \boldsymbol\Sigma^{-}\boldsymbol D_n$ for singular $\boldsymbol\Sigma$, where $\boldsymbol\Sigma^{-}$ is a generalized inverse. This statistic converges to a $\chi^2$ distribution as $n\to\infty$. In addition to having a convenient limiting distribution, this formulation is useful when the features are of substantially different magnitudes, so that standardization is desirable.

3. Alternatively, one may scale only the individual terms by considering the statistic $Q_n = n\boldsymbol D_n^\top [\mathrm{diag}(\boldsymbol\Sigma)]^{-1}\boldsymbol D_n$; one such example is in Bartholomew and Leung (2002). The limiting distribution can be written as a mixture of independent chi-squared variables. Moment matching can be used, or the tail probabilities of the limiting distribution can be evaluated numerically; see, e.g., Rice (1980).

Much of the exposition in this chapter focuses on the behaviour of $\boldsymbol D_n$; we develop decision criteria based on the asymptotic distribution of $\sqrt n\,\boldsymbol D_n$. The quadratic form statistic $Q_n$ will be revisited in Section 6.5, where its use in evaluating model adequacy is discussed.

Regarding the construction of $\boldsymbol D_n$, although the theoretical derivations in this chapter mainly focus on the class of U-statistics with the parametric model fitted by means of estimating equations that yield $\sqrt n$-consistent and asymptotically normal estimators, we consider a wider class of empirical features and model estimation methods in Chapter 7:

• Empirical features: (A1) U-statistics; (A2) the rank-based F-madogram estimator of the extremal coefficient; and (A3) rank-based tail-weighted dependence measures.

• Model estimation methods: (B1) maximum likelihood; (B2) estimating equations; (B3) estimation of dependence parameters with marginal ranks.

In this chapter, we demonstrate the asymptotic normality of the combination (A1) and (B1), as well as (A1) and (B2), by decomposing the U-statistic and the expansion of the model-based estimator into sums of i.i.d. random variables.
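The three constructions of $Q_n$ listed above can be sketched as follows (a minimal Python sketch; the function name is ours):

```python
import numpy as np

def quadratic_forms(D, Sigma, n):
    """The three quadratic-form summaries of the difference vector D_n.

    D     : length-m vector of differences (empirical minus model-based features)
    Sigma : m x m asymptotic covariance matrix of sqrt(n) * D_n
    n     : sample size
    Returns (mean-squared form, Sigma-standardized form, diagonal-scaled form).
    """
    m = len(D)
    q1 = n * (D @ D) / m                    # (n/m) D'D: scaled mean squared discrepancy
    q2 = n * D @ np.linalg.pinv(Sigma) @ D  # n D' Sigma^- D; pinv handles singular Sigma
    q3 = n * D @ (D / np.diag(Sigma))       # n D' diag(Sigma)^{-1} D
    return q1, q2, q3
```

Under correct specification, the second form is asymptotically $\chi^2$ with degrees of freedom equal to the rank of $\boldsymbol\Sigma$; the first and third require moment matching or numerical evaluation of the limiting chi-squared mixture, as noted above.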
Some of the results in Chapter 7 also depend on the normality of other combinations; we acknowledge that this is still open to further research, and in Section 6.4 we mention some technical details and make conjectures for results not yet rigorously demonstrated in the current work.

6.1.2 Relevance to goodness-of-fit tests

Statistics for evaluating goodness-of-fit based on the difference between distribution functions have a long history. The Kolmogorov-Smirnov, Cramér-von Mises and Anderson-Darling statistics all belong to this category. For testing the simple hypothesis that a univariate data set comes from a completely specified model $F_0(t)$, these statistics are defined as
\[
KS_n = \sup_{t\in\mathbb R}\big|\hat F_n(t) - F_0(t)\big|; \qquad
CvM_n = \int_{-\infty}^{\infty}\big[\hat F_n(t) - F_0(t)\big]^2\,dF_0(t); \qquad
AD_n = \int_{-\infty}^{\infty}\frac{\big[\hat F_n(t) - F_0(t)\big]^2}{F_0(t)\,[1 - F_0(t)]}\,dF_0(t),
\]
respectively. If $F_0$ is continuous, it can be proved that the sampling distributions of these statistics do not depend on $F_0$, and thus they are "distribution-free". In practice, however, $F_0(t)$ is usually unknown, and it is therefore often more useful to consider the composite hypothesis that the data follow a certain parametric family of distributions. A direct modification of the test statistics above amounts to changing the completely specified $F_0(t)$ to the fitted distribution $G(t;\hat{\boldsymbol\theta}_n)$, as defined at the beginning of this section. However, the resulting asymptotic distributions under correct model specification generally depend on $G$ and $\boldsymbol\theta$. Some early work tackling goodness-of-fit tests with estimated parameters includes Chernoff and Lehmann (1954) in the discrete setting for $\chi^2$ tests, and Darling (1955), which has the result for general absolutely continuous distributions; see also Chapter 7 of Durbin (1973). Because these statistics often have intractable limiting distributions, they were not very useful at a time when computational power was limited; results in the form of tables of quantiles were only available for specific models of interest, see, e.g., Durbin (1975) and Stephens (1976).
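For the simple-hypothesis case, the three statistics above have standard computational forms based on the sorted values $z_i = F_0(x_{(i)})$ (a minimal Python sketch; the function name is ours):

```python
import numpy as np

def edf_statistics(x, F0):
    """KS_n, CvM_n and AD_n for a completely specified null cdf F0
    (simple hypothesis), via the standard order-statistic formulas."""
    z = np.sort(F0(np.asarray(x, dtype=float)))
    n = len(z)
    i = np.arange(1, n + 1)
    # Kolmogorov-Smirnov: largest gap between the EDF steps and F0
    ks = max(np.max(i / n - z), np.max(z - (i - 1) / n))
    # Cramer-von Mises: W^2 = 1/(12n) + sum (z_i - (2i-1)/(2n))^2
    cvm = 1.0 / (12 * n) + np.sum((z - (2 * i - 1) / (2 * n)) ** 2)
    # Anderson-Darling: A^2 = -n - (1/n) sum (2i-1)[ln z_i + ln(1 - z_{n+1-i})]
    ad = -n - np.mean((2 * i - 1) * (np.log(z) + np.log(1 - z[::-1])))
    return ks, cvm, ad
```

On a perfectly uniform grid with $F_0$ the identity, the formulas give their minimal values $KS_n = 1/(2n)$ and $CvM_n = 1/(12n)$, which is a convenient sanity check.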
Chapter 28 of DasGupta (2008) has a summary of the relevant texts and developments on testing composite null hypotheses.

Goodness-of-fit tests for multivariate copulas are generally not distribution-free. Nevertheless, due to the increasing emphasis on multivariate modelling and the advent of technology that has made analogous procedures viable, tests based on Kolmogorov-Smirnov and Cramér-von Mises-type statistics, as well as Rosenblatt's transformation, have been studied; see, e.g., Genest et al. (2009, 2013), Berg (2009) and the references therein.

By taking $T_i$ as the distribution function at $\boldsymbol t_i$, (6.1) can be written as a vector of differences of distribution functions at various points of the support. The vector $\boldsymbol D_n$ is thus connected to these goodness-of-fit statistics when the model is estimated. The connection is most obvious for the Kolmogorov-Smirnov statistic, given by $\sup_{t\in\mathbb R}|\hat F_n(t) - G(t;\hat{\boldsymbol\theta}_n)|$; this is the supremum of the absolute value of $\boldsymbol D_n$ when viewed as a process, i.e., with $\boldsymbol D_n$ having an infinite number of elements within the support of the distributions. For the Cramér-von Mises statistic, we consider its generalization obtained by applying a weight function $\tilde w(t)$ to the integrand. For absolutely continuous $G$ with density $g$, this generalization is given by
\begin{align}
GCvM_n &= \int_{-\infty}^{\infty}\tilde w(t)\big[\hat F_n(t) - G(t;\hat{\boldsymbol\theta}_n)\big]^2\,dG(t;\hat{\boldsymbol\theta}_n)
= \int_{-\infty}^{\infty} w(t;\hat{\boldsymbol\theta}_n)\big[\hat F_n(t) - G(t;\hat{\boldsymbol\theta}_n)\big]^2\,dt \nonumber\\
&= \lim_{m\to\infty}\sum_{i=1}^{m} w(t_{im};\hat{\boldsymbol\theta}_n)\,\Delta t_{im}\,\big[\hat F_n(t_{im}) - G(t_{im};\hat{\boldsymbol\theta}_n)\big]^2, \tag{6.2}
\end{align}
where $w(t;\hat{\boldsymbol\theta}_n) = \tilde w(t)\,g(t;\hat{\boldsymbol\theta}_n)$, $\Delta t_{im} = (t_{im} - t_{i-1,m})/m$ and $\{t_{0m}, t_{1m}, \ldots\}$ is an increasing sequence of points on $\mathbb R$ that gets denser as $m$ increases. Written this way, (6.2) becomes a weighted sum of infinitely many values of the squared differences at the $t_{im}$'s; in matrix notation, this is equal to $\boldsymbol D_n^\top \boldsymbol W_n \boldsymbol D_n$, a quadratic form in $\boldsymbol D_n$, where $\boldsymbol W_n$ is a diagonal matrix with elements $w(t_{1m};\hat{\boldsymbol\theta}_n)\Delta t_{1m}, \ldots, w(t_{mm};\hat{\boldsymbol\theta}_n)\Delta t_{mm}$.
Since the Anderson-Darling statistic is itself a generalized Cramér-von Mises statistic, this decomposition applies to it as well. Note how this resembles the quadratic form statistic $Q_n$ illustrated above, in particular the third construction, with $\boldsymbol W_n$ taking the place of $\mathrm{diag}(\boldsymbol\Sigma)$. This also highlights some differences between our current work and the Cramér-von Mises statistic: the dimensionality of $\boldsymbol D_n$ is infinite in the latter, while it is typically finite when the elements of $\boldsymbol D_n$ are features under consideration. Also, for the Cramér-von Mises statistic, the weight matrix $\boldsymbol W_n$ depends on the estimator $\hat{\boldsymbol\theta}_n$ (and hence on the sample size $n$), whereas the $Q_n$ considered above is constructed using the asymptotic covariance matrix $\boldsymbol\Sigma$. A remark valid for all three classical statistics is that they do not suggest directions for model improvement when the null hypothesis is rejected, as the locations and patterns of maximal differences are lost by construction.

Although much of the literature in this area has focused on goodness-of-fit, in this and the next chapter we draw attention to a computationally similar but conceptually different idea, which we call adequacy-of-fit. With the former, the emphasis is on testing whether the data belong to a particular parametric family. This could be useful when the model is scientifically driven or when a gold standard exists, for example in approximate simulations where one wants to check whether the simulated data set is close to the target model. More often, however, there is no target model that could be deemed the truth, and the parametric model is merely constructed to help explain the underlying structure or mechanism of the data. In this case, our focus is on whether the model is adequate for the intended purpose. An application of this philosophy in the time series context can be found in Tsay (1992); see, e.g., Gelman et al. (1996) and Ray and Lindsay (2008) for similar remarks.
As a result, we will not suggest hypothesis tests aimed at proving the validity (or invalidity) of statistical models. Instead, we develop guidelines to assess model adequacy intuitively, based on the discrepancy between empirical and model-based features.

The rest of this chapter is organized as follows. We derive the asymptotic behaviour of the difference statistic under correct model specification in Section 6.2, starting with distribution functions and then extending the results to functionals, specifically the class of U-statistics. The properties under model misspecification are dealt with in Section 6.3. Some comments on the difference statistic are given in Section 6.4. In Section 6.5, we develop a decision criterion for model adequacy based on the difference statistic and discuss factors that may affect its quality for this purpose. This section serves as a bridge between Chapters 6 and 7.

6.2 Asymptotics of the difference vector for a correctly specified model

In this section, we obtain the asymptotic properties of $\boldsymbol D_n$, the vector of differences defined in (6.1). We start with the difference between the empirical and fitted distribution functions when the model is correctly specified, and then generalize the results to the difference for a wider class of functionals $T$ (U-statistics). Throughout this section, we let $\boldsymbol Y_1,\ldots,\boldsymbol Y_n$ be a random sample in $\mathbb R^d$ from the distribution $F$, and $G(\cdot;\boldsymbol\theta)$ be a parametric model to which the sample is fitted, with density $g$ and vector parameter $\boldsymbol\theta \in \Theta$. Denote by $\hat F_n(\boldsymbol t) = n^{-1}\sum_{i=1}^n \mathbf 1\{\boldsymbol Y_i \le \boldsymbol t\}$ the empirical distribution at $\boldsymbol t$ and by $\ell(\boldsymbol\theta;\boldsymbol y) = \log g(\boldsymbol y;\boldsymbol\theta)$ the log-likelihood function for one observation. Unless otherwise specified, comparison operators on vectors are applied elementwise. As for $\hat{\boldsymbol\theta}_n$, the maximum likelihood estimator is considered first, followed by $\sqrt n$-consistent estimators obtained by solving a system of estimating equations.

The following lists the assumptions to be used in various proofs in this chapter.

Assumption A1.
The parametric model $G$ is correctly specified, in the sense that there exists $\boldsymbol\theta_0 \in \mathrm{int}(\Theta)$ such that $F(\boldsymbol t) = G(\boldsymbol t;\boldsymbol\theta_0)$ for all $\boldsymbol t \in \mathbb R^d$.

Assumption A2 (R1 in Serfling (1980), page 144). For every $\boldsymbol y \in \mathbb R^d$, the log-likelihood function is thrice differentiable with respect to every $\boldsymbol\theta \in \mathrm{int}(\Theta)$, i.e.,
\[
\frac{\partial \ell(\boldsymbol\theta;\boldsymbol y)}{\partial\theta_i}, \qquad
\frac{\partial^2 \ell(\boldsymbol\theta;\boldsymbol y)}{\partial\theta_i\,\partial\theta_j}, \qquad
\frac{\partial^3 \ell(\boldsymbol\theta;\boldsymbol y)}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k}
\]
exist for all values of $\theta_i$, $\theta_j$, $\theta_k$, three (potentially identical) elements of $\boldsymbol\theta$.

Assumption A3. The maximum likelihood estimator, $\hat{\boldsymbol\theta}_n = \arg\max_{\boldsymbol\theta}\sum_{i=1}^n \ell(\boldsymbol\theta;\boldsymbol y_i)$, solves the score equation
\[
\frac{\partial \sum_{i=1}^n \ell(\hat{\boldsymbol\theta}_n;\boldsymbol y_i)}{\partial\boldsymbol\theta} = \boldsymbol 0.
\]

Assumption A4 (R2 in Serfling (1980), page 144). For every $\theta_i$, $\theta_j$ and $\theta_k$, three elements in the interior of $\Theta$, there exist neighbourhoods $\mathcal N(\theta_i)$, $\mathcal N(\theta_j)$ and $\mathcal N(\theta_k)$ such that
\[
\left|\frac{\partial g(\boldsymbol y;\boldsymbol\theta)}{\partial\theta_i}\right| \le k_1(\boldsymbol y), \qquad
\left|\frac{\partial^2 g(\boldsymbol y;\boldsymbol\theta)}{\partial\theta_i\,\partial\theta_j}\right| \le k_2(\boldsymbol y), \qquad
\left|\frac{\partial^3 \ell(\boldsymbol\theta;\boldsymbol y)}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k}\right| \le k_3(\boldsymbol y)
\]
for all $\boldsymbol y$, with $\int k_1(\boldsymbol y)\,d\boldsymbol y < \infty$, $\int k_2(\boldsymbol y)\,d\boldsymbol y < \infty$ and $\int k_3(\boldsymbol y)\,g(\boldsymbol y;\boldsymbol\theta)\,d\boldsymbol y < \infty$ within these neighbourhoods.

Assumption A5 (R3 in Serfling (1980), page 145). For every $\boldsymbol\theta \in \mathrm{int}(\Theta)$, the expectation
\[
E_{\boldsymbol\theta}\left[\frac{\partial \ell(\boldsymbol\theta;\boldsymbol Y)}{\partial\boldsymbol\theta}\,\frac{\partial \ell(\boldsymbol\theta;\boldsymbol Y)}{\partial\boldsymbol\theta^\top}\right]
\]
exists and is non-singular, where the subscript on $E_{\boldsymbol\theta}$ stresses that the expectation is taken with respect to the distribution $\boldsymbol Y \sim G(\cdot;\boldsymbol\theta)$.

The assumptions above guarantee the existence of the Taylor series expansion of the maximum likelihood estimator and ensure that various differentiations under the integral sign are valid. In the rest of this chapter, we use the following lemma to demonstrate asymptotic normality of the difference statistic and obtain its asymptotic variance.

Lemma 6.1. Suppose we can decompose a $d$-dimensional random vector $\boldsymbol X_n$ as
\[
\boldsymbol X_n = \frac{1}{\sqrt n}\sum_{i=1}^n \boldsymbol S_i + \boldsymbol\zeta_n, \tag{6.3}
\]
where $\boldsymbol S_1,\ldots,\boldsymbol S_n$ are i.i.d. random vectors with mean $\boldsymbol 0$ and covariance matrix $\boldsymbol\Sigma_S$, and $\boldsymbol\zeta_n = o_p(1)$. Then $\boldsymbol X_n \overset{d}{\to} N(\boldsymbol 0, \boldsymbol\Sigma_S)$.

Proof. The result follows directly from the central limit theorem and Slutsky's theorem.

6.2.1 Behaviour of the difference between empirical and fitted distribution functions

In the following, we establish the limiting behaviour of the difference process $\hat F_n(\boldsymbol t) - G(\boldsymbol t;\hat{\boldsymbol\theta}_n)$.
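Before stating the formal result, the behaviour of this difference process can be previewed by simulation under an assumed $\mathrm{Exp}(\theta)$ model with rate $\theta$ (a minimal Python sketch; the variance formula below instantiates the covariance expression (6.6) of Theorem 6.2 for a single point $t$, using $G(t;\theta) = 1 - e^{-\theta t}$, $\partial G/\partial\theta = t e^{-\theta t}$ and Fisher information $\mathcal I = 1/\theta^2$):

```python
import numpy as np

rng = np.random.default_rng(1)
theta0, t, n, reps = 1.0, 1.0, 400, 2000

# Theoretical asymptotic variance at a single point t:
# sigma^2 = G(1 - G) - (dG/dtheta)^2 / I
G = 1 - np.exp(-theta0 * t)
sigma2 = G * (1 - G) - (t * np.exp(-theta0 * t)) ** 2 * theta0 ** 2

draws = np.empty(reps)
for r in range(reps):
    y = rng.exponential(1 / theta0, size=n)
    theta_hat = 1 / y.mean()               # MLE of the exponential rate
    Fn = np.mean(y <= t)                   # empirical cdf at t
    Gt = 1 - np.exp(-theta_hat * t)        # fitted model cdf at t
    draws[r] = np.sqrt(n) * (Fn - Gt)

print(np.var(draws), sigma2)  # the two should be close
```

Note that the second term in sigma2 shrinks the variance relative to the plain binomial variance $G(1-G)$, reflecting the positive dependence between the empirical and fitted cdf values.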
Note that this has been considered in Darling (1955), but we include the details here as they define notation and illustrate methods that we will use in later sections. To simplify notation, let $\ell'(\cdot) \triangleq \partial\ell(\cdot)/\partial\boldsymbol\theta$ and $\ell''(\cdot) \triangleq \partial^2\ell(\cdot)/\partial\boldsymbol\theta\,\partial\boldsymbol\theta^\top$, and let
\[
\mathcal I \triangleq \mathcal I(\boldsymbol\theta_0) = -E\left[\ell''(\boldsymbol\theta_0;\boldsymbol Y_1)\right] \tag{6.4}
\]
be the Fisher information matrix.

Theorem 6.2. Let $\boldsymbol Y_1,\ldots,\boldsymbol Y_n$ be a random sample from $G(\cdot;\boldsymbol\theta_0)$. Assume A1 to A5. For any given vectors $\boldsymbol t_1, \boldsymbol t_2,\ldots,\boldsymbol t_m \in \mathbb R^d$, the vector
\[
\sqrt n
\begin{pmatrix}
\hat F_n(\boldsymbol t_1) - G(\boldsymbol t_1;\hat{\boldsymbol\theta}_n) \\
\vdots \\
\hat F_n(\boldsymbol t_m) - G(\boldsymbol t_m;\hat{\boldsymbol\theta}_n)
\end{pmatrix}
\overset{d}{\to} N(\boldsymbol 0, \boldsymbol\Sigma), \tag{6.5}
\]
where $\boldsymbol\Sigma = (\sigma_{jk})$ contains the elements
\[
\sigma_{jk} = \sigma_{kj} = G(\boldsymbol t_j;\boldsymbol\theta_0)\left[1 - G(\boldsymbol t_k;\boldsymbol\theta_0)\right] - \frac{\partial G}{\partial\boldsymbol\theta^\top}(\boldsymbol t_j;\boldsymbol\theta_0)\,\mathcal I^{-1}\,\frac{\partial G}{\partial\boldsymbol\theta}(\boldsymbol t_k;\boldsymbol\theta_0), \qquad 1 \le j \le k \le m, \tag{6.6}
\]
with $\mathcal I$ defined in (6.4).

Proof. The expansion of the score function about the true value $\boldsymbol\theta_0$ gives $\sqrt n(\hat{\boldsymbol\theta}_n - \boldsymbol\theta_0) = \mathcal I^{-1}\boldsymbol Z(\boldsymbol\theta_0) + O_p(n^{-1/2})$, where $\boldsymbol Z(\boldsymbol\theta_0) = n^{-1/2}\sum_{j=1}^n \ell'(\boldsymbol\theta_0;\boldsymbol Y_j) = O_p(1)$; this will be denoted by $\boldsymbol Z$ when there is no ambiguity. Expanding $G(\boldsymbol t;\hat{\boldsymbol\theta}_n)$ about $\boldsymbol\theta_0$, we obtain
\[
G(\boldsymbol t;\hat{\boldsymbol\theta}_n) = G(\boldsymbol t;\boldsymbol\theta_0) + \frac{\partial G}{\partial\boldsymbol\theta^\top}(\boldsymbol t;\boldsymbol\theta_0)\big(\hat{\boldsymbol\theta}_n - \boldsymbol\theta_0\big) + \tilde R_n
= G(\boldsymbol t;\boldsymbol\theta_0) + \frac{1}{\sqrt n}\frac{\partial G}{\partial\boldsymbol\theta^\top}(\boldsymbol t;\boldsymbol\theta_0)\,\mathcal I^{-1}\boldsymbol Z + R_n, \tag{6.7}
\]
where $\tilde R_n$ and $R_n$ are both $o_p(n^{-1/2})$. We can then rewrite the left-hand side of (6.5) as
\[
\frac{1}{\sqrt n}\sum_{i=1}^n
\begin{pmatrix}
\mathbf 1\{\boldsymbol Y_i \le \boldsymbol t_1\} - G(\boldsymbol t_1;\boldsymbol\theta_0) - \frac{\partial G}{\partial\boldsymbol\theta^\top}(\boldsymbol t_1;\boldsymbol\theta_0)\,\mathcal I^{-1}\ell'(\boldsymbol\theta_0;\boldsymbol Y_i) \\
\vdots \\
\mathbf 1\{\boldsymbol Y_i \le \boldsymbol t_m\} - G(\boldsymbol t_m;\boldsymbol\theta_0) - \frac{\partial G}{\partial\boldsymbol\theta^\top}(\boldsymbol t_m;\boldsymbol\theta_0)\,\mathcal I^{-1}\ell'(\boldsymbol\theta_0;\boldsymbol Y_i)
\end{pmatrix}
- \sqrt n
\begin{pmatrix}
R_{n1} \\ \vdots \\ R_{nm}
\end{pmatrix}, \tag{6.8}
\]
where $\sqrt n\,(R_{n1},\ldots,R_{nm})^\top = o_p(1)$. Let the term within the large parentheses after the summation sign be denoted by $\boldsymbol S_i = (S_{i1},\ldots,S_{im})^\top$. Then $E(\boldsymbol S_i) = \boldsymbol 0$, while $\mathrm{Cov}(\boldsymbol S_i)$ has elements
\begin{align}
\mathrm{Cov}(S_{ij}, S_{ik})
&= \mathrm{Cov}\left[\mathbf 1\{\boldsymbol Y_1 \le \boldsymbol t_j\} - \frac{\partial G}{\partial\boldsymbol\theta^\top}(\boldsymbol t_j;\boldsymbol\theta_0)\mathcal I^{-1}\ell'(\boldsymbol\theta_0;\boldsymbol Y_1),\ \mathbf 1\{\boldsymbol Y_1 \le \boldsymbol t_k\} - \frac{\partial G}{\partial\boldsymbol\theta^\top}(\boldsymbol t_k;\boldsymbol\theta_0)\mathcal I^{-1}\ell'(\boldsymbol\theta_0;\boldsymbol Y_1)\right] \nonumber\\
&= \mathrm{Cov}\left[\mathbf 1\{\boldsymbol Y_1 \le \boldsymbol t_j\},\ \mathbf 1\{\boldsymbol Y_1 \le \boldsymbol t_k\}\right]
- \mathrm{Cov}\left[\mathbf 1\{\boldsymbol Y_1 \le \boldsymbol t_j\},\ \frac{\partial G}{\partial\boldsymbol\theta^\top}(\boldsymbol t_k;\boldsymbol\theta_0)\mathcal I^{-1}\ell'(\boldsymbol\theta_0;\boldsymbol Y_1)\right] \nonumber\\
&\quad - \mathrm{Cov}\left[\mathbf 1\{\boldsymbol Y_1 \le \boldsymbol t_k\},\ \frac{\partial G}{\partial\boldsymbol\theta^\top}(\boldsymbol t_j;\boldsymbol\theta_0)\mathcal I^{-1}\ell'(\boldsymbol\theta_0;\boldsymbol Y_1)\right] \nonumber\\
&\quad + \mathrm{Cov}\left[\frac{\partial G}{\partial\boldsymbol\theta^\top}(\boldsymbol t_j;\boldsymbol\theta_0)\mathcal I^{-1}\ell'(\boldsymbol\theta_0;\boldsymbol Y_1),\ \frac{\partial G}{\partial\boldsymbol\theta^\top}(\boldsymbol t_k;\boldsymbol\theta_0)\mathcal I^{-1}\ell'(\boldsymbol\theta_0;\boldsymbol Y_1)\right]. \tag{6.9}
\end{align}
Now, $\mathrm{Cov}[\mathbf 1\{\boldsymbol Y_1 \le \boldsymbol t_j\}, \mathbf 1\{\boldsymbol Y_1 \le \boldsymbol t_k\}] = P(\boldsymbol Y_1 \le \boldsymbol t_j \wedge \boldsymbol t_k) - G(\boldsymbol t_j;\boldsymbol\theta_0)\,G(\boldsymbol t_k;\boldsymbol\theta_0)$, which is equal to $G(\boldsymbol t_j;\boldsymbol\theta_0)[1 - G(\boldsymbol t_k;\boldsymbol\theta_0)]$ when $j \le k$.
The second term is
\begin{align*}
\mathrm{Cov}\left[\mathbf 1\{\boldsymbol Y_1 \le \boldsymbol t_j\},\ \frac{\partial G}{\partial\boldsymbol\theta^\top}(\boldsymbol t_k;\boldsymbol\theta_0)\mathcal I^{-1}\ell'(\boldsymbol\theta_0;\boldsymbol Y_1)\right]
&= \frac{\partial G}{\partial\boldsymbol\theta^\top}(\boldsymbol t_k;\boldsymbol\theta_0)\,\mathcal I^{-1}\,E\left[\mathbf 1\{\boldsymbol Y_1 \le \boldsymbol t_j\}\,\ell'(\boldsymbol\theta_0;\boldsymbol Y_1)\right] \\
&= \frac{\partial G}{\partial\boldsymbol\theta^\top}(\boldsymbol t_k;\boldsymbol\theta_0)\,\mathcal I^{-1}\int_{-\boldsymbol\infty}^{\boldsymbol t_j}\frac{\partial g(\boldsymbol y;\boldsymbol\theta_0)/\partial\boldsymbol\theta_0}{g(\boldsymbol y;\boldsymbol\theta_0)}\cdot g(\boldsymbol y;\boldsymbol\theta_0)\,d\boldsymbol y \\
&= \frac{\partial G}{\partial\boldsymbol\theta^\top}(\boldsymbol t_k;\boldsymbol\theta_0)\,\mathcal I^{-1}\,\frac{\partial G}{\partial\boldsymbol\theta}(\boldsymbol t_j;\boldsymbol\theta_0),
\end{align*}
and is equal to the third term. The final term is also equal to the second term, as
\[
\mathrm{Cov}\left[\frac{\partial G}{\partial\boldsymbol\theta^\top}(\boldsymbol t_j;\boldsymbol\theta_0)\mathcal I^{-1}\ell'(\boldsymbol\theta_0;\boldsymbol Y_1),\ \frac{\partial G}{\partial\boldsymbol\theta^\top}(\boldsymbol t_k;\boldsymbol\theta_0)\mathcal I^{-1}\ell'(\boldsymbol\theta_0;\boldsymbol Y_1)\right]
= \frac{\partial G}{\partial\boldsymbol\theta^\top}(\boldsymbol t_j;\boldsymbol\theta_0)\,\mathcal I^{-1}\,\frac{\partial G}{\partial\boldsymbol\theta}(\boldsymbol t_k;\boldsymbol\theta_0).
\]
Hence, for $j \le k$,
\[
\mathrm{Cov}(S_{ij}, S_{ik}) = G(\boldsymbol t_j;\boldsymbol\theta_0)\left[1 - G(\boldsymbol t_k;\boldsymbol\theta_0)\right] - \frac{\partial G}{\partial\boldsymbol\theta^\top}(\boldsymbol t_j;\boldsymbol\theta_0)\,\mathcal I^{-1}\,\frac{\partial G}{\partial\boldsymbol\theta}(\boldsymbol t_k;\boldsymbol\theta_0).
\]
The proof is completed using Lemma 6.1. □

The convergence to a mean-zero Gaussian distribution can be proved in the more general case where $\hat{\boldsymbol\theta}_n$ is a $\sqrt n$-consistent estimator that solves a system of estimating equations. We first replace the assumptions in the maximum likelihood context with those applicable to estimating equations:

Assumption B1. Instead of the log-likelihood function, Assumptions A2 to A5 are satisfied for a vector of inference functions $\boldsymbol l$, such that the estimator $\hat{\boldsymbol\theta}_n$ is a root of the estimating equation
\[
\frac{1}{n}\sum_{i=1}^n \boldsymbol l(\boldsymbol\theta;\boldsymbol Y_i) = \boldsymbol 0,
\]
with $E[\boldsymbol l(\boldsymbol\theta_0;\boldsymbol Y_i)] = \boldsymbol 0$. Accordingly, Assumption A5 is modified so that $E_{\boldsymbol\theta}[\boldsymbol l(\boldsymbol\theta;\boldsymbol Y)\,\boldsymbol l^\top(\boldsymbol\theta;\boldsymbol Y)]$ and $E_{\boldsymbol\theta}[\partial\boldsymbol l(\boldsymbol\theta;\boldsymbol Y)/\partial\boldsymbol\theta^\top]$ exist and are non-singular.

This more general estimator covers several model fitting methods, especially relevant to multivariate modelling, that will be addressed in Sections 6.4 and 7.8. One example is the composite likelihood estimator, for which the inference functions $\boldsymbol l$ are sums of marginal or conditional score functions.

The estimating equation analogue of Theorem 6.2 is given below as Theorem 6.2′.

Theorem 6.2′. Let $\boldsymbol Y_1,\ldots,\boldsymbol Y_n$ be a random sample from $G(\cdot;\boldsymbol\theta_0)$. Assume A1 and B1. For any given vectors $\boldsymbol t_1, \boldsymbol t_2,\ldots,\boldsymbol t_m \in \mathbb R^d$, the vector
\[
\sqrt n
\begin{pmatrix}
\hat F_n(\boldsymbol t_1) - G(\boldsymbol t_1;\hat{\boldsymbol\theta}_n) \\
\vdots \\
\hat F_n(\boldsymbol t_m) - G(\boldsymbol t_m;\hat{\boldsymbol\theta}_n)
\end{pmatrix}
\overset{d}{\to} N(\boldsymbol 0, \boldsymbol\Sigma_e),
\]
where $\boldsymbol\Sigma_e$ is an $m \times m$ covariance matrix.

Proof. For $\sqrt n$-consistent estimators arising from estimating equations, a Taylor series expansion gives $\sqrt n(\hat{\boldsymbol\theta}_n - \boldsymbol\theta_0) = \boldsymbol H^{-1}\boldsymbol Z + O_p(n^{-1/2})$, where $\boldsymbol Z = n^{-1/2}\sum_{i=1}^n \boldsymbol l(\boldsymbol\theta_0;\boldsymbol Y_i)$ and $\boldsymbol H = E[\partial\boldsymbol l(\boldsymbol\theta_0;\boldsymbol Y_1)/\partial\boldsymbol\theta^\top]$.
The proof then follows in a similar fashion to that of Theorem 6.2; in particular, the analogue of (6.8) becomes
\[
\frac{1}{\sqrt n}\sum_{i=1}^n
\begin{pmatrix}
\mathbf 1\{\boldsymbol Y_i \le \boldsymbol t_1\} - G(\boldsymbol t_1;\boldsymbol\theta_0) - \frac{\partial G}{\partial\boldsymbol\theta^\top}(\boldsymbol t_1;\boldsymbol\theta_0)\,\boldsymbol H^{-1}\boldsymbol l(\boldsymbol\theta_0;\boldsymbol Y_i) \\
\vdots \\
\mathbf 1\{\boldsymbol Y_i \le \boldsymbol t_m\} - G(\boldsymbol t_m;\boldsymbol\theta_0) - \frac{\partial G}{\partial\boldsymbol\theta^\top}(\boldsymbol t_m;\boldsymbol\theta_0)\,\boldsymbol H^{-1}\boldsymbol l(\boldsymbol\theta_0;\boldsymbol Y_i)
\end{pmatrix}
- \sqrt n
\begin{pmatrix}
R_{n1} \\ \vdots \\ R_{nm}
\end{pmatrix},
\]
to which Lemma 6.1 is applied. In this case, however, the elements of the covariance matrix cannot be written as (6.6), because the mixed empirical-fitted covariance terms (i.e., the second and third terms in (6.9)) are in general not the same as the model-based covariance (i.e., the last term in (6.9)). □

These results can easily be extended to transformations of the distribution function; this is useful for features that are functions of distribution function values, with the sample version estimated by a direct transformation of the empirical distribution function. The extension is given as Corollary 6.3.

Corollary 6.3. Assume A1 to A5 (or A1 and B1 in the case of estimating equations). For fixed vectors $\boldsymbol t_1, \boldsymbol t_2,\ldots,\boldsymbol t_m \in \mathbb R^d$ and a function $h: [0,1]^m \to \mathbb R^m$ with continuous first derivative $\nabla h$ at $(G(\boldsymbol t_1;\boldsymbol\theta_0),\ldots,G(\boldsymbol t_m;\boldsymbol\theta_0))$, the vector
\[
\sqrt n\left[h\big(\hat{\boldsymbol F}_n\big) - h\big(\boldsymbol G_{\hat{\boldsymbol\theta}_n}\big)\right] \overset{d}{\to} N\big(\boldsymbol 0,\ (\nabla h)^\top\,\boldsymbol\Sigma\,(\nabla h)\big),
\]
where $\boldsymbol\Sigma$ is the covariance matrix of the limiting distribution of $\sqrt n\big(\hat{\boldsymbol F}_n - \boldsymbol G_{\hat{\boldsymbol\theta}_n}\big)$, $\hat{\boldsymbol F}_n = \big(\hat F_n(\boldsymbol t_1),\ldots,\hat F_n(\boldsymbol t_m)\big)^\top$ and $\boldsymbol G_{\boldsymbol\theta} = \big(G(\boldsymbol t_1;\boldsymbol\theta),\ldots,G(\boldsymbol t_m;\boldsymbol\theta)\big)^\top$.

Proof. The proof is similar to that of the delta method. From the mean value theorem, we have
\begin{align*}
h\big(\hat{\boldsymbol F}_n\big) &= h(\boldsymbol G_{\boldsymbol\theta_0}) + [\nabla h(\boldsymbol G_{\boldsymbol\theta_0})]^\top\big[\hat{\boldsymbol F}_n - \boldsymbol G_{\boldsymbol\theta_0}\big] + o_p(n^{-1/2}); \\
h\big(\boldsymbol G_{\hat{\boldsymbol\theta}_n}\big) &= h(\boldsymbol G_{\boldsymbol\theta_0}) + [\nabla h(\boldsymbol G_{\boldsymbol\theta_0})]^\top\big[\boldsymbol G_{\hat{\boldsymbol\theta}_n} - \boldsymbol G_{\boldsymbol\theta_0}\big] + o_p(n^{-1/2}).
\end{align*}
Hence $\sqrt n\big[h(\hat{\boldsymbol F}_n) - h(\boldsymbol G_{\hat{\boldsymbol\theta}_n})\big] = \sqrt n\,[\nabla h(\boldsymbol G_{\boldsymbol\theta_0})]^\top\big(\hat{\boldsymbol F}_n - \boldsymbol G_{\hat{\boldsymbol\theta}_n}\big) + o_p(1)$ and the result follows. □

6.2.2 Generalization to the difference for U-statistics

Here, we show that a result analogous to Theorem 6.2 extends to the difference of functionals applied to the empirical and fitted distributions, with the empirical functionals being U-statistics (Hoeffding (1948)). Let $T: \mathcal F \to \mathbb R$ be a functional that maps a multivariate distribution $F \in \mathcal F$ to a real number.
For this result to hold, we need the following smoothness assumption on $T$.

Assumption A6. The functional $T$ is twice differentiable with respect to every $\boldsymbol\theta \in \mathrm{int}(\Theta)$, i.e.,
\[
\frac{\partial T[G(\cdot;\boldsymbol\theta)]}{\partial\theta_i}, \qquad \frac{\partial^2 T[G(\cdot;\boldsymbol\theta)]}{\partial\theta_i\,\partial\theta_j}
\]
exist for all values of $\theta_i$ and $\theta_j$, two (potentially identical) elements of $\boldsymbol\theta$.

If A6 is satisfied, then $T$ admits the Taylor series expansion
\[
T\big[G(\cdot;\hat{\boldsymbol\theta}_n)\big] = T[G(\cdot;\boldsymbol\theta_0)] + \frac{1}{\sqrt n}\frac{\partial T}{\partial\boldsymbol\theta^\top}[G(\cdot;\boldsymbol\theta_0)]\,\mathcal I^{-1}\boldsymbol Z + R_n
\]
under maximum likelihood estimation, similar to (6.7), where $R_n$ is $o_p(n^{-1/2})$ and $\mathcal I$ is defined in (6.4).

Theorem 6.4. Let $\boldsymbol Y_1,\ldots,\boldsymbol Y_n$ be a random sample from $G(\cdot;\boldsymbol\theta_0)$. Assume A1 to A6, and suppose $T(\hat F_n)$ is a U-statistic, estimable of degree $r$, that can be written as
\[
T(\hat F_n) = \binom{n}{r}^{-1}\sum_{\beta\in B} h\big(\boldsymbol Y_{\beta_1},\ldots,\boldsymbol Y_{\beta_r}\big),
\]
where $h$ is a symmetric kernel that is square-integrable with non-zero limiting variance, and $B$ is a set of cardinality $\binom{n}{r}$, with each element $\beta$ being one of the possible subsets of $r$ integers from $\{1,\ldots,n\}$. Then
\[
\sqrt n\Big(T(\hat F_n) - T\big[G(\cdot;\hat{\boldsymbol\theta}_n)\big]\Big) \overset{d}{\to} N(0,\ a - b), \tag{6.10}
\]
where
\begin{align}
a &= r^2\,\mathrm{Cov}\big[h(\boldsymbol Y_1,\ldots,\boldsymbol Y_r),\ h(\boldsymbol Y_1,\boldsymbol Y_{r+1},\ldots,\boldsymbol Y_{2r-1})\big]; \tag{6.11}\\
b &= \frac{\partial T}{\partial\boldsymbol\theta^\top}[G(\cdot;\boldsymbol\theta_0)]\,\mathcal I^{-1}\,\frac{\partial T}{\partial\boldsymbol\theta}[G(\cdot;\boldsymbol\theta_0)]. \tag{6.12}
\end{align}

Proof. By the definition of U-statistics, $E[T(\hat F_n)] = T[G(\cdot;\boldsymbol\theta_0)] \triangleq \gamma$. Define
\[
V_n^* = \frac{r}{n}\sum_{i=1}^n \big[h_1(\boldsymbol Y_i) - \gamma\big]
\]
as in Lemma 3.3.8 of Randles and Wolfe (1979), where $h_1(\boldsymbol x) = E[h(\boldsymbol x, \boldsymbol Y_2,\ldots,\boldsymbol Y_r)]$ and the expectation is taken with respect to $\boldsymbol Y_2,\ldots,\boldsymbol Y_r$. Theorem 3.3.13 in Randles and Wolfe (1979) states that $nE[(T(\hat F_n) - \gamma - V_n^*)^2] \to 0$ as $n\to\infty$; this implies the convergence in probability $\sqrt n\big[T(\hat F_n) - \gamma - V_n^*\big] = o_p(1)$, and the following decomposition into a sum of i.i.d. random variables, to which Lemma 6.1 is applied:
\begin{align*}
\sqrt n\Big(T(\hat F_n) - T\big[G(\cdot;\hat{\boldsymbol\theta}_n)\big]\Big)
&= \sqrt n\Big[V_n^* - \Big(T\big[G(\cdot;\hat{\boldsymbol\theta}_n)\big] - \gamma\Big)\Big] + o_p(1) \\
&= \frac{1}{\sqrt n}\sum_{i=1}^n \Big[r h_1(\boldsymbol Y_i) - r\gamma - \frac{\partial T}{\partial\boldsymbol\theta^\top}[G(\cdot;\boldsymbol\theta_0)]\,\mathcal I^{-1}\ell'(\boldsymbol\theta_0;\boldsymbol Y_i)\Big] + o_p(1)
\triangleq \frac{1}{\sqrt n}\sum_{i=1}^n S_i + o_p(1).
\end{align*}
It is obvious that $E(S_i) = 0$. The variance of $S_i$ is
\[
\mathrm{Var}(S_i) = r^2\,\mathrm{Var}[h_1(\boldsymbol Y_i)] + \frac{\partial T}{\partial\boldsymbol\theta^\top}[G(\cdot;\boldsymbol\theta_0)]\,\mathcal I^{-1}\,\frac{\partial T}{\partial\boldsymbol\theta}[G(\cdot;\boldsymbol\theta_0)]
- 2r\,\frac{\partial T}{\partial\boldsymbol\theta^\top}[G(\cdot;\boldsymbol\theta_0)]\,\mathcal I^{-1}\,\mathrm{Cov}\big[h_1(\boldsymbol Y_i),\ \ell'(\boldsymbol\theta_0;\boldsymbol Y_i)\big],
\]
where $\mathrm{Var}[h_1(\boldsymbol Y_i)] = \mathrm{Cov}[h(\boldsymbol Y_1,\ldots,\boldsymbol Y_r),\ h(\boldsymbol Y_1,\boldsymbol Y_{r+1},\ldots,\boldsymbol Y_{2r-1})]$ from equation (3.3.7) of Randles and Wolfe (1979), and
\begin{align*}
\mathrm{Cov}\big[h_1(\boldsymbol Y_i),\ \ell'(\boldsymbol\theta_0;\boldsymbol Y_i)\big]
&= E\big[h_1(\boldsymbol Y_i)\,\ell'(\boldsymbol\theta_0;\boldsymbol Y_i)\big]
= E\big[h(\boldsymbol Y_1,\ldots,\boldsymbol Y_r)\,\ell'(\boldsymbol\theta_0;\boldsymbol Y_1)\big]
= \frac{1}{r}E\Big[h(\boldsymbol Y_1,\ldots,\boldsymbol Y_r)\sum_{i=1}^r \ell'(\boldsymbol\theta_0;\boldsymbol Y_i)\Big] \\
&= \frac{1}{r}\int h(\boldsymbol y_1,\ldots,\boldsymbol y_r)\,\frac{\frac{\partial}{\partial\boldsymbol\theta}\prod_{i=1}^r g(\boldsymbol y_i;\boldsymbol\theta_0)}{\prod_{i=1}^r g(\boldsymbol y_i;\boldsymbol\theta_0)}\,\prod_{i=1}^r g(\boldsymbol y_i;\boldsymbol\theta_0)\,d\boldsymbol y_1\cdots d\boldsymbol y_r
= \frac{1}{r}\,\frac{\partial T}{\partial\boldsymbol\theta}[G(\cdot;\boldsymbol\theta_0)],
\end{align*}
where the third equality uses the fact that $h$ is symmetric in its arguments. Hence
\[
\mathrm{Var}(S_i) = r^2\,\mathrm{Cov}\big[h(\boldsymbol Y_1,\ldots,\boldsymbol Y_r),\ h(\boldsymbol Y_1,\boldsymbol Y_{r+1},\ldots,\boldsymbol Y_{2r-1})\big] - \frac{\partial T}{\partial\boldsymbol\theta^\top}[G(\cdot;\boldsymbol\theta_0)]\,\mathcal I^{-1}\,\frac{\partial T}{\partial\boldsymbol\theta}[G(\cdot;\boldsymbol\theta_0)] = a - b. \qquad\Box
\]

Example 6.1. We illustrate the result (6.10) with a simple example where $G$ is the Poisson distribution with true parameter $\theta_0$, and the functional is the second moment $E(Y^2)$. The functional transform is thus $T[G(\cdot;\theta)] = \theta^2 + \theta$, with sample version $T(\hat F_n) = n^{-1}\sum_{i=1}^n Y_i^2$ (and thus with degree $r = 1$). The maximum likelihood estimator is $\hat\theta_n = n^{-1}\sum_{i=1}^n Y_i \triangleq \bar Y$, the score function is $\ell'(\theta_0;Y) = \theta_0^{-1}Y - 1$ and the Fisher information is $\mathcal I = \theta_0^{-1}$. The partial derivatives of the functional are
\[
\frac{\partial T}{\partial\theta}[G(\cdot;\theta)] = 2\theta + 1; \qquad
\frac{\partial^2 T}{\partial\theta^2}[G(\cdot;\theta)] = 2; \qquad
\frac{\partial^k T}{\partial\theta^k}[G(\cdot;\theta)] = 0 \text{ for } k \ge 3.
\]
Since $\sqrt n(\hat\theta_n - \theta_0) = \mathcal I^{-1}Z(\theta_0)$, we have
\begin{align}
\sqrt n\Big(T(\hat F_n) - T\big[G(\cdot;\hat\theta_n)\big]\Big)
&= \sqrt n\Big[\frac{1}{n}\sum_{i=1}^n Y_i^2 - \bar Y(\bar Y + 1)\Big] \nonumber\\
&= \frac{1}{\sqrt n}\sum_{i=1}^n \Big[Y_i^2 - \theta_0(\theta_0+1) - (2\theta_0+1)\,\theta_0\Big(\frac{Y_i}{\theta_0} - 1\Big)\Big] - \sqrt n\,(\bar Y - \theta_0)^2
\triangleq \frac{1}{\sqrt n}\sum_{i=1}^n S_i - \zeta_n. \tag{6.13}
\end{align}
In this case, we can evaluate the variance of (6.13) directly:
\[
\mathrm{Var}\Big(\sqrt n\Big[T(\hat F_n) - T\big[G(\cdot;\hat\theta_n)\big]\Big]\Big) = \mathrm{Var}(S_1) + \mathrm{Var}(\zeta_n) - 2\sqrt n\,\mathrm{Cov}(S_1, \zeta_n).
\]
By (6.11) and (6.12), the variance of $S_i$ is $a - b$, where $a = \mathrm{Var}(Y_i^2) = \theta_0(1 + 6\theta_0 + 4\theta_0^2)$, $b = \theta_0(2\theta_0+1)^2$ and $a - b = 2\theta_0^2$. For the variance of $\zeta_n$, we have $Y_+ \triangleq \sum_{i=1}^n Y_i \sim \mathrm{Poisson}(n\theta_0)$ and thus
\[
\mathrm{Var}(\zeta_n) = \mathrm{Var}\big[\sqrt n\,(\bar Y - \theta_0)^2\big]
= n^{-3}\Big\{E\big[(Y_+ - n\theta_0)^4\big] - \Big(E\big[(Y_+ - n\theta_0)^2\big]\Big)^2\Big\}
= n^{-3}\,(n^2\theta_0^2)\Big(3 + \frac{1}{n\theta_0} - 1\Big) = O(n^{-1}),
\]
and the covariance term is
\begin{align*}
\mathrm{Cov}(S_1, \zeta_n) &= n^{-3/2}E\big[S_1(Y_+ - n\theta_0)^2\big]
= n^{-3/2}E\Big[S_1\Big(Y_1^2 + 2Y_1\sum_{i=2}^n Y_i - 2n\theta_0 Y_1\Big)\Big] \\
&= n^{-3/2}E\big[S_1 Y_1^2 + 2(n-1)\theta_0 S_1 Y_1 - 2n\theta_0 S_1 Y_1\big] = O(n^{-3/2}),
\end{align*}
where we make use of the properties that $E(S_1) = 0$ and that $S_1$ is independent of $Y_j$, $j \ne 1$. This means that both $\mathrm{Var}(\zeta_n)$ and $2\sqrt n\,\mathrm{Cov}(S_1, \zeta_n)$ are of order $O(n^{-1})$ and converge to zero as $n\to\infty$.
Therefore $\lim_{n\to\infty}\mathrm{Var}\left[\sqrt n\left(T(\hat F_n) - T[G(\cdot;\hat\theta_n)]\right)\right] = \mathrm{Var}(S_1) = 2\theta_0^2$.

Example 6.2. Let $Y_i = (Y_{i1}, Y_{i2})^\top$, $i = 1,\dots,n$, be a random sample with distribution function $G(\cdot;\theta_0)$ and density function $g$. We obtain expressions for the limiting distribution in Theorem 6.4 with $T$ being Kendall's $\tau$ or Spearman's $\rho$, denoted below as $T_\tau = 4\iint G(y_1,y_2)\,dG(y_1,y_2) - 1$ and $T_{\rho_S} = 12\iint G_1(y_1)G_2(y_2)\,dG(y_1,y_2) - 3$, respectively, where $G_1$ and $G_2$ are the (known) marginal distribution functions of $G$. Let the true values of the measures be $\tau$ and $\rho_S$, the marginal densities be $g_m$ for $m = 1,2$, $\mathbb{1}^\tau_{ij} = \mathbb{1}\{Y_{i1} > Y_{j1}, Y_{i2} > Y_{j2}\}$, $\mathbb{1}^{\rho_S}_{ijk} = \mathbb{1}\{Y_{i1} < Y_{k1}, Y_{j2} < Y_{k2}\}$, and assume there are no ties in the data. The empirical functionals can be written as
$$T_\tau(\hat F_n) = \binom{n}{2}^{-1}\sum_{i<j}\left[2(\mathbb{1}^\tau_{ij} + \mathbb{1}^\tau_{ji}) - 1\right];$$
$$T_{\rho_S}(\hat F_n) = \binom{n}{3}^{-1}\sum_{i<j<k}\left[2(\mathbb{1}^{\rho_S}_{ijk} + \mathbb{1}^{\rho_S}_{ikj} + \mathbb{1}^{\rho_S}_{jik} + \mathbb{1}^{\rho_S}_{jki} + \mathbb{1}^{\rho_S}_{kij} + \mathbb{1}^{\rho_S}_{kji}) - 3\right] + O_p(n^{-1}),$$
i.e., Kendall's $\tau$ and Spearman's $\rho$ have $r = 2$ and $3$, respectively, and the $h$ functions are the terms inside the square brackets. The quantity $a$ in (6.11) can be obtained by considering the projection
$$V_n^* = \frac{r}{n}\sum_{i=1}^n\left(E[h(Y_1,Y_2,\dots,Y_r)\mid Y_1 = Y_i] - \gamma\right).$$
The conditional expectations for Kendall's $\tau$ are
$$E(\mathbb{1}^\tau_{ij}\mid Y_i) = E[\mathbb{1}\{Y_{i1} > Y_{j1}, Y_{i2} > Y_{j2}\}\mid Y_i] = G(Y_i;\theta_0);$$
$$E(\mathbb{1}^\tau_{ji}\mid Y_i) = E[\mathbb{1}\{Y_{j1} > Y_{i1}, Y_{j2} > Y_{i2}\}\mid Y_i] = \bar G(Y_i;\theta_0),$$
where $\bar G$ denotes the joint survival function, and hence
$$V_n^* = \frac{4}{n}\sum_{i=1}^n\left[G(Y_i;\theta_0) + \bar G(Y_i;\theta_0)\right] - 2(\tau+1);$$
$$a = \mathrm{Var}\left(4\left[G(Y_i;\theta_0) + \bar G(Y_i;\theta_0)\right]\right) = 16\int_{-\infty}^{\infty}\left[G(y;\theta_0) + \bar G(y;\theta_0)\right]^2 dG(y;\theta_0) - 4(\tau+1)^2.$$
For Spearman's $\rho$, the conditional expectations are
$$E(\mathbb{1}^{\rho_S}_{ijk}\mid Y_i) = E\left[E\left(\mathbb{1}\{Y_{i1} < Y_{k1}, Y_{j2} < Y_{k2}\}\mid Y_i, Y_j\right)\mid Y_i\right] = E[\bar G(Y_{i1}, Y_{j2};\theta_0)\mid Y_i] = \int_{-\infty}^{\infty}\bar G(Y_{i1}, w;\theta_0)\, g_2(w)\, dw = E(\mathbb{1}^{\rho_S}_{ikj}\mid Y_i);$$
$$E(\mathbb{1}^{\rho_S}_{jik}\mid Y_i) = E[\mathbb{1}\{Y_{j1} < Y_{k1}, Y_{i2} < Y_{k2}\}\mid Y_i] = E[\bar G(Y_{j1}, Y_{i2};\theta_0)\mid Y_i] = \int_{-\infty}^{\infty}\bar G(w, Y_{i2};\theta_0)\, g_1(w)\, dw = E(\mathbb{1}^{\rho_S}_{kij}\mid Y_i);$$
$$E(\mathbb{1}^{\rho_S}_{jki}\mid Y_i) = E[\mathbb{1}\{Y_{j1} < Y_{i1}, Y_{k2} < Y_{i2}\}\mid Y_i] = E[\mathbb{1}\{Y_{j1} < Y_{i1}\}\mid Y_i]\, E[\mathbb{1}\{Y_{k2} < Y_{i2}\}\mid Y_i] = G_1(Y_{i1})\, G_2(Y_{i2}) = E(\mathbb{1}^{\rho_S}_{kji}\mid Y_i).$$
Let $B_1(Y_i;\theta_0) = \int_{-\infty}^{\infty}\bar G(Y_{i1}, w;\theta_0)\, g_2(w)\, dw$ and $B_2(Y_i;\theta_0) = \int_{-\infty}^{\infty}\bar G(w, Y_{i2};\theta_0)\, g_1(w)\, dw$. Then
$$V_n^* = \frac{3}{n}\sum_{i=1}^n\left\{4\left[B_1(Y_i;\theta_0) + B_2(Y_i;\theta_0) + G_1(Y_{i1})G_2(Y_{i2})\right] - (\rho_S + 3)\right\} \triangleq \frac{12}{n}\sum_{i=1}^n B(Y_i;\theta_0) - 3(\rho_S + 3);$$
$$a = \mathrm{Var}[12\, B(Y_i;\theta_0)] = 144\int_{-\infty}^{\infty} B^2(y;\theta_0)\, dG(y;\theta_0) - 9(\rho_S+3)^2.$$
Meanwhile, for the partial derivative that appears in the quantity $b$ in (6.12), we obtain
$$\frac{\partial T_\tau}{\partial\theta}[G(\cdot;\theta_0)] = 4\,\frac{\partial}{\partial\theta}\int G(y;\theta_0)\, g(y;\theta_0)\, dy = 4\int\left[\frac{\partial G}{\partial\theta}(y;\theta_0)\cdot g(y;\theta_0) + G(y;\theta_0)\cdot\frac{\partial g}{\partial\theta}(y;\theta_0)\right] dy$$
$$= 4\int\left[\frac{\partial G}{\partial\theta}(y;\theta_0) + G(y;\theta_0)\,\ell'(\theta_0;y)\right] dG(y;\theta_0), \qquad \ell'(\theta_0;y)\, g(y;\theta_0) = \partial g(y;\theta_0)/\partial\theta;$$
$$\frac{\partial T_{\rho_S}}{\partial\theta}[G(\cdot;\theta_0)] = 12\,\frac{\partial}{\partial\theta}\int G_1(y_1)\, G_2(y_2)\, dG(y;\theta_0) = 12\int G_1(y_1)\, G_2(y_2)\,\ell'(\theta_0;y)\, dG(y;\theta_0).$$
For copulas $U = (U_1, U_2) \sim C$, $\partial T_{\rho_S}[G(\cdot;\theta_0)]/\partial\theta$ reduces to $12\int u_1 u_2\,\ell'(\theta_0;u)\, dC(u;\theta_0)$, as the marginal distribution functions are $G_m(u) = u$ for $0 \le u \le 1$, $m = 1, 2$.

Next, we provide the vector version of Theorem 6.4, as well as the version applicable to estimators obtained as solutions of estimating equations.

Theorem 6.5. Let $Y_1,\dots,Y_n$ be a random sample from $G(\cdot;\theta_0)$. Assume A1 to A6. Let
$$T_l(\hat F_n) = \binom{n}{r}^{-1}\sum_{\beta\in B} h_l(Y_{\beta_1},\dots,Y_{\beta_r}), \qquad l = 1,\dots,m,$$
be U-statistics estimable of the same degree $r$ and estimated based on the same sample, where $h_l$ is the symmetric kernel for the $l$th functional that is square-integrable with non-zero limiting variance.
Then
$$\sqrt n\begin{pmatrix} T_1(\hat F_n) - T_1[G(\cdot;\hat\theta_n)] \\ \vdots \\ T_m(\hat F_n) - T_m[G(\cdot;\hat\theta_n)] \end{pmatrix} \xrightarrow{d} N(0, \Sigma),$$
where $\Sigma$ has $(j,k)$ entry
$$r^2\,\kappa_1^{(j,k)} - \frac{\partial T_j}{\partial\theta^\top}[G(\cdot;\theta_0)]\, I^{-1}\,\frac{\partial T_k}{\partial\theta}[G(\cdot;\theta_0)]$$
with $\kappa_1^{(j,k)} = \mathrm{Cov}[h_j(Y_1,\dots,Y_r),\, h_k(Y_1,Y_{r+1},\dots,Y_{2r-1})]$ and $I$ defined in (6.4).

Proof. The projection argument is a direct generalization of that in Theorem 6.4 and Theorem 3.3.13 in Randles and Wolfe (1979). Once the U-statistics are projected onto the space of sums of i.i.d. random variables, the multivariate central limit theorem can be applied as in Theorem 6.2. $\square$

Theorem 6.5′. Let $Y_1,\dots,Y_n$ be a random sample from $G(\cdot;\theta_0)$. Assume A1, A6 and B1. Let
$$T_l(\hat F_n) = \binom{n}{r}^{-1}\sum_{\beta\in B} h_l(Y_{\beta_1},\dots,Y_{\beta_r}), \qquad l = 1,\dots,m,$$
be U-statistics estimable of the same degree $r$ and estimated based on the same sample, where $h_l$ is the symmetric kernel for the $l$th functional that is square-integrable with non-zero limiting variance. Then
$$\sqrt n\begin{pmatrix} T_1(\hat F_n) - T_1[G(\cdot;\hat\theta_n)] \\ \vdots \\ T_m(\hat F_n) - T_m[G(\cdot;\hat\theta_n)] \end{pmatrix} \xrightarrow{d} N(0, \Sigma_e),$$
where $\Sigma_e$ is an $m \times m$ covariance matrix.

6.2.3 Separability of variance in the asymptotic normal distribution

We showed in the previous subsections that, when the parameter is estimated via maximum likelihood, there are cases where the scaled difference statistic (one element of $\sqrt n\, D_n$) has an asymptotic normal distribution with mean zero and a variance that can be written as a difference. In each case, this asymptotic variance is that of the empirical feature minus that of the model-based feature. Note that, given Assumptions A1 to A6, the following results concerning the maximum likelihood estimator hold, with $I$ being the Fisher information matrix defined in (6.4):
$$\sqrt n(\hat\theta_n - \theta_0) \xrightarrow{d} N(0,\ I^{-1}(\theta_0));$$
$$\sqrt n\left[G(t;\hat\theta_n) - G(t;\theta_0)\right] \xrightarrow{d} N\!\left(0,\ \frac{\partial G}{\partial\theta^\top}(t;\theta_0)\, I^{-1}\,\frac{\partial G}{\partial\theta}(t;\theta_0)\right), \quad \text{fixed } t;$$
$$\sqrt n\left(T[G(\cdot;\hat\theta_n)] - T[G(\cdot;\theta_0)]\right) \xrightarrow{d} N\!\left(0,\ \frac{\partial T}{\partial\theta^\top}[G(\cdot;\theta_0)]\, I^{-1}\,\frac{\partial T}{\partial\theta}[G(\cdot;\theta_0)]\right).$$
Meanwhile, the asymptotic variances for the empirical distribution function and U-statistics are available in standard non-parametric theory. Results corresponding to the
Results corresponding to thetwo features in the previous subsections are summarized in Table 6.1.Feature Empirical AVar Model AVarDistribution function, fixed t G(1−G) ∂G∂θᵀ I−1 ∂G∂θU-statistics, T (F ) =(nr)−1∑β∈Bh (Y β1 , . . . ,Y βr) r2α1 (*see below)∂T∂θᵀ I−1 ∂T∂θ*For U-statistics, α1 = Cov [h (Y 1, . . . ,Y r) , h (Y 1,Y r+1 . . . ,Y 2r−1)].Table 6.1: Summary of empirical and model-based asymptotic variances under maximumlikelihood estimation. “AVar” stands for asymptotic variance.With the separability property, one can avoid the non-trivial calculation of the crosscovariance between the empirical and model-based features. Note that the separabilityproperty may not hold in the more general case of estimating equations. Theorem 6.6below proves that the variance of the scaled difference statistic approaches the varianceof the limiting distribution, given certain conditions on the behaviour of the moments forthe remainder term ζn in (6.3). This has implications in the parametric bootstrap wherevariances are estimated from the sampling distribution. In the vector case as in the proof,this can be generalized to the covariance matrix of the scaled differences.Theorem 6.6. In the decomposition (6.3), let ζn = n−1/2W n, where W n = Op(1). IfCov(W n) = O(1) as n→∞ and E(S1) = 0, thenlimn→∞ [Cov(Xn)− Cov(S1)] = 0.126Proof. Since Xn = n−1/2∑ni=1 Si + n−1/2W n, we haveCov(Xn) = Cov(S1) +1nCov(W n) + 2Cov(1nn∑i=1Si,W n). (6.14)The middle term n−1Cov(W n)→ 0 as n→∞ by assumption, and E[(n−1∑ni=1 Sij)2]=n−1Var(S1j) → 0 as n → ∞ for j = 1, . . . , d, where Sᵀi = (Si1, . . . , Sid). Letting W ᵀn =(Wn1, . . . 
,Wnd), Sj =∑ni=1 Sij/n and using the Cauchy-Schwarz inequality, the (j, k) entryof the last covariance matrix of (6.14) is then given byCov(Sj ,Wnk)= E[SjWnk] ≤√E(S2j)E(W 2nk)→ 0as n→∞ because E(W 2nk) is bounded, and hence limn→∞ [Cov(Xn)− Cov(S1)] = 0.Under the regularity conditions, the remainder term ζn is of order Op(n−1/2) for maxi-mum likelihood estimators and those coming from estimating equations with√n-consistency.However, the condition that Cov(W n), or Var(W ) for the single-element case, is boundedas n→∞ is not so trivial to verify. For practical applications, we can check the magnitudesof nVar[Fˆn(t)−G(t; θˆn)]or nVar(T (Fˆn)− T[G(·; θˆn)])for several values of n. If theydo not seem to grow as n increases, that may be a sufficient indication that the variance ofthe scaled difference converges to the variance of the limiting distribution.6.3 Asymptotics of the difference vector under modelmisspecificationThe properties shown in Section 6.2 are only valid when the parametric model used forfitting coincides with the generating model. In this section, we mention the characteristicsof the same statistic when the fitted model is misspecified. In particular, we show thatthe statistic is still asymptotically normal, but it has a mean that grows with the samplesize. The Kullback-Leibler (KL) divergence (Kullback and Leibler (1951)), which can beconsidered as a distance measure between two distributions, is central to the subsequentdiscussion. The KL divergence of a density g from f is defined asKL(g||f) =ˆf(t) logf(t)g(t)dt ≥ 0.When the model is misspecified, there is no θ0 such that F (t) = G(t;θ0) for all t ∈ Rd.We will need the following assumption:127Assumption A7. 
For every $\theta \in \mathrm{int}(\Theta)$, the expectations
$$E_F\!\left[\frac{\partial\ell(\theta;Y)}{\partial\theta}\frac{\partial\ell(\theta;Y)}{\partial\theta^\top}\right], \qquad E_F\!\left[\frac{\partial^2\ell(\theta;Y)}{\partial\theta\,\partial\theta^\top}\right]$$
exist and are non-singular, where $E_F$ stresses that the expectation is taken with respect to the true distribution $F$.

We first state the classical result on maximum likelihood estimators when the model is misspecified, as Theorem 6.7 below.

Theorem 6.7. Let $Y_1,\dots,Y_n$ be a random sample from $F$ with density $f$, and let $G(\cdot;\theta)$ be the fitted parametric model with density $g$. Assume A2–A4 and A7. Then we have
$$\hat\theta \xrightarrow{p} \tilde\theta; \tag{6.15}$$
$$\sqrt n(\hat\theta - \tilde\theta) \xrightarrow{d} N\!\left(0,\ H^{-1}(\tilde\theta)\, J(\tilde\theta)\, H^{-1}(\tilde\theta)\right), \tag{6.16}$$
where $\tilde\theta$ is the parameter value that minimizes the Kullback–Leibler divergence of $g(\cdot;\theta)$ from $f$, i.e.,
$$\tilde\theta = \arg\min_\theta \int f(t)\,\log\frac{f(t)}{g(t;\theta)}\, dt,$$
with $\tilde\theta$ in the interior of the parameter space, and
$$J(\tilde\theta) = E_F\!\left[\frac{\partial\ell(\tilde\theta;Y_1)}{\partial\theta}\frac{\partial\ell(\tilde\theta;Y_1)}{\partial\theta^\top}\right]; \qquad H(\tilde\theta) = -E_F\!\left[\frac{\partial^2\ell(\tilde\theta;Y_1)}{\partial\theta\,\partial\theta^\top}\right].$$

Proof. This is the standard result for maximum likelihood estimation of misspecified models; see, e.g., Huber (1967) and White (1982). $\square$

In the following, we use $J$ and $H$ to denote $J(\tilde\theta)$ and $H(\tilde\theta)$, respectively; the subscript $F$ for the expectation is understood and dropped. The Taylor series expansion of the maximum likelihood estimator about $\tilde\theta$ gives $\sqrt n(\hat\theta - \tilde\theta) = H^{-1} Z + O_p(n^{-1/2})$, where $Z \triangleq Z(\tilde\theta) = n^{-1/2}\sum_{j=1}^n \ell'(\tilde\theta;Y_j)$, and $E(Z) = 0$ since
$$E[\ell'(\tilde\theta;Y_j)] = \int_{-\infty}^{\infty}\ell'(\tilde\theta;y)\, f(y)\, dy = \frac{\partial}{\partial\theta}\int_{-\infty}^{\infty} f(y)\,\log g(y;\tilde\theta)\, dy = 0,$$
with the final equality due to the fact that $\tilde\theta$ maximizes $h(\theta) \triangleq \int_{-\infty}^{\infty} f(y)\,\log g(y;\theta)\, dy$ and is a root of $\partial h/\partial\theta = 0$. Reasoning similar to that in Sections 6.2.1 and 6.2.2 allows us to derive the following.

Theorem 6.8. Let $Y_1,\dots,Y_n$ be a random sample from $F$ and $G(\cdot;\theta)$ be the fitted parametric model. Assume A2–A4 and A7, so that (6.15) and (6.16) hold. Let $\eta = F(t) - G(t;\tilde\theta)$. Then
$$\sqrt n\left[\hat F_n(t) - G(t;\hat\theta_n) - \eta\right] \xrightarrow{d} N(0, \sigma^2),$$
where
$$\sigma^2 = \mathrm{Var}\!\left[\mathbb{1}\{Y_i \le t\} - \frac{\partial G}{\partial\theta^\top}(t;\tilde\theta)\, H^{-1}\,\ell'(\tilde\theta;Y_i)\right] \neq F(t)\left[1 - F(t)\right] - \frac{\partial G}{\partial\theta^\top}(t;\tilde\theta)\, J^{-1}\,\frac{\partial G}{\partial\theta}(t;\tilde\theta) \tag{6.17}$$
in general.

Proof. The decomposition of this difference is
$$\sqrt n\left[\hat F_n(t) - G(t;\hat\theta_n) - \eta\right] = \frac{1}{\sqrt n}\sum_{i=1}^n\left[\mathbb{1}\{Y_i \le t\} - G(t;\tilde\theta) - \eta - \frac{\partial G}{\partial\theta^\top}(t;\tilde\theta)\, H^{-1}\,\ell'(\tilde\theta;Y_i)\right] - \sqrt n\, R_n \triangleq \frac{1}{\sqrt n}\sum_{i=1}^n S_i + o_p(1),$$
where $R_n = o_p(n^{-1/2})$. Now $E(S_i) = F(t) - G(t;\tilde\theta) - \eta = 0$, while
$$\mathrm{Var}(S_i) = \mathrm{Var}[\mathbb{1}\{Y_i \le t\}] + \frac{\partial G}{\partial\theta^\top}(t;\tilde\theta)\, H^{-1} J H^{-1}\,\frac{\partial G}{\partial\theta}(t;\tilde\theta) - 2\,\mathrm{Cov}\!\left[\mathbb{1}\{Y_i \le t\},\ \frac{\partial G}{\partial\theta^\top}(t;\tilde\theta)\, H^{-1}\,\ell'(\tilde\theta;Y_i)\right],$$
as $\mathrm{Cov}[\ell'(\tilde\theta;Y_i)] = E\left([\ell'(\tilde\theta;Y_i)][\ell'(\tilde\theta;Y_i)]^\top\right) = J$. The covariance term is
$$\mathrm{Cov}\!\left[\mathbb{1}\{Y_i \le t\},\ \frac{\partial G}{\partial\theta^\top}(t;\tilde\theta)\, H^{-1}\,\ell'(\tilde\theta;Y_i)\right] = \frac{\partial G}{\partial\theta^\top}(t;\tilde\theta)\, H^{-1}\int_{-\infty}^{t}\frac{\partial g(y;\tilde\theta)}{\partial\theta}\cdot\frac{f(y)}{g(y;\tilde\theta)}\, dy.$$
The integral
$$\int_{-\infty}^{t}\frac{\partial g(y;\tilde\theta)}{\partial\theta}\cdot\frac{f(y)}{g(y;\tilde\theta)}\, dy \neq J H^{-1}\,\frac{\partial G}{\partial\theta}(t;\tilde\theta)$$
in general. Note that, when the model is correctly specified, we have $J = H$ and
$$\int_{-\infty}^{t}\frac{\partial g(y;\tilde\theta)}{\partial\theta}\cdot\frac{f(y)}{g(y;\tilde\theta)}\, dy = \frac{\partial G}{\partial\theta}(t;\tilde\theta),$$
retrieving (6.17). $\square$

Theorem 6.9. Let $Y_1,\dots,Y_n$ be a random sample from $F$ and $G(\cdot;\theta)$ be the fitted parametric model. Assume A2–A4 and A6–A7, and that $T_k(\hat F_n)$ is a U-statistic as outlined in Theorem 6.4, $k = 1,\dots,m$. Let $\eta_k = T_k(F) - T_k[G(\cdot;\tilde\theta)]$. Then
$$\sqrt n\begin{pmatrix} T_1(\hat F_n) - T_1[G(\cdot;\hat\theta_n)] - \eta_1 \\ \vdots \\ T_m(\hat F_n) - T_m[G(\cdot;\hat\theta_n)] - \eta_m \end{pmatrix} \xrightarrow{d} N(0, \Sigma_{\mathrm{mis}})$$
for some covariance matrix $\Sigma_{\mathrm{mis}}$.

Proof. The proof for one element, i.e., $\sqrt n\left(T_k(\hat F_n) - T_k[G(\cdot;\hat\theta_n)] - \eta_k\right)$, follows from the projection argument and Theorem 6.8. This is readily extended to the vector case as in Theorem 6.2. $\square$

Note that Theorem 6.9 is also valid when $\hat\theta_n$ is the solution of estimating equations, with the assumptions changed to A6 and B1.

Theorems 6.8 and 6.9 contain two important observations: (a) the scaled difference $\sqrt n\,[\hat F_n(t) - G(t;\hat\theta_n)]$ or $\sqrt n\,(T(\hat F_n) - T[G(\cdot;\hat\theta_n)])$ has a mean that grows at rate $O(n^{1/2})$ if $\eta \neq 0$; and (b) the asymptotic variance of the difference is no longer separable when the model is misspecified, even under maximum likelihood estimation.

Estimation via composite likelihood is one instance of modelling via estimating equations. In this case $\tilde\theta = \theta_0$, the true parameter value, under some assumptions such as identifiability of $\theta$ from the margins being used. The asymptotic covariance matrix is still given by $H^{-1}(\theta_0)\, J(\theta_0)\, H^{-1}(\theta_0)$.
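To make the $O(n^{1/2})$ drift under misspecification concrete, the following sketch (an illustration chosen for this purpose, not part of the thesis) fits a Poisson model to geometric data. The KL minimizer for the Poisson model matches the mean, $\tilde\theta = E(Y)$, and for the second-moment functional of Example 6.1 the unscaled difference statistic converges to $\eta = E(Y^2) - (\tilde\theta^2 + \tilde\theta) = \mathrm{Var}(Y) - E(Y)$, which equals $\mu^2$ for a geometric distribution on $\{0,1,\dots\}$ with mean $\mu$:

```python
import numpy as np

rng = np.random.default_rng(1)

def difference_statistic(y):
    """D_n = T(F_hat) - T(G(.; theta_hat)) for T = second moment,
    under a Poisson model fitted by maximum likelihood (theta_hat = ybar)."""
    ybar = y.mean()
    return np.mean(y**2) - (ybar**2 + ybar)

# Misspecified case: geometric data on {0,1,...} with mean mu = 1 (p = 0.5).
# The KL minimizer is theta_tilde = mu and eta = Var(Y) - mu = mu^2 = 1,
# so D_n converges to 1 and sqrt(n) * D_n drifts off to infinity.
y_geom = rng.geometric(0.5, size=100_000) - 1  # numpy's geometric starts at 1
d_mis = difference_statistic(y_geom)

# Correctly specified case: Poisson data; D_n converges to 0.
y_pois = rng.poisson(1.0, size=100_000)
d_ok = difference_statistic(y_pois)

print(d_mis, d_ok)  # d_mis near 1, d_ok near 0
```

The same functional thus has zero asymptotic mean under the correct model but a non-vanishing drift $\eta$ under the misspecified one, exactly as Theorem 6.8 describes.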
The asymptotic variance of the scaled difference cannot be decomposed into the difference of the respective asymptotic variances in general.

6.4 Some comments on properties of the difference statistic

In this section, we provide some comments on the difference $T(\hat F_n) - T[G(\cdot;\hat\theta_n)]$ that are relevant to its practical use as a diagnostic statistic.

• As Section 6.2 suggests, under correct model specification, maximum likelihood estimation and subject to regularity conditions, the variance of the limiting distribution of $\sqrt n\,(T(\hat F_n) - T[G(\cdot;\hat\theta_n)])$ can be decomposed into the difference of the limiting variances of the empirical and model-based features. This means that, in the limit, the variability of the scaled difference cannot be larger than that of the empirical estimator, and moves in the opposite direction to that of the model-based estimator: a more precise maximum likelihood estimator leads to a smaller limiting variance of the model-based feature, and hence a larger variance of the difference.

• There are cases where the empirical and model-based features are identical and hence the difference is always zero. Such features are not suitable for use in assessing model adequacy. One example is when $T(F)$ is the mean of $F$ and the parametric model is one whose maximum likelihood estimator of the mean is the sample mean, for example the exponential, geometric and Poisson distributions, and the binomial distribution with the success probability as the only parameter. For the multivariate Gaussian distribution, the maximum likelihood estimators of the mean and covariance parameters are their sample counterparts. However, if data are generated from a multivariate Gaussian distribution with zero mean and given marginal variances, and only the correlations are fitted, the resulting maximum likelihood estimators are no longer the sample correlations (see Section 7.5.3). Because of this, fitting a (saturated) multivariate Gaussian copula to data on the unit hypercube will generally result in correlation parameter estimates slightly different from the sample correlations.

• For $T$ a central dependence measure for bivariate copulas (or bivariate monotone association), properties of the empirical estimators are available in Section 2.12 of Joe (2014). Here we only consider $T$ being Kendall's $\tau$ or Spearman's $\rho$, as they are invariant to marginal standardization and their empirical estimators have smaller asymptotic variance than Blomqvist's $\beta$. For both empirical and model-based estimators, the asymptotic variance generally decreases as a function of the strength of dependence. This is, however, not a monotone relationship; for example, the asymptotic variance of the empirical Kendall's $\tau$ for the Gumbel or MTCJ copula is slightly higher at $\tau = 0.1$ than at $\tau = 0$. As the strength of dependence approaches comonotonicity, both the empirical and model-based measures converge to 1 and the asymptotic variance approaches zero. The asymptotic variance of the difference is generally not a monotone function of the dependence strength; for many copulas it increases up to moderate dependence (true Kendall's $\tau$ around 0.5) before eventually decreasing again. Finally, when $T$ is Kendall's $\tau$, the empirical, model-based and difference estimators typically all have smaller variances than when $T$ is Spearman's $\rho$, unless the strength of dependence is very strong. A demonstrative plot of the magnitudes for representative parametric copula families with different tail properties is given in Figure 6.1.

• Figure 6.1 also suggests that the asymptotic variances for various copulas can be quite different, even when the overall dependence strength is fixed.
[Figure 6.1: Asymptotic variances of the empirical (solid line with circles), model-based (dotted line with circles) and difference estimators (solid, thick line without circles) for different parametric copula families (Gaussian, Frank, Gumbel, MTCJ, t3) with various strengths of dependence. The features are Kendall's τ in the top row and Spearman's ρ in the bottom row. The x-axis plots the dependence strength (in Kendall's τ) for each copula family, and the lines are smoothed to reduce the effect of sampling variability. The Frank copula is plotted only to a Kendall's τ of 0.75 due to numerical instability beyond that value.]

There is evidence that this behaviour, especially for the empirical estimator, can partly be attributed to the tail properties of the copula; the variability gets larger as tail dependence increases or tail order decreases.^{13} A heuristic demonstration of the empirical asymptotic variance is given in Appendix B. Table 6.2 shows the asymptotic variance for some copulas with the same Kendall's τ, set at 0.5. Frank and Gaussian copulas have weak dependence in the joint tails (with Frank weaker than Gaussian) and smaller empirical asymptotic variance.
Hüsler-Reiss, Gumbel and reflected MTCJ (rMTCJ) copulas are asymmetric, with strong dependence in one tail only. Among these, the rMTCJ copula has the strongest upper tail dependence for a given value of τ. These parametric copula families generally result in larger variances. Finally, t copulas with small degrees of freedom have strong tail dependence in both tails and the largest variance among the parametric copula families considered.

^{13}The tail order (see Hua and Joe (2011)) also describes the tail behaviour of a copula. It must be at least 1, with higher values indicating weaker dependence.

Copula          Copula parameter   Tail order (Lower / Upper)   Tail dependence index (Lower / Upper)
Gaussian        0.71               1.17 / 1.17                  0 / 0
t (ν = 1)       0.71               1 / 1                        0.62 / 0.62
t (ν = 3)       0.71               1 / 1                        0.45 / 0.45
t (ν = 10)      0.71               1 / 1                        0.20 / 0.20
t (ν = 30)      0.71               1 / 1                        0.03 / 0.03
Hüsler-Reiss    1.81               1.42 / 1                     0 / 0.58
Gumbel          2                  1.41 / 1                     0 / 0.59
rMTCJ           2                  2 / 1                        0 / 0.71
Frank           5.74               2 / 2                        0 / 0

Asymptotic variance:
                T(F) = Kendall's τ                 T(F) = Spearman's ρ
Copula          Empirical   Model   Difference     Empirical   Model   Difference
Gaussian        0.23        0.14    0.10           0.27        0.18    0.10
t (ν = 1)       0.49        0.29    0.20           0.78        0.37    0.41
t (ν = 3)       0.33        0.22    0.11           0.47        0.29    0.18
t (ν = 10)      0.26        0.17    0.09           0.34        0.22    0.12
t (ν = 30)      0.24        0.15    0.10           0.29        0.19    0.10
Hüsler-Reiss    0.25        0.13    0.12           0.29        0.17    0.12
Gumbel          0.27        0.17    0.11           0.37        0.21    0.15
rMTCJ           0.29        0.12    0.17           0.39        0.15    0.24
Frank           0.21        0.18    0.03           0.30        0.24    0.06

Table 6.2: Tail properties (top) and asymptotic variances (bottom) for different parametric copula families with true Kendall's τ equal to 0.5.

• For copula fitting with maximum likelihood using variables $U_1,\dots,U_n$ i.i.d. from copula $C$, the scaled difference statistic $\sqrt n\, D_n$ is asymptotically normally distributed, and the separability property of the asymptotic variance holds when the empirical feature is a U-statistic. However, in practice data do not usually lie in the unit hypercube, and marginal modelling is relevant. Consider data $Y_i = (Y_{i1},\dots,Y_{id})^\top$ i.i.d. from distribution
$$G(y;\zeta_1,\dots,\zeta_d,\delta) = C\left(G_1(y_1;\zeta_1),\dots,G_d(y_d;\zeta_d);\ \delta\right), \tag{6.18}$$
where $\zeta_j$ is the marginal parameter vector for the $j$th margin with distribution function $G_j$, and $\delta$ is the copula parameter vector. Several different estimators of $\delta$ exist:

1. If all the $G_j$'s and $\zeta_j$'s are known, this reduces to the copula problem, as $\hat\delta$, the maximum likelihood estimator of $\delta$ under the combined model, is the same as the maximizer of the copula likelihood $C(u;\delta)$ using the vectors of transformed data $U_i = (G_1(Y_{i1};\zeta_1),\dots,G_d(Y_{id};\zeta_d))^\top$. Therefore this case reduces to the (pure) copula fitting approach mentioned above, for which the separability property holds.

2. If the $G_j$'s are known (or assumed correctly specified) but the $\zeta_j$'s are not, and the full likelihood consisting of densities of the form
$$g(y_i;\zeta_1,\dots,\zeta_d,\delta) = c\left(G_1(y_{i1};\zeta_1),\dots,G_d(y_{id};\zeta_d);\ \delta\right)\prod_{j=1}^d g_j(y_{ij};\zeta_j),$$
where $c$ and $g_j$ are respectively the copula density and the $j$th marginal density, is maximized with respect to the whole parameter vector $\theta = (\zeta_1^\top,\dots,\zeta_d^\top,\delta^\top)^\top$ jointly, the resulting estimators will still conform to the maximum likelihood framework (subject to regularity conditions), and thus the scaled difference statistic arising from this fit will be asymptotically normal with separable asymptotic variances. However, this method is only practical when the dimension $d$ is small.

3. If the $G_j$'s are known (or assumed correctly specified) but the $\zeta_j$'s are not, and it is not practical to maximize the likelihood with respect to the whole parameter vector at once, we can first estimate the marginal parameters individually, obtaining maximum likelihood estimates $\tilde\zeta_1,\dots,\tilde\zeta_d$, and then estimate the copula parameter given these marginal parameters. This is the method of inference functions for margins (IFM) (Joe and Xu (1996); Joe (2005)), mentioned briefly in Section 2.5.2. The IFM copula estimator $\tilde\delta$ is typically different from, and less efficient than, the maximum likelihood estimator $\hat\delta$.
In this case, the scaled difference is asymptotically normal, as the estimator is $\sqrt n$-consistent, arising from the solution of a set of unbiased estimating equations and thus satisfying Assumption B1. However, the separability property does not hold in this case. We observe that the variability of the difference statistic appears to be smaller than that obtained using a purely copula (or known-margins) approach in many situations; see Section 7.8 for details.

4. When one does not want to assume a certain parametric form for the $G_j$'s, the copula parameter can be estimated via the marginal ranks method, in which the log-likelihood consisting of densities of the form $c(s_{i1},\dots,s_{id};\delta)$ is maximized, where $s_{ij} = n^{-1}\left(\sum_{k=1}^n \mathbb{1}\{y_{kj} \le y_{ij}\} - 0.5\right)$ is the adjusted rank of observation $y_{ij}$. An outline of the proof of asymptotic normality of the resulting estimator $\dot\delta$ is given in Genest et al. (1995), using results on multivariate rank statistics (see, e.g., Ruymgaart et al. (1972); Ruymgaart (1974); Rüschendorf (1976)). In essence, they argue that the quantity $\sqrt n(\dot\delta - \delta_0)$, where $\delta_0$ is the true value, can still be written as a sum of i.i.d. random variables plus negligible terms, so that asymptotic normality can be established; Genest et al. (1995) show that the necessary assumptions are satisfied by a large number of one-parameter bivariate copula families. Although not rigorously proved in this thesis, we believe that such a decomposition into a sum of i.i.d. random variables allows a proof similar to the one for $\sqrt n$-consistent estimators as solutions of estimating equations, so that asymptotic normality of the scaled difference statistic can be established (such convergence is assumed for some results in Chapter 7 to hold). In general, the separability property does not hold in this case.

The following illustrative simulation set-up provides an example of the behaviour of the asymptotic variances for the various copula parameter estimators described above. Here we restrict our attention to a bivariate random vector $Y = (Y_1, Y_2)^\top$ with distribution function (6.18), where $C$ is the MTCJ copula $C(u_1, u_2;\delta) = (u_1^{-\delta} + u_2^{-\delta} - 1)^{-1/\delta}$ and $G_k$ is the two-parameter Pareto distribution $G_k(y_k;\alpha_k,\sigma_k) = 1 - (1 + y_k/\sigma_k)^{-\alpha_k}$ for $y_k > 0$, $k = 1, 2$. The parameters chosen are $(\alpha_k,\sigma_k) = (4,3)$ for $k = 1$ and $(2,1)$ for $k = 2$, and $\delta = 2$, so that the copula has Kendall's τ equal to 0.5. Samples of size $n = 20{,}000$ are generated from $G$, from which the empirical Kendall's τ and the model-based Kendall's τ based on the estimators above (i.e., known marginal distributions, joint maximum likelihood, IFM and marginal ranks) are obtained. The procedure is repeated 2,000 times to obtain a sampling distribution, from which asymptotic variances are estimated.^{14} The results are displayed in Table 6.3. When the margins are known or joint maximum likelihood is performed, the estimated asymptotic variance of the (scaled) difference statistic is close to the difference between the empirical and model-based asymptotic variance estimates. This is no longer true for models fitted with IFM or marginal ranks. Also, the model-based asymptotic variance estimates using marginal ranks are much larger than those of the other methods. In this case the known-margins and IFM approaches have similar model-based asymptotic variances, but as noted above the variance using IFM is typically larger. The estimated model-based asymptotic variance for joint maximum likelihood estimation is the smallest, and results in the largest variance for the difference statistic. Finally, when margins are known, the estimated asymptotic variance for the difference is larger than those using IFM and marginal ranks.
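A condensed version of this experiment can be sketched in a few lines. The code below is illustrative (not the thesis implementation, which was written in Fortran 90): it treats only the known-margins case and works directly on the copula scale, since rank-based Kendall's τ is unaffected by the Pareto marginal transforms. The MTCJ copula is sampled by conditional inversion, and the model-based Kendall's τ is $\hat\delta/(\hat\delta+2)$:

```python
import numpy as np
from scipy.stats import kendalltau
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(7)

def rmtcj(n, delta):
    """Sample the MTCJ (Clayton) copula by conditional inversion."""
    u1, v = rng.uniform(size=(2, n))
    u2 = ((v**(-delta / (1 + delta)) - 1) * u1**(-delta) + 1)**(-1 / delta)
    return u1, u2

def mtcj_negloglik(delta, u1, u2):
    """Negative log-likelihood of the MTCJ copula density."""
    t = u1**(-delta) + u2**(-delta) - 1
    logc = (np.log(1 + delta) - (1 + delta) * (np.log(u1) + np.log(u2))
            - (2 + 1 / delta) * np.log(t))
    return -logc.sum()

n, delta0 = 20_000, 2.0            # true Kendall's tau = delta/(delta+2) = 0.5
u1, u2 = rmtcj(n, delta0)

tau_emp = kendalltau(u1, u2)[0]    # empirical feature
res = minimize_scalar(mtcj_negloglik, bounds=(0.1, 10), args=(u1, u2),
                      method='bounded')
tau_mod = res.x / (res.x + 2)      # model-based feature
d_n = tau_emp - tau_mod            # one entry of D_n
print(tau_emp, tau_mod, d_n)
```

Repeating this over many replications (and over the other three estimation methods) yields the sampling distributions behind Table 6.3.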
Although a theoretical proof appears difficult, this seems to be the general pattern we see for more complex parsimonious structures. The behaviour for these structures and its implications for using the difference statistic as a diagnostic tool are discussed in Section 7.8.

^{14}These estimates are compared against those arising from some other values of n to ensure that Theorem 6.6 is applicable.

Method           Empirical   Model   E − M   Difference
Known margins    0.28        0.13    0.15    0.15
Joint MLE        0.28        0.12    0.16    0.16
IFM              0.28        0.13    0.15    0.12
Marginal ranks   0.28        0.25    0.03    0.10

(The empirical estimator does not depend on the fitting method, so its asymptotic variance, 0.28, is common to all rows.)

Table 6.3: Asymptotic variance estimates for Kendall's τ using different estimators for the copula parameter, with the parametric model having Pareto margins and an MTCJ copula with Kendall's τ equal to 0.5. The column labelled "E − M" is the difference between the empirical and model-based asymptotic variances, and should be close to the asymptotic variance of the difference statistic (last column) if the separability property holds.

• When the functional is the rank-based F-madogram empirical estimator of the extremal coefficient, or a tail-weighted dependence measure, a different treatment is required, as these cannot be naturally expressed as U-statistics. In both cases, the empirical estimators are asymptotically normal by the theory of the empirical copula process, while asymptotic normality of the model-based counterparts (for the model estimation methods in the preceding item) follows from the delta method. However, it has not been proved that the scaled difference statistic is asymptotically normal, although extensive simulations suggest that this may be true. This is our conjecture in Chapter 7, upon which some of the results are based.

6.5 Decision criteria based on the adequacy-of-fit statistic

In the previous sections, we demonstrated the properties of the difference statistic $D_n$ when the model is correctly specified or misspecified.
An important message is that $\sqrt n\, D_n$ is approximately normal for large $n$ under some mild conditions on the functionals $T_1,\dots,T_m$; the limiting distribution has mean zero when the model is correct, and the mean of $\sqrt n\, D_n$ grows at rate $O(n^{1/2})$ when the model is misspecified. Thus $D_n$ satisfies the intuitive requirement of a "distance" measure between the empirical and assumed distributions. A decision on model adequacy can be made based on the adequacy-of-fit statistic $Q_n = n\, D_n^\top D_n$; other possible formulations were outlined at the beginning of this chapter. A fitted model can be seen as too parsimonious if $Q_n$ exceeds the $100(1-\alpha)\%$ quantile of its limiting (or reference) distribution. We refer to this quantile as a critical value.

This critical value depends on the matrix $\Sigma$, which in turn depends on the feature being used and, in the case of a multivariate distribution, the assumed structure; details on the determination of an appropriate critical value are given in Chapter 7. Based on the convergence results for a misspecified model in Section 6.3, we have the following insights:

• The distance (with respect to the chosen features) between the true distribution and the distribution closest to the truth within the family of assumed distributions, i.e., $T(F) - T[G(\cdot;\tilde\theta)]$, affects the ability of the statistic to detect model departure from the data. It is therefore important to choose the features carefully; they should reflect the purpose of model fitting for a particular problem. For instance, if the objective is to find a model that can adequately represent the overall strength of dependence between two variables, then Kendall's τ and Spearman's ρ are choices to consider. If tail properties are of interest, features more specific to this purpose, such as tail-weighted dependence measures (Krupskii and Joe (2015)), may be better options.

• Increasing the sample size improves the detection of departure from the true model. This is essentially the same statement that relates the sample size to the power of a hypothesis test. The emphasis here is that statistical models are merely a tool to assist the researcher in explaining data patterns ("all models are wrong, but some are useful" (Box (1979))). In practical applications, the 95% level critical value will likely be exceeded with a sufficiently large sample size. In this case it is constructive to examine the actual magnitudes of departure, i.e., those of $D_n$, and ask whether such differences are scientifically or practically significant.

• The magnitude of the asymptotic covariance matrix under model misspecification, i.e., $\Sigma_{\mathrm{mis}}$ in Theorem 6.9, may affect how likely one is to obtain a small value of the adequacy-of-fit statistic when the model is misspecified. This and the asymptotic covariance matrix under correct model specification (i.e., $\Sigma$) are affected by the precision of the empirical estimator. Using an efficient estimator for the features of interest will therefore improve the performance of the statistic.

We consider the adequacy-of-fit statistic a guide as to whether further model improvement is necessary. This comes into play because model selection criteria like AIC and BIC do not indicate whether a particular model is representative enough. Nevertheless, the adequacy-of-fit statistic should not be used blindly to evaluate the relative strengths of competing models without reference to model selection criteria.
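Obtaining a critical value is straightforward once $\Sigma$ is available: under the correct model, $\sqrt n\, D_n \to N(0,\Sigma)$, so the limiting law of $Q_n = n\, D_n^\top D_n$ is that of $Z^\top Z$ with $Z \sim N(0,\Sigma)$, a weighted sum of independent $\chi^2_1$ variables whose weights are the eigenvalues of $\Sigma$. A minimal sketch, assuming $\Sigma$ is known or has been estimated elsewhere:

```python
import numpy as np

def critical_value(sigma, alpha=0.05, nsim=200_000, seed=0):
    """100(1-alpha)% quantile of the limiting law of Q_n = n D_n' D_n,
    i.e. of Z'Z with Z ~ N(0, Sigma): a weighted sum of chi-squared(1)
    variables with weights the eigenvalues of Sigma."""
    lam = np.linalg.eigvalsh(sigma)
    rng = np.random.default_rng(seed)
    q = rng.chisquare(1, size=(nsim, len(lam))) @ lam
    return np.quantile(q, 1 - alpha)

# Sanity check: with Sigma = I_3 the limit is chi-squared with 3 df,
# whose 95% quantile is about 7.81.
cv_identity = critical_value(np.eye(3))
print(cv_identity)

# A correlated Sigma (hypothetical values) gives a different cutoff.
sigma = np.array([[1.0, 0.6, 0.3], [0.6, 1.0, 0.6], [0.3, 0.6, 1.0]])
cv_corr = critical_value(sigma)
print(cv_corr)
```

The simulation approach avoids the exact distribution theory for quadratic forms in normal variables and extends directly to the structured $\Sigma$ matrices of Chapter 7.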
An overparametrized model will likely lead to a smaller value than one with fewer parameters due to a better fit, but this does not imply that the latter is inadequate.

Chapter 7

Adequacy-of-fit for multivariate copulas with parsimonious dependence

The theoretical properties of the difference between empirical and model-based features were discussed in Chapter 6. We suggested the possible use of a quadratic form statistic based on the vector of differences for model diagnostics, with the intuition that model inadequacy will lead to a larger value of the statistic. In this chapter, we extend this idea to the adequacy-of-fit of multivariate copulas using the differences for pairwise margins. In particular, the functionals or features to be considered are measures of dependence for each of the $\binom{d}{2}$ bivariate margins of a $d$-dimensional copula, including Kendall's τ, Spearman's ρ, tail-weighted dependence measures, and the extremal coefficient for extreme value copulas. Sections 7.1 and 7.2 serve as an introduction to this chapter; we motivate the study of such a statistic using bivariate margins in Section 7.1 and make connections to previous work in this area. Section 7.2 provides the big-picture view of the challenges we encounter, suggests appropriate strategies for each situation, and gives a brief overview of the contents of subsequent sections.

7.1 Motivation and background

Let $F$ be the true distribution for $d$-dimensional i.i.d. observations $Y_i = (Y_{i1},\dots,Y_{id})^\top$, $i = 1,\dots,n$, and $G(\cdot;\theta)$ be a model with parameter $\theta \in \Theta$ to which the data are fitted, resulting in the estimator $\hat\theta_n$. The empirical distribution function is denoted by $\hat F_n$. Suppose $G$ is correctly specified in the sense that $F(t) = G(t;\theta_0)$ for all $t \in \mathbb{R}^d$, for some $\theta_0 \in \Theta$. We
However, observations are sparse when d is large and, as a result, F̂_n becomes an increasingly unreliable estimate as more variables are considered in the model. This affects the use of the difference statistic because the limiting (or reference) distribution may not be representative of the finite-sample behaviour.

This problem has been previously studied in the discrete context. Pearson's χ² statistic is widely used to inspect the validity of a model for item response data, typically presented in the form of cell counts for each category of each item. The total number of cells grows exponentially with the number of items and can easily exceed the sample size. This leads to sparse contingency tables, i.e., with many cells having zero observed counts and small expected probabilities, even for a moderate number of items. In this case, the asymptotic distribution of the χ² statistic can be substantially different from its empirical distribution, meaning that tests and decisions based on the asymptotic distribution can be invalid. The issue of sparsity is alleviated by using limited-information methods through low-order marginal tables; this strategy is advocated in, e.g., Reiser and VandenBerg (1994); Reiser (1996); Reiser and Lin (1999); Bartholomew and Leung (2002); Maydeu-Olivares and Joe (2005, 2006). Let p be the vector of observed cell proportions and p_i be the vector of observed proportions for each cell of all order-i marginal tables (one category for each margin is left out as the corresponding proportion can be determined from the other entries of the vector), with corresponding model-based probabilities π(θ) and π_i(θ), respectively. Let p_r = (p_1ᵀ, ..., p_rᵀ)ᵀ be the vector of such proportions for all marginal tables of order r or fewer, and similarly π_r(θ) be the model-based counterpart.
Under certain regularity conditions, Maydeu-Olivares and Joe (2006) show that

\[
\sqrt{n}\left[\mathbf{p}_r - \boldsymbol{\pi}_r(\hat{\boldsymbol{\theta}}_n)\right] \xrightarrow{d} N(\mathbf{0}, \boldsymbol{\Sigma}_r) \tag{7.1}
\]

for θ̂_n being a √n-consistent and asymptotically normal estimator, including the maximum likelihood estimator, and the covariance matrix Σ_r of the limiting or reference distribution has known expressions. The authors define the limited-information statistic of order r as

\[
Q_n = n\left[\mathbf{p}_r - \boldsymbol{\pi}_r(\hat{\boldsymbol{\theta}}_n)\right]^{\top} \mathbf{V}_r(\hat{\boldsymbol{\theta}}_n) \left[\mathbf{p}_r - \boldsymbol{\pi}_r(\hat{\boldsymbol{\theta}}_n)\right],
\]

where V_r ≜ V_r(θ_0) satisfies V_r Σ_r V_r = V_r, so that Σ_r is a generalized inverse of V_r. This quadratic form statistic converges to a chi-squared distribution with appropriate degrees of freedom as n → ∞. For limited sample sizes, it is demonstrated that the empirical distribution of this limited-information statistic matches its asymptotic (or reference) distribution better than Pearson's χ² statistic does. In Joe and Maydeu-Olivares (2010), this theory is extended to quadratic forms in arbitrary linear combinations of the cell residuals p_r − π_r(θ̂_n).

In this chapter, we extend the idea of examining adequacy of model fit using lower-order marginals to general copulas, as a way to bypass the sparsity issue in high dimensions. This is achieved through the theory developed in Chapter 6 regarding the asymptotic behaviour of differences that are analogous to the cell residuals in (7.1). We will mainly focus on bivariate marginal differences; these methods are applicable to trivariate and higher-order margins with suitably defined functionals (features) and sufficient computational power.

Previous work on assessing the quality of fit of parsimonious models involves comparing the discrepancy between a d × d observed correlation matrix R_obs and the estimated, model-based correlation matrix R_mod, and has its roots in structural equation modelling.
The statistic that is relevant to our current work is the standardized root mean squared residual (SRMSR) (see, e.g., Hu and Bentler (1998)), defined by

\[
\mathrm{SRMSR} = \left[\frac{\sum_{1 \le j < k \le d} (r_{jk} - \hat{\rho}_{jk})^2}{d(d-1)/2}\right]^{1/2}, \tag{7.2}
\]

where r_jk and ρ̂_jk are the entries of R_obs and R_mod, respectively.[15] The difference r_jk − ρ̂_jk is known as the residual for the bivariate margin (j, k). The SRMSR has the same scale as the correlations being considered and thus offers an intuitive interpretation as an average deviation between the observed and fitted correlations. Traditional research in this area focuses on establishing cutoff values for various fit indices; these cutoffs typically correspond to quantiles of a reference distribution and are used as guidelines to determine whether the chosen parametric model is too parsimonious or underparametrized. A summary of such cutoffs is given by Hooper et al. (2008). These studies are mostly simulation-based; see, e.g., Hu and Bentler (1998) and Hu and Bentler (1999). In this regard, our work can be considered as an attempt to formalize the study of the SRMSR for general multivariate parsimonious models.

In the following, we illustrate the questions we try to answer in this chapter with two examples.

Example 7.1. In Section 3.7 we fitted structured extreme value copulas to two data sets. For the US stock returns example with sample size n = 119, the matrices of absolute pairwise differences between the empirical and model-based extremal coefficients based on the best two models using BIC are listed in Table 7.1.
Also listed in the table are the maximum absolute difference and the average root-mean-square difference for each model. In this chapter, we investigate whether the differences are large enough to suggest that the model(s) may not be adequate for the data.

[15] The formulation in Hu and Bentler (1998) uses the covariance matrix and includes the diagonal elements, but the idea is similar.

Model: EV1f(Burr)
Matrix of absolute pairwise differences (extremal coefficient):
  0  .053  .028  .078  .003  .051  .055
        0  .034  .054  .040  .004  .057
              0  .058  .047  .012  .110
                    0  .111  .085  .123
                          0  .055  .071
                                0  .012
                                      0
Maximum absolute difference: 0.123; average RMS difference: 0.064

Model: tEV1f(ν = 3)
Matrix of absolute pairwise differences (extremal coefficient):
  0  .049  .002  .032  .012  .051  .037
        0  .019  .018  .034  .013  .048
              0  .024  .031  .002  .084
                    0  .067  .055  .089
                          0  .057  .057
                                0  .009
                                      0
Maximum absolute difference: 0.089; average RMS difference: 0.045

Table 7.1: Matrix of pairwise differences between the empirical and model-based extremal coefficient for the Burr 1-factor EV copula (top) and 1-factor t-EV copula with ν = 3 (bottom) fitted to the US stock returns example

One may consider the parametric bootstrap on the fitted model to gain insight into the behaviour of the differences assuming correct model specification. We discuss the challenges underlying this approach, for instance the unavailability of accurate simulation methods for some models.

Example 7.2. Brechmann and Joe (2015) consider approaches to select the number of truncation levels of a vine copula based on fit indices. The proposed methods are applied to the GARCH-filtered return time series of 19 assets and indices of a Norwegian market portfolio, as well as those of the 15 largest German companies represented in the DAX index. In each case, the authors identify better 4- to 6-truncated vine copulas in terms of BIC values than those previously reported. For each fitted model, the maximum and average absolute deviations between empirical and model-based Kendall's τ are computed.
The model-based Kendall's τ values are estimated by simulating 10,000 samples from the fitted models. The authors compare these deviations with the typical variances based on the empirical Kendall's τ of a sample from a bivariate copula with moderate strength of dependence, and suggest that the deviations are within sampling variability.

For this example, our results based on maximum likelihood estimation in Chapter 6 suggest that the variability of the empirical Kendall's τ is higher than that of the difference between the empirical and model-based Kendall's τ. This difference can sometimes be substantial (Figure 6.1). Using the variance of the empirical Kendall's τ as the basis of comparison will thus likely overestimate the range within which deviations can be attributed to sampling variability. It is then natural to consider the parametric bootstrap to get an idea of what range of deviations for the difference is reasonable. However, evaluating the model-based Kendall's τ requires obtaining the bivariate marginal copulas; for general vine copulas they are usually intractable. It is also impractical to estimate the model-based Kendall's τ by simulating from the fitted model for each bootstrap sample, since this has to be done for each of the many samples. In this chapter, we explore ways to obtain critical values without going through this computationally intensive procedure.

An additional challenge in developing cutoff or critical values for diagnostic statistics is that these values may be sensitive to how the model is fitted. For the sake of computational efficiency, fitting of complex copula models can be performed using the inference functions for margins (IFM) approach (Joe and Xu (1996); Joe (2005)) or the marginal ranks approach (Genest et al. (1995)), where dependence parameters are fitted using pseudo-observations obtained from the fitted marginal distributions or ranks.
Meanwhile, composite likelihood can be used when the full density is hard to obtain, such as in the case of multivariate extreme value copulas. For vine copulas, yet another possibility is sequential fitting for each tree in an attempt to reduce computational complexity. In each case, the resulting estimates are generally different from those based on maximum likelihood estimation of the copula, and this may result in somewhat different behaviour of these diagnostic statistics. Our goal is to obtain a conservative (upper) bound for critical values that is applicable to most practical situations. Note that a conservative estimate is sufficient if such a bound is within the practical significance for the problem.

The adequacy-of-fit statistic to be introduced in this chapter is meant to be a diagnostic tool (see Krupskii and Joe (2015)), and therefore many of the comments in Section 6.5 apply here as well. Unlike in the discrete case of Maydeu-Olivares and Joe (2005, 2006) and Joe and Maydeu-Olivares (2010), where a model is theorized using subject knowledge in psychology, the choice of parametric copula family and structure is not guided by scientific theory in general; they are instead chosen to best represent the structure of the data and permit further inference based on the model. As a result, the purpose we suggest is different from goodness-of-fit procedures such as those in Genest et al. (2009), Huang and Prokhorov (2014) and Schepsmeier (2014), where hypothesis tests are conducted in order to approve or disprove the validity of a model. The pathway of statistical modelling we adopt can be summarized as follows:

1. Potential models are selected based on simple summaries, such as pairwise correlations of the normal scores, that reveal data patterns.

2. The models are fitted and compared using model selection criteria such as AIC or BIC, or procedures like cross-validation and stepwise methods.

3. Adequacy-of-fit diagnostic checks are applied to the final chosen models.
Those deemed adequate (i.e., with diagnostic statistic smaller than a cutoff value based on a reference distribution) are to be used for further inference and as a basis for decision making. If the checks suggest that the models are inadequate, one should revise the models and the whole procedure should be repeated.

An additional advantage of combining information from lower-order marginal distributions is that the resulting statistic provides directions for model improvement. For example, if the discrepancy is found to be substantially larger for features involving a particular variable, one should analyze the source of misfit accordingly. Systematic misfit such as trends in the differences should be removed prior to using the statistic as a means to assess model adequacy (see Maydeu-Olivares and Joe (2014)). As we mentioned at the beginning of Chapter 6, procedures that retain only a global summary of the distance between distributions, such as Kolmogorov–Smirnov and Cramér–von Mises-type statistics and their copula extensions, do not typically offer guidance for model improvement. Likelihood ratio tests in general also fall into this latter category (Section 7.8 of Cox and Wermuth (1996)).

7.2 Diagnostic checks based on the adequacy-of-fit statistic

In this section, we give an overview of the big-picture viewpoints regarding the use of the adequacy-of-fit statistic arising as a quadratic form of the vector of differences, and suggest appropriate strategies for each situation. The connection of this statistic to the SRMSR is then made.

7.2.1 Issues and general strategies

In addition to the notation defined at the beginning of this chapter, we denote by F_jk the bivariate marginal distribution for the (j, k) margin, with empirical estimate F̂_jk. The subscript n is dropped to prevent cluttering of indices. The adequacy-of-fit statistic Q_n is defined as

\[
Q_n = n \binom{d}{2}^{-1} \sum_{1 \le j < k \le d} w_{jk} \left( T(\hat{F}_{jk}) - T\left[G_{jk}(\cdot; \hat{\boldsymbol{\theta}}_n)\right] \right)^2, \tag{7.3}
\]

where w_jk is the weight associated with the difference for the margin (j, k).
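As a concrete sketch of (7.3) (illustrative only; the pair-indexed dictionaries, equal weights and toy feature values are assumptions for this example, not the thesis software), Q_n can be computed from the pairwise empirical and model-based features as follows:

```python
def adequacy_stat(n, emp_feat, mod_feat, weights=None):
    """Q_n of (7.3): n * (d choose 2)^{-1} * weighted sum of squared
    differences between empirical and model-based bivariate features.
    emp_feat / mod_feat map a margin (j, k), j < k, to its feature value."""
    pairs = sorted(emp_feat)
    m = len(pairs)                                   # (d choose 2) margins
    if weights is None:
        weights = {jk: 1.0 for jk in pairs}          # w_jk = 1 by default
    total = sum(weights[jk] * (emp_feat[jk] - mod_feat[jk]) ** 2
                for jk in pairs)
    return n * total / m

# Toy d = 3 example with Kendall's tau as the feature T:
emp = {(1, 2): 0.42, (1, 3): 0.35, (2, 3): 0.27}     # T(F_hat_jk)
mod = {(1, 2): 0.40, (1, 3): 0.30, (2, 3): 0.30}     # T[G_jk(.; theta_hat_n)]
q_n = adequacy_stat(n=100, emp_feat=emp, mod_feat=mod)
```

Here q_n = 100 × (0.02² + 0.05² + 0.03²)/3; a large value relative to a high quantile of the reference distribution would suggest model inadequacy.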
The statistic Q_n/n can therefore be interpreted as the weighted average pairwise squared error with respect to T. The following box makes concrete the classes of empirical features T(F̂_jk) and model estimation methods for θ̂_n we consider in this chapter:

Empirical features
1. U-statistics, including Kendall's τ and Spearman's ρ;
2. Rank-based F-madogram estimator of the extremal coefficient; and
3. Rank-based tail-weighted dependence measures.

Model estimation methods
1. Maximum likelihood;
2. Estimating equations (including the methods of inference functions for margins, composite likelihood and sequential vines); and
3. Marginal ranks or semiparametric estimation.

The relevant combinations from the categories above are: (a) U-statistics with any model estimation method; (b) the extremal coefficient for extreme value copulas with the estimating equations or marginal ranks method; and (c) tail-weighted dependence measures with any model estimation method. To obtain the moment properties of the limiting distribution of Q_n, we rely on the result that the vector of differences is asymptotically normal. It is known that the vector of empirical features (those listed above) is asymptotically normal, and similarly for the corresponding vector of model-based features. However, that the vector of differences is asymptotically normal is only proved for the combination of empirical U-statistics with modelling via estimating equations (including maximum likelihood) in Chapter 6. For the other cases, asymptotic normality of the vector of differences is assumed. A rigorous proof involves non-trivial details and remains an open problem; a representation that allows the application of Lemma 6.1 is needed.

For U-statistics with the estimating equations method, the result of asymptotic normality applied to the bivariate marginal features is stated in Theorem 7.1.

Theorem 7.1. Let Y_1, ..., Y_n be a random sample from F and G be the fitted parametric model.
Suppose G is correctly specified, i.e., there exists θ_0 such that F(t) = G(t; θ_0) for every t ∈ R^d. Let θ̂_n be the solution of a set of estimating equations, satisfying Assumption B1 in Section 6.2, such that θ̂_n is √n-consistent and asymptotically normal. If the functional T is a U-statistic that satisfies the assumptions in Theorem 6.4, then

\[
\sqrt{n}\,\mathbf{D}_n \triangleq \sqrt{n}
\begin{pmatrix}
T(\hat{F}_{12}) - T[G_{12}(\cdot; \hat{\boldsymbol{\theta}}_n)] \\
T(\hat{F}_{13}) - T[G_{13}(\cdot; \hat{\boldsymbol{\theta}}_n)] \\
\vdots \\
T(\hat{F}_{d-1,d}) - T[G_{d-1,d}(\cdot; \hat{\boldsymbol{\theta}}_n)]
\end{pmatrix}
\xrightarrow{d} N(\mathbf{0}, \boldsymbol{\Sigma}), \tag{7.4}
\]

where Σ is a square matrix of dimension $\binom{d}{2}$. Let $1 \le i_{pq} \le \binom{d}{2}$ be the index of the pairwise margin (p, q), 1 ≤ p < q ≤ d. If further θ̂_n is the maximum likelihood estimator, then the (i_jk, i_lm) entry of Σ, where j < k, l < m, is

\[
r^2 \,\mathrm{Cov}\left[h_{jk}(\mathbf{Y}_1, \ldots, \mathbf{Y}_r),\; h_{lm}(\mathbf{Y}_1, \mathbf{Y}_{r+1}, \ldots, \mathbf{Y}_{2r-1})\right]
- \frac{\partial T}{\partial \boldsymbol{\theta}^{\top}}\left[G_{jk}(\cdot; \boldsymbol{\theta}_0)\right] \mathcal{I}^{-1} \frac{\partial T}{\partial \boldsymbol{\theta}}\left[G_{lm}(\cdot; \boldsymbol{\theta}_0)\right],
\]

using the notation in Theorem 6.5.

However, if G is misspecified, then we have

\[
\sqrt{n}\,(\mathbf{D}_n - \boldsymbol{\Delta}) \xrightarrow{d} N(\mathbf{0}, \boldsymbol{\Sigma}^*) \tag{7.5}
\]

as n → ∞, where Δ is a non-zero vector with at least one O(1) element, provided that at least one of T(F̂_jk) − T[G_jk(·; θ̃)] = O(1), with G(·; θ̃) minimizing the Kullback–Leibler divergence of g from f.

Proof. This theorem is an application of the relevant theorems (Theorems 6.5, 6.5′ and 6.9), where each of the $m = \binom{d}{2}$ functionals here uses only partial information of F (or G), namely a bivariate marginal distribution.

With the distributional properties (7.4) and (7.5) for the combination of empirical U-statistics and modelling via estimating equations, or the assumed analogous results for the other combinations, the moments of the limiting distribution of Q_n are stated in Corollary 7.2.

Corollary 7.2. Let $m = \binom{d}{2}$ be the total number of bivariate pairs and σ_pq be the elements of Σ in (7.4), 1 ≤ p, q ≤ m. Let Γ = diag(w_12, ..., w_{d−1,d}) be the diagonal matrix of weights with entries indexed by γ_pq, so that the Q_n defined in (7.3) is equal to $n \mathbf{D}_n^{\top} \boldsymbol{\Gamma} \mathbf{D}_n / m$. If (7.4) or (7.5) applies, then we have the following:

1.
When G is correctly specified, $Q_n \xrightarrow{d} Q$ with

\[
E(Q) = \frac{\mathrm{tr}(\boldsymbol{\Gamma}\boldsymbol{\Sigma})}{m} = \frac{1}{m}\sum_{p=1}^{m} \gamma_{pp}\sigma_{pp}; \tag{7.6}
\]

\[
\mathrm{Var}(Q) = \frac{2\,\mathrm{tr}(\boldsymbol{\Gamma}\boldsymbol{\Sigma}\boldsymbol{\Gamma}\boldsymbol{\Sigma})}{m^2} = \frac{2}{m^2}\sum_{p=1}^{m}\sum_{q=1}^{m} \gamma_{pp}\gamma_{qq}\sigma_{pq}^2. \tag{7.7}
\]

2. When G is misspecified, $n(\mathbf{D}_n - \boldsymbol{\Delta})^{\top} \boldsymbol{\Gamma} (\mathbf{D}_n - \boldsymbol{\Delta})/m \xrightarrow{d} Q^*$ with

\[
E(Q^*) = \frac{\mathrm{tr}(\boldsymbol{\Gamma}\boldsymbol{\Sigma}^*)}{m}; \qquad
\mathrm{Var}(Q^*) = \frac{2\,\mathrm{tr}(\boldsymbol{\Gamma}\boldsymbol{\Sigma}^*\boldsymbol{\Gamma}\boldsymbol{\Sigma}^*)}{m^2},
\]

with Δ and Σ* as defined in (7.5).

The expressions in Corollary 7.2 are derived directly from the moments of quadratic forms of normal random variables; see, e.g., Mathai and Provost (1992). When the model is correctly specified, the reference distribution Q has a constant mean; otherwise the mean of an appropriate approximating distribution grows at O(n). We thus seek an upper quantile ("critical value" hereafter) of Q under the assumed model, beyond which model improvement is recommended. To approximate the quantile of Q, we match the first two moments (7.6) and (7.7) to those of a gamma distribution (as in Bartholomew and Leung (2002), where approximation via a chi-squared distribution is employed; see also Maydeu-Olivares and Joe (2008)), and then obtain the quantile of this approximating distribution.

In practice, however, the computation of Q_n or Σ can be challenging; the following issues highlight the difficulties encountered:

Issues of assessing model adequacy-of-fit using the quadratic form statistic
1. For the dependence measures we consider, the model-based feature is only easy to obtain when the bivariate marginal distributions are numerically tractable. This happens when the parametric model is closed under margins.
2. As seen above, the asymptotic distribution of the quadratic form statistic (or the standardized version based on the SRMSR, to be described in Section 7.2.3) depends on the covariance matrix Σ associated with the residual vector.
3. It is generally not easy to compute Σ; one particular case where this may be evaluated is when the empirical feature is a U-statistic and the model is estimated using maximum likelihood.
4.
When Σ cannot be easily computed or reliably estimated, one approach is to use a surrogate model to approximate or obtain bounds on the critical value.
5. We examine the performance of using surrogate models in low dimensions, where results can be compared to those of the target model, via maximum likelihood estimation of the copula.

In light of these challenges, the strategies we propose are based on the practicality for different combinations of empirical features and model estimation methods, and are summarized as follows. Some methods involve the use of specific functions or a reference table (i.e., Table 7.6); these are mentioned as software products below.

User-oriented pathways to obtain critical values; software products
1. Parametric bootstrap can be used whenever feasible (theoretically and computationally). This involves the following steps:
a) Fit the parametric copula model (using maximum likelihood, inference functions for margins or the marginal ranks method) and obtain parameter estimates;
b) Repeatedly simulate (with large sample sizes) from the fitted model, obtain empirical and model-based features and hence a sampling distribution of the quadratic form statistic;
c) Obtain the critical value as a high quantile of this sampling distribution and convert it to the SRMSR version for easier interpretation.

When the parametric bootstrap is infeasible (the model-based feature is difficult to compute, or model simulation/fitting is computationally expensive), we consider the following pathways:
2. Kendall's τ and Spearman's ρ for non-extreme-value copulas:
a) Fit the parametric copula model as in 1(a) above.
b) For maximum likelihood estimation, it may be possible to evaluate Σ, the covariance matrix of the reference distribution, by separating the empirical and model-based components (Section 7.4).
After getting Σ, use the gamma approximation with the first two moments (based on Corollary 7.2) to obtain the critical value as a high quantile of this distribution and convert it to the SRMSR version.
c) Alternatively, use a parsimonious Gaussian copula model with matching Kendall's τ value of the linking copulas (for factor and truncated vine structures) as a surrogate (Section 7.5). Obtain an estimate of Σ for this surrogate model, or