{"@context":{"@language":"en","Affiliation":"http:\/\/vivoweb.org\/ontology\/core#departmentOrSchool","AggregatedSourceRepository":"http:\/\/www.europeana.eu\/schemas\/edm\/dataProvider","Campus":"https:\/\/open.library.ubc.ca\/terms#degreeCampus","Creator":"http:\/\/purl.org\/dc\/terms\/creator","DateAvailable":"http:\/\/purl.org\/dc\/terms\/issued","DateIssued":"http:\/\/purl.org\/dc\/terms\/issued","Degree":"http:\/\/vivoweb.org\/ontology\/core#relatedDegree","DegreeGrantor":"https:\/\/open.library.ubc.ca\/terms#degreeGrantor","Description":"http:\/\/purl.org\/dc\/terms\/description","DigitalResourceOriginalRecord":"http:\/\/www.europeana.eu\/schemas\/edm\/aggregatedCHO","FullText":"http:\/\/www.w3.org\/2009\/08\/skos-reference\/skos.html#note","Genre":"http:\/\/www.europeana.eu\/schemas\/edm\/hasType","GraduationDate":"http:\/\/vivoweb.org\/ontology\/core#dateIssued","IsShownAt":"http:\/\/www.europeana.eu\/schemas\/edm\/isShownAt","Language":"http:\/\/purl.org\/dc\/terms\/language","Program":"https:\/\/open.library.ubc.ca\/terms#degreeDiscipline","Provider":"http:\/\/www.europeana.eu\/schemas\/edm\/provider","Publisher":"http:\/\/purl.org\/dc\/terms\/publisher","Rights":"http:\/\/purl.org\/dc\/terms\/rights","RightsURI":"https:\/\/open.library.ubc.ca\/terms#rightsURI","ScholarlyLevel":"https:\/\/open.library.ubc.ca\/terms#scholarLevel","Title":"http:\/\/purl.org\/dc\/terms\/title","Type":"http:\/\/purl.org\/dc\/terms\/type","URI":"https:\/\/open.library.ubc.ca\/terms#identifierURI","SortDate":"http:\/\/purl.org\/dc\/terms\/date"},"Affiliation":[{"@value":"Science, Faculty of","@language":"en"},{"@value":"Statistics, Department of","@language":"en"}],"AggregatedSourceRepository":[{"@value":"DSpace","@language":"en"}],"Campus":[{"@value":"UBCV","@language":"en"}],"Creator":[{"@value":"Chang, Bo","@language":"en"}],"DateAvailable":[{"@value":"2019-07-02T17:22:28Z","@language":"en"}],"DateIssued":[{"@value":"2019","@language":"en"}],"Degree":[{"@value":"Doctor of 
Philosophy - PhD","@language":"en"}],"DegreeGrantor":[{"@value":"University of British Columbia","@language":"en"}],"Description":[{"@value":"Copulas are widely used in high-dimensional multivariate applications where the assumption of Gaussian distributed variables does not hold. Vine copulas are a flexible family of copulas built from a sequence of bivariate copulas to represent bivariate dependence and bivariate conditional dependence. The vine structures consist of a hierarchy of trees to express conditional dependence. \r\n\r\nThe contributions of this thesis are\r\n(a) improved methods for finding parsimonious truncated vine structures when the number of variables is moderate to large;\r\n(b) diagnostic methods to help in decisions for bivariate copulas in the vine; \r\n(c) applications to predictions based on conditional distributions of the vine copula.\r\n\r\nThe vine structure learning problem has been challenging due to the large search space. Existing methods are based on greedy algorithms and do not in general produce a solution that is near the global optimum. It is an open problem to choose a good truncated vine structure when there are many variables. We propose a novel approach to learning truncated vine structures using Monte Carlo tree search, a method that has been widely adopted in game and planning problems. The proposed method has significantly better performance over the existing methods under various experimental setups.\r\n\r\nMoreover, diagnostic methods based on measures of dependence and tail asymmetry are proposed to guide the choice of parametric bivariate copula families assigned to the edges of the trees in the vine and to assess whether a copula is constant over the conditioning value(s) for trees 2 and higher. If the diagnostic methods suggest the existence of reflection asymmetry, permutation asymmetry, or asymmetric tail dependence, then three- or four-parameter bivariate copula families might be needed. 
If the conditional dependence measures or asymmetry measures in trees 2 and up are not constant over the conditioning value(s), then non-constant copulas with parameters varying over conditioning values should be considered. \r\n\r\nFinally, for data from an observational study, we propose a vine copula regression method that uses regular vines and handles mixed continuous and discrete variables. This method can efficiently compute the conditional distribution of the response variable given the explanatory variables.","@language":"en"}],"DigitalResourceOriginalRecord":[{"@value":"https:\/\/circle.library.ubc.ca\/rest\/handle\/2429\/70869?expand=metadata","@language":"en"}],"FullText":[{"@value":"Vine copulas: dependence structure learning, diagnostics, and applications to regression analysis\r\nby\r\nBo Chang\r\nB.S. (Mathematics), B.A. (Economics), Peking University, 2011\r\nM.S. (Statistics), University of California, Los Angeles, 2012\r\n\r\nA THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF\r\nDoctor of Philosophy\r\nin\r\nTHE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES\r\n(Statistics)\r\n\r\nThe University of British Columbia\r\n(Vancouver)\r\nJune 2019\r\n\u00a9 Bo Chang, 2019\r\n\r\nThe following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled:\r\nVine copulas: dependence structure learning, diagnostics, and applications to regression analysis\r\nsubmitted by Bo Chang in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Statistics.\r\n\r\nExamining Committee:\r\nHarry Joe, Department of Statistics, Supervisor\r\nNatalia Nolde, Department of Statistics, Supervisory Committee Member\r\nMat\u00edas Salibi\u00e1n-Barrera, Department of Statistics, Supervisory Committee Member\r\nRuben H. 
Zamar, Department of Statistics, University Examiner\r\nKevin Song, Vancouver School of Economics, University Examiner\r\nLouis-Paul Rivest, Universit\u00e9 Laval, External Examiner\r\n\r\nAbstract\r\nCopulas are widely used in high-dimensional multivariate applications where the assumption of Gaussian distributed variables does not hold. Vine copulas are a flexible family of copulas built from a sequence of bivariate copulas to represent bivariate dependence and bivariate conditional dependence. The vine structures consist of a hierarchy of trees to express conditional dependence.\r\n\r\nThe contributions of this thesis are (a) improved methods for finding parsimonious truncated vine structures when the number of variables is moderate to large; (b) diagnostic methods to help in decisions for bivariate copulas in the vine; (c) applications to predictions based on conditional distributions of the vine copula.\r\n\r\nThe vine structure learning problem has been challenging due to the large search space. Existing methods are based on greedy algorithms and do not in general produce a solution that is near the global optimum. It is an open problem to choose a good truncated vine structure when there are many variables. We propose a novel approach to learning truncated vine structures using Monte Carlo tree search, a method that has been widely adopted in game and planning problems. The proposed method has significantly better performance over the existing methods under various experimental setups.\r\n\r\nMoreover, diagnostic methods based on measures of dependence and tail asymmetry are proposed to guide the choice of parametric bivariate copula families assigned to the edges of the trees in the vine and to assess whether a copula is constant over the conditioning value(s) for trees 2 and higher. If the diagnostic methods suggest the existence of reflection asymmetry, permutation asymmetry, or asymmetric tail dependence, then three- or four-parameter bivariate copula families might be needed. 
If the conditional dependence measures or asymmetry measures in trees 2 and up are not constant over the conditioning value(s), then non-constant copulas with parameters varying over conditioning values should be considered.\r\n\r\nFinally, for data from an observational study, we propose a vine copula regression method that uses regular vines and handles mixed continuous and discrete variables. This method can efficiently compute the conditional distribution of the response variable given the explanatory variables.\r\n\r\nLay Summary\r\nIn applications with a large number of quantitative variables, the dependence relation among the variables is often of interest. Vine copulas are flexible models for the dependence relation that extend beyond the restrictive assumptions in classical multivariate Gaussian elliptical dependence. They are built from a sequence of two-dimensional models to represent bivariate dependence and bivariate conditional dependence. The contributions of this thesis for vine copulas include (a) improved methods for finding parsimonious truncated vine dependence structures when the number of variables is moderate to large; (b) diagnostic methods to help in decisions for bivariate copulas in the vine; (c) applications to predictions of the response variable from a set of explanatory variables that are observed at the same time as the response.\r\n\r\nPreface\r\nThe thesis is an original intellectual product of the author, Bo Chang, with the guidance and mentorship of Prof. Joe. The research questions and the proposed new methods are discussed with Prof. Joe during weekly research meetings. Throughout the preparation of the thesis, Prof. Joe makes ample suggestions on the improvement of presentation, motivation, and big picture viewpoints as well as some technical details.\r\n\r\nChapter 3 is based on a published paper: Chang, B., Pan, S., and Joe, H. (2019). Vine copula structure learning via Monte Carlo tree search. In International Conference on Artificial Intelligence and Statistics. 
An extended version is under preparation for journal submission. With guidance from the supervisor, the author develops the idea of the work and implements the proposed algorithm in Python. With help from Shenyi Pan, the author drafts the manuscript.\r\n\r\nChapter 4 is based on a manuscript: Chang, B. and Joe, H. (2019). Copula diagnostics for asymmetries and conditional dependence. The manuscript is under review for journal publication. The supervisor suggests the idea of developing diagnostic tools and provides an early version of the implementation of the concept. The author conducts the experiments and drafts the manuscript, and the supervisor suggests revisions to the manuscript.\r\n\r\nChapter 5 is based on a published paper: Chang, B. and Joe, H. (2019). Prediction based on conditional distributions of vine copulas. Computational Statistics & Data Analysis, 139:45\u201363. The supervisor develops the idea of the work. The author implements the proposed method; some of the functions are adapted or taken from the CopulaModel R package written by the supervisor.\r\n\r\nChapter 6 is based on the supplementary materials of Chang, B. and Joe, H. (2019). Prediction based on conditional distributions of vine copulas. Computational Statistics & Data Analysis, 139:45\u201363. Under the guidance of the supervisor, the author derives the main results.\r\n\r\nTable of Contents\r\nAbstract\r\nLay Summary\r\nPreface\r\nTable of Contents\r\nList of Tables\r\nList of Figures\r\nList of Symbols and Notations\r\nGlossary 
\r\nAcknowledgments\r\nDedication\r\n1 Introduction\r\n1.1 Background\r\n1.2 Dependence structure learning\r\n1.3 Vine copula diagnostics\r\n1.4 Vine copula regression\r\n1.5 Research contributions and organization of thesis\r\n2 Preliminaries\r\n2.1 Bivariate copulas\r\n2.1.1 Density and conditional distributions\r\n2.1.2 Dependence measures\r\n2.1.3 Asymmetry measures\r\n2.1.4 Tail order\r\n2.2 Archimedean copulas\r\n2.3 Vine copulas\r\n2.3.1 Vine graphical models\r\n2.3.2 Vine array representation\r\n2.3.3 From vines to multivariate distributions\r\n2.3.4 Truncated vine\r\n2.3.5 Vine structure learning\r\n2.3.6 Performance metric\r\n2.4 Two-stage estimation method for copula models\r\n2.5 Vuong\u2019s procedure\r\n3 Vine structure learning via Monte Carlo tree search\r\n3.1 Introduction\r\n3.2 Proposed method 
\r\n3.2.1 Vine structure learning as sequential decision making\r\n3.2.2 Monte Carlo tree search\r\n3.2.3 Tree policy: vine UCT\r\n3.3 A worst-case example for SeqMST\r\n3.4 Experiments\r\n3.4.1 Structure learning experiments\r\n3.4.2 Vine copula learning experiments\r\n3.5 Conclusion\r\n4 Copula diagnostics for asymmetries and conditional dependence\r\n4.1 Introduction\r\n4.2 Copula-based conditional measures\r\n4.2.1 Estimating copulas of conditional distributions\r\n4.2.2 Conditional Spearman\u2019s rho\r\n4.2.3 Conditional tail-weighted dependence measure\r\n4.2.4 Conditional measures of permutation and reflection asymmetry\r\n4.3 Skewed bivariate copulas\r\n4.4 Conditional dependence with the gamma factor model\r\n4.5 Illustrative data examples\r\n4.5.1 Hydro-geochemical data\r\n4.5.2 Glioblastoma tumors dataset\r\n4.6 Conclusion\r\n5 Prediction based on conditional distributions of vine copulas\r\n5.1 Introduction\r\n5.2 Model fitting and assessment\r\n5.2.1 Vine structure learning\r\n5.2.2 Bivariate copula selection\r\n5.3 Prediction 
\r\n5.4 Simulation study\r\n5.5 Application\r\n5.5.1 Abalone data set\r\n5.5.2 Regression\r\n5.5.3 Classification\r\n5.6 Conclusion\r\n6 Theoretical results on shapes of conditional quantile functions\r\n6.1 Introduction\r\n6.2 Bivariate asymptotic conditional quantile\r\n6.3 Bivariate Archimedean copula boundary conditional distributions\r\n6.3.1 Lower tail\r\n6.3.2 Upper tail\r\n6.4 Trivariate asymptotic conditional quantile\r\n6.4.1 Trivariate strongest functional relationship\r\n6.4.2 Trivariate conditional boundary distribution with bivariate Archimedean copulas\r\n6.4.3 Case studies: trivariate conditional quantile\r\n6.5 Beyond trivariate\r\n7 Conclusion\r\nBibliography\r\nA Derivations for Chapter 6\r\nA.1 Derivations for Section 6.4.1\r\nA.2 Derivations for Section 6.4.2\r\nA.3 Derivations for case 1 in Section 6.4.3\r\nB Conditional dependence measures for trivariate Frank copulas\r\nC Implementation of Monte Carlo tree search (MCTS)\r\nC.1 Description 
\r\nC.2 Example usage\r\nC.3 Code\r\n\r\nList of Tables\r\nTable 3.1 Experimental results for the FX dataset. The columns include the Gaussian log-likelihood and comparative fit index (CFI) of the vine dependence structure. Given the vine structures, bivariate copulas are assigned to the edges. The last three columns are the resulting vine copula log-likelihood, number of parameters, and AIC. For BJ15 and MCTS, we show 10 replications with different random seeds and the average and best model.\r\nTable 3.2 Experimental results for the GBM20 dataset. The columns include the Gaussian log-likelihood and CFI of the vine dependence structure. Given the vine structures, bivariate copulas are assigned to the edges. The last three columns are the resulting vine copula log-likelihood, number of parameters, and AIC. For BJ15 and MCTS, we show 10 replications with different random seeds and the average and best model.\r\nTable 4.1 Empirical tail-weighted dependence measures \u03b6\u0302_{\u03b1=5}, Gaussian tail-weighted dependence measure \u03b6_{\u03b1=5}, permutation asymmetry measures G\u0302_{P,k=0.2}, and fitted copulas in the first tree for the hydro-geochemical dataset. Gaussian \u03b6_{\u03b1=5} is the tail-weighted dependence measure of a bivariate Gaussian copula whose Spearman\u2019s rho is the same as the empirical counterpart. There appears to be reflection asymmetry, permutation asymmetry, and stronger dependence than Gaussian in the joint upper and lower tails.\r\nTable 4.2 Fitted bivariate copulas in the second tree for the hydro-geochemical dataset.\r\nTable 4.3 Pairwise comparison of vine copula models on the hydro-geochemical dataset. 
\r\nTable 4.4 Permutation asymmetry measures and Akaike information criterion (AIC) values of pairs of variables in the GBM dataset.\r\nTable 4.5 Empirical tail-weighted dependence measures \u03b6\u0302_{\u03b1=5} and Gaussian tail-weighted dependence measure \u03b6_{\u03b1=5}.\r\nTable 4.6 Permutation asymmetry measure and AICs of pairs of variables in tree 2 of D-vine 1342 and C-vine 1234 on the GBM dataset. If the AIC of a non-constant model is worse than a constant model, we report the AIC of the constant model, e.g., [14|3], [23|1] and [24|1].\r\nTable 4.7 Model AICs for different vine structures on GBM dataset.\r\nTable 5.1 Simulation results for two explanatory variables. The table shows the root-mean-square error (RMSE), logarithmic score (LS), quadratic score (QS), interval score (IS), and integrated Brier score (IBS) in different simulation cases. The arrows in the header indicate that lower RMSE, IS, and IBS and higher LS and QS are better. The numbers in parentheses are the corresponding standard errors.\r\nTable 5.2 Simulation results for four explanatory variables. The table shows the root-mean-square error (RMSE), logarithmic score (LS), quadratic score (QS), interval score (IS), and integrated Brier score (IBS) in different simulation cases. The arrows in the header indicate that lower RMSE, IS, and IBS and higher LS and QS are better. The numbers in parentheses are the corresponding standard errors.\r\nTable 5.3 Comparison of the performance of vine copula regressions and linear regression. The numbers are the average scores over 100 trials of 5-fold cross validation. The scoring rules are defined in Section 5.4.\r\nTable 5.4 Vine array and bivariate copulas of the R-vine copula regression fitted on the full dataset. 
The variables are (1) Length, (2) Diameter, (3) Height, (4) WholeWeight, (5) ShuckedWeight, (6) VisceraWeight, (7) ShellWeight, (8) Rings. A suffix of \u2018s\u2019 represents the survival version of the copula family to get the opposite direction of joint tail asymmetry; \u2018u\u2019 and \u2018v\u2019 represent the copula family with reflection on the first and second variable respectively to get negative dependence.\r\nTable 6.1 The taxonomy of the lower tail boundary conditional distribution lim_{u_1,u_2\u21920} u_{3|12}, where u_{3|12} is defined in Equation 6.23. For the first (non-heading) row where lim_{u_1,u_2\u21920} C_{1|2}(u_1|u_2) = 0, \u03ba_{13} represents \u03ba_{13L}, the lower tail order of C_{13;2}. Similarly, for the third (non-heading) row, where lim_{u_1,u_2\u21920} C_{1|2}(u_1|u_2) = 1, \u03ba_{13} represents \u03ba_{13U}, the upper tail order of C_{13;2}.\r\nTable 6.2 The taxonomy of the upper tail boundary conditional distribution lim_{u_1,u_2\u21921} u_{3|12}, where u_{3|12} is defined in Equation 6.23. For the first (non-heading) row where lim_{u_1,u_2\u21921} C_{1|2}(u_1|u_2) = 0, \u03ba_{13} represents \u03ba_{13L}, the lower tail order of C_{13;2}. Similarly, for the third (non-heading) row, where lim_{u_1,u_2\u21921} C_{1|2}(u_1|u_2) = 1, \u03ba_{13} represents \u03ba_{13U}, the upper tail order of C_{13;2}.\r\nTable C.1 Correspondence of variables and functions defined in the pseudocode in Algorithm 3.1 and in the Python implementation.\r\n\r\nList of Figures\r\nFigure 2.1 Contour plots of the joint probability density function (PDF) c(\u03a6(z_1),\u03a6(z_2))\u03c6(z_1)\u03c6(z_2). The margins are N(0,1) and copulas have Spearman\u2019s \u03c1_S = 0.5.\r\nFigure 2.2 An example of a vine for d = 5 up to tree 3.\r\nFigure 3.1 Vine structure learning as a sequential decision problem. An edge can be added to an unconnected acyclic graph. When a tree at level t is completed, the edges of this tree are used to create nodes for the next graph at level t+1. 
\r\nFigure 3.2 One iteration of the general MCTS algorithm.\r\nFigure 3.3 The search tree corresponding to a 1-truncated vine with d = 3. Although the search tree has six leaf nodes, there are only three unique 1-truncated vines: {[1,2], [2,3]} and {[2,3], [1,2]} yield 1\u20132\u20133; {[1,3], [2,3]} and {[2,3], [1,3]} yield 1\u20133\u20132; {[1,2], [1,3]} and {[1,3], [1,2]} yield 2\u20131\u20133.\r\nFigure 3.4 Some nodes of depths from 0 to 3 in the search tree. The root node does not have any edges. A child node is obtained by adding an edge to the (incomplete) vine structure of its parent node. In future iterations, child nodes with higher scores are more likely to be visited (exploitation); child nodes with fewer prior visits are more likely to be visited (exploration); child nodes with larger values of H_j = \u2212log(1 \u2212 \u03c1^2_{e_j}) are more likely to be visited (progressive bias). Note that each child node has several predecessors so that the number of visits of a given node in the search tree is fewer than the sum of numbers of visits of its child nodes.\r\nFigure 3.5 Some nodes of depths 5, 6, and 9 in the search tree. The nodes of depth 9 are the results of the simulation step, starting from the nodes of depth 6. The scores or objective functions of the best 2-truncated vine found by the MCTS algorithm, brute-force algorithm, and sequential maximum spanning tree (MST) algorithm are 2.362, 2.362, and 2.333, respectively.\r\nFigure 3.6 CFI vs truncation level t for simulated 2-truncated D-vine datasets with d = 10 and d = 15. A larger CFI is better.\r\nFigure 3.7 CFI vs truncation level t for the Abalone dataset. A larger CFI is better.\r\nFigure 3.8 GBM dataset: CFI vs truncation level t. A larger CFI is better.\r\nFigure 3.9 Optimal truncation level t^*_{\u03b1=0.01} vs dimension d. 
A smaller t^*_{\u03b1=0.01} is better.\r\nFigure 4.1 Scatter plots of 1000 random samples drawn from C\u0306(u_1,u_2;\u03b8,\u03b2) in Equation 4.3 when C(\u00b7;\u03b8) is comonotonic.\r\nFigure 4.2 Comparison of permutation asymmetry measure G_{P,k=0.2} in Section 4.2.4 and central dependence measure Spearman\u2019s rho for skew-BB1 and skew-t copulas. For skew-BB1 copulas, the parameter \u03b2 is in the set of 20 equally spaced points in [\u22121,1]. Each red curve in the figure corresponds to a distinct \u03b2 value.\r\nFigure 4.3 Conditional measures of C_{12;3}(\u00b7;x), the copula of Y_1, Y_2 given F_3(Y_3) = x, for a gamma factor model with parameters (\u03b8_0,\u03b8_1,\u03b8_2,\u03b8_3) = (3,1,1.5,2). The sample size is n = 1000. The red dash-dot lines are the exact conditional measures computed via numerical integration. The dark solid lines and dashed lines are the kernel-smoothed conditional Spearman\u2019s rho and the corresponding 90%-level simultaneous bootstrap confidence bands, using Epanechnikov kernel and window size h_n = 0.2.\r\nFigure 4.4 Conditional Spearman\u2019s rho of C_{12;34}(\u00b7;x,y), the copula of Y_1, Y_2 given F_3(Y_3) = x and F_4(Y_4) = y, for a gamma factor model with parameters (\u03b8_0,\u03b8_1,\u03b8_2,\u03b8_3,\u03b8_4) = (3,1,1.5,2,2.5). The sample size is n = 1000. The red surface is the exact conditional Spearman\u2019s rho computed via numerical integration, and the blue surfaces are the 90%-level simultaneous bootstrap confidence surfaces, using spherically symmetric Epanechnikov kernel and window size h_n = 0.2.\r\nFigure 4.5 Pairwise scatter plot of the normal scores of variables cobalt (Co), titanium (Ti) and scandium (Sc) in the hydro-geochemical dataset. 
\r\nFigure 4.6 Conditional Spearman\u2019s rho on the hydro-geochemical dataset. The dark solid lines and dashed lines are the kernel-smoothed conditional Spearman\u2019s rho and the corresponding 90%-level simultaneous bootstrap confidence bands, using Epanechnikov kernel and window size h_n = 0.2. The red dash-dot lines represent the estimated conditional Spearman\u2019s rho.\r\nFigure 4.7 Pairwise scatter plot of the normal scores in the GBM dataset.\r\nFigure 4.8 Conditional Spearman\u2019s rho of pairs [14|3] and [23|4] in the D-vine-1342 model, and [23|1] and [24|1] in the C-vine-1234 model on the GBM dataset. The dark solid lines and dashed lines are the kernel-smoothed conditional Spearman\u2019s rho and the corresponding 90%-level simultaneous bootstrap confidence bands, using Epanechnikov kernel and window size h_n = 0.2. The red dash-dot lines represent the model conditional Spearman\u2019s rho. For [14|3], the best-fitting model is a constant skewed t-copula. For [23|4], the best-fitting model is a non-constant skewed-BB1 copula (quartic \u03b4). For both [23|1] and [24|1], the best-fitting models are constant reflected skewed-BB1 copulas.\r\nFigure 5.1 First two trees T_1 and T_2 of a vine V. The node set and edge set of T_1 are N(T_1) = {1,2,3,4,5} and E(T_1) = {[12], [23], [24], [35]}. The node set and edge set of T_2 are N(T_2) = E(T_1) = {[12], [23], [24], [35]} and E(T_2) = {[13|2], [25|3], [34|2]}.\r\nFigure 5.2 Adding a response variable to the R-vine of the explanatory variables. In this example, variables 1 to 5 represent the explanatory variables and variable 6 represents the response variable. The newly added nodes are highlighted.\r\nFigure 5.3 The linear homoscedastic simulation case. In this fitted vine copula model, C_{13}, C_{12} and C_{23;1} are all Gaussian copulas, with parameters \u03c1_{13} = 0.77, \u03c1_{12} = 0.5 and \u03c1_{23;1} = 0.39. 
The green surfaces represent the conditional expectation, and the red and blue surfaces are the 2.5% and 97.5% quantile surfaces, respectively.\r\nFigure 5.4 The linear heteroscedastic simulation case. In this fitted vine copula model, C_{13} is a survival Gumbel copula with parameter \u03b4_{13} = 2.21, C_{12} is a Gaussian copula with parameter \u03c1_{12} = 0.5, and C_{23;1} is a BB8 copula with parameters \u03d1_{23;1} = 3.06, \u03b4_{23;1} = 0.71. The green surfaces represent the conditional expectation, and the red and blue surfaces are the 2.5% and 97.5% quantile surfaces, respectively.\r\nFigure 5.5 The non-linear and heteroscedastic simulation case. In this fitted vine copula model, C_{13} is a survival BB8 copula with parameters \u03d1_{13} = 6, \u03b4_{13} = 0.78, C_{12} is a Gaussian copula with parameter \u03c1_{12} = 0.5, and C_{23;1} is a BB8 copula with parameters \u03d1_{23;1} = 6, \u03b4_{23;1} = 0.65. The green surfaces represent the conditional expectation, and the red and blue surfaces are the 2.5% and 97.5% quantile surfaces, respectively.\r\nFigure 5.6 Pairwise scatter plots of the Abalone dataset.\r\nFigure 5.7 Visualization of the R-vine array in Table 5.4.\r\nFigure 5.8 Residual vs. fitted value plots. The red and blue points correspond to the lower bound and upper bound of the prediction intervals.\r\nFigure 5.9 Comparison of the performance on the classification problem.\r\nFigure 6.1 Conditional quantile functions for bivariate copulas with Kendall\u2019s \u03c4 = 0.5, combined with N(0,1) margins. Quantile levels are 20%, 40%, 60% and 80%.\r\nFigure 6.2 Conditional quantile surface F^{\u22121}_{Y|X_1,X_2}(\u03b1|x_1,x_2) in cases 1 and 3, for \u03b1 = 0.25 and 0.75. 
\r\nFigure 6.3 Conditional quantile F^{\u22121}_{Y|X_1,X_2}(\u03b1|x_1,x_2) versus x_1 in case 3 for \u03b1 = 0.25 and 0.75, as x_1 \u2192 +\u221e. It shows that the conditional quantile converges to +\u221e, a finite number, or \u2212\u221e.\r\nFigure B.1 Conditional measures of C_{12;3}(\u00b7;x), the copula of Y_1, Y_2 given F_3(Y_3) = x, for a trivariate Frank copula model with parameter that corresponds to Kendall\u2019s \u03c4 = 0.6. The sample size is n = 1000. The red dash-dot lines are the exact conditional measures computed via numerical integration. The dark solid lines and dashed lines are the kernel-smoothed conditional Spearman\u2019s rho and the corresponding 90%-level simultaneous bootstrap confidence bands, using Epanechnikov kernel and window size h_n = 0.2.\r\n\r\nList of Symbols and Notations\r\nC: Copula\r\nC^+ \/ C^\u2212 \/ C^\u22a5: Comonotonicity \/ Countermonotonicity \/ Independence copula\r\nC\u0306: Permutation asymmetric copula\r\nC\u0302: Reflected or survival copula of C\r\nC_{jk;S}: Copula for the conditional distributions F_{j|S} and F_{k|S}\r\nC_{j|k;S}: Conditional distribution of C_{jk;S}\r\nc \/ c\u0303: Copula density of C\r\nc_{jk;S}: Copula density of C_{jk;S}\r\nd: Dimension or number of variables\r\nE: Expectation of a random variable\r\nF: Cumulative distribution function\r\nF_{S_j|S_k}: Conditional distribution with subsets of indices S_j and S_k\r\nf: Probability density function\r\nI{S}: Indicator function for event S\r\nK: Kernel density function\r\no: Asymptotic order, e.g. h_1(x) = o(h_2(x)) as x \u2192 \u221e if and only if \u2200\u03b5 > 0, \u2203x_0 \u2208 \u211d such that |h_1(x)| \u2264 \u03b5|h_2(x)| for all x \u2265 x_0\r\nO: Asymptotic order, e.g. h_1(x) = O(h_2(x)) as x \u2192 \u221e if and only if \u2203M > 0, x_0 \u2208 \u211d such that |h_1(x)| \u2264 M|h_2(x)| for all x \u2265 x_0\r\nP: Probability of an event\r\nR: Correlation matrix\r\n\u211d^d: Real coordinate space of d dimensions. 
Superscript omitted if d = 1
Var: Variance of a random variable
ζ_α: Tail-weighted dependence measure
κ_L / κ_U: Lower / Upper tail order
ρ_S: Spearman’s ρ
τ: Kendall’s τ
Φ: Standard Gaussian cumulative distribution function
φ: Standard Gaussian probability density function
Ω: Inverse correlation matrix or precision matrix
∅: Empty set
∼: Asymptotic equality, e.g., h_1(x) ∼ h_2(x) as x → ∞ if and only if lim_{x→∞} h_1(x)/h_2(x) = 1
△: Symmetric difference of two sets
#: Cardinality of a set

Glossary
AIC: Akaike information criterion
AUC: area under the curve
BIC: Bayesian information criterion
CDF: cumulative distribution function
CFI: comparative fit index
IBS: integrated Brier score
IFM: inference functions for margins
IS: interval score
LS: logarithmic score
MCTS: Monte Carlo tree search
MLE: maximum likelihood estimation
MST: maximum spanning tree
MTCJ: Mardia–Takahasi–Clayton–Cook–Johnson
PDF: probability density function
PMF: probability mass function
QS: quadratic score
RF: random forest
RMSE: root-mean-square error
ROC: receiver operating characteristic
SVM: support vector machine
UCT: upper confidence bounds for trees

Acknowledgments
First and foremost, I would like to thank my supervisor, Prof. Harry Joe, who gave me the perfect amount of guidance and freedom so that I could explore various research ideas without getting stuck in dead ends or losing direction. Throughout my Ph.D. studies, he was always available for help and cheered me up when I felt down. I feel valued and respected working with him. Furthermore, he has been a role model for me at the early stage of my career; I believe his excellent work habits will continue to influence me long after this.
I am grateful to the members of the supervisory committee, Prof. Natalia Nolde and Prof. Matías Salibián-Barrera, for the detailed comments leading to an improved presentation of the dissertation. I would also like to thank Prof.
Louis-Paul Rivest, who provided extensive feedback as the external examiner, Prof. Ruben H. Zamar and Prof. Kevin Song for serving as the university examiners, and the anonymous referees for their comments. The research has been supported by research grants from the Natural Sciences and Engineering Research Council (NSERC) and the Canadian Statistical Sciences Institute (CANSSI).
I want to thank all my collaborators in addition to my supervisor. It has been a great pleasure sharing the journey of scientific exploration with them. In particular, I am grateful to the senior researchers with whom I worked closely, in chronological order: Prof. Roger M. Cooke, Dr. Lili Meng, Prof. Eldad Haber, Dr. Bo Zhao, and Dr. Minmin Chen. As Confucius once said, “When you see a worthy, think of becoming equal to him/her.” Learning from them has made me a better researcher.
Finally, I am indebted to my family and friends for their constant support and for the happy moments we shared.

Dedication
This dissertation is dedicated to my beloved parents.
謹以此文獻給我摯愛的父親和母親。

Chapter 1
Introduction
Copulas are widely used in high-dimensional multivariate applications where the assumption of Gaussian distributed variables does not hold. Truncated vine copulas are a flexible family of copulas built from a sequence of bivariate copulas to represent bivariate dependence and bivariate conditional dependence.
We address the following new research directions regarding vine copula models.
1. Finding parsimonious truncated vine structures so that there is a representation where we rely on conditional dependence only up to t−1 conditioning variables and conditional independence for t to d−2 variables. A smaller t indicates more conditional independence or parsimony.
2. Developing diagnostic tools of tail dependence, tail asymmetry and non-constant conditional dependence to help in the selection of bivariate copulas.
3.
Applying vine copula models for improved predictive conditional distribution of the response variable given explanatory variables, compared with classical multiple regression, binary regression, or ordinal regression; vine copula models handle nonlinear conditional expectation and conditional heteroscedasticity in a simpler way than methods such as polynomial regression, weighted regression, and ordinal regression.

1.1 Background
In multivariate statistics, modeling the dependence structure of multivariate observations is essential. The multivariate Gaussian distribution is one of the most commonly used models for this task. However, multivariate data are seldom sufficiently summarized by the multivariate Gaussian distribution, because univariate margins could be skewed or heavy-tailed, and the joint distribution could have stronger dependence than a Gaussian distribution in the tails or have tail asymmetries.
Copulas are a flexible tool in multivariate statistics that can be used to model distributions beyond the Gaussian. Sklar’s theorem (Sklar, 1959) shows that, for a d-dimensional random vector Y = (Y_1, Y_2, ..., Y_d)′ following a joint cumulative distribution function (CDF) F, with the j-th univariate margin F_j, there exists a distribution function C : [0,1]^d → [0,1] such that
F(y_1, ..., y_d) = C(F_1(y_1), ..., F_d(y_d)).
If F is a continuous d-variate distribution function, then the copula C is unique. Otherwise C is unique only on the set Range(F_1) × ··· × Range(F_d). Sklar’s theorem provides a decomposition of a multivariate distribution into two parts: the marginal distributions and the associated copula.
Several parametric bivariate copula families have been proposed and their properties, including tail dependence and asymmetric behavior, have been studied. However, extending copulas from bivariate to multivariate distributions is non-trivial. The challenge is that we need parsimonious and flexible models.
Here, parsimonious models are models whose number of parameters does not grow quadratically with respect to the dimension d, that is, models with o(d^2) parameters. A d-variate Gaussian distribution can be characterized by O(d^2) parameters, including d mean parameters and d(d+1)/2 covariance parameters. One class of parsimonious models is the exchangeable models, including isotropic Gaussian and t-distributions, and the class of Archimedean copulas. Those models have only one dependence parameter and, because of the strong exchangeability assumption, are not flexible; hence they are not useful except for small values of d.
The vine copula or pair-copula construction is a flexible tool in high-dimensional dependence modeling. It combines vine graphs and bivariate copulas. A vine is a graphical object represented by a sequence of connected trees. In a vine copula model, vine graphs are adopted to specify the dependence structure, and bivariate copulas are used as the basic building blocks on the edges of vines. Truncated vines are useful for representing the dependence of multivariate observations in a parsimonious way. A vine copula with a truncation level of t has O(td) parameters, a number that grows linearly in d. Vine copulas have proven to be a flexible tool in high-dimensional (non-Gaussian) dependence modeling, and have been applied to various fields including finance (Dissmann et al., 2013), social science (Cooke et al., 2019), and spatial statistics (Krupskii et al., 2018).
Fitting a vine copula model can be done in two steps: (1) To find a vine structure that describes the dependence among variables, which is referred to as structure learning. Structure learning is a difficult combinatorial optimization problem since the number of possible vine structures increases exponentially with respect to the dimension. Commonly used algorithms for structure learning are greedy algorithms, which, in general, do not produce a globally optimal solution.
(2) To assign bivariate copulas to the edges of the vine structure. After a vine structure is fixed, bivariate copulas are fitted to each edge in the vine. This is often done by iterating through a list of candidate bivariate copula families and picking the one with the highest log-likelihood, or lowest Akaike information criterion (AIC) or Bayesian information criterion (BIC). However, this approach might introduce bias or inefficiency when the candidate families do not match the dependence or asymmetry properties exhibited in the data.

1.2 Dependence structure learning
Learning the structure of a truncated vine is computationally intractable in general. The large number of possible vine structures results in a huge search space for a high-dimensional dataset if one would like to find the optimal one. Specifically, according to Cayley’s formula, one can construct d^(d−2) different trees with d labeled nodes. With this result, Kurowicka and Joe (2011) further show that there are in total 2^((d−3)(d−2)/2) · (d!/2) different vine structures considering all levels of trees for a dataset with d variables. This makes vine structure searching and learning a challenging problem.
Monte Carlo tree search (MCTS) is a search framework for finding a sequence of near-optimal decisions by taking random samples in the decision space (Browne et al., 2012). Many games, such as chess, can be formulated as sequential decision problems where players take actions sequentially. The key idea of MCTS is first to construct a search tree which is explored by fast Monte Carlo simulations and then to grow the tree selectively (Coulom, 2006). Multi-armed bandit algorithms such as the upper confidence bounds for trees (UCT) can be employed to balance between exploration and exploitation (Kocsis and Szepesvári, 2006).
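The balance between exploration and exploitation can be illustrated with the standard UCB1-style score used in generic UCT. The following is a minimal sketch and not the vine UCT of this thesis: the function names, the tuple layout of a child node, and the exploration constant c = sqrt(2) are all assumptions made for illustration.

```python
import math

def uct_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    # Unvisited children get infinite priority so that every action is tried once.
    if visits == 0:
        return math.inf
    exploitation = total_reward / visits  # average reward observed so far
    exploration = c * math.sqrt(math.log(parent_visits) / visits)  # bonus for rarely tried actions
    return exploitation + exploration

def select_child(children):
    # children: list of (name, total_reward, visits) tuples under one parent node (hypothetical layout).
    parent_visits = sum(visits for _, _, visits in children)
    best = max(children, key=lambda ch: uct_score(ch[1], ch[2], parent_visits))
    return best[0]

# Child b has the higher average reward and fewer visits, so it is selected.
print(select_child([('a', 6.0, 10), ('b', 2.5, 3)]))
```

A larger c favors exploration of rarely visited children, while c = 0 reduces the rule to greedy selection on the average reward.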
As one of the most important methods in artificial intelligence, MCTS has been widely applied in various game and planning problems.
Because the construction of a truncated vine is inherently sequential, we formulate the vine structure learning problem as a sequential decision-making process in this work. A search tree thus arises, where the root node is “empty” and the terminal leaf nodes are valid vine structures. Although the height and branching factor of the search tree might be large, MCTS can be adopted to search through it efficiently. Specifically, we adapt the existing UCT algorithm for vine structure learning and incorporate tree policy enhancements including first play urgency, progressive bias, and efficient transposition handling. The adapted algorithm is called the vine UCT; under its guidance, the tree policy strikes a balance between exploitation and exploration. Here, exploitation means making the best decision given current information, and exploration means gathering more information about the search space.
After the MCTS method finds candidates for truncated vine structures, bivariate copula families are chosen to match tail dependence and tail asymmetries in the data. This approach improves on greedy algorithms in terms of model fitting but takes more computational time. Comparisons are made with existing methods on datasets from various disciplines. All the experiments suggest that the proposed method outperforms existing methods.

1.3 Vine copula diagnostics
To facilitate the choice of bivariate parametric copula families on the edges of a vine, we propose diagnostic tools for bivariate asymmetries and for conditional dependence as a function of the conditioning value(s).
Various dependence measures and asymmetry measures can effectively guide the choice of candidate parametric copula families.
If diagnostics for an edge of the vine suggest that tail dependence or asymmetry exists, then only appropriate parametric copula families with properties matching the tail asymmetry or strength of dependence in the tail should be considered.
In order for modeling with vine copulas to be tractable, the constant conditional dependence assumption is usually made for the bivariate copulas as an approximation, since this can still lead to vine copulas with flexible tail properties. Under this assumption, the bivariate copulas of conditional distributions in the second tree and higher do not depend on the conditioning values. Adopting the constant conditional dependence assumption can greatly simplify the modeling process and avoid the curse of dimensionality. For conditional dependence in trees 2 and higher of the vine, our diagnostic tools yield functions of the conditioning variable(s) to help in the visualization of the form of conditional dependence and asymmetry. Corresponding confidence bands can be obtained for the conditional functions; if a constant function does not lie within the confidence bands, then the simplifying assumption might be inappropriate and one could consider copulas whose parameters depend on the value of the conditioning variable.

1.4 Vine copula regression
One possible application of vine copula models is to regression analysis. In an observational study, where the response variable Y and the explanatory variables X = (X_1, ..., X_p) are measured simultaneously, a natural approach is to fit a joint distribution to (X_1, ..., X_p, Y), assuming a random sample (x_i1, ..., x_ip, y_i) for i = 1, ..., n, and then obtain the conditional distribution of Y given X for making predictions. For example, conditional expectation and conditional quantiles can be obtained from the conditional distribution for out-of-sample point estimates and prediction intervals.
This becomes the usual multiple regression if the joint distribution of (X, Y) is multivariate Gaussian. Unlike multiple regression, this approach uses information on the distributions of the variables and does not specify a simple linear or polynomial equation for the conditional expectation. Polynomial equations can only be valid locally and generally have poor performance in the extremes of the predictor space.
To make the joint distribution approach work, there are two major questions to be addressed: (A) How to model the joint distribution of (X_1, ..., X_p, Y)? (B) How to efficiently compute the conditional distribution of Y given X from a multivariate distribution? For question (A), the vine copula is a flexible tool. The possibility of applying copulas for prediction and regression has been explored, but an algorithm is needed in general for (B) when some variables are continuous and others are discrete.
We propose a method, called vine copula regression, that uses R-vines and handles mixed continuous and discrete variables. That is, the predictor and response variables can be either continuous or discrete. As a result, we have a unified approach for regression and (ordinal) classification. This approach is interpretable, and various shapes of conditional quantiles of Y as a function of X can be obtained depending on how pair-copulas are chosen on the edges of the vine.
Another advantage of the proposed method is that it can handle ordinal responses better than ordinal regression, especially when there are many explanatory variables. The ordinal regression model for an ordinal response variable Y with levels {1, 2, ..., K} is formulated as follows.
P(Y ≤ k | X = x) = σ(θ_k − w^T x),
where σ is the inverse link function, for example, the logistic function; θ_1 < θ_2 < ... < θ_{K−1} and w = (w_1, ..., w_p) are the parameters. This model guarantees that P(Y ≤ k | X = x) is monotonically increasing in k.
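The ordinal regression formula above can be evaluated directly. The following is a minimal sketch assuming the logistic inverse link; the function names and the parameter values are hypothetical and chosen only for illustration.

```python
import math

def logistic(z):
    # Inverse link sigma(z) = 1 / (1 + exp(-z)).
    return 1.0 / (1.0 + math.exp(-z))

def cumulative_probs(thetas, w, x):
    # P(Y <= k | X = x) = sigma(theta_k - w^T x) for k = 1, ..., K-1, and 1 for k = K.
    eta = sum(wj * xj for wj, xj in zip(w, x))
    return [logistic(theta - eta) for theta in thetas] + [1.0]

def class_probs(thetas, w, x):
    # P(Y = k | X = x) obtained as differences of consecutive cumulative probabilities.
    cum = cumulative_probs(thetas, w, x)
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

thetas = [-1.0, 0.0, 1.5]   # increasing cut points, K = 4 levels (illustrative values)
w = [0.8, -0.5]             # one coefficient per explanatory variable, shared by all levels
x = [1.0, 2.0]
print(cumulative_probs(thetas, w, x))
print(class_probs(thetas, w, x))
```

Because the cut points are increasing and the same w is used for every level k, the cumulative probabilities are increasing in k and the class probabilities are nonnegative and sum to one.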
However, the downside is that the model is highly restrictive; it assumes the effect of an explanatory variable x_j on the log odds is a constant w_j, regardless of the level k. When (x_1, ..., x_p, y) are observed together in a sample, a multivariate distribution approach is more flexible and can easily overcome the problem, since it models the joint distribution of (X, Y).
We also provide a theoretical analysis of the asymptotic conditional CDF and quantile function for vine copula regression. This analysis sheds light on the advantage of vine copula regression methods: flexible asymptotic tail behavior. To easily compare with the Gaussian copula or linear regression equations when all variables have Gaussian distributions, we assume Y and the components of X have been transformed to standard normal N(0,1) variables Y*, X*. Leveraging the flexibility of bivariate copulas on the edges of the vine, the conditional quantile function of Y* could be asymptotically linear, sublinear, or constant with respect to the transformed explanatory variables X*, as components of X* go to ±∞.

1.5 Research contributions and organization of thesis
The thesis is organized as follows. Chapter 2 gives an overview of results in the existing literature. The main research contributions in Chapters 3 to 6 are summarized as follows; these chapters can be read in any order.
• Chapter 3: A novel approach to learning truncated vine structures using Monte Carlo tree search (MCTS). The proposed method can efficiently explore a search space with guided random sampling and has significantly better performance than existing methods under various experimental setups.
• Chapter 4: A general framework for estimating the conditional dependence or asymmetry measures as a function of the value(s) of the conditioning variable(s). An algorithm to compute the corresponding confidence bands is also presented.
The estimation of the conditional measures can be adapted to other copula-based measures and enrich the diagnostic tools in the future.
• Chapter 5: A novel method called vine copula regression that uses R-vines and handles mixed continuous and discrete variables. The method is a unified approach for regression and (ordinal) classification and is interpretable. Various shapes of nonlinear conditional mean, quantiles and heteroscedasticity of Y as a function of x can be obtained depending on how pair-copulas are chosen on the edges of the vine.
• Chapter 6: A theoretical analysis of the asymptotic conditional CDF and quantile function for vine copula regression. This analysis sheds light on the advantage of vine copula regression methods: flexible asymptotic tail behavior.
Finally, Chapter 7 concludes the thesis and discusses further research.

Chapter 2
Preliminaries
A d-dimensional copula C is a multivariate distribution on the unit hypercube [0,1]^d, with all univariate margins being U(0,1). Sklar’s theorem (Sklar, 1959) provides a decomposition of a d-dimensional distribution into two parts: the marginal distributions and the associated copula. It states that for a d-dimensional random vector Y = (Y_1, Y_2, ..., Y_d)′ following a joint CDF F, with the j-th univariate margin F_j, the copula associated with F is a CDF C : [0,1]^d → [0,1] with U(0,1) margins that satisfies
F(y) = C(F_1(y_1), ..., F_d(y_d)), y ∈ ℝ^d.
If F is a continuous d-variate distribution function, then the copula C is unique. Otherwise C is unique only on the set Range(F_1) × ··· × Range(F_d).
In this chapter, we review existing results that serve as background for the thesis. Section 2.1 gives an overview of bivariate copulas and relevant properties. Section 2.2 briefly summarizes the Archimedean copula, which has an exchangeable dependence structure. A more flexible multivariate copula construction is the vine copula.
The definition and some related algorithms are introduced in Section 2.3. Section 2.4 describes two-stage parameter estimation for copula models. Finally, Section 2.5 reviews a diagnostic method to check if two different parametric models have similar fits.

2.1 Bivariate copulas
In this section, we give an overview of parametric bivariate copula families that are used in the thesis. Consider a bivariate random vector (Y_1, Y_2) and let F_12(y_1, y_2) be the CDF and f_12(y_1, y_2) be the probability density function (PDF). By Sklar’s theorem (Sklar, 1959), there exists a copula C(u_1, u_2) such that
F_12(y_1, y_2) = C(F_1(y_1), F_2(y_2)),
and C(u_1, u_2) is the CDF of a bivariate random vector (U_1, U_2), where U_1 and U_2 are U(0,1) random variables.
Commonly used parametric bivariate copula families include the following; their properties are given in Chapter 4 of Joe (2014).
• Independence copula: C⊥(u_1, u_2) = u_1 u_2.
• Comonotonicity copula: C+(u_1, u_2) = min(u_1, u_2).
• Countermonotonicity copula: C−(u_1, u_2) = max(0, u_1 + u_2 − 1).
• Gaussian copula:
C(u_1, u_2; ρ) = Φ_2(Φ^{−1}(u_1), Φ^{−1}(u_2); ρ), ρ ∈ (−1, 1),
where Φ is the univariate standard normal CDF, and Φ_2 is the CDF of a bivariate normal random vector with correlation ρ, zero means and unit variances.
• Frank copula:
C(u_1, u_2; δ) = −δ^{−1} log( [1 − e^{−δ} − (1 − e^{−δ u_1})(1 − e^{−δ u_2})] / (1 − e^{−δ}) ), δ ∈ ℝ.
• Gumbel copula:
C(u_1, u_2; δ) = exp{ −([−log u_1]^δ + [−log u_2]^δ)^{1/δ} }, δ ∈ [1, ∞).
• Mardia–Takahasi–Clayton–Cook–Johnson (MTCJ) copula:
C(u_1, u_2; δ) = (u_1^{−δ} + u_2^{−δ} − 1)^{−1/δ}, δ ∈ [0, ∞).
• Joe copula:
C(u_1, u_2; δ) = 1 − ([1 − u_1]^δ + [1 − u_2]^δ − [1 − u_1]^δ [1 − u_2]^δ)^{1/δ}, δ ∈ [1, ∞).
• Student t copula:
C(u_1, u_2; ρ, ν) = T_{2,ν}(T_{1,ν}^{−1}(u_1), T_{1,ν}^{−1}(u_2); ρ), ρ ∈ (−1, 1), ν ∈ (0, ∞),
where T_{1,ν} is the CDF of a univariate t-distribution with ν degrees of freedom, and T_{2,ν} is the CDF of a bivariate t-distribution with ν degrees of freedom and correlation parameter ρ. Note that T_{2,ν} need not have finite second moments.
• BB1 copula:
C(u_1, u_2; θ, δ) = {1 + [(u_1^{−θ} − 1)^δ + (u_2^{−θ} − 1)^δ]^{1/δ}}^{−1/θ}, θ ∈ (0, ∞), δ ∈ [1, ∞).
• BB6 copula:
C(u_1, u_2; θ, δ) = 1 − (1 − exp{ −[(−log(1 − ū_1^θ))^δ + (−log(1 − ū_2^θ))^δ]^{1/δ} })^{1/θ}, θ ∈ [1, ∞), δ ∈ [1, ∞),
where ū_1 = 1 − u_1 and ū_2 = 1 − u_2.
• BB7 copula:
C(u_1, u_2; θ, δ) = 1 − (1 − [(1 − ū_1^θ)^{−δ} + (1 − ū_2^θ)^{−δ} − 1]^{−1/δ})^{1/θ}, θ ∈ [1, ∞), δ ∈ (0, ∞),
where ū_1 = 1 − u_1 and ū_2 = 1 − u_2.
• BB8 copula:
C(u_1, u_2; θ, δ) = δ^{−1}(1 − {1 − η^{−1}[1 − (1 − δ u_1)^θ][1 − (1 − δ u_2)^θ]}^{1/θ}), θ ∈ [1, ∞), δ ∈ (0, 1],
where η = 1 − (1 − δ)^θ.
Note that the BB copulas are based on the theory developed by Joe and Hu (1996), and the naming convention comes from Joe (1997).

2.1.1 Density and conditional distributions
If C(u_1, u_2) is an absolutely continuous copula CDF, then its density function is
c(u_1, u_2) = ∂²C(u_1, u_2) / (∂u_1 ∂u_2).
The conditional CDF is defined as follows.
C_{1|2}(u_1|u_2) := P(U_1 ≤ u_1 | U_2 = u_2) = lim_{ε→0+} P(U_1 ≤ u_1, u_2 0 (Lee et al., 2018),
ζ_α(C) = 2 − α(γ_α^{−1}(C) − 1), where γ_α(C) = ∫_0^1 C(u^{1/α}, u^{1/α}) du.
Both Spearman’s
rho and Kendall’s tau summarize the dependence in the center and cannot quantify the dependence in the joint upper and lower tails. The tail-weighted dependence measure puts more weight on data in the joint lower (upper) tail. When α = 1, ζ_α is a measure of central dependence with properties similar to Kendall’s tau and Spearman’s rho. For large α values, ζ_α is a tail-weighted dependence measure; the limit as α → ∞ is the upper tail dependence coefficient.

2.1.3 Asymmetry measures
There are two types of symmetry that are most relevant for bivariate copulas: reflection symmetry and permutation symmetry. A bivariate copula C(u_1, u_2) is reflection symmetric if (U_1, U_2) and (1 − U_1, 1 − U_2) are identically distributed, or equivalently, C(u_1, u_2) = Ĉ(u_1, u_2), where Ĉ(u_1, u_2) = u_1 + u_2 − 1 + C(1 − u_1, 1 − u_2) is known as the reflected or survival copula of C. A bivariate copula C(u_1, u_2) is permutation symmetric if (U_1, U_2) and (U_2, U_1) are identically distributed, or equivalently, C(u_1, u_2) = C(u_2, u_1). Figure 2.1 shows some bivariate copula families with and without symmetry.
Krupskii (2017) proposes the permutation asymmetry measure G_{P,k} and the reflection asymmetry measure G_{R,k} for data with positive quadrant dependence.
• Permutation asymmetry measure, with k > 0,
G_{P,k}(C) = ∫∫_{[0,1]²} |u_1 − u_2|^{k+2} · sign(u_1 − u_2) dC(u_1, u_2).
• Reflection asymmetry measure, with k > 0,
G_{R,k}(C) = ∫∫_{[0,1]²} |1 − u_1 − u_2|^{k+2} · sign(1 − u_1 − u_2) dC(u_1, u_2).
The permutation asymmetry measure G_{P,k}(C) is defined as the expectation of the variable |U_1 − U_2|^{k+2} adjusted for the sign of U_1 − U_2 for k > 0. It indicates the direction of permutation asymmetry: if the measure takes a positive (negative) value, then the conditional mean of data truncated in the right lower (left upper) corner is greater than that of data truncated in the left upper (right lower) corner.
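For a sample, G_{P,k} can be estimated by replacing the integral with an average over pseudo-observations. The following is a minimal sketch under assumptions that are not taken from the text: ranks are rescaled by n + 1, ties are ignored, and the default k = 0.2 is an arbitrary illustrative value; the function names are hypothetical.

```python
def pseudo_obs(values):
    # Convert raw data to pseudo-observations rank / (n + 1); ties are not handled.
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u

def perm_asymmetry(u1, u2, k=0.2):
    # Sample average of |u1 - u2|^(k+2) * sign(u1 - u2).
    total = 0.0
    for a, b in zip(u1, u2):
        d = a - b
        sign = (d > 0) - (d < 0)
        total += abs(d) ** (k + 2) * sign
    return total / len(u1)

y1 = [0.3, 1.2, 2.5, 0.9, 1.7]
y2 = [0.1, 1.9, 2.2, 0.4, 1.1]
g = perm_asymmetry(pseudo_obs(y1), pseudo_obs(y2))
print(g)
```

Swapping the two arguments flips the sign of the estimate, which mirrors the definition of permutation asymmetry.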
A larger tuning parameter k results in greater variability of an empirical estimate, while a small k makes the measure less sensitive to permutation asymmetric dependence. The permutation asymmetry measure G_{P,k} can be further normalized to the range of [−1, 1] by finding a copula C that maximizes |G_{P,k}(C)| (Rosco and Joe, 2013). Similarly, the reflection asymmetry measure G_{R,k} is defined as the expectation of the variable |1 − U_1 − U_2|^{k+2} adjusted for the sign of 1 − U_1 − U_2.

Figure 2.1: Contour plots of the joint PDF c(Φ(z_1), Φ(z_2)) φ(z_1) φ(z_2). The margins are N(0,1) and copulas have Spearman’s ρ_S = 0.5. Gaussian copula and Frank copula are reflection and permutation symmetric. BB1 copula is permutation symmetric but not reflection symmetric. Skew-normal is neither reflection nor permutation symmetric.

2.1.4 Tail order
Tail order, denoted by κ and studied in Hua and Joe (2011), can be used as a measure of the strength of dependence in the joint tails of a copula. For bivariate copulas with positive dependence, the tail order has a value between 1 and 2, with larger values indicating less dependence in the joint tail. The tail order can be larger than the dimension for negative dependence.
If there exists κ_L > 0 and some ℓ(u) that is slowly varying at 0+ (that is, ℓ(tu)/ℓ(u) ∼ 1 as u → 0+ for all t > 0) such that C(u, u) ∼ u^{κ_L} ℓ(u), as u → 0+, then κ_L is called the lower tail order of C and ϒ_L = lim_{u→0+} ℓ(u) is the lower tail order parameter, provided the limit exists. By reflection, the upper tail order is defined as κ_U such that C̄(1 − u, 1 − u) ∼ u^{κ_U} ℓ*(u), as u → 0+, for some slowly varying function ℓ*(u), where C̄ is the survival function of the copula C.
The upper tail order parameter is then ϒ_U = lim_{u→0+} ℓ*(u).
With κ = κ_L or κ_U and ϒ = ϒ_L or ϒ_U, we further classify the tail property of a copula into the following:
• Strong tail dependence: κ = 1 with ϒ > 0. For example, a bivariate t-copula has κ_L = κ_U = 1.
• Intermediate tail dependence: 1 < κ < 2, or κ = 1 and ϒ = 0. For example, a bivariate Gaussian copula has κ_L = κ_U = 2/(1 + ρ), where ρ is the parameter of the Gaussian copula. When 0 < ρ < 1, 1 < κ_L = κ_U < 2.
• Tail quadrant independence: κ = 2 and the slowly varying function is (asymptotically) a constant. For example, a bivariate Frank copula has κ_L = κ_U = 2.

2.2 Archimedean copulas
One way of extending bivariate copulas to multivariate is via Archimedean copulas, which have an exchangeable dependence structure. A d-variate Archimedean copula has the following copula CDF:
C_ψ(u) = ψ( ∑_{i=1}^{d} ψ^{−1}(u_i) ), u ∈ [0,1]^d. (2.2)
This is a valid copula for any d if ψ ∈ L_∞, where L_∞ is the class of Laplace transforms of non-negative random variables with no mass at 0 (i.e., ψ(∞) = 0). Note that Equation 2.2 is permutation symmetric:
C_ψ(u_{π(1)}, ..., u_{π(d)}) = C_ψ(u_1, ..., u_d)
for any permutation π of {1, ..., d}.
The conditional distribution of the copula in Equation 2.2 given the last variable is
C_{1,...,d−1|d}(u_1, ..., u_{d−1} | u_d) = ∂C_ψ(u_1, ..., u_d) / ∂u_d = ψ′( ∑_{i=1}^{d} ψ^{−1}(u_i) ) / ψ′( ψ^{−1}(u_d) ).

2.3 Vine copulas
In this section, we review the vine copula approach (Bedford and Cooke, 2001), which allows one to construct multivariate copulas hierarchically using bivariate copulas as building blocks.

2.3.1 Vine graphical models
A vine is a nested set of trees where the edges in the first tree are the nodes of the second tree, the edges of the second tree are the nodes of the third tree, etc. Vines are useful in specifying the dependence structure for general multivariate distributions on d variables.
The first tree in a vine represents d variables as nodes and the bivariate dependence of d−1 pairs of variables as edges. The second tree describes conditional dependence of d−2 pairs of variables conditioning on another variable; nodes are the edges in tree 1, and a pair of nodes could be connected if there is a common variable in the pair. The third tree describes conditional dependence of d−3 pairs of variables conditioning on two other variables; nodes are the edges in tree 2, and a pair of nodes could be connected if there are two common conditioning variables in the pair. This continues until tree d−1 has only one edge that describes the conditional dependence of two variables conditioning on the remaining d−2 variables.
For a concrete example, as shown in Figure 2.2, consider d = 5 variables labeled as 1, 2, 3, 4, 5. Suppose tree 1 has edges [1,2], [1,3], [2,4], [2,5] where [1,2] is an edge connecting variables 1 and 2, etc. Possible edges for tree 2 are [2,3|1], [1,4|2], [4,5|2] where [2,3|1] connects [1,2] and [1,3] (edges of tree 1 are nodes in tree 2, and these two nodes have the variable 1 in common).
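The step from tree 1 to tree 2 in this example can be sketched in a few lines. The sketch below is a simplified illustration and not an algorithm from the thesis: each node is represented by the set of variables it contains, two nodes are joined only when their variable sets differ in exactly two variables, and the function name join_nodes is hypothetical.

```python
def join_nodes(node_a, node_b):
    # node_a, node_b: frozensets holding all variables that appear in an edge of the previous tree.
    # Returns (conditioned pair, conditioning variables), or None if the nodes cannot be joined.
    differing = node_a ^ node_b          # variables that appear in exactly one of the two nodes
    if len(differing) != 2:
        return None
    shared = node_a & node_b             # common variables become the conditioning set
    return tuple(sorted(differing)), tuple(sorted(shared))

tree1_edges = [frozenset({1, 2}), frozenset({1, 3}), frozenset({2, 4}), frozenset({2, 5})]

# [1,2] and [1,3] share variable 1, giving the tree 2 edge [2,3|1].
print(join_nodes(tree1_edges[0], tree1_edges[1]))
# [1,3] and [2,4] share no variable, so they cannot be connected in tree 2.
print(join_nodes(tree1_edges[1], tree1_edges[2]))
```

The same check applies at higher levels if each node carries the full variable set of the corresponding edge, for instance {1, 2, 4} for the edge [1,4|2].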
Possible edges for tree 3 are [3,4|1,2], [1,5|2,4] where [1,5|2,4] connects [1,4|2] and [4,5|2] (edges of tree 2 are nodes in tree 3, and these two nodes have the variables 2, 4 in common).

Figure 2.2: An example of a vine for d = 5 up to tree 3. (a) Level 1 tree T_1. (b) Level 2 tree T_2. (c) Level 3 tree T_3.

Note that the possible edges in a tree depend on but are not uniquely determined by the edges of the previous trees. For example, [2,3|1], [1,4|2], [1,5|2] is another possible set of edges for tree 2 in Figure 2.2. Therefore, one needs to decide which configuration to adopt when building trees for a new level. The requirement that two connected nodes must have two distinct variables and the remaining variables in common is called the proximity condition.
A formal definition of the regular vine or R-vine is given as follows (Bedford and Cooke, 2001; Kurowicka and Cooke, 2006).
Definition 2.1. (Regular vine) V is a vine on d variables if
1. V = (T_1, ..., T_{d−1});
2. T_1 is a tree with nodes N(T_1) = {1, 2, ..., d}, and edges E(T_1). For ℓ > 1, T_ℓ is a tree with nodes N(T_ℓ) = E(T_{ℓ−1});
3. (proximity condition) For ℓ = 2, ..., d−1, for {n_1, n_2} ∈ E(T_ℓ), #(n_1 △ n_2) = 2, where △ denotes symmetric difference and # denotes cardinality.
There are two special classes of R-vines. A regular vine is called a canonical vine or C-vine if tree T_ℓ has a unique node of degree d−ℓ (the maximum degree) for ℓ = 1, ..., d−2. A regular vine is called a drawable vine or D-vine if all nodes in T_1 have degree not higher than two. In some scenarios, the special classes might be used directly.
For example, D-vines are more natural if there is a time or linear spatial order in variables; C-vines are more natural if there are leading variables that influence others. R-vines might be better in the absence of these criteria.\r\n\r\n2.3.2 Vine array representation\r\n\r\nAn R-vine can be represented by the edge sets at each level E(T_\u2113), or equivalently by a graph, such as Figure 2.2. But those representations are not convenient for algorithms; we need a more compact way to represent vine models. A vine array A = (a_{jk}) for a regular vine V = (T_1, . . . , T_{d\u22121}) on d elements is a d \u00d7 d upper triangular matrix. There is an ordering of the variable indexes along the diagonal. The (\u2113, j)-th element a_{\u2113j} is connected to the (j, j)-th element a_{jj} in tree \u2113. That is, the first \u2113 rows of A and the diagonal elements encode the \u2113-th tree T_\u2113, such that [a_{\u2113j}, a_{jj} | a_{1j}, . . . , a_{\u2113\u22121,j}] \u2208 E(T_\u2113) for \u2113 + 1 \u2264 j \u2264 d. For example, the vine array A_1 represents the R-vine in Figure 2.2. The edges of T_1 include [a_{12}, a_{22}] = [1,2], [a_{13}, a_{33}] = [2,4], [a_{14}, a_{44}] = [2,5], [a_{15}, a_{55}] = [1,3]. The edges of T_2 include [a_{23}, a_{33} | a_{13}] = [1,4|2], [a_{24}, a_{44} | a_{14}] = [4,5|2], [a_{25}, a_{55} | a_{15}] = [2,3|1].\r\n\r\nBoth arrays are upper triangular; each row below lists the entries from the diagonal rightward.\r\nA_1: row 1 = (1, 1, 2, 2, 1); row 2 = (2, 1, 4, 2); row 3 = (4, 1, 4); row 4 = (5, 5); row 5 = (3).\r\nA_2: row 1 = (2, 2, 1, 2, 2); row 2 = (1, 2, 1, 4); row 3 = (3, 3, 1); row 4 = (4, 3); row 5 = (5).\r\n\r\nNote that a valid vine array represents a unique R-vine. However, an R-vine may have multiple vine array representations. For example, A_1 and A_2 encode exactly the same R-vine. In applications, the variables are labeled arbitrarily. We can define a permutation of the variables so that the diagonal elements are (1, 2, . . . , d).\r\n\r\n2.3.3 From vines to multivariate distributions\r\n\r\nTo get a multivariate distribution from a vine, bivariate distributions are assigned to the edges of tree 1 and bivariate conditional distributions are assigned to the edges of trees 2, . . . 
, d \u2212 1. In the above example, edges can be assigned bivariate distributions F_{12}, F_{13}, F_{24}, F_{25} that can be algebraically independent provided the univariate marginal distributions are F_1, F_2, F_3, F_4, F_5, i.e., the parameters of these bivariate distributions are free to vary on the parameter domains and the positive definiteness constraint of the correlation matrix is automatically satisfied. For tree 2, edges can be assigned the conditional distributions F_{23|1}, F_{14|2}, F_{45|2}; for example, F_{23|1} summarizes the conditional dependence of F_{2|1}, F_{3|1}, where F_{2|1}, F_{3|1} can be obtained from F_{12}, F_{13} in tree 1, respectively. The combination of F_{23|1}, F_{12}, F_{13} yields the trivariate distribution F_{123}. For tree 3, edges can be assigned the conditional distributions F_{34|12}, F_{15|24}; for example, F_{34|12} summarizes the conditional dependence of F_{3|12}, F_{4|12}, which can be obtained from F_{123}, F_{124}. As mentioned above, F_{123}, F_{124} can be achieved by combining conditional distributions in trees 1 and 2.\r\n\r\nThere are bivariate distributions on the edges in trees 1 to d \u2212 1 of the vine. If the bivariate distributions on the edges are all bivariate Gaussian, each edge can be characterized by a correlation parameter \u03c1, which can be interpreted as a partial correlation for trees 2 to d \u2212 1. For the above example, one could consider that the edges have been assigned the quantities \u03c1_{12}, \u03c1_{13}, \u03c1_{24}, \u03c1_{25}, \u03c1_{23;1}, \u03c1_{14;2}, \u03c1_{45;2}, \u03c1_{34;12}, \u03c1_{15;24}; here the semicolon in the subscript is common for the partial correlation. For example, \u03c1_{15;24} summarizes the conditional correlation of variables 1 and 5 given variables 2 and 4. Partial correlations can be calculated by inverting the principal submatrices of a correlation matrix. Specifically, consider a partial correlation \u03c1_{a,b;S} where S is a set of variables and {a,b} \u2229 S = \u2205. Let R be the correlation matrix of random variables indexed by {a,b} \u222a S. 
If we define \u2126 = (\u03c9_{ij}) = R^{\u22121}, we have \u03c1_{a,b;S} = \u2212\u03c9_{ab} \/ \u221a(\u03c9_{aa} \u03c9_{bb}).\r\n\r\nThe representation of a multivariate Gaussian distribution through a vine is an alternative parametrization of the correlation matrix that avoids the positive definiteness constraint of a correlation matrix. From Kurowicka and Cooke (2003) and Kurowicka and Cooke (2006), the correlations and partial correlations assigned to any vine are algebraically independent and the log-determinant of the correlation matrix is log det(R) = \u2211_{e \u2208 E(V)} log(1 \u2212 \u03c1_e^2) for any vine with {\u03c1_e} being the set of correlations and partial correlations on E(V), the edges of the vine V. Moreover, it is this parametrization of the multivariate Gaussian distribution that can extend to multivariate non-Gaussian distributions by using bivariate copulas on the edges of the vine to get what is called the vine copula or pair-copula construction.\r\n\r\nMultivariate data are seldom well summarized by the multivariate Gaussian distribution, but the multivariate Gaussian distribution may be adequate as a first order model if the variables are monotonically related to each other. One approach to developing a parsimonious copula for high-dimensional non-Gaussian data is to (a) find a parsimonious truncated partial correlation vine for the matrix of normal score correlations (where variables have each been converted to standard normal via probability integral transforms), and (b) replace edges of the vine with bivariate copulas that can have tail behavior different from Gaussian if this is seen in bivariate plots. See Brechmann and Joe (2015) for data examples that follow these steps.\r\n\r\n2.3.4 Truncated vine\r\n\r\nThere are d(d \u2212 1)\/2 = O(d^2) edges in a complete vine graph, and at least d(d \u2212 1)\/2 parameters for a vine copula with a parametric bivariate copula family on each edge. Great computational effort is required for parameter estimation in high-dimensional cases. 
Truncated vines are useful for representing the dependence of d variables in a parsimonious way. A truncated vine with 1 \u2264 t < d \u2212 1 trees, or a t-truncated vine, assumes that the most important dependencies are captured by the first t trees V_t = (T_1, . . . , T_t) in a vine and the remaining trees have independence copulas assigned to the edges. In other words, for \u2113 > t, T_\u2113 represents the conditional independence of two variables given the conditioning variables. In the Gaussian case, this is equivalent to assigning partial correlations of 0 to the edges of the remaining d \u2212 t \u2212 1 trees. By vine truncation, the number of parameters is reduced from O(d^2) to O(d), if t is constant as d increases.\r\n\r\nThe most parsimonious vine structure is a 1-truncated vine with one tree that connects d \u2212 1 pairs. This is a valid structure (called a Markov tree) if variables not directly connected are conditionally independent given the variables in the tree path that connect them. But seldom can a Markov tree summarize the dependence well in d variables. As an improvement, the truncated vine (t \u2265 2) adds some layers of conditional dependence on top of a Markov tree until conditional independence relations from high-order trees are approximately valid.\r\n\r\n2.3.5 Vine structure learning\r\n\r\nKurowicka and Cooke (2003) show that the log-determinant of the empirical correlation matrix R is log det(R) = \u2211_{e \u2208 E(V)} log(1 \u2212 \u03c1_e^2) for any vine V, with {\u03c1_e} being the set of correlations and partial correlations on the edges of the vine. Assuming all the bivariate copulas are Gaussian, log det(R) is also linearly related to the negative log-likelihood of the vine copula. The best t-truncated partial correlation vine to approximate the correlation matrix is such that \u2211_{e \u2208 E(V_t)} log(1 \u2212 \u03c1_e^2) is close to log det(R). 
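The log-determinant identity can be checked numerically. The following is a minimal sketch, assuming a hypothetical 4 x 4 correlation matrix (not taken from the thesis) and a D-vine with variable order 0-1-2-3 (0-based labels). The helper names inverse, log_det and partial_corr are illustrative; partial correlations are obtained by inverting the relevant principal submatrix, as described in Section 2.3.3.

```python
import math

def inverse(M):
    # Gauss-Jordan inverse without pivoting; adequate for small positive
    # definite matrices such as correlation matrices.
    n = len(M)
    A = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = A[c][c]
        A[c] = [x / p for x in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]

def log_det(M):
    # Log-determinant as the sum of log pivots from Gaussian elimination.
    A = [list(row) for row in M]
    total = 0.0
    for c in range(len(A)):
        total += math.log(A[c][c])
        for r in range(c + 1, len(A)):
            f = A[r][c] / A[c][c]
            A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return total

def partial_corr(R, a, b, S):
    # rho_{a,b;S} = -omega_ab / sqrt(omega_aa * omega_bb), where omega is the
    # inverse of the submatrix of R indexed by {a, b} union S.
    idx = [a, b] + list(S)
    omega = inverse([[R[i][j] for j in idx] for i in idx])
    return -omega[0][1] / math.sqrt(omega[0][0] * omega[1][1])

# Hypothetical correlation matrix (illustration only).
R = [
    [1.0, 0.5, 0.3, 0.2],
    [0.5, 1.0, 0.4, 0.3],
    [0.3, 0.4, 1.0, 0.5],
    [0.2, 0.3, 0.5, 1.0],
]
d = len(R)

# D-vine 0-1-2-3: tree l has edges [j, j+l | j+1, ..., j+l-1].
tree_sums = []
for l in range(1, d):
    s = 0.0
    for j in range(d - l):
        rho = partial_corr(R, j, j + l, range(j + 1, j + l))
        s += math.log(1.0 - rho ** 2)
    tree_sums.append(s)

print('log det(R)        :', log_det(R))
print('sum over all edges:', sum(tree_sums))
for t in range(1, d):
    print('first', t, 'tree(s):', sum(tree_sums[:t]))
```

The partial sums over the first t trees show how much of log det(R) a t-truncated version of this particular vine captures.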
This implies that one wants a truncated vine such that \u03c1_e^2 are large in the first t trees and small in the remaining trees.\r\n\r\nFormally, the goal of the vine structure learning problem is to find a t-truncated vine that maximizes the objective function, or log-likelihood function\r\n\r\nL_t(V) = \u2211_{i=1}^{t} \u2211_{e \u2208 E(T_i)} \u2212log(1 \u2212 \u03c1_e^2), (2.3)\r\n\r\nwhere t is a pre-defined truncation level, and \u03c1_e is the partial correlation corresponding to edge e in the vine. Since \u03c1_e^2 \u2208 (0,1), L_t(V) is monotonically increasing with respect to t. Furthermore, for any d-dimensional vine V,\r\n\r\nL_{d\u22121}(V) = \u2212log det(R).\r\n\r\nIn other words, all the untruncated vines achieve the same objective function.\r\n\r\nThere are a few existing methods attempting to solve the vine structure learning problem. The most direct way is to enumerate and compare all possible vine structures in a brute-force fashion. However, Kurowicka and Joe (2011) show that there are in total 2^{(d\u22123)(d\u22122)} (d!\/2) different full vine structures considering all levels for d variables. This makes brute-force search only feasible for d \u2264 8 due to the exponentially increasing number of possible vine structures.\r\n\r\nAs an alternative, Dissmann et al. (2013) propose a method based on the maximum spanning tree (MST) algorithms with different possible choices for edge weights that reflect strength of the dependence between pairs of variables. For the multivariate Gaussian case, a good choice of edge weight in the trees is weight \u2212log(1 \u2212 \u03c1_e^2) for edge e; this is used in Section 6.17 of Joe (2014). The trees, T_1 to T_t, of the vine are sequentially constructed by maximizing the sum of the edge weights at each tree level. Such an MST can be obtained using the algorithm by Prim (1957). Dissmann\u2019s algorithm is a greedy algorithm: the construction of T_{i+1} is based on the locally optimal choice given T_i. 
It does not in general produce a globally optimal solution.\r\n\r\nInspired by genetic algorithms, Brechmann and Joe (2015) propose methods to effectively explore the search space of truncated vines. At each tree level, their algorithm considers not only the MST, but also neighbors of the MST. In general, the results generated by this algorithm outperform the greedy algorithm.\r\n\r\n2.3.6 Performance metric\r\n\r\nIt is a common question whether an empirical correlation matrix R is well approximated by a model. A likelihood-ratio test is often used to assess the goodness-of-fit of a structural model. The comparative fit index (CFI) is a fit index that takes into account the likelihood ratio as well as the number of model parameters (Bentler, 1990; Brechmann and Joe, 2015).\r\n\r\nA fit measure is\r\n\r\nD_t = n[\u2212L_t(V) \u2212 log det(R)], (2.4)\r\n\r\nwhere L_t is the objective function defined in Equation 2.3. If the model is completely unstructured (the saturated model), then D_t = 0. On the other hand, if the model assumes that all variables are uncorrelated, then D_0 := \u2212n log det(R). Reasonable models should lie somewhere in between these two extreme cases. Therefore, D_t can be viewed as a discrepancy measure.\r\n\r\nFor a t-truncated vine, its degrees of freedom (or d(d \u2212 1)\/2 minus the number of model parameters) is\r\n\r\n\u03bd_t = d(d \u2212 1)\/2 \u2212 t(2d \u2212 t \u2212 1)\/2 = (d \u2212 t)(d \u2212 t \u2212 1)\/2. (2.5)\r\n\r\nIn particular, \u03bd_0 = d(d \u2212 1)\/2 is the case of complete independence.\r\n\r\nThe CFI of a t-truncated vine is defined as\r\n\r\nCFI_t := 1 \u2212 max(0, D_t \u2212 \u03bd_t) \/ max(0, D_0 \u2212 \u03bd_0, D_t \u2212 \u03bd_t), (2.6)\r\n\r\nwhich takes on values between 0 and 1. Higher CFI values correspond to better fit. CFI can be used to find an optimal truncation level given a predefined goodness-of-fit level. Formally, the optimal truncation level is given by\r\n\r\nt*_\u03b1 = min{t \u2208 {0, . . . , d \u2212 1} : CFI_t \u2265 1 \u2212 \u03b1}, (2.7)\r\n\r\nwhere \u03b1 \u2208 (0,1). 
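Equations 2.4 to 2.7 translate directly into a few lines of code. The sketch below is illustrative only: the function names cfi and optimal_truncation, the sample size n = 500, and the objective values in L_values are hypothetical, and returning 1 when the denominator of Equation 2.6 is zero is an assumption rather than something stated in the text.

```python
def cfi(L_t, log_det_R, n, d, t):
    # Discrepancy of the t-truncated vine (Equation 2.4) and of independence.
    D_t = n * (-L_t - log_det_R)
    D_0 = -n * log_det_R
    # Degrees of freedom (Equation 2.5).
    nu_t = (d - t) * (d - t - 1) / 2
    nu_0 = d * (d - 1) / 2
    num = max(0.0, D_t - nu_t)
    den = max(0.0, D_0 - nu_0, D_t - nu_t)
    # Assumption: treat 0/0 as a perfect fit.
    return 1.0 if den == 0.0 else 1.0 - num / den

def optimal_truncation(L_values, log_det_R, n, alpha):
    # L_values[t] is L_t(V) for t = 0, ..., d - 1, with L_0 = 0 (Equation 2.7).
    d = len(L_values)
    for t, L_t in enumerate(L_values):
        if cfi(L_t, log_det_R, n, d, t) >= 1.0 - alpha:
            return t
    return d - 1

# Hypothetical values for d = 4 variables.
log_det_R = -0.7817
L_values = [0.0, 0.5, 0.75, 0.7817]
t_star = optimal_truncation(L_values, log_det_R, n=500, alpha=0.05)
print('optimal truncation level:', t_star)
```

With these made-up numbers, the independence model has CFI 0, the untruncated vine has CFI 1, and the smallest level reaching the 0.95 threshold is selected.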
Commonly used \u03b1 values include 0.01 and 0.05.\r\n\r\n2.4 Two-stage estimation method for copula models\r\n\r\nIn this section, we discuss methods for parameter estimation for copula models. The inference functions for margins (IFM) method or two-stage estimation is introduced in Joe and Xu (1996) and Joe (1997).\r\n\r\nIn the first stage of IFM, univariate marginal distributions are fitted. The univariate marginal distributions could be estimated either parametrically or non-parametrically. Graphical diagnostics can suggest good choices of each parametric univariate margin, and the best parameters are selected based on the Akaike information criterion (AIC) or the Bayesian information criterion (BIC). Alternatively, empirical CDFs can be used for continuous univariate margins (Genest et al., 1995). Observations for each marginal component are converted to uniform scores for copula analysis using the probability integral transform.\r\n\r\nIn the second stage, parameters of the dependence structure are estimated. Based on bivariate plots of normal scores for continuous variables, a set of candidate copula families is chosen. With the estimated univariate marginal distributions held fixed, copula parameters are estimated for the candidate copula families, and the best model is selected based on AIC or BIC. An analysis of the asymptotic efficiency of IFM is established in Joe (2005).\r\n\r\nFor vine copulas with a parametric bivariate copula on each edge, we can estimate bivariate copula families separately, starting with copulas in tree 1. This is the approach of the VineCopula R package (Schepsmeier et al., 2018).\r\n\r\n2.5 Vuong\u2019s procedure\r\n\r\nVuong\u2019s procedure is a diagnostic method to check if two different parametric models have similar fits (Vuong, 1989). Note that the two models need not be nested. It is based on the vectors of log-likelihood contributions of each observation for two competing models. In the copula literature, for example, Brechmann et al. 
(2012), it has been used to compare two copula models.\r\n\r\nSuppose the observed sample consists of the vectors x_1, . . . , x_n, where n is the sample size. Given two models M_1 and M_2, nested or not, whose parametric densities are f^{(1)} and f^{(2)} and whose estimated parameters are \u03b8\u0302^{(1)} and \u03b8\u0302^{(2)}, respectively, the statistic D\u0302_{12} is defined as follows:\r\n\r\nD\u0302_{12} = (1\/n) \u2211_{i=1}^{n} D_i, where D_i = log[ f^{(2)}(x_i; \u03b8\u0302^{(2)}) \/ f^{(1)}(x_i; \u03b8\u0302^{(1)}) ].\r\n\r\nA large sample 95% confidence interval based on the AIC correction is\r\n\r\nD\u0302_{12} \u2212 (1\/n)[dim(\u03b8\u0302^{(2)}) \u2212 dim(\u03b8\u0302^{(1)})] \u00b1 1.96 \u00d7 \u03c3\u0302_{12} \/ \u221an,\r\n\r\nwhere\r\n\r\n\u03c3\u0302_{12}^2 = (1\/(n \u2212 1)) \u2211_{i=1}^{n} (D_i \u2212 D\u0302_{12})^2.\r\n\r\nIf the confidence interval contains 0, then models M_1 and M_2 would not be considered significantly different. Otherwise, model M_1 or M_2 is the better fit depending on whether the interval is completely below 0 or above 0, respectively. The AIC correction means that a model with fewer parameters is more favorable than a model with more parameters.\r\n\r\nChapter 3\r\n\r\nVine structure learning via Monte Carlo tree search\r\n\r\n3.1 Introduction\r\n\r\nIn multivariate statistics, modeling the dependence structure of multivariate observations is essential. The multivariate Gaussian distribution for continuous random variables is one of the most commonly used models for this task. However, multivariate data may not be well summarized by the multivariate Gaussian distribution, after transforming individual variables to standard normal margins, when there is joint tail asymmetry or tail dependence. Copula models are flexible in modeling multivariate distributions with tail behaviors that can be different from multivariate Gaussian. Joe (2014) includes a detailed introduction to copula theory, models and applications.\r\n\r\nThe vine copula or pair-copula construction is a flexible tool in high-dimensional dependence modeling. It combines vine graphs and bivariate copulas. 
A vine is a graphical object represented by a sequence of connected trees. In a vine copula model, vines consisting of several trees are adopted to specify the dependence structure with trees 2 and higher summarizing conditional dependence, and bivariate copulas are used as the basic building blocks on the edges of vine trees. Truncated vines are useful for representing the dependence of multivariate observations in a parsimonious way with a few layers of conditional dependence. Vine copulas have been applied to many fields (Cooke et al., 2019; Dissmann et al., 2013; Krupskii et al., 2018).\r\n\r\nThe structure learning of the truncated vine, defined in Section 2.3.5, is computationally intractable in general. When dealing with d variables, one inputs to the algorithm a d \u00d7 d correlation matrix. The output is a truncated vine defined using t trees where the nodes and edges of each tree need to meet the requirements of Definition 2.1. There are a large number of possible vine structures which result in a large search space for a high-dimensional dataset if one would like to find parsimonious structures with fewer trees by maximizing the objective function defined in Equation 2.3. Specifically, according to Cayley\u2019s formula, one can construct d^{d\u22122} different trees with d nodes. With this result, Kurowicka and Joe (2011) further show that there are in total 2^{(d\u22123)(d\u22122)} (d!\/2) different vine structures considering all levels for a dataset with d variables. This makes vine structure learning a challenging problem. Previous work has been mainly centered around greedy algorithms which follow the heuristics of linking variables with stronger dependence in low-level trees and making the locally optimal choice at each tree level conditional on previous trees. 
However, this approach does not in general produce a solution that is close to the global optimum of the vine structure learning problem.\r\n\r\nMonte Carlo tree search (MCTS) is a searching framework for finding optimal decisions by taking random samples in the decision space (Browne et al., 2012). The key idea of MCTS is first to construct a search tree of states which are evaluated by fast Monte Carlo simulations and then selectively grow the tree (Coulom, 2006). Multi-armed bandit algorithms such as the upper confidence bounds for trees (UCT) can be employed to balance between exploration and exploitation (Kocsis and Szepesv\u00e1ri, 2006). As one of the most important methods in artificial intelligence, MCTS has been widely applied in various game and planning problems, including chess, shogi, Go, real-time video games, and even games with incomplete information such as poker. In March 2016, AlphaGo, which combines MCTS with deep neural networks, became the first computer Go program to beat a 9-dan professional without handicaps (Silver et al., 2016). This is regarded as a significant milestone in artificial intelligence research.\r\n\r\nBecause the construction of a truncated vine is inherently sequential, we formulate the vine structure learning problem as a sequential decision making process in this work. A search tree thus arises, where the root node is \u201cempty\u201d and does not have any edge, and the terminal leaf nodes are t-truncated vines. Although the height and branching factor of the search tree might be large, MCTS can be adopted to search through it efficiently (see footnote 1). Specifically, we adapt the existing upper confidence bounds for trees (UCT) algorithm for vine structure learning and incorporate tree policy enhancements including first play urgency, progressive bias, and efficient transposition handling. 
The adapted UCT is called the vine UCT, under the guidance of which the tree policy strikes a balance between exploration and exploitation.\r\n\r\nAfter the MCTS method finds candidates for truncated vine structures, bivariate copulas (based on diagnostics and the candidate list) are assigned to the edges of the truncated vines. By separating the search of the truncated vine structure from the assignment of bivariate copulas to the edges of the vine, one can compare dependence structures such as truncated vine and factor models with latent variables as nodes of the vine. The proposed approach improves on greedy algorithms in terms of model fitting but takes more computational time. Comparisons are made with existing methods on several real datasets. All the examples suggest that the proposed method outperforms existing methods.\r\n\r\nThe remainder of the chapter is organized as follows. The proposed MCTS-based vine structure learning method is described in Section 3.2, and is evaluated on various datasets in Section 3.4. Section 3.3 compares a synthetic situation where greedy algorithms are expected to perform poorly. Section 3.5 provides concluding remarks.\r\n\r\n3.2 Proposed method\r\n\r\n3.2.1 Vine structure learning as sequential decision making\r\n\r\nIn this section, we formulate the vine structure learning problem as a sequential decision making problem. A t-truncated vine can be represented by a sequence of edges in the sequence of trees T_1, . . . , T_t. Given the discrete and hierarchical nature, the construction of a t-truncated vine can be regarded as a sequential decision problem:\r\n1. The initial state, or the root node of the search tree, has no edge; T_i are empty for i \u2208 {1, . . . , t}.\r\n2. Starting from level i = 1, add one edge to T_i at each step according to a tree policy. The candidate edges are chosen so that T_i has only one connected and acyclic component. For levels i > 1, the candidate edges also need to satisfy the proximity condition, which is defined in Section 2.3.1. A vine tree is completed before going to the next tree.\r\n3. If T_i is connected and i < t, go to step 2 and start adding edges to T_{i+1}.\r\n\r\n[Figure 3.1: Vine structure learning as a sequential decision problem, panels (a) to (e). An edge can be added to an unconnected acyclic graph. When a tree at level t is completed, the edges of this tree are used to create nodes for the next graph at level t + 1.]\r\n\r\nFootnote 1: The height of a tree is the length of the longest path from the root to a leaf. The height of the search tree is thus \u2211_{i=1}^{t} (d \u2212 i). The branching factor of a node is the number of children of the node. An average branching factor can be calculated for all the nodes in a tree.\r\n\r\nFigure 3.1 shows an example of vine structure learning with dimension d = 5 and truncation level t = 2. Each subfigure represents a state in the search tree. (a) The initial state is an empty graph with 5 nodes. (b) After two steps, two edges [1,2] and [2,5] are added to the graph. Note that given this state, [1,5] cannot be added, otherwise a cycle would form. (c) Two more edges [2,3] and [3,4] are added and T_1 becomes connected. (d) We have started adding edges to T_2; [1,5|2] and [1,3|2] have been sequentially added. Given this state, edges [2,5]\u2013[3,4] and [1,2]\u2013[3,4] do not satisfy the proximity condition, and edge [2,3]\u2013[2,5] does not satisfy the acyclic condition (defined in Section 2.3.1). 
Therefore, the only edge that can be further added is [2,3]\u2013[3,4], namely [2,4|3]. (e) All the edges have been added and this is a terminal state. The tree construction resembles Prim\u2019s algorithm (Prim, 1957), which ensures that at each time step there exists at most one connected component in each tree.\r\n\r\n3.2.2 Monte Carlo tree search\r\n\r\nMonte Carlo tree search (MCTS) is a method trying to find the optimal sequence of decisions by taking random samples over possible decisions and building a search tree accordingly. It has been widely used in domains that can be represented as trees of sequential decisions, such as games and planning problems. MCTS is particularly useful in problems with high branching factors since MCTS can be configured to be terminated after a predefined computational budget is reached and can select a sufficiently good solution based on the partially constructed tree. While a pure Monte Carlo process runs a large number of simulations completely randomly, MCTS keeps statistics of each possible move and uses them to guide the construction of the search tree. Notably, AlphaGo (Silver et al., 2016) combines deep neural networks with MCTS and has recently defeated a human professional player in the full-sized game of Go, which has long been viewed as one of the most challenging classic games for artificial intelligence.\r\n\r\nThe basic MCTS algorithm is conceptually simple: it iteratively builds a search tree until a predefined computational budget is reached. The search tree is initialized to a root node v_0 with an initial state s_0, which does not have any edge. Every node v in the search tree has its corresponding number of visits n_v and average objective function x\u0304_v; both are initialized to zero. The child nodes of a node in the search tree consist of incomplete t-truncated vines with one additional vine edge (from those that satisfy the proximity and acyclic conditions). In each iteration, four steps are applied:\r\n1. 
Selection: Starting at the root node v_0, a tree policy is recursively applied to descend from the root node to a node v with at least one unvisited child node (a child whose visit count is 0). In other words, edges are added one by one.\r\n2. Expansion: An unvisited child node v_\u2113 is added to expand the tree.\r\n3. Simulation: A simulation is run from the new node v_\u2113 according to a default policy to produce a t-truncated vine. An edge is randomly selected with equal probability from all the eligible edges, that is, the candidate edges that satisfy the acyclic and proximity conditions.\r\n4. Backpropagation: The objective function of the t-truncated vine is calculated and n_v and x\u0304_v are updated for all nodes v along the path.\r\n\r\n[Figure 3.2: One iteration of the general MCTS algorithm. Steps shown: Selection, Expansion, Simulation, Backpropagation; the tree policy governs selection and expansion, and the default policy governs simulation.]\r\n\r\nOne iteration of the MCTS algorithm is shown in Figure 3.2. Note that there are two types of policies used in MCTS: tree policies are used in the selection and expansion steps to select or create a child node v_\u2113 from a node that is already contained in the search tree, while default policies are used in the simulation step to produce an estimate of the outcome proceeding from a non-terminal node. The backpropagation step does not involve a tree policy. However, once the statistics for each node in the tree are updated after backpropagation, any future decisions made based on the tree policies may be affected. The final result of MCTS is a sequence of actions that leads to a sufficiently good outcome starting from the root node v_0.\r\n\r\nA default policy determines how to move from a non-terminal node to a terminal node, which corresponds to a full t-truncated vine, in the search tree. In our application, it specifies how to \u201ccomplete\u201d the truncated vine given an incomplete one, which has fewer edges than a full t-truncated vine. We consider two options:\r\n1. 
Uniformly random: given an incomplete t-truncated vine (with the search tree at level i and T_i incomplete), an edge is randomly selected with equal probability from the remaining eligible edges in T_i, that is, the candidate edges that satisfy the acyclic and proximity conditions.\r\n2. Maximum spanning tree: similar to Dissmann\u2019s algorithm (Dissmann et al., 2013) introduced in Section 2.3.5, given an incomplete truncated vine, maximum spanning trees are constructed sequentially from lower to higher levels.\r\n\r\nThe disadvantage of the maximum spanning tree default policy is that it greedily expands an incomplete vine, and this might lead to insufficient exploration and lock onto a set of suboptimal actions. Therefore, we use the uniformly random default policy in the proposed method to better explore the search space.\r\n\r\n3.2.3 Tree policy: vine UCT\r\n\r\nMCTS iteratively builds and updates a search tree to approximate the optimal actions that can be taken from a given state. The way the search tree is built depends on how the nodes in the tree are selected, which is controlled by the tree policy. Therefore, the choice of tree policy is crucial to the success of MCTS: it should manage to balance between exploration (look in areas that have not been well sampled yet) and exploitation (look in areas which appear to be promising). In this section, we describe a popular tree policy in the MCTS family, the upper confidence bounds for trees (UCT).\r\n\r\nIn the original definition of UCT, in the selection and expansion step a child node j is selected to maximize\r\n\r\nUCT(j) = x\u0304_j + \u03ba \u221a(2 log n \/ n_j), (3.1)\r\n\r\nwhere\r\n\u2022 x\u0304_j is the average objective function from child node j;\r\n\u2022 n is the number of times the parent node has been visited;\r\n\u2022 n_j is the number of times child j has been visited;\r\n\u2022 \u03ba > 0 is a constant.\r\n\r\nIf more than one child node has the same maximal value, the tie is usually broken randomly. 
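The selection rule in Equation 3.1 can be sketched as follows. The function names and the representation of children as (mean value, visit count) pairs are illustrative assumptions; unvisited children receive an infinite score, and ties are broken at random.

```python
import math
import random

def uct_score(mean_value, n_parent, n_child, kappa):
    # Equation 3.1: exploitation term plus kappa times the exploration term.
    if n_child == 0:
        return math.inf  # unvisited children are always tried first
    return mean_value + kappa * math.sqrt(2.0 * math.log(n_parent) / n_child)

def select_child(children, n_parent, kappa, rng=random):
    # children: list of (mean_value, n_child) pairs, one per child node.
    scores = [uct_score(x, n_parent, n, kappa) for x, n in children]
    best = max(scores)
    # Break ties uniformly at random among the maximizers.
    return rng.choice([i for i, s in enumerate(scores) if s == best])

# A rarely visited child can win despite a lower average value.
children = [(0.5, 100), (0.4, 1)]
print(select_child(children, n_parent=101, kappa=1.0))
```

Lowering kappa shifts the choice toward the child with the highest average value, and raising it favors children with few visits.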
Since n_j = 0 yields a UCT value of \u221e, previously unvisited children are assigned the largest possible value, which corresponds to the expansion step in the MCTS algorithm. In the backpropagation step, x\u0304_j and n_j are updated accordingly.\r\n\r\nThe two terms in Equation 3.1 attempt to balance between exploitation (the first term) and exploration (the second term). Without the exploration term, the UCT algorithm will always select the child node with the highest average outcome based on the simulation history. However, with the exploration term, if a child node j of a parent node has been visited, n in the numerator increases, which leads to the increase of the exploration value of the other child nodes of the parent node. At the same time, n_j in the denominator for this child node increases and hence the exploration value of node j decreases. The exploration term in the UCT objective function ensures that each child node has a non-zero probability of being selected and thus achieves a balance between exploitation and exploration.\r\n\r\nHowever, in order to apply the original UCT algorithm to the vine structure learning problem, a few adaptations are needed.\r\n\r\nScaling constant\r\n\r\nIn our application, the objective function L_t(V) defined in Equation 2.3 is in the range of (0, \u2212log det(R)], where R is the empirical correlation matrix. We need to adjust the scaling constant \u03ba in Equation 3.1 accordingly so that the exploration and exploitation terms are on the same scale. A natural choice of \u03ba is \u2212log det(R). The value of \u03ba can be adjusted to lower or increase the amount of exploration performed.\r\n\r\nFirst play urgency\r\n\r\nIn the original UCT algorithm, the selection step stops whenever a node has an unvisited child node. For problems with large branching factors or height in the search tree, the tree will not grow deeper unless all the child nodes are visited. 
For vine structure learning, the height of the search tree is in the order of O(d^2), and the branching factor is also large. Therefore, exploitation will rarely occur deep in the tree according to the original UCT algorithm. First play urgency (FPU) is a modification proposed by Gelly and Wang (2006) to address this issue. It assigns a fixed value of \u03bb_FPU to score the unvisited nodes and uses the original UCT formula to score the visited nodes. By doing so, the score of an unvisited node is no longer infinite, and this encourages early exploitation.\r\n\r\nProgressive bias\r\n\r\nWhen a node has been visited only a few times, its statistics are not reliable. Progressive bias is a technique of adding domain specific heuristic knowledge to MCTS (Chaslot et al., 2008). In artificial intelligence game playing problems, many games already have strong heuristic knowledge.\r\n\r\nA general form of progressive bias for node j is H_j \/ (n_j + 1), where H_j is a heuristic value and n_j is the number of visits for this node. This term is added to the UCT formula to encourage exploitation of nodes with larger heuristic values. As the number of visits n_j increases, the effect of progressive bias decreases.\r\n\r\nIn our application, given the objective function in Equation 2.3, H_j can be chosen as H_j = \u2212log(1 \u2212 \u03c1_{e_j}^2), since the objective function is the summation of H_j over all the edges in a truncated vine. Here e_j is the edge added by node j and \u03c1_{e_j} is the corresponding (partial) correlation parameter. A tuning parameter \u03bb_PB is used to control the strength of progressive bias; that is, \u03bb_PB times the progressive bias is added to the exploration term in UCT. 
When \u03bb_PB is sufficiently large, the tree policy is solely controlled by the heuristic value, and the MCTS algorithm coincides with Dissmann\u2019s algorithm.\r\n\r\n[Figure 3.3: The search tree corresponding to a 1-truncated vine with d = 3. Although the search tree has six leaf nodes, there are only three unique 1-truncated vines: {[1,2], [2,3]} and {[2,3], [1,2]} yield 1\u20132\u20133; {[1,3], [2,3]} and {[2,3], [1,3]} yield 1\u20133\u20132; {[1,2], [1,3]} and {[1,3], [1,2]} yield 2\u20131\u20133.]\r\n\r\nTranspositions\r\n\r\nThe formulation of vine structure learning as a sequential decision making process has a potential problem: the same states can be reached through different paths in the search tree. This is usually referred to as transpositions.\r\n\r\nFigure 3.3 shows the search tree corresponding to a 1-truncated vine with d = 3. It is obvious that there are only three unique 1-truncated vines in this case, summarized as 1\u20132\u20133, 1\u20133\u20132, 2\u20131\u20133. However, the search tree contains six terminal nodes; for each unique vine structure, there exist two distinct paths leading to it. For example, the two paths [1,2], [2,3] and [2,3], [1,2] both result in the same vine structure 1\u20132\u20133.\r\n\r\nTranspositions cause inefficiency because the statistics of the same state are scattered across different nodes. Transposition tables are the usual choice to tackle this problem; they store information about states and share the statistics to subsequent occurrences of the same state during the search. A transposition table is usually implemented as a hash table of the unique vine states. On encountering a new vine state, the algorithm checks the table to see whether the state has already been analyzed; this can be done quickly, in expected constant time. 
If so, the table contains the statistics that were previously assigned to this state, and the statistics are used directly. If not, the new state is entered into the hash table.

It is relatively straightforward to apply transposition tables in the selection steps of MCTS. Childs et al. (2008) further discuss the use of transposition tables in the backpropagation steps. Specifically, we adopt the UCT2 algorithm from that paper. Compared with the original UCT formula in Equation 3.1, there are two modifications: (1) a transposition table is used to share statistics of the same vine state; (2) the algorithm keeps track of the number of visits of both nodes and edges in the search tree. For a parent node $p$ and its child node $j$, the UCT2 value is given by

$$\mathrm{UCT2}(p,j) = \bar{x}_j + \kappa\sqrt{\frac{2\log n_p}{n(p,j)}}, \tag{3.2}$$

where $\bar{x}_j$ is retrieved from the transposition table, $n_p$ is the number of visits of node $p$, and $n(p,j)$ is the number of visits of edge $(p,j)$. Note that if $n(p,j)$ is replaced with $n_j$, the value of the parent node might converge to an incorrect value (Browne et al., 2012).

Vine UCT

Combining the above adaptations, the resulting UCT is called the vine UCT. For a parent node $p$ and its child node $j$ in the search tree, the vine UCT value is

$$\mathrm{VUCT}(p,j) = \bar{x}_j - \log\det(R)\left[\lambda_{\mathrm{PB}}\cdot\frac{H_j}{n_j+1} + \min\left\{\sqrt{\frac{2\log n_p}{n(p,j)}},\ \lambda_{\mathrm{FPU}}\right\}\right], \tag{3.3}$$

where
• $\lambda_{\mathrm{FPU}}$ and $\lambda_{\mathrm{PB}}$ are the tuning parameters;
• $H_j = -\log(1-\rho_{e_j}^2)$ is the contribution to the objective function by the newly added edge $e_j$ in child $j$;
• $n_p$ and $n_j$ are the numbers of visits of parent node $p$ and child node $j$ in the search tree;
• $n(p,j)$ is the number of visits of edge $(p,j)$;
• $\bar{x}_j$ is the average objective function (defined in Equation 2.3) from child node $j$ retrieved from the transposition table.

In summary, the input of our method is a correlation matrix $R$ calculated from a multivariate dataset.
The MCTS algorithm is applied, using vine UCT as the tree policy and a uniformly random default policy. In every iteration, the default policy leads to a terminal $t$-truncated vine. Through the iterations, we keep the vine with the largest value of the objective function in Equation 2.3, and it is returned as the output. Algorithm 3.1 presents the pseudocode of the proposed method.

We further illustrate the MCTS algorithm by applying it to a simple example: learning a 2-truncated vine for $d = 6$. The correlation matrix is as follows:

$$\begin{pmatrix}
1.00 & 0.40 & 0.15 & 0.41 & 0.32 & 0.62\\
0.40 & 1.00 & 0.45 & 0.54 & 0.76 & 0.48\\
0.15 & 0.45 & 1.00 & 0.20 & 0.51 & 0.26\\
0.41 & 0.54 & 0.20 & 1.00 & 0.42 & 0.41\\
0.32 & 0.76 & 0.51 & 0.42 & 1.00 & 0.36\\
0.62 & 0.48 & 0.26 & 0.41 & 0.36 & 1.00
\end{pmatrix}.$$

We run 3000 iterations of the MCTS algorithm. Figure 3.4 shows some of the nodes with depths of 0 to 3 in the search tree and the corresponding number of visits $n_v$, average score (objective function) $\bar{x}_v$, and the progressive bias $H_j$. Since each iteration always starts from the root node, the root node has been visited 3000 times. There are $\binom{6}{2} = 15$ nodes of depth 1 in the search tree in total. Figure 3.5 further shows some nodes of depths 5, 6 and 9 in the search tree. Since the nodes of depth 5 have completed first vine trees, edges are added to the second vine trees. If we run the simulation steps using the default policy from the nodes of depth 6, each gives a node of depth 9 in the search tree, corresponding to a full $t$-truncated vine. Since each iteration adds one node to the search tree, there are only 3000 nodes in the search tree.
As a result, the nodes of depth 9 are likely not part of the search tree, and their numbers of visits and average scores are not stored in the search tree; they are only simulated using the default policy from a node in the search tree.

Algorithm 3.1 Vine UCT Algorithm

function VUCTSEARCH(R, t, num_iter)
    # R is the correlation matrix, t is the truncation level, num_iter is the number of iterations.
    Δ_best ← 0                                # the best score so far
    v_root ← ∅                                # the root node contains no edge
    v_best ← null                             # the terminal node in the search tree that has the best score
    for i = 1 to num_iter do
        (v_tree, v_history) ← TREEPOLICY(v_root)
        (v_default, Δ) ← DEFAULTPOLICY(v_tree, t)
        BACKPROP(v_history, Δ)
        if Δ > Δ_best then
            v_best ← v_default
            Δ_best ← Δ
        end if
    end for
    return v_best
end function

function TREEPOLICY(v)
    v_history ← [v]                           # a list of nodes in the search tree
    while n_v > 0 do                          # n_v is the number of visits to node v
        v ← argmax over c ∈ children(v) of VUCT(v, c)    # VUCT in Equation 3.3
        append v to v_history
    end while
    return (v, v_history)
end function

function DEFAULTPOLICY(v, t)
    while v is not a completed t-truncated vine do
        v ← uniformly random sample from children(v)
    end while
    Δ ← score of v
    return (v, Δ)
end function

function BACKPROP(v, Δ)
    for i in 1 : length(v) do
        x̄_{v[i]} ← (n_{v[i]} x̄_{v[i]} + Δ)/(n_{v[i]} + 1)
        n_{v[i]} ← n_{v[i]} + 1               # x̄_v and n_v are initialized to 0 for each node v
        if i > 1 then                         # edge counts are updated inside the loop, once per edge on the path
            n(v[i−1], v[i]) ← n(v[i−1], v[i]) + 1
        end if
    end for
end function

Figure 3.4: Some nodes of depths from 0 to 3 in the search tree. The root node does not have any edges. A child node is obtained by adding an edge to the (incomplete) vine structure of its parent node. In future iterations, child nodes with higher scores are more likely to be visited (exploitation); child nodes with fewer prior visits are more likely to be visited (exploration); child nodes with larger values of $H_j = -\log(1-\rho_{e_j}^2)$ are more likely to be visited (progressive bias). Note that each child node has several predecessors so that the number of visits of a given node in the search tree is fewer than the sum of numbers of visits of its child nodes.

Figure 3.5: Some nodes of depths 5, 6, and 9 in the search tree. The nodes of depth 9 are the results of the simulation step, starting from the nodes of depth 6. The scores or objective functions of the best 2-truncated vine found by the MCTS algorithm, brute-force algorithm, and sequential MST algorithm are 2.362, 2.362, and 2.333, respectively.

3.3 A worst-case example for SeqMST

Greedy algorithms generally do not find the global optimum.
In this section, we study a worst-case example where the dependence structure can be optimally captured by a 2-truncated D-vine model, but greedy algorithms only find locally optimal solutions. This illustrates how the proposed method performs in a worst-case scenario for greedy algorithms.

Consider a $d$-dimensional random vector $Z = (Z_1, \ldots, Z_d) \sim N(0, R)$, which has a stochastic representation as follows. Let $\varepsilon_j$ be i.i.d. $N(0,1)$ random variables and let $\phi_{j,1}$, $\phi_{j,2}$ and $\psi_j$ be constants. Let

$$Z_1 = \varepsilon_1,\qquad Z_2 = \phi_{2,1}Z_1 + \psi_2\varepsilon_2,\qquad Z_j = \phi_{j,1}Z_{j-1} + \phi_{j,2}Z_{j-2} + \psi_j\varepsilon_j \quad\text{for } 3 \le j \le d.$$

Here, the $\psi_j$ are chosen as a function of the $\phi_{j,\ell}$ such that $\mathrm{Var}(Z_j) = 1$. Section 6.14.2 of Joe (2014) gives an algorithm for converting from the coefficients $\{\phi_{j,\ell}\}$ to the correlation matrix $R$ and vice versa. $R$ is said to be the correlation matrix of a 2-truncated partial correlation D-vine because the resulting D-vine has partial correlations of zero for variables separated by 3 or more nodes; i.e., $\rho_{j,j+k;\,j+1,\ldots,j+k-1} = 0$ for $k \ge 3$.

To make the problem difficult for greedy algorithms, we set $\phi_{j,1} < \phi_{j,2}$ for $3 \le j \le d$. As a result, the correlations between $Z_j$ and $Z_{j+2}$ are greater than between $Z_j$ and $Z_{j+1}$.
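The stochastic representation above determines the correlation matrix through a simple recursion: for $i < j$, $\varepsilon_j$ is independent of $Z_i$, so $R_{i,j} = \phi_{j,1}R_{i,j-1} + \phi_{j,2}R_{i,j-2}$. The following is a minimal sketch of that recursion, assuming the same coefficients for every $j$; the function name and the illustrative values 0.3 and 0.6 are choices made here, and the code is not the algorithm of Joe (2014).

```python
def dvine2_correlation(d, phi1, phi2):
    """Correlation matrix of Z_j = phi1*Z_{j-1} + phi2*Z_{j-2} + psi_j*eps_j.

    Assumes psi_j is chosen so that Var(Z_j) = 1 and that the same
    (phi1, phi2) is used for all j, with Z_2 = phi1*Z_1 + psi_2*eps_2.
    """
    R = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for j in range(1, d):
        for i in range(j):
            if j == 1:
                r = phi1
            else:
                # eps_j is independent of Z_i for i < j
                r = phi1 * R[i][j - 1] + phi2 * R[i][j - 2]
            R[i][j] = R[j][i] = r
    return R


R = dvine2_correlation(5, 0.3, 0.6)
# Lag-2 correlations exceed lag-1 correlations when phi1 < phi2,
# e.g. R[0][2] = 0.3 * 0.3 + 0.6 = 0.69 versus R[0][1] = 0.3.
```

Because every lag-1 correlation is smaller than the corresponding lag-2 correlation, a procedure that greedily picks the strongest pairs for the first tree will not select the D-vine path.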
Here is an example for $d = 5$, $\phi_{j,1} = 0.3$, and $\phi_{j,2} = 0.6$ for all $j$; the correlation matrix is

$$R = \begin{pmatrix}
1.00 & 0.30 & 0.69 & 0.39 & 0.53\\
0.30 & 1.00 & 0.48 & 0.74 & 0.51\\
0.69 & 0.48 & 1.00 & 0.59 & 0.78\\
0.39 & 0.74 & 0.59 & 1.00 & 0.65\\
0.53 & 0.51 & 0.78 & 0.65 & 1.00
\end{pmatrix}.$$

Since $R_{j,j+1} <$ [...]

• Permutation asymmetry measure, with $k > 0$,
$$G_{P,k}(C) = \iint_{[0,1]^2} |u_1-u_2|^{k+2}\cdot\operatorname{sign}(u_1-u_2)\,dC(u_1,u_2).$$
• Reflection asymmetry measure, with $k > 0$,
$$G_{R,k}(C) = \iint_{[0,1]^2} |1-u_1-u_2|^{k+2}\cdot\operatorname{sign}(1-u_1-u_2)\,dC(u_1,u_2).$$

The above measures and the corresponding conditional measures can be used as diagnostic tools to detect permutation and reflection asymmetry for pairs of variables used in vine copulas and to check the reasonableness of the simplifying assumption for vine copulas.

In Sections 2.15 and 2.17 of Joe (2014), there are tail-weighted dependence measures such as semi-correlations and asymmetry measures that depend on quantiles. We will not consider these for the conditional version in the subsequent developments because there do not exist simple expressions for the conditional dependence measures defined by quantiles in general.

We give an overview of the kernel smoothing method (Gijbels et al., 2011) in Section 4.2.1. In Sections 4.2.2, 4.2.3 and 4.2.4, we present the estimates of conditional Spearman's rho, conditional tail-weighted dependence measure, and conditional asymmetry measures, respectively.

4.2.1 Estimating copulas of conditional distributions

Gijbels et al. (2011) propose nonparametric estimators of Spearman's rho and Kendall's tau for a conditional distribution of two variables given a covariate. In this section, we give an overview of the estimator of copulas of conditional distributions.

Consider a random vector $(Y_1, Y_2, X)^T$, where $X$ could be a random variable or a random vector.
Let $F_{Y_1|X}$ and $F_{Y_2|X}$ be the conditional CDFs of $Y_1$ and $Y_2$ given $X$ respectively, and let $C_{Y_1,Y_2;X}$ be the copula for $F_{Y_1|X}(\cdot|x)$ and $F_{Y_2|X}(\cdot|x)$. Let $(y_{i1}, y_{i2}, x_i)$ be the observed data, for $i = 1, \ldots, n$, and suppose this is considered as a random sample. For $j \in \{1,2\}$, the conditional CDF $F_{Y_j|X}$ can be estimated by

$$\tilde{F}_{Y_j|X}(y|x) = \sum_{i'=1}^{n} w_{i'j}(x)\, I\{y_{i'j} \le y\},$$

for appropriately chosen weights $w_{i'j}(x)$, where $I$ represents the indicator function. The weight $w_{i'j}(x)$ is larger if $x_{i'}$ is closer to $x$. Let $G_{Y_1,Y_2;X}(v_1,v_2;x) = P(F_{Y_1|X}(Y_1|x) \le v_1,\ F_{Y_2|X}(Y_2|x) \le v_2)$; then similarly an estimate of $G_{Y_1,Y_2;X}(v_1,v_2;x)$ is

$$\tilde{G}_{Y_1,Y_2;X}(v_1,v_2;x) = \sum_{i=1}^{n} w_i(x)\, I\{\tilde{u}_{i1} \le v_1,\ \tilde{u}_{i2} \le v_2\},$$

where $\tilde{u}_{i1} = \tilde{F}_{Y_1|X}(y_{i1}|x_i)$ and $\tilde{u}_{i2} = \tilde{F}_{Y_2|X}(y_{i2}|x_i)$. Because of the smoothing for $\tilde{F}_{Y_1|X}$ and $\tilde{F}_{Y_2|X}$, $\tilde{G}_{Y_1,Y_2;X}(v_1,v_2;x)$ does not have $U(0,1)$ margins. One can obtain the margins

$$\tilde{G}_{Y_1;X}(v_1;x) = \sum_{i=1}^{n} w_i(x)\, I\{\tilde{u}_{i1} \le v_1\};\qquad \tilde{G}_{Y_2;X}(v_2;x) = \sum_{i=1}^{n} w_i(x)\, I\{\tilde{u}_{i2} \le v_2\}.$$

The weight $w_i(x)$ is larger if $x_i$ is closer to $x$, and it can be different from $w_{i1}(x)$ and $w_{i2}(x)$. Let $\tilde{G}^{-1}_{Y_j;X}$ be the generalized inverse distribution function of $\tilde{G}_{Y_j;X}$ for $j \in \{1,2\}$. An estimate of $C_{Y_1,Y_2;X}(u_1,u_2;x)$ can be obtained:

$$\tilde{C}_{Y_1,Y_2;X}(u_1,u_2;x) = \tilde{G}_{Y_1,Y_2;X}\big(\tilde{G}^{-1}_{Y_1;X}(u_1;x),\ \tilde{G}^{-1}_{Y_2;X}(u_2;x);\ x\big) = \sum_{i=1}^{n} w_i(x)\, I\{\hat{u}_{i1} \le u_1,\ \hat{u}_{i2} \le u_2\}, \tag{4.1}$$

where $\hat{u}_{i1} := \tilde{G}_{Y_1;X}(\tilde{u}_{i1};x)$ and $\hat{u}_{i2} := \tilde{G}_{Y_2;X}(\tilde{u}_{i2};x)$.

One common choice of the weight function is the Nadaraya-Watson estimator (Nadaraya, 1964; Watson, 1964):

$$w_i(x) = \frac{K(\|x_i - x\|/h_n)}{\sum_{j=1}^{n} K(\|x_j - x\|/h_n)},\qquad i = 1, \ldots, n,$$

where $h_n = O(n^{-1/5})$ is the bandwidth and $K(\cdot)$ is the kernel function.
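The Nadaraya-Watson weights and the weighted empirical conditional CDF can be sketched as follows for a scalar covariate. This is a minimal illustration, assuming an Epanechnikov kernel; the function names are hypothetical and the bandwidth is supplied by the caller.

```python
def epanechnikov(t):
    """Epanechnikov kernel K(t) = 0.75 * (1 - t^2) on |t| <= 1."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0


def nw_weights(xs, x, h, kernel=epanechnikov):
    """Nadaraya-Watson weights w_i(x) = K(|x_i - x|/h) / sum_j K(|x_j - x|/h)."""
    k = [kernel((xi - x) / h) for xi in xs]
    total = sum(k)
    if total == 0.0:
        raise ValueError('no observation within the kernel support; increase h')
    return [ki / total for ki in k]


def conditional_cdf(ys, weights, y):
    """Weighted empirical CDF: sum of w_i over observations with y_i <= y."""
    return sum(w for w, yi in zip(weights, ys) if yi <= y)
```

With a compactly supported kernel, observations farther than `h` from `x` receive weight zero, so the estimate at `x` uses only nearby data.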
When $x \in \mathbb{R}^d$ for $d > 1$, this is a spherically symmetric weight function (Loader, 1999). Commonly used kernel functions include: (1) Uniform: $K(t) = \tfrac{1}{2} I\{|t| \le 1\}$; (2) Gaussian: $K(t) = \tfrac{1}{\sqrt{2\pi}} e^{-t^2/2}$; (3) Epanechnikov: $K(t) = \tfrac{3}{4}(1-t^2) I\{|t| \le 1\}$.

For a bivariate measure $\eta$ that is a functional of a bivariate copula, its corresponding conditional measure can be written as $\eta(x) = \eta(C_{Y_1,Y_2;X}(\cdot;x))$. If $\eta(x)$ is a smooth function in $x$, it can be estimated by $\tilde{\eta}(x) = \eta(\tilde{C}_{Y_1,Y_2;X}(\cdot;x))$, where $\tilde{C}_{Y_1,Y_2;X}$ is defined in Equation 4.1 via kernel smoothing. Algorithm 4.1 shows the pseudocode for estimating $\eta(x)$ evaluated on a sequence of grid points $(x_1^*, x_2^*, \ldots, x_M^*)$, where $x_m^* \in \mathrm{supp}(X)$. For a fixed $x_m^*$, all the observations $(x_i, y_{i1}, y_{i2})$ contribute to the estimation of $\eta(x_m^*)$, but those $x_i$ that are closer to $x_m^*$ carry more weight.

Algorithm 4.1 Estimation of a conditional measure $\eta(x)$.
Input: A sequence of grid points $(x_1^*, x_2^*, \ldots, x_M^*)$; observed data $\{(x_i, y_{i1}, y_{i2})\}_{i=1}^{n}$.
Output: Estimated conditional measure $\tilde{\eta}(x_m^*)$ for $m = 1, 2, \ldots, M$.
1: Smoothed empirical values $\tilde{u}_{i1}$ and $\tilde{u}_{i2}$ for $i = 1, \ldots, n$ are obtained such that $\{\tilde{u}_{ij}\}$ is close to a $U(0,1)$ distribution for $j = 1, 2$.
2: A smoothed empirical $\tilde{C}_{Y_1,Y_2;X}(\cdot;x_m^*)$ is computed for each $x_m^*$ using Equation 4.1.
3: An empirical $\tilde{\eta}(x_m^*) = \eta(\tilde{C}_{Y_1,Y_2;X}(\cdot;x_m^*))$ can be obtained. (Examples are given in the following subsections.)

Algorithm 4.1 works for a scalar-valued $x$ as well as a vector-valued $x$. However, due to the curse of dimensionality, the sample size needed to detect the signal from random variation increases quickly when conditioning on more variables. Conditioning on one variable, a sample size of the order of 300 and above can lead to the detection of the shape of the conditional measure as a function of the value of the conditioning variable. However, when conditioning on two variables, the sample size might need to be several thousand to see the shape from the random variability. An example is given in Section 4.4 to illustrate this.

Moreover, bootstrapping can be used to determine the confidence bands for conditional measures. The confidence bands can help to visually suggest whether a conditional measure $\eta(x)$ is constant with respect to $x$. If so, there is more support for the simplifying assumption as an approximation. Acar et al. (2012) construct pointwise confidence bands at each grid point. Here, we propose to use simultaneous envelope-based bootstrap confidence bands and provide a method of constructing such confidence bands. The idea is to draw bootstrap samples and compute the curve of estimated $\tilde{\eta}(x)$ on a sequence of grid points; repeating this step $N_{bs}$ times gives $N_{bs}$ estimated curves. For each grid point, we find the curve that corresponds to the $\gamma$-level upper (lower) confidence bound. For neighboring grid points, the same curve might be the pointwise "critical" curve. Consider the set of curves that are critical for the upper (lower) confidence. The envelope from the pointwise maximum (minimum) of these upper (lower) critical curves is the resulting simultaneous upper (lower) confidence curve; the upper and lower envelopes are guaranteed to entirely cover a proportion $\gamma$ of the bootstrapped curves. Algorithm 4.2 gives a formal definition of the proposed bootstrapping method.

Algorithm 4.2 Upper and lower simultaneous bootstrap confidence bands of $\eta(x)$.
Input: A sequence of grid points $(x_1^*, x_2^*, \ldots, x_M^*)$; observed data $\{(x_i, y_{i1}, y_{i2})\}_{i=1}^{n}$; number of bootstrap samples $N_{bs}$; confidence level $\gamma$.
Output: Upper and lower confidence bands evaluated at $(x_1^*, x_2^*, \ldots, x_M^*)$.
1: for $r = 1, 2, \ldots, N_{bs}$ do
2:     Draw $n$ observations with replacement from $\{(x_i, y_{i1}, y_{i2})\}_{i=1}^{n}$.
3:     Estimate $\tilde{\eta}_r$ evaluated at $(x_1^*, x_2^*, \ldots, x_M^*)$ with the bootstrap sample, using Algorithm 4.1.
4: end for
5: Initialize $S^{upper} \leftarrow \emptyset$, $S^{lower} \leftarrow \emptyset$.
6: for $m = 1, 2, \ldots, M$ do
7:     Find the $(1+\gamma)/2$ quantile of $\{\tilde{\eta}_r(x_m^*)\}_{r=1}^{N_{bs}}$, denote its index by $I_m^{upper} \in [N_{bs}]$.
8:     $S^{upper} \leftarrow S^{upper} \cup \{I_m^{upper}\}$.
9:     Find the $(1-\gamma)/2$ quantile of $\{\tilde{\eta}_r(x_m^*)\}_{r=1}^{N_{bs}}$, denote its index by $I_m^{lower} \in [N_{bs}]$.
10:     $S^{lower} \leftarrow S^{lower} \cup \{I_m^{lower}\}$.
11: end for
12: The upper confidence band evaluated at $x_m^*$ is $\max\{\tilde{\eta}_r(x_m^*) : r \in S^{upper}\}$; the lower confidence band evaluated at $x_m^*$ is $\min\{\tilde{\eta}_r(x_m^*) : r \in S^{lower}\}$.

4.2.2 Conditional Spearman's rho

For a bivariate copula $C$, the population version of Spearman's rho can be expressed as

$$\rho_S(C) = 12\iint_{[0,1]^2} C(u_1,u_2)\,du_1\,du_2 - 3.$$

Using the kernel method described in Section 4.2.1, conditional Spearman's rho for $C_{Y_1,Y_2;X}(\cdot;x)$ can be estimated by

$$\begin{aligned}
\rho_S(\tilde{C}_{Y_1,Y_2;X}(\cdot;x)) &= 12\iint_{[0,1]^2} \tilde{C}_{Y_1,Y_2;X}(u_1,u_2;x)\,du_1\,du_2 - 3\\
&= 12\sum_{i=1}^{n} w_i(x)\iint_{[0,1]^2} I\{\hat{u}_{i1} \le u_1,\ \hat{u}_{i2} \le u_2\}\,du_1\,du_2 - 3\\
&= 12\sum_{i=1}^{n} w_i(x)(1-\hat{u}_{i1})(1-\hat{u}_{i2}) - 3,
\end{aligned}$$

where $\hat{u}_{i1} := \tilde{G}_{Y_1;X}(\tilde{u}_{i1};x)$ and $\hat{u}_{i2} := \tilde{G}_{Y_2;X}(\tilde{u}_{i2};x)$ as defined in Section 4.2.1.

Note that the numerical implementation of a conditional Kendall's tau is much more time-consuming than Spearman's rho because the computational complexity is of a higher power of the sample size $n$. Hence, we do not use conditional Kendall's tau. We refer the readers to Gijbels et al.
(2011) for its empirical version using kernel smoothing.

4.2.3 Conditional tail-weighted dependence measure

Spearman's rho and Kendall's tau summarize the dependence in the center and cannot quantify the dependence in the joint upper and lower tails. The lower (upper) tail dependence coefficients can be used to measure the strength of dependence in the joint lower (upper) tail of a bivariate distribution. However, since the tail dependence coefficients are defined via limits, they do not have simple empirical counterparts. Some tail-weighted dependence measures, such as semi-correlations of normal scores, do not have simple counterparts for conditional dependence measures.

Lee et al. (2018) propose a family of dependence measures $\zeta_\alpha$ for $\alpha > 0$. When $\alpha = 1$, $\zeta_\alpha$ is a measure of central dependence with properties similar to Kendall's tau and Spearman's rho. For large $\alpha$, $\zeta_\alpha$ is a tail-weighted dependence measure; the limit as $\alpha \to \infty$ is the upper tail dependence coefficient. The definition of the upper tail-weighted dependence measure $\zeta_\alpha$ is given as follows:

$$\zeta_\alpha(C) := 2 - \alpha\big(\gamma_\alpha^{-1}(C) - 1\big),\quad\text{where } \gamma_\alpha(C) := \int_0^1 C(u^{1/\alpha}, u^{1/\alpha})\,du. \tag{4.2}$$

The lower tail-weighted dependence measure of a copula is the upper tail-weighted dependence measure of its survival copula $\hat{C}(u_1,u_2) = C(1-u_1, 1-u_2) + u_1 + u_2 - 1$:

$$\zeta_\alpha(\hat{C}) := 2 - \alpha\big(\gamma_\alpha^{-1}(\hat{C}) - 1\big),\quad\text{where } \gamma_\alpha(\hat{C}) = \frac{\alpha-1}{\alpha+1} + \int_0^1 C(1-u^{1/\alpha}, 1-u^{1/\alpha})\,du.$$

Under the same setting as in Section 4.2.2, the conditional tail-weighted dependence measure $\zeta_\alpha$ for $C_{Y_1,Y_2;X}(\cdot;x)$ can be estimated similarly.
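Equation 4.2 can be checked numerically on two boundary cases. For the independence copula, $\gamma_\alpha = \alpha/(\alpha+2)$ and hence $\zeta_\alpha = 0$; for the comonotonic copula, $\gamma_\alpha = \alpha/(\alpha+1)$ and hence $\zeta_\alpha = 1$. The sketch below evaluates the integral with a midpoint rule; the function names and the grid size are illustrative assumptions.

```python
def gamma_alpha(copula, alpha, n_grid=20000):
    """Midpoint-rule approximation of int_0^1 C(u^(1/alpha), u^(1/alpha)) du."""
    h = 1.0 / n_grid
    total = 0.0
    for i in range(n_grid):
        t = ((i + 0.5) * h) ** (1.0 / alpha)
        total += copula(t, t)
    return total * h


def zeta_alpha(copula, alpha, n_grid=20000):
    """Upper tail-weighted dependence measure: 2 - alpha * (1/gamma_alpha - 1)."""
    g = gamma_alpha(copula, alpha, n_grid)
    return 2.0 - alpha * (1.0 / g - 1.0)


def independence(u1, u2):
    return u1 * u2


def comonotonic(u1, u2):
    return min(u1, u2)
```

Calling `zeta_alpha(independence, 5.0)` and `zeta_alpha(comonotonic, 5.0)` should return values close to 0 and 1 respectively, up to the discretization error of the midpoint rule.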
By Equation 4.2, the conditional $\gamma_\alpha(x)$ can be estimated by

$$\begin{aligned}
\gamma_\alpha(\tilde{C}_{Y_1,Y_2;X}(\cdot;x)) &= \int_0^1 \tilde{C}_{Y_1,Y_2;X}(u^{1/\alpha}, u^{1/\alpha};x)\,du\\
&= \sum_{i=1}^{n} w_i(x)\int_0^1 I\{\hat{u}_{i1} \le u^{1/\alpha},\ \hat{u}_{i2} \le u^{1/\alpha}\}\,du\\
&= 1 - \sum_{i=1}^{n} w_i(x)(\hat{u}_{i1} \vee \hat{u}_{i2})^{\alpha},
\end{aligned}$$

where $\hat{u}_{i1} := \tilde{G}_{Y_1;X}(\tilde{u}_{i1};x)$ and $\hat{u}_{i2} := \tilde{G}_{Y_2;X}(\tilde{u}_{i2};x)$. As a result, the conditional upper tail-weighted dependence measure can be estimated by

$$\zeta_\alpha(\tilde{C}_{Y_1,Y_2;X}(\cdot;x)) = 2 - \alpha\big(\gamma_\alpha^{-1}(\tilde{C}_{Y_1,Y_2;X}(\cdot;x)) - 1\big).$$

4.2.4 Conditional measures of permutation and reflection asymmetry

A bivariate copula $C$ is called permutation symmetric if for $(U_1,U_2) \sim C$, we have $(U_2,U_1) \sim C$ as well. Similarly, $C$ is called reflection symmetric if for $(U_1,U_2) \sim C$, we have $(1-U_1, 1-U_2) \sim C$. Krupskii (2017) proposes permutation and reflection asymmetry measures for data with positive quadrant dependence. Those asymmetry measures can be used as diagnostic tools to suggest proper candidate copula families when fitting bivariate data. In this section, we present the permutation asymmetry measure $G_{P,k}$ and the reflection asymmetry measure $G_{R,k}$, and extend them to conditional measures.

The permutation asymmetry measure $G_{P,k}(C)$ is defined as the expectation of the variable $|U_1-U_2|^{k+2}$ adjusted for the sign of $U_1-U_2$ for $k > 0$:

$$G_{P,k}(C) = E\big[|U_1-U_2|^{k+2}\cdot\operatorname{sign}(U_1-U_2)\big] = \iint_{[0,1]^2} |u_1-u_2|^{k+2}\cdot\operatorname{sign}(u_1-u_2)\,dC(u_1,u_2).$$

It indicates the direction of permutation asymmetry: if the measure takes a positive (negative) value, then the conditional mean of data truncated in the right lower (left upper) corner is greater than that of data truncated in the left upper (right lower) corner. A larger tuning parameter $k$ results in greater variability of an empirical estimate, while a small $k$ makes the measure less sensitive to a permutation asymmetric dependence.
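Since $G_{P,k}$ is an expectation, a natural sample analogue is a weighted average over pseudo-observations. The following is a minimal sketch under that assumption; the function name, the optional `weights` argument (uniform weights give the unconditional version), and the default value of `k` are illustrative.

```python
def perm_asymmetry(u1, u2, k=0.2, weights=None):
    """Weighted sample analogue of E[|U1 - U2|^(k+2) * sign(U1 - U2)]."""
    n = len(u1)
    if weights is None:
        weights = [1.0 / n] * n  # uniform weights: unconditional estimate
    total = 0.0
    for w, a, b in zip(weights, u1, u2):
        diff = a - b
        sign = (diff > 0) - (diff < 0)
        total += w * abs(diff) ** (k + 2) * sign
    return total
```

A sample that is invariant under swapping the two coordinates gives a value of zero, while a sample concentrated below the diagonal (where `u1 > u2`) gives a positive value.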
The permutation asymmetry measure $G_{P,k}$ can be further normalized to the range of $[-1,1]$ by finding a copula $C$ that maximizes $|G_{P,k}(C)|$ (Rosco and Joe, 2013). Following the choice of $k$ in Krupskii (2017), we use $k = 0.2$ for the remainder of the chapter.

The conditional $G_{P,k}$ can be estimated by

$$G_{P,k}(\tilde{C}_{Y_1,Y_2;X}(\cdot;x)) = \iint_{[0,1]^2} |u_1-u_2|^{k+2}\operatorname{sign}(u_1-u_2)\,d\tilde{C}_{Y_1,Y_2;X}(u_1,u_2;x) = \sum_{i=1}^{n} w_i(x)\,|\hat{u}_{i1}-\hat{u}_{i2}|^{k+2}\operatorname{sign}(\hat{u}_{i1}-\hat{u}_{i2}),$$

where $\hat{u}_{i1} := \tilde{G}_{Y_1;X}(\tilde{u}_{i1};x)$ and $\hat{u}_{i2} := \tilde{G}_{Y_2;X}(\tilde{u}_{i2};x)$.

Similarly, the reflection asymmetry measure $G_{R,k}$ is defined as the expectation of the variable $|1-U_1-U_2|^{k+2}$ adjusted for the sign of $1-U_1-U_2$:

$$G_{R,k}(C) = E\big[|1-U_1-U_2|^{k+2}\operatorname{sign}(1-U_1-U_2)\big] = \iint_{[0,1]^2} |1-u_1-u_2|^{k+2}\operatorname{sign}(1-u_1-u_2)\,dC(u_1,u_2),$$

and an estimate of the conditional $G_{R,k}$ is

$$G_{R,k}(\tilde{C}_{Y_1,Y_2;X}(\cdot;x)) = \sum_{i=1}^{n} w_i(x)\,|1-\hat{u}_{i1}-\hat{u}_{i2}|^{k+2}\operatorname{sign}(1-\hat{u}_{i1}-\hat{u}_{i2}).$$

4.3 Skewed bivariate copulas

Bivariate copulas with permutation asymmetry have not been used much with vine copulas, because often permutation asymmetry cannot be observed from a bivariate normal scores plot. However, if permutation asymmetry is detected via asymmetry measures, then families of permutation asymmetric copulas should be considered. In this section, we compare two parametric families of permutation asymmetric copulas with asymmetric tail dependence. At the boundaries of these families are copulas without tail dependence. They have three or four parameters and can be used within parametric vine copulas.

The Azzalini-Capitanio (AC) skew-t copula and the skew-BB1 copula are two permutation asymmetric bivariate copulas.
They can be used within vine copulas when permutation asymmetry is detected together with possible tail dependence. Without both upper and lower tail dependence, the Azzalini-Dalla Valle skew-normal (limiting case of AC skew-t) and the skew-Gumbel (boundary case of skew-BB1) are options.

One way to generate new copula families from existing ones is through the maximum of independent beta random variables. This has been included in Genest et al. (1998), McNeil et al. (2015), and Section 3.18 of Joe (2014), without studying its properties in detail. Let $C_1$ and $C_2$ be two bivariate copulas and $\gamma_1, \gamma_2 \in [0,1]$; then $C_1(u_1^{\gamma_1}, u_2^{\gamma_2})\,C_2(u_1^{1-\gamma_1}, u_2^{1-\gamma_2})$ is a valid bivariate copula. In fact, if $V_1 = (V_{11}, V_{12}) \sim C_1$, $V_2 = (V_{21}, V_{22}) \sim C_2$ and $V_1, V_2$ are independent, then $\big(V_{11}^{1/\gamma_1} \vee V_{21}^{1/(1-\gamma_1)},\ V_{12}^{1/\gamma_2} \vee V_{22}^{1/(1-\gamma_2)}\big)$ has uniform margins and follows the above distribution.

In order to generate permutation asymmetric bivariate copulas, let $C_1$ be a permutation symmetric bivariate copula $C_1 = C(\cdot;\theta)$ parametrized by $\theta$, let $C_2$ be the independence copula $C^{\perp}$, i.e., $C_2(u_1,u_2) = u_1u_2$, and suppose $(1-\gamma_1)(1-\gamma_2) = 0$.
This results in a permutation asymmetric bivariate copula $\breve{C}$ that can be parametrized by $\theta$ and $\beta \in [-1,1]$:

$$\breve{C}(u_1,u_2;\theta,\beta) = \begin{cases}
C(u_1^{1-\beta}, u_2;\theta)\,u_1^{\beta} & \text{if } 0 \le \beta \le 1,\\
C(u_1, u_2^{1+\beta};\theta)\,u_2^{-\beta} & \text{if } -1 \le \beta < 0.
\end{cases} \tag{4.3}$$

The corresponding conditional CDFs and copula PDF are as follows:

$$\breve{C}_{2|1}(u_2|u_1;\theta,\beta) = \begin{cases}
(1-\beta)\,C_{2|1}(u_2|u_1^{1-\beta};\theta) + \beta\,C(u_1^{1-\beta}, u_2;\theta)\,u_1^{\beta-1} & \text{if } 0 \le \beta \le 1,\\
C_{2|1}(u_2^{1+\beta}|u_1;\theta)\,u_2^{-\beta} & \text{if } -1 \le \beta < 0.
\end{cases}$$

$$\breve{C}_{1|2}(u_1|u_2;\theta,\beta) = \begin{cases}
C_{1|2}(u_1^{1-\beta}|u_2;\theta)\,u_1^{\beta} & \text{if } 0 \le \beta \le 1,\\
(1+\beta)\,C_{1|2}(u_1|u_2^{1+\beta};\theta) - \beta\,C(u_1, u_2^{1+\beta};\theta)\,u_2^{-\beta-1} & \text{if } -1 \le \beta < 0.
\end{cases}$$

$$\breve{c}(u_1,u_2;\theta,\beta) = \begin{cases}
(1-\beta)\,c(u_1^{1-\beta}, u_2;\theta) + \beta\,C_{1|2}(u_1^{1-\beta}|u_2;\theta)\,u_1^{\beta-1} & \text{if } 0 \le \beta \le 1,\\
(1+\beta)\,c(u_1, u_2^{1+\beta};\theta) - \beta\,C_{2|1}(u_2^{1+\beta}|u_1;\theta)\,u_2^{-\beta-1} & \text{if } -1 \le \beta < 0.
\end{cases}$$

When $0 < \beta < 1$, the copula is skewed towards the bottom-right corner; it has more probability in the $(1,0)$ corner than the $(0,1)$ corner. When $-1 < \beta < 0$, the copula is skewed towards the top-left corner; it has more probability in the $(0,1)$ corner than the $(1,0)$ corner. When $\beta \to 1$ or $\beta \to -1$, Equation 4.3 converges to $C^{\perp}$ in distribution. When $C(\cdot;\theta)$ is the independence copula, the parameter $\beta$ has no effect as the result is still the independence copula $C^{\perp}$.
When $C(\cdot;\theta)$ is comonotonic,

$$\breve{C}(u_1,u_2;\theta,\beta) = \begin{cases}
(u_1^{1-\beta} \wedge u_2)\,u_1^{\beta} & \text{if } 0 \le \beta \le 1,\\
(u_1 \wedge u_2^{1+\beta})\,u_2^{-\beta} & \text{if } -1 \le \beta < 0.
\end{cases}$$

If $0 \le \beta \le 1$, there is no probability density above the curve $u_1^{1-\beta} = u_2$; the probability that a point lies on the curve is $1-\beta$; the density under the curve is uniform. If $-1 \le \beta < 0$, there is no probability density below the curve $u_1 = u_2^{1+\beta}$; the probability that a point lies on the curve is $1+\beta$; the density above the curve is uniform. Figure 4.1 shows scatter plots of random samples in this case for $\beta = 0.5$ and $\beta = -0.5$.

Figure 4.1: Scatter plots of 1000 random samples drawn from $\breve{C}(u_1,u_2;\theta,\beta)$ in Equation 4.3 when $C(\cdot;\theta)$ is comonotonic. Panels: (a) $\beta = 0.5$; (b) $\beta = -0.5$.

The above analysis shows that $\beta$ is not directly interpretable as a skewness parameter, and $\theta$ of the original copula by itself does not indicate the strength of dependence. To understand the range of permutation asymmetry versus central dependence, these measures are plotted in Figure 4.2 for skew-BB1 copulas, i.e., the bivariate copula $C(\cdot;\theta)$ in Equation 4.3 is a BB1 copula:

$$\breve{C}(u_1,u_2;\theta,\delta,\beta) = \begin{cases}
\Big[1 + \big[(u_1^{-\theta(1-\beta)} - 1)^{\delta} + (u_2^{-\theta} - 1)^{\delta}\big]^{1/\delta}\Big]^{-1/\theta} u_1^{\beta} & \text{if } 0 \le \beta \le 1,\\
\Big[1 + \big[(u_1^{-\theta} - 1)^{\delta} + (u_2^{-\theta(1+\beta)} - 1)^{\delta}\big]^{1/\delta}\Big]^{-1/\theta} u_2^{-\beta} & \text{if } -1 \le \beta < 0.
\end{cases}$$

We conduct a grid search of all combinations of parameters $\theta$. The widest range of permutation asymmetry occurs when the strength of central dependence (say, as measured by Spearman's rho) is around 0.5.
This is shown in Figure 4.2 where the lengths of the curves are longest when $\beta$ is between 0.5 and 0.6 (or negative).

Yoshiba (2018) has a numerical implementation of the AC skew-t copula, which involves the multivariate skew-t distribution of Azzalini and Capitanio (2003). A $d$-variate skew-t distribution has the following joint density function at $x \in \mathbb{R}^d$:

$$g(x) = 2\,t_{d,\nu}(x;\Omega)\,T_{1,\nu+d}\!\left(\alpha^{T}x\sqrt{\frac{\nu+d}{x^{T}\Omega^{-1}x+\nu}}\right),$$

where $\alpha \in \mathbb{R}^d$, $t_{d,\nu}(x;\Omega)$ is the $d$-variate Student-t density with the correlation matrix $\Omega$ and the degrees of freedom $\nu$, and $T_{1,\nu+d}$ is the univariate Student-t CDF with degrees of freedom $\nu+d$. An AC skew-t copula is the copula of a multivariate skew-t distribution by applying Sklar's theorem, so the major numerical difficulty for this copula is to get the univariate quantile functions. Figure 4.2 also shows the permutation asymmetry measure and central dependence measure for AC skew-t copulas. It indicates that skew-BB1 copulas cover a wider range of the permutation asymmetry measure than AC skew-t copulas when the dependence measure is greater than 0.5.

Figure 4.2: Comparison of permutation asymmetry measure $G_{P,k=0.2}$ in Section 4.2.4 and central dependence measure Spearman's rho for skew-BB1 and skew-t copulas. For skew-BB1 copulas, the parameter $\beta$ is in the set of 20 equally spaced points in $[-1,1]$. Each red curve in the figure corresponds to a distinct $\beta$ value.

4.4 Conditional dependence with the gamma factor model

In this section, we apply the diagnostic tools for conditional dependence to the gamma factor model, a special case of convolution-closed families described in Section 4.28 of Joe (2014). As mentioned in Stoeber et al. (2013), this is a multivariate model with conditional dependence measures that vary from 0 to 1; hence, any method for assessing the simplifying assumption for vine copulas could make use of this model.
There is more variation in conditional dependence measures in this model than in other multivariate distributions for which we have done computations. See Appendix B for a similar analysis on a trivariate Frank copula model.

The simplifying assumption for conditional distributions of multivariate distributions is not satisfied other than in a few known cases (see Section 3.9.7 of Joe (2014)); in other cases where the conditional dependence measures can be computed, there is much less variation and often the simplifying assumption may be acceptable as an approximation. In cases where we have done computations of Spearman's rho for conditional bivariate distributions from trivariate and 4-variate distributions, the Spearman's rho curve is monotone or U-shaped or unimodal when conditioning on one variable and the Spearman's rho surface is smooth with corners as local maxima or minima when conditioning on two variables.

We show that the conditional dependence measures can be used for moderate sample sizes when conditioning on one variable, but not for two or more variables. The implications of this for application to vine copulas are mentioned at the end of this section.

For the gamma factor model, the marginal and joint distributions have simple stochastic representations, and the conditional distributions can be obtained via one-dimensional numerical integration. Therefore, the copula-based dependence measures can be computed even for conditional distributions. This allows for a comparison of the diagnostic tools proposed in Section 4.2 with the exact conditional dependence measures as a function of the conditioning values, in order to get an idea of the sample size needed to see the patterns.

Suppose $Y_j = Z_0 + Z_j$ for $j = 1, \ldots, d$, where $Z_0, Z_1, \ldots, Z_d$ are independent random variables and $Z_j \sim \mathrm{Gamma}(\theta_j, 1)$ for $j = 0, 1, \ldots, d$.
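The stochastic representation makes the gamma factor model easy to simulate. The sketch below uses only the Python standard library; the function names, the seed handling, and the shape parameters in the usage line are illustrative assumptions. Because $\mathrm{Var}(Y_j) = \theta_0 + \theta_j$ and $\mathrm{Cov}(Y_1, Y_2) = \theta_0$, the Pearson correlation of $(Y_1, Y_2)$ is $\theta_0/\sqrt{(\theta_0+\theta_1)(\theta_0+\theta_2)}$, which gives a quick sanity check on the simulated data.

```python
import random


def simulate_gamma_factor(thetas, n, seed=0):
    """Simulate n rows (Y_1, ..., Y_d) with Y_j = Z_0 + Z_j.

    thetas = (theta_0, theta_1, ..., theta_d) are the gamma shape
    parameters; all Z_j are independent with scale 1.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z0 = rng.gammavariate(thetas[0], 1.0)  # shared factor
        rows.append([z0 + rng.gammavariate(t, 1.0) for t in thetas[1:]])
    return rows


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


rows = simulate_gamma_factor((3.0, 1.0, 1.5, 2.0), 1000)
```

The shared factor $Z_0$ induces positive dependence among all components, and larger $\theta_0$ relative to the $\theta_j$ gives stronger dependence.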
This model has positive dependence and the simplifying assumption for vines is far from holding for any vine structure because any conditional distribution can vary from independence to strong dependence as the values of the conditioning variables vary. For bivariate margins, there is stronger dependence in the joint upper tail than the joint lower tail.

For $d = 3$, we consider the copula of $(Y_1, Y_2)$ given $F_3(Y_3) = x$, denoted by $C_{12;3}(\cdot;x)$, where $F_j$ is the CDF of $Y_j$ and $x \in (0,1)$. Transforming $Y_3$ to $U(0,1)$ before computing the conditional measures produces better estimates with a bandwidth $h_n$ that does not depend on the value of the conditioning variable. For this copula conditioning on one variable, the pattern of the conditional Spearman's rho is that it is increasing from 0 to 1 as $x$ goes from 0 to 1. A similar pattern occurs for the tail-weighted dependence measure $\zeta_\alpha$ but conditionally, there is more dependence in the joint lower tail than in the joint upper tail.

We assess the estimation method on many simulated three-variate datasets. For the representative example in Figure 4.3, the simulation sample size is $n = 1000$ with $(\theta_0,\theta_1,\theta_2,\theta_3) = (3, 1, 1.5, 2)$. The exact $\rho_S(C_{12;3}(\cdot;x))$, $\zeta_{\alpha=5}(C_{12;3}(\cdot;x))$, $\zeta_{\alpha=5}(\hat{C}_{12;3}(\cdot;x))$, and $G_{P,k=0.2}(C_{12;3})$ computed via numerical integration are shown in red dash-dot lines in Figure 4.3. The kernel-smoothed estimates using the Epanechnikov kernel and window size $h_n = 0.2$ are shown in solid dark lines and the bootstrap confidence bands are plotted in dashed dark lines. The plots indicate that the constructed confidence bands are able to detect the increasing trends in the conditional dependence measures.
For the conditional permutation asymmetry measure G_{P,k=0.2}(C_{12;3}), both the exact measure and estimates are close to zero for different values of the conditioning variable. Next, we assess the estimation methods when conditioning on two variables. Consider C_{12;34}(\u00b7;x,y), the copula of (Y_1, Y_2) given F_3(Y_3) = x and F_4(Y_4) = y. The conditional Spearman\u2019s rho is increasing from 0 to 1 as x and y go from 0 to 1, along or near the main diagonal. There is a ridge near the main diagonal that depends on the magnitude of asymmetry in \u03b8_1, . . . , \u03b8_4, and the conditional Spearman\u2019s rho decreases in the directions orthogonal to the ridge. A similar pattern occurs for the conditional tail-weighted dependence measure \u03b6_\u03b1. We conduct a study on a simulated four-variate dataset. For a representative example in Figure 4.4, we have parameters (\u03b8_0, \u03b8_1, \u03b8_2, \u03b8_3, \u03b8_4) = (3, 1, 1.5, 2, 2.5). The sample size is n = 1000. In this case, the conditional Spearman\u2019s rho \u03c1_S(C_{12;34}(\u00b7;x,y)) is a bivariate function. Figure 4.4 shows the exact function in red and the estimated confidence surfaces in blue. Conditional Spearman\u2019s rho near the corners (0,1) and (1,0) is severely underestimated. A much larger sample size of several thousand is needed to accurately capture the shape of the two-dimensional surface. Other non-parametric smoothing methods yield similar estimation results. Figure 4.3: Conditional measures of C_{12;3}(\u00b7;x), the copula of Y_1, Y_2 given F_3(Y_3) = x, for a gamma factor model with parameters (\u03b8_0, \u03b8_1, \u03b8_2, \u03b8_3) = (3, 1, 1.5, 2). Panels: (a) Spearman\u2019s rho \u03c1_S(C_{12;3}); (b) tail-weighted dependence measure (lower tail) \u03b6_{\u03b1=5}(C_{12;3}); (c) tail-weighted dependence measure (upper tail) \u03b6_{\u03b1=5}(C\u0302_{12;3}); (d) permutation asymmetry measure G_{P,k=0.2}(C_{12;3}). The sample size is n = 1000. The red dash-dot lines are the exact conditional measures computed via numerical integration. The dark solid lines and dashed lines are the kernel-smoothed conditional Spearman\u2019s rho and the corresponding 90%-level simultaneous bootstrap confidence bands, using Epanechnikov kernel and window size h_n = 0.2. Figure 4.4: Conditional Spearman\u2019s rho of C_{12;34}(\u00b7;x,y), the copula of Y_1, Y_2 given F_3(Y_3) = x and F_4(Y_4) = y, for a gamma factor model with parameters (\u03b8_0, \u03b8_1, \u03b8_2, \u03b8_3, \u03b8_4) = (3, 1, 1.5, 2, 2.5). The sample size is n = 1000. The red surface is the exact conditional Spearman\u2019s rho computed via numerical integration, and the blue surfaces are the 90%-level simultaneous bootstrap confidence surfaces, using spherically symmetric Epanechnikov kernel and window size h_n = 0.2. What the gamma factor model implies about the simplifying assumption for vine copulas is that, for a sample size of a few hundred to a few thousand, conditional dependence measures as a function of the values of conditioning variables are useful for tree 2 but not for trees 3 and higher. The current algorithms for vine copulas attempt to put pairs with stronger dependence and conditional dependence in lower trees and weaker dependence in higher order trees of truncated vines. In this case, it is especially important to check the validity of the simplifying assumption as an approximation for tree 2. If the simplifying assumption seems acceptable for tree 2, then the assumption may be acceptable in higher-order trees. Otherwise, one could fit copulas in tree 2 with conditional dependence parameters that vary as simple functions of the values of conditioning variables. The above results for the gamma factor model indicate that the conditional dependence measures cannot be used for trees 3 and higher when the sample size is not large. From the results in Section 5.7 of Joe (2014), different bivariate copula models are hard to distinguish when the dependence is weak. 
So it is mainly the vine edges with a stronger dependence where more care is needed in the choice of copula families based on diagnostics. 4.5 Illustrative data examples In this section, we illustrate the diagnostic tools for 3 to 4 variables on two datasets: a hydro-geochemical dataset in Section 4.5.1 and a gene expression dataset in Section 4.5.2. Both datasets have variables that exhibit significant permutation asymmetry and have examples where the conditional Spearman\u2019s rho function is non-constant. We have also applied the diagnostics to financial returns datasets; there seems to be no pattern in the conditional measures when the conditioning variables have strong dependence and the permutation asymmetry measures are rarely significant. Hence, for financial returns datasets, vine or factor copulas with the simplifying assumption and permutation symmetric bivariate copulas should usually be adequate. When the simplifying assumption does not hold for an edge of a vine in tree 2 or higher, a bivariate parametric copula family is chosen with at least one of its parameters varying with the value(s) of the conditioning variable(s). In this case, we say that a non-constant copula family is used. Otherwise, if the simplifying assumption is assumed to hold, we say that a constant copula is used, meaning that the parameters are constant over the value(s) of the conditioning variable(s). Without the simplifying assumption, a multivariate distribution can be decomposed into sets of conditional distributions for all regular vines. With the simplifying assumption, it has been the practice for vine copulas to choose just one vine structure based on a sequential procedure of having edges with the strongest dependence in low-order trees, for example, Dissmann et al. (2013). Then one can truncate the process when additional trees show weak conditional dependence; for example, the method of Brechmann et al. (2012) can be used to get a truncated vine copula when there are many variables. 
For a few variables, truncation would not occur if there is no vine for which conditional dependence becomes weak. In this section, we include examples with three and four variables where we can fit parametric vine copulas for different vines without the simplifying assumption so that they fit roughly the same when compared using Vuong\u2019s procedure (Vuong, 1989). Table 4.1: Empirical tail-weighted dependence measures \u03b6\u0302_{\u03b1=5}, Gaussian tail-weighted dependence measure \u03b6_{\u03b1=5}, permutation asymmetry measures G\u0302_{P,k=0.2}, and fitted copulas in the first tree for the hydro-geochemical dataset. Gaussian \u03b6_{\u03b1=5} is the tail-weighted dependence measure of a bivariate Gaussian copula whose Spearman\u2019s rho is the same as the empirical counterpart. There appears to be reflection asymmetry, permutation asymmetry, and stronger dependence than Gaussian in the joint upper and lower tails. Columns: pair | lower \u03b6\u0302_{\u03b1=5} | upper \u03b6\u0302_{\u03b1=5} | Gaussian \u03b6_{\u03b1=5} | G\u0302_{P,k=0.2} | se(G\u0302_{P,k=0.2}) | fitted copula. Rows: [Co,Sc] | 0.522 | 0.511 | 0.495 | 0.036 | 0.017 | skew-t; [Sc,Ti] | 0.467 | 0.357 | 0.359 | 0.093 | 0.021 | refl. skew-BB1; [Co,Ti] | 0.434 | 0.118 | 0.283 | 0.064 | 0.028 | refl. skew-BB1. 4.5.1 Hydro-geochemical data We use the hydro-geochemical dataset to illustrate the use of the diagnostic tools. The dataset is from the hydro-geochemical stream and sediment reconnaissance (HSSR) project, which is a Department of Energy program to assess the extent of uranium potential in the United States (Cook and Johnson, 1981). It consists of the log-concentrations of seven chemicals in n = 655 water samples collected near Grand Junction, Colorado. We focus on three variables that are the log-concentrations of the elements cobalt (Co), scandium (Sc) and titanium (Ti). The three variables are also used by Acar et al. (2012) and Kraus and Czado (2017b). Figure 4.5 shows the pairwise scatter plot of the normal scores. Acar et al. 
(2012) conduct an analysis which suggests that a non-simplifying vine copula construction is needed. However, their analysis only focuses on the conditional distribution [Co,Sc|Ti] without fitting a vine structure. We demonstrate how to get non-simplifying vine copulas for all three possible vine structures. Figure 4.5: Pairwise scatter plot of the normal scores of variables cobalt (Co), titanium (Ti) and scandium (Sc) in the hydro-geochemical dataset. Based on an initial data analysis and the normal scores plot, there appears to be reflection asymmetry, permutation asymmetry, and stronger dependence than Gaussian in the joint upper and lower tails. Table 4.1 shows the comparison of empirical tail-weighted dependence measures with tail-weighted dependence measures of bivariate Gaussian copulas whose Spearman\u2019s rho are the same as the empirical counterparts. It also shows the permutation asymmetry measures G\u0302_{P,k=0.2} and the corresponding standard errors. Bivariate parametric copula families in the first level or tree are selected from the following candidate families: Gaussian, t, BB1, reflected BB1, skew-Gaussian, skew-t, skew-BB1, and reflected skew-BB1. The best fitting bivariate copulas are shown in Table 4.1. For copulas in the second tree, Figure 4.6 shows the kernel-smoothed conditional Spearman\u2019s rho for all three possible vine structures using Epanechnikov kernel and window size h_n = 0.2, which is close to n^{\u22121\/5}. The dark solid lines and dashed lines are the kernel-smoothed conditional Spearman\u2019s rho and the corresponding 90%-level simultaneous bootstrap confidence bands using 1000 bootstrap samples. The plots suggest that the simplifying assumption does not hold exactly for any of the three vines, but is closer to holding for [Co,Sc|Ti] and [Sc,Ti|Co]. Kraus and Czado (2017b) decide on [Sc,Ti|Co] as the structure being closest to satisfying the simplifying assumption. 
For [Co,Ti|Sc] and [Co,Sc|Ti], the curve of conditional Spearman\u2019s rho is unimodal, so a quadratic parametrization in the conditioning value might be sufficient to capture the shape. For [Sc,Ti|Co], the curve is bimodal; a higher order polynomial is needed to capture the trend. The shapes of the conditional tail-weighted dependence measures are similar to that of the conditional Spearman\u2019s rho. In many cases, a copula parameter \u03d1 is bounded in (\u03d1_L, \u03d1_U), either by definition or for numerical stability. When performing numerical maximum likelihood estimation, a reparametrization can improve convergence. We define a continuous and monotonically increasing function h : R \u2192 (\u03d1_L, \u03d1_U) that maps from the real line to a finite interval: h(x; \u03d1_L, \u03d1_U) = tanh(x)(\u03d1_U \u2212 \u03d1_L)\/2 + (\u03d1_U + \u03d1_L)\/2. We find the following non-constant copula families as best fits, after trying different ways of incorporating the value of the conditioning variable into a conditional dependence parameter. \u2022 The non-constant t copula whose parameter \u03c1 is the composition of the transformation h and a quadratic function of the conditioning variable u: \u03c1 = h(a_2 u^2 + a_1 u + a_0; \u03c1_min, \u03c1_max), where \u03c1_min = \u22121 and \u03c1_max = 1. With this parametrization, a non-constant t copula model has four parameters: a_2, a_1, a_0 and \u03bd. \u2022 The non-constant skew-BB1 copula whose parameters \u03b4 and \u03b8 are each the composition of the transformation h and a quadratic function of the conditioning variable u: \u03b4 = h(a_2 u^2 + a_1 u + a_0; \u03b4_min, \u03b4_max), \u03b8 = h(b_2 u^2 + b_1 u + b_0; \u03b8_min, \u03b8_max), where \u03b4_min = 1, \u03b4_max = 7, \u03b8_min = 0 and \u03b8_max = 7. 
With this parametrization, a non-constant skew-BB1 copula model has seven parameters: \u03b2, a_2, a_1, a_0, b_2, b_1 and b_0. \u2022 The non-constant skew-BB1 copula whose parameter \u03b4 is the composition of the transformation h and a quartic function of the conditioning variable u: \u03b4 = h(a_4 u^4 + a_3 u^3 + a_2 u^2 + a_1 u + a_0; \u03b4_min, \u03b4_max), where \u03b4_min = 1 and \u03b4_max = 7. With this parametrization, a non-constant skew-BB1 copula model has seven parameters: \u03b2, a_4, a_3, a_2, a_1, a_0 and \u03b8. The selected constant and non-constant bivariate copulas are reported in Table 4.2. The red dash-dot lines in Figure 4.6 represent the estimated conditional Spearman\u2019s rho. The estimated curves accurately reflect the trend of the kernel-smoothed conditional Spearman\u2019s rho. Table 4.2: Fitted bivariate copulas in the second tree for the hydro-geochemical dataset. Columns: pair | simplifying | non-simplifying. Rows: [Co,Ti|Sc] | t | t (quadratic \u03c1); [Co,Sc|Ti] | skew-BB1 | skew-BB1 (quartic \u03b4); [Sc,Ti|Co] | skew-BB1 | skew-BB1 (quartic \u03b4). Table 4.3 shows the pairwise comparison of the AICs with and without the simplifying assumption for the three models. The confidence intervals are calculated using the Vuong procedure with the AIC correction (Vuong, 1989). It is clear that the vine copula models with non-constant copulas on the second tree fit better than the corresponding models with constant copulas. If the parametric models fit well the bivariate copulas in the first tree and the copulas of conditional distributions in the second tree, the vine copulas for different vines should be similar. It can be seen from Table 4.3 that the AICs of the three vine copulas are close. Figure 4.6: Conditional Spearman\u2019s rho on the hydro-geochemical dataset. Panels: (a) [Co,Ti|Sc]; (b) [Co,Sc|Ti]; (c) [Sc,Ti|Co]. The dark solid lines and dashed lines are the kernel-smoothed conditional Spearman\u2019s rho and the corresponding 90%-level simultaneous bootstrap confidence bands, using Epanechnikov kernel and window size h_n = 0.2. The red dash-dot lines represent the estimated conditional Spearman\u2019s rho. Table 4.3: Pairwise comparison of vine copula models on the hydro-geochemical dataset. Columns: C-vine 1 | C-vine 2 | simplifying AIC 1 | simplifying AIC 2 | non-simplifying AIC 1 | non-simplifying AIC 2 | CI. Rows: Co\u2013Sc\u2013Ti | Co\u2013Ti\u2013Sc | \u2212869.2 | \u2212865.0 | \u2212899.7 | \u2212876.5 | (\u22120.038, 0.003); Co\u2013Ti\u2013Sc | Sc\u2013Co\u2013Ti | \u2212865.0 | \u2212874.7 | \u2212876.5 | \u2212883.7 | (\u22120.014, 0.024); Sc\u2013Co\u2013Ti | Co\u2013Sc\u2013Ti | \u2212874.7 | \u2212869.2 | \u2212883.7 | \u2212899.7 | (\u22120.011, 0.035). 4.5.2 Glioblastoma tumors dataset The glioblastoma tumors (GBM) dataset is a level-3 gene expression dataset used by Brennan et al. (2013). It is obtained from The Cancer Genome Atlas (TCGA) Data Portal (Tomczak et al., 2015) and contains expression data of 12044 genes from n = 558 tumors. Within all the genes in the dataset, we first filter out 1342 genes that are related to human cell cycle. Afterwards, a hierarchical clustering algorithm with Euclidean distance metric and complete-linkage is applied to obtain a cluster of 92 genes. We pick four consecutive genes that have visible permutation asymmetry: RPL21, RPL22, RPL24 and RPL29, which are hereafter referred to as variables 1 to 4. Figure 4.7 shows the pairwise scatter plot of the normal scores. Table 4.4: Permutation asymmetry measures and AICs of pairs of variables in the GBM dataset. Columns: pair | G\u0302_{P,k=0.2} | se(G\u0302_{P,k=0.2}) | copula symm | AIC symm | copula asymm | AIC asymm. Rows: [12] | 0.072 | 0.028 | BB1 | \u2212105.5 | skew-Gumbel | \u2212112.0; [13] | \u22120.097 | 0.018 | Gaussian | \u2212321.9 | skew-t | \u2212349.6; [14] | 0.060 | 0.023 | t | \u2212232.4 | skew-t | \u2212234.9; [23] | \u22120.128 | 0.024 | Gaussian | \u2212126.0 | skew-t | \u2212156.3; [24] | \u22120.046 | 0.025 | Gumbel | \u2212162.5 | refl. skew-BB1 | \u2212165.4; [34] | 0.115 | 0.020 | BB1 | \u2212544.8 | skew-t | \u2212589.0. 
Table 4.4 shows the pairwise permutation asymmetry measure G\u0302_{P,k=0.2} and the corresponding bootstrap standard errors using 1000 bootstrap samples. Table 4.5 shows the lower and upper tail-weighted dependence measures and the tail-weighted dependence measure of a bivariate Gaussian copula whose Spearman\u2019s rho is the same as the empirical counterpart. All pairs of variables are positively correlated, and some have perceivable permutation asymmetry and reflection asymmetry. We also conduct a similar analysis on other sets of four variables from the dataset; most of them do not exhibit permutation asymmetry and vine copula models with the simplifying assumption appear to be sufficient. Figure 4.7: Pairwise scatter plot of the normal scores in the GBM dataset. Table 4.5: Empirical tail-weighted dependence measures \u03b6\u0302_{\u03b1=5} and Gaussian tail-weighted dependence measure \u03b6_{\u03b1=5}. Columns: pair | lower \u03b6\u0302_{\u03b1=5} | upper \u03b6\u0302_{\u03b1=5} | Gaussian \u03b6_{\u03b1=5}. Rows: [12] | 0.128 | 0.380 | 0.197; [13] | 0.331 | 0.270 | 0.442; [14] | 0.197 | 0.338 | 0.355; [23] | 0.080 | 0.325 | 0.236; [24] | 0.015 | 0.463 | 0.237; [34] | 0.552 | 0.676 | 0.583. Table 4.6: Permutation asymmetry measure and AICs of pairs of variables in tree 2 of D-vine 1342 and C-vine 1234 on the GBM dataset. If the AIC of a non-constant model is worse than a constant model, we report the AIC of the constant model, e.g., [14|3], [23|1] and [24|1]. Columns: pair | G\u0302_{P,k=0.2} | se(G\u0302_{P,k=0.2}) | AIC symm | AIC asymm | AIC asymm & non-const. Rows: [14|3] | 0.140 | 0.030 | \u221213.0 | \u221231.7 | \u221231.7; [23|4] | \u22120.047 | 0.033 | \u221210.9 | \u221215.2 | \u221228.9; [23|1] | \u22120.083 | 0.029 | \u221247.1 | \u221254.2 | \u221254.2; [24|1] | \u22120.010 | 0.029 | \u221277.8 | \u221283.1 | \u221283.1. We fit both bivariate permutation symmetric and asymmetric copulas to all pairs of variables, including Gaussian, Gumbel, Student-t, BB1, skew-normal, skew-Gumbel, skew-t, skew-BB1, and their reflected copulas. 
The Gumbel, skew-Gumbel, BB1 and skew-BB1 and their reflected copulas are flexible in handling asymmetric tail dependence. The AICs are also presented in Table 4.4. For all pairs, permutation asymmetric copulas achieve better AICs than permutation symmetric ones. Furthermore, the pairs that are significantly asymmetric according to both measures, i.e., [13], [23] and [34], have large improvements in AIC. This indicates that the permutation asymmetry measure G\u0302_{P,k=0.2} is informative in identifying permutation asymmetry and guiding the choice of bivariate copula families. The best vine structure selected by Dissmann\u2019s algorithm (Dissmann et al., 2013) is a D-vine with the path 1\u20133\u20134\u20132 as the first tree; we call it the D-vine-1342 model. Based on this D-vine structure, we fit bivariate copulas with simplifying assumptions from the families including Gaussian, Gumbel, Student-t, BB1 and their reflected copulas in the second and third trees. The AIC of the best model is \u22121066.5. The permutation asymmetry measures G\u0302_{P,k=0.2} shown in Table 4.6 suggest that [23|4] is slightly permutation asymmetric and [14|3] is significantly permutation asymmetric. By using skewed bivariate copulas, the AIC is improved to \u22121156.6. Table 4.7: Model AICs for different vine structures on GBM dataset. Columns: model | AIC symm | AIC asymm | AIC asymm & non-const. Rows: D-vine-1342 | \u22121066.5 | \u22121156.6 | \u22121171.0; C-vine-1234 | \u22121055.2 | \u22121171.9 | \u22121171.9. We further assess the simplifying assumption for pairs [14|3] and [23|4] in the second tree of the D-vine-1342 model. Figure 4.8 shows the conditional Spearman\u2019s rho for the two pairs. It indicates that the simplifying assumption is acceptable for [14|3] because the conditional Spearman\u2019s rho is approximately a constant between \u22120.1 and 0.25. However, for [23|4], the conditional Spearman\u2019s rho decreases as the conditioning value u_4 increases. 
Therefore, the bivariate copula model for [23|4] can be further improved by using non-constant copulas. We use the same non-constant bivariate copula families as in Section 4.5.1 because there appears to be asymmetric tail dependence from the bivariate normal scores plots. Model AICs in Table 4.6 also confirm this observation: by adopting non-constant copulas, the AIC improves significantly for [23|4], but deteriorates for [14|3]. Overall, using non-constant copulas improves the AIC to \u22121171.0, as shown in Table 4.7. Similar to the analysis in Section 4.5.1, we fit vine copula models with different vine structures. For four variables, there are 24 possible vine structures in total. We pick a C-vine that has a very different structure from D-vine-1342. The model C-vine-1234 has edges [12], [13] and [14] in the first tree and [23|1], [24|1] in the second tree. Table 4.4 and Table 4.6 show that there is significant permutation asymmetry in the first and second trees. Therefore permutation asymmetric copulas could be used. We further investigate the conditional Spearman\u2019s rho in the second tree for pairs [23|1] and [24|1]. The 90%-level simultaneous bootstrap confidence bands and the model Spearman\u2019s rho are also shown in Figure 4.8. It indicates that the conditional Spearman\u2019s rho is approximately constant for both pairs and constant copulas should be sufficient. We evaluate the model AICs when using constant symmetric copulas, constant asymmetric copulas and non-constant copulas, and show the model AICs for D-vine-1342 and C-vine-1234 models in Table 4.7. For the C-vine-1234 model, the AIC improves significantly from constant symmetric copulas to constant asymmetric copulas, but the model does not improve by using non-constant copulas. This corroborates the conclusions from the diagnostics. Moreover, when using asymmetric and non-constant copulas, both models have similar AICs. Figure 4.8: Conditional Spearman\u2019s rho of pairs [14|3] and [23|4] in the D-vine-1342 model, and [23|1] and [24|1] in the C-vine-1234 model on the GBM dataset. Panels: (a) [14|3]; (b) [23|4]; (c) [23|1]; (d) [24|1]. The dark solid lines and dashed lines are the kernel-smoothed conditional Spearman\u2019s rho and the corresponding 90%-level simultaneous bootstrap confidence bands, using Epanechnikov kernel and window size h_n = 0.2. The red dash-dot lines represent the model conditional Spearman\u2019s rho. For [14|3], the best-fitting model is a constant skewed t-copula. For [23|4], the best-fitting model is a non-constant skewed-BB1 copula (quartic \u03b4). For both [23|1] and [24|1], the best-fitting models are constant reflected skewed-BB1 copulas. 4.6 Conclusion In this chapter, we propose a general framework for estimating the conditional dependence or asymmetry measures as a function of the value(s) of the conditioning variable(s). An algorithm to compute the corresponding confidence bands is also presented. The estimation of the conditional measures can be adapted to other copula-based measures and enrich the diagnostic tools in the future. Since the estimation of the conditional distributions requires a smoothing method, the measure should be a simple function of the copula. The use of dependence and asymmetry measures as diagnostic tools for bivariate copulas and bivariate conditional distributions has been illustrated with real datasets. Diagnostics can guide the choice of candidate bivariate copula families to use in vine copulas. If diagnostics for some edges of a vine suggest positive monotone dependence, reflection asymmetry, permutation asymmetry, and possible asymmetric tail dependence, then one- or two-parameter bivariate copula families are not sufficient; instead, three- or four-parameter bivariate copula families might be needed. 
Moreover, if the dependence measures or asymmetry measures in trees 2 and up are not constant over the conditioning value(s), then non-constant copulas should be considered. The diagnostic measures have been shown to be effective in suggesting appropriate candidate parametric copula families. It is a future research direction to automatically and adaptively generate a shortlist of candidate parametric copula families for edges of a vine copula based on the diagnostic measures. An alternative is a reverse-delete algorithm: start with a long list of bivariate parametric copula families followed by deletion of families that cannot match the diagnostic summaries. Chapter 5 Prediction based on conditional distributions of vine copulas 5.1 Introduction In the context of an observational study, where the response variable Y and the explanatory variables X = (X_1, . . . , X_p) are measured simultaneously, a natural approach is to fit a joint distribution to (X_1, . . . , X_p, Y) assuming a random sample (x_{i1}, . . . , x_{ip}, y_i) for i = 1, . . . , n, and then obtain the conditional distribution of Y given X for making predictions. Observational studies are studies where researchers observe subjects and measure several variables together, and inferences of interest are relationships among the measured variables, including the conditional distribution of Y given other variables when there is a variable Y that one may want to predict from the other variables. In contrast, in experimental studies, the explanatory variables (treatment factors) are controlled for by researchers, and the effect of the non-random explanatory variables is then observed on the experimental units. The inferences of interest may be different for experimental studies. The conditional expectation E(Y | X = x) and conditional quantiles F^{\u22121}_{Y|X}(\u00b7|x) can be obtained from the conditional distribution for out-of-sample point estimates and prediction intervals. 
This becomes the usual multiple regression if the joint distribution of (X, Y) is multivariate Gaussian. Unlike multiple regression, the joint-distribution-based approach uses information on the distributions of the variables and does not specify a simple linear or polynomial equation for the conditional expectation. Nonparametric regression methods are alternatives to multiple regression and do not assume a predetermined form of the predictor (Fan, 1992; H\u00e4rdle, 1990; Stone, 1977). However, they have difficulty in (1) specifying heteroscedasticity, (2) capturing the shapes of the regression function in the extremes, and (3) modeling high-dimensional data due to the curse of dimensionality. Nagler and Czado (2016) apply bivariate kernel density estimation with vines to get around the curse of dimensionality. The joint-distribution-based approach estimates univariate distributions and this is not sufficiently explored in the regression literature but is relevant when all variables are measured together in an observational study. When the explanatory variable is a scalar and continuous (p = 1), the joint distribution of (X, Y) can be modeled using a bivariate parametric copula family. Bernard and Czado (2015) show how different copula families can lead to quite different shapes in the conditional mean function E(Y | X = x) and say that linearity of conditional quantiles is a pitfall of quantile regression. There are applications of bivariate or low-dimensional copulas for regression in Bouy\u00e9 and Salmon (2009) and Noh et al. (2013). However, none of the previous papers link the shape of conditional quantiles to tail properties of the copula family. For the multivariate distribution approach to work for moderate to large dimensions, there are two major questions to be addressed: (A) How to model the joint distribution of (X_1, . . . , X_p, Y) when p is not small and some X_j variables are continuous and others are discrete? 
(B) How to efficiently compute the conditional distribution of Y given X? For question (A), the vine copula or pair-copula construction is a flexible tool in high-dimensional dependence modeling; see Aas et al. (2009); Bedford and Cooke (2002); Brechmann et al. (2012); Dissmann et al. (2013); Joe (2014). The possibility of applying copulas for prediction and regression has been explored, but an algorithm is needed in general for (B) when some variables are continuous and others are discrete. Parsa and Klugman (2011) use a multivariate Gaussian copula to model the joint distribution, and conditional distributions have closed-form expressions. However, Gaussian copulas do not handle tail dependence or tail asymmetry, so can lead to incorrect inferences in the joint tails. Vine copulas are used by Kraus and Czado (2017a) and Schallhorn et al. (2017) for quantile regression, but the vine structure is restricted to a boundary class of vines called the D-vine. A general regular-vine (R-vine) copula is adopted in Cooke et al. (2019), for the case where the response variable and explanatory variables are continuous. Noh et al. (2013) use a non-parametric kernel density approach for conditional expectations, but this can run into sparsity issues as the dimension increases. In this chapter, we propose a method, called vine copula regression, that uses R-vines and handles mixed continuous and discrete variables. That is, the predictor and response variables can be either continuous or discrete. As a result, we have a unified approach for regression and (ordinal) classification. The proposed approach is interpretable, and various shapes of conditional quantiles of y as a function of x can be obtained depending on how pair-copulas are chosen on the edges of the vine. Another contribution is a theoretical analysis of the asymptotic conditional cumulative distribution function (CDF) and quantile function for vine copula regression in Chapter 6. 
This analysis sheds light on the flexible shapes of E(Y | X = x), and provides guidelines on choices of bivariate copulas on the vine to achieve different asymptotic behavior. For example, with the approach of adding polynomial terms to an equation in classical multiple regression, one cannot get monotone increasing E(Y | X = x) functions that flatten out for large values of predictor variables. The remainder of this chapter is organized as follows. Section 5.2 introduces the model fitting and assessment procedure. Section 5.3 describes an algorithm that calculates the conditional CDF of the response variable of a new observation, given a fitted vine copula regression model. The conditional CDF can be further used to calculate the conditional mean and quantile for regression problems, and conditional probability mass function (PMF) for classification problems. 5.2 Model fitting and assessment Due to the decomposition of a joint distribution to univariate marginal distributions and a dependence structure among variables, a two-stage estimation procedure can be adopted. Suppose the observed data are (z_{i1}, z_{i2}, . . . , z_{id}) = (x_{i1}, . . . , x_{ip}, y_i), for i = 1, . . . , n with d = p + 1. 1. Estimate the univariate marginal distributions F\u0302_j, for j = 1, . . . , d, using parametric or non-parametric methods. The corresponding u-scores are obtained by applying the probability integral transform: u\u0302_{ij} = F\u0302_j(z_{ij}). 2. Fit a vine copula on the u-scores. There are two components: vine structure and bivariate copulas. Section 5.2.1 discusses how to choose a vine structure, and Section 5.2.2 presents a bivariate copula selection procedure. 3. Compute some conditional quantiles, with some predictors fixed and others varying, to check if the monotonicity properties are interpretable. 5.2.1 Vine structure learning In this section, we introduce methods for learning or choosing truncated R-vine structures. 
From Kurowicka and Joe (2011), the total number of (untruncated) R-vines in d variables is 2^{(d\u22123)(d\u22122)\/2} (d!\/2). When the dimension d is small, it is possible to enumerate all 2^{(d\u22123)(d\u22122)\/2} (d!\/2) vines and find the best \u2113-truncated R-vine based on some objective functions such as those in Section 6.17 of Joe (2014). However, this is only feasible for d \u2264 8 in practice. Greedy algorithms (Dissmann et al., 2013) and metaheuristic algorithms (Brechmann and Joe, 2014) are commonly adopted to find a locally optimal \u2113-truncated vine. The development of vine structure learning algorithms is an active research topic; various algorithms are proposed based on different heuristics. However, no heuristic method can be expected to be universally the best. The goal of vine copula regression is to find the conditional distribution of the response variable, given the explanatory variables. In general, to calculate the conditional distribution from the joint distribution specified by a vine copula, computationally intensive multidimensional numerical integration is required. This could be avoided if we enforce a constraint on the vine structure such that the node containing the response variable as a conditioned variable is always a leaf node in T_\u2113, \u2113 = 1, . . . , d \u2212 1. When this constraint is satisfied, Algorithm 5.1 computes the conditional CDF without numerical integration. Figure 5.1: First two trees T_1 and T_2 of a vine V. The node set and edge set of T_1 are N(T_1) = {1,2,3,4,5} and E(T_1) = {[12], [23], [24], [35]}. The node set and edge set of T_2 are N(T_2) = E(T_1) = {[12], [23], [24], [35]} and E(T_2) = {[13|2], [25|3], [34|2]}. To construct a truncated R-vine that satisfies the constraint, we can first find a locally optimal t-truncated R-vine using the explanatory variables x1, . . . 
,xp.Then from level 1 to level t, the response variable y is sequentially linked to thenode that satisfies the proximity condition and has the largest absolute (normalscores) correlation with y. The idea of extending an existing R-vine is also ex-plored by Bauer and Czado (2016) for the construction of non-Gaussian condi-tional independence tests. Figures 5.1 and 5.2 demonstrate how to add a responsevariable to the R-vine of the explanatory variables, after each variable has beentransformed to standard normal N(0,1). Given a 2-truncated R-vine V = (T1,T2)in Figure 5.1 with N(T1) = {1, . . . ,5}, E(T1) = N(T2) = {[12], [23], [24], [35]},E(T2) = {[13|2], [25|3], [34|2]}. Suppose the response variable is indexed by 6.The first step is to find the node that has the largest absolute correlation, i.e.argmax1\u2264i\u22646 |\u03c1i6|. Assume \u03c136 is the largest, then node 3 and node 6 are linked:N(T \u20321) = N(T1)\u222a{6}, E(T \u20321) = E(T1)\u222a{[36]}. At level 2, according to the prox-imity condition, node [36] can be linked to either [23] or [35]. So we compare \u03c126;3with \u03c156;3. If we assume |\u03c156;3| > |\u03c126;3|, then E(T \u20322) = E(T2)\u222a{[56|3]}. So thenew 2-truncated R-vine is V \u2032 = (T \u20321,T\u20322), as shown in Figure 5.2.88Figure 5.2: Adding a response variable to the R-vine of the explanatory vari-ables. In this example, variables 1 to 5 represent the explanatory vari-ables and variable 6 represents the response variable. The newly addednodes are highlighted.5.2.2 Bivariate copula selectionAfter fitting the univariate margins and deciding on the vine structure, parametricbivariate copulas can be fitted sequentially from tree 1, tree 2, etc. 
The results in Chapter 6 can provide guidelines on the choice of bivariate copula families in order to match the expected behavior of conditional quantile functions in the extremes of the predictor space.

The decomposition of a bivariate joint PDF described in Section 2.1.1 can be extended to multivariate cases using vine copulas:

f_{1:d}(y_1, ..., y_d) = ∏_{i=1}^{d} f_i(y_i) · ∏_{[jk|S] ∈ E(V)} c̃_{jk;S}(y_j, y_k; y_S). (5.1)

The above representation for the case of absolutely continuous random variables is derived in Bedford and Cooke (2001); its extension to include some discrete variables is in Section 3.9.5 of Joe (2014). For simplicity of notation, we denote F^+_{j|S} = F_{j|S}(y_j | y_S) and F^−_{j|S} = lim_{t↑y_j} F_{j|S}(t | y_S). If it is assumed that the copulas on edges of trees 2 to d − 1 do not depend on the values of the conditioning variables, then c_{jk;S} and c̃_{jk;S} in (5.1) do not depend on y_S; i.e., c_{jk;S}(·) = c_{jk;S}(·; y_S) and c̃_{jk;S}(·) = c̃_{jk;S}(·; y_S). This is called the simplifying assumption.
With the simplifying assumption, we have the following definition of c̃_{jk;S}.

• If Y_j and Y_k are both continuous, then
c̃_{jk;S}(y_j, y_k) := c_{jk;S}(F^+_{j|S}, F^+_{k|S}).
• If Y_j is continuous and Y_k is discrete, then
c̃_{jk;S}(y_j, y_k) := [C_{k|j;S}(F^+_{k|S} | F^+_{j|S}) − C_{k|j;S}(F^−_{k|S} | F^+_{j|S})] / f_{k|S}(y_k | y_S).
• If Y_j is discrete and Y_k is continuous, then
c̃_{jk;S}(y_j, y_k) := [C_{j|k;S}(F^+_{j|S} | F^+_{k|S}) − C_{j|k;S}(F^−_{j|S} | F^+_{k|S})] / f_{j|S}(y_j | y_S).
• If Y_j and Y_k are both discrete, then
c̃_{jk;S}(y_j, y_k) := [C_{jk;S}(F^+_{j|S}, F^+_{k|S}) − C_{jk;S}(F^−_{j|S}, F^+_{k|S}) − C_{jk;S}(F^+_{j|S}, F^−_{k|S}) + C_{jk;S}(F^−_{j|S}, F^−_{k|S})] / [f_{j|S}(y_j | y_S) f_{k|S}(y_k | y_S)].

With the simplifying assumption and parametric copula families, the log-likelihood of the bivariate copula C_{jk;S} on edge [jk|S] ∈ E(V) is

ℓ_{jk;S}(θ_{jk}) = Σ_{i=1}^{n} log c̃_{jk;S}(z_{ij}, z_{ik}; θ_{jk}).

Commonly used model selection criteria include AIC and BIC:

AIC_{jk;S}(θ_{jk}) = −2 ℓ_{jk;S}(θ_{jk}) + 2 |θ_{jk}|,
BIC_{jk;S}(θ_{jk}) = −2 ℓ_{jk;S}(θ_{jk}) + log(n) |θ_{jk}|,

where |θ_{jk}| is the number of copula parameters in c_{jk;S}. For each candidate bivariate copula family on an edge, we first find the maximum likelihood estimate θ̂_MLE of its parameters. Then the copula family with the lowest AIC or BIC is selected. When all the variables are continuous, this selection procedure is the standard approach in VineCopula (Schepsmeier et al., 2018) and was initially proposed and investigated by Brechmann (2010).

5.3 Prediction

This section describes how to predict the conditional distribution of the response variable of a new observation, given a fitted vine copula regression model.
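As an aside on the family selection step of Section 5.2.2 (maximize the log-likelihood of each candidate family, then keep the family with the lowest AIC or BIC), the following is a minimal sketch for continuous pseudo-observations. It is an illustration under stated assumptions, not the VineCopula implementation: only two one-parameter families (Gaussian and MTCJ) are coded, and the names FAMILIES and select_family are hypothetical.

```python
import numpy as np
from scipy import optimize, stats

def gaussian_logpdf(u, v, rho):
    # Log density of the bivariate Gaussian copula with correlation rho.
    x, y = stats.norm.ppf(u), stats.norm.ppf(v)
    r2 = 1.0 - rho ** 2
    quad = rho ** 2 * (x ** 2 + y ** 2) - 2.0 * rho * x * y
    return -0.5 * np.log(r2) - quad / (2.0 * r2)

def mtcj_logpdf(u, v, theta):
    # Log density of the MTCJ (Clayton) copula, theta > 0.
    s = u ** (-theta) + v ** (-theta) - 1.0
    return (np.log1p(theta) - (theta + 1.0) * (np.log(u) + np.log(v))
            - (2.0 + 1.0 / theta) * np.log(s))

# Hypothetical candidate set: name -> (log density, parameter bounds).
FAMILIES = {
    'gaussian': (gaussian_logpdf, (-0.99, 0.99)),
    'mtcj': (mtcj_logpdf, (0.01, 20.0)),
}

def select_family(u, v, criterion='aic'):
    # Return (family name, criterion value, MLE) with the lowest AIC or BIC.
    n, best = len(u), None
    for name, (logpdf, bounds) in FAMILIES.items():
        fit = optimize.minimize_scalar(
            lambda t: -np.sum(logpdf(u, v, t)), bounds=bounds, method='bounded')
        n_par = 1  # every family coded here has a single parameter
        penalty = 2.0 * n_par if criterion == 'aic' else np.log(n) * n_par
        score = 2.0 * fit.fun + penalty  # fit.fun is the negative log-likelihood
        if best is None or score < best[1]:
            best = (name, score, float(fit.x))
    return best
```

With more families, n_par would vary by family, which is where the AIC and BIC penalties start to matter; here both criteria reduce to comparing maximized log-likelihoods.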
We first present an algorithm that computes the conditional CDF of the response variable. If the response variable is continuous, the conditional quantile and mean can be calculated by inverting the conditional CDF and integrating the quantile function. If the response variable is discrete, the conditional PMF can be derived from the conditional CDF via finite differences.

Based on the ideas of the algorithms in Chapter 6 of Joe (2014), Algorithm 5.1 can be applied to an R-vine with mixed continuous and discrete variables. The idea is that, given the structural constraint on the vine structure described in Section 5.2.1, conditional distributions are sequentially computed according to the vine structure, and the conditional distribution of the response variable given all the explanatory variables is obtained in the end. The input is a vine copula regression model with a vine array A = (a_{kj}), a vector of new explanatory variables x = (x_1, ..., x_{d−1})′, and a percentile u ∈ (0,1). The vine array is an efficient and compact way to represent a vine structure; see Section 2.3.2, Kurowicka and Joe (2011), or Joe (2014). The R-vine matrices in the VineCopula package (Schepsmeier et al., 2018) are the vine arrays with backward indexing of rows and columns. The algorithm returns the conditional CDF of the response variable given the explanatory variables evaluated at u, that is, π(u|x) := P(F_Y(Y) ≤ u | X = x). It calculates the conditional distributions C_{j|a_{ℓj}; a_{1j},...,a_{ℓ−1,j}} and C_{a_{ℓj}|j; a_{1j},...,a_{ℓ−1,j}} for ℓ = 1, ..., n_trunc and j = ℓ + 1, ..., d, where n_trunc is the truncation level of the vine copula. For discrete variables, both the left-sided and right-sided limits of the conditional CDF are retained.
In the end, C_{d|a_{d−1,d}; a_{1d},...,a_{d−2,d}} is returned.

If the response variable Y is continuous, then the conditional mean and conditional quantile can be calculated using π(·|x): the α-quantile is F_Y^{−1}(π^{−1}(α|x)), and the conditional mean is

E(Y | X = x) = ∫_0^1 F_Y^{−1}(π^{−1}(α|x)) dα,

where π^{−1}(·|x) is calculated using the secant method, and the integral is computed using Monte Carlo methods or numerical quadrature. If the response variable Y is ordinal, then it is a classification problem; we only need to focus on the support of Y. The conditional CDF is fully specified by π(F_Y(y)|x) = P(Y ≤ y | X = x), where y ∈ {k : P(Y = k) > 0}.

If the response variable Y is nominal, then the proposed method does not apply. An alternative vine-copula-based method is to fit a vine copula model for each class separately and use Bayes' theorem to predict the class label. Specifically, for samples in class Y = k, we fit a vine copula density f̂_{X|Y}(x|k). Let π̂_k be the proportion of samples in class k in the training set. According to Bayes' theorem, the predicted probability that a sample belongs to class k is

f̂_{Y|X}(k|x) = π̂_k f̂_{X|Y}(x|k) / Σ_j π̂_j f̂_{X|Y}(x|j).

This classification rule has been utilized in Nagler and Czado (2016) in an example involving vines with nonparametric pair copula estimation using kernels. Since the distribution of predictors is modeled separately for each class, this alternative method is more flexible but has a high computational cost, especially when the number of classes is large.

5.4 Simulation study

We demonstrate the flexibility and effectiveness of vine copula regression methods by visualizing the fitted models on simulated datasets.
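Before turning to the simulations, the conditional quantile and conditional mean formulas of Section 5.3 can be illustrated with a small sketch. The assumptions are loud ones: a single Gaussian pair copula with parameter RHO stands in for the output π(u|x) of Algorithm 5.1, both margins are taken to be N(0,1), and a bracketing root finder (brentq) replaces the secant method mentioned in the text. The function names are hypothetical.

```python
import numpy as np
from scipy import optimize, stats

RHO = 0.5  # assumed Gaussian pair-copula parameter, for illustration only

def cond_cdf(u, ux, rho=RHO):
    # pi(u | x): conditional CDF of U_Y given U_X = ux under a Gaussian copula.
    z = (stats.norm.ppf(u) - rho * stats.norm.ppf(ux)) / np.sqrt(1.0 - rho ** 2)
    return stats.norm.cdf(z)

def cond_quantile(alpha, ux, inv_margin=stats.norm.ppf, eps=1e-12):
    # F_Y^{-1}(pi^{-1}(alpha | x)); pi is inverted by root finding on (0, 1).
    u = optimize.brentq(lambda t: cond_cdf(t, ux) - alpha, eps, 1.0 - eps)
    return float(inv_margin(u))

def cond_mean(ux, n_nodes=64):
    # E(Y | X = x) as the integral of the conditional quantile over (0, 1),
    # approximated by Gauss-Legendre quadrature mapped from [-1, 1] to [0, 1].
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    alphas = 0.5 * (nodes + 1.0)
    values = np.array([cond_quantile(a, ux) for a in alphas])
    return float(0.5 * np.sum(weights * values))

# Usage: for N(0,1) margins the Gaussian case has the closed form E(Y | X = x) = RHO * x.
ux = stats.norm.cdf(1.0)
print(cond_quantile(0.5, ux), cond_mean(ux))
```

In this Gaussian special case the result can be checked against the linear regression function, which is what makes it a convenient sanity check before using a general fitted vine.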
The simulated datasets havethree variables: X1 and X2 are the explanatory variables and Y is the responsevariable, whereX =(X1X2)\u223c N((00),(1 0.50.5 1))and Y is simulated in three cases with varying conditional expectation and variancestructures. Let U1 =\u03a6(X1) and U2 =\u03a6(X2), where \u03a6 is the standard normal CDF,and \u03b5 be a random error following a standard normal distribution and independentfrom X1 and X2. The three cases are as follows:1. Linear and homoscedastic: Y = 10X1+5X2+10\u03b5 .92Algorithm 5.1 Conditional CDF of the response variable given the explanatoryvariables with which to predict; based on steps from Algorithms 4, 7, 17, 18 inChapter 6 of Joe (2014).Input: Vine array A = (ak j) with a j j = j for j = 1, . . . ,d on the diagonal. u+ =(u+1 , . . . ,u+d ), u\u2212=(u\u22121 , . . . ,u\u2212d ), where u+j =Fj(x j) and u\u2212j =Fj(x\u2212j ) for 1\u2264 j\u2264 d\u22121,u+d = u\u2212d \u2208 [0,1].Output: P(Fd(Xd)\u2264 u+d |X1 = x1, . . . ,Xd\u22121 = xd\u22121).1: Compute M = (mk j) in the upper triangle, where mk j = max{a1 j, . . . ,ak j} for k =1, . . . , j\u22121, j = 2, . . . ,d.2: Compute the I = (Ik j) indicator array as in Algorithm 5 in Joe (2014).3: s+j = u+a1 j ,s\u2212j = u\u2212a1 j ,w+j = u+j ,w\u2212j = u\u2212j , for j = 1, . . . ,d.4: for `= 2, . . . ,ntrunc do5: for j = `, . . . 
,d do6: if I`\u22121, j = 1 then7: if isDiscrete(variable j) then8: v\u2032+j \u2190Ca`\u22121, j j;a1 j ...a`\u22122, j (s+j ,w+j )\u2212Ca`\u22121, j j;a1 j ...a`\u22122, j (s+j ,w\u2212j )w+j \u2212w\u2212j,9: v\u2032\u2212j \u2190Ca`\u22121, j j;a1 j ...a`\u22122, j (s\u2212j ,w+j )\u2212Ca`\u22121, j j;a1 j ...a`\u22122, j (s\u2212j ,w\u2212j )w+j \u2212w\u2212j,10: else11: v\u2032+j \u2190Ca`\u22121, j | j;a1 j ...a`\u22122, j(s+j |w+j ),12: v\u2032\u2212j \u2190Ca`\u22121, j | j;a1 j ...a`\u22122, j(s\u2212j |w+j ),13: end if14: end if15: if isDiscrete(variable a`\u22121, j) then16: v+j \u2190Ca`\u22121, j j;a1 j ...a`\u22122, j (s+j ,w+j )\u2212Ca`\u22121, j j;a1 j ...a`\u22122, j (s\u2212j ,w+j )s+j \u2212s\u2212j,17: v\u2212j \u2190Ca`\u22121, j j;a1 j ...a`\u22122, j (s+j ,w\u2212j )\u2212Ca`\u22121, j j;a1 j ...a`\u22122, j (s\u2212j ,w\u2212j )s+j \u2212s\u2212j,18: else19: v+j \u2190C j|a`\u22121, j ;a1 j ...a`\u22122, j(w+j |s+j ), v\u2212j \u2190C j|a`\u22121, j ;a1 j ...a`\u22122, j(w\u2212j |s+j ),20: end if21: end for22: for j = `+1, . . . ,d do23: if a`, j = m`, j then s+j \u2190 v+m`+1, j , s\u2212j \u2190 v\u2212m`+1, j ,24: else if a`, j < m`, j then s+j \u2190 v\u2032+m`+1, j , s\u2212j \u2190 v\u2032\u2212m`+1, j ,25: end if26: w+j \u2190 v+j , w\u2212j \u2190 v\u2212j ,27: end for28: end for29: Return v+d .932. Linear and heteroscedastic: Y = 10X1+5X2+10(U1+U2)\u03b5 .3. Non-linear and heteroscedastic: Y =U1e1.8U2 +0.5(U1+U2)\u03b5 .We simulate samples with size 2000 in each case with a random split of 1000observations for a training set and a test set. Five methods are considered in thesimulation study: (1) linear regression, (2) linear regression with logarithmic trans-formation of the response variable, (3) quadratic regression, (4) Gaussian copularegression, and (5) vine copula regression. 
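The three simulation designs with two explanatory variables, and the random split into 1000 training and 1000 test observations, can be sketched as follows. This is an illustrative reconstruction from the formulas in the text; the function names and the seeds are assumptions, not part of the original study.

```python
import numpy as np
from scipy.stats import norm

def simulate_case(case, n=2000, seed=0):
    # Draw (X1, X2) from a bivariate normal with correlation 0.5 and
    # generate Y according to simulation case 1, 2, or 3.
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, 0.5], [0.5, 1.0]])
    x = rng.multivariate_normal(np.zeros(2), cov, size=n)
    u1, u2 = norm.cdf(x[:, 0]), norm.cdf(x[:, 1])  # U_j = Phi(X_j)
    eps = rng.standard_normal(n)                   # independent N(0,1) error
    if case == 1:    # linear and homoscedastic
        y = 10 * x[:, 0] + 5 * x[:, 1] + 10 * eps
    elif case == 2:  # linear and heteroscedastic
        y = 10 * x[:, 0] + 5 * x[:, 1] + 10 * (u1 + u2) * eps
    elif case == 3:  # non-linear and heteroscedastic
        y = u1 * np.exp(1.8 * u2) + 0.5 * (u1 + u2) * eps
    else:
        raise ValueError('case must be 1, 2, or 3')
    return x, y

def split_train_test(x, y, n_train=1000, seed=0):
    # Random split into a training set and a test set.
    idx = np.random.default_rng(seed).permutation(len(y))
    tr, te = idx[:n_train], idx[n_train:]
    return (x[tr], y[tr]), (x[te], y[te])

x, y = simulate_case(2)
(train_x, train_y), (test_x, test_y) = split_train_test(x, y)
```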
The Gaussian copula can be consideredas a special case of the vine copula, in which the bivariate copula families on thevine edges are all bivariate Gaussian. Different models are trained on the trainingset and used to obtain the conditional expectations as point predictions and 95%prediction intervals on the test set. For copula regressions, the upper and lowerbounds of the 95% prediction interval are the conditional 97.5% and 2.5% quan-tiles respectively. For the Gaussian and vine copula, the marginal distribution ofY is fitted by the maximum likelihood estimation (MLE) of a normal distributionin case 1. In cases 2 and 3, the distributions of the response variable are skewedand unimodal but not too heavy-tailed. Therefore, we fit 3-parameter skew-normaldistributions. For the vine copula regression, the candidate bivariate copula fami-lies include Student-t, MTCJ, Gumbel, Frank, Joe, BB1, BB6, BB7, BB8, and thecorresponding survival copulas. The bivariate copulas are selected using the AICdescribed in Section 5.2.2. The procedure is replicated 100 times and the averagescores of the replicates are reported in Table 5.1. To evaluate the performance of aregression model, we apply the root-mean-square error (RMSE) and several scoringrules for probabilistic forecasts studied in Gneiting and Raftery (2007), includingthe logarithmic score (LS), quadratic score (QS), interval score (IS), and integratedBrier score (IBS). 
Note that the RMSE is not meaningful if there is heteroscedasticity in conditional distributions; the LS, QS, IS, and IBS assess the predictive distributions with non-constant variance more effectively.

• The root-mean-square error (RMSE) measures a model's performance on point estimation:
RMSE(M) = sqrt{ (1/n_test) Σ_{i=1}^{n_test} (y_i − ŷ_i^M)^2 },
where y_i is the response variable of the i-th sample in the test set, and ŷ_i^M is the predictive conditional expectation of a fitted model M.
• The logarithmic score (LS) is a scoring rule for probabilistic forecasts of continuous variables (Gneiting and Raftery, 2007). It is closely related to the generalization error in the machine learning literature (Chapter 7.2 in Hastie et al. (2009)):
LS(M) = (1/n_test) Σ_{i=1}^{n_test} log f̂^M_{Y|X}(y_i | x_i),
where (x_i, y_i) is the i-th observation in the test set, and f̂^M_{Y|X} is the predictive conditional PDF of model M. For example, if M is a linear regression, then the predictive conditional distribution is a scaled and shifted t-distribution. If M is a vine copula, the predictive conditional distribution can be calculated using the procedure described in Section 5.3.
• The quadratic score (QS) measures the predictive density, penalized by its L^2 norm (Gneiting and Raftery, 2007):
QS(M) = (1/n_test) Σ_{i=1}^{n_test} [ 2 f̂^M_{Y|X}(y_i | x_i) − ∫_{−∞}^{∞} f̂^M_{Y|X}(y | x_i)^2 dy ].
Selten (1998) provides axiomatic characterizations of the quadratic scoring rule in terms of desirable properties.
• The interval score (IS) is a scoring rule for quantile and interval forecasts (Gneiting and Raftery, 2007). In the case of the central (1 − α) × 100% prediction interval, let ℓ̂_i^M and û_i^M be the predictive quantiles at levels α/2 and 1 − α/2 by model M for the i-th test sample. The interval score of model M is
IS(M) = (1/n_test) Σ_{i=1}^{n_test} [ (û_i^M − ℓ̂_i^M) + (2/α)(ℓ̂_i^M − y_i) I{y_i < ℓ̂_i^M} + (2/α)(y_i − û_i^M) I{y_i > û_i^M} ].
Smaller interval scores are better. A model is rewarded for narrow prediction intervals, and it incurs a penalty, the size of which depends on α, if an observation misses the interval.
• The integrated Brier score (IBS) is a scoring rule that is defined in terms of predictive cumulative distribution functions (Gneiting and Raftery, 2007):
IBS(M) = (1/n_test) Σ_{i=1}^{n_test} ∫_{−∞}^{∞} [ F̂^M_{Y|X}(y | x_i) − I{y ≥ y_i} ]^2 dy,
where F̂^M_{Y|X} is the predictive conditional CDF of model M. Smaller integrated Brier scores are better.

The first case serves as a sanity check; if the response variable is linear in the explanatory variables and the conditional variance is constant, the vine copula should behave like linear regression. Figure 5.3a plots the simulated data, the true conditional expectation surface and the true 95% prediction interval surfaces. Figure 5.3b plots the corresponding predicted surfaces. All three surfaces faithfully reflect the linearity of the data. The first three lines of Table 5.1 show that the vine copula and linear regression have similar performance in terms of all five metrics.

The second case adds heteroscedasticity to the first case; that is, the variance of Y increases as X_1 or X_2 increases while the linear relationship remains the same. We expect the conditional expectation surface to be linear. Figure 5.4a and Figure 5.4b show the true and predicted surfaces respectively. The conditional expectation surface is linear and the lengths of prediction intervals increase with X_1 and X_2. The performance measures in Table 5.1 are also consistent with our expectation: the vine copula models have better LS, QS, IS, and IBS, although the RMSE is slightly worse than that of the linear regression model.
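For concreteness, two of the performance measures compared in Table 5.1, the RMSE and the interval score, can be computed as in the sketch below. It follows the interval score of Gneiting and Raftery (2007) with penalty factor 2/α; the function names are hypothetical and the inputs are assumed to be NumPy arrays of equal length.

```python
import numpy as np

def rmse(y, y_hat):
    # Root-mean-square error of point predictions y_hat.
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def interval_score(y, lower, upper, alpha=0.05):
    # Interval score for central (1 - alpha) prediction intervals:
    # interval width plus a 2/alpha penalty for each unit by which
    # an observation falls outside [lower, upper].
    width = upper - lower
    below = (2.0 / alpha) * (lower - y) * (y < lower)
    above = (2.0 / alpha) * (y - upper) * (y > upper)
    return float(np.mean(width + below + above))

y = np.array([1.0, 3.0])
print(rmse(y, np.array([1.0, 1.0])))
print(interval_score(y, np.zeros(2), np.full(2, 2.0), alpha=0.5))
```

A wide interval that always covers and a narrow one that often misses can receive the same score, which is why the measure is informative under heteroscedasticity.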
The logarithmic transformation ofthe response variable does not seem to improve the performance.Finally, the third case incorporates both non-linearity and heteroscedasticity.Since the linear regression obviously cannot fit the non-linear trend, we compareour model to quadratic regression as well. Figure 5.5 shows the true surfacesand the predicted surfaces for the three models. Although the quadratic regres-sion model captures the non-linear trend, it is not flexible enough to model het-eroscedasticity. Another drawback of quadratic regression is that, the conditionalmean y\u02c6 is not always monotonically increasing with respect to x1 and x2, and this96(a) Linear and homoscedastic data,the true surfaces.(b) Linear and homoscedastic data,predicted surfaces by a vine copula re-gression model.Figure 5.3: The linear homoscedastic simulation case. In this fitted vine cop-ula model, C13,C12 and C23;1 are all Gaussian copulas, with parameters\u03c113 = 0.77,\u03c112 = 0.5 and \u03c123;1 = 0.39. The green surfaces represent theconditional expectation, and the red and blue surfaces are the 2.5% and97.5% quantile surfaces, respectively.contradicts the pattern in the data. The vine copula naturally fits the non-linearityand heteroscedasticity pattern. Quantitatively, the quadratic regression model hasthe best RMSE and IS, but vine copula models have the best LS, QS, and IBS, asshown in Table 5.1.We have also conducted a similar simulation study with four explanatory vari-ables X1,X2,X3,X4, whereX =\uf8eb\uf8ec\uf8ec\uf8ec\uf8ec\uf8edX1X2X3X4\uf8f6\uf8f7\uf8f7\uf8f7\uf8f7\uf8f8\u223c N\uf8eb\uf8ec\uf8ec\uf8ec\uf8ec\uf8ed\uf8eb\uf8ec\uf8ec\uf8ec\uf8ec\uf8ed0000\uf8f6\uf8f7\uf8f7\uf8f7\uf8f7\uf8f8 ,\uf8eb\uf8ec\uf8ec\uf8ec\uf8ec\uf8ed1 0.5 0.5 0.50.5 1 0.5 0.50.5 0.5 1 0.50.5 0.5 0.5 1\uf8f6\uf8f7\uf8f7\uf8f7\uf8f7\uf8f8\uf8f6\uf8f7\uf8f7\uf8f7\uf8f7\uf8f8 .The response variable Y is generated from similar three cases:1. 
Linear and homoscedastic: Y = 5(X_1 + X_2 + X_3 + X_4) + 20ε.
2. Linear and heteroscedastic: Y = 5(X_1 + X_2 + X_3 + X_4) + 10(U_1 + U_2 + U_3 + U_4)ε.
3. Non-linear and heteroscedastic: Y = U_1 U_2 e^{1.8 U_3 U_4} + 0.5(U_1 + U_2 + U_3 + U_4)ε.

The results of the simulation study are shown in Table 5.2, the pattern of which is similar to that of Table 5.1.

Figure 5.4: The linear heteroscedastic simulation case. (a) Linear and heteroscedastic data, the true surfaces. (b) Linear and heteroscedastic data, predicted surfaces by a vine copula regression model. In this fitted vine copula model, C_13 is a survival Gumbel copula with parameter δ_13 = 2.21, C_12 is a Gaussian copula with parameter ρ_12 = 0.5, and C_{23;1} is a BB8 copula with parameters ϑ_{23;1} = 3.06, δ_{23;1} = 0.71. The green surfaces represent the conditional expectation, and the red and blue surfaces are the 2.5% and 97.5% quantile surfaces, respectively.

Figure 5.5: The non-linear and heteroscedastic simulation case. (a) Non-linear and heteroscedastic data, the true surfaces. (b) Predicted surfaces by a linear regression model. (c) Predicted surfaces by a quadratic regression model. (d) Predicted surfaces by a vine copula regression model. In this fitted vine copula model, C_13 is a survival BB8 copula with parameters ϑ_13 = 6, δ_13 = 0.78, C_12 is a Gaussian copula with parameter ρ_12 = 0.5, and C_{23;1} is a BB8 copula with parameters ϑ_{23;1} = 6, δ_{23;1} = 0.65. The green surfaces represent the conditional expectation, and the red and blue surfaces are the 2.5% and 97.5% quantile surfaces, respectively.

Case  Model                     RMSE↓         LS↑           QS↑            IS↓           IBS↓
1     Linear reg.               10.01 (0.02)  −3.72 (0.00)  0.028 (0.000)  39.25 (0.09)  5.64 (0.01)
1     Gaussian copula reg.      10.01 (0.02)  −3.72 (0.00)  0.028 (0.000)  39.09 (0.09)  5.64 (0.01)
1     Vine copula reg.          10.01 (0.02)  −3.72 (0.00)  0.028 (0.000)  39.14 (0.09)  5.64 (0.01)
2     Linear reg.               11.19 (0.03)  −3.83 (0.00)  0.028 (0.000)  43.80 (0.14)  6.06 (0.02)
2     Reg. with log-transform   11.71 (0.04)  −3.83 (0.01)  0.031 (0.000)  47.22 (0.30)  6.09 (0.02)
2     Gaussian copula reg.      11.32 (0.03)  −3.75 (0.00)  0.031 (0.000)  41.45 (0.13)  5.95 (0.02)
2     Vine copula reg.          11.38 (0.03)  −3.73 (0.00)  0.033 (0.000)  41.24 (0.12)  5.97 (0.02)
3     Linear reg.               0.77 (0.01)   −1.16 (0.00)  0.388 (0.001)  3.03 (0.00)   0.43 (0.00)
3     Reg. with log-transform   0.69 (0.00)   −0.87 (0.00)  0.540 (0.002)  2.52 (0.01)   0.35 (0.00)
3     Quadratic reg.            0.62 (0.00)   −0.95 (0.00)  0.511 (0.001)  2.43 (0.01)   0.34 (0.00)
3     Gaussian copula reg.      0.69 (0.00)   −0.86 (0.00)  0.604 (0.002)  2.65 (0.01)   0.35 (0.00)
3     Vine copula reg.          0.63 (0.00)   −0.75 (0.00)  0.686 (0.002)  2.50 (0.01)   0.32 (0.00)
Table 5.1: Simulation results for two explanatory variables. The table shows the root-mean-square error (RMSE), logarithmic score (LS), quadratic score (QS), interval score (IS), and integrated Brier score (IBS) in different simulation cases. The arrows in the header indicate that lower RMSE, IS, and IBS, and higher LS and QS, are better. The numbers in parentheses are the corresponding standard errors.

5.5 Application

5.5.1 Abalone data set

In this section, we apply the vine copula regression method on a real data set: the Abalone data set (Lichman, 2013). The data set comes from an original (non-machine-learning) study (Nash et al., 1994). It has 4177 examples, and the goal is to predict the age of abalone from physical measurements; the names of these measurements are in Figure 5.6. The age of abalone is determined by counting the number of rings (Rings) through a microscope, and this is a time-consuming task. Other physical measurements that are easier to obtain are used to predict the age. Rings can be regarded either as a continuous variable or an ordinal one. Thus the problem can be either a regression or a classification problem.
We focus on the subset of 1526 male samples (with two outliers removed). Figure 5.6 shows the pairwise scatter plots, marginal density functions and pairwise correlation coefficients. There is clear non-linearity and heteroscedasticity among the pairs of variables. We discuss the regression problem in Section 5.5.2, and Section 5.5.3 shows the results for the classification problem.

Case  Model                     RMSE↓         LS↑           QS↑            IS↓           IBS↓
1     Linear reg.               20.09 (0.05)  −4.42 (0.00)  0.014 (0.000)  78.53 (0.18)  11.34 (0.03)
1     Gaussian copula reg.      20.09 (0.05)  −4.42 (0.00)  0.014 (0.000)  78.06 (0.18)  11.34 (0.03)
1     Vine copula reg.          20.12 (0.05)  −4.42 (0.00)  0.014 (0.000)  78.18 (0.18)  11.36 (0.03)
2     Linear reg.               22.04 (0.07)  −4.51 (0.00)  0.014 (0.000)  86.25 (0.29)  12.01 (0.04)
2     Reg. with log-transform   22.41 (0.07)  −4.56 (0.01)  0.015 (0.000)  96.02 (0.75)  11.88 (0.03)
2     Gaussian copula reg.      22.11 (0.07)  −4.46 (0.00)  0.015 (0.000)  84.79 (0.27)  11.78 (0.03)
2     Vine copula reg.          22.43 (0.07)  −4.44 (0.00)  0.016 (0.000)  82.42 (0.26)  11.91 (0.04)
3     Linear reg.               1.22 (0.00)   −1.62 (0.00)  0.251 (0.001)  4.80 (0.02)   0.67 (0.00)
3     Reg. with log-transform   1.22 (0.00)   −1.57 (0.00)  0.270 (0.001)  4.73 (0.02)   0.64 (0.00)
3     Quadratic reg.            1.13 (0.00)   −1.54 (0.00)  0.275 (0.001)  4.42 (0.01)   0.62 (0.00)
3     Gaussian copula reg.      1.21 (0.00)   −1.56 (0.00)  0.273 (0.001)  4.68 (0.02)   0.64 (0.00)
3     Vine copula reg.          1.19 (0.00)   −1.50 (0.00)  0.290 (0.001)  4.35 (0.01)   0.63 (0.00)
Table 5.2: Simulation results for four explanatory variables. The table shows the root-mean-square error (RMSE), logarithmic score (LS), quadratic score (QS), interval score (IS), and integrated Brier score (IBS) in different simulation cases. The arrows in the header indicate that lower RMSE, IS, and IBS, and higher LS and QS, are better. The numbers in parentheses are the corresponding standard errors.

Figure 5.6: Pairwise scatter plots of the Abalone dataset.

5.5.2 Regression

In this section, we compare the performance of vine copula and linear regression methods. Three vine regressions are considered:

• R-vine copula regression: the proposed method with the candidate bivariate copula families;
• Gaussian copula regression with R-vine partial correlation parametrization: the proposed method with the bivariate Gaussian copulas only;
• D-vine copula regression: Kraus and Czado (2017a) with the candidate bivariate copula families.

The candidate bivariate copulas include Student-t, MTCJ, Gumbel, Frank, Joe, BB1, BB6, BB7, BB8, and the corresponding survival and reflected copulas.

We perform 100 trials of 5-fold cross validation. Vine copula regressions and linear regression are fitted using the training set, and the test set is used for performance evaluation. All the univariate margins are fitted by skew-normal distributions. The conditional mean and 95% prediction interval are obtained for all models. For copula regressions, the upper and lower bounds of the 95% prediction interval are the conditional 97.5% and 2.5% quantiles respectively.

We consider the out-of-sample performance measures used in Section 5.4: the root-mean-square error (RMSE), logarithmic score (LS), quadratic score (QS), interval score (IS), and integrated Brier score (IBS). Table 5.3 shows the average performance measures from the 100 trials of cross validation. Compared with linear regression, our method has lower prediction errors and better predictive scores. The performance of the R-vine copula model is slightly better than that of the D-vine copula model, in terms of all five scores. The vine array and bivariate copulas on the edges of the R-vine from one round of cross-validation are shown in Table 5.4. Figure 5.7 gives a visualization of the R-vine array.
Several of the copulas linking to the response variable in trees 2 to 7 represent weak negative dependence. The fitted D-vine regression model has path Diameter–VisceraWeight–WholeWeight–ShuckedWeight–ShellWeight–Rings in the first level of the D-vine structure.

Figure 5.7: Visualization of the R-vine array in Table 5.4.

Model                  RMSE↓   LS↑      QS↑     IS↓     IBS↓
Linear reg.            2.272   −2.240   0.138   8.909   1.232
Gaussian copula reg.   2.287   −2.142   0.152   8.276   1.208
D-vine copula reg.     2.183   −2.064   0.163   8.104   1.141
R-vine copula reg.     2.168   −2.057   0.164   8.005   1.136
Table 5.3: Comparison of the performance of vine copula regressions and linear regression. The numbers are the average scores over 100 trials of 5-fold cross validation. The scoring rules are defined in Section 5.4.

We have applied the diagnostic tools for asymmetry and the simplifying assumption mentioned in Chapter 4 to the second tree of the SeqMST output. The simplifying assumption seems valid. We have also conducted monotonicity checks of the predicted conditional median based on the fitted R-vine model. Four of the linking copulas in trees 2 to 7 (last column of the right-hand side of Table 5.4) represent conditional negative dependence given the variables previously linked to the response variable. This means that the conditional median function is not always monotone increasing in an explanatory variable when the others are held fixed. However, when all explanatory variables are increasing together (for larger abalone), the conditional median is increasing. This property is similar to classical Gaussian regression with positively correlated explanatory variables and the existence of negative regression coefficients because of some negative partial correlations.
Evenwith some negative conditional dependence, there is overall better out-of-sampleprediction performance by keeping all of the explanatory variables in the model.We also did some numerical checks on the conditional quantiles when oneexplanatory variable becomes extreme and other variables are held fixed. It appearsthat the behavior is close to asymptotically constant. From the linking copulas inTable 5.4 and the results in Chapter 6, we would not be expecting asymptotic linearbehavior (and this is reasonable from the context of the variables).Figure 5.8 visualizes the prediction performance of the three methods on thefull dataset. The plots show the residuals against the fitted values on the test set,and the prediction intervals. Due to heteroscedasticity, there is more variation inresiduals as fitted value increases. However, linear regression fails to capture theheteroscedasticity and the prediction intervals are roughly of the same length. Vinecopula regression gives wider (narrower) prediction intervals when the fitted valuesare larger (smaller). This illustrates the reason why our method overall has moreprecise prediction intervals.1044 4 4 4 4 7 1 77 7 7 5 4 4 45 5 7 6 5 56 6 5 7 61 1 6 33 3 12 28- BB6.s BB6.s BB6.s BB1.s Gumbel.s BB6.s Gumbel.s- - t Joe.v BB8.s t BB8.s BB8.u- - - t Frank Frank BB8.v BB8.u- - - - Frank t Frank MTCJ.v- - - - - t Frank t- - - - - - Gumbel t- - - - - - - Gumbel.u- - - - - - - -Table 5.4: Vine array and bivariate copulas of the R-vine copula regres-sion fitted on the full dataset. The variables are (1) Length, (2)Diameter, (3) Height, (4) WholeWeight, (5) ShuckedWeight,(6) VisceraWeight, (7) ShellWeight, (8) Rings. 
A suffix of \u2018s\u2019represents survival version of the copula family to get the opposite direc-tion of joint tail asymmetry; \u2018u\u2019 and \u2018v\u2019 represent the copula family withreflection on the first and second variable respectively to get negative de-pendence.5.5.3 ClassificationThe response variable Rings is an ordinal variable that ranges from 3 to 27.Therefore this is a multiclass classification problem. Although our method canhandle multiclass classification problems, we reduce it to a binary classificationproblem for easy comparison with commonly used methods, including logisticregression, support vector machine (SVM), and random forest (RF). The samplemedian of Rings is 10; if a sample\u2019s Rings is greater than 10, we label it as\u2018large\u2019, otherwise \u2018small\u2019. All the predictor variables are fitted by skew-normaldistributions, and we fit an empirical distribution to the response variable Rings.The D-vine regression method (Kraus and Czado, 2017a) can only handle con-tinuous variables and is not directly applicable to the classification problem. In or-der to compare our method with the D-vine based method, we first treat the binaryresponse variable as a continuous variable (0 and 1) and use the D-vine regressionmethod (Kraus and Czado, 2017a) to find a D-vine structure or an ordering of vari-ables. Then an R-vine regression model is fitted on that D-vine structure using ourmethod.105Figure 5.8: Residual vs. fitted value plots. The red and blue points corre-spond to the lower bound and upper bound of the prediction intervals.For binary classifiers, the performance can be demonstrated by a receiver oper-ating characteristic (ROC) curve. The curve is created by plotting the true positiverate against the false positive rate at various threshold settings. The (0,1) point cor-responds to a perfect classification; a completely random guess would give a pointalong the diagonal line. 
An ROC curve is a two-dimensional depiction of classifierperformance. To compare classifiers we may want to reduce ROC performance toa scalar value representing the expected performance. A common method is tocalculate the area under the curve (AUC) (Fawcett, 2006). The AUC can also beinterpreted as the probability that a classifier will rank a randomly chosen positiveinstance higher than a randomly chosen negative one. Therefore, larger AUC isbetter. Figure 5.9a shows sample ROC curves of different binary classifiers and thecorresponding AUCs.106(a) ROC curves of different binary clas-sifiers. The performance is evaluated on thetest set.(b) Box plot of the AUCs based on 10-fold cross-validation, repeated 20 times.Figure 5.9: Comparison of the performance on the classification problem.Repeated 10-fold cross-validation with random partitions is used to assess theperformance. In each pass, 10-fold cross-validation is performed and the averageAUC is recorded. Figure 5.9b shows a box plot of the average AUCs. The perfor-mance of vine copula regression is marginally better than the other methods. Theaverage AUCs are: RVineReg = 0.835, DVineReg = 0.826, SVM = 0.825, Logisti-cReg = 0.814, RF = 0.811.5.6 ConclusionOur vine copula regression method uses R-vines and can fit mixed continuous andordinal variables. The prediction algorithm can efficiently compute the conditionaldistribution given a fitted vine copula, without marginalizing the conditioning vari-ables. The performance of the proposed method is evaluated on simulated data setsand the Abalone data set. The heteroscedasticity in the data is better captured byvine copula regression than the standard regression methods.One potential drawback of the proposed method is the computational cost forhigh-dimensional data, especially when the dimensionality is greater than the sam-ple size. This chapter is more of a proof of concept of using R-vine copula models107for regression and classification problems. 
Therefore, we evaluate the performance of the proposed methods on classical cases and compare with models such as linear regression. Another drawback is the constraint on the vine structure such that the response variable is always a leaf node at each level. This constraint greatly reduces the computational complexity; without it, numerical integration would be required to compute the conditional CDF. Finally, the criticism of copula-based regression by Dette et al. (2014) also applies. The proposed method assumes a monotonic relationship between the response variable and explanatory variables.

To relate how choices of bivariate copula families in the vine can affect prediction and to provide guidelines on bivariate copula families to consider, we give a theoretical analysis of the asymptotic shape of conditional quantile functions. For bivariate copulas, the conditional quantile function of the response variable could be asymptotically linear, sublinear, or constant with respect to the explanatory variable. It turns out the asymptotic conditional distribution can be quite complex for trivariate and higher-dimensional cases, and there are counter-intuitive examples. In practice, we recommend computing conditional quantile functions of the fitted vine copula to assess if the monotonicity properties are reasonable.

One possible future research direction is the extension of the proposed regression method for survival outcomes with censored data. For example, Emura et al. (2018) use bivariate copulas to predict time-to-death given time-to-cancer progression; Barthel et al. (2018) apply vine copulas to multivariate right-censored event time data. They apply copulas to the joint survival function instead of the joint CDF to deal with right-censoring.
These types of applications would require more numerical integration methods.

Another research direction is to handle variable selection and reduction when there are many explanatory variables, some of which might form clusters with strong dependence. Traditional variable selection methods for regression can also be applied, for example, the forward selection approach. Moreover, recent papers proposed methods for learning sparse vine copula models (Müller and Czado, 2019; Nagler et al., 2019), which can potentially be used as a variable selection method for copula regression.

Chapter 6
Theoretical results on shapes of conditional quantile functions

6.1 Introduction

From the properties of the multivariate normal distribution, if $(X_1,\ldots,X_p,Y)$ follows a multivariate normal distribution, then the conditional quantile function of $Y \mid X_1,\ldots,X_p$ has the linear form
\[
F^{-1}_{Y|X_1,\ldots,X_p}(\alpha|x_1,\ldots,x_p) = \beta_1 x_1 + \cdots + \beta_p x_p + \Phi^{-1}(\alpha)\sqrt{1-R^2_{Y;X_1,\ldots,X_p}}, \quad 0<\alpha<1,
\]
where $R_{Y;X_1,\ldots,X_p}$ is the multiple correlation coefficient. Going beyond the normal distribution, we address the following question in this chapter: how does the choice of bivariate copulas in a vine copula regression model affect the conditional quantile function, especially when the explanatory variables are large (in absolute value)? For comparisons with multivariate normal, we assume the variables $X_1,\ldots,X_p,Y$ have been transformed so that they have marginal $N(0,1)$ distributions. In this case, plots from vine copulas with one or two explanatory variables can show conditional quantile functions that are close to linear in the middle, and asymptotically linear, sublinear or constant along different directions to $\pm\infty$; Bernard and Czado (2015) have several figures that show this pattern for the case of one explanatory variable.
Such behavior cannot be obtained with regression equations that are linear in the $\beta$’s and is hard to obtain with nonlinear regression functions that are directly specified.

We start with the bivariate case (one explanatory variable) in Section 6.2. Conditions are obtained to classify the asymptotic behavior of the conditional quantile function into four categories: strongly linear, weakly linear, sublinear and asymptotically constant. For bivariate Archimedean copulas, the conditions are related to conditions on the Laplace transform generator, as shown in Section 6.3. Section 6.4 studies the trivariate case $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2)$ with a trivariate vine copula. However, extending from bivariate to trivariate is challenging: the asymptotic conditional quantile depends on the direction in which $(x_1,x_2)$ go to infinity. Section 6.4.1 analyzes the strongest possible dependence in the trivariate case: a functional relationship between $Y$ and $(X_1,X_2)$. It is difficult if not impossible to obtain a general result, given that the marginal distribution of $Y$ does not have a closed-form expression in general. We focus on a special case where the marginal distribution of $Y$ can be calculated. It is shown that the conditional quantile function is asymptotically linear in $x_1$ or $x_2$ along a ray on the $(x_1,x_2)$-plane, and this is an extension of the bivariate strongly linear case. Sections 6.4.2 and 6.4.3 study the asymptotic conditional CDF for a trivariate vine copula with bivariate Archimedean copulas and standard normal margins; the Archimedean assumption allows for some tractable results to be obtained. We give a classification of the conditional CDF based on the tail dependence properties of bivariate Archimedean copulas. Section 6.4.3 considers several special cases with different combinations of bivariate Archimedean copulas on the edges of a trivariate vine, and shows a number of different tail behaviors.
Section 6.5 briefly discusses the possibility of generalizing the results to higher dimensions.

6.2 Bivariate asymptotic conditional quantile

In this section, we focus on a bivariate random vector $(X,Y)$ with standard normal margins. Let $C(u,v)$ be the copula; then the joint CDF is $F_{X,Y}(x,y) = C(\Phi(x),\Phi(y))$. The copula $C(u,v)$ is assumed to have positive dependence. We are interested in the shape of the conditional CDF $F_{Y|X}(y|x)$ and conditional quantile $F^{-1}_{Y|X}(\alpha|x)$, when $x$ is extremely large or small and $\alpha\in(0,1)$ is fixed. Bernard and Czado (2015) study a few special cases for bivariate copulas. Our results are more extensive in relating the shape of asymptotic quantiles to the strength of dependence in the joint tail.

If the conditional distribution $C_{V|U}(\cdot|u)$ converges to a continuous distribution with support on $[0,1]$ as $u\to 0^+$, then $C^{-1}_{V|U}(\alpha|0) > 0$ for $\alpha\in(0,1)$. Therefore, $F^{-1}_{Y|X}(\alpha|x)$ levels off as $x\to-\infty$. The same argument applies when $x\to+\infty$. That is,
\[
\lim_{x\to-\infty} F^{-1}_{Y|X}(\alpha|x) = \Phi^{-1}\bigl(C^{-1}_{V|U}(\alpha|0)\bigr); \qquad \lim_{x\to+\infty} F^{-1}_{Y|X}(\alpha|x) = \Phi^{-1}\bigl(C^{-1}_{V|U}(\alpha|1)\bigr).
\]

If $C_{V|U}(\cdot|u)$ converges to a degenerate distribution at 0 when $u\to 0^+$, then $\lim_{u\to 0^+} C^{-1}_{V|U}(\alpha|u) = 0$. To study the shape of $F_{Y|X}(y|x)$ when $x$ is very negative, we need to further investigate the rate at which $C^{-1}_{V|U}(\alpha|u)$ converges to 0. The next proposition summarizes the possibilities.

Proposition 6.1.
Let $(X,Y)$ be a bivariate random vector with standard normal margins and a positively dependent copula $C(u,v)$.
• (Lower tail) Fixing $\alpha\in(0,1)$, if $-\log C^{-1}_{V|U}(\alpha|u) \sim k_\alpha(-\log u)^\eta$ as $u\to 0^+$, $k_\alpha>0$, then $F^{-1}_{Y|X}(\alpha|x) \sim -(2^{1-\eta}k_\alpha)^{1/2}|x|^\eta$ as $x\to-\infty$.
• (Upper tail) Fixing $\alpha\in(0,1)$, if $-\log[1-C^{-1}_{V|U}(\alpha|u)] \sim k_\alpha[-\log(1-u)]^\eta$ as $u\to 1^-$, $k_\alpha>0$, then $F^{-1}_{Y|X}(\alpha|x) \sim (2^{1-\eta}k_\alpha)^{1/2}x^\eta$ as $x\to+\infty$.

Proof. We use the following asymptotic results from Abramowitz and Stegun (1964):
\[
1-\Phi(z) \sim \frac{\phi(z)}{z} = \frac{1}{\sqrt{2\pi}\,z}e^{-z^2/2}, \quad z\to+\infty; \qquad \Phi^{-1}(p) \sim (-2\log(1-p))^{1/2}, \quad p\to 1^-;
\]
\[
\Phi(z) \sim -\frac{\phi(z)}{z} = \frac{1}{\sqrt{2\pi}\,|z|}e^{-z^2/2}, \quad z\to-\infty; \qquad \Phi^{-1}(p) \sim -(-2\log p)^{1/2}, \quad p\to 0^+.
\]
Under the lower tail assumption,
\[
\Phi^{-1}\bigl(C^{-1}_{V|U}(\alpha|u)\bigr) \sim -\bigl(-2\log C^{-1}_{V|U}(\alpha|u)\bigr)^{1/2} \sim -\bigl(2k_\alpha(-\log u)^\eta\bigr)^{1/2}. \tag{6.1}
\]
When $u = \Phi(x) \sim \phi(x)/|x|$,
\[
-\log u \sim -\log\phi(x) + \log|x| \sim \tfrac{1}{2}x^2 + \log|x| \sim \tfrac{1}{2}x^2. \tag{6.2}
\]
Combining Equation 6.1 and Equation 6.2, we obtain the asymptotic conditional quantile function as $x\to-\infty$,
\[
F^{-1}_{Y|X}(\alpha|x) = \Phi^{-1}\bigl(C^{-1}_{V|U}(\alpha|\Phi(x))\bigr) \sim -\bigl(2k_\alpha(x^2/2)^\eta\bigr)^{1/2} = -(2^{1-\eta}k_\alpha)^{1/2}|x|^\eta.
\]
The proof of the second part is similar and thus omitted.

Note that the positive dependence assumption implies $k_\alpha>0$ so that the conditional quantiles are asymptotically increasing at the two extremes. A related result can be obtained for negative dependence, but it is of less interest since, in general, one tries to orient variables to have positive dependence with each other.
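As a quick numerical sanity check on the tail expansion $\Phi^{-1}(p) \sim -(-2\log p)^{1/2}$ used in the proof, the following sketch compares the exact normal quantile with its leading-order approximation. It relies only on the Python standard library; the function name `lower_tail_ratio` is an illustrative choice and is not part of the thesis.

```python
import math
from statistics import NormalDist


def lower_tail_ratio(p):
    # Exact standard normal quantile at p (p close to 0).
    exact = NormalDist().inv_cdf(p)
    # Leading-order approximation -(-2 log p)^(1/2) from the proof.
    approx = -math.sqrt(-2.0 * math.log(p))
    return exact / approx


# The ratio should move towards 1 as p decreases.
for p in (1e-3, 1e-10, 1e-50, 1e-200):
    print(p, lower_tail_ratio(p))
```

The ratio approaches 1 only slowly, which is what one would expect from the logarithmic term that is dropped in Equation 6.2; the asymptotic shapes in Proposition 6.1 should therefore be read as statements about very extreme values of $x$.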
Here $\eta$ indicates the strength of the relation between the two variables in the tail; a larger $\eta$ value corresponds to a stronger relation. The strongest possible comonotonic dependence is when $Y = X$, and the conditional quantile function is $F^{-1}_{Y|X}(\alpha|x) = x$, which is linear in $x$ and does not depend on $\alpha$; in this case, $\eta = 1$. The weakest possible positive dependence is when $X$ and $Y$ are independent, and $F^{-1}_{Y|X}(\alpha|x) = F^{-1}_Y(\alpha)$ does not depend on $x$; in this case, $\eta = 0$. Based on the value of $\eta$, the asymptotic behavior of the conditional quantile function can be classified into the following categories:
1. Strongly linear: $\eta = 1$ and $k_\alpha = 1$. $F^{-1}_{Y|X}(\alpha|x)$ goes to infinity linearly, and it does not depend on $\alpha$. It has stronger dependence than bivariate normal.
2. Weakly linear: $\eta = 1$, $k_\alpha$ can depend on $\alpha$ and $0 < k_\alpha < 1$. $F^{-1}_{Y|X}(\alpha|x)$ goes to infinity linearly and it depends on $\alpha$. It has comparable dependence with bivariate normal.
3. Sublinear: $0 < \eta < 1$. $F^{-1}_{Y|X}(\alpha|x)$ goes to infinity sublinearly. The dependence is weaker than bivariate normal.
4. Asymptotically constant: $\eta = 0$. $F^{-1}_{Y|X}(\alpha|x)$ converges to a finite constant. Asymptotically it behaves like independence.

Figure 6.1: Conditional quantile functions for bivariate copulas with Kendall’s $\tau = 0.5$, combined with $N(0,1)$ margins. Quantile levels are 20%, 40%, 60% and 80%.

Figure 6.1 shows the conditional quantile functions for bivariate copulas with different $\eta$ in the upper and lower tails. Examples 6.1 and 6.2 derive the conditional quantile functions for bivariate MTCJ and Gumbel copulas. Note that $\eta$ is constant over $\alpha$ for several commonly used parametric bivariate copula families. However, there are cases where $\eta$ depends on $\alpha$.
For example, the boundary conditional distribution of the bivariate Student-t copula has mass at both 0 and 1; depending on the value of $\alpha$, $C^{-1}_{V|U}(\alpha|u)$ could go to either 0 or 1, as $u\to 0$.

Example 6.1. (MTCJ lower tail) The bivariate MTCJ copula is defined in Section 2.1. The conditional quantile function is
\[
C^{-1}_{V|U}(\alpha|u;\delta) = \bigl[(\alpha^{-\delta/(1+\delta)}-1)u^{-\delta}+1\bigr]^{-1/\delta} \sim (\alpha^{-\delta/(1+\delta)}-1)^{-1/\delta}\,u, \quad u\to 0; \ \delta>0.
\]
Taking the log of both sides, $-\log C^{-1}_{V|U}(\alpha|u;\delta) \sim -\log u$, so that $\eta = 1$ and $k_\alpha = 1$. By Proposition 6.1, we have $F^{-1}_{Y|X}(\alpha|x) \sim x$, as $x\to-\infty$. To apply the next proposition to get the same conclusion, the generator is the gamma Laplace transform $\psi(s) = (1+s)^{-1/\delta}$.

Example 6.2. (Gumbel lower tail) The bivariate Gumbel copula is defined in Section 2.1. The conditional CDF is
\[
C_{V|U}(v|u;\delta) = u^{-1}\exp\bigl\{-[(-\log u)^\delta+(-\log v)^\delta]^{1/\delta}\bigr\}\Bigl[1+\Bigl(\frac{-\log v}{-\log u}\Bigr)^{\delta}\Bigr]^{1/\delta-1}, \quad \delta>1.
\]
The conditional quantile function $C^{-1}_{V|U}(\alpha|u;\delta)$ does not have a closed-form expression; it has the following asymptotic expansion:
\[
-\log C^{-1}_{V|U}(\alpha|u;\delta) \sim (-\delta\log\alpha)^{1/\delta}(-\log u)^{1-1/\delta}, \quad u\to 0.
\]
By Proposition 6.1 with $\eta = 1-1/\delta$ and $k_\alpha = (-\delta\log\alpha)^{1/\delta}$, we have $F^{-1}_{Y|X}(\alpha|x) \sim -(-2\delta\log\alpha)^{1/(2\delta)}|x|^{1-1/\delta}$, as $x\to-\infty$. To apply the next proposition to get the same conclusion, the generator is the positive stable Laplace transform $\psi(s) = \exp\{-s^{1/\delta}\}$.

6.3 Bivariate Archimedean copula boundary conditional distributions

This section proves a proposition on the relationship between the tail dependence behavior of a bivariate Archimedean copula and its tail conditional distribution and quantile functions.
This proposition is used in Section 6.4.2.

A bivariate Archimedean copula with Laplace transform generator $\psi$ can be constructed as $C(u,v) = \psi(\psi^{-1}(u)+\psi^{-1}(v))$, where $u,v\in[0,1]$, $\psi(\infty) = 0$, $\psi(0) = 1$, and $\psi$ is non-increasing and convex. It is the CDF of a random vector $(U,V)$. The corresponding conditional distribution $P(V\le v \mid U=u)$ is
\[
C_{V|U}(v|u) = \frac{\partial C(u,v)}{\partial u} = \frac{\psi'(\psi^{-1}(u)+\psi^{-1}(v))}{\psi'(\psi^{-1}(u))}.
\]

We study the limit of the conditional distribution $C_{V|U}(v|u)$ as $u\to 0$ (or 1), where $v$ could be a number in $(0,1)$ or $v\to 0$ (or 1) as well. The limit depends on the lower (upper) tail behavior of the copula, and the rate at which $u$ and $v$ go to 0 (or 1). The rate is characterized on the normal scale: we assume $u,v\to 0$ (or 1) and $\Phi^{-1}(u)/\Phi^{-1}(v)$ converges to a constant. In other words, if $X = \Phi^{-1}(U)$ and $Y = \Phi^{-1}(V)$, we study the conditional distribution $P(Y\le y \mid X=x)$ as $x,y\to+\infty$ (or $-\infty$) and $x/y$ converges to a constant.

For Archimedean and survival Archimedean copulas, the following proposition provides a link between tail dependence behavior and tail conditional distribution and quantile functions. The proof of the proposition is given in Sections 6.3.1 and 6.3.2.

Proposition 6.2. Given the generator or Laplace transform (LT) $\psi$ of an Archimedean copula, we assume the following.
1. For the upper tail of $\psi$, as $s\to+\infty$,
\[
\psi(s) \sim T(s) = a_1 s^q\exp(-a_2 s^r) \quad\text{and}\quad \psi'(s) \sim T'(s), \tag{6.3}
\]
where $a_1>0$, $r=0$ implies $a_2=0$ and $q<0$, and $r>0$ implies $r\le 1$ and $q$ can be 0, negative or positive.
2. For the lower tail of $\psi$, as $s\to 0^+$, there is $M\in(k,k+1)$ such that
\[
\psi(s) = \sum_{i=0}^{k}(-1)^i h_i s^i + (-1)^{k+1}h_{k+1}s^M + o(s^M), \quad s\to 0^+, \tag{6.4}
\]
where $h_0 = 1$ and $0<h_i<\infty$ for $i = 1,\ldots,k+1$.
If $0<M<1$, then $k=0$.

Then we have the following.
• (Lower tail) If $v\in(0,1)$ and $\alpha\in(0,1)$ are fixed, then as $u\to 0$,
\[
C_{V|U}(v|u) \sim
\begin{cases}
1+(q-1)\psi^{-1}(v)\left(\dfrac{u}{a_1}\right)^{-1/q} \to 1 & \text{if } r=0,\\[1ex]
1-a_2^{1/r}\,r\,\psi^{-1}(v)(-\log u)^{1-1/r} \to 1 & \text{if } 0<r<1,\\[1ex]
\text{const}\in(0,1) & \text{if } r=1.
\end{cases} \tag{6.5}
\]
\[
C^{-1}_{V|U}(\alpha|u) \sim
\begin{cases}
(\alpha^{1/(q-1)}-1)^q\cdot u \to 0 & \text{if } r=0,\\[1ex]
\exp\left[-\left(\dfrac{-\log\alpha}{r}\right)^{r}(-\log u)^{1-r}\right] \to 0 & \text{if } 0<r<1,\\[1ex]
\text{const}\in(0,1) & \text{if } r=1.
\end{cases} \tag{6.6}
\]
• (Upper tail) If $v\in(0,1)$ and $\alpha\in(0,1)$ are fixed, then as $u\to 1$,
\[
C_{V|U}(v|u) \sim
\begin{cases}
-\dfrac{\psi'(\psi^{-1}(v))}{h_1^{1/M}M}(1-u)^{(1-M)/M} \to 0 & \text{if } 0<M<1,\\[1ex]
\text{const}\in(0,1) & \text{if } M>1.
\end{cases} \tag{6.7}
\]
\[
C^{-1}_{V|U}(\alpha|u) \sim
\begin{cases}
1-(\alpha^{1/(M-1)}-1)^M(1-u) \to 1 & \text{if } 0<M<1,\\[1ex]
\text{const}\in(0,1) & \text{if } M>1.
\end{cases} \tag{6.8}
\]

Note that we do not cover the case of $M=1$ for the upper tail because it involves a slowly varying function. Combined with Proposition 6.1, it states that, for the lower tail, the three cases $r=0$, $0<r<1$ and $r=1$ correspond to strongly linear, sublinear and asymptotically constant conditional quantile functions respectively; for the upper tail, the two cases $0<M<1$ and $M>1$ correspond to strongly linear and asymptotically constant conditional quantile functions respectively.
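The $r=0$ case of the lower tail result can be checked numerically for the MTCJ copula of Example 6.1, whose generator $\psi(s) = (1+s)^{-1/\delta}$ satisfies $\psi(s)\sim s^{-1/\delta}$, so that $q = -1/\delta$. The sketch below is only an illustration under that assumption; the function names are hypothetical and the parameter values $\alpha = 0.3$, $\delta = 2$ are arbitrary.

```python
def mtcj_cond_cdf(v, u, delta):
    # C_{V|U}(v|u) = psi'(psi^{-1}(u) + psi^{-1}(v)) / psi'(psi^{-1}(u))
    # with psi(s) = (1 + s)^(-1/delta) and psi^{-1}(t) = t^(-delta) - 1.
    return ((u ** -delta + v ** -delta - 1.0) / u ** -delta) ** (-1.0 / delta - 1.0)


def mtcj_cond_quantile(alpha, u, delta):
    # Closed-form conditional quantile from Example 6.1.
    a = alpha ** (-delta / (1.0 + delta)) - 1.0
    return (a * u ** -delta + 1.0) ** (-1.0 / delta)


def lower_tail_quantile_r0(alpha, u, q):
    # Leading term of the r = 0 case: (alpha^(1/(q-1)) - 1)^q * u.
    return (alpha ** (1.0 / (q - 1.0)) - 1.0) ** q * u


alpha, delta = 0.3, 2.0
q = -1.0 / delta
for u in (0.5, 1e-2, 1e-4, 1e-6):
    exact = mtcj_cond_quantile(alpha, u, delta)
    approx = lower_tail_quantile_r0(alpha, u, q)
    print(u, exact / approx)
```

With these values the printed ratio tends to 1 as $u\to 0$, in line with the linear rate in $u$ stated for $r=0$; the function `mtcj_cond_cdf` can be used to confirm that the closed-form quantile inverts the conditional CDF.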
Proposition 6.2 is used in Section 6.4.2 for analyzing cases of trivariate vine copulas.

6.3.1 Lower tail

To avoid technicalities, we assume the following from Equation 8.42 of Joe (2014). As $s\to\infty$,
\[
\psi(s) \sim T(s) = a_1 s^q\exp(-a_2 s^r) \quad\text{and}\quad \psi'(s) \sim T'(s), \tag{6.9}
\]
where $a_1>0$, $r=0$ implies $a_2=0$ and $q<0$, and $r>0$ implies $r\le 1$ and $q$ can be 0, negative or positive.

Lower tail dependence ($r=0$)

According to Theorem 8.34 in Joe (2014), for a bivariate Archimedean copula, when $r=0$ and $\psi\in RV_q$ where $q<0$, it has lower tail dependence with $\lambda_L = 2^q$. Therefore, $\psi(s)\sim a_1 s^q$, $\psi'(s)\sim a_1 q s^{q-1}$ and $\psi^{-1}(u)\sim(u/a_1)^{1/q}$, as $s\to\infty$ and $u\to 0^+$. If $u\to 0$, $v\in(0,1)$ and $\alpha\in(0,1)$, then
\[
C_{V|U}(v|u) \sim \left(1+\frac{\psi^{-1}(v)}{\psi^{-1}(u)}\right)^{q-1} \sim 1+(q-1)\psi^{-1}(v)\left(\frac{u}{a_1}\right)^{-1/q} \to 1. \tag{6.10}
\]
For the conditional quantile, we set $C_{V|U}(v|u) = \alpha$ and solve for $v$. As $u\to 0$, $v$ should also converge to 0, otherwise $C_{V|U}(v|u)\to 1$. If $u,v\to 0$, the conditional distribution is asymptotic to
\[
C_{V|U}(v|u) \sim \left(1+\left(\frac{v}{u}\right)^{1/q}\right)^{q-1}, \quad (u,v)\to(0,0). \tag{6.11}
\]
Therefore,
\[
\alpha \sim \left(1+\frac{\psi^{-1}(v)}{\psi^{-1}(u)}\right)^{q-1} \sim \left(1+\left(\frac{v}{u}\right)^{1/q}\right)^{q-1}, \qquad C^{-1}_{V|U}(\alpha|u) \sim (\alpha^{1/(q-1)}-1)^q\cdot u \to 0.
\]
(The above is one result in Proposition 6.2.) If we further assume $u\sim v^k$, where $k\in(0,\infty)$, then
\[
C_{V|U}(v|u) \sim \left(1+u^{(1/k-1)/q}\right)^{q-1}, \quad (u,v)\to(0,0),\ u\sim v^k.
\]
Depending on the value of $k$, there are three different cases:
\[
C_{V|U}(v|u) \sim
\begin{cases}
1+(q-1)u^{(1/k-1)/q} \to 1 & \text{if } k>1,\\
2^{q-1}\in(0,1) & \text{if } k=1,\\
u^{\frac{q-1}{q}\left(\frac{1}{k}-1\right)} \to 0 & \text{if } 0<k<1.
\end{cases} \tag{6.12}
\]

Lower tail intermediate dependence ($0<r<1$)

If $0<r<1$, then $C(u,v)$ has lower tail intermediate dependence with $1<\kappa_L(C) = 2^r<2$ (Hua and Joe, 2011).
In this case, $\psi'(s)\sim -a_1 a_2 r s^{q+r-1}\exp(-a_2 s^r)$ and $\psi^{-1}(u)\sim((-\log u)/a_2)^{1/r}$, as $s\to\infty$ and $u\to 0$. Hence
\[
\begin{aligned}
C_{V|U}(v|u) &\sim \frac{(\psi^{-1}(u)+\psi^{-1}(v))^{q+r-1}\exp\bigl(-a_2(\psi^{-1}(u)+\psi^{-1}(v))^r\bigr)}{(\psi^{-1}(u))^{q+r-1}\exp\bigl(-a_2(\psi^{-1}(u))^r\bigr)}\\
&\sim \left(1+\frac{\psi^{-1}(v)}{\psi^{-1}(u)}\right)^{q+r-1}\exp\left\{-a_2(\psi^{-1}(u))^r\left[\left(1+\frac{\psi^{-1}(v)}{\psi^{-1}(u)}\right)^{r}-1\right]\right\}, \quad u\to 0.
\end{aligned} \tag{6.13}
\]
If $v\in(0,1)$ is fixed, then as $u\to 0$,
\[
\begin{aligned}
C_{V|U}(v|u) &\sim \left(1+(q+r-1)\frac{\psi^{-1}(v)}{\psi^{-1}(u)}\right)\exp\bigl(-a_2 r\psi^{-1}(v)(\psi^{-1}(u))^{r-1}\bigr)\\
&\sim \left(1+(q+r-1)\frac{\psi^{-1}(v)}{\psi^{-1}(u)}\right)\bigl(1-a_2 r\psi^{-1}(v)(\psi^{-1}(u))^{r-1}\bigr)\\
&\sim 1-a_2 r\psi^{-1}(v)(\psi^{-1}(u))^{r-1}\\
&\sim 1-a_2^{1/r}\,r\,\psi^{-1}(v)(-\log u)^{1-1/r} \to 1, \quad u\to 0,\ v\in(0,1) \text{ fixed}.
\end{aligned} \tag{6.14}
\]
For the conditional quantile, we set $C_{V|U}(v|u) = \alpha\in(0,1)$ and solve for $v$. According to Equation 6.13, if $\psi^{-1}(v)/\psi^{-1}(u)$ does not converge to 0, then $C_{V|U}(v|u)\to 0$. It must be that $\psi^{-1}(v)/\psi^{-1}(u)\to 0$ and
\[
C_{V|U}(v|u) \sim \exp\bigl(-a_2 r\psi^{-1}(v)(\psi^{-1}(u))^{r-1}\bigr) \sim \alpha.
\]
Solving for $v$, we have
\[
C^{-1}_{V|U}(\alpha|u) \sim \exp\left[-\left(\frac{-\log\alpha}{r}\right)^{r}(-\log u)^{1-r}\right] \to 0, \quad u\to 0.
\]
(The above is one result in Proposition 6.2.) If $u,v\to 0$, the conditional distribution is asymptotic to
\[
C_{V|U}(v|u) \sim \left(1+\left(\frac{-\log v}{-\log u}\right)^{1/r}\right)^{q+r-1}\exp\left(-(-\log u)\left(1+\left(\frac{-\log v}{-\log u}\right)^{1/r}\right)^{r}+(-\log u)\right), \quad (u,v)\to(0,0). \tag{6.15}
\]
If we further assume $u\sim v^k$, where $k\in(0,\infty)$, then
\[
C_{V|U}(v|u) \sim \left(1+\left(\frac{1}{k}\right)^{1/r}\right)^{q+r-1}u^{(1+(1/k)^{1/r})^r-1} \to 0, \tag{6.16}
\]
regardless of the value of $k$.

Lower tail quadrant independence ($r=1$)

If $r=1$, then the copula has $\kappa_L = 2$ and $\mathrm{support}(C_{V|U}(\cdot|0)) = (0,1)$.
Therefore $\lim_{u\to 0}C_{V|U}(v|u)\in(0,1)$ if $v\in(0,1)$, and $C_{V|U}(v|u)\to 0$ if $(u,v)\to(0,0)$.

6.3.2 Upper tail

(Proposition 3 in Hua and Joe (2011)) Suppose $\psi(s)$ is the Laplace transform (LT) of a positive variable $Y$ with $k<M<k+1$, where $M = \sup\{m\ge 0: E(Y^m)<\infty\}$ and $k\in\{0\}\cup\mathbb{N}^+$. If $|\psi^{(k)}(0)-\psi^{(k)}(s)|$ is regularly varying at $0^+$, then $|\psi^{(k)}(0)-\psi^{(k)}(s)|\in RV_{M-k}(0^+)$. In particular, if the slowly varying component is $\ell(s)$ and $\lim_{s\to 0^+}\ell(s) = h_{k+1}$ with $0<h_{k+1}<\infty$, then
\[
\psi(s) = \sum_{i=0}^{k}(-1)^i h_i s^i + (-1)^{k+1}h_{k+1}s^M + o(s^M), \quad s\to 0^+, \tag{6.17}
\]
where $h_0 = 1$ and $0<h_i<\infty$ for $i = 1,\ldots,k+1$. If $0<M<1$, then $k=0$.

Upper tail dependence ($0<M<1$)

By Proposition 4 in Hua and Joe (2011), if $0<M<1$, then $C(u,v)$ has upper tail dependence with $\lambda_U = 2-2^M$. In this case, $\psi(s)$ can be written as
\[
\psi(s) \sim 1-h_1 s^M,
\]
as $s\to 0^+$, where $0<h_1<\infty$ and $0<M<1$. Therefore,
\[
\psi'(s) \sim -h_1 M s^{M-1}, \qquad \psi^{-1}(u) \sim \left(\frac{1-u}{h_1}\right)^{1/M}, \quad u\to 1.
\]
If $v\in(0,1)$ and $\alpha\in(0,1)$ are fixed, then as $u\to 1$,
\[
C_{V|U}(v|u) \sim -\frac{\psi'(\psi^{-1}(v))}{h_1^{1/M}M}(1-u)^{(1-M)/M} \to 0. \tag{6.18}
\]
For the conditional quantile, we set $C_{V|U}(v|u) = \alpha\in(0,1)$ and solve for $v$. Since $C_{V|U}(v|u) = \psi'(\psi^{-1}(u)+\psi^{-1}(v))/\psi'(\psi^{-1}(u))$, if $v$ does not converge to 1, then $C_{V|U}(v|u)\to 0$. Therefore, we have $(u,v)\to(1,1)$ and the conditional distribution is asymptotic to
\[
C_{V|U}(v|u) \sim \left(1+\left(\frac{1-v}{1-u}\right)^{1/M}\right)^{M-1}. \tag{6.19}
\]
Solving for $v$, the conditional quantile function is
\[
C^{-1}_{V|U}(\alpha|u) \sim 1-(\alpha^{1/(M-1)}-1)^M(1-u) \to 1, \quad u\to 1.
\]
(The above is one result in Proposition 6.2.) If we further assume $(1-u)\sim(1-v)^k$, where $k\in(0,\infty)$, then
\[
C_{V|U}(v|u) \sim \left(1+(1-u)^{(1/k-1)/M}\right)^{M-1}, \quad u\to 1.
\]
(6.20)

Depending on the value of $k$, there are three different cases:
\[
C_{V|U}(v|u) \sim
\begin{cases}
(1-u)^{(M-1)(1/k-1)/M} \to 0 & \text{if } k>1,\\
2^{M-1}\in(0,1) & \text{if } k=1,\\
1+(M-1)(1-u)^{(1/k-1)/M} \to 1 & \text{if } 0<k<1.
\end{cases} \tag{6.21}
\]

Upper tail intermediate dependence / quadrant independence ($M>1$)

By Proposition 7 in Hua and Joe (2011), if $M>1$, then the copula $C$ has upper tail intermediate dependence or independence, and $\mathrm{support}(C_{V|U}(\cdot|1)) = (0,1)$. Therefore, $\lim_{u\to 1}C_{V|U}(v|u)\in(0,1)$ if $v\in(0,1)$, and $C_{V|U}(v|u)\to 1$ if $(u,v)\to(1,1)$.

6.4 Trivariate asymptotic conditional quantile

In this section, we aim to extend some results from the previous sections to trivariate distributions. Specifically, we study the trivariate case $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2)$ with a trivariate vine copula model. However, extending from bivariate to trivariate is not trivial, since the asymptotic conditional quantile function depends on the direction in which $(x_1,x_2)$ go to infinity.

6.4.1 Trivariate strongest functional relationship

We first study the strongest dependence: a functional relationship between the response variable $Y$ and explanatory variables $X_1$ and $X_2$. Since the marginal distribution of $Y$ does not have a closed-form expression in general, it is difficult to obtain a general result. We focus on a special case where the marginal distribution of $Y$ can be calculated. It is shown that the conditional quantile function is asymptotically linear in $x_1$ or $x_2$ along a ray on the $(x_1,x_2)$-plane, and this is an extension of the bivariate strongly linear case.

Let $Y^* = h(X_1^*,X_2^*)$, where $h$ is monotonically increasing in each argument, $Y^*\sim F_{Y^*}$, $X_1^*\sim F_{X_1^*}$ and $X_2^*\sim F_{X_2^*}$.
Transforming them to $N(0,1)$ variables, we define $Y = \Phi^{-1}(F_{Y^*}(Y^*))$, $X_1 = \Phi^{-1}(F_{X_1^*}(X_1^*))$ and $X_2 = \Phi^{-1}(F_{X_2^*}(X_2^*))$. The following functional relationship holds:
\[
Y = \Phi^{-1}\circ F_{Y^*}\circ h\bigl(F^{-1}_{X_1^*}(\Phi(X_1)),F^{-1}_{X_2^*}(\Phi(X_2))\bigr) := g(X_1,X_2). \tag{6.22}
\]
We are interested in the conditional quantile $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2) = g(x_1,x_2)$. In this case, it is obvious that the conditional quantile function does not depend on the quantile level $\alpha$. It is conjectured that it is a generalization of the bivariate strongly linear case, in the sense that $g$ is asymptotically linear as $(x_1,x_2)\to(\infty,\infty)$ or $(-\infty,-\infty)$ along different rays, even though $g$ can be quite nonlinear in two variables.

We focus on a special case to gain insight into the asymptotic behavior of $g(x_1,x_2)$. Assume $Y^* = X_1^*+X_2^*$, and $X_1^*$, $X_2^*$ follow Gamma$(\alpha_1,1)$ and Gamma$(\alpha_2,1)$ independently. As a result, $Y^*$ follows Gamma$(\alpha_1+\alpha_2,1)$. Using tail expansions of the gamma CDF at 0 and $\infty$, the following can be obtained:
• If $x_2\sim kx_1$ as $x_1,x_2\to+\infty$, then $g(x_1,x_2)\sim\sqrt{1+k^2}\,x_1$.
• If $x_2\sim kx_1$ as $x_1,x_2\to-\infty$, then
\[
g(x_1,x_2) \sim
\begin{cases}
\sqrt{\dfrac{\alpha_1+\alpha_2}{\alpha_1}}\,x_1 & \text{if } k^2\ge\dfrac{\alpha_2}{\alpha_1},\\[2ex]
\sqrt{\dfrac{\alpha_1+\alpha_2}{\alpha_2}}\,kx_1 & \text{if } k^2<\dfrac{\alpha_2}{\alpha_1}.
\end{cases}
\]
Detailed derivations can be found in Appendix A.1. Although the conditional quantile function is not linear in both $x_1$ and $x_2$, it is asymptotically linear in $x_1$ or $x_2$ along a ray. Note that this is an asymptotic property and it is true only if $x_1$ and $x_2$ are large enough.
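The upper tail statement can be illustrated numerically in the simplest instance of this special case. The sketch below assumes $\alpha_1 = \alpha_2 = 1$, so that $X_1^*$ and $X_2^*$ are standard exponential and $Y^*$ is Gamma$(2,1)$ with survival function $e^{-y}(1+y)$; this choice, and the names `upper_tail_prob`, `g_upper` and `ray_ratio`, are assumptions made only for the illustration and do not come from Appendix A.1.

```python
import math
from statistics import NormalDist


def upper_tail_prob(x):
    # 1 - Phi(x), computed with erfc so that it stays accurate for large x.
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def g_upper(x1, x2):
    # Exponential quantile: F^{-1}(Phi(x)) = -log(1 - Phi(x)).
    x1_star = -math.log(upper_tail_prob(x1))
    x2_star = -math.log(upper_tail_prob(x2))
    y_star = x1_star + x2_star
    # Survival function of Gamma(2, 1) evaluated at y_star.
    surv = math.exp(-y_star) * (1.0 + y_star)
    # Phi^{-1}(1 - surv) = -Phi^{-1}(surv), evaluated in the lower tail for accuracy.
    return -NormalDist().inv_cdf(surv)


def ray_ratio(x1, k):
    # g(x1, k * x1) divided by the claimed asymptote sqrt(1 + k^2) * x1.
    return g_upper(x1, k * x1) / (math.sqrt(1.0 + k * k) * x1)


for x1 in (2.0, 5.0, 10.0, 20.0):
    print(x1, ray_ratio(x1, 1.0))
```

Under these assumptions the ratio is below 1 and approaches 1 as $x_1$ grows, which is consistent with linearity along the ray holding only asymptotically. Much larger values of $x_1$ would underflow in double precision, so the check is restricted to moderate arguments.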
The rate of asymptotic approximation depends on $k$, $\alpha_1$ and $\alpha_2$ because of the arguments $-x_1^2/(2\alpha_1)$ and $-k^2x_1^2/(2\alpha_2)$ in the exponential function.

6.4.2 Trivariate conditional boundary distribution with bivariate Archimedean copulas

For a trivariate vine copula model, it is difficult to get general results to cover all types of bivariate copulas for the vine, but it is possible to get results for the cases where all bivariate copulas are Archimedean. This provides some insight on the tail behavior of conditional quantile functions. Specifically, we study how the tail properties of the bivariate Archimedean copulas on the edges affect the asymptotic behavior of the conditional CDF. It turns out that trivariate cases are more complex than bivariate ones. For a bivariate Archimedean copula, the boundary conditional CDF $C_{V|U}(v|0)$ is either a distribution with support on all of $(0,1)$ or a degenerate distribution at 0. However, depending on the bivariate copulas on the edges, the conditional CDF $C_{3|12}(v|u_1,u_2)$ could be a distribution with support on all of $(0,1)$, a degenerate distribution at 0, or a degenerate distribution at 1 (the unusual case), as $(u_1,u_2)\to(0,0)$ along a ray. The results are summarized in Tables 6.1 and 6.2.

Some results on the asymptotic conditional distributions of bivariate Archimedean copulas are presented in Section 6.3. Based on those results, we study the conditional distribution of trivariate vine copulas with bivariate Archimedean copulas. Specifically, we are interested in the boundary conditional distribution of a trivariate vine copula with $C_{12}$, $C_{23}$ in tree 1 and $C_{13;2}$ in tree 2:
\[
u_{3|12} := C_{3|12}(v|u_1,u_2) = C_{3|1;2}(u_{3|2}|u_{1|2}), \tag{6.23}
\]
where $u_{3|2} := C_{3|2}(v|u_2)$ and $u_{1|2} := C_{1|2}(u_1|u_2)$, $C_{3|1;2}(b|a) = \partial C_{13;2}(a,b)/\partial a$, $v\in(0,1)$ and $(u_1,u_2)\to(0,0)$ or $(u_1,u_2)\to(1,1)$.
$C_{1|2}$, $C_{3|2}$ and $C_{3|1;2}$ are the conditional distributions of $C_{12}$, $C_{23}$ and $C_{13;2}$ respectively.

As $(u_1,u_2)\to(0,0)$, $u_{3|12}$ can, in some cases, depend on $C_{3|1;2}(\cdot|0)$ or $C_{3|1;2}(\cdot|1)$, as well as $C_{3|2}(\cdot|0)$ and $C_{1|2}(\cdot|0)$. Similarly, as $(u_1,u_2)\to(1,1)$, $u_{3|12}$ can, in some cases, depend on $C_{3|1;2}(\cdot|0)$ or $C_{3|1;2}(\cdot|1)$, as well as $C_{3|2}(\cdot|1)$ and $C_{1|2}(\cdot|1)$. This is why the trivariate and higher-dimensional cases of the boundary conditional CDF can be complicated. Also, in some cases, the form of the boundary conditional CDF depends on the direction of $(u_1,u_2)\to(0,0)$ or $(1,1)$.

Given the copula boundary conditional distribution $u_{3|12}$, we can obtain its equivalent on the normal scale. Let the trivariate vine copula be the CDF of a random vector $(U_1,U_2,V)$, and define $X_1 = \Phi^{-1}(U_1)$, $X_2 = \Phi^{-1}(U_2)$ and $Y = \Phi^{-1}(V)$. We are interested in the conditional quantile function $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2)$, as $x_1,x_2\to-\infty$ and $x_1/x_2$ converges to a constant. For a fixed quantile level $\alpha$,
• If $u_{3|12}\to 0$ as $(u_1,u_2)\to(0,0)$ and $u_2\sim u_1^k$, then $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2)\to+\infty$ as $x_1,x_2\to-\infty$ and $x_2/x_1\to\sqrt{k}$.
• If $u_{3|12}$ converges to a constant in $(0,1)$ as $(u_1,u_2)\to(0,0)$ and $u_2\sim u_1^k$, then $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2)$ converges to a finite constant as $x_1,x_2\to-\infty$ and $x_2/x_1\to\sqrt{k}$.
• If $u_{3|12}\to 1$ as $(u_1,u_2)\to(0,0)$ and $u_2\sim u_1^k$, then $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2)\to-\infty$ as $x_1,x_2\to-\infty$ and $x_2/x_1\to\sqrt{k}$.
Similar results hold for the upper tail, that is, $u_1,u_2\to 1$ and $(1-u_2)\sim(1-u_1)^k$.

\[
\begin{array}{c|ccc|ccc}
 & \multicolumn{3}{c|}{\lim_{u_2\to 0}C_{3|2}(v|u_2)\in(0,1)} & \multicolumn{3}{c}{\lim_{u_2\to 0}C_{3|2}(v|u_2) = 1}\\
\lim_{u_1,u_2\to 0}C_{1|2}(u_1|u_2) & & & & & & \\
\hline
0 & (0,1) & 1 & 1 & 1 & 1 & 1\\
(0,1) & (0,1) & (0,1) & (0,1) & 1 & 1 & 1\\
1 & (0,1) & (0,1) & 0 & 1 & 1 & \ast\\
\hline
C_{13;2} & \kappa_{13}=2 & \kappa_{13}\in(1,2) & \kappa_{13}=1 & \kappa_{13}=2 & \kappa_{13}\in(1,2) & \kappa_{13}=1
\end{array}
\]
Table 6.1: The taxonomy of the lower tail boundary conditional distribution $\lim_{u_1,u_2\to 0}u_{3|12}$, where $u_{3|12}$ is defined in Equation 6.23. For the first (non-heading) row where $\lim_{u_1,u_2\to 0}C_{1|2}(u_1|u_2) = 0$, $\kappa_{13}$ represents $\kappa_{13L}$, the lower tail order of $C_{13;2}$. Similarly, for the third (non-heading) row, where $\lim_{u_1,u_2\to 0}C_{1|2}(u_1|u_2) = 1$, $\kappa_{13}$ represents $\kappa_{13U}$, the upper tail order of $C_{13;2}$.

Trivariate vine copula lower tail

Fix $v\in(0,1)$ and let $u_1,u_2\to 0$ with $u_2\sim u_1^k$. According to Equation 6.10 and Equation 6.14, depending on the tail order of $C_{23}(u_2,u_3)$, the limit of $u_{3|2} = C_{3|2}(v|u_2)$ could either be a number in $(0,1)$, or 1. Similarly, according to Section 6.3.1, the limit of $u_{1|2} = C_{1|2}(u_1|u_2)$ could either be 0, a number in $(0,1)$, or 1. Depending on the limit of $u_{1|2}$, we also need to take the corresponding tail behavior of $C_{13;2}$ into consideration. The possible combinations of the tail behaviors are summarized in Table 6.1.

The first (non-heading) row of Table 6.1 corresponds to $u_{1|2}\to 0$.
• If $\lim_{u_2\to 0}u_{3|2}\in(0,1)$ and $C_{13;2}$ has $\kappa_{13L} = 2$, then $\mathrm{support}(C_{3|1;2}(\cdot|0)) = (0,1)$. Therefore $\lim_{u_1,u_2\to 0}u_{3|12}\in(0,1)$, as shown in row 1, column 1.
• If $\lim_{u_2\to 0}u_{3|2}\in(0,1)$ and $C_{13;2}$ has $\kappa_{13L}\in[1,2)$, then $C_{3|1;2}(\cdot|0)$ is a degenerate distribution with a point mass at 0. Therefore $u_{3|12}\to 1$, as shown in row 1, columns 2–3.
• If $u_{3|2}\to 1$, then $u_{3|12}\to 1$, regardless of the tail behavior of $C_{13;2}$. This is shown in row 1, columns 4–6.

The second (non-heading) row of Table 6.1 corresponds to $\lim_{u_1,u_2\to 0}u_{1|2}\in(0,1)$. In this case, the tail behavior of $C_{13;2}$ is irrelevant.
If $\lim_{u_2\to 0}u_{3|2}\in(0,1)$, then $\lim_{u_1,u_2\to 0}u_{3|12}\in(0,1)$ (row 2, columns 1–3); if $\lim_{u_2\to 0}u_{3|2} = 1$, then $\lim_{u_1,u_2\to 0}u_{3|12} = 1$ (row 2, columns 4–6).

The third (non-heading) row of Table 6.1 corresponds to $u_{1|2}\to 1$.
• If $\lim_{u_2\to 0}u_{3|2}\in(0,1)$ and $C_{13;2}$ has $\kappa_{13U}\in(1,2]$, then $\mathrm{support}(C_{3|1;2}(\cdot|1)) = (0,1)$. Therefore $\lim_{u_1,u_2\to 0}u_{3|12}\in(0,1)$, as shown in row 3, columns 1–2.
• If $\lim_{u_2\to 0}u_{3|2}\in(0,1)$ and $C_{13;2}$ has $\kappa_{13U} = 1$, then $C_{3|1;2}(\cdot|1)$ is a degenerate distribution with a point mass at 1. Therefore $u_{3|12}\to 0$, as shown in row 3, column 3.
• If $u_{3|2}\to 1$ and $C_{13;2}$ has $\kappa_{13U}\in(1,2]$, then $\mathrm{support}(C_{3|1;2}(\cdot|1)) = (0,1)$. Therefore $u_{3|12}\to 1$, as shown in row 3, columns 4–5.
• If $u_{3|2}\to 1$ and $C_{13;2}$ has $\kappa_{13U} = 1$, then $C_{3|1;2}(\cdot|1)$ is a degenerate distribution with a point mass at 1. The limit of $u_{3|12}$ is unclear and needs further investigation (row 3, column 6, cell $\ast$). Depending on $\kappa_{23L}$, we have the following results. (See Section A.2 for a detailed derivation.)
 – If $C_{23}$ has $\kappa_{23L}\in(1,2)$, then $C_{3|1;2}(u_{3|2}|u_{1|2})\to 0$.
 – If $C_{23}$ has $\kappa_{23L} = 1$, then
\[
C_{3|1;2}(u_{3|2}|u_{1|2}) \to
\begin{cases}
1 & \text{if } -q_{23}^{-1}-q_{12}^{-1}(k^{-1}-1)>0,\\
\text{const}\in(0,1) & \text{if } -q_{23}^{-1}-q_{12}^{-1}(k^{-1}-1)=0,\\
0 & \text{if } -q_{23}^{-1}-q_{12}^{-1}(k^{-1}-1)<0,
\end{cases}
\]
where $q_{23}$ and $q_{12}$ are the parameters of $\psi_{23}$ for $C_{23}$ and $\psi_{12}$ for $C_{12}$ respectively.

\[
\begin{array}{c|ccc|ccc}
 & \multicolumn{3}{c|}{\lim_{u_2\to 1}C_{3|2}(v|u_2) = 0} & \multicolumn{3}{c}{\lim_{u_2\to 1}C_{3|2}(v|u_2)\in(0,1)}\\
\lim_{u_1,u_2\to 1}C_{1|2}(u_1|u_2) & & & & & & \\
\hline
0 & 0 & 0^{\ast} & \dagger & (0,1) & 1 & 1\\
(0,1) & 0 & 0 & 0 & (0,1) & (0,1) & (0,1)\\
1 & 0 & 0 & 0 & (0,1) & (0,1) & 0\\
\hline
C_{13;2} & \kappa_{13}=2 & \kappa_{13}\in(1,2) & \kappa_{13}=1 & \kappa_{13}=2 & \kappa_{13}\in(1,2) & \kappa_{13}=1
\end{array}
\]
Table 6.2: The taxonomy of the upper tail boundary conditional distribution $\lim_{u_1,u_2\to 1}u_{3|12}$, where $u_{3|12}$ is defined in Equation 6.23.
For the first (non-heading) row, where lim_{u1,u2\u21921} C1|2(u1|u2) = 0, \u03ba13 represents \u03ba13L, the lower tail order of C13;2. Similarly, for the third (non-heading) row, where lim_{u1,u2\u21921} C1|2(u1|u2) = 1, \u03ba13 represents \u03ba13U, the upper tail order of C13;2.
Trivariate vine copula upper tail
Fix v \u2208 (0,1) and let (u1,u2) \u2192 (1,1) with (1 \u2212 u2) \u223c (1 \u2212 u1)^k. According to Equation 6.18, depending on the tail order of C23(u2,u3), the limit of u3|2 = C3|2(v|u2) is either a number in (0,1) or 0. Similarly, according to Section 6.3.2, the limit of u1|2 = C1|2(u1|u2) is either 0, a number in (0,1), or 1. Depending on the limit of u1|2, we also need to take the corresponding tail behavior of C13;2 into consideration. The possible combinations of the tail behaviors are summarized in Table 6.2.
The first (non-heading) row of Table 6.2 corresponds to u1|2 \u2192 0.
\u2022 If u3|2 \u2192 0 and C13;2 has \u03ba13L = 2, then support(C3|1;2(\u00b7|0)) = (0,1). Therefore u3|12 \u2192 0, as shown in row 1, column 1.
\u2022 If u3|2 \u2192 0 and C13;2 has \u03ba13L \u2208 (1,2), then C3|1;2(\u00b7|0) is a degenerate distribution with a point mass at 0. The limit of u3|12 is unclear and needs further investigation (row 1, column 2, cell \u2217). It can be shown that in that case, u3|12 \u2192 0. See Section A.2 for a detailed derivation.
\u2022 If u3|2 \u2192 0 and C13;2 has \u03ba13L = 1, then C3|1;2(\u00b7|0) is a degenerate distribution with a point mass at 0. The limit of u3|12 is unclear and needs further investigation (row 1, column 3, cell \u2020). Depending on the relationship among M12, M23 and k, we have the following results. 
(See Section A.2 for a detailed derivation.)
C3|1;2(u3|2|u1|2) \u2192
  0 if \u2212(M23 \u2212 1)\/M23 \u2212 ((M12 \u2212 1)\/M12)(1\/k \u2212 1) > 0,
  const \u2208 (0,1) if \u2212(M23 \u2212 1)\/M23 \u2212 ((M12 \u2212 1)\/M12)(1\/k \u2212 1) = 0,
  1 if \u2212(M23 \u2212 1)\/M23 \u2212 ((M12 \u2212 1)\/M12)(1\/k \u2212 1) < 0,
where M23 and M12 are the parameters of \u03c823 for C23 and \u03c812 for C12, respectively.
\u2022 If lim_{u2\u21921} u3|2 \u2208 (0,1) and C13;2 has \u03ba13L = 2, then support(C3|1;2(\u00b7|0)) = (0,1). Therefore lim_{(u1,u2)\u2192(1,1)} u3|12 \u2208 (0,1), as shown in row 1, column 4.
\u2022 If lim_{u2\u21921} u3|2 \u2208 (0,1) and C13;2 has \u03ba13L \u2208 [1,2), then C3|1;2(\u00b7|0) is a degenerate distribution with a point mass at 0. Therefore u3|12 \u2192 1, as shown in row 1, columns 5\u20136.
The second (non-heading) row of Table 6.2 corresponds to lim_{(u1,u2)\u2192(1,1)} u1|2 \u2208 (0,1). In this case, the tail behavior of C13;2 is irrelevant. If lim_{u2\u21921} u3|2 = 0, then lim_{(u1,u2)\u2192(1,1)} u3|12 = 0 (row 2, columns 1\u20133); if lim_{u2\u21921} u3|2 \u2208 (0,1), then lim_{(u1,u2)\u2192(1,1)} u3|12 \u2208 (0,1) (row 2, columns 4\u20136).
The third (non-heading) row of Table 6.2 corresponds to u1|2 \u2192 1.
\u2022 If u3|2 \u2192 0, then u3|12 \u2192 0, regardless of the tail property of C13;2. This is shown in row 3, columns 1\u20133.
\u2022 If lim_{u2\u21921} u3|2 \u2208 (0,1) and C13;2 has \u03ba13U \u2208 (1,2], then support(C3|1;2(\u00b7|1)) = (0,1). Therefore lim_{(u1,u2)\u2192(1,1)} u3|12 \u2208 (0,1), as shown in row 3, columns 4\u20135.
\u2022 If lim_{u2\u21921} u3|2 \u2208 (0,1) and C13;2 has \u03ba13U = 1, then C3|1;2(\u00b7|1) is a degenerate distribution with a point mass at 1. 
Therefore u3|12 \u2192 0, as shown in row 3, column 6.
6.4.3 Case studies: trivariate conditional quantile
In this section, we provide a few examples to illustrate how to use the results in Table 6.1 and Table 6.2 to derive the boundary conditional quantiles for a trivariate vine with bivariate Archimedean copulas. Analytic results are provided for those examples to illustrate how the tail properties of bivariate copulas on edges of the vine can affect asymptotic properties of conditional quantiles. We use the same setting as before: v \u2208 (0,1) and (u1,u2) \u2192 (0,0) or (u1,u2) \u2192 (1,1); we are interested in the limit of u3|12 = C3|1;2(u3|2|u1|2) as well as the conditional quantile function on the normal scale.
In case 1, the two linking copulas to Y have \u03baU = 1 and \u03baL \u2208 (1,2). In case 2, the linking copulas to Y have \u03baU = \u03baL = 2. The less straightforward case 3 has \u03baU = \u03baL = 2 for the linking copula to Y in tree 1 and \u03baU = 1, \u03baL \u2208 (1,2) for the linking copula to Y in tree 2.
Case 1
C12, C23 and C13;2 are all bivariate Gumbel copulas, with lower tail intermediate dependence and upper tail dependence.
Lower tail ((u1,u2) \u2192 (0,0)): C12 has \u03ba12L \u2208 (1,2). According to Equation 6.16, u1|2 \u2192 0. C23 has \u03ba23L \u2208 (1,2). According to Equation 6.14, u3|2 \u2192 1. Finally, C13;2 has \u03ba13L \u2208 (1,2). The combination of the three copulas corresponds to row 1, column 5 in Table 6.1, that is, u3|12 \u2192 1. On the normal scale, the conditional quantile F^{\u22121}_{Y|X1,X2}(\u03b1|x1,x2) \u2192 \u2212\u221e as x1,x2 \u2192 \u2212\u221e. A more detailed analysis (see Appendix A.3) shows that
F^{\u22121}_{Y|X1,X2}(\u03b1|x1,x2) = O((\u2212log \u03b1)^{r23 r13;2\/2} |x2|^{1 \u2212 r23 r13;2}), x1,x2 \u2192 \u2212\u221e, x2\/x1 \u2192 \u221ak,
where r23 and r13;2 are parameters of the LT \u03c823 for C23 and \u03c813;2 for C13;2, respectively. 
Since 1 \u2212 r23 r13;2 < 1, the conditional quantile function goes to \u2212\u221e sublinearly with respect to x1 or x2.
Upper tail ((u1,u2) \u2192 (1,1)): C23 has \u03ba23U = 1. According to Equation 6.18, u3|2 \u2192 0. C12 has \u03ba12U = 1. Applying Equation 6.21, we need to investigate the rate at which u1 and u2 go to 1. Assuming (1 \u2212 u2) \u223c (1 \u2212 u1)^k:
\u2022 If k > 1, then u1|2 \u2192 0. C13;2 has \u03ba13L \u2208 (1,2). This corresponds to row 1, column 2 in Table 6.2, that is, u3|12 \u2192 0.
\u2022 If k = 1, then lim u1|2 \u2208 (0,1). This corresponds to row 2, columns 1\u20133 in Table 6.2, that is, u3|12 \u2192 0.
\u2022 If 0 < k < 1, then u1|2 \u2192 1. We need to focus on the upper tail of C13;2, which has \u03ba13U = 1. This corresponds to row 3, column 3 in Table 6.2, that is, u3|12 \u2192 0.
Therefore, u3|12 \u2192 0 regardless of the value of k. On the normal scale, the conditional quantile F^{\u22121}_{Y|X1,X2}(\u03b1|x1,x2) \u2192 +\u221e as x1,x2 \u2192 +\u221e. A more detailed analysis shows that, if x2\/x1 \u2192 \u221ak as x1,x2 \u2192 +\u221e, then
F^{\u22121}_{Y|X1,X2}(\u03b1|x1,x2) \u223c
  x2 if k \u2265 1,
  \u221a(1 + (M23\/M12)(1\/k \u2212 1)) x2 if 0 < k < 1.
In summary, as x1 and x2 go to +\u221e, the conditional quantile goes to +\u221e linearly; as x1 and x2 go to \u2212\u221e, the conditional quantile goes to \u2212\u221e sublinearly. This is a natural extension of the conditional quantile function of the bivariate Gumbel copula. Figure 6.2a shows the conditional quantile F^{\u22121}_{Y|X1,X2}(\u03b1|x1,x2) for \u03b1 = 0.25 and 0.75. The parameters of the copulas \u03b412, \u03b423, and \u03b413;2 are chosen such that the corresponding Spearman correlation is \u03c1S = 0.5.
Case 2
\u2022 C12 is a bivariate Frank copula, with \u03ba12L = \u03ba12U = 2.
\u2022 C23 is a bivariate Frank copula, with \u03ba23L = \u03ba23U = 2.
\u2022 C13;2 could be any bivariate copula.
In this case, row 2, columns 1\u20133 in Table 6.1 and row 2, columns 4\u20136 in Table 6.2 apply. 
That is, lim_{(u1,u2)\u2192(0,0)} u3|12 \u2208 (0,1) and lim_{(u1,u2)\u2192(1,1)} u3|12 \u2208 (0,1). On the normal scale, the conditional quantile F^{\u22121}_{Y|X1,X2}(\u03b1|x1,x2) converges to a finite constant as x1,x2 \u2192 +\u221e or \u2212\u221e. This example shows that, if the bivariate copulas on the first level have \u03baL = 2 (or \u03baU = 2), then regardless of the second level copula, the conditional lower (upper) quantile is asymptotically constant.
Figure 6.2: Conditional quantile surface F^{\u22121}_{Y|X1,X2}(\u03b1|x1,x2) in cases 1 and 3, for \u03b1 = 0.25 and 0.75. (a) Case 1: as x1,x2 \u2192 +\u221e, the conditional quantile goes to +\u221e linearly; as x1,x2 \u2192 \u2212\u221e, the conditional quantile goes to \u2212\u221e sublinearly. (b) Case 3: as x1,x2 \u2192 +\u221e at different rates, the conditional quantile could either go up or down.
Case 3
\u2022 C12 is a bivariate Gumbel copula, with \u03ba12L \u2208 (1,2) and \u03ba12U = 1.
\u2022 C23 is a bivariate Frank copula, with \u03ba23L = \u03ba23U = 2.
\u2022 C13;2 is a bivariate Gumbel copula, with \u03ba13L \u2208 (1,2) and \u03ba13U = 1.
Lower tail ((u1,u2) \u2192 (0,0)): C12 has \u03ba12L \u2208 (1,2). According to Equation 6.16, u1|2 \u2192 0 and \u2212log u1|2 \u223c O(\u2212log u2). Since C23 has support(C3|2(\u00b7|0)) = (0,1) and \u03ba23L = 2, lim_{u2\u21920} u3|2 \u2208 (0,1). Finally, C13;2 has \u03ba13L \u2208 (1,2). The combination of the three copulas corresponds to row 1, column 2 in Table 6.1, that is, u3|12 \u2192 1. On the normal scale, the conditional quantile F^{\u22121}_{Y|X1,X2}(\u03b1|x1,x2) \u2192 \u2212\u221e as x1,x2 \u2192 \u2212\u221e. According to Proposition 6.2, the conditional quantile F^{\u22121}_{Y|X1,X2}(\u03b1|x1,x2) is sublinear with respect to x1 or x2, if x2\/x1 \u2192 \u221ak.
Upper tail ((u1,u2) \u2192 (1,1)): Since C23 has \u03ba23U = 2 and support(C3|2(\u00b7|1)) = (0,1), lim_{u2\u21921} u3|2 \u2208 (0,1). C12 has \u03ba12U = 1. 
Applying Equation 6.21, we need to investigate the rate at which u1 and u2 go to 1. Assuming (1 \u2212 u2) \u223c (1 \u2212 u1)^k:
\u2022 If k > 1, then u1|2 \u2192 0. C13;2 has \u03ba13L \u2208 (1,2). This corresponds to row 1, column 5 in Table 6.2, that is, u3|12 \u2192 1. On the normal scale, the conditional quantile F^{\u22121}_{Y|X1,X2}(\u03b1|x1,x2) \u2192 \u2212\u221e as x1,x2 \u2192 +\u221e and x2\/x1 \u2192 \u221ak. The conditional quantile F^{\u22121}_{Y|X1,X2}(\u03b1|x1,x2) is sublinear with respect to x1 or x2.
\u2022 If k = 1, then lim u1|2 \u2208 (0,1). This corresponds to row 2, columns 4\u20136 in Table 6.2, that is, lim u3|12 \u2208 (0,1). On the normal scale, the conditional quantile F^{\u22121}_{Y|X1,X2}(\u03b1|x1,x2) converges to a finite number as x1,x2 \u2192 +\u221e and x2\/x1 \u2192 1.
\u2022 If 0 < k < 1, then u1|2 \u2192 1. We need to focus on the upper tail of C13;2, which has \u03ba13U = 1. This corresponds to row 3, column 6 in Table 6.2, that is, u3|12 \u2192 0. On the normal scale, the conditional quantile F^{\u22121}_{Y|X1,X2}(\u03b1|x1,x2) \u2192 +\u221e as x1,x2 \u2192 +\u221e and x2\/x1 \u2192 \u221ak. The conditional quantile F^{\u22121}_{Y|X1,X2}(\u03b1|x1,x2) is linear with respect to x1 or x2.
Figure 6.3: Conditional quantile F^{\u22121}_{Y|X1,X2}(\u03b1|x1,x2) versus x1 in case 3 for \u03b1 = 0.25 and 0.75, as x1 \u2192 +\u221e. Panels: (a) k = 0.5, (b) k = 1, (c) k = 4. It shows that the conditional quantile converges to +\u221e, a finite number, or \u2212\u221e.
Figure 6.2b shows the conditional quantile surface for \u03b1 = 0.25 and 0.75. The parameters of the copulas \u03b412, \u03b423, and \u03b413;2 are chosen such that the corresponding Spearman correlation is \u03c1S = 0.5. Depending on the rate at which x1 and x2 go to +\u221e, the conditional quantile could go to +\u221e or \u2212\u221e. Figure 6.3 shows the conditional quantile F^{\u22121}_{Y|X1,X2}(\u03b1|x1,x2) for \u03b1 = 0.25 and 0.75 as x1,x2 \u2192 \u221e and x2\/x1 \u2192 \u221ak, for k = 0.5, 1 and 4. 
The three cases correspond to weakly linear, asymptotically constant, and sublinear behavior. This example also shows that the asymptotic behavior of the conditional quantile function depends on the direction along which x1 and x2 go to infinity.
The case of k > 1 and F^{\u22121}_{Y|X1,X2}(\u03b1|x1,x2) \u2192 \u2212\u221e as x1,x2 \u2192 +\u221e is unusual, given that all three copulas have positive dependence. One possible explanation is that variable X1 has a strong tail dependence link to Y, and variable X2 has a tail quadrant independence link to Y; when k > 1, X2 goes to infinity faster than X1 and the direction of the limit is more concentrated on the weaker variable.
6.5 Beyond trivariate
Using the results in Section 6.3 and Section 6.4.2 as building blocks, the boundary conditional distribution of a higher-dimensional vine copula can be derived. Take a 4-dimensional vine copula as an example: without loss of generality, we consider a D-vine copula with 1-2-3-4 as the first level tree. The conditional CDF can be represented by
u4|123 := C4|123(v|u1,u2,u3) = C4|1;23(u4|23|u1|23),
where u4|23 := C4|2;3(u4|3|u2|3), u1|23 := C1|3;2(u1|2|u3|2), u1|2 := C1|2(u1|u2), u2|3 = C2|3(u2|u3), u3|2 = C3|2(u3|u2), u4|3 = C4|3(v|u3), and v \u2208 (0,1), u1,u2,u3 \u2192 0 or 1.
Applying the techniques demonstrated in Section 6.4.2, the asymptotic behavior of u4|23 and u1|23 can be obtained. Afterwards, the results in Section 6.3 can be applied to get the limit of u4|123. The limit of u4|123 could be summarized as a table like Table 6.1 and Table 6.2, but it would be complicated to classify all the possible combinations of the bivariate copula tail behavior. Technically, the idea could be further generalized to any higher dimension.
A general heuristic statement is that if more linking copulas of the X\u2019s to Y have \u03ba = 1, then the tail behavior of conditional quantiles is more likely to be asymptotically linear or sublinear. 
If all of the linking copulas of the X\u2019s to Y have \u03ba = 2, then the tail behavior of conditional quantiles is asymptotically constant.
Chapter 7
Conclusion
The major contributions of the thesis lie in the improvements on fitting parametric vine copula models, compared with (a) alternative methods for truncated vine structure learning, and (b) diagnostics for bivariate copula selection for dependence analysis or prediction.
The Monte Carlo tree search (MCTS) algorithm improves on the greedy algorithm for the vine structure. Under the guidance of the vine UCT, our method can effectively explore the large search space of possible truncated vines by balancing between exploration and exploitation. It also performs significantly better than the existing methods under various experimental setups.
The diagnostic tools provide better ways for bivariate copula selection. The use of diagnostics can reduce the number of candidate copula families to consider on edges of the vine. We have also illustrated with real datasets the use of dependence and asymmetry measures as diagnostic tools for bivariate copulas and bivariate conditional distributions. It is a future research direction to automatically and adaptively generate a shortlist of candidate parametric copula families for edges of a vine copula based on diagnostic measures. An alternative is a reverse-delete algorithm: start with a long list of bivariate parametric copula families, followed by deletion of families that cannot match the diagnostic summaries.
The vine copula regression method is interpretable and flexible. Compared with the existing methods that either use D-vines or only handle continuous variables, the proposed method uses R-vines and can fit mixed continuous and ordinal variables. Various shapes of conditional quantiles can be obtained depending on how pair-copulas are chosen on the edges of the vine. 
For bivariate copulas, the conditional quantile function of the response variable could be asymptotically linear, sublinear, or constant with respect to the explanatory variable. The asymptotic conditional distribution can be quite complex for trivariate and higher-dimensional cases. The performance of the proposed method is evaluated on simulated data sets and the Abalone data set. The heteroscedasticity in the data is better captured by vine copula regression than the standard regression methods.
One possible future research direction is the extension of the proposed regression method for survival outcomes with censored data. For example, Emura et al. (2018) use bivariate copulas to predict time-to-death given time-to-cancer progression; Barthel et al. (2018) apply vine copulas to multivariate right-censored event time data. These types of applications would require more numerical integration methods. Another research direction is to handle variable selection and reduction when there are many explanatory variables, some of which might form clusters with strong dependence.
Bibliography
Aas, K., Czado, C., Frigessi, A., and Bakken, H. (2009). Pair-copula constructions of multiple dependence. Insurance: Mathematics and Economics, 44(2):182\u2013198. \u2192 page 85
Abramowitz, M. and Stegun, I. A. (1964). Handbook of Mathematical Functions: with Formulas, Graphs, and Mathematical Tables, volume 55. Courier Corporation. \u2192 page 111
Acar, E. F., Genest, C., and Ne\u0161lehov\u00e1, J. (2012). Beyond simplified pair-copula constructions. Journal of Multivariate Analysis, 110:74\u201390. \u2192 pages 54, 59, 73
Antweiler, W. (1996). Pacific exchange rate service. http:\/\/fx.sauder.ubc.ca\/. [Online; accessed: 2019-02-19]. \u2192 page 48
Azzalini, A. and Capitanio, A. (2003). Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. 
Journal of theRoyal Statistical Society: Series B (Statistical Methodology), 65(2):367\u2013389.\u2192 page 66Barthel, N., Geerdens, C., Killiches, M., Janssen, P., and Czado, C. (2018). Vinecopula based likelihood estimation of dependence patterns in multivariate eventtime data. Computational Statistics & Data Analysis, 117:109\u2013127. \u2192 pages108, 135Bauer, A. and Czado, C. (2016). Pair-copula Bayesian networks. Journal ofComputational and Graphical Statistics, 25(4):1248\u20131271. \u2192 page 88Bedford, T. and Cooke, R. M. (2001). Probability density decomposition forconditionally dependent random variables modeled by vines. Annals ofMathematics and Artificial Intelligence, 32(1-4):245\u2013268. \u2192 pages 18, 19, 89Bedford, T. and Cooke, R. M. (2002). Vines \u2014 A new graphical model fordependent random variables. Annals of Statistics, 30(4):1031\u20131068. \u2192 page 85136Bentler, P. M. (1990). Comparative fit indexes in structural models. PsychologicalBulletin, 107(2):238. \u2192 page 24Bernard, C. and Czado, C. (2015). Conditional quantiles and tail dependence.Journal of Multivariate Analysis, 138:104\u2013126. \u2192 pages 85, 110, 111Blomqvist, N. (1950). On a measure of dependence between two randomvariables. The Annals of Mathematical Statistics, 21(4):593\u2013600. \u2192 page 53Bouye\u00b4, E. and Salmon, M. (2009). Dynamic copula quantile regressions and tailarea dynamic dependence in Forex markets. The European Journal of Finance,15(7-8):721\u2013750. \u2192 page 85Brechmann, E. (2010). Truncated and simplified regular vines and theirapplications. Master\u2019s thesis, Technical University of Munich. \u2192 page 90Brechmann, E. C., Czado, C., and Aas, K. (2012). Truncated regular vines in highdimensions with application to financial data. Canadian Journal of Statistics,40(1):68\u201385. \u2192 pages 26, 52, 72, 85Brechmann, E. C. and Joe, H. (2014). 
Parsimonious parameterization ofcorrelation matrices using truncated vines and factor analysis. ComputationalStatistics & Data Analysis, 77:233\u2013251. \u2192 page 87Brechmann, E. C. and Joe, H. (2015). Truncation of vine copulas using fit indices.Journal of Multivariate Analysis, 138:19\u201333. \u2192 pages 22, 24, 43Brennan, C. W., Verhaak, R. G. W., McKenna, A., Campos, B., Noushmehr, H.,Salama, S. R., Zheng, S., Chakravarty, D., Sanborn, J. Z., Berman, S. H., et al.(2013). The somatic genomic landscape of glioblastoma. Cell,155(2):462\u2013477. \u2192 pages 45, 76Browne, C. B., Powley, E., Whitehouse, D., Lucas, S. M., Cowling, P. I.,Rohlfshagen, P., Tavener, S., Perez, D., Samothrakis, S., and Colton, S. (2012).A survey of Monte Carlo tree search methods. IEEE Transactions onComputational Intelligence and AI in games, 4(1):1\u201343. \u2192 pages 4, 28, 37Chang, B. and Joe, H. (2019). Prediction based on conditional distributions ofvine copulas. Computational Statistics & Data Analysis, 139:45\u201363.Chang, B., Pan, S., and Joe, H. (2019). Vine copula structure learning via MonteCarlo tree search. In International Conference on Artificial Intelligence andStatistics.137Chaslot, G. M. J. B., Winands, M. H. M., Herik, H. J. v. D., Uiterwijk, J. W. H. M.,and Bouzy, B. (2008). Progressive strategies for Monte-Carlo tree search. NewMathematics and Natural Computation, 4(03):343\u2013357. \u2192 page 35Childs, B. E., Brodeur, J. H., and Kocsis, L. (2008). Transpositions and movegroups in Monte Carlo tree search. In Computational Intelligence and Games,2008. CIG\u201908. IEEE Symposium On, pages 389\u2013395. IEEE. \u2192 page 37Cook, R. D. and Johnson, M. E. (1981). A family of distributions for modellingnon-elliptically symmetric multivariate data. Journal of the Royal StatisticalSociety. Series B (Methodological), 43(2):210\u2013218. \u2192 page 73Cooke, R. M., Joe, H., and Chang, B. (2019). Vine copula regression forobservational studies. 
AStA Advances in Statistical Analysis. \u2192 pages 3, 28, 86Coulom, R. (2006). Efficient selectivity and backup operators in Monte-Carlo treesearch. In International Conference on Computers and Games, pages 72\u201383.Springer. \u2192 pages 4, 28Dette, H., Van Hecke, R., and Volgushev, S. (2014). Some comments oncopula-based regression. Journal of the American Statistical Association,109(507):1319\u20131324. \u2192 page 108Dissmann, J., Brechmann, E. C., Czado, C., and Kurowicka, D. (2013). Selectingand estimating regular vine copulae and application to financial returns.Computational Statistics & Data Analysis, 59:52\u201369. \u2192 pages3, 23, 28, 33, 42, 43, 49, 52, 72, 80, 85, 87Emura, T., Nakatochi, M., Matsui, S., Michimae, H., and Rondeau, V. (2018).Personalized dynamic prediction of death according to tumour progression andhigh-dimensional genetic factors: meta-analysis with a joint model. Statisticalmethods in medical research, 27(9):2842\u20132858. \u2192 pages 108, 135Fan, J. (1992). Design-adaptive nonparametric regression. Journal of theAmerican statistical Association, 87(420):998\u20131004. \u2192 page 85Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recogn. Lett.,27(8):861\u2013874. \u2192 page 106Gelly, S. and Wang, Y. (2006). Exploration exploitation in go: UCT forMonte-Carlo go. In NIPS: Neural Information Processing Systems ConferenceOn-line trading of Exploration and Exploitation Workshop. \u2192 page 35138Genest, C., Ghoudi, K., and Rivest, L.-P. (1995). A semiparametric estimationprocedure of dependence parameters in multivariate families of distributions.Biometrika, 82(3):543\u2013552. \u2192 page 25Genest, C., Ghoudi, K., and Rivest, L.-P. (1998). \u201cUnderstanding relationshipsusing copulas,\u201d by Edward Frees and Emiliano Valdez, January 1998. NorthAmerican Actuarial Journal, 2(3):143\u2013149. \u2192 page 64Gijbels, I., Veraverbeke, N., and Omelka, M. (2011). 
Conditional copulas,association measures and their applications. Computational Statistics & DataAnalysis, 55(5):1919\u20131932. \u2192 pages 54, 56, 57, 61Gneiting, T. and Raftery, A. E. (2007). Strictly proper scoring rules, prediction,and estimation. Journal of the American Statistical Association,102(477):359\u2013378. \u2192 pages 94, 95, 96Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of StatisticalLearning: Data Mining, Inference and Prediction. Springer, 2 edition. \u2192 page95Hob\u00e6k Haff, I., Aas, K., and Frigessi, A. (2010). On the simplified pair-copulaconstruction \u2014 simply useful or too simplistic? Journal of MultivariateAnalysis, 101(5):1296\u20131310. \u2192 page 54Hua, L. and Joe, H. (2011). Tail order and intermediate tail dependence ofmultivariate copulas. Journal of Multivariate Analysis, 102(10):1454\u20131471. \u2192pages 16, 118, 119, 120, 121Ha\u00a8rdle, W. (1990). Applied Nonparametric Regression. Econometric SocietyMonographs. Cambridge University Press. \u2192 page 85Joe, H. (1993). Parametric families of multivariate distributions with givenmargins. Journal of Multivariate Analysis, 46(2):262\u2013282. \u2192 page 53Joe, H. (1997). Multivariate Models and Dependence Concepts. Chapman andHall\/CRC. \u2192 pages 12, 25Joe, H. (2005). Asymptotic efficiency of the two-stage estimation method forcopula-based models. Journal of Multivariate Analysis, 94(2):401\u2013419. \u2192page 25Joe, H. (2014). Dependence Modeling with Copulas. Chapman & Hall \/ CRCPress, Boca Raton, FL. \u2192 pages10, 23, 27, 42, 45, 57, 64, 68, 72, 85, 87, 89, 91, 93, 117139Joe, H. and Hu, T. (1996). Multivariate distributions from mixtures ofmax-infinitely divisible distributions. Journal of multivariate analysis,57(2):240\u2013265. \u2192 page 12Joe, H. and Xu, J. J. (1996). The estimation method of inference functions formargins for multivariate models. University of British Columbia, Departmentof Statistics, Technical Report, 166. 
\u2192 page 25Kendall, M. G. (1938). A new measure of rank correlation. Biometrika,30(1\/2):81\u201393. \u2192 pages 14, 53Kocsis, L. and Szepesva\u00b4ri, C. (2006). Bandit based Monte-Carlo planning. InEuropean Conference on Machine Learning, pages 282\u2013293. Springer. \u2192pages 4, 28Kraus, D. and Czado, C. (2017a). D-vine copula based quantile regression.Computational Statistics & Data Analysis, 110:1\u201318. \u2192 pages 86, 102, 105Kraus, D. and Czado, C. (2017b). Growing simplified vine copula trees:improving Dissmann\u2019s algorithm. arXiv preprint arXiv:1703.05203. \u2192 pages54, 73, 74Krupskii, P. (2017). Copula-based measures of reflection and permutationasymmetry and statistical tests. Statistical Papers, 58(4):1165\u20131187. \u2192 pages15, 53, 56, 62, 63Krupskii, P., Huser, R., and Genton, M. G. (2018). Factor copula models forreplicated spatial data. Journal of the American Statistical Association,113(521):467\u2013479. \u2192 pages 3, 28Krupskii, P. and Joe, H. (2015). Tail-weighted measures of dependence. Journalof Applied Statistics, 42(3):614\u2013629. \u2192 page 53Kurowicka, D. and Cooke, R. (2003). A parameterization of positive definitematrices in terms of partial correlation vines. Linear Algebra and itsApplications, 372:225\u2013251. \u2192 pages 21, 23Kurowicka, D. and Cooke, R. M. (2006). Uncertainty Analysis with HighDimensional Dependence Modelling. Wiley, Chichester. \u2192 pages 19, 21Kurowicka, D. and Joe, H. (2011). Dependence Modeling: Vine CopulaHandbook. World Scientific, Singapore. \u2192 pages 3, 23, 28, 87, 91Kurz, M. S. and Spanhel, F. (2017). Testing the simplifying assumption inhigh-dimensional vine copulas. arXiv preprint arXiv:1706.02338. \u2192 page 54140Lee, D., Joe, H., and Krupskii, P. (2018). Tail-weighted dependence measureswith limit being the tail dependence coefficient. Journal of NonparametricStatistics, 30(2):262\u2013290. \u2192 pages 14, 53, 56, 61Lichman, M. (2013). UCI machine learning repository. 
\u2192 pages 45, 98Loader, C. (1999). Local Regression and Likelihood. New York: Springer-Verlag.\u2192 page 58McNeil, A. J., Frey, R., and Embrechts, P. (2015). Quantitative Risk Management:Concepts, Techniques and Tools. Princeton University Press. \u2192 page 64Mu\u00a8ller, D. and Czado, C. (2019). Dependence modeling in ultra high dimensionswith vine copulas and the graphical lasso. Computational Statistics & DataAnalysis, 137:211\u2013232. \u2192 page 108Nadaraya, E. A. (1964). On estimating regression. Theory of Probability & ItsApplications, 9(1):141\u2013142. \u2192 page 58Nagler, T., Bumann, C., and Czado, C. (2019). Model selection in sparsehigh-dimensional vine copula models with an application to portfolio risk.Journal of Multivariate Analysis. \u2192 page 108Nagler, T. and Czado, C. (2016). Evading the curse of dimensionality innonparametric density estimation with simplified vine copulas. Journal ofMultivariate Analysis, 151:69\u201389. \u2192 pages 85, 92Nash, W. J., Sellers, T. L., Talbot, S. R., Cawthorn, A. J., and Ford, W. B. (1994).The population biology of abalone (haliotis species) in tasmania. i. blacklipabalone (h. rubra) from the north coast and islands of bass strait. Sea FisheriesDivision, Technical Report, (48). \u2192 page 98Noh, H., El Ghouch, A., and Bouezmarni, T. (2013). Copula-based regressionestimate and inference. Journal of the American Statistical Association,108(502):678\u2013688. \u2192 pages 85, 86Panagiotelis, A., Czado, C., and Joe, H. (2012). Pair copula constructions formultivariate discrete data. Journal of the American Statistical Association,107(499):1063\u20131072. \u2192 page 12Parsa, R. A. and Klugman, S. A. (2011). Copula regression. Variance Advancingand Science of Risk, 5:45\u201354. \u2192 page 85Prim, R. C. (1957). Shortest connection networks and some generalizations. BellLabs Technical Journal, 36(6):1389\u20131401. \u2192 pages 24, 31141Rosco, J. and Joe, H. (2013). 
Measures of tail asymmetry for bivariate copulas.Statistical Papers, 54(3):709\u2013726. \u2192 pages 16, 53, 63Schallhorn, N., Kraus, D., Nagler, T., and Czado, C. (2017). D-vine quantileregression with discrete variables. arXiv preprint arXiv:1705.08310. \u2192 page86Schepsmeier, U., Stoeber, J., Brechmann, E. C., Graeler, B., Nagler, T., andErhardt, T. (2018). VineCopula: Statistical Inference of Vine Copulas. Rpackage version 2.1.8. \u2192 pages 25, 48, 90, 91Selten, R. (1998). Axiomatic characterization of the quadratic scoring rule.Experimental Economics, 1(1):43\u201361. \u2192 page 95Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche,G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al.(2016). Mastering the game of go with deep neural networks and tree search.Nature, 529(7587):484\u2013489. \u2192 pages 28, 31Sklar, A. (1959). Fonctions de re\u00b4partition a` n dimensions et leurs marges.Publications de l\u2019Institut de Statistique de l\u2019Universite\u00b4 de Paris, 8:229\u2013231. \u2192pages 2, 9, 10Spearman, C. (1904). The proof and measurement of association between twothings. The American Journal of Psychology, 15(1):72\u2013101. \u2192 pages 14, 53Sto\u00a8ber, J., Hong, H. G., Czado, C., and Ghosh, P. (2015). Comorbidity of chronicdiseases in the elderly: Patterns identified by a copula design for mixedresponses. Computational Statistics and Data Analysis, 88:28\u201339. \u2192 page 12Stoeber, J., Joe, H., and Czado, C. (2013). Simplified pair copulaconstructions\u2014limitations and extensions. Journal of Multivariate Analysis,119:101\u2013118. \u2192 pages 54, 68Stone, C. J. (1977). Consistent nonparametric regression. The annals of statistics,pages 595\u2013620. \u2192 page 85Tomczak, K., Czerwin\u00b4ska, P., and Wiznerowicz, M. (2015). The Cancer GenomeAtlas (TCGA): an immeasurable source of knowledge. ContemporaryOncology, 19(1A):A68. \u2192 pages 45, 76Vuong, Q. H. (1989). 
Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica: Journal of the Econometric Society, 57(2):307–333.
Watson, G. S. (1964). Smooth regression analysis. Sankhyā: The Indian Journal of Statistics, Series A, 26(4):359–372.
Yoshiba, T. (2018). Maximum likelihood estimation of skew-t copulas with its applications to stock returns. Journal of Statistical Computation and Simulation, 88(13):2489–2506.

Appendix A
Derivations for Chapter 6

A.1 Derivations for Section 6.4.1

Tail expansions of the gamma CDF
For a random variable $Z$ following Gamma$(\alpha,\beta)$, the CDF is
\[ F_Z(z) = \gamma(\alpha, \beta z)/\Gamma(\alpha), \]
where $\beta$ is the rate parameter and $\gamma(\cdot,\cdot)$ is the lower incomplete gamma function. Its CDF $F_Z$ and quantile function $F_Z^{-1}$ have the following asymptotic behavior:
\[ F_Z(z) \sim 1 - \frac{(\beta z)^{\alpha-1}}{\Gamma(\alpha)} e^{-\beta z}, \quad z \to +\infty; \qquad F_Z^{-1}(p) \sim \frac{-\log(1-p)}{\beta}, \quad p \to 1^-. \]
\[ F_Z(z) \sim \frac{(\beta z)^{\alpha}}{\Gamma(\alpha+1)}, \quad z \to 0^+; \qquad F_Z^{-1}(p) \sim \frac{p^{1/\alpha}\,\Gamma(\alpha+1)^{1/\alpha}}{\beta}, \quad p \to 0^+. \]

Upper tail
We first study the upper tail asymptotic behavior of $g(x_1,x_2)$ as $x_1,x_2 \to +\infty$. Without loss of generality, we assume $\beta = 1$ because $\beta$ cancels in Equation A.1.
\[ \Phi(x_1) \sim 1 - \frac{1}{\sqrt{2\pi}\,x_1} e^{-x_1^2/2}, \quad x_1 \to +\infty, \]
\[ F^{-1}_{X_1^*}(\Phi(x_1)) \sim -\log\left(\frac{1}{\sqrt{2\pi}\,x_1} e^{-x_1^2/2}\right) \sim \frac{x_1^2}{2}, \quad x_1 \to +\infty. \]
Similarly,
\[ F^{-1}_{X_2^*}(\Phi(x_2)) \sim \frac{x_2^2}{2}, \quad x_2 \to +\infty. \]
\[ F_{Y^*}\left(F^{-1}_{X_1^*}(\Phi(x_1)) + F^{-1}_{X_2^*}(\Phi(x_2))\right) \sim 1 - \frac{1}{\Gamma(\alpha_1+\alpha_2)} \left(\frac{x_1^2+x_2^2}{2}\right)^{\alpha_1+\alpha_2-1} e^{-(x_1^2+x_2^2)/2}, \quad x_1,x_2 \to +\infty. \tag{A.1} \]
Finally,
\[ g(x_1,x_2) = \Phi^{-1}\left(F_{Y^*}\left(F^{-1}_{X_1^*}(\Phi(x_1)) + F^{-1}_{X_2^*}(\Phi(x_2))\right)\right) \sim \left(-2\log\left(1 - F_{Y^*}\left(F^{-1}_{X_1^*}(\Phi(x_1)) + F^{-1}_{X_2^*}(\Phi(x_2))\right)\right)\right)^{1/2} \sim (x_1^2+x_2^2)^{1/2}, \quad x_1,x_2 \to +\infty. \]
If $x_2 \sim k x_1$ as $x_1,x_2 \to +\infty$, then $g(x_1,x_2) \sim \sqrt{1+k^2}\, x_1$.

Lower tail
Without loss of generality, we assume $\beta = 1$. For the lower tail, $x_1,x_2 \to -\infty$,
\[ \Phi(x_1) \sim -\frac{1}{\sqrt{2\pi}\,x_1} e^{-x_1^2/2}, \quad x_1 \to -\infty, \]
and
\[ F^{-1}_{X_1^*}(\Phi(x_1)) \sim \Gamma(\alpha_1+1)^{1/\alpha_1} (\Phi(x_1))^{1/\alpha_1} \sim \left(\frac{\Gamma(\alpha_1+1)}{\sqrt{2\pi}}\right)^{1/\alpha_1} \left(-\frac{1}{x_1}\right)^{1/\alpha_1} \exp\left(-\frac{x_1^2}{2\alpha_1}\right), \quad x_1 \to -\infty. \]
Similarly,
\[ F^{-1}_{X_2^*}(\Phi(x_2)) \sim \left(\frac{\Gamma(\alpha_2+1)}{\sqrt{2\pi}}\right)^{1/\alpha_2} \left(-\frac{1}{x_2}\right)^{1/\alpha_2} \exp\left(-\frac{x_2^2}{2\alpha_2}\right), \quad x_2 \to -\infty. \]
\[ F_{Y^*}\left(F^{-1}_{X_1^*}(\Phi(x_1)) + F^{-1}_{X_2^*}(\Phi(x_2))\right) \sim \frac{1}{\Gamma(\alpha_1+\alpha_2+1)} \left[ \left(\frac{\Gamma(\alpha_1+1)}{\sqrt{2\pi}}\right)^{1/\alpha_1} \left(-\frac{1}{x_1}\right)^{1/\alpha_1} \exp\left(-\frac{x_1^2}{2\alpha_1}\right) + \left(\frac{\Gamma(\alpha_2+1)}{\sqrt{2\pi}}\right)^{1/\alpha_2} \left(-\frac{1}{x_2}\right)^{1/\alpha_2} \exp\left(-\frac{x_2^2}{2\alpha_2}\right) \right]^{\alpha_1+\alpha_2}. \]
Assuming $x_2 \sim k x_1$, if $k^2 > \alpha_2/\alpha_1$, then $\exp(-x_1^2/(2\alpha_1))$ dominates $\exp(-x_2^2/(2\alpha_2))$. Hence
\[ F_{Y^*}\left(F^{-1}_{X_1^*}(\Phi(x_1)) + F^{-1}_{X_2^*}(\Phi(x_2))\right) = O\left( \left(-\frac{1}{x_1}\right)^{(\alpha_1+\alpha_2)/\alpha_1} \exp\left(-\frac{(\alpha_1+\alpha_2)\,x_1^2}{2\alpha_1}\right) \right), \]
and, since $\Phi^{-1}(p) \sim -(-2\log p)^{1/2}$ as $p \to 0^+$,
\[ g(x_1,x_2) = \Phi^{-1}\left(F_{Y^*}\left(F^{-1}_{X_1^*}(\Phi(x_1)) + F^{-1}_{X_2^*}(\Phi(x_2))\right)\right) \sim -\left(-2\log F_{Y^*}\left(F^{-1}_{X_1^*}(\Phi(x_1)) + F^{-1}_{X_2^*}(\Phi(x_2))\right)\right)^{1/2} \sim \sqrt{\frac{\alpha_1+\alpha_2}{\alpha_1}}\, x_1, \quad x_1,x_2 \to -\infty,\ x_2 \sim k x_1. \]
It can be shown that the result holds for $k^2 = \alpha_2/\alpha_1$ as well. 
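The dominance argument for the lower tail can be checked numerically on the log scale. The sketch below is not part of the thesis code: the function names (`log_term`, `approx_g`, `slope_ratio`) are hypothetical, it evaluates only the leading-order expressions displayed above with $\beta = 1$, and it uses the leading-order approximation $\Phi^{-1}(p) \approx -\sqrt{-2\log p}$ rather than an exact normal quantile.

```python
import math


def log_term(x, a):
    """Log of (Gamma(a+1)/sqrt(2*pi))**(1/a) * (-1/x)**(1/a) * exp(-x**2/(2*a)), x < 0."""
    return (math.lgamma(a + 1.0) - 0.5 * math.log(2.0 * math.pi)
            - math.log(-x)) / a - x * x / (2.0 * a)


def approx_g(x1, x2, a1, a2):
    """Leading-order g(x1, x2) in the lower tail (assumption: beta = 1)."""
    t1, t2 = log_term(x1, a1), log_term(x2, a2)
    m = max(t1, t2)
    # log-sum-exp keeps the sum of two tiny terms from underflowing
    log_sum = m + math.log(math.exp(t1 - m) + math.exp(t2 - m))
    log_cdf = (a1 + a2) * log_sum - math.lgamma(a1 + a2 + 1.0)
    # Phi^{-1}(p) ~ -sqrt(-2 log p) as p -> 0+
    return -math.sqrt(-2.0 * log_cdf)


def slope_ratio(x1, k, a1, a2):
    """Ratio of approx_g to the limit sqrt((a1+a2)/a1) * x1, valid for k**2 >= a2/a1."""
    limit = math.sqrt((a1 + a2) / a1) * x1
    return approx_g(x1, k * x1, a1, a2) / limit


if __name__ == "__main__":
    # alpha1 = 2, alpha2 = 1, k = 1, so k**2 = 1 > alpha2/alpha1 = 0.5
    for x1 in (-10.0, -100.0, -1000.0):
        print(x1, slope_ratio(x1, 1.0, 2.0, 1.0))
```

The ratio tends to 1 only slowly, because the neglected terms are logarithmic in $|x_1|$; this is expected from the form of the expansion and does not contradict the limit.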
Similarly, if $k^2 < \alpha_2/\alpha_1$, then
\[ g(x_1,x_2) \sim \sqrt{\frac{\alpha_1+\alpha_2}{\alpha_2}}\, k x_1, \quad x_1,x_2 \to -\infty,\ x_2 \sim k x_1. \]
In summary,
\[ g(x_1,x_2) \sim \begin{cases} \sqrt{\dfrac{\alpha_1+\alpha_2}{\alpha_1}}\, x_1 & \text{if } k^2 \ge \alpha_2/\alpha_1, \\[1ex] \sqrt{\dfrac{\alpha_1+\alpha_2}{\alpha_2}}\, k x_1 & \text{if } k^2 < \alpha_2/\alpha_1. \end{cases} \]

A.2 Derivations for Section 6.4.2

Trivariate vine copula lower tail
Fix $v \in (0,1)$ and let $u_1,u_2 \to 0$ with $u_2 \sim u_1^k$. According to Equations 6.10 and 6.14, depending on the tail order of $C_{23}(u_2,u_3)$, there are three cases of the limit of $u_{3|2} = C_{3|2}(v|u_2)$ as $u_2 \to 0$:
• If $C_{23}$ has $\kappa_{23L} = 1$, then from Equation 6.10,
\[ u_{3|2} \sim 1 - A_1(v)\, u_2^{-1/q_{23}} \to 1, \quad u_2 \to 0,\ v \in (0,1) \text{ fixed},\ A_1(v) > 0, \tag{A.2} \]
where $q_{23}$ is a parameter of the LT $\psi_{23}$ for $C_{23}$ in Equation 6.3.
• If $C_{23}$ has $\kappa_{23L} \in (1,2)$, then from Equation 6.14,
\[ u_{3|2} \sim 1 - A_2(v)(-\log u_2)^{1-1/r_{23}} \to 1, \quad u_2 \to 0,\ v \in (0,1) \text{ fixed},\ A_2(v) > 0, \tag{A.3} \]
where $r_{23}$ is a parameter of the LT $\psi_{23}$ for $C_{23}$ in Equation 6.3.
• If $C_{23}$ has $\kappa_{23L} = 2$, then $\lim_{u_2 \to 0} u_{3|2} \in (0,1)$.
Therefore, the limit of $u_{3|2} = C_{3|2}(v|u_2)$ could either be a number in $(0,1)$, or 1.

All cells in Table 6.1 are clear except row 3, column 6 (cell ∗), that is $u_{1|2} \to 1$, $u_{3|2} \to 1$ and $C_{13;2}$ has upper tail dependence. From Equation 6.19, the boundary conditional distribution can be written as
\[ C_{3|1;2}(u_{3|2}|u_{1|2}) \sim \left(1 + \left(\frac{1-u_{3|2}}{1-u_{1|2}}\right)^{1/M_{13;2}}\right)^{M_{13;2}-1}, \quad (u_{1|2},u_{3|2}) \to (1,1), \]
where $M_{13;2}$ is a parameter of the LT $\psi_{13;2}$ for $C_{13;2}$ and $0 < M_{13;2} < 1$.
According to the analysis in Section 6.3.1, $u_{1|2} \to 1$ implies that $C_{12}$ has lower tail dependence and $u_2 \sim u_1^k$ with $k > 1$. 
However, $C_{23}$ could have $\kappa_{23L} = 1$ or $\kappa_{23L} \in (1,2)$.
• If $C_{23}$ has $\kappa_{23L} \in (1,2)$, then from Equation 6.12 and Equation 6.14,
\[ 1-u_{3|2} \sim A_2(v)(-\log u_2)^{1-1/r_{23}}, \quad u_2 \to 0,\ v \in (0,1) \text{ fixed},\ A_2(v) > 0, \]
\[ 1-u_{1|2} \sim -(q_{12}-1)\, u_2^{(1/k-1)/q_{12}}, \quad (u_1,u_2) \to (0,0),\ u_2 \sim u_1^k,\ q_{12} < 0, \]
\[ \frac{1-u_{3|2}}{1-u_{1|2}} \sim -\frac{A_2(v)}{q_{12}-1} \cdot \frac{(-\log u_2)^{-(1-r_{23})/r_{23}}}{u_2^{(1/k-1)/q_{12}}} \to \infty, \]
where $r_{23}$ and $q_{12}$ are the parameters of $\psi_{23}$ for $C_{23}$ and $\psi_{12}$ for $C_{12}$ respectively. Therefore, $C_{3|1;2}(u_{3|2}|u_{1|2}) \to 0$.
• If $C_{23}$ has $\kappa_{23L} = 1$, then from Equation 6.10 and Equation 6.12,
\[ 1-u_{3|2} \sim A_1(v)\, u_2^{-1/q_{23}}, \quad u_2 \to 0,\ v \in (0,1) \text{ fixed},\ A_1(v) > 0, \]
\[ 1-u_{1|2} \sim -(q_{12}-1)\, u_2^{(1/k-1)/q_{12}}, \quad (u_1,u_2) \to (0,0),\ u_2 \sim u_1^k,\ q_{12} < 0, \]
\[ \frac{1-u_{3|2}}{1-u_{1|2}} \sim -\frac{A_1(v)}{q_{12}-1}\, u_2^{-1/q_{23} - (1/k-1)/q_{12}}, \]
where $q_{23}$ and $q_{12}$ are the parameters of $\psi_{23}$ for $C_{23}$ and $\psi_{12}$ for $C_{12}$ respectively. Therefore,
\[ C_{3|1;2}(u_{3|2}|u_{1|2}) \to \begin{cases} 1 & \text{if } -q_{23}^{-1} - q_{12}^{-1}(k^{-1}-1) > 0, \\ \text{const} \in (0,1) & \text{if } -q_{23}^{-1} - q_{12}^{-1}(k^{-1}-1) = 0, \\ 0 & \text{if } -q_{23}^{-1} - q_{12}^{-1}(k^{-1}-1) < 0. \end{cases} \]

Trivariate vine copula upper tail
Fix $v \in (0,1)$ and let $(u_1,u_2) \to (1,1)$ with $(1-u_2) \sim (1-u_1)^k$. According to Equation 6.18, depending on the tail order of $C_{23}(u_2,u_3)$, there are two cases of the limit of $u_{3|2} = C_{3|2}(v|u_2)$ as $u_2 \to 1$:
• If $C_{23}$ has $\kappa_{23U} = 1$, then from Equation 6.18,
\[ u_{3|2} \sim A_3(v)(1-u_2)^{(1-M_{23})/M_{23}} \to 0, \quad u_2 \to 1,\ v \in (0,1) \text{ fixed},\ A_3(v) > 0, \tag{A.4} \]
where $M_{23}$ is a parameter of the LT $\psi_{23}$ for $C_{23}$ in Equation 6.4.
• If $C_{23}$ has $\kappa_{23U} \in (1,2]$, then $\lim_{u_2 \to 1} u_{3|2} \in (0,1)$.
Therefore, the limit of $u_{3|2} = C_{3|2}(v|u_2)$ could either be a number in $(0,1)$, or 0.

Cell ∗ in Table 6.2. 
Since $C_{13;2}$ has $\kappa_{13L} \in (1,2)$, the conditional distribution can be written as, via Equation 6.15,
\[ C_{3|1;2}(u_{3|2}|u_{1|2}) \sim \left(1 + \left(\frac{-\log u_{3|2}}{-\log u_{1|2}}\right)^{1/r_{13;2}}\right)^{q_{13;2}+r_{13;2}-1} u_{1|2}^{\left(1 + \left(\frac{-\log u_{3|2}}{-\log u_{1|2}}\right)^{1/r_{13;2}}\right)^{r_{13;2}} - 1}, \quad (u_{3|2},u_{1|2}) \to (0,0), \]
where $q_{13;2}$ and $r_{13;2}$ are parameters of the LT $\psi_{13;2}$ for $C_{13;2}$, and $0 < r_{13;2} < 1$.
Since $C_{23}$ has $\kappa_{23U} = 1$, then by Equation 6.18,
\[ u_{3|2} \sim A_3(v)(1-u_2)^{(1-M_{23})/M_{23}} \to 0, \quad u_2 \to 1,\ v \in (0,1) \text{ fixed}, \]
and
\[ -\log u_{3|2} \sim -\log A_3(v) - \frac{M_{23}-1}{M_{23}}\left(-\log(1-u_2)\right), \]
where $M_{23}$ is a parameter of the LT $\psi_{23}$ for $C_{23}$. $C_{12}$ has upper tail dependence and $(1-u_2) \sim (1-u_1)^k$, where $k > 1$ because $u_{1|2} \to 0$. We have by Equation 6.21,
\[ u_{1|2} \sim (1-u_2)^{(M_{12}-1)(1/k-1)/M_{12}} \to 0, \quad (u_1,u_2) \to (1,1),\ (1-u_2) \sim (1-u_1)^k, \]
and
\[ -\log u_{1|2} \sim \frac{M_{12}-1}{M_{12}}\left(\frac{1}{k}-1\right)\left(-\log(1-u_2)\right), \]
where $M_{12}$ is a parameter of the LT $\psi_{12}$ for $C_{12}$. Therefore,
\[ B := \frac{-\log u_{3|2}}{-\log u_{1|2}} \sim \frac{-\dfrac{M_{23}-1}{M_{23}}}{\dfrac{M_{12}-1}{M_{12}}\left(\dfrac{1}{k}-1\right)} > 0, \]
and, writing $q = q_{13;2}$ and $r = r_{13;2}$,
\[ C_{3|1;2}(u_{3|2}|u_{1|2}) \sim (1+B^{1/r})^{q+r-1}\, u_{1|2}^{(1+B^{1/r})^{r}-1} \to 0, \quad u_{1|2} \to 0. \]
The cell ∗ converges to 0.

Cell † in Table 6.2. Since $C_{13;2}$ has lower tail dependence, the conditional distribution can be written as, via Equation 6.11,
\[ C_{3|1;2}(u_{3|2}|u_{1|2}) \sim \left(1 + \left(\frac{u_{3|2}}{u_{1|2}}\right)^{1/q_{13;2}}\right)^{q_{13;2}-1}, \quad (u_{3|2},u_{1|2}) \to (0,0), \]
where $q_{13;2}$ is a parameter of the LT $\psi_{13;2}$ for $C_{13;2}$ and $q_{13;2} < 0$; $u_{3|2}$ and $u_{1|2}$ are the same as in the previous case. 
Therefore,
\[ \frac{u_{3|2}}{u_{1|2}} \sim A_3(v)(1-u_2)^{-(M_{23}-1)/M_{23} - (M_{12}-1)(1/k-1)/M_{12}}, \quad (u_{3|2},u_{1|2}) \to (0,0), \]
\[ C_{3|1;2}(u_{3|2}|u_{1|2}) \to \begin{cases} 0 & \text{if } -\dfrac{M_{23}-1}{M_{23}} - \dfrac{M_{12}-1}{M_{12}}\left(\dfrac{1}{k}-1\right) > 0, \\[1ex] \text{const} \in (0,1) & \text{if } -\dfrac{M_{23}-1}{M_{23}} - \dfrac{M_{12}-1}{M_{12}}\left(\dfrac{1}{k}-1\right) = 0, \\[1ex] 1 & \text{if } -\dfrac{M_{23}-1}{M_{23}} - \dfrac{M_{12}-1}{M_{12}}\left(\dfrac{1}{k}-1\right) < 0. \end{cases} \]

A.3 Derivations for case 1 in Section 6.4.3

Lower tail
We give a more detailed analysis to derive the rate at which the conditional quantile goes to $-\infty$. Let $(u_1,u_2) \to (0,0)$ and $u_2 \sim u_1^k$. We are interested in the conditional quantile $C^{-1}_{3|12}(\alpha|u_1,u_2)$. In other words, we need to find $v$ such that $C_{3|2}(v|u_2) = C^{-1}_{3|1;2}(\alpha|u_{1|2})$. Let $q_{12},q_{23},q_{13;2}$ and $r_{12},r_{23},r_{13;2}$ be the parameters of $\psi_{12}$ for $C_{12}$, $\psi_{23}$ for $C_{23}$ and $\psi_{13;2}$ for $C_{13;2}$ respectively. By Equation 6.16,
\[ u_{1|2} \sim \left(1 + \left(\frac{1}{k}\right)^{1/r_{12}}\right)^{q_{12}+r_{12}-1} u_2^{\left(1+(1/k)^{1/r_{12}}\right)^{r_{12}}-1} \to 0, \qquad -\log u_{1|2} = O(-\log u_2). \]
By Equation 6.6,
\[ C^{-1}_{3|1;2}(\alpha|u_{1|2}) \sim \exp\left[-\left(\frac{-\log\alpha}{r_{13;2}}\right)^{r_{13;2}} (-\log u_{1|2})^{1-r_{13;2}}\right], \]
\[ -\log C^{-1}_{3|1;2}(\alpha|u_{1|2}) \sim \left(\frac{-\log\alpha}{r_{13;2}}\right)^{r_{13;2}} (-\log u_{1|2})^{1-r_{13;2}} = O\left((-\log\alpha)^{r_{13;2}} (-\log u_2)^{1-r_{13;2}}\right). \]
According to Equation 6.15,
\[ C_{3|2}(v|u_2) \sim \left(1 + \left(\frac{-\log v}{-\log u_2}\right)^{1/r_{23}}\right)^{q_{23}+r_{23}-1} \times \exp\left(-(-\log u_2)\left(1 + \left(\frac{-\log v}{-\log u_2}\right)^{1/r_{23}}\right)^{r_{23}} + (-\log u_2)\right). \]
For $C_{3|2}(v|u_2) = C^{-1}_{3|1;2}(\alpha|u_{1|2})$ to hold, it has to be true that $(-\log v)/(-\log u_2) \to 0$ as $u_2 \to 0$. 
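The step from this condition to the next expansion is a first-order Taylor expansion of the bracketed term in Equation 6.15. A brief sketch follows; the shorthand $t$ and $L$ is introduced only for this note and is not the thesis's notation.

```latex
\[
  t := \frac{-\log v}{-\log u_2} \to 0, \qquad L := -\log u_2 \to \infty,
\]
\[
  \left(1 + t^{1/r_{23}}\right)^{r_{23}}
    = 1 + r_{23}\, t^{1/r_{23}} + o\!\left(t^{1/r_{23}}\right),
\]
\[
  -\log C_{3|2}(v \mid u_2)
    \sim L\left[\left(1 + t^{1/r_{23}}\right)^{r_{23}} - 1\right]
       - (q_{23} + r_{23} - 1)\log\!\left(1 + t^{1/r_{23}}\right)
    = r_{23}\, L\, t^{1/r_{23}}\,\bigl(1 + o(1)\bigr).
\]
```

The logarithm of the prefactor is of order $t^{1/r_{23}}$, so it is negligible next to the term multiplied by $L$. Substituting $t$ and $L$ back expresses the right-hand side as a ratio of powers of $-\log v$ and $-\log u_2$.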
As a result,
\[ -\log C_{3|2}(v|u_2) \sim \frac{r_{23}(-\log v)^{1/r_{23}}}{(-\log u_2)^{1/r_{23}-1}}. \]
Solving for $-\log C_{3|2}(v|u_2) = -\log C^{-1}_{3|1;2}(\alpha|u_{1|2})$, we have
\[ -\log v = -\log C^{-1}_{3|12}(\alpha|u_1,u_2) = O\left((-\log\alpha)^{r_{23} r_{13;2}} (-\log u_2)^{1-r_{23} r_{13;2}}\right). \]
On the normal scale, it implies that, with $u_2 = \Phi(x_2) \sim \phi(x_2)/|x_2|$,
\[ F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2) \sim -(-2\log v)^{1/2} = O\left((-\log\alpha)^{r_{23} r_{13;2}/2}\, |x_2|^{1-r_{23} r_{13;2}}\right), \quad x_1,x_2 \to -\infty,\ x_2/x_1 \to \sqrt{k}. \]
Since $1 - r_{23} r_{13;2} < 1$, the conditional quantile function goes to $-\infty$ sublinearly with respect to $x_1$ or $x_2$.

Upper tail
We give a more detailed analysis to derive the rate at which the conditional quantile goes to $+\infty$. Assume $(1-u_2) \sim (1-u_1)^k$ as $(u_1,u_2) \to (1,1)$. Let $M_{12},M_{23},M_{13;2}$ be the parameters of $\psi_{12}$ for $C_{12}$, $\psi_{23}$ for $C_{23}$ and $\psi_{13;2}$ for $C_{13;2}$ respectively.
• If $k > 1$, then $u_{1|2} \to 0$. For a fixed quantile level $\alpha \in (0,1)$, $u_{3|2}$ has to converge to 0. By Equation 6.21,
\[ u_{1|2} \sim (1-u_2)^{(M_{12}-1)(1/k-1)/M_{12}} \to 0, \qquad -\log u_{1|2} = O(-\log(1-u_2)). \]
By Equation 6.19, as $u_2 \to 1$,
\[ u_{3|2} \sim \left(1 + \left(\frac{1-v}{1-u_2}\right)^{1/M_{23}}\right)^{M_{23}-1} \sim \left(\frac{1-v}{1-u_2}\right)^{1-1/M_{23}} \to 0, \]
\[ -\log u_{3|2} \sim \left(1 - \frac{1}{M_{23}}\right)\left(-\log(1-v) + \log(1-u_2)\right). \]
Since $C_{13;2}$ has $\kappa_{13L} \in (1,2)$, with a Taylor expansion of Equation 6.15,
\[ -\log u_{3|12} \sim \frac{r_{13;2}(-\log u_{3|2})^{1/r_{13;2}}}{(-\log u_{1|2})^{1/r_{13;2}-1}} \sim \frac{r_{13;2}\left(1-\frac{1}{M_{23}}\right)^{1/r_{13;2}} \left(-\log(1-v) + \log(1-u_2)\right)^{1/r_{13;2}}}{(-\log(1-u_2))^{1/r_{13;2}-1}}. \]
Letting $u_{3|12} = \alpha$ and solving for $v$, we have
\[ -\log(1-v) \sim -\log(1-u_2) + O\left((-\log\alpha)^{r_{13;2}} (-\log(1-u_2))^{1-r_{13;2}}\right) \sim -\log(1-u_2). \]
On the normal scale, it implies
\[ F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2) \sim x_2, \quad x_1,x_2 \to +\infty,\ x_2/x_1 \to \sqrt{k}. \]
• If $k = 1$, then $u_{1|2} \to 2^{M_{12}-1}$. 
For a fixed quantile level $\alpha \in (0,1)$, $u_{3|2}$ has to converge to a constant. Specifically, $u_{3|2} \to C^{-1}_{3|1;2}(\alpha|2^{M_{12}-1}) = O(1)$. By Equation 6.19,
\[ u_{3|2} \sim \left(1 + \left(\frac{1-v}{1-u_2}\right)^{1/M_{23}}\right)^{M_{23}-1} = O(1), \]
\[ 1-v = O(1-u_2), \qquad -\log(1-v) \sim -\log(1-u_2). \]
On the normal scale, it implies
\[ F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2) \sim x_2, \quad x_1,x_2 \to +\infty,\ x_2/x_1 \to 1. \]
• If $k < 1$, then $u_{1|2} \to 1$. For a fixed quantile level $\alpha \in (0,1)$, $u_{3|2}$ has to converge to 1. By Equation 6.21,
\[ u_{1|2} \sim 1 + (M_{12}-1)(1-u_2)^{(1/k-1)/M_{12}} \to 1, \qquad \log(1-u_{1|2}) \sim \frac{1}{M_{12}}\left(\frac{1}{k}-1\right)\log(1-u_2). \]
By Equation 6.19,
\[ u_{3|2} \sim \left(1 + \left(\frac{1-v}{1-u_2}\right)^{1/M_{23}}\right)^{M_{23}-1} \sim 1 + (M_{23}-1)\left(\frac{1-v}{1-u_2}\right)^{1/M_{23}} \to 1, \]
and $(1-v)/(1-u_2) \to 0$, so that
\[ \log(1-u_{3|2}) \sim \frac{1}{M_{23}}\left(\log(1-v) - \log(1-u_2)\right). \]
Since $C_{13;2}$ has $\kappa_{13U} = 1$, by Equation 6.19,
\[ u_{3|12} \sim \left(1 + \left(\frac{1-u_{3|2}}{1-u_{1|2}}\right)^{1/M_{13;2}}\right)^{M_{13;2}-1}. \]
Letting $u_{3|12} = \alpha$ and solving for $v$, we have
\[ \left(\alpha^{1/(M_{13;2}-1)} - 1\right)^{M_{13;2}} \sim \frac{1-u_{3|2}}{1-u_{1|2}}, \]
\[ \begin{aligned} M_{13;2}\log\left(\alpha^{1/(M_{13;2}-1)} - 1\right) &\sim \log(1-u_{3|2}) - \log(1-u_{1|2}) \\ &\sim \frac{1}{M_{23}}\left(\log(1-v) - \log(1-u_2)\right) - \frac{1}{M_{12}}\left(\frac{1}{k}-1\right)\log(1-u_2) \\ &\sim \frac{1}{M_{23}}\log(1-v) - \left(\frac{1}{M_{23}} + \frac{1}{M_{12}}\left(\frac{1}{k}-1\right)\right)\log(1-u_2), \end{aligned} \]
\[ -\log(1-v) \sim \left(1 + \frac{M_{23}}{M_{12}}\left(\frac{1}{k}-1\right)\right)\left(-\log(1-u_2)\right). \]
On the normal scale, it implies
\[ F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2) \sim \sqrt{1 + \frac{M_{23}}{M_{12}}\left(\frac{1}{k}-1\right)}\; x_2, \quad x_1,x_2 \to +\infty,\ x_2/x_1 \to \sqrt{k}. \]

Appendix B
Conditional dependence measures for trivariate Frank copulas

In this section, we conduct a similar analysis to Section 4.4 on a trivariate Frank copula model. 
It is shown that the trivariate Frank copula model has less variation in the conditional dependence measures than the gamma factor model.
The copula CDF of a trivariate Archimedean copula is
\[ C_{123}(u_1,u_2,u_3) = \psi\left(\psi^{-1}(u_1) + \psi^{-1}(u_2) + \psi^{-1}(u_3)\right). \]
For a Frank copula with parameter $\delta$,
\[ \psi(s) = -\frac{\log\left[1 - (1-e^{-\delta})e^{-s}\right]}{\delta}. \]
The copula of the conditional distribution is
\[ C_{12;3}(u_1,u_2;u_3) = \frac{h\left(h^{-1}(u_1 h(\varsigma)) + h^{-1}(u_2 h(\varsigma)) - \varsigma\right)}{h(\varsigma)}, \]
where $\varsigma = \psi^{-1}(u_3)$ and
\[ h(s) = -\psi'(s) = \frac{(1-e^{-\delta})e^{-s}}{\delta\left(1 - (1-e^{-\delta})e^{-s}\right)}. \]
Given the analytical form of the copula of the conditional distribution, we can compute the exact conditional dependence measures using numerical integration. Similar to Figure 4.3, we simulate $n = 1000$ samples from a trivariate Frank copula where Kendall's $\tau$ between two variables is 0.6. The exact $\rho_S(C_{12;3}(\cdot;x))$, $\zeta_{\alpha=5}(C_{12;3}(\cdot;x))$, and $\zeta_{\alpha=5}(\widehat{C}_{12;3}(\cdot;x))$ computed via numerical integration are shown in red dash-dot lines in Figure B.1. The kernel-smoothed estimates using the Epanechnikov kernel and window size $h_n = 0.2$ are shown in solid dark lines and the bootstrap confidence bands are plotted in dashed dark lines. Compared to the gamma factor model in Section 4.4, it can be visually observed that there is less variation in the conditional dependence measures.

[Figure B.1 panels: (a) Spearman's rho $\rho_S(C_{12;3})$; (b) tail-weighted dependence measure (lower tail) $\zeta_{\alpha=5}(C_{12;3})$; (c) tail-weighted dependence measure (upper tail) $\zeta_{\alpha=5}(\widehat{C}_{12;3})$.]
Figure B.1: Conditional measures of $C_{12;3}(\cdot;x)$, the copula of $Y_1,Y_2$ given $F_3(Y_3) = x$, for a trivariate Frank copula model with a parameter that corresponds to Kendall's $\tau = 0.6$. The sample size is $n = 1000$. The red dash-dot lines are the exact conditional measures computed via numerical integration. 
The dark solid lines and dashed lines are the kernel-smoothed conditional Spearman's rho and the corresponding 90%-level simultaneous bootstrap confidence bands, using the Epanechnikov kernel and window size $h_n = 0.2$.

Appendix C
Implementation of Monte Carlo tree search (MCTS)

C.1 Description

The Python code is written in the object-oriented programming paradigm. There are three classes defined in the code: VineState, CorrMat, and MctsNode.

The VineState class represents an incomplete truncated vine structure, which is internally represented by a list of trees. It contains the following public methods.
• get_child_states returns all the child states, that is, the VineState objects that can be obtained by adding an edge to the current object.
• roll_out returns a complete truncated vine structure by adding edges uniformly at random.
• to_vine_array converts the VineState object to a vine array representation.

The CorrMat class represents a correlation matrix. It provides methods to compute the log-determinant and partial correlations.

The MctsNode class represents a tree node in the search tree. Each object contains a VineState object as an attribute. It also stores the relevant summary statistics. It has the following public methods. add_children adds child nodes to the current node. select_child selects a child node according to the tree policy. roll_out performs the default policy. update updates the summary statistics of the node.

Finally, the main function mcts_vine takes the following arguments.
• corr: Correlation matrix, a two-dimensional NumPy array.
• n_sample: Number of samples, an integer.
• ntrunc: Truncation level, an integer.
• output_dir: Path of the output file (it is passed directly to open()), a string.
• itermax: Maximum number of iterations of MCTS, an integer.
• FPU: First play urgency, a floating point number. 
A larger FPU encourages exploration while a smaller FPU encourages exploitation.
• PB: Progressive bias, a floating point number. A larger PB gives more weight to heuristic or prior knowledge.
• log_freq: Frequency at which debug information is printed, an integer.

The code utilizes the Graph class in python-igraph. Relevant methods and properties of the class are listed below.
• Methods
  – add_edges: Adds some edges to the graph.
  – add_vertices: Adds some vertices to the graph.
  – copy: Creates an exact deep copy of the graph.
  – ecount: Counts the number of edges.
  – get_adjacency: Returns the adjacency matrix of a graph.
  – vcount: Counts the number of vertices.
• Properties
  – es: The edge sequence of the graph.
  – vs: The vertex sequence of the graph.

Table C.1 shows the correspondence of variables and functions defined in the pseudocode in Algorithm 3.1 and in the Python implementation.

Pseudocode        Implementation
----------        --------------
n_v               MctsNode.visits
n_v · x̄_v         MctsNode.sum_score
n(v_1, v_2)       MctsNode.child_visits
v_root            root_node
v_history         temp_node_list
TreePolicy        MctsNode.select_child
DefaultPolicy     MctsNode.roll_out
Backprop          MctsNode.update

Table C.1: Correspondence of variables and functions defined in the pseudocode in Algorithm 3.1 and in the Python implementation.

The provided code requires Python in version ≥ 3.4, the numpy package in version ≥ 1.15, and python-igraph in version ≥ 0.7.

C.2 Example usage

In this section, we provide a code snippet to showcase the usage of the mcts_vine function.

import random

import numpy as np

# An 8-dimensional correlation matrix
rmat = np.array([
    [1.00, 0.98, 0.89, 0.97, 0.96, 0.95, 0.95, 0.60],
    [0.98, 1.00, 0.90, 0.97, 0.95, 0.95, 0.95, 0.62],
    [0.89, 0.90, 1.00, 0.92, 0.87, 0.90, 0.92, 0.66],
    [0.97, 0.97, 0.92, 1.00, 0.98, 0.98, 0.97, 0.63],
    [0.96, 0.95, 0.87, 0.98, 1.00, 0.95, 0.92, 0.54],
    [0.95, 0.95, 0.90, 0.98, 0.95, 1.00, 0.94, 0.61],
    [0.95, 0.95, 0.92, 0.97, 0.92, 0.94, 1.00, 0.69],
    [0.60, 0.62, 0.66, 0.63, 0.54, 0.61, 0.69, 1.00]])

# Set seeds
random.seed(0)
np.random.seed(0)

# Run MCTS
best_vine = mcts_vine(rmat, n_sample=500, output_dir='output.txt', ntrunc=3,
                      itermax=1000, FPU=1.0, PB=0.1, log_freq=100)

# Print the result
print(best_vine.to_vine_array())
## CFI: 0.99
## [[8 8 7 4 4 4 1 4]
##  [0 7 8 7 5 5 4 5]
##  [0 0 4 8 7 7 5 6]
##  [0 0 0 5 8 6 7 7]
##  [0 0 0 0 6 8 6 1]
##  [0 0 0 0 0 1 8 2]
##  [0 0 0 0 0 0 2 8]
##  [0 0 0 0 0 0 0 3]]

C.3 Code

import numpy as np
import igraph
import random
import math
from functools import reduce


class VineState:
    def __init__(self, ntrunc, corr_mat):
        """ Constructor

        This function is only called when constructing the root state.
        Subsequent states are constructed by calling self._clone().

        Args:
            ntrunc: Number of truncation level.
            corr_mat: A CorrMat object.
        """
        self._ntrunc = ntrunc
        self._corr_mat = corr_mat
        # dimension
        self._d = corr_mat.dim()
        assert self._ntrunc > 0
        assert self._ntrunc < self._d
        # self.tree_list is a list of igraph objects, representing an
        # incomplete truncated vine. Each element is a tree, except for the
        # last one, which is an incomplete tree. When self.__init__() is
        # called, self.tree_list is a list with an empty igraph object.
        g = igraph.Graph()
        g.add_vertices(self._d)
        g.vs['name'] = [str(i) for i in range(self._d)]
        self.tree_list = [g]
        # The score of the incomplete vine: -log(1-r^2).
        self.score = 0.0

    def _clone(self):
        """ Create a deep clone of this state. """
        # _corr_mat is a shallow copy
        st = VineState(ntrunc=self._ntrunc, corr_mat=self._corr_mat)
        # Create a deep copy of self.tree_list
        st.tree_list = [g.copy() for g in self.tree_list]
        st.score = self.score
        return st

    def _level(self):
        """ Return the level of the current incomplete vine.

        Level is using zero-based numbering.
        """
        return len(self.tree_list) - 1

    def _is_complete(self):
        """ Whether a vine state is complete.

        A vine state is complete if the last tree in self.tree_list is a
        connected tree, and the current level reaches the truncation level.
        """
        self_g = self.tree_list[-1]
        return (self_g.ecount() == self_g.vcount() - 1) and \
            (self._level() >= self._ntrunc - 1)

    def get_child_states(self):
        """ Get a list of all valid child states.

        If there is none, return an empty list.
        """
        if self._is_complete():
            return []
        self_g = self.tree_list[-1]
        # Append an empty graph to self.tree_list if self_g is a connected
        # tree.
        if self_g.ecount() == self_g.vcount() - 1:
            # self_g is already a tree.
            # If the current tree is connected but it hasn't reached ntrunc,
            # then add another empty graph.
            g = igraph.Graph()
            g.add_vertices(self_g.ecount())
            g.vs['name'] = self_g.es['name']
            self.tree_list.append(g)
            self_g = self.tree_list[-1]
        # Initialize the returned list.
        res = []
        if self_g.ecount() == 0:
            # If self_g is empty, select all pairs of edges as child states.
            for i in range(self_g.vcount()):
                for j in range(i):
                    # Connect i and j.
                    st = self._add_edge_helper(i, j)
                    if st is not None:
                        res.append(st)
        else:
            # If self_g is NOT empty, connect vertices with degree > 0 and
            # vertices with degree == 0. By doing so, there is always only one
            # connected component in the graph. 
            # The way we grow the tree resembles Prim's algorithm, not
            # Kruskal's algorithm.
            adj_mat = self_g.get_adjacency()
            adj_vec = [max(a) for a in adj_mat]
            # Vertices with degree > 0
            conn_ids = [i for i, x in enumerate(adj_vec) if x == 1]
            # Vertices with degree == 0
            disconn_ids = [i for i, x in enumerate(adj_vec) if x == 0]
            for i in conn_ids:
                for j in disconn_ids:
                    # Connect i and j.
                    st = self._add_edge_helper(i, j)
                    if st is not None:
                        res.append(st)
        return res

    def _add_edge_helper(self, i, j):
        """ Add an edge to the last graph in self.tree_list.

        Add an edge between vertex id i and j, if proximity condition is
        satisfied. The score is updated.

        Args:
            i, j: vertex indices in self.tree_list[-1].

        Returns:
            A VineState with the added edge.
            If proximity condition is not satisfied, return None.
        """
        # Create a deep copy of the current state.
        temp_st = self._clone()
        # copy_g is the last incomplete tree in the newly copied state.
        copy_g = temp_st.tree_list[-1]
        if self._level() == 0:
            # If there's only one tree in self.tree_list, no need to consider
            # the proximity condition. Simply add an edge.
            copy_g.add_edges([(i, j)])
            # Add edge name
            copy_g.es[copy_g.ecount() - 1]['name'] = ','.join(
                [str(j), str(i)] if j < i else [str(i), str(j)])
            # Update score
            this_score = -np.log(1 - self._corr_mat.pcorr(i, j) ** 2)
            temp_st.score += this_score
            copy_g.es[copy_g.ecount() - 1]['weight'] = this_score
        else:
            # When level >= 1 (zero-based), check the proximity condition
            # first. If it is not satisfied, return None.
            # Otherwise, add the edge.
            prev_g = temp_st.tree_list[-2]
            # Get vertex names of i, j in copy_g
            i_v_name = copy_g.vs[i]['name']
            j_v_name = copy_g.vs[j]['name']
            # Get edge ids in prev_g
            i_edge = prev_g.es.find(name=i_v_name)
            j_edge = prev_g.es.find(name=j_v_name)
            if not set(i_edge.tuple) & set(j_edge.tuple):
                # If the intersection of i_edge and j_edge is empty, then the
                # proximity condition is not satisfied. 
                # Skip this pair.
                return None
            # Proximity condition is satisfied.
            copy_g.add_edges([(i, j)])
            # Assertions
            if i_v_name.find('|') >= 0:
                assert j_v_name.find('|') >= 0
            elif i_v_name.find('|') < 0:
                assert j_v_name.find('|') < 0
            # Vertex names
            i_v_name_set = set(i_v_name.replace('|', ',').split(','))
            j_v_name_set = set(j_v_name.replace('|', ',').split(','))
            # Symmetric difference
            new_name_before_bar = ','.join(
                sorted(i_v_name_set ^ j_v_name_set))
            # Intersection
            new_name_after_bar = ','.join(
                sorted(i_v_name_set & j_v_name_set))
            new_name = new_name_before_bar + '|' + new_name_after_bar
            # Add edge name
            copy_g.es[copy_g.ecount() - 1]['name'] = new_name
            # Update score
            _i, _j = [int(k) for k in new_name_before_bar.split(',')]
            _S = [int(k) for k in new_name_after_bar.split(',')]
            this_score = -np.log(
                1 - self._corr_mat.pcorr(i=_i, j=_j, S=_S) ** 2)
            temp_st.score += this_score
            copy_g.es[copy_g.ecount() - 1]['weight'] = this_score
        return temp_st

    def roll_out(self):
        """ Roll out the current vine state to a complete one.

        The current implementation is naive. 
        It randomly chooses a child state
        iteratively until reaching the end.

        Returns:
            (score, vine): The score and final vine state.
        """
        st = self._clone()
        while not st._is_complete():
            self_g = st.tree_list[-1]
            # Append an empty graph to self.tree_list if self_g is a connected
            # tree.
            if self_g.ecount() == self_g.vcount() - 1:
                # self_g is already a tree.
                # If the current tree is connected but it hasn't reached
                # ntrunc, then add another empty graph.
                g = igraph.Graph()
                g.add_vertices(self_g.ecount())
                g.vs['name'] = self_g.es['name']
                st.tree_list.append(g)
                self_g = st.tree_list[-1]
            if self_g.ecount() == 0:
                # If self_g is empty, randomly pick a pair of edges as child
                # states.
                v_list = list(range(self_g.vcount()))
                random.shuffle(v_list)
                found = False
                for i in v_list:
                    for j in range(i):
                        # Connect i and j.
                        temp_st = st._add_edge_helper(i, j)
                        if temp_st is not None:
                            st = temp_st
                            found = True
                            break
                    if found:
                        break
            else:
                # If self_g is NOT empty, connect vertices with degree > 0 and
                # vertices with degree == 0. By doing so, there is always only
                # one connected component in the graph. 
                # The way we grow the tree resembles Prim's algorithm, not
                # Kruskal's algorithm.
                adj_mat = self_g.get_adjacency()
                adj_vec = [max(a) for a in adj_mat]
                # Vertices with degree > 0
                conn_ids = [i for i, x in enumerate(adj_vec) if x == 1]
                # Vertices with degree == 0
                disconn_ids = [i for i, x in enumerate(adj_vec) if x == 0]
                random.shuffle(conn_ids)
                random.shuffle(disconn_ids)
                found = False
                for i in conn_ids:
                    for j in disconn_ids:
                        # Connect i and j.
                        temp_st = st._add_edge_helper(i, j)
                        if temp_st is not None:
                            st = temp_st
                            found = True
                            break
                    if found:
                        break
        return (st.score, st)

    def to_vine_array(self):
        """ Convert an object to a vine array representation.

        The representation is one-based numbering.
        Return a d-by-d upper triangular matrix.
        """
        d = self._d
        # clone is a full vine, randomly rolled out from the current truncated
        # vine.
        clone = self._clone()
        clone._ntrunc = d - 1
        _, clone = clone.roll_out()
        # cond_sets is a list of length d-1,
        # each element is a list of conditioned sets at each level.
        cond_sets = []
        for k in range(d - 1):
            # Tree k contributes d - 1 - k edges to the sorted edge list.
            start = k * d - (k**2 + k) // 2
            end = (k + 1) * d - ((k + 1)**2 + (k + 1)) // 2
            current_edges = clone._edge_repr()[start:end]
            current_edges = [[int(node) for node in e.split('|')[0].split(',')]
                             for e in current_edges]
            cond_sets.append(current_edges)
        # When constructing the vine array, the elements are added column by
        # column, from right to left.
        # Within each column, elements are added from bottom to top.
        # In other words, we start from the last tree.
        # dtype=int replaces np.int, which was removed in NumPy 1.24.
        M = -np.ones((d, d), dtype=int)
        for k in range(d - 2, -1, -1):
            w = cond_sets[k][0][0]
            M[k + 1, k + 1] = w
            M[k, k + 1] = cond_sets[k][0][1]
            del cond_sets[k][0]
            for ell in range(k - 1, -1, -1):
                for j in range(len(cond_sets[ell])):
                    if w in cond_sets[ell][j]:
                        cond_sets[ell][j].remove(w)
                        v = cond_sets[ell][j][0]
                        M[ell, k + 1] = v
                        del cond_sets[ell][j]
                        break
        M[0, 0] = M[0, 1]
        M += 1  # change from zero-based numbering to one-based numbering
        return M

    def _edge_repr(self):
        """ Edge representation of the VineState.

        Returns 
        a list of strings, each representing an edge.
        For example:
        ['0,2', '1,3', '2,4', '3,5', '4,6', '5,6', '2,6|4', '3,6|5', '4,5|6']
        """
        res = [sorted(g.es['name']) for g in self.tree_list if g.ecount() > 0]
        if res:
            res = reduce(lambda x, y: x + y, res)
        return res

    def __hash__(self):
        return hash(tuple(self._edge_repr()))

    def __eq__(self, other):
        # Compare the edge lists directly rather than their hashes, so that
        # a hash collision cannot make two different states compare equal.
        return self._edge_repr() == other._edge_repr()

    def __repr__(self):
        return self._edge_repr().__repr__()


class CorrMat:
    """ A wrapper of a correlation matrix. """

    def __init__(self, corr_mat, n):
        """ Constructor.

        Args:
            corr_mat: a correlation matrix as a numpy array.
            n: the sample size, returned by n_sample().
        """
        # corr_mat should be a square matrix
        assert corr_mat.ndim == 2
        assert corr_mat.shape[0] == corr_mat.shape[1]
        self._corr_mat = corr_mat
        self._corr_mat_inv = np.linalg.inv(self._corr_mat)
        self._n = n

    def n_sample(self):
        return self._n

    def dim(self):
        """ Number of variables in the correlation matrix. """
        return self._corr_mat.shape[0]

    def log_det(self):
        """ Log determinant of the correlation matrix. """
        return np.log(np.linalg.det(self._corr_mat))

    def pcorr(self, i, j, S=None):
        """ Partial correlation of (i,j)|S. The indices are zero based.

        Args:
            i, j: Indices.
            S: A list of indices.
        """
        if not S:
            return self._corr_mat[i][j]
        ind = [i, j] + S
        sub_matrix = self._corr_mat[np.ix_(ind, ind)]
        # TODO: consider using np.linalg.solve instead of np.linalg.inv in the
        # future.
        sub_matrix_inv = np.linalg.inv(sub_matrix)
        return -sub_matrix_inv[0, 1] / np.sqrt(
            sub_matrix_inv[0, 0] * sub_matrix_inv[1, 1])

    def _parse_edge_name(self, name):
        """ Parse the name of an edge. For example: name = '2,5|3,6'.

        Args:
            name: name of an edge. 
                For example: name = '2,5|3,6'.

        Returns:
            i, j, S
        """
        name_split = name.split('|')
        i, j = [int(k) for k in name_split[0].split(',')]
        if len(name_split) > 1:
            S = [int(k) for k in name_split[1].split(',')]
        else:
            S = None
        return i, j, S

    def pcorr_by_name(self, name):
        """ Partial correlation by name. For example: name = '2,5|3,6'. """
        return self.pcorr(*self._parse_edge_name(name))

    def pcorr_given_all_by_name(self, name):
        """ Partial correlation of i, j given all other variables.

        Args:
            name: name of an edge. For example: name = '2,5|3,6'.

        Returns:
            Partial correlation of i, j given all other variables.
        """
        i, j, _ = self._parse_edge_name(name)
        return -self._corr_mat_inv[i, j] / np.sqrt(
            self._corr_mat_inv[i, i] * self._corr_mat_inv[j, j])

    def __repr__(self):
        return self._corr_mat.__repr__()


class MctsNode:
    """ A node class of the search tree. """

    def __init__(self, config, state=None):
        """ Constructor

        Args:
            config: A configuration dictionary, contains
                transpos_table, UCT_const, FPU.
        """
        self.state = state
        self.config = config
        self.child_nodes = []
        # self.child_visits keeps how many times the child nodes are visited
        # from the *current* node.
        self.child_visits = []
        self.visits = 0
        self.sum_score = 0

    def select_child(self):
        """ Tree policy: Select a child from the transposition table according
        to UCT.

        Update self.child_visits.
        Note: self.visits is not updated here. It is updated when self.update
        is called.
        """
        # UCT_score uses UCT2 in Childs et al. (2008). 
Transpositions and move# groups in monte carlo tree search.def UCT(c_node, c_node_visits):mean_score = (c_node.sum_score \/c_node.visits) if c_node.visits > 0 else 0# Margin of Errorif c_node_visits == 0:171moe = self.config[\u2019FPU\u2019]else:moe = math.sqrt(2 * math.log(self.visits + 1) \/ c_node_visits)edge_score = c_node.state.score - self.state.scoreprog_bias = self.config[\u2019PB\u2019] * edge_score \/ (c_node.visits + 1)# # An alternative progressive bias term:# # the partial correlation of the newly added edge, given all the# # other variables.# edge_diff_set = set(c_node.state._edge_repr()) - \\# set(self.state._edge_repr())# assert len(edge_diff_set) == 1# edge_diff = list(edge_diff_set)[0]# pcorr_all = self.state._corr_mat.pcorr_given_all_by_name(# edge_diff)# prog_bias = self.config[# \u2019PB\u2019] * (-np.log(1 - pcorr_all**2)) \/ (c_node.visits + 1)return mean_score + self.config[\u2019UCT_const\u2019] * (moe + prog_bias)UCT_list = [(UCT(node, self.child_visits[i]), node.state, i)for i, node in enumerate(self.child_nodes)]# max() in python 3: If multiple items are maximal, the function# returns the first one encountered.# Shuffle score_list so that when two nodes have the same UCT_score,# one of them is randomly picked.random.shuffle(UCT_list)max_UCT_list = max(UCT_list, key=lambda x: x[0])# print(max_UCT_list)selected_state = max_UCT_list[1]select_index = max_UCT_list[2]selected_node = self.config[\u2019transpos_table\u2019][selected_state]# Update number of visitsself.child_visits[select_index] += 1return selected_nodedef add_children(self):\"\"\" Add all children of the current node if possible.The children are added to the transposition table.Add the child *nodes* to self.child_nodes.Initialize self.child_visits to zeros.172Returns:bool: If children are successfully added, return True.If not, do nothing and return False.\"\"\"assert len(self.child_nodes) == 0child_states = self.state.get_child_states()if not child_states:return 
False\n        for c_state in child_states:\n            if c_state not in self.config['transpos_table']:\n                self.config['transpos_table'][c_state] = MctsNode(\n                    config=self.config, state=c_state)\n            self.child_nodes.append(self.config['transpos_table'][c_state])\n        self.child_visits = [0] * len(child_states)\n        return True\n\n    def roll_out(self):\n        \"\"\" Run default policy\n        Returns:\n            (score, vine): The score and final vine state.\n        \"\"\"\n        return self.state.roll_out()\n\n    def update(self, result):\n        \"\"\" Update the node with result \"\"\"\n        self.visits += 1\n        self.sum_score += result\n\n    def is_leaf(self):\n        \"\"\" If the node is a leaf node.\"\"\"\n        # The node is a leaf node if its self.child_nodes is empty.\n        return self.child_nodes == []\n\n    def __repr__(self):\n        mean_score = str(self.sum_score \/\n                         self.visits) if self.visits > 0 else 'N\/A'\n        return 'Vine state: ' + self.state.__repr__() + '\\nMean score: ' + \\\n            mean_score\n\n\ndef mcts_vine(corr, n_sample, ntrunc, output_dir, itermax=100, FPU=1.0,\n              PB=0.1, log_freq=100):\n    # Initialize the correlation matrix object, root state and root node\n    corr_mat = CorrMat(corr, n_sample)\n    root_state = VineState(ntrunc=ntrunc, corr_mat=corr_mat)\n    transpos_table = {}\n    config = {\n        # a dictionary: state -> node.\n        'transpos_table': transpos_table,\n        # UCB1 formula: \\bar{x} + UCT_const \\sqrt{2 log(n)\/n_j}\n        'UCT_const': (-corr_mat.log_det()),\n        # First Play Urgency\n        'FPU': FPU,\n        # Progressive Bias\n        'PB': PB\n    }\n    print(\"Configuration dictionary:\")\n    print(config)\n    root_node = MctsNode(config, root_state)\n    best_score = 0  # we want to maximize the score\n    best_vine = None\n    file_handler = open(output_dir, \"w\")\n    # CFI calculation\n    D_0 = corr_mat.n_sample() * (-corr_mat.log_det())\n    nu_0 = corr_mat.dim() * (corr_mat.dim() - 1) \/ 2.0\n    for i in range(itermax):\n        node = root_node\n        temp_node_list = [node]\n        # Select\n        while not node.is_leaf():\n            node = node.select_child()\n            temp_node_list.append(node)\n        # Expand\n        if node.visits > 0:\n            # Only expand the leaf node if 
it has been visited.\n            add_children_success = node.add_children()\n            if add_children_success:\n                node = node.select_child()\n                temp_node_list.append(node)\n        # Rollout\n        score, vine = node.roll_out()\n        if score > best_score:\n            best_score = score\n            best_vine = vine\n        # CFI calculation\n        D_ell = corr_mat.n_sample() * (-corr_mat.log_det() - best_score)\n        nu_ell = (corr_mat.dim() - ntrunc) * \\\n            (corr_mat.dim() - ntrunc - 1) \/ 2.0\n        CFI = 1 - max(0, D_ell - nu_ell) \/ \\\n            max(0, D_0 - nu_0, D_ell - nu_ell)\n        file_handler.write('%d, %.4f, %.4f\\n' % (i, best_score, CFI))\n        if i % log_freq == 0 and i > 0:\n            print(output_dir + ', Iter %d: ' % i)\n            print(\"best_score: \" + str(best_score))\n        # Backpropagate\n        for visited_node in temp_node_list:\n            visited_node.update(score)\n    file_handler.close()\n    print(\"CFI: \" + str(CFI))\n    return best_vine","@language":"en"}],"Genre":[{"@value":"Thesis\/Dissertation","@language":"en"}],"GraduationDate":[{"@value":"2019-09","@language":"en"}],"IsShownAt":[{"@value":"10.14288\/1.0379699","@language":"en"}],"Language":[{"@value":"eng","@language":"en"}],"Program":[{"@value":"Statistics","@language":"en"}],"Provider":[{"@value":"Vancouver : University of British Columbia Library","@language":"en"}],"Publisher":[{"@value":"University of British Columbia","@language":"en"}],"Rights":[{"@value":"Attribution-NonCommercial-NoDerivatives 4.0 International","@language":"*"}],"RightsURI":[{"@value":"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/","@language":"*"}],"ScholarlyLevel":[{"@value":"Graduate","@language":"en"}],"Title":[{"@value":"Vine copulas : dependence structure learning, diagnostics, and applications to regression analysis","@language":"en"}],"Type":[{"@value":"Text","@language":"en"}],"URI":[{"@value":"http:\/\/hdl.handle.net\/2429\/70869","@language":"en"}],"SortDate":[{"@value":"2019-12-31 AD","@language":"en"}],"@id":"doi:10.14288\/1.0379699"}