Vine copulas: dependence structure learning, diagnostics, and applications to regression analysis

by

Bo Chang

B.S. (Mathematics), B.A. (Economics), Peking University, 2011
M.S. (Statistics), University of California, Los Angeles, 2012

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Statistics)

The University of British Columbia
(Vancouver)

June 2019

© Bo Chang, 2019

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled:

Vine copulas: dependence structure learning, diagnostics, and applications to regression analysis

submitted by Bo Chang in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Statistics.

Examining Committee:

Harry Joe, Department of Statistics
Supervisor

Natalia Nolde, Department of Statistics
Supervisory Committee Member

Matías Salibián-Barrera, Department of Statistics
Supervisory Committee Member

Ruben H. Zamar, Department of Statistics
University Examiner

Kevin Song, Vancouver School of Economics
University Examiner

Louis-Paul Rivest, Université Laval
External Examiner

Abstract

Copulas are widely used in high-dimensional multivariate applications where the assumption of Gaussian distributed variables does not hold. Vine copulas are a flexible family of copulas built from a sequence of bivariate copulas to represent bivariate dependence and bivariate conditional dependence.
The vine structures consist of a hierarchy of trees to express conditional dependence.

The contributions of this thesis are (a) improved methods for finding parsimonious truncated vine structures when the number of variables is moderate to large; (b) diagnostic methods to help in decisions for bivariate copulas in the vine; (c) applications to predictions based on conditional distributions of the vine copula.

The vine structure learning problem has been challenging due to the large search space. Existing methods are based on greedy algorithms and do not in general produce a solution that is near the global optimum. It is an open problem to choose a good truncated vine structure when there are many variables. We propose a novel approach to learning truncated vine structures using Monte Carlo tree search, a method that has been widely adopted in game and planning problems. The proposed method performs significantly better than the existing methods under various experimental setups.

Moreover, diagnostic methods based on measures of dependence and tail asymmetry are proposed to guide the choice of parametric bivariate copula families assigned to the edges of the trees in the vine and to assess whether a copula is constant over the conditioning value(s) for trees 2 and higher. If the diagnostic methods suggest the existence of reflection asymmetry, permutation asymmetry, or asymmetric tail dependence, then three- or four-parameter bivariate copula families might be needed. If the conditional dependence measures or asymmetry measures in trees 2 and up are not constant over the conditioning value(s), then non-constant copulas with parameters varying over conditioning values should be considered.

Finally, for data from an observational study, we propose a vine copula regression method that uses regular vines and handles mixed continuous and discrete variables.
This method can efficiently compute the conditional distribution of the response variable given the explanatory variables.

Lay Summary

In applications with a large number of quantitative variables, the dependence relation among the variables is often of interest. Vine copulas are flexible models for the dependence relation that extend beyond the restrictive assumptions in classical multivariate Gaussian elliptical dependence. They are built from a sequence of two-dimensional models to represent bivariate dependence and bivariate conditional dependence. The contributions of this thesis for vine copulas include (a) improved methods for finding parsimonious truncated vine dependence structures when the number of variables is moderate to large; (b) diagnostic methods to help in decisions for bivariate copulas in the vine; (c) applications to predictions of the response variable from a set of explanatory variables that are observed at the same time as the response.

Preface

The thesis is an original intellectual product of the author, Bo Chang, with the guidance and mentorship of Prof. Joe. The research questions and the proposed new methods are discussed with Prof. Joe during weekly research meetings. Throughout the preparation of the thesis, Prof. Joe makes ample suggestions on the improvement of presentation, motivation, and big picture viewpoints as well as some technical details.

Chapter 3 is based on a published paper: Chang, B., Pan, S., and Joe, H. (2019). Vine copula structure learning via Monte Carlo tree search. In International Conference on Artificial Intelligence and Statistics. An extended version is under preparation for journal submission. With guidance from the supervisor, the author develops the idea of the work and implements the proposed algorithm in Python. With help from Shenyi Pan, the author drafts the manuscript.

Chapter 4 is based on a manuscript: Chang, B. and Joe, H. (2019). Copula diagnostics for asymmetries and conditional dependence.
The manuscript is under review for journal publication. The supervisor suggests the idea of developing diagnostic tools and provides an early version of the implementation of the concept. The author conducts the experiments and drafts the manuscript, and the supervisor suggests revisions to the manuscript.

Chapter 5 is based on a published paper: Chang, B. and Joe, H. (2019). Prediction based on conditional distributions of vine copulas. Computational Statistics & Data Analysis, 139:45–63. The supervisor develops the idea of the work. The author implements the proposed method; some of the functions are adapted or taken from the CopulaModel R package written by the supervisor.

Chapter 6 is based on the supplementary materials of Chang, B. and Joe, H. (2019). Prediction based on conditional distributions of vine copulas. Computational Statistics & Data Analysis, 139:45–63. Under the guidance of the supervisor, the author derives the main results.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
List of Symbols and Notations
Glossary
Acknowledgments
Dedication
1 Introduction
  1.1 Background
  1.2 Dependence structure learning
  1.3 Vine copula diagnostics
  1.4 Vine copula regression
  1.5 Research contributions and organization of thesis
2 Preliminaries
  2.1 Bivariate copulas
    2.1.1 Density and conditional distributions
    2.1.2 Dependence measures
    2.1.3 Asymmetry measures
    2.1.4 Tail order
  2.2 Archimedean copulas
  2.3 Vine copulas
    2.3.1 Vine graphical models
    2.3.2 Vine array representation
    2.3.3 From vines to multivariate distributions
    2.3.4 Truncated vine
    2.3.5 Vine structure learning
    2.3.6 Performance metric
  2.4 Two-stage estimation method for copula models
  2.5 Vuong's procedure
3 Vine structure learning via Monte Carlo tree search
  3.1 Introduction
  3.2 Proposed method
    3.2.1 Vine structure learning as sequential decision making
    3.2.2 Monte Carlo tree search
    3.2.3 Tree policy: vine UCT
  3.3 A worst-case example for SeqMST
  3.4 Experiments
    3.4.1 Structure learning experiments
    3.4.2 Vine copula learning experiments
  3.5 Conclusion
4 Copula diagnostics for asymmetries and conditional dependence
  4.1 Introduction
  4.2 Copula-based conditional measures
    4.2.1 Estimating copulas of conditional distributions
    4.2.2 Conditional Spearman's rho
    4.2.3 Conditional tail-weighted dependence measure
    4.2.4 Conditional measures of permutation and reflection asymmetry
  4.3 Skewed bivariate copulas
  4.4 Conditional dependence with the gamma factor model
  4.5 Illustrative data examples
    4.5.1 Hydro-geochemical data
    4.5.2 Glioblastoma tumors dataset
  4.6 Conclusion
5 Prediction based on conditional distributions of vine copulas
  5.1 Introduction
  5.2 Model fitting and assessment
    5.2.1 Vine structure learning
    5.2.2 Bivariate copula selection
  5.3 Prediction
  5.4 Simulation study
  5.5 Application
    5.5.1 Abalone data set
    5.5.2 Regression
    5.5.3 Classification
  5.6 Conclusion
6 Theoretical results on shapes of conditional quantile functions
  6.1 Introduction
  6.2 Bivariate asymptotic conditional quantile
  6.3 Bivariate Archimedean copula boundary conditional distributions
    6.3.1 Lower tail
    6.3.2 Upper tail
  6.4 Trivariate asymptotic conditional quantile
    6.4.1 Trivariate strongest functional relationship
    6.4.2 Trivariate conditional boundary distribution with bivariate Archimedean copulas
    6.4.3 Case studies: trivariate conditional quantile
  6.5 Beyond trivariate
7 Conclusion
Bibliography
A Derivations for Chapter 6
  A.1 Derivations for Section 6.4.1
  A.2 Derivations for Section 6.4.2
  A.3 Derivations for case 1 in Section 6.4.3
B Conditional dependence measures for trivariate Frank copulas
C Implementation of Monte Carlo tree search (MCTS)
  C.1 Description
  C.2 Example usage
  C.3 Code

List of Tables

Table 3.1 Experimental results for the FX dataset. The columns include the Gaussian log-likelihood and comparative fit index (CFI) of the vine dependence structure. Given the vine structures, bivariate copulas are assigned to the edges.
The last three columns are the resulting vine copula log-likelihood, number of parameters, and AIC. For BJ15 and MCTS, we show 10 replications with different random seeds and the average and best model.

Table 3.2 Experimental results for the GBM20 dataset. The columns include the Gaussian log-likelihood and CFI of the vine dependence structure. Given the vine structures, bivariate copulas are assigned to the edges. The last three columns are the resulting vine copula log-likelihood, number of parameters, and AIC. For BJ15 and MCTS, we show 10 replications with different random seeds and the average and best model.

Table 4.1 Empirical tail-weighted dependence measures $\hat{\zeta}_{\alpha=5}$, Gaussian tail-weighted dependence measure $\zeta_{\alpha=5}$, permutation asymmetry measures $\hat{G}_{P,k=0.2}$, and fitted copulas in the first tree for the hydro-geochemical dataset. Gaussian $\zeta_{\alpha=5}$ is the tail-weighted dependence measure of a bivariate Gaussian copula whose Spearman's rho is the same as the empirical counterpart. There appears to be reflection asymmetry, permutation asymmetry, and stronger dependence than Gaussian in the joint upper and lower tails.

Table 4.2 Fitted bivariate copulas in the second tree for the hydro-geochemical dataset.

Table 4.3 Pairwise comparison of vine copula models on the hydro-geochemical dataset.

Table 4.4 Permutation asymmetry measures and Akaike information criterion (AIC) values of pairs of variables in the GBM dataset.

Table 4.5 Empirical tail-weighted dependence measures $\hat{\zeta}_{\alpha=5}$ and Gaussian tail-weighted dependence measure $\zeta_{\alpha=5}$.

Table 4.6 Permutation asymmetry measure and AICs of pairs of variables in tree 2 of D-vine 1342 and C-vine 1234 on the GBM dataset. If the AIC of a non-constant model is worse than that of a constant model, we report the AIC of the constant model, e.g., [14|3], [23|1] and [24|1].

Table 4.7 Model AICs for different vine structures on the GBM dataset.

Table 5.1 Simulation results for two explanatory variables. The table shows the root-mean-square error (RMSE), logarithmic score (LS), quadratic score (QS), interval score (IS), and integrated Brier score (IBS) in different simulation cases. The arrows in the header indicate that lower RMSE, IS, and IBS, and higher LS and QS, are better. The numbers in parentheses are the corresponding standard errors.

Table 5.2 Simulation results for four explanatory variables. The table shows the root-mean-square error (RMSE), logarithmic score (LS), quadratic score (QS), interval score (IS), and integrated Brier score (IBS) in different simulation cases. The arrows in the header indicate that lower RMSE, IS, and IBS, and higher LS and QS, are better. The numbers in parentheses are the corresponding standard errors.

Table 5.3 Comparison of the performance of vine copula regressions and linear regression. The numbers are the average scores over 100 trials of 5-fold cross validation. The scoring rules are defined in Section 5.4.

Table 5.4 Vine array and bivariate copulas of the R-vine copula regression fitted on the full dataset. The variables are (1) Length, (2) Diameter, (3) Height, (4) WholeWeight, (5) ShuckedWeight, (6) VisceraWeight, (7) ShellWeight, (8) Rings.
A suffix of 's' represents the survival version of the copula family to get the opposite direction of joint tail asymmetry; 'u' and 'v' represent the copula family with reflection on the first and second variable, respectively, to get negative dependence.

Table 6.1 The taxonomy of the lower tail boundary conditional distribution $\lim_{u_1,u_2\to 0} u_{3|12}$, where $u_{3|12}$ is defined in Equation 6.23. For the first (non-heading) row, where $\lim_{u_1,u_2\to 0} C_{1|2}(u_1|u_2) = 0$, $\kappa_{13}$ represents $\kappa_{13L}$, the lower tail order of $C_{13;2}$. Similarly, for the third (non-heading) row, where $\lim_{u_1,u_2\to 0} C_{1|2}(u_1|u_2) = 1$, $\kappa_{13}$ represents $\kappa_{13U}$, the upper tail order of $C_{13;2}$.

Table 6.2 The taxonomy of the upper tail boundary conditional distribution $\lim_{u_1,u_2\to 1} u_{3|12}$, where $u_{3|12}$ is defined in Equation 6.23. For the first (non-heading) row, where $\lim_{u_1,u_2\to 1} C_{1|2}(u_1|u_2) = 0$, $\kappa_{13}$ represents $\kappa_{13L}$, the lower tail order of $C_{13;2}$. Similarly, for the third (non-heading) row, where $\lim_{u_1,u_2\to 1} C_{1|2}(u_1|u_2) = 1$, $\kappa_{13}$ represents $\kappa_{13U}$, the upper tail order of $C_{13;2}$.

Table C.1 Correspondence of variables and functions defined in the pseudocode in Algorithm 3.1 and in the Python implementation.

List of Figures

Figure 2.1 Contour plots of the joint probability density function (PDF) $c(\Phi(z_1),\Phi(z_2))\,\phi(z_1)\,\phi(z_2)$. The margins are $N(0,1)$ and the copulas have Spearman's $\rho_S = 0.5$.

Figure 2.2 An example of a vine for $d = 5$ up to tree 3.

Figure 3.1 Vine structure learning as a sequential decision problem. An edge can be added to an unconnected acyclic graph. When a tree at level $t$ is completed, the edges of this tree are used to create nodes for the next graph at level $t+1$.

Figure 3.2 One iteration of the general MCTS algorithm.
Figure 3.3 The search tree corresponding to a 1-truncated vine with $d = 3$. Although the search tree has six leaf nodes, there are only three unique 1-truncated vines: $\{[1,2],[2,3]\}$ and $\{[2,3],[1,2]\}$ yield 1–2–3; $\{[1,3],[2,3]\}$ and $\{[2,3],[1,3]\}$ yield 1–3–2; $\{[1,2],[1,3]\}$ and $\{[1,3],[1,2]\}$ yield 2–1–3.

Figure 3.4 Some nodes of depths from 0 to 3 in the search tree. The root node does not have any edges. A child node is obtained by adding an edge to the (incomplete) vine structure of its parent node. In future iterations, child nodes with higher scores are more likely to be visited (exploitation); child nodes with fewer prior visits are more likely to be visited (exploration); child nodes with larger values of $H_j = -\log(1-\rho_{e_j}^2)$ are more likely to be visited (progressive bias). Note that each child node has several predecessors so that the number of visits of a given node in the search tree is fewer than the sum of numbers of visits of its child nodes.

Figure 3.5 Some nodes of depths 5, 6, and 9 in the search tree. The nodes of depth 9 are the results of the simulation step, starting from the nodes of depth 6. The scores or objective functions of the best 2-truncated vine found by the MCTS algorithm, brute-force algorithm, and sequential maximum spanning tree (MST) algorithm are 2.362, 2.362, and 2.333, respectively.

Figure 3.6 CFI vs truncation level $t$ for simulated 2-truncated D-vine datasets with $d = 10$ and $d = 15$. A larger CFI is better.

Figure 3.7 CFI vs truncation level $t$ for the Abalone dataset. A larger CFI is better.

Figure 3.8 GBM dataset: CFI vs truncation level $t$. A larger CFI is better.

Figure 3.9 Optimal truncation level $t^*_{\alpha=0.01}$ vs dimension $d$. A smaller $t^*_{\alpha=0.01}$ is better.

Figure 4.1 Scatter plots of 1000 random samples drawn from $\breve{C}(u_1,u_2;\theta,\beta)$ in Equation 4.3 when $C(\cdot;\theta)$ is comonotonic.

Figure 4.2 Comparison of permutation asymmetry measure $G_{P,k=0.2}$ in Section 4.2.4 and central dependence measure Spearman's rho for skew-BB1 and skew-t copulas. For skew-BB1 copulas, the parameter $\beta$ is in the set of 20 equally spaced points in $[-1,1]$. Each red curve in the figure corresponds to a distinct $\beta$ value.

Figure 4.3 Conditional measures of $C_{12;3}(\cdot;x)$, the copula of $Y_1, Y_2$ given $F_3(Y_3) = x$, for a gamma factor model with parameters $(\theta_0,\theta_1,\theta_2,\theta_3) = (3,1,1.5,2)$. The sample size is $n = 1000$. The red dash-dot lines are the exact conditional measures computed via numerical integration. The dark solid lines and dashed lines are the kernel-smoothed conditional Spearman's rho and the corresponding 90%-level simultaneous bootstrap confidence bands, using Epanechnikov kernel and window size $h_n = 0.2$.

Figure 4.4 Conditional Spearman's rho of $C_{12;34}(\cdot;x,y)$, the copula of $Y_1, Y_2$ given $F_3(Y_3) = x$ and $F_4(Y_4) = y$, for a gamma factor model with parameters $(\theta_0,\theta_1,\theta_2,\theta_3,\theta_4) = (3,1,1.5,2,2.5)$. The sample size is $n = 1000$. The red surface is the exact conditional Spearman's rho computed via numerical integration, and the blue surfaces are the 90%-level simultaneous bootstrap confidence surfaces, using spherically symmetric Epanechnikov kernel and window size $h_n = 0.2$.

Figure 4.5 Pairwise scatter plot of the normal scores of variables cobalt (Co), titanium (Ti) and scandium (Sc) in the hydro-geochemical dataset.

Figure 4.6 Conditional Spearman's rho on the hydro-geochemical dataset. The dark solid lines and dashed lines are the kernel-smoothed conditional Spearman's rho and the corresponding 90%-level simultaneous bootstrap confidence bands, using Epanechnikov kernel and window size $h_n = 0.2$.
The red dash-dot lines represent the estimated conditional Spearman's rho.

Figure 4.7 Pairwise scatter plot of the normal scores in the GBM dataset.

Figure 4.8 Conditional Spearman's rho of pairs [14|3] and [23|4] in the D-vine-1342 model, and [23|1] and [24|1] in the C-vine-1234 model on the GBM dataset. The dark solid lines and dashed lines are the kernel-smoothed conditional Spearman's rho and the corresponding 90%-level simultaneous bootstrap confidence bands, using Epanechnikov kernel and window size $h_n = 0.2$. The red dash-dot lines represent the model conditional Spearman's rho. For [14|3], the best-fitting model is a constant skewed t-copula. For [23|4], the best-fitting model is a non-constant skewed-BB1 copula (quartic $\delta$). For both [23|1] and [24|1], the best-fitting models are constant reflected skewed-BB1 copulas.

Figure 5.1 First two trees $T_1$ and $T_2$ of a vine $\mathcal{V}$. The node set and edge set of $T_1$ are $N(T_1) = \{1,2,3,4,5\}$ and $E(T_1) = \{[12],[23],[24],[35]\}$. The node set and edge set of $T_2$ are $N(T_2) = E(T_1) = \{[12],[23],[24],[35]\}$ and $E(T_2) = \{[13|2],[25|3],[34|2]\}$.

Figure 5.2 Adding a response variable to the R-vine of the explanatory variables. In this example, variables 1 to 5 represent the explanatory variables and variable 6 represents the response variable. The newly added nodes are highlighted.

Figure 5.3 The linear homoscedastic simulation case. In this fitted vine copula model, $C_{13}$, $C_{12}$ and $C_{23;1}$ are all Gaussian copulas, with parameters $\rho_{13} = 0.77$, $\rho_{12} = 0.5$ and $\rho_{23;1} = 0.39$. The green surfaces represent the conditional expectation, and the red and blue surfaces are the 2.5% and 97.5% quantile surfaces, respectively.

Figure 5.4 The linear heteroscedastic simulation case. In this fitted vine copula model, $C_{13}$ is a survival Gumbel copula with parameter $\delta_{13} = 2.21$, $C_{12}$ is a Gaussian copula with parameter $\rho_{12} = 0.5$, and $C_{23;1}$ is a BB8 copula with parameters $\vartheta_{23;1} = 3.06$, $\delta_{23;1} = 0.71$. The green surfaces represent the conditional expectation, and the red and blue surfaces are the 2.5% and 97.5% quantile surfaces, respectively.

Figure 5.5 The non-linear and heteroscedastic simulation case. In this fitted vine copula model, $C_{13}$ is a survival BB8 copula with parameters $\vartheta_{13} = 6$, $\delta_{13} = 0.78$, $C_{12}$ is a Gaussian copula with parameter $\rho_{12} = 0.5$, and $C_{23;1}$ is a BB8 copula with parameters $\vartheta_{23;1} = 6$, $\delta_{23;1} = 0.65$. The green surfaces represent the conditional expectation, and the red and blue surfaces are the 2.5% and 97.5% quantile surfaces, respectively.

Figure 5.6 Pairwise scatter plots of the Abalone dataset.

Figure 5.7 Visualization of the R-vine array in Table 5.4.

Figure 5.8 Residual vs. fitted value plots. The red and blue points correspond to the lower bound and upper bound of the prediction intervals.

Figure 5.9 Comparison of the performance on the classification problem.

Figure 6.1 Conditional quantile functions for bivariate copulas with Kendall's $\tau = 0.5$, combined with $N(0,1)$ margins. Quantile levels are 20%, 40%, 60% and 80%.

Figure 6.2 Conditional quantile surface $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2)$ in cases 1 and 3, for $\alpha = 0.25$ and $0.75$.

Figure 6.3 Conditional quantile $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2)$ versus $x_1$ in case 3 for $\alpha = 0.25$ and $0.75$, as $x_1 \to +\infty$. It shows that the conditional quantile converges to $+\infty$, a finite number, or $-\infty$.

Figure B.1 Conditional measures of $C_{12;3}(\cdot;x)$, the copula of $Y_1, Y_2$ given $F_3(Y_3) = x$, for a trivariate Frank copula model with parameter that corresponds to Kendall's $\tau = 0.6$.
The sample size is $n = 1000$. The red dash-dot lines are the exact conditional measures computed via numerical integration. The dark solid lines and dashed lines are the kernel-smoothed conditional Spearman's rho and the corresponding 90%-level simultaneous bootstrap confidence bands, using Epanechnikov kernel and window size $h_n = 0.2$.

List of Symbols and Notations

$C$: Copula
$C^+$ / $C^-$ / $C^\perp$: Comonotonicity / Countermonotonicity / Independence copula
$\breve{C}$: Permutation asymmetric copula
$\hat{C}$: Reflected or survival copula of $C$
$C_{jk;S}$: Copula for the conditional distributions $F_{j|S}$ and $F_{k|S}$
$C_{j|k;S}$: Conditional distribution of $C_{jk;S}$
$c$ / $\tilde{c}$: Copula density of $C$
$c_{jk;S}$: Copula density of $C_{jk;S}$
$d$: Dimension or number of variables
$\mathbb{E}$: Expectation of a random variable
$F$: Cumulative distribution function
$F_{S_j|S_k}$: Conditional distribution with subsets of indices $S_j$ and $S_k$
$f$: Probability density function
$I\{S\}$: Indicator function for event $S$
$K$: Kernel density function
$o$: Asymptotic order, e.g., $h_1(x) = o(h_2(x))$ as $x \to \infty$ if and only if $\forall \varepsilon > 0$, $\exists x_0 \in \mathbb{R}$ such that $|h_1(x)| \le \varepsilon |h_2(x)|$ for all $x \ge x_0$
$O$: Asymptotic order, e.g., $h_1(x) = O(h_2(x))$ as $x \to \infty$ if and only if $\exists M > 0$, $x_0 \in \mathbb{R}$ such that $|h_1(x)| \le M |h_2(x)|$ for all $x \ge x_0$
$\mathbb{P}$: Probability of an event
$R$: Correlation matrix
$\mathbb{R}^d$: Real coordinate space of $d$ dimensions. Superscript omitted if $d = 1$
Var: Variance of a random variable
$\zeta_\alpha$: Tail-weighted dependence measure
$\kappa_L$ / $\kappa_U$: Lower / Upper tail order
$\rho_S$: Spearman's $\rho$
$\tau$: Kendall's $\tau$
$\Phi$: Standard Gaussian cumulative distribution function
$\phi$: Standard Gaussian probability density function
$\Omega$: Inverse correlation matrix or precision matrix
$\emptyset$: Empty set
$\sim$: Asymptotic equality, e.g., $h_1(x) \sim h_2(x)$ as $x \to \infty$ if and only if $\lim_{x\to\infty} h_1(x)/h_2(x) = 1$
$\triangle$: Symmetric difference of two sets
$\#$: Cardinality of a set

Glossary

AIC: Akaike information criterion
AUC: area under the curve
BIC: Bayesian information criterion
CDF: cumulative distribution function
CFI: comparative fit index
IBS: integrated Brier score
IFM: inference functions for margins
IS: interval score
LS: logarithmic score
MCTS: Monte Carlo tree search
MLE: maximum likelihood estimation
MST: maximum spanning tree
MTCJ: Mardia–Takahasi–Clayton–Cook–Johnson
PDF: probability density function
PMF: probability mass function
QS: quadratic score
RF: random forest
RMSE: root-mean-square error
ROC: receiver operating characteristic
SVM: support vector machine
UCT: upper confidence bounds for trees

Acknowledgments

First and foremost, I would like to thank my supervisor, Prof. Harry Joe, who gave me the perfect amount of guidance and freedom so that I can explore various research ideas without getting stuck in dead ends or losing directions. Throughout my Ph.D. studies, he was always available for help and cheered me up when I felt down. I feel valued and respected working with him. Furthermore, he set a role model for me at the early stage of my career; I believe his excellent work habits will continue to influence me long after this.

I am grateful to the members of the supervisory committee, Prof. Natalia Nolde and Prof. Matías Salibián-Barrera, for the detailed comments leading to an improved presentation of the dissertation. I would also like to thank Prof. Louis-Paul Rivest who provided extensive feedback as the external examiner, Prof. Ruben H. Zamar and Prof. Kevin Song for serving as the university examiners, and the anonymous referees for their comments. The research has been supported by research grants from the Natural Sciences and Engineering Research Council (NSERC) and the Canadian Statistical Sciences Institute (CANSSI).

I want to thank all my collaborators in addition to my supervisor. It has been a great pleasure sharing the journey of scientific exploration with them. Especially, I am grateful to the senior researchers with whom I worked closely, in chronological order, Prof. Roger M. Cooke, Dr. Lili Meng, Prof. Eldad Haber, Dr. Bo Zhao, and Dr. Minmin Chen. As Confucius once said, “When you see a worthy, think of becoming equal to him/her.” It makes me a better researcher by learning from them.

Finally, I am indebted to my family and friends for their constant support and for the happy moments we shared.

Dedication

This dissertation is dedicated to my beloved parents.

謹以此文獻給我摯愛的父親和母親。

Chapter 1

Introduction

Copulas are widely used in high-dimensional multivariate applications where the assumption of Gaussian distributed variables does not hold. Truncated vine copulas are a flexible family of copulas built from a sequence of bivariate copulas to represent bivariate dependence and bivariate conditional dependence.

We address the following new research directions regarding vine copula models.

1. Finding parsimonious truncated vine structures so that there is a representation where we rely on conditional dependence only up to $t-1$ conditioning variables and conditional independence for $t$ to $d-2$ variables. A smaller $t$ indicates more conditional independence or parsimony.

2. Developing diagnostic tools of tail dependence, tail asymmetry and non-constant conditional dependence to help in the selection of bivariate copulas.

3. Applying vine copula models for improved predictive conditional distribution of the response variable given explanatory variables, compared with classical multiple regression, binary regression, or ordinal regression; vine copula models handle nonlinear conditional expectation and conditional heteroscedasticity in a simpler way than methods such as polynomial regression, weighted regression, and ordinal regression.

1.1 Background

In multivariate statistics, modeling the dependence structure of multivariate observations is essential. The multivariate Gaussian distribution is one of the most commonly used models for this task. However, multivariate data are seldom sufficiently summarized by the multivariate Gaussian distribution, because univariate margins could be skewed or heavy-tailed, and the joint distribution could have stronger dependence than a Gaussian distribution in the tails or have tail asymmetries.

Copulas are a flexible tool in multivariate statistics that can be used to model distributions beyond the Gaussian. Sklar's theorem (Sklar, 1959) shows that, for a $d$-dimensional random vector $Y = (Y_1, Y_2, \ldots, Y_d)'$ following a joint cumulative distribution function (CDF) $F$, with the $j$-th univariate margin $F_j$, there exists a distribution function $C : [0,1]^d \to [0,1]$ such that

$$F(y_1, \ldots, y_d) = C(F_1(y_1), \ldots, F_d(y_d)).$$

If $F$ is a continuous $d$-variate distribution function, then the copula $C$ is unique. Otherwise $C$ is unique only on the set $\mathrm{Range}(F_1) \times \cdots \times \mathrm{Range}(F_d)$. Sklar's theorem provides a decomposition of a multivariate distribution into two parts: the marginal distributions and the associated copula.

Several parametric bivariate copula families have been proposed and their properties, including tail dependence and asymmetric behavior, have been developed. However, extending copulas from bivariate to multivariate distributions is non-trivial. The challenge is that we need parsimonious and flexible models.
Here, parsimonious models are models whose number of parameters does not grow quadratically with respect to the dimension $d$, that is, models with $o(d^2)$ parameters. A $d$-variate Gaussian distribution is characterized by $O(d^2)$ parameters, including $d$ mean parameters and $d(d+1)/2$ covariance parameters. One class of parsimonious models is the exchangeable models, including isotropic Gaussian and t-distributions, and the class of Archimedean copulas. Those models have only one parameter, so they are not flexible because of the strong exchangeability assumption; hence they are not useful except for small values of $d$.

The vine copula or pair-copula construction is a flexible tool in high-dimensional dependence modeling. It combines vine graphs and bivariate copulas. A vine is a graphical object represented by a sequence of connected trees. In a vine copula model, vine graphs are adopted to specify the dependence structure, and bivariate copulas are used as the basic building blocks on the edges of vines. Truncated vines are useful for representing the dependence of multivariate observations in a parsimonious way. A vine copula with a truncation level of $t$ has $O(td)$ parameters, which grows linearly in $d$. Vine copulas have been proven to be a flexible tool in high-dimensional (non-Gaussian) dependence modeling, and have been applied to various fields including finance (Dissmann et al., 2013), social science (Cooke et al., 2019), and spatial statistics (Krupskii et al., 2018).

Fitting a vine copula model can be done in two steps: (1) To find a vine structure that describes the dependence among variables, which is referred to as structure learning. Structure learning is a difficult combinatorial optimization problem since the number of possible vine structures increases exponentially with the dimension. Commonly used algorithms for structure learning are greedy algorithms, which, in general, do not produce a globally optimal solution.
(2) To assign bivariate copulas to the edges of the vine structure. After a vine structure is fixed, bivariate copulas are fitted to each edge in the vine. This is often done by iterating through a list of candidate bivariate copula families and picking the one with the highest log-likelihood, or lowest Akaike information criterion (AIC) or Bayesian information criterion (BIC). However, this approach might bring bias or inefficiency when the candidate families do not match the dependence or asymmetry properties exhibited in data.

1.2 Dependence structure learning

The structure learning of the truncated vine is computationally intractable in general. The large number of possible vine structures results in a huge search space for a high-dimensional dataset if one would like to find the optimal one. Specifically, according to Cayley's formula, one can construct $d^{d-2}$ different trees with $d$ nodes. With this result, Kurowicka and Joe (2011) further show that there are in total $2^{(d-3)(d-2)/2}(d!/2)$ different vine structures considering all levels of trees for a dataset with $d$ variables. This makes vine structure searching and learning a challenging problem.

Monte Carlo tree search (MCTS) is a search framework for finding a sequence of near-optimal decisions by taking random samples in the decision space (Browne et al., 2012). Many board games, such as chess, can be formulated as sequential decision problems in which players take actions sequentially. The key idea of MCTS is first to construct a search tree which is explored by fast Monte Carlo simulations and then to grow the tree selectively (Coulom, 2006). Multi-armed bandit algorithms such as the upper confidence bounds for trees (UCT) can be employed to balance between exploration and exploitation (Kocsis and Szepesvári, 2006).
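To illustrate the exploration and exploitation trade-off, the following is a minimal sketch of a generic UCB1-style selection rule, one common form of the score used in UCT. It is not the vine UCT developed in this thesis; the function names, the constant `c`, and the toy statistics are hypothetical.

```python
import math

def ucb1_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """Generic UCB1 score: mean reward plus an exploration bonus.

    This is a textbook form, assumed here for illustration only.
    """
    if visits == 0:
        return math.inf  # unvisited children are tried first
    exploitation = total_reward / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration

def select_child(children, parent_visits, c=math.sqrt(2)):
    """Return the index of the child with the highest score.

    `children` is a list of (total_reward, visits) pairs.
    """
    scores = [ucb1_score(r, n, parent_visits, c) for r, n in children]
    return max(range(len(children)), key=lambda i: scores[i])

# Hypothetical statistics: a well-explored child and a rarely visited one.
children = [(9.0, 10), (1.5, 2)]
explore_choice = select_child(children, parent_visits=12)        # bonus favors the rare child
greedy_choice = select_child(children, parent_visits=12, c=0.0)  # pure exploitation
print(explore_choice, greedy_choice)
```

Setting `c = 0` reduces the rule to greedy exploitation, while a larger `c` shifts visits toward rarely explored children.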
As one of the most important methods in artificial intelligence, MCTS has been widely applied in various game and planning problems.

Because the construction of a truncated vine is inherently sequential, we formulate the vine structure learning problem as a sequential decision making process in this work. A search tree thus arises, where the root node is "empty" and the terminal leaf nodes are valid vine structures. Although the height and branching factor of the search tree might be large, MCTS can be adopted to search through it efficiently. Specifically, we adapt the existing UCT algorithm for vine structure learning and incorporate tree policy enhancements including first play urgency, progressive bias, and efficient transposition handling. The adapted algorithm is called the vine UCT; under its guidance, the tree policy strikes a balance between exploitation and exploration. Here, exploitation means to make the best decision given current information, and exploration means to gather more information about the search space.

After the MCTS method finds candidates for truncated vine structures, bivariate copula families are chosen to match tail dependence and tail asymmetries in the data. This approach improves on greedy algorithms in terms of model fitting but takes more computational time. Comparisons are made with existing methods on datasets from various disciplines. All the experiments suggest that the proposed method outperforms existing methods.

1.3 Vine copula diagnostics

To effectively facilitate the choice of bivariate parametric copula families on the edges of a vine, we propose diagnostic tools for bivariate asymmetries and for conditional dependence as a function of the conditioning value(s).

Various dependence measures and asymmetry measures can effectively guide the choice of candidate parametric copula families.
If diagnostics for an edge of the vine suggest that tail dependence or asymmetry exists, then only appropriate parametric copula families with properties matching the tail asymmetry or strength of dependence in the tail should be considered.

In order for modeling with vine copulas to be tractable, the constant conditional dependence assumption is usually made for the bivariate copulas as an approximation, since this can still lead to vine copulas with flexible tail properties. For vine copulas, in particular, the assumption means that the bivariate copulas of conditional distributions in the second tree and higher do not depend on the conditioning values. Adopting the constant conditional dependence assumption can greatly simplify the modeling process and evade the curse of dimensionality. For conditional dependence in trees 2 and higher of the vine, our diagnostic tools yield functions of conditioning variable(s) to help in the visualization of the form of conditional dependence and asymmetry. Corresponding confidence bands can be obtained for the conditional functions; if a constant function does not lie within the confidence bands, then the simplifying assumption might be inappropriate and one could consider copulas whose parameters depend on the value of the conditioning variable.

1.4 Vine copula regression

One possible application of vine copula models is to regression analysis. In the context of an observational study, i.e., the response variable $Y$ and the explanatory variables $\mathbf{X} = (X_1, \ldots, X_p)$ are measured simultaneously, a natural approach is to fit a joint distribution to $(X_1, \ldots, X_p, Y)$ assuming a random sample $(x_{i1}, \ldots, x_{ip}, y_i)$ for $i = 1, \ldots, n$, and then obtain the conditional distribution of $Y$ given $\mathbf{X}$ for making predictions. For example, conditional expectation and conditional quantiles can be obtained from the conditional distribution for out-of-sample point estimates and prediction intervals.
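As a small illustration of predicting from a conditional distribution, the sketch below uses the simplest special case, a bivariate Gaussian joint distribution, for which the conditional distribution of $Y$ given $X = x$ is known in closed form (a textbook result, not specific to this thesis). The function name and all parameter values are hypothetical.

```python
from statistics import NormalDist

def gaussian_conditional(x, mu_x, mu_y, sd_x, sd_y, rho):
    """Conditional distribution of Y given X = x under a bivariate Gaussian.

    Mean: mu_y + rho * (sd_y / sd_x) * (x - mu_x)
    SD:   sd_y * sqrt(1 - rho^2)
    """
    cond_mean = mu_y + rho * sd_y / sd_x * (x - mu_x)
    cond_sd = sd_y * (1.0 - rho ** 2) ** 0.5
    return NormalDist(cond_mean, cond_sd)

# Hypothetical parameter values, chosen only for illustration.
cond = gaussian_conditional(x=2.0, mu_x=0.0, mu_y=1.0, sd_x=1.0, sd_y=2.0, rho=0.5)

point_estimate = cond.mean                              # conditional expectation
lower, upper = cond.inv_cdf(0.05), cond.inv_cdf(0.95)   # 90% prediction interval
print(point_estimate, (lower, upper))
```

In this Gaussian case the conditional mean is linear in `x` and the interval width does not depend on `x`; a copula-based joint model relaxes both restrictions.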
The joint distribution approach reduces to the usual multiple regression if the joint distribution of $(\mathbf{X}, Y)$ is multivariate Gaussian. Unlike multiple regression, this approach uses information on the distributions of the variables and does not specify a simple linear or polynomial equation for the conditional expectation. Polynomial equations can only be valid locally and generally have poor performance in the extremes of the predictor space.

To make the joint distribution approach work, there are two major questions to be addressed: (A) How to model the joint distribution of $(X_1, \ldots, X_p, Y)$? (B) How to efficiently compute the conditional distribution of $Y$ given $\mathbf{X}$ from a multivariate distribution? For question (A), the vine copula is a flexible tool. The possibility of applying copulas for prediction and regression has been explored, but an algorithm is needed in general for (B) when some variables are continuous and others are discrete.

We propose a method, called vine copula regression, that uses R-vines and handles mixed continuous and discrete variables. That is, the predictor and response variables can be either continuous or discrete. As a result, we have a unified approach for regression and (ordinal) classification. This approach is interpretable, and various shapes of conditional quantiles of $Y$ as a function of $\mathbf{X}$ can be obtained depending on how pair-copulas are chosen on the edges of the vine.

Another advantage of the proposed method is that it can handle ordinal responses better than ordinal regression, especially when there are many explanatory variables. The ordinal regression model for an ordinal response variable $Y$ with levels $\{1, 2, \ldots, K\}$ is formulated as follows:

$$P(Y \le k \mid \mathbf{X} = \mathbf{x}) = \sigma(\theta_k - \mathbf{w}^T \mathbf{x}),$$

where $\sigma$ is the inverse link function, for example, the logistic function; $\theta_1 < \theta_2 < \cdots < \theta_{K-1}$ and $\mathbf{w} = (w_1, \ldots, w_p)$ are the parameters. This model guarantees that $P(Y \le k \mid \mathbf{X} = \mathbf{x})$ is monotonically increasing in $k$.
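The ordinal regression model above can be sketched numerically as follows. This is a minimal illustration with the logistic inverse link; the function names and the parameter values (`w`, `thetas`, `x`) are hypothetical and chosen only to show that the ordered thresholds produce increasing cumulative probabilities.

```python
import math

def logistic(z):
    """Logistic inverse link function."""
    return 1.0 / (1.0 + math.exp(-z))

def cumulative_probs(x, w, thetas):
    """P(Y <= k | X = x) for k = 1, ..., K, where K = len(thetas) + 1."""
    eta = sum(wj * xj for wj, xj in zip(w, x))  # linear predictor w^T x
    return [logistic(theta - eta) for theta in thetas] + [1.0]

def class_probs(x, w, thetas):
    """P(Y = k | X = x), obtained by differencing the cumulative probabilities."""
    cum = cumulative_probs(x, w, thetas)
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

# Hypothetical parameters: p = 2 explanatory variables, K = 4 levels.
w = [0.8, -0.5]
thetas = [-1.0, 0.5, 2.0]  # theta_1 < theta_2 < theta_3
x = [1.0, 2.0]

cum = cumulative_probs(x, w, thetas)
probs = class_probs(x, w, thetas)
print(cum)
print(probs)
```

Because the same `w` enters every level `k`, changing `x` shifts all cumulative log odds by the same amount.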
However, the downside is that the model is highly restrictive; it assumes the effect of an explanatory variable $x_j$ on the log odds is a constant $w_j$, regardless of the level $k$. When $(x_1, \ldots, x_p, y)$ are observed together in a sample, a multivariate distribution approach is more flexible and can easily overcome the problem, since it models the joint distribution of $(\mathbf{X}, Y)$.

We also provide a theoretical analysis of the asymptotic conditional CDF and quantile function for vine copula regression. This analysis sheds light on the advantage of vine copula regression methods: flexible asymptotic tail behavior. To easily compare with the Gaussian copula or linear regression equations when all variables have Gaussian distributions, we assume $Y$ and the components of $\mathbf{X}$ have been transformed to standard normal $N(0,1)$ variables $Y^*$, $\mathbf{X}^*$. Leveraging the flexibility of bivariate copulas on the edges of the vine, the conditional quantile function of $Y^*$ could be asymptotically linear, sublinear, or constant with respect to the transformed explanatory variables $\mathbf{X}^*$, as components of $\mathbf{X}^*$ go to $\pm\infty$.

1.5 Research contributions and organization of thesis

The thesis is organized as follows. Chapter 2 gives an overview of results in existing literature. The main research contributions in Chapters 3 to 6 are summarized as follows; these chapters can be read in any order.

• Chapter 3: A novel approach to learning truncated vine structures using Monte Carlo tree search (MCTS). The proposed method can efficiently explore a search space with guided random sampling and has significantly better performance than the existing methods under various experimental setups.

• Chapter 4: A general framework for estimating the conditional dependence or asymmetry measures as a function of the value(s) of the conditioning variable(s). An algorithm to compute the corresponding confidence bands is also presented.
The estimation of the conditional measures can be adapted to other copula-based measures and enrich the diagnostic tools in the future.

• Chapter 5: A novel method called vine copula regression that uses R-vines and handles mixed continuous and discrete variables. The method is an interpretable, unified approach for regression and (ordinal) classification. Various shapes of nonlinear conditional mean, quantiles and heteroscedasticity of $Y$ as a function of $\mathbf{x}$ can be obtained depending on how pair-copulas are chosen on the edges of the vine.

• Chapter 6: A theoretical analysis of the asymptotic conditional CDF and quantile function for vine copula regression. This analysis sheds light on the advantage of vine copula regression methods: flexible asymptotic tail behavior.

Finally, Chapter 7 concludes the thesis and discusses further research.

Chapter 2
Preliminaries

A $d$-dimensional copula $C$ is a multivariate distribution on the unit hypercube $[0,1]^d$, with all univariate margins being $U(0,1)$. Sklar's theorem (Sklar, 1959) provides a decomposition of a $d$-dimensional distribution into two parts: the marginal distributions and the associated copula. It states that for a $d$-dimensional random vector $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_d)'$ following a joint CDF $F$, with the $j$-th univariate margin $F_j$, the copula associated with $F$ is a CDF $C : [0,1]^d \to [0,1]$ with $U(0,1)$ margins that satisfies

$$F(\mathbf{y}) = C(F_1(y_1), \ldots, F_d(y_d)), \quad \mathbf{y} \in \mathbb{R}^d.$$

If $F$ is a continuous $d$-variate distribution function, then the copula $C$ is unique. Otherwise $C$ is unique only on the set $\mathrm{Range}(F_1) \times \cdots \times \mathrm{Range}(F_d)$.

In this chapter, we review existing results that serve as background for the thesis. Section 2.1 gives an overview of bivariate copulas and relevant properties. Section 2.2 briefly summarizes the Archimedean copula, which has an exchangeable dependence structure. A more flexible multivariate copula construction is the vine copula.
The definition and some related algorithms are introduced in Section 2.3. Section 2.4 describes the two-stage method for parameter estimation in copula models. Finally, Section 2.5 reviews a diagnostic method to check if two different parametric models have similar fits.

2.1 Bivariate copulas

In this section, we give an overview of parametric bivariate copula families that are used in the thesis. Consider a bivariate random vector $(Y_1, Y_2)$ and let $F_{12}(y_1, y_2)$ be the CDF and $f_{12}(y_1, y_2)$ be the probability density function (PDF). By Sklar's theorem (Sklar, 1959), there exists a copula $C(u_1, u_2)$ such that

$$F_{12}(y_1, y_2) = C(F_1(y_1), F_2(y_2)),$$

and $C(u_1, u_2)$ is the CDF of a bivariate random vector $(U_1, U_2)$, where $U_1$ and $U_2$ are $U(0,1)$ random variables.

Commonly used parametric bivariate copula families include the following; their properties are given in Chapter 4 of Joe (2014).

• Independence copula: $C^{\perp}(u_1, u_2) = u_1 u_2$.

• Comonotonicity copula: $C^{+}(u_1, u_2) = \min(u_1, u_2)$.

• Countermonotonicity copula: $C^{-}(u_1, u_2) = \max(0, u_1 + u_2 - 1)$.

• Gaussian copula:
$$C(u_1, u_2; \rho) = \Phi_2(\Phi^{-1}(u_1), \Phi^{-1}(u_2); \rho), \quad \rho \in (-1, 1),$$
where $\Phi$ is the univariate standard normal CDF, and $\Phi_2$ is the CDF of a bivariate normal random vector with correlation $\rho$, zero means and unit variances.

• Frank copula:
$$C(u_1, u_2; \delta) = -\delta^{-1} \log\left(\frac{1 - e^{-\delta} - (1 - e^{-\delta u_1})(1 - e^{-\delta u_2})}{1 - e^{-\delta}}\right), \quad \delta \in \mathbb{R}.$$

• Gumbel copula:
$$C(u_1, u_2; \delta) = \exp\left\{-\left([-\log u_1]^{\delta} + [-\log u_2]^{\delta}\right)^{1/\delta}\right\}, \quad \delta \in [1, \infty).$$

• Mardia–Takahasi–Clayton–Cook–Johnson (MTCJ) copula:
$$C(u_1, u_2; \delta) = (u_1^{-\delta} + u_2^{-\delta} - 1)^{-1/\delta}, \quad \delta \in [0, \infty).$$

• Joe copula:
$$C(u_1, u_2; \delta) = 1 - \left([1 - u_1]^{\delta} + [1 - u_2]^{\delta} - [1 - u_1]^{\delta}[1 - u_2]^{\delta}\right)^{1/\delta}, \quad \delta \in [1, \infty).$$

• Student t copula:
$$C(u_1, u_2; \rho, \nu) = T_{2,\nu}(T_{1,\nu}^{-1}(u_1), T_{1,\nu}^{-1}(u_2); \rho), \quad \rho \in (-1, 1),\ \nu \in (0, \infty),$$
where $T_{1,\nu}$ is the CDF of a univariate t-distribution with $\nu$ degrees of freedom, and $T_{2,\nu}$ is the CDF of a bivariate t-distribution with $\nu$ degrees of freedom and correlation parameter $\rho$.
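Note that $T_{2,\nu}$ need not have finite second moments.

• BB1 copula:
$$C(u_1, u_2; \theta, \delta) = \left\{1 + \left[(u_1^{-\theta} - 1)^{\delta} + (u_2^{-\theta} - 1)^{\delta}\right]^{1/\delta}\right\}^{-1/\theta}, \quad \theta \in (0, \infty),\ \delta \in [1, \infty).$$

• BB6 copula:
$$C(u_1, u_2; \theta, \delta) = 1 - \left(1 - \exp\left\{-\left[(-\log(1 - \bar{u}_1^{\theta}))^{\delta} + (-\log(1 - \bar{u}_2^{\theta}))^{\delta}\right]^{1/\delta}\right\}\right)^{1/\theta}, \quad \theta \in [1, \infty),\ \delta \in [1, \infty),$$
where $\bar{u}_1 = 1 - u_1$ and $\bar{u}_2 = 1 - u_2$.

• BB7 copula:
$$C(u_1, u_2; \theta, \delta) = 1 - \left(1 - \left[(1 - \bar{u}_1^{\theta})^{-\delta} + (1 - \bar{u}_2^{\theta})^{-\delta} - 1\right]^{-1/\delta}\right)^{1/\theta}, \quad \theta \in [1, \infty),\ \delta \in (0, \infty),$$
where $\bar{u}_1 = 1 - u_1$ and $\bar{u}_2 = 1 - u_2$.

• BB8 copula:
$$C(u_1, u_2; \theta, \delta) = \delta^{-1}\left(1 - \left\{1 - \eta^{-1}\left[1 - (1 - \delta u_1)^{\theta}\right]\left[1 - (1 - \delta u_2)^{\theta}\right]\right\}^{1/\theta}\right), \quad \theta \in [1, \infty),\ \delta \in (0, 1],$$
where $\eta = 1 - (1 - \delta)^{\theta}$.

Note that the BB copulas are based on the theory developed by Joe and Hu (1996), and the naming convention comes from Joe (1997).

2.1.1 Density and conditional distributions

If $C(u_1, u_2)$ is an absolutely continuous copula CDF, then its density function is

$$c(u_1, u_2) = \frac{\partial^2 C(u_1, u_2)}{\partial u_1 \partial u_2}.$$

The conditional CDF is defined as follows:

$$C_{1|2}(u_1 | u_2) := P(U_1 \le u_1 \mid U_2 = u_2) = \lim_{\varepsilon \to 0^+} \frac{P(U_1 \le u_1,\ u_2 < U_2 \le u_2 + \varepsilon)}{P(u_2 < U_2 \le u_2 + \varepsilon)} = \frac{\partial C(u_1, u_2)}{\partial u_2}.$$

The conditional quantile function $C_{1|2}^{-1}(\cdot | u_2)$ is the inverse function of $C_{1|2}(\cdot | u_2)$. $C_{2|1}(\cdot | u_1)$ and $C_{2|1}^{-1}(\cdot | u_1)$ can be defined in a similar fashion.

We further present the copula density functions when some of the variables are discrete (Panagiotelis et al., 2012; Stöber et al., 2015). Let $f_1$, $f_2$ and $f_{12}$ be the PDFs of $Y_1$, $Y_2$ and $(Y_1, Y_2)$ respectively, with respect to the Lebesgue measure for continuous random variables or the counting measure for discrete ones. The joint PDF $f_{12}$ can be decomposed as follows:

$$f_{12}(y_1, y_2) = \tilde{c}(y_1, y_2)\, f_1(y_1)\, f_2(y_2), \tag{2.1}$$

where $\tilde{c}$ is defined below.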
If $Y_j$ is discrete, we denote $F_j(y_j^-) := P(Y_j < y_j) = \lim_{t \uparrow y_j} F_j(t)$.

• If both $Y_1$ and $Y_2$ are continuous random variables, then the PDF of $(Y_1, Y_2)$ is:
$$f_{12}(y_1, y_2) = \frac{\partial^2 F_{12}(y_1, y_2)}{\partial y_1 \partial y_2} = \frac{\partial^2 C(F_1(y_1), F_2(y_2))}{\partial F_1(y_1)\, \partial F_2(y_2)} \frac{\partial F_1(y_1)}{\partial y_1} \frac{\partial F_2(y_2)}{\partial y_2} = c(F_1(y_1), F_2(y_2))\, f_1(y_1)\, f_2(y_2).$$
If we define $\tilde{c}(y_1, y_2) := c(F_1(y_1), F_2(y_2))$, then Equation 2.1 holds.

• If $Y_1$ is a discrete random variable and $Y_2$ is continuous, then
$$f_{12}(y_1, y_2) = \frac{\partial}{\partial y_2}\left[F_{12}(y_1, y_2) - F_{12}(y_1^-, y_2)\right] = \frac{\partial}{\partial F_2(y_2)}\left[C(F_1(y_1), F_2(y_2)) - C(F_1(y_1^-), F_2(y_2))\right] \frac{\partial F_2(y_2)}{\partial y_2} = \left[C_{1|2}(F_1(y_1) | F_2(y_2)) - C_{1|2}(F_1(y_1^-) | F_2(y_2))\right] f_2(y_2).$$
If we define $\tilde{c}(y_1, y_2) := \left[C_{1|2}(F_1(y_1) | F_2(y_2)) - C_{1|2}(F_1(y_1^-) | F_2(y_2))\right] / f_1(y_1)$, then Equation 2.1 holds.

• If $Y_1$ is a continuous random variable and $Y_2$ is discrete, then Equation 2.1 holds if we define
$$\tilde{c}(y_1, y_2) := \left[C_{2|1}(F_2(y_2) | F_1(y_1)) - C_{2|1}(F_2(y_2^-) | F_1(y_1))\right] / f_2(y_2).$$

• If both $Y_1$ and $Y_2$ are discrete random variables, then the density of $(Y_1, Y_2)$ is:
$$f_{12}(y_1, y_2) = P(Y_1 = y_1, Y_2 = y_2) = F_{12}(y_1, y_2) - F_{12}(y_1^-, y_2) - F_{12}(y_1, y_2^-) + F_{12}(y_1^-, y_2^-)$$
$$= C(F_1(y_1), F_2(y_2)) - C(F_1(y_1^-), F_2(y_2)) - C(F_1(y_1), F_2(y_2^-)) + C(F_1(y_1^-), F_2(y_2^-)).$$
If we define
$$\tilde{c}(y_1, y_2) := \left[C(F_1(y_1), F_2(y_2)) - C(F_1(y_1^-), F_2(y_2)) - C(F_1(y_1), F_2(y_2^-)) + C(F_1(y_1^-), F_2(y_2^-))\right] / \left[f_1(y_1)\, f_2(y_2)\right],$$
then Equation 2.1 holds.

2.1.2 Dependence measures

In this section, we consider the following dependence measures that are defined via copulas. These dependence measures are invariant under strictly increasing transformations on the variables.

• Spearman's rho (Spearman, 1904),
$$\rho_S(C) = 12 \iint_{[0,1]^2} C(u_1, u_2)\, du_1\, du_2 - 3.$$

• Kendall's tau (Kendall, 1938),
$$\tau(C) = 4 \iint_{[0,1]^2} C(u_1, u_2)\, dC(u_1, u_2) - 1.$$

• Tail-weighted dependence measure, with $\alpha > 0$ (Lee et al., 2018),
$$\zeta_\alpha(C) = 2 - \alpha\left(\gamma_\alpha^{-1}(C) - 1\right), \quad \text{where } \gamma_\alpha(C) = \int_0^1 C(u^{1/\alpha}, u^{1/\alpha})\, du.$$

Both Spearman's rho and Kendall's tau summarize the dependence in the center and cannot quantify the dependence in the joint upper and lower tails. The tail-weighted dependence measure puts more weight on data in the joint lower (upper) tail.
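As a numerical sanity check of these definitions, the sketch below evaluates Spearman's rho and the tail-weighted measure $\zeta_\alpha$ by a simple midpoint rule for copulas given in closed form. The function names, grid sizes and the choices $\alpha = 5$ and $\delta = 2$ are illustrative assumptions; the integration scheme is a generic approximation, not a method taken from the thesis.

```python
def spearman_rho(C, n=200):
    """rho_S(C) = 12 * (double integral of C over [0,1]^2) - 3, midpoint rule."""
    h = 1.0 / n
    grid = [(i + 0.5) * h for i in range(n)]
    total = sum(C(u1, u2) for u1 in grid for u2 in grid)
    return 12.0 * total * h * h - 3.0

def zeta(C, alpha, n=2000):
    """zeta_alpha(C) = 2 - alpha * (1 / gamma_alpha(C) - 1), midpoint rule."""
    h = 1.0 / n
    gamma = 0.0
    for i in range(n):
        v = ((i + 0.5) * h) ** (1.0 / alpha)
        gamma += C(v, v) * h
    return 2.0 - alpha * (1.0 / gamma - 1.0)

def independence(u1, u2):
    return u1 * u2

def comonotonicity(u1, u2):
    return min(u1, u2)

def mtcj(delta):
    """MTCJ copula CDF with parameter delta > 0."""
    def C(u1, u2):
        return (u1 ** -delta + u2 ** -delta - 1.0) ** (-1.0 / delta)
    return C

for name, C in [("independence", independence),
                ("comonotonicity", comonotonicity),
                ("MTCJ, delta = 2", mtcj(2.0))]:
    print(name, round(spearman_rho(C), 3), round(zeta(C, alpha=5.0), 3))
```

Both measures should be close to 0 under independence and close to 1 under comonotonicity, which follows directly from the formulas: for $C^{\perp}$, $\gamma_\alpha = \alpha/(\alpha+2)$, and for $C^{+}$, $\gamma_\alpha = \alpha/(\alpha+1)$.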
When $\alpha = 1$, $\zeta_\alpha$ is a measure of central dependence with properties similar to Kendall's tau and Spearman's rho. For large $\alpha$ values, $\zeta_\alpha$ is a tail-weighted dependence measure; the limit as $\alpha \to \infty$ is the upper tail dependence coefficient.

2.1.3 Asymmetry measures

There are two types of symmetry that are most relevant for bivariate copulas: reflection symmetry and permutation symmetry. A bivariate copula $C(u_1, u_2)$ is reflection symmetric if $(U_1, U_2)$ and $(1 - U_1, 1 - U_2)$ are identically distributed, or equivalently, $C(u_1, u_2) = \widehat{C}(u_1, u_2)$, where $\widehat{C}(u_1, u_2) = u_1 + u_2 - 1 + C(1 - u_1, 1 - u_2)$ is known as the reflected or survival copula of $C$. A bivariate copula $C(u_1, u_2)$ is permutation symmetric if $(U_1, U_2)$ and $(U_2, U_1)$ are identically distributed, or equivalently, $C(u_1, u_2) = C(u_2, u_1)$. Figure 2.1 shows some bivariate copula families with and without symmetry.

[Figure 2.1: Contour plots of the joint PDF $c(\Phi(z_1), \Phi(z_2))\,\phi(z_1)\,\phi(z_2)$. The margins are $N(0,1)$ and the copulas have Spearman's $\rho_S = 0.5$. The Gaussian copula and Frank copula are reflection and permutation symmetric. The BB1 copula is permutation symmetric but not reflection symmetric. The skew-normal copula is neither reflection nor permutation symmetric.]

Krupskii (2017) proposes the permutation asymmetry measure $G_{P,k}$ and the reflection asymmetry measure $G_{R,k}$ for data with positive quadrant dependence.

• Permutation asymmetry measure, with $k > 0$,
$$G_{P,k}(C) = \iint_{[0,1]^2} |u_1 - u_2|^{k+2} \cdot \mathrm{sign}(u_1 - u_2)\, dC(u_1, u_2).$$

• Reflection asymmetry measure, with $k > 0$,
$$G_{R,k}(C) = \iint_{[0,1]^2} |1 - u_1 - u_2|^{k+2} \cdot \mathrm{sign}(1 - u_1 - u_2)\, dC(u_1, u_2).$$

The permutation asymmetry measure $G_{P,k}(C)$ is defined as the expectation of the variable $|U_1 - U_2|^{k+2}$ adjusted for the sign of $U_1 - U_2$ for $k > 0$. It indicates the direction of permutation asymmetry: if the measure takes a positive (negative) value, then the conditional mean of data truncated in the right lower (left upper) corner is greater than that of data truncated in the left upper (right lower) corner. A larger tuning parameter $k$ results in greater variability of an empirical estimate, while a small $k$ makes the measure less sensitive to permutation asymmetric dependence. The permutation asymmetry measure $G_{P,k}$ can be further normalized to the range $[-1, 1]$ by finding a copula $C$ that maximizes $|G_{P,k}(C)|$ (Rosco and Joe, 2013). Similarly, the reflection asymmetry measure $G_{R,k}$ is defined as the expectation of the variable $|1 - U_1 - U_2|^{k+2}$ adjusted for the sign of $1 - U_1 - U_2$.

2.1.4 Tail order

Tail order, denoted by $\kappa$ and studied in Hua and Joe (2011), can be used as a measure of the strength of dependence in the joint tails of a copula. For bivariate copulas with positive dependence, the tail order has a value between 1 and 2, with larger values indicating less dependence in the joint tail. The tail order can be larger than the dimension for negative dependence.

If there exists $\kappa_L > 0$ and some $\ell(u)$ that is slowly varying at $0^+$ (that is, $\ell(tu)/\ell(u) \sim 1$ as $u \to 0^+$ for all $t > 0$) such that $C(u, u) \sim u^{\kappa_L} \ell(u)$ as $u \to 0^+$, then $\kappa_L$ is called the lower tail order of $C$ and $\Upsilon_L = \lim_{u \to 0^+} \ell(u)$ is the lower tail order parameter, provided the limit exists. By reflection, the upper tail order is defined as $\kappa_U$ such that $\overline{C}(1 - u, 1 - u) \sim u^{\kappa_U} \ell^*(u)$ as $u \to 0^+$, for some slowly varying function $\ell^*(u)$, where $\overline{C}$ is the survival function of the copula $C$. The upper tail order parameter is then $\Upsilon_U = \lim_{u \to 0^+} \ell^*(u)$.

With $\kappa = \kappa_L$ or $\kappa_U$ and $\Upsilon = \Upsilon_L$ or $\Upsilon_U$, we further classify the tail property of a copula into the following:

• Strong tail dependence: $\kappa = 1$ with $\Upsilon > 0$. For example, a bivariate t-copula has $\kappa_L = \kappa_U = 1$.

• Intermediate tail dependence: $1 < \kappa < 2$, or $\kappa = 1$ and $\Upsilon = 0$. For example, a bivariate Gaussian copula has $\kappa_L = \kappa_U = 2/(1 + \rho)$, where $\rho$ is the parameter of the Gaussian copula. When $0 < \rho < 1$, $1 < \kappa_L = \kappa_U < 2$.

• Tail quadrant independence: $\kappa = 2$ and the slowly varying function is (asymptotically) a constant.
For example, a bivariate Frank copula has $\kappa_L = \kappa_U = 2$.

2.2 Archimedean copulas

One way of extending bivariate copulas to multivariate is via Archimedean copulas, which have an exchangeable dependence structure. A $d$-variate Archimedean copula has the following copula CDF:

$$C_\psi(\mathbf{u}) = \psi\left(\sum_{i=1}^{d} \psi^{-1}(u_i)\right), \quad \mathbf{u} \in [0,1]^d. \tag{2.2}$$

This is a valid copula for any $d$ if $\psi \in \mathcal{L}_\infty$, where $\mathcal{L}_\infty$ is the class of Laplace transforms of non-negative random variables with no mass at 0 (i.e., $\psi(\infty) = 0$). Note that Equation 2.2 is permutation symmetric:

$$C_\psi(u_{\pi(1)}, \ldots, u_{\pi(d)}) = C_\psi(u_1, \ldots, u_d)$$

for any permutation $\pi$ of $\{1, \ldots, d\}$.

The conditional distribution of the copula in Equation 2.2 given the last variable is

$$C_{1,\ldots,d-1|d}(u_1, \ldots, u_{d-1} | u_d) = \frac{\partial C_\psi(u_1, \ldots, u_d)}{\partial u_d} = \frac{\psi'\left(\sum_{i=1}^{d} \psi^{-1}(u_i)\right)}{\psi'\left(\psi^{-1}(u_d)\right)}.$$

2.3 Vine copulas

In this section, we review the vine copula approach (Bedford and Cooke, 2001), which allows one to construct multivariate copulas hierarchically using bivariate copulas as building blocks.

2.3.1 Vine graphical models

A vine is a nested set of trees where the edges in the first tree are the nodes of the second tree, the edges of the second tree are the nodes of the third tree, etc. Vines are useful in specifying the dependence structure for general multivariate distributions on $d$ variables.

The first tree in a vine represents $d$ variables as nodes and the bivariate dependence of $d - 1$ pairs of variables as edges. The second tree describes conditional dependence of $d - 2$ pairs of variables conditioning on another variable; nodes are the edges in tree 1, and a pair of nodes could be connected if there is a common variable in the pair. The third tree describes conditional dependence of $d - 3$ pairs of variables conditioning on two other variables; nodes are the edges in tree 2, and a pair of nodes could be connected if there are two common conditioning variables in the pair.
This continues until tree $d - 1$ has only one edge that describes the conditional dependence of two variables conditioning on the remaining $d - 2$ variables.

For a concrete example, as shown in Figure 2.2, consider $d = 5$ variables labeled as 1, 2, 3, 4, 5. Suppose tree 1 has edges [1,2], [1,3], [2,4], [2,5], where [1,2] is an edge connecting variables 1 and 2, etc. Possible edges for tree 2 are [2,3|1], [1,4|2], [4,5|2], where [2,3|1] connects [1,2] and [1,3] (edges of tree 1 are nodes in tree 2, and these two nodes have the variable 1 in common). Possible edges for tree 3 are [3,4|1,2], [1,5|2,4], where [1,5|2,4] connects [1,4|2] and [4,5|2] (edges of tree 2 are nodes in tree 3, and these two nodes have the variables 2, 4 in common).

[Figure 2.2: An example of a vine for $d = 5$ up to tree 3. Panels: (a) Level 1 tree $T_1$; (b) Level 2 tree $T_2$; (c) Level 3 tree $T_3$.]

Note that the possible edges in a tree depend on but are not uniquely determined by the edges of the previous trees. For example, [2,3|1], [1,4|2], [1,5|2] is another possible set of edges for tree 2 in Figure 2.2. Therefore, one needs to decide which configuration to adopt when building trees for a new level. The requirement that two connected nodes must have two distinct variables and the remaining variables in common is called the proximity condition.

A formal definition of the regular vine or R-vine is given as follows (Bedford and Cooke, 2001; Kurowicka and Cooke, 2006).

Definition 2.1. (Regular vine) $V$ is a vine on $d$ variables if

1. $V = (T_1, \ldots, T_{d-1})$;

2. $T_1$ is a tree with nodes $N(T_1) = \{1, 2, \ldots, d\}$, and edges $E(T_1)$. For $\ell > 1$, $T_\ell$ is a tree with nodes $N(T_\ell) = E(T_{\ell-1})$;

3. (proximity condition) For $\ell = 2, \ldots, d - 1$, for $\{n_1, n_2\} \in E(T_\ell)$, $\#(n_1 \triangle n_2) = 2$, where $\triangle$ denotes symmetric difference and $\#$ denotes cardinality.

There are two special classes of R-vines. A regular vine is called a canonical vine or C-vine if tree $T_\ell$ has a unique node of degree $d - \ell$ (the maximum degree) for $\ell = 1, \ldots, d - 2$. A regular vine is called a drawable vine or D-vine if all nodes in $T_1$ have degree not higher than two. In some scenarios, the special classes might be used directly. For example, D-vines are more natural if there is a time or linear spatial order in variables; C-vines are more natural if there are leading variables that influence others. R-vines might be better in the absence of these criteria.

2.3.2 Vine array representation

An R-vine can be represented by the edge sets at each level $E(T_\ell)$, or equivalently by a graph, such as Figure 2.2. But those representations are not convenient for algorithms; we need a more compact way to represent vine models. A vine array $A = (a_{jk})$ for a regular vine $V = (T_1, \ldots, T_{d-1})$ on $d$ elements is a $d \times d$ upper triangular matrix. There is an ordering of the variable indexes along the diagonal. The $(\ell, j)$-th element $a_{\ell j}$ is connected to the $(j, j)$-th element $a_{jj}$ in tree $\ell$. That is, the first $\ell$ rows of $A$ and the diagonal elements encode the $\ell$-th tree $T_\ell$, such that $[a_{\ell j}, a_{jj} \mid a_{1j}, \ldots, a_{\ell-1,j}] \in E(T_\ell)$ for $\ell + 1 \le j \le d$. For example, the vine array $A_1$ represents the R-vine in Figure 2.2. The edges of $T_1$ include $[a_{12}, a_{22}] = [1,2]$, $[a_{13}, a_{33}] = [2,4]$, $[a_{14}, a_{44}] = [2,5]$, $[a_{15}, a_{55}] = [1,3]$. The edges of $T_2$ include $[a_{23}, a_{33} | a_{13}] = [1,4|2]$, $[a_{24}, a_{44} | a_{14}] = [4,5|2]$, $[a_{25}, a_{55} | a_{15}] = [2,3|1]$.

$$A_1 = \begin{bmatrix} 1 & 1 & 2 & 2 & 1 \\ & 2 & 1 & 4 & 2 \\ & & 4 & 1 & 4 \\ & & & 5 & 5 \\ & & & & 3 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 2 & 2 & 1 & 2 & 2 \\ & 1 & 2 & 1 & 4 \\ & & 3 & 3 & 1 \\ & & & 4 & 3 \\ & & & & 5 \end{bmatrix}.$$

Note that a valid vine array represents a unique R-vine. However, an R-vine may have multiple vine array representations. For example, $A_1$ and $A_2$ encode exactly the same R-vine. In applications, the variables are labeled arbitrarily. We can define a permutation of the variables so that the diagonal elements are $(1, 2, \ldots, d)$.

2.3.3 From vines to multivariate distributions

To get a multivariate distribution from a vine, bivariate distributions are assigned to the edges of tree 1 and bivariate conditional distributions are assigned to the edges of trees $2, \ldots, d - 1$. In the above example, edges can be assigned bivariate distributions $F_{12}, F_{13}, F_{24}, F_{25}$ that can be algebraically independent provided the univariate marginal distributions are $F_1, F_2, F_3, F_4, F_5$, i.e., the parameters of these bivariate distributions are free to vary on the parameter domains and the positive definiteness constraint of the correlation matrix is automatically satisfied. For tree 2, edges can be assigned the conditional distributions $F_{23|1}, F_{14|2}, F_{45|2}$; for example, $F_{23|1}$ summarizes the conditional dependence of $F_{2|1}, F_{3|1}$, where $F_{2|1}, F_{3|1}$ can be obtained from $F_{12}, F_{13}$ in tree 1, respectively. The combination of $F_{23|1}, F_{12}, F_{13}$ yields the trivariate distribution $F_{123}$. For tree 3, edges can be assigned the conditional distributions $F_{34|12}, F_{15|24}$; for example, $F_{34|12}$ summarizes the conditional dependence of $F_{3|12}, F_{4|12}$, which can be obtained from $F_{123}, F_{124}$. As mentioned above, $F_{123}, F_{124}$ can be obtained by combining conditional distributions in trees 1 and 2.

There are bivariate distributions on the edges in trees 1 to $d - 1$ of the vine. If the bivariate distributions on the edges are all bivariate Gaussian, each edge can be characterized by a correlation parameter $\rho$, which can be interpreted as a partial correlation for trees 2 to $d - 1$. For the above example, one could consider that the edges have been assigned the quantities $\rho_{12}$, $\rho_{13}$, $\rho_{24}$, $\rho_{25}$, $\rho_{23;1}$, $\rho_{14;2}$, $\rho_{45;2}$, $\rho_{34;12}$, $\rho_{15;24}$; here the semicolon in the subscript is the common notation for partial correlation. For example, $\rho_{15;24}$ summarizes the conditional correlation of variables 1 and 5 given variables 2 and 4. Partial correlations can be calculated by inverting the principal submatrices of a correlation matrix.
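A minimal sketch of this computation is given below, using the standard identity that the partial correlation of $a$ and $b$ given $S$ equals $-\omega_{ab}/\sqrt{\omega_{aa}\omega_{bb}}$, where $\Omega = (\omega_{ij})$ is the inverse of the correlation submatrix indexed by $\{a, b\} \cup S$. The helper names and the example correlation matrix are hypothetical; a small Gauss-Jordan routine is used only to keep the example free of external dependencies.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partial_correlation(R, a, b, S):
    """Partial correlation of variables a and b given the variables in S (0-based indices)."""
    idx = [a, b] + list(S)
    sub = [[R[i][j] for j in idx] for i in idx]  # principal submatrix
    omega = invert(sub)
    return -omega[0][1] / math.sqrt(omega[0][0] * omega[1][1])

# Hypothetical 3 x 3 correlation matrix.
R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]

rho_13_given_2 = partial_correlation(R, 0, 2, [1])
rho_12 = partial_correlation(R, 0, 1, [])  # empty S recovers the ordinary correlation
print(rho_13_given_2, rho_12)
```

With an empty conditioning set the function returns the ordinary correlation, which is a quick way to check the sign convention.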
Specifically, consider a partial correlation $\rho_{a,b;S}$ where $S$ is a set of variables and $\{a, b\} \cap S = \emptyset$. Let $R$ be the correlation matrix of random variables indexed by $\{a, b\} \cup S$. If we define $\Omega = (\omega_{ij}) = R^{-1}$, we have $\rho_{a,b;S} = -\omega_{ab} / \sqrt{\omega_{aa}\,\omega_{bb}}$.

The representation of a multivariate Gaussian distribution through a vine is an alternative parametrization of the correlation matrix that avoids the positive definiteness constraint of a correlation matrix. From Kurowicka and Cooke (2003) and Kurowicka and Cooke (2006), the correlations and partial correlations assigned to any vine are algebraically independent and the log-determinant of the correlation matrix is $\log\det(R) = \sum_{e \in E(V)} \log(1 - \rho_e^2)$ for any vine, with $\{\rho_e\}$ being the set of correlations and partial correlations on $E(V)$, the edges of the vine $V$. Moreover, it is this parametrization of the multivariate Gaussian distribution that can be extended to the multivariate non-Gaussian case by using bivariate copulas on the edges of the vine to get what is called the vine copula or pair-copula construction.

Multivariate data are seldom well summarized by the multivariate Gaussian distribution, but the multivariate Gaussian distribution may be adequate as a first order model if the variables are monotonically related to each other. One approach to developing a parsimonious copula for high-dimensional non-Gaussian data is to (a) find a parsimonious truncated partial correlation vine for the matrix of normal score correlations (where variables have each been converted to standard normal via probability integral transforms), and (b) replace edges of the vine with bivariate copulas that can have tail behavior different from Gaussian if this is seen in bivariate plots. See Brechmann and Joe (2015) for data examples that follow these steps.

2.3.4 Truncated vine

There are $d(d-1)/2 = O(d^2)$ edges in a complete vine graph, and at least $d(d-1)/2$ parameters for a vine copula with a parametric bivariate copula family on each edge.
Great computational effort is required for parameter estimation in high-dimensional cases. Truncated vines are useful for representing the dependence of $d$ variables in a parsimonious way. A truncated vine with $1 \le t < d - 1$ trees, or a $t$-truncated vine, assumes that the most important dependencies are captured by the first $t$ trees $V_t = (T_1, \ldots, T_t)$ in a vine and the remaining trees have independence copulas assigned to the edges. In other words, for $\ell > t$, $T_\ell$ represents the conditional independence of two variables given the conditioning variables. In the Gaussian case, this is equivalent to assigning partial correlations of 0 to the edges of the remaining $d - t - 1$ trees. By vine truncation, the number of parameters is reduced from $O(d^2)$ to $O(d)$, if $t$ is constant as $d$ increases.

The most parsimonious vine structure is a 1-truncated vine with one tree that connects $d - 1$ pairs. This is a valid structure (called a Markov tree) if variables not directly connected are conditionally independent given the variables in the tree path that connects them. But seldom can a Markov tree summarize the dependence in $d$ variables well. As an improvement, the truncated vine ($t \ge 2$) adds some layers of conditional dependence on top of a Markov tree until conditional independence relations from high-order trees are approximately valid.

2.3.5 Vine structure learning

Kurowicka and Cooke (2003) show that the log-determinant of the empirical correlation matrix $R$ is $\log\det(R) = \sum_{e \in E(V)} \log(1 - \rho_e^2)$ for any vine $V$, with $\{\rho_e\}$ being the set of correlations and partial correlations on the edges of the vine. Assuming all the bivariate copulas are Gaussian, $\log\det(R)$ is also linearly related to the negative log-likelihood of the vine copula. The best $t$-truncated partial correlation vine to approximate the correlation matrix is such that $\sum_{e \in E(V_t)} \log(1 - \rho_e^2)$ is close to $\log\det(R)$.
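For a small worked check of this identity, the sketch below takes $d = 3$ with hypothetical correlations, enumerates the three possible vines, and compares the full and 1-truncated sums of $\log(1 - \rho_e^2)$ with $\log\det(R)$. The single-variable partial correlation recursion and the $3 \times 3$ determinant expansion are standard formulas assumed here; all names and numbers are illustrative.

```python
import math

def log1m_sq(r):
    """log(1 - r^2), the contribution of one edge to log det(R)."""
    return math.log(1.0 - r * r)

def partial_corr_given_one(r_ab, r_ac, r_bc):
    """rho_{ab;c} from pairwise correlations (standard one-variable recursion)."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1.0 - r_ac ** 2) * (1.0 - r_bc ** 2))

# Hypothetical correlations for d = 3 variables.
r12, r13, r23 = 0.6, 0.5, 0.4

# Determinant of the 3 x 3 correlation matrix, expanded by hand.
log_det = math.log(1.0 + 2.0 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2)

# Each vine: (correlations on tree 1, partial correlation on tree 2).
vines = {
    "[1,2],[2,3] then [1,3|2]": ([r12, r23], partial_corr_given_one(r13, r12, r23)),
    "[1,2],[1,3] then [2,3|1]": ([r12, r13], partial_corr_given_one(r23, r12, r13)),
    "[1,3],[2,3] then [1,2|3]": ([r13, r23], partial_corr_given_one(r12, r13, r23)),
}

full_sums = {}
truncated_sums = {}
for name, (tree1, rho_tree2) in vines.items():
    truncated_sums[name] = sum(log1m_sq(r) for r in tree1)          # t = 1
    full_sums[name] = truncated_sums[name] + log1m_sq(rho_tree2)    # t = 2 (untruncated)

# The 1-truncated vine whose sum is closest to log det(R).
best = min(truncated_sums, key=lambda k: abs(truncated_sums[k] - log_det))
print(log_det, full_sums, truncated_sums, best)
```

Every untruncated vine reproduces $\log\det(R)$, while the 1-truncated sums differ; the closest one keeps the two largest absolute correlations in tree 1.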
Closeness to $\log\det(R)$ implies that one wants a truncated vine such that the $\rho_e^2$ are large in the first $t$ trees and small in the remaining trees.

Formally, the goal of the vine structure learning problem is to find a $t$-truncated vine that maximizes the objective function, or log-likelihood function,

$$L_t(V) = \sum_{i=1}^{t} \sum_{e \in E(T_i)} -\log(1-\rho_e^2), \tag{2.3}$$

where $t$ is a pre-defined truncation level and $\rho_e$ is the partial correlation corresponding to edge $e$ in the vine. Since $\rho_e^2 \in (0,1)$, $L_t(V)$ is monotonically increasing with respect to $t$. Furthermore, for any $d$-dimensional vine $V$,

$$L_{d-1}(V) = -\log\det(R).$$

In other words, all untruncated vines achieve the same value of the objective function.

There are a few existing methods attempting to solve the vine structure learning problem. The most direct way is to enumerate and compare all possible vine structures in a brute-force fashion. However, Kurowicka and Joe (2011) show that there are in total $2^{(d-3)(d-2)/2}\,(d!/2)$ different full vine structures, considering all levels, for $d$ variables. This makes brute-force search feasible only for $d \le 8$ due to the exponentially increasing number of possible vine structures.

As an alternative, Dissmann et al. (2013) propose a method based on maximum spanning tree (MST) algorithms, with different possible choices of edge weights that reflect the strength of the dependence between pairs of variables. For the multivariate Gaussian case, a good choice of edge weight in the trees is $-\log(1-\rho_e^2)$ for edge $e$; this is used in Section 6.17 of Joe (2014). The trees $T_1$ to $T_t$ of the vine are sequentially constructed by maximizing the sum of the edge weights at each tree level. Such an MST can be obtained using the algorithm by Prim (1957). Dissmann's algorithm is a greedy algorithm: the construction of $T_{i+1}$ is based on the locally optimal choice given $T_i$. It does not in general produce a globally optimal solution.

Inspired by genetic algorithms, Brechmann and Joe (2015) propose methods to effectively explore the search space of truncated vines.
At each tree level, their algorithm considers not only the MST, but also neighbors of the MST. In general, the results generated by this algorithm outperform the greedy algorithm.

2.3.6 Performance metric

It is a common question whether an empirical correlation matrix $R$ is well approximated by a model. A likelihood-ratio test is often used to assess the goodness-of-fit of a structural model. The comparative fit index (CFI) is a fit index that takes into account the likelihood ratio as well as the number of model parameters (Bentler, 1990; Brechmann and Joe, 2015).

A fit measure is

$$D_t = n\left[-L_t(V) - \log\det(R)\right], \tag{2.4}$$

where $L_t$ is the objective function defined in Equation 2.3. If the model is completely unstructured (the saturated model), then $D_t = 0$. On the other hand, if the model assumes that all variables are uncorrelated, then the fit measure is $D_0 := -n\log\det(R)$. Reasonable models should lie somewhere in between these two extreme cases. Therefore, $D_t$ can be viewed as a discrepancy measure.

For a $t$-truncated vine, the degrees of freedom (that is, $d(d-1)/2$ minus the number of model parameters) are

$$\nu_t = \frac{d(d-1)}{2} - \frac{t(2d-t-1)}{2} = \frac{(d-t)(d-t-1)}{2}. \tag{2.5}$$

In particular, $\nu_0 = d(d-1)/2$ is the case of complete independence.

The CFI of a $t$-truncated vine is defined as

$$\mathrm{CFI}_t := 1 - \frac{\max(0, D_t - \nu_t)}{\max(0, D_0 - \nu_0, D_t - \nu_t)}, \tag{2.6}$$

which takes on values between 0 and 1. Higher CFI values correspond to better fit. CFI can be used to find an optimal truncation level given a predefined goodness-of-fit level. Formally, the optimal truncation level is given by

$$t^*_\alpha = \min\{t \in \{0, \ldots, d-1\} : \mathrm{CFI}_t \ge 1-\alpha\}, \tag{2.7}$$

where $\alpha \in (0,1)$. Commonly used $\alpha$ values include 0.01 and 0.05.

2.4 Two-stage estimation method for copula models

In this section, we discuss methods for parameter estimation for copula models. The inference functions for margins (IFM) method, or two-stage estimation, is introduced in Joe and Xu (1996) and Joe (1997).

In the first stage of IFM, univariate marginal distributions are fitted.
The univariate marginal distributions could be estimated either parametrically or non-parametrically. Graphical diagnostics can suggest good choices for each parametric univariate margin, and the best parametric model is selected based on the Akaike information criterion (AIC) or the Bayesian information criterion (BIC). Alternatively, empirical CDFs can be used for continuous univariate margins (Genest et al., 1995). Observations for each marginal component are converted to uniform scores for copula analysis using the probability integral transform.

In the second stage, parameters of the dependence structure are estimated. Based on bivariate plots of normal scores for continuous variables, a set of candidate copula families is chosen. With the estimated univariate marginal distributions held fixed, copula parameters are estimated for the candidate copula families, and the best model is selected based on AIC or BIC. An analysis of the asymptotic efficiency of IFM is given in Joe (2005).

For vine copulas with a parametric bivariate copula on each edge, we can estimate bivariate copula families separately, starting with copulas in tree 1. This is the approach of the VineCopula R package (Schepsmeier et al., 2018).

2.5 Vuong's procedure

Vuong's procedure is a diagnostic method to check if two different parametric models have similar fits (Vuong, 1989). Note that the two models need not be nested. It is based on the vectors of log-likelihood contributions of each observation for two competing models. In the copula literature, for example, Brechmann et al. (2012), it has been used to compare two copula models.

Suppose the observed sample consists of the vectors $x_1, \ldots, x_n$, where $n$ is the sample size.
Given two models $M_1$ and $M_2$, nested or not, whose parametric densities are $f^{(1)}$ and $f^{(2)}$ and whose estimated parameters are $\hat\theta^{(1)}$ and $\hat\theta^{(2)}$, respectively, the statistic $\hat{D}_{12}$ is defined as follows:

$$\hat{D}_{12} = \frac{1}{n}\sum_{i=1}^{n} D_i, \quad \text{where } D_i = \log\left[\frac{f^{(2)}(x_i; \hat\theta^{(2)})}{f^{(1)}(x_i; \hat\theta^{(1)})}\right].$$

A large-sample 95% confidence interval based on the AIC correction is

$$\hat{D}_{12} \pm 1.96 \times \frac{\hat\sigma_{12}}{\sqrt{n}} - \frac{1}{n}\left[\dim(\hat\theta^{(2)}) - \dim(\hat\theta^{(1)})\right],$$

where

$$\hat\sigma_{12}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (D_i - \hat{D}_{12})^2.$$

If the confidence interval contains 0, then models $M_1$ and $M_2$ would not be considered significantly different. Otherwise, model $M_1$ or $M_2$ is the better fit depending on whether the interval is completely below 0 or above 0, respectively. The AIC correction means that a model with fewer parameters is more favorable than a model with more parameters.

Chapter 3

Vine structure learning via Monte Carlo tree search

3.1 Introduction

In multivariate statistics, modeling the dependence structure of multivariate observations is essential. The multivariate Gaussian distribution for continuous random variables is one of the most commonly used models for this task. However, multivariate data may not be well summarized by the multivariate Gaussian distribution, after transforming individual variables to standard normal margins, when there is joint tail asymmetry or tail dependence. Copula models are flexible in modeling multivariate distributions with tail behaviors that can be different from multivariate Gaussian. Joe (2014) includes a detailed introduction to copula theory, models and applications.

The vine copula or pair-copula construction is a flexible tool in high-dimensional dependence modeling. It combines vine graphs and bivariate copulas. A vine is a graphical object represented by a sequence of connected trees. In a vine copula model, vines consisting of several trees are adopted to specify the dependence structure, with trees 2 and higher summarizing conditional dependence, and bivariate copulas are used as the basic building blocks on the edges of vine trees.
Truncated vines are useful for representing the dependence of multivariate observations in a parsimonious way with a few layers of conditional dependence. Vine copulas have been applied to many fields (Cooke et al., 2019; Dissmann et al., 2013; Krupskii et al., 2018).

The structure learning of the truncated vine, defined in Section 2.3.5, is computationally intractable in general. When dealing with $d$ variables, the input to the algorithm is a $d \times d$ correlation matrix. The output is a truncated vine defined using $t$ trees, where the nodes and edges of each tree need to meet the requirements of Definition 2.1. There are a large number of possible vine structures, which results in a large search space for a high-dimensional dataset if one would like to find parsimonious structures with fewer trees by maximizing the objective function defined in Equation 2.3. Specifically, according to Cayley's formula, one can construct $d^{d-2}$ different trees with $d$ nodes. With this result, Kurowicka and Joe (2011) further show that there are in total $2^{(d-3)(d-2)/2}\,(d!/2)$ different vine structures, considering all levels, for a dataset with $d$ variables. This makes vine structure learning a challenging problem. Previous work has been mainly centered around greedy algorithms, which follow the heuristic of linking variables with stronger dependence in low-level trees and making the locally optimal choice at each tree level conditional on previous trees. However, this approach does not in general produce a solution that is close to the global optimum of the vine structure learning problem.

Monte Carlo tree search (MCTS) is a search framework for finding optimal decisions by taking random samples in the decision space (Browne et al., 2012). The key idea of MCTS is first to construct a search tree of states that are evaluated by fast Monte Carlo simulations and then to selectively grow the tree (Coulom, 2006).
Multi-armed bandit algorithms such as the upper confidence bounds for trees (UCT) can be employed to balance between exploration and exploitation (Kocsis and Szepesvári, 2006). As one of the most important methods in artificial intelligence, MCTS has been widely applied in various game and planning problems, including chess, shogi, Go, real-time video games, and even games with incomplete information such as poker. In March 2016, AlphaGo, which combines MCTS with deep neural networks, became the first computer Go program to beat a 9-dan professional without handicaps (Silver et al., 2016). This is regarded as a significant milestone in artificial intelligence research.

Because the construction of a truncated vine is inherently sequential, we formulate the vine structure learning problem as a sequential decision making process in this work. A search tree thus arises, where the root node is "empty" and does not have any edge, and the terminal leaf nodes are $t$-truncated vines. Although the height and branching factor of the search tree might be large, MCTS can be adopted to search through it efficiently.¹ Specifically, we adapt the existing upper confidence bounds for trees (UCT) algorithm for vine structure learning and incorporate tree policy enhancements including first play urgency, progressive bias, and efficient transposition handling. The adapted UCT is called the vine UCT; under its guidance, the tree policy strikes a balance between exploration and exploitation.

After the MCTS method finds candidates for truncated vine structures, bivariate copulas (based on diagnostics and the candidate list) are assigned to the edges of the truncated vines. By separating the search for the truncated vine structure from the assignment of bivariate copulas to the edges of the vine, one can compare dependence structures such as truncated vines and factor models with latent variables as nodes of the vine.
The proposed approach improves on greedy algorithms in terms of model fitting but takes more computational time. Comparisons are made with existing methods on several real datasets. All the examples suggest that the proposed method outperforms existing methods.

The remainder of the chapter is organized as follows. The proposed MCTS-based vine structure learning method is described in Section 3.2, and is evaluated on various datasets in Section 3.4. Section 3.3 studies a synthetic situation where greedy algorithms are expected to perform poorly. Section 3.5 provides concluding remarks.

¹ The height of a tree is the length of the longest path from the root to a leaf. The height of the search tree is thus $\sum_{i=1}^{t}(d-i)$. The branching factor of a node is the number of children of the node. An average branching factor can be calculated for all the nodes in a tree.

3.2 Proposed method

3.2.1 Vine structure learning as sequential decision making

In this section, we formulate the vine structure learning problem as a sequential decision making problem. A $t$-truncated vine can be represented by a sequence of edges in the sequence of trees $T_1, \ldots, T_t$.

Figure 3.1: Vine structure learning as a sequential decision problem, shown as five states (a) to (e) for five variables. An edge can be added to an unconnected acyclic graph. When a tree at level t is completed, the edges of this tree are used to create nodes for the next graph at level t+1.

Given the discrete and hierarchical nature, the construction of a $t$-truncated vine can be regarded as a sequential decision problem:

1. The initial state, or the root node of the search tree, has no edge; $T_i$ are empty for $i \in \{1, \ldots, t\}$.
2. Starting from level $i = 1$, add one edge to $T_i$ at each step according to a tree policy. The candidate edges are chosen so that $T_i$ has only one connected and acyclic component.
   For levels $i > 1$, the candidate edges also need to satisfy the proximity condition, which is defined in Section 2.3.1. A vine tree is completed before going to the next tree.
3. If $T_i$ is connected and $i < t$, go to step 2 and start adding edges to $T_{i+1}$.

Figure 3.1 shows an example of vine structure learning with dimension $d = 5$ and truncation level $t = 2$. Each subfigure represents a state in the search tree. (a) The initial state is an empty graph with 5 nodes. (b) After two steps, two edges [1,2] and [2,5] are added to the graph. Note that given this state, [1,5] cannot be added because a cycle would form. (c) Two more edges [2,3] and [3,4] are added and $T_1$ becomes connected. (d) We have started adding edges to $T_2$; [1,5|2] and [1,3|2] have been sequentially added. Given this state, edges [2,5]–[3,4] and [1,2]–[3,4] do not satisfy the proximity condition, and edge [2,3]–[2,5] does not satisfy the acyclic condition (defined in Section 2.3.1). Therefore, the only edge that can be further added is [2,3]–[3,4], namely [2,4|3]. (e) All the edges have been added and this is a terminal state. The tree construction resembles Prim's algorithm (Prim, 1957), which ensures that at each time step there exists at most one connected component in each tree.

3.2.2 Monte Carlo tree search

Monte Carlo tree search (MCTS) is a method that tries to find the optimal sequence of decisions by taking random samples over possible decisions and building a search tree accordingly. It has been widely used in domains that can be represented as trees of sequential decisions, such as games and planning problems. MCTS is particularly useful in problems with high branching factors, since it can be configured to terminate after a predefined computational budget is reached and can select a sufficiently good solution based on the partially constructed tree.
While a pure Monte Carlo process runs a large number of simulations completely randomly, MCTS keeps statistics of each possible move and uses them to guide the construction of the search tree. Notably, AlphaGo (Silver et al., 2016) combines deep neural networks with MCTS and has recently defeated a human professional player in the full-sized game of Go, which has long been viewed as one of the most challenging classic games for artificial intelligence.

The basic MCTS algorithm is conceptually simple: it iteratively builds a search tree until a predefined computational budget is reached. The search tree is initialized to a root node $v_0$ with an initial state $s_0$, which does not have any edge. Every node $v$ in the search tree has its corresponding number of visits $n_v$ and average objective function $\bar{x}_v$; both are initialized to zero. The child nodes of a node in the search tree consist of incomplete $t$-truncated vines with one additional vine edge (from those that satisfy the proximity and acyclic conditions). In each iteration, four steps are applied:

1. Selection: Starting at the root node $v_0$, a tree policy is recursively applied to descend from the root node to a node $v$ with at least one unvisited child node (a child whose number of visits is 0). In other words, edges are added one by one.
2. Expansion: An unvisited child node $v_\ell$ is added to expand the tree.
3. Simulation: A simulation is run from the new node $v_\ell$ according to a default policy to produce a $t$-truncated vine. An edge is randomly selected with equal probability from all the eligible edges, that is, the candidate edges that satisfy the acyclic and proximity conditions.
4. Backpropagation: The objective function of the $t$-truncated vine is calculated, and $n_v$ and $\bar{x}_v$ are updated for all nodes $v$ along the path.

Figure 3.2: One iteration of the general MCTS algorithm. (Panel labels: Selection, Expansion, Simulation, Backpropagation; Tree Policy, Default Policy.)

One iteration of the MCTS algorithm is shown in Figure 3.2.
Note that there are two types of policies used in MCTS: tree policies are used in the selection and expansion steps to select or create a child node $v_\ell$ from a node that is already contained in the search tree, while default policies are used in the simulation step to produce an estimate of the outcome proceeding from a non-terminal node. The backpropagation step does not involve a tree policy. However, once the statistics for each node in the tree are updated after backpropagation, any future decisions made based on the tree policies may be affected. The final result of MCTS is a sequence of actions that leads to a sufficiently good outcome starting from the root node $v_0$.

A default policy determines how to move from a non-terminal node to a terminal node, which corresponds to a full $t$-truncated vine, in the search tree. In our application, it specifies how to "complete" the truncated vine given an incomplete one, which has fewer edges than a full $t$-truncated vine. We consider two options:

1. Uniformly random: given an incomplete $t$-truncated vine (with the search at level $i$ and $T_i$ incomplete), an edge is randomly selected with equal probability from the remaining eligible edges in $T_i$, that is, the candidate edges that satisfy the acyclic and proximity conditions.
2. Maximum spanning tree: similar to Dissmann's algorithm (Dissmann et al., 2013) introduced in Section 2.3.5, given an incomplete truncated vine, maximum spanning trees are constructed sequentially from lower to higher levels.

The disadvantage of the maximum spanning tree default policy is that it greedily expands an incomplete vine, which might lead to insufficient exploration and to locking onto a set of suboptimal actions. Therefore, we use the uniformly random default policy in the proposed method to better explore the search space.

3.2.3 Tree policy: vine UCT

MCTS iteratively builds and updates a search tree to approximate the optimal actions that can be taken from a given state.
The way the search tree is built depends on how the nodes in the tree are selected, which is controlled by the tree policy. Therefore, the choice of tree policy is crucial to the success of MCTS: it should balance between exploration (looking in areas that have not been well sampled yet) and exploitation (looking in areas that appear to be promising). In this section, we describe a popular tree policy in the MCTS family, the upper confidence bounds for trees (UCT).

In the original definition of UCT, in the selection and expansion steps a child node $j$ is selected to maximize

$$\mathrm{UCT}(j) = \bar{x}_j + \kappa\sqrt{\frac{2\log n}{n_j}}, \tag{3.1}$$

where

• $\bar{x}_j$ is the average objective function from child node $j$;
• $n$ is the number of times the parent node has been visited;
• $n_j$ is the number of times child $j$ has been visited;
• $\kappa > 0$ is a constant.

If more than one child node has the same maximal value, the tie is usually broken randomly. Since $n_j = 0$ yields a UCT value of $\infty$, previously unvisited children are assigned the largest possible value, which corresponds to the expansion step in the MCTS algorithm. In the backpropagation step, $\bar{x}_j$ and $n_j$ are updated accordingly.

The two terms in Equation 3.1 attempt to balance between exploitation (the first term) and exploration (the second term). Without the exploration term, the UCT algorithm will always select the child node with the highest average outcome based on the simulation history. However, with the exploration term, if a child node $j$ of a parent node has been visited, $n$ in the numerator increases, which leads to an increase of the exploration value of the other child nodes of the parent node. At the same time, $n_j$ in the denominator for this child node increases and hence the exploration value of node $j$ decreases.
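The trade-off in Equation 3.1 can be made concrete with a small sketch. The function names and the example statistics below are hypothetical and serve only to illustrate the formula; ties are broken by taking the first maximizer rather than randomly.

```python
import math

def uct_score(mean_score, parent_visits, child_visits, kappa=1.0):
    """UCT value of Equation 3.1; an unvisited child gets an infinite score."""
    if child_visits == 0:
        return math.inf
    return mean_score + kappa * math.sqrt(2.0 * math.log(parent_visits) / child_visits)

def select_child(children, parent_visits, kappa=1.0):
    """children is a list of (mean_score, visits) pairs; returns the index of the
    child with the largest UCT value (first index on ties, a simplification)."""
    scores = [uct_score(x, parent_visits, n, kappa) for x, n in children]
    return max(range(len(children)), key=scores.__getitem__)

# Hypothetical statistics: child 0 looks better on average but has been visited often.
children = [(2.3, 50), (2.0, 5)]
print(select_child(children, parent_visits=55, kappa=1.0))  # exploration favors child 1
print(select_child(children, parent_visits=55, kappa=0.1))  # exploitation favors child 0
```

With a large $\kappa$ the rarely visited child wins because its exploration term dominates; with a small $\kappa$ the child with the higher average objective wins.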
The exploration term in the UCT objective function ensures that each child node has a non-zero probability of being selected and thus achieves a balance between exploitation and exploration.

However, in order to apply the original UCT algorithm to the vine structure learning problem, a few adaptations are needed.

Scaling constant

In our application, the objective function $L_t(V)$ defined in Equation 2.3 is in the range $(0, -\log\det(R)]$, where $R$ is the empirical correlation matrix. We need to adjust the scaling constant $\kappa$ in Equation 3.1 accordingly so that the exploration and exploitation terms are on the same scale. A natural choice of $\kappa$ is $-\log\det(R)$. The value of $\kappa$ can be adjusted to decrease or increase the amount of exploration performed.

First play urgency

In the original UCT algorithm, the selection step stops whenever a node has an unvisited child node. For problems with large branching factors or height in the search tree, the tree will not grow deeper unless all the child nodes are visited. For vine structure learning, the height of the search tree is of the order $O(d^2)$, and the branching factor is also large. Therefore, exploitation will rarely occur deep in the tree according to the original UCT algorithm. First play urgency (FPU) is a modification proposed by Gelly and Wang (2006) to address this issue. It assigns a fixed value of $\lambda_{\mathrm{FPU}}$ to score the unvisited nodes and uses the original UCT formula to score the visited nodes. By doing so, the score of an unvisited node is no longer infinite, and this encourages early exploitation.

Progressive bias

When a node has been visited only a few times, its statistics are not reliable. Progressive bias is a technique of adding domain-specific heuristic knowledge to MCTS (Chaslot et al., 2008). In artificial intelligence game playing problems, many games already have strong heuristic knowledge.

A general form of progressive bias for node $j$ is $H_j/(n_j + 1)$, where $H_j$ is a heuristic value and $n_j$ is the number of visits for this node.
This term is added to the UCT formula to encourage exploitation of nodes with larger heuristic values. As the number of visits $n_j$ increases, the effect of progressive bias decreases.

In our application, given the objective function in Equation 2.3, $H_j$ can be chosen as $H_j = -\log(1-\rho_{e_j}^2)$, since the objective function is the summation of $H_j$ over all the edges in a truncated vine. Here $e_j$ is the edge added by node $j$ and $\rho_{e_j}$ is the corresponding (partial) correlation parameter. A tuning parameter $\lambda_{\mathrm{PB}}$ is used to control the strength of progressive bias; that is, $\lambda_{\mathrm{PB}}$ times the progressive bias is added to the exploration term in UCT. When $\lambda_{\mathrm{PB}}$ is sufficiently large, the tree policy is solely controlled by the heuristic value, and the MCTS algorithm coincides with Dissmann's algorithm.

Figure 3.3: The search tree corresponding to a 1-truncated vine with d = 3. Although the search tree has six leaf nodes, there are only three unique 1-truncated vines: {[1,2], [2,3]} and {[2,3], [1,2]} yield 1–2–3; {[1,3], [2,3]} and {[2,3], [1,3]} yield 1–3–2; {[1,2], [1,3]} and {[1,3], [1,2]} yield 2–1–3.

Transpositions

The formulation of vine structure learning as a sequential decision making process has a potential problem: the same states can be reached through different paths in the search tree. This is usually referred to as transpositions.

Figure 3.3 shows the search tree corresponding to a 1-truncated vine with $d = 3$. There are only three unique 1-truncated vines in this case, summarized as 1–2–3, 1–3–2, 2–1–3.
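The situation in Figure 3.3 can be reproduced in a few lines. The sketch below is only an illustration under the assumption that a vine state is identified by its unordered set of edges; the variable names are hypothetical.

```python
from itertools import permutations

# The three possible edges of the first tree when d = 3.
edges = [(1, 2), (1, 3), (2, 3)]

# Every ordered way of adding two distinct edges gives a path in the search tree.
paths = list(permutations(edges, 2))

# Identify a state by the unordered set of its edges and count how many
# paths reach each state.
paths_per_state = {}
for path in paths:
    state = frozenset(path)
    paths_per_state[state] = paths_per_state.get(state, 0) + 1

print(len(paths), "paths,", len(paths_per_state), "distinct states")
for state, count in paths_per_state.items():
    print(sorted(state), "reached by", count, "paths")
```

Using an order-free key such as a `frozenset` is one simple way to recognize that two paths end in the same state.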
The search tree, however, contains six terminal nodes; for each unique vine structure, there exist two distinct paths leading to it. For example, the two paths [1,2], [2,3] and [2,3], [1,2] both result in the same vine structure 1–2–3.

Transpositions cause inefficiency because the statistics of the same state are scattered across different nodes. Transposition tables are the usual choice to tackle this problem; they store information about states and share the statistics with subsequent occurrences of the same state during the search. A transposition table is usually implemented as a hash table of the unique vine states. On encountering a new vine state, the algorithm checks the table to see whether the state has already been analyzed; this can be done quickly, in expected constant time. If so, the table contains the statistics that were previously assigned to this state, and the statistics are used directly. If not, the new state is entered into the hash table.

It is relatively straightforward to apply transposition tables in the selection steps of MCTS. Childs et al. (2008) further discuss the use of transposition tables in the backpropagation steps. Specifically, we adopt the UCT2 algorithm from that paper. Compared with the original UCT formula in Equation 3.1, there are two modifications: (1) a transposition table is used to share statistics of the same vine state; (2) the algorithm keeps track of the number of visits of both nodes and edges in the search tree. For a parent node $p$ and its child node $j$, the UCT2 value is given by

$$\mathrm{UCT2}(p, j) = \bar{x}_j + \kappa\sqrt{\frac{2\log n_p}{n_{(p,j)}}}, \tag{3.2}$$

where $\bar{x}_j$ is retrieved from the transposition table, $n_p$ is the number of visits of node $p$, and $n_{(p,j)}$ is the number of visits of edge $(p, j)$. Note that if $n_{(p,j)}$ is replaced with $n_j$, the value of the parent node might converge to an incorrect value (Browne et al., 2012).

Vine UCT

Combining the above adaptations, the resulting UCT is called the vine UCT.
For a parent node $p$ and its child node $j$ in the search tree, the vine UCT value is

$$\mathrm{VUCT}(p, j) = \bar{x}_j - \log\det(R)\left[\lambda_{\mathrm{PB}} \cdot \frac{H_j}{n_j + 1} + \min\left\{\sqrt{\frac{2\log n_p}{n_{(p,j)}}},\ \lambda_{\mathrm{FPU}}\right\}\right], \tag{3.3}$$

where

• $\lambda_{\mathrm{FPU}}$ and $\lambda_{\mathrm{PB}}$ are the tuning parameters;
• $H_j = -\log(1-\rho_{e_j}^2)$ is the contribution to the objective function by the newly added edge $e_j$ in child $j$;
• $n_p$ and $n_j$ are the numbers of visits of parent node $p$ and child node $j$ in the search tree;
• $n_{(p,j)}$ is the number of visits of edge $(p, j)$;
• $\bar{x}_j$ is the average objective function (defined in Equation 2.3) from child node $j$ retrieved from the transposition table.

In summary, the input of our method is a correlation matrix $R$ calculated from a multivariate dataset. The MCTS algorithm is applied, using vine UCT as the tree policy and the uniformly random default policy. In every iteration, the default policy leads to a terminal $t$-truncated vine. Through the iterations, we keep the vine with the largest value of the objective function in Equation 2.3, and it is returned as the output. Algorithm 3.1 presents the pseudocode of the proposed method.

We further illustrate the MCTS algorithm by applying it to a simple example: learning a 2-truncated vine for $d = 6$. The correlation matrix is as follows:

$$\begin{pmatrix}
1.00 & 0.40 & 0.15 & 0.41 & 0.32 & 0.62 \\
0.40 & 1.00 & 0.45 & 0.54 & 0.76 & 0.48 \\
0.15 & 0.45 & 1.00 & 0.20 & 0.51 & 0.26 \\
0.41 & 0.54 & 0.20 & 1.00 & 0.42 & 0.41 \\
0.32 & 0.76 & 0.51 & 0.42 & 1.00 & 0.36 \\
0.62 & 0.48 & 0.26 & 0.41 & 0.36 & 1.00
\end{pmatrix}.$$

We run 3000 iterations of the MCTS algorithm. Figure 3.4 shows some of the nodes with depths of 0 to 3 in the search tree and the corresponding number of visits $n_v$, average score (objective function) $\bar{x}_v$, and the progressive bias $H_j$. Since each iteration always starts from the root node, the root node has been visited 3000 times. There are $\binom{6}{2} = 15$ nodes of depth 1 in the search tree in total. Figure 3.5 further shows some nodes of depths 5, 6 and 9 in the search tree.
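To make Equation 3.3 concrete for this example, the sketch below evaluates the heuristic $H_j$ of a few first-level edges and the vine UCT value of an unvisited and a visited child. The function names, the visit counts, and the values chosen for $\lambda_{\mathrm{PB}}$ and $\lambda_{\mathrm{FPU}}$ are hypothetical; only the correlation matrix comes from the example above. An unvisited edge ($n_{(p,j)} = 0$) is treated as having an infinite square-root term, so the minimum equals $\lambda_{\mathrm{FPU}}$.

```python
import math

# Correlation matrix of the d = 6 example.
R = [[1.00, 0.40, 0.15, 0.41, 0.32, 0.62],
     [0.40, 1.00, 0.45, 0.54, 0.76, 0.48],
     [0.15, 0.45, 1.00, 0.20, 0.51, 0.26],
     [0.41, 0.54, 0.20, 1.00, 0.42, 0.41],
     [0.32, 0.76, 0.51, 0.42, 1.00, 0.36],
     [0.62, 0.48, 0.26, 0.41, 0.36, 1.00]]

def neg_log_det(A):
    """-log det(A) by Gaussian elimination without pivoting (A positive definite)."""
    M = [row[:] for row in A]
    total = 0.0
    for k in range(len(M)):
        total -= math.log(M[k][k])
        for i in range(k + 1, len(M)):
            f = M[i][k] / M[k][k]
            for j in range(k, len(M)):
                M[i][j] -= f * M[k][j]
    return total

def heuristic(rho):
    """H_j = -log(1 - rho^2) for the (partial) correlation of the added edge."""
    return -math.log(1.0 - rho ** 2)

def vine_uct(mean_score, H, n_child, n_parent, n_edge, scale, lam_pb, lam_fpu):
    """Vine UCT value of Equation 3.3, with scale = -log det(R)."""
    if n_edge == 0:
        explore = lam_fpu              # first play urgency for an unvisited edge
    else:
        explore = min(math.sqrt(2.0 * math.log(n_parent) / n_edge), lam_fpu)
    return mean_score + scale * (lam_pb * H / (n_child + 1) + explore)

scale = neg_log_det(R)
lam_pb, lam_fpu = 0.1, 1.0             # hypothetical tuning parameters
for a, b in [(1, 2), (1, 3), (1, 6)]:
    H = heuristic(R[a - 1][b - 1])
    unvisited = vine_uct(0.0, H, 0, 100, 0, scale, lam_pb, lam_fpu)
    visited = vine_uct(2.2, H, 20, 100, 20, scale, lam_pb, lam_fpu)
    print(f"edge [{a},{b}]: H = {H:.3f}, unvisited = {unvisited:.3f}, visited = {visited:.3f}")
```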
Since the nodes of depth 5 have completed first vine trees, edges are added to the second vine trees. If we run the simulation steps using the default policy from the nodes of depth 6, each gives a node of depth 9 in the search tree, corresponding to a full $t$-truncated vine. Since each iteration adds one node to the search tree, there are only 3000 nodes in the search tree. As a result, the nodes of depth 9 are likely not part of the search tree, and their numbers of visits and average scores are not stored in the search tree; they are only simulated using the default policy from a node in the search tree.

Algorithm 3.1 Vine UCT Algorithm

    function VUCTSEARCH(R, t, num_iter)
        # R is the correlation matrix, t is the truncation level, num_iter is the number of iterations.
        Δ_best ← 0                                   ▷ the best score so far
        v_root ← ∅                                   ▷ the root node contains no edge
        v_best ← null                                ▷ the terminal node that has the best score
        for i = 1 to num_iter do
            (v_tree, v_history) ← TreePolicy(v_root)
            (v_default, Δ) ← DefaultPolicy(v_tree, t)
            Backprop(v_history, Δ)
            if Δ > Δ_best then
                v_best ← v_default
                Δ_best ← Δ
            end if
        end for
        return v_best
    end function

    function TREEPOLICY(v)
        v_history ← [v]                              ▷ a list of nodes in the search tree
        while n_v > 0 and v is not terminal do       ▷ n_v is the number of visits to node v
            v ← argmax_{c ∈ children(v)} VUCT(v, c)  ▷ VUCT in Equation 3.3
            append v to v_history
        end while
        return (v, v_history)
    end function

    function DEFAULTPOLICY(v, t)
        while v is not a completed t-truncated vine do
            v ← uniformly random sample from children(v)
        end while
        Δ ← score of v
        return (v, Δ)
    end function

    function BACKPROP(v, Δ)
        for i in 1 : length(v) do
            x̄_{v[i]} ← (n_{v[i]} x̄_{v[i]} + Δ) / (n_{v[i]} + 1)
            n_{v[i]} ← n_{v[i]} + 1                  ▷ x̄_v and n_v are initialized to 0 for each node v
            if i > 1 then
                n_{(v[i−1], v[i])} ← n_{(v[i−1], v[i])} + 1
            end if
        end for
    end function

Figure 3.4: Some nodes of depths from 0 to 3 in the search tree (panel labels: Depth 0, Depth 1, Depth 2; each node is annotated with num_visits, avg_score, and H). The root node does not have any edges. A child node is obtained by adding an edge to the (incomplete) vine structure of its parent node. In future iterations, child nodes with higher scores are more likely to be visited (exploitation); child nodes with fewer prior visits are more likely to be visited (exploration); child nodes with larger values of $H_j = -\log(1-\rho_{e_j}^2)$ are more likely to be visited (progressive bias). Note that each child node has several predecessors, so that the number of visits of a given node in the search tree is smaller than the sum of the numbers of visits of its child nodes.

Figure 3.5: Some nodes of depths 5, 6, and 9 in the search tree (panel labels: Depth 5, Depth 6, Depth 9). The nodes of depth 9 are the results of the simulation step, starting from the nodes of depth 6. The scores or objective functions of the best 2-truncated vine found by the MCTS algorithm, brute-force algorithm, and sequential MST algorithm are 2.362, 2.362, and 2.333, respectively.

3.3 A worst-case example for SeqMST

Greedy algorithms generally do not find the global optimum.
In this section, we study a worst-case example where the dependence structure can be optimally captured by a 2-truncated D-vine model, but greedy algorithms only find locally optimal solutions. This illustrates how the proposed method performs in a worst-case scenario for greedy algorithms.

Consider a $d$-dimensional random vector $Z = (Z_1, \ldots, Z_d) \sim N(0, R)$, which has a stochastic representation as follows. Let $\varepsilon_j$ be i.i.d. $N(0,1)$ random variables and let $\phi_{j,1}$, $\phi_{j,2}$, and $\psi_j$ be constants. Let
\[
Z_1 = \varepsilon_1, \qquad
Z_2 = \phi_{2,1} Z_1 + \psi_2 \varepsilon_2, \qquad
Z_j = \phi_{j,1} Z_{j-1} + \phi_{j,2} Z_{j-2} + \psi_j \varepsilon_j \quad \text{for } 3 \le j \le d.
\]
Here, the $\psi_j$ are chosen as functions of the $\phi_{j,\ell}$ such that $\mathrm{Var}(Z_j) = 1$. Section 6.14.2 of Joe (2014) gives an algorithm for converting from the coefficients $\{\phi_{j,\ell}\}$ to the correlation matrix $R$ and vice versa. $R$ is said to be the correlation matrix of a 2-truncated partial correlation D-vine because the resulting D-vine has partial correlations of zero for variables separated by 3 or more nodes; i.e., $\rho_{j,j+k;\,j+1,\ldots,j+k-1} = 0$ for $k \ge 3$.

To make the problem difficult for greedy algorithms, we set $\phi_{j,1} < \phi_{j,2}$ for $3 \le j \le d$. As a result, the correlations between $Z_j$ and $Z_{j+2}$ are greater than those between $Z_j$ and $Z_{j+1}$. Here is an example for $d = 5$, $\phi_{j,1} = 0.3$, and $\phi_{j,2} = 0.6$ for all $j$; the correlation matrix is
\[
R = \begin{pmatrix}
1.00 & 0.30 & 0.69 & 0.39 & 0.53 \\
0.30 & 1.00 & 0.48 & 0.74 & 0.51 \\
0.69 & 0.48 & 1.00 & 0.59 & 0.78 \\
0.39 & 0.74 & 0.59 & 1.00 & 0.65 \\
0.53 & 0.51 & 0.78 & 0.65 & 1.00
\end{pmatrix}.
\]
Since $R_{j,j+1} < R_{j,j+2}$ for $j = 1, 2, 3$, Dissmann's algorithm (Dissmann et al., 2013) selects edges $[j, j+2]$ in the first tree, and this leads to a suboptimal solution.

We simulate 30 correlation matrices with $\phi_{j,1} \sim U(0.2, 0.3)$ and $\phi_{j,2} \sim U(0.5, 0.6)$ independently. The proposed method (MCTS) is compared with the following baseline methods: (1) BF: the brute-force search; (2) SeqMST: Dissmann's algorithm (Dissmann et al., 2013); (3) BJ15: the method proposed by Brechmann and Joe (2015). See Section 2.3.5 for details about the baseline methods.
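As a hedged illustration of the stochastic representation above, the following sketch builds $R$ directly from the coefficients by propagating covariances through the recursion. It is not the algorithm of Section 6.14.2 of Joe (2014); the function name `dvine2_correlation` and the 0-based storage of the coefficients are assumptions made for this example.

```python
import numpy as np


def dvine2_correlation(phi1, phi2):
    """Correlation matrix implied by Z_j = phi_{j,1} Z_{j-1} + phi_{j,2} Z_{j-2} + psi_j eps_j.

    Hypothetical helper: phi1[j] and phi2[j] store the coefficients of the
    (j+1)-th variable (0-based); phi1[0], phi2[0] and phi2[1] are never read.
    """
    d = len(phi1)
    R = np.eye(d)
    for j in range(1, d):
        for k in range(j):
            # Cov(Z_j, Z_k) = phi_{j,1} Cov(Z_{j-1}, Z_k) + phi_{j,2} Cov(Z_{j-2}, Z_k)
            r = phi1[j] * R[j - 1, k]
            if j >= 2:
                r += phi2[j] * R[j - 2, k]
            R[j, k] = R[k, j] = r
        # psi_j^2 = 1 - Var(phi_{j,1} Z_{j-1} + phi_{j,2} Z_{j-2}) must stay positive
        explained = phi1[j] * R[j, j - 1]
        if j >= 2:
            explained += phi2[j] * R[j, j - 2]
        if explained >= 1.0:
            raise ValueError(f"coefficients of variable {j + 1} leave no variance for psi")
    return R


d = 5
R = dvine2_correlation([0.0] + [0.3] * (d - 1), [0.0, 0.0] + [0.6] * (d - 2))
print(np.round(R, 2))
# Neighbours are less correlated than variables two apart, which is what
# misleads a greedy choice of the first tree.
print([bool(R[j, j + 1] < R[j, j + 2]) for j in range(d - 2)])
```

With $\phi_{j,1} = 0.3$ and $\phi_{j,2} = 0.6$, rounding the result to two decimals reproduces the $5 \times 5$ matrix of the example; for instance, $R_{13} = 0.3 \cdot 0.3 + 0.6 = 0.69$.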
Figures 3.6(a) and 3.6(c) show the comparative fit index (CFI) for various truncation levels $t \in \{2, \ldots, d-1\}$ of truncated vines achieved by different algorithms for $d = 10$ and $d = 15$, respectively. Higher CFI values indicate better fit. Note that 1-truncated vines are omitted from the figures because the optimal 1-truncated vine can be found by a minimum spanning tree algorithm. For $d = 10$, the MCTS method is able to find models with CFI close to 1 using a truncation level of only $t \le 3$, while BJ15 needs $t \le 5$ and SeqMST needs $t \le 8$. The pattern is similar for $d = 15$.

We further conduct experiments on a 2-truncated D-vine model with perturbation. Given a correlation matrix $R$, we draw a random sample of size $n$ from $N(0, R)$ and compute the sample correlation matrix $\hat{R}$. Because of the added sampling noise, $\hat{R}$ is no longer the correlation matrix of a 2-truncated partial correlation D-vine. The results are shown in Figures 3.6(b) and 3.6(d) for $d = 10$ and $d = 15$, respectively.

For this worst-case example, an alternative progressive bias term is considered. For the previous correlation matrix $R$, its corresponding partial correlation matrix is
\[
\begin{pmatrix}
1.00 & -0.04 & 0.52 & 0.00 & 0.00 \\
-0.04 & 1.00 & 0.07 & 0.62 & 0.00 \\
0.52 & 0.07 & 1.00 & 0.08 & 0.55 \\
0.00 & 0.62 & 0.08 & 1.00 & 0.30 \\
0.00 & 0.00 & 0.55 & 0.30 & 1.00
\end{pmatrix},
\]
where the $(i, j)$-th element is the partial correlation of $(Z_i, Z_j)$ given all the other variables. Note that the partial correlation is zero if $i$ and $j$ are separated by 3 or more variables. Therefore, we define the progressive bias as $H_j = -\log(1 - \rho_{e_j;-e_j}^2)$, where $\rho_{e_j;-e_j}$ is the partial correlation of the two nodes that are incident to edge $e_j$ given all the other variables. This discourages the MCTS algorithm from choosing pairs of nodes that are separated by 3 or more nodes.
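The partial correlations given all other variables can be read off the inverse correlation matrix $P = R^{-1}$ through the standard identity $\rho_{ij;\text{rest}} = -P_{ij} / \sqrt{P_{ii} P_{jj}}$. The sketch below applies it to the $d = 5$ example and evaluates the alternative progressive bias. The helper names are hypothetical, and `example_R` rebuilds $R$ from the recursion only so that the block is self-contained.

```python
import numpy as np


def example_R(d=5, phi1=0.3, phi2=0.6):
    """Correlation matrix of the worst-case example (constant coefficients)."""
    R = np.eye(d)
    for j in range(1, d):
        for k in range(j):
            second = phi2 * R[j - 2, k] if j >= 2 else 0.0
            R[j, k] = R[k, j] = phi1 * R[j - 1, k] + second
    return R


def partial_corr_given_rest(R):
    """Partial correlation of each pair given all other variables."""
    P = np.linalg.inv(R)            # precision matrix
    s = np.sqrt(np.diag(P))
    pc = -P / np.outer(s, s)
    np.fill_diagonal(pc, 1.0)
    return pc


def progressive_bias(rho):
    """H = -log(1 - rho^2); equals 0 when the partial correlation is 0."""
    return -np.log1p(-np.asarray(rho, dtype=float) ** 2)


pc = partial_corr_given_rest(example_R())
print(np.round(pc, 2))
# Bias for the pair (1, 3) versus the pair (1, 4), which is separated by 3 nodes.
print(progressive_bias(pc[0, 2]), progressive_bias(pc[0, 3]))
```

Pairs separated by 3 or more nodes have a partial correlation that is zero up to floating-point error, so their bias term is essentially zero, which is the discouraging effect described above.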
Experiments show that the new progressive bias term yields performance similar to that of the original one; hence we leave out the experimental results.

[Figure 3.6: four line plots of CFI (vertical axis) against truncation level (horizontal axis) for the methods MCTS, BJ15, and SeqMST. Panels: (a) $d = 10$ without perturbation; (b) $d = 10$ with perturbation, $n = 500$; (c) $d = 15$ without perturbation; (d) $d = 15$ with perturbation, $n = 500$.]

Figure 3.6: CFI vs truncation level $t$ for simulated 2-truncated D-vine datasets with $d = 10$ and $d = 15$. A larger CFI is better.

3.4 Experiments

In this section, we evaluate the performance of the proposed methods on several real datasets. Section 3.4.1 focuses on the structure learning tasks or finding parsimonious truncated vines, while Section 3.4.2 considers learning or fitting truncated vine copula models, that is, vine structures as well as bivariate copulas.

3.4.1 Structure learning experiments

In this subsection, we only consider the vine structure learning tasks. In other words, the task is only to learn the vine structure, and the performance is measured by the CFI defined in Section 2.3.5.

Datasets

To assess the effectiveness of our proposed method, we consider three datasets from various fields. One of them is a small dataset with $d = 8$ so that it is feasible to run the brute-force algorithm. The other two have more variables. Subsets of variables are randomly selected from these two larger datasets to evaluate the performance of the proposed method.

Abalone. The abalone dataset is obtained from the UCI machine learning repository (Lichman, 2013). It contains $n = 4177$ samples and 8 numerical variables, including age and physical measurements of abalones.

Glioblastoma tumors (GBM). The glioblastoma tumors dataset is a level-3 gene expression dataset studied by Brennan et al. (2013).
It is obtained from The Cancer Genome Atlas (TCGA) Data Portal (Tomczak et al., 2015) and contains expression data of 12044 genes from $n = 558$ tumors. Within all the genes in the dataset, we first filter out 1342 genes that are related to human cell cycle. Afterward, a hierarchical clustering algorithm with Euclidean distance metric and complete linkage is applied to obtain a cluster of 92 genes. To further study different scenarios, we randomly sample $d = 8, 10, 15, 20$ variables and repeat the procedure 100 times. This allows us to calculate the means and standard errors of CFI when comparing different methods.

DAX2011. This dataset contains $n = 511$ daily log returns of 29 stocks listed in Deutscher Aktien Index (DAX) in 2011–2012 (Section 7.8.2 in Joe (2014)). A GARCH filter is applied to remove serial dependence. Similar to the sub-sampling procedure for the GBM dataset, we also randomly sample $d = 8, 10, 15, 20$ variables 100 times.

[Figure 3.7: line plot of CFI (vertical axis) against truncation level (horizontal axis) for the methods MCTS, BF, BJ15, and SeqMST.]

Figure 3.7: CFI vs truncation level $t$ for the Abalone dataset. A larger CFI is better.

[Figure 3.8: four line plots, one for each of $d = 8$, $d = 10$, $d = 15$, and $d = 20$, of CFI (vertical axis) against truncation level (horizontal axis) for the methods MCTS, BJ15, and SeqMST.]

Figure 3.8: GBM dataset: CFI vs truncation level $t$. A larger CFI is better.

Results

For all the experiments, 5000 MCTS iterations are performed; hyperparameters or tuning parameters are set to $\lambda_{\mathrm{FPU}} = 1$ and $\lambda_{\mathrm{PB}} = 0.1$. The following results show that for datasets with various dimensions $d$ and truncation levels $t$, this set of hyperparameters consistently gives decent results.
This indicates that our method is robust under different settings and does not require much hyperparameter tuning. The algorithm is implemented in Python; the code can be found in Appendix C.

The comparisons have been made on correlation matrices of actual datasets. In Section 3.3, synthetic structural correlation matrices are constructed for which SeqMST performs much worse than in the examples in this section.

Abalone dataset. Figure 3.7 shows the CFI for various truncation levels $t$. The performance of MCTS is better than SeqMST and BJ15 for all truncation levels. Notably, MCTS has the same performance as BF for $t = 4$, 5 and 6, which indicates that our method can find the best truncated vine for those truncation levels.

[Figure 3.9: two line plots of the optimal truncation level (vertical axis) against dimension $d \in \{8, 10, 15, 20\}$ (horizontal axis) for the methods MCTS, BJ15, and SeqMST. Panels: (a) GBM dataset; (b) DAX2011 dataset.]

Figure 3.9: Optimal truncation level $t^*_{\alpha=0.01}$ vs dimension $d$. A smaller $t^*_{\alpha=0.01}$ is better.

GBM and DAX2011 datasets. Figure 3.8 shows the CFI for different truncation levels $t$ and various dimensions $d$ on the GBM dataset. Since 100 subsets of variables are randomly sampled for each of $d = 8, 10, 15$ and 20, confidence intervals for the CFI can be obtained. Note that the brute-force search is no longer feasible in this experiment. MCTS outperforms both BJ15 and SeqMST on all the combinations of truncation level $t$ and dimension $d$. Especially when the truncation level $t$ is small, MCTS is significantly better than the existing methods.

Another way to demonstrate the performance is to compare the optimal truncation level $t^*_\alpha$ defined in Equation 2.7. It gives the minimal truncation level that reaches a CFI level of $1 - \alpha$. The lower $t^*_\alpha$, the more dependence information is captured by the vine structure. Figure 3.9 shows the optimal truncation level $t^*_{\alpha=0.01}$ for both the GBM and DAX datasets.
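To make the definition of $t^*_\alpha$ concrete, here is a minimal sketch that picks the smallest truncation level whose CFI reaches $1 - \alpha$. The function name, the dictionary input, the use of a non-strict inequality, and the CFI values in the usage lines are all hypothetical; Equation 2.7 remains the authoritative definition.

```python
def optimal_truncation_level(cfi_by_level, alpha=0.01):
    """Smallest truncation level t with CFI(t) >= 1 - alpha, or None if none qualifies.

    cfi_by_level maps a truncation level to the CFI of the best vine found at
    that level (an assumed input format for this illustration).
    """
    threshold = 1.0 - alpha
    for t in sorted(cfi_by_level):
        if cfi_by_level[t] >= threshold:
            return t
    return None


# Made-up CFI values, for illustration only (not taken from any experiment).
cfi = {1: 0.90, 2: 0.96, 3: 0.985, 4: 0.992, 5: 0.998}
print(optimal_truncation_level(cfi, alpha=0.01))
print(optimal_truncation_level(cfi, alpha=0.05))
```

A method whose CFI curve rises faster crosses the threshold at a smaller $t$, which is why a smaller $t^*_\alpha$ indicates a more parsimonious vine.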
For the GBM dataset with $d = 20$, MCTS selects a vine with 2.4 fewer trees than SeqMST and 1.2 fewer trees than BJ15 on average. The performance for larger values of $d$ such as $d = 30$ is similar.

3.4.2 Vine copula learning experiments

In this subsection, we consider the overall vine copula models; that is, given a vine structure learned by MCTS, BJ15 or SeqMST, bivariate copulas are fitted to each edge. This gives an end-to-end comparison of the vine structure learning algorithms. We consider the following two datasets:

FX dataset. The dataset contains the log-returns of daily exchange rates of 16 currencies from 2012 to 2015, using the US dollar (USD) as the base currency. The exchange rates are retrieved from Antweiler (1996). The currencies are Canadian dollar (CAD), European euro (EUR), Japanese yen (JPY), British pound (GBP), Swiss franc (CHF), Australian dollar (AUD), Hong Kong dollar (HKD), New Zealand dollar (NZD), Korean won (KRW), Mexican peso (MXN), Brazilian real (BRL), Chinese yuan (CNY), Indian rupee (INR), Russian ruble (RUB), Saudi Arabian riyal (SAR), and South African rand (ZAR).

GBM20 dataset. From the GBM dataset mentioned in Section 3.4.1, we pick the first $d = 20$ genes.

The bivariate copula selection is done by the VineCopula package (Schepsmeier et al., 2018) using maximum likelihood estimation. All possible bivariate copula families provided by the package are considered, and the bivariate copula model with the lowest Akaike information criterion (AIC) is selected. The performance is measured by the log-likelihood or the AIC of the learned vine copula model on the datasets. For BJ15 and MCTS, each algorithm is repeated ten times with different seeds. The truncation levels $t$ are chosen such that the CFI is close to 0.95. We have applied the diagnostic tools of asymmetry and simplifying assumption described in Chapter 4 to the second tree of the SeqMST output.
The simplifying assumption seems valid.

The results are shown in Tables 3.1 and 3.2 for the FX and GBM20 datasets, respectively. Algorithm 3.1 is run 10 times, starting with different seeds; similarly, BJ15 is also run 10 times. For the FX dataset, the 10 runs of MCTS return the same truncated vine structure. The MCTS algorithm performs better than SeqMST and BJ15. Moreover, on the GBM20 dataset, the MCTS algorithm is able to find vine structures with lower truncation levels yet better AIC compared with SeqMST and BJ15. It also indicates that the vine models with higher Gaussian likelihood tend to lead to vine copula models with better AIC, and this corroborates the two-step approach of separating the steps of learning vine dependence structures from applying non-Gaussian copulas to the edges. Dissmann's algorithm (Dissmann et al., 2013) in its general form does not separate these two steps. SeqMST is essentially Dissmann's algorithm with the bivariate Gaussian copula as the only candidate. These tables show the importance of repeating the genetic algorithm or MCTS algorithm many times. Although the correlation of the Gaussian negative log-likelihood and the corresponding vine copula AIC is large, they are not perfectly correlated. Among the 10 different runs of MCTS on the GBM20 dataset, $-5591$ is the smallest AIC, but the Gaussian log-likelihood of 2541 is the fifth largest of the candidates.

Method   Rep.   Trunc.   Gauss log-lik.   CFI     Log-lik.   No. par.   AIC
SeqMST   -      3        3239             0.956   3546       66         -6960
BJ15     1      3        3216             0.950   3527       65         -6925
         2      3        3215             0.946   3536       68         -6936
         3      3        3239             0.956   3556       66         -6979
         4      3        3231             0.953   3551       66         -6970
         5      3        3234             0.954   3540       65         -6951
         6      3        3225             0.949   3524       65         -6917
         7      3        3215             0.947   3536       67         -6938
         8      3        3216             0.947   3524       66         -6917
         9      3        3220             0.948   3529       66         -6925
         10     3        3242             0.957   3563       68         -6990
         mean            3225             0.951   3539                  -6945
         best            3242             0.957   3563                  -6990
MCTS     -      3        3262             0.963   3572       70         -7003

Table 3.1: Experimental results for the FX dataset. The columns include the Gaussian log-likelihood and CFI of the vine dependence structure. Given the vine structures, bivariate copulas are assigned to the edges. The last three columns are the resulting vine copula log-likelihood, number of parameters, and AIC. For BJ15 and MCTS, we show 10 replications with different random seeds and the average and best model.

Method   Rep.   Trunc.   Gauss log-lik.   CFI     Log-lik.   No. par.   AIC
SeqMST   -      16       2563             0.964   3045       252        -5587
BJ15     1      14       2552             0.960   3004       249        -5510
         2      14       2539             0.960   3010       234        -5553
         3      14       2484             0.936   2996       246        -5500
         4      14       2543             0.955   2981       238        -5486
         5      14       2511             0.945   2955       232        -5445
         6      14       2551             0.957   3021       242        -5558
         7      14       2495             0.940   2996       241        -5510
         8      14       2554             0.965   3011       231        -5561
         9      14       2567             0.967   3001       238        -5525
         10     14       2581             0.970   3001       226        -5551
         mean            2538             0.955   2998                  -5520
         best            2581             0.970   3020                  -5561
MCTS     1      11       2538             0.961   2986       218        -5537
         2      11       2533             0.961   2857       224        -5266
         3      11       2539             0.964   2946       209        -5474
         4      11       2541             0.966   3006       211        -5591
         5      11       2545             0.970   2925       212        -5427
         6      11       2548             0.965   2959       223        -5471
         7      11       2538             0.962   2930       207        -5447
         8      11       2549             0.963   3000       225        -5550
         9      11       2524             0.964   2886       216        -5339
         10     11       2545             0.963   2962       207        -5510
         mean            2540             0.964   2946                  -5461
         best            2549             0.970   3006                  -5591

Table 3.2: Experimental results for the GBM20 dataset. The columns include the Gaussian log-likelihood and CFI of the vine dependence structure. Given the vine structures, bivariate copulas are assigned to the edges. The last three columns are the resulting vine copula log-likelihood, number of parameters, and AIC. For BJ15 and MCTS, we show 10 replications with different random seeds and the average and best model.

3.5 Conclusion

In this chapter, we present a novel and effective approach to learning truncated vine structures, or equivalently, finding parsimonious truncated vines. Our method combines the original MCTS algorithm with the proposed vine UCT, which is adapted from the original UCT. Under the guidance of the vine UCT, our method can effectively explore a large search space of possible truncated vines by balancing between exploration and exploitation.
We demonstrate that the proposed method has significantly better performance on vine structure learning than the existing methods under various experimental setups. The comparisons have been made on correlation matrices of actual datasets to reflect performance that might be expected in practical applications.

Chapter 4

Copula diagnostics for asymmetries and conditional dependence

4.1 Introduction

Vine copula models have been used for flexible dependence structures that can have tail asymmetries or stronger dependence in the tails relative to multivariate Gaussian copulas. They have been widely used in finance (Brechmann et al., 2012; Dissmann et al., 2013) and other application areas. A vine is a graphical object represented by a sequence of trees. In a vine copula model, vine graphs consisting of nested trees are adopted to specify the dependence structure, and bivariate copulas are used as basic building blocks on the edges of vines. In tree 1 of the vine, the edges have bivariate copulas applied to univariate margins to get bivariate distributions; in trees 2 and higher, the edges have bivariate copulas applied to univariate conditional margins to get bivariate conditional distributions.

In this chapter, we present diagnostic tools for bivariate tail asymmetries and for conditional dependence as a function of the conditioning value(s). These tools can effectively facilitate the choice of bivariate parametric copula families on the edges of a vine. For example, if diagnostics for an edge of the vine suggest that asymmetry or tail dependence exists, then only appropriate parametric copula families with properties matching the tail asymmetry or strength of dependence in the tail should be considered.
One often makes the simplifying assumption when fitting a vine copula model; it assumes that the parameters of the copula of a conditional distribution do not depend on the value(s) of the conditioning variable(s). For conditional dependence in trees 2 and higher of the vine, our diagnostic tools yield functions of the conditioning variable(s) to help in the visualization of the form of conditional dependence and asymmetry. Corresponding confidence bands can be obtained for the conditional functions; if a constant function does not lie within the confidence bands, then the simplifying assumption might be inappropriate and one could consider copulas whose parameters depend on the value of the conditioning variable.

The strength of dependence between pairs of random variables is often of interest. There are many measures of bivariate monotone association that are invariant to strictly increasing transformations of the variables. Commonly used are Spearman's rho (Spearman, 1904), Kendall's tau (Kendall, 1938), and Blomqvist's beta (Blomqvist, 1950). However, they only capture the strength of dependence in the center. Recently, different tail-weighted dependence measures have been proposed so that the strength of dependence in the joint tails can be summarized as well as the strength of central dependence; see, for example, Krupskii and Joe (2015); Lee et al. (2018) and references therein for the study of families of tail-weighted dependence measures. The tail dependence coefficients (Joe, 1993) measure the strength of dependence in the upper-quadrant or lower-quadrant tails; they are relevant to dependence in extreme values and do not have simple empirical counterparts.

For univariate distributions, there exist several measures of asymmetry. The most commonly used one is the skewness. When extending the idea of asymmetry to bivariate distributions, there are two types of asymmetry that are most relevant: reflection asymmetry and permutation asymmetry, as defined in Section 2.1.3.
Measures of asymmetry include the $L_\infty$ distance between a copula and its reflected/permuted copula, quantile-based asymmetry measures (Rosco and Joe, 2013), and asymmetry measures based on a weighting function (Krupskii, 2017).

The aforementioned dependence measures and asymmetry measures can effectively guide the choice of candidate parametric copula families. For example, based on the tail-weighted measures of dependence in the joint upper and lower tails, one could reduce the suitable copula families from many possible families with varying numbers of parameters, normally from one to four. The simplest parametric copula families are permutation symmetric. However, if an asymmetry measure suggests the existence of permutation asymmetry, using permutation asymmetric copula families can potentially lead to a better fit.

In order for modeling with vine copulas to be tractable, the simplifying assumption is usually made as an approximation, since this can still lead to vine copulas with flexible tail properties. The simplifying assumption implies that the copulas of conditional distributions in the second tree and higher do not depend on the conditioning values. Adopting the simplifying assumption can greatly simplify the modeling process and evade the curse of dimensionality. An extensive literature on vine copulas has been based on the simplifying assumption. Acar et al. (2012); Hobæk Haff et al. (2010); Stoeber et al. (2013) are some of the earliest papers to discuss the use of the simplifying assumption as an approximation or the relaxation of the simplifying assumption for vine copulas. Acar et al. (2012) apply non-parametric kernel smoothing methods to estimate the relationship between a copula parameter and the conditioning value. Stoeber et al. (2013) include a method to assess the distance of a distribution from a nearby distribution that satisfies the simplifying assumption.
Kraus and Czado (2017b) take the simplifying assumption into consideration when learning the vine structure. Kurz and Spanhel (2017) propose a framework for testing the simplifying assumption, but do not use their method to suggest candidate copula families to use with vines.

Many bivariate dependence and asymmetry measures can be written as functionals of the copula corresponding to the two continuous random variables. Gijbels et al. (2011) propose a kernel smoothing estimator of Spearman's rho and Kendall's tau for a conditional distribution of two variables given a covariate. Acar et al. (2012) calculate pointwise confidence intervals for the estimated dependence measures. We extend the kernel smoothing method to other conditional dependence and asymmetry measures. Furthermore, we propose simultaneous envelope-based bootstrap confidence bands, from which one can visualize whether a conditional measure is roughly constant with respect to the covariates. In cases where the conditional measure is far from constant, the simplifying assumption might be invalid and one should consider parametric copula families whose parameters depend on the value of the conditioning variable.

We provide diagnostic methods for the bivariate marginal and conditional distributions, and these can be used within vine copula models. Section 4.2 introduces a general framework for estimating a conditional dependence or asymmetry measure as a function of the conditioning value. An algorithm to compute simultaneous bootstrap confidence bands of the conditional measure is also provided. These methods allow one to detect whether permutation asymmetry exists and to visually decide if the simplifying assumption is a good approximation. In cases where permutation asymmetric copulas are needed, Section 4.3 reviews some methods for constructing bivariate copulas with asymmetries and compares their range of asymmetry.
Section 4.4 illustrates the use of the conditional diagnostics for a multivariate copula model for which conditional distributions have widely varying strength of dependence. For this copula model, the exact conditional dependence measures can be computed via numerical integration. This allows for a direct comparison of the estimated conditional measures with the exact values to assess the sample size needed to see the shape of the function for the conditional measures. Section 4.5 demonstrates the proposed diagnostic tools on two real datasets where permutation asymmetry and violation of the simplifying assumption are prominent. With the help of the diagnostic tools, we are able to choose suitable bivariate parametric copula families and improve model fit. Finally, Section 4.6 discusses further research.

4.2 Copula-based conditional measures

For a bivariate continuous random vector $(Y_1, Y_2)^T$, let $\eta$ be a dependence measure or asymmetry measure that is invariant to increasing monotone transforms of the two random variables. For the joint distribution $F_{12}$ with copula $C_{12}$, $\eta$ is a function of the copula: $\eta = \eta(F_{12}) = \eta(C_{12})$.

The measure $\eta$ can be applied to bivariate conditional distributions. Let $Y_1, Y_2$ be two continuous random variables and $X$ be a random variable or random vector. Consider the (continuous) conditional distribution of $[Y_1, Y_2 | X = x]$ and denote its copula by $C_{Y_1,Y_2;X}(\cdot; x)$. Then
\[
C_{Y_1,Y_2;X}(u_1, u_2; x) = F_{Y_1,Y_2|X}\bigl(F^{-1}_{Y_1|X}(u_1|x),\, F^{-1}_{Y_2|X}(u_2|x) \,\big|\, x\bigr),
\]
where $F_{Y_1,Y_2|X}(\cdot|x)$, $F_{Y_1|X}(\cdot|x)$ and $F_{Y_2|X}(\cdot|x)$ are the conditional CDFs of $[Y_1, Y_2 | X = x]$, $[Y_1 | X = x]$ and $[Y_2 | X = x]$, respectively. Some authors refer to $C_{Y_1,Y_2;X}(\cdot; x)$ as a conditional copula, but it is actually not a conditional distribution.

If $\eta$ is applied to $F_{Y_1,Y_2|X}(\cdot|x)$ for each $x$, then we have a function of $x$.
With overloaded notation, denote
\[
\eta(x) = \eta(F_{Y_1,Y_2|X}(\cdot|x)) = \eta(C_{Y_1,Y_2;X}(\cdot; x)).
\]
There exist several copula-based bivariate measures $\eta$ that are integrals of a copula, including dependence measures, tail-weighted dependence measures (Lee et al., 2018), and asymmetry measures (Krupskii, 2017). In this section, we estimate $C_{Y_1,Y_2;X}(\cdot; x)$ using kernel smoothing over $x$ (Gijbels et al., 2011) and apply the estimated $\tilde{C}_{Y_1,Y_2;X}(\cdot; x)$ to further obtain $\tilde{\eta}(x) = \eta(\tilde{C}_{Y_1,Y_2;X}(\cdot; x))$. We consider the following copula-based bivariate measures $\eta$:

• Spearman's rho,
\[
\rho_S(C) = 12 \iint_{[0,1]^2} C(u_1, u_2)\, du_1\, du_2 - 3.
\]

• Tail-weighted dependence measure, with $\alpha \ge 1$,
\[
\zeta_\alpha(C) := 2 - \alpha\bigl(\gamma_\alpha^{-1}(C) - 1\bigr), \quad \text{where } \gamma_\alpha(C) := \int_0^1 C(u^{1/\alpha}, u^{1/\alpha})\, du.
\]

• Permutation asymmetry measure, with $k > 0$,
\[
G_{P,k}(C) = \iint_{[0,1]^2} |u_1 - u_2|^{k+2} \cdot \mathrm{sign}(u_1 - u_2)\, dC(u_1, u_2).
\]

• Reflection asymmetry measure, with $k > 0$,
\[
G_{R,k}(C) = \iint_{[0,1]^2} |1 - u_1 - u_2|^{k+2} \cdot \mathrm{sign}(1 - u_1 - u_2)\, dC(u_1, u_2).
\]

The above measures and the corresponding conditional measures can be used as diagnostic tools to detect permutation and reflection asymmetry for pairs of variables used in vine copulas and to check the reasonableness of the simplifying assumption for vine copulas.

In Sections 2.15 and 2.17 of Joe (2014), there are tail-weighted dependence measures such as semi-correlations and asymmetry measures that depend on quantiles. We will not consider these for the conditional version in the subsequent developments because, in general, there do not exist simple expressions for the conditional dependence measures defined by quantiles.

We give an overview of the kernel smoothing method (Gijbels et al., 2011) in Section 4.2.1. In Sections 4.2.2, 4.2.3 and 4.2.4, we present the estimates of conditional Spearman's rho, conditional tail-weighted dependence measure, and conditional asymmetry measures, respectively.

4.2.1 Estimating copulas of conditional distributions

Gijbels et al. (2011) propose nonparametric estimators of Spearman's rho and Kendall's tau for a conditional distribution of two variables given a covariate.
In this section, we give an overview of the estimator of copulas of conditional distributions.

Consider a random vector $(Y_1, Y_2, X)^T$, where $X$ could be a random variable or a random vector. Let $F_{Y_1|X}$ and $F_{Y_2|X}$ be the conditional CDFs of $Y_1$ and $Y_2$ given $X$, respectively, and $C_{Y_1,Y_2;X}$ be the copula for $F_{Y_1|X}(\cdot|x)$ and $F_{Y_2|X}(\cdot|x)$. Let $(y_{i1}, y_{i2}, x_i)$ be the observed data, for $i = 1, \ldots, n$, and suppose this is considered as a random sample. For $j \in \{1, 2\}$, the conditional CDF $F_{Y_j|X}$ can be estimated by
\[
\tilde{F}_{Y_j|X}(y|x) = \sum_{i'=1}^{n} w_{i'j}(x)\, I\{y_{i'j} \le y\},
\]
for appropriately chosen weights $w_{i'j}(x)$, where $I$ represents the indicator function. The weight $w_{i'j}(x)$ is larger if $x_{i'}$ is closer to $x$. Let $G_{Y_1,Y_2;X}(v_1, v_2; x) = P(F_{Y_1|X}(Y_1|x) \le v_1, F_{Y_2|X}(Y_2|x) \le v_2)$; then similarly an estimate of $G_{Y_1,Y_2;X}(v_1, v_2; x)$ is
\[
\tilde{G}_{Y_1,Y_2;X}(v_1, v_2; x) = \sum_{i=1}^{n} w_i(x)\, I\{\tilde{u}_{i1} \le v_1, \tilde{u}_{i2} \le v_2\},
\]
where $\tilde{u}_{i1} = \tilde{F}_{Y_1|X}(y_{i1}|x_i)$ and $\tilde{u}_{i2} = \tilde{F}_{Y_2|X}(y_{i2}|x_i)$. Because of the smoothing for $\tilde{F}_{Y_1|X}$ and $\tilde{F}_{Y_2|X}$, $\tilde{G}_{Y_1,Y_2;X}(v_1, v_2; x)$ does not have $U(0,1)$ margins. One can obtain the margins
\[
\tilde{G}_{Y_1;X}(v_1; x) = \sum_{i=1}^{n} w_i(x)\, I\{\tilde{u}_{i1} \le v_1\}; \qquad
\tilde{G}_{Y_2;X}(v_2; x) = \sum_{i=1}^{n} w_i(x)\, I\{\tilde{u}_{i2} \le v_2\}.
\]
The weight $w_i(x)$ is larger if $x_i$ is closer to $x$, and it can be different from $w_{i1}(x)$ and $w_{i2}(x)$. Let $\tilde{G}^{-1}_{Y_j;X}$ be the generalized inverse distribution function of $\tilde{G}_{Y_j;X}$ for $j \in \{1, 2\}$. An estimate of $C_{Y_1,Y_2;X}(u_1, u_2; x)$ can be obtained:
\[
\tilde{C}_{Y_1,Y_2;X}(u_1, u_2; x)
= \tilde{G}_{Y_1,Y_2;X}\bigl(\tilde{G}^{-1}_{Y_1;X}(u_1; x),\, \tilde{G}^{-1}_{Y_2;X}(u_2; x);\, x\bigr)
= \sum_{i=1}^{n} w_i(x)\, I\{\hat{u}_{i1} \le u_1, \hat{u}_{i2} \le u_2\}, \tag{4.1}
\]
where $\hat{u}_{i1} := \tilde{G}_{Y_1;X}(\tilde{u}_{i1}; x)$ and $\hat{u}_{i2} := \tilde{G}_{Y_2;X}(\tilde{u}_{i2}; x)$.

One common choice of the weight function is the Nadaraya-Watson estimator (Nadaraya, 1964; Watson, 1964):
\[
w_i(x) = \frac{K(\|x_i - x\| / h_n)}{\sum_{j=1}^{n} K(\|x_j - x\| / h_n)}, \quad i = 1, \ldots, n,
\]
where $h_n = O(n^{-1/5})$ is the bandwidth and $K(\cdot)$ is the kernel function.
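The Nadaraya-Watson weights can be sketched in a few lines. The helper names `kernel` and `nw_weights`, the fixed bandwidth argument `h`, and the choice of the Euclidean norm for vector-valued covariates are assumptions made for this illustration; the three kernels are the standard uniform, Gaussian, and Epanechnikov forms.

```python
import numpy as np


def kernel(t, name="gaussian"):
    """Evaluate a standard smoothing kernel K(t)."""
    t = np.asarray(t, dtype=float)
    if name == "uniform":
        return 0.5 * (np.abs(t) <= 1)
    if name == "gaussian":
        return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)
    if name == "epanechnikov":
        return 0.75 * (1.0 - t**2) * (np.abs(t) <= 1)
    raise ValueError(f"unknown kernel: {name}")


def nw_weights(x_obs, x0, h, name="gaussian"):
    """Weights w_i(x0) = K(||x_i - x0|| / h) / sum_j K(||x_j - x0|| / h)."""
    x_obs = np.asarray(x_obs, dtype=float)
    if x_obs.ndim == 1:
        x_obs = x_obs[:, None]              # treat a scalar covariate as one column
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    dist = np.linalg.norm(x_obs - x0, axis=1)
    k = kernel(dist / h, name)
    total = k.sum()
    if total <= 0:
        raise ValueError("no observation receives positive weight; increase h")
    return k / total


w = nw_weights([0.0, 0.5, 1.0], x0=0.4, h=0.5)
print(w, w.sum())
```

Compactly supported kernels (uniform, Epanechnikov) give exactly zero weight to observations farther than `h` from the evaluation point, whereas the Gaussian kernel gives every observation a positive weight.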
When $x \in \mathbb{R}^d$ for $d > 1$, the Nadaraya-Watson weight is a spherically symmetric weight function (Loader, 1999). Commonly used kernel functions include: (1) Uniform: $K(t) = \frac{1}{2} I\{|t| \le 1\}$; (2) Gaussian: $K(t) = \frac{1}{\sqrt{2\pi}} e^{-t^2/2}$; (3) Epanechnikov: $K(t) = \frac{3}{4}(1 - t^2) I\{|t| \le 1\}$.

For a bivariate measure $\eta$ that is a functional of a bivariate copula, its corresponding conditional measure can be written as $\eta(x) = \eta(C_{Y_1,Y_2;X}(\cdot; x))$. If $\eta(x)$ is a smooth function in $x$, it can be estimated by $\tilde{\eta}(x) = \eta(\tilde{C}_{Y_1,Y_2;X}(\cdot; x))$, where $\tilde{C}_{Y_1,Y_2;X}$ is defined in Equation 4.1 via kernel smoothing. Algorithm 4.1 shows the pseudo-code for estimating $\eta(x)$ evaluated on a sequence of grid points $(x^*_1, x^*_2, \ldots, x^*_M)$, where $x^*_m \in \mathrm{supp}(X)$. For a fixed $x^*_m$, all the observations $(x_i, y_{i1}, y_{i2})$ contribute to the estimation of $\eta(x^*_m)$, but those $x_i$ that are closer to $x^*_m$ carry more weight.

Algorithm 4.1 Estimation of a conditional measure $\eta(x)$.
Input: A sequence of grid points $(x^*_1, x^*_2, \ldots, x^*_M)$; observed data $\{(x_i, y_{i1}, y_{i2})\}_{i=1}^{n}$.
Output: Estimated conditional measure $\tilde{\eta}(x^*_m)$ for $m = 1, 2, \ldots, M$.
1: Smoothed empirical values $\tilde{u}_{i1}$ and $\tilde{u}_{i2}$ for $i = 1, \ldots, n$ are obtained such that $\{\tilde{u}_{ij}\}$ is close to a $U(0,1)$ distribution for $j = 1, 2$.
2: A smoothed empirical $\tilde{C}_{Y_1,Y_2;X}(\cdot; x^*_m)$ is computed for each $x^*_m$ using Equation 4.1.
3: An empirical $\tilde{\eta}(x^*_m) = \eta(\tilde{C}_{Y_1,Y_2;X}(\cdot; x^*_m))$ can be obtained. (Examples are given in the following subsections.)

Algorithm 4.1 works for a scalar-valued $x$ as well as a vector-valued $x$. However, due to the curse of dimensionality, the sample size needed to detect the signal from random variation increases quickly when conditioning on more variables. When conditioning on one variable, a sample size of the order of 300 and above can lead to the detection of the shape of the conditional measure as a function of the value of the conditioning variable.
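The three steps of Algorithm 4.1 can be sketched as follows, taking $\eta$ to be Spearman's rho, for which plugging Equation 4.1 into $\rho_S$ gives the weighted sum $12 \sum_i w_i(x)(1 - \hat{u}_{i1})(1 - \hat{u}_{i2}) - 3$. This is a minimal sketch under stated assumptions rather than the thesis implementation: the covariate is scalar, a Gaussian kernel with one fixed bandwidth `h` is used for all weights, and the function names are hypothetical.

```python
import numpy as np


def nw_weight_matrix(x_obs, x_eval, h):
    """W[m, i]: Gaussian-kernel Nadaraya-Watson weight of observation i at x_eval[m]."""
    t = (np.asarray(x_eval, float)[:, None] - np.asarray(x_obs, float)[None, :]) / h
    k = np.exp(-0.5 * t**2)
    return k / k.sum(axis=1, keepdims=True)


def conditional_spearman_rho(x, y1, y2, grid, h):
    """Estimate rho_S(x*) on each grid point for a scalar covariate x."""
    x, y1, y2 = (np.asarray(a, dtype=float) for a in (x, y1, y2))

    # Step 1: u_tilde_ij = F~_{Yj|X}(y_ij | x_i), a weighted ECDF evaluated at x_i.
    w_obs = nw_weight_matrix(x, x, h)                          # shape (n, n)
    u_tilde = [
        np.einsum("ik,ki->i", w_obs, (y[:, None] <= y[None, :]).astype(float))
        for y in (y1, y2)
    ]

    # Step 2: u_hat_ij = G~_{Yj;X}(u_tilde_ij; x*), one row per grid point.
    w_grid = nw_weight_matrix(x, grid, h)                      # shape (M, n)
    u_hat = [w_grid @ (u[:, None] <= u[None, :]).astype(float) for u in u_tilde]

    # Step 3: plug-in Spearman's rho, 12 * sum_i w_i (1 - u_hat_i1)(1 - u_hat_i2) - 3.
    return 12.0 * np.sum(w_grid * (1.0 - u_hat[0]) * (1.0 - u_hat[1]), axis=1) - 3.0


rng = np.random.default_rng(0)
n = 400
x = rng.uniform(size=n)
e1, e2 = rng.normal(size=n), rng.normal(size=n)
y1 = e1
y2 = x * e1 + (1.0 - x) * e2     # simulated data: dependence strengthens as x grows
grid = np.linspace(0.1, 0.9, 5)
print(conditional_spearman_rho(x, y1, y2, grid, h=0.2))
```

The sketch builds $n \times n$ indicator matrices, which is adequate for a few thousand observations; a sorting-based weighted ECDF would be the natural replacement for larger samples.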
When conditioning on two variables, however, the sample size might need to be several thousand to distinguish the shape from the random variability. An example is given in Section 4.4 to illustrate this.

Moreover, bootstrapping can be used to determine the confidence bands for conditional measures. The confidence bands can help to visually suggest whether a conditional measure $\eta(x)$ is constant with respect to $x$. If so, there is more support for the simplifying assumption as an approximation. Acar et al. (2012) construct pointwise confidence bands at each grid point. Here, we propose to use simultaneous envelope-based bootstrap confidence bands and provide a method of constructing such confidence bands. The idea is to draw bootstrap samples and compute the curve of estimated $\tilde{\eta}(x)$ on a sequence of grid points; repeating this step $N_{bs}$ times gives $N_{bs}$ estimated curves. For each grid point, we find the curve that corresponds to the $\gamma$-level upper (lower) confidence bound. For neighboring grid points, the same curve might be the pointwise "critical" curve. Consider the set of curves that are critical for the upper (lower) confidence bound. The envelope from the pointwise maximum (minimum) of these upper (lower) critical curves is the resulting simultaneous upper (lower) confidence curve; the upper and lower envelopes are guaranteed to cover entirely a proportion $\gamma$ of the bootstrapped curves. Algorithm 4.2 gives a formal definition of the proposed bootstrapping method.

Algorithm 4.2 Upper and lower simultaneous bootstrap confidence bands of $\eta(x)$.
Input: A sequence of grid points $(x^*_1, x^*_2, \ldots, x^*_M)$; observed data $\{(x_i, y_{i1}, y_{i2})\}_{i=1}^{n}$; number of bootstrap samples $N_{bs}$; confidence level $\gamma$.
Output: Upper and lower confidence bands evaluated at $(x^*_1, x^*_2, \ldots, x^*_M)$.

    for r = 1, 2, ..., N_bs do
        Draw n observations with replacement from {(x_i, y_i1, y_i2)}, i = 1, ..., n.
        Estimate η̃_r evaluated at (x*_1, x*_2, ..., x*_M) with the bootstrap sample, using Algorithm 4.1.
    end for
    Initialize S_upper ← ∅, S_lower ← ∅.
    for m = 1, 2, ..., M do
        Find the (1 + γ)/2 quantile of {η̃_r(x*_m)}, r = 1, ..., N_bs; denote its index by I_m^upper ∈ [N_bs].
        S_upper ← S_upper ∪ {I_m^upper}.
        Find the (1 − γ)/2 quantile of {η̃_r(x*_m)}, r = 1, ..., N_bs; denote its index by I_m^lower ∈ [N_bs].
        S_lower ← S_lower ∪ {I_m^lower}.
    end for
    The upper confidence band evaluated at x*_m is max{η̃_r(x*_m) : r ∈ S_upper};
    the lower confidence band evaluated at x*_m is min{η̃_r(x*_m) : r ∈ S_lower}.

4.2.2 Conditional Spearman's rho

For a bivariate copula $C$, the population version of Spearman's rho can be expressed as
\[
\rho_S(C) = 12 \iint_{[0,1]^2} C(u_1, u_2)\, du_1\, du_2 - 3.
\]
Using the kernel method described in Section 4.2.1, conditional Spearman's rho for $C_{Y_1,Y_2;X}(\cdot; x)$ can be estimated by
\begin{align*}
\rho_S(\tilde{C}_{Y_1,Y_2;X}(\cdot; x))
&= 12 \iint_{[0,1]^2} \tilde{C}_{Y_1,Y_2;X}(u_1, u_2; x)\, du_1\, du_2 - 3 \\
&= 12 \sum_{i=1}^{n} w_i(x) \iint_{[0,1]^2} I\{\hat{u}_{i1} \le u_1, \hat{u}_{i2} \le u_2\}\, du_1\, du_2 - 3 \\
&= 12 \sum_{i=1}^{n} w_i(x) (1 - \hat{u}_{i1})(1 - \hat{u}_{i2}) - 3,
\end{align*}
where $\hat{u}_{i1} := \tilde{G}_{Y_1;X}(\tilde{u}_{i1}; x)$ and $\hat{u}_{i2} := \tilde{G}_{Y_2;X}(\tilde{u}_{i2}; x)$ as defined in Section 4.2.1.

Note that the numerical implementation of a conditional Kendall's tau is much more time-consuming than that of Spearman's rho because its computational complexity grows as a higher power of the sample size $n$. Hence, we do not use conditional Kendall's tau. We refer the readers to Gijbels et al. (2011) for its empirical version using kernel smoothing.

4.2.3 Conditional tail-weighted dependence measure

Spearman's rho and Kendall's tau summarize the dependence in the center and cannot quantify the dependence in the joint upper and lower tails. The lower (upper) tail dependence coefficients can be used to measure the strength of dependence in the joint lower (upper) tail of a bivariate distribution. However, since the tail dependence coefficients are defined via limits, they do not have simple empirical counterparts. Some tail-weighted dependence measures, such as semi-correlations of normal scores, do not have simple counterparts for conditional dependence measures.

Lee et al.
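(2018) propose a family of dependence measures $\zeta_\alpha$ for $\alpha > 0$. When $\alpha = 1$, $\zeta_\alpha$ is a measure of central dependence with properties similar to Kendall's tau and Spearman's rho. For large $\alpha$, $\zeta_\alpha$ is a tail-weighted dependence measure; the limit as $\alpha \to \infty$ is the upper tail dependence coefficient. The upper tail-weighted dependence measure $\zeta_\alpha$ is defined as follows:
\[
\zeta_\alpha(C) := 2 - \alpha\left(\gamma_\alpha^{-1}(C) - 1\right), \quad \text{where } \gamma_\alpha(C) := \int_0^1 C(u^{1/\alpha}, u^{1/\alpha})\,du, \tag{4.2}
\]
and $\gamma_\alpha^{-1}$ denotes the reciprocal $1/\gamma_\alpha$. The lower tail-weighted dependence measure of a copula is the upper tail-weighted dependence measure of its survival copula $\widehat C(u_1,u_2) = C(1-u_1,1-u_2) + u_1 + u_2 - 1$:
\[
\zeta_\alpha(\widehat C) := 2 - \alpha\left(\gamma_\alpha^{-1}(\widehat C) - 1\right), \quad \text{where } \gamma_\alpha(\widehat C) = \frac{\alpha-1}{\alpha+1} + \int_0^1 C(1-u^{1/\alpha}, 1-u^{1/\alpha})\,du.
\]

Under the same setting as in Section 4.2.2, the conditional tail-weighted dependence measure $\zeta_\alpha$ for $C_{Y_1,Y_2;X}(\cdot;x)$ can be estimated similarly. By Equation 4.2, the conditional $\gamma_\alpha(x)$ can be estimated by
\[
\begin{aligned}
\gamma_\alpha(\tilde C_{Y_1,Y_2;X}(\cdot;x)) &= \int_0^1 \tilde C_{Y_1,Y_2;X}(u^{1/\alpha}, u^{1/\alpha};x)\,du \\
&= \sum_{i=1}^n w_i(x) \int_0^1 I\{\hat u_{i1} \le u^{1/\alpha},\ \hat u_{i2} \le u^{1/\alpha}\}\,du \\
&= 1 - \sum_{i=1}^n w_i(x)\,(\hat u_{i1} \vee \hat u_{i2})^{\alpha},
\end{aligned}
\]
where $\hat u_{i1} := \tilde G_{Y_1;X}(\tilde u_{i1};x)$ and $\hat u_{i2} := \tilde G_{Y_2;X}(\tilde u_{i2};x)$. As a result, the conditional upper tail-weighted dependence measure can be estimated by
\[
\zeta_\alpha(\tilde C_{Y_1,Y_2;X}(\cdot;x)) = 2 - \alpha\left(\gamma_\alpha^{-1}(\tilde C_{Y_1,Y_2;X}(\cdot;x)) - 1\right).
\]

4.2.4 Conditional measures of permutation and reflection asymmetry

A bivariate copula $C$ is called permutation symmetric if for $(U_1,U_2) \sim C$, we have $(U_2,U_1) \sim C$ as well. Similarly, $C$ is called reflection symmetric if for $(U_1,U_2) \sim C$, we have $(1-U_1,1-U_2) \sim C$. Krupskii (2017) proposes permutation and reflection asymmetry measures for data with positive quadrant dependence. Those asymmetry measures can be used as diagnostic tools to suggest proper candidate copula families when fitting bivariate data.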
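The closed-form estimators of Sections 4.2.2 and 4.2.3 and the envelope step of Algorithm 4.2 (lines 5 to 12) can be sketched in a few lines. This is a minimal illustration rather than the thesis implementation: the function names are hypothetical, the weights $w_i(x)$ and the values $\hat u_{i1}, \hat u_{i2}$ from Section 4.2.1 are assumed to be given as arrays, and the order-statistic convention used for the quantile index is an assumption.

```python
import math
import numpy as np


def cond_spearman_rho(w, u1, u2):
    """12 * sum_i w_i (1 - u1_i)(1 - u2_i) - 3, for weights w that sum to 1."""
    w, u1, u2 = (np.asarray(a, dtype=float) for a in (w, u1, u2))
    return 12.0 * np.sum(w * (1.0 - u1) * (1.0 - u2)) - 3.0


def cond_zeta_upper(w, u1, u2, alpha=5.0):
    """Upper tail-weighted measure via gamma_hat = 1 - sum_i w_i max(u1_i, u2_i)^alpha."""
    w, u1, u2 = (np.asarray(a, dtype=float) for a in (w, u1, u2))
    gamma_hat = 1.0 - np.sum(w * np.maximum(u1, u2) ** alpha)
    return 2.0 - alpha * (1.0 / gamma_hat - 1.0)


def envelope_bands(curves, gamma=0.9):
    """Envelope step of Algorithm 4.2.

    curves has shape (N_bs, M): row r is the r-th bootstrap curve on the grid.
    Returns (lower, upper), each of length M.
    """
    curves = np.asarray(curves, dtype=float)
    n_bs = curves.shape[0]
    # order[k, m] is the row index of the k-th smallest curve value at grid point m.
    order = np.argsort(curves, axis=0)

    def critical_rows(q):
        # Assumed convention: the q-quantile is the ceil(q * N_bs)-th order statistic.
        k = min(max(math.ceil(q * n_bs) - 1, 0), n_bs - 1)
        return np.unique(order[k, :])

    s_upper = critical_rows((1.0 + gamma) / 2.0)
    s_lower = critical_rows((1.0 - gamma) / 2.0)
    upper = curves[s_upper].max(axis=0)  # pointwise max over upper critical curves
    lower = curves[s_lower].min(axis=0)  # pointwise min over lower critical curves
    return lower, upper
```

For a comonotonic sample with equal weights, both `cond_spearman_rho` and `cond_zeta_upper` return values close to 1, which agrees with the population definitions.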
In this section, we present the permutation asymmetry measure $G_{P,k}$ and the reflection asymmetry measure $G_{R,k}$, and extend them to conditional measures.

The permutation asymmetry measure $G_{P,k}(C)$ is defined as the expectation of the variable $|U_1-U_2|^{k+2}$ adjusted for the sign of $U_1-U_2$ for $k > 0$:
\[
G_{P,k}(C) = E\left[|U_1-U_2|^{k+2} \cdot \mathrm{sign}(U_1-U_2)\right] = \iint_{[0,1]^2} |u_1-u_2|^{k+2} \cdot \mathrm{sign}(u_1-u_2)\,dC(u_1,u_2).
\]
It indicates the direction of permutation asymmetry: if the measure takes a positive (negative) value, then the conditional mean of data truncated in the right lower (left upper) corner is greater than that of data truncated in the left upper (right lower) corner. A larger tuning parameter $k$ results in greater variability of an empirical estimate, while a small $k$ makes the measure less sensitive to permutation asymmetric dependence. The permutation asymmetry measure $G_{P,k}$ can be further normalized to the range $[-1,1]$ by finding a copula $C$ that maximizes $|G_{P,k}(C)|$ (Rosco and Joe, 2013). Following the choice of $k$ in Krupskii (2017), we use $k = 0.2$ for the remainder of the chapter.

The conditional $G_{P,k}$ can be estimated by
\[
\begin{aligned}
G_{P,k}(\tilde C_{Y_1,Y_2;X}(\cdot;x)) &= \iint_{[0,1]^2} |u_1-u_2|^{k+2}\,\mathrm{sign}(u_1-u_2)\,d\tilde C_{Y_1,Y_2;X}(u_1,u_2;x) \\
&= \sum_{i=1}^n w_i(x)\,|\hat u_{i1}-\hat u_{i2}|^{k+2}\,\mathrm{sign}(\hat u_{i1}-\hat u_{i2}),
\end{aligned}
\]
where $\hat u_{i1} := \tilde G_{Y_1;X}(\tilde u_{i1};x)$ and $\hat u_{i2} := \tilde G_{Y_2;X}(\tilde u_{i2};x)$.

Similarly, the reflection asymmetry measure $G_{R,k}$ is defined as the expectation of the variable $|1-U_1-U_2|^{k+2}$ adjusted for the sign of $1-U_1-U_2$:
\[
G_{R,k}(C) = E\left[|1-U_1-U_2|^{k+2}\,\mathrm{sign}(1-U_1-U_2)\right] = \iint_{[0,1]^2} |1-u_1-u_2|^{k+2}\,\mathrm{sign}(1-u_1-u_2)\,dC(u_1,u_2),
\]
and an estimate of the conditional $G_{R,k}$ is
\[
G_{R,k}(\tilde C_{Y_1,Y_2;X}(\cdot;x)) = \sum_{i=1}^n w_i(x)\,|1-\hat u_{i1}-\hat u_{i2}|^{k+2}\,\mathrm{sign}(1-\hat u_{i1}-\hat u_{i2}).
\]

4.3 Skewed bivariate copulas

Bivariate copulas with permutation asymmetry have not been used much with vine copulas, because often permutation asymmetry cannot be observed from a bivariate normal scores plot.
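As a numerical aside on the asymmetry measures of Section 4.2.4, the weighted-sum estimators of the conditional $G_{P,k}$ and $G_{R,k}$ can be sketched as below. The function names are hypothetical, and the weights $w_i(x)$ and the values $\hat u_{i1}, \hat u_{i2}$ are assumed to be supplied as arrays computed as in Section 4.2.1.

```python
import numpy as np


def perm_asym(w, u1, u2, k=0.2):
    """sum_i w_i |u1_i - u2_i|^(k+2) sign(u1_i - u2_i)."""
    d = np.asarray(u1, dtype=float) - np.asarray(u2, dtype=float)
    return float(np.sum(np.asarray(w, dtype=float) * np.abs(d) ** (k + 2.0) * np.sign(d)))


def refl_asym(w, u1, u2, k=0.2):
    """sum_i w_i |1 - u1_i - u2_i|^(k+2) sign(1 - u1_i - u2_i)."""
    d = 1.0 - np.asarray(u1, dtype=float) - np.asarray(u2, dtype=float)
    return float(np.sum(np.asarray(w, dtype=float) * np.abs(d) ** (k + 2.0) * np.sign(d)))


# Toy data: every point (a, b) is paired with its mirror image (b, a),
# so the sample is permutation symmetric and the measure is zero.
a = np.array([0.1, 0.4, 0.7])
b = np.array([0.3, 0.9, 0.2])
u1 = np.concatenate([a, b])
u2 = np.concatenate([b, a])
w = np.full(u1.size, 1.0 / u1.size)
g_perm = perm_asym(w, u1, u2)
```

With equal weights $w_i = 1/n$, the same functions give the unconditional empirical measures.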
However, if permutation asymmetry is detected via asymmetry measures, then families of permutation asymmetric copulas should be considered. In this section, we compare two parametric families of permutation asymmetric copulas with asymmetric tail dependence. At the boundaries of these families are copulas without tail dependence. They have three or four parameters and can be used within parametric vine copulas.

The Azzalini-Capitanio (AC) skew-t copula and the skew-BB1 copula are two permutation asymmetric bivariate copulas. They can be used within vine copulas when permutation asymmetry is detected together with possible tail dependence. Without both upper and lower tail dependence, the Azzalini-Dalla Valle skew-normal copula (limiting case of AC skew-t) and the skew-Gumbel copula (boundary case of skew-BB1) are options.

One way to generate new copula families from existing ones is through the maximum of independent beta random variables. This has been included in Genest et al. (1998); McNeil et al. (2015) and Section 3.18 of Joe (2014), without studying its properties in detail. Let $C_1$ and $C_2$ be two bivariate copulas and $\gamma_1,\gamma_2 \in [0,1]$; then $C_1(u_1^{\gamma_1}, u_2^{\gamma_2})\,C_2(u_1^{1-\gamma_1}, u_2^{1-\gamma_2})$ is a valid bivariate copula. In fact, if $V_1 = (V_{11},V_{12}) \sim C_1$, $V_2 = (V_{21},V_{22}) \sim C_2$ and $V_1, V_2$ are independent, then $\left(V_{11}^{1/\gamma_1} \vee V_{21}^{1/(1-\gamma_1)},\ V_{12}^{1/\gamma_2} \vee V_{22}^{1/(1-\gamma_2)}\right)$ has uniform margins and follows the above distribution.

In order to generate permutation asymmetric bivariate copulas, let $C_1$ be a permutation symmetric bivariate copula $C_1 = C(\cdot;\theta)$ parametrized by $\theta$, let $C_2$ be the independence copula $C^{\perp}$, i.e., $C_2(u_1,u_2) = u_1 u_2$, and suppose $(1-\gamma_1)(1-\gamma_2) = 0$.
This results in a permutation asymmetric bivariate copula $\breve C$ that can be parametrized by $\theta$ and $\beta \in [-1,1]$:
\[
\breve C(u_1,u_2;\theta,\beta) =
\begin{cases}
C(u_1^{1-\beta}, u_2;\theta)\,u_1^{\beta} & \text{if } 0 \le \beta \le 1,\\
C(u_1, u_2^{1+\beta};\theta)\,u_2^{-\beta} & \text{if } -1 \le \beta < 0.
\end{cases} \tag{4.3}
\]
The corresponding conditional CDFs and copula PDF are as follows:
\[
\breve C_{2|1}(u_2|u_1;\theta,\beta) =
\begin{cases}
(1-\beta)\,C_{2|1}(u_2|u_1^{1-\beta};\theta) + \beta\,C(u_1^{1-\beta},u_2;\theta)\,u_1^{\beta-1} & \text{if } 0 \le \beta \le 1,\\
C_{2|1}(u_2^{1+\beta}|u_1;\theta)\,u_2^{-\beta} & \text{if } -1 \le \beta < 0,
\end{cases}
\]
\[
\breve C_{1|2}(u_1|u_2;\theta,\beta) =
\begin{cases}
C_{1|2}(u_1^{1-\beta}|u_2;\theta)\,u_1^{\beta} & \text{if } 0 \le \beta \le 1,\\
(1+\beta)\,C_{1|2}(u_1|u_2^{1+\beta};\theta) - \beta\,C(u_1,u_2^{1+\beta};\theta)\,u_2^{-\beta-1} & \text{if } -1 \le \beta < 0,
\end{cases}
\]
\[
\breve c(u_1,u_2;\theta,\beta) =
\begin{cases}
(1-\beta)\,c(u_1^{1-\beta},u_2;\theta) + \beta\,C_{1|2}(u_1^{1-\beta}|u_2;\theta)\,u_1^{\beta-1} & \text{if } 0 \le \beta \le 1,\\
(1+\beta)\,c(u_1,u_2^{1+\beta};\theta) - \beta\,C_{2|1}(u_2^{1+\beta}|u_1;\theta)\,u_2^{-\beta-1} & \text{if } -1 \le \beta < 0.
\end{cases}
\]

When $0 < \beta < 1$, the copula is skewed towards the bottom-right corner; it has more probability in the $(1,0)$ corner than in the $(0,1)$ corner. When $-1 < \beta < 0$, the copula is skewed towards the top-left corner; it has more probability in the $(0,1)$ corner than in the $(1,0)$ corner. When $\beta \to 1$ or $\beta \to -1$, the copula in Equation 4.3 converges to $C^{\perp}$ in distribution. When $C(\cdot;\theta)$ is the independence copula, the parameter $\beta$ has no effect as the result is still the independence copula $C^{\perp}$. When $C(\cdot;\theta)$ is comonotonic,
\[
\breve C(u_1,u_2;\theta,\beta) =
\begin{cases}
(u_1^{1-\beta} \wedge u_2)\,u_1^{\beta} & \text{if } 0 \le \beta \le 1,\\
(u_1 \wedge u_2^{1+\beta})\,u_2^{-\beta} & \text{if } -1 \le \beta < 0.
\end{cases}
\]
If $0 \le \beta \le 1$, there is no probability density above the curve $u_1^{1-\beta} = u_2$; the probability that a point lies on the curve is $1-\beta$; the density under the curve is uniform. If $-1 \le \beta < 0$, there is no probability density below the curve $u_1 = u_2^{1+\beta}$; the probability that a point lies on the curve is $1+\beta$; the density above the curve is uniform. Figure 4.1 shows scatter plots of random samples in this case for $\beta = 0.5$ and $\beta = -0.5$.

Figure 4.1: Scatter plots of 1000 random samples drawn from $\breve C(u_1,u_2;\theta,\beta)$ in Equation 4.3 when $C(\cdot;\theta)$ is comonotonic. (a) $\beta = 0.5$. (b) $\beta = -0.5$.

The above analysis shows that $\beta$ is not directly interpretable as a skewness parameter, and $\theta$ of the original copula by itself does not indicate the strength of dependence. To understand the range of permutation asymmetry versus central dependence, these measures are plotted in Figure 4.2 for skew-BB1 copulas, i.e., the bivariate copula $C(\cdot;\theta)$ in Equation 4.3 is a BB1 copula:
\[
\breve C(u_1,u_2;\theta,\delta,\beta) =
\begin{cases}
\left[1 + \left[(u_1^{-\theta(1-\beta)} - 1)^{\delta} + (u_2^{-\theta} - 1)^{\delta}\right]^{1/\delta}\right]^{-1/\theta} u_1^{\beta} & \text{if } 0 \le \beta \le 1,\\
\left[1 + \left[(u_1^{-\theta} - 1)^{\delta} + (u_2^{-\theta(1+\beta)} - 1)^{\delta}\right]^{1/\delta}\right]^{-1/\theta} u_2^{-\beta} & \text{if } -1 \le \beta < 0.
\end{cases}
\]
We conduct a grid search over combinations of the parameters. The widest range of permutation asymmetry occurs when the strength of central dependence (say, as measured by Spearman's rho) is around 0.5. This is shown in Figure 4.2, where the curves are longest when $\beta$ is between 0.5 and 0.6 (or the corresponding negative values).

Figure 4.2: Comparison of permutation asymmetry measure $G_{P,k=0.2}$ in Section 4.2.4 and central dependence measure Spearman's rho for skew-BB1 and skew-t copulas. For skew-BB1 copulas, the parameter $\beta$ is in the set of 20 equally spaced points in $[-1,1]$. Each red curve in the figure corresponds to a distinct $\beta$ value.

Yoshiba (2018) has a numerical implementation of the AC skew-t copula, which involves the multivariate skew-t distribution of Azzalini and Capitanio (2003). A $d$-variate skew-t distribution has the following joint density function at $x \in \mathbb{R}^d$:
\[
g(x) = 2\,t_{d,\nu}(x;\Omega)\,T_{1,\nu+d}\left(\alpha^{T}x\,\sqrt{\frac{\nu+d}{x^{T}\Omega^{-1}x+\nu}}\right),
\]
where $\alpha \in \mathbb{R}^d$, $t_{d,\nu}(x;\Omega)$ is the $d$-variate Student-t density with correlation matrix $\Omega$ and degrees of freedom $\nu$, and $T_{1,\nu+d}$ is the univariate Student-t CDF with degrees of freedom $\nu+d$. An AC skew-t copula is the copula of a multivariate skew-t distribution by applying Sklar's theorem, so the major numerical difficulty for this copula is to get the univariate quantile functions. Figure 4.2 also shows the permutation asymmetry measure and central dependence measure for AC skew-t copulas.
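The skewing construction of Equation 4.3 can be sketched numerically as follows. This is an illustration under stated assumptions, not the thesis code: the function names are hypothetical, the CDF functions expect arguments in $(0,1]$, and the sampler covers only the comonotonic base copula with $0 < \beta < 1$, using the maximum representation with $\gamma_1 = 1-\beta$ and $\gamma_2 = 1$.

```python
import numpy as np


def bb1_cdf(u1, u2, theta, delta):
    """BB1 copula CDF for theta > 0, delta >= 1 and arguments in (0, 1]."""
    s = (u1 ** (-theta) - 1.0) ** delta + (u2 ** (-theta) - 1.0) ** delta
    return (1.0 + s ** (1.0 / delta)) ** (-1.0 / theta)


def skew_cdf(u1, u2, beta, base_cdf, **params):
    """Equation 4.3 applied to a permutation symmetric base copula CDF."""
    if beta >= 0.0:
        return base_cdf(u1 ** (1.0 - beta), u2, **params) * u1 ** beta
    return base_cdf(u1, u2 ** (1.0 + beta), **params) * u2 ** (-beta)


def sample_skew_comonotonic(n, beta, rng):
    """Draw from Equation 4.3 with a comonotonic base copula, 0 < beta < 1.

    (V, V) is the comonotonic pair and W is an independent uniform variable;
    U1 = max(V^(1/(1-beta)), W^(1/beta)) and U2 = V.
    """
    v = rng.uniform(size=n)
    w = rng.uniform(size=n)
    v_pow = v ** (1.0 / (1.0 - beta))
    w_pow = w ** (1.0 / beta)
    on_curve = v_pow >= w_pow  # these points satisfy u1^(1-beta) = u2
    return np.maximum(v_pow, w_pow), v, on_curve


rng = np.random.default_rng(1)
u1, u2, on_curve = sample_skew_comonotonic(20000, 0.5, rng)
share_on_curve = on_curve.mean()  # should be near 1 - beta
```

The sampled points never lie above the curve $u_1^{1-\beta} = u_2$, and the share of points on the curve can be compared with the value $1-\beta$ stated in the text.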
The figure indicates that skew-BB1 copulas cover a wider range of the permutation asymmetry measure than AC skew-t copulas when the dependence measure is greater than 0.5.

4.4 Conditional dependence with the gamma factor model

In this section, we apply the diagnostic tools for conditional dependence to the gamma factor model, a special case of convolution-closed families described in Section 4.28 of Joe (2014). As mentioned in Stoeber et al. (2013), this is a multivariate model with conditional dependence measures that vary from 0 to 1; hence, any method for assessing the simplifying assumption for vine copulas could make use of this model. There is more variation in conditional dependence measures in this model than in other multivariate distributions for which we have done computations. See Appendix B for a similar analysis on a trivariate Frank copula model.

The simplifying assumption for conditional distributions of multivariate distributions is not satisfied other than in a few known cases (see Section 3.9.7 of Joe (2014)); in other cases where the conditional dependence measures can be computed, there is much less variation and often the simplifying assumption may be acceptable as an approximation.
In cases where we have done computations of Spearman's rho for conditional bivariate distributions from trivariate and 4-variate distributions, the Spearman's rho curve is monotone, U-shaped, or unimodal when conditioning on one variable, and the Spearman's rho surface is smooth with corners as local maxima or minima when conditioning on two variables.

We show that the conditional dependence measures can be used for moderate sample sizes when conditioning on one variable, but not for two or more variables. The implications of this for application to vine copulas are mentioned at the end of this section.

For the gamma factor model, the marginal and joint distributions have simple stochastic representations, and the conditional distributions can be obtained via one-dimensional numerical integration. Therefore, the copula-based dependence measures can be computed even for conditional distributions. This allows for a comparison of the diagnostic tools proposed in Section 4.2 with the exact conditional dependence measures as a function of the conditioning values, in order to get an idea of the sample size needed to see the patterns.

Suppose $Y_j = Z_0 + Z_j$ for $j = 1, \ldots, d$, where $Z_0, Z_1, \ldots, Z_d$ are independent random variables and $Z_j \sim \mathrm{Gamma}(\theta_j, 1)$ for $j = 0, 1, \ldots, d$. This model has positive dependence and the simplifying assumption for vines is far from holding for any vine structure because any conditional distribution can vary from independence to strong dependence as the values of the conditioning variables vary. For bivariate margins, there is stronger dependence in the joint upper tail than in the joint lower tail.

For $d = 3$, we consider the copula of $(Y_1,Y_2)$ given $F_3(Y_3) = x$, denoted by $C_{12;3}(\cdot;x)$, where $F_j$ is the CDF of $Y_j$ and $x \in (0,1)$. Transforming $Y_3$ to $U(0,1)$ before computing the conditional measures produces better estimates with a bandwidth $h_n$ that does not depend on the value of the conditioning variable.
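The stochastic representation above makes the gamma factor model easy to simulate. The sketch below draws a three-variate sample and compares the rank correlation of $(Y_1,Y_2)$ in the lowest and highest fifth of the conditioning variable. The function names are hypothetical, the rank transform $\mathrm{rank}/(n+1)$ stands in for the exact CDF $F_3$, and the windowed rank correlation is only a crude check of the dependence pattern, not the kernel estimator of Algorithm 4.1.

```python
import numpy as np


def simulate_gamma_factor(n, theta, rng):
    """Rows of (Y_1, ..., Y_d) with Y_j = Z_0 + Z_j and Z_j ~ Gamma(theta_j, 1)."""
    theta = np.asarray(theta, dtype=float)  # (theta_0, theta_1, ..., theta_d)
    z = rng.gamma(shape=theta, scale=1.0, size=(n, theta.size))
    return z[:, [0]] + z[:, 1:]


def ranks_to_uniform(x):
    """Pseudo-observations rank / (n + 1); an assumption replacing the exact CDF."""
    return (np.argsort(np.argsort(x)) + 1.0) / (x.size + 1.0)


def spearman(a, b):
    """Sample Spearman's rho as the Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks_to_uniform(a), ranks_to_uniform(b))[0, 1])


rng = np.random.default_rng(0)
y = simulate_gamma_factor(5000, (3.0, 1.0, 1.5, 2.0), rng)
u3 = ranks_to_uniform(y[:, 2])

low, high = u3 < 0.2, u3 > 0.8
rho_low = spearman(y[low, 0], y[low, 1])     # dependence when Y_3 is small
rho_high = spearman(y[high, 0], y[high, 1])  # dependence when Y_3 is large
```

Since each $Y_j$ is a sum of independent gamma variables with a common scale, $Y_j \sim \mathrm{Gamma}(\theta_0+\theta_j, 1)$, so the exact transform $F_3$ could replace the rank transform where a gamma CDF routine is available.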
For this copula conditioning on one variable, the pattern of the conditional Spearman's rho is that it is increasing from 0 to 1 as $x$ goes from 0 to 1. A similar pattern occurs for the tail-weighted dependence measure $\zeta_\alpha$, but conditionally there is more dependence in the joint lower tail than in the joint upper tail.

We assess the estimation method on many simulated three-variate datasets. For the representative example in Figure 4.3, the simulation sample size is $n = 1000$ with $(\theta_0,\theta_1,\theta_2,\theta_3) = (3,1,1.5,2)$. The exact $\rho_S(C_{12;3}(\cdot;x))$, $\zeta_{\alpha=5}(C_{12;3}(\cdot;x))$, $\zeta_{\alpha=5}(\widehat C_{12;3}(\cdot;x))$, and $G_{P,k=0.2}(C_{12;3})$ computed via numerical integration are shown as red dash-dot lines in Figure 4.3. The kernel-smoothed estimates using the Epanechnikov kernel and window size $h_n = 0.2$ are shown as solid dark lines and the bootstrap confidence bands are plotted as dashed dark lines. The plots indicate that the constructed confidence bands are able to detect the increasing trends in the conditional dependence measures. For the conditional permutation asymmetry measure $G_{P,k=0.2}(C_{12;3})$, both the exact measure and the estimates are close to zero for different values of the conditioning variable.

Next, we assess the estimation methods when conditioning on two variables. Consider $C_{12;34}(\cdot;x,y)$, the copula of $(Y_1,Y_2)$ given $F_3(Y_3) = x$ and $F_4(Y_4) = y$. The conditional Spearman's rho is increasing from 0 to 1 as $x$ and $y$ go from 0 to 1, along or near the main diagonal. There is a ridge near the main diagonal that depends on the magnitude of asymmetry in $\theta_1, \ldots, \theta_4$, and the conditional Spearman's rho decreases in the directions orthogonal to the ridge. A similar pattern occurs for the conditional tail-weighted dependence measure $\zeta_\alpha$. We conduct a study on a simulated four-variate dataset. For a representative example in Figure 4.4, we have parameters $(\theta_0,\theta_1,\theta_2,\theta_3,\theta_4) = (3,1,1.5,2,2.5)$. The sample size is $n = 1000$.
In this case, the conditional Spearman's rho $\rho_S(C_{12;34}(\cdot;x,y))$ is a bivariate function. Figure 4.4 shows the exact function in red and the estimated confidence surfaces in blue. Conditional Spearman's rho near the corners $(0,1)$ and $(1,0)$ is severely underestimated. A much larger sample size of several thousand is needed to accurately capture the shape of the two-dimensional surface. Other non-parametric smoothing methods yield similar estimation results.

Figure 4.3: Conditional measures of $C_{12;3}(\cdot;x)$, the copula of $Y_1,Y_2$ given $F_3(Y_3) = x$, for a gamma factor model with parameters $(\theta_0,\theta_1,\theta_2,\theta_3) = (3,1,1.5,2)$. The sample size is $n = 1000$. The red dash-dot lines are the exact conditional measures computed via numerical integration. The dark solid lines and dashed lines are the kernel-smoothed conditional Spearman's rho and the corresponding 90%-level simultaneous bootstrap confidence bands, using the Epanechnikov kernel and window size $h_n = 0.2$. (a) Spearman's rho $\rho_S(C_{12;3})$. (b) Tail-weighted dependence measure (lower tail) $\zeta_{\alpha=5}(C_{12;3})$. (c) Tail-weighted dependence measure (upper tail) $\zeta_{\alpha=5}(\widehat C_{12;3})$. (d) Permutation asymmetry measure $G_{P,k=0.2}(C_{12;3})$.

Figure 4.4: Conditional Spearman's rho of $C_{12;34}(\cdot;x,y)$, the copula of $Y_1,Y_2$ given $F_3(Y_3) = x$ and $F_4(Y_4) = y$, for a gamma factor model with parameters $(\theta_0,\theta_1,\theta_2,\theta_3,\theta_4) = (3,1,1.5,2,2.5)$. The sample size is $n = 1000$. The red surface is the exact conditional Spearman's rho computed via numerical integration, and the blue surfaces are the 90%-level simultaneous bootstrap confidence surfaces, using the spherically symmetric Epanechnikov kernel and window size $h_n = 0.2$.

What the gamma factor model implies about the simplifying assumption for vine copulas is that, for a sample size of a few hundred to a few thousand, conditional dependence measures as a function of the values of conditioning variables are useful for tree 2 but not for trees 3 and higher.
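A rough count of how many observations fall inside a kernel window gives some intuition for why the sample size requirement grows with the number of conditioning variables. The sketch below is only a back-of-the-envelope illustration: it assumes conditioning variables that are independent and uniform on the unit interval or unit square, which is a simplification (in the gamma factor model, $F_3(Y_3)$ and $F_4(Y_4)$ are positively dependent), and the function name is hypothetical.

```python
import numpy as np


def count_in_window(points, center, h):
    """Number of rows of `points` within Euclidean distance h of `center`."""
    points = np.atleast_2d(np.asarray(points, dtype=float))
    dist = np.linalg.norm(points - np.asarray(center, dtype=float), axis=1)
    return int(np.sum(dist < h))


rng = np.random.default_rng(0)
n, h = 1000, 0.2
x_one = rng.uniform(size=(n, 1))  # one conditioning variable
x_two = rng.uniform(size=(n, 2))  # two conditioning variables

n_1d = count_in_window(x_one, [0.5], h)                # window length 0.4
n_2d_center = count_in_window(x_two, [0.5, 0.5], h)    # full disk of radius 0.2
n_2d_corner = count_in_window(x_two, [0.0, 1.0], h)    # quarter disk at the corner
```

Under these assumptions, the expected counts are about $0.4n = 400$ in one dimension, $\pi h^2 n \approx 126$ at the center of the unit square, and $\pi h^2 n / 4 \approx 31$ at a corner, so the local estimate at a corner rests on far fewer observations.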
The current algorithms for vine copulas attempt to put pairs with stronger dependence and conditional dependence in lower trees and weaker dependence in higher-order trees of truncated vines. In this case, it is especially important to check the validity of the simplifying assumption as an approximation for tree 2. If the simplifying assumption seems acceptable for tree 2, then the assumption may be acceptable in higher-order trees. Otherwise, one could fit copulas in tree 2 with conditional dependence parameters that vary as simple functions of the values of conditioning variables. The above results for the gamma factor model indicate that the conditional dependence measures cannot be used for trees 3 and higher when the sample size is not large. From the results in Section 5.7 of Joe (2014), different bivariate copula models are hard to distinguish when the dependence is weak. So it is mainly the vine edges with stronger dependence where more care is needed in the choice of copula families based on diagnostics.

4.5 Illustrative data examples

In this section, we illustrate the diagnostic tools for 3 to 4 variables on two datasets: a hydro-geochemical dataset in Section 4.5.1 and a gene expression dataset in Section 4.5.2. Both datasets have variables that exhibit significant permutation asymmetry and have examples where the conditional Spearman's rho function is non-constant.

We have also applied the diagnostics to financial returns datasets; there seems to be no pattern in the conditional measures when the conditioning variables have strong dependence, and the permutation asymmetry measures are rarely significant.
Hence, for financial returns datasets, vine or factor copulas with the simplifying assumption and permutation symmetric bivariate copulas should usually be adequate.

When the simplifying assumption does not hold for an edge of a vine in tree 2 or higher, a bivariate parametric copula family is chosen with at least one of its parameters varying with the value(s) of the conditioning variable(s). In this case, we say that a non-constant copula family is used. Otherwise, if the simplifying assumption is assumed to hold, we say that a constant copula is used, meaning that the parameters are constant over the value(s) of the conditioning variable(s).

Without the simplifying assumption, a multivariate distribution can be decomposed into sets of conditional distributions for all regular vines. With the simplifying assumption, it has been the practice for vine copulas to choose just one vine structure based on a sequential procedure of having edges with the strongest dependence in low-order trees, for example, Dissmann et al. (2013). Then one can truncate the process when additional trees show weak conditional dependence; for example, the method of Brechmann et al. (2012) can be used to get a truncated vine copula when there are many variables. For a few variables, truncation would not occur if there is no vine for which conditional dependence becomes weak. In this section, we include examples with three and four variables where we can fit parametric vine copulas for different vines without the simplifying assumption so that they fit roughly the same when compared using Vuong's procedure (Vuong, 1989).

4.5.1 Hydro-geochemical data

We use the hydro-geochemical dataset to illustrate the use of the diagnostic tools. The dataset is from the hydro-geochemical stream and sediment reconnaissance (HSSR) project, which is a Department of Energy program to assess the extent of uranium potential in the United States (Cook and Johnson, 1981). It consists of the log-concentrations of seven chemicals in $n = 655$ water samples collected near Grand Junction, Colorado. We focus on three variables that are the log-concentrations of the elements cobalt (Co), scandium (Sc) and titanium (Ti). The three variables are also used by Acar et al. (2012); Kraus and Czado (2017b). Figure 4.5 shows the pairwise scatter plot of the normal scores. Acar et al. (2012) conduct an analysis which suggests that a non-simplifying vine copula construction is needed. However, their analysis only focuses on the conditional distribution [Co,Sc|Ti] without fitting a vine structure. We demonstrate how to get non-simplifying vine copulas for all three possible vine structures.

Figure 4.5: Pairwise scatter plot of the normal scores of variables cobalt (Co), titanium (Ti) and scandium (Sc) in the hydro-geochemical dataset.

           lower $\hat\zeta_{\alpha=5}$   upper $\hat\zeta_{\alpha=5}$   Gaussian $\zeta_{\alpha=5}$   $\hat G_{P,k=0.2}$   se($\hat G_{P,k=0.2}$)   fitted copula
[Co,Sc]    0.522    0.511    0.495    0.036    0.017    skew-t
[Sc,Ti]    0.467    0.357    0.359    0.093    0.021    refl. skew-BB1
[Co,Ti]    0.434    0.118    0.283    0.064    0.028    refl. skew-BB1

Table 4.1: Empirical tail-weighted dependence measures $\hat\zeta_{\alpha=5}$, Gaussian tail-weighted dependence measure $\zeta_{\alpha=5}$, permutation asymmetry measures $\hat G_{P,k=0.2}$, and fitted copulas in the first tree for the hydro-geochemical dataset. Gaussian $\zeta_{\alpha=5}$ is the tail-weighted dependence measure of a bivariate Gaussian copula whose Spearman's rho is the same as the empirical counterpart. There appears to be reflection asymmetry, permutation asymmetry, and stronger dependence than Gaussian in the joint upper and lower tails.

Based on an initial data analysis and the normal scores plot, there appears to be reflection asymmetry, permutation asymmetry, and stronger dependence than Gaussian in the joint upper and lower tails. Table 4.1 shows the comparison of empirical tail-weighted dependence measures with tail-weighted dependence measures of bivariate Gaussian copulas whose Spearman's rho values are the same as the empirical counterparts.
It also shows the permutation asymmetry measures $\hat G_{P,k=0.2}$ and the corresponding standard errors. Bivariate parametric copula families in the first level or tree are selected from the following candidate families: Gaussian, t, BB1, reflected BB1, skew-Gaussian, skew-t, skew-BB1, and reflected skew-BB1. The best fitting bivariate copulas are shown in Table 4.1. For copulas in the second tree, Figure 4.6 shows the kernel-smoothed conditional Spearman's rho for all three possible vine structures using the Epanechnikov kernel and window size $h_n = 0.2$, which is close to $n^{-1/5}$. The dark solid lines and dashed lines are the kernel-smoothed conditional Spearman's rho and the corresponding 90%-level simultaneous bootstrap confidence bands using 1000 bootstrap samples. It visually indicates that the simplifying assumption does not seem valid for any of the three vines, but is closer to holding for [Co,Sc|Ti] and [Sc,Ti|Co]. Kraus and Czado (2017b) decide on [Sc,Ti|Co] as the structure being closest to satisfying the simplifying assumption. For [Co,Ti|Sc] and [Co,Sc|Ti], the curve of conditional Spearman's rho is unimodal, so a quadratic parametrization in the conditioning value might be sufficient to capture the shape. For [Sc,Ti|Co], the curve is bimodal; a higher-order polynomial is needed to capture the trend. The shapes of the conditional tail-weighted dependence measures are similar to that of the conditional Spearman's rho.

In many cases, a copula parameter $\vartheta$ is bounded in $(\vartheta_L,\vartheta_U)$, either by definition or for numerical stability. When performing numerical maximum likelihood estimation, a reparametrization can help achieve better convergence.
We define a continuous and monotonically increasing function $h : \mathbb{R} \to (\vartheta_L,\vartheta_U)$ that maps the real line to a finite interval:
\[
h(x;\vartheta_L,\vartheta_U) = \tanh(x)\,(\vartheta_U - \vartheta_L)/2 + (\vartheta_U + \vartheta_L)/2.
\]

We find the following non-constant copula families as best fits, after trying different ways of incorporating the value of the conditioning variable into a conditional dependence parameter.

• The non-constant t copula whose parameter $\rho$ is the composition of the transformation $h$ and a quadratic function of the conditioning variable $u$:
\[
\rho = h(a_2u^2 + a_1u + a_0;\ \rho_{\min},\rho_{\max}),
\]
where $\rho_{\min} = -1$ and $\rho_{\max} = 1$. With this parametrization, a non-constant t copula model has four parameters: $a_2$, $a_1$, $a_0$ and $\nu$.

• The non-constant skew-BB1 copula whose parameters $\delta$ and $\theta$ are the composition of the transformation $h$ and a quadratic function of the conditioning variable $u$:
\[
\delta = h(a_2u^2 + a_1u + a_0;\ \delta_{\min},\delta_{\max}), \qquad
\theta = h(b_2u^2 + b_1u + b_0;\ \theta_{\min},\theta_{\max}),
\]
where $\delta_{\min} = 1$, $\delta_{\max} = 7$, $\theta_{\min} = 0$ and $\theta_{\max} = 7$. With this parametrization, a non-constant skew-BB1 copula model has seven parameters: $\beta$, $a_2$, $a_1$, $a_0$, $b_2$, $b_1$ and $b_0$.

• The non-constant skew-BB1 copula whose parameter $\delta$ is the composition of the transformation $h$ and a quartic function of the conditioning variable $u$:
\[
\delta = h(a_4u^4 + a_3u^3 + a_2u^2 + a_1u + a_0;\ \delta_{\min},\delta_{\max}),
\]
where $\delta_{\min} = 1$ and $\delta_{\max} = 7$. With this parametrization, a non-constant skew-BB1 copula model has seven parameters: $\beta$, $a_4$, $a_3$, $a_2$, $a_1$, $a_0$ and $\theta$.

              simplifying   non-simplifying
[Co,Ti|Sc]    t             t (quadratic $\rho$)
[Co,Sc|Ti]    skew-BB1      skew-BB1 (quartic $\delta$)
[Sc,Ti|Co]    skew-BB1      skew-BB1 (quartic $\delta$)

Table 4.2: Fitted bivariate copulas in the second tree for the hydro-geochemical dataset.

The selected constant and non-constant bivariate copulas are reported in Table 4.2. The red dash-dot lines in Figure 4.6 represent the estimated conditional Spearman's rho. It indicates that the estimated curves accurately reflect the trend of the kernel-smoothed conditional Spearman's rho.

Table 4.3 shows the pairwise comparison of the AICs with and without the simplifying assumption for the three models.
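As a small numerical aside, the reparametrization $h$ and the polynomial parametrizations of $\rho$ and $\delta$ used in the non-constant copula families of this section can be sketched as below. The function names and the coefficient values in the usage lines are hypothetical and chosen only for illustration; they are not fitted values.

```python
import math


def h(x, lower, upper):
    """Map the real line into (lower, upper) with a scaled and shifted tanh."""
    return math.tanh(x) * (upper - lower) / 2.0 + (upper + lower) / 2.0


def rho_quadratic(u, a2, a1, a0, rho_min=-1.0, rho_max=1.0):
    """Non-constant t copula correlation as a function of the conditioning value u."""
    return h(a2 * u * u + a1 * u + a0, rho_min, rho_max)


def delta_quartic(u, coef, delta_min=1.0, delta_max=7.0):
    """Non-constant skew-BB1 delta; coef = (a4, a3, a2, a1, a0)."""
    a4, a3, a2, a1, a0 = coef
    poly = (((a4 * u + a3) * u + a2) * u + a1) * u + a0  # Horner evaluation
    return h(poly, delta_min, delta_max)


# Hypothetical coefficients, evaluated on a few conditioning values in (0, 1).
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
rho_curve = [rho_quadratic(u, -2.0, 2.0, 0.1) for u in grid]
delta_curve = [delta_quartic(u, (1.0, -2.0, 0.5, 0.3, -0.2)) for u in grid]
```

Because the optimizer works with the unconstrained polynomial coefficients, the implied copula parameter stays inside its admissible interval for any coefficient values of moderate size.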
The confidence intervals are calculated using the Vuong procedure with the AIC correction (Vuong, 1989). It is clear that the vine copula models with non-constant copulas in the second tree fit better than the corresponding models with constant copulas. If the parametric models fit well the bivariate copulas in the first tree and the copulas of conditional distributions in the second tree, the vine copulas for different vines should be similar. It can be seen from Table 4.3 that the AICs of the three vine copulas are close.

Figure 4.6: Conditional Spearman's rho on the hydro-geochemical dataset. The dark solid lines and dashed lines are the kernel-smoothed conditional Spearman's rho and the corresponding 90%-level simultaneous bootstrap confidence bands, using the Epanechnikov kernel and window size $h_n = 0.2$. The red dash-dot lines represent the estimated conditional Spearman's rho. (a) [Co,Ti|Sc]. (b) [Co,Sc|Ti]. (c) [Sc,Ti|Co].

                        simplifying          non-simplifying
C-vine 1    C-vine 2    AIC 1     AIC 2      AIC 1     AIC 2     CI
Co–Sc–Ti    Co–Ti–Sc    −869.2    −865.0     −899.7    −876.5    (−0.038, 0.003)
Co–Ti–Sc    Sc–Co–Ti    −865.0    −874.7     −876.5    −883.7    (−0.014, 0.024)
Sc–Co–Ti    Co–Sc–Ti    −874.7    −869.2     −883.7    −899.7    (−0.011, 0.035)

Table 4.3: Pairwise comparison of vine copula models on the hydro-geochemical dataset.

4.5.2 Glioblastoma tumors dataset

        $\hat G_{P,k=0.2}$   se($\hat G_{P,k=0.2}$)   copula symm   AIC symm   copula asymm      AIC asymm
[12]    0.072     0.028    BB1         −105.5    skew-Gumbel       −112.0
[13]    −0.097    0.018    Gaussian    −321.9    skew-t            −349.6
[14]    0.060     0.023    t           −232.4    skew-t            −234.9
[23]    −0.128    0.024    Gaussian    −126.0    skew-t            −156.3
[24]    −0.046    0.025    Gumbel      −162.5    refl. skew-BB1    −165.4
[34]    0.115     0.020    BB1         −544.8    skew-t            −589.0

Table 4.4: Permutation asymmetry measures and AICs of pairs of variables in the GBM dataset.

The glioblastoma tumors (GBM) dataset is a level-3 gene expression dataset used by Brennan et al. (2013). It is obtained from The Cancer Genome Atlas (TCGA) Data Portal (Tomczak et al., 2015) and contains expression data of 12044 genes from $n = 558$ tumors.
Within all the genes in the dataset, we first filter out 1342 genes that are related to the human cell cycle. Afterwards, a hierarchical clustering algorithm with the Euclidean distance metric and complete linkage is applied to obtain a cluster of 92 genes. We pick four consecutive genes that have visible permutation asymmetry: RPL21, RPL22, RPL24 and RPL29, which are hereafter referred to as variables 1 to 4. Figure 4.7 shows the pairwise scatter plot of the normal scores. Table 4.4 shows the pairwise permutation asymmetry measure $\hat G_{P,k=0.2}$ and the corresponding bootstrap standard errors using 1000 bootstrap samples. Table 4.5 shows the lower and upper tail-weighted dependence measures and the tail-weighted dependence measure of a bivariate Gaussian copula whose Spearman's rho is the same as the empirical counterpart. All pairs of variables are positively correlated, and some have perceivable permutation asymmetry and reflection asymmetry. We also conduct a similar analysis on other sets of four variables from the dataset; most of them do not exhibit permutation asymmetry and vine copula models with the simplifying assumption appear to be sufficient.

Figure 4.7: Pairwise scatter plot of the normal scores in the GBM dataset.

        lower $\hat\zeta_{\alpha=5}$   upper $\hat\zeta_{\alpha=5}$   Gaussian $\zeta_{\alpha=5}$
[12]    0.128    0.380    0.197
[13]    0.331    0.270    0.442
[14]    0.197    0.338    0.355
[23]    0.080    0.325    0.236
[24]    0.015    0.463    0.237
[34]    0.552    0.676    0.583

Table 4.5: Empirical tail-weighted dependence measures $\hat\zeta_{\alpha=5}$ and Gaussian tail-weighted dependence measure $\zeta_{\alpha=5}$.

We fit both bivariate permutation symmetric and asymmetric copulas to all pairs of variables, including Gaussian, Gumbel, Student-t, BB1, skew-normal, skew-Gumbel, skew-t, skew-BB1, and their reflected copulas. The Gumbel, skew-Gumbel, BB1 and skew-BB1 copulas and their reflected copulas are flexible in handling asymmetric tail dependence. The AICs are also presented in Table 4.4. For all pairs, permutation asymmetric copulas achieve better AICs than permutation symmetric ones. Furthermore, the pairs that are significantly asymmetric according to both measures, i.e., [13], [23] and [34], have large improvements in AIC. This indicates that the permutation asymmetry measure $\hat G_{P,k=0.2}$ is informative in identifying permutation asymmetry and guiding the choice of bivariate copula families.

The best vine structure selected by Dissmann's algorithm (Dissmann et al., 2013) is a D-vine with the path 1–3–4–2 as the first tree; we call it the D-vine-1342 model. Based on this D-vine structure, we fit bivariate copulas with simplifying assumptions from the families including Gaussian, Gumbel, Student-t, BB1 and their reflected copulas in the second and third trees. The AIC of the best model is −1066.5. The permutation asymmetry measures $\hat G_{P,k=0.2}$ shown in Table 4.6 suggest that [23|4] is slightly permutation asymmetric and [14|3] is significantly permutation asymmetric. By using skewed bivariate copulas, the AIC is improved to −1156.6.

          $\hat G_{P,k=0.2}$   se($\hat G_{P,k=0.2}$)   AIC symm   AIC asymm   AIC asymm & non-const
[14|3]    0.140     0.030    −13.0    −31.7    −31.7
[23|4]    −0.047    0.033    −10.9    −15.2    −28.9
[23|1]    −0.083    0.029    −47.1    −54.2    −54.2
[24|1]    −0.010    0.029    −77.8    −83.1    −83.1

Table 4.6: Permutation asymmetry measure and AICs of pairs of variables in tree 2 of D-vine 1342 and C-vine 1234 on the GBM dataset. If the AIC of a non-constant model is worse than that of a constant model, we report the AIC of the constant model, e.g., [14|3], [23|1] and [24|1].

We further assess the simplifying assumption for pairs [14|3] and [23|4] in the second tree of the D-vine-1342 model. Figure 4.8 shows the conditional Spearman's rho for the two pairs. It indicates that the simplifying assumption is acceptable for [14|3] because the conditional Spearman's rho is approximately a constant between −0.1 and 0.25.
However, for [23|4], the conditional Spearman's rho decreases as the conditioning value $u_4$ increases. Therefore, the bivariate copula model for [23|4] can be further improved by using non-constant copulas. We use the same non-constant bivariate copula families as in Section 4.5.1 because there appears to be asymmetric tail dependence from the bivariate normal scores plots. Model AICs in Table 4.6 also confirm this observation: by adopting non-constant copulas, the AIC improves significantly for [23|4], but deteriorates for [14|3]. Overall, using non-constant copulas improves the AIC to −1171.0, as shown in Table 4.7.

model          AIC symm   AIC asymm   AIC asymm & non-const
D-vine-1342    −1066.5    −1156.6     −1171.0
C-vine-1234    −1055.2    −1171.9     −1171.9

Table 4.7: Model AICs for different vine structures on the GBM dataset.

Similar to the analysis in Section 4.5.1, we fit vine copula models with different vine structures. For four variables, there are 24 possible vine structures in total. We pick a C-vine that has a very different structure from D-vine-1342. The model C-vine-1234 has edges [12], [13] and [14] in the first tree and [23|1], [24|1] in the second tree. Table 4.4 and Table 4.6 show that there is significant permutation asymmetry in the first and second trees. Therefore, permutation asymmetric copulas could be used. We further investigate the conditional Spearman's rho in the second tree for pairs [23|1] and [24|1]. The 90%-level simultaneous bootstrap confidence bands and the model Spearman's rho are also shown in Figure 4.8. It indicates that the conditional Spearman's rho is approximately constant for both pairs and constant copulas should be sufficient.

We evaluate the model AICs when using constant symmetric copulas, constant asymmetric copulas and non-constant copulas, and show the model AICs for the D-vine-1342 and C-vine-1234 models in Table 4.7. For the C-vine-1234 model, the AIC improves significantly from constant symmetric copulas to constant asymmetric copulas.
But the model does not improve by using non-constant copulas. This corroborates the conclusions from the diagnostics. Moreover, when using asymmetric and non-constant copulas, both models have similar AICs.

Figure 4.8: Conditional Spearman's rho of pairs [14|3] and [23|4] in the D-vine-1342 model, and [23|1] and [24|1] in the C-vine-1234 model on the GBM dataset. Panels: (a) [14|3]; (b) [23|4]; (c) [23|1]; (d) [24|1]. The dark solid lines and dashed lines are the kernel-smoothed conditional Spearman's rho and the corresponding 90%-level simultaneous bootstrap confidence bands, using Epanechnikov kernel and window size $h_n = 0.2$. The red dash-dot lines represent the model conditional Spearman's rho. For [14|3], the best-fitting model is a constant skewed t-copula. For [23|4], the best-fitting model is a non-constant skewed-BB1 copula (quartic $\delta$). For both [23|1] and [24|1], the best-fitting models are constant reflected skewed-BB1 copulas.

4.6 Conclusion

In this chapter, we propose a general framework for estimating the conditional dependence or asymmetry measures as a function of the value(s) of the conditional variable(s). An algorithm to compute the corresponding confidence bands is also presented. The estimation of the conditional measures can be adapted to other copula-based measures and enrich the diagnostic tools in the future. Since the estimation of the conditional distributions requires a smoothing method, the measure should be a simple function of the copula.

The use of dependence and asymmetry measures as diagnostic tools for bivariate copulas and bivariate conditional distributions has been illustrated with real datasets. Diagnostics can guide the choice of candidate bivariate copula families to use in vine copulas.
If diagnostics for some edges of a vine suggest positive monotone dependence, reflection asymmetry, permutation asymmetry, and possible asymmetric tail dependence, then one- or two-parameter bivariate copula families are not sufficient; instead, three- or four-parameter bivariate copula families might be needed. Moreover, if the dependence measures or asymmetry measures in trees 2 and up are not constant over the conditioning value(s), then non-constant copulas should be considered.

The diagnostic measures have been shown to be effective in suggesting appropriate candidate parametric copula families. It is a future research direction to automatically and adaptively generate a shortlist of candidate parametric copula families for edges of a vine copula based on the diagnostic measures. An alternative is a reverse-delete algorithm: start with a long list of bivariate parametric copula families followed by deletion of families that cannot match the diagnostic summaries.

Chapter 5

Prediction based on conditional distributions of vine copulas

5.1 Introduction

In the context of an observational study, where the response variable $Y$ and the explanatory variables $X = (X_1, \ldots, X_p)$ are measured simultaneously, a natural approach is to fit a joint distribution to $(X_1, \ldots, X_p, Y)$ assuming a random sample $(x_{i1}, \ldots, x_{ip}, y_i)$ for $i = 1, \ldots, n$, and then obtain the conditional distribution of $Y$ given $X$ for making predictions. Observational studies are studies where researchers observe subjects and measure several variables together, and inferences of interest are relationships among the measured variables, including the conditional distribution of $Y$ given other variables when there is a variable $Y$ that one may want to predict from the other variables. In contrast, in experimental studies, the explanatory variables (treatment factors) are controlled for by researchers, and the effect of the non-random explanatory variables is then observed on the experimental units.
The inferences of interest may be different for experimental studies.

The conditional expectation $E(Y | X = x)$ and conditional quantiles $F^{-1}_{Y|X}(\cdot | x)$ can be obtained from the conditional distribution for out-of-sample point estimates and prediction intervals. This becomes the usual multiple regression if the joint distribution of $(X, Y)$ is multivariate Gaussian. Unlike multiple regression, the joint-distribution-based approach uses information on the distributions of the variables and does not specify a simple linear or polynomial equation for the conditional expectation.

Nonparametric regression methods are alternatives to multiple regression and do not assume a predetermined form of the predictor (Fan, 1992; Härdle, 1990; Stone, 1977). However, they have difficulty in (1) specifying heteroscedasticity, (2) capturing the shapes of the regression function in the extremes, (3) modeling high-dimensional data due to the curse of dimensionality. Nagler and Czado (2016) apply bivariate kernel density estimation with vines to get around the curse of dimensionality. The joint-distribution-based approach estimates univariate distributions and this is not sufficiently explored in the regression literature but is relevant when all variables are measured together in an observational study.

When the explanatory variable is a scalar and continuous ($p = 1$), the joint distribution of $(X, Y)$ can be modeled using a bivariate parametric copula family. Bernard and Czado (2015) show how different copula families can lead to quite different shapes in the conditional mean function $E(Y | X = x)$ and say that linearity of conditional quantiles is a pitfall of quantile regression. There are applications of bivariate or low-dimensional copulas for regression in Bouyé and Salmon (2009) and Noh et al. (2013).
However, none of the previous papers link the shape of conditional quantiles to tail properties of the copula family.

For the multivariate distribution approach to work for moderate to large dimensions, there are two major questions to be addressed: (A) How to model the joint distribution of $(X_1, \ldots, X_p, Y)$ when $p$ is not small and some $X_j$ variables are continuous and others are discrete? (B) How to efficiently compute the conditional distribution of $Y$ given $X$? For question (A), the vine copula or pair-copula construction is a flexible tool in high-dimensional dependence modeling; see Aas et al. (2009); Bedford and Cooke (2002); Brechmann et al. (2012); Dissmann et al. (2013); Joe (2014).

The possibility of applying copulas for prediction and regression has been explored, but an algorithm is needed in general for (B) when some variables are continuous and others are discrete. Parsa and Klugman (2011) use a multivariate Gaussian copula to model the joint distribution, and conditional distributions have closed-form expressions. However, Gaussian copulas do not handle tail dependence or tail asymmetry, so can lead to incorrect inferences in the joint tails. Vine copulas are used by Kraus and Czado (2017a) and Schallhorn et al. (2017) for quantile regression, but the vine structure is restricted to a boundary class of vines called the D-vine. A general regular-vine (R-vine) copula is adopted in Cooke et al. (2019), for the case where the response variable and explanatory variables are continuous. Noh et al. (2013) use a non-parametric kernel density approach for conditional expectations, but this can run into sparsity issues as the dimension increases.

In this chapter, we propose a method, called vine copula regression, that uses R-vines and handles mixed continuous and discrete variables. That is, the predictor and response variables can be either continuous or discrete. As a result, we have a unified approach for regression and (ordinal) classification.
The proposed approach is interpretable, and various shapes of conditional quantiles of $y$ as a function of $x$ can be obtained depending on how pair-copulas are chosen on the edges of the vine. Another contribution is a theoretical analysis of the asymptotic conditional cumulative distribution function (CDF) and quantile function for vine copula regression in Chapter 6. This analysis sheds light on the flexible shapes of $E(Y | X = x)$, and provides guidelines on choices of bivariate copulas on the vine to achieve different asymptotic behavior. For example, with the approach of adding polynomial terms to an equation in classical multiple regression, one cannot get monotone increasing $E(Y | X = x)$ functions that flatten out for large values of predictor variables.

The remainder of this chapter is organized as follows. Section 5.2 introduces the model fitting and assessment procedure. Section 5.3 describes an algorithm that calculates the conditional CDF of the response variable of a new observation, given a fitted vine copula regression model. The conditional CDF can be further used to calculate the conditional mean and quantile for regression problems, and conditional probability mass function (PMF) for classification problems.

5.2 Model fitting and assessment

Due to the decomposition of a joint distribution to univariate marginal distributions and a dependence structure among variables, a two-stage estimation procedure can be adopted. Suppose the observed data are $(z_{i1}, z_{i2}, \ldots, z_{id}) = (x_{i1}, \ldots, x_{ip}, y_i)$, for $i = 1, \ldots, n$ with $d = p + 1$.

1. Estimate the univariate marginal distributions $\hat{F}_j$, for $j = 1, \ldots, d$, using parametric or non-parametric methods. The corresponding u-scores are obtained by applying the probability integral transform: $\hat{u}_{ij} = \hat{F}_j(z_{ij})$.
2. Fit a vine copula on the u-scores. There are two components: vine structure and bivariate copulas.
   Section 5.2.1 discusses how to choose a vine structure, and Section 5.2.2 presents a bivariate copula selection procedure.
3. Compute some conditional quantiles, with some predictors fixed and others varying, to check if the monotonicity properties are interpretable.

5.2.1 Vine structure learning

In this section, we introduce methods for learning or choosing truncated R-vine structures. From Kurowicka and Joe (2011), the total number of (untruncated) R-vines in $d$ variables is $2^{(d-3)(d-2)/2}(d!/2)$; for $d = 4$ this gives 24. When the dimension $d$ is small, it is possible to enumerate all $2^{(d-3)(d-2)/2}(d!/2)$ vines and find the best $\ell$-truncated R-vine based on some objective functions such as those in Section 6.17 of Joe (2014). However, this is only feasible for $d \le 8$ in practice. Greedy algorithms (Dissmann et al., 2013) and metaheuristic algorithms (Brechmann and Joe, 2014) are commonly adopted to find a locally optimal $\ell$-truncated vine. The development of vine structure learning algorithms is an active research topic; various algorithms are proposed based on different heuristics. However, no heuristic method can be expected to be universally the best.

The goal of vine copula regression is to find the conditional distribution of the response variable, given the explanatory variables. In general, to calculate the conditional distribution from the joint distribution specified by a vine copula, computationally intensive multidimensional numerical integration is required. This could be avoided if we enforce a constraint on the vine structure such that the node containing the response variable as a conditioned variable is always a leaf node in $T_\ell$, $\ell = 1, \ldots, d-1$. When this constraint is satisfied, Algorithm 5.1 computes the conditional CDF without numerical integration.

Figure 5.1: First two trees $T_1$ and $T_2$ of a vine $V$. The node set and edge set of $T_1$ are $N(T_1) = \{1, 2, 3, 4, 5\}$ and $E(T_1) = \{[12], [23], [24], [35]\}$.
The node set and edge set of $T_2$ are $N(T_2) = E(T_1) = \{[12], [23], [24], [35]\}$ and $E(T_2) = \{[13|2], [25|3], [34|2]\}$.

To construct a truncated R-vine that satisfies the constraint, we can first find a locally optimal $t$-truncated R-vine using the explanatory variables $x_1, \ldots, x_p$. Then from level 1 to level $t$, the response variable $y$ is sequentially linked to the node that satisfies the proximity condition and has the largest absolute (normal scores) correlation with $y$. The idea of extending an existing R-vine is also explored by Bauer and Czado (2016) for the construction of non-Gaussian conditional independence tests. Figures 5.1 and 5.2 demonstrate how to add a response variable to the R-vine of the explanatory variables, after each variable has been transformed to standard normal $N(0,1)$. Consider the 2-truncated R-vine $V = (T_1, T_2)$ in Figure 5.1 with $N(T_1) = \{1, \ldots, 5\}$, $E(T_1) = N(T_2) = \{[12], [23], [24], [35]\}$, $E(T_2) = \{[13|2], [25|3], [34|2]\}$, and suppose the response variable is indexed by 6. The first step is to find the node that has the largest absolute correlation with the response, i.e., $\arg\max_{1 \le i \le 5} |\rho_{i6}|$. Assume $|\rho_{36}|$ is the largest; then node 3 and node 6 are linked: $N(T_1') = N(T_1) \cup \{6\}$, $E(T_1') = E(T_1) \cup \{[36]\}$. At level 2, according to the proximity condition, node [36] can be linked to either [23] or [35]. So we compare $\rho_{26;3}$ with $\rho_{56;3}$. If we assume $|\rho_{56;3}| > |\rho_{26;3}|$, then $E(T_2') = E(T_2) \cup \{[56|3]\}$. So the new 2-truncated R-vine is $V' = (T_1', T_2')$, as shown in Figure 5.2.

Figure 5.2: Adding a response variable to the R-vine of the explanatory variables. In this example, variables 1 to 5 represent the explanatory variables and variable 6 represents the response variable. The newly added nodes are highlighted.

5.2.2 Bivariate copula selection

After fitting the univariate margins and deciding on the vine structure, parametric bivariate copulas can be fitted sequentially from tree 1, tree 2, etc.
The results in Chapter 6 can provide guidelines on choices of bivariate copula families in order to match the expected behavior of conditional quantile functions in the extremes of the predictor space.

The decomposition of a bivariate joint PDF described in Section 2.1.1 can be extended to multivariate cases using vine copulas:

\[
f_{1:d}(y_1, \ldots, y_d) = \prod_{i=1}^{d} f_i(y_i) \cdot \prod_{[jk|S] \in E(V)} \tilde{c}_{jk;S}(y_j, y_k; y_S). \tag{5.1}
\]

The above representation for the case of absolutely continuous random variables is derived in Bedford and Cooke (2001); its extension to include some discrete variables is in Section 3.9.5 of Joe (2014). For simplicity of notation, we denote $F^+_{j|S} = F_{j|S}(y_j | y_S)$ and $F^-_{j|S} = \lim_{t \uparrow y_j} F_{j|S}(t | y_S)$. If it is assumed that the copulas on edges of trees 2 to $d-1$ do not depend on the values of the conditioning variables, then $c_{jk;S}$ and $\tilde{c}_{jk;S}$ in (5.1) do not depend on $y_S$; i.e., $c_{jk;S}(\cdot) = c_{jk;S}(\cdot; y_S)$ and $\tilde{c}_{jk;S}(\cdot) = \tilde{c}_{jk;S}(\cdot; y_S)$. This is called the simplifying assumption. With the simplifying assumption, we have the following definition of $\tilde{c}_{jk;S}$.

• If $Y_j$ and $Y_k$ are both continuous, then
\[ \tilde{c}_{jk;S}(y_j, y_k) := c_{jk;S}(F^+_{j|S}, F^+_{k|S}). \]
• If $Y_j$ is continuous and $Y_k$ is discrete, then
\[ \tilde{c}_{jk;S}(y_j, y_k) := \big[ C_{k|j;S}(F^+_{k|S} | F^+_{j|S}) - C_{k|j;S}(F^-_{k|S} | F^+_{j|S}) \big] / f_{k|S}(y_k | y_S). \]
• If $Y_j$ is discrete and $Y_k$ is continuous, then
\[ \tilde{c}_{jk;S}(y_j, y_k) := \big[ C_{j|k;S}(F^+_{j|S} | F^+_{k|S}) - C_{j|k;S}(F^-_{j|S} | F^+_{k|S}) \big] / f_{j|S}(y_j | y_S). \]
• If $Y_j$ and $Y_k$ are both discrete, then
\[ \tilde{c}_{jk;S}(y_j, y_k) := \big[ C_{jk;S}(F^+_{j|S}, F^+_{k|S}) - C_{jk;S}(F^-_{j|S}, F^+_{k|S}) - C_{jk;S}(F^+_{j|S}, F^-_{k|S}) + C_{jk;S}(F^-_{j|S}, F^-_{k|S}) \big] / \big[ f_{j|S}(y_j | y_S) \, f_{k|S}(y_k | y_S) \big]. \]

With the simplifying assumption and parametric copula families, the log-likelihood of the bivariate copula $C_{jk;S}$ on edge $[jk|S] \in E(V)$ is

\[ \ell_{jk;S}(\theta_{jk}) = \sum_{i=1}^{n} \log \tilde{c}_{jk;S}(z_{ij}, z_{ik}; \theta_{jk}). \]

Commonly used model selection criteria include AIC and BIC:

\[ \mathrm{AIC}_{jk;S}(\theta_{jk}) = -2 \ell_{jk;S}(\theta_{jk}) + 2 |\theta_{jk}|, \]
\[ \mathrm{BIC}_{jk;S}(\theta_{jk}) = -2 \ell_{jk;S}(\theta_{jk}) + \log(n) |\theta_{jk}|, \]

where $|\theta_{jk}|$ refers to the number of copula parameters in $c_{jk;S}$.
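The AIC and BIC criteria above can be illustrated with a minimal Python sketch. The function names `aic`, `bic` and `select_family`, and the log-likelihood values in `fits`, are hypothetical and chosen only for illustration; in practice each log-likelihood would come from maximizing $\ell_{jk;S}$ for one candidate family on a given edge.

```python
import math


def aic(loglik, n_params):
    """AIC = -2 * log-likelihood + 2 * (number of copula parameters)."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    """BIC = -2 * log-likelihood + log(n) * (number of copula parameters)."""
    return -2.0 * loglik + math.log(n_obs) * n_params


def select_family(fits, n_obs, criterion="aic"):
    """Return the family with the lowest criterion value and all scores.

    `fits` maps a family name to (maximized log-likelihood, parameter count).
    """
    scores = {}
    for family, (loglik, n_params) in fits.items():
        if criterion == "aic":
            scores[family] = aic(loglik, n_params)
        else:
            scores[family] = bic(loglik, n_params, n_obs)
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical maximized log-likelihoods for three candidate families.
fits = {"gaussian": (120.0, 1), "gumbel": (131.5, 1), "bb1": (133.0, 2)}
best_aic, aic_scores = select_family(fits, n_obs=1000, criterion="aic")
best_bic, bic_scores = select_family(fits, n_obs=1000, criterion="bic")
```

With these made-up numbers the two criteria disagree: the two-parameter family has the lowest AIC, while the heavier BIC penalty favors a one-parameter family.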
For each candidate bivariate copula family on an edge, we first find the parameters $\hat{\theta}^{\mathrm{MLE}}$ that maximize the log-likelihood. Then the copula family with the lowest AIC or BIC is selected. When all the variables are continuous, this approach to bivariate copula selection is the standard approach in VineCopula (Schepsmeier et al., 2018) and was initially proposed and investigated by Brechmann (2010).

5.3 Prediction

This section describes how to predict the conditional distribution of the response variable of a new observation, given a fitted vine copula regression model. We first present an algorithm that computes the conditional CDF of the response variable. If the response variable is continuous, the conditional quantile and mean can be calculated by inverting the conditional CDF and integrating the quantile function. If the response variable is discrete, the conditional PMF can be easily derived from the conditional CDF via finite difference.

Based on the ideas of the algorithms in Chapter 6 of Joe (2014), Algorithm 5.1 can be applied to an R-vine with mixed continuous and discrete variables. The idea is that, given the structural constraint on the vine structure described in Section 5.2.1, conditional distributions are sequentially computed according to the vine structure, and the conditional distribution of the response variable given all the explanatory variables is obtained in the end. The input is a vine copula regression model with a vine array $A = (a_{kj})$, a vector of new explanatory variables $x = (x_1, \ldots, x_d)'$, and a percentile $u \in (0,1)$. The vine array is an efficient and compact way to represent a vine structure; see Section 2.3.2 or Kurowicka and Joe (2011) or Joe (2014). The R-vine matrices in the VineCopula package (Schepsmeier et al., 2018) are the vine arrays with backward indexing of rows and columns.
The algorithm returns the conditional CDF of the response variable given the explanatory variables evaluated at $u$, that is, $\pi(u|x) := P(F_Y(Y) \le u | X = x)$. It calculates the conditional distributions $C_{j|a_{\ell j}; a_{1j}, \ldots, a_{\ell-1,j}}$ and $C_{a_{\ell j}|j; a_{1j}, \ldots, a_{\ell-1,j}}$ for $\ell = 1, \ldots, n_{\mathrm{trunc}}$ and $j = \ell+1, \ldots, d$, where $n_{\mathrm{trunc}}$ is the truncation level of the vine copula. For discrete variables, both the left-sided and right-sided limits of the conditional CDF are retained. In the end, $C_{d|a_{d-1,d}; a_{1d}, \ldots, a_{d-2,d}}$ is returned.

If the response variable $Y$ is continuous, then the conditional mean and conditional quantile can be calculated using $\pi(\cdot|x)$: the $\alpha$-quantile is $F_Y^{-1}(\pi^{-1}(\alpha|x))$, and the conditional mean is

\[ E(Y | X = x) = \int_0^1 F_Y^{-1}(\pi^{-1}(\alpha|x)) \, d\alpha, \]

where $\pi^{-1}(\cdot|x)$ is calculated using the secant method, and the numerical integration is computed using Monte Carlo methods or numerical quadrature. If the response variable $Y$ is ordinal, then it is a classification problem; we only need to focus on the support of $Y$. The conditional CDF is fully specified by $\pi(F_Y(y)|x) = P(Y \le y | X = x)$, where $y \in \{k : P(Y = k) > 0\}$.

If the response variable $Y$ is nominal, then the proposed method does not apply. An alternative vine-copula-based method is to fit a vine copula model for each class separately and use Bayes' theorem to predict the class label. Specifically, for samples in class $Y = k$, we fit a vine copula density $\hat{f}_{X|Y}(x|k)$. Let $\hat{\pi}_k$ be the proportion of samples in class $k$ in the training set. According to Bayes' theorem, the predicted probability that a sample belongs to class $k$ is

\[ \hat{f}_{Y|X}(k|x) = \frac{\hat{\pi}_k \hat{f}_{X|Y}(x|k)}{\sum_j \hat{\pi}_j \hat{f}_{X|Y}(x|j)}. \]

The classification rule has been utilized in Nagler and Czado (2016) in an example involving vines with nonparametric pair copula estimation using kernels.
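As a small illustration of the Bayes classification rule above, the following Python sketch normalizes prior-weighted class-conditional densities into posterior class probabilities. The function name `class_posteriors` and the numeric priors and density values are hypothetical; in the setting described here, each density value would be the fitted class-specific vine copula density evaluated at a new point $x$.

```python
def class_posteriors(priors, class_densities):
    """Posterior P(Y = k | x) from priors pi_k and densities f(x | k).

    Both arguments are dicts keyed by class label; the density values are
    assumed to have been evaluated at the same new observation x.
    """
    joint = {k: priors[k] * class_densities[k] for k in priors}
    total = sum(joint.values())
    if total <= 0.0:
        raise ValueError("all class-conditional densities are zero at x")
    return {k: value / total for k, value in joint.items()}


# Hypothetical training-set class proportions and density values at x.
priors = {1: 0.5, 2: 0.3, 3: 0.2}
densities = {1: 0.10, 2: 0.40, 3: 0.05}

posteriors = class_posteriors(priors, densities)
predicted_class = max(posteriors, key=posteriors.get)
```

Here class 2 is predicted even though class 1 has the largest prior, because its class-conditional density at $x$ is four times larger.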
Since the distribution of predictors is modeled separately for each class, this alternative method is more flexible but has a high computational cost, especially when the number of classes is large.

Algorithm 5.1 Conditional CDF of the response variable given the explanatory variables with which to predict; based on steps from Algorithms 4, 7, 17, 18 in Chapter 6 of Joe (2014).

Input: Vine array $A = (a_{kj})$ with $a_{jj} = j$ for $j = 1, \ldots, d$ on the diagonal. $u^+ = (u_1^+, \ldots, u_d^+)$, $u^- = (u_1^-, \ldots, u_d^-)$, where $u_j^+ = F_j(x_j)$ and $u_j^- = F_j(x_j^-)$ for $1 \le j \le d-1$, $u_d^+ = u_d^- \in [0,1]$.
Output: $P(F_d(X_d) \le u_d^+ | X_1 = x_1, \ldots, X_{d-1} = x_{d-1})$.

 1: Compute $M = (m_{kj})$ in the upper triangle, where $m_{kj} = \max\{a_{1j}, \ldots, a_{kj}\}$ for $k = 1, \ldots, j-1$, $j = 2, \ldots, d$.
 2: Compute the $I = (I_{kj})$ indicator array as in Algorithm 5 in Joe (2014).
 3: $s_j^+ = u^+_{a_{1j}}$, $s_j^- = u^-_{a_{1j}}$, $w_j^+ = u_j^+$, $w_j^- = u_j^-$, for $j = 1, \ldots, d$.
 4: for $\ell = 2, \ldots, n_{\mathrm{trunc}}$ do
 5:   for $j = \ell, \ldots, d$ do
 6:     if $I_{\ell-1,j} = 1$ then
 7:       if isDiscrete(variable $j$) then
 8:         $v_j'^{+} \leftarrow \big[ C_{a_{\ell-1,j} j; a_{1j} \ldots a_{\ell-2,j}}(s_j^+, w_j^+) - C_{a_{\ell-1,j} j; a_{1j} \ldots a_{\ell-2,j}}(s_j^+, w_j^-) \big] / (w_j^+ - w_j^-)$,
 9:         $v_j'^{-} \leftarrow \big[ C_{a_{\ell-1,j} j; a_{1j} \ldots a_{\ell-2,j}}(s_j^-, w_j^+) - C_{a_{\ell-1,j} j; a_{1j} \ldots a_{\ell-2,j}}(s_j^-, w_j^-) \big] / (w_j^+ - w_j^-)$,
10:       else
11:         $v_j'^{+} \leftarrow C_{a_{\ell-1,j} | j; a_{1j} \ldots a_{\ell-2,j}}(s_j^+ | w_j^+)$,
12:         $v_j'^{-} \leftarrow C_{a_{\ell-1,j} | j; a_{1j} \ldots a_{\ell-2,j}}(s_j^- | w_j^+)$,
13:       end if
14:     end if
15:     if isDiscrete(variable $a_{\ell-1,j}$) then
16:       $v_j^{+} \leftarrow \big[ C_{a_{\ell-1,j} j; a_{1j} \ldots a_{\ell-2,j}}(s_j^+, w_j^+) - C_{a_{\ell-1,j} j; a_{1j} \ldots a_{\ell-2,j}}(s_j^-, w_j^+) \big] / (s_j^+ - s_j^-)$,
17:       $v_j^{-} \leftarrow \big[ C_{a_{\ell-1,j} j; a_{1j} \ldots a_{\ell-2,j}}(s_j^+, w_j^-) - C_{a_{\ell-1,j} j; a_{1j} \ldots a_{\ell-2,j}}(s_j^-, w_j^-) \big] / (s_j^+ - s_j^-)$,
18:     else
19:       $v_j^{+} \leftarrow C_{j | a_{\ell-1,j}; a_{1j} \ldots a_{\ell-2,j}}(w_j^+ | s_j^+)$, $v_j^{-} \leftarrow C_{j | a_{\ell-1,j}; a_{1j} \ldots a_{\ell-2,j}}(w_j^- | s_j^+)$,
20:     end if
21:   end for
22:   for $j = \ell+1, \ldots, d$ do
23:     if $a_{\ell,j} = m_{\ell,j}$ then $s_j^+ \leftarrow v^{+}_{m_{\ell+1,j}}$, $s_j^- \leftarrow v^{-}_{m_{\ell+1,j}}$,
24:     else if $a_{\ell,j} < m_{\ell,j}$ then $s_j^+ \leftarrow v'^{+}_{m_{\ell+1,j}}$, $s_j^- \leftarrow v'^{-}_{m_{\ell+1,j}}$,
25:     end if
26:     $w_j^+ \leftarrow v_j^+$, $w_j^- \leftarrow v_j^-$,
27:   end for
28: end for
29: Return $v_d^+$.

5.4 Simulation study

We demonstrate the flexibility and effectiveness of vine copula regression methods by visualizing the fitted models on simulated datasets. The simulated datasets have three variables: $X_1$ and $X_2$ are the explanatory variables and $Y$ is the response variable, where

\[
X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix} \right)
\]

and $Y$ is simulated in three cases with varying conditional expectation and variance structures. Let $U_1 = \Phi(X_1)$ and $U_2 = \Phi(X_2)$, where $\Phi$ is the standard normal CDF, and $\varepsilon$ be a random error following a standard normal distribution and independent from $X_1$ and $X_2$. The three cases are as follows:

1. Linear and homoscedastic: $Y = 10X_1 + 5X_2 + 10\varepsilon$.
2. Linear and heteroscedastic: $Y = 10X_1 + 5X_2 + 10(U_1 + U_2)\varepsilon$.
3. Non-linear and heteroscedastic: $Y = U_1 e^{1.8 U_2} + 0.5(U_1 + U_2)\varepsilon$.

We simulate samples of size 2000 in each case and randomly split them into a training set and a test set of 1000 observations each. Five methods are considered in the simulation study: (1) linear regression, (2) linear regression with logarithmic transformation of the response variable, (3) quadratic regression, (4) Gaussian copula regression, and (5) vine copula regression. The Gaussian copula can be considered as a special case of the vine copula, in which the bivariate copula families on the vine edges are all bivariate Gaussian. Different models are trained on the training set and used to obtain the conditional expectations as point predictions and 95% prediction intervals on the test set. For copula regressions, the upper and lower bounds of the 95% prediction interval are the conditional 97.5% and 2.5% quantiles respectively.
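The three data-generating cases and the train/test split described above can be sketched in Python using only the standard library. The function name `simulate`, the seeds, and the construction of the correlated pair from two independent standard normals are choices made for this illustration and are not taken from the thesis code.

```python
import math
import random
from statistics import NormalDist


def simulate(case, n, rho=0.5, seed=0):
    """Simulate n rows (x1, x2, y) for one of the three cases.

    (X1, X2) is bivariate standard normal with correlation rho, built from
    two independent standard normals; eps is an independent N(0, 1) error.
    """
    rng = random.Random(seed)
    phi = NormalDist().cdf  # standard normal CDF
    rows = []
    for _ in range(n):
        z1, z2, eps = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = z1
        x2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
        u1, u2 = phi(x1), phi(x2)
        if case == 1:    # linear and homoscedastic
            y = 10 * x1 + 5 * x2 + 10 * eps
        elif case == 2:  # linear and heteroscedastic
            y = 10 * x1 + 5 * x2 + 10 * (u1 + u2) * eps
        elif case == 3:  # non-linear and heteroscedastic
            y = u1 * math.exp(1.8 * u2) + 0.5 * (u1 + u2) * eps
        else:
            raise ValueError("case must be 1, 2 or 3")
        rows.append((x1, x2, y))
    return rows


# 2000 observations, randomly split into 1000 training and 1000 test rows.
data = simulate(case=3, n=2000, seed=1)
random.Random(2).shuffle(data)
train, test = data[:1000], data[1000:]
```

Fixing the seeds makes a replicate reproducible; repeating the simulation with different seeds would correspond to the replications described in the text.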
For the Gaussian and vine copula, the marginal distribution of $Y$ is fitted by the maximum likelihood estimation (MLE) of a normal distribution in case 1. In cases 2 and 3, the distributions of the response variable are skewed and unimodal but not too heavy-tailed. Therefore, we fit 3-parameter skew-normal distributions. For the vine copula regression, the candidate bivariate copula families include Student-t, MTCJ, Gumbel, Frank, Joe, BB1, BB6, BB7, BB8, and the corresponding survival copulas. The bivariate copulas are selected using the AIC described in Section 5.2.2. The procedure is replicated 100 times and the average scores of the replicates are reported in Table 5.1. To evaluate the performance of a regression model, we apply the root-mean-square error (RMSE) and several scoring rules for probabilistic forecasts studied in Gneiting and Raftery (2007), including the logarithmic score (LS), quadratic score (QS), interval score (IS), and integrated Brier score (IBS). Note that the RMSE is not meaningful if there is heteroscedasticity in conditional distributions; the LS, QS, IS, and IBS assess the predictive distributions with non-constant variance more effectively.

• The root-mean-square error (RMSE) measures a model's performance on point estimations.
\[ \mathrm{RMSE}(\mathcal{M}) = \sqrt{ \frac{1}{n_{\mathrm{test}}} \sum_{i=1}^{n_{\mathrm{test}}} (y_i - \hat{y}_i^{\mathcal{M}})^2 }, \]
  where $y_i$ is the response variable of the $i$-th sample in the test set, and $\hat{y}_i^{\mathcal{M}}$ is the predictive conditional expectation of a fitted model $\mathcal{M}$.
• The logarithmic score (LS) is a scoring rule for probabilistic forecasts of continuous variables (Gneiting and Raftery, 2007). It is closely related to the generalization error in machine learning literature (Chapter 7.2 in Hastie et al. (2009)).
\[ \mathrm{LS}(\mathcal{M}) = \frac{1}{n_{\mathrm{test}}} \sum_{i=1}^{n_{\mathrm{test}}} \log \hat{f}^{\mathcal{M}}_{Y|X}(y_i | x_i), \]
  where $(x_i, y_i)$ is the $i$-th observation in the test set, and $\hat{f}^{\mathcal{M}}_{Y|X}$ is the predictive conditional PDF of model $\mathcal{M}$. For example, if $\mathcal{M}$ is a linear regression, then the predictive conditional distribution is a scaled and shifted t-distribution.
  If $\mathcal{M}$ is a vine copula, the predictive conditional distribution can be calculated using the procedure described in Section 5.3.
• The quadratic score (QS) measures the predictive density, penalized by its $L_2$ norm (Gneiting and Raftery, 2007):
\[ \mathrm{QS}(\mathcal{M}) = \frac{1}{n_{\mathrm{test}}} \sum_{i=1}^{n_{\mathrm{test}}} \left[ 2 \hat{f}^{\mathcal{M}}_{Y|X}(y_i | x_i) - \int_{-\infty}^{\infty} \hat{f}^{\mathcal{M}}_{Y|X}(y | x_i)^2 \, dy \right]. \]
  Selten (1998) provides axiomatic characterizations of the quadratic scoring rule in terms of desirable properties.
• The interval score (IS) is a scoring rule for quantile and interval forecasts (Gneiting and Raftery, 2007). In the case of the central $(1-\alpha) \times 100\%$ prediction interval, let $\hat{\ell}_i^{\mathcal{M}}$ and $\hat{u}_i^{\mathcal{M}}$ be the predictive quantiles at levels $\alpha/2$ and $1-\alpha/2$, respectively, by model $\mathcal{M}$ for the $i$-th test sample. The interval score of model $\mathcal{M}$ is
\[ \mathrm{IS}(\mathcal{M}) = \frac{1}{n_{\mathrm{test}}} \sum_{i=1}^{n_{\mathrm{test}}} \left[ (\hat{u}_i^{\mathcal{M}} - \hat{\ell}_i^{\mathcal{M}}) + \frac{2}{\alpha} (\hat{\ell}_i^{\mathcal{M}} - y_i) I\{y_i < \hat{\ell}_i^{\mathcal{M}}\} + \frac{2}{\alpha} (y_i - \hat{u}_i^{\mathcal{M}}) I\{y_i > \hat{u}_i^{\mathcal{M}}\} \right]. \]
  Smaller interval scores are better. A model is rewarded for narrow prediction intervals, and it incurs a penalty, the size of which depends on $\alpha$, if an observation misses the interval.
• The integrated Brier score (IBS) is a scoring rule that is defined in terms of predictive cumulative distribution functions (Gneiting and Raftery, 2007):
\[ \mathrm{IBS}(\mathcal{M}) = \frac{1}{n_{\mathrm{test}}} \sum_{i=1}^{n_{\mathrm{test}}} \int_{-\infty}^{\infty} \left[ \hat{F}^{\mathcal{M}}_{Y|X}(y | x_i) - I\{y \ge y_i\} \right]^2 dy, \]
  where $\hat{F}^{\mathcal{M}}_{Y|X}$ is the predictive conditional CDF of model $\mathcal{M}$. Smaller integrated Brier scores are better.

The first case serves as a sanity check; if the response variable is linear in the explanatory variables and the conditional variance is constant, the vine copula should behave like linear regression. Figure 5.3a plots the simulated data, the true conditional expectation surface and true 95% prediction interval surfaces. Figure 5.3b plots the corresponding predicted surfaces. All three surfaces truthfully reflect the linearity of the data.
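Two of the performance measures defined above, the RMSE and the interval score, are simple enough to sketch directly in Python. The function names and the toy numbers below are hypothetical; `lower` and `upper` are assumed to hold the predictive $\alpha/2$ and $1-\alpha/2$ quantiles for each test observation.

```python
import math


def rmse(y, y_hat):
    """Root-mean-square error of the point predictions y_hat."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def interval_score(y, lower, upper, alpha=0.05):
    """Average interval score for central (1 - alpha) prediction intervals.

    Each term is the interval width plus a penalty of 2/alpha times the
    distance by which the observation falls outside the interval.
    """
    total = 0.0
    for yi, lo, up in zip(y, lower, upper):
        total += up - lo
        if yi < lo:
            total += (2.0 / alpha) * (lo - yi)
        if yi > up:
            total += (2.0 / alpha) * (yi - up)
    return total / len(y)


# Toy test set: the third observation falls above its interval.
y = [1.0, 2.0, 5.0]
y_hat = [1.0, 3.0, 3.0]
lower = [0.0, 1.0, 2.0]
upper = [2.0, 4.0, 4.0]

point_error = rmse(y, y_hat)
score = interval_score(y, lower, upper, alpha=0.05)
```

In the toy data a single miss by one unit adds $2/\alpha = 40$ to the sum, which dominates the interval widths; this shows how strongly the score penalizes intervals that fail to cover.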
The first three lines of Table 5.1 show that the vine copula and linear regression have similar performance in terms of all five metrics.

The second case adds heteroscedasticity to the first case; that is, the variance of $Y$ increases as $X_1$ or $X_2$ increases while the linear relationship remains the same. We expect the conditional expectation surface to be linear. Figure 5.4a and Figure 5.4b show the true and predicted surfaces respectively. The conditional expectation surface is linear and the lengths of prediction intervals increase with $X_1$ and $X_2$. The performance measures in Table 5.1 are also consistent with our expectation: the vine copula models have better LS, QS, IS, and IBS, although the RMSE is slightly worse than the linear regression model. The logarithmic transformation of the response variable does not seem to improve the performance.

Finally, the third case incorporates both non-linearity and heteroscedasticity. Since the linear regression obviously cannot fit the non-linear trend, we compare our model to quadratic regression as well. Figure 5.5 shows the true surfaces and the predicted surfaces for the three models. Although the quadratic regression model captures the non-linear trend, it is not flexible enough to model heteroscedasticity. Another drawback of quadratic regression is that the conditional mean $\hat{y}$ is not always monotonically increasing with respect to $x_1$ and $x_2$, and this contradicts the pattern in the data. The vine copula naturally fits the non-linearity and heteroscedasticity pattern.

Figure 5.3: The linear homoscedastic simulation case. Panels: (a) linear and homoscedastic data, the true surfaces; (b) linear and homoscedastic data, predicted surfaces by a vine copula regression model. In this fitted vine copula model, $C_{13}$, $C_{12}$ and $C_{23;1}$ are all Gaussian copulas, with parameters $\rho_{13} = 0.77$, $\rho_{12} = 0.5$ and $\rho_{23;1} = 0.39$. The green surfaces represent the conditional expectation, and the red and blue surfaces are the 2.5% and 97.5% quantile surfaces, respectively.
Quantitatively, the quadratic regression model has the best RMSE and IS, but vine copula models have the best LS, QS, and IBS, as shown in Table 5.1.

We have also conducted a similar simulation study with four explanatory variables $X_1, X_2, X_3, X_4$, where

\[
X = \begin{pmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0.5 & 0.5 & 0.5 \\ 0.5 & 1 & 0.5 & 0.5 \\ 0.5 & 0.5 & 1 & 0.5 \\ 0.5 & 0.5 & 0.5 & 1 \end{pmatrix} \right).
\]

The response variable $Y$ is generated from three similar cases:

1. Linear and homoscedastic: $Y = 5(X_1 + X_2 + X_3 + X_4) + 20\varepsilon$.
2. Linear and heteroscedastic: $Y = 5(X_1 + X_2 + X_3 + X_4) + 10(U_1 + U_2 + U_3 + U_4)\varepsilon$.
3. Non-linear and heteroscedastic: $Y = U_1 U_2 e^{1.8 U_3 U_4} + 0.5(U_1 + U_2 + U_3 + U_4)\varepsilon$.

The results of the simulation study are shown in Table 5.2, the pattern of which is similar to that of Table 5.1.

Figure 5.4: The linear heteroscedastic simulation case. Panels: (a) linear and heteroscedastic data, the true surfaces; (b) linear and heteroscedastic data, predicted surfaces by a vine copula regression model. In this fitted vine copula model, $C_{13}$ is a survival Gumbel copula with parameter $\delta_{13} = 2.21$, $C_{12}$ is a Gaussian copula with parameter $\rho_{12} = 0.5$, and $C_{23;1}$ is a BB8 copula with parameters $\vartheta_{23;1} = 3.06$, $\delta_{23;1} = 0.71$. The green surfaces represent the conditional expectation, and the red and blue surfaces are the 2.5% and 97.5% quantile surfaces, respectively.

5.5 Application

5.5.1 Abalone data set

In this section, we apply the vine copula regression method on a real data set: the Abalone data set (Lichman, 2013). The data set comes from an original (non-machine-learning) study (Nash et al., 1994). It has 4177 examples, and the goal is to predict the age of abalone from physical measurements; the names of these measurements are in Figure 5.6.
The age of abalone is determined by counting the number of rings (Rings) through a microscope, and this is a time-consuming task. Other physical measurements that are easier to obtain are used to predict the age. Rings can be regarded either as a continuous variable or an ordinal one. Thus the problem can be either a regression or a classification problem. We focus on the subset of 1526 male samples (with two outliers removed). Figure 5.6 shows the pairwise scatter plots, marginal density functions and pairwise correlation coefficients. There is clear non-linearity and heteroscedasticity among the pairs of variables. We discuss the regression problem in Section 5.5.2, and Section 5.5.3 shows the results for the classification problem.

Figure 5.5: The non-linear and heteroscedastic simulation case. Panels: (a) the true surfaces; (b) predicted surfaces by a linear regression model; (c) predicted surfaces by a quadratic regression model; (d) predicted surfaces by a vine copula regression model. In this fitted vine copula model, $C_{13}$ is a survival BB8 copula with parameters $\vartheta_{13} = 6$, $\delta_{13} = 0.78$, $C_{12}$ is a Gaussian copula with parameter $\rho_{12} = 0.5$, and $C_{23;1}$ is a BB8 copula with parameters $\vartheta_{23;1} = 6$, $\delta_{23;1} = 0.65$. The green surfaces represent the conditional expectation, and the red and blue surfaces are the 2.5% and 97.5% quantile surfaces, respectively.

| Case | Model | RMSE↓ | LS↑ | QS↑ | IS↓ | IBS↓ |
|---|---|---|---|---|---|---|
| 1 | Linear reg. | 10.01 (0.02) | −3.72 (0.00) | 0.028 (0.000) | 39.25 (0.09) | 5.64 (0.01) |
| 1 | Gaussian copula reg. | 10.01 (0.02) | −3.72 (0.00) | 0.028 (0.000) | 39.09 (0.09) | 5.64 (0.01) |
| 1 | Vine copula reg. | 10.01 (0.02) | −3.72 (0.00) | 0.028 (0.000) | 39.14 (0.09) | 5.64 (0.01) |
| 2 | Linear reg. | 11.19 (0.03) | −3.83 (0.00) | 0.028 (0.000) | 43.80 (0.14) | 6.06 (0.02) |
| 2 | Reg. with log-transform | 11.71 (0.04) | −3.83 (0.01) | 0.031 (0.000) | 47.22 (0.30) | 6.09 (0.02) |
| 2 | Gaussian copula reg. | 11.32 (0.03) | −3.75 (0.00) | 0.031 (0.000) | 41.45 (0.13) | 5.95 (0.02) |
| 2 | Vine copula reg. | 11.38 (0.03) | −3.73 (0.00) | 0.033 (0.000) | 41.24 (0.12) | 5.97 (0.02) |
| 3 | Linear reg. | 0.77 (0.01) | −1.16 (0.00) | 0.388 (0.001) | 3.03 (0.00) | 0.43 (0.00) |
| 3 | Reg. with log-transform | 0.69 (0.00) | −0.87 (0.00) | 0.540 (0.002) | 2.52 (0.01) | 0.35 (0.00) |
| 3 | Quadratic reg. | 0.62 (0.00) | −0.95 (0.00) | 0.511 (0.001) | 2.43 (0.01) | 0.34 (0.00) |
| 3 | Gaussian copula reg. | 0.69 (0.00) | −0.86 (0.00) | 0.604 (0.002) | 2.65 (0.01) | 0.35 (0.00) |
| 3 | Vine copula reg. | 0.63 (0.00) | −0.75 (0.00) | 0.686 (0.002) | 2.50 (0.01) | 0.32 (0.00) |

Table 5.1: Simulation results for two explanatory variables. The table shows the root-mean-square error (RMSE), logarithmic score (LS), quadratic score (QS), interval score (IS), and integrated Brier score (IBS) in different simulation cases. The arrows in the header indicate that lower RMSE, IS, and IBS, and higher LS and QS, are better. The numbers in parentheses are the corresponding standard errors.

| Case | Model | RMSE↓ | LS↑ | QS↑ | IS↓ | IBS↓ |
|---|---|---|---|---|---|---|
| 1 | Linear reg. | 20.09 (0.05) | −4.42 (0.00) | 0.014 (0.000) | 78.53 (0.18) | 11.34 (0.03) |
| 1 | Gaussian copula reg. | 20.09 (0.05) | −4.42 (0.00) | 0.014 (0.000) | 78.06 (0.18) | 11.34 (0.03) |
| 1 | Vine copula reg. | 20.12 (0.05) | −4.42 (0.00) | 0.014 (0.000) | 78.18 (0.18) | 11.36 (0.03) |
| 2 | Linear reg. | 22.04 (0.07) | −4.51 (0.00) | 0.014 (0.000) | 86.25 (0.29) | 12.01 (0.04) |
| 2 | Reg. with log-transform | 22.41 (0.07) | −4.56 (0.01) | 0.015 (0.000) | 96.02 (0.75) | 11.88 (0.03) |
| 2 | Gaussian copula reg. | 22.11 (0.07) | −4.46 (0.00) | 0.015 (0.000) | 84.79 (0.27) | 11.78 (0.03) |
| 2 | Vine copula reg. | 22.43 (0.07) | −4.44 (0.00) | 0.016 (0.000) | 82.42 (0.26) | 11.91 (0.04) |
| 3 | Linear reg. | 1.22 (0.00) | −1.62 (0.00) | 0.251 (0.001) | 4.80 (0.02) | 0.67 (0.00) |
| 3 | Reg. with log-transform | 1.22 (0.00) | −1.57 (0.00) | 0.270 (0.001) | 4.73 (0.02) | 0.64 (0.00) |
| 3 | Quadratic reg. | 1.13 (0.00) | −1.54 (0.00) | 0.275 (0.001) | 4.42 (0.01) | 0.62 (0.00) |
| 3 | Gaussian copula reg. | 1.21 (0.00) | −1.56 (0.00) | 0.273 (0.001) | 4.68 (0.02) | 0.64 (0.00) |
| 3 | Vine copula reg. | 1.19 (0.00) | −1.50 (0.00) | 0.290 (0.001) | 4.35 (0.01) | 0.63 (0.00) |

Table 5.2: Simulation results for four explanatory variables. The table shows the root-mean-square error (RMSE), logarithmic score (LS), quadratic score (QS), interval score (IS), and integrated Brier score (IBS) in different simulation cases. The arrows in the header indicate that lower RMSE, IS, and IBS, and higher LS and QS, are better. The numbers in parentheses are the corresponding standard errors.

Figure 5.6: Pairwise scatter plots of the Abalone dataset.

5.5.2 Regression

In this section, we compare the performance of vine copula and linear regression methods. Three vine regressions are considered:

• R-vine copula regression: the proposed method with the candidate bivariate copula families;
• Gaussian copula regression with R-vine partial correlation parametrization: the proposed method with the bivariate Gaussian copulas only;
• D-vine copula regression: Kraus and Czado (2017a) with the candidate bivariate copula families.

The candidate bivariate copulas include Student-t, MTCJ, Gumbel, Frank, Joe, BB1, BB6, BB7, BB8, and the corresponding survival and reflected copulas.

We perform 100 trials of 5-fold cross validation. Vine copula regressions and linear regression are fitted using the training set, and the test set is used for performance evaluation. All the univariate margins are fitted by skew-normal distributions. The conditional mean and 95% prediction interval are obtained for all models. For copula regressions, the upper and lower bounds of the 95% prediction interval are the conditional 97.5% and 2.5% quantiles respectively.

We consider the out-of-sample performance measures used in Section 5.4: the root-mean-square error (RMSE), logarithmic score (LS), quadratic score (QS), interval score (IS), and integrated Brier score (IBS). Table 5.3 shows the average performance measures from the 100 trials of cross validation.
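The repeated cross-validation protocol described above (random 5-fold partitions, fit on the training folds, score on the held-out fold, average over trials) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, a one-predictor least-squares fit stands in for the vine copula and linear regression models, and only the RMSE is computed, since the other scores depend on definitions given in Section 5.4.

```python
import math
import random


def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and split into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_simple_linear(xs, ys):
    """Least-squares intercept and slope for one predictor (stand-in model)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def repeated_cv_rmse(xs, ys, k=5, trials=100, seed=0):
    """Average out-of-sample RMSE over `trials` random k-fold partitions."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        for test_idx in k_fold_indices(len(xs), k, rng):
            held_out = set(test_idx)
            train = [i for i in range(len(xs)) if i not in held_out]
            b0, b1 = fit_simple_linear([xs[i] for i in train], [ys[i] for i in train])
            preds = [b0 + b1 * xs[i] for i in test_idx]
            scores.append(rmse([ys[i] for i in test_idx], preds))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    gen = random.Random(1)
    x = [gen.uniform(0, 10) for _ in range(200)]          # synthetic predictor
    y = [2.0 + 0.5 * xi + gen.gauss(0, 1) for xi in x]    # synthetic response
    print(round(repeated_cv_rmse(x, y, k=5, trials=10), 3))
```

Replacing `fit_simple_linear` and the prediction line with any other fitted model leaves the partitioning and averaging logic unchanged.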
Compared with linear regression, our method has lower prediction errors and better predictive scores. The performance of the R-vine copula model is slightly better than that of the D-vine copula model in terms of all five scores. The vine array and bivariate copulas on the edges of the R-vine from one round of cross-validation are shown in Table 5.4. Figure 5.7 gives a visualization of the R-vine array. Several of the copulas linking to the response variable in trees 2 to 7 represent weak negative dependence.

| Model | RMSE↓ | LS↑ | QS↑ | IS↓ | IBS↓ |
|---|---|---|---|---|---|
| Linear reg. | 2.272 | −2.240 | 0.138 | 8.909 | 1.232 |
| Gaussian copula reg. | 2.287 | −2.142 | 0.152 | 8.276 | 1.208 |
| D-vine copula reg. | 2.183 | −2.064 | 0.163 | 8.104 | 1.141 |
| R-vine copula reg. | 2.168 | −2.057 | 0.164 | 8.005 | 1.136 |

Table 5.3: Comparison of the performance of vine copula regressions and linear regression. The numbers are the average scores over 100 trials of 5-fold cross validation. The scoring rules are defined in Section 5.4.

Figure 5.7: Visualization of the R-vine array in Table 5.4.

The fitted D-vine regression model has path Diameter–VisceraWeight–WholeWeight–ShuckedWeight–ShellWeight–Rings in the first level of the D-vine structure.

We have applied the diagnostic tools of asymmetry and simplifying assumption mentioned in Chapter 4 to the second tree of the SeqMST output. The simplifying assumption seems valid. We have also conducted monotonicity checks of the predicted conditional median based on the fitted R-vine model. Four of the linking copulas in trees 2 to 7 (last column of the right-hand side of Table 5.4) represent conditional negative dependence given the previously linked variables to the response variable. This means that the conditional median function is not always monotone increasing in an explanatory variable when others are held fixed. However, when all explanatory variables are increasing together (for larger abalone), the conditional median is increasing.
This property is similar to classical Gaussian regression with positively correlated explanatory variables, where negative regression coefficients can arise because of some negative partial correlations. Even with some negative conditional dependence, there is overall better out-of-sample prediction performance by keeping all of the explanatory variables in the model.

We also did some numerical checks on the conditional quantiles when one explanatory variable becomes extreme and other variables are held fixed. It appears that the behavior is close to asymptotically constant. From the linking copulas in Table 5.4 and the results in Chapter 6, we would not expect asymptotically linear behavior (and this is reasonable from the context of the variables).

Figure 5.8 visualizes the prediction performance of the three methods on the full dataset. The plots show the residuals against the fitted values on the test set, and the prediction intervals. Due to heteroscedasticity, there is more variation in residuals as the fitted value increases. However, linear regression fails to capture the heteroscedasticity, and its prediction intervals are roughly of the same length. Vine copula regression gives wider (narrower) prediction intervals when the fitted values are larger (smaller). This illustrates why our method overall has more precise prediction intervals.

Left-hand side (vine array):

| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|
| 4 | 4 | 4 | 4 | 4 | 7 | 1 | 7 |
|   | 7 | 7 | 7 | 5 | 4 | 4 | 4 |
|   |   | 5 | 5 | 7 | 6 | 5 | 5 |
|   |   |   | 6 | 6 | 5 | 7 | 6 |
|   |   |   |   | 1 | 1 | 6 | 3 |
|   |   |   |   |   | 3 | 3 | 1 |
|   |   |   |   |   |   | 2 | 2 |
|   |   |   |   |   |   |   | 8 |

Right-hand side (bivariate copulas):

| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|
| - | BB6.s | BB6.s | BB6.s | BB1.s | Gumbel.s | BB6.s | Gumbel.s |
| - | - | t | Joe.v | BB8.s | t | BB8.s | BB8.u |
| - | - | - | t | Frank | Frank | BB8.v | BB8.u |
| - | - | - | - | Frank | t | Frank | MTCJ.v |
| - | - | - | - | - | t | Frank | t |
| - | - | - | - | - | - | Gumbel | t |
| - | - | - | - | - | - | - | Gumbel.u |
| - | - | - | - | - | - | - | - |

Table 5.4: Vine array and bivariate copulas of the R-vine copula regression fitted on the full dataset. The variables are (1) Length, (2) Diameter, (3) Height, (4) WholeWeight, (5) ShuckedWeight, (6) VisceraWeight, (7) ShellWeight, (8) Rings. A suffix of ‘s’ represents the survival version of the copula family, which gives the opposite direction of joint tail asymmetry; ‘u’ and ‘v’ represent the copula family with reflection on the first and second variable respectively, which gives negative dependence.

Figure 5.8: Residual vs. fitted value plots. The red and blue points correspond to the lower bound and upper bound of the prediction intervals.

5.5.3 Classification

The response variable Rings is an ordinal variable that ranges from 3 to 27. Therefore this is a multiclass classification problem. Although our method can handle multiclass classification problems, we reduce it to a binary classification problem for easy comparison with commonly used methods, including logistic regression, support vector machine (SVM), and random forest (RF). The sample median of Rings is 10; if a sample’s Rings is greater than 10, we label it as ‘large’, otherwise ‘small’. All the predictor variables are fitted by skew-normal distributions, and we fit an empirical distribution to the response variable Rings.

The D-vine regression method (Kraus and Czado, 2017a) can only handle continuous variables and is not directly applicable to the classification problem. In order to compare our method with the D-vine based method, we first treat the binary response variable as a continuous variable (0 and 1) and use the D-vine regression method (Kraus and Czado, 2017a) to find a D-vine structure, that is, an ordering of variables. Then an R-vine regression model is fitted on that D-vine structure using our method.

For binary classifiers, the performance can be demonstrated by a receiver operating characteristic (ROC) curve. The curve is created by plotting the true positive rate against the false positive rate at various threshold settings. The (0,1) point corresponds to a perfect classification; a completely random guess would give a point along the diagonal line. An ROC curve is a two-dimensional depiction of classifier performance.
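The ROC construction described above (sweeping a threshold over classifier scores and recording the false and true positive rates) can be sketched as follows. The function names, the toy Rings values and the scores are made up for illustration, and the trapezoidal area is included only as one common scalar summary of the curve.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs obtained by sweeping the threshold over the scores.

    labels: 1 for positive ('large'), 0 for negative ('small').
    scores: larger means more likely positive. Tied scores share one threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Move every observation with this score above the threshold at once.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    rings = [7, 9, 10, 11, 12, 15, 8, 13]                 # toy values, not the Abalone data
    labels = [1 if r > 10 else 0 for r in rings]          # 'large' if Rings > 10
    scores = [0.2, 0.4, 0.55, 0.5, 0.8, 0.9, 0.1, 0.7]    # hypothetical classifier scores
    pts = roc_points(labels, scores)
    print(pts[-1], area_under_curve(pts))
```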
To compare classifiers we may want to reduce ROC performance to a scalar value representing the expected performance. A common method is to calculate the area under the curve (AUC) (Fawcett, 2006). The AUC can also be interpreted as the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one. Therefore, a larger AUC is better. Figure 5.9a shows sample ROC curves of different binary classifiers and the corresponding AUCs.

Figure 5.9: Comparison of the performance on the classification problem. (a) ROC curves of different binary classifiers. The performance is evaluated on the test set. (b) Box plot of the AUCs based on 10-fold cross-validation, repeated 20 times.

Repeated 10-fold cross-validation with random partitions is used to assess the performance. In each pass, 10-fold cross-validation is performed and the average AUC is recorded. Figure 5.9b shows a box plot of the average AUCs. The performance of vine copula regression is marginally better than the other methods. The average AUCs are: RVineReg = 0.835, DVineReg = 0.826, SVM = 0.825, LogisticReg = 0.814, RF = 0.811.

5.6 Conclusion

Our vine copula regression method uses R-vines and can fit mixed continuous and ordinal variables. The prediction algorithm can efficiently compute the conditional distribution given a fitted vine copula, without marginalizing the conditioning variables. The performance of the proposed method is evaluated on simulated data sets and the Abalone data set. The heteroscedasticity in the data is better captured by vine copula regression than by the standard regression methods.

One potential drawback of the proposed method is the computational cost for high-dimensional data, especially when the dimensionality is greater than the sample size. This chapter is more of a proof of concept of using R-vine copula models for regression and classification problems.
Therefore, we evaluate the performance of the proposed methods on classical cases and compare with models such as linear regressions. Another drawback is the constraint on the vine structure such that the response variable is always a leaf node at each level. This constraint greatly reduces the computational complexity; without it, numerical integration would be required to compute the conditional CDF. Finally, the criticism of copula-based regression by Dette et al. (2014) also applies. The proposed method assumes a monotonic relationship between the response variable and explanatory variables.

To relate how choices of bivariate copula families in the vine can affect prediction and to provide guidelines on bivariate copula families to consider, we give a theoretical analysis of the asymptotic shape of conditional quantile functions. For bivariate copulas, the conditional quantile function of the response variable could be asymptotically linear, sublinear, or constant with respect to the explanatory variable. It turns out the asymptotic conditional distribution can be quite complex for trivariate and higher-dimensional cases, and there are counter-intuitive examples. In practice, we recommend computing conditional quantile functions of the fitted vine copula to assess if the monotonicity properties are reasonable.

One possible future research direction is the extension of the proposed regression method for survival outcomes with censored data. For example, Emura et al. (2018) use bivariate copulas to predict time-to-death given time-to-cancer progression; Barthel et al. (2018) apply vine copulas to multivariate right-censored event time data. They apply copulas to the joint survival function instead of the joint CDF to deal with right-censoring.
These types of applications would require more numerical integration methods.

Another research direction is to handle variable selection and reduction when there are many explanatory variables, some of which might form clusters with strong dependence. Traditional variable selection methods for regression can also be applied, for example, the forward selection approach. Moreover, recent papers proposed methods for learning sparse vine copula models (Müller and Czado, 2019; Nagler et al., 2019), which can potentially be used as a variable selection method for copula regression.

Chapter 6
Theoretical results on shapes of conditional quantile functions

6.1 Introduction

From the properties of the multivariate normal distribution, if $(X_1,\ldots,X_p,Y)$ follows a multivariate normal distribution, then the conditional quantile function of $Y|X_1,\ldots,X_p$ has the linear form

$$F^{-1}_{Y|X_1,\ldots,X_p}(\alpha|x_1,\ldots,x_p) = \beta_1 x_1 + \cdots + \beta_p x_p + \Phi^{-1}(\alpha)\sqrt{1-R^2_{Y;X_1,\ldots,X_p}}, \quad 0<\alpha<1,$$

where $R_{Y;X_1,\ldots,X_p}$ is the multiple correlation coefficient. Going beyond the normal distribution, we address the following question in this chapter: how does the choice of bivariate copulas in a vine copula regression model affect the conditional quantile function, especially when the explanatory variables are large (in absolute value)? For comparisons with multivariate normal, we assume the variables $X_1,\ldots,X_p,Y$ have been transformed so that they have marginal $N(0,1)$ distributions. In this case, plots from vine copulas with one or two explanatory variables can show conditional quantile functions that are close to linear in the middle, and asymptotically linear, sublinear or constant along different directions to $\pm\infty$; Bernard and Czado (2015) have several figures that show this pattern for the case of one explanatory variable.
Such behavior cannot be obtained with regression equations that are linear in the $\beta$'s and is hard to obtain with nonlinear regression functions that are directly specified.

We start with the bivariate case (one explanatory variable) in Section 6.2. Conditions are obtained to classify the asymptotic behavior of the conditional quantile function into four categories: strongly linear, weakly linear, sublinear and asymptotically constant. For bivariate Archimedean copulas, the conditions are related to conditions on the Laplace transform generator, as shown in Section 6.3. Section 6.4 studies the trivariate case $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2)$ with a trivariate vine copula. However, extending from bivariate to trivariate is challenging: the asymptotic conditional quantile depends on the direction in which $(x_1,x_2)$ goes to infinity. Section 6.4.1 analyzes the strongest possible dependence in the trivariate case: a functional relationship between $Y$ and $(X_1,X_2)$. It is difficult, if not impossible, to obtain a general result, given that the marginal distribution of $Y$ does not have a closed-form expression in general. We focus on a special case where the marginal distribution of $Y$ can be calculated. It is shown that the conditional quantile function is asymptotically linear in $x_1$ or $x_2$ along a ray on the $(x_1,x_2)$-plane, and this is an extension of the bivariate strongly linear case. Sections 6.4.2 and 6.4.3 study the asymptotic conditional CDF for a trivariate vine copula with bivariate Archimedean copulas and standard normal margins; the Archimedean assumption allows for some tractable results to be obtained. We give a classification of the conditional CDF based on the tail dependence properties of bivariate Archimedean copulas. Section 6.4.3 considers several special cases with different combinations of bivariate Archimedean copulas on the edges of a trivariate vine, and shows a number of different tail behaviors.
Section 6.5 briefly discusses the possibility of generalizing the results to higher dimensions.

6.2 Bivariate asymptotic conditional quantile

In this section, we focus on a bivariate random vector $(X,Y)$ with standard normal margins. Let $C(u,v)$ be the copula; then the joint CDF is $F_{X,Y}(x,y)=C(\Phi(x),\Phi(y))$. The copula $C(u,v)$ is assumed to have positive dependence. We are interested in the shape of the conditional CDF $F_{Y|X}(y|x)$ and the conditional quantile $F^{-1}_{Y|X}(\alpha|x)$ when $x$ is extremely large or small and $\alpha\in(0,1)$ is fixed. Bernard and Czado (2015) study a few special cases for bivariate copulas. Our results are more extensive in relating the shape of asymptotic quantiles to the strength of dependence in the joint tail.

If the conditional distribution $C_{V|U}(\cdot|u)$ converges to a continuous distribution with support on $[0,1]$ as $u\to 0^+$, then $C^{-1}_{V|U}(\alpha|0)>0$ for $\alpha\in(0,1)$. Therefore, $F^{-1}_{Y|X}(\alpha|x)$ levels off as $x\to-\infty$. The same argument applies when $x\to+\infty$. That is,

$$\lim_{x\to-\infty}F^{-1}_{Y|X}(\alpha|x)=\Phi^{-1}\big(C^{-1}_{V|U}(\alpha|0)\big); \qquad \lim_{x\to+\infty}F^{-1}_{Y|X}(\alpha|x)=\Phi^{-1}\big(C^{-1}_{V|U}(\alpha|1)\big).$$

If $C_{V|U}(\cdot|u)$ converges to a degenerate distribution at 0 when $u\to 0^+$, then $\lim_{u\to 0^+}C^{-1}_{V|U}(\alpha|u)=0$. To study the shape of $F_{Y|X}(y|x)$ when $x$ is very negative, we need to further investigate the rate at which $C^{-1}_{V|U}(\alpha|u)$ converges to 0. The next proposition summarizes the possibilities.

Proposition 6.1. Let $(X,Y)$ be a bivariate random vector with standard normal margins and a positively dependent copula $C(u,v)$.

• (Lower tail) Fixing $\alpha\in(0,1)$, if $-\log C^{-1}_{V|U}(\alpha|u)\sim k_\alpha(-\log u)^{\eta}$ as $u\to 0^+$, $k_\alpha>0$, then $F^{-1}_{Y|X}(\alpha|x)\sim-(2^{1-\eta}k_\alpha)^{1/2}|x|^{\eta}$ as $x\to-\infty$.
• (Upper tail) Fixing $\alpha\in(0,1)$, if $-\log[1-C^{-1}_{V|U}(\alpha|u)]\sim k_\alpha[-\log(1-u)]^{\eta}$ as $u\to 1^-$, $k_\alpha>0$, then $F^{-1}_{Y|X}(\alpha|x)\sim(2^{1-\eta}k_\alpha)^{1/2}x^{\eta}$ as $x\to+\infty$.

Proof.
We use the following asymptotic results from Abramowitz and Stegun (1964):

$$\Phi(z)\sim 1-\frac{\phi(z)}{z}\sim 1-\frac{1}{\sqrt{2\pi}\,z}e^{-z^2/2},\quad z\to+\infty; \qquad \Phi^{-1}(p)\sim\big(-2\log(1-p)\big)^{1/2},\quad p\to 1^-;$$

$$\Phi(z)\sim-\frac{\phi(z)}{z}\sim\frac{1}{\sqrt{2\pi}\,|z|}e^{-z^2/2},\quad z\to-\infty; \qquad \Phi^{-1}(p)\sim-(-2\log p)^{1/2},\quad p\to 0^+.$$

Using the notation above,

$$\Phi^{-1}\big(C^{-1}_{V|U}(\alpha|u)\big)\sim-\big(-2\log C^{-1}_{V|U}(\alpha|u)\big)^{1/2}\sim-\big(2k_\alpha(-\log u)^{\eta}\big)^{1/2}. \tag{6.1}$$

When $u=\Phi(x)\sim\phi(x)/|x|$,

$$-\log u\sim-\log\phi(x)+\log|x|\sim\tfrac12 x^2+\log|x|\sim\tfrac12 x^2. \tag{6.2}$$

Combining Equation 6.1 and Equation 6.2, we obtain the asymptotic conditional quantile function as $x\to-\infty$,

$$F^{-1}_{Y|X}(\alpha|x)=\Phi^{-1}\big(C^{-1}_{V|U}(\alpha|\Phi(x))\big)\sim-\big(2k_\alpha(x^2/2)^{\eta}\big)^{1/2}\sim-(2^{1-\eta}k_\alpha)^{1/2}|x|^{\eta}.$$

The proof of the second part is similar and thus omitted.

Note that the positive dependence assumption implies $k_\alpha>0$, so that the conditional quantiles are asymptotically increasing at the two extremes. A related result can be obtained for negative dependence, but it is of less interest since, in general, one tries to orient variables to have positive dependence with each other. Here $\eta$ indicates the strength of the relation between two variables in the tail; a larger $\eta$ value corresponds to a stronger relation. The strongest possible comonotonic dependence is when $Y=X$, and the conditional quantile function is $F^{-1}_{Y|X}(\alpha|x)=x$, which is linear in $x$ and does not depend on $\alpha$; in this case, $\eta=1$. The weakest possible positive dependence is when $X$ and $Y$ are independent, and $F^{-1}_{Y|X}(\alpha|x)=F^{-1}_Y(\alpha)$ does not depend on $x$; in this case, $\eta=0$. Based on the values of $\eta$ and $k_\alpha$, the asymptotic behavior of the conditional quantile function can be classified into the following categories:

1. Strongly linear: $\eta=1$ and $k_\alpha=1$. $F^{-1}_{Y|X}(\alpha|x)$ goes to infinity linearly, and it does not depend on $\alpha$. It has stronger dependence than bivariate normal.
2. Weakly linear: $\eta=1$, $k_\alpha$ can depend on $\alpha$ and $0<k_\alpha<1$. $F^{-1}_{Y|X}(\alpha|x)$ goes to infinity linearly and it depends on $\alpha$. It has comparable dependence with bivariate normal.
3. Sublinear: $0<\eta<1$. $F^{-1}_{Y|X}(\alpha|x)$ goes to infinity sublinearly. The dependence is weaker than bivariate normal.
4. Asymptotically constant: $\eta=0$. $F^{-1}_{Y|X}(\alpha|x)$ converges to a finite constant. Asymptotically it behaves like independence.

Figure 6.1: Conditional quantile functions for bivariate copulas with Kendall’s $\tau=0.5$, combined with $N(0,1)$ margins. Quantile levels are 20%, 40%, 60% and 80%.

Figure 6.1 shows the conditional quantile functions for bivariate copulas with different $\eta$ in the upper and lower tails. Examples 6.1 and 6.2 derive the conditional quantile functions for bivariate MTCJ and Gumbel copulas. Note that $\eta$ is constant over $\alpha$ for several commonly used parametric bivariate copula families. However, there are cases where $\eta$ depends on $\alpha$. For example, the boundary conditional distribution of the bivariate Student-t copula has mass at both 0 and 1; depending on the value of $\alpha$, $C^{-1}_{V|U}(\alpha|u)$ could go to either 0 or 1 as $u\to 0$.

Example 6.1. (MTCJ lower tail) The bivariate MTCJ copula is defined in Section 2.1. The conditional quantile function is

$$C^{-1}_{V|U}(\alpha|u;\delta)=\big[(\alpha^{-\delta/(1+\delta)}-1)u^{-\delta}+1\big]^{-1/\delta}\sim(\alpha^{-\delta/(1+\delta)}-1)^{-1/\delta}\,u,\quad u\to 0;\ \delta>0.$$

Taking the negative log of both sides, $-\log C^{-1}_{V|U}(\alpha|u;\delta)\sim-\log u$, so that $\eta=1$ and $k_\alpha=1$. By Proposition 6.1, we have $F^{-1}_{Y|X}(\alpha|x)\sim x$ as $x\to-\infty$. The same conclusion follows from the next proposition, with the generator being the gamma Laplace transform $\psi(s)=(1+s)^{-1/\delta}$.

Example 6.2. (Gumbel lower tail) The bivariate Gumbel copula is defined in Section 2.1.
The conditional CDF is

$$C_{V|U}(v|u;\delta)=u^{-1}\exp\Big\{-\big[(-\log u)^{\delta}+(-\log v)^{\delta}\big]^{1/\delta}\Big\}\Big[1+\Big(\frac{-\log v}{-\log u}\Big)^{\delta}\Big]^{1/\delta-1},\quad\delta>1.$$

The conditional quantile function $C^{-1}_{V|U}(\alpha|u;\delta)$ does not have a closed-form expression; it has the following asymptotic expansion:

$$-\log C^{-1}_{V|U}(\alpha|u;\delta)\sim(-\delta\log\alpha)^{1/\delta}(-\log u)^{1-1/\delta},\quad u\to 0.$$

By Proposition 6.1, we have $F^{-1}_{Y|X}(\alpha|x)\sim-(-2\delta\log\alpha)^{1/(2\delta)}|x|^{1-1/\delta}$ as $x\to-\infty$. The same conclusion follows from the next proposition, with the generator being the positive stable Laplace transform $\psi(s)=\exp\{-s^{1/\delta}\}$.

6.3 Bivariate Archimedean copula boundary conditional distributions

This section proves a proposition on the relationship between the tail dependence behavior of a bivariate Archimedean copula and its tail conditional distribution and quantile functions. This proposition is used in Section 6.4.2.

A bivariate Archimedean copula with Laplace transform generator $\psi$ can be constructed as $C(u,v)=\psi(\psi^{-1}(u)+\psi^{-1}(v))$, where $u,v\in[0,1]$, $\psi(\infty)=0$, $\psi(0)=1$, and $\psi$ is non-increasing and convex. It is the CDF of a random vector $(U,V)$. The corresponding conditional distribution $P(V\le v|U=u)$ is

$$C_{V|U}(v|u)=\frac{\partial C(u,v)}{\partial u}=\frac{\psi'\big(\psi^{-1}(u)+\psi^{-1}(v)\big)}{\psi'\big(\psi^{-1}(u)\big)}.$$

We study the limit of the conditional distribution $C_{V|U}(v|u)$ as $u\to 0$ (or 1), where $v$ could be a fixed number in $(0,1)$ or could go to 0 (or 1) as well. The limit depends on the lower (upper) tail behavior of the copula, and on the rate at which $u$ and $v$ go to 0 (or 1). The rate is characterized on the normal scale: we assume $u,v\to 0$ (or 1) and $\Phi^{-1}(u)/\Phi^{-1}(v)$ converges to a constant. In other words, if $X=\Phi^{-1}(U)$ and $Y=\Phi^{-1}(V)$, we study the conditional distribution $P(Y\le y|X=x)$ as $x,y\to+\infty$ (or $-\infty$) and $x/y$ converges to a constant.

For Archimedean and survival Archimedean copulas, the following proposition provides a link between tail dependence behavior and tail conditional distribution and quantile functions. The proof of the proposition is given in Sections 6.3.1 and 6.3.2.

Proposition 6.2. Given the generator or Laplace transform (LT) $\psi$ of an Archimedean copula, we assume the following.

1. For the upper tail of $\psi$, as $s\to+\infty$,
$$\psi(s)\sim T(s)=a_1 s^{q}\exp(-a_2 s^{r}) \quad\text{and}\quad \psi'(s)\sim T'(s), \tag{6.3}$$
where $a_1>0$, $r=0$ implies $a_2=0$ and $q<0$, and $r>0$ implies $r\le 1$ and $q$ can be 0, negative or positive.

2. For the lower tail of $\psi$, as $s\to 0^+$, there is $M\in(k,k+1)$ such that
$$\psi(s)=\sum_{i=0}^{k}(-1)^{i}h_i s^{i}+(-1)^{k+1}h_{k+1}s^{M}+o(s^{M}),\quad s\to 0^+, \tag{6.4}$$
where $h_0=1$ and $0<h_i<\infty$ for $i=1,\ldots,k+1$. If $0<M<1$, then $k=0$.

Then we have the following.

• (Lower tail) If $v\in(0,1)$ and $\alpha\in(0,1)$ are fixed, then as $u\to 0$,
$$C_{V|U}(v|u)\sim\begin{cases}1+(q-1)\,\psi^{-1}(v)\,(u/a_1)^{-1/q}\to 1 & \text{if } r=0,\\ 1-a_2^{1/r}\,r\,\psi^{-1}(v)\,(-\log u)^{1-1/r}\to 1 & \text{if } 0<r<1,\\ \text{const}\in(0,1) & \text{if } r=1;\end{cases} \tag{6.5}$$
$$C^{-1}_{V|U}(\alpha|u)\sim\begin{cases}\big(\alpha^{1/(q-1)}-1\big)^{q}\cdot u\to 0 & \text{if } r=0,\\ \exp\Big[-\Big(\dfrac{-\log\alpha}{r}\Big)^{r}(-\log u)^{1-r}\Big]\to 0 & \text{if } 0<r<1,\\ \text{const}\in(0,1) & \text{if } r=1.\end{cases} \tag{6.6}$$

• (Upper tail) If $v\in(0,1)$ and $\alpha\in(0,1)$ are fixed, then as $u\to 1$,
$$C_{V|U}(v|u)\sim\begin{cases}-\dfrac{\psi'(\psi^{-1}(v))}{h_1^{1/M}M}(1-u)^{(1-M)/M}\to 0 & \text{if } 0<M<1,\\ \text{const}\in(0,1) & \text{if } M>1;\end{cases} \tag{6.7}$$
$$C^{-1}_{V|U}(\alpha|u)\sim\begin{cases}1-\big(\alpha^{1/(M-1)}-1\big)^{M}(1-u)\to 1 & \text{if } 0<M<1,\\ \text{const}\in(0,1) & \text{if } M>1.\end{cases} \tag{6.8}$$

Note that we do not cover the case of $M=1$ for the upper tail because it involves a slowly varying function. Combined with Proposition 6.1, Proposition 6.2 states that, for the lower tail, the three cases $r=0$, $0<r<1$ and $r=1$ correspond to strongly linear, sublinear and asymptotically constant conditional quantile functions respectively; for the upper tail, the two cases $0<M<1$ and $M>1$ correspond to strongly linear and asymptotically constant conditional quantile functions respectively.
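As a numerical sanity check of the $r=0$ case, the MTCJ copula of Example 6.1 has generator $\psi(s)=(1+s)^{-1/\delta}$, so $q=-1/\delta$ and $a_1=1$ in (6.3). The sketch below compares the closed-form conditional quantile with the leading term in (6.6), and then looks at the ratio $F^{-1}_{Y|X}(\alpha|x)/x$ on the normal scale, which should drift towards 1 in the strongly linear case. The function names and the choices $\alpha=0.5$, $\delta=2$ are illustrative assumptions, not part of the original derivation.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def mtcj_cond_quantile(alpha, u, delta):
    """Exact C^{-1}_{V|U}(alpha | u; delta) for the MTCJ copula (Example 6.1)."""
    c = alpha ** (-delta / (1.0 + delta)) - 1.0
    return (c * u ** (-delta) + 1.0) ** (-1.0 / delta)


def mtcj_cond_quantile_asymptotic(alpha, u, delta):
    """Leading term of (6.6) in the case r = 0, with q = -1/delta."""
    q = -1.0 / delta
    return (alpha ** (1.0 / (q - 1.0)) - 1.0) ** q * u


def normal_cdf_lower(x):
    """Phi(x) via erfc, which keeps relative accuracy for very negative x."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def cond_quantile_normal_scale(alpha, x, delta):
    """F^{-1}_{Y|X}(alpha | x) = Phi^{-1}(C^{-1}_{V|U}(alpha | Phi(x)))."""
    return STD_NORMAL.inv_cdf(mtcj_cond_quantile(alpha, normal_cdf_lower(x), delta))


if __name__ == "__main__":
    alpha, delta = 0.5, 2.0  # illustrative values
    for u in (1e-2, 1e-4, 1e-6):
        ratio = (mtcj_cond_quantile(alpha, u, delta)
                 / mtcj_cond_quantile_asymptotic(alpha, u, delta))
        print(f"u={u:g}  exact/asymptotic={ratio:.8f}")
    for x in (-2.0, -4.0, -6.0):
        q_x = cond_quantile_normal_scale(alpha, x, delta)
        print(f"x={x:g}  F^-1(alpha|x)/x={q_x / x:.4f}")
```

On the copula scale the agreement is fast; on the normal scale the convergence is slower, because the $\log|x|$ term dropped in (6.2) is only negligible for large $|x|$.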
Proposition 6.2 is used in Section 6.4.2 for analyzing cases of trivariate vine copulas.

6.3.1 Lower tail

To avoid technicalities, we assume the following from Equation 8.42 of Joe (2014). As $s\to\infty$,

$$\psi(s)\sim T(s)=a_1 s^{q}\exp(-a_2 s^{r}) \quad\text{and}\quad \psi'(s)\sim T'(s), \tag{6.9}$$

where $a_1>0$, $r=0$ implies $a_2=0$ and $q<0$, and $r>0$ implies $r\le 1$ and $q$ can be 0, negative or positive.

Lower tail dependence ($r=0$)

According to Theorem 8.34 in Joe (2014), for a bivariate Archimedean copula, when $r=0$ and $\psi\in RV_q$ where $q<0$, it has lower tail dependence with $\lambda_L=2^{q}$. Therefore, $\psi(s)\sim a_1 s^{q}$, $\psi'(s)\sim a_1 q s^{q-1}$ and $\psi^{-1}(u)\sim(u/a_1)^{1/q}$, as $s\to\infty$ and $u\to 0^+$. If $u\to 0$, $v\in(0,1)$ and $\alpha\in(0,1)$, then

$$C_{V|U}(v|u)\sim\Big(1+\frac{\psi^{-1}(v)}{\psi^{-1}(u)}\Big)^{q-1}\sim 1+(q-1)\,\psi^{-1}(v)\,(u/a_1)^{-1/q}\to 1. \tag{6.10}$$

For the conditional quantile, we set $C_{V|U}(v|u)=\alpha$ and solve for $v$. As $u\to 0$, $v$ should also converge to 0, otherwise $C_{V|U}(v|u)\to 1$. If $u,v\to 0$, the conditional distribution is asymptotic to

$$C_{V|U}(v|u)\sim\Big(1+\Big(\frac{v}{u}\Big)^{1/q}\Big)^{q-1},\quad(u,v)\to(0,0). \tag{6.11}$$

Therefore,

$$\alpha\sim\Big(1+\frac{\psi^{-1}(v)}{\psi^{-1}(u)}\Big)^{q-1}\sim\Big(1+\Big(\frac{v}{u}\Big)^{1/q}\Big)^{q-1},\qquad C^{-1}_{V|U}(\alpha|u)\sim\big(\alpha^{1/(q-1)}-1\big)^{q}\cdot u\to 0.$$

(The above is one result in Proposition 6.2.) If we further assume $u\sim v^{k}$, where $k\in(0,\infty)$, then

$$C_{V|U}(v|u)\sim\big(1+u^{(1/k-1)/q}\big)^{q-1},\quad(u,v)\to(0,0),\ u\sim v^{k}.$$

Depending on the value of $k$, there are three different cases:

$$C_{V|U}(v|u)\sim\begin{cases}1+(q-1)\,u^{(1/k-1)/q}\to 1 & \text{if } k>1,\\ 2^{q-1}\in(0,1) & \text{if } k=1,\\ u^{\frac{q-1}{q}\left(\frac1k-1\right)}\to 0 & \text{if } 0<k<1.\end{cases} \tag{6.12}$$

Lower tail intermediate dependence ($0<r<1$)

If $0<r<1$, then $C(u,v)$ has lower tail intermediate dependence with $1<\kappa_L(C)=2^{r}<2$ (Hua and Joe, 2011).
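In this case, $\psi'(s)\sim-a_1 a_2 r s^{q+r-1}\exp(-a_2 s^{r})$ and $\psi^{-1}(u)\sim\big((-\log u)/a_2\big)^{1/r}$, as $s\to\infty$ and $u\to 0$. Hence

$$C_{V|U}(v|u)\sim\frac{\big(\psi^{-1}(u)+\psi^{-1}(v)\big)^{q+r-1}\exp\big(-a_2(\psi^{-1}(u)+\psi^{-1}(v))^{r}\big)}{\big(\psi^{-1}(u)\big)^{q+r-1}\exp\big(-a_2(\psi^{-1}(u))^{r}\big)}$$
$$\sim\Big(1+\frac{\psi^{-1}(v)}{\psi^{-1}(u)}\Big)^{q+r-1}\exp\Big\{-a_2\big(\psi^{-1}(u)\big)^{r}\Big[\Big(1+\frac{\psi^{-1}(v)}{\psi^{-1}(u)}\Big)^{r}-1\Big]\Big\},\quad u\to 0. \tag{6.13}$$

If $v\in(0,1)$ is fixed, then as $u\to 0$,

$$C_{V|U}(v|u)\sim\Big(1+(q+r-1)\frac{\psi^{-1}(v)}{\psi^{-1}(u)}\Big)\exp\Big(-a_2 r\,\psi^{-1}(v)\big(\psi^{-1}(u)\big)^{r-1}\Big)$$
$$\sim\Big(1+(q+r-1)\frac{\psi^{-1}(v)}{\psi^{-1}(u)}\Big)\Big(1-a_2 r\,\psi^{-1}(v)\big(\psi^{-1}(u)\big)^{r-1}\Big)$$
$$\sim 1-a_2 r\,\psi^{-1}(v)\big(\psi^{-1}(u)\big)^{r-1}\sim 1-a_2^{1/r}\,r\,\psi^{-1}(v)\,(-\log u)^{1-1/r}\to 1,\quad u\to 0,\ v\in(0,1)\text{ fixed}. \tag{6.14}$$

For the conditional quantile, we set $C_{V|U}(v|u)=\alpha\in(0,1)$ and solve for $v$. According to Equation 6.13, if $\psi^{-1}(v)/\psi^{-1}(u)$ does not converge to 0, then $C_{V|U}(v|u)\to 0$. It must be that $\psi^{-1}(v)/\psi^{-1}(u)\to 0$ and

$$C_{V|U}(v|u)\sim\exp\Big(-a_2 r\,\psi^{-1}(v)\big(\psi^{-1}(u)\big)^{r-1}\Big)\sim\alpha.$$

Solving for $v$, we have

$$C^{-1}_{V|U}(\alpha|u)\sim\exp\Big[-\Big(\frac{-\log\alpha}{r}\Big)^{r}(-\log u)^{1-r}\Big]\to 0,\quad u\to 0.$$

(The above is one result in Proposition 6.2.) If $u,v\to 0$, the conditional distribution is asymptotic to

$$C_{V|U}(v|u)\sim\Big(1+\Big(\frac{-\log v}{-\log u}\Big)^{1/r}\Big)^{q+r-1}\times\exp\Big(-(-\log u)\Big(1+\Big(\frac{-\log v}{-\log u}\Big)^{1/r}\Big)^{r}+(-\log u)\Big),\quad(u,v)\to(0,0). \tag{6.15}$$

If we further assume $u\sim v^{k}$, where $k\in(0,\infty)$, then

$$C_{V|U}(v|u)\sim\Big(1+\Big(\frac1k\Big)^{1/r}\Big)^{q+r-1}u^{\left(1+(1/k)^{1/r}\right)^{r}-1}\to 0, \tag{6.16}$$

regardless of the value of $k$.

Lower tail quadrant independence ($r=1$)

If $r=1$, then the copula has $\kappa_L=2$ and $\mathrm{support}(C_{V|U}(\cdot|0))=(0,1)$. Therefore $\lim_{u\to 0}C_{V|U}(v|u)\in(0,1)$ if $v\in(0,1)$, and $C_{V|U}(v|u)\to 0$ if $(u,v)\to(0,0)$.

6.3.2 Upper tail

(Proposition 3 in Hua and Joe (2011)) Suppose $\psi(s)$ is the Laplace transform (LT) of a positive variable $Y$ with $k<M<k+1$, where $M=\sup\{m\ge 0: E(Y^{m})<\infty\}$ and $k\in\{0\}\cup\mathbb{N}^{+}$. If $|\psi^{(k)}(0)-\psi^{(k)}(s)|$ is regularly varying at $0^+$, then $|\psi^{(k)}(0)-\psi^{(k)}(s)|\in RV_{M-k}(0^+)$. In particular, if the slowly varying component is $\ell(s)$ and $\lim_{s\to 0^+}\ell(s)=h_{k+1}$ with $0<h_{k+1}<\infty$, then

$$\psi(s)=\sum_{i=0}^{k}(-1)^{i}h_i s^{i}+(-1)^{k+1}h_{k+1}s^{M}+o(s^{M}),\quad s\to 0^+, \tag{6.17}$$

where $h_0=1$ and $0<h_i<\infty$ for $i=1,\ldots,k+1$. If $0<M<1$, then $k=0$.

Upper tail dependence ($0<M<1$)

By Proposition 4 in Hua and Joe (2011), if $0<M<1$, then $C(u,v)$ has upper tail dependence with $\lambda_U=2-2^{M}$.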
In this case, $\psi(s)$ can be written as

$$\psi(s)\sim 1-h_1 s^{M},$$

as $s\to 0^+$, where $0<h_1<\infty$ and $0<M<1$. Therefore, $\psi'(s)\sim-h_1 M s^{M-1}$ and

$$\psi^{-1}(u)\sim\Big(\frac{1-u}{h_1}\Big)^{1/M},\quad u\to 1.$$

If $v\in(0,1)$ and $\alpha\in(0,1)$ are fixed, then as $u\to 1$,

$$C_{V|U}(v|u)\sim-\frac{\psi'(\psi^{-1}(v))}{h_1^{1/M}M}(1-u)^{(1-M)/M}\to 0. \tag{6.18}$$

For the conditional quantile, we set $C_{V|U}(v|u)=\alpha\in(0,1)$ and solve for $v$. Since $C_{V|U}(v|u)=\psi'(\psi^{-1}(u)+\psi^{-1}(v))/\psi'(\psi^{-1}(u))$, if $v$ does not converge to 1, then $C_{V|U}(v|u)\to 0$. Therefore, we have $(u,v)\to(1,1)$ and the conditional distribution is asymptotic to

$$C_{V|U}(v|u)\sim\Big(1+\Big(\frac{1-v}{1-u}\Big)^{1/M}\Big)^{M-1}. \tag{6.19}$$

Solving for $v$, the conditional quantile function is

$$C^{-1}_{V|U}(\alpha|u)\sim 1-\big(\alpha^{1/(M-1)}-1\big)^{M}(1-u)\to 1,\quad u\to 1.$$

(The above is one result in Proposition 6.2.) If we further assume $(1-u)\sim(1-v)^{k}$, where $k\in(0,\infty)$, then

$$C_{V|U}(v|u)\sim\big(1+(1-u)^{(1/k-1)/M}\big)^{M-1},\quad u\to 1. \tag{6.20}$$

Depending on the value of $k$, there are three different cases:

$$C_{V|U}(v|u)\sim\begin{cases}(1-u)^{(M-1)(1/k-1)/M}\to 0 & \text{if } k>1,\\ 2^{M-1}\in(0,1) & \text{if } k=1,\\ 1+(M-1)(1-u)^{(1/k-1)/M}\to 1 & \text{if } 0<k<1.\end{cases} \tag{6.21}$$

Upper tail intermediate dependence / quadrant independence ($M>1$)

By Proposition 7 in Hua and Joe (2011), if $M>1$, then the copula $C$ has upper tail intermediate dependence or independence, and $\mathrm{support}(C_{V|U}(\cdot|1))=(0,1)$. Therefore, $\lim_{u\to 1}C_{V|U}(v|u)\in(0,1)$ if $v\in(0,1)$, and $C_{V|U}(v|u)\to 1$ if $(u,v)\to(1,1)$.

6.4 Trivariate asymptotic conditional quantile

In this section, we aim to extend some results from the previous sections to trivariate distributions. Specifically, we study the trivariate case $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2)$ with a trivariate vine copula model. However, extending from bivariate to trivariate is not trivial, since the asymptotic conditional quantile function depends on the direction in which $(x_1,x_2)$ goes to infinity.

6.4.1 Trivariate strongest functional relationship

We first study the strongest dependence: a functional relationship between the response variable $Y$ and explanatory variables $X_1$ and $X_2$.
Since the marginal distribution of $Y$ does not have a closed-form expression in general, it is difficult to obtain a general result. We focus on a special case where the marginal distribution of $Y$ can be calculated. It is shown that the conditional quantile function is asymptotically linear in $x_1$ or $x_2$ along a ray on the $(x_1,x_2)$-plane, and this is an extension of the bivariate strongly linear case.

Let $Y^*=h(X_1^*,X_2^*)$, where $h$ is monotonically increasing in each argument, $Y^*\sim F_{Y^*}$, $X_1^*\sim F_{X_1^*}$ and $X_2^*\sim F_{X_2^*}$. Transforming them to $N(0,1)$ variables, we define $Y=\Phi^{-1}(F_{Y^*}(Y^*))$, $X_1=\Phi^{-1}(F_{X_1^*}(X_1^*))$ and $X_2=\Phi^{-1}(F_{X_2^*}(X_2^*))$. The following functional relationship holds:

$$Y=\Phi^{-1}\circ F_{Y^*}\circ h\Big(F^{-1}_{X_1^*}(\Phi(X_1)),\,F^{-1}_{X_2^*}(\Phi(X_2))\Big):=g(X_1,X_2). \tag{6.22}$$

We are interested in the conditional quantile $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2)=g(x_1,x_2)$. In this case, it is obvious that the conditional quantile function does not depend on the quantile level $\alpha$. It is conjectured that it is a generalization of the bivariate strongly linear case, in the sense that $g$ is asymptotically linear as $(x_1,x_2)\to(\infty,\infty)$ or $(-\infty,-\infty)$ along different rays, even though $g$ can be quite nonlinear in two variables.

We focus on a special case to gain insight into the asymptotic behavior of $g(x_1,x_2)$. Assume $Y^*=X_1^*+X_2^*$, and $X_1^*$, $X_2^*$ follow Gamma$(\alpha_1,1)$ and Gamma$(\alpha_2,1)$ independently. As a result, $Y^*$ follows Gamma$(\alpha_1+\alpha_2,1)$. Using tail expansions of the gamma CDF at 0 and $\infty$, the following can be obtained:

• If $x_2\sim kx_1$ as $x_1,x_2\to+\infty$, then $g(x_1,x_2)\sim\sqrt{1+k^2}\,x_1$.
• If $x_2\sim kx_1$ as $x_1,x_2\to-\infty$, then
$$g(x_1,x_2)\sim\begin{cases}\sqrt{\dfrac{\alpha_1+\alpha_2}{\alpha_1}}\,x_1 & \text{if } k^2\ge\dfrac{\alpha_2}{\alpha_1},\\[2ex] \sqrt{\dfrac{\alpha_1+\alpha_2}{\alpha_2}}\,kx_1 & \text{if } k^2<\dfrac{\alpha_2}{\alpha_1}.\end{cases}$$

Detailed derivations can be found in Appendix A.1. Although the conditional quantile function is not linear in both $x_1$ and $x_2$, it is asymptotically linear in $x_1$ or $x_2$ along a ray. Note that this is an asymptotic property and it is true only if $x_1$ and $x_2$ are large enough.
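The upper-tail statement $g(x_1,kx_1)\sim\sqrt{1+k^2}\,x_1$ can be checked numerically in the simplest instance of the gamma example. The sketch below assumes $\alpha_1=\alpha_2=1$ (a simplification chosen here so that everything has a closed form): then $X_1^*$ and $X_2^*$ are Exp(1), and $Y^*$ is Gamma(2,1) with survival function $(1+y)e^{-y}$. The function names are hypothetical, and the code works with survival probabilities so that it stays accurate in the upper tail only.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def normal_sf(x):
    """Upper-tail probability 1 - Phi(x), accurate for large positive x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def normal_isf(p):
    """Inverse of normal_sf: the z with 1 - Phi(z) = p."""
    return -STD_NORMAL.inv_cdf(p)


def g_upper(x1, x2):
    """g(x1, x2) of (6.22) for Y* = X1* + X2* with X1*, X2* ~ Gamma(1, 1).

    Gamma(1, 1) is Exp(1), so F^{-1}(Phi(x)) = -log(1 - Phi(x)); the sum is
    Gamma(2, 1), whose survival function is (1 + y) * exp(-y).
    """
    y_star = -math.log(normal_sf(x1)) - math.log(normal_sf(x2))
    log_sf_y = math.log1p(y_star) - y_star
    return normal_isf(math.exp(log_sf_y))


if __name__ == "__main__":
    for k in (0.5, 1.0, 2.0):
        for x1 in (2.0, 4.0, 8.0):
            slope = g_upper(x1, k * x1) / x1
            print(f"k={k:g} x1={x1:g}  g/x1={slope:.4f}  sqrt(1+k^2)={math.sqrt(1 + k * k):.4f}")
```

The ratio $g/x_1$ approaches $\sqrt{1+k^2}$ from below and only slowly, which is in line with the remark that the linearity is an asymptotic property.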
The rate of asymptotic approximation depends on $k$, $\alpha_1$ and $\alpha_2$ because of the arguments $-x_1^2/(2\alpha_1)$ and $-k^2x_1^2/(2\alpha_2)$ in the exponential function.

6.4.2 Trivariate conditional boundary distribution with bivariate Archimedean copulas

For a trivariate vine copula model, it is difficult to get general results to cover all types of bivariate copulas for the vine, but it is possible to get results for the cases where all bivariate copulas are Archimedean. This provides some insight on the tail behavior of conditional quantile functions. Specifically, we study how the tail properties of the bivariate Archimedean copulas on the edges affect the asymptotic behavior of the conditional CDF. It turns out that trivariate cases are more complex than bivariate ones. For a bivariate Archimedean copula, the boundary conditional CDF $C_{V|U}(v|0)$ is either a distribution with support on all of $(0,1)$ or a degenerate distribution at 0. However, depending on the bivariate copulas on the edges, the conditional CDF $C_{3|12}(v|u_1,u_2)$ could be a distribution with support on all of $(0,1)$, a degenerate distribution at 0, or a degenerate distribution at 1 (the unusual case), as $(u_1,u_2)\to(0,0)$ along a ray. The results are summarized in Tables 6.1 and 6.2.

Some results on the asymptotic conditional distributions of bivariate Archimedean copulas are presented in Section 6.3. Based on those results, we study the conditional distribution of trivariate vine copulas with bivariate Archimedean copulas. Specifically, we are interested in the boundary conditional distribution of a trivariate vine copula with $C_{12}$, $C_{23}$ in tree 1 and $C_{13;2}$ in tree 2:

$$u_{3|12}:=C_{3|12}(v|u_1,u_2)=C_{3|1;2}(u_{3|2}|u_{1|2}), \tag{6.23}$$

where $u_{3|2}:=C_{3|2}(v|u_2)$ and $u_{1|2}:=C_{1|2}(u_1|u_2)$, $C_{3|1;2}(b|a)=\partial C_{13;2}(a,b)/\partial a$, $v\in(0,1)$ and $(u_1,u_2)\to(0,0)$ or $(u_1,u_2)\to(1,1)$. $C_{1|2}$, $C_{3|2}$ and $C_{3|1;2}$ are the conditional distributions of $C_{12}$, $C_{23}$ and $C_{13;2}$ respectively.

As $(u_1,u_2)\to(0,0)$, $u_{3|12}$ can, in some cases, depend on $C_{3|1;2}(\cdot|0)$ or $C_{3|1;2}(\cdot|1)$, as well as $C_{3|2}(\cdot|0)$ and $C_{1|2}(\cdot|0)$.
Similarly, as $(u_1,u_2) \to (1,1)$, $u_{3|12}$ can, in some cases, depend on $C_{3|1;2}(\cdot|0)$ or $C_{3|1;2}(\cdot|1)$, as well as $C_{3|2}(\cdot|1)$ and $C_{1|2}(\cdot|1)$. This is why the trivariate and higher-dimensional cases of boundary conditional CDF can be complicated. Also, in some cases, the form of the boundary conditional CDF depends on the direction of $(u_1,u_2) \to (0,0)$ or $(1,1)$.

Given the copula boundary conditional distribution $u_{3|12}$, we can obtain its equivalent on the normal scale. Let the trivariate vine copula be the CDF of a random vector $(U_1,U_2,V)$, and define $X_1 = \Phi^{-1}(U_1)$, $X_2 = \Phi^{-1}(U_2)$ and $Y = \Phi^{-1}(V)$. We are interested in the conditional quantile function $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2)$, as $x_1,x_2 \to -\infty$ and $x_1/x_2$ converges to a constant. For a fixed quantile level $\alpha$,
• If $u_{3|12} \to 0$ as $(u_1,u_2) \to (0,0)$ and $u_2 \sim u_1^k$, then $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2) \to +\infty$ as $x_1,x_2 \to -\infty$ and $x_2/x_1 \to \sqrt{k}$.
• If $u_{3|12}$ converges to a constant in $(0,1)$ as $(u_1,u_2) \to (0,0)$ and $u_2 \sim u_1^k$, then $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2)$ converges to a finite constant as $x_1,x_2 \to -\infty$ and $x_2/x_1 \to \sqrt{k}$.
• If $u_{3|12} \to 1$ as $(u_1,u_2) \to (0,0)$ and $u_2 \sim u_1^k$, then $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2) \to -\infty$ as $x_1,x_2 \to -\infty$ and $x_2/x_1 \to \sqrt{k}$.
Similar results hold for the upper tail, that is, $u_1,u_2 \to 1$ and $(1-u_2) \sim (1-u_1)^k$.

Table 6.1 (rows: limit of $u_{1|2} = C_{1|2}(u_1|u_2)$ as $u_1,u_2 \to 0$; columns: limit of $u_{3|2} = C_{3|2}(v|u_2)$ as $u_2 \to 0$, and the tail order $\kappa_{13}$ of $C_{13;2}$; entries: limit of $u_{3|12}$)

                   lim u3|2 ∈ (0,1)                      lim u3|2 = 1
lim u1|2     κ13 = 2   κ13 ∈ (1,2)   κ13 = 1      κ13 = 2   κ13 ∈ (1,2)   κ13 = 1
0            (0,1)     1             1            1         1             1
(0,1)        (0,1)     (0,1)         (0,1)        1         1             1
1            (0,1)     (0,1)         0            1         1             ∗

Table 6.1: The taxonomy of the lower tail boundary conditional distribution $\lim_{u_1,u_2\to0} u_{3|12}$, where $u_{3|12}$ is defined in Equation 6.23. For the first (non-heading) row where $\lim_{u_1,u_2\to0} C_{1|2}(u_1|u_2) = 0$, $\kappa_{13}$ represents $\kappa_{13L}$, the lower tail order of $C_{13;2}$. Similarly, for the third (non-heading) row, where $\lim_{u_1,u_2\to0} C_{1|2}(u_1|u_2) = 1$, $\kappa_{13}$ represents $\kappa_{13U}$, the upper tail order of $C_{13;2}$.

Trivariate vine copula lower tail

Fix $v \in (0,1)$ and let $u_1,u_2 \to 0$ with $u_2 \sim u_1^k$. According to Equation 6.10 and Equation 6.14, depending on the tail order of $C_{23}(u_2,u_3)$, the limit of $u_{3|2} = C_{3|2}(v|u_2)$ could either be a number in $(0,1)$, or 1.
Similarly, according to Section 6.3.1, the limit of $u_{1|2} = C_{1|2}(u_1|u_2)$ could either be 0, a number in $(0,1)$, or 1. Depending on the limit of $u_{1|2}$, we also need to take the corresponding tail behavior of $C_{13;2}$ into consideration. The possible combinations of the tail behaviors are summarized in Table 6.1.

The first (non-heading) row of Table 6.1 corresponds to $u_{1|2} \to 0$.
• If $\lim_{u_2\to0} u_{3|2} \in (0,1)$ and $C_{13;2}$ has $\kappa_{13L} = 2$, then $\mathrm{support}(C_{3|1;2}(\cdot|0)) = (0,1)$. Therefore $\lim_{u_1,u_2\to0} u_{3|12} \in (0,1)$, as shown in row 1, column 1.
• If $\lim_{u_2\to0} u_{3|2} \in (0,1)$ and $C_{13;2}$ has $\kappa_{13L} \in [1,2)$, then $C_{3|1;2}(\cdot|0)$ is a degenerate distribution with a point mass at 0. Therefore $u_{3|12} \to 1$, as shown in row 1, columns 2–3.
• If $u_{3|2} \to 1$, then $u_{3|12} \to 1$, regardless of the tail behavior of $C_{13;2}$. This is shown in row 1, columns 4–6.

The second (non-heading) row of Table 6.1 corresponds to $\lim_{u_1,u_2\to0} u_{1|2} \in (0,1)$. In this case, the tail behavior of $C_{13;2}$ is irrelevant. If $\lim_{u_2\to0} u_{3|2} \in (0,1)$, then $\lim_{u_1,u_2\to0} u_{3|12} \in (0,1)$ (row 2, columns 1–3); if $\lim_{u_2\to0} u_{3|2} = 1$, then $\lim_{u_1,u_2\to0} u_{3|12} = 1$ (row 2, columns 4–6).

The third (non-heading) row of Table 6.1 corresponds to $u_{1|2} \to 1$.
• If $\lim_{u_2\to0} u_{3|2} \in (0,1)$ and $C_{13;2}$ has $\kappa_{13U} \in (1,2]$, then $\mathrm{support}(C_{3|1;2}(\cdot|1)) = (0,1)$. Therefore $\lim_{u_1,u_2\to0} u_{3|12} \in (0,1)$, as shown in row 3, columns 1–2.
• If $\lim_{u_2\to0} u_{3|2} \in (0,1)$ and $C_{13;2}$ has $\kappa_{13U} = 1$, then $C_{3|1;2}(\cdot|1)$ is a degenerate distribution with a point mass at 1. Therefore $u_{3|12} \to 0$, as shown in row 3, column 3.
• If $u_{3|2} \to 1$ and $C_{13;2}$ has $\kappa_{13U} \in (1,2]$, then $\mathrm{support}(C_{3|1;2}(\cdot|1)) = (0,1)$. Therefore $u_{3|12} \to 1$, as shown in row 3, columns 4–5.
• If $u_{3|2} \to 1$ and $C_{13;2}$ has $\kappa_{13U} = 1$, then $C_{3|1;2}(\cdot|1)$ is a degenerate distribution with a point mass at 1. The limit of $u_{3|12}$ is unclear and needs further investigation (row 3, column 6, cell ∗). Depending on $\kappa_{23L}$, we have the following results.
(See Section A.2 for a detailed derivation.)
  – If $C_{23}$ has $\kappa_{23L} \in (1,2)$, then $C_{3|1;2}(u_{3|2}|u_{1|2}) \to 0$.
  – If $C_{23}$ has $\kappa_{23L} = 1$, then
\[
C_{3|1;2}(u_{3|2}|u_{1|2}) \to
\begin{cases}
1 & \text{if } -q_{23}^{-1} - q_{12}^{-1}(k^{-1}-1) > 0, \\
\text{const} \in (0,1) & \text{if } -q_{23}^{-1} - q_{12}^{-1}(k^{-1}-1) = 0, \\
0 & \text{if } -q_{23}^{-1} - q_{12}^{-1}(k^{-1}-1) < 0,
\end{cases}
\]
    where $q_{23}$ and $q_{12}$ are the parameters of $\psi_{23}$ for $C_{23}$ and $\psi_{12}$ for $C_{12}$ respectively.

Table 6.2 (rows: limit of $u_{1|2} = C_{1|2}(u_1|u_2)$ as $u_1,u_2 \to 1$; columns: limit of $u_{3|2} = C_{3|2}(v|u_2)$ as $u_2 \to 1$, and the tail order $\kappa_{13}$ of $C_{13;2}$; entries: limit of $u_{3|12}$)

                   lim u3|2 = 0                          lim u3|2 ∈ (0,1)
lim u1|2     κ13 = 2   κ13 ∈ (1,2)   κ13 = 1      κ13 = 2   κ13 ∈ (1,2)   κ13 = 1
0            0         0 (∗)         †            (0,1)     1             1
(0,1)        0         0             0            (0,1)     (0,1)         (0,1)
1            0         0             0            (0,1)     (0,1)         0

Table 6.2: The taxonomy of the upper tail boundary conditional distribution $\lim_{u_1,u_2\to1} u_{3|12}$, where $u_{3|12}$ is defined in Equation 6.23. For the first (non-heading) row where $\lim_{u_1,u_2\to1} C_{1|2}(u_1|u_2) = 0$, $\kappa_{13}$ represents $\kappa_{13L}$, the lower tail order of $C_{13;2}$. Similarly, for the third (non-heading) row, where $\lim_{u_1,u_2\to1} C_{1|2}(u_1|u_2) = 1$, $\kappa_{13}$ represents $\kappa_{13U}$, the upper tail order of $C_{13;2}$.

Trivariate vine copula upper tail

Fix $v \in (0,1)$ and let $(u_1,u_2) \to (1,1)$ with $(1-u_2) \sim (1-u_1)^k$. According to Equation 6.18, depending on the tail order of $C_{23}(u_2,u_3)$, the limit of $u_{3|2} = C_{3|2}(v|u_2)$ could either be a number in $(0,1)$, or 0. Similarly, according to Section 6.3.2, the limit of $u_{1|2} = C_{1|2}(u_1|u_2)$ could either be 0, a number in $(0,1)$, or 1. Depending on the limit of $u_{1|2}$, we also need to take the corresponding tail behavior of $C_{13;2}$ into consideration. The possible combinations of the tail behaviors are summarized in Table 6.2.

The first (non-heading) row of Table 6.2 corresponds to $u_{1|2} \to 0$.
• If $u_{3|2} \to 0$ and $C_{13;2}$ has $\kappa_{13L} = 2$, then $\mathrm{support}(C_{3|1;2}(\cdot|0)) = (0,1)$. Therefore $u_{3|12} \to 0$, as shown in row 1, column 1.
• If $u_{3|2} \to 0$ and $C_{13;2}$ has $\kappa_{13L} \in (1,2)$, then $C_{3|1;2}(\cdot|0)$ is a degenerate distribution with a point mass at 0. The limit of $u_{3|12}$ is unclear and needs further investigation (row 1, column 2, cell ∗). It can be shown that in that case, $u_{3|12} \to 0$. See Section A.2 for a detailed derivation.
• If $u_{3|2} \to 0$ and $C_{13;2}$ has $\kappa_{13L} = 1$, then $C_{3|1;2}(\cdot|0)$ is a degenerate distribution with a point mass at 0.
The limit of $u_{3|12}$ is unclear and needs further investigation (row 1, column 3, cell †). Depending on the relationship among $M_{12}$, $M_{23}$ and $k$, we have the following results. (See Section A.2 for a detailed derivation.)
\[
C_{3|1;2}(u_{3|2}|u_{1|2}) \to
\begin{cases}
0 & \text{if } -\dfrac{M_{23}-1}{M_{23}} - \dfrac{M_{12}-1}{M_{12}}\left(\dfrac{1}{k}-1\right) > 0, \\[2ex]
\text{const} \in (0,1) & \text{if } -\dfrac{M_{23}-1}{M_{23}} - \dfrac{M_{12}-1}{M_{12}}\left(\dfrac{1}{k}-1\right) = 0, \\[2ex]
1 & \text{if } -\dfrac{M_{23}-1}{M_{23}} - \dfrac{M_{12}-1}{M_{12}}\left(\dfrac{1}{k}-1\right) < 0,
\end{cases}
\]
where $M_{23}$ and $M_{12}$ are the parameters of $\psi_{23}$ for $C_{23}$ and $\psi_{12}$ for $C_{12}$ respectively.
• If $\lim_{u_2\to1} u_{3|2} \in (0,1)$ and $C_{13;2}$ has $\kappa_{13L} = 2$, then $\mathrm{support}(C_{3|1;2}(\cdot|0)) = (0,1)$. Therefore $\lim_{(u_1,u_2)\to(1,1)} u_{3|12} \in (0,1)$, as shown in row 1, column 4.
• If $\lim_{u_2\to1} u_{3|2} \in (0,1)$ and $C_{13;2}$ has $\kappa_{13L} \in [1,2)$, then $C_{3|1;2}(\cdot|0)$ is a degenerate distribution with a point mass at 0. Therefore $u_{3|12} \to 1$, as shown in row 1, columns 5–6.

The second (non-heading) row of Table 6.2 corresponds to $\lim_{(u_1,u_2)\to(1,1)} u_{1|2} \in (0,1)$. In this case, the tail behavior of $C_{13;2}$ is irrelevant. If $\lim_{u_2\to1} u_{3|2} = 0$, then $\lim_{(u_1,u_2)\to(1,1)} u_{3|12} = 0$ (row 2, columns 1–3); if $\lim_{u_2\to1} u_{3|2} \in (0,1)$, then $\lim_{(u_1,u_2)\to(1,1)} u_{3|12} \in (0,1)$ (row 2, columns 4–6).

The third (non-heading) row of Table 6.2 corresponds to $u_{1|2} \to 1$.
• If $u_{3|2} \to 0$, then $u_{3|12} \to 0$, regardless of the tail property of $C_{13;2}$. This is shown in row 3, columns 1–3.
• If $\lim_{u_2\to1} u_{3|2} \in (0,1)$ and $C_{13;2}$ has $\kappa_{13U} \in (1,2]$, then $\mathrm{support}(C_{3|1;2}(\cdot|1)) = (0,1)$. Therefore $\lim_{(u_1,u_2)\to(1,1)} u_{3|12} \in (0,1)$, as shown in row 3, columns 4–5.
• If $\lim_{u_2\to1} u_{3|2} \in (0,1)$ and $C_{13;2}$ has $\kappa_{13U} = 1$, then $C_{3|1;2}(\cdot|1)$ is a degenerate distribution with a point mass at 1. Therefore $u_{3|12} \to 0$, as shown in row 3, column 6.

6.4.3 Case studies: trivariate conditional quantile

In this section, we provide a few examples to illustrate how to use the results in Table 6.1 and Table 6.2 to derive the boundary conditional quantiles for a trivariate vine with bivariate Archimedean copulas. Analytic results are provided for those examples to illustrate how the tail properties of bivariate copulas on edges of the vine can affect asymptotic properties of conditional quantiles.
We use the same setting as before: $v \in (0,1)$ and $(u_1,u_2) \to (0,0)$ or $(u_1,u_2) \to (1,1)$; we are interested in the limit of $u_{3|12} = C_{3|1;2}(u_{3|2}|u_{1|2})$ as well as the conditional quantile function on the normal scale.

In case 1, the two linking copulas to $Y$ have $\kappa_U = 1$ and $\kappa_L \in (1,2)$. In case 2, the linking copulas to $Y$ have $\kappa_U = \kappa_L = 2$. The less straightforward case 3 has $\kappa_U = \kappa_L = 2$ for the linking copula to $Y$ in tree 1 and $\kappa_U = 1$, $\kappa_L \in (1,2)$ for the linking copula to $Y$ in tree 2.

Case 1

$C_{12}$, $C_{23}$ and $C_{13;2}$ are all bivariate Gumbel copulas, with lower tail intermediate dependence and upper tail dependence.

Lower tail ($(u_1,u_2) \to (0,0)$): $C_{12}$ has $\kappa_{12L} \in (1,2)$. According to Equation 6.16, $u_{1|2} \to 0$. $C_{23}$ has $\kappa_{23L} \in (1,2)$. According to Equation 6.14, $u_{3|2} \to 1$. Finally, $C_{13;2}$ has $\kappa_{13L} \in (1,2)$. The combination of the three copulas corresponds to row 1, column 5 in Table 6.1, that is, $u_{3|12} \to 1$. On the normal scale, the conditional quantile $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2) \to -\infty$ as $x_1,x_2 \to -\infty$. A more detailed analysis (see Appendix A.3) shows that
\[
F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2) = O\bigl((-\log\alpha)^{r_{23}r_{13;2}/2}\,|x_2|^{1-r_{23}r_{13;2}}\bigr), \quad x_1,x_2 \to -\infty,\ x_2/x_1 \to \sqrt{k},
\]
where $r_{23}$ and $r_{13;2}$ are parameters of the LT $\psi_{23}$ for $C_{23}$ and $\psi_{13;2}$ for $C_{13;2}$ respectively. Since $1 - r_{23}r_{13;2} < 1$, the conditional quantile function goes to $-\infty$ sublinearly with respect to $x_1$ or $x_2$.

Upper tail ($(u_1,u_2) \to (1,1)$): $C_{23}$ has $\kappa_{23U} = 1$. According to Equation 6.18, $u_{3|2} \to 0$. $C_{12}$ has $\kappa_{12U} = 1$. Applying Equation 6.21, we need to investigate the rate at which $u_1$ and $u_2$ go to 1. Assuming $(1-u_2) \sim (1-u_1)^k$:
• If $k > 1$, then $u_{1|2} \to 0$. $C_{13;2}$ has $\kappa_{13L} \in (1,2)$. This corresponds to row 1, column 2 in Table 6.2, that is, $u_{3|12} \to 0$.
• If $k = 1$, then $\lim u_{1|2} \in (0,1)$. This corresponds to row 2, columns 1–3 in Table 6.2, that is, $u_{3|12} \to 0$.
• If $0 < k < 1$, then $u_{1|2} \to 1$. We need to focus on the upper tail of $C_{13;2}$, which has $\kappa_{13U} = 1$. This corresponds to row 3, column 3 in Table 6.2, that is, $u_{3|12} \to 0$.
Therefore, $u_{3|12} \to 0$ regardless of the value of $k$.
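The limits stated for case 1 can be explored numerically with a small sketch of Equation 6.23 in which all three pair-copulas are Gumbel. The common parameter value $\delta = 1.5$, the value $v = 0.5$, the evaluation points and the function names are assumptions chosen only for illustration; they are not the parameters used in the thesis. The conditional CDF is written in terms of $x = -\log u$ and $y = -\log v$ so that arguments very close to 0 or 1 do not lose precision, and the upper tail uses $-\log u \approx 1 - u$ to encode $(1-u_2) \sim (1-u_1)^k$.

```python
import math

def gumbel_h(y, x, delta):
    """Gumbel conditional CDF C(v|u) = dC(u, v)/du, with x = -log(u), y = -log(v)."""
    s = (x ** delta + y ** delta) ** (1.0 / delta)
    return math.exp(x - s) * (s / x) ** (1.0 - delta)

def u3_given_12(v, x1, x2, delta=1.5):
    """Equation 6.23 with C12, C23, C13;2 all Gumbel(delta); x_i = -log(u_i)."""
    a = gumbel_h(x1, x2, delta)                 # u_{1|2} = C_{1|2}(u1|u2)
    b = gumbel_h(-math.log(v), x2, delta)       # u_{3|2} = C_{3|2}(v|u2)
    return gumbel_h(-math.log(b), -math.log(a), delta)  # C_{3|1;2}(b|a)

v = 0.5

# Lower tail with u2 = u1 (k = 1): x = -log(u) grows as u -> 0.
lower = [u3_given_12(v, x, x) for x in (5.0, 14.0)]

# Upper tail: x1 = t and x2 = t**k with t -> 0, for three values of k.
upper = {k: [u3_given_12(v, t, t ** k) for t in (1e-3, 1e-6)]
         for k in (0.5, 1.0, 2.0)}

print("lower tail:", lower)
for k, vals in upper.items():
    print("upper tail, k =", k, vals)
```

Under these assumptions the lower-tail values increase towards 1 (slowly, in line with intermediate tail dependence), while the upper-tail values decrease towards 0 for each of the three choices of $k$.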
On the normal scale, the conditional quantile $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2) \to +\infty$ as $x_1,x_2 \to +\infty$. A more detailed analysis shows that, if $x_2/x_1 \to \sqrt{k}$ as $x_1,x_2 \to +\infty$, then
\[
F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2) \sim
\begin{cases}
x_2 & \text{if } k \ge 1, \\[1ex]
\sqrt{1 + \dfrac{M_{23}}{M_{12}}\left(\dfrac{1}{k}-1\right)}\; x_2 & \text{if } 0 < k < 1.
\end{cases}
\]
In summary, as $x_1$ and $x_2$ go to $+\infty$, the conditional quantile goes to $+\infty$ linearly; as $x_1$ and $x_2$ go to $-\infty$, the conditional quantile goes to $-\infty$ sublinearly. This is a natural extension of the conditional quantile function of the bivariate Gumbel copula. Figure 6.2a shows the conditional quantile $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2)$ for $\alpha = 0.25$ and $0.75$. The parameters of the copulas $\delta_{12}$, $\delta_{23}$, and $\delta_{13;2}$ are chosen such that the corresponding Spearman correlation $\rho_S = 0.5$.

Case 2

• $C_{12}$ is a bivariate Frank copula, with $\kappa_{12L} = \kappa_{12U} = 2$.
• $C_{23}$ is a bivariate Frank copula, with $\kappa_{23L} = \kappa_{23U} = 2$.
• $C_{13;2}$ could be any bivariate copula.

In this case, row 2, columns 1–3 in Table 6.1 and row 2, columns 4–6 in Table 6.2 apply. That is, $\lim_{(u_1,u_2)\to(0,0)} u_{3|12} \in (0,1)$ and $\lim_{(u_1,u_2)\to(1,1)} u_{3|12} \in (0,1)$. On the normal scale, the conditional quantile $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2)$ converges to a finite constant as $x_1,x_2 \to +\infty$ or $-\infty$. This example shows that, if the bivariate copulas on the first level have $\kappa_L = 2$ (or $\kappa_U = 2$), then regardless of the second level copula, the conditional lower (upper) quantile is asymptotically constant.

Figure 6.2: Conditional quantile surface $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2)$ in cases 1 and 3, for $\alpha = 0.25$ and $0.75$. (a) Case 1: as $x_1,x_2 \to +\infty$, the conditional quantile goes to $+\infty$ linearly; as $x_1,x_2 \to -\infty$, the conditional quantile goes to $-\infty$ sublinearly. (b) Case 3: as $x_1,x_2 \to +\infty$ at different rates, the conditional quantile could either go up or down.

Case 3

• $C_{12}$ is a bivariate Gumbel copula, with $\kappa_{12L} \in (1,2)$ and $\kappa_{12U} = 1$.
• $C_{23}$ is a bivariate Frank copula, with $\kappa_{23L} = \kappa_{23U} = 2$.
• $C_{13;2}$ is a bivariate Gumbel copula, with $\kappa_{13L} \in (1,2)$ and $\kappa_{13U} = 1$.

Lower tail ($(u_1,u_2) \to (0,0)$): $C_{12}$ has $\kappa_{12L} \in (1,2)$. According to Equation 6.16, $u_{1|2} \to 0$ and $-\log u_{1|2} = O(-\log u_2)$.
Since $C_{23}$ has $\mathrm{support}(C_{3|2}(\cdot|0)) = (0,1)$ and $\kappa_{23L} = 2$, $\lim_{u_2\to0} u_{3|2} \in (0,1)$. Finally, $C_{13;2}$ has $\kappa_{13L} \in (1,2)$. The combination of the three copulas corresponds to row 1, column 2 in Table 6.1, that is, $u_{3|12} \to 1$. On the normal scale, the conditional quantile $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2) \to -\infty$ as $x_1,x_2 \to -\infty$. According to Proposition 6.2, the conditional quantile $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2)$ is sublinear with respect to $x_1$ or $x_2$, if $x_2/x_1 \to \sqrt{k}$.

Upper tail ($(u_1,u_2) \to (1,1)$): Since $C_{23}$ has $\kappa_{23U} = 2$ and $\mathrm{support}(C_{3|2}(\cdot|1)) = (0,1)$, $\lim_{u_2\to1} u_{3|2} \in (0,1)$. $C_{12}$ has $\kappa_{12U} = 1$. Applying Equation 6.21, we need to investigate the rate at which $u_1$ and $u_2$ go to 1. Assuming $(1-u_2) \sim (1-u_1)^k$:
• If $k > 1$, then $u_{1|2} \to 0$. $C_{13;2}$ has $\kappa_{13L} \in (1,2)$. This corresponds to row 1, column 5 in Table 6.2, that is, $u_{3|12} \to 1$. On the normal scale, the conditional quantile $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2) \to -\infty$ as $x_1,x_2 \to +\infty$ and $x_2/x_1 \to \sqrt{k}$. The conditional quantile $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2)$ is sublinear with respect to $x_1$ or $x_2$.
• If $k = 1$, then $\lim u_{1|2} \in (0,1)$. This corresponds to row 2, columns 4–6 in Table 6.2, that is, $\lim u_{3|12} \in (0,1)$. On the normal scale, the conditional quantile $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2)$ converges to a finite number as $x_1,x_2 \to +\infty$ and $x_2/x_1 \to 1$.
• If $0 < k < 1$, then $u_{1|2} \to 1$. We need to focus on the upper tail of $C_{13;2}$, which has $\kappa_{13U} = 1$. This corresponds to row 3, column 6 in Table 6.2, that is, $u_{3|12} \to 0$. On the normal scale, the conditional quantile $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2) \to +\infty$ as $x_1,x_2 \to +\infty$ and $x_2/x_1 \to \sqrt{k}$. The conditional quantile $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2)$ is linear with respect to $x_1$ or $x_2$.

Figure 6.3: Conditional quantile $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2)$ versus $x_1$ in case 3 for $\alpha = 0.25$ and $0.75$, as $x_1 \to +\infty$. Panels: (a) $k = 0.5$; (b) $k = 1$; (c) $k = 4$. It shows that the conditional quantile converges to $+\infty$, a finite number, or $-\infty$.

Figure 6.2b shows the conditional quantile surface for $\alpha = 0.25$ and $0.75$. The parameters of the copulas $\delta_{12}$, $\delta_{23}$, and $\delta_{13;2}$ are chosen such that the corresponding Spearman correlation $\rho_S = 0.5$.
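The three upper-tail regimes of case 3 can be illustrated with a short sketch of Equation 6.23 using a Gumbel copula for $C_{12}$ and $C_{13;2}$ and a Frank copula for $C_{23}$. The parameter values $\delta = 1.5$ and $\theta = 3$, the choice $v = 0.5$, and the function names are assumptions for illustration only; in particular they are not the values matching $\rho_S = 0.5$ used for the figures. As before, $x_i = -\log u_i \approx 1 - u_i$ near the upper corner, so $x_2 = x_1^k$ encodes $(1-u_2) \sim (1-u_1)^k$.

```python
import math

def gumbel_h(y, x, delta):
    """Gumbel conditional CDF C(v|u) = dC(u, v)/du, with x = -log(u), y = -log(v)."""
    s = (x ** delta + y ** delta) ** (1.0 / delta)
    return math.exp(x - s) * (s / x) ** (1.0 - delta)

def frank_h(v, u, theta):
    """Frank conditional CDF C(v|u) = dC(u, v)/du for u, v in (0, 1]."""
    a = math.expm1(-theta * u)
    b = math.expm1(-theta * v)
    d = math.expm1(-theta)
    return math.exp(-theta * u) * b / (d + a * b)

def u3_given_12(v, x1, x2, delta=1.5, theta=3.0):
    """Equation 6.23 for case 3: C12 Gumbel, C23 Frank, C13;2 Gumbel."""
    a = gumbel_h(x1, x2, delta)                 # u_{1|2}
    b = frank_h(v, math.exp(-x2), theta)        # u_{3|2}
    return gumbel_h(-math.log(b), -math.log(a), delta)  # C_{3|1;2}(b|a)

v = 0.5
ts = (1e-2, 1e-3, 1e-4)                         # x1 = t -> 0, i.e. u1 -> 1
results = {k: [u3_given_12(v, t, t ** k) for t in ts] for k in (0.5, 1.0, 4.0)}
for k, vals in results.items():
    print("k =", k, vals)
```

With these assumed parameters, the values for $k = 0.5$ shrink towards 0, those for $k = 1$ settle at a number in $(0,1)$, and those for $k = 4$ increase, which is consistent with the limits $0$, $(0,1)$ and $1$ read from Table 6.2. The increase for $k = 4$ is slow, as expected when the relevant tail of $C_{13;2}$ has only intermediate tail dependence.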
Depending on the rate at which $x_1$ and $x_2$ go to $+\infty$, the conditional quantile could go to $+\infty$ or $-\infty$. Figure 6.3 shows the conditional quantile $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2)$ for $\alpha = 0.25$ and $0.75$ as $x_1,x_2 \to \infty$ and $x_2/x_1 \to \sqrt{k}$, for $k = 0.5$, $1$ and $4$. The three cases correspond to weakly linear, asymptotically constant, and sublinear behavior. This example also shows that the asymptotic behavior of the conditional quantile function depends on the direction along which $(x_1,x_2)$ goes to infinity.

The case of $k > 1$ and $F^{-1}_{Y|X_1,X_2}(\alpha|x_1,x_2) \to -\infty$ as $x_1,x_2 \to +\infty$ is unusual, given that all three copulas have positive dependence. One possible explanation is that variable $X_1$ has a strong tail dependence link to $Y$, and variable $X_2$ has a tail quadrant independence link to $Y$; when $k > 1$, $X_2$ goes to infinity faster than $X_1$ and the direction to the limit is more concentrated on the weaker variable.

6.5 Beyond trivariate

Using the results in Section 6.3 and Section 6.4.2 as building blocks, the boundary conditional distribution of a higher-dimensional vine copula can be derived. Take a 4-dimensional vine copula as an example: without loss of generality, we consider a D-vine copula with 1-2-3-4 as the first level tree. The conditional CDF can be represented by
\[
u_{4|123} := C_{4|123}(v|u_1,u_2,u_3) = C_{4|1;23}(u_{4|23}\,|\,u_{1|23}),
\]
where $u_{4|23} := C_{4|2;3}(u_{4|3}|u_{2|3})$, $u_{1|23} := C_{1|3;2}(u_{1|2}|u_{3|2})$, $u_{1|2} := C_{1|2}(u_1|u_2)$, $u_{2|3} = C_{2|3}(u_2|u_3)$, $u_{3|2} = C_{3|2}(u_3|u_2)$, $u_{4|3} = C_{4|3}(v|u_3)$, and $v \in (0,1)$, $u_1,u_2,u_3 \to 0$ or $1$. Applying the techniques demonstrated in Section 6.4.2, the asymptotic behavior of $u_{4|23}$ and $u_{1|23}$ can be obtained. Afterwards, the results in Section 6.3 can be applied to get the limit of $u_{4|123}$. The limit of $u_{4|123}$ could be summarized as a table like Table 6.1 and Table 6.2, but it would be complicated to classify all the possible combinations of the bivariate copula tail behavior.
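The nested representation of $u_{4|123}$ above translates directly into a short recursion over pair-copula conditional CDFs. The sketch below is only an illustration under assumed ingredients: every pair-copula is taken to be Gumbel (which is symmetric, so one function serves for both conditioning directions), and the function names, edge labels and parameter values are hypothetical.

```python
import math

def gumbel_h(v, u, delta):
    """Gumbel conditional CDF C(v|u) = dC(u, v)/du for u, v in (0, 1)."""
    x, y = -math.log(u), -math.log(v)
    s = (x ** delta + y ** delta) ** (1.0 / delta)
    return math.exp(x - s) * (s / x) ** (1.0 - delta)

def cond_cdf_4(v, u1, u2, u3, d):
    """u_{4|123} for the D-vine 1-2-3-4; d maps edge labels to Gumbel parameters."""
    # Tree 1 conditional CDFs.
    u1_2 = gumbel_h(u1, u2, d["12"])
    u3_2 = gumbel_h(u3, u2, d["23"])
    u2_3 = gumbel_h(u2, u3, d["23"])
    u4_3 = gumbel_h(v, u3, d["34"])
    # Tree 2 conditional CDFs.
    u1_23 = gumbel_h(u1_2, u3_2, d["13;2"])
    u4_23 = gumbel_h(u4_3, u2_3, d["24;3"])
    # Tree 3: C_{4|1;23}(u_{4|23} | u_{1|23}).
    return gumbel_h(u4_23, u1_23, d["14;23"])

edges = ("12", "23", "34", "13;2", "24;3", "14;23")
independent = {e: 1.0 for e in edges}     # Gumbel with delta = 1 is independence
dependent = {e: 1.5 for e in edges}       # assumed common parameter

check_indep = cond_cdf_4(0.6, 0.3, 0.4, 0.5, independent)
check_dep = cond_cdf_4(0.6, 0.3, 0.4, 0.5, dependent)
print(check_indep, check_dep)
```

With all parameters equal to 1 the pair-copulas reduce to independence, so the conditional CDF should return $v$ itself, which gives a simple sanity check of the recursion.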
In principle, the idea can be generalized to any dimension.

A general heuristic statement is that if more linking copulas of the $X$'s to $Y$ have $\kappa = 1$, then the tail behavior of conditional quantiles is more likely to be asymptotically linear or sublinear. If all of the linking copulas of the $X$'s to $Y$ have $\kappa = 2$, then the tail behavior of conditional quantiles is asymptotically constant.

Chapter 7
Conclusion

The major contributions of the thesis are improvements in fitting parametric vine copula models, through (a) alternative methods for truncated vine structure learning, and (b) diagnostics for bivariate copula selection for dependence analysis or prediction.

The Monte Carlo tree search (MCTS) algorithm improves on the greedy algorithm for the vine structure. Under the guidance of the vine UCT, our method can effectively explore the large search space of possible truncated vines by balancing between exploration and exploitation. It also performs significantly better than the existing methods under various experimental setups.

The diagnostic tools provide better ways for bivariate copula selection. The use of diagnostics can reduce the number of candidate copula families to consider on edges of the vine. We have also illustrated with real datasets the use of dependence and asymmetry measures as diagnostic tools for bivariate copulas and bivariate conditional distributions. It is a future research direction to automatically and adaptively generate a shortlist of candidate parametric copula families for edges of a vine copula based on diagnostic measures. An alternative is a reverse-delete algorithm: start with a long list of bivariate parametric copula families followed by deletion of families that cannot match the diagnostic summaries.

The vine copula regression method is interpretable and flexible.
Compared with the existing methods that either use D-vines or only handle continuous variables, the proposed method uses R-vines and can fit mixed continuous and ordinal variables. Various shapes of conditional quantiles can be obtained depending on how pair-copulas are chosen on the edges of the vine. For bivariate copulas, the conditional quantile function of the response variable could be asymptotically linear, sublinear, or constant with respect to the explanatory variable. The asymptotic conditional distribution can be quite complex for trivariate and higher-dimensional cases. The performance of the proposed method is evaluated on simulated data sets and the Abalone data set. The heteroscedasticity in the data is better captured by vine copula regression than the standard regression methods.

One possible future research direction is the extension of the proposed regression method for survival outcomes with censored data. For example, Emura et al. (2018) use bivariate copulas to predict time-to-death given time-to-cancer progression; Barthel et al. (2018) apply vine copulas to multivariate right-censored event time data. These types of applications would require more numerical integration methods. Another research direction is to handle variable selection and reduction when there are many explanatory variables, some of which might form clusters with strong dependence.

Bibliography

Aas, K., Czado, C., Frigessi, A., and Bakken, H. (2009). Pair-copula constructions of multiple dependence. Insurance: Mathematics and Economics, 44(2):182–198.
Abramowitz, M. and Stegun, I. A. (1964). Handbook of Mathematical Functions: with Formulas, Graphs, and Mathematical Tables, volume 55. Courier Corporation.
Acar, E. F., Genest, C., and Nešlehová, J. (2012). Beyond simplified pair-copula constructions. Journal of Multivariate Analysis, 110:74–90.
Antweiler, W. (1996). Pacific exchange rate service.
http://fx.sauder.ubc.ca/. [Online; accessed: 2019-02-19].
Azzalini, A. and Capitanio, A. (2003). Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(2):367–389.
Barthel, N., Geerdens, C., Killiches, M., Janssen, P., and Czado, C. (2018). Vine copula based likelihood estimation of dependence patterns in multivariate event time data. Computational Statistics & Data Analysis, 117:109–127.
Bauer, A. and Czado, C. (2016). Pair-copula Bayesian networks. Journal of Computational and Graphical Statistics, 25(4):1248–1271.
Bedford, T. and Cooke, R. M. (2001). Probability density decomposition for conditionally dependent random variables modeled by vines. Annals of Mathematics and Artificial Intelligence, 32(1-4):245–268.
Bedford, T. and Cooke, R. M. (2002). Vines – A new graphical model for dependent random variables. Annals of Statistics, 30(4):1031–1068.
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107(2):238.
Bernard, C. and Czado, C. (2015). Conditional quantiles and tail dependence. Journal of Multivariate Analysis, 138:104–126.
Blomqvist, N. (1950). On a measure of dependence between two random variables. The Annals of Mathematical Statistics, 21(4):593–600.
Bouyé, E. and Salmon, M. (2009). Dynamic copula quantile regressions and tail area dynamic dependence in Forex markets. The European Journal of Finance, 15(7-8):721–750.
Brechmann, E. (2010). Truncated and simplified regular vines and their applications. Master's thesis, Technical University of Munich.
Brechmann, E. C., Czado, C., and Aas, K. (2012). Truncated regular vines in high dimensions with application to financial data. Canadian Journal of Statistics, 40(1):68–85.
Brechmann, E. C. and Joe, H.
(2014). Parsimonious parameterization of correlation matrices using truncated vines and factor analysis. Computational Statistics & Data Analysis, 77:233–251.
Brechmann, E. C. and Joe, H. (2015). Truncation of vine copulas using fit indices. Journal of Multivariate Analysis, 138:19–33.
Brennan, C. W., Verhaak, R. G. W., McKenna, A., Campos, B., Noushmehr, H., Salama, S. R., Zheng, S., Chakravarty, D., Sanborn, J. Z., Berman, S. H., et al. (2013). The somatic genomic landscape of glioblastoma. Cell, 155(2):462–477.
Browne, C. B., Powley, E., Whitehouse, D., Lucas, S. M., Cowling, P. I., Rohlfshagen, P., Tavener, S., Perez, D., Samothrakis, S., and Colton, S. (2012). A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1):1–43.
Chang, B. and Joe, H. (2019). Prediction based on conditional distributions of vine copulas. Computational Statistics & Data Analysis, 139:45–63.
Chang, B., Pan, S., and Joe, H. (2019). Vine copula structure learning via Monte Carlo tree search. In International Conference on Artificial Intelligence and Statistics.
Chaslot, G. M. J. B., Winands, M. H. M., Herik, H. J. v. D., Uiterwijk, J. W. H. M., and Bouzy, B. (2008). Progressive strategies for Monte-Carlo tree search. New Mathematics and Natural Computation, 4(03):343–357.
Childs, B. E., Brodeur, J. H., and Kocsis, L. (2008). Transpositions and move groups in Monte Carlo tree search. In Computational Intelligence and Games, 2008. CIG'08. IEEE Symposium On, pages 389–395. IEEE.
Cook, R. D. and Johnson, M. E. (1981). A family of distributions for modelling non-elliptically symmetric multivariate data. Journal of the Royal Statistical Society. Series B (Methodological), 43(2):210–218.
Cooke, R. M., Joe, H., and Chang, B. (2019). Vine copula regression for observational studies. AStA Advances in Statistical Analysis.
Coulom, R. (2006).
Efficient selectivity and backup operators in Monte-Carlo tree search. In International Conference on Computers and Games, pages 72–83. Springer.
Dette, H., Van Hecke, R., and Volgushev, S. (2014). Some comments on copula-based regression. Journal of the American Statistical Association, 109(507):1319–1324.
Dissmann, J., Brechmann, E. C., Czado, C., and Kurowicka, D. (2013). Selecting and estimating regular vine copulae and application to financial returns. Computational Statistics & Data Analysis, 59:52–69.
Emura, T., Nakatochi, M., Matsui, S., Michimae, H., and Rondeau, V. (2018). Personalized dynamic prediction of death according to tumour progression and high-dimensional genetic factors: meta-analysis with a joint model. Statistical Methods in Medical Research, 27(9):2842–2858.
Fan, J. (1992). Design-adaptive nonparametric regression. Journal of the American Statistical Association, 87(420):998–1004.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recogn. Lett., 27(8):861–874.
Gelly, S. and Wang, Y. (2006). Exploration exploitation in go: UCT for Monte-Carlo go. In NIPS: Neural Information Processing Systems Conference On-line trading of Exploration and Exploitation Workshop.
Genest, C., Ghoudi, K., and Rivest, L.-P. (1995). A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika, 82(3):543–552.
Genest, C., Ghoudi, K., and Rivest, L.-P. (1998). "Understanding relationships using copulas," by Edward Frees and Emiliano Valdez, January 1998. North American Actuarial Journal, 2(3):143–149.
Gijbels, I., Veraverbeke, N., and Omelka, M. (2011). Conditional copulas, association measures and their applications. Computational Statistics & Data Analysis, 55(5):1919–1932.
Gneiting, T. and Raftery, A. E. (2007).
Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378.
Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, 2nd edition.
Hobæk Haff, I., Aas, K., and Frigessi, A. (2010). On the simplified pair-copula construction – simply useful or too simplistic? Journal of Multivariate Analysis, 101(5):1296–1310.
Hua, L. and Joe, H. (2011). Tail order and intermediate tail dependence of multivariate copulas. Journal of Multivariate Analysis, 102(10):1454–1471.
Härdle, W. (1990). Applied Nonparametric Regression. Econometric Society Monographs. Cambridge University Press.
Joe, H. (1993). Parametric families of multivariate distributions with given margins. Journal of Multivariate Analysis, 46(2):262–282.
Joe, H. (1997). Multivariate Models and Dependence Concepts. Chapman and Hall/CRC.
Joe, H. (2005). Asymptotic efficiency of the two-stage estimation method for copula-based models. Journal of Multivariate Analysis, 94(2):401–419.
Joe, H. (2014). Dependence Modeling with Copulas. Chapman & Hall / CRC Press, Boca Raton, FL.
Joe, H. and Hu, T. (1996). Multivariate distributions from mixtures of max-infinitely divisible distributions. Journal of Multivariate Analysis, 57(2):240–265.
Joe, H. and Xu, J. J. (1996). The estimation method of inference functions for margins for multivariate models. University of British Columbia, Department of Statistics, Technical Report, 166.
Kendall, M. G. (1938). A new measure of rank correlation. Biometrika, 30(1/2):81–93.
Kocsis, L. and Szepesvári, C. (2006). Bandit based Monte-Carlo planning. In European Conference on Machine Learning, pages 282–293. Springer.
Kraus, D. and Czado, C. (2017a).
D-vine copula based quantile regression. Computational Statistics & Data Analysis, 110:1–18.
Kraus, D. and Czado, C. (2017b). Growing simplified vine copula trees: improving Dissmann's algorithm. arXiv preprint arXiv:1703.05203.
Krupskii, P. (2017). Copula-based measures of reflection and permutation asymmetry and statistical tests. Statistical Papers, 58(4):1165–1187.
Krupskii, P., Huser, R., and Genton, M. G. (2018). Factor copula models for replicated spatial data. Journal of the American Statistical Association, 113(521):467–479.
Krupskii, P. and Joe, H. (2015). Tail-weighted measures of dependence. Journal of Applied Statistics, 42(3):614–629.
Kurowicka, D. and Cooke, R. (2003). A parameterization of positive definite matrices in terms of partial correlation vines. Linear Algebra and its Applications, 372:225–251.
Kurowicka, D. and Cooke, R. M. (2006). Uncertainty Analysis with High Dimensional Dependence Modelling. Wiley, Chichester.
Kurowicka, D. and Joe, H. (2011). Dependence Modeling: Vine Copula Handbook. World Scientific, Singapore.
Kurz, M. S. and Spanhel, F. (2017). Testing the simplifying assumption in high-dimensional vine copulas. arXiv preprint arXiv:1706.02338.
Lee, D., Joe, H., and Krupskii, P. (2018). Tail-weighted dependence measures with limit being the tail dependence coefficient. Journal of Nonparametric Statistics, 30(2):262–290.
Lichman, M. (2013). UCI machine learning repository.
Loader, C. (1999). Local Regression and Likelihood. New York: Springer-Verlag.
McNeil, A. J., Frey, R., and Embrechts, P. (2015). Quantitative Risk Management: Concepts, Techniques and Tools. Princeton University Press.
Müller, D. and Czado, C. (2019). Dependence modeling in ultra high dimensions with vine copulas and the graphical lasso.
Computational Statistics & Data Analysis, 137:211–232.
Nadaraya, E. A. (1964). On estimating regression. Theory of Probability & Its Applications, 9(1):141–142.
Nagler, T., Bumann, C., and Czado, C. (2019). Model selection in sparse high-dimensional vine copula models with an application to portfolio risk. Journal of Multivariate Analysis.
Nagler, T. and Czado, C. (2016). Evading the curse of dimensionality in nonparametric density estimation with simplified vine copulas. Journal of Multivariate Analysis, 151:69–89.
Nash, W. J., Sellers, T. L., Talbot, S. R., Cawthorn, A. J., and Ford, W. B. (1994). The population biology of abalone (Haliotis species) in Tasmania. I. Blacklip abalone (H. rubra) from the north coast and islands of Bass Strait. Sea Fisheries Division, Technical Report, (48).
Noh, H., El Ghouch, A., and Bouezmarni, T. (2013). Copula-based regression estimate and inference. Journal of the American Statistical Association, 108(502):678–688.
Panagiotelis, A., Czado, C., and Joe, H. (2012). Pair copula constructions for multivariate discrete data. Journal of the American Statistical Association, 107(499):1063–1072.
Parsa, R. A. and Klugman, S. A. (2011). Copula regression. Variance Advancing and Science of Risk, 5:45–54.
Prim, R. C. (1957). Shortest connection networks and some generalizations. Bell Labs Technical Journal, 36(6):1389–1401.
Rosco, J. and Joe, H. (2013). Measures of tail asymmetry for bivariate copulas. Statistical Papers, 54(3):709–726.
Schallhorn, N., Kraus, D., Nagler, T., and Czado, C. (2017). D-vine quantile regression with discrete variables. arXiv preprint arXiv:1705.08310.
Schepsmeier, U., Stoeber, J., Brechmann, E. C., Graeler, B., Nagler, T., and Erhardt, T. (2018). VineCopula: Statistical Inference of Vine Copulas. R package version 2.1.8.
Selten, R. (1998).
Axiomatic characterization of the quadratic scoring rule. Experimental Economics, 1(1):43–61.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489.
Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges. Publications de l'Institut de Statistique de l'Université de Paris, 8:229–231.
Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.
Stöber, J., Hong, H. G., Czado, C., and Ghosh, P. (2015). Comorbidity of chronic diseases in the elderly: Patterns identified by a copula design for mixed responses. Computational Statistics and Data Analysis, 88:28–39.
Stoeber, J., Joe, H., and Czado, C. (2013). Simplified pair copula constructions – limitations and extensions. Journal of Multivariate Analysis, 119:101–118.
Stone, C. J. (1977). Consistent nonparametric regression. The Annals of Statistics, pages 595–620.
Tomczak, K., Czerwińska, P., and Wiznerowicz, M. (2015). The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge. Contemporary Oncology, 19(1A):A68.
Vuong, Q. H. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica: Journal of the Econometric Society, 57(2):307–333.
Watson, G. S. (1964). Smooth regression analysis. Sankhyā: The Indian Journal of Statistics, Series A, 26(4):359–372.
Yoshiba, T. (2018). Maximum likelihood estimation of skew-t copulas with its applications to stock returns. Journal of Statistical Computation and Simulation, 88(13):2489–2506.
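→ page 66

Appendix A

Derivations for Chapter 6

A.1 Derivations for Section 6.4.1

Tail expansions of the gamma CDF

For a random variable $Z$ following Gamma$(\alpha,\beta)$, the CDF is
$$F_Z(z) = \gamma(\alpha, \beta z)/\Gamma(\alpha),$$
where $\beta$ is the rate parameter and $\gamma(\cdot,\cdot)$ is the lower incomplete gamma function. Its CDF $F_Z$ and quantile function $F_Z^{-1}$ have the following asymptotic behavior:
$$F_Z(z) \sim 1 - \frac{(\beta z)^{\alpha-1}}{\Gamma(\alpha)}\, e^{-\beta z}, \quad z\to+\infty; \qquad F_Z^{-1}(p) \sim -\frac{\log(1-p)}{\beta}, \quad p\to 1^-.$$
$$F_Z(z) \sim \frac{(\beta z)^{\alpha}}{\Gamma(\alpha+1)}, \quad z\to 0^+; \qquad F_Z^{-1}(p) \sim \frac{\left(p\,\Gamma(\alpha+1)\right)^{1/\alpha}}{\beta}, \quad p\to 0^+.$$

Upper tail

We first study the upper tail asymptotic behavior of $g(x_1,x_2)$ as $x_1,x_2\to+\infty$. Without loss of generality, we assume $\beta = 1$ because $\beta$ cancels in Equation A.1.
$$\Phi(x_1) \sim 1 - \frac{1}{\sqrt{2\pi}\,x_1}\, e^{-x_1^2/2}, \quad x_1\to+\infty,$$
$$F_{X_1^*}^{-1}(\Phi(x_1)) \sim -\log\left(\frac{1}{\sqrt{2\pi}\,x_1}\, e^{-x_1^2/2}\right) \sim \frac{x_1^2}{2}, \quad x_1\to+\infty.$$
Similarly,
$$F_{X_2^*}^{-1}(\Phi(x_2)) \sim \frac{x_2^2}{2}, \quad x_2\to+\infty.$$
$$F_{Y^*}\left(F_{X_1^*}^{-1}(\Phi(x_1)) + F_{X_2^*}^{-1}(\Phi(x_2))\right) \sim 1 - \frac{1}{\Gamma(\alpha_1+\alpha_2)} \left(\frac{x_1^2+x_2^2}{2}\right)^{\alpha_1+\alpha_2-1} e^{-(x_1^2+x_2^2)/2}, \quad x_1,x_2\to+\infty. \tag{A.1}$$
Finally,
$$g(x_1,x_2) = \Phi^{-1}\left(F_{Y^*}\left(F_{X_1^*}^{-1}(\Phi(x_1)) + F_{X_2^*}^{-1}(\Phi(x_2))\right)\right) \sim \left(-2\log\left(1 - F_{Y^*}\left(F_{X_1^*}^{-1}(\Phi(x_1)) + F_{X_2^*}^{-1}(\Phi(x_2))\right)\right)\right)^{1/2} \sim (x_1^2+x_2^2)^{1/2}, \quad x_1,x_2\to+\infty.$$
If $x_2 \sim k x_1$ as $x_1,x_2\to+\infty$, then $g(x_1,x_2) \sim \sqrt{1+k^2}\,x_1$.

Lower tail

Without loss of generality, we assume $\beta = 1$. For the lower tail, $x_1,x_2\to-\infty$,
$$\Phi(x_1) \sim -\frac{1}{\sqrt{2\pi}\,x_1}\, e^{-x_1^2/2}, \quad x_1\to-\infty,$$
and
$$F_{X_1^*}^{-1}(\Phi(x_1)) \sim \Gamma(\alpha_1+1)^{1/\alpha_1} \left(\Phi(x_1)\right)^{1/\alpha_1} \sim \left(\frac{\Gamma(\alpha_1+1)}{\sqrt{2\pi}}\right)^{1/\alpha_1} \left(-\frac{1}{x_1}\right)^{1/\alpha_1} \exp\left(-\frac{x_1^2}{2\alpha_1}\right), \quad x_1\to-\infty.$$
Similarly,
$$F_{X_2^*}^{-1}(\Phi(x_2)) \sim \left(\frac{\Gamma(\alpha_2+1)}{\sqrt{2\pi}}\right)^{1/\alpha_2} \left(-\frac{1}{x_2}\right)^{1/\alpha_2} \exp\left(-\frac{x_2^2}{2\alpha_2}\right), \quad x_2\to-\infty.$$
$$F_{Y^*}\left(F_{X_1^*}^{-1}(\Phi(x_1)) + F_{X_2^*}^{-1}(\Phi(x_2))\right) \sim \frac{1}{\Gamma(\alpha_1+\alpha_2+1)} \left[ \left(\frac{\Gamma(\alpha_1+1)}{\sqrt{2\pi}}\right)^{1/\alpha_1} \left(-\frac{1}{x_1}\right)^{1/\alpha_1} \exp\left(-\frac{x_1^2}{2\alpha_1}\right) + \left(\frac{\Gamma(\alpha_2+1)}{\sqrt{2\pi}}\right)^{1/\alpha_2} \left(-\frac{1}{x_2}\right)^{1/\alpha_2} \exp\left(-\frac{x_2^2}{2\alpha_2}\right) \right]^{\alpha_1+\alpha_2}.$$
Assuming $x_2 \sim k x_1$, if $k^2 > \alpha_2/\alpha_1$, then $\exp(-x_1^2/(2\alpha_1))$ dominates $\exp(-x_2^2/(2\alpha_2))$. Hence
$$F_{Y^*}\left(F_{X_1^*}^{-1}(\Phi(x_1)) + F_{X_2^*}^{-1}(\Phi(x_2))\right) = O\left( \left(-\frac{1}{x_1}\right)^{(\alpha_1+\alpha_2)/\alpha_1} \exp\left(-\frac{(\alpha_1+\alpha_2)\,x_1^2}{2\alpha_1}\right) \right),$$
and, since $\Phi^{-1}(p) \sim -(-2\log p)^{1/2}$ as $p\to 0^+$,
$$g(x_1,x_2) = \Phi^{-1}\left(F_{Y^*}\left(F_{X_1^*}^{-1}(\Phi(x_1)) + F_{X_2^*}^{-1}(\Phi(x_2))\right)\right) \sim -\left(-2\log F_{Y^*}\left(F_{X_1^*}^{-1}(\Phi(x_1)) + F_{X_2^*}^{-1}(\Phi(x_2))\right)\right)^{1/2} \sim \sqrt{\frac{\alpha_1+\alpha_2}{\alpha_1}}\; x_1, \quad x_1,x_2\to-\infty,\ x_2 \sim k x_1.$$
It can be shown that the result holds for $k^2 = \alpha_2/\alpha_1$ as well.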
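Similarly, if $k^2 < \alpha_2/\alpha_1$, then
$$g(x_1,x_2) \sim \sqrt{\frac{\alpha_1+\alpha_2}{\alpha_2}}\; k x_1, \quad x_1,x_2\to-\infty,\ x_2 \sim k x_1.$$
In summary,
$$g(x_1,x_2) \sim \begin{cases} \sqrt{\dfrac{\alpha_1+\alpha_2}{\alpha_1}}\; x_1 & \text{if } k^2 \ge \alpha_2/\alpha_1, \\[2mm] \sqrt{\dfrac{\alpha_1+\alpha_2}{\alpha_2}}\; k x_1 & \text{if } k^2 < \alpha_2/\alpha_1. \end{cases}$$

A.2 Derivations for Section 6.4.2

Trivariate vine copula lower tail

Fix $v\in(0,1)$ and let $u_1,u_2\to 0$ with $u_2 \sim u_1^k$. According to Equations 6.10 and 6.14, depending on the tail order of $C_{23}(u_2,u_3)$, there are three cases of the limit of $u_{3|2} = C_{3|2}(v|u_2)$ as $u_2\to 0$:

• If $C_{23}$ has $\kappa_{23L} = 1$, then from Equation 6.10,
$$u_{3|2} \sim 1 - A_1(v)\,u_2^{-1/q_{23}} \to 1, \quad u_2\to 0,\ v\in(0,1) \text{ fixed},\ A_1(v) > 0, \tag{A.2}$$
  where $q_{23}$ is a parameter of the LT $\psi_{23}$ for $C_{23}$ in Equation 6.3.

• If $C_{23}$ has $\kappa_{23L} \in (1,2)$, then from Equation 6.14,
$$u_{3|2} \sim 1 - A_2(v)\,(-\log u_2)^{1-1/r_{23}} \to 1, \quad u_2\to 0,\ v\in(0,1) \text{ fixed},\ A_2(v) > 0, \tag{A.3}$$
  where $r_{23}$ is a parameter of the LT $\psi_{23}$ for $C_{23}$ in Equation 6.3.

• If $C_{23}$ has $\kappa_{23L} = 2$, then $\lim_{u_2\to 0} u_{3|2} \in (0,1)$.

Therefore, the limit of $u_{3|2} = C_{3|2}(v|u_2)$ is either a number in $(0,1)$ or 1.

All cells in Table 6.1 are clear except row 3, column 6 (cell ∗), that is, $u_{1|2}\to 1$, $u_{3|2}\to 1$ and $C_{13;2}$ has upper tail dependence. From Equation 6.19, the boundary conditional distribution can be written as
$$C_{3|1;2}(u_{3|2}|u_{1|2}) \sim \left(1 + \left(\frac{1-u_{3|2}}{1-u_{1|2}}\right)^{1/M_{13;2}}\right)^{M_{13;2}-1}, \quad (u_{1|2},u_{3|2})\to(1,1),$$
where $M_{13;2}$ is a parameter of the LT $\psi_{13;2}$ for $C_{13;2}$ and $0 < M_{13;2} < 1$.

According to the analysis in Section 6.3.1, $u_{1|2}\to 1$ implies that $C_{12}$ has lower tail dependence and $u_2 \sim u_1^k$ with $k > 1$. However, $C_{23}$ could have $\kappa_{23L} = 1$ or $\kappa_{23L} \in (1,2)$.

• If $C_{23}$ has $\kappa_{23L} \in (1,2)$, then from Equation 6.12 and Equation 6.14 (the limits are taken in the lower tail, $u_1,u_2\to 0$, as set up at the start of this subsection),
$$1-u_{3|2} \sim A_2(v)\,(-\log u_2)^{1-1/r_{23}}, \quad u_2\to 0,\ v\in(0,1) \text{ fixed},\ A_2(v) > 0,$$
$$1-u_{1|2} \sim -(q_{12}-1)\,u_2^{(1/k-1)/q_{12}}, \quad (u_1,u_2)\to(0,0),\ u_2 \sim u_1^k,\ q_{12} < 0,$$
$$\frac{1-u_{3|2}}{1-u_{1|2}} \sim -\frac{A_2(v)}{q_{12}-1} \cdot \frac{(-\log u_2)^{-(1-r_{23})/r_{23}}}{u_2^{(1/k-1)/q_{12}}} \to \infty,$$
  where $r_{23}$ and $q_{12}$ are the parameters of $\psi_{23}$ for $C_{23}$ and $\psi_{12}$ for $C_{12}$, respectively.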
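  Therefore, $C_{3|1;2}(u_{3|2}|u_{1|2}) \to 0$.

• If $C_{23}$ has $\kappa_{23L} = 1$, then from Equation 6.10 and Equation 6.12 (again in the lower tail, $u_1,u_2\to 0$),
$$1-u_{3|2} \sim A_1(v)\,u_2^{-1/q_{23}}, \quad u_2\to 0,\ v\in(0,1) \text{ fixed},\ A_1(v) > 0,$$
$$1-u_{1|2} \sim -(q_{12}-1)\,u_2^{(1/k-1)/q_{12}}, \quad (u_1,u_2)\to(0,0),\ u_2 \sim u_1^k,\ q_{12} < 0,$$
$$\frac{1-u_{3|2}}{1-u_{1|2}} \sim -\frac{A_1(v)}{q_{12}-1}\, u_2^{-1/q_{23} - (1/k-1)/q_{12}},$$
  where $q_{23}$ and $q_{12}$ are the parameters of $\psi_{23}$ for $C_{23}$ and $\psi_{12}$ for $C_{12}$, respectively. Therefore,
$$C_{3|1;2}(u_{3|2}|u_{1|2}) \to \begin{cases} 1 & \text{if } -q_{23}^{-1} - q_{12}^{-1}(k^{-1}-1) > 0, \\ \text{const} \in (0,1) & \text{if } -q_{23}^{-1} - q_{12}^{-1}(k^{-1}-1) = 0, \\ 0 & \text{if } -q_{23}^{-1} - q_{12}^{-1}(k^{-1}-1) < 0. \end{cases}$$

Trivariate vine copula upper tail

Fix $v\in(0,1)$ and let $(u_1,u_2)\to(1,1)$ with $(1-u_2) \sim (1-u_1)^k$. According to Equation 6.18, depending on the tail order of $C_{23}(u_2,u_3)$, there are two cases of the limit of $u_{3|2} = C_{3|2}(v|u_2)$ as $u_2\to 1$:

• If $C_{23}$ has $\kappa_{23U} = 1$, then from Equation 6.18,
$$u_{3|2} \sim A_3(v)\,(1-u_2)^{(1-M_{23})/M_{23}} \to 0, \quad u_2\to 1,\ v\in(0,1) \text{ fixed},\ A_3(v) > 0, \tag{A.4}$$
  where $M_{23}$ is a parameter of the LT $\psi_{23}$ for $C_{23}$ in Equation 6.4.

• If $C_{23}$ has $\kappa_{23U} \in (1,2]$, then $\lim_{u_2\to 1} u_{3|2} \in (0,1)$.

Therefore, the limit of $u_{3|2} = C_{3|2}(v|u_2)$ is either a number in $(0,1)$ or 0.

Cell ∗ in Table 6.2. Since $C_{13;2}$ has $\kappa_{13L} \in (1,2)$, the conditional distribution can be written as, via Equation 6.15,
$$C_{3|1;2}(u_{3|2}|u_{1|2}) \sim \left(1 + \left(\frac{-\log u_{3|2}}{-\log u_{1|2}}\right)^{1/r_{13;2}}\right)^{q_{13;2}+r_{13;2}-1} u_{1|2}^{\left(1 + \left(\frac{-\log u_{3|2}}{-\log u_{1|2}}\right)^{1/r_{13;2}}\right)^{r_{13;2}} - 1}, \quad (u_{3|2},u_{1|2})\to(0,0),$$
where $q_{13;2}$ and $r_{13;2}$ are parameters of the LT $\psi_{13;2}$ for $C_{13;2}$, and $0 < r_{13;2} < 1$. Since $C_{23}$ has $\kappa_{23U} = 1$, by Equation 6.18,
$$u_{3|2} \sim A_3(v)\,(1-u_2)^{(1-M_{23})/M_{23}} \to 0, \quad u_2\to 1,\ v\in(0,1) \text{ fixed},$$
and
$$-\log u_{3|2} \sim -\log A_3(v) - \frac{M_{23}-1}{M_{23}}\left(-\log(1-u_2)\right),$$
where $M_{23}$ is a parameter of the LT $\psi_{23}$ for $C_{23}$. $C_{12}$ has upper tail dependence and $(1-u_2) \sim (1-u_1)^k$, where $k > 1$ because $u_{1|2}\to 0$. By Equation 6.21,
$$u_{1|2} \sim (1-u_2)^{(M_{12}-1)(1/k-1)/M_{12}} \to 0, \quad (u_1,u_2)\to(1,1),\ (1-u_2) \sim (1-u_1)^k,$$
and
$$-\log u_{1|2} \sim \frac{M_{12}-1}{M_{12}}\left(\frac{1}{k}-1\right)\left(-\log(1-u_2)\right),$$
where $M_{12}$ is a parameter of the LT $\psi_{12}$ for $C_{12}$. Therefore,
$$B := \frac{-\log u_{3|2}}{-\log u_{1|2}} \sim \frac{-\dfrac{M_{23}-1}{M_{23}}}{\dfrac{M_{12}-1}{M_{12}}\left(\dfrac{1}{k}-1\right)} > 0,$$
and, writing $q = q_{13;2}$ and $r = r_{13;2}$,
$$C_{3|1;2}(u_{3|2}|u_{1|2}) \sim \left(1+B^{1/r}\right)^{q+r-1} u_{1|2}^{\left(1+B^{1/r}\right)^{r}-1} \to 0, \quad u_{1|2}\to 0.$$
The cell ∗ converges to 0.

Cell † in Table 6.2.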
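Since $C_{13;2}$ has lower tail dependence, the conditional distribution can be written as, via Equation 6.11,
$$C_{3|1;2}(u_{3|2}|u_{1|2}) \sim \left(1 + \left(\frac{u_{3|2}}{u_{1|2}}\right)^{1/q_{13;2}}\right)^{q_{13;2}-1}, \quad (u_{3|2},u_{1|2})\to(0,0),$$
where $q_{13;2}$ is a parameter of the LT $\psi_{13;2}$ for $C_{13;2}$ with $q_{13;2} < 0$, and $u_{3|2}$ and $u_{1|2}$ are the same as in the previous case. Therefore,
$$\frac{u_{3|2}}{u_{1|2}} \sim A_3(v)\,(1-u_2)^{-(M_{23}-1)/M_{23} - (M_{12}-1)(1/k-1)/M_{12}}, \quad (u_{3|2},u_{1|2})\to(0,0),$$
$$C_{3|1;2}(u_{3|2}|u_{1|2}) \to \begin{cases} 0 & \text{if } -\dfrac{M_{23}-1}{M_{23}} - \dfrac{M_{12}-1}{M_{12}}\left(\dfrac{1}{k}-1\right) > 0, \\[2mm] \text{const} \in (0,1) & \text{if } -\dfrac{M_{23}-1}{M_{23}} - \dfrac{M_{12}-1}{M_{12}}\left(\dfrac{1}{k}-1\right) = 0, \\[2mm] 1 & \text{if } -\dfrac{M_{23}-1}{M_{23}} - \dfrac{M_{12}-1}{M_{12}}\left(\dfrac{1}{k}-1\right) < 0. \end{cases}$$

A.3 Derivations for case 1 in Section 6.4.3

Lower tail

We give a more detailed analysis to derive the rate at which the conditional quantile goes to $-\infty$. Let $(u_1,u_2)\to(0,0)$ and $u_2 \sim u_1^k$. We are interested in the conditional quantile $C_{3|12}^{-1}(\alpha|u_1,u_2)$. In other words, we need to find $v$ such that $C_{3|2}(v|u_2) = C_{3|1;2}^{-1}(\alpha|u_{1|2})$. Let $q_{12},q_{23},q_{13;2}$ and $r_{12},r_{23},r_{13;2}$ be the parameters of $\psi_{12}$ for $C_{12}$, $\psi_{23}$ for $C_{23}$ and $\psi_{13;2}$ for $C_{13;2}$, respectively. By Equation 6.16,
$$u_{1|2} \sim \left(1 + \left(\frac{1}{k}\right)^{1/r_{12}}\right)^{q_{12}+r_{12}-1} u_2^{\left(1+(1/k)^{1/r_{12}}\right)^{r_{12}}-1} \to 0, \qquad -\log u_{1|2} = O(-\log u_2).$$
By Equation 6.6,
$$C_{3|1;2}^{-1}(\alpha|u_{1|2}) \sim \exp\left[-\left(\frac{-\log\alpha}{r_{13;2}}\right)^{r_{13;2}} (-\log u_{1|2})^{1-r_{13;2}}\right],$$
so that
$$-\log C_{3|1;2}^{-1}(\alpha|u_{1|2}) \sim \left(\frac{-\log\alpha}{r_{13;2}}\right)^{r_{13;2}} (-\log u_{1|2})^{1-r_{13;2}} = O\left((-\log\alpha)^{r_{13;2}}\,(-\log u_2)^{1-r_{13;2}}\right).$$
According to Equation 6.15,
$$C_{3|2}(v|u_2) \sim \left(1 + \left(\frac{-\log v}{-\log u_2}\right)^{1/r_{23}}\right)^{q_{23}+r_{23}-1} \times \exp\left(-(-\log u_2)\left(1 + \left(\frac{-\log v}{-\log u_2}\right)^{1/r_{23}}\right)^{r_{23}} + (-\log u_2)\right).$$
For $C_{3|2}(v|u_2) = C_{3|1;2}^{-1}(\alpha|u_{1|2})$ to hold, it has to be true that $(-\log v)/(-\log u_2)\to 0$ as $u_2\to 0$. As a result,
$$-\log C_{3|2}(v|u_2) \sim r_{23}\,\frac{(-\log v)^{1/r_{23}}}{(-\log u_2)^{1/r_{23}-1}}.$$
Solving $-\log C_{3|2}(v|u_2) = -\log C_{3|1;2}^{-1}(\alpha|u_{1|2})$, we have
$$-\log v = -\log C_{3|12}^{-1}(\alpha|u_1,u_2) = O\left((-\log\alpha)^{r_{23}r_{13;2}}\,(-\log u_2)^{1-r_{23}r_{13;2}}\right).$$
On the normal scale, it implies that, with $u_2 = \Phi(x_2) \sim \phi(x_2)/|x_2|$,
$$F_{Y|X_1,X_2}^{-1}(\alpha|x_1,x_2) \sim -(-2\log v)^{1/2}, \qquad (-2\log v)^{1/2} = O\left((-\log\alpha)^{r_{23}r_{13;2}/2}\,|x_2|^{1-r_{23}r_{13;2}}\right), \quad x_1,x_2\to-\infty,\ x_2/x_1\to\sqrt{k}.$$
Since $1 - r_{23}r_{13;2} < 1$, the conditional quantile function goes to $-\infty$ sublinearly with respect to $x_1$ or $x_2$.

Upper tail

We give a more detailed analysis to derive the rate at which the conditional quantile goes to $+\infty$.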
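Assume $(1-u_2) \sim (1-u_1)^k$ as $(u_1,u_2)\to(1,1)$. Let $M_{12},M_{23},M_{13;2}$ be the parameters of $\psi_{12}$ for $C_{12}$, $\psi_{23}$ for $C_{23}$ and $\psi_{13;2}$ for $C_{13;2}$, respectively.

• If $k > 1$, then $u_{1|2}\to 0$. For a fixed quantile level $\alpha\in(0,1)$, $u_{3|2}$ has to converge to 0. By Equation 6.21,
$$u_{1|2} \sim (1-u_2)^{(M_{12}-1)(1/k-1)/M_{12}} \to 0, \qquad -\log u_{1|2} = O(-\log(1-u_2)).$$
  By Equation 6.19, as $u_2\to 1$,
$$u_{3|2} \sim \left(1 + \left(\frac{1-v}{1-u_2}\right)^{1/M_{23}}\right)^{M_{23}-1} \sim \left(\frac{1-v}{1-u_2}\right)^{1-1/M_{23}} \to 0,$$
$$-\log u_{3|2} \sim \left(1 - \frac{1}{M_{23}}\right)\left(-\log(1-v) + \log(1-u_2)\right).$$
  Since $C_{13;2}$ has $\kappa_{13L} \in (1,2)$, with a Taylor expansion of Equation 6.15,
$$-\log u_{3|12} \sim r_{13;2}\,\frac{(-\log u_{3|2})^{1/r_{13;2}}}{(-\log u_{1|2})^{1/r_{13;2}-1}} \sim r_{13;2}\left(1 - \frac{1}{M_{23}}\right)^{1/r_{13;2}} \frac{\left(-\log(1-v) + \log(1-u_2)\right)^{1/r_{13;2}}}{\left(-\log(1-u_2)\right)^{1/r_{13;2}-1}}.$$
  Setting $u_{3|12} = \alpha$ and solving for $v$, we have
$$-\log(1-v) \sim -\log(1-u_2) + O\left((-\log\alpha)^{r_{13;2}}\,(-\log(1-u_2))^{1-r_{13;2}}\right) \sim -\log(1-u_2).$$
  On the normal scale, it implies
$$F_{Y|X_1,X_2}^{-1}(\alpha|x_1,x_2) \sim x_2, \quad x_1,x_2\to+\infty,\ x_2/x_1\to\sqrt{k}.$$

• If $k = 1$, then $u_{1|2}\to 2^{M_{12}-1}$. For a fixed quantile level $\alpha\in(0,1)$, $u_{3|2}$ has to converge to a constant. Specifically, $u_{3|2}\to C_{3|1;2}^{-1}(\alpha|2^{M_{12}-1}) = O(1)$. By Equation 6.19,
$$u_{3|2} \sim \left(1 + \left(\frac{1-v}{1-u_2}\right)^{1/M_{23}}\right)^{M_{23}-1} = O(1),$$
$$1-v = O(1-u_2), \qquad -\log(1-v) \sim -\log(1-u_2).$$
  On the normal scale, it implies
$$F_{Y|X_1,X_2}^{-1}(\alpha|x_1,x_2) \sim x_2, \quad x_1,x_2\to+\infty,\ x_2/x_1\to 1.$$

• If $k < 1$, then $u_{1|2}\to 1$. For a fixed quantile level $\alpha\in(0,1)$, $u_{3|2}$ has to converge to 1.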
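  By Equation 6.21,
$$u_{1|2} \sim 1 + (M_{12}-1)\,(1-u_2)^{(1/k-1)/M_{12}} \to 1,$$
$$\log(1-u_{1|2}) \sim \frac{1}{M_{12}}\left(\frac{1}{k}-1\right)\log(1-u_2).$$
  By Equation 6.19,
$$u_{3|2} \sim \left(1 + \left(\frac{1-v}{1-u_2}\right)^{1/M_{23}}\right)^{M_{23}-1} \sim 1 + (M_{23}-1)\left(\frac{1-v}{1-u_2}\right)^{1/M_{23}} \to 1,$$
  and $(1-v)/(1-u_2)\to 0$, so that
$$\log(1-u_{3|2}) \sim \frac{1}{M_{23}}\left(\log(1-v) - \log(1-u_2)\right).$$
  Since $C_{13;2}$ has $\kappa_{13U} = 1$, by Equation 6.19,
$$u_{3|12} \sim \left(1 + \left(\frac{1-u_{3|2}}{1-u_{1|2}}\right)^{1/M_{13;2}}\right)^{M_{13;2}-1}.$$
  Setting $u_{3|12} = \alpha$ and solving for $v$, we have
$$\left(\alpha^{\frac{1}{M_{13;2}-1}} - 1\right)^{M_{13;2}} \sim \frac{1-u_{3|2}}{1-u_{1|2}},$$
$$M_{13;2}\log\left(\alpha^{\frac{1}{M_{13;2}-1}} - 1\right) \sim \log(1-u_{3|2}) - \log(1-u_{1|2}) \sim \frac{1}{M_{23}}\left(\log(1-v) - \log(1-u_2)\right) - \frac{1}{M_{12}}\left(\frac{1}{k}-1\right)\log(1-u_2) \sim \frac{1}{M_{23}}\log(1-v) - \left(\frac{1}{M_{23}} + \frac{1}{M_{12}}\left(\frac{1}{k}-1\right)\right)\log(1-u_2),$$
$$-\log(1-v) \sim \left(1 + \frac{M_{23}}{M_{12}}\left(\frac{1}{k}-1\right)\right)\left(-\log(1-u_2)\right).$$
  On the normal scale, it implies
$$F_{Y|X_1,X_2}^{-1}(\alpha|x_1,x_2) \sim \sqrt{1 + \frac{M_{23}}{M_{12}}\left(\frac{1}{k}-1\right)}\; x_2, \quad x_1,x_2\to+\infty,\ x_2/x_1\to\sqrt{k}.$$

Appendix B

Conditional dependence measures for trivariate Frank copulas

In this section, we conduct an analysis similar to that of Section 4.4 on a trivariate Frank copula model. It is shown that the trivariate Frank copula model has less variation in the conditional dependence measures than the gamma factor model.

The copula CDF of a trivariate Archimedean copula is
$$C_{123}(u_1,u_2,u_3) = \psi\left(\psi^{-1}(u_1) + \psi^{-1}(u_2) + \psi^{-1}(u_3)\right).$$
For a Frank copula with parameter $\delta$,
$$\psi(s) = -\frac{\log\left[1 - (1-e^{-\delta})\,e^{-s}\right]}{\delta}.$$
The copula of the conditional distribution is
$$C_{12;3}(u_1,u_2;u_3) = \frac{h\left(h^{-1}(u_1 h(\varsigma)) + h^{-1}(u_2 h(\varsigma)) - \varsigma\right)}{h(\varsigma)},$$
where $\varsigma = \psi^{-1}(u_3)$ and
$$h(s) = -\psi'(s) = \frac{(1-e^{-\delta})\,e^{-s}}{\delta\left(1 - (1-e^{-\delta})\,e^{-s}\right)}.$$
Given the analytical form of the copula of the conditional distribution, we can compute the exact conditional dependence measures using numerical integration. Similar to Figure 4.3, we simulate $n = 1000$ samples from a trivariate Frank copula where Kendall's $\tau$ between two variables is 0.6. The exact $\rho_S(C_{12;3}(\cdot;x))$, $\zeta_{\alpha=5}(C_{12;3}(\cdot;x))$, and $\zeta_{\alpha=5}(\hat{C}_{12;3}(\cdot;x))$ computed via numerical integration are shown in red dash-dot lines in Figure B.1. The kernel-smoothed estimates using the Epanechnikov kernel and window size $h_n = 0.2$ are shown in solid dark lines and the bootstrap confidence bands are plotted in dashed dark lines.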
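The closed forms for the Frank generator $\psi$, the function $h = -\psi'$, and the conditional copula $C_{12;3}$ given in this appendix can be checked numerically. The following is a minimal sketch, not part of the thesis code: the function names, the value `DELTA = 5.0`, and the test points are illustrative assumptions, and the inverses `psi_inv` and `h_inv` are obtained by solving the stated formulas for $s$.

```python
import math

# Illustrative Frank parameter (an assumption; not the value used in the thesis).
DELTA = 5.0
C = 1.0 - math.exp(-DELTA)


def psi(s):
    """Frank generator: psi(s) = -log(1 - (1 - e^{-delta}) e^{-s}) / delta."""
    return -math.log(1.0 - C * math.exp(-s)) / DELTA


def psi_inv(u):
    """Inverse of psi, obtained by solving psi(s) = u for s."""
    return -math.log((1.0 - math.exp(-DELTA * u)) / C)


def h(s):
    """h(s) = -psi'(s)."""
    e = C * math.exp(-s)
    return e / (DELTA * (1.0 - e))


def h_inv(y):
    """Inverse of h, obtained by solving h(s) = y for s."""
    return -math.log(DELTA * y / (C * (1.0 + DELTA * y)))


def cond_copula(u1, u2, u3):
    """C_{12;3}(u1, u2; u3) for the trivariate Frank copula."""
    sigma = psi_inv(u3)
    hs = h(sigma)
    return h(h_inv(u1 * hs) + h_inv(u2 * hs) - sigma) / hs


# Check 1: h agrees with a central finite difference of -psi.
s0, eps = 1.0, 1e-6
finite_diff = (psi(s0 - eps) - psi(s0 + eps)) / (2.0 * eps)

# Check 2: the conditional copula has uniform margins, C(1, u2; u3) = u2.
margin = cond_copula(1.0, 0.3, 0.7)

print(finite_diff, h(s0))
print(margin)
```

The uniform-margin check follows directly from the formula: with $u_1 = 1$, $h^{-1}(h(\varsigma)) = \varsigma$ cancels the $-\varsigma$ term, leaving $u_2$.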
Compared to the gamma factor model in Section 4.4, there is visibly less variation in the conditional dependence measures.

[Figure B.1 panels: (a) Spearman's rho $\rho_S(C_{12;3})$; (b) tail-weighted dependence measure (lower tail) $\zeta_{\alpha=5}(C_{12;3})$; (c) tail-weighted dependence measure (upper tail) $\zeta_{\alpha=5}(\hat{C}_{12;3})$.]

Figure B.1: Conditional measures of $C_{12;3}(\cdot;x)$, the copula of $Y_1,Y_2$ given $F_3(Y_3) = x$, for a trivariate Frank copula model with parameter that corresponds to Kendall's $\tau = 0.6$. The sample size is $n = 1000$. The red dash-dot lines are the exact conditional measures computed via numerical integration. The dark solid lines and dashed lines are the kernel-smoothed conditional Spearman's rho and the corresponding 90%-level simultaneous bootstrap confidence bands, using the Epanechnikov kernel and window size $h_n = 0.2$.

Appendix C

Implementation of Monte Carlo tree search (MCTS)

C.1 Description

The Python code is written in the object-oriented programming paradigm. There are three classes defined in the code: VineState, CorrMat, and MctsNode.

The VineState class represents an incomplete truncated vine structure, which is internally represented by a list of trees. It contains the following public methods.
• get_child_states returns all the child states, that is, the VineState objects that can be obtained by adding an edge to the current object.
• roll_out returns a complete truncated vine structure by adding edges uniformly at random.
• to_vine_array converts the VineState object to a vine array representation.

The CorrMat class represents a correlation matrix. It provides methods to compute the log-determinant and partial correlations.

The MctsNode class represents a tree node in the search tree. Each object contains a VineState object as an attribute. It also stores the relevant summary statistics. It has the following public methods. add_children adds child nodes to the current node. select_child selects a child node according to the tree policy. roll_out performs the default policy.
update updates the summary statistics of the node.

Finally, the main function mcts_vine takes the following arguments.
• corr: Correlation matrix, a two-dimensional NumPy array.
• n_sample: Number of samples, an integer.
• ntrunc: Truncation level, an integer.
• output_dir: Path of the file to which the output is written, a string.
• itermax: Maximum number of iterations of MCTS, an integer.
• FPU: First play urgency, a floating point number. A larger FPU encourages exploration while a smaller FPU encourages exploitation.
• PB: Progressive bias, a floating point number. A larger PB gives more weight to heuristic or prior knowledge.
• log_freq: Frequency at which debug information is printed, an integer.

The code utilizes the Graph class in python-igraph. Relevant methods and properties of the class are listed below.
• Methods
  – add_edges: Adds some edges to the graph.
  – add_vertices: Adds some vertices to the graph.
  – copy: Creates an exact deep copy of the graph.
  – ecount: Counts the number of edges.
  – get_adjacency: Returns the adjacency matrix of a graph.
  – vcount: Counts the number of vertices.
• Properties
  – es: The edge sequence of the graph.
  – vs: The vertex sequence of the graph.

Table C.1 shows the correspondence of variables and functions defined in the pseudocode in Algorithm 3.1 and in the Python implementation.

Pseudocode | Implementation
$n_v$ | MctsNode.visits
$n_v \cdot \bar{x}_v$ | MctsNode.sum_score
$n(v_1,v_2)$ | MctsNode.child_visits
$v_{\text{root}}$ | root_node
$v_{\text{history}}$ | temp_node_list
TreePolicy | MctsNode.select_child
DefaultPolicy | MctsNode.roll_out
Backprop | MctsNode.update

Table C.1: Correspondence of variables and functions defined in the pseudocode in Algorithm 3.1 and in the Python implementation.

The provided code requires Python in version ≥ 3.4, the numpy package in version ≥ 1.15, and python-igraph in version ≥ 0.7.

C.2 Example usage

In this section, we provide a code snippet to showcase the usage of the mcts_vine function.

import random

import numpy as np

# mcts_vine is the main function defined in Section C.3.

# An 8-dimensional correlation matrix
rmat = np.array([[1.00, 0.98, 0.89, 0.97, 0.96, 0.95, 0.95, 0.60],
                 [0.98, 1.00, 0.90, 0.97, 0.95, 0.95, 0.95, 0.62],
                 [0.89, 0.90, 1.00, 0.92, 0.87, 0.90, 0.92, 0.66],
                 [0.97, 0.97, 0.92, 1.00, 0.98, 0.98, 0.97, 0.63],
                 [0.96, 0.95, 0.87, 0.98, 1.00, 0.95, 0.92, 0.54],
                 [0.95, 0.95, 0.90, 0.98, 0.95, 1.00, 0.94, 0.61],
                 [0.95, 0.95, 0.92, 0.97, 0.92, 0.94, 1.00, 0.69],
                 [0.60, 0.62, 0.66, 0.63, 0.54, 0.61, 0.69, 1.00]])

# Set seeds
random.seed(0)
np.random.seed(0)

# Run MCTS
best_vine = mcts_vine(rmat, n_sample=500, output_dir='output.txt', ntrunc=3,
                      itermax=1000, FPU=1.0, PB=0.1, log_freq=100)

# Print the result
print(best_vine.to_vine_array())
## CFI: 0.99
## [[8 8 7 4 4 4 1 4]
## [0 7 8 7 5 5 4 5]
## [0 0 4 8 7 7 5 6]
## [0 0 0 5 8 6 7 7]
## [0 0 0 0 6 8 6 1]
## [0 0 0 0 0 1 8 2]
## [0 0 0 0 0 0 2 8]
## [0 0 0 0 0 0 0 3]]

C.3 Code

import numpy as np
import igraph
import random
import math
from functools import reduce


class VineState:

    def __init__(self, ntrunc, corr_mat):
        """ Constructor

        This function is only called when constructing the root state.
        Subsequent states are constructed by calling self._clone().

        Args:
            ntrunc: Truncation level.
            corr_mat: A CorrMat object.
        """
        self._ntrunc = ntrunc
        self._corr_mat = corr_mat
        # dimension
        self._d = corr_mat.dim()

        assert self._ntrunc > 0
        assert self._ntrunc < self._d

        # self.tree_list is a list of igraph objects, representing an
        # incomplete truncated vine. Each element is a tree, except for the
        # last one, which is an incomplete tree. When self.__init__() is
        # called, self.tree_list is a list with an empty igraph object.
        g = igraph.Graph()
        g.add_vertices(self._d)
        g.vs['name'] = [str(i) for i in range(self._d)]
        self.tree_list = [g]

        # The score of the incomplete vine: -log(1-r^2).
        self.score = 0.0

    def _clone(self):
        """ Create a deep clone of this state.
        """
        # _corr_mat is a shallow copy
        st = VineState(ntrunc=self._ntrunc, corr_mat=self._corr_mat)
        # Create a deep copy of self.tree_list
        st.tree_list = [g.copy() for g in self.tree_list]
        st.score = self.score
        return st

    def _level(self):
        """ Return the level of the current incomplete vine.

        Level is using zero-based numbering.
        """
        return len(self.tree_list) - 1

    def _is_complete(self):
        """ Whether a vine state is complete.

        A vine state is complete if the last tree in self.tree_list is a
        connected tree, and the current level reaches the truncation level.
        """
        self_g = self.tree_list[-1]
        return (self_g.ecount() == self_g.vcount() - 1) and \
            (self._level() >= self._ntrunc - 1)

    def get_child_states(self):
        """ Get a list of all valid child states.

        If there is none, return an empty list.
        """
        if self._is_complete():
            return []

        self_g = self.tree_list[-1]
        # Append an empty graph to self.tree_list if self_g is a connected
        # tree.
        if self_g.ecount() == self_g.vcount() - 1:
            # self_g is already a tree.
            # If the current tree is connected but it hasn't reached ntrunc,
            # then add another empty graph.
            g = igraph.Graph()
            g.add_vertices(self_g.ecount())
            g.vs['name'] = self_g.es['name']
            self.tree_list.append(g)
            self_g = self.tree_list[-1]

        # Initialize the returned list.
        res = []

        if self_g.ecount() == 0:
            # If self_g is empty, select all pairs of edges as child states.
            for i in range(self_g.vcount()):
                for j in range(i):
                    # Connect i and j.
                    st = self._add_edge_helper(i, j)
                    if st is not None:
                        res.append(st)
        else:
            # If self_g is NOT empty, connect vertices with degree > 0 and
            # vertices with degree == 0. By doing so, there is always only one
            # connected component in the graph. The way we grow the tree
            # resembles Prim's algorithm, not Kruskal's algorithm.
            adj_mat = self_g.get_adjacency()
            adj_vec = [max(a) for a in adj_mat]
            # Vertices with degree > 0
            conn_ids = [i for i, x in enumerate(adj_vec) if x == 1]
            # Vertices with degree == 0
            disconn_ids = [i for i, x in enumerate(adj_vec) if x == 0]
            for i in conn_ids:
                for j in disconn_ids:
                    # Connect i and j.
                    st = self._add_edge_helper(i, j)
                    if st is not None:
                        res.append(st)
        return res

    def _add_edge_helper(self, i, j):
        """ Add an edge to the last graph in self.tree_list.

        Add an edge between vertex id i and j, if proximity condition is
        satisfied. The score is updated.

        Args:
            i, j: vertex indices in self.tree_list[-1].

        Returns:
            A VineState with the added edge.
            If proximity condition is not satisfied, return None.
        """
        # Create a deep copy of the current state.
        temp_st = self._clone()
        # copy_g is the last incomplete tree in the newly copied state.
        copy_g = temp_st.tree_list[-1]

        if self._level() == 0:
            # If there's only one tree in self.tree_list, no need to consider
            # the proximity condition. Simply add an edge.
            copy_g.add_edges([(i, j)])
            # Add edge name
            copy_g.es[copy_g.ecount() - 1]['name'] = ','.join(
                [str(j), str(i)] if j < i else [str(i), str(j)])
            # Update score
            this_score = -np.log(1 - self._corr_mat.pcorr(i, j) ** 2)
            temp_st.score += this_score
            copy_g.es[copy_g.ecount() - 1]['weight'] = this_score
        else:
            # When level >= 1, check the proximity condition first.
            # If it is not satisfied, return None.
            # Otherwise, add the edge.
            prev_g = temp_st.tree_list[-2]
            # Get vertex names of i, j in copy_g
            i_v_name = copy_g.vs[i]['name']
            j_v_name = copy_g.vs[j]['name']
            # Get edge ids in prev_g
            i_edge = prev_g.es.find(name=i_v_name)
            j_edge = prev_g.es.find(name=j_v_name)

            if not set(i_edge.tuple) & set(j_edge.tuple):
                # If the intersection of i_edge and j_edge is empty, then the
                # proximity condition is not satisfied. Skip this pair.
                return None

            # Proximity condition is satisfied.
            copy_g.add_edges([(i, j)])

            # Assertions
            if i_v_name.find('|') >= 0:
                assert j_v_name.find('|') >= 0
            elif i_v_name.find('|') < 0:
                assert j_v_name.find('|') < 0

            # Vertex names
            i_v_name_set = set(i_v_name.replace('|', ',').split(','))
            j_v_name_set = set(j_v_name.replace('|', ',').split(','))

            # Symmetric difference
            new_name_before_bar = ','.join(
                sorted(i_v_name_set ^ j_v_name_set))
            # Intersection
            new_name_after_bar = ','.join(
                sorted(i_v_name_set & j_v_name_set))
            new_name = new_name_before_bar + '|' + new_name_after_bar

            # Add edge name
            copy_g.es[copy_g.ecount() - 1]['name'] = new_name

            # Update score
            _i, _j = [int(k) for k in new_name_before_bar.split(',')]
            _S = [int(k) for k in new_name_after_bar.split(',')]
            this_score = -np.log(
                1 - self._corr_mat.pcorr(i=_i, j=_j, S=_S) ** 2)
            temp_st.score += this_score
            copy_g.es[copy_g.ecount() - 1]['weight'] = this_score

        return temp_st

    def roll_out(self):
        """ Roll out the current vine state to a complete one.

        The current implementation is naive. It randomly chooses a child state
        iteratively until reaching the end.

        Returns:
            (score, vine): The score and final vine state.
        """
        st = self._clone()
        while not st._is_complete():
            self_g = st.tree_list[-1]
            # Append an empty graph to self.tree_list if self_g is a connected
            # tree.
            if self_g.ecount() == self_g.vcount() - 1:
                # self_g is already a tree.
                # If the current tree is connected but it hasn't reached
                # ntrunc, then add another empty graph.
                g = igraph.Graph()
                g.add_vertices(self_g.ecount())
                g.vs['name'] = self_g.es['name']
                st.tree_list.append(g)
                self_g = st.tree_list[-1]

            if self_g.ecount() == 0:
                # If self_g is empty, randomly pick a pair of edges as child
                # states.
                v_list = list(range(self_g.vcount()))
                random.shuffle(v_list)
                found = False
                for i in v_list:
                    for j in range(i):
                        # Connect i and j.
                        temp_st = st._add_edge_helper(i, j)
                        if temp_st is not None:
                            st = temp_st
                            found = True
                            break
                    if found:
                        break
            else:
                # If self_g is NOT empty, connect vertices with degree > 0 and
                # vertices with degree == 0. By doing so, there is always only
                # one connected component in the graph.
                # The way we grow the tree resembles Prim's algorithm, not
                # Kruskal's algorithm.
                adj_mat = self_g.get_adjacency()
                adj_vec = [max(a) for a in adj_mat]
                # Vertices with degree > 0
                conn_ids = [i for i, x in enumerate(adj_vec) if x == 1]
                # Vertices with degree == 0
                disconn_ids = [i for i, x in enumerate(adj_vec) if x == 0]
                random.shuffle(conn_ids)
                random.shuffle(disconn_ids)
                found = False
                for i in conn_ids:
                    for j in disconn_ids:
                        # Connect i and j.
                        temp_st = st._add_edge_helper(i, j)
                        if temp_st is not None:
                            st = temp_st
                            found = True
                            break
                    if found:
                        break
        return (st.score, st)

    def to_vine_array(self):
        """ Convert an object to a vine array representation.

        The representation is one-based numbering.
        Return a d-by-d upper triangular matrix.
        """
        d = self._d
        # clone is a full vine, randomly rolled out from the current truncated
        # vine.
        clone = self._clone()
        clone._ntrunc = d - 1
        _, clone = clone.roll_out()

        # cond_sets is a list of length d-1,
        # each element is a list of conditioned sets at each level.
        cond_sets = []
        for k in range(d - 1):
            current_edges = clone._edge_repr()[
                k * d - (k**2 + k) // 2:
                (k + 1) * d - ((k + 1)**2 + (k + 1)) // 2]
            current_edges = [[int(node)
                              for node in e.split('|')[0].split(',')]
                             for e in current_edges]
            # print(current_edges)
            cond_sets.append(current_edges)

        # When constructing the vine array, the elements are added column by
        # column, from right to left.
        # Within each column, elements are added from bottom to top.
        # In other words, we start from the last tree.
        # The builtin int is used as dtype because np.int was removed in
        # recent NumPy releases.
        M = -np.ones((d, d), dtype=int)
        for k in range(d - 2, -1, -1):
            w = cond_sets[k][0][0]
            M[k + 1, k + 1] = w
            M[k, k + 1] = cond_sets[k][0][1]
            del cond_sets[k][0]
            for ell in range(k - 1, -1, -1):
                for j in range(len(cond_sets[ell])):
                    if w in cond_sets[ell][j]:
                        cond_sets[ell][j].remove(w)
                        v = cond_sets[ell][j][0]
                        M[ell, k + 1] = v
                        del cond_sets[ell][j]
                        break
        M[0, 0] = M[0, 1]
        M += 1  # change from zero-based numbering to one-based numbering
        return M

    def _edge_repr(self):
        """ Edge representation of the VineState.

        Returns a list of strings, each of which represents an edge.
        For example:
        ['0,2', '1,3', '2,4', '3,5', '4,6', '5,6', '2,6|4', '3,6|5', '4,5|6']
        """
        res = [sorted(g.es['name']) for g in self.tree_list if g.ecount() > 0]
        if res:
            res = reduce(lambda x, y: x + y, res)
        return res

    def __hash__(self):
        return hash(tuple(self._edge_repr()))

    def __eq__(self, other):
        return self.__hash__() == other.__hash__()

    def __repr__(self):
        return self._edge_repr().__repr__()


class CorrMat:
    """ A wrapper of a correlation matrix. """

    def __init__(self, corr_mat, n):
        """ Constructor.

        Args:
            corr_mat: a correlation matrix as a numpy array.
            n: number of samples.
        """
        # corr_mat should be a square matrix
        assert corr_mat.ndim == 2
        assert corr_mat.shape[0] == corr_mat.shape[1]
        self._corr_mat = corr_mat
        self._corr_mat_inv = np.linalg.inv(self._corr_mat)
        self._n = n

    def n_sample(self):
        return self._n

    def dim(self):
        """ Number of variables in the correlation matrix. """
        return self._corr_mat.shape[0]

    def log_det(self):
        """ Log determinant of the correlation matrix. """
        return np.log(np.linalg.det(self._corr_mat))

    def pcorr(self, i, j, S=None):
        """ Partial correlation of (i,j)|S. The indices are zero based.

        Args:
            i, j: Indices.
            S: A list of indices.
        """
        if not S:
            return self._corr_mat[i][j]
        ind = [i, j] + S
        sub_matrix = self._corr_mat[np.ix_(ind, ind)]
        # TODO: consider using np.linalg.solve instead of np.linalg.inv in the
        # future.
        sub_matrix_inv = np.linalg.inv(sub_matrix)
        return -sub_matrix_inv[0, 1] / np.sqrt(
            sub_matrix_inv[0, 0] * sub_matrix_inv[1, 1])

    def _parse_edge_name(self, name):
        """ Parse the name of an edge. For example: name ='2,5|3,6'.

        Args:
            name: name of an edge. For example: name ='2,5|3,6'.

        Returns:
            i, j, S
        """
        name_split = name.split('|')
        i, j = [int(k) for k in name_split[0].split(',')]
        if len(name_split) > 1:
            S = [int(k) for k in name_split[1].split(',')]
        else:
            S = None
        return i, j, S

    def pcorr_by_name(self, name):
        """ Partial correlation by name. For example: name ='2,5|3,6'. """
        return self.pcorr(*self._parse_edge_name(name))

    def pcorr_given_all_by_name(self, name):
        """ Partial correlation of i, j given all other variables.

        Args:
            name: name of an edge. For example: name ='2,5|3,6'.

        Returns:
            Partial correlation of i, j given all other variables.
        """
        i, j, _ = self._parse_edge_name(name)
        return -self._corr_mat_inv[i, j] / np.sqrt(
            self._corr_mat_inv[i, i] * self._corr_mat_inv[j, j])

    def __repr__(self):
        return self._corr_mat.__repr__()


class MctsNode:
    """ A node class of the search tree. """

    def __init__(self, config, state=None):
        """ Constructor

        Args:
            config: A configuration dictionary, contains
                transpos_table, UCT_const, FPU.
        """
        self.state = state
        self.config = config
        self.child_nodes = []
        # self.child_visits keeps how many times the child nodes are visited
        # from the *current* node.
        self.child_visits = []
        self.visits = 0
        self.sum_score = 0

    def select_child(self):
        """ Tree policy: Select a child from the transposition table according
        to UCT.

        Update self.child_visits.
        Note: self.visits is not updated here. It is updated when self.update
        is called.
        """
        # UCT_score uses UCT2 in Childs et al. (2008).
        # Transpositions and move groups in Monte Carlo tree search.
        def UCT(c_node, c_node_visits):
            mean_score = (c_node.sum_score /
                          c_node.visits) if c_node.visits > 0 else 0
            # Margin of Error
            if c_node_visits == 0:
                moe = self.config['FPU']
            else:
                moe = math.sqrt(2 * math.log(self.visits + 1) / c_node_visits)

            edge_score = c_node.state.score - self.state.score
            prog_bias = self.config['PB'] * edge_score / (c_node.visits + 1)

            # # An alternative progressive bias term:
            # # the partial correlation of the newly added edge, given all the
            # # other variables.
            # edge_diff_set = set(c_node.state._edge_repr()) - \
            #     set(self.state._edge_repr())
            # assert len(edge_diff_set) == 1
            # edge_diff = list(edge_diff_set)[0]
            # pcorr_all = self.state._corr_mat.pcorr_given_all_by_name(
            #     edge_diff)
            # prog_bias = self.config[
            #     'PB'] * (-np.log(1 - pcorr_all**2)) / (c_node.visits + 1)

            return mean_score + self.config['UCT_const'] * (moe + prog_bias)

        UCT_list = [(UCT(node, self.child_visits[i]), node.state, i)
                    for i, node in enumerate(self.child_nodes)]
        # max() in python 3: If multiple items are maximal, the function
        # returns the first one encountered.
        # Shuffle UCT_list so that when two nodes have the same UCT score,
        # one of them is randomly picked.
        random.shuffle(UCT_list)
        max_UCT_list = max(UCT_list, key=lambda x: x[0])
        # print(max_UCT_list)
        selected_state = max_UCT_list[1]
        select_index = max_UCT_list[2]
        selected_node = self.config['transpos_table'][selected_state]
        # Update number of visits
        self.child_visits[select_index] += 1
        return selected_node

    def add_children(self):
        """ Add all children of the current node if possible.

        The children are added to the transposition table.
        Add the child *nodes* to self.child_nodes.
        Initialize self.child_visits to zeros.

        Returns:
            bool: If children are successfully added, return True.
                If not, do nothing and return False.
        """
        assert len(self.child_nodes) == 0
        child_states = self.state.get_child_states()
        if not child_states:
            return False

        for c_state in child_states:
            if c_state not in self.config['transpos_table']:
                self.config['transpos_table'][c_state] = MctsNode(
                    config=self.config, state=c_state)
            self.child_nodes.append(self.config['transpos_table'][c_state])
        self.child_visits = [0] * len(child_states)
        return True

    def roll_out(self):
        """ Run default policy

        Returns:
            (score, vine): The score and final vine state.
        """
        return self.state.roll_out()

    def update(self, result):
        """ Update the node with result """
        self.visits += 1
        self.sum_score += result

    def is_leaf(self):
        """ If the node is a leaf node."""
        # The node is leaf node if its self.child_nodes is empty.
        return self.child_nodes == []

    def __repr__(self):
        mean_score = str(self.sum_score /
                         self.visits) if self.visits > 0 else 'N/A'
        return 'Vine state: ' + self.state.__repr__() + '\nMean score: ' + \
            mean_score


def mcts_vine(corr, n_sample, ntrunc, output_dir, itermax=100, FPU=1.0,
              PB=0.1, log_freq=100):
    # Initialize the correlation matrix object, root state and root node
    corr_mat = CorrMat(corr, n_sample)
    root_state = VineState(ntrunc=ntrunc, corr_mat=corr_mat)
    transpos_table = {}
    config = {
        # a dictionary: state -> node.
        'transpos_table': transpos_table,
        # Scaling constant of the exploration term, as used in select_child:
        # \bar{x}_j + UCT_const * (\sqrt{2 log(n) / n_j} + progressive bias)
        'UCT_const': (-corr_mat.log_det()),
        # First Play Urgency
        'FPU': FPU,
        # Progressive Bias
        'PB': PB
    }
    print("Configuration dictionary:")
    print(config)

    root_node = MctsNode(config, root_state)
    best_score = 0  # we want to maximize the score
    best_vine = None

    file_handler = open(output_dir, "w")

    # CFI calculation
    D_0 = corr_mat.n_sample() * (-corr_mat.log_det())
    nu_0 = corr_mat.dim() * (corr_mat.dim() - 1) / 2.0

    for i in range(itermax):
        node = root_node
        temp_node_list = [node]

        # Select
        while not node.is_leaf():
            node = node.select_child()
            temp_node_list.append(node)

        # Expand
        if node.visits > 0:
            # Only expand the leaf node if it has been visited.
            add_children_success = node.add_children()
            if add_children_success:
                node = node.select_child()
                temp_node_list.append(node)

        # Rollout
        score, vine = node.roll_out()
        if score > best_score:
            best_score = score
            best_vine = vine

        # CFI calculation
        D_ell = corr_mat.n_sample() * (-corr_mat.log_det() - best_score)
        nu_ell = (corr_mat.dim() - ntrunc) * \
            (corr_mat.dim() - ntrunc - 1) / 2.0
        CFI = 1 - max(0, D_ell - nu_ell) / \
            max(0, D_0 - nu_0, D_ell - nu_ell)

        file_handler.write('%d, %.4f, %.4f\n' % (i, best_score, CFI))

        if i % log_freq == 0 and i > 0:
            print(output_dir + ', Iter %d: ' % i)
            print("best_score: " + str(best_score))

        # Backpropagate
        for visited_node in temp_node_list:
            visited_node.update(score)

    file_handler.close()
    print("CFI: " + str(CFI))
    return best_vine
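The score accumulated by VineState and the CFI computed in mcts_vine can be illustrated on a small case. The sketch below is not part of the thesis code: the helper names `partial_corr`, `edge_score` and `cfi`, the 3×3 correlation matrix, and the sample size are assumptions chosen for illustration. It mirrors the formulas used in the listing above and checks that, for a full (untruncated) vine on three variables, the edge scores $-\log(1-\rho_e^2)$ sum to $-\log\det R$, which is why a truncated vine's score is compared against `-corr_mat.log_det()`.

```python
import numpy as np


def partial_corr(corr, i, j, cond=None):
    """Partial correlation of (i, j) given the indices in cond (zero-based)."""
    if not cond:
        return corr[i, j]
    idx = [i, j] + list(cond)
    precision = np.linalg.inv(corr[np.ix_(idx, idx)])
    return -precision[0, 1] / np.sqrt(precision[0, 0] * precision[1, 1])


def edge_score(corr, i, j, cond=None):
    """Contribution of one vine edge to the score: -log(1 - rho^2)."""
    return -np.log(1.0 - partial_corr(corr, i, j, cond) ** 2)


def cfi(corr, score, n_sample, ntrunc):
    """Comparative fit index, following the formula used in mcts_vine."""
    d = corr.shape[0]
    neg_log_det = -np.log(np.linalg.det(corr))
    d_0 = n_sample * neg_log_det
    nu_0 = d * (d - 1) / 2.0
    d_ell = n_sample * (neg_log_det - score)
    nu_ell = (d - ntrunc) * (d - ntrunc - 1) / 2.0
    return 1 - max(0, d_ell - nu_ell) / max(0, d_0 - nu_0, d_ell - nu_ell)


# Hypothetical 3-dimensional correlation matrix (illustrative values only).
corr = np.array([[1.0, 0.6, 0.5],
                 [0.6, 1.0, 0.4],
                 [0.5, 0.4, 1.0]])

# Tree 1 has edges (0,1) and (1,2); tree 2 has the single edge (0,2 | 1).
tree1_score = edge_score(corr, 0, 1) + edge_score(corr, 1, 2)
tree2_score = edge_score(corr, 0, 2, [1])
neg_log_det = -np.log(np.linalg.det(corr))

cfi_truncated = cfi(corr, tree1_score, n_sample=500, ntrunc=1)
cfi_full = cfi(corr, tree1_score + tree2_score, n_sample=500, ntrunc=2)

print(tree1_score + tree2_score, neg_log_det)
print(cfi_truncated, cfi_full)
```

Under these assumptions, the 1-truncated vine leaves part of $-\log\det R$ unexplained, so its CFI is below 1, while the full vine recovers the determinant up to floating-point error.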
Vine copulas : dependence structure learning, diagnostics, and applications to regression analysis Chang, Bo 2019
Item Metadata
Title | Vine copulas : dependence structure learning, diagnostics, and applications to regression analysis |
Creator | Chang, Bo |
Publisher | University of British Columbia |
Date Issued | 2019 |
Genre | Thesis/Dissertation |
Type | Text |
Language | eng |
Date Available | 2019-07-02 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0379699 |
URI | http://hdl.handle.net/2429/70869 |
Degree | Doctor of Philosophy - PhD |
Program | Statistics |
Affiliation | Science, Faculty of; Statistics, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2019-09 |
Campus | UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
AggregatedSourceRepository | DSpace |