FORECASTING OF NONLINEAR EXTREME QUANTILES USING COPULA MODELS

by

VINCENZO COIA

B.Sc. (Mathematics), Brock University, June 2011
B.Sc. (Biology), Brock University, October 2011
M.Sc., Brock University, October 2012

A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Statistics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

February 2017

© Vincenzo Coia, 2017

Abstract

Forecasts of extreme events are useful in order to prepare for disaster. Such forecasts are usefully communicated as an upper quantile function and, in the presence of predictors, can be estimated using quantile regression techniques. This dissertation proposes methodology that seeks to produce forecasts that (1) are consistent, in the sense that the quantile functions are valid (non-decreasing); (2) are flexible enough to capture the dependence between the predictors and the response; and (3) can reliably extrapolate into the tail of the upper quantile function. To address these goals, a family of proper scoring rules is first established that measures the goodness of upper quantile function forecasts. To build a model of the conditional quantile function, a method that uses pair-copula Bayesian networks or vine copulas is proposed. This model is fit using a new class of estimators called the composite nonlinear quantile regression (CNQR) family of estimators, which optimize the scores from the previous scoring rules. In addition, a new parametric copula family is introduced that allows for a non-constant conditional extreme value index, and another parametric family is introduced that reduces a heavy-tailed response to a light tail upon conditioning. Taken altogether, this work is able to produce forecasts satisfying the three goals. This means that the resulting forecasts of extremes are more reliable than those of other methods, because they more adequately capture the insight that predictors hold on extreme outcomes. This work is applied to forecasting extreme flows of the Bow River at Banff, Alberta, for flood preparation, but can be used to forecast extremes of any continuous response when predictors are present.

Preface

This dissertation was completed under the supervision of Dr. Natalia Nolde and Dr. Harry Joe, both of whom provided several ideas put forth in this dissertation.

The motivating problem introduced in Section 2.1 came from the town of Canmore, Alberta, and the Alberta Environment and Parks. Dr. Nolde had the idea of addressing their need by forecasting high quantiles, making use of Extreme Value Theory. Dr. Nolde suggested the use of calibration and proper scoring rules to assess forecasts. I modified the forecasts to be the entire tail of a predictive distribution, and adapted existing proper scoring rules to this type of forecast, in Section 3.2.

The idea of using vine copulas to model the conditional quantiles came from Dr. Joe. I worked out the details behind the non-identifiability issues that arose, as well as the fitting procedure, as discussed in Section 5.1.

Dr. Joe presented the issue of finding a copula family that allows for a non-constant conditional EVI. I developed the IG copula family with k = 2, with the IGL(2) copula [1] as a limit. Dr. Joe suggested generalizing the limit copula to attain higher dependence. I identified exactly how that can be done, by choosing a distribution function satisfying certain properties. This function, for the IGL(2) copula, was recognized by Dr. Joe as the Gamma(2, 1) distribution function.
I was then able to generalize this to the Gamma(k, 1) distribution function, to obtain the IG and IGL families presented in Section 6.1.

Dr. Joe presented the problem of generalizing the estimation method of Bouyé and Salmon (2009). I developed an ad hoc generalization, which ended up being the CNQR estimator with identity transformation functions. Dr. Joe identified the addition of a transformation function, and I generalized that further to allow for different transformation functions for each quantile level. I later found that the family of CNQR estimators is rooted in the theory of proper scoring rules, so that these estimators were no longer ad hoc. Dr. Nolde and Dr. Joe were instrumental in identifying errors in my proofs of the asymptotic properties of the CNQR estimators, and in putting forth suggestions for fixing them. The results can be found in Section 5.2.

Chapter 7 makes use of two R (R Core Team, 2015) packages that I developed, and am still in the process of developing. They are the copsupp and cnqr packages. The copsupp package was originally developed with the intention of supplementing the CopulaModel (Joe and Krupskii, 2014) package with miscellaneous useful functions, but has evolved into a package whose purpose is to provide a user-friendly interface for working with vine copula models. As such, its name will likely change to something more appropriate, like rvine or regvine, before being published. The copsupp package has some code written by Dr. Joe and Bo Chang. The cnqr package allows for the implementation of the proposed regression methodology, discussed in Chapter 5. For both packages, Dr. Joe was instrumental in helping me get some tricky aspects of the code to work. In addition, the code for the analysis in Chapter 7 benefited greatly from the careful eyes of Dr. Nolde and Dr. Joe.

[1] The construction of the IGL(2) copula that I was working with was different than what is presented in Section 6.1. I was working with the α = 1 version of the DJ copula family, presented in Proposition 6.2.2.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
List of Symbols
Acknowledgements
Dedication
1 Introduction
2 Background
  2.1 Data and Motivating Application
  2.2 The Forecasting Task
  2.3 Extreme Value Index
  2.4 Copulas
    2.4.1 Copula Basics
    2.4.2 Pair-Copula Bayesian Networks
    2.4.3 PCBN Representations
  2.5 Conditional EVI
    2.5.1 Motivation
    2.5.2 Conditional Quantiles
    2.5.3 Modelling the Conditional EVI
3 Assessment of a Forecaster
  3.1 Calibration
  3.2 Proper Scoring Rules
    3.2.1 Definition
    3.2.2 A Class of Proper Scoring Rules
    3.2.3 Choice of Quantile Weight Function
  3.3 Integrity of Extreme Forecasts
4 Existing Forecasting Methodology
  4.1 Local Regression
  4.2 Linear Regression
  4.3 Fully Parametric Regression
5 Proposed Forecasting Methodology
  5.1 Building a Model
  5.2 CNQR
    5.2.1 Estimation
    5.2.2 Inference
    5.2.3 Special Cases
      5.2.3.1 Parallel Linear Quantile Surfaces
      5.2.3.2 Linear Quantile Surfaces
  5.3 Selecting a Model
6 New Bivariate Copula Families
  6.1 The IG and IGL Copula Families
    6.1.1 Beginnings
      6.1.1.1 Initial Derivation
      6.1.1.2 Extension
    6.1.2 Definition
    6.1.3 Properties
  6.2 Non-parametric Extensions
    6.2.1 Durante and Jaworski Copula Class
      6.2.1.1 Definition
      6.2.1.2 Properties
    6.2.2 An Extended Class
7 Application to Flood Forecasting
  7.1 Pre-processing the Data
    7.1.1 Separating the Data
    7.1.2 Deseasonalizing the Discharge
    7.1.3 Choosing Predictors and a Response
  7.2 Building Forecasters
  7.3 Evaluation
  7.4 Forecasting the 2013 Flood
  7.5 Simulation
8 Conclusion
Bibliography
A Formulas
  A.1 Reflection Copulas
  A.2 PCBN Equations
B Proofs related to the Conditional EVI
  B.1 Proofs of Conditional EVI Results
  B.2 Derivation of CCEVI's
    B.2.1 Gaussian Copula
    B.2.2 Frank Copula
    B.2.3 Gumbel Copula
C Proofs related to Proper Scoring Rules
  C.1 Propriety
  C.2 Tail Balance
D Proofs related to CNQR Asymptotics
  D.1 Preliminaries
  D.2 Proofs
E Proofs related to the New Copula Families
  E.1 DJ Copula Class
    E.1.1 Generating Function Properties
    E.1.2 Formulas
    E.1.3 Proofs of Properties
  E.2 extDJ Copula Class
    E.2.1 Generating Function Properties
    E.2.2 Formulas
    E.2.3 Proofs of Properties
  E.3 IGL Copula Class
    E.3.1 Generating Function Properties
    E.3.2 Formulas
    E.3.3 Proofs of Properties
  E.4 IG Copula Class
    E.4.1 Generating Function Properties
    E.4.2 Formulas
    E.4.3 Proofs of Properties
F Application Supplement
  F.1 Deseasonalization of Discharge on a Log Scale
  F.2 Marginal of the Response
  F.3 Marginal of Snowmelt
  F.4 Dependence
  F.5 Weights
    F.5.1 Scenario 1: Large Discharge
    F.5.2 Scenario 2: Large Snowmelt
    F.5.3 Scenarios 3 and 4: Large Snowmelt and/or Large Discharge

List of Tables

2.1 Information about the stations where data are collected in Alberta.
2.2 CCEVI's of some existing bivariate copula models, all of which are constant. See Appendix B.2 for proofs. Note that the bivariate Gaussian copula has the same CCEVI for ρ and −ρ, the latter being a 1-reflection.
3.1 Examples of the expected single-quantile score with w(τ) = 1 and g(y) = y, as given in Equation 3.2.9. The distributions are Gaussian (with φ and Φ denoting the density and distribution function of the standard Gaussian distribution, respectively), Exponential, and Type I Pareto. Proofs can be found in Appendix C.2.
7.1 The years selected for each data set. The vine CNQR method splits the fitting data into training and validation, whereas the linear and local methods use the entire fitting data.
D.1 Indicator functions in the formulation of Z_n(θ).
F.1 Abbreviations of the copula families considered. A 'v' appended after an abbreviation represents the vertically reflected copula family, an appended 'u' represents the horizontally reflected copula family, and an appended 'r' represents the reflected copula family.
F.2 Copulas linking the response and the predictors using CNQR. The marginal distribution, followed by the linkage order of the predictors, is indicated in the top row. The fitted copulas are displayed corresponding to the linkage order, from top to bottom.
F.3 Properties of some bivariate copula families. All families have members that are at least permutation symmetric (symmetry after swapping the two variables); those that are listed as "symmetric" also have reflection symmetry (a copula C is reflection symmetric if RC = C, where R is the reflection operator defined in Definition 2.4.1). Dependence in either tail of each copula is also indicated, sometimes indicating the strength of tail dependence (a concept related to tail order; see Joe, 2014, Section 2.16). These copulas cover a range of tail behaviours in comparison to the Gaussian copula.
F.4 Copulas linking the response and the predictors using CNQR, for Order 1 with a GPD marginal, using a skewed BB1 copula linking change in ds discharge separated by a lag of 1. The fitted copulas are displayed corresponding to the linkage order, from top to bottom. "bb1sk" is short for "skewed BB1".

List of Figures

2.1 Time series plots of the discharge data (left) and drop in snowpack data (right), with each year overlaid. Note the log scale on the discharge time series, and that "SWE" is snow water equivalent.
2.2 Intervals of time where data are available. Each interval represents an unbroken record, and is displayed as a solid horizontal line between two short vertical lines. A continuous record of discharge data is available.
2.3 Graphical representation of a pair-copula Bayesian network (left) with array in (2.4.5), and a regular vine (right). The vine array on the right is the same as in (2.4.5), but with the last column replaced by (5, 1, 2, 3, 4).
2.4 Relationship between atmospheric sulphur dioxide concentrations ([SO2]) and atmospheric ozone concentrations ([O3]) (left) or probability integral transformed (PIT) [O3] (right). Measurements are in parts per billion (ppb), except for PIT scores, which are unit-less. Data are daily maximum concentrations collected at ground level in Leeds city center, UK, between the years 1994 and 1998 (inclusive) during the months of April to July.
2.5 Local estimates of the conditional EVI of atmospheric sulphur dioxide concentrations ([SO2]), conditional on PIT atmospheric ozone concentrations ([O3]). To estimate the conditional EVI at a particular [O3] PIT score, a window with some radius (indicated in the panels on the right) is constructed, and the corresponding subsample of [SO2] data is extracted. A Generalized Pareto distribution (GPD) is fit to this univariate subsample using MLE, with the threshold parameter taken to be quantile estimates with levels indicated in the upper panels. Error bands represent one standard error of the EVI estimates. Despite different estimation parameters, there always appears to be a downward trend in the conditional EVI.
5.1 Examples of quantile curves of some bivariate distributions determined by copulas. Parameters for the copula families are chosen to have a Kendall's tau of 0.3. The marginal distributions are indicated in the form of "response distribution" ~ "predictor distribution".
5.2 Asymptotic precision (inverse of the variance) of the family of CNQR estimators using K evenly spaced quantile levels, calculated with Equation (5.2.2). The single predictor is linked to the response with the copula indicated in the side panels, having a Kendall's tau of 0.3. The copulas were chosen to see whether tail dependence influences the asymptotic precision: the Frank copula has upper tail quadrant independence; the Gumbel copula has strong upper tail dependence; and the reflected Gumbel copula has intermediate tail dependence (similar to the Gaussian copula). A common transformation function gk = g is taken such that the transformed response Y ↦ g(Y) has the marginal distributions indicated in the upper panels (the columns should therefore be interpreted as different CNQR estimators, not as different underlying distributions). Using approximately 10 quantile levels appears sufficient to achieve the converged (K → ∞) precision. It appears that transforming the response to a heavy-tailed distribution may compromise the performance of the resulting CNQR estimator.
6.1 Some generating functions Ψk, described in Equation (6.1.7).
6.2 Some generating functions Hk(·; θ), described in Equation (6.1.8).
6.3 Normal scores plots of simulated data from members of the IG(θ, k) and IGL(k) copula families, defined respectively in Definitions 6.1.1 and 6.1.2. The marginal distributions are standard Gaussian, and the dependence is described by an IG or IGL copula. Each panel displays 1000 randomly generated observations.
6.4 Kendall's tau dependence of the IG and IGL copula families, estimated by a random sample of 10,000 observations.
6.5 Normal scores plots of simulated data from the IG copula family (Definition 6.1.1) with θ and k parameters being equal, indicated in the panel labels. The marginal distributions are standard Gaussian, and the dependence is described by an IG copula.
6.6 Examples of quantile curves of bivariate distributions described by the IGL(1.6) and IG(8.9, 3) copulas, defined respectively in Equations (6.1.12) and (6.1.9). Both copulas are chosen to have a Kendall's tau of approximately 0.3. The marginal distributions are indicated in the form "response distribution" ~ "predictor distribution".
7.1 Estimated seasonal components {µi} and {σi} (respectively, location and scale) of the discharge for the selected "flood season".
7.2 Overlaid time series plots of deseasonalized discharge during the "flood season", on each of the training, validation, and test sets.
7.3 Autocorrelation plots of ds discharge and change in ds discharge. Error bands represent 95% confidence bands about zero. The ds discharge is highly serially correlated, whereas the change in ds discharge has minimal serial correlation.
7.4 Calibration plots of the ds discharge marginal forecaster, and the change in ds discharge marginal forecaster. A marginal forecaster uses the marginal quantile function as a forecast. The ten curves in each panel represent the results under ten randomly chosen splits of the data into fitting and test sets.
7.5 Scatterplots of predictors (listed in the upper panels) against the response, with local estimates of upper quantile curves. Variables are unit-less, aside from lag 0 and lag 1 drop in snowpack, which are in millimeters. The fitted curves are local estimates of K = 10 upper quantiles with levels above τc = 0.9, taken to be τk = τc + (1 − τc)(2k − 1)/(2K) for k = 1, ..., K. The curves are displayed only to convey trends. The curves are obtained by fitting a GPD tail with MLE over a moving window having a bandwidth of 0.5 standard deviations of the predictor.
7.6 Calibration plots (above) and histograms (below) of the competing forecasters, on the test set. The dashed lines represent a perfectly calibrated forecaster. For the calibration plots, the region above the dashed line represents under-estimation (i.e., an exceedance occurs more often than suggested), and the region below represents over-estimation (i.e., an exceedance occurs less often than suggested).
7.7 Estimates of the mean score for each of the three competing forecasters, computed on the test set. Error bars represent the standard error of the mean estimate, ignoring autocorrelation. Smaller scores are better.
7.8 Of the forecasts issued by the linear forecaster when the observed outcome/discharge is at least as big as the horizontal axis, the solid blue line represents the (interpolated) proportion of forecasts that are inconsistent (that is, are non-monotonic quantile functions). Inconsistency was determined using 10 quantile levels. Approximately 11.3% of forecasts are inconsistent, and inconsistencies appear more likely when the outcome is large. In addition, the raw data are shown: each instance of an inconsistent forecast is plotted with jitter around a proportion of 1, and around 0 for each consistent forecast.
7.9 Comparison of weighted scores amongst the competing forecasters, estimated using the test set. The weighting scenario is indicated in the panel labels. Lines connect scores under common weighting schemes, and the solid horizontal lines represent the unweighted scores.
7.10 Calibration plots of the competing forecasters under different weighting schemes, estimated using the test set. The forecaster is indicated in the side panels; the weighting scenario is indicated in the upper panels. The dashed line represents a perfectly calibrated forecaster; the region above the dashed line represents under-estimation (i.e., an exceedance occurs more often than suggested), and the region below represents over-estimation (i.e., an exceedance occurs less often than suggested).
7.11 Above: discharge during the start of the 2013 flood, on the indicated days, observed at the Bow River at Banff. Below: forecasts that would have been issued for the Bow River at Banff for the day after that which is indicated. The diamond-shaped bullet indicates the actual observed value on the next day.
7.12 Linear simulation data with fitted linear (left) and non-linear (right) quantile surfaces. The linear surfaces were fit using linear quantile regression, and the non-linear surface with vine CNQR.
7.13 Non-linear simulation data with fitted linear (left) and non-linear (right) quantile surfaces. The linear surfaces were fit using linear quantile regression (with ten quantile levels above 0.9), and the non-linear surface with vine CNQR (with quantile levels 0.905 and 0.995).
D.1 An example of an objective function of the CNQR estimator in Equation (5.2.1) with two quantile levels τ1 = 0.6 and τ2 = 0.8, and transformation functions g1 = g2 equal to the identity function. Ten bivariate observations are generated from a Gumbel copula with parameter θ0 = 5, and given standard Exponential marginals. The objective function is created with these data having known marginals but unknown copula parameter. A region of the parameter space containing the minimum is displayed.
F.1 QQ and PP plots of the marginal distribution estimate of deseasonalized discharge, against the empirical distribution for the fitting data. Only the upper corners of the plots are shown, since both distributions are equal in the lower corners.
F.2 Empirical distribution functions of the probability integral transformed (PIT) snowmelt, using the estimated cdf of snowmelt. The PIT samples should be from a Uniform(0,1) distribution if the cdf estimate is correct, occurring when the empirical distribution falls on the diagonal.
F.3 Mean score estimates of the candidate vine CNQR forecasters, estimated using the training set. Error bars represent the standard error of the mean estimate (which ignores autocorrelation).
F.4 Calibration plots of the candidate vine CNQR forecasters, using the training set. Each appears to be sufficiently calibrated.
F.5 Calibration histograms of the candidate vine CNQR forecasters, using the training set.
F.6 Single-variable logistic weight functions resulting from the weight parameter choices for discharge. The histogram is that of discharge data on the test set (and is without scale).
F.7 Single-variable logistic weight functions resulting from the weight parameter choices for snowmelt. The histogram is that of snowmelt data on the test set (and is without scale).

List of Abbreviations

CCEVI: Copula Conditional Extreme Value Index. See Definition 2.4.2.

CNQR: Composite Nonlinear Quantile Regression. See Section 5.2.

DJ: The Durante and Jaworski class of copulas. See Section 6.2.1 for a description.

ds: Deseasonalized, referring to deseasonalized discharge of the Bow River at Banff. See Section F.1 for a definition.

Exp(µ): The Exponential distribution with mean µ > 0, defined with distribution function F_Exp(x; µ) = 1 − exp(−x/µ) for x > 0.

extDJ: The extended DJ copula family. See Section 6.2.2 for a description.

EVI: Extreme Value Index. See Section 2.3 for a description.

IG(k, θ): The Integrated Gamma copula with parameters k > 1 and θ > 0. See Equation (6.1.9) for a definition.

IGL(k): The Integrated Gamma Limit copula with parameter k > 1. See Equation (6.1.12) for a definition.

iid: Independent and identically distributed, referring to a set of random variables.

Gamma(k, θ): The Gamma distribution with shape parameter k > 0 and scale parameter θ > 0, defined by F_Gamma(x; k, θ) = 1 − Γ*(k, x/θ)/Γ(k) for x ≥ 0.

GPD: Generalized Pareto Distribution.

N(µ, σ²): The Gaussian distribution with mean µ ∈ R and variance σ² > 0. This is also used to indicate the multivariate Gaussian distribution.

Par(σ, α): The Type I Pareto distribution, with scale parameter σ > 0 and shape parameter α > 0, defined by F_Par(x; σ, α) = 1 − (x/σ)^(−α) for x ≥ σ.

PCBN: Pair-Copula Bayesian Network. See Section 2.4.2 for a definition.

PIT: Probability Integral Transform, of a continuous random variable. If W is a random variable, then F_W(W) is the PIT score of W, and follows a Unif(0, 1) distribution.

resp.: Respectively; used within braces to complement a result.

SWE: Snow Water Equivalent, which is the depth of water obtained after having melted a snowpack.

Unif(a, b): The continuous uniform distribution from a to b, where a < b.

List of Symbols

This section defines the key symbols used throughout this dissertation. To aid in defining some symbols, let W = (W1, ..., Wd)⊤ be a random vector of continuous random variables for some integer d ≥ 2, and let h be a real-valued function with domain in R. Whenever h is such that h(x0) does not exist for some x0 in the domain of h, the convention is adopted that this quantity is defined as the appropriate limit of h at x0.

:=  Used for defining symbols. The quantity on the left-hand side is defined as the quantity on the right-hand side. The symbol =: means that the quantity on the right-hand side is defined as the quantity on the left-hand side.

→d  Convergence in distribution.

→p  Convergence in probability.

a : b  The set of integers from a to b inclusive, where a ≤ b are integers; that is, [a, b] ∩ Z. The convention is adopted that b : a = ∅.

C_W  The copula of W.

c_W  The copula density of C_W, defined as D_{1:d} C_W.

R  The reflection operator acting on a d-dimensional copula, defined as R = R_{1:d}.

R_I  For a set of integers I ⊂ 1 : d and a d-dimensional copula C, R_I C is the I-reflection of copula C, defined in Appendix A.1.

C+  The comonotonicity copula, C+ : (u, v) ↦ min(u, v).

C−  The countermonotonicity copula, C− : (u, v) ↦ max(u + v − 1, 0).

C⊥  The independence copula, C⊥ : (u, v) ↦ uv.

C_{a1,a2;a3:d}  The copula distribution function joining the pair (W_{a1}, W_{a2}) | W_{a3:d}, where a = (a1, ..., ad)⊤ is some permutation of 1 : d.
This notation is shorthand for the more proper notation that replaces each ai with W_{ai}, i = 1, ..., d.

c_{a1,a2;a3:d}  The copula density function joining the pair (W_{a1}, W_{a2}) | W_{a3:d}, where a = (a1, ..., ad)⊤ is some permutation of 1 : d. This notation is shorthand for the more proper notation that replaces each ai with W_{ai}, i = 1, ..., d.

C_{a1|a2;a3:d}  A conditional distribution of a bivariate copula, defined as D1 C_{a1,a2;a3:d}, where a = (a1, ..., ad)⊤ is some permutation of 1 : d. For example, when d = 2 and C is a bivariate copula, C_{2|1} = D1 C and C_{1|2} = D2 C are conditional distribution functions. This notation is shorthand for the more proper notation that replaces each ai with W_{ai}, i = 1, ..., d.

Cal  Calibration, defined in Equation (3.1.1).

C_DJ  The DJ copula family, defined in Equation (6.2.1).

C_extDJ  The extDJ copula family, defined in Equation (6.2.6).

C_IG  The IG copula family, defined in Equation (6.1.9).

C_IGL  The IGL copula family, defined in Equation (6.1.12).

ĉ_W  The reflection copula density, D_{12} Ĉ_W.

Ĉ_W  The reflection copula R C_W, so that the "hat" can also be thought of as the reflection operator.

δ  Mapping from a calendar date to the day of the year. The range is 1 : 366.

D  The differential operator, defined as D h = h′.

Dj  The differential operator on the j'th argument. If hn : R^n → R is a function differentiable in its j'th argument for some integer n ≥ 1, then Dj hn for 1 ≤ j ≤ n is the derivative with respect to the j'th argument, defined as Dj hn(x1, ..., xn) = ∂ hn(x1, ..., xn) / ∂ xj.

D(Gξ)  The (maximum) domain of attraction, loosely defined as the set of all distribution functions having sample maxima that tend to Gξ in distribution (up to location and scale). A distribution function F is said to have extreme value index ξ ∈ R if F ∈ D(Gξ).

E  Expectation. For W1, the expectation is E(W1) = ∫ w dF_{W1}(w), sometimes written E_{W1} to emphasize expectation over the distribution of W1.

F_W  The distribution function of W. When it is clear, the convention is adopted that F_{a:b} = F_{W_{a:b}} for integers 1 ≤ a ≤ b ≤ d.

f_W  The density of W, if it exists. When it is clear, the convention is adopted that f_{a:b} = f_{W_{a:b}} for integers 1 ≤ a ≤ b ≤ d.

F  The set of all possible conditional distributions of Y.

F̂  The set of all upper distribution functions corresponding to the forecast space, defined as F̂ = {Q← : Q ∈ Q̂}.

F  A sigma field of sets in Ω.

g  Transformation function, part of the proper scoring rule in Equation (3.2.8).

Gξ  The distribution function of the standard generalized extreme value distribution for ξ ∈ R, defined as Gξ(x) = exp(−(1 + ξx)^(−1/ξ)) for ξ ≠ 0, and Gξ(x) = exp(−exp(−x)) for ξ = 0, where x ∈ [−1/ξ, ∞) when ξ > 0, x ∈ R when ξ = 0, and x ∈ (−∞, −1/ξ] when ξ < 0.

G  A sigma field within F, interpreted as an "observation" or "experiment". Used for discussions of conditional probability.

Hk  A generating function of the extDJ copula class that generates an IG copula; defined in Equation (6.2).

Γ  The gamma function. For k > 1 [2], the gamma function is Γ(k) = ∫₀^∞ x^(k−1) exp(−x) dx.

[2] Although the gamma function is also defined when k is complex, only k > 1 is needed.

Γ*  The (upper) incomplete gamma function. For (k, t) ∈ (1, ∞) × [0, ∞), Γ*(k, t) = ∫_t^∞ x^(k−1) exp(−x) dx.

h←  The generalized inverse function of h. This is defined when h is non-decreasing by h←(s) = inf{x ∈ R : h(x) ≥ s} for all s in the range of h. When h is non-increasing, the generalized inverse is defined as h←(s) = h̄←(−s), where h̄ = −h, resulting in h←(s) = sup{x ∈ R : h(x) < s}. If a and b are two monotone real functions, then a ∘ b is monotone. If a and b have opposite monotonicity, then a ∘ b is decreasing. If a and b are both increasing, (a ∘ b)← = b← ∘ a← is increasing.

h′, h′′  First and second derivatives of h, respectively.

I  The indicator function. For a set S, I_S(x) = 0 for x ∈ S, and I_S(x) = 1 for x ∉ S.

κψ  The kappa function for DJ generating function ψ, defined in Equation (E.1.1).

Ω  The sample space that is the domain of Y.

p  The number of predictors, X.

P  Probability measure on F. Also used as the generic symbol for "probability".

φ  The density of the standard Gaussian distribution.

Φ  The distribution function of the standard Gaussian distribution.

Ψk  A DJ generating function that generates an IGL copula; defined in Equation (6.1.7).

Q_{W1}  The quantile function of W1, equal to F←_{W1}.

Q  The set of all conditional quantile functions of Y, defined as {F← : F ∈ F}.

Q̂  The set of forecasts forming the forecast space.

RVα  The set of regularly varying functions (at infinity) with index of variation α ∈ R. A function h ∈ RVα if and only if lim_{x→∞} h(ux)/h(x) = u^α for any u > 0. Further, h(x) = x^α ℓ(x) for x in the domain of h, where ℓ ∈ RV0 is a slowly varying function. We say a function h ∈ RVα at 1− if x ↦ h(1 − 1/x) ∈ RVα at infinity, so that h(x) = (1 − x)^(−α) ℓ((1 − x)^(−1)), where ℓ ∈ RV0 at 1−.

ρτ  The asymmetric absolute deviation loss function, defined in Equation (3.2.6).

τc  Lower quantile level cutoff, defining the lower limit of the quantile function forecasts.

θ+  For θ ∈ R, the positive part of θ, defined as max(0, θ).

θ−  For θ ∈ R, the negative part of θ, defined as min(0, θ).

v_{a:b}  The subvector of v = (v1, ..., vp)⊤, defined as v_{a:b} = (va, ..., vb)⊤ for integers 1 ≤ a ≤ b ≤ p. The convention is adopted that v_{b:a} is a vector of length zero.

w  Cross-quantile weight function, part of the proper scoring rule in Equation (3.2.8).

X  A vector of p predictors.

X  The predictor space; X ∈ X ⊂ R^p.

Ξλ  The Box-Cox power transformation, defined in Equation (2.5.2).

ξ_{W1}  The extreme value index of F_{W1}, if it exists.

Y  The response variable, to be forecast; a random variable with probability space (Ω, F, P).

Yt  For times t = 1, 2, ..., (not necessarily independent) replicates of Y; Yt is the response at time t.

Y  The support of the distribution of Y, often taken to be R.

Acknowledgements

It's incredible the impact a person can have on another's life. During my PhD program, I have had the fortune of being able to surround myself with truly inspiring people who have believed in my potential from the start. Without these people, not only would I not have accomplished this dissertation, but I would not have developed much as a person either.

First and foremost, my supervisors, Dr. Natalia Nolde and Dr. Harry Joe, are inspiring teachers and provided an effective learning environment. They have dedicated a generous amount of time and patience to help get me where I am today. Through regular meetings and feedback on my work, I have gained insight into their ideas, big-picture mindset, and pragmatic sense of what's important. Their patience has been a godsend, having dealt with a lot from me, from ignorance in research to the emotional ups and downs that proved inhibitive at times. In particular, Dr. Nolde's openness to learning new things is inspiring. Dr. Joe's computing practices are inspiring, and his extra help checking my code has been very helpful. Also, I appreciate Dr. Matías Salibián Barrera for his involvement with my supervisory committee; his input has been very valuable.

Of course, this work would not have been possible without my funders.
I am grateful to the Natural Sciences and Engineering Research Council of Canada for a postgraduate scholarship, as well as the University of British Columbia for funding through the Four-Year Fellowship and Faculty of Science Graduate Award. Dr. Nolde and Dr. Joe have also graciously provided me with funding through research assistantships.

I owe it to the Department of Statistics for believing in my ability from the start, and allowing me to partake in this journey under its umbrella. The department offered a nurturing environment in several ways. Its incredibly inspiring members gave me valuable insight into professional effectiveness. The many seminars, conferences, visiting speakers, department teas, and grad trips offered valuable new perspectives and friendships. The volunteer positions that were made available, such as the grad rep position and membership on a search committee, have given me valuable leadership skills. This environment allowed me to build valuable communication skills and a big-picture mindset of how statistics fits into the world.

Discussions with my colleagues have also proved invaluable. For instance, Bo Chang provided valuable input through discussions of software development, and David Lee provided insight through discussions on my research.

It was totally eye-opening to hear about the needs surrounding the Alberta 2013 flood, from speaking with BGC Engineering, the town of Canmore, the Alberta Environment and Parks, and TransAlta. These conversations have provided me with a valuable new perspective of how government and industry respond to and plan for natural disasters. I am grateful for the opportunity to be involved with addressing the issue of flooding.

Friends and family have played an integral role in the development of this dissertation. All of my friends have lent an ear and offered their support, and provided an outlet for stress relief and character building. Each person has helped me complete this dissertation in their own unique way. I'm grateful to my parents for always encouraging me to pursue education, and for fostering a love of learning. Colleen Lau, for helping free my mind and exposing me to new activities, through things such as our spontaneous days. Christina Bouchard, for helping me stay strong in the face of difficulties. Antonio Coia, for helping me connect with the outdoors. Derek Kief, for reminding me of the beauty of mathematics and science, and my research. Adam Higgs, for helping me gain confidence in myself. Al Cannon, for his empathy for my struggles, and his ideas. Tyler Kolpin, for enabling me to focus more of my energy on my research. Gareth Sirotnik, for bringing Zen meditation practice to UBC. Elena Shchurenkova and Andres Sanchez-Ordonez, for incredible hikes. Daniel Dinsdale and colleagues, for dinner parties.

This dissertation is important to me, as I am thrilled to be part of our civilization's pursuit of knowledge. There is still much that must be learned so that we can drive our civilization forward. I dream of being a driver of such progress, and this dissertation has given me more competence and a better idea of how this can be done. So, these people have not only greatly assisted with the completion of this dissertation, but by extension, have helped me step closer to my life vision. I am forever grateful for this.
Thank you all.

Dedication

To Zio Claudio, whose curiosity of nature is contagious.

Chapter 1: Introduction

On June 19, 2013, Alberta saw the Bow River and its tributaries swell beyond historical levels and devastate the river's nearby towns in a costly 3-day flood. The town of Canmore was one such town, whose unprepared inhabitants ended up stranded on an "island" surrounded by rushing water (Doll and Geddes, 2014).

How can one gain insight into such extreme events in advance? There will always be some amount of uncertainty in the outcome, but sometimes there are "precursors" that can help reduce such uncertainty. For the Alberta flood, it was heavy rain, a saturated ground, and a heavy snowpack together that made flooding almost certain. Such precursors are called predictors (or sometimes covariates or input variables), and when informative ones exist, the task becomes one of effectively garnering the insight that the predictors conceal about the outcome, extreme outcomes in particular. Statistically, this "insight" is the conditional distribution of the response given the predictors.

A plethora of regression and machine learning methods exists that attempt to extract such insight about a typical outcome (i.e., the expectation of the conditional distribution), linear regression being an example. But the expected outcome will be exceeded often, by its very nature of being a "middle" value. If large enough, this exceedance can be devastating, and unforeseen. Instead, one should garner insight into the chance of an extreme outcome. An extreme is usefully communicated as an extreme quantile: a predicted outcome that is exceeded with some small chance called an exceedance probability [1].

[1] One minus the exceedance probability is called a quantile level.

However, issuing a single quantile or a few quantiles (point forecasts) is still quite limited in the amount of information conveyed, and the choice of exceedance probabilities is arbitrary. It has been recognized as early as Cooke (1906), and stressed by such authors as Krzysztofowicz (2001) and Gneiting and Raftery (2007), that it is important to communicate quantiles for all exceedance probabilities (a probabilistic forecast); that is, to convey the entire conditional distribution. But quantiles having large exceedance probabilities are uninteresting in the context of extremes, and estimating these can compromise the quality of the estimates of extreme quantiles. Instead, it is more sensible to focus only on forecasting all extreme quantiles (i.e., the tail of the conditional distribution). The goal of this dissertation is to find a way to build a model that produces "good" forecasts of this "upper quantile function" type.

A model that produces forecasts is called a forecaster. To build a forecaster, a formula for obtaining a quantile from both the exceedance probability and the predictor is required. Three qualities are sought in a forecaster, as evaluated by forecasts on future events (as opposed to past events already "known" by the forecaster):

1. consistency of all forecasts produced, so that forecasts are valid upper quantile functions;
2. ability to extrapolate into the distributional tail reliably, so that extremes are properly represented; and
3. flexibility across the space of predictors, so that the unique effect of predictors (such as tail dependence) can be accommodated.

Existing methods to create such forecasts fall under the "quantile regression" umbrella. Many methodologies exist that can handle such regression, but each has downfalls.

One commonly used method is linear quantile regression, introduced by Koenker and Bassett (1978). This method presumes that each quantile has a generic linear trend in the predictors, unrelated to other quantiles (non-parametric with respect to the exceedance level). The estimation method enforces an upper bound on the distributional tails, violating the extrapolation requirement. Further, this method can result in non-monotonic predictive distributions, since no relation across exceedance probabilities is enforced. Modifications have been proposed to fix the non-monotonicity problem (cf. Bondell et al., 2010; Chernozhukov et al., 2010). Others add parametric assumptions across exceedance levels (cf. Zou and Yuan, 2008), but do not properly focus on the decay rate of the distribution's tail, and do not allow for flexibility across the predictor space.
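The crossing problem is easy to reproduce. The following small R sketch (my illustration on simulated data, using the widely available quantreg package; it is not code from this dissertation) fits several upper quantile levels independently with linear quantile regression and counts how often the resulting forecasts fail to be non-decreasing in the quantile level:

```r
# Sketch: independently fitted linear quantile surfaces can cross,
# producing invalid (non-monotonic) upper quantile function forecasts.
library(quantreg)

set.seed(1)
n <- 500
x <- runif(n, 0, 3)
y <- rexp(n, rate = 1 / (1 + x))     # response scale grows with the predictor

taus <- seq(0.91, 0.99, by = 0.02)   # upper quantile levels, fit with no link between them
fit  <- rq(y ~ x, tau = taus)        # one linear fit per quantile level

grid <- data.frame(x = seq(0, 3, length.out = 200))
qhat <- predict(fit, newdata = grid) # rows: predictor values; columns: quantile levels
crossed <- apply(qhat, 1, function(q) any(diff(q) < 0))
mean(crossed)                        # proportion of forecasts on the grid that are inconsistent
```

Section 7.3 reports the analogous diagnostic for the linear forecaster on the flood data (Figure 7.8), where roughly 11.3% of issued forecasts are inconsistent.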
There are also local fitting methods, such as a kernel method (Spokoiny et al., 2013), that do not impose any structure across the predictor space (i.e., they are non-parametric in the predictors), except that nearby predictors should convey similar predictions. However, these methods only use information from past scenarios where the predictors were observed to be similar. When more predictors are considered, the chance of finding similar past situations dramatically decreases, so that less data are available for making a future prediction. And whatever data are available are likely "typical" values under that scenario, so that minimal information about extremes is available. Ultimately, this "data sparsity" issue results in highly uncertain forecasts.

Newer methods are available that specify a (parametric) relationship across both exceedance probabilities and predictors, using copula models. These have the benefit of producing consistent forecasts, and are able to incorporate Extreme Value Theory to justify a model for the distributional tail. Bouyé and Salmon (2009) use bivariate copulas in the one-predictor setting for quantile regression. Kraus and Czado (2017) use vine copula models to accommodate multiple predictors, but focus on fitting the entire joint distribution of the predictors and response, and do not optimize the fit of the predictive distribution's tail.

This dissertation also uses vine copula models (extended to pair-copula Bayesian networks). Such models provide a flexible approach for modelling dependence amongst the predictors and response, and offer far more flexibility than the multivariate Gaussian distribution can provide. For example, perhaps the combination of heavy rain, a saturated ground, and heavy snowpack had more of a detrimental impact on the Alberta flood than the independent sum of those effects. This type of "tail dependence" can be captured with vine copulas. This modelling technique is adapted to fitting the conditional distribution's tail, and is estimated using the proposed composite nonlinear quantile regression (CNQR) family of estimators. This approach to quantile regression is one of the major new contributions of this dissertation. Also, the CNQR estimator has roots in the theory of proper scoring rules. Describing a proper scoring rule for upper quantile function forecasts is a major contribution of this dissertation.

The decay rate of the conditional distribution into the tail is determined by the conditional extreme value index (EVI), which might not be constant in the predictors. Trends in the conditional EVI must be captured by the underlying copula. A new copula family allowing for a non-constant conditional EVI is proposed, as well as one that turns a heavy-tailed marginal distribution into a light-tailed conditional distribution; this is another major new contribution of this dissertation.

The outline of the remainder of this dissertation is as follows. In Chapter 2, the example motivating the work of this dissertation is introduced in Section 2.1, followed by a formal description of the statistical objective in Section 2.2. The chapter continues by describing some requisite background in Extreme Value Theory (Section 2.3) and copulas (Section 2.4), the two of which are tied together with new concepts related to the conditional EVI (Section 2.5). Chapter 3 describes what it means for a forecaster to be "good". The new research here begins with Section 3.2.2, where a new class of proper scoring rules is proposed, and new research continues for the remainder of the chapter. A more detailed description of existing methodology for building a forecaster is given in Chapter 4, which is followed by a discussion of the proposed methodology and proposed estimation in Chapter 5 (which contains all new research). To allow for greater flexibility in modelling, new copula families are introduced in Chapter 6 that allow for a non-constant or zero conditional EVI (the entire chapter is new research). The methodologies discussed in this dissertation are compared through an application to predicting extreme flows of the Bow River at Banff, Alberta, in Chapter 7, which is all new research. Chapter 8 concludes with a summary of the findings and a discussion of some future research.

This dissertation also contains several appendices to supplement the material in the body. Appendix A gives formulas for vine and reflection copulas. Appendices B to E consist of new research, and contain proofs related to the conditional EVI of Section 2.5, the new proper scoring rules of Chapter 3, the CNQR estimator of Section 5.2, and the new copula families of Chapter 6 (respectively). Appendix F ends with details about the data analysis in Chapter 7, and is all new research.

Chapter 2: Background

The town of Canmore, Alberta experienced a devastating flood in the year 2013, motivating the need to forecast the "worst case scenario" river flow in the near future on a day-to-day basis. To do this, data on rainfall, snowpack, and river discharge are available at various stations since the 1980s. Section 2.1 elaborates on this motivation and the data.

A forecast is a communication of one's belief about a future outcome. This belief is represented as a probability (or density) for each possible outcome, and forms the predictive distribution. When there are observations on predictors, the predictive distribution should be as close as possible to the conditional distribution of the response given the predictors. An extreme forecast can be formed by communicating the tail of this distribution in the form of an upper quantile function. These concepts are formally stated in Section 2.2.

To build a forecaster of extremes, the methodology proposed in this dissertation makes use of concepts in Extreme Value Theory and copulas. The requisite background is provided in Sections 2.3–2.5.

2.1 Data and Motivating Application

The interest is in forecasting the discharge of the Bow River at Banff [1], one day ahead. The discharge of a river can be defined as the volume of water that flows past a cross-section of the river over a time period, measured in m³/s. Discharge data beginning in the year 1980 are available, and are displayed in Figure 2.1. For the time periods where data are available, see Figure 2.2.

[1] Discharge data are from the Water Survey of Canada, a branch of Environment and Climate Change Canada, https://www.ec.gc.ca/rhc-wsc/.

[Figure 2.1: Time series plots of the discharge data (left) and drop in snowpack data (right), with each year overlaid. Note the log scale on the discharge time series, and that "SWE" is snow water equivalent.]

[Figure 2.2: Intervals of time where data are available. Each interval represents an unbroken record, and is displayed as a solid horizontal line between two short vertical lines. A continuous record of discharge data is available.]

To forecast discharge using information other than lagged ("recent") discharge and day of the year, variables that influence surface runoff are typically sought (cf. Beven, 2012). These include snowmelt, rainfall, soil infiltration, and evapotranspiration, and are typically computed from other measurements such as snowpack and temperature data. For simplicity of modelling, only snowmelt among these surface runoff variables is used; simple hydrologic models have been shown to compete well with complex hydrologic models anyway (Micovic, 2005).

Omitting rainfall as a predictor might seem surprising. Newspaper articles often report flash flooding of urban creeks due to heavy rainfall, such as in North Vancouver in British Columbia (Li, 2016). Perhaps rainfall has a heavy influence on urban creeks due to a higher abundance of impermeable surfaces, such as concrete, in the watershed. Further, the media may be more likely to report such floods by the very nature of them occurring in urban areas. As for rivers in the Rocky Mountains (including the Bow River at Banff), there is evidence that snowmelt (not rainfall) is the major driver of surface runoff, whereas rainfall becomes the predominant driver in the prairies (Environment and Climate Change Canada, 2009). Empirical evidence from the available data confirms that discharge is more dependent on snowmelt than on rainfall.

The snowpack data [2] are available at various stations, shown in Table 2.1. Snow pillows are used to measure the "Snow Water Equivalent" (SWE) of the snowpack, which is the depth of water that would be obtained if the snowpack were melted (in millimeters). As a proxy for snowmelt, the drop in snowpack is averaged across all stations with available data on a given day. These data are displayed in Figure 2.1. More advanced methods exist to estimate quantities such as total basin snowmelt, but these are not considered in this dissertation.

[2] The snowpack data were provided to the authors by the Alberta Ministry of Environment and Parks, http://aep.alberta.ca/.
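As a concrete sketch of this proxy computation, the following illustrative R code (mine, not the dissertation's; the SWE values are invented, and only the station IDs come from Table 2.1) differences each station's SWE record and averages the day-over-day drops across whichever stations report on a given day:

```r
# Sketch: daily drop in snowpack (mm SWE), averaged across reporting stations.
# The data frame is illustrative; the swe_mm values are invented.
swe <- data.frame(
  station = rep(c("05BF824", "05BJ805"), each = 4),
  date    = rep(as.Date("2013-06-16") + 0:3, times = 2),
  swe_mm  = c(120, 112, 95, 90, 80, 78, 70, 61)
)
swe <- swe[order(swe$station, swe$date), ]

# Day-over-day drop per station (positive value = snowpack decreased).
drops <- do.call(rbind, lapply(split(swe, swe$station), function(d) {
  data.frame(date = d$date[-1], drop_mm = -diff(d$swe_mm))
}))

# The snowmelt proxy: average over all stations with data on each day.
snowmelt <- aggregate(drop_mm ~ date, data = drops, FUN = mean)
snowmelt
```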
Station Name          Station ID   Measurement Type
Bow River at Banff    05BB001      Discharge (m³/s)
Three Isle Lake       05BF824      Snow Water Equivalent (mm)
Little Elbow Summit   05BJ805      Snow Water Equivalent (mm)
Mount Odlum           05BL812      Snow Water Equivalent (mm)
Skoki Lodge           05CA805      Snow Water Equivalent (mm)
Sunshine Village      05BB803      Snow Water Equivalent (mm)

Table 2.1: Information about the stations where data are collected in Alberta.

2.2 The Forecasting Task

Let Y : Ω → Y ⊂ R be a random variable on the probability space (Ω, F, P), where F is a sigma field of subsets of a sample space Ω, and P is a probability measure [3]. The marginal distribution function satisfies

F_Y(y) = P({ω ∈ Ω : Y(ω) ≤ y})

for all y ∈ Y. A forecast is an upper quantile function Q̂ : (τc, 1) → R for non-decreasing Q̂, where τc ∈ (0, 1) is a quantile level cutoff, chosen by the analyst to suit the needs of the forecast users. Surely, without any information, the best forecast that can be issued is the true quantile function F←_Y over (τc, 1).

[3] Throughout this dissertation, P is used in general to represent the "probability" of any event.

Sometimes, partial information about the response is available. This partial information takes the form of knowing whether the outcome ω ∈ G for all G ∈ G, where G is a sigma field contained in F that can be interpreted as an "observation". In this case, the conditional distribution of Y given knowledge of G should be the forecast, because this distribution, by setup, is the most one can know under this information.

Of course, such a conditional distribution is unknown, and requires estimation. Further, this estimation can only practically be done after selecting only the most useful information. Thus, it is supposed that this information takes the form of observations on p predictors X ∈ X ⊂ R^p. Under the information that X = x is observed, the true quantile function Q_{Y|X=x} over (τc, 1) is the best forecast. The task of issuing a forecast therefore becomes one of estimating the tail of the conditional distribution of Y given the predictors.

Since forecasts depend on observed information, a forecast is stochastic. A forecaster is a stochastic draw from a set Q̂ of forecasts [4]. The corresponding set of upper distribution functions is F̂ = {Q̂← : Q̂ ∈ Q̂}. In this vein, the set of all (true) conditional distribution functions of Y given the results of any such observation G ⊂ F is denoted F, and its quantile functions are Q = {F← : F ∈ F}.

[4] This set is not necessarily the set of all upper quantile functions. We will see in Section 3.2 that information about this set is required for assessing goodness.

The response occurring at the (discrete) time t is Yt. Gneiting et al. (2007) speak of a sequence of distributions from which nature generates the respective responses Y1, Y2, .... However, the concept of a "generating distribution" depends highly on philosophy; for instance, it could be argued that nature only draws an outcome from a degenerate distribution. Instead, we think of the responses Y1, Y2, ... as only being stochastic relative to an observer, with sequences of distributions belonging to F, depending on what information is observed.
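As a minimal illustration of the forecast object itself (my own sketch, not code from this dissertation), the simplest forecaster, the "marginal forecaster" appearing later in Figure 7.4, ignores predictors entirely and issues the marginal quantile function above the cutoff; here it is estimated empirically with τc = 0.9 on simulated stand-in data:

```r
# Sketch: a forecast is an upper quantile function on (tau_c, 1).
set.seed(2)
y     <- rexp(2000)                     # stand-in for past responses
tau_c <- 0.9
taus  <- seq(0.905, 0.995, by = 0.01)   # quantile levels above the cutoff

Q_hat <- function(tau) quantile(y, probs = tau, names = FALSE)

Q_hat(taus)                             # the issued forecast at a grid of levels
stopifnot(!is.unsorted(Q_hat(taus)))    # consistency: non-decreasing in tau
```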
Let $Y_1, \ldots, Y_n$ be an independent random sample drawn from the distribution function $F_Y$ with support $\mathbb{R}$, and denote $Y_{n:n} = \max(Y_1, \ldots, Y_n)$ as the sample maximum. Of course, $Y_{n:n}$ approaches the right-endpoint of $F_Y$ as $n \to \infty$, and its distribution thus becomes degenerate. But if there exist real sequences $\{a_n > 0\}$ and $\{b_n\}$ such that the distribution of $(Y_{n:n} - b_n)/a_n$ as $n \to \infty$ is not degenerate, what might that distribution look like? That is, if the following assumption holds, can something be said about $G$?

Assumption 2.3.1. For $y \in \mathbb{R}$,

$$\lim_{n\to\infty} F_Y^n(a_n y + b_n) = G(y), \tag{2.3.1}$$

where $G$ is not degenerate.

Proposition 2.3.1 (Fisher and Tippett, 1928; Gnedenko, 1943). Suppose Assumption 2.3.1 holds for $G$ non-degenerate. Then $G$ belongs to the class of Generalized Extreme Value (GEV) distributions $G_\xi((x-\mu)/\sigma)$ for location parameter $\mu \in \mathbb{R}$, scale parameter $\sigma > 0$, and shape parameter $\xi \in \mathbb{R}$, where

$$G_\xi(z) = \exp\!\left(-(1+\xi z)^{-1/\xi}\right) \tag{2.3.2}$$

for $1 + \xi z > 0$.

See de Haan and Ferreira (2006, Theorem 1.1.6) for a proof. We will see later that larger values of $\xi$ correspond to distributions $F_Y$ with heavier tails.

In addition, other valid sequences $\{a_n > 0\}$ and $\{b_n\}$ satisfying the convergence property in Equation (2.3.1) do not change the shape parameter $\xi$ of the resulting GEV distribution; only the location and scale parameters might change. We therefore speak of convergence to a GEV distribution of a certain $\xi \in \mathbb{R}$. All distribution functions $F_Y$ satisfying the convergence property in Equation (2.3.1), leading to a GEV distribution with some common shape parameter $\xi \in \mathbb{R}$, are said to belong to the (maximum) domain of attraction of $G_\xi$, denoted $\mathcal{D}(G_\xi)$. Most continuous distributions dealt with in practice belong to a domain of attraction.

Proposition 2.3.1 is particularly important for the identification of the shape parameter defining the limiting GEV distribution of an underlying distribution $F_Y$. When thought of as a property of a distribution $F_Y$, the shape parameter of the GEV distribution is called the extreme value index (EVI) of $F_Y$, and is denoted $\xi_Y$ (with the random variable in the subscript). The EVI is an important quantity because it describes the decay rate of the underlying distribution's tail. In particular, the tail can be heavy ($\xi_Y > 0$), decaying like a power function; light ($\xi_Y = 0$), decaying rapidly or exponentially; or short ($\xi_Y < 0$), having a finite right-endpoint.

Proposition 2.3.2 (Tail behaviour). Let $y^* = \sup\{y \in \mathbb{R} : F_Y(y) < 1\}$ be the right-endpoint of the support of a distribution function $F_Y$, which is possibly infinite. For some $\xi_Y \in \mathbb{R}$, $F_Y \in \mathcal{D}(G_{\xi_Y})$ if and only if

1. for $\xi_Y > 0$: $y^* = \infty$ and

$$\lim_{t\to\infty} \frac{1 - F_Y(ts)}{1 - F_Y(t)} = s^{-1/\xi_Y} \tag{2.3.3}$$

for all $s > 0$;

2. for $\xi_Y = 0$: $y^* \le \infty$ and there exists a positive function $g$ such that

$$\lim_{t\uparrow y^*} \frac{1 - F_Y\big(t + s\,g(t)\big)}{1 - F_Y(t)} = \exp(-s) \tag{2.3.4}$$

for all $s \in \mathbb{R}$; and

3. for $\xi_Y < 0$: $y^* < \infty$ and

$$\lim_{t\downarrow 0} \frac{1 - F_Y(y^* - st)}{1 - F_Y(y^* - t)} = s^{-1/\xi_Y} \tag{2.3.5}$$

for all $s > 0$.

See de Haan and Ferreira (2006, Theorem 1.2.1) for a proof.
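The heavy-tailed case of Proposition 2.3.2 is easy to check numerically. The following Python snippet is a sketch we add for illustration (the choice of the Student-$t$ distribution is our own): a Student-$t$ distribution with $\nu$ degrees of freedom has EVI $\xi_Y = 1/\nu$, so the ratio in Equation (2.3.3) should approach $s^{-1/\xi_Y} = s^{-\nu}$.

```python
import numpy as np
from scipy import stats

nu = 2.0          # degrees of freedom
xi = 1.0 / nu     # EVI of the Student-t distribution
s = 3.0

# Ratio of survival functions in Equation (2.3.3) for increasing thresholds t.
for t in [1e1, 1e2, 1e3, 1e4]:
    ratio = stats.t.sf(t * s, df=nu) / stats.t.sf(t, df=nu)
    print(f"t = {t:8.0f}   ratio = {ratio:.6f}   limit = {s**(-1.0/xi):.6f}")
```

The printed ratios settle near $3^{-2} \approx 0.1111$, consistent with a heavy tail of EVI $1/2$.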
Since the EVI determines the tail behaviour of the underlying distribution $F_Y$, the EVI should be an important consideration when modelling a distribution to evaluate the exceedance probability of extremes. This is especially true when $F_Y$ is heavy-tailed: due to the slow decay of a heavy-tailed distribution, a realization that is much larger than "typical" realizations is not an unusual occurrence. The extent of such extreme values is more likely to be large if $\xi_Y$ is large. Such "rogue values" are sometimes seen in practice, such as with natural disasters or insurance claims. If the EVI is not considered when modelling the tail behaviour of such phenomena, then the occurrence frequency of extreme events (and thus the risk of danger) could be seriously underestimated.

Proposition 2.3.2 tells us one way to find the EVI of a distribution, supposing one knows its sign. The von Mises condition is another useful way to find the EVI, although it is sufficient, not necessary.

Proposition 2.3.3 (von Mises condition). Let $y^* = \sup\{y \in \mathbb{R} : F(y) < 1\}$ be the right-endpoint of a distribution function $F$, which is possibly infinite. Suppose that, in some left neighbourhood of $y^*$, $F''$ exists and $F' > 0$. If

$$\lim_{y \uparrow y^*} \frac{\mathrm{d}}{\mathrm{d}y}\!\left(\frac{1-F}{F'}\right)\!(y) = \xi \in \mathbb{R}, \tag{2.3.6}$$

or equivalently,

$$\lim_{y \uparrow y^*} \frac{\big(1-F(y)\big)\,F''(y)}{[F'(y)]^2} = -\xi - 1 \in \mathbb{R}, \tag{2.3.7}$$

then $F \in \mathcal{D}(G_\xi)$.

See de Haan and Ferreira (2006, Theorem 1.1.8) for a proof.

We do not discuss estimation of a distribution's tail in this dissertation, or how to model the tail of a univariate distribution, but rather focus on modelling the conditional[5] EVI in a regression setting.

[5] In this dissertation, "conditional EVI" means the EVI of the response given the predictors.
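As a worked illustration of Proposition 2.3.3, added here for concreteness with the Type I Pareto distribution as our own example, take $F(y) = 1 - y^{-1/\xi}$ for $y > 1$ and fixed $\xi > 0$. Then

$$F'(y) = \tfrac{1}{\xi}\, y^{-1/\xi - 1}, \qquad \frac{1-F(y)}{F'(y)} = \frac{y^{-1/\xi}}{\tfrac{1}{\xi}\, y^{-1/\xi - 1}} = \xi y, \qquad \frac{\mathrm{d}}{\mathrm{d}y}\!\left(\frac{1-F}{F'}\right)\!(y) = \xi,$$

so the limit in Equation (2.3.6) is attained exactly, and $F \in \mathcal{D}(G_\xi)$, as expected for a power-law tail.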
2.4 Copulas

The proposed methodology (along with some existing methods) makes use of copulas for modelling. An overview of modelling joint distributions with copulas is given here. For more details, see Joe (2014).

2.4.1 Copula Basics

Essentially, a function $C : [0,1]^d \to [0,1]$ for integer $d \ge 2$ is a copula if it is a distribution function with Uniform$(0,1)$ univariate marginals. The importance of copulas is made clear by Sklar's Theorem.

Theorem 2.4.1 (Sklar's theorem). A distribution function $F_{1:d} : \mathbb{R}^d \to [0,1]$ having univariate marginals $F_1, \ldots, F_d$, $d \ge 2$, can be written as

$$F_{1:d}(x_1, \ldots, x_d) = C\big(F_1(x_1), \ldots, F_d(x_d)\big),$$

where $C$ is a copula. Further, $C$ is unique if $F_1, \ldots, F_d$ are continuous. The converse is also true: given a copula $C$ and univariate distribution functions $F_1, \ldots, F_d$, $C\big(F_1(x_1), \ldots, F_d(x_d)\big)$ for $(x_1, \ldots, x_d) \in \mathbb{R}^d$ is a distribution function having univariate marginals $F_1, \ldots, F_d$.

Sklar's Theorem implies that the dependence structure within a multivariate (continuous) distribution is determined by the underlying copula. Consequently, multivariate distributions can be modelled using copulas together with the well-known methods of modelling univariate distributions. The theorem also describes a way to find the copula $C$ underlying a joint distribution $F_{1:d}$:

$$C : (u_1, \ldots, u_d) \mapsto F_{1:d}\big(F_1^{\leftarrow}(u_1), \ldots, F_d^{\leftarrow}(u_d)\big) \tag{2.4.1}$$

for left-continuous inverse functions $F_1^{\leftarrow}, \ldots, F_d^{\leftarrow}$.

It is useful to consider different reflections of copulas when modelling, which we define here.

Definition 2.4.1 (Copula reflections). Suppose $C$ is a copula, representing the distribution function of $\mathbf{U} = (U_1, \ldots, U_d)$ for integer $d \ge 2$. Suppose $K$ is an integer, $1 \le K \le d$, and consider a subset $\{i_1, \ldots, i_K\} \subset \{1, \ldots, d\}$, $i_1 < \cdots < i_K$. We make the following definitions.

- The $(i_1, \ldots, i_K)$-reflected copula is the distribution function of $\mathbf{U}$ with the $i_j$'th entry replaced by $1 - U_{i_j}$ for each $j = 1, \ldots, K$. Such a copula is denoted by $R_{i_1 \cdots i_K} C$, where $R_{i_1 \cdots i_K}$ is a reflection operator defined in Equation (A.1.1).
- The $(1, \ldots, d)$-reflected copula is called the reflected or reflection copula.
- For any reflection operator $R$, the copula $RC$ is in general referred to as a reflection copula.
- For $d = 2$, the 1-reflected copula is also called the horizontally-reflected copula, and the 2-reflected copula is also called the vertically-reflected copula.

Formulas for computing a reflection copula are given in Appendix A.1.

We reserve the letter $C$ to indicate a copula distribution function, and its lower case $c = D_{12}C$ to indicate its density. For bivariate copulas, the conditional distributions are denoted $C_{2|1} = D_1 C$ and $C_{1|2} = D_2 C$. A "hat" on these quantities refers to the reflection copula; for example, $\hat{C}_{2|1} = D_1 R_{12} C$.

There are many parametric copula families identified in the literature (cf. Joe, 2014, Chapter 4), as well as many interesting properties that are characteristic of these copulas. Some of these properties are asymmetry, quadrant dependence, and tail dependence. Tail dependence describes the dependence between variables, given that each is large. For a detailed overview of properties, see Joe (2014, Chapter 2).

Since a copula determines the dependence structure between multiple random variables, we would expect it to describe much of the behaviour of the conditional EVI. This notion is formalized in the CCEVI, which is the conditional EVI of a copula variable that is transformed to have a standard Type I Pareto margin. For a random variable $U \sim \mathrm{Unif}(0,1)$, $(1-U)^{-1}$ is the required transformation.

Definition 2.4.2 (CCEVI). Suppose $\mathbf{U} = (U_1, \ldots, U_d)^\top$ is a $d$-dimensional random vector, $d \ge 2$, with joint distribution given by some copula $C$. Take $i \in \{1, \ldots, d\}$, and let $\mathbf{U}_{-i}$ be the vector $\mathbf{U}$ with the $i$'th entry removed. Then, if there exists a function $\xi_i : [0,1]^{d-1} \to \mathbb{R}$ such that

$$F_{(1-U_i)^{-1} \mid \mathbf{U}_{-i} = \mathbf{u}_{-i}} \in \mathcal{D}\big(G_{\xi_i(\mathbf{u}_{-i})}\big)$$

for all $\mathbf{u}_{-i} \in [0,1]^{d-1}$, we call $\xi_i$ the (upper) $i$'th Copula Conditional Extreme Value Index (CCEVI). We refer to the CCEVI of a $d$-dimensional copula $C$ as its upper $d$'th CCEVI, and denote this CCEVI by $\xi_C$.

The lower $i$'th CCEVI can be defined analogously, but for the lower tail; that is, as the EVI of $U_i^{-1} \mid \mathbf{U}_{-i} = \mathbf{u}_{-i}$. The $i$'th CCEVI could have been defined as the EVI of $U_i \mid \mathbf{U}_{-i} = \mathbf{u}_{-i}$, but since this random variable has a finite right-endpoint (at 1), its EVI is nonpositive and therefore uninteresting.

Examples of CCEVI's for some bivariate copula families are listed in Table 2.2.

    Family               Parameter     CCEVI     CCEVI of Reflected
    Gumbel               θ > 1         1/θ       1
    Frank                θ ∈ ℝ         1         1
    Bivariate Gaussian   −1 < ρ < 1    1 − ρ²    1 − ρ²

Table 2.2: CCEVI's of some existing bivariate copula models, all of which are constant. See Appendix B.2 for proofs. Note that the bivariate Gaussian copula has the same CCEVI for $\rho$ and $-\rho$, the latter being a 1-reflection.

The CCEVI is a real function with domain $[0,1]^{d-1}$. If this function is constant throughout its domain, we sometimes refer to the CCEVI as that constant, with the implicit meaning of a constant function.

To find the CCEVI of a bivariate copula $C$, first fix $u \in (0,1)$. One can then use the von Mises condition in Proposition 2.3.3 with $F = C_{2|1}(\cdot \mid u)$. Or, if one suspects the CCEVI is positive, the CCEVI can be obtained from the index of variation of the distribution function $C_{2|1}(1 - y^{-1} \mid u)$ for $y > 1$ through Equation (2.3.3). In particular,

$$\lim_{t\to\infty} \frac{1 - C_{2|1}\big(1 - (ts)^{-1} \mid u\big)}{1 - C_{2|1}\big(1 - t^{-1} \mid u\big)} = s^{-1/\xi_C(u)}, \tag{2.4.2}$$
or equivalently, using the copula density,

$$\lim_{t\to\infty} \frac{c\big(u, 1-(ts)^{-1}\big)}{c\big(u, 1-t^{-1}\big)} = s^{1 - 1/\xi_C(u)} \tag{2.4.3}$$

(which can be derived using L'Hôpital's rule). In fact, this suggests that a copula having a finite and non-zero density at a boundary has a CCEVI of 1.

Proposition 2.4.2. Suppose $C$ is a bivariate copula with

$$\lim_{v\uparrow 1} c(u, v) = c^* \in (0, \infty) \tag{2.4.4}$$

for all $u \in U \subset (0,1)$. Then the CCEVI of $C$ over $U$ is 1.

A straightforward proof follows from Equation (2.4.3).

2.4.2 Pair-Copula Bayesian Networks

Modelling a bivariate distribution using copulas is straightforward relative to modelling higher-dimensional distributions. Once the univariate distributions are modelled, one of the many parametric bivariate copula families listed in the literature (cf. Joe, 2014, Chapter 4) can be chosen to fit the dependence. Such dependence can be visualized in the data with a normal scores plot: a scatterplot of the data with marginals transformed to standard Gaussian. These plots are useful because they allow the dependence to be compared to a bivariate Gaussian distribution, and may reveal features such as tail dependence that one would be interested in capturing in a model.

However, modelling a multivariate distribution with more than two variables is more difficult. Although there are multivariate copula families identified in the literature that can be used, they are often too simple to capture the intricate dependence features demonstrated by real data. In addition, the dependence amongst the variables cannot be visualized in a straightforward way like it can with a bivariate normal scores plot.

To deal with the issue of modelling the joint distribution of some $p$-dimensional random vector $\mathbf{W}$, we can consider introducing one variable at a time. As a preliminary, the univariate marginal distributions of each variable in $\mathbf{W}$ should be modelled in full. Here are the steps involved in building a $p$-variate copula model to describe the dependence in $\mathbf{W}$, and obtaining its joint distribution (a simulation sketch follows the list).

1. Choose two variables to start with, say $W_1$ and $W_2$ (letting the subscripts indicate the introduction order). Select a bivariate copula model to describe their dependence.

2. Introduce another variable, $W_3$, and "pair" it with each of the previous variables $W_1$ and $W_2$. However, copulas cannot be chosen freely for each pair, otherwise the joint distribution of $(W_1, W_2, W_3)$ might not be valid. Instead, the pairs are chosen according to some chosen order (which we call the pairing order for $W_3$), and a copula model is chosen for each pair conditional on previously paired variables. In this case, we can choose copula models for $(W_1, W_3)$ and $(W_2, W_3) \mid W_1$, or for $(W_2, W_3)$ and $(W_1, W_3) \mid W_2$. The copula models for each pair are assumed not to depend on the conditioning ("given") variables; this is known as the simplifying assumption.

3. Step 2 is repeated, each time introducing a new variable and pairing it with the previously introduced variables, until all variables are introduced.
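The following Python sketch illustrates Steps 1–3 for three variables on the copula scale, simulating $(U_1, U_2, U_3)$ from the pairs $(W_1, W_2)$, $(W_1, W_3)$, and $(W_2, W_3) \mid W_1$. We use Gaussian pair copulas purely for illustration; the h-function notation and the correlation parameters are our own choices, not the dissertation's.

```python
import numpy as np
from scipy.stats import norm

def h(v, u, rho):
    """Gaussian pair-copula conditional C_{2|1}(v | u)."""
    return norm.cdf((norm.ppf(v) - rho * norm.ppf(u)) / np.sqrt(1.0 - rho**2))

def h_inv(p, u, rho):
    """Inverse of h in its first argument."""
    return norm.cdf(rho * norm.ppf(u) + np.sqrt(1.0 - rho**2) * norm.ppf(p))

rng = np.random.default_rng(1)
p1, p2, p3 = rng.uniform(size=(3, 1000))      # independent uniforms
r12, r13, r23_1 = 0.6, 0.4, 0.3               # one parameter per pair

u1 = p1
u2 = h_inv(p2, u1, r12)                       # pair (W1, W2)
u3 = h_inv(h_inv(p3, h(u2, u1, r12), r23_1),  # pair (W2, W3) | W1 ...
           u1, r13)                           # ... then pair (W1, W3)
```

Inverting the conditional distributions in the reverse of the pairing order is what makes the construction automatically yield a valid joint distribution.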
A model for the resulting copula underlying $\mathbf{W}$ can then be constructed, along with the joint distribution; this construction is called a pair-copula Bayesian network (PCBN) (Bauer and Czado, 2015).

Here are some additional modifications that are useful.

- To prevent overfitting, a PCBN model can be simplified by choosing the independence copula to describe the dependence in pairs at some point in a pairing order. This is called truncation, and carries the interpretation that variability is already accounted for by the conditioning variables.

- To avoid an otherwise computationally expensive evaluation of the PCBN joint distribution (which can involve multi-dimensional integration), some restrictions on the pairing order can be applied. The details of these restrictions are not discussed here (see Joe, 2014, Chapter 3), and are only needed for variables introduced after the third (i.e. $W_4, \ldots, W_p$). Under the restrictions, the PCBN is called a regular vine, the resulting distribution of $\mathbf{W}$ is called a vine distribution, and its copula a vine copula.

For equations related to the PCBN and vine distributions, see Appendix A.2.

2.4.3 PCBN Representations

A PCBN is identified by its variable pairs and the copulas for each pair. Writing out the variable pairs is quite verbose, and they can be summarized by an array called the PCBN array (or vine array). An example of such an array with five variables is

$$M = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ & 1 & 1 & 2 & 1 \\ & & 2 & & 4 \\ & & & & 3 \\ & & & & 2 \end{pmatrix}. \tag{2.4.5}$$

The introduction order of the variables is indicated by the first row, and each variable's pairing order is listed in order below it. Specifically, denoting $M_{ij}$ as the entry in row $i$ and column $j$, then for $i > 1$ and $j \ge i$, the pairs in the PCBN are $(W_{M_{ij}}, W_{M_{1j}}) \mid (W_{M_{2j}}, \ldots, W_{M_{(i-1)j}})$ whenever $M_{ij}$ is not blank (a blank entry indicates a truncation). This array has pairs $(W_1, W_2)$, $(W_1, W_3)$, $(W_2, W_3) \mid W_1$, $(W_2, W_4)$, $(W_1, W_5)$, $(W_4, W_5) \mid W_1$, $(W_3, W_5) \mid (W_1, W_4)$, and $(W_2, W_5) \mid (W_1, W_3, W_4)$, and happens to be a vine. Different ways of writing vine arrays exist in the literature (with variables along the diagonal), but the advantage of this representation is that it makes truncation easy to express.

The pairs can also be represented by a directed acyclic graph with labelled edges[6]. Upon introducing the $k$'th variable $W_k$, arrows are drawn from previous variables towards $W_k$, with edges labelled according to their pairing order. Pairs that are (conditionally) independent can omit an arrow. See Figure 2.3 for a graphical representation of a PCBN and a vine. To obtain the pairs from a graphical representation, it is easiest to follow the graph "backwards" to get the reverse introduction order. Begin with a variable not at any arrow tail, and obtain its pairing order according to the labelled edges. Then remove that node and its edges, and repeat until no variables remain.

[6] Note that vines can also be represented using a different graphical representation, owing to their restrictions on the pairing order.

Figure 2.3: Graphical representation of a pair-copula Bayesian network (left) with array in (2.4.5), and a regular vine (right). The vine array on the right is the same as in (2.4.5), but with the last column replaced by (5, 1, 2, 3, 4).
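To make the array convention concrete, here is a small Python helper, our own illustration; the list-of-rows encoding with None marking blank entries is hypothetical. It decodes an array such as (2.4.5) into its conditional pairs.

```python
def pcbn_pairs(M):
    """Decode a PCBN/vine array into pairs (M[i][j], M[0][j]) given the
    entries above them, following the convention described for (2.4.5).
    Rows are padded with None, which marks a blank (truncated) entry."""
    pairs = []
    d = len(M[0])
    for j in range(1, d):                  # column of the (j+1)'th variable
        for i in range(1, len(M)):
            if M[i][j] is None:
                continue                   # truncation: no pair here
            given = tuple(M[k][j] for k in range(1, i))
            pairs.append(((M[i][j], M[0][j]), given))
    return pairs

B = None
M = [[1, 2, 3, 4, 5],
     [B, 1, 1, 2, 1],
     [B, B, 2, B, 4],
     [B, B, B, B, 3],
     [B, B, B, B, 2]]
for pair, given in pcbn_pairs(M):
    print(pair, "|", given)   # e.g. (2, 5) | (1, 4, 3)
```

The printed pairs match the list given above for the array (2.4.5); the order within a conditioning set is immaterial.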
2.5 Conditional EVI

To extrapolate into the tail of a univariate distribution (i.e., to extract estimates of higher and higher quantiles), Extreme Value Theory provides justification for a model that decays according to the EVI. If the EVI is estimated to be too small, then extremes will occur more often than anticipated (and vice versa).

When using predictors to make forecasts, it is equally important to estimate the EVI of the conditional distribution of the response given the predictors. There are examples in nature where this conditional EVI is non-constant across the predictor space. Section 2.5.1 demonstrates this with pollution data. Without capturing a non-constant conditional EVI in a model, extrapolation into a forecast's tail would be less reliable. The link between conditional quantiles and the conditional EVI is explored briefly in Section 2.5.2, which shows that a non-constant conditional EVI results in quantile surfaces that are highly non-linear. Section 2.5.3 discusses how copulas influence the conditional EVI through a new concept called the copula conditional EVI (CCEVI), which is useful for incorporating a conditional EVI that is non-constant in the predictors into a model.

2.5.1 Motivation

The dependence amongst atmospheric pollutants is an important consideration when making environmental impact assessments (Heffernan and Tawn, 2004). Examination of the dependence amongst the concentrations of some pollutants reveals that a non-constant conditional EVI is tangible in nature.

As an example, daily maximum concentrations of five pollutants are available at ground level in Leeds city center, UK, between the years 1994 and 1998 (inclusive) during the months of April to July, and can be found in Heffernan and Tawn (2004). Two pollutants, ozone (O3) and sulphur dioxide (SO2), demonstrate the need for a model that allows for a non-constant conditional EVI. These data are shown in Figure 2.4. The conditional EVI of SO2 concentrations given O3 concentrations can be estimated using a moving-window approach. Estimates can be found in Figure 2.5, which includes a sensitivity analysis to ensure that the results are not an artifact of the estimation parameters. It appears that there is indeed a negative trend in the conditional EVI.

Figure 2.4: Relationship between atmospheric sulphur dioxide concentrations ([SO2]) and atmospheric ozone concentrations ([O3]) (left) or probability integral transformed (PIT) [O3] (right). Measurements are in parts per billion (ppb), except for PIT scores, which are unit-less. Data are daily maximum concentrations collected at ground level in Leeds city center, UK, between the years 1994 and 1998 (inclusive) during the months of April to July.

Note that the direction of trend in the conditional EVI need not match the sign of the dependence, as these data seem to suggest: although there is evidence that the conditional EVI is decreasing, there is evidence[7] that the overall dependence is positive. Indeed, with a decreasing conditional EVI, the upper quantile curves must be decreasing somewhere, as identified in Proposition 2.5.2; but this does not force the central quantile curves (or the conditional expectation curve) to be decreasing.

[7] For the raw data, an estimate of Pearson's correlation is 0.16, and a test for significance of simple linear regression results in a p-value less than 0.001.

Other pairs of pollutants show evidence of a non-constant conditional EVI as well, but this example is perhaps the most convincing.
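The moving-window procedure behind Figure 2.5 can be sketched in a few lines of Python. This is our own minimal rendering of the described steps (window, threshold, maximum-likelihood GPD fit); the function and argument names are hypothetical.

```python
import numpy as np
from scipy.stats import genpareto

def local_evi(pit_x, y, x0, radius=0.2, level=0.8):
    """Moving-window estimate of the conditional EVI of y given PIT
    predictor scores near x0: subset the responses whose PIT score lies
    within `radius` of x0, set the threshold at that subsample's
    `level`-quantile, and fit a GPD to the exceedances by MLE."""
    sub = y[np.abs(pit_x - x0) < radius]
    u = np.quantile(sub, level)
    excess = sub[sub > u] - u
    shape, _, _ = genpareto.fit(excess, floc=0)  # shape parameter = EVI estimate
    return shape
```

Varying `radius` and `level`, as in the panels of Figure 2.5, acts as a sensitivity check on the two tuning parameters.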
2.5.2 Conditional Quantiles

Suppose $\mathbf{X} \in \mathcal{X} \subset \mathbb{R}^p$ is a random vector of $p \ge 1$ predictors, and $Y \in \mathbb{R}$ is a response random variable. We make the following assumptions.

Assumption 2.5.1. The response $Y$ has distribution function $F_Y \in \mathcal{D}(G_{\xi_Y})$ for some $\xi_Y \in \mathbb{R}$, with right-endpoint $y^* = \sup\{y \in \mathbb{R} : F_Y(y) < 1\}$.

Assumption 2.5.2. There exists a measurable function $\xi_{Y|\mathbf{X}} : \mathcal{X} \to \mathbb{R}$ such that, for each $\mathbf{x} \in \mathcal{X}$, the conditional response $Y \mid \mathbf{X} = \mathbf{x}$ has distribution function $F_{Y|\mathbf{X}=\mathbf{x}} \in \mathcal{D}\big(G_{\xi_{Y|\mathbf{X}}(\mathbf{x})}\big)$ with right-endpoint

$$y^*_{Y|\mathbf{X}}(\mathbf{x}) = \sup\{y \in \mathbb{R} : F_{Y|\mathbf{X}}(y \mid \mathbf{x}) < 1\}. \tag{2.5.1}$$

Figure 2.5: Local estimates of the conditional EVI of atmospheric sulphur dioxide concentrations ([SO2]), conditional on PIT atmospheric ozone concentrations ([O3]). To estimate the conditional EVI at a particular [O3] PIT score, a window with some radius (indicated in the panels on the right) is constructed, and the corresponding subsample of [SO2] data is extracted. A Generalized Pareto distribution (GPD) is fit to this univariate subsample using MLE, with the threshold parameter taken to be quantile estimates with levels indicated in the upper panels. Error bands represent one standard error of the EVI estimates. Despite different estimation parameters, there always appears to be a downward trend in the conditional EVI.

We begin by demonstrating that conditioning $Y$ on $\mathbf{X}$ cannot result in a conditional EVI that is larger than the marginal EVI.

Proposition 2.5.1. Under Assumptions 2.5.1 and 2.5.2 with $\xi_{Y|\mathbf{X}}$ continuous, if $\xi_Y \ge 0$, then $\xi_{Y|\mathbf{X}}(\mathbf{x}) \le \xi_Y$ for almost all $\mathbf{x} \in \mathcal{X}$.

See Appendix B.1 for a proof. This result indicates a "mixing" property: the marginal tail can be heavier than the conditional tails.

Note that, even if the conditional EVI is constant over the covariate space, it need not equal the marginal extreme value index (examples are shown in Table 2.2). This proposition tells us that knowledge of predictors cannot introduce "extra" tail heaviness into the response. The proposition also introduces the notion that some predictors might be more "informative" than others in terms of their ability to describe the extreme behaviour of the response: predictors $\mathbf{X}$ that induce a larger reduction from $\xi_Y$ to $\xi_{Y|\mathbf{X}}$ are better in this light. This is because such predictors are able to describe some of the tail heaviness of the response distribution.

We turn our attention to the relationship between the conditional extreme value index and upper $\tau$-quantile surfaces $Q_{Y|\mathbf{X}}(\tau \mid \cdot)$ (or quantile curves if the function is one-dimensional), where $Q_{Y|\mathbf{X}}(\tau \mid \mathbf{x}) = F^{\leftarrow}_{Y|\mathbf{X}}(\tau \mid \mathbf{x})$ for each $\tau \in (0,1)$ and $\mathbf{x} \in \mathcal{X}$. First, the conditional EVI over a path in the predictor space is non-decreasing if the quantile curves are non-decreasing over the path.

Proposition 2.5.2. Suppose Assumption 2.5.2 holds with $\xi_{Y|\mathbf{X}} \ge 0$, and let $\tilde{\mathbf{x}} : [0,1] \to \mathcal{X}$ be a path in the predictor space from $\tilde{\mathbf{x}}(0)$ to $\tilde{\mathbf{x}}(1)$. If there exists an $\varepsilon > 0$ such that the $\tau$-quantile curves $Q_{Y|\mathbf{X}}\big(\tau \mid \tilde{\mathbf{x}}(\cdot)\big)$ along the path are non-decreasing for each $\tau \in (1-\varepsilon, 1)$, then $\xi_{Y|\mathbf{X}}\big(\tilde{\mathbf{x}}(\cdot)\big)$ is non-decreasing.

See Appendix B.1 for a proof.

One might think of the conditional EVI in relation to the marginal EVI as measuring the "informativeness" of the predictors about the behaviour of extremes. When the conditional EVI and marginal EVI are close, observing the predictors did not "lighten" the tail of the response much, and that observation is therefore not very informative about the extremes of the response.
Likewise, a conditional EVI that is much lower than the marginal EVI "lightens" the tail of the response, and such an observation is therefore informative.

Proposition 2.5.2 suggests that, when upper quantile curves are increasing, larger values of the predictors cannot become more informative regarding the extreme behaviour of the response. If a predictor in fact becomes less informative about the tail of $Y$, so that the conditional EVI increases as the predictor increases, it is important to capture this in a model, so that the tail behaviour of the response is not underestimated when the predictor is observed to be large. Capturing such a trend requires a highly non-linear model of the quantile surfaces, as the following proposition suggests.

Proposition 2.5.3. Suppose Assumption 2.5.2 holds with $\xi_{Y|\mathbf{X}} > 0$. If there exist an $\varepsilon > 0$ and a one-to-one mapping $T : \mathcal{X} \to \mathbb{R}^p$ of the predictor space such that

$$Q_{Y|\mathbf{X}}\big(\tau \mid T^{\leftarrow}(\mathbf{w})\big) = \alpha + \mathbf{w}^\top \boldsymbol{\beta}(\tau)$$

for $\mathbf{w} \in \mathbb{R}^p$ and each $\tau \in (1-\varepsilon, 1)$, where $\boldsymbol{\beta} : (1-\varepsilon, 1) \to \mathbb{R}^p$, then $\xi_{Y|\mathbf{X}}$ is constant.

See Appendix B.1 for a proof.

This proposition suggests that, in order for a model to accommodate a varying conditional EVI, its quantile surfaces cannot be linear after any transformation of the predictor space. The proposition extends the result of Wang and Li (2013), which indicates that modelling linear quantile curves in one predictor (such as with Koenker and Bassett's (1978) linear quantile regression method) is not appropriate if $\xi_{Y|\mathbf{X}}$ is non-constant. This makes intuitive sense: under such a model, the spacing between any two conditional quantiles at one observed predictor value is just a multiple of the spacing at another, and this cannot change the decay rate of the quantile function.

In an attempt to retain the linear model while still allowing for a non-constant conditional EVI, Wang and Li (2013) propose a transformation method (in the single-predictor $\mathbf{X} = X_1$ setting). However, their method does not actually achieve their desired result. Let $\Xi_\lambda : [0,\infty) \to \mathbb{R}$ be the Box–Cox power transform given by

$$\Xi_\lambda : y \mapsto \begin{cases} \dfrac{y^\lambda - 1}{\lambda}, & \lambda \in \mathbb{R} \setminus \{0\}, \\ \log y, & \lambda = 0. \end{cases} \tag{2.5.2}$$

They claim that there exists a $\lambda \in \mathbb{R}$ such that $\Xi_\lambda(Y) \mid X_1$ has linear $\tau$-quantile curves for all $\tau \in [1-\varepsilon, 1]$ for some $\varepsilon > 0$, and that the conditional EVI $\xi_{Y|X_1}$ of the distribution of $Y \mid X_1$ can be non-constant. However, $\xi_{Y|X_1}$ must in fact be constant in this scenario, unless $\lambda = 0$ or $F_{Y|X_1}$ is not "well behaved", as the following proposition indicates.

Proposition 2.5.4. Suppose Assumption 2.5.1 holds, and $F_Y''$ exists and $F_Y' > 0$ in some left neighbourhood of $y^*$. Further, assume $Y > 0$ almost surely. Then for $\lambda \in \mathbb{R}$, the distribution of $\Xi_\lambda(Y)$ has EVI

$$\xi_{\Xi_\lambda(Y)} = \begin{cases} \lambda \xi_Y, & \xi_Y \ge 0; \\ \xi_Y, & \xi_Y < 0, \end{cases} \tag{2.5.3}$$

where $\Xi_\lambda$ is the Box–Cox power transform defined in Equation (2.5.2).

This proposition is shown more generally in Wadsworth et al. (2010, Theorem 1), but there Equation (2.5.3) is given in a less convenient form. See Appendix B.1 for a proof.
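Proposition 2.5.4 is easy to verify by simulation. The following Python sketch is our own illustration; the Hill estimator is a standard EVI estimator for heavy tails, not a method proposed in the dissertation. It transforms a Type I Pareto sample with $\xi_Y = 0.5$ by $\Xi_{0.5}$ and recovers an EVI near $\lambda \xi_Y = 0.25$.

```python
import numpy as np

def hill(x, k):
    """Hill estimator of a positive EVI from the top k order statistics."""
    xs = np.sort(x)
    return np.mean(np.log(xs[-k:] / xs[-(k + 1)]))

rng = np.random.default_rng(42)
xi = 0.5
y = rng.pareto(1.0 / xi, size=200_000) + 1.0   # Type I Pareto, EVI = xi

lam = 0.5
z = (y**lam - 1.0) / lam                       # Box-Cox transform, Equation (2.5.2)

k = 2_000
print(hill(y, k))   # approximately 0.5  = xi
print(hill(z, k))   # approximately 0.25 = lam * xi, as Equation (2.5.3) predicts
```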
If there is to exist a $\lambda \in \mathbb{R}$ such that the upper quantile surfaces of $\Xi_\lambda(Y) \mid \mathbf{X}$ are linear, then the EVI of the distribution of $\Xi_\lambda(Y) \mid \mathbf{X}$ is constant, and this can only happen if one of the following scenarios holds:

1. the original $\xi_{Y|\mathbf{X}}$ is constant;
2. $\lambda = 0$ (which results in a log-transform); or
3. there exists an $\mathbf{x} \in \mathcal{X}$ such that, near the upper endpoint $y^*_{Y|\mathbf{X}}(\mathbf{x})$ defined in Equation (2.5.1), $F'_{Y|\mathbf{X}=\mathbf{x}} = 0$ or $F''_{Y|\mathbf{X}=\mathbf{x}}$ does not exist.

Scenario 1 defeats the purpose of attempting a power transformation; Scenario 2 restricts the power transformation to a log-transform, so that there is no use looking for a $\lambda \ne 0$; and Scenario 3 only holds for "misbehaved" conditional distributions, which are not useful to consider in practice. Further, Scenario 3 does not guarantee that the EVI of $Y \mid \mathbf{X} = \mathbf{x}$ is non-constant.

2.5.3 Modelling the Conditional EVI

Section 2.5.2 warns that a high amount of non-linearity in the upper quantile surfaces of $Y \mid \mathbf{X}$ is required for the conditional EVI to be non-constant. Such non-linearity should be accommodated when modelling, if one wishes to allow for a varying conditional EVI.

Few methods exist that allow for a non-constant conditional EVI. Local estimation of high quantiles is a very flexible option, since one (indirectly) models the conditional EVI as a non-parametric function (cf. Koenker et al., 1994; Daouia et al., 2011; Spokoiny et al., 2013, for examples including one predictor). But local methods suffer from poor estimation quality of the conditional EVI: estimating any EVI is prone to error due to the data-scarce nature of extreme values, and introducing predictors exacerbates this issue by thinning out the data even further.

Another option is to use an approach similar to a generalized linear model. For example, the conditional distribution's tail can be modelled as a Generalized Pareto distribution whose parameters (including the EVI) are modelled as a linear function of the predictors, up to some link function (cf. Coles, 2001, Chapter 6). Or, a neural network can be used to link the predictors to the parameters, as shown by Cannon (2010) for the generalized extreme value distribution. However, the choice of link function is not clear. In addition, the number of parameters involved in such a model can be quite large, especially if one wishes to account for interaction between predictors.

An alternative method is to use copulas with a non-constant CCEVI, as defined in Definition 2.4.2.

First, notice that a CCEVI cannot exceed 1.

Proposition 2.5.5. Let $\xi_i$ be the upper $i$'th CCEVI for a $d$-dimensional copula, $d \ge 2$. Then for almost all $\mathbf{u}_{-i} \in [0,1]^{d-1}$, $\xi_i(\mathbf{u}_{-i}) \le 1$.

See Appendix B.1 for a proof.

It is possible for the $i$'th CCEVI to be negative. In this case, the copula would not have support fully on the unit cube $[0,1]^d$: since the conditional distribution of the Pareto-transformed $i$'th variable has a finite right-endpoint rather than an infinite one, the support does not extend all the way to 1 in variable $i$ on the subset of the unit cube where the CCEVI is negative.

Given information about marginals, the conditional EVI of $Y \mid \mathbf{X}$ can be easily determined from the CCEVI.

Theorem 2.5.6. Suppose Assumptions 2.5.1 and 2.5.2 hold, and that the dependence of $(\mathbf{X}^\top, Y)$ is described by a copula $C$ with CCEVI $\xi_C$, as defined in Definition 2.4.2. Suppose further that the right-endpoints $\sup\{y \in \mathbb{R} : F_{Y|\mathbf{X}}(y \mid \mathbf{x}) < 1\} = \infty$ for all $\mathbf{x} \in \mathcal{X}$ (so that $\xi_Y \ge 0$). Suppose that the copula density $c > 0$, $D_2 c$ exists, $F_Y' > 0$, and $F_Y''$ exists in some left neighbourhood of $y^*$, and that the von Mises limit in Equation (2.3.6) exists for both $F = F_Y$ and $F = C_{2|1}(\cdot \mid u)$ for each $u \in (0,1)$. Then

$$\xi_{Y|\mathbf{X}}(\mathbf{x}) = \xi_Y\, \xi_C\big((F_{X_1}(x_1), \ldots, F_{X_p}(x_p))^\top\big) \tag{2.5.4}$$

for $\mathbf{x} \in \mathcal{X}$, where $F_{X_1}, \ldots, F_{X_p}$ are the respective univariate marginal distribution functions of $\mathbf{X}$.
See Appendix B.1 for a proof.

The CCEVI is a useful property to consider when using copulas for regression, particularly regression of extremes. Theorem 2.5.6 suggests that copulas with a CCEVI of 0 make the response light-tailed upon conditioning on the predictor variable. We call such copulas fully (upper) tail-lightening. If this relationship is found in nature, then knowledge of this predictor would be very valuable, since it would entirely describe the extreme behaviour of the response. We describe such a copula family in Section 6.1. On the contrary, a copula having a CCEVI of 1 suggests that conditioning on the predictor results in a conditional EVI that is unchanged from the marginal EVI. Loosely, this can be interpreted as the predictor not carrying any information about the extremes of the response. We call such copulas tail-preserving.

Chapter 3
Assessment of a Forecaster

Before discussing the construction of a forecaster, it is important to identify how it will be evaluated, so that we know what to aim for. An important diagnostic assessment is the "calibration" of a forecaster, which assesses whether a forecaster is "on target"; this is discussed in Section 3.1. To quantitatively evaluate forecasters, we use proper scoring rules, as discussed in Section 3.2. We end by emphasizing in Section 3.3 that forecasters should remain good under extreme circumstances.

3.1 Calibration

An important quality of a forecaster is its calibration[1]. A forecaster is calibrated if its forecasts of the $\tau$-quantile are exceeded $(1-\tau)\,100\%$ of the time, for each $\tau \in (\tau_c, 1)$. The calibration of a forecaster can be defined over $(\tau_c, 1)$ as

$$\mathrm{Cal} : \tau \mapsto \mathrm{E}_{Y,\hat{Q}}\Big(\mathbb{P}\big(Y \le \hat{Q}(\tau)\big)\Big) = \mathrm{E}_{\hat{Q}}\Big(F_Y\big(\hat{Q}(\tau)\big)\Big), \tag{3.1.1}$$

where the expectations are taken over the stochastic quantities listed in the subscripts. The forecaster is calibrated if $\mathrm{Cal}(\tau) = \tau$ for all $\tau \in (\tau_c, 1)$. Christoffersen (1998) introduced the concept of calibration in the context of interval forecasts, calling it "efficiency"; calibration is discussed in terms of predictive distribution forecasts by Gneiting et al. (2007).

[1] Sometimes "calibration" is used to describe the procedure of selecting model parameters to obtain a forecaster, often referred to by statisticians as "estimation" and "model selection". This is not what we mean by calibration. The proposed estimation and selection procedures are discussed in Sections 5.2 and 5.3, respectively.

Calibration can be estimated by a sample average,

$$\widehat{\mathrm{Cal}}(\tau) = \frac{1}{n} \sum_{i=1}^n I_{[Y_i,\infty)}\big(\hat{Q}_i(\tau)\big), \tag{3.1.2}$$

which converges in probability to $\mathrm{Cal}(\tau)$ by the law of large numbers. Indeed, calibration is a measure of the "correctness" of a forecaster, and so a check for calibration should be a "prerequisite" for a forecaster to be considered at all.

The calibration defined in Equation (3.1.1) is an "overall" calibration, where the expectation and probability are taken relative to the marginal distributions of $Y$ and $\hat{Q}$. But calibration can also be defined under stricter circumstances; for instance, under the observation of a set of predictors $\mathbf{X} = \mathbf{x}$. In this case, a forecaster is calibrated relative to the event $\mathbf{X} = \mathbf{x}$ if its forecasts of the $\tau$-quantile are exceeded $(1-\tau)\,100\%$ of the time whenever the partial information $\mathbf{X} = \mathbf{x}$ is obtained, for each $\tau \in (\tau_c, 1)$. The calibration relative to such an event is

$$\mathrm{Cal}_{\mathbf{X}=\mathbf{x}} : \tau \mapsto \mathrm{E}_{\hat{Q}}\Big(F_{Y|\mathbf{X}=\mathbf{x}}\big(\hat{Q}(\tau)\big) \,\Big|\, \mathbf{X} = \mathbf{x}\Big), \tag{3.1.3}$$

where the expectation is taken over the stochastic forecast $\hat{Q}$.
Relative calibration across the predictor space is difficult to communicate and interpret, but a weighted version can be appropriate if one is interested in a particular region of the predictor space. This weighted analysis is discussed in Section 3.3.

Calibration can be visualized in a calibration plot, which plots $\tau \in (\tau_c, 1)$ on the horizontal axis and $1 - \widehat{\mathrm{Cal}}(\tau)$ on the vertical axis. Any deviation from the diagonal $1 - \widehat{\mathrm{Cal}}(\tau) = 1 - \tau$ signifies miscalibration.

Alternatively, one can consider calibration histograms. Notice that $\widehat{\mathrm{Cal}}(\tau)$ is simply the empirical distribution function of the probability integral transformed (PIT) responses $\hat{F}(Y)$, which should agree with the uniform distribution function over $\tau \in (\tau_c, 1)$. We can therefore construct a histogram of the transformed responses to check the uniformity of these values. The PIT response can be interpreted as a measurement of the "location" within the predictive distribution that the response realizes; larger values (closer to 1) indicate a realization further into the tail of the distribution $\hat{F}$. A histogram that is left-skewed (i.e., has more mass over larger values) means that the response falls into the tail more often than it should, suggesting that the forecasts need more mass in the tails. On the contrary, a right-skewed histogram suggests that the forecasts tend to have too much mass in the tail.

3.2 Proper Scoring Rules

Calibration plots do not allow for a quantitative comparison of forecasters. We therefore seek a scoring rule to carry out such a comparison. According to the prequential principle of Dawid (1984), a scoring rule should assign a score using only the forecast and the observed response. A desirable type of scoring rule is the so-called proper scoring rule, which has the property that the expected score is optimal whenever the issued forecast equals "the distribution of $Y$". We clarify this concept and define such a class of scoring rules here. Throughout the discussion, we assume functions are measurable wherever appropriate.

3.2.1 Definition

We denote by $S : \mathcal{Y} \times \hat{\mathcal{Q}} \to [0,\infty)$ a scoring rule for quantile function forecasts, representing a non-negative penalty so that smaller values are better, and denote by $y \in \mathcal{Y}$ a realized value of $Y$.

The class $\mathcal{F}$ of possible distributions of $Y$ is assumed to be such that the expected score given a forecast $\hat{Q} \in \hat{\mathcal{Q}}$ exists:

$$\mu_F\big(\hat{Q}\big) := \int S\big(y, \hat{Q}\big)\, \mathrm{d}F(y) \in \mathbb{R} \tag{3.2.1}$$

for all $F \in \mathcal{F}$. Also, the class $\hat{\mathcal{Q}}$ should only contain quantile functions that are non-decreasing, so that forecasts are sensible. The scoring rule $S$ should have the property that its expectation under $F$ is smallest when the forecast is $Q = F^{\leftarrow}$, the "true" quantile function.

Definition 3.2.1. The scoring rule $S$ is proper if, for all forecasts $\hat{Q} \in \hat{\mathcal{Q}}$,

$$\int S\big(y, \hat{Q}\big)\, \mathrm{d}F(y) \ge \int S(y, Q)\, \mathrm{d}F(y) \tag{3.2.2}$$

for any $F \in \mathcal{F}$, where $Q = F^{\leftarrow}$. The scoring rule is called strictly proper if equality holds if and only if $\hat{Q} = Q$.

The idea of proper scoring rules was formalized by Matheson and Winkler (1976), and advanced by Gneiting and Raftery (2007). Proper scoring rules are attractive because they reward the forecaster for issuing an honest forecast: if a forecaster truly believes in the distribution having quantile function $\hat{Q}$, then relative to this distribution, the forecaster can expect to optimize their score by issuing $\hat{Q}$ as the forecast, and exactly $\hat{Q}$ if $S$ is strictly proper.
The goodness of a forecaster can be summarized by its expected score:

$$\bar{\mu}_F := \mathrm{E}_{\hat{Q}}\big(\mu_F(\hat{Q})\big), \tag{3.2.3}$$

where the expectation $\mathrm{E}_{\hat{Q}}$ is taken over the (stochastic) forecaster, and $\mu_F$, defined in Equation (3.2.1), represents the mean score given a forecast relative to $F \in \mathcal{F}$. This expectation can be compared to the expectations of other forecasters to determine which is best, although an interpretation of these expectations is lacking. In practice, the mean score would be estimated using $T$ forecasts $\hat{Q}_1, \ldots, \hat{Q}_T$ and realizations $Y_1, \ldots, Y_T$:

$$\hat{\bar{\mu}}_F = \frac{1}{T} \sum_{t=1}^T S\big(Y_t, \hat{Q}_t\big), \tag{3.2.4}$$

though other central estimates, such as the median or trimmed mean, might be considered. This is an estimate of the expectation relative to the marginal distribution $F = F_Y$. But expectations relative to other distributions $F \in \mathcal{F}$ can be used as well. For example, one might be interested in assessing the forecaster when a set of predictors $\mathbf{X} = \mathbf{x}$ is observed. In this case, the expectation is relative to the conditional distribution $F_{Y|\mathbf{X}=\mathbf{x}}$, and can be estimated by regression of the single-observation scores over the predictor space $\mathcal{X}$. This regression surface can be used to compare the performance of forecasters over different regions of the predictor space. This concept is extended further in Section 3.3 when considering weighted scores.

There are many proper scoring rules one can use, as discussed in the next section. However, a common scoring rule should be used when comparing different forecasts, so as not to introduce bias into the comparison.

3.2.2 A Class of Proper Scoring Rules

A class of proper scoring rules for a single quantile forecast has been described by Gneiting and Raftery (2007). For any quantile level $\tau \in (0,1)$, if $y \in \mathcal{Y}$ is the observed response and $\hat{Q}(\tau)$ is the forecast $\tau$-quantile, then $s_\tau\big(y, \hat{Q}(\tau); \varphi\big) + h(y)$ is a proper scoring rule, where

$$s_\tau\big(y, \hat{Q}(\tau); \varphi\big) = \rho_\tau\Big(\varphi(y; \tau) - \varphi\big(\hat{Q}(\tau); \tau\big)\Big), \tag{3.2.5}$$

$h : \mathcal{Y} \to \mathbb{R}$ is arbitrary, $\varphi(\cdot\,; \tau) : \mathbb{R} \to \mathbb{R}$ is a non-decreasing function used to transform the realization $y$ and forecast $\hat{Q}(\tau)$, and $\rho_\tau : \mathbb{R} \to [0,\infty)$ is the asymmetric absolute deviation function defined as

$$\rho_\tau(s) = \big(\tau - I_{(-\infty,0)}(s)\big)\, s = \begin{cases} (1-\tau)\,|s|, & s < 0, \\ \tau\,|s|, & s \ge 0. \end{cases} \tag{3.2.6}$$

Gneiting and Raftery (2007, Equation 42) extend this class to the case of forecasting a finite set of $K$ quantiles with levels $0 < \tau_1 < \cdots < \tau_K < 1$. Denoting the forecast quantiles by $\hat{\mathbf{q}} = \big(\hat{Q}(\tau_1), \ldots, \hat{Q}(\tau_K)\big)$ and $\boldsymbol{\tau} = (\tau_1, \ldots, \tau_K)$, a proper scoring rule is $S_c(y, \hat{\mathbf{q}}; \varphi, \boldsymbol{\tau}) + h(y)$, where

$$S_c(y, \hat{\mathbf{q}}; \varphi, \boldsymbol{\tau}) = \frac{1}{K} \sum_{k=1}^K \rho_{\tau_k}\Big(\varphi(y; \tau_k) - \varphi\big(\hat{Q}(\tau_k); \tau_k\big)\Big) \tag{3.2.7}$$

and each $\varphi(\cdot\,; \tau_k) : \mathbb{R} \to \mathbb{R}$ is non-decreasing. Note that the class of scoring rules defined in this way may not be exhaustive.

The composite scoring rule $S_c$ above is simply an aggregation of the single-quantile scoring rules $s_\tau$ over some quantile levels $\boldsymbol{\tau}$. A natural extension to a class of proper scoring rules for the upper quantile function follows.

Theorem 3.2.1. Let $h : \mathcal{Y} \to \mathbb{R}$ be an arbitrary function, and $\varphi(y; \tau) = g(y)\,w(\tau)$, where $g : \mathbb{R} \to \mathbb{R}$ is non-decreasing and $w : (\tau_c, 1) \to [0,\infty)$. Under the regularity conditions listed in Appendix C.1, the scoring rule $S\big(y, \hat{Q}; \varphi\big) + h(y)$ for $y \in \mathcal{Y}$ and $\hat{Q} \in \hat{\mathcal{Q}}$ exists and is proper relative to $\mathcal{F}$, where

$$S\big(y, \hat{Q}; \varphi\big) = \int_{\tau_c}^1 \rho_\tau\Big(\varphi(y; \tau) - \varphi\big(\hat{Q}(\tau); \tau\big)\Big)\, \mathrm{d}\tau. \tag{3.2.8}$$

See Appendix C.1 for a proof. Matheson and Winkler (1976) similarly discuss the aggregation of quantile scores, but in less detail and for $\tau_c = 0$.
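The composite rule (3.2.7) is straightforward to compute, and its propriety can be checked by Monte Carlo. The following Python sketch is our own illustration: it uses $g = \log$ with a unit weight function (valid choices by the examples that follow), and confirms that, for an Exp(1) response, the true quantiles earn the smallest mean score among scaled alternatives.

```python
import numpy as np

def rho(s, tau):
    """Asymmetric absolute deviation rho_tau of Equation (3.2.6)."""
    return np.where(s >= 0, tau * s, (tau - 1.0) * s)

def composite_score(y, qhat, taus, g=np.log):
    """Composite score S_c of Equation (3.2.7) with phi(y; tau) = g(y).
    Returns one score per realization in `y`."""
    s = g(np.atleast_1d(y))[:, None] - g(np.asarray(qhat))[None, :]
    return rho(s, np.asarray(taus)[None, :]).mean(axis=1)

rng = np.random.default_rng(0)
y = rng.exponential(size=50_000)
K, tau_c = 20, 0.8
taus = tau_c + (1 - tau_c) * (2 * np.arange(1, K + 1) - 1) / (2 * K)
true_q = -np.log(1 - taus)                 # Exp(1) quantile function
for c in (0.8, 1.0, 1.2):                  # c = 1 gives the honest forecast
    print(c, composite_score(y, c * true_q, taus).mean())
```

The mean score is smallest at $c = 1$, as propriety requires.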
Remark 3.2.1. The function $g$ can be interpreted as a transformation of the response, and is therefore called the transformation function of $S$. The function $w$ is called the weight function of $S$, since it assigns different weights across the quantile levels in $(\tau_c, 1)$.

Remark 3.2.2. In practice, the choice of $h$ does not make a difference when comparing scores across forecasters, so it is taken to be the zero function. In addition, it is more computationally convenient to work with finite sums instead of an integral over $\tau$. In this case, it is convenient to use the composite scoring rule $S_c$ in place of $S$, with $\tau_k = \tau_c + (1-\tau_c)(2k-1)/(2K)$ for $k = 1, \ldots, K$.

The existence of the scoring rule $S$ relies on the concept of the extreme value index (EVI), discussed in Section 2.3. The EVI of a distribution describes the behaviour of the distribution's tail. Roughly, a positive EVI indicates that the survival function decays like a power function, and such distributions are said to be heavy-tailed (more so when the EVI is larger); a negative EVI indicates that the distribution has a finite right-endpoint; and a zero EVI indicates that the survival function decays exponentially. It is important to note that distributions having EVI $\ge 1$ do not have a finite mean. The set of distribution functions having some EVI $\xi \in \mathbb{R}$ is denoted by $\mathcal{D}(G_\xi)$, and is called the domain of attraction of $G_\xi$, where $G_\xi$ is the so-called Extreme Value distribution, the details of which are not necessary for this dissertation. For more details of concepts in Extreme Value Theory, see de Haan and Ferreira (2006).

The conditions of Theorem 3.2.1 reveal that the scoring rule (through $\varphi$) and the forecast space $\hat{\mathcal{Q}}$ cannot be chosen independently. So as not to restrict the forecasters' beliefs, it is most sensible to determine the space of possible forecasts $\hat{\mathcal{Q}}$ before choosing a function $\varphi$. An analyst who wishes to "bypass" the choice of $\varphi$ by selecting the identity transformation function and constant weight function, so that $\varphi(y; \tau) = y$, inadvertently restricts the forecasters to issue forecast distributions that are not too heavy-tailed, as the following example illustrates.

Example 3.2.1. Suppose the scoring rule with $\varphi(y; \tau) = y$ is selected. Then by Condition 2 in Appendix C.1, the space of potential forecasts $\hat{\mathcal{Q}}$ can only contain distributions with extreme value indices less than 1. In addition, by Condition 3, it is further assumed that the response $Y$ has a finite mean, and this may not be the case if, for example, $Y$ has a distribution with EVI at least 1.

If heavy-tailed distributions are allowed to be forecast, the following example illustrates a valid choice of the scoring rule.

Example 3.2.2. Suppose the space of allowed forecasts consists of distributions in some domain of attraction, which covers most distributions one would use in practice. Suppose it is known by the forecasters that $\mathcal{Y} \subset (0,\infty)$. Then, assuming $\mathrm{E}(\log Y)$ exists, a scoring rule with $g = \log$ is a valid choice when the weight function $w \in \mathrm{RV}_\alpha$ at $1^-$ for some $\alpha < 1$ and is continuous almost everywhere. Most of the conditions in Appendix C.1 follow as an immediate consequence of this definition, except for Condition 2, which follows after recognizing that $\log \circ \hat{Q}$ is the quantile function of a distribution with extreme value index in $(-\infty, 0]$.

There are some choices of $\varphi$ that are always valid, as the following example illustrates.

Example 3.2.3.
Take a transformation function $g$ that is bounded above and below, and a weight function satisfying $w \in \mathrm{RV}_\alpha$ at $1^-$ for some $\alpha < 1$. Then the conditions in Appendix C.1 follow as a direct consequence, after recognizing that $g \circ \hat{Q}$ has a non-positive extreme value index (as long as the distribution function $\hat{Q}^{\leftarrow} \circ g^{\leftarrow}$ is in some domain of attraction, which might not happen if $g$ and $\hat{Q}$ are too "misbehaved").

3.2.3 Choice of Quantile Weight Function

There are restrictions on the choices of $\varphi$ that lead to a valid scoring rule, as discussed in Section 3.2.2. The present section investigates whether we can find scoring rules that are more desirable than others, but concludes that we cannot find any under certain criteria.

In the context of a single-quantile scoring rule $s_\tau$, it is not well understood what transformation functions $g$ are desirable. We instead turn our attention to finding scoring rules that have desirable properties across quantiles, by choosing an appropriate weight function $w$.

One desirable feature is the notion that a single-quantile score $s_\tau$ (the integrand in Equation (3.2.8)) should assign "equally good scores" to a forecast having "equally good quantiles". Such a scoring rule, based on goodness alone, would not weigh quantile levels differently.

Issuing the correct distribution $\hat{Q} = Q \in \mathcal{Q}$ is one example of a forecast that should satisfy any notion of an "equally good forecast" across quantiles, so we will only consider this case. The notion of an "equally good score" can be expressed through the mean single-quantile score at the true distribution, $\mathrm{E}\big(s_\tau(Y, Q(\tau); \varphi)\big)$. Lemma 3.2.2 identifies an expression for this expectation, and makes it clear that selecting a desirable scoring rule in this context amounts to selecting a weight function $w$.

Lemma 3.2.2. Suppose the quantile function $Q$ of $Y$ is continuous, and $\mu = \mathrm{E}\big(g(Y)\big)$ exists. Then

$$\mathrm{E}\big(s_\tau(Y, Q(\tau); \varphi)\big) = w(\tau)\, \tau\, \big(\mu - \mu_L(\tau)\big), \tag{3.2.9}$$

where $\mu_L(\tau) = \mathrm{E}\big(g(Y) \mid Y < Q(\tau)\big)$.

A proof can be found in Appendix C.2. Note that $\mu - \mu_L(\tau) > 0$ for $\tau \in (0,1)$, by definition. Some examples of this expectation are shown in Table 3.1.

    Distribution                          F_Y(y)                    E(s_τ(Y, F_Y^←(τ); φ))
    Y ~ N(μ, σ²), μ ∈ ℝ, σ > 0            Φ((y−μ)/σ), y ∈ ℝ         σ φ(Φ^←(τ)) = σ(1−τ)ℓ(τ), where ℓ ∈ RV₀ at 1⁻
    Y ~ Exp(μ), μ > 0                     1 − exp(−y/μ), y > 0      μ(1−τ) log(1/(1−τ))
    Y ~ Par(σ, 1/ξ), σ > 0, ξ ∈ (0,1)     1 − (y/σ)^{−1/ξ}, y > σ   [σ/(1−ξ)](1−τ)[(1−τ)^{−ξ} − 1] ~ [σ/(1−ξ)](1−τ)^{1−ξ} as τ ↑ 1

Table 3.1: Examples of the expected single-quantile score with $w(\tau) = 1$ and $g(y) = y$, as given in Equation (3.2.9). The distributions are Gaussian (with $\phi$ and $\Phi$ denoting the density and distribution function of the standard Gaussian distribution, respectively), Exponential, and Type I Pareto. Proofs can be found in Appendix C.2.
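As a check on Table 3.1, the Exponential row can be derived directly from Equation (3.2.9); this short derivation is our own addition, following the notation of Lemma 3.2.2 with $w(\tau) = 1$ and $g(y) = y$. With $Q(\tau) = -\mu \log(1-\tau)$,

$$\tau\,\mu_L(\tau) = \int_0^{Q(\tau)} \frac{y}{\mu}\, e^{-y/\mu}\, \mathrm{d}y = \mu - \big(Q(\tau) + \mu\big)(1-\tau),$$

so that

$$\tau\big(\mu - \mu_L(\tau)\big) = \tau\mu - \mu + \big(Q(\tau) + \mu\big)(1-\tau) = Q(\tau)(1-\tau) = \mu(1-\tau)\log\frac{1}{1-\tau},$$

matching the table entry, and visibly decaying to 0 as $\tau \uparrow 1$.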
One way to define the notion of a scoring rule assigning "equally good scores" would be to require the mean single-quantile score to be constant (and non-zero) across $\tau \in (\tau_c, 1)$, for all distributions $Q \in \mathcal{Q}$. However, it is clear from Equation (3.2.9) that no such $w$ exists, due to the expectation's dependence on $Q$. Even if the definition were relaxed to hold for only one distribution (such as the marginal of $Y$), one must have a priori knowledge of $Q$ (in the form of $\mu_L$) to select such a $w$.

This raises the question as to whether an a posteriori selection of the weight function, based on the fitting data, is appropriate. For example, one may estimate $\tau \mapsto \tau\big(\mu - \mu_L(\tau)\big)$ using the fitting data, and choose its reciprocal as the weight function, so that $\mathrm{E}\big(s_\tau(Y, Q(\tau); \varphi)\big)$ is estimated to be 1 relative to the marginal distribution of $Y$. We recommend against any a posteriori selection of the weight function. Although the resulting scoring rule would be proper when assessing new data, it may not be proper when assessing the fitting data, as is done in estimation, for example (see Section 5.2). This may result in an estimator that does not converge to the true distribution, because "dishonest" forecasts may be encouraged.

Lemma 3.2.2 demonstrates that, without the weight function (i.e., with $w(\tau) = 1$), the mean single-quantile score tends to 0 as $\tau \uparrow 1$, since $\mu_L(\tau) \to \mu$. This means that higher quantiles have less influence on the overall score than others. Proposition 3.2.3 investigates the rate of decay of the mean single-quantile score.

Proposition 3.2.3. Suppose $w \in \mathrm{RV}_\alpha$ at $1^-$ for some $\alpha \in \mathbb{R}$, and $g : \mathbb{R} \to \mathbb{R}$ is a non-decreasing function such that the distribution function $F \circ g^{\leftarrow} \in \mathcal{D}(G_\xi)$ for some $\xi < 1$. Then

$$\tau \mapsto \mathrm{E}\big(s_\tau(Y, Q(\tau); \varphi)\big) \in \mathrm{RV}_{\alpha + \xi_+ - 1} \text{ as } \tau \uparrow 1,$$

where $\xi_+ = \max(0, \xi)$.

Perhaps instead of asking for the mean single-quantile score to be constant, we can ask for it to be slowly varying as $\tau \uparrow 1$, achieved through an appropriate choice of $\alpha$. If this is achieved, then the mean single-quantile score would display "constant-like behaviour" near $\tau = 1$, so that upper quantiles would be weighed similarly. To achieve this for a single $F \in \mathcal{F}$ with $F \circ g^{\leftarrow} \in \mathcal{D}(G_\xi)$ for $\xi < 1$, we would need to choose $\alpha = 1 - \xi_+$. Such a choice is valid when the correct forecasts are issued: according to Condition 2' in Appendix C.1, we only require $\alpha < 2 - \xi_+$ in order for the scoring rule in Equation (3.2.8) to exist.

It is realistic to select a $g$ so that $\xi_+$ (and thus $\alpha$) can be assumed known a priori. It is often reasonable to assume a priori that each distribution function in $\mathcal{F}$ belongs to some domain of attraction. Taking $g = \log$ will then ensure that, for each $F \in \mathcal{F}$, $F \circ g^{\leftarrow} \in \mathcal{D}(G_\xi)$ where $\xi \le 0$, so that $\xi_+ = 0$. One can then take $\alpha = 1$, using a weight function such as $w(\tau) = 1/(1-\tau)$.

However, in using Condition 2', we have assumed that forecasts will never have their right-endpoint exceeded, and this may not be realistic in practice. Otherwise, to ensure the existence of the scoring rule in Equation (3.2.8), one must consult Condition 2 in Appendix C.1 as opposed to Condition 2'. The former states the requirement $\alpha < 1 - \xi_+$, meaning the selection of $\alpha = 1 - \xi_+$ is no longer an option. If one decides to choose $w \in \mathrm{RV}_\alpha$ at $1^-$ with $\alpha = 1 - \xi_+$ anyway, forecasts would receive a score of infinity whenever the response exceeds the right-endpoint of the forecast distribution (see the existence proof of Theorem 3.2.1 in Appendix C.1 to see why).

In fact, if one must choose $\alpha < 1 - \xi_+$, then the mean single-quantile score will always decay to zero. Empirical evidence suggests that the closer $\alpha$ is to $1 - \xi_+$, the later this decay happens.
Hence, one may wish to choose $\alpha$ close to $1 - \xi_+$.

3.3 Integrity of Extreme Forecasts

The calibration and proper scoring rule assessments discussed in Sections 3.1 and 3.2 are in general applicable to any forecasting method that issues a predictive distribution. When forecasting extremes, however, we ask that a forecaster's performance not be compromised in the tail of $\mathbf{X}$; in other words, the forecaster should keep its integrity (or "goodness").

Integrity of forecasts in a distributional tail of $\mathbf{X}$ is important because the most extreme responses are typically more prone to occur there. Forecasters that keep their integrity in the tail of $\mathbf{X}$ should produce scores that are not compromised when $\mathbf{X}$ is observed in its tail. This can be evaluated by estimating the mean score surface $\mathbf{x} \mapsto \bar{\mu}_{F_{Y|\mathbf{X}=\mathbf{x}}}$ of Equation (3.2.3) over the tail, and comparing it to the marginal mean $\bar{\mu}_{F_Y}$.

To obtain a concrete score for the mean over the tail of $\mathbf{X}$, a weighted average over the predictor space can be used. The weight function $w_{\mathbf{X}} : \mathcal{X} \to [0,\infty)$ should be chosen to put more weight in the tail of $\mathbf{X}$. In the application in Chapter 7, a weight function based on the logistic function is used. A weighted estimator of the mean score in Equation (3.2.3) is

$$\hat{\bar{\mu}}^{(w_{\mathbf{X}})}_{F_Y} = \frac{\sum_t w_{\mathbf{X}}(\mathbf{X}_t)\, S\big(Y_t, \hat{Q}_t\big)}{\sum_t w_{\mathbf{X}}(\mathbf{X}_t)}.$$

For example, in the context of flooding, a forecaster that scores equally well when a heavy snowmelt is observed is desirable: it keeps its integrity.

Similarly, integrity in the tail of $\mathbf{X}$ can also be assessed with a calibration plot, where an estimate of the calibration, consistent with Equation (3.1.2), is

$$\widehat{\mathrm{Cal}}_{w_{\mathbf{X}}}(\tau) = \frac{\sum_t w_{\mathbf{X}}(\mathbf{X}_t)\, I_{[Y_t,\infty)}\big(\hat{Q}_t(\tau)\big)}{\sum_t w_{\mathbf{X}}(\mathbf{X}_t)}.$$
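Both tail-integrity diagnostics are short computations given per-forecast scores. The sketch below is our own illustration in Python; the logistic weight follows the dissertation's stated choice in spirit, but the `loc` and `scale` parameters (and all function names) are hypothetical.

```python
import numpy as np

def logistic_weight(x, loc, scale):
    """Weight w_X that emphasizes the upper tail of a (scalar) predictor."""
    return 1.0 / (1.0 + np.exp(-(x - loc) / scale))

def weighted_mean_score(scores, x, loc, scale):
    """Weighted estimator of the mean score over the tail of X."""
    w = logistic_weight(x, loc, scale)
    return np.sum(w * scores) / np.sum(w)

def weighted_calibration(y, q_tau, x, loc, scale):
    """Weighted analogue of Equation (3.1.2) at one quantile level:
    the weighted proportion of forecasts q_tau that cover the response."""
    w = logistic_weight(x, loc, scale)
    return np.sum(w * (y <= q_tau)) / np.sum(w)
```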
Chapter 4
Existing Forecasting Methodology

Under the framework in Chapter 2, this chapter discusses how existing methodology might be used to build forecasters, along with its shortcomings. The methods discussed are local regression methods, linear regression, and fully parametric regression. In short, parametric assumptions are needed to extrapolate into the upper tail of the conditional distribution of $Y \mid \mathbf{X}$, as well as into the tails of the predictor space; yet these existing methods all fall short in some way.

4.1 Local Regression

These methods model the quantile surfaces for any quantile level $\tau$ as a non-parametric function of the predictors. There are many such methods, and we discuss only a few here.

Smoothing methods for the mean can be extended to quantile regression. Local polynomials are discussed by Spokoiny et al. (2013), who concisely review other similar methods. Splines are discussed by Koenker et al. (1994) in the single-predictor setting, and are extended to the two-predictor setting by Koenker and Mizera (2004).

In the context of extreme quantiles, Daouia et al. (2011) consider fitting quantile functions at each point of the predictor space. For one such point $\mathbf{x}$, they propose estimating the conditional quantile function using the inverse empirical distribution function of the response data, weighted proportionally to their predictors' vicinity to $\mathbf{x}$. This quantile function is used to obtain a Pickands-type estimate of the conditional extreme value index. Asymptotics of the quantile estimators are given by Daouia et al. (2013) when the conditional response belongs to the domain of attraction of some extreme value distribution.

Regardless of the specifics, these methods suffer from data sparsity when the number of predictors is not small, so the fitted quantile functions will generalize poorly to new data. This is true for any local fitting technique, but the problem is exacerbated in the context of extreme forecasting, because even less data are found in either the $\mathbf{X}$ or $Y \mid \mathbf{X}$ tail. However, local methods are still useful for obtaining a preliminary sense of the shape of the quantile surfaces.

4.2 Linear Regression

These methods model quantile surfaces as linear (or polynomial) functions, but typically do not impose a parametric form across quantile levels.

Possibly the most well-known method of this type is linear quantile regression, introduced by Koenker and Bassett (1978), where each quantile is modelled as a linear function of the predictors:

$$\hat{Q}_{Y|\mathbf{X}}(\tau \mid \mathbf{x}; \alpha, \boldsymbol{\beta}) = \alpha(\tau) + \mathbf{x}^\top \boldsymbol{\beta}(\tau), \tag{4.2.1}$$

where $\boldsymbol{\beta} : (0,1) \to \mathbb{R}^p$ and $\alpha : (0,1) \to \mathbb{R}$. The parameters for a particular quantile level $\tau$ can be estimated by optimizing the score for a single quantile prediction,

$$\big(\hat{\alpha}(\tau), \hat{\boldsymbol{\beta}}(\tau)\big) = \operatorname*{arg\,min}_{(a,\, \mathbf{b})} \sum_{t=1}^T \rho_\tau\big(y_t - a - \mathbf{x}_t^\top \mathbf{b}\big), \tag{4.2.2}$$

where $\rho_\tau$ is the asymmetric absolute deviation function in Equation (3.2.6). See Koenker (2005) for a comprehensive review of quantile regression.
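For reference, the estimator (4.2.2) is available in standard software. The following Python sketch, our own illustration on simulated heteroscedastic data, uses the statsmodels implementation; the toy data-generating process is hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2_000
x = rng.uniform(0.0, 1.0, size=n)
y = 1.0 + 2.0 * x + (0.5 + x) * rng.standard_normal(n)  # heteroscedastic response

tau = 0.9
fit = sm.QuantReg(y, sm.add_constant(x)).fit(q=tau)  # solves Equation (4.2.2)
print(fit.params)
# True tau-quantile is 1 + 2x + (0.5 + x) * z_tau with z_0.9 ~ 1.28,
# i.e. intercept ~ 1.64 and slope ~ 3.28, both linear in x.
```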
There is yet another problem: as long as the quantile functions are non-parametric in the quantile level, the quantile functions cannot effectively extrapolate into the distributional tail – the largest responses impose an unrealistic upper bound. A parametric assumption over the quantile level is needed to extrapolate into the tail beyond the bound imposed by the response variable.

Overall, these regression methods are generally useful when making predictions on "intermediate" quantiles after observing typical values of the predictors.

4.3 Fully Parametric Regression

A parametric modelling approach is to first fit a parametric distribution to $(X^\top, Y)$, from which the conditional quantile function of $Y$ given the predictors can be derived. If a multivariate Gaussian distribution is used to model the joint distribution, the conditional quantiles become linear in the predictors, with univariate Gaussian error. A more modern approach to modelling $F_{X,Y}$ is to use parametric copula families to describe the dependence. Vine copulas are a flexible approach to such modelling in the multi-predictor setting.

In the case of a single predictor, Bouyé and Salmon (2009) first suggest modelling the conditional quantiles by specifying a copula model in a univariate time series. They propose to use nonlinear quantile regression over one quantile to estimate the copula parameters, and call the estimator a "Copula Quantile Regression" (CQR) estimator. Koenker (2005, Section 8.4) mentions as a research topic the use of copula models to build non-linear quantile models (and refers to a 2002 version of Bouyé and Salmon, 2009).

Chen et al. (2009) use a similar model for time series data as Bouyé and Salmon (2009), but suggest that the CQR estimator can be used separately to estimate different quantiles (so that the copula parameters need not be common across all quantile levels). Xie (2015) proposes estimating the copula parameters as being common across quantile levels by taking a weighted average of CQR estimates across different quantile levels. However, their recommended weights depend on the value of the predictor, and it is therefore unclear exactly how the weights are computed (especially in comparison with the maximum likelihood estimator).

To generalize the copula model approach to $p$ predictors, Kraus and Czado (2017) recommend modelling the joint distribution of $(X^\top, Y)$ using a special type of vine called a D-Vine, and discuss how to compute the quantiles of $Y$ given the predictors. Instead of using quantile regression to estimate the quantile function, they fully estimate the joint D-Vine distribution first.

However, fitting the joint distribution of $(X^\top, Y)$ in its entirety will not necessarily lead to a good fit when the upper quantiles of $Y$ given the predictors are extracted. The same can be said about the weighted estimator of Xie (2015). In addition, the joint distribution of $(X^\top, Y)$ can be modelled more flexibly than the D-Vine of Kraus and Czado (2017). Improvements in these respects are addressed next in Chapter 5.

Chapter 5

Proposed Forecasting Methodology

In this chapter, we propose a method to obtain a forecaster. The new approach allows for flexible shapes of high quantile surfaces, and accounts for dependence amongst the predictors and response that is non-Gaussian.
Essentially, the procedure involves specifying a joint distribution for $(X^\top, Y)$, from which the quantile function of $Y \mid X$ can be derived.

Making a forecaster requires the following steps.

1. Model the quantile function of $Y \mid X$ by modelling the joint distribution of $(X^\top, Y)$ with a vine copula or PCBN; see Section 5.1.

2. Fit the model to obtain a forecaster by optimizing the score on the training data; see Section 5.2.

3. Repeat Steps 1–2 with other candidate models, and select the optimal forecaster on the validation set; see Section 5.3.

5.1 Building a Model

We are interested in obtaining a model for the quantile function of $Y \mid X$ that is parametric (or semi-parametric) across both predictors and quantile level. The main idea is to specify a PCBN model for the distribution of $(X^\top, Y)$, and then derive the conditional quantile function.

Using copula models to describe the dependence amongst variables is advantageous because it allows for tail dependence to be modelled, resulting in nonlinear quantile surfaces. For example, it could be that a predictor does not influence the response much unless the predictor is large. Or, observing two large predictors could have a greater effect on the response than they independently would have. Bernard and Czado (2015) discuss the importance of capturing tail dependence when modelling quantile curves (they only consider one predictor). They show the relationship between tail dependence and the asymptotic ($x \to \pm\infty$) quantile curves, and demonstrate that the assumption of linear quantile curves is a poor one.

Examples of quantile curves for some bivariate copulas can be found in Figure 5.1, under various marginals. The properties of these copulas are observable in the shapes of the quantile curves, as is discussed more extensively by Bernard and Czado (2015). For example, the Frank copulas have no tail dependence, and the flattening of the quantile curves for larger predictor values is telling of this tail independence. The Gumbel copulas have quantile curves that approach the positive diagonal, indicating tail dependence. Overall, linearity of quantile curves (and by extension, quantile surfaces) is generally a poor assumption outside of the "central" predictor space. As such, the flexibility of the PCBN (or vine) approach allows for the forecaster to perform well, even when extreme predictors are observed.

Figure 5.1: Examples of quantile curves (levels 0.80, 0.90, 0.95) of some bivariate distributions determined by copulas (Gaussian, Frank, and Gumbel panels). Parameters for the copula families are chosen to have a Kendall's tau of 0.3. The marginal distributions are indicated in the form of "response distribution" ~ "predictor distribution".
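As an illustration of how such quantile curves arise from a copula, the following sketch computes the $\tau$-quantile curve for a single predictor linked to the response by a Gaussian copula (the inverse conditional copula of the Gaussian family has the closed form used below; a correlation of $\rho = \sin(\pi \times 0.3/2) \approx 0.454$ corresponds to a Kendall's tau of 0.3).

```python
import numpy as np
from scipy.stats import norm, t

def gauss_copula_quantile_curve(tau, x, rho, f_x, q_y):
    """tau-quantile curve x -> Q_{Y|X}(tau | x) for a bivariate distribution
    with a Gaussian(rho) copula: map x to the uniform scale, invert the
    conditional copula, then map back through the response quantile q_y."""
    u = f_x(x)                                            # predictor, uniform scale
    v = norm.cdf(rho * norm.ppf(u) + np.sqrt(1 - rho**2) * norm.ppf(tau))
    return q_y(v)

# e.g. the "N(0,1) ~ t(2)" panel of Figure 5.1 at quantile level 0.95
x = np.linspace(-4, 4, 9)
print(gauss_copula_quantile_curve(0.95, x, rho=0.454, f_x=t(2).cdf, q_y=norm.ppf))
```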
After modelling the joint distribution of $(X^\top, Y)$ by a parametric distribution, the resulting parametric model for the quantile surfaces may be non-identifiable. This is because some of the parameters are strictly associated with the joint distribution of the predictors, as the following example demonstrates.

Example 5.1.1. Consider a Gaussian model for the joint distribution of three random variables:
\[
(X_1, X_2, Y) \sim \mathcal{N}\!\left(\mathbf{0},
\begin{pmatrix}
1 & \rho_1 & \sigma \rho_3 \\
\rho_1 & 1 & \sigma \rho_2 \\
\sigma \rho_3 & \sigma \rho_2 & \sigma^2
\end{pmatrix}\right).
\]
The conditional quantile function of $Y \mid X$ is
\[
\hat{Q}_{Y|(X_1, X_2)}\!\left(\tau \mid (x_1, x_2); \beta\right) = \beta_0 \Phi^{\leftarrow}(\tau) + \beta_1 x_1 + \beta_2 x_2,
\]
where $\Phi^{\leftarrow}$ is the probit function, and $\beta = (\beta_0, \beta_1, \beta_2)^\top$ can be found from the parameters $\theta = (\rho_1, \rho_2, \rho_3, \sigma)^\top$ by
\[
\beta_0 = \sigma \sqrt{\frac{1 + 2\rho_1\rho_2\rho_3 - \rho_1^2 - \rho_2^2 - \rho_3^2}{1 - \rho_1^2}}, \qquad
\beta_1 = \sigma\, \frac{\rho_3 - \rho_1 \rho_2}{1 - \rho_1^2}, \qquad
\beta_2 = \sigma\, \frac{\rho_2 - \rho_1 \rho_3}{1 - \rho_1^2}.
\]
The parameters $\theta$ are non-identifiable for $\hat{Q}_{Y|(X_1,X_2)}$ because there exists more than one (in fact, infinitely many) member of the parameter space of $\theta$ that results in the same value of $\beta$ – that is, in the same quantile surfaces. More generally, the vector of regression slopes $\beta$ is a matrix product of the inverse covariance matrix of the predictors, times the covariance vector between the response and predictors. If the former is left as a parameter, then this linear combination is non-identifiable.

A simple solution to the non-identifiability problem is to estimate those parameters associated with the distribution of $X$ first – by maximum likelihood, for example.

Computing the quantile function of $Y \mid X$ and the distribution of $X$ from a PCBN is in general cumbersome and computationally expensive. The situation improves if the PCBN is modelled so that $Y$ is introduced last. This is the proposed modelling method, and can be summarized in the following steps:

1. Fit a model to $F_X$, the joint distribution of $X$. For example, use a vine copula or PCBN.

2. Choose an order in which to link the predictors to the response. Without loss of generality, suppose this order is $1, \ldots, p$.

3. Choose a copula family to model $(X_j, Y) \mid X_{1:(j-1)}$, for the selected order $j = 1, \ldots, p$.

4. Use Equation (A.2.3) in Appendix A.2 to calculate the quantile function of $Y \mid X$.

Using this modelling method, the model for $X$ is easier to fit, because it does not require the response to be integrated out of the full joint distribution. In addition, the equation for the quantile function of $Y \mid X$ indicated in Step 4 is a closed-form equation.

This modelling method also carries an interpretation. A subsequent variable in the pairing order determined in Step 2 explains some of the "leftover" uncertainty in the response that previous variables cannot explain. Like in principal components analysis, the result is a division of the uncertainty of the response amongst the predictors. Consequently, we obtain a natural variable selection method, similar to that described in Kraus and Czado (2017): a predictor can be removed if it does not have much more to contribute in addition to the previously fitted predictors. However, since model error is compounded further down the linkage order of Step 2, the most descriptive predictors ought to be included early in the linkage order.

In the computation of the conditional quantile function in Step 4, the joint predictor distribution is only needed to transform the predictors $X_j$ to an independent set of uniform predictors $U_j := F_{j|1:(j-1)}(X_j \mid X_{1:(j-1)})$ for $j = 1, \ldots, p$, forming the vector $U$. This transformation sometimes coincides with the so-called Rosenblatt transform if the linkage order of Step 2 coincides with the introduction order of the vine/PCBN of the predictors; a minimal sketch of Steps 2–4 in the two-predictor case is given below.
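The sketch uses Gaussian pair copulas throughout, since their h-functions and inverses have the closed forms shown; the actual method would mix copula families chosen by the fitting procedure, and the D-vine quantile formula below is an illustrative stand-in for the closed form referenced in Step 4 (Equation (A.2.3)).

```python
import numpy as np
from scipy.stats import norm

def h_gauss(v, u, rho):
    """h-function of a Gaussian copula: C_{V|U}(v | u; rho)."""
    return norm.cdf((norm.ppf(v) - rho * norm.ppf(u)) / np.sqrt(1 - rho**2))

def h_inv_gauss(w, u, rho):
    """Inverse h-function: solves h_gauss(v, u, rho) = w for v."""
    return norm.cdf(np.sqrt(1 - rho**2) * norm.ppf(w) + rho * norm.ppf(u))

def dvine_quantile(tau, u1, u2, rho_12, rho_1y, rho_2y_1, q_y=norm.ppf):
    """Conditional quantile of Y given two predictors (on the uniform scale)
    for an all-Gaussian D-vine with pairing order 1, 2. rho_12 links the
    predictors, rho_1y links X1 and Y, and rho_2y_1 is the conditional
    copula parameter of (X2, Y) | X1."""
    u2_1 = h_gauss(u2, u1, rho_12)           # Rosenblatt step: U_2 = F_{2|1}
    v = h_inv_gauss(h_inv_gauss(tau, u2_1, rho_2y_1), u1, rho_1y)
    return q_y(v)                            # back to the response scale
```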
As such, estimation of the joint predictor distribution in Step 1 is not as critical as fitting the link between the predictors and response, whose fit will adapt appropriately to the resulting transformation of the predictors.

However, estimation of the joint predictor distribution is still important. If this estimation is done poorly, then there are more likely to be issues with multicollinearity, as the transformed predictors are no longer spread uniformly throughout the unit $p$-cube. But more importantly, the copulas modelled between the response and transformed predictors in Step 3 may not actually represent the copulas underlying the model. Unintended consequences may result, such as unintended tail dependence or CCEVI trend.

Furthermore, computing $U$ can be computationally expensive if the PCBNs on $X_{1:j}$ for $j = 2, \ldots, p$ are not vines, and could each involve up to $(j-1)$-dimensional integrals. One way to simplify this is to model the distribution of $X$ with a vine that has an introduction order equal to the pairing order of $Y$. Since vine models are extremely flexible, this should not negatively affect the flexibility of the modelling procedure. Then a closed-form expression exists for each transformed variable. In this case, the PCBN for $(X^\top, Y)$ is also a vine.

When the PCBN for $(X^\top, Y)$ is a D-Vine, then the model is similar to that of Kraus and Czado (2017) – the difference being that they fit the entire joint distribution of $(X^\top, Y)$ using maximum likelihood so that the entire quantile function of $Y \mid X$ can be found.

5.2 CNQR

With a parametric model of the conditional quantile function available, the next task becomes fitting the model. To do this, an estimation method called composite nonlinear quantile regression (CNQR) estimation is proposed in Section 5.2.1. The idea behind the estimation is to select the parameter that optimizes the forecaster's average score over the training data. Since there is a family of scores that can be optimized, there is a corresponding family of CNQR estimators.

Asymptotics of the CNQR estimators are established in Section 5.2.2. Under some regularity conditions, a CNQR estimator is consistent and asymptotically Gaussian, and this information can be used to make inference. Estimator choice is briefly investigated by comparing asymptotic variances; it is found that about 10 quantile levels are appropriate for estimation, and that choosing a heavy-tailed transformation function might compromise the performance of the estimator.

Other existing estimation methods in the literature are special cases of CNQR estimation, as is discussed in Section 5.2.3.

5.2.1 Estimation

Suppose the upper quantile function of $Y$ conditional on the predictors is modelled by the family $\hat{Q}_{Y|X}(\tau \mid x; \theta)$, indexed by (identifiable) parameters $\theta \in \Theta$. We would like to fit a model from the family using observations $(X_i^\top, Y_i)$ for $i = 1, \ldots, n$.

The optimum score estimator is a natural choice for estimation. It chooses the $\theta \in \Theta$ so that the average score, as indicated in Equation (3.2.7), is optimized. Specifically, for a quantile level cutoff $\tau_c \in (0,1)$, a selection of quantile levels $\tau_c \le \tau_1 < \cdots < \tau_K < 1$, and non-decreasing transformation functions $g_1, \ldots, g_K$, the parameter $\hat{\theta}_n \in \Theta$ is chosen, where
\[
\hat{\theta}_n = \arg\min_{\theta \in \Theta} \frac{1}{nK} \sum_{i=1}^{n} \sum_{k=1}^{K} \rho_{\tau_k}\!\left(g_k(Y_i) - g_k\!\left(\hat{Q}_{Y|X}(\tau_k \mid X_i; \theta)\right)\right), \quad (5.2.1)
\]
and $\rho_\tau$ is indicated in Equation (3.2.6). The resulting family of estimators is referred to as the composite nonlinear quantile regression (CNQR) family.
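A minimal sketch of the CNQR objective and fit; the `q_model(tau, x, theta)` interface for the parametric conditional quantile model is an assumption of this sketch.

```python
import numpy as np
from scipy.optimize import minimize

def cnqr_objective(theta, x, y, q_model, taus, g=lambda v: v):
    """CNQR objective of Eq. (5.2.1), with a common transformation g
    (identity by default). `q_model(tau, x, theta)` returns the modelled
    conditional tau-quantiles at the predictor values x."""
    total = 0.0
    for tau in taus:
        r = g(y) - g(q_model(tau, x, theta))
        total += np.sum(r * (tau - (r < 0)))          # check loss rho_tau
    return total / (len(y) * len(taus))

def cnqr_fit(theta0, x, y, q_model, K=10, tau_c=0.9):
    """Minimize over K evenly spaced levels per Eq. (5.2.2). A smooth
    optimizer is used even though the objective is only piecewise smooth;
    for large samples this is often adequate (see Appendix D)."""
    taus = (2.0 * np.arange(1, K + 1) - 1.0) / (2.0 * K) * (1.0 - tau_c) + tau_c
    res = minimize(lambda th: cnqr_objective(th, x, y, q_model, taus),
                   theta0, method="Nelder-Mead")
    return res.x
```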
When
\[
\tau_k = \frac{2k - 1}{2K}(1 - \tau_c) + \tau_c \quad (5.2.2)
\]
for $k = 1, \ldots, K$, the CNQR estimators become a Riemann sum approximation to the integral scoring rule in Equation (3.2.8).

The CNQR objective function can have a cusp at its minimum, as demonstrated in Figure D.1 of Appendix D. In addition, the function is non-differentiable at many points along its domain. This means minimization methods that rely on derivatives, such as gradient descent, may run into difficulties. However, the objective function approximates a smooth function for a large sample size, so that gradient descent methods may be sufficient. If not, Koenker and Park (1996) describe an algorithm for computing such a minimum. See Appendix D for a discussion on the smoothness of the objective function.

The cutoff $\tau_c$ is intended to be chosen "near" 1. In doing so, CNQR estimation selects a model that more accurately describes the distributional tail of $Y$ given the predictors. However, choosing a larger $\tau_c$ comes at the cost of accepting a higher variance in the quantile estimates. This bias-variance tradeoff is similar to that of estimating the extreme value index of a univariate sample with the peaks-over-threshold method.

Note that the nonlinear quantile regression estimator of Koenker (2005) results as a special case of the estimator in Equation (5.2.1) when $K = 1$, $g_1$ is the identity function, and $\hat{Q}_{Y|X}(\tau \mid x; \theta) = (1, x^\top)\theta$. The CNQR estimators are similar to the weighted average estimator of Xie (2015), but the latter provides no guarantee that the scoring rule is optimized.

Instead of CNQR estimation, it is possible to find the likelihood from the quantile model, and use the corresponding maximum likelihood estimator (MLE). However, the lower endpoint of the distribution of $Y \mid X = x$ is $\lim_{\tau \downarrow \tau_c} \hat{Q}_{Y|X}(\tau \mid x; \theta)$, which depends on the model parameters. The regularity conditions are therefore not all satisfied, meaning that the MLE may have undesirable properties. In addition, there is no guarantee that the resulting forecaster would optimize the scoring rule.

5.2.2 Inference

To estimate the conditional quantile function of $Y$ given the predictors, a parametric or semi-parametric family of such functions is chosen. To study asymptotic properties, the family is assumed to contain the true quantile function.

Assumption 5.2.1. The data $(X_i^\top, Y_i)$ for $i = 1, \ldots, n$ are iid replicates of the random vector $(X^\top, Y)$.

Assumption 5.2.2. Consider $K$ quantile levels $\tau_c \le \tau_1 < \cdots < \tau_K < 1$ for some $\tau_c \in (0,1)$. Further, consider a family of quantile functions $(\tau, x) \mapsto \hat{Q}_{Y|X}(\tau \mid x; \theta)$ with domain $(\tau_c, 1) \times \mathcal{X}$, each strictly increasing in its first argument, indexed by parameters $\theta \in \Theta \subset \mathbb{R}^{d_\Theta}$, $d_\Theta \ge 1$. There exists a $\theta_0 \in \Theta$ such that $Q_{Y|X}(\tau_k \mid x) = \hat{Q}_{Y|X}(\tau_k \mid x; \theta_0)$ for all $k = 1, \ldots, K$ and $x \in \mathcal{X}$.

The composite nonlinear quantile regression (CNQR) family of estimators of $\theta_0$ is introduced in Equation (5.2.1) in Section 5.2. The estimators optimize the fit of $K$ upper quantile surfaces with levels above a cutoff $\tau_c \in (0,1)$, via a scoring rule indicated in Equation (3.2.7) in Section 3.2.2. Although it is indicated that the cutoff $\tau_c$ be non-zero, the CNQR estimators are also valid for $\tau_c = 0$. However, in this case, the family of models would describe the entire conditional distribution of $Y$ given the predictors, so that the maximum likelihood estimator may be more appropriate. As discussed in Section 5.2, the motivation for choosing $\tau_c$ near 1 is to improve the accuracy of high quantile estimation.
For simplicity of notation, for each $i = 1, \ldots, n$ and $k = 1, \ldots, K$, define
\[
q_{ki} : \theta \mapsto g_k\!\left(\hat{Q}_{Y|X}(\tau_k \mid X_i; \theta)\right) \quad (5.2.3)
\]
over the parameter space $\Theta$, having gradient
\[
\dot{q}_{ki} : \theta \mapsto \frac{\partial}{\partial \theta}\, q_{ki}(\theta) \quad (5.2.4)
\]
and Hessian
\[
\ddot{q}_{ki} : \theta \mapsto \frac{\partial^2}{\partial \theta\, \partial \theta^\top}\, q_{ki}(\theta). \quad (5.2.5)
\]
The $i$ subscript is sometimes dropped to refer to the generic random vector $(X^\top, Y)$.

Theorem 5.2.1 (Consistency). Suppose Assumptions 5.2.1 and 5.2.2 hold, and consider non-decreasing functions $g_k : \mathbb{R} \to \mathbb{R}$, $k = 1, \ldots, K$. Under the regularity conditions listed in Condition 1 in Appendix D, the CNQR estimator $\hat{\theta}_n$ defined in Equation (5.2.1) converges in probability to $\theta_0$ as $n \to \infty$. That is, the CNQR estimators are consistent.

Theorem 5.2.2 (Asymptotic Normality). Suppose Assumptions 5.2.1 and 5.2.2 hold, and consider non-decreasing functions $g_k : \mathbb{R} \to \mathbb{R}$, $k = 1, \ldots, K$. Under the regularity conditions listed in Conditions 1 and 2, the CNQR estimator $\hat{\theta}_n$ defined in Equation (5.2.1) satisfies $\sqrt{n}(\hat{\theta}_n - \theta_0) \to_d \mathcal{N}(0, \Sigma_{\mathrm{CNQR}})$, where
\[
\Sigma_{\mathrm{CNQR}} = D_1^{-1} \left[ \sum_{k,k'=1}^{K} D_{0kk'} \min(\tau_k, \tau_{k'}) \left(1 - \max(\tau_k, \tau_{k'})\right) \right] D_1^{-1}, \quad (5.2.6)
\]
\[
D_{0kk'} = E\!\left( [\dot{q}_k(\theta_0)][\dot{q}_{k'}(\theta_0)]^\top \right) \quad (5.2.7)
\]
for each $k, k'$,
\[
D_1 = \sum_{k=1}^{K} D_{1k}, \quad (5.2.8)
\]
\[
D_{1k} = E\!\left( f_{g_k(Y)|X}\!\left(q_k(\theta_0) \mid X\right) [\dot{q}_k(\theta_0)][\dot{q}_k(\theta_0)]^\top \right), \quad (5.2.9)
\]
and $q_k$ and $\dot{q}_k$ are respectively defined in Equations (5.2.3) and (5.2.4).

Proofs of Theorems 5.2.1 and 5.2.2 are provided in Appendix D. These proofs consider the limit of the CNQR objective function defined in Equation (5.2.1), as opposed to the finite-sample objective function that is used in estimating-equations theory. This is because the gradient may not exist at the minimum.

Theorem 5.2.2 is useful when making inference on the parameter $\theta_0$, and more importantly, on the quantiles themselves (using the delta method). For a large enough sample size, one could approximate the sampling distribution of the CNQR estimator using the Gaussian distribution, and estimate the variance of this distribution using an estimate of Equation (5.2.6). However, the asymptotic distribution under intermediate and extreme order sequences might be a better finite-sample approximation than the Gaussian when $\tau_c$ is large – that is, by allowing $\tau_c \uparrow 1$ as $n \to \infty$. For an example of this in the linear quantile regression case, see Chernozhukov (2005).

The asymptotic covariance in Theorem 5.2.2 can provide insight into a desirable choice of a CNQR estimator. In particular, it is interesting to ask what transformation functions $g_k$ result in a better estimator. For example, what effect does transforming the response to a light-tailed, heavy-tailed, or short-tailed distribution have on the estimator performance? In addition, when using the Riemann sum approximation with quantile levels computed using Equation (5.2.2), how is the variance related to the number of quantile levels $K$, and at what point does the variance stabilize? How does the choice of $\tau_c$ affect the estimator variance?

These questions are investigated in Figure 5.2 using bivariate data generated from three models (a Frank copula, Gumbel copula, and reflected Gumbel copula). Estimation is done using the correct copula families and correct marginals. As expected, the precision worsens as larger values of $\tau_c$ are chosen, since less information is being drawn from the data. Also, it appears that when the response is transformed to have a heavy-tailed distribution, the estimator performance may be compromised. Perhaps this is due to the "spreading out" of the quantile curves when the tail is heavy.
Another way to interpret this finding is that, if a response variable is suspected to have a heavy-tailed (Pareto-like) distribution, then it should be transformed to have a lighter tail (such as exponential or uniform). It seems that choosing $K = 10$ quantile levels is sufficient for convergence of the variance.

The asymptotic precision can be interpreted as a measure of goodness, indicating how well the estimator can identify the correct parameter from the parameter space. But this precision measure changes with parameterization, and it is less obvious how to extend precision (and variance) to the case of multiple parameters. For future research, it would be useful to investigate how well the estimator can identify the correct model from the model space (as opposed to the parameter from the parameter space). For example, estimates of the quantiles themselves instead of the parameters would be more meaningful here. This would provide a "standardized" approach, allowing for a natural comparison in the case of different numbers of parameters and predictors, and would be independent of parameterization. It would allow for estimation on vines with more than two dimensions – a useful investigation indeed.

In extending the exploration to the case of multiple parameters and predictors, perhaps one would find the same result: that a transformation to a heavier-tailed response leads to less precise estimates. This is the case with the empirical univariate quantile estimator, which has variance proportional to the square of the slope of the true quantile function. This slope steepens in the tail with increasing tail heaviness. Some generalization of this is to be expected in the regression setting.

Figure 5.2: Asymptotic precision (inverse of the variance) of the family of CNQR estimators using $K$ evenly spaced quantile levels calculated with Equation (5.2.2), for $\tau_c \in \{0, 0.5, 0.9\}$. The single predictor is linked to the response with the copula indicated in the side panels (Frank, Gumbel, reflected Gumbel), having a Kendall's tau of 0.3. The copulas were chosen to see whether tail dependence influences the asymptotic precision: the Frank copula has upper tail quadrant independence; the Gumbel copula has strong upper tail dependence; and the reflected Gumbel copula has intermediate tail dependence (similar to the Gaussian copula). A common transformation function $g_k = g$ is taken such that the transformed response $Y \mapsto g(Y)$ has the marginal distribution indicated in the upper panels (Exp(1), Par(3), or Unif(0,1); the columns should therefore be interpreted as different CNQR estimators, not as different underlying distributions). Using approximately 10 quantile levels appears sufficient to achieve the converged ($K \to \infty$) precision. It appears that transforming the response to a heavy-tailed distribution may compromise the performance of the resulting CNQR estimator.
5.2.3 Special Cases

It is reassuring that the asymptotic covariance of a CNQR estimator in Equation (5.2.6) reduces to the special cases presented by the parallel linear model of Zou and Yuan (2008), and the single-quantile linear model in the seminal paper of Koenker and Bassett (1978), as the following remarks indicate.

5.2.3.1 Parallel Linear Quantile Surfaces

Suppose the quantile surfaces of $Y$ conditional on the predictors are linear and parallel, so that
\[
Q_{Y|X}(\tau \mid x) = x^\top \theta_0 + Q_\varepsilon(\tau) \quad (5.2.10)
\]
for $(\tau, x^\top) \in (0,1) \times \mathbb{R}^p$, where $\theta_0 \in \mathbb{R}^p$, and $Q_\varepsilon : (0,1) \to \mathbb{R}$ is a strictly increasing function having $Q_\varepsilon(0.5) = 0$. The quantile function $Q_\varepsilon$ describes an error distribution, determining the "spacing" of the parallel quantile surfaces.

Consider modelling these conditional quantiles with a family indexed by parameters $\theta \in \mathbb{R}^p$ and functions $\hat{Q}_\varepsilon$ in the space of strictly increasing quantile functions with $\hat{Q}_\varepsilon(0.5) = 0$, by
\[
\hat{Q}_{Y|X}\!\left(\tau \mid x; \hat{Q}_\varepsilon, \theta\right) = x^\top \theta + \hat{Q}_\varepsilon(\tau)
\]
for $(\tau, x^\top) \in (0,1) \times \mathbb{R}^p$. Asymptotics for the CNQR estimator of $\theta_0$ that uses $K$ quantile levels $0 < \tau_1 < \cdots < \tau_K < 1$ and identity transformation functions $g_k$ are considered by Zou and Yuan (2008, Theorem 2.1) (the estimator simultaneously estimates the parameters $Q_\varepsilon(\tau_k)$ for $k = 1, \ldots, K$, but these are not considered). To compute the covariance in Equation (5.2.6), first notice that
\[
f_{Y|X}(y \mid x) = f_\varepsilon\!\left(y - x^\top \theta_0\right)
\]
for $(y, x^\top) \in \mathbb{R}^{p+1}$, where $f_\varepsilon$ is the density corresponding to the distribution with quantile function $Q_\varepsilon$. Now, for $k, k' = 1, \ldots, K$, we have $\dot{q}_k(\theta_0) = X$, and using Equations (5.2.7) and (5.2.8),
\[
D_{0kk'} = E\!\left(XX^\top\right) =: D_0
\]
and
\[
D_1 = \sum_{k=1}^{K} E\!\left(f_\varepsilon\!\left(Q_\varepsilon(\tau_k)\right) XX^\top\right) = D_0 \sum_{k=1}^{K} f_\varepsilon\!\left(Q_\varepsilon(\tau_k)\right).
\]
The covariance becomes
\[
\Sigma_{\mathrm{CNQR}} = D_0^{-1}\, \frac{\sum_{k,k'=1}^{K} \min(\tau_k, \tau_{k'})\left(1 - \max(\tau_k, \tau_{k'})\right)}{\left[\sum_{k=1}^{K} f_\varepsilon\!\left(Q_\varepsilon(\tau_k)\right)\right]^2}, \quad (5.2.11)
\]
which matches the result of Zou and Yuan (2008, Theorem 2.1).

5.2.3.2 Linear Quantile Surfaces

Suppose the quantile surfaces of $Y$ conditional on the predictors are linear, so that
\[
Q_{Y|X}(\tau \mid x) = x^\top \theta_0(\tau) \quad (5.2.12)
\]
for $(\tau, x) \in (0,1) \times \mathbb{R}^p$, where $\theta_0 : (0,1) \to \mathbb{R}^p$ is a function such that $Q_{Y|X}(\cdot \mid x)$ is a valid quantile function for each $x$. Fixing $\tau \in (0,1)$, a family of models for the $\tau$-quantile surface is
\[
\hat{Q}_{Y|X}(\tau \mid x) = x^\top \theta
\]
for $x \in \mathbb{R}^p$, indexed by parameters $\theta \in \mathbb{R}^p$. The CNQR estimator of $\theta_0(\tau)$ using $K = 1$ quantile level $\tau_1 = \tau$ and identity transformation function $g_1$ reduces to the quantile regression estimator of Koenker and Bassett (1978). To compute the covariance in Equation (5.2.6), we have $\dot{q}_1(\theta_0(\tau)) = X$, and using Equations (5.2.7) and (5.2.8),
\[
D_{011} = E\!\left(XX^\top\right) =: D_0
\]
and
\[
D_1 = E\!\left(f_{Y|X}\!\left(X^\top \theta_0(\tau) \mid X\right) XX^\top\right).
\]
The covariance becomes
\[
\Sigma_{\mathrm{CNQR}} = \tau(1 - \tau)\, D_1^{-1} D_0 D_1^{-1}, \quad (5.2.13)
\]
which matches the result of Koenker (2005, Theorem 4.1).

Although the estimation considered here is a special case of the estimation of the parallel linear model in Equation (5.2.10) with $K = 1$ quantile level, the asymptotic covariances are not equal in general. That is, Equation (5.2.11) with $K = 1$ does not reduce to Equation (5.2.13) in general. This is because the quantity $D_1$ depends on the true conditional quantile function in a neighbourhood of $\tau_1$, which is more specific in the case of parallel quantile surfaces.
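For inference in practice, Equation (5.2.13) can be estimated by plug-in; the following sketch uses a Powell-type uniform-kernel estimate of the conditional density at the fitted quantile, where the bandwidth `h` is a user choice and an assumption of this sketch.

```python
import numpy as np

def qr_sandwich_cov(X, y, theta_hat, tau, h):
    """Plug-in estimate of Eq. (5.2.13): tau*(1-tau) * D1^{-1} D0 D1^{-1}.
    D0 is estimated by the average of X_i X_i^T, and D1 by weighting each
    X_i X_i^T with a uniform-kernel estimate of f_{Y|X} at the fitted
    quantile (Powell-type estimator)."""
    n = len(y)
    resid = y - X @ theta_hat
    D0 = X.T @ X / n
    dens = (np.abs(resid) < h) / (2.0 * h)       # kernel estimate of f_{Y|X}
    D1 = (X * dens[:, None]).T @ X / n
    D1_inv = np.linalg.inv(D1)
    # covariance of sqrt(n)(theta_hat - theta_0); divide by n for an
    # approximate finite-sample covariance of theta_hat itself
    return tau * (1.0 - tau) * D1_inv @ D0 @ D1_inv
```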
In general, the $D_1$ term can be reduced whenever $Q'_{Y|X}(\tau \mid x)$ does not depend on $x$, because in general,
\[
f_{Y|X}\!\left(Q_{Y|X}(\tau \mid x) \mid x\right) = \frac{1}{Q'_{Y|X}(\tau \mid x)}
\]
for $(\tau, x) \in (0,1) \times \mathbb{R}^p$.

5.3 Selecting a Model

An added complication to the proposed PCBN modelling method is that there are many candidate models that one could consider. Fitting $m$ copula models to $p$ predictors for all pairing orders results in $m^p p!$ possible forecasters. We are tasked with choosing the most appropriate forecaster – the one that generalizes best to new data.

One need not consider all $p!$ possible pairing orders. Instead, we recommend putting the predictors that are most dependent with the response near the beginning of the pairing order (using any dependence measure). This is most desirable because error is propagated further down the pairing order.

With a pairing order selected, we propose a sequential method for selecting a forecaster. The method adds a predictor one at a time, each time choosing the optimal forecaster; a structural sketch of the procedure is given at the end of this section.

1. Select a model for $Q_Y$ that optimizes the expected out-of-sample score. Set the predictor number $k = 1$.

2. Select candidate copula models for $C_{k,Y;1:(k-1)}$, fitting the corresponding model for $Q_{Y|1:k}$ for each – possibly re-estimating the parameters from previous steps to further optimize the score on the training set.

3. Choose the forecaster that has the best score on $Q_{Y|1:k}$ given in Equation (A.2.3). To avoid overfitting, consider estimating the scores using out-of-sample data, such as with cross validation or using a validation set; otherwise, one can use the training data to estimate the score.

4. Increase $k$ by one and repeat Steps 2 and 3 until all predictors are considered.

As long as the independence copula is included as a candidate in Step 2 – equivalent to leaving the $k$'th predictor out – then the estimated out-of-sample score will never worsen by adding a predictor. Of course, this does not guarantee that the true score does not worsen. This forecaster progression allows us to see the effect of adding a particular predictor (conditional on the already-included ones). Variable selection occurs here by removing predictors with the independence copula fitted.

Of course, candidate copula models should be motivated by an exploratory analysis. Normal scores plots of each predictor paired with the response will often give valuable insight as to what copula models to consider. In addition, the rough shape of the quantile curves in the normal scores plots can be investigated using the local fitting techniques discussed in Section 4.1, and compared to the quantile curves of copula families. For example, Bernard and Czado (2015) show that quantile curves that flatten out in the tails suggest tail independence. For a study on the quantile curves of copula families and their relationship to tail dependence, see Bernard and Czado (2015).
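The following is a structural sketch of the sequential procedure of Steps 1–4 above; the callables `fit_marginal`, `extend`, and `score_oos` stand in for the model-fitting and validation-scoring machinery, and are assumptions of this sketch rather than part of the proposed method's implementation.

```python
def select_forecaster(p, candidates, fit_marginal, extend, score_oos):
    """Sequential forecaster selection. `fit_marginal()` fits a model for
    Q_Y; `extend(model, k, copula)` links predictor k to the response with
    the given copula, (re)fitting parameters on the training set; and
    `score_oos` estimates the out-of-sample mean score (lower is better).
    `candidates` should include the independence copula, so that adding a
    predictor can never worsen the estimated out-of-sample score."""
    model = fit_marginal()                        # Step 1
    for k in range(1, p + 1):                     # Steps 2-4
        trials = [extend(model, k, cop) for cop in candidates]
        model = min(trials, key=score_oos)        # Step 3: keep the best score
    return model
```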
Chapter 6

New Bivariate Copula Families

The flexible PCBN/vine copula method of Chapter 5 uses a sequence of bivariate copula models having a variety of tail behaviours and CCEVIs. This variety is what gives the new approach its flexibility. Particularly, copulas having a non-constant CCEVI are important for capturing different tail behaviours of the upper quantile function forecasts across the predictor space.

However, an example of a copula having a non-constant CCEVI remains elusive – many (if not all) simple one- and two-parameter bivariate copulas in Joe (2014) have constant CCEVIs. This problem is compounded by the often non-trivial computations that are involved in finding the CCEVI of some copulas. In addition, fully tail-lightening copula families (i.e., those allowing a CCEVI of 0) remain elusive as well.

To solve this issue, a new copula family having a non-constant CCEVI is proposed in Section 6.1, and is called the Integrated Gamma (IG) copula family. An additional copula family, called the Integrated Gamma Limit (IGL) copula family, exists on a boundary of the IG copula family. The IGL family is an example of a family containing fully tail-lightening copulas, and is therefore described in some detail in this chapter as well.

The IG and IGL copula families have a natural non-parametric extension. The extension of the IG copula family is identified by Durante and Jaworski (2012). These extensions are discussed in Section 6.2.

For this chapter, it is especially pertinent to recall the convention that, if a function lacks a definition at a point in its domain, then it is defined as the limit at that point.

6.1 The IG and IGL Copula Families

This section introduces a copula family (the IG family) having non-constant CCEVIs, as well as a copula family (the IGL family), existing on a boundary of the IG family, having CCEVIs of zero. As discussed in Section 6.1.1, these families were discovered by finding the copula underlying a Type I Pareto conditional distribution having a linear shape parameter in the conditioning variable. This derivation can be safely skipped; a formal definition of the families is given in Section 6.1.2. Some properties of the copula families are discussed in Section 6.1.3.

This section refrains from describing extensive properties of the IG and IGL copula families, because these families have a non-parametric generalization that is discussed in Section 6.2.

6.1.1 Beginnings

The approach taken to construct a bivariate copula family with a non-constant CCEVI is to specify a conditional distribution having an EVI that depends on the predictor. In particular, a response variable conditional on a uniform predictor is taken to have a Type I Pareto distribution with a shape parameter that is linear in the predictor. The corresponding copula can then be derived. Section 6.1.1.1 discusses the details.

The resulting copula family does not achieve high dependence, so a generalization of the family is sought in Section 6.1.1.2. A generalization is found after recognizing that the limit copula is related to the Gamma(2,1) distribution, and generalizing this to the Gamma(k,1) distribution.

6.1.1.1 Initial Derivation

Building a bivariate copula family with non-constant CCEVI can be done as follows.

1. Choose a distribution for a random variable $Y$ conditional on a uniform predictor $U$, such that the conditional EVI is non-constant.

2. Obtain the joint distribution of $(U, Y)$.

3. Obtain the marginal quantile function of $Y$.

4. Derive the underlying copula of $(U, Y)$ using Sklar's theorem – specifically, Equation (2.4.1).

It is difficult to find a distribution in Step 1 so that the integral in Step 2 is tractable. If integration is tractable, the quantile function in Step 3 is unlikely to have a convenient closed form.

One workable example is to choose a Type I Pareto distribution in Step 1 having a shape parameter that is linear in the predictor:
\[
F_{Y|U}(y \mid u) = 1 - y^{-\alpha(u)}
\]
for $y \ge 1$ and $u \in (0,1)$, where $\alpha : (0,1) \to (0,\infty)$ is a positive linear function.
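This conditional model is easy to simulate from by inverting $F_{Y|U}$; a small sketch, with an arbitrary illustrative choice of $\alpha$:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_conditional_pareto(n, alpha):
    """Step 1 of the derivation: U ~ Unif(0,1), and Y | U = u is Type I
    Pareto with shape alpha(u), sampled by inverse cdf:
    F^{-1}(p | u) = (1 - p)^(-1/alpha(u))."""
    u = rng.uniform(size=n)
    y = rng.uniform(size=n) ** (-1.0 / alpha(u))   # 1 - p is also Unif(0,1)
    return u, y

# e.g. alpha(u) = 1 + 2u, so the conditional EVI 1/alpha(u) shrinks with u
u, y = simulate_conditional_pareto(10_000, lambda u: 1.0 + 2.0 * u)
```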
The EVI of this distribution is $1/\alpha(u)$, which is non-constant in $u$.

To ensure the positivity of $\alpha$, it is convenient to write
\[
\alpha(u) = \nu\left(1 - \theta^- + \theta u\right)
\]
for some $\theta \in \mathbb{R}$ and $\nu > 0$, where $\theta^- = \min(\theta, 0)$. With $\nu = 1$, $\alpha$ is the increasing line segment from the point $(0, 1)$ to $(1, 1 + \theta)$ when $\theta > 0$, and is the decreasing line segment from the point $(0, 1 + |\theta|)$ to $(1, 1)$ when $\theta < 0$. The positive $\nu$ parameter acts as an "amplifier" so that the space of positive linear functions with domain $(0,1)$ is spanned.

Next, the joint distribution of $(U, Y)$ can be found rather easily by integration, after a change of variables to $t = \alpha(w)$:
\[
F_{U,Y}(u, y) = \int_0^u F_{Y|U}(y \mid w)\, dw
= u - \frac{1}{\nu\theta} \int_{\alpha(0)}^{\alpha(u)} y^{-t}\, dt
= u - y^{-\nu(1-\theta^-)}\, \frac{1 - y^{-\nu\theta u}}{\nu\theta \log y}
\]
for $(u, y) \in (0,1) \times [1,\infty)$. It will be more convenient to write this joint distribution as
\[
F_{U,Y}(u, y) = u\left(1 - y^{\nu\theta^-} H_2(y^\nu; \theta u)\right), \quad (6.1.1)
\]
where
\[
H_2(t; \eta) = \frac{1}{t}\, \Psi_2\!\left(\frac{1}{\eta \log t}\right) \quad (6.1.2)
\]
for $(t, \eta) \in [1,\infty) \times (0,\infty)$, and
\[
\Psi_2(t) = \frac{1 - \exp(-1/t)}{1/t} \quad (6.1.3)
\]
for $t \ge 0$ (the motivation for using the subscript "2" will become clear in the following sections).

The marginal distribution function of $Y$ can be found by
\[
F_Y(y) = \lim_{u \uparrow 1} F_{U,Y}(u, y) = 1 - y^{\nu\theta^-} H_2(y^\nu; \theta)
\]
for $y \ge 1$. But notice that
\[
t^{\eta^-} H_2(t; \eta) = t^{\eta^-}\, \frac{1}{t}\, \frac{1 - t^{-\eta}}{\eta \log t}
= \frac{1}{t}\, \frac{t^{\eta^-} - t^{-\eta^+}}{\eta \log t}
= \frac{1}{t}\, \frac{1 - t^{-|\eta|}}{|\eta| \log t}
= H_2\!\left(t; |\eta|\right) \quad (6.1.4)
\]
for $t \in [1,\infty)$ and $\eta \neq 0$ (extending $H_2$ to negative $\eta$ via Equation (6.1.2)), where $\eta^+ = \max(0, \eta)$, so that
\[
F_Y(y) = 1 - H_2\!\left(y^\nu; |\theta|\right).
\]
The quantile function is therefore
\[
Q_Y(\tau) = \left[H_2^{\leftarrow}\!\left(1 - \tau; |\theta|\right)\right]^{1/\nu}
\]
for $\tau \in (0,1)$, where $H_2^{\leftarrow}(\cdot; \eta)$ is the unique inverse function of $H_2(\cdot; \eta)$ for each $\eta > 0$.

Lastly, the copula underlying $(U, Y)$ can be found by Sklar's theorem, using Equation (2.4.1):
\[
C_{U,Y}(u, v; \theta) = F_{U,Y}\!\left(u, Q_Y(v)\right)
= u - u \left[H_2^{\leftarrow}\!\left(1 - v; |\theta|\right)\right]^{\theta^-} H_2\!\left(H_2^{\leftarrow}\!\left(1 - v; |\theta|\right); \theta u\right).
\]
The parameter $\nu$ cancels out, leaving only the $\theta$ parameter.

Practically, we need only consider $\theta > 0$, because changing the sign of $\theta$ only results in a 1-reflection of the copula. This can be shown by proving $R_1 C_{U,Y}(u, v; -\theta) = C_{U,Y}(u, v; \theta)$ for any $\theta \neq 0$. Using the formula of a 1-reflection in Equation (A.1.3), and letting $h = H_2^{\leftarrow}(1 - v; |\theta|)$, we have
\[
R_1 C_{U,Y}(u, v; -\theta) = v - C_{U,Y}(1 - u, v; -\theta)
= u - (1 - v) + (1 - u)\, h^{(-\theta)^-} H_2\!\left(h; -\theta(1 - u)\right).
\]
Now, notice that $(-\theta)^- = -\theta + \theta^-$, and use the identity in Equation (6.1.4) to obtain
\[
1 - v = H_2\!\left(h; |\theta|\right) = h^{\theta^-} H_2(h; \theta).
\]
Making these substitutions,
\[
R_1 C_{U,Y}(u, v; -\theta) = u - h^{\theta^-}\left[H_2(h; \theta) - (1 - u)\, h^{-\theta} H_2\!\left(h; -\theta(1 - u)\right)\right]
= u - u\, h^{\theta^-} H_2(h; \theta u)
= C_{U,Y}(u, v; \theta),
\]
after some routine algebra to obtain the second line, by substituting the expression for $H_2$ found in Equation (6.1.2).

Taking $\theta > 0$, then, we obtain the family conveniently represented by the 2-reflection
\[
R_2 C_{U,Y}(u, v; \theta) = u - C_{U,Y}(u, 1 - v; \theta) = u\, H_2\!\left(H_2^{\leftarrow}(v; \theta); \theta u\right) \quad (6.1.5)
\]
for $(u, v) \in (0,1)^2$.

6.1.1.2 Extension

One question to ask when confronted with a copula class is whether a full range of dependence is included (i.e., up to perfect positive and negative dependence). Empirical investigations of the copula family in Equation (6.1.5) have shown that dependence increases as $\theta$ increases, so the range of dependence can be investigated through the $\theta$ limits. It can be shown that the $\theta \downarrow 0$ limit of the copula family in Equation (6.1.5) is the independence copula, but that the $\theta \to \infty$ limit does not result in the comonotonicity copula (this is shown in Proposition 6.2.10 in a later section). Instead, we obtain
\[
\lim_{\theta \to \infty} R_2 C_{U,Y}(u, v; \theta) = u\, \Psi_2\!\left(u^{-1} \Psi_2^{\leftarrow}(v)\right) \quad (6.1.6)
\]
for $(u, v) \in (0,1)^2$, where $\Psi_2$ is defined in Equation (6.1.3).
This copula has a low Kendall's tau of approximately 0.39, so it is of interest to extend this copula so that a larger dependence is reached.

The limit copula in Equation (6.1.6) belongs to a larger class of copulas, described by Durante and Jaworski (2012), that generalizes the function $\Psi_2$. This class is described in a later section in Equation (6.2.1), but requires that $\Psi_2 : [0,\infty) \to [0,1]$ be a concave distribution function (or convex survival function). Equivalently, $\Psi_2'$ should be a decreasing density function on $[0,\infty)$.

For $\Psi_2$ defined in Equation (6.1.3), the derivative is
\[
\Psi_2'(t) = 1 - \left(\frac{1}{t} + 1\right)\exp\!\left(-\frac{1}{t}\right)
\]
for $t \ge 0$, for which $\Psi_2'(1/t)$ is the Gamma(2,1) distribution function. Since the integral of a Gamma distribution function over $(0,\infty)$ exists, $\Psi_2'(1/t)$ can be replaced by the normalized Gamma(k,1) distribution function for $k > 1$ (the scale parameter of the Gamma distribution does not affect the results). Doing so results in a family of functions $\Psi_k$ that can replace $\Psi_2$ in general, and defines a class of copulas called the Integrated Gamma Limit (IGL) family of copulas. The functions $\Psi_k$ are defined in Equation (6.1.7).

It is shown in a later section that the IGL copula family attains the comonotonicity copula in the limit as $k \to \infty$, so that the full range of dependence can now be reached.

With the limit ($\theta \to \infty$) copula in Equation (6.1.6) generalized, it is interesting to check whether an interior ($\theta \in (0,\infty)$) copula can be generalized by replacing $H_2$ in Equation (6.1.2) with "$H_k$", which replaces $\Psi_2$ with $\Psi_k$. This generalization is possible, and results in the Integrated Gamma (IG) family of copulas.

6.1.2 Definition

The IG family of bivariate two-parameter copulas, derived in Section 6.1.1, is defined here. Its boundary case – the IGL copula family – is also defined here.

The following families of functions for $k > 1$ are central to the IG and IGL copula families:
\[
\Psi_k(y) = y\, \frac{\Gamma(k) - \Gamma^*(k, y^{-1})}{\Gamma(k-1)} + \frac{\Gamma^*(k-1, y^{-1})}{\Gamma(k-1)} \quad (6.1.7)
\]
for $y \in [0,\infty)$, where $\Gamma$ and $\Gamma^*$ are (respectively) the gamma and (upper) incomplete gamma functions, and
\[
H_k(y; \eta) = \frac{1}{y}\, \Psi_k\!\left(\frac{1}{\eta \log y}\right) \quad (6.1.8)
\]
for $(y, \eta) \in [1,\infty) \times (0,\infty)$. Plots of some of these functions are shown in Figure 6.1 (for $\Psi_k$) and Figure 6.2 (for $H_k$). It is important to recognize that each $\Psi_k$ is a concave distribution function, and each $H_k(\cdot; \eta)$ is a survival function. Both are strictly monotone, so that the inverses (in the first argument) exist. Properties of these functions are examined more rigorously in Appendices E.3.1 and E.4.1.

Figure 6.1: Some generating functions $\Psi_k$ ($k = 1.5, 2, 3, 10$), described in Equation (6.1.7).

Figure 6.2: Some generating functions $H_k(\cdot; \theta)$ ($k = 1.5, 5, 20$; $\theta = 0.2, 2, 10$), described in Equation (6.1.8).

Note that the IG and IGL copula families have a non-parametric generalization by generalizing the $\Psi_k$ functions, as discussed in Section 6.2. As such, proofs related to the IG and IGL copula families, outlined in Appendices E.4 and E.3, rely on results regarding the non-parametric generalizations. These appendices therefore appear after the appendices for the non-parametric generalizations.
Theorem 6.1.1. For each $(k, \theta) \in (1,\infty) \times (0,\infty)$, the function $H_k(\cdot; \theta)$ in Equation (6.1.8) is strictly decreasing and has unique inverse $H_k^{\leftarrow}(\cdot; \theta)$, and
\[
C_{\mathrm{IG}}(u, v; k, \theta) = u + v - 1 + (1 - u)\, H_k\!\left(H_k^{\leftarrow}(1 - v; \theta); \theta(1 - u)\right) \quad (6.1.9)
\]
with $(u, v) \in (0,1)^2$ is a copula.

A proof follows from a further generalization introduced in Section 6.2 (through a stochastic representation given in Proposition 6.2.11).

Definition 6.1.1. The copula family described by Equation (6.1.9) for parameters $(k, \theta) \in (1,\infty) \times (0,\infty)$ is called the Integrated Gamma (IG) copula family, having members IG$(k, \theta)$. The reflection copula is
\[
\hat{C}_{\mathrm{IG}}(u, v; k, \theta) = u\, H_k\!\left(H_k^{\leftarrow}(v; \theta); \theta u\right).
\]

The name "Integrated Gamma" comes from the fact that $\Psi_k'(1/y)$ is proportional to the Gamma(k,1) distribution function for $y \ge 0$ (see Equation (E.3.1)), and the "IG" notation should not be confused with the "incomplete gamma" function. For formulas related to the IG copula family, see Appendix E.4.2.

The independence copula is achieved when $\theta \downarrow 0$, whereas the IGL copula family results when $\theta \to \infty$ (an investigation of concordance is left for future research).

Theorem 6.1.2. For any $k > 1$, the IG copula family described in Equation (6.1.9) has
\[
\lim_{\theta \downarrow 0} C_{\mathrm{IG}}(u, v; k, \theta) = C^{\perp}(u, v) = uv \quad (6.1.10)
\]
and
\[
\lim_{\theta \to \infty} C_{\mathrm{IG}}(u, v; k, \theta) = C_{\mathrm{IGL}}(u, v; k) \quad (6.1.11)
\]
for $(u, v) \in (0,1)^2$, where
\[
C_{\mathrm{IGL}}(u, v; k) = u + v - 1 + (1 - u)\, \Psi_k\!\left((1 - u)^{-1} \Psi_k^{\leftarrow}(1 - v)\right), \quad (6.1.12)
\]
and $\Psi_k$ is defined in Equation (6.1.7). Furthermore, $(u, v) \mapsto C_{\mathrm{IGL}}(u, v; k)$ is a copula.

This theorem is given more generally in Section 6.2, and is therefore not directly proven. The limit results are given in Proposition 6.2.10, and a stochastic representation of a generalized copula is given in Proposition 6.2.4.

Definition 6.1.2. The copula family described by Equation (6.1.12) for parameter $k > 1$ is called the Integrated Gamma Limit (IGL) copula family, having members IGL$(k)$. The reflection copula is
\[
\hat{C}_{\mathrm{IGL}}(u, v; k) = u\, \Psi_k\!\left(u^{-1} \Psi_k^{\leftarrow}(v)\right).
\]

For formulas related to the IGL copula family, see Appendix E.3.2.
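The functions $\Psi_k$ and $H_k$, and hence the IG and IGL copulas, are straightforward to evaluate numerically; a sketch using SciPy's regularized upper incomplete gamma function, with simple bisection for the inverses (the bracketing endpoints are illustrative assumptions):

```python
import numpy as np
from scipy.special import gamma, gammaincc
from scipy.optimize import brentq

def Psi_k(y, k):
    """Psi_k of Eq. (6.1.7); Gamma*(a, x) = gamma(a) * gammaincc(a, x)."""
    y = np.asarray(y, dtype=float)
    ui = lambda a: gamma(a) * gammaincc(a, 1.0 / y)       # Gamma*(a, 1/y)
    return (y * (gamma(k) - ui(k)) + ui(k - 1.0)) / gamma(k - 1.0)

def H_k(y, eta, k):
    """H_k(y; eta) of Eq. (6.1.8), a survival function in y on [1, inf)."""
    return Psi_k(1.0 / (eta * np.log(y)), k) / y

def H_k_inv(p, eta, k, hi=1e12):
    """Numerical inverse of the strictly decreasing y -> H_k(y; eta)."""
    return brentq(lambda y: H_k(y, eta, k) - p, 1.0 + 1e-12, hi)

def C_IG(u, v, k, theta):
    """IG copula, Eq. (6.1.9)."""
    y = H_k_inv(1.0 - v, theta, k)
    return u + v - 1.0 + (1.0 - u) * H_k(y, theta * (1.0 - u), k)

def C_IGL(u, v, k, hi=1e8):
    """IGL copula, Eq. (6.1.12), inverting the increasing Psi_k by bisection."""
    z = brentq(lambda s: Psi_k(s, k) - (1.0 - v), 1e-12, hi)
    return u + v - 1.0 + (1.0 - u) * Psi_k(z / (1.0 - u), k)
```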
There may exist a natural multivariate extension to the IG and IGL copula families. One such extension might be achieved by adding to the definition of $H_k(t; \eta)$ in Equation (6.1.8) by multiplying by $p > 1$ copies of the $\Psi_k$ function, to obtain
\[
H_k^{(p)}(t; \eta_1, \ldots, \eta_p) = \frac{1}{t} \prod_{j=1}^{p} \Psi_k\!\left(\frac{1}{\eta_j \log t}\right)
\]
for $t > 1$ and $(\eta_1, \ldots, \eta_p) \in (0,\infty)^p$. Taking $k = 2$, one can show that
\[
(u_1, \ldots, u_p, v) \mapsto u_1 \cdots u_p \left(1 - H_2^{(p)}\!\left(H_2^{(p)\leftarrow}(1 - v; \theta_1, \ldots, \theta_p); \theta_1 u_1, \ldots, \theta_p u_p\right)\right)
\]
with $(\theta_1, \ldots, \theta_p) \in (0,\infty)^p$ is the copula resulting from the following stochastic representation: $U_j \sim \mathrm{Unif}(0,1)$ for $j = 1, \ldots, p$ are iid, and
\[
Y \mid (U_1 = u_1, \ldots, U_p = u_p) \sim \mathrm{Par}(1,\, 1 + \theta_1 u_1 + \cdots + \theta_p u_p).
\]
However, it is questionable whether such a copula holds any value in practice, since the predictor variables are independent.

Another option to generalize the IG and IGL copula families is to consider the following stochastic representation: $U \sim \mathrm{Unif}(0,1)$, and iid variables $Y_j \mid U = u$ for $j = 1, \ldots, p$ for some $p > 1$ having standard Type I Pareto distributions with shape parameter linear in $u$. But the joint distribution of $(U, Y_1, \ldots, Y_p)$ involves a cumbersome binomial expansion. Yet another option is to specify a stochastic representation having a hierarchical form. For example, with three variables, one can take $U \sim \mathrm{Unif}(0,1)$,
\[
Y_1 \mid U = u \sim \mathrm{Par}\!\left(1,\, \nu_1(1 + \theta_1 u)\right)
\]
for $\nu_1, \theta_1 > 0$ and $u \in (0,1)$, and
\[
Y_2 \mid (Y_1 = y_1, U = u) \sim \mathrm{Par}\!\left(1,\, \nu_2\!\left(1 + \theta_2 F_{Y_1|U}(y_1 \mid u)\right)\right)
\]
for $\nu_2, \theta_2 > 0$ and $y_1 > 1$. The joint distribution of $(U, Y_1, Y_2)$ is that of $(U, Y_1)$ (derived in Section 6.1.1.1), plus another term involving elliptic integrals. Overall, these ideas are mathematically cumbersome, and possibly intractable.

6.1.3 Properties

Some properties of the IG$(k, \theta)$ and IGL$(k)$ copula families are discussed here. First, normal scores plots of simulated data from some of these copulas are explored in Figure 6.3. The following observations can be made ab initio.

• IG and IGL copulas are not permutation symmetric. This means that switching the horizontal and vertical variables results in a different copula, and can be seen by the asymmetry about the $y = x$ line.

• Overall, larger values of $\theta$ and $k$ allow for stronger dependence, particularly when both are large with $\theta \gg k$. The comonotonicity copula results as $\theta \to \infty$ faster than $k \to \infty$. Independence results as $\theta \downarrow 0$. For a plot of the dependence in terms of Kendall's tau, see Figure 6.4.

• When $\theta < k$, the copulas appear to be close to independent.

• There is no tail dependence, except in the upper tails of the IGL copulas, in which case a larger $k$ results in a larger tail dependence coefficient.

Figure 6.3: Normal scores plots of simulated data from members of the IG$(\theta, k)$ and IGL$(k)$ copula families, defined respectively in Definitions 6.1.1 and 6.1.2. The marginal distributions are standard Gaussian, and the dependence is described by an IG or IGL copula. Each panel displays 1000 randomly generated observations.

Figure 6.4: Kendall's tau dependence of the IG copula family (left panel, as a function of $\theta$ for $k = 2, 5, 10, 30$) and the IGL copula family (right panel, as a function of $k$), estimated by a random sample of 10,000 observations.

Next, notice that as $\theta$ and $k$ grow comparably, the resulting copula is not the comonotonicity copula. See the normal scores plots in Figure 6.5. The limiting copula as both $\theta$ and $k$ increase appears to have a "hollowed out" upper-left corner with the formation of a "ridge".

Figure 6.5: Normal scores plots of simulated data from the IG copula family (Definition 6.1.1) with the $\theta$ and $k$ parameters being equal (3, 10, 50, 300), indicated in the panel labels. The marginal distributions are standard Gaussian, and the dependence is described by an IG copula.

Some of these observations will be proven. From these observations, it appears that the position of $\theta$ relative to $k$ is a meaningful parameter. Additionally, the family may not show concordance in $\theta$ (an investigation is left for future research). It is left to future research to find an appropriate reparameterization – for example, possibly including the ratio $\theta/k$.

The IG and IGL families indeed achieve the comonotonicity and independence copulas as bounds.

Proposition 6.1.3. For $(u, v) \in (0,1)^2$,
\[
C_{\mathrm{IGL}}(u, v; k) \to C^+(u, v)
\]
as $k \to \infty$, where $C^+(u, v) = \min(u, v)$ is the comonotonicity copula.

A proof can be found in Appendix E.3.3. For future research, an extension of this result might hold for the IG copula family as both $\theta \to \infty$ and $k \to \infty$, with $\theta$ increasing faster so that $\theta/k \to \infty$.

The IG family has no tail dependence, but the IGL family does. An investigation of tail order is left for future research.

Proposition 6.1.4 (IG tail dependence).
The IG copula family $C_{\mathrm{IG}}$, defined in Equation (6.1.9), has zero lower tail dependence coefficient whenever $k \ge 2$, and zero upper tail dependence coefficient for all $k > 1$.

A proof can be found in Appendix E.4.3. The case with $k \in (1, 2)$ is left for future research.

Proposition 6.1.5 (IGL tail dependence). The IGL copula family $C_{\mathrm{IGL}}$, defined in Equation (6.1.12), has zero lower tail dependence coefficient and an upper tail dependence coefficient of
\[
\lambda_U^{(\mathrm{IGL})} = 1 - \frac{\left[(k-1)\, e^{-1}\right]^{k-1}}{\Gamma(k)}. \quad (6.1.13)
\]

A proof can be found in Appendix E.3.3.

Of course, the primary interest for describing the IG and IGL families is their CCEVIs.

Proposition 6.1.6. The IG copula with $\theta > 0$ and $k > 1$, defined in Definition 6.1.1, has CCEVI
\[
\xi_{\mathrm{IG}} : u \mapsto \frac{1}{1 + \theta(1 - u)}, \quad (6.1.14)
\]
which is an increasing function.

A proof can be found in Appendix E.4.3.

Proposition 6.1.7. Any IGL copula, defined in Equation (6.1.12), has a CCEVI of 0.

A proof can be found in Appendix E.3.3.

The IG copula family has an increasing CCEVI, whose reciprocal is linear in the first copula variable. Modelling an increasing CCEVI is useful for capturing an increasing level of "risk" of an extreme response given larger values of a predictor. In particular, the CCEVI of an IG copula sharply increases towards 1 near $u = 1$, meaning that occurrences of extreme values of the predictor are mostly responsible for an extreme response.

The IGL copula family has a CCEVI of 0, so that conditioning a heavy-tailed response on a predictor would result in a light-tailed conditional distribution. A predictor displaying this phenomenon in practice is very useful, since the tail heaviness of a response variable is entirely described by the predictor. An IGL copula can model such a phenomenon.

Some quantile curves of the IG and IGL copula families are shown in Figure 6.6. The zero tail dependence of the IG copula family is manifest in the flattening of its quantile curves for increasing values of the predictor variable. In contrast, the IGL copula has increasing quantile curves. The IGL copula also has upper quantile curves that are close together, and this is possibly a testament to its fully tail-lightening property "compressing" the marginal tail. Interestingly, near-linear quantile curves result from the IGL copula under exponential marginals.

As indicated throughout this section, the IG and IGL copula families have a non-parametric extension by generalizing the function $\Psi_k$. This extension is explored in the following section.

Figure 6.6: Examples of quantile curves (levels 0.80, 0.90, 0.95) of bivariate distributions described by the IGL(1.6) and IG(8.9, 3) copulas, defined respectively in Equations (6.1.12) and (6.1.9). Both copulas are chosen to have a Kendall's tau of approximately 0.3. The marginal distributions are indicated in the form "response distribution" ~ "predictor distribution".

6.2 Non-parametric Extensions

The IG copula family is defined in Section 6.1.2, and indexed by $(k, \theta) \in (1,\infty) \times (0,\infty)$. Upon deriving the IG family in Section 6.1.1, there is no knowledge of the $k$ parameter – only $k = 2$ is inadvertently used. This narrower family suffers from a poor range of dependence.
Generalizing to the IG copula family by allowing $\Psi_2$ to be any one of the $\Psi_k$ for $k > 1$, defined in Equation (6.1.7), adds the flexibility of a full range of dependence.

But the IG copula family is not the only generalization possible. In fact, replacing $\Psi_2$ in an IG copula with any function $\psi$ that describes a valid copula leads to a much larger class of copulas. The $\theta \to \infty$ version of this class, corresponding to the IGL copula family, is investigated by Durante and Jaworski (2012), and is referred to as the DJ copula class in this dissertation. The finite-$\theta$ version can be thought of as an extension to the DJ copula class, and is therefore referred to as the extended DJ (extDJ) copula class.

In this section, properties of these large copula classes are discussed to open up the possibility of describing other parametric families – especially those with non-constant CCEVI, like the IG copula family. Indeed, another such family is found, and is identified briefly.

Section 6.2.1 discusses the DJ copula class and its properties, and Section 6.2.2 discusses the extDJ copula class.

6.2.1 Durante and Jaworski Copula Class

The DJ copula class is indexed by functions referred to in this dissertation as generating functions. These generating functions are identified to have a type of "parity", and also form "clusters" due to an invariance property. This forms a discussion on defining the DJ copula class that is held in Section 6.2.1.1. Some properties of the DJ copula class are discussed in Section 6.2.1.2.

6.2.1.1 Definition

Durante and Jaworski (2012, Equation (2)) describe a class of bivariate copulas that contains the IGL family as a special case.¹ This DJ copula class can be defined by
\[
C_{\mathrm{DJ}}(u, v; \psi) = u\, \psi\!\left(u^{-1} \psi^{\leftarrow}(v)\right) \quad (6.2.1)
\]
for $(u, v) \in (0,1)^2$, with DJ generating function or DJ generator $\psi : [0,\infty) \to (0,1)$ that is concave and non-decreasing (or convex and non-increasing), with inverse $\psi^{\leftarrow}$. The DJ copula with generator $\psi$ is said to be generated by $\psi$. Formulas related to the DJ copula class can be found in Appendix E.1.2. The DJ class is investigated here in additional detail to that described by Durante and Jaworski (2012).

¹ Their motivation for describing this class is to characterize copulas that have the "invariance under univariate truncation" property. We omit details.

Notice that the IGL copula family, defined in Equation (6.1.12), has reflection copulas that belong to the DJ copula class. The reflected IGL family results by restricting the space of DJ generating functions to $\Psi_k$ for $k > 1$, defined in Equation (E.3.1).

There is a natural bipartition of the DJ generators into distribution functions (increasing) and survival functions (decreasing). In fact, these subclasses are naturally paired element-wise, connected by a vertical reflection of the generated copulas.

Proposition 6.2.1. If $\psi$ is a DJ generator that is a distribution function (resp. survival function), then $\tilde{\psi} = 1 - \psi$ is a survival function (resp. distribution function) and
\[
C_{\mathrm{DJ}}\!\left(u, v; \tilde{\psi}\right) = R_2 C_{\mathrm{DJ}}(u, v; \psi)
\]
for $(u, v) \in (0,1)^2$.

The proposition can be proved by a direct application of Equation (A.1.4), and is therefore omitted.

Not only does Proposition 6.2.1 identify that a valid distribution function and corresponding survival function generate copulas that are related by vertical reflection, but it also identifies closure of the DJ copula class under vertical reflection.

The definition of the DJ copula class in Equation (6.2.1) is not the only way the class can be defined.

Proposition 6.2.2.
Fix $\alpha \in \mathbb{R} \setminus \{0\}$. Let $\psi : [0,\infty) \to [0,1]$ be such that $(u, v) \mapsto C_{\mathrm{DJ}}(u, v; \psi)$, defined in Equation (6.2.1), is a copula. Then
\[
C_{\mathrm{DJ}}(u, v; \psi) = C^{(\alpha)}_{\mathrm{DJ}}\!\left(u, v; \psi \circ g_{-1/\alpha}\right)
\]
for all $(u, v) \in (0,1)^2$, where $g_{-1/\alpha}(t) = a t^{-1/\alpha}$ for $t \in [0,\infty)$, $a > 0$, and
\[
C^{(\alpha)}_{\mathrm{DJ}}(u, v; \psi_\alpha) = u\, \psi_\alpha\!\left(u^\alpha\, \psi_\alpha^{\leftarrow}(v)\right) \quad (6.2.2)
\]
for some $\psi_\alpha : [0,\infty) \to [0,1]$. The contrary is also true: let $\psi_\alpha : (0,\infty) \to [0,1)$ be such that $(u, v) \mapsto C^{(\alpha)}_{\mathrm{DJ}}(u, v; \psi_\alpha)$ is a copula. Then
\[
C^{(\alpha)}_{\mathrm{DJ}}(u, v; \psi_\alpha) = C_{\mathrm{DJ}}\!\left(u, v; \psi_\alpha \circ g_{-\alpha}\right)
\]
for all $(u, v) \in (0,1)^2$, where $g_{-\alpha}(t) = b t^{-\alpha}$ for $t \in [0,\infty)$, $b > 0$.

It follows by taking common values of $\alpha$ that a DJ generator does not generate a unique DJ copula, because the generators are invariant to a change in scale.

Corollary 6.2.3 (Invariance to scale changes). Suppose that $(u, v) \mapsto C^{(\alpha)}_{\mathrm{DJ}}(u, v; \psi)$, defined in Equation (6.2.2), is a copula for some real $\alpha \neq 0$ and generating function $\psi : [0,\infty) \to [0,1]$. Then if $g(t) = at$ for any $a > 0$, $C^{(\alpha)}_{\mathrm{DJ}}(u, v; \psi) = C^{(\alpha)}_{\mathrm{DJ}}(u, v; \psi \circ g)$.

This class of copulas has a convenient stochastic representation.

Proposition 6.2.4 (Stochastic representation). Let $\psi : [0,\infty) \to [0,1]$ be a concave distribution function with density $\psi'$. Consider random variables $Y$ and $Z$ with $F_Y = \psi$, and for each $y \in [0,\infty)$,
\[
F_{Z|Y}(z \mid y) = 1 - \frac{\psi'(z)}{\psi'(y)} \quad (6.2.3)
\]
for $z \ge y$. Letting $U = Y/Z$, the joint distribution function of $(U, Y)$ is
\[
F_{U,Y}(u, y) = u\, \psi\!\left(u^{-1} y\right) \quad (6.2.4)
\]
for $(u, y) \in (0,1) \times [0,\infty)$, and the corresponding copula is the DJ copula in Equation (6.2.1) generated by $\psi$. The copula describing the dependence of $(U, 1/Y)$ is the DJ copula generated by $1 - \psi$, a convex survival function.

Proofs of these results can be found in Appendix E.1.3.

6.2.1.2 Properties

A DJ copula generated by a distribution function has positive dependence, whereas a DJ copula generated by a survival function has negative dependence.

Proposition 6.2.5 (Quadrant dependence). The DJ copula generated by an increasing DJ generating function $\psi$ is positive quadrant dependent; that is,
\[
C_{\mathrm{DJ}}(u, v; \psi) \ge uv
\]
for $(u, v) \in (0,1)^2$, where $C_{\mathrm{DJ}}$ is defined in Equation (6.2.1). The DJ copula generated by a decreasing generating function $1 - \psi$ is negative quadrant dependent; that is,
\[
C_{\mathrm{DJ}}(u, v; 1 - \psi) \le uv
\]
for $(u, v) \in (0,1)^2$.

This is proved by Durante et al. (2011, Proposition 2.1).

The comonotonicity and countermonotonicity copulas belong to the DJ copula class, and the independence copula arises as a limiting case.

Proposition 6.2.6 (Comonotonicity/Countermonotonicity). The DJ copula in Equation (6.2.1) with generating function $\psi(y) = \min(y, 1)$ for $y \in [0,\infty)$ and inverse $\psi^{\leftarrow}(\tau) = \tau$ for $\tau \in [0,1]$ is the comonotonicity copula $C^+$. Likewise, the generating function $\psi(y) = \max(1 - y, 0)$ for $y \in [0,\infty)$ and inverse $\psi^{\leftarrow}(\tau) = 1 - \tau$ for $\tau \in [0,1]$ yields the countermonotonicity copula $C^-$.

This is proved by Jaworski (2015, Proposition 8).

Proposition 6.2.7 (Independence Limit). Consider the subclass of DJ copulas $(u, v) \mapsto C_{\mathrm{DJ}}(u, v; \psi_\theta)$ described in Equation (6.2.1), with generating functions indexed by $\theta \in (0,1)$ given by $\psi_\theta(y) = \min(y^\theta, 1)$ for $y \in [0,\infty)$, with inverses $\psi_\theta^{\leftarrow}(\tau) = \tau^{1/\theta}$. Then the independence copula arises as the $\theta \downarrow 0$ limit:
\[
\lim_{\theta \downarrow 0} C_{\mathrm{DJ}}(u, v; \psi_\theta) = C^{\perp}(u, v)
\]
for all $(u, v) \in (0,1)^2$.
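The stochastic representation of Proposition 6.2.4 makes DJ copulas easy to simulate from. A sketch with the illustrative generator $\psi(y) = 1 - e^{-y}$ (a concave distribution function): here $F_{Z|Y}(z \mid y) = 1 - e^{-(z-y)}$, so $Z$ is simply $Y$ plus an independent Exp(1) variable.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_dj_exponential(n):
    """Simulate from the DJ copula generated by psi(y) = 1 - exp(-y), via
    Prop. 6.2.4: Y ~ psi (i.e. Exp(1)); Z | Y = y is y + Exp(1) because
    psi'(z)/psi'(y) = exp(-(z - y)); and U = Y/Z. Returning V = psi(Y)
    puts both coordinates on the uniform scale, so (U, V) ~ C_DJ(.; psi)."""
    y = rng.exponential(size=n)          # Y ~ Exp(1), i.e. F_Y = psi
    z = y + rng.exponential(size=n)      # Z | Y = y  ~  y + Exp(1)
    u = y / z
    v = 1.0 - np.exp(-y)                 # V = psi(Y) ~ Unif(0,1)
    return u, v

u, v = simulate_dj_exponential(10_000)
print(np.mean((u < 0.5) & (v < 0.5)))   # about 0.375 > 0.25: positive
                                        # quadrant dependence (Prop. 6.2.5)
```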
Some members of this class have non-constant CCEVI. We give an example where the DJ generating function is a Weibull distribution function.

Proposition 6.2.8. Let F_β(y) = 1 − exp(−y^{1/β}) for β ≥ 1 and y ≥ 0 be a Weibull distribution function, with decreasing density. The copula family defined by C_DJ(u, v; 1 − F_β) is referred to as the DJ Weibull copula family in this dissertation. These copulas have CCEVI of 1, and the reflected copulas have CCEVI u ↦ u^{1/β}.

Proofs of these results can be found in Appendix E.1.3. It is left for future research to discover other properties of the DJ copula class.

6.2.2 An Extended Class

It is natural to ask whether an extension of the DJ copula class can be constructed, similar to how the IG copula family in Equation (6.1.9) can be thought of as an extension of the IGL copula family in Equation (6.1.12) by linking their generating functions via Equation (6.1.8).

Theorem 6.2.9. Let ψ : [0, ∞) → [0, 1] be a concave distribution function, and η > 0. Define H_ψ(·; η) : [1, ∞) → [0, 1] by generalizing Equation (6.1.8):

H_ψ(t; η) = (1/t) ψ(1/(η log t)).    (6.2.5)

Then H_ψ(·; η) is monotone decreasing, and

C_extDJ(u, v; ψ, θ) = u H_ψ(H_ψ←(v; θ); θu)    (6.2.6)

for (u, v) ∈ (0, 1)² and θ > 0 is a copula defining an extended DJ (extDJ) copula class, where H_ψ←(·; θ) is the inverse of H_ψ(·; θ).

A proof is given through a stochastic representation in Proposition 6.2.11. For formulas related to the extDJ copula class, see Appendix E.2.2.

The pair (ψ, θ) is called an extDJ generator, and an extDJ copula in Equation (6.2.6) is said to be generated by an extDJ generator. The corresponding bivariate function H_ψ : [1, ∞) × (0, ∞) → [0, 1], defined in Equation (6.2.5), is called an extDJ generating function. The extDJ copula class is semi-parametric, with the space of generators being defined by concave distribution functions ψ : [0, ∞) → [0, 1] and a real parameter θ > 0. See Appendix E.2.1 for properties of the extDJ generating functions.

Unlike the DJ generating functions, the extDJ generating functions are all survival functions, defined as the product of two survival functions. The space of extDJ generating functions is based only on those DJ generating functions that are increasing. This is so that the IG copula class, defined in Equation (6.1.9), follows as a special case of the extDJ copula class. In particular, restricting the space of extDJ generators to (Ψ_k, θ) for k > 1 and θ > 0, where Ψ_k is defined in Equation (6.1.7), generates the (parametric) reflected IG copula family. To generalize the extDJ copula class further, one might consider defining H_ψ as a product of two generic survival functions or distribution functions, but this is not considered in this dissertation.

We saw in Proposition 6.2.7 that the independence copula arises as a limiting case of the DJ copula class, but otherwise, it seems that the independence copula is not a DJ copula. The extDJ copula class effectively provides a "bridge" between the DJ copulas and the independence copula.

Proposition 6.2.10 (Limit classes). Let ψ : [0, ∞) → [0, 1] be a concave distribution function. Then for (u, v) ∈ (0, 1)²,

lim_{θ→∞} C_extDJ(u, v; ψ, θ) = C_DJ(u, v; ψ),

where C_DJ is defined in Equation (6.2.1), and

lim_{θ↓0} C_extDJ(u, v; ψ, θ) = C⊥(u, v).
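Since H_ψ(·; θ) rarely has a closed-form inverse, Equation (6.2.6) can be evaluated numerically. The following is a minimal R sketch under our reading of Equation (6.2.5), again using the illustrative ψ(y) = 1 − e^{−y} (an assumption, as before); it inverts H_ψ with uniroot() and checks both limits of Proposition 6.2.10.

# Minimal sketch: evaluate C_extDJ(u, v; psi, theta) of Equation (6.2.6)
# by numerically inverting the generating function of Equation (6.2.5).
psi <- function(y) 1 - exp(-y)                  # illustrative concave df
H   <- function(t, eta) psi(1 / (eta * log(t))) / t
Hinv <- function(v, eta)                        # H^{<-}(v; eta) via root-finding
  uniroot(function(t) H(t, eta) - v,
          lower = 1 + 1e-9, upper = 1e15, tol = 1e-12)$root

C_extDJ <- function(u, v, theta) u * H(Hinv(v, theta), theta * u)

C_extDJ(0.3, 0.3, theta = 1e-3)  # about 0.090 = uv: the independence limit
C_extDJ(0.3, 0.3, theta = 1e3)   # about 0.208, near C_DJ(0.3, 0.3) = 0.209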
Proposition 6.2.11 (Stochastic representation). Fix θ > 0 and a concave distribution function ψ : [0, ∞) → [0, 1], which has density ψ′. Consider random variables Y and U described by the (valid) distribution functions

F_Y(y) = 1 − H_ψ(y; θ)

for y ≥ 1, and

F_{U|Y}(u | y) = u D₁H_ψ(y; θu) / D₁H_ψ(y; θ)    (6.2.7)

for u ∈ (0, 1) and any y ≥ 1, where H_ψ is defined in Equation (6.2.5). The random vector (U, Y) has joint distribution

F_{U,Y}(u, y) = u − u H_ψ(y; θu)    (6.2.8)

for (u, y) ∈ (0, 1) × [1, ∞), and the random vector (U, 1/Y) has copula equal to the extDJ copula in Equation (6.2.6) generated by (ψ, θ).

The extDJ generating functions can be defined in other ways besides Equation (6.2.5), due to their invariance under composition with invertible functions.

Proposition 6.2.12 (Invariance to function composition). Let ψ : [0, ∞) → [0, 1] be a concave distribution function, η > 0, and g be a univariate strictly monotone function having a domain such that the range is [1, ∞). The extDJ copula with generating function H_ψ, defined in Equation (6.2.5), is equivalent to the extDJ copula with generating function

H̃_ψ(t; η) = H_ψ(g(t); η)

for (t; η) ∈ [1, ∞) × (0, ∞). That is, for (u, v) ∈ (0, 1)² and θ > 0,

u H_ψ(H_ψ←(v; θ); θu) = u H̃_ψ(H̃_ψ←(v; θ); θu).

Notice that composition of H_ψ(·; η) with a strictly decreasing function changes the direction of monotonicity. As such, the extDJ generating functions can be taken to be distribution functions instead of survival functions.

Proposition 6.2.13 (Quadrant dependence). All extDJ copulas are positive quadrant dependent; that is, for any increasing DJ generator ψ and θ > 0,

C_extDJ(u, v; ψ, θ) ≥ uv

for (u, v) ∈ (0, 1)², where C_extDJ is defined in Equation (6.2.6).

The IG copula with θ > 0 and k > 1 is the reflected extDJ copula generated by (Ψ_k, θ). It therefore follows that IG copulas are positive quadrant dependent.

Proofs of these results can be found in Appendix E.2.3.

Chapter 7

Application to Flood Forecasting

This chapter focusses on applying the concepts in this dissertation to the application discussed in Section 2.1. Specifically, forecasters are built to forecast the extremes of one-day-ahead discharge of the Bow River at Banff using three methodologies: linear quantile regression, local regression, and the proposed CNQR method of Chapter 5 using vines. Each forecaster uses the same set of predictors, so as to assess each method's ability to leverage the information available in the predictors about the response. Forecasts are quantile functions above a lower quantile level of τ_c = 0.9.

The data are first pre-processed in Section 7.1 by computing predictors and response variables, and are split into fitting and test sets. Building the three forecasters is discussed in Section 7.2.

It is found that the proposed method has the best score on the test set. However, the three methods perform comparably when considering uncertainty in the mean score, likely due to the near-linearity and/or lack of informativeness of the predictors. The evaluation and comparison of the forecasters is made in Section 7.3 using proper scoring rules, calibration plots and histograms, and weighted scores. The forecasts that would have been made during the Alberta flood are shown and discussed in Section 7.4.

Computational details of this application can be found in Appendix F.

7.1 Pre-processing the Data

The data presented in Section 2.1 need pre-processing before fitting forecasters. First, in Section 7.1.1, data are split into a fitting set, used for building the forecasters, and a test set, used for evaluation. The river discharge data are then deseasonalized in Section 7.1.2 to remove the within-year cycle, so that the deseasonalized discharge is approximately stationary across time. Then, effective predictors and a response can be identified and computed from the raw data, as discussed in Section 7.1.3.
The response is chosen to be the change in deseasonalized discharge, and the four predictors are lags 0 and 1 of the change in deseasonalized discharge, and lags 0 and 1 of the drop in snowpack.

7.1.1 Separating the Data

In addition to building a forecaster, we wish to be able to evaluate the forecaster. To perform an adequate evaluation, one should use data (test data) that are different from those (fitting data) used in the model-building process. We reserve approximately 25% of the overall data for the test set.

Where model selection is required (such as with vine CNQR, for estimating the response marginal distribution), the fitting data are further split into a training set (50% of the data) used for model estimation, and a validation set (25% of the data) used for model selection. To avoid "contamination" between the data sets due to autocorrelation, entire years are randomly selected for each set (except for 2013, the year of Alberta's big flood, which is reserved for the test set). The final selection of years can be seen in Table 7.1. In addition, a subset of data belonging to a "flood season" is selected, taken to be between Julian days 110 and 200.

  Test:                  1984, 1985, 1992, 2004, 2005, 2009, 2013, 2014
  Fitting (Training):    1980, 1981, 1982, 1983, 1988, 1990, 1991, 1993, 1994, 1996, 1998, 1999, 2000, 2001, 2002, 2003, 2007, 2010, 2012
  Fitting (Validation):  1986, 1987, 1989, 1995, 1997, 2006, 2008, 2011

Table 7.1: The years selected for each data set. The vine CNQR method splits the fitting data into training and validation, whereas the linear and local methods use the entire fitting data.

7.1.2 Deseasonalizing the Discharge

A glance at the overlaid time series plot in Figure 2.1 makes it clear that discharge is heavily dependent on the day of year. We consider this dependence separately from other predictors because of its cyclic trend, which cannot be adequately addressed using vine CNQR and linear quantile regression techniques (and also because it is deterministic). We therefore pre-process the data by removing this trend.

If Y_t is the discharge on date t, and δ is the mapping from date to the day of the year, then we assume

Z_t := (log Y_t − µ_{δ(t)}) / σ_{δ(t)}    (7.1.1)

is strictly stationary for some sequences {µ_i} and {σ_i > 0}, i = 1, …, 366 (that is, Z_t has the same distribution across t). Such transformed data are called deseasonalized (ds). Estimates of {µ_i} and {σ_i} are shown in Figure 7.1, and are obtained using a local method, described in detail in Appendix F.1. The resulting overlaid time series plots of the deseasonalized data Z_t can be found in Figure 7.2.

Figure 7.1: Estimated seasonal components {µ_i} and {σ_i} (respectively, location and scale) of the discharge for the selected "flood season".

Figure 7.2: Overlaid time series plots of deseasonalized discharge during the "flood season", on each of the training, validation, and test sets.
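As a minimal R sketch of the transformation in Equation (7.1.1), the following deseasonalizes a daily discharge series and forms its day-to-day change. Here `flow` (a data frame with columns `date` and `y`) is an assumed input, and the crude day-of-year means and standard deviations of log discharge stand in for the local estimates of Appendix F.1.

# Deseasonalize discharge as in Equation (7.1.1): Z_t = (log Y_t - mu_d)/sigma_d,
# where d = delta(t) is the day of year.
doy   <- as.integer(format(flow$date, "%j"))   # delta(t)
mu    <- tapply(log(flow$y), doy, mean)        # crude stand-ins for the
sigma <- tapply(log(flow$y), doy, sd)          # local estimates of Appendix F.1
z     <- (log(flow$y) - mu[as.character(doy)]) / sigma[as.character(doy)]

dz <- c(NA, diff(z))   # day-to-day change in ds discharge: the response variable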
7.1.3 Choosing Predictors and a Response

Although forecasts are to be made on the discharge of the Bow River at Banff, choosing discharge (even deseasonalized) as a response variable turns out to be a poor choice. Discharge is known to exhibit long-range dependence, meaning that there is strong serial correlation in the time series. This is intuitive, because discharges separated by two days ought to be rather similar. Such high serial correlation is visible in the autocorrelation plot of ds discharge, found in Figure 7.3. Although there are 1965 observations in the fitting set, the "effective" sample size is much smaller.

On the contrary, the difference in ds discharge displays very minimal serial correlation (also shown in Figure 7.3). This means that the effective sample size is close to the full sample size, and so more information is available on the day-to-day change in ds discharge than there is for the absolute ds discharge. Since the discharge can be recovered from the change in ds discharge, the one-day-ahead change in ds discharge is chosen to be the response variable.

Figure 7.3: Autocorrelation plots of ds discharge and change in ds discharge. Error bands represent 95% confidence bands about zero. The ds discharge is highly serially correlated, whereas the change in ds discharge has minimal serial correlation.

Figure 7.4: Calibration plots of the ds discharge marginal forecaster, and the change in ds discharge marginal forecaster. A marginal forecaster uses the marginal quantile function as a forecast. The ten curves in each panel represent the results under ten randomly chosen splits of the data into fitting and test sets.

To verify that change in ds discharge is in fact a better choice for the response than the ds discharge itself, one can compare the respective marginal forecasts, which do not use any predictors. Fit using the fitting set, the marginal forecaster of a variable is simply its empirical quantile function, which is issued as a forecast under any circumstance. Due to high serial correlation, one would expect more disparity in ds discharge between the fitting and test sets, so that the forecaster based on ds discharge is more susceptible to giving over- or under-estimates. Indeed, this is visible in the calibration plots of Figure 7.4.

Next, predictors are needed. Four predictors are chosen, based on the snowpack and discharge data shown in Figure 2.1 of Section 2.1: lags 0 and 1 of the change in deseasonalized discharge, and lags 0 and 1 of the drop in snowpack. Other predictors were considered, such as lagged ds discharge, but they were not as informative (as indicated by Kendall's tau).

Scatterplots of the response against the predictors can be seen in Figure 7.5, along with local estimates of upper quantiles. An assumption of linearity on these data so far does not appear to be a bad one, particularly because the dependence in each pair is not very strong. Certainly, a more rigorous choice of predictors would be an asset, such as a better estimate of snowmelt within the Bow River watershed, but these predictors are used regardless.

Figure 7.5: Scatterplots of predictors (change in ds discharge, lag 1 change in ds discharge, drop in snowpack, lag 1 drop in snowpack) against the response, with local estimates of upper quantile curves. Variables are unit-less, aside from lag 0 and lag 1 drop in snowpack, which are in millimeters. The fitted curves are local estimates of K = 10 upper quantiles with levels above τ_c = 0.9, taken to be τ_k = τ_c + (1 − τ_c)(2k − 1)/(2K) for k = 1, …, K. The curves are displayed only to convey trends. The curves are obtained by fitting a GPD tail with MLE over a moving window having a bandwidth of 0.5 standard deviations of the predictor.
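The calibration check behind Figure 7.4 amounts to comparing nominal exceedance probabilities with observed exceedance frequencies. A minimal R sketch, assuming fitting-set values `z_fit` and test-set values `z_test` of the candidate response are available:

# Calibration of a marginal forecaster: the forecast at level tau is the
# fitting-set empirical quantile; record how often the test set exceeds it.
taus_cal <- seq(0.90, 0.99, by = 0.01)
qhat_mar <- quantile(z_fit, probs = taus_cal, type = 1)  # empirical quantiles
prop_exceeded <- sapply(qhat_mar, function(q) mean(z_test > q))
cbind(nominal = 1 - taus_cal, observed = prop_exceeded)  # close = well calibrated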
7.2 Building Forecasters

With the data pre-processed, we are now ready to build the forecasters using the three competing techniques.

The application of linear quantile regression to the deseasonalized fitting data is routine – we use the rq() function in the R (R Core Team, 2015) package quantreg (Koenker, 2016). As for the local smoothing method, a Gaussian kernel with a bandwidth of 0.5 Mahalanobis units over the predictor space is used to obtain weights for the response. The weights are used to obtain a weighted empirical quantile function, as discussed by Daouia et al. (2011).

For the vine CNQR method, the CNQR estimator is used with K = 10 quantile levels

τ_k = ((2k − 1)/(2K))(1 − τ_c) + τ_c

for k = 1, …, K, and each g_k being the identity function.

The first stage of building a forecaster with vine CNQR is to model the univariate marginals of each variable. Although there are five variables between the four predictors and one response, these make up only two variables, up to lag: change in ds discharge, and drop in snowpack.

We consider two marginal distributions for change in ds discharge:

1. the empirical distribution, and
2. the empirical distribution with a Generalized Pareto distribution (GPD) tail,

both of which are estimated using the entire fitting data. Strictly using the empirical distribution forces a finite right-endpoint on the forecasts, whereas the GPD tail is justified by Extreme Value Theory and allows for a more reliable extrapolation into the predictive distribution's tail. See Appendix F.2 for details on fitting this marginal distribution.

From the overlaid time series plot in Figure 2.1, it appears that there is a cyclic trend in the distribution of snowmelt. This trend should be considered so that the appropriate marginal distribution can be used when computing the conditional predictors (computed with Equation (A.2.2)). However, it is not clear which components of the distribution change, as it is when deseasonalizing the discharge. We therefore use a local empirical distribution method on the fitting data to estimate the marginal distribution, as discussed by Daouia et al. (2011). See Appendix F.3 for details on fitting the marginal distribution for snowmelt.

The second stage of building a forecaster with vine CNQR is to model the dependence between the variables. A vine copula is first fit on the predictors using MLE and selected using AIC. The vine copula is extended by introducing the response, where two pairing orders are considered (see Appendix F.4). Bivariate copula models are fit for each variable in the pairing order using the sequential method introduced in Section 5.3. Estimation is done on the entire fitting set – splitting the fitting data into training and validation sets here is not necessary, since the sample size of the fitting data (1965) is not that large anyway.

Lastly, we must select one of the resulting candidate vine CNQR forecasters. Scores and calibration plots of these forecasters on the fitting set are investigated to make a selection (again, using a separate validation set for this is not necessary). See Appendix F.4 for details on modelling the dependence and for model selection. Of the five candidates considered, "Candidate 5" was selected as the best.
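As a minimal sketch of the linear competitor (the vine CNQR and local fits are more involved; see Appendices F.2–F.4), the following fits linear quantile regression at the K = 10 CNQR levels with quantreg. The data frames `dat` and `dat_test`, and the column names for the response and the four predictors, are assumptions for illustration.

library(quantreg)

# The K = 10 quantile levels above tau_c = 0.9 used throughout Chapter 7.
tau_c <- 0.9
K     <- 10
taus  <- (2 * (1:K) - 1) / (2 * K) * (1 - tau_c) + tau_c  # 0.905, ..., 0.995

# Linear quantile regression forecaster: one coefficient vector per level.
fit  <- rq(dz1 ~ dz0 + dz_lag1 + snow0 + snow_lag1, tau = taus, data = dat)
qhat <- predict(fit, newdata = dat_test)  # rows = forecasts, columns = levels

# Nothing forces monotonicity across columns, so forecasts can be inconsistent:
mean(apply(qhat, 1, function(q) any(diff(q) < 0)))  # proportion inconsistent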
To investigate the sensitivity of this method to the splitting of the data into fitting and test sets, this fitting procedure was repeated under a different random split. The vines fit to the predictors were almost identical. The fitted copulas linking the response and predictors were also similar, except for the last two vine CNQR forecasters, which had a BB8 copula instead of a t copula fit to the first predictor. Overall, this suggests that the vine CNQR method is not overly sensitive when fit on another set of data.

7.3 Evaluation

The three competing forecasters can now be evaluated using calibration plots (see Section 3.1) and a proper scoring rule (see Section 3.2.2). For comparison, the forecasters are also compared to the GPD marginal forecaster – that is, the forecaster that always issues the upper quantile function of the fitted GPD distribution of the response. This is to provide a sense of the value added by the predictors.

Calibration plots and calibration histograms of the forecasters can be found in Figure 7.6. They show that the forecasters are well-calibrated, and since the histograms all appear fairly uniform (or at least have no trend), it seems the tails of the forecasts decay adequately.

Figure 7.6: Calibration plots (above) and histograms (below) of the competing forecasters, on the test set. The dashed lines represent a perfectly calibrated forecaster. For the calibration plots, the region above the dashed line represents under-estimation (i.e., an exceedance occurs more often than suggested), and the region below represents over-estimation (i.e., an exceedance occurs less often than suggested).

Next, mean scores are estimated both for forecasts made on the response (one-day-ahead change in ds discharge) and on the original discharge scale. Regarding the choice of scoring rule, we use Equation (3.2.8) with a constant cross-quantile weight of w(τ) = 1, and transformation functions g(x) = x and g(x) = log(x) for the response and discharge, respectively. The computational simplifications mentioned in Remark 3.2.2 are also used.

The mean score estimates can be seen in Figure 7.7. Although the proposed method has the best score, at least the linear method appears to perform similarly (a Diebold–Mariano test (Diebold and Mariano, 1995) for differences between the linear and vine CNQR mean scores produces p-values of 0.14 and 0.06 for the response and discharge scores, respectively, suggesting that any difference in the mean scores of the two forecasters is insignificant at the 0.05 level), and the methods do not show much difference between scoring on the response or the discharge. Each competing forecaster shows some improvement over the marginal forecaster, though could do better if more informative predictors were used.
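A discretized version of the score is straightforward to compute: with w ≡ 1 and g the identity, the single-quantile score reduces to the familiar check loss ρ_τ (compare Equation (C.1.6) in Appendix C), averaged over the K levels. A minimal R sketch, with `qhat`, `taus`, and `K` as in the earlier sketch and `y_test` the assumed vector of realized responses:

# Mean score on the test set: average of (tau - 1{y < q})*(y - q) over the
# K quantile levels, i.e., w(tau) = 1 and g(x) = x in Equation (3.2.8).
check_loss <- function(y, q, tau) (tau - (y < q)) * (y - q)

score_mat <- sapply(1:K, function(k) check_loss(y_test, qhat[, k], taus[k]))
scores    <- rowMeans(score_mat)        # one score per forecast occasion
mean(scores)                            # smaller is better
sd(scores) / sqrt(length(scores))       # naive standard error, as in Figure 7.7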
Although both the linear and vine CNQR methods appear to be more or less equally good, the vine CNQR method still has additional properties that make it more desirable: its forecasts are consistent, and it is motivated by Extreme Value Theory in the Y marginal distribution (see Appendix F.2). The linear method gives an inconsistent forecast 11.3% of the time, and seems to be more likely to be inconsistent when the discharge is larger, as demonstrated in Figure 7.8. Note that nine other random splits of the data into fitting and test sets were tried, using the same copula models, and the results were very similar.

Figure 7.7: Estimates of the mean score for each of the three competing forecasters, computed on the test set. Error bars represent the standard error of the mean estimate, ignoring autocorrelation. Smaller scores are better.

Figure 7.8: Of the forecasts issued by the linear forecaster when the observed outcome/discharge is at least as big as the horizontal axis, the solid blue line represents the (interpolated) proportion of forecasts that are inconsistent (that is, are non-monotonic quantile functions). Inconsistency was determined using 10 quantile levels. Approximately 11.3% of forecasts are inconsistent, and inconsistencies appear more likely when the outcome is large. In addition, the raw data are shown: each instance of an inconsistent forecast is plotted with jitter around a proportion of 1, and around 0 for each consistent forecast.

Lastly, we investigate how the performance of each forecaster changes when predictors are observed to be large, as well as how performance compares between forecasters when predictors are large. Weights are used under four scenarios: larger weights are taken when we observe

• a large discharge,
• a large snowmelt,
• either a large snowmelt or a large discharge, and
• both a large snowmelt and a large discharge.

Within each scenario, we consider different weight functions to ensure that the results are not an artifact of the weight function parameters. See Appendix F.5 for the specific choice of weights.

The weighted scores and calibration plots can be seen in Figures 7.9 and 7.10, respectively. The scores typically worsen when the weights are added, but less so when only discharge is weighted. This suggests that the forecasters perform slightly worse when predictors are observed to be large.

As for the weighted calibration plots, it appears that the calibration of each forecaster does not change much when predictors are large. That is, the curves in each panel are mostly clustered and not much different from the unweighted versions. However, when both snowmelt and discharge are weighted, it appears that the vine CNQR forecaster is able to retain its calibration more so than the linear and local methods, which have calibration plots that are more "scattered". This weighting scenario is different from the others, because it puts weight over the joint tail of the predictors. The relative performance of the vine CNQR forecaster suggests that it is able to capture the tail properties better than the local and linear methods can.

Figure 7.9: Comparison of weighted scores amongst the competing forecasters, estimated using the test set. The weighting scenario is indicated in the panel labels.
Lines connect scores under common weighting schemes, and the solid horizontal lines represent the unweighted scores.

Figure 7.10: Calibration plots of the competing forecasters under different weighting schemes, estimated using the test set. The forecaster is indicated in the side panels; the weighting scenario is indicated in the upper panels. The dashed line represents a perfectly calibrated forecaster; the region above the dashed line represents under-estimation (i.e., an exceedance occurs more often than suggested), and the region below represents over-estimation (i.e., an exceedance occurs less often than suggested).

7.4 Forecasting the 2013 Flood

The days of the Alberta 2013 flood were deliberately included in the test set so that the forecasts for those days could be evaluated without being influenced by the fitting procedure. The forecasts can be seen in Figure 7.11.

Figure 7.11: Above: discharge during the start of the 2013 flood, on the indicated days (June 17–22), observed at the Bow River at Banff. Below: forecasts that would have been issued for the Bow River at Banff for the day after that which is indicated. The diamond-shaped bullet indicates the actual observed value on the next day.

Recall that, since the predictive distribution's tail is being forecast, we expect the forecasts to be well above the observed discharge. This is not the case when the discharge increased on June 20. This may be because the relatively unchanging discharge over the two previous days was not suggestive of an increase in discharge, leaving snowmelt as the sole indicator.

Notice that the linear method does not always produce consistent forecasts, as is especially true for June 21 and 22. In addition, the local method has a flat quantile function from June 20–22, corresponding to a large jump in the distribution function. This suggests that there are very few observations in the fitting set with predictors similar to what was observed on those days (because at least one observation has a weight > 0.1).

7.5 Simulation

The predictors used in the application are not very informative, and so trends (including nonlinear ones) are not very pronounced. The vine CNQR method's performance is investigated further here on two simulated data sets: one with linear quantile surfaces, and another with nonlinear quantile surfaces.

In both cases, 1000 observations on three variables are generated as training data, and 10,000 observations as test data. The vine CNQR method is compared against linear quantile regression.
For vine CNQR, the marginals are estimated by the empirical distribution function, and when linking the response, the predictor with the highest Kendall's tau is chosen first in the pairing order.

The linear simulation has data generated from the quantile function

Q_{Y|X₁,X₂}(τ | x₁, x₂) = −log(1 − τ) + τ x₁ + √(1 − τ) x₂

for x₁, x₂ > 0 and τ ∈ (0, 1), where the predictors have standard Exponential marginals linked by a t(0.3, 3) copula. The non-linear simulation has data generated from a vine with the array

  Y  X₁  X₂
     Y   X₁
         Y

and corresponding copulas

  Gumbel(3)  t(0.3, 3)
             MTCJ(2)

Figures 7.12 and 7.13 show the fitted quantile surfaces. For the linear data, the average scores on the test set were 0.14 for the linear method, and 0.15 for the nonlinear method. This means that, although the quantile surfaces are truly linear, the vine CNQR method still performs comparably to the linear method. The vine CNQR method produces surfaces that look approximately linear, but the linear method produces quantile surfaces that cross. As for the non-linear data, the average scores on the test set were 0.072 for the linear method, and 0.037 for the nonlinear method. The vine CNQR is a clear winner here, not only by its superior score, but by its visually excellent fit to the data. Also, note that the non-linearity of the fitted quantile surfaces does not appear smooth – this is due to the empirical distributions being used as the marginals.

Figure 7.12: Linear simulation data with fitted linear (left) and non-linear (right) quantile surfaces. The linear surfaces were fit using linear quantile regression, and the non-linear surfaces with vine CNQR.

Figure 7.13: Non-linear simulation data with fitted linear (left) and non-linear (right) quantile surfaces. The linear surfaces were fit using linear quantile regression (with ten quantile levels above 0.9), and the non-linear surfaces with vine CNQR (with quantile levels 0.905 and 0.995).
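The linear simulation is straightforward to reproduce: if T ~ Uniform(0, 1) independently of (X₁, X₂), then Y = Q_{Y|X₁,X₂}(T | X₁, X₂) has the stated conditional quantile function. A minimal R sketch follows; the use of the copula package for the t copula is our tooling choice (the dissertation's own scripts are described in Appendix F).

# Generate the linear simulation: Exponential predictors linked by a
# t(0.3, 3) copula, and Y drawn through its conditional quantile function.
library(copula)
set.seed(1)
n  <- 1000
uu <- rCopula(n, tCopula(0.3, df = 3))  # (U1, U2) from the t copula
x1 <- qexp(uu[, 1])                     # standard Exponential marginals
x2 <- qexp(uu[, 2])
tt <- runif(n)                          # T ~ Uniform(0, 1), independent of X
y  <- -log(1 - tt) + tt * x1 + sqrt(1 - tt) * x2
dat_sim <- data.frame(y, x1, x2)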
Chapter 8

Conclusion

The main objective of this work is to issue extreme forecasts in terms of upper quantile functions, by garnering insight from predictors. This type of forecast is useful because it communicates the forecaster's entire belief about the chance of extremes.

Specifically, a method is sought in this dissertation that ensures quantile function forecasts are non-decreasing (consistent); that allows for extrapolation into the tail of the forecast distribution in a justified way; and that is flexible enough to depart from the assumption of linear quantile surfaces, so that phenomena such as tail dependence can be modelled. This work follows up on the call for further research in copula quantile regression by Koenker (2005).

A family of proper scoring rules is developed in order to assess the goodness of a forecaster – though it still remains unclear which scoring rules are more desirable than others. This family of scoring rules can be used to define a new family of estimators, called the CNQR estimators, which choose the model that optimizes the score on the training data. Asymptotic properties of these estimators are established, and can be used for making inference. These results provide some evidence that data should be transformed to a random variable having a light- or short-tailed distribution to reduce the asymptotic variance.

The PCBN or vine copula approach to modelling quantile surfaces is very flexible, and allows for intricate aspects of quantile surfaces (such as tail dependence) to be captured. This is particularly important when forecasting extremes, because often the most extreme values occur when the predictors are large. In addition, this method always issues consistent forecasts. However, the PCBN or vine approach cannot handle a large number of predictors, because the model error compounds further down the linkage order. The method will become increasingly more flexible as more copula models are identified in the literature.

To allow for the most reliable extrapolation into the tail of a forecast, the conditional EVI should be carefully estimated. This can be done by first fitting an appropriate univariate marginal, such as a GPD, and then selecting a copula family having an appropriate CCEVI. The latter step is done indirectly through CNQR estimation. Since copulas having a non-constant CCEVI seem to be unavailable, two new parametric copula families are identified. The IG copula family is made up of copulas having a non-constant CCEVI, allowing for the conditional EVI to differ depending on the value of the predictor. The IGL copula family is made up of copulas having a zero CCEVI, allowing for a predictor to entirely describe the tail heaviness of a response variable. This finding is significant, because other parametric copula families identified in the literature do not seem to have these properties.

The work presented in this dissertation has many applications. The example of flood forecasting is given, but by extension, this work can be applied to forecast any extreme weather or natural disaster having some sort of "precursor". Other examples of application areas are finance, to forecast crashes in stock prices; insurance, to ensure an unusually high amount of claims can be paid out; and supply-chain management, to ensure that enough resources are available to handle unusually high demands from customers. In general, this work can be a big asset for decision makers who need to prepare for potential disaster.

There are many new research questions that arise from this work. The following are some interesting ones.

• Perhaps the sampling distribution of a CNQR estimator can be better approximated by considering intermediate and extreme order sequences of τ_c. Chernozhukov (2005) found this result with the linear quantile regression estimator, when regressing on high quantiles.

• What is the asymptotic distribution of the CNQR estimators when K → ∞? Can this be used to gain insight as to what transformation functions g_k are appropriate to use for estimation?

• Koenker and Park (1996) introduce an algorithm that can be used to find the minimum of a CNQR objective function. Are there ways in which this algorithm can be improved specifically for CNQR?

• The proposed IG and IGL families still have many properties that need discovery, possibly aided by a new parameterization.

• Can the DJ Weibull copula, identified in Section 6.2.1.2, be generalized to attain higher dependence?

• Can a family of proper scoring rules for the EVI be obtained from the proper scoring rules identified for upper quantile functions, in Equation (3.2.8)? This would be useful to assess the goodness of fit of a model for the conditional EVI. Perhaps a limit as τ_c ↑ 1 while the integrand expands (for example, by increasing the weight function w) will result in such a scoring rule.
• If one knows the CCEVIs of each bivariate copula in a vine, how can the CCEVI of the entire vine copula be computed?

• When forecasting the mean in a time series, the "last value carried forward" technique is a naïve method used for comparison. This method forecasts the previously observed response. Is there an extension to quantiles that can be used?

Bibliography

Amemiya, T. (1985). Advanced Econometrics. Harvard University Press, Cambridge, Massachusetts.

Bauer, A. and Czado, C. (2015). Pair-copula Bayesian networks. Journal of Computational and Graphical Statistics (to appear).

Bernard, C. and Czado, C. (2015). Conditional quantiles and tail dependence. Journal of Multivariate Analysis, 138:104–126.

Beven, K. (2012). Rainfall Runoff Modelling. Wiley-Blackwell.

Bondell, H. D., Reich, B. J., and Wang, H. (2010). Noncrossing quantile regression curve estimation. Biometrika, 97(4):825–838.

Bouyé, E. and Salmon, M. (2009). Dynamic copula quantile regressions and tail area dynamic dependence in Forex markets. The European Journal of Finance, 15(7-8):721–750.

Cannon, A. J. (2010). A flexible nonlinear modelling framework for nonstationary generalized extreme value analysis in hydroclimatology. Hydrological Processes, 24(6):673–685.

Chen, X., Koenker, R., and Xiao, Z. (2009). Copula-based nonlinear quantile autoregression. Econometrics Journal, 12(S1):S50–S67.

Chernozhukov, V. (2005). Extremal quantile regression. The Annals of Statistics, 33(2):806–839.

Chernozhukov, V., Fernández-Val, I., and Galichon, A. (2010). Quantile and probability curves without crossing. Econometrica, 78(3):1093–1125.

Christoffersen, P. F. (1998). Evaluating interval forecasts. International Economic Review, 39(4):841–862.

Coles, S. (2001). An Introduction to Statistical Modeling of Extreme Values. Springer-Verlag, London.

Cooke, E. (1906). Forecasts and verifications in western Australia. Monthly Weather Review, 34(1):23–24.

Daouia, A., Gardes, L., and Girard, S. (2013). On kernel smoothing for extremal quantile regression. Bernoulli, 19(5B):2557–2589.

Daouia, A., Gardes, L., Girard, S., and Lekina, A. (2011). Kernel estimators of extreme level curves. Test, 20(2):311–333.

Dawid, A. P. (1984). Statistical theory: the prequential approach. Journal of the Royal Statistical Society Series A, 147(2):278–292.

de Haan, L. and Ferreira, A. (2006). Extreme Value Theory: An Introduction. Springer.

Diebold, F. X. and Mariano, R. S. (1995). Comparing predictive accuracy. Journal of Business and Economic Statistics, 13:253–265.

Doll, J. and Geddes, L. (2014). Flood reflections: reporter Jayme Doll returns to Canmore's Cougar Creek. Global News.

Durante, F. and Jaworski, P. (2012). Invariant dependence structure under univariate truncation. Statistics, 46(2):263–277.

Durante, F., Jaworski, P., and Mesiar, R. (2011). Invariant dependence structures and Archimedean copulas. Statistics and Probability Letters, 81(12):1995–2003.

Environment and Climate Change Canada (2009). Flooding Events in Canada - Prairie Provinces. http://www.ec.gc.ca/eau-water/default.asp?lang=En&n=E0399791-1 [Accessed: February 4, 2016].

Fisher, R. A. and Tippett, L. H. C. (1928). Limiting forms of the frequency distribution of the largest or smallest member of a sample. Mathematical Proceedings of the Cambridge Philosophical Society, 24(2):180–190.

Gnedenko, B. (1943). Sur la distribution limite du terme maximum d'une série aléatoire. Annals of Mathematics, 44(3):423–453.
Gneiting, T., Balabdaoui, F., and Raftery, A. E. (2007). Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society Series B (Methodological), 69(2):243–268.

Gneiting, T. and Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378.

Heffernan, J. E. and Tawn, J. A. (2004). A conditional approach to modelling multivariate extreme values. Journal of the Royal Statistical Society Series B: Statistical Methodology, 66(3):497–546.

Hjort, N. L. and Pollard, D. (1993). Asymptotics for minimisers of convex processes. Technical report, University of Oslo.

Huang, M. L., Xu, X., and Tashnev, D. (2015). A weighted linear quantile regression. Journal of Statistical Computation and Simulation, 85(13):2596–2618.

Jaworski, P. (2015). Univariate conditioning of vine copulas. Journal of Multivariate Analysis, 138:89–103.

Jiang, X., Jiang, J., and Song, X. (2012). Oracle model selection for nonlinear models based on weighted composite quantile regression. Statistica Sinica, 22:1479–1506.

Joe, H. (2014). Dependence Modeling with Copulas. CRC Press.

Joe, H. and Krupskii, P. (2014). CopulaModel: Dependence Modeling with Copulas. R package version 0.6.

Knight, K. (1998). Limiting distributions for L1 regression estimators under general conditions. The Annals of Statistics, 26(2):755–770.

Koenker, R. (2005). Quantile Regression. Cambridge University Press.

Koenker, R. (2016). quantreg: Quantile Regression. R package version 5.26.

Koenker, R. and Bassett, G. (1978). Regression quantiles. Econometrica, 46(1):33–50.

Koenker, R. and Mizera, I. (2004). Penalized triograms: total variation regularization for bivariate smoothing. Journal of the Royal Statistical Society Series B, 66(1):145–163.

Koenker, R., Ng, P., and Portnoy, S. (1994). Quantile smoothing splines. Biometrika, 81(4):673–680.

Koenker, R. and Park, B. (1996). An interior point algorithm for nonlinear quantile regression. Journal of Econometrics, 71:265–283.

Kraus, D. and Czado, C. (2017). D-vine copula based quantile regression. Computational Statistics and Data Analysis, 110:1–18.

Krzysztofowicz, R. (2001). The case for probabilistic forecasting in hydrology. Journal of Hydrology, 249(1-4):2–9.

Li, W. (2016). Flooding forces residents out of homes. Vancouver Metro News, page 4.

Matheson, J. E. and Winkler, R. L. (1976). Scoring rules for continuous probability distributions. Management Science, 22(10):1087–1096.

Micovic, Z. (2005). Minimum Watershed Model Structure for Representation of Runoff Processes. PhD thesis, The University of British Columbia.

Newey, W. K. and McFadden, D. (1994). Large sample estimation and hypothesis testing. In Engle, R. F. and McFadden, D. F., editors, Handbook of Econometrics, volume 4, chapter 36, pages 2111–2245.

Noh, H., El Ghouch, A., and Van Keilegom, I. (2015). Semiparametric conditional quantile estimation through copula-based multivariate models. Journal of Business and Economic Statistics, 33(2):167–178.

R Core Team (2015). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Schepsmeier, U., Stoeber, J., Brechmann, E. C., Graeler, B., Nagler, T., and Erhardt, T. (2016). VineCopula: Statistical Inference of Vine Copulas. R package version 2.0.1.

Spokoiny, V., Wang, W., and Härdle, W. K. (2013). Local quantile regression. Journal of Statistical Planning and Inference, 143(7):1109–1129.
Wadsworth, J. L., Tawn, J. A., and Jonathan, P. (2010). Accounting for choice of measurement scale in extreme value modeling. Annals of Applied Statistics, 4(3):1558–1578.

Wang, H. J. and Li, D. (2013). Estimation of extreme conditional quantiles through power transformation. Journal of the American Statistical Association, 108(503):1062–1074.

Xie, Q. (2015). Computation and application of copula-based weighted average quantile regression. Journal of Computational and Applied Mathematics, 281:182–195.

Zhao, Z. and Xiao, Z. (2014). Efficient regressions via optimally combining quantile information. Econometric Theory, 30(6):1272–1314.

Zou, H. and Yuan, M. (2008). Composite quantile regression and the oracle model selection theory. The Annals of Statistics, 36(3):1108–1126.

Appendix A

Formulas

Some formulas related to copulas are provided here. Section A.1 provides formulas for copula reflections, and Section A.2 provides formulas for PCBN/vine copula conditional distribution functions and conditional quantile functions.

A.1 Reflection Copulas

Let C be a d-dimensional copula, for d ≥ 2. For some m ∈ {1, …, d}, we derive the formula for the reflection copula R_{1⋯m}C, which defines a reflection operator.

Notice that R_{1⋯m} = R_m R_{m−1} ⋯ R_1 (or any permutation of R_j, j = 1, …, m). The formula for R_1C (which can be extended to any R_jC) is simply

R_1C(u_1, …, u_d) = P(U_1 ≥ 1 − u_1, U_2 ≤ u_2, …, U_d ≤ u_d)
                  = C(1, u_2, …, u_d) − C(1 − u_1, u_2, …, u_d).

Applying this formula recursively results in an inclusion–exclusion type formula that can be written compactly as

R_{1⋯m}C(u_1, …, u_d) = Σ_{S ⊂ {1,…,m}} (−1)^{|S|} C_{S∪(m+1):d}(1 − u_i, i ∈ S, u_{m+1}, …, u_d),    (A.1.1)

where, for a set S ⊂ {1, …, m}, |S| denotes the cardinality of S, and C_{S∪(m+1):d}(u_i, i ∈ S, u_{m+1}, …, u_d) is a (d − m + |S|)-dimensional margin of the copula C; it is C with the i'th argument being u_i for each i ∈ S, the j'th argument being 1 for each j ∈ {1, …, m}\S, and arguments m + 1, …, d being (respectively) u_{m+1}, …, u_d. Informally, but more usefully in practice, we can write this formula as

R_{1⋯m}C(u_1, …, u_d) = C(1, …, 1, u_{m+1}, …, u_d)
  − Σ_{i=1}^{m} C(1, …, 1 − u_i (position i), …, 1, u_{m+1}, …, u_d)
  + Σ_{j<i≤m} C(1, …, 1 − u_j (position j), …, 1 − u_i (position i), …, 1, u_{m+1}, …, u_d)
  − + ⋯.    (A.1.2)

The formulas for the bivariate case are

R_1C(u, v) = v − C(1 − u, v),    (A.1.3)

R_2C(u, v) = u − C(u, 1 − v),    (A.1.4)

and

Ĉ(u, v) = u + v − 1 + C(1 − u, 1 − v).    (A.1.5)
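The bivariate reflections (A.1.3)–(A.1.5) translate directly into code. A minimal R sketch, taking any bivariate copula function C as input, illustrated here with the independence copula (for which all three reflections return uv, since independence is reflection-symmetric):

# Bivariate reflection operators of Equations (A.1.3)-(A.1.5),
# acting on a copula supplied as a function C(u, v).
R1   <- function(C) function(u, v) v - C(1 - u, v)               # reflect 1st arg
R2   <- function(C) function(u, v) u - C(u, 1 - v)               # reflect 2nd arg
Rhat <- function(C) function(u, v) u + v - 1 + C(1 - u, 1 - v)   # survival copula

Cindep <- function(u, v) u * v
R1(Cindep)(0.3, 0.6)    # 0.18 = 0.3 * 0.6
Rhat(Cindep)(0.3, 0.6)  # 0.18 as well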
A.2 PCBN Equations

For other formulas relating to vine distributions, see Joe (2014, Section 3.9.3).

Theorem A.2.1. Suppose W = (W_1, …, W_p)ᵀ is a vector of absolutely continuous random variables whose distribution function can be described by a PCBN with (p, 1, …, p − 1)ᵀ as the last column of its PCBN array (that is, W_p is introduced last and has a pairing order W_1, …, W_{p−1}). Then if p′ < p, the conditional distribution function of W_p | (W_1, …, W_{p′}) can be calculated using the recursive relation

F_{p|1:p′}(w_p | w_{1:p′}) = C_{p|p′;1:(p′−1)}(F_{p|1:(p′−1)}(w_p | w_{1:(p′−1)}) | u_{p′}),    (A.2.1)

where

u_{p′} = F_{p′|1:(p′−1)}(w_{p′} | w_{1:(p′−1)}).    (A.2.2)

The corresponding quantile function is

Q_{p|1:p′}(τ | w_{1:p′}) = Q_{p|1:(p′−1)}(C←_{p|p′;1:(p′−1)}(τ | u_{p′}) | w_{1:(p′−1)}).    (A.2.3)

Proof. Omitting function arguments where obvious, we have

F_{p|1:p′} = D₁F_{p′,p|1:(p′−1)} / f_{p′|1:(p′−1)}
           = D₁C_{p′,p;1:(p′−1)}(F_{p′|1:(p′−1)}, F_{p|1:(p′−1)}) f_{p′|1:(p′−1)} / f_{p′|1:(p′−1)}
           = C_{p|p′;1:(p′−1)}(F_{p|1:(p′−1)} | F_{p′|1:(p′−1)}),

where D₁ is the differential operator on the first argument. The quantile function can be obtained by a simple function inversion exercise.

So, for example, if there are p = 2 predictors, the quantile function of Y | (X_1, X_2) would be

Q_{Y|1:2}(τ | (x_1, x_2)) = Q_{Y|1}(C←_{Y|2;1}(τ | u_2) | x_1)
                          = Q_Y(C←_{Y|1}(C←_{Y|2;1}(τ | u_2) | u_1)),    (A.2.4)

where u_1 = F_1(x_1) and u_2 = F_{2|1}(x_2 | x_1).
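Equation (A.2.4) is a composition of inverse conditional-copula (h-inverse) functions. Below is a minimal R sketch of this composition for p = 2 predictors; the function arguments are abstract placeholders (our assumptions) standing in for whichever fitted marginals and pair-copulas are in use.

# Conditional quantile via Equation (A.2.4):
#   Q_{Y|1:2}(tau | x1, x2) = Q_Y( hinv_Y1( hinv_Y2given1(tau | u2) | u1 ) ),
# with u1 = F1(x1) and u2 = F_{2|1}(x2 | x1). All functions passed in are
# placeholders for fitted models (marginals and inverse h-functions).
pcbn_quantile <- function(tau, x1, x2, QY, F1, F2given1,
                          hinv_Y1, hinv_Y2given1) {
  u1 <- F1(x1)
  u2 <- F2given1(x2, u1)                   # conditional predictor, Eq. (A.2.2)
  QY(hinv_Y1(hinv_Y2given1(tau, u2), u1))  # nested h-function inverses
}

As a sanity check, if every pair-copula is the independence copula then each h-inverse is the identity in its first argument, and the formula collapses to Q_Y(τ), as it should.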
Appendix B

Proofs related to the Conditional EVI

This appendix contains the proofs of the general results on the conditional EVI contained in Section 2.5. Then, the CCEVIs of some bivariate copula families are computed.

B.1 Proofs of Conditional EVI Results

Proof of Proposition 2.5.1. Proof by contradiction: suppose there exists a set 𝒳₀ ⊂ 𝒳 of non-zero measure such that ξ_{Y|X}(x) > ξ_Y ≥ 0 for all x ∈ 𝒳₀, where ξ_{Y|X}(x) is the extreme value index of Y | X = x, assumed to be continuous in x for simplicity. We can further suppose that 𝒳₀ is compact, so that ξ_{Y|X}(x) ≥ ξ_Y + ε > 0 for some ε > 0.

Computing the marginal distribution from the conditional, we have, for y ∈ ℝ,

1 − F_Y(y) ≥ ∫_{𝒳₀} (1 − F_{Y|X}(y | x)) dF_X(x)
           ≥ ∫_{𝒳₀} ℓ_x(y) y^{−1/(ξ_Y+ε)} dF_X(x)
           = ℓ̃(y) y^{−1/(ξ_Y+ε)},    (B.1.1)

where ℓ_x ∈ RV₀ for all x ∈ 𝒳₀, and

ℓ̃(y) = ∫_{𝒳₀} ℓ_x(y) dF_X(x)

exists. Note that ℓ̃ is non-zero since 𝒳₀ has non-zero measure.

Since 𝒳₀ is compact, ℓ̃ is slowly varying because each ℓ_x is: by the Uniform Convergence theorem,

lim_{t→∞} ℓ̃(ty)/ℓ̃(t) = lim_{t→∞} [∫_{𝒳₀} ℓ_x(ty) dF_X(x)] / [∫_{𝒳₀} ℓ_x(t) dF_X(x)] = 1

for y ≥ y₀ > 1 such that ℓ̃ is locally integrable over [y₀, ∞).

The inequality in Equation (B.1.1) suggests that Y must be heavy-tailed, so that ξ_Y = 0 is not possible. But if ξ_Y > 0, we have 1 − F_Y(·) ∈ RV_{−1/ξ_Y}, which decays more quickly than the lower bound in Equation (B.1.1) allows. We conclude that no set 𝒳₀ ⊂ 𝒳 with non-zero measure exists.

Before proving Proposition 2.5.2, we present a lemma stating that a heavier-tailed distribution always "overtakes" a lighter-tailed distribution.

Lemma B.1.1. Suppose F₁ ∈ D(G_{ξ₁}) and F₂ ∈ D(G_{ξ₂}) for some ξ₁ > ξ₂ ≥ 0, so that F₁ is heavier tailed than F₂. Then there exists an ε > 0 such that F₁←(τ) > F₂←(τ) for all τ ∈ (1 − ε, 1).

This result is intuitive, but a proof is shown anyway.

Proof. We have

F₁←(τ) = ℓ₁(τ)(1 − τ)^{−ξ₁}

and

F₂←(τ) = ℓ₂(τ)(1 − τ)^{−ξ₂}

for all τ ∈ (1 − ε₁, 1) and some ε₁ > 0, where ℓ₁ and ℓ₂ are slowly varying at 1⁻. Their ratio is therefore

F₁←(τ)/F₂←(τ) = ℓ(τ)(1 − τ)^{−(ξ₁−ξ₂)},

where ℓ is slowly varying at 1⁻. Since ξ₁ > ξ₂, this ratio tends to infinity as τ ↑ 1, implying the existence of an ε ∈ (0, ε₁] such that F₁←(τ) > F₂←(τ) for all τ ∈ (1 − ε, 1).

Now we can prove the main proposition.

Proof of Proposition 2.5.2. Take 0 < t₁ < t₂ < 1, and define

Q₁ = Q_{Y|X}(· | x̃(t₁))  and  Q₂ = Q_{Y|X}(· | x̃(t₂)).

Since the τ-quantile curves are non-decreasing along the path x̃ for all τ ∈ (1 − ε, 1), we have Q₁(τ) ≤ Q₂(τ). By the contrapositive of Lemma B.1.1, ξ_{Y|X}(x̃(t₁)) ≤ ξ_{Y|X}(x̃(t₂)), so that the conditional EVI is non-decreasing from x̃(t₁) to x̃(t₂). Since t₁ and t₂ are arbitrary, this must hold for the entire path x̃(t), t ∈ (0, 1).

Proof of Proposition 2.5.3. Define W = T(X), and consider x ∈ 𝒳. Since T is one-to-one, the event X = x is the same as the event W = T(x), so that the conditional distributions of Y | X = x and Y | W = T(x), and their respective EVIs ξ_{Y|X}(x) and ξ_{Y|W}(T(x)), are the same. This also means that

Q_{Y|W}(τ | w) = Q_{Y|X}(τ | T←(w)) = α + wᵀβ(τ)

for each τ ∈ (1 − ε, 1), so that Y | W has linear quantile surfaces. By Wang and Li (2013, Proposition 1), it follows that ξ_{Y|W} = ξ for some constant ξ > 0 (they show this for one predictor, but their argument extends to p predictors). Therefore,

ξ_{Y|X}(x) = ξ_{Y|W}(T(x)) = ξ,

so that ξ_{Y|X} is constant as well.

Proof of Proposition 2.5.4. The proposition for ξ_Y ≤ 0 is already proven in Wadsworth et al. (2010, Theorem 1). Consider, then, ξ_Y > 0. The right-endpoint is y* = ∞, and since F″_Y exists and F′_Y > 0 in a neighbourhood of y*, by Wadsworth et al. (2010, Theorem 1),

ξ_{Ξ_λ(Y)} = ξ_Y + (λ − 1) lim_{y→∞} (1 − F_Y(y)) / (y F′_Y(y)).    (B.1.2)

Now, since ξ_Y > 0, we have 1 − F_Y ∈ RV_{−1/ξ_Y} and F′_Y ∈ RV_{−1/ξ_Y−1}. This means

(1 − F_Y(y))/F′_Y(y) ∈ RV₁,

so that we can apply L'Hôpital's rule to Equation (B.1.2) to obtain

ξ_{Ξ_λ(Y)} = ξ_Y + (λ − 1) lim_{y→∞} ((1 − F_Y)/F′_Y)′(y).    (B.1.3)

Since ξ_{Ξ_λ(Y)} ∈ ℝ, the limit under consideration must exist. By the von Mises condition, the limit in Equation (B.1.3) must be the EVI of F_Y; that is,

ξ_{Ξ_λ(Y)} = ξ_Y + (λ − 1)ξ_Y = λξ_Y,

which completes the proof.

Proof of Corollary 2.5.5. Since U_i ~ Uniform(0, 1), (1 − U_i)^{−1} follows a Type I Pareto distribution with distribution function

F_{(1−U_i)^{−1}}(x) = 1 − x^{−1}

for x > 1. This Pareto distribution has an EVI of 1. Since the CCEVI ξ_i is defined as the EVI of (1 − U_i)^{−1} | U_{−i}, by Proposition 2.5.1, ξ_i ≤ 1.

Proof of Theorem 2.5.6. Let U = (F₁(X₁), …, F_p(X_p))ᵀ and u = (F₁(x₁), …, F_p(x_p))ᵀ. Since the event X = x is equivalent to the event U = u, the EVI of the distribution of Y | X = x is the same as the EVI of the distribution of Y | U = u. We will focus on finding the latter.

Let V = F_Y(Y), and

W = 1/(1 − V),    (B.1.4)

so that W ~ Pareto(1). By Sklar's Theorem (Theorem 2.4.1), (Uᵀ, V) has distribution function given by the copula C. The EVI of W | U = u is just ξ_C(u), by definition of the CCEVI. We find ξ_{Y|X}(x) by back-transforming W to Y via

Y = F_Y←(1 − 1/W),    (B.1.5)

and using a corollary of the von Mises condition (cf. de Haan and Ferreira, 2006, Corollary 1.1.10). Note that ξ_Y ≥ 0.

We first need the quantile function of Y | U = u, which, by Equation (B.1.5), is

F←_{Y|U=u}(τ) = F_Y←(1 − 1/F←_{W|U=u}(1 − t^{−1}))

for τ ∈ (0, 1), where τ = 1 − t^{−1}. Written in terms of the (1 − t^{−1})-quantile functions for t > 1,

G_{Y|U=u}(t) = G_Y ∘ G_{W|U=u}(t),

where G(t) = F←(1 − t^{−1}) for distribution function F. (These functions are sometimes denoted by U (cf. de Haan and Ferreira, 2006), but we use G to avoid overloading U.)

Next, we need the first and second derivatives of G_{Y|U=u}. These derivatives are

G′_{Y|U=u}(t) = [G′_Y ∘ G_{W|U=u}(t)][G′_{W|U=u}(t)]

and

G″_{Y|U=u}(t) = [G″_Y ∘ G_{W|U=u}(t)][G′_{W|U=u}(t)]² + [G′_Y ∘ G_{W|U=u}(t)][G″_{W|U=u}(t)].

By the corollary to the von Mises condition under consideration (cf. de Haan and Ferreira, 2006, Corollary 1.1.10), we have

lim_{t→∞} t G″_{W|U=u}(t)/G′_{W|U=u}(t) = ξ_C(u) − 1    (B.1.6)

and

lim_{t→∞} t G″_Y(t)/G′_Y(t) = ξ_Y − 1.    (B.1.7)

Now investigate a similar limit to find ξ_{Y|X}(x):

lim_{t→∞} t G″_{Y|U=u}(t)/G′_{Y|U=u}(t)
 = lim_{t→∞} t [G″_Y ∘ G_{W|U=u}(t)][G′_{W|U=u}(t)] / [G′_Y ∘ G_{W|U=u}(t)] + lim_{t→∞} t G″_{W|U=u}(t)/G′_{W|U=u}(t)
 = lim_{t→∞} (G_{W|U=u}(t) [G″_Y ∘ G_{W|U=u}(t)]/[G′_Y ∘ G_{W|U=u}(t)]) (t G′_{W|U=u}(t)/G_{W|U=u}(t)) + ξ_C(u) − 1,    (B.1.8)

by Equation (B.1.6).
To continue, we consider two cases.

Consider first the case where ξ_C(u) > 0. Then G_{W|U=u}(t) → ∞ as t → ∞, since Y | U = u has infinite right-endpoint. Then,

lim_{t→∞} t G″_{Y|U=u}(t)/G′_{Y|U=u}(t)
 = (lim_{s→∞} s G″_Y(s)/G′_Y(s)) (lim_{t→∞} t G′_{W|U=u}(t)/G_{W|U=u}(t)) + ξ_C(u) − 1
 = (ξ_Y − 1)(lim_{t→∞} t G′_{W|U=u}(t)/G_{W|U=u}(t)) + ξ_C(u) − 1,

by Equation (B.1.7). Noting that G_{W|U=u}(t) → ∞ and t G′_{W|U=u}(t) → ∞ as t → ∞, so that L'Hôpital's rule applies (both functions are in RV_{ξ_C(u)}), we obtain

lim_{t→∞} t G″_{Y|U=u}(t)/G′_{Y|U=u}(t) = (ξ_Y − 1)(1 + lim_{t→∞} t G″_{W|U=u}(t)/G′_{W|U=u}(t)) + ξ_C(u) − 1
 = (ξ_Y − 1)(1 + ξ_C(u) − 1) + ξ_C(u) − 1
 = ξ_Y ξ_C(u) − 1,

by Equation (B.1.6). By the von Mises corollary, ξ_{Y|X}(x) = ξ_Y ξ_C(u).

Next, consider the case where ξ_C(u) < 0. Then the right-endpoint of W | U = u is

w* = lim_{t→∞} G_{W|U=u}(t) ∈ (1, ∞)

(it must be greater than 1 due to the definition of W in Equation (B.1.4), but less than infinity, meaning that the copula does not have support on the entire hypercube). Continuing with Equation (B.1.8), the first limit is

lim_{t→∞} G_{W|U=u}(t)[G″_Y ∘ G_{W|U=u}(t)]/[G′_Y ∘ G_{W|U=u}(t)] = w* G″_Y(w*)/G′_Y(w*) ∈ ℝ,

since G′_Y > 0 and G″_Y exists. To evaluate the second limit, notice that G′_{W|U=u} ∈ RV_{ξ_C(u)−1} and G_{W|U=u} ∈ RV₀, so that the second limit is the limit of a function in RV_{ξ_C(u)}:

lim_{t→∞} t G′_{W|U=u}(t)/G_{W|U=u}(t) = 0.

Continuing from Equation (B.1.8), we obtain

lim_{t→∞} t G″_{Y|U=u}(t)/G′_{Y|U=u}(t) = ξ_C(u) − 1.

By the von Mises corollary, ξ_{Y|X}(x) = ξ_C(u).

Proof of Proposition 2.4.2. Take u ∈ 𝒰, and use the limit in Equation (2.4.3) to obtain

s^{1−1/ξ_C(u)} = c*/c* = 1,

so that ξ_C(u) = 1.

B.2 Derivation of CCEVIs

The CCEVIs of the Gaussian, Frank, and Gumbel copulas are provided here.

B.2.1 Gaussian Copula

The Gaussian copula with dependence parameter ρ ∈ [−1, 1] is

C_Gauss(u, v; ρ) = Φ₂(Φ←(u), Φ←(v); ρ),

where Φ← is the probit function, and Φ₂ is the cdf of the bivariate normal distribution,

Φ₂(x, y; ρ) = ∫_{−∞}^{x} ∫_{−∞}^{y} φ₂(s, t; ρ) dt ds

and

φ₂(x, y; ρ) = (1/(2π√(1 − ρ²))) exp(−(x² − 2ρxy + y²)/(2(1 − ρ²)))

for (x, y) ∈ ℝ².

We find the CCEVI through the copula density. Letting x = Φ←(u) and y = Φ←(v), the copula density is

c_Gauss(u, v; ρ) = φ₂(x, y; ρ)/(φ(x)φ(y))
 = (1/√(1 − ρ²)) exp(−(x² − 2ρxy + y²)/(2(1 − ρ²))) exp(x²/2) exp(y²/2).

Viewing x and ρ as constants, the density can be further decomposed:

c_Gauss(u, v; ρ) ∝ exp(−(−2ρxy + y²)/(2(1 − ρ²)) + y²/2)
 = [exp(2ρxy − y² + y²(1 − ρ²))]^{1/(2(1−ρ²))}
 ∝ [exp(−(ρy − x)²)]^{1/(2(1−ρ²))}.

Next, consider the regular variation of the function

h : t ↦ exp((ρΦ←(1 − 1/t) − x)²)

in some neighbourhood of infinity. This is easier to explore through its inverse

h← : t ↦ 1/(1 − Φ((√(log t) + x)/ρ)),

also in a neighbourhood of infinity. For s > 0, we have

lim_{t→∞} h←(ts)/h←(t)
 = lim_{t→∞} [1 − Φ((√(log t) + x)/ρ)] / [1 − Φ((√(log(st)) + x)/ρ)]
 = lim_{t→∞} [φ((√(log t) + x)/ρ) (1/ρ)(1/2)(log t)^{−1/2}(1/t)] / [φ((√(log(st)) + x)/ρ)(1/ρ)(1/2)(log(st))^{−1/2}(1/t)]
 = lim_{t→∞} φ((√(log t) + x)/ρ)/φ((√(log(st)) + x)/ρ)
 = [lim_{t→∞} exp{−(√(log t) + x)² + (√(log(st)) + x)²}]^{1/(2ρ²)}
 = [s lim_{t→∞} exp{2x√(log(st)) − 2x√(log t)}]^{1/(2ρ²)}
 = s^{1/(2ρ²)},

so that h← ∈ RV_{1/(2ρ²)} at infinity. This means that h ∈ RV_{2ρ²}.

Lastly, the CCEVI of the Gaussian copula can be found by investigating the following limit for s > 0:

s^{−1} lim_{t→∞} c_Gauss(u, 1 − (st)^{−1}; ρ)/c_Gauss(u, 1 − t^{−1}; ρ)
 = s^{−1} [lim_{t→∞} h(ts)/h(t)]^{−1/(2(1−ρ²))}
 = s^{−1} [s^{2ρ²}]^{−1/(2(1−ρ²))}
 = s^{−1/(1−ρ²)}.

The CCEVI of the Gaussian copula family is therefore 1 − ρ². Since a Gaussian copula is reflection-symmetric, the reflection copula family shares the same CCEVI.
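The limit used above can also be checked numerically: the ratio of Gaussian copula densities at v = 1 − 1/(st) and v = 1 − 1/t, scaled by s^{−1}, should approach s^{−1/(1−ρ²)}. A minimal R sketch (convergence is slow, since it occurs on a log scale):

# Numerical check that the Gaussian copula CCEVI is 1 - rho^2:
# (1/s) * c(u, 1 - 1/(s*t)) / c(u, 1 - 1/t) -> s^{-1/(1 - rho^2)} as t grows.
dgauss_cop <- function(u, v, rho) {
  x <- qnorm(u); y <- qnorm(v)
  exp(-(x^2 - 2 * rho * x * y + y^2) / (2 * (1 - rho^2)) + (x^2 + y^2) / 2) /
    sqrt(1 - rho^2)
}
rho <- 0.6; u <- 0.5; s <- 2; t <- 1e7
(1 / s) * dgauss_cop(u, 1 - 1/(s * t), rho) / dgauss_cop(u, 1 - 1/t, rho)
s^(-1 / (1 - rho^2))   # theoretical limit, approximately 0.339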
B.2.2 Frank Copula

The Frank copula family is defined by

C_Frk(u, v; θ) = −(1/θ) log(1 − (1 − exp(−θu))(1 − exp(−θv))/(1 − exp(−θ)))

for θ ∈ ℝ (the independence copula is obtained when θ = 0). We find the CCEVI through the copula density.

Fixing θ and u, let x = 1 − exp(−θu) and y = 1 − exp(−θv), so that the copula density is

c_Frk(u, v; θ) ∝ (1 − exp(−θ) − xy)^{−2} exp(−θv).

Evaluated at v = 1 (and y = 1 − exp(−θ)), we have

c_Frk(u, 1; θ) ∝ (1 − exp(−θ))^{−2}(1 − x)^{−2} exp(−θ),

which is non-zero (so long as θ ≠ 0, in which case we obtain the independence copula anyway). To find the CCEVI, consider the following limit for s > 0:

s^{−1} lim_{t→∞} c_Frk(u, 1 − (st)^{−1}; θ)/c_Frk(u, 1 − t^{−1}; θ) = s^{−1} c_Frk(u, 1; θ)/c_Frk(u, 1; θ) = s^{−1}.

The CCEVI of the Frank copula family is therefore 1. This is also the CCEVI of the reflected Frank copula family, due to the family's reflection symmetry.

B.2.3 Gumbel Copula

For the Gumbel copula family, we have, for (u, v) ∈ (0, 1)² and θ > 1,

C_Gum(u, v; θ) = exp{−ψ_Gum(x, y; θ)},

where

ψ_Gum(x, y; θ) = (x^θ + y^θ)^{1/θ},

and x = −log u, y = −log v. The density is

c_Gum(u, v; θ) = (C_Gum(u, v; θ)/(uv))(D₁ψ_Gum D₂ψ_Gum − D₁₂ψ_Gum)(x, y; θ)
 = (C_Gum(u, v; θ)/(uv)) x^{θ−1} y^{θ−1} [(x^θ + y^θ)^{−2+2/θ} + (θ − 1)(x^θ + y^θ)^{−2+1/θ}]
 ∼ y^{θ−1}(x^{−θ+1} + x^{−θ}(θ − 1))

as v ↑ 1 (simply substitute v = 1 and y = 0 everywhere except the y^{θ−1} term).

To find the CCEVI, investigate the following limit for s > 0:

s^{−1} lim_{t→∞} c_Gum(u, 1 − (ts)^{−1}; θ)/c_Gum(u, 1 − t^{−1}; θ) = s^{−1} lim_{t→∞} (ts)^{−θ+1}/t^{−θ+1} = s^{−θ},

since −log(1 − (ts)^{−1}) ∼ (ts)^{−1} as t → ∞. The CCEVI is therefore 1/θ.

To find the CCEVI of the survival copula, first notice that

c_Gum(u, v; θ) ∼ C_Gum(u, v; θ)/(uv)

as v ↓ 0 (so y → ∞). Now investigate the following limit for s > 0, this time taking x = −log(1 − u):

s^{−1} lim_{t→∞} ĉ_Gum(u, 1 − (ts)^{−1}; θ)/ĉ_Gum(u, 1 − t^{−1}; θ)
 = s^{−1} lim_{t→∞} c_Gum(1 − u, (ts)^{−1}; θ)/c_Gum(1 − u, t^{−1}; θ)
 = s^{−1} lim_{t→∞} [C_Gum(1 − u, (ts)^{−1}; θ) ts]/[C_Gum(1 − u, t^{−1}; θ) t]
 = exp{lim_{t→∞} [(x^θ + (log t)^θ)^{1/θ} − (x^θ + (log(ts))^θ)^{1/θ}]}
 = exp{−log s}
 = s^{−1}.

The CCEVI of the survival Gumbel family is therefore 1.
To show the existence of Sin Equation (3.2.8), we must show that the integralIϕ :=∫ 1τc(τ − I(−∞,Qˆ(τ)) (y))(ϕ (y; τ)− ϕ(Qˆ (τ) ; τ))dτ (C.1.1)exists. As a preliminary, Condition 1 is needed to satisfy Lebesgue’s integrability condi-tion so that the integral is well-defined.It is important to note that Condition 2 implies that α < 1, since ξˆ+ ≥ 0 for all ξˆ ∈ Rwhether or not Qˆ has a finite right-endpoint. Similarly, Condition 2’ implies that α < 2.We define qˆ∗ = limτ↑1 Qˆ (τ) as the right-endpoint of the forecast distribution, whichis possibly infinity. The proof involves three cases.• Case 1: y ≥ qˆ∗.• Case 2: y < qˆ∗, and g ◦ Qˆ is bounded above by some gˆ∗ ∈ R.• Case 3: y < qˆ∗, and g ◦ Qˆ is not bounded above.First, consider Case 1. The integral Iϕ in Equation (C.1.1) becomes∫ 1τcw (τ) τ(g (y)− g ◦ Qˆ (τ))dτ.Since g ◦ Qˆ (τ) is bounded below by gˆ∗ := limτ↓τc g ◦ Qˆ (τ) and above by g (y), Iϕ exists if∫ 1τcw (τ)(g (y)− g ◦ Qˆ (τc))dτ =(g (y)− gˆ∗) ∫ 1τcw (τ) dτexists, and this holds by Condition 2. However, if limτ↑1Q (τ) ≤ qˆ∗ for all Q ∈ Q, thenobserving y ≥ qˆ∗ cannot happen. This would happen if all forecasts have an infiniteright-endpoint.Now, all the other cases have y < qˆ∗, meaning that there exists a τL ∈ (τc, 1) suchthat Qˆ (τ) > y for all τ ∈ (τL, 1). We now investigate the existence of Iϕ after replacingthe lower limit of integration with τL to obtainI2ϕ :=∫ 1τL(1− τ)(ϕ(Qˆ (τ) ; τ)− ϕ (y; τ))dτ =∫ 1τLw (τ) (1− τ)(g ◦ Qˆ (τ)− g (y))dτ,(C.1.2)whose existence implies the existence of the original integral Iϕ.106APPENDIX C. PROOFS RELATED TO PROPER SCORING RULESNext, consider Case 2. Since g ◦ Qˆ (τ) ≤ gˆ∗, the integral I2ϕ exists if(gˆ∗ − g (y)) ∫ 1τLw (τ) (1− τ) dτdoes, happening whenever w ∈ RVα at 1− with α < 2 – true by Condition 2’.Finally, consider Case 3. Since g ◦ Qˆ (τ)→∞ as τ ↑ 1, the integral I2ϕ exists if∫ 1τLw (τ) (1− τ) g ◦ Qˆ (τ) dτdoes. By Condition 2’, g ◦ Qˆ is the quantile function of a distribution in some domain ofattraction (because the existence of Ξˆg is assumed), and its extreme value index is ξˆ ≥ 0since g ◦ Qˆ is not bounded above. It follows that g ◦ Qˆ ∈ RVξˆ+ . Also by Condition 2’,τ 7→ w (τ) (1− τ) g ◦ Qˆ (τ) ∈ RVα−1+ξˆ+at 1− where α− 1 + ξˆ+ < 1, and this guarantees the existence of the integral.Proof of Theorem 3.2.1 – Propriety. To show propriety (Definition 3.2.1), the goal is toshow that for any fixed Qˆ ∈ Qˆ and Q ∈ Q, the expected difference of scores satisfies∫ (S (y,Q;ϕ)− S(y, Qˆ;ϕ))dF (y) ≤ 0, (C.1.3)where F = Q←.We first must show that the expected score exists, so that the left-hand side of Equa-tion (C.1.3) exists. We consider the existence of the expectation∫S(y, Qˆ;ϕ)dF (y) =∫ 1τcEY(sτ(Y, Qˆ (τ) ;ϕ))dτ, (C.1.4)and begin by examining the expected single-quantile score (note that Qˆ is not randomhere, and the subscript Y is used for indicating this). Denotingτˆ (τ) = P(Y < Qˆ (τ))(C.1.5)andµs(τ ; Qˆ (τ))= EY(sτ(Y, Qˆ (τ) ;ϕ)),107APPENDIX C. PROOFS RELATED TO PROPER SCORING RULESfrom the single-quantile score in Equation (3.2.5), we haveµs(τ ; Qˆ (τ))=∫w (τ) ρτ(g (y)− g ◦ Qˆ (τ))dF (y) (C.1.6)= w (τ)∫ [(τ − I(−∞,Qˆ(τ)) (y))g (y)−(τ − I(−∞,Qˆ(τ)) (y))g ◦ Qˆ (τ)]dF (y)= w (τ)(τµ− ∫ Qˆ(τ)−∞g (y) dF (y))− g ◦ Qˆ (τ) (τ − τˆ (τ))= w (τ) τ(µ− µˆL (τ))− ϕ(Qˆ (τ) ; τ) (τ − τˆ (τ)) , (C.1.7)where µ = E(g (Y ))andµˆL (τ) =1τ∫ Qˆ(τ)−∞g (y) dF (y) .Denoting qˆ∗ = limτ↑1 Qˆ (τ) and q∗ = limτ↑1Q (τ), notice thatlimτ↑1µˆL (τ) =µ, qˆ∗ ≥ q∗;µ− ε, qˆ∗ < q∗ (C.1.8)for some ε > 0. 
Condition 3 then guarantees the existence of µ and µˆL (τ), which extendsto the existence of µs(τ ; Qˆ (τ)).We need only show existence of the integral on the right-hand side of Equation (C.1.4)by using the expression in Equation (C.1.7).First, investigate the existence of∫ 1τcw (τ) τ(µ− µˆL (τ))dτ.Since τ(µ− µˆL (τ)) ≤ µ for τ ∈ (τc, 1), we have∫ 1τcw (τ) τ(µ− µˆL (τ))dτ ≤ µ∫ 1τcw (τ) dτ,which exists since w ∈ RVα at 1− where α < 1 by Condition 2.108APPENDIX C. PROOFS RELATED TO PROPER SCORING RULESSecondly, we show that ∫ 1τcϕ(Qˆ (τ) ; τ) (τ − τˆ (τ)) dτexists by showing that ∫ 1τcw (τ)∣∣∣g ◦ Qˆ (τ)∣∣∣ ∣∣τ − τˆ (τ)∣∣ dτ (C.1.9)exists. Since 0 ≤ ∣∣τ − τˆ (τ)∣∣ ≤ 1 for all τ ∈ (τc, 1), and ∫ 1τc w (τ) ∣∣∣g ◦ Qˆ (τ)∣∣∣ dτ exists dueto Condition 2, it follows that the desired integral in Equation (C.1.9) exists.Now that we have shown existence of the expected score, we turn to showing pro-priety. We adopt a more rigorous notation to aid the discussion. For k = 1, . . . , K,take τk:K = τc + (1− τc) (2k − 1)/(2K), and τK = (τ1:K , . . . , τK:K). Denote qK =(Q (τ1:K) , . . . , Q (τK:K))and qˆK =(Qˆ (τ1:K) , . . . , Qˆ (τK:K)). We know from Gneitingand Raftery (2007, Corollary 1) that Sc—defined in Equation (3.2.7)—is a proper scor-ing rule, so that ∫ (Sc (y, qK ;ϕ, τK)− Sc (y, qˆK ;ϕ, τK))dF (y) ≤ 0. (C.1.10)Also, we have convergence of the Riemann sum:limK→∞(Sc (y, qK ;ϕ, τ k)− Sc (y, qˆK ;ϕ, τ k))= S (y,Q;ϕ)− S(y, Qˆ;ϕ)uniformly over y ∈ Y . Therefore, by the Dominated Convergence Theorem, it followsthat ∫ (S (y,Q;ϕ)− S(y, Qˆ;ϕ))dF (y)= limK→∞∫ (Sc (y, qK ;ϕ, τ k)− Sc (y, qˆK ;ϕ, τ k))dF (y) . (C.1.11)The desired inequality in Equation (C.1.3) follows by the bound in Equation (C.1.10).C.2 Tail BalanceHere is a proof of the expected score formula in Equation (3.2.9).109APPENDIX C. PROOFS RELATED TO PROPER SCORING RULESProof of Lemma 3.2.2. We seek the expectation ofsτ(Y,Q (τ) ;ϕ)= w (τ)(τ − I(−∞,Q(τ)) (Y ))(g (Y )− g (Q (τ)))for almost all τ ∈ (τc, 1) when Y has distribution function F , and Q = F←. SinceE(I(−∞,Q(τ)) (Y ))= τfor almost all τ ∈ (τc, 1),E(sτ(Y,Q (τ) ;ϕ))= w (τ)E((τ − I(−∞,Q(τ)) (Y ))g (Y ))= w (τ)[τµ−∫ Q(τ)−∞g (y) dF (y)], (C.2.1)where µ = E(g (Y )). Now, notice that the distribution function of Y | Y < Q (τ) is justthe original distribution function of Y adjusted by τ :FY |Y <Q(τ) (y) =P (Y ≤ y, Y < Q (τ))P (Y < Q (τ)) = F (y)τ ,y < Q (τ), for almost all τ ∈ (τc, 1). The conditional expectation µL (τ) = E(g (Y ) | Y < Q (τ))is thusµL (τ) =1τ∫ Q(τ)−∞g (y) dF (y) .Factoring τ out of the square brackets in Equation (C.2.1) gives the desired result.Here is a proof of the tail stability theorem.Proof of Proposition 3.2.3. Let u > 0. To find the index of variation, we must compute,110APPENDIX C. PROOFS RELATED TO PROPER SCORING RULESusing Equation (3.2.9),limt→∞E(s1−(ut)−1 (Y,Q;ϕ))E(s1−t−1 (Y,Q;ϕ)) = limt→∞w(1− 1ut) (1− 1ut) (µ− µL(1− 1ut))w(1− 1t) (1− 1t) (µ− µL(1− 1t)) (C.2.2)= uα limt→∞µ− µL(1− 1ut)µ− µL(1− 1t) (C.2.3)= uα−1 limt→∞µ′L(1− 1ut)µ′L(1− 1t) , (C.2.4)where µ′L is the derivative of µL (the last equality holds due to L’Hôpital’s rule). Notethat the expectation exists because ξ < 1. We seek an asymptotic form for µ′L.Computed using the quantile function, the lower mean can be writtenµL (τ) =1τ∫ τ0g(Q (p))dp. (C.2.5)Differentiating, we obtainµ′L (τ) =1τg ◦Q (τ)− 1τ 2∫ τ0g ◦Q (p) dp,which exists due to the continuity of Q. But as τ ↑ 1,∫ τ0g ◦Q (p) dp→ µandg ◦Q (τ)→∞, ξ ≥ 0;g∗, ξ < 0,for right-endpoint g∗ ∈ R. 
For the case where ξ < 0, we have µ′L (τ) → g∗ − µ as τ ↑ 1,so that the limit in Equation (C.2.5) becomes uα−1. For the case where ξ ≥ 0, we haveµ′L (τ) ∼1τg ◦Q (τ)as τ ↑ 1, so that the limit in Equation (C.2.2) becomesuα−1 limt→∞(1− 1t)g ◦Q (1− 1ut)(1− 1ut)g ◦Q (1− 1t) = uα−1 limt→∞g(Q(1− 1ut))g(Q(1− 1t)) .111APPENDIX C. PROOFS RELATED TO PROPER SCORING RULESThe result follows after recognizing that g ◦ Q ∈ RVξ at 1−, since F ◦ g← has extremevalue index ξ ≥ 0.Finally, we derive the equations of the expected single-quantile score found in Table 3.1by computing µL.GaussianComputing the lower conditional mean using the density, we haveµL (τ) =1τ∫ µ+σΦ←(τ)−∞yφ(y − µσ)dy,and through a change of variables to z = µ+ σy, we obtainµL (τ) =1τ∫ Φ←(τ)−∞(σz + µ)φ (z) dz.The integration is made easier by noting that −yφ (y) = φ′ (y). Separating the integralinto two terms and integrating, we obtainµL (τ) =1τ[σ(−φ (Φ← (τ)))+ µτ]= µ− σφ(Φ← (τ))τ,leading to the desired result.ExponentialThe quantile function under consideration is QY (τ) = µ log(11−τ). Now,µL (τ) =1τ∫ QY (τ)0yfY (y) dy.Using integration by parts (and integrating fY to its survival function F¯Y (y) = 1 −FY (y) = exp(−y/µ) for y ≥ 0), we obtainµL (τ) =1τ[−QY (τ) (1− τ) +∫ QY (τ)0F¯Y (y) dy].112APPENDIX C. PROOFS RELATED TO PROPER SCORING RULESThe integration is made easier by noting that − 1µF¯Y (y) = F¯′Y (y). Continuing,µL (τ) =1τ[−QY (τ) (1− τ) + µτ]= µ− µ1− ττlog(11− τ),leading to the desired result after recognizing that E (Y ) = µ.Type I ParetoThe quantile function under consideration is QY (τ) = σ (1− τ)−ξ, and the density isfY (y) =1ξσ1/ξy−1/ξ−1 for y > σ and 0 < ξ < 1. Now,µL (τ) =1τ∫ QY (τ)σyfY (y) dy.=1τ1ξσ1/ξ∫ QY (τ)σy−1/ξ dy=1τ1ξσ1/ξ(ξξ − 1)[(QY (τ))1−1/ξ − σ1−1/ξ]=1τ1ξ − 1σ[(1− τ)1−ξ − 1].The mean of a Pareto distribution exists when 0 < ξ < 1 and is µ = E (Y ) = σ/ (1− ξ),so thenµ− µL (τ) = σ1− ξ[1 +1τ((1− τ)1−ξ − 1)]=σ1− ξ1τ(τ + (1− τ)1−ξ − 1)=σ1− ξ1− ττ((1− τ)−ξ − 1).113Appendix DProofs related to CNQR AsymptoticsThis section contains proofs of consistency and asymptotic normality of the CNQRfamily of estimators. The estimator is defined in Equation (5.2.1), and the asymptoticresults are outlined in Section 5.2.2.D.1 PreliminariesStandard estimating equations theory cannot be used to prove the asymptotic results.Such method requires the existence of the gradient at the objective function’s minimum,but this is not always the case with CNQR estimation. Figure D.1 displays an exampleof an objective function whose minimum is at a cusp. Instead, the smoothness of theobjective function’s limit is used, as described in Newey and McFadden (1994, Chapter7).The cusp in the objective function is caused by the cusp in the asymmetric absolutedeviation function, ρτ . Each term in the CNQR objective function (say the i’th obser-vation and the k’th quantile) contributes a cusp at the parameter(s) whose τk-quantilesurface passes through the i’th observation. This is because the argument of ρτ is zero –the location of which ρτ is non-differentiable. For there not to be a cusp at the minimum,none of the K quantile surfaces must intersect an observation. It is not obvious whetherthis is possible, but we at least know that sometimes there is a cusp at the minimum.However, this cusp may not cause numerical issues when minimizing. The objectivefunction seems to quickly become approximately smooth when the sample size is moder-ate, and when the number of regression quantiles K is more than a few. 
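To see the cusp mechanism in isolation, consider the simplest possible special case: a single quantile level and a constant tau-quantile theta, so that the quantile "surface" passes through observation y_i exactly when theta = y_i. The R sketch below is a toy, not the full CNQR objective of Equation (5.2.1); it shows the resulting piecewise-linear sample objective, with a kink at every observation.

```r
# Toy illustration of the cusps: the sample objective for a constant
# tau-quantile theta is piecewise linear, with a kink wherever theta
# coincides with an observation.
rho_tau <- function(x, tau) (tau - (x < 0)) * x  # asymmetric absolute deviation

set.seed(1)
y <- rexp(10)                         # ten toy observations
theta <- seq(0, 3, length.out = 500)
obj <- sapply(theta, function(th) mean(rho_tau(y - th, tau = 0.8)))
plot(theta, obj, type = "l", xlab = expression(theta), ylab = "objective")
rug(y)                                # kinks line up with the rug marks
```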
In fact, it seems that integrating over quantiles leads to a smooth objective function. A possible reason for this is that, for any parameter value, there could exist a quantile surface that passes through a particular observation (unless that observation lies below the $\tau_c$-quantile surface), in which case there is a cusp for that quantile surface and that observation. For most "nice" models, there is at most one such surface for one value of the parameter, and certainly not an uncountable number of such surfaces. This quantile surface has zero measure in the integral, so it does not influence the resulting integral.

[Figure D.1 (plot omitted; axes: $\theta$ versus objective function): An example of an objective function of the CNQR estimator in Equation (5.2.1) with two quantile levels $\tau_1 = 0.6$ and $\tau_2 = 0.8$, and transformation functions $g_1 = g_2$ equal to the identity function. Ten bivariate observations are generated from a Gumbel copula with parameter $\theta_0 = 5$, and given standard Exponential marginals. The objective function is created with these data having known marginals but unknown copula parameter. A region of the parameter space containing the minimum is displayed.]

Before proving the asymptotic results, some useful identities are introduced in the following lemma, which extends a result of Knight (1998).

Lemma D.1.1. For $x, y \in \mathbb{R}$ and the asymmetric absolute deviation function $\rho_\tau$ defined in Equation (3.2.6) for $\tau \in (0, 1)$,
$$\rho_\tau(x - y) - \rho_\tau(x) = y \left( I_{(-\infty,0)}(x) - \tau \right) + I(x, y), \qquad \text{(D.1.1)}$$
where $I(x, y)$ can be written in the following ways:
$$I(x, y) = \int_0^y \left[ I_{(-\infty,s)}(x) - I_{(-\infty,0)}(x) \right] ds, \qquad \text{(D.1.2)}$$
$$I(x, y) = \begin{cases} \int_0^y I_{[0,s]}(x)\, ds, & y \ge 0; \\[4pt] \int_y^0 I_{[s,0]}(x)\, ds, & y < 0, \end{cases} \qquad \text{(D.1.3)}$$
and
$$I(x, y) = \int_0^{|y|} I_{[0,s]}\big( \operatorname{sign}(y)\, x \big)\, ds. \qquad \text{(D.1.4)}$$
Note that the lower limit of integration need not be less than the upper limit.

Proof of Lemma D.1.1. The result with Equation (D.1.2) is shown first. Rewrite the left-hand side of Equation (D.1.1):
$$\rho_\tau(x - y) - \rho_\tau(x) = \left( \tau - I_{(-\infty,0)}(x - y) \right)(x - y) - \left( \tau - I_{(-\infty,0)}(x) \right) x = -\tau y + (y - x)\, I_{(0,\infty)}(y - x) + x\, I_{(-\infty,0)}(x).$$
Rewrite the right-hand side of Equation (D.1.1) using the form of $I(x, y)$ in Equation (D.1.2):
$$y \left( I_{(-\infty,0)}(x) - \tau \right) + \int_0^y \left[ I_{(-\infty,s)}(x) - I_{(-\infty,0)}(x) \right] ds = -\tau y + I_{(-\infty,0)}(x)\, y + \int_0^y I_{(x,\infty)}(s)\, ds - y\, I_{(-\infty,0)}(x) = -\tau y + \int_0^y I_{(x,\infty)}(s)\, ds.$$
The above integral can be further evaluated:
$$\int_0^y I_{(x,\infty)}(s)\, ds = \int_0^x I_{(x,\infty)}(s)\, ds + \int_x^y I_{(x,\infty)}(s)\, ds = \int_0^x I_{(x,\infty)}(s)\, ds + \int_0^{y-x} I_{(0,\infty)}(s)\, ds = x\, I_{(-\infty,0)}(x) + (y - x)\, I_{(0,\infty)}(y - x),$$
so that the identity holds with $I(x, y)$ in Equation (D.1.2).

$I_{(-\infty,s)}(x)$   $I_{(-\infty,0)}(x)$   Difference   Happens when ($s > 0$)   Happens when ($s < 0$)
1                      1                      0            $x < 0$                  $x < s$
0                      1                      $-1$         impossible               $s \le x \le 0$
1                      0                      1            $0 \le x \le s$          impossible
0                      0                      0            $x > s$                  $x > 0$

Table D.1: Indicator functions in the formulation of $Z_n(\theta)$; "Difference" denotes $I_{(-\infty,s)}(x) - I_{(-\infty,0)}(x)$.

To show Equation (D.1.3), rewrite the integrand of Equation (D.1.2) by considering the cases outlined in Table D.1. For $s \in \mathbb{R}$,
$$I_{(-\infty,s)}(x) - I_{(-\infty,0)}(x) = \begin{cases} I_{[0,s]}(x), & s \ge 0; \\ -I_{[s,0]}(x), & s < 0, \end{cases}$$
so that Equation (D.1.3) follows.

To show Equation (D.1.4), reduce the right-hand side of Equation (D.1.4) considering the two cases $y < 0$ and $y \ge 0$. The $y \ge 0$ case is trivial, since it directly matches the $y \ge 0$ case in Equation (D.1.3). Consider, then, $y < 0$. Changing the variable of integration in the right-hand side of Equation (D.1.4) to $t = -s$, we have
$$\int_0^{|y|} I_{[0,s]}\big( \operatorname{sign}(y)\, x \big)\, ds = -\int_0^y I_{[0,-t]}(-x)\, dt = \int_y^0 I_{[t,0]}(x)\, dt,$$
which matches the $y < 0$ case in Equation (D.1.3).
Therefore, Equation (D.1.4) holds.The identity in Lemma D.1.1 using Equation (D.1.2) follows from Knight (1998), andis often used in the literature when proving asymptotic properties of quantile regressionestimators (cf. Chernozhukov, 2005; Jiang et al., 2012; Noh et al., 2015; Zhao and Xiao,2014; Huang et al., 2015; Xie, 2015). The other two identities are also useful, yet seemto be absent from the literature.D.2 ProofsTo prove the asymptotic results of the CNQR family of estimators, it is useful towork with an alternative objective function that has a zero minimum. Recall by As-sumption 5.2.2 that QˆY |X(·|x;θ0) = QY |X (·|x); letεki = gk (Yi)− gk(QY |X(τk|X i))= gk (Yi)− qki (θ0) , (D.2.1)and∆qki (θ) = qki (θ)− qki (θ0) (D.2.2)for each i = 1, . . . , n and k = 1, . . . , K, where θ ∈ Θ, and qki is defined in Equa-tion (5.2.3). The i subscript is dropped to refer to the generic random vector(X>, Y).The conditional distribution function of εk given the predictors is thenFεk|X(t | x) = Fgk(Y )|X (qk (θ0) + t | x) (D.2.3)for t ∈ R and all x ∈X , with densityfεk|X(t | x) = fgk(Y )|X (qk (θ0) + t | x) . (D.2.4)The new objective function that we seek is117APPENDIX D. PROOFS RELATED TO CNQR ASYMPTOTICSZn (θ) =1nK∑k=1n∑i=1[ρτk(εki −∆qki (θ))− ρτk (εki)] (D.2.5)for θ ∈ Θ. Notice that the corresponding CNQR estimator1 is equivalent to θˆn =arg minθ Zn (θ). This objective function is convenient because Zn (θ0) = 0, since ∆qki (θ0) =0. Using the identity in Lemma D.1.1, this objective function can be written asZn (θ) =1nK∑k=1n∑i=1[∆qki (θ)(I(−∞,0) (εki)− τk)+ I (εki,∆qki (θ))] . (D.2.6)To show consistency of the CNQR family of estimators, the following regularity con-ditions are needed.Condition 1. Consider a CNQR estimator in Equation (5.2.1).(a) The parameter space Θ is a compact set.(b) There exist real numbers 1 > 0 and 2 > 0 such that, for at least one k = 1, . . . , K,the setXk ={x ∈X : fgk(Y )|X(qki (θ0) + t|x) ≥ 1 ∀ t ∈ (−2, 2)} (D.2.7)has P (X ∈Xk) = pXk > 0.(c) For all θ ∈ Θ\ {θ0} and at least one k = 1, . . . , K,P(∣∣∆qk (θ)∣∣ > 0 ∣∣∣X ∈Xk) > 0.Condition 1(a) can be relaxed in practice, as is discussed in Newey and McFadden(1994, Section 2.6). Condition 1(b) allows us to restrict focus of the predictor spaceX to regions Xk having non-zero measure, over which the response has a non-zero(conditional) probability of taking values near the τk-quantile. Condition 1(c) ensuresthat the parameter θ0 is identifiable – that is, the quantile surface QˆY |X(τk | ·;θ)isdifferent from QY |X(τk | ·)over some subset of Xk having non-zero measure, unlessθ = θ0.To prove consistency, we make use of the fact that a CNQR objective function has aunique minimum at the true parameter value in expectation, as stated in the following1In practice, estimation cannot be carried out with this alternative objective function Zn, whichcontains unknown quantities. But this objective function is useful for theoretical considerations.118APPENDIX D. PROOFS RELATED TO CNQR ASYMPTOTICSlemma.Lemma D.2.1. Suppose Assumptions 5.2.1 and 5.2.2 hold, and consider non-decreasingfunctions gk : R → R, k = 1, . . . , K. Under the regularity conditions listed in Condi-tion 1, E(Zn (θ))is uniquely minimized at θ = θ0, where Zn (θ) is defined in Equa-tion (D.2.5). That is, E(Zn (θ))> E(Zn (θ0))for all θ ∈ Θ\ {θ0}.Proof. The main idea of this proof is from Chen et al. (2009).First, we will decompose E(Zn (θ))for all θ ∈ Θ. 
Using the formulation of Zn (θ) inEquation (D.2.6) together with the law of total expectation,E(Zn (θ))=K∑k=1E(E([∆qk (θ)(I(−∞,0) (εk)− τk)+ I (εk,∆qk (θ))] ∣∣∣X)) ,due to the iid assumption of the data, where εk and ∆qk are defined respectively inEquations (D.2.1) and (D.2.2). SinceE(I(−∞,0) (εk)− τk |X)= P(Y < QY |X(τk |X) |X)− τk = 0,we need only focus on the integral term I (εk,∆qk (θ)). Using the two-case form of theintegral term in Equation (D.1.3), we have E(Zn (θ))=∑k E (T1k + T2k), where “terms1 and 2” areT1k = I(0,∞)(∆qk (θ))E[∫ ∆qk(θ)0I(0,s) (εk) ds |X]andT2k = I(−∞,0)(∆qk (θ))E[∫ 0∆qk(θ)I(s,0) (εk) ds |X].It is now clear that E(Zn (θ)) ≥ 0 since T1k ≥ 0 and T2k ≥ 0 almost surely. Also,notice that Zn (θ0) = 0 since ∆qk (θ0) = 0. Thus, to show that E(Zn (θ))is uniquelyminimized at θ = θ0 is to show that E(Zn (θ))is positive for all θ ∈ Θ\ {θ0}.Now, take θ ∈ Θ\ {θ0} and k ∈ {1, . . . , K}. Restricting the predictor space to Xkdefined in Equation (D.2.7), we haveT1k ≥ IXk (X) T1k= IXk (X) I(0,∞)(∆qk (θ)) ∫ ∞−∞[∫ ∆qk(θ)0I(0,s) (ε) ds]fgk(Y )|X(qk (θ0) + ε |X)dε= IXk (X) I(0,∞)(∆qk (θ)) ∫ ∆qk(θ)0∫ s0fgk(Y )|X(qk (θ0) + ε |X)dε ds,119APPENDIX D. PROOFS RELATED TO CNQR ASYMPTOTICSusing the density of εk conditional on the predictors in Equation (D.2.4). By Condi-tion 1(b), confiningX ∈Xk ensures that the random function ε 7→ fgk(Y )|X(qk (θ0) + ε |X)is almost surely larger than 1 over (−2, 2). So we integrate over the smaller functionε 7→ 1I(−2,2) (ε) instead, to obtainT1k ≥ IXk (X) I(0,∞)(∆qk (θ)) 12[∆qk (θ)]2, ∆qk (θ) < 2;22, ∆qk (θ) ≥ 2almost surely.Similarly, we obtainT2k ≥ IXk (X) I(−∞,0)(∆qk (θ)) 12[∆qk (θ)]2, ∆qk (θ) > −2;22, ∆qk (θ) ≤ −2almost surely.Putting T1k and T2k together, we obtainE (T1k + T2k)≥12E(IXk (X)[[∆qk (θ)]2 I(−2,2) (∆qk (θ))+ 22I[2,∞) (∣∣∆qk (θ)∣∣)])=12E([∆qk (θ)]2 I(−2,2) (∆qk (θ))+ 22I[2,∞) (∣∣∆qk (θ)∣∣) ∣∣∣X ∈Xk)P (X ∈Xk)=12pXkE([∆qk (θ)]2 ∣∣∣X ∈Xk, ∣∣∆qk (θ)∣∣ < 2)P (∣∣∆qk (θ)∣∣ < 2 ∣∣∣X ∈Xk)+1222pXkP(∣∣∆qk (θ)∣∣ ≥ 2 ∣∣∣X ∈Xk) .In the case thatP(∣∣∆qk (θ)∣∣ ≥ 2 ∣∣∣X ∈Xk) > 0,we have E (T1k + T2k) > 0 and so E(Zn (θ))> 0. In the case thatP(∣∣∆qk (θ)∣∣ ≥ 2 ∣∣∣X ∈Xk) = 0,120APPENDIX D. PROOFS RELATED TO CNQR ASYMPTOTICSso that most of the mass of ∆qk (θ) is near 0, we haveE([∆qk (θ)]2 ∣∣∣X ∈Xk, ∣∣∆qk (θ)∣∣ < 2) > 0under Condition 1(c), and so E(Zn (θ))> 0.We are now equipped to prove consistency.Proof of Theorem 5.2.1 (Consistency). Following Chen et al. (2009), Newey and McFad-den (1994, Theorem 2.1) is used to prove consistency, which follows if the following fourconditions hold.Firstly, E(Zn (θ))must be uniquely minimized at θ = θ0. This holds by Lemma D.2.1.Secondly, Θ must be compact, which follows by Condition 1(a).Thirdly, E(Zn (θ))must be continuous, which follows by the continuity of θ 7→QY |X(τ |x;θ) for any τ ∈ (0, 1), and x ∈X .Lastly, we require Zn (θ)→p E(Zn (θ))uniformly over θ ∈ Θ; that is,supθ∈Θ∣∣∣Zn (θ)− E (Zn (θ))∣∣∣→p 0.This follows by a uniform version of the weak law of large numbers (cf. Amemiya, 1985,Theorem 4.2.1), since the loss functions ρτ1 , . . . , ρτK , defined in Equation (3.2.6), are“well behaved”. In particular, since Θ is compact, the data are iid, Zn is continuous(conditional on the observations), and the loss functions are finite over Θ, then uniformconvergence in probability holds.Next, we turn our attention to proving the asymptotic distribution of the CNQRestimator. 
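Before doing so, the consistency result can be made concrete with a toy simulation. The sketch below is a deliberately simplified CNQR instance, not the general estimator of Equation (5.2.1): a bivariate Gaussian copula with known Uniform(0,1) marginals, no weights or transformation functions, and four quantile levels, for which the conditional quantile function is available in closed form.

```r
# Toy CNQR fit: Gaussian copula, known uniform margins, four quantile levels.
qcond_gauss <- function(tau, u, rho)            # tau-quantile of V given U = u
  pnorm(rho * qnorm(u) + sqrt(1 - rho^2) * qnorm(tau))
rho_tau <- function(x, tau) (tau - (x < 0)) * x # asymmetric absolute deviation
cnqr_obj <- function(rho, u, v, taus)           # composite objective over taus
  mean(sapply(taus, function(tk) rho_tau(v - qcond_gauss(tk, u, rho), tk)))

set.seed(1)
n <- 2000; rho0 <- 0.6
z <- matrix(rnorm(2 * n), ncol = 2) %*% chol(matrix(c(1, rho0, rho0, 1), 2))
u <- pnorm(z[, 1]); v <- pnorm(z[, 2])          # a sample from the copula
optimize(cnqr_obj, c(0, 0.95), u = u, v = v,
         taus = c(0.6, 0.7, 0.8, 0.9))$minimum  # should be close to rho0
```

Repeating this over increasing n shows the minimizer concentrating around rho0, in line with Theorem 5.2.1.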
The idea behind the proof comes from Koenker (2005); Zou and Yuan (2008).The proof of the asymptotic distribution of the CNQR estimator relies on the followingregularity conditions.Condition 2. Consider a CNQR estimator in Equation (5.2.1).(a) For each i = 1, 2, . . .,(i) the gradient q˙ki, defined in Equation (5.2.4), exists and is continuous in aneighbourhood of θ0 and is non-zero at θ0; and(ii) the Hessian q¨ki, defined in Equation (5.2.5), exists and is continuous in aneighbourhood of θ0.121APPENDIX D. PROOFS RELATED TO CNQR ASYMPTOTICS(b) For each k = 1, . . . , K and k′ = 1, . . . , K, each entry of D0kk′ defined in Equa-tion (5.2.7) is finite.(c) For each k = 1, . . . , K, each entry of q¨k (θ0) defined in Equation (5.2.5) is almostsurely finite.(d) For each k = 1, . . . , K, each entry of D1k defined in Equation (5.2.9) is finite.Condition 2(a) allows for the use of Taylor’s theorem with a second degree Taylorpolynomial, and implies that θ0 must be in the interior of Θ. Conditions 2(b)–(d)prevent divergence of the objective function as the sample size increases.Proof of Theorem 5.2.2 (Asymptotic Normality). We begin by identifying an objectivefunction whose arg min is√n(θˆn − θ0). Since θˆn minimizes Zn (θ), defined in Equa-tion (D.2.5), it follows that√n(θˆn − θ0)minimizes Z∗n : δ 7→ nZn(θ0 + δ/√n), whichevaluates toZ∗n (δ) =K∑k=1n∑i=1∆qki(θ0 + δ√n)(I(−∞,0) (εki)− τk)+ I(εki,∆qki(θ0 +δ√n))(D.2.8)for δ in some neighbourhood of 0, where the function I takes one of the forms described inLemma D.1.1. The strategy is to find the limiting distribution of√n(θˆn − θ0)throughthe limiting objective function, Z∗∞ : δ 7→ limn→∞ Z∗n (δ).Consider the second order Taylor polynomial of δ 7→ ∆qki(θ0 + δ/√n)about 0 foreach i and k. By Taylor’s theorem,∆qki(θ0 + δ/√n)=1√n[q˙ki (θ0)]>δ +12nδ>[q¨ki (θ0)]δ + op(n−1)=: ∆q∗nki (δ)as n → ∞ for δ in some neighbourhood of 0, where q˙ki and q¨ki are respectively de-fined in Equations (5.2.4) and (5.2.5). Note that ∆q∗nki (δ) = Op(n−1/2)as n → ∞.This expansion exists due to Condition 2(a). Substituting this expansion into Z∗n (δ) inEquation (D.2.8), we obtainZ∗n (δ) =√nδ>Z¯1n +12Z¯2n (δ) +K∑k=1Z¯3nk (δ) + op (1)122APPENDIX D. PROOFS RELATED TO CNQR ASYMPTOTICSas n→∞, whereZ¯1n =1nn∑i=1K∑k=1(I(−∞,0) (εki)− τk)q˙ki (θ0) ,Z¯2n (δ) =1nn∑i=1K∑k=1(I(−∞,0) (εki)− τk)δ>q¨ki (θ0) δ=:1nn∑i=1Z2ni (δ) ,andZ¯3nk (δ) =1nn∑i=1Z3nki (δ) ,whereZ3nki (δ) = nI(εki,∆q∗nki (δ)).The asymptotic behaviour of each√nZ¯1n, Z¯2n (δ), and Z¯3nk (δ) are now sought.Firstly, it is shown that√nZ¯1n converges to a normal distribution as n → ∞. Forits expectation, sinceE[I(−∞,0) (εki)]= E[E(I(−∞,0) (εki) |X)]= E[P(Y < QY |X(τk |X) |X)]= τk,we have E(Z¯1n)= 0. For the covariance matrix, notice thatI(−∞,0) (εki) ∼ Bernoulli (τk)andI(−∞,0) (εki) I(−∞,0) (εk′i) ∼ Bernoulli(min (τk, τk′))123APPENDIX D. PROOFS RELATED TO CNQR ASYMPTOTICSfor each k,k′. Now,Cov(√nZ¯1n)=1nn∑i=1∑k,k′E(Cov(I(−∞,0) (εki) q˙ki (θ0) , I(−∞,0) (εk′i) q˙k′i (θ0) |X i))=1nn∑i=1∑k,k′E(Cov(I(−∞,0) (εki) , I(−∞,0) (εk′i) |X i) [q˙ki (θ0)] [q˙k′i (θ0)]>)=∑k,k′min (τk, τk′)(1−max (τk, τk′))D0kk′=: Σ1where D0kk′ is defined in Equation (5.2.7), because τkτk′ = min (τk, τk′) max (τk, τk′). ByCondition 2(a)(i), Σ1 is nonzero because each q˙ki (θ0) is nonzero, and by Condition 2(b),Σ1 has finite entries. Thus√nZ¯1n →d Z1∞ by the Central Limit theorem, where Z1∞ ∼N (0,Σ1).Secondly, it is shown that Z¯2n (δ) →p 0 as n → ∞. We have E(Z¯2n (δ))= 0 for thesame reason that E(Z¯1n)= 0. 
By the law of total covariance, and since the data areiid, we haveVar(Z¯2n (δ))=1n2n∑i=1∑k,k′E(Cov(I(−∞,0) (εki) , I(−∞,0) (εki) |X i)δ>q¨ki (θ0) δδ>q¨k′i (θ0) δ)=1n∑k,k′min (τk, τk′)(1−max (τk, τk′))E(δ>q¨k (θ0) δδ>q¨k′ (θ0) δ)→0as n→∞, by Condition 2(c). Thus Z¯2n (δ)→p 0 by the weak law of large numbers.Lastly, it is shown that Z¯3nk (δ) converges in probability to the limit of its expectationas n → ∞ for each k = 1, . . . , K. To find the expectation of Z3nki (δ), use the form ofthe integral I found in Equation (D.1.2), and consider a change of variables to t = √ns,to obtainZ3nki (δ) =∫ [q˙ki(θ0)]>δ+op(1)0t[I(−∞,t/√n) (εki)− I(−∞,0) (εki)]/(t√n)dt.124APPENDIX D. PROOFS RELATED TO CNQR ASYMPTOTICSBy the law of total expectation, and since the data are iid,E(Z3nki (δ))=E∫ δ>q˙k(θ0)+op(1)0t[Fgk(Y )|X(qk (θ0) + t/√n |X)− Fgk(Y )|X (qk (θ0) |X)]t/√ndt=E(∫ δ>q˙k(θ0)+op(1)0t[fgk(Y )|X(qk (θ0) |X)+ op (1)]dt)=E(12[δ>q˙k (θ0) + op (1)]2 [fgk(Y )|X(qk (θ0) |X)+ op (1)])=12δ>D1kδ +R3nkas n→∞, where the remainder R3nk = o (1), and D1k is defined in Equation (5.2.9). Tofind the variance of Z3nki (δ), use the form of the integral I found in Equation (D.1.4),so thatZ3nki (δ) = n∫ |∆q∗nki(δ)|0I[0,s](sign(∆q∗nki (δ))εki)ds= n(∣∣∆q∗nki (δ)∣∣− sign (∆q∗nki (δ)) εki) I[0,|∆q∗nki(δ)|] (sign (∆q∗nki (δ)) εki)= n(∆q∗nki (δ)− εki)Wnki,as n→∞, whereWnki = sign(∆q∗nki (δ))I[min(0,∆q∗nki(δ)),max(0,∆q∗nki(δ))] (εki)=1, 0 ≤ εki < ∆q∗nki (δ) ;0, εki < min(0,∆q∗nki (δ))or εki > max(0,∆q∗nki (δ));−1, ∆q∗nki (δ) < εki ≤ 0,which can only take two values with non-zero probability, depending on the sign of∆q∗nki (δ). The probabilities are defined by 1 − pnk := P(Wnk = 0 |X)(recall that a125APPENDIX D. PROOFS RELATED TO CNQR ASYMPTOTICSdropped i subscript refers to the generic random vector(X>, Y)), which evaluate topnk =P(min(0,∆q∗nk (δ))< εk ≤ max(0,∆q∗nk (δ)) |X)=Fgk(Y )|X(max(0,∆q∗nk (δ))+ qk (θ0) |X)− Fgk(Y )|X(min(0,∆q∗nk (δ))+ qk (θ0) |X)=sign(∆q∗nk (δ)) Fgk(Y )|X (∆q∗nk (δ) + qk (θ0) |X)− Fgk(Y )|X (qk (θ0) |X)∆q∗nk (δ)∆q∗nk (δ)=[fgk(Y )|X(qk (θ0) |X)+ op (1)] ∣∣∆q∗nk (δ)∣∣as n → ∞, since ∆q∗nk (δ) →p 0. Note that pnk = Op(n−1/2)as n → ∞. Now, let usfind the variance of Z3nki (δ) in a few stages. First,Var(Z3nk (δ))= n2Var((∆q∗nk (δ)− εk)Wnk)≤ n2E((∆q∗nk (δ)− εk)2W 2nk)= n2E(E((∆q∗nk (δ)− εk)2W 2nk∣∣X)) ,based on the definition of variance in terms of expectation, and based on the law of totalexpectation (double expectation). Next, the inner expectation can be decomposed bythe law of total expectation on the partition Wnk = 0 and Wnk 6= 0:Var(Z3nk (δ)) ≤ n2E(0 (1− pnk) + E((∆q∗nk (δ)− εk)2 ∣∣X,Wnk 6= 0) pnk)≤ n2E(E(∆q∗nk (δ)2∣∣X,Wnk 6= 0) pnk) ,where the second line holds because(∆q∗nk (δ)− εk)2 ≤ ∆q∗nk (δ)2 almost surely whenWnk 6= 0. But now the inner expectation is the expectation of a constant, because∆q∗nk (δ) is fixed when X is fixed. This means thatVar(Z3nk (δ)) ≤ n2E(∆q∗nk (δ)2 pnk) = n2O (n−3/2) = O (n1/2)as n→∞. Now it can be shown that the variance of the average vanishes as n→∞:Var(Z¯3nk (δ))=1nVar(Z3nk (δ))= O(n−1/2).126APPENDIX D. 
PROOFS RELATED TO CNQR ASYMPTOTICSNow, by Chebyshev’s inequality, for any  > 0,P(∣∣∣∣Z¯3nk (δ)− 12δ>D1kδ −R3nk∣∣∣∣ > )≤ 12Var(Z¯3nk (δ))→ 0as n → ∞, so that Z¯3nk (δ) →p δ>D1kδ/2 (because the remainder R3nk → 0), and∑k Z¯3nk (δ)→p δ>D1δ/2.Combining the asymptotic behaviour of√nZ¯1n, Z¯2n (δ), and Z¯3nk (δ), we obtainZ∗n (δ)→d Z∗∞ (δ) = δ>Z1∞ +12δ>D1δ.It follows that the arg min of these objective functions also converge in distribution (cf.Hjort and Pollard, 1993). Condition 2(a)(i) ensures that D1 is non-zero and is thereforepositive definite, and this ensures that Z∗∞ (δ) is uniquely minimized at −D−11 Z1∞, whichexists because of Condition 2(d). Finally,√n(θˆn − θ0)= arg minδZ∗n (δ)→d arg minδZ∗∞ (δ)= −D−11 Z1∞.127Appendix EProofs related to the New CopulaFamiliesThis appendix first provides details and proofs of the DJ and extDJ copula families,introduced in Section 6.2. Then, details and proofs of the IGL and IG copula families– special cases of the DJ and extDJ classes, respectively – are provided, introduced inSection 6.1. Note that the DJ and extDJ families are discussed here first, because theseresults follow through to the IGL and IG copula families. This order is reversed inChapter 6 because it makes more sense to introduce the various families by generalizingthe IGL and IG families to the DJ and extDJ families.Note that Durante and Jaworski (2012), the founders of the DJ copula class, do notdiscuss much of the properties of the DJ class, nor do they discuss much about parametriccases of this copula class.Each section in this appendix is laid out in the same way. First, properties of thegenerating function of the copula class are outlined. Second, some key formulas relatedto the copula class are provided. Third, proofs of the properties outlined in Chapter 6are given.E.1 DJ Copula ClassE.1.1 Generating Function PropertiesNot much can be said about the DJ generating functions outside of their definition.However, the following family of functions plays an important role in the IG and othercopula families.Definition E.1.1. Let ψ : [0,∞)→ [0, 1] be a concave distribution function or a convex128APPENDIX E. PROOFS RELATED TO THE NEW COPULA FAMILIESsurvival function with derivative ψ′. The kappa function of ψ is defined asκψ (t) = ψ (t)− tψ′ (t) (E.1.1)for t > 0.Proposition E.1.1. Let ψ : [0,∞) → [0, 1] be a concave distribution function (resp.convex survival function) with derivative ψ′. Then κψ, defined in Equation (E.1.1), is adistribution function (resp. survival function). If ψ′′ exists, thenκ′ψ (t) = −tψ′′ (t)for t > 0.Proof. When ψ′′ exists,κ′ψ (t) = ψ′ (t)− ψ′ (t)− tψ′′ (t) = −tψ′′ (t)for t > 0. The monotonicity of κψ follows directly from the concavity of ψ, because ψ′′does not change sign.It remains to evaluate κψ at the endpoints of the support. First, notice that theexistence of the upper integral of ψ′ to infinity implies ψ′ (t) = o(1/t)as t→∞, so thattψ′ (t)→ 0. Now,limt→∞κψ (t) = limt→∞ψ (t)− limt→∞tψ′ (t) = limt→∞ψ (t) .Similarly, the existence of the lower integral of ψ′ from zero implies that ψ′ (t) = o(1/t)as t ↓ 0, so that tψ′ (t)→ 0. Now,limt↓0κψ (t) = limt↓0ψ (t)− limt↓0tψ′ (t) = limt↓0ψ (t) .Since κψ is monotone and evaluates to the same values as ψ at the endpoints of thesupport (either 0 or 1), then κψ is a distribution function when ψ is a distributionfunction, and κψ is a survival function when ψ is a survival function.E.1.2 FormulasThe DJ copula generated by ψ has the following distributional quantities, for (u, v) ∈(0, 1)2.129APPENDIX E. 
PROOFS RELATED TO THE NEW COPULA FAMILIES• Whenever ψ′ exists, the conditional distributions areCDJ,2|1(v|u;ψ) = κψ (u−1ψ← (v)) , (E.1.2)where κψ is defined in Equation (E.1.1), andCDJ,1|2(u|v;ψ) = ψ′ (u−1ψ← (v))ψ′(ψ← (v)) . (E.1.3)• Whenever ψ′′ exists, the copula density iscDJ (u, v;ψ) = −ψ← (v)ψ′′(u−1ψ← (v))u2ψ′(ψ← (v)) . (E.1.4)E.1.3 Proofs of PropertiesProof of Proposition 6.2.2 (equivalence class). Consider ψ so that (u, v) 7→ CDJ (u, v;ψ)is a copula. Then (ψ ◦ g−1/α)←= g←−1/α ◦ ψ←(the same direction of monotonicity of ψ and g−1/α is not required), and for (u, v) ∈(0, 1)2,C(α)DJ(u, v;ψ ◦ g−1/α)= uψ(g−1/α(uαg←−1/α(ψ← (v))))= uψ(u−1g−1/α(g←−1/α(ψ← (v))))= uψ(u−1ψ← (v))= CDJ (u, v;ψ) .It follows that the class of copulas defined in Equation (6.2.2) is not empty, since theoriginal DJ class of copulas is non-empty.For the converse, suppose ψα is such that (u, v) 7→ C(α)DJ (u, v;ψα) is a copula. Then130APPENDIX E. PROOFS RELATED TO THE NEW COPULA FAMILIESfor (u, v) ∈ (0, 1)2,CDJ (u, v;ψα ◦ g−α) = uψα(g−α(u−1g←−α(ψ←α (v))))= uψα(uαg−α(g←−α(ψ←α (v))))= uψα(uαψ←α (v))= C(α)DJ (u, v;ψα) .Proof of Corollary 6.2.3 (scale invariance). Consider the function g−α (t) = t−α for t ∈(0,∞), and take (u, v) ∈ (0, 1)2. By Proposition 6.2.2,C(α)DJ (u, v;ψ ◦ g) = CDJ (u, v;ψ ◦ g ◦ g−α) .Now takeg−1/α = (g ◦ g−α)← = g←−α ◦ g←(the direction of monotonicity of g and g−α is not required), defined byg−1/α (t) =(ta)−1/α= a˜t−1/αfor t ∈ (0,∞), where a˜ = a1/α > 0. By Proposition 6.2.2,CDJ (u, v;ψ ◦ g ◦ g−α) = C(α)DJ(u, v;ψ ◦ g ◦ g−α ◦ g−1/α)= C(α)DJ(u, v;ψ ◦ g ◦ g−α ◦ g←−α ◦ g←)= C(α)DJ (u, v;ψ) ,so that C(α)DJ (u, v;ψ ◦ g) = C(α)DJ (u, v;ψ).Proof of Proposition 6.2.4 (stochastic representation). First, notice that the distributionfunction of Z conditional on Y , described in Equation (6.2.4), is in fact a distributionfunction, since ψ′ is decreasing due to the concavity of ψ. Now, the joint distribution131APPENDIX E. PROOFS RELATED TO THE NEW COPULA FAMILIESfunction of (U, Y ) in Equation (6.2.4) can be shown using Equation (6.2.3):P (U ≤ u, Y ≤ y) =∫ y0P(YZ≤ u∣∣∣ Y = w)ψ′ (w) dw=∫ y0P (Z ≥ u−1w | Y = w)ψ′ (w) dw=∫ y0ψ′(u−1w)dw= uψ(u−1y)for (u, y) ∈ (0, 1)× [0,∞). It follows that U has a Unif (0, 1) distribution:P (U ≤ u) = limy→∞P (U ≤ u, Y ≤ y) = u limy→∞ψ(u−1y)= ufor u ∈ (0, 1). The copula linking (U, Y ) is thereforeCUY (u, v) = P(U ≤ u, Y ≤ ψ← (v)) = uψ (u−1ψ← (v)) ,which is the DJ copula with generating function ψ.Now, using the distribution function of Y˜ = 1/Y given byFY˜ (y) = 1− ψ(1y),the copula linking(U, Y˜)isCUY˜ (u, v) = P(U ≤ u, FY˜(Y˜)≤ v)= P (U ≤ u, 1− ψ (Y ) ≤ v) = R2CUY (u, v) ,by definition of the R2 reflection in Definition 2.4.1. Since CUY is the DJ copula withgenerating function ψ, by Proposition 6.2.1, CUY˜ is the DJ copula with the convex survivalfunction 1− ψ.Proof of Proposition 6.2.7 (independence copula). Fix (u, v) ∈ (0, 1)2. Using Equation (6.2.1),the copula family evaluates toCDJ (u, v;ψθ) = umin(u−θv, 1)132APPENDIX E. PROOFS RELATED TO THE NEW COPULA FAMILIESfor θ ∈ (0, 1). Since u−θv < 1 for θ in some neighbourhood of 0,limθ↓0CDJ (u, v;ψθ) = limθ↓0u1−θv = uv = C⊥ (u, v) .Proof of Proposition 6.2.8 (DJ Weibull CCEVI). The DJ Weibull copula family is de-scribed for β ≥ 1 byCDJW (u, v; β) = uvu−1/β ,and results when taking the DJ generating function ψβ : t 7→ exp(−t1/β)in Equa-tion (6.2.1), which is strictly decreasing with inverse function ψ←β (w) = (− logw)β forw ∈ (0, 1) and derivativeψ′β (t) = −1βt1/β−1ψβ (t) .To find the CCEVI, fix u ∈ (0, 1) and let x = u−1/β > 1. 
We use the copula density,which iscDJW (u, v; β) =xvx−1 (β − 1− x log v)β,where v ∈ (0, 1). Sincelimv↑1cDJW (u, v; β) = x(β − 1)β∈ (0,∞) ,the CCEVI is 1, by Proposition 2.4.2.To find the CCEVI of the reflection copula, we use the conditional distribution of theDJ family found in Equation (E.1.2),CˆDJW,2|1(v|u; β) = 1− κβ ((1− u)−1 (− log (1− v))β) ,whereκβ (t) = ψβ (t)− tψ′β (t)= exp(−t1/β)(1 +1βt1/β)∼ exp(−t1/β)133APPENDIX E. PROOFS RELATED TO THE NEW COPULA FAMILIESas t→∞. Now, consider the following limit for any s > 0:limt→∞1− CˆDJW,2|1(1− (ts)−1 |u; β)1− CˆDJW,2|1(1− t−1|u; β) = limt→∞κβ((1− u)−1 (log (ts))β)κβ((1− u)−1 (log t)β)=[limt→∞exp(− log (ts))exp (− log t)](1−u)−1/β= s−1/(1−u)1/β,so that the CCEVI of the reflection copula is u 7→ (1− u)1/β, which is decreasing.134APPENDIX E. PROOFS RELATED TO THE NEW COPULA FAMILIESE.2 extDJ Copula ClassE.2.1 Generating Function PropertiesThis section examines properties of the extDJ generating functions.Proposition E.2.1. Let ψ : [0,∞) → [0, 1] be a distribution function. The functionHψ : [1,∞)× (0,∞)→ [0, 1], defined in Equation (6.2.5), has the following properties:1. If ψ is differentiable, then Hψ has derivativesD1Hψ (t; η) = − 1t2[ψ(1η log t)+1η (log t)2ψ′(1η log t)](E.2.1)andD2Hψ (t; η) = − 1tη2 log tψ′(1η log t). (E.2.2)If ψ is twice-differentiable, thenD12Hψ (t; η) =1(tη log t)2[(1 + log t)ψ′(1η log t)+1η log tψ′′(1η log t)](E.2.3)for (t, η) ∈ (1,∞)× (0,∞).2. For any η > 0, Hψ (·; η) is strictly decreasing, so that H←ψ (·; η) is the unique inversefunction.3. For any η > 0, Hψ (·; η) is a survival function over [1,∞).4. For t > 1, we haveHψ (t; 0) := limη↓0Hψ (t; η) =1t(E.2.4)andlimη→∞Hψ (t; η) = 0. (E.2.5)5. For any t > 1, Hψ (t; ·) is non-increasing, and is strictly decreasing if ψ is strictlyincreasing. It follows that the functions t 7→ Hψ (t; η) are bounded above by t 7→ 1/tfor each η > 0.135APPENDIX E. PROOFS RELATED TO THE NEW COPULA FAMILIES6. If ψ is differentiable, thenHψ (t; η) + ηD2Hψ (t; η) =1tκψ(1η log t)(E.2.6)= Hκψ (t; η) (E.2.7)for (t, η) ∈ (1,∞)× (0,∞), where κψ is defined in Equation (E.1.1).7. If ψ is strictly increasing, then1η logH←ψ (v; η)= ψ←(vH←ψ (v; η))(E.2.8)for η > 0 and v ∈ (0, 1).Proof. Property 1 follows by direct differentiation of Hψ.Property 2: take 1 < t1 < t2. Now,Hψ (t1; η) =1t1ψ(1η log t1)>1t2ψ(1η log t1)≥ 1t2ψ(1η log t2)= Hψ (t2; η) ,where the second inequality holds because ψ is increasing.Property 3: we havelimt↓1Hψ (t; η) = limt↓1ψ(1η log t)= limx→∞ψ (x) = 1,andlimt→∞Hψ (t; η) =(limt→∞1t)(limx↓0ψ (x))= 0,since ψ is a distribution function on [0,∞). Since Hψ (·; η) is decreasing, as identified inProperty 2, Hψ (·; η) is a survival function.Property 4: for t >1,limη↓0Hψ (t; η) =1tlimη↓0ψ(1η log t)=1tlimx→∞ψ (x) =1tandlimη→∞Hψ (t; η) =1tlimη→∞ψ(1η log t)=1tlimx↓0ψ (x) = 0,since ψ is a distribution function on [0,∞).136APPENDIX E. PROOFS RELATED TO THE NEW COPULA FAMILIESProperty 5: take 0 < η1 < η2. Now,Hψ (t; η1) =1tψ(1η1 log t)≥ 1tψ(1η2 log t)= Hψ (t; η2) ,where the inequality arises since ψ is non-decreasing. If ψ is strictly increasing, then theinequality also becomes strict.Property 6: the result follows by simple algebra, using the form of D2Hψ in Equa-tion (E.2.2).Property 7: by Property 2, Hψ (·; η) is strictly decreasing, so that H←ψ (·; η) is itsunique inverse. From the definition of Hψ (·; η) in Equation (6.2.5), the inverse functionH←ψ (·; η) satisfiesv =1H←ψ (v; η)ψ(1η logH←ψ (v; η))for v ∈ (0, 1). 
Multiplying both sides by H←ψ (v; η) and applying the inverse function ψ←arrives at the desired identity.E.2.2 FormulasLet (ψ, θ) be an extDJ generator, with corresponding extDJ generating function Hψdefined in Equation (6.2.5). The corresponding extDJ copula, defined in Equation (6.2.6),has the following distributional quantities for (u, v) ∈ (0, 1)2, using the shorthand nota-tion t = H←ψ (v; θ) when appropriate.• Whenever ψ′ exists, the 2|1 conditional distribution function isCextDJ,2|1(v | u;ψ; θ) = 1tκψ(1θu log t)= Hκψ(H←ψ (v; θ) ; θu), (E.2.9)where κψ is defined in Equation (E.1.1). This can be found using Proposition E.2.1(6).The corresponding quantile function isC←extDJ,2|1(τ | u;ψ, θ) = Hψ (H←κψ (τ ; θu) ; θ) (E.2.10)for τ ∈ (0, 1).137APPENDIX E. PROOFS RELATED TO THE NEW COPULA FAMILIES• Whenever ψ′ exists, the 1|2 conditional distribution function isCextDJ,1|2(u | v;ψ, θ) = uD1Hψ(H←ψ (v; θ) ; θu)D1Hψ(H←ψ (v; θ) ; θ) (E.2.11)= uψ(1θu log t)+ 1θu(log t)2ψ′(1θu log t)ψ(1θ log t)+ 1θ(log t)2ψ′(1θ log t) .This result can be determined using the formula for D1Hψ, found in Equation (E.2.1).• Whenever ψ is twice-differentiable, the copula density iscextDJ (u, v;ψ, θ) =D1Hκψ(H←ψ (v; θ) ; θu)D1Hψ(H←ψ (v; θ) ; θ) (E.2.12)=κψ(1θu log t)+ 1θu(log t)2κ′ψ(1θu log t)ψ(1θ log t)+ 1θ(log t)2ψ′(1θ log t) ,where κψ is defined in Equation (E.1.1). Note that D1Hκψ and D1Hψ are bothnegative, due to Proposition E.2.1(2), so that this density is non-negative.E.2.3 Proofs of PropertiesProof of Proposition 6.2.10 (limit copulas). Take (u, v) ∈ (0, 1)2. To show the θ → ∞limit, rewrite the extDJ copula by substituting the definition ofHψ from Equation (6.2.5),then apply the identity in Proposition E.2.17, to obtainCextDJ (u, v;ψ, θ) = u1H←ψ (v; θ)ψ(u−11θ logH←ψ (v; θ))=uψ(u−1ψ←(vH←ψ (v; θ)))H←ψ (v; θ).Since H←k (v; θ)→ 1 as θ →∞ for any v ∈ (0, 1), we obtain the DJ copula as a limit:limθ→∞CextDJ (u, v;ψ, θ) = limh→1uψ(u−1ψ← (vh))h= uψ(u−1ψ← (v))= CDJ (u, v;ψ) .The θ ↓ 0 limit follows directly due to the limiting curve Hψ (t; θ)→ 1/t for t ∈ (1,∞)138APPENDIX E. PROOFS RELATED TO THE NEW COPULA FAMILIESidentified in Equation (E.2.4):limθ↓0CextDJ (u, v;ψ, θ) = u11/v= C⊥ (u, v) .Proof of Proposition 6.2.11 (stochastic representation). First, the validity of the distri-bution functions FY and FU |Y are verified. The validity of FY follows directly by Propo-sition E.2.1(3). Next, for any y > 1, after interchanging the derivative and limit, wehavelimu↓0FU |Y(u | y) = (limu↓0u)1D1Hψ (y; θ)ddy1y= 0due to Proposition E.2.1(4), andlimu↑1FU |Y(u | y) = D1Hψ (y; θ)D1Hψ (y; θ)= 1.The density is non-negative:fU |Y(u | y) = D1Hψ (y; θu) + θuD12Hψ (y; θu)D1Hψ (y; θ)=D1Hκψ (y; θu)D1Hψ (y; θ),by Proposition E.2.1(6), where κψ is defined in Equation (E.1.1). Proposition E.1.1 statesthat κψ is a distribution function, so that by Proposition E.2.1(2), both D1Hκψ and D1Hψare negative, and therefore fU |Y is positive.The joint distribution of (U, Y ) isFU,Y (u, y) =∫ y1FU |Y(u | t) fY (t) dt= −∫ y1uD1Hψ (t; θu)D1Hψ (t; θ)D1Hψ (t; θ) dt= u− uHψ (y; θu)for (u, y) ∈ (0, 1)× (1,∞), since Hψ (1; θu) = 1 by Proposition E.2.1(3).To find the copula, we needFU (u) = limy→∞FU,Y (u, y) = u− u limy→∞Hψ (y; θu) = u,139APPENDIX E. PROOFS RELATED TO THE NEW COPULA FAMILIESsince Hψ (·; θu) is a survival function, as identified in Proposition E.2.1(3). 
We also needQY (τ) = H←ψ (1− τ ; θ)for τ ∈ (0, 1).Now, relevant distributional quantities related to(U, 1/Y)areFU,1/Y (u, t) = FU (u)− FU,Y(u,1t)= u− FU,Y(u,1t)for (u, t) ∈ (0, 1)× (1,∞), andQ1/Y (τ) =1QY (1− τ)for τ ∈ (0, 1). Sklar’s theorem gives us the copulaCU,1/Y (u, v) = FU,1/Y(u,Q1/Y (v))= u− FU,Y(u,QY (1− v))= uHψ(H←ψ (v; θ) ; θu)for (u, v) ∈ (0, 1)2, which is the extDJ copula generated by (ψ, θ).Proof of Proposition 6.2.12. Evaluating the extDJ copula using H˜ψ, we obtainuH˜ψ(H˜←ψ (v; θ) ; θu)= uHψ(g(g←(H←ψ (v; θ))); θu)= uHψ(H←ψ (v; θ) ; θu)for all (u, v) ∈ (0, 1)2. Note that the direction of monotonicity of g does not matter.Proof of Proposition 6.2.13 (quadrant dependence). Take ψ to be a concave distributionfunction, θ > 0, and (u, v) ∈ (0, 1)2. The argument is made clearer by letting y =H←ψ (v; θ). Since Hψ (y; ·) is non-increasing, as indicated in Proposition E.2.1(5), we haveHψ (y; θu) ≥ Hψ (y; θ) = v. Multiplying both sides by u gives the desired result,uHψ(H←ψ (v; θ) ; θu)≥ uv.140APPENDIX E. PROOFS RELATED TO THE NEW COPULA FAMILIESE.3 IGL Copula ClassE.3.1 Generating Function PropertiesThe mathematics underlying the IGL family of copulas relies heavily on the functionsΨk for k > 1, defined in Equation (6.1.7). These functions are also DJ generatingfunctions. Properties of Ψk are discussed here.Proposition E.3.1. For k > 1, Ψk defined in Equation (6.1.7) has the following prop-erties.1. Derivatives: for t > 0,Ψ′k (t) =Γ (k)− Γ∗ (k, 1/t)Γ (k − 1) (E.3.1)andΨ′′k (t) = −t−k−1 exp(−t−1)Γ (k − 1) . (E.3.2)2. limt→∞Ψk (t) = 1.3. Ψ′k ∈ RV−k at infinity.4. Ψk ∈ RV−1 at 0+.5. limt↓0 Ψk (t) = 0.6. limt↓0 Ψ′k (t) = k − 1.7. limt↓0 Ψ′′k (t) = 0.8. The kappa function of Ψk, defined in Equation (E.1.1), isκk (t) := κΨk (t) =Γ∗(k − 1, 1/t)Γ (k − 1) (E.3.3)for t > 0.Proof. Property 1 follows by direct differentiation.Property 2:limt→∞Ψk (t) = limx↓0(Γ (k)− Γ∗ (k, x)xΓ (k − 1) +Γ∗ (k − 1, x)Γ (k − 1))= limx↓0−xk−1 exp (−x)Γ (k − 1) + 1= 1,141APPENDIX E. PROOFS RELATED TO THE NEW COPULA FAMILIESsince k > 1.Property 3: for s > 0,limt→∞Ψ′k (ts)Ψ′k (ts)= limt→∞Γ (k)− Γ∗(k, (ts)−1)Γ (k)− Γ∗ (k, t−1)= limt→∞(ts)−(k−1) exp(− (ts)−1)s−1(−t−2)t−(k−1) exp (−t−1) (−t−2)= s−k limt→∞exp(t−1(1− s−1))= s−k.Property 4: for s > 0,limt→∞Ψk((ts)−1)Ψk (t−1)= s−1 limt→∞Ψ′k((ts)−1)Ψ′k (t−1)= s−1 limt→∞Γ (k)− Γ∗ (k, ts)Γ (k)− Γ∗ (k, t)= s−1Γ (k)Γ (k)= s−1.Property 5:limt↓0Ψk (t) = limy→∞Γ (k)− Γ∗ (k, y)yΓ (k − 1) + limy→∞Γ∗ (k − 1, y)Γ (k − 1) = 0Property 6:limt↓0Ψ′k (t) = limy→∞Γ (k)− Γ∗ (k, y)Γ (k − 1) =Γ (k)Γ (k − 1) = k − 1.Property 7:limt↓0Ψ′′k (t) = − limy→∞yk+1 exp (−y)Γ (k − 1) = 0.Property 8 follows by a direct substitution of Ψk and Ψ′k, found respectively in Equa-tions (6.1.7) and (E.3.1).142APPENDIX E. PROOFS RELATED TO THE NEW COPULA FAMILIESE.3.2 FormulasDistributional quantities related to the IGL copula family are presented here. Forneatness, results are bundled as a proposition.Proposition E.3.2. Let k > 1. The IGL copula (u, v) 7→ CIGL (u, v; k) hasCIGL,2|1(v | u; k) = 1− Γ∗ (k − 1, (1− u) /Ψ←k (1− v))Γ (k − 1) , (E.3.4)CIGL,1|2(u | v; k) = 1− Γ (k)− Γ∗ (k, (1− u) /Ψ←k (1− v))Γ (k)− Γ∗ (k, 1/Ψ←k (1− v)) , (E.3.5)andcIGL (u, v; k) =(1− u)k−1 [Ψ←k (1− v)]−k exp{− (1− u) /Ψ←k (1− v)}Γ (k)− Γ (k, 1/Ψ←k (1− v)) (E.3.6)for (u, v) ∈ (0, 1)2 and k > 1.Note that CIGL,2|1(· | u; k) is the Gamma(k − 1, 1) distribution function composedwithv 7→ 1− uΨ←k (1− v).Proof. We work with the reflection copula, which has a simpler form. 
Fix k > 1 andconsider (u, v) ∈ (0, 1)2.For the 2|1 distribution function, use the DJ copula 2|1 distribution in Equation (E.2.9)to obtainCˆIGL,2|1(v | u; k) = κk (u−1ψ← (v)) = Γ∗ (k − 1, u/Ψ←k (v))Γ (k − 1) ,where κk is defined in Equation (E.3.3). The result follows since CIGL,2|1(v | u; k) =1− CˆIGL,2|1(1− v | 1− u; k).For the 1|2 distribution function, use the DJ copula 1|2 distribution in Equation (E.2.11)to obtainCˆIGL,1|2(u | v; k) = Ψ′k (u−1Ψ←k (v))Ψ′k(Ψ←k (v))=Γ (k)− Γ∗ (k, u/Ψ←k (v))Γ (k)− Γ∗ (k, 1/Ψ←k (v)) .The result follows since CIGL,1|2(u | v; k) = 1− CˆIGL,2|1 (1− u | 1− v; k).143APPENDIX E. PROOFS RELATED TO THE NEW COPULA FAMILIESFor the density, use the DJ copula density in Equation (E.1.4), and substitute theform of Ψ′k and Ψ′′k found respectively in Equations (E.3.1) and (E.3.2):cˆIGL (u, v; k) = −Ψ←k (v) Ψ′′k(u−1Ψ←k (v))u2Ψ′k(Ψ←k (v))= −Ψ←k (v)u2[u/Ψ←k (v)]k+1exp(−u/Ψ←k (v))Γ (k)− Γ∗ (k, 1/Ψ←k (v))= −uk−1 [Ψ←k (v)]−k exp (−u/Ψ←k (v))Γ (k)− Γ∗ (k, 1/Ψ←k (v)) .The result follows since cIGL (u, v; k) = cˆIGL (1− u, 1− v; k).E.3.3 Proofs of PropertiesProof of Proposition 6.1.3 (comonotonicity). The strategy is to describe the IGL familyusing a different subclass of DJ generating functions whose k →∞ limit is a comonotonicity-generating construction function.For any k > 1, the DJ copula with generating functionΨ2k : y 7→ Ψk(yk)for Ψk defined in Equation (6.1.7) is the IGL (k) copula, due to the property of invarianceunder scale change described in Corollary 6.2.3. By Proposition 6.2.6, it only needs tobe shown that Ψ2k (y) → min (y, 1) as k → ∞ for almost all y > 0. It is sufficient toshow this by restricting k to the integers.Take k > 1. Out generating function for y > 0 evaluates toΨ2k (y) =k − 1kyFk(ky)+ 1− Fk−1(ky),where Fk is the Gamma (k, 1) distribution function. Letting E1, . . . , Ek ∼ Exp (1) be aniid sample, Fk is the distribution function of Sk =∑ki Ei. As k →∞, the Central Limittheorem states thatSk − k√k→d N (0, 1) ,so thatFk(ky)= P(Sk ≤ ky)= P(Sk − k√k≤√k(1y− 1))= Φ(√k(1y− 1))+ o (1) .144APPENDIX E. PROOFS RELATED TO THE NEW COPULA FAMILIESAs such,limk→∞Fk(ky)=0, y > 1;1, y < 1.The generating function, therefore, has the comonotonicity-generating function as a limit:limk→∞Ψ2k (y) = yI(0,1) (y) + I(1,∞) (y) = min (y, 1)for almost all y > 0.Proof of Proposition 6.1.5 (tail dependence). Fix k > 1 and θ > 0.The upper tail dependence λ(IGL)U of the IGL copula family is the lower tail dependenceof the reflection copula family:λ(IGL)U = limu↓0CˆIGL (u, u; k)u= limu↓0Ψk(Ψ←k (u)u)= limt↓0Ψk(tΨk (t))= Ψk(limt↓01Ψ′k (t))= Ψk(1k − 1),by Property 6 of Proposition E.3.1. Using the incomplete gamma function identityΓ∗ (k, x) = (k − 1) Γ∗ (k − 1, x) + xk−1 exp (−x)for any x ≥ 0, the tail dependence can be further evaluated:Ψk(1k − 1)=Γ (k)− Γ∗ (k, k − 1)(k − 1) Γ (k − 1) +Γ∗ (k − 1, k − 1)Γ (k − 1)=Γ (k)− Γ∗ (k, k − 1)Γ (k)+(k − 1) Γ∗ (k − 1, k − 1)Γ (k)=Γ (k)− (k − 1)k−1 exp (− (k − 1))Γ (k)= 1−[(k − 1) e−1]k−1Γ (k).145APPENDIX E. PROOFS RELATED TO THE NEW COPULA FAMILIESThe lower tail dependence λ(IGL)L of the IGL copula family isλ(IGL)L = limu↓0CIGL (u, u; k)u= limu↓01− 2 (1− u) + (1− u) Ψk([Ψ←k (1− u)]/ (1− u))1− (1− u)= limt→∞1− 2Ψk (t) + Ψk (t) Ψk(t/Ψk (t))1−Ψk (t)= 1− limt→∞1−Ψk(t/Ψk (t))1−Ψk (t) ,since Ψk (t)→ 1 as t→∞. To continue, we make use of L’hôpital’s rule twice. We willneed the form of Ψ′k found in Equation (E.3.1), and the knowledge thatddtt/Ψk (t) → 1as t→∞ because t/Ψk (t) ∼ t. 
The desired limit islimt→∞1−Ψk(t/Ψk (t))1−Ψk (t) = limt→∞Ψ′k(t/Ψk (t))Ψ′k (t)= limt→∞Γ (k)− Γ∗ (k,Ψk (t) /t)Γ (k)− Γ∗ (k, 1/t)= limt→∞[Ψk (t)]k−1t−(k−1) exp(−Ψk (t) /t)t−(k−1) exp(−1/t)= exp(limt→∞1−Ψk (t)t)= 1,so that λ(IGL)L = 0.Proof of Proposition 6.1.7 (CCEVI).. Fix u ∈ (0, 1) and k > 1. Using the conditionaldistribution found in Equation (E.3.4), letF (y) = 1− CIGL,2|1(1− 1y| u; k)= 1− Γ∗ (k − 1, R1 (y))Γ (k − 1)= Fk−1(R1 (y))146APPENDIX E. PROOFS RELATED TO THE NEW COPULA FAMILIESfor y > 0, where Fk−1 is the Gamma (k − 1, 1) distribution function, andR1 (y) =1− uΨk (y−1).Note that R1 has a unique inverse function since Ψk does, and by Property (4) of Propo-sition (E.3.1), R1 ∈ RV1.The CCEVI of the IGL copula family is just the EVI of F , which we will find usingTheorem 1.2.6 of de Haan and Ferreira (2006). By the theorem, since Fk−1 ∈ D (G0), wehave1− Fk−1 (t) = b1 (t) exp{−∫ tt0dsb2 (s)}, (E.3.7)for t ∈ (t0,∞), t0 ∈ (0,∞), where b1 (continuous) and b2 are positive real functionssatisfyinglimt→∞b1 (t) = b∗1 ∈ (0,∞) , (E.3.8)limt→∞b′2 (t) = 0, (E.3.9)andlimt→∞b2 (t) = 0. (E.3.10)Now investigate the form of 1− F , using Equation (E.3.7) and changing the variableof integration to r = R←1 (s):1− F (y) = b1(R1 (y))exp{−∫ R1(y)t0dsb2 (s)}= b˜1 (y) exp{−∫ yy0drb˜2 (r)},where y0 = R←1 (t0),b˜1 (y) = b1(R1 (y)), (E.3.11)andb˜2 (y) =b2(R1 (y))R′1 (y)(E.3.12)for y ∈ (y0,∞). The DOA condition of F depends on b˜1 and b˜2. First, both functionsare positive: since b1 > 0, we have b˜1 > 0, and since b2 > 0 and R1 ∈ RV1, it follows thatb˜2 > 0 since R′1 > 0 in some neighbourhood of infinity (which we can take to be (y0,∞),without loss of generality). In addition, b˜2 is continuous because b2 is, and because R1 is147APPENDIX E. PROOFS RELATED TO THE NEW COPULA FAMILIESdifferentiable everywhere on (y0,∞). Second, by Equations (E.3.8) and (E.3.11),limy→∞b˜1 (y) = limy→∞b1 (y) = 0.Third, we need the derivativeb˜′2 (y) =[R′1 (y)]2b′2(R1 (y))−R′′1 (y) b2 (R1 (y))[R′1 (y)]2= b′2(R1 (y))− b2 (R1 (y)) R′′1 (y)[R′1 (y)]2 ,for y ∈ (y0,∞). By Equations (E.3.9) and (E.3.10), and since R′′1/ (R′1)2 ∈ RV−1 so thatlimy→∞R′′1 (y)[R′1 (y)]2 = 0,we havelimy→∞b˜′2 (y) = limy→∞b′2 (y)−(limy→∞b2 (y)) limy→∞R′′1 (y)[R′1 (y)]2 = 0.By Theorem 1.2.6 of de Haan and Ferreira (2006), F ∈ D (G0), so that the CCEVI ofthe IGL copula family is 0.148APPENDIX E. PROOFS RELATED TO THE NEW COPULA FAMILIESE.4 IG Copula ClassE.4.1 Generating Function PropertiesD1Hk (t; η) = −(log t+ 1)(Γ (k)− Γ∗ (k, η log t))+ η (log t)2 Γ∗ (k − 1, η log t)η (t log t)2 Γ (k − 1)D2Hk (t; η) = − 1tη2 log tΓ (k)− Γ∗ (k, η log t)Γ (k)D12Hk (t; η) =(Γ (k)− Γ∗ (k, η log t)) (log t+ 1)− t−η (η log t)k(tη log t)2 Γ (k − 1)Proposition E.4.1. For k > 1, the function Hk defined in Equation (6.1.8) has thefollowing properties.1. Initial slope:limt↓1D1Hk (t; θ) =−∞, 1 < k < 2;−(1 + θ2), k = 2;−1 k > 2.(E.4.1)2. For t > 0,Hk (t; θ) + θD2Hk (t; θ) =Γ∗ (k − 1, θ log t)tΓ (k − 1) . (E.4.2)3. For θ > 0, H←k (·; θ) ∈ RV−1 at 0+.Proof. Property 1: by Equation (E.2.1),limt↓1D1Hk (t; θ) = − limt↓1[Ψk(1θ log t)+1θ (log t)2Ψ′k(1θ log t)]= −[1 + θ limx→∞x2Ψ′k (x)].From Proposition E.3.1(3), x 7→ x2Ψ′k (x) ∈ RV2−k at infinity, so that the limit is clear149APPENDIX E. PROOFS RELATED TO THE NEW COPULA FAMILIESwhenever k 6= 2. 
Since Ψ′2 (x) = 1−(x−1 + 1)exp(−x−1),limx→∞x2Ψ′2 (x) = limt↓01− (t+ 1) exp (−t)t2= limt↓0− exp (−t) + exp (−t) (1 + t)2t=12.Property 2: using the identity in Equation (E.2.6),Hk (t; θ) + θD2Hk (t; θ) =1tκk(1θ log t),where κk can be found in Equation (E.3.3).Property 3: First, find the index of variation of Hk (·; θ) at infinity: for s > 0,limt→∞Hk (ts; θ)Hk (t; θ)= s−1 limt→∞Ψk((θ log (ts))−1)Ψk((θ log (t))−1) .By Property 4 of Proposition E.3.1, Ψk ∈ RV−1 at 0+. Since (θ log t)−1 ∈ RV0 as t→∞,it follows thatΨk((θ log (t))−1) ∈ RV(−1)(0) = RV0as t→∞. Finally,limt→∞Hk (ts; θ)Hk (t; θ)= s−1.E.4.2 FormulasDistributional quantities related to the IG copula family are presented here. Forneatness, results are bundled as a proposition.Proposition E.4.2. Let θ > 0 and k > 1. The IG copula (u, v) 7→ CIG (u, v; k, θ) hasCIG,2|1(v | u; k, θ) = 1− Γ∗(k − 1, θ (1− u) log (H←k (1− v; θ)))H←k (1− v; θ) Γ (k − 1), (E.4.3)150APPENDIX E. PROOFS RELATED TO THE NEW COPULA FAMILIESCIG,1|2(v | u; k, θ) = 1− (1− u) D1Hψ(H←ψ (1− v; θ) ; θ (1− u))D1Hψ(H←ψ (1− v; θ) ; θ)(which is presented this way because the long-hand form is very cumbersome), andcIG (u, v; k, θ) = −θk−1 (1− u)k−1 (log tv)k−2 t−θ(1−u)v + Γ∗(k − 1, θ (1− u) log tv)Γ (k − 1) t2v D1Hk (tv; θ),(E.4.4)where tv = H←k (1− v; θ).Proof. We work with the reflection copula, which has a simpler form. Fix θ > 0 andk > 1, and consider (u, v) ∈ (0, 1)2.For the 2|1 distribution function,CˆIG,2|1(v|u; k, θ) = dduuHk(H←k (v; θ) ; θu)= Hk(H←k (v; θ) ; θu)+ θuD2Hk(H←k (v; θ) ; θu)=Γ∗(k − 1, θu log (H←k (v; θ)))H←k (v; θ) Γ (k − 1), (E.4.5)by Property 2 of Proposition E.2.1. The result follows sinceCIG,2|1(v|u; k) = 1− CˆIG,2|1 (1− v|1− u; k) .For the 1|2 distribution function,CˆIG,1|2(u|v; k, θ) = ddvuHk(H←k (v; θ) ; θu)=uD1Hk(H←k (v; θ) ; θu)D1Hk(H←k (v; θ) ; θ) .For the density function, differentiate Equation (E.4.5) with respect to v, lettingtv = H←k (1− v; θ):cˆIG (u, v; k, θ) =− (θu log t1−v)k−2 exp (−θu log t1−v) θ − Γ∗ (k − 1, θu log t1−v)Γ (k − 1) t21−vdt1−vdv= −θk−1uk−1 (log t1−v)k−2 t−θu1−v + Γ∗ (k − 1, θu log t1−v)Γ (k − 1) t21−v D1Hk (t1−v; θ).151APPENDIX E. PROOFS RELATED TO THE NEW COPULA FAMILIESE.4.3 Proofs of PropertiesProof of Proposition 6.1.4. Fix k > 1 and θ > 0.The upper tail dependence λ(IG)U of the IG copula family is the lower tail dependenceof the reflection IG copula family. So,λ(IG)U = limu↓0CˆIG (u, u; k, θ)u= limu↓0Hk(H←k (u; θ) ; θu)≤ limu↓01H←k (u; θ)= 0,due to the upper bound of Hk (see Property 5 in Proposition E.2.1). Since Hk is non-negative, it follows that λ(IG)U = 0.For the lower tail dependence λ(IG)L of the IG copula family,λ(IG)L = limu↓0CIG (u, u; k, θ)u= limu↑11− 2u+ uHk(H←k (u; θ) ; θu)1− u= 1− limu↑11−Hk(H←k (u; θ) ; θu)1− u= 1− limu↑1[D1Hk(H←k (u; θ) ; θu)D1Hk(H←k (u; θ) ; θ) + θD2Hk (H←k (u; θ) ; θu)].For the first limit, since D1Hk (1; θ) is finite and non-zero whenever k ≥ 2 (Equa-tion (E.4.1)), we obtainlimu↑1D1Hk(H←k (u; θ) ; θu)D1Hk(H←k (u; θ) ; θ) = D1Hk (1; θ)D1Hk (1; θ)= 1.For the second limit, set tu = H←k (u; θ) ↑ 1 as u ↑ 1 and use Equation (E.2.2) to obtainlimu↑1D2Hk (tu; θu) = −1θlimu↑11θu log tuΨ′k(1θu log tu)= −1θlimx→∞xΨ′k (x)= 0,152APPENDIX E. PROOFS RELATED TO THE NEW COPULA FAMILIESsince xΨ′k (x) ∈ RV−(k−1) by Property 3 of Proposition E.3.1. It follows that λ(IG)L = 0when k ≥ 2.Proof of Proposition 6.1.6 (CCEVI).. Take k > 1, θ > 0, and u ∈ (0, 1), and let u¯ = 1−u. 
Define $v(t) = H_k^{\leftarrow}(t^{-1}; \theta)$ for $t > 1$, and note that by Property 3 of Proposition E.4.1, $v \in \mathrm{RV}_1$ at infinity.

To find the CCEVI, investigate the following limit for $s > 0$, using Equation (E.4.3):
$$
\begin{aligned}
\lim_{t\to\infty} \frac{1 - C_{\mathrm{IG},2|1}\left(1 - (ts)^{-1} \mid u; k, \theta\right)}{1 - C_{\mathrm{IG},2|1}\left(1 - t^{-1} \mid u; k, \theta\right)}
&= \lim_{t\to\infty} \frac{\Gamma^*\big(k-1,\ \theta\bar u \log v(ts)\big)}{\Gamma^*\big(k-1,\ \theta\bar u \log v(t)\big)} \left( \lim_{t\to\infty} \frac{v(ts)}{v(t)} \right)^{-1} \\
&= s^{-1} \lim_{t\to\infty} \frac{\left[\log v(ts)\right]^{k-2} \left[v(ts)\right]^{-1-\theta\bar u} v'(ts)\, s}{\left[\log v(t)\right]^{k-2} \left[v(t)\right]^{-1-\theta\bar u} v'(t)} \\
&= \left( \lim_{t\to\infty} \frac{\log v(ts)}{\log v(t)} \right)^{k-2} \left( \lim_{t\to\infty} \frac{v(ts)}{v(t)} \right)^{-1-\theta\bar u} \left( \lim_{t\to\infty} \frac{v'(ts)}{v'(t)} \right) \\
&= s^{-1-\theta\bar u}
\end{aligned}
$$
(the leading $s^{-1}$ cancelling with the factor $s$ produced by the chain rule), since $\log \circ\, v \in \mathrm{RV}_0$ at infinity and $v' \in \mathrm{RV}_0$ at infinity. It follows that the CCEVI is
$$u \mapsto \frac{1}{1 + \theta(1 - u)}.$$

Appendix F
Application Supplement

This appendix provides the details of the analysis discussed in Chapter 7. First, deseasonalization of the river discharge variable is discussed in Section F.1. The marginals of the response and predictors are modelled in Sections F.2 and F.3, respectively. Then, the dependence structure amongst the variables is modelled using vine copulas in Section F.4. The final section, Section F.5, discusses the weights chosen to assess the competing forecasters.

F.1 Deseasonalization of Discharge on a Log Scale

This section discusses the details of estimating the sequences $\{\mu_i\}$ and $\{\sigma_i\}$ in Equation (7.1.1) using discharge data $Y_t$ on the fitting set.

To estimate the location parameter $\mu_i$, we use a kernel-weighted sample average of $\log Y$:
$$\hat\mu_i = \frac{\sum_{t=1}^T K_b\left(\delta(t) - i\right) \log Y_t}{\sum_{t=1}^T K_b\left(\delta(t) - i\right)},$$
where $K_b : \mathbb{R} \to [0,\infty)$ is a kernel density function symmetric about 0, which we take to be a tri-cubic kernel for some bandwidth $b > 0$, defined as
$$K_b(x) = \begin{cases} \left(1 - |x/b|^3\right)^3, & -b \le x \le b; \\ 0, & \text{otherwise}. \end{cases} \qquad \text{(F.1.1)}$$
Here, $b - 1$ represents the furthest number of days away from $i$ to take in the computation of the weighted average. We choose $b = 4$.

To estimate the scale parameter $\sigma_i$, we similarly use a kernel-weighted sample standard deviation:
$$\hat\sigma_i = \sqrt{\frac{\sum_{t=1}^T K_b\left(\delta(t) - i\right) \left(\log Y_t - \hat\mu_i\right)^2}{\sum_{t=1}^T K_b\left(\delta(t) - i\right)}},$$
where $K_b$ is again the tri-cubic kernel, for which we use $b = 8$.

F.2 Marginal of the Response

This section discusses fitting a marginal distribution to the change in deseasonalized discharge, $\Delta Z_t := Z_t - Z_{t-1}$, with a Generalized Pareto tail.

Let $\mathcal{T}$ be the set of all dates where discharge data are to be used for analysis. Further, denote $\mathcal{T}_{\mathrm{tr}}$, $\mathcal{T}_{\mathrm{val}}$, $\mathcal{T}_{\mathrm{fit}}$, and $\mathcal{T}_{\mathrm{test}}$ as the sets of dates corresponding to the training, validation, fitting, and test sets, respectively, so that $\mathcal{T}_{\mathrm{fit}} = \mathcal{T}_{\mathrm{tr}} \cup \mathcal{T}_{\mathrm{val}}$ and $\mathcal{T} = \mathcal{T}_{\mathrm{fit}} \cup \mathcal{T}_{\mathrm{test}}$.

The GPD-tail model using some subset $\mathcal{T}_* \subset \mathcal{T}$ and parameters $\ell > 0$ (threshold), $\sigma > 0$ (scale), and $\xi \in \mathbb{R}$ (shape, or extreme value index) is given by
$$\hat F_{\Delta Z_t}\left(z; \mathcal{T}_*, \ell, \sigma, \xi\right) = \begin{cases} \hat F_{\mathrm{Emp}}\left(z; \mathcal{T}_*\right), & z \le \ell; \\[4pt] p_\ell + (1 - p_\ell)\, F_{\mathrm{GPD}}\left(\frac{z - \ell}{\sigma}; \xi\right), & z > \ell, \end{cases}$$
where $\hat F_{\mathrm{Emp}}\left(\cdot; \mathcal{T}_*\right)$ is the empirical distribution function of the data $\{\Delta Z_t : t \in \mathcal{T}_*\}$, $p_\ell = \hat F_{\mathrm{Emp}}\left(\ell; \mathcal{T}_*\right)$, and $F_{\mathrm{GPD}}$ is the standardized Generalized Pareto distribution function
$$F_{\mathrm{GPD}}(x; \xi) = 1 - (1 + \xi x)^{-1/\xi}$$
for $x \ge 0$ (and $x \le -1/\xi$ if $\xi < 0$). Our objective is to choose $\mathcal{T}_*$, $\ell$, $\sigma$, and $\xi$ so that $\hat F_{\Delta Z_t}\left(z; \mathcal{T}_*, \ell, \sigma, \xi\right)$ scores well on an out-of-sample data set.

We first fit the parameters $\ell$, $\sigma$, and $\xi$ using the following steps (a code sketch follows this list):

1. Build the set of distributions
$$\left\{ \hat F_{\Delta Z_t}\left(z; \mathcal{T}_{\mathrm{tr}}, \ell, \hat\sigma_{\mathrm{MLE}}, \hat\xi_{\mathrm{MLE}}\right) : \ell \in \mathbb{R} \right\}, \qquad \text{(F.2.1)}$$
where $\hat\sigma_{\mathrm{MLE}}$ and $\hat\xi_{\mathrm{MLE}}$ are the maximum likelihood estimators of $\sigma$ and $\xi$ on the training set, computed for the given threshold $\ell$.

2. Choose the threshold parameter $\hat\ell$ whose distribution in Equation (F.2.1)—as a forecaster—scores optimally on the validation set.
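The following R sketch implements the GPD-tail distribution function above; the vector dz is a placeholder for the fitting-set values of Delta Z_t, and the parameter values used in the call are the fitted values reported below (threshold 0.2, scale 0.22, shape 0.09).

```r
# A minimal sketch of the GPD-tail marginal: empirical below the threshold,
# generalized Pareto above it.  `dz`, `ell`, `sigma`, `xi` are stand-ins for
# the fitting data and the fitted parameters.
pgpd <- function(x, xi) {                 # standardized GPD cdf, x >= 0
  if (abs(xi) < 1e-10) 1 - exp(-x) else 1 - pmax(1 + xi * x, 0)^(-1 / xi)
}
F_dz <- function(z, dz, ell, sigma, xi) {
  Femp <- ecdf(dz)                        # empirical part, F-hat_Emp
  p_ell <- Femp(ell)
  ifelse(z <= ell, Femp(z),
         p_ell + (1 - p_ell) * pgpd((z - ell) / sigma, xi))
}

set.seed(1)
dz <- rnorm(2000, sd = 0.3)               # placeholder for the fitting-set data
F_dz(c(0, 0.2, 0.5, 1), dz, ell = 0.2, sigma = 0.22, xi = 0.09)
```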
F.2 Marginal of the Response

This section discusses fitting a marginal distribution, with a generalized Pareto (GPD) tail, to the change in deseasonalized discharge, $\Delta Z_t := Z_t - Z_{t-1}$.

Let $\mathcal{T}$ be the set of all dates where discharge data are to be used for analysis. Further, denote by $\mathcal{T}_{\mathrm{tr}}$, $\mathcal{T}_{\mathrm{val}}$, $\mathcal{T}_{\mathrm{fit}}$, and $\mathcal{T}_{\mathrm{test}}$ the sets of dates corresponding to the training, validation, fitting, and test sets, respectively, so that $\mathcal{T}_{\mathrm{fit}}=\mathcal{T}_{\mathrm{tr}}\cup\mathcal{T}_{\mathrm{val}}$ and $\mathcal{T}=\mathcal{T}_{\mathrm{fit}}\cup\mathcal{T}_{\mathrm{test}}$.

The GPD-tail model using some subset $\mathcal{T}_*\subset\mathcal{T}$ and parameters $\ell>0$ (threshold), $\sigma>0$ (scale), and $\xi\in\mathbb{R}$ (shape, or extreme value index) is given by
\[
\hat F_{\Delta Z_t}(z;\mathcal{T}_*,\ell,\sigma,\xi) =
\begin{cases}
\hat F_{\mathrm{Emp}}(z;\mathcal{T}_*), & z\le\ell;\\[2pt]
p_\ell + (1-p_\ell)\,F_{\mathrm{GPD}}\big(\tfrac{z-\ell}{\sigma};\xi\big), & z>\ell,
\end{cases}
\]
where $\hat F_{\mathrm{Emp}}(\cdot;\mathcal{T}_*)$ is the empirical distribution function of the data $\{\Delta Z_t : t\in\mathcal{T}_*\}$, $p_\ell=\hat F_{\mathrm{Emp}}(\ell;\mathcal{T}_*)$, and $F_{\mathrm{GPD}}$ is the standardized generalized Pareto distribution function
\[
F_{\mathrm{GPD}}(x;\xi) = 1-(1+\xi x)^{-1/\xi}
\]
for $x\ge 0$ (and $x\le -1/\xi$ if $\xi<0$). Our objective is to choose $\mathcal{T}_*$, $\ell$, $\sigma$, and $\xi$ so that $\hat F_{\Delta Z_t}(z;\mathcal{T}_*,\ell,\sigma,\xi)$ scores well on an out-of-sample data set.

We first fit the parameters $\ell$, $\sigma$, and $\xi$ using the following steps:

1. Build the set of distributions
\[
\big\{\hat F_{\Delta Z_t}(z;\mathcal{T}_{\mathrm{tr}},\ell,\hat\sigma_{\mathrm{MLE}},\hat\xi_{\mathrm{MLE}}) : \ell\in\mathbb{R}\big\},
\tag{F.2.1}
\]
where $\hat\sigma_{\mathrm{MLE}}$ and $\hat\xi_{\mathrm{MLE}}$ are the maximum likelihood estimators of $\sigma$ and $\xi$ on the training set, given the threshold $\ell$.

2. Choose the threshold parameter $\hat\ell$ whose distribution in Equation (F.2.1)—as a forecaster—scores optimally on the validation set.

The final model of the marginal distribution is $\hat F_{\Delta Z_t}(\cdot;\mathcal{T}_{\mathrm{fit}},\hat\ell,\hat\sigma_{\mathrm{MLE}},\hat\xi_{\mathrm{MLE}})$.

Note that the entire fitting set should not be used in Step 1, because out-of-sample data are needed for scoring in Step 2. Scoring in Step 2 should not occur on the same data set as is used in Step 1, since the entire empirical distribution would then be favoured over any GPD tail, due to the empirical distribution's optimal in-sample score.

The final model has a threshold of 0.2 (corresponding to the empirical 0.81-quantile), with scale and shape parameter estimates of 0.22 and 0.09, respectively, so that the discharge distribution is modelled to have a heavier tail than exponential. See Figure F.1 for QQ and PP plots of the model fit. Finally, the distribution estimates are transformed so that they estimate the distribution of $Z_t$.

[Figure F.1: QQ and PP plots of the marginal distribution estimate of deseasonalized discharge, against the empirical distribution for the fitting data (empirical versus GPD-model quantiles, and empirical versus GPD-model probabilities). Only the upper corners of the plots are shown, since both distributions are equal in the lower corners.]
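A minimal R sketch of this piecewise cdf is below, assuming `dz` holds the fitting-set values of $\Delta Z_t$; the helper name `pgpd_tail` is hypothetical, and the $\xi=0$ limiting case is omitted.

```r
# Empirical body below the threshold ell; GPD tail above it
pgpd_tail <- function(z, dz, ell, sigma, xi) {
  Femp  <- ecdf(dz)
  p_ell <- Femp(ell)                  # empirical mass at or below the threshold
  ifelse(z <= ell,
         Femp(z),                     # empirical part
         p_ell + (1 - p_ell) *
           (1 - pmax(1 + xi * (z - ell) / sigma, 0)^(-1 / xi)))  # GPD part
}

# Example at the reported estimates: threshold 0.2, scale 0.22, shape 0.09
# pgpd_tail(0.5, dz, ell = 0.2, sigma = 0.22, xi = 0.09)
```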
F.3 Marginal of Snowmelt

The marginal distribution of snowmelt is estimated in a similar way to the deseasonalization of discharge.

Let $X_t$ be the snowmelt observed on date $t$. The cdf of $X_t$ is assumed to depend only on the day of year $\delta(t)$, and is modelled with a weighted empirical distribution on the fitting set:
\[
\hat F_{X_t}(x) = \frac{\sum_{r\in\mathcal{T}_{\mathrm{fit}}} K_b\big(\delta(r)-\delta(t)\big)\,I_{[X_r,\infty)}(x)}{\sum_{r\in\mathcal{T}_{\mathrm{fit}}} K_b\big(\delta(r)-\delta(t)\big)},
\]
where $K_b$ is the tri-cubic kernel given in Equation (F.1.1). A bandwidth of $b=4$ is used. See Figure F.2 for diagnostic plots of the resulting model.

[Figure F.2: Empirical distribution functions of the probability integral transformed (PIT) snowmelt, using the estimated cdf of snowmelt, shown separately for the training, validation, and test sets. The PIT samples should be from a Uniform(0,1) distribution if the cdf estimate is correct, occurring when the empirical distribution falls on the diagonal.]

F.4 Dependence

This section discusses fitting vine copula models to the response and four predictors.

First, a 2-truncated vine is fit to the four predictors after applying a probability integral transform using the marginal models. Since we consider two marginal models for the deseasonalized discharge, we obtain two such vines. The vine array is chosen using sequential minimum spanning trees (cf. Joe, 2014, Section 6.17). Then, the following bivariate copula models are fit using MLE, and are selected using AIC with the function RVineCopSelect() from the R package VineCopula (Schepsmeier et al., 2016): Gaussian, t, MTCJ, Gumbel, Frank, Joe, BB1, BB7, and BB8. Coding the variables according to

1. one-day-ahead change in deseasonalized discharge (response),
2. change in deseasonalized discharge: lag 0,
3. change in deseasonalized discharge: lag 1,
4. drop in snowpack: lag 0, and
5. drop in snowpack: lag 1,

we obtain the vine array

    5  4  3  2
    0  5  5  3        (F.4.1)
    0  0  4  5

in both cases.
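The analysis above fixes the vine array and then selects pair-copula families with RVineCopSelect(); as a compact sketch, the wrapper RVineStructureSelect() performs both the MST-based structure search and the AIC-based family selection in one call. Here `u_pred` is an assumed four-column matrix of PIT-transformed predictor values (variables 2–5), and only the unrotated family codes are listed (reflected variants have separate rotation codes in VineCopula, e.g., 13, 23, 33 for the Clayton/MTCJ rotations).

```r
# Sketch: 2-truncated vine on the PIT-transformed predictors, structure by
# sequential minimum spanning trees, pair-copula families by AIC.
# Family codes: 1 Gaussian, 2 t, 3 Clayton (MTCJ), 4 Gumbel, 5 Frank,
# 6 Joe, 7 BB1, 9 BB7, 10 BB8.
library(VineCopula)

fit <- RVineStructureSelect(u_pred,
                            familyset     = c(1, 2, 3, 4, 5, 6, 7, 9, 10),
                            selectioncrit = "AIC",
                            trunclevel    = 2)   # 2-truncated vine
summary(fit)   # vine matrix, selected families, parameter estimates
```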
The resulting copula models corresponding to the edges in the vine array are similar in both vines, and are as follows:

Empirical marginal:

    bb8(6, 0.768)    bb8(2.43, 0.828)    bvtcop(0.612, 3.11)
    mtcjv(0.0847)    frk(1.14)

GPD-tail marginal:

    bb8(6, 0.768)    bb8(2.42, 0.831)    bvtcop(0.613, 2.98)
    mtcjv(0.0876)    frk(1.15)

The copula indicated in row $i$ and column $j$ corresponds to the edge in row $i+1$ and column $j+1$ of the vine array. The abbreviations of the copula models can be found in Table F.1.

    Abbreviation    Copula Family
    bvncop          Gaussian
    bvtcop          t
    mtcj            MTCJ
    gum             Gumbel
    frk             Frank
    joe             Joe
    bb1             BB1
    bb7             BB7
    bb8             BB8
    indepcop        Independence copula

Table F.1: Abbreviations of the copula families considered. A 'v' appended to an abbreviation represents the vertically reflected copula family, an appended 'u' represents the horizontally reflected copula family, and an appended 'r' represents the reflected copula family.

Next, pairing orders for the response were chosen so that the resulting PCBN is a vine (for computational simplicity):

• Order one: 2, 3, 5, 4
• Order two: 4, 5, 3, 2

The copula models considered for each edge are the independence copula, Gaussian, t, MTCJ, Gumbel, Frank, Joe, BB1, and BB8, and these are fit with CNQR using the sequential method. The IG and IGL copulas were also fit for edges that were not close to independence (to avoid computational difficulty), but these did not lead to an optimal score. If there were evidence of a non-constant CCEVI, then one might consider choosing the IG copula, so long as the resulting score is acceptable. The fitted copulas for each scenario can be seen in Table F.2. These copula families exhibit a range of tail dependence; see Table F.3 for a summary of the symmetry and tail dependence of these copula models.

    Empirical, Order 1    GPD tail, Order 1     Empirical, Order 2    GPD tail, Order 2
    bb1r(0.24, 1.6)       bvncop(0.61)          bvtcop(0.4, 16)       bvtcop(0.38, 14)
    bb1v(0.23, 1.2)       bvtcop(-0.38, 9.2)    bvtcop(-0.25, 23)     bvncop(-0.25)
    bb1r(0.015, 1)        frk(0.088)            bvncop(0.07)          gumr(1.1)
    bvtcop(0.29, 95)      bvncop(0.29)          bvncop(0.54)          frk(4.5)

Table F.2: Copulas linking the response and the predictors using CNQR. The marginal distribution, followed by the linkage order of the predictors, is indicated in the top row. The fitted copulas are displayed corresponding to the linkage order, from top to bottom.

    Family      Symmetry       Lower Tail Dependence          Upper Tail Dependence
    Gaussian    symmetric      intermediate                   intermediate
    t           symmetric      strong                         strong
    MTCJ        permutation    yes                            tail quadrant independence
    Gumbel      permutation    intermediate                   strong
    Frank       symmetric      tail quadrant independence     tail quadrant independence
    Joe         permutation    tail quadrant independence     yes
    BB1         permutation    yes                            yes
    BB7         permutation    yes                            yes
    BB8         permutation    tail quadrant independence     tail quadrant independence

Table F.3: Properties of some bivariate copula families. All families have members that are at least permutation symmetric (symmetric after swapping the two variables); those listed as "symmetric" also have reflection symmetry (a copula C is reflection symmetric if RC = C, where R is the reflection operator defined in Definition 2.4.1). Dependence in each tail of each copula is also indicated, in some cases by the strength of tail dependence (a concept related to tail order; see Joe, 2014, Section 2.16). These copulas cover a range of tail behaviours in comparison to the Gaussian copula.

Next, it was identified that variables 2, 3 and variables 1, 2 (both pairs are change in deseasonalized discharge separated by a lag of 1) have dependence that is not permutation symmetric (as seen in a normal scores plot). To accommodate this, a skewed BB1 copula was fit to these edges (for a GPD marginal). The BB1 copula family is described by parameters $\theta>0$ and $\delta>1$ (details can be found in Joe, 2014, Section 4.17.1), and the upper-skewed version has an additional parameter $\alpha\in(0,1)$ (larger values imply more skewness). The relationship between the copulas is
\[
C_{\mathrm{BB1sk}}(u,v;\theta,\delta,\alpha) = C_{\mathrm{BB1}}\big(u,v^{1-\alpha};\theta,\delta\big)\,v^{\alpha}
\]
for $(u,v)\in(0,1)^2$, where $C_{\mathrm{BB1sk}}$ and $C_{\mathrm{BB1}}$ are the skewed BB1 and BB1 copulas, respectively (a code sketch of this construction appears at the end of this section). Because the skewed BB1 family has three parameters, it was not considered for all edges, for computational simplicity. The skewed BB1 copula linking variables 2, 3 (in the predictor vine) has parameter estimates $\hat\theta=1.21$, $\hat\delta=1.44$, and $\hat\alpha=0.26$. This resulted in an improvement of approximately 49 AIC on the predictor vine. The resulting copulas linking the predictors and the response, fit by CNQR, are shown in Table F.4.

    Candidate 5: with skew
    bb1sk(0.41, 1.6, 0.17)
    bvtcop(-0.42, 7.8)
    frk(0.12)
    bvncop(0.27)

Table F.4: Copulas linking the response and the predictors using CNQR, for Order 1 with a GPD marginal, using a skewed BB1 copula linking change in deseasonalized discharge separated by a lag of 1. The fitted copulas are displayed corresponding to the linkage order, from top to bottom. "bb1sk" is short for "skewed BB1".

Of the five vine CNQR models, we must select an optimal one. We consider the mean scores (Figure F.3), calibration plots (Figure F.4), and calibration histograms (Figure F.5) on the validation set. All five appear to be generally acceptable. Candidate 5 has the best score, and produces forecasts with a GPD tail (as opposed to forecasts with a finite right endpoint, as in Candidates 1 and 3), so it is selected.

[Figure F.3: Mean score estimates of the candidate vine CNQR forecasters, estimated using the training set. Error bars represent the standard error of the mean estimate (which ignores autocorrelation).]

[Figure F.4: Calibration plots of the candidate vine CNQR forecasters, using the training set. Each appears to be sufficiently calibrated.]

[Figure F.5: Calibration histograms of the candidate vine CNQR forecasters, using the training set.]
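As promised above, a sketch of the skewed BB1 construction: the skewed cdf can be evaluated from the ordinary BB1 cdf in VineCopula (family code 7). The helper name `pbb1sk` is hypothetical.

```r
# Upper-skewed BB1 cdf: C_BB1sk(u, v) = C_BB1(u, v^(1 - alpha)) * v^alpha
library(VineCopula)

pbb1sk <- function(u, v, theta, delta, alpha) {
  BiCopCDF(u, v^(1 - alpha), family = 7, par = theta, par2 = delta) * v^alpha
}

# Evaluated at the estimates reported for the edge linking variables 2 and 3:
pbb1sk(0.9, 0.9, theta = 1.21, delta = 1.44, alpha = 0.26)
```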
F.5 Weights

This section discusses the choice of weight functions to be used for evaluating forecasters when predictors are large. We seek weight functions over the space of the original predictors (that is, not deseasonalized). Practically, the weight functions can be thought of as the "amount of concern" that one has upon observing the predictors.

First, consider a weight function on one predictor. We would like weights to transition from 0 at $-\infty$ to 1 at $+\infty$, and therefore choose the logistic curve, defined for location parameter $x_0\in\mathbb{R}$ and rate parameter $r>0$ as
\[
w_0(x;r,x_0) = \frac{1}{1+\exp\big(-r(x-x_0)\big)}
\tag{F.5.1}
\]
for $x\in\mathbb{R}$. We will use this function to construct the weight functions for each scenario; a code sketch of all of the weight functions follows the last scenario.

F.5.1 Scenario 1: Large Discharge

For this scenario, we suppose that observing a large discharge on either today or yesterday (lags 0 or 1) is enough for concern (i.e., one would not be "doubly concerned" upon seeing a large discharge two days in a row). An appropriate weight function, motivated by inclusion–exclusion, is
\[
w_1(x_2,x_3;r_D,x_{0D}) = w_0(x_2;r_D,x_{0D})+w_0(x_3;r_D,x_{0D})-w_0(x_2;r_D,x_{0D})\,w_0(x_3;r_D,x_{0D}),
\tag{F.5.2}
\]
where $(x_2,x_3)\in\mathbb{R}^2$ are the observed discharges at both lags (labels are chosen for consistency with the variable labels in Appendix F.4). We choose parameters $r_D\in\{0.02, 0.06, 0.1\}$ and $x_{0D}\in\{40, 90, 140\}$. The single-variable logistic function $w_0$ under these parameters can be seen contrasted against a histogram of discharge in Figure F.6.

[Figure F.6: Single-variable logistic weight functions resulting from the weight parameter choices for discharge (in m³/s). The histogram is that of discharge data on the test set (and is without scale).]

F.5.2 Scenario 2: Large Snowmelt

For this scenario, we suppose that observing a large snowmelt on both today and yesterday (lags 0 and 1) is "doubly concerning". An appropriate weight function is
\[
w_2(x_4,x_5;r_S,x_{0S}) = w_0(x_4;r_S,x_{0S})+w_0(x_5;r_S,x_{0S}),
\tag{F.5.3}
\]
where $(x_4,x_5)\in\mathbb{R}^2$ are the observed snowmelts at both lags (labels are again chosen for consistency with the variable labels in Appendix F.4). We choose parameters $r_S\in\{0.05, 0.2, 0.7\}$ and $x_{0S}\in\{0, 7.5, 15\}$. The single-variable logistic function $w_0$ under these parameters can be seen juxtaposed against a histogram of snowmelt in Figure F.7.

[Figure F.7: Single-variable logistic weight functions resulting from the weight parameter choices for snowmelt (drop in snowpack). The histogram is that of snowmelt data on the test set (and is without scale).]

F.5.3 Scenarios 3 and 4: Large Snowmelt and/or Large Discharge

Scenario 3 assigns a high weight if either a large snowmelt or a large discharge is observed. We suppose that observing both is "doubly concerning", and take a weight function of
\[
w_3(x_2,\dots,x_5;r_D,r_S,x_{0D},x_{0S}) = w_1(x_2,x_3;r_D,x_{0D})+w_2(x_4,x_5;r_S,x_{0S}),
\tag{F.5.4}
\]
where $(x_2,\dots,x_5)\in(0,\infty)^2\times\mathbb{R}^2$ are the observed lag-0 and lag-1 discharge and lag-0 and lag-1 snowmelt, respectively. The same parameters $r_D$, $r_S$, $x_{0D}$, and $x_{0S}$ as in Scenarios 1 and 2 are used here.

Scenario 4 assigns a high weight only if both a large discharge and a large snowmelt are observed. An appropriate weight function is thus
\[
w_4(x_2,\dots,x_5;r_D,r_S,x_{0D},x_{0S}) = w_1(x_2,x_3;r_D,x_{0D})\,w_2(x_4,x_5;r_S,x_{0S}).
\tag{F.5.5}
\]
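A minimal R sketch of the weight functions (F.5.1)–(F.5.5) is below; the parameter values in the example line are one choice from the grids above.

```r
w0 <- function(x, r, x0) 1 / (1 + exp(-r * (x - x0)))     # logistic curve (F.5.1)

w1 <- function(x2, x3, rD, x0D) {                         # either lag large (F.5.2)
  a <- w0(x2, rD, x0D); b <- w0(x3, rD, x0D)
  a + b - a * b                                           # inclusion-exclusion
}

w2 <- function(x4, x5, rS, x0S)                           # both lags "doubly
  w0(x4, rS, x0S) + w0(x5, rS, x0S)                       # concerning" (F.5.3)

w3 <- function(x2, x3, x4, x5, rD, rS, x0D, x0S)          # large discharge or
  w1(x2, x3, rD, x0D) + w2(x4, x5, rS, x0S)               # snowmelt (F.5.4)

w4 <- function(x2, x3, x4, x5, rD, rS, x0D, x0S)          # large discharge and
  w1(x2, x3, rD, x0D) * w2(x4, x5, rS, x0S)               # snowmelt (F.5.5)

# Example: moderate discharge two days running, little snowmelt
w4(100, 120, 1, 0, rD = 0.06, rS = 0.2, x0D = 90, x0S = 7.5)
```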
