UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Risk region estimation for light-tailed multivariate samples Hou, Yiwei 2017

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata


24-ubc_2017_november_hou_yiwei.pdf [ 2.37MB ]
JSON: 24-1.0357224.json
JSON-LD: 24-1.0357224-ld.json
RDF/XML (Pretty): 24-1.0357224-rdf.xml
RDF/JSON: 24-1.0357224-rdf.json
Turtle: 24-1.0357224-turtle.txt
N-Triples: 24-1.0357224-rdf-ntriples.txt
Original Record: 24-1.0357224-source.json
Full Text

Full Text

Risk Region Estimation for Light-tailed MultivariateSamplesbyYiwei HouB.Econ Renmin University of China, 2015A THESIS SUBMITTED IN PARTIAL FULFILLMENTOF THE REQUIREMENTS FOR THE DEGREE OFMaster of ScienceinTHE FACULTY OF GRADUATE AND POSTDOCTORALSTUDIES(Statistics)The University of British Columbia(Vancouver)October 2017c© Yiwei Hou, 2017AbstractEstimation of multivariate quantile regions with very small probabilities, referredto as risk regions in this report, plays an important part in various applications. Yet,it is a difficult problem since such regions contain hardly any or no data. Existingmethods address the problem only for heavy-tailed distributions or for bivariatedistributions with non-degenerate exponent measure that do not include the casesof asymptotic independence.In this report, we propose a more flexible framework to supplement existingapproaches to risk region estimation by allowing tails of the underlying distribu-tion to be light as well as covering situations of tail independence. In particular,we concentrate on a class of distributions assumed to have a density function withhomothetic level sets. In simulation studies, reasonable performance of our pro-posed method is demonstrated. We also present two real-life applications to furtherillustrate the flexibility and performance of our method.iiLay SummaryIdentifying events that are very unlikely to happen is very important in various ap-plications as these events could be viewed as signals for extreme risk. One is ofteninterested in multiple factors associated with such events at the same time. Forinstance, when assessing future asset value of a company, there is a need to inves-tigate how some of the fundamental factors, like equity prices, yields for differentbond maturities and bond mortality rate and etc., behave jointly, especially casesthat rarely happen yet could result in potential high losses. In this report, we sup-plement existing methods and propose a more flexible framework to estimate theregions associated with very small probabilities that could be utilized to identifysuch extreme events. We demonstrate the performance of proposed method boththrough simulated data and real-life data.iiiPrefaceThis dissertation is original, unpublished, independent work by the author, YiweiHou. The estimation algorithm of the limit set approach described in Section 3.2.2was implemented by Allen Lee. The implementation code for Reversible JumpMCMC algorithm in Section 3.2.1 was build upon the work from Vincent Zhai,where I modified the prior structure of the knot locations and tuned the parameters.The algorithm to perform importance sampling in Section 4.1.3 is of my designand implementation. All simulation studies and data examples are suggested bymy supervisor, Professor Natalia Nolde and are of my original work.ivTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiLay Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiiAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42.2 Density generator . . . . . . . . . . . . . . . . . . . . . . . . . . 72.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82.4 Simulation from homothetic densities . . . . . . . . . . . . . . . 102.5 Relation to D-class of Distributions . . . . . . . . . . . . . . . . 133 Estimation of Multivariate Risk Regions . . . . . . . . . . . . . . . . 173.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173.2 Estimation of the shape set . . . . . . . . . . . . . . . . . . . . . 20v3.2.1 Initial estimate via the limiting shape of sample clouds . . 203.2.2 Inferences on the directional dispersion . . . . . . . . . . 233.3 Estimation of the density generator . . . . . . . . . . . . . . . . . 263.3.1 Maximum likelihood estimation . . . . . . . . . . . . . . 273.3.2 The estimator based on asymptotic assumptions . . . . . . 273.4 Estimation of the scaling factor . . . . . . . . . . . . . . . . . . . 304 Numerical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324.1 Simulation settings . . . . . . . . . . . . . . . . . . . . . . . . . 324.1.1 Data generation . . . . . . . . . . . . . . . . . . . . . . . 344.1.2 Simulation methods . . . . . . . . . . . . . . . . . . . . 354.1.3 Estimation assessment . . . . . . . . . . . . . . . . . . . 374.2 Simulation results . . . . . . . . . . . . . . . . . . . . . . . . . . 385 Data Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525.1 Example 1: Australian Institute of Sport data set revisited . . . . . 525.2 Example 2: Financial application . . . . . . . . . . . . . . . . . . 556 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59viList of TablesTable 4.1 Three points from the NE2 density with large estimated radialcomponents from the limit set approach. Their coordinates, truevalues and estimated values of the dispersion function are alsopresented. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43Table 4.2 Root mean square error for MLE estimator of the density gen-erator parameters . . . . . . . . . . . . . . . . . . . . . . . . 46Table 4.3 Root mean square error for the estimator of the density genera-tor based on the asymptotic result . . . . . . . . . . . . . . . 46Table 4.4 Mean and the standard deviation of the ratio|D∆Dˆ||D| of the threemethods for the four densities with η = 0.2,0.5,0.8 in the scalematrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47Table 4.5 Mean and standard deviations for the estimated scaling factorsusing RJMCMC with true density generator f0 and estimatedfˆ0 for p = 1/200,1/500,1/1000 based on 100 samples of size1000 from the four densities. True values are also included. . . 49Table 4.6 Median of the relative error P(Qp∆Qˆp)/p of the three methodsfor p = 1/200,1/500,1/1000 with the four densities. . . . . . 50Table 5.1 Logarithms of Bayes factors for the two methods and the fourpairs of variables with respect to the normal alternative . . . . 53Table 5.2 Expected and observed numbers of points in Qp and Qˆp for p=1/200,1/500,1/1000. . . . . . . . . . . . . . . . . . . . . . . 56viiList of FiguresFigure 2.1 Sample clouds with n = 1000 data points from four densitiesin HD whose shape sets are specified by (1) equation (2.3.1)with diagonal entry of Σ as 1 and off-diagonal entry as 0.5;(2)equation (2.3.2) with α= (−1,6) and the scale matrix Ω = I;(3) equation (2.3.3) with λ = 1,θ = 2; (4) equation (2.3.4).The density generator is equation (2.4.3) for all four samples. 14Figure 4.1 True and estimated shape sets and dispersion functions fromthe three methods based on one sample of size 1000 from E1and E2 densities. The off-diagonal entry in Σ is set to be η =0.5. The number of intervals is k = 6 in the limit set approach. 40Figure 4.2 True and estimated shape sets and dispersion functions fromthe three methods based on one sample of size 1000 from NE1and NE2 densities. The off-diagonal entry in Σ is set to beη = 0.5. The skewness parameter is α = (−1,6). The numberof intervals is k = 6 in the limit set approach. . . . . . . . . . 41Figure 4.3 True and estimated dispersion functions from the limit set ap-proach based on same single sample of size 1000 from the fourdensities. The off-diagonal entry of Σ is η = 0.5 and the skew-ness parameter is α = c(−1,6). The number of intervals is setto k = 4,10,15. . . . . . . . . . . . . . . . . . . . . . . . . . 42viiiFigure 4.4 MLE estimates for the shape and scale parameters of the Weibulldensity based on the same 100 samples of size 1000 from thefour densities with η = 0.5 and α = c(−1,6). True valuesof the parameters are indicated by the red line segments. Theshape of the level sets is estimated with limit set approach. . . 44Figure 4.5 Estimated β and c based on 100 samples of size 1000 fromthe four densities using the estimator bases on the von Misesassumption. True values of the parameters are indicated by thered line segments. The shape of the level sets is estimated withlimit set approach. . . . . . . . . . . . . . . . . . . . . . . . 45Figure 4.6 True and estimated risk regions for p = 1/200,1/1000 basedon a single sample of size 1000 from the four densities. . . . . 48Figure 5.1 Contour plots of the posterior predictive D-class densities ofthe four pairs of the variables using RJMCMC with fˆ0 (toppanel) and the estimated risk regions for p = 1/200 (bottompanel). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54Figure 5.2 Estimated risk regions of daily returns of the two stocks for p=1/200,1/500,1/1000. The size of the data set is n = 13865. . 55ixAcknowledgmentsFirst and foremost, I would like to thank my supervisor, Professor Natalia Nolde.It was a great experience to work under her guidance. I am sincerely gratefulfor her constant support, patience and invaluable advice throughout my researchwork as well as for her kindness, thoughtfulness and encouragement in personalmatters. I also want to thank Professor Harry Joe for being my second reader andfor generously spending time reading my thesis and providing helpful feedback.A big thank you to Allen Lee and Vincent Zhai for their previous work relatedto this project. Allen developed the algorithm to estimate the shape of the limitset and Vincent provided the implementation code for the algorithm in the D-classframework. Interacting with these brilliant students and building upon their workhas been an extremely rewarding experience.I also wish to thank everyone in the department for the ever supportive envi-ronment. Thank you to my wonderful office mates, Seong, Jasmine and Bo, for allthe inspiring discussions and so many happy memories. Thank you to my friends,Qiong, Jonathan, Yijun, Tingting, Julian for their help and advice and for beingthere whenever I needed.Lastly my deepest gratitude goes to my dear parents for a warm and lovingfamily, for being so caring and encouraging all the time and for their love andsupport.xChapter 1IntroductionAccurate assessments for the probabilities of multivariate extreme events are ofgreat importance in various applications including finanical risk management (stresstesting)[McNeil and Smith, 2012], air pollutant regulations [Heffernan and Tawn,2004] and Internet traffic modelling [Resnick and Rootze´n, 2000]. For example,in order to predict high pollution concentrations and better understand the cause,there is a need to investigate the joint extremes of certain combinatons of air pollu-tants. If extreme concentrations of several pollutants are detected at the same time,examinations of their physical and chemical processesses can be used as an alarmsystem in air quality regulation. In the univariate setting, quantiles are essentialtools for detecting and analyzing extremes, see de Haan and Ferreira [2007]. It isnatural to extend the notion of quantile to multidimensional setting. However, themultivariate case is much more complicated. First of all, there is no unique wayto order multivariate data, thus giving rise to a number of possible definitions ofmultivariate quantiles. In this report, assuming that the data can be described by amultivariate probability density, we consider quantile regions of the formQp := {x ∈ Rd | f (x)< cp} (1.0.1)for a random vector X with density f on Rd . Here p is the probability level andcp is chosen such that P(X ∈ Qp) = p. Second, we are interested in cases where pis small. In such cases, we refer to Qp as the risk region. Estimation of such risk1regions is difficult since p may be so small that there is hardly any or no data inQp. Moreover, we need to estimate a set instead of only one value in the univariatesetting, with very limited data.There is an ongoing development of sophisticated statistical methods to esti-mate multivariate risk regions in the literature. Cai et al. [2011] propose an es-timator for extreme risk regions under multivariate regular variation and demon-strate good performance of the estimator through a simulation study. Einmahl et al.[2013] propose a semi-parametric estimator for bivariate quantile regions estima-tion assuming the existence of non-degenerate exponent measure. In this report, wediscuss another class of distributions assumed to have a density function with ho-mothetic level sets. These densities can be specified by two quantities: a set deter-mining the shape of the level sets (shape set), and a decreasing function governingthe rate of decay of the tails (the density generator). Such distributions generalizethe family of elliptical distributions, moving away from the elliptical symmetry.We furthermore assume that these distributions have light tails or, equivalently,that the density generator is light-tailed. Detailed discussion of densities with ho-mothetic level sets is provided in Chapter 2. The above mentioned three classescontain a variety of distributions. However, it is an important feature of the modelin Cai et al. [2011] that distributions are heavy-tailed. The method in Einmahl et al.[2013] does not include the cases of asymptotic independence.For densities with homothetic level sets, there is a simple way to express riskregions as risk regions inherit the common shape of the level sets. One only needsto expand the shape set by a factor so that the condition P(X ∈Qp) = p is fulfilled.In this report, we propose a two-step method to estimate the risk regions for densi-ties with homothetic level sets. The method can be broken down into the followingtwo steps:• estimating the shape of level sets;• estimating the (tail of the) density generator.There are few methods available in the literature dealing with inference for densi-ties having homothetic level sets of non-specified shape. Ferreira and Steel [2005]introduce the directionally dispersed class of multivariate distributions and propose2a flexible implementation for modelling directional dispersion in any dimensionwith hyperspherical log-splines. We show that under certain assumptions, densi-ties with homothetic level sets coincide with the class of the directionally disperseddistributions. We adopt the inference framework in Ferreira and Steel [2005] aspart of our alogorithm. However, their method requires the function governing therate of decay of the tails to be pre-specified and thus can only be applied when theprior knowledge of the density generator is available. This may not be practicalin many real-life applications. Motivated by the idea that the limiting shape ofsample clouds from light-tailed distributions gives the impression of the shape forthe level sets, we propose to obtain an initial estimate of the density generator viaan estimate of the limiting shape of sample clouds. We compare two approachesfor estimation of the density generator: maximum likelihood estimate assuminga Weibull type distribution and an estimator based on asymptotic results. The as-sumptions and estimation process are discussed in Section 3.2. Combined with thisintial estimate, the method in Ferreira and Steel [2005] can be used to model theshape of level sets.Based on the shape, we can then estimate how much we should extrapolate toget the risk regions of desired probability levels. Essentially this is a univariateextreme value problem. In this report, we adopt the estimator assuming Weibull-tailed distributions; see Asimit et al. [2010]. The two-step algorithm is discussedin detail in Chapter 3.Simulation results are presented in Chapter 4. We show that our approach per-forms comparably well as other existing estimators for distributions with ellipticallevel sets when the simulated data do come from an elliptical model. In caseswhere the underlying model is no longer elliptical, our method captures the shapeand range of risk regions and outperforms the estimators based on the ellipticityassumption. A data example and a financial application are given in Chapter 5. InChapter 6, we provide discussion of our work and possible future studies.3Chapter 2FrameworkIn this report, we address the question of risk region estimation for a particular classof densities whose level sets are homothetic. These densities can be completelyspecified by two quantities: a set determining the shape of the level sets, and adecreasing function governing the rate of decay of the tails. The structural simplic-ity makes them a good starting point in the investigation of risk region estimation.Furthermore, we assume the densities are light-tailed. The precise definition andconditions are stated below.2.1 DefinitionsDensities with homothetic level sets are a class of densities whose level sets are allscaled copies of some given open set D, with respect to some origin. Consider arandom vector X that has a density f with homothetic level sets with respect to theorigin on Rd . we have:{ f > c} := {x ∈ Rd | f (x)> c}= rcD,0 < c < c0 := sup f , (2.1.1)where D is the shape set and the constant rc is the scaling factor correspondingto the value of c. A typical example of these densities is the centered multivariatenormal density. For any fixed value of c, the set { f > c} is an ellipsoid described bythe covariance matrix. Any point on the boundary of the ellipsoid gives the constantdensity. For various values of c, a family of ellipsoids at different elevations can be4obtained and viewed as scaled copies of some set D.In order to incorporate densities with homothetic level sets that live on theopen half space or on the positive orthant, we assume that D is a bounded openstar-shaped set on Rd containing the origin.Definition 1. A set D is star-shaped with respect to the origin if x ∈ D impliestx ∈ D for all 0 < t < 1.It means that a star-shaped set D contains the line segment between any pointx∈D and the origin. For such a set D, there exists a unique, homogeneous functionof degree one, nD: Rd → [0,∞), such that (i) D = {x ∈ Rd |nD(x) < 1} and (ii)nD(tx) = tnD(x) for t > 0,x ∈ Rd (homogeneity property). The function nD iscalled the gauge function of set D and gives an analytic description of D. Forinstance, the function nD is a norm on Rd with the open unit ball D if D is convexand symmetric, D =−D. We will give some examples of nD later in this chapter.Furthermore, we restrict our attention to bounded, open, star-shaped sets withcontinuous boundary. With respect to the origin, the boundary of the shape set∂D = {x ∈ Rd |nD(x) = 1} can be written as ∂D = {r(ω)|ω ∈ Sd−1} in terms of afunction r on the unit sphere Sd−1 in Rd . We say that D has continuous boundary ifthe function r is continuous and positive on Sd−1. With the homogeneity propertyof the gauge function, we can prove that r(ω) =1nD(t(ω))where t(ω) ∈ Rd isthe polar coordinate, for ω on the unit sphere Sd−1. From this it follows that r(ω)is continuous if nD(t(ω)) is. In fact, if each ray from the origin intersects theboundary ∂D in exactly one point, the gauge function is continuous. See Balkemaand Nolde [2010], Proposition 3.2.A formal definition for the class of densities with homothetic level sets HD(µ,θ)is given below.Definition 2. Let Dd denote the class of all bounded open star-shaped sets D⊂Rd ,for which the open cone D∞ = ∪nnD is convex, and for which the gauge functionnD is continuous on this cone. A density f on Rd belongs to the class HD(µ,θ) ifthe shape set D ∈ Dd and if f has the form f (x|µ,θ) = f0(nD(x−µ)) for x ∈ Rd ,where µ is the location parameter, θ summarizes the parameters associated withthe density generator f0 and nD is the gauge function of the shape set D = {x ∈5Rd |nD(x− µ) < 1}. The function f0 : (0,∞)→ [0,∞) is decreasing, positive andcontinuous.From the definition above, it is easy to see that densities in HD are unimodalsince all the level sets { f > c},c ∈ (0,c0) are star-shaped. They are also invariantunder linear transformations. If X ∈ Rd has a density fX ∈ HD with the shape setD, then Y = AX has a density fY ∈ HD with its shape set (star-shaped) being theimage of D under the linear transformation.Densities in HD on Rd allow the flexibility of modeling distributional asym-metry in multivariate distributions, in the sense that they generalize the class ofspherical distributions. There has been much progress and an increased interest inflexible classes of distributions that can model distributional asymmetry in recentyears. Here we briefly review some related classes of distributions in the literature.Spherical distributions [Fang et al., 1990] can be characterized by having aconstant dispersion along every direction. The lp-spherical densities [Osiewalskiand Steel, 1993] allow the level sets to be balls in lp norm for any p ≥ 1, thusextending the spherical distributions under different norms. Elliptically symmetricdensities can be obtained as affine transformations of the spherical distributions.The isodensity sets can thus be relaxed to become elliptical. In some applications,elliptical symmetry may be too restrictive and a further extension can be obtainedby relaxing the shape of the density level sets. The v-spherical densities [Fernandezet al., 1995] generalize elliptical distributions by introducing a scale function vthat plays the same role as the gauge function nD, where symmetry is no longerimposed and the dispersion is allowed to vary with the direction. In fact, under theframework of v-spherical distributions, densities f ∈HD form a subclass where thelevel sets are bounded and star-shaped and f is continuous on D∞. The D-class ofdistributions [Ferreira and Steel, 2005] provides a different and more operationalparameterization of the v-spherical distributions. We provide more details aboutthe D-class distributions and comparison between densities with homothetic levelsets in Section Density generatorAs before, suppose X on Rd has a density f ∈ HD with the form of f (x|µ,θ) =f0(nD(x− µ)). We assume that the function f0 is continuous on (0,∞), positive,strictly decreasing. The behavior of the density generator f0 at infinity determinesthe decay of the tails for f . This report concentrates on densities that have light tailsand thus we assume that f0 is of exponential decay. In particular, this condition isformalized by assuming that f0 is a von Mises function.Definition 3. A function e−ψ on [0,∞) is a von Mises function with scale function(also called the auxiliary function), c(r) = 1ψ ′(r) if ψ is a C2 function with ψ ′ > 0,and c′(t)→ 0 for t→ ∞.Following the proof of Proposition 1.1(a) in Resnick [1987], we havef0(r+ vc(r))f0(r)→ e−v,r→ ∞,v ∈ Rd .This means that von Mises functions characterize the maximum domain of attrac-tion of the Gumbel distribution. By assuming f0 is a von Mises function, we ensurethat the marginals of densities with homothetic level sets will lie in the Gumbelmaximum domain of attraction. Below we showcase two examples of von Misesfunctions.Example 1. Consider a centered multivariate normal densityf (x) = k exp(−xTΣ−1x/2)with x ∈ Rd and k being the normalizing constant. So here the gauge functionnD(x) =√xTΣ−1x and f0(r) = ke−r2/2,r > 0. Up to a constant, ψ(r) = r2/2, sothat the scale function c(r) = 1/ψ ′(r) = 1/r > 0, and clearly c′(r)→ 0 as r→ ∞.Example 2. Consider a function of the form : f0(r) = arbe−prθ,r > 0,a, p,θ >0. The value of b can be determined to guarantee that function ψ has a positivederivative. Up to a constant, ψ(r) = prθ − blnr, so the scale function c(r) =1/ψ ′(r) = r/(pθrθ −b), and c′(r) = pθrθ (1−θ)−b(pθrθ −b)2 → 0 as r→ ∞.7Apart from the assumption on the rate of decay for the density generator,another natural question to ask about the function f0 is as follows: for a givenshape set D, what conditions does f0 have to satisfy in order that the functionf (·) = f0(nD(·)) is a proper probability density on Rd?Observe thatP((X−µ) ∈ tD)= ∫tDf0(nD(x−µ))d(x−µ)= f0(t)|tD|+ |D|∫ t0rd |d f0(r)|, t > 0,where | · | denotes the volume of a set and the last equality comes from viewing theset below the graph of f as a pile of thin D-shaped slices. Since f0 is decreasingand |tD|= td |D|, by partial integration, the probability can be re-written asP((X−µ) ∈ tD)= |D| f0(t)td−|D|∫ t0rdd f0(r)= |D| f0(t)td− [|D|td f0(t)−0−|D|∫ t0f0(r)drd ]=∫ t0d|D|rd−1 f0(r)dr, t > 0. (2.2.1)Letting t go to infinity, we obtain the condition for the density generator f0 toensure that f (·) = f0(nD(·)) is a proper probability density:d|D|∫ ∞0rd−1 f0(r)dr = 1. (2.2.2)This is the same condition as Equation (5) in Fernandez et al. [1995].2.3 ExamplesAs mentioned earlier, in order to give an analytic expression for a density withhomothetic level sets determined by a shape set D, it is necessary to specify thegauge function of D and a density generator. In this section we give some ex-amples of shape sets with their corresponding gauge functions. Some of theseexamples will form the basis of distributions to be considered later in this report.In such cases, we also give explicit expressions for the area |D| of D in the bivariate8case, which has to be computed to specify the corresponding density generator; seeequation (2.2.2).(1) The gauge function of the formn2D(x) = xTΣ−1x (2.3.1)describes an ellipsoid in Rd , which defines level sets of densities of elliptical dis-tributions with scale matrix Σ. The volume of D is |D|= pi√|Σ|.(2) Another example of a gauge function isn2D(x) = xTΩ−1x+(αT x)21{αT x<0}, (2.3.2)where 1{·} is an indicator function and outputs the value of 1 when the conditionis satisfied and 0 otherwise. Equation (2.3.2) characterizes an asymmetric (forα 6= 0) convex set. This set arises as a limit for suitably scaled sample clouds froma multivariate skew-normal distribution on Rd with shape parameter α and scalematrix Ω; see Balkema and Nolde [2010]. The parameter α controls the degree ofasymmetry. The area of D ∈ R2 is |D|= (|E1|+ |E2|)/2, where Ω=[1 ωω 1],|E1|= pi√1−ω2, |E2|= pi√1−ω21+α21 +2ωα1α2+α22for the ellipses E1 and E2 given byE1 = {(x,y)|x2−2ωxy+ y2 < 1−ω2},E2 = {(x,y)|(1+α21 (1−ω2))x2−2(ω−α1α2(1−ω2))xy+(1+α22 (1−ω2))y2 < 1−ω2}.(3) The following gauge functionnθD(x) = ((λ +d)||x||θ∞−||x||θθ ))/λ , x ∈ Rd ,θ ,λ > 0 (2.3.3)where || · ||θ is the `θ -norm is derived as a limit for the shape of sample clouds9from a class of meta distributions; refer to Balkema et al. [2010] for details. Whenθ = 2, the area of D is equal to|D|= 4λ√λ +1log(√λ +1+1√λ).(4) Finally, if a set of interest is a triangle with vertices (−1,0), (0,−1) and(1,1), its gauge function is given bynT (x1,x2) = max(−x1− x2, 2x1− x2, 2x2− x1), (x1,x2) ∈ R2. (2.3.4)The area of this triangle is 3/2.2.4 Simulation from homothetic densitiesWe now provide a stochastic representation for random vectors with a densityf ∈ HD. This representation can be utilized to generate random samples fromdistributions with densities in HD.Proposition 1. Consider a random vector X on Rd with density f ∈ HD(µ,θ) ofthe form f (x) = f0(nD(x− µ)); see definition 1. Let s` = f−10 (l) for ` ∈ (0, l0),where `0 := sup f . Let random variable L have density fL(`) = |s`D|, ` ∈ (0, `0),and define S = f−10 (L). Let random vector U be uniformly distributed on D ={x ∈ Rd |nD(x) < 1}, and be independent of L (and S). Then X has the followingstochastic representation:X= SU+µ. (2.4.1)Proof. First note that fL(`) = |s`D| is indeed a probability density on interval(0, `0):1 =∫Rdf (x)dx =∫Rdf0(nD(x−µ))dx =∫ `00|s`D|d`,which can be seen by viewing the graph below density f as a stack of thin D-shaped10slices s`D at levels `. The density of S = f−10 (L) is given byfS(s) = fL( f0(s)) | f ′0(s)|= | f−10 ( f0(s))D| | f ′0(s)|= |sD| | f ′0(s)|= sd |D| | f ′0(s)|, s > 0, (2.4.2)and the density of U isfU(u) =1|D|1{nD(u)<1}, u ∈ Rd .Consider now the distribution function of X. By independence of S and U, wehave:P(X≤ x) = P(SU+µ ≤ x) =∫ ∞0P(SU+µ ≤ x|S = s) fS(s)ds=∫ ∞0P(U≤ (x−µ)/s) fS(s)ds=∫ ∞0(∫· · ·∫ x−µs−∞1|D|1{{nD(u)< 1}}du)fS(s)ds=∫ ∞0((∫· · ·∫ x−∞1|D|1{{nD(v−µs)< 1}}s−ddv)fS(s)ds=∫ ∞0(∫· · ·∫ x−∞1|D|1{{nD(v−µ)< s}}s−ddv)fS(s)ds,the last line makes use of the homogeneity of gauge function nD. Changing theorder of integration gives:P(X≤ x) =∫· · ·∫(−∞,x)1|D|(∫ ∞nD(v−µ)fS(s)s−dds)dv.It remains to show that the integrand is indeed the density of X. Plugging in the11expression (2.4.2) for fS, we get1|D|∫ ∞nD(x−µ)fS(s)s−dds =1|D|∫ ∞nD(x−µ)sd |D|| f ′0(s)|s−dds=−∫ ∞nD(x−µ)d f0(s)= f0(nD(x−µ))= f (x)as desired.The proposition motivates the following algorithm to simulate data from a dis-tribution with the density f (x) = f0(nD(x−µ)).Algorithm 1 Generating from a density with homothetic level sets f ∈ HD(µ,θ)1: generate U1,U2, · · · ,Un uniformly on shape set D specified by the gauge func-tion nD;2: sample S1,S2, · · · ,Sn from the distribution with density fS(s) = sd |D|| f ′0(s)|;3: take the products and adjust them with the location µ to obtain samples SiUi+µ, i = 1,2, · · · ,n of size n from f (x) = f0(nD(x−µ)).Here we showcase the detailed generation process for a particular density gen-eratorf0(t) = (2|D|)−1e−t2/2, t > 0, (2.4.3)which forms the basis of some distributions we will consider later in simulationstudy.Example 3. Consider a bivariate probability distribution, whose density f ∈ HDwith shape set D ⊂ Rd and the density generator of the form in equation (2.4.3).The density of random variable S in equation (2.4.2) is given byfS(s) = (2|D|)−1|D|s3e−s2/2 = 12s3e−s2/2, s > 0.Observe that a transformed variable S¯ = S2/2 has a gamma distribution with12shape parameter α = 2 and rate parameterβ = 1, sincefS¯(s) = fS(√2s)/√2s ∝ se−s.Hence to generate a sample of iid copies of S, we first generate a random samplefrom the distribution of S¯ and then apply the inverse transformation S=√2S¯. Thenwe generate a sample {U} uniformly from the shape set D. Finally we multiplyU with S to obtain a sample from f centered at (0,0), with level sets that areall scaled copies of shape set D. Sample clouds for densities with homotheticlevel sets specified by the gauge functions in equation (2.3.1), equation (2.3.2),equation (2.3.3) and equation (2.3.4) are shown in Figure Relation to D-class of DistributionsAs shown in Figure 2.1, densities in HD give us the flexibility to model directionalirregularities in the data clouds. By specifying different gauge functions, we couldgenerate a wide class of unimodal distributions. In order to exploit the generalityof this distribution class , we need a very flexible modeling framework to conductinferences for these distributions. In this report, we adopt the framework proposedby Ferreira and Steel [2005]. They introduce a class of multivariate distributionsthat the densities depend on directions of points. These distributions allow the dis-persion vary along each direction. Ferreira and Steel [2005] name this flexible classof distributions the directionally dispersed class, abbreviated to D-class. They pro-pose a very flexible, yet practical framework for modeling directional dispersionin any dimension, which we use for inferences on the shape sets of densities in HDas part of our proposed method. Here we provide more details about the relation-ship between HD class and the D-class and show how the framework for modelingdirectional dispersion could be adopted in our setting.D-class distributions can be viewed as a generalization of the elliptical dis-tributions. Let Σ be a d× d scale matrix and µx ∈ Rd be the location parameter.Consider a random vector X onRd with an elliptical distribution with density given13Figure 2.1: Sample clouds with n = 1000 data points from four densities inHD whose shape sets are specified by (1) equation (2.3.1) with diagonalentry of Σ as 1 and off-diagonal entry as 0.5;(2) equation (2.3.2) withα= (−1,6) and the scale matrix Ω = I; (3) equation (2.3.3) with λ =1,θ = 2; (4) equation (2.3.4). The density generator is equation (2.4.3)for all four samples.byf (x|µx,Σ) ∝ g(d){(x−µx)TΣ−1(x−µx)}.The direction of x− µx is given by dx−µx ∈ Sd−1, the unit sphere in Rd . Immedi-14ately the density can be written asf (x|µx,Σ) ∝ g(d){(x−µx)T (x−µx)D(dx−µx)},where D(dx−µx) = (dTx−µxΣ−1dx−µx)−1. It is clear that the direction of x−µx playsa part when calculating the density values. Here D(dx−µx) models the dispersion ofx−µx along dx−µx . D-class of distributions extend the elliptical class by droppingthe specific, rigid form for the direction dispersion. The definition of D-class ispresented below; see Ferreira and Steel [2005].Definition 4. A density f on Rd belongs to the D-class DCd(µ,g(d),D), if it hasthe following representation:f (x|µ,g(d),Dd) = 1KD g(d){(x−µ)T (x−µ)Dd(dx−µ)}, (2.5.1)where dx−µ = (x−µ)/||x−µ|| with || · || being the l2 norm. Function Dd is calledthe directional dispersion function and is a function from the unit sphere Sd−1 inRd to R+. Constant KD is a normalizing factor and µ is the location parameter.Function g(d) is the density generating function.By controlling the directional dispersion function, it is possible to generatea very rich and flexible class of distributions. Ferreira and Steel further provethe necessary and sufficient condition on the dispersion function Dd so that theequation (2.5.1) is indeed a probability density function and give an explicit formof the constant KD. The conditions and the value of KD do not depend on either onlocation or density generating function. As such, a direction dispersion functionthat is valid for a particular choice of location and density generating function isalso valid for any other locations and generating functions; see Ferreira and Steel[2005] for more details.The D-class of distributions is as rich as the flexible v-spherical class of Fer-nandez et al. [1995] but with a more operational parameterization. In fact, densitiesin HD form a subclass of both v-spherical class and D-class, where the level sets arebounded and star-shaped and f is continuous on D∞. Consider a density f ∈ HD15on Rd , the density can be written asf (x) = f0(nD(x−µ))= f0(||x−µ||nD(dx−µ)).Thus, in this D-class framework, density f corresponds to specifying f0 and nDsuch thatf (x) = f0(||x−µ||nD(dx−µ))= 1KD g(d){ ||x−µ||2D(dx−µ)}.Given f0 and g(d), the gauge function nD and the directional dispersion D have aone-to-one relationship.Example 4. Consider a multivariate normal distribution f onRd with mean µ andthe covariance matrix Σ. Here the gauge function isnD(x−µ) =√(x−µ)TΣ−1(x−µ),and the density generator function is f0(r) ∝ e−r2/2. In D-class framework, thedirectional dispersion D(dx−µ) = (dTx−µΣ−1dx−µ)−1 and the generating functiong(d)(r) ∝ e−r/2.The gauge function and the directional dispersion are related throughn2D(dx−µ) =1D(dx−µ).16Chapter 3Estimation of Multivariate RiskRegionsThe problem of risk region estimation for densities in HD can be split into two parts:we first estimate the shape of the level sets and obtain the estimate for the scalingfactor from the sample of radial components with existing methods in univariateextreme value analysis; then we get the estimated risk region by extrapolating theshape set with the scaling factor.3.1 OverviewIn the univariate case, a natural order makes it convenient and straightforward todefine quantile function as the inverse of the distribution function. However, inhigher dimensions, there is no unique way to order multivariate data. Thereforethere are many possible extensions of quantile definitions for multivariate distri-butions. Assuming that the distribution of a random vector of interest X can bedescribed by a probability density function f , we adopt the definition of the multi-variate quantile region Qp at probability level p given by equation (1.0.1), page 1.In this report, we will be interested in cases where p is small. We refer to quantileregions as risk regions for p small. That is, there is a very small chance that a datapoint from the underlying distribution hit the risk region Qp.Below we outline the approach we use in the investigation of risk region esti-17mations. When the density function f ∈HD(µ,θ), there is a simple way to expressthe risk region in terms of the location, the shape set and a scaling factor relatedto the density generator. The density f has the form of f (x) = f0(nD(x−µ)) withthe shape set D and density generator f0. Observe thatQp = {x ∈ Rd | f0(nD(x−µ))< cp}= {x ∈ Rd |nD(x−µ)> f−10 (cp)}= µ+{y ∈ Rd |nD(y)> f−10 (cp)}= µ+(rpD)c, where rp = f−10 (cp).This shows that the risk region Qp is equal to the complement (rpD)c shifted by lo-cation µ for an appropriate scaling factor rp = f−10 (cp). The problem of risk regionestimation can thus be broken down into two steps: estimation of the shape set Dand the location parameter µ and estimation of the scaling factor rp correspondingto the probability level p.To estimate the shape set, there are several possible options that can be ex-plored. Under the assumption of elliptical level sets, one can estimate the scalematrix based on Kendall’s tau (see Hult and Lindskog [2002]) and obtain an esti-mation for the shape set. The ellipticity assumption may sometimes become toorigid to make full use of the flexibility of the homothetic densities. So we adopt themodeling framework for D-class of distributions by Ferreira and Steel [2005] thatdoes not assume elliptical level sets. The method proposed by Ferreira and Steelassumes that the form of the density generating function g(d), or the density genera-tor f0 in our setting, is given. In practice, one has to decide which density generatorto consider. Note that if the shape set were known a priori, one could estimate thedensity generator using the sample of the radial variable nD(Xi−µ), i= 1,2, · · · ,n.We propose to use a simple estimator of the shape set as a starting point to learnabout the density generator. This estimator does not explore the homothetic struc-ture of level sets, but rather uses the fact that the ”shape” of the sample cloudcoincides with the common shape of level sets. Since the underlying distributionis assumed to be light-tailed, large sample clouds tend to cluster within a set; seeFigure 2.1. In fact, under certain assumptions there is a limit shape for the sampleclouds in a sense that they can be suitably scaled to converge onto a deterministicset, known as a limit set; see Davis et al. [1988]. In order to get an estimate of the18limit shape for the sample cloud, a spline-based method is used. We then supplythe resulting estimate of the density generator into the method for making inferenceabout the directional dispersion in Ferreira and Steel [2005], which does make useof the particular structure of the density.Observe thatp = P(X ∈ Qp) = P(X ∈ (µ+(rpD)c))= P((X−µ) ∈ (rpD)c). (3.1.1)From equation (2.2.1), we havep = P((X−µ) ∈ (rpD)c)= P(nD(X−µ)> rp)= d|D|∫ ∞rprd−1 f0(r)dr. (3.1.2)For p = 1,rp = 0, the above equality gives equation (2.2.2):d|D|∫ ∞0rd−1 f0(r)dr = 1.Thus we seeg(r) := d|D|rd−1 f0(r) (3.1.3)is the probability density function of the radial variable R= nD(X−µ). The scalingfactor rp is the pth quantile for the distribution of R. When p is small, findingrp is a univariate extreme value problem, and several solutions already exist inthe literature. However, it should be noted that this is a difficult problem in thecontext of light-tailed distributions as the ones we consider. From the estimateDˆ with our proposed method,, we should also get a sample of radial variablesri = nD(xi− µ), i = 1,2, · · · ,n for the estimation of the scaling factor rp. In thisreport, we adopt the estimator for Weibull-tailed distributions proposed in Asimitet al. [2010] to estimate rp.It is the purpose of this report to estimate Qp based on iid copies of randomvector X. Note that the estimation depends on the shape D, the scaling factor rp aswell as on the location µ . The estimation methods described above for D and rpshould all be conducted with respect to the location parameter. In practice, priorknowledge about the location may not be available and should then be inferredfrom data at the same time with the shape set and the scaling factor. The prob-19lem of estimating the location is essentially the problem of mode estimation forcontinuous multivariate random vectors. There are many existing methods in theliterature. However, determining the most appropriate estimator in high dimen-sions will be challenging due to the very nature of this problem. In this report, forsimplicity, we mainly focus on the estimation of D and rp assuming true locationµ is known and set µ = (0,0).3.2 Estimation of the shape setWe discuss the proposed method for estimating the shape set in details. In thisreport, we restrict our attention to the bivariate cases.3.2.1 Initial estimate via the limiting shape of sample cloudsSample clouds from light-tailed distributions tend to have clearly defined bound-aries. Suitably scaled, these sample clouds can converge onto a limit set; see Daviset al. [1988]. For instance, consider a sample of points from a bivariate standardnormal distribution, as the sample size increases, the points can be scaled to fill ina circle. The shape of a large sample cloud gives the impression of the shape of alimit set. If the underlying distribution has a density whose level sets are homoth-etic and the density generator is rapidly varying, the limit set exists and coincideswith the common shape D of the level sets, up to a scaling, see Balkema and Nolde[2010]. This leads to the limit set approach where we estimate the shape of thelimit set to learn the shape of level sets for densities in HD.Consider a random vector X ∈ R2 with a density f in HD(µ,θ), assuming weknow its true location µ = (0,0). The boundary of the shape set∂D = {x ∈ R2|nD(x) = 1}can also be represented by∂D = {r(ω)|ω ∈ S1},in terms of a continuous function r on the unit circle S1. More specifically, through20polar coordinate transformation, we can write the points on ∂D asx = (r(ω)cosω,r(ω)sinω).We could also represent the common shape of level sets through the limit set. De-note Dlim as the limit set. Let the boundary ∂Dlim = {rlim(ω)|ω ∈ S1}. We haveDlim = {x ∈ R2|||x||< rlim(ω),ω ∈ S1}= {(ρcosω,ρsinω)|ρ < rlim(ω),ω ∈ S1}, (3.2.1)where ρ = ||x|| denotes the `2 norm. On the other hand, the set Dlim can be repre-sented using its gauge functionDlim = {x ∈ R2 | nDlim(x)< 1}= {(ρcosω,ρsinω) | ρnDlim((cosω,sinω))< 1,ω ∈ S1}= {(ρcosω,ρsinω) | ρ < 1nDlim((cosω,sinω)) ,ω ∈ S1} (3.2.2)Comparing equation (3.2.1) and equation (3.2.2), we obtain the relationship be-tween the continuous function r(ω) and the gauge function nD asr(ω) = 1/nD((cosω,sinω)).In order to learn the shape of level sets from the limiting shape of sample clouds,essentially we need to estimate the continuous function r(ω) to approximate theboundary of its limit set. In the bivariate cases, given a sample of points froma light-tailed distribution, a simple way to approximate ∂Dlim to fit a continuouscurve through the maximal points at a grid of directions. Below we describe theestimation process in detail.First of all, we define a sequence of even-spaced positions 0 = θ0,θ1, · · · ,θk =2pi on [0,2pi] that gives k intervals on the unit circle S1. Then we transform allthe data points into polar coordinate as x = (ρcosω,ρsinω) where the radial com-ponent ρ = ||x|| and ω ∈ [0,2pi] is the angular component. Next, within eachinterval, we identify the point with the maximal radial component among all the21points whose angular component falls into this interval. Then we define the set ofk+1 maximal points (ωi,ρi), i= 1,2 · · · ,k+1 with ω1 =ωk+1+2pi,ρ1 = ρk+1 andthen fit a cubic spline interpolation with these k+1 points as the knots to obtain asmooth approximation of the boundary of the sample cloud:∂Dlim = sr(ω) =C1(ω) ω1 ≤ ω < ω2· · ·Ci(ω) ωi ≤ ω < ωi+1· · ·Ck(ω) ωk ≤ ω < ωk+1,where Ci(ω) = ai+biω+ciω2+diω3,di 6= 0, i= 1,2, · · · ,k. The set Dlim can alsobe standardized by rescaling so as to be bounded by say y = 1 line. It should benoted that the choice of the number of intervals in the unit circle influences theestimation performance. A coarse grid of directions may not be accurate enoughto capture the curvature of the limit set; yet a fine grid may be too sensitive andresults in an approximation with too much noise. However, there is no guidelineto choose this number efficiently. The choice depends much on expert knowledgeand visualization of the sample.In the same spirit, this limit set approach can be generalized into higher di-mension. The boundary of the shape set D, in multivariate cases, can as wellbe represented by ∂D = {r(ω)|ω ∈ Sd−1} in terms of a continuous function ron the unit sphere Sd−1. We can write the points on the boundary ∂D as x =r(ω)t(ω) where ω ∈ Sd−1 = (ω1,ω2, · · · ,ωd−1) ∈ [0,pi)d−2× [0,2pi)} and t(ω) =(t1(ω), t2(ω), · · · , td(ω))T witht j(ω) =j−1∏i=1sin(ωi)cos(ω j),1≤ j ≤ d−1,td(ω) =d−1∏i=1sin(ωi).Then a more sophisticated framework can be developed for estimating the con-22tinuous function r in multivariate setting.3.2.2 Inferences on the directional dispersionAs mentioned earlier, the gauge function plays the same role in describing thecommon shape of the level sets for densities in HD as the directional dispersionfunction for the D-class of distributions studied in Ferreira and Steel [2005], ofwhich the HD class form a sub-class. Hence, we adopt the approach proposedin Ferreira and Steel [2005] to estimate the gauge function and from it to get anestimate of the shape set. In this section, we give an overview of the methodologydeveloped by Ferreira and Steel [2005].With the direct modeling of the directional dispersion, the quantification ofthe dispersion in D-class is straightforward and can explore the homothetic struc-ture of level sets. Ferreira and Steel [2005] propose to model the logarithm ofthe dispersion, denoted by log(D(dx)), with hyperspherical spline models. Hyper-spherical splines are constructed similarly to splines in real spaces and are veryflexible. They could well approximate any smooth function and are numericallymore stable. Formally, the hyperspherical spline model in our standard setting ofhomothetic densities is given bylog{D(d)}= c0+K∑k=1ciRld(d|dk), (3.2.3)where K is a positive integer, c = (c0,c1, · · · ,cK) ∈ RK+1, BD = {d1,d2, · · · ,dK}is a set of vectors in the unit sphere Sd−1 and vectors dk,k = 1,2, · · · ,K are knots.Rld(d|dk)∈Cl are real-valued spline basis functions defined on Sd−1 for k= 1, · · · ,K.By choosing an appropriate l, it is possible to control the smoothness of the dis-persion function D. Ferreira and Steel [2005] use the approximating reproducingkernels for which a closed form of function Rld(d|dk) is not known. In the bivariatecases, the vector d ∈ S1 can be parameterized by only the angular component ω .The kernel is then given byRld(ω|ωk) =(−1)l/2(2pi)l+1(l+2)!Bl+2(θ2pi), (3.2.4)23with θ being the angle between ω and ωk and Bl+2 being the Bernoulli polynomialof degree l + 2. For instance, when l = 2, according to Abramowitz and Stegun[1972], B4 has the following form: B4(x) = [x(1− x)]2− 1/30,0 ≤ x ≤ 1. Oncethe value of l has been specified, an interpolating spline can then be characterizedby the number of knots K, the location of knotsBD and the coefficients c. Denotethe vector of values v = (v1,v2, · · · ,vK) ∈ RK corresponding with each knot. Thecoefficients c can be obtained by solving a linear system of equations ensuring thatthe last K elements of c add up to 0 and log{D(dk)}= vk.Ferreira and Steel [2005] propose a Bayesian method to conduct inference onthe location parameter µ and the dispersion function D(dx). Inference on θ associ-ated with g(d) can also be conducted at the same time. They adopt an independentprior structure that is given by Pµ,D,θ = PµPDPθ , assuming prior independence be-tween the location µ , the dispersion function D and θ . We will discuss the priordistributions and proposals for µ and θ in the context of the simulation study andapplications. Ferreira and Steel [2005] state that the specification of PD is crucialas a sensible prior on D provides an automatic protection against overfitting (seealso DiMatteo et al. [2001]). As the logarithm of D is modeled by an interpolat-ing hyperspherical spline, with unknown number of knots K, knot locations BDand knot values v, the prior of D can be defined through a prior on K,BD,v asPK,BD,v = PBD,v|KPK . We adopt the choice for the prior of K as a Poisson priorwith parameter λK , restricted to K > 0 as in Ferreira and Steel [2005]. Withoutany further information, the locationsBD and the knot values v are assumed to beindependent given K. Thus the prior for D can be written as PD = PBD|KPv|KPK .Again, we adopt the specifications of PBD|K and Pv|K as in their paper. The prioron BD given K is derived by assuming that each of its elements di, i = 1,2, · · · ,Khas a uniform distribution on the unit sphere Sd−1, independently from all otherelements. The prior on knot value v is set such that each entry of v comes from anormal distribution with mean µv and variance σ2v . This prior structure is invari-ant to linear orthogonal transformations and uninformative about the direction ofdispersion.This Bayesian method automatically trains the number of knots when conduct-ing inference on the directional dispersion. Moreover, any prior knowledge aboutD could be incorporated by making the priors on knot locationBD and knot value24v dependent, giving a more flexible framework for estimation. However, becausethe number of knots is left unspecified, inference needs to be conducted on a spacewith varying dimensions. Ferreira and Steel [2005] use a reversible jump Markovchain Monte Carlo (RJMCMC) sampler (see Green [1995]) that is appropriate inthis context. For a fixed K, due to the independent role of BD, D and θ , sam-pling them independently works well. Here we briefly describe the sampling ofthe dispersion function D and leave the implementation details to the simulationstudy and applications. Through polar-coordinate transformation, we could re-parametrize the knot location dk ∈ S1 into ωk. The uniform prior on dk implies thatP(ω) ∝ 1,k = 1,2, · · · ,K. Four possible transitions of the sampler are:(a) a change to the value of a randomly chosen knot;(b) a change to the location of a randomly chosen knot;(c) ’birth’ of a new knot and(d) ’death’ of an existing knot.The first two moves are fixed dimension moves and are defined as random-walkMetropolis Hasting moves. Move (c) and (d) involve changing dimension of thestate space. At each transition, an independent random choice is made betweenattempting each of the four possible moves. We specify the four probabilities as inGreen [1995] that could simplify the calculation of the acceptance rates. If move(a) or (b) is chosen, the remaining details are straightforward. For move (a), werandomly choose an existing knot k and propose a new knot value from a normaldistribution with mean vk as the current value of the knot, and a tuning standarddeviation σv. Similarly, a component j ∈ {1,2, · · · ,d−1} of ωk can be chosen atrandom and a new value for ωkj can be sampled from a normal distribution centeredat ωkj and standard deviation σw that needs tuning.The details for the birth of a knot are more complicated. We first set K∗ =K+1 and sample a location dK∗ in unit sphere Sd−1 from the uniform prior. Thenwe sample a knot value vK∗ from a normal distribution with mean as the currentvalue of log(D) at dK∗ and a tuning standard deviation. If accepted, the updatedlocation and the knot values are BD∪dK∗ and (v,vK∗) ∈ Rk+1. Finally, when the25number of knots K is greater than 1, move (d) can be made by randomly selectingk from {1,2, · · · ,K} and removing the corresponding knot location dk and valuevk. As move (c) and (d) involve parameter spaces of different dimensionality withthe change of value K, in order to achieve detailed balance across spaces withvarious dimensions, the dimension-matching requirement should be imposed forpairs of move (c) and (d). Further details can be found in Green [1995]. Thenthe acceptance rate can be properly computed with a Jacobian matrix defined for abijection between the parameter spaces of moves (c) and (d).In simulation studies and applications, we tune the parameters in proposal dis-tributions, for instance, the standard deviation σv of the normal proposal for knotvalue in move (a), to obtain acceptance rates for all parameters that are close to30% in order to achieve an efficient mixing of the chain (as discussed in, for exam-ple, Gelman et al. [1996]).A final note is that in the bivariate case, it is possible to order the knot locationsand thus a prior structure that is similar to the one used in the change-point locationsampler in Green [1995] could be used. Given a fixed K, when d = 2, we denotethe knot locations as BD = (ω1,ω2, · · · ,ωK) and assume that they are distributedas the even-numbered order statistics from 2K+1 points uniformly distributed on[0,2pi]. Simply assuming the knot locations are uniformly distributed on [0,2pi]may allow too many small intervals (ωi,ω j) to capture the dispersion in differentdirections. This ”spacing-out” technique, in fact, gives higher acceptance rates inour simulations and applications. Therefore in this report, we use this particularsetting for the knot locations.3.3 Estimation of the density generatorThe method proposed by Ferreira and Steel assumes that the density generator isgiven. They use a density generator of the multivariate standard normal distribu-tion and a multivariate Student density generator with an extra parameter for thedegrees of freedom in their numerical applications. However, it might be too re-strictive and not practical to impose a particular form for the generator in somecases. One possible option is to assume a flexible form of the density generatingfunction for D-class, for instance, g(d)(r) = exp(−(r/λ )k), and estimate its param-26eters together with the dispersion function. This fully Bayesian approach turns outto be computationally demanding and makes the tuning for across-space moves inRJMCMC even harder. An alternative is to divide the inference procedure into twosteps: first we estimate the density generator and then we plug the resulting esti-mator into the RJMCMC algorithm for the estimation of the dispersion function.Via the limiting shape of sample clouds, we could learn the shape of the level setsfor densities in HD. This could, at the same time, give us a sample of radial compo-nent Rˆ = nDˆ, which forms the basis for the estimation of the density generator. Wediscuss two possible approaches that can be used to estimate the density generatorin this section.3.3.1 Maximum likelihood estimationConsider a homothetic density f = f0(nD(x− µ)) on Rd with location µ , shapeset D, gauge function nD and density generator f0. In Section 3.1, we define theradial component R with a density of given by equation (3.1.3). We can then fita probability distribution to the sample of radial components and obtain the max-imum likelihood estimates (MLE) for the underlying distribution. The estimatedparameters for the density generator can be obtained through equation (3.1.3). Forexample, with a density generator f0(r) ∝ e−r2/2, observing the Weibull densityhas a form of f (x) = kλ (xλ )k−1e−(x/λ )k , x≥ 0, a Weibull distribution can be fittedwith the true parameters k = 2,λ =√ The estimator based on asymptotic assumptionsAnother possibility to estimate the density generator is to construct an estimatorbased on the assumption that f0 is a von Mises function. As before, suppose arandom vector X ∈ Rd has a density in HD f (x) = f0(nD(x− µ)). The radialcomponent R= nD(X−µ) is a random variable with density fR of equation (3.1.3).We present the following proposition.Proposition 2. Consider a homothetic density function f (x) = f0(nD(x− µ)) onRd with location µ , gauge function nD and shape set D∈Dd . The density generatorf0 : [0,∞]→ [0,∞] is a von Mises function and thus of form f0 = e−ψ0 with scalefunction c0(·). Then the density fR and tail distribution function 1− FR of the27radial component R= nD(X−µ) are also von Mises functions with scale functionsasymptotically equal to c0(r) for r→ ∞.Proof. We begin by showing that f0 and fR(r), the density of R, have asymp-totically equal scale functions. First note that rψ ′0(r) = r/c0(r)→ ∞ as r → ∞by Lemma 1.2(a) in Resnick [1987]. The density of fR is given by (see equa-tion (3.1.3))fR(r) = d|D|rd−1 f0(r) = e−ψ(r) with ψ(r) =− log(d|D|)− (d−1) log(r)+ψ0(r),for r > 0, from which we find thatψ ′(r) =−(d−1)1r+ψ ′0(r) = ψ′0(r)(1− d−1rψ ′0(r)).Since rψ ′0(r)→ ∞, we see that ψ ′(r) is positive eventually. Considering the scalefunction for fRc(r) =1ψ ′(r)=1ψ ′0(r)− (d−1)/r,the derivative satisfies:c′(r) =− ψ′′(r)[ψ ′(r)]2 =− d−1rψ ′0(r)− (d−1) − ψ′′0 (r)[ψ ′0(r)]2(1− (d−1) 1rψ ′0(r))2 → 0,as r→∞. Hence, fR is a von Mises function. Looking at the ratio of scale functionsfor fR and f0, we find thatc(r)c0(r)=ψ ′0(r)ψ ′(r)=[1− d−1rψ ′0(r)]−1→ 1, r→ ∞as claimed.Let C(r) denote a scale function for the distribution tail 1−FR of R = nD(X−µ). We can take C(r) = (1−FR(r))/ fR(r). Since both the numerator and denomi-nator tend to zero as r→ ∞, applying L’Hospital’s rule gives:C(r)∼ − fR(r)f ′R(r)=exp{−ψ(r)}ψ ′(r)exp{−ψ(r)} =1ψ ′(r)= c(r), r→ ∞.28To verify that 1−FR is indeed a von Mises function, we note that Ψ′(r) =1/C(r) = fR(r)/(1−FR(r)) is positive as both the numerator and denominator arenot identically zero for r > 0. To compute C′(r) = −Ψ′′(r)/(Ψ′(r))2, we firstcompute:Ψ′′(r) =f ′R(r)1−FR(r) +( fR(r)1−FR(r))2=−ψ ′(r)Ψ′(r)+(Ψ′(r))2,since f ′R(r) =−ψ ′(r) fR(r). Hence,C′(r) =ψ ′(r)Ψ′(r)−1 = C(r)c(r)−1→ 0, r→ ∞.Following the same approach as in Abdous et al. [2008] (pp.1072-1073), anadmissible form for the scale function of the tail distribution 1−FR of R iscR(x) =1cβx1−β , (3.3.1)for some constants c > 0 and β > 0. Suppose X1,X2, · · · ,Xn are independently,identically distributed random vectors on Rd with density f . Let R1,R2, · · · ,Rndenote the radial components. Then let kn be a user-chosen threshold and R j,n,1≤j ≤ n be the order statistics of R1,R2, · · · ,Rn. The parameters c and β can beestimated asβˆn =k−1n ∑kni=1 loglog(n/i)− loglog(n/kn)k−1n ∑kni=1 log(Rn−i+1,n)− log(Rn−kn,n), (3.3.2)cˆn =1knkn∑i=1log(n/i)Rβˆnn−i+1,n. (3.3.3)With the scale function in equation (3.3.1), the tail distribution function of R isgiven by1−FR(r) = exp(−∫ rr0dxcR(x))∝ exp(−crβ ),29for some r0 ≥ 0. The density fR of the radial component is then of the formfR(r) =1−FR(r)cR(r)∝ rβ−1exp(−crβ ), r > 0, (3.3.4)where the normalizing constant can be computed by integration. Plugging in theestimated parameters, we have fˆ0(r) = 1d|Dˆ|r1−d fˆR(r),r > 0.It should be noticed that with the form of scale function given in equation (3.3.1),the derived density function of the radial component is a Weibull density. In fact,with a density generator with the same rate of decay as e−t2/2 and the estimatorbased on asymptotic assumption shares the same form of the MLE estimator intro-duced in Section 3.3.1. As before, the estimated shape from the limit set approachcan be used to obtain a sample of radial components.Once we obtain the estimates for the density generator, we could use them asthe input for the RJMCMC algorithm. Moreover, the knowledge of the densitygenerator could possibly be updated through repetitive run of the RJMCMC. Wecould obtain a sample of radial components every time we run the RJMCMC andthis sample could then be used to update our estimate for the generator.3.4 Estimation of the scaling factorThe second part of the risk region estimation is to determine how much we needto extrapolate the shape set D to achieve the specified probability level. The riskregion at probability level p is given by equation (3.1.2). In this light, the scalingfactor rp is the pth quantile for the distribution of the radial variable R. The esti-mation of rp then becomes a univariate extreme value problem as we are interestedin cases where p is very small. In this report, we adopt the estimator proposed byAsimit et al. [2010].Definition 5. Suppose X ∈ R has a distribution function F. Distribution functionF is called Weibull-tailed if it can be written as 1− F(x) = exp{−H(x)} withH−(x)= inf{t : H(t)≥ x}= xθ l(x), θ > 0 where l(x) is a slowly varying functionat infinity and θ is referred to as the Weibull tail coefficient.If the underlying distribution is believed to have Weibull-type tail , Weibull-tailed distributions can be used to estimate high quantiles. Yet, to estimate extreme30tail probabilities, a more refined model is needed. Asimit et al. [2010] introducea possible class and stated that it is useful in estimating very high quantiles byconsider a proposed estimator in their paper. We use the same estimator in thisreport.Suppose Ri = nD(Xi− µ), i = 1, · · · ,n are the radial components of a densityin HD.. Let Rn,1 ≤ ·· · ,≤ Rn,n denote the order statistics of R1, · · · ,Rn and k de-note a user-chosen threshold. An estimator of high quantile rp corresponding toprobability level p is given byrˆp(k) = Rn,n−k+1{log(1/p)/log(n/k)}θˆH(k), (3.4.1)where θˆH(k) =k−1∑ki=1 log(Rn,n−i+1/Rn,n−k)k−1∑ki=1 loglog((n+1)/i)− loglog((n+1)/(k+1)).31Chapter 4Numerical ResultsIn this chapter we assess the performance of the proposed method and compare theresults with those from other methods existing in the literature on both simulatedand real data.4.1 Simulation settingsIn the simulation studies, the data are generated from probability distributionswhose densities are in the class HD. We restrict our attention to bivariate cases.The density functions have a general form :f (x|µ,θ) = f0(nD(x−µ)), x ∈ R2, (4.1.1)where µ ∈ R2 is the location parameter and θ summarizes the parameters associ-ated with the density generator f0 and the gauge function nD of the shape set D.We consider two types of shape sets. The first is an ellipse determined by a scalematrix Σ. This shape set can be specified by the following gauge function:nD(x) =√xTΣ−1x. (4.1.2)The other type is a non-elliptical shape set which introduces skewness and has thegauge function of the following form with the skewness parameter α ∈ R2 and32scale matrix Σ:nD(x) =√xTΣ−1x+(αT x)21{αT x<0}. (4.1.3)Two different density generators are considered. The first density generatorshares the same exponential rate of decay as the normal distribution and has thefollowing formf0(r) ∝ e−r2/2. (4.1.4)The second one has the same rate of decay as the logistic distribution and has theform off0(r) ∝ r−1e−r. (4.1.5)Combinations of the shape sets and the density generators give us four differentdistributions. For example, when the density generator decays at the same rate asa normal density and the level sets are elliptical, the underlying distribution is abivariate normal distribution. To fully specify f0, we need to determine the nor-malizing constant that ensures f (x) = f0(nD(x− µ)) is a proper density function.Recall that the density generator f0 must satisfy equation (2.2.2). By plugging inf0 and letting the integral of f over R2 equal to 1, we get the normalizing constantk = 1/(2|D|) for both density generators to be considered in the simulation studies.The expressions for the area of |D| of the shape sets specified by equation (4.1.2)and equation (4.1.3) are presented in Section 2.3, page 9.In summary, the following four types of densities are to be considered in sim-ulation studies:• Densities with homothetic elliptical level sets and the density generator as inequation (4.1.4), referred to as E2(Σ) densitiesf (x) =12|D|e(−xTΣ−1x)/2, x ∈ R2; (4.1.6)• Densities with homothetic non-elliptical level sets and the density generatoras in equation (4.1.4), referred to as NE2(Σ,α) densitiesf (x) =12|D|e(−xTΣ−1x+(αT x)21{αT x<0})/2, x ∈ R2; (4.1.7)33• Densities with homothetic elliptical level sets and the density generator as inequation (4.1.5), referred to as E1(Σ) densitiesf (x) =(2|D|)−1√xTΣ−1xe(−√xTΣ−1x), x ∈ R2; (4.1.8)• Densities with homothetic non-elliptical level sets and the density generatoras in equation (4.1.5), referred to as NE1(Σ,α) densitiesf (x) =(2|D|)−1e−√xTΣ−1x+(αT x)21{αT x<0}√xTΣ−1x+(αT x)21{αT x<0}, x ∈ R2; (4.1.9)where we use E and NE to denote densities with homothetic elliptical and non-elliptical level sets, respectively. The subscript represent the rate of the decay forthe two density generators to be considered. It should be noted that here we alsoassume that the true location parameter is known and µ = (0,0). The main focusof the simulation studies is to• compare the shape set estimation from our proposed method, from the esti-mated scale matrix assuming the level sets are elliptical and from RJMCMCwith correctly specified density generator, for densities with elliptical andnon-elliptical level sets;• examine the estimation of the density generator using two different approaches,see details in Section 3.4;• assess the performance of our proposed method for estimating risk regionsat different probability levels p = 1/200,1/500,1/1000.4.1.1 Data generationThe process of generating from densities with homothetic shape sets is describedin Algorithm 1, page 12. We derive the density function fS for the random variableS when f0(r) =12|D|r−1e−r. From Proposition 1, when d = 2, we havefS(s) = s2|D|| f ′0(s)|= s2|D|12|D|e−s(s+1s2) =12e−s(s+1), (4.1.10)34which is a mixture distribution of gamma(2,1) and gamma(1,1)with equal weights.Derivation for the density function fS when f0(r) =12|D|e−r2/2 are shown in ex-ample 3.In the simulation studies, we examine the impact of the scale matrix by settingits off-diagonal element η to 0.2,0.5 and 0.8 with the diagonal element as 1. ForNE2 and NE1 densities, we set the skewness parameter α = (−1,6). For all fourdensities, location µ is assumed to be (0,0). At each iteration, we generate asample of size 1000 from each of the four densities. The number of iterations is setto be Simulation methodsIn order to assess the performance for shape set estimation of our proposed methodfor densities in HD, we compare results from the following estimators :• Assuming the level sets are elliptical, the estimation for the shape set canbe reduced to estimating the off-diagonal entry η of the scale matrix Σ2×2,which can then be estimated using the empirical Kendall’s tau viaηˆ = sin(piτˆ2), (4.1.11)where τˆ is the empirical Kendall’s tau. This estimator is shown to be√n-consistent and asymptotically normal. See Theorem 4.2 in Hult and Lind-skog [2002] for more details.• The shape set can also be estimated via the limiting shape of sample clouds.More details of this approach can be found in Section 3.2.1. We consider theimpact of the number of intervals k on the estimation by comparing resultswith k = 4,10,15.• Assuming the true density generator is known, the spline model proposed byFerreira and Steel [2005] can be used. Details are discussed in Section 3.2.2.In the simulation studies, we use cubic hyperspherical log-splines, namely,we use Rld with l = 2 in equation (3.2.4). We adopt the choice of a Poissonprior for K with parameter λk = 10. The knot locations are assumed to be the35even-numbered order statistics from (2K+1) points uniformly distributed on[0,2pi], for a given K. For each component of the knot value v= (v1, · · · ,vK),we adopt the normal prior with mean µv = 0 and standard deviation σv = 1.4as in Ferreira and Steel [2005]. We choose a normal prior for the locationwith mean (0,0) and standard deviation 1 for the location parameter µ . Thestarting point for µ is set to be (0,0). We start from K = 6 as the number ofknots and their locations as the even-numbered points from (2K+1) evenly-spaced points between (0,2pi). Values at all knots are set as 0 at the start. Theparameters in the proposal distributions are tuned to obtain an acceptancerate close to 30%. Inferences are conducted using MCMC chains of 12000iterations, retaining every 10th sample with a burn-in period of 2000 draws.With the sample of radial components from the initial estimate of the shapeset from the limit set approach, we compare the performance of the following twoestimators for the density generator function.• A Weibull probability distribution can be fitted to the sample of the estimatedradial component. The parameters of the Weibull distribution are estimatedwith MLE. In bivariate cases, the density for R is fR(r) = 2|D|r f0(r); seeequation (3.1.3). So when f0(r)∝ e−r2/2, the radial component has a Weibulldistribution with parameters k = 2,λ =√2; when f0(r) ∝ r−1e−r, the radialcomponent has a Weibull distribution with parameters k= λ = 1 can be fittedto the sample of Ri’s.• Assuming that f0 is a von Mises function with the scale function of theform as in equation (3.3.1), we can estimate the two parameters using equa-tion (3.3.2) and equation (3.3.3), respectively. Details are presented in Sec-tion 3.3.2. When f0(r)∝ e−r2/2, we have β = 2,c = 1 for the scale function;when f0(r) ∝ r−1e−r, then we have β = c = 1 for the scale function.Finally the scaling factor is estimated with the estimator in equation (3.4.1).It shoulde be noted that all simulation methods described above are conductedassuming we know the true location parameter of densities. However, in practice,this prior knowledge may not be available and should then be inferred from the data36set itself at the same time. The problem of estimating the location is essentially theproblem of mode estimation for continuous bivariate random vectors. There aremany existing methods in the literature. One possible approach is to use a non-parametric estimation of the density function with kernel density estimation andthen to estimate the mode with the point with the maximum density using a grid-search across the estimated density function. In bivariate cases, kernel densityestimation has nice asymptotic properties, see Parzen [1962] and De Valpine[2004]. Yet, a well-known difficulty for this method is the choice of the bandwidthmatrix. An optimal bandwidth matrix could improve the density estimation thusimproving the location estimation. Ultimately, this could improve the shape set andrisk region estimation. However, it is beyond the scope of this report as finding theoptimal bandwidth matrix is difficult and computationally demanding, especiallyin high dimensions. We assume the true location is known and µ throughout oursimulation studies.4.1.3 Estimation assessmentWe measure the symmetric difference between two sets to quantify the perfor-mance of the estimation method, for shape set estimation and for risk region esti-mation. The symmetric difference of two sets is the set of elements that belong toeither of the sets but not their intersection. In this report, we adopt the notation ofA∆B to represent the symmetric difference between set A and B. A simple examplewould be A∆B = {1,2,3,4} for A = {1,2,3,5,6} and B = {4,5,6}.To assess the shape set estimation, we compute the area of the symmetric dif-ference D∆Dˆ relative to the area of the true D, where Dˆ denotes the estimatedshape set. For risk region estimation, we compute the relative error P(Qp∆Qˆp)/pto take different probability levels into account. Here Qˆp denotes the estimatedrisk region corresponding to the probability level p and P(Qp∆QˆP) is the probabil-ity of Qp∆QˆP and it is computed with importance sampling approximation. Thisis particularly important when comparing risk regions as only a few or even nopoints will lie at the boundary of estimated risk regions. We briefly review the ideaof importance sampling and describe the computational details for our simulationstudies. Suppose we want to find v = E(h(X)) =∫B h(x) f (x)dx , where f is a37probability density on B ⊆ Rd with f (x) = 0 for all x /∈ B and h is a function ofinterest. If g is a probability density on Rd , thenv = E(h(X)) =∫Bh(x) f (x)dx=∫Bh(x) f (x)g(x)g(x)dx= Eg(h(X) f (X)g(X)), (4.1.12)where Eg(·) denotes the expectation under distribution with density g. The ideaof importance sampling is to add a multiplicative adjustment to f and then samplefrom g instead of f . The importance sampling estimate of v = E(h(X)) isvˆg =1nn∑i=1h(Yi) f (Yi)g(Yi), Y∼ g. (4.1.13)In our case, as we are interested in estimating a probability, the function h is anindicator function h(X) = 1{X∈Qp∆Qˆp}. We take the density g to be a density func-tion with homothetic level sets of the same shape as f (i.e. determined by the setD) but with a different location, in order to compensate for the fact that there isvery few points under the distribution with density f . A preliminary simulationstudy is conducted to validate the choice of the density g and the results show theapproximation is very close to the true P(X ∈Qp∆Qˆp). The sampling procedure isdescribed in detail in Algorithm 2.When assessing the performance of our proposed method for risk region esti-mation, a natural benchmark that can be used is the results from RJMCMC methodwith the knowledge of the true density generator. We will compare results fromour method with those from RJMCMC with correctly specified density generator.4.2 Simulation resultsInitial estimation of the shape set First, we simulate single data sets of size1000 from the E2(Σ) density as in equation (4.1.6), NE2(Σ,µ) density as in equa-tion (4.1.7), E1(Σ) density as in equation (4.1.8) and NE1(Σ,α) density as in equa-38Algorithm 2 Estimating P(X ∈ Qp∆Qˆp) by importance sampling1: find all the intersection points (c1, · · · ,cm) between the boundaries of true andthe estimated risk regions;2: use the average between every two neighboring intersection points as the loca-tion parameter µ1, · · · ,µm for the distribution with density gi, i = 1,2, · · · ,m.If there is no intersection points, use an evenly-spaced grid of length m0 on theboundary of true risk region as the location parameters;3: initiate P(D∆Dˆ) as sym = 0;4: for i in 1 : m do . Or i in 1 : m05: sample y1, · · · ,yn from gi(y) = f0(nD(y−µ i));6: compute the sum of ratios si =∑f (y j)gi(y j)for y j ∈ Qp∆Qˆp, j = 1, · · · ,n;7: sym = sym+ si;8: end9: return symtion (4.1.9). We compute the true and estimated shape sets with the three methodsdescribed in Section 4.1.2. This is shown in Figure 4.1 and Figure 4.2. We see thatsymmetry can be observed for both data clouds with E1 and E2 densities whereasskewness is visible in samples from NE1 and NE2 densities. We observe that whenthe underlying distributions have elliptical level sets, the estimated shape sets basedon ellipticity assumption are close to the true shape sets. However, for distributionswith densities NE1 and NE2 where the ellipticity assumption is not satisfied, theestimator for the scale matrix assuming elliptical level sets could not capture theshape. This shows that assuming a rigid shape of the level sets could not provideenough flexibility and does not work well when the shape is mis-specified. Never-theless, for both cases where the distributions have elliptical and non-elliptical levelsets, we see satisfactory performance of estimators using limit set approach. Theshapes are approximated comparably well as the results from RJMCMC assumingtrue density generator.We notice the estimation from limit set approach is not stable in the sense thatthe estimated shape can sometimes be too smooth or unexpectedly wavy. We exam-ine the impact of the number of intervals k on the performance of estimation fromlimit set approach. Figure 4.3 shows the true and estimated dispersion functions39−4 −2 0 2 4−4−2024−6 −4 −2 0 2 4 6−6−4−2024−101−1 0 1DKendallLimitMCMC−101−1 0 1DKendallLimitMCMComegagridD0120 pi 2 pi 3pi 2 2piDKendallLimitMCMC(a) E2 DensityomegagridD0120 pi 2 pi 3pi 2 2piDKendallLimitMCMC(b) E1 DensityFigure 4.1: True and estimated shape sets and dispersion functions from thethree methods based on one sample of size 1000 from E1 and E2 den-sities. The off-diagonal entry in Σ is set to be η = 0.5. The number ofintervals is k = 6 in the limit set approach.40−4 −2 0 2 4−101234−6 −4 −2 0 2 4 6−1012345−101−1 0 1DKendallLimitMCMC−101−1 0 1DKendallLimitMCMComegagridD0120 pi 2 pi 3pi 2 2piDKendallLimitMCMC(a) NE2 DensityomegagridD0120 pi 2 pi 3pi 2 2piDKendallLimitMCMC(b) NE1 DensityFigure 4.2: True and estimated shape sets and dispersion functions from thethree methods based on one sample of size 1000 from NE1 and NE2densities. The off-diagonal entry in Σ is set to be η = 0.5. The skewnessparameter is α = (−1,6). The number of intervals is k = 6 in the limitset approach. 41with k = 4,10,15 for the four densities.omegagridD0123456780 pi 2 pi 3pi 2 2piTrueDk=4k=10k=15(a) E1 DensityomegagridD0120 pi 2 pi 3pi 2 2piTrueDk=4k=10k=15(b) E2 DensityomegagridD0120 pi 2 pi 3pi 2 2piTrueDk=4k=10k=15(c) NE1 DensityomegagridD0120 pi 2 pi 3pi 2 2piTrueDk=4k=10k=15(d) NE2 DensityFigure 4.3: True and estimated dispersion functions from the limit set ap-proach based on same single sample of size 1000 from the four densi-ties. The off-diagonal entry of Σ is η = 0.5 and the skewness parameteris α = c(−1,6). The number of intervals is set to k = 4,10,15.We see that larger value of k could possibly result in spikes in the dispersionestimation, as shown in the estimation for E1 density. The spike in estimated dis-persion with large k reflects that limit set estimation is relatively sensitive to po-tential ”outliers” that could possibly distort the shape of D. However, small valueof k results in estimation that is sometimes too smooth to reflect the change in cur-42vature. Unfortunately, there is no guideline for the choice of k when using limitset approach to estimate the shape set. In practice, some visual assessment of thedata set can be conducted to help determine a proper value of k. In our simulationstudies, we set k = 6 for E1 and E2 densities and k = 8 for NE1 and NE2 densitiesto reflect the fact that more intervals are needed to capture the irregularity in theshape of non-elliptical level sets .Estimation of the density generator We also compute the estimated density gen-erator based on the two approaches described in Section 4.1.2. We investigate theperformance of the two estimators based on 100 simulated samples of size 1000from the four densities. This is depicted in Figure 4.4 and Figure 4.5.We see that the estimation for parameters of the density generator is biasedbased on sample radial components from the limit set approach using MLE andthe estimator based on asymptotic results. With MLE, the shape parameter tendsto be under-estimated and the scale parameter tends to be over-estimated. Forthe estimator based on the asymptotic results, both parameters β and c tend to beunder-estimated. This implies that the tails of the distributions are estimated todecay more slowly than they actually do. A closer look at the estimated radialcomponents from the initial shape set estimation suggests that the Rˆ for certainpoints sometimes were drastically over-estimated due to the fact that the disper-sion function at these points was close to zero. The estimated radial componentswere thus inflated by the near-zero dispersion. Table 4.1 showcases three of theseexamples occurred in the data sample from NE2 density with their estimated andtrue radial components and dispersion.Table 4.1: Three points from the NE2 density with large estimated radialcomponents from the limit set approach. Their coordinates, true valuesand estimated values of the dispersion function are also presented.Coordinates(x) D(dx) Dˆ(dx) R Rˆ(0.162,−0.176) 0.0359 4.9524e−06 1.2609 107.3576(0.396,−0.291) 0.0476 0.0033 2.2520 148.3141(0.269,−0.214) 0.0448 0.0014 1.6225 247.6347These over-estimated radial components could lead to an estimation of a much43NE1 E10. E11.01.52.0NE2 E21. MLE estimates of shape parametersNE2 E21. MLE estimates of scale parametersFigure 4.4: MLE estimates for the shape and scale parameters of the Weibulldensity based on the same 100 samples of size 1000 from the four den-sities with η = 0.5 and α = c(−1,6). True values of the parametersare indicated by the red line segments. The shape of the level sets isestimated with limit set approach.44NE1 E10. E10. E21. Estimated βNE2 E20. Estimated cFigure 4.5: Estimated β and c based on 100 samples of size 1000 from thefour densities using the estimator bases on the von Mises assumption.True values of the parameters are indicated by the red line segments.The shape of the level sets is estimated with limit set approach.heavier tail than that of the underlying distribution. Moreover, the estimator un-der the asymptotic results sometimes fails to converge with the existence of theseover-estimated radial components. As a result, when estimating the density gener-ator, we first truncated Rˆ so that the maximum Rˆ is no greater than 8. The resultspresented in Figure 4.4 and Figure 4.5 are based on the truncated Rˆ.Table 4.2 and Table 4.3 show the root mean square error (RMSE) for the twoestimators from the truncated radial components.45Table 4.2: Root mean square error for MLE estimator of the density generatorparametersParameter E1 E2 NE1 NE2shape 0.8118 0.2939 0.5679 0.1949scale 0.5892 0.3047 0.4122 0.2021Table 4.3: Root mean square error for the estimator of the density generatorbased on the asymptotic resultParameter E1 E2 NE1 NE2β 0.1375 0.4348 0.1623 0.3151c 0.1633 0.1506 0.1883 0.0965We see the estimator based on the asymptotic result outperforms the MLE esti-mator with a smaller RMSE in most cases. It should be noted that here our choiceof the upper bound for Rˆ is relatively conservative as there may not be knowledgeavailable for this value in real-life applications. When the upper bound is set to2.4, the estimation is much improved with smaller variation, compared with thetruncation value of 30, for example.Estimation of the shape set We now examine the estimation of the shape set withour proposed method. We compute the ratio between the area of the symmetricdifference D∆Dˆ and the area of the true D. Table 4.4 shows for the estimator underellipticity assumption, RJMCMC with the correctly specified f0 and the estimatedfˆ0, the median of the 100 ratios for the four densities. We also consider cases withthe off-diagonal entry of the scale matrix η = 0.2,0.5,0.8. From this table, we seethe estimator based on the ellipticity assumption performs best when the level setsof the underlying distribution are indeed elliptical. Yet it is outperformed by ourproposed method and RJMCMC with the true density generator for densities withnon-elliptical level sets in NE1 and NE2 cases. We see a comparable performanceof our proposed method with the RJMCMC with f0 , yet with a larger variation.Recall we use a fixed value for the number of intervals k in the initial estimationof the shape set using limit set approach and a relatively conservative upper boundRˆ≤ 8 for the truncation of the Rˆ when estimating the density generator. The results46Table 4.4: Mean and the standard deviation of the ratio|D∆Dˆ||D| of the threemethods for the four densities with η = 0.2,0.5,0.8 in the scale matrixMethod η = 0.2 η = 0.5 η = 0.8E1Underellipticityassumption0.0167(0.0085)0.0167(0.0111)0.0197(0.0270)rp with f0 0.0406(0.0154)0.0475(0.0167)0.0524(0.0162)rp with fˆ0 0.1183(0.0882)0.1290(0.0848)0.1365(0.0797)E2Underellipticityassumption0.0078(0.0055)0.0092(0.0072)0.0154(0.0125)rp with f0 0.0490(0.0217)0.0542(0.0215)0.0728(0.0244)rp with fˆ0 0.0982(0.0657)0.1224(0.0656)0.1359(0.0755)NE1Underellipticityassumption0.5780(0.0052)0.5840(0.0126)0.7889(0.0461)rp with f0 0.0747(0.0223)0.0823(0.0328)0.0974(0.0429)rp with fˆ0 0.1202(0.0567)0.1217(0.0650)0.1279(0.0595)NE2Underellipticityassumption0.5868(0.0045)0.6014(0.0165)0.8495(0.0358)rp with f0 0.1051(0.0431)0.1053(0.0431)0.1402(0.0861)rp with fˆ0 0.1498(0.0628)0.1350(0.0490)0.1601(0.0699)from our method could possibly be improved by further fine-tuning the value of kand choosing a different truncation point.47Estimation of the risk regions We investigate the performance of our proposedmethod for risk region estimation for p = 1/200,1/500 and 1/1000. First wecompute the true and estimated risk regions for p = 1/200,1/1000 based on thesingle data sets of size 1000 of the four densities. This is shown in Figure 4.6.−5 0 5−505Q^ pQp(a) E1 Density−4 −2 0 2 4−4−2024Q^ pQp(b) E2 Density−6 −4 −2 0 2 4 6−20246Q^ pQp(c) NE1 Density−4 −2 0 2 4−2024Q^ pQp(d) NE2 DensityFigure 4.6: True and estimated risk regions for p = 1/200,1/1000 based ona single sample of size 1000 from the four densities.We see that the estimated risk regions are relatively close to the true risk re-gions. The estimated regions have the right sizes and the shapes are approximatedreasonably well. Also it is interesting to note that there are barely no or few pointsat the boundary of the risk regions. This further emphasizes that the set we are48interested in is an almost empty ”risk region” and the traditional methods couldnot work in this case.In addition, we assess the performance of the proposed method through 100iterations. The off-diagonal entry of the scale matrix is η = 0.5 and the skewnessparameter is α = (−1,6). Table 4.5 shows the mean and the standard deviationsfor the estimated scaling factors from RJMCMC with the true f0 and estimated fˆ0.Here we use the estimator based on the asymptotic results to estimate the densitygenerator.Table 4.5: Mean and standard deviations for the estimated scaling factorsusing RJMCMC with true density generator f0 and estimated fˆ0 forp = 1/200,1/500,1/1000 based on 100 samples of size 1000 from thefour densities. True values are also included.Method p = 1/200 p = 1/500 p = 1/1000E1rp 5.2983 6.2146 6.9078rp with f0 5.3423(0.2344)6.2733(0.3068)6.9782(0.3648)rp with fˆ0 5.5045(0.7793)6.4627(0.9205)7.1882(1.0287)E2rp 3.2553 3.5255 3.7170rp with f0 3.2701(0.0772)3.5418(0.0938)3.7342(0.1060)rp with fˆ0 3.5157(0.4233)3.8076(0.4609)4.0144(0.4878)NE1rp 5.2983 6.2146 6.9078rp with f0 5.2680(0.2604)6.1855(0.3423)6.8803(0.4083)rp with fˆ0 5.4616(0.6474)6.4116(0.7723)7.1309(0.8692)NE2rp 3.2553 3.5255 3.7170rp with f0 3.3171(0.1205)3.5984(0.1435)3.7980(0.1605)rp with fˆ0 3.5768(0.2407)3.8795(0.2662)4.0942(0.2848)We see that with the true density generator, the estimated scaling factors arecloser to the true values and have a smaller variation. Our proposed method over-49estimates rp’s with different probability levels for all four densities. This over-estimation may be as well caused by the over-estimation of the radial componentsusing the limit set approach.Risk region estimation To quantify the overall performance of our proposed methodfor estimating bivariate risk regions, we compute the relative error P(Qp∆Qˆp)/pfor each of the four densities at different probability levels based on 100 samplesof size 1000. Table 4.6 shows for the estimator based on the ellipcity assumption,RJMCMC with f0 and RJMCMC with fˆ0, the median of the 100 relative errorsP(Qp∆Qˆp)/p when p = 1/200,1/500,1/1000. We see a superior performance ofour proposed method when the ellipticity assumption is not satisfied. The esti-mator based on the ellipticity assumption is obviously not adequate and could notwork in these cases. When the underlying distribution does have elliptical levelsets, we have comparable results from RJMCMC with the true and the estimateddensity generator. Our method is shown to be more flexible and performs well fordistributions whose level sets have irregular shape.Table 4.6: Median of the relative error P(Qp∆Qˆp)/p of the three methods forp = 1/200,1/500,1/1000 with the four densities.Method p=1/200p =1/500p=1/1000E1Under ellipticity assumption 0.3259 0.1095 0.2839rp with f0 0.2936 0.2865 0.3827rp with fˆ0 0.2698 0.2749 0.3816E2Under ellipticity assumption 0.1558 0.1927 0.1028rp with f0 0.3642 0.2843 0.2936rp with fˆ0 0.3715 0.2930 0.2519NE1Under ellipticity assumption 0.8752 0.8125 0.8744rp with f0 0.2198 0.2653 0.2634rp with fˆ0 0.2617 0.2938 0.3029NE2Under ellipticity assumption 0.8376 0.8643 0.8346rp with f0 0.1928 0.2937 0.3021rp with fˆ0 0.2129 0.2838 0.2938We also include the results from the RJMCMC with the true density generator50and results when we estimate the location parameter with kernel density estimationalong with the shape set when obtaining the initial estimate from the limit set ap-proach. We see that the more we need to extrapolate from the shape set, namely, thesmaller p becomes, the larger relative error we have for the risk region estimation.When the density generator is estimated with the limit set approach, there is morenoise in our estimation for the risk regions as expected. Furthermore, the relativeerror is even greater when we incorporate the location parameter estimation intoour inference.51Chapter 5Data ExamplesIn this section, we compare our proposed method with the RJMCMC assuming apaticular parametric form for the density generator by re-visiting the data set fromthe Australian Institute of Sport as in Ferreira and Steel [2005]. In addition, anapplication of our method to daily rates of return on two stocks of the Dow JonesIndustrial Average is presented.5.1 Example 1: Australian Institute of Sport data setrevisitedThe data set includes four biomedical variables : body mass index BMI, percentageof body fat PBF, sum of skin folds SSF and lean body mass LBM for n = 202 ath-letes at the Australian Institute of Sport, as described in Cook and Weisberg [2009].This dataset can be found in R package locfit. For simplicity of illustration, weconsider the distribution of four different pairs of variables: BMI& PBF, BMI &SSF, SSF & LBF and PBF & SSF.We begin by comparing the results from RJMCMC with f0(r) ∝ e−r2/2, de-noted as RJMCMC-normal, and RJMCMC with fˆ0 estimated from the limit setapproach for different pairs of variables using Bayes factors. Table 5.1 presentsthe logarithm of the Bayes factors with respect to the bivariate normal distribution.Larger values indicate more support for the model compared with the bivariate nor-mal model. We see all four pairs of variables strongly favour the D-class models52estimated with RJMCMC. For BMI& SSF and SSF& LBF, the differences betweenthe two models are small. For BMI& PBF and PBF & SSF, our method performsbetter than the results from RJMCMC assuming a normal density generator. WeTable 5.1: Logarithms of Bayes factors for the two methods and the four pairsof variables with respect to the normal alternativeModel BMI& PBF BMI & SSF SSF & LBF PBF & SSFRJMCMC-normal33.7244 50.2716 50.8756 63.2873RJMCMCwith fˆ041.3617 50.5327 50.0864 78.0736further plot the predictive contours of the D-class densities for the four pairs in Fig-ure 5.1. In addition, it displays the risk regions corresponding to p= 1/200 for thedata using our method. From the contour plots, we see a departure from symmetryof the underlying distributions. Even for the PBF & SSF pair where the contourresemble an ellipse, the mode of the distribution is near the lower left-hand corner,implying the skewness of the data. We see satisfactory performance of the D-classacross different contour shapes. This illustrates the flexibility of the D-class model.Moreover, we see the estimated parameters for the density generator are differentacross data sets. This implies that our method adds flexibility to the original RJM-CMC method by allowing modeling data with different rates of decay. Finally, wesee the estimated risk regions have the right sizes and reasonable shapes. There isno or only one point on the boundary or outside of the estimated Qˆp. These regionscan help identify the outliers in the data set.53(a) βˆ = 1.76, cˆ = 0.26 (b) βˆ = 1.80, cˆ = 0.22 (c) βˆ = 1.80, cˆ = 0.22 (d) βˆ = 0.93, cˆ = 0.57−20 −10 0 10 20 30 40010203040(BMI, PBF)(e) BMI&PBF0 5 10 15−10−505101520( SSF, LBM)(f) SSF&LBF−2 0 2 4 6 8 10024681012( BMI, SSF)(g) BMI&SSF0 5 100510( PBF, SSF)(h) PBF&SSFFigure 5.1: Contour plots of the posterior predictive D-class densities of the four pairs of the variables using RJMCMCwith fˆ0 (top panel) and the estimated risk regions for p = 1/200 (bottom panel).545.2 Example 2: Financial applicationIn this example, we illustrate the practical usefulness of our proposed method inthe context of financial market. Data used by Levy and Duchin [2004] are revisited.We use the daily rates of return on stocks of the Boeing company and the Coca-Colacompany ranging from July 2nd, 1962 to July 31st, 2017. According to Levy andDuchin [2004], these two series can be well modelled by a logistic distribution. Weestimate the risk regions for this pair of variables at p= 1/200,1/500 and 1/1000;see Figure 5.2.−0.2 −0.1 0.0 0.1 0.2−0.3−0.2− Co.Coca Cola Co.p = 1/200p = 1/500p =1/1000Oct−19−1987Sep−17−2001Figure 5.2: Estimated risk regions of daily returns of the two stocks for p =1/200,1/500,1/1000. The size of the data set is n = 13865.The estimated parameters for the density generator using the estimator basedon the asymptotic result are βˆ = 1.35, cˆ= 106.99. We see the estimated risk regionshave the right size. Table 5.2 shows the expected numbers of points in Qp and theobserved numbers of points in Qˆp at p = 1/200,1/500 and 1/1000.These estimated risk regions can be utilized to identify extreme pairs of thedaily returns on the two stocks.These extreme pairs occur with a very small prob-ability and may correspond to major events in the stock market. For instance, thepoint at (−0.1175,−0.2469) where the two stocks both declined by a significant55Table 5.2: Expected and observed numbers of points in Qp and Qˆp for p =1/200,1/500,1/1000.p Expected number ofpoints in QpObserved number ofpoints in Qˆp1/200 70 891/500 28 341/1000 14 17amount occurred on October 19th, 1987, known as the Black Monday. It lies inthe estimated risk region at p = 1/1000. Another example would be the point at(−0.1763,0.0086) which also lies in the estimated risk region at p = 1/1000. Itcorresponds to the date of September 17th, 2001 and shows the market reactionin the aviation industry after the 9/11 attack. It shows steep decline hit the Boe-ing company as the New York Stock Exchange (NYSE) and the Nasdaq resumedtrading on September 17th in 2001.56Chapter 6DiscussionEstimation of multivariate risk regions is of great importance in a variety of appli-cations. These risk regions allow one to detect multivariate extremes. However,it is a statistically difficult problem in the sense that we are estimating an almostempty set with very limited information.In this report, we propose a new method that combines ideas of shape set esti-mation within the D-class of distributions with techniques from extreme value the-ory for tail estimation. Our probabilistic framework supplements existing extreme-value type approaches to risk region estimation by allowing tails of the underlyingdistribution to be light as well as covering situations of tail independence. Wedemonstrate reasonable performance of our proposed method by simulation stud-ies. Two real-life applications are also presented to further illustrate the flexibilityand performance of our method.However, in simulation studies, we observe a large variation in the parameterestimation and risk region estimation from our proposed method compared withRJMCMC assuming the true density generator. As mentioned earlier, it is partlybecause of the arbitrary choice for the number of intervals when estimating theshape from the limiting shape of sample clouds. In real life applications, a visualassessment of the data set should be conducted to help determine the value. More-over, we use the cubic spline model to fit a smooth curve through the maximalpoints within each interval in the bivariate case. In multivariate cases, a more so-phisticated model is required to capture the shape in high dimensions. In addition,57one possible drawback of this limit set approach is that it only utilize the maximalpoints in the data set. Future studies could explore some non-parametric methodthat could take all data points into consideration to estimate the shape of level sets.Furthermore, we assume the true location parameter is known and set µ =(0,0) for all simulation studies. In many real-life applications, the location mayneed to be estimated from the data set itself along with the shape set estimation.In the two data examples, we estimate the location parameter as the point with themaximal density using a grid-search across the estimated density function fromkernel density estimation. Here the bandwidth of the kernel is a free parameterthat has a substantial impact on the resulting estimation. However, the choice of anoptimal bandwidth is a difficult problem, especially in high dimensions. For futurework, we could explore other existing mode estimation methods, like mean shiftprocedure; see Cheng [1995].58BibliographyB. Abdous, A. L. Fouge`res, K. Ghoudi, and P. Soulier. Estimation of bivariateexcess probabilities for elliptical models. Bernoulli, 14:1065–1088, 2008. →pages 29M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions: withFormulas, Graphs, and Mathematical Tables. Dover, New York, 2nd edition,1972. → pages 24A. V. Asimit, D. Li, and L. Peng. Pitfalls in using Weibull tailed distributions.Journal of Statistical Planning and Inference, 140:2018–2024, 2010. → pages3, 19, 30, 31A. Balkema and N. Nolde. Asymptotic independence for unimodal densities.Advances in Applied Probability, 42:411–432, 2010. → pages 5, 9, 20A. Balkema, P. Embrechts, and N. Nolde. Meta densities and the shape of theirsample clouds. Journal of Multivariate Analysis, 101:1738–1754, 2010. →pages 10J. J. Cai, J. H. Einmahl, and L. de Haan. Estimation of extreme risk regions undermultivariate regular variation. The Annals of Statistics, 39:1803–1826, 2011.→ pages 2Y. Cheng. Mean shift, mode seeking, and clustering. IEEE Transactions onPattern Analysis and Machine Intelligence, 17:790–799, 1995. → pages 58R. D. Cook and S. Weisberg. An Introduction to Regression Graphics. John Wiley& Sons, 2009. → pages 52R. Davis, E. Mulrow, and S. Resnick. Almost sure limit sets of random samples inRd . Advances in Applied Probability, 20:573–599, 1988. → pages 18, 20L. de Haan and A. Ferreira. Extreme Value Theory: An Introduction. SpringerScience & Business Media, 2007. → pages 159P. De Valpine. Monte Carlo state-space likelihoods by weighted posterior kerneldensity estimation. Journal of the American Statistical Association, 99:523–536, 2004. → pages 37I. DiMatteo, C. R. Genovese, and R. E. Kass. Bayesian curve-fitting withfree-knot splines. Biometrika, 88:1055–1071, 2001. → pages 24J. H. Einmahl, L. de Haan, and A. Krajina. Estimating extreme bivariate quantileregions. Extremes, 16:121–145, 2013. → pages 2K. T. Fang, S. Kotz, and K. W. Ng. Symmetric Multivariate and RelatedDistributions. Chapman and Hall, 1990. → pages 6C. Fernandez, J. Osiewalski, and M. Steel. Modeling and inference withv-spherical distributions. Journal of the American Statistical Association, 90:1331–1340, 1995. → pages 6, 8, 15J. T. Ferreira and M. F. Steel. Modeling directional dispersion throughhyperspherical log-splines. Journal of the Royal Statistical Society: Series B(Statistical Methodology), 67:599–616, 2005. → pages 2, 3, 6, 13, 15, 18, 19,23, 24, 25, 35, 36, 52A. Gelman, G. O. Roberts, W. R. Gilks, et al. Efficient Metropolis jumping rules.Bayesian Statistics, 5:42, 1996. → pages 26P. J. Green. Reversible jump Markov chain Monte Carlo computation andBayesian model determination. Biometrika, 82:711–732, 1995. → pages 25, 26J. Heffernan and J. Tawn. A conditional approach for multivariate extreme values.Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66:497–546, 2004. → pages 1H. Hult and F. Lindskog. Multivariate extremes, aggregation and dependence inelliptical distributions. Advances in Applied probability, 34:587–608, 2002. →pages 18, 35H. Levy and R. Duchin. Asset return distributions and the investment horizon.The Journal of Portfolio Management, 30:47–62, 2004. → pages 55A. J. McNeil and A. D. Smith. Multivariate stress scenarios and solvency.Insurance: Mathematics and Economics, 50:299–308, 2012. → pages 1J. Osiewalski and M. Steel. Robust Bayesian inference in `q-spherical models.Biometrika, 80:456–460, 1993. → pages 660E. Parzen. On estimation of a probability density function and mode. The Annalsof Mathematical Statistics, 1962. → pages 37S. Resnick. Extreme Values, Regular Variation, and Point Processes. Springer,1987. → pages 7, 28S. Resnick and H. Rootze´n. Self-similar communication models and very heavytails. Annals of Applied Probability, 10:753–778, 2000. → pages 161


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items