UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Some statistical properties of multivariate proper dispersion models, with special reference to a multivariate… Rajeswaran, Jeevanantham 1998

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
831-ubc_1998-0586.pdf [ 2.93MB ]
Metadata
JSON: 831-1.0088555.json
JSON-LD: 831-1.0088555-ld.json
RDF/XML (Pretty): 831-1.0088555-rdf.xml
RDF/JSON: 831-1.0088555-rdf.json
Turtle: 831-1.0088555-turtle.txt
N-Triples: 831-1.0088555-rdf-ntriples.txt
Original Record: 831-1.0088555-source.json
Full Text
831-1.0088555-fulltext.txt
Citation
831-1.0088555.ris

Full Text

S O M E STATISTICAL PROPERTIES OF M U L T I V A R I A T E P R O P E R DISPERSION M O D E L S , W I T H S P E C I A L R E F E R E N C E T O A M U L T I V A R I A T E G A M M A M O D E L by Jeevanantham Rajeswaran B.Sc , University of Jaffna, Jaffna, Sri Lanka, 1992. A THESIS S U B M I T T E D IN P A R T I A L F U L F I L L M E N T OF T H E R E Q U I R E M E N T S F O R T H E D E G R E E OF M A S T E R OF SCIENCE in T H E F A C U L T Y OF G R A D U A T E STUDIES Department of Statistics We accept this thesis as conforming to the required standard T H E U N I V E R S I T Y OF BRITISH C O L U M B I A August 31, 1998 ©Jeevanantham Rajeswaran, 1998 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department of 'bClbf^&bt'CLi The University of British Columbia Vancouver, Canada DE-6 (2/88) Abstract A broad class of error distributions for generalized linear models is provided by the class of dispersion models which was introduced by J0rgensen (1987a, 1997a) and a detailed study on dispersion models was made by J0rgensen (1997b). In this thesis we study multivariate proper dispersion models. Our aim is to do multivariate analysis for non-normal data, particularly data from the multivariate gamma distri-bution which is an example of a multivariate proper dispersion model, introduced by j0rgensen and Lauritzen (1998). This class provides a multivariate extension of the dispersion model density, following the spirit of the multivariate normal density. We consider the saddlepoint approximation for small dispersion matrices, which, in turn, implies that the multivariate proper dispersion model is approximately multi-variate normal for small dispersion matrices. We want to mimic the basic technique of testing in multivariate normal, Hotelling's T2. Our version of the T 2 test applies asymptotically, for either small dispersion or large samples. We also consider estimating the normalizing constant of the bivariate gamma by Monte Carlo simulation and we investigate the marginal density by using numerical integration. We also investigate the distribution of the T 2-statistic by Monte Carlo simulation. i i Contents Table of Contents iii List of Tables vi List of Figures vii Acknowledgements 1 1 Introduction 2 2 Univariate dispersion models 6 2.1 Definition 6 2.2 Exponential dispersion models 9 2.3 Proper dispersion models 11 2.4 Saddlepoint approximation 12 2.4.1 Barndorff-Nielsen's p*-formula . 13 2.4.2 Saddlepoint and Laplace approximations 14 3 Multivariate dispersion models 17 3.1 Definition 17 3.2 _ Multivariate proper dispersion models 19 3.2.1 Multivariate gamma distribution 20 ii i 3.2.2 Maximum likelihood estimate of the position vector fx 23 3.3 Saddlepoint approximations 26 3.3.1 Barndorff-Nielsen's p*-formula 27 3.3.2 The Laplace method 28 3.4 Multivariate normal approximation 31 4 Bivar ia te gamma d is t r ibu t ion 34 4.1 Numerical study of normalizing constant 34 4.2 Density plots 38 4.3 Estimation of moments 48 4.4 Numerical study of marginal density 49 5 A generalization of Hotel l ing 's T 2 -s ta t is t ic 52 5.1 Introduction 52 5.2 General likelihood ratio method in multivariate normal model . . . . 53 5.2.1 T 2-statistic in multivariate normal . . 54 5.3 Maximum likelihood estimators of fj, and £ 56 5.4 T2 in multivariate proper dispersion model 57 5.4.1 Numerical study on the T 2-statistic 63 A p p e n d i x A 69 A Some results on matrices 69 A . l Determinant 69 A. 2 Inverse 69 A.3 Kronecker product (Direct product) 70 A.4 Matrix derivatives and Jacobians 71 iv Bibliography List of Tables 4.1 Dispersion Matrix 36 4.2 Estimation of normalizing constant by using M C method 37 4.3 A comparison of estimations of normalizing constant 37 4.4 Position vectors • 38 4.5 Dispersion matrices. . 38 4.6 Small dispersion matrices 41 4.7 Moderately large dispersion matrices 41 4.8 Large Dispersion matrices 46 4.9 Estimation of moments 49 4.10 Estimation of moments 51 4.11 Dispersion matrices 51 A . l Vector Derivatives (x : p x 1) 71 vi List of Figures 2.1 Some gamma densities 9 4.1 Contour plots of independent bivariate gamma densities 39 4.2 Perspective plots of independent bivariate gamma densities 40 4.3 Contour plots of bivariate gamma densities with small dispersion. . . 42 4.4 Perspective plots of bivariate gamma densities with small dispersion. 43 4.5 Contour plots of bivariate gamma densities with moderately large dis-persion 44 4.6 Perspective plots of bivariate gamma densities with moderately large dispersion 45 4.7 Contour plots of bivariate gamma densities with large dispersion. . . 46 4.8 Perspective plots of bivariate gamma densities with large dispersion. . 47 4.9 A comparison between true and estimated marginal densities (esti-mated density is given in solid lines) 50 5.1 Histograms of diameter and height of the trees 63 5.2 Comparisons of estimated and true density of T 2 65 5.3 Comparisons of estimated and true density of T 2 67 vii Acknowledgments First of all, I would like to thank my supervisor Dr. Bent J0rgensen for guiding me into this unexplored territory, for his inspiration, for his encouragement and for the generous financial support throughout my graduate career at U B C . I would also like to take this opportunity to thank for his help right from receiving me at the Vancouver airport. I am also grateful to Dr. Martin Puterman for his careful reading of the manuscript and valuable comments. I would also like to thank the faculty members of the Department of Statistics for their sincere support, for showing me the "taste of Statistics" and for the encour-agements during my M.Sc program at U B C . I would like to take this opportunity to thank Ms. Christine Graham, Ms. Donna Pavlov, and Ms. Margaret Harney, our Department staff, for their constant support during my stay at U B C . Finally, I would like to thank Department of Statistics, especially Dr. Ruben Zamar, for giving me an opportunity to follow a Master's program at U B C and for giving financial support. 1 Chapter 1 Introduction The multivariate normal distribution has been studied for more extensively than any other multivariate distribution. Indeed, its position of preeminence among multi-variate continuous distribution is more marked than that of the normal univariate continuous distribution. However, in recent years there have been signs that the need for more usable alternatives to the multivariate normal distribution is becoming recognized. Although the bivariate normal distribution had been studied at the beginning of the 19th century, interest in multivariate distributions remained at a low level until it was stimulated by the work of Galton in the last quarter of the century. He did not, himself, introduce any new form of distribution, but he developed the idea of correlation and regression and focussed attention on the need for greater knowledge of possible forms of multivariate distribution. Much multivariate theory has become practically useful only with the advent of the electronic computer. This stimulus has resulted in a rapid development of multivariate theory over the past two decades. Investigation of non-normal distributions, or "skew frequency surfaces" (as non-symmetrical forms have been denoted) has generally followed lines suggested by previ-2 cms work on univariate distributions. Early work in this fields followed rather different lines, but was not very successful. It is noteworthy that, among non-normal distri-butions, methods of estimation appear to have been developed less fully than one might expect. Karl Pearson (1923), whose first investigations appear to have been prompted by noting distinctly non-normal properties of some observed joint distri-butions, initially tried to proceed by an analogy with the bivariate normal surface. Later, Neyman (1926) also considered methods of construction of joint distributions, starting from certain requirements on the regression and scedastic functions. This was an extension of the work initiated by Jule (1897), who showed that assuming multiple linear regression, the multiple regression function obtained by the method of least squares is identical with that of a multinomial distribution. The multivariate distributions were derived from other distributions through trans-formations, projections, convolutions, extreme values, mixing, compounding, truncat-ing, and censoring. Then the marginal distributions of various statistics were derived from them. The study of multivariate distributions is concerned mainly with distri-butions of the continuous and discrete types. Some distributions arise from multidi-mensional central limit theorems, many serve as models for random experiments, and others are of interest primarily as derived distributions. Prominent among limit dis-tributions and those from which others derive is the multinormal distribution. Some distributions derived from it are known to be the same for all parent distributions in a class containing symmetric multivariate stable distributions. These facts bear on ro-bustness and the validity of normal-theory procedures for use with non-normal data, including multivariate analysis, multivariate analysis of variance and the multivariate Bartlett test. Often multivariate data are scattered more heavily away from the center of location than are multinormal data. The introduction of a class of generalized linear models by 3 Nelder and Wedderburn (1972) paved the way to analyze a large variety of non-normal data by a simple general technique. Since there was no suitable class of multivariate non-normal distributions, this analysis is restricted to univariate data. Analysis of univariate non-normal data became simpler and accommodate more non-normal data types after the introduction of dispersion models, a broad class of error distributions for generalized linear models by J0rgensen (1987b,1997a). Having in mind to find a class of multivariate distributions suitable to analyze the multivariate non-normal responses in a generalized linear model setup and to extend the existing multivariate techniques to multivariate non-normal data, J0rgensen and Lauritzen (1998) proposed a multivariate extension to the dispersion models, the so called multivariate dispersion models. Since the form of the distribution is similar the multivariate normal distribution this distribution can handle a regression setup. It would be noteworthy that, unlike the other non-normal multivariate distributions proposed earlier, the definition of multivariate dispersion models is geared towards good statistical properties rather than requiring specific moments or marginals. As a first step towards these goals, we, in this thesis, consider a sub-class of multivariate dispersion models, the multivariate proper dispersion models and inves-tigate some properties of the sub-class and propose a generalization of Hotelling's T 2-statistic. In Chapter 2, we give a brief introduction to the univariate dispersion models of J0rgensen (1997b). In Chapter 3, we introduce the multivariate dispersion models and multivariate proper dispersion models. Specifically, we consider a multivariate gamma distribution which is an example of the multivariate proper dispersion models. Then we investigate the asymptotic behaviour of the multivariate proper dispersion models. In fact, we will show that multivariate proper dispersion models can be approximated by a multivariate normal distribution. 4 In chapter 4, we investigate the behaviour of the bivariate gamma distribution by observing the shape of the density plots and estimating the moments and marginal densities. In chapter 5, we generalize the Hotelling's T 2-statistic so that it can ac-commodate non-normal data. 5 Chapter 2 Univariate dispersion models In this chapter, we give an introduction to the theory of the univariate dispersion models of J0rgensen (1997b), which is the basis of the entire theory that we develop in this thesis. First we give the definition of dispersion models, then we look at the two types of dispersion models: exponential dispersion models and proper disper-sion models, and finally we discuss the saddlepoint approximations for the dispersion models. 2.1 Definition Nelder and Wedderburn (1972) were the first to show, by introducing the class of generalized linear models, that a large variety of non-normal data may be analyzed by a simple general technique. Generalized linear models were originally developed for exponential families of distributions, but the main ideas were extended to a wider class of models called dispersion models. Dispersion models were originally introduced by J0rgensen (1987b) and the models were studied in detail by J0rgensen (1997b). The class of dispersion models covers a comprehensive range of non-normal dis-tributions, including distributions for the following seven basic data types, where S 6 denotes the support of the distribution: • Data on the real line, S = 1Z. • Positive data, S = 1Z+ = (0,1). • Positive data with zeros, S = [0, oo). • Proportions, S = (0,1). • Directions, S = [0, 2TT). • Count data, S = {0,1, 2 , . . . } . • Count data with finite support, S = { 0 , . . . , m}. The main idea behind the dispersion model is that the notions of location and scale may be generalized to position and dispersion, respectively, for a l l the above seven data types. Definition 2.1 Let Q C C C TZ be intervals with Q open. A function d : C x Q —> 1Z is called a unit deviance if it satisfies A unit deviance d is called regular if d(y; fx) is twice continuously dijferentiable with respect to (y, y) on Q x Q and satisfies d (y; y) = d (/j; fj) = 0 Vy, fj, e ft (2.1) and d {y; n)>0 My ± p. (2-2) d2d (p,; p) > 0 V> G Q. (2.3) dy2 7 The unit variance function V : ft —> 7c+ of a regular unit deviance is defined by v w = w ^ y { 2 A ) A univariate (regular) dispersion model DM(/J,,O2) with position parameter ii € ft and dispersion parameter o2 > 0 is defined by a probability density function of the form f(y,V,°-2) = a ( y ; a 2 ) e x p j - ^ d ( ? / ; / i ) J , ? / e C (2.5) where a > 0 is a suitable function and d is a regular unit deviance. The important characteristics of the density are: 1. The factor a in (2.5) is independent of the position parameter /x. 2. The densities are often unimodal; in general, the deviance d tends to make the density peak near //, and the peak will become higher and sharper the smaller the value of o2. As an example, consider the gamma distribution with shape parameter A and scale parameter 6. The corresponding distribution is given by f(y,\0)= ^ ( A ) ;y>0, \ >0,9>0. (2.6) The mean and the variance of Y are given by E(Y) = X/9, and Var(Y) = X/62 respectively. Now we write the above density function (2.6) as a dispersion model. Let E(Y) = ix and A = 1/CT2. Then we have, Var(F) = fj?o2 and 9 = l / ( / /a 2 ) , where IL and o2 are position and dispersion parameters, respectively. Then the density is given by f(y,V,°-2) = | ^ y - 1 e - A e x p j - ^ d ( y ; / / ) J = a ( y ; a 2 ) e x p | - ^ o ! ( y ; A i ) J , (2.7) 8 o Figure 2.1: Some gamma densities. where the regular unit deviance d(y; fi) is given by (2.8) and the corresponding unit variance function is given by V(n) = \x2. Note that in the following we denote the a function as c(y; A) when we use A instead of a2 For any regular unit deviance d(y;n) d2d . , d2d . , d2d . ' (2.9) for details see J0rgensen (1997b, p. 24). 2.2 Exponential dispersion models Exponential dispersion models were originally proposed by Tweedie (1947), and later reintroduced by Nelder and Wedderburn (1972) as a class of error distribution for generalized linear models. 9 A reproductive exponential dispersion model is defined by the following density for a random variable Y, with respect to a suitable measure, f(y; 9, A) = c(y; A) exp[A{0y - K(0)}],y e 71. (2.10) An additive exponential dispersion model for a random variable Z is defined by the probability density function of the following form, with respect to a suitable measure, f*{z;9,\) = c*(z;\)exrj{9z-\K(9)},zen. (2.11) The two types are connected by the duality transformation Z = XY, so in principle, any exponential dispersion model has both an additive and a reproductive forms. The parameter 9 is called the canonical parameter and A is called the index parameter. The maximum possible domain for 9, denoted 0 C 71 is called the canonical parameter domain. The domain for A, called the index set, denoted by A, is a subset of 7Z+. The function K is called the cumulant function. The reproductive form (2.10) is denoted by Y ~ ED(n,o2), where // = T{9) — K'(9) is the mean and o2 = A - 1 is the dispersion parameter. The variance of Y is o2V(ii) = o2r'{T~1(fj,)}, where V is the variance function. The additive form (2.11) is denoted by Z ~ ED*(9,X), and the mean and the variance of Z are E(Z) = C = XT{9) and Var(Z) = XV((/X) = XV(LI). (2.12) We define the unit deviance for the above exponential dispersion models (see J0rgensen (1997b) for details) as follows, Then the density (2.10) can be rewritten in the form (2.5) in terms of d of (2.13), and 10 sup.{j/0 - K(9)} - yr-l{p) + K { r " 1 ^ ) } (2.13) the a function is given by a"2 sup{y6 - K(6)} This proves that the reproductive model (2.10) is indeed a dispersion model (2.5). However, the additive model (2.11) is not, in general, of the form (2.5). Both types of exponential dispersion models satisfy a convolution property ( see J0rgensen (1997b) for details) given as follows. Let Y\,... ,Yn be independent and Yi ~ ED(p,o2/wi) with weights W{. Then 1 n { a 2 \ — ^WiY-EDlLi, — ) (2.14) w+ t i \ W + J where w+ = X)™ Wi- • If Z i , . . . , Zn are independent and Zi ~ ED*(6, Xi), then Z+ = '£iZi~ED*(d,\1 + --- + \n). (2.15) I Furthermore, we have A = TZ+ if and only if (2.11) is infinitely divisible (J0rgensen, 1997b). Note that the gamma distribution (2.6) can be written as fir, A, 0) = e x p [ A { ( - % - ( - log(-0))}], (2.16) which is of the form (2.10). Hence the gamma distribution is a reproductive expo-nential dispersion model. One of the shortcomings of the exponential dispersion models is that there are no adequate exponential dispersion models available for data with bounded support, such as angles or proportions. 2.3 Proper dispersion models In this section, we discuss another class of dispersion models, proper dispersion models, introduced by J0rgensen (1997a). The introduction of proper dispersion models pave 11 the way to accommodate distributions suitable for angles and proportions, namely the von Mises and simplex distributions, respectively. Def ini t ion 2.2 If d is a regular unit deviance with unit variance function V, a reg-ular proper dispersion model, denoted by PD(LI,O2), is defined by f (y; //, A) = c(X)V-Hy) exp ^~d(y; //) j , y G O. (2.17) where A = l/o2. That is, the term a in (2.5) is factorized as c(\)V~^(y). A crucial property of this model is that the integral cJX) = jnV~Hy)^^-\d{y-^dy (2.18) does not depend on LI. Since the term a in the gamma density (2.7) factorizes as c(X)V~^(y) it is a proper dispersion model. Exactly three exponential dispersion models are also proper dispersion models, namely the normal, gamma and inverse Gaussian families. Other examples of proper dispersion models are the von Mises distribution and the simplex distribution of Barndorff-Nielsen and J0rgensen (1991). The von Mises distribution, for example, is given by f ^ ^ = 2 J o ( a - 2 ) 6 X P [" 2 ^ 2 { 1 ~ C ° S ( y " ' for 0 < y < 2TV, where position parameter LI G [0,27r), dispersion parameter o2 1/A > 0 and I0 denotes the modified Bessel function given by (2.19) /•2TT (A) = / exp (X cos y)dy. Jo 2.4 Saddlepoint approximat ion In this section, we briefly discuss three methods of approximations that give an ap-proximation to a(y;o2). 12 2.4.1 Barndorff-Nielsen's p*-formula Barndorff-Nielsen's p*-formula provides an approximation to the conditional distri-bution of the maximum likelihood estimator for a given statistical model, given an ancillary statistic. Here we apply the formula in the case of a single observation from a proper dispersion model (2.17) with A known, in which case an ancillary statistic is degenerate, and the formula provides an approximation to the marginal distribution of the maximum likelihood estimator. Since the maximum likelihood estimate of fj, for A fixed is y, Barndorff-Nielsen's formula hence provides an approximation to the marginal distribution of Y. The approximation is defined by where f0 is the renormalized saddlepoint approximation corresponding to unit de-viance d , which we will discuss in the next section. Now let us define the multivariate version (Barndorff-Nielsen and Cox, 1994, p. 172) of the p*-formula. Definition 2.3 For a given ancillary statistic a, the log-likelihood can be considered as a function of ii and data. Let there be a one-to-one correspondence between the data y and (/2, a). Then we can write the log-likelihood £ as £ (/z; p., a) and the normed form £ as f{v\P, A ) ~ fo(y,fi,X), (2.20) £ (p; %a)=i (fi; ft, a) - £ ( £ ; a). (2.21) The p*-formula (or saddlepoint approximation) is defined by p* (£ ;M|a) = ( 2 7 r ) - p / 2 | j | V V (2.22) where, \j\ is the determinant of the observed information matrix for it evaluated at 11 = IL. 13 This formula (renormalized) determines a probability density, with respect to Lebesgue measure, if the range space of /z is an open subset of W. Note that we deal with con-tinuous distribution. The question of Lebesgue measure will not arise in our problem. In fact, formula (2.22) is called the pt-formula by Barndorff-Nielsen and Cox (1994). As compared to , the p*-formula which is originally defined as c|j |1/2e^, gives at least one degree higher accuracy but at the cost of having to calculate the normalizing constant c. So it is feasible to use pt-formula and we still call this formula the p*-formula. We use the formula (2.22) in the Section 3.3.1 to develop an asymptotic distribution for multivariate proper dispersion models.. 2.4.2 Saddlepoint and Laplace approximations In this section, we briefly discuss three methods of approximations that give an ap-proximation to a(y;o2). We now briefly discuss the saddlepoint approximation for dispersion models; a detailed discussion can be found in J0rgensen (1997b) . The sad-dlepoint approximation is important for the asymptotic theory of generalized linear models. The saddlepoint approximation for a dispersion model with regular unit deviance d is defined for y e ft by / ( y ; / x , a 2 ) - { 2 7 r a V ( | / ) } - 1 / 2 e x p | - ^ d ( y ; / x ) } as o2 -)• 0, (2.23) meaning that the ratio of the two sides goes to 1 as o2 —>• 0. The saddlepoint approximation is exact in a few very special cases, such as the normal and simplex distributions. The saddlepoint approximation may be interpreted as being half way to a normal approximation for the density. Thus, if we replace V(y) by V(ii) in (2.23) and intro-duce a quadratic approximation to the unit deviance in the exponent of the density, 14 we get the normal approximation. Note that the saddlepoint approximation on the right hand side of (2.23), while positive, is not in general a density function on fl. However, it may be rescaled to become a density function. The corresponding renormalized saddlepoint approximation is the density function defined for y G f2 by fo (y,^,a2) = a0 (^,<7 2 ) y-i/2(y)exp{-^ Ld(y;/x)} , (2.24) where a0 (fj,, a2) is a normalizing constant defined by j ^ [ v ) ^ { - ^ i M ) i y . (2.25) Note that, in the case of proper dispersion models, ao in (2.25) does not depend on We now present the important Laplace approximation, as described in, for exam-ple, Barndorff-Nielsen and Cox (1989, p.59). Proposition 2.1 (Laplace approximation) Define / ( A ) = f b(y)ext^dy, where Cl is an open interval. Suppose that b is positive, and continuous at y = JJL, that t is twice differentiable, has global maximum for y = \i G ft and that K(JJ) = -t"(p) > 0. Then / ( A ) ~ \ j T j ^ j A ^ y ^ a s A -+ °°- (2-26) 15 The ordinary and renormalized saddlepoint approximations agree asymptotically for o2 small; this can be shown by using the Laplace approximation. We now present this important saddlepoint approximation result, for a proof see J0rgensen (1997b, p. 28). Propos i t ion 2.2 Let d be a given regular unit deviance defined on C x ft and define aQ (LI,O2) by (2.25). Then, as a2 ->• 0, The essence of the saddlepoint approximation is the approximation of the function a(y;o2). Specifically, the saddlepoint approximation is equivalent to Def ini t ion 2.4 The saddlepoint approximation is said to be uniform on compacts if the convergence in (2.27) is uniform in y on compact subsets of ft. Propos i t ion 2.3 In the case of a proper dispersion model, the saddlepoint approx-imation is uniform on compacts, and the renormalized saddlepoint approximation is exact. The proof of this proposition contains the following arguments: For proper dispersion model we have a 0 (LI, o2) = a(o2) making the renormalized saddlepoint approximation exact. The continuity of the unit variance function V implies that the saddlepoint approximation is uniform on compacts. For details, see J0rgensen (1997b). oa(y;o2) -> {2-nV(y)}* as o2 0. (2.27) 16 Chapter 3 Multivariate dispersion models In this chapter, we discuss multivariate dispersion models. In the first section, we discuss the definition of multivariate dispersion models. Basically, in the first section, we discuss the multivariate dispersion models defined by J0rgensen and Lauritzen (1998) which opened a new field in statistics that has a lot of problems to investigate. It is the basis for this whole thesis. In subsequent sections we develop and discuss some statistical properties of the multivariate proper dispersion models and in particular the multivariate gamma distribution. Note that the order in which we discuss the multivariate dispersion models is related to its univariate counterpart in the preceding section. 3.1 Def in i t ion A class of multivariate dispersion models suitable as error distributions for generalized linear models with multivariate non-normal responses was introduced by J0rgensen and Lauritzen (1998). The main concern, in their paper, is with suitable distributions for generalized linear models with a multivariate response vector. The definition of multivariate dispersion models is geared towards good statistical properties similar to 17 that are possessed by multivariate normal distribution, rather than requiring specific moments or marginals. Let the sample space be an open subset of W. A multivariate dispersion model is defined by the following probability density function on W\ where LI £ Q (an open region in 7ZP), £ is a symmetric positive-definite p x p matrix and t (y; LI) is suitably defined vector of deviance residuals satisfying t (LI; LI) = 0 for LI € Q. The parameter LI, called the position vector, and S , called the dispersion matrix, may be interpreted as analogies of respectively the mean vector and variance-covariance matrix of the multivariate normal distribution. As an example, J0rgensen and Lauritzen (1998) defined the multivariate Laplace distribution by the following density with respect to Lebesgue measure on IV: and ± = sgn(?/j — Lij) denotes the sign of i/j — Lij for each j . Note that when the dispersion matrix £ is diagonal, the components of Y follow independent univariate Laplace distributions. For a detailed discussion on generalization of the definition (3.1) and geomet-ric construction of multivariate dispersion models refer to J0rgensen and Lauritzen (3.1) / (y ; LI, S) = o(S) exp j ~tT (y - LI) S" 1 * (y - LI) } , where a(S) is a normalizing constant, t is defined by (3.2) (1998). 18 3.2 Mul t ivar ia te proper dispersion models Following the spirit of the univariate proper dispersion models J0rgensen and Lau-ritzen (1998) called (3.1) a multivariate proper dispersion models if a(y; S) in (3.1) factorizes as a(£)6(y) for suitable functions a and b. That is, we define the multi-variate proper dispersion models as follows where V(fJt) = d i a g - f V ^ ) , . . . , V(/xp)}. The distribution (3.3) is denoted by PDp(fi, E). In the following we often use A = S _ 1 instead of As an example, consider the von Mises distribution given by (2.19). Define the deviance residual by where the sign ± denotes the sign of sin(y — fj,). Since r(y; fj,) is a pivot, a function of Y and n whose distribution under p, does not depend on the value of p, the standard construction proposed by J0rgensen and Lauritzen (1998) leads to a multivariate von Mises distribution of the form on [0, 27r) p, where t(y; fx) is the vector of deviance residuals. This is clearly a multi-variate proper dispersion model. j0rgensen and Lauritzen (1998) discussed a standard construction that takes al-most any univariate proper dispersion model and gives a multivariate proper disper-sion model as a result. They noted, however, that this standard construction does not work for the Inverse Gaussian distribution which is a univariate proper dispersion model. /(y; pi- £ ) = a p ( E ) | V ( / i ) | - i exp {-^ T(y; Ai)E _ 1*(y; » ) } , (3.3) / ( Y ; M , s) = a(S) exp [-^0T(y;/*)£_1*(y;/*)} 19 3.2.1 Multivariate gamma distribution In this section, we construct the multivariate gamma distribution which was described by J0rgensen and Lauritzen (1998). Consider the univariate gamma density (2.7) f(v, H, A) = ci (X)y 1 exp ( - ^d(y; LL) \ , y > 0 (3.4) where the regular unit deviance d is given by (2.8), the unit variance function V(LI) LL2 and C l (A) = A e r ( A ) " Note that the geometric measure is v(dy) = V~1^2(y)dy = y~ldy, where the geometric measure is defined as follows, see J0rgensen and Lauritzen (1998) for more details. Suppose G\(y) = {gJ\(y)} is a matrix with entries 1 d2 9 xiv) 2 duj/ifc tr{AT(y;/x)} fj-=y then the geometric measure is defined by ^ A ( % ) = | G A ( y ) | 1 / 2 ^ . The deviance residual is r(y; LL) = ±yJd(y,Li) = ± , / 2 { ^ - log V- - 1}, y A* A4 where ± denotes the sign of (y — LL). Clearly r(Y; LL) is a pivot. For more details about pivot refer to J0rgensen (1997b). Let *(y; At) = {r{yi, L i i ) , r ( y p ; LIP)}T denote the vector of residuals for a p-vector of data. Then the yoke r ( F i ; M l ) T(Y-Li) = t(Y-Lx)tT(Y-Li) = \ r { Y p ; L i p ) 20 W r ( y i ; A i l ) , . . . , r ( F p ; ^ ) } (3.5) is also a pivot with respect to the product measure u(dyu.. .,dyp) v(dyi) ® • • • <g) v(dyp) p n{rt}- (3.6) A multivariate version of the univariate gamma therefore appears as / (y ; ti, £ ) = ap(Y,){yi... yPYl exp [ - ^ { E ^ y ; » ) } (3.7) We denote this model Gap(ii, £). The new model (3.7) is clearly a multivariate proper dispersion model and in the case of A being diagonal, it has gamma marginals distributed independently. So we call this model a multivariate gamma distribution. The main problem in this multivariate model is determining the normalizing constant ap(A). This normalizer can be estimated by numerical integration or by Monte Carlo simulation. We defer the discussion of this problem until Chapter 4. This multivariate gamma distribution seems to be new and different from other multivariate gamma distributions proposed in the literature cf. Krishnaiah (1985). In particular, its marginals are not gamma except.when A is diagonal. Several procedures exist for constructing a multivariate gamma distribution given that the marginal distributions are gamma. For example Johnson and Kotz (1972, p. 216) proceeds as follows: Suppose that X0,Xi,... ,Xm are independent random vari-ables and that Xj has a standard gamma distribution. The joint distribution of Yj = X0 + Xj (j = 1,2,... ,m) is called a multivariate gamma distribution and the marginal of Yj is a standard gamma distribution. 21 Krishnamoorthy and Parthasarathy (1951) denned a multivariate gamma as fol-identically as a multivariate normal with mean vector 0 and covariance matrix £ . Also, let Zi — \ Y!j=\ Xfj for i = 1, 2 , . . . , p. Then the joint distribution of (Zi,..., Zp) is a central multivariate gamma distribution with n/2 as shape parameter and £ as the covariance parameter. The main application of the proposed multivariate gamma distributions has so far been in the study of reliability. Since the forms of the densities are quite com-plicated, it is difficult to accommodate a multivariate regression set up. The form of multivariate gamma model, in general multivariate proper dispersion model, pro-posed by J0rgensen and Lauritzen (1998) is very similar to that of a multivariate normal model. This will enable us to accommodate a multivariate regression setup. The problems related to the inference on the position vector is very much similar to that of a multivariate normal model. We now introduce some properties of the multivariate gamma distribution. Theorem 3.1 The Multivariate gamma model is closed with respect to scale trans-formations. Proof: Let Y ~ Gap(n, £ ) and C be a diagonal matrix with elements c i , . . . , cp, Ci 7^  0 Vi . Consider the transformation X = CY. The Jacobian matrix is given by Hence the density is given by lows: Let X j = (Xij,..., XPj), (j = 1, 2 , . . . , n) be distributed independently and 0(x;/x,E) I J I - V C C - ^ J M . S ) C\-lap{Y.) 22 = a. Now let us consider the deviance log (3.9) Hence, it follows from (3.9) that t(C *x; LI) = t ( x ; C ^ ) . Hence we may write (3.8) as Therefore, we have X ~ Ga p (C/x, £ ) . Hence the theorem. • 3.2.2 Max imum likelihood estimate of the position vector LI One of the main advantages of the multivariate dispersion model defined by J0rgensen and Lauritzen (1998), as in the univariate dispersion models defined by J0rgensen (1997b), is that the normalizer a p (S) does not involve the position vector LL and therefore it allows us to make inferences about LL without knowing the exact formula From now on, in this chapter, we use quite a number of techniques for matrix derivatives. We give an overview of these formulas in Appendix A.4- For more details cf. Dwyer (1962). Theorem 3.2 Let Y i , . . . , Y n be n random vectors, in W, from a multivariate proper dispersion model PDp(LI,TI). Then, for arbitrary S , the score function for for a p (S) . 23 Lij based on the full sample is <>ft HiS h r(Yif, ft) J where Xjk = (A)jfc and d'(y; LI) denotes the derivative of d with respect to LL. Proof . The log-likelihood function of LL is given by n 1 n £ (LL) = n log ap(A) + TJ log b(Yi) - - J ] t T (y i ; LI) At (y,; LL) . (3.11) i = l Z i = l Differentiating (3.11) with respect to LL gives 0*M = i p T (YjjAx) A t (Y i ;/x) 1 " P fat T (Y < ; / i)At(Y i ; / i ) ] 9{t (Y i ; / / ) } m 2 S i l l 9 * ( Y i ; / i ) J m 9M = - E E { ^ ( Y » ; M ) L a { t ( ^ ) } m i = l m = l ^ Z j = l m = l where em is a p x 1 vector with 1 at the mth position and zero elsewhere and we define D ' ( Y ; ^ ) = d i a g { ! ^ , . . . , ^ 4 [r^Y^fn) r(Yp;Lip) Therefore, from (3.12) it follows that ^ . 2 ^ Matrix multiplication of diagonal matrix D' with A gives A . d'{Yi:j-LLj) 3 k r(Yij\ Uj) (3.13) ^ = ~ \ t D ' (Y , ; LL)M (Yi- LI) . (3.14) 24 as the (j, k)th element. Now the multiplication of the resulting matrix from the above with the vector t(Yjj /x) gives a vector with as jth element. Now (3.10) follows immediately from (3.14). • Lemma 3.1 For a multivariate gamma model, for arbitrary X, the score function for Hj based on a full sample given by d^=P4>Mk)r(Y^] (3-lB) Proof . For a multivariate gamma model the D'(yjj ii) in (3.12) can be written as D ' ( Y i ; / x ) - - 2 A ( Y i , M ) V - 1 ( M ) where and A ( Y , / * ) = diag f Y l ~ ^ Yp~^p V(/z ) = diagj / i 2 , . . . , / / 2 ,} Now by using the fact that d(y; p) = 2(^ — log ^  — 1) and by using Theorem 3.2, it is easy to show that (3.15) holds. • Note that, if we make a small-dispersion approximation to d' at Y = /j,, we have (J0rgensen, 1997 p. 25), d\Y-y) « 2r(Y^)V-l/\p). (3.16) Thus, D ' (Y; fi) « 2 V - 1 / 2 (/x). (3.17) We will use the approximations (3.16) and (3.17) frequently in the next section. 25 There is no closed form solution to M ^ = 0 except for the case where we have a single observation vector. But this can be solved by numerical techniques; for example, we may use optimization technique such as quasi-Newton to find numerical values. L e m m a 3.2 Let Y be a random vector, in W, from a multivariate proper dispersion model PDp(ix, £ ) . Then, for arbitrary S , the maximum likelihood estimator of pi is y-Proof. We prove this by the following arguments. The log-likelihood function of fx is given by Let t = t(y;pb). Since £ is positive-definite so is S _ 1 = A . Hence tTAt > 0 and clearly we have t(y; y) = 0. Now if y ^ pi we have t(y; ii) ^ 0 and hence tTAt > 0 again by A being positive-definitive. Hence, tTAt is minimum for ii = y. Therefore it follows from (3.18) that the maximum likelihood estimator of is y. • We now give the second derivative of the log likelihood (3.18) with respect to fx, which will be used in the next section 3.3 Saddlepoint approximations We now introduce the saddlepoint approximations for the multivariate proper disper-sion model. The term 'saddlepoint approximation' was used by J0rgensen (1997b) in (3.18) dH{Ll) dpidiiT 1 \dB'(y-Li) 2 [ diiT 1 [f lD ' (y ; f i ) 2 [ dvT 26 univariate dispersion models and we keep this term for the multivariate cases also. The saddlepoint approximation in multivariate dispersion models, as in the univariate case, fortuitously avoids approximation of the term in the exponent. This is useful because the position vector LL enters the density only via the exponent, so that the saddlepoint approximations preserves the statistical properties regarding the inference on LL. In the following sections we investigate the limiting behaviour of the multivari-ate proper dispersion model when | | £ | | —> 0, that is, | | A | | —>• oo where || • || is the Euclidean norm of the matrix. 3.3.1 Barndorff-Nielsen's p*-formula In this section, we develop a saddlepoint approximation for multivariate proper dis-persion model by using the p*-formula (2.22): Theorem 3.3 Let the multivariate proper dispersion model be given by f (y; LI, £ ) = a ( S ) | V ( y ) | - 1 / 2 exp | - ± t T (y; LL) E"1* (y; //)} . (3.20) Then the saddlepoint approximation for the multivariate proper dispersion model is given by /(y;/x,E) ~ (27r)-"|EH|V(y)|^exp{-^T(y;/x)E-1t(y;/x)}, yenp (3.21) Proof. Let Y ~ PDP(LL, E). For one observation, we have by Lemma 3.2 that the maximum likelihood estimator of fx is p, = y. Note that for a single observation the ancillary statistic is degenerate. Using (3.17) and (3.19), the observation matrix of LL 27 at y is given by d2£{fi) 3 = - dfxdiJ,1 = V - 1 / 2 (y) S ^ V " 1 / 2 (y) . (3.22) Therefore, by using the p*-formula (2.22) the saddlepoint approximation for the mul-tivariate proper dispersion model is given by / ( y ; / * , E ) ~ ( 2 7 r ) - " | V ^ ( y ) S V ^ ( y ) | ^ e x p { - ^ T ( y ; z x ) S - 1 t ( y ; / x ) } = (27r)-f | E | - i | V (y) | " i exp { - i * T (y; ,x) E " 1 * (y; /x) j , (3.23) Hence the result. • We show below, by a Laplace approximation that the above approximation holds for ||5]|| —¥ 0. That is, the saddlepoint approximation is equivalent to | £ | 1 / 2 a p ( y ; s ) ~> ( 2 T ) " § | V ( y ) | " 2 as | | E | | -+ 0 or, equivalently, in multivariate proper dispersion models | £ | 1 / 2 a p ( E ) -> (27r)^ as | | E | | ->• 0 and this approximation is clearly uniform on compacts. 3.3.2 The Laplace method The Laplace method provides a proof of the saddlepoint approximation. The Laplace method for multidimensional case is direct generalization of the corresponding uni-variate discussion in (2.1). The key argument in the univariate case was to produce in the integrand a factor close to a normal density of small variance. Essentially the same argument applies in p dimensions. 28 We now present the important Laplace approximation, as described in, e.g., Barndorff-Nielsen and Cox (1989, p.169). Before we give the Laplace approxima-tion, note that for any matrix A , we can write A = A A 0 where Ao is a fixed constant matrix and A is a constant. P ropos i t ion 3.1 (Laplace approximation) Define where the functions r(y) and q(y) are defined over the region Q of integration, Q C W. Suppose that r(y) has a unique minimum in the interior o / D at y = LL, then where r" is the second derivative of r(y) evaluated at y = LI and | r"(/i)| > 0. Theorem 3.4 Let the multivariate proper dispersion model be given by 7(A) e - A r ( A X ) g ( A A ) ( 2 7 r ) P / 2 | r " ( / x ) | 1 / 2 , as A —> oo, (3.24) / (y ; LL, £) = a (E) | V ( y ) | - * exp | - ^ T (y; LL) YrH (y; A*)} , y G CP where a (E) - l Jn Then as | | £ | | —>• 0, a (E) ~ ( 2 7 r ) -p / 2 | E | - 1 / 2 . (3.25) Proof. Define r ( y ) = ^ T ( y ; / * ) s - 1 * (y; M) • and 29 where d! is the differentiation of d with respect to y. Differentiating r(y) twice with respect to y gives, = l A ' ( y ^ ) S - 4 ( y ; / . ) oy 2 = ^ ^ ^ { s - ^ C y i ^ ^ I p J + U X y j ^ E ^ A ' C y ; ^ ) . (3.26) Note that by using (3.16), we have A'(/it; n) = 2 V - 1 / 2 ( ^ t ) . We have shown in Lemma 3.2 that r(y) has unique minimum at y = /x. At y = /z the first term on the right-hand side of (3.26) vanishes and we have g & = v - v ( l . ) S - . v - / ' W . J y=fj. Now by applying this to (3.24), with the fact that r{y) = 0, we obtain a ( E ) - 1 V ~ ^ ) ( 2 f =(2 , )* |S |*. Now by taking the inverse of the above, we get (3.25). Hence the theorem. • Suppose we think of the approximation obtained by using the p*-formula as a sad-dlepoint approximation, then the approximation obtained by Laplace's method can be regarded as renormalized saddlepoint approximation. Therefore the ordinary and renormalized saddlepoint approximation agree asymptotically in the case of multivari-ate proper dispersion model. That is, in the case of a multivariate proper dispersion model the renormalized saddlepoint approximation is exact. In fact, this is one of the main results for the univariate proper dispersion models. 30 3.4 Mul t ivar ia te normal approximat ion We now consider a multivariate normal approximation to the multivariate proper dispersion model. This will pave the way to mimic some basic multivariate techniques in the multivariate proper dispersion model with small dispersion matrix. Theorem 3 . 5 Let the multivariate proper dispersion model be given by / (y ; LL; £ ) - a (E ) |V(y ) | -* exp { - ^ T ( y ; ^ E " 1 ^ ; /*)} • (3-27) Then Z 4 (0,I P) as | | £ | | 0, (3.28) where Z is a linear transformation given by Z = s - ^ V - 1 / 2 (/x) ( Y - LL) . (3.29) Proof. For small dispersion y gets closer to LI. SO we prove this by introducing a second-order Taylor series approximation to the term in the exponent of (3.27) at y = LI and saddlepoint approximation (uniformly convergent saddlepoint approximation) to the normalizing constant. Let <fe (y;Li) = tT ( y ; A t ) E - 4 ( y ; A t ) dd-z (y; LI) I dy d2d-z (y; LI) = 0 d y d y T = 2 V " 1 / 2 (LI) Y,~lV~ll2 (LL) . (3.30) y=A« Therefore the second-order Taylor series approximation to cfe (y;./z) at y = LL is given by du (y; y) « (y - n)T V " 1 / 2 (») s ^ V " 1 / 2 (LL) (y - /x) . (3.31) 31 The Jacobian matrix is given by j{„) = -^ = *-^v-^{»). Hence the density of Z is / ( z ; / i , E ) = | J ( / i )| - 1 /(v 1 / 2 ( f i )S 1 / 2 z + M;/i,s) « | S - 2 V - 5 (/x) ^ ^ ( S ^ V {V^(//)S5z + /z} | - 5 exp {z Tz}(3.32) = | S - 1 / 2 V - 1 / 2 ( / i)|- 1(27r ) - p / 2 | S | - 1 / 2 | V ( / x ) r 1 / 2 e x p { z T z } (3.33) - (27r ) - p / 2 e x p { z T z } . (3.34) where the term in the exponent of (3.32) was obtained by using the second-order Taylor approximation and the normalizing constant in (3.34) was obtained by the saddlepoint approximation for small-dispersion. Hence the results follows. • Note that we may write (3.28) as follows for small dispersion matrix. We note that d2dy (y; / i ) = , 9{2A' (y ; /x )S -H (y ;^ )} dydfxT dfiT d2d-z (y; p) dydLiT 2 V - 1 / 2 (y) E ^ V " 1 ' 2 (y). (3.35) Hence we have, generalizing (2.13), d2dv , , d2d^ , s d2dv dydyT (y; y ) d/j,dnT (y; y ) dydnr (y'y)' We now consider a normal approximation to another transformation of Y which is approximately equal to (3.29). 32 Theorem 3.6 The normal approximation to the multivariate proper dispersion model (3.27) with the transformation Z - Y,-l'H (Y; LL) (3.36) is given by zAj\f(0,Ip) as | | £ | | -»• 0. (3.37) Proof. For the transformation (3.36), the Jacobian matrix is given by J(Y, i . ) = ^ r = i E - ^ A ' ( Y ; A . ) . Now by using a second-order Taylor series approximation for small dispersion at H = y, we get A ' (Y; it) « 2 V ~ 1 / 2 ( Y ) and thus J (Y) « s - ^ V " 1 / 2 (Y). Hence, the density of Z is given by / ( z j / i . E ) « | J ( y ) | - 1 a ( E ) | V ( y ) | - 1 / 2 e x p { z T z } = E - ^ V - ^ ^ p ' ^ ^ l V ^ r ^ e x p f z ^ } ~ V 1 / 2 (y ) E V 1 / 2 ( y ) | " 1 / 2 ( 2 T T ) - ^ 2 | E | - 1 / 2 | V (y ) I" 1/2 exp { z T z } ~ (27r ) - p / 2 exp{z T z} . (3.38) The normalizing constant in Equation (3.38) is obtained by the saddlepoint approxi-mation. Therefore (3.37) follows immediately. • Note that we may write (3.37) as follows t (Y; LI) ~NP (0 , E ) . It is noted that since t (Y; LL) « V - 1 / 2 (LL) (Y — LI) we have the normal approxima-tions (3.28), and (3.37) are approximately equal, as we expected. 33 Chapter 4 Bivariate gamma distribution In this chapter, we discuss the bivariate gamma distribution. In Section 1, we discuss the normalizing constant in the case of the multivariate gamma model and study the normalizer numerically. In fact, we employ two methods to estimate the normalizer. Although in their construction of multivariate dispersion models, J0rgensen and Lau-ritzen (1998) did not require the definition of multivariate dispersion models to have specific moments or marginals, we, in Section 3, investigate the moments up to order two in a bivariate situation. And finally, in Section 4, we investigate the marginals numerically. 4.1 Numer ical s tudy of normal iz ing constant Let the multivariate proper dispersion model be given by J0rgensen and Lauritzen (1998) have shown the following derivations to calculate the normalizing constant. / (y ; / i ; A) = a p (A) |V(y)|-5 exp [ - ^ T ( y ; M)A*(y; A*)} (4.1) < ( A ) = / . . . / { V-*{yi) • • • V-HVP)} ^ P { - ^ T ( y ; /*)A*(y; A*)} dy (4-2) 34 = J -J{V *(yi)...V 2(y p )}exp j ^ A ^ O / i ; ^ ) 1 e x p j - 2 !C A ^ R A * 0 r | dy n ^ ^ - ^ A j j ) y . . . y"exp|-^5ZA 0T (yi] ta) r {yj\fij) i ( 1 p n i = i {c(Ajj)^"^(?/j)dyj} exp ^ - - ^ A^d (y*; ^ ) n ^ c - ^ A ^ O E l e x p i+3 - ^ T ( y ; / / ) A 0 t ( y ; fx)} (4.3) where A 0 has zeroes on the diagonal, and off-diagonal elements equal to A , c is the normalizer of the marginal univariate proper dispersion model, and the expectation is with respect to the distribution with independent marginals corresponding to the diagonal of A . In fact, Song (1996) used a derivation same as the above to estimate the normal-izing constant for the multivariate gamma model. By using (4.3), the normalizing constant for multivariate gamma model is given by (nTA*')exp{-E?A«} - i n? r (A«) E e x p { - £ V ( * i ; l ) r ( X , - ; l ) (4.4) The expectation is taken under the probability measure generated by p-independent univariate gamma random variables Xi ~ Ga (l, A^1) , i = 1,... ,p. This expectation in (4.4) can be calculated by using Monte Carlo simulation. Note that we may estimate the normalizer by using numerical integration directly applied to (4.2). Note that this approach may not be a good one to estimate the normalizer in the case of higher dimensions. 35 In the case of Monte Carlo simulation, we have to estimate the following for the bivariate gamma: (n jA^expj -E?^} !" 1 . nir(A«) E [ e x p { - A 1 2 r ( X i ; l ) r ( X 2 ; l ) } ] , where the expectation is taken over two independent gamma random variables with mean 1 and dispersion parameter A ^ 1 . Before estimating the normalizer, let us define the following as a measure of association between two variables <7l2 P = i where (E)^ = Oij. We consider the following dispersion matrices ( £ ) and the measure of association (p) to study the normalizing constants . Table 4.1: Dispersion Matrix. 0.75 0.3 0.1 -0.1 -0.3 -0.75 S i = So = s, = s 4 = E s = E f i = An S-Plus program interfaced with C was written to estimate the normalizing constant by using Monte Carlo (MC) simulation. In fact, S-Plus was used to generate 36 Table 4.2: Estimation of normalizing constant by using M C method. p S n 1000 10000 50000 100000 0.75 S i 6.25 6.02 6.13 5.99 (0.21) (0.14) (0.05) (0.01) 0.3 s 2 0.3814 0.37869 0.37874 0.37905 (1.3 x 10"3) (5.3 x 10~4) (2.3 x lO" 4 ) (1.3 x 10~4) 0.1 s 3 0.11263 0.112513 0.112507 0.112498 (1.2 x 10~4) (4.4 x 10~5) (1.9 x 10"5) (1.09 x 10~5) Table 4.3: A comparison of estimations of normalizing constant. p S M C method Saddlepoint approximation 0.75 S x 5.99 (0.01) 6.02 0.3 s 2 0.37905 (1.3 x 10~4) 0.41709 0.1 s 3 0.112498 (1.09 x 10"5) 0.14307 the gamma random variables and C was used to do the calculation. Since C was used to do the calculation, the time taken to complete a simulation was very small. For example, for a simulation study on 100000 random numbers, the average time was 8 sec. using an HP C200 machine. In Tables 4.2 and 4.3, the estimates are given with their corresponding standard errors (given in brackets). Each simulation was repeated 10 times to get the standard errors. It is observed from Table 4.2 that as we increase the number of simulations (n) from 1000 to 100000, we obtain almost the same values of the estimate but with higher precision. In Table 4.3, we compared the estimate obtained by using M C methods with ns 100000 and the estimate obtained by using saddlepoint approximation. For small 37 dispersion ( | | £ | | small) both estimates are very close to each other. Then as | | £ | | increases saddlepoint estimates tends to over estimate the normalizing constant. Another interesting and potentially important empirical observation was that when a similar simulation study was done to estimate the normalizing constant in the cases of £ 4 , £ 5 and £ 6 the values obtained were almost equal to that of £ 1 , £ 2 and £ 3 , respectively. This situation is similar to a bivariate normal where the values of normalizing constant in the cases of £ 4 , £ 5 and £ 6 are equal to that of £ 1 , £ 2 and £ 3 , respectively. 4 . 2 Density plots In this section, we give some density plots of the bivariate gamma. First we shall investigate the behaviour of contour plots and perspective plots when we have two independent gamma margins. That is, the dispersion matrix is diagonal. In this case, we consider the position vectors and dispersion matrices given in Tables 4.4 and 4.5. Table 4.4: Position vectors. I1) (3) V 1 / I 2 / \5) Table 4.5; Dispersion matrices. 0.04 0 \ / 0.2 0 \ / 2 °1 £ 7 = £ 8 = £ 9 = I 0 0.04 / I 0 0.3 / I 0 2 J Figure 4.1 shows some contour plots of independent bivariate gamma densities and Figure 4.2 shows the corresponding perspective plots. The main reason for plotting the contour plots of independent bivariate gamma densities is to investigate the role 38 39 E 7 S 8 S 9 Figure 4.2: Perspective plots of independent bivariate gamma densities. 40 of the position vector LL on the shape of the densities. In the first row of Figure 4.1 the dispersion matrix is small and thus we can observe clearly that as position varies the peak of the density varies accordingly. When we have large dispersion matrix (in row 3) the peak becomes blurred, essentially because the large value of the dispersion squeezes a lot of probability mass down towards the origin. Large values of the dispersion, as explained by J0rgensen (1997b), corresponds to a value of the Euclidean-norm of the dispersion matrix for which the non-normality of the distribution becomes evident. Contour plots and perspective plots for bivariate gamma densities when the two variables have some dependence structure were discussed in three cases: 1) dispersion matrix with small dispersion; 2) dispersion matrix with moderately large dispersion; 3) dispersion matrix with large dispersion. Small dispersion: The dispersion matrices used in this case are given in Table 4.6. And the contour plots and perspective plots corresponds to this case are given in Figures 4.3 and 4.4. Table 4.6: Small dispersion matrices. 0.04 0.03 0.03 0.04 0.04 -0.03 -0.03 0.04 0.03 0.015 0.015 0.09 0.03 -.015 -0.015 0.09 Moderately large dispersion: The dispersion matrices used in this case are given in Table 4.7. And the contour plots and perspective plots corresponds to this case are given in Figures 4.5 and 4.6. Table 4.7: Moderately large dispersion matrices. 0.1 0.138 0.138 0.3 0.1 -0.138 -0.138 0.3 0.2 0.085 0.085 0.4 0.2 -.085 -0.085 0.4 Large dispersion: The dispersion matrices used in this case are given in Table 4.8. 41 p = 0.75 S u Figure 4.3: Contour plots of bivariate gamma densities with small dispersion. 42 A*i A*2 A*3 p = 0 . 7 5 S n p = - 0 . 7 5 £ i 2 p = 0.3 E 1 3 p = -0.3 S u Figure 4.4: Perspective plots of bivariate gamma densities with small dispersion. 43 Figure 4 . 5 : Contour plots of bivariate gamma densities with moderately large disper-sion. 4 4 Mi M 2 M 3 p = 0.80 Sis p = -0 .80Si 6 p = 0.3 S17 p = -0.3 Sig Figure 4.6: Perspective plots of bivariate gamma densities with moderately large dispersion. 45 1 Figure 4.7: Contour plots of bivariate gamma densities with large dispersion. And the contour plots and perspective plots corresponds to this case are given in Figures 4.7 and 4.8. Table 4.8: Large Dispersion matrices. Discussion: In the independence case, (Figure 4.1) the contour plots are almost ellipsoid like bivariate normal distribution where the axes (major and minor) of the contours are parallel to the xy-axes. When the two variables have some dependence structure, for small dispersion ( Figure 4.3), we considered two cases of dispersion matrices: one with diagonal elements equal: and the other with diagonal elements of ratio 3 and with large and small measure of 46 p = - 0 . 8 0 S 2 0 Figure 4.8: Perspective plots of bivariate gamma densities with large dispersion. association. In the first case of dispersion matrix, for positive measure of association, the major axis of the contour plot ( almost ellipsoid) is almost along the 4 5 ° line through LL. When the measure is negative the major axis of the contour plot lie along a line at right angle to the 4 5 ° line through LL. That is, for small dispersion, the bivariate gamma behaves like bivariate normal. We may also note how the larger measure of association causes the probability to concentrate along a line. When we dealt with moderately large dispersion matrices (Figure 4.5), this large dispersion causes some deviation from normality but still preserve the fact that ac-cording to the sign of the dependency measure the major axis is either along the 4 5 ° line or perpendicular to it. Since a large dispersion in gamma density squeezes lot of probability mass down to-wards the origin the corresponding plots (Figures 4.7 and 4.8) are no longer ellipsoids. 47 4.3 Est imat ion of moments In this section, we estimate the moments up to order two. The estimation, for dimen-sion p , can be done by using Monte Carlo simulation. The formula for this simulation can be obtained by employing the same kind of derivation that was used to obtain a formula for the normalizer (4.4). We give the equation in the case of two variables Y\ and Y2. ( (Vp xxA exp ( - T2 X-] 1 _ 1 E (Yi) = ap (S) | 1 ulTiXii) 1 } E [ Y i 6 X P { ~ A l 2 r { Y l ] ^ r { Y 2 ] P 2 ) } ] • (4.5) E (Y2) = ap (£) { ( m A ^ ^ E ' A r i } } 1 E [Y2 exp { - A 1 2 r (*; M l ) r (Y2; »)}] . (4.6) The expectation is taken under the probability measure generated by two independent univariate gamma random variables Yi ~ Ga (^i, A^ 1 ) , i = 1,2. This expectation can be calculated by using Monte Carlo simulation. In the case of small dimensions we may also use numerical integration to estimate the moments. In Tables 4.9 and 4.10, we give estimates of moments, using Monte Carlo simulation, for two dispersion matrices. The first one with p = 0.75 and the second with p = 0.3. We compare the estimated values with reference values (Ref. Val.) obtained by using the marginals Yi ~ Ga(pi,ou) i = 1,2. A C program interfaced with S-Plus was used to estimate the moments. A set of 100000 random numbers was generated from corresponding gamma distributions to estimate the moments and this was repeated ten times to estimate the standard error. It was noticed from Tables 4.9 and 4.10, that although the estimates are almost equal for large and small measures of association, the corresponding standard errors are small for small measures of association. This implies that the off-diagonal term 48 in the dispersion matrix plays an important role, along with the diagonal elements, in estimating the moments. Table 4.9: Estimation of moments. E = ( 0.04 0.03 N ^ 0.03 0.04 j fj, = C) fj, = w Ref. Val. Estimated Ref. Val. Estimated 1 0.9893 (0.0019) 3 2.9879 (0.0044) E(Y2) 1 0.9894 (0.0017) 4 4.0083 (0.0426) Var(Yx) 0.04 0.0404 0.36 0.3122 (0.0019) (0.0578) Var(Y2) 0.04 0.0341 0.64 0.62 (0.0115) (0.26) 4 . 4 Numer ica l s tudy of marginal density In this section, we investigate marginals from the bi-variate gamma. For this study, we considered the position vector and dispersion matrices given in Table 4.4. A numerical integration routine in C was used to integrate out the second variable at some specified points of the first variable. The reference density and the estimated density are given in Figure 4.9. The first plot is for small dispersion, the second and third are for moderately large dispersions and the last is for large dispersion. It is noted, from the first plot, that the estimated and the reference marginal densities are almost exactly the same. In the second and third plots the estimated and the reference marginal densities are very similar. But for the large dispersion, in the last 49 Ga(3,0.04) Ga(3,0.5) Ga(3,l) Figure 4.9: A comparison between true and estimated marginal densities (estimated density is given in solid lines). 50 Table 4.10: Estimation of moments. •E = ' 0.03 0.015 \ v 0.015 0.09 ) fx = C) fi = {31 Ref. Val. Estimated Ref. Val. Estimated E(Y,) 1 0.9949 (0.0003) 3 2.9841 (0.0006) E(Y2) 1 0.9953 (0.0003) 4 3.9784 (0.0001) Var{Yx) 0.03 0.0298 0.27 0.2717 (0.00001) (0.00001) Var(Y2) 0.09 0.0951 1.44 1.4326 (0.0008) (0.0093) Table 4.11: Dispersion matrices. 0.04 0.03 0.03 0.04 0.2 0.12 0.12 0.8 0.5 0.112 0.112 2.5 1 0.735 0.735 1.5 plot, there is a marked difference between the estimated and the reference marginal densities. 51 Chapter 5 A generalization of Hotelling's T 2 -statist ic 5.1 In t roduc t ion One of the most important test problems in univariate statistics is the test of the mean of a given distribution when the variance of the distribution is unknown. The statistic usually used in univariate statistics is the difference between the sample mean, x, and the hypothetical population mean, divided by the sample standard deviation, s. If the population sampled is iV(/i, cr2), then t = ^ r i ^ ^ (5.1) s has the well-known ^-distribution with n — 1 degrees of freedom, where n is the sample size. On the basis of this fact, one can test the hypothesis HQ : p = po, where JIQ is specified. The statistic (5.1) was generalized for non-normal distributions in the framework of generalized linear models, cf. J0rgensen (1997a). J0rgensen used the corresponding estimated dispersion parameter in the denominator of (5.1). He, in fact, showed 52 that the statistic follows asymptotically a ^-distribution for large samples and small dispersions. The multivariate analogue of the square of t given in (5.1) is T 2 = n ( X - A t o ) T S - 1 ( X - M o ) (5.2) where X is the mean vector of a sample of size n and S is the sample covariance matrix. (5.2) is referred to as the Hotelling T2-statistic in the multivariate normal model. In the following sections, first we discuss the general likelihood ratio method in the multivariate normal model which will be used to derive (5.2). Then we generalize the T 2-statistic to the multivariate proper dispersion model. 5.2 General l ikel ihood rat io method in mul t ivar i -ate normal model Let 6 be a vector consisting of all the unknown population parameters, and let L (6) be the likelihood function function obtained by evaluating the joint density of X l 5 . . . , X n at their observed values xx,..., x„. The parameter 0 takes its values in the parameter set 0 . Let us consider the p-dimensional multivariate case, where the X^s are independent and identically distributed as JVp(LL, £ ) . Then 0 consists of the product of the p-dimensional space for means and the p(p + l)/2-dimensional space of variances and covariances such that £ positive-definite. Therefore, 0 has dimension v = V + P(P + l ) /2 . Under the null hypothesis H0 : LI = fx0, LL0 is restricted to lie in a subset 0 O of 0 . So 0 O has dimension v0 = p(p + l ) /2 (mean vector is known). 53 The likelihood ratio test of H0 : /x = fi0 rejects H0 in favour of Hi : LI ^ ^t 0 if A = m a x ^ = A t o L ( A t , S ) ^ m a x ^ ^ L (tx, S) where c is a suitably chosen constant. When the sample size is large and under certain regularity conditions, the sampling distribution of —2 log A is well approximated by the chi-squared distribution with degrees freedom v — v0. That is, —2 log A ~ xl-Vo-5.2.1 T2-statistic in multivariate normal Before we make a connection between likelihood ratio test for the hypothesis H0 : tx = tx0 and Hotelling's T 2-statistic, we first give the general definition of the Hotelling's T 2-statistic and its distribution. Def in i t ion 5.1 Let Z ~ jV (0,IP), W ~ W (Ip,n — 1) (Wishart distribution), and let the two random variables be mutually independently distributed. Then the random variable T2 = {n- l)ZTVf-lZ (5.3) is called Hotelling's T 2-statistic. The distribution of T2 is called Hotelling's T2-distribution with n d.f. The T 2-statistic and its distribution were originally developed by Hotelling (1931). Now let us consider the likelihood ratio test for the hypothesis H0 : /j, = LI0 vs. H i : /x ^ /x0 on the basis of the random sample ( x i , . . . , x n ) drawn from J\fp (fi, S ) , T, positive-definite , where n > p , fi0 is a given vector, and (//, S) is unknown. Let 0 = {(/x, S) : —oo < pi < oo, i — 1 , . . . ,p, £ positive-definite} , ©o = {(/x, S) : pt = /x0, £ positive-definite} . 54 The likelihood ratio criterion A is given by A = m a x 6 > 6 0 o L ( 6 > ) m a x 0 6 0 L (0) / T2 \ ~n^2 ra - 1 where T 2 = n ( x — LI0 ) S 1 ( X — LI0^ . Note that this T2 can be written in the form of (5.3) by letting Z = V ^ T 1 7 2 ( X - Mo) and V = nSn and W = E ^ V E " 1 ^ . Since A is a monotonically decreasing function of T 2 , the likelihood ratio test based on A is equivalent to the following: Accept H0 if T2 < T2 (a), Reject/^ if T2 > T 2 (a), where T2 (a) is the upper a-point of the distribution of T2 for a specified significance level a. Note that when p = 1, T 2 reduces to the square of Student's t-statistic. Der iva t ion of d i s t r ibu t ion of T 2 -s ta t is t ic Consider the statistic T2 = (n — 1 ) Z T W _ 1 Z . Since T2 is invariant under any non-singular transformation, we make use of a random orthogonal transformation Z * = H Z , W * = H W H T , where H P X P is an orthogonal matrix such that its first row is z l / 2 . Noting that H Z = | ( Z T Z ) , 0,. . . , oj , T2 reduces to the following simple form: T2 = (n- 1) ( H Z ) T ( H W H T ) _ 1 ( H Z ) = (n - 1 ) ^ ; 55 where w~iX is the (1, l)th element of W " 1 . Now we have (zTz) ~ x 2 . By the property of the Wishart distribution, we have the distribution of W* condition on H is W (I p, n — 1). Suppose we partition the W* as follows: ^ wn W 1 2 V W 2 i W 2 2 then we can write w+n — w\\ — Wi 2 W 2 " 2 1 W 2 i and this has a chi-squared distribution of n — p degrees of freedom independent of H and hence of Z. Hence the distribution of T 2 is given by X2P P r>j — — ( n - 1 ) Xl-P ( n - P y p ' n - p -The above method of derivation of the T 2-distribution by making use of a random orthogonal transformation is due to Wijsman (1957). 5.3 M a x i m u m l ikel ihood estimators of LL and E In this section, we discuss the small-dispersion asymptotic maximum likelihood esti-mators of LI and X . Theorem 5 . 1 Let Y x , . . . ,Yn, be a random sample from PDP (fi, S ) (multivariate proper dispersion model with p-dimension) where fi and X are the position vector and dispersion matrix, respectively. Then asymptotically, the maximum likelihood estimators of fi and S are given by fi « Y as | | S | | -> 0, (5.4) E ~ - E { * ( Y i ; Y ) t T ( Y i ; Y ) } as | | S | | ^ 0. (5.5) n i=i 56 Proof. We have, from (3.14) in Theorem 3.2, that ^ = -^ED ' (Y i ; M)At(Y , ; A x) . (5.6) For small-dispersion, we have D ' ( Y l ; A t ) « 2 V - 1 / 2 ( M ) , and t ( Y l ; A t ) « V - 1 / 2 ( A t ) ( Y i - A t ) . Hence, we have ^ « ± V - 1 / 2 ( A i ) A V - 1 / 2 ( M ) ( Y i - /*). (5.7) a M i=i Hence (5.4) follows immediately from (5.7) by taking 9 Q J ^ = 0. For small-dispersion, by the saddlepoint approximation, we obtain / (y; /x, £) » (27T)-* | E | " * | V (y) | "* exp { -^ t T (y; /z) £ " 4 (y; jx)} . (5.8) Hence the log-likelihood function is given by £(LI, E) « log {(27T)-? ft | V (y 4) | - 1 / 2 | - £ log | E | - \ ± tT(yi; / i J E " 1 ^ ; / i ) . (5.9) We now differentiate (5.9) with respect to E . = - T ^ - 1 + ^ t s _ 1 {* n) tT M)} S " 1 . (5.10) i=l Now taking = 0 and inserting the estimator /x = y gives a maximization problem similar to that for the multivariate normal and we obtain (5.5). • 5.4 T 2 i n multivariate proper dispersion model In the general setup of the T 2-statistic, we consider the likelihood ratio test for the hy-pothesis H0 : /x = (j,0 vs. Hi : LI ^ /x0, where /z0 is known and the dispersion matrix 57 X is unknown. For the discussion of T 2-statistic, we consider two types of asymptotic theory, namely standard large-sample theory (n —>• oo) and small-dispersion asymp-totics. But since we use the estimator of X in the likelihood ratio test, we cannot make | | X | | —> 0. So we introduce a known weight matrix A for each observation vector such that A is symmetric and regular and consider | | A j | | —>• oo for each i = 1,..., n. The limit | | A j | | —>• oo is the theoretically simplest way of expressing small-dispersion asymptotics; in practice the results may be applied for | | X | | —)• 0 or more generally | | A ~ 1 / 2 S A ~ 1 / 2 | | 0 for all i. Consider the likelihood ratio test for the hypothesis H0 : fx = fx0 vs. Hi : LL ^  LL0 on the basis of the random sample Y i , . . . , Y n , with distribution Y^ ~ PDP [LL, A ^ ^ X A ^ i = 1,..., n, where X is positive-definite, n > p and (fx, X ) is unknown. Let 0 = {(fx, X) : ^ e f i , i = l , . . . , p , S positive-definite} , 0o = {(A*> X) : fx = fx0, X positive-definite} . We know from Theorem 3.6 that S - 1 / 2 A j / 2 t ( Y i ; fx) ~ JVP(0,IP) as | | A ; | | -> oo 1 /2 that is, A / t (Yf, fi) ~ J\fp (0, X) as | | A ; | | —> oo. L e m m a 5.1 Let Y i , . . . , Y n be a random sample from PDP [fx, A ~ 1 / , 2 X A ~ 1 7 2 ) . Then, for 1  A^l | —> oo the maximum likelihood estimate of X is approximated by s « ^EAj / a*(yi;y)*T(y*;y)Aj / 2. (5.11) n i=i Suppose fi is known and equals to fx0. Then, for \ \Ai\ \ —>• oo the maximum likelihood estimate of X is given by S o ~ - E A , 1 7 2 * (yt; fx0) tT (y,; fi0) A]/2. (5.12) n i-l 58 Proof. The results (5.11) and (5.12) follow immediately from Theorem 5.1. • L e m m a 5.2 Let Z{ = A\'2t ( Y f ; fx0) and S = ^ £ ? = 1 A.]/2t (y i ; y) t T (y i ; y) A\'2. Then S and Z are asymptotically independent for | | A j | | —>• oo /or a// i , where Z = 1/n XI Z i , and furthermore Z - A / ^ O , ^ (5.13) and (n - 1)S ~ W n _ ! ( £ ) . (5.14) where W„_i (S) is £/ie Wishart distribution with n — 1 degrees of freedom. Proof. Since Y i , . . . , Y„ are independent, so are Z i , . . . , Z n . We have by Theorem 3.6 Zi ~ A/p ( 0 , S ) as 11Aj|| -r oo (5.15) and hence Z~J\fp(o, ^ as 11Ai|| oo for a l i i . , (5.16) Now by applying a second-order Taylor series approximation to Zi we obtain A]/2t ( Y ; LI0) « A ^ V " 1 / 2 (At0) ( Y - At0) = A j / 2 { V - V 2 ( / i f l ) ( Y - Y ) + ( M o ) (Y - M o ) } « A j / 2 t ( Y i ; Y ) + i f : A j / 2 t ( Y , ; / i 0 ) . Hence A j / 2 * ( Y i ; Y ) ~ Z i - Z . (5.17) Let Zpxn = ( Z i , . . . , Z„), and X = Z H , where H n X 7 1 is an orthogonal matrix and nth column is n 1 / 2 l n x l so that X„ = n : / 2 Z . Now X ~ Afp (0, E ® I „ ) , (5.18) 59 so, the columns of X are independently normally distributed with the same covari-ance matrix X . Now let X ' = ( X i , . . . , X n _ i ) , then we have that X ' and X„ are asymptotically independent. Now consider ( n - l ) S = ^ Z i Z 7 - n Z Z T i=i = Z Z T — X n X n = X X T — X n X n = E X . X T . ' (5.19) i=i Hence we have that S and Z are asymptotically independent for | | A j | | —» oo for all i. The result (5.15) now follows immediately from (5.18) and (5.19). • Note that S is an asymptotically unbiased estimator of X for | | A , | | —» oo for all i. We define the T 2-statistic for testing H0 : LL = Li0 vs. Hi : LL ^  LL0 with X unknown by the following monotone transformation of the likelihood ratio test A, T 2 = (n - 1) ( A " 2 / " - l ) . (5.20) H0 is rejected for small values of A or, equivalently, for large values of T 2 . Note that in the case of multivariate normal the Equation (5.20) is exactly the Hotelling's T 2-statistic. Theorem 5.2 Consider the test of the hypothesis Ho : LL = /x0 vs. Hi : LL ^  LL0 on the basis of the random sample Y 1 ; . . . , Y„ drawn from PDP [LL, A l ~ 1 / / 2 X A ~ 1 ^ 2 ) . The T2 in (5.20) then follows approximately the Hotelling T2-distribution. Hence, furthermore, we can show that T ~ F p n _ „ as A J —> oo for all i. (5.21) n — p ' 60 Proof. It can easily be shown that the likelihood criterion for the above hypothesis, under multivariate normal approximation, is given by A 2 / " « Now by taking we have Hence E + ( E 0 - S ) | Isl B ( -1) s + zz s + zz A2/r, s + zz s z z T - 1 s - i - z T s - 1 z ( l + Z T E " 1 z ) V n - 1 I 1 + T2 ' ra - 1 T2 « (n - 1) ( A " 2 / " - l ) and we have T2 « raZTS-1Z (5.22) Since fnZ • ~ JVP (0, S ) , 61 S ~ W n _ i ( S ) and Z and S are asymptotically independent, by Definition 5.1, T2 is approximately the Hotelling T 2-distribution. Hence T)(TI — 1 J T2 ~ Fnr,-r, as ||A,|| —» co for all i. n — p Hence the theorem. • Theorem 5.3 For large sample, Equation (5.21) still holds. That is, -,2 .. P(n ~ 1) T ~ Fvn_vas n —> oo /or all i. n — p Proof. By large sample theory, we have v = — 2 log A —>• x 2 as n —> oo. Using (5.20), we have T 2 = ( n - l ) ( e - - l ) ( n - l ^ P f r - 1 ) ^ n n p p(n - 1) n — p Hence T 2 ~ as n ->• oo. • •Pp.oo as n —> oo. 62 Tree Data (Diameter) Tree Data (height) tO 12 14 16 (in inches) 70 75 60 85 {in In!) Figure 5.1: Histograms of diameter and height of the trees. 5.4.1 Numerical study on the T2-statistic In this section, we investigate the density of the generalized T2 statistic via two ways; 1. by using data, 2. by simulating random numbers from the bivariate gamma model. B y using data: The data set used in this analysis was obtained from Ryan, Jointer and Ryan (1985). The Trees data contains three variables. The variables are diameter (d), height (h) and volume (v) for 31 black cherry trees in Allegheny National Forest, Pennsylvania. The variables are clearly strong candidates for a multivariate gamma model. We consider the first two variables in our density estimation problem. The histograms of these two variables are given in Figure 5.1. Asymptotic estimates for the position vector and dispersion matrix is given as follows ^ 13.25 ^ 76.00 and X = 0.05 0.01 0.01 0.007 and | |E | | = 0.05. To estimate the density of T2, we bootstrap 1000 samples from the data and the values were calculated for both the classical T 2-statistic (5.2) and the generalized T 2 -63 statistic (5.22). A non-parametric density estimation was obtained by using kernel smoother (with Silverman's band width). The estimated density is compared with the true density, F p > n _ p = F 2 ) 2 9 . The plots are given in Figure 5.2. From the Figure 5.2, we observe that our generalized T 2-statistic behave almost exactly the same as the classical T 2-statistic for this data set. The main reason may be that the dispersion matrix is small. Even there is some discrepancy at the upper part of the distribution, the approximation is very accurate around the critical region. Note that ^ 2 , 2 9 , 5 % = 3.33 and i*2,29, i% = 5.02 Although our generalized T 2-statistic obtained from asymptotic theory, we may use this when we have a non-normal data, data from multivariate gamma, in this case. B y S imula t ion study: One of the hardest part in this thesis was to find a suit-able method or procedure to generate random numbers from the multivariate gamma distribution, bivariate gamma distribution in this case. The main problem in em-ploying the well known Markov chain Monte Carlo methods such as Gibbs sampling and Metropolis algorithm, is that it is hard to find the conditional distributions in the multivariate gamma. Neal (1997) proposed a Markov chain Monte Carlo method based on 'Slicing' the density function. The method does not require the whole den-sity function, only requires a function that is proportional to the density. Therefore, it is feasible to use this method in the multivariate gamma where the normalizing constant is unknown. The idea is that suppose we need to sample from a distribution over some subset of lZn, with density function proportional to some function f(x). One simple idea is to sample uniformly from n +1 dimensional region that lies under the plot of f(x). This idea can be formalized by introducing an auxiliary real variable, y, and defining a joint distribution over x and y that is uniform over region U = {(x,y) : 0 < y < f(x)} 64 Classical T 2-statistic 65 below the curve defined by f(x). The algorithm for a single variable (x) is briefly as follows: We sample alternately from the conditional distribution for y given the current x -which is uniform over the interval (0, f(x)) - and from the conditional distribution for x given y - which is uniform over the region S = {x : y < f(x)}, which is called the 'slice' defined by y. For multivariables: each real-valued variable, be updated by a single-variable slice sampling procedure. A S-plus program interfaced with C routine was used to generate the random numbers from Ga2(Li, S),where Five hundred samples each with 50 random numbers were generated and the classi-cal T 2-statistic (5.2) and the generalized T 2-statistic (5.22) were calculated and the corresponding estimated densities were compared with true density, F2>48 (see Figure 5.3). From the Figure 5.3, we observe that the generalized T 2-statistic behave a little bit better than classical one. The discrepancy between the estimated and the true densities is more in the classical case than in generalized case. Although there is a discrepancy between the estimated and the true densities in the upper part of the plot, there is pretty close agreement between the estimated and the true densities in the tail area in both cases. That is, the approximation is very close around the critical region. This is a very important observation because we are interested in hypothesis testing. Note that F2^8^% = 3.19 and i*2,48,i% = 5.08. V 0.015 0.04 and IISII = 0.1. 66 Classical T 2-statistic 67 Discussion: Bo th in the simulation study and in the data analysis, we observed that the generalized T 2 -s tat is t ic behave almost as the same as the classical T 2 -s tat is t ic . Then the following question arises here: what is the usefulness of the generalized T 2 -s tat is t ic? The one answer for this question is that in most situations gamma data behave almost like normal data. That is why we could not observe a marked differ-ence between the generalized T 2 -s tat is t ic (gamma case) and the classical T 2 -s ta t is t ic (normal case, of course). Another answer may be that we simulated data from gamma with small dispersion matrix, in the case of small dispersion, the multivariate proper dispersion model is very similar to the multivariate normal model. Most of the results in non-normal data analysis have developed by using asymptotic theory. But this does not mean that we have to apply the results only when we have small-dispersion or large sample. A s we do in generalized linear model, we can apply in any situation. The main point here is that we feel quite comfortable and satisfied when we do the data analysis of non-normal data with the relevant non-normal dis-tr ibution rather than wi th the normal distribution. A s a final note to the non-normal data analysis, we quote the following from j0rgensen (1997b): "While normal distribution is useful for many types of data the normal dis-tribution is the exception, rather than the rule, except for data with small dispersion. There is an importance of describing data in their natural habitat. Analysis of non-normal data should hence take into account the actual form of the sample space for each type of data". 68 A p p e n d i x A Some results on matrices Here we give basic results on matrices. For more details, see for eg. Dwyer (1962), Johnson and Wichern (1992) and Rao (1985). A . l Determinant 1. Let A : p x p,B : p x q,C : q x p,D : q x q. Then A B — < ' |A| |D - CA^BI if A is nonsingular C D |D| |A - B D ^ C I if D is nonsingular 2. Let A : p x q, B : q x p. Then |IP + AB| = |I + BA|. A . 2 Inverse 1. Let A be an n x n nonsingular matrix that is partitioned as where A^ : rii x rij, i, j = 1, 2 and rii + n 2 = n. Then ( A n - Ai 2 A22 1 A 2 i ) is the upper left-hand block of A - 1 . A.3 Kronecker product (Direct product) Definition A . l Let A = (a^ ) be an m x n matrix and B = (bij) be a p x q matrix. Then the (mp) x (nq) matrix f auB ai 2 B . . . a i n B ^ a 2 i B a 2 2B . . . a 2 nB \ a m i B a m 2 B . . . a m n B j is called the Kronecker product or the direct product of A and B and is denoted by A <g> B. Some elementary properties 1. (cA) <g>B = A ® (cB), 2. (A ® B) <g> C = A <g> (B ® C), 3. (A <g) B) T = A T <g) B T , 4. (A (g) B ) _ 1 = A - 1 (g) B _ 1 for nonsingular A and B, 5. (A <g> B)(G (g) D) = (AC) <g> (BD), 6. |A (g) B| = |B ® A| = | A | p | B | m for A : m x m and B : p x p. Definition A.2 F o r a #n;en A : p x q, we denote by VecA the vector obtained by writing the columns of A one below the other starting with the first. 70 A . 4 M a t r i x derivatives and Jacobians Defini t ion A .3 For a scalar function f of a matrix variable X = (xij) : m x n we define its matrix derivative by df f df <9X \dxij : (m x n). When n = 1, X is a column vector x and the corresponding vector df /cbc is called the vector derivative. Table A . l : Vector Derivatives ( x : j ) X 1) / df/dx a T x (a is constant) a x T x 2x x T A x , A:pxp (A + A T ) x x T A x , (A = A T ) 2Ax Matrix Derivatives • Let w = w(Y) be a scalar valued function of a matrix Y which is a function of matrix X then • The derivative of an element with respect to the matrix d ( C X T D ) D K T C , where C X T D is a p x q matrix and K is p x q matrix with zero in every position except for kij = 1. 71 • Let U(X) : p x q and V(X) : q x r be matrix functions of a matrix X. Then the Vec derivative of U(X)V(X) respect to X is given by T , T aVec(U) . T T . aVec(V) aVecT(X) 1 v ^ ~ ; « 9 V e c T ( X ) 72 Bibliography [1] Barndorff-Nielsen, O.E. (1988). Parametric Statistical Models and Likelihood, Lecture notes in Statistics, 50, New York, Springer-Verlag. [2] Barndorff-Nielsen, O.E. and Cox, D.R. (1989). Asymptotic Techniques for use in Statistics, New York, Chapman and Hall. [3] Barndorff-Nielsen, O.E., and Cox, D.R. (1994). Inference and Asymptotics, New York, Chapman and Hall. [4] Barndorff-Nielsen, O.E. and J0rgensen, B. (1991). "Some parametric models on the simplex", J. Multivariate Anal, 39, 106-116. [5] Chambers, J . M . (1977). Computational Methods for Data Analysis, Wiley series in Prob. and Math. Statistics, New York, John Wiley and Sons. [6] Dwyer, P.S. (1962). "Some applications of matrix derivatives in multivariate analysis", Journal of American Statistical Association, 62, 607-625. [7] Everitt, B.S. (1987). Introduction to optimization methods and their application in Statistics, New York, Chapman and Hall [8] Hotelling, H. (1931). The generalization of Student's ratio Ann. Math. Statist. 2 360-378. 73 [9] Hutchinson, T.P. and Lai, C D . (1990). Continuous Bivariate Distributions, Em-phasising Applications, South Australia, Rumsby Scientific Publishing. [10] Johnson, R .A. and Wichern, D.W. (1992). Applied Multivariate Statistical Anal-ysis, Third edition, New-Jersey, Prentice Hall Inc. [11] Johnson, W.L and Kotz, S. (1972). Distributions in Statistics: Continuous Mul-tivariate Distributions, Wiley series in Prob. and Math. Statistics-Applies, New York, John Wiley and Sons. [12] J0rgensen, B. (1987a). "Small Dispersion Asymptotics", Brazilian Journal of Prob. and Statistics, 1, 59-90. [13] J0rgensen, B. (1987b). "Exponential Dispersion Models", Journal of Royal Sta-tistical Society, B49, 127-162. [14] J0rgensen, B. (1997a). Proper dispersion models (with discussion). Brazilian Journal of Probability and Statistics 11, 89-140. [15] J0rgensen, B. (1997b). "The theory of dispersion models" , New York, Chapman & Hall. [16] J0rgensen, B. and Lauritzen, S.L. (1998). "Constructing Multivariate Proper Dis-persion Models", Research report 2: Department of Statistics and Demography, Odense University, Denmark. [17] Krishnaiah, P.R (1985). Multivariate gamma distributions. In Encyclopedia of Statistical Sciences Vol. 6 (eds. S.Kotz, C.B.Read and D.L.Banks), 63-66. New York: Wiley. [18] Krishnamoorthy, A.S. and Parthasarathy, M (1951). Ann. Math. Statist., 22, 549; erratum 31, 229. 74 [19] Minoru Siotani, Takesi Hayakawa, and Yasunori Fujikoshi (1985). Modern Multi-variate Statistical Analysis: A Graduate Course and Handbook, Ohio, American Sciences Press, Inc. [20] McCullagh, P. and Nelder, J .A (1989). Generalized linear models, Second edition. London: Chapman & Hall. [21] Neal, M.R. (1997). "Markov Chain Monte Carlo Methods based on 'Slicing' the density function", Technical report No: 9772, Department of Statistics, Univer-sity of Toronto, Canada. [22] Nelder, J .A and Wedderburn, R . W . M . (1972). "Generalized linear models", J. Roy. Statist. Soc. Ser., A 135, 370-384. [23] Neyman, J. (1926). "Further notes on non-linear regression", Biometrika, 18, 257-262. [24] Pearson, K . (1923). "Notes on skew frequency surfaces", Biometrika, 15, 222-230. [25] Radhakrishna Rao, C (1985). Matrix derivatives: Applications in statistics. In Encyclopedia of Statistical Sciences Vol. 6 (eds. S.Kotz, C.B.Read and D.L.Banks), 320-325. New York: Wiley. [26] Rogers, G.S. (1980). Matrix Derivatives, Volume2, New York, Marcel Decker Inc. [27] Ryan, B.F. , Joiner, B .L . and Ryan, T .A . (1985). Minitab, Handbook, Second Edition, Duxbury Press, Boston. [28] Song, P.X. (1996). "Some Statistical Models for the Multivariate Analysis of Lon-gitudinal Data", Ph.D thesis, Dept. of Statistics, University of British Columbia. 75 [29] Tweedie, M . C . K . (1947). "Functions of a statistical variate with given mean, with special reference to Laplacian distributions", Proc. Cambridge Phil. Soc. 49, 41-49. [30] Wijsman, R .A. (1957). "Random Orthogonal Transformations and their use in some classical distribution problems in Multivariate Analysis", Ann. Math. Statistics, 28, 415-423. [31] Yule, G .U. (1897). " On the significance of Bravais' formulae for regression in the case of skew correlation", Proceedings of the Royal Society of London, 60, 477-489. 76 

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0088555/manifest

Comment

Related Items