UBC Theses and Dissertations


Some statistical properties of multivariate proper dispersion models, with special reference to a multivariate gamma model. Rajeswaran, Jeevanantham, 1998.



SOME STATISTICAL PROPERTIES OF MULTIVARIATE PROPER DISPERSION MODELS, WITH SPECIAL REFERENCE TO A MULTIVARIATE GAMMA MODEL

by

Jeevanantham Rajeswaran
B.Sc., University of Jaffna, Jaffna, Sri Lanka, 1992.

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES, Department of Statistics

We accept this thesis as conforming to the required standard.

THE UNIVERSITY OF BRITISH COLUMBIA
August 31, 1998
© Jeevanantham Rajeswaran, 1998

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Statistics
The University of British Columbia
Vancouver, Canada

Abstract

A broad class of error distributions for generalized linear models is provided by the class of dispersion models, which was introduced by Jørgensen (1987a, 1997a); a detailed study of dispersion models was made by Jørgensen (1997b). In this thesis we study multivariate proper dispersion models. Our aim is to do multivariate analysis for non-normal data, particularly data from the multivariate gamma distribution, which is an example of a multivariate proper dispersion model, introduced by Jørgensen and Lauritzen (1998). This class provides a multivariate extension of the dispersion model density, following the spirit of the multivariate normal density.
We consider the saddlepoint approximation for small dispersion matrices, which in turn implies that the multivariate proper dispersion model is approximately multivariate normal for small dispersion matrices. We want to mimic the basic technique of testing in the multivariate normal model, Hotelling's T². Our version of the T² test applies asymptotically, for either small dispersion or large samples. We also consider estimating the normalizing constant of the bivariate gamma by Monte Carlo simulation, and we investigate the marginal density by using numerical integration. We also investigate the distribution of the T²-statistic by Monte Carlo simulation.

Contents

Table of Contents
List of Tables
List of Figures
Acknowledgements

1 Introduction

2 Univariate dispersion models
  2.1 Definition
  2.2 Exponential dispersion models
  2.3 Proper dispersion models
  2.4 Saddlepoint approximation
    2.4.1 Barndorff-Nielsen's p*-formula
    2.4.2 Saddlepoint and Laplace approximations

3 Multivariate dispersion models
  3.1 Definition
  3.2 Multivariate proper dispersion models
    3.2.1 Multivariate gamma distribution
    3.2.2 Maximum likelihood estimate of the position vector μ
  3.3 Saddlepoint approximations
    3.3.1 Barndorff-Nielsen's p*-formula
    3.3.2 The Laplace method
  3.4 Multivariate normal approximation

4 Bivariate gamma distribution
  4.1 Numerical study of normalizing constant
  4.2 Density plots
  4.3 Estimation of moments
  4.4 Numerical study of marginal density

5 A generalization of Hotelling's T²-statistic
  5.1 Introduction
  5.2 General likelihood ratio method in multivariate normal model
    5.2.1 T²-statistic in multivariate normal
  5.3 Maximum likelihood estimators of μ and Σ
  5.4 T² in multivariate proper dispersion model
    5.4.1 Numerical study on the T²-statistic

A Some results on matrices
  A.1 Determinant
  A.2 Inverse
  A.3 Kronecker product (direct product)
  A.4 Matrix derivatives and Jacobians

Bibliography

List of Tables

4.1 Dispersion matrix.
4.2 Estimation of normalizing constant by using the MC method.
4.3 A comparison of estimations of the normalizing constant.
4.4 Position vectors.
4.5 Dispersion matrices.
4.6 Small dispersion matrices.
4.7 Moderately large dispersion matrices.
4.8 Large dispersion matrices.
4.9 Estimation of moments.
4.10 Estimation of moments.
4.11 Dispersion matrices.
A.1 Vector derivatives (x : p × 1).

List of Figures

2.1 Some gamma densities.
4.1 Contour plots of independent bivariate gamma densities.
4.2 Perspective plots of independent bivariate gamma densities.
4.3 Contour plots of bivariate gamma densities with small dispersion.
4.4 Perspective plots of bivariate gamma densities with small dispersion.
4.5 Contour plots of bivariate gamma densities with moderately large dispersion.
4.6 Perspective plots of bivariate gamma densities with moderately large dispersion.
4.7 Contour plots of bivariate gamma densities with large dispersion.
4.8 Perspective plots of bivariate gamma densities with large dispersion.
4.9 A comparison between true and estimated marginal densities (estimated density is given in solid lines).
5.1 Histograms of diameter and height of the trees.
5.2 Comparisons of estimated and true density of T².
5.3 Comparisons of estimated and true density of T².

Acknowledgments

First of all, I would like to thank my supervisor Dr.
Bent Jørgensen for guiding me into this unexplored territory, for his inspiration, for his encouragement, and for the generous financial support throughout my graduate career at UBC. I would also like to take this opportunity to thank him for his help right from receiving me at the Vancouver airport. I am also grateful to Dr. Martin Puterman for his careful reading of the manuscript and valuable comments. I would also like to thank the faculty members of the Department of Statistics for their sincere support, for showing me the "taste of Statistics" and for their encouragement during my M.Sc. program at UBC. I would like to take this opportunity to thank Ms. Christine Graham, Ms. Donna Pavlov, and Ms. Margaret Harney, our Department staff, for their constant support during my stay at UBC. Finally, I would like to thank the Department of Statistics, especially Dr. Ruben Zamar, for giving me the opportunity to follow a Master's program at UBC and for providing financial support.

Chapter 1

Introduction

The multivariate normal distribution has been studied more extensively than any other multivariate distribution. Indeed, its position of preeminence among multivariate continuous distributions is more marked than that of the normal among univariate continuous distributions. However, in recent years there have been signs that the need for more usable alternatives to the multivariate normal distribution is becoming recognized.

Although the bivariate normal distribution had been studied at the beginning of the 19th century, interest in multivariate distributions remained at a low level until it was stimulated by the work of Galton in the last quarter of the century. He did not, himself, introduce any new form of distribution, but he developed the ideas of correlation and regression and focussed attention on the need for greater knowledge of possible forms of multivariate distribution. Much multivariate theory has become practically useful only with the advent of the electronic computer. This stimulus has resulted in a rapid development of multivariate theory over the past two decades.

Investigation of non-normal distributions, or "skew frequency surfaces" (as non-symmetrical forms have been denoted), has generally followed lines suggested by previous work on univariate distributions. Early work in this field followed rather different lines, but was not very successful. It is noteworthy that, among non-normal distributions, methods of estimation appear to have been developed less fully than one might expect. Karl Pearson (1923), whose first investigations appear to have been prompted by noting distinctly non-normal properties of some observed joint distributions, initially tried to proceed by an analogy with the bivariate normal surface. Later, Neyman (1926) also considered methods of construction of joint distributions, starting from certain requirements on the regression and scedastic functions. This was an extension of the work initiated by Yule (1897), who showed that, assuming multiple linear regression, the multiple regression function obtained by the method of least squares is identical with that of a multinormal distribution.

Multivariate distributions have been derived from other distributions through transformations, projections, convolutions, extreme values, mixing, compounding, truncating, and censoring; the marginal distributions of various statistics were then derived from them. The study of multivariate distributions is concerned mainly with distributions of the continuous and discrete types. Some distributions arise from multidimensional central limit theorems, many serve as models for random experiments, and others are of interest primarily as derived distributions. Prominent among limit distributions, and those from which others derive, is the multinormal distribution.
Some distributions derived from it are known to be the same for all parent distributions in a class containing symmetric multivariate stable distributions. These facts bear on robustness and the validity of normal-theory procedures for use with non-normal data, including multivariate analysis, multivariate analysis of variance and the multivariate Bartlett test. Often multivariate data are scattered more heavily away from the center of location than are multinormal data.

The introduction of the class of generalized linear models by Nelder and Wedderburn (1972) paved the way to analyzing a large variety of non-normal data by a simple general technique. Since there was no suitable class of multivariate non-normal distributions, this analysis was restricted to univariate data. Analysis of univariate non-normal data became simpler and accommodated more non-normal data types after the introduction of dispersion models, a broad class of error distributions for generalized linear models, by Jørgensen (1987b, 1997a).

With the aim of finding a class of multivariate distributions suitable for analyzing multivariate non-normal responses in a generalized linear model setup, and of extending existing multivariate techniques to multivariate non-normal data, Jørgensen and Lauritzen (1998) proposed a multivariate extension of the dispersion models, the so-called multivariate dispersion models. Since the form of the distribution is similar to the multivariate normal distribution, this distribution can handle a regression setup. It is noteworthy that, unlike the other non-normal multivariate distributions proposed earlier, the definition of multivariate dispersion models is geared towards good statistical properties rather than requiring specific moments or marginals.
As a first step towards these goals, in this thesis we consider a sub-class of multivariate dispersion models, the multivariate proper dispersion models; we investigate some properties of this sub-class and propose a generalization of Hotelling's T²-statistic.

In Chapter 2, we give a brief introduction to the univariate dispersion models of Jørgensen (1997b). In Chapter 3, we introduce the multivariate dispersion models and multivariate proper dispersion models. Specifically, we consider a multivariate gamma distribution, which is an example of the multivariate proper dispersion models. We then investigate the asymptotic behaviour of the multivariate proper dispersion models; in fact, we will show that multivariate proper dispersion models can be approximated by a multivariate normal distribution. In Chapter 4, we investigate the behaviour of the bivariate gamma distribution by observing the shape of the density plots and estimating the moments and marginal densities. In Chapter 5, we generalize Hotelling's T²-statistic so that it can accommodate non-normal data.

Chapter 2

Univariate dispersion models

In this chapter, we give an introduction to the theory of the univariate dispersion models of Jørgensen (1997b), which is the basis of the entire theory that we develop in this thesis. First we give the definition of dispersion models, then we look at the two types of dispersion models, exponential dispersion models and proper dispersion models, and finally we discuss the saddlepoint approximations for dispersion models.

2.1 Definition

Nelder and Wedderburn (1972) were the first to show, by introducing the class of generalized linear models, that a large variety of non-normal data may be analyzed by a simple general technique. Generalized linear models were originally developed for exponential families of distributions, but the main ideas were extended to a wider class of models called dispersion models.
Dispersion models were originally introduced by Jørgensen (1987b), and the models were studied in detail by Jørgensen (1997b). The class of dispersion models covers a comprehensive range of non-normal distributions, including distributions for the following seven basic data types, where S denotes the support of the distribution:

• Data on the real line, S = ℝ.
• Positive data, S = ℝ₊ = (0, ∞).
• Positive data with zeros, S = [0, ∞).
• Proportions, S = (0, 1).
• Directions, S = [0, 2π).
• Count data, S = {0, 1, 2, ...}.
• Count data with finite support, S = {0, ..., m}.

The main idea behind the dispersion model is that the notions of location and scale may be generalized to position and dispersion, respectively, for all of the above seven data types.

Definition 2.1 Let Ω ⊆ C ⊆ ℝ be intervals with Ω open. A function d : C × Ω → ℝ is called a unit deviance if it satisfies

    d(y; y) = d(μ; μ) = 0  for all y, μ ∈ Ω,    (2.1)

and

    d(y; μ) > 0  for all y ≠ μ.    (2.2)

A unit deviance d is called regular if d(y; μ) is twice continuously differentiable with respect to (y, μ) on Ω × Ω and satisfies

    (∂²d/∂μ²)(μ; μ) > 0  for all μ ∈ Ω.    (2.3)

The unit variance function V : Ω → ℝ₊ of a regular unit deviance is defined by

    V(μ) = 2 / (∂²d/∂μ²)(μ; μ).    (2.4)

A univariate (regular) dispersion model DM(μ, σ²) with position parameter μ ∈ Ω and dispersion parameter σ² > 0 is defined by a probability density function of the form

    f(y; μ, σ²) = a(y; σ²) exp{ −d(y; μ) / (2σ²) },  y ∈ C,    (2.5)

where a > 0 is a suitable function and d is a regular unit deviance. The important characteristics of the density are:

1. The factor a in (2.5) is independent of the position parameter μ.
2. The densities are often unimodal; in general, the deviance d tends to make the density peak near μ, and the peak becomes higher and sharper the smaller the value of σ².

As an example, consider the gamma distribution with shape parameter λ and scale parameter θ.
The corresponding distribution is given by

    f(y; λ, θ) = (θ^λ / Γ(λ)) y^{λ−1} e^{−θy},  y > 0, λ > 0, θ > 0.    (2.6)

The mean and the variance of Y are given by E(Y) = λ/θ and Var(Y) = λ/θ², respectively. Now we write the above density function (2.6) as a dispersion model. Let E(Y) = μ and λ = 1/σ². Then we have Var(Y) = μ²σ² and θ = 1/(μσ²), where μ and σ² are the position and dispersion parameters, respectively. The density is then given by

    f(y; μ, σ²) = (λ^λ e^{−λ} / Γ(λ)) y^{−1} exp{ −(λ/2) d(y; μ) }
                = a(y; σ²) exp{ −d(y; μ) / (2σ²) },    (2.7)

[Figure 2.1: Some gamma densities.]

where the regular unit deviance d(y; μ) is given by

    d(y; μ) = 2{ y/μ − log(y/μ) − 1 },    (2.8)

and the corresponding unit variance function is given by V(μ) = μ². Note that in the following we denote the a function as c(y; λ) when we use λ instead of σ².

For any regular unit deviance,

    (∂²d/∂y²)(μ; μ) = (∂²d/∂μ²)(μ; μ) = −(∂²d/∂y∂μ)(μ; μ);    (2.9)

for details see Jørgensen (1997b, p. 24).

2.2 Exponential dispersion models

Exponential dispersion models were originally proposed by Tweedie (1947), and later reintroduced by Nelder and Wedderburn (1972) as a class of error distributions for generalized linear models.

A reproductive exponential dispersion model is defined by the following density for a random variable Y, with respect to a suitable measure,

    f(y; θ, λ) = c(y; λ) exp[ λ{θy − κ(θ)} ],  y ∈ ℝ.    (2.10)

An additive exponential dispersion model for a random variable Z is defined by a probability density function of the following form, with respect to a suitable measure,

    f*(z; θ, λ) = c*(z; λ) exp{ θz − λκ(θ) },  z ∈ ℝ.    (2.11)

The two types are connected by the duality transformation Z = λY, so in principle any exponential dispersion model has both an additive and a reproductive form. The parameter θ is called the canonical parameter and λ is called the index parameter. The maximum possible domain for θ, denoted Θ ⊆ ℝ, is called the canonical parameter domain. The domain for λ, called the index set and denoted by Λ, is a subset of ℝ₊. The function κ is called the cumulant function.

The reproductive form (2.10) is denoted Y ~ ED(μ, σ²), where μ = τ(θ) = κ′(θ) is the mean and σ² = λ^{−1} is the dispersion parameter. The variance of Y is σ²V(μ) = σ²τ′{τ^{−1}(μ)}, where V is the variance function. The additive form (2.11) is denoted Z ~ ED*(θ, λ), and the mean and the variance of Z are

    E(Z) = ξ = λτ(θ)  and  Var(Z) = λV(ξ/λ) = λV(μ).    (2.12)

We define the unit deviance for the above exponential dispersion models (see Jørgensen (1997b) for details) as follows:

    d(y; μ) = 2[ sup_θ{ yθ − κ(θ) } − yτ^{−1}(μ) + κ{τ^{−1}(μ)} ].    (2.13)

Then the density (2.10) can be rewritten in the form (2.5) in terms of the d of (2.13), and the a function is given by

    a(y; σ²) = c(y; λ) exp[ λ sup_θ{ yθ − κ(θ) } ].

This proves that the reproductive model (2.10) is indeed a dispersion model (2.5). However, the additive model (2.11) is not, in general, of the form (2.5).

Both types of exponential dispersion models satisfy a convolution property (see Jørgensen (1997b) for details), given as follows. Let Y₁, ..., Y_n be independent with Y_i ~ ED(μ, σ²/w_i), with weights w_i, and let w₊ = Σ_{i=1}^n w_i. Then

    (1/w₊) Σ_{i=1}^n w_i Y_i ~ ED(μ, σ²/w₊).    (2.14)

If Z₁, ..., Z_n are independent and Z_i ~ ED*(θ, λ_i), then

    Z₊ = Σ_{i=1}^n Z_i ~ ED*(θ, λ₁ + ··· + λ_n).    (2.15)

Furthermore, we have Λ = ℝ₊ if and only if (2.11) is infinitely divisible (Jørgensen, 1997b).

Note that the gamma distribution (2.6) can be written as

    f(y; λ, θ) = (λ^λ y^{λ−1} / Γ(λ)) exp[ λ{ (−θ/λ)y − ( −log(θ/λ) ) } ],    (2.16)

which is of the form (2.10) with canonical parameter −θ/λ and cumulant function κ(s) = −log(−s). Hence the gamma distribution is a reproductive exponential dispersion model.

One of the shortcomings of the exponential dispersion models is that there are no adequate exponential dispersion models available for data with bounded support, such as angles or proportions.
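As a concrete check of the rewriting above, the following sketch (not part of the thesis) evaluates the gamma density both in its ordinary form (2.6) and in its dispersion-model form (2.7), using the unit deviance (2.8) and the normalizer c₁(λ) = λ^λ e^{−λ}/Γ(λ); the two agree to machine precision.

```python
import math

def gamma_deviance(y, mu):
    # unit deviance (2.8): d(y; mu) = 2*(y/mu - log(y/mu) - 1)
    return 2.0 * (y / mu - math.log(y / mu) - 1.0)

def gamma_dm_pdf(y, mu, sigma2):
    # dispersion-model form (2.7): a(y; sigma^2) * exp(-d/(2*sigma^2)),
    # where lambda = 1/sigma^2 and the a-factor is c1(lambda) * y^{-1}
    # with c1(lambda) = lambda^lambda * e^{-lambda} / Gamma(lambda)
    lam = 1.0 / sigma2
    log_c1 = lam * math.log(lam) - lam - math.lgamma(lam)
    return math.exp(log_c1 - math.log(y) - 0.5 * lam * gamma_deviance(y, mu))

def gamma_pdf(y, shape, rate):
    # ordinary gamma density (2.6)
    return math.exp(shape * math.log(rate) + (shape - 1.0) * math.log(y)
                    - rate * y - math.lgamma(shape))

# mu = 2, sigma^2 = 1/4 corresponds to shape lambda = 4 and rate theta = 2
mu, sigma2 = 2.0, 0.25
for y in (0.5, 1.0, 2.0, 5.0):
    assert abs(gamma_dm_pdf(y, mu, sigma2)
               - gamma_pdf(y, 1.0 / sigma2, 1.0 / (mu * sigma2))) < 1e-12
print("dispersion-model form matches the ordinary gamma density")
```

Note that the deviance vanishes at y = μ, as required by (2.1), which is where the density peaks for small σ².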
2.3 Proper dispersion models

In this section, we discuss another class of dispersion models, proper dispersion models, introduced by Jørgensen (1997a). The introduction of proper dispersion models paved
Here we apply the formula in the case of a single observation from a proper dispersion model (2.17) with A known, in which case an ancillary statistic is degenerate, and the formula provides an approximation to the marginal distribution of the maximum likelihood estimator. Since the maximum likelihood estimate of fj, for A fixed is y, Barndorff-Nielsen's formula hence provides an approximation to the marginal distribution of Y.  The  approximation is defined by (2.20)  f{v\P, ) ~ fo(y,fi,X), A  where f  0  is the renormalized saddlepoint approximation corresponding to unit de-  viance d , which we will discuss in the next section. Now let us define the multivariate version (Barndorff-Nielsen and Cox, 1994, p. 172) of the p*-formula. Definition 2.3 as a function  For a given ancillary  statistic a, the log-likelihood  can be considered  of ii and data. Let there be a one-to-one correspondence  between the  data y and (/2, a). Then we can write the log-likelihood £ as £ (/z; p., a) and the normed form £ as £ (p; %a)=i  (fi;  ft, a) - £ ( £ ;  The p*-formula (or saddlepoint approximation) p*(£;M|a) = (27r)where, \j\ is the determinant 11 =  a).  (2.21)  is defined by |j|VV  (2.22)  of the observed information  matrix for it evaluated at  IL.  13  p/2  This formula (renormalized) determines a probability density, with respect to Lebesgue measure, if the range space of /z is an open subset of W. Note that we deal with continuous distribution. The question of Lebesgue measure will not arise in our problem. In fact, formula (2.22) is called the pt-formula by Barndorff-Nielsen and Cox (1994). As compared to  , the p*-formula which is originally defined as c|j | / e^, gives at 1  2  least one degree higher accuracy but at the cost of having to calculate the normalizing constant c. So it is feasible to use pt-formula and we still call this formula the p*formula. 
We use the formula (2.22) in the Section 3.3.1 to develop an asymptotic distribution for multivariate proper dispersion models..  2.4.2  Saddlepoint and Laplace approximations  In this section, we briefly discuss three methods of approximations that give an approximation to a(y;o ).  We now briefly discuss the saddlepoint approximation for  2  dispersion models; a detailed discussion can be found in J0rgensen (1997b) . The saddlepoint approximation is important for the asymptotic theory of generalized linear models. The saddlepoint approximation for a dispersion model with regular unit deviance d is defined for y e ft by / ( y ; / x , a ) - { 2 r a V ( | / ) } - / e x p | - ^ d ( y ; x ) } as o -)• 0, 2  1  2  2  7  /  meaning that the ratio of the two sides goes to 1 as o  2  —>• 0.  (2.23)  The saddlepoint  approximation is exact in a few very special cases, such as the normal and simplex distributions. The saddlepoint approximation may be interpreted as being half way to a normal approximation for the density. Thus, if we replace V(y) by V(ii) in (2.23) and introduce a quadratic approximation to the unit deviance in the exponent of the density, 14  we get the normal approximation. Note that the saddlepoint approximation on the right hand side of (2.23), while positive, is not in general a density function on fl. However, it may be rescaled to become a density function. The corresponding renormalized saddlepoint approximation is the density function defined for y G f2 by fo (y,^,a ) 2  = a  (^,<7 ) 2  0  y-i/2()exp{-^Ld(yx)} , y  ;/  (2.24)  where a (fj,, a ) is a normalizing constant defined by 2  0  j ^  [  v  )  ^ { - ^  i  M  )  i  y  .  (2.25)  Note that, in the case of proper dispersion models, ao in (2.25) does not depend on  We now present the important Laplace approximation, as described in, for example, Barndorff-Nielsen and Cox (1989, p.59).  Proposition 2.1 (Laplace approximation) Define /(A) = f  b(y)e ^dy, xt  where Cl is an open interval. 
Suppose that b is positive, and continuous at y = JJL, that t is twice differentiable, has global maximum for y = \i G ft and that  K(JJ)  = -t"(p)  > 0.  Then /(A) ~  \ j T j ^ j A ^ y ^  15  a  s A  -+ °°-  (- ) 2  26  The ordinary and renormalized saddlepoint approximations agree asymptotically for o small; this can be shown by using the Laplace approximation. We now present 2  this important saddlepoint approximation result, for a proof see J0rgensen (1997b, p. 28). P r o p o s i t i o n 2.2 Let d be a given regular unit deviance defined on C x ft and define a (LI,O ) by (2.25). Then, as a ->• 0, 2  2  Q  The essence of the saddlepoint approximation is the approximation of the function a(y;o ). 2  Specifically, the saddlepoint approximation is equivalent to oa(y;o ) 2  -> {2-nV(y)}*  D e f i n i t i o n 2.4 The saddlepoint approximation if the convergence in (2.27) is uniform  as o  2  is said to be u n i f o r m o n compacts  in y on compact subsets of ft.  P r o p o s i t i o n 2.3 In the case of a proper dispersion imation  is uniform  on compacts,  (2.27)  0.  and the renormalized  model, the saddlepoint saddlepoint  approx-  approximation  is  exact.  The proof of this proposition contains the following arguments: For proper dispersion model we have a (LI, o ) = a(o ) making the renormalized saddlepoint approximation 2  2  0  exact. The continuity of the unit variance function V implies that the saddlepoint approximation is uniform on compacts. For details, see J0rgensen (1997b).  16  Chapter 3 Multivariate dispersion models In this chapter, we discuss multivariate dispersion models. In the first section, we discuss the definition of multivariate dispersion models. Basically, in the first section, we discuss the multivariate dispersion models defined by J0rgensen and Lauritzen (1998) which opened a new field in statistics that has a lot of problems to investigate. It is the basis for this whole thesis. 
In subsequent sections we develop and discuss some statistical properties of the multivariate proper dispersion models and in particular the multivariate gamma distribution. Note that the order in which we discuss the multivariate dispersion models is related to its univariate counterpart in the preceding section.  3.1  Definition  A class of multivariate dispersion models suitable as error distributions for generalized linear models with multivariate non-normal responses was introduced by J0rgensen and Lauritzen (1998). The main concern, in their paper, is with suitable distributions for generalized linear models with a multivariate response vector. The definition of multivariate dispersion models is geared towards good statistical properties similar to 17  that are possessed by multivariate normal distribution, rather than requiring specific moments or marginals. Let the sample space be an open subset of W.  A multivariate dispersion model  is defined by the following probability density function on W\ (3.1)  where LI £ Q (an open region in 7Z ), £ is a symmetric positive-definite p x p matrix P  and t (y; LI) is suitably defined vector of deviance residuals satisfying t (LI; LI) = 0 for LI € Q. The parameter LI, called the position vector, and S , called the dispersion matrix, may be interpreted as analogies of respectively the mean vector and variancecovariance matrix of the multivariate normal distribution. As an example, J0rgensen and Lauritzen (1998) defined the multivariate Laplace distribution by the following density with respect to Lebesgue measure on IV: / ( y ; LI, S ) = o(S) exp  j ~t  T  (y - LI) S " * (y - LI) } , 1  where a(S) is a normalizing constant, t is defined by (3.2) and ± = sgn(?/j — Lij) denotes the sign of i/j — Lij for each j .  Note that when the  dispersion matrix £ is diagonal, the components of Y follow independent univariate Laplace distributions. 
For a detailed discussion on generalization of the definition (3.1) and geometric construction of multivariate dispersion models refer to J0rgensen and Lauritzen (1998). 18  3.2  M u l t i v a r i a t e proper dispersion models  Following the spirit of the univariate proper dispersion models J0rgensen and Lauritzen (1998) called (3.1) a multivariate proper dispersion models if a(y; S) in (3.1) factorizes as a(£)6(y) for suitable functions a and b. That is, we define the multivariate proper dispersion models as follows /(y; pi- £ ) = a p ( E ) | V ( / i ) | - i exp { - ^ ( y ; Ai)E *(y; » ) } ,  (3.3)  _1  T  where V(fJt) = d i a g - f V ^ ) , . . . , V(/x )}. The distribution (3.3) is denoted by PD (fi, p  In the following we often use A = S  _ 1  p  E).  instead of  As an example, consider the von Mises distribution given by (2.19). Define the deviance residual by  where the sign ± denotes the sign of sin(y — fj,). Since r(y; fj,) is a pivot, a function of Y and n whose distribution under p, does not depend on the value of p, the standard construction proposed by J0rgensen and Lauritzen (1998) leads to a multivariate von Mises distribution of the form / ( M , s) = a(S) exp [-^0 (y;/*)£ *(y;/*)} T  Y ;  _1  on [0, 27r) , where t(y; fx) is the vector of deviance residuals. This is clearly a multip  variate proper dispersion model. j0rgensen and Lauritzen (1998) discussed a standard construction that takes almost any univariate proper dispersion model and gives a multivariate proper dispersion model as a result. They noted, however, that this standard construction does not work for the Inverse Gaussian distribution which is a univariate proper dispersion model. 19  3.2.1  Multivariate gamma distribution  In this section, we construct the multivariate gamma distribution which was described by J0rgensen and Lauritzen (1998). 
Consider the univariate gamma density (2.7) f(v, H,  A) = ci (X)y  exp ( - ^d(y;  1  (3.4)  ,y > 0  LL) \  where the regular unit deviance d is given by (2.8), the unit variance function V(LI) LL  2  and Cl  (A) =  A e  r(A)"  Note that the geometric measure is v(dy) = V~ ^ (y)dy = y~ dy, where the geometric 1  2  l  measure is defined as follows, see J0rgensen and Lauritzen (1998) for more details. Suppose G\(y)  = {g \(y)} is a matrix with entries J  9xiv)  1  d  2  2 duj/ifc  tr{AT(y;/x)} fj-=y  then the geometric measure is defined by ^A(%)  =  | G  A  ( y ) |  1  /  2  ^ .  The deviance residual is LL) = ±yJd(y,Li)  r(y;  = ± , / 2 { ^ - log - y A* A V  1},  4  where ± denotes the sign of (y — LL). Clearly r(Y; LL) is a pivot. For more details about pivot refer to J0rgensen (1997b). Let *(y; At) = {r{yi,  L i i ) , r ( y  p  ;  LI )}  T  P  denote the vector of residuals for a p-vector of data. Then the yoke r(F T(Y-Li)  = t(Y-Lx)t (Y-Li) T  )  i ; M l  = \  Wr(y r{Y ;Li ) p  20  p  i ; A i l  ),...,r(F ;^)} p  (3.5)  is also a pivot with respect to the product measure u(dy ..  v(dyi) ® • • • <g) v(dy )  .,dy )  u  p  p  n{rt}p  (3.6)  A multivariate version of the univariate gamma therefore appears as / ( y ; ti, £ ) = a (Y,){ ... p  We denote this model  Ga (ii, p  yY  l  yi  P  exp [ - ^ { E ^ y ; » ) }  (3.7)  £).  The new model (3.7) is clearly a multivariate proper dispersion model and in the case of A being diagonal, it has gamma marginals distributed independently. we call this model a multivariate  gamma distribution.  So  The main problem in this  multivariate model is determining the normalizing constant a (A). This normalizer p  can be estimated by numerical integration or by Monte Carlo simulation. We defer the discussion of this problem until Chapter 4. This multivariate gamma distribution seems to be new and different from other multivariate gamma distributions proposed in the literature cf. Krishnaiah (1985). 
In particular, its marginals are not gamma except when Λ is diagonal.

Several procedures exist for constructing a multivariate gamma distribution such that the marginal distributions are gamma. For example, Johnson and Kotz (1972, p. 216) proceed as follows: suppose that X₀, X₁, …, X_m are independent random variables and that each X_j has a standard gamma distribution. The joint distribution of Y_j = X₀ + X_j (j = 1, 2, …, m) is called a multivariate gamma distribution, and the marginal of Y_j is a standard gamma distribution.

Krishnamoorthy and Parthasarathy (1951) defined a multivariate gamma as follows. Let X_j = (X_{1j}, …, X_{pj})ᵀ, j = 1, 2, …, n, be distributed independently and identically as a multivariate normal with mean vector 0 and covariance matrix Σ. Also, let Z_i = ½ Σ_{j=1}^n X²_{ij} for i = 1, 2, …, p. Then the joint distribution of (Z₁, …, Z_p) is a central multivariate gamma distribution with n/2 as shape parameter and Σ as the covariance parameter.

The main application of the proposed multivariate gamma distributions has so far been in the study of reliability. Since the forms of the densities are quite complicated, it is difficult to accommodate a multivariate regression setup. The form of the multivariate gamma model — and, in general, of the multivariate proper dispersion model — proposed by Jørgensen and Lauritzen (1998) is very similar to that of a multivariate normal model. This enables us to accommodate a multivariate regression setup, and the problems related to inference on the position vector are very similar to those of a multivariate normal model.

We now introduce some properties of the multivariate gamma distribution.

Theorem 3.1 The multivariate gamma model is closed with respect to scale transformations.

Proof: Let Y ~ Ga_p(μ, Σ) and let C be a diagonal matrix with elements c₁, …, c_p, c_i ≠ 0 for all i. Consider the transformation X = CY.
The Jacobian matrix is given by

    ∂y/∂x = C⁻¹.

Hence the density of X is given by

    g(x; μ, Σ) = |C⁻¹| f(C⁻¹x; μ, Σ) = a_p(Σ) (x₁ ⋯ x_p)⁻¹ exp{ −½ tᵀ(C⁻¹x; μ) Σ⁻¹ t(C⁻¹x; μ) }.   (3.8)

Now let us consider the deviance:

    d(x_j/c_j; μ_j) = 2{ x_j/(c_j μ_j) − log( x_j/(c_j μ_j) ) − 1 } = d(x_j; c_j μ_j).   (3.9)

Hence it follows from (3.9) that t(C⁻¹x; μ) = t(x; Cμ), and we may write (3.8) as

    g(x; μ, Σ) = a_p(Σ) (x₁ ⋯ x_p)⁻¹ exp{ −½ tᵀ(x; Cμ) Σ⁻¹ t(x; Cμ) }.

Therefore we have X ~ Ga_p(Cμ, Σ). Hence the theorem. □

3.2.2 Maximum likelihood estimate of the position vector μ

One of the main advantages of the multivariate dispersion model defined by Jørgensen and Lauritzen (1998), as in the univariate dispersion models defined by Jørgensen (1997b), is that the normalizer a_p(Σ) does not involve the position vector μ; it therefore allows us to make inferences about μ without knowing the exact formula for a_p(Σ).

From now on in this chapter we use quite a number of techniques for matrix derivatives. We give an overview of these formulas in Appendix A.4; for more details cf. Dwyer (1962).

Theorem 3.2 Let Y₁, …, Y_n be n random vectors, in ℝ^p, from a multivariate proper dispersion model PD_p(μ, Σ). Then, for arbitrary Σ, the score function for
μ_j based on the full sample is

    ∂ℓ_n(μ)/∂μ_j = −½ Σ_{i=1}^n Σ_{k=1}^p λ_{jk} [ d′(Y_{ij}; μ_j) / r(Y_{ij}; μ_j) ] r(Y_{ik}; μ_k),   (3.10)

where λ_{jk} = (Λ)_{jk} and d′(y; μ) denotes the derivative of d with respect to μ.

Proof. The log-likelihood function of μ is given by

    ℓ_n(μ) = n log a_p(Λ) + Σ_{i=1}^n log b(Yᵢ) − ½ Σ_{i=1}^n tᵀ(Yᵢ; μ) Λ t(Yᵢ; μ).   (3.11)

Differentiating (3.11) with respect to μ gives

    ∂ℓ_n(μ)/∂μ = −½ Σ_{i=1}^n ∂[ tᵀ(Yᵢ; μ) Λ t(Yᵢ; μ) ]/∂μ
               = −Σ_{i=1}^n Σ_{m=1}^p [ ∂{t(Yᵢ; μ)}_m/∂μ_m ] { Λ t(Yᵢ; μ) }_m e_m,   (3.12)

where e_m is a p × 1 vector with 1 at the mth position and zeros elsewhere. Since r² = d implies ∂r/∂μ = d′/(2r), we define

    D′(Y; μ) = diag{ d′(Y₁; μ₁)/r(Y₁; μ₁), …, d′(Y_p; μ_p)/r(Y_p; μ_p) }.   (3.13)

Therefore, from (3.12) it follows that

    ∂ℓ_n(μ)/∂μ = −½ Σ_{i=1}^n D′(Yᵢ; μ) Λ t(Yᵢ; μ).   (3.14)

Matrix multiplication of the diagonal matrix D′ with Λ gives

    λ_{jk} d′(Y_{ij}; μ_j) / r(Y_{ij}; μ_j)

as the (j, k)th element. The multiplication of the resulting matrix with the vector t(Yᵢ; μ) then gives a vector with

    Σ_{k=1}^p λ_{jk} [ d′(Y_{ij}; μ_j) / r(Y_{ij}; μ_j) ] r(Y_{ik}; μ_k)

as the jth element. Now (3.10) follows immediately from (3.14). □

Lemma 3.1 For a multivariate gamma model, for arbitrary Σ, the score function for μ_j based on a full sample is given by

    ∂ℓ_n(μ)/∂μ_j = Σ_{i=1}^n Σ_{k=1}^p λ_{jk} [ (Y_{ij} − μ_j) / { μ_j² r(Y_{ij}; μ_j) } ] r(Y_{ik}; μ_k).   (3.15)

Proof. For a multivariate gamma model the D′(Yᵢ; μ) in (3.14) can be written as

    D′(Yᵢ; μ) = −2 Δ(Yᵢ, μ) V⁻¹(μ),

where

    Δ(Y, μ) = diag{ (Y₁ − μ₁)/r(Y₁; μ₁), …, (Y_p − μ_p)/r(Y_p; μ_p) }  and  V(μ) = diag{ μ₁², …, μ_p² }.

Now, by using the fact that d(y; μ) = 2(y/μ − log(y/μ) − 1), so that d′(y; μ) = −2(y − μ)/μ², and by using Theorem 3.2, it is easy to show that (3.15) holds. □

Note that if we make a small-dispersion approximation to d′ at Y = μ, we have (Jørgensen, 1997b, p. 25)

    d′(Y; μ) ≈ −2 r(Y; μ) V^{−1/2}(μ).   (3.16)

Thus,

    D′(Y; μ) ≈ −2 V^{−1/2}(μ).   (3.17)

We will use the approximations (3.16) and (3.17) frequently in the next section.

There is no closed-form solution to ∂ℓ_n(μ)/∂μ = 0 except in the case where we have a single observation vector, but the equation can be solved by numerical techniques; for example, we may use an optimization technique such as quasi-Newton to find numerical values.

Lemma 3.2 Let Y be a random vector, in ℝ^p, from a multivariate proper dispersion model PD_p(μ, Σ). Then, for arbitrary Σ, the maximum likelihood estimator of μ is y.

Proof. We prove this by the following argument. The log-likelihood function of μ is given by

    ℓ(μ) = log a_p(Σ) + log b(y) − ½ tᵀ(y; μ) Σ⁻¹ t(y; μ).   (3.18)

Let t = t(y; μ). Since Σ is positive-definite, so is Σ⁻¹ = Λ. Hence tᵀΛt ≥ 0, and clearly we have t(y; y) = 0. Now if y ≠ μ we have t(y; μ) ≠ 0 and hence tᵀΛt > 0, again by Λ being positive-definite. Hence tᵀΛt is minimized at μ = y. Therefore it follows from (3.18) that the maximum likelihood estimator of μ is y.
□

We now give the second derivative of the log-likelihood (3.18) with respect to μ, which will be used in the next section:

    ∂²ℓ(μ)/∂μ∂μᵀ = −½ [ ∂D′(y; μ)/∂μ ] diag{ Λ t(y; μ) } − ¼ D′(y; μ) Λ D′(y; μ),   (3.19)

where ∂D′(y; μ)/∂μ denotes the diagonal matrix of derivatives of the diagonal entries of D′(y; μ), and where we have used ∂t(y; μ)/∂μᵀ = ½ D′(y; μ).

3.3 Saddlepoint approximations

We now introduce the saddlepoint approximations for the multivariate proper dispersion model. The term 'saddlepoint approximation' was used by Jørgensen (1997b) in univariate dispersion models, and we keep this term for the multivariate case also. The saddlepoint approximation in multivariate dispersion models, as in the univariate case, fortuitously avoids approximation of the term in the exponent. This is useful because the position vector μ enters the density only via the exponent, so that the saddlepoint approximation preserves the statistical properties regarding inference on μ.

In the following sections we investigate the limiting behaviour of the multivariate proper dispersion model when ||Σ|| → 0, that is, ||Λ|| → ∞, where ||·|| is the Euclidean norm of the matrix.

3.3.1 Barndorff-Nielsen's p*-formula

In this section, we develop a saddlepoint approximation for the multivariate proper dispersion model by using the p*-formula (2.22).

Theorem 3.3 Let the multivariate proper dispersion model be given by

    f(y; μ, Σ) = a(Σ) |V(y)|^{−1/2} exp{ −½ tᵀ(y; μ) Σ⁻¹ t(y; μ) }.   (3.20)

Then the saddlepoint approximation for the multivariate proper dispersion model is given by

    f(y; μ, Σ) ≈ (2π)^{−p/2} |Σ|^{−1/2} |V(y)|^{−1/2} exp{ −½ tᵀ(y; μ) Σ⁻¹ t(y; μ) },   y ∈ Ω^p.   (3.21)

Proof. Let Y ~ PD_p(μ, Σ). For one observation we have, by Lemma 3.2, that the maximum likelihood estimator of μ is μ̂ = y. Note that for a single observation the ancillary statistic is degenerate. Using (3.17) and (3.19) — the first term of (3.19) vanishes at μ = y because t(y; y) = 0 — the observed information matrix of μ at μ̂ = y is given by

    j = −∂²ℓ(μ)/∂μ∂μᵀ |_{μ=y} = V^{−1/2}(y) Σ⁻¹ V^{−1/2}(y).
    (3.22)

Therefore, by using the p*-formula (2.22), the saddlepoint approximation for the multivariate proper dispersion model is given by

    f(y; μ, Σ) ≈ (2π)^{−p/2} |V^{1/2}(y) Σ V^{1/2}(y)|^{−1/2} exp{ −½ tᵀ(y; μ) Σ⁻¹ t(y; μ) }
              = (2π)^{−p/2} |Σ|^{−1/2} |V(y)|^{−1/2} exp{ −½ tᵀ(y; μ) Σ⁻¹ t(y; μ) }.   (3.23)

Hence the result. □

We show below, by a Laplace approximation, that the above approximation holds for ||Σ|| → 0. That is, the saddlepoint approximation is equivalent to

    |Σ|^{1/2} a(y; Σ) → (2π)^{−p/2} |V(y)|^{−1/2} as ||Σ|| → 0,

or, equivalently, in multivariate proper dispersion models,

    |Σ|^{1/2} a_p(Σ) → (2π)^{−p/2} as ||Σ|| → 0,

and this approximation is clearly uniform on compacts.

3.3.2 The Laplace method

The Laplace method provides a proof of the saddlepoint approximation. The Laplace method for the multidimensional case is a direct generalization of the corresponding univariate discussion in (2.1). The key argument in the univariate case was to produce in the integrand a factor close to a normal density of small variance. Essentially the same argument applies in p dimensions.

We now present the important Laplace approximation, as described in, e.g., Barndorff-Nielsen and Cox (1989, p. 169). Before we give the Laplace approximation, note that for any matrix Λ we can write Λ = λΛ₀, where Λ₀ is a fixed constant matrix and λ is a scalar constant.

Proposition 3.1 (Laplace approximation) Define

    I(λ) = ∫_Ω e^{−λ r(y)} q(y) dy,

where the functions r(y) and q(y) are defined over the region of integration Ω ⊆ ℝ^p. Suppose that r(y) has a unique minimum in the interior of Ω at y = μ; then

    I(λ) ~ e^{−λ r(μ)} q(μ) (2π/λ)^{p/2} |r″(μ)|^{−1/2}, as λ → ∞,   (3.24)

where r″ is the second derivative of r(y) evaluated at y = μ and |r″(μ)| > 0.

Theorem 3.4 Let the multivariate
proper dispersion model be given by

    f(y; μ, Σ) = a(Σ) |V(y)|^{−1/2} exp{ −½ tᵀ(y; μ) Σ⁻¹ t(y; μ) },   y ∈ Ω^p,

where

    a(Σ)⁻¹ = ∫_{Ω^p} |V(y)|^{−1/2} exp{ −½ tᵀ(y; μ) Σ⁻¹ t(y; μ) } dy.

Then, as ||Σ|| → 0,

    a(Σ) ~ (2π)^{−p/2} |Σ|^{−1/2}.   (3.25)

Proof. Write Σ⁻¹ = λΣ₀⁻¹ and define

    r(y) = ½ tᵀ(y; μ) Σ₀⁻¹ t(y; μ)  and  q(y) = |V(y)|^{−1/2},

so that a(Σ)⁻¹ = I(λ); in this proof d′ denotes the differentiation of d with respect to y. Differentiating r(y) twice with respect to y gives

    ∂r/∂y = ½ Δ′(y; μ) Σ₀⁻¹ t(y; μ),
    ∂²r/∂y∂yᵀ = ½ [ ∂Δ′(y; μ)/∂y ] diag{ Σ₀⁻¹ t(y; μ) } + ¼ Δ′(y; μ) Σ₀⁻¹ Δ′(y; μ),   (3.26)

where Δ′(y; μ) = diag{ d′(y₁; μ₁)/r(y₁; μ₁), …, d′(y_p; μ_p)/r(y_p; μ_p) }. Note that, by the small-dispersion approximation analogous to (3.16) (now with differentiation with respect to y), we have Δ′(μ; μ) = 2V^{−1/2}(μ). We have shown in Lemma 3.2 that r(y) has a unique minimum at y = μ. At y = μ the first term on the right-hand side of (3.26) vanishes, since t(μ; μ) = 0, and we have

    ∂²r/∂y∂yᵀ |_{y=μ} = V^{−1/2}(μ) Σ₀⁻¹ V^{−1/2}(μ).

Now by applying this to (3.24), with the fact that r(μ) = 0, we obtain

    a(Σ)⁻¹ ~ |V(μ)|^{−1/2} (2π/λ)^{p/2} |V^{−1/2}(μ) Σ₀⁻¹ V^{−1/2}(μ)|^{−1/2} = (2π)^{p/2} |Σ|^{1/2}.

Now by taking the inverse of the above, we get (3.25). Hence the theorem. □

Suppose we think of the approximation obtained by using the p*-formula as a saddlepoint approximation; then the approximation obtained by Laplace's method can be regarded as a renormalized saddlepoint approximation. Therefore the ordinary and renormalized saddlepoint approximations agree asymptotically in the case of the multivariate proper dispersion model. That is, in the case of a multivariate proper dispersion model the renormalized saddlepoint approximation is exact. In fact, this is one of the main results for the univariate proper dispersion models.

3.4 Multivariate normal approximation

We now consider a multivariate normal approximation to the multivariate proper dispersion model. This will pave the way to mimicking some basic multivariate techniques in the multivariate proper dispersion model with a small dispersion matrix.
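Before leaving the saddlepoint results, note that the limit (3.25) can be verified in closed form in the one case where a_p(Σ) is known exactly, namely the gamma model with diagonal Λ, where a_p(Σ) = ∏ c₁(λ_ii) and Σ = diag(1/λ₁₁, …, 1/λ_pp); by Stirling's formula, |Σ|^{1/2} a_p(Σ) → (2π)^{−p/2} as every λ_ii → ∞. A sketch of the check (our own Python, not from the thesis):

```python
import math

def c1(lam):
    # univariate gamma normalizer c1(lam) = lam^lam e^{-lam} / Gamma(lam)
    return math.exp(lam * math.log(lam) - lam - math.lgamma(lam))

def renormalized(lams):
    # |Sigma|^{1/2} a_p(Sigma) for the diagonal gamma model,
    # with Sigma = diag(1/lam_1, ..., 1/lam_p)
    det_sqrt = math.prod(lam ** -0.5 for lam in lams)
    return det_sqrt * math.prod(c1(lam) for lam in lams)

# ||Sigma|| -> 0 means every lam -> infinity; the limit (3.25) gives (2*pi)^{-p/2}
limit_p2 = (2.0 * math.pi) ** -1.0        # p = 2
approx = renormalized([100.0, 100.0])
```

With λ = 100 on both coordinates the renormalized constant already agrees with (2π)⁻¹ to within about 0.2%, and the agreement improves as λ grows, consistent with the uniform convergence noted above.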
Theorem 3.5 Let the multivariate proper dispersion model be given by

    f(y; μ, Σ) = a(Σ) |V(y)|^{−1/2} exp{ −½ tᵀ(y; μ) Σ⁻¹ t(y; μ) }.   (3.27)

Then

    Z →_d N_p(0, I_p) as ||Σ|| → 0,   (3.28)

where Z is the linear transformation given by

    Z = Σ^{−1/2} V^{−1/2}(μ) (Y − μ).   (3.29)

Proof. For small dispersion, y gets close to μ, so we prove this by introducing a second-order Taylor series approximation to the term in the exponent of (3.27) at y = μ, and a saddlepoint approximation (a uniformly convergent saddlepoint approximation) to the normalizing constant. Let

    d_Σ(y; μ) = tᵀ(y; μ) Σ⁻¹ t(y; μ).

Then

    d_Σ(μ; μ) = 0,   ∂d_Σ(y; μ)/∂y |_{y=μ} = 0,   ∂²d_Σ(y; μ)/∂y∂yᵀ |_{y=μ} = 2 V^{−1/2}(μ) Σ⁻¹ V^{−1/2}(μ).   (3.30)

Therefore the second-order Taylor series approximation to d_Σ(y; μ) at y = μ is given by

    d_Σ(y; μ) ≈ (y − μ)ᵀ V^{−1/2}(μ) Σ⁻¹ V^{−1/2}(μ) (y − μ).   (3.31)

The Jacobian matrix is given by

    J(μ) = ∂y/∂zᵀ = V^{1/2}(μ) Σ^{1/2}.

Hence the density of Z is

    f(z; μ, Σ) = |J(μ)| f( V^{1/2}(μ) Σ^{1/2} z + μ; μ, Σ )
              ≈ |V(μ)|^{1/2} |Σ|^{1/2} a(Σ) |V(μ)|^{−1/2} exp{ −½ zᵀz }   (3.32)
              ≈ |Σ|^{1/2} (2π)^{−p/2} |Σ|^{−1/2} exp{ −½ zᵀz }   (3.33)
              = (2π)^{−p/2} exp{ −½ zᵀz },   (3.34)

where the term in the exponent of (3.32) was obtained by using the second-order Taylor approximation (3.31), together with |V(y)|^{−1/2} ≈ |V(μ)|^{−1/2} for small dispersion, and the normalizing constant in (3.33) was obtained by the saddlepoint approximation for small dispersion. Hence the result follows. □

Note that we may write (3.28) as

    Y ≈ N_p( μ, V^{1/2}(μ) Σ V^{1/2}(μ) )

for a small dispersion matrix. We note that

    ∂²d_Σ(y; μ)/∂y∂μᵀ = ∂{ Δ′(y; μ) Σ⁻¹ t(y; μ) }/∂μᵀ,  so that  ∂²d_Σ(y; μ)/∂y∂μᵀ |_{μ=y} = −2 V^{−1/2}(y) Σ⁻¹ V^{−1/2}(y).   (3.35)

Hence we have, generalizing (2.13),

    ∂²d_Σ/∂y∂yᵀ (y; y) = ∂²d_Σ/∂μ∂μᵀ (y; y) = −∂²d_Σ/∂y∂μᵀ (y; y).

We now consider a normal approximation to another transformation of Y which is approximately equal to (3.29).
Theorem 3.6 The normal approximation to the multivariate proper dispersion model (3.27) with the transformation

    Z = Σ^{−1/2} t(Y; μ)   (3.36)

is given by

    Z →_d N_p(0, I_p) as ||Σ|| → 0.   (3.37)

Proof. For the transformation (3.36), the Jacobian matrix is given by

    J(Y; μ) = ∂z/∂yᵀ = ½ Σ^{−1/2} Δ′(Y; μ).

Now, by using a Taylor series approximation for small dispersion at μ = y, we get Δ′(Y; μ) ≈ 2V^{−1/2}(Y), and thus

    J(Y) ≈ Σ^{−1/2} V^{−1/2}(Y).

Hence the density of Z is given by

    f(z; μ, Σ) ≈ |J(Y)|⁻¹ a(Σ) |V(Y)|^{−1/2} exp{ −½ zᵀz }
              = |Σ|^{1/2} |V(Y)|^{1/2} a(Σ) |V(Y)|^{−1/2} exp{ −½ zᵀz }
              ≈ (2π)^{−p/2} exp{ −½ zᵀz }.   (3.38)

The normalizing constant in equation (3.38) is obtained by the saddlepoint approximation. Therefore (3.37) follows immediately. □

Note that we may write (3.37) as

    t(Y; μ) ≈ N_p(0, Σ).

It is noted that, since t(Y; μ) ≈ V^{−1/2}(μ)(Y − μ), the normal approximations (3.28) and (3.37) are approximately equal, as we expected.

Chapter 4

Bivariate gamma distribution

In this chapter, we discuss the bivariate gamma distribution. In Section 1, we discuss the normalizing constant in the case of the multivariate gamma model and study the normalizer numerically; in fact, we employ two methods to estimate the normalizer. Although in their construction of multivariate dispersion models Jørgensen and Lauritzen (1998) did not require the definition of multivariate dispersion models to have specific moments or marginals, we investigate, in Section 3, the moments up to order two in a bivariate situation. Finally, in Section 4, we investigate the marginals numerically.
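As groundwork for the simulations in this chapter, the normal approximation t(Y; μ) ≈ N_p(0, Σ) of Section 3.4 is itself easy to check by simulation in the independent-margins gamma case: for Y ~ Ga(μ, 1/λ) with λ large, √λ · r(Y; μ) should be approximately standard normal. A seeded sketch (our own Python; the gamma generator random.gammavariate is an assumption of this sketch, not the thesis's S-Plus/C code):

```python
import math
import random

def deviance_residual(y, mu):
    d = 2.0 * (y / mu - math.log(y / mu) - 1.0)   # gamma unit deviance
    return math.copysign(math.sqrt(d), y - mu)

random.seed(1)
lam, mu, n = 400.0, 2.0, 40000
# Ga(mu, 1/lam) has shape lam and scale mu/lam
z = [math.sqrt(lam) * deviance_residual(random.gammavariate(lam, mu / lam), mu)
     for _ in range(n)]
mean_z = sum(z) / n
var_z = sum((zi - mean_z) ** 2 for zi in z) / (n - 1)
```

Here mean_z comes out close to 0 and var_z close to 1; the small negative bias in the mean, of order 1/(3√λ), is the skewness correction familiar from univariate dispersion models.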
4.1 Numerical study of the normalizing constant

Let the multivariate proper dispersion model be given by

    f(y; μ, Λ) = a_p(Λ) |V(y)|^{−1/2} exp{ −½ tᵀ(y; μ) Λ t(y; μ) }.   (4.1)

Jørgensen and Lauritzen (1998) gave the following derivation for calculating the normalizing constant:

    a_p(Λ)⁻¹ = ∫⋯∫ { V^{−1/2}(y₁) ⋯ V^{−1/2}(y_p) } exp{ −½ tᵀ(y; μ) Λ t(y; μ) } dy   (4.2)

            = ∫⋯∫ { V^{−1/2}(y₁) ⋯ V^{−1/2}(y_p) } exp{ −½ Σ_{j=1}^p λ_{jj} d(y_j; μ_j) } exp{ −½ Σ_{i≠j} λ_{ij} r(y_i; μ_i) r(y_j; μ_j) } dy

            = { ∏_{j=1}^p c⁻¹(λ_{jj}) } ∫⋯∫ ∏_{j=1}^p { c(λ_{jj}) V^{−1/2}(y_j) e^{−λ_{jj} d(y_j; μ_j)/2} dy_j } exp{ −½ tᵀ(y; μ) Λ₀ t(y; μ) }

            = { ∏_{j=1}^p c⁻¹(λ_{jj}) } E[ exp{ −½ tᵀ(Y; μ) Λ₀ t(Y; μ) } ],   (4.3)

where Λ₀ has zeroes on the diagonal and off-diagonal elements equal to those of Λ, c is the normalizer of the marginal univariate proper dispersion model, and the expectation is with respect to the distribution with independent marginals corresponding to the diagonal of Λ. In fact, Song (1996) used a derivation the same as the above to estimate the normalizing constant for the multivariate gamma model. By using (4.3), the normalizing constant for the multivariate gamma model is given by

    a_p(Λ) = [ ( ∏₁^p λ_{ii}^{λ_{ii}} ) exp{ −Σ₁^p λ_{ii} } / ∏₁^p Γ(λ_{ii}) ] ( E[ exp{ −Σ_{i<j} λ_{ij} r(X_i; 1) r(X_j; 1) } ] )⁻¹.   (4.4)

The expectation is taken under the probability measure generated by p independent univariate gamma random variables X_i ~ Ga(1, λ_{ii}⁻¹), i = 1, …, p, and can be calculated by using Monte Carlo simulation. Note that we may also estimate the normalizer by numerical integration applied directly to (4.2), although this approach may not be a good one in higher dimensions.

In the case of Monte Carlo simulation, we have to estimate the following for the bivariate gamma:

    a₂(Λ) = [ λ₁₁^{λ₁₁} λ₂₂^{λ₂₂} exp{ −(λ₁₁ + λ₂₂) } / { Γ(λ₁₁) Γ(λ₂₂) } ] ( E[ exp{ −λ₁₂ r(X₁; 1) r(X₂; 1) } ] )⁻¹,

where the expectation is taken over two independent gamma random variables X₁ and X₂ with mean 1 and dispersion parameters λ₁₁⁻¹ and λ₂₂⁻¹.
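The Monte Carlo recipe just described can be sketched as follows (our own Python, standing in for the S-Plus/C implementation used in the thesis; random.gammavariate is an assumption of this sketch):

```python
import math
import random

def resid(y, mu):
    # gamma deviance residual r(y; mu)
    d = 2.0 * (y / mu - math.log(y / mu) - 1.0)
    return math.copysign(math.sqrt(d), y - mu)

def c1(lam):
    # univariate gamma normalizer c1(lam) = lam^lam e^{-lam} / Gamma(lam)
    return math.exp(lam * math.log(lam) - lam - math.lgamma(lam))

def mc_normalizer(l11, l22, l12, n=20000, seed=2):
    # a_2(Lambda) via (4.4): c1(l11) c1(l22) divided by a Monte Carlo estimate
    # of E[exp{-l12 r(X1;1) r(X2;1)}] with X_i ~ Ga(1, 1/l_ii) independent
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x1 = rng.gammavariate(l11, 1.0 / l11)
        x2 = rng.gammavariate(l22, 1.0 / l22)
        total += math.exp(-l12 * resid(x1, 1.0) * resid(x2, 1.0))
    return c1(l11) * c1(l22) / (total / n)
```

With λ₁₂ = 0 the expectation is exactly 1, so the estimator returns the closed-form diagonal normalizer c₁(λ₁₁)c₁(λ₂₂); for weak coupling the Monte Carlo factor stays close to 1.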
Before estimating the normalizer, let us define the following as a measure of association between the two variables:

    ρ = σ₁₂ / √(σ₁₁ σ₂₂),

where (Σ)_{ij} = σ_{ij}. We consider the dispersion matrices (Σ) and measures of association (ρ) in Table 4.1 to study the normalizing constants.

Table 4.1: Dispersion matrices.

    ρ = 0.75:  Σ₁ = [ 0.04   0.03  ]     ρ = 0.3:  Σ₂ = [ 0.2    0.12 ]     ρ = 0.1:  Σ₃ = [ 0.5     0.112 ]
                    [ 0.03   0.04  ]                    [ 0.12   0.8  ]                    [ 0.112   2.5   ]

Σ₄, Σ₅ and Σ₆ are obtained from Σ₁, Σ₂ and Σ₃ by reversing the sign of the off-diagonal element (so ρ = −0.75, −0.3 and −0.1, respectively).

An S-Plus program interfaced with C was written to estimate the normalizing constant by using Monte Carlo (MC) simulation. In fact, S-Plus was used to generate the gamma random variables and C was used to do the calculation. Since C was used for the calculation, the time taken to complete a simulation was very small; for example, for a simulation study on 100000 random numbers, the average time was 8 sec. using an HP C200 machine.

In Tables 4.2 and 4.3, the estimates are given with their corresponding standard errors (given in brackets). Each simulation was repeated 10 times to get the standard errors.

Table 4.2: Estimation of the normalizing constant by using the MC method.

    ρ      Σ     n = 1000          n = 10000           n = 50000           n = 100000
    0.75   Σ₁    6.25 (0.21)       6.02 (0.14)         6.13 (0.05)         5.99 (0.01)
    0.3    Σ₂    0.3814 (1.3e−3)   0.37869 (5.3e−4)    0.37874 (2.3e−4)    0.37905 (1.3e−4)
    0.1    Σ₃    0.11263 (1.2e−4)  0.112513 (4.4e−5)   0.112507 (1.9e−5)   0.112498 (1.09e−5)

Table 4.3: A comparison of estimates of the normalizing constant.

    ρ      Σ     MC method             Saddlepoint approximation
    0.75   Σ₁    5.99 (0.01)           6.02
    0.3    Σ₂    0.37905 (1.3e−4)      0.41709
    0.1    Σ₃    0.112498 (1.09e−5)    0.14307

It is observed from Table 4.2 that, as we increase the number of simulations (n) from 1000 to 100000, we obtain almost the same values of the estimate but with higher precision. In Table 4.3, we compared the estimate obtained by using the MC method with n = 100000 and the estimate obtained by using the saddlepoint approximation.
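The saddlepoint column of Table 4.3 is simply the Laplace limit (3.25) evaluated at p = 2, that is a₂(Σ) ≈ (2π)⁻¹ |Σ|^{−1/2}. A quick check (our own Python; the entries assumed for Σ₁, Σ₂, Σ₃, with ρ = 0.75, 0.3, 0.1, coincide with the first three matrices listed in Table 4.11):

```python
import math

def saddlepoint_a2(s11, s12, s22):
    # a_2(Sigma) ~ (2*pi)^{-1} |Sigma|^{-1/2}, from (3.25) with p = 2
    det = s11 * s22 - s12 * s12
    return 1.0 / (2.0 * math.pi * math.sqrt(det))

sp1 = saddlepoint_a2(0.04, 0.03, 0.04)   # assumed Sigma_1, rho = 0.75
sp2 = saddlepoint_a2(0.2, 0.12, 0.8)     # assumed Sigma_2, rho = 0.3
sp3 = saddlepoint_a2(0.5, 0.112, 2.5)    # assumed Sigma_3, rho = 0.1
```

These evaluate to about 6.02, 0.417 and 0.143, reproducing the saddlepoint column of Table 4.3.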
For small dispersion (||Σ|| small) both estimates are very close to each other; then, as ||Σ|| increases, the saddlepoint estimate tends to overestimate the normalizing constant.

Another interesting and potentially important empirical observation was that, when a similar simulation study was done to estimate the normalizing constant in the cases of Σ₄, Σ₅ and Σ₆, the values obtained were almost equal to those of Σ₁, Σ₂ and Σ₃, respectively. This situation is similar to the bivariate normal, where the values of the normalizing constant in the cases of Σ₄, Σ₅ and Σ₆ are equal to those of Σ₁, Σ₂ and Σ₃, respectively.

4.2 Density plots

In this section, we give some density plots of the bivariate gamma. First we investigate the behaviour of the contour plots and perspective plots when we have two independent gamma margins, that is, when the dispersion matrix is diagonal. In this case, we consider the position vectors and dispersion matrices given in Tables 4.4 and 4.5.

Table 4.4: Position vectors μ₁, μ₂ and μ₃.

Table 4.5: Dispersion matrices.

    Σ₇ = [ 0.04   0    ]     Σ₈ = [ 0.2   0   ]     Σ₉ = [ 2   0 ]
         [ 0      0.04 ]          [ 0     0.3 ]          [ 0   2 ]

Figure 4.1 shows some contour plots of independent bivariate gamma densities, and Figure 4.2 shows the corresponding perspective plots. The main reason for plotting the contour plots of independent bivariate gamma densities is to investigate the role

Figure 4.1: Contour plots of independent bivariate gamma densities.

Figure 4.2: Perspective plots of independent bivariate gamma densities (Σ₇, Σ₈, Σ₉).
Large values of the dispersion, as explained by J0rgensen (1997b), corresponds to a value of the Euclidean-norm of the dispersion matrix for which the non-normality of the distribution becomes evident. Contour plots and perspective plots for bivariate gamma densities when the two variables have some dependence structure were discussed in three cases: 1) dispersion matrix with small dispersion; 2) dispersion matrix with moderately large dispersion; 3) dispersion matrix with large dispersion. Small dispersion: The dispersion matrices used in this case are given in Table 4.6. And the contour plots and perspective plots corresponds to this case are given in Figures 4.3 and 4.4. Table 4.6: Small dispersion matrices. 0.04 0.03  0.03 0.04  0.04  -0.03  -0.03  0.04  0.03 0.015  0.015 0.09  0.03  -0.015  -.015  0.09  Moderately large dispersion: The dispersion matrices used in this case are given in Table 4.7. And the contour plots and perspective plots corresponds to this case are given in Figures 4.5 and 4.6. Table 4.7: Moderately large dispersion matrices. 0.1  0.138  0.1  -0.138  0.138  0.3  -0.138  0.3  0.2 0.085  0.085 0.4  0.2 -.085  -0.085 0.4  Large dispersion: The dispersion matrices used in this case are given in Table 4.8. 41  p = 0.75 S u  Figure 4.3: Contour plots of bivariate gamma densities with small dispersion.  42  A*  A*i  2  A*  3  p = 0.75Sn  p = -0.75£i  p = 0.3 E  2  1 3  p = -0.3 S  u  Figure 4.4: Perspective plots of bivariate gamma densities with small dispersion.  43  Figure 4 . 5 : Contour plots of bivariate gamma densities with moderately large dispersion.  44  Mi p = 0.80  M  M  2  3  Sis  p = -0.80Si  6  p = 0.3  S17  p = -0.3  Sig  Figure 4.6: Perspective plots of bivariate gamma densities with moderately large dispersion.  45  1  Figure 4.7: Contour plots of bivariate gamma densities with large dispersion.  And the contour plots and perspective plots corresponds to this case are given in Figures 4.7 and 4.8. 
Table 4.8: Large dispersion matrices.

Discussion: In the independence case (Figure 4.1) the contour plots are almost ellipsoidal, like those of a bivariate normal distribution, with the major and minor axes of the contours parallel to the coordinate axes.

When the two variables have some dependence structure, for small dispersion (Figure 4.3) we considered two cases of dispersion matrix: one with equal diagonal elements, and the other with diagonal elements in the ratio 3, each with a large and a small measure of association. In the first case, for a positive measure of association, the major axis of the (almost ellipsoidal) contour plot lies almost along the 45° line through μ; when the measure is negative, the major axis lies along a line at right angles to the 45° line through μ. That is, for small dispersion the bivariate gamma behaves like a bivariate normal. We may also note how the larger measure of association causes the probability to concentrate along a line.

When we dealt with moderately large dispersion matrices (Figure 4.5), the large dispersion causes some deviation from normality but still preserves the fact that, according to the sign of the dependence measure, the major axis is either along the 45° line or perpendicular to it. Since a large dispersion in a gamma density squeezes a lot of probability mass down towards the origin, the corresponding plots (Figures 4.7 and 4.8) are no longer ellipsoidal.

Figure 4.8: Perspective plots of bivariate gamma densities with large dispersion (panels include ρ = −0.80 (Σ₂₀)).

4.3 Estimation of moments

In this section, we estimate the moments up to order two. The estimation, for dimension p, can be done by using Monte Carlo simulation. The formula for this simulation can be obtained by employing the same kind of derivation as was used to obtain the formula for the normalizer (4.4). We give the equations for the case of two variables Y₁ and Y₂.
    E(Y₁) = a₂(Σ) [ λ₁₁^{λ₁₁} λ₂₂^{λ₂₂} exp{ −(λ₁₁ + λ₂₂) } / { Γ(λ₁₁) Γ(λ₂₂) } ]⁻¹ E[ Y₁ exp{ −λ₁₂ r(Y₁; μ₁) r(Y₂; μ₂) } ],   (4.5)

    E(Y₂) = a₂(Σ) [ λ₁₁^{λ₁₁} λ₂₂^{λ₂₂} exp{ −(λ₁₁ + λ₂₂) } / { Γ(λ₁₁) Γ(λ₂₂) } ]⁻¹ E[ Y₂ exp{ −λ₁₂ r(Y₁; μ₁) r(Y₂; μ₂) } ].   (4.6)

The expectation is taken under the probability measure generated by two independent univariate gamma random variables Yᵢ ~ Ga(μᵢ, λᵢᵢ⁻¹), i = 1, 2, and can be calculated by using Monte Carlo simulation. In the case of small dimensions we may also use numerical integration to estimate the moments. In Tables 4.9 and 4.10, we give estimates of the moments, using Monte Carlo simulation, for two dispersion matrices: the first with ρ = 0.75 and the second with ρ = 0.3. We compare the estimated values with reference values (Ref. Val.) obtained by using the marginals Yᵢ ~ Ga(μᵢ, σᵢᵢ), i = 1, 2.

A C program interfaced with S-Plus was used to estimate the moments. A set of 100000 random numbers was generated from the corresponding gamma distributions to estimate the moments, and this was repeated ten times to estimate the standard error. It is noticed from Tables 4.9 and 4.10 that, although the estimates are almost equally good for large and small measures of association, the corresponding standard errors are smaller for small measures of association. This implies that the off-diagonal term in the dispersion matrix plays an important role, along with the diagonal elements, in estimating the moments.

Table 4.9: Estimation of moments.

    Σ = [ 0.04   0.03 ]
        [ 0.03   0.04 ]

                μ = (1, 1)ᵀ                        μ = (3, 4)ᵀ
                Ref. Val.   Estimated              Ref. Val.   Estimated
    E(Y₁)       1           0.9893 (0.0019)        3           2.9879 (0.0044)
    E(Y₂)       1           0.9894 (0.0019)        4           4.0083 (0.0426)
    Var(Y₁)     0.04        0.0404 (0.0017)        0.36        0.3122 (0.0578)
    Var(Y₂)     0.04        0.0341 (0.0115)        0.64        0.62 (0.26)

4.4 Numerical study of marginal density

In this section, we investigate marginals from the bivariate gamma.
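The marginal computation can be sketched as follows: evaluate the unnormalized density (3.7) on a rectangular grid, renormalize by the total grid mass (so the unknown a₂(Σ) cancels), and sum over the second coordinate. In the diagonal case the result must reproduce the exact gamma marginal, which gives a check on the routine. A crude pure-Python midpoint-rule sketch of our own (not the thesis's C routine):

```python
import math

def kernel(y1, y2, mu, l11, l12, l22):
    # unnormalized bivariate gamma density (3.7)
    def r(y, m):
        d = 2.0 * (y / m - math.log(y / m) - 1.0)
        return math.copysign(math.sqrt(d), y - m)
    t1, t2 = r(y1, mu[0]), r(y2, mu[1])
    q = l11 * t1 * t1 + 2.0 * l12 * t1 * t2 + l22 * t2 * t2
    return math.exp(-0.5 * q) / (y1 * y2)

def marginal_on_grid(mu, l11, l12, l22, lo=0.3, hi=2.2, m=240):
    h = (hi - lo) / m
    ys = [lo + h * (i + 0.5) for i in range(m)]       # midpoint rule
    rows = [[kernel(a, b, mu, l11, l12, l22) for b in ys] for a in ys]
    mass = sum(sum(row) for row in rows) * h * h      # unnormalized total mass
    return {a: sum(row) * h / mass for a, row in zip(ys, rows)}

def gamma_pdf(y, mu, lam):
    # exact Ga(mu, 1/lam) marginal for the diagonal check
    return math.exp(lam * math.log(lam / mu) + (lam - 1.0) * math.log(y)
                    - lam * y / mu - math.lgamma(lam))
```

With Λ = diag(25, 25) (that is, Σ = diag(0.04, 0.04)) and μ = (1, 1), the grid marginal near y₁ = 1 agrees with the exact Ga(1, 0.04) density to within a few percent, mirroring the close agreement seen in the small-dispersion panel of Figure 4.9.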
For this study, we considered the position vector and the dispersion matrices given in Tables 4.4 and 4.11. A numerical integration routine in C was used to integrate out the second variable at specified points of the first variable. The reference density and the estimated density are given in Figure 4.9. The first plot is for small dispersion, the second and third are for moderately large dispersions, and the last is for large dispersion. It is noted from the first plot that the estimated and the reference marginal densities are almost exactly the same, and in the second and third plots they are very similar; but for the large dispersion, in the last plot, there is a marked difference between the estimated and the reference marginal densities.

Ga(3, 0.04)   Ga(3, 0.5)   Ga(3, 1)

Figure 4.9: A comparison between true and estimated marginal densities (estimated density is given in solid lines).

Table 4.10: Estimation of moments.

    Σ = [ 0.03    0.015 ]
        [ 0.015   0.09  ]

                μ = (1, 1)ᵀ                        μ = (3, 4)ᵀ
                Ref. Val.   Estimated              Ref. Val.   Estimated
    E(Y₁)       1           0.9949 (0.0003)        3           2.9841 (0.0006)
    E(Y₂)       1           0.9953 (0.0003)        4           3.9784 (0.0001)
    Var(Y₁)     0.03        0.0298 (0.00001)       0.27        0.2717 (0.00001)
    Var(Y₂)     0.09        0.0951 (0.0008)        1.44        1.4326 (0.0093)

Table 4.11: Dispersion matrices.

    [ 0.04   0.03 ]   [ 0.2    0.12 ]   [ 0.5     0.112 ]   [ 1       0.735 ]
    [ 0.03   0.04 ]   [ 0.12   0.8  ]   [ 0.112   2.5   ]   [ 0.735   1.5   ]

Chapter 5

A generalization of Hotelling's T²-statistic

5.1 Introduction

One of the most important test problems in univariate statistics is the test of the mean of a given distribution when the variance of the distribution is unknown. The statistic usually used in univariate statistics is the difference between the sample mean, x̄, and the hypothetical population mean, divided by the sample standard deviation, s.
If the population sampled is N(μ, σ²), then

    t = √n (x̄ − μ₀) / s   (5.1)

has the well-known t-distribution with n − 1 degrees of freedom, where n is the sample size. On the basis of this fact, one can test the hypothesis H₀: μ = μ₀, where μ₀ is specified.

The statistic (5.1) was generalized to non-normal distributions in the framework of generalized linear models; cf. Jørgensen (1997a). Jørgensen used the corresponding estimated dispersion parameter in the denominator of (5.1) and, in fact, showed that the statistic follows asymptotically a t-distribution for large samples and small dispersions.

The multivariate analogue of the square of t given in (5.1) is

    T² = n (X̄ − μ₀)ᵀ S⁻¹ (X̄ − μ₀),   (5.2)

where X̄ is the mean vector of a sample of size n and S is the sample covariance matrix. (5.2) is referred to as the Hotelling T²-statistic in the multivariate normal model.

In the following sections, we first discuss the general likelihood ratio method in the multivariate normal model, which will be used to derive (5.2); we then generalize the T²-statistic to the multivariate proper dispersion model.

5.2 General likelihood ratio method in the multivariate normal model

Let θ be a vector consisting of all the unknown population parameters, and let L(θ) be the likelihood function obtained by evaluating the joint density of X₁, …, X_n at their observed values x₁, …, x_n. The parameter θ takes its values in the parameter set Θ.

Let us consider the p-dimensional multivariate case, where the Xᵢ's are independent and identically distributed as N_p(μ, Σ). Then Θ consists of the product of the p-dimensional space of means and the p(p + 1)/2-dimensional space of variances and covariances such that Σ is positive-definite. Therefore Θ has dimension ν = p + p(p + 1)/2. Under the null hypothesis H₀: μ = μ₀, μ is restricted to lie in a subset Θ₀ of Θ, so Θ₀ has dimension ν₀ = p(p + 1)/2 (the mean vector is known).

The likelihood ratio test of H₀: μ = μ₀ rejects H₀ in favour of H₁: μ ≠ μ₀ if

    Λ = max_{θ∈Θ₀} L(μ₀, Σ) / max_{θ∈Θ} L(μ, Σ) ≤ c,

where c is a suitably chosen constant. When the sample size is large and under certain regularity conditions, the sampling distribution of −2 log Λ is well approximated by the chi-squared distribution with ν − ν₀ degrees of freedom; that is, −2 log Λ ~ χ²_{ν−ν₀}.

5.2.1 T²-statistic in the multivariate normal

Before we make a connection between the likelihood ratio test for the hypothesis H₀:
So 0  O  0  0  has dimension v = p(p + l ) / 2 (mean vector is known). 0  53  The likelihood ratio test of H : /x = fi0 rejects H in favour of Hi : LI ^ ^t if 0  A  max^  =  0  0  = A t o  L(At,S)  ^  m a x ^ ^ L (tx, S) where c is a suitably chosen constant. When the sample size is large and under certain regularity conditions, the sampling distribution of —2 log A is well approximated by the chi-squared distribution with degrees freedom v — v . That is, —2 log A ~  xl- -  0  Vo  T -statistic in multivariate normal  5.2.1  2  Before we make a connection between likelihood ratio test for the hypothesis H : 0  tx = tx and Hotelling's T -statistic, we first give the general definition of the Hotelling's 2  0  T -statistic and its distribution. 2  D e f i n i t i o n 5.1 Let Z ~ jV (0,I ), P  W ~ W (I ,n p  — 1) (Wishart  let the two random variables be mutually independently distributed.  distribution),  and  Then the random  variable T  = {n-  2  is called Hotelling's T -statistic.  T  l  The distribution  2  (5.3)  l)Z Vf- Z  of T  2  is called Hotelling's T 2  distribution with n d.f. The T -statistic and its distribution were originally developed by Hotelling (1931). 2  Now let us consider the likelihood ratio test for the hypothesis H : /j, = LI vs. 0  0  H i : /x ^ /x on the basis of the random sample ( x i , . . . , x ) drawn from J\f (fi, S ) , T, 0  n  p  positive-definite , where n > p , fi0 is a given vector, and (//, S ) is unknown. Let 0  =  {(/x, S) : —oo < pi < oo, i — 1 , . . . ,p, £ positive-definite} ,  ©o  =  {(/x, S) : pt = /x , £ positive-definite} . 0  54  The likelihood ratio criterion A is given by m a x A  =  6>60o ( L  max  )  L (0)  0 6 0  /  6 >  T  \ ~^  2  n  2  ra - 1  where T = n ( x — LI )  S  2  0  1  ( X — LI ^ . Note that this T can be written in the form 2  0  of (5.3) by letting Z = V ^ T  1 7 2  ( X - Mo) and V = nS  and W = E ^ V E " ^ . 
Since $\Lambda$ is a monotonically decreasing function of $T^2$, the likelihood ratio test based on $\Lambda$ is equivalent to the following:

Accept $H_0$ if $T^2 \le T^2(\alpha)$; reject $H_0$ if $T^2 > T^2(\alpha)$,

where $T^2(\alpha)$ is the upper $\alpha$-point of the distribution of $T^2$ for a specified significance level $\alpha$. Note that when $p = 1$, $T^2$ reduces to the square of Student's $t$-statistic.

Derivation of the distribution of the $T^2$-statistic

Consider the statistic $T^2 = (n-1)\, Z^T W^{-1} Z$. Since $T^2$ is invariant under any non-singular transformation, we make use of a random orthogonal transformation

$$Z^* = HZ, \qquad W^* = H W H^T,$$

where $H_{p \times p}$ is an orthogonal matrix such that its first row is $Z^T / (Z^T Z)^{1/2}$. Noting that $HZ = \left((Z^T Z)^{1/2}, 0, \ldots, 0\right)^T$, $T^2$ reduces to the following simple form:

$$T^2 = (n-1)\,(HZ)^T (H W H^T)^{-1} (HZ) = (n-1)\,(Z^T Z)\, w^{*11},$$

where $w^{*11}$ is the $(1,1)$th element of $W^{*-1}$.

Now we have $Z^T Z \sim \chi^2_p$. By the property of the Wishart distribution, the distribution of $W^*$ conditional on $H$ is $W_p(I_p, n-1)$. Suppose we partition $W^*$ as

$$W^* = \begin{pmatrix} w^*_{11} & w_{12} \\ w_{21} & W_{22} \end{pmatrix};$$

then we can write $1/w^{*11} = w^*_{11} - w_{12} W_{22}^{-1} w_{21}$, and this has a chi-squared distribution with $n-p$ degrees of freedom, independent of $H$ and hence of $Z$. Hence the distribution of $T^2$ is given by

$$\frac{T^2}{n-1} \sim \frac{\chi^2_p}{\chi^2_{n-p}}, \qquad \text{that is,} \qquad \frac{n-p}{p(n-1)}\, T^2 \sim F_{p,\,n-p}.$$

The above method of deriving the $T^2$-distribution by making use of a random orthogonal transformation is due to Wijsman (1957).

5.3 Maximum likelihood estimators of $\mu$ and $\Sigma$

In this section, we discuss the small-dispersion asymptotic maximum likelihood estimators of $\mu$ and $\Sigma$.

Theorem 5.1 Let $Y_1, \ldots, Y_n$ be a random sample from $PD_p(\mu, \Sigma)$, the $p$-dimensional multivariate proper dispersion model, where $\mu$ and $\Sigma$ are the position vector and dispersion matrix, respectively.
Then, asymptotically, the maximum likelihood estimators of $\mu$ and $\Sigma$ are given by

$$\hat{\mu} \approx \bar{Y} \quad \text{as } \|\Sigma\| \to 0, \qquad (5.4)$$

$$\hat{\Sigma} \approx \frac{1}{n} \sum_{i=1}^n t(Y_i; \bar{Y})\, t^T(Y_i; \bar{Y}) \quad \text{as } \|\Sigma\| \to 0. \qquad (5.5)$$

Proof. We have, from (3.14) in Theorem 3.2, that

$$\frac{\partial \ell}{\partial \mu} = -\frac{1}{2} \sum_{i=1}^n D'(Y_i; \mu)\, \Sigma^{-1} t(Y_i; \mu). \qquad (5.6)$$

For small dispersion, we have

$$D'(Y_i; \mu) \approx 2 V^{-1/2}(\mu) \qquad \text{and} \qquad t(Y_i; \mu) \approx V^{-1/2}(\mu)\,(Y_i - \mu).$$

Hence

$$\frac{\partial \ell}{\partial \mu} \approx -\sum_{i=1}^n V^{-1/2}(\mu)\, \Sigma^{-1} V^{-1/2}(\mu)\,(Y_i - \mu). \qquad (5.7)$$

Hence (5.4) follows immediately from (5.7) by setting $\partial \ell / \partial \mu = 0$.

For small dispersion, by the saddlepoint approximation, we obtain

$$f(y; \mu, \Sigma) \approx (2\pi)^{-p/2}\, |\Sigma|^{-1/2}\, |V(y)|^{-1/2} \exp\left\{ -\tfrac{1}{2}\, t^T(y; \mu)\, \Sigma^{-1} t(y; \mu) \right\}. \qquad (5.8)$$

Hence the log-likelihood function is given by

$$\ell(\mu, \Sigma) \approx \log\left\{ (2\pi)^{-np/2} \prod_{i=1}^n |V(y_i)|^{-1/2} \right\} - \frac{n}{2} \log|\Sigma| - \frac{1}{2} \sum_{i=1}^n t^T(y_i; \mu)\, \Sigma^{-1} t(y_i; \mu). \qquad (5.9)$$

We now differentiate (5.9) with respect to $\Sigma$:

$$\frac{\partial \ell}{\partial \Sigma} = -\frac{n}{2} \Sigma^{-1} + \frac{1}{2} \Sigma^{-1} \left\{ \sum_{i=1}^n t(y_i; \mu)\, t^T(y_i; \mu) \right\} \Sigma^{-1}. \qquad (5.10)$$

Setting $\partial \ell / \partial \Sigma = 0$ and inserting the estimator $\hat{\mu} = \bar{y}$ gives a maximization problem similar to that for the multivariate normal, and we obtain (5.5). $\square$

5.4 $T^2$ in the multivariate proper dispersion model

In the general setup of the $T^2$-statistic, we consider the likelihood ratio test for the hypothesis $H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$, where $\mu_0$ is known and the dispersion matrix $\Sigma$ is unknown. For the discussion of the $T^2$-statistic, we consider two types of asymptotic theory, namely standard large-sample theory ($n \to \infty$) and small-dispersion asymptotics. But since we use the estimator of $\Sigma$ in the likelihood ratio test, we cannot let $\|\Sigma\| \to 0$. So we introduce a known weight matrix $A_i$ for each observation vector, such that $A_i$ is symmetric and regular, and consider $\|A_i\| \to \infty$ for each $i = 1, \ldots, n$.
The limit $\|A_i\| \to \infty$ is the theoretically simplest way of expressing small-dispersion asymptotics; in practice the results may be applied for $\|\Sigma\| \to 0$ or, more generally, $\|A_i^{-1/2} \Sigma A_i^{-1/2}\| \to 0$ for all $i$.

Consider the likelihood ratio test for the hypothesis $H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$ on the basis of the random sample $Y_1, \ldots, Y_n$ with distributions $Y_i \sim PD_p(\mu, A_i^{-1/2} \Sigma A_i^{-1/2})$, $i = 1, \ldots, n$, where $\Sigma$ is positive-definite, $n > p$ and $(\mu, \Sigma)$ is unknown. Let

$$\Theta = \{(\mu, \Sigma) : \mu_i \in \Omega,\ i = 1, \ldots, p,\ \Sigma \text{ positive-definite}\},$$
$$\Theta_0 = \{(\mu, \Sigma) : \mu = \mu_0,\ \Sigma \text{ positive-definite}\}.$$

We know from Theorem 3.6 that

$$\Sigma^{-1/2} A_i^{1/2}\, t(Y_i; \mu) \sim N_p(0, I_p) \quad \text{as } \|A_i\| \to \infty,$$

that is, $A_i^{1/2}\, t(Y_i; \mu) \sim N_p(0, \Sigma)$ as $\|A_i\| \to \infty$.

Lemma 5.1 Let $Y_1, \ldots, Y_n$ be a random sample from $PD_p(\mu, A_i^{-1/2} \Sigma A_i^{-1/2})$. Then, for $\|A_i\| \to \infty$, the maximum likelihood estimate of $\Sigma$ is approximated by

$$S \approx \frac{1}{n} \sum_{i=1}^n A_i^{1/2}\, t(y_i; \bar{y})\, t^T(y_i; \bar{y})\, A_i^{1/2}. \qquad (5.11)$$

Suppose $\mu$ is known and equal to $\mu_0$. Then, for $\|A_i\| \to \infty$, the maximum likelihood estimate of $\Sigma$ is given by

$$\hat{\Sigma}_0 \approx \frac{1}{n} \sum_{i=1}^n A_i^{1/2}\, t(y_i; \mu_0)\, t^T(y_i; \mu_0)\, A_i^{1/2}. \qquad (5.12)$$

Proof. The results (5.11) and (5.12) follow immediately from Theorem 5.1. $\square$

Lemma 5.2 Let $Z_i = A_i^{1/2}\, t(Y_i; \mu_0)$ and $S = \frac{1}{n-1} \sum_{i=1}^n A_i^{1/2}\, t(y_i; \bar{y})\, t^T(y_i; \bar{y})\, A_i^{1/2}$. Then $S$ and $\bar{Z}$ are asymptotically independent for $\|A_i\| \to \infty$ for all $i$, where $\bar{Z} = \frac{1}{n} \sum_{i=1}^n Z_i$, and furthermore

$$\bar{Z} \sim N_p\!\left(0, \frac{\Sigma}{n}\right) \qquad (5.13)$$

and

$$(n-1)\,S \sim W_{n-1}(\Sigma), \qquad (5.14)$$

where $W_{n-1}(\Sigma)$ is the Wishart distribution with $n-1$ degrees of freedom.

Proof. Since $Y_1, \ldots, Y_n$ are independent, so are $Z_1, \ldots, Z_n$. We have by Theorem 3.6

$$Z_i \sim N_p(0, \Sigma) \quad \text{as } \|A_i\| \to \infty, \qquad (5.15)$$

and hence

$$\bar{Z} \sim N_p\!\left(0, \frac{\Sigma}{n}\right) \quad \text{as } \|A_i\| \to \infty \text{ for all } i. \qquad (5.16)$$
Now, by applying a second-order Taylor series approximation to $Z_i$, we obtain

$$A_i^{1/2}\, t(Y_i; \mu_0) \approx A_i^{1/2} V^{-1/2}(\mu_0)\,(Y_i - \mu_0) = A_i^{1/2} \left\{ V^{-1/2}(\mu_0)\,(Y_i - \bar{Y}) + V^{-1/2}(\mu_0)\,(\bar{Y} - \mu_0) \right\} \approx A_i^{1/2}\, t(Y_i; \bar{Y}) + \frac{1}{n} \sum_{j=1}^n A_j^{1/2}\, t(Y_j; \mu_0).$$

Hence

$$A_i^{1/2}\, t(Y_i; \bar{Y}) \approx Z_i - \bar{Z}. \qquad (5.17)$$

Let $Z_{p \times n} = (Z_1, \ldots, Z_n)$ and $X = ZH$, where $H_{n \times n}$ is an orthogonal matrix whose $n$th column is $n^{-1/2} 1_{n \times 1}$, so that $X_n = n^{1/2} \bar{Z}$. Now

$$X \sim N_p(0, \Sigma \otimes I_n), \qquad (5.18)$$

so the columns of $X$ are independently normally distributed with the same covariance matrix $\Sigma$. Now let $X' = (X_1, \ldots, X_{n-1})$; then $X'$ and $X_n$ are asymptotically independent. Now consider

$$(n-1)\,S = \sum_{i=1}^n Z_i Z_i^T - n\, \bar{Z} \bar{Z}^T = Z Z^T - X_n X_n^T = X X^T - X_n X_n^T = \sum_{i=1}^{n-1} X_i X_i^T. \qquad (5.19)$$

Hence we have that $S$ and $\bar{Z}$ are asymptotically independent for $\|A_i\| \to \infty$ for all $i$. The result (5.14) now follows immediately from (5.18) and (5.19). $\square$

Note that $S$ is an asymptotically unbiased estimator of $\Sigma$ for $\|A_i\| \to \infty$ for all $i$.

We define the $T^2$-statistic for testing $H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$ with $\Sigma$ unknown by the following monotone transformation of the likelihood ratio $\Lambda$:

$$T^2 = (n-1)\left(\Lambda^{-2/n} - 1\right). \qquad (5.20)$$

$H_0$ is rejected for small values of $\Lambda$ or, equivalently, for large values of $T^2$. Note that in the multivariate normal case, Equation (5.20) is exactly Hotelling's $T^2$-statistic.

Theorem 5.2 Consider the test of the hypothesis $H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$ on the basis of the random sample $Y_1, \ldots, Y_n$ drawn from $PD_p(\mu, A_i^{-1/2} \Sigma A_i^{-1/2})$. The $T^2$ in (5.20) then follows approximately the Hotelling $T^2$-distribution. Furthermore, we can show that

$$\frac{n-p}{p(n-1)}\, T^2 \sim F_{p,\,n-p} \quad \text{as } \|A_i\| \to \infty \text{ for all } i. \qquad (5.21)$$

Proof.
It can easily be shown that the likelihood ratio criterion for the above hypothesis, under the multivariate normal approximation, is given by

$$\Lambda^{2/n} \approx \frac{|\hat{\Sigma}|}{|\hat{\Sigma} + (\hat{\Sigma}_0 - \hat{\Sigma})|} = \frac{|S|}{\left| S + \frac{n}{n-1}\, \bar{Z} \bar{Z}^T \right|},$$

since $\hat{\Sigma} = \frac{n-1}{n} S$ and $\hat{\Sigma}_0 \approx \hat{\Sigma} + \bar{Z} \bar{Z}^T$. Now, by taking

$$B = \begin{pmatrix} S & \bar{Z} \\ -\bar{Z}^T & 1 \end{pmatrix}$$

and evaluating $|B|$ both ways with the partitioned determinant result of Appendix A.1, we have $|S + \bar{Z} \bar{Z}^T| = |S|\,(1 + \bar{Z}^T S^{-1} \bar{Z})$. Hence

$$\Lambda^{2/n} \approx \left(1 + \frac{n}{n-1}\, \bar{Z}^T S^{-1} \bar{Z}\right)^{-1} = \left(1 + \frac{T^2}{n-1}\right)^{-1},$$

so that $T^2 = (n-1)(\Lambda^{-2/n} - 1)$, and we have

$$T^2 \approx n\, \bar{Z}^T S^{-1} \bar{Z}. \qquad (5.22)$$

Since $\sqrt{n}\, \bar{Z} \sim N_p(0, \Sigma)$, $(n-1)S \sim W_{n-1}(\Sigma)$, and $\bar{Z}$ and $S$ are asymptotically independent, by Definition 5.1 $T^2$ approximately follows the Hotelling $T^2$-distribution. Hence

$$\frac{n-p}{p(n-1)}\, T^2 \sim F_{p,\,n-p} \quad \text{as } \|A_i\| \to \infty \text{ for all } i.$$

Hence the theorem. $\square$

Theorem 5.3 For large samples, Equation (5.21) still holds; that is,

$$\frac{n-p}{p(n-1)}\, T^2 \sim F_{p,\,n-p} \quad \text{as } n \to \infty.$$

Proof. By large-sample theory, we have $v = -2 \log \Lambda \to \chi^2_p$ as $n \to \infty$. Using (5.20), we have

$$T^2 = (n-1)\left(e^{v/n} - 1\right) \approx \frac{n-1}{n}\, v \sim \frac{n-1}{n}\, \chi^2_p = \frac{p(n-1)}{n}\, F_{p,\,\infty} \approx \frac{p(n-1)}{n-p}\, F_{p,\,n-p} \quad \text{as } n \to \infty. \quad \square$$

[Figure 5.1: Histograms of diameter and height of the trees.]

5.4.1 Numerical study on the $T^2$-statistic

In this section, we investigate the density of the generalized $T^2$-statistic in two ways:

1. by using data;
2. by simulating random numbers from the bivariate gamma model.

By using data: The data set used in this analysis was obtained from Ryan, Joiner and Ryan (1985). The Trees data contain three variables: the diameter ($d$), height ($h$) and volume ($v$) of 31 black cherry trees in Allegheny National Forest, Pennsylvania. The variables are clearly strong candidates for a multivariate gamma model. We consider the first two variables in our density estimation problem. The histograms of these two variables are given in Figure 5.1.
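The small-dispersion estimators (5.4) and (5.5) of Theorem 5.1 can be sketched numerically. The snippet below is an illustration only: it uses the componentwise signed square root of the univariate gamma unit deviance as a stand-in for the thesis' residual vector $t(y;\mu)$ defined in Chapter 3, and simulated positive data in place of the actual tree measurements; both choices are assumptions, not the thesis' own computation.

```python
import math
import random

random.seed(1)

def t_resid(y, mu):
    # componentwise signed square-root gamma unit deviance,
    # d(y; mu) = 2*((y - mu)/mu - log(y/mu)); a stand-in assumption
    # for the t(y; mu) of Chapter 3
    out = []
    for yj, mj in zip(y, mu):
        dev = 2.0 * ((yj - mj) / mj - math.log(yj / mj))
        out.append(math.copysign(math.sqrt(max(dev, 0.0)), yj - mj))
    return out

# simulated stand-in for the 31 (diameter, height) pairs
data = [(random.gammavariate(200.0, 0.066), random.gammavariate(400.0, 0.19))
        for _ in range(31)]
n, p = len(data), 2

# (5.4): the position estimate is the sample mean vector
mu_hat = [sum(row[j] for row in data) / n for j in range(p)]

# (5.5): dispersion estimate (1/n) * sum_i t(y_i; mu_hat) t(y_i; mu_hat)^T
sigma_hat = [[0.0] * p for _ in range(p)]
for row in data:
    t = t_resid(row, mu_hat)
    for i in range(p):
        for j in range(p):
            sigma_hat[i][j] += t[i] * t[j] / n
```

For small dispersion, `sigma_hat` plays the role of $\hat{\Sigma}$ in the asymptotic estimates reported for the tree data.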
The asymptotic estimates of the position vector and dispersion matrix are

$$\hat{\mu} = \begin{pmatrix} 13.25 \\ 76.00 \end{pmatrix} \qquad \text{and} \qquad \hat{\Sigma} = \begin{pmatrix} 0.05 & 0.01 \\ 0.01 & 0.007 \end{pmatrix},$$

with $\|\hat{\Sigma}\| = 0.05$.

To estimate the density of $T^2$, we drew 1000 bootstrap samples from the data and calculated the values of both the classical $T^2$-statistic (5.2) and the generalized $T^2$-statistic (5.22). A non-parametric density estimate was obtained using a kernel smoother (with Silverman's bandwidth). The estimated density is compared with the true density, $F_{p,\,n-p} = F_{2,29}$. The plots are given in Figure 5.2.

From Figure 5.2, we observe that our generalized $T^2$-statistic behaves almost exactly the same as the classical $T^2$-statistic for this data set. The main reason may be that the dispersion matrix is small. Even though there is some discrepancy in the upper part of the distribution, the approximation is very accurate around the critical region; note that $F_{2,29,5\%} = 3.33$ and $F_{2,29,1\%} = 5.02$. Although our generalized $T^2$-statistic was obtained from asymptotic theory, we may use it when we have non-normal data, in this case data from a multivariate gamma.

By simulation study: One of the hardest parts of this thesis was to find a suitable method for generating random numbers from the multivariate gamma distribution, here the bivariate gamma distribution. The main problem in employing the well-known Markov chain Monte Carlo methods, such as Gibbs sampling and the Metropolis algorithm, is that it is hard to find the conditional distributions of the multivariate gamma. Neal (1997) proposed a Markov chain Monte Carlo method based on 'slicing' the density function. The method does not require the whole density function; it requires only a function that is proportional to the density. Therefore, it is feasible to use this method for the multivariate gamma, where the normalizing constant is unknown.
The idea is as follows. Suppose we need to sample from a distribution over some subset of $\mathbb{R}^n$, with density function proportional to some function $f(x)$. One simple idea is to sample uniformly from the $(n+1)$-dimensional region that lies under the plot of $f(x)$. This idea can be formalized by introducing an auxiliary real variable $y$ and defining a joint distribution over $x$ and $y$ that is uniform over the region $U = \{(x, y) : 0 < y < f(x)\}$ below the curve defined by $f(x)$.

[Figure 5.2: Classical $T^2$-statistic.]

The algorithm for a single variable $x$ is briefly as follows. We sample alternately from the conditional distribution for $y$ given the current $x$, which is uniform over the interval $(0, f(x))$, and from the conditional distribution for $x$ given $y$, which is uniform over the region $S = \{x : y < f(x)\}$, called the 'slice' defined by $y$. For multiple variables, each real-valued variable can be updated in turn by the single-variable slice sampling procedure.

An S-Plus program interfaced with a C routine was used to generate the random numbers from $Ga_2(\mu, \Sigma)$, with dispersion matrix entries 0.04 and 0.015 and $\|\Sigma\| = 0.1$.

Five hundred samples, each of 50 random numbers, were generated; the classical $T^2$-statistic (5.2) and the generalized $T^2$-statistic (5.22) were calculated, and the corresponding estimated densities were compared with the true density, $F_{2,48}$ (see Figure 5.3).

From Figure 5.3, we observe that the generalized $T^2$-statistic behaves a little better than the classical one: the discrepancy between the estimated and the true densities is larger in the classical case than in the generalized case. Although there is a discrepancy between the estimated and the true densities in the upper part of the plot, there is close agreement in the tail area in both cases. That is, the approximation is very close around the critical region, which is an important observation because we are interested in hypothesis testing.
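The single-variable slice update described above can be sketched as follows. This is an illustrative reimplementation of Neal's (1997) stepping-out-and-shrinkage scheme, not the thesis' S-Plus/C code; a univariate $Ga(3,1)$ target with its normalizing constant deliberately dropped stands in for the bivariate gamma density, whose exact unnormalized form is thesis-specific.

```python
import math
import random

random.seed(2)

def slice_sample(logf, x0, w=1.0, n_draws=5000):
    """One-variable slice sampler: draw y | x uniform on (0, f(x)) (kept on
    the log scale), step out an interval of width w until it brackets the
    slice {x : y < f(x)}, then shrink until a uniform proposal lands inside."""
    x, draws = x0, []
    for _ in range(n_draws):
        logy = logf(x) + math.log(random.random())   # log of y ~ U(0, f(x))
        lo = x - w * random.random()                 # randomly placed interval
        hi = lo + w
        while logf(lo) > logy:                       # step out to the left
            lo -= w
        while logf(hi) > logy:                       # step out to the right
            hi += w
        while True:                                  # shrinkage loop
            x1 = random.uniform(lo, hi)
            if logf(x1) > logy:
                x = x1
                break
            if x1 < x:
                lo = x1
            else:
                hi = x1
        draws.append(x)
    return draws

def logf(x):
    # log of x^2 * exp(-x) for x > 0: Ga(3, 1) without its constant
    return 2.0 * math.log(x) - x if x > 0 else -math.inf

xs = slice_sample(logf, x0=3.0)
mean = sum(xs) / len(xs)   # close to the true mean of Ga(3, 1), which is 3
```

In the bivariate case, each coordinate of $(y_1, y_2)$ would be updated in turn with `logf` set to the log of the unnormalized $Ga_2(\mu, \Sigma)$ density, which is exactly the situation in which the unknown normalizing constant causes no difficulty.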
Note that $F_{2,48,5\%} = 3.19$ and $F_{2,48,1\%} = 5.08$.

[Figure 5.3: Classical $T^2$-statistic.]

Discussion: In both the simulation study and the data analysis, we observed that the generalized $T^2$-statistic behaves almost the same as the classical $T^2$-statistic. The following question then arises: what is the usefulness of the generalized $T^2$-statistic? One answer is that in most situations gamma data behave almost like normal data; that is why we could not observe a marked difference between the generalized $T^2$-statistic (gamma case) and the classical $T^2$-statistic (normal case, of course). Another answer may be that we simulated data from a gamma with a small dispersion matrix, and in the case of small dispersion the multivariate proper dispersion model is very similar to the multivariate normal model.

Most of the results in non-normal data analysis have been developed using asymptotic theory. But this does not mean that we have to apply the results only when we have small dispersion or a large sample. As we do in the generalized linear model, we can apply them in any situation. The main point here is that we feel quite comfortable and satisfied when we analyse non-normal data with the relevant non-normal distribution rather than with the normal distribution.

As a final note on non-normal data analysis, we quote the following from Jørgensen (1997b): "While the normal distribution is useful for many types of data, the normal distribution is the exception, rather than the rule, except for data with small dispersion. There is an importance of describing data in their natural habitat. Analysis of non-normal data should hence take into account the actual form of the sample space for each type of data."

Appendix A  Some results on matrices

Here we give basic results on matrices. For more details see, e.g.,
Dwyer (1962), Johnson and Wichern (1992) and Rao (1985).

A.1 Determinant

1. Let $A: p \times p$, $B: p \times q$, $C: q \times p$, $D: q \times q$. Then

$$\begin{vmatrix} A & B \\ C & D \end{vmatrix} = \begin{cases} |A|\,|D - C A^{-1} B| & \text{if } A \text{ is nonsingular}, \\ |D|\,|A - B D^{-1} C| & \text{if } D \text{ is nonsingular}. \end{cases}$$

2. Let $A: p \times q$, $B: q \times p$. Then $|I_p + AB| = |I_q + BA|$.

A.2 Inverse

1. Let $A$ be an $n \times n$ nonsingular matrix that is partitioned as

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},$$

where $A_{ij}: n_i \times n_j$, $i, j = 1, 2$ and $n_1 + n_2 = n$. Then $(A_{11} - A_{12} A_{22}^{-1} A_{21})^{-1}$ is the upper left-hand block of $A^{-1}$.

A.3 Kronecker product (direct product)

Definition A.1 Let $A = (a_{ij})$ be an $m \times n$ matrix and $B = (b_{ij})$ a $p \times q$ matrix. Then the $(mp) \times (nq)$ matrix

$$\begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & & & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{pmatrix}$$

is called the Kronecker product, or direct product, of $A$ and $B$, and is denoted by $A \otimes B$.

Some elementary properties:

1. $(cA) \otimes B = A \otimes (cB)$;
2. $(A \otimes B) \otimes C = A \otimes (B \otimes C)$;
3. $(A \otimes B)^T = A^T \otimes B^T$;
4. $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$ for nonsingular $A$ and $B$;
5. $(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$;
6. $|A \otimes B| = |B \otimes A| = |A|^p |B|^m$ for $A: m \times m$ and $B: p \times p$.

Definition A.2 For a given $A: p \times q$, we denote by $\operatorname{Vec} A$ the vector obtained by writing the columns of $A$ one below the other, starting with the first.

A.4 Matrix derivatives and Jacobians

Definition A.3 For a scalar function $f$ of a matrix variable $X = (x_{ij}): m \times n$ we define its matrix derivative by

$$\frac{\partial f}{\partial X} = \left( \frac{\partial f}{\partial x_{ij}} \right) : (m \times n).$$

When $n = 1$, $X$ is a column vector $x$ and the corresponding vector $\partial f / \partial x$ is called the vector derivative.

Table A.1: Vector derivatives ($x: p \times 1$)

  $f$                                   $\partial f / \partial x$
  $a^T x$ ($a$ constant)                $a$
  $x^T x$                               $2x$
  $x^T A x$, $A: p \times p$            $(A + A^T)\,x$
  $x^T A x$ ($A = A^T$)                 $2Ax$

Matrix derivatives:

- Let $w = w(Y)$ be a scalar-valued function of a matrix $Y$ which is itself a function of a matrix $X$. Then, by the chain rule,

$$\frac{\partial w}{\partial x_{ij}} = \sum_{k,l} \frac{\partial w}{\partial y_{kl}}\, \frac{\partial y_{kl}}{\partial x_{ij}}.$$

- The derivative of an element of $C^T X D$ with respect to the matrix $X$ is

$$\frac{\partial (C^T X D)_{ij}}{\partial X} = C K D^T,$$

where $C^T X D$ is a $p \times q$ matrix and $K$ is the $p \times q$ matrix with zero in every position except $k_{ij} = 1$.

- Let $U(X): p \times q$ and $V(X): q \times r$ be matrix functions of a matrix $X$. Then the Vec derivative of $U(X)V(X)$ with respect to $X$ is given by

$$\frac{\partial \operatorname{Vec}(UV)}{\partial \operatorname{Vec}^T(X)} = (V^T \otimes I_p)\, \frac{\partial \operatorname{Vec}(U)}{\partial \operatorname{Vec}^T(X)} + (I_r \otimes U)\, \frac{\partial \operatorname{Vec}(V)}{\partial \operatorname{Vec}^T(X)}.$$

Bibliography

[1] Barndorff-Nielsen, O.E. (1988). Parametric Statistical Models and Likelihood. Lecture Notes in Statistics 50. New York: Springer-Verlag.

[2] Barndorff-Nielsen, O.E. and Cox, D.R. (1989). Asymptotic Techniques for Use in Statistics. New York: Chapman and Hall.

[3] Barndorff-Nielsen, O.E. and Cox, D.R. (1994). Inference and Asymptotics. New York: Chapman and Hall.

[4] Barndorff-Nielsen, O.E. and Jørgensen, B. (1991). "Some parametric models on the simplex", J. Multivariate Anal., 39, 106-116.

[5] Chambers, J.M. (1977). Computational Methods for Data Analysis. Wiley Series in Probability and Mathematical Statistics. New York: John Wiley and Sons.

[6] Dwyer, P.S. (1962). "Some applications of matrix derivatives in multivariate analysis", Journal of the American Statistical Association, 62, 607-625.

[7] Everitt, B.S. (1987). Introduction to Optimization Methods and their Application in Statistics. New York: Chapman and Hall.

[8] Hotelling, H. (1931). "The generalization of Student's ratio", Ann. Math. Statist., 2, 360-378.

[9] Hutchinson, T.P. and Lai, C.D. (1990). Continuous Bivariate Distributions, Emphasising Applications. South Australia: Rumsby Scientific Publishing.

[10] Johnson, R.A. and Wichern, D.W. (1992).
Applied Multivariate Statistical Analysis, Third edition. New Jersey: Prentice Hall Inc.

[11] Johnson, N.L. and Kotz, S. (1972). Distributions in Statistics: Continuous Multivariate Distributions. Wiley Series in Probability and Mathematical Statistics. New York: John Wiley and Sons.

[12] Jørgensen, B. (1987a). "Small dispersion asymptotics", Brazilian Journal of Probability and Statistics, 1, 59-90.

[13] Jørgensen, B. (1987b). "Exponential dispersion models", Journal of the Royal Statistical Society, B49, 127-162.

[14] Jørgensen, B. (1997a). "Proper dispersion models" (with discussion), Brazilian Journal of Probability and Statistics, 11, 89-140.

[15] Jørgensen, B. (1997b). The Theory of Dispersion Models. New York: Chapman & Hall.

[16] Jørgensen, B. and Lauritzen, S.L. (1998). "Constructing multivariate proper dispersion models", Research Report 2, Department of Statistics and Demography, Odense University, Denmark.

[17] Krishnaiah, P.R. (1985). "Multivariate gamma distributions". In Encyclopedia of Statistical Sciences, Vol. 6 (eds S. Kotz, C.B. Read and D.L. Banks), 63-66. New York: Wiley.

[18] Krishnamoorthy, A.S. and Parthasarathy, M. (1951). Ann. Math. Statist., 22, 549; erratum 31, 229.

[19] Siotani, M., Hayakawa, T. and Fujikoshi, Y. (1985). Modern Multivariate Statistical Analysis: A Graduate Course and Handbook. Ohio: American Sciences Press, Inc.

[20] McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models, Second edition. London: Chapman & Hall.

[21] Neal, R.M. (1997). "Markov chain Monte Carlo methods based on 'slicing' the density function", Technical Report No. 9722, Department of Statistics, University of Toronto, Canada.

[22] Nelder, J.A. and Wedderburn, R.W.M. (1972). "Generalized linear models", J. Roy. Statist. Soc. Ser. A, 135, 370-384.

[23] Neyman, J. (1926). "Further notes on non-linear regression", Biometrika, 18, 257-262.

[24] Pearson, K. (1923).
"Notes on skew frequency surfaces", Biometrika, 15, 222-230.

[25] Rao, C.R. (1985). "Matrix derivatives: applications in statistics". In Encyclopedia of Statistical Sciences, Vol. 6 (eds S. Kotz, C.B. Read and D.L. Banks), 320-325. New York: Wiley.

[26] Rogers, G.S. (1980). Matrix Derivatives, Volume 2. New York: Marcel Dekker Inc.

[27] Ryan, B.F., Joiner, B.L. and Ryan, T.A. (1985). Minitab Handbook, Second edition. Boston: Duxbury Press.

[28] Song, P.X. (1996). "Some statistical models for the multivariate analysis of longitudinal data", Ph.D. thesis, Department of Statistics, University of British Columbia.

[29] Tweedie, M.C.K. (1947). "Functions of a statistical variate with given mean, with special reference to Laplacian distributions", Proc. Cambridge Phil. Soc., 49, 41-49.

[30] Wijsman, R.A. (1957). "Random orthogonal transformations and their use in some classical distribution problems in multivariate analysis", Ann. Math. Statist., 28, 415-423.

[31] Yule, G.U. (1897). "On the significance of Bravais' formulae for regression in the case of skew correlation", Proceedings of the Royal Society of London, 60, 477-489.
