THE DISTRIBUTION OF THE EXTREME MAHALANOBIS' DISTANCE FROM THE SAMPLE MEAN

by

Yvonne Germaine Marie Ghislaine Cuttle

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ARTS in the Department of MATHEMATICS

We accept this thesis as conforming to the standard required from candidates for the degree of MASTER OF ARTS.

Members of the Department of MATHEMATICS

THE UNIVERSITY OF BRITISH COLUMBIA
April, 1956

ABSTRACT

The problem of classification in multivariate analysis is considered. The distribution of the extreme Mahalanobis' distance from the sample mean has been derived for a special case of the bivariate problem, and for this special case the cumulative distribution has been partially tabulated. The characteristic function of the joint distribution of the Mahalanobis' distances from the sample mean has also been derived. A brief discussion of the one-dimensional problem and its solution has been included.

ACKNOWLEDGEMENT

We are deeply indebted to Dr. Stanley W. Nash of the Mathematics Department at the University of British Columbia for suggesting the topic and rendering invaluable assistance in the development of this thesis. Our further thanks are due to Dr. S. A. Jennings of the Department of Mathematics for his helpful suggestions. We are also pleased to acknowledge the support of the National Research Council of Canada which made this research possible.

TABLE OF CONTENTS

INTRODUCTION
CHAPTER ONE: THE ONE-DIMENSIONAL CASE AND ITS SOLUTION
  1.1 Notation
  1.2 Multiple Range Tests
      A. Student-Newman-Keuls test
      B. Other Multiple Range tests
  1.3 Multiple F Tests
      A. Duncan's multiple F test
      B. Scheffé's test
      C. Tukey's Gap-Straggler and variance test
CHAPTER TWO: THE MULTI-DIMENSIONAL CASE
  2.1 Definitions and Fundamental Assumptions
  2.2 A possible approach to the multi-dimensional problem
      A. Generalization of Tukey's method
      B. Generalization of Nair's approach to the distribution of the extreme deviate from the sample mean
  2.3 Two methods for obtaining the joint distribution of the generalized distances of the groups from their observed centroid
      A. Introduction of additional variables
      B. The Characteristic function of the distribution
CHAPTER THREE: A SPECIAL CASE AND ITS SOLUTION
  3.1 The joint distribution of the generalized distances of the groups from their observed centroid
  3.2 The distribution of the extreme deviate from the centroid
  3.3 Construction of the table - Table
CONCLUSION
BIBLIOGRAPHY

INTRODUCTION

In Anthropology, Biology and other sciences the following problem often occurs: Given $k$ groups of objects $G_1, G_2, \ldots, G_k$ of which samples of sizes $n_1, n_2, \ldots, n_k$ are taken, $p$ normally distributed characters being measured on these objects, determine on the basis of these data

(1) whether the groups $G_1, G_2, \ldots, G_k$ all belong to the same population;
(2) if all the groups do not belong to the same population, which groups belong together to form clusters and which are from different populations.

The first part of this problem is solved in the most general case, and several methods are available to answer the second part of this problem in the special case $p = 1$. For $p > 1$, the second part of this problem has not been solved satisfactorily and certainly not rigorously, although a subjective method of attack has been advocated by K. D. Tocher and is presented in ref. 1 p. 363. We have attempted in this paper to give a more rigorous approach to the second part of the problem and have succeeded in solving a special case.

CHAPTER ONE

THE ONE-DIMENSIONAL CASE AND ITS SOLUTION

1.1 Notation

Call $G_1, G_2, \ldots, G_k$ the groups or treatments about which we want to test some hypothesis; let $n_1, n_2, \ldots, n_k$ be the sample sizes of the groups, and let $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ be the sample means of a normally distributed character measured on the objects in the groups. Further, let $\bar{x}$ be the grand mean of the measurements, $s^2$ be an independent estimate of the variance of the measurements and $s_{\bar{x}}^2 = s^2/n$ the estimate of the variance of a mean $\bar{x}_i$ where $n_i = n$.

The problem consists of deciding on the basis of the above information which groups or treatments are significantly different. Several tests are available to solve this problem. They can be roughly classified as "Multiple Range Tests" and "Multiple F Tests". A brief discussion of these tests for the purpose of illustrating their nature is given below. A more detailed exposé and illustrations of the various tests can be found in ref. 2 pp. 18-45.

We have attempted in Chapter 2 and Chapter 3 to generalize, to some extent, one of the one-dimensional tests to the multi-dimensional problem. Tukey's multiple F test appeared to be the one which would most easily carry over to the general case, and it is with that test in mind that we approached the problem.

1.2 Multiple Range Tests

In what follows we can assume without loss of generality that the means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ have been ranked, $\bar{x}_1$ being the smallest mean and $\bar{x}_k$ the largest mean.

A. Student-Newman-Keuls test (ref. 3)

The Studentized range

$q = \dfrac{\bar{x}_{\max} - \bar{x}_{\min}}{s_{\bar{x}}} = \dfrac{\text{range}}{\text{standard deviation}}$

is considered. The distribution of $q_{(\alpha, n)}$, where $\alpha$ is the level of significance and $n$ the number of degrees of freedom associated with $s_{\bar{x}}$, has been tabulated by J. M. May for various values of $\alpha$. The tables can also be found in ref. 2 pp. 22-23.
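As an illustration only, the Studentized range defined above can be evaluated as in the following Python sketch. The function name `studentized_range`, the assumption of a common sample size `n` for all groups, and the numerical data are ours and do not come from the references; the critical value against which $q$ is compared must still be read from the published tables.

```python
import math


def studentized_range(means, s2, n):
    """Return q = (largest mean - smallest mean) / s_xbar.

    means : list of group means
    s2    : independent estimate of the variance of a single measurement
    n     : number of observations per group (assumed equal for all groups)
    """
    s_xbar = math.sqrt(s2 / n)  # estimated standard deviation of a group mean
    return (max(means) - min(means)) / s_xbar


# Hypothetical data: three group means, error variance 4, four observations per group.
q = studentized_range([10.0, 12.0, 15.0], s2=4.0, n=4)
print(q)  # 5.0, since s_xbar = sqrt(4 / 4) = 1 and the range is 15 - 10
```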
The test suggested by Newman is as follows (in this test $n$ denotes the number of means being compared):

Step 1: Choose a level of significance $\alpha$ (usually .05 or .1).

Step 2: Compute $W_p$, $p = 2, \ldots, n$, the product of $s_{\bar{x}}$ and the tabulated value of the Studentized range for $p$ means at level $\alpha$.

Step 3: Compare $\bar{x}_n - \bar{x}_1$ with $W_n$. If $\bar{x}_n - \bar{x}_1$ is less than $W_n$, the process terminates and we assert that the groups belong to the same population at a level of significance $\alpha$. If $\bar{x}_n - \bar{x}_1 > W_n$ we state that $\bar{x}_n$ is different from $\bar{x}_1$, or that the corresponding groups $G_n$ and $G_1$ are significantly different. We then proceed to compare $\bar{x}_{n-1} - \bar{x}_1$ and $\bar{x}_n - \bar{x}_2$ with $W_{n-1}$. If both $\bar{x}_{n-1} - \bar{x}_1$ and $\bar{x}_n - \bar{x}_2$ are less than $W_{n-1}$ the process terminates. If, say, $\bar{x}_n - \bar{x}_2 > W_{n-1}$, we state that $\bar{x}_n$ is different from $\bar{x}_2$ (or $G_n$ different from $G_2$) and proceed to compare $\bar{x}_n - \bar{x}_3$ and $\bar{x}_{n-1} - \bar{x}_2$ with $W_{n-2}$. This process continues until the actual ranges of subsets of $i$ means do not exceed $W_i$.

Note that it is not necessary to compare a subset of means which is contained in a larger subset, the range of which is less than the calculated $W$. We could therefore have dispensed with the comparison of $\bar{x}_{n-1} - \bar{x}_2$ and $W_{n-2}$, since the subset $(\bar{x}_{n-1}, \bar{x}_{n-2}, \ldots, \bar{x}_2)$ is contained in $(\bar{x}_{n-1}, \bar{x}_{n-2}, \ldots, \bar{x}_2, \bar{x}_1)$, the range of which was found to be less than $W_{n-1}$. This is one of the easiest tests to perform.

B. Other multiple range tests

Duncan (ref. 5) suggested table values somewhat different from $q_{(\alpha, n)}$. The use of Duncan's procedure tends to decrease the number of type II errors. Other variations of this test were proposed by Tukey (ref. 6), the use of which would decrease the number of type I errors.

1.3 Multiple F tests

Multiple F tests combine the use of ranges with variance ratios.

A. Duncan's multiple F test (cf. ref. 5 and 7)

The first stage of Duncan's procedure is to perform a multiple range test as was done above, using instead of $q_{(\alpha, n)}$ somewhat different tabular values (ref. 7 or 2). Once the multiple range test has been performed, calculate $ss_p$ for $p = 2, 3, \ldots, n$ [formula illegible in source], which gives the sum of squares significant at level $\alpha$, obtained from the least significant range.

Suppose $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_r)$ is a group of ranked means for which the multiple range test has failed to show any heterogeneity. The second stage of the test consists of applying the following rule: $\bar{x}_j - \bar{x}_i$ is significant if $\bar{x}_j - \bar{x}_i > R_2$ and if the sums of squares of all combinations of means out of $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_r$ including $\bar{x}_i$ and $\bar{x}_j$ exceed $ss_p$, $p$ being the number of means in the combination.

Let the sum of squares among the $r$ means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_r$ be $ss_{x,r}$. Duncan showed that $\bar{x}_r - \bar{x}_1 >$ [symbol illegible in source] together with $ss_{x,r} > ss_r$ implies that the sum of squares among $m$ means or less out of the $r$ means exceeds the corresponding $ss_m$, so that in most cases it is not necessary to calculate the sum of squares for all possible combinations.

B. Scheffé's test (ref. 8)

In addition to being applicable in testing the difference between two means, Scheffé's test may be used to judge all comparisons of the form

$a_1 \bar{x}_1 + a_2 \bar{x}_2 + \ldots + a_n \bar{x}_n$

where the $a$'s are constants with the condition $\sum_{i=1}^{n} a_i = 0$. The standard error of the comparison is

$s_c = s_{\bar{x}} \sqrt{a_1^2 + \ldots + a_n^2}$.

Define

$S = \sqrt{(n-1)\, F_{\alpha}(n-1, f)}$

where $n$ is the number of means and $f$ is the number of degrees of freedom of the error variance. Scheffé proves that the value of the comparison is significant at the $\alpha$ level if

$|a_1 \bar{x}_1 + a_2 \bar{x}_2 + \ldots + a_n \bar{x}_n| > S\, s_c$.

This test has a larger type II error than Duncan's test, but it has a smaller type I error.

C. Tukey's Gap-Straggler and variance test (ref. 9)

Rather than considering the range of a group of, say, $k$ means and comparing it to the tabulated values of the Studentized range, Tukey considers the extreme deviate, say $\bar{x}'$, from the grand mean $m$ of the group of $k$ means. He shows empirically that, $n$ being the number of d.f. associated with $s_{\bar{x}}$,

(1)  $\dfrac{\dfrac{|\bar{x}' - m|}{s_{\bar{x}}} - \dfrac{6}{5}\log_{10} k}{3\left(\dfrac{1}{4} + \dfrac{1}{n}\right)}$  (for $k > 3$)

and

(2)  $\dfrac{\dfrac{|\bar{x}' - m|}{s_{\bar{x}}} - \dfrac{1}{2}}{3\left(\dfrac{1}{4} + \dfrac{1}{n}\right)}$  (for $k = 3$)

are distributed approximately as normal deviates. The exact distribution of an extreme deviate from the sample mean has been given by K. R. Nair (ref. 10).

Tukey's test as given in ref. 9:

Step 1: Choose a level of significance $\alpha$.

Step 2: Calculate the difference which would have been significant if there were only two means, i.e. $\sqrt{2}\, s_{\bar{x}}\, t_{(\alpha, n)}$, where $t$ is Student's $t$ with $n$ d.f.

Step 3: Arrange the means in order of magnitude and consider any gap wider than $\sqrt{2}\, s_{\bar{x}}\, t_{(\alpha, n)}$ as a group boundary. If no group contains more than two means the process terminates.

Step 4: In each group of three or more find the grand mean $m$ and the most straggling mean $\bar{x}'$, and compute the value (1) or (2) as the case may be. Separate any straggling mean for which this is significant at the two-sided significance level $\alpha$ for the normal distribution.

Step 5: If step 4 changes any group, repeat the process until no further means are separated. The means separated off from one side of a group form a new group. If any of the new groups so formed contains three or more means, apply steps 4 and 5 to this new group.

Step 6: Calculate the sum of squares of deviations from the group mean and the corresponding mean square for each group of three or more means resulting from step 5. Using $s_{\bar{x}}^2$ as denominator, calculate the variance ratio and apply the F test. If the ratio is found significant we assert that there is an overall difference among the means of that group.

CHAPTER TWO

THE MULTI-DIMENSIONAL CASE

2.1 Definitions and fundamental assumptions

Suppose we have $k$ groups (of objects) $G_1, G_2, \ldots, G_k$ of which samples of size $n_1, n_2, \ldots, n_k$ are taken; $p$ normally distributed characters are measured on these objects. We denote the sample means of the characters by

$\bar{x}_{11}, \ldots, \bar{x}_{1k}$
$\ldots$
$\bar{x}_{p1}, \ldots, \bar{x}_{pk}$

Throughout this paper we will assume that the covariance matrix $(\sigma_{ij})$ of these measurements is known or estimated from a large number of degrees of freedom. We denote the inverse of this matrix by $(\sigma^{ij})$. We will make extensive use of the following statistic

$V = \sum_{i=1}^{p} \sum_{j=1}^{p} \sum_{r=1}^{k} n_r\, \sigma^{ij} (\bar{x}_{ir} - \bar{x}_i)(\bar{x}_{jr} - \bar{x}_j)$

where

$\bar{x}_i = \dfrac{1}{n} \sum_{r=1}^{k} n_r \bar{x}_{ir}$ and $n = \sum_{r=1}^{k} n_r$.

If we let $k = 2$, we get, after some manipulation,

$V = \dfrac{n_1 n_2}{n_1 + n_2} \sum_{i=1}^{p} \sum_{j=1}^{p} \sigma^{ij} (\bar{x}_{i1} - \bar{x}_{i2})(\bar{x}_{j1} - \bar{x}_{j2})$

or

$V = \dfrac{n_1 n_2}{n_1 + n_2}\, D^2$.

$D^2$ is known as the Mahalanobis distance. $D^2$ is, to some extent, a measure of the distance between two groups, and $V$ is a generalization of Mahalanobis' $D^2$. Under the null hypothesis, $\frac{n_1 n_2}{n_1 + n_2} D^2$ was shown to be distributed as $\chi^2$ with $p$ degrees of freedom, and $V$ as $\chi^2$ with $p(k-1)$ degrees of freedom.

2.2 A possible approach to the multi-dimensional problem

A. Generalization of Tukey's method

The statistic $V$ can be used to test the null hypothesis that the groups belong to the same multivariate normal population as follows: if the observed value of $V$ is larger than the tabulated $\chi^2$ with $p(k-1)$ d.f. at the $\alpha$ level of significance, we reject the null hypothesis and assert that the groups do not all belong to the same population. We are then left with the problem of classifying the groups into clusters of groups belonging to the same population.

As stated in Chapter 1, we will try to generalize Tukey's method and more particularly step 4 of his procedure. Tukey uses in his test the extreme deviate from the grand mean, the exact distribution of which is given by K. R. Nair. We will use the extreme generalized distance from the centroid of all the groups, and our problem will then be to find the distribution of such a distance.

B. Generalization of Nair's approach to the distribution of the extreme deviate from the sample mean

I Nair's distribution

We will give here only a short account of Nair's work (ref. 10). In order to find the distribution of the extreme deviate among the ordered normal $N(0,1)$ variates $x_1, \ldots, x_n$ from their mean $\bar{x}$, Nair writes down the joint distribution of the $x$'s

[equation illegible in source]

By a suitable orthogonal transformation this reduces to

[equation illegible in source]

and integrating out $\bar{x}$ he gets

[equation illegible in source]

where it may be shown that [relation illegible in source], $u$ being the extreme deviate from the mean $\bar{x}$. The distribution of $u$ may then be obtained by integrating out $z_1, \ldots, z_{n-2}$. Finally the distribution of $u$ can be written

[equation illegible in source]

II Generalization of Nair's approach

Under the null hypothesis that all the groups belong to the same population, all measurements have the same multivariate normal distribution. The means $\bar{x}_{11}, \ldots, \bar{x}_{pk}$ from different groups are independent, but in general the observations from different characters are correlated. Since we know the covariance matrix $(\sigma_{ij})$ we can replace the observations $(x_1, \ldots, x_p)$ by linear combinations $(y_1, \ldots, y_p)$ which are uncorrelated. The covariance matrix of the $y$'s is a diagonal matrix denoted by $(\lambda_i)$. Moreover we can assume without loss of generality that the true centroid of the distribution is $\mu_1 = 0, \mu_2 = 0, \ldots, \mu_p = 0$.

The joint distribution of the $\bar{y}$'s is then

[equation illegible in source]

Note that

$V_r = n_r \sum_{i=1}^{p} \dfrac{(\bar{y}_{ir} - \bar{y}_i)^2}{\lambda_i}$

is the Mahalanobis distance between the $r$-th group and the observed centroid of all the groups. The largest of these $V_r$'s is thus the extreme distance the distribution of which we want to find. We can write

[equation illegible in source]

Consider now the following orthogonal transformation

[equation illegible in source]

The inverse transformation is

[equation illegible in source]

and we note that [expression illegible in source] is independent of the $u$'s. The distribution of the $u$'s and $v$'s is then

[equation illegible in source]

Integrating out the $u$'s from $-\infty$ to $+\infty$ we get

[equation illegible in source]

The problem is now to find the joint distribution of the $V_r$.

2.3 Two methods for obtaining the joint distribution of the generalized distances of the groups from their observed centroid

A. Introduction of additional variables

Considering the distribution obtained in 2.2, where the $V_r$'s are functions of the $v$'s, it will be noticed that we have $p(k-1)$ $v$'s but only $k$ $V_r$'s. A change of variables from the $v$'s to the $V_r$'s as they stand is therefore not possible. A device sometimes used under such circumstances is to introduce additional $V_r$'s which are functions of the $v$'s and integrate them out later on. We would then be left with the desired joint distribution of the $V_r$'s.

This method proves successful in the special case $p = 2$, $k = 3$ which is considered in Chapter 3. In other cases the integrations of the additional variables could not be performed. Numerical integration is obviously not applicable here.

B. The characteristic function of the distribution

The joint characteristic function of the functions $V_1, \ldots, V_k$ of the variables $v_1, \ldots, v_{p(k-1)}$ is defined to be the expected value of $\exp\left[i \sum_{r=1}^{k} t_r V_r\right]$, that is

$\varphi(t_1, \ldots, t_k) = C_1 \int \cdots \int \exp\left[i \sum_{r=1}^{k} t_r V_r\right] h(v_1, \ldots, v_{p(k-1)})\, dv_1 \cdots dv_{p(k-1)}$,

where $C_1$ and $h(v_1, \ldots, v_{p(k-1)})$ are given in the preceding section. If this characteristic function turns out to be a known Fourier transform, the joint distribution of $V_1, \ldots, V_k$ will be the inverse of this transform.
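The statistics of Section 2.1 and the distances $V_r$ of Section 2.2 can be sketched numerically as follows. This is a minimal illustration in Python, not part of the original computations: the function name `centroid_distances` and the data are hypothetical, and the sketch works directly with the inverse covariance matrix $(\sigma^{ij})$ rather than with uncorrelated variates, which gives the same quadratic forms.

```python
def centroid_distances(means, sizes, sigma_inv):
    """Return ([V_1, ..., V_k], V) for k groups measured on p characters.

    means[r][i]     : sample mean of character i in group r
    sizes[r]        : sample size n_r of group r
    sigma_inv[i][j] : element of the inverse covariance matrix
    """
    k, p = len(means), len(means[0])
    n = sum(sizes)
    # Observed centroid: weighted mean of the group means.
    centroid = [sum(sizes[r] * means[r][i] for r in range(k)) / n for i in range(p)]
    v = []
    for r in range(k):
        d = [means[r][i] - centroid[i] for i in range(p)]
        quad = sum(sigma_inv[i][j] * d[i] * d[j] for i in range(p) for j in range(p))
        v.append(sizes[r] * quad)  # generalized distance of group r from the centroid
    return v, sum(v)


# Hypothetical two-group example with unit covariance matrix.
identity = [[1.0, 0.0], [0.0, 1.0]]
v_r, v_total = centroid_distances([[0.0, 0.0], [1.0, 2.0]], [3, 5], identity)
print(v_r, v_total)
```

With $k = 2$ the total reproduces the relation $V = \frac{n_1 n_2}{n_1 + n_2} D^2$: here $D^2 = 1^2 + 2^2 = 5$ and $\frac{3 \cdot 5}{8} \cdot 5 = 9.375$, which is the value of `v_total`.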
We consider first the special case $k = 3$, $p = 2$ and $n_1 = n_2 = n_3 = n_0$. The characteristic function is then

[equation illegible in source]

Expressing the $V$'s in terms of the $v$'s, $\varphi(t_1, t_2, t_3)$ becomes

[equation illegible in source]

These two double integrations are quite similar except for minor changes in the constants. Omitting all factors not involving $v_{11}$, the integration with respect to $v_{11}$ reduces to

[equation illegible in source]

which after some manipulations yields

[equation illegible in source]

Integrating with respect to $v_{12}$ we get

[equation illegible in source]

This in turn yields

[equation illegible in source]

The first double integration gives us

[equation illegible in source]

The second double integration is performed similarly and gives an analogous expression with $\lambda_1$ replaced by $\lambda_2$. Finally the characteristic function is found to be

[equation illegible in source]

This expression could be simplified, but for the purpose of generalization it is convenient to leave it in this form. This function generalizes readily to 3 groups of $p$ characters, giving

[equation illegible in source]

In a similar fashion the special cases $k = 4$ and $k = 5$ were worked out, and a pattern was observed which enabled us to write the characteristic function $\varphi(t_1, t_2, \ldots, t_k)$ as follows

[equation illegible in source]

This characteristic function applies generally except for the restriction $n_1 = n_2 = \ldots = n_k = n_0$. For $k > 3$, $\varphi(t_1, t_2, \ldots, t_k)$ becomes very complicated and quite hard to handle. But even for the simplest case $k = 3$ we were unable to recognize $\varphi(t_1, t_2, t_3)$ as a familiar Fourier transform.
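A closed form proposed for the characteristic function can be checked against simulation. The following Python sketch is our own illustration and not the method of this thesis: it estimates $E\left[\exp\left(i \sum t_r V_r\right)\right]$ by Monte Carlo for $k = 3$, $p = 2$ and equal sample sizes, assuming unit variances for the uncorrelated variates (the $V_r$ do not depend on this scaling). The names `simulate_distances` and `empirical_cf`, the seed and the number of draws are arbitrary choices. When all the $t_r$ are equal to $t$, the exponent is $t V$ with $V$ distributed as $\chi^2$ with $p(k-1) = 4$ d.f., so the estimate should be close to $(1 - 2it)^{-2}$.

```python
import cmath
import random


def simulate_distances(rng, k=3, p=2):
    """One draw of (V_1, ..., V_k) under the null hypothesis, equal sample sizes."""
    v = [0.0] * k
    for _ in range(p):
        # Standardized group means of one uncorrelated character.
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        zbar = sum(z) / k
        for r in range(k):
            v[r] += (z[r] - zbar) ** 2
    return v


def empirical_cf(t, draws):
    """Average of exp(i * sum_r t_r V_r) over the simulated draws."""
    total = 0j
    for v in draws:
        total += cmath.exp(1j * sum(tr * vr for tr, vr in zip(t, v)))
    return total / len(draws)


rng = random.Random(1956)
draws = [simulate_distances(rng) for _ in range(20000)]
phi = empirical_cf((0.2, 0.2, 0.2), draws)
exact = (1 - 2j * 0.2) ** (-2)  # characteristic function of chi-square with 4 d.f.
print(abs(phi - exact))  # small, of the order of the Monte Carlo error
```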
In Chapter 3, the joint distribution in the special case $k = 3$, $p = 2$ is shown to be

[equation illegible in source]

Formally, this is the inverse transform of $\varphi(t_1, t_2, t_3)$.

CHAPTER THREE

A SPECIAL CASE AND ITS SOLUTION

3.1 The joint distribution of the generalized distances of the groups from their observed centroid

Specializing the results of 2.2 B II to the special case $p = 2$, $k = 3$, the joint distribution of the $\bar{y}$'s is

[equation illegible in source]

Consider the orthogonal transformation

[equation illegible in source]

of which the inverse transformation is

[equation illegible in source]

The distribution of the $u$'s and the $v$'s is

[equation illegible in source]

where the $V$'s are functions of the $v$'s only. Integrating out the $u$'s we get

[equation illegible in source]

Call $N$ [definition illegible in source] and consider the transformation

[equation illegible in source]

where $V'$ is a new variable introduced in order to perform the change of variables from the $v$'s to the $V$'s. $V'$ satisfies the inequality

[inequality illegible in source]

After lengthy algebraic manipulations the inverse transformation is found to be

(1) [equation illegible in source]
(2) [equation illegible in source]

To eliminate extraneous solutions the following restrictions on the signs in (1) and (2) must be introduced: the signs in front of the expressions (1) and (2) are the signs of the corresponding $v$'s [subscripts illegible in source]; the signs in front of the root sign in the expressions (1) and (2) must be opposite.

The Jacobian of this transformation is

[equation illegible in source]

or in terms of the $V$'s

[equation illegible in source]

The joint distribution of the $V$'s and $V'$ is then found to be

[equation illegible in source]

To find the joint distribution of $(V_1, V_2, V_3)$ we integrate out $V'$ over its range [limits illegible in source]. This integration is easily performed. The joint distribution of $(V_1, V_2, V_3)$ is then found to be

[equation illegible in source]

3.2 The distribution of the extreme deviate from the centroid

Let us restrict the problem further by assuming the number of observations to be the same for all groups, i.e. $n_1 = n_2 = n_3$. Call $n_0$ this common value. The joint distribution of $V_1, V_2, V_3$ in this case specializes to

$f(V_1, V_2, V_3) = \dfrac{3}{4\pi}\, \dfrac{\exp\left[-\frac{1}{2}(V_1 + V_2 + V_3)\right]}{\sqrt{2V_1V_2 + 2V_2V_3 + 2V_3V_1 - V_1^2 - V_2^2 - V_3^2}}$

The variates $V_1, V_2, V_3$ are always positive and it is easy to check that $V_1, V_2, V_3$ do not assume values outside the cone defined by

(*)  $2V_1V_2 + 2V_2V_3 + 2V_3V_1 - V_1^2 - V_2^2 - V_3^2 > 0$

The distribution $f(V_1, V_2, V_3)$ is therefore always real and positive.

We can assume without loss of generality that the variates have been ordered, say $0 \le V_1 \le V_2 \le V_3 \le t$. The density of these ordered variates is $3!\, f(V_1, V_2, V_3)$. We are interested in the distribution of the extreme deviate from the centroid, $V_3$; in other words we want to find the probability $G(t)$ that $V_3 \le t$:

[equation illegible in source]

The lower limits for $V_1$ and $V_2$ are obtained from the restriction (*) on the variates and the inequalities $0 \le V_1 \le V_2 \le V_3 \le t$.

$G(t)$ is well defined by the above expression, but the integration is hard to perform and not suitable for numerical integration. In order to remedy this state of affairs consider the orthogonal transformation

[equation illegible in source]

the inverse of which is

[equation illegible in source]

Under this transformation the distribution $3!\, f(V_1, V_2, V_3)\, dV_1\, dV_2\, dV_3$ becomes

[equation illegible in source]

Follow this by the transformation

[equation illegible in source]

This change of variable is, roughly speaking, a change to cylindrical coordinates $(\zeta, \xi, \eta)$ as shown in fig. 1, where we have set

[definitions illegible in source]

The transformation is defined and single valued if $\zeta \ne 0$ and $\xi < \infty$. It can be easily verified from the angle at the vertex of the cone [value illegible in source] that $\xi$ is always finite. The ranges of $\zeta$, $\xi$, $\eta$ taken independently are

$0 \le \zeta < \infty$, $0 \le \xi \le 1$, $0 \le \eta \le 2\pi$.

However if we let $V_3 \le t$, the limits on $\zeta$, $\xi$ and $\eta$ are no longer independent [relations illegible in source]. The inequalities $V_1 \le V_2 \le V_3$ give limits for $\eta$: $V_1 \le V_2$ implies $u \ge 0$, and $V_2 \le V_3$ gives a second bound [limits illegible in source].

The probability $G(t)$ of getting $0 \le V_1 \le V_2 \le V_3 \le t$ becomes finally

[equation illegible in source]

The region of integration for the unordered $V$'s is shown in fig. 2.

The integration with respect to $\zeta$ gives

[equation illegible in source]

The other integrations have to be evaluated numerically.

3.3 Construction of the table

[text missing in source] ... three decimal places.

The double numerical integration

[equation illegible in source]

was performed using seven equally spaced values of $\eta$, $\eta_i$, $i = 0$ to 6, and eleven equally spaced values of $\xi$, $\xi_j = j/10$, $j = 0$ to 10. The integrand was first evaluated for all seventy-seven pairs of values $(\eta_i, \xi_j)$ using the "Tables of the Exponential Function $e^x$" prepared by the National Bureau of Standards.

The integration with respect to $\eta$, keeping $\xi$ fixed, was performed using Cotes' numbers, thus fitting a polynomial of degree six to the seven points $i = 0$ to 6. This was done for the eleven values of $\xi$. The first integration thus yields eleven values of a function $A(\xi)$.

Using the method of least squares and a set of orthogonal polynomials, we constructed a polynomial $P(\xi)$ fitting the eleven points $A(\xi_j)$, $j = 0$ to 10, so that approximately $A(\xi) = P(\xi)$. The remaining integration, whose integrand contains $P(\xi)$ and $\sqrt{1 - \xi^2}$ [exact form illegible in source], can be split into six integrations, each of which can be performed by using tables of the Beta-function (e.g. "Tables of the Incomplete Beta-Function" edited by Karl Pearson).

At least six decimal places were carried throughout the computation, but the accuracy is reduced by approximating $A(\xi)$ by $P(\xi)$. Bounds on $G(t)$ can be given as follows

[inequality illegible in source]

The results of this computation are summarized in the following table.

Table of the probability $G(t)$ of getting a value for the extreme deviate no larger than $t$

      t       G(t)
    2.667     .686
    3.000     .743  *
    3.333     .791  *
    3.667     .832  *
    4.000     .866
    4.333     .894  *
    4.424     .900  **
    4.667     .916  *
    5.000     .934  *
    5.333     .948
    5.394     .950  **
    5.667     .959  *
    6.000     .967  *
    6.333     .974  *
    6.667     .980

*  These probabilities were obtained by parabolic interpolation through the points (1) (2) (3) (4).
** The values of $t$ yielding a probability of .90 and .95 were obtained by linear interpolation.

CONCLUSION

The results obtained in this paper enable us to construct a solution to the following problem: Samples of equal size $n_0$ are taken from three groups. Two normally distributed characters are measured on these objects, $\bar{x}_{11}, \ldots, \bar{x}_{23}$ denoting the mean values of these measurements. The covariance matrix of these measurements is known, or estimated on a large number of degrees of freedom. On the basis of these measurements decide whether the groups belong to the same population and, if they do not, which are different from the others.

The solution we propose is as follows:

Step 1: Choose a level of significance $\alpha$.

Step 2: Uncorrelate the measurements. To this end find the orthogonal matrix $B = (b_{ij})$ such that $B\,(\sigma_{ij})\,B' = \Lambda$, where $\Lambda = (\lambda_i)$ is a diagonal matrix. Then perform a transformation from the $x$'s to a new set of variates $y$. Compute $\bar{y}_{11}, \ldots, \bar{y}_{23}$, the means of the new uncorrelated variates.

Step 3: Compute $V = \sum_{r=1}^{3} V_r$ where

$V_r = n_0 \sum_{i=1}^{2} \dfrac{(\bar{y}_{ir} - \bar{y}_i)^2}{\lambda_i}$.

Rank the $V$'s, say $V_1 \le V_2 \le V_3$.

Step 4: Compare $V$ to $\chi^2_{\alpha}$ with 4 d.f. If $V < \chi^2_{\alpha}$ the groups are asserted to belong to the same population, and the process terminates. If $V > \chi^2_{\alpha}$ proceed to compare $V_3$ with $G_{\alpha}$, the value of $t$ for which $G(t) = 1 - \alpha$ (tabular value given in Chapter 3).

If $V_3 < G_{\alpha}$, no group is separated from the cluster although there is an overall difference among the groups, and the process terminates.

If $V_3 > G_{\alpha}$ we assert that at a level of significance $\alpha$ the group corresponding to $V_3$ does not belong to the same population as the other two groups. Then proceed to compute

$V' = \dfrac{n_0}{2} \sum_{i=1}^{2} \dfrac{(\bar{y}_{i1} - \bar{y}_{i2})^2}{\lambda_i}$

and compare $V'$ with $\chi^2_{\alpha}$ with 2 d.f. If $V' < \chi^2_{\alpha, 2}$ we assert that the groups corresponding to $V_1$ and $V_2$ belong to the same population. If $V' > \chi^2_{\alpha, 2}$ we assert that each of the groups belongs to a different population.

Although a solution is given only for the special case of 3 groups and 2 characters, it covers a somewhat wider range of problems. In many instances the configuration of the mean values with respect to $p$ characters can be preserved by representing the groups with respect to two suitably chosen functions of the $p$ characters. Methods have been devised for constructing such functions and for testing the adequacy of the representation (see for example ref. 1 p. 365). The main restriction to the solution given is thus in the number of groups.

As stated previously in the paper, the joint distribution of the deviates from the centroid in the general case is not readily available by the method used in the special case. We do suspect the form of the general distribution to be quite similar to that of the special case, but we have been unable to justify this guess so far.
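The four steps proposed in the Conclusion can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not a definitive implementation: the function names and the data are hypothetical; the critical values of the extreme distance are the two interpolated entries of the table of Chapter 3 (probabilities .90 and .95), assumed to be on the same scale as the $V_r$; the $\chi^2$ points are the usual tabulated upper percentage points; the second-stage statistic is taken to be the two-group form $\frac{n_0}{2} D^2$ of $V$ from Section 2.1. The sketch uses the inverse covariance matrix directly instead of the orthogonal transformation of Step 2, which leaves the distances unchanged.

```python
# Critical values for alpha = 0.10 and 0.05.
G_CRIT = {0.10: 4.424, 0.05: 5.394}      # extreme distance, from the thesis table
CHI2_4DF = {0.10: 7.779, 0.05: 9.488}    # chi-square, 4 d.f.
CHI2_2DF = {0.10: 4.605, 0.05: 5.991}    # chi-square, 2 d.f.


def quad_form(d, sigma_inv):
    """Quadratic form d' * sigma_inv * d for a vector d of two characters."""
    return sum(sigma_inv[i][j] * d[i] * d[j] for i in range(2) for j in range(2))


def classify_three_groups(means, n0, sigma_inv, alpha=0.05):
    """Return (clusters, overall_difference) for three groups of equal size n0."""
    if alpha not in G_CRIT:
        raise ValueError("only alpha = 0.10 and 0.05 are tabulated")
    centroid = [sum(m[i] for m in means) / 3.0 for i in range(2)]
    v = [n0 * quad_form([m[i] - centroid[i] for i in range(2)], sigma_inv)
         for m in means]
    if sum(v) < CHI2_4DF[alpha]:
        return [[0, 1, 2]], False            # all groups from the same population
    far = max(range(3), key=lambda r: v[r])  # group with the extreme distance
    if v[far] < G_CRIT[alpha]:
        return [[0, 1, 2]], True             # overall difference, no group separated
    a, b = [r for r in range(3) if r != far]
    diff = [means[a][i] - means[b][i] for i in range(2)]
    v_pair = 0.5 * n0 * quad_form(diff, sigma_inv)  # two-group form of V
    if v_pair < CHI2_2DF[alpha]:
        return [[a, b], [far]], True         # remaining two groups belong together
    return [[a], [b], [far]], True           # three different populations


# Hypothetical means of two characters in three groups of ten objects each.
identity = [[1.0, 0.0], [0.0, 1.0]]
result = classify_three_groups([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]], 10, identity)
print(result)  # ([[0, 1], [2]], True)
```

Returning the clusters together with a flag keeps the case "overall difference but no group separated" distinct from the case in which the $\chi^2$ test is not significant.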
We suggest that some more research could be carried out in the following directions:

(1) Try to increase the number of groups.
(2) Try to increase the number of characters.
(3) Try to invert the characteristic function of the joint distribution.
(4) Guessing the joint distribution, try to show that its characteristic function coincides with that given in Chapter 2.
(5) Extend these results to the case where the covariance matrix $(\sigma_{ij})$ is not known, that is, find the Studentized form of the distribution of the extreme deviate from the centroid of the groups.

BIBLIOGRAPHY

1. C. R. Rao, Advanced Statistical Methods in Biometric Research. J. Wiley & Sons (1952)
2. W. T. Federer, Experimental Design. Macmillan Co. (1955)
3. D. Newman, The distribution of range in samples from a normal population. Biometrika 31: 20-30 (1939)
4. E. S. Pearson & H. O. Hartley, Biometrika Tables for Statisticians. Cambridge University Press (1954)
5. D. B. Duncan, Multiple range and multiple F tests. Mimeo. Tech. Report no. 6, Va. Polytechnic Inst. (Sept. 1953)
6. J. W. Tukey, The problem of multiple comparisons. Ditto, Princeton University (1953)
7. D. B. Duncan, A significance test for differences between ranked treatments in an analysis of variance. Va. J. Sci. 2: 171-189 (1951)
8. H. Scheffé, A method for judging all contrasts in the analysis of variance. Biometrika 40: 87-104 (1953)
9. J. W. Tukey, Comparing individual means in the analysis of variance. Biometrics 5: 99-114 (1949)
10. K. R. Nair, The distribution of the extreme deviate from the sample mean and its studentized form. Biometrika 35: 118-144 (1948)
Item Metadata
Title | The distribution of the extreme Mahalanobis' distance from the sample mean |
Creator |
Cuttle, Yvonne Germaine Marie Ghislaine |
Publisher | University of British Columbia |
Date Issued | 1956 |
Description | The problem of classification in multivariate analysis is considered. The distribution of the extreme Mahalanobis’ distance from the sample mean has been derived for a special case of the bivariate problem, and for this special case the cumulative distribution has been partially tabulated. The characteristic function of the joint distribution of the Mahalanobis’ distances from the sample mean has also been derived. A brief discussion of the one-dimensional problem and its solution has been included. |
Subject |
Sampling (Statistics) |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2012-02-02 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0302307 |
URI | http://hdl.handle.net/2429/40442 |
Degree |
Master of Arts - MA |
Program |
Mathematics |
Affiliation |
Science, Faculty of Mathematics, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |