@prefix vivo: . @prefix edm: . @prefix ns0: . @prefix dcterms: . @prefix skos: . vivo:departmentOrSchool "Science, Faculty of"@en, "Mathematics, Department of"@en ; edm:dataProvider "DSpace"@en ; ns0:degreeCampus "UBCV"@en ; dcterms:creator "Bagai, Om Parkash"@en ; dcterms:issued "2011-12-08T20:38:31Z"@en, "1960"@en ; vivo:relatedDegree "Doctor of Philosophy - PhD"@en ; ns0:degreeGrantor "University of British Columbia"@en ; dcterms:description """The problem of classifying multivariate normal populations into homogeneous clusters on the basis of random samples drawn from those populations is taken up. Three alternative methods have been suggested for this. One of them is explained fully with an illustrative example, and the tabular values for the corresponding statistic, used for the purpose, have been computed. In the case of the other two alternatives only the working procedure is discussed. Further, a new statistic R, 'the largest distance', is proposed in one of these two alternatives, and its distribution is determined for the bivariate case in the form of definite integrals. Ignoring a priori probabilities, two alternative methods are suggested for assigning an arbitrary population to one or more clusters of populations, and are demonstrated by an illustrative example. A method is discussed for finding confidence regions for the non-centrality parameters of the distributions of certain statistics used in multivariate analysis and this method is also illustrated by an example. The exact distribution of the determinant of the sum of products (S.P.) matrix is found (in series), both in the central and the non-central linear cases for particular values of the rank of the matrix. Further, these results have been made use of in finding the limiting distribution of the Wilks-Lawley statistic proposed for testing the null hypothesis of the equality of the mean vectors of any number of populations. 
Six different statistics based on the roots of certain determinantal equations have been proposed for various tests of hypotheses arising in the problems of multivariate analysis of variance (Anova). Their distributions in the limited cases of two and three eigenroots have been found in the form of definite integrals. Also, the limiting distribution of the Roy's statistics of the largest, an intermediate and the smallest eigenroots have been found by a simple, easy method of integration, which method is quite different from that of Nanda (1948). Lastly, the distributions of the mean square and the mean product (M.P.) matrix have been approximated respectively in the univariate and multivariate cases of unequal sub-class numbers in the analysis of variance (Anova) of Model II."""@en ; edm:aggregatedCHO "https://circle.library.ubc.ca/rest/handle/2429/39569?expand=metadata"@en ; skos:note "MULTIPLE COMPARISON METHODS AND CERTAIN DISTRIBUTIONS ARISING IN MULTIVARIATE STATISTICAL ANALYSIS by Om Parkash Bagai. A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in the Department of MATHEMATICS. We accept this thesis as conforming to the standard required from candidates for the degree of Doctor of Philosophy. Members of the Department of Mathematics. THE UNIVERSITY OF BRITISH COLUMBIA, April, 1960. In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. 
It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department of Mathematics, The University of British Columbia, Vancouver, Canada. Date: April 13, 1960.
GRADUATE STUDIES. Field of Study: Mathematics. Modern Algebra: D. C. Murdoch; Mathematical Statistics: S. W. Nash; Computation Methods: F. M. Goodspeed; Real Variables: P. S. Bullen; Probability and Statistics: S. W. Nash. Other Studies: Theoretical Mechanics: W. Opechowski; Industrial Statistics: T. I. Matuszewski; Directed Studies (Plant Science): V. C. Brink.
The University of British Columbia, Faculty of Graduate Studies. PROGRAMME OF THE FINAL ORAL EXAMINATION FOR THE DEGREE OF DOCTOR OF PHILOSOPHY of OM PARKASH BAGAI, B.A. University of Panjab, India, 1948; M.A. University of Panjab, India, 1950. ROOM 225, BUCHANAN BUILDING, TUESDAY, APRIL 26, 1960 AT 9:30 A.M. COMMITTEE IN CHARGE: Dean F. H. Soward, Chairman; S. W. Nash, S. A. Jennings, R. A. Restrepo, B. N. Moyls, W. Opechowski, V. C. Brink, T. I. Matuszewski, C. Walden, A. D. Moore. External Examiner: Prof. E. S. Keeping, University of Alberta, Edmonton, Alta. PUBLICATION: Intermediate Algebra, Textbook for the Panjab University, India.
ABSTRACT
The problem of classifying multivariate normal populations into homogeneous clusters on the basis of random samples drawn from those populations is taken up. Three alternative methods have been suggested for this. One of them is explained fully with an illustrative example, and the tabular values for the corresponding statistic, used for the purpose, have been computed. In the case of the other two alternatives only the working procedure is discussed. Further, a new statistic R, 'the largest distance', is proposed in one of these two alternatives, and its distribution is determined for the bivariate case in the form of definite integrals. Ignoring a priori probabilities, two alternative methods are suggested for assigning an arbitrary population to one or more clusters of populations, and are demonstrated by an illustrative example. A method is discussed for finding confidence regions for the non-centrality parameters of the distributions of certain statistics used in multivariate analysis, and this method is also illustrated by an example. The exact distribution of the determinant of the sum of products (S.P.) matrix is found (in series), both in the central and the non-central linear cases for particular values of the rank of the matrix. Further, these results have been used in finding the limiting distribution of the Wilks-Lawley statistic proposed for testing the null hypothesis of the equality of the mean vectors of any number of populations. Six different statistics based on the roots of certain determinantal equations have been proposed for various tests of hypotheses arising in the problems of multivariate analysis of variance (Anova). 
Their distributions in the limited cases of two and three eigenroots have been found in the form of definite integrals. Also, the limiting distributions of Roy's statistics of the largest, an intermediate and the smallest eigenroots have been found by a simple, easy method of integration, which method is quite different from that of Nanda (1948). Lastly, the distributions of the mean square and the mean product (M.P.) matrix have been approximated respectively in the univariate and multivariate cases of unequal sub-class numbers in the analysis of variance (Anova) of Model II.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENTS
CHAPTER ONE: INTRODUCTION
CHAPTER TWO: ANALOGUES OF DUNCAN'S PROCEDURE IN FORMING CLUSTERS IN MULTIVARIATE ANOVA
CHAPTER THREE: ANALOGUES OF DUNCAN'S PROCEDURE IN FORMING CLUSTERS IN MULTIVARIATE ANOVA (CONTD.)
CHAPTER FOUR: ASSIGNING A POPULATION TO ONE OF THE CLUSTERS
CHAPTER FIVE: DETERMINATION OF CONFIDENCE REGIONS FOR NON-CENTRALITY PARAMETERS CORRESPONDING TO $D^2$ AND $T^2$, AND ANOTHER EXPRESSION FOR $T^2$
CHAPTER SIX: DISTRIBUTION OF THE DETERMINANT OF THE S.P. MATRIX IN THE NON-CENTRAL LINEAR CASE FOR SOME VALUES OF p
CHAPTER SEVEN: STATISTICS PROPOSED FOR VARIOUS TESTS OF HYPOTHESIS I, II AND III; THEIR DISTRIBUTIONS IN PARTICULAR CASES
CHAPTER EIGHT: APPROXIMATE DISTRIBUTIONS OF THE NON-ORTHOGONAL COMPLEX ESTIMATES
APPENDIX A: EVALUATION OF THE EIGENVALUES AND EIGENVECTORS OF THE MATRIX $BW^{-1}$
APPENDIX B: FINDING THE BOUNDS FOR THE COEFFICIENTS OF CERTAIN CUBIC EQUATIONS
APPENDIX C: TABLES OF CHI-SQUARE AT $\alpha_k$ LEVEL FOR $\alpha = .05$, FOR k = 2(1)20 AND D.F. = 1(1)30(10)100
APPENDIX D: TABLES OF CHI-SQUARE AT $\alpha_k$ LEVEL FOR $\alpha = .01$, FOR k = 2(1)20 AND D.F. = 1(1)30(10)100
BIBLIOGRAPHY
ACKNOWLEDGEMENTS
The author wishes to thank Dr. Stanley W. 
Nash of the Department of Mathematics for suggesting the topic of this thesis, and feels highly indebted to him for his invaluable advice during its preparation. His further thanks are due to Dr. S.A. Jennings and Dr. R.A. Restrepo for their assistance in preparation of the final manuscript. He is pleased to thank the staff of the U.B.C. Computing Centre for their assistance in working on the problem for illustration and in the preparation of the tables of chi-square. His thanks are also due to Mr. K.G. Fensom, Superintendent, Forest Products Laboratory, Canada, for allowing him to use the data for the illustrative example. He is also pleased to acknowledge the support of the National Research Council of Canada which made this research possible.
CHAPTER ONE
INTRODUCTION
1.1 Test of Equality of Mean Vectors in the Case of Two p-variate Normal Populations
In sciences like anthropology, biology, and others, we often wish, on the basis of two p-variate samples drawn from two populations, to find whether the two populations, on a given probability level, are distinct or not. Karl Pearson (1921) gave a start to answering such a question by suggesting his well-known Coefficient of Racial Likeness (C.R.L.) to Tildesley (1921), and he himself discussed it in his paper in 1926. But this coefficient was found to be inadequate and was severely criticized by Mahalanobis and Morant as a measure of divergence. Mahalanobis (1925) modified C.R.L. and defined a measure of divergence $D^2$, the \"Mahalanobis distance\", both for classical and Studentized cases, as follows: Given two p-variate samples of sizes $N_1$ and $N_2$ with observations $X_{irh}$ ($i = 1, 2, \dots, p$; $r = 1, 2$; $h = 1, 2, \dots, N_r$) drawn from two p-variate normal populations assumed to have the same covariance matrix but different sets of means $\mu_{i1}$ and $\mu_{i2}$ ($i = 1, 2, \dots, p$), let $\bar X_{i1}$ and $\bar X_{i2}$ ($i = 1, 2, \dots, p$) respectively be the means of the ith trait from the two samples. If the covariance matrix $(\sigma_{ij})$ 
is known or has been computed on the basis of large samples, then, taking $(\sigma^{ij})$ as the inverse of $(\sigma_{ij})$, the Mahalanobis distance in the classical case is defined as:
$$D_c^2 = \sum_{i=1}^{p}\sum_{j=1}^{p}\sigma^{ij}(\bar X_{i1}-\bar X_{i2})(\bar X_{j1}-\bar X_{j2}) \tag{1.1.1}$$
If $(\sigma_{ij})$ is not known, we estimate it from the samples and define the Studentized form as:
$$D^2 = \sum_{i=1}^{p}\sum_{j=1}^{p}w^{ij}(\bar X_{i1}-\bar X_{i2})(\bar X_{j1}-\bar X_{j2}) \tag{1.1.2}$$
where
$$(N_1+N_2-2)\,w_{ij} = \sum_{r=1}^{2}\sum_{h=1}^{N_r}(X_{irh}-\bar X_{ir})(X_{jrh}-\bar X_{jr})$$
and $(w^{ij})$ is the inverse of $(w_{ij})$.
Simultaneously Hotelling (1931) generalized Student's t to the multivariate case. We denote this by $T^2$. It was found to be identical (Roy and Bose 1938, Fisher 1938) in form to the Studentized $D^2$ except for a factor involving sample sizes, i.e.
$$T^2 = \frac{N_1N_2}{N_1+N_2}\,D^2$$
Distributions of $D^2$ and $T^2$: Let
$$\Delta^2 = \sum_{i=1}^{p}\sum_{j=1}^{p}\sigma^{ij}(\mu_{i1}-\mu_{i2})(\mu_{j1}-\mu_{j2})$$
be the measure of divergence between the populations. The distributions of (1.1.1) and (1.1.2) for both central ($\Delta^2 = 0$) and non-central ($\Delta^2 \neq 0$) cases are known as stated below:
(i) In the Studentized case (Bose and Roy, 1938), under the null hypothesis $\mu_{i1} = \mu_{i2}$ ($i = 1, 2, \dots, p$) or $\Delta^2 = 0$, the quantity
$$\frac{N_1+N_2-p-1}{p}\cdot\frac{N_1N_2}{N_1+N_2}\cdot\frac{D^2}{N_1+N_2-2}$$
is distributed as the central F-ratio with p and $(N_1+N_2-p-1)$ degrees of freedom (D.F.), while in the classical case, under the same null hypothesis, $\frac{N_1N_2}{N_1+N_2}D_c^2$ is distributed as central chi-square with p D.F.
(ii) Again, in the Studentized case (Bose and Roy, 1938), for $\Delta^2 \neq 0$, the quantity
$$\frac{N_1+N_2-p-1}{p}\cdot\frac{N_1N_2}{N_1+N_2}\cdot\frac{D^2}{N_1+N_2-2}$$
is distributed as non-central F-ratio with p and $(N_1+N_2-p-1)$ D.F. and parameter $\Delta^2\big/\left(\frac{1}{N_1}+\frac{1}{N_2}\right)$, while in the classical case, again for $\Delta^2 \neq 0$, $\frac{N_1N_2}{N_1+N_2}D_c^2$ is distributed as non-central chi-square with p D.F. and parameter $\Delta^2\big/\left(\frac{1}{N_1}+\frac{1}{N_2}\right)$.
The distribution of $T^2$ in the central case was given by Hotelling (1931) and in the non-central case by Hsu (1938). 
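As a minimal sketch of the Studentized quantities in (1.1.2) and of the F transformation in (i), the following standard-library Python computes $D^2$, $T^2$ and the F-ratio for two small samples. The function names (`mean_vector`, `solve`, `two_sample_stats`) and the toy data in the usage note are illustrative assumptions and do not come from the thesis.

```python
def mean_vector(sample):
    # Column means of a sample given as a list of p-dimensional rows.
    n = len(sample)
    return [sum(row[i] for row in sample) / n for i in range(len(sample[0]))]


def solve(a, b):
    # Solve a x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def two_sample_stats(s1, s2):
    # Returns (D^2, T^2, F) for two p-variate samples.
    n1, n2, p = len(s1), len(s2), len(s1[0])
    m1, m2 = mean_vector(s1), mean_vector(s2)
    # Pooled within-sample matrix (w_ij) on N1 + N2 - 2 D.F.
    w = [[0.0] * p for _ in range(p)]
    for sample, m in ((s1, m1), (s2, m2)):
        for row in sample:
            for i in range(p):
                for j in range(p):
                    w[i][j] += (row[i] - m[i]) * (row[j] - m[j])
    w = [[v / (n1 + n2 - 2) for v in row] for row in w]
    d = [a - b for a, b in zip(m1, m2)]
    z = solve(w, d)  # z = W^{-1} d, so D^2 = d' W^{-1} d
    d2 = sum(di * zi for di, zi in zip(d, z))
    t2 = n1 * n2 / (n1 + n2) * d2
    f = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return d2, t2, f
```

For instance, `two_sample_stats([[0, 0], [2, 0], [0, 2], [2, 2]], [[3, 4], [5, 4], [3, 6], [5, 6]])` gives $D^2 = 18.75$, $T^2 = 37.5$ and F = 15.625 on 2 and 5 D.F. For p = 1 the value of $T^2$ reduces to the square of the pooled two-sample t.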
These are identical to the distributions of Studentized $D^2$ except for the constant multiplier. When the hypothesis of equality of mean vectors is rejected, the problem generally arises of giving confidence regions to the corresponding non-centrality parameter. We have attempted to answer this problem in Chapter Five, where we have taken simultaneously the case of two or any number of populations. We have first given the method and then, to demonstrate the method, we have presented an illustration.
1.2 Classification and Discrimination in the Case of k p-variate Normal Populations
Again in sciences like anthropology, biology and others, one is often faced with the problem of discrimination and classification. In the biological sciences we are concerned with specifying an individual as a member of one of the populations to which he can possibly belong, as when a taxonomist has to assign an organism to its proper species or sub-species or an anthropologist is faced with the problem of sexing a skull or jaw-bone. We are also faced with the problem of classification of the groups themselves into some significant system based on the configuration of the various characteristics, for example when a number of species or sub-species may have to be arrayed in hierarchical order showing the closeness of some and distinctiveness of the others. In all such problems our first aim is to test whether the populations involved are distinct or not. Four statistics have been suggested for testing the hypothesis of equality of the mean vectors of the populations. We list them below: Suppose we are given k p-variate normal populations, assumed to have the same covariance matrix and distinct mean vectors $(\mu_{1r}, \mu_{2r}, \dots, \mu_{pr})$ ($r = 1, 2, \dots, k$). From these populations samples respectively of sizes $N_1, N_2, \dots, N_k$ are drawn and observations $X_{irh}$ ($i = 1, 2, \dots, p$; $r = 1, 2, \dots, k$; $h = 1, 2, \dots, N_r$) are made. Let $W = (w_{ij})$ 
and $B = (b_{ij})$ be the within and between mean product (M.P.) matrices with respectively $n_2$ and $n_1$ D.F., where $w_{ij}$ and $b_{ij}$ are respectively defined as:
$$n_2\,w_{ij} = \sum_{r=1}^{k}\sum_{h=1}^{N_r}(X_{irh}-\bar X_{ir})(X_{jrh}-\bar X_{jr}) \tag{1.2.1}$$
and
$$n_1\,b_{ij} = \sum_{r=1}^{k}N_r(\bar X_{ir}-\bar X_i)(\bar X_{jr}-\bar X_j) \tag{1.2.2}$$
where
$$n_1 = k-1 \quad\text{and}\quad n_2 = \sum_{r=1}^{k}N_r - k \tag{1.2.3}$$
(i) Hotelling's $T^2$-Statistic: Hotelling (1947, 1950) gives a statistic to test the hypothesis of equality of k mean vectors and defines it in the classical case, using a matrix $(\sigma_{ij})$ known or estimated on the basis of large samples, as:
$$T_k^2 = \sum_{i=1}^{p}\sum_{j=1}^{p}\sigma^{ij}\sum_{r=1}^{k}N_r(\bar X_{ir}-\bar X_i)(\bar X_{jr}-\bar X_j) \tag{1.2.4}$$
or
$$T_k^2 = n_1\,\mathrm{tr}\left[(\sigma^{ij})B\right] \tag{1.2.5}$$
where $(\sigma^{ij})$ is the inverse of $(\sigma_{ij})$ and $\bar X_i = \sum_{r=1}^{k}N_r\bar X_{ir}\Big/\sum_{r=1}^{k}N_r$.
The Studentized $T_k^2$ can be expressed in three different ways as follows:
$$T_k^2 = \sum_{i=1}^{p}\sum_{j=1}^{p}w^{ij}\sum_{r=1}^{k}N_r(\bar X_{ir}-\bar X_i)(\bar X_{jr}-\bar X_j) \tag{1.2.6}$$
or
$$T_k^2 = n_1\,\mathrm{tr}(W^{-1}B) = n_2\,\mathrm{tr}\left[(n_2W)^{-1}(n_1B)\right] \tag{1.2.7}$$
or
$$T_k^2 = n_2\sum_{i=1}^{\ell}\phi_i = n_2\sum_{i=1}^{\ell}\frac{\theta_i}{1-\theta_i} \tag{1.2.8}$$
where $\phi_i$ and $\theta_i$ are respectively the roots of the determinantal equations:
$$|n_1B - \phi\,n_2W| = 0 \tag{1.2.9}$$
and
$$|n_1B - \theta(n_1B + n_2W)| = 0 \tag{1.2.10}$$
and where
$$\ell = \min(p, n_1) \tag{1.2.11}$$
We have found another interesting expression of $T_k^2$ in terms of weighted Mahalanobis distances. It is given in the last section of Chapter Five. In Chapters Two and Four, we have made use of this statistic in forming clusters and in assigning an arbitrary population to one of the clusters.
The classical $T_k^2$ is known (Rao, 1952) to be distributed, under the null hypothesis, as central chi-square with $n_1p$ D.F. In the case of non-centrality parameter $\tau_k^2 \neq 0$, the classical $T_k^2$ is non-central chi-square distributed with $n_1p$ D.F., and the parameter is defined as follows:
$$\tau_k^2 = \sum_{i=1}^{p}\sum_{j=1}^{p}\sigma^{ij}\sum_{r=1}^{k}N_r(\mu_{ir}-\bar\mu_i)(\mu_{jr}-\bar\mu_j) \tag{1.2.12}$$
where
$$\bar\mu_i = \sum_{r=1}^{k}N_r\mu_{ir}\Big/\sum_{r=1}^{k}N_r \tag{1.2.13}$$
The exact distribution of Studentized $T_k^2$ is not known in compact standard form. Ito (1956) has given, under the null hypothesis, its approximate formula as:
$$T_k^2 = \chi^2 + \frac{1}{2n_2}\left[\frac{n_1+p+1}{n_1p+2}\,\chi^4 + \cdots\right] + \cdots \tag{1.2.14}$$
where $\chi^2$ is central chi-square with $n_1p$ D.F. 
The use of $T_k^2$ in multivariate analysis of variance (Anova) has been illustrated by Siotani (1958), who has constructed its tabular values for 5% and 1% significance levels for three or more dimensions.
(ii) Wilks' $\Lambda$-Criterion: Following the likelihood ratio method (Neyman and Pearson, 1928, 1931, and Pearson and Neyman, 1930), Wilks obtained a suitable extension of the univariate F-ratio in the form:
$$\Lambda = \frac{|n_2W|}{|n_2W + n_1B|} \tag{1.2.15}$$
or alternatively as:
$$\Lambda = \prod_{i=1}^{p}f_i = \prod_{i=1}^{\ell}(1-\theta_i) \tag{1.2.16}$$
where $f_i$ and $\theta_i$ are respectively the roots of the determinantal equations
$$|n_2W - f(n_2W + n_1B)| = 0 \tag{1.2.17}$$
and (1.2.10), where W and B are the usual mean product (M.P.) matrices. Wilks (1932) and Nair (1939) have given the exact distribution of $\Lambda$ for $n_1 = 1, 2$ and any p, and for p = 1, 2 and any $n_1$, by comparing the moments of $\Lambda$ with those of the F-ratio. Bartlett (1934, 1938, 1947) suggested its useful approximation as follows:
$$-\left[(n_1+n_2) - \tfrac{1}{2}(p+n_1+1)\right]\log_e\Lambda \sim \chi^2_{pn_1} \tag{1.2.18}$$
to the first order, with a second-order correction term governed by $\gamma_2 = \frac{pn_1}{48}(p^2+n_1^2-5)$, and where $\chi^2_f$ is central chi-square with f D.F. We have made use of this approximate test in Chapter Two in testing for the over-all homogeneity of the species taken in the illustrative example. More recently Bannerjee (1958) has been able to give the exact distribution of $\Lambda$ in series form, but the tabular values are not yet available.
(iii) Wilks-Lawley U-statistic and Pillai's V-statistic: There are two other statistics to test the homogeneity of k mean vectors, due to Wilks-Lawley (1932, 1938) and Pillai (1954, 1956), defined respectively as:
$$U = \frac{|n_1B|}{|n_2W + n_1B|} \tag{1.2.19}$$
and
$$V = \mathrm{tr}\left[(n_2W + n_1B)^{-1}(n_1B)\right] \tag{1.2.20}$$
These can also be expressed respectively as follows:
$$U = \prod_{i=1}^{\ell}\frac{\phi_i}{1+\phi_i}, \quad\text{or}\quad = \prod_{i=1}^{\ell}\theta_i \tag{1.2.21}$$
and
$$V = \sum_{i=1}^{\ell}\frac{\phi_i}{1+\phi_i}, \quad\text{or}\quad = \sum_{i=1}^{\ell}\theta_i \tag{1.2.22}$$
where $\phi_i$ and $\theta_i$ are defined respectively as in (1.2.9) and (1.2.10). 
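The relations (1.2.8), (1.2.16), (1.2.21) and (1.2.22) between the four criteria and the roots can be checked numerically. The sketch below is restricted to the bivariate case (p = 2, with $n_1 \geq 2$ so that $\ell = 2$), where the roots of (1.2.9) follow from a quadratic. It takes the two S.P. matrices $n_1B$ and $n_2W$ as inputs; the function names and the numbers in the usage note are illustrative assumptions.

```python
import math


def roots_2x2(a, c):
    '''Roots phi of |a - phi*c| = 0 for symmetric 2x2 matrices a and c,
    with c positive definite; returned largest first.'''
    det_c = c[0][0] * c[1][1] - c[0][1] ** 2
    det_a = a[0][0] * a[1][1] - a[0][1] ** 2
    # Expanding the determinant gives det_c*phi^2 - mid*phi + det_a = 0.
    mid = a[0][0] * c[1][1] + a[1][1] * c[0][0] - 2 * a[0][1] * c[0][1]
    disc = math.sqrt(max(mid * mid - 4 * det_c * det_a, 0.0))
    return [(mid + disc) / (2 * det_c), (mid - disc) / (2 * det_c)]


def criteria(sp_between, sp_within, n2):
    '''T_k^2, Lambda, U and V from the roots phi_i and theta_i = phi_i/(1 + phi_i).
    sp_between stands for n1*B and sp_within for n2*W.'''
    phis = roots_2x2(sp_between, sp_within)
    thetas = [phi / (1 + phi) for phi in phis]
    return {
        'T2': n2 * sum(phis),                         # (1.2.8)
        'Lambda': math.prod(1 - t for t in thetas),   # (1.2.16)
        'U': math.prod(thetas),                       # (1.2.21)
        'V': sum(thetas),                             # (1.2.22)
    }
```

With `sp_between = [[4, 2], [2, 3]]`, `sp_within = [[2, 0], [0, 1]]` and `n2 = 10`, the roots are $\phi = 4, 1$, so that $T_k^2 = 50$, $\Lambda = 0.1$, U = 0.4 and V = 1.3. The same values of $\Lambda$ and U follow from the determinant forms (1.2.15) and (1.2.19), since $|n_2W| = 2$, $|n_1B| = 8$ and $|n_2W + n_1B| = 20$.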
These two statistics will be discussed further in Section 1.4. When the hypothesis of the equality of mean vectors is rejected by the use of any of the above four statistics, three problems arise: (i) to determine the confidence region for the population parameter corresponding to the statistic used to test the hypothesis of equality of mean vectors; (ii) to find groups or clusters of populations having like mean vectors; and (iii) to classify an arbitrary individual as belonging to one of the k normal populations, or an arbitrary population as belonging to one of the clusters. We have dealt with the first problem in Chapter Five and have discussed the method of giving a confidence region to $\tau_k^2$. Finally, we have demonstrated the method by taking a particular case with k = 2, p = 4, $n_1 = 4$, $n_2 = 29$. For forming clusters of populations with like mean vectors, Rao (1948, 1955) and Tocher (1948) have given a subjective approach which is not based on probabilistic considerations. Working on the principle of minimum average distance, they have suggested a technique based on the criterion that 'any two groups belonging to the same cluster should at least on the average show a smaller $D^2$ than those belonging to different clusters'.
Rao's Graphical Approach
A graphical approach to the same problem has been given by Rao on the basis of significant discriminant scores or canonical variates. 
Since we have also made extensive use of significant discriminant scores in reducing $T_k^2$ and $D^2$ and likelihood functions to convenient and easily workable forms, we shall first discuss how Rao obtained these scores and then his graphical approach. Rao (1952), like Fisher, takes the linear combinations $l_{i1}X_1 + \dots + l_{ip}X_p$ ($i = 1, 2, \dots, p$), maximizes the ratio:
$$\sum_{i=1}^{p}\sum_{j=1}^{p}l_il_jb_{ij}\Big/\sum_{i=1}^{p}\sum_{j=1}^{p}l_il_jw_{ij}$$
and gets the system of equations
$$L(BW^{-1}) = \Phi L \tag{1.2.24}$$
where $\Phi$ (p x p) is a diagonal matrix with diagonal elements $\phi_i$ ($i = 1, 2, \dots, p$) and L (p x p) is the matrix of coefficients of the discriminant functions. Without losing generality we can suppose that $\phi_p \leq \phi_{p-1} \leq \dots \leq \phi_2 \leq \phi_1$ and test their significance by Bartlett's modified approximate formula (1.2.18) (Rao, 1952) given by:
$$\left[(n_1+n_2) - \tfrac{1}{2}(p+n_1+1)\right]\log_e(1+\phi_i) = \chi_i^2 \tag{1.2.25}$$
where $\chi_i^2$ is central chi-square with $(p+n_1+1) - 2i$ D.F. By repeated use of formula (1.2.25), he gets a set of, say, $p'$ ($\leq p$) significant eigenvalues and hence the corresponding $p'$ significant discriminant functions. Placing their $p'$ vectors of coefficients row-wise, so that the first row corresponds to the largest eigenvalue, the second to the second largest and so forth, he forms a matrix K ($p'$ x p). Denoting further X (k x p) as the matrix of k sample mean vectors, he gets the matrix Y (k x $p'$) of $p'$ significant discriminant scores as:
$$Y = XK^{T} \tag{1.2.26}$$
Note: To find $\Phi$ (p x p) and L (p x p) as the solutions of (1.2.24), we can first symmetrize $BW^{-1}$ by the procedure suggested by Nash and Jolicoeur (unpublished), which we have summarized in Appendix A, and then apply the familiar technique due to Jacobi, which can be used on high speed computers. Thus, knowing the significant discriminant scores, Rao then suggests plotting them in a space whose dimensionality is equal to the number of significant eigenvalues. 
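The testing and scoring steps just described can be sketched in a few lines. The code below evaluates the statistic of (1.2.25) and its D.F. for each root, and forms the scores of (1.2.26) by multiplying each mean vector by the rows of K. It assumes the roots have already been computed; the comparison with tabulated chi-square values is left to the reader, and the function names are illustrative assumptions.

```python
import math


def root_tests(phis, p, n1, n2):
    '''For the roots taken largest first, return the pairs
    (chi-square statistic, D.F.) of formula (1.2.25).'''
    multiplier = (n1 + n2) - 0.5 * (p + n1 + 1)
    ordered = sorted(phis, reverse=True)
    return [
        (multiplier * math.log(1 + phi), (p + n1 + 1) - 2 * i)
        for i, phi in enumerate(ordered, start=1)
    ]


def discriminant_scores(mean_vectors, k_rows):
    '''Scores Y = X K^T: one row per sample mean vector (k of them),
    one column per retained discriminant function (row of K).'''
    return [
        [sum(x * c for x, c in zip(mean, row)) for row in k_rows]
        for mean in mean_vectors
    ]
```

For example, `root_tests([4.0, 1.0], p=2, n1=2, n2=10)` uses the multiplier 9.5 and returns the statistics $9.5\log_e 5$ on 3 D.F. and $9.5\log_e 2$ on 1 D.F. Each statistic is then referred to a chi-square table at the chosen level.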
If there are only two significant eigenvalues, there is no difficulty in having the plane representation of the points, in which the closeness of the points (populations) to one another can be easily visualized. But it becomes difficult in the case of three or more eigenvalues. Rao (1948) in such situations suggests having pair-wise plane representations of the points and then seeing (of course relying mostly on the most significant scores) which of the populations lie close to one another. In our discussion of the procedure for forming clusters in Chapter Two, we have sought a departure from Rao's and Tocher's subjective approach and have instead suggested two stages. Stage I is a sort of prediction, making use of Rao's graphical approach. In Stage II we give first our own definition of a cluster. Then we propose to correct the prediction by three alternative statistics, where in each, unlike Rao and Tocher, we are able to attach probability to our decision. The first alternative has been discussed with an illustration in Chapter Two and the remaining two briefly in Chapter Three. Our working criteria for all three cases are multivariate analogues of previous criteria used in univariate analysis of variance (Anova) for forming clusters of like groups. The choice of the level of significance is that proposed by Duncan. Therefore we will discuss briefly such procedures for the univariate problem. Some of the methods of forming clusters of like groups in univariate Anova are the following: Fisher's least significant difference test, the Student-Newman-Keuls range test, and more recently Scheffé's multiple F-test, Tukey's test based on allowances and his gap-straggler and variance test, Duncan's multiple range and F-tests based on degrees of freedom, and further extensions by Sawkin, Kramer, Hartley, and Roy and Bose. A detailed explanation of these procedures with illustrations is provided by Federer (1955). 
Since we have generalized Duncan's approach to the multivariate case, we give below briefly what he did. Duncan made a two-way attack on the problem: first by the multiple range test and second by the multiple F-test. To avoid duplication we will not give the description of his range test, since its procedure, except for significant ranges, is just the same as Stage I of his multiple F-test.
Duncan's Level of Significance
Duncan's multiple range test is similar to the Student-Newman-Keuls test and his multiple F-test similar to that of Scheffé. The only difference between Duncan and the others has been in the choice of a level of significance. He proposes that the level of significance should increase with the increase of the number of means in a group, whereas others have kept the same pre-assigned level of significance as in the case of k means. He justifies himself by arguing that any increase in the later levels would result in the increase of type II error and thus suggests that the r-mean ($r = 2, 3, \dots, k$) significance level $\alpha_r$, for a pre-assigned $\alpha$, be
$$\alpha_r = 1 - (1-\alpha)^{r-1}, \qquad r = 2, 3, \dots, k \tag{1.2.27}$$
where $(r-1)$ is the number of independent comparisons which can be specified among the r means.
Duncan's Multiple F-test
Duncan, in this test procedure, has made use of both the range test and the F-test by setting again the level of significance based on D.F. as described above. According to Federer, Duncan's test procedure can be set up in three stages, of which we will give the first two, the second being the most important for our purpose:
Stage I: The first stage, as pointed out earlier, is in fact just the multiple range test but with different significant ranges. The procedure is as follows:
(i) Compute the quantities $R'_r = \sqrt{2(r-1)\,F_{\alpha_r}(r-1, f)}$ for $r = 1, 2, \dots, k$, where, for a pre-assigned $\alpha$, $\alpha_r$ is defined as in (1.2.27) and f is the D.F. associated with the pooled error variance $s_{\bar x}^2$.
(ii) Compute the quantities $R_r = R'_r\,s_{\bar x}$ ($r = 1, 2, \dots, k$).
(iii) Compute the differences between the ranked means.
(iv) Finally, compare these differences of the ranked means with $R_r$ ($r = 1, 2, \dots, k$) and determine the group of like means by following the criterion: \"The difference between any two means in a set of k means is significant provided the range of each and every subset which contains the given means is significant\".
Stage II: Stage II is the correction of the prediction made in Stage I. The procedure for correction is summarized as:
(i) Compute the sum of squares among the combinations of the means bracketed together in the prediction.
(ii) Compute the least significant sum of squares.
(iii) Correct the predicted groups by following the criterion: \"The difference between any two means in a set of $k'$ ($\leq k$) means is significant provided the variance of each and every subset which contains the given means is significant according to an $\alpha_r$-level F-test where r is the number of means in the set\".
As pointed out earlier, the third problem that can arise after the hypothesis of equality of mean vectors is rejected is to classify an individual as belonging to one of the k distinct normal p-variate populations or a population as belonging to one of the clusters. 
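Returning briefly to Duncan's choice of significance levels in (1.2.27), the following short sketch tabulates $\alpha_r$ for a pre-assigned $\alpha$. The function name `duncan_levels` is an illustrative assumption; the significant ranges of Stage I would additionally require upper percentage points of F, which are not computed here.

```python
def duncan_levels(alpha, k):
    '''Duncan's r-mean significance levels
    alpha_r = 1 - (1 - alpha)**(r - 1) for r = 2, 3, ..., k.'''
    return {r: 1 - (1 - alpha) ** (r - 1) for r in range(2, k + 1)}
```

For $\alpha = .05$ and k = 4 this gives $\alpha_2 = .05$, $\alpha_3 = .0975$ and $\alpha_4 = .142625$, showing how the level rises with the number of means in the group.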
Assuming a priori that the individual, with measurements $(X_1, X_2, \dots, X_p)$, does belong to one of the k populations, Rao (1948) computes, where we ignore the a priori probabilities, the linear discriminant scores for the rth ($r = 1, 2, \dots, k$) population as
$$L_r = \sum_{i=1}^{p}\sum_{j=1}^{p}w^{ij}\bar X_{ir}X_j - \frac{1}{2}\sum_{i=1}^{p}\sum_{j=1}^{p}w^{ij}\bar X_{ir}\bar X_{jr}$$
and then suggests assigning the individual to the sth population if $L_s$ is greater than every other $L_r$ for $r\,(\neq s) = 1, 2, \dots, k$. We have taken up, in Chapter Four, the problem of assigning a population known to belong a priori to one of the clusters and have suggested two alternative procedures: the first similar to the L-functions and the second based on the statistic $T_k^2$. Finally, an illustrative example is given to demonstrate the theory.
1.3 Generalized Variance and its Moments
Wilks (1932) defines the generalized variance to be the determinant of variances and covariances and considers it to be a measure of the spread of the observations. He then presents the hth moment of the generalized variance in the null case as follows: If S be the sample variance-covariance matrix with n D.F. and $\Sigma$ (p x p) = E(nS), then the hth moment of $|A|$ ($= |nS|$) in the central case is given by Wilks (1932):
$E\left[|A|^h\right]$ = [right-hand side illegible in the source] (1.3.1)
Further, let $k_i^2$ ($i = 1, 2, \dots, p$) be the real and non-negative roots of the determinantal equation:
[equation illegible in the source] (1.3.2)
where T is the matrix formed from the products of the deviations of the population means from their mean (its defining expressions are illegible in the source). Assuming now $k_i^2 = 0$ ($i = 2, 3, \dots, p$) and $k_1^2 \neq 0$, Anderson (1946) gives the hth moment of $|A|$ in the non-central linear case as:
[equation illegible in the source] (1.3.3)
Making use of these moments we have found in Chapter Six the distribution of the determinant of the sum of products (S.P.) matrix A in the non-central linear case for some particular values of p, namely p = 2, 3, and 4. 
1.4 Problem of Eigenroots of Certain Determinantal Equations
It is shown in Section 1.2 that, for testing the hypothesis of the equality of mean vectors of samples drawn from k p-variate normal populations, the four statistics (1.2.8), (1.2.16), (1.2.21) and (1.2.22) can all be expressed as functions of the roots of certain determinantal equations. There are two other tests of hypotheses, due to Roy (1939) and Hotelling (1936), which also result in the roots of the same type of determinantal equations with, of course, the use of different matrices. Roy's effort (1939) to seek a statistic to test the equality of dispersion matrices $\Sigma_1$ and $\Sigma_2$ of two p-variate normal populations finally led him, applying the same technique as Fisher's (1936), to test, instead of one, p Studentized statistics $\lambda_1, \lambda_2, \dots, \lambda_p$ (all positive in this case) which are the p roots of the determinantal equation in $\lambda$:
$$|n_1W_1 - \lambda\,n_2W_2| = 0 \tag{1.4.1}$$
or alternatively, by substituting $\theta_i = \frac{\lambda_i}{1+\lambda_i}$ ($i = 1, 2, \dots, p$), the roots of
$$|n_1W_1 - \theta(n_1W_1 + n_2W_2)| = 0 \tag{1.4.2}$$
where $n_1W_1$ and $n_2W_2$ are the S.P. matrices estimated from the respective samples. To test the hypothesis of the independence of two sets of variates, such as p measurements of physical characteristics such as lengths and breadths of skulls and q measurements of mental characteristics such as scores on intelligence tests, Hotelling (1936) considered the determinantal equation of the roots $\theta_i$ ($i = 1, 2, \dots, p$) and ($p < q$) of
$$|W'_{pq}W'^{-1}_{qq}W'_{qp} - \theta\,W'_{pp}| = 0 \tag{1.4.3}$$
or
$$\left|W'_{pq}W'^{-1}_{qq}W'_{qp} - \theta\left[(W'_{pp} - W'_{pq}W'^{-1}_{qq}W'_{qp}) + W'_{pq}W'^{-1}_{qq}W'_{qp}\right]\right| = 0 \tag{1.4.4}$$
Here $W'_{pq}W'^{-1}_{qq}W'_{qp}$ and $W'_{pp} - W'_{pq}W'^{-1}_{qq}W'_{qp}$ are independent S.P. matrices with q and $(N - q - 1)$ D.F., and N is the size of the sample of individuals drawn from a $(p + q)$-variate normal population with covariance matrix $\Sigma$. Further $W'_{pp}$ is the S.P. 
matrix of the sample observations on the PP p-set of variates, W1 that on the q-set and W that between the qq qq observations on the p-set and those on the q-set, Thus in multivariate Anova (Pill a i , 1954) the three tests of hypotheses above, i.e. I, \"equality of two dispersion matrices\", II, \"equality of the p-dimensional mean vectors\", and III, \"the independence between a p-set and q-set of variates\" depend, when the respective hypotheses to be tested are true, only on the roots 0^ or 0^ ( i = 1 , 2, . . . . , & ) respectively of the determinantal equations | A - 0 ( A * C)| = 0 (1.4.5) and |A - 0G | = 0 (1,4.6) where A and C\" are independent S.P, matrices based on sample observations with n^ and D.F. respectively and can be defined differently for different hypotheses. The common standard form (Nanda 1948, Roy 1957) of the joint distribution of the eigenroots of (1 .4 .5) , under the respective hypotheses, is A m ' ^ ' 1 TT TT i = » i * 2 j=l i = i for. 0 £ &L 5 9 2 < ... < 0 i 1 and £ defined as in (1.2.11), where -21-t/2 l c(m,m, C ) - ~ | ~. . . (1.4.8) J [ p ( » ^ * i ) \\ > y - i ) f ( i ) i»l • £ , m, n can be different in the different situations defined below in (1.4.12) and (1.4.13). The common standard form (Hsu 1939) of the joint distribution of the eigenroots of (1.4.6), under the respective hypotheses, is l l ^ t cKn.n IT ^ ( i * * t r ( B * , , * , * 1 ) T T TTt'i-'V T W i=l i s 2 > l »\"=i (1.4.9) for 0 « jl. & jrf. « ... t ^ c o o 1 2 p where 2 , c(m,n,t ) are defined respectively as in (1.2.11) and (1.4.8) and t , m, n can be different in the different situations defined below in (1.4.12) and (1.4.13). •Finally, Nanda (1948) gives the limiting form of (1.4.7) by setting © i= and then letting n tend to infinity. The limit is 1 I I i - i « K(t,m) ]~[ ^ e x p j \" - ^ c i ] T T T T ( c i ~ C j ' T T d c i d-4.10) ' 1*1 ^ i-1 J i=2 j.-l i=l where K( I ,m) = ^ \\ J] P 2 J ^ L j L l ± ) ^ ( | ) ( l . 
4 e l l ) e i 1 -22-and I i s the same as in (1.211). Again g , m assume different values defined below in different cases. Finally, for the three tests of hypotheses I, II and III, we can sum up the values of £ , m, n for respective hypotheses as: I. I = p, m = - p - 1), n - |(n 2 - p - 1), (1.4.12) II. If p 6 x^, ft a p, m = 1(1^ - p - 1), n = |(n 2 - p - l) If p > n , £ = n . m = i(p - n - 1), n = i( n - p - l ) 1 1 2 1 2 2 (1.4.13) III. Same as II No great headway has been made so far in finding the distributions of the various statistics we have discussed. The exact or approximate distributions of two statistics T2. and .A. have already been discussed in Section (1.2). Below i s the brief account of the other statistics: Roy (1943) proposed the statistics - largest, smallest or intermediate eigenroots of the determinantal equation (1.4.5) to test hypothesis I, II and III. Roy (1943) and Nanda (1948) have both worked out the distributions both for the limiting and non-limiting cases. Their tabular values have been given by P i l l a i (1957) for the cases 0 = 2(1)5, m = 0(1)4 and n = 5 to 1,000 both at 5% and 1$ significance values. P i l l a i (1954, 1955, 1959) has succeeded in giving an approximation to his statistic V defined in (1.2.22) and has been able to tabulate i t for t = 2(1)5, m = - ,5(.5)5(5)80 and n = 5(5)80. Nanda (1950) has also given its exact distribution for the special case when m = 0. -23-We have also been able to work out the exact distributions of various statistics for certain special cases in Chapter Seven. We have been able to give the distributions of a l l the statistics for the cases Q> - 2, 3 in the form of definite integrals which can be easily evaluated by some numerical method. The limiting distribution of Roy's statistics by another method of integration have been found and particular cases evaluated. Lastly the limiting distribution of Wilks-Lawley U-statistic for the cases II = 2, 3 and 4 has also been found. 
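The link between the two determinantal equations (1.4.5) and (1.4.6) of this section can be checked numerically. The following Python sketch, restricted to the bivariate case, solves |A − φC| = 0 as a quadratic in φ and converts each root by θ = φ/(1 + φ), the same substitution that leads from (1.4.1) to (1.4.2). The function names and the two example matrices are illustrative assumptions and are not taken from the thesis.

```python
import math


def det2(m):
    '''Determinant of a 2 x 2 matrix given as nested lists.'''
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def phi_roots(a, c):
    '''Roots of |A - phi C| = 0 for 2 x 2 matrices, in ascending order.

    Expanding the determinant gives |C| phi^2 - b phi + |A| = 0.
    Assumes C is positive definite and A symmetric, so both roots are real.
    '''
    b = a[0][0] * c[1][1] + a[1][1] * c[0][0] - a[0][1] * c[1][0] - a[1][0] * c[0][1]
    disc = math.sqrt(b * b - 4.0 * det2(c) * det2(a))
    return sorted([(b - disc) / (2.0 * det2(c)), (b + disc) / (2.0 * det2(c))])


def theta_roots(a, c):
    '''Roots of |A - theta (A + C)| = 0 obtained from theta = phi / (1 + phi).'''
    return [phi / (1.0 + phi) for phi in phi_roots(a, c)]


def residual(a, c, theta):
    '''Value of |A - theta (A + C)|; it should vanish at a root.'''
    m = [[a[i][j] - theta * (a[i][j] + c[i][j]) for j in range(2)] for i in range(2)]
    return det2(m)


# Made-up S.P. matrices, used only to exercise the functions.
A = [[4.0, 1.0], [1.0, 3.0]]
C = [[2.0, 0.5], [0.5, 1.0]]
phis = phi_roots(A, C)
thetas = theta_roots(A, C)
```

With these matrices the quadratic is 1.75 φ² − 9 φ + 11 = 0, whose roots are 2 and 22/7, so the corresponding θ values are 2/3 and 22/29; each of them makes the residual determinant vanish up to rounding.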
1.5 Note on Analysis of Variance

Under both the Models I and II (Eisenhart) of Anova one is faced with two types of situations - firstly when the cell frequencies are equal and secondly when they are unequal. These cases are usually called balanced and unbalanced respectively.

Balanced Anova

For tests of significance in both univariate and multivariate balanced Anova of Model I and II and further for finding confidence regions again in both univariate and multivariate balanced Anova of Model I, there is not much difficulty. One can refer for such univariate problems to the various standard books, e.g. by Federer, Fisher, Anderson and Bancroft, Bennett and Franklin, Snedecor, Kempthorne and others, whereas for the multivariate problems sufficient material has been developed by Roy and Bose (1953), Roy (1955, 1956), Roy and Gnanadesikan (1959, I and II), Tukey (1949), Bartlett (1934, 1938, 1947), Kempthorne (1952), Rao (1948) and others.

The real difficulty arises in both univariate and multivariate problems when, in Model II, one is finding the confidence regions for the complex estimates (Satterthwaite, 1946) of the variance components, since in that case their corresponding distributions are not known. To overcome this difficulty in univariate problems various methods, approximate or otherwise, have been suggested. The more prominent amongst them are those due to Satterthwaite (1941, 1946), Bross (1950), Fisher (1935), Roy (1954a, 1954b, 1956), Roy and Bose (1953), Roy and Gnanadesikan (1957, 1959 I and II), Cornfield (1953), Ramachandran (1956) and Graybill, Morton and Godfrey (1956). Since we have made use of Satterthwaite's technique in our work in Chapter Eight, we briefly summarize what he did while finding the distribution of complex estimates.

Satterthwaite's Procedure

Let the mean squares, denoted here by s_i², be independently distributed as λ_i χ²_{f_i}, where χ²_{f_i} is a central chi-square with f_i D.F. The procedure is to approximate Σ_i (a_i s_i²), a_i being constants, by g χ²_f, g and f being chosen so that the first two moments of the former are equal to those of the latter. Therefore,

g f = Σ_i a_i λ_i f_i     (1.5.1)

and

g² f = Σ_i a_i² λ_i² f_i     (1.5.2)

From (1.5.1) and (1.5.2) we have

f = (Σ_i a_i λ_i f_i)² / Σ_i (a_i² λ_i² f_i)

Since the λ_i are not known, he suggests substituting for them their respective estimates s_i² / f_i and gets:

f = (Σ_i a_i s_i²)² / Σ_i [(a_i s_i²)² / f_i]     (1.5.3)

It is again unfortunate that very little has been accomplished in analogous multivariate problems. Roy and Gnanadesikan (1959, I and II) have recently been able to give a lead, but their approach is under the very restrictive assumption Σ_i (p × p) = σ_i² Σ (p × p), i.e. of proportional dispersion matrices, proposed usually (Federer, 1951) for certain types of genetical problems, where Σ_i is the covariance matrix due to the ith factor.

Unbalanced Anova

The problem is considerably complicated for both the cases of univariate and multivariate unbalanced Anova, especially of Model II. In the univariate balanced case the mean squares were independent and distributed independently as chi-square, but the situation now is worsened by the fact that the mean squares are not orthogonal and hence are not distributed as central chi-squares. They are in fact distributed (Anderson and Bancroft, 1952) as sums Σ_r (λ_r χ_r²), where the λ_r are functions of the variance components and the number of observations, and each χ_r² is a central chi-square with 1 D.F. Since the λ_r are distinct, we cannot apply the additive property of independent chi-squares to the sums Σ_r (λ_r χ_r²).

Similarly for corresponding multivariate situations, the M.P. matrix is no longer distributed as a Wishart matrix but, as proved in Chapter Eight, is distributed as a sum Σ_r (W_r) of independent Wishart matrices W_r, each distributed as W[Σ_r, 1]. If these Wishart matrices W_r had the common corresponding parameters, i.e. Σ_r = Σ (say), then there would be no problem.
We could then simply use the additive property of independent Wishart matrices and would get another Wishart matrix.

We have attempted, in Chapter Eight, to find the approximate distribution of mean squares or M.P. matrices. We have determined first the values of the above quoted quantities λ_r and Σ_r and then have applied Satterthwaite's technique in approximating the distributions of the sums Σ_r (λ_r χ_r²) and Σ_r (W_r).

CHAPTER TWO

ANALOGUES OF DUNCAN'S PROCEDURE IN FORMING CLUSTERS IN MULTIVARIATE ANOVA

2.1 As already stated, we sometimes come across the following type of problem in anthropology and the biological sciences: certain multivariate populations are found to be distinct, and we want to find out which populations are most nearly alike and which are least alike. To do this, we propose to extend Duncan's procedure of the multiple comparisons' tests used in univariate Anova and to seek a departure from Rao's and Tocher's subjective approach. We give below first a different definition of the cluster and then, after clearing some preliminaries, suggest a procedure based on probabilistic considerations.

Definition of a cluster: \"A cluster of populations is a group of populations having the same vector mean.\"

2.2 Preliminaries and Procedure

Suppose we are given k p-variate normally distributed populations assumed to have the same dispersion matrix Σ. Let X_irh (i = 1, 2, ..., p; r = 1, 2, ..., k and h = 1, 2, ..., N_r) be the observation of the ith trait on the hth individual from the rth sample of size N_r drawn from the rth population. Further, let B and W be the between and within independent S.P. matrices, with n_1 and n_2 D.F. respectively, computed on the basis of k p-variate samples and defined respectively in (1.2.2) and (1.2.1). Suppose also the hypothesis of homogeneity of mean vectors of the populations has been rejected by the use of Wilks' Λ-statistic (1.2.15) and Bartlett's approximation to its probability (1.2.18).
Knowing thus that the populations are heterogeneous, we proceed to form clusters. Before doing this we make the following preliminary remarks:

Since we have made frequent use of both Studentized D² and T², it would be appropriate to modify them to an easily workable form. To do this we derive first the significant discriminant scores discussed already in Section (1.2). We sum the matter up briefly in the following steps:

(a) Find, by the method given in Appendix A, a nonsingular matrix L (p × p) and the diagonal matrix Φ (p × p) as the solution of (1.2.24).

(b) Test the significance of φ_i by the formula (1.2.25). Without losing generality suppose the first p' (≤ p) of the p eigenroots are significant and the last (p − p') are non-significant. Discard the last (p − p') eigenroots, and hence the corresponding eigenvectors, because they in fact account for random variation.

(c) Obtain the matrix K (p' × p) of the eigenvectors whose first row corresponds to the largest eigenroot, its second to the second largest and so forth to the smallest one left, namely the p'th.

(d) Taking X̄ (k × p) to be the matrix of k sample mean vectors, using columns for characters and rows for sub-population samples, compute the matrix Y (k × p'), defined as in (1.2.26), which is the matrix of significant discriminant scores, and whose first column gives the discriminant score corresponding to the largest eigenvalue, the second column to the second largest, and so forth.

With these scores, the Studentized statistics D² and T² reduce from (1.1.2) and (1.2.6) respectively to:

D²_rs = Σ_{i=1}^{p'} (Y_ir − Y_is)²     (2.2.1)

and

T²_{k_1} = Σ_{i=1}^{p'} Σ_{r=1}^{k_1} N_r (Y_ir − Ȳ_i)²     (2.2.2)

where

Ȳ_i = Σ_{r=1}^{k_1} N_r Y_ir / Σ_{r=1}^{k_1} N_r     (2.2.3)

Note: The same technique works for the corresponding classical D².

Statistic Used

For testing the hypothesis of equality of the mean vectors involved in a cluster we suggest an analogue of Duncan's Stage 2 of the multiple F-test.
He computed the variance of the means involved in a predicted group of like means and tested it against his least significant sums of squares with type I error based on D.F. In the multivariate situations, as the analogue of his \"variance of the means involved in a cluster\", we propose an expression T²_{k_1}, where k_1 (≤ k) is the number of sample mean vectors of the populations involved in the predicted cluster. The distribution of T²_{k_1}, under the null hypothesis, is known in the classical case to be central chi-square with p(k_1 − 1) D.F. and in the Studentized case to be an asymptotic expression involving chi-squares as shown in (1.2.14), where again the D.F. for chi-square is p(k_1 − 1).

Note: It should be noted that we have used p instead of p' for defining degrees of freedom, since (Rao, 1948) the effect of all p correlated variates has been taken care of by the discriminant scores.

Level of Significance or Protection Level

In selecting the level of significance or protection level we again propose to follow Duncan. In order to keep the two types of errors well balanced, we shall let the type I error increase with the increase in the number of populations in a cluster. Thus with k_1 (≤ k) populations in a cluster, for a pre-assigned significance level α, we shall fix the level of significance to be:

α_{k_1} = 1 − (1 − α)^{k_1 − 1}     (2.2.4)

Preparation of Tables for the New Levels

Since the statistic T²_k involves a central chi-square for both the Studentized and classical cases, we need to modify the central chi-square tables for both 5% and 1% significance levels and also for different values of k = 2(1)20. To do it, we proceed as follows:

Table 1 below gives the various significance levels 1 − γ_k (= α_k or = Q) for k = 2(1)20, for pre-assigned significance levels 5% and 1%. Again Table 1 gives under the column X the normal variates X corresponding to each level of significance Q. X has been used in the computation of tabular values of chi-squares.
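The protection levels of (2.2.4) and the normal variates entered in Table 1 are easy to regenerate. The Python sketch below is only illustrative: it obtains each normal variate from the exact inverse of the normal distribution function in the standard library rather than by the linear interpolation in printed tables used in the thesis, and the function name protection_table is an assumption made here.

```python
from statistics import NormalDist


def protection_table(alpha, k_max=20):
    '''Rows (k, gamma_k, Q, X) for k = 2, ..., k_max.

    gamma_k = (1 - alpha)^(k - 1) is the protection level, Q = 1 - gamma_k the
    significance level of (2.2.4), and X the normal variate whose upper-tail
    probability equals Q.
    '''
    rows = []
    for k in range(2, k_max + 1):
        gamma_k = (1.0 - alpha) ** (k - 1)
        q = 1.0 - gamma_k
        x = NormalDist().inv_cdf(gamma_k)  # P(Z > x) = q
        rows.append((k, gamma_k, q, x))
    return rows


five_percent = protection_table(0.05)
one_percent = protection_table(0.01)
```

Because the printed entries were worked by hand and by interpolation, agreement with Table 1 should be expected only to the first few decimals, not in every digit.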
To compute these X values, a linear interpolation formula:

X = X_0 + [f(X) − f(X_0)] / [f(X_1) − f(X_0)] · (X_1 − X_0)     (2.2.5)

has been used, where X is the normal variate to be determined between the two known normal variates X_0 and X_1 and where also f(X) (= Q) is a known quantity and f(X_0) and f(X_1), corresponding respectively to X_0 and X_1, are taken from Table I of Hartley and Pearson, 1954.

Table 1

                      5%                                          1%
 k    γ_k = 1 − Q   1 − γ_k = Q      X          γ_k = 1 − Q   1 − γ_k = Q      X
 2    0.9500        0.05000000     1.64490      0.9900        0.0100         2.32630
 3    0.9025        0.09750000     1.29600      0.9801        0.0199         2.05584
 4    0.85737500    0.14262500     1.06860      0.970299      0.029701       1.88523
 5    0.81450625    0.18549375     0.89450      0.96059601    0.03940399     1.75766
 6    0.77378094    0.22621906     0.75136      0.95099005    0.04900995     1.65455
 7    0.73509189    0.26490811     0.62830      0.94148015    0.05851985     1.56729
 8    0.69833729    0.30166271     0.51960      0.93065349    0.06934651     1.48068
 9    0.66342043    0.33657957     0.42180      0.92134695    0.07865305     1.41421
10    0.63024941    0.36975059     0.33250      0.91213348    0.08786652     1.35403
11    0.59873694    0.40126306     0.25008      0.90301215    0.09698785     1.29891
12    0.56880009    0.43119991     0.17330      0.89398202    0.10601798     1.24800
13    0.54036008    0.45963992     0.10140      0.88504220    0.11495780     1.20058
14    0.51334208    0.48665792     0.03350      0.87619178    0.12380822     1.15617
15    0.48767497    0.51232503    -0.03090      0.86742986    0.13257014     1.11434
16    0.46329122    0.53670878    -0.09220      0.85875556    0.14124444     1.07476
17    0.44012666    0.55987334    -0.15065      0.85016800    0.14983200     1.03717
18    0.41812033    0.58187967    -0.20670      0.84166632    0.15833368     1.00134
19    0.39721431    0.60278569    -0.26060      0.83324966    0.16675034     0.96710
20    0.37735359    0.62264641    -0.31244      0.82491716    0.17508284     0.93428

[Figure 1: chi-square curves for various degrees of freedom]

Further, the study of the behaviour of the chi-square curves (Fig. 1) for various degrees of freedom is very helpful. From Fig. 1 it is obvious that with the increase of degrees of freedom, chi-square curves tend to be symmetrical while for the smaller degrees of freedom they lack symmetry.
Thus direct linear interpolation of (1 − Q) values along with the corresponding chi-square values (especially for the smaller degrees of freedom) cannot be expected to lead us to accurate results. To keep the accuracy for the smaller degrees of freedom and also the uniformity of method, we have decided to use, instead of (1 − Q) values, the corresponding normal variates X shown in Table 1. Then, Aitken's iterative interpolation formula has been used to compute the tabular chi-square values. We give below a demonstration of the method for 3 D.F. against the normal value 2.055844. Then some of the values have been actually computed both by the use of (1 − Q) values and the corresponding X-variates and have been listed below in Table 2. A brief glance over Table 2 will show that as the degrees of freedom increase, both methods lead approximately to the same result.

Demonstration of the Method

Let D.F. = 3, X = 2.055844 and the χ² corresponding to X is to be found.

X_i       χ²         Successive interpolates             X_i − X
1.6449    7.81473                                        -0.410944
1.9600    9.3484     9.81489                             -0.095844
2.3263   11.3449     9.94373    9.84860                   0.270456
2.5758   12.8381    10.03228    9.84872    9.84846        0.519956

Thus adopting Aitken's iterative method for interpolation, the new chi-square values have been computed at the various significance levels α_k for k = 2(1)20 and D.F. = 1(1)30(10)100 for pre-assigned α = .05 and .01. We record them for use in Appendices C and D respectively.

Table 2

D.F.   1 − Q    χ² corresponding    X, normal    χ² corresponding
                to Q-values         variate      to normal variates
  3    .9801     9.71768            2.055844      9.84846
 10    .9703    19.88597            1.885233     19.95269
 25    .9703    39.90252            1.885233     39.92268

Finally, to find in the Studentized case the tabular values for any k, we have to use the formula (1.2.14) and substitute in it the newly computed chi-square values with n_1 p D.F., where n_1 and n_2 are the degrees of freedom respectively for the between and within independent covariance matrices and p is the number of characters.
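The Aitken scheme demonstrated above for 3 D.F. can be retraced with a few lines of Python. The sketch below is a plain implementation of Aitken's iterative (successive linear) interpolation; the function name aitken_interpolate is introduced here for illustration, and the four nodes are the normal variates and chi-square values of the demonstration.

```python
def aitken_interpolate(xs, ys, x):
    '''Aitken's iterative interpolation of the tabulated values ys at the point x.'''
    p = list(ys)
    n = len(xs)
    for j in range(1, n):
        # After pass j, p[i] (i >= j) is the interpolate through nodes 0..j-1 and i.
        for i in range(j, n):
            p[i] = (p[j - 1] * (xs[i] - x) - p[i] * (xs[j - 1] - x)) / (xs[i] - xs[j - 1])
    return p[n - 1]


# Normal variates and chi-square values (3 D.F.) taken from the demonstration.
normal_variates = [1.6449, 1.9600, 2.3263, 2.5758]
chi_square_3df = [7.81473, 9.3484, 11.3449, 12.8381]
chi_square_at_x = aitken_interpolate(normal_variates, chi_square_3df, 2.055844)
```

The first pass reproduces the column 9.81489, 9.94373, 10.03228 of the demonstration, and the final value agrees with the tabulated 9.84846 to about four decimals.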
Since our illustration, which is presented for demonstration, concerns the Studentized T²_k, its tabular values needed for the purpose for k = 2(1)5, n_1 = 1(1)4, p = 4, and n_2 = 29 at 5% and 1% significance levels are tabulated approximately and presented below in Table 3.

Table 3

k    D.F. = p(k − 1) = n_1 p    χ²_k (.05)    χ²_k (.01)    T²_k (.05)    T²_k (.01)
2              4                  9.4877       13.2767       12.1371       18.2030
3              8                 13.4428       18.1825       16.7783       24.0936
4             12                 17.1889       22.7746       21.7064       29.9100
5             16                 20.8200       27.1912       25.6131       35.6187

Note: The tabular T²_k values have been computed on the assumption that terms involving the third and higher powers of 1/n_2 are negligible. In fact they may affect the fourth significant figure.

2.3 The Proposed Stages for Forming Clusters

We propose two stages for the purpose. Stage I comprises three steps wherein we predict the possible clusters. Stage II then corrects the predictions on some probabilistic basis. So far three alternative methods have been proposed for Stage II. The first has been discussed with illustration in this very Chapter and the other two will be described in Chapter Three. The methods are as follows:

(i) The Duncan-Hotelling test.
(ii) The 'Extreme Distance from the Mean' - E-test.
(iii) The 'Largest Distance' - R-test.

Stage I: Prediction

Step 1: Compute the k(k − 1)/2 Mahalanobis distances by the formula (2.2.1) between all the pairs of k populations and set up the table of distances, where the distances of each population from the remaining ones are arranged in order of increasing magnitude. Such a table (like Table 7) will help us to visualize which of the populations are closer to a particular one and which are farther away.

Step 2: Represent graphically the significant discriminant scores of each population. For p' > 2, they should be represented pair-wise on plane graph paper.
Relying largely on the plane representations of the most significant discriminant scores, visualize which of the populations cluster together and which of them lie farther apart.

Step 3: Step 3 deals with the prediction of the clusters on the basis of the first two steps. Keeping in view the table of distances and the graphic plane representations, estimate roughly the 'would be' clusters - closeness being the only criterion for the populations to form a predicted cluster. The following two points are worth noting:

(i) That a wide range should be allowed to the clusters, since giving a narrow range might result in the loss of a population lying actually in the cluster.

(ii) That overlappings should be allowed, since sometimes one is uncertain as to whether to include one (or more) population(s) in one or the other cluster(s). In all such cases it is advisable to include the doubtful cases in all the neighbouring ones.

Stage II: Correction by the Duncan-Hotelling Test

No generality is lost if we explain the procedure for only one predicted cluster having k_1 populations in the following steps:

(i) Compute the statistic T²_{k_1} by the formula (2.2.2).

(ii) Compare the computed T²_{k_1} with the tabular T²_{k_1} at the level α_{k_1}, where α_{k_1} is already defined as in (2.2.4).

(iii) If the computed T²_{k_1} is less than or equal to the tabular T²_{k_1}, all the k_1 populations are concluded to form a cluster. Otherwise, split the k_1 populations into k_1 sets of (k_1 − 1) populations each.

(iv) Compare the computed T²_{k_1 − 1} values for each of the k_1 sets with the tabular T²_{k_1 − 1}. Of these some may be significant and some may not be. Those non-significant will yield clusters with the corresponding number of populations involved in them. Those for which the T²_{k_1 − 1} values are significant are further split into (k_1 − 1) sets of (k_1 − 2) populations each and their corresponding T²_{k_1 − 2} values are compared with the tabular T²_{k_1 − 2}. In this way the process is continued till we arrive at the clusters of the type defined.

Thus the working criterion analogous to Duncan's can be presented as:

\"A group of k_1 populations will form a cluster if T²_{k_1} computed for the mean vectors of the k_1 populations is non-significant and also the T²_r of each and every set of populations of which the k_1 populations form a subset is significant according to an α_r-level T²_r-test for some pre-assigned α, where r is the number of populations in the set.\"

Note: The above procedure is for the Studentized case. In the classical case the procedure is the same except for the use of tabular chi-square values in place of T²-values.

2.4 Demonstration of the Above Procedure by an Example

To demonstrate the theory we present below an example where the samples have been drawn on the basis of nested sampling:

Description of Data

Data has been taken from the 'Forest Products Laboratory Division, Forestry Branch, Department of Northern Affairs and National Resources, Vancouver, B.C., Canada'. Shipments of logs of various species of trees from various localities of Canada were received. The interest lies in comparing the species on the basis of static bending properties. For this purpose the following six measurements were taken at several locations in each tree:

X_1: Modulus of elasticity;
X_2: Work to the maximum limit;
X_3: Fibre strength at proportional limit;
X_4: Modulus of rupture;
X_5: Specific gravity at oven dry; and
X_6: Work to the proportional limit.

Note: While finding the values of the determinants of the S.P. matrices to be used for tests of significance, it was found that they came out to be zeros, which enabled us to conclude that the variables were functionally dependent. The fact was actually verified when the physical interpretation was known. The last two variables X_5 and X_6 were found to be functionally dependent on the first four, X_1, X_2, X_3, and X_4.
We thus discarded X_5 and X_6 and continued our work on the variables X_1, X_2, X_3, and X_4.

The species taken for the purpose are listed as follows: (1) Yellow cedar, (2) Lodge pole pine, (3) Western larch, (4) Western yellow pine, (5) Western white pine, (6) Western white spruce, (7) Sitka spruce, (8) Amabilis fir, (9) Western hemlock, (10) Engelman spruce, (11) Western red cedar, (12) Coast mature Douglas fir, (13) Interior mature Douglas fir, and (14) Coast second growth Douglas fir.

Note: In what follows we will call each species by its corresponding number instead of specifying each time its name.

Description of the Model of Nested Sampling

We have the mixed model of nested sampling - with fixed species and random localities and locations on trees. Further, the number of localities and locations is not uniform in all cases. Let X_{ihjtℓ} be the observation of the ith character on the ℓth location of the t-th tree belonging to the jth locality of the hth species. In place of the observations X_{ihjtℓ} we were provided with the means X̄_{ihjt.} along with the corresponding number of locations. The model for such data would be:

X̄_{hjt.} = μ + s_h + l_{j(h)} + t_{t(hj)} + ē_{hjt.}     (2.4.1)

where

(1) X̄_{hjt.} = (X̄_{1hjt.}, ..., X̄_{4hjt.}) is a four dimensional mean vector of locations on the t-th tree from the jth locality of the hth species.

(2) μ is the four dimensional mean vector of the populations and X̄_{....} is the corresponding sample statistic.
(3) s_h is again the four dimensional hth species fixed effect, but for the sake of illustration we will take it as random, distributed normally with mean vector zero and covariance matrix Σ_s.

(4) l_{j(h)} is the four dimensional jth locality within hth species random effect, normally distributed with mean vector zero and covariance matrix Σ_l.

(5) t_{t(hj)} is the four dimensional t-th tree within hth species from the jth locality random effect, normally distributed with mean vector zero and covariance matrix Σ_t.

(6) ē_{hjt.} is the four dimensional mean error vector of the e_{hjtℓ}, where each e_{hjtℓ} is random and normally distributed with mean vector zero and covariance matrix Σ_e.

(7) Finally, s_h, l_{j(h)} and t_{t(hj)} are independent [the remainder of this sentence is not legible in the source].

Our model is just the analogue of the univariate model on nested sampling with unequal cell frequencies presented by Ganguli (1941). We follow his method for finding the coefficients of the expected M.P. matrices and end with Table 4 of analysis of variance.

Table 4

Source of Variation          D.F.   S.P. Matrices   E(M.P. Matrices)
Species                        13        A          Σ_e + 13.38 Σ_t + 81.27 Σ_l + [coefficient not legible] Σ_s
Localities within species      29        B          [...] + 81.26 Σ_l
Trees within localities       217        C          [...] 13.37 Σ_t [...]
Locations                    3248        D          We do not have this row in our example since we have only the mean observations on each tree.

Here

A = ( Σ_h N_{h..} (X̄_{i_1 h...} − X̄_{i_1 ....}) (X̄_{i_2 h...} − X̄_{i_2 ....}) ) = (13)

    10675527      38557     30971647     53851101
       38557        305       156717       273320
    30971647     156717    121780733    201012595
    53851101     273320    201012595    343055522

B = ( Σ_h Σ_j N_{hj.} (X̄_{i_1 hj..} − X̄_{i_1 h...}) (X̄_{i_2 hj..} − X̄_{i_2 h...}) ) = (29)

      988308       1397      1936541      3167949
        1397         21         6231        12721
     1936541       6231      7821366      9469922
     3167949      12721      9469922     15396656

and

C = ( Σ_h Σ_j Σ_t N_{hjt} (X̄_{i_1 hjt.} − X̄_{i_1 hj..}) (X̄_{i_2 hjt.} − X̄_{i_2 hj..}) ) =

      299438        558       593421       994326
         558          7         1496          313
      593421       1496     21011669      2575188
      994325        313      2575188      4281234

Note: Referring back to Table 4 showing the analysis of variance, we notice that the corresponding coefficients in the formula for expected values are approximately equal. Thus we will treat it as a problem of nested sampling with equal numbers in the sub-classes and will proceed with the usual procedure of tests of significance.

To test the locality effect, Wilks' Λ-criterion was applied to the independent S.P. matrices B and C, with 29 and 217 D.F. respectively, and the locality effect was found to be significant by Bartlett's approximate test (1.2.18). Similarly the species effects were found to be significant upon taking the independent S.P. matrices A and B respectively with 13 and 29 D.F. From this we may conclude that the species are heterogeneous.

Start of the Problem

After concluding that the fourteen species are heterogeneous, we proceed to our main problem of forming clusters as follows:

We treat A and B respectively as the between and the within matrices with 13 and 29 D.F. and present below in Table 5 the means of the characters of the species along with the corresponding sizes:

Table 5

Species No.   Size    X̄_1    X̄_2    X̄_3    X̄_4
     1         264    1311    8.04    3664    6527
     2          78    1285    5.35    2989    5657
     3         158    1648    7.85    5002    8609
     4         212    1137    5.45    3334    5718
     5         324    1183    5.13    2877    4818
     6          93    1113    5.76    2644    4831
     7         380    1368    4.84    3078    5408
     8         436    1314    5.57    2999    5460
     9         200    1477    6.68    4150    6952
    10          90    1251    5.36    3079    5662
    11         207    1046    4.87    3102    5302
    12         458    1650    6.97    4491    7548
    13         348    1647    6.59    4099    7351
    14         260    1583    7.41    4285    7697

We solve the equation (1.2.24) for L (4 × 4) and Φ (4 × 4) by the method described in Appendix A, and get:

L (4 × 4) =

    -0.001064336    0.001162664    0.001336923    0.000325045
    -0.158567182    0.369050519   -0.069180685    0.023168191
    -0.000067004    0.000134634   -0.000181461    0.000669918
     0.000590678   -0.000497864   -0.000028873   -0.000460445

and

Φ (4 × 4) =

    25.94     0       0      0
     0       11.84    0      0
     0        0       5.65   0
     0        0       0      1.65

Applying Bartlett's modified first approximation test (1.2.25) we test the significance of the eigenroots φ, i.e. of 25.94, 11.84, 5.65 and 1.65, and find 1.65 to be non-significant at the 5% level. Discarding thus the last row of L (4 × 4), which corresponds to 1.65, we get the matrix K (3 × 4). Now, if X̄ (14 × 4) is the matrix of mean vectors of species given in the last four columns of Table 5, we get, by the formula (1.2.26), the matrix Y (14 × 3) of significant discriminant scores, which are presented below in Table 6, again along with their corresponding sample sizes. (See Table 6.)

Finally we compute the distances between the 14(13)/2 = 91 pairs of species of trees by the formula (2.2.1) and present them in Table 7 - called \"Table of Distances\" - arranging the distances of each population from the remaining ones in order of increasing magnitude. (See Table 7.) Also we plot these points pair-wise, i.e. (Y_1, Y_2), (Y_1, Y_3) and (Y_2, Y_3), on the plane graphs which are shown respectively in Fig. 2, Fig. 3, and Fig. 4.

Table 6

Species No.
Size Y, Y0 Y 1 264 0.94597083 1.72039748 0.34593229 2 78 0.91725592 1.07290071 0.63864788 3 158 •., 1.74328671 1.21889759 0.50048469 4 212 1.07183611 O.95381032 0.36949996 5 324 0.58531407 1.24622259 0.56758425 6 93 0.57211123 1.38532958 0.46747816 7 380 0.75515749 1.12082710 0.77524229 8 436 0.70890607 1.31124555 0.70355287 9 200 1.19390269 1.28747391 0.55733533 10 90 0.95036642 1.04299783 O.57671666 11 207 1.03365387 0.80245391 0.34345856 12 458 1.29139812 1.34851322 0.68878220 13 348 1.26791986 1.24270782 0.78926436 14 260 1.40109551 1.31631867 0.60461486 Note: The column under Y^ corresponds to the largest significant discriminant score, the column under Y g to the second largest and that under Y^ to the third largest significant score. Table 7 (Table of Distances) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 6/ .2669 10/ .0058 14/ 11/ 6/ .1375 .0250 .0296 5/ .0296 8/ .0436 .0380 12/ .2/ 4/ 13/ 12/ 12/ .0306 .0058 .0250 .0119 .0119 .0202 9/ .2936 77' .0472 12: 10 8 .2565 .0657 .0380 8 .0800 2 .0472 7 .0436 14 4 10 14 14 .0460 .0657 .1193 .0202 .0573 9 .0460 8 .3516 8 .1048 9 2 7 .3098 .1106 .0877 7 .1983 10 .0836 6 .0800 13 7 2 9 9 .0613 .0836 .1739 .0306 .0613 13 .0573 12 .3752 4 .1106 13 9 2 .3100 .1616 .1453 2 .2427 5 .0877 2 .1048 10 11 9 2 2 .1195 . 
H 9 3 .3067 .2185 .1745 3 .1375 5 ,4o4i 9 .1292 4 7 10 .5384 .2929 .1747 1 .2669 6 .1983 10 .1464 2 9 13 10 10 , 10 .1292 .1195 .3475 .2223 .i860 .2698 l 4 .4374 5 .1453 10 14 4 .6655 .2951 .3614 10 .2723 9 .2675 9 .2572 4 8 7 3 7 .1616 .1464 .3654 .2565 .2780 2 .2946 2 .5078 11 .1739 11 13 9 .7018 .2984 .3722 9 .4043 13 .2780 13 .3245 8 5 5 4 4 .2572 .1747 .4483 .3060 .2984 4 .2951 10 .5122 13 .1745 2 12 1 .7232 .3060 .4o4l 4 .4457 4 .2929 12 .3409 7 13 14 8 3 .2675 .i860 .4676 .3409 .3100 1 .4374 13 .5284 12 .2185 1 5 11 .9112 .3614 .4483 12 .5677 12 .3470 1 .3516 1 12 12 7 8 11 .2936 .2223 .4840 .3470 .3245 .4674 7 .5802 6 .2427 7 8 13 I.0615 .3712 .5151 11 .5681 11 .3654 4 .3712 11 14 8 1 11 .3067 .2698 .4941 .3752 .3475 7 .4847 4 .6o4i 14 .2946 8 6 12 1.1198 .4457 .5237 13 .6080 14 .4847 14 .4890 3 6 6 11 5 .3098 .2723 .5681 .4840 .5151 8 .4890 11 .8504 1 .5078 5 3 14 1.3460 .5384 .6718 14 • .7108 1 .5802 11 .4941 5 1 3 5 1 .3722 .5122 .7018 .5237 .5284 5 .6718 3 .9111 3 .7232 6 1 3 1.4003 .6o4i 1.3460 3 1.4003 3 I.0615 3 1.1198 6 3 1 6 6 .4043 .6655 .8504 .5677 .\"6080 6 .7108

Forming of Clusters

Stage I: In Stage I we predict the clusters, keeping before us Table 7 and Figures 2, 3, and 4. Relying on the plane representation of the most significant discriminant scores Y1 and Y2, and then following the criteria discussed in Step 3 of Stage I in Section (2.3), we predict the following clusters:
(i) 2, 5, 6, 7, and 8.
(ii) 2, 5, 7, 8, and 10.
(iii) 2, 10, and 11.
(iv) 2, 4, 9, and 10.
(v) 9, 12, 13, and 14.
(vi) 1 and 3 by themselves.

Stage II: We now correct the above predicted clusters. For each of them a table is set up below, and from these tables we obtain the corrected clusters. In every table the tabular values of T^2_k are given at the 5% and 1% levels, and a computed value is called significant when it exceeds the 5% value.

Table 8
Populations   Computed   D.F.   Tabular T^2_k         Conclusion        Cluster
involved      T^2_k             5%        1%
2,5,6,7,8     34.47      16     25.6131   35.6187     Significant
2,5,6,8       19.89      12     21.7064   29.9100     Non-significant   2,5,6,8
2,5,6,7       41.43      12     21.7064   29.9100     Significant
2,6,7,8       40.56      12     21.7064   29.9100     Significant
2,5,7,8       34.21      12     21.7064   29.9100     Significant
5,6,7,8       27.91      12     21.7064   29.9100     Significant
2,5,7         20.50       8     16.7783   24.0936     Significant
2,6,7         22.83       8     16.7783   24.0936     Significant
2,7,8         13.61       8     16.7783   24.0936     Non-significant   2,7,8
5,6,7         23.37       8     16.7783   24.0936     Significant
5,7,8         20.44       8     16.7783   24.0936     Significant
6,7,8         19.21       8     16.7783   24.0936     Significant
6,7           17.38       4     12.1371   18.2030     Significant
5,7           15.35       4     12.1371   18.2030     Significant

Table 9
Populations   Computed   D.F.   Tabular T^2_k         Conclusion        Cluster
involved      T^2_k             5%        1%
2,5,7,8,10    34.98      16     25.6131   35.6187     Significant
2,5,7,10      27.62      12     21.7064   29.9100     Significant
2,5,8,10      25.30      12     21.7064   29.9100     Significant
5,7,8,10      30.15      12     21.7064   29.9100     Significant
2,5,7,8       26.31      12     21.7064   29.9100     Significant
2,7,8,10      21.37      12     21.7064   29.9100     Non-significant   2,7,8,10
2,5,7         20.50       8     16.7783   24.0936     Significant
(*) 2,5,8     17.79       8     16.7783   24.0936     Significant       2,5,8
2,5,10        18.90       8     16.7783   24.0936     Significant
5,7,8         20.44       8     16.7783   24.0936     Significant
5,7,10        23.62       8     16.7783   24.0936     Significant
5,8,10        19.07       8     16.7783   24.0936     Significant
5,7           15.35       4     12.1371   18.2030     Significant
5,10          12.98       4     12.1371   18.2030     Significant

(*) We could exclude this from being considered because it has already been included in the bigger cluster (2,5,6,8).

Table 10
Populations   Computed   D.F.   Tabular T^2_k         Conclusion        Cluster
involved      T^2_k             5%        1%
2,4,10,11     15.76      12     21.7064   29.9100     Non-significant   2,4,10,11

Table 11
Populations   Computed   D.F.   Tabular T^2_k         Conclusion        Cluster
involved      T^2_k             5%        1%
9,12,13,14    16.62      12     21.7064   29.9100     Non-significant   9,12,13,14

Table 12
Populations   Computed   D.F.   Tabular T^2_k         Conclusion        Cluster
involved      T^2_k             5%        1%
2,4,9,10      24.37      12     21.7064   29.9100     Significant
2,4,9         21.82       8     16.7783   24.0936     Significant
(*) 2,4,10     8.20       8     16.7783   24.0936     Non-significant   2,4,10
2,9,10        11.45       8     16.7783   24.0936     Non-significant   2,9,10
4,9,10        20.42       8     16.7783   24.0936     Significant
4,9           16.624      4     12.1371   18.2030     Significant

(*) We could exclude this from being considered because it has already been included in the bigger cluster (2,4,10,11).

Thus, from Tables 8 to 12, one concludes that the following are clusters:
(a) 2, 5, 6, and 8.
(b) 2, 7, 8, and 10.
(c) 2, 9, and 10.
(d) 2, 4, 10, and 11.
(e) 9, 12, 13, and 14.
(f) 1, by itself.
(g) 3, by itself.
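The entries of the Table of Distances can be regenerated from the discriminant scores. The following is a minimal sketch, assuming that each distance is the squared Euclidean distance between the (Y1, Y2, Y3) score vectors of two populations; a spot check of a few printed entries (about .0058 for populations 2 and 10, about .0250 for populations 4 and 11) is consistent with this assumption. The names SCORES, squared_distance and ranked_neighbours are illustrative and are not the author's. Small discrepancies against the printed table are possible, since the scores themselves were recovered from a scanned page.

```python
# Discriminant scores (Y1, Y2, Y3) for the 14 populations, copied from the
# score table of the illustration.
SCORES = {
    1: (0.94597083, 1.72039748, 0.34593229),
    2: (0.91725592, 1.07290071, 0.63864788),
    3: (1.74328671, 1.21889759, 0.50048469),
    4: (1.07183611, 0.95381032, 0.36949996),
    5: (0.58531407, 1.24622259, 0.56758425),
    6: (0.57211123, 1.38532958, 0.46747816),
    7: (0.75515749, 1.12082710, 0.77524229),
    8: (0.70890607, 1.31124555, 0.70355287),
    9: (1.19390269, 1.28747391, 0.55733533),
    10: (0.95036642, 1.04299783, 0.57671666),
    11: (1.03365387, 0.80245391, 0.34345856),
    12: (1.29139812, 1.34851322, 0.68878220),
    13: (1.26791986, 1.24270782, 0.78926436),
    14: (1.40109551, 1.31631867, 0.60461486),
}


def squared_distance(a, b):
    '''Squared Euclidean distance between two score vectors (assumed metric).'''
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ranked_neighbours(scores, pop):
    '''Other populations sorted by increasing distance from population pop.'''
    pairs = [
        (other, squared_distance(scores[pop], scores[other]))
        for other in scores
        if other != pop
    ]
    return sorted(pairs, key=lambda item: item[1])


if __name__ == '__main__':
    # One line per population: its nearest neighbour and the distance to it.
    for pop in sorted(SCORES):
        nearest, dist = ranked_neighbours(SCORES, pop)[0]
        print(f'population {pop:2d}: nearest {nearest:2d}, distance {dist:.4f}')
```

Listing the full ranking for each population, rather than only the nearest neighbour, gives the column layout used when predicting clusters in Stage I.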
Further, it remains to show that every set of populations of which these clusters form a subset is significant. To do this, we refer back to Table 7 of distances and to Figs. 2, 3, and 4, and form the following bigger clusters by incorporating in the corrected clusters the populations lying closest to them:
(i) 2, 5, 6, 8, and 10.
(ii) 2, 4, 7, 8, and 10.
(iii) 2, 4, 7, 10, and 11.
(iv) 2, 4, 9, 10, and 11.
(v) 2, 9, 12, 13, and 14.
(vi) 3, 9, 12, 13, and 14.
(vii) 2, 9, 10, and 13.
(viii) 1 and 6.
(ix) 3 and 14.
We test the significance of these bigger clusters and, as shown in Table 13, find them all to be significant, which confirms the conclusion made above.

Table 13
Populations    Computed   D.F.   Tabular T^2_k         Conclusion    Cluster
involved       T^2_k             5%        1%
2,5,6,8,10     31.02      16     25.6131   35.6187     Significant
2,4,7,8,10     68.39      16     25.6131   35.6187     Significant
2,4,7,10,11    68.07      16     25.6131   35.6187     Significant
2,4,9,10,11    41.88      16     25.6131   35.6187     Significant
2,9,12,13,14   30.85      16     25.6131   35.6187     Significant
3,9,12,13,14   50.23      16     25.6131   35.6187     Significant
2,9,10,13      26.63      12     21.7064   29.9100     Significant
1,6            18.36       4     12.1371   18.2030     Significant
3,14           14.71       4     12.1371   18.2030     Significant

CHAPTER THREE
ANALOGUES OF DUNCAN'S PROCEDURE IN FORMING CLUSTERS IN MULTIVARIATE ANOVA (Contd.)

3.1 In Section (2.3) we proposed three alternative approaches to correct the predicted clusters, of which the first, called the Duncan-Hotelling test, has been explained at length with an illustrative example. Now we take up the remaining two: the 'Extreme Distance from the Mean' (E-test) and the 'Largest Distance' (R-test). The exact distributions of both statistics are not known. Siotani (1958) has found the approximate distribution of the E-statistic for k p-variate normal populations and has computed the tabular values at the 5% and 1% significance levels for some particular values of p. With Siotani's tabular values in hand, we first discuss the procedure for the E-test in Section (3.2). We then take up the R-statistic in Section (3.3) and discuss the working procedure.
Lastly, in Section (3.4) we present the distribution of the R-statistic for the bivariate case in the form of definite integrals.

3.2 Procedure for the E-Statistic

The E-test is based on Mahalanobis' distance and on Duncan's level of significance based on degrees of freedom. Suppose again that the clusters have been predicted by following the procedure discussed in Stage I of Section (2.3). Without losing generality, we take up one of the predicted clusters containing k_1 populations and discuss the procedure for the E-test in the following steps:
(i) Compute the statistic E_i (i = 1, 2, ..., k_1), the Mahalanobis distance between the mean vector of the ith population and the grand mean vector of the k_1 populations.
(ii) Without losing generality, let E_1 be the largest of all the computed E_i (i = 1, 2, ..., k_1).
(iii) Compare this E_1 with the tabular value E_{α_{k_1}}, where α_{k_1} is defined already in (2.2.6) and α is the pre-assigned significance level.
(iv) If E_1 is less than or equal to E_{α_{k_1}}, all the k_1 populations involved are concluded to form a cluster. Otherwise, split the k_1 populations into k_1 sets of (k_1 - 1) populations each.
(v) Compare the extreme distance of each set of (k_1 - 1) populations from their respective grand mean vector with the tabular value E_{α_{k_1 - 1}}. Some of these may be significant and some may not be. Those that are non-significant yield clusters consisting of the corresponding populations. Those for which the extreme E is significant are further split into sets of (k_1 - 2) populations each, and their corresponding extreme E's are then compared against the tabular value E_{α_{k_1 - 2}}. In this way the process is continued till we arrive at the clusters of the type defined.
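The splitting scheme in steps (i) to (v) can be sketched as a short recursive routine. This is only an illustration under stated assumptions: the statistic is passed in as a function, the critical values are supplied by the caller (the real ones would come from Siotani's tables), the grand mean is taken as the sample-size-weighted mean, and squared Euclidean distance on already standardized variables stands in for the Mahalanobis distance. The names extreme_distance and find_clusters, and all numbers in the example, are hypothetical.

```python
from itertools import combinations


def extreme_distance(means, sizes, members):
    '''Largest squared distance between a member's mean vector and the
    size-weighted grand mean of the members (stand-in for the E statistic).'''
    total = sum(sizes[m] for m in members)
    dim = len(means[members[0]])
    grand = [
        sum(sizes[m] * means[m][j] for m in members) / total
        for j in range(dim)
    ]
    return max(
        sum((means[m][j] - grand[j]) ** 2 for j in range(dim))
        for m in members
    )


def find_clusters(members, statistic, critical):
    '''Top-down splitting: accept a set when its statistic is non-significant,
    otherwise test every subset obtained by dropping one population.'''
    members = tuple(sorted(members))
    if len(members) == 1 or statistic(members) <= critical(len(members)):
        return {members}
    found = set()
    for subset in combinations(members, len(members) - 1):
        found |= find_clusters(subset, statistic, critical)
    # Keep only sets that are not contained in a larger accepted set.
    return {
        c for c in found
        if not any(set(c) < set(other) for other in found)
    }


if __name__ == '__main__':
    # Hypothetical one-dimensional means and equal sample sizes.
    means = {1: (0.0,), 2: (0.4,), 3: (0.8,), 4: (3.0,)}
    sizes = {1: 10, 2: 10, 3: 10, 4: 10}
    clusters = find_clusters(
        means,
        statistic=lambda grp: extreme_distance(means, sizes, grp),
        critical=lambda r: 0.5,  # hypothetical cut-off, not a tabular value
    )
    print(sorted(clusters))  # [(1, 2, 3), (4,)]
```

The final filter drops any accepted set that sits inside a larger accepted set, which mirrors the remark that a smaller set need not be considered once it is included in a bigger cluster.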
Thus a working criterion analogous to Duncan's can be stated as follows:

'A group of k_1 populations will form a cluster if the extreme distance E_1 (assumed to be the largest amongst all the k_1 distances between the mean vectors of the individual populations and their grand mean vector) is non-significant and if, furthermore, the extreme E of each and every new set of populations of which the k_1 populations form a subset is significant according to an α_r-level E-test for some pre-assigned α, where r is the number of populations in the set'.

Note: The exact distribution of the extreme classical distance was taken up by Mrs. Cuttle in her Master's thesis, 1956. She successfully solved the problem for three bivariate populations and gave the tabular values at some probability levels. We tried in vain to extend her procedure to four bivariate populations. The joint distribution of four distances came out in terms of elliptic functions, whose further integration, in order to find the distribution of the extreme E amongst the four E's, was found to be quite involved.

3.3 Procedure for R-Statistic

Duncan's range test has already been explained in Section (1.2). We extend his procedure to the multivariate case. Suppose we have k p-variate normal populations having significantly different mean vectors. Suppose further that the clusters have been predicted by following the procedure discussed in Stage I of Section (2.3). In correcting these predicted clusters no generality is lost if we take up one cluster containing k_1 (≤ k) populations. The procedure is described in detail in the following steps:
(i) Compute the C(k_1, 2) Mahalanobis distances R_rs (r ≠ s = 1, 2, ..., k_1) between the rth and sth populations.
(ii) Again, no generality is lost if we suppose that the distance R_{1 k_1} between the first and the k_1th populations is the largest amongst the C(k_1, 2) distances.
(iii) Compare the computed R_{1 k_1} with the tabular value R_{α_{k_1}}, where α_{k_1} is already defined in (2.2.6) and α is a pre-assigned level of significance. If R_{1 k_1} is less than or equal to R_{α_{k_1}}, all the k_1 populations involved are considered to form a cluster. Otherwise, split the set of k_1 populations into k_1 sets of (k_1 - 1) populations each.
(iv) Compare the largest distance of each set of (k_1 - 1) populations with the tabular value R_{α_{k_1 - 1}}. Some of these may be significant and some may not be. Those that are non-significant yield clusters consisting of the populations involved in them. Those for which the largest distance is significant are further split into sets of (k_1 - 2), and their respective largest distances are then compared against the corresponding tabular value R_{α_{k_1 - 2}}. In this way the process is continued till we arrive at the clusters of the type defined.

Thus the working criterion analogous to Duncan's can be summed up as follows:

'A group of k_1 populations will form a cluster if the distance (assumed to be the largest amongst all C(k_1, 2) distances) between the first and the k_1th populations is non-significant and also the largest distance, amongst all possible distances between pairs of each and every new set of populations of which the k_1 populations form a subset, is significant according to an α_r-level R-test for some pre-assigned α, where r is the number of populations in the set'.

There is no doubt that the test procedure set up above is completely analogous to Duncan's multiple range test but, in order to apply it, we need the distribution of the statistic R and hence the tabular values at the α_r-level for r populations. To overcome part of the difficulty we present below the simultaneous distribution of the distances involved in a predicted cluster in the case of bivariate populations.
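Steps (i) and (ii) of the R-procedure amount to computing every pairwise distance and picking the largest. A minimal sketch for the bivariate case follows, assuming a known 2 x 2 covariance matrix and taking R_rs to be the squared Mahalanobis distance between the mean vectors. The function names and the numbers in the example are hypothetical, and the critical value against which R would be compared is not computed here.

```python
from itertools import combinations


def mahalanobis_sq(u, v, cov):
    '''Squared Mahalanobis distance between bivariate mean vectors u and v
    for a 2 x 2 covariance matrix cov given as ((a, b), (c, d)).'''
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError('covariance matrix must be positive definite')
    dx, dy = u[0] - v[0], u[1] - v[1]
    # Quadratic form with the inverse matrix [[d, -b], [-c, a]] / det.
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def largest_distance(means, cov):
    '''Return (R, (r, s)): the largest pairwise distance and its pair.'''
    return max(
        (mahalanobis_sq(means[r], means[s], cov), (r, s))
        for r, s in combinations(sorted(means), 2)
    )


if __name__ == '__main__':
    # Hypothetical covariance matrix and mean vectors of three populations.
    cov = ((1.0, 0.0), (0.0, 4.0))
    means = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (0.0, 4.0)}
    print(largest_distance(means, cov))  # (5.0, (2, 3))
```

The returned pair plays the role of the first and k_1th populations in step (ii); whether R is significant still depends on tabular values that require the distribution of R.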
we have actually found the joint distribution for k = 3> 4, 5 populations and then have generalized i t for any k. Lastly, we have also suggested the limits of integration to find the distributions - 6 l -of the individual largest distance i.e. of the st a t i s t i c R. To find the tabular values one could apply any method of numerical integration. 3.U The Distribution of the R-Statistic i n the form ofaDefinite Integral (a) Preliminaries and Notations _t ( i ) Let X(k x p) be the matrix of k mean vectors (columns for characters and rows for sub-population samples) of samples of sizes N^, N^, ..., N^ respectively drawn independently from k p-variate normal populations. Let the covariance matrix («£ ) be known or estimated on the basis of large samples. _t Further, l e t the matrix X(k x p) be transformed into another matrix _t Y(k x p) by such an orthogonal transformation that the covariance matrix of y's is a diagonal matrix J\\. (p x p) with elements \\ ^ ( i = 1, 2, p). Without loss of generality we can assume that the true centroid of the distribution i s = = ... = f* = 0. The joint distribution of the y's is then: _ P k f ( y i r . . . , y y l k,..., y k ) I 1 dy i=l r=l J£ p p K r=l i=l i=l r=l -62-where c , pk r=l i=l i=l r=l i=l k where n k • ^ N r (3.4.3) r=l Further, i t is easy to prove that: k k-1 k £ N > i r \" = n\\ ]T C ^ i r \" y i * ) 2 r=l r=l s=r+l Thus we have: k p k-1 k r=l i=l r=l s=r+l i=l where - - i ± £ ( y ± p - y i g ) 2 (3.4.5) • i=l X Thus the joint distribution (3.4.1) can be written as: k-1 k p p k * K \" * F F *» - i t Z $ 5501 A ^ N r=l 8-r+l i=l ' ( i i ) From the quadrilateral joining points i , j , k and £, 6) - 6 3 -we can find the distance v between i and j as follows: V - Ri<* Bj« - 2 ^ x T 7 ^ C 0 S A (3.4.7) i R „•» R . - R n R. + R - R where A- Cos\"1 - l L — & * _ ^ - 1 M *k Jk ( 3 . 4 < 8 ) 2 y R i ( ? 
R £ k 2 J R ^ R F K ( i i i ) Frequently we shall have relations of the type: ax + by = L 2 2 x + y « M and a 2 b 2 - N ( 3 . 4 * 9 ) where we shall be required to find the value of: bx - ay (3.4.10) Solving the first two equations of (3.4.9) we have a L i b N/(a2+ b2)M - L 2 x = 2 , 2 a -»• b f 2 2 a and y _ bL + a J (a -» b )M - L* 2^ ,2 a •+ b where we have placed the restriction that the signs before the square root in the expressions of x and y must be opposite. Therefore 7 2 2 2 / 2 (a • b )M - L s i Jm - L (3.4.U-) (iv) We shall frequently need the following: -64-/ = I ^ = T T (3A.12) S Jax - x (v) Lastly we give below the notations which are used quite frequently in what follows: s i J k 2 N i N A i R k o + 2 H i H k R j i R j k + 2 N A R i k R i j - N i 4 - ^ 4 - 4 R h ( 3 - u - i 3 ) 5 i j k = 2 R k i R k j + 2 R j i R j k + 2 R i k R i j - R j k \" 4 - R L ( 3 . 4 . 1 1 + ) S ' - 2 R k R k j * 2 R J i R j k * 2 R i k R i j - R S - R i k - R i j (3.4.15) (b) Distributions Case I: For k = 3 c 2 3 exp where, The joint distribution of ( y ^ y ^ y ^ ' y 2 1 ' y 2 2 , y 2 3 ^ f r o m 6 > ) i S : 2 3 2 i l l r=l s=r+l i=l N N N from ( 3 - 4 . 2 ) , C = 1 2 3r—r-..6 ( 3 - 4 . 1 7 ) and from ( 3 . 4 . 3 ) , ^ = 1^ + Ng.+ N3 ( 3 . 4 . 1 8 ) Consider the orthogonal transformation: - 6 5 -V n ( ; 7 n + .^12) v 2 i - J ; ( - y 2 i + ^22) V12 C\" 7 H \" ^ 2 + ^ 1 3 ) V22 = ^ ( \" y 21 • y 22 + 2 y 2 3 ) (3.4.19) whose inverse transformation i s : - U l V l l V12 and V 22 Y l 1 ^ 3 v/2 \" 76 ~73 v/2 \" 76 - u l ^ V 3 V + 1 1 -v/2 V12 V6 y22 73 v 21 v/2 \" ^ 2 u l 7 1 3 = 73 2 v12 H! y23 U 2 + 73 + 2 V 22 76 and from these and from (3.4.5)> we have: 2 2 *12 R!3 ^1( 72 5 A 2 v/2 J V i r g i n + * k , a + , ( i 2 i + ^ a ] n 3 L A v/2 >/6 V 7 2 76 . J N0N_ r . 7 , , - - 5 7 : . 2 v 3v 2 1 The distribution now takes the form: 2 3 2 2 n 2 G 2 3 exp [ - 1 1 5 - , r ^ l — r=l s=r-f-l i=l (3.4.21) -66-where R are defined as in ( 3 . 4 . 2 0 ) . 
rs v Integrating with respect to u^ and both with the limits from _ oo to 0 0 , we get the reduced form; of (3-4 .21) as: 6 F J~~k—A\" 2 3 2 2 — = T ^ °23 \\ I L B r s ] I 5 3 r=l s=r+l i=l j=l N N N Let N = — - . Then we define R* R* and R' as: n^ 12 ' 13 23 2v 2v R 1 2 = N3R 1 2 = A- (-72-) + ^ ( - ^ ) Dt A T o N / i i . 3 V 1 2 ,2 N / V 21 ^ 3 V 2 2 A 2 R i 3 = N 2 R i 3 = * 7 5 - ) + X 2 ( 7 2 * + 7 6 - ) R P 3 = K 1 R23 - f ^ ^ ^ - 3 \" 2 2 >2 ( 3- U' 2 3> 2 1 23 Ax v/2 v/6 X 2 j 2 \" - y g -Further, to effect the change of variables from the v's to R's, we introduce a fourth R' defined by: R' - j 1 (7I ) 2 (3-4.24) Finding f i r s t from (3.4.23) and (3.4.24) the Jacobian of the transformation, we conclude that: 2 2 dRf12dR' dR« dR^ I II d V i j = 2 8 8 N 2 ^ N v u ^ N V 2 1^N V l l N ^ 2 _ N ^12 N V 2 1 j i=l j=l A l x 7 2 - A2 J2 h v /2 ^2 J6 X l yJS X2 J2 -67-Further, with the help of ( 3 . 4 . 9 ) , (3 .4 .10), (3 .4.11), (3 .4.23) and (3 . 4 .24) , we obtain: 2 2 dR* dR' .dR'_dR' U H av ± J - 1 2 1 3 2 3 — i=l j=l J 24r '123 where i s defined as i n (3.4.15). Again using (3-4.23), we get 2 2 WnW0N0 dR dR dR^dR' TT Hdv.. = - ^ 1 2 1 3 2 3 (3.4.25) N NN Using (3.4.12), (3.4.25) and the value N = - , the joint distribution . . . . n 3 (3.4.22) reduces, after integrating with respect to R' over the range from N 0 to - j - R 1 2 as shown in (3.4.12), to: W e x p L \" 2 1 2 i3 23_J dR 1 2dR 1 3dR 2 3 (3.4.26) ^ 1 2 3 which i s the joint distribution of R 1 2 > R 1 3 a n d Rg^-'-All these variates are always positive,and i t i s easy to check that they do not assume values outside the cone defined as S 1 2 3 \\ 0. The distribution of f ( R 1 2 ' R 1 3 ' R 2 3 ^ i s therefore always positive. -68-The Distribution of the Largest R _ rs Let us further restr i c t the problem by assuming the number of observations to be the same for a l l the three groups, i.e. 1^ = N2 = = NQ, say. The joint distribution of ( R ^ R ^ R ^ ) i s . 
exp [- \\ (R +R +R )J >/ &123 where now the variates R, 0 , R-,->, and R0_ do not assume values outside l<= l j the cone defined by S^2^ 0. We can assume without loss of generality that the variates have been ordered, say 0 ^ R 2 3 £ R^ <. R 1 2 . The density of these ordered variates i s 3*.f(R12, R.^ , ^23^ ^ -hus t n e probability G(t), that R 1 2 £ t , i s : V where V is the region: 123 (3.4.28) 2 ( A 2 \"^ / R 13 ) - R 23 ~ R13 ? R 1 2 ^ R13 - R12 0 i - t < ^ -69-The procedure for i t s numerical integration has been given by Mrs. Cuttle and one can easily compute the values of t for known values of G(t). Case II: For k = 4 The joint distribution of (y^Y^Y^Y^Y^^^Y^,^^) from (3.4.6) is r=l s=r+l i=l 1 1 - 1 r = 1 N N N N where, from (3.4.2) j C_, = 3 ^ • (3-4.30) and from (3.4.3), n^ = Nx + N 2 + N3 + (3-4.31) Consider an orthogonal transformation of the type (3-4.19) whose inverse transformation we write as: V _ u l V l l V12 V13 and y 2 1 _U 2 V21 V22 y l l ~7^ \"76 712 \"7^ \"72 76 712 y l 2 u l \\ A + Y i i 72 ! i 2 \"76 _ ! l 3 J12 y22 _ \"2 \" 7 4 V21 + 7 V -V22 \"76 123 \"712 u i V + 2 V12 U2 2v 22 V23 y l 3 76 \"712 y23 76 v/12 y l 4 ' u ; • • 7^ 3 v13 712 y24 _ 2^ + 3 v 23 712 -70-R12 R13 R l 4 R2k With the help of these and (3.4.5) we have: 1 2 r 2v 2v 1 = V i r i ( i i i n 4 L \\ 72 76 A 2 y 2 76 J = W l ( ! l l + - ^ 2 + ^13)2 + 1 (Igl + ! 2 2 + ^ F \\ \\ l \\ J2 76 712 A 2 7 2 J6 712 J N 2 N 3 f l ( ^ 1 3 V 1 2 ) 2 ^ 1 (^ 23L 3 V 2 2 } 2 ] \\ [ \\ 1 . 7 2 \" 7 6 A 2 7 2 \" 76 J _ ! g \\ r i ^ S i 2 + l ' ( ^ l . ^ 2 U v 2 3 ) 2 l \\ L \\ v/2 */6 \"712 A 2 7 2 7 6 712 J 3 k D 4 L A 1 > / 6 N / 1 2 A 2 V / 6 V/12 J (3.4.32) Making use of (3.4.32) and integrating with respect to u^ and u 2 both vextehdihg3 from - <=>o to °o , the digtribution (3.4.29) takes the form: r=l s=r-frl f A A c a exp[- I f_ f_ R r s ] E I av t J ( 3 > . 3 3 ) •71-N NJI.N Let N - ? * . Then from (3.4.32) we have: 4 2v n 2v n „ « » N ' 1 2 n 2 . 
* / 21*2 12\" 3 \\ 12 * AX<-3TJ J ^ T T * R' = N N R = ^ ) 2 * ^ £ 3 L - ^ p ) 2 13 2 4 13 X /2 -/6 » <7. \" V \" W i t v v 4v o v v 4v o N ( j a + JL2 ^ l l l ) ^ N (_21 , _22 f 121)2 A x /2 >/o~ 7l2 A 2 72 To* x/l2 v 3v o v 3v o » » » N 12v2, N r_21 l_22x 2 23 1 4 23 A /2 A J2 SZ V V ZlV V V 2lV O I H H P N / 11 12 113N2 N / 21 22 23^ R' - N N R = — (—=- - — - -t — (—7=.- —7=. - —7=r-j 24 1 3 24 /\\x >/2 J12 A 2 V ^ * T /12 However, in changing from the v^j to the Rj^. , we discover that the Jacobian of the transformation vanishes. In fact i t should, since the quadrilateral is completely determined by taking two triangles standing on the same base or by taking any of the five out of six R' s« Thus, we do away with one of the six R\" s (which can be done in 6 ways) and then,to complete the set of six R1 s corresponding to sis v's, we bring in another R', functionally independent of the five retained R1 s. It is defined as: -72-Assuming R^ to be the smallest, we do away with i t and replace i t by (3.4.35\"). Then, with the help of (3-^-3^) where R^ is l e f t out and of (3.4.35*), we find: n IT av ( A A ) 3 / g ^ V ^ ' i ^ V ^ i=i ]iiTiJ~ *^ y^*^ - *• > ^ w s l > 3 Finally making use of (3.4.34), (3.4.36), (3.4.13), and (3.4.30), the distribution (3.4.33) reduces, after integrating out R* from 0 to i ' w 5 j : R , 1 2 ( o r -JJ--2- R 1 2),again as in (3.4.12), to expf\"- \\ Y Y R 1 dR dR dR ,dR dR , ,6^1,4+1^.4-2 L 2 Z — J 3 3 ( l ) ( 2 } p ) r = 1 8 - p f l (3.4.37) N / ^ 2 4 >/^123 where, by using (3.4.7) and (3.4.8), R^ is determined from the quadrilateral formed by the points (1,2,3,4) and is substituted i n (3.4.37). Furthermore, the variates R 1 2, R^, Rg^, R-^j and R^ are a l l positive and i t is easy to prove that R,0, R, and R__ do not assume values Ld l j . O outside the cone S^2^ ^ 0 and that R]_2'Rl4> a n c * R24 ^° n o * ' a s s u m e values outside the cone S^2^ 0. 
The Distribution of the Largest Distance: Let us further restr i c t the problem by assuming that = N 2 = = = NQ, say. The joint distribution (3-4.37) then becomes: -73-3 4 f. •% o -i eXP[\" * T Z R r s ] d R12 d R 1 3 d RlU d R2 3 d R24 q ) ( | ) 3 ±2 r * l g T f l ^ 1 2 4 ^ 1 2 3 (3.4.38) where again the variates B.^R^^ and R 2 3 do not assume values outside the cone ^ 0 and also R^' R l U a n d R24 d o n ° * a s s u m e values outside the cone 0. Furthermore the distribution of f ( R 1 2 ' R 13' R l V R 2 3 ' R 2*^ i s a l w a v s Positive. Me can assume without loss of generality that R ^ 2 is the largest of the five R's and further that they are ordered as: o £ R 2 3 - R 1 3 * R 1 2 ^ ~ and 0 ^ * ^ ^ * - (3.4.39) The density of the ordered variates is 5(21 )(2. )f (R 1 2 > R 1 3 > R 2 3 ' Rlk' R2l+^' and the probability S'(t) that R 1 2 i t Is 3 4 * i , i rrrrrex^ Z HRrs)dRi2dRi3dRi4dR23dR2u G(t) « ( ° ) ( 5 ) ( 2 : ) ( 2 : ) ( | ) 3 f 2 / / / / / r=l w f l - V N/^123 >/^124 (3.4.40) where V is the region: ^ (s /R 1 2 - y R 1 3 ) 2 * R 2 3 * R 1 3 \\ R12 - R i 3 - R12 ( 7 R 1 2 -jRlkf * R ^ < R L U - 7 4 -F R12 £ hk i R12 0 i R 1 2 * t 0 i . . t <-GJ^ib) can be evaluated by some numerical method. Case III; For k = 5 UEollgwing the similar steps as in Case II for k = 4^£rom (3.4.29) to (3.4.32)) we finally cbbtairie the distribution of R (s = (r+l) to 5, ' rs r = 1 to 4) N where, from (3-4.2), Q = 1 2 3 ^ — ( 3 . 4 . 4 2 ) 2 5 ( ( 2 r r ) 5 and from ( 3 . 4 . 3 ) , n ? = Nj_ + Ng + » 3 + + N? (3 .4 .43) I N I N,N Letting N = 1 2 3 ^ , we obtain as in (3 .4 .34) the following: U 5 R12 \" N 3 N 4 V l 2 = j^T 2\"* + A 1 ( 7 2 ~ ) N A 1 ^ 3 V12\\2 j . N /V21 . 3^22>2 Ri3 = W 5 R i 3 = \\^72 + 7 o - } + A \" 2 ( \" 7 2 + 7 O - ) - 7 5 -N / l l 3 V 12 N 2 . N , V 21 3 Y22x2 R' - w w N p - i i ( X J L ^ j l ™ /J~L _^£V R 2 3 - W 5 R 2 3 = A^T 2 \" 7 o ~ } * r 2 ( 7 2 pt « M n B w / l l 1 2 ^ 13 N2 „ N , 2 1 „ V22 „ _ ^ 3 . 
2 \\k - w 2 N 3 n 5 R i 4 = * 7 ° ~ + 712> + r 2 ( 7 2 * 7 5 - + 7 i 2 1 ) R » - N N M R - W A 1 V l 2 W l3\\2. N / 2 1 V22 W 2 3 x 2 R 2 4 - \" l D 3 V * \" ^ 1 7 2 \" \" 7% \" . \" 7 2 + A\" ( 7 2 ' 7 5 \" \" 7 1 2 } pi „ w „ R N / i i . V12 ^ V 1 3 . ^ V l 4 N 2 N /V21 „ V22, . V 2 3 ^ ^ V 2 U A 2 R 1 5 \" N 2 N 3 N 4 R 1 5 = Ax T2\" + 75 * J l 2 + 720> + A\"2 (72 + 7 ^ + v 7 l 2 + N 7 2 0 _ ) R. _ N N T J R _ n / l l V ! 2 V 1 3 5 V14\\2 N /V21 V22 V 2 3 5 V 2 ^ 2 R 2 5 = N 1 N 3 \\ R 2 5 = 3^72\" ~W - 7 1 2 \" 7 2 0 ) % ( 7 2 - \"Jo\" \\ 7 l 2 \" 120 2v l+v 2v k-v ?t n TVT « p W / 12 1 3 v 2 ^ N / 22 J_23\\2 p» n « TVT p N / 2 V 12 V 1 3 ^ V l 4 A 2 a N , 2 v 2 2 V 2 3 ^V24>2 R 3 5 = N l W 3 5 - A ! ( 7 5 - \"712 ' 720> % ( 7 5 - -712 1 \" 7 2 ^ ) Again from the fgeometric representation of the five points, we see that seven of the R* s are independent and the remaining three can he found with the help of the known seven. So again we discard any three of the ten R' s (which can he done i n (^®) ways) and then to complete a set of eight R* s corresponding to eight v s, we bring in another R', functionally independent of the remaining seven, defined as: -76-Thus, assuming R^, and R^ to be the smallest of the ten R',.s, we discard them and then with the remaining seven R1 s and R1 in (3.4.45), we conclude that: i i 2 JL. JL U,AJ dR' dR' dR« dR« dR' dR'r dR' dR' Making use of (3.4.44), (3.4.46),(3.4.13) and(3.4.42), we get the joint density of the seven R' and R'0 As was done in (3.4.12), we integrate out R', where 0 <-R* S £ R£ 2 • This yields exp [ - i £ H R r s J % 2dR 1 3cffl 23C^di^dR 1 5dE r l O w l N 5 * l / ^ 5 - 2 rsl s»r+l ( 3 ) ( 2 ) W / s \" / a \" / s ~ >/ 123 V 124 V 12 (3.4.47) 25 5 Here R3^, R35 and R^ ^ are functions of the other R r s and should be expressed in terms of these other Rj.s in (3.4.47). R34, R35 and R^IJ can be determined from the quadrilaterals formed by joining the sets of points (1,2,3,4), (1,2,3,5) and (1,2,4,5) respectively. 
The variates 1*12* R]_3* ^23* ^14* ^24* ^15* ^25 a r e all>P 0 S j-tive, a n d the sets of variates (R 1 2, R 1 3, R 2 3), (R 1 2, R^, R^) and (R 1 2, R 1 5, R 2 $) do not assume values outside the cones S-^^ 0, >, 0 and S-^ >, 0 respective-ly. Thus the density in (3*4.47) is always positive. The Distribution of the Largest Distance We again restrict the problem by assuming that N R * NQ (r=l, 2, 5)« -77-The joint distribution (3.4.47) reduces to: . 4 5 l n , f i c , e X P [ \" l L L R r s J ^ d R ^ d R ^ d R ^ d R ^ d R ^ (\")(|) (fif) 3 ^ s=r-fl J ^ 2 3 ^ 2 4 >^125 (3.4.48) where again the sets of variates (R 1 2, R13, R^)* ( R i 2 ' R l 4 ' R24^ and (R,„. R,,-, Roc.) do not assume values outside the cones §,„_ X- 0, Ld 1? O 123 ^124 ^ 0 a n d ^125^ 0 r e sP e c\" t i v e ly» We can again assume without loss of generality that R^ is the largest of a l l the seven R's and further that they have been ordered as: 0 R 23 R13 <. R12 <. 0 < R l l t *. R12 and 0 R 25 R15 <. R12 The density of the ordered variates is 7(21 ) 3f(R 1 2, R13> R23> R^, R^, R,_, R „ c ) , and the probability G(t) that R,_ £ t i s : V) d? Id 4 5 , 6 , J f 6 X P t I Z E R r s ] a R 1 2 d R 1 3 d R 2 3 - d R 1 5 d R 2 5 G(t) - ( 1 0 3) 7 ( 2 l ) 3 ( | ) 6 ( 2 4 ) 3 J ' - y r=l s=r+l J 7 ^ ^ 2 3 ^124 N/ § 125 (3.4.49) V is the region: where (7R 1 2 - ^ R 1 3 ) 2 C ^ C R ^ I R12 - R13 - R12 -78-( y i L 2 \" * R24 * hu R25 * R15 1 ^ 2 * ^15 <• ^12 G - R^ ± t 0 £ t «o (3.4.50) Generalization. For any k An inspection of (3.4.26), 0.4.37) and (3.4.47) enables us to generalize the joint distribution of E's for any k - the number of /k\\ bivariate normal populations. To start with we shall have ( ) R's from which (2k - 3) geometrically independent R's denoted by R^, , ... ^lk* R23* R24' \"**' R2k 0 5 1 1 1 a r ^ ^ r a r H y 1 3 6 chosen to complete the k point figure. It should be noted that such a choice can be made in / ^ \" 1) Ways. 
V 2 k - 3 y The remainder £(*) - (2k - 3)J of the R's denoted by R^, R^j R^j, R^j . ; ^(^2.)^ a r e again assumed to be the smallest and are discarded. Thus we conclude that the generalization of (3.4.26), (3.4.37) and (3.4.47) i s the density -79-k-l S X P [ \" \\ L 11 * » ] d R12 d Rl 3 d R23-'' ( k(k-l) V n, C ^ L \" ^ Z_ Z_ \" » | u n12^ 1 3 u n 2 3-\" d Rlk d R2k 2 ( | r + 1 ( p | ) k \" 2 r=l 8i»l v/S123 v/ 124 ••*^ S 12k (3.4.51) where R^> • ^ j j * R45* ***' R 4 k ' '\"' R(k-l)k c a n ^ e determined as shown in (3.4.7) and (3.4.8),and where the line joining the points 1 and 2 is the common side of the quadrilaterals ( l , 2, 3, 4), ( l , 2, 3, 5), (1, 2, 3, k); (1, 2, 4, 5) ...(1, 2, 4, k); ( l , 2, k^I, k) respectively. Again the variates R ^ , R ^ , R L K , R ^ , R G K are a l l positive,and the sets of variates (R-j^, R13* R 23^' '\"> (R^g, R ^ , ^2k^ d o n a t a s s u m e values outside the cones >, 0, S124 °* '\"' S12k J' 0 respectively. The Distribution of the Largest Distribution Assuming again the equality of sample sizes, that is the largest and that the variates in each of the sets (R-J^* Ri3> **23^' **\"' (R O T R. , R 0 , ) are ordered as in the previous cases, we conclude finally 12' lk' 2k' the probability G(t) that R t t is k-1 J k - 3 / ^ k + 1 ( 2 i ) k - 2 ( 2 k . 3 ) ( 2 i ) k - 2 ' - ' ' r = 1 eXP[^r L Rr 8 J d R12 d Rl 3- d R2k ^^123 J^l2h \"'J^lZk (3.4.52) -80-V is the region where ( J \\ 2 \" ^ R 1 3 ^ ~ R 23 ~ R13 F R12 - R13 - R12 2 i r R i 2 * R i u * R i 2 ( JR12 -J\\kf £ R 2 k 6 F R 1 2 ~ R l k 6 R12 * R12 * t 0 6 t <. CHAPTER FOUR ASSIGNING A POPULATION TO ONE OF THE CLUSTERS k.l We propose a method for assigning any other individual or population to one of the clusters obtained by any of the methods described in Chapters Two and Three, where the prior fact is known that the individual or population being assigned belongs to one of the clusters. 
Two alternative approaches have been suggested, both based on the assumption that the populations concerned are normally distributed. The first approach uses the method of likelihood functions already discussed in Section (1.3), and the second uses T^2 values. Finally, an illustration is presented to demonstrate their use.

4.2 Since, by definition, all the populations included in a cluster have identical mean vectors, we can consider the cluster as one population whose mean vector is estimated by the grand mean vector of the populations included in the cluster. Thus, if there are C clusters, we shall imagine them as C distinct populations, with their estimated mean vectors being the grand means of those populations which are included in the respective clusters. Let the estimated mean vectors of the C (so-called) populations be given in matrix form as:

Z (C x p) =
  | z_11   z_21   ...   z_p1 |
  | z_12   z_22   ...   z_p2 |
  | ...                      |
  | z_1C   z_2C   ...   z_pC |          (4.2.1)

To get the corresponding significant discriminant scores, we post-multiply Z (C x p) by the matrix introduced in part (d) of Section (2.2) and obtain the corresponding matrix U defined as:

U (C x p') = (u_it),   i = 1, 2, ..., p';  t = 1, 2, ..., C          (4.2.2)

Further, if the mean vector of the sample from another new population, which we are trying to assign to one of the clusters, is given, then the corresponding significant discriminant scores can be similarly obtained. We denote them by

(v_1, v_2, ..., v_p')          (4.2.3)

4.3 Discussion of Approaches

(a) Approach I: Use of L-functions

Since the C so-called populations are normally distributed, we use Rao's procedure for assigning an arbitrary population to one of the multivariate normally distributed populations.
We first compute L-functions in the form already defined in Section (1.2) or in the form obtained below by the use of significant discriminant scores, namely

L_t = sum_{i=1}^{p'} u_it v_i - (1/2) sum_{i=1}^{p'} u_it^2,   t = 1, 2, ..., C          (4.3.1)

Then, following Rao, we would assign, ignoring the a priori probabilities, the new population to the sth (s ≤ C) 'so-called population' (or cluster) if

L_s - L_t > 0 for all t = 1, 2, ..., s-1, s+1, ..., C.

(b) Approach II: Use of the T^2-Statistic

In the previous method we have not been able to attach a probability to our decision. To achieve this aim we propose the following steps:
Step 1. Let the size of the sample drawn from the new normally distributed population be N, and try including it in each of the clusters in turn, so that the number of populations involved in each cluster increases by one.
Step 2. Compute the statistic T^2_{k_t + 1} for t = 1, 2, ..., C, where k_t is the number of populations in the t-th cluster.
Step 3. Include the new population in the sth cluster if
(i) T^2_{k_s + 1} < all T^2_{k_t + 1} for t (≠ s) = 1, 2, ..., s-1, s+1, ..., C, and
(ii) computed T^2_{k_s + 1} < tabular T^2_{k_s + 1}.
Note: Since we allow overlappings, we shall include the population in each cluster for which the computed T^2 is non-significant.

4.4 Illustration

To demonstrate the above approaches we continue with the illustration discussed in Chapter Two. The B.C. Forest Laboratory later obtained a shipment of 7 trees of black cottonwood from some locality.
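As a numerical check on Approach I, the sketch below evaluates an L-function for the seven cluster mean vectors and the new sample's scores reported in this section's illustration. It assumes the form L_t = sum_i u_it v_i - (1/2) sum_i u_it^2 on the significant discriminant scores; this assumed form reproduces the printed value L(a) = 0.82768514, while agreement for the other clusters depends on how faithfully the scanned digits were recovered. The names CLUSTER_MEANS, NEW_SCORES, l_function and assign are illustrative only.

```python
# Cluster mean scores and the new sample's scores, as printed in the
# illustration of Section 4.4 (clusters (a) to (g)).
CLUSTER_MEANS = {
    'a': (0.66968541, 1.27604842, 0.62721413),
    'b': (0.76536770, 1.19428189, 0.71449228),
    'c': (1.08072219, 1.17054579, 0.58144678),
    'd': (1.01019297, 0.92978089, 0.42782876),
    'e': (1.29464479, 1.30553049, 0.70800378),
    'f': (0.94597083, 1.72039748, 0.34593229),
    'g': (1.74328671, 1.21889759, 0.50048469),
}
NEW_SCORES = (0.4794140, 1.1417523, 0.4540478)


def l_function(u, v):
    '''Assumed L-function: sum(u_i * v_i) - 0.5 * sum(u_i ** 2).'''
    cross = sum(ui * vi for ui, vi in zip(u, v))
    norm = sum(ui * ui for ui in u)
    return cross - 0.5 * norm


def assign(cluster_means, v):
    '''Return the cluster with the largest L-function and all L values.'''
    values = {name: l_function(u, v) for name, u in cluster_means.items()}
    best = max(values, key=values.get)
    return best, values


if __name__ == '__main__':
    best, values = assign(CLUSTER_MEANS, NEW_SCORES)
    for name in sorted(values):
        print(f'L({name}) = {values[name]:.8f}')
    print('assigned to cluster', best)  # assigned to cluster a
```

Choosing the cluster with the largest L-function is equivalent to the rule L_s - L_t > 0 for all t other than s, with a priori probabilities ignored.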
To assign it to one of the clusters on the basis of its static bending property, the same four measurements X1, X2, X3, and X4 were taken on different locations of each tree, and the following results were obtained:

Size   X1     X2     X3     X4
61     982    4.70   2287   4102

The corresponding significant discriminant scores are:

Size   Y1          Y2          Y3
61     0.4794140   1.1417523   0.4540478

Demonstration of Approach I

Considering each cluster to be one population whose mean vector is estimated as the grand mean vector of the populations (species) involved in the corresponding cluster, we write below the mean vectors of each of the seven clusters by use of (4.2.1) and (4.2.2):

       Size    Y1           Y2           Y3
(a)     931    0.66968541   1.27604842   0.62721413
(b)     984    0.76536770   1.19428189   0.71449228
(c)     368    1.08072219   1.17054579   0.58144678
(d)     587    1.01019297   0.92978089   0.42782876
(e)    1266    1.29464479   1.30553049   0.70800378
(f)     264    0.94597083   1.72039748   0.34593229
(g)     158    1.74328671   1.21889759   0.50048469          (4.4.2)

Note: These clusters (a) to (g) have been written in the same order as shown at the end of Chapter Two.

Using (4.3.1), (4.4.1) and (4.4.2) we obtain the L-functions as:

L(a) = 0.82768514
L(b) = 0.79361765
L(c) = 0.72048219
L(d) = 0.69443164
L(e) = 0.49183864
L(f) = 0.58770051
L(g) = 0.06075676

Since L(a) is greater than all the remaining L-functions, we would assign the black cottonwood to the cluster (a), i.e. to (2, 5, 6, 8).

Demonstration of Approach II

Combining the new species of black cottonwood with each of the sets of populations already in clusters, we compute the T^2-values by the formula (2.2.4) and obtain:

T^2_5 (for 2, 5, 6, 8 and new one)     = 24.70
T^2_5 (for 2, 7, 8, 10 and new one)    = 29.91
T^2_4 (for 2, 9, 10 and new one)       = 30.94
T^2_5 (for 2, 4, 10, 11 and new one)   = 34.39
T^2_2 (for 1 and new one)              = 37.44
T^2_5 (for 9, 12, 13, 14 and new one)  = 43.99
T^2_2 (for 3 and new one)              = 77.66

Clearly T^2_5
(for 2, 5, 6, 8 and new species) is less than all the other computed T^2-values and is also the only non-significant one, for 16 D.F. and α = .05, since the corresponding tabular value is 26.7251. Hence the black cottonwood would naturally be assigned to the cluster of species 2, 5, 6 and 8.

Remark: We have plotted the point representing the new species 'black cottonwood' in Figures 2, 3 and 4. This graphical representation also shows that the new species is close to 2, 5, 6 and 8.

CHAPTER FIVE
DETERMINATION OF CONFIDENCE REGIONS FOR NON-CENTRALITY PARAMETERS CORRESPONDING TO D^2 AND T^2_k, AND ANOTHER EXPRESSION FOR T^2_k

5.1 In multivariate analysis of variance, when the hypothesis of the equality of mean vectors in the case of two or more populations is rejected, the need arises to set up confidence limits for the non-centrality parameters corresponding to the statistics used for tests of hypotheses. In Chapter One, Sections (1.1) and (1.2), we considered using the statistics D^2 and T^2_k for testing the hypotheses of equal mean vectors. Now we discuss the problem of setting up confidence regions for the corresponding non-centrality parameters. Lastly we shall give another expression for T^2_k in terms of the sum of weighted Mahalanobis distances.

5.2 Distributions of the Two Statistics in the Non-Central Case

The distribution of D^2, both for the Studentized and the classical cases, is summed up in Section (1.1) for the non-central case (non-centrality parameter not equal to 0). As regards the Studentized T^2_k, we do not have its exact distribution in a compact known standard form even for the central case. The asymptotic expression of a percentage point of the central T^2-distribution in terms of the corresponding percentage points of central chi-square with p(k-1) D.F. has been given by Ito (1956), which we have already given in Section (1.2).
We again write it below, but in a different form suitable for our purpose, as:

$$T_k^2 = C_1\chi^2 + C_2(\chi^2)^2 + C_3(\chi^2)^3 + C_4(\chi^2)^4 \qquad (5.2.1)$$

where

$$C_1 = 1 + \frac{p - n_1 + 1}{2n_2} + \frac{7p^2 + 12(1 - n_1)p + (7n_1^2 - 12n_1 + 1)}{24n_2^2}$$

$$C_2 = \frac{p + n_1 + 1}{2n_2(n_1p + 2)} + \frac{13p^2 + 24p - 11n_1^2 + 7}{24n_2^2(n_1p + 2)}$$

$$C_3 = \frac{4n_1p^3 + 2(3n_1^2 + 3n_1 + 10)p^2 + 2(2n_1^3 + 3n_1^2 + 17n_1 + 18)p + 4(5n_1^2 + 9n_1 + 2)}{24n_2^2(n_1p + 2)^2(n_1p + 4)}$$

$$C_4 = \frac{6(p - 1)(p - 2)(n_1 + 1)(n_1 + 2)}{24n_2^2(n_1p + 2)^2(n_1p + 4)(n_1p + 6)} \qquad (5.2.2)$$

$n_1 = k - 1$, and $n_2$ is taken so large that the cubes and higher powers of $1/n_2$ are negligible.

Although Ito considered only central $T_k^2$, there is no difficulty in deducing the approximate distribution of non-central $T_k^2$. If we go carefully through the procedure Ito (1956) followed in arriving at the distribution of central $T_k^2$, we can easily deduce the distribution for non-central $T_k^2$. We have only to replace the central chi-square by the non-central chi-square with the same degrees of freedom and the non-centrality parameter $\tau_k^2$ defined in Section 1.2. Thus we write for $T_k^2$, when $\tau_k^2 \neq 0$,

$$T_k^2 = C_1\chi'^2 + C_2(\chi'^2)^2 + C_3(\chi'^2)^3 + C_4(\chi'^2)^4 \qquad (5.2.3)$$

where $\chi'^2$ is non-central chi-square with $p(k-1)$ D.F. and parameter $\tau_k^2$, and $C_1$, $C_2$, $C_3$, $C_4$ are defined above in (5.2.2). Further, in the classical case, $T_k^2$ is again $\chi'^2$ distributed with $p(k-1)$ D.F. and parameter $\tau_k^2$.

5.3 Tabular Values of Non-Central F-Ratio and Chi-Square

The percentage points for both the non-central F-ratio and chi-square with appropriate degrees of freedom and non-centrality parameters are then needed for the above purpose, and so we refer to the following:

Non-Central F-Ratio

Wishart (1932) and Tang (1938) have evaluated the probability integral for the non-central F-ratio. Patnaik (1949) has also computed the tables by an easier and approximate method, by fitting an F-distribution with the exact first two moments of the non-central F-ratio.
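As a numerical illustration of the expansion (5.2.1) to (5.2.3), the following Python sketch evaluates the coefficients $C_1$ to $C_4$ and the resulting $T_k^2$ percentage point from a given chi-square percentage point. The function names are hypothetical, and the coefficient formulas are a reading of (5.2.2) from a partly illegible source, so individual terms should be treated as assumptions. For $p = 4$, $n_1 = 1$, $n_2 = 29$ the sketch agrees with the values 1.07432, 0.0197, 0.000198 and 0.0000037 quoted in Section 5.5.

```python
# Sketch only: coefficients of Ito's expansion as read from (5.2.2).
# Names are hypothetical; some terms of the source formula are uncertain.

def ito_coefficients(p, n1, n2):
    d = n1 * p + 2
    c1 = (1
          + (p - n1 + 1) / (2 * n2)
          + (7 * p**2 + 12 * (1 - n1) * p + 7 * n1**2 - 12 * n1 + 1)
          / (24 * n2**2))
    c2 = ((p + n1 + 1) / (2 * n2 * d)
          + (13 * p**2 + 24 * p - 11 * n1**2 + 7) / (24 * n2**2 * d))
    c3_num = (4 * n1 * p**3
              + 2 * (3 * n1**2 + 3 * n1 + 10) * p**2
              + 2 * (2 * n1**3 + 3 * n1**2 + 17 * n1 + 18) * p
              + 4 * (5 * n1**2 + 9 * n1 + 2))
    c3 = c3_num / (24 * n2**2 * d**2 * (n1 * p + 4))
    c4 = (6 * (p - 1) * (p - 2) * (n1 + 1) * (n1 + 2)
          / (24 * n2**2 * d**2 * (n1 * p + 4) * (n1 * p + 6)))
    return c1, c2, c3, c4


def t2_point(chi2_point, coeffs):
    # Polynomial (5.2.1)/(5.2.3) applied to a chi-square percentage point.
    c1, c2, c3, c4 = coeffs
    x = chi2_point
    return c1 * x + c2 * x**2 + c3 * x**3 + c4 * x**4


coeffs = ito_coefficients(p=4, n1=1, n2=29)
upper_central = t2_point(9.49, coeffs)  # upper 5% point of chi-square, 4 D.F.
```

With the central upper 5% chi-square point 9.49 for 4 D.F., `upper_central` is close to the entry 12.168 in the first row of Table 14.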
Thus, for the use of tabular values at the required confidence level, any of the tables given by Wishart, Tang or Patnaik may be referred to.

Non-Central Chi-Square

Fisher (1931) and Garwood have each computed tables of the 5% significant points of non-central chi-square for 1 to 7 D.F. and $\sqrt{\lambda} = 0(0.2)5.0$. Patnaik (1949) has also evaluated them by using various approximations to non-central chi-square, which are quite close to the exact ones. Thus, for finding the confidence intervals, any of the available tables given by Fisher, Garwood or Patnaik may be referred to.

5.4 Description of the Method Used for Confidence Regions

We now give the method for determining the confidence regions for either of the two non-centrality parameters. Since the method used is the same for both, we shall take up only one statistic, Studentized $T_k^2$. We shall describe fully the procedure for this statistic, since the same technique can be made use of for the other also.

To do this we shall follow Mood's method (Art. 11.5) given for functions not distributed independently of the parameters.

Let, for a pre-assigned $\varepsilon$, the confidence level be $100(1-\varepsilon)\%$. Since, for a given value $\tau_k^2 = \tau_{k(0)}^2$, the density of $T_k^2$, which is $g(T_k^2 \mid \tau_{k(0)}^2)$, is completely specified, we can find numbers $\xi_1$, $\xi_2$ such that:

$$\Pr[T_k^2 \le \xi_1 \mid \tau_{k(0)}^2] = \varepsilon_1 \quad \text{and} \quad \Pr[T_k^2 \ge \xi_2 \mid \tau_{k(0)}^2] = \varepsilon_2 \qquad (5.4.1)$$

where $\varepsilon_1 + \varepsilon_2 = \varepsilon$ ($\varepsilon_1$, $\varepsilon_2$ are two predetermined numbers).

Similarly, for every value of $\tau_k^2$, the pairs of numbers $\xi_1$, $\xi_2$ can be found, which enables us to write $\xi_1$, $\xi_2$ as functions of $\tau_k^2$, i.e. $\xi_1(\tau_k^2)$ and $\xi_2(\tau_k^2)$ respectively, and finally we state:

$$\Pr[\xi_1(\tau_k^2) \le \text{observed } T_k^2 \le \xi_2(\tau_k^2)] = 1 - \varepsilon \qquad (5.4.2)$$

Writing $\xi_1(\tau_k^2) = T_k^2$ and $\xi_2(\tau_k^2) = T_k^2$,
(5.4.3)

we invert them to obtain respectively:

$$\tau_k^2 = \psi_1(T_k^2), \qquad \tau_k^2 = \psi_2(T_k^2) \qquad (5.4.4)$$

and then rewrite (5.4.2) as:

$$\Pr[\psi_2(T_k^2) \le \tau_k^2 \le \psi_1(T_k^2)] = 1 - \varepsilon \qquad (5.4.5)$$

which determines the region for $\tau_k^2$ for a known value of $T_k^2$ at $100(1-\varepsilon)\%$ confidence.

Thus, to compute the interval for $\tau_k^2$ corresponding to a known value of $T_k^2$, we refer to Fig. 5 below and explain the procedure as follows:

Suppose we have computed $T_{k1}^2$ on the basis of $k$ populations. Through the point $E(T_{k1}^2, 0)$ on the $T_k^2$-axis, erect a perpendicular to the $T_k^2$-axis and let it cut the curves $\psi_2(T_k^2)$ and $\psi_1(T_k^2)$ respectively at the points A and B. Take A' and B' respectively to be the images of A and B on the $\tau_k^2$-axis. Then, if the distances of A' and B' from the origin are respectively $\tau_{k(1)}^2$ and $\tau_{k(2)}^2$, we have:

$$\Pr[\tau_{k(1)}^2 \le \tau_k^2 \le \tau_{k(2)}^2 \mid T_{k1}^2] = 1 - \varepsilon$$

which determines the region for a known value $T_{k1}^2$ of $T_k^2$ with $100(1-\varepsilon)\%$ confidence.

5.5 Example

To make the procedure clearer we present below an example for $p = 4$, $n_1 = k - 1 = 1$, $n_2 = 29$, and construct the 90% confidence region for $\tau_2^2$ corresponding to a known value of Studentized $T_2^2 = 25$.

Both lower and upper 5% significant points (Fisher and Garwood) of non-central chi-square for D.F. = 1(1)7 and $\sqrt{\lambda}\,(= \tau) = 0(0.2)5.0$ have long since been computed; but, since they were not immediately available to us, we have preferred to compute them by the approximate method suggested by Patnaik (1949), for $\lambda = \tau_2^2 = 0(2)36$ and D.F. $f = p(k-1) = 4$, as follows:

(i) We first select an appropriate percentage point of chi-square as tabled by Hartley and Pearson (1954) and use the 4-point Lagrangian formula to get the same percentage point for chi-square with D.F. $(f + \lambda)^2/(f + 2\lambda)$; we then multiply the result by $\left(1 + \frac{\lambda}{f + \lambda}\right)$. The appropriate lower and upper 5% points obtained by the method are recorded respectively in the second and third columns of Table 14.
(ii) Then we find the values of $C_1$, $C_2$, $C_3$ and $C_4$ defined in (5.2.2) for appropriate values of $p$, $n_1$ and $n_2$, which in our case are 1.07432, 0.0197, 0.000198 and 0.0000037 respectively.

(iii) Lastly, substituting the values obtained above in steps (i) and (ii) in formula (5.2.3), we obtain the corresponding lower and upper 5% tabular values of Studentized $T_2^2$ and record them respectively in columns 4 and 5 of Table 14.

Having obtained these lower and upper 5% tabular values of $T_2^2$, we plot them on the graph against the respective values of $\lambda$, i.e. $\tau_2^2$, and obtain two curves $\xi_1(\tau_2^2)$ and $\xi_2(\tau_2^2)$ as shown in Fig. 6.

Finally, to find the 90% confidence region for computed $T_2^2 = 25$, we erect a perpendicular through the point $E(25, 0)$ on the $T_2^2$-axis and let it cut the curves $\xi_2(\tau_2^2)$ and $\xi_1(\tau_2^2)$ respectively at A and B. We then take A' and B' respectively to be the images of A and B on the $\tau_2^2$-axis. Reading their distances from the origin to be 3.9 and 29.1 approximately, we conclude that:

$$\Pr[3.9 \le \tau_2^2 \le 29.1 \mid T_2^2 = 25] = .90 \qquad (5.5.1)$$

which thus determines the region for a known value 25 of $T_2^2$ with 90% confidence.

Note: The non-centrality parameter $\tau_k^2$ involves sample sizes. In order that the non-centrality parameter should contain population constants only, we have to resort to the assumption that the sample sizes are equal, i.e. $N_1 = N_2 = \dots = N_k = N$ (say), in which case

$$\tau_k^2 = N\sum_{i=1}^{p}\sum_{j=1}^{p}\sigma^{ij}\sum_{r=1}^{k}(\mu_{ir} - \bar{\mu}_i)(\mu_{jr} - \bar{\mu}_j) \qquad (5.5.2)$$

where $\bar{\mu}_i = \left(\sum_{r=1}^{k}\mu_{ir}\right)/k$, or alternatively that

$$\tau_k^2 = N\tau_k'^2 \qquad (5.5.3)$$

where

$$\tau_k'^2 = \sum_{i=1}^{p}\sum_{j=1}^{p}\sigma^{ij}\sum_{r=1}^{k}(\mu_{ir} - \bar{\mu}_i)(\mu_{jr} - \bar{\mu}_j) \qquad (5.5.4)$$

Thus if we suppose $N = 15$, say, we can deduce from (5.5.1) the following for $\tau_2'^2$:

$$\Pr[0.26 \le \tau_2'^2 \le 1.94 \mid T_2^2 = 25]$$
$= .90$ (5.5.5)

Table 14

$\lambda$     5% chi-square values     5% $T^2$-values
              Lower     Upper          Lower      Upper
0             0.71      9.49           0.773      12.168
1             0.93      11.72          1.005      15.676
2             1.24      13.72          1.362      19.090
4             1.77      17.31          1.965      25.859
6             2.83      20.77          3.202      33.275
8             3.80      24.08          4.379      41.302
10            4.85      26.97          5.698      49.146
12            5.98      29.93          7.176      58.079
14            7.15      32.85          8.770      67.878
16            8.36      35.69          10.493     78.440
18            9.64      38.44          12.396     89.731
20            10.91     41.29          14.375     102.635
22            12.24     43.96          16.547     115.935
25            14.26     47.94          20.053     138.136
30            17.77     54.55          26.792     182.128
36            22.12     62.24          36.432     246.443

5.6 An Alternative Expression for $T_k^2$

We have already given three expressions of $T_k^2$ in Section 1.2. We now give below another expression, as the sum of weighted Mahalanobis distances:

$$T_k^2 = \sum_{1 \le r < s \le k} \frac{N_rN_s}{\sum_{r=1}^{k}N_r}\,D_{(rs)}^2 \qquad (5.6.1)$$

where $D_{(rs)}^2$ is the Mahalanobis distance between the $r$th and the $s$th populations.

The statement (5.6.1) is proved as follows:

Consider the set of numbers $u_r$, $v_r$ and the set of integers $N_r$ ($r = 1, 2, \dots, k$).

Let $N = \sum_{r=1}^{k}N_r$ and $N\bar{u} = \sum_{r=1}^{k}N_ru_r$, $N\bar{v} = \sum_{r=1}^{k}N_rv_r$.

Let $S(u,v) = \sum_{1 \le r < s \le k} N_rN_s(u_r - u_s)(v_r - v_s)$.

Then

$$S(u,v) = \frac{1}{2}\sum_{r=1}^{k}\sum_{s=1}^{k}N_rN_s(u_r - u_s)(v_r - v_s)$$

$$= \frac{1}{2}\sum_{r=1}^{k}\sum_{s=1}^{k}N_rN_s\left[(u_r - \bar{u})(v_r - \bar{v}) - (u_r - \bar{u})(v_s - \bar{v}) - (u_s - \bar{u})(v_r - \bar{v}) + (u_s - \bar{u})(v_s - \bar{v})\right]$$

$$= \frac{1}{2}\left[\sum_{r=1}^{k}N_r(u_r - \bar{u})(v_r - \bar{v})\Big(\sum_{s=1}^{k}N_s\Big) + \Big(\sum_{r=1}^{k}N_r\Big)\sum_{s=1}^{k}N_s(u_s - \bar{u})(v_s - \bar{v})\right]$$

since the cross-product terms vanish. Thus

$$S(u,v) = \sum_{1 \le r < s \le k}N_rN_s(u_r - u_s)(v_r - v_s) = N\sum_{r=1}^{k}N_r(u_r - \bar{u})(v_r - \bar{v})$$

Now we apply this relationship, taking $u_r = \bar{x}_{ir}$, $v_r = \bar{x}_{jr}$, so that $\bar{u} = \bar{x}_i$, $\bar{v} = \bar{x}_j$. In the Studentized case

$$T_k^2 = \sum_{i=1}^{p}\sum_{j=1}^{p}w^{ij}\sum_{r=1}^{k}N_r(\bar{x}_{ir} - \bar{x}_i)(\bar{x}_{jr} - \bar{x}_j) = \frac{1}{N}\sum_{i=1}^{p}\sum_{j=1}^{p}w^{ij}\sum_{1 \le r < s \le k}N_rN_s(\bar{x}_{ir} - \bar{x}_{is})(\bar{x}_{jr} - \bar{x}_{js}) = \sum_{1 \le r < s \le k}\frac{N_rN_s}{N}\,D_{(rs)}^2$$
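Let $T_{(rs)}^2 = \frac{N_rN_s}{N_r + N_s}\,D_{(rs)}^2$ denote the corresponding Hotelling $T^2$. Then

$$T_k^2 = \sum_{1 \le r < s \le k}\frac{N_rN_s}{N}\,D_{(rs)}^2 = \sum_{1 \le r < s \le k}\frac{N_r + N_s}{N}\,T_{(rs)}^2$$

The same argument works for classical $T_k^2$, which will be expressed in terms of classical $D_{(rs)}^2$ and $T_{(rs)}^2$, where $\sigma^{ij}$ replaces $w^{ij}$ throughout.

The same argument works for the parameter $\tau_k^2$:

$$\Big[\sum_{r=1}^{k}N_r\Big]\tau_k^2 = \sum_{1 \le r < s \le k}N_rN_s\,\Delta_{(rs)}^2$$

where $\Delta_{(rs)}^2 = \sum_{i=1}^{p}\sum_{j=1}^{p}\sigma^{ij}(\mu_{ir} - \mu_{is})(\mu_{jr} - \mu_{js})$.

Again the relation (5.6.1) can also be expressed in matrix form as:

$$\Big[\sum_{r=1}^{k}N_r\Big]T_k^2 = \frac{1}{2}\,[N_1\ N_2\ \cdots\ N_k]\begin{bmatrix}0 & D_{12}^2 & D_{13}^2 & \cdots & D_{1k}^2\\ D_{21}^2 & 0 & D_{23}^2 & \cdots & D_{2k}^2\\ \vdots & & & & \vdots\\ D_{k1}^2 & D_{k2}^2 & D_{k3}^2 & \cdots & 0\end{bmatrix}\begin{bmatrix}N_1\\ N_2\\ \vdots\\ N_k\end{bmatrix} \qquad (5.6.16)$$

CHAPTER SIX

DISTRIBUTION OF THE DETERMINANT OF THE S.P. MATRIX IN THE NON-CENTRAL LINEAR CASE FOR SOME VALUES OF p

6.1 Let $k_1^2$ be the non-centrality parameter for the linear case. Then the $h$-th moment of the determinant $|A|$, where $A$ is the S.P. matrix with $n$ D.F., is rewritten from (1.3.3) as follows:

$$E[|A|^h] = \left[\prod_{i=1}^{p-1}\frac{2^h\,\Gamma\!\left(\frac{n-i}{2} + h\right)}{\Gamma\!\left(\frac{n-i}{2}\right)}\right]\exp\!\left(-\tfrac{1}{2}k_1^2\right)\sum_{j=0}^{\infty}\frac{\left(\tfrac{1}{2}k_1^2\right)^j}{j!}\,\frac{2^h\,\Gamma\!\left(\frac{n}{2} + j + h\right)}{\Gamma\!\left(\frac{n}{2} + j\right)} \qquad (6.1.1)$$

The right hand side of (6.1.1) can be interpreted as follows:

(i) $2^h\,\Gamma\!\left(\frac{n-i}{2} + h\right)/\Gamma\!\left(\frac{n-i}{2}\right)$ is the $h$-th moment of $f_i(u_i)$, where

$$f_i(u_i) = \frac{1}{2^{(n-i)/2}\,\Gamma\!\left(\frac{n-i}{2}\right)}\,u_i^{\frac{n-i}{2}-1}\exp\!\left(-\tfrac{1}{2}u_i\right), \qquad i = 1, 2, \dots, p-1 \qquad (6.1.2)$$

and (ii) $\exp\!\left(-\tfrac{1}{2}k_1^2\right)\sum_{j=0}^{\infty}\frac{(\frac{1}{2}k_1^2)^j}{j!}\,\frac{2^h\,\Gamma(\frac{n}{2} + j + h)}{\Gamma(\frac{n}{2} + j)}$ is the $h$-th moment of $f_0(u_0)$, where

$$f_0(u_0) = \exp\!\left(-\tfrac{1}{2}k_1^2\right)\sum_{j=0}^{\infty}\frac{\left(\tfrac{1}{2}k_1^2\right)^j}{j!}\,\frac{u_0^{\frac{n}{2}+j-1}\exp\!\left(-\tfrac{1}{2}u_0\right)}{2^{\frac{n}{2}+j}\,\Gamma\!\left(\frac{n}{2} + j\right)} \qquad (6.1.3)$$

Thus: (i) from (6.1.2), $f_i(u_i)$, $i = 1, 2, \dots, p-1$, are central chi-squares, which we can take to be independently distributed with $(n-i)$ D.F. for $u_i$; and (ii) from (6.1.3), $f_0(u_0)$ is a non-central chi-square which we take again to be independently distributed, with $n$ D.F. and non-centrality parameter $k_1^2$.

Since the moment of a product of independent variables is the product of the moments of the variables, it follows that:

$$E[|A|^h] = E(u_0^h)\,E(u_1^h)\cdots$$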
$$E(u_{p-1}^h) = E[(u_0u_1u_2\cdots u_{p-1})^h]$$

Alternatively, therefore, the $h$-th moment of $|A|$ could be directly determined by multiplying the $h$-th moments of the independent variates $u_i$ ($i = 0, 1, 2, \dots, p-1$) defined above, from which one concludes that if one wants to determine the distribution of $|A|$, one can do so by finding the distribution of the product $(u_0u_1\cdots u_{p-1})$.

Since $u_0, u_1, \dots, u_{p-1}$ are independent, their joint distribution can be written down and the distribution of $(u_0u_1\cdots u_{p-1})$ can further be determined for $p = 2$, 3, and 4 as follows:

The joint distribution of the independent variates $u_i$ ($i = 0, 1, \dots, p-1$) can be written as:

$$f_0(u_0)\,du_0\prod_{i=1}^{p-1}\frac{1}{\Gamma\!\left(\frac{n-i}{2}\right)}\left(\frac{u_i}{2}\right)^{\frac{n-i}{2}-1}\exp\!\left(-\tfrac{1}{2}u_i\right)d\!\left(\frac{u_i}{2}\right) \qquad (6.1.4)$$

where $0 \le u_i < \infty$, $i = 0, 1, 2, \dots, p-1$, and $f_0(u_0)$ is given by (6.1.3).

After a little manipulation and setting $n = 2m + p + 1$ ($p \le n$), the joint distribution of $u_i$ ($i = 0, 1, 2, \dots, p-1$) becomes

$$\prod_{i=0}^{p-1}\left[2^{(n-i)/2}\,\Gamma\!\left(\tfrac{n-i}{2}\right)\right]^{-1}u_{p-1}^{m}\,u_{p-2}^{m+\frac{1}{2}}\cdots u_1^{m+\frac{p-2}{2}}\,u_0^{m+\frac{p-1}{2}}\exp\!\left(-\tfrac{1}{2}\sum_{i=0}^{p-1}u_i - \tfrac{1}{2}k_1^2\right)\left[1 + \frac{u_0\,(k_1^2/2)}{1!\,(2m+p+1)} + \frac{u_0^2\,(k_1^2/2)^2}{2!\,(2m+p+1)(2m+p+3)} + \cdots\right]\prod_{i=0}^{p-1}du_i \qquad (6.1.5)$$

where $0 \le u_i < \infty$, $i = 0, 1, 2, \dots, p-1$.

6.2: Preliminaries

(i) We make use of Legendre's duplication formula for the gamma function, namely

$$\Gamma\!\left(n + \tfrac{1}{2}\right)\Gamma(n + 1) = \frac{\sqrt{\pi}\;\Gamma(2n + 1)}{2^{2n}} \qquad (6.2.1)$$

(ii) We list below the standard integrals, derived from various books of integral tables, of which frequent use has been made:

(a) Larsen's book of tables (p. 254) gives

$$\int_0^\infty \exp\!\left[-(x^2 + a^2x^{-2})\right]dx = \frac{\sqrt{\pi}}{2}\exp(-2a) \qquad (6.2.2)$$

for $a \ge 0$.

(b) Bierens de Haan gives in his Table 98 (pp. 143-144) two integrals numbered (5) and (17) as follows:

$$\int_0^\infty x^{a-\frac{1}{2}}\exp\!\left[-(px + qx^{-1})\right]dx = \left(\frac{\pi}{p}\right)^{\frac{1}{2}}\left(\frac{q}{p}\right)^{\frac{a}{2}}\exp(-2\sqrt{pq})\sum_{n=0}^{a}\frac{(a + 1 - n)^{2n/1}}{n!\,(4\sqrt{pq})^n} \qquad (6.2.3)$$

and

$$\int_0^\infty x^{-\frac{1}{2}}\exp\!\left[-(px + qx^{-1})\right]dx = \left(\frac{\pi}{p}\right)^{\frac{1}{2}}\exp(-2\sqrt{pq}) \qquad (6.2.4)$$

Note: In both of these Kramp's notation is used, namely

$$x^{n/h} = x(x + h)(x + 2h)\cdots$$
$$(x + \overline{n-1}\,h)$$

(c) From Whittaker and Watson's book, we quote two integrals (p. 116, ex. 6 and p. 243, ex. 4):

$$\int_0^\infty \frac{\exp(-t) - \exp(-tz)}{t}\,dt = \int_0^\infty \frac{\exp(-t) - \exp(-zt^{-1})}{t}\,dt = \log z \qquad (6.2.5)$$

where the real part of $z$ is positive;

$$\int_0^1 \frac{\exp(-u^{-1}) + \exp(-u) - 1}{u}\,du = -\gamma \qquad (6.2.6)$$

where $\gamma$ is known as the Euler constant.

(iii) Evaluation of Certain Integrals by the Use of Differential Equations

(a) Evaluation of $I$, where

$$I = \int_0^\infty x\exp(-2x - 2ax^{-1})\,dx \qquad (6.2.7)$$

Setting $x = \frac{1}{2}u^{-1}$ and $b = 4a$ in $I$, one obtains the integral

$$K(b) = \frac{1}{4}\int_0^\infty u^{-3}\exp(-u^{-1} - bu)\,du \qquad (6.2.8)$$

If

$$k(u) = \frac{1}{4}u^{-3}\exp(-u^{-1}) \qquad (6.2.9)$$

then

$$\frac{1}{k}\frac{dk}{du} = (1 - 3u)u^{-2} \qquad (6.2.10)$$

and

$$K(b) = \int_0^\infty k(u)\exp(-bu)\,du \qquad (6.2.11)$$

The function $K$ satisfies a differential equation of the form

$$(c_2 + d_2b)\frac{d^2K}{db^2} + (c_1 + d_1b)\frac{dK}{db} + (c_0 + d_0b)K = 0, \qquad (6.2.12)$$

which after some simplification reduces to

$$b\frac{d^2K}{db^2} - \frac{dK}{db} - K = 0 \qquad (6.2.13)$$

Solving (6.2.13) by the Frobenius method of series, we get:

$$K(b) = (A + B\log b)\left[-\frac{b^2}{2!\,0!} - \frac{b^3}{3!\,1!} - \frac{b^4}{4!\,2!} - \cdots\right] + B\left[1 - b + \frac{1}{4}b^2 + \frac{11}{36}b^3 + \cdots\right] \qquad (6.2.14)$$

To find $A$ and $B$ we proceed as follows:

Set $b = 0$ in (6.2.8) and in its derivative, to get:

$$K(0) = -K'(0) = \frac{1}{4} \qquad (6.2.15)$$

Now setting $b = 0$ in (6.2.14) and then using (6.2.15), we get:

$$B = \frac{1}{4} \qquad (6.2.16)$$

However, the substitution of $b = 0$ in the derivative of $K(b)$ defined in (6.2.14) does not help, since by using (6.2.15) the resulting expression for $A$ is a limit as $b \to 0$ which is indeterminate.
Again, making use of L'Hospital's Rule,

$$-A - \frac{1}{4} = \lim_{b \to 0}\left[K''(b) + \frac{1}{4}\log b\right]$$

or

$$-4A - 1 = \lim_{b \to 0}\left[\int_0^\infty u^{-1}\exp\!\left[-(u^{-1} + bu)\right]du + \log b\right]$$

$$= \lim_{b \to 0}\left[\int_0^\infty \exp\!\left[-(u^{-1} + bu)\right]\frac{du}{u} + \int_0^\infty \frac{\exp(-u) - \exp(-bu)}{u}\,du\right] \quad \text{(by using 6.2.5)}$$

$$= \lim_{b \to 0}\int_0^\infty \frac{\exp(-u^{-1} - bu) + \exp(-u) - \exp(-bu)}{u}\,du$$

Since (i) $f(b,u) = \dfrac{\exp(-u^{-1} - bu) + \exp(-u) - \exp(-bu)}{u}$ is continuous on the right at $b = 0$ and (ii) for $0 \le b \le 1$

$$|f(u,b)| \le \max\left[|f(u,0)|,\ |f(u,1)|\right] \le \begin{cases}\dfrac{\exp(-u^{-1})}{u} + \dfrac{1 - \exp(-u)}{u} & \text{for } 0 \le u \le 1\\[2ex] \dfrac{\exp(-u)}{u} + \dfrac{1 - \exp(-u^{-1})}{u} & \text{for } 1 \le u < \infty\end{cases}$$

where each term in the last expression is integrable over the given interval, the order of limit and integration can be interchanged and one gets:

$$-4A - 1 = \int_0^\infty \frac{\exp(-u^{-1}) + \exp(-u) - 1}{u}\,du = \int_0^1 \frac{\exp(-u^{-1}) + \exp(-u) - 1}{u}\,du + \int_1^\infty \frac{\exp(-v^{-1}) + \exp(-v) - 1}{v}\,dv$$

Now setting $v = \frac{1}{u}$ in the second integral, we obtain by using (6.2.6)

$$-4A - 1 = 2\int_0^1 \frac{\exp(-u^{-1}) + \exp(-u) - 1}{u}\,du = -2\gamma \qquad (6.2.17)$$

Finally from (6.2.7), (6.2.8), (6.2.14), (6.2.16) and (6.2.17), we get:

$$I = \int_0^\infty x\exp\!\left[-2(x + ax^{-1})\right]dx = \frac{(1 - 2\gamma) - \log 4a}{4}\left[\frac{(4a)^2}{2!\,0!} + \frac{(4a)^3}{3!\,1!} + \cdots\right] + \frac{1}{4}\left[1 - 4a + \frac{1}{4}(4a)^2 + \cdots\right] \qquad (6.2.18)$$

(b) Evaluation of

$$L_r(a) = 2\int_0^\infty x^{2r+1}\exp(-x^2 - ax^{-1})\,dx \qquad (6.2.19)$$

for $a$ real and positive and $r = 0, 1, 2, \dots$

The values of successive derivatives at $a = 0$ are:

$$L_r(0) = \Gamma(r + 1),\quad L_r'(0) = -\Gamma\!\left(r + \tfrac{1}{2}\right),\quad L_r''(0) = \Gamma(r),\quad L_r'''(0) = -\Gamma\!\left(r - \tfrac{1}{2}\right),\quad L_r^{iv}(0) = \Gamma(r - 1),\quad L_r^{v}(0) = -\Gamma\!\left(r - \tfrac{3}{2}\right),\ \text{etc.} \qquad (6.2.20)$$

Setting $x = u^{-1}$, we get from (6.2.19):

$$L_r(a) = 2\int_0^\infty u^{-2r-3}\exp(-u^{-2} - au)\,du \qquad (6.2.21)$$

Consider

$$\ell_r(u) = 2u^{-2r-3}\exp(-u^{-2}) \qquad (6.2.22)$$

Its differential equation is:

$$\frac{1}{\ell_r}\frac{d\ell_r}{du} = \frac{2 - (2r + 3)u^2}{u^3} \qquad (6.2.23)$$

Now $L_r(a) = \int_0^\infty \ell_r(u)\exp(-au)\,du$ is the solution of the differential equation:
/ _ , \\ dL (c^ + d^a) r + (c 2 + d 2a) r -ft- (cj-fr d^a) r + (c Q + d 0 a ^ L r = 0 da 3 da 2 d a oo (6.2.24) i f j £ (-c^u3 + c gu 2 - + c Q) + a(-d 3u 3 + d gu 2 - d^u + d Q ) J £ (u) 0 exp(-au)du ^ 0 Proceeding as before, as in part (a), we obtain: c^ = 2d^ , c 2 = -2rd^ , d^ ^ 0 and c 1 = c 3 = d Q = d x = d g = 0 Thus L r(a) satisfies the differential equation: 3 2 d°L d L a — j L - 2r — + 2L r = 0 (6.2.25) da da To solve (6.2.25) by Frobenius method of series, let: L r(a) = a C(b Q -9 -b La + b 2a 2 + b^a 3 + ...) (6.2.26) Substituting i t in (6.2.25), we obtain the following: (i) from indicial equation, c = 0, 1, 2(l+r) (6.2.27) ( i i ) b x = b 3 = b 5 = ... = b 2 n + 1 = ... =0 (6.2.28) and ( i i i ) b = \"^0 (c+2)(c+l)(c-2r) b,, = 91 4 ( c-fr4)(c+3 )( C42 )(c-frl)( c -2r )(c -2r+2 ) b 2 \\ 6 (c-ft6)(c+5)...(Gi>l)(c-2r)(c-2r-*2)(c-2rf4) -105!-2 0,\" 0 Do = _ (c+3)(c+7)...(c+l)(c-2r)(c-2r+2)(c-2r+4)(c-2r+6) -, etc. (6.2.29) Evaluation of L (a) for Particular Values of r r v (i) Setting r = 0, the differential equation (6.2.25) becomes d 3L a — + 2 L Q = 0 (6.2.30) da Making use of results (6.2.26) to (6.2.30), we get: L 0(a) = [ A Q + B 0 log a][ - ^ + 1^-**-^+ . . . ] r . 2.3 2 4(124) 4 -i + B0|_l + x2 a \" h2 2.24 x2 a * • • • J + c o L a \" 3 T \" a 3 + f l T T a 5 \" T T I T T a ? + * * * ] (6.2.31) With the help of (6.2.20) and remembering that r = 0, we easily obtain from (6.2.31): B Q = f ( l ) , C Q = - p i ) (6.2.32) To find AQ, we differentiate twice (6.2.31) with respect to a, and then, setting a = 0, we obtain: - 2AQ = Lt jj.J(a) +.2 log a] a-* 0 = Lt a-> 0 ^2 j u 1 exp(-u~2 - au)du + 2 log aj -110-2 Setting u = t we obtain: - 2A = Lt 0 T I 1 t\" 1 - exp(-t - at 2)dt + log a 2 / a-r 0 0 Finally, making use of (6.2.5) we get: 1 t \" 1 £exp(-t - at 2 ) -fr exp (-t\"1 - a2t~^}dt J . 
= Lt a-* 0 Again an interchange of limit and integration is possible, so we obtain: 2 A0 j t _ 1[_exp(-t) + exp(-t _ 1) - l j dt Now proceeding as before in part (a), we get: - 2Aq = 2Y , so AQ = - Y » -thei Euler constant. Thus 2a2 k k 8a6 L Q(a) = ( * - log a ) ( ^ - + 2 1 4 3 • 2 • 1 -Vff ( a - f ^ a 3 + a 5 - ^ J L _ a 7 + ...) (6.2.33) ( i i ) Setting r = 1, the differential equation (6.2.25) becomes: 3 2 &\\ d L a ±- -2 — ± +2L =0 (6.2.34) da 3 da L Proceeding as above and similarly evaluating the constants with the help of (6.2.5), (6.2.6), (6.2.20) for r = 1, we get: -111-L ^ a ) = + ^ \" L QS aJ ( lH2 a \" S + 81 2-4-2 a \" • \" ) + ( l + I\" a 2 + a^ \" + ... ) - ^ ( a + | r a 3 - a 5 + | ^ a 7 - ...) ( 6 . 2 . 3 5 ) ( i i i ) For r = 2, the dif f e r e n t i a l equation to he solved i s : ,3 - , 2 T aJL„ d Lg + 2L_ = 0 (6.2 .36) * 3 . 2 \" 2 da da Again with the help of ( 6 . 2 . 5 ) , ( 6 . 2 . 6 ) and ( 6 . 2 . 2 0 ) for r = 2, the solution o f , ( 6 . 2 . 3 6 ) is • 3 h L 2(a) = [( % + |) - ( ? 3 ) log a j f ^ f ^ ; a 6 - ft,^) 2 a 8 4 . . . ] + f?3) [ l + ^ a 2 + ^ ^+5^5- a 6 + . . . J - [(§) [ ^ + 3 1 3 a 3 + 5 i f ^ a 5 + - - - ] ( 6 . 2 . 3 7 ) 6 . 3 : Distribution of the Determinant of the S.P. Matrices A up to the Order k i n the Mon-Central Linear Case Case 1: For p = 2 , i.e. when A is of order 2 and is positive definite. Substituting p = 2 i n (6.1.5)> the joint distribution of u Q and u^ i s : -112-5\\ ,m mfr 1 u Q 4/2 uS (k=/2)2 IT 2S3 * ST (2j*3)(gw5) * ••• J a \" o d u l ( 6 - 3 a ) 0 £ u , u <.00 where 0 1 Set Vo = V l ' uO = 2 V 2 ( 6 . 3 . 2 ) so that du^ du., = 1 0 ( v2 7F Now using (6.2.2) J exp(- \" V d V 2 = 2~ 6 X p ^ \" V ( 6* 3* 5 ) v2=o For r ^ 0, V 0 = t, the integral I = 1 r- — 2 2 r exp(- - i f - V2)dV 2 ^2 2 2 reduces to I = r 2 2 1 t exp(-t - )dt, and now using (6.2.3) we have: = § < ^ e x p < - V 2n/l 2 ' V n=0 v l -2 exp(- V 1)T j (6.3.6) * - . 
T r - ( V I ^ 2n/l 2 ' V n=0 1 Thus the distribution of = / U Q I I ^ is vf* 1 exp(- V x - § k2) T T x k2 T 2 k* 1 p2m42) L 1 + IT 2m+3 + 2 T (2m*3)(2n*5) + • • • J d V X (6.3.7) where 0) < V C and m n-3 Note: For k2 = 0, and m = , (6.3-7) becomes: f i f e ) V l 2 eXP< - V l > d V l which is a gamma variate with parameter (n-l). (6.3.8) -mi-Case 2: For p = 3 Substituting p = 3 in ( 6 . 1 . 5 ) , the joint distribution of V V u2 i s ; _ m m-fr- m-fl-1 3% u_u., 2 ur -3(mfr£) *2\\ * u 0 , 1 T- • 1 . 2 . 2 J V 2' F ^ o-p exp(- - > u - - k ) PmH)pn*f)f(m«) 2 ^ - 1 2 1 i = 0 2 / „ x 2 r uo V ( 1V 2 ) i L1 + lT ^ P T * 2f (2nH^ )(2nHf6)^ •••JdU0dUldU2 where 0 - V V u2 c o e ( 6 . 3 - 9 ) Setting UgU^s u ^ 2 = V2,, uQ= 2V^ (6.3.10) so that duQdu]Ldu2= ^ - S r \" 1 dV]dV?dV3 and then making use of (6.2.1), we obtain the distribution of after <*. l i t t l e manipulation as: oo Vlf 2 n * l f W ) ^ m + 3 ) J J e X p { ' ~ 2 - ' ^ ' V 3 ) L 1+ v3=o v2=o ^ .2 „U ,U V„ k. i f 2mir + 2^ T ^ f e o T * ...]dV 3dV 2d V l ( 6 . 3 . 1 1 ) where 0 * \\ « • 0 0 Making use of (6.2.2), oo ? 2 v,vr v 0 2 3 Then ( 6 . 3 . 1 1 ) reduces to: -115-2 m |(ntfH) j(2m+3) J 3 2 J ^ X* * a w d* 0 (awUKaatg) + — ] d V v i t 6 - 3 - 1 2 ' Now making use of the integral (6.2.19) for r = 0, 1, 2, ... 
given respectively in (6.2.33), (6.2.35), (6.2.37), etc., and remembering that a in (6.2.19) is equal to / 1 , the distribution of V X ( = \" Q \" ^ ) I S : L U W 2 V IT ~2nHff + 2 T (2nrt4)(2n „m , 1 ,2 v V l 6 X P ( - 2 k l } 211*1 [(mtl)p(2m*3) L \"°l>/2^ \" i t i s p f ~ + 2T (2mW)(2m*6) + d V l n 4 where 0 < V^<-<>~ f and m = (6.3.13) n—4 ;ucing = u ana m = case becomes: Note: Substituting k^ = 0 and m = ^ , the distribution in the-central n-4 V , 2 n-2 r-» 0 2 2 l ( f - ^ f a - L> U J ^ ) (6.3.14) for > 0 f V < where I^G/^) is defined in (6.2.33) Case 3: For p = 4: Substituting p = 4 in (6.1.5)> the joint distribution of u Q, u^, u 2 and u^ i s : -116-7 m 2 mfrl \"* 2 0-U(m+ir) U 3 U 2 U l U 0 ( L ST 1 „ 2 . x Rm+l)RnH(|)Rii>fr2)(T] ' i=0 r U q (kJ/2) (k 2/2) 2 n L 1*IT 2 m i 5 - + 2T (2nHh5)(2nHh7) * • • J * V V V 1 ^ <6-3*15) where 0 * u Q, u.^ Ug, Setting u^gU-jU^ U g i y ^ 2 V 2 , W= u 0= 2 v f (6.3.16) so that d ^ d U g d u ^ u ^ S f V g V ^ ) \" 1 av^VgdV^V^ and also making use of ( 6 . 2 . 1 ) , we obtain the distribution of afteralittle manipulation; as-.'.follows 00 00 2VJ exp(- |k! TTf(2mfr2)|(2mf4 2Mh) J J J u v 2 v 2 t v T V ° v 3 = ° V 2=° V 2 k 2 V*- k^ t+TT 2^5 + 2T (2 n ^5)(2m f r 7 ) + - J d V 2 d V 3 d V 4 ( 6 * 3 * 1 T ) where 0 < 0 0 Making use of ( 6 . 2 . 2 ) , we integrate (6.3.17) first with respect to Vg and obtain: V™exp(-|-k 2)dV f FT f ••V? • v T K2m42)R2nH^) 7 3 V 3 ' ' Uv? U V v3=o V^=0 * / V2, k 2 V,,11 k,1* • A ( l + J£ * Jt i •> ... dV.dV. (6.3.18) V l i - 2m+5 2 (2m+5)(2m+7) • / 3 U ^ where 0 J ^ - 1 1 7 -To integrate with respect to V^, we evaluate' again the f i r s t integral as before by using ( 6 . 2 . 2 ) , while i n the others we set = t and then, using ( 6 . 2 . 3 ) , we obtain in place of ( 6 . 3 - 1 8 ) : -v ; e x p ( - | k 2 ) d V 1 f V j. I k* —=; i=s / V exp( - — - V ) 1 + y r p — + 2 p 2 m + 2 ) p 2 m + 4 ) / = Q 3 V 3 3 - L 1 * ^ 3 + 2 T (Lf5)(aM-7) + — J d V 3 ( 6 . 3 . 1 9 ) for 0 i v <> V_ V c . 
i \\ 2 n / l v h e r e l r = 0 y?e Xp(-V 3) 2 _ ^ f ^ 3 r - (6-3-20) n=0 3 Further to evaluate ( 6 . 3 . 1 9 ) , we have to use either ( 6 . 2 . 3 ) or ( 6 . 2 . 4 ) for p = 1 , q_ =v/V2_ a n d suitable value of a. This determines the distribution of V^(= u^u^UgU^) where i t should be remembered that m = 7j{n-5). Note: For the central case we set k^ = 0 i n ( 6 . 3 . 1 9 ) and then^making use of ( 6 . 2 . 1 8 ) , we get the distribution of V.^ a n 1 V l d \\ _ I\" flUSX) - log a W a 2 ' a 3 a^_ N p - 3 ) p - i ) I K. ~2 ; ( 2 l o T + 3TlT + 412T + \"J •+ h i - a + \\ a 2 + 4 i p - a 3 +...)] ( 6 . 3 . 2 1 ) 2 3 2 J where 0 f V 1 <- 0 0 and a = Jl~^ CHAPTER SEVEN STATISTICS PROPOSED FOR VARIOUS TESTS OF HYPOTHESES I, II AND III AND THEIR DISTRIBUTIONS IN PARTICULAR CASES 7*1: We l i s t below the statistics,based simultaneously on the roots of both the determinantal equations (1.4.5) and (1.4.6),which can be used to test the hypotheses I, II and III with the suitable use of independent S.P. matrices A and C: ( i ) Roy's st a t i s t i c s of largest, smallest and intermediate eigen-roots based on the determinantal equation (1.4.5). We can simultaneously propose to include that of the eigenrootst: based on the determinantal equation (1.4.6). 2 ( i i ) Hotelling's T^-statistic defined as: I = n 2tr ( 0 - l & ) - £ . £ ( ^ ) . 
(iii) Wilks' $\Lambda$-statistic defined as:

$$\Lambda = \frac{|C|}{|A + C|} = \prod_{i=1}^{t}(1 - \theta_i) = \prod_{i=1}^{t}(1 + \phi_i)^{-1}$$

(iv) The Wilks-Lawley U-statistic defined as:

$$U = \frac{|A|}{|A + C|} = \prod_{i=1}^{t}\theta_i = \prod_{i=1}^{t}\frac{\phi_i}{1 + \phi_i}$$

(v) Pillai's V-statistic defined as:

$$V = \mathrm{tr}\left[(A + C)^{-1}A\right] = \sum_{i=1}^{t}\theta_i = \sum_{i=1}^{t}\frac{\phi_i}{1 + \phi_i}$$

(vi) We propose another statistic Y defined as:

$$Y = \frac{|A|}{|C|} = \prod_{i=1}^{t}\phi_i = \prod_{i=1}^{t}\frac{\theta_i}{1 - \theta_i}$$

Of course, the distribution of any of the statistics, under the null hypothesis, can be found from either of the joint distributions (1.4.7) and (1.4.9); but it will be more convenient to use (1.4.9) for finding that of $\Lambda$, $U$, $V$, and either of the two for finding that of Roy's statistics.

We have taken in Section 7.2 the statistics $T^2$ and $Y$ and have been able to give their distributions for $t = 2, 3$ in the form of definite integrals. Since the procedure is quite similar for the remaining statistics, we have only listed at the end of Section 7.2 their respective distributions in the form of definite integrals, again for the cases $t = 2, 3$.

Nanda (1948) gives the joint limiting form of (1.4.7), which we have listed under (1.4.10). Following him, the joint limiting form of (1.4.9) is easily proved also to be (1.4.10) by making the corresponding substitution in (1.4.9) and then letting $n$ tend to infinity.

The fact that the limiting forms of both (1.4.7) and (1.4.9) are the same enables us to conclude that the limiting distributions of the statistics $Y$ and $U$ will be the same, and also those of $T^2$ and $V$ except for the constant multiplier. The same can be said in the case of Roy's statistics.

In Sections (7.3) and (7.4) we have given another method, different from that of Nanda (1948b), of finding the limiting distributions of Roy's statistics. Further, to demonstrate the method of integration, we have solved some particular cases, giving various values to $m$, for $t = 2$, 3, and 4.

Lastly, in Section 7.5, we have first found a new form suitable for finding the limiting distribution of $U$ or $Y$.
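The statistics listed in (ii) to (vi) above are all functions of the same eigenroots, so they can be computed together. The Python sketch below is only an illustration under stated assumptions: it takes the roots $\phi_i$ of the determinantal equation in $C^{-1}A$ as given, assumes the relation $\theta_i = \phi_i/(1 + \phi_i)$ between the two sets of roots, and assumes that $T^2$ carries the multiplier $n_2$. The function names and the sample roots are hypothetical.

```python
from math import prod

# Sketch only: the relation theta = phi / (1 + phi) and the multiplier n2
# on T^2 are assumptions; function names are hypothetical.

def theta_from_phi(phi):
    # Roots based on (A + C)^{-1} A, obtained from roots based on C^{-1} A.
    return [f / (1.0 + f) for f in phi]


def statistics_from_phi(phi, n2=1.0):
    theta = theta_from_phi(phi)
    return {
        'T2': n2 * sum(phi),                     # Hotelling's T^2
        'Lambda': prod(1.0 - t for t in theta),  # Wilks' Lambda
        'U': prod(theta),                        # Wilks-Lawley U
        'V': sum(theta),                         # Pillai's V
        'Y': prod(phi),                          # proposed statistic Y
    }


stats = statistics_from_phi([0.5, 2.0], n2=1.0)
```

For the two sample roots 0.5 and 2.0 the sketch gives $\theta = (1/3, 2/3)$, so that $\Lambda = U = 2/9$, $V = 1$ and $Y = 1$, which shows how the $\theta$-based and $\phi$-based forms of each statistic agree.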
Since this form i s quite similar to that already obtained i n Chapter Six for finding the distribution of the determinant of S.P. matrix, we have only effected certain substitutions i n the results obtained in Chapter Six and have been able to deduce the limiting distributions of U or Y for £ = 2, 3 and 4 . 7.2: Distributions of the Statistics T 2 and Y for t = 2, 3/ and Further Results Case I: For ft = 2 , The joint distribution of ^ and ^ from (1 .4 .9) is c(m, n, 2 ) ( ^ 2 ) f f l [ ( 1 +^ )J \"m-n-3(^ - ^ ) d ^ 2 for 0 «. ^ j ^ 2 < <=*=> ( i ) For Y-statistic: Let, 7.si; ( ^ 2= u, (1 + . ^ ) ( l + 42) = v (7-2.1) (7 .2 .2) so that - ^ ) d ^ dj^ g = du dv, and the relation (7.2.1) becomes: c(m, n, 2)u m v\" 2* 1**** 3* du dv (7.2.3) Now the roots ^g of the quadratic: x 2 - (v - u - l)x + u = 0 (7-2.U) g are real i f (v - u - l ) h bu o i.e. i f ( l +Ju) < v Then the limits for v and u are given by: ( l -&VU) i- V «o 0 Sr U <~ <— (7.2.5) The distribution of u(= b-^^) o r Y i s S i ven by: -3 c(m, n, 2)u m d u ^ v \" m \" n dv 2 v=dVu) where 0 < u <. o - 3 ( H / u ) 2 ( » « 2 ) where 0 u c 0-0 Further, for any test of hypothesis, we need to make two forms of substitutions: n l ~ 3 n2\" 3 If p = 2( < n^, m = — — , n = — — I f n i = 2 ( i p ) , m = ^ 3 - , n=-|-^ (7.2.7) -122-Effecting these changes in ( 7 . 2 . 6 ) , we have: For p = 2 ( £ n 1 ) , the distribution ( 7 . 2 . 6 ) reduces to n ( V 1 ) _ 1 \\(n + n - 2 ) Uul d W ( Q ) r ( n r 1 ) r V 1 ) ( l V u ) 1 2 where 0 £ u 0 0 which states that (= Jfi^^) i s distributed as F-ratio with 2 ( n 1 - l ) and 2 ( n 2 - l ) D.F. For n^ = 2(< p) the distribution takes the form: ftD2} W ^ \" 1 a(/u) ( 7 . 2 . 9 ) fXp-DfTn2-pfrl) ( l V u ) where 0 £ u <-which states that V/Y (=Jp^^) i s also F-distributed with 2(p-l), 2 ( n 2 - l ) D.F. ( i i ) For T 2 - s t a t i s t i c : Considering now the change (j^-fr^) = u, ^-^2 ~ v ( 7 . 2 . 1 0 ) and proceeding similarly as above, the joint distribution ( 7 . 2 . 
1 ) becomes: c(m, n, 2 ) v m ( l + u + v ) \" m ~ n \" 3 du dv ( 7 . 2 . 1 1 ) u 2 where 0 <• v i 0 £ u *~ °° Then the distribution of u i s : -123-1 2 4* J m -m-n—3 c(m, n, 2) / v (1 * u •* v) du dv (7.2.12) Vs.0 where 0 * u <• ***> Setting v = (1 4 U)VQ, we get in place of (7.2.12), the distribution of u as: 2 u 4(l*u) c(m, n, 2)(1 + u)\" n\" 2 du J ' VQ(1 * VQ)~m~n~3 dVQ (7,2.13) V ° where 0 £• u 0 0 Again, effecting the changes in (7.2.13) as indicated above in (7.2.7) we have: For p r 2( £ n^) the distribution of Ur for two roots i s : 2 i » ( n i V 1 } 7 v ~ ( > v ) 2 dV 4 [ V ) [Tn 2-D ! 2 i j 0 I 0> ( 1 ' U ) 2 V ° (7.2.14) where 0 £ u cea The integral involved is an incomplete beta function which can be easily evaluated. 2 For n r 2( <. p) the distribution of u(= T ) for two eigenroots i s : 1 M t v ^ J v° (1^, ^ from (1.4.9) is c ( m , n , 3 ) ( ^ 3 ) m [(1 + ^ ) ( 1 + ^ 2 )(1 + t3)]-m-n-k Ii 3 i-i 3 |T TT W± - I d ^ i (7.2.16) for „2 For finding the distributions of both the statistics Y and T^ for three eigenrootsj.. we effect the following changes: ^ + 42 + ^ = u, + 1^^ 3 + ^ 3 = v ' a n d ^ 1 ^ 3 = w (7.2.17) so that (^ 3-^ 2)(^ 2-^ 1)(^ 3-^ 1)d^ 1d^ 2d^ 3 = du dv dw Then (7.2.16) reduces to: c(m,n,3)wm(l + u + v + w)\"m\"n\"^ du dv dw (7.2.18) where j ^ , j ^ 2 , ^ 3 are the roots of the cubic: x - ux +vx-w = 0 (7.2.19) (i) For Y-statistic: In order for the roots of the cubic (7.2.19) \"to be real and positive, we know, from the Appendix B, Form II, the limits on u, v, -125-and w respectively Bust be the following: 0 <• w < and 0 * w < ~° 3w2/3 < v 5 3w 2 / 3(l+V3) 3w 2 / 3(lV3)* v <. oo (7,2.20) Thus the distribution of w(= j ^ g ^ ) = Y from (7.2.18) and (7.2.20) i s : c(m,n,3)wm y (l-*u+v4w)\"m\"n\"^ du dv dw (7.2.21) V u where u, v, w are defined as in (7.2.20). 
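The change of variables (7.2.17) replaces the three roots by their elementary symmetric functions $u$, $v$, $w$, and the statement that the product of root differences times $d\phi_1\,d\phi_2\,d\phi_3$ equals $du\,dv\,dw$ is a statement about the Jacobian of that map. The following Python sketch, with hypothetical function names, checks numerically that the absolute value of the Jacobian determinant equals $(\phi_3 - \phi_2)(\phi_2 - \phi_1)(\phi_3 - \phi_1)$ for ordered roots. It is a spot check at sample points, not a proof.

```python
# Sketch only: numerical check of the Jacobian behind (7.2.17).
# Function names are hypothetical.

def elementary_symmetric(phi):
    a, b, c = phi
    return (a + b + c, a * b + a * c + b * c, a * b * c)


def jacobian_det(phi):
    # Rows are the partial derivatives of u, v, w with respect to the roots.
    a, b, c = phi
    r0 = (1.0, 1.0, 1.0)
    r1 = (b + c, a + c, a + b)
    r2 = (b * c, a * c, a * b)
    return (r0[0] * (r1[1] * r2[2] - r1[2] * r2[1])
            - r0[1] * (r1[0] * r2[2] - r1[2] * r2[0])
            + r0[2] * (r1[0] * r2[1] - r1[1] * r2[0]))


def vandermonde_product(phi):
    a, b, c = sorted(phi)
    return (c - b) * (b - a) * (c - a)


phi = (1.0, 2.0, 3.0)
u, v, w = elementary_symmetric(phi)
```

Here `abs(jacobian_det(phi))` and `vandermonde_product(phi)` agree, which is why the factor in the root differences disappears from the density once it is written in terms of $u$, $v$ and $w$.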
Effecting another change i n (7.2.21) as follows: v = ( l + w ) V 1 , u = (1 + w)(l + (7.2.22) so that du dv = ( l + w ) 2 ( l + V •) dV 1 d l ^ , we get in place of (7-2.21): ri r a u i c(m,n,3J 1 I -L / 7 9 Po\\ (l-ftw)1***142 / (l+V ^ m + n + 3 ' ^ n W + V u 1 1 f o r o i w < =*° and 0 5 w <; *»° 3wfZi < v < 3v 2 / 3(W3) 3w 2 / 3(lV3) , v . ^ Hw V l ~ 14W • \" l^w\"\"^ \" , V1^ fi3 P k ?3 « u Ph ( n f w J d ^ ) * u i ~ (nwXi+v^ (i+w)(i*v1) - i - (l+w)(l+V1) Further, for any test of hypothesis, we need to make following two fci-hds of changes for m, n in (7.2.23) as fgjfeven:below -126-n -k n -h If p = 3U n 1), m = , n = - ~ -and i f n x =• 3( 4 p), m = , n = - 2 — (7-2.24) p ( i i ) For T k-statistic: In order that the roots of the cubic (7.2.19) be real and positive, we write down the conditions respectively for u, v and w, derived in Appendix B, Form I, as: (a) 0 * u o*\" 2 0 6 v ~< u _ and 0 * w */?2 (7-2.25) and (b) 0 £ u <• 1 2 , 1 2 ^ U £ V £ - r U /{ & w s £ 2 (7-2.26) Thus the distribution of u(= ^ + ^ 2 + ^ ) = T^ for 3 eigenpaots^ from (7.2.18) with the help of (7.2.25) and (7-2.26) i s : (i) c(m,n,3) J j w^HKrt-vHw)\"131\"11\"11 dw dv du (7-2.27) w and ( i i ) c(m,n,3) J J wm(i4u+v+w)\"m\"u\"'+ dw dv du (7.2.28) v w with limits in (i) and (i i ) given by (a) and (b) above, respectively. Effecting another change for both (7.2.27) and (7.2.28) as: v = (1 + u)V2 w = (1 + u)(l + V 2)U 2 (7-2.29) n i / , . . . \\ - m - n - 4 -12.7-2 we get respectively as the distribution of u = for 3 eigen-roots: *u f f ^ (1) c(m,n,3) n+2 / / — — r r dU^ dV. (1-fru) 7 7 (14-V 2) Q + 3 ( l 4 U 2 f + n + 4 2 c V 2 U 2 (7.2.30) i where 0 * u <- o° P2 ° * V ( H u ) ( W g ) ( 7 . 2 . 3 D and where v used in ^ and /*2 is equal to (l+u)V2, and (2) 2 n H h l ^ -V u ) 2 m f t 2 d ( A ) (7-2.37) where 0 i u - 1 Further, for any test of hypothesis, the changes of the type indicated in (7.2.7) are possible. 
Case I I : For 0 = (i) For U-statistic The distribution of w(= Q^&^Q or = U,£threje reigenvalues) is: ~TL2>-c(m,n ,3)w m(l- W) n + 2 j J (l+V^) n + 1(l-U 4) n dV^ dU^ dw (7-2.38) where 0 * w '< 1 . and 0 « w * 1 • i l l l < y - 3 v 2 / 3 ( l H K M ) 3^/3 ( i V 3 ) t . T . , 1-w *~ k 7 1-w 1-w ~~ U \" (i-w)(i+v^) \" u 4 ' (i-w)(i+v^) ( i - W ) ( i + v ^ ; ' U k ^'(i-vniw^; (7.2.39) where P and f ^ are defined in Appendix B, Form II and v used in them is equal to (l-w)V^. ( i i ) The distribution of u(= &1 + e g + 6^ or V) is: m^+n+1 A r i 5 5 (1) c(m,n ,3)(l-ufl H f c f 2 du j j ^ ( l - U 5 f ( H V 5 f f c * 1 dU^V, V 5 U 5 ( 7 . 2 A 0 ) where 0 £ u 5 1 2 0 < V 5 r - r ^ — t -u - v5 4(1-U) and ( 2 ) : 5 5 V 5 U 5 (7.2.42) where 0 £ u * 1 4 ( l - u ) 5 V 5 * 3(l-u) -13:6'-(l-u)(l +V 5) * V ( l - u ) ( u v , . ) (7.2.43) where further /5 and 2 are defined in Appendix B, Form I, and v used in them is equal to (l-u)V^. ( i i i ) The distribution of w(= (l-O^Kl-egKl-O^Jor A. for three eigen-roots i s : C K D , 3 A I - V ) ^ j j (lW6f*hl-V6)m dV6 dU6 dw V, U, (7.2.44) '6 °6 where 0 £ w * 1 a n d 0 * w i 1 1-w 6 1-w » T=jj i w 6 3 /> . h fi' * \"6 * (l-w)(l +V 6) < (I^)(IH.Y6) * U ^ ( £ ^ l + v 6 ) _ _ (7.2.U5) where again and p ^ are defined in Appendix B, Form II, and v used in them is equal to (l-w)Vg. Finally for a l l these three parts, under any test of hypothesis, the suitable changes for m, n indicated in (7.2.24) can be effected. 7.3: Distribution of the Smallest Eigen-root in the Limiting Case: The joint limiting distribution of the eigen-roots c^ of the determinantal equations (1.4.7) and (1.4.9) given in (1.4.10) is re-written as: •131-t * i - i ft K(e,m) 0 c m exp(- |F |J (o± - c.) ]J dc. (? 
.3.l) i = l i=>2j=l' i = l 0 4 ° 1 5 C 2 *•* ~ c g c < \" ^ = ^ ^ ( P j * ^ ) where K(£,m) = R 2 ^ ) R | > The distribution of the smallest eigenrcfro'bt, i s : oo P r( C l>, x) = K(£,m) J J j ][ c m exp(- C i: i=;2 j=l> i=l Set c f = c ^ U g ...Ug_ 2u c_ 1 c t - i = c i u i u 2 ••• u e-2 C = C U U c 3 1 1 2 c 2 = (7.3-4) Then: y ^ P r ( C ; L ^ x) = K(£,m) / J J ^ ctm + ^ - 1 ^ - 2 ) e x p ( . C i ) j C^=X U^=l Ug =^1 L u ! ( u 2 - l ) e x p ( - c 1 u 1 ) J * -1&2-m(t 2).I ( f l - 3 ) ( g ) [ u 2 2 (ug-lJCu^-l) e x p t - c ^ U g ) ) — ( u 2 ^ 2 (u^_ 2-l) (uf _ 2 U t - 3 •**( u «-2 U £-3 •* , U 1 e x P ( \" c i u i U 2 •••u?-2'!i)fU?-l^U^-l\"1^ ^ U f - l U C - 2 \" 1^ U«-l U ? - 2 U ^ - 3 '••W-l u £-2 \"* U 1 e x p ( \" c l u i u 2 •**u<». d u ^ d u ^ ...du^Cj^ (7.3.5) Me have evaluated below the integrals for L = 2, 3, and k. The same method can be extended to any value of 2 , Symbols and Notations OC /•\\ -r 1 \\ / a 1 \\ exp(-a)f, . n . n(n-l) . n i l (1) J (n,a) * J x exp(-ax) = —f' ^ L 1 + a + 2 — + * * * ~ I 1 - • a a (7.3.6) 00 ( 00 ( i i ) J(n,p,q,r, ...;a) = J xnexp(-ax)dx£ j yPexp(-axy)dy 1 00 y zqexp(-axyz)dz(... )J (7«3.7() 1 a ( i i i ) ' P ^ J ( p i , P 2 ; a x ) t j ( q _ i , c | 2 ; a x ) t . . . J CO 00 «o = J x n exp(-ax)£ ^ y ^ 1 exp(-axy1) (^f ^ e xP( \"^1^2 ^ d y 2 ^ d y l 1 1 1 * j exp(-axz )/•I z g exp(-axz^) | a x (7.3.8) -133-= T (n,p1,p2;a) t TCn^qj^^ja)* ... (7-3.9) iv:,l etc. Case I: Substituting t = 2 in (7*3.5): (^ >, x) = K(2,m) J\" y c 2 m f r 2 exp(-c1) u^(u 1-l)exp(-c 1u 1)dc 1du 1 P r C;L=x U 1 = 1 (7-3.10) Making use of ( 6 . 2 . 
1 ) , (7 .3 .2) and after^little manipulation, we deduce from (7 .3 .10) : 22m*-l j 2 m + 2 T -\\ c^ =x Using (7 .3 .6) and simplifying, we get: ' c^x 1 + (mfrl) 2 i J d c l ' (7.3.11) which can be easily evaluated for successive substitutions of m=0, 1, 2, ., Case II: For t = 3 Substituting I = 3 in (7.3.5), we get: 0 0 OO CX» P r ( C l >, x) - K(3,m) J / / [ c ^ e x ^ - c ^ J f u ^ ^ - l ) C^=X Uj=l Ug=l exp( - c ^ )J [u m ( u 2 - 1)( UjUg-l) e x p ^ u ^ J d u ^ d ^ ( 7 > 3 > 1 2 ) -134-Making use of (7.3.2) for ( = 3 and then (6*2.1) , we obtain from (7.3.12) after re-arrangement of terms: P (c, >, x) = ,2m-»3 f(m*l) R2m*3) J 1 1 c^»x ~ P £ j (m*2j u ^ ) - J(mf 4 / c c exp(-c-,) i i ^ dc. 1 (7.3.14) Using (7.3o6) we obtain: JU .U-LC-L) -T ( 1 > u I C I ) \" J C O - U ^ ) - J(2,u Lc 1) ^ ^'-W/,. _2_ 2 2 ^ » \" ' u c2exp(-iL c ) u c 1 1 1 1. exp(-u c ) J ( l , u l C l ) - J ( 0,u l C l) = g 2 x 1 (7.3.15) V l Again using (7 .3 .6) and ( 7 . 3 . 8 ) : 1 , . exp(-2c ) T ( j ( 2 , u c ) - j ( l , u c.))= = ^ ( 1 + ^ - + - ^ ) 4 1 1 x 2 c i 1 2c1 1 , . exp(-2c ) T ( ? ( o , V l ) - K 2 , V i > J - - \" ^ 3 — - ( 2 * ^ > ~k/ \\ exp(-2c ) and • X ( j ( l , u l c l ) \" T ( 0 » V l 7 = 3 ~ ~ ( 7 . 3 . 1 6 ) 2 2 C1 Substituting these in (7.3.14) and simplifying, we get: P r(c L V x) = 3 I exp( -3c 1)dc 1 = exp(-3x) (7.3.17) Cj=X ( i i ) For a = 1 Substituting m = 1 in (7*3.13) and again ufMissing the steps of the type (7.3.15) and (7-3.16), we obtain: P r C c l > ' 3 \\ ) = j ( c i + 5 c 2 + 5c 1)exp(-3c 1)dc 1 (7-3.18) C l X = exp ( -3x)[x 3 + 6 x 2 + 9x + 3J J3 ( i i i ) For m = 2 Substituting m = 2 in (7.3*13) and again v&©Mowing the steps of the type (7*3.15) and (7.3-16), we obtain: -136-pr ( c l > / x) = j$ J (2c^-f 20cJ+ 80c4 - 140c 3 105c 2)exp(-3c 1)dc 1 CnrX r 6 5 4 3 2 i / 5 exp(-3x) I 2x 24x - 120x * 300x + 4©5x + 2?0x -*• 90j / 90 (7.3.19) L e t c . 
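The closed forms (7.3.17) to (7.3.19) for $\ell = 3$ can be cross-checked by integrating the corresponding densities numerically. The sketch below does this with a plain Simpson rule. The normalising constants of the densities for $m = 1$ and $m = 2$ are hard to read in the scan; the values used here ($1$ and $1/30$) are an assumption, fixed by requiring the total probability to equal one.

```python
import math

# Sketch: compare P(c1 >= x) from the closed forms with a numerical
# integral of the density of the smallest root (l = 3; m = 0, 1, 2).

def simpson(f, a, b, n=20000):
    # Composite Simpson rule; n must be even.
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return total * h / 3.0

def density(c, m):
    e = math.exp(-3.0 * c)
    if m == 0:
        return 3.0 * e
    if m == 1:
        return (c**3 + 5.0 * c**2 + 5.0 * c) * e
    if m == 2:
        return (2.0 * c**6 + 20.0 * c**5 + 80.0 * c**4
                + 140.0 * c**3 + 105.0 * c**2) * e / 30.0
    raise ValueError('only m = 0, 1, 2 are tabulated in this sketch')

def survival_closed_form(x, m):
    e = math.exp(-3.0 * x)
    if m == 0:
        return e
    if m == 1:
        return e * (x**3 + 6.0 * x**2 + 9.0 * x + 3.0) / 3.0
    if m == 2:
        return e * (2.0 * x**6 + 24.0 * x**5 + 120.0 * x**4 + 300.0 * x**3
                    + 405.0 * x**2 + 270.0 * x + 90.0) / 90.0
    raise ValueError('only m = 0, 1, 2 are tabulated in this sketch')

def survival_numeric(x, m, span=30.0):
    # The tail beyond x + span is negligible because of exp(-3c).
    return simpson(lambda c: density(c, m), x, x + span)

if __name__ == '__main__':
    for m in (0, 1, 2):
        for x in (0.0, 0.5, 2.0):
            print(m, x, survival_closed_form(x, m), survival_numeric(x, m))
```

Agreement of the two columns for each $m$ supports both the polynomial coefficients and the assumed constants.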
' Case III; For I - 4 Substituting £ = 4 in (7.3.5) we obtain: P r( C ; L >, x) r K(4,m) [cj»* 9 exp(-c 1)J c^=x u^=l u 2=l u3~^-j^u3™ ^ - l j e x p t - c ^ ) ] [ u 2 ^ * 2 ( u 2 - l ) ( u 2 u 1 - l ) e x p ( - c 1 u 1 u 2 ) J [ u ^ u ^ - l ) ( u ^ U g - l X u ^ U g ^ - l J e x p C - ^ u ^ u ^ ) J dc-jdu-jdUgdu^ (7.3.20) Making use of (6.2.1) and (7.3.2) for ^ = 4, and after re-arrangement of terms, we obtain from (7.3*20): C 4m-»5 pr < V ' x ) = |T2m-.2) p2m44) c± 'expt-c.^ t 3m+8 2m 4 c^=x ( j ( m * 2 , c^ u g ) - J (m+1, c ^ U g ) ) (_7(nwl, c-jU^Ug) -v 2m-»5 C ^ r \"I f (m*3iCjU^)) •* I ^ ( m O . c ^ U g ) - j U ^ c ^ u ^ ! r 2m+6 I c u 1 1 ^ ( m ^ u - j U g ) - J(m-t2, c ^ U g ) ! * ' \" P ry (m+3 , c ia 1 u 2 ) - yfciijCjUjUg),) / n i l / « •a 3m*7 T 2m*3 V l , t ' \" \" j \"* (?(m 2,c 1u 1u 2) - T(m 3 , 0 ^^2)^1 2m+6 cifiut . c i u i 3m+i UT^^Iftfr:L,ClUlU2^ - y( m» ci ui u 2 ^ ) + | ( ^ m ' c i u l u 2 ^ \" 7 ( ^ 3 , 0 ^ ^ ) 1 + | ^ J C m f S j C^Ug)- 7 ( ^ 1 , 0 ^ 2 ) 1 ) + ^ f ^ p 1 ^ j ( m , c 1u 1u 2) -2m+5 J 3m+5 (,2m+2 . C 1 U 1 c l u l jCmtt^^Ug)J+ 'y H X i i H^CjUjUg) -7(111,0^^2)1+ y 2 m * 3 , 2m+U ^ • ( n H - l , ^ ^ ) - 7 ( ^ , 0 ^ ^ 2 ) 1 1 J d C l (7.3.21) Now we explain below how to make us of (7.3.21) to obtain the probabilities for a particular value of m. (i) For m = 0 First we substitute m = 0 in ( 7 . 3 . 2 l ) , and then by using ( 7 . 3 . 6 ) , we obtain the following: exp(-c u u ) 7 ( 1 , 0 ^ ) - J(0,c U ]u 2) - 2 2 2 1 2 C 1 U 1 U 2 2exp(-c u.u ) 7 ( 2 , 0 ^ ) - 7 ( 1 , ^ 2 ) = - g - g - g (1 + — - ) C 1 U 1 U 2 1 1 2 exp(-c y ) 7 ( 2 , 0 ^ ) - 7 ( 1 , 0 ^ ) = - 2 - 2 - 2 — - (1 + ^ n r ) C 1 U 1 U 2 1 1 2 3exp(-c u,up) y ( 3 , c l V a > - 7 ( 0 , 0 ^ ) = 2 u 2 u 2 ( 1 + c T ^ r + - 2 - IT > C 1 U 1 U 2 1 ^ V l ^ 2exp(-c 1u 1u 2) , 7(3,c u u 2) -7(l,c u u ) - (1 + - J — + 1 T I ) C 1 U 1 U 2 1 1 2 C 1 U 1 U 2 and -13S-exp(-c U . IL) , 6 C 1 U 1 U 2 1 1 2 C 1 U 1 U 2 (7.3-22) Again, using (7-3.*6) and ( 7 . 3 . 
8 ) : c l u l (A) : \" T ( j C ^ V i V ~ 7 ( 0 , 0 ^ 2 ) ^ = exp( - 2 0 ^ ) 2 c R ° 1 U 1 . exp(-2c u ) (B) J ( 7 ( 2 , 0 ^ ; ) - J t C c ^ ) ^ 3 3 (1 + 2^r-) 3 c l u l 1 1 V l , ' exp(-2c u ) (o) T ( j t 2 ' V i V \" T ^ W ^ , 3 3 2c^u^ 1 1 SCjU^. ° l \" l / x Zexg{-2c^i^) (D) n r ( j ^ ^ c ^ u g ) - 7 ( 0 , 0 ^ 2 ) ) = 2 c 3u 3 1 1 (E) ( l + ^ V + ^ - ) 1 1 2 c 2u 2 / \\ e x p ^ r ( 7 ( 3 ^ ^ 2 ) - 7 ( 1 , 0 ^ 2 ) } = exp(-2c 1u 1) 3 (F) c u 1 1 U + 2c,u,' + 0 2 2 + , 3 3 ; 1 1 2c u IfcJuJ. T ( T ( 3 ' c l u l u 2 > \" 7 ( 2 > C 1 U 1 U 2>) = e x p C ^ c ^ ) 3 3 c l u l u + <•*, * 2 2 + 3 3 I ^ T ^ ; 1 1 c,u, c,u, 2c, u. 1 1 1 1 1 1 (7.3.23) -139-Finally, using ( 7 . 3 . 6 ) , (7-3.8) and (7*3-23), we obtain: K1 1 exp(-3c ) -(A) + (B) - (C)J = 5 - ^ -5 U c l c i r 1 3exp(-3c ) T | ( A ) - (!>>•+ (E) j = r — ( 1 + f - ) /- L J 4c? G l c, r-r^r 1 3exp(-3c ) I [;(B) + (D) - ( ? ) ] - ^ - 1 -UcJ C l c\\ c l I 1 ( C ) - (E) + (F)| - 1_(1+|_ L J 4c: l 6- -fr^+S.) (7.3.24) 8 \"\"1 c l c l Hence from (7.3.24), (7.3.21) for m = 0, we get: / P r ( c L £ x ) = 4 I e x p ^ c ^ d ^ - exp(-4x) (7-3.25) c 1 =x ( i i ) For m = 1, I'EdllcjHiing the similar steps like (7.3.22), (7.3.23) and (7.3.24) for m = 1 in (7.3.21), we get: ; 1<.x)= J (30c 1+45c 2+l8c 3+2c^)exp(-4c 1)dc 1 c^x exp( -4x) [2x^ + 2Gx3 + 60x 2 + 60x +• 15J / l 5 (7•3.26) P (c, r v etc. -140-Generalization in the case of m = 0 He can make a generalization for P r( c^ *V x) ^ a the case of m=0. Observing for m=0, the relations (7.3.11), (7.3.17) and (7.3«25)> we can conclude for any ^ that P r ( C ; L ^ x ) = H j expf-c^ J d ^ = exp(-x£ ) (7-3.27) c^ =x 7,4; Limiting Distribution of the Largest Elgenroot'.i From (7.3.1), the distribution of the largest eigenvalues c^ } c < c 2 a P r(c e l x) = K(*,m) J J ] [ c l exp(- C i) 0^ =0 c 1 = 0 1 I e. II (o± - c'j) If d C i 0=2 i=l ' (7.4.1) Set C ] L = c ^ . . . 
* ^ ^ ^ C 2 ' C t U l U 2 ' - - ^ - 3 ^ - 2 \\-2 - c t V 2 c t - i = ° e u i (7.4.2) then the distribution (7.4.1) reduces to: -141-X P r(c^S x) = K(e,m) / /- / / c =0 u 1 =0 ut_2=0 u e_ 1 =0 _.(£-!)(£-2) , ,n ,x - (g-2)(g-l) L m-F -if m(£.-l)+-i £ 1 n 81 e x P ( _ c e ) J L u l . \" \" ' (l-UgJexpC-c^^)! |_u2 (l-u 2)(l-u 1u 2)exp(-c £u 1u 2 )J . . . • l u j _ 1 ( 1 - u g _ 1 ) ( 1 - u £ - 1 u £ _ 2 ) •••( 1- u^_ 1 u^_2\" * u i ^ e x p ( \" c £ u i u2\" * u ^ _ i ^ ] d c £ J[ a U ; L (7.4.3) i=l -Here below we give the method for evaluating (7.4.3) for particular values of £ = 2, 3> 4 which can further be extended for any 2 . Symbols and Notations 1 ( i ) K t y O - j x n e x p ( - « ) d x - - S ? f i I [ l + | + £ ^ + . -0 a •* a 1 ( i i ) J (n^p^r, ...;a) = ^ x nexp(-ax)dx£y yPexp(-axy)dy 0 1 (7.4.5) ( i i i ) j^I( P l,P 2;ax) t I(q.1,q2;ax)t ... J = 0 0 1 1 J exp(-axz 1)^ j z ^ exp(-axz.^ )dz2J dz 1 * . .^dx 0 (7.4.6) Case I; Substituting t = 2 In (7.4.3): x 1 P r(c 2 £ x) = K(2,m) J J c 2 ^ 2 exp(-c2) u^l-^Jexpf-c^Jdu^Cg c 2 = 0 u l = 0 (7.4.7) Making use of (6.2.1) and (7.3.2), (7.4.7) reduces to: ,211*1 • r v - 2 - - ^2m+2) J \"2 - ~ * - v - ~ 2 / L - v \" > - 2 / - - . v » . - » - 2 # j — 2 P„(c0 i x ) = ^ g ^ ^ j c^1\"*2 exp(-c2) [^I(m,c2) - l(nHKL,c2)Jdc, c„=0 '2 Using (7.4.4) and simplifying: W 1 x a 2 * * 1 / 2m*2 , (-exp(-c2) -Pr (°2 \" X ) - f[2m+2T 7 C2 e x p ( \" C 2 ) L * 2 ^ + 2 ~ + 2 =2-o 3 ^ + 4) + (-*L- - « ^ U ] a c 2 (7A.8) C^ C„ •* ^c„ \"c« s J 2 2 2 2 which can be easily evaluated for successive values of m. Case II: For I = 3 Substituting & = 3 in (7.4.3), we have: -143-1 1 1 P^C^ . x) = K(3,m) J J J c 3 ^ 5 exp( -cj c = 0 • u = 0 u = 0 3 1 2 [ ^l-u^eaqpC-c u^J^Cu™ (l-UgJCl-u^JexpC-c^Ugjj dc cta^dii^ (7.4.9) Using (7.3.2) for I = 3 and (6.2.1), we obtain from (7.4.9) after re-arrangemait of terms: 22m 3 / 3m 5 p (c < x) / c exp(-c )dc x 3 r(m , l ) R2m*3) / 3 3 3 ~~ c 3 c 3 J ^ ^ K m u , ^ ) - I U ^ u ^ ) ) * ( i G ^ u . 
^ ) - I ( m , Y 3 ) J 2mt4 2m+3 * (1(21,^03) - K a + l j^Cj)) 2m*2^ - •« (7.4.10) * which can be easily evaluated for different values of m by repeated use of (7.4.4). In fact, we have to use the same steps as in (7.3.15) and (7.3.16), using repeatedly (7.4.4) instead of (7.3.6). Following this procedure we have computed probabilities for m* 0, 1 and 2. The results are as follows: (i) For m r 0, X x Pr(c3 < x) = -3 J exp(-303)dc3 + U J exp(-2c3)dc3 03*0 03*0 x * J ( 2 c 3 \" 6 c 3 * 3jiexp(-c3)dc3 (7.4.11) C-jrO \\ -144-( i i ) Kor m = 1. x /3 2 3 * 5°3 \"* 5 c 3 ^ e x p ^ ~ 3 c 3 ^ d c 3 c3*0 . A . + 3 / c 3 exp(-2c 3)dc 3 >3=0 X *. / (c3 \" 5cl+ 5c 3)exp(-c 3)dc 3 (7.4.12) v ° ( i i i ) Form: 2 x /6 5 l± 3 2 (2c 3* 20c3-f 80c3+ 140c3 + 105c3)exp(-3c3)dc3 X + j | y c 3 exp(-2c3)dc3 (2c^ - 14c^ • 21c 3)exp(-c 3)dc 3 (7.4.13) cU u l U2 j i * J - ' f / c k u i u2 ) \" I (m+3, c^u^g )V J. , l 1(11*3,0^^) - X ( % c k u i u 2 } / + JL ( K m > c k u i u 2 ^ \" 2m*4 \" ~ 2m#2 X (mfrl.c^Ugyj + f l d r t S - c ^ U g ) - Ifm+l-c^UgMH ' J 3m+5 2m+4 C4 U1 \\ U 1 2m43 °™»? ^ ^ 2m#2 ^ (7A.15) - l k 6 -The procedure for evaluating (7 .4.15) is the same as we used in Case III of Section (7.3) dealing with smallest eigen-roots. We have to fMJl^w the same steps as (7.3.22), (7.3.23) and (7-3 .24) and have to make repeated use of ( 7 . 4 . 4 ) . 7.5: Limiting Distribution of U or Y for £ = 2 , 3 and 4 . The moment generating function of (1 .4.10) i s : ,) = \\\"\"J K^mXcjCg ...c^ ) m exp[ - Y_ c± * * H c i J I / . 
from which the $h$-th moment about the origin is:

\[ \mu'_h = \frac{K(\ell,m)}{K(\ell,m+h)} = \prod_{i=1}^{\ell} \frac{\Gamma\!\left(\frac{2m+i+1}{2}+h\right)}{\Gamma\!\left(\frac{2m+i+1}{2}\right)} = \frac{\Gamma\!\left(\frac{2m+2}{2}+h\right)\Gamma\!\left(\frac{2m+3}{2}+h\right)\cdots\Gamma\!\left(\frac{2m+\ell+1}{2}+h\right)}{\Gamma\!\left(\frac{2m+2}{2}\right)\Gamma\!\left(\frac{2m+3}{2}\right)\cdots\Gamma\!\left(\frac{2m+\ell+1}{2}\right)} \tag{7.5.2} \]

This $h$-th moment shows that the moments of the limiting distribution of the product of the roots $(c_1c_2\cdots c_\ell)$ can also be determined from the following:

\[ \frac{v_1^{\frac{2m+2}{2}-1}\exp(-v_1)\,dv_1}{\Gamma\!\left(\frac{2m+2}{2}\right)} \cdot \frac{v_2^{\frac{2m+3}{2}-1}\exp(-v_2)\,dv_2}{\Gamma\!\left(\frac{2m+3}{2}\right)} \cdots \frac{v_\ell^{\frac{2m+\ell+1}{2}-1}\exp(-v_\ell)\,dv_\ell}{\Gamma\!\left(\frac{2m+\ell+1}{2}\right)} \]

or from

\[ \frac{\exp\!\left(-\sum_{i=1}^{\ell} v_i\right)}{\Gamma(m+1)\,\Gamma\!\left(m+\frac{3}{2}\right)\cdots\Gamma\!\left(m+\frac{\ell+1}{2}\right)}\; v_1^{m}\, v_2^{m+\frac{1}{2}}\, v_3^{m+1} \cdots v_\ell^{m+\frac{\ell-1}{2}}\; dv_1 \cdots dv_\ell \tag{7.5.3} \]

where $0 \le v_i < \infty$, $i = 1, 2, \ldots, \ell$.

Case I: For $\ell = 2$

Substituting $\ell = 2$ in (7.5.3), we get the joint distribution of $v_1$ and $v_2$ as:

\[ \frac{1}{\Gamma(m+1)\,\Gamma\!\left(m+\frac{3}{2}\right)}\; v_1^{m}\, v_2^{m+\frac{1}{2}} \exp(-v_1-v_2)\, dv_1\, dv_2 \tag{7.5.4} \]

where $0 \le v_i < \infty$, $i = 1, 2$, which is the same as (6.3.1) for $k^2 = 0$, $u_0 = 2v_2$, $u_1 = 2v_1$. Hence from (6.3.7), the distribution of $V_1 = 2\sqrt{v_1v_2}$, or of $2\sqrt{c_1c_2}$, for $k^2 = 0$ is:

\[ \frac{V_1^{2m+1}\exp(-V_1)}{\Gamma(2m+2)}\, dV_1 \tag{7.5.5} \]

where $0 \le V_1 < \infty$, which is a gamma variate with parameter $(2m+2)$.

Further, for any test of hypothesis, we need to make the two types of substitutions for $m$. We proceed as follows:

(i) When $p = 2\ (\le n_1)$, set $m = \frac{n_1-3}{2}$ in (7.5.5), and get the distribution of $V_1\ (= 2\sqrt{c_1c_2})$ as follows:

\[ \frac{1}{\Gamma(n_1-1)}\, V_1^{\,n_1-2} \exp(-V_1)\, dV_1 \tag{7.5.6} \]

where $0 \le V_1 < \infty$, which is a gamma variate with parameter $(n_1-1)$.

(ii) When $n_1 = 2\ (\le p)$, set $m = \frac{p-3}{2}$ in (7.5.5) and obtain:

\[ \frac{1}{\Gamma(p-1)}\, V_1^{\,p-2} \exp(-V_1)\, dV_1 \tag{7.5.7} \]

where $0 \le V_1 < \infty$, which is a gamma variate with parameter $(p-1)$.
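The passage from (7.5.4) to (7.5.5) can be sanity-checked through moments. The sketch below assumes that $v_1$ and $v_2$ are independent gamma variates with shapes $m+1$ and $m+\frac{3}{2}$, as (7.5.4) indicates. It compares $E[V_1^h]$ for $V_1 = 2\sqrt{v_1v_2}$ with the $h$-th moment of a gamma density proportional to $V_1^{2m+1}\exp(-V_1)$. The function names are illustrative, and the agreement is an instance of the duplication formula for the gamma function.

```python
import math

# Sketch: moments of V = 2*sqrt(v1*v2) versus moments of the density
# V^(2m+1) exp(-V) / Gamma(2m+2), computed on the log scale for stability.

def moment_of_product_form(m, h):
    # E[V^h] = 2^h E[v1^(h/2)] E[v2^(h/2)] for independent gamma variates.
    log_value = (h * math.log(2.0)
                 + math.lgamma(m + 1.0 + h / 2.0) - math.lgamma(m + 1.0)
                 + math.lgamma(m + 1.5 + h / 2.0) - math.lgamma(m + 1.5))
    return math.exp(log_value)

def moment_of_gamma_density(m, h):
    # h-th moment of the density V^(2m+1) exp(-V) / Gamma(2m+2).
    return math.exp(math.lgamma(2.0 * m + 2.0 + h) - math.lgamma(2.0 * m + 2.0))

if __name__ == '__main__':
    for m in (0.0, 0.5, 1.0, 2.5):
        for h in (1, 2, 3):
            print(m, h, moment_of_product_form(m, h), moment_of_gamma_density(m, h))
```

With $m = (n_1-3)/2$ or $m = (p-3)/2$ the same comparison covers the two substitutions used for tests of hypotheses.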
Case II: For t = 3 Substituting t = 3 in (7.5.3), the joint distribution of v l ' v 2 ' a n d v3 i s : exp(-v1-v2-v3) m m+3/2 mfrl f(mfrl)[?m4)[(W2) 1 2 3 v\" v_ v r * dvndv0dv, (7-5.8) 1 2 3 » where 0 * V °\" i=1>2,3, which J J' \" \" - - \\ - < 2 is the same as (6.3.9) for = 0, u g = 2v^ , u^ = 2v2, u Q = 2v3< Hence from (6.3.13), the distribution of V1(= Ov-jv^ or = 80^0^), 2 where k^ =0, i s : V* ' ' /V~ — — m i - L(J J . ) d V (7.5.9) 2mfrlr(nHKl) fem^) 0 2 1 -1&9-where L Q(a), for a = J — , is defined in (6.2.33). Again, for any test of hypothesis, we need to make the following two types of substitutions in (7.5.9) n -4 (i) i f p = 3 U n x ) , set m = - | — in (7.5-9) ( i i ) i f n1 = 3( <-p), set m = ^ in (7-5-9). Case III: For t = 4 Substituting for I = 4 in (7.5-3), the joint distribution of v x, v 2, v 3 and v^is: expt-Vj^-Vg-v -v^) m ^ x ^g - — r; — — ^ v v v v. J / dv dv dv dv, (7-5.10) Pm+l)|(m4)|rm42)frm4) 1 2 3 * 1 2 3 where 0 <• v ± c 1=1,2,3,4, which is again of the type (6.4.15) for ^ = 0, u^ = 2v1, u 2 = 2v2, Uj^ = 2v^ and u Q = 2v^. 2 Hence from (6.3.19) for k = 0, and making use of (6.2.18), the distribution of V^(= ^v^VgV^v^ or ^c^CgC^c^) i s : a 2 md(a 2) f f U-tZX) - log a ) / a 2 ^ a 3 • a* ' ^2m+2)fl2m44) |_ V . 2 A 2'1» + 3-1- + 412T * + | ^1 - a + + ^ L + ' . .- H (7-5-11) 2 2 3 2 2 2 ^ where a =*/V]L and 0 * V.^ 00 -150-Again, for any test of hypothesis, we need to make two types of changes for m in (7.2.11) as: (i) For p s 4( < n-^ ), set m - — 2 — ( i i ) For n( - 4( ^ p), set m = J ^ Note: For £ r 5, 6; a similar method was applied but we were confronted with the following difficult integrals: For I - 5 The\" integral in this case, i s : 4m*8 + r Rm*l)p:2m,3) r(2m»5) J ' ^ \" v 2 \" \\ \" . • I . 11*0 V,= 0 2 2V2)dV2dV4 ' (7.5.12) for V-^ =. Ci^c-jC^c^ 0 £ V <- «*° For A - 6 The integral i s : \\ (2m*2) \\ (2m*4) | (2m+6) ' ' 3 5 (7.5*13) for =. c^c 2 ... 
and $0 \le V_1 < \infty$.

CHAPTER EIGHT

APPROXIMATE DISTRIBUTIONS OF THE NON-ORTHOGONAL COMPLEX ESTIMATES

8.1: In the case of unequal sub-class numbers in Anova of Model II, we run into the difficulty of defining the distributions of the mean squares or the sum of products (S.P.) matrices in the univariate and multivariate cases respectively. In such situations, as pointed out earlier, for the univariate case the mean squares are distributed as sums like $\sum_r (\lambda_r \chi_r^2)$, where the $\lambda_r$ are functions of the variance components and the number of observations, while each $\chi_r^2$ is distributed as central chi-square with 1 D.F. Similarly, for the multivariate case the S.P. matrices, as proved below, are distributed as sums $\sum_r (W_r)$ of independent Wishart matrices with different parameter matrices $\Sigma_r$ and one degree of freedom for each. Thus, we try to approximate $\sum_r (\lambda_r \chi_r^2)$ and $\sum_r (W_r)$ respectively by $\lambda\chi^2$ and a Wishart matrix $W$ with revised D.F. To do this we first determine the $\lambda_r$ and $\Sigma_r$, and then use Satterthwaite's technique to approximate the sums $\sum_r (\lambda_r \chi_r^2)$ and $\sum_r (W_r)$ respectively by $\lambda\chi^2$ and $W$, and so find the corresponding D.F. For finding $\lambda_r$ and $\Sigma_r$, we begin below with the multivariate case, from which the univariate case is deduced, and the corresponding D.F. are determined for both.

8.2: Suppose N is greater than p or n. Observations $(X_{1\alpha}, X_{2\alpha}, \ldots, X_{p\alpha};\ z_{1\alpha}, z_{2\alpha}, \ldots, z_{n\alpha})$ for $\alpha = 1, 2, \ldots, N$, are made on $(p + n)$ variables. The overall set of assumptions is $\Omega$:

\[ X_{i\alpha} = \beta_{i1} z_{1\alpha} + \cdots + \beta_{in} z_{n\alpha} + e_{i\alpha} \tag{8.2.1} \]

for $i = 1, 2, \ldots, p$; $\alpha = 1, 2, \ldots, N$; and furthermore:

(i) the $z_{r\alpha}$ ($r = 1, 2, \ldots, n$; $\alpha = 1, 2, \ldots, N$) are non-random, and the matrix $Z(n \times N) = (z_{r\alpha})$ is of rank n.
(In the case we are most interested in, the case of Anova, some of the z's are zeros and the rest are ones) ( i i ) the vectors e_^ s (ejjj '\"> ep/^ a r e ^^P^ndent a n ^ normally O distributed with mean vectors and error covariance matrix £ e ( p x p), i.e. Eie^) = 0 and E ( e ± z x e ^ ) = v for i , j = 1, 2, p and j[~ 1, 2; N, so that e ^ i j I ( p x p ) = ( , J (8 .2 .2) ( i i i ) - oo <. fi •<,•+«» Let us introduce some further notations as :ffollows: P (p x n) = ( ^ i r)B.[/S x, p2] for n x*i'=n (8 .2 .3) h£ h! . where n.^ and n' w i l l be specified below. X* (1 x p)s ( X u , X2^,.., X p z ) , so that t ( p x N)2 (X. z) = ( j ^ / ^ , X^J (8 .2 .4) 4 ( 1 X U ) - (zU'Z2l>~-> Z n J -103-A(n x n) s <•«> - U l > *e \" A n ; A 12 n l * A ' _ 21 J A 2 2 n\" n l n' ZJJJ (8 .2 .5) isf ( 8 . 2 . 6 ) (p x n) = [ J C 2 J = XZ t (8 .2 .7) 1 n * and _ n Under the overall set of assumptions «/L , the matrix B rt (p x n) which is the least square estimator of P (p x n) is B (p x n) = CA\"1 (8 .2 .8) and hence the S . P . matrix Q (Anderson, pp. l 8 l ) i s : Q = XX* - £ , B. A (8 .2 .9) If a hypothesis H specifies P-^(v x n^), then the distribution of Qg., the S . P . matrix,due to deviations from hypothesis, depends on the nature of the P i ( i = 1, 2, p; r = 1, 2, n.^). The overall set of assumptions can be completed i n two useful ways as :follows: ( i ) The columns of )5 ^ are independent, normally distributed vectors with common covariance matrix (P X p) of rank p, and are a l l quite independent of the columns of i ^ 2 ( p x n' ) and of e_ . /3 2 may be either random or constant, ( i i ) i s constant. Case ( i i ) i s the usual regression problem considered in standard texts. In what follows we consider only case ( i ) . We let w denote the subset of A for which the following hypothesis holds, H ZL =0, which implies that E ( ^ 1 ) = P 1 Q , a matrix of constants. Then % = (X - / Z ^ X X - fi^fa? 
- B ^ S ^ w = XX* - B^ A ]&•+ ( B 1 A ^ 1 > 0 ) A 1 1 < 2 ( B 1 A - A ^ ) * (8.2.10) where B 2 w = (Cg - B ^ g f c j * (8.2.11) and *na A 1 1 > 2 ( n ; L x n L) - A ^ - A ^ A 2 1 (8.2.12) Hence from (8 .2.9) and (8.2.10), we obtain: Qg(p x p) = ^ -Now there exists an orthogonal matrix U such that U A 1 1 . 2 u t = [ ~ 2 ( n l X n l ) (8 .2 .14) where ( i ) (~ is a diagonal matrix with elements 2 ^ r (r = 1, 2, nj) 2 and ( i i ) A1 0 P and U are a l l non-random and each of order (n x n.^). Therefore y P x p) = ( B ^ - ^ 0 ) U * f 2 \" h , < f - 1 r (8.2.23) r=l r \"' where * r = S ] p( % e + 1 ^ )£, r = 1, 2, (8.2.24) so that % « E(Wp) = ? e + Y * ^ = (^ ± J ^ ± J (8.2.25) 8.3: Approximate Distributions (a) Univariate Case Wecbnsider again the relation (8.2.21) and write: n i Q H ( i x , D = y ( ^ + v j ^ 2 ) u 2 . r*T 2 2 2 2 f.Since the h-th cumulant of ( + )u is: i t follows^\" that the h-th cumulant of 0^(1 x l ) i s : n l 2 h- l(h-Di x . < H ; + Y r ^ 2 ) h -r=l -Behcet the first two moments of 0^(1 x l ) about the origin are: r=l / A ^ 2 = 2 2_ ( ^ r - / ' 2 C 8 ' ^ ) r=l (8.3.1) (8.3.2) (8.3.3) -157-Following Satterthwaite, we approximate QJJ, defined in ( 8 . 3 . 1 ) , by A X where is a central chi-square with f D.F., so that the f i r s t 2 two moments of Q^(l x 1). are respectively equal to those of A X . Therefore, making use of ( 8 . 3 . 4 ) , we obtain: n l f A = 21 +V 2TT-£) ( 8 . 3 . 5 ) r . 
l 6 r ^ n l since E( % ) z f and var ( X ) » 2f• Finally, from ( 8 .3 .5 ) and ( 8 .3 .6 ) we obtain: f = [ Z > ^ e \" ' r ^ ' l / Z_ v ~ e T °r ~ / » 2 2 Since and r s 1> 2 » n]_ r by a Wishart matrix W , f J or order (p x p) where f is to be determined such that: (i) The expected matrix of the approximating matrix is equal to that of the sum of the Wrj (i i ) The elements of the approximating Wishart matrix have an ellipsoid of concentration (Cramer 1946) whose volume is equal to the corresponding volume for the sum of the given Wishart matrices* Condition (i) gives: E(W) r 21 E(Wr) r i.e. f % = Sjl x + 2 £ 2 + • • • * 2 . n (8.3.10) Further, i f p ^ ^ (?2^^ x ( P2^ J b e t n e m a t r i x °^ t i i e covariances of f ,p-*lN /P+lJ elements of W? and P that of W also of order ( ^ ) x ( ^ )J > condition (i i ) gives: Det.(P) = Det.( (8.3.11) Thus to find f 1 , one should find from (8.3.10) by comparison the elements of in terms of those .of ^ and should substitute them in the left hand side of (8.3.11). For instance, in our case from (8.3.13), we have: -. - y 13(r) ) (8.3.12) where U(r) r=l • 2 ^ , i j + * r ^ i j ' ^ \" X' 2 7 — ' » (8.3.13) and, following T.W. Anderson (p. l 6 l ) , from (8.3.II) we have: det ( i£ Ok 1 r D l )J= det.j ^ r=l (8 .3 .14) ik(r) ^Tg(r) * ^ ( r ) ^ j k f r ^ J for i , j , k,C = 1, 2, p. Finally, making use of (8.3.12) and (8.3.13), the relation (8.3.1U) becomes: det •V.% n. p(p+D =f 1 r = 1 r-1 , - r = 1 ^ C x : r i k ( r ) ^ ( r ) + r r ± H r ) ^ ( r ) ^ jk(r) (8.3.15) -160-Again, since the , 0 i.e. i f v >, u (B-8) Thus, from (B-5), (B-7) and (B-8), the following two parts of the bounds of u, v, w are: (i) 0 < u <- 0 0 0 £ v < JJ- u 0 w * £ and ( i i ) 0 4 u 00 1 2 , 1 2 jj- U 5 V £• — U 3 ft. where /? 
and $\beta_2$ are defined as follows, the bounds (i) and (ii) above being numbered (B-9):

\[ \beta_1 = \frac{u}{3}\left(v - \frac{2}{9}u^{2}\right) - \frac{2}{27}\left(u^{2}-3v\right)^{3/2}, \qquad \beta_2 = \frac{u}{3}\left(v - \frac{2}{9}u^{2}\right) + \frac{2}{27}\left(u^{2}-3v\right)^{3/2} \tag{B-10} \]

Note: When the roots of (B-1) assume values between zero and one, we have to change the range $0 \le u < \infty$ in (B-9) into $0 \le u \le 1$, and the bounds for $v$ and $w$ remain unaffected.

Form II: Applying Descartes' rule of signs to $\Delta = 0$ in (B-3), we conclude that the cubic in $u$, for known positive real values of $v$, $w$, has at most two positive roots and one negative root. Setting:

\[ y = 4wu - \frac{v^{2}}{3} \tag{B-11} \]

in (B-4), we obtain:

\[ f(y) = y^{3} - 3y\left(24vw^{2} + \frac{v^{4}}{9}\right) + 2\left(216w^{4} + 20v^{3}w^{2} - \frac{v^{6}}{27}\right) \le 0 \tag{B-12} \]

Now two cases arise:

Case I: $216w^{4} + 20v^{3}w^{2} - v^{6}/27$ is positive (B-13)

Case II: $216w^{4} + 20v^{3}w^{2} - v^{6}/27$ is negative (B-13')

Further, applying Descartes' rule of signs to (B-12) and using the case of (B-13), we conclude that the cubic in $y$, for known positive and real values of $v$, $w$, has again at most two positive roots and one negative root. Thus, for known real and positive values of $v$, $w$, the negative root of (B-12) corresponds to the negative root of (B-4) and the two positive roots of (B-12) to the two positive roots of (B-4). Similarly (B-13') enables us to conclude that the largest positive root of (B-4) corresponds to the only positive root of (B-12) and the smallest positive root of (B-4) corresponds to the largest negative root of (B-12). To find the bounds on $u$, $v$, $w$ in both these cases we proceed as follows:

Case I

To find the bounds for $y$, $v$, $w$ for $f(y)$ in (B-12), we first draw its curve, and from its shape we conclude what the bounds for $y$ are. Consider:

\[ f(y) = y^{3} - 3y\left(24vw^{2} + \frac{v^{4}}{9}\right) + 2\left(216w^{4} + 20v^{3}w^{2} - \frac{v^{6}}{27}\right) \tag{B-14} \]

(i) When $y = 0$, $f(y) > 0$.

(ii) By Descartes' rule of signs $f(y)$ has at most two positive roots and one negative root.

(iii) Finding $f'(y)$ and $f''(y)$, we conclude the following:

(a) $y = +\left(24vw^{2} + v^{4}/9\right)^{1/2}$ gives a minimum of $f(y)$,

(b) $y = -\left(24vw^{2} + v^{4}/9\right)^{1/2}$ gives a maximum of $f(y)$, and

(c) $y = 0$ is a point of inflection.
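The shape properties listed in (i) to (iii) can be checked numerically for given $v$ and $w$. The sketch below assumes the reading $f(y) = y^{3} - 3py + 2q$ with $p = 24vw^{2} + v^{4}/9$ and $q = 216w^{4} + 20v^{3}w^{2} - v^{6}/27$. It also evaluates the quantity $p^{3} - q^{2}$, whose sign decides whether $f$ has three real roots; the factorisation $p^{3} - q^{2} = \frac{64}{27}w^{2}(v^{3} - 27w^{2})^{3}$ used for comparison is derived here from that reading and is not quoted from the thesis. The function names are hypothetical.

```python
import math

# Sketch: stationary points and real-root condition for
# f(y) = y^3 - 3*p*y + 2*q of (B-14).

def p_and_q(v, w):
    p = 24.0 * v * w**2 + v**4 / 9.0
    q = 216.0 * w**4 + 20.0 * v**3 * w**2 - v**6 / 27.0
    return p, q

def f(y, v, w):
    p, q = p_and_q(v, w)
    return y**3 - 3.0 * p * y + 2.0 * q

def has_three_real_roots(v, w):
    # A depressed cubic y^3 - 3py + 2q has three distinct real roots iff p^3 > q^2.
    p, q = p_and_q(v, w)
    return p**3 - q**2 > 0.0

def factorised_gap(v, w):
    # Closed form of p^3 - q^2 under the assumed p and q.
    return 64.0 / 27.0 * w**2 * (v**3 - 27.0 * w**2) ** 3

if __name__ == '__main__':
    v, w = 4.0, 1.0            # Case I: q > 0 and v > 3 * w**(2/3)
    p, q = p_and_q(v, w)
    y_min, y_max = math.sqrt(p), -math.sqrt(p)
    print(f(y_min, v, w), f(0.0, v, w), f(y_max, v, w))
    print(p**3 - q**2, factorised_gap(v, w))
    print(has_three_real_roots(4.0, 1.0), has_three_real_roots(2.0, 1.0))
```

For $v = 4$, $w = 1$ the value of $f$ at the minimum is negative while $f(0)$ and the value at the maximum are positive, so $f$ is negative only between two positive roots, in line with the curve described for Case I.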
Thus the shape of the curve (B-14) is as shown below. Hence, in order that f(y) be negative for real and positive values of y, y must take the values from A to B, i.e.: OA * y <• OB -166-and- d 2 from (B-ll), jL.(0A+i-) * u * jL(oB + y-)' (B-15) Thus to have positive zeros of (B-l4),i.e. OA and OB, we follow Todhunter or Burnside and Panton and write the zeros of (B-l4) as: 2(24vw ' + — ) cos f- , y 5 k , -2(2W 2 + J-) cos £ ± ± , k , -2(24vw2 + J-) cos 2Llj£ , (B-16) y J where ^ is defined by the relation: •8(~ - ^ tan i = - —-2 j- — g (B-17) 2(2l6w + 2 0 v V - v^ ) 27 and .where real value of

9w i.e. i f 3w2//3 < v (B-18) 3 2,3/2 Since tan in (B-17) is negative, ^ will be an obtuse angle which will make the first two roots of (B-l6) positives and the last negative. Therefore, OA and OB are obtained as: OA = 2(2Ww2 + j-) cos & f h , and OB = -2(2Ww2 + cos W* r (B-19) y J which enable us to write the reduced form of (B-15) as follows: -167-[ 2 (2W 2 + cos i + i - ] 4 u , ( - 2 ( 2 k w 2 + |- )^ or, making use of (B-5), we get: . £ 3 i U £ (B-20) where fi^ = Max £ 73vr]j|(2(2Ww2 + | - ) 2 cos |- + j - j ] and /» k - £X-2(2kvw2 + ^ c o s ( - ^ ) + ^ ] j L . (B-2l) 4 3 2 v 6 Further, 2l6w + 20vJw - — is positive, i f 27w 2(l0 - 6J3) *• v 3 i 27w2(l0 + 6 ^3) i.e. i f 3 w 2/ 3(l - 7 3 ) ^ v «• 3 w 2^ 3(l+ 7 3 ) But v cannot be negative% Therefore 0 * v * • 3 w 2^ 3(l + 7 3 ) (B-22) Finally, from (B-5), (B-20), (B-2l) and (B-22), we obtain: 0 f- -v in (B-23) to 0 £ w £ 1,and the ranges for v and u remain unchanged. -168-Case II Proceeding as before, the graph of f(y) in (B-12) is as follows: Hence, in order that f(y) be negative for real and positive values of y, y must take the values from C to D and thus from (B-ll), 2 2 ^ ( C O + ^ - ) . u , i ( O D + |-) (B-24) To find CO and OD, we proceed as in Case I, and conclude that the same relations as (B-16), (B-l?) and (B-18) hold. Further, since tan $ in (B-17) is positive, 4 will be an acute angle. Noting this fact and equation (B-13')> we can easily obtain the bounds on u, v, w, for this case also. Finally the bounds obtained for both of the cases are written down as follows: 0 $ w <. <» and 0 6 w c •*» 3w2/3< v < 3w 2 / 3(W3) 3*^(1 V 3 ) & v < ~ where and fi' .- i [ 2 ( 2 4 w 2 - ^ ) 1 / 2 cos i . ^ J (B-26) where ^ is the supplement of ^ used in Case I. Upper 100-^Percentage Points of % . 0 5 and /.* I - ( I - / . o r 8 I 2 3 4 5 6 7 9 10 II 12 13 14 15 16 17 18\" 19 20 21 22 3.8\" 4146 5 .99147 7 .81473 9 .48773 1 1 . 0 7 0 5 1 2 . 5 9 1 6 14.067.1 15 .5073 1 6 . 9 1 9 0 18.3070 1 9 . 6 7 5 1 21 .0261 2 2 . 
3 6 2 1 23.68\" 48\" 24.9958\" 2 6 . 2 9 6 2 2 7 . 5 8 7 1 2 8 . 8 6 9 3 3 0 . 1 4 3 5 3 1 . 4 1 0 4 3 2 . 6 7 0 5 3 3 . 9 2 4 4 2 . 7 4 6 0 5 4 . 6 5 5 9 0 6 . 3 0 9 2 3 7.8\"43I5 9 . 3 0 5 1 0 1 0 . 7 1 7 9 1 2 . 0 9 4 4 13.4428\" 1 4 . 7 6 8 5 1 6 . 2 9 6 6 1 7 . 3 6 6 4 18\". 6438\" 1 9 . 9 0 9 2 21 . 1643 2 2 . 4 1 0 0 2 3 . 6 4 7 3 24.8\"7I0 2 6 . 0 9 9 9 2.7.3164 28\".5272 2 9 . 7 3 2 5 3 0 . 9 3 3 0 2.15OII 3.0*9530 5.43433 6.8\"7534 8.25704 9.598\" 08 10.9092 12.1970 13.4660 14.7194 I 5 . 9 5 9 S 17.1889 18.4080 19.6187 20.8215 22.0173 23.2070 24.3909 25.5699 26.7436 27.9131 29.0785 I. 75364 3.36943 4.81934 .6.18829 7.50787 8.79351 10.0504 II. 2951 12.5205 13.7328 14.9342 16.1261 17.3096 18.4861 19.6560 20.8110 21.9789 23.I33O 24.2827 25.4283 26.5703 27.7088 I. 46490 2 . 9 7 2 6 6 4 . 3 4 8 4 7 5 .65765 6 . 9 2 5 8 0 8 . I 6 5 6 I 9 . 3 8 4 4 5 1 0 . 5 8 6 9 II. 7762 1 2 . 9 5 4 6 1 4 . 1 2 3 8 1 5 . 2 8 5 0 16 .4392 17 .5875 18 .7302 19.8680 2 1 . 0 0 1 5 2 2 . 1 3 1 0 2 3 . 2 5 6 8 24*3792 25.49^5 26 .6149 I. 24301 2 . 6 5 6 7 7 3 . 9 6 8 3 2 5 .22577 6 . 4 4 9 4 1 7 . 6 4 9 7 0 8 . 8 3 2 5 0 1 0 . 0 0 1 6 II. 1599 1 2 . 3 0 9 1 1 3 . 4 5 0 4 1 4 . 5 8 5 2 1 5 . 7 1 4 2 1 6 . 8 3 8 2 1 7 . 9 5 7 5 1 9 . 0 7 2 7 2 0 . 1 8 4 4 2 1 . 2 9 2 7 2 2 . 3 9 7 8 2 3 . 5 0 0 2 2 4 . 5 9 9 9 2 5 . 6 9 7 3 I. 06704 2.39686 3.65123 4.86276 6.04700 7.2I2I5 8.36304 9.50270 10.6334 II. 7567 12.8735 13.8950 15.0915 16.1940 17.2927 18.3879 19.4802 20.5697 21.6567 22.7413 23.8238 24.9043 {iii APPENDIX C Upper 100 /^Percentage Points of % .05 and A»l-(I../) \"-' Contd. a. 3 k 5 6 7 8 23 3 5 . 1 7 2 5 3 2 . 1 2 8 7 3 0 . 2 4 0 1 28 .8441 2 7 . 7 2 8 5 26.7923 2 5 . 9 8 2 9 24 3 6 . 4 1 5 1 3 3 . 3 2 0 2 3 1 . 3 9 8 2 2 9 . 9 7 6 5 23 .8397 2 7 . 8 8 5 4 2 7 . 0 5 9 8 25 3 7 . 6 5 2 5 3 4 . 5 0 7 6 3 2 . 5 5 2 4 3 1 . 1 0 6 0 2 9 . 9 4 8 6 , 2 8 . 9 7 6 4 28 .1351 26 3 8 . 8 8 5 2 3 5 . 6 9 1 1 3 3 . 7 0 4 2 3 2 . 2 3 2 8 3 1 . 0 5 5 1 3 0 . 0 6 5 4 2 9 . 2 0 8 7 27 4 0 . 
1133 36.8712 34.8528 33.3573 32.1596 31.1530 30.2811; 28 41.3372 38.0479 35.9986 34.4793 33.2622 32.2387 31.3520; 29 42.5569 39.2214 37.1418 35.5992 34.3630 33.3230 32.4217; 30 43.7729 40.3912 38.2824 36.7170 35.4620 34.4057 33.4901; 40 55.7535 51.958-1 49.5762 47.3005 46.3717 45.1660 44.1189; 50 67.504^ 63.3355 60.7110 58.7506 57.1704 55.8345 54.6717; 60 79.0819 74.5791 71.7368 69.6095 67.8920 66.4380 65.1708; 70 90.5312 85.7220 82.6795 80.3988 78.5550 76.9924 75.6293; 80 101.879 96.7843 93.5562 91.1329 89.1717 87.5082 86.0559; 90 113.145 107.733 104.379 101.822 99.7505 97.9922 96.4562; 100 124.342 118.726 115.157 112.473 110.298 108.450 106.834; X 1.6449 1.2960 1.0686 0.8945 0.7514 0.6283 0.5196
Appendix C Upper 100α_p Percentage Points of χ² for α = .05 and α_p = 1-(1-α)^(p-1) - Contd. p = 9 10 11 12 13 14 15; 1 0.92376 0.80490 0.70494 0.61989 0.54904 0.48403 0.42935; 2 2.17786 1.98989 1.82634 1.68238 1.55476 1.44048 1.33761; 3 3.38059 3.14536 2.93815 2.75354 2.58790 2.43980 2.30102; 4 4.55064 4.27748 4.03522 3.81794 3.62171 3.44275 3.27864; 5 5.69931 5.39360 5.12126 4.87595 4.65347 4.44973 4.26213; 6 6.83281 6.49816 6.19909 5.92886 5.68304 5.45726 5.24877; 7 7.95493 7.59396 7.27058 6.97769 6.71064 6.46483 6.23734; 8 9.06802 8.68277 8.33697 8.02317 7.73656 7.47226 7.22724; 9 10.1739 9.76510 9.39925 9.06594 8.76105 8.47948 8.21810; 10 11.2739 10.8446 10.4581 10.1064 9.78433 9.48651 9.20970; 11 12.3686 11.9191 11.5140 11.1449 10.8065 10.4933 10.2019; 12 13.4586 12.9901 12.5672 12.1816 11.8276 11.4998 11.1945; 13 14.5454 14.0582 13.6183 13.2168 12.8481 12.5062 12.1876; 14 15.6285 15.1237 14.6674 14.2507 13.8677 13.5123 13.1810; 15 16.7085 16.1867 15.7147 15.2834 14.8866 14.5183 14.1747; 16 17.7858 17.2475 16.7604 16.3149 15.9049 15.5241 15.1686; 17 18.8607 18.3065 17.8046 17.3454 16.9226 16.5297 16.1627; 18 19.9333 19.3637 18.8476 18.4752 17.9399 17.5353 17.1572; 19 21.0038 20.4192 19.8893 19.4040 18.9567 18.5406 18.1517; 20 22.0725 21.4737 20.9300 20.4321 19.9730 19.5459 19.1464; 21 23.1395 22.5261 21.9696 21.4595 20.9890 20.5510 20.1413; 22 24.2048 23.5775 23.0082 22.4862 22.0045 21.5560 21.1363
Appendix C Upper 100α_p Percentage Points of χ² for α = .05 and α_p = 1-(1-α)^(p-1) - Contd. p = 9 10 11 12 13 14; 23 26.2686 24.6278 24.0460 23.5123 23.0198 22.5610; 24 26.3310 25.6770 25.0830 24.5379 24.0346 23.5658; 25 27.3921 26.7251 26.1191 25.5629 25.0493 24.5705; 26 28.4519 27.7722 27.1545 26.5874 26.0635 25.5751; 27 29.5106 28.8184 28.1893 27.6114 27.0775 26.5797; 28 30.5681 29.8638 29.2234 28.6350 28.0913 27.5841; 29 31.6247 30.9084 30.2570 29.6583 29.1049 28.5886; 30 32.6803 31.9522 31.2898 30.6810 30.1181 29.5929; 40 43.1906 42.3544 41.5922 40.3904 40.2404 39.6329; 50 53.6393 52.7080 51.8580 51.0745 50.3479 49.6680; 60 64.0444 63.0273 62.0982 61.2408 60.4451 59.7000; 70 74.4166 73.3207 72.3187 71.3935 70.5344 69.7293; 80 84.7629 83.5936 82.5241 81.5358 80.6175 79.7567; 90 95.0879 93.8498 92.7166 91.6692 90.6956 89.7824; 100 105.395 104.092 102.898 101.795 100.769 99.8067; X 0.4218 0.3325 0.2501 0.1733 0.1014 0.0335
Appendix C Upper 100α_p Percentage Points of χ² for α = .05 and α_p = 1-(1-α)^(p-1) - Contd. p = 15 16 17 18 19 20; 1 0.42935 0.38171 0.34003 0.30333 0.27092 0.242283; 2 1.33761 1.24457 1.16017 1.08310 1.01246 0.94768; 3 2.30102 2.17582 2.06087 1.95466 1.85616 1.76474; 4 3.27364 3.12745 2.98779 2.85793 2.73677 2.62363; 5 4.26213 4.08861 3.92769 3.77748 3.63679 3.50491; 6 5.24877 5.05538 4.87553 4.70720 4.54910 4.40051; 7 6.23734 6.02588 5.82882 5.64400 5.47005 5.30624; 8 7.22724 6.99910 6.78613 6.58607 6.39750 6.21964; 9 8.21810 7.97438 7.74657 7.53229 7.33005 7.13904; 10 9.20970 8.95130 8.70953 8.48186 8.26674 8.06337; 11 10.2019 9.92960 9.67458 9.43422 9.20691 8.99182; 12 11.1945 10.9090 10.6414 10.3889 10.1500 9.92378; 13 12.1896 11.8895 11.6098 11.3458 11.0957 10.8588; 14 13.1810 12.8707 12.5794 12.3043 12.0436 11.7964; 15 14.1747 13.8527 13.5503 13.2645 12.9935 12.7364; 16 15.1686 14.8353 14.5222 14.2260 13.9451 13.6785; 17 16.1627 15.8185 15.4950 15.1888 14.8983 14.6225; 18 17.1572 16.8024 16.4687 16.1529 15.8531 15.5683; 19 18.1517 17.7866 17.4431 17.1179 16.8091 16.5156; 20 19.1464 18.7113 18.4183 18.0839 17.7663 17.4644; 21 20.1413 19.7564 19.3941 19.0508 18.7246 18.4144; 22 21.1363 20.7419 20.3705 20.0185 19.6839 19.3657. Appendix C Upper 100α_p Percentage Points of χ² for α = .05 and α_p = 1-(1-α)^(p-1) - Contd. p = 15 16 17 18 19 20; 23 22.1315 21.7278 21.3475 20.9871 20.6443 20.3182; 24 23.1267 22.7139 22.3250 21.9562 21.6055 21.2717; 25 24.1221 23.7005 23.3030 22.9261 22.5675 22.2262; 26 25.1175 24.6872 24.2314 23.8965 23.5303 23.1817; 27 26.1131 25.6742 25.2603 24.8676 24.4939 24.1380; 28 27.1089 26.6616 26.2397 25.8394 25.4583 25.0953; 29 28.1047 27.6491 27.2194 26.8115 26.4232 26.0532; 30 29.1004 28.6368 28.0161 27.7841 27.3886 27.0119; 40 39.0623 38.5244 38.0161 37.5330 37.0722 36.6327; 50 49.0289 48.4257 47.8552 47.3123 46.7943 46.2996; 60 58.9990 58.3369 57.7192 57.1135 56.5435 55.9990; 70 68.9714 68.2551 67.5767 66.9305 66.3130 65.7227; 80 78.9460 78.1994 77.4531 76.7602 76.0991 75.4662; 90 88.9221 88.1082 87.3368 86.6014 85.8981 85.2253; 100 98.9994 98.0409 97.2270 96.4507 95.7081 94.9976; X -0.0309 -0.0922 -0.1567 -0.2067 -0.2606 -0.3124
Appendix D Upper 100α_p Percentage Points of χ² for α = .01 and α_p = 1-(1-α)^(p-1). p = 2 3 4 5 6 7 8; 1 6.63490 5.42074 4.72647 4.24329 3.87492 3.57869 3.29841; 2 9.21034 7.83416 7.03308 6.46767 6.03136 5.67655 5.33716; 3 11.3449 9.84847 8.96928 8.36409 7.85922 7.46260 7.08130; 4 13.2767 11.6797 10.7359 10.0614 9.53601 9.10502 8.68933; 5 15.0863 13.4008 12.3998 11.6825 11.1221 10.6612 10.2156; 6 16.8119 15.0464 13.9941 13.2382 12.6461 12.1584 11.6860; 7 18.4753 16.6362 15.5368 14.7453 14.1243 13.6120 13.1151; 8 20.0902 18.1825 17.0394 16.2148 15.5671 15.0319 14.5123; 9 21.6660 19.6938 18.5095 17.6540 16.9811 16.4246 15.8836; 10 23.2093 21.1759 19.9527 19.0680 18.3714 17.7946 17.2335; 11 24.7250 22.6336 21.3734 20.4608 19.7415 19.1457 18.5655; 12 26.2170 24.0701 22.7746 21.8354 21.0945 20.4804 19.8820; 13 27.6883 25.4881 24.1586 23.1940 22.4325 21.8008 21.1849; 14 29.1413 26.8897 25.5277 24.5385 23.7571 23.1085 22.4758; 15 30.5779 28.2768 26.8832 25.8705 25.0699 24.4050 23.7562; 16 31.9999 29.6509 28.2269 27.1912 26.3720 25.6914 25.0269; 17 33.4087 31.0131 29.5595 28.5017 27.6646 26.9687 26.2891; 18 34.8053 32.3646 30.8823 29.8030 28.9485 28.2378 27.5439; 19 36.1908 33.7061 32.1960 31.0957 30.2242 29.4992 28.7905. Appendix D Upper 100α_p Percentage Points of χ² for α = .01 and α_p = 1-(1-α)^(p-1) - Contd. p = 2 3 4 5 6 7 8; 20 37.5662 35.0387 33.5014 32.3807 31.4927 30.7536 30.0310; 21 38.9321 36.3628 34.7990 33.6584 32.7543 32.0016 31.2654; 22 40.2894 37.6792 36.0895 34.9295 34.0097 33.2436 32.4942; 23 41.6384 38.9834 37.3734 36.1944 35.2592 34.4301 33.7177; 24 42.9798 40.2907 38.6509 37.4535 36.5032 35.7114 34.9363; 25 44.3141 41.5867 39.9227 38.7070 37.7420 36.9377 36.1502; 26 45.6417 42.8763 41.1890 39.9555 38.9761 38.1595 37.3598; 27 46.9630 44.1611 42.4500 41.1990 40.2055 39.3769 38.5653; 28 48.2782 45.4402 43.7062 42.4381 41.4307 40.4903 39.7670; 29 49.5879 46.7144 44.9678 43.6728 42.6517 41.7997 40.9650; 30 50.8922 47.9838 46.2051 44.9035 43.3689 43.0056 42.1595; 40 63.6907 60.4606 58.4783 57.0242 55.8661 54.8981 53.9480; 50 76.1539 72.6399 70.4781 68.8895 67.6226 66.5625 65.5208; 60 88.3794 84.6085 82.2843 80.5741 79.2088 78.0655 76.9411; 70 100.425 96.417$ 93.9443 92.1223 90.6666 89.4467 88.2462; 80 112.329 108.102 105.481 103.563 102.022 100.731 99.4598; 90 124.116 119.682 116.939 114.915 113.296 111.978 110.600; 100 135.807 131.177 128.310 126.193 124.500 123.078 121.678; X 2.32630 2.05584 1.88523 1.75766 1.65455 1.56729 1.48068
Appendix D Upper 100α_p Percentage Points of χ² for α = .01 and α_p = 1-(1-α)^(p-1) - Contd. p = 9 10 11 12 13 14; 1 3.09254 2.91302 2.75429 2.61264 2.48494 2.36892; 2 5.08536 4.86380 4.66620 4.48823 4.32639 4.17817; 3 6.79703 6.54588 6.32102 6.11768 5.93209 5.76154; 4 8.37850 8.10318 7.85608 7.63206 7.42720 7.23849; 5 9.88171 9.58543 9.31905 9.07720 8.85561 8.65123; 6 11.3314 11.0164 10.7327 10.4749 10.2384 10.0200; 7 12.7417 12.4095 12.1101 11.8377 11.5875 11.3563; 8 14.1213 13.7732 13.4593 13.1733 12.9106 12.6676; 9 15.4763 15.1133 14.7857 14.4871 14.2125 13.9584; 10 16.8107 16.4336 16.0931 15.7827 15.4969 15.2324; 11 18.1280 17.7376 17.3849 17.0630 16.7669 16.4924; 12 19.4304 19.0274 18.6629 18.3303 18.0240 17.7401; 13 20.7199 20.3046 19.9290 19.5860 19.2700 18.9771; 14 21.9979 21.5710 21.1846 20.8317 20.5065 20.2049; 15 23.2658 22.8276 22.4309 22.0684 21.7343 21.4243; 16 24.5245 24.0753 23.6686 23.2969 22.9541 22.6361; 17 25.7750 25.3152 24.8989 24.5182 24.1671 23.3412; 18 27.0180 26.5480 26.1222 25.7328 25.3736 25.0402; 19 28.2542 27.7742 27.3393 26.9415 26.5743 26.2335. Appendix D Upper 100α_p Percentage Points of χ² for α = .01 and α_p = 1-(1-α)^(p-1) - Contd. p = 9 10 11 12 13 14; 20 29.4840 28.9943 28.5505 28.1444 27.7697 27.4216; 21 30.7079 30.2088 29.7563 29.3422 28.9600 28.6050; 22 31.9265 31.4181 30.9572 30.5353 30.1457 29.7839; 23 33.1400 32.6226 32.1534 31.7238 31.3271 30.9586; 24 34.3489 33.8227 33.3453 32.9083 32.5046 32.1295; 25 35.5532 35.0184 34.5331 34.0886 33.6783 33.2968; 26 36.7535 36.2101 35.7171 35.2655 34.8483 34.4606; 27 37.9499 37.3982 36.8976 36.4390 36.0152 35.6213; 28 39.1425 38.5827 38.0746 37.6091 37.1789 36.7790; 29 40.3345 39.7639 39.2485 38.7762 38.3398 37.9339; 30 41.5175 40.9419 40.4193 39.9403 39.4977 39.0860; 40 53.2261 52.5781 51.9891 51.4493 50.9499 50.4851; 50 64.7286 64.0169 63.3696 62.7754 62.2253 61.7130; 60 76.0853 75.3161 74.6160 73.9731 73.3776 72.8228; 70 87.3321 86.5099 85.7614 85.0736 84.4365 83.8426; 80 98.4914 97.6201 96.8267 96.0972 95.4213 94.7910; 90 109.580 108.663 107.827 107.058 106.3458 105.681; 100 120.610 119.648 118.772 117.967 117.220 116.523; X 1.41421 1.35403 1.29891 1.24800 1.20058 1.15617
Appendix D Upper 100α_p Percentage Points of χ² for α = .01 and α_p = 1-(1-α)^(p-1) - Contd. p = 15 16 17 18 19 20; 1 2.26275 2.16508 2.07477 1.99093 1.91282 1.83979; 2 4.04149 3.91479 3.79678 3.63642 3.58286 3.48535; 3 5.60372 5.45695 5.31981 5.19116 5.07008 4.95573; 4 7.06353 6.90049 6.74787 6.60442 6.46917 6.34121; 5 8.46146 8.28438 8.11839 7.96218 7.81472 7.67502; 6 9.81699 9.62737 9.44946 9.28187 9.12351 8.97336; 7 11.1413 10.9402 10.7514 10.5735 10.4052 10.2455; 8 12.4413 12.2297 12.0308 11.8433 11.6758 11.4973; 9 13.7218 13.5002 13.2920 13.0954 12.9093 12.7326; 10 14.9859 14.7551 14.5380 14.3330 14.1388 13.9543; 11 16.2365 15.9969 15.7713 15.5533 15.3565 15.1646; 12 17.4757 17.2273 16.9937 16.7730 16.5639 16.3650; 13 18.7039 18.4477 18.2064 17.9784 17.7623 17.5567; 14 19.9235 19.6595 19.4109 19.1806 18.9531 18.7410; 15 21.1349 20.8635 20.6078 20.3660 20.1366 19.9184; 16 22.3391 22.0605 21.7979 21.5496 21.3140 21.0897; 17 23.5368 23.2512 22.9320 22.7273 22.4357 22.2557; 18 24.7286 24.4362 24.1606 23.8998 23.6522 23.4166; 19 25.9150 25.6160 25.3341 25.0673 24.8141 24.5729. Appendix D Upper 100α_p Percentage Points of χ² for α = .01 and α_p = 1-(1-α)^(p-1) - Contd. p = 15 16 17 18 19 20; 20 27.0964 26.7910 26.5029 26.2303 25.9715 25.7251; 21 28.2731 27.9614 27.6675 27.3892 27.1250 26.8733; 22 29.4456 29.1278 28.8280 28.5442 28.2748 28.0180; 23 30.6140 30.2902 29.9848 29.6956 29.4210 29.1593; 24 31.7738 31.4492 31.1383 30.8438 30.5641 30.2975; 25 32.9400 32.6047 32.2884 31.9887 31.7041 31.4328; 26 34.0979 33.7570 33.4353 33.1306 32.8411 32.5652; 27 35.2528 34.9065 34.5796 34.2699 33.9757 33.6952; 28 36.4049 36.0531 35.7211 35.4065 35.1076 34.8227; 29 37.5541 37.1971 36.8601 36.5407 36.2373 35.9479; 30 38.7007 38.3386 37.9966 37.6726 37.3646 37.0709; 40 50.0496 49.6398 49.2525 48.8853 48.5390 48.2026; 50 61.2330 60.7811 60.3539 59.9486 59.5629 59.1948; 60 72.3027 71.8129 71.3496 70.9100 70.4915 70.0919; 70 83.2856 82.7609 82.2646 81.7933 81.3447 80.9161; 80 94.1998 93.6427 93.1156 92.6150 92.1383 91.6839; 90 105.058 104.470 103.914 103.386 102.883 102.403; 100 115.869 115.2530 114.670 114.115 113.587 113.083; X 1.11433 1.07476 1.03717 1.00134 0.96711 0.93428
BIBLIOGRAPHY Anderson, T.W. (1946) \"The non-central Wishart distribution and certain problems of multivariate statistics.\" Ann. of Math. Stat. 17, 409-431.
(1957) An Introduction to Multivariate Statistical Analysis. New York, John Wiley and Sons. Anderson, R.L. and Bancroft, T.A. (1952) Statistical Theory in Research. New York, McGraw-Hill Book Co. Bannerjee, D.P. (1958) \"On the exact distribution of a test in multivariate analysis.\" J. Roy. Stat. Soc., Ser. B, 20, 108-110. Bartlett, M.S. (1934) \"The vector representation of sample.\" Proc. Camb. Phil. Soc. 30, 327-340. (1938) \"Further aspects of the theory of multiple regression.\" Proc. Camb. Phil. Soc. 34, 33-40. (1947) \"Multivariate analysis.\" Supp. J. Roy. Stat. Soc. 9, 176-197. Bierens de Haan, D. (1939) Nouvelles Tables d'Integrales Definies. Ed. of 1867, corrected. New York, G.E. Stechert and Co. Bose, R.C. (1936) \"On the exact distribution and moment coefficients of the D²-statistic.\" Sankhya 2, 143-154. Bose, R.C. and Roy, S.N. (1938) \"The distribution of the Studentized D²-statistic.\" Sankhya 4, 19-38. Bross, I. (1950) \"Fiducial intervals for variance components.\" Biometrics 6, 136-144. Burnside, W.S. and Panton, A.W. (1904) Theory of Equations. Dublin, Dublin University Press Series, Vol. 1. Cramer, H. (1946) Mathematical Methods of Statistics. Princeton, Princeton University Press. Crump, S.L. (1945) \"The estimation of variance components in analysis of variance.\" Biometrics 1, 7-11. Cuttle, Yvonne (1956) \"The distribution of extreme Mahalanobis distance from the sample mean.\" Master's Thesis, University of British Columbia, Vancouver. Duncan, D.B. (1951) \"A significance test for difference between ranked treatments in an analysis of variance.\" Virginia Journal of Sc. 2, 171-189. (1953) \"Multiple range tests and the multiple comparisons test. (Preliminary report).\" Biometrics 9, 262, Abstract 220. (1955) \"Multiple range and multiple F tests.\" Biometrics 11, 1-42. (1957) \"Multiple range tests for correlated and heteroscedastic means.\" Biometrics 13, 164-176.
Eisenhart, C. (1947) \"The assumptions underlying the analysis of variance.\" Biometrics 3, 1-21. Federer, W.T. (1955) Experimental Design. New York, The Macmillan Co. Fisher, R.A. (1928) \"The general sampling distributions of the multiple correlation coefficient.\" Proc. Roy. Soc. London, Ser. A, 121, 654-673. (1931) \"Introduction\" in Mathematical Tables of the British Association for the Advancement of Science. Vol. 1. Cambridge, Cambridge University Press, p. 26. (1935) \"The fiducial argument of variance components.\" Ann. Eug. 6, 391-398. (1936) \"The use of multiple measurements in taxonomic problems.\" Ann. Eug. 7, 179-188. (1938) \"The statistical utilization of multiple measurements.\" Ann. Eug. 8, 376-386. (1939) \"The sampling distribution of some statistics obtained from non-linear equations.\" Ann. Eug. 9, 238-249. Ganguli, M. (1941) \"A note on nested sampling.\" Sankhya 5, 449-452. Garwood, F. \"Unpublished table of the lower 5% points of the noncentral chi-square.\" Girshick, M.A. (1939) \"On the sampling theory of roots of determinantal equations.\" Ann. Math. Stat. 10, 203-224. Graybill, F.A., Martin, F. and Godfrey, G. (1956) \"Confidence intervals for variance ratios specifying genetic heritability.\" Biometrics 12, 99-109. Hartley, H.O. and Pearson, E.S. (1954) Biometrika Tables for Statisticians. Vol. 1, Cambridge University Press. Hotelling, H. (1931) \"The generalization of Student's ratio.\" Ann. Math. Stat. 2, 360-378. (1936) \"Relations between two sets of variates.\" Biometrika 28, 321-377. (1950) \"A generalized T test and measure of multivariate dispersion.\" Proc. of Second Berkeley Symposium on Math. Stat. and Prob. 23-41. Berkeley, University of California Press.
Hsu, P.L. (1938) \"Notes on Hotelling's generalized T.\" Ann. Math. Stat. 9, 231-243. (1939) \"On the distribution of roots of certain determinantal equations.\" Ann. Eug. 9, 250-258. (1940) \"On the limiting distribution of roots of a determinantal equation.\" J. London Math. Soc. 16, 183-194. Ito, K. (1956) \"Asymptotic formulae for the distribution of Hotelling's generalized T₀²-statistic.\" Ann. Math. Stat. 27, 1091-1105. Keuls, M. (1952) \"The use of the Studentized range in connection with an analysis of variance.\" Euphytica 1, 112-122. Kramer, C.Y. (1955) \"On the analysis of variance of a two-way classification with unequal sub-class numbers.\" Biometrics 11, 441-452. (1956) \"Extension of multiple measurements to group means with unequal numbers of replications.\" Biometrics 12, 307-310. (1957) \"Extension of multiple range tests to group correlated adjusted means.\" Biometrics 13, 13-18. Larsen, H.D. (1948) Rinehart Mathematical Tables, Formulas and Curves. New York, Rinehart and Co. Inc. Lawley, D.N. (1938a) \"A generalization of Fisher's z test.\" Biometrika 30, 180-187. (1938b) \"Tests of significance for the latent roots of covariance and correlation matrices.\" Biometrika 43, 128-136. Mahalanobis, P.C. (1927) \"Analysis of race-mixtures in Bengal.\" J. Asiat. Soc. Bengal, 23, 301-333. (1936) \"On the generalized distance in statistics.\" Proc. Nat. Inst. Sci. India, 2, 49-55. Mahalanobis, P.C., Majumdar, D.M. and Rao, C.R. (1949) \"Anthropometric survey of the United Provinces, 1941. A statistical study.\" Sankhya 9, 89-324. Mood, A.M. (1950) Introduction to the Theory of Statistics. New York, McGraw-Hill Book Co. (1951) \"On the distribution of the characteristic roots of normal second-moment matrices.\" Ann. Math. Stat. 22, 266-273. Morant, G.M. (1923) \"A first study of the Tibetan skull.\" Biometrika 14, 193-260. (1924) \"A study of certain oriental series of crania including the Nepalese and Tibetan series in the British Museum (Natural History).\" Biometrika 16, 1-105. (1926) \"Studies of Palaeolithic man. 1. The Chancelade skull and its relation to the modern Eskimo skull.\" Ann. Eug. 1, 257-276.
Nair, U.S. (1939) \"The application of moment functions in the study of distribution laws in Statistics.\" Biometrika 30, 274-294. Nanda, D.N. (1948) \"Distribution of a root of a determinantal equation.\" Ann. Math. Stat. 19, 47-57. (1948) \"Limiting distributions of a root of a determinantal equation.\" Ann. Math. Stat. 19, 340-350. (1950) \"Distribution of the sum of roots of a determinantal equation under a certain condition.\" Ann. Math. Stat. 21, 432-439. Nash, S.W. (1956) \"Contributions to the theory of experiments with many treatments.\" Berkeley, University of California Press. Nash, S.W. and Jolicoeur, P. (1959) \"Calculating discriminant functions.\" Unpublished. Newman, D. (1939) \"The distribution of the range in samples from a normal population expressed in terms of an independent estimate of standard deviation.\" Biometrika 31, 20-30. Patnaik, P.B. (1949) \"The non-central chi-square and F distributions and their applications.\" Biometrika 36, 202-232. Pearson, K. (1926) \"On the coefficient of racial likeness.\" Biometrika 18, 105-117. (1928) \"The application of the coefficient of the racial likeness to test the character of samples.\" Biometrika 20B, 294-300. Pillai, K.C.S. (1953) \"On the distribution of the sum of the roots of a determinantal equation.\" Abstract, Ann. Math. Stat. 24, 490. (1954) \"On the distribution of Hotelling's generalized T-test.\" Abstract, Ann. Math. Stat. 25, 412. (1954) \"On some distribution problems in multivariate analysis.\" Mimeo. Series No. 88. Institute of Statistics, University of North Carolina. (1955) \"Some new criteria in multivariate analysis.\" Ann. Math. Stat. 26, 117-121. (1956) \"On the distribution of the largest or the smallest root of a matrix in multivariate analysis.\" Biometrika 43, 122-127. (1957) Concise Tables for Statistics. Manila, The Statistical Center, University of Philippines. Rao, C.R.
(1946) \"Tests with discriminant functions in multivariate analysis.\" Sankhya 7, 407-414. (1948) \"The utilization of multiple measurements in problems of biological classification.\" J. Roy. Stat. Soc. B, 10, 159-203. (1952) Advanced Statistical Methods in Biometric Research. New York, John Wiley and Sons. Roy, S.N. (1939) \"p-statistics or some generalizations in analysis of variance appropriate to multivariate problems.\" Sankhya 4, 381-396. (1942a) \"Analysis of variance for multivariate normal populations - the sampling distribution of the requisite p-statistics on the null and non-null hypotheses.\" Sankhya 6, 35-50. (1942b) \"The sampling distribution of p-statistics and certain allied statistics on the non-null hypothesis.\" Sankhya 6, 15-34. (1945) \"The individual sampling distribution of the maximum, the minimum and any intermediate of the p-statistics on the null hypothesis.\" Sankhya 7, 133-158. Roy, S.N. and Bose, R.C. (1953) \"Simultaneous confidence interval estimation.\" Ann. Math. Stat. 24, 513-536. Roy, S.N. (1954) \"Some further results in simultaneous confidence interval estimation.\" Ann. Math. Stat. 25, 725-776. (1956) \"A note on some further results in simultaneous confidence interval estimation.\" Ann. Math. Stat. 27, 856-858. (1957) Some Aspects of Multivariate Analysis. New York, John Wiley and Sons. Calcutta, Indian Statistical Institute. Roy, S.N. and Gnanadesikan, R. (1957) \"Further contributions to multivariate confidence bounds.\" Biometrika 44, 399-410. (1959a) \"Some contributions to Anova in one or more dimensions. I.\" Ann. Math. Stat. 30, 304-317. (1959b) \"Some contributions to Anova in one or more dimensions. II.\" Ann. Math. Stat. 30, 310-340. Satterthwaite, F.E. (1941) \"Synthesis of variance.\" Psychometrika 6, 309-316. (1946) \"An approximate distribution of estimates of variance components.\" Biometrics 1, 110-114. Scheffe, H. (1953) \"A method of judging all contrasts in the analysis of variance.\" Biometrika 40, 87-104. Siotani, M. (1958) \"Note on the utilization of the generalized Student ratio in the analysis of variance of dispersion.\" Tokyo, Annals of the Institute of Statistical Math., 9, 157-171. (1959) \"The extreme value of the generalized distance of the individual points in the multivariate normal samples.\" Tokyo, Annals of the Institute of Statistical Math., 10, 183-208.
Tang, P.C. (1938) \"The power function of the analysis of variance tests with tables and illustrations of their use.\" Stat. Res. Mem. 2, 126-149 and 8 pages of tables. Tildesley, M.L. (1921) \"A first study of the Burmese skull.\" Biometrika 13, 176-262. Tukey, J.W. (1949) \"Comparing individual means in the analysis of variance.\" Biometrics 5, 99-114. (1951) \"Quick and dirty methods in statistics.\" Part II. Simple analysis for standard designs. Proc. Fifth Annual Convention, Am. Soc. for Quality Control, 189-197. (1952) \"Allowances for various types of error rates.\" Unpublished invited address, Blacksburg, Virginia meeting of the Institute of Math. Stat. (1953) \"The problem of multiple comparisons.\" Unpublished dittoed notes. Princeton University, Princeton, N.J. 396 pages. Whittaker, E.T. and Watson, G.N. (1927) A Course of Modern Analysis. 4th ed. Cambridge, Cambridge University Press. Wilks, S.S. (1932) \"Certain generalizations in the analysis of variance.\" Biometrika 26, 471-494. (1950) Mathematical Statistics. Princeton, N.J., Princeton University Press."@en ; edm:hasType "Thesis/Dissertation"@en ; edm:isShownAt "10.14288/1.0080600"@en ; dcterms:language "eng"@en ; ns0:degreeDiscipline "Mathematics"@en ; edm:provider "Vancouver : University of British Columbia Library"@en ; dcterms:publisher "University of British Columbia"@en ; dcterms:rights "For non-commercial purposes only, such as research, private study and education. 
Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use."@en ; ns0:scholarLevel "Graduate"@en ; dcterms:title "Multiple comparison methods and certain distributions arising in multivariate statistical analysis"@en ; dcterms:type "Text"@en ; ns0:identifierURI "http://hdl.handle.net/2429/39569"@en .