UBC Theses and Dissertations

The performance of discriminant analysis procedures under non-optimal conditions Lind, John Charles 1979

THE PERFORMANCE OF DISCRIMINANT ANALYSIS PROCEDURES UNDER NON-OPTIMAL CONDITIONS

by

JOHN CHARLES LIND
B.Sc., The University of Alberta, 1974

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ARTS in THE FACULTY OF GRADUATE STUDIES, Department of Psychology

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
October 1979
(c) John Charles Lind, 1979

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department
The University of British Columbia
2075 Wesbrook Place
Vancouver, Canada V6T 1W5

ABSTRACT

The performance of four discriminant analysis procedures for the classification of observations from unknown populations was examined by Monte Carlo methods. The procedures examined were the Fisher linear discriminant function, the quadratic discriminant function, a polynomial discriminant function and a linear procedure designed for use in situations where covariance matrices are unequal. Each procedure was observed under conditions of unequal sample sizes, unequal covariance matrices, and in conditions where the samples were drawn from populations that did not have a multivariate normal distribution.
When the population covariance matrices were equal, or not greatly different, the quadratic discriminant function performed similarly to or marginally better than the linear procedures. In all cases the polynomial discriminant function demonstrated the poorest performance. [...] the quadratic discriminant function performed much better than the other procedures. All of the procedures were greatly affected by non-normality and tended to make many more errors in the classification of one group than the other, suggesting that data be standardized when non-normality is suspected.

TABLE OF CONTENTS

Abstract
List of Tables
Acknowledgement
Introduction
Method
    The Fisher Linear Discriminant Function
    The Quadratic Discriminant Function
    The A-B Discriminant Function
    The Polynomial Discriminant Function
    Independent Variables
    Data Generation
Results and Discussion
Conclusions
References
Appendices
    Appendix I: Population Covariance Matrices
    Appendix II: The Effect of Transformation 1 on the Bivariate Normal Distribution

LIST OF TABLES

Each table reports the Mean Proportion of Correct Classification for 1000 Samples.

Table 1: Two Dependent Variables - Both Samples Drawn from MVN Populations
Table 2: Two Dependent Variables - Transformation 1 Applied to Both Samples
Table 3: Two Dependent Variables - Transformation 2 Applied to Both Samples
Table 4: Two Dependent Variables - Transformation 1 Applied to Sample 2 Only
Table 5: Six Dependent Variables - Both Samples Drawn from MVN Populations
Table 6: Six Dependent Variables - Transformation 1 Applied to Both Samples
Table 7: Six Dependent Variables - Transformation 2 Applied to Both Samples
Table 8: Six Dependent Variables - Transformation 1 Applied to Sample 2 Only

ACKNOWLEDGEMENT

I would like to express my gratitude to a number of people for their assistance in this project. I would like to thank the members of my committee, Dr. A.R. Hakstian, Dr. J.H. Steiger, Dr. J.S. Wiggins and Dr. S.W. Nash, for their thoughtful comments and advice during the course of this project. I would also like to thank Tom Nichols for his advice and assistance with part of the computer programming.

INTRODUCTION

The assignment of individuals of unknown origin to one or more distinct groups on the basis of a number of measurements or test scores is a common problem in many areas of psychological research. While the origin of the individuals is unknown, it is assumed that they come from one or the other of the distinct groups being considered. For example, in clinical psychology it may be necessary to assign an individual to one of several diagnostic categories on the basis of test scores and clinician's ratings.
In educational psychology, there are problems like those encountered when advising students on a course of future study based on their examination results in the last year of secondary school.

The most widely used statistical procedure for prediction-classification problems of this kind is discriminant analysis, originally developed by Fisher (1936). The Fisher linear discriminant function (LDF) is obtained when one maximizes the ratio of the between-samples mean square to the within-sample variance, yielding linear combinations of the original variables which can then be used for classification of new cases. An individual's score is computed on the discriminant function that produces the largest between- to within-groups variance ratio, and is compared with the mean scores on this discriminant function for each of the groups. If the true population proportions of the various kinds of individuals who constitute the groups are equal and the probability of making an error in classification is the same for each of the groups, then the individual is assigned to that group whose mean discriminant score is closest to the subject's score (Morrison, 1975, pp. 230-246). When the population proportions differ or errors in classification are not equally costly, the decision boundary is moved closer to the mean discriminant score for the more common group or away from the group into which classification is more costly, such as a "mentally ill" or "habitual criminal" group.
In the following text vectors are in general column vectors and are set in bold, for example $\mathbf{x}$. The determinant of a matrix is indicated by vertical bars such as $|A|$, and ln refers to natural logarithms.

In the case of two groups Fisher (1936) demonstrated that the linear combination of the p observations which maximizes the ratio of between- to within-sample variance is given by

$$y = (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)' S^{-1} \mathbf{x} \tag{1}$$

where $\mathbf{x}' = [x_1, \ldots, x_p]$ is a vector of observations and $\bar{\mathbf{x}}_1$ and $\bar{\mathbf{x}}_2$ are the respective mean vectors of two independent random samples from each of two p-variate distributions with a pooled sample covariance matrix $S$. Equation (1) does not require any distributional assumptions. Welch (1939), however, showed that if both populations are multivariate normal (MVN) and have respective mean vectors $\boldsymbol{\mu}_1$, $\boldsymbol{\mu}_2$ and a common covariance matrix $\Sigma$, then the value of the log likelihood ratio in the two populations for any observation $\mathbf{x}$ is given by

$$\zeta = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)' \Sigma^{-1} \left[\mathbf{x} - \tfrac{1}{2}(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2)\right] \tag{2}$$

In this case an unknown observation can be classified as belonging to either population 1 or 2 if $\zeta$ is either greater or less than a constant k, which is determined by the relative cost of misclassification and the prior probability of $\mathbf{x}$ coming from each of the populations.

When the population parameters $\boldsymbol{\mu}_1$, $\boldsymbol{\mu}_2$ and $\Sigma$ are unknown, Wald (1944) and Anderson (1951) suggested replacing the unknown parameters by their sample counterparts, in which case the classification rule depends on

$$W = (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)' S^{-1} \left[\mathbf{x} - \tfrac{1}{2}(\bar{\mathbf{x}}_1 + \bar{\mathbf{x}}_2)\right] \tag{3}$$

which is generally referred to as Anderson's classification statistic.
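The sample rule built from the coefficients $S^{-1}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)$ can be illustrated with a short Python sketch. This is only a sketch under stated assumptions, not the program used in the study: it assumes bivariate data, the usual form of Anderson's statistic $W = (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)' S^{-1} [\mathbf{x} - \tfrac{1}{2}(\bar{\mathbf{x}}_1 + \bar{\mathbf{x}}_2)]$ with a cutoff of zero, and the function names (`mean_vector`, `pooled_covariance`, `anderson_w`) are hypothetical.

```python
def mean_vector(rows):
    """Sample mean of a list of bivariate observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter(rows, mean):
    """Sum of squares and cross-products about the group mean (2x2)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s


def pooled_covariance(g1, g2):
    """Pooled within-sample covariance matrix S with n1 + n2 - 2 degrees of freedom."""
    s1 = scatter(g1, mean_vector(g1))
    s2 = scatter(g2, mean_vector(g2))
    dof = len(g1) + len(g2) - 2
    return [[(s1[a][b] + s2[a][b]) / dof for b in range(2)] for a in range(2)]


def solve2(m, v):
    """Solve the 2x2 linear system m a = v by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(m[1][1] * v[0] - m[0][1] * v[1]) / det,
            (m[0][0] * v[1] - m[1][0] * v[0]) / det]


def anderson_w(x, g1, g2):
    """W = (xbar1 - xbar2)' S^{-1} [x - (xbar1 + xbar2)/2] (assumed form)."""
    m1, m2 = mean_vector(g1), mean_vector(g2)
    a = solve2(pooled_covariance(g1, g2), [m1[0] - m2[0], m1[1] - m2[1]])
    mid = [(m1[j] + m2[j]) / 2 for j in range(2)]
    return sum(a[j] * (x[j] - mid[j]) for j in range(2))


# Illustrative data only; W > 0 assigns x to group 1, W < 0 to group 2.
group1 = [(2, 3), (3, 3), (2, 4), (3, 4)]
group2 = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(anderson_w((3, 3), group1, group2))
```

With equal prior proportions and equal misclassification costs, the sign of the returned value decides the assignment; a nonzero cutoff would shift the boundary as described in the text.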
The derivation of equation (1) did not require any distributional assumptions; the linear combination $(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)' S^{-1} \mathbf{x}$ or its population counterpart, however, is contained in equations (2) and (3), which follow from the multivariate normality of $\mathbf{x}$. This suggests that caution is necessary when applying the Fisher LDF in situations where data are not multivariate normal, and information about its performance in these non-optimal conditions would be valuable.

In situations where multivariate normality can be assumed, the Fisher LDF may still not provide the optimal classification rule since the covariance matrices of the respective groups may be heterogeneous. If the sample covariance matrices are not equal and are combined to form the pooled within-groups covariance matrix in the Fisher LDF, the resultant discriminant space and associated coordinates may be distorted, thus increasing the error in classification. There is the additional problem of deriving meaningful interpretations about group differences when there are large disparities among the covariance matrices for the groups: the multivariate extension (see Gnanadesikan, 1977) of the so-called Behrens-Fisher problem (Fisher, 1935).

If sample covariance matrices are unequal, the optimal classification rule consists of a quadratic discriminant function (QDF) (Smith, 1947), which is more difficult to interpret than the Fisher LDF since it involves nonlinear terms.
Anderson and Bahadur (1962) developed a procedure (which we shall refer to in the sequel as the "A-B" procedure) to find the optimal linear discriminant function in the case of unequal covariance matrices.

In recent years there has been a great deal of research in Engineering and Computer Science in the area of pattern recognition, research which has resulted in several discriminant analysis procedures. Some of these take the form of a polynomial discriminant function (PDF), designed to be applied in situations where the form of the underlying distributions is not known or covariance matrices are not equal. At present there is little information comparing these procedures, under violation of the usual multivariate assumptions, to more standard methods like the LDF and QDF. Some information is available on the performance of the Fisher LDF, the QDF and the A-B procedure in the case of unequal covariance matrices, but most of the published work has concentrated primarily on the performance of the Fisher LDF under conditions of non-normality. The studies that have been conducted on the robustness of the Fisher LDF and QDF are described in what follows.

Gilbert (1969) investigated the performance of Fisher's LDF under conditions of unequal covariance matrices by comparing it with the QDF when the parameters of the two populations are assumed to be known. She restricted attention to the case where one covariance matrix was a constant multiple of the other and all correlations were assumed to be equal. It was found that for large values of the Mahalanobis distance measure, $D^2$ (see Morrison, 1975, pp. 235-245), the quadratic and linear functions behave similarly. As mean differences or the number of variables increased, however, the quadratic function performed better than the linear.

Marks and Dunn (1974) assumed unknown parameters and compared the sample LDF, QDF and A-B procedure when unequal covariance matrices were present. They used a slightly wider variety of situations than Gilbert, but included some of the same parameter combinations. The sample results indicated that the Fisher LDF is quite satisfactory compared to the other procedures if the difference between covariance matrices is not large. The QDF did not perform well for small sample sizes, and as a result, they recommended the use of the QDF in situations where there are large differences in covariance matrices and the sample sizes are quite large.

The bulk of the research on the performance of the Fisher LDF and the QDF concerns distributions other than the multivariate normal, primarily because of the many ways non-normal data can arise in practical situations. Non-normality can arise when all the variables are continuous, but their joint distribution is not normal, or when all the variables are discrete and each can assume only a finite number of values. Situations also arise in which discriminant analysis may be applied to data that are a mixture of both discrete and continuous variables.

Lachenbruch, Sneeringer, and Revo (1973) considered the robustness of the Fisher LDF in the case of continuous non-normal data.
They considered three specific distributions and the case of uncorrelated variables. The distributions were generated from the normal distribution by using the non-linear transformations suggested by Johnson (1949). In this case optimal probabilities of misclassification were known since the inverse transformation could be applied to obtain normal variables. They also used Monte Carlo methods to investigate the robustness of the LDF when the parameters were estimated. Their results indicated that the Fisher LDF is greatly affected by non-normality in the populations. Errors in classification were generally larger in one population, whereas the reverse was true for the other. The LDF appeared to perform best when the variables could take on values in a finite range, but error increased when the range of values the variables could assume was infinite or semi-infinite. They concluded that the results of the Fisher LDF could be misleading and recommended that the data be transformed to normality or approximate normality prior to use of the LDF. In addition they suggested that the distortion in errors in classification could be used to gain information about the normality assumption. If errors in classification are greatly different when a zero cut-off point is used, the presence of non-normal data is indicated.

Zhezhel (1968) considered the LDF with arbitrary distributions with equal covariance matrices and calculated the maximum error in classification in each population.
He demonstrated that for given means and a common covariance matrix, the maximum error is greater than the corresponding error for a multivariate normal distribution with the same means and covariance matrix. Also the maximum error value is a decreasing function of the Mahalanobis $D^2$ between populations and tends to zero as $D^2 \to \infty$. For $D^2 > 4$ the probability of error in classification is always less than 0.5; thus the LDF will always be better than random classification into the two populations. For $0 < D^2 < 4$ the maximum error exceeds 0.5, suggesting that situations can exist in which the LDF can give poorer results than random classification.

Situations often arise in psychological research in which the data are not continuous, such as with true-false items, or in which the data are a mixture of continuous and discrete variables, such as test scores and measures representing signs or symptoms of various behaviors analyzed together. The performance of the Fisher LDF for binary data has been considered by several authors. Gilbert (1968) considered the performance of several classification rules on binary data, and the results suggested that the loss in using Fisher's LDF, as opposed to any of the other procedures, was too small to be of any practical importance, and that the procedure could be recommended for a large number of variables and for situations with both discrete and continuous data.

In another study, Moore (1973) conducted a similar investigation but used a more general model than Gilbert's (1968) to generate samples.
His results indicated that care was necessary in the selection of the discriminant procedure for binary variables, since the Fisher LDF can give significantly greater error in classification than other procedures in situations where the values (0,0) and (1,1) of any two binary variables occur more frequently in one population while the values (0,1) and (1,0) occur more frequently in the other.

Krzanowski (1977) considered the case of a mixture of binary and continuous variables. His results indicated that the Fisher LDF can give unsatisfactory results if there are high correlations between binary and continuous variables in one population but correspondingly low correlations between these variables in the other population. In this situation a discriminant procedure based on a location model is recommended (Krzanowski, 1975).

There have been recent attempts to develop robust linear and quadratic discriminant functions (Randles, Broffitt, Ramberg, & Hogg, 1978). One of these is a generalization of Fisher's LDF that places less weight on outliers, and another uses robust estimators of the means and covariance matrices. Some empirical results using these methods indicate that they perform better, under non-optimal conditions, than does the Fisher LDF or the QDF.
The purpose of the present study was to compare the performance of the four discriminant analysis procedures (the LDF, QDF, PDF, and A-B method) in terms of robustness in the face of violation of assumptions regarding distributional form and dispersion. The A-B procedure and the two non-linear methods are those which could be expected to outperform the LDF under non-optimal conditions of sample size, equality of covariance matrices, and distributional form. The QDF was included since it is widely used as an alternative to the LDF under heterogeneity of covariance matrices. The A-B method was included for comparison since there is little information about its performance relative to that of Fisher's LDF or the QDF in situations where data are not MVN. The other procedure examined was the polynomial discriminant function described by Tou and Gonzalez (1974), which has been used in engineering and computer applications. This procedure was included since it does not rely on any of the distributional assumptions of the other methods. Also there is very little information about PDF methods in relation to more commonly used procedures. In all conditions of the present study the above procedures were applied to two samples, and the proportion of correct classification from the discriminant functions defining the boundary between the two groups was tabulated.

METHOD

In each condition of the present study 1000 pairs of samples were generated by Monte Carlo methods.
For each pair of samples the four discriminant analysis procedures were applied and the proportion of correct classification for each sample calculated. In addition, for each pair of samples a corresponding cross-validation pair of samples of the same size was drawn from the same population, and the discriminant function coefficients obtained for the original samples were then applied to the cross-validation samples. The mean proportion of correct classification for the 1000 original and cross-validation samples was then recorded for each procedure. The four discriminant analysis procedures and the method of their computation are described in what follows.

The Fisher Linear Discriminant Function

The Fisher LDF is obtained by maximizing the F-ratio of the between-groups mean-square to the within-groups mean-square of a linear combination of the original response variables. The F-ratio can be expressed as

$$F = \frac{\mathbf{a}' B \mathbf{a}}{\mathbf{a}' W \mathbf{a}}$$

and for p variables the solution vector $\mathbf{a}' = [a_1, \ldots, a_p]$ which maximizes the F-ratio can be obtained from the eigenvectors corresponding to the eigenvalues of the characteristic equation $|B - \lambda W| = 0$. In the above expression, $W$, of order p x p, is the within-groups covariance matrix, and $B$, p x p, is the between-groups covariance matrix. These matrices are obtained from the expressions:

$$W = \frac{1}{N-k} \sum_{i=1}^{k} (n_i - 1) S_i, \qquad B = \frac{1}{k-1} \sum_{i=1}^{k} n_i (\bar{\mathbf{y}}_i - \bar{\mathbf{y}})(\bar{\mathbf{y}}_i - \bar{\mathbf{y}})'$$

where $S_i$ is the sample covariance matrix for group i, $\bar{\mathbf{y}}_i' = [\bar{y}_{i1}, \ldots, \bar{y}_{ip}]$ is the ith sample mean vector, $\bar{\mathbf{y}}' = [\bar{y}_1, \ldots, \bar{y}_p]$, 1 x p, is the overall mean vector, k is the number of groups, and $N = \sum_{i=1}^{k} n_i$, where $n_i$ is the number of observations in the ith sample.

For the present analysis the case of two groups was examined (i.e., k = 2), and the Fisher LDF was calculated for each pair of samples by first computing the vector of coefficients $\mathbf{a}$ by

$$\mathbf{a} = W^{-1} \mathbf{d}$$

where $\mathbf{d} = \bar{\mathbf{x}} - \bar{\mathbf{y}}$, with $\bar{\mathbf{x}}$ and $\bar{\mathbf{y}}$ here denoting the mean vectors of the first and second samples. The midpoint of the mean discriminant scores was then calculated from:

$$M = \tfrac{1}{2} \mathbf{a}' (\bar{\mathbf{x}} + \bar{\mathbf{y}})$$

Each observation vector was then classified as belonging to Population 1 or 2 according to whether $\mathbf{a}'\mathbf{x} - M$ was either > 0 or < 0 (see Kshirsagar, 1972, pp. 195-200). The vector of coefficients $\mathbf{a}$ was then applied to the cross-validation samples, and the proportion of correct classification was tabulated.

The Quadratic Discriminant Function

When an observation vector $\mathbf{x}$ is drawn from a MVN distribution with mean vector $\boldsymbol{\mu}_i$ and covariance matrix $\Sigma_i$, the MVN density function $f_i(\mathbf{x})$ can be expressed as:

$$f_i(\mathbf{x}) = (2\pi)^{-p/2} |\Sigma_i|^{-1/2} \exp\left[-\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)' \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i)\right]$$

In the case of two groups an individual is classified as belonging to Population 1 if $p_1 f_1(\mathbf{x}) / p_2 f_2(\mathbf{x}) > 1$, that is, if $f_1(\mathbf{x}) / f_2(\mathbf{x}) > p_2/p_1$. Alternatively, an individual is assigned to Population 2 if $p_1 f_1(\mathbf{x}) / p_2 f_2(\mathbf{x}) \le 1$, that is, if $f_1(\mathbf{x}) / f_2(\mathbf{x}) \le p_2/p_1$, where $p_1$ and $p_2$ are the proportions of individuals from the two groups in the population (Lachenbruch, 1975, p. 11). When the two groups have a common covariance matrix $\Sigma$, and mean vectors $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$, the above rule becomes

$$\frac{f_1(\mathbf{x})}{f_2(\mathbf{x})} = \exp\left[-\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_1)' \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}_1) + \tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_2)' \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}_2)\right] = \exp\left[\mathbf{x}' \Sigma^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) - \tfrac{1}{2} (\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2)' \Sigma^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)\right]$$

and taking logarithms, the rule is to assign an individual to Population 1 if

$$D_t(\mathbf{x}) = \ln\left[\frac{f_1(\mathbf{x})}{f_2(\mathbf{x})}\right] = \left[\mathbf{x} - \tfrac{1}{2} (\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2)\right]' \Sigma^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) > \ln(p_2/p_1)$$

and to group 2 otherwise. The sample analogue of the above equation is

$$D_s(\mathbf{x}) = \left[\mathbf{x} - \tfrac{1}{2} (\bar{\mathbf{x}}_1 + \bar{\mathbf{x}}_2)\right]' S^{-1} (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)$$

and the coefficients $S^{-1}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)$ are seen to be identical to Fisher's result for the LDF.

When covariance matrices are unequal and cannot be pooled, but the population distributions are multivariate normal, the classification rule has the form $Q_t(\mathbf{x}) = \ln[f_1(\mathbf{x}) / f_2(\mathbf{x})] > \ln(p_2/p_1)$, where

$$Q_t(\mathbf{x}) = \tfrac{1}{2} \ln\frac{|\Sigma_2|}{|\Sigma_1|} - \tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_1)' \Sigma_1^{-1} (\mathbf{x} - \boldsymbol{\mu}_1) + \tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_2)' \Sigma_2^{-1} (\mathbf{x} - \boldsymbol{\mu}_2)$$

$$= \tfrac{1}{2} \left[\ln\frac{|\Sigma_2|}{|\Sigma_1|} - \mathbf{x}' (\Sigma_1^{-1} - \Sigma_2^{-1}) \mathbf{x} + 2\mathbf{x}' (\Sigma_1^{-1} \boldsymbol{\mu}_1 - \Sigma_2^{-1} \boldsymbol{\mu}_2) - \boldsymbol{\mu}_1' \Sigma_1^{-1} \boldsymbol{\mu}_1 + \boldsymbol{\mu}_2' \Sigma_2^{-1} \boldsymbol{\mu}_2\right]$$

In this case, the discriminant function is quadratic, since the term $\Sigma_1^{-1} - \Sigma_2^{-1}$ is still present (Lachenbruch, 1975, p. 20). From the above, with $\boldsymbol{\mu}_1$, $\boldsymbol{\mu}_2$, $\Sigma_1$ and $\Sigma_2$ estimated by their respective sample mean vectors and covariance matrices $\bar{\mathbf{x}}_1$, $\bar{\mathbf{x}}_2$, $S_1$ and $S_2$, the sample analogue of $Q_t(\mathbf{x})$ is

$$Q_s(\mathbf{x}) = \ln\frac{|S_2|}{|S_1|} - (\mathbf{x} - \bar{\mathbf{x}}_1)' S_1^{-1} (\mathbf{x} - \bar{\mathbf{x}}_1) + (\mathbf{x} - \bar{\mathbf{x}}_2)' S_2^{-1} (\mathbf{x} - \bar{\mathbf{x}}_2) > 2 \ln(p_2/p_1)$$

In each of the conditions of the present study the proportions of each group in the population were assumed to be equal to each other and not proportional to sample size, since the true proportions are not usually known in most areas of psychological research. When population proportions are equal the quadratic decision rule is then to classify an individual into Population 1 if $Q_s(\mathbf{x}) > 0$ or into Population 2 if $Q_s(\mathbf{x}) \le 0$, since $\ln(p_2/p_1) = 0$.
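The quadratic rule for equal population proportions can be sketched in Python as follows. This is a minimal illustration, not the program used in the study: it assumes bivariate data, unbiased sample covariance matrices, and the score $\ln(|S_2|/|S_1|) - (\mathbf{x} - \bar{\mathbf{x}}_1)' S_1^{-1} (\mathbf{x} - \bar{\mathbf{x}}_1) + (\mathbf{x} - \bar{\mathbf{x}}_2)' S_2^{-1} (\mathbf{x} - \bar{\mathbf{x}}_2)$ compared with zero. All function names are hypothetical.

```python
import math


def mean_vector(rows):
    """Sample mean of a list of bivariate observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def covariance(rows):
    """Unbiased 2x2 sample covariance matrix of one group."""
    m = mean_vector(rows)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    n = len(rows)
    return [[s[a][b] / (n - 1) for b in range(2)] for a in range(2)]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def mahalanobis_sq(x, mean, cov):
    """(x - mean)' cov^{-1} (x - mean), using the closed-form 2x2 inverse."""
    d0, d1 = x[0] - mean[0], x[1] - mean[1]
    num = (cov[1][1] * d0 * d0
           - (cov[0][1] + cov[1][0]) * d0 * d1
           + cov[0][0] * d1 * d1)
    return num / det2(cov)


def qdf_score(x, g1, g2):
    """Sample quadratic score; positive values favour group 1."""
    m1, m2 = mean_vector(g1), mean_vector(g2)
    s1, s2 = covariance(g1), covariance(g2)
    return (math.log(det2(s2) / det2(s1))
            - mahalanobis_sq(x, m1, s1)
            + mahalanobis_sq(x, m2, s2))


def classify_qdf(x, g1, g2):
    """Equal proportions assumed, so the cutoff 2 ln(p2/p1) is zero."""
    return 1 if qdf_score(x, g1, g2) > 0 else 2


# Illustrative data only: the second group is far more dispersed than the first.
group1 = [(0, 0), (1, 0), (0, 1), (1, 1)]
group2 = [(4, 4), (6, 4), (4, 6), (6, 6)]
print(classify_qdf((0.5, 0.5), group1, group2), classify_qdf((5, 5), group1, group2))
```

Unequal proportions would be handled by comparing the score with $2\ln(p_2/p_1)$ instead of zero.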
The A-B Discriminant Function

Anderson and Bahadur (1962) proposed a linear discriminant function of the form $\mathbf{b}'\mathbf{x}$, with $\mathbf{b}' = [b_1, \ldots, b_p]$ chosen so that $\mathbf{x}$ is classified as from Population 1 if $\mathbf{b}'\mathbf{x} > c$ and from Population 2 if $\mathbf{b}'\mathbf{x} \le c$, where c is also suitably determined. With this procedure, the misclassification probabilities are:

$$P_1 = \text{Prob}(\mathbf{b}'\mathbf{x} \le c \mid \mathbf{x} \in \text{pop 1}) = 1 - \Phi(y_1), \qquad P_2 = \text{Prob}(\mathbf{b}'\mathbf{x} > c \mid \mathbf{x} \in \text{pop 2}) = 1 - \Phi(y_2)$$

where $\Phi$ is the cumulative distribution function of a standard normal variable. The $y_1$ and $y_2$ are determined by

$$y_1 = \frac{\mathbf{b}'\boldsymbol{\mu}_1 - c}{(\mathbf{b}'\Sigma_1\mathbf{b})^{1/2}}, \qquad y_2 = \frac{c - \mathbf{b}'\boldsymbol{\mu}_2}{(\mathbf{b}'\Sigma_2\mathbf{b})^{1/2}}$$

where $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$ are the means of Population 1 and Population 2. Now $y_1$ can be expressed as:

$$y_1 = \frac{\mathbf{b}'(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) - y_2 (\mathbf{b}'\Sigma_2\mathbf{b})^{1/2}}{(\mathbf{b}'\Sigma_1\mathbf{b})^{1/2}}$$

The $\mathbf{b}$ is then chosen which maximizes $y_1$ for a given $y_2$. By differentiating $y_1$ with respect to $\mathbf{b}$, it can be shown that the solution consists of solving the following equations in $\mathbf{b}$ and a scalar t:

$$[t\Sigma_1 + (1-t)\Sigma_2]\,\mathbf{b} = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) \tag{4}$$

and

$$y_2 = (1-t)(\mathbf{b}'\Sigma_2\mathbf{b})^{1/2}$$

The solution to these equations is obtained by a trial-and-error procedure, and c is then obtained by:

$$c = \mathbf{b}'\boldsymbol{\mu}_1 - t\,\mathbf{b}'\Sigma_1\mathbf{b} = \mathbf{b}'\boldsymbol{\mu}_2 + (1-t)\,\mathbf{b}'\Sigma_2\mathbf{b} \tag{5}$$

Now $y_1$ can be obtained from:

$$y_1 = t(\mathbf{b}'\Sigma_1\mathbf{b})^{1/2}$$

Anderson and Bahadur (1962) also considered an alternative method when the two misclassification probabilities are equal, i.e. $y_1 = y_2$. In this case, $\mathbf{b}$ and t are found from:

$$0 = y_1^2 - y_2^2 = t^2\,\mathbf{b}'\Sigma_1\mathbf{b} - (1-t)^2\,\mathbf{b}'\Sigma_2\mathbf{b} = \mathbf{b}'[t^2\Sigma_1 - (1-t)^2\Sigma_2]\,\mathbf{b}$$

The determination of the value of t was accomplished by using the result due to Banerjee and Marcus (1965), in which $\Sigma_1$ and $\Sigma_2$ were expressed as:

$$\Sigma_1 = N'\Lambda N, \qquad \Sigma_2 = N'N$$

where $\Lambda = \text{diag}[\lambda_1, \lambda_2, \ldots, \lambda_p]$, and $\lambda_1, \lambda_2, \ldots, \lambda_p$ are the roots of the determinantal equation $|\Sigma_1 - \lambda\Sigma_2| = 0$. If $v = (1-t)^2/t^2$, then v must lie between the minimum and maximum roots of the above characteristic equation. In the present study the optimal value of t was approximated by evaluating t for v equal to the minimum and maximum characteristic roots and computing the vector $\mathbf{b}$ in each case from Equation (4). The value of c was then calculated from Equation (5), and the observations were classified into Population 1 if $\mathbf{b}'\mathbf{x} > c$ or Population 2 if $\mathbf{b}'\mathbf{x} \le c$. In this manner, the interval $[\min \lambda_i, \max \lambda_i]$ was successively bisected, and for each value of t, the proportion of correct classifications calculated. The interval was bisected a maximum of five times or until classification did not improve. The resultant discriminant function was then applied to the cross-validation sample, and the proportion of correct classifications was calculated.

The Polynomial Discriminant Function

In this case, the discriminant function was constructed by estimating the probability density function for each sample directly from the observed data, as described in Tou and Gonzalez (1974, pp. 145-151). This was accomplished by expanding the estimate $\hat{p}(\mathbf{x})$ of $p(\mathbf{x})$ in a series, where $p(\mathbf{x})$ represents the probability density function of the ith population, $p(\mathbf{x} \mid \text{Pop}_i)$. Tou and Gonzalez show that if it is required that the estimate of the probability density function minimize a mean-square error function defined as

$$R = \int_{\mathbf{x}} w(\mathbf{x}) [p(\mathbf{x}) - \hat{p}(\mathbf{x})]^2 \, d\mathbf{x}$$

where $w(\mathbf{x})$ is a weighting function, then $\hat{p}(\mathbf{x})$ may be expanded in the series

$$\hat{p}(\mathbf{x}) = \sum_{j=1}^{m} c_j \phi_j(\mathbf{x})$$

where the $c_j$ are coefficients to be determined, and the $\{\phi_j(\mathbf{x})\}$ are a set of specified basis functions.
A set of univariate basis functions associated with the normal distribution, from which multivariate basis functions can be obtained, are the Hermite polynomials $H_k(x)$, generated by the recursive relation

$$H_{k+1}(x) - 2xH_k(x) + 2kH_{k-1}(x) = 0, \qquad k \ge 1,$$

where $H_0(x) = 1$ and $H_1(x) = 2x$. The first few Hermite polynomials are: $H_0(x) = 1$; $H_1(x) = 2x$; $H_2(x) = 4x^2 - 2$; $H_3(x) = 8x^3 - 12x$; $H_4(x) = 16x^4 - 48x^2 + 12$.

Substituting the expansion of $\hat{p}(\mathbf{x})$ into the mean-square error function yields

$$R = \int_{\mathbf{x}} w(\mathbf{x})\Big[p(\mathbf{x}) - \sum_{j=1}^{m} c_j\phi_j(\mathbf{x})\Big]^2 d\mathbf{x},$$

and minimizing $R$ with respect to the coefficients, $\partial R/\partial c_k = 0$, $k = 1, \ldots, m$, yields

$$\sum_{j=1}^{m} c_j \int_{\mathbf{x}} w(\mathbf{x})\phi_j(\mathbf{x})\phi_k(\mathbf{x})\, d\mathbf{x} = \int_{\mathbf{x}} w(\mathbf{x})\phi_k(\mathbf{x})p(\mathbf{x})\, d\mathbf{x}.$$

The right side of this equation is the definition of the expected value of the function $w(\mathbf{x})\phi_k(\mathbf{x})$ and may be approximated from the sample average:

$$\sum_{j=1}^{m} c_j \int_{\mathbf{x}} w(\mathbf{x})\phi_j(\mathbf{x})\phi_k(\mathbf{x})\, d\mathbf{x} = \frac{1}{N}\sum_{i=1}^{N} w(\mathbf{x}_i)\phi_k(\mathbf{x}_i).$$

Since the basis functions $\{\phi_j(\mathbf{x})\}$ are orthonormal and are chosen orthogonal with respect to the weighting function $w(\mathbf{x})$, the coefficients may be determined from

$$c_k = \frac{1}{N}\sum_{i=1}^{N} \phi_k(\mathbf{x}_i), \qquad k = 1, 2, \ldots, m, \tag{6}$$

and the resultant density may be obtained from

$$\hat{p}(\mathbf{x}) = \sum_{j=1}^{m} c_j\phi_j(\mathbf{x}). \tag{7}$$

By using Bayes' formula

$$P(\text{Pop}_i \mid \mathbf{x}) = \frac{P(\text{Pop}_i)\,P(\mathbf{x} \mid \text{Pop}_i)}{P(\mathbf{x})},$$

where $P(\text{Pop}_i)$ is the probability of the $i$th population, the discriminant functions for this problem are then given by:

$$d_1(\mathbf{x}) = \hat{p}(\mathbf{x} \mid \text{Pop}_1)\,P(\text{Pop}_1), \quad \text{and} \quad d_2(\mathbf{x}) = \hat{p}(\mathbf{x} \mid \text{Pop}_2)\,P(\text{Pop}_2),$$

and if $P(\text{Pop}_1) = P(\text{Pop}_2)$, the decision boundary is given by $d_1(\mathbf{x}) - d_2(\mathbf{x}) = 0$.

In the present study a two-dimensional set of orthogonal functions was obtained by forming pairwise combinations of the one-dimensional functions.
Six terms were used to approximate the density function and were constructed as follows:

$\phi_1(\mathbf{x}) = \phi_1(x_1, x_2) = H_0(x_1)H_0(x_2) = 1$;
$\phi_2(\mathbf{x}) = \phi_2(x_1, x_2) = H_1(x_1)H_0(x_2) = 2x_1$;
$\phi_3(\mathbf{x}) = \phi_3(x_1, x_2) = H_0(x_1)H_1(x_2) = 2x_2$;
$\phi_4(\mathbf{x}) = \phi_4(x_1, x_2) = H_1(x_1)H_1(x_2) = 4x_1x_2$;
$\phi_5(\mathbf{x}) = \phi_5(x_1, x_2) = H_2(x_1)H_0(x_2) = 4x_1^2 - 2$; and
$\phi_6(\mathbf{x}) = \phi_6(x_1, x_2) = H_0(x_1)H_2(x_2) = 4x_2^2 - 2$.

The set of orthogonal functions for the six-variable case was constructed in the same manner as for the bivariate case by forming the product of one-dimensional Hermite polynomials. In order for the estimates of the density functions to be polynomials of degree two for all the variables, 28 terms were constructed as follows:

$\phi_1(x_1, \ldots, x_6) = H_0(x_1)H_0(x_2) \cdots H_0(x_6) = 1$;
$\phi_2(x_1, \ldots, x_6) = H_1(x_1)H_0(x_2) \cdots H_0(x_6) = 2x_1$;
$\vdots$
$\phi_7(x_1, \ldots, x_6) = H_0(x_1)H_0(x_2) \cdots H_1(x_6) = 2x_6$;
$\phi_8(x_1, \ldots, x_6) = H_1(x_1)H_1(x_2)H_0(x_3) \cdots H_0(x_6) = 4x_1x_2$;
$\vdots$
$\phi_{22}(x_1, \ldots, x_6) = H_0(x_1) \cdots H_0(x_4)H_1(x_5)H_1(x_6) = 4x_5x_6$;
$\phi_{23}(x_1, \ldots, x_6) = H_2(x_1)H_0(x_2) \cdots H_0(x_6) = 4x_1^2 - 2$;
$\vdots$
$\phi_{28}(x_1, \ldots, x_6) = H_0(x_1)H_0(x_2) \cdots H_0(x_5)H_2(x_6) = 4x_6^2 - 2$.

The vector of coefficients, $\mathbf{c}$, was then computed for each sample from Equation (6), and the polynomial estimates of the density functions were constructed as in Equation (7). The two estimates of the density functions were then subtracted to form the polynomial discriminant function, which was then applied to the observations in each of the original and cross-validation samples. Finally, the proportion of correct classification was calculated.
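The construction of the product basis and the coefficient estimates of Equations (6) and (7) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the two small samples are made-up values rather than data from the study, and equal prior probabilities are assumed so that the rule reduces to the sign of the difference between the two density estimates.

```python
import math


def hermite(k, x):
    """H_k(x) from the recursion H_{k+1} = 2x H_k - 2k H_{k-1}, H_0 = 1, H_1 = 2x."""
    if k == 0:
        return 1.0
    h_prev, h = 1.0, 2.0 * x
    for n in range(1, k):
        h_prev, h = h, 2.0 * x * h - 2.0 * n * h_prev
    return h


def degree_two_terms(p):
    """Orders (k_1, ..., k_p) of the terms: constant, linear, cross-product, squared."""
    def orders(*pairs):
        k = [0] * p
        for index, order in pairs:
            k[index] = order
        return tuple(k)

    terms = [orders()]
    terms += [orders((i, 1)) for i in range(p)]
    terms += [orders((i, 1), (j, 1)) for i in range(p) for j in range(i + 1, p)]
    terms += [orders((i, 2)) for i in range(p)]
    return terms


def basis(term, x):
    """Product of one-dimensional Hermite polynomials for one term."""
    return math.prod(hermite(k, xi) for k, xi in zip(term, x))


def fit_coefficients(sample, terms):
    """Equation (6): each coefficient is the sample mean of its basis function."""
    return [sum(basis(t, x) for x in sample) / len(sample) for t in terms]


def density_estimate(coefs, terms, x):
    """Equation (7): weighted sum of the basis functions evaluated at x."""
    return sum(c * basis(t, x) for c, t in zip(coefs, terms))


def classify(x, coefs1, coefs2, terms):
    """Assign x to Population 1 when the difference of the estimates is positive."""
    d = density_estimate(coefs1, terms, x) - density_estimate(coefs2, terms, x)
    return 1 if d > 0 else 2


# Made-up bivariate samples (assumed values), mirror images about the origin.
sample1 = [(0.2, 0.4), (0.6, 0.5), (0.7, 0.3)]
sample2 = [(-a, -b) for a, b in sample1]

terms = degree_two_terms(2)
c1 = fit_coefficients(sample1, terms)
c2 = fit_coefficients(sample2, terms)
print(len(terms), len(degree_two_terms(6)))
print(classify((0.5, 0.5), c1, c2, terms), classify((-0.5, -0.5), c1, c2, terms))
```

The term counts produced by `degree_two_terms` for two and six variables match the 6 and 28 terms listed in the text; the ordering (constant, linear, cross-product, squared) follows the listing as well.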
Independent Variables

The following independent variables were systematically examined in the present study.

Sample size. Two average sample sizes (to handle cases in which the n's were unequal) were used: average n's of (1) 100 and (2) 40 observations.

Inequality of sample sizes. The number of observations in the two samples was varied in the ratios of 1:1 and 3:1. Thus, from this fact and that above, it can be seen that four combinations of average sample size and inequality were used: 100:100, 150:50, 40:40, and 60:20.

Number of variables, p. Two levels of this factor were examined: 2 and 6.

Heterogeneity of covariance matrices. This was the central factor in the present study. Heterogeneity of covariance matrices was accomplished by introducing scale factors into the raw data, resulting in three conditions:

1. The variables in Population 2 were rescaled by $\sqrt{2}$; i.e., the variables in Population 1 were all set with standard deviations of 1, whereas those in Population 2 had standard deviations of $\sqrt{2}$ and thus variances of 2. This can be considered as a mild departure from homogeneity and is referred to, in what follows, as "Heterogeneity Condition 1".
2. The variables in Population 2 were rescaled by 2. This was used to represent a stronger departure from homogeneity. The variances and covariances for Population 2 variables were, thus, 4 times those for Population 1. This is subsequently referred to as "Heterogeneity Condition 2".
3. The variables in Population 2 were of the same scale as those of Population 1; i.e., no scaling took place.
This is subsequently referred to as the "Homogeneity Condition".

Population 1 covariance matrices, $\boldsymbol{\Sigma}$, were thus constructed for two and six variables. The variances were all arbitrarily set to unity, and the off-diagonal (covariance) elements ranged from zero to .50. The matrices appear in Appendix I. The Population 2 covariance matrices for both two and six variables were then obtained by:

$$\boldsymbol{\Sigma}_2 = \mathbf{D}_i\boldsymbol{\Sigma}\mathbf{D}_i, \qquad i = 1, \ldots, 3,$$

where the diagonal $\mathbf{D}_i$ matrices contained the scale factors referred to above. Thus the three conditions were:

1. Heterogeneity Condition 1: $\mathbf{D}_1$ contained $\sqrt{2}$ in each diagonal position;
2. Heterogeneity Condition 2: $\mathbf{D}_2$ contained 2.0 in each diagonal position;
3. Homogeneity Condition: $\mathbf{D}_3$ was simply $\mathbf{I}_p$.

Multivariate nonnormality. In order to obtain samples from populations with observation vectors that did not have a multivariate normal distribution, samples were first generated with a multivariate normal distribution, and then a transformation was applied to obtain the desired distribution. The two transformations applied were chosen from those proposed by Johnson (1949). These transformations provide a method to convert data to approximate normality, and since the inverse transformation is known, they also provide a method for generating variables with the required distribution. The two transformations applied in the present study were:

1. $y = \ln(x)$, $(0 < x < \infty)$, with inverse $x = e^{y}$, and
2. $y = \ln(x/(1-x))$, $(0 < x < 1)$, with inverse $x = e^{y}/(1 + e^{y})$.

In each case, the variable $y$ is assumed to be normally distributed.
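The scaling of the Population 2 covariance matrix and the two inverse transformations can be illustrated with a short sketch. The function names and the 2 x 2 matrix are illustrative assumptions (the matrices actually used appear in Appendix I); only the scale factors and the two inverse formulas are taken from the text.

```python
import math


def rescale_covariance(sigma, scale):
    """Return D sigma D, where D is diagonal with the entries of `scale`."""
    p = len(sigma)
    return [[scale[i] * sigma[i][j] * scale[j] for j in range(p)] for i in range(p)]


def inverse_log(y):
    """Inverse of transformation 1, y = ln(x): x = e^y, so x > 0."""
    return math.exp(y)


def inverse_logit(y):
    """Inverse of transformation 2, y = ln(x / (1 - x)): x = e^y / (1 + e^y)."""
    return math.exp(y) / (1.0 + math.exp(y))


# Assumed Population 1 matrix: unit variances, covariance .50.
sigma = [[1.0, 0.5], [0.5, 1.0]]
p = len(sigma)
conditions = {
    "Heterogeneity Condition 1": [math.sqrt(2.0)] * p,
    "Heterogeneity Condition 2": [2.0] * p,
    "Homogeneity Condition": [1.0] * p,
}
for name, scale in conditions.items():
    print(name, rescale_covariance(sigma, scale))

# A normal score y maps to a positive x (transformation 1)
# or to an x strictly between 0 and 1 (transformation 2).
for y in (-1.0, 0.0, 1.0):
    print(y, inverse_log(y), inverse_logit(y))
```

Because each D is a multiple of the identity here, every variance and covariance of Population 2 is multiplied by the same constant (2, 4, or 1).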
The effect of the inverse transformation in (1) above on random variables with a bivariate normal distribution is examined in Appendix II.

In the actual computation of $x$ from the above inverse transformations, each variable was first rescaled by dividing by 100 in order to reduce the magnitude of the transformed scores and consequently prevent overflow errors in the computations.

Mean vectors. The mean vectors were of the same magnitude for both the two and six variable cases but differed under each heterogeneity condition. The Population 2 mean vectors were equal but opposite in sign to those of Population 1. The mean vectors for Population 1 in the homogeneous condition and heterogeneity conditions 1 and 2 were, respectively: $\boldsymbol{\mu}_1 = (.5, \ldots, .5)$, $\boldsymbol{\mu}_1 = (.75, \ldots, .75)$, and $\boldsymbol{\mu}_1 = (.3, \ldots, .3)$. The corresponding distances between populations measured by the Mahalanobis $D^2$ were: 2.54, 2.86, and 0.23 (Homogeneous, Heterogeneous 1, and Heterogeneous 2) for the six variable case, and 1.33, 1.50, and 0.12 for the bivariate case.

Summary of the independent variables studied. The following summary may assist in understanding the overall experimental design:

1. Two levels of average sample size, 100 and 40, with two levels of $n_1:n_2$ ratio (1:1, 3:1), resulting in four sample size conditions;
2. Two levels of p: 2 and 6;
3. Three levels of covariance matrix inequality: Heterogeneity Conditions 1 and 2 and the Homogeneous Condition;
4. Four levels of nonnormality:
(a) Both samples were drawn from MVN distributions; i.e., no transformation was applied;
(b) Both samples were drawn from MVN distributions and converted to nonnormality by transformation 1;
(c) Both samples were drawn from MVN distributions and converted to nonnormality by transformation 2;
(d) Both samples were drawn from MVN distributions but only sample 2 was converted to nonnormality by transformation 1.

From the preceding it can be seen that, in all, 96 conditions were examined.

Data Generation

The generation of random samples was accomplished by a random number generator, RANDN, on the University of British Columbia's Amdahl 470 V/6-II computer. This generator produces independent uniformly distributed random numbers on the interval (0,1) and then transforms them to normally distributed random numbers with mean 0 and variance 1, using Marsaglia's Rectangular-Wedge-Tail Method (Knuth, 1968). The results of extensive testing, including spectral tests (Coveyou & Macpherson, 1967), performed when the generator was implemented, have shown that it is free from serial correlation.

Using this generator, strings of length $Np$ (where $N = n_1 + n_2$) of independent random normally distributed data points were generated, and then these strings were partitioned into two data sets, $\mathbf{X}$, one $n_1$ observations by $p$ variates, the other $n_2 \times p$. With the strings partitioned in this way, each $n_j \times 1$ $(j = 1, 2)$ variate vector is normally distributed, with mean 0 and variance 1, and independent of every other vector. It can be easily demonstrated that the joint distribution thus arising is MVN$(\mathbf{0}, \mathbf{I})$ (see, for example, Anderson, 1958, pp. 19-27).
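The crossed design and the partitioning step can be sketched as follows. This is a hedged illustration, not the RANDN-based procedure itself: Python's `random.gauss` stands in for RANDN, the function and variable names are hypothetical, the covariance matrix is an assumed example, and a Cholesky factor is swapped in (in place of the eigenvalue decomposition used in the study) as one convenient matrix factor whose product reproduces the target covariance matrix.

```python
import itertools
import math
import random

# The four crossed design factors.
sample_sizes = [(100, 100), (150, 50), (40, 40), (60, 20)]
numbers_of_variables = [2, 6]
covariance_conditions = ["homogeneous", "heterogeneity 1", "heterogeneity 2"]
nonnormality_conditions = ["none", "transformation 1 (both)",
                           "transformation 2 (both)", "transformation 1 (sample 2)"]
design = list(itertools.product(sample_sizes, numbers_of_variables,
                                covariance_conditions, nonnormality_conditions))


def standard_normal_samples(n1, n2, p, rng):
    """Draw a string of (n1 + n2) * p N(0, 1) values and split it into two matrices."""
    string = [rng.gauss(0.0, 1.0) for _ in range((n1 + n2) * p)]
    rows = [string[i * p:(i + 1) * p] for i in range(n1 + n2)]
    return rows[:n1], rows[n1:]


def cholesky(sigma):
    """Lower-triangular L with L L' = sigma (sigma must be positive definite)."""
    p = len(sigma)
    low = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                low[i][j] = (sigma[i][j] - s) / low[j][j]
    return low


def colour(rows, low, mean):
    """Map each standard normal row x to y = L x + mean, so that Cov(y) = L L'."""
    p = len(mean)
    return [[mean[j] + sum(low[j][k] * x[k] for k in range(j + 1)) for j in range(p)]
            for x in rows]


rng = random.Random(1979)  # arbitrary seed, for repeatability only
x1, x2 = standard_normal_samples(60, 20, 2, rng)
low = cholesky([[4.0, 2.0], [2.0, 4.0]])  # assumed Population 2 matrix
y2 = colour(x2, low, [-0.3, -0.3])
print(len(design), len(x1), len(x2), len(y2[0]))
```

The length of `design` reproduces the count of conditions stated in the summary; the seed and the example matrix carry no significance beyond making the sketch repeatable.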
In order that each data matrix $\mathbf{Y}$ represents a sample from a population with a known covariance matrix $\boldsymbol{\Sigma}$ and mean vector $\boldsymbol{\mu}$, the random variable $\mathbf{y} = \mathbf{A}'\mathbf{x} + \mathbf{c}$ was generated, where $\mathbf{x}$ was the vector of standard normal variates described above. Hence the random variable $\mathbf{y}$ had a $p$-variate normal distribution with $\boldsymbol{\mu} = \mathbf{c}$ and $\boldsymbol{\Sigma} = \mathbf{A}'\mathbf{A}$ (Chambers, 1977, p. 185). The $p \times p$ matrix $\mathbf{A}$ was obtained by performing an eigenvalue decomposition of the desired population covariance matrix, obtaining:

$$\boldsymbol{\Sigma} = \mathbf{V}\boldsymbol{\Lambda}^2\mathbf{V}',$$

and $\mathbf{A}'$ was then obtained by $\mathbf{A}' = \mathbf{V}\boldsymbol{\Lambda}$, with $\mathbf{V}$ orthogonal and $\boldsymbol{\Lambda}$ diagonal. Next, the $n_j \times p$ MVN$(\mathbf{0}, \mathbf{I})$ data matrices, $\mathbf{X}$, described above, were postmultiplied by $\mathbf{A}$, yielding:

$$\mathbf{Y} = \mathbf{X}\mathbf{A}.$$

This process can be seen as producing a sample data matrix that could have arisen from a population having a covariance matrix $\boldsymbol{\Sigma}$, since $E(\mathbf{X}'\mathbf{X}) = n_j\mathbf{I}$ and therefore

$$\frac{1}{n_j}E(\mathbf{Y}'\mathbf{Y}) = \frac{1}{n_j}E[(\mathbf{X}\mathbf{A})'(\mathbf{X}\mathbf{A})] = \frac{1}{n_j}E(\mathbf{A}'\mathbf{X}'\mathbf{X}\mathbf{A}) = \frac{1}{n_j}\mathbf{A}'E(\mathbf{X}'\mathbf{X})\mathbf{A} = \mathbf{A}'\mathbf{A} = \mathbf{V}\boldsymbol{\Lambda}^2\mathbf{V}' = \boldsymbol{\Sigma}.$$

From the preceding, it should be clear that for each value of $p$ the sample data matrices were drawn from MVN$(\mathbf{0}, \boldsymbol{\Sigma}_i)$, $i = 1, \ldots, 3$, populations.

In this way 1000 pairs of samples and cross-validation samples were generated for each of the 96 conditions studied, and the average proportion of correct classification for each discriminant analysis procedure was calculated.

RESULTS AND DISCUSSION

The results of the Monte Carlo analyses described above appear in Tables 1 through 8. Tables 1 through 4 contain the results for the bivariate data sets, and Tables 5 through 8 contain the results for the six-variable data sets.
The first of the four tables in each set contains the results for normally distributed variables in both samples. The next two tables contain the results for transformed variables in both samples, and the fourth table contains the results with a transformation applied to the Population 2 data only. The entries in each table correspond to the proportion of correct classifications for the original and corresponding cross-validation samples, based on 1000 replications for each discriminant analysis procedure.

Table 1
Mean Proportion of Correct Classification for 1000 Samples
Two Dependent Variables - Both Samples Drawn from MVN Populations

n1:n2    Method  Sample  Homo. Cond.     Hetero. Cond. 1 Hetero. Cond. 2 Mean
                         Orig^a  c-v^b   Orig^a  c-v^b   Orig^a  c-v^b   Orig^a  c-v^b
100:100  Lin.    1       .72     .72     .81     .80     .64     .64     .70     .70
                 2       .72     .72     .73     .73     .57     .56
         Quad.   1       .73     .72     .84     .83     .86     .84     .75     .75
                 2       .72     .71     .72     .72     .65     .65
         Poly.   1       .66     .63     .38     .38     .40     .40     .63     .62
                 2       .66     .64     .79     .79     .87     .87
         A-B     1       .73     .72     .77     .76     .60     .59     .70     .69
                 2       .72     .71     .76     .76     .60     .58
40:40    Lin.    1       .72     .71     .81     .80     .65     .63     .70     .69
                 2       .72     .71     .74     .72     .58     .54
         Quad.   1       .73     .71     .85     .82     .86     .83     .79     .74
                 2       .73     .70     .73     .72     .65     .65
         Poly.   1       .63     .59     .43     .42     .41     .41     .63     .61
                 2       .64     .60     .78     .77     .87     .86
         A-B     1       .73     .71     .78     .77     .61     .59     .70     .68
                 2       .72     .71     .77     .75     .60     .57
150:50   Lin.    1       .72     .72     .81     .81     .65     .64     .70     .69
                 2       .72     .71     .74     .72     .58     .55
         Quad.   1       .72     .72     .84     .84     .86     .84     .76     .74
                 2       .73     .71     .73     .71     .65     .63
         Poly.   1       .64     .63     .40     .40     .41     .40     .63     .62
                 2       .66     .62     .79     .78     .86     .86
         A-B     1       .73     .72     .77     .77     .61     .60     .70     .69
                 2       .73     .71     .77     .75     .60     .57
60:20    Lin.    1       .73     .72     .81     .80     .67     .65     .71     .68
                 2       .73     .70     .74     .71     .59     .52
         Quad.   1       .73     .72     .85     .83     .86     .83     .76     .73
                 2       .74     .68     .74     .71     .66     .63
         Poly.   1       .62     .60     .45     .44     .41     .41     .63     .61
                 2       .66     .58     .78     .75     .86     .86
         A-B     1       .74     .73     .78     .77     .63     .61     .72     .68
                 2       .74     .69     .78     .74     .62     .55

^a Refers to original samples
^b Refers to cross-validation samples

Table 2
Mean Proportion of Correct Classification for 1000 Samples
Two Dependent Variables - Transformation 1 Applied to Both Samples

n1:n2    Method  Sample  Homo. Cond.     Hetero. Cond. 1 Hetero. Cond. 2 Mean
                         Orig^a  c-v^b   Orig^a  c-v^b   Orig^a  c-v^b   Orig^a  c-v^b
100:100  Lin.    1       .76     .75     .81     .81     .72     .71     .58     .57
                 2       .43     .43     .42     .41     .34     .32
         Quad.   1       .81     .80     .82     .80     .81     .80     .61     .60
                 2       .39     .38     .41     .41     .39     .39
         Poly.   1       .72     .71     .30     .30     .20     .20     .56     .56
                 2       .42     .40     .79     .79     .93     .93
         A-B     1       .75     .74     .78     .78     .71     .70     .58     .57
                 2       .44     .43     .43     .42     .35     .33
40:40    Lin.    1       .77     .76     .82     .81     .72     .70     .58     .57
                 2       .42     .40     .41     .39     .36     .33
         Quad.   1       .81     .77     .83     .80     .82     .80     .61     .59
                 2       .39     .37     .40     .40     .39     .38
         Poly.   1       .68     .65     .39     .37     .20     .20     .57     .55
                 2       .46     .42     .73     .71     .93     .93
         A-B     1       .76     .79     .79     .78     .70     .68     .58     .57
                 2       .42     .41     .42     .41     .38     .34
150:50   Lin.    1       .75     .75     .81     .80     .71     .71     .58     .57
                 2       .42     .42     .41     .40     .35     .32
         Quad.   1       .81     .80     .82     .81     .81     .80     .60     .59
                 2       .38     .37     .41     .40     .38     .38
         Poly.   1       .72     .71     .37     .37     .20     .20     .56     .55
                 2       .41     .38     .74     .73     .93     .93
         A-B     1       .74     .74     .78     .78     .70     .69     .57     .56
                 2       .43     .42     .42     .41     .36     .33
60:20    Lin.    1       .76     .75     .81     .80     .71     .70     .58     .56
                 2       .40     .39     .39     .37     .38     .32
         Quad.   1       .77     .75     .83     .81     .82     .81     .60     .58
                 2       .42     .37     .38     .36     .37     .35
         Poly.   1       .64     .62     .44     .43     .22     .22     .57     .54
                 2       .49     .43     .68     .64     .92     .91
         A-B     1       .75     .74     .79     .78     .68     .67     .58     .55
                 2       .41     .39     .41     .38     .41     .34

^a Refers to original samples
^b Refers to cross-validation samples

Table 3
Mean Proportion of Correct Classification for 1000 Samples
Two Dependent Variables - Transformation 2 Applied to Both Samples

n1:n2    Method  Sample  Homo. Cond.     Hetero. Cond. 1 Hetero. Cond. 2 Mean
                         Orig^a  c-v^b   Orig^a  c-v^b   Orig^a  c-v^b   Orig^a  c-v^b
100:100  Lin.    1       .81     .81     .86     .86     .77     .76     .59     .59
                 2       .40     .39     .39     .39     .32     .30
         Quad.   1       .83     .81     .84     .82     .81     .80     .61     .60
                 2       .38     .37     .40     .40     .40     .40
         Poly.   1       .82     .81     .72     .71     .71     .71     .56     .56
                 2       .32     .30     .38     .38     .42     .42
         A-B     1       .81     .80     .85     .84     .75     .75     .59     .58
                 2       .41     .40     .40     .40     .32     .31
40:40    Lin.    1       .82     .81     .87     .86     .76     .75     .59     .58
                 2       .39     .37     .38     .37     .34     .30
         Quad.   1       .80     .76     .85     .81     .82     .79     .61     .58
                 2       .41     .38     .39     .38     .39     .38
         Poly.   1       .80     .77     .75     .73     .73     .72     .57     .55
                 2       .34     .30     .36     .35     .41     .41
         A-B     1       .81     .80     .83     .82     .73     .71     .59     .58
                 2       .40     .38     .42     .41     .37     .33
150:50   Lin.    1       .81     .81     .86     .86     .77     .76     .59     .58
                 2       .39     .38     .39     .38     .32     .29
         Quad.   1       .82     .81     .84     .83     .81     .80     .61     .60
                 2       .38     .36     .39     .38     .39     .39
         Poly.   1       .81     .80     .73     .72     .71     .71     .59     .59
                 2       .32     .29     .37     .36     .41     .41
         A-B     1       .80     .80     .84     .83     .75     .74     .59     .58
                 2       .40     .39     .41     .39     .33     .30
60:20    Lin.    1       .81     .80     .86     .85     .76     .75     .59     .57
                 2       .38     .36     .37     .35     .34     .28
         Quad.   1       .72     .69     .85     .82     .82     .80     .60     .57
                 2       .48     .42     .38     .35     .37     .35
         Poly.   1       .74     .72     .77     .75     .73     .72     .59     .53
                 2       .39     .33     .34     .30     .38     .38
         A-B     1       .80     .79     .80     .79     .74     .72     .59     .56
                 2       .39     .37     .44     .41     .37     .30

^a Refers to original samples
^b Refers to cross-validation samples

Table 4
Mean Proportion of Correct Classification for 1000 Samples
Two Dependent Variables - Transformation 1 Applied to Sample 2 Only

n1:n2    Method  Sample  Homo. Cond.     Hetero. Cond. 1 Hetero. Cond. 2 Mean
                         Orig^a  c-v^b   Orig^a  c-v^b   Orig^a  c-v^b   Orig^a  c-v^b
100:100  Lin.    1       .56     .54     .64     .63     .54     .52     .52     .51
                 2       .40     .40     .41     .40     .59     .57
         Quad.   1       .52     .50     .78     .75     .78     .76     .59     .58
                 2       .76     .75     .37     .36     .34     .34
         Poly.   1       .54     .52     .51     .49     .40     .40     .59     .58
                 2       .50     .49     .66     .65     .93     .93
         A-B     1       .56     .53     .62     .60     .54     .51     .52     .50
                 2       .40     .40     .42     .41     .60     .57
40:40    Lin.    1       .57     .51     .64     .61     .57     .53     .54     .50
                 2       .44     .42     .40     .39     .59     .55
         Quad.   1       .55     .50     .76     .70     .79     .73     .60     .56
                 2       .74     .71     .38     .36     .36     .34
         Poly.   1       .56     .51     .54     .50     .42     .41     .59     .59
                 2       .50     .48     .61     .58     .93     .93
         A-B     1       .58     .52     .62     .59     .57     .52     .54     .50
                 2       .43     .41     .41     .39     .61     .56
150:50   Lin.    1       .56     .54     .64     .63     .56     .54     .53     .51
                 2       .41     .39     .41     .39     .59     .55
         Quad.   1       .53     .51     .77     .75     .78     .76     .59     .57
                 2       .76     .73     .37     .34     .34     .32
         Poly.   1       .54     .52     .52     .51     .41     .40     .59     .57
                 2       .50     .47     .63     .61     .93     .93
         A-B     1       .56     .54     .61     .61     .55     .53     .52     .50
                 2       .40     .39     .41     .40     .59     .55
60:20    Lin.    1       .56     .53     .63     .61     .59     .56     .53     .50
                 2       .44     .40     .38     .37     .60     .53
         Quad.   1       .56     .53     .73     .69     .77     .73     .61     .56
                 2       .77     .71     .44     .38     .40     .34
         Poly.   1       .54     .51     .54     .52     .42     .42     .59     .57
                 2       .54     .49     .58     .54     .93     .92
         A-B     1       .57     .53     .61     .58     .58     .56     .54     .50
                 2       .44     .39     .39     .37     .62     .55

^a Refers to original samples
^b Refers to cross-validation samples

Table 5
Mean Proportion of Correct Classification for 1000 Samples
Six Dependent Variables - Both Samples Drawn from MVN Populations

n1:n2    Method  Sample  Homo. Cond.     Hetero. Cond. 1 Hetero. Cond. 2 Mean
                         Orig^a  c-v^b   Orig^a  c-v^b   Orig^a  c-v^b   Orig^a  c-v^b
100:100  Lin.    1       .80     .78     .89     .88     .70     .68     .77     .75
                 2       .80     .78     .81     .79     .61     .57
         Quad.   1       .81     .77     .92     .87     .94     .89     .87     .83
                 2       .81     .77     .86     .84     .86     .85
         Poly.   1       .69     .63     .21     .20     .21     .21     .62     .60
                 2       .69     .63     .95     .94     .99     .99
         A-B     1       .80     .78     .86     .85     .66     .64     .76     .74
                 2       .78     .78     .84     .82     .64     .59
40:40    Lin.    1       .81     .77     .89     .87     .73     .69     .78     .74
                 2       .81     .77     .82     .78     .63     .54
         Quad.   1       .84     .73     .93     .83     .96     .82     .89     .80
                 2       .84     .74     .88     .84     .88     .86
         Poly.   1       .70     .60     .30     .26     .23     .22     .64     .59
                 2       .67     .58     .93     .90     .99     .99
         A-B     1       .82     .77     .88     .84     .70     .65     .79     .73
                 2       .81     .76     .85     .80     .66     .56
150:50   Lin.    1       .80     .78     .89     .88     .72     .70     .64     .74
                 2       .80     .77     .82     .78     .62     .54
         Quad.   1       .81     .78     .91     .89     .93     .90     .87     .83
                 2       .82     .73     .87     .83     .86     .84
         Poly.   1       .66     .63     .22     .21     .20     .20     .62     .55
                 2       .69     .65     .95     .88     .99     .71
         A-B     1       .80     .79     .87     .85     .68     .66     .78     .74
                 2       .81     .76     .85     .80     .65     .57
60:20    Lin.    1       .81     .78     .90     .88     .76     .73     .80     .73
                 2       .81     .75     .83     .76     .66     .49
         Quad.   1       .84     .80     .93     .87     .94     .87     .89     .79
                 2       .87     .61     .89     .76     .87     .80
         Poly.   1       .65     .59     .33     .31     .24     .23     .65     .55
                 2       .74     .56     .92     .77     .99     .86
         A-B     1       .82     .79     .88     .86     .73     .70     .81     .73
                 2       .84     .73     .88     .77     .70     .51

^a Refers to original samples
^b Refers to cross-validation samples

Table 6
Mean Proportion of Correct Classification for 1000 Samples
Six Dependent Variables - Transformation 1 Applied to Both Samples

n1:n2    Method  Sample  Homo. Cond.     Hetero. Cond. 1 Hetero. Cond. 2 Mean
                         Orig^a  c-v^b   Orig^a  c-v^b   Orig^a  c-v^b   Orig^a  c-v^b
100:100  Lin.    1       .76     .73     .85     .84     .68     .66     .70     .68
                 2       .69     .67     .68     .67     .53     .48
         Quad.   1       .81     .73     .86     .78     .88     .82     .78     .73
                 2       .67     .62     .72     .71     .72     .72
         Poly.   1       .70     .65     .20     .19     .17     .16     .60     .58
                 2       .59     .53     .93     .93     .99     .99
         A-B     1       .72     .70     .79     .77     .65     .63     .69     .67
                 2       .71     .69     .72     .70     .55     .50
40:40    Lin.    1       .77     .71     .85     .82     .73     .68     .69     .64
                 2       .64     .60     .64     .60     .51     .42
         Quad.   1       .84     .68     .88     .71     .90     .73     .77     .66
                 2       .64     .54     .68     .65     .67     .67
         Poly.   1       .69     .60     .33     .29     .16     .16     .60     .56
                 2       .58     .50     .86     .83     .99     .99
         A-B     1       .74     .67     .80     .75     .71     .65     .69     .63
                 2       .66     .62     .67     .64     .53     .43
150:50   Lin.    1       .74     .72     .84     .82     .69     .67     .68     .65
                 2       .66     .63     .65     .63     .51     .42
         Quad.   1       .81     .76     .85     .81     .87     .83     .75     .71
                 2       .63     .52     .68     .65     .68     .67
         Poly.   1       .65     .61     .26     .25     .18     .17     .60     .57
                 2       .61     .52     .90     .88     .99     .99
         A-B     1       .71     .69     .77     .75     .66     .64     .67     .64
                 2       .68     .65     .68     .66     .53     .44
60:20    Lin.    1       .73     .68     .82     .78     .73     .70     .64     .58
                 2       .55     .49     .56     .50     .47     .32
         Quad.   1       .67     .62     .88     .79     .89     .79     .73     .60
                 2       .83     .56     .56     .40     .56     .44
         Poly.   1       .62     .57     .42     .39     .19     .19     .61     .55
                 2       .63     .49     .78     .70     .99     .98
         A-B     1       .71     .65     .76     .72     .69     .65     .64     .57
                 2       .57     .49     .58     .52     .53     .36

^a Refers to original samples
^b Refers to cross-validation samples

Table 7
Mean Proportion of Correct Classification for 1000 Samples
Six Dependent Variables - Transformation 2 Applied to Both Samples

n1:n2    Method  Sample  Homo. Cond.     Hetero. Cond. 1 Hetero. Cond. 2 Mean
                         Orig^a  c-v^b   Orig^a  c-v^b   Orig^a  c-v^b   Orig^a  c-v^b
100:100  Lin.    1       .80     .78     .89     .87     .73     .71     .71     .69
                 2       .66     .65     .67     .65     .51     .47
         Quad.   1       .83     .76     .88     .81     .88     .82     .78     .74
                 2       .66     .61     .71     .70     .72     .72
         Poly.   1       .76     .71     .39     .38     .38     .38     .60     .58
                 2       .53     .48     .75     .74     .78     .78
         A-B     1       .78     .76     .85     .83     .70     .67     .71     .68
                 2       .68     .67     .69     .68     .53     .49
40:40    Lin.    1       .82     .78     .89     .87     .77     .72     .70     .66
                 2       .62     .58     .63     .58     .49     .41
         Quad.   1       .86     .70     .90     .73     .90     .73     .77     .67
                 2       .63     .54     .67     .64     .68     .67
         Poly.   1       .75     .67     .52     .48     .42     .42     .60     .56
                 2       .52     .43     .67     .64     .74     .74
         A-B     1       .81     .75     .87     .82     .74     .69     .71     .65
                 2       .65     .60     .65     .61     .52     .43
150:50   Lin.    1       .79     .77     .88     .86     .72     .71     .69     .66
                 2       .63     .61     .64     .61     .49     .42
         Quad.   1       .81     .78     .88     .83     .87     .83     .76     .71
                 2       .64     .53     .67     .64     .68     .67
         Poly.   1       .75     .72     .45     .44     .36     .36     .58     .56
                 2       .50     .41     .69     .66     .75     .75
         A-B     1       .77     .75     .83     .82     .70     .68     .69     .66
                 2       .66     .62     .67     .64     .52     .43
60:20    Lin.    1       .79     .75     .87     .84     .76     .73     .66     .60
                 2       .54     .49     .54     .49     .46     .31
         Quad.   1       .67     .62     .89     .82     .88     .79     .74     .61
                 2       .86     .59     .57     .38     .56     .44
         Poly.   1       .76     .71     .61     .58     .41     .40     .57     .52
                 2       .45     .32     .54     .44     .67     .66
         A-B     1       .75     .70     .71     .68     .73     .69     .67     .60
                 2       .59     .52     .73     .66     .51     .34

^a Refers to original samples
^b Refers to cross-validation samples

Table 8
Mean Proportion of Correct Classification for 1000 Samples
Six Dependent Variables - Transformation 1 Applied to Sample 2 Only

n1:n2    Method  Sample  Homo. Cond.     Hetero. Cond. 1 Hetero. Cond. 2 Mean
                         Orig^a  c-v^b   Orig^a  c-v^b   Orig^a  c-v^b   Orig^a  c-v^b
100:100  Lin.    1       .71     .68     .82     .80     .63     .59     .67     .64
                 2       .67     .65     .68     .66     .51     .45
         Quad.   1       .77     .70     .88     .81     .91     .84     .77     .72
                 2       .66     .59     .70     .68     .69     .69
         Poly.   1       .62     .56     .25     .23     .21     .21     .61     .58
                 2       .64     .60     .93     .92     .99     .98
         A-B     1       .69     .65     .76     .74     .61     .56     .66     .63
                 2       .69     .67     .71     .69     .52     .47
40:40    Lin.    1       .70     .61     .80     .75     .66     .58     .65     .59
                 2       .62     .58     .63     .59     .51     .41
         Quad.   1       .77     .62     .88     .72     .92     .76     .77     .66
                 2       .77     .63     .65     .59     .65     .61
         Poly.   1       .62     .52     .38     .33     .24     .22     .62     .57
                 2       .65     .57     .85     .81     .99     .99
         A-B     1       .68     .59     .75     .69     .64     .56     .65     .58
                 2       .63     .59     .65     .62     .55     .44
150:50   Lin.    1       .70     .68     .80     .79     .64     .62     .66     .62
                 2       .65     .62     .65     .62     .50     .40
         Quad.   1       .74     .70     .87     .82     .90     .85     .76     .70
                 2       .75     .61     .66     .60     .65     .63
         Poly.   1       .58     .55     .29     .27     .22     .22     .61     .58
                 2       .68     .59     .90     .87     .99     .99
         A-B     1       .67     .65     .74     .73     .61     .59     .65     .61
                 2       .66     .63     .67     .65     .52     .42
60:20    Lin.    1       .66     .59     .76     .71     .67     .62     .62     .54
                 2       .53     .46     .55     .49     .52     .36
         Quad.   1       .79     .74     .84     .77     .89     .80     .76     .61
                 2       .90     .62     .60     .38     .54     .35
         Poly.   1       .56     .49     .45     .41     .26     .24     .63     .57
                 2       .73     .62     .78     .68     .99     .99
         A-B     1       .65     .57     .71     .66     .66     .60     .63     .54
                 2       .55     .46     .57     .50     .61     .43

^a Refers to original samples
^b Refers to cross-validation samples

The results in Tables 1 and 5 indicate that, in general, with samples drawn from MVN populations with equal covariance matrices, the Fisher LDF, the QDF, and the A-B procedure performed similarly, but as the degree of heterogeneity increased, the QDF outperformed the other procedures. These procedures also outperformed the PDF under all conditions of sample size and equality of covariance matrices. These results are consistent with those of Marks and Dunn (1974) and Gilbert (1969), since it can be observed that the Fisher LDF performed well, with respect to the QDF, for mild departures from homogeneity of covariance matrices, but as the degree of heterogeneity increased, the QDF outperformed the Fisher LDF and A-B procedure.

The average proportion of correct classification for the A-B procedure is very similar to that for the Fisher LDF across the heterogeneity conditions,
except that the proportion of misclassification was nearly equal in both samples for the A-B procedure, whereas the Fisher LDF tended to misclassify more cases in the sample with greater dispersions and fewer cases in the other sample. The differences between the two procedures in the proportion of correct classifications were relatively small and, as can be seen in Tables 1 and 5, tended to be about .03. For most applications this difference would be of little importance, since in situations where it might arise, the QDF outperforms both of these procedures.

In the case of equal sample sizes it can be observed from Tables 1 and 5 that the error in classification in the cross-validation sample by the Fisher LDF and A-B procedure was greater in the heterogeneous conditions. The QDF in this case had a slightly higher error rate in the homogeneous condition than in the heterogeneous but outperformed the other procedures in the heterogeneous conditions in classifying the cross-validation samples.

In the homogeneous condition with unequal and small sample sizes the QDF performed poorly in relation to the Fisher LDF and A-B procedures in classification of the cross-validation sample, but it outperformed these procedures by a wide margin in Heterogeneity Condition 2. Even in this case, however, the proportion of correct classifications of the smaller cross-validation sample was considerably less: .80 compared to .86 in the equal and small sample size for the six variable case, and .63 compared to .65 in the bivariate case.
These results indicate that, although the QDF outperformed the other procedures when covariance matrices were unequal, it is more sensitive to sample size differences, particularly when one sample size is small. The greater sensitivity of the QDF than the LDF to small sample sizes may be seen as the result of the QDF not pooling the sample covariance matrices; it is, therefore, more sensitive to the variability of the smaller samples.

In the present study, if equal proportions of the two groups had not been assumed to exist in the population and sample sizes had been taken to represent the population proportions, then it would have been necessary to use $2\ln(p_2/p_1)$ as the cut-off for deciding group membership. This would have resulted in less error in classification of the larger group, Population 1 in the present case, since the decision boundary would now be further away from this group.

The results for samples that were generated from MVN distributions and then transformed appear in Tables 2 and 3 for the bivariate case and in Tables 6 and 7 for the six variable case. From these results it can be seen that all of the procedures were adversely affected by population nonnormality. The error in classification tended to be higher in one sample than the other, with the imbalance being greater in the heterogeneous, small, and unequal sample size conditions. This imbalance also tended to be larger in the bivariate case.
Although the QDF tended to have a high degree of error in the classification of one sample, it still performed better than the other procedures in Heterogeneity Condition 2, but only marginally better in Heterogeneity Condition 1 and the Homogeneous Condition. In addition, when sample sizes were small and unequal, the QDF did not perform consistently. As can be seen in Tables 6 and 7, in the Homogeneous Condition, the Fisher LDF and the QDF had most of the error in the classification of Sample 2, except in the small and unequal sample size condition, in which case the QDF misclassified more observations in Sample 1 than Sample 2, while the reverse was true for the corresponding cross-validation sample.

The Fisher LDF and the A-B procedures also performed similarly in overall classification, although the Fisher LDF again tended to have more of the error in classification concentrated in one sample, whereas the A-B procedure had the error in classification distributed more evenly between the two samples.

For the conditions in which transformation 1 was applied to Sample 2 only, the results appear in Table 4 for the bivariate case and Table 8 for the six variable case. These results are generally similar to those in which transformations were applied to both samples. There tended to be more misclassification in the transformed sample for all the procedures, with this imbalance being greater in the bivariate condition.
In these conditions, the PDF again had the poorest performance in classification of the original and cross-validation samples. In the bivariate case, the results in Table 4 show that in the Homogeneous Condition, the QDF misclassified more observations in Sample 1 (the untransformed sample) of the original and cross-validation samples, whereas the Fisher LDF and the A-B procedures had most of the error concentrated in Sample 2. These results suggest that in certain non-optimal conditions the QDF may yield results that do not agree with those of the Fisher LDF.

The poor performance of the PDF in all conditions of the present study may have resulted from several factors. One factor that may have had an adverse effect on the PDF was the choice of Hermite polynomials. The polynomials used in the present study were those described in Tou and Gonzalez (1974, p. 70) and are orthogonal with respect to the weight function exp(-x^2). A better choice of Hermite polynomials would be those described by Cramer (1946), which are orthogonal with respect to the weight function exp(-x^2/2) and more closely resemble the normal curve. In the present study the Hermite polynomials described by Tou and Gonzalez (1974) were used since they suggest that the discriminating power of the resultant PDF is not affected by the weight function because it is common to the estimated density functions in both samples. Tou and Gonzalez (1974, p. 72) also suggest that the polynomials need not be orthonormal and perform well in their orthogonal form, which also simplifies the computations.
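The distinction between the two Hermite families described above can be checked numerically. The following is a minimal sketch, not the computation used in the study: it assumes NumPy is available and uses its `numpy.polynomial.hermite` module (weight exp(-x^2)) and `numpy.polynomial.hermite_e` module (weight exp(-x^2/2)). The helper `gram_matrix` and the choice of maximum degree 3 are illustrative assumptions.

```python
import math
import numpy as np
from numpy.polynomial import hermite as phys    # weight function exp(-x^2)
from numpy.polynomial import hermite_e as prob  # weight function exp(-x^2 / 2)

def gram_matrix(gauss, evaluate, max_degree, nodes=20):
    """Weighted inner products of the polynomials of degree 0..max_degree."""
    x, w = gauss(nodes)  # Gauss quadrature nodes and weights for this family
    # Coefficient vector [0, ..., 0, 1] selects the single polynomial of degree n.
    basis = [evaluate(x, [0] * n + [1]) for n in range(max_degree + 1)]
    return np.array([[np.sum(w * bm * bn) for bn in basis] for bm in basis])

max_degree = 3
g_phys = gram_matrix(phys.hermgauss, phys.hermval, max_degree)
g_prob = gram_matrix(prob.hermegauss, prob.hermeval, max_degree)

# Standard squared norms: sqrt(pi) * 2^n * n!  and  sqrt(2*pi) * n!
norm_phys = [math.sqrt(math.pi) * 2 ** n * math.factorial(n)
             for n in range(max_degree + 1)]
norm_prob = [math.sqrt(2.0 * math.pi) * math.factorial(n)
             for n in range(max_degree + 1)]

print(np.allclose(g_phys, np.diag(norm_phys)))
print(np.allclose(g_prob, np.diag(norm_prob)))
```

Both families are orthogonal but not orthonormal, since the diagonal entries differ from one. Only the second weight function is proportional to the standard normal density, which is the sense in which that family more closely resembles the normal curve.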
From the results of the present study it is apparent that the above effects should be examined when using Hermite polynomials for constructing polynomial discriminant functions.

An additional source of error in the PDF may have resulted from the manner in which the coefficients of the density function estimates were calculated. As can be observed from Equation (6), if the x values are chosen from a sample with large variances relative to the other sample, the coefficients will be larger in the density function estimate of the sample with the larger variances. Consequently this density function estimate will dominate the decision rule and produce a large amount of error in the classification of the sample with less variance, since these observations would also be classified in the larger variance sample, as was observed in the present study.

Another source of error in the PDF may have resulted from not including terms of higher degree in the construction of the multivariate Hermite polynomials. Including more terms in the polynomial would have added greatly to computation time, however, and made the examination of this procedure impractical. Alternative procedures exist for the estimation of a polynomial discriminant function, such as that described by Specht (1967). The computer algorithm in this latter case, however, is complex and would require a large amount of computation time.

CONCLUSIONS

In general the results of the present study are consistent with those of Gilbert (1969) and Marks and Dunn (1974).
They indicate that for mild to moderate departures from homogeneity of covariance matrices, the Fisher LDF performs similarly to the QDF. Consequently, unless large differences exist between covariance matrices there may be little advantage in using the QDF. In addition, the QDF is adversely affected by small sample sizes to a greater extent than is the Fisher LDF. Statistical tests of the hypothesis of equal covariance matrices are available, such as that proposed by Bartlett (1947) and Box's (1949) improved chi-square and F approximations (see Harris, 1975, pp. 85-86). These tests are sensitive to nonnormality, however (Mardia, 1971), and if nonnormal data are suspected, transforming them to approximate normality would be advisable.

The results also indicate that the A-B procedure performs similarly to the Fisher LDF in overall classification. The errors in classification, however, are, to a small degree, more evenly divided between the two samples. Since this procedure is computationally more complex and requires trial-and-error procedures to find the optimal classification rule, there appears to be little advantage in choosing this method over the Fisher LDF.

The Fisher LDF, the QDF, and the A-B procedures are adversely affected by nonnormality in the population, and all procedures produce more error in classification of one sample than the other under these conditions. Also, when one or both samples are drawn from nonnormal populations, the Fisher LDF and the QDF can differ in terms of which of the samples has the most misclassifications.
As a result, in applications where these conditions might be suspected, it would be advisable to convert the data to approximate normality before applying these procedures.

The use of the particular PDF described in the present study would appear to be of little value as a classification procedure. Other methods for deriving a polynomial discriminant function exist and may perform more effectively in the present conditions. Further research is necessary to determine the best PDF classification rule.

REFERENCES

Anderson, T.W. Classification by multivariate analysis. Psychometrika, 1951, 16, 31-50.
Anderson, T.W. An Introduction to Multivariate Statistical Analysis. New York: Wiley, 1958.
Anderson, T.W. & Bahadur, R.R. Classification into two multivariate normal distributions with different covariance matrices. Annals of Mathematical Statistics, 1962, 33, 420-431.
Banerjee, K.S. & Marcus, L.F. Bounds in a minimax classification procedure. Biometrika, 1965, 52, 653-654.
Bartlett, M.S. Multivariate analysis. Journal of the Royal Statistical Society Supplement, Series B, 1947, 9, 176-197.
Box, G.E. A general distribution theory for a class of likelihood criteria. Biometrika, 1949, 36, 317-346.
Chambers, J.M. Computational Methods for Data Analysis. New York: Wiley, 1977.
Coveyou, R.R. & Macpherson, R.D. Fourier analysis of uniform random number generators. Journal of the Association of Computing Machinery, 1967, 14, 100-119.
Cramer, H. Mathematical Methods of Statistics. Princeton: Princeton University Press, 1946.
Fisher, R.A. The logic of inductive inference. Journal of the Royal Statistical Society, 1935, 98, 39-82.
Fisher, R.A. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 1936, 7, 179-188.
Gilbert, E.S. On discrimination using qualitative variables. Journal of the American Statistical Association, 1968, 63, 1399-1412.
Gilbert, E.S. The effect of unequal variance-covariance matrices on Fisher's linear discriminant function. Biometrics, 1969, 25, 505-516.
Gnanadesikan, R. Methods for Statistical Data Analysis of Multivariate Observations. New York: Wiley, 1977.
Harris, R.J. A Primer of Multivariate Statistics. New York: Academic Press, 1975.
Holloway, L.N. & Dunn, O.J. The robustness of Hotelling's T^2. Journal of the American Statistical Association, 1967, 62, 124-136.
Hopkins, J.W. & Clay, P.P.F. Some empirical distributions of bivariate T^2 and homoscedasticity criterion M under unequal variance and leptokurtosis. Journal of the American Statistical Association, 1963, 58, 1048-1053.
Ito, K. & Schull, W.J. On the robustness of the T_0^2 test in multivariate analysis of variance when variance-covariance matrices are not equal. Biometrika, 1964, 51, 71-82.
Johnson, N.L. Systems of frequency curves generated by methods of translation. Biometrika, 1949, 36, 149-176.
Knuth, D.E. The Art of Computer Programming (Vol. 2): Semi-numerical Algorithms. Reading, Mass.: Addison-Wesley, 1968.
Krzanowski, W.J. Discrimination and classification using both binary and continuous variables. Journal of the American Statistical Association, 1975, 70, 782-790.
Krzanowski, W.J. The performance of Fisher's linear discriminant function under non-optimal conditions. Technometrics, 1977, 19, 191-200.
Kshirsagar, A.M. Multivariate Analysis. New York: Dekker, 1972.
Lachenbruch, P.A. Discriminant Analysis. New York: Hafner Press, 1975.
Lachenbruch, P.A., Sneeringer, C. & Revo, L.T. Robustness of the linear and quadratic discriminant function to certain types of nonnormality. Communications in Statistics, 1973, 1, 39-57.
Mardia, K.V. The effect of nonnormality on some multivariate tests and robustness to nonnormality in the linear model. Biometrika, 1971, 58, 105-121.
Marks, S. & Dunn, O.J. Discriminant functions when covariance matrices are unequal. Journal of the American Statistical Association, 1974, 69, 555-559.
Moore, D.H. Evaluation of five discrimination procedures for binary variables. Journal of the American Statistical Association, 1973, 68, 399-404.
Morrison, D.F. Multivariate Statistical Methods (2nd ed.). New York: McGraw-Hill, 1976.
Randles, R.H., Broffitt, J.D., Ramberg, J.S. & Hogg, R.V. Generalized linear and quadratic discriminant functions using robust estimates. Journal of the American Statistical Association, 1978, 73, 564-568.
Smith, C.A.B. Some examples of discrimination. Annals of Eugenics, 1947, 13, 272-282.
Specht, D.F. Generation of polynomial discriminant functions for pattern recognition. IEEE Transactions on Electronic Computers, 1967, EC-16, No. 3, 308-319.
Tou, J.T. & Gonzalez, R.C. Pattern Recognition Principles. Reading, Mass.: Addison-Wesley, 1974.
Wald, A. On a statistical problem arising in the classification of an individual into two groups. Annals of Mathematical Statistics, 1944, 15, 145-163.
Welch, B.L. Note on discriminant functions. Biometrika, 1939, 31, 218-220.
Zhezhel, Y.N. The efficiency of a linear discriminant function for arbitrary distributions. Engineering Cybernetics, 1968, 6, 107-111.

APPENDIX I

POPULATION COVARIANCE MATRICES USED IN THE STUDY AS . . . .
(Entries are actual values multiplied by 100)

Two Variables

       1    2
1    100
2     50  100

Six Variables

       1    2    3    4    5    6
1    100
2     40  100
3     30   35  100
4     40   25   10  100
5     30   40   50   30  100
6     20   30   00   40   15  100

The $\Sigma_i$ matrices were obtained from those tabulated by means of rescalings described in the text.

APPENDIX II

THE EFFECT OF TRANSFORMATION (1) ON THE BIVARIATE NORMAL DISTRIBUTION

Suppose $y_1$, $y_2$ are random variables with the bivariate normal probability density function (p.d.f.):

$$p(y) = \frac{1}{\left[(2\pi)^n |\Sigma|\right]^{1/2}} \exp\left\{-\tfrac{1}{2}\left[(y-\mu)'\,\Sigma^{-1}(y-\mu)\right]\right\}$$

where $y' = [y_1, y_2]$, $\mu' = [\mu_1, \mu_2]$, and $\Sigma$ is the covariance matrix of $y$.

If the transformation

$$x_i = e^{y_i}; \quad i = 1, 2$$

is applied, the inverse transformation is easily seen as $y_i = \ln x_i$, and the Jacobian of the inverse transformation is then:

$$\begin{vmatrix} \dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} \\[2ex] \dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} \end{vmatrix} = \begin{vmatrix} \dfrac{1}{x_1} & 0 \\[2ex] 0 & \dfrac{1}{x_2} \end{vmatrix} = \frac{1}{x_1 x_2}$$

where $x' = [x_1, x_2]$.

The resulting distribution of $x$ then follows as:

$$p(x) = \frac{1}{x_1 x_2 \left[(2\pi)^n |\Sigma|\right]^{1/2}} \exp\left\{-\tfrac{1}{2}\left[(\ln(x)-\mu)'\,\Sigma^{-1}(\ln(x)-\mu)\right]\right\}$$

which is not bivariate normal in form.
A similar derivation can be expressed for the six variable case.
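The bivariate result above can be checked numerically. The following is a minimal sketch, assuming NumPy is available: it evaluates the density obtained by dividing the bivariate normal density of ln(x) by x1 x2 and compares the probability it assigns to a region with the proportion of simulated, exponentiated normal observations falling in that region. The zero mean vector, the region [0.5, 1.5] x [0.5, 1.5], the sample size, and the function name `transformed_pdf` are illustrative assumptions; the covariance matrix is the unscaled two-variable matrix of Appendix I.

```python
import numpy as np

def transformed_pdf(x1, x2, mu, sigma):
    """Density of x = exp(y), y bivariate normal, using the Jacobian 1/(x1*x2)."""
    sigma_inv = np.linalg.inv(sigma)
    det = np.linalg.det(sigma)
    d1 = np.log(x1) - mu[0]
    d2 = np.log(x2) - mu[1]
    # Quadratic form (ln x - mu)' Sigma^{-1} (ln x - mu), written out for n = 2.
    quad = (sigma_inv[0, 0] * d1 * d1
            + 2.0 * sigma_inv[0, 1] * d1 * d2
            + sigma_inv[1, 1] * d2 * d2)
    return np.exp(-0.5 * quad) / (2.0 * np.pi * np.sqrt(det) * x1 * x2)

mu = np.zeros(2)                            # assumed mean vector
sigma = np.array([[1.0, 0.5], [0.5, 1.0]])  # two-variable matrix of Appendix I

# Probability of the square [0.5, 1.5]^2 by a midpoint rule on the density.
lo, hi, m = 0.5, 1.5, 200
step = (hi - lo) / m
mid = lo + step * (np.arange(m) + 0.5)
g1, g2 = np.meshgrid(mid, mid)
p_density = float(np.sum(transformed_pdf(g1, g2, mu, sigma)) * step * step)

# The same probability estimated from simulated normal data after transformation.
rng = np.random.default_rng(0)
x = np.exp(rng.multivariate_normal(mu, sigma, size=200_000))
inside = np.all((x >= lo) & (x <= hi), axis=1)
p_simulated = float(inside.mean())

print(round(p_density, 3), round(p_simulated, 3))
```

Agreement between the two estimates, up to simulation error, is consistent with the Jacobian factor 1/(x1 x2) in the derived density.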
