THE INDEX OF DISPERSION

By

EDGAR G. AVELINO

B.Sc., The University of British Columbia, 1978

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

in

THE FACULTY OF GRADUATE STUDIES, Statistics Department, The University of British Columbia

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
November 1984
© Edgar G. Avelino, 1984

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Statistics
The University of British Columbia
1956 Main Mall
Vancouver, Canada V6T 1Y3
Date: October 1984

ABSTRACT

The index of dispersion is a statistic commonly used to detect departures from randomness of count data. Under the hypothesis of randomness, the true distribution of this statistic is unknown. The accuracy of large sample approximations is assessed by a Monte Carlo simulation. Further approximations by Pearson curves and infinite series expansions are investigated. Finally, the powers of the individual tests based on the likelihood ratio, the index of dispersion and Pearson's goodness-of-fit statistic are compared.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENT
1. INTRODUCTION
   1.1 History of the Index of Dispersion
   1.2 Purpose of the Paper
2. LARGE SAMPLE APPROXIMATIONS
   2.1 Joint Distribution of $\bar X$ and $S^2$
   2.2 The Asymptotic Distribution of I
   2.3 Description of the Monte Carlo Simulation
   2.4 The $\chi^2$ Approximation
   2.5 Discussion
3. PEARSON CURVES
   3.1 The Theory of Pearson Curves
   3.2 Two Examples
   3.3 The First Four Moments of I
   3.4 Discussion
4. THE GRAM-CHARLIER SERIES OF TYPE A
   4.1 The Theory of Gram-Charlier Expansions
   4.2 Discussion
5. THE LIKELIHOOD RATIO AND GOODNESS-OF-FIT TESTS
   5.1 The Likelihood Ratio Test
   5.2 Pearson's Goodness-of-Fit Test
   5.3 Power Computations
   5.4 The Likelihood Ratio Test Revisited
6. CONCLUSIONS
REFERENCES
APPENDIX
   A1.1 The Conditional Distribution of a Poisson Sample Given the Total
   A1.2 The First Four Moments of I
   A1.3 The Type VI Pearson Curve
   A1.4 A Limiting Case of the Negative Binomial
   A2.1 Histograms of I
   A2.2 Empirical Critical Values
   A2.3 Pearson Curve Critical Values
   A2.4 Gram-Charlier Critical Values

LIST OF TABLES

TABLE 1: Expected Number of I*'s
TABLE 2 (A-D): Normal Approximation
TABLE 3 (A-D): $\chi^2$ Approximation
TABLE 4 (A-D): Pearson Curve Fit with Exact Moments
TABLE 5 (A-D): Pearson Curve Fit with Asymptotic Moments
TABLE 6 (A-D): Gram-Charlier Three-Moment Fit (Exact)
TABLE 7 (A-D): Gram-Charlier Four-Moment Fit (Exact)
TABLE 8: Asymptotic Power of the Index of Dispersion Test
TABLE 9-10 (A-D): Power of Tests Based on $\Lambda$, I and $X^2$
TABLE 11 (A-D): Power of Tests Based on $\Lambda$ and I (n=50)
TABLE 12: Number of Times $(n-1)S^2 \le n\bar X$
TABLE 13 (A-B): Power of Tests Based on $\Lambda$, I and $X^2$ (n=10)
TABLE 14 (A-B): Power of Tests Based on $\Lambda$, I and $X^2$ (n=20)
TABLE 15 (A-B): Power of Tests Based on $\Lambda$ and I (n=50)
TABLE A1: The Ratios of the Moments of I and $\chi^2_{n-1}$
TABLE A2: Empirical Critical Values of I (Based on 15,000 Samples)
TABLE A3: Pearson Curve Critical Values (Exact)
TABLE A4: Pearson Curve Critical Values (Asymptotic)
TABLE A5: Gram-Charlier Critical Values (Three Exact Moments)
TABLE A6: Gram-Charlier Critical Values (Three Asymptotic Moments)
TABLE A7: Gram-Charlier Critical Values (Four Exact Moments)
TABLE A8: Gram-Charlier Critical Values (Four Asymptotic Moments)

LIST OF FIGURES

FIG. A.1: Histogram of I (1000 Samples, n = 10, $\lambda$ = 3)
FIG. A.2: Histogram of I (1000 Samples, n = 10, $\lambda$ = 5)
FIG. A.3: Histogram of I (1000 Samples, n = 20, $\lambda$ = 3)
FIG. A.4: Histogram of I (1000 Samples, n = 20, $\lambda$ = 5)
FIG. A.5: Histogram of I (1000 Samples, n = 50, $\lambda$ = 3)
FIG. A.6: Histogram of I (1000 Samples, n = 50, $\lambda$ = 5)
FIG. A.7: Histogram of I (1000 Samples, n = 100, $\lambda$ = 3)
FIG. A.8: Histogram of I (1000 Samples, n = 100, $\lambda$ = 5)
FIG. A.9: Normal Probability Plot for I (1000 Samples, n = 10, $\lambda$ = 3)
FIG. A.10: Normal Probability Plot for I (1000 Samples, n = 10, $\lambda$ = 5)
FIG. A.11: Normal Probability Plot for I (1000 Samples, n = 20, $\lambda$ = 3)
FIG. A.12: Normal Probability Plot for I (1000 Samples, n = 20, $\lambda$ = 5)
FIG. A.13: Normal Probability Plot for I (1000 Samples, n = 50, $\lambda$ = 3)
FIG. A.14: Normal Probability Plot for I (1000 Samples, n = 50, $\lambda$ = 5)
FIG. A.15: Normal Probability Plot for I (1000 Samples, n = 100, $\lambda$ = 3)
FIG. A.16: Normal Probability Plot for I (1000 Samples, n = 100, $\lambda$ = 5)

ACKNOWLEDGEMENT

I wish to express my sincere appreciation to Prof. A. John Petkau, who devoted precious time to the supervision of this thesis.

Edgar G. Avelino

1. INTRODUCTION

1.1 HISTORY OF THE INDEX OF DISPERSION

The index of dispersion is a test statistic often used to detect spatial pattern, a term ecologists use to describe non-randomness of plant populations. This is equivalent to testing that the growth of plants over an area is purely random, or equivalently that the number of plants in any given area has the Poisson distribution.

Suppose then that we randomly partition some area by $n$ disjoint equal-sized quadrats and make a count, $X_i$, of the number of plants in each quadrat. Under the hypothesis of randomness, $X_1, \ldots, X_n$ would have the Poisson distribution,

$$P(X = x) = e^{-\lambda}\lambda^x / x!, \quad \text{for } \lambda > 0 \text{ and } x = 0, 1, 2, \ldots,$$

for which $E(X) = \mathrm{Var}(X) = \lambda$. For alternatives to complete randomness involving patches or clumping of plants, we would expect $\mathrm{Var}(X) > E(X)$, while for more regular spacing of plants, we would expect $\mathrm{Var}(X) < E(X)$ (see for example R.H. Green (1966)). These properties lead quite naturally to considering the variance-to-mean ratio as a population index to measure spatial pattern.

An estimator of the variance-to-mean ratio is the index of dispersion, defined as

$$I = \begin{cases} 1 & \text{if } \bar X = 0 \\ S^2/\bar X & \text{if } \bar X > 0 \end{cases}$$

where $\bar X = (1/n)\sum_{i=1}^{n} X_i$ and $S^2 = \{1/(n-1)\}\sum_{i=1}^{n}(X_i - \bar X)^2$ are the unbiased estimators of $E(X)$ and $\mathrm{Var}(X)$, respectively. (It is natural to define I to be 1 if $\bar X = 0$ because under the null hypothesis, the variance-to-mean ratio equals 1.)

Ever since G.E. Blackman (1935) used the Poisson model for counts of plants, the concept of randomness in a community of plants has been of growing interest among ecologists. Although the index of dispersion was introduced by R.A.
Fisher (Fisher, Thornton and MacKenzie, 1922), it was not until 1936 that it was first used by ecologists for the purpose of inference. A.R. Clapham (1936), using a $\chi^2$ approximation for the distribution of the index of dispersion under the null hypothesis of randomness, found that among 44 plant species he studied, only four seemed to be distributed randomly, while over-dispersion (i.e. clumping) was clearly present for the remaining species. Student (1919) had already pointed out that the Poisson is not usually a good model for ecological data and that in most cases, clumping occurs. This has been termed "contagious" by G. Polya (1930) and also by J. Neyman (1939). Ever since Clapham's paper, the use of the index of dispersion as a test of significance of departures from randomness has been extensive, not just for field data, but also in other areas (for example, blood counts, insect counts and larvae counts).

Fisher et al (1922) showed that the distribution of the index of dispersion could be closely approximated by the $\chi^2$ distribution with $n-1$ degrees of freedom. However, if the Poisson parameter, $\lambda$, is small, or if the estimated expectation, $\bar X$, is small, then the adequacy of the $\chi^2$ approximation becomes questionable. This is discussed by H.O. Lancaster (1952). Fisher (1950) and W.G. Cochran (1936) have pointed out that in this case, the test of randomness based on I should be done conditionally with given totals, $\sum X_i$. Since this sum is a sufficient statistic for the Poisson parameter $\lambda$, conditioning on the total will yield a distribution independent of $\lambda$. Hence, exact frequencies can be computed. The conditional moments of the index of dispersion are provided in Appendix A1.2. These moments are also given by J.B.S. Haldane (1937) (see also Haldane (1939)).

Several people have examined the power of the test based on the index of dispersion. G.I.
Bateman (1950) considered Neyman's contagious distribution as an alternative to the Poisson and found that this test exhibits reasonably high power for $n > 50$ and $m_1 m_2 \ge 5$, where $m_1$ and $m_2$ are the parameters of Neyman's distribution. For $5 \le n \le 20$, she found that the power is also high, provided that $m_1 m_2$ is large (in particular, $m_1 m_2 \ge 20$). Proceeding along the same lines as Bateman, N. Kathirgamatamby (1953) and J.H. Darwin (1957) compared the power of this test when the alternatives are Thomas' double Poisson, Neyman's contagious distribution type A and the negative binomial. They found that this test attained about the same power in each of the three alternatives. Finally, in a recent paper, J.N. Perry and R. Mead (1979) investigated the power of the index of dispersion test over a wide class of alternatives to complete randomness. They concluded that this test is very powerful, particularly in detecting clumping, and they strongly recommend the use of this test. Examination of the power of this test relative to other tests of the null may also be important.

1.2 PURPOSE OF THE PAPER

The purpose of this paper is to examine the distribution of the index of dispersion and compare its power to the power of other tests of randomness. We examine the properties of the index of dispersion and through these properties, attempt to answer such questions as: "How do we decide whether a given sample is significantly different from a Poisson sample?" and "How good is this test in detecting departures from randomness relative to other (perhaps reliable and well-studied) tests?"

Answers to the first question could be based on constructing a rejection region R, where if $I \in R$, we would tend to favor some other alternative.
For example, if we wished to test the null hypothesis against alternatives involving clumping, then large values of I would provide evidence against the null hypothesis, and the rejection region would presumably be of the form $I > C$. For two-sided alternatives, we would be interested in both large and small values of this statistic, say $I < C_1$ or $I > C_2$. We would also want to examine the chances of wrongly rejecting the null, which in statistical terminology is called the probability of making a type I error or the significance level (or size) of the test. The constants $C$, $C_1$ and $C_2$ are called critical values, and it is through these critical values that the rejection region will be constructed.

We then rephrase the question as: "Is there a method of determining the rejection region R at a given level of significance $\alpha$?" As the true probability distribution of I is unknown, we first attempt to solve the problem through large sample approximations which lead to asymptotic critical values. We will show that the asymptotic null distribution of I is normal with mean 1 and variance $2/n$. We can then use the critical values from the normal and determine how accurate these critical values are. This study is done through a Monte Carlo simulation. Similarly, the $\chi^2$ approximation to the distribution of I is also examined. We also examine critical values obtained from approximating the null distribution of I by Pearson curves and Gram-Charlier expansions.
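To make the two-sided rejection region concrete, the following is a minimal sketch in Python (standard library only) that computes I for a vector of counts and the critical values $C_1$ and $C_2$ implied by the $N(1, 2/n)$ approximation. The function names and the example counts are illustrative assumptions and are not part of the original study.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def index_of_dispersion(counts):
    """Return I = S^2 / Xbar, with I defined as 1 when Xbar = 0."""
    xbar = mean(counts)
    if xbar == 0:
        return 1.0
    # statistics.variance uses the n - 1 divisor, i.e. the unbiased S^2.
    return variance(counts) / xbar


def normal_critical_values(n, alpha):
    """Two-sided critical values (C1, C2) from the N(1, 2/n) approximation,
    with alpha/2 placed in each tail."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(2 / n)
    return 1 - half_width, 1 + half_width


# Hypothetical quadrat counts, used only for illustration.
counts = [2, 0, 3, 1, 4]
i_stat = index_of_dispersion(counts)
c1, c2 = normal_critical_values(len(counts), alpha=0.05)
reject = i_stat < c1 or i_stat > c2
print(i_stat, (c1, c2), reject)
```

With so small an $n$ the lower critical value is negative, so the lower tail can never reject; this is one symptom of the skewness problem examined in Chapter 2.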
To assess the "goodness" of the index of dispersion, we might be interested in determining how often we would correctly reject the null in repeated sampling. This is called the power of the test, the complement of this being the probability of making a type II error. With the negative binomial as an alternative to the Poisson, the power of I is then compared to the power of the Likelihood Ratio Test and Pearson's Goodness-of-fit test.

2. LARGE SAMPLE APPROXIMATIONS

2.1 THE JOINT DISTRIBUTION OF $\bar X$ AND $S^2$

Suppose we choose a random sample of $n$ disjoint equal-sized quadrats and make a count, $X_i$, of the number of plants in the $i$th quadrat. Let $X_1, \ldots, X_n$ be independent identically distributed random variables with mean $\mu$ and variance $\sigma^2$. Let $\mu_k = E[(X_1 - \mu)^k]$ and suppose that $\mu_4 < \infty$. In particular, $\mu_1 = 0$ and $\mu_2 = \sigma^2$. As a consequence of the Central Limit Theorem, we have

$$\sqrt{n}\,(\bar X - \mu) \xrightarrow{d} N(0, \mu_2), \tag{2.1}$$

$$\sqrt{n}\,(S^2 - \mu_2) \xrightarrow{d} N(0, \mu_4 - \mu_2^2). \tag{2.2}$$

These results can be found in Cramer (1946, pp. 345-348). Similarly, the Multivariate Central Limit Theorem implies that $\sqrt{n}(\bar X - \mu)$ and $\sqrt{n}(S^2 - \mu_2)$ converge jointly to a bivariate normal distribution with mean vector $0$ and variance-covariance matrix $\Sigma$, where

$$\Sigma = \begin{pmatrix} \mu_2 & \lim_{n\to\infty} n\,\mathrm{Cov}(\bar X, S^2) \\ \lim_{n\to\infty} n\,\mathrm{Cov}(\bar X, S^2) & \mu_4 - \mu_2^2 \end{pmatrix},$$

i.e.

$$\begin{pmatrix} \sqrt{n}\,(\bar X - \mu) \\ \sqrt{n}\,(S^2 - \mu_2) \end{pmatrix} \xrightarrow{d} N(0, \Sigma). \tag{2.3}$$

Assuming that $\mu = 0$, we have

$$\mathrm{Cov}(\bar X, S^2) = E(\bar X S^2) = \{n/(n-1)\}\{E(\bar X U_2) - E(\bar X^3)\},$$

where $U_2 = (1/n)\sum X_i^2$. Now

$$E(\bar X U_2) = (1/n^2)\,E\Big\{\sum_i X_i^3 + \sum_{i \ne j} X_i X_j^2\Big\} = \mu_3/n,$$

since by independence, the double sum has zero expectation. Similarly, $E(\bar X^3) = \mu_3/n^2$, from which it follows that

$$\mathrm{Cov}(\bar X, S^2) = \mu_3/n + O(1/n^2). \tag{2.4}$$

From (2.3) and (2.4) we have for large $n$ that

$$\begin{pmatrix} \bar X \\ S^2 \end{pmatrix} \approx N\left( \begin{pmatrix} \mu \\ \mu_2 \end{pmatrix},\ \frac{1}{n} \begin{pmatrix} \mu_2 & \mu_3 \\ \mu_3 & \mu_4 - \mu_2^2 \end{pmatrix} \right).$$

2.2 THE ASYMPTOTIC DISTRIBUTION OF I

We compute the asymptotic distribution of $S^2/\bar X$ using the "delta method".
It will be seen that the asymptotic distribution of I is the same as that of $S^2/\bar X$.

Let $g(x, y) = y/x$, so that $S^2/\bar X = g(\bar X, S^2)$. Assuming that $\partial g/\partial x$ and $\partial g/\partial y$ exist near the point $(\mu, \sigma^2)$ (note that this requires the assumption that $\mu > 0$), we can expand $g(\bar X, S^2)$ in a Taylor series about $(\mu, \sigma^2)$ and have

$$g(\bar X, S^2) = g(\mu, \sigma^2) + (\bar X - \mu)\,g_x(\mu, \sigma^2) + (S^2 - \sigma^2)\,g_y(\mu, \sigma^2) + \ldots$$

Let $U_{(n)} = (\bar X, S^2)'$ and $b = (\mu, \sigma^2)'$. Then

(i) $U_{(n)} \xrightarrow{p} b$;
(ii) $\sqrt{n}\,(U_{(n)} - b) \xrightarrow{d} N(0, \Sigma)$.

The result of the delta method (see, for example, T.W. Anderson (1958, pp. 75-77)) is that

$$\sqrt{n}\,\{(S^2/\bar X) - (\sigma^2/\mu)\} \xrightarrow{d} N(0, \phi_b' \Sigma \phi_b),$$

where $\phi_b' = (\partial g/\partial x, \partial g/\partial y)$ evaluated at $(\mu, \sigma^2)$. Under the stated assumptions, we have for large $n$,

$$S^2/\bar X \approx N\big(\sigma^2/\mu,\ (1/n)\,\phi_b' \Sigma \phi_b\big). \tag{2.5}$$

After some matrix calculations, we get

$$\phi_b' \Sigma \phi_b = \mu_2^3/\mu^4 - 2\mu_2\mu_3/\mu^3 + (\mu_4 - \mu_2^2)/\mu^2. \tag{2.6}$$

So far, all of the results hold regardless of the underlying distribution of the X's. If we now assume that $X_1, \ldots, X_n \sim P(\lambda)$, then $\mu = E(X) = \lambda$, $\mu_2 = \mathrm{Var}(X) = \lambda$, $\mu_3 = \lambda$ and $\mu_4 = 3\lambda^2 + \lambda$. Substituting these into (2.6), we have that

$$\phi_b' \Sigma \phi_b = (1/\lambda) - (2/\lambda) + \{(3\lambda^2 + \lambda) - \lambda^2\}/\lambda^2 = 2,$$

and hence from (2.5) that

$$S^2/\bar X \approx N(1, 2/n).$$

The asymptotic null distribution of I is easily seen to be the same as that of $S^2/\bar X$, since

$$P\{|I - (S^2/\bar X)| > \varepsilon\} = P\{\bar X = 0\} = e^{-n\lambda}$$

for any $\varepsilon > 0$. This probability approaches 0 as $n \to \infty$, and hence

$$I \approx N(1, 2/n).$$

Note that the $O(1/n)$ approximation to the variance of I is independent of the parameter $\lambda$. This would be useful in practice because the source of error in estimating $\lambda$ by the maximum likelihood estimator $\bar X$ would not have to be introduced. We should note however that the inclusion of higher order terms will introduce this dependence.
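The matrix calculation behind (2.6) can be checked numerically. The sketch below, in Python, evaluates the quadratic form $\phi_b' \Sigma \phi_b$ for $g(x, y) = y/x$ and confirms that it equals 2 for Poisson moments, whatever the value of $\lambda$. The function names are illustrative assumptions.

```python
def delta_method_variance(mu, mu2, mu3, mu4):
    """Evaluate phi' Sigma phi of (2.6) for g(x, y) = y / x at (mu, mu2)."""
    gx = -mu2 / mu ** 2          # dg/dx = -y / x^2 at (mu, mu2)
    gy = 1.0 / mu                # dg/dy = 1 / x at (mu, mu2)
    sigma = [[mu2, mu3],
             [mu3, mu4 - mu2 ** 2]]
    phi = [gx, gy]
    return sum(phi[i] * sigma[i][j] * phi[j]
               for i in range(2) for j in range(2))


def poisson_central_moments(lam):
    """Mean and central moments (mu, mu2, mu3, mu4) of a Poisson(lam)."""
    return lam, lam, lam, 3 * lam ** 2 + lam


for lam in (1.0, 3.0, 5.0, 8.0):
    print(lam, delta_method_variance(*poisson_central_moments(lam)))
```

Supplying the central moments of another distribution to `delta_method_variance` gives the corresponding value of (2.6), which in general does depend on the parameters.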
2.3 DESCRIPTION OF THE MONTE CARLO SIMULATION

To answer the question of how well the asymptotic critical values work, we performed a Monte Carlo simulation when the underlying distribution of the X's is Poisson. Fifteen thousand samples of $n$ Poisson random variables were generated for $n = 10, 20, 50, 100$ and for $\lambda = 1, 3, 5, 8$, and fifteen thousand indices of dispersion were computed for each pair $n$ and $\lambda$. The 1%, 2.5%, 5% and 10% quantiles for each pair $n$ and $\lambda$ are given in Table A2. With such a large number of samples, these critical values may be regarded as exact and they assist in assessing the accuracy of the asymptotic critical values.

Given a nominal significance level $\alpha$, two-sided rejection regions were constructed with $\alpha/2$ in each tail. Using the asymptotic normal critical values, the rejection regions used were the following:

$R_{.01} = \{I^*: |I^*| > 2.58\}$
$R_{.05} = \{I^*: |I^*| > 1.96\}$
$R_{.10} = \{I^*: |I^*| > 1.64\}$
$R_{.20} = \{I^*: |I^*| > 1.28\}$

where $I^* = (I - 1)/\sqrt{2/n}$ and where $R_\alpha$ denotes the rejection region at the nominal significance level $\alpha$.

To test the accuracy of the normal critical values, we merely count the number of $I^*$'s that fall in $R_\alpha$. This would then give us an estimate $\hat p$ of the true significance level $p$. Now,

$$\hat p = (\#\text{ of } I^*\text{'s} \in R_\alpha)/15{,}000.$$

Since the number of $I^*$'s $\in R_\alpha$ is binomially distributed (with parameters $N = 15{,}000$ and $p$), the standard error of $\hat p$ is

$$SE(\hat p) = \sqrt{p(1-p)/15{,}000}.$$

We then might conclude that the distribution of I is well approximated by the normal if $\hat p$ is within one standard error of the nominal significance level $\alpha$. To assist the reader in interpreting the results, we supply a list of how many $I^*$'s would be expected in each tail of the rejection region if the true significance level corresponding to each tail was identically equal to $\alpha/2$, one-half the nominal significance level.
Table 1: Expected Number of I*'s

   $\alpha$    $\{\alpha/2 \pm SE(\alpha/2)\} \cdot 15{,}000$
   0.01          75 ± 9
   0.05         375 ± 19
   0.10         750 ± 27
   0.20        1500 ± 37

The results are summarized in Tables 2(A-D) below. The entries in the "I<L" and "I>U" columns are the number of $I^*$'s that lie to the left and right of the lower and upper normal critical values, respectively.

NORMAL APPROXIMATION

Table 2A ($\alpha$ = 0.01)

          $\lambda$ = 1     $\lambda$ = 3     $\lambda$ = 5     $\lambda$ = 8
    n    I<L    I>U    I<L    I>U    I<L    I>U    I<L    I>U
   10      0    342      0    293      0    286      0    276
   20      0    280      0    243      0    227      0    213
   50      9    220      9    197     13    175     12    182
  100     19    184     27    145     22    134     26    [illegible]

Table 2B ($\alpha$ = 0.05)

          $\lambda$ = 1     $\lambda$ = 3     $\lambda$ = 5     $\lambda$ = 8
    n    I<L    I>U    I<L    I>U    I<L    I>U    I<L    I>U
   10      8    713      3    645      0    670      2    646
   20     50    678     74    633     76    607     72    587
   50    148    587    183    546    199    533    197    515
  100    199    538    248    506    244    490    [illegible]    481

Table 2C ($\alpha$ = 0.10)

          $\lambda$ = 1     $\lambda$ = 3     $\lambda$ = 5     $\lambda$ = 8
    n    I<L    I>U    I<L    I>U    I<L    I>U    I<L    I>U
   10    114    951    142    974    145   1011    149   1007
   20    283   1010    368    964    345    983    359    986
   50    457    962    519    909    527    938    541    903
  100    554    943    570    872    570    859    592    862

Table 2D ($\alpha$ = 0.20)

          $\lambda$ = 1     $\lambda$ = 3     $\lambda$ = 5     $\lambda$ = 8
    n    I<L    I>U    I<L    I>U    I<L    I>U    I<L    I>U
   10    596   1520    962   1571    953   1633    958   1607
   20   1008   1447   1180   1627   1212   1643   1221   1648
   50   1216   1612   1338   1589   1334   1587   1324   1584
  100   1386   1528   1377   1522   1322   1576   1355   1564

We immediately notice that the normal approximation is very poor even for $n$ as large as 50. The lower critical values are much too conservative, while the upper critical values are much too liberal. For the cases $n \ge 50$, the total number of $I^*$'s in the rejection region is close to the total number we would expect to be rejected, but the significance level in each tail is nowhere near $\alpha/2$. The probability of falsely rejecting the null would be too low in the lower tail and too high in the upper tail. There is obviously a problem of skewness in the distribution. Too many observations lie in the right tail, implying that the distribution of I is positively skewed.
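The tail counts in Tables 2(A-D) come from a simulation of the kind sketched below in Python. This is only a minimal illustration under stated assumptions: the Poisson sampler is Knuth's multiplication method (adequate for the small $\lambda$ considered here), the seed and the reduced number of samples are arbitrary, and none of the names come from the original program, so the counts will not reproduce the tabulated values.

```python
import random
from math import exp, sqrt
from statistics import NormalDist, mean, variance


def poisson(rng, lam):
    """Draw one Poisson(lam) variate by Knuth's multiplication method."""
    limit = exp(-lam)
    k = 0
    p = rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def standardized_index(counts):
    """I* = (I - 1) / sqrt(2/n), with I = 1 when the sample mean is 0."""
    n = len(counts)
    xbar = mean(counts)
    i_stat = 1.0 if xbar == 0 else variance(counts) / xbar
    return (i_stat - 1.0) / sqrt(2.0 / n)


def simulate_tail_counts(n, lam, alpha, n_samples, seed=0):
    """Count the I*'s below the lower and above the upper normal critical value."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower = upper = 0
    for _ in range(n_samples):
        i_star = standardized_index([poisson(rng, lam) for _ in range(n)])
        if i_star < -z:
            lower += 1
        elif i_star > z:
            upper += 1
    return lower, upper


lower, upper = simulate_tail_counts(n=10, lam=3.0, alpha=0.05, n_samples=2000)
print(lower, upper)
```

For small $n$ the upper count should dominate the lower count, in line with the positive skewness noted above.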
Notice that for fixed $\lambda$ and increasing $n$, the number of $I^*$'s rejected in each tail becomes more equal. However, even for $n = 100$, the lower critical values are still conservative at all significance levels while the upper critical values are too liberal. For fixed $n$ and increasing $\lambda$ on the other hand, no such pattern is obvious. Thus, it appears that the normal approximation is really only satisfactory for $n > 100$ and this will not suffice for practical work. An examination of the probability plots and histograms (see Appendix A2.1) provides more detail.

One might hope to find an improvement to this approximation, and one approach taken to improve the approximation is through infinite series expansions (e.g. Edgeworth, Gram-Charlier, Fisher-Cornish expansions). The trade-off for having such an improvement is the requirement of higher order moments, and these higher order moments will surely have a dependence on $\lambda$. More of this will be seen in later chapters. For the moment, we abandon the normal approximation and move on to another simple large sample approximation.

2.4 THE $\chi^2$ APPROXIMATION

As seen in section 2.2, the probability under the null that I and $S^2/\bar X$ differ by an amount bigger than $\varepsilon$ ($\varepsilon > 0$) is $e^{-n\lambda}$, which approaches 0 as $n \to \infty$. It is therefore sufficient to consider an approximation to the distribution of $S^2/\bar X$.

At first glance, we might suspect that $S^2/\bar X$ has some relationship with the $\chi^2$ distribution, for it is well known that if $X_1, \ldots, X_n$ is a random sample from the normal distribution with mean $\mu$ and variance $\sigma^2$, then

$$(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}.$$

In our case, the X's are Poisson and $\mathrm{Var}(X)$ is only approximated by $\hat\sigma^2 = \bar X$. However, it would not be surprising that the null distribution of $(n-1)S^2/\bar X$ could be well approximated by $\chi^2_{n-1}$ for large $n$. A clearer motivation of this is outlined below.

Consider the following one-way contingency table:

   $O_j$:   $X_1$      $X_2$      ...   $X_n$      |  $X_\cdot$
   $E_j$:   $\bar X$   $\bar X$   ...   $\bar X$   |  $n\bar X$

The entries in the cells of the first row are just the observed counts themselves, having row total $X_\cdot$, and the entries in the second row are the estimated expected counts, $\bar X$. (Note that this contingency table differs from the ordinary contingency table where observations are free to fall in any one cell. In our contingency table, we have one cell for each count. However, if we considered only those sampling experiments that produced the same order of experimental results in addition to the same marginal totals, the methods of the ordinary contingency tables still apply.)

The goodness-of-fit statistic is formed by summing up over the columns the square of the difference between the observed and the expected values, divided by the expected value. This gives us

$$\sum_{i=1}^{n} (X_i - \bar X)^2/\bar X,$$

which is precisely $(n-1)S^2/\bar X$. Providing the $E_j$'s are not too small (for example, $E_j > 5$ for all $j$), the distribution of the goodness-of-fit statistic might be expected to be well approximated by $\chi^2_{n-1}$ for large $n$.

This motivation is due to P.G. Hoel (1943). In his paper, he approximated the moments of $S^2/\bar X$ under the null hypothesis by power series expansions, correct to $O(1/n^3)$, and showed that the first four moments of $(n-1)S^2/\bar X$ were in close agreement with those of the $\chi^2_{n-1}$ distribution.

2.5 DISCUSSION

Returning now to the simulation study, we recall that since the normal distribution is symmetric, it could not account for the skewness of the distribution of I. On the other hand, since the $\chi^2_{n-1}$ distribution is skewed, one might expect it to perform better than the normal approximation. So as not to obscure the comparison of the two approximations, the same 15,000 samples generated for each case were used. The results are displayed in Tables 3(A-D).

$\chi^2$ APPROXIMATION

Table 3A ($\alpha$ = 0.01)

          $\lambda$ = 1     $\lambda$ = 3     $\lambda$ = 5     $\lambda$ = 8
    n    I<L    I>U    I<L    I>U    I<L    I>U    I<L    I>U
   10     44     72     59     81     77     83     77     60
   20     39    100     69     93     63     76     66     78
   50     50     93     69     93     73     96     75     82
  100     50    102     72     72     72     67     77     69

Table 3B ($\alpha$ = 0.05)

          $\lambda$ = 1     $\lambda$ = 3     $\lambda$ = 5     $\lambda$ = 8
    n    I<L    I>U    I<L    I>U    I<L    I>U    I<L    I>U
   10    182    342    356    354    379    356    385    360
   20    213    408    336    385    340    365    359    360
   50    285    408    350    405    363    380    373    365
  100    309    419    349    396    359    373    370    367

Table 3C ($\alpha$ = 0.10)

          $\lambda$ = 1     $\lambda$ = 3     $\lambda$ = 5     $\lambda$ = 8
    n    I<L    I>U    I<L    I>U    I<L    I>U    I<L    I>U
   10    587    713    633    707    730    733    777    720
   20    537    741    725    [illegible]
   50    607    774    [illegible]
  100    676    807    [illegible]

The remaining entries of Table 3C could not be assigned to cells; in source order they read: 729 775 796 693 757 704 791 750 708 684 752 765 738 709 756 756 733.

Table 3D ($\alpha$ = 0.20)

          $\lambda$ = 1     $\lambda$ = 3     $\lambda$ = 5     $\lambda$ = 8
    n    I<L    I>U    I<L    I>U    I<L    I>U    I<L    I>U
   10    968   1083   1383   1429   1464   1478   1489   1469
   20   1225   1350   1478   1504   1468   1507   1517   1539
   50   1463   1534   1448   1528   1457   1505   1468   1495
  100   1440   1489   1432   1472   1411   1508   1454   1502

The $\chi^2$ approximation clearly gives a better fit to the null distribution of I than the normal. Most of the entries in the cells fall within the range of values that one would expect to see. Notice that these tables display a similar pattern, namely that symmetry between the "I<L" and "I>U" columns becomes more apparent with increasing $n$ and fixed $\lambda$ (and with increasing $\lambda$ and fixed $n$). This pattern appears with increasing $\alpha$ too. However, there seems to be more room for improvement for the cases $n \le 20$ and $\lambda < 3$. In fact, even for $n = 100$ and $\lambda = 1$, the lower critical values tend to be too conservative while the upper critical values tend to lead to rejection too often. (In the ecological context however, this would cause no serious problems. One can simply take larger quadrats to ensure that the mean number of plants in each quadrat is larger than 1.) For the cases $n \ge 20$ and $\lambda \ge 3$, the $\chi^2$ approximation gives a very reasonable approximation to the null distribution of I, and leads to a pleasantly simple method of constructing rejection regions.
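The identity between the goodness-of-fit statistic and $(n-1)S^2/\bar X$, and the resulting $\chi^2_{n-1}$ test, can be sketched in Python as follows. Since the standard library has no $\chi^2$ quantile function, the sketch substitutes the Wilson-Hilferty approximation for the $\chi^2_{n-1}$ critical values; that substitution, the function names and the example counts are assumptions made for illustration and are not taken from the original study.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def dispersion_chi_square(counts):
    """(n - 1) * I, where I = S^2 / Xbar and I = 1 when Xbar = 0."""
    n = len(counts)
    xbar = mean(counts)
    if xbar == 0:
        return float(n - 1)
    return (n - 1) * variance(counts) / xbar


def goodness_of_fit(counts):
    """Sum over cells of (observed - expected)^2 / expected, expected = Xbar.
    Requires Xbar > 0."""
    xbar = mean(counts)
    return sum((x - xbar) ** 2 / xbar for x in counts)


def chi_square_quantile_wh(df, p):
    """Approximate chi-square quantile (Wilson-Hilferty cube-root formula)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * sqrt(c)) ** 3


# Hypothetical quadrat counts, used only for illustration.
counts = [2, 0, 3, 1, 4, 2, 5, 1, 3, 2]
stat = dispersion_chi_square(counts)
lower = chi_square_quantile_wh(len(counts) - 1, 0.025)
upper = chi_square_quantile_wh(len(counts) - 1, 0.975)
print(stat, goodness_of_fit(counts), (lower, upper), stat < lower or stat > upper)
```

In practice an exact $\chi^2$ quantile routine or a printed table would replace `chi_square_quantile_wh`; the cube-root formula is used here only to keep the sketch self-contained.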
As stated in the previous section, one way of improving these large sample approximations is through an infinite series expansion of the true density of I. Another technique commonly used is approximation by Pearson curves, which will require the first four moments of I.

3. PEARSON CURVES

3.1 THE THEORY OF PEARSON CURVES

The family of distributions that satisfies the differential equation

$$d(\log f)/dx = (x - a)/(b_0 + b_1 x + b_2 x^2) \tag{3.1}$$

are known as Pearson curves. Under regularity conditions, the constants $a$, $b_0$, $b_1$ and $b_2$ can be expressed in terms of the first four moments of the distribution $f$; see Kendall and Stuart (1958, vol. 1, p. 149). Karl Pearson (1901) identified 12 types of distributions, each of which is completely determined by the first four moments of $f$.

It is convenient to rewrite the denominator as

$$B_0 + B_1(x - a) + B_2(x - a)^2$$

for suitably chosen constants $B_0$, $B_1$ and $B_2$, and hence (3.1) may be written as

$$d(\log f)/dx = (x - a)/\{B_0 + B_1(x - a) + B_2(x - a)^2\}. \tag{3.2}$$

As in (3.1), the constants in (3.2) are functions of the first four moments of $f(x)$. By integrating the right-hand side of (3.2), an explicit expression can be obtained for $f(x)$. The criterion for determining which type of Pearson curve results is obtained from the discriminant of the denominator in (3.2). This criterion is given by:

$$K = B_1^2/(4 B_0 B_2). \tag{3.3}$$

Defining $\beta_1 = \mu_3^2/\mu_2^3$ and $\beta_2 = \mu_4/\mu_2^2$, where the $\mu_i$'s are the central moments for $f(x)$, the constants $B_0$, $B_1$ and $B_2$ can be expressed in terms of $\beta_1$ and $\beta_2$. The criterion $K$ then becomes

$$K = \beta_1(\beta_2 + 3)^2/\{4(2\beta_2 - 3\beta_1 - 6)(4\beta_2 - 3\beta_1)\}. \tag{3.4}$$

For example, a value of $K < 0$ gives Pearson's type I curve, also called the Beta distribution of the first kind. In this case,

$$f(x) = k\,x^{p-1}(1 - x)^{q-1}, \quad \text{for } 0 \le x \le 1,$$

where the constants $k$, $p$ and $q$ are functions of the first four moments. If $1 < K < \infty$, then we get Pearson's type VI curve, also known as the Beta distribution of the second kind.
Here, H x ) = k x p - 1 / U + x ) p + q , for 0<x<°°. The following is a summary of the steps one would take when approximating by Pearson curves. ' Let g(X_) be a s ta t i s t i c whose null distribution we wish to approximate by Pearson curves. The f i r s t step is to compute the f i r s t four moments of g(X) which wi l l depend on the para-meters of the null distr ibution of the X's ( i f the parameters are not specified, they may be estimated by the maximum l ikel ihood). Then &i and 6 2 • can be computed from the moments. From here, either one of two routes can be taken. If c r i t i c a l values are a l l that are required, then the Biometrika tables pub-21 1ished by Pearson and Hartley (1966) can be used. The c r i t i c a l va l -ues are tabulated for a wide range of values of /Bi and 8 2 , and i f necessary, l inear interpolation along rows and columns is suf f ic ient . We should note that when using c r i t i c a l values from the Biometrika tables, one should keep in mind that those c r i t i ca l values are a l l standardized. So i f X denotes the ot level c r i t i ca l value from the ct tables, then the. appropriate a level c r i t i ca l value to use for the test is x = (/M!)X + ji. a a H The other alternative is to compute K from (3.4) and determine the type of Pearson curve to be used. If the resulting distribution is not too uncommon, the parameters of the distribution can be com-puted. The text by W.P. Elderton and N.L. Johnson (1964, pp.35-46) gives an excellent treatment of this situation. Once the Pearson curve is completely determined, c r i t i ca l values can usually be ob-tained from the computer. In particular, the IMSL l ibrary provides c r i t i ca l values for a wide class of distributions. 3.2 TWO EXAMPLES Before applying Pearson curves as an approximation to the null distribution of I, we discuss brief ly two of the examples from the paper by H. Solomon and M. 
Stephens (1978), where the accuracy of critical values obtained from a Pearson curve fit is examined.

Example 1: Let

$$Q_n(c, a) = \sum_{i=1}^{n} c_i (X_i + a_i)^2,$$

where $c = (c_1, \ldots, c_n)'$ and $a = (a_1, \ldots, a_n)'$ are vectors with constant components, and the $X_i$'s come from a standard normal distribution. The exact moments of this statistic are known to any order. In fact, the $r$th cumulant $\kappa_r$ is

$$\kappa_r = 2^{r-1}(r-1)! \sum_{i=1}^{n} c_i^r (1 + r a_i^2).$$

There is a long literature on obtaining the critical values for different combinations of $n$, $c$ and $a$. These critical values are tabulated in Grad and Solomon (1955) and in Solomon (1960). Much mathematical analysis was used to obtain these critical values, and an extensive amount of numerical computation was carried out so accurately that all the critical values can be regarded as exact.

Pearson curve fits were obtained for different values of the constants and critical values were obtained by quadratic interpolation from Biometrika tables. The results were that the Pearson curve critical values agreed very closely with the exact critical values for the upper tail, but there was no close agreement at all between the two critical values in the lower tail of the distribution.

Now, Pearson curves can also be obtained when the first three moments and a left endpoint of the distribution are known. (See for example, R.H. Muller and H. Vahl (1976) and A.B. Hoadley (1968).) Solomon and Stephens proceeded to do this three-moment fit and found that the fit in the lower tail was improved considerably, but this approach made the fit in the upper tail less accurate. They point out however that whenever four moments are available, then these four moments should be used for the fit, as it is the upper tail of the distribution that is usually of more importance in practice.
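As an illustration of how the cumulant formula of Example 1 feeds into the criterion (3.4), the following Python sketch computes $\kappa_r$, converts the cumulants to $\beta_1$ and $\beta_2$ (using $\mu_2 = \kappa_2$, $\mu_3 = \kappa_3$ and $\mu_4 = \kappa_4 + 3\kappa_2^2$), and evaluates $K$. The function names and the choice $c = (1, 2)'$, $a = (0, 0)'$ are illustrative assumptions, not values taken from Solomon and Stephens.

```python
from math import factorial


def cumulant(r, c, a):
    """r-th cumulant of Q_n(c, a) = sum of c_i * (X_i + a_i)^2, X_i ~ N(0, 1)."""
    return (2 ** (r - 1) * factorial(r - 1)
            * sum(ci ** r * (1 + r * ai ** 2) for ci, ai in zip(c, a)))


def beta_coefficients(c, a):
    """beta_1 = mu_3^2 / mu_2^3 and beta_2 = mu_4 / mu_2^2 for Q_n(c, a)."""
    k2, k3, k4 = (cumulant(r, c, a) for r in (2, 3, 4))
    beta1 = k3 ** 2 / k2 ** 3
    beta2 = (k4 + 3 * k2 ** 2) / k2 ** 2   # mu_4 = kappa_4 + 3 * kappa_2^2
    return beta1, beta2


def pearson_criterion(beta1, beta2):
    """Criterion K of (3.4); returns infinity when the denominator vanishes."""
    denom = 4 * (2 * beta2 - 3 * beta1 - 6) * (4 * beta2 - 3 * beta1)
    if denom == 0:
        return float("inf")
    return beta1 * (beta2 + 3) ** 2 / denom


b1, b2 = beta_coefficients([1, 2], [0, 0])
print(b1, b2, pearson_criterion(b1, b2))
```

With all $c_i = 1$ and $a_i = 0$, $Q_n$ is a $\chi^2_n$ variable, for which $2\beta_2 - 3\beta_1 - 6 = 0$; the sketch returns infinity in that boundary case rather than dividing by zero.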
Of course, in our case, depending on whether we are concerned with clumping or regular spacing of plants, either tail of the distribution might be of interest.

Example 2: Solomon and Stephens considered the statistic U = R/S, where R is the range and S the standard deviation of a sample from the standard normal distribution. For n = 3, the density of U is known:

\[ f(u) = (3/\pi)\{1 - (u^2/4)\}^{-1/2}, \qquad \sqrt{3} < u < 2 . \]

The Pearson curve turned out to be a Beta distribution of the first kind and had the form

\[ g(u) = 0.9573 / \{(u - 1.7324)^{0.0101}\,(2.000 - u)^{0.4970}\}, \qquad 1.7324 \le u < 2.000 . \]

First we notice that while the true distribution of U is bell-shaped, the Pearson curve is U-shaped. However, the Pearson curve fit gave the correct left and right endpoints of the distribution, at least to three decimal places. Finally, Solomon and Stephens found that the Pearson curve critical values agreed very well with the exact critical values in both the lower and upper tails of the distribution. Given that the Pearson curve is U-shaped, the accurate fit obtained in both tails of the distribution is extremely surprising.

These two examples illustrate the usefulness of Pearson curves as a means of approximating perhaps not so much the distribution as the critical values.

3.3 THE FIRST FOUR MOMENTS OF I

Computing the first four moments of I is no easy task since I involves a random variable in its denominator, namely $\bar{X}$. However, we can express the expectation of $I^k$, for k = 1,2,3,4, as the expectation of the conditional expectation given the total $X_\cdot = \sum_{i=1}^{n} X_i$ (or given $\bar{X}$). Now

\[ I^k = \begin{cases} 1 & \text{if } \bar{X} = 0 \\ (S^2/\bar{X})^k & \text{if } \bar{X} > 0 . \end{cases} \]

It follows that

\[ E(I^k \mid X_\cdot) = \begin{cases} 1 & \text{if } X_\cdot = 0 \\ E\{(S^2/\bar{X})^k \mid X_\cdot\} & \text{if } X_\cdot > 0 \end{cases} \;=\; \mathbf{1}\{X_\cdot = 0\} + E\{(S^2/\bar{X})^k \mid X_\cdot\}\,\mathbf{1}\{X_\cdot > 0\}, \]

where $\mathbf{1}\{A\}$ is an indicator function equalling 1 if A is true and 0 otherwise.
Hence

\[ \mu_k' = E(I^k) = E\{E(I^k \mid X_\cdot)\} = P(X_\cdot = 0) + \sum_{j=1}^{\infty} E\{(S^2/\bar{X})^k \mid X_\cdot = j\}\,P(X_\cdot = j) = P(X_\cdot = 0) + E^{+}\{E[(S^2/\bar{X})^k \mid X_\cdot]\}, \tag{3.5} \]

where $E^{+}$ denotes an expectation over the marginal distribution of $X_\cdot$ restricted to the positive values of $X_\cdot$. Thus we may write the kth moment of I as

\[ \mu_k' = P(X_\cdot = 0) + E^{+}\{(1/\bar{X})^k\,E[(S^2)^k \mid \bar{X}]\}. \tag{3.6} \]

Hopefully, the conditional expectation, which will depend on $\bar{X}$, might cancel off the $\bar{X}^k$ in the denominator, and hence computation of the unconditional expectation will be relatively easy.

Another difficulty arises in computing the conditional expectation itself, which involves expanding

\[ (S^2)^k = \Big\{ \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \Big\}^k, \qquad k = 1,2,3,4 . \]

The conditional expectation of this random variable will involve moments and product moments of $(X_1, X_2, \ldots, X_n) \mid X_\cdot$ up to the eighth order, and hence if we choose to compute moments from the moment generating function, we would require mixed partial derivatives of the moment generating function of $(X_1, X_2, \ldots, X_n) \mid X_\cdot$ up to the eighth order. Fortunately, when the underlying distribution of the $X_i$'s is Poisson, the distribution of $(X_1, X_2, \ldots, X_n) \mid X_\cdot$ has a fairly simple form, reducing to a multinomial distribution with parameters $X_\cdot$ and $p_i = 1/n$, for i = 1,2,...,n; i.e.

\[ (X_1, X_2, \ldots, X_n) \mid X_\cdot \sim \mathrm{Mult}(X_\cdot, 1/n, \ldots, 1/n). \]

This result, the derivation of which is provided in Appendix A1.1, facilitates the derivation of the conditional moments $E\{(S^2/\bar{X})^k \mid X_\cdot = x_\cdot\}$ for $x_\cdot > 0$. These are provided in equations (A1.3)-(A1.
6) of Appendix A1.2 and lead via (3.5) to the following expressions for the first four raw moments of I, the index of dispersion:

\[
\begin{aligned}
\mu_1' &= P(X_\cdot = 0) + P(X_\cdot > 0) = 1, \\
\mu_2' &= P(X_\cdot = 0) + \frac{n+1}{n-1}\,P(X_\cdot > 0) - \frac{2}{n(n-1)}\,E^{+}(1/\bar{X}), \\
\mu_3' &= P(X_\cdot = 0) + \frac{(n+1)(n+3)}{(n-1)^2}\,P(X_\cdot > 0) + \frac{1}{(n-1)^2}\Big\{ -\frac{4}{n}\Big[1 - \frac{6}{n}\Big] E^{+}(1/\bar{X}^2) - 2\Big[1 + \frac{13}{n}\Big] E^{+}(1/\bar{X}) \Big\}, \\
\mu_4' &= P(X_\cdot = 0) + \frac{(n+1)(n+3)(n+5)}{(n-1)^3}\,P(X_\cdot > 0) \\
&\quad + \frac{1}{(n-1)^3}\Big\{ 4n\Big[1 + \frac{5}{n}\Big]\Big[1 - \frac{17}{n}\Big] E^{+}(1/\bar{X}) - \frac{4}{n^2}(2n^2 + 53n - 261)\,E^{+}(1/\bar{X}^2) - \frac{8}{n^3}(n^2 - 30n + 90)\,E^{+}(1/\bar{X}^3) \Big\}.
\end{aligned}
\tag{3.7}
\]

From these expressions, we see that we have not overcome the problem of evaluating the expectations $E^{+}(1/\bar{X}^k)$ for k = 1,2,3. We have taken two approaches in evaluating these expectations. The practical approach is to express these expectations as integrals, and evaluate these integrals by asymptotic expansions. This is done in Appendix A1.2; the resulting expressions for the raw moments, correct to $O(1/n^4)$, are provided in equation (A1.7). The central moments, correct to $O(1/n^4)$, are then immediate:

\[
\begin{aligned}
\mu &= 1 \quad \text{(exact)}, \\
\mu_2 &\sim \frac{2}{n} + \frac{2}{n^2}\Big\{1 - \frac{1}{\lambda}\Big\} + \frac{2}{n^3}\Big\{1 - \frac{1}{\lambda} - \frac{1}{\lambda^2}\Big\} + \frac{2}{n^4}\Big\{1 - \frac{1}{\lambda} - \frac{1}{\lambda^2} - \frac{2}{\lambda^3}\Big\} + O(1/n^5), \\
\mu_3 &\sim \frac{1}{n^2}\Big\{8 + \frac{4}{\lambda}\Big\} + \frac{1}{n^3}\Big\{16 - \frac{24}{\lambda}\Big\} + \frac{1}{n^4}\Big\{24 - \frac{52}{\lambda} - \frac{8}{\lambda^2} - \frac{4}{\lambda^3}\Big\} + O(1/n^5), \\
\mu_4 &\sim \frac{12}{n^2} + \frac{1}{n^3}\Big\{72 + \frac{72}{\lambda} + \frac{8}{\lambda^2}\Big\} + \frac{1}{n^4}\Big\{180 - \frac{240}{\lambda} - \frac{228}{\lambda^2} + \frac{16}{\lambda^3}\Big\} + O(1/n^5).
\end{aligned}
\tag{3.8}
\]

Using the definitions of $\beta_1 = \mu_3^2/\mu_2^3$ and $\beta_2 = \mu_4/\mu_2^2$, we can also express these in a similar expansion:

\[ \beta_1 \sim \frac{2}{n}\Big\{2 + \frac{1}{\lambda}\Big\}^2 + \frac{2}{n^2}\Big\{4 - \frac{16}{\lambda} - \frac{3}{\lambda^2} + \frac{3}{\lambda^3}\Big\} + \frac{2}{n^3}\Big\{4 - \frac{16}{\lambda} - \frac{7}{\lambda^2} - \frac{17}{\lambda^3} + \frac{7}{\lambda^4}\Big\} + \cdots \tag{3.9} \]

\[ \beta_2 \sim 3 + \frac{2}{n}\Big\{6 + \frac{12}{\lambda} + \frac{1}{\lambda^2}\Big\} + \frac{2}{n^2}\Big\{6 - \frac{36}{\lambda} - \frac{5}{\lambda^2} + \frac{4}{\lambda^3}\Big\} + \cdots \tag{3.10} \]

The accuracy of these approximations can be assessed by computing the moments "exactly". By this, we mean computing the moments to a reasonable degree of accuracy. To do this, we can approximate the infinite series in the expressions for the exact moments by partial sums $S_N$, where N, the number of terms in the partial sum, is chosen so that the difference between the true and approximated values is no bigger than $10^{-6}$, say. A geometric bound on the error is shown below.

Let

\[ S = (1/n)\,E^{+}(1/\bar{X}) = \sum_{k=1}^{\infty} \frac{1}{k}\,\frac{e^{-\theta}\theta^k}{k!}, \qquad \text{where } \theta = n\lambda . \]

We want to determine N such that $S - S_N \le 10^{-6}$. Now,

\[
\begin{aligned}
S - S_N &= \sum_{k=N+1}^{\infty} \frac{1}{k}\,\frac{e^{-\theta}\theta^k}{k!} \;\le\; \sum_{k=N+1}^{\infty} \frac{e^{-\theta}\theta^k}{k!} \\
&= \frac{e^{-\theta}\theta^{N+1}}{(N+1)!}\Big\{1 + \frac{\theta}{N+2} + \frac{\theta^2}{(N+2)(N+3)} + \frac{\theta^3}{(N+2)(N+3)(N+4)} + \cdots \Big\} \\
&\le \frac{e^{-\theta}\theta^{N+1}}{(N+1)!}\Big\{1 + \frac{\theta}{N} + \Big(\frac{\theta}{N}\Big)^2 + \Big(\frac{\theta}{N}\Big)^3 + \cdots \Big\} \\
&= \frac{e^{-\theta}\theta^{N+1}}{(N+1)!}\cdot\frac{1}{1 - (\theta/N)}, \qquad \text{if } \theta < N .
\end{aligned}
\]

We therefore want to choose N so that (i) $N > n\lambda$ and (ii) the bound above is at most $10^{-6}$, which guarantees $S - S_N \le 10^{-6}$. We note that this same value of N can be used for $E^{+}(1/\bar{X}^2)$ and $E^{+}(1/\bar{X}^3)$ since convergence is faster in these cases.

The importance of a good asymptotic expansion is clear; one would not want to compute partial sums when fitting by Pearson curves. While computation of "exact" moments may be relatively inexpensive for small values of $\theta$, it can get quite expensive for larger values of $\theta$, and furthermore, overflow problems will occur in these cases. The accuracy of the asymptotic expansions is discussed in the next section.

3.4 DISCUSSION

Using the known values of $\lambda$, we can compute exact and asymptotic moments and hence obtain two Pearson curve fits for the simulated data.
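The truncation rule of Section 3.3 for the "exact" moments can be sketched as follows. This is an illustrative sketch rather than the program used for the thesis: the function names are hypothetical, and working on the log scale (via `math.lgamma`) is an assumption introduced here to sidestep the overflow problems mentioned above.

```python
import math

def tail_bound(theta, N):
    """Geometric bound e^{-theta} theta^{N+1}/(N+1)! / (1 - theta/N); needs N > theta."""
    log_first_term = -theta + (N + 1) * math.log(theta) - math.lgamma(N + 2)
    return math.exp(log_first_term) / (1.0 - theta / N)

def choose_N(theta, tol=1e-6):
    """Smallest N > theta whose geometric bound is at most tol."""
    N = math.floor(theta) + 1          # condition (i): N > theta = n * lambda
    while tail_bound(theta, N) > tol:  # condition (ii): truncation error <= tol
        N += 1
    return N

def e_plus_inverse_power(theta, r, tol=1e-6):
    """Partial sum of sum_{k>=1} k^{-r} e^{-theta} theta^k / k!, i.e. E+(1/X.^r)."""
    N = choose_N(theta, tol)
    total = 0.0
    for k in range(1, N + 1):
        log_pk = -theta + k * math.log(theta) - math.lgamma(k + 1)  # log Poisson pmf
        total += math.exp(log_pk) / k ** r
    return total

# Example: n = 10, lambda = 1, so theta = 10.
s1 = e_plus_inverse_power(10.0, 1)     # approximates (1/n) E+(1/Xbar)
```

Since $\bar{X} = X_\cdot/n$, the quantity $E^{+}(1/\bar{X}^r)$ needed in (3.7) is $n^r$ times the value returned for power r. The same N is reused for r = 2 and r = 3 because those series are dominated term by term by the r = 1 series.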
The Pearson curves obtained are the following:

(i) Using asymptotic moments (up to the fourth order), a type IV fit was obtained for the case $\lambda = 1$ (for all n) and a type VI for all other cases.

(ii) Using exact moments, the same types were obtained except for the case n = 10 and $\lambda = 1$, where the fit turned out to be a type VI.

The type IV Pearson curve is not a common distribution. Its density has the form

\[ f(x) = k\,\{1 + (x^2/a^2)\}^{-m} \exp\{-b \arctan(x/a)\}. \]

Since the critical values from this distribution cannot be obtained from the IMSL library, all the type IV critical values were obtained from Biometrika tables (Pearson and Hartley (1966)). However, the critical values from a type VI curve can be obtained from IMSL. The algorithm for determining the form of the density is outlined in Appendix A1.3.

Using the same 15,000 samples for a given n and $\lambda$, a Monte Carlo study was done to assess the Pearson curve fits. The results of the study are presented in Tables 4(A-D) and Tables 5(A-D).

As seen from the tables, the numbers of rejections from a Pearson curve fit are close indeed to the ideal numbers of rejections listed in Table 1. The cases of main concern ($n \le 20$ and $\lambda < 3$) seem to be satisfactory except for the case n = 10 and $\lambda = 1$, where the critical values tend to reject too often. However, a definite improvement over the $\chi^2$ approximation is clearly present for these cases. While the lower $\chi^2$ critical values tend to be too conservative, the Pearson curve fit has corrected for this; however, it has over-corrected, as the lower critical values of the Pearson curves now tend to be too liberal. This is apparent at all the significance levels considered.

Note how similar the tables obtained using exact and asymptotic moments are. This indicates that the asymptotic expressions for the moments, when used up to the fourth order, are fairly accurate.
The excellent performance of the asymptotic moments is indeed encouraging for their use in applications. Even for n as low as 10, the asymptotic values of $\mu_2$, $\mu_3$ and $\mu_4$ were correct to the first 4, 3 and 2 decimal places, respectively.

In Tables 4 and 5, each pair of columns gives, for the stated value of $\lambda$, the number of samples with I below the lower critical value (I<L) and above the upper critical value (I>U). A "?" marks an entry that is illegible in the source; in a row with fewer legible entries than columns, the placement of the "?" is inferred and uncertain.

PEARSON CURVE FIT WITH EXACT MOMENTS

Table 4A ($\alpha$ = 0.01)

| n | λ=1 I<L | λ=1 I>U | λ=3 I<L | λ=3 I>U | λ=5 I<L | λ=5 I>U | λ=8 I<L | λ=8 I>U |
|---|---|---|---|---|---|---|---|---|
| 10 | 109 | 72 | 59 | 77 | 69 | 74 | 76 | 59 |
| 20 | 67 | 64 | 71 | 82 | 70 | 75 | 69 | 76 |
| 50 | 87 | 63 | 85 | 84 | ? | 84 | 82 | 79 |
| 100 | 64 | 75 | 84 | 63 | 79 | 65 | 79 | 67 |

Table 4B ($\alpha$ = 0.05)

| n | λ=1 I<L | λ=1 I>U | λ=3 I<L | λ=3 I>U | λ=5 I<L | λ=5 I>U | λ=8 I<L | λ=8 I>U |
|---|---|---|---|---|---|---|---|---|
| 10 | 587 | 410 | 384 | 399 | 384 | 376 | 391 | 365 |
| 20 | 401 | 407 | 384 | 372 | 364 | 362 | 381 | 360 |
| 50 | 401 | 371 | 383 | 397 | 383 | 371 | 384 | 352 |
| 100 | 389 | 393 | 366 | 380 | 379 | 368 | 380 | 364 |

Table 4C ($\alpha$ = 0.10)

| n | λ=1 I<L | λ=1 I>U | λ=3 I<L | λ=3 I>U | λ=5 I<L | λ=5 I>U | λ=8 I<L | λ=8 I>U |
|---|---|---|---|---|---|---|---|---|
| 10 | 596 | 821 | 758 | 747 | 780 | 760 | 790 | 744 |
| 20 | 682 | 785 | 786 | 778 | 769 | 792 | 779 | 763 |
| 50 | 751 | 796 | 761 | 770 | 733 | 762 | 774 | 755 |
| 100 | 759 | 763 | 728 | 749 | 703 | 735 | 724 | 730 |

Table 4D ($\alpha$ = 0.20)

| n | λ=1 I<L | λ=1 I>U | λ=3 I<L | λ=3 I>U | λ=5 I<L | λ=5 I>U | λ=8 I<L | λ=8 I>U |
|---|---|---|---|---|---|---|---|---|
| 10 | 1636 | 1514 | 1578 | 1510 | 1549 | 1519 | 1519 | 1504 |
| 20 | 1424 | 1447 | 1520 | 1534 | 1547 | 1537 | 1535 | 1543 |
| 50 | 1576 | 1561 | 1531 | 1538 | 1476 | 1511 | 1489 | 1505 |
| 100 | 1502 | 1489 | 1470 | 1477 | 1426 | 1511 | 1467 | 1502 |

PEARSON CURVE FIT WITH ASYMPTOTIC MOMENTS

Table 5A ($\alpha$ = 0.01)

| n | λ=1 I<L | λ=1 I>U | λ=3 I<L | λ=3 I>U | λ=5 I<L | λ=5 I>U | λ=8 I<L | λ=8 I>U |
|---|---|---|---|---|---|---|---|---|
| 10 | 44 | 72 | 59 | 77 | 77 | 75 | 87 | 59 |
| 20 | 67 | 61 | 71 | 82 | 70 | 75 | 69 | 76 |
| 50 | 84 | 64 | 85 | 84 | 79 | 84 | 82 | 81 |
| 100 | 71 | 74 | 84 | 63 | 79 | 65 | 79 | 67 |

Table 5B ($\alpha$ = 0.05)

| n | λ=1 I<L | λ=1 I>U | λ=3 I<L | λ=3 I>U | λ=5 I<L | λ=5 I>U | λ=8 I<L | λ=8 I>U |
|---|---|---|---|---|---|---|---|---|
| 10 | 384 | 410 | 384 | 399 | 397 | 374 | 405 | 369 |
| 20 | 401 | 407 | 384 | 372 | 364 | 361 | 381 | 360 |
| 50 | 401 | 381 | 383 | 397 | 383 | 371 | 386 | 352 |
| 100 | 394 | 384 | 366 | 384 | 383 | 368 | 379 | 364 |

Table 5C ($\alpha$ = 0.10)

| n | λ=1 I<L | λ=1 I>U | λ=3 I<L | λ=3 I>U | λ=5 I<L | λ=5 I>U | λ=8 I<L | λ=8 I>U |
|---|---|---|---|---|---|---|---|---|
| 10 | 596 | 821 | 758 | 747 | 780 | 756 | 806 | 744 |
| 20 | 683 | 785 | 786 | 778 | 763 | 782 | ? | 765 |
| 50 | 744 | 796 | 761 | 770 | 733 | 762 | 778 | 755 |
| 100 | 759 | 761 | 728 | 744 | 703 | 734 | 730 | 733 |

Table 5D ($\alpha$ = 0.20)

| n | λ=1 I<L | λ=1 I>U | λ=3 I<L | λ=3 I>U | λ=5 I<L | λ=5 I>U | λ=8 I<L | λ=8 I>U |
|---|---|---|---|---|---|---|---|---|
| 10 | 1636 | 2233 | 1578 | 1510 | 1547 | 1516 | 1519 | 1500 |
| 20 | 1424 | 1447 | 1520 | 1534 | 1547 | 1537 | 1533 | 1542 |
| 50 | 1576 | 1573 | 1531 | 1538 | 1478 | 1511 | 1489 | 1505 |
| 100 | 1511 | 1489 | 1470 | 1468 | 1422 | 1507 | 1468 | 1501 |

The critical values obtained using exact and asymptotic moments may be found in Tables A3 and A4 respectively.

With Pearson curve critical values now available, two issues come to mind:

(i) While the approximate values obtained from Pearson curves clearly improve upon those obtained with the $\chi^2$ approximation for n < 20 and/or $\lambda$ < 3, is it worthwhile going through the Pearson curve algorithm (computing asymptotic moments, determining the Pearson curve and then obtaining the critical values), as opposed to simply going to a $\chi^2$ table and reading off the critical values?

(ii) Are the Pearson curves still better when we replace $\lambda$ by the maximum likelihood estimator $\hat{\lambda} = \bar{X}$?

No attempt was made to examine the second question, although for large sample sizes, we would expect that Pearson curves would still be better. In answer to the first question, if accuracy of critical values is of primary importance, then we might favor Pearson curves. The asymptotic expressions for the moments are now known to be accurate, and once the moments are computed from these expressions (to the fourth order), $\beta_1$ and $\beta_2$ are determined and the Biometrika tables (Pearson and Hartley (1966)) provide us with the critical values. If, on the other hand, the criterion $\kappa$ given in (3.4) results in a not too uncommon distribution, then the critical values may be obtained from the computer. We reiterate that in this case, the explicit form of the density has to be derived. Alternatively, given the values of n and $\lambda$, one can obtain critical values through interpolation from the tables provided in the appendix.
Although the accuracy of interpolating from these tables has not been assessed, the Pearson curve algorithm is smooth and presumably, a simple linear interpolation will suffice.

4. THE GRAM-CHARLIER SERIES OF TYPE A

4.1 THE THEORY OF GRAM-CHARLIER EXPANSIONS

In mathematics, a typical procedure for studying the properties of a function is to express the function as an infinite series. Two types of series that immediately come to mind are Taylor series (or power series) and Fourier series. While these two series express a function as a sum of powers of a variable or as a sum of trigonometric functions, we will instead consider expanding the true density of I as a sum of derivatives of the standard normal density. One can then think of such an expansion as a correction to the normal approximation that was examined in Chapter 2.

Let $\phi(x)$ be the standard normal density,

\[ \phi(x) = (1/\sqrt{2\pi}) \exp(-x^2/2). \]

Then

\[
\begin{aligned}
\phi'(x) &= -x\,\phi(x), \\
\phi''(x) &= (x^2 - 1)\,\phi(x), \\
\phi^{(3)}(x) &= -(x^3 - 3x)\,\phi(x), \\
\phi^{(4)}(x) &= (x^4 - 6x^2 + 3)\,\phi(x).
\end{aligned}
\]

In general,

\[ \phi^{(j)}(x) = (-1)^j H_j(x)\,\phi(x). \tag{4.1} \]

The polynomials $H_j(x)$ of degree j are called the Tchebycheff-Hermite polynomials. By convention, $\phi^{(0)}(x) = \phi(x)$, i.e. $H_0 = 1$. Some important properties of these polynomials are that

(i) $H_j'(x) = j\,H_{j-1}(x)$;

(ii) $\int_{-\infty}^{\infty} H_j(x) H_k(x) \phi(x)\,dx = 0$ if $k \ne j$, and $= j!$ if $k = j$, i.e. the Tchebycheff-Hermite polynomials are orthogonal;

(iii) $\int_{-\infty}^{x} H_j(t)\phi(t)\,dt = -H_{j-1}(x)\,\phi(x)$.

The proof of (i) involves expanding $\phi(x-t) = \phi(x)\exp\{tx - (t^2/2)\}$ in a Taylor series about t = 0. This yields the equation

\[ \exp\{tx - (t^2/2)\} = \sum_{j=0}^{\infty} (t^j/j!)\,H_j(x). \]

Substituting in the series for the exponential term and redefining the index of the summation gives the desired result. (ii) follows from (i) by substituting in the expression for $H_k(x)$ from (4.1) in terms of $\phi^{(k)}(x)$ and performing successive integration by parts. (iii) follows immediately from (4.1).

Suppose then that a density function f(x) can be expanded in an infinite series of derivatives of $\phi(x)$:

\[ f(x) = \sum_{j=0}^{\infty} c_j H_j(x)\,\phi(x). \tag{4.2} \]

The conditions for this series expansion to be valid can be found in a theorem by Cramer (1926). The conditions are that:

(i) $\int_{-\infty}^{\infty} (df/dx) \exp(-x^2/2)\,dx$ converges, and

(ii) $f(x) \to 0$ as $x \to \pm\infty$.

To find the coefficients $c_k$, multiply equation (4.2) by $H_k(x)$, integrate from $-\infty$ to $\infty$ and use the orthogonality property:

\[ \int_{-\infty}^{\infty} f(x) H_k(x)\,dx = \int_{-\infty}^{\infty} \sum_{j=0}^{\infty} c_j H_j(x) H_k(x) \phi(x)\,dx = \sum_{j=0}^{\infty} c_j \int_{-\infty}^{\infty} H_j(x) H_k(x) \phi(x)\,dx = k!\,c_k \]

(interchanging the sum and the integral is justified since $p(x)\phi(x)$ is always bounded, for any polynomial p(x) of finite degree). The $c_k$ can then be expressed in terms of the moments about the origin. We list the first five coefficients below:

\[
\begin{aligned}
c_0 &= 1, \\
c_1 &= \mu, \\
c_2 &= (1/2)(\mu_2' - 1), \\
c_3 &= (1/6)(\mu_3' - 3\mu), \\
c_4 &= (1/24)(\mu_4' - 6\mu_2' + 3).
\end{aligned}
\]

For the purpose of computing critical values, it is convenient to express the cumulative distribution function (CDF), F(x), in a similar series:

\[ F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{-\infty}^{x} \Big\{\phi(t) + \sum_{j=1}^{\infty} c_j H_j(t)\phi(t)\Big\}\,dt = \Phi(x) - \sum_{j=1}^{\infty} c_j H_{j-1}(x)\,\phi(x), \]

where $\Phi$ is the standard normal CDF. This series is called the Gram-Charlier series of type A (Kendall and Stuart (1958), vol. 1, pp. 155-157).

Let $X = (I - 1)/\sqrt{\mu_2}$. Then $c_1 = c_2 = 0$, and the Gram-Charlier series of type A for the CDF of X is given by

\[ F(x) = \Phi(x) - \phi(x)\{c_3 H_2(x) + c_4 H_3(x) + \cdots\}, \]

where

\[ c_3 = (1/6)\mu_3' = (1/6)E\{(I-1)^3/\mu_2^{3/2}\} = (1/6)\mu_3/\mu_2^{3/2} = (1/6)\sqrt{\beta_1}. \]

Similarly, $c_4 = (1/24)(\beta_2 - 3)$.

Now consider using partial sums of the Gram-Charlier series as an approximation to F(x). (Note that the one-term approximation is merely the normal approximation.) In particular, suppose we use the first two terms of the series to approximate F(x):

\[ F(x) \approx \Phi(x) - (1/6)\sqrt{\beta_1}\,(x^2 - 1)\,\phi(x) = G(x). \]

Then the corresponding approximate critical values can be computed for any significance level $\alpha$ by solving the equation $G(x) = \alpha$. Of course, this equation has to be solved iteratively (by the Newton-Raphson method, say).
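The iterative solution of G(x) = p for the two-term approximation can be sketched as below. This is a minimal illustration, not the thesis program: the function names are hypothetical, the starting point (the normal quantile) is an assumption, and the derivative used is $G'(x) = \phi(x)\{1 + (\sqrt{\beta_1}/6)(x^3 - 3x)\}$, which follows from property (iii) of the Tchebycheff-Hermite polynomials.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def G(x, sqrt_beta1):
    """Two-term Gram-Charlier CDF: Phi(x) - (sqrt(beta1)/6)(x^2 - 1) phi(x)."""
    return _STD_NORMAL.cdf(x) - (sqrt_beta1 / 6.0) * (x * x - 1.0) * phi(x)

def G_prime(x, sqrt_beta1):
    """Derivative of G: phi(x) {1 + (sqrt(beta1)/6)(x^3 - 3x)}."""
    return phi(x) * (1.0 + (sqrt_beta1 / 6.0) * (x ** 3 - 3.0 * x))

def gram_charlier_quantile(p, sqrt_beta1, tol=1e-12, max_iter=50):
    """Solve G(x) = p by Newton-Raphson, starting from the normal quantile."""
    x = _STD_NORMAL.inv_cdf(p)
    for _ in range(max_iter):
        step = (G(x, sqrt_beta1) - p) / G_prime(x, sqrt_beta1)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")

# Leading term of sqrt(beta1) from (3.9), for n = 20 and lambda = 3.
sqrt_beta1 = math.sqrt(2.0 / 20) * (2.0 + 1.0 / 3.0)
x_upper = gram_charlier_quantile(0.95, sqrt_beta1)   # standardized upper 5% point
```

The value returned is on the standardized scale; the corresponding critical value for I itself is $1 + x\sqrt{\mu_2}$. Because the two-term series is not a proper CDF far in the tails, the iteration is only trusted here near the normal starting value.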
Since exact and asymptotic moments are available, $c_3$ and $c_4$ can be computed likewise. Note that from (3.9) and (3.10), $c_3 \sim O(1/\sqrt{n})$ and $c_4 \sim O(1/n)$. If we choose to do the two-term approximation, then we would be neglecting $c_4$, and hence neglecting terms of $O(1/n)$. Therefore, $c_3$ can be approximated by its terms of lower order than 1/n. In this case,

\[ 6c_3 \sim \sqrt{2/n}\,\{2 + (1/\lambda)\}, \]

and the approximation becomes

\[ F(x) \sim \Phi(x) - (1/6)\sqrt{2/n}\,\{2 + (1/\lambda)\}(x^2 - 1)\,\phi(x). \]

In the case of a three-term approximation, we would be neglecting $c_5$. The fifth moment is unavailable, but we might anticipate that $c_5 \sim O(1/n^{3/2})$. If this is the case, then up to the order of neglected terms,

\[ 6c_3 \sim \sqrt{2/n}\,\{2 + (1/\lambda)\}, \qquad 24c_4 \sim (2/n)\{6 + (12/\lambda) + (1/\lambda^2)\}, \]

and the approximation becomes

\[ F(x) \sim \Phi(x) - \Big\{ (1/6)\sqrt{2/n}\,\{2 + (1/\lambda)\}(x^2 - 1) + (1/24)(2/n)\big[6 + (12/\lambda) + (1/\lambda^2)\big](x^3 - 3x) \Big\}\phi(x). \]

4.2 DISCUSSION

Since the results with the asymptotic moments were virtually the same as those with the exact moments (as was the case with Pearson curves), we only display the tables for the two- and three-term fits with exact moments (see Tables 6(A-D) and Tables 7(A-D)). We can immediately see that the series expansion has improved the normal approximation. We point out some interesting results arising from the comparison of the two series approximations.

First, while the lower critical values from the three-moment fit tend to be too conservative, those from the four-moment fit are slightly liberal. This appears to be the case for $\alpha > 0.05$. In general, the four-moment fit seems to be adequate at the lower tail, except for the usual cases of concern, $n \le 20$ and $\lambda \le 3$.

On the other hand, the upper critical values from the three-moment fit tend to be adequate for most cases, but the inclusion of the fourth moment has made the upper critical values very conservative. The four-moment fit is only satisfactory for $n \ge 50$ and $\lambda > 3$.
Obviously, the Gram-Charlier approximation is not recommended since the much simpler $\chi^2$ approximation is even better. However, it is interesting to note that a three-term partial sum approximation of the true density of I improves the normal approximation considerably. The critical values obtained from this approximation may be found in Tables A5-A8.

In Tables 6 and 7, each pair of columns gives, for the stated value of $\lambda$, the number of samples with I below the lower critical value (I<L) and above the upper critical value (I>U). A "?" marks an entry or digit that is illegible in the source; in a row with fewer legible entries than columns, the placement of the "?" is inferred and uncertain.

GRAM-CHARLIER THREE-MOMENT FIT (EXACT)

Table 6A ($\alpha$ = 0.01)

| n | λ=1 I<L | λ=1 I>U | λ=3 I<L | λ=3 I>U | λ=5 I<L | λ=5 I>U | λ=8 I<L | λ=8 I>U |
|---|---|---|---|---|---|---|---|---|
| 10 | 109 | 140 | 32 | 125 | 22 | 132 | 26 | 106 |
| 20 | 105 | 123 | 79 | 115 | 66 | 101 | 66 | 106 |
| 50 | 101 | 93 | 96 | 98 | 86 | 98 | 90 | 94 |
| 100 | 85 | 96 | 89 | 74 | 88 | 69 | 84 | 76 |

Table 6B ($\alpha$ = 0.05)

| n | λ=1 I<L | λ=1 I>U | λ=3 I<L | λ=3 I>U | λ=5 I<L | λ=5 I>U | λ=8 I<L | λ=8 I>U |
|---|---|---|---|---|---|---|---|---|
| 10 | 190 | 381 | 176 | 364 | 192 | 352 | 199 | 347 |
| 20 | 305 | 370 | 304 | 354 | 287 | 344 | 284 | 330 |
| 50 | 354 | 363 | 358 | 384 | 345 | 363 | 352 | 345 |
| 100 | 384 | 377 | 356 | 370 | 359 | 362 | 368 | 354 |

Table 6C ($\alpha$ = 0.10)

| n | λ=1 I<L | λ=1 I>U | λ=3 I<L | λ=3 I>U | λ=5 I<L | λ=5 I>U | λ=8 I<L | λ=8 I>U |
|---|---|---|---|---|---|---|---|---|
| 10 | 587 | 551 | 482 | 626 | 497 | 628 | 515 | 626 |
| 20 | 537 | 678 | 659 | 687 | 638 | 709 | 633 | 680 |
| 50 | 668 | 726 | 723 | 731 | 683 | 721 | 727 | 720 |
| 100 | 735 | 733 | 707 | 727 | 678 | 713 | 703 | 714 |

Table 6D ($\alpha$ = 0.20)

| n | λ=1 I<L | λ=1 I>U | λ=3 I<L | λ=3 I>U | λ=5 I<L | λ=5 I>U | λ=8 I<L | λ=8 I>U |
|---|---|---|---|---|---|---|---|---|
| 10 | 974 | 1083 | 1104 | 1291 | 1240 | 1328 | 1248 | 1312 |
| 20 | 1297 | 1350 | 1352 | 1426 | 1402 | 1404 | 1389 | 1448 |
| 50 | 1517 | 1471 | 1436 | 1472 | 1430 | 1450 | 1439 | 1453 |
| 100 | 1468 | 1455 | 1432 | 1442 | 1398 | 1485 | 1438 | 1486 |

GRAM-CHARLIER FOUR-MOMENT FIT (EXACT)

Table 7A ($\alpha$ = 0.01)

| n | λ=1 I<L | λ=1 I>U | λ=3 I<L | λ=3 I>U | λ=5 I<L | λ=5 I>U | λ=8 I<L | λ=8 I>U |
|---|---|---|---|---|---|---|---|---|
| 10 | ? | ? | ? | ? | ? | ? | ? | ? |
| 20 | 39 | 74 | 36 | 82 | 30 | ? | 32 | 77 |
| 50 | 54 | 69 | 62 | 83 | 59 | 83 | 12 | 115 |
| 100 | 59 | 74 | 72 | 62 | 71 | 63 | 70 | 67 |

(The n = 10 row of Table 7A is only partly legible; the tokens that can be read, in order, are 109, 81, 89, 0, 89, "1 5", 70, and their column alignment cannot be recovered.)

Table 7B ($\alpha$ = 0.05)

| n | λ=1 I<L | λ=1 I>U | λ=3 I<L | λ=3 I>U | λ=5 I<L | λ=5 I>U | λ=8 I<L | λ=8 I>U |
|---|---|---|---|---|---|---|---|---|
| 10 | 587 | 212 | 305 | 262 | 275 | 259 | 267 | 264 |
| 20 | 401 | 246 | 317 | 293 | 303 | 284 | 293 | 283 |
| 50 | 358 | 303 | 350 | 355 | 341 | 335 | 228 | 429 |
| 100 | 377 | 342 | 349 | 359 | 358 | 344 | 365 | 342 |

Table 7C ($\alpha$ = 0.10)

| n | λ=1 I<L | λ=1 I>U | λ=3 I<L | λ=3 I>U | λ=5 I<L | λ=5 I>U | λ=8 I<L | λ=8 I>U |
|---|---|---|---|---|---|---|---|---|
| 10 | 968 | 426 | 774 | 504 | 738 | 502 | 732 | 515 |
| 20 | 839 | 532 | 761 | 655 | 707 | 663 | 7?7 | 639 |
| 50 | 744 | 687 | 746 | 725 | 713 | 713 | 747 | 718 |
| 100 | 759 | 733 | 720 | 725 | 695 | 712 | 709 | 713 |

Table 7D ($\alpha$ = 0.20)

| n | λ=1 I<L | λ=1 I>U | λ=3 I<L | λ=3 I>U | λ=5 I<L | λ=5 I>U | λ=8 I<L | λ=8 I>U |
|---|---|---|---|---|---|---|---|---|
| 10 | 2147 | 2233 | 1824 | 1733 | 1667 | 1695 | 1665 | 1630 |
| 20 | 1672 | 1678 | 1619 | 1587 | 1600 | 1579 | 1580 | 1589 |
| 50 | 1628 | 1598 | 1538 | 1554 | 1520 | 1522 | 1508 | 1515 |
| 100 | 1536 | 1499 | 1499 | 1483 | 1429 | 1514 | 1469 | 1504 |

Other types of series expansions could also be examined. Two of the more common ones are Edgeworth expansions, which are known to be equivalent to the Gram-Charlier series of type A, and Fisher-Cornish expansions, which are derived from Edgeworth expansions. Treatment of these can be found in Kendall and Stuart (1958, vol. 1, pp. 157-157).

5. THE LIKELIHOOD RATIO AND GOODNESS-OF-FIT TESTS

In the previous chapters, we examined various approximations to the distribution of the index of dispersion for the case that the data are distributed as Poisson, in order to obtain approximate critical values. We compared the performances of critical values obtained from large sample approximations, series expansions and Pearson curve fits, and found that Pearson curves seemed to give the most accurate critical values. The one remaining question we will attempt to answer is: "How good is the test based on the index of dispersion relative to other tests of the null hypothesis that the data are distributed as Poisson?" Two well-known methods of testing the adequacy of the model under the null are

(i) the Likelihood Ratio Test, and

(ii) Pearson's Goodness-of-Fit (GOF) test.

To assess the performance of the test based on the index of dispersion, we can examine the power of these three tests against appropriate alternatives.
In testing for over-dispersion, ecologists have used the negative binomial (Fisher, 1941), Neyman's contagious distribution Type A (Neyman, 1939) and Thomas' double Poisson (Thomas, 1949) as alternatives to the Poisson distribution. P. Robinson (1954) has pointed out that the Neyman distribution may have several modes (leading to non-unique estimates when estimating by maximum likelihood) and that a basic assumption of the double Poisson may not be satisfied by the distribution of plant populations.

The negative binomial distribution is perhaps the most widely applied alternative to the Poisson. Letting the parameters of the negative binomial be k and $\theta$ ($k > 0$, $\theta > 0$), we may write:

\[ P(X = x) = \binom{k + x - 1}{x} \Big(\frac{\theta}{1+\theta}\Big)^{x} \Big(\frac{1}{1+\theta}\Big)^{k}, \]

where x = 0,1,2,.... From this, we have that

\[ E(X) = k\theta \quad \text{and} \quad \mathrm{Var}(X) = k\theta(1+\theta) = E(X)\,(1+\theta) > E(X). \]

For alternatives involving under-dispersion, we can test the null against the positive binomial (although it will be noted that the maximum likelihood estimator of n, the number of Bernoulli trials, may not be unique).

5.1 THE LIKELIHOOD RATIO TEST

Let $\eta = (k, \theta)$ be a two-dimensional vector of parameters and let $f(x, \eta)$ be the probability mass function of the negative binomial. It is shown in Appendix A1.4 that as $k \to \infty$ and $\theta \to 0$ in such a way that $k\theta = \lambda$, a constant, the limiting distribution arrived at is the Poisson with parameter $\lambda$, which has probability mass function $f(x, \lambda)$. Let $\Theta_0$ and $\Theta$ be the spaces of values that the parameters $\lambda$ and $\eta$ may take on, respectively. The GOF problem then is to test

\[ H_0: \eta \in \Theta_0 \qquad \text{against} \qquad H_1: \eta \in \Theta - \Theta_0 . \]

The likelihood ratio statistic for testing $H_0$ against $H_1$ is

\[ \Lambda = \sup L(x, \eta) \,/\, \sup{}^{*} L(x, \eta), \]

where $L(x, \eta)$ is the likelihood function for a sample $X_1, \ldots, X_n$. Here, "sup" indicates a supremum taken over $\Theta_0$ while "sup*" indicates a supremum taken over $\Theta$. Note that this implies that $\Lambda \le 1$.

In general, the distribution of the likelihood ratio statistic is unknown.
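The negative binomial alternative and its Poisson limit can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the probability mass function is the $(k, \theta)$ form used for the alternative hypothesis (mean $k\theta$, variance $k\theta(1+\theta)$), and the log-gamma form is an assumption introduced so that non-integer k is handled.

```python
import math

def neg_binomial_pmf(x, k, theta):
    """P(X = x) = Gamma(k+x) / (x! Gamma(k)) * theta^x / (1 + theta)^(k+x)."""
    log_p = (math.lgamma(k + x) - math.lgamma(x + 1) - math.lgamma(k)
             + x * math.log(theta) - (k + x) * math.log1p(theta))
    return math.exp(log_p)

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def max_pmf_gap(lam, k, x_max=200):
    """Largest pointwise gap between NB(k, lam/k) and Poisson(lam) on 0..x_max."""
    theta = lam / k
    return max(abs(neg_binomial_pmf(x, k, theta) - poisson_pmf(x, lam))
               for x in range(x_max + 1))

# Moments for k = 3, theta = 1: mean k*theta = 3, variance k*theta*(1+theta) = 6.
probs = [neg_binomial_pmf(x, 3.0, 1.0) for x in range(400)]
mean = sum(x * p for x, p in enumerate(probs))
var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))

# The gap to the Poisson(3) pmf shrinks as k grows with k*theta held at 3.
gaps = [max_pmf_gap(3.0, k) for k in (3.0, 30.0, 1e6)]
```

The shrinking gap is a numerical illustration of the limiting argument, not a substitute for the derivation in Appendix A1.4.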
However, under regularity conditions (Kendall and Stuart (1958), vol. 1, pp. 230-231), as $n \to \infty$, it is known that asymptotically

\[ -2 \ln \Lambda \approx \chi^2_{p-q}, \]

where p and q are the dimensionalities of the parameter spaces under the alternative and the null, respectively.

We now compute the MLE's of $\lambda$, k and $\theta$. The likelihood functions of the Poisson and negative binomial are, respectively,

\[ L(x, \lambda) = \prod_{i=1}^{n} \lambda^{x_i} e^{-\lambda}/x_i! = \lambda^{x_\cdot} e^{-n\lambda} \Big/ \prod_{i=1}^{n} x_i! , \]

and

\[ L(x, k, \theta) = \prod_{i=1}^{n} \binom{k + x_i - 1}{x_i} \{\theta/(1+\theta)\}^{x_i} \{1/(1+\theta)\}^{k}. \tag{5.1} \]

Let $l_0(x, \lambda)$ and $l_1(x, k, \theta)$ be the corresponding log-likelihood functions. Then

\[ l_0(x, \lambda) = x_\cdot \ln \lambda - n\lambda - \sum_{i=1}^{n} \ln(x_i!) \tag{5.2} \]

and

\[
\begin{aligned}
l_1(x, k, \theta) &= \sum_{i=1}^{n} \ln\{(k + x_i - 1)! / [x_i!\,(k-1)!]\} + x_\cdot \ln\theta - (x_\cdot + nk)\ln(1+\theta) \\
&= \sum_{i=1}^{n} \ln\{(k + x_i - 1)!/(k-1)!\} - \sum_{i=1}^{n} \ln(x_i!) + x_\cdot \ln\theta - (x_\cdot + nk)\ln(1+\theta).
\end{aligned}
\]

But

\[ \sum_{i=1}^{n} \ln\{(k + x_i - 1)!/(k-1)!\} = \Big[\sum_{\{i:\,x_i = 0\}} + \sum_{\{i:\,x_i > 0\}}\Big] \ln\{(k + x_i - 1)!/(k-1)!\}. \]

Since the summation over the zero values of $x_i$ is zero,

\[ \sum_{i=1}^{n} \ln\{(k + x_i - 1)!/(k-1)!\} = \sum_i{}^{+} \ln\{(k + x_i - 1)!/(k-1)!\}, \]

where $\sum_i^{+}$ denotes a summation over i such that $x_i > 0$. This sum can be written as

\[ \sum_i{}^{+} \sum_{j=1}^{x_i} \ln(k + j - 1), \]

and hence

\[ l_1(x, k, \theta) = x_\cdot \ln\theta - (x_\cdot + nk)\ln(1+\theta) + \sum_i{}^{+} \sum_{j=1}^{x_i} \ln(k + j - 1) - \sum_{i=1}^{n} \ln(x_i!). \tag{5.3} \]

To obtain the MLE of $\lambda$, (5.2) must be maximized with respect to $\lambda$, while the MLE's of $\theta$ and k are obtained by maximizing (5.3) with respect to $\theta$ and k simultaneously. Thus,

\[ \partial l_0/\partial\lambda = (x_\cdot/\lambda) - n . \]

Setting this derivative to zero and solving for $\lambda$ yields $\hat{\lambda}$, the MLE of $\lambda$, as

\[ \hat{\lambda} = \bar{X}. \tag{5.4} \]

Similarly,

\[
\begin{aligned}
\partial l_1/\partial\theta &= (x_\cdot/\theta) - \{(x_\cdot + nk)/(1+\theta)\}, \\
\partial l_1/\partial k &= \sum_i{}^{+} \sum_{j=1}^{x_i} \{1/(k + j - 1)\} - n \ln(1+\theta).
\end{aligned}
\]

Setting the derivatives to zero, the first equation can be explicitly solved for $\theta$ to yield

\[ \hat{\theta} = \bar{X}/k . \tag{5.5} \]

Substituting this value into the second equation leads to

\[ \sum_i{}^{+} \sum_{j=1}^{x_i} \{1/(k + j - 1)\} - n \ln\{1 + (\bar{X}/k)\} = 0. \tag{5.6} \]

Levin and Reeds (1978) give a necessary and sufficient condition for the uniqueness of the MLE of k. This criterion can be stated as: "$\hat{k}$, the MLE of k, exists uniquely in $(0, \infty)$ if and only if

\[ \sum_{i=1}^{n} X_i^2 - \sum_{i=1}^{n} X_i > \Big(\sum_{i=1}^{n} X_i\Big)^2 \Big/ n ." \tag{5.7} \]

The right-hand side of this criterion is simply $n\bar{X}^2$, and so (5.7) can be rewritten as

\[ \sum_{i=1}^{n} (X_i - \bar{X})^2 > \sum_{i=1}^{n} X_i , \qquad \text{or} \qquad (1/n)\sum_{i=1}^{n} (X_i - \bar{X})^2 > \bar{X}. \]

Since

\[ S^2 = \{1/(n-1)\}\sum_{i=1}^{n} (X_i - \bar{X})^2 > (1/n)\sum_{i=1}^{n} (X_i - \bar{X})^2 \]

provided that $\sum_{i=1}^{n} (X_i - \bar{X})^2 > 0$, a consequence of Levin and Reeds' criterion is that a unique $\hat{k}$ exists in $(0, \infty)$ only if the index of dispersion is greater than 1; more precisely, the criterion is equivalent to $(n-1)S^2 > n\bar{X}$, i.e. $I > n/(n-1)$.

Thus, subject to Levin and Reeds' criterion, the solution to (5.6) can be obtained numerically. Once $\hat{k}$, the MLE of k, is obtained, substitution into (5.5) yields $\hat{\theta}$, the MLE of $\theta$. (The case where the criterion is not satisfied corresponds to $\hat{k} = \infty$, and is discussed in more detail in section 5.4.)

Continuing with the likelihood ratio test, we have

\[
\begin{aligned}
\ln \Lambda &= l_0(x, \hat{\lambda}) - l_1(x, \hat{k}, \hat{\theta}) \\
&= x_\cdot\{(\ln \hat{k}) - 1\} + (x_\cdot + n\hat{k}) \ln\{1 + (\bar{X}/\hat{k})\} - \sum_i{}^{+} \sum_{j=1}^{x_i} \ln(\hat{k} + j - 1).
\end{aligned}
\]

This is the form of the likelihood ratio test. The asymptotic result is that as $n \to \infty$,

\[ -2 \ln \Lambda \approx \chi^2_1 . \]

5.2 PEARSON'S GOODNESS-OF-FIT TEST

A test that assesses goodness-of-fit is the well-known $\chi^2$ test that was proposed by Karl Pearson (1900). The GOF statistic is

\[ X^2 = \sum_j (n_j - \lambda_j)^2 / \lambda_j , \]

where $n_j$ is the number of times that the integer j is observed in a sample and $\lambda_j = nP(X = j)$, where $X \sim P(\lambda)$, is the expected number of times the integer j will occur under the null. The asymptotic result is that as $n \to \infty$,

\[ X^2 \approx \chi^2_{\nu - 1}, \]

where $\nu$ is the number of cells. (Note that one degree of freedom is lost since the probabilities computed under the null are subject to the constraint that they sum up to 1. Also, in the simulation that follows, the value of $\lambda$ is specified and hence no further degree of freedom is lost.) This approximation has been known to work well, particularly if the expected number of observations, $\lambda_j$, in each cell is at least 5. Now, for sample sizes of about 10 to 20, this rule of thumb may not always be satisfied. The rule that has been implemented is that the expected number of observations in each cell is at least 3. As will be seen, the $\chi^2$ approximation was still satisfactory in this case.

5.3 POWER COMPUTATIONS

We now have three tests whose power we wish to compare. Since the index of dispersion and the test based on $X^2$ do not depend on explicit alternative hypotheses, we might expect the likelihood ratio test to be the superior of the three. Because of computational difficulties that may arise when using the likelihood ratio test (these are mentioned later on in this section), it is not recommended for use in practice. We use it here only to provide a baseline for the assessment of the power of the test based on the index of dispersion. Since the index of dispersion is devised to test for the variance being different from the mean, it will be geared towards alternative hypotheses which have this property, and so we might expect the index of dispersion to perform better than the test based on $X^2$.

Note that while a one-sided test was implemented for the test based on the index of dispersion (and necessarily for the likelihood ratio test), the test based on $X^2$ is necessarily two-sided. This should be taken into consideration when comparing the power of the tests.

Let us recall the hypotheses we are testing:

\[ H_0: X_1, X_2, \ldots, X_n \sim P(\lambda) \qquad \text{against} \qquad H_1: X_1, X_2, \ldots, X_n \sim NB(k, \theta). \]

Through simulation studies, we are going to compare the power of the three tests. However, as the null hypothesis does not specify a particular value of $\lambda$, it is not clear how to choose k and $\theta$ for the simulation.
To do this, we argue as follows: since the Poisson distribution with parameter $\lambda$ is a limiting case of the negative binomial with parameters k and $\theta$, we can specify $\lambda$ and choose k and $\theta$ so that $k\theta = \lambda$. Now we also want to choose k so that the tests exhibit reasonable power. For instance, we do not wish to generate data that yield power very close to 1. We would like to choose values of k so that the range of the power covers the unit interval [0,1]. To get a good idea of what k should roughly be, we can examine the asymptotic power of the index of dispersion test. To do this, we need the first four moments of the negative binomial, which can be found in Kendall and Stuart (1958, vol. 1, p. 131). Letting $\nu$ denote the mean and $\nu_2$, $\nu_3$ and $\nu_4$ the central moments of the negative binomial, we have

\[
\begin{aligned}
\nu &= k\theta, \\
\nu_2 &= k\theta(\theta + 1), \\
\nu_3 &= k\theta(\theta + 1)(2\theta + 1), \\
\nu_4 &= k\theta(\theta + 1)(1 + 6\theta + 6\theta^2 + 3k\theta + 3k\theta^2).
\end{aligned}
\]

Substituting these moments into equation (2.6) in Chapter 2, we have that

\[ I \approx N(1 + \theta,\ (1/n)v^2), \qquad \text{where } v^2 = 2(1+\theta)^2 + (1+\theta)(2+3\theta)/k . \]

Notice that by setting $k = \lambda/\theta$ and letting $\theta \to 0$, we obtain the asymptotic null distribution of the index of dispersion for the Poisson case, namely

\[ I \approx N(1, 2/n). \]

This result was seen in Chapter 2, where the performance of the asymptotic normal critical values was assessed. Hence the asymptotic power of I can be computed from the set of hypotheses

\[ H_0: \theta = 0 \qquad \text{against} \qquad H_1: \theta = \theta_1 > 0 . \]

Let $\mu(\theta) = 1 + \theta$ and $\sigma^2(\theta) = (1/n)v^2$, where $v^2$ is defined as above. If we let $I_\alpha = 1 + z_\alpha\sqrt{2/n}$, where $z_\alpha$ is the upper critical value of the standard normal distribution at significance level $\alpha$, then the asymptotic power of I is

\[ \text{Power} = \Phi(-w_\alpha), \qquad \text{where } w_\alpha = \{I_\alpha - \mu(\theta_1)\} / \sqrt{\sigma^2(\theta_1)} . \]

The asymptotic power of I is presented in Table 8 for the case n = 20 and $\alpha$ = 0.05.
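The asymptotic power formula above is straightforward to evaluate. The following sketch assumes only the expressions for $\mu(\theta)$, $v^2$, $I_\alpha$ and $w_\alpha$ given in the text; the function name and argument order are illustrative, and `statistics.NormalDist` stands in for normal tables.

```python
import math
from statistics import NormalDist

def asymptotic_power(n, alpha, lam, k):
    """Asymptotic power of the one-sided index of dispersion test against
    NB(k, theta) with k * theta = lam."""
    normal = NormalDist()
    theta = lam / k
    mu = 1.0 + theta                                    # asymptotic mean of I
    v2 = 2.0 * (1.0 + theta) ** 2 + (1.0 + theta) * (2.0 + 3.0 * theta) / k
    sigma = math.sqrt(v2 / n)                           # asymptotic s.d. of I
    z_alpha = normal.inv_cdf(1.0 - alpha)               # upper alpha point of N(0, 1)
    i_alpha = 1.0 + z_alpha * math.sqrt(2.0 / n)        # critical value under H0
    w_alpha = (i_alpha - mu) / sigma
    return normal.cdf(-w_alpha)

# Grid matching the layout of Table 8 (n = 20, alpha = 0.05).
power_grid = {(lam, k): asymptotic_power(20, 0.05, lam, k)
              for lam in (1, 3, 5) for k in (3, 5, 7, 10)}
```

For example, with n = 20, $\alpha$ = 0.05, $\lambda$ = 1 and k = 3 the sketch gives a power of about 0.353, in line with the first entry of Table 8.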
Table 8: ASYMPTOTIC POWER OF THE INDEX OF DISPERSION TEST (n = 20, α = 0.05)

λ = kθ    k=3    k=5    k=7    k=10   SIZE
1        .352   .224   .166   .125   .05
3        .739   .560   .425   .309   .05
5        .873   .752   .663   .484   .05

Thus, the values $k$ = 3, 5, 7 and 10 seem to be adequate. Notice the pattern in this table: the power decreases with increasing $k$ and decreasing $\theta$. This is not surprising, because as $k$ increases and $\theta$ decreases, the negative binomial approaches the Poisson, and hence it is much harder to detect differences between the null and the alternative when $k$ is large and $\theta$ is small.

We now proceed with the Monte Carlo simulation. A total of 500 samples of $n$ = 10, 20 and 50 negative binomial random variables were generated for $k$ = 3, 5, 7 and 10. The three statistics $-2\ln\Lambda$, $I$ and $X^2$ were computed using the negative binomial data. While the computation of $I$ and $X^2$ is very easy on the computer, some problems may occur in computing the likelihood ratio statistic, as mentioned previously. First, the computation of the double sum in (5.6) at each iteration increases the cost of running the computer program. This is more evident for large $n$ and/or large $\lambda$. Second, for some of the samples, a negative value of $k$ was obtained at some point in the iteration process. This may create a problem in computing $\ln\{1 + (\bar{X}/k)\}$ in (5.6). Barring these difficulties, however, the Newton-Raphson algorithm achieved convergence in about 5 or 6 iterations.

Continuing with the simulation, an attempt is made to treat the tests as equally as possible by using $\chi^2$ critical values in each case. However, a problem may still occur in the power comparison. Since all these tests are based on asymptotic critical values, the asymptotic approximations may not treat each of the three tests exactly the same. For example, for a given sample size, it may be that $X^2$ is better approximated by $\chi^2$ than $I$, which in turn may be better approximated by $\chi^2$ than $-2\ln\Lambda$.
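The simulation design described in this section can be sketched for the index of dispersion test alone. This is an illustrative reconstruction under stated assumptions, not the original program: negative binomial variates are drawn as a gamma mixture of Poissons, the test is assumed to reject when $(n-1)I$ exceeds the upper $\chi^2_{n-1}$ point, an all-zero sample is counted as a non-rejection, and every function name is hypothetical.

```python
import math
import random


def poisson_draw(mean, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-mean)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def negbin_draw(k, theta, rng):
    """NB(k, theta) with mean k*theta, drawn as a gamma mixture of Poissons."""
    return poisson_draw(rng.gammavariate(k, theta), rng)


def index_of_dispersion(sample):
    """I = S^2 / Xbar, or None when the sample mean is zero."""
    n = len(sample)
    mean = sum(sample) / n
    if mean == 0:
        return None
    s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return s2 / mean


def estimated_power(lam, k, n, crit, reps=500, seed=1):
    """Fraction of simulated NB samples with (n-1) * I > crit (one-sided test)."""
    rng = random.Random(seed)
    theta = lam / k
    rejections = 0
    for _ in range(reps):
        sample = [negbin_draw(k, theta, rng) for _ in range(n)]
        i_stat = index_of_dispersion(sample)
        if i_stat is not None and (n - 1) * i_stat > crit:
            rejections += 1
    return rejections / reps


CHI2_19_UPPER_05 = 30.144  # tabled upper 5% point of chi-square with 19 df

if __name__ == "__main__":
    print(estimated_power(lam=3, k=3, n=20, crit=CHI2_19_UPPER_05))
```

Because the estimate is a binomial proportion based on `reps` samples, its standard error is at most about 0.022 for 500 replications, which is worth bearing in mind when comparing neighbouring cells of a power table.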
Such differences in the quality of the $\chi^2$ approximation may make the conclusions of the power comparison unreliable.

Proceeding with the power computations, the number of statistics which fell in the rejection region was counted for each of the three tests (recall that a one-sided rejection region was formed for the tests based on $\Lambda$ and $I$, while a two-sided rejection region is necessary for $X^2$). The power of each test is displayed in Tables 9-10 (A-D). Each cell in these tables contains the power of the likelihood ratio test, the index of dispersion test and the GOF test, in that order. To provide a handle on the accuracy of the $\chi^2$ approximation for each of the three tests, the estimated size of each test is also displayed in each table. If the approximation were good for a particular test, then the estimated size of that test should be close to the specified significance level.

As mentioned above, the $\chi^2$ approximation may not treat these three tests equally. This in fact is the case when $n = 10$. The critical values for the likelihood ratio test are too conservative, as can be seen from the estimated size of the test. For example, when $\alpha = 0.05$, the estimated size of the likelihood ratio test is slightly less than 0.01. On the other hand, the estimated size of the test based on $X^2$ is very close to the true significance level for all $\alpha$, while the test based on the index of dispersion tends to be intermediate. Thus we could infer that if exact critical values were employed, the power of the likelihood ratio test would be considerably larger than indicated in the tables. Similarly, the critical values for the index of dispersion test are slightly conservative, and hence we would expect the power of this test to increase if exact critical values were used. The power of the test based on $X^2$, however, would be pretty much what the tables indicate.

POWER OF TESTS BASED ON Λ, I AND X² (n = 10)

Table 9A: α = 0.01

λ = kθ   Test    k=3    k=5    k=7    k=10   SIZE
1        Λ      .028   .016   .012   .008   0
         I      .068   .044   .032   .028   .008
         X²     .010   .008   .006   .006   .008
3        Λ      .148   .058   .026   .018   0
         I      .252   .138   .086   .056   .002
         X²     .156   .096   .086   .072   .010
5        Λ      .356   .148   .086   .042   .002
         I      .478   .276   .180   .114   .004
         X²     .290   .192   .128   .102   .024

Table 9B: α = 0.05

λ = kθ   Test    k=3    k=5    k=7    k=10   SIZE
1        Λ      .064   .048   .038   .032   .008
         I      .162   .114   .092   .074   .034
         X²     .062   .068   .060   .054   .055
3        Λ      .270   .144   .098   .058   0
         I      .444   .272   .224   .152   .038
         X²     .242   .174   .136   .112   .050
5        Λ      .486   .286   .182   .122   .006
         I      .648   .464   .332   .252   .036
         X²     .388   .268   .208   .166   .068

Table 9C: α = 0.10

λ = kθ   Test    k=3    k=5    k=7    k=10   SIZE
1        Λ      .118   .066   .052   .048   .020
         I      .222   .174   .130   .110   .046
         X²     .086   .096   .092   .088   .094
3        Λ      .344   .198   .150   .098   .012
         I      .566   .384   .310   .242   .070
         X²     .272   .204   .156   .136   .090
5        Λ      .570   .752   .250   .178   .006
         I      .752   .588   .448   .342   .082
         X²     .516   .384   .312   .264   .136

Table 9D: α = 0.20

λ = kθ   Test    k=3    k=5    k=7    k=10   SIZE
1        Λ      .166   .108   .086   .072   .036
         I      .366   .294   .256   .236   .124
         X²     .224   .206   .200   .196   .182
3        Λ      .452   .286   .232   .178   .040
         I      .688   .546   .466   .398   .158
         X²     .424   .332   .256   .244   .184
5        Λ      .656   .470   .356   .258   .040
         I      .848   .718   .618   .504   .160
         X²     .582   .456   .382   .326   .216

POWER OF TESTS BASED ON Λ, I AND X² (n = 20)

Table 10A: α = 0.01

λ = kθ   Test    k=3    k=5    k=7    k=10   SIZE
1        Λ      .044   .016   .012   .010   0
         I      .102   .050   .034   .024   .010
         X²     .048   .038   .030   .030   .016
3        Λ      .314   .142   .076   .034   .002
         I      .450   .228   .150   .098   .012
         X²     .266   .136   .096   .062   .024
5        Λ      .664   .334   .190   .112   0
         I      .770   .472   .312   .190   .008
         X²     .558   .296   .202   .140   .022

Table 10B: α = 0.05

λ = kθ   Test    k=3    k=5    k=7    k=10   SIZE
1        Λ      .104   .050   .042   .032   .010
         I      .214   .140   .100   .082   .046
         X²     .078   .056   .052   .050   .040
3        Λ      .502   .258   .164   .104   .012
         I      .666   .416   .298   .218   .032
         X²     .410   .266   .206   .156   .056
5        Λ      .804   .534   .342   .208   .008
         I      .898   .706   .526   .366   .042
         X²     .686   .408   .314   .224   .074

Table 10C: α = 0.10

λ = kθ   Test    k=3    k=5    k=7    k=10   SIZE
1        Λ      .172   .104   .076   .054   .020
         I      .316   .228   .174   .148   .086
         X²     .148   .128   .110   .104   .098
3        Λ      .594   .336   .232   .158   .020
         I      .780   .544   .420   .320   .086
         X²     .498   .328   .268   .210   .110
5        Λ      .872   .620   .452   .290   .020
         I      .936   .796   .650   .486   .086
         X²     .760   .514   .408   .298   .112

Table 10D: α = 0.20

λ = kθ   Test    k=3    k=5    k=7    k=10   SIZE
1        Λ      .246   .164   .124   .104   .054
         I      .492   .388   .328   .272   .172
         X²     .226   .202   .174   .174   .166
3        Λ      .704   .460   .326   .246   .032
         I      .882   .714   .604   .476   .196
         X²     .614   .462   .382   .308   .194
5        Λ      .916   .746   .578   .396   .046
         I      .966   .886   .792   .662   .192
         X²     .834   .652   .542   .422   .220

Turning to $n = 20$, the same problem still arises for the likelihood ratio test: very conservative critical values. On the other hand, while the asymptotic critical values used for the index of dispersion are still slightly conservative, the approximation has clearly improved, and the estimated size of the index of dispersion test is closer to the true significance level. In fact, the estimated power of the index of dispersion is close to its asymptotic power.

Although it is not clear that the likelihood ratio test is more powerful than the test based on the index of dispersion, we can make one additional observation. If we compare tables with the same estimated size (for instance, the 20% table for the likelihood ratio and the 5% table for the index of dispersion when $n = 20$), we see that in each cell the estimated power of the likelihood ratio test is indeed higher than that of the index of dispersion, but only marginally. This is an indication of what we might expect to see if the sample size were large enough for the estimated size of the test to be close to the true significance level.

Thus, the results displayed in Tables 9 and 10 seem to suggest the following order in terms of the power of each test: likelihood ratio, index of dispersion, and GOF based on $X^2$.

A further comparison of the power of the likelihood ratio and index of dispersion tests is displayed in Table 11 (A-D) for a sample size of $n = 50$.
As before, we may compare the 20% table for the likelihood ratio with the 5% table for the index of dispersion to conclude that the likelihood ratio test is only slightly more powerful than the test based on the index of dispersion.

POWER OF TESTS BASED ON Λ AND I (n = 50)

Table 11A: α = 0.01

λ = kθ   Test    k=3    k=5    k=7    k=10   SIZE
1        Λ      .094   .034   .016   .010   .004
         I      .166   .068   .050   .032   .012
3        Λ      .744   .376   .172   .082   .002
         I      .842   .510   .304   .152   .010
5        Λ      .984   .762   .500   .270   .002
         I      .990   .844   .636   .400   .008

Table 11B: α = 0.05

λ = kθ   Test    k=3    k=5    k=7    k=10   SIZE
1        Λ      .224   .098   .060   .034   .014
         I      .354   .222   .156   .106   .046
3        Λ      .886   .572   .374   .206   .018
         I      .936   .724   .532   .354   .046
5        Λ      .996   .890   .714   .?52   .014
         I      .998   .950   .818   .622   .040

Table 11C: α = 0.10

λ = kθ   Test    k=3    k=5    k=7    k=10   SIZE
1        Λ      .314   .178   .120   .080   .028
         I      .522   .318   .234   .178   .106
3        Λ      .922   .668   .482   .296   .032
         I      .960   .810   .642   .484   .096
5        Λ      .998   .942   .778   .564   .034
         I      .998   .978   .896   .722   .098

Table 11D: α = 0.20

λ = kθ   Test    k=3    k=5    k=7    k=10   SIZE
1        Λ      .430   .268   .198   .152   .066
         I      .664   .434   .380   .308   .216
3        Λ      .950   .770   .600   .422   .076
         I      .984   .910   .784   .638   .192
5        Λ      .998   .968   .876   .676   .070
         I     1.000   .988   .956   .848   .188

5.4 THE LIKELIHOOD RATIO TEST REVISITED

At the time when this thesis was first being written, no obvious explanation could be offered for the conservatism of the critical values of the likelihood ratio test. Subsequently, the explanation became clear: for the situation under consideration, the null distribution of $-2\ln\Lambda$ does not converge to that of a $\chi^2_1$, but rather to that of a mixture of a $\chi^2_1$ and a zero random variable, each with probability 1/2. This is an example of the general results of Chernoff (1954). The reasoning goes as follows.

If the MLE $(\hat\theta, \hat{k})$ for the negative binomial occurs at $\hat{k} = \infty$, then $\Lambda = 1$ and $-2\ln\Lambda = 0$. Levin and Reeds (1977) have established that this occurs if and only if $(n-1)S^2 \le n\bar{X}$. Thus, under the null hypothesis, we have

\[
\begin{aligned}
P(-2\ln\Lambda = 0) &= P\{(n-1)S^2 \le n\bar{X}\} \\
&= P\{n(S^2 - \bar{X}) - S^2 \le 0\} \\
&= P\{\sqrt{n}\,[(S^2-\lambda) - (\bar{X}-\lambda)] - (1/\sqrt{n})S^2 \le 0\} \\
&\approx P\{\sqrt{n}\,[(S^2-\lambda) - (\bar{X}-\lambda)] \le 0\}, \quad \text{for large } n.
\end{aligned}
\]

From sections 2.1 and 2.2, we have that

\[
\sqrt{n}\begin{pmatrix} \bar{X} - \lambda \\ S^2 - \lambda \end{pmatrix} \xrightarrow{d} N(0, \Sigma), \qquad \text{where } \Sigma = \begin{pmatrix} \lambda & \lambda \\ \lambda & \lambda + 2\lambda^2 \end{pmatrix}.
\]

Letting $f(x,y) = y - x$, so that $f(\bar{X}, S^2) = S^2 - \bar{X} = (S^2 - \lambda) - (\bar{X} - \lambda)$, we have, as a consequence of the delta method, that

\[
\sqrt{n}\{(S^2-\lambda) - (\bar{X}-\lambda)\} \xrightarrow{d} N(0, 2\lambda^2).
\]

Thus for large $n$, we have that $P(-2\ln\Lambda = 0) \approx 1/2$; i.e. $-2\ln\Lambda = 0$ approximately half the time. Thus under the null, we have the following result:

\[
-2\ln\Lambda \xrightarrow{d} \begin{cases} 0 & \text{with probability } 1/2 \\ \chi^2_1 & \text{with probability } 1/2 \end{cases} \tag{5.8}
\]

As a supplement to (5.8), the first 500 Poisson samples from the 15,000 previously generated were used again to check whether half of these 500 samples would give a value of $-2\ln\Lambda = 0$. Table 12 displays the number of samples out of the 500 which led to $-2\ln\Lambda = 0$.

Table 12: Number of Times (n-1)S² ≤ nX̄

λ      n=10   n=20   n=50
1      367    337    304
3      340    317    302
5      338    307    293

The fact that the entries in the table decrease as $n$ gets large is indeed encouraging, and re-affirms our position that the null distribution of $-2\ln\Lambda$ converges to a mixture of distributions.

What effect then does (5.8) have on the power computations?
In the previous computations, we have been assuming that

\[
\alpha = P(-2\ln\Lambda \ge C_\alpha),
\]

where $C_\alpha$ is the upper critical value corresponding to a $\chi^2_1$ distribution. Letting $Z$ be a standard normal random variable, we have instead that

\[
\begin{aligned}
P\{-2\ln\Lambda \ge C_\alpha\} &\approx (1/2)P\{I_0 > C_\alpha\} + (1/2)P\{Z^2 > C_\alpha\} \\
&= (1/2)\{1 - P[Z^2 < C_\alpha]\} \\
&= 1 - \Phi(\sqrt{C_\alpha}),
\end{aligned}
\]

where $I_0 = 0$ with probability 1 and $\Phi$ is the standard normal CDF. But

\[
\alpha = P(Z^2 > C_\alpha) = 1 - P(-\sqrt{C_\alpha} < Z < \sqrt{C_\alpha}) = 2\{1 - \Phi(\sqrt{C_\alpha})\},
\]

and hence

\[
P(-2\ln\Lambda \ge C_\alpha) \approx \alpha/2,
\]

instead of the anticipated value of $\alpha$.

Further simulations were not done, as enough information can be gathered from the previous results. In particular, using the correct asymptotic critical values for the likelihood ratio test, for each fixed $n$, the previous results for $\alpha = 0.10$ are the appropriate results for $\alpha = 0.05$, and the previous results for $\alpha = 0.20$ are the appropriate results for $\alpha = 0.10$. These are displayed in Tables 13, 14 and 15 (A-B).

POWER OF TESTS BASED ON Λ, I AND X² (n = 10)

Table 13A: α = 0.05

λ = kθ   Test    k=3    k=5    k=7    k=10   SIZE
1        Λ      .118   .066   .052   .048   .020
         I      .162   .114   .092   .074   .034
         X²     .062   .068   .060   .054   .056
3        Λ      .344   .198   .150   .098   .012
         I      .444   .272   .224   .152   .038
         X²     .242   .174   .136   .112   .050
5        Λ      .570   .752   .250   .178   .006
         I      .648   .464   .332   .252   .036
         X²     .388   .268   .208   .166   .068

Table 13B: α = 0.10

λ = kθ   Test    k=3    k=5    k=7    k=10   SIZE
1        Λ      .166   .108   .086   .072   .036
         I      .222   .174   .130   .110   .046
         X²     .086   .096   .092   .088   .094
3        Λ      .452   .286   .232   .178   .040
         I      .566   .384   .310   .242   .070
         X²     .272   .204   .156   .136   .090
5        Λ      .656   .470   .356   .258   .040
         I      .752   .588   .448   .342   .082
         X²     .516   .384   .312   .264   .136

POWER OF TESTS BASED ON Λ, I AND X² (n = 20)

Table 14A: α = 0.05

λ = kθ   Test    k=3    k=5    k=7    k=10   SIZE
1        Λ      .172   .104   .076   .054   .020
         I      .214   .140   .100   .082   .046
         X²     .078   .056   .052   .050   .040
3        Λ      .594   .336   .232   .158   .020
         I      .666   .416   .298   .218   .032
         X²     .410   .266   .206   .156   .056
5        Λ      .872   .620   .452   .290   .020
         I      .898   .706   .526   .366   .042
         X²     .686   .408   .314   .224   .074

Table 14B: α = 0.10

λ = kθ   Test    k=3    k=5    k=7    k=10   SIZE
1        Λ      .246   .164   .124   .104   .054
         I      .316   .228   .174   .148   .086
         X²     .148   .128   .110   .104   .098
3        Λ      .704   .460   .326   .246   .032
         I      .780   .544   .420   .320   .086
         X²     .498   .328   .268   .210   .110
5        Λ      .916   .745   .578   .396   .046
         I      .936   .796   .650   .486   .086
         X²     .760   .514   .408   .298   .112

POWER OF TESTS BASED ON Λ AND I (n = 50)

Table 15A: α = 0.05

λ = kθ   Test    k=3    k=5    k=7    k=10   SIZE
1        Λ      .314   .178   .120   .080   .028
         I      .354   .222   .156   .106   .046
3        Λ      .922   .668   .482   .296   .032
         I      .936   .724   .532   .354   .046
5        Λ      .998   .942   .778   .564   .034
         I      .998   .950   .818   .622   .040

Table 15B: α = 0.10

λ = kθ   Test    k=3    k=5    k=7    k=10   SIZE
1        Λ      .430   .268   .198   .152   .066
         I      .522   .318   .234   .178   .106
3        Λ      .950   .770   .600   .422   .076
         I      .960   .810   .642   .484   .096
5        Λ      .998   .968   .876   .676   .070
         I      .998   .978   .896   .722   .098

The correction to the asymptotic null distribution of $-2\ln\Lambda$ has certainly created a better picture. The estimated size of the likelihood ratio test is closer to the nominal significance level than it was when the $\chi^2$ approximation was employed. However, the critical values for the likelihood ratio test are still very conservative. Hence, the power of the likelihood ratio test would be greater than that displayed in these tables.
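The relationship between the $\chi^2_1$ critical values and the corrected ones can be checked numerically. The sketch below assumes the 50:50 mixture of a point mass at zero and a $\chi^2_1$ variable as the null distribution of $-2\ln\Lambda$, and uses the fact that a $\chi^2_1$ variable is the square of a standard normal; the function names are illustrative only.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def chi2_1_upper_point(alpha):
    """Upper-alpha point of chi-square(1): the C with P(Z**2 > C) = alpha."""
    return STD_NORMAL.inv_cdf(1.0 - alpha / 2.0) ** 2


def mixture_tail(c):
    """P(-2 ln Lambda >= c) for c > 0 under the 50:50 mixture.

    The point mass at zero contributes nothing for c > 0, so the tail is
    half of the chi-square(1) tail 2 * (1 - Phi(sqrt(c))).
    """
    return 0.5 * 2.0 * (1.0 - STD_NORMAL.cdf(c ** 0.5))


def mixture_upper_point(alpha):
    """Critical value with asymptotic size alpha under the mixture (alpha < 1/2)."""
    return chi2_1_upper_point(2.0 * alpha)


if __name__ == "__main__":
    for alpha in (0.01, 0.05, 0.10):
        c_chi2 = chi2_1_upper_point(alpha)
        print(alpha, round(c_chi2, 4), round(mixture_tail(c_chi2), 4),
              round(mixture_upper_point(alpha), 4))
```

Under these assumptions, the size-$\alpha$ critical value for the mixture is the upper $2\alpha$ point of $\chi^2_1$, which is why results computed at nominal level $2\alpha$ can be re-read as results at level $\alpha$.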
As before, we may compare tables with approximately the same estimated size. For example, the power of the test based on the index of dispersion from Table 13A might be compared to the power of the likelihood ratio test from Table 13B and, similarly, Table 14A to Table 14B. We see that for $n$ = 10 and 20, the likelihood ratio test is only marginally better than the test based on the index of dispersion. For the case $n = 50$, no reasonable comparison can be made, but we would expect the same behavior from both tests.

6. CONCLUSIONS

As mentioned in Chapter 1, the index of dispersion is a statistic often used to detect departures from randomness. As the null distribution of the index of dispersion is unknown, large sample approximations were used as a preliminary fit. The asymptotic null distribution of $I$ was seen to be normal with mean 1 and variance $2/n$. Asymptotic critical values from this distribution were then employed and assessed by a Monte Carlo simulation. The results were that the normal approximation was very poor for sample sizes typically encountered in practice, and that this approximation only becomes satisfactory for a sample size of about 100 and $\lambda > 5$.

A further attempt to improve the normal approximation was made by an infinite series expansion of the true null distribution of $I$. We saw that a three-moment fit from the Gram-Charlier expansion improved the normal approximation enormously, but that this approximation was only satisfactory for $n \ge 50$.

The $\chi^2$ approximation, on the other hand, seemed to be fairly accurate for $n > 20$ and $\lambda > 3$. This is encouraging for one important reason: the $\chi^2$ approximation is simple to apply. To further improve the $\chi^2$ approximation (particularly for the cases $n < 20$ and $\lambda < 3$), Pearson curves were utilized. We found that, except for the case $n = 10$ and $\lambda = 1$, Pearson curves definitely improved the approximation.
Two issues still remain unanswered:

(i) What should be done in the case $n = 10$ and $\lambda = 1$?
(ii) How well will the approximations hold up when we replace $\lambda$ by $\hat\lambda = \bar{X}$?

For the second question, we expect that the Pearson curve approximation will still perform well. As for the first question, let us keep in mind the suggestion put forth by Fisher (1950) and Cochran (1936): that the test based on the index of dispersion should be carried out conditionally, particularly when the Poisson parameter $\lambda$ is small, for then exact frequencies can be computed.

Finally, the comparison of the powers of the tests based on the likelihood ratio, the index of dispersion and Pearson's $X^2$ statistic showed that the test based on the index of dispersion exhibits reasonable power when the hypothesis of randomness is tested against over-dispersion. This supplements the results obtained by Perry and Mead (1979).

On the basis of accurate critical values and reasonably high power, we conclude that the index of dispersion is highly recommendable for use in applications.

REFERENCES

ANDERSON, T.W. (1958). "An Introduction to Multivariate Statistical Analysis". Wiley, New York.
BATEMAN, G.I. (1950). "The Power of the χ² Index of Dispersion Test When Neyman's Contagious Distribution is the Alternative Hypothesis". Biometrika, 37, 59-63.
BLACKMAN, G.E. (1935). "A Study by Statistical Methods of the Distribution of Species in Grassland Communities". Annals of Botany, N.S., 49, 749-777.
CHERNOFF, H. (1954). "On the Distribution of the Likelihood Ratio". Annals of Mathematical Statistics, 25, 573-578.
CLAPHAM, A.R. (1936). "Over-dispersion in Grassland Communities and the Use of Statistical Methods in Ecology". Journal of Ecology, 24, 232-251.
COCHRAN, W.G. (1936). "The χ² Distribution for the Binomial and Poisson Series with Small Expectations". Annals of Eugenics, 7, 207-217.
CRAMER, H. (1926). "On Some Classes of Series Used in Mathematical Statistics". Skandinaviske Matematikercongres, Copenhagen.
CRAMER, H. (1946). "Methods of Statistics". Princeton University Press, Princeton.
DARWIN, J.H. (1957). "The Power of the Poisson Index of Dispersion". Biometrika, 44, 286-289.
DAVID, F.N. and MOORE, P.G. (1954). "Notes on Contagious Distributions in Plant Populations". Annals of Botany, N.S., 28, 47-53.
ELDERTON, W.P. and JOHNSON, N.L. (1969). "Systems of Frequency Curves". Cambridge University.
FISHER, R.A. (1941). "The Negative Binomial Distribution". Annals of Eugenics, 11, 182-187.
FISHER, R.A. (1950). "The Significance of Deviations from Expectation in a Poisson Series". Biometrics, 6, 17-24.
FISHER, R.A., THORNTON, H.G. and MACKENZIE, W.A. (1922). "The Accuracy of the Plating Method of Estimating Bacterial Populations". Annals of Applied Biology, 9, 325.
GRAD, A. and SOLOMON, H. (1955). "Distribution of Quadratic Forms and Some Applications". Annals of Mathematical Statistics, 26, 464-477.
GREEN, R.H. (1966). "Measurement of Non-Randomness in Spatial Distributions". Res. Popul. Ecol., 8, 1-7.
HALDANE, J.B.S. (1937). "The Exact Value and Moments of the Distribution of χ², Used as a Test of Goodness-of-Fit, When Expectations are Small". Biometrika, 29, 133-143.
HALDANE, J.B.S. (1939). "The Mean and Variance of χ², When Used as a Test of Homogeneity, When Expectations are Small". Biometrika, 31, 419-355.
HOADLEY, A.B. (1968). "Use of the Pearson Densities for Approximating a Skew Density Whose Left Terminal and First Three Moments are Known". Biometrika, 55, 559-563.
HOEL, P.G. (1943). "On Indices of Dispersion". Annals of Mathematical Statistics, 14, 155-162.
KATHIRGAMATAMBY, N. (1953). "Notes on the Poisson Index of Dispersion". Biometrika, 40, 225-228.
KENDALL, M.G. and STUART, A. (1958). "The Advanced Theory of Statistics", Vol. 1. Griffin, London.
KENDALL, M.G. and STUART, A. (1958). "The Advanced Theory of Statistics", Vol. 2. Griffin, London.
LANCASTER, H.O. (1952). "Statistical Control of Counting Experiments". Biometrika, 39, 419-422.
LEVIN, B. and REEDS, J. (1977). "Compound Multinomial Likelihood Functions are Unimodal: Proof of a Conjecture of I.J. Good". Annals of Statistics, 5, 79-87.
MENDENHALL and SCHEAFFER (1973). "Mathematical Statistics with Applications". Duxbury Press, North Scituate, Massachusetts.
MULLER, P.H. and VAHL, H. (1976). "Pearson's System of Frequency Curves Whose Left Boundary and First Three Moments are Known". Biometrika, 54, 649-656.
NEYMAN, J. (1939). "On a New Class of Contagious Distributions Applicable in Entomology and Bacteriology". Annals of Mathematical Statistics, 10, 35-57.
PEARSON, E.S. and HARTLEY, H.O. (1966). "Biometrika Tables for Statisticians", Vol. 1, 3rd ed., Cambridge.
PEARSON, K. (1900). "On a Criterion that a Given System of Deviation from the Probable in the Case of a Correlated System of Variable is Such that it can be Reasonably Supposed to Have Risen in Random Sampling". Phil. Mag., (5), 50, 157.
PEARSON, K. (1901). "Systematic Fitting of Curves to Observations". Biometrika, 1, 265.
PERRY, J.N. and MEAD, R. (1979). "On the Power of the Index of Dispersion Test to Detect Spatial Pattern". Biometrics, 35, 613-622.
POLYA, G. (1930). "Sur Quelques Points de la Theorie des Probabilites". Ann. de l'Inst. Henri Poincare, 1, 117-162.
ROBINSON, P. (1954). "The Distribution of Plant Populations". Annals of Botany, N.S., 18, 35-45.
SOLOMON, H. (1960). "Distributions of Quadratic Forms - Tables and Applications". Technical Report No. 45, Applied Mathematics and Statistics Laboratories, Stanford University, Stanford, California.
SOLOMON, H. and STEPHENS, M.A. (1978). "Approximations to Density Functions Using Pearson Curves". Journal of the American Statistical Association, 73, 153-160.
STUDENT (1919). "An Explanation of Deviation from Poisson's Law in Practice". Biometrika, 12, 211-215.
THOMAS, M. (1949). "A Generalization of Poisson's Binomial Limit for Use in Ecology". Biometrika, 36, 18.

APPENDIX

A1.1 THE CONDITIONAL DISTRIBUTION OF A POISSON SAMPLE GIVEN THE TOTAL

Let $X_1, \ldots, X_n$ be independent, identically distributed Poisson random variables with parameter $\lambda$. Then the sum of the $X_i$'s,

\[
X_{\cdot} = \sum_{i=1}^{n} X_i,
\]

is distributed as Poisson with parameter $n\lambda$. Consider the joint distribution of $X_1, \ldots, X_n$ given the total $X_{\cdot}$. Since

\[
f_{X \mid X_{\cdot}}(x) = \frac{f_X(x)}{f_{X_{\cdot}}(x_{\cdot})}
= \frac{\prod_{i=1}^{n} \lambda^{x_i} e^{-\lambda}/x_i!}{(n\lambda)^{x_{\cdot}} e^{-n\lambda}/x_{\cdot}!}
= \frac{x_{\cdot}!}{\prod_{i=1}^{n} x_i!}\left(\frac{1}{n}\right)^{x_{\cdot}},
\]

the desired result follows, i.e.

\[
(X_1, \ldots, X_n \mid X_{\cdot}) \sim \text{Mult}(X_{\cdot}; 1/n, 1/n, \ldots, 1/n).
\]

The distribution of a vector of independent and identically distributed Poisson random variables, conditioned on the total, is a multinomial with parameter $m = X_{\cdot}$ and equal cell probabilities $1/n$. This conditional distribution is independent of the Poisson parameter $\lambda$, since $X_{\cdot}$ is a sufficient statistic for $\lambda$.

The moment generating function of the multinomial is

\[
\{p_1\exp(t_1) + \cdots + p_n\exp(t_n)\}^m.
\]

In our case this becomes

\[
M(t) = \{(1/n)[\exp(t_1) + \cdots + \exp(t_n)]\}^{X_{\cdot}}.
\]

A1.2 THE FIRST FOUR MOMENTS OF I

From (3.5), we see that we require the evaluation, for $x_{\cdot} > 0$, of

\[
E\{(S^2/\bar{X})^k \mid X_{\cdot} = x_{\cdot}\}, \qquad k = 1, 2, 3 \text{ and } 4.
\]

Now for $k = 1$ and $x_{\cdot} > 0$, we have

\[
\begin{aligned}
(n-1)\bar{X}\,E\{S^2/\bar{X} \mid X_{\cdot} = x_{\cdot}\}
&= E\Big\{\sum_{i=1}^{n}(X_i - \bar{X})^2 \,\Big|\, X_{\cdot} = x_{\cdot}\Big\} \\
&= E\Big\{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \,\Big|\, X_{\cdot} = x_{\cdot}\Big\} \\
&= \sum_{i=1}^{n} E\{X_i^2 \mid X_{\cdot} = x_{\cdot}\} - n\bar{X}^2 \\
&= n\{\mathrm{Var}(X_i \mid X_{\cdot} = x_{\cdot}) + E^2(X_i \mid X_{\cdot} = x_{\cdot})\} - n\bar{X}^2 \\
&= n\{x_{\cdot}(1/n)[1 - (1/n)] + (x_{\cdot}/n)^2\} - n\bar{X}^2 \\
&= (n-1)\bar{X}.
\end{aligned}
\]

It follows that for $x_{\cdot} > 0$, we have

\[
E\{S^2/\bar{X} \mid X_{\cdot} = x_{\cdot}\} = 1. \tag{A1.3}
\]

For $k = 2$ and $x_{\cdot} > 0$, we begin by noting that

\[
\{(n-1)S^2\}^2 = \Big(\sum_{i=1}^{n}(X_i - \bar{X})^2\Big)^2 = \sum_{i=1}^{n}(X_i - \bar{X})^4 + \sum_{i \ne j}(X_i - \bar{X})^2 (X_j - \bar{X})^2.
\]

Upon expanding these powers of $(X_i - \bar{X})$ and evaluating the required conditional expectations through the moment generating function of $(X_1, X_2, \ldots, X_n) \mid X_{\cdot}$, we have, for $x_{\cdot} > 0$, that

\[
E\{(S^2/\bar{X})^2 \mid X_{\cdot} = x_{\cdot}\} = \frac{n+1}{n-1} - \frac{2}{n(n-1)}\cdot\frac{1}{\bar{X}}. \tag{A1.4}
\]

It follows that for $x_{\cdot} > 0$,

\[
\mathrm{Var}(S^2/\bar{X} \mid X_{\cdot}) = \frac{2}{n-1}\left\{1 - \frac{1}{n\bar{X}}\right\}.
\]

Considerably more algebraic effort is required in the cases $k$ = 3 and 4. For $x_{\cdot} > 0$, we obtain

\[
E\{(S^2/\bar{X})^3 \mid X_{\cdot} = x_{\cdot}\} = \frac{(n+1)(n+3)}{(n-1)^2} - \frac{2}{(n-1)^2}\left\{\Big[1 + \frac{13}{n}\Big]\frac{1}{\bar{X}} + \frac{2}{n}\Big[1 - \frac{6}{n}\Big]\frac{1}{\bar{X}^2}\right\} \tag{A1.5}
\]

\[
E\{(S^2/\bar{X})^4 \mid X_{\cdot} = x_{\cdot}\} = \frac{(n+1)(n+3)(n+5)}{(n-1)^3} + \frac{2}{(n-1)^3}\left\{2n\Big[1 + \frac{5}{n}\Big]\Big[1 - \frac{17}{n}\Big]\frac{1}{\bar{X}} - \frac{2}{n^2}(2n^2 + 53n - 261)\frac{1}{\bar{X}^2} - \frac{4}{n^3}(n^2 - 30n + 90)\frac{1}{\bar{X}^3}\right\}. \tag{A1.6}
\]

Equations (A1.3) - (A1.6) agree with those provided by Haldane (1937). Substitution of these conditional moments into (3.5) yields exact expressions for the first four raw moments of $I$, which are given in (3.7). Obtaining the central moments from these raw moments is a matter of using the formulas given in Kendall and Stuart (1958, vol. 1, p. 56).

We should mention that the algebra involved in computing these conditional expectations was checked by UBC's symbolic manipulator, documented in "UBC REDUCE". Expansions of powers and the computation of the partial derivatives of the moment generating function were all checked on the computer.

It remains to evaluate $E^{+}(1/\bar{X}^j)$, for $j$ = 1, 2 and 3. Now,

\[
E^{+}(1/\bar{X}) = nE^{+}(1/X_{\cdot}) = n\sum_{k=1}^{\infty} (1/k)\,e^{-\theta}\theta^k/k!, \qquad \text{where } \theta = n\lambda.
\]

If we let

\[
f(\theta) = \sum_{k=1}^{\infty} (1/k)\,e^{-\theta}\theta^k/k!,
\]

then

\[
f'(\theta) = -f(\theta) + \sum_{k=1}^{\infty} e^{-\theta}\theta^{k-1}/k!,
\]

or

\[
f'(\theta) + f(\theta) = e^{-\theta}(e^{\theta} - 1)/\theta.
\]

Since the solution to this differential equation is

\[
f(\theta) = e^{-\theta}\int_0^{\theta} \{(e^t - 1)/t\}\,dt,
\]

it follows that

\[
E^{+}(1/\bar{X}) = n e^{-\theta}\int_0^{\theta} \{(e^t - 1)/t\}\,dt.
\]

Similarly,

\[
E^{+}(1/\bar{X}^2) = n^2\left(e^{-\theta}\ln\theta\int_0^{\theta}[(e^t-1)/t]\,dt - e^{-\theta}\int_0^{\theta}(\ln t)[(e^t-1)/t]\,dt\right),
\]

and

\[
E^{+}(1/\bar{X}^3) = n^3\left[\tfrac{1}{2}e^{-\theta}(\ln\theta)^2\int_0^{\theta}\{(e^t-1)/t\}\,dt - e^{-\theta}\ln\theta\int_0^{\theta}(\ln t)\{(e^t-1)/t\}\,dt + \tfrac{1}{2}e^{-\theta}\int_0^{\theta}(\ln t)^2\{(e^t-1)/t\}\,dt\right].
\]

None of the above integrals can be evaluated explicitly, and $\mu_2'$, $\mu_3'$ and $\mu_4'$ would have to be approximated either by numerical integration or by an asymptotic expansion.
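The conditional moments (A1.3) - (A1.6) can be checked by brute force for small cases, by enumerating every outcome of the equal-probability multinomial and using exact rational arithmetic. The sketch below is only such a check, written with the formulas as read from the text above; the function names are illustrative assumptions, and the enumeration is feasible only for small $n$ and small totals.

```python
from fractions import Fraction
from math import factorial


def compositions(total, parts):
    """Yield all tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def conditional_moments(n, total):
    """Exact E[(S^2/Xbar)^j | X. = total], j = 1..4, under Mult(total; 1/n,...,1/n)."""
    xbar = Fraction(total, n)
    moments = [Fraction(0)] * 4
    for counts in compositions(total, n):
        coef = factorial(total)  # multinomial coefficient total! / prod(c!)
        for c in counts:
            coef //= factorial(c)
        prob = Fraction(coef, n ** total)
        s2 = sum((c - xbar) ** 2 for c in counts) / (n - 1)
        ratio = s2 / xbar
        for j in range(4):
            moments[j] += prob * ratio ** (j + 1)
    return moments


def formula_moments(n, total):
    """Right-hand sides of (A1.3)-(A1.6), evaluated with Xbar = total / n."""
    r = 1 / Fraction(total, n)  # 1 / Xbar
    m = Fraction(n)
    m1 = Fraction(1)
    m2 = (m + 1) / (m - 1) - 2 / (m * (m - 1)) * r
    m3 = ((m + 1) * (m + 3) / (m - 1) ** 2
          - 2 / (m - 1) ** 2 * ((1 + 13 / m) * r + (2 / m) * (1 - 6 / m) * r ** 2))
    m4 = ((m + 1) * (m + 3) * (m + 5) / (m - 1) ** 3
          + 2 / (m - 1) ** 3 * (2 * m * (1 + 5 / m) * (1 - 17 / m) * r
                                - (2 / m ** 2) * (2 * m ** 2 + 53 * m - 261) * r ** 2
                                - (4 / m ** 3) * (m ** 2 - 30 * m + 90) * r ** 3))
    return [m1, m2, m3, m4]


if __name__ == "__main__":
    for n, total in [(2, 2), (3, 2), (4, 5)]:
        print(n, total, conditional_moments(n, total) == formula_moments(n, total))
```

For example, with $n = 2$ and $x_{\cdot} = 2$ the ratio $S^2/\bar{X}$ takes the values 2 and 0 with probability 1/2 each, so its first four conditional moments are 1, 2, 4 and 8, which is what both functions return.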
We illustrate the asymptotic-expansion approach by expanding the integrals for large $\theta$. Let

\[
(1/n)E^{+}(1/\bar{X}) = f(\theta), \qquad (1/n^2)E^{+}(1/\bar{X}^2) = g(\theta) \qquad \text{and} \qquad (1/n^3)E^{+}(1/\bar{X}^3) = h(\theta).
\]

Under the transformation $t = \theta x$, we have

\[
\begin{aligned}
f(\theta) &= e^{-\theta}\int_0^1 \{(e^{\theta x} - 1)/x\}\,dx \\
&= e^{-\theta}\,\ln x\,(e^{\theta x} - 1)\Big|_0^1 - e^{-\theta}\int_0^1 (\ln x)\,\theta e^{\theta x}\,dx \\
&= -\theta e^{-\theta}\int_0^1 \ln(1-z)\,e^{\theta(1-z)}\,dz, \qquad \text{where } z = 1 - x.
\end{aligned}
\]

Now,

\[
-\ln(1-z) = z + z^2/2 + z^3/3 + \cdots,
\]

and so

\[
f(\theta) = \int_0^1 (z + z^2/2 + z^3/3 + \cdots)\,\theta e^{-\theta z}\,dz.
\]

For $j = 1, 2, 3, \ldots$, let

\[
I_j(\theta) = \int_0^1 z^j\,\theta e^{-\theta z}\,dz = -z^j e^{-\theta z}\Big|_0^1 + j\int_0^1 z^{j-1} e^{-\theta z}\,dz.
\]

We then have an expression for $I_j$ in terms of $I_k$, $k < j$:

\[
I_j(\theta) = -e^{-\theta} + (j/\theta)\,I_{j-1}(\theta), \qquad \text{where } I_0(\theta) = 1 - e^{-\theta}.
\]

For $j \ge 1$, this recursive formula yields

\[
\begin{aligned}
I_j(\theta) &= -e^{-\theta} + (j/\theta)\{-e^{-\theta} + [(j-1)/\theta]\,I_{j-2}(\theta)\} \\
&= -e^{-\theta} - (j/\theta)e^{-\theta} + \{j(j-1)/\theta^2\}\{-e^{-\theta} + [(j-2)/\theta]\,I_{j-3}(\theta)\} \\
&= \cdots = j!/\theta^j + O(e^{-\theta}) \sim j!/\theta^j.
\end{aligned}
\]

Therefore,

\[
f(\theta) \sim \sum_{k=0}^{N} k!/\theta^{k+1} + O(1/\theta^{N+2}) \qquad \text{as } \theta \to \infty,
\]

and the asymptotic expansion for $E^{+}(1/\bar{X})$ is

\[
E^{+}(1/\bar{X}) \sim n(1/\theta + 1/\theta^2 + 2/\theta^3 + \cdots).
\]

Notice that the first-term approximation of $E^{+}(1/\bar{X})$ is $n/\theta = 1/E(\bar{X})$, which would be the naive approximation to this expectation. The asymptotic expansions for $g(\theta)$ and $h(\theta)$ are obtained in a similar fashion, except that instead of expanding $\ln(1-z)$, we need to expand $\{\ln(1-z)\}^2$ and $\{\ln(1-z)\}^3$ for $g(\theta)$ and $h(\theta)$ respectively. The results are

\[
E^{+}(1/\bar{X}^2) \sim n^2(1/\theta^2 + 3/\theta^3 + 11/\theta^4 + \cdots),
\]
\[
E^{+}(1/\bar{X}^3) \sim n^3(1/\theta^3 + 6/\theta^4 + 35/\theta^5 + \cdots).
\]

Alternatively, these same expansions could be obtained by repeated applications of L'Hospital's rule.
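The quality of the expansion of $f(\theta) = (1/n)E^{+}(1/\bar{X})$ can be examined by comparing its partial sums with a direct term-by-term summation of the defining series. The following is a rough numerical sketch under that reading; the stopping rule and function names are illustrative choices rather than anything taken from the thesis.

```python
from math import exp, factorial, lgamma, log


def f_exact(theta, tol=1e-18):
    """f(theta) = sum over k >= 1 of (1/k) * e^{-theta} * theta^k / k!.

    Terms are computed on the log scale to avoid overflow, and the sum stops
    once k has passed theta and the current term has dropped below `tol`.
    """
    total = 0.0
    k = 1
    while True:
        term = exp(-theta + k * log(theta) - lgamma(k + 1)) / k
        total += term
        if k > theta and term < tol:
            break
        k += 1
    return total


def f_asymptotic(theta, terms=3):
    """Partial sum of the asymptotic expansion: sum of j! / theta^(j+1)."""
    return sum(factorial(j) / theta ** (j + 1) for j in range(terms))


if __name__ == "__main__":
    theta = 50.0  # e.g. n = 10 and lambda = 5
    exact = f_exact(theta)
    for terms in (1, 2, 3, 4):
        print(terms, f_asymptotic(theta, terms), exact - f_asymptotic(theta, terms))
```

As with any asymptotic series, adding terms helps only up to a point for fixed $\theta$; the terms $j!/\theta^{j+1}$ eventually grow once $j$ exceeds $\theta$, so the truncation point matters most when $\theta = n\lambda$ is small.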
Substituting these expansions into (3.7) yields the raw moments, correct to $O(1/n^4)$:

\[
\begin{aligned}
\mu_1' &\sim 1, \\
\mu_2' &\sim 1 + \frac{2}{n} + \frac{2}{n^2}\Big[1 - \frac{1}{\lambda}\Big] + \frac{2}{n^3}\Big[1 - \frac{1}{\lambda} - \frac{1}{\lambda^2}\Big] + \frac{2}{n^4}\Big[1 - \frac{1}{\lambda} - \frac{1}{\lambda^2} - \frac{2}{\lambda^3}\Big], \\
\mu_3' &\sim 1 + \frac{6}{n} + \frac{2}{n^2}\Big[7 - \frac{1}{\lambda}\Big] + \frac{2}{n^3}\Big[11 - \frac{15}{\lambda} - \frac{3}{\lambda^2}\Big] + \frac{2}{n^4}\Big[15 - \frac{29}{\lambda} - \frac{7}{\lambda^2} - \frac{8}{\lambda^3}\Big], \\
\mu_4' &\sim 1 + \frac{12}{n} + \frac{4}{n^2}\Big[14 + \frac{1}{\lambda}\Big] + \frac{4}{n^3}\Big[37 - \frac{9}{\lambda} - \frac{1}{\lambda^2}\Big] + \frac{4}{n^4}\Big[72 - \frac{115}{\lambda} - \frac{68}{\lambda^2} - \frac{6}{\lambda^3}\Big].
\end{aligned} \tag{A1.7}
\]

Proceeding in the same way as Hoel (1943), we can also assess the accuracy of the $\chi^2$ approximation to the null distribution of $I$ by examining the ratio of the asymptotic moments of $I$ to the moments of $[1/(n-1)]\chi^2_{n-1}$. The behavior of these ratios as $n$ and/or $\lambda$ increases will indicate when the $\chi^2$ approximation is satisfactory.

The first four moments of a random variable distributed as $[1/(n-1)]\chi^2_{n-1}$ are:

\[
\omega_1' = 1, \qquad \omega_2' = \frac{n+1}{n-1}, \qquad \omega_3' = \frac{(n+1)(n+3)}{(n-1)^2}, \qquad \omega_4' = \frac{(n+1)(n+3)(n+5)}{(n-1)^3}
\]

(see Mendenhall and Scheaffer, 1973, p. 138). Notice that the moments of $[1/(n-1)]\chi^2_{n-1}$ approximate the moments of a $\text{Mult}(x_{\cdot}; 1/n, \ldots, 1/n)$, correct to $O(1/n)$. Let

\[
R_i = \mu_i'/\omega_i', \qquad \text{for } i = 1, 2, 3 \text{ and } 4
\]

(note that $R_1 = 1$ for all $n$ and $\lambda$). Using the asymptotic expressions in (A1.7), these ratios are computed for $n$ = 10, 20, 50 and 100 and $\lambda$ = 1, 3, 5 and 8, and entered in Table A1.

The asymptotic moments of the index of dispersion agree very well with those of the $\chi^2_{n-1}$ distribution for $n \ge 20$ and $\lambda \ge 1$. In fact, this is also apparent for $n \ge 10$ and $\lambda \ge 5$. As $n$ and/or $\lambda$ increases, $R_2$, $R_3$ and $R_4$ all approach the limiting value 1. This is indeed encouraging and complements the results obtained in section 2.5.
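The moment ratios $R_2$, $R_3$ and $R_4$ follow directly from the asymptotic raw moments (A1.7) and the moments of the scaled $\chi^2_{n-1}$ variable. The sketch below codes those expressions as read from the text; the function names are illustrative assumptions.

```python
def raw_moments_asymptotic(n, lam):
    """mu_2', mu_3', mu_4' of I from (A1.7), correct to O(1/n^4)."""
    r = 1.0 / lam
    mu2 = (1 + 2 / n + (2 / n**2) * (1 - r) + (2 / n**3) * (1 - r - r**2)
           + (2 / n**4) * (1 - r - r**2 - 2 * r**3))
    mu3 = (1 + 6 / n + (2 / n**2) * (7 - r) + (2 / n**3) * (11 - 15 * r - 3 * r**2)
           + (2 / n**4) * (15 - 29 * r - 7 * r**2 - 8 * r**3))
    mu4 = (1 + 12 / n + (4 / n**2) * (14 + r) + (4 / n**3) * (37 - 9 * r - r**2)
           + (4 / n**4) * (72 - 115 * r - 68 * r**2 - 6 * r**3))
    return mu2, mu3, mu4


def chi2_scaled_moments(n):
    """omega_2', omega_3', omega_4' of chi-square(n-1) divided by (n-1)."""
    w2 = (n + 1) / (n - 1)
    w3 = (n + 1) * (n + 3) / (n - 1) ** 2
    w4 = (n + 1) * (n + 3) * (n + 5) / (n - 1) ** 3
    return w2, w3, w4


def moment_ratios(n, lam):
    """(R_2, R_3, R_4) = asymptotic raw moments of I over the chi-square moments."""
    mus = raw_moments_asymptotic(n, lam)
    omegas = chi2_scaled_moments(n)
    return tuple(m / w for m, w in zip(mus, omegas))


if __name__ == "__main__":
    for n in (10, 20, 50, 100):
        for lam in (1, 3, 5, 8):
            print(n, lam, ["%.4f" % x for x in moment_ratios(n, lam)])
```

For $n = 10$ and $\lambda = 1$ the sketch gives ratios of approximately 0.9797, 0.9631 and 0.9724.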
TABLE A1: The Ratios of the Moments of I and χ²(n-1)
(for each cell, the ratios R₂, R₃ and R₄ are entered in that order)

n      Ratio   λ=1      λ=3      λ=5       λ=8
10     R₂      0.9797   0.9937   0.9962    0.9977
       R₃      0.9631   0.9887   0.9932    0.9957
       R₄      0.9724   0.9921   0.9948    0.9962
20     R₂      0.9950   0.9984   0.9990    0.9994
       R₃      0.9925   0.9976   0.9986    0.9991
       R₄      1.0003   1.0002   1.0001    1.0001
50     R₂      0.9992   0.9997   0.9998    0.9999
       R₃      0.9991   0.9997   0.9998    0.9998
       R₄      1.0009   1.0003   1.0002    1.0001
100    R₂      0.9998   0.9999   0.99996   0.99993
       R₃      0.9998   0.9999   0.99998   0.99998
       R₄      1.0003   1.0001   1.00004   0.99996

A1.3 THE TYPE VI PEARSON CURVE

We rewrite the differential equation given by (3.1) as

\[
\frac{d\{\log f(x)\}}{dx} = \frac{x - a}{b_2(x - A_1)(x - A_2)}, \tag{A1.8}
\]

where $A_1$ and $A_2$ are the roots of the quadratic $b_0 + b_1 x + b_2 x^2$. Kendall and Stuart (1958, vol. 1, p. 149) give the formulas for $a$, $b_0$, $b_1$ and $b_2$ as functions of $\beta_1$, $\beta_2$ and $\mu_2$. When using these formulas, one should keep in mind that they were obtained assuming the origin at the mean.

For the type VI case, both roots of the quadratic are real and have the same sign. Without loss of generality, assume that $A_2 > A_1$. Then, by partial fractions, we can write

\[
\frac{d\{\log f(x)\}}{dx} = \frac{1}{b_2}\left\{\frac{C_1}{x - A_1} + \frac{C_2}{x - A_2}\right\},
\]

where

\[
C_1 = \frac{a - A_1}{A_2 - A_1} = \frac{a - A_1}{\delta}, \qquad C_2 = \frac{A_2 - a}{A_2 - A_1} = \frac{A_2 - a}{\delta}, \qquad \delta = A_2 - A_1.
\]

For $x > A_2$, we can integrate equation (A1.8) with respect to $x$ to get

\[
\log f(x) = (C_1/b_2)\log(x - A_1) + (C_2/b_2)\log(x - A_2) + C,
\]

where $C$ is the arbitrary constant of integration. Transforming back to the true origin, i.e. replacing $x$ by $x - 1$, yields

\[
\log f(x) = (C_1/b_2)\log(x - a_1) + (C_2/b_2)\log(x - a_2) + C,
\]

where $a_1 = 1 + A_1$ and $a_2 = 1 + A_2$, and hence

\[
f(x) = k(x - a_1)^{-q_1}(x - a_2)^{q_2}, \tag{A1.9}
\]

where $q_1 = -C_1/b_2$, $q_2 = C_2/b_2$ and $k$ is a normalizing constant.

Since $A_2 > A_1$ (and hence $a_2 > a_1$) and $q_1$ and $q_2$ are real numbers, it follows that the type VI Pearson curve defined in (A1.9) is a distribution defined on $[a_2, \infty)$.
If we let y = x-al , then -qi q 2 fly) = ky (y+a^a,,) , for y ^ - a ^ O , -qi Q2 = ky (y-s) , since 6 = A2 - Ai = a2 - ai Now let z = S/y so that dy/dz = - 6 / z 2 . Then f(y) = kC«/z) q i CC5/z)-5] q 2|dy/dz| q 2 - q i +l q i - 2 q 2 = kfi z [Cl-z)/z] q i - q 2 - 2 q 2 = k'z U-z) , for 0<z<l. This last form of the density of the beta distribution is what is required when using the IMSL l ibrary to compute c r i t i c a l values. A1.4 A LIMITING CASE OF THE NEGATIVE BINOMIAL The negative binomial distribution with parameters k and 8 approaches different distributions depending on the l imit ing operation. In particular, le t k -> <» and e -> 0 in such a way that ke = x, a constant. If X ~ NB(k,e), then the moment generating function of X is M x(t) - {p/(l -qe t )} k , where p = l/(e+l) and q = e/(e+l). Hence, 89 M x(t) = {Cl/(e+l)]/[l-(e/9+l)e t]}k = {[k/(X+k)][l-(x/(A+k))et]}k = (k/CxCl-e*) + k]} k = {l+[X(l-e t)/k]}" k (A1.5) But as k ->• » , the l imi t of the right-hand side of (A1.5) is gX(e -1)^ w( 1 1 - c n j s p r e C i S e l y the moment of generating function of the Poisson distr ibut ion. 90 A2.1 HISTOGRAMS OF I ro II II c tv> OJ to o o o C D o I— C O I - H zc CM in in • - 2 > • o ( U O H O I-I- > l/> I X U J ty> co O O m Z ai • < • U J o _ i -Z O </> 3 O h -O O 2 (J — U J l/> U J - I oc o a 0 0 x x oc > o m _ I o < U J • i3 2 < Z3 •r- — o o _ CO CO • — CM co in CO r~ U J o » O N T f (J) ID —• CO "» U J Z —• co t - CO o — ai a. • H — >• o X o n co CM CO 0 1 o CO t-2 _ o o — CO co U J u — CM CO in CO t~ O • U J 1 - o tN 1 0 1 0 —• CO *r CC Z to r~ 03 o — u. 
[FIG. A1: histogram of I. The figure is printed sideways in the source; its caption and values are not recoverable.]

FIG. A2 HISTOGRAM OF I (1000 samples, n = 10, λ = 5). Count 1000, mean 0.981, st. dev. 0.447. [Histogram not reproduced.]

FIG. A3 HISTOGRAM OF I (1000 samples, n = 20). Count 1000, mean 0.998, st. dev. 0.330. [Histogram not reproduced.]

[FIG. A4: histogram of I. The figure is printed sideways in the source; its caption and values are not recoverable.]

FIG. A5 HISTOGRAM OF I (1000 samples, n = 50, λ = 3). Count 1000, mean 1.003, st. dev. 0.204. [Histogram not reproduced.]

[FIG. A6: histogram of I. The figure is printed sideways in the source; its caption and values are not recoverable.]

FIG. A7 HISTOGRAM OF I (1000 samples, n = 100, λ = 3). Count 1000, mean 1.001, st. dev. 0.144. [Histogram not reproduced.]

FIG. A8 HISTOGRAM OF I (1000 samples, n = 100, λ = 5). Count 1000, mean 1.000, st. dev. 0.141. [Histogram not reproduced.]

[Normal probability plots not reproduced. Each plots the expected normal value (vertical axis, from -3.75 to 3.75) against I (horizontal axis).]

FIG. A.9: NORMAL PROBABILITY PLOT FOR I (1000 samples, n = 10, λ = 3)
FIG. A.10: NORMAL PROBABILITY PLOT FOR I (1000 samples, n = 10, λ = 5)
FIG. A.11: NORMAL PROBABILITY PLOT FOR I (1000 samples, n = 20, λ = 3)
FIG. A.12: NORMAL PROBABILITY PLOT FOR I (1000 samples, n = 20, λ = 5)
FIG. A.13: NORMAL PROBABILITY PLOT FOR I (1000 samples, n = 50, λ = 3)
FIG. A.14: NORMAL PROBABILITY PLOT FOR I (1000 samples, n = 50, λ = 5)
FIG. A.15: NORMAL PROBABILITY PLOT FOR I (1000 samples, n = 100, λ = 3)
.5250 .6750 .8250 .9750 1.125 1.275 1.425 1.575 .6000 .7500 .9000 1.050 1.200 1.350 1.500 1.650 105 FIG. A.16: NORMAL PROBABILITY PLOT FOR I (10CC Samples, n = ICO, X = 5) - 3 . 7 5 * .5250 .6750 .8250 .9750 1.125 1.275 1 4 2 5 i S7S 6O00 .7500 .9000 1.050 1.200 1.350 1.500 ' 1 106 A2.2 EMPIRICAL CRITICAL VALUES CM r-x O CTl CM CM o CO O CM CO CO o LO CM IT) O CO CO r-x o o CO 1—1 LO CTl 0x1 CM o oo r—1 cn oo co- CO CO r-x r^ CO 1—1 1—1 r-H o CM o CTl o cn UD co CO LO o o o o CO CO CO CO «xf CO «xT • • • • • • • • • • • • • • • • • CM CM CM CM CM CM CM CM r-H 1—1 I-H 1—1 r—l r-H I-H t—1 i—1 co CM CTl CO r-- -3" CTl CM i — l CM oo -3- "xT "xT LO i—1 i—1 CO oo i—1 o CM CO CM xj-CTl CO o CO LO r-x i—1 i—I o CTl LO CO CM CM x* "xt CO CM o o CTl CTl cn i—1 r—1 I—1 o r-x r-x r-x r-x •xT -xT «3- <xt" CO CO CM CM • • • • • • • • • • • • • • • • CM CM CM CM I-H 1—1 1—1 T—1 1—1 r-H 1—1 I-H I — l 1—1 t—1 CO 1 QL in CO co o CM LO 1—1 CO CTl oo co CO CO •xT o r-x CTl cn I—l LO 1—1 CM CO 1— CO LO r-x. LO CTl LO CO CM LO co LO r-x CO 00 CTl CTl oo CO LO LO LO x3- x* x3" CO oo 00 co oo LO LO LO LO CO CO CO CO CM CM CM CM • • • • • • • • • • • • • • •• • • o i — i I-I 1—1 i-H 1—1 T—1 1—1 1—1 1—1 I-H 1—1 1—1 I-H 1—1 T—t o o c IO 1—4 CD CO r-x o CO CM *3" oo cn CO CM CTl CTl cn r-x CO CXJ r-x CO LO CO CO CM CO CTl rx. CO LO CO CO CO CO zr o CTl t~t CM CM CTl CO CO CO CO CO CO CO 00 00 oo co o cn LO CO CO CO CO «xt- «xt CM CM CM CM I—1 I-H 1—1 I—l • • • • • • • a • • • • • • « • • 1 i i—1 r-H i—1 i — l 1—1 I—1 f—1 I-H r-i r-H i-H I — l I-H T—1 T—t I — l CO =C CO o CTl CO CM 1—1 -3- oc CM CO cn 1—1 CO CO 1— I-H o O cn CO cn co LO r-H CM CO CO CO CO CO i—i o o r-x CO CO xj- 1—1 1—1 1—1 LO LO LO LO CM CM CM CM i — i LO co CO CO CO f-x.rx.rx. fx. 00 00 00 00 LO- • • • • • • • • • • • • • • • • • C CO UJ •xt r-- LO oo CO o CTl co r-x o CTl LO CTl CO CTl => CM LO CO r-x. CO LO CM CO LO r-x I-H CO I—l CO o _J IT) CO rx. 
CO CO CO CO CO o CTl CTl CTl 00 oo 00 00 <: O CO CO CO LO LO LO LO r-x CO CO CO r-x r-x r-x r-x > • • • • " • • • • i •a: c_> t—t CO CO LO CO co l-x r-x LO «3- oo CO LO r-x CTl CO CO 1— LO I—l CTl CO LO CO r-~ o 00 CO CO CO "xT CM I — l • — I CM CO o CTl CTl CTl r-x r-x r-x LO "xt x3- xfr -3" or O CO ro CM CM «3- "xT CO CO CO co r- l-x, l-x. r-. c_> • • • • • • • • • • • * • • • _ i C_) DC cn CM r—i LO co LO CTl i — l «3" CM CO cn fxx LO CM oo o l — l o CM co i — l co CO CTl co oo CO r-x CO x3- co CM 1—1 T—t O- o CM cn CTl 00 o CO CO co • r-x LO CO LO 00 r-x r-x r-x s : • CM t — i i — l 1—1 CO CO LO LO CO LO CO CO CO CO LU • • * • • CM i •a: i — l CO LO CO I—1 co LO 00 1—1 CO LO CO r-t co LO 00 UJ i o o o Q TAE c r-i CM LO o t—1 107 A2-3 PEARSON CURVE CRITICAL VALUES ID i n r-l ri o rH r-l c n o r- co vo r o VD r-l in 1— rH o rH O i I O r o r o cn i n r r -r CO rH o o O l VD VO VO VD o o o o VD VD vo vo • • • • • • • • • • CN CN CN CN CN CN CN CN rH rH rH rH in CN O r ~ r~ r» in VD rH O l VO CN O i n CO CN rH O O VD in r- r- O o o ro ro ro ro ro ro ro CT> o rH rH rH r- r-• • • • • • • * • • • • * CN CN CN CN rH rH rH rH rH rH rH rH CN cn CN VO CO ro rH in r- •ST rH in in cn i-» CN CN in VD -r -3* -r CN in VD r- r- CO co CO in in in in cn CO CO CO CO in in in in ro ro ro ro • • • • t • • • • • • • • rH rH rH rH rH rH rH rH rH rH rH rH o CO CO in ro in ro O -1* r- CN VO VD O CO ro rH >> cn CN in m o r~ rH rH CN rH CN CN CN VD VD VO VO cn in VO VO VD -r CN CN CN CN • • • • • • • • • • • • • rH H H H rH rH rH rH rH rH rH rH a 1— ro ro CO rH CO in cn VO ro vo o o -i' CO CN CO CO rH CO VO o M' ro CN «c rH r-l r ~ VD ro CN rH rH VD in in in X • m VD VD VO VO • LU • » • • • • • • • • • • * — CO 11 J _ o CN o r- CO ro o CO cn rH rH _J r-- ro r- •3' CN CN CO VO VD VD in rji •< in CN CO r~ r-» VO ro ro o cn oi cn > o ro ro ro in in m in vo vo vo • • • • • • • • ' • • • • • _ C_) 1— i—i —1" ro o VD CO ro ro o _ in cn O in CN rH CO -r CN CN cn r- VD CM in rH o o O r~ r- VO - r 
o ro ro ro ro m vo vo VO vo UJ • • • • • • • • • • • • • la-ce: _D C J z: o in cn in VD in cn ro ro r- o *r o m VO o CO cn rH ro CN rH CN o oo CO o cn CO CO CTl VO vo vo 00 VO vo in cx o CN rH rH rH ro ro ro ro in in in in •< • • • • • • • • • • • • • o_ ro H ro in co rH ro in co H ro in co LE o o o CO rH CN in •< 1— ro CO CN r-O CO VD CN r H O o 1 • • • r H rH rH rH cn VO r~ rH CN cn CO CO o cn cn cn ro CN CN CN « • • • rH rH rH rH ••tf V O C N O v o i n m i n - r -?•-(• T I « C N C N C N C N • • • • rH rH rH rH vo vo r-- cn rr in in in co co co co rH rH rH rH • • • • H H rH rH r- rH vo CN vo -r co ro CN CN CN' CM CO CO CO CO cn in vo rH -i' o cn oi co co r-» r» »> r> ro o oo H CO T J - C N C N ^J* r j 4 r- r- r- r-~ H r l rf tf H vO M co » r- r-vo vo vo vo rH ro in co o o rH TABLE A4 : PEARSON CURVE CRITICAL VALUES (ASYMPTOTIC) n X .005 .025 .05 .10 a .90 .95 .975 .995 .1 « I 8 .2169 .1898 .1928 .1972 .3477 .3099 .3070 .3046 .4216 .3832 .3775 .3759 '.5140 .4786 .4720 .4687 1.4443 1.6108 1.6196 1.6259 1.8188 1.8601 1.8692 1.8726 2.0673 2.0998 2.1060 2.1084 2.6656 2.6431 2.6340 2.6275 20 50 1 .3953 Z5038 .5642 .6388 1.4114 1.5753 1.7342 2.0988 3 .3674 .4789 .5424 .6219 1.4244 1.5828 1.7315 2.0573 5 .3642 .4747 .5382 .6184 1.4271 1.5845 1.7307 2.0472 8 .3630 .4728 .5366 .6163 1.4297 1.5848 1.7299 2.0398 1 .5786 .6612 .7062 .7608 1.2621 1.3560 1.4436 1.6341 3 .5627 .6494 .6971 .7548 - 1.2649 1.3545 1.4370 1.6093 5 .5599 .6473 .6951 '.7534 1.2650 1.3545 1.4353 1.6044 8 .5587 .6463 .6945 .7525 1.2653 1.3539 1.4346 1.6014 100 1 .6856 .7503 3 .6763 .7440 5 .6744 .7429 8 .6733 .7420 .7852 .8270 1.1852 .7804 .8240 1.1863 .7796 .8233 1.1860 .7794 .8234 1-1862 1.2474 1.3042 1.4239 1.2459 1.2994 1.4110 1.2455 1.2987 1.4085 1-2447 1-2981 1-4069 •TABLE A5 : GRAM-CHARLIER CRITICAL VALUES (THREE EXACT MOMENTS) a 10 X .005 .025 .05 .10 .90 -95. .975 .995 1 .2291 .3078 . .3750 .4711 1.6649 1.9223 2.10.94 2. 4263 3 .1618 .2581 .3357 .4433 1.6708 1.9292 2.1257 2. 
4618
 10   5    .1489   .2489   .3286   .4384  1.6720  1.9300  2.1279  2.4673
 10   8    .1418   .2438   .3247   .4357  1.6727  1.9304  2.1291  2.4703

 20   1    .4127   .4855   .5419   .6184  1.4492  1.6200  1.7537  1.9847
 20   3    .3713   .4601   .5238   .6076  1.4459  1.6095  1.7437  1.9812
 20   5    .3626   .4550   .5202   .6055  1.4454  1.6075  1.7415  1.9800
 20   8    .3577   .4521   .5182   .6044  1.4452  1.6064  1.7402  1.9793

 50   1    .5913   .6557   .6989   .7542  1.2722  1.3683  1.4496  1.5974
 50   3    .5694   .6446   .6917   .7505  1.2700  1.3616  1.4407  1.5884
 50   5    .5647   .6423   .6903   .7498  1.2696  1.3603  1.7415  1.9800
 50   8    .5621   .6411   .6894   .7494  1.2694  1.3595  1.4379  1.5852

100   1    .6924   .7482   .7822   .8243  1.1887  1.2517  1.3065  1.4097
100   3    .6799   .7423   .7786   .8226  1.1875  1.2480  1.3010  1.4030
100   5    .6773   .7411   .7778   .8223  1.1873  1.2473  1.3000  1.4015
100   8    .6758   .7404   .7774   .8221  1.1872  1.2469  1.2994  1.4007

TABLE A6: GRAM-CHARLIER CRITICAL VALUES (THREE ASYMPTOTIC MOMENTS)

                                     α
  n   λ    .005    .025    .05     .10     .90     .95     .975    .995
 10   1    .2406   .3131   .3770   .4698  1.6848  1.9478  2.1346  2.4506
 10   3    .1940   .2855   .3598   .4630  1.6496  1.9001  2.0898  2.4139
 10   5    .1828   .2792   .3559   .4615  1.6433  1.8901  2.0799  2.4054
 10   8    .1761   .2756   .3537   .4606  1.6399  1.8845  2.0741  2.4005

 20   1    .4172   .4876   .5429   .6186  1.4524  1.6252  1.7593  1.9901
 20   3    .3830   .4698   .5322   .6144  1.4387  1.5999  1.7319  1.9653
 20   5    .3751   .4658   .5299   .6135  1.4362  1.5948  1.7260  1.9597
 20   8    .3705   .4635   .5285   .6130  1.4349  1.5920  1.7227  1.9565

 50   1    .5924   .6562   .6991   .7542  1.2725  1.3690  1.4505  1.5983
 50   3    .5725   .6471   .6938   .7522  1.2683  1.3592  1.4379  1.5846
 50   5    .5681   .6451   .6927   .7518  1.2675  1.3573  1.4353  1.5816
 50   8    .5656   .6441   .6921   .7516  1.2670  1.3563  1.4338  1.5798

100   1    .6928   .7483   .7823   .8243  1.1888  1.2519  1.3067  1.4100
100   3    .6810   .7432   .7793   .8232  1.1869  1.2472  1.3001  1.4017
100   5    .6785   .7421   .7787   .8230  1.1865  1.2463  1.2988  1.3999
100   8    .6771   .7415   .7784   .8229  1.1863  1.2458  1.2981  1.3989

TABLE A7: GRAM-CHARLIER CRITICAL VALUES (FOUR EXACT MOMENTS)

                                     α
  n   λ    .005    .025    .05     .10     .90     .95     .975    .995
 10   1    .2233   .3797   .4598   .5573  1.4398  2.0509  2.2601  2.5756
 10   3    .1123   .2918   .3868   .5010  1.5637  2.0050  2.2406  2.5873
 10   5    .1017   .2764   .3728   .4896  1.5831  1.9941  2.2324  2.5850
 10   8    .0965   .2681   .3651   .4832  1.5930  1.9881  2.2272  2.5829

 20   1    .3518   .5031   .5733   .6553  1.3862  1.6631  1.8292  2.0707
 20   3    .3266   .4650   .5398   .6289  1.4162  1.6260  1.7896  2.0448
 20   5    .3223   .4584   .5336   .6238  1.4208  1.6202  1.7809  2.0375
 20   8    .3209   .4548   .5301   .6210  1.4232  1.6172  1.7759  2.0331

 50   1    .5616   .6562   .7058   .7643  1.2592  1.3733  1.4709  1.6321
 50   3    .5502   .6439   .6951   .7561  1.2634  1.3630  1.4515  1.6110
 50   5    .5481   .6415   .6930   .7545  1.2641  1.3613  1.4478  1.6061
 50   8    .4823   .6138   .6919   .7536  1.2645  1.3604  1.4145  1.5670

100   1    .6785   .7474   .7843   .8280  1.1844  1.2524  1.3134  1.4252
100   3    .6715   .7416   .7796   .8246  1.1853  1.2482  1.3045  1.4123
100   5    .6701   .7405   .7787   .8240  1.1854  1.2474  1.3028  1.4095
100   8    .6693   .7398   .7781   .8236  1.1855  1.2470  1.3018  1.4079

TABLE A8: GRAM-CHARLIER CRITICAL VALUES (FOUR ASYMPTOTIC MOMENTS)

                                     α
  n   λ    .005    .025    .05     .10     .90     .95     .975    .995
 10   1    .2461   .3181   .3815   .4737  1.6798  1.9409  2.1263  2.4400
 10   3    .1651   .2600   .3369   .4438  1.6728  1.9323  2.1288  2.4644
 10   5    .1474   .2481   .3281   .4382  1.6712  1.9287  2.1266  2.4662
 10   8    .1371   .2412   .3231   .4350  1.6703  1.9264  2.1250  2.4668

 20   1    .4181   .4884   .5436   .6191  1.4517  1.6243  1.7582  1.9886
 20   3    .3724   .4606   .5241   .6078  1.4463  1.6102  1.7445  1.9820
 20   5    .3621   .4547   .5201   .6055  1.4453  1.6072  1.7411  1.9797
 20   8    .3561   .4513   .5178   .6042  1.4448  1.6055  1.7391  1.9782

 50   1    .5924   .6563   .6992   .7543  1.2725  1.3689  1.4504  1.5982
 50   3    .5696   .6447   .6918   .7506  1.2701  1.3617  1.4408  1.5886
 50   5    .5646   .6423   .6902   .7498  1.2696  1.3602  1.4388  1.5863
 50   8    .5618   .6409   .6894   .7494  1.2694  1.3594  1.4377  1.5850

100   1    .6928   .7483   .7823   .8243  1.1888  1.2519  1.3067  1.4099
100   3    .6799   .7423   .7786   .8226  1.1875  1.2481  1.3011  1.4030
100   5    .6772   .7411   .7778   .8223  1.1873  1.2473  1.3000  1.4015
100   8    .6757   .7404   .7774   .8221  1.1871  1.2469  1.2994  1.4006
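As an illustration of how critical values of this kind might be used, the following minimal sketch computes the index of dispersion for a sample of counts and compares it with a two-sided 5% region. It assumes the definition I = S²/X̄, with S² the sample variance on the n - 1 divisor; the function names are hypothetical and not part of the thesis. The default bounds are the .025 and .975 entries for n = 10, λ = 1 in the last table above (four asymptotic moments); choosing the appropriate λ row for real data is left open here.

```python
from statistics import mean, variance

# Two-sided 5% critical values taken from the tabulated row n = 10, lambda = 1
# (four asymptotic moments): the .025 and .975 columns.
LOWER_CRIT = 0.3181
UPPER_CRIT = 2.1263


def index_of_dispersion(counts):
    """Return S^2 / X-bar (assumed definition; S^2 uses the n - 1 divisor)."""
    xbar = mean(counts)
    if xbar == 0:
        raise ValueError("index of dispersion is undefined when all counts are zero")
    return variance(counts) / xbar


def dispersion_test(counts, lower=LOWER_CRIT, upper=UPPER_CRIT):
    """Return the statistic and a verdict from a two-sided critical region."""
    i_stat = index_of_dispersion(counts)
    if i_stat < lower:
        verdict = "under-dispersed"
    elif i_stat > upper:
        verdict = "over-dispersed"
    else:
        verdict = "consistent with randomness"
    return i_stat, verdict
```

For example, `dispersion_test([0, 1, 1, 2, 0, 1, 3, 0, 1, 1])` evaluates a sample of ten counts with mean 1 against the n = 10, λ = 1 bounds; other rows of the table can be supplied through the `lower` and `upper` arguments.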
Title | The index of dispersion |
Creator | Avelino, Edgar G. |
Publisher | University of British Columbia |
Date Issued | 1984 |
Genre | Thesis/Dissertation |
Type | Text |
Language | eng |
Date Available | 2010-05-09 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
IsShownAt | 10.14288/1.0095978 |
URI | http://hdl.handle.net/2429/24541 |
Degree | Master of Science - MSc |
Program | Statistics |
Affiliation | Science, Faculty of; Statistics, Department of |
Degree Grantor | University of British Columbia |
Campus | UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |