SIMULTANEOUS ESTIMATION OF THE PARAMETERS OF THE DISTRIBUTIONS OF INDEPENDENT POISSON RANDOM VARIABLES by KAM-WAH JTSITI B.Sc, The Chinese Uni v e r s i t y of Hong Kong, 1970 M.Sc, Univ e r s i t y of Windsor, 1974 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY i n THE FACULTY OF GRADUATE STUDIES Department of Mathematics and I n s t i t u t e of Applied Mathematics and S t a t i s t i c s We accept t h i s thesis as conforming to the required standard THE UNIVERSITY OF June, ^ ) Kam-Wah BRITISH COLUMBIA 1978 Tsui, 1978 In presenting th is thes is in p a r t i a l fu l f i lment of the requirements for an advanced degree at the Univers i ty of B r i t i s h Columbia, I agree that the L ibrary shal l make it f ree l y ava i lab le for reference and study. I fur ther agree that permission for extensive copying of th i s thes is for scho lar l y purposes may be granted by the Head of my Department or by h is representat ives . It is understood that copying or pub l i ca t ion of th is thes is for f i n a n c i a l gain sha l l not be allowed without my wri t ten permission. Department of Mathematics and Institute of Applied Mathematics and Statistics The Univers i ty of B r i t i s h Columbia 2075 Wesbrook Place Vancouver, Canada V6T 1W5 Date June 30, 1978 i i Supervisors: Professor S. James Press and Professor James V. Zidek ABSTRACT This work is devoted to simultaneously estimating the parameters of the distributions of several independent Poisson random variables. In particular, we explore the possibility of finding estimators of the Poisson parameters which have better performance than the maximum likelihood estimator (MLE). We first approach the problem from a fre-quentist point of view, employing a generally scaled loss function, called the k-normalized squared error loss function P L ( A , A ) = Z ( A . - A . ) 2 / A K , K. , - 1 1 1 1=1 where k is a non-negative integer. The case k=0' is the squared error loss case, in which we propose a large class of estimators including those proposed by Peng [1975] as special cases. Estimators pulling the MLE towards a point other than zero as well as a point determined by the data itself are proposed, and i t is shown that these estimators dominate the MLE uniformly. Under L^ with k >_ 1, we obtain a class of estimators dominating the MLE which includes the estimators proposed by Clevenson and Zidek [1975]. We next approach the problem from a Bayesian point, of view; a two-stage prior distribution is adopted and results for a large class of prior distributions are derived. Substantial savings in terms of mean squared error loss of the Bayes point estimators over the MLE are expected, especially when the Poisson parameters f a l l into a relatively narrow range. i i i An empirical Bayes approach to the problem is carried out along the line suggested by Clevenson and Zidek [1975]. Some results are obtained which parallel those of Efron and Morris [1973], who work under the assumption that the random variables are normally distributed. We report the results of our computer simulation to quantitatively examine the performance of some of our proposed estimators. In most cases, the savings, under the appropriate loss functions, are an in-: . creasing function of the number of Poisson parameters. The simulation results indicate that our estimators are very promising. The savings of the Bayes estimators depend on the choice of prior hyperparameters, and hence proper choice leads to substantial improvement over the MLE. Although most of the results in this work are derived under the assumption that only one observation is taken from each Poisson distri-bution, we extend some results to the case where possibly more than one observation is taken. We conclude with suggestions for further work. iv TABLE OF CONTENTS Page SECTION 1. INTRODUCTION 1.1 Background 1 1.2 Outline 5 SECTION 2. NOTATION AND FUNDAMENTALS 2.1 Notation 10 2.2 Basic Lemmas and Theorems 11 SECTION 3. ESTIMATION UNDER SQUARED ERROR LOSS 3.1 Introduction 19 3.2 Notation 20 3.3 Shifting the MLE Towards k 23 3.4 Adaptive Estimators 36 SECTION 4. ESTIMATION UNDER K-NORMALIZED SQUARED ERROR LOSS 4.1 Introduction 41 4.2 Minimax Estimators 41 4.3 Better Estimators Under L, t 50 k SECTION 5. BAYESIAN ANALYSIS 5.1 Introduction 57 5.2 Estimates of the Parameters 59 • 5.3 Marginal Posterior Density 70 5.4 Summary 76 V SECTION 6. EMPIRICAL BAYES ESTIMATION 6.1 Background .. • 78 6.2 Relative Savings Loss i n the Poisson Case 79 6.3 The Plus Rules 81 6.4 Bayes Rules with Respect to Other P r i o r s 81 6.5 Truncated Bayes Rules 83 6.6 The Risk Function of the Estimator A* 87 6.7 S i m p l i f i c a t i o n of R(b,A*) 89 6.8 Risk Function of A* as a Function of A 91 6.9 Risk Function of A-'as a Function of A . 94 6.10 Risk Function of the Clevenson-Zidek Estimators 95 6.11 Summary 96 SECTION 7. COMPUTER SIMULATION 7.1 Introduction 98 7.2 Estimators Under k-NSEL 100 7.3 Bayes Estimators 103 7.4 Estimators Under Squared Error Loss 109 7.5 Comparison of the Estimators 112 SECTION 8. EXTENSIONS 8.1 Motivation 113 8.2 Estimators Under Generalized Squared Error Loss 114 8.3 Estimators Under Generalized k-NSEL 117 8.4 An A p p l i c a t i o n 120 SECTION 9. SUGGESTIONS FOR FURTHER RESEARCH 122 BIBLIOGRAPHY 123 VI LIST OF TABLES Page I. Improvement Percentage of A over the MLE Narrow Range f or X/s 101 I I . Improvement Percentage of A over the MLE Wide Range for A / s 101 I I I . Improvement Percentage of A over the MLE Narrow Range for A / s 102 A l IV. Improvement Percentage of A over the MLE Wide Range f or A / s 102 V. Improvement Percentage of A over the MLE 104 VI. Poisson-Distributed a Cu=0, v=0) Narrow Range for A ±' s 104 VII. Poisson-Distributed a (u=0, v=0) E f f e c t of y 107 VIII. Poisson-Distributed a (u=0, v=0) Wide Range for A.'s 107 l IX. Negative Binomial-Distributed a (u=l, v=0, a^=3.0) 108 X. Geometric-Distributed a (u=l, v=0, a^=1.0) 108 A(0) XI. Improvement Percentage of A over the MLE Narrow Range f or A / s 110 * (0) XII. Improvement Percentage of A over the MLE Wide Range f or A ? s 110 XIII. Improvement Percentage of A^ m^ over the MLE Narrow Range for A^'s I l l XIV. Improvement Percentage of A over the MLE Wide Range for A 's I l l v i i : ACKNOWLEDGEMENTS I would l i k e to express my gratitude to Professor James Zidek for suggesting my thesis topic and guiding my research during the f i r s t year of my work. His keen in s i g h t s were extremely h e l p f u l . I would also l i k e to thank Professor S. James Press for supervising my work for the past two years, and for suggesting the topics studied i n Sections 4 and 5. Both Professor Zidek and Professor Press provided encouragement and stimulation during the preparation of th i s t h e s i s . Thanks are further extended to Professor Stanley Nash for meticulous reading of the th e s i s , and to Professor Fred Wan, who has been very h e l p f u l during my entire, enrollment at the Un i v e r s i t y of B r i t i s h Columbia. The f i n a n c i a l assistance of the Univ e r s i t y of B r i t i s h Columbia and the National Research Council of Canada, as we l l as the use of the com- . puter f a c i l i t i e s at the Un i v e r s i t y of B r i t i s h Columbia and the Un i v e r s i t y of C a l i f o r n i a , Riverside, are g r a t e f u l l y acknowledged. F i n a l l y , I would l i k e to thank my wife for her steadfast encouragement and patient assistance. 1 SECTION 1. INTRODUCTION 1.1 Background The l i t e r a t u r e on the problem of estimating the mean of a p-dimen-s i o n a l m u l t i v a r i a t e normal d i s t r i b u t i o n N (0,E), both when the covariance P matrix E i s known and unknown, has p r o l i f e r a t e d since Stein [1956] d i s -covered the s u r p r i s i n g r e s u l t that when £ - I, the sample mean, which i s the maximum l i k e l i h o o d estimator CMLE), i s inadmissible under squared error loss when p > 3. Better estimators of the mu l t i v a r i a t e normal mean have been found by Alam [1973, 1975], Baranchik [1964, 1970], Berger [1976a, 1976b, 1976c], Berger and Bock [1976a, 1976b], Berger et a l . [1977], Efron and Morris [1973, 1976], Haff [1976, 1977], Hudson [1974], James and Stein [1961], L i n and Ts a i [1973], Stein [1974], Strawderman [1971, 1973], and others. B a s i c a l l y , these estimators are obtained by shrinking the MLE towards a known fi x e d point or a point determined by the data i t s e l f . When we consider the problem from another point of view, we see that f o r simultaneous estimating of several means of normal populations, i t i s inadmissible to estimate each population mean by i t s sample mean, even though the populations have no mathematical dependence and the observations between populations are independent. Attention has also recently been brought to simultaneous estimation of the parameters of non-normal d i s t r i b u t i o n s , including the d i s c r e t e d i s t r i b u t i o n s . Johnson [1971] shows that i n the binomial case the usual estimator i s admissible under squared error l o s s . This i s due to the superb performance of the usual estimator near the boundaries of the para-meter space. Also, attempts have been made to work with general members of the exponential family i n the simultaneous estimation problem (Hudson [1974, 1977]). 2 In t h i s study, we concentrate on the s i t u a t i o n where the underlying d i s t r i b u t i o n i s Poisson. There are many p r a c t i c a l s i t u a t i o n s that lead us to investigate the problem of simultaneous estimation of the para-meters of several independent Poisson d i s t r i b u t i o n s . For example, a metro-p o l i t a n area divided into several f i r e precincts might be interested i n knowing the expected numbers of f i r e s i n each of the precincts i n a f i x e d time period so that f i r e - f i g h t i n g resources can be optimally a l l o c a t e d . During that period, the number of f i r e s i n a si n g l e precinct could be supposed to follow a Poisson d i s t r i b u t i o n , with an i n t e n s i t y parameter which would be d i f f e r e n t across p r e c i n c t s . In the study of the process of o i l w e l l discovery by wildcat exploration i n Alberta, Canada, Clevenson and Zidek [1975] suggest that the problem be treated as one of simultaneously estimating the parameters of the d i s t r i b u t i o n s of several independent Poisson random v a r i a b l e s . ; They l e t each parameter represent the expected number of/ o i l w e l l d i s -coveries during a p a r t i c u l a r month. With t h e i r a v a i l a b l e data, the number of parameters i s approximately 200. Let x^,...,Xp be observations of p independent'.Poisson random variables_-X^,.. .yXp bwlth i n t e n s i t y parameters A^,..........A , r e s p e c t i v e l y and p >_ 2. Let x = (x^,...,Xp), X = (X^,.. .,-Xp), and A = (A^,..., A }. The problem i s to simultaneously estimate the parameters A^,...,A based on the data x. The usual estimator of A i s the MLE, X. One might conjecture that, as i n the normal case, estimators uniformly better than the MLE can be found i n the Poisson case when p i s large. Using the normalized squared error l o s s ^ R "2 function L(A,A) = £ (A. - A.) /A., Clevenson and Zidek [1975] do indeed i = l obtain a large c l a s s of estimators dominating the MLE as long as p > 2. 3 Such a loss function was chosen p a r t l y because t h e i r A^'s were expected to be small and hence inaccurate estimation of small X_/s seemed highly undesirable; t h e i r l o s s f u n c t i o n r r e f l e c t s the desire to penalize over-estimation of small A.'s. The loss function was chosen also because the x usual estimator would be minimax i n t h i s case and because A. i s the i 1 population's variance. Their estimators shrink the MLE towards the o r i g i n . They give two reasons why one might expect that shrinking the usual estimator w i l l y i e l d a better estimator: F i r s t l y the loss function penalizes heavily f o r bad estimates when the A.'s are small, and i n such cases i t i s only p o s s i b l e to produce .'; bad overestimates. Secondly Stein's r e s u l t s suggest i t i s better to r e s t r a i n random m u l t i v a r i a t e and hence chaotic observations by shrinking them toward some point, here zero which does play a distinguished r o l e i n view of the f i r s t reason above and because of i t s s p e c i a l nature as the extreme point of the parameter space. The estimation problem under d i f f e r e n t error l o s s functions has been P - 2 studied by others. Using the squared error loss function L( A, X) = 7. (A.-A.),, i = l 1 1 Hiidson [1974] proposes some empirical Bayes estimators for A. The estimators. _ are expected to improve considerably on X for a wide range of parameter values, e s p e c i a l l y when X^,...,A are s i m i l a r i n value and are not too large. But when one parameter A^ i s very large and the.o.ther parameters small, the estimators are i n f e r i o r to the MLE X, though the loss i s expected to be small. That i s , the proposed estimators do not dominate the MLE uniformly. Hudson also derives an unbiased estimate U for the improvement of the r i s k when the estimator A i s used instead of X, i . e . E, U(X) = R(A,X) - R(A,A). A A s u f f i c i e n t condition f o r A to be uniformly better than X i s that U(x) > 0 f o r a l l x and U(x) > 0 for some x. Estimators of the form 4 A = (11- C(X))X, are of considerable appeal. For, they are s i m i l a r to those obtained i n the normal case. However, Hudson [1974] shows that i n order for the estimators of t h i s form to y i e l d U(x) >_ 0 for a l l x, i t i s necessary that C(x) =0. When a l l the observations but one are zero, none of the observations can be moved any closer to the o r i g i n . This i s because the dimension of the problem i s then e s s e n t i a l l y one, and when p = 1,^'the' MLEiis known to be admissible under squared error l o s s (Hodges and Lehmann [1951]). Because of the d i f f i c u l t y of handling cases when a l l but one of the parameters are near zero, i t was conjectured that i t might be impossible to improve upon the MLE when squared error l o s s i s the c r i t e r i o n . The r e s u l t s of Peng [1975] are therefore quite s u r p r i s i n g . He shows that althoughtth'ezusual estimator i s admissible when p _^ 2, i t i s inadmissible when p >_ 3 by proposing estimators that are a c t u a l l y superior to the usual one uniformly i n A. One of h i s estimators shrinks the MLE towards the o r i g i n provided the number of non^-zero observations i s greater than two. He also proposes an estimator which shrinks a l l the non-zero components of the MLE towards zero, but which sometimes gives non-^zero values for the zero components of the MLE. The same condition i s required, namely the number of non-zero observations i s at l e a s t three, i n order that the proposed estimator be s t r i c t l y better than the MLE. So f a r , the estimators that are uniformly better than the MLE under c e r t a i n l o s s functions shrink the MLE towards the o r i g i n only. There i s no other point towards which the shrinkage i s made. We propose some estimators which shrink the MLE towards an a r b i t r a r y prechosen nonnegative integer k, and some estimators which shrink the MLE towards a point 5 determined by the data. These estimators dominate the usual one uniformly under squared error l o s s . 1.2 Outline In Section 2, we introduce the notation to be used i n the subsequent sections and given some basic lemmas that are u s e f u l . E s s e n t i a l l y , the basic lemmas (Lemmas 2.2.2 and 2.2.7) provide i d e n t i t i e s of the " r i s k d e t e r i o r a t i o n " under c e r t a i n l o s s functions when an estimator X i s used instead of the MLE X. Two basic theorems w i l l also be stated. The f i r s t one (Theorem 2.2.3), due to Peng 11975], provides estimators of the Poisson parameters which dominate the MLE uniformly under squared error l o s s . The other (Theorem 2.2.9), due to Clevenson and Zidek [1975], provides estimators which dominate the MLE uniformly under normalized squared error l o s s . An alternate proof f o r the l a t t e r theorem i s provided. Section 3 examines the estimation problem when the f a m i l i a r squared error l o s s function i s used. Although Peng [1975J investigates the problem and finds estimators dominating the MLE when p >_ 3, the performance of h i s estimators i s expected to be good only when a l l the underlying parameters are r e l a t i v e l y small or when only some of the.parameters are large. I t i s therefore of i n t e r e s t to see i f further improvements are possi b l e . In "(k) t h i s section, we generalize Peng's r e s u l t s and propose estimators X which dominate the MLE uniformly In-X:: For a f i x e d non-negative integer :he (k) k, p u l l s the MLE towards k whenever t e number of observations greater than k i s at l e a s t three. The X) '= X. + f. ( X ) , where estimator X = (X^,...,X ) i s given by 6 r k<Kx)h(x ±) f (x) = - — , i = 1,.. . -, p 1 if - *y E :.h (x.) j = i - J and r i s the maximum of the number of observations greater than k, le s s K. two, and zero. The conditions on t)>(x) and h(x^) are given i n Section 3.2. When the number of observations greater than k i s le s s than three, * (k) " (k) A gives the same estimate as the MLE. When k=0, X reduces to one of Peng's estimators. We also propose estimators X^m^ which s h i f t the MLE towards a point determined by the data i t s e l f . These adaptive estimators, which dominate the MLE uniformly, are expected to perform w e l l for a wide range of parameter values, including the case when the parameters, are a l l r e l a t i v e l y Fml large but s i m i l a r i n value. The estimator A i s s i m i l a r i n form to A (k) X with the functions h(x.), <J>(x)» and r, replaced by appropriate ones. X K. In Section 4, we examine the simultaneous estimation problem from a d i f f e r e n t angle. In addition to normalized squared l o s s , we employ a more general los s function, namely, k-normalized square error los s (k-NSEL) A - A 2 k L, (A,A) = E (A. - A.) /A. , where k i s a p o s i t i v e integer. The case K ' , _ X X X 1=1 when k=l, L^, i s simply the normalized squared error los s function under which Clevenson and Zidek [1975] perform t h e i r a n a l y s i s . The usual estimator X of A, under L^, i s minimax. Using t h i s f a c t , a s u f f i c i e n t condition that an estimator A of A be minimax under L^ i s that i t s r i s k i s l e s s than or equal to that of X uniformly i n A. Theorem 2.2.9 thus provides us with a c l a s s of minimax estimators of A. A t y p i c a l member of t h i s c l a s s i s of the form A(X) = (1 - <j> (Z) / (Z+p-1) )X where (|>(z) i s a nondecreasing real-valued function and 0 <^ <)>(z) 2 ( p - l ) . 7 In the f i r s t part of t h i s section, we s h a l l show that t h i s c l a s s of mini-max estimators can be enlarged i n two ways. F i r s t , we include estimators of the form where (1) ip'(z) ^ b > 0 for some b (2) (j)(z) i s nondecreasing and 0 £ <j>(z+l) £. 2Min{p-l, i|^ (z) } f o r a l l z >_ 0 (3) iKz) i s nonincreasing and z + iR.z) i s nondecreasing. We next include estimators of the form x(x) = :(i ^7Z?- ) X ( z + b ) t + 1 where (1) t > 0, b > t+1 (2) <j>t(z) i s nondecreasing (3) <j>t(z) >_ 0 and <|> (z) i 0 (4) «> (z) / (z+b) t £Min{2 (p-t-1) , 2 (b-t-1) }, Moreover, when b = p-1, condition (4) can be replaced by <)> t(z)/(z+p-l) t < 2(p-t-1). In the remainder of the section, we provide motivation for the use A A A of k-NSEL, L, , (k >_ 2) and derive estimators X'~ (X X ) dominating the MLE under L^. B a s i c a l l y , the estimators have the form (KZ)X.(X.-l) •••(X.-k+l) X = X. - 1 1 1 l i p E (X.+l)(X.+2) •••(X..+k) + X.(X.-l) •••(X.-k+l) 2~j_ 3 3 3 8 where (1) 0 < cb(z) <_ 2k(p-l) (2) (j)(z) i s nondecreasing i n z. Because the MLE X i s a Bayes estimator i f p r i o r knowledge of the i n t e n s i t y parameters i s vague and the parameters are independent, i t i s hoped that s u b s t a n t i a l p r i o r knowledge, when i t i s a v a i l a b l e , can be incorporated i n a Bayesian manner to obtain s i g n i f i c a n t l y better estimators of A than X. In Section 5, we take a Bayesian approach to the problem of simultaneously estimating the p i n t e n s i t y parameters when mean squared error i s the loss c r i t e r i o n . The i n t e n s i t y parameters are assumed ex-changeable i n the sense of de F i n e t t i [1964], and to j o i n t l y follow a two-parameter gamma d i s t r i b u t i o n , a p r i o r i . For the second stage of the p r i o r d i s t r i b u t i o n we adopt a vague p r i o r density f o r one of the gamma d i s t r i b u t i o n parameters, and a generalized hypergeometric d i s t r i b u t i o n f o r the other. The j o i n t and marginal posterior d e n s i t i e s of the Poisson i n t e n s i t y parameters are developed, and Bayesian point estimators are found. In Section 6, we focus on estimators of X of the form X = (l--"b(Z))X and use empirical Bayes methods to perform analysis along the l i n e suggested by Clevenson and Zidek [1975]. Although no new estimators are found i n thi s section, the approach w i l l be h e l p f u l i n gaining i n s i g h t s about some of the estimators. We ca l c u l a t e the " r e l a t i v e savings l o s s " i n the Poisson case under normalized square error l o s s and use i t as a t o o l to obtain "plus r u l e s " (A = (1 - b(Z)) +X) and "truncated Bayes r u l e s " . We also c a l -culate the r i s k R(A,A ) of the Clevenson-Zidek estimators V 5 = [1 - ( s(p-l) / (Z+p-1) ) ]X 9 's s as a function of X, where 0 <_ s <_ 2. We f i n d that R(X,X ), which i s P a c t u a l l y a function of A = Z X., i s increasing and concave i n A. i = l 1 Section 7 contains the r e s u l t s of a computer simulation used to q u a n t i t a t i v e l y compare the MLE with some of our estimators, as w e l l as those of Clevenson and Zidek [1975] and Peng [1975]. For each estimator X, the percentage of the savings i n r i s k using X instead of the MLE i s A. calculated as [^(^»x) R(X,X)- > 2.00]%, using the los s function under K\A,Ji.) which X was derived. In most of the cases, the improvement percentage i s seen to be an increasing function of p, the number of d i s t r i b u t i o n s of the independent Poisson random v a r i a b l e s . For the non-Bayes estimators, the improvement percentage generally decreases as the magnitude of the X^'s increases. The improvement percentage of the Bayes estimators depends on the choice of p r i o r hyperparameters, and hence proper choice leads to su b s t a n t i a l improvement over the MLE. In Section 8, we return to the search for better estimators than the MLE under various los s functions, but a l t e r the s e t t i n g by allowing d i f -ferent numbers of observations to be taken from each population. We are led to consider weighted l o s s functions, an example of which i s the P - 2 k generalized k-NSEL function L(X,X).- : -2 c.(X. - X.) /X. . More i = l estimators dominating the MLE uniformly i n X are proposed, and an a p p l i -cation to estimation of the parameters of p independent Poisson processes i s provided. F i n a l l y , Section 9 consists of proposals for further research. 10 SECTION 2. NOTATION AND FUNDAMENTALS Let X.,...,X be p independent Poisson random va r i a b l e s with i n t e n s i t y 1 P parameters A^,...,A , r e s p e c t i v e l y (p >_ 2). For easy reference, we introduce here some notation that w i l l be used throughout t h i s paper. We also include some basic lemmas and theorems which prove to be u s e f u l i n ensuing sections. 2.1 Notation D e f i n i t i o n s (1) "(Random v a r i a b l e name) 'V ( D i s t r i b u t i o n name)" indicates that the random v a r i a b l e has the s p e c i f i e d d i s t r i b u t i o n with given parameter(s). (2) X = (X^,...,X ); x = (x^,...,x ) i s the vector of observations of X; P A= (A.,,...,A ); A = E A.. 1 P i = 1 i (3) J = the set of a l l integers; J + = the set of a l l nonnegative integers. (4) f . : J p -»• R, i = l , . . . , p , are functions from the p - f o l d Cartesian product J*3 of J with i t s e l f to the set of r e a l numbers R. (5) f(X) = (f..(X),...,f (X)). E j f . ( X ) | < », i = l , . . . , p . X p A X (k) (.6) Y = Y(Y-l) ... (Y-k+1)[, where k i s a p o s i t i v e integer and Y i s a r e a l number. P P (7) Z = Z X.; z = E x. . . , x . , i 1=1 i = l (8) <()(z) i s a nonr-decreasing realrvalued function. (9) [ S ] W = the w t h power of S. 11 t i l (10) e. = the p-vector whose i coordinate i s one and the r e s t of x whose coordinates are zero. (11) R(A,A) = the r i s k of the estimator A of A. (12) NB(u,p) = the negative binomial d i s t r i b u t i o n with probability-mass function Pr(z|b) = z+p-1 z ^ bP ( l ^ b ) Z , z e J + . P P (13) H = Max {x.}; m= Min {x.}. i = l 1 i = l 1 (14) N. = # {x. : x. = j'}. 3 x x J (15) N = (N ,...,N ). O. x. (16) ( y ) + = Max {y 50}. 2.2 Basic Lemmas and Theorems Let A be an estimator of A and R(A,A) be the r i s k of A. Most of our estimators considered i n subsequent sections w i l l be of the form X + f(X) where f i s as defined i n subsection 2.1. Define the r i s k improvement of A over the MLE, X, to be I = R(A,X) T, RCA,A). Hudson [1974J furnishes a us e f u l i d e n t i t y which we state as Lemma 2.2.1 below. Based on that i d e n t i t y , Hudson derives a basic i d e n t i t y about the unbiased r i s k improvement estimate under squared error l o s s : I = E ^ U(X), where U i s a function of X only. Peng 11975] defines r i s k d e t e r i o r a t i o n D = -I = E (.^U(X))' and shows that h i s estimators s a t i s f y the i n e q u a l i t y A ^U(x) <^ 0 for a l l x with s t r i c t i n e q u a l i t y for some x. This implies that the proposed estimators dominate the usual one under squared error l o s s . The i d e n t i t y of the r i s k d e t e r i o r a t i o n w i l l be stated as Lemma 2.2.2 below. 1 2 Use of the same i d e n t i t y leads to the discovery of s t i l l more estimators of X better than the MLE under squared error l o s s . We s h a l l show i n Section 3 that to any fi x e d nonnegative integer k, there corresponds a cla s s of estimators of X which shrinks the MLE towards k as long as the number of variables i s at l e a s t ;three. These estimators w i l l give an. estimate d i f f e r e n t from the MLE i f the number of observations that i s greater than k i s at l e a s t three; otherwise they give the same estimate as the MLE does. Adaptive estimators w i l l also be proposed i n Section 3. Lemma 2 . 2 . 1 . (Hudson [ 1 9 7 4 1 ? Peng [ 1 9 7 5 ] ) . Suppose Y ^ Poisson Cu) and G : R -* R i s a measureable function such that E | G ( Y ) I < oo and G(y) = 0 i f y < 0. Then yE G ( Y ' ) -= E Y G ( Y ^ 1 ) . v y u The following lemma .gives the unbiased estimate A of the d e t e r i o r a t i o n o i n r i s k of X = X + f(X) as compared to the r i s k of the MLE X. Lemma 2 . 2 . 2 . (Hudson I1974J, Peng [ 1 9 7 5 ] ) . Suppose X i s a random vector with independent Poisson random va r i a b l e s as coordinates, and X i s the corresponding Poisson parameter vector. Then the d e t e r i o r a t i o n i n r i s k D of X = X + f(X) as compared to the r i s k of the MLE i s D = R(X,X) - R(X,X) = E, A where X o P P A = I f .(X) + 2 ,S. X.[f.(X) - f . ( X - e . ) ] . o J . , l i i l l i = l 1=1 Proof: Use Lemma 2 . 2 . 1 (see Peng [ 1 9 7 5 ] ) . We see that a s u f f i c i e n t condition for an estimator of the form X + f(X) to have smaller r i s k than X (under squared error loss) i s that P o P A = E f.(x) + 2.E x. [f.(x) - f.(x - e.)] ° 1=1 1 1=1 1 1 + P <^ 0 for a l l x e J with s t r i c t i n e q u a l i t y f o r some x e J' . The following theorem i s due to Peng [1975]. Theorem 2.2.3. _+P Let X^,...,Xp be independent Poisson random va r i a b l e s with unknown expectations A^,...,A , and l e t the loss function L be given by * P o ? L(X,X) = E (A. - X.) . 1-1 ' 1 1 The estimator X - [ (p-N Q-2) +/S ]H dominates the MLE, X i f p >^ 3. Here X =• (X_,.. ., X ) , 1 P X = (X.,...,X ) e [Q,°o]P, 1 p X. H. = ..E1 (1/k) f o r i = 1 p (H. = 0 i f X. = 0), 1 k=l 1 X H = (H, ,...,H ), 1 P l l l .2 P 2 S = H = E H., i = l 1 N = # {X. : X. = 0} = the number of zero observations, and (p-N -2), o 1 x r o + Max {p-N -2,0;}. o Proof: Use Lemma 2.2.2 (see Peng ;[1975]). 14 In ad d i t i o n to the squared error l o s s function, we s h a l l consider other los s functions. We derive an i d e n t i t y s i m i l a r to that of Lemma 2.2.2.for the case of "k-normalized squared error l o s s " (k-NSEL), P ^ 2 k L K ( A , A ) = Z ( A . - 2^) A * i = l where k i s a p o s i t i v e integer. The motivation for considering k-NSEL i s discussed i n Section 4.3. The d e r i v a t i o n of the i d e n t i t y w i l l be decom-posed into the following four lemmas. Lemma 2.2.4. Suppose Y * Poisson (y) and h ; J -*• R i s a real^-valued function such that (1) E^ |h(Y)J < -Then (.2) h(j) =0 i f j < 0. v hOQ_ _ _ h(Y+l) u -u y Y+l Proof: v v y = 0 ti. y! V y=l y J oo -y y = o + s h(y+i)\;f- y y=0 _ I h ( y + l ) . e V ' y=o ^ y ! = E h(Y+l) y Y+l -' Q.E.D. The next lemma i s an immediate consequence of Lemma 2.2.4. 15 Lemma 2.2.5. Suppose Y ^ Poisson (u), k i s a p o s i t i v e integer and h : J -> R i s a real-valued function such that (1) E y |h(Y+j)| < ~, j = 0,...,k-l (2) h(j) = 0 i f j < k. Then v H(Y) = h(Y+k) = h(Y+k) y ^k % ( Y + k ) ( k ) y (Y+k)•(Y+k-1)•••(Y+l) Proof: Induction on k and a p p l i c a t i o n of Lemma 2.2.4. Lemma 2.2.6 below i s a gen e r a l i z a t i o n of Lemma 2.2.2 to the vector case. Lemma 2.2.6. independent Suppose ^ Poisson (A X, i = l , . . . , p , p >^ 2, and k i s a po s i t i v e integer. I f f : J P R, i = l , . . . , p , are functions on the p-f o l d Cartesian product of J,, such that (1) E |f,(X + je,);| ••:«», j ••= 0,...,k-l then (2) f ± ( x ) = 0 i f x < k, f .(X) f;(X + ke.) E, = E 1 1 A A k X (X. + k ) ( k ) l l Proof: Condition on X^ : j ^ i and apply Lemma 2.2.5. Let X = X + f(X) be an estimator of X, where f(X) = ( f , ( x ) , . . . , f (X)) 1 P and the f.'s s a t i s f y the conditions i n Lemma 2.2.6. The next lemma l gives an unbiased estimate of D^, the d e t e r i o r a t i o n i n r i s k of X as compared to the r i s k of X. 16 Lemma 2.2.7. Under k-NSEL, the d e t e r i o r a t i o n i n r i s k of A i s D k = R(A,A) - R(A,X) = Afc. where p f 2 ( X + ke.) P f.(X + ke.) - f.(X + (k-l)e.) A = E — 77V + 2 E (X. + k) — 1 1 1 k 1-1 ( k ) 1-1 1 (X + k ) ( k ) 1 X (x + k) 1 1 U i + k ; Proof: p (A. - X. - f . ( X ) ) 2 R(A,A) = E £ 1 1 1 A . k 1=1 A. P (A - X ) 2 p f 2 ( X ) p (X -A.) f.(X) = E, E ^ - T ; 1 — + E, E + 2E, E A . . - . k A k X . .. ,k :1=1 A. 1=1 A. 1=1 A. P f 2 ( X ) p X.f.(X) p f ; ( X ) = R(A,X) + E, E - i = — + 2E. E 2E. E -=r-T A . .. k A . . . v k A . .. , k - l i = l A. 1=1 A. 1=1 A. i l l p f 2 ( X + ke ) = R(A,X> + E [ E -± p£r-1-1 (X. + k ) W p f .(X + ke.) -. f . (X + (k-l)e.) . + 2 E (X + k) 1 Ijr- =-], 1=1 ••' " (X. + k) W The l a s t equality follows from Lemma 2.2.6. The r e s u l t follows immediately. Q.E.D. The s p e c i a l case of Lemma 2.2.7.when k = 1 i s the case when normalized squared error l o s s i s considered. The r e s u l t i s stated as a c o r o l l a r y below. 17 C o r o l l a r y 2.2.8. Under the normalized squared error loss function, the d e t e r i o r a t i o n i n r i s k of A i s D = R ( A , A ) - R(A,X) = E A , where P + 2 Z [f.(X+e.) - f.(X)']. 1=1 1 1 The following theorem of Clevenson and Zidek [1975].provides a cl a s s of estimators A of A dominating the MLE under the normalized squared error l o s s function. Theorem 2.2.9. Let X^,...,Xp be p independent Poisson random v a r i a b l e s with unknown parameters A ^ , . . . , A (p >_ 2) . Let A = ( A ^ , . . . , A ), X = (X^,...,X ), and p P P P Z = Z X.. Let the los s function be :the normalized squared los s 1=1 1 function L ( A , A ) = z ( A . - A . ) 2 / A ; . ; 1=1 1 1 1 Then, for a l l A , the r i s k using the estimator A * = [ 1 - (<j>(Z)/(Z+p-l)) ]X i s l e s s than or equal to the r i s k using A = X when <f> : [0,°°) -> [0,2(p-l)] i s nondecreasing and not i d e n t i c a l l y zero. Proof: Let f.(x) = -cb(z)x./(z+p-l) i f x. > 0 X x x — ' = 0 i f x. < 0, i = 1 x , . . . >P-By C o r o l l a r y 2.2.8, the d e t e r i o r a t i o n i n r i s k i s p <p 2(z+l)(x.+l) p (f)(z+l)(x.+l) p <f>(z)x. A = E 1 2 E -~~ + 2 E , - 1 1 - 1 t _ i _ _ \ 2 • 1 Z + P • i z+p-1 1=1 (z+p) 1=1 1=1 2 < — — ( ^ + ^ — 2(j)(z+l) + 2 4l(z+Jil^. ( s i n c e § ± s nondecreasing)' — z+p z+p—± = <p(.z+l) ,<Kz+D _ 2(p^-l) z+p z+p-1 <_ 0 (since 0 <_ <f>(z+l) £ 2(p-l););. A A Therefore, R(A,X) >_R(A,A) for a l l A. In other words, A dominates X uniformly i n A. Q.E.D. In Section 4, we propose estimators A of the form X + f(X) which s a t i s f y the i n e q u a l i t y A^ £ 0 for a l l x and A^ jjf 0. Those estimators therefore dominate the MLE under k-NSEL. 19 SECTION 3. ESTIMATION UNDER SQUARED ERROR LOSS 3.1 Introduction Probably the most extensively studied lo s s function used i n estima-r t i o n problems i s the squared error l o s s function. I t s popularity can be ascribed to i t s mathematical t r a c t a b i l i t y and also to the f a c t that " i t i s an acceptable approximation i n a wide v a r i e t y of s i t u a t i o n s " (DeGroot [1970], p. 228) where the loss depends s o l e l y on the d i f f e r e n c e between a parameter and i t s estimate. I f we allow k to be equal to zero, the squared error l o s s function can be thought of as a s p e c i a l case of k-NSEL. This section w i l l be l i m i t e d i n scope to simultaneous estimation of Poisson parameters under the squared error loss function. The s e t t i n g independent i s as described previously. We suppose that <v Poisson (A^), i = l , . . . ; p and that one observation i s taken from each population. As mentioned i n Sections 1 and 2, t h i s problem has been investigated by Peng [1975], who succeeded i n f i n d i n g estimators dominating the MLE when p >_ 3 (Theorem 2.2.3). B a s i c a l l y , h i s proposed estimators p u l l the MLE towards the o r i g i n whenever the number of non-zero observations exceeds two. The performance of h i s estimators i s expected to be good when the underlying parameters A^ are r e l a t i v e l y small. When some of the para- : meters are large, however, very l i t t l e improvement over the MLE i s ant i c i p a t e d . In t h i s s i t u a t i o n , some very large observations are l i k e l y to occur and both Peng's estimator (Theorem 2.2.3). and the MLE give v i r t u a l l y the same estimate. In order to remedy t h i s s i t u a t i o n , Peng [1975] uses Stein's method [1974] to modify h i s estimator. I f a l l the parameters A_ are . r e l a t i v e l y large, none of the estimators proposed by Peng w i l l give noticeable improvement over the MLE. B a s i c a l l y , 20 t h i s i s due to the fa c t that those estimators are biased toward the o r i g i n , a point f ar away from the true X. Estimators that s h i f t the observations towards a point i n a neighbourhood of the true underlying parameter would be expected to give better estimates i n t h i s case. For each non-negative integer k, we show that there i s a family of estimators " (k) A (k) X of X such that X dominates the MLE uniformly i n X under the squared A(k) error l o s s function. For a fi x e d k, the estimator X has the property that i t p u l l s the MLE towards the integer k whenever the number of observations x^ greater than k i s at l e a s t three. Otherwise, i t gives the same estimate as the MLE. In the case when k=0, we have Peng's r e s u l t . Estimators that s h i f t the MLE towards a point determined by the data i t s e l f w i l l also be proposed. These adaptive estimators are expected to perform w e l l f o r a wide range of parameter values, including the case when the parameters are a l l r e l a t i v e l y large but s i m i l a r i n value. Most of the r e s u l t s of Peng [1975] w i l l be generalized. In the next subsection, we w i l l introduce the notation that i s used i n t h i s section. E s s e n t i a l l y , we employ the notation used by Peng [1975]. 3.2 Notation Let x = ( x l 9 . . . , x ) be a vector of observations of the random 1 P vector (X,,... ,X ), where the X.'s are mutually independent Poisson 1 p l random v a r i a b l e s with parameters X^,...',X7, r e s p e c t i v e l y . Let f = ( f ^ , . . . , f p ) be as defined i n Section 2.1 and s a t i s f y the following conditions: (1) f^(x) = 0 i f x has a negative coordinate (2) E. |f.(X+je.);| < co for j = 0,1-v A 1 1 The notation employed here i s l i s t e d below f o r reference. 21 D e f i n i t i o n s : (1) EL = #{x^ : x^ = j} , i . e . the number of x^'s that are equal to j . P P. (2) I = Max{x.}.; m = Min{x.} . i = l 1 i = l 1 (3) N = (N ,...,N ). O X , k (4) For any non-negative integer k, r = (p - E N - 2) . n=0 (5) h : J •+ R i s a real-valued function such that h(y) = 0 i f y < 0. P 2 p 2 (6) S = E h (X.); S = E. ti (x.). i = l 1 i = l 1 (7) Write f ± ( x ) = T (N) i f x. = j . (8) <J>: J P •> R i s a real-valued function s a t i s f y i n g the following properties: ( i ) <j> i s nondecreasing i n each argument x^ whenever x. > k. l — ( i i ) $ i s nonincreasing i n each argument x^ whenever X. < k. l — ( i i i ) There i s a r e a l number B > 0 such that 0 <j <j>(x) 1 2B and <f>(x) i 0. We are interested i n functions h which s a t i s f y the properties l i s t e d i n Lemma 3.2.2 below. Before s t a t i n g the lemma, we provide a representative example of the functions h we want. r 22 Example 3.2.1. (a) Tor k >_ 2 y-k h(y) = 1 + Z i f y = k+2,k+3, n=2 =M i f y = k+1 = 0 i f y = k or y < 0 k-y Z n=l = -b r — - r — i f y = 0,...,k-l k+l-n where b i s a p o s i t i v e number to be determined such that (8) of Lemma k 1 -1 3.2.2 below holds. One such b i s b = v3( X — - z ) k+l-n n=l (b) For k = 1 h(y) = 1 + Z T T ~ i f y = k+2,k+3,... 0 k+n n=2 = 1 i f y = k+1 = 0 i f y = k or y < 0 = -b i f y = 0 where b i s any p o s i t i v e number, (c); For k = 0 y 1 h(y) = Z i i f y = 1,2,... n=l = 0 i f y <_ 0. The following lemma gives the properties of h. For s i m p l i c i t y , we denote h. = h ( j ) . 3 23 Lemma 3.2.2. Let h be as defined i n Example 3.2.1. Then h s a t i s f i e s the following properties: 2 2 (1) h. - h. , i s nonincreasing i n i for i > k+1. J j-1 (2) j [ h ^ - h --^ ] i s nondecreasing i n j for j > k and lim j [ h . - h ] = B for some B > 0. (3) h > h._v j = 1,2,... (4) h k = 0. (5) h. > 0 i f j > k+1. 3 -(6) I f k > 0, then h < 0 f o r j < k. (7) h k + l i B . (8) 3B 2 > hjh provided k > 0. (9) \ + 1 > j [ h - h j_] f o r j > k+2. (10) h 2 - h 2 . < h 2 - h 2 for 1 < j < k i f k > 0. j j-1 - j+1 j - J The proof of the lemma i s straightforward and i s omitted. 3.3 S h i f t i n g the MLE Towards k We use the notation defined i n 3.2 and define r cb(x)h(x ) f.(x) ^ i - , i = l , . . . , p . (3.3.1) * (k) We s h a l l show that the estimator X = X + f(X) of X dominates X uniformly i n X under the squared error los s function when p >^ 3. The 24 " (k) estimator X s h i f t s the coordinates of the MLE towards the integer k provided the number of observations greater than k i s at l e a s t three. R e c a l l that the r i s k d e t e r i o r a t i o n of the estimator X = X +f(X) as compared to X i s , by Lemma 2.2.2, R(X,X) - R(X,X) = E A A P 2 P where A = E f.(X) + 2 E X.[f.(X) - f. ( X - e . ) ] . (3.3.2) i = l 1 i = l 1 1 1 1 In terms of N and (N) , A can be rewritten as A = E N . ¥ . ( N ) + 2 E j N . . ( N ) - ¥._ (N-6.+6._ )] (3.3.3) j_Q J 1 j = l ^ th where 6^ i s an (£+1)-vector with the j coordinate equal to one and the * (k) other coordinates zero. To show that X = X + f(X) dominates X under the squared error l o s s , i t s u f f i c e s to show that A(x') <_ 0 f o r a l l +P x e J . The proof w i l l use the following s e r i e s of lemmas. For convenience, we define A j . J H [ T W .,<,»_««,] ( 3 ^ 4 ) (= x.[f.(x) - f . ( x - e . ) ] , with x. = j ) i i I I 7 I Lemma 3.3.5. (1) For k >_ 2, j [ h . - h. ]$(x)r,N. S - h.(h.+h. ,) A. < - — J 3=± ^ [ \ .1 j " 1 ] i f 1 < j < k. 3 S S - hf +:-hT... J J-1 <p(x)kN r h (2) A. < fc , 1 . S + h L l 25 j [ h -h ]r cJ,(x)N S - h (h +h ) (3) A. < -1 -1 1 K 1 [ -1 -1 j 1 ] 3 S S - h. + hf , A[h -h„ J U ( x ) N . S-2h 2 A £ 1 k 1 [ — V " ] for k+2 < j < Proof: (1) i s true because <j>(x) i s nonincreasing f o r x^ <_ k, i = l , . . . , p . (2) i s true because cb(x) i s nonincreasing for x. <^ k and h, n < 0. (3) "..is true f o r the following reasons: ? ( i ) I f r. > 0, then S > 2h. f o r k+2 < j < I and N. f 0. k J ~ 3 2 2 ( i i ) - h j _ i ^ s nonincreasing by (1) of Lemma 3.2.2. ( i i i ) lu > n (iv) cb(x) i s nondecreasing when x^ >^ k, i = l , . . . , p . Q.E.D. Lemma 3.3.6 2h 2 > h 2 - h 2 where H > k+2. k+1 ~ x, £_ 1 — Proof: By (9) of Lemma 3.3.2, h ^ + ^ ^_ j [h - n j _ i ^ f ° r J — k+2. Also, since l u - h j _ ^ i s decreasing, < < 2 V l . - 3 — k+1 2 [ h i + V l ] Hence 2h. > j [ h . - h. J — J , •' - J J-1 J 26 h. , for j > k+2. 3-1 Q.E.D. Lemma 3.3.7. £ S - 2h2 {Nk+l + E N i 2 V" } ^ rk' " rk > k + i j=k+2 3 S - h£ + h^_1 k k Proof: The l e f t hand side of the above i n e q u a l i t y i s A S - 2h2 N + I. N ^ \ + l + . _ . W j _ . 2 . .2 j=k+2 J S - h £ + h^_ 1 £ £ m W - Whrhtl> + jJ+2 V - 21=k+2N^tH S " h£ + h l l . £ £ . i=k+i V + W 2^r h f o l i > - 2 4 +i Ni h? s - h' + h L i rk^ 2 2 2 > x 5— (since 2h, _ £L-h0 - h 0 , by Lemma 3.3.6) S ~ h £ + h£-l >_ r ^ (since r ^ > 0) Theorem 3.3.8. Q.E.D. r±(|)Cx)hCx1) With f . (x) = - , i = l , . . . , p , p s 3, and h as described i n Lemma 3.2.2, A < 0. . Proof: R e c a l l that A = E N.Y.(N) + 2 E jN.[¥.(N) - V. .(N-g.+fi. ,)] j=0 3 3 j = l J J J - 1 3 J _ 1 Case 1. r, = 0. Then A = 0 < 0. k — Case 2. r, > 0. k 2 2 I _ r, cb (x) (i) E N X ( N ) j=0 3 3 s £ k-1 £ ( i i ) 2 E A. = 2 E A. + 2A^ + 2A^ + 2 E A. j-1 3 3=1 3 J=k+2 3 k-1 with the understanding that E A. = 0 i f k <_ 1. j=l J By Lemma 3.3.5, k-1 <t>(x)r k-1 S - h.(h.+h ) 2 E A <_ - E j [ h - h ]N [ ] 3 3 ] 3=1 3 3=1 3 3 - 1 3 S - h 2 + h 2 ^ 4>(x)r k-1 S - h 2 - h h 1 - * j [ h j - h j - i ] N j [ g A, A 1 j = l J J S — h. ~f* h, i J 3-1 since h. < 0 for j < k and h. > h. so L h n > h.h. . for 1 < j < k J J 3 J-1 1 0 - ] J-1 2 2 Now r f c > 0 implies that S >_ 3 n k + 1 1 3B . By (8) of Lemma 3.2.2 2 k " l we have S - h. -Lemma 3.3.5 i h..hn > 0 for N. ? 0. Hence E A. < 0. Again by 3 1 0 - 3 3=1 3 ~ ^ ( x ) k N k r k h k _ 1 ( k + D N k + 1 r ^ ( x ) h \ - q . h2 a n d V l S S + h k - l (because h k = 0). 28 As a r e s u l t , £ 2kN r <Kx)h 2 Z A. £ k k k " 1 j = 1 J S + V l 2r <(>(x) £ S - 2h? " [ ( k ± 1 ) N k + l \ + l + * [ V h £ - l ] * N j — 2 - 2 - 1 3=k+2 S-h £+h £_ 1 (by Lemma 3.3.5. (3)' ). Thus, £ 2 k N k V ( x ) h k _ 1 2 K x ) r k k N k + 1 h k + 1 i — 2 S 2r <p(x) £ S-2h 2 " [ h k + A + l + £ [ h£" h£-l ] * N j „ h 2 + h 2 1 j=k+2 S-h j i+h £_ 1 1 2 k N k r k K x ) h k _ 1 2 ^ ( x ) r k k N k + 1 h k + 1 1 <; + h 2 S S + \ - l 2r <l>(x)£[h - h . J £ S-2h 2 k [N i + Z N. ^ ] S L"k+1 .... 2.. 2. 3=k+2 S - t ^ + h ^ (since h k + ]_ > A l h ^ - h ^ ] ) 2 k N k r k » ( x ) h k M l 2 » ( x ) r k k N k + 1 h k + 1 _ 2r2<Kx)B 1 S - l l (by Lemma 3.3.7 and since £[h -h _^] ^ B) . Consequently, 2 k N r ^ ( x ) h 2 k N k + I r k K x ) h k + 1 r j ^ x ) A £ ~ - ^ — z [2B - (|>(x)J (3.3.9) s + K L i < 0 since 0 £ <J>(x) £ 2B, h ^ < 0, and h k + 1 > 0. Q.E.D. 29 Our r e s u l t s show that to every f i x e d non-negative integer k, there i (k) corresponds a cla s s of estimators X of X which has the property that a l l i t s members p u l l the coordinates of the MLE towards k when the number % of observations that are greater than k ( i . e . E N.) i s at l e a s t three. j=k+l 2 If the number of observations greater than k i s less than three, then the " (k) estimators X give the same estimate as the MLE X. The choice of k i s c r i t i c a l . I t may depend on p r i o r information a v a i l a b l e about the range A (0) of the X.'s. When k=0, the estimators X shrink a l l the non-zero l observations towards zero as long as the number of non-zero observations i s at l e a s t three. Such a value of k should be chosen only when we have some p r i o r knowledge about the X^ .'s i n d i c a t i n g that they aire a l l close "(0) to zero. X i s expected to perform w e l l i n t h i s s i t u a t i o n , e s p e c i a l l y when a l l the X.'s are close to one another. I f the p r i o r information x suggests that the A_/s are l i k e l y to be large and within the range [a,b] with 0 < a < b, then a large value of k somewhere around a may be chosen. Choosing k i n t h i s manner w i l l l i k e l y lead to considerable improvement over the MLE. Hence, having some p r i o r knowledge about the parameters X^ i s advantageous to the estimation problem. In f a c t , we see that, according to our simulation r e s u l t s reported i n Section 7, our Bayesian analysis i n Section 5 does lead to s u b s t a n t i a l improvement over the usual procedure when the parameters are known to be close to one another. Notice that the bound (3.3.9) f or the unbiased estimate A of the " (k) d e t e r i o r a t i o n i n r i s k of X depends on k, N^ and ...Hence, savings i n r i s k would be greater i f N^ and are large. In other words, when more observations are close to the chosen integer k the r i s k w i l l be reduced much more. One might say that r e l i a b l e p r i o r information 30 can be p r o f i t a b l y exploited i n our estimation problem. The dependency of the bound for A on k, N. and N, ,- further implies that the estimators k k+1 " (k) + X f o r various k e J are competitive; one cannot dominate the other. In the case of simultaneous estimation of p normal means, the existence of an estimator which shrinks the MLE towards zero and dominates the MLE implies the existence of another estimator shrinking towards any f i x e d point and s t i l l dominating the MLE. This i s due to the t r a n s l a t i o n invariance of the normal density and the squared error los s function. However, i n our case, we do not have t h i s invariance property f o r the Poisson p r o b a b i l i t y function, and hence the existence of better estimators shrinking towards a point other than zero i s not automatic even though a better estimator shrinking towards zero e x i s t s . Thus our r e s u l t s are not obvious consequences of Peng's. Since our estimators depend on h, i t i s i n t e r e s t i n g to f i n d more examples of functions h which have the properties i n Lemma 3.2.2. Below are some examples. Example 3.3.10. (k = 0) Let h(y) = In(ay) i f y > 1 = 0 i f y < 1 where a •> 4. Let B = 1. We check below that t h i s function h s a t i s f i e s the properties l i s t e d i n Lemma 3.2.2. (a) In order to show that property (1) holds, i t i s s u f f i c i e n t 2 1 2 2 to show that h.. >_ ^ ^ h j j - i + h j + l ^ f o r J - 2* T h i s i s c e r t a i n l y true since 2 the function G(y) = [In(ay)] i s concave for y >_ 1. (b) Property (2): The function F(y) = y[ln(ay) In a(yr-l)] = y [ l n y l n ( y - l ) ] i s decreasing when y>_ 2 since the d e r i v a t i v e 31 F' (y) = l n ( l + - - \ < 0 for y > 2. Also, h. - h = In a J y-1 y-1 — 1 0 >^ 2[ln(2a) - In a] whenever a >_ 4. Moreover, l i m j [ l n j - l n ( j - l ) ] = 1 = B. (c) Properties (3), (4), (5), (7), and (9) c l e a r l y hold, and properties (6), (8), and (10) are not applicable here because k = 0 i n t h i s example. Example 3.3.11. (k = 0) Let h : J -*- R be any function such that h^ = 0 i f j <^ 0 and j 1 h. = E — f o r i = 1,2,... . Here {g :}. i s a sequence of r e a l numbers J i g n J n=l °n s a t i s f y i n g (1) g x = 1 (2) g n + 1 - g n > 1, for n = 1,2,... (3) {—} i s nonincreasing and lim -^ = B > 0. g . g. Properties (1) through (10) of Lemma 3.2.2 are checked as follows: 2 2 2 (a) Property (1) holds since 2h_. - (hj_-j_ + n j + i ^ j-1 i g i + 1 8, {2[g^i - g j t 2 r 1 - 2 + „ - — J - > > 0 SjSj+l j + 1 J n = l ^ n J 8j g j + 1 for j >_ 2. (b) Property (2) holds since s a t i s f i e s requirement (3) above. (c) A l l the other properties c l e a r l y hold. Property (8) of h given i n Lemma 3.2.2 guarantees that S - h.(h. + h._^) >^ 0 for j < k, which i s a s u f f i c i e n t condition that k-1 £ A. <_ 0. However, i t i s not a necessary condition, as the following j = l 3 theorem shows. 32 Theorem 373'.'121 Let h : J -> R be as described i n Lemma 3.2.2 except that properties (3) and (8) are replaced by (3)' h. > h. , for j > k+1 3 J-1 (8)' hj = -b < 0 for j = 0,...,k-l. Then Theorem 3.3.8 s t i l l holds. k-1 Proof: The change s t i l l gives E A. <_ 0. j = l 2 The following theorem i s a s l i g h t v a r i a t i o n of Theorem 3.3.12. Theorem 3.3.13. Let h be as described i n Theorem 3.3.12. Suppose 0 <_ cb(x) <_ Min{2B,l} i f x± < k, i = 1,.. . ,p. Define r cb(x)h(x ) f .(x) = - — i f x. > k l S l = 0 i f x. = k i b r k = cb(x) Min{—-—, 1} i f x. < k for i = l , . . . p . *(k) + P Then the estimator A = X + f(X) s a t i s f i e s A(x) £ 0 f o r a l l x e J ( i . e . A dominates X uniformly under the squared error l o s s function). Proof: k-1 It can be checked that E A. < 0 for k > 0 and that J-1 J " I r V ( x ) E < ; . Hence A < 0 as shown i n Theorem 3.3.8. j-0 3 3 ~ s Q.E.D. Note that estimators A of A given i n Theorems 3.3.8 and 3.3.12 have the property that they p u l l the x^'s that are fart h e r away from k 33 more than those x.'s that are closer to k. This means that the extreme 1 observations experience a great deal of s h i f t i n g , while the observations close to k are s h i f t e d to a lesser extent. There i s s t i l l another choice of h that w i l l guarantee A <_ 0. Theorem 3.3.14. Let h be as described i n Lemma 3.2.2 except that property (8) i s v . replaced by (8)" 3 h 2 + 1 > h l h Q . Then Theorem 3.3.8 s t i l l holds. (In t h i s case, |hg| has a larger upper bound). k-1 Proof: Note that E A. i s s t i l l l e s s than zero i n t h i s case. j = l 3 An example of a function h described i n Theorem 3.3.14 i s given below. Example 3.3.15. (k >. 2) J 1 Let h. = E - i f j > k+1 n=l = 0 i f j = k i f 0 < j < k. L . k+l-n - . n=l A (k) The estimators A derived thus f a r have the property that i f the l^1 observation i s equal to k, then A ^ ) = That i s , there i s no s h i f t i n g of the observations having the value k. The next theorem provides an estimator of A which improves on the MLE but whose estimate of A^ th i s not n e c e s s a r i l y equal to k i f the i observation i s equal to k. The theorem u n i f i e s and gernalizes Theorems 3.1 and 5.1 of Peng 11975]. 34 Theorem 3.3.16. Let h be as described In Theorem 3.3.12 except that properties (4) and (7) of h are replaced by (4) ' h k = -b (7).' h k + 1 > Max{l,B}. Define : i ^ " " S rk<|) (x)h(x ±) f . (x) = - • - — :V: i f x. > k x = (j>(x) Min{ — 1 " + 1 } i f x < k for i = l , . . . , p . Suppose 0 <_ <p(x) <_Min{l,2B} i f x^ < k, i = l , . . . , p , and l e t X = X + f G O . Then A G O < 0 for a l l x e J . The proof of the theorem i s s i m i l a r to that of Theorem 3.3.8. R e c a l l that we denote V (N) = f ^ G O i f x^ = j . Proof: r k h k + l F i r s t , note that 1 - — - > 0 since (7)' holds and S > (p - Z N n ) h 2 + 1 > r k h k + 1 . n=0 Case 1. r, < 0. Then A =0 < 0. k — — Case 2. r, > 0. k 2 2 £ „ r cp G O ( i ) Z N.W,(N) < j=0 3 3 S ( i i ) Z A < <p(x) Z jN.[V(N) - V(N-6.+6. -)•] < 0 j = l 3 j = l 3 3 3 ~ L b r k r k \ + l where V(N) = Min {—- , 1 - *x} . Note that V(N) = V(N-6.+6. ,) f o r o 13 j J - l 1 < j < k. 35 ( i i i ) 2 j A - 2 ( k + l ) N k + 1 I? (Ji) . V ^ k + l + 6 k)J + 2 j A j=k+l J J=k+2 J 1 2 H v i W N ) - 2 ( k + i ) N k + i \ ( N - 6 k + i + 6 k ) -(the reasoning i s s i m i l a r to that of Theorem 3.3.8). ( i ) , ( i i ) , and ( i i i ) imply that 2 k \ + l r k < K x ) h k + l V * 0 0 (3.3.17) Remarks: (1) The bound f o r the unbiased estimate of the r i s k d e t e r i o r a t i o n A given by (3.3.17) depends on k and N k + ^ . I f N k + ^ i s l i k e l y to be large, " (k) 1 then great improvement i n r i s k w i l l r e s u l t i f A i s used instead of X. Moreover, the dependence of (3.3.17) on k implies that the estimators " (k) 1 + A f o r d i f f e r e n t values of k e J are competitive. (2) The s p e c i a l case when k = 0, cb(x) = 1, b = 1, and h i s as given i n Example 3.2.1 (c) i s Peng's [1975] Theorem 5.1, which shrinks a l l non-zero observations towards zero while a p o s s i b l y non-zero estimate of A. i s given f o r x. ="0. i i (3) The case when k = 0, <j>(x) = 1, b = 0, and h i s as given i n Example 3.2.1 (c) i s Theorem 3.1 of Peng [1975], which we stated as Theorem 2.2.3 i n Section 2. (4) I f b = 0 i n Theorem 3.3.16, A^ w i l l be estimated as zero if';-., x^ = 0. However, i f b > 0, the estimate f o r A^ w i l l be p o s s i b l y non-zero i f x^ = 0. The choice of r e l a t i v e l y large value of b can be interpreted as r e f l e c t i n g the b e l i e f that the X^'s are non-zero. 36 3.4 Adaptive Estimators The estimators A " ~ of A suggested i n 3.3 p u l l the MLE towards a prechosen non^negative integer k, and the choice of k i s guided by the p r i o r knowledge of the A.^'s. A natural question which a r i s e s i s : Is there.an. .estimator A of A which s h i f t s the observations towards a point determined by the data i t s e l f ? We s h a l l show that the answer i s affirm-t i v e . P P Rec a l l that m = Min {x.}. We define new functions H. : J R, i = l 1 1 i = l , . . . , p as follows: x. -m 1 H.. (x) = 1 + 1 E - f - i f x. > m+1 and m > 0 x • . „ m+n x — n=2 = 1 i f x. = m+1 and m > 0 x — = 0 i f x. = m and m > 0 x — = 0 i f m < 0 for i = 1, (3.4.1) The functions have s i m i l a r properties as those of h described i n Lemma 3.2.2. We state the properties of the H^'s i n the following lemma. Lemma 3.4.2. Let B = 1. The H 's,.i = l , . . . , p , have the following properties: 2 2 (1) BL(x) - H^(x - e^) i s nonincreasing i n x^ f o r x^ > m+1 with m >_ 0. (2) x.[H.(x) - H.(x i- ..e.)J::is nonincreasing i n x. f o r x. > m with. x x x x x x m > 0 and l i m x.[H.(x) - H.(x - e.)] = B. — x.-*° i i i i x (3) H i(x) > H i(x - e_^ ) i f x^ > m and m >^ 0. (4) H.(x) = 0 i f x. = m. x x (5) H.(x) > B i f m > 0 and x. = m + 1. x — — x 37 (6) H.(x) > (j + x )[H.(x + je.) « H.(x + ( j - l ) e . ) ] f o r j > 1, 1 1 1 1 1 2_ — x. = m +11, and m > 0. i Proof: The proof i s straightforward and hence i s omitted. T i n 1 The following theorem provides us with a c l a s s of estimators X of X which p u l l the MLE towards a point determined by the data, namely the minimum of the x.'s. l Theorem 3.4.3. Let X [ m ] = d| m ],...,A^ m ]) be such that - f m l (p-N -2) <p(X)H (X) , LmJ v m + I - i i . A. = X. , I = l , . . . , p , where I l p » »f» E Hf (X) i = l 1 (1) The H..'s are as described i n Lemma 3.4.2. l (2) N = #{i : X. = m} and p > 4. m l — (3) <p(x) i s non-negative and cp(x) <_ 2B f o r some B > 0. (4) <p(x) i s nondecreasing i n each agrument x^. ~ Tml Then f o r a l l X = (X^,.-. .,X ), X dominates X under squared error l o s s . Proof: P 2 Define f.(N) = f.(x) i f x. = j > 0 and l e t S = E HT(x). The 3 1 - i-1 1 proof i s very s i m i l a r to that of Theorem 3.3.8. Case 1. (p-N -2), = 0. Then A = 0 < 0. m + — Case 2. (p-N -2), > 0. m + I ? (p-N -2 ) J$ 2(x) (i) E N.¥,CN) ^ 2L=-J£ j=m 2 2 b 38 £ £ ( i i ) 2 Z A. = 2 Z A. 3=m 3 j=m+l 3 . 2 (m+l)N m Cp^N -2) cb (x)h(m+l) £ = - m + 1 g m + + 2 Z A. j=m+2 J £ where h(j) i s defined to be H.(x) for x. = j . the summation 2 Z A. 1 1 j=m+2 J can be shown (.ci... Lemma 3.3.5 (3) ) to be less-than or equal to 2 (P-N -2) K x ) £ S - 2 h 2 ( i V £[h(£) - h(£-l)] Z N. S 2 h . j=m+2 3 S-h (£)+h (£-1) Hence we have .. £ 2 (p-N -2) cb(x) 2 Z A. < c {(m+l)N ,.,h(m+l) ,, j — S m+1 j=m+l £ S - 2h (j) + £[h(£) - h(£-l)] Z N. - } j=m+2 2 S-h (£)+h (£-1) (p-N -2)2<f>2(x) 2m(p-N -2) cb(x)N ...h(m+l) _ J A ^ m -t- m + m+1 a n d A < _ _ 2(p-N m-2) 2cb(x)B g ( c f . Lemma 3.3.7) 2m(p-N -2) cb(x)N .^(m+1) (p-N -2)2<j>(x) m + m+1 m + r o xr\^ = g g [2B - <j>:(x)] <_ 0 since 0 <j,cb(x) £ 2B. Q'.'E.D. Remarks: (1) Note that N > 1 and hence the estimators \^ m^ dominate the m — • ^ MLE only when p i s at l e a s t four instead of three. The increase of the dimension i s needed because we estimate k, a point towards which the MLE 39 should be s h i f t e d , by the data. (2) When a l l the observations x^ of the p Poisson random v a r i a b l e s are equal to the same value, say X q , one would conjecture that the parameters of the p random v a r i a b l e s are close to one another or even i d e n t i c a l . In t h i s case, one would estimate the X.'s by the grand mean P 1 C E x.)/p, which i s equal to x . Our Bayesian point estimators pro^-i= l 1 ° posed i n Section 5 give such an estimate f o r the A^'s i n t h i s s i t u a t i o n , as do the estimators X^m^ described i n Theorem 3.4.3 above. That i s , X_^ = x^ for a l l i i f x_^ = x^ for a l l i , an i n t u i t i v e l y appealing r e s u l t . As Peng [1975] has noticed, i f some of the observations x^,...,x^ are large, the unbiased estimate of the improvement i n r i s k , i . e . -A, w i l l be small. This makes the savings of the proposed estimators small. In order to tackle t h i s problem, Peng used Stein's method [1974] to modify h i s proposed estimator f o r Poisson parameters to guard against extreme observations. However, h i s estimators s t i l l give small savings when a l l the observations are large. For example, when the minimum of the A^'s i s greater than or equal to 12, say, the savings from using Peng's ^Tml modified estimator w i l l s t i l l be small. Our proposed estimator X i s ~ [nil u s e f u l i n t h i s s i t u a t i o n . The savings of X w i l l be much greater than Peng's estimators provided that the observations f a l l i nto a r e l a t i v e l y narrow i n t e r v a l , i . e . when the observations do not d i f f e r very much. In general, the estimators X are expected to perform w e l l i n a very wide range of parameter values, including the cases when the A^'s are r e l a t i v e l y small as w e l l as when they are r e l a t i v e l y large. The * Ttnl improvement i n r i s k of X over:the MLE should be considerable, e s p e c i a l l y when p i s large and the \^''S are close to one another. A simulation 40 r e s u l t .of; the behaylour of i s reported i n Section 7, which supports our conjecture. Moreover, further a p p l i c a t i o n of Stein's method [1974] to our estimators A'-"1"' w i l l y i e l d a more v e r s a t i l e estimator A ^ of A i n that, unlike Peng's estimator, i t guards against extreme observations and cases i n which a l l the parameters are large or small. The a p p l i c a t i o n i s straightforward (cf. Peng [1975]). We now digress to discuss an obvious a p p l i c a t i o n of the techniques :". i n t h i s section to extend Hudson's r e s u l t Q-1977], p. 18) on general d i s c r e t e exponential f a m i l i e s . Though we w i l l not carry out the d e t a i l e d a nalysis here Csince i t i s outside the scope of t h i s study), we b r i e f l y give::, an. example below. The int e r e s t e d reader should read Hudson 11977] before proceeding to the following example because we w i l l use the notation employed by Hudson. Using h i s notation, we define r CD m = Min {x.;}. and recall t(x.) = t. i f x„ = j 1=1 1 1 J 1 x.^ m x (2) b v = 1/t. + Z ™ - if x = m+2,,,.. x i x k=2 ck+m 1 = l / t n if x. = m+1 ' 1 x = 0 i f x. = m x (3) S = i f , pm.= ( P-N m-3) + • 1=1 X p (4) g.(x) = -r • b x , g(x),=-(g1Cx).,....,g (X)). i p Then if p >^ 5, the estimator T + g(x) dominates T = (tCX^',... ,t(X )) under squared error loss. The improvement in risk exceeds 2P N -L-I t _,, p 2 <p b t., o 41 SECTION 4. ESTIMATION UNDER K-NSEL 4.1 Introduction Theorem 2.2.9 provides us with a clas s of estimators dominating P- 2 the MLE, X, under normalized squared error los s L^ = E (A - X^) /X^. i = l Since X i s minimax under L^, the estimators given i n Theorem 2.2.9 are ac t u a l l y minimax estimators of X. One way to show that an estimator of the form X + f(X) i s minimax i s to show that f s a t i s f i e s the conditions i n Lemma 2.2.6 and that 2 p f.(x+e.) p A.. = E 1 1 + 2 E [f.(x+e.) - f (x) ] 1 . i X.+l . T X X X X=l X x=l <_ 0 for a l l x e J + (Corollary 2.2.8). In the f i r s t part of t h i s section, we s h a l l use Co r o l l a r y 2.2.8 to show that the clas s of minimax estimators given i n Theorem 2.2.9 can be enlarged i n two ways. The remainder of the section i s devoted to attempts to f i n d estimators better than the MLE under the k-normalized squared P A 2 k error l o s s (k-NSEL) function L (X,X) = E (X. - X.) /X. , with k >_ 2. 1C . - . 1 1 1 1=1 We s h a l l make use of Lemma 2.2.7 to prove the main r e s u l t s and s h a l l also discuss the motivation f o r using k-NSEL. 4.2 Minimax Estimators We s h a l l show i n t h i s subsection that a large c l a s s of minimax estimators can be obtained which includes the class of estimators proposed by Clevenson and Zidek [1975] as a subclass. The estimators we consider are of the form X + f(X) and loss function L^ i s used. As remarked i n 4.1, we need only show that A^(x) <_ 0 for a l l x e J +P 42 Let ij>(z) be a nonincreasing real-valued function such that z + i|>(z) i s nondecreasing f or z e J + . The following theorem gives a cla s s of estimators which dominate the MLE X and hence are minimax. Theorem 4.2.1. independent Suppose X ^ ^ Poisson (A^), i = l , . . . , p , p J^2. Define A ( X ) = [1 - cb ( Z ) / ( Z + I ( J ( Z ) ) ] X where (1) i|j(z) >_ b > 0 for some b (2) cb(z) i s nondecreasing (3) 0 £<|>(z+l) <_ 2Min{(p-l) ,^(z)} for a l l z e . J . Then for a l l A = (A. A ), A dominates the MLE X under the normalized 1 p squared error l o s s function L^. Proof: -<K'z)x + P Let f.(x) = -—, ,,, . i f x e J l z+nz) = 0 otherwise. We need to show that 2 p f (x+e ) p A, =; E + 2 £ [f.(x+e.) - f.(x)] < 0. 1 . , x.+l . - 1 1 l — i = l l i = l In f a c t , = (b 2(z+l)(z+p) _ rcb(z+l)(z+p) _ cb(z) Z l 1 " [ z + i + l K z + i ) ] 2 " L Z+1+*( Z+D " J cb(z+l) , (z+p)cb(z+l) _ (z+p)(z+i|;(z))-z(z+l+^(z+l)) 1 - [z+l+iKz+D] z+l+ij/(z+l) z+ij;(z) (since z+^(z) i s non-decreasing and cb(z) >_ 0) 43 = <p(z+l) r(z+p)(p(z+l) _ z]&(z)-ifr (z+1) ]+(p-l) z+pifr(z) , - [z+l+i|;(z+l)] z+l+iKz+1) " z+i|j(z) -Note that iKz) - ^(z+1) >_ 0 by assumption and that z+\\)(z) i s non-decreasing, so we have < <Kz+l) { ( z + p ) ( | ) ( z + i ) . 2 ((p-l)z + piKzX)} [z+l+^(z+l)] Kz+1) _ { z [ ( p ( z + 1 ) _ 2 ( p - l ) ] + P[<p(z+l)-2^(z)]} fz+l+^(z+l)] 1 0 by condition (3) of the theorem. Therefore A = (1 - " ^ ^ y ) ' x dominates X. Q.E.D. The s p e c i a l case where ty(z) = b > 0 i s stated as a c o r o l l a r y below. Cor o l l a r y 4.2.'2. r' _= independent ^ . . i . . Suppose X^ ^ Poisson (A£); I = l , ^ . J p , p >_ 21 ' " -' • 'R " ^ A f *7 \ Then the estimator A = X - „,, • -X of A dominates the MLE X under the -z+b'.. J l o s s function L^ where (1) :b > 0 (2) cp(z) i s nondecreasing (3) .0 <_ <p(z) i Min{2(p-1) ,2b} and <p(z) t 0. Note that the constant b given i n the c o r o l l a r y i s an a r b i t r a r y p o s i t i v e r e a l number, while the estimators given i n Theorem 2.2.9 require b >_ p-1. Moreover, the theorem requires that 0 <^ <p(z) <^ 2(p-1), and hence the class of estimators given i n Theorem 2.2.9 i s i n f a c t a subclass of ours. 44 -The estimator X shrinks the MLE towards the o r i g i n by the amount . For every b, the maximum shrinkage allowable i f X i s to dominate *.u MTT? • Min{2(p-1) ,2b}x r,. •.. • • * c -u the MLE i s z+b ' w n l c n 1 S a n increasing function of b (coordinate-wise) whenever 0 < b <_ p-1 and a decreasing function whenever b > p-1. Therefore the maximum shrinkage i s obtained when b = p-1. An a p p l i c a t i o n of C o r o l l a r y 4.2.2 above gives us s t i l l more i n t e r e s t i n g estimators of X. The r e s u l t i s stated i n C o r o l l a r y 4.2.3 below. Co r o l l a r y 4.2.3. independent Suppose ^ Poisson (X^) , i = l , . . . , p , p ^ 2. Then the estimator A = (1 - "^- ) t X of X dominates the MLE X under the loss function L^ where (1) t >_ 1, c > 0 (2) 0 < a < Min { 1 ^ = 1 ) . , C , ^ } . Proof: Rewrite X as follows: X = {1~ [1 - (^ |g a - ) t ] }X where 0(z) = (^+c) t-( 2+c-a) t ( z + c ) t _ 1 We s h a l l show that ( i ) 0(z) i s non-decreasing and ( i i ) .0 1.0 (z) .<_ Min{2(p-1) , 2c}. .Now,'the de r i v a t i v e of 0 (z) i s 9'(z) = 1 (z+c-a) t 1(z+c-a+ta) (z+c)* 45 Since (z+c)*" >_ (z+c-a)*" + ta^+c-a)*" for a l l z >_ 0, we have 0'(z) >^ 0. Hence 0(z) i s non-decreasing. Next, i t i s c l e a r that 0 <_ 6(z) for a l l z >_ 0. Now 0(z) • taCz+c-a)^ 1 + (z+cK 1 where iKz) i s such that l i m i|)(z) =0. As a r e s u l t , 0(z) £ l i m 0(z) = ta. From condition (2), we have 0(z) £min{2(p-l), t c , 2c}. By C o r o l l a r y 4.2.2 we see that the estimator dominates the MLE under the los s function L^. Q.E.D. 2 The estimator X(X) = (1 - (p-1)/(Z+p-1)) X, which i s an estimator described i n the previous c o r o l l a r y with t = 2 and a = c = p-1, shrinks more towards the o r i g i n than does the estimator X = (1 - (p-1) / (Z+p-1) )'X. ^* Thus, A should give a better estimate .of X than X i f the parameters X^, i = l , . . . , p are close enough to zero. The following argument gives us an i n t e r e s t i n g i n s i g h t as to why we might a r r i v e at estimators of the form X = (1 - ^ r - ) 2 X . Z+a Let X^ ^ Poisson (X ), i = l , . . . , p , be mutually independent and l e t Y^ = 2/xT, 9^ = 2\(xT, i = l , . . . , p . I t i s approximately true that 7. Y_^ ^ N(0^,1), i = l , . . . , p , and that the Y^'s are mutually independent. That i s , approximately, Y = (Y.,...,Y ) ^ N (0,1 ), where I i s the p x p 1 p p p p i d e n t i t y matrix and 0 = (0^,...,6 ). The James-Stein estimator A A. A A, 0 = (0^ 0p) of 0 under squared error l o s s i s 0^ = (1 - r/Y'Y)Y^, ^ 2_f 2. c — 1 = l , . . . , p . Or, i n terms of X and X, X. = (1 - —)*/xT, i = l , . . . , p , 1 Ci W. 46 where Z = E X . , c = r/4, and A = ( A . , . . . , A ) i s an estimator of A . i = l 1 1 p We thus have the estimator A = ( 1 - c/Z) X of A . Since Z has a p o s i t i v e p r o b a b i l i t y of being zero, we are prompted to replace Z by Z + a, where a i s a p o s i t i v e r e a l number. We thus arrive.-at'.the estimator A = ( 1 - ^ - ) 2 X of A . Z+a One might argue that the above r e s u l t i s obtained through the use of squared error loss instead of normalized squared error l o s s . However, notice that the squared error l o s s function i s applied only i n estimating 2 8, and asymptotically, (6L - 6^) i s O ( A ^ ) , i = l , . . . , p ( i . e . 2 lim (0. - 0.) / A . < °°). The normalized squared error l o s s function has X -*» 1 i " 2 trie same asymptotic property, i . e . ( A ^ - A ^ ) /X = 0 ( A _ ^ ) , i = l , . . . , p . Hence, our r e s u l t can be thought of as i f i t i s obtained under L^. In order that estimators of the form A = X - V,, X dominate the MLE Z+b X under normalized squared l o s s , the requirement that ep(z) be nondecreasing i s not a necessary condition. In f a c t , <p(z) can be decreasing for a l l z and A w i l l s t i l l dominate the MLE X . The following theorem states t h i s ..' f a c t . Theorem 4.2.4 independent Suppose X ^ A* Poisson ( A ^ ) , i = l , . . . , p (p ^ 2 ) . v.. A <!>t(z) Let A = ( 1 Z Z T ) X be an estimator of A where (z+b) t + 1 ( 1 ) t >_ 0 , b > t + 1 (2) <f>t(z) i s nondecreasing i n z ( 3 ) tp t(z) > 0 and cp^z) f 0 <*V(z) (4) — ^ — < Min{2(p-t-l),2(b^t-l)}. u+br 47 Then for a l l A = (-A,,...,A ), A has smaller r i s k than A° = X when the 1 p lo s s function i s given by L^. Proof: -cb t(z)x i + p Let f . (x) = — — i f x e J 1 (z+b) t + J-=0 otherwise, i = l , . . . , p . Again, we need to show 2 p f (x+e ) p A = Z 1 1 + 2 Z [f.(x+e.) - f.(x)] < 0. X=l 1 1=1 Now <|>2(z+l) (z+p) -eb (z+l)(z+p) cb..(z)z A i = — 21+2" + 2 [ E+i- + t + l ] 1 ( z + b + l ) Z t + Z : (z+b+1) ( z + b ) t + 1 cb 2(z+l)(z+p) _ < — *ZTT-+ 2 4> (z+D [ U + P L i +• 1 / - u o - - ^ 2 t + 2 ^ T ^ - / L , tH-1 • .,xt+l (z+b+1) (z+b+1) (z+b) (since $ i s non-decreasing) cb (z+l)(z+p) 1 = V z ± 1 ) { - • 7 -2t+2 + 2 ( z + p ) [ t+i " L-E+T ] ^t+T ] C (z*b+I) (z+b) (z+b+1) (z+b) C 1 Observe that the function f defined by f ( z ) = _ i s s t r i c t l y ( z + b ) t + i convex for z >_ 0. Hence f(z+l) - f ( z ) >^ f ' ( z ) . Consequently 1 C (z+b+1) C ( z + b ) ^ ( z + b ) C + Z ( z + b ) t + J : 48 < 0, 4> t(z+l) since <_ 2Min{p-t-l,b-t-l}, i . e . condition C4) of our (z+b+1) theorem. Q.E.D. If b=p-l, the theorem holds even when condition (4) i n the theorem i s replaced by (4)' ( z+b) _ t 4> t(z) < 2 ( p - f - l ) . That i s , (z+b) *" 4>t(z) has a larger upper bound. We state t h i s r e s u l t i n the theorem below. Theorem 4.2.5. Let X, A and A be as given i n Theorem 4 with b=p-l. Suppose <f>t(z) s a t i s f i e s the following conditions: (1) t >/0, p-1 > t (2) cb^z) i s non-decreasing i n z (3) cbt(z) > 0 and <j>t(z) 4 0 Then for a l l A, A has smaller r i s k than X when the error loss function i s given by L r Proof: The proof here i s e s s e n t i a l l y the same as that of Theorem 4.2.4. I t can be shown that 49 A i ± 2t+i + 2 * t — F - — i ~ T ) ^ 7 + T ] (2+P)Zt+1 t (Z+p-1)C (Z+P) (Z+p-1) t + 1 ^.T±31+I + 2 t t ^ D t * - — f 8 ^ ] (-z+p) (z+p-1) (z+p-1) <P^(z+l) - r r+r+ 2 M 2 + 1 > — L i : (z+p) ^ ( z + p - D t + i t " ( z + p - D t + 1 «> (z+1) <?. (z+1) [ — — — - 2(p-t- l ) ] ; ' ( z + p - D t + 1 (z+p) 1 <_ 0 (by condition (4) ') . The second i n e q u a l i t y holds since the function f ( z ) = — — — i s (z+p-1) s t r i c t l y convex f o r z >_ 0. Q.E.D. When t = 0, the above theorem i s Theor em 2.2.9 of Section 2. The theorem, due to Clevenson and Zidek [1975], suggests that estimators of ^ 6 (Z") the form X = (1 - _, \)X w i l l dominate the MLE X provided that Z+p-1 0 <_ 8(z) <_ 2(p-1) and that 8(z) i s nondecreasing. However, as we see from the theorem above, the requirement that 6(z) be nondecreasing i s not necessary. A simple example i s given below. Example 4.2.6. Let X = (1 - C )X with 0 ^ f < p-1 and 0 < c < 2(p-t-1). (Z+p-1) t + ± Then by our theorem, X i s better than X under the normalized squared error 0 (Z) loss function. We can write X i n the form (1 - ^ ^)X, where -0(z) = ^-'j and note that 0(z) i s s t r i c t l y decreasing f o r a l l z >_ 0. (z+p-1) 50 4 ."3 Better Estimators Under L, k The usage of unbounded l o s s functions has been a c o n t r o v e r s i a l topic because of the famous St. Petersberg's paradox; hence i t i s natural to consider bounded loss functions. Noting that the loss function i s unbounded both when the A^'s are small and when they are large, we are P " 2 k prompted to consider the loss functions L (A,A) = E (A. - X.) /A. x=l (k 2l 2) which are bounded when the A^'s are large. This i s desirable i f we are p r i m a r i l y interested i n the problem of estimating A when the A^'s are r e l a t i v e l y large. Furthermore, when the A^'s are large, the variances of the d i s t r i b u t i o n s are large, and the estimation problem i s more dif-- . f i c u l t . I n t u i t i v e l y we might expect the MLE to perform f a i r l y w e l l when the A.'s are r e l a t i v e l y small, since i n t h i s case, the variances of the d i s t r i b u t i o n s are small, and the observations are exceedingly l i k e l y to f a l l close to the mode (hence the range of the observations i s very l i k e l y small). However, when the A^'s are large, the variances are large and the observations are scattered across a wide range of values. This leads us to conjecture that i t i s possible to improve on the MLE when the A_/s are large. There i s another reason why we might think of using as our loss function i n the estimation problem. The parameter of a Poisson d i s t r l b u ^ •. ti o n can be thought of not only as the mean of the d i s t r i b u t i o n , but also as the variance. With t h i s i n mind, a natural l o s s function to use would . P r. O *• be E ..(1> A./A.) , which i s L.(A,A) (see Brown [1968]). 1=1 - 1 1 Before s t a t i n g the next theorem, we introduce some notation to be used i n the theorem. 51 D e f i n i t i o n s : - F (kl H (1) S = E (X.+k) v ; = E (X.+l)---(X.+k) 1=1 1 1=1 1 X (2) S = Z ( x . + k ) w = Z (x.+l)---(x.+k) 1=1 1 1=1 1 X (3) S 1 = S - (X + k ) ( k ) (4) S 1 = S - (x.+k) (k) Theorem 4.3.1 below gives us a class of estimators A = (A^,...,A ) uniformly dominating the MLE under the loss function L^ (k >_ 2). These estimators have the following properties: th (1) I f the observation from the i population i s small (less than k), then the estimator A_^ of A_^ i s the same as the MLE. (2) If the observation i s large (greater than or equal to k), then the estimator X. of X. shrinks the MLE towards zero. Using the d e f i n i t i o n s given above, the theorem and i t s proof are stated as follows: Theorem 4.3.1. independent Suppose X^ ^ Poisson (X/) , i = l , , . . , p , p >_ 2,' .and the loss function i s L^XA^A). Then the estimator A given below dominates the MLE X uniformly i n A = CA-p • . •,A ) : " th '^ A. = i coordinate of A I <j>(Z) X. (X.-l) • • • (X.-k+l) = X_. 1 1 1 1 S 1 "+ X ± ( X i - 1 ) . • • (X^.-k+l) 52 P where (1) z = E x. i = l 1 (2) <|>(z) i s a r e a l valued function increasing i n z (3) 0 ± <|>(z) 1 2k(p-1) and <|>(z) 2 0. Proof: -<f>(z) x. (x.-l) • • • (x.-k+l) Let f . (x) = —: i f x. >_ 0, i = 1,. ..,p 1 S 1 + x (x - l ) " - ( x «k+l) 1 = 0 i f x. < 0. I We see that the f^'s s a t i s f y the conditions given i n Lemma 2.2.6. Hence Lemma 2.2.7 gives us an unbiased estimate A^ of the d e t e r i o r a t i o n i n r i s k of A. We have D, = E A = R(A,A) - R(A,X) P f 2(x+ke.) P f.(x+ke.) - f.(x+(k-l)e.) a n d A k = I - — ( k i + 2 ^ ( x i + k ) — — 1 —m ~ k 1=1 ( x . + k ) W 1=1 1 ( x . + k ) W l l Substituting the f 7 s i n the formula, we have ^2. . . . P -<j>(z+k) • (x.+k) <b(z+k-l)x. M | + k I + 2 l + k s i = i s s ^ x . + k - i ) ^ ,2. . . / ^ P -(x,+k)[S - k ( x J + k - l ) ( k _ 1 ) ] •+ x 4S < (|) (z+k) , 2(t(z+k) v v i S S 1=1 S 1 + (x + k - l ) ( k ) (since <j>(z) i s increasing and <f>(z) ^ _ 0) 12 v -v p -kS + k ( x . + k ) ( k ) (j) (z+k) + 2cj) (z+k) £ I S S 1=1 S 1 + (x + k - l ) ( k ) Now observe that S > S 1 + (x^+k-l) and S > ( x ^ k ) (k) (k) 53 Hence, we have, 1 0 (since 0 <_ <p(z+k) <_ 2k(p-l) and <J>(z+k) i 0) . Q.E.D. Remarks: (1) When k=l, Theorem 4.3.1 i s the same as Theorem 2.2.9, a r e s u l t i n Clevenson and Zidek [1975]. (2) • I f x < k-1, then gives the same estimate as the MLE, i . e . no shrinkage of the MLE takes place. We see from t h i s that as k <?, the l i k e l i h o o d that shrinkage w i l l be indicated becomes smaller and smaller. It i s i n t u i t i v e l y c l e a r that i f we shrink the MLE, we should shrink small observations l e s s than we shrink:large observations. This i s p a r t l y because when the observations are small, the underlying parameters are l i k e l y to be small, which means that the variances are small and hence each observation i s l i k e l y to be close to the value of i t s respective parameter. On the other hand, larger values of the parameters correspond to larger variances of the random v a r i a b l e s , and the higher p r o b a b i l i t y of s c a t t e r i n g of the observations leads one to suspect that i n t h i s case the MLE i s l e s s r e l i a b l e than when the observations are small; one might therefore be more i n c l i n e d to adjust the MLE. The integer k i n the los s function L^ r e f l e c t s the degree of concern about misestimation of r e l a t i v e l y small A^'s. Large values of k r e s u l t i n lo s s functions which are very s e n s i t i v e to changes i n the A^'s. Since our estimators shrink those observations that are greater than or equal to k but leave the 54 others untouched, the integer k can be interpreted as an in d i c a t o r of a person's willingness to move the MLE for better estimation r e s u l t s . Choice of a small value of k would probably r e s u l t from a p r i o r b e l i e f that the A^'s are small, and hence the person i s w i l l i n g to move even small observations which, as pointed out above, are supposed to be more r e l i a b l e . Choice of a large value of k would probably r e s u l t from a p r i o r b e l i e f that the A^'s are not small, and thus the person i s i n c l i n e d not to move the observations i f they are not large enough. (3) Theorem 3.1 of Clevenson and Zidek [1975] suggests that e s t i - '_ mators A = (1 - <p (Z) / (Z+p-1) )X of X s t i l l dominate the MLE under a general P - 2 loss function L (A,A) = Z K(A.) (A. - A.) /A. where K > 0 i s some K. . i 1 1 1 1 1 = 1 k-1 nonincreasing function. When K(y) = 1/y , L i s the k-NSEL L. . However, our estimators do not shrink observations that are less than k; only those observations greater than or equal to k are moved. Therefore i f A^ >^ k, our estimators guard against unnecessary shrinkage i f the observations happen to be small ( i . e . < k). Since the Clevenson-Zidek estimator shrinks a l l non-zero observations, we are led to conjecture that our estimators are better than t h e i r s . i n terms of the percentage i n savings compared to the MLE when the A^'s are r e l a t i v e l y large ( i . e . when Min{A_^}^_ k^_ 2). Some simulation r e s u l t s which support t h i s conjecture are reported i n Section 7. The next theorem i s a genera l i z a t i o n of Coro l l a r y 4.2.2 to the case when k-NSEL L i s used, where k i s any p o s i t i v e integer. Theorem 4.3.2. Suppose X = (X^,...,Xp) i s as given i n Theorem 4.3.1 and X = (X,,...,A ) i s an estimator of X. 1 p < K z ) x . ( k ) Let X = X r- T T T , i=l,...,p, 1 1 S ^ X . ^ + b l where (1) k i s .a p o s i t i v e integer (2) <j) i s non-decreasing (3) 0 < <f>(z) £ min{2 ( b | ^ ~ ^ k ! , 2k(p-l)} (4) b > - ( p - l ) [ k ! ] . Then for a l l X, X dominates X under the. error loss function L, k Proof: -<j>(z)x 0 0 p Define f . (x) = — r r - r i f x e J sWk>-H, l = 0 otherwise, i-1,...,p. We need to show that p f.(x+ke.) p f.(x+ke.)-f.(x+(k-l)e.) A, = S — pfr- + 2 E (x.+k) — — - 1—?-: k 1-1 ( x . + k ) ( k ) 1=1 1 ( x . + k ) ( k ) < 0. 12/ J_T_\c i r ,i \ p kS-k(x.+k) ( k )+kb I n d e e d * < * < z + k> S - 2 \ -. i k ~ [S+b] 2 S + b i=lS 1 +(x. +k-l) ( xVb 56 ± [ l 2 ! ^ _ l i C z + k i ( k ( p _ 1 ) s + p k b ) [S+b] [S+b] (since kS - k(x i+k) ( k )+kb > 0) <. $ (z+hl [S((f,(z+k)-2k(p-l)) - 2pkb]. [s+b] z Now observe that S(<J)(z+k) - 2k(p-l)) —2pkb <_ pk! (<j>(z+k) - 2k(p-l)) -2pkb (since condition (3) gives <|>(z) £2k(p-l)) - p k ! [•(*« -2 ] < 0 by condition (3) . Hence A. < 0. k — Q.E.D. 57 SECTION 5. BAYESIAN ANALYSIS 5.1 Introduction In t h i s section, we explore the problem of estimating the Poisson parameters from another perspective. R e c a l l that ^ Poisson (A^), i = l , . . . , p , and that the X^'s are mutually independent. Moreover, only one observation i s taken from each of the p Poisson random v a r i a b l e s . In contrast to the frequentist approach taken i n the l a s t section, we s h a l l take a Bayesian approach and show how to f i n d estimators better than the MLE X = (X^,...,X^). Observe that X i s a Bayes estimator i f p r i o r knowledge of the parameters A^ i s non-informative (vague) and the A^'s are independent. When su b s t a n t i a l p r i o r knowledge i s a v a i l a b l e , sign-i f i c a n t improvement on the usual estimation of procedure would be expected by means of Bayesian methods. That i s , i n c e r t a i n s i t u a t i o n s , we can incorporate the information at hand about the p r i o r d i s t r i b u t i o n i n a Bayesian manner and obtain estimators of A = (A^,...',A ) better than X. Recently, some attempts have been made to study the problem of simultaneously estimating the parameters of several independent Poisson random va r i a b l e s from the Bayesian point of view. As mentioned previously, Clevenson and Zidek [1975] propose a class of estimators that dominate the MLE uniformly under normalized squared error l o s s . They also provide a Bayesian i n t e r p r e t a t i o n of t h e i r r e s u l t s . Leonard 11972] assumes that the A^'s are independent and i d e n t i c a l l y d i s t r i b u t e d , with 2 2 In A^ ^ N(y,a ).., i = l , . . . , p f o r given y and a , that y i s uniformly 2 d i s t r i b u t e d over the r e a l l i n e , and that vn/a i s independent of y and has a chi^square d i s t r i b u t i o n with v degrees of freedom. Modal estimates 58 of In A^, i = l , . . . , p are proposed. Later, i n another paper, Leonard [1976] b r i e f l y discusses the problem again. He assumes that the para- .':-meters A ^ are exchangeable i n the sense of de F i n e t t i [1964]. That i s , the p r i o r d i s t r i b u t i o n s (two at a time, three at a time, etc.) of the A^'s are invariant under permutation of the s u f f i x e s . Given a > 0 and g > 0, the A^'s are assumed to be independent and each A ^ has gamma density n(X |.a,3) = B a A i a _ 1 e " e A i / r ( a ) f o r X > 0 = 0 otherwise. In the second stage of the d i s t r i b u t i o n , In 3 i s assumed to be uniformly d i s t r i b u t e d over the r e a l l i n e and no p r i o r d i s t r i b u t i o n of a i s suggested. In t h i s section, we generalize the above r e s u l t s by adopting various p r i o r d i s t r i b u t i o n s on a, a l l included within the broad family of -generalized hypergeometric functions, and develop the j o i n t p osterior d i s t r i b u t i o n of A as we l l as the marginal posterior d i s t r i b u t i o n s of the In A^, 1=1,...,p. Point estimators of the A^'s are proposed and compared with the MLE. By means of a computer simulation which w i l l be reported i n Section 7, i t i s found that i n c e r t a i n s i t u a t i o n s , e s p e c i a l l y when the parameters A ^ are close to each other and the loss i s squared error, a subs t a n t i a l savings over the MLE w i l l r e s u l t . We s h a l l f i r s t derive a Bayesian s o l u t i o n f o r our problem, and then develop the marginal posterior density of A... 59 5.2 Estimates of the Parameters The following d i s t r i b u t i o n s are relevant to a Bayesian s o l u t i o n of our problem: (1) Given A^, i=l,...,p, the observations x^,...,x^ are independent and have Poisson d i s t r i b u t i o n s with parameters A^,...,A re s p e c t i v e l y . The j o i n t p r o b a b i l i t y mass function of x given X i s -X. x. P e X. f(x|X) = n — r ^ - . i-1 x i ! (2) The A_/s are exchangeable a p r i o r i , and the p r i o r d i s t r i b u t i o n i s described i n two stages as follows: ( i ) Given a and B, the X^ are independent and have a gamma density, i . e . for a > 0, g > 0, ir(A |a,8) = A ? e X/T(a) i f X ± > 0 = 0 otherwise. ( i i ) :;a and g are independent, and g has a non-informative p r i o r density, i . e . , the density of g i s proportional to 1/g. The parameter a i s assumed to have the translated geometric d i s t r i b u t i o n with p r o b a b i l i t y mass function (1-y) i f a = 1 a-1 (l-y)y i f a = 2,3,..., where 0 <_ y <.l. The hyperparameter y i s assessed a p r i o r i . 60 -The j o i n t density of x, A, a, B given y i s proportional to -A. xv , -BA. 1, i „ot ,ot-l I S 6 A i 5 A i 6 1 a l l * * . n . T(a) " * B * y ' i = l " i i = l Conditioning on x and i n t e g r a t i n g with respect to B gives the j o i n t p o sterior density of A and ct, given x and y, p x p . / • \ • £ / , I •> -A •„ i r ,ot-l T(pa) f(A,a x,y) « e II ..A. [y IT A.] E — • i 1 • i 1 T»P / \ A pot 1=1 1=1 T (a) 'K P where A = E A.. i-1 1 Now introduce the common f a c t o r i a l function notation (y) = ^ ^ " ^ , and l e t <j> = (p Py II A.)/A P. Using the Gauss m u l t i p l i c a t i o n theorem i = l 1 i d e n t i t y (see, e.g., p. 26 of R a i n v i l l e [I960]) r ( P a ) s a ^ ^ \ ^ - \ i 0 r ( a + F } ' we have -A P x i P _ 1 k e II A. n (1 + -) •• ._, i , , p a-1 n f(A,a|x,p) = C(x,y) i = r — ~ * , A P ( D P :(ct-l)T a-1 where C(x,y) depends only on x and fi,,and i s • such", that. E / f (A,ct |x,y)dA = 1. ot=l Summing with respect to a gives the j o i n t p o s t e r i o r density of the A^'s, given y and the data x, 61 -A P * i i n A . P" 1 k n ( i + r ) f(A|x,y) = C(x,y) 1=1 k=l P a=l (1)P 2 ( a - l ) ! a-1 a-1 , a - l <P a p x. -A „ , l = C(x,y) e - n A _ _ i = i : A 1 p-1 n ( l + r ) . k=l ' j=0 ( l ) P " 2 j ! Notice that the i n f i n i t e sum here i s a generalized hypergeometric function p-1 p-2 1 + 1 + 1 + 1 1 1 • -L, J- , . . . , _L , p X i which converges f o r j <j> | •> 1, i . e . f o r [p Py E (.~fL)\ < !• Since the i = l geometric mean of the A^'s cannot exceed t h e i r arithmetic mean, i t i s s u f f i c i e n t for the function to converge i f y<l. The j o i n t p osterior density of X i s proper i f the sum of the observations i s p o s i t i v e , which i s a reasonable assumption. Now we f i n d the e x p l i c i t expression of the normalizing constant C(x,y). We make the following transformation of va r i a b l e s : (A , •. ., X ) >• (A, 6 ,...,0 ) 1 p • 1 p-1 where X. = A0. i=l,...,p, x x and E 6. = 1 , 6. > 0. i-1 1 The Jacobian of the transformation i s A p-1 /f(A|x,y)dA^dA2 i•*dA^ = 1 implies that 62 P - 1 k n ( l + - ) . - l - A z - i p x i °° k=i p 2 D p 1 p [C(x ,y)] x = / / e - A V . n i e . E _ 9 [pPy n 9 . ] JdA n de. A>0 9±> 0 i = l 1 j=0 ( l ) p j ! i = l 1 1=1 1 E9.=l 2 l P-1 K P n ( i + - ) . n r(j + x. + l ) - (z-1)! E [ p P y ] 3 « P 3 1=1 3=0 ( i ) p 2 j ! r ( P j + z + P) where z = E x. 1=1 3 Now observe that (x. +.„!) . = T(j + x. + 1) i ' - " j r(x. + 1) and that P-1 r ( P j + z + p) = r(z + P ) (z + p) . = r(z + P ) P p j n ( z + p + k ) . P J k=0 p J (cf. p. 22, Lemma 6 of R a i n v i l l e [I960]). Consequently, we see that P P-1 (z-1)! n x±l [ C ( x , y ) ] ~ 1 = . , K}. E n (1 + -) . n (x:.: + 1) . k=l P 31=1 1 2 y 3 (z+p-1)! _ p ^ l . j 2 = 0 ( I ) ? " 2 n ( ^ 2 % . 2 k=0 p 2 (z-1)! n x.! (z+p-1)! 2p-l 2p-2 1 + - 1 + ^ x +1 P P 1 x +1; P i i ' z±P. Z + 2 P ~ 1 . P P That i s , C(x,y) =• (z+p-1)! (z-1) ! n x.! 1=1 1 2p-l 2p-2 L * • • • »1 • • X-H~X » • • • » X 1 • p P 1 P P " ' pz+p_ 1, . . . , 1, , p z+2p-l p -1 63 For s i m p l i c i t y , we use the symbol p* to represent the f i n i t e sequence 1 + -jjj- ,..., 1 + a n d u s e the symbol 1^ to represent the f i n i t e sequence of q l ' s . With t h i s notation, C(x,y) becomes (z+p-1)! (z-1)! n x.! i = l 1 2p-l 2p-2 p*, x x + l , . . . , x + 1; z+p_ z+2p-l lV-2> p p —1 The posterior means of the components of A, given x and y, are now e a s i l y shown to be A = E A = c ( x » " ) i = l i x,y i C(x+e^,y) ' ' th where e. i s a p-vector which has i coordinate one and the other I coordinates zero, and E means that the expectation i s taken holding x, y x and y . f i x e d . This expression can also be written i n terms of hyper-geometric functions. Accordingly, l e t g(x,y) denote the generalized hypergeometric function, g(x,y) := 2 p l ; 1 F 2 ^ 2 p*, x.+l,...,x +1; 1 P P-2' z+p ' P ' z+2p-l . P ' In terms of g, the p o s t e r i o r mean of A becomes z ( x ^ l ) g(x+e.,y) p X . = ; — 7 r — , i = l , .. ., p, where z = E x. . i z+p g(x,y) I = 1 I The p o s t e r i o r variances and covariances among the components of A can also be obtained. In terms of g, they are given as follows: 64 Var (A.) = E A 2 - [E A . ] 2 x,y 1 x,y; 1 x,y 1 z(z+l)(x.+l)(x.+2) g(x+2e±,v) 2 (z+p) (z+p+1) g(x,y) '• Ex,y Ai"' (z+l)(x.+2) g(x+2e.,y) z(x.+l) g(x+e.,y) i (z+p+1) g(x+e ,y) z+p ' g(x,y) ' Cov (A.,A.) x,y i ' j E A.A. - (E A.)(E A.) x,y:.i j x,y x x,y j z(z+l) ( x ^ l ) (x.,+1) g(x+e i+e i ,y) „ „ (z+p>(z+p+l) " " g(x,y) " • X i X j i = l,...,p, j = l , . . . , p i i ^ j We see that the A.'s are correlated i f at l e a s t one of the x.'s i s non-l I zero. Moreover, since the variances are always non-negative, we have (z+l)(x i+2) g(x+2 e i,y) z(x ±+l) g(x 1+e i,y) - ZZZ— * „\"'— i=l,...,p. (z+p+1) g(x+e i }y) — z+p ' g(x,y) That i s , f o r fi x e d i the marginal po s t e r i o r mean E A. i s a non-x,y x decreasing function of x^ (an i n t u i t i v e l y reasonable result).. R e c a l l that the foregoing r e s u l t i s based on the assumption that a follows a translated geometric d i s t r i b u t i o n . Similar r e s u l t s can be derived with d i f f e r e n t d i s c r e t e p r i o r d i s t r i b u t i o n s of a. Two examples are given below. Example 5.2.1. Suppose the p r i o r d i s t r i b u t i o n of a i s Poisson with counting a-1 density proportional to exp(-y)y , a=l,2,...,0 j< y < °° and y i s 65 known. The normalizing constant i n t h i s case i s C(x,y) = _ (z+p-1)! (z-1)! E x. i = l 1 2p-l 2p-l p*, x+l,...,x +1; 1 P i P r 1 P z+2p-l p - l The posterior mean and the covariance matrix of X can then be calculated. z (x.,+1) g(x+e.,y) E(A.|x,y)= r ^ — -.—^ i 1 z+p g(x,y) i=l,...,p, where g(x,y) = 2p-l^2p-l p*, Xj+1,...,x +1; 1 z+p_ z+2p-l p-1 p p Example 5.2.2. Suppose the p r i o r d i s t r i b u t i o n of a i s negative binomial with counting density proportional to m+a-2'. a-1 y" ^(l-y)m»a=l,2,..., f or some known m and y such that m >_ 1 and 0 <^ y < 1. The j o i n t p osterior density of X given m,y and x i s e n A . i i = l 1 f(A|m,y,x) = C(x,m,y) 1- F AP P P-1 P*. m; P-1' where <j> = p P n x. i - i 1 yA -p The normalizing constant, C(x,m,y), i s given by C(x,m,y) • { (z-1)! n.• x.! i - 1 1 I F 2p 2p-l P*> x.j+1,... ,x +1, m; 1 _5±P_ z+2p-l L P - 1 ' P p ; -1 66 The posterior mean given x, m and y, i s z(x ±+l) g ( x + e i > m ? y ) E( A.. x,m,y) = T • 7 ^ — : v , i .=• l , . . . , p , 1 ' ' z+p g(x,m,y) • ? " ' where g(x,m,y) = 2p 2p~l p*, x.,+1,... ,x +l,m; 1 P z+p_ P-1' P " z+2p-l. In many si t u a t i o n s it.';is d e sirable to have greater f l e x i b i l i t y i n tr a n s l a t i n g one's p r i o r b e l i e f s into a parametric family of p r i o r d i s t r i b u t i o n s . A family which i s r i c h e r i n parameters w i l l u sually be s u f f i c i e n t . Accordingly, r i c h e r r e s u l t s can be obtained by choosing the p r i o r d i s t r i b u t i o n of a to have the following (hypergeometric) form: . .Vaj>k-i k - i P(a=k) a n . , k=1>... W V -Yvk-1 J=l (k-1) (5.2.3)! for some known (a^,...,a u, b^,...,b r i, y) where: (1) w = F u v a ^ , . . . » ^ U J b h • u -J^ > • • • 5 U J i s w e l l defined, u (2) n ( d . ) ^ = 1, i f u=0, j = l J 67 (3) u <_ v+1 and u >_ 0, i f u < v+1, 0 <_ y < 1, i f u = v+1. Let a denote (a,,...,a ) and b denote ( b - , b ). The r e s u l t i n g i o i n t 1 u 1 v J posterior density of A given a, b, y and x i s f(AJa,b,y,x) = C(x,a,b,y) -A P X i e n A . .1=1 1 A p+u-1 p+v-1 P*, a; 1 b; P-1' where <}> = p 1 P y>.A -P and [C(x,a,b,y)] -1 (z-1)! n x.! 1=1 1 .. F (z+p-1)! 2p+u-l 2p+v-l p*, x^+1,...>Xp+l, a; z+p_ z+2p-l P-1' p »••*•» p ' b ' In Example 5.2.1, we had u=v=0 and 0<y<°°. In Example 5.2.2, we had v=0, u=l, a-^ = m, 0£y<l. As before, to s i m p l i f y the representation for the p o s t e r i o r density, and to a i d i n understanding i t s behavior, we define the generalized hypergeometric function g as follows: [p*» x^+1,...,x +1, a; g(x,a,b,y) = 2 p + u _ 1 F 2 p + v _ 1 P-1' P z+p_ z+2p-l (5.2.4) 68 ;; In the case of the hypergeometric family of p r i o r d i s t r i b u t i o n s , we can express the posterior mean vector and the posterior covariance matrix [Cov (A., A.)J i n terms of g. Indeed, x,a,b,y x' J pxp 5 ' z(x.+l) g(x+e.,a,b,y) (1) E . A. = ± _ — — — ± - r — r — , i = l p, (5.2.5) x,a,b,y l z+p g(x,a,b,y) 2 z(z+l)(x i+l)(x i+2) g(x+2 e i,a,b,y) ^ E x , a , b , y X i (z+p)(z+p+1) g(x,a,b,y) ' (3) Var , (A.) = E , A 2 - [E A. ] 2 x,a,b,y p l x,a,b,y l x,a,b,y x (z+l)(x..+2) g(x+2e.,a,b,y) - I E „ i . , . A , - H - 1 -~ x , a , b , y " i J 1 (z+p+1) g(x+e i,a,b,y) z(x.+l) g(x+e.,a,b,y) _ 1 1 ] z+p+1 g(x,a,b,y) ' z( z + l ) ( x 1 + l ) ( x i + l ) g(x+e i+e 1,a,b,y) ( 4 ) C o v x , a , b , y ( A i , A j ) = (z+p)(z+p+1) ' g(x,a,b,y) -[E , A.][E , A.], x,a,b,y I x,a,b,y j i=l,...,p, j=l,...,p, i£j. The marginal p o s t e r i o r mean, E , A., i s seen to be a non-J: '-x,a,b,y l decreasing function of x.. This follows d i r e c t l y since Var , (A.) > 0. x J x,a,b,y i . . — The MLE of A^, x^, i s obviously an increasing function of x.. Since the posterior moments involve the function g(x,a,b,y), we would l i k e to investigate the properties of g. In p a r t i c u l a r , we would l i k e to compare the value g(x+e^,a,b,y) and the value g(x,a,b,y). A s u f f i c i e n t condition that g(x+e^,a,b,y) i s greater than or equal to 69 g(x,a,b,y) i s that each i n d i v i d u a l term of the i n f i n i t e sum of g(x+e^,a,b,y) i s greater than or equal to the corresponding term of the i n f i n i t e sum of g(x,a,b,y). In other words, g(x+e^,a,b,y) f^g(x,a,b,y) P p-1 . n (x.+l) u n (1+J-) • j = l 3 a • (x.+2) • n (a.) • i p a ... l a . i j a ( 1 )P-1 . I (J5+P+JS) . n (b.) R=l P a j = l 3 0 1 p-1 p u n • n (x.+l) • n (a.) >; 2zl 1 .1 = 1 1 = 1 p-1 ^JE+P+k) . I ( h ) a k=0 P a j = l J a for a=0, 1,... . The condition can be s i m p l i f i e d to (x^+2+a-l) (x ±+l) z+2p+(a-l):P - (z+p) ' f ° r a = 1' 2'--' which i s equivalent to * > x. . p - X p Consequently, i f the grand mean, E (x./p), i s s t r i c t l y greater than x., i = l 1 . 1 then g(x+e ±,a,b,y) S g(x,a,b,y), and g(x+e i,a,b,y) = g(x,a,b,y) i f P P E x./p = x.. On the other hand, i f E x./p < x., then j-1 J 1 j-1 3 1 g(x+e^,a,b,y) < g(x,a,b,y). In f a c t , i f the grand mean i s less than x^, g(x,a,.b,y). i s a s t r i c t l y decreasing ifunction ,of x^. Now suppose that a l l the x _ / s are i d e n t i c a l and equal to x^, say. Then the marginal posterior means are 70 z C x ^ l ) g(x+e i,a,b,y) x,a,b,y I z+p g(x,a,b,y) 0 That-is to say, E A. equals the MLE i n t h i s case. This i s a x, a, b, y l natural r e s u l t . .'For, the assumption of having an exchangeable p r i o r d i s t r i b u t i o n of X implies that our p r i o r b e l i e f s about the A_^'s i s that they are " p r o b a b i l i s t i c a l l y c l o se" to one another. Then with i d e n t i c a l observations of the x^'s, we have supporting evidence that the A ^ ' s a r e very close, or even i d e n t i c a l to one another. Hence, the grand mean X Q i s an appropriate estimate for each of the A^'s. 5.3 Marginal Posterior Density In t h i s section, we w i l l derive the marginal posterior density of X^, for 1=1,...,p. R e c a l l that the j o i n t p o s t e r i o r density of the A^'s i s given by -A f(A|x,a,b,y) = C(x,a,b,y) P x n A , ' i = i 1 p+u-l p+v-1 a; 1 2 where p* = (1 H , 1 H , , . P P = y;p P n i = l ,-P and .'.A = E ."A .. i = l 1 71 The j o i n t p osterior density of A can be thought of as a product of three « P x. —A i functions of A . The f i r s t function, e n A . , i s maximized when . , x x=l A . = x., i=l,...p. The second function, — ; increases as the A . ' s 1 1 ,P i : A decrease. . The t h i r d function. p+u-1 p+v-1 p*, a; i s maximized when the A^ ,'s are equal to one another. Hence the posterior mode i s s h i f t e d from x to a p o s i t i o n where the A_^ 's are more nearly equal and towards the o r i g i n . The p o s t e r i o r mean can be used as an estimator of A; i t l i e s between the MLE and the grand mean x-. To see t h i s , we u t i l i z e another representation of the posterior mean. Conditioning on a, the data, and the p r i o r information, the po s t e r i o r mean.of A^.'is. x x+d x. + X x+d x, which i s a convex combination of x. and x. The post e r i o r mean of A . x r x i s therefore equal to x,a,b,y^i ^a|x,a,b,y x+a x. + X x+a x where E i i s the expectation taken with respect to the po s t e r i o r d i s t r i b u t i o n of a. Now, for a l l a, the expression i n s i d e the brackets l i e s between the MLE x_^ , and the grand mean x. Hence the posterior mean of A. also l i e s between x. and x. This does not mean that the posterior mean of A. an estimate of A. i s ne c e s s a r i l y better than the MLE x. for a x x • J x 72 p a r t i c u l a r i . We only expect that most of the posterior means of X^j i=l,...,p, are closer to the true parameters than the MLE. For a p a r t i c u l a r A^ > the squared error loss i n using the Bayes estimator may be greater than that of the MLE. However, the t o t a l squared error l o s s i s expected to be much smaller than that of the MLE when the A / s are " p r o b a b i l i s t i c a l l y c l o s e " to one another. On the other hand, when the X^'s are spread across a wide range, the Bayes estimator i s not expected to perform much better than the MLE. In f a c t , there may be cases where the MLE i s better. Such a case i s discussed i n Section 7. Next, we f i n d the marginal po s t e r i o r density f o r A_^ i n several instances, f i r s t , when the d i s t r i b u t i o n of a i s degenerate, and then, i n general. Results f o r various s p e c i f i c d i s t r i b u t i o n s of ct w i l l be studied numerically. Consider f i r s t , the case when the d i s t r i b u t i o n of a i s degenerate, i . e . p(a=s) = l , s > 0. The pos t e r i o r density of A given x and s i s - A . P X ^ + s - 1 e [ n \* ] r ( P s ) f(A|x,s) = C(x,s) — r P ( s ) A P S In order to f i n d the marginal po s t e r i o r density of A_^ , we introduce the following notation: D e f i n i t i o n s (1) z,.. = E x . — ~ (x) i H 3 (2) -A,.. = E A. (1) iH 3 (3) e a ) = ^ , . . . , 6 . ^ , 9 . + 1 , . . . , 0 ) Now transform A according to: (1) A. +:-A. x x (2) A. ->• where E 0, = 1 and j 1 i . j u ; 3 k The Jacobian of t h i s transformation i s A P 2 A f t e r the transformation, (x) ' f(A|x,s) becomes f(A^, ^^y 9(^|x>s) -A. x.+s-l e A. l x.+s-l P J n 0. ;- A(i)„ X D + ^ P " 1 ) - 1 A (1) (A. + A , . , ) P S x :(x) (Note that f ( - ) i s being used g e n e r i c a l l y to denote the density of the indicated arguments). Integrating out and 8(£)> w e have the marginal p o s t e r i o r density of A_^ given by -A. x.+s-l f U j x . s ) °= e ^"A^1 I I ( X ) , where Z/ .N + ( p - l ) s - l -y (x) v r = r 1 0 (X, + y ) p S dy. Let t = (y/A i). Then 11(A) becomes x . ( l ) dt. o ( i + t ) p s 74 Hence the marginal p o s t e r i o r density of A. may be written 1 - A t z ±(p-l)s-l X: / \ I s 1, z-1 e t f(A. x,s) « e A. / dt. • 1 1 0 ( l + t ) p S I t i s reasonable to assume t h a t l a t l e a s t one of the x.'s i s non-zero and J hence, z > 0 i s a reasonable assumption. In t h i s case, f(A^|x,s) i s a proper density. Let us examine the property of the marginal posterior density f(A^|x,s) of A^ given x and s. If z ^ ^ < s, then the i n t e g r a l I .(A.) = /" 5 dt (s > 0) S " 1 1 0 ( l + t ) p s i s f i n i t e even when A. i s zero. I f s=l=z /. x, we have l (x)' i x. - A . t . -A. -X X p-1 ~i oo Q + - 1 f(A.|x,s) - e XA. f°° - ^ dt , 1 X „ „ . . v P ' o (i+t);p and f(A_Jx,.s) i s s t r i c t l y decreasing i f x = 0. I t i s straightforward to check that the i n t e g r a l i s a monotone decreasing, convex function of A.. Moreover, f(A.|x,s) tends to i n f i n i t y as A. tends to zero. This .'. x x1 x i s not s u r p r i s i n g because x_^ = 0 and z = 1 would i n d i c a t e ..that A_^ i s very l i k e l y to be near zero. In general, we would expect the shape of the marginal p o s t e r i o r density to be gamma-like. If x^ > 0 and s ^ 1, the curve of f(;A |x,s) i s again unimodal and skewed to the r i g h t . Now, we consider the cases i n which a has the d i s c r e t e d i s t r i b u t i o n s suggested i n t h i s paper. The d e r i v a t i o n of the marginal p o s t e r i o r 75 density of A ^ i s straightforward but tedious. In f a c t , f ( A |x,a,b,u) - A . 1 i , z-1 a e :. A . . E p p S n (x.+l) P-1 , u IT (1+-;) •: n. (a.) :•: y s • I ( A . ) " " ' "• " S I . 1 . 3 s t - p s . T i s .1^1 k=l r i = l J i " - n , p-2: z ,.N+p-l+k / nx v 3 = 0 ( D ^ s i n ) • ( p - i ) ( p - 1 ) s • n ( b . ) s k i = 0 P-1 s 3 s Define the hypergeometric function h ^ ^ ( x , y , t ) p p-1 i n (x.+l) • n ( i + - ) :• n ( a.) = E •j.- 3 s , T p s . . . i s J f i J k=l y j = l J nP-2 z,..+p-l+k v 3 = 0 CD*-1 n (-^4 ) ,. n (b.) s k=o P" 1 s j = i J s p < P-i) ( p- 1 )(i+t) p 2p-2+u 2p-2+v p*, x.,+1,... . x ^ + l , x i + 1 + l , . . . ,x^+l,a; w Lp-1' p-1 z,.x+2p-3 (i ) p-1 , b; where w = ^P-— (p-D^-^d+t) 1 5 We then have for the general (non-degenerate a) marginal po s t e r i o r density, , - A z...+p-2 i i z-1 oo e X t t ( l ) f ( A . |x,a,b,y) <= e A . / : • Yi.(x,y, t ) d t . 1 1 0 ( l + t ) P W 76 I t can be. checked that the normalizing constant i s c(x,a,b,v); n r(x.+i)/r(z,..+p-i) (z+p-1).! P (z-1)! n x.! j = l 3 2p+u-l 2p+v-l p*, x.j+1,... ,x +1, a; 1 ^±P_ z+2 P-i b . P ' "*' P -1 n x.!/(z,..+p-2)! (z+p-1)! ( z - l ) ! ( z ( i ) + p - 2 ) ! x . ! 2p+u-l 2p+v-l -p*, x.+l,...,x +1, a; 1 p z+p_ z+2p-l p-1' P " " ' P -1 The shape of t h i s density function i s also gamma-like, by the same argument as before. 5.4 Summary In t h i s section we proposed Bayes estimators f o r simultaneously estimating the parameters of p d i s t r i b u t i o n s of independent Poisson ran-dom v a r i a b l e s when the p r i o r d i s t r i b u t i o n of the parameters i s assumed to be exchangeable. Substantial improvement over the usual procedure (which i s the MLE) i s expected when the parameters are close to one another, e s p e c i a l l y when p i s large, because the assumption of exchangeability implies that the larger p i s , the more information we have about the A^'s. Our claim i s supported i n Section 7 by the r e s u l t s 77 of a computer simulation designed to compare the estimation e f f i c i e n c y of the Bayes estimators with the MLE. The measure used to assess the performance of the estimators i s mean squared error, which i s often a reasonable measure of the o v e r a l l adequacy of an estimator. 78 SECTION 6. EMPIRICAL BAYES ESTIMATION 6.1 Background In estimating the mean of a m u l t i v a r i a t e normal random vector, Efron and Morris [1973] give an i n t e r p r e t a t i o n of the James-Stein estimator from an empirical Bayes point of view. If we l e t p 2 V.~ •(V.,...,V ) ^ N ( 6 , I ) , S = E V. and p > 3, where I i s the 1 P P P i = 1 i P p x p i d e n t i t y matrix, then the James-Stein estimator of 9 i s XI - (p-2)/S)V. Efron and Morris assume that the coordinates 9^ of 9 are independently and i d e n t i c a l l y ^ N(0,A). In t h i s s i t u a t i o n , the Bayes r u l e i s e* = (1 - B)V, with B = 1/(1 + A). The marginal d i s t r i b u t i o n of V i s N p(0,(1+4)I p) and S i s d i s t r i b u t e d as ( l + A ) X p \ a multiple oJ a"chi-square d i s t r i b u t i o n with p degrees of freedom. Replacing B by B(S) = (p-2)/S, which i s an unbiased estimate of B, y i e l d s the empirical Bayes estimator 9 = (1 - B(S))V which i s the James-Stein estimator. Efron and Morris perform much of t h e i r a nalysis based on -what they define as the " r e l a t i v e savings l o s s " , i . e . the regret from using the empirical Bayes r u l e 9 instead of the ac t u a l Bayes r u l e divided by the corresponding regret i f the MLE, V, i s used instead of the Bayes r u l e . I f the r i s k of the estimator 9 i s denoted by .. R(B,9) = E,, R(9,9), where E.„ in d i c a t e s that expectation i s taken with respect to the above p r i o r d i s t r i b u t i o n , the r e l a t i v e savings los s i s RSL(B,9) = [R(B;9) - R(B,9*)]/[R(B,V) - R(B,9*)]. Under the above assumptions, straightforward c a l c u l a t i o n s show that RSL(B,9) can be written as RSL(B,9) = EL' {(B(S) - B) / B} 2. Here a D 2 indicates expectation with respect to S ^ (1/B)x + 9 . The empirical 79 Bayes approach thus reduces the p-dimetisional problem of estimating 0 = (0.., ...,0 ) from V = (V.,... ,V ) to the one-dimensional problem of 1 p 1 p estimating B, or more p r e c i s e l y , 1/(1+A). 6.2 Relative Savings Loss i n the Poisson Case In t h i s section, we use the normalized squared error loss P (A. - A . ) 2 L(A,A) = Z 1 1 1=1 i as our measure f or the loss i n estimating A^ by A l , i = l , . . . , p . This i s the loss function employed by Clevenson and Zidek [1975]. As they suggested, the above approach used on the m u l t i v a r i a t e normal estimation problem seems applicable to our problem of simultaneously estimating the parameters A^,...,Ap of the independent Poisson v a r i a b l e s X^,...,X^. Our developments here p a r a l l e l those i n the normal case. Although no new estimators are found, the empirical Bayes approach provides an alt e r n a t e way to view our problem and derive the estimators given i n Theorem 2.2.9 so that we have a better understanding of those estimators. We w i l l next derive the " r e l a t i v e savings l o s s " for the Poisson case. Let A^,...,A be independently and i d e n t i c a l l y d i s t r i b u t e d with 2 p r i o r d i s t r i b u t i o n a ^ j a scalar multiple of a chi-square d i s t r i b u t i o n with two degrees of freedom. The j o i n t p r i o r density of A.,...,A i s P P proportional to exp(- Z A./a) dA,•••••dA. for A. > 0, and zero otherwise. 1-1 1 1 ? The Bayes estimator of A when such a p r i o r i s used i s (l-b)X, where b = 1/(1 + a). This estimator has Bayes r i s k p(.l-- b) , while the MLE, X, has r i s k p. Thus, the regret at using the MLE instead of the Bayes P estimator i s pb. The marginal d i s t r i b u t i o n of Z = Z X., given b, i s i = l 1 80 z+p-1 z p z + b.(1-b) , f o r z e J . Suppose b i s to be the negative binomial d i s t r i b u t i o n NB(b,p) with p r o b a b i l i t y mass function Pr(z|b) = estimated by b(Z). Then the regret at using the empirical Bayes •:. estimator X = (l-b(Z))X instead of the Bayes r u l e (1 - b)X i s E b E x [ Z ([1 - b(Z)]X. - X . T / X,] - p ( l - b) i = l = E b' (1/(1 -6 ) ) [Zb (Z) - 2b(Z)Z + 2b(Z)Z(l-b)] + p - p(l-b) = [{ Z(b(Z) - b ) Z } / (1 - b)] where E^" denotes expectation with respect to the marginal d i s t r i b u t i o n of Z as given above. The r e l a t i v e savings lo s s i n t h i s case i s RSL(b,X) = E b' [ Z(b(Z) - b ) 2 / p b ( l - b)] = Z z=0 ( 1 ( Z ) 7 b ? • z • • b * ( l -(1 - b)pb z! (p-1)! ( l - b ) ' = Z z=l (b(z) - b)' (l-b)pb (z-1)!(p-1)! (z+p-1) ! , p n = z ( b ( 2 + 1 ) - b> . l & n . b P + 1 ( i - b ) 2 z! p! z=0 ^ [(b(Z+l) - b) / b ]' where E^ denotes expectation with respect to the d i s t r i b u t i o n of Z and Z ^ NB(b,p+l). 81 6.3 The Plus Rules Suppose we have an estimator of the form X = (1 - b(Z))X and that there i s a p o s i t i v e p r o b a b i l i t y that b(Z) > 1. The following proposi-t i o n , shows that we can always improve the r e l a t i v e savings los s RSL of A A. j ^ such an estimator by rep l a c i n g b(Z) with b (Z) = Min {l,b(Z)}. I n t u i t i v e l y , we would expect that the estimator X = (1 - b(Z))X can be improved upon by replacing b(Z) with 1 when b(Z) i s greater than 1, because the Bayes estimator i s (1 - b)X, where b i s known to be between 0 and 1. Hence the proposition i s a very natural r e s u l t . Proposition 6.3'..l. Let X + = (1 - b +(Z))X. Then RSL(b,A) - RSL(b,X +) = (1/b 2) \ {[ (b(Z+l) - b +(Z+l)) + ( l - b ) ] 2 - [1 - b ] 2 } . The function of Z i n the outermost brackets i s nonnegative and s t r i c t l y greater than 0 i f b(Z) > 1, so that RSL(b,X) >_RSL(b,X ) f o r a l l b, with s t r i c t i n e q u a l i t y i f Pr {b(Z) > 1} i s p o s i t i v e f or any value of b. The proof of the above proposition i s immediate and hence i s j /\ omitted. The estimator X may be c a l l e d the "plus r u l e " of X. 6.4 Bayes Rules with Respect to Other P r i o r s In t h i s subsection, we s h a l l consider some Bayes rules with respect to p r i o r s which are members of the family considered by Clevenson and Zidek [1975]. We reparametrize the parameters X^,...,X as (0 ,A), P P 1 i=l,...,p, where A = E X. and 0. = X./A, and suppose the j o i n t p r i o r i = l 1 1 d i s t r i b u t i o n of (0^,A), i = l , . . . , p , i s proportional to exp(-A/a)A adAd0 1 -'W0 (6.4.1)' P 1 P when A >0 and E 0. = 1, and zero otherwise, i = l 1 82 where a i s a p o s i t i v e integer. The Bayes estimator with respect to t h i s p r i o r d i s t r i b u t i o n i s = (1-b) (Z+a)X / (Z+p-1), where b = l / ( l + a ) . The marginal d i s t r i b u t i o n of Z has p r o b a b i l i t y mass function NB(b , a+l). Since & {Z/(Z+a)} = (1-b), estimating (1-b) by Z/(Z+a) leads to the empirical Bayes estimator A*(X) = [Z/(Z+p-1)]X = [1 - (p-1)/(Z+p-1)]X, which i s independent of a . This estimator belongs to the c l a s s of estimators mentioned i n Theorem 2. 2.9. The r i s k of \* can be calculated as a follows. p [ ( l - b W g ^ X. - A9,] 2 R(b, A ) = E, E Z Z + P " 1 f l Q 1 1 — a b A,-0 . .. A9. 1=1 x = E b E z Cl/A) • {(1-b) 2 [Z^l] 2 • ~ 2tL-b) ' ^ = 1 ZA+A2} where E^ i n d i c a t e s expectation with respect to the p r i o r stated above and Eg indicates expectation with respect to Poisson (A). Note that E^,^ A = [(Z+a+l)/(l+ (1/a))] = (1-b)(Z+o+1) and L j D E z , b ( 1 / A ) = 1 1 [ ( 1 - b ) ( z + a > J> where E i s the expectation taken with respect to the p o s t e r i o r L* ) D d i s t r i b u t i o n of .A given Z and b. Hence R(b, A ) = E { (1-b) (Z+a+1) - (1-b) (Z+a)a/(Z+pr-1) } a b = p(l-b) + (l-b ) (a+l-p) [ (p-1)/(Z+p-1) ] 83 where denotes expectation with respect to the p r o b a b i l i t y mass function NB(b,a+l). For example, when a = p, R(b,A p) = p(.l-b) •+ ( l - b ) [ b 2 + (p-1) b (1-b)/p] = p - b[p - 1 + l/p. + 'Cl - 2/p)b + b 2/p] _< p f o r a l l b e (0,1]. 6.5 Truncated Bayes Rules In t h i s subsection, we consider again the class of estimators of the form (1 - b(Z))X. Theorem 2.2.9 provides a class of such estimators dominating the MLE. We w i l l attempt to f i n d conditions under which they are also Bayes with respect to some p r i o r d i s t r i b u t i o n . As before, ^ Poisson ( A ) independently, i = l , . . . , p . Suppose each A ^ ^ (1/a) exp ( - A ^ / a ) , and l e t b = 1/(1+a). Furthermore, we suppose that b ^ h(b), where h(b) i s a p r i o r p r o b a b i l i t y density f u n c t i o n o f b putting a l l of i t s p r o b a b i l i t y on the i n t e r v a l (0,1]. In th i s s e t t i n g , the Bayes r u l e under normalized squared loss i s calculated to be .A*(x) = E h [ E ( A | x , b ) ] = /J(l-b)Xh z(b) db = [1 - b*(Z)]X (6.5.1) where E^ indicates expectation with respect to the s i t u a t i o n described above, h^Cb) i s the condit i o n a l density of b given Z, and bg(Z) i s the con d i t i o n a l expectation of b given Z. The Bayes r u l e A* i s thus of the form that we have been considering and Theorem 2.2.9 i s ap p l i c a b l e . 84 The following lemma gives equivalent d e f i n i t i o n s of the Bayes r u l e with respect to h. In p a r t i c u l a r , an alternate d e f i n i t i o n which proves to be more convenient for our analysis than the usual one i s suggested. Lemma 6.5.1. The following are equivalent: (i ) A* i s that A which minimizes J(jR(b,A) h(b) db. ( i i ) A* i s that A which minimizes /o[R(b,A) - p(l-b)]/pb g(b) db where g(b) = pbh(b). ( i i i ) A* i s that A = (1 - b(Z))X for which b(Z) minimizes h a. E g b(Z+l) - b 2 , b(Z-El) - b b 2 ' g(b) db (6.5.2) where E means expectation with respect to Z ^ NB(b,p+l) Proof: Form ( i ) i s the usual d e f i n i t i o n ; form ( i i ) follows d i r e c t l y because the minimization i s over A. Form ( i i i ) i s equivalent to form ( i i ) by (6.2.1). Q.E.D. Finding the Bayes rules having the form A = (1 - b(Z))X i s thus equivalent to f i n d i n g b which minimizes (6.5.2). The b(z) that gives the minimization i n ( i i i ) i s b * ( z ) , which i s given by j J g ( b ) b P ( l - b ) Z db b*(z+l) = 4- (6.5.3) h / ^ g ( b ) b p - 1 ( l - b ) Z db j; o h ( b ) b p + 1 ( i - b ) z db or b*(z+l) = — (6.5.4) . n / o h ( b ) b P ( l - b ) Z db 85 2 If we take the improper p r i o r h(b) = 1/b or g(b) = p/b, then the Bayes r u l e i s (1 - b * ( Z ) ) X with / J b - 2 b P + 1 ( l - b ) Z db b * ( z + l ) = - j r : h / J b - 2 b P ( l - b ) Z db = (p-1)/(z+p) or b*(z) = (p-1)/(z+p-1). Thus the empirical Bayes estimator X* - ( 1 — (p-1)/(Z+p-1))X derived i n Section 6.4 i s a c t u a l l y a Bayes r u l e . We next inquire i f the Bayes rules thus obtained w i l l s t i l l dominate the MLE. Let $ be the cla s s of estimators s {1 - [ (p-1) cp(Z)/(Z+p-1) ]}X described i n Theorem 2.2.9 s a t i s f y i n g l i m <p(z) = s <_ 2. The following z-x» theorem gives the c l a s s of estimators obtained by modifying the Bayes r u l e obtained i n t h i s section so that they are i n $ f o r some 0 < s < 2. s — — Theorem 6.5.5. Suppose that h(b) i s such that <P*(z) = b*(z)/[(p-1)/(z+p-1)] i s nonnegative and nondecreasing i n z. The estimator i n $ g which min-imizes the Bayes r i s k versus h, E^ R(b,A), i s given by A*(X) = {1 - [(p-1)/(Z+p-1) ] ^ ( Z ) } X s where <i>n(z) = Min {s,(p*(.z) }. 86 Proof: Condition on Z=z, and l e t g (b)=pbh (b) . Then g (b) °= gCb)b P + 1(l-b) Z z z z where g(b) = pbh(b) as before. For any estimator A = (1 - b(Z))X, /J[(b(z+l)-b)/b] 2g z(b) db =/o[(bCz±l)-b*(.z+l))/b] 2g zCb) db + [Cb*Cz+l)-b)/b] 2g z(b) db. The cross term i s zero because of formula (6.5".3) for b*(z+l). In terms h of <p(z) = b(z)/[ (p-1)/(.z+p-1)], the above equality :can be written as /J'[(b(z+1) - b)/b] 2g zCb) db = jl [(<p(z+l) - <f>*(z+l))]2 [(p-l)/Cz+p-l)] 2g 2Cb) db + j] [(b*(z+l)-b)/b] 2g z(b) db. (6.5.6) g I t i s seen that '('^ (z) minimizes the r i g h t hand side of the above equality for a l l z among cp(z) giving rules i n $ g. ~ s Integrating over the marginal d i s t r i b u t i o n of Z shows that A. h 1 minimizes the i n t e g r a l j 0 RSL(b,A) g(b) db for A i n $ . s Q.E.D. The estimator thus obtained i s c a l l e d a "truncated Bayes r u l e " , a term introduced by Efron and Morris [1973]. s s We now define b^Cz) = [ (p^l)/(z+p-1) ]cp b(z) with s < l i m <j>*(z), and let'A?(X) = (1 - b?(Z))X. The following lemma — z-*-°° n n h shows that the r i s k of A, i s a decreasing convex function of s. 87 Lemma 6.5.7, R(b,A^) i s a s t r i c t l y decreasing convex function of s. Proof: s 2 2 Since {d>, (z) - <p*(z)} = [Max {0,<p*(z)-s}] i s a decreasing convex n n n function of s, the r e s u l t follows from (6.5.6). Q.E.D. 6.6 The Risk Function of the Estimator A* As remarked i n the previous subsection, A* = [1- (p-1)/(Z+p-1)]X can be viewed as an empirical Bayes estimator under the assumption that the p r i o r d i s t r i b u t i o n of A i s given by (6.4.1). Since A* i s independent of a , t h i s estimator i s robust i n that any member of a whole family of p r i o r d i s t r i b u t i o n s may be chosen and s t i l l we a r r i v e at the same empirical Bayes estimator. We therefore proceed to c a l c u l a t e the r i s k function R(A,A*) of the estimator A*. We f i r s t derive an expression f o r the r i s k function as a function of b. As i n subsection 6.2, we assume that A^,...,A are independently 2 and i d e n t i c a l l y d i s t r i b u t e d with p r i o r d i s t r i b u t i o n a x 2 ' Equivalently, the p r i o r i s of the form given above i n (6.4.1), with a = p - 1. The RSL can then be calculated as follows. RSL(b,A*|a=p-l) = E f e [ b 2 ( Z + l ) / b 2 - 2b(Z+l)/b + 1] OO = I [(p-1)/(z+p)] 2 [ ( z + p ) ! / ( b 2 z ! p ! ) ] b P + 1 ( l - b ) Z z=0 OO - E [ 2 ( p - l ) / { b ( z + p ) } ] [ ( z + p ) ! / ( z ! p ! ) ] b P + 1 ( l - b ) Z + 1 z=0 OO = E [ ( p - 1 ) 2 / ( z + p ) ] [ ( z + p - 1 ) ! ( z ! p ! ) ] b P _ 1 ( l - b ) Z - 2(p-l)/p + 1 z=0 88 CO = ( p - l ) 2 b p - 1 E [l/( z + p ) ] [ ( z + p - 1 ) ! ( z ! p ! ) ] ( l - b ) Z - 2(p-l)/p + 1 z=0 [(p-l) 2b p-1/{p(l-b) p}] E [(z+p-1)!/ z!(p-1)!}]/J(l-i) Z + P 1 dt z=0 H - 2(p-l)/p + 1 = [(p-l)V" 1/{p(l-b) P}]/J E [ (z+p-1) !/{z! (p-1) !}] (1-1) Z + P _ 1 dt 0 z=0 - 2(p-l)/p + 1 = [ ( p - l ) 2 b P _ 1 / { p ( l - b ) P } ] / J ( l - t ) P " 1 / t P dt - 2(p-l)/p + 1. Observe that as b tends to zero, RSL(b,A*|a=p-l) tends to (p-l)/p - 2(p-l)/p + 1 = 1 - (p-l)/p = 1/p. As b tends to one, RSL(b,b|a=p-l) tends to ( p - l ) 2 / p 2 - 2(p-l)/p + 1 = 1/p 2. Using the above r e s u l t to c a l c u l a t e the r i s k of the estimator as a function of b, we have R(b,X*|a=p-l) = RSL(b,A*)pb + p(l-b) = ( p - l ) 2 b P / ( l - b ) P i 7 j ( l - t ) P _ 1 / t P dt - 2(p-l)b + pb + p(l-b) = p - 2(p-l)b + (p-1) V / ( 1 - b ) P ( l - t ) P _ 1 / t P dt. R e c a l l that t h i s expression depends on the choice of a. Since t h i s i s not a simple expression, i n the next subsection we explore the p o s s i b i l i t y of s i m p l i f y i n g the expression by varying a. 89 6.7 S i m p l i f i c a t i o n of R(b,A*) In our attempt to obtain a simpler expression for R(b,A*), we now adopt another p r i o r d i s t r i b u t i o n . Suppose the j o i n t p r i o r d i s t r i b u t i o n i s as given i n (6.4.1), with a = p. The r i s k R(b,A*|a=p) when such a p r i o r i s used i s E a V e * [((i-Mz))x. - Ae^/CAe.)] ' i = l = E E (l/A)[Z(Z+p-l){b 2(Z) - b(Z) + 2b(Z)A/(Z+p-1)}] + p cL Li (conditioned on Z, the x ^ ' s have a multinomial d i s t r i b u t i o n with para-meters 9_^ , i = l,...,p) " E b l=b 2^, Z ( Z +P- ]T ) B ( Z ) - 2MZ) (Z+P-1) + 2 b ( z ) z ( 1 _ b ) Z+p Z+p + P where E, indicates expectation with respect to the marginal d i s t r i b u t i o n b of Z ^ NB(b,p+l). With b(Z) = (p-1)/(Z+p-1), we have M F _ I _ Z(Z+p-l)b 2(Z) U ) E b 1-b Z+p z(z+p-i) ( p - i ) 2 (z+p)! ,p+i ;• ,z 1-b n z+p , v - TN2 z!p! z=0 (z+p-1) 1 = ( P - D 2 I Cz+p-2)! p + l ( ,z 1-b ^ ( z - l ) ! p ! b U b J 2 °° = i ? = l l _ E l z + E = l ) l b P + l ( 1 _ b ) Z + l 1-b z = Q z!p! = ( p - l ) 2 b / p . ( i i , ^ ^ M Z ) ^ -2 „ 2 = l _ z i £ ^ b P + l ( 1 . b ) « 5 ! 2 = l 1-b . z+p-1 z!p! v ' z+p z=0 = -2(p-1) ( i i i ) 2E 1b(Z)Z b OO - 2(p-1) I (z+p) <^P- 2>j bP+Vb)2 = 2(p-l) z (z+p-1) ; ( z + P - l > ! b ^ d - b ) ^ 1 z=0 z i p 2 ( p 1 ) b(l-b) [p(l-b)/b + p + 1] P 2(p-1) - 2b(p-l) 2/p - 2b 2(p-1)/p. Consequently, R(b,A*|a=p) becomes p b ( p - l ) 2 / p - 2b 2(p-l)/p, 0 « b <_ 1. (6.7.1) Equivalently, R(a,A*) = p - (p-1) 2/{p(1+a)} - 2(p-l)/{p(l+a) 2}, a ^ (6.7.2) A. A. R(b,A*|a=p) i s c l e a r l y concave, and decreases i n b from R(0,A*|a=p) to R(l,A*|a=p) = 1/p. 91 6.8 Risk Function of A* as a Function of A We have now obtained a r e l a t i v e l y simple expression for R(b,A*), and P note that R(A,A*) depends on A only through the value of A = £ A.. . i = l 1 The r i s k function R(A,A*) i s therefore a c t u a l l y a function of A, R(A,A*), for which we are now i n a p o s i t i o n to derive an expression. Usually, knowledge about R(A,A) implies knowledge of R(b,A) when the p r i o r d i s t r i b u t i o n of A i s known. The r e s u l t below shows that we can reverse the d i r e c t i o n and gain information about R(A,A*) from knowledge about R(b,A*). With (6.4.1) as the p r i o r d i s t r i b u t i o n (a=p), A has density function P (A) = (A/a) P[exp(-A/a)]/[ar(p+l)] a = 0 i f A > 0 i f A < 0. By d e f i n i t i o n , R(a,A*) = /Q R(A,A*) P (A) dA, which can be written as cl R(a,A*) = /o R(aA,A*)A P exp(-A)/r'(p+l) dA. (6.8.1) D i f f e r e n t i a t i n g both sides of (6.8.1) j times with respect to a gives d jR(a,A*) da- a=0 r(p+j+l) d R(A,A*) r(p+l) dA j (6.8.2) A=0 However, from (6.7.2), the d e r i v a t i v e on the l e f t side of (6.8.2) i s - ( p - l ) 2 ( - l ) J j ! / p - 2 ( p - l ) ( - l ) j ( j + l ) ! / p , j = !,...,» . T h e r e f o r e d^R(A yA*) dA" A=0 = r ( P + i ) r(p+j+i) •2(p-1) 92 Representing R(A,A*) as a power se r i e s expansion about A = 0 gives R ( A J*) - p - ^Szll I (J+Dr(p+D ( _ A ) j m ( p - D 2 * r( P+i) j • P P j=o R ( P + J + 1 ) P j=o r (p + J+D C ; (6.8.3) CO ^ ^ Furthermore, Z ^ P ^ | 1 ) (-A) j = /:0 p exp(-At) ( l - t ) P " 1 dt (see Abramowitz and Stegun [1964], p. 505). 0 0 r ( The seri e s Z ., (-A)J i s uniformly convergent on bounded j=0 K v 3 ) i n t e r v a l s of the r e a l l i n e . Hence z -(j+i ) r( P+i) j ^ 0 ' r(p+j+i) ( A ) E r(p+D j l o r(P+j+i) i f 0 0 = <L z r( P+i) ( _ ( _ A ) j + i x dA F(p+j+l) c c A ; ) > = ^ Ap/J exp(-At) ( l - t ) P " 1 dt = p/ 0 exp(-At) ( l - t ) P dt - p/o t e x p ( - A f ) ( l - t ) P dt. (6.8.4) .1 p-1 The l a s t term can be rewritten as -pJ 0 exp(-At) t ( l - t ) dt = -p[-exp(-tA) t ( l - t ) |J + /Jexp(-At) { ( l - t ) P " 1 - ( p - l ) t ( l - t ) P " 2 } dt] (inte g r a t i o n by parts) = -p [/J e x p ( r A t ) ( l - t ) P " " 1 d t - (p-D/5 t ( l - t ) P 2 exp(-At) d t ] . r 1 P-2 Equation (6.8.4) thus becomes p(p-l)Jg t ( l - t ) exp(-At) dt. Consequently, (6.8.3) has the form R(A,A*) = p -[2(p-l)/p][p(p-l)/J t ( l - t ) P " 2 exp (-At) dt] [ ( p - l ) 2 / p ] [ p / J ( l - t ) P _ 1 exp(-At) dt = p-2(p-l) 2/J t ( l - t ) P " 2 exp(-At) d t - ' ( p - l ) 2 / J e x p ( - A t ) ( l - t ) p - 1 dt. Thus R(A,A*) i s seen to be an increasing concave function of A. We summarize the above into the following theorem. Theorem 6.8.5. Let X^,...,X be independent Poisson v a r i a b l e s with parameters A^,...,A , p ^ 2. Then the r i s k R(A,A*) of the estimator A* = (1 - (p-1)/(Z+p-1))X P i s an increasing concave function of A = E A. from R(0,A*) = 1/p to i-1 1 R(«>,A*) = p. I t has power serie s R F A X * ) = D _ ^ P - 1 ) v ( j + i ) r ( P + i ) ( . j _ (p-D2 p r(p+i) j R ( A ' A } P P r(p+j+i) ( A ) P . . ! 0 r ( P + j + i ) ( A ) • It also has the i n t e g r a l representation R(A,A*) =p- ( p - l ) 2 / J e x p ( - A t ) ( l - t ) P _ 1 dt - 2(p-l) 2/Jexp(-At) t ( l - t ) P " 2 dt. This r e s u l t i s very s i m i l a r to Coro l l a r y 2 of Efron and Morris [1973], which deals with the normal case. 94 6.9 Risk Function of A * as a Function of A The next theorem gives the r i s k of the estimator A * as a function of X = (A.,...,X ). 1 p Theorem 6.9.1. The r i s k of the estimator X* = (1 - [(p-1)/(Z+p-1)])X as a function of X = ( A T ..... A ) i s 1 p R ( A , A * ) = p - ( p - l ) 2 E ^ [l/(Z+p)]- 2 ( p - l ) 2 E ^ [l/{(Z+p)(Z+p-1)}] = p - ( p - l ) 2 E A [(Z+p+l)/{(Z+p)(Z+p-1)}]. (6.9.2) Proof: The l e f t hand side of (6.9.2), as pointed out e a r l i e r , i s a P function of A = E A . , say f ( A ) . The r i g h t hand side of (6.9.2) i s i = l 1 also a function of A, say g(A). By d e f i n i t i o n , E b f(A) = p- ( p - l ) 2 b / p - 2(p-l)b 2/p where E^ indicates expectation with respect to the j o i n t p r i o r d i s t r i -bution (6.4.1) with a = p. Also, E b g(A) = p - ( p - l ) 2 E b E A [ l / ( Z + p ) ] - 2 (p-1) 2 E b E A [1/{ (Z+p) (Z+p-1) } ] = p - (p-1) 2 E b [l/(Z+p)] - 2 ( p - l ) 2 E b [l/{(Z+p)(Z+p-1)}] where E i n d i c a t e s expectation with respect to the marginal d i s t r i b u t i o n b of Z ^ NB(b,p+l). As a r e s u l t , E b g(A) = p - ( p - l ) 2 b / p - 2(p-l)b 2/p. .95 Therefore E, g(A) = E, f(A) for every value of b, 0 < b < 1, or b b ~~ equivalently, for every value of a, a > 0. Since the d i s t r i b u t i o n s of A = A P exp(-A/a) dA are complete as a function of a, f and g must be the same function. Q.E.D. Both Theorem 6.8.5 and Theorem 6.9.1 show that R(A,A*) < p for a l l A, so A* i s better than the MLE (whose r i s k i s p) f o r a l l A. Furthermore, [l/(z+p)] and [l/{(z+p)(z+p-1)}] are convex functions i n z. Hence by Jensen's i n e q u a l i t y , the upper bound UB(A) of R(A,A*) = R(A,A*) i s of the form « » - - ^ - x f ^ -- A +2pA+p -p _ (p-1) 2 _ 2(p-1) 2 A+P , A - U ^ 2 ' (A+p) -p In p a r t i c u l a r , when A = 0, then UB(0) = 1/p. Since R(0,A*) = 1/p, the upper bound i s attained when A = 0. 6.10 Risk Function of the Clevenson-Zidek Estimators By Theorem 2.2.9, the estimators A S = (1 - [s(p-1)/(Z+p-1)])X, 0 < s < 2, ~ g are uniformly better than the MLE. The r i s k functions of A are s i m i l a r to those given i n Theorem 6.8.5 and Theorem 6.9.1. They are given i n the following theorems (proofs are s i m i l a r to those i n Theorems 6.8.5 and 6.9.1) 96 Theorem 6..10.1. The r i s k function of A (0 < s < 2) as a function of b i s R(b,A S) = p - ( p - l ) 2 b [ l - ( s - l ) 2 ] / p - 2 b 2 s ( p - l ) / p . Theorem 6.10.2. A s - P The r i s k function of A (0 < s < 2) as a function of A = Z A_^ i s R(A,A S) = p - ( p - l ) 2 [ l - ( s - l ) 2 ] / J exp(-At) ( l - t ) P _ 1 dt - 2 ( p - l ) 2 s / J exp(-At) t ( l - t ) P ~ 2 dt. This i s an increasing concave function of A from R(0,A S) = 1/p + ( p - l ) 2 ( s - l ) 2 / p + 2 ( p - l ) ( l - s ) / p to R(~,A S) = p. Theorem 6.10.3. A s The r i s k function of A (0 < s < 2) as a function of A = (A.,...,A ) _ _ 1 p i s R(A,A S) = p - ( p - l ) 2 [ l - Cs-1) 2JE A[1/(Z+p)] - 2 ( p - l ) 2 s E A [l/{ (Z+p) (Z+p-1)}]. 6.11 Summary In t h i s section, we confine our attention to estimators of A of the form A = (1 - b(Z))X and carry out our analysis by means of empirical Bayes methods. We begin with a d e s c r i p t i o n of'-.the. method used i n the normal case and then proceed to c a l c u l a t e the " r e l a t i v e savings l o s s " (a term coined by Efron and Morris [1973]) i n the Poisson case when normalized squared error l o s s i s the c r i t e r i o n . The r e l a t i v e savings loss plays a fundamental r o l e i n the analysis performed by Efron and Morris [1973] i n the normal case. We obtain "plus r u l e s (A = ( l - b ( Z ) ) + X ) and "truncated Bayes r u l e s " , r e s u l t s s i m i l a r to those i n the normal case. The remainder of the section i s devoted to the c a l c u l a t i o n of the g r i s k R(A,A ) of the Clevenson-Zidek estimators X S = [1 - s(p-1)/(Z+p-1)]X as a function of X, where 0 <_ s <^ 2. We f i n d that R(A,A ) i s P a c t u a l l y a function of A only through the value of A = E A.. i = l g Moreover, R(A,A ) i s an increasing and concave function of A. 98 SECTION 7. COMPUTER SIMULATION 7.1 Introduction In t h i s section we describe the r e s u l t s of a computer simulation used to compare some of our proposed estimators with the MLE. We also compare a Clevenson-Zidek estimator and one of Peng's estimators (Theorem 2. 2.3)-with the MLE. F i n a l l y , we discuss the performance of the estimators. The computations reported here were performed both on the IBM 370/168 computer at the U n i v e r s i t y of B r i t i s h Columbia and the Data General NOVA 840 computer at the U n i v e r s i t y of C a l i f o r n i a , Riverside. A FORTRAN program was used i n the IBM computer and a BASIC program was used i n the NOVA computer. F i r s t , the number p of independent Poisson random var i a b l e s i s chosen. Second, p parameters A^ are generated randomly within a c e r t a i n range (c,d). Third, one observation of each of the p d i s t r i b u t i o n s with the parameters obtained i n the second step i s generated. Estimates of the parameters are then calculated according to the estimator A we want to t e s t . The t h i r d step i s repeated 2000 times and the r i s k s under the relevant lo s s functions f o r both the estimator and the MLE are calc u l a t e d . The percentage of the savings i n using A as compared to the MLE, [ R ( X > R ( X ~ X ) * 1 0 0 ] %. i s c a l c u l a t e d . The whole process i s then repeated at l e a s t three times and the average percentage of the savings i s calculated. In the case when our Bayesian estimators (Section 5) are used, the hyperparameters (u,v,y,a,b) which spec i f y the p r i o r information are chosen beforehand, i . e . before the t h i r d step takes place. We chose the range of the parameters A^ i n such a way that we might check the performance of 99 the estimators when the parameters X are r e l a t i v e l y close to one another. This i s e s p e c i a l l y relevant when the performance of our Bayes estimators are checked, since t h i s i s the case where we expect that a sub s t a n t i a l improvement over the usual procedure w i l l take place. For purposes of comparison, we also include a simulation study of the per-formance of the estimators when the range of the parameters A^ i s wide. Our expectation that the Bayes estimator might not always be better than the MLE when the range of the A^'s i s wide was substantiated when negative savings r e s u l t e d from a choice of p = 3, u ~ 9.0, and (c,d) = (0,20). The MLE performed well i n t h i s case because p i s small, the range of the A / s i s wide, and the p r i o r was purposely selected to be an improper choice for the problem. In c a l c u l a t i n g the percentage of improvement of the various estimators over the MLE, the appropriate los s functions must be used. R e c a l l that P 2 k L (A,A) = E (A. - A.) /X.. In Tables I and I I , the l o s s function L~ i s x=l the c r i t e r i o n , while i n Tables I I I and IV, L^ i s used. The loss function L^ i s the c r i t e r i o n i n Table V, and Tables VI through XIV use squared error l o s s . In most of the cases, the improvement percentage i s seen to be an increasing function of p, the number of independent Poisson d i s t r i b u t i o n s . For the non-Bayes estimators, we see that i n general, the improvement percentage decreases as the magnitude of the A^'s increases. The improve-ment percentage of the Bayes estimators depends on the choice of the p r i o r hyperparameters; proper choice leads to s u b s t a n t i a l improvement over the MLE. 100 7.2 Estimators Under k-NSEL In Section 4 we derived, for each non-negative integer k, a c l a s s of Ak P * 2 k estimators A dominating the MLE under k-NSEL L (A,A) = Z (A.-A.) /A.. 1=1 These estimators are of the form (f>(Z)X ( X . - l ) . - - (X.-k+l) A^ = X. - 1 1 1 X X p Z (X.+l)---(X.+k) + X.(X.-l)•••(X.-k+l) j ^ i J "2 where <p(z) e [0,2k(p-l)] and i s nondecreasing. The estimator A ( i . e . k=2) i s of considerable appeal because t h i s i s the case where a P X± 2 natural loss function L 0(A,A) = Z (1 - -— ) i s used i n estimating scale 2 . A . . x=l x: parameters. Our computer simulation r e s u l t s i n t h i s subsection are mainly "2 devoted to the performance of an estimator A and a Clevenson-Zidek "1 estimator A . More s p e c i f i c a l l y , we choose (p(z) = 2[p-1] ( i . e . k [ p - l ] ) for A 2, and A 1 = ( 1 ™ -=P—)Z. z+p In Table I, we see that for the ranges considered, the percentage "2 of improvement i n r i s k of A over the MLE i s considerable when the "2 parameters f a l l into a narrow i n t e r v a l . For each value of p, A performs best when the parameters are i n the i n t e r v a l s (0, 4) and (4, 8). The improvement decreases gradually as the magnitude of the A / S increases. "1 In contrast, the Clevenson-Zidek estimator A performs very w e l l only when the parameters are r e l a t i v e l y small, with the improvement percentage decreasing dramatically as the magnitude of the A_/s increases (Table I I I ) . This i s as conjectured i n Section 4. Tables II and IV show r e s u l t s f or "2 wider ranges of the A^'s. Although the improvement percentages of A over the MLE are by no means su b s t a n t i a l , they are nevertheless greater 101 -Table IV Improvement Percentage of A over the MLE Narrow Range f o r x.'s Range of the Percentage of Improvement over the MLE Parameters X. 1 p=3 p=4 p=5 p=8 =10 (0, 4) 24.97 23.53 34.35 33.94 35 .43 (4, 8) 23.08 25.93 28.83 30.95 32 .53 (8, .12) 14.89 19.60 20.69 21.58 24 .12 (12, 16) 11.04 12.53 15.33 17.92 18 .09 Table I I . Improvement Percentage of X over the MLE Wide Range for A ' s Range of the Percentage of Improvement Parameters X. over the MLE l (0, 20) (10,30) 11.62 11.99 6.59 12.01 102 Table I I I . Improvement Percentage of A over the MLE Narrow Range for A.'s Range of the Parameters A. X Percentage of Improvement over the MLE P=3 p=4 p=5 p=8 p=10 (0, 4) 24.77 26.70 27.97 29.47 29.89 (4, 8) 7.34 8.29 9.83 11.21 12.46 (8, 12) 4.02 5.85 6.05 7.00 7.69 (12, 16) 2.44 2.74 4.28 5.36 5.18 Table IV. Improvement Percentage of A"*" over the MLE Wide Range for A^'s Range of the Percentage of Improvement Parameters A. over the MLE p=5 p=8 (0, 20) (10, 30) .8.23 4.08 6.87 4.32 103 "1 than those of X . Of course, the d i f f e r e n t l o s s functions employed for "2 "\ X and X might contribute to such a d i f f e r e n c e . -4 Table V reveals the performance of the estimator X with <p(z) = 4[p-rl] ( i . e . k [ p - l ] ) . When the parameters are confined to the i n t e r v a l (4, 8), the percentages of improvement are seen to be above 30%, which i s rather s u b s t a n t i a l . However, when the parameters are i n the i n t e r v a l (0, 4), the improvements are unimpressive. This i s as expected because X gives i d e n t i c a l estimates as the MLE when the observations are less than four, and when the X.'s are i n the i n t e r v a l l (0, 4), such observations are l i k e l y to occur ( c f . the remarks following the proof of Theorem 4.3.1). The s l i g h t improvements shown i n the (0, 4) row of Table V are due to the f a c t that some observations are greater than four. While the MLE estimates X^ by x^, the estimator X shrinks the observations greater than four towards zero, r e s u l t i n g i n some improvement. 7.3 Bayes Estimators The Bayes estimators X developed i n Section 5 are of the form X. = E v . X . l X,a,b,u i Z(X ±+1) g(X+e±,a,b,u) = ~(Z+F) g(X,a,b,p) ' 1 = 1 " ' - » P ( 7 - 3 ' x ) where g(x,a,b,y) i s the generalized hypergeometric function given i n equation (5.2.4). B a s i c a l l y , (7.3.1) depends on the choice of the p r i o r d i s t r i b u t i o n of a, which i s given i n equation (5.2.3). A p a r t i c u l a r p r i o r d i s t r i b u t i o n of a i s determined by the s p e c i f i c a t i o n of the 104 Table V. Improvement Percentage of A over the MLE Range of the Percentage of Improvement over the MLE Parameters X. 1 p=4 p=5 p=8 p=10 (0, 4) 7.51 0.57 6.56 3.80 (4, 8) 30.03 34.23 38.25 31.89 Table VI. Poisson-Distributed a (u=0, v=0) Narrow Range f o r A.'s Range of the Percentage of Improvement over the MLE Parameters A. V p=3 p=4 p=5 p=8 p=10 (0, 2) 0.5 47.81 42.33 54.52 60.17 54.25 (2, 4) 2.0 45.29 50.19 54.87 63.03 65.01 (4, 8) 5.0 46.02 47.00 57.94 58.74 61.28 (8, 12) 9.0 43.66 52.33 58.70 61.93 66.65 (12, 16) 13.0 47.18 55.57 57.35 65.92 65.00 (16, 20) 17.0 46.71 53.13 55.25 64.90 65.03 105 hyperparameters (u,v,a,b,y). In our simulation study, we s h a l l j . r e s t r i c t ourselves to the following cases; (1) u=0, v=0, 0 <_ \i < <x> (Poisson-distributed a) (2) u=l, v=0, a^=1.0, 0 <_ y < 1 (geometric-distributed a) (3) u=l, v=0, a^=3.0, 0 <_ y < 1 (negative binomial-distributed a). The choice of y r e f l e c t s one's b e l i e f about the magnitude of the para-meters A_^ . A r e l a t i v e l y large value of y indicates that the A^'s are believed to be large, while a r e l a t i v e l y small value means that the A.'s are believed to be small. l R e c a l l that A_^ ^ gamma (a, 3). Observe that f o r given y and 3» the marginal expectation of X_^ i s E(X.|S) = E E(X.|(3,A) = E(A.|3) = E E(A. | a, 3) = E ( f | 3) = \ Ea l 1 x; 1 p p where the expectations are taken appropriately. Note that Ea i s a function of y. Suppose that the scale parameter 3 i s equal to one, so that our problem i s cast i n terms of units of 3. In order that E(X_j3=l) might be close to the parameter A^, y i s chosen so that E^A = E(X_j3=l) = c + d — - — i f the A 's are believed to be i n the i n t e r v a l (c, d). For example, i f (c, d) = (4, 8), y=5 i s chosen. However, t h i s i s only one way to choose y; other choices are c e r t a i n l y p o s s i b l e . In Table VII, three values of y were chosen. The choice of y=12 seemed to r e s u l t i n the greatest savings. This leads us to suspect that there are i n f a c t better choices for y than that described above. In retrospect, i t seems that we should consider not only the length of the i n t e r v a l as a measure of closeness, but also, and possibly more importantly, the r e l a t i v e magnitude of the 106 A^'s within an i n t e r v a l . We might use the r a t i o c/d, where c > 0, as an in d i c a t o r of the closeness of the A.'s. When the r a t i o i s close to zero, 1 the A^'s may be said to be r e l a t i v e l y spread out i n that i n t e r v a l , and when the r a t i o i s close to one, they may be said to be r e l a t i v e l y close. Hence for i n t e r v a l s of f i x e d length, the parameters are considered to be closer i f the lower bound of the i n t e r v a l i s further away from zero. In other words, the closeness of the A^'s does not depend s o l e l y on the Euclidean distance between the A.'s, but also on t h e i r l o c a t i o n . I R e c a l l that the Bayes estimate i s a weighted average of x^ and x. If the A^'s are very close to one another, more weight should be given to x, which means that y should be given a r e l a t i v e l y large value. If the A / s are spread out, a small value of y should be chosen. In Table VIII, we see that a choice of y=0.1 i s better than 1.0 and 9.0 when the i n t e r v a l i s (0, 20). Since the parameters are quite spread out, the Bayes estimate should give more weight to the MLE, i . e . a small value of y i s s u i t a b l e . However, i n the i n t e r v a l (10, 30), the A 's. are r e l a t i v e l y close (although the length of the i n t e r v a l i s s t i l l 20) and we see that a choice of y=19 i s better than a choice of y=14, a phenomenon s i m i l a r to that encountered i n Table VII. Thus, the Bayes estimators remain superior to the MLE even though the range of the parameters A i s wide, i f we are acute enough to s e l e c t the hyperparameters appropriately. From Table VI, we conclude that the percentage of improvement over the MLE i s s u b s t a n t i a l when the parameters f a l l into a narrow i n t e r v a l , which i s as expected. Tables IX and X both show that the Bayes estimators using other p r i o r d i s t r i b u t i o n s (negative binomial- and geometric-d i s t r i b u t e d a) are s t i l l superior to the MLE and the savings i n mean squared error i s s u b s t a n t i a l . 107 Table VII. Poisson-Distributed a (u=0, v=0> E f f e c t of y Range of the Percentage of Improvement over the MLE Parameters A. 1 v p=4 p=5 p=8 p=10 (4, 8) 1.0 29.93 35.51 47.57 44.47 (4, 8) 5.0 47.00 57.94 58.74 61.28 (4, 8) 12.0 60.69 64.91 65.94 72.04 Table VIII. Poisson-Distributed a (u=0, v=0) Wide Range for A ^ ' s Range of the Percentage of Improvement „ , over the.MLE Parameters A . Vi P=5 . p=8 (0, 10) 2.0 8.57 29.38 (0, 20) 0.1 9.34 15.84 (0, 20) 1.1.0 7.85 12.12 (0, 20) 9.0 9.12 3.05 (10, 20) 14.0 48.51 52.92 (10, 30) 14.0 11.45 10.90 (10, 30) 19.0 33.43 30.21 108 Table IX. Negative Binomial-Distributed ci (u=l, v=0, a^=3.0) Range of the Parameters A. 1 y Percentage of Improvement over the MLE p=3 p=4 p=5 p=8 p=10 (0, 4) (4, 8) (8, 12) 0.25 0.5 0.75 41.99 42.60 44.61 55.73 47.11 49.70 53.71 48.35 55.83 44.66 57.08 61.68 47.44 55.61 65.41 Table X. Geometric-Distributed a (u=l, v=0, a^=1.0) Range of the Parameters A. y Percentage of Improvement over the MLE p=3 p=4 p=5 p=8 p=10 (0, 4) (4, 8) 0.5 35.17 34.69 37.08 47.77 51.58 0.833 39.20 47.42 46.68 60.84 59.16 109 7.4 Estimators Under Squared Error Loss " (0) We s h a l l next compare the performance of Peng's estimator A (a s p e c i a l case of A ^ ^ described i n Section 3) and our new estimator A ^ , which shrinks the MLE towards the minimum of the observations, where M™ 1 (p-N -2) H (X) A [ = x. m + 1 , 1 - 1 p, 1 1 p 9 •S H?(X) 1=1 1 and the H^'s are as defined i n equation (3.4.1). We have argued that : t h i s adaptive estimator A ^ M ^ should perform better than A ^ ^ across a broad spectrum of values f o r the A . ' s . I Table XI shows that although A ^ ^ provides a noticeable improvement over the MLE, the improvement percentage decreases r a p i d l y as the A ^ ' s move away from zero. In contrast, the improvement percentages i n Table XIII remain noticeable even when the A_^'s are i n the i n t e r v a l (12, 16). This supports our conjecture that the estimator A ^ i s superior to A ^ ^ when p ^ 4. The improvement percentages for p=3 i n Table XIII are a l l Tm1 zero because A i s i d e n t i c a l with the MLE when p <^ 3. Thus Peng's estimator has the merit that i t provides a better estimator than the MLE under squared error l o s s when p >_ 3. Note, however, that use of A ^ i m p l i c i t l y involves a choice of k=0, towards which the observations are shrunk. If t h i s kind of s u b j e c t i v i t y i s to avoided and the shrinkage ~ Tml determined only by the data (using A ), one degree of freedom i s l o s t and improvement over the MLE r e s u l t s only when p ^ 4. F i n a l l y , Tables XII and XIV show our r e s u l t s for wide ranges of the parameters A_^. In both cases, the improvement percentages are seen to be minimal, which i s not s u r p r i s i n g because the r i s k of the MLE i s large i n th i s case, and hence the r e l a t i v e savings are bound to be small. 110 Table XI. Improvement Percentage of A over the MLE Narrow Range for A.'s Range of the Percentage of Improvement over the MLE Parameters A. 1 p=3 P=4 p=5 p=8 p=10 (0, 4) 2.23 3.97 6.37 7.70 8.52 (4, 8) 0.58 1.14 1.42 1.68 1.79 (8, 12) 0.17 0.36 0.50 0.63 0.85 (12, 16) 0.05 0.00 0.19 0.37 0.48 Table XII. Improvement Percentage of A ^ over the MLE Wide Range for A^'s Range of the Percentage of Improvement Parameters A. over the MLE l P=5 p=8 (0, 20) (10, 30) 0.69 0.37 0.62 0.40 I l l Table XIII. Improvement Percentage of A over the MLE Narrow Range for A.'s Range of the Percentage of Improvement over the MLE Parameters A. 1 p=3 p=4 p=5 p=8 p=10 (0, 4) 0.00 4.21 6.35 12.41 11.76 (4, 8) 0.00 3.80 6.08 .7.95 7.69 (8, 12) 0.00 3.81 5.50 6.37 6.59 (12, 16) 0.00 2.95 4.74 5.90 5.22 Table XIV. Improvement Percentage of A over the MLE Wide Range for A.'s Range of the Percentage of Improvement Parameters A. over the MLE l p=5 p=8 (0, 20) . (10, :30) 0.79 2.30 0.68 1.85 112 7.5 Comparison of the Estimators Based! on our computer simulation r e s u l t s , i t appears that among the estimators considered, the Bayes estimators provide by far the most improvement over the MLE when the parameters are i n a . r e l a t i v e l y narrow i n t e r v a l . When p r i o r knowledge indicates that the A^'s are exchangeable and are l i k e l y to f a l l i nto a c e r t a i n r e l a t i v e l y narrow i n t e r v a l (c, d), there i s no doubt that the Bayes estimator should be used. Recall also that the Bayes estimators remain superior to the MLE even though the range of the parameters A^ i s wide, i f we choose the hyperparameters s u i t a b l y . I t should be noted, however, that the Bayes estimators do not dominate the MLE uniformly i n A, and hence should not be used i n d i s c r i m i n a t e l y . On the other hand, the estimators A^, A ^ , and A ^ are guaranteed to have lower r i s k than the MLE uniformly i n A under appropriate los s ~k "(k) functions. The estimators A and A can be used advantageously when the integer k i s chosen appropriately. The estimator A ^ i s . u s e f u l i f " (k) p •> 4 and p r i o r knowledge of k used i n the estimator A i s vague. 113 SECTION 8. EXTENSIONS 8.1 Motivation Up to t h i s point, we have confined our a t t e n t i o n to the s i t u a t i o n i n which only one observation i s taken from each of p independent Poisson populations. We now suppose that X ^,...,X. ^ Poisson (A.), where i x ni ;L.!•» i = 1»."»P> and that a l l the X_^ 's are independent. L e t t i n g n. i X. X xn- = ^ X , i = l , . . . , p , the MLE of A i s (—,... . We pose the -i=1 J n T n J X 1 P following question: Question 8.1.1. Given the s i t u a t i o n described above, are there estimators A of A which dominate the MLE under an appropriate los s function? In the case of simultaneously estimating p normal means 0., we independent suppose that X ^ N(0^,1) for i = l , . . . , p and j = l , . . . , n ^ . The v a r i a b l e s X^/n^ are s t i l l normal and we are e s s e n t i a l l y dealing with the problem i n which one observation i s taken from each population. In our case, unfortunately, the d i s t r i b u t i o n of X^/n_^ I s n o longer Poisson i f n_^ > 1. We therefore cannot apply our previous r e s u l t s d i r e c t l y to t h i s problem. However, i f our i n t e r e s t i s to estimate n A^y i = l , . . . , p , then the foregoing theory can be applied because X^ ^ Poisson (n^A^) i n t h i s case. We f i r s t consider the squared error los s function. The goal i s to f i n d an estimator A of A such that P . 2 P . X 2 E, Z (A.-A.) < E, Z (A. - — ) for a l l A A . . . i i — A . , i n . i = l i = l I 114 with s t r i c t i n e q u a l i t y f o r some A. Observe that P X p E. Z (— - X.y = E, Z ±r (X. - n.A.r A . - n . i A .... 2 1 i i i = l l i = l n. l and that X. ^ Poisson (n.A.), i = l , . . . , p . We are thus led to consider-I i i at i o n of the following problem. Problem 8.1.2. independent Suppose X_^ ^ Poisson (A^) , i = l , . . . , p , and that x^ i s an observation of X_^ , i = l , . . . , p . I f possible, f i n d an estimator A of A. such that A dominates the MLE X = (X^,...,X ) uniformly under the c P P " 2 generalized squared error l o s s function L = Z c.(A.-A.) , where c. > 0, a 1 1 1 1 1 i = l i = 1,... , p. Solution of Problem 8.1.2 w i l l automatically provide an answer to Question 8.1.1. In the next subsection we s h a l l show that a s o l u t i o n to Problem 8.1.2 e x i s t s provided c e r t a i n conditions on the c^'s hold. 8.2 Estimators Under Generalized Squared Error Loss independent Let X. ^ Poisson (A.), i = l , . . . , p and X = (X,,...,X ). l l r 1 p The estimators A of A we consider here are again of the form X + f(X) where f(X) i s as described i n Section 2.1 and s a t i s f i e s the following conditions: (1) f^(x) = 0 i f x has a negative coordinate (2) Ex |f.(X+je.)| < - for j = 0, 1. The r e s u l t s i n t h i s subsection are very s i m i l a r to those i n Section 6, as are the derivations. The next lemma, which i s s i m i l a r to Lemma 2.2.2, gives the d e t e r i o r a t i o n i n r i s k of .A = X + f(X) as compared to the r i s k of the MLE X. 115 Lemma .:8.:2.1. c " P " 2 Under the loss function L (A,A) = Z c.(A.-A.)", the deterioration, i n 1=1 r i s k D of A = X + f(X) as compared to the r i s k of X i s A, D = R(A,A) - R(A,X) = E. A where A P 2 p A = Z c.f.(X) + 2 Z c.X.ff.CX) - f . ( X - e . ) ] . . .. i i i l l . . l l 1=1 i = l The following theorem gives estimators A of A dominating X under L C . Theorem 8.2.2. Let X_^ Poisson (A^), i = l , . . . , p and X = (X^,.. . ,X^) independent (p > 3). Define f.(x) = - — (. Z ,„ /cT - 2v/c*") ,h(x.)/S i f x. > 0 i v v j : x fO 2 + i i -i 3 = 0 i f x. < 0, l i ,=! 1,... ,p, where P (1) c* = Max{c.} i = l 1 j 1 (2) h(j) = Z ^ i f j > 1 n=l = 0 i f j < 0 P 2 (3) S = Z h (x.). i = l 1 Let f(X) = ( f . ( X ) , . . . , f (X)). Then the estimator A = X + f(X) 1 p c dominates X uniformly i n A under the l o s s function L provided Z /cT > 2/c*. i = l J 116 Proof: The proof i s very s i m i l a r to that of Theorem 6.3.8. I t can be shown that the A given i n Lemma 8.2.1 i s l e s s than or equal to -( E /c7 - 2/c*)2_/S and hence R(A,A) <_ R(A,X) for a l l A. j:x#> 2 3 Q.E.D. If we i n t e r p r e t c. as the cost of misestimation of A. per squared P ••/— r -unit, of length, then the condition E /c^ > 2/c* can be interpreted as meaning that no cost per un i t length of a p a r t i c u l a r A^ can be greater than the t o t a l cost per unit length of the remaining A_^'s. I n t u i t i v e l y , we would not expect that an improvement over X can be made i f there i s a p a r t i c u l a r cost /c7 (per u n i t length) dominating the r e s t of the costs, since we know that i n the one-dimensional case (p=l), X i s P admissible under squared error l o s s . Hence the condition E /cT > l/c* i = l 1 i s i n t u i t i v e l y reasonable. The estimator A given i n Theorem 8.2.2 i s not the only s o l u t i o n to problem 8.1.2. In f a c t , solutions s i m i l a r to those given i n Section 6 can be obtained. For example, we may suppose the function h s a t i s f i e s Lemma 6.2.2 and obtain theorems s i m i l a r to Theorems 6.3.8, 6.3.12, 6.3.13, etc. However, we w i l l not carry out the d e t a i l s here. 2 Let c^ = 1/ru. The following c o r o l l a r y suggests an answer to Question 8.1.1. Corol l a r y 8.2.3. independent Let X „ ^ Poisson (A_^ ) where >_ 1, i = l , . . . 5 p , and p >_ 3. n. Let X. = E X.., i = 1 , — , p . Define A = (A.,...,A ) as 1 j = i !J 1 p 117 X. ( S I _2_) j I X. vj:X.^O n-. n '+ ^ n i n. X. ' 1 1»*-*»P 1 P J n E- < Z ? j = l n=l where (1) n. = Min{n.} x 1=1 0 i (2) F, = 0. n=l P 1 2 Suppose £ — > — . (8.2.4) 1=1 n i ' n * X± X Then the estimator A dominates the MLE = (— ,. .., —P-) under the squared n, n 1 P ^ P ^ 2 error l o s s function L(X, X) = £ (X. - X.) . 1=1 Remarks: (1) Condition (8.2.4) guarantees that the proposed estimator X i s d i f f e r e n t from the MLE with p o s i t i v e p r o b a b i l i t y . (2) When n^ = 1 for a l l i , the estimator X i s Peng's estimator A(0) • A (cf. Theorem 2.2.3). Moreover, condition (8.2.4) holds automatically i f p > 3. th (3) Suppose the i A population has sample s i z e n^. Condition (8.2.4) says that improvement over the MLE i s possible under squared error loss i f the sample sizes n^ of the other populations are not too' large. 8.3 Estimators Under Generalized k-NSEL Answers to Question 8.1.1 are possible under other los s functions as w e l l . The techniques are much the same as those used above, i . e . 118 Question 8.1.1 i s transformed into a problem s i m i l a r to 8.1.2, with a d i f f e r e n t loss function, and analysis s i m i l a r to that i n a previous section i s employed to derive our r e s u l t s . We w i l l therefore for the most part merely state the r e s u l t s . The following lemma i s s i m i l a r to Lemma 2.2.7. Lemma 8.3.1. independent Let X. 'v. Poisson (A.), i = l , . . . , p and l e t f . : J P -> R, l I I i = l , . . . , p s a t i s f y the conditions given i n Lemma 2.2.6. Define A = X + f ( X ) . Then, under the loss function c p - 2 k ::, L. (A,A) = F. c.(A. - A.) /A. , (8.3.2) i = l the d e t e r i o r a t i o n i n .risk" of : A as compared to X i s R(A,A) - R(A,X) = A, where A k p f.(X+ke.) p f.(X+ke.) - f.(X+(k-l)e.) A, = E c. -± + 2 Z c, (X 4+k) 1 1 1 i k 1=1 1 (x. +k) ( k ) i = l 1 1 (x.+k) ( k ) Proof: The proof i s v i r t u a l l y the same as that of Lemma 2.2.7. The next theorem supplies estimators that dominate the MLE under the loss function (8.3.2). Theorem 8.3.3. Let X^ Poisson (A^), i = l , . . . , p , and the loss function independent a. be as given i n (8.3.2). Define f.(x) = -k(p-i)/cT7F7 x f k ) / ( s i + x f k ) ) i f x. > 0 i l l l = 0 i f x. < 0 l 119 i = 1,...,p, where (1) = Min{c.} 1=1 1 (2) S 1 = E ( x . + k ) ( k ) j * i 3 (3) x ? k ) = x.(x.-l)(x.-2)--.(x.-k+l). 1 1 1 X X Let f(X) = (f (X),...,f (X)) and A = X + f ( X ) . Then the estimator A of J_ p A dominates the MLE X uniformly i n A under the loss function (8.3.2). Proof: Using s i m i l a r techniques as i n the proof of Theorem 4.3.1, we can show that given i n Lemma 8.3.1 above s a t i s f i e s 2 2 c ^ k / ( p - l ) Z A < < 0. E ( x . + k ) W i = l r Q.E.D. The following c o r o l l a r i e s provide other answers to Question 8.1.1. Coroll a r y 8.3.4. independent Let X ^ Poisson ( A ^ , i = l , . . . , p , j = 1,...,^. Let n. X. = E X.., i = 1,...,p. Define A = (A ,...,A ) by 1 j = i ^ 1 p X = h. _ k(P-D i n. n. x x ((k-2)/2) n n . x ^ k ) / ( s 1 + x ^ k ) ) i2o; * P. i = l , . . . , p , where n = Max{n.}. Then A dominates the MLE i = l 1 X. X *• P A 9 lr ( ) under k-NSEL L (A,A) = Z (A. - A.) /A. with k = 1 or 2, n- n k . , l I x 1 p i = l k-2 * k-2 Proof: Use Theorem 8.3.3 with c. = n. and note that c. = (n ) I X * i f k = 1 or 2. C o r o l l a r y 8.3.5. independent Let X „ ^ Poisson (A^), i = l , . . . , p , j = l , . . . , n ^ . Let n. X. = Z X.., i = l , . . . , p . Define A = (A.,...,A ) by 1 i i 1 p \ , h _ k(p-D i n. n. x x n. x ((k-2)/2) x ^ / C s V x f 0 ) , . p X X i = l , . . . , p , where n,. = Min{n.}. Then A dominates the MLE (—\ .. .,—P-) . -, n n 1=1 1 p under k-NSEL with k > 3. ,k-2 , ., vk-2 X ' X' Proof: Observe that c. = (n.) and = (n A) i f k 13, There are, of course, other estimators dominating the MLE under P - 2 k the los s function L(A,A) = Z c.(A. - A.) /A.. The r e s u l t s w i l l be i = l s i m i l a r to those derived i n Section 4, and we s h a l l therefore not set down the d e t a i l s here. 8.4 An A p p l i c a t i o n There are other s i t u a t i o n s i n which the r e s u l t s of Theorem 8.2.2 and 8.3.3 might be u s e f u l . One such s i t u a t i o n i s described as follows: Let P ^ ( A ^ ) , 1 = 1> •••»?» be p independent Poisson processes with 121 i n t e n s i t y parameters A . Let be the number of counts of the process observed during the period of time ( 0 , t . ) , i = l , . . . , p . The MLE of ' • X X 1 A = (A-,..., A ) i s (—,••• ,—2-) • The r e s u l t s i n subsection 8.2 and 8.3 P 1 p show that there are other estimators A of A dominating the MLE under squared error l o s s or under k-NSEL, provided the t^'s s a t i s f y c e r t a i n conditions. In f a c t , the estimators described i n C o r o l l a r i e s 8.2.3, 8.3.4, and 8.3.5 can be employed here with n^ replaced by t ^ and n. = t. = Minit.} and n = t = Max{t.}. * * i i 122 SECTION 9. SUGGESTIONS FOR FURTHER RESEARCH In Section 6, we gave an empirical Bayes i n t e r p r e t a t i o n f o r the estimators derived under the normalized squared error los s L^. I t i s therefore natural to seek a s i m i l a r i n t e r p r e t a t i o n f o r our new estimators ~k A i n Section 4 which are derived under k-NSEL L, , with k > 2. Moreover, k — since some of the estimators under are a c t u a l l y Bayes estimators, one "k might attempt to check i f some of the estimators of A are Bayes estimators under some appropriate p r i o r d i s t r i b u t i o n s . I f such p r i o r d i s t r i b u t i o n s can be found, then we have a clear understanding of when ~k we should use those estimators A . In the squared error los s case, Peng [1975] has already suggested " (0) that some i n s i g h t s about h i s estimators A (see Theorem 2.2.3) might be obtained from an empirical Bayes i n t e r p r e t a t i o n for the estimators. Up to t h i s point, no successful attempt has been made. " (k) Though the estimators A suggested i n Section 3 dominate the MLE under squared error los s when p > 3, they are not admissible. I t might be of i n t e r e s t to see i f there are admissible estimators which also dominate the MLE. 123 BIBLIOGRAPHY 1. Alam, K. [1973], "A Family of Admissible Minimax Estimators of the Mean of a M u l t i v a r i a t e Normal D i s t r i b u t i o n , " Annals of S t a t i s t i c s , 1, 517-525. 2. Alam, K. [1975], "Minimax and Admissible Minimax Estimators of the Mean of a M u l t i v a r i a t e Normal D i s t r i b u t i o n f o r Unknown Covariance Matrix," Journal of M u l t i v a r i a t e Analysis, 5, 83-95. 3. Abramowitz, M., and Stegun, I. [1964], Handbook of Mathematical Functions, National Bureau of Standards, Applied Mathematics Series, 55. 4. Baranchik, A.K. [1964], "Multiple Regression and Estimation of the Mean of a M u l t i v a r i a t e Normal D i s t r i b u t i o n , " Technical Report No. 51, Stanford U n i v e r s i t y . 5. Baranchik, A.K. [1970], "A Family of Minimax Estimators of the Mean of a M u l t i v a r i a t e Normal D i s t r i b u t i o n , " Annals of Mathematical S t a t i s t i c s , 41, 642-645. 6. Berger, J.O. [1976a], "Minimax Estimation of a M u l t i v a r i a t e Normal Mean Under A r b i t r a r y Quadratic Loss," Journal of M u l t i v a r i a t e Anal- y s i s , 6, 256-264. 7. Berger, J.O. [1976b], "Admissible Mimimax Estimation of a M u l t i v a r i a t e Normal Mean with A r b i t r a r y Quadratic Loss," Annals of S t a t i s t i c s , 4, 223-226. 8. Berger, J.O. [1976c], "A Robust Generalized Bayes Estimator of a M u l t i v a r i a t e Normal Mean," Mimeograph Series No. 480, Department of S t a t i s t i c s , D i v i s i o n of Mathematical Sciences, Purdue Un i v e r s i t y . 9. Berger, J.O.,- - and Bock, M.E. [1976], "Combining Independent Normal Mean Estimation Problems with Unknown Variances," Annals-of S t a t i s t i c s , 4, 642-648. 10. Berger, J.O., Bock, M.E., Brown, L.D., C a s e l l a , G., and Gleser, L. [1977], "Minimax Estimation of a Normal Mean Vector f o r A r b i t r a r y Quadratic Loss and Unknown Covariance Matrix," Annals of S t a t i s t i c s , 5, 763-771. 11. Brown, L. [1968], " I n a d m i s s i b i l i t y of the Usual Estimators of Scale Parameters i n Problems with Unknown Location and Scale Parameter," Annals of Mathematical S t a t i s t i c s , 39, 29-48. 12. Clevenson, L., and Zidek, J.V. [1975], "Simultaneous Estimation of the Means of Independent Poisson Laws," Journal of the American S t a t i s t i c a l Association, 70, 698-705. 124 13. de F i n e t t i , B. [1964], "Foresight: I t s L o g i c a l Laws, Its Sub-j e c t i v e Sources," i n Studies i n Subjective P r o b a b i l i t y (H. E. Kyburg, J r . and H. E. Smokier, eds.), pp. 93-158, New York: Wiley. 14. DeGroot, M.H. [1970], Optimal S t a t i s t i c a l Decisions, San Francisco: McGraw-Hill. 15. Efron, B., and Morris, C. [1973], "Stein's Estimation Rule and i t s Competitors—an Empirical Bayes Approach," Journal of the American S t a t i s t i c a l Association, 68, 117-130. 16. Efron, B., and Morris, C. [1976], "Families of Minimax Estimators of the Mean of a M u l t i v a r i a t e Normal D i s t r i b u t i o n , " Annals of S t a t i s t i c s , 4, 11-21. 17. Haff, L. R. [1976], "Minimax Estimators of the M u l t i v a r i a t e Mean: Autoregressive P r i o r s , " Journal of M u l t i v a r i a t e Analysis, 6, 265-280. 18. Haff, L. R. [1977], "Minimax Estimation of the Multinormal Mean with A r b i t r a r y Quadratic Loss: An A p p l i c a t i o n of Gauss' Theorem," to appear i n Annals of S t a t i s t i c s . 19. Hodges, J . L., and Lehmann, E. L. [1951], "Some Applications of the Cramer-Rao Inequality," i n Proceedings of the Second Berkeley Symposium, pp. 13-22, Berkeley: Un i v e r s i t y of C a l i f o r n i a Press. 20. Hudson, M. [1974], "Empirical Bayes Estimation," Technical Report No. 58, Department of S t a t i s t i c s , Stanford U n i v e r s i t y . 21. Hudson, M. [1977], "A Natural Identity for Exponential Families with Applications i n Multiparameter Estimation," Research Paper No. 138, School of Economic and F i n a n c i a l Studies, Macquarie U n i v e r s i t y . 22. James, W., and Stein, C. [1961], "Estimation with Quadratic Loss," i n Proceedings of the Fourth Berkeley Symposium, pp. 361-379, Berkeley: U n i v e r s i t y of C a l i f o r n i a Press. 23. Johnson, B. M. [1971], "On the Admissible Estimators for Certain Fixed Sample Binomial Problems," Annals of Mathematical S t a t i s t i c s , 42, 1579-1587. 24. Leonard, T. [1972], "Bayesian Methods f o r Discrete Data," American College Testing Technical B u l l e t i n No. 10. 25. Leonard, T. [1976], "Some A l t e r n a t i v e Approaches to Multiparameter Estimation," Biometrika, 63, 69-75. 125 26. L i n , P., and Ts a i , H. [1973], "Generalized Bayes Minimax Estimators of the M u l t i v a r i a t e Normal Mean with Unknown Covariance Matrix," Annals of S t a t i s t i c s , 1, 142-145. 27. Peng, J. [1975], "Simultaneous Estimation of the Parameters of Independent Poisson D i s t r i b u t i o n s , " Technical Report No. 78, Department of S t a t i s t i c s , Stanford University. 28. R a i n v i l l e , E. D. [1960], Special Functions, New York: MacMillan'. 29. Stein, C. [1956], " I n a d m i s s i b i l i t y of the Usual Estimator f o r the Mean of a M u l t i v a r i a t e Normal D i s t r i b u t i o n , " i n Proceedings of the Third Berkeley Symposium, pp. 197-206, Berkeley: Un i v e r s i t y of C a l i f o r n i a Press. 30. Stein, C. [1974], "Estimation of the Parameters of a M u l t i v a r i a t e Normal D i s t r i b u t i o n I: Estimation of the Means," Technical Report No. 63, Department of S t a t i s t i c s , Stanford U n i v e r s i t y . 31. Strawderman, W. [1971], "Proper Bayes Minimax Estimators of the Mu l t i v a r i a t e Normal Mean," Annals of Mathematical S t a t i s t i c s , 42, 385-388. 32. Strawderman, W. [1973], "Proper Bayes Minimax Estimators of the Mu l t i v a r i a t e Normal Mean Vector f o r the Case of Common Unknown Variances," Annals of S t a t i s t i c s , 1, 1189-1194.
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Simultaneous estimation of the parameters of the distributions...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Simultaneous estimation of the parameters of the distributions of independent Poisson random variables Tsui, Kam-Wah 1978
pdf
Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
Page Metadata
Item Metadata
Title | Simultaneous estimation of the parameters of the distributions of independent Poisson random variables |
Creator |
Tsui, Kam-Wah |
Publisher | University of British Columbia |
Date Issued | 1978 |
Description | This work is devoted to simultaneously estimating the parameters of the distributions of several independent Poisson random variables. In particular, we explore the possibility of finding estimators of the Poisson parameters which have better performance than the maximum likelihood estimator (MLE). We first approach the problem from a fre-quentist point of view, employing a generally scaled loss function, called the k-normalized squared error loss function L[sub k] (λ, [sup ^]λ) = [sup P] ∑ [sub i=1] (λ[sub i] – [sup ^]λ[sub i])²/ λ[sup k][sub i], where k is a non-negative integer. The case k=0' is the squared error loss case, in which we propose a large class of estimators including those proposed by Peng [1975] as special cases. Estimators pulling the MLE towards a point other than zero as well as a point determined by the data itself are proposed, and it is shown that these estimators dominate the MLE uniformly. Under L[sub k] with k ≥ 1, we obtain a class of estimators dominating the MLE which includes the estimators proposed by Clevenson and Zidek [1975]. We next approach the problem from a Bayesian point, of view; a two-stage prior distribution is adopted and results for a large class of prior distributions are derived. Substantial savings in terms of mean squared error loss of the Bayes point estimators over the MLE are expected, especially when the Poisson parameters fall into a relatively narrow range. An empirical Bayes approach to the problem is carried out along the line suggested by Clevenson and Zidek [1975]. Some results are obtained which parallel those of Efron and Morris [1973], who work under the assumption that the random variables are normally distributed. We report the results of our computer simulation to quantitatively examine the performance of some of our proposed estimators. In most cases, the savings, under the appropriate loss functions, are an increasing function of the number of Poisson parameters. The simulation results indicate that our estimators are very promising. The savings of the Bayes estimators depend on the choice of prior hyperparameters, and hence proper choice leads to substantial improvement over the MLE. Although most of the results in this work are derived under the assumption that only one observation is taken from each Poisson distribution, we extend some results to the case where possibly more than one observation is taken. We conclude with suggestions for further work. |
Subject |
Poisson distribution |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-03-02 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0080162 |
URI | http://hdl.handle.net/2429/21302 |
Degree |
Doctor of Philosophy - PhD |
Program |
Mathematics |
Affiliation |
Science, Faculty of Mathematics, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-UBC_1978_A1 T85.pdf [ 5.6MB ]
- Metadata
- JSON: 831-1.0080162.json
- JSON-LD: 831-1.0080162-ld.json
- RDF/XML (Pretty): 831-1.0080162-rdf.xml
- RDF/JSON: 831-1.0080162-rdf.json
- Turtle: 831-1.0080162-turtle.txt
- N-Triples: 831-1.0080162-rdf-ntriples.txt
- Original Record: 831-1.0080162-source.json
- Full Text
- 831-1.0080162-fulltext.txt
- Citation
- 831-1.0080162.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
data-media="{[{embed.selectedMedia}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0080162/manifest