THE RATIO, MEAN-OF-THE-RATIOS AND HORVITZ-THOMPSON ESTIMATORS UNDER THE CONTINUOUS VARIABLE MODEL

BY

ANTHONY ALIFA CHAMWALI
B.Sc., University of East Africa - The University College, Dar-es-Salaam, 1970

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE (BUS. ADMIN.)
in the Faculty of Commerce and Business Administration

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
April, 1974

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Commerce and Business Administration
The University of British Columbia
Vancouver 8, Canada

Date: April, 1974

ABSTRACT

This study investigates the performance of the ratio estimator, the mean-of-the-ratios estimator and the Horvitz-Thompson (HT) estimator under the continuous variable model of Cassel and Sarndal (1972a, 1972b, 1973).
Under this model, the character, Y, which is of interest to the investigator is assumed to be related to an auxiliary variable, X, by

\[ Y(X_i) = \theta\,(X_i + Z(X_i)) \]

where

\[ \mathcal{E}(Z_i \mid X_i) = 0 \quad \forall X_i \in (0, \infty), \qquad \mathcal{E}(Z_i^2 \mid X_i) = \sigma^2(X_i) = k^2 X_i^{g}, \qquad \mathcal{E}(Z_i Z_j \mid X_i, X_j) = 0 \quad (i \neq j). \]

It is assumed, in this paper, that X is gamma distributed over $(0, \infty)$ with parameter r. The mean of Y is to be estimated under the additional assumption that the design function, P(x), is either 1) polynomial or 2) exponential, i.e.

\[ 1)\ P(X) = \frac{\Gamma(r)}{\Gamma(m+r)}\, X^{m}, \qquad 2)\ P(X) = (1-c)^{r} e^{cX}. \]

It is observed that for g = 0 or 1, the ratio estimator performs better than the other two. For g = 0, 1 or 2, and for a wider range of values of m or c, the mean-of-the-ratios estimator performs better than the HT estimator. When P(X) is polynomial, the HT estimator is most efficient if the sampling design is approximately pps. The results compare well with those of other researchers under similar assumptions.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
1 INTRODUCTION
    Classical Sampling Theory
    The Sample Survey Theory
    Search for Optimal Estimators and Sampling Designs
    The Ratio, Mean-of-the-ratios and HT Estimators
2 THE PROBLEM
    Statement of the Problem
    Method Used and Problems of Evaluation
3 THE RESULTS
    Results for P(x) Polynomial and k = 0.5
    Results for P(x) Polynomial and k = 1.0
    Results for P(x) Exponential and k = 0.5
    Results for P(x) Exponential and k = 1.0
    Summary of Results and Some Conclusions
4 DISCUSSION OF RESULTS
    Some Empirical Comparisons
    Comments and Some Implications
    Some Limitations
    Some Recommendations
REFERENCES

LIST OF TABLES

I     Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 0.5, g = 0
II    Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 0.5, g = 1
III   Variances of $\hat{\mu}_{YHT}$ and $\hat{\mu}_{Ymr}$ when X is Gamma Distributed, P(x) is Polynomial, k = 0.5, g = 2
IV    Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 1, g = 0
V     Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 1, g = 1
VI    Var($\hat{\mu}_{YHT}$) and Var($\hat{\mu}_{Ymr}$) when X is Gamma Distributed, P(x) is Polynomial, k = 1, g = 2
VII   Variances of the Estimators when X is Gamma Distributed, P(x) is Exponential, k = 0.5, g = 0
VIII  Variances of the Estimators when X is Gamma Distributed, P(x) is Exponential, k = 0.5, g = 1
IX    Var($\hat{\mu}_{YHT}$) and Var($\hat{\mu}_{Ymr}$) when X is Gamma Distributed, P(x) is Exponential, k = 0.5, g = 2
X     Variances of the Estimators when X is Gamma Distributed, P(x) is Exponential, k = 1, g = 0
XI    Variances of the Estimators when X is Gamma Distributed, P(x) is Exponential, k = 1, g = 1
XII   Var($\hat{\mu}_{YHT}$) and Var($\hat{\mu}_{Ymr}$) when X is Gamma Distributed, P(x) is Exponential, k = 1, g = 2

LIST OF FIGURES

1   Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 0.5, g = 0
2   Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 0.5, g = 1
3   Variances of $\hat{\mu}_{YHT}$ and $\hat{\mu}_{Ymr}$ when X is Gamma Distributed, P(x) is Polynomial, k = 0.5, g = 2
4   Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 1, g = 0
5   Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 1, g = 1
6   Var($\hat{\mu}_{YHT}$) and Var($\hat{\mu}_{Ymr}$) when X is Gamma Distributed, P(x) is Polynomial, k = 1, g = 2
7   Variances of the Estimators when X is Gamma Distributed, P(x) is Exponential, k = 0.5, g = 0
8   Variances of the Estimators when X is Gamma Distributed, P(x) is Exponential, k = 0.5, g = 1
9   Var($\hat{\mu}_{YHT}$) and Var($\hat{\mu}_{Ymr}$) when X is Gamma Distributed, P(x) is Exponential, k = 0.5, g = 2
10  Variances of the Estimators when X is Gamma Distributed, P(x) is Exponential, k = 1, g = 0
11  Variances of the Estimators when X is Gamma Distributed, P(x) is Exponential, k = 1, g = 1
12  Var($\hat{\mu}_{YHT}$) and Var($\hat{\mu}_{Ymr}$) when X is Gamma Distributed, P(x) is Exponential, k = 1, g = 2

ACKNOWLEDGEMENTS

I would like to thank Dr. C. E. Sarndal for his invaluable advice and help offered throughout this study. I would also like to extend my thanks to the Canadian Government for sponsoring my education here in Canada. Lastly, my appreciation to the Tanzanian Government and the Principal of the Institute of Development Management, Tanzania, who, in one way or another, arranged for my study.

CHAPTER I

INTRODUCTION

Recently, much emphasis has been placed on the distinction between General Statistical Theory and Sample Survey Theory. It has been felt that the two theories should either be reconciled or be treated differently.

Classical Sampling Theory

In General Statistical Theory, Sampling Theory involves making inferences from observed sample values to an infinite hypothetical population. The sample values are assumed to have been drawn randomly from a smooth density function defined on the units of the hypothetical population. Godambe (1969, 1970) suggests that the tendency to associate theoretical statistical sampling with an infinite hypothetical population that has a smooth density function is likely due to the way General Statistical Theory developed. The early statisticians were mainly interested in biological and sociological phenomena like inheritance.
They assumed that some chance mechanism operated behind these phenomena, and that the chance mechanism uniquely determined the frequency function of the characteristic under study in the hypothetical population.

When discussing Sample Survey Theory, some optimality properties of estimators from General Statistical Theory are sometimes encountered. The following are some of these.

Let $Y_1, Y_2, \ldots, Y_n$ be a sample from a population with density function $f(Y, \theta)$. Let $\hat{\theta} = d(Y_1, Y_2, \ldots, Y_n)$ be the estimator of the parameter $\theta$. Since $\hat{\theta}$ is a function of the observed sample values, it is a random variable. Let $\theta^* = d^*(Y_1, \ldots, Y_n)$ be any other statistic which is not a function of $\hat{\theta}$.

The estimator $\hat{\theta}$ is sufficient for $\theta$ if, for each $\theta^*$, the conditional density of $\theta^*$ given $\hat{\theta}$, $P(\theta^* \mid \hat{\theta})$, does not contain $\theta$. A sufficient statistic, if it exists, contains all the information about the parameter to be estimated.

The estimator $\hat{\theta}$ is an unbiased estimator of $\theta$ if its expected value, $E(\hat{\theta})$, equals $\theta$.

$\hat{\theta}$ is a Uniformly Minimum Variance Unbiased Estimator (UMVUE) of $\theta$ if $\hat{\theta}$ is an unbiased estimator of $\theta$ and its variance is less than or equal to the variance of any other possible unbiased estimator, $\hat{\theta}_p$, of $\theta$, i.e.

\[ \mathrm{Var}(\hat{\theta}) \le \mathrm{Var}(\hat{\theta}_p) \quad \text{for all } \hat{\theta}_p. \]

Let $\hat{\theta}_n = d_n(Y_1, \ldots, Y_n)$ be an estimator of $\theta$ based on a sample of size n. The sequence $\{\hat{\theta}_n\}$ is a consistent estimator of $\theta$ if, for every $\varepsilon > 0$,

\[ \lim_{n \to \infty} P(\theta - \varepsilon < \hat{\theta}_n < \theta + \varepsilon) = 1 \quad \forall \theta, \]

i.e., $\hat{\theta}_n$ approaches $\theta$ in probability as n gets large.

The sequence $\{\hat{\theta}_n\}$ is a Best Asymptotically Normal (BAN) Estimator of $\theta$ if

(a) the distribution of $\sqrt{n}\,(\hat{\theta}_n - \theta)$ approaches the normal distribution with mean 0 and variance $\sigma^2(\theta)$ as n approaches infinity;

(b) for every $\varepsilon > 0$, $\lim_{n \to \infty} P\left[\,|\hat{\theta}_n - \theta| > \varepsilon\,\right] = 0 \quad \forall \theta$;

(c) there is no other sequence of consistent estimators $\theta^*_1, \theta^*_2, \ldots, \theta^*_n, \ldots$ for which the distribution of $\sqrt{n}\,(\theta^*_n - \theta)$ approaches the normal distribution with mean 0 and variance $\sigma^{*2}(\theta)$ and such that

\[ \frac{\sigma^2(\theta)}{\sigma^{*2}(\theta)} > 1 \]

for all $\theta$ in some open interval.

If the unbiased $\hat{\theta} = d(Y_1, Y_2, \ldots, Y_n)$ is a linear function of Y and if $\mathrm{Var}(\hat{\theta}) \le \mathrm{Var}(\hat{\theta}_p)$ for all linear unbiased $\hat{\theta}_p$, then $\hat{\theta}$ is a Uniformly Best Linear Unbiased Estimator (UBLUE) of $\theta$.

It may be worthwhile noting that while general statistical theory aims at making inferences about some frequency function of a hypothetical population, it may actually be inferring about the possible chance mechanism (discussed above) that produced the given set of observations, which may have nothing to do with the hypothetical population at all.

The Sample Survey Theory

In Sample Survey Theory, the populations considered are finite. They consist of elements which are real and countable. They are not hypothetical like the ones considered in the General Statistical Theory. This is the main difference between the populations dealt with by the two sampling theories. In this sense the problems of inference in the two models of sampling theory may be taken to be different.

In survey sampling, since the investigator deals with real and finite populations, he has a choice over the sampling design and the estimator he may want to use. Occasionally, circumstances may necessitate the use of nonprobability sampling altogether.
Cochran (1963) gives the following situations that lead to nonprobability sampling:

(1) When the sample is restricted to a part of the population that is readily accessible.
(2) When the sample is selected haphazardly.
(3) With a small but heterogeneous population, the sampler inspects the whole of it and selects a small sample of 'typical' units, i.e., units that are close to his impression of the average of the population. This method is sometimes called judgement or purposive selection.
(4) Lastly, when the sample consists essentially of volunteers.

He notes that these methods may give good results under the right conditions although they are not amenable to the development of a sampling theory owing to lack of random selection.

But whether one deals with Sample Survey Theory or General Statistical Theory, one is dealing with the same problem of inferring from the sample to the population, usually with the help of Probability Theory and Statistics. In the General Statistical Theory, random sampling solved nearly all the sampling design problems. With survey populations, things are not so easy. The nature of the population may make random sampling difficult; it may not be possible to identify all the units in the population. In most cases, the frequency function of the character under consideration is unknown. This makes it difficult to check some of the optimality conditions of estimators in the General Statistical Theory. And optimal estimators under the General Statistical Theory need not be so under the Sample Survey Theory.
This raises the main problem of finding sampling designs and estimators that are optimal for the sample survey situation.

Mathematically, in the sample survey model, we have a set P of N units that constitute the population,

\[ P = (U_1, U_2, \ldots, U_N) \]

where $U_i$ stands for unit i. An unknown quantity $Y_i$ which is of interest to the surveyor is associated with $U_i$. The surveyor wants to know

\[ \theta = \theta(Y_1, Y_2, \ldots, Y_N). \]

He may also know the auxiliary variable X, i.e.

\[ X = (X_1, X_2, \ldots, X_N) \]

associated with P. He selects a subset

\[ s = (u_1, u_2, \ldots, u_n) \]

of units of P and observes the corresponding

\[ Y = (Y_1, Y_2, \ldots, Y_n) \]

(response errors are ignored). The sampling plan generates s. He then calculates

\[ \hat{\theta} = \hat{\theta}(Y_1, Y_2, \ldots, Y_n) \]

as an estimator of $\theta$. The problem is how to select s (how to assign the probability that unit i of P will be included in s) and how to choose the random variable $\hat{\theta}$ to get an estimate which is as near $\theta$ as possible. Certainly, it is not just a matter of getting a 'representative' sample and calculating the mean, variance and median of the sample values. Besides, what is a representative sample?

Search for Optimal Estimators and Sampling Designs

In 1934 Neyman introduced the Gauss-Markov theorem to obtain a linear unbiased minimum variance estimate for the mean of a survey population. He established that, for simple random sampling, the sample mean was the minimum variance linear unbiased estimate of the survey population mean. This was an attempt to fit survey sampling into the hypothetical population model, and his findings stimulated most sample-survey statisticians to find, with the help of the Gauss-Markov theorem, efficient (i.e. minimum variance) unbiased estimators for a variety of more complex designs.
Sampling procedures like sampling with arbitrary probabilities, stratified sampling, cluster sampling, 2-stage sampling, multi-stage sampling, etc. were designed to reduce the variance of the estimators (see, for example, Goodman and Kish (1950), Durbin (1953), Hartley and Rao (1962), Cochran (1963)).

Horvitz and Thompson (1952) attempted to provide a general method for dealing with sampling without replacement from a finite population when variable probabilities of selection are applied to the elements remaining prior to each draw. They, in particular, considered a general estimator of the population total of the form

\[ \hat{Y} = \sum_{i=1}^{n} \beta_i Y_i \]

where $\beta_i$ (i = 1, ..., N) is a constant to be used as a weight for the i-th unit whenever it is selected for the sample. Letting $P(X_i)$ be the probability that the i-th element will be included in a sample of size n, they showed that $\beta_i = 1/P(X_i)$ makes $\hat{Y}$ unbiased and of minimum variance ($X_i$ is the value of the auxiliary variable associated with unit i).

Godambe (1955, 1965) considered the general estimator

\[ \hat{Y} = \sum_{i=1}^{n} b_{si} Y_i \]

where $b_{si}$ is defined in advance for all the $\binom{N}{n}$ logically possible samples s, and for all i in s. This is a more general class of estimators than the ones considered by Horvitz and Thompson. He proved the non-existence of a uniformly minimum variance (UMV) unbiased estimator in this class of estimators for any design P(s), excepting those in which no two samples s with P(s) > 0 have at least one common and one uncommon unit.

Because of the non-existence of a UMV unbiased estimator, other criteria of goodness of an estimator were sought.
Godambe and Joshi (1965) considered the criterion of admissibility and proved that the Horvitz-Thompson (HT) estimator is admissible in the class of all p-unbiased estimators of Y, the population total, for any design P(s) such that

\[ \pi_j = \sum_{s \ni u_j} P(s) > 0 \quad \forall u_j. \]

The admissibility criterion, however, was satisfied by many other estimators, and other new criteria like 'hyper-admissibility' and 'necessary bestness' (Basu, 1971) were introduced.

A detailed examination of these optimality properties raises some questions regarding their relevance. For example, the Horvitz and Thompson estimator, $\hat{Y}_{HT}$, is uniquely 'hyper-admissible' whatever the character under investigation or the sample design. But $\hat{Y}_{HT}$ may lead to disastrous results (Sarndal, 1972; Basu, 1971).

Many variations of the regression type population model

\[ y_i = a x_i + Z_i, \qquad i = 1, \ldots, N \]

have been used, where $y_i$ is the character of interest and $x_i$ is an auxiliary variable for unit i in the population; $Z_i$ is an error component. Cochran (1946), Godambe (1955, 1965), Royall (1970, 1971) and many others used the super-population approach in conjunction with the regression model to compare efficiencies between sampling methods. Cassel and Sarndal (1972a, 1972b, 1973) use the continuous variable model of the form

\[ Y(X) = \theta\,(X + Z(X)) \quad \forall X \in (0, \infty) \]

where

\[ \mathcal{E}(Z(X) \mid X) = 0, \qquad \mathcal{E}(Z(X)^2 \mid X) = \sigma^2(X), \qquad \mathcal{E}(Z(X_1) Z(X_2) \mid X_1, X_2) = 0 \quad (X_1 \neq X_2) \]

and the auxiliary variable X is assumed to have a known distribution described by the distribution function F(x) over $(0, \infty)$.

Different notions of unbiasedness have been used during this search for optimal estimators and designs. An estimator, $\hat{Y}$ for Y, is called design-unbiased or p-unbiased if

\[ E_p(\hat{Y}) = \sum_{s \in S} P(s)\, \hat{Y} = Y \]

for all vectors $(Y_1, \ldots, Y_N)$, where P(s) is a probability function defined on the set S of subsets s of labels (1, ..., N). An estimator is called model-unbiased or $\xi$-unbiased if it is unbiased under the assumptions of the specified model. Lastly, an estimator $\hat{Y}$ is called $E_{p\xi}$-unbiased for Y if

\[ E_{p\xi}(\hat{Y}) = \sum_{s \in S} P(s)\, \mathcal{E}(\hat{Y}) = Y. \]

An estimator can be p-unbiased but not $\xi$-unbiased and vice versa.

The Ratio, Mean-of-the-Ratios and the HT Estimators

It is well established that using supplementary information in sampling designs and estimators can greatly improve the accuracy of the estimates. Three estimators that have received considerable attention in the literature of sample survey theory and which make use of supplementary information are the ratio estimator, the mean-of-the-ratios estimator and the Horvitz-Thompson (HT) estimator.

If X is an auxiliary variable, the ratio estimator for the population mean, $\mu_y$, is given by

\[ \hat{\mu}_{YR} = \frac{\bar{y}}{\bar{x}}\, \bar{X} = \frac{y}{x}\, \bar{X} \tag{1.1} \]

where

\[ y = \sum_{i=1}^{n} y_i, \quad x = \sum_{i=1}^{n} x_i, \quad \bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i, \quad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i, \quad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \]

n is the sample size and N is the population size. The mean-of-the-ratios estimator is

\[ \hat{\mu}_{Ymr} = \frac{\bar{X}}{n} \sum_{i=1}^{n} \frac{y_i}{x_i}. \tag{1.2} \]

The ratio estimators make use of the ratios $\bar{y}/\bar{x}$ and $y_i/x_i$ in order to improve estimation.

With simple random sampling from an infinite population, the ratio estimator of $\mu_y$ has been shown to be the best linear unbiased estimate if two conditions are satisfied (Cochran, 1963, p. 166, Theorem 6.4):

(1) The relation between $y_i$ and $x_i$ is a straight line through the origin.
(2) The variance of $y_i$ about this line is proportional to $x_i$.

When the variance of $y_i$ about this line is proportional to $x_i^2$, using the mean-of-the-ratios estimator gives much better performance than the other estimator.

When the relation between $y_i$ and $x_i$ is linear but the line does not go through the origin, the regression estimator

\[ \bar{y}_{lr} = \bar{y} + b\,(\bar{X} - \bar{x}) \]

performs much better than the others. $\bar{y}_{lr}$ reduces to $\bar{y}$ if b = 0 and to $\hat{\mu}_{YR}$ if $b = \bar{y}/\bar{x}$.

Both the ratio and regression estimators are consistent and, generally, slightly biased, but with sampling designs like stratified sampling, the bias of the ratio estimator may be considerable. This has led to a search for unbiased or better ratio-type estimators. J.N.K. Rao (1969) gives some results of the search. The results indicate that under certain conditions, other ratio-type estimators perform better than the traditional ratio estimator.

The Horvitz-Thompson estimator is given by

\[ \hat{\mu}_{YHT} = \frac{1}{n} \sum_{i=1}^{n} \frac{y_i}{P(x_i)} \tag{1.3} \]

where $P(x_i)$ is the probability that a unit with auxiliary variable $x_i$ will be included in the sample. The usual estimate of the variance of the HT estimator may be negative, and the variance itself may not reduce to zero even when all the Y-values are the same and the variance should actually be zero.

In many studies the HT estimator has been shown to compete very well with the ratio estimators. Assuming a linear stochastic model of the form

\[ Y_i = \alpha + \beta X_i + Z_i, \qquad \sigma_i^2 = \sigma^2 X_i^{2\gamma}, \]

Foreman and Brewer (1971) showed that the HT estimator is more efficient than the ratio estimator with equal selection probabilities if $\gamma > \tfrac{1}{2}$, but they caution that if the sampling fraction is large and $\alpha$ is appreciable, the ratio estimator may be made more efficient than the HT estimator. Rao (1967) gives some conditions under which the mean-of-the-ratios estimator is superior to both the ratio estimator and the HT estimator.
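The three estimators (1.1) to (1.3) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the known population mean of X is passed in as `x_bar_pop`, and the HT weight is taken to be the reciprocal of a user-supplied design function `design(x)`, which is an assumption about how P(x) is scaled.

```python
from typing import Callable, Sequence


def ratio_estimator(y: Sequence[float], x: Sequence[float], x_bar_pop: float) -> float:
    """Ratio estimator (1.1): (sum of y / sum of x) times the known mean of X."""
    return sum(y) / sum(x) * x_bar_pop


def mean_of_ratios_estimator(y: Sequence[float], x: Sequence[float], x_bar_pop: float) -> float:
    """Mean-of-the-ratios estimator (1.2): known mean of X times the average of y_i / x_i."""
    n = len(y)
    return x_bar_pop * sum(yi / xi for yi, xi in zip(y, x)) / n


def horvitz_thompson_estimator(
    y: Sequence[float], x: Sequence[float], design: Callable[[float], float]
) -> float:
    """HT estimator (1.3): average of y_i weighted by 1 / P(x_i).

    `design` is a hypothetical callable standing in for P(x); its scaling
    is an assumption of this sketch.
    """
    n = len(y)
    return sum(yi / design(xi) for yi, xi in zip(y, x)) / n


if __name__ == "__main__":
    # Toy data lying exactly on the line y = 2x, with known mean of X equal to 3.
    x = [1.0, 2.0, 4.0]
    y = [2.0, 4.0, 8.0]
    print(ratio_estimator(y, x, 3.0))
    print(mean_of_ratios_estimator(y, x, 3.0))
    # A design proportional to x (pps-like), scaled by the mean of X.
    print(horvitz_thompson_estimator(y, x, lambda v: v / 3.0))
```

When the data lie exactly on a line through the origin, as in the toy example, all three functions return the same value; they differ once the error component is non-zero.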
Unfortunately, the mean-of-the-ratios estimator is not consistent like, say, the ratio estimator (Sukhatme and Sukhatme, 1970, p. 160).

CHAPTER II

THE PROBLEM

Statement of the Problem

The problem with simple random sampling is that it does not take into account the possible importance of the larger units in the population. Because of this, sampling designs like sampling with probability proportional to size (pps), generally known as sampling with varying probabilities, came to be used. These are more complex designs than simple random sampling. It was also realized that it makes a difference whether sampling is done with replacement or without replacement. However, the estimators considered were very complicated. To get simple estimators, sampling schemes like the Midzuno system of sampling, the Narain method of sampling, systematic sampling with varying probabilities, etc. were introduced. It is very clear, from such studies, that performances of estimators can be improved not only by making use of supplementary information, but also by choosing the proper sampling design.

The ratio estimator, the mean-of-the-ratios estimator and the Horvitz-Thompson estimator have performed quite efficiently in some of the research work in sample survey literature. But which of these three estimates better than the others depends on the sampling design used, the way the supplementary information has been used, the class of estimators used and the way 'bias' or 'unbiased' is defined. Joshi (1971), for example, states that

(i) The HT estimate is always admissible in the class of all unbiased estimates, linear and non-linear.
(ii) In the entire class of estimates, the HT estimator is admissible if the sampling design is of fixed sample size.
(iii) If the loss function, V(t), where t is the numerical difference between the estimated and true values, satisfies only
    (a) V(t) is non-decreasing in $[0, \infty)$,
    (b) for every K > 0,
    \[ \int_0^{\infty} V(t) \exp\left(-\frac{K t^2}{2}\right) dt < \infty, \]
then the sample mean, and more generally the ratio estimate, is always admissible as an estimate of the population mean.

The use of these three estimators with the regression type models has been shown to give very promising results. I would like to investigate the performances of these three estimators (the ratio estimator, the mean-of-the-ratios estimator and the HT estimator) under the regression type continuous variable model of Cassel and Sarndal (1973).

I would like to consider the model

\[ Y(x_i) = \theta\,(x_i + Z(x_i)) \]

where

\[ \mathcal{E}(Z_i \mid x_i) = 0 \quad \forall x_i \in (0, \infty), \qquad \mathcal{E}(Z_i^2 \mid x_i) = \sigma^2(x_i) = k^2 x_i^{g}, \qquad \mathcal{E}(Z_i Z_j \mid x_i, x_j) = 0 \quad (i \neq j), \]

where $Y_i$ is the value of the character under investigation for unit i and $x_i$ is the value of the corresponding auxiliary variable.

I further assume that the auxiliary variable X is gamma distributed over $(0, \infty)$, i.e.

\[ f(x) = \frac{x^{r-1} e^{-x}}{\Gamma(r)} \quad \text{for } x \in (0, \infty). \]

Hence its mean is

\[ E(x) = \int_0^{\infty} x f(x)\, dx = r. \]

My aim is to estimate the population mean

\[ E_{p\xi}(Y(x)) = \mu_y. \]

The three estimators of the population mean that I would like to consider are

\[ \hat{\mu}_{YHT} = \frac{1}{n} \sum_{i=1}^{n} \frac{y(x_i)}{P(x_i)} \tag{2.1} \]

\[ \hat{\mu}_{Ymr} = \frac{r}{n} \sum_{i=1}^{n} \frac{y(x_i)}{x_i} \tag{2.2} \]

\[ \hat{\mu}_{YR} = r\, \frac{\sum_{i=1}^{n} y(x_i)}{\sum_{i=1}^{n} x_i} \tag{2.3} \]
Here the design function, $P(x_i)$, is the selection probability density of a unit with auxiliary variable $x_i$, and n is the number of units in the sample. Draws are made independently of each other according to the same distribution of inclusion probabilities. (2.1) is the Horvitz-Thompson estimator, (2.2) is the mean-of-the-ratios estimator and (2.3) is the classical ratio estimator. The three estimators are $E_{p\xi}$-unbiased and their MSE's, then, equal their $E_{p\xi}$-variances.

Method Used and Problems of Evaluation

The efficiencies of these estimators will be evaluated in terms of their $E_{p\xi}$-variances, which are obtained from

\[ \mathrm{Var}(\hat{\mu}_y) = E_{p\xi}(\hat{\mu}_y^2) - \left(E_{p\xi}(\hat{\mu}_y)\right)^2. \]

Generally, the expressions for the variances of these estimators are

\[ \mathrm{Var}(\hat{\mu}_{YHT}) = \frac{\mu_y^2}{n} \left[ \frac{1}{r^2} \int_0^{\infty} \frac{x^2 + k^2 x^{g}}{P(x)}\, f(x)\, dx - 1 \right] \]

\[ \mathrm{Var}(\hat{\mu}_{Ymr}) = \frac{\mu_y^2}{n} \int_0^{\infty} k^2 x^{g-2}\, P(x) f(x)\, dx \]

\[ \mathrm{Var}(\hat{\mu}_{YR}) = \mu_y^2\, k^2 \int \frac{\sum_{i=1}^{n} x_i^{g}}{\left(\sum_{i=1}^{n} x_i\right)^2}\, P(\mathbf{x}) f(\mathbf{x})\, d(\mathbf{x}) \]

where

\[ P(\mathbf{x}) = \prod_{i=1}^{n} P(x_i), \qquad f(\mathbf{x}) = \prod_{i=1}^{n} f(x_i), \qquad d(\mathbf{x}) = \prod_{i=1}^{n} dx_i \]

and the last integral is n-dimensional (over $0 < x_i < \infty$ for i = 1, ..., n).

The variances depend essentially on three things, namely, the shape of the continuous distribution, f(x), of the auxiliary variable, x; the variance of $Z_i$; and the selection probability density, P(x). I will compare the variances under 1) polynomial P(x), 2) exponential P(x).

(1) Let

\[ P(x) = \frac{\Gamma(r)}{\Gamma(m+r)}\, x^{m} \]

(m = 0 represents simple random sampling, m = 1 represents the pps sampling scheme). In this case the expressions for the variances of these estimators are:

\[ \mathrm{Var}(\hat{\mu}_{YHT}) = \frac{\mu_y^2}{n} \left[ \frac{\Gamma(m+r)}{r^2\, \Gamma^2(r)} \left\{ \Gamma(2+r-m) + k^2\, \Gamma(g+r-m) \right\} - 1 \right] \tag{2.4} \]

\[ \mathrm{Var}(\hat{\mu}_{Ymr}) = \frac{k^2\, \Gamma(g+r+m-2)}{\Gamma(m+r)} \cdot \frac{\mu_y^2}{n} \tag{2.5} \]

for g = 0:

\[ \mathrm{Var}(\hat{\mu}_{YR}) = \frac{n k^2}{(nm+nr-1)(nm+nr-2)}\, \mu_y^2 \approx \frac{\mu_y^2}{n} \cdot \frac{k^2}{(m+r)^2} \quad \text{(if n is large)} \tag{2.6} \]

for g = 1:

\[ \mathrm{Var}(\hat{\mu}_{YR}) = \frac{k^2}{nm+nr-1}\, \mu_y^2 \approx \frac{\mu_y^2}{n} \cdot \frac{k^2}{m+r} \quad \text{(for n large)} \tag{2.7} \]
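The polynomial-design expressions (2.4) to (2.7) can be evaluated directly with the gamma function. The following is a minimal sketch, not the procedure used in the thesis: the function names are hypothetical, all values are in units of $\mu_y^2/n$, the large-n approximations are used for the ratio estimator, k > 0 is assumed, and returning infinity when a gamma argument is non-positive is a convention adopted here to flag a divergent integral.

```python
import math


def _gamma_or_inf(a: float) -> float:
    """Gamma(a) for a > 0; +inf otherwise, since the defining integral diverges."""
    return math.gamma(a) if a > 0 else math.inf


def var_ht_poly(m: float, r: float, k: float, g: float) -> float:
    """Variance of the HT estimator, eq. (2.4), in units of mu_y^2 / n."""
    if m + r <= 0:
        return math.inf  # the design density P(x) f(x) is not integrable
    scale = math.gamma(m + r) / (r ** 2 * math.gamma(r) ** 2)
    inner = _gamma_or_inf(2 + r - m) + k ** 2 * _gamma_or_inf(g + r - m)
    return scale * inner - 1.0


def var_mr_poly(m: float, r: float, k: float, g: float) -> float:
    """Variance of the mean-of-the-ratios estimator, eq. (2.5), in units of mu_y^2 / n."""
    if m + r <= 0:
        return math.inf
    return k ** 2 * _gamma_or_inf(g + r + m - 2) / math.gamma(m + r)


def var_ratio_poly_large_n(m: float, r: float, k: float, g: int) -> float:
    """Large-n variance of the ratio estimator, eqs. (2.6) and (2.7), in units of mu_y^2 / n."""
    if g not in (0, 1):
        raise ValueError("closed form is only given for g = 0 or g = 1")
    if m + r <= 0:
        return math.inf
    return k ** 2 / (m + r) ** (2 - g)


if __name__ == "__main__":
    # Scan the grid of m values for r = 1, k = 0.5, g = 0.
    for m in (-0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0):
        print(m, var_ht_poly(m, 1, 0.5, 0), var_mr_poly(m, 1, 0.5, 0),
              var_ratio_poly_large_n(m, 1, 0.5, 0))
```

For example, with r = 1, k = 0.5, g = 0 and m = 0 the HT expression evaluates to 1.25 and the ratio expression to 0.25, which agree with the corresponding entries of Table I.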
I evaluate these expressions numerically for r = 1, 2, 3, with k and g fixed at 0.5 and 0 respectively, 0.5 and 1 respectively, 0.5 and 2 respectively, 1 and 0 respectively, etc., and as m takes the values -1 to 2 in steps of 0.5, then 3 and 4. I will observe how the variances of these estimators behave.

(2) Let

\[ P(x) = (1-c)^{r} e^{cx}, \qquad c < 1 \]

(c = 0 represents simple random sampling). In this case the expressions for the variances are:

\[ \mathrm{Var}(\hat{\mu}_{YHT}) = \frac{\mu_y^2}{n} \left[ \frac{1}{r^2 (1-c)^{r}\, \Gamma(r)} \left\{ \frac{\Gamma(2+r)}{(c+1)^{2+r}} + \frac{k^2\, \Gamma(g+r)}{(c+1)^{g+r}} \right\} - 1 \right] \tag{2.8} \]

\[ \mathrm{Var}(\hat{\mu}_{Ymr}) = \frac{\mu_y^2}{n} \cdot \frac{k^2 (1-c)^{2-g}\, \Gamma(g+r-2)}{\Gamma(r)} \tag{2.9} \]

for g = 0:

\[ \mathrm{Var}(\hat{\mu}_{YR}) = \mu_y^2\, \frac{n k^2 (1-c)^2}{(nr-1)(nr-2)} \approx \frac{\mu_y^2}{n} \cdot \frac{k^2 (1-c)^2}{r^2} \quad \text{(for n large)} \tag{2.10} \]

for g = 1:

\[ \mathrm{Var}(\hat{\mu}_{YR}) = \mu_y^2\, \frac{k^2 (1-c)}{nr-1} \approx \frac{\mu_y^2}{n} \cdot \frac{k^2 (1-c)}{r} \quad \text{(for n large)} \tag{2.11} \]

As with the polynomial case, these expressions are evaluated and compared as the values of r, c, g and k vary.

It has not been possible to evaluate the variance of the ratio estimator when g = 2; only the variances of the HT estimator and the mean-of-the-ratios estimator will be compared when g = 2. With exponential P(x), the variance of the mean-of-the-ratios estimator is finite only if r + g > 2. Throughout, I have used the approximate expressions of Var($\hat{\mu}_{YR}$) for my evaluations. The results are tabulated below and illustrated by graphs.

CHAPTER III

THE RESULTS

Results for P(x) Polynomial and k = 0.5

The results are presented in Tables I-III and illustrated by Figures 1-3.

When g = 0 the ratio estimator has, generally, the smallest variance for fixed values of r and m, followed by the mean-of-the-ratios estimator and then by the HT estimator.
When r = 1 not much comparison can be made between the HT estimator and the mean-of-the-ratios estimator, as the variance of the HT estimator is finite for -1.0 < m < 1.0 while the variance of the mean-of-the-ratios estimator is finite for m > 1. In the range of values of m for which the variances of the ratio estimator and of either the HT estimator or the mean-of-the-ratios estimator are finite, the ratio estimator has the smaller variance. When r = 2 and 0 < m < 2, where the variances of all three estimators are finite, the ratio estimator has the smallest variance; the HT estimator initially has a smaller variance than the mean-of-the-ratios estimator. The variance of the HT estimator takes its minimum value at about m = 1.0, and for m > 1.0 the mean-of-the-ratios estimator's variance is smaller than that of the HT estimator. The variance of the HT estimator starts at infinity, takes its minimum value at m = 0.5 when r = 1, and at m = 1.0 when r = 2 or 3, and goes to infinity as m takes higher values. When r = 3 the ratio estimator is most efficient, followed by the mean-of-the-ratios estimator, and the HT estimator is last. The variances of the ratio and mean-of-the-ratios estimators get smaller and smaller as m increases. The same is true with changes in the value of r. This can also be seen by observing formulas (2.5) and (2.6) for the variances of the two estimators.

When g = 1 the ratio estimator still performs better than the other two. When r = 1 and m < 1, the HT estimator has smaller variance than the mean-of-the-ratios estimator. The reverse is true for m > 1. When r = 2 or 3, the mean-of-the-ratios estimator has smaller variance than the HT estimator except when the variance of the HT estimator takes its minimum value, which equals the variance of the mean-of-the-ratios estimator.
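The coincidence of the two variances at the HT minimum can be checked by hand for the pps design m = 1. The short derivation below is a sketch that takes the polynomial design function and the estimators (2.1) and (2.2), and the variance expressions (2.4) and (2.5), in the forms restated inside the block; it uses only the identity $\Gamma(1+r) = r\,\Gamma(r)$.

```latex
% Polynomial design with m = 1:
%   P(x) = \Gamma(r) x / \Gamma(1+r) = x / r .
% Then the HT estimator (2.1) coincides with the mean-of-the-ratios estimator (2.2):
\[
\hat{\mu}_{YHT}
  = \frac{1}{n}\sum_{i=1}^{n}\frac{y(x_i)}{x_i/r}
  = \frac{r}{n}\sum_{i=1}^{n}\frac{y(x_i)}{x_i}
  = \hat{\mu}_{Ymr}.
\]
% The same follows from (2.4) at m = 1, in units of \mu_y^2/n:
\[
\frac{\Gamma(1+r)}{r^{2}\Gamma^{2}(r)}
  \bigl\{\Gamma(1+r) + k^{2}\,\Gamma(g+r-1)\bigr\} - 1
  = 1 + \frac{k^{2}\,\Gamma(g+r-1)}{r\,\Gamma(r)} - 1
  = \frac{k^{2}\,\Gamma(g+r-1)}{\Gamma(1+r)},
\]
% which is (2.5) evaluated at m = 1.
```

So under pps sampling the two estimators are identical, which is why their tabulated variances agree in the m = 1.0 rows.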
When r = 1 the HT estimator beats the mean-of-the-ratios estimator for m < 1; the mean-of-the-ratios estimator beats the HT estimator when m > 1. The variance of the HT estimator takes its minimum value at m = 1.0, and the variances of the ratio and mean-of-the-ratios estimators become smaller and smaller as m gets larger.

When g = 2 the mean-of-the-ratios estimator's variance is a constant and smaller than that of the HT estimator (except where the variance of the HT estimator takes its minimum value, which equals the variance of the mean-of-the-ratios estimator). The variance of the HT estimator is smallest at m = 1.0.

Generally, as m increases, the variances of the ratio and the mean-of-the-ratios estimators are reduced, but the extent to which these variances can be reduced by, say, a given large m is lessened as g takes higher values. At m = 4, for example, the variances of the ratio and the mean-of-the-ratios estimators are approximately zero for g = 0 and about $0.05\,\mu_y^2/n$ for g = 1, and the mean-of-the-ratios estimator has a constant variance of $0.25\,\mu_y^2/n$ when g = 2. For large values of m, the difference between the variance of the ratio estimator and that of the mean-of-the-ratios estimator is very small and not very much dependent on the value of r. The smallest possible value taken by the variance of the HT estimator, and the range over which it is finite, are greatly affected by the value r takes. As r takes bigger values, the variance of the HT estimator can take smaller values and is finite for a wider range of the values of m.

In the tables, the columns headed HT, mr and R give the variances of $\hat{\mu}_{Y_{HT}}$, $\hat{\mu}_{Y_{mr}}$ and $\hat{\mu}_{Y_R}$ respectively.

TABLE I
Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 0.5, g = 0. (Each entry should be multiplied by $\mu_y^2/n$)

| m | HT, r=1 | mr, r=1 | R, r=1 | HT, r=2 | mr, r=2 | R, r=2 | HT, r=3 | mr, r=3 | R, r=3 |
|------|------|------|------|------|------|------|------|------|------|
| -1.0 | ∞ | | ∞ | 5.13 | | 0.25 | 2.38 | ∞ | 0.063 |
| -0.5 | 5.28 | | 1.00 | 1.65 | | 0.11 | 0.97 | 0.33 | 0.040 |
| 0.0 | 1.25 | | 0.25 | 0.56 | | 0.063 | 0.36 | 0.13 | 0.028 |
| 0.5 | 0.57 | | 0.11 | 0.22 | 0.33 | 0.040 | 0.11 | 0.067 | 0.020 |
| 1.0 | ∞ | ∞ | 0.063 | 0.13 | 0.13 | 0.028 | 0.042 | 0.042 | 0.016 |
| 1.5 | ∞ | 0.33 | 0.040 | 0.47 | 0.067 | 0.020 | 0.15 | 0.029 | 0.012 |
| 2.0 | | 0.13 | 0.028 | ∞ | 0.042 | 0.016 | 0.50 | 0.021 | 0.010 |
| 3.0 | | 0.042 | 0.016 | | 0.021 | 0.010 | ∞ | 0.013 | 0.0069 |
| 4.0 | | 0.02 | 0.01 | | 0.013 | 0.0069 | | 0.0083 | 0.0050 |

TABLE II
Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 0.5, g = 1. (Each entry should be multiplied by $\mu_y^2/n$)

| m | HT, r=1 | mr, r=1 | R, r=1 | HT, r=2 | mr, r=2 | R, r=2 | HT, r=3 | mr, r=3 | R, r=3 |
|------|------|------|------|------|------|------|------|------|------|
| -1.0 | ∞ | | ∞ | 5.37 | ∞ | 0.25 | 2.50 | 0.25 | 0.13 |
| -0.5 | 5.48 | | 0.5 | 1.76 | 0.50 | 0.17 | 1.04 | 0.17 | 0.10 |
| 0.0 | 1.25 | ∞ | 0.25 | 0.63 | 0.25 | 0.13 | 0.42 | 0.13 | 0.083 |
| 0.5 | 0.37 | 0.5 | 0.17 | 0.21 | 0.17 | 0.10 | 0.14 | 0.10 | 0.073 |
| 1.0 | 0.25 | 0.25 | 0.13 | 0.13 | 0.13 | 0.083 | 0.08 | 0.083 | 0.063 |
| 1.5 | 2.52 | 0.17 | 0.10 | 0.29 | 0.10 | 0.071 | 0.18 | 0.071 | 0.055 |
| 2.0 | ∞ | 0.13 | 0.08 | 0.86 | 0.08 | 0.063 | 0.50 | 0.063 | 0.050 |
| 3.0 | | 0.08 | 0.06 | ∞ | 0.063 | 0.050 | 3.17 | 0.050 | 0.043 |
| 4.0 | | 0.06 | 0.05 | | 0.05 | 0.042 | ∞ | 0.041 | 0.035 |

TABLE III
Variances of $\hat{\mu}_{Y_{HT}}$ and $\hat{\mu}_{Y_{mr}}$ when X is Gamma Distributed, P(x) is Polynomial, k = 0.5, g = 2. (Each entry should be multiplied by $\mu_y^2/n$)

| m | HT, r=1 | mr, r=1 | HT, r=2 | mr, r=2 | HT, r=3 | mr, r=3 |
|------|------|------|------|------|------|------|
| -1.0 | ∞ | 0.25 | 6.50 | 0.25 | 3.17 | 0.25 |
| -0.5 | 6.36 | 0.25 | 2.21 | 0.25 | 1.41 | 0.25 |
| 0.0 | 1.50 | 0.25 | 0.88 | 0.25 | 0.67 | 0.25 |
| 0.5 | 0.47 | 0.25 | 0.38 | 0.25 | 0.34 | 0.25 |
| 1.0 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 |
| 1.5 | 0.47 | 0.25 | 0.38 | 0.25 | 0.34 | 0.25 |
| 2.0 | 1.50 | 0.25 | 0.86 | 0.25 | 0.67 | 0.25 |
| 3.0 | | 0.25 | 6.25 | 0.25 | 3.17 | 0.25 |
| 4.0 | | 0.25 | | 0.25 | 24.00 | 0.25 |

[FIGURE 3. Variances of $\hat{\mu}_{Y_{HT}}$ (for r = 1, 2, 3) and $\hat{\mu}_{Y_{mr}}$ (for any r) when X is gamma distributed and P(x) is polynomial; illustrates Table III.]

Results for P(x) Polynomial and k = 1

The results are given in Tables IV-VI and illustrated by Figures 4-6.

TABLE IV
Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 1, g = 0. (Each entry should be multiplied by $\mu_y^2/n$)

| m | HT, r=1 | mr, r=1 | R, r=1 | HT, r=2 | mr, r=2 | R, r=2 | HT, r=3 | mr, r=3 | R, r=3 |
|------|------|------|------|------|------|------|------|------|------|
| -1.0 | | | | 5.50 | | 1.00 | 2.50 | | 0.25 |
| -0.5 | 6.63 | | 4.00 | 1.86 | | 0.43 | 1.08 | 1.32 | 0.16 |
| 0.0 | 1.00 | | 1.00 | 0.75 | ∞ | 0.25 | 0.44 | 0.52 | 0.11 |
| 0.5 | 1.75 | | 0.44 | 0.40 | 1.33 | 0.16 | 0.19 | 0.27 | 0.08 |
| 1.0 | ∞ | ∞ | 0.25 | 0.50 | 0.50 | 0.11 | 0.17 | 0.17 | 0.06 |
| 1.5 | ∞ | 1.33 | 0.16 | 1.50 | 0.27 | 0.082 | 0.36 | 0.12 | 0.048 |
| 2.0 | | 0.50 | 0.11 | ∞ | 0.17 | 0.063 | 1.00 | 0.084 | 0.04 |
| 3.0 | | 0.17 | 0.063 | ∞ | 0.083 | 0.04 | ∞ | 0.052 | 0.028 |
| 4.0 | | 0.083 | 0.040 | | 0.050 | 0.028 | | 0.033 | 0.020 |

TABLE V
Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 1, g = 1. (Each entry should be multiplied by $\mu_y^2/n$)

| m | HT, r=1 | mr, r=1 | R, r=1 | HT, r=2 | mr, r=2 | R, r=2 | HT, r=3 | mr, r=3 | R, r=3 |
|------|------|------|------|------|------|------|------|------|------|
| -1.0 | | | | 6.50 | ∞ | 1.00 | 3.00 | 1.00 | 0.50 |
| -0.5 | 7.44 | | 2.00 | 2.32 | 2.00 | 0.67 | 1.36 | 0.67 | 0.40 |
| 0.0 | 2.00 | ∞ | 1.00 | 1.00 | 1.00 | 0.50 | 0.67 | 0.50 | 0.33 |
| 0.5 | 0.96 | 2.00 | 0.67 | 0.55 | 0.67 | 0.40 | 0.37 | 0.40 | 0.29 |
| 1.0 | 1.00 | 1.00 | 0.50 | 0.50 | 0.50 | 0.33 | 0.33 | 0.33 | 0.25 |
| 1.5 | 2.53 | 0.67 | 0.40 | 0.84 | 0.40 | 0.29 | 0.50 | 0.29 | 0.22 |
| 2.0 | ∞ | 0.50 | 0.33 | 2.00 | 0.33 | 0.25 | 1.00 | 0.25 | 0.20 |
| 3.0 | | 0.33 | 0.25 | ∞ | 0.25 | 0.20 | 5.67 | 0.20 | 0.17 |
| 4.0 | | 0.25 | 0.20 | | 0.20 | 0.17 | | 0.17 | 0.14 |

TABLE VI
Variances of $\hat{\mu}_{Y_{HT}}$ and $\hat{\mu}_{Y_{mr}}$ when X is Gamma Distributed, P(x) is Polynomial, k = 1, g = 2. (Each entry should be multiplied by $\mu_y^2/n$)

| m | HT, r=1 | mr, r=1 | HT, r=2 | mr, r=2 | HT, r=3 | mr, r=3 |
|------|------|------|------|------|------|------|
| -1.0 | | 1 | 11.00 | 1 | 5.67 | 1 |
| -0.5 | 10.78 | 1 | 4.15 | 1 | 2.86 | 1 |
| 0.0 | 3.00 | 1 | 2.00 | 1 | 1.67 | 1 |
| 0.5 | 1.36 | 1 | 1.21 | 1 | 1.17 | 1 |
| 1.0 | 1.00 | 1 | 1.00 | 1 | 1.00 | 1 |
| 1.5 | 1.36 | 1 | 1.21 | 1 | 1.17 | 1 |
| 2.0 | 3.0 | 1 | 2.0 | 1 | 1.67 | 1 |
| 3.0 | | 1 | 11.00 | 1 | 5.67 | 1 |
| 4.0 | | 1 | ∞ | 1 | 39.00 | 1 |

When g = 0 and r = 1, the variances of the HT estimator and of the mean-of-the-ratios estimator are not comparable, as the values of m for which these variances are finite are not the same. In the range of values of m for which that of the ratio estimator and either the variance of the HT estimator or the variance of the mean-of-the-ratios estimator are finite, the ratio estimator is more efficient. When g = 0, r = 2 and 0 < m < 1, the ratio estimator has the smallest variance, followed by the HT estimator and then by the mean-of-the-ratios estimator, but the mean-of-the-ratios estimator beats the HT estimator for m > 1. When g = 0, r = 3 and m < 0.8, the ratio estimator is best, followed by the HT estimator, and the mean-of-the-ratios estimator is last. The ratio and mean-of-the-ratios estimators become more efficient with an increase in the value of both m and r. The HT estimator's variance takes its minimum value for values of m near 0.8; it also takes smaller values as r increases.
When g = 1 the ratio estimator always has the smallest variance. When r = 1 and m < 1 the HT estimator is better than the mean-of-the-ratios estimator. The reverse is true when m > 1. When r = 2 and either m < 0 or m > 1, the HT estimator has larger variance than the mean-of-the-ratios estimator; for 0 < m < 1 the HT estimator's variance is smaller than that of the mean-of-the-ratios estimator. When r = 3 the HT estimator beats the mean-of-the-ratios estimator for 0.3 < m < 1, and the mean-of-the-ratios estimator beats the HT estimator for other values of m.

When g = 2 the variance of the mean-of-the-ratios estimator is a constant at $\mu_y^2/n$ and, except where the variance of the HT estimator takes its minimum value, it is smaller than that of the HT estimator.

As in the other case, an increase in the value of g has the general effect of lessening the reduction in the variances of the ratio and mean-of-the-ratios estimators that can be obtained by increasing m. No improvement can be made on the variance of the mean-of-the-ratios estimator when g = 2. The variance of the HT estimator is smallest for values of m near 1.0.

The change in the value of k from 0.5 to 1.0 has the effect of slightly increasing the values of the variances of the estimators for fixed values of g, r and m.

Results for P(x) Exponential and k = 0.5

The results are given in Tables VII-IX and illustrated by Figures 7-9. When g = 0 or 1, the ratio estimator is most efficient, followed by the mean-of-the-ratios estimator and then by the HT estimator. The variances of the ratio and mean-of-the-ratios estimators are zero when c = 1. The greater the value r takes, the more efficient the estimators become. The graph of the variance of the HT estimator is U-shaped; an increase in r has the effect of pushing the minimum point of the variance of the HT estimator to the left.
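The behaviour described above for the exponential design can be explored numerically. The sketch below is only an illustration: it assumes the variance expressions (2.8)-(2.11) have the form coded here, expressed in units of $\mu_y^2/n$ and using the large-n approximation for the ratio estimator. The function names and the simple grid search are hypothetical choices, not part of the original evaluation.

```python
from math import gamma


def ht_var(c, r, k, g):
    """HT variance in units of mu_y^2/n (assumed form of (2.8)); needs -1 < c < 1."""
    bracket = (gamma(r + 2) / (1 + c) ** (r + 2)
               + k ** 2 * gamma(r + g) / (1 + c) ** (r + g))
    return bracket / (r ** 2 * (1 - c) ** r * gamma(r)) - 1


def mr_var(c, r, k, g):
    """Mean-of-the-ratios variance (assumed form of (2.9)); finite only if r + g > 2."""
    if r + g <= 2:
        raise ValueError("variance is not finite unless r + g > 2")
    return k ** 2 * (1 - c) ** (2 - g) * gamma(r + g - 2) / gamma(r)


def ratio_var(c, r, k, g):
    """Large-n approximations (2.10) and (2.11); only g = 0 and g = 1 are covered."""
    if g == 0:
        return k ** 2 * (1 - c) ** 2 / r ** 2
    if g == 1:
        return k ** 2 * (1 - c) / r
    raise ValueError("no expression is available for this value of g")


def c_at_minimum_ht(r, k, g, step=0.01):
    """Grid search over -0.99 <= c <= 0.99 for the c that minimises the HT variance."""
    n_points = int(round(1.98 / step)) + 1
    grid = [-0.99 + step * i for i in range(n_points)]
    return min(grid, key=lambda c: ht_var(c, r, k, g))


if __name__ == "__main__":
    for r in (1, 2, 3):
        c_star = c_at_minimum_ht(r, k=0.5, g=0)
        print(r, round(c_star, 2), round(ht_var(c_star, r, 0.5, 0), 3))
```

With k = 0.5 and g = 0, the grid search places the minimum of the HT variance at a smaller c for r = 3 than for r = 1, which is the leftward shift noted in the text.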
When g = 2, the variance of the mean-of-the-ratios estimator is constant at $0.25\,\mu_y^2/n$ and is smaller than that of the HT estimator. The effect of increasing r on the shape of the HT estimator's variance curve is simply to shift its minimum point to the left.

The rate of improvement of the estimators obtained by increasing c is reduced as g changes from 0 to 1, and the estimators become less efficient too.

In the tables, the columns headed HT, mr and R give the variances of $\hat{\mu}_{Y_{HT}}$, $\hat{\mu}_{Y_{mr}}$ and $\hat{\mu}_{Y_R}$ respectively.

TABLE VII
Variances of the Estimators when X is Gamma Distributed, P(x) is Exponential, k = 0.5, g = 0. (Each entry should be multiplied by $\mu_y^2/n$)

| c | HT, r=1 | R, r=1 | HT, r=2 | R, r=2 | HT, r=3 | mr, r=3 | R, r=3 |
|------|------|------|------|------|------|------|------|
| -1.0 | | 1.00 | | 0.25 | ∞ | 0.50 | 0.11 |
| -0.8 | 138.58 | 0.81 | 288.50 | 0.20 | 713.50 | 0.40 | 0.09 |
| -0.6 | 18.90 | 0.64 | 21.25 | 0.16 | 29.84 | 0.32 | 0.07 |
| -0.4 | 6.00 | 0.49 | 4.96 | 0.12 | 4.70 | 0.24 | 0.05 |
| -0.2 | 2.50 | 0.36 | 1.61 | 0.09 | 1.38 | 0.18 | 0.04 |
| 0.0 | 1.25 | 0.25 | 0.56 | 0.06 | 0.36 | 0.12 | 0.03 |
| 0.2 | 0.70 | 0.16 | 0.21 | 0.04 | 0.10 | 0.08 | 0.02 |
| 0.4 | 0.52 | 0.09 | | 0.02 | 0.24 | 0.04 | 0.01 |
| 0.6 | 0.60 | 0.04 | 0.58 | 0.01 | 1.21 | 0.02 | 0.004 |
| 0.8 | 1.44 | 0.01 | 2.53 | 0.003 | 12.68 | 0.005 | 0.001 |
| 1.0 | | 0.00 | ∞ | 0.00 | | 0.00 | 0.00 |

[FIGURE. Variances of the estimators when X is gamma distributed and P(x) is exponential, g = 0.]

TABLE VIII
Variances of the Estimators when X is Gamma Distributed, P(x) is Exponential, k = 0.5, g = 1. (Each entry should be multiplied by $\mu_y^2/n$)

| c | HT, r=1 | R, r=1 | HT, r=2 | mr, r=2 | R, r=2 | HT, r=3 | mr, r=3 | R, r=3 |
|------|------|------|------|------|------|------|------|------|
| -1.0 | ∞ | 0.50 | ∞ | 0.50 | 0.25 | ∞ | 0.25 | 0.17 |
| -0.8 | 141.30 | 0.45 | 290.30 | 0.45 | 0.23 | 713.50 | 0.23 | 0.15 |
| -0.6 | 19.51 | 0.40 | 22.20 | 0.40 | 0.20 | 30.93 | 0.20 | 0.13 |
| -0.4 | 6.11 | 0.35 | 5.08 | 0.35 | 0.18 | 5.33 | 0.18 | 0.12 |
| -0.2 | 2.53 | 0.30 | 1.60 | 0.30 | 0.15 | 1.47 | 0.15 | 0.10 |
| 0.0 | 1.25 | 0.25 | 0.63 | 0.25 | 0.13 | 0.42 | 0.13 | 0.083 |
| 0.2 | 0.66 | 0.20 | 0.27 | 0.20 | 0.10 | 0.16 | 0.10 | 0.067 |
| 0.4 | 0.43 | 0.15 | 0.25 | 0.15 | 0.075 | 0.39 | 0.08 | 0.050 |
| 0.6 | 0.47 | 0.10 | 0.73 | 0.10 | 0.050 | 1.50 | 0.05 | 0.033 |
| 0.8 | 1.10 | 0.05 | 3.54 | 0.05 | 0.025 | 11.15 | 0.03 | 0.017 |
| 1.0 | ∞ | 0.00 | ∞ | 0.00 | 0.00 | ∞ | 0.00 | 0.00 |

$\mathrm{Var}(\hat{\mu}_{Y_{mr}})$ is not defined for r = 1.

TABLE IX
$\mathrm{Var}(\hat{\mu}_{Y_{HT}})$ and $\mathrm{Var}(\hat{\mu}_{Y_{mr}})$ when X is Gamma Distributed, P(x) is Exponential, k = 0.5, g = 2. (Each entry should be multiplied by $\mu_y^2/n$)

| c | HT, r=1 | mr, r=1 | HT, r=2 | mr, r=2 | HT, r=3 | mr, r=3 |
|------|------|------|------|------|------|------|
| -1.0 | | 0.25 | ∞ | 0.25 | ∞ | 0.25 |
| -0.8 | 165.67 | 0.25 | 301.10 | 0.25 | 720.6 | 0.25 |
| -0.6 | 15.13 | 0.25 | 24.20 | 0.25 | 32.06 | 0.25 |
| -0.4 | 7.00 | 0.25 | 5.79 | 0.25 | 5.81 | 0.25 |
| -0.2 | 2.91 | 0.25 | 2.05 | 0.25 | 1.73 | 0.25 |
| 0.0 | 1.50 | 0.25 | 0.88 | 0.25 | 0.67 | 0.25 |
| 0.2 | 0.74 | 0.25 | 0.47 | 0.25 | 0.42 | 0.25 |
| 0.4 | 0.46 | 0.25 | 0.46 | 0.25 | 0.71 | 0.25 |
| 0.6 | 0.46 | 0.25 | 1.00 | 0.25 | 2.26 | 0.25 |
| 0.8 | 1.06 | 0.25 | 4.13 | 0.25 | 14.97 | 0.25 |
| 1.0 | ∞ | 0.25 | ∞ | 0.25 | | 0.25 |

Results for P(x) Exponential and k = 1

The results are given in Tables X-XII and illustrated by Figures 10-12. The ratio estimator has the smallest variance throughout. When g = 0, r = 3 and either c > 0.28 or c ≤ -0.1, the variance of the mean-of-the-ratios estimator is smaller than that of the HT estimator. The HT estimator beats the mean-of-the-ratios estimator for -0.1 < c < 0.28.

When g = 1, r = 2 and either c ≤ 0.12 or c > 0.3, the mean-of-the-ratios estimator is more efficient than the HT estimator, which beats the mean-of-the-ratios estimator for 0.12 ≤ c ≤ 0.3.
When r = 3 the mean-of-the-ratios estimator has smaller variance than the HT estimator.

When g = 2 the variance of the mean-of-the-ratios estimator is constant at $\mu_y^2/n$ and is smaller than that of the HT estimator. In this particular case the minimum point of the graph of the variance of the HT estimator moves to the left, and the value of the variance at its minimum increases, as r increases.

An increase in the value of g has effects similar to those mentioned in the other cases above. As with the polynomial case, an increase in the value of k from 0.5 to 1 has the effect of making the estimators less efficient for fixed values of c, r and g.

In the tables, the columns headed HT, mr and R give the variances of $\hat{\mu}_{Y_{HT}}$, $\hat{\mu}_{Y_{mr}}$ and $\hat{\mu}_{Y_R}$ respectively.

TABLE X
Variances of the Estimators when X is Gamma Distributed, P(x) is Exponential, k = 1, g = 0. (Each entry should be multiplied by $\mu_y^2/n$)

| c | HT, r=1 | R, r=1 | HT, r=2 | R, r=2 | HT, r=3 | mr, r=3 | R, r=3 |
|------|------|------|------|------|------|------|------|
| -1.0 | ∞ | 4.00 | ∞ | 1.00 | ∞ | 2.00 | 0.47 |
| -0.8 | 140.7 | 3.24 | 288.70 | 0.81 | 713.4 | 1.62 | 0.36 |
| -0.6 | 20.10 | 2.56 | 22.13 | 0.64 | 30.80 | 1.28 | 0.28 |
| -0.4 | 5.61 | 1.96 | 5.13 | 0.49 | 5.25 | 0.98 | 0.22 |
| -0.2 | 3.30 | 1.44 | 1.77 | 0.36 | 1.41 | 0.72 | 0.16 |
| 0.0 | 2.00 | 1.00 | 0.75 | 0.25 | 0.31 | 0.50 | 0.11 |
| 0.2 | 1.50 | 0.64 | 0.45 | 0.16 | 0.23 | 0.32 | 0.07 |
| 0.4 | 1.44 | 0.36 | 0.57 | 0.09 | 0.52 | 0.18 | 0.04 |
| 0.6 | 1.77 | 0.16 | 1.41 | 0.04 | 2.07 | 0.08 | 0.018 |
| 0.8 | 3.53 | 0.04 | 6.29 | 0.01 | 15.58 | 0.02 | 0.0044 |
| 1.0 | ∞ | 0.00 | ∞ | 0.00 | ∞ | 0.00 | 0.00 |

$\mathrm{Var}(\hat{\mu}_{Y_{mr}})$ is not defined for k = 1, g = 0, r ≤ 2.

TABLE XI
Variances of the Estimators when X is Gamma Distributed, P(x) is Exponential, k = 1, g = 1. (Each entry should be multiplied by $\mu_y^2/n$)

| c | HT, r=1 | R, r=1 | HT, r=2 | mr, r=2 | R, r=2 | HT, r=3 | mr, r=3 | R, r=3 |
|------|------|------|------|------|------|------|------|------|
| -1.0 | | 2.00 | | 2.00 | 1.00 | ∞ | 1.00 | 0.67 |
| -0.8 | 151.8 | 1.80 | 292.20 | 1.80 | 0.90 | 714.80 | 0.90 | 0.60 |
| -0.6 | 22.44 | 1.60 | 59.55 | 1.60 | 0.80 | 31.30 | 0.80 | 0.53 |
| -0.4 | 7.60 | 1.40 | 5.61 | 1.40 | 0.70 | 5.59 | 0.70 | 0.47 |
| -0.2 | 3.56 | 1.20 | 2.25 | 1.20 | 0.60 | 1.70 | 0.60 | 0.40 |
| 0.0 | 2.00 | 1.00 | 2.00 | 1.00 | 0.50 | 0.67 | 0.50 | 0.33 |
| 0.2 | 1.31 | 0.80 | 0.67 | 0.80 | 0.40 | 0.50 | 0.40 | 0.27 |
| 0.4 | 1.07 | 0.60 | 0.78 | 0.60 | 0.30 | 0.98 | 0.30 | 0.20 |
| 0.6 | 1.20 | 0.40 | 1.64 | 0.40 | 0.20 | 3.03 | 0.20 | 0.13 |
| 0.8 | 2.26 | 0.20 | 6.43 | 0.20 | 0.10 | 20.79 | 0.10 | 0.07 |
| 1.0 | ∞ | 0.00 | ∞ | 0.00 | 0.00 | ∞ | 0.00 | 0.00 |

$\mathrm{Var}(\hat{\mu}_{Y_{mr}})$ is not defined for r = 1.

[FIGURE. Variances of the estimators when X is gamma distributed and P(x) is exponential.]

TABLE XII
$\mathrm{Var}(\hat{\mu}_{Y_{HT}})$ and $\mathrm{Var}(\hat{\mu}_{Y_{mr}})$ when X is Gamma Distributed, P(x) is Exponential, k = 1, g = 2. (Each entry should be multiplied by $\mu_y^2/n$)

| c | HT, r=1 | mr, r=1 | HT, r=2 | mr, r=2 | HT, r=3 | mr, r=3 |
|------|------|------|------|------|------|------|
| -1.0 | ∞ | 1 | ∞ | 1 | ∞ | 1 |
| -0.8 | 276.78 | 1 | 346.30 | 1 | 742.0 | 1 |
| -0.6 | 37.46 | 1 | 31.04 | 1 | 37.87 | 1 |
| -0.4 | 12.16 | 1 | 8.95 | 1 | 7.50 | 1 |
| -0.2 | 5.51 | 1 | 3.58 | 1 | 2.86 | 1 |
| 0.0 | 3.00 | 1 | 2.00 | 1 | 1.67 | 1 |
| 0.2 | 1.90 | 1 | 1.49 | 1 | 1.55 | 1 |
| 0.4 | 1.43 | 1 | 1.60 | 1 | 2.40 | 1 |
| 0.6 | 1.45 | 1 | 2.72 | 1 | 6.08 | 1 |
| 0.8 | 2.51 | 1 | 9.00 | 1 | 36.41 | 1 |
| 1.0 | ∞ | 1 | ∞ | 1 | | 1 |

Summary of Results and Some Conclusions

The following observations can be made from the results:

(1) Under the given assumptions about the distribution of the auxiliary variable, X, and the sampling designs, and for g = 0 or 1, the ratio estimator is better than both the mean-of-the-ratios and the HT estimators. Over a very wide range of the values of m, the mean-of-the-ratios estimator performs better than the HT estimator.

(2) The HT estimator is very sensitive to changes in the values of the design function parameters m and c.
Within a certain (small) range of the values of these parameters, the HT estimator sometimes has smaller variance than the mean-of-the-ratios estimator.

(3) The variances of these estimators are usually smaller when k = 0.5 than when k = 1.

(4) For fixed values of k and g, the variances of the ratio and mean-of-the-ratios estimators usually increase with a decrease in the value of r. With the HT estimator, it depends on the range of values of m or c considered and the design function used. With the polynomial design function, the efficiency of the HT estimator usually increases with r. With the exponential design function and for, say, c < 0.3, the efficiency of the HT estimator improves with an increase in the value of r; when c > 0.4 the HT estimator usually becomes less efficient as r takes higher values.

(5) For the same values of k, and for the same ranges of the values of the design function parameters, m or c, the effect of increasing g is usually to make the ratio and the mean-of-the-ratios estimators less efficient.

(6) Other things fixed, the variances of the ratio and the mean-of-the-ratios estimators decrease with an increase in the values of m or c.

(7) The ratio and mean-of-the-ratios estimators have smaller variances when selection is 'purposive' (m > 0 or c > 0) than when simple random sampling (m = c = 0) is employed.

(8) The HT estimator is most efficient when the sampling scheme is approximately pps, i.e. when m is approximately 1 (in the polynomial case).

(9) In most cases, the variances of these estimators can be equalized by simply choosing different combinations of the parameters. For example, the variance of the ratio estimator for k = 1, g = 1 and r = 1 equals the variance of the mean-of-the-ratios estimator with k = 1, g = 1 and r = 2 (when P(x) is polynomial).
(10) For a greater range of the values of c or m, the estimators have smaller variances when P(x) is polynomial than when it is exponential.

(11) With an exponential design function, the variances of the estimators are finite for a very small range of the values of c. The variances of the ratio and mean-of-the-ratios estimators are zero when c = 1.

CHAPTER 4

DISCUSSION OF RESULTS

Some Empirical Comparisons

P.S.R.S. Rao (1969) compared four ratio-type estimators under the regression model

$$y = \alpha + \beta X + e$$

where

$$E(e_i \mid X_i) = 0, \qquad E(e_i e_j \mid X_i X_j) = 0, \qquad \mathrm{Var}(e_i \mid X_i) = \delta X_i^{g}$$

and X is gamma distributed with parameter h. He found that when α = 0 and g > 1, the MSE's increase as h increases. He considered values of h > 2 and samples of size 2-10. These results are not quite compatible with my results for the MSE's of the ratio and mean-of-the-ratios estimators (Rao's first estimator is the ratio estimator). But from (2.5) and (2.9), it is clear that, for g > 2, the variance of the mean-of-the-ratios estimator increases with r. For g < 2, the variance decreases with r. Since I did not work out the variance of the ratio estimator for g > 1, I do not have much to compare with. Certainly when g = 1, the variance of the ratio estimator decreases as r increases.

Rao also found that for fixed n and h (or r) the MSE's increase as g increases. This agrees with my results and can be inferred from (2.5)-(2.7). Lastly, his results show that for α = 0, n > 2 and g = 1 or 2, the ratio estimator has smaller MSE than the other three. It is implied in his paper that the choice of α and g combinations does affect the MSE's of the estimators.
My results show that the variances of the ratio and mean-of-the-ratios estimators become smaller as the design function parameter m or c increases. Expressions (2.5)-(2.7) and (2.10)-(2.11) show the same thing. But the larger the values of m or c, the higher will be the inclusion probabilities for units with large values of X compared with those with small values of X. This leads to the notion that if units with large values of X are purposely included in the sample, the estimation procedure will be more efficient. Many researchers, notably Royall (1970, 1971), have found similar results under similar conditions. Royall's results also show that the ratio estimator is better when combined with designs other than simple random sampling. For g = 1, k = 1, he shows that the ratio estimator is the best linear unbiased estimator. When used with a simple random sampling design, it remains optimal only if M is infinite.

Since Godambe proved the non-existence of a uniformly minimum variance unbiased estimator in the class of all unbiased linear estimators for any sampling design, it may be proper to note that, while discussing these optimality properties of the ratio estimator, we are most of the time restricting ourselves to a certain class of linear estimators only (Godambe's 1955 class (iii)). The same remarks apply to the optimality conditions given by Cochran (1963). The HT estimator belongs to Godambe's (1955) sub-class (i) of estimators. It is the best and the only unbiased linear estimator in the sub-class.

Under a super-population set-up, Rao (1967) considered the problem of choosing a suitable strategy for the ratio, mean-of-the-ratios and HT estimators. His results suggest that the mean-of-the-ratios estimator with πps sampling is better than the other two estimators used with the Midzuno-Sen sampling scheme.
My results suggest that if the ratio and mean-of-the-ratios estimators are combined with simple random sampling, there always exists an m or c such that the variance of the mean-of-the-ratios estimator is less than the variances of the ratio and of the HT estimators. Similarly, the variance of the ratio estimator with a varying probability of inclusion procedure can be made small compared with those of the HT and mean-of-the-ratios estimators with simple random sampling.

If, in formula (2.2) for the mean-of-the-ratios estimator, we let

$$X_i = P(X_i) \cdot r$$

the HT estimator (2.1) is obtained. Hanurav (1967) was interested in finding sampling designs under which this condition, $P(X_i) = X_i / r$, holds and the variance estimate of the mean-of-the-ratios estimator is unbiased (he was estimating the population total). For n = 2, he gives two sequential sampling procedures that solve the problem. With my study, the conditions are easy to find: when P(X) is polynomial, we want

$$P(X_i) = \frac{\Gamma(r)\,X_i^{m}}{\Gamma(m+r)} = \frac{X_i}{r}$$

and when P(X) is exponential we have

$$P(X_i) = (1-c)^r e^{cX_i} = \frac{X_i}{r}$$

It is easy to find an r (c or m) that will give the required $P(X_i)$ for fixed m or c (or r), provided c ≠ 0 and m ≠ 0.

With polynomial P(x), this study is really a special case of that considered by Cassel and Sarndal (1973). Nearly all of my findings are in agreement with theirs. They find, for example, that:

(i) When g = 2, the mean-of-the-ratios estimator has constant variance regardless of both the design and the distribution function, f(x).

(ii) The ratio estimator combined with simple random sampling can be quite inefficient compared with the same estimator used with other sampling designs.

(iii) Under certain conditions, one should use purposive selection of the units with the largest X-values.
(iv) When g ≤ 1 the ratio estimator is at least as efficient as the mean-of-the-ratios estimator for any design P(x). In the case that I am considering, for n > 1 and g ≤ 1, the expressions for the variances of the ratio and the mean-of-the-ratios estimators show that the ratio estimator has smaller variance than the mean-of-the-ratios estimator, and the difference between the two variances increases with n. For example, when g = 0 and P(x) is polynomial, we have

$$\mathrm{Var}(\hat{\mu}_{Y_{mr}}) = \frac{k^2 \mu_y^2}{n\,(m+r-1)(m+r-2)}$$

$$\mathrm{Var}(\hat{\mu}_{Y_R}) = \frac{n\,k^2 \mu_y^2}{(nm+nr-1)(nm+nr-2)}$$

(v) The variances of the ratio and mean-of-the-ratios estimators decrease as m or c increases.

(vi) If g = 2, regardless of the design function used, the mean-of-the-ratios estimator has variance that is smaller than, or equal to, that of the ratio estimator (I did not work out the variance of the ratio estimator when g = 2).

(vii) If g > 2 the variance of the mean-of-the-ratios estimator becomes small if the design assigns the bulk of the selection probabilities to the units with the smallest X-values.

(viii) The HT estimator is generally highly sensitive to shifts in the design. Its variance is a minimum when the selection probabilities are somewhat in the vicinity of the pps procedure. The variance can become very large due to minor deviations from the point of minimum.

Comments and Some Implications

From the results of this study, it would seem that instead of trying to look for estimators that are generally optimal, like uniformly minimum variance estimators, more effort should be spent on defining clearly and simply the conditions under which the popular estimators are efficient. Under the usual regression models, the choice of g, it seems, determines the extent to which the best choice of sampling strategy can improve the estimation process.
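Item (iv) above can be checked numerically for g = 0 under the polynomial design. The sketch below simply codes the two variance expressions as they are read from the text; the function names and the parameter values (m = 1, r = 2, k = 0.5) are illustrative assumptions, and both expressions require m + r > 2.

```python
def mr_var_g0(n, m, r, k, mu_y=1.0):
    """Mean-of-the-ratios variance for g = 0, polynomial design; needs m + r > 2."""
    return k ** 2 * mu_y ** 2 / (n * (m + r - 1) * (m + r - 2))


def ratio_var_g0(n, m, r, k, mu_y=1.0):
    """Ratio-estimator variance for g = 0, polynomial design; needs n * (m + r) > 2."""
    a = n * (m + r)
    return n * k ** 2 * mu_y ** 2 / ((a - 1) * (a - 2))


if __name__ == "__main__":
    # Print n times each variance so the entries are in units of mu_y^2 / n.
    for n in (2, 10, 100):
        v_mr = mr_var_g0(n, m=1, r=2, k=0.5)
        v_r = ratio_var_g0(n, m=1, r=2, k=0.5)
        print(n, round(n * v_mr, 4), round(n * v_r, 4))
```

For these values the scaled mean-of-the-ratios variance stays at 0.125 while the scaled ratio-estimator variance approaches $k^2/(m+r)^2 \approx 0.028$ as n grows, in line with the Table I entries for r = 2 and m = 1.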
In most cases studied, it has been found that the value of g lies between 1 and 2. This may be unfortunate, as the three common estimators I have considered can be more efficient when g = 0. On the other hand, g = 0 implies that the variance of the error term, $Z_i$, is constant, which is a very unlikely situation in practice. For 1 ≤ g ≤ 2, the variances of the three estimators can be made to attain their minimum values by a proper choice of m, r and g. Perhaps the good thing about the continuous variable model is that the expressions for the variances of the estimators are very simple. If one were interested in getting the exact minimum values of the variances of these estimators, it should not be too difficult to do so. One would likely have to use the computer and some mathematical programming techniques.

The results also call for more attention to the choice of estimators and sampling designs when doing survey sampling. In particular, the results show again that, in most cases, simple random sampling is not an optimal sampling design. There are other, better ones. And when estimating the population mean, the sample average need not give optimal results. There may be other, better estimators.

In practice, the sampler, under this model, will partly be able to control the design function parameters. In such situations, studies like this may help the sampler make a proper choice of the design function parameter that will give the best results.

In this study, as in other similar studies, the ratio and mean-of-the-ratios estimators promise good results when selection is purposive under certain conditions.
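The remark above about finding the minimum variances by computer can be illustrated with a very small search. The sketch below is not the procedure used in this study: it assumes a closed form for the HT variance under the polynomial design that is inferred from the model (in units of $\mu_y^2/n$, finite only for -r < m < r + g with g ≤ 2), and it replaces mathematical programming with a plain grid search. The names `ht_var_poly` and `best_m` are hypothetical.

```python
from math import gamma


def ht_var_poly(m, r, k, g):
    """Assumed HT variance under the polynomial design, in units of mu_y^2/n.

    Inferred from the model, not quoted from the text; finite only for
    -r < m < r + g (with g <= 2).
    """
    numerator = gamma(m + r) * (gamma(r - m + 2) + k ** 2 * gamma(r + g - m))
    return numerator / (gamma(r) ** 2 * r ** 2) - 1


def best_m(r, k, g, step=0.01):
    """Grid search for the m that minimises the assumed HT variance."""
    lo, hi = -r, r + g
    n_steps = int(round((hi - lo) / step))
    # Skip both end points, where the variance is infinite.
    grid = [lo + step * i for i in range(1, n_steps)]
    m_star = min(grid, key=lambda m: ht_var_poly(m, r, k, g))
    return m_star, ht_var_poly(m_star, r, k, g)


if __name__ == "__main__":
    for r in (1, 2, 3):
        m_star, v_min = best_m(r, k=0.5, g=2)
        print(r, round(m_star, 2), round(v_min, 3))
```

Under this assumed form the search returns m close to 1 with a minimum of about 0.25 when k = 0.5 and g = 2, which matches the tabulated minimum of the HT variance at the pps design.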
Most sample survey experts object to this method of selection because, as Hansen, Hurwitz and Madow (1953, p.9) put it:

(a) Methods of selecting samples based on the theory of probability are the only general methods known to us which can provide a measure of precision. Only by using probability methods can objective numerical statements be made concerning the precision of the results of the survey;

(b) It is necessary to be sure that the conditions imposed by the use of probability methods are satisfied. It is not enough to hope or expect that they are. Steps must be taken to meet these conditions by selecting methods that are tested and are demonstrated to conform to the probability model.

They continue saying:

We assert that, with rare exceptions, the precision of estimates not based on known probabilities of selecting the samples cannot be predicted before the survey is made, nor can the probabilities or precision be estimated after the sample is obtained. If we know nothing of the precision, then we do not know whether to have much faith in our estimates, even though highly accurate measurements are made on the units in the sample.

Random sampling is usually supported for similar arguments, namely, it protects against failure of certain probabilistic assumptions, it averages out effects of unobserved or unknown random variables, it guards against unconscious bias on the part of the experimenter, it will usually produce a sample in which the X's are spread throughout the range of X values in the population and this enables the sampler to check the accuracy of assumptions concerning the relation of the y's to the X's and, again, it enables the sampler to estimate, from the sample, the precision of his estimate.
But probability methods can do nothing more than give us expectations about, say, the possible precision of the results of the survey. The precision would usually be stated in terms of the probability that the estimate deviates from the real value, and as long as it is given in these probability terms, it does nothing more than give expectations, however high and refined the probabilities may be. On the other hand, if many studies point to the fact that non-probability methods, like purposive selection, lead, under certain conditions, to efficient estimation, the same probability theory may allow that, under similar sampling conditions, we can expect, with high probability, to obtain similarly good results. As Royall (1970) argues, if the sampler believes it to be important that he obtain a sample in which the X values have a certain configuration, then he should choose a sample deliberately and not leave it to the choice of a certain chance mechanism.

In this study, the ratio estimator has once again shown its superiority over the mean-of-the-ratios and the HT estimators. The HT estimator has revealed its sensitivity to the choice of parameters of the model and of the sampling design. The mean-of-the-ratios estimator may not be very far off from the ratio estimator. The results of the study are quite similar to those of similar studies under different regression-type models.

I would like to note that stratified sampling combined with simple random sampling in each stratum can be achieved, under this model, by assigning the same P(x) to all members of one stratum and different P(x) to members of different strata.
Some Limitations

In putting the ideas contained in this study into practice, the order of events is (1) estimate F(x), the distribution function of x, (2) approximate g and θ, and (3) investigate some sampling designs and estimators and choose the ones that give the best results. This study should help the sampler examine, approximately, the behaviour of the different designs and estimators he is considering.

The assumption of an infinite population with an approximately continuous frequency distribution helps simplify the investigation of different designs, but it also makes the situation considered an idealization of the real situation.

To estimate the frequency function of x, one could start by observing the histogram of the x values and then choose or fit an approximate continuous function, possibly by some mathematical curve-fitting technique, that closely resembles the histogram. The continuous function thus obtained has to be standardized so that it yields a cumulative distribution function. To approximate θ and g, we could use likelihood methods like the ones suggested by Brewer (1963).

It is quite possible that, given the approximate distribution function of x, some problems will be encountered in evaluating the variances of the estimators we may want to investigate. Should it, for example, turn out that the X values are approximately normally distributed and the sampler wants to investigate the sampling designs and estimators I have studied, it would not be easy to evaluate the variances. With other distribution functions and sampling designs, however, the evaluation should be more straightforward, and the properties of such sampling strategies can be studied without much difficulty.
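The first two steps described above can be sketched in code. This is only an illustration under assumptions of my own: the gamma distribution is fitted by the method of moments rather than by curve fitting to a histogram, θ is approximated by the ratio of totals, and g is approximated by the least-squares slope of log squared residuals on log x (since Var(Z | x) = k² x^g implies that log Z² is linear in log x with slope g). This regression is a simple stand-in and is not the likelihood method of Brewer (1963). All function names are hypothetical.

```python
import math
import random
from statistics import fmean, variance


def fit_gamma_moments(xs):
    """Step 1: method-of-moments gamma fit, returning (shape, scale)."""
    m, v = fmean(xs), variance(xs)
    return m * m / v, v / m


def estimate_theta_g(xs, ys):
    """Step 2: rough estimates of theta and g under Y = theta * (X + Z)."""
    theta_hat = sum(ys) / sum(xs)
    # Residuals y/theta_hat - x approximate Z(x); regress log(e^2) on log(x).
    pts = []
    for x, y in zip(xs, ys):
        e = y / theta_hat - x
        if e != 0.0:
            pts.append((math.log(x), math.log(e * e)))
    u_bar = fmean([u for u, _ in pts])
    v_bar = fmean([v for _, v in pts])
    sxy = sum((u - u_bar) * (v - v_bar) for u, v in pts)
    sxx = sum((u - u_bar) ** 2 for u, _ in pts)
    return theta_hat, sxy / sxx


def simulate_population(n, theta, r, k, g, seed=0):
    """Hypothetical test data: unit-scale gamma X and normal Z (an assumption)."""
    rng = random.Random(seed)
    xs = [rng.gammavariate(r, 1.0) for _ in range(n)]
    ys = [theta * (x + rng.gauss(0.0, k * x ** (g / 2.0))) for x in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = simulate_population(5000, theta=2.0, r=3.0, k=0.5, g=1.0, seed=1)
    print("shape, scale:", fit_gamma_moments(xs))
    print("theta, g:", estimate_theta_g(xs, ys))
```

With the fitted distribution and the approximate values of θ and g in hand, step (3) amounts to evaluating candidate designs and estimators under the model, analytically where the integrals are tractable or otherwise by simulation.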
Cassel and Sarndal (1973) show that results obtained under this model are valid for values of N as low as N = 10.

Some Recommendations

I think sample survey theorists should spend more time simplifying and unifying the results of their research. They should spend more time refining and reducing the number of different sampling designs and estimators that they are considering. They should, somehow, formulate a simple unified theory of sampling that can easily be put into practice. In this connection, the estimators I have considered may prove useful, although some problems may well crop up initially. I also think that some ideas from General Statistical Theory can be useful in formulating a simple sample survey theory; there is no need to divorce one of the sampling theories from the other.

REFERENCES

Basu, D. (1971). An essay on the logical foundations of survey sampling, Part 1. In Foundations of Statistical Inference, ed. by V.P. Godambe and D.A. Sprott, Toronto: Holt, Rinehart and Winston of Canada Ltd., 203-242.

Brewer, K.R.W. (1963). Ratio estimation and finite populations: some results deducible from the assumption of an underlying stochastic process. Aust. J. Statist., 5, 93-105.

Cassel, C.M. and Sarndal, C.E. (1972a). A model for studying robustness of estimators and informativeness of labels in sampling with varying probabilities. J. R. Statist. Soc. B, 35, 279-289.

Cassel, C.M. and Sarndal, C.E. (1972b). An analytic framework offering some guidelines to the choice of sampling design and the choice of estimator for finite populations. Working paper No. 142, Faculty of Commerce, UBC.

Cassel, C.M. and Sarndal, C.E. (1973). Evaluation of some sampling strategies for finite populations using a continuous variable framework. To appear in the April issue of Communications in Statistics.

Cochran, W.G. (1946). Relative accuracy of systematic and stratified random samples for a certain class of populations. Ann. Math. Statist., 17, 164-177.

Cochran, W.G. (1963). Sampling Techniques. New York: Wiley.

Durbin, J. (1953). Some results in sampling theory when the units are selected with unequal probabilities. J. R. Statist. Soc. B, 15, 262-269.

Foreman, E.K. and Brewer, K.R.W. (1971). The efficient use of supplementary information in standard sampling procedures. J. R. Statist. Soc. B, 33, 391-400.

Godambe, V.P. (1955). A unified theory of sampling from finite populations. J. R. Statist. Soc. B, 17, 269-278.

Godambe, V.P. (1965). A review of the contributions towards a unified theory of sampling from finite populations. Rev. Int. Statist. Inst., 32, 242-253.

Godambe, V.P. (1969). Some aspects of the theoretical developments in survey sampling. In New Developments in Survey Sampling, ed. by N.L. Johnson and H. Smith, Jr., 27-58. New York: Wiley.

Godambe, V.P. (1970). Foundations of survey-sampling. The American Statistician, 24, 33-38.

Godambe, V.P. and Joshi, V.M. (1965). Admissibility and Bayes estimation in sampling finite populations, I. Ann. Math. Statist., 36, 1707-1722.

Goodman, R. and Kish, L. (1950). Controlled selection - a technique in probability sampling. J. Amer. Statist. Assoc., 45, 350-372.

Hansen, M.H., Hurwitz, W.N. and Madow, W.G. (1953). Sample Survey Methods and Theory, Vol. II. New York: John Wiley & Sons.

Hanurav, T.V. (1967). Optimum utilization of auxiliary information: πPS sampling of two units from a stratum. J. R. Statist. Soc. B, 29, 374-391.

Hartley, H.O. and Rao, J.N.K. (1962). Sampling with unequal probabilities and without replacement. Ann. Math. Statist., 2, 77-86.

Horvitz, D.G. and Thompson, D.J. (1952). A generalization of sampling without replacement from a finite universe. J. Amer. Statist. Assoc., 47, 663-685.

Joshi, V.M. (1969). Admissibility of estimates of the mean of a finite population. In New Developments in Survey Sampling, ed. by N.L. Johnson and H. Smith, Jr., 188-207. New York: Wiley.

Neyman, J. (1934). On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. J. R. Statist. Soc., 97, 558-606.

Rao, J.N.K. (1969). Ratio and regression estimators. In New Developments in Survey Sampling, ed. by N.L. Johnson and H. Smith, Jr., 213-234. New York: Wiley.

Rao, P.S.R.S. (1969). Comparison of four ratio-type estimates under a model. J. Amer. Statist. Assoc., 64, 574-580.

Rao, T.J. (1967). On the choice of a strategy for the ratio method of estimation. J. R. Statist. Soc. B, 29, 392-397.

Royall, R.M. (1970). On finite population sampling theory under certain linear regression models. Biometrika, 57, 377-387.

Royall, R.M. (1971). Linear regression models in finite population sampling theory. In Foundations of Statistical Inference, ed. by V.P. Godambe and D.A. Sprott, Toronto: Holt, Rinehart and Winston of Canada Ltd., 259-279.

Sarndal, C.E. (1972). Sample survey theory vs. general statistical theory: estimation of the population mean. Rev. Int. Statist. Inst., 40, 1-12.

Sukhatme, P.V. and Sukhatme, B.V. (1970). Sampling Theory of Surveys with Applications. Bombay: Asia Publishing House.
Item Metadata
Title | The ratio, mean-of-the ratios and Horvitz-Thompson estimators under the continuous variable model |
Creator | Chamwali, Anthony Alifa |
Publisher | University of British Columbia |
Date Issued | 1974 |
Description | This study investigates the performances of the ratio estimator, the mean-of-the-ratios estimator and the Horvitz-Thompson (HT) estimator under the continuous variable model of Cassel and Sarndal (1972a, 1972b, 1973). Under this model, the character, Y, which is of interest to the investigator is assumed to be related to an auxiliary variable, X, by Y(Xi) = θ(Xi + Z(Xi)) where ℇ(Zi | Xi) = 0 ∀ Xi ∈ (0, ∞); ℇ(Zi² | Xi) = σ²(Xi) = k² Xi^g; ℇ(ZiZj | XiXj) = 0 (i ≠ j). It is assumed, in this paper, that X is gamma distributed over (0, ∞) with parameter r. The mean of Y is to be estimated, under the additional assumptions that the design function, P(x), is 1) polynomial, 2) exponential, i.e. [formulas are not included]. It is observed that for g = 0 or 1, the ratio estimator performs better than the other two. For g = 0, 1 or 2, and for a wider range of values of m or c, the mean-of-the-ratios estimator performs better than the HT estimator. When P(X) is polynomial, the HT estimator is most efficient if the sampling design is approximately pps. The results compare well with those of other researchers under similar assumptions. |
Genre | Thesis/Dissertation |
Type | Text |
Language | eng |
Date Available | 2010-01-21 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0092965 |
URI | http://hdl.handle.net/2429/18800 |
Degree | Master of Science in Business - MScB |
Program | Business Administration |
Affiliation | Business, Sauder School of |
Degree Grantor | University of British Columbia |
Campus | UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |