@prefix vivo: . @prefix edm: . @prefix ns0: . @prefix dcterms: . @prefix skos: . vivo:departmentOrSchool "Business, Sauder School of"@en ; edm:dataProvider "DSpace"@en ; ns0:degreeCampus "UBCV"@en ; dcterms:creator "Chamwali, Anthony Alifa"@en ; dcterms:issued "2010-01-21T00:34:03Z"@en, "1974"@en ; vivo:relatedDegree "Master of Science in Business - MScB"@en ; ns0:degreeGrantor "University of British Columbia"@en ; dcterms:description "This study investigates the performances of the ratio estimator, the mean-of-the-ratios estimator and the Horvitz-Thompson (HT) estimator under the continuous variable model of Cassel and Sarndal (1972a, 1972b, 1973). Under this model, the character, Y, which is of interest to the investigator is assumed to be related to an auxiliary variable, X, by Y(Xi) = θ(Xi + Z(Xi)) where ℇ(Zi | Xi) = 0; ∀Xi ∈ (0, ∞); ℇ(Zi² | Xi) = σ² (Xi) = k² Xi^g; ℇ(ZiZj | XiXj) = 0; (i ≠ j). It is assumed, in this paper, that X is gamma distributed over (0, ∞) with parameter r. The mean of Y is to be estimated, under the additional assumptions that the design function, P(x), is 1) polynomial 2) exponential, i.e. [formulas are not included]. It is observed that for g = 0 or 1, the ratio estimator performs better than the other two. For g = 0, 1 or 2, and for a wider range of values of m or c, the mean-of-the-ratios estimator performs better than the HT estimator. When P(X) is polynomial, the HT estimator is most efficient if the sampling design is approximately pps. The results compare well with those of other researchers under similar assumptions."@en ; edm:aggregatedCHO "https://circle.library.ubc.ca/rest/handle/2429/18800?expand=metadata"@en ; skos:note "THE RATIO, MEAN-OF-THE-RATIOS AND HORVITZ-THOMPSON ESTIMATORS UNDER THE CONTINUOUS VARIABLE MODEL BY ANTHONY ALIFA CHAMWALI B. 
SC., UNIVERSITY OF EAST AFRICA - THE UNIVERSITY COLLEGE, DAR-ES-SALAAM, 1970
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE (BUS. ADMIN.) IN THE FACULTY OF COMMERCE AND BUSINESS ADMINISTRATION
WE ACCEPT THIS THESIS AS CONFORMING TO THE REQUIRED STANDARD
THE UNIVERSITY OF BRITISH COLUMBIA
APRIL, 1974

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.
Department of Commerce and Business Administration
The University of British Columbia
Vancouver 8, Canada
Date: April, 1974

ABSTRACT
This study investigates the performances of the ratio estimator, the mean-of-the-ratios estimator and the Horvitz-Thompson (HT) estimator under the continuous variable model of Cassel and Sarndal (1972a, 1972b, 1973). Under this model, the character, Y, which is of interest to the investigator is assumed to be related to an auxiliary variable, X, by
Y(X_i) = θ(X_i + Z(X_i))
where
ℇ(Z_i | X_i) = 0, ∀X_i ∈ (0, ∞)
ℇ(Z_i² | X_i) = σ²(X_i) = k² X_i^g
ℇ(Z_i Z_j | X_i X_j) = 0 (i ≠ j)
It is assumed, in this paper, that X is gamma distributed over (0, ∞) with parameter r. The mean of Y is to be estimated, under the additional assumptions that the design function, P(x), is 1) polynomial 2) exponential, i.e.
1) P(X) = [Γ(r)/Γ(m+r)] X^m
2) P(X) = (1−c)^r e^(cX)
It is observed that for g = 0 or 1, the ratio estimator performs better than the other two. For g = 0, 1 or 2, and for a wider range of values of m or c, the mean-of-the-ratios estimator performs better than the HT estimator. When P(X) is polynomial, the HT estimator is most efficient if the sampling design is approximately pps. The results compare well with those of other researchers under similar assumptions.

TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENTS
1 INTRODUCTION
  Classical Sampling Theory
  The Sample Survey Theory
  Search for Optimal Estimators and Sampling Designs
  The Ratio, Mean-of-the-ratios and HT Estimators
2 THE PROBLEM
  Statement of the Problem
  Method Used and Problems of Evaluation
3 THE RESULTS
  Results for P(x) Polynomial and k = 0.5
  Results for P(x) Polynomial and k = 1.0
  Results for P(x) Exponential and k = 0.5
  Results for P(x) Exponential and k = 1.0
  Summary of Results and Some Conclusions
4 DISCUSSION OF RESULTS
  Some Empirical Comparisons
  Comments and Some Implications
  Some Limitations
  Some Recommendations
REFERENCES

LIST OF TABLES
Table I. Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 0.5, g = 0
Table II. Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 0.5, g = 1
Table III. Variances of μ̂_YHT and μ̂_Ymr when X is Gamma Distributed, P(x) is Polynomial, k = 0.5, g = 2
Table IV. Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 1, g = 0
Table V. Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 1, g = 1
Table VI. Var(μ̂_YHT) and Var(μ̂_Ymr) when X is Gamma Distributed, P(x) is Polynomial, k = 1, g = 2
Table VII. Variances of the Estimators when X is Gamma Distributed, P(x) is Exponential, k = 0.5, g = 0
Table X. Variances of the Estimators when X is Gamma Distributed, P(x) is Exponential, k = 1, g = 0

[...] lim P[|θ̂_n − θ| > ε] = 0 as n → ∞, ∀ε > 0.
(c) There is no other sequence of consistent estimators θ*_1, θ*_2, ..., θ*_n, ... for which the distribution of √n(θ*_n − θ) approaches the normal distribution with mean 0 and variance σ*²(θ) and such that σ²(θ)/σ*²(θ) > 1 for all θ in some open interval.
If the unbiased θ̂ = d(Y_1, Y_2, ..., Y_n) is a linear function of Y and if Var(θ̂) ≤ Var(θ̂_p) for all linear θ̂_p, then θ̂ is a Uniformly Best Linear Unbiased Estimator (UBLUE) of θ.
It may be worthwhile noting that while general statistical theory aims at making inferences about some frequency function of a hypothetical population, it may
actually be inferring about the possible chance mechanism (discussed above) that produced the given set of observations, which may have nothing to do with the hypothetical population at all.

The Sample Survey Theory
In Sample Survey Theory, the populations considered are finite. They consist of elements which are real and countable. They are not hypothetical like the ones considered in the General Statistical Theory. This is the main difference between the populations dealt with by the two sampling theories. In this sense the problems of inference in the two models of sampling theory may be taken to be different.
In survey sampling, since the investigator deals with real and finite populations, he has choice over the sampling design and the estimator he may want to use. Occasionally, circumstances may necessitate the use of nonprobability sampling altogether. Cochran (1963) gives the following situations that lead to nonprobability sampling:
(1) When the sample is restricted to a part of the population that is readily accessible.
(2) When the sample is selected haphazardly.
(3) With a small but heterogeneous population, the sampler inspects the whole of it and selects a small sample of 'typical' units, i.e., units that are close to his impression of the average of the population. This method is sometimes called judgement or purposive selection.
(4) Lastly, when the sample consists essentially of volunteers.
He notes that these methods may give good results under the right conditions although they are not amenable to the development of a sampling theory owing to lack of random selection.
But whether one deals with Sample Survey Theory or General Statistical Theory, he is dealing with the same problem of inferring from the sample to the population, usually with the help of Probability Theory and Statistics. In the General Statistical Theory, random sampling solved nearly all the sampling design problems. With survey populations, things are not that easy. The nature of the population may make random sampling difficult; it may not be possible to identify all the units in the population. In most cases, the frequency function of the character under consideration is unknown. This makes it difficult to check some of the optimality conditions of estimators in the General Statistical Theory. And optimal estimators under the General Statistical Theory need not be so under the Sample Survey Theory. This raises the main problem of finding sampling designs and estimators that are optimal for the sample survey situation.
Mathematically, in the sample survey model, we have a set P of N units that constitute the population,
P = (U_1, U_2, ..., U_N)
where U_i stands for unit i. An unknown quantity Y_i which is of interest to the surveyor is associated with U_i. The surveyor wants to know
θ = θ(Y_1, Y_2, ..., Y_N)
He may also know the auxiliary variable X, i.e.
X = (X_1, X_2, ..., X_N)
associated with P. He selects a subset
S = (u_1, u_2, ..., u_n)
of units of P and observes the corresponding
Y = (Y_1, Y_2, ..., Y_n)
(response errors are ignored). The sampling plan generates S. He then calculates
θ̂ = θ̂(Y_1, Y_2, ..., Y_n)
as an estimator of θ.
The problem is how to select S (how to assign the probability that unit i of P will be included in S) and how to choose the random variable θ̂ to get an estimate which is as near θ as possible. Certainly, it is not just a matter of getting a 'representative' sample and calculating the mean, variance and median of the sample values. Besides, what is a representative sample?

Search for Optimal Estimators and Sampling Designs
In 1934 Neyman introduced the Gauss-Markov theorem to obtain a linear unbiased minimum variance estimate for the mean of a survey population. He established that, for simple random sampling, the sample mean was the minimum variance linear unbiased estimate of the survey population mean. This was an attempt to fit survey sampling into the hypothetical population model, and his findings stimulated most sample-survey statisticians to find, with the help of the Gauss-Markov theorem, efficient (i.e. minimum variance) unbiased estimators for a variety of more complex designs. Sampling procedures like sampling with arbitrary probabilities, stratified sampling, cluster sampling, 2-stage sampling, multi-stage sampling, etc. were designed to reduce the variance of the estimators (see, for example, Goodman and Kish (1950), Durbin (1953), Hartley and Rao (1962), Cochran (1963)).
Horvitz and Thompson (1952) attempted to provide a general method for dealing with sampling without replacement from a finite population when variable probabilities of selection are used for the elements remaining prior to each draw. They, in particular, considered a general estimator of the population total of the form
Ŷ = Σ_{i=1}^{n} β_i Y_i
where β_i (i = 1, ..., N) is a constant to be used as a weight for the i-th unit whenever it is selected for the sample.
Letting P(X_i) be the probability that the i-th element will be included in a sample of size n, they showed that the weight 1/P(X_i) makes Ŷ unbiased and of minimum variance (X_i is the value of the auxiliary variable associated with unit i).
Godambe (1955, 1965) considered the general estimator
Ŷ = Σ_{i=1}^{n} b_si Y_i
where b_si is defined in advance for all the logically possible samples S, and for all i in S. This is a more general class of estimators than the ones considered by Horvitz and Thompson. He proved the non-existence of a uniformly minimum variance (UMV) unbiased estimator in this class of estimators for any design P(S), excepting those in which no two S with P(S) > 0 have at least one common and one uncommon unit.
Because of the non-existence of a UMV unbiased estimator, other criteria of goodness of an estimator were sought. Godambe and Joshi (1965) considered the criterion of admissibility and proved that the Horvitz-Thompson (HT) estimator is admissible in the class of all p-unbiased estimators of Y, the population total, for any design P(S) such that
π_j = Σ_{S ∋ u_j} P(S) > 0, ∀u_j
The admissibility criterion, however, was satisfied by many other estimators, and other new criteria like 'hyper-admissibility' and 'necessary bestness' (Basu, 1971) were introduced.
A detailed examination of these optimality properties raises some questions regarding their relevance. For example, the Horvitz and Thompson estimator, Ŷ_HT, is uniquely 'hyper-admissible' whatever the character under investigation or the sample design. But Ŷ_HT may lead to disastrous results (Sarndal, 1972; Basu, 1971).
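
The inclusion probabilities π_j and the inverse-probability weights discussed above can be checked on a toy population. The following is a minimal sketch, not taken from the thesis: the four y-values, the size measure and the design (each pair of units drawn with probability proportional to its combined size) are hypothetical, and the function names are illustrative only. It enumerates every sample and confirms that the design expectation of the HT estimator equals the population total.

```python
from itertools import combinations

def inclusion_probabilities(design, N):
    # pi_j = sum of P(S) over all samples S that contain unit j
    pi = [0.0] * N
    for sample, prob in design.items():
        for j in sample:
            pi[j] += prob
    return pi

def ht_total(sample, y, pi):
    # Horvitz-Thompson estimate of the population total from one sample
    return sum(y[j] / pi[j] for j in sample)

# Hypothetical population of N = 4 units and a hypothetical size measure.
y = [3.0, 5.0, 8.0, 12.0]
size = [1.0, 2.0, 3.0, 4.0]
N = len(y)

# Hypothetical design: every sample of n = 2 distinct units, with P(S)
# proportional to the combined size of the two units.
samples = list(combinations(range(N), 2))
weights = [size[i] + size[j] for i, j in samples]
total_weight = sum(weights)
design = {s: w / total_weight for s, w in zip(samples, weights)}

pi = inclusion_probabilities(design, N)

# Design expectation: sum over all samples of P(S) times the estimate from S.
expected = sum(prob * ht_total(s, y, pi) for s, prob in design.items())

print('inclusion probabilities:', pi)
print('design expectation:', expected, 'population total:', sum(y))
```

The check works for any design with all π_j > 0, because each y_j/π_j is counted with total probability π_j. It says nothing about the variance, which is where the difficulties cited above arise.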
Many variations of the regression type population model
y_i = a x_i + Z_i, i = 1, ..., N
have been used, where y_i is the character of interest and x_i is an auxiliary variable for unit i in the population; Z_i is an error component. Cochran (1946), Godambe (1955, 1965), Royall (1970, 1971) and many others used the super-population approach in conjunction with the regression model to compare efficiencies between sampling methods. Cassel and Sarndal (1972a, 1972b, 1973) use the continuous variable model of the form
Y(X) = θ(X + Z(X)), ∀X ∈ (0, ∞)
where
ℇ(Z(X) | X) = 0, ℇ(Z(X)² | X) = σ²(X)
ℇ(Z(X_1) Z(X_2) | X_1, X_2) = 0 (X_1 ≠ X_2)
and the auxiliary variable X is assumed to have a known distribution described by the distribution function F(x) over (0, ∞).
Different notions of unbiasedness have been used during this search for optimal estimators and designs. An estimator, Ŷ for Y, is called design-unbiased or p-unbiased if
E_p(Ŷ) = Σ_{s ∈ S} P(s) Ŷ = Y
for all vectors (Y_1, ..., Y_N), where P(s) is a probability function defined on the set S of subsets s of labels (1, ..., N). An estimator is called model-unbiased or ℇ-unbiased if it is unbiased under the assumptions of the specified model. Lastly, an estimator Ŷ is called E_pℇ-unbiased for Y if
E_pℇ(Ŷ) = Σ_{s ∈ S} P(s) ℇ(Ŷ) = Y
An estimator can be p-unbiased but not ℇ-unbiased and vice versa.

The Ratio, Mean-of-the-Ratios and the HT Estimators
It is a fact that using supplementary information in sampling designs and estimators greatly improves the accuracy of the estimates. Three estimators that have received considerable attention in the literature of sample survey theory and which make use of supplementary information are the Ratio estimator, the Mean-of-the-ratios estimator and the Horvitz-Thompson (HT) estimator.
If X is an auxiliary variable, the ratio estimator for the population mean, μ_Y, is given by
μ̂_YR = (ȳ/x̄) X̄ = (y/x) X̄   (1.1)
where y = Σ_{i=1}^{n} Y_i, x = Σ_{i=1}^{n} X_i, X̄ = (1/N) Σ_{i=1}^{N} X_i, ȳ = (1/n) Σ_{i=1}^{n} Y_i, x̄ = (1/n) Σ_{i=1}^{n} X_i, n is the sample size and N is the population size.
The mean-of-the-ratios estimator is
μ̂_Ymr = (X̄/n) Σ_{i=1}^{n} y_i/x_i   (1.2)
The ratio estimators make use of the ratios ȳ/x̄ and y_i/x_i in order to improve estimation.
With simple random sampling from an infinite population, the ratio estimator of μ_Y has been shown to be the best linear unbiased estimate if two conditions are satisfied (Cochran, 1963, p. 166, Theorem 6.4):
(1) The relation between y_i and x_i is a straight line through the origin.
(2) The variance of y_i about this line is proportional to x_i.
When the variance of y_i about this line is proportional to x_i², using the mean-of-the-ratios estimator gives much better performance than the other estimator.
When the relation between y_i and x_i is linear but the line does not go through the origin, the regression estimator
ȳ_lr = ȳ + b(X̄ − x̄)
performs much better than the others. ȳ_lr reduces to ȳ if b = 0 and to μ̂_YR if b = ȳ/x̄.
Both the ratio and regression estimators are consistent and, generally, slightly biased, but with sampling designs like stratified sampling, the bias of the ratio estimator may be considerable. This has led to a search for unbiased or better ratio-type estimators. J.N.K. Rao (1969) gives some results of the search. The results indicate that under certain conditions, other ratio-type estimators perform better than the traditional ratio estimator.
The Horvitz-Thompson estimator is given by
μ̂_YHT = (1/n) Σ_{i=1}^{n} y_i / P(x_i)   (1.3)
where P(x_i) is the probability that a unit with auxiliary variable x_i will be included in the sample. The estimated variance of the HT estimator may be negative, or it may not reduce to zero even when all the Y-values are the same and the variance should actually be zero.
In many studies the HT estimator has been shown to compete very well with the ratio estimators. Assuming a linear stochastic model of the form
Y_i = α + βX_i + Z_i, ℇ(Z_i²) = σ² X_i^(2γ)
Foreman and Brewer (1971) showed that the HT estimator is more efficient than the ratio estimator with equal selection probabilities if γ > ½, but they caution that if the sampling fraction is large and α is appreciable, the ratio estimator may be made more efficient than the HT estimator.
Rao (1967) gives some conditions under which the mean-of-the-ratios estimator is superior to both the ratio estimator and the HT estimator. Unfortunately, the mean-of-the-ratios estimator is not consistent like, say, the ratio estimator (Sukhatme and Sukhatme, 1970, p. 160).

CHAPTER II
THE PROBLEM

Statement of the Problem
The problem with simple random sampling is that it does not take into account the possible importance of the larger units in the population. Because of this, sampling designs like sampling with probability proportional to size (pps), generally known as sampling with varying probabilities, came to be used. These are more complex designs than simple random sampling. It was also realized that it makes a difference whether sampling is done with replacement or without replacement. However, the estimators considered were very complicated.
To get simple estimators, sampling schemes like the Midzuno system of sampling, the Narain method of sampling, systematic sampling with varying probabilities, etc. were introduced. It is very clear, from such studies, that performances of estimators can be improved not only by making use of supplementary information, but also by choosing the proper sampling design.
The ratio estimator, the mean-of-the-ratios estimator and the Horvitz-Thompson estimator have performed quite efficiently in some of the research work in sample survey literature. But which of these three does better estimation than the others depends on the sampling design used, the way the supplementary information has been used, the class of estimators used and the way 'bias' or 'unbiased' is defined. Joshi (1971), for example, states that
(i) The HT estimate is always admissible in the class of all unbiased estimates, linear and non-linear.
(ii) In the entire class of estimates, the HT estimator is admissible if the sampling design is of fixed sample size.
(iii) If the loss function, V(t), where t is the numerical difference between the estimated and true values, satisfies only
(a) V(t) is non-decreasing in [0, ∞),
(b) for every K > 0, ∫_0^∞ V(t) exp(−Kt²/2) dt < ∞,
then the sample mean, and more generally the ratio estimate, is always admissible as an estimate of the population mean.
The use of these three estimators with the regression type models has been shown to give very promising results.
I would like to investigate the performances of these three estimators (the ratio estimator, the mean-of-the-ratios estimator and the HT estimator) under the regression type continuous variable model of Cassel and Sarndal (1973).
I would like to consider the model
Y(x_i) = θ(x_i + Z(x_i))
where
ℇ(Z_i | x_i) = 0, ∀x_i ∈ (0, ∞)
ℇ(Z_i² | x_i) = σ²(x_i) = k² x_i^g
ℇ(Z_i Z_j | x_i x_j) = 0 (i ≠ j)
where Y_i is the value of the character under investigation for unit i and x_i is the value of the corresponding auxiliary variable. I further assume that the auxiliary variable X is gamma distributed over (0, ∞), i.e.
f(x) = x^(r−1) e^(−x) / Γ(r) for x ∈ (0, ∞)
Hence its mean is
E(x) = ∫_0^∞ x f(x) dx = r
My aim is to estimate the population mean
E_pℇ(Y(x)) = θr = μ_Y
The three estimators of the population mean that I would like to consider are
μ̂_YHT = (1/n) Σ_{i=1}^{n} y(x_i)/P(x_i)   (2.1)
μ̂_Ymr = (r/n) Σ_{i=1}^{n} y(x_i)/x_i   (2.2)
μ̂_YR = r Σ_{i=1}^{n} y(x_i) / Σ_{i=1}^{n} x_i   (2.3)
where the design function, P(x_i), is the selection probability density of a unit with auxiliary variable x_i, and n is the number of units in the sample. Draws are made independently of each other according to the same distribution of inclusion probabilities. (2.1) is the Horvitz-Thompson estimator, (2.2) is the mean-of-the-ratios estimator and (2.3) is the classical ratio estimator. The three estimators are E_pℇ-unbiased and their MSE's, then, equal their E_pℇ-variances.
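
Estimators (2.1) to (2.3) can be sketched in a few lines. This is an illustration under stated assumptions rather than the procedure used in the thesis: the design function is taken to be polynomial with the normalisation P(x) = [Γ(r)/Γ(m+r)] x^m, so that draws from the density P(x) f(x) are Gamma(m + r) variates; the errors Z are taken to be normal, although the model fixes only their mean and variance; all function names are hypothetical.

```python
import math
import random

def design_density(x, r, m):
    # Assumed polynomial design function P(x) = Gamma(r) / Gamma(m + r) * x**m
    return math.gamma(r) / math.gamma(m + r) * x ** m

def draw_sample(n, r, m, theta, k, g, rng):
    # x is drawn from P(x) f(x), which is a Gamma(m + r) density with unit scale
    xs = [rng.gammavariate(m + r, 1.0) for _ in range(n)]
    # Assumption: normal errors with mean 0 and variance k**2 * x**g
    ys = [theta * (x + rng.gauss(0.0, k * x ** (g / 2.0))) for x in xs]
    return xs, ys

def mu_ht(xs, ys, r, m):
    # Horvitz-Thompson estimator (2.1)
    return sum(y / design_density(x, r, m) for x, y in zip(xs, ys)) / len(xs)

def mu_mr(xs, ys, r):
    # Mean-of-the-ratios estimator (2.2); r is the known mean of X
    return r * sum(y / x for x, y in zip(xs, ys)) / len(xs)

def mu_ratio(xs, ys, r):
    # Classical ratio estimator (2.3)
    return r * sum(ys) / sum(xs)

rng = random.Random(1974)
r, m, theta, k, g = 2.0, 1.0, 3.0, 0.5, 1.0
xs, ys = draw_sample(200, r, m, theta, k, g, rng)
print('target mean theta * r =', theta * r)
print('HT:', mu_ht(xs, ys, r, m))
print('mean of ratios:', mu_mr(xs, ys, r))
print('ratio:', mu_ratio(xs, ys, r))
```

With k = 0 and m = 1 (pps) all three estimators return θr exactly, which is a convenient sanity check on the implementation.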

Method Used and Problems of Evaluation
The efficiencies of these estimators will be evaluated in terms of their E_pℇ-variances. Generally, the expressions for the variances of these estimators are
Var(μ̂_Y) = E_pℇ(μ̂_Y²) − (E_pℇ(μ̂_Y))²
Var(μ̂_YHT) = (μ_Y²/n) [ (1/r²) ∫_0^∞ ((x² + k² x^g) / P(x)) f(x) dx − 1 ]
Var(μ̂_Ymr) = (μ_Y²/n) ∫_0^∞ k² x^(g−2) P(x) f(x) dx
Var(μ̂_YR) = μ_Y² ∫ [k² Σ_{i=1}^{n} x_i^g / (Σ_{i=1}^{n} x_i)²] P(x) f(x) d(x)
where P(x) = Π_{i=1}^{n} P(x_i), f(x) = Π_{i=1}^{n} f(x_i), d(x) = Π_{i=1}^{n} dx_i and the last integral is n-dimensional (over 0 < x_i < ∞ for i = 1, ..., n).
The variances depend essentially on three things, namely, the shape of the continuous distribution, f(x), of the auxiliary variable, x; the variance of Z_i; and the selection probability density, P(x). I will compare the variances under 1) polynomial P(x), 2) exponential P(x).

(1) Let P(x) = [Γ(r)/Γ(m+r)] x^m (m = 0 represents simple random sampling, m = 1 represents the pps sampling scheme). In this case the expressions for the variances of these estimators are:
Var(μ̂_YHT) = (μ_Y²/n) [ (Γ(m+r) / (r² Γ²(r))) {Γ(2+r−m) + k² Γ(g+r−m)} − 1 ]   (2.4)
Var(μ̂_Ymr) = (μ_Y²/n) k² Γ(g+r+m−2) / Γ(m+r)   (2.5)
for g = 0:
Var(μ̂_YR) = n k² μ_Y² / ((nm+nr−1)(nm+nr−2)) ≈ (μ_Y²/n) k²/(m+r)² (if n is large)   (2.6)
for g = 1:
Var(μ̂_YR) = k² μ_Y² / (nm+nr−1) ≈ (μ_Y²/n) k²/(m+r) (for n large)   (2.7)
I evaluate these expressions numerically for r = 1, 2, 3, with (k, g) fixed at (0.5, 0), (0.5, 1), (0.5, 2), (1, 0), etc., and as m takes the values −1 (0.5) 2, 3, 4. I will observe how the variances of these estimators behave.

(2) Let P(x) = (1−c)^r e^(cx), c < 1 (c = 0 represents simple random sampling). In this case the expressions for the variances are:
Var(μ̂_YHT) = (μ_Y²/n) [ (1 / (r² (1−c)^r Γ(r))) {Γ(r+2)/(c+1)^(2+r) + k² Γ(g+r)/(c+1)^(g+1)} − 1 ]   (2.8)
Var(μ̂_Ymr) = (μ_Y²/n) k² (1−c)^(2−g) Γ(g+r−2) / Γ(r)   (2.9)
for g = 0:
Var(μ̂_YR) = n k² μ_Y² (1−c)² / ((nr−1)(nr−2)) ≈ (μ_Y²/n) k² (1−c)²/r² (for n large)   (2.10)
for g = 1:
Var(μ̂_YR) = k² μ_Y² (1−c) / (nr−1) ≈ (μ_Y²/n) k² (1−c)/r (for n large)   (2.11)
As with the polynomial case, these expressions are evaluated and compared as values of r, c, g, k vary.
It has not been possible to evaluate the variance of the ratio estimator when g = 2; only the variances of the HT estimator and the mean-of-the-ratios estimator will be compared when g = 2. With exponential P(x), the variance of the mean-of-the-ratios estimator is finite only if r + g > 2. Throughout, I have used the approximate expressions of Var(μ̂_YR) for my evaluations. The results are tabulated below and illustrated by graphs.

CHAPTER 3
THE RESULTS

Results for P(x) Polynomial and k = 0.5
The results are presented in Tables I - III and illustrated by Figures 1-3.
When g = 0 the ratio estimator has, generally, the smallest variance for fixed values of r and m, followed by the mean-of-the-ratios estimator and then by the HT estimator. When r = 1 not much comparison can be made between the HT estimator and the mean-of-the-ratios estimator, as the variance of the HT estimator is finite for −1.0 ≤ m < 1 while the variance of the mean-of-the-ratios estimator is finite for m > 1. In the range of values of m for which the variances of the ratio estimator and of either the HT estimator or the mean-of-the-ratios estimator are finite, the ratio estimator has smaller variance. When r = 2 and 0 [...]

TABLE I
Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 0.5, g = 0. (Each entry should be multiplied by μ_Y²/n.)
       |          r = 1          |          r = 2          |          r = 3
  m    | μ̂_YHT | μ̂_Ymr | μ̂_YR  | μ̂_YHT | μ̂_Ymr | μ̂_YR  | μ̂_YHT | μ̂_Ymr  | μ̂_YR
 -1.0  |       |       |       |       |       |       |       |        | 0.063
 -0.5  | 5.28  |       | 1.00  | 1.65  |       | 0.11  | 0.97  | 0.33   | 0.040
  0.0  | 1.25  |       | 0.25  | 0.56  |       | 0.063 | 0.36  | 0.13   | 0.028
  0.5  | 0.57  |       | 0.11  | 0.22  | 0.33  | 0.040 | 0.11  | 0.067  | 0.020
  1.0  | ∞     | ∞     | 0.063 | 0.13  | 0.13  | 0.028 | 0.042 | 0.042  | 0.016
  1.5  | ∞     | 0.33  | 0.040 | 0.47  | 0.067 | 0.020 | 0.15  | 0.029  | 0.012
  2.0  |       | 0.13  | 0.028 | ∞     | 0.042 | 0.016 | 0.50  | 0.021  | 0.010
  3.0  |       | 0.042 | 0.016 |       | 0.021 | 0.010 | ∞     | 0.013  | 0.0069
  4.0  |       | 0.02  | 0.01  |       | 0.013 | 0.0069|       | 0.0083 | 0.0050

TABLE II
Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 0.5, g = 1. (Each entry should be multiplied by μ_Y²/n.)
       |          r = 1          |          r = 2          |          r = 3
  m    | μ̂_YHT | μ̂_Ymr | μ̂_YR  | μ̂_YHT | μ̂_Ymr | μ̂_YR  | μ̂_YHT | μ̂_Ymr | μ̂_YR
 -1.0  | ∞     | ∞     |       | 5.37  | ∞     | 0.25  | 2.50  | 0.25  | 0.13
 -0.5  | 5.48  |       | 0.5   | 1.76  | 0.50  | 0.17  | 1.04  | 0.17  | 0.10
  0.0  | 1.25  | ∞     | 0.25  | 0.63  | 0.25  | 0.13  | 0.42  | 0.13  | 0.083
  0.5  | 0.37  | 0.5   | 0.17  | 0.21  | 0.17  | 0.10  | 0.14  | 0.10  | 0.073
  1.0  | 0.25  | 0.25  | 0.13  | 0.13  | 0.13  | 0.083 | 0.08  | 0.083 | 0.063
  1.5  | 2.52  | 0.17  | 0.10  | 0.29  | 0.10  | 0.071 | 0.18  | 0.071 | 0.055
  2.0  | ∞     | 0.13  | 0.08  | 0.86  | 0.08  | 0.063 | 0.50  | 0.063 | 0.050
  3.0  |       | 0.08  | 0.06  | ∞     | 0.063 | 0.050 | 3.17  | 0.050 | 0.043
  4.0  |       | 0.06  | 0.05  |       | 0.05  | 0.042 | ∞     | 0.041 | 0.035

TABLE III
Variances of μ̂_YHT and μ̂_Ymr when X is Gamma Distributed, P(x) is Polynomial, k = 0.5, g = 2. (Each entry should be multiplied by μ_Y²/n.)
       |      r = 1      |      r = 2      |      r = 3
  m    | μ̂_YHT | μ̂_Ymr | μ̂_YHT | μ̂_Ymr | μ̂_YHT | μ̂_Ymr
 -1.0  | ∞     | 0.25  | 6.50  | 0.25  | 3.17  | 0.25
 -0.5  | 6.36  | 0.25  | 2.21  | 0.25  | 1.41  | 0.25
  0.0  | 1.50  | 0.25  | 0.88  | 0.25  | 0.67  | 0.25
  0.5  | 0.47  | 0.25  | 0.38  | 0.25  | 0.34  | 0.25
  1.0  | 0.25  | 0.25  | 0.25  | 0.25  | 0.25  | 0.25
  1.5  | 0.47  | 0.25  | 0.38  | 0.25  | 0.34  | 0.25
  2.0  | 1.50  | 0.25  | 0.86  | 0.25  | 0.67  | 0.25
  3.0  |       | 0.25  | 6.25  | 0.25  | 3.17  | 0.25
  4.0  |       | 0.25  |       | 0.25  | 24.00 | 0.25

[FIGURE 3: Var(μ̂_YHT) and Var(μ̂_Ymr) when X is Gamma Distributed, P(x) is Polynomial, for r = 1, 2, 3 (to illustrate Table III)]

TABLE IV
Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 1, g = 0. (Each entry should be multiplied by μ_Y²/n.)
       |          r = 1          |          r = 2          |          r = 3
  m    | μ̂_YHT | μ̂_Ymr | μ̂_YR  | μ̂_YHT | μ̂_Ymr | μ̂_YR  | μ̂_YHT | μ̂_Ymr | μ̂_YR
 -1.0  |       |       |       | 5.50  |       | 1.00  | 2.50  |       | 0.25
 -0.5  | 6.63  |       | 4.00  | 1.86  |       | 0.43  | 1.08  | 1.32  | 0.16
  0.0  | 1.00  |       | 1.00  | 0.75  | ∞     | 0.25  | 0.44  | 0.52  | 0.11
  0.5  | 1.75  |       | 0.44  | 0.40  | 1.33  | 0.16  | 0.19  | 0.27  | 0.08
  1.0  | ∞     | ∞     | 0.25  | 0.50  | 0.50  | 0.11  | 0.17  | 0.17  | 0.06
  1.5  | ∞     | 1.33  | 0.16  | 1.50  | 0.27  | 0.082 | 0.36  | 0.12  | 0.048
  2.0  |       | 0.50  | 0.11  | ∞     | 0.17  | 0.063 | 1.00  | 0.084 | 0.04
  3.0  |       | 0.17  | 0.063 | ∞     | 0.083 | 0.04  | ∞     | 0.052 | 0.028
  4.0  |       | 0.083 | 0.040 |       | 0.050 | 0.028 |       | 0.033 | 0.020

TABLE V
Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 1, g = 1. (Each entry should be multiplied by μ_Y²/n.)
       |          r = 1          |          r = 2          |          r = 3
  m    | μ̂_YHT | μ̂_Ymr | μ̂_YR  | μ̂_YHT | μ̂_Ymr | μ̂_YR  | μ̂_YHT | μ̂_Ymr | μ̂_YR
 -1.0  |       |       |       | 6.50  | ∞     | 1.00  | 3.00  | 1.00  | 0.50
 -0.5  | 7.44  |       | 2.00  | 2.32  | 2.00  | 0.67  | 1.36  | 0.67  | 0.40
  0.0  | 2.00  | ∞     | 1.00  | 1.00  | 1.00  | 0.50  | 0.67  | 0.50  | 0.33
  0.5  | 0.96  | 2.00  | 0.67  | 0.55  | 0.67  | 0.40  | 0.37  | 0.40  | 0.29
  1.0  | 1.00  | 1.00  | 0.50  | 0.50  | 0.50  | 0.33  | 0.33  | 0.33  | 0.25
  1.5  | 2.53  | 0.67  | 0.40  | 0.84  | 0.40  | 0.29  | 0.50  | 0.29  | 0.22
  2.0  | ∞     | 0.50  | 0.33  | 2.00  | 0.33  | 0.25  | 1.00  | 0.25  | 0.20
  3.0  |       | 0.33  | 0.25  | ∞     | 0.25  | 0.20  | 5.67  | 0.20  | 0.17
  4.0  |       | 0.25  | 0.20  |       | 0.20  | 0.17  |       | 0.17  | 0.14

TABLE VI
Variances of μ̂_YHT and μ̂_Ymr when X is Gamma Distributed, P(x) is Polynomial, k = 1, g = 2. (Each entry should be multiplied by μ_Y²/n.)
       |      r = 1      |      r = 2      |      r = 3
  m    | μ̂_YHT | μ̂_Ymr | μ̂_YHT | μ̂_Ymr | μ̂_YHT | μ̂_Ymr
 -1.0  |       | 1     | 11.00 | 1     | 5.67  | 1
 -0.5  | 10.78 | 1     | 4.15  | 1     | 2.86  | 1
  0.0  | 3.00  | 1     | 2.00  | 1     | 1.67  | 1
  0.5  | 1.36  | 1     | 1.21  | 1     | 1.17  | 1
  1.0  | 1.00  | 1     | 1.00  | 1     | 1.00  | 1
  1.5  | 1.36  | 1     | 1.21  | 1     | 1.17  | 1
  2.0  | 3.0   | 1     | 2.0   | 1     | 1.67  | 1
  3.0  |       | 1     | 11.00 | 1     | 5.67  | 1
  4.0  |       | 1     | ∞     | 1     | 39.00 | 1

[...] estimator and of the mean-of-the-ratios estimator are not comparable as the values of m for which these variances are finite are not the same. In the range of values of m for which either the variance of the HT estimator or the variance of the mean-of-the-ratios estimator and that of the ratio estimator are finite, the ratio estimator is more efficient. When g = 0, r = 2, and 0 < m < 1 the ratio estimator has smallest variance followed by the HT estimator and then by the mean-of-the-ratios estimator, but the mean-of-the-ratios estimator beats the HT estimator for m > 1. When g = 0, r = 3 and m < 0.8, the ratio estimator is best, followed by the HT estimator, and the mean-of-the-ratios estimator is last. The ratio and mean-of-the-ratios estimators become more efficient with an increase in the value of both m and r. The HT estimator's variance takes its minimum value for values of m near 0.8; it also takes smaller values as r increases.
When g = 1 the ratio estimator always has the smallest variance. When r = 1 and m [...]
When r = 2 and for either m < 0 or m > 1, the HT estimator has larger variance than the mean-of-the-ratios estimator; for 0 [text missing in the source]

[Fragment of a table whose caption, row labels and leading columns are missing in the source. The three surviving columns are:]

∞        1.00     0.67
714.80   0.90     0.60
31.30    0.80     0.53
5.59     0.70     0.47
1.70     0.60     0.40
0.67     0.50     0.33
0.50     0.40     0.27
0.98     0.30     0.20
3.03     0.20     0.13
20.79    0.10     0.07
∞        0.00     0.00

Var(μ̂Ymr) is not defined for r = 1.

[Figure not included: variances of the estimators when X is gamma distributed.]

TABLE XII
Var(μ̂YHT) and Var(μ̂Ymr) when X is Gamma Distributed, P(x) is Exponential, k = 1, g = 2. (Each entry should be multiplied by μy²/n.) [The blank cell is missing in the scanned source.]

c       r = 1             r = 2             r = 3
        YHT      Ymr      YHT      Ymr      YHT      Ymr
-1.0    ∞        1        ∞        1        ∞        1
-0.8    276.78   1        346.30   1        742.0    1
-0.6    37.46    1        31.04    1        37.87    1
-0.4    12.16    1        8.95     1        7.50     1
-0.2    5.51     1        3.58     1        2.86     1
0.0     3.00     1        2.00     1        1.67     1
0.2     1.90     1        1.49     1        1.55     1
0.4     1.43     1        1.60     1        2.40     1
0.6     1.45     1        2.72     1        6.08     1
0.8     2.51     1        9.00     1        36.41    1
1.0     ∞        1        ∞        1                 1

Summary of Results and Some Conclusions

The following observations can be made from the results:
(1) Under the given assumptions about the distribution of the auxiliary variable, X, and the sampling designs, and for g = 0 or 1, the ratio estimator is better than both the mean-of-the-ratios and the HT estimators. Under a very wide range of the values of m, the mean-of-the-ratios estimator performs better than the HT estimator.
(2) The HT estimator is very sensitive to changes in values of the design function parameters m and c. Within a certain (small) range of the values of these parameters the HT estimator sometimes has smaller variance than the mean-of-the-ratios estimator.
(3) The variances of these estimators are usually smaller when k = 0.5 than when k = 1.
(4) For fixed values of k and g, the variances of the ratio and mean-of-the-ratios estimators usually increase with a decrease in the value of r. With the HT estimator, it depends on the range of values of m or c considered and the design function used. With the polynomial design function, the efficiency of the HT estimator usually increases with r. With the exponential design function and for, say, c < 0.3, the efficiency of the HT estimator improves with an increase in the value of r; when c > 0.4 the HT estimator usually becomes less efficient as r takes higher values.
(5) For the same values of k, and for the same ranges of the values of the design function parameters, m or c, the effect of increasing g is usually to make the ratio and the mean-of-the-ratios estimators less efficient.
(6) Other things fixed, the variances of the ratio and the mean-of-the-ratios estimators decrease with an increase in the values of m or c.
(7) The ratio and mean-of-the-ratios estimators have smaller variances when selection is 'purposive' (m > 0 or c > 0) than when simple random sampling (m = c = 0) is employed.
(8) The HT estimator is most efficient when the sampling scheme is approximately pps, i.e. when m is approximately equal to 1 (in the polynomial case).
(9) In most cases, the variances of these estimators can be equalized by simply choosing different combinations of the parameters. For example, the variance of the ratio estimator for k = 1, g = 1 and r = 1 equals the variance of the mean-of-the-ratios estimator with k = 1, g = 1 and r = 2 (and P(x) polynomial).
(10) For a greater range of the values of c or m, the estimators have smaller variances when P(x) is polynomial than when it is exponential.
(11) With an exponential design function, the variances of the estimators are finite for a very small range of the values of c. The variances of the ratio and mean-of-the-ratios estimators are zero when c = 1.

CHAPTER 4
DISCUSSION OF RESULTS

Some Empirical Comparisons

P.S.R.S. Rao (1969) compared four ratio-type estimators under the regression model y = α + βX + e, where E(ei | Xi) = 0, E(ei ej | Xi, Xj) = 0, Var(ei | Xi) = δXi^g, and X is gamma distributed with parameter h. He found that when α = 0 and g > 1, the MSE's increase as h increases. He considered values of h > 2 and samples of size 2-10. These results are not quite compatible with my results for the same MSE's of the ratio and mean-of-the-ratios estimators (Rao's first estimator is the ratio estimator). But from (2.5) and (2.9), it is clear that, for g > 2, the variance of the mean-of-the-ratios estimator increases with r. For g < 2, the variance decreases with r. Since I did not work out the variance of the ratio estimator for g > 1, I do not have much to compare with. Certainly when g = 1, the variance of the ratio estimator decreases as r increases. Rao also found that for fixed n and h (or r) the MSE's increase as g increases. This agrees with my results and can be inferred from (2.5)-(2.7). Lastly, his results show that for α = 0, n > 2 and g = 1 or 2, the ratio estimator has smaller MSE than the other three. It is implied from his paper that the choice of α and g combinations does affect the MSE's of the estimators.

My results show that the variances of the ratio and mean-of-the-ratios estimators become smaller as the design function parameter m or c increases. (2.5)-(2.7) and (2.10)-(2.11) show similar results.
But the larger the values of m or c, the higher will be the inclusion probabilities for units with large values of X compared with those with small values of X. This leads to the notion that if units with large values of X are purposely included in the sample, the estimation procedure will be more efficient. Many researchers, notably Royall (1970, 1971), have found similar results under similar conditions. Royall's results also show that the ratio estimator is better when combined with designs other than simple random sampling. For g = 1, k = 1, he shows that the ratio estimator is the best linear unbiased estimator. When used with simple random sampling design, it remains optimal only if M is infinite. Since Godambe proved the non-existence of a uniformly minimum variance unbiased estimator among a class of all unbiased linear estimators for any sampling design, it may be proper to note that while discussing these optimality properties of the ratio estimator we are, most of the time, restricting ourselves to a certain class of linear estimators only (Godambe's 1955 class (iii)). The same remarks apply to optimality conditions given by Cochran (1963). The HT estimator belongs to Godambe's (1955) sub-class (i) of estimators. It is the best and the only unbiased linear estimator in the sub-class.

Under super-population set up, Rao (1967) considered the problem of choosing a suitable strategy for the ratio, mean-of-the-ratios and the HT estimators. His results suggest that the mean-of-the-ratios estimator with πPS is better than the other two estimators used with the Midzuno-Sen sampling scheme.
My results suggest that if the ratio and mean-of-the-ratios estimators are combined with simple random sampling, there always exists an m or c such that the variance of the mean-of-the-ratios estimator is less than the variances of the ratio and of the HT estimators. Similarly, the variance of the ratio estimator with varying probability of inclusion procedure can be made small compared with those of the HT and mean-of-the-ratios estimators with simple random sampling.

If in formula (2.2) for the mean-of-the-ratios estimator we let Xi = P(Xi)·r, the HT estimator (2.1) is obtained. Hanurav (1967) was interested in finding sampling designs under which this P(Xi) = Xi/r and the variance of the mean-of-the-ratios estimator is unbiased (he was estimating the population total). For n = 2, he gives two sequential sampling procedures that solve the problem. With my study, the conditions are easy to find: when P(X) is polynomial, we want

P(Xi) = Xi^m Γ(r)/Γ(m + r) = Xi/r;

when P(X) is exponential we have

P(Xi) = (1 - c)^r e^(cXi) = Xi/r.

It is easy to find an r (c or m) that will give the required P(Xi) for fixed m or c (or r) provided c ~ m = 0.

With polynomial P(x), this study is really a special case of that considered by Cassel and Sarndal (1973). Nearly all of my findings are in agreement with their findings. They find, for example, that:
(i) When g = 2, the mean-of-the-ratios estimator has constant variance regardless of both the design and the distribution function, f(x).
(ii) The ratio estimator combined with simple random sampling can be quite inefficient compared with its use with other sampling designs.
(iii) Under certain conditions, one should use purposive selection of the units with largest X-values.
(iv) When g ≤ 1 the ratio estimator is at least as efficient as the mean-of-the-ratios estimator for any design P(x). In the case that I am considering, for n > 1 and g ≤ 1, the expressions for the variances of the ratio and the mean-of-the-ratios estimators show that the ratio estimator has smaller variance than the mean-of-the-ratios estimator, and the difference between the two variances increases with n. For example, when g = 0 and P(x) is polynomial, we have

Var(μ̂Ymr) = n k² μy² / [n² (m + r - 1)(m + r - 2)]
Var(μ̂YR) = n k² μy² / [(nm + nr - 1)(nm + nr - 2)]

(v) Variances of the ratio and mean-of-the-ratios estimators increase with m or c.
(vi) If g = 2, regardless of the design function used, the mean-of-the-ratios estimator has variance that is smaller than or equal to that of the ratio estimator (I did not work out the variance of the ratio estimator when g = 2).
(vii) If g > 2 the variance of the mean-of-the-ratios estimator becomes small if the design assigns the bulk of the selection probabilities to the units with smallest X-values.
(viii) The HT estimator is generally highly sensitive to shifts in the design. Its variance is a minimum when selection probabilities are somewhat in the vicinity of the pps procedure. The variance can become very large due to minor deviations from the point of minimum.

Comments and Some Implications

From the results of this study, it would seem that instead of trying to look for estimators that are generally optimal, like uniformly minimum variance estimators, more effort should be used in defining clearly and simply the conditions under which the popular estimators are efficient. Under the usual regression models, the choice of g, it seems, determines the extent to which the best choice of sampling strategy can improve the estimation process.
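The sensitivity of the HT estimator to m, and the role of g, can be checked numerically. The sketch below is not taken from the thesis: it assumes that the polynomial design draws X from a gamma distribution with parameter m + r, which is consistent with the g = 0 expression for the mean-of-the-ratios estimator quoted above, and it evaluates n·Var/μy² for large n. The function names and the closed forms for general g are hypothetical illustrations under that assumption; they do agree with several tabulated entries, for example 1.50 and 0.25 in Table III.

```python
from math import gamma, inf


def mr_variance_factor(m, r, g, k):
    '''n * Var / mu_y**2 for the mean-of-the-ratios estimator (assumed form).'''
    # Assumption: the polynomial design draws X from a gamma law with parameter m + r.
    if m + r <= 0 or m + r + g - 2 <= 0:
        return inf  # design is not a proper density, or E[X**(g - 2)] diverges
    return k ** 2 * gamma(m + r + g - 2) / gamma(m + r)


def ht_variance_factor(m, r, g, k):
    '''n * Var / mu_y**2 for the HT estimator (assumed form).'''
    if m + r <= 0 or r + 2 - m <= 0 or r + g - m <= 0:
        return inf  # one of the required moments does not exist
    second_moment = gamma(m + r) * (gamma(r + 2 - m) + k ** 2 * gamma(r + g - m))
    return second_moment / (gamma(r) ** 2 * r ** 2) - 1.0


if __name__ == '__main__':
    # Scan m for r = 1, g = 2, k = 0.5: the HT factor is smallest near m = 1 (pps),
    # while the mean-of-the-ratios factor stays constant at k**2.
    for m in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(m, round(ht_variance_factor(m, 1, 2, 0.5), 2), mr_variance_factor(m, 1, 2, 0.5))
```

Scanning m in this way shows the HT factor rising steeply on either side of m = 1, which is the behaviour described in (viii).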
In most cases studied, it has been found that the value of g lies between 1 and 2. This may be unfortunate as the three common estimators I have considered can be more efficient when g = 0. On the other hand, g = 0 implies that the variance of the error term, Zi, is constant, which is a very unlikely situation in practice. For 1 ≤ g ≤ 2, the variances of the three estimators can be made to attain their minimum values by a proper choice of m, r and g. Perhaps the good thing with the continuous variable model is that the expressions for the variances of the estimators are very simple. If one were interested in getting the exact minimum values of the variances of these estimators, it should not be too difficult to do so. One would likely have to use the computer and some mathematical programming techniques.

The results also call for more attention to the choice of estimators and sampling designs when doing survey sampling. In particular, the results show again that, in most cases, simple random sampling is not an optimal sampling design. There are other better ones. And when estimating the population mean, the sample average need not give optimal results. There may be other better estimators. In practice, the sampler, under this model, will partly be able to control the design function parameters. In such situations, studies like this may help the sampler make a proper choice of the design function parameter that will give best results.

In this study, the ratio and mean-of-the-ratios estimators, as in other similar studies, promise good results when selection is purposive under certain conditions.
Most sample survey experts object to this method of selection because, as Hansen, Hurwitz and Madow (1953, p.9) put it:

(a) Methods of selecting samples based on the theory of probability are the only general methods known to us which can provide a measure of precision. Only by using probability methods can objective numerical statements be made concerning the precision of the results of the survey;
(b) It is necessary to be sure that the conditions imposed by the use of probability methods are satisfied. It is not enough to hope or expect that they are. Steps must be taken to meet these conditions by selecting methods that are tested and are demonstrated to conform to the probability model.

They continue saying:

We assert that, with rare exceptions, the precision of estimates not based on known probabilities of selecting the samples cannot be predicted before the survey is made, nor can the probabilities or precision be estimated after the sample is obtained. If we know nothing of the precision, then we do not know whether to have much faith in our estimates, even though highly accurate measurements are made on the units in the sample.

Random sampling is usually supported by similar arguments, namely: it protects against failure of certain probabilistic assumptions; it averages out effects of unobserved or unknown random variables; it guards against unconscious bias on the part of the experimenter; it will usually produce a sample in which the X's are spread throughout the range of X values in the population, and this enables the sampler to check the accuracy of assumptions concerning the relation of the y's to the X's; and, again, it enables the sampler to estimate, from the sample, the precision of his estimate.
But probability methods can do nothing more than give us expectations about, say, the possible precision of the results of the survey. The precision would usually be stated in terms of the probability that the estimate deviates from the real value, and as long as it is given in these probability terms, it does nothing more than give expectations, however high and refined the probabilities may be. On the other hand, if many studies point to the fact that non-probability methods, like purposive selection, lead, under certain conditions, to efficient estimations, the same probability theory may allow that under similar sampling conditions we can expect, with high probability, to obtain similar good results. As Royall (1970) argues, if the sampler believes it to be important that he obtain a sample in which the X values have a certain configuration, then he should choose a sample deliberately and not leave it to the choice of a certain chance mechanism.

In this study, the ratio estimator has once again shown its superiority over the mean-of-the-ratios and the HT estimators. The HT estimator has revealed its sensitivity to the choice of parameters of the model and of the sampling design. The mean-of-the-ratios estimator may not be very far off from the ratio estimator. The results of the study are quite similar to those of similar studies under different regression type models. I would like to note that stratified sampling combined with simple random sampling in each stratum can be achieved, under this model, by assigning the same P(x) for all members of one stratum and different P(x) for members of different strata.

Some Limitations

In putting the ideas contained in this study into practice, the order of events is (1) estimate F(x), the distribution function of x, (2) approximate g and θ, and (3) investigate some sampling designs and estimators and choose the ones that give best results. This study would help the sampler to approximately examine the behaviours of different designs and estimators he may be considering. The assumption concerning an infinite population that has approximately a continuous frequency distribution, while it helps simplify the investigation of different designs, also makes the situation considered an idealization of the real situation.

To estimate the frequency function of x, one could start by observing the histogram of the x values and choose or fit an approximate continuous function, possibly by some mathematical curve fitting techniques, that closely resembles the histogram; the continuous function thus obtained has to be standardized to become a cumulative distribution function. To approximate θ and g, we could use some likelihood methods like the ones suggested by Brewer (1963).

It is quite possible that some problems of evaluating the variances of the estimators we may want to investigate, given the approximate distribution function of x, will be encountered. Should it, for example, turn out that the X values are approximately normally distributed and the sampler wants to investigate the sampling designs and estimators I have studied, it would not be easy to evaluate the variances. But with other distribution functions and sampling designs things should be easy going, and studying the properties of such sampling strategies is easy.
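The curve-fitting step described in this section can be sketched for the gamma case. The example below is only an illustration and is not a procedure given in the thesis: it matches a gamma distribution to the sample mean and variance of the observed x values (method of moments), and the function name is hypothetical.

```python
from statistics import fmean, pvariance


def fit_gamma_by_moments(xs):
    '''Return (shape, scale) of a gamma law matching the sample mean and variance.'''
    mean = fmean(xs)
    var = pvariance(xs, mu=mean)
    if mean <= 0 or var <= 0:
        raise ValueError('a gamma fit needs a positive mean and a positive variance')
    # For a gamma law, mean = shape * scale and variance = shape * scale**2.
    return mean * mean / var, var / mean


if __name__ == '__main__':
    shape, scale = fit_gamma_by_moments([1.0, 2.0, 3.0, 4.0, 5.0])
    print(shape, scale)
```

Dividing the x values by the fitted scale gives a variable that is approximately gamma distributed with unit scale and parameter r equal to the fitted shape, which is the form assumed for X in this study.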
Cassel and Sarndal (1973) show that results obtained under this model are valid for values of N as low as N = 10.

Some Recommendations

I think sample survey theorists should spend more time simplifying and unifying the results of their research. They should spend more time in refining and reducing the number of different sampling designs and estimators that they are considering. They should, somehow, formulate a simple unified theory of sampling that can easily be put into practice. In this connection, the estimators I have considered may prove useful. Surely, some problems may crop up initially. I also think that some ideas from General Statistical Theory can be useful in formulating a simple sample survey theory; there is no need of divorcing one of the sampling theories from the other.

REFERENCES

Basu, D. (1971). An essay on the logical foundations of survey sampling, Part 1. In Foundations of Statistical Inference, ed. by V.P. Godambe and D.A. Sprott, Toronto: Holt, Rinehart and Winston of Canada Ltd., 203-242.
Brewer, K.R.W. (1963). Ratio estimation and finite populations: some results deducible from the assumption of an underlying stochastic process. Aust. J. Statist., 5, 93-105.
Cassel, C.M. and Sarndal, C.E. (1972a). A model for studying robustness of estimators and informativeness of labels in sampling with varying probabilities. J. R. Statist. Soc. B, 35, 279-289.
Cassel, C.M. and Sarndal, C.E. (1972b). An analytic framework offering some guidelines to the choice of sampling design and the choice of estimator for finite populations. Working paper No. 142, Faculty of Commerce, UBC.
Cassel, C.M. and Sarndal, C.E. (1973). Evaluation of some sampling strategies for finite populations using a continuous variable framework. To appear in the April issue of Communications in Statistics.
Cochran, W.G. (1946). Relative accuracy of systematic and stratified random samples for a certain class of populations. Ann. Math. Statist., 17, 164-177.
Cochran, W.G. (1963). Sampling Techniques. New York: Wiley.
Durbin, J. (1953). Some results in sampling theory when the units are selected with unequal probabilities. J. R. Statist. Soc. B, 15, 262-269.
Foreman, E.K. and Brewer, K.R.W. (1971). The efficient use of supplementary information in standard sampling procedures. J. R. Statist. Soc. B, 33, 391-400.
Godambe, V.P. (1955). A unified theory of sampling from finite populations. J. R. Statist. Soc. B, 17, 269-278.
Godambe, V.P. (1965). A review of the contributions towards a unified theory of sampling from finite populations. Rev. Int. Statist. Inst., 32, 242-253.
Godambe, V.P. (1969). Some aspects of the theoretical developments in survey sampling. In New Developments in Survey Sampling, ed. by N.L. Johnson and H. Smith, Jr., 27-58. New York: Wiley.
Godambe, V.P. (1970). Foundations of survey-sampling. The American Statistician, 24, 33-38.
Godambe, V.P. and Joshi, V.M. (1965). Admissibility and Bayes estimation in sampling finite populations, I. Ann. Math. Statist., 36, 1707-1722.
Goodman, R. and Kish, L. (1950). Controlled selection - a technique in probability sampling. J. Amer. Statist. Assoc., 45, 350-372.
Hansen, M.H., Hurwitz, W.N. and Madow, W.G. (1953). Sample Survey Methods and Theory, Vol. II. New York: John Wiley & Sons.
Hanurav, T.V. (1967). Optimum utilization of auxiliary information: πPS sampling of two units from a stratum. J. R. Statist. Soc. B, 29, 374-391.
Hartley, H.O. and Rao, J.N.K. (1962). Sampling with unequal probabilities and without replacement. Ann. Math. Statist., 2, 77-86.
Horvitz, D.G. and Thompson, D.J. (1952). A generalization of sampling without replacement from a finite universe. J. Amer. Statist. Assoc., 47, 663-685.
Joshi, V.M. (1969). Admissibility of estimates of the mean of a finite population. In New Developments in Survey Sampling, ed. by N.L. Johnson and H. Smith, Jr., 188-207. New York: Wiley.
Neyman, J. (1934). On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. J. R. Statist. Soc., 97, 558-606.
Rao, J.N.K. (1969). Ratio and regression estimators. In New Developments in Survey Sampling, ed. by N.L. Johnson and H. Smith, Jr., 213-234. New York: Wiley.
Rao, P.S.R.S. (1969). Comparison of four ratio-type estimates under a model. J. Amer. Statist. Assoc., 64, 574-580.
Rao, T.J. (1967). On the choice of a strategy for the ratio method of estimation. J. R. Statist. Soc. B, 29, 392-397.
Royall, R.M. (1970). On finite population sampling theory under certain linear regression models. Biometrika, 57, 377-387.
Royall, R.M. (1971). Linear regression models in finite population sampling theory. In Foundations of Statistical Inference, ed. by V.P. Godambe and D.A. Sprott, Toronto: Holt, Rinehart and Winston of Canada Ltd., 259-279.
Sarndal, C.E. (1972). Sample survey theory vs. general statistical theory: estimation of the population mean. Rev. Int. Statist. Inst., 40, 1-12.
Sukhatme, P.V. and Sukhatme, B.V. (1970). Sampling Theory of Surveys with Applications. Asia Publishing House, Bombay 1.
"@en ; edm:hasType "Thesis/Dissertation"@en ; edm:isShownAt "10.14288/1.0092965"@en ; dcterms:language "eng"@en ; ns0:degreeDiscipline "Business Administration"@en ; edm:provider "Vancouver : University of British Columbia Library"@en ; dcterms:publisher "University of British Columbia"@en ; dcterms:rights "For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use."@en ; ns0:scholarLevel "Graduate"@en ; dcterms:title "The ratio, mean-of-the ratios and Horvitz-Thompson estimators under the continuous variable model"@en ; dcterms:type "Text"@en ; ns0:identifierURI "http://hdl.handle.net/2429/18800"@en .