THE RATIO, MEAN-OF-THE-RATIOS AND HORVITZ-THOMPSON ESTIMATORS UNDER THE CONTINUOUS VARIABLE MODEL

BY

ANTHONY ALIFA CHAMWALI
B.SC., UNIVERSITY OF EAST AFRICA, THE UNIVERSITY COLLEGE, DAR-ES-SALAAM, 1970

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE (BUS. ADMIN.) IN THE FACULTY OF COMMERCE AND BUSINESS ADMINISTRATION

WE ACCEPT THIS THESIS AS CONFORMING TO THE REQUIRED STANDARD

THE UNIVERSITY OF BRITISH COLUMBIA
APRIL, 1974

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Commerce and Business Administration
The University of British Columbia
Vancouver 8, Canada
Date: April, 1974

ABSTRACT

This study investigates the performances of the ratio estimator, the mean-of-the-ratios estimator and the Horvitz-Thompson (HT) estimator under the continuous variable model of Cassel and Sarndal (1972a, 1972b, 1973). Under this model, the character, Y, which is of interest to the investigator is assumed to be related to an auxiliary variable, X, by

$$Y(X_i) = \theta \, (X_i + Z(X_i))$$

where

$$\mathcal{E}(Z_i \mid X_i) = 0 \quad \forall X_i \in (0, \infty)$$
$$\mathcal{E}(Z_i^2 \mid X_i) = \sigma^2(X_i) = k^2 X_i^g$$
$$\mathcal{E}(Z_i Z_j \mid X_i, X_j) = 0 \quad (i \neq j)$$

It is assumed, in this paper, that X is gamma distributed over $(0, \infty)$ with parameter r. The mean of Y is to be estimated, under the additional assumptions that the design function, P(X), is 1) polynomial, 2) exponential, i.e.

1) $P(X) = \dfrac{\Gamma(r)}{\Gamma(m+r)} X^m$
2) $P(X) = (1-c)^r e^{cX}$

It is observed that for g = 0 or 1, the ratio estimator performs better than the other two.
For g = 0, 1 or 2, and for a wider range of values of m or c, the mean-of-the-ratios estimator performs better than the HT estimator. When P(X) is polynomial, the HT estimator is most efficient if the sampling design is approximately pps. The results compare well with those of other researchers under similar assumptions.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
1 INTRODUCTION
    Classical Sampling Theory
    The Sample Survey Theory
    Search for Optimal Estimators and Sampling Designs
    The Ratio, Mean-of-the-ratios and HT Estimators
2 THE PROBLEM
    Statement of the Problem
    Method Used and Problems of Evaluation
3 THE RESULTS
    Results for P(x) Polynomial and k = 0.5
    Results for P(x) Polynomial and k = 1.0
    Results for P(x) Exponential and k = 0.5
    Results for P(x) Exponential and k = 1.0
    Summary of Results and Some Conclusions
4 DISCUSSION OF RESULTS
    Some Empirical Comparisons
    Comments and Some Implications
    Some Limitations
    Some Recommendations
REFERENCES

LIST OF TABLES

I     Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 0.5, g = 0
II    Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 0.5, g = 1
III   Variances of $\hat{\mu}_{YHT}$ and $\hat{\mu}_{Ymr}$ when X is Gamma Distributed, P(x) is Polynomial, k = 0.5, g = 2
IV    Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 1, g = 0
V     Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 1, g = 1
VI    Var($\hat{\mu}_{YHT}$) and Var($\hat{\mu}_{Ymr}$) when X is Gamma Distributed, P(x) is Polynomial, k = 1, g = 2
VII   Variances of the Estimators when X is Gamma Distributed, P(x) is Exponential, k = 0.5, g = 0
VIII  Variances of the Estimators when X is Gamma Distributed, P(x) is Exponential, k = 0.5, g = 1
IX    Var($\hat{\mu}_{YHT}$) and Var($\hat{\mu}_{Ymr}$) when X is Gamma Distributed, P(x) is Exponential, k = 0.5, g = 2
X     Variances of the Estimators when X is Gamma Distributed, P(x) is Exponential, k = 1, g = 0
XI    Variances of the Estimators when X is Gamma Distributed, P(x) is Exponential, k = 1, g = 1
XII   Var($\hat{\mu}_{YHT}$) and Var($\hat{\mu}_{Ymr}$) when X is Gamma Distributed, P(x) is Exponential, k = 1, g = 2

LIST OF FIGURES

1     Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 0.5, g = 0
2     Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 0.5, g = 1
3     Variances of $\hat{\mu}_{YHT}$ and $\hat{\mu}_{Ymr}$ when X is Gamma Distributed, P(x) is Polynomial, k = 0.5, g = 2
4     Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 1, g = 0
5     Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 1, g = 1
6     Var($\hat{\mu}_{YHT}$) and Var($\hat{\mu}_{Ymr}$) when X is Gamma Distributed, P(x) is Polynomial, k = 1, g = 2
7     Variances of the Estimators when X is Gamma Distributed, P(x) is Exponential, k = 0.5, g = 0
8     Variances of the Estimators when X is Gamma Distributed, P(x) is Exponential, k = 0.5, g = 1
9     Var($\hat{\mu}_{YHT}$) and Var($\hat{\mu}_{Ymr}$) when X is Gamma Distributed, P(x) is Exponential, k = 0.5, g = 2
10    Variances of the Estimators when X is Gamma Distributed, P(x) is Exponential, k = 1, g = 0
11    Variances of the Estimators when X is Gamma Distributed, P(x) is Exponential, k = 1, g = 1
12    Var($\hat{\mu}_{YHT}$) and Var($\hat{\mu}_{Ymr}$) when X is Gamma Distributed, P(x) is Exponential, k = 1, g = 2

ACKNOWLEDGEMENTS

I would like to thank Dr. C. E. Sarndal for his invaluable advice and help offered throughout this study. I would also like to extend my thanks to the Canadian Government for sponsoring my education here in Canada.
Lastly, my appreciation goes to the Tanzanian Government and the Principal of the Institute of Development Management, Tanzania, who, in one way or another, arranged for my study.

CHAPTER I

INTRODUCTION

Recently, much emphasis has been placed on the distinction between General Statistical Theory and Sample Survey Theory. It has been felt that the two theories should either be reconciled or be treated differently.

Classical Sampling Theory

In General Statistical Theory, Sampling Theory involves making inferences from observed sample values to an infinite hypothetical population. The sample values are assumed to have been drawn randomly from a smooth density function defined on the units of the hypothetical population.

Godambe (1969, 1970) suggests that the tendency to associate theoretical statistical sampling with an infinite hypothetical population that has a smooth density function is likely due to the way General Statistical Theory developed. The early statisticians were mainly interested in biological and sociological phenomena like inheritance. They assumed that some chance mechanism operated behind these phenomena, and that the chance mechanism uniquely determined the frequency function of the characteristic under study in the hypothetical population.

When discussing Sample Survey Theory, some optimality properties of estimators from General Statistical Theory are sometimes encountered. The following are some of these.

Let $Y_1, Y_2, \ldots, Y_n$ be a sample from a population with density function $f(Y, \theta)$. Let $\hat{\theta} = d(Y_1, Y_2, \ldots, Y_n)$ be the estimator of the parameter $\theta$. Since $\hat{\theta}$ is a function of the observed sample values, it is a random variable. Let $\theta^* = d^*(Y_1, \ldots, Y_n)$ be any other statistic which is not a function of $\hat{\theta}$.

The estimator $\hat{\theta}$ is sufficient for $\theta$ if, for each $\theta^*$, the conditional density of $\theta^*$ given $\hat{\theta}$, $P(\theta^* \mid \hat{\theta})$, does not contain $\theta$. A sufficient statistic, if it exists, contains all the information about the parameter to be estimated.
The estimator $\hat{\theta}$ is an unbiased estimator of $\theta$ if its expected value, $E(\hat{\theta})$, equals $\theta$.

$\hat{\theta}$ is a Uniformly Minimum Variance Unbiased Estimator (UMVUE) of $\theta$ if $\hat{\theta}$ is an unbiased estimator of $\theta$ and its variance is less than or equal to the variance of any other unbiased estimator, $\hat{\theta}_p$, of $\theta$, i.e.

$$\mathrm{Var}(\hat{\theta}) \le \mathrm{Var}(\hat{\theta}_p) \quad \text{for all } \hat{\theta}_p.$$

Let $\hat{\theta}_n = d_n(Y_1, \ldots, Y_n)$ be an estimator of $\theta$ based on a sample of size n. The sequence $\{\hat{\theta}_n\}$ is a consistent estimator of $\theta$ if, for every $\epsilon > 0$,

$$\lim_{n \to \infty} P(\theta - \epsilon < \hat{\theta}_n < \theta + \epsilon) = 1, \quad \forall \theta,$$

i.e., $\hat{\theta}_n$ approaches $\theta$ in probability as n gets large.

The sequence $\{\hat{\theta}_n\}$ is a Best Asymptotically Normal (BAN) Estimator of $\theta$ if

(a) The distribution of $\sqrt{n}\,(\hat{\theta}_n - \theta)$ approaches the normal distribution with mean 0 and variance $\sigma^2(\theta)$ as n approaches infinity.

(b) For every $\epsilon > 0$, $\lim_{n \to \infty} P[\,|\hat{\theta}_n - \theta| > \epsilon\,] = 0, \ \forall \theta$.

(c) There is no other sequence of consistent estimators $\theta_1^*, \theta_2^*, \ldots, \theta_n^*, \ldots$ for which the distribution of $\sqrt{n}\,(\theta_n^* - \theta)$ approaches the normal distribution with mean 0 and variance $\sigma^{*2}(\theta)$ and such that $\sigma^2(\theta)/\sigma^{*2}(\theta) > 1$ for all $\theta$ in some open interval.

If the unbiased $\hat{\theta} = d(Y_1, Y_2, \ldots, Y_n)$ is a linear function of the $Y_i$ and if $\mathrm{Var}(\hat{\theta}) \le \mathrm{Var}(\hat{\theta}_p)$ for all linear unbiased $\hat{\theta}_p$, then $\hat{\theta}$ is a Uniformly Best Linear Unbiased Estimator (UBLUE) of $\theta$.

It may be worthwhile noting that while General Statistical Theory aims at making inferences about some frequency function of a hypothetical population, it may actually be inferring about the possible chance mechanism (discussed above) that produced the given set of observations, which may have nothing to do with the hypothetical population at all.

The Sample Survey Theory

In Sample Survey Theory, the populations considered are finite. They consist of elements which are real and countable. They are not hypothetical like the ones considered in the General Statistical Theory. This is the main difference between the populations dealt with by the two sampling theories.
In this sense the problems of inference in the two models of sampling theory may be taken to be different.

In survey sampling, since the investigator deals with real and finite populations, he has a choice over the sampling design and the estimator he may want to use. Occasionally, circumstances may necessitate the use of nonprobability sampling altogether. Cochran (1963) gives the following situations that lead to nonprobability sampling:

(1) When the sample is restricted to a part of the population that is readily accessible.
(2) When the sample is selected haphazardly.
(3) With a small but heterogeneous population, the sampler inspects the whole of it and selects a small sample of 'typical' units, i.e., units that are close to his impression of the average of the population. This method is sometimes called judgement or purposive selection.
(4) Lastly, when the sample consists essentially of volunteers.

He notes that these methods may give good results under the right conditions, although they are not amenable to the development of a sampling theory owing to the lack of random selection.

But whether one deals with Sample Survey Theory or General Statistical Theory, one is dealing with the same problem of inferring from the sample to the population, usually with the help of Probability Theory and Statistics.

In the General Statistical Theory, random sampling solved nearly all the sampling design problems. With survey populations, matters are not so easy. The nature of the population may make random sampling difficult; it may not be possible to identify all the units in the population. In most cases, the frequency function of the character under consideration is unknown. This makes it difficult to check some of the optimality conditions of estimators in the General Statistical Theory. Moreover, estimators that are optimal under the General Statistical Theory need not be so under the Sample Survey Theory.
This raises the main problem of finding sampling designs and estimators that are optimal for the sample survey situation.

Mathematically, in the sample survey model, we have a set P of N units that constitute the population,

$$P = (U_1, U_2, \ldots, U_N)$$

where $U_i$ stands for unit i. An unknown quantity $Y_i$ which is of interest to the surveyor is associated with $U_i$. The surveyor wants to know

$$\theta = \theta(Y_1, Y_2, \ldots, Y_N).$$

He may also know the auxiliary variable X, i.e.

$$a = (X_1, X_2, \ldots, X_N)$$

associated with P. He selects a subset

$$s = (u_1, u_2, \ldots, u_n)$$

of units of P and observes the corresponding

$$Y = (Y_1, Y_2, \ldots, Y_n)$$

(response errors are ignored). The sampling plan generates s. He then calculates

$$\hat{\theta} = \hat{\theta}(Y_1, Y_2, \ldots, Y_n)$$

as an estimator of $\theta$.

The problem is how to select s (how to assign the probability that unit i of P will be included in s) and how to choose the random variable $\hat{\theta}$ to get an estimate which is as near $\theta$ as possible. Certainly, it is not just a matter of getting a 'representative' sample and calculating the mean, variance and median of the sample values. Besides, what is a representative sample?

Search for Optimal Estimators and Sampling Designs

In 1934 Neyman introduced the Gauss-Markov theorem to obtain a linear unbiased minimum variance estimate for the mean of a survey population. He established that, for simple random sampling, the sample mean was the minimum variance linear unbiased estimate of the survey population mean. This was an attempt to fit survey sampling into the hypothetical population model, and his findings stimulated most sample-survey statisticians to find, with the help of the Gauss-Markov theorem, efficient (i.e. minimum variance) unbiased estimators for a variety of more complex designs.

Sampling procedures like sampling with arbitrary probabilities, stratified sampling, cluster sampling, 2-stage sampling, multi-stage sampling, etc.
were designed to reduce the variance of the estimators (see, for example, Goodman and Kish (1950), Durbin (1953), Hartley and Rao (1962), Cochran (1963)).

Horvitz and Thompson (1952) attempted to provide a general method for dealing with sampling without replacement from a finite population when variable probabilities of selection are applied to the elements remaining prior to each draw. They, in particular, considered a general estimator of the population total of the form

$$\hat{Y} = \sum_{i=1}^{n} \beta_i Y_i$$

where $\beta_i$ (i = 1, ..., N) is a constant to be used as a weight for the $i^{th}$ unit whenever it is selected for the sample. Letting $P(X_i)$ be the probability that the $i^{th}$ element will be included in a sample of size n, they showed that $\beta_i = 1/P(X_i)$ makes $\hat{Y}$ unbiased and of minimum variance ($X_i$ is the value of the auxiliary variable associated with unit i).

Godambe (1955, 1965) considered the general estimator

$$\hat{Y} = \sum_{i \in S} b_{si} Y_i$$

where $b_{si}$ is defined in advance for all the logically possible samples S, and for all i in S. This is a more general class of estimators than the one considered by Horvitz and Thompson. He proved the non-existence of a uniformly minimum variance (UMV) unbiased estimator in this class of estimators for any design P(S), excepting those in which no two S with P(S) > 0 have at least one common and one uncommon unit.

Because of the non-existence of a UMV unbiased estimator, other criteria of goodness of an estimator were sought. Godambe and Joshi (1965) considered the criterion of admissibility and proved that the Horvitz-Thompson (HT) estimator is admissible in the class of all p-unbiased estimators of Y, the population total, for any design P(S) such that

$$\pi_j = \sum_{S \ni u_j} P(S) > 0, \quad \forall u_j.$$

The admissibility criterion, however, was satisfied by many other estimators, and other new criteria like 'hyper-admissibility' and 'necessary bestness' (Basu, 1971) were introduced.
A detailed examination of these optimality properties raises some questions regarding their relevance. For example, the Horvitz-Thompson estimator, $\hat{Y}_{HT}$, is uniquely 'hyper-admissible' whatever the character under investigation or the sample design. But $\hat{Y}_{HT}$ may lead to disastrous results (Sarndal, 1972; Basu, 1971).

Many variations of the regression type population model

$$y_i = a x_i + Z_i, \quad i = 1, \ldots, N$$

have been used, where $y_i$ is the character of interest and $x_i$ is an auxiliary variable for unit i in the population; $Z_i$ is an error component. Cochran (1946), Godambe (1955, 1965), Royall (1970, 1971) and many others used the super-population approach in conjunction with the regression model to compare efficiencies between sampling methods.

Cassel and Sarndal (1972a, 1972b, 1973) use the continuous variable model of the form

$$Y(X) = \theta \, (X + Z(X)) \quad \forall X \in (0, \infty)$$

where

$$\mathcal{E}(Z(X) \mid X) = 0, \qquad \mathcal{E}(Z(X)^2 \mid X) = \sigma^2(X),$$
$$\mathcal{E}(Z(X_1) Z(X_2) \mid X_1, X_2) = 0 \quad (X_1 \neq X_2)$$

and the auxiliary variable X is assumed to have a known distribution described by the distribution function F(x) over $(0, \infty)$.

Different notions of unbiasedness have been used during this search for optimal estimators and designs.

An estimator $\hat{Y}$ for Y is called design-unbiased or p-unbiased if

$$E_p(\hat{Y}) = \sum_{s \in S} P(s) \, \hat{Y} = Y$$

for all vectors $(Y_1, \ldots, Y_N)$, where P(s) is a probability function defined on the set S of subsets s of the labels (1, ..., N).

An estimator is called model-unbiased or $\xi$-unbiased if it is unbiased under the assumptions of the specified model.

Lastly, an estimator $\hat{Y}$ is called $E_{p\xi}$-unbiased for Y if

$$E_{p\xi}(\hat{Y}) = \sum_{s \in S} P(s) \, \mathcal{E}(\hat{Y}) = Y.$$

An estimator can be p-unbiased but not $\xi$-unbiased and vice versa.

The Ratio, Mean-of-the-Ratios and the HT Estimators

It is well established that using supplementary information in sampling designs and estimators can greatly improve the accuracy of the estimates. Three estimators that have
received considerable attention in the literature of sample survey theory and which make use of supplementary information are the Ratio estimator, the Mean-of-the-ratios estimator and the Horvitz-Thompson (HT) estimator.

If X is an auxiliary variable, the ratio estimator for the population mean, $\mu_Y$, is given by

$$\hat{\mu}_{YR} = \frac{\bar{y}}{\bar{x}} \bar{X} = \frac{y}{x} \bar{X} \tag{1.1}$$

where $y = \sum_{i=1}^{n} y_i$, $x = \sum_{i=1}^{n} x_i$, $\bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i$, $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$, $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$, n is the sample size and N is the population size.

The mean-of-the-ratios estimator is

$$\hat{\mu}_{Ymr} = \frac{\bar{X}}{n} \sum_{i=1}^{n} \frac{y_i}{x_i} \tag{1.2}$$

The ratio estimators make use of the ratios $\bar{y}/\bar{x}$ and $y_i/x_i$ in order to improve estimation.

With simple random sampling from an infinite population, the ratio estimator of $\mu_Y$ has been shown to be the best linear unbiased estimate if two conditions are satisfied (Cochran, 1963, p. 166, Theorem 6.4):

(1) The relation between $y_i$ and $x_i$ is a straight line through the origin.
(2) The variance of $y_i$ about this line is proportional to $x_i$.

When the variance of $y_i$ about this line is proportional to $x_i^2$, the mean-of-the-ratios estimator performs much better than the ratio estimator.

When the relation between $y_i$ and $x_i$ is linear but the line does not go through the origin, the regression estimator

$$\bar{y}_{lr} = \bar{y} + b \, (\bar{X} - \bar{x})$$

performs much better than the others. $\bar{y}_{lr}$ reduces to $\bar{y}$ if b = 0 and to $\hat{\mu}_{YR}$ if $b = \bar{y}/\bar{x}$.

Both the ratio and regression estimators are consistent and, generally, slightly biased, but with sampling designs like stratified sampling, the bias of the ratio estimator may be considerable. This has led to a search for unbiased or better ratio-type estimators. J.N.K. Rao (1969) gives some results of the search. The results indicate that under certain conditions, other ratio-type estimators perform better than the traditional ratio estimator.

The Horvitz-Thompson estimator is given by

$$\hat{\mu}_{YHT} = \frac{1}{N} \sum_{i=1}^{n} \frac{y_i}{P(x_i)} \tag{1.3}$$
where $P(x_i)$ is the probability that a unit with auxiliary variable $x_i$ will be included in the sample. The estimated variance of the HT estimator may be negative, and the variance may not reduce to zero even when all the Y-values are the same and the variance should actually be zero.

In many studies the HT estimator has been shown to compete very well with the ratio estimators. Assuming a linear stochastic model of the form

$$Y_i = \alpha + \beta X_i + Z_i, \qquad E(Z_i^2) = \sigma^2 X_i^{2\gamma},$$

Foreman and Brewer (1971) showed that the HT estimator is more efficient than the ratio estimator with equal selection probabilities if $\gamma > \tfrac{1}{2}$, but they caution that if the sampling fraction is large and $\alpha$ is appreciable, the ratio estimator may be made more efficient than the HT estimator.

Rao (1967) gives some conditions under which the mean-of-the-ratios estimator is superior to both the ratio estimator and the HT estimator. Unfortunately, unlike the ratio estimator, the mean-of-the-ratios estimator is not consistent (Sukhatme and Sukhatme, 1970, p. 160).

CHAPTER II

THE PROBLEM

Statement of the Problem

The problem with simple random sampling is that it does not take into account the possible importance of the larger units in the population. Because of this, sampling designs like sampling with probability proportional to size (pps), generally known as sampling with varying probabilities, came to be used. These are more complex designs than simple random sampling. It was also realized that it makes a difference whether sampling is done with replacement or without replacement. However, the estimators considered were very complicated. To get simple estimators, sampling schemes like the Midzuno system of sampling, the Narain method of sampling, systematic sampling with varying probabilities, etc. were introduced.

It is very clear, from such studies, that the performance of estimators can be improved not only by making use of supplementary information, but also by choosing the proper sampling design.
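As an illustration of the three estimators in (1.1) to (1.3), the following is a minimal sketch for a finite-population sample. The function names, the argument names and the toy data are hypothetical, and the Horvitz-Thompson version assumes that the inclusion probability of each sampled unit is supplied directly and that the total is divided by the population size N.

```python
from typing import Sequence


def ratio_estimator(y: Sequence[float], x: Sequence[float], pop_mean_x: float) -> float:
    """Ratio estimator of the population mean: (sum of y / sum of x) * population mean of X."""
    return sum(y) / sum(x) * pop_mean_x


def mean_of_ratios_estimator(y: Sequence[float], x: Sequence[float], pop_mean_x: float) -> float:
    """Mean-of-the-ratios estimator: population mean of X times the average of y_i / x_i."""
    n = len(y)
    return pop_mean_x * sum(yi / xi for yi, xi in zip(y, x)) / n


def horvitz_thompson_mean(y: Sequence[float], incl_prob: Sequence[float], pop_size: int) -> float:
    """HT estimator of the mean: inverse-probability-weighted sample total divided by N (assumed form)."""
    return sum(yi / pi for yi, pi in zip(y, incl_prob)) / pop_size


if __name__ == "__main__":
    # Toy data (hypothetical): y is exactly proportional to x.
    x = [1.0, 2.0, 4.0]
    y = [2.0, 4.0, 8.0]
    print(ratio_estimator(y, x, pop_mean_x=3.0))
    print(mean_of_ratios_estimator(y, x, pop_mean_x=3.0))
    print(horvitz_thompson_mean(y, incl_prob=[0.5, 0.5, 0.5], pop_size=6))
```

When y is exactly proportional to x, as in the toy data, the ratio and mean-of-the-ratios estimators coincide; they differ only in how the individual ratios are weighted.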
The ratio estimator, the mean-of-the-ratios estimator and the Horvitz-Thompson estimator have performed quite efficiently in some of the research work in the sample survey literature. But which of these three estimates better than the others depends on the sampling design used, the way the supplementary information has been used, the class of estimators used and the way 'bias' or 'unbiased' is defined.

Joshi (1971), for example, states that

(i) The HT estimate is always admissible in the class of all unbiased estimates, linear and non-linear.

(ii) In the entire class of estimates, the HT estimator is admissible if the sampling design is of fixed sample size.

(iii) If the loss function, V(t), where t is the numerical difference between the estimated and true values, satisfies only
    (a) V(t) is non-decreasing in $[0, \infty)$,
    (b) for every K > 0, $\int_0^\infty V(t) \exp(-K t^2 / 2) \, dt < \infty$,
then the sample mean, and more generally the ratio estimate, is always admissible as an estimate of the population mean.

The use of these three estimators with the regression type models has been shown to give very promising results.

I would like to investigate the performances of these three estimators (the ratio estimator, the mean-of-the-ratios estimator and the HT estimator) under the regression type continuous variable model of Cassel and Sarndal (1973).

I would like to consider the model

$$Y(x_i) = \theta \, (x_i + Z(x_i))$$

where

$$\mathcal{E}(Z_i \mid x_i) = 0 \quad \forall x_i \in (0, \infty)$$
$$\mathcal{E}(Z_i^2 \mid x_i) = \sigma^2(x_i) = k^2 x_i^g$$
$$\mathcal{E}(Z_i Z_j \mid x_i, x_j) = 0 \quad (i \neq j)$$

where $Y_i$ is the value of the character under investigation for unit i and $x_i$ is the value of the corresponding auxiliary variable.

I further assume that the auxiliary variable X is gamma distributed over $(0, \infty)$, i.e.
$$f(x) = \frac{1}{\Gamma(r)} x^{r-1} e^{-x} \quad \text{for } x \in (0, \infty).$$

Hence its mean is

$$E(x) = \int_0^\infty x f(x) \, dx = r.$$

My aim is to estimate the population mean

$$E_{p\xi}(Y(x)) = \theta r = \mu_Y.$$

The three estimators of the population mean that I would like to consider are

$$\hat{\mu}_{YHT} = \frac{1}{n} \sum_{i=1}^{n} \frac{y(x_i)}{P(x_i)} \tag{2.1}$$

$$\hat{\mu}_{Ymr} = \frac{r}{n} \sum_{i=1}^{n} \frac{y(x_i)}{x_i} \tag{2.2}$$

$$\hat{\mu}_{YR} = r \, \frac{\sum_{i=1}^{n} y(x_i)}{\sum_{i=1}^{n} x_i} \tag{2.3}$$

where the design function, $P(x_i)$, is the selection probability density of a unit with auxiliary variable $x_i$ and n is the number of units in the sample. Draws are made independently of each other according to the same distribution of inclusion probabilities.

(2.1) is the Horvitz-Thompson estimator, (2.2) is the mean-of-the-ratios estimator and (2.3) is the classical ratio estimator. The three estimators are $E_{p\xi}$-unbiased and their MSE's, then, equal their $E_{p\xi}$-variances.

Method Used and Problems of Evaluation

The efficiencies of these estimators will be evaluated in terms of their $E_{p\xi}$-variances, which are obtained from

$$\mathrm{Var}(\hat{\mu}_Y) = E_{p\xi}(\hat{\mu}_Y^2) - (E_{p\xi}(\hat{\mu}_Y))^2.$$

Generally, the expressions for the variances of these estimators are

$$\mathrm{Var}(\hat{\mu}_{YHT}) = \frac{\mu_Y^2}{n} \left[ \frac{1}{r^2} \int_0^\infty \frac{x^2 + k^2 x^g}{P(x)} f(x) \, dx - 1 \right]$$

$$\mathrm{Var}(\hat{\mu}_{Ymr}) = \frac{\mu_Y^2}{n} \int_0^\infty k^2 x^{g-2} P(x) f(x) \, dx$$

$$\mathrm{Var}(\hat{\mu}_{YR}) = \mu_Y^2 k^2 \int \frac{\sum_{i=1}^{n} x_i^g}{\left( \sum_{i=1}^{n} x_i \right)^2} P(\mathbf{x}) f(\mathbf{x}) \, d(\mathbf{x})$$

where $P(\mathbf{x}) = \prod_{i=1}^{n} P(x_i)$, $f(\mathbf{x}) = \prod_{i=1}^{n} f(x_i)$, $d(\mathbf{x}) = \prod_{i=1}^{n} dx_i$, and the last integral is n-dimensional (over $0 < x_i < \infty$ for i = 1, ..., n).

The variances depend essentially on three things, namely, the shape of the continuous distribution, f(x), of the auxiliary variable, x; the variance of $Z_i$; and the selection probability density, P(x). I will compare the variances under 1) polynomial P(x), 2) exponential P(x).

(1) Let $P(x) = \dfrac{\Gamma(r)}{\Gamma(m+r)} x^m$ (m = 0 represents simple random sampling, m = 1 represents the pps sampling scheme). In this case the expressions for the variances of these estimators are:

$$\mathrm{Var}(\hat{\mu}_{YHT}) = \frac{\mu_Y^2}{n} \left[ \frac{\Gamma(m+r)}{\Gamma^2(r+1)} \left\{ \Gamma(2+r-m) + k^2 \, \Gamma(g+r-m) \right\} - 1 \right] \tag{2.4}$$

$$\mathrm{Var}(\hat{\mu}_{Ymr}) = \frac{\mu_Y^2}{n} \, k^2 \, \frac{\Gamma(g+r+m-2)}{\Gamma(m+r)} \tag{2.5}$$

for g = 0:

$$\mathrm{Var}(\hat{\mu}_{YR}) = \frac{n k^2 \mu_Y^2}{(nm+nr-1)(nm+nr-2)} \approx \frac{\mu_Y^2}{n} \cdot \frac{k^2}{(m+r)^2} \quad \text{(if n is large)} \tag{2.6}$$

for g = 1:

$$\mathrm{Var}(\hat{\mu}_{YR}) = \frac{k^2 \mu_Y^2}{nm+nr-1} \approx \frac{\mu_Y^2}{n} \cdot \frac{k^2}{m+r} \quad \text{(for n large)} \tag{2.7}$$

I evaluate these expressions numerically for r = 1, 2, 3, with (k, g) fixed in turn at (0.5, 0), (0.5, 1), (0.5, 2), (1, 0), etc., and as m takes the values -1 to 2 in steps of 0.5, and then 3 and 4. I will observe how the variances of these estimators behave.

(2) Let $P(x) = (1-c)^r e^{cx}$, c < 1 (c = 0 represents simple random sampling). In this case the expressions for the variances are:

$$\mathrm{Var}(\hat{\mu}_{YHT}) = \frac{\mu_Y^2}{n} \left[ \frac{1}{r^2 (1-c)^r \, \Gamma(r)} \left\{ \frac{\Gamma(2+r)}{(c+1)^{2+r}} + \frac{k^2 \, \Gamma(g+r)}{(c+1)^{g+r}} \right\} - 1 \right] \tag{2.8}$$

$$\mathrm{Var}(\hat{\mu}_{Ymr}) = \frac{\mu_Y^2}{n} \, k^2 (1-c)^{2-g} \, \frac{\Gamma(g+r-2)}{\Gamma(r)} \tag{2.9}$$

for g = 0:

$$\mathrm{Var}(\hat{\mu}_{YR}) = \frac{n k^2 (1-c)^2 \mu_Y^2}{(nr-1)(nr-2)} \approx \frac{\mu_Y^2}{n} \cdot \frac{k^2 (1-c)^2}{r^2} \quad \text{(for n large)} \tag{2.10}$$

for g = 1:

$$\mathrm{Var}(\hat{\mu}_{YR}) = \frac{k^2 (1-c) \mu_Y^2}{nr-1} \approx \frac{\mu_Y^2}{n} \cdot \frac{k^2 (1-c)}{r} \quad \text{(for n large)} \tag{2.11}$$

As with the polynomial case, these expressions are evaluated and compared as the values of r, c, g and k vary.

It has not been possible to evaluate the variance of the ratio estimator when g = 2; only the variances of the HT estimator and the mean-of-the-ratios estimator will be compared when g = 2. With exponential P(x), the variance of the mean-of-the-ratios estimator is finite only if r + g > 2. Throughout, I have used the approximate (large n) expressions of $\mathrm{Var}(\hat{\mu}_{YR})$ for my evaluations. The results are tabulated below and illustrated by graphs.

CHAPTER III

THE RESULTS

Results for P(x) Polynomial and k = 0.5

The results are presented in Tables I-III and illustrated by Figures 1-3.

When g = 0 the ratio estimator has, generally, the smallest variance for fixed values of r and m, followed by the mean-of-the-ratios estimator and then by the HT estimator. When r = 1 not much comparison can be made between the HT estimator and the mean-of-the-ratios estimator, as the variance of the HT estimator is finite for -1.0 < m < 1 while the variance of the mean-of-the-ratios estimator is finite for m > 1.
In the range of values of m for which the variances of the ratio estimator and of either the HT estimator or the mean-of-the-ratios estimator are finite, the ratio estimator has the smaller variance. When r = 2 and 0 < m < 2, where the variances of all three estimators are finite, the ratio estimator has the smallest variance; the HT estimator initially has a smaller variance than the mean-of-the-ratios estimator. The variance of the HT estimator takes its minimum value at about m = 1.0, and for m > 1.0 the mean-of-the-ratios estimator's variance is smaller than that of the HT estimator. The variance of the HT estimator starts at infinity, takes its minimum value at m = 0.5 when r = 1 and at m = 1.0 when r = 2 or 3, and goes to infinity as m takes higher values. When r = 3 the ratio estimator is most efficient, followed by the mean-of-the-ratios estimator, and the HT estimator is last. The variances of the ratio and mean-of-the-ratios estimators get smaller and smaller as m increases. The same is true as r increases. This can also be seen from formulas (2.5) and (2.6) for the variances of the two estimators.

When g = 1 the ratio estimator still performs better than the other two. When r = 1 and m < 1, the HT estimator has smaller variance than the mean-of-the-ratios estimator. The reverse is true for m > 1. When r = 2 or 3, the mean-of-the-ratios estimator has smaller variance than the HT estimator, except where the variance of the HT estimator takes its minimum value, which equals the variance of the mean-of-the-ratios estimator. The variance of the HT estimator takes its minimum value at m = 1.0, and the variances of the ratio and mean-of-the-ratios estimators become smaller and smaller as m gets larger and larger.
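The closed forms for the polynomial design can be evaluated directly. The sketch below assumes that (2.4) to (2.7) read as Var(HT) = Γ(m+r)/Γ(r+1)² · {Γ(2+r-m) + k²Γ(g+r-m)} - 1, Var(mr) = k²Γ(g+r+m-2)/Γ(m+r) and Var(R) ≈ k²/(m+r)^(2-g), all in units of μ_Y²/n. The function names are hypothetical, and returning infinity whenever a gamma argument is non-positive is an assumption about where the underlying integrals diverge.

```python
import math


def var_ht_poly(m: float, r: float, k: float, g: float) -> float:
    """HT variance under polynomial P(x), in units of mu_Y^2 / n (assumed reading of (2.4))."""
    # Assumption: the variance is treated as infinite when any gamma argument is <= 0.
    if m + r <= 0 or 2 + r - m <= 0 or g + r - m <= 0:
        return math.inf
    bracket = math.gamma(2 + r - m) + k * k * math.gamma(g + r - m)
    return math.gamma(m + r) / math.gamma(r + 1) ** 2 * bracket - 1.0


def var_mr_poly(m: float, r: float, k: float, g: float) -> float:
    """Mean-of-the-ratios variance, in units of mu_Y^2 / n (assumed reading of (2.5))."""
    if m + r <= 0 or g + r + m - 2 <= 0:
        return math.inf
    return k * k * math.gamma(g + r + m - 2) / math.gamma(m + r)


def var_ratio_poly(m: float, r: float, k: float, g: int) -> float:
    """Large-n ratio-estimator variance for g = 0 or 1, in units of mu_Y^2 / n ((2.6), (2.7))."""
    if g not in (0, 1):
        raise ValueError("closed form is only available for g = 0 or g = 1")
    if m + r <= 0:
        return math.inf
    return k * k / (m + r) ** (2 - g)


if __name__ == "__main__":
    # Print one column group (r = 2, k = 0.5, g = 0) over the grid of m used in the text.
    for m in (-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0):
        row = (var_ht_poly(m, 2, 0.5, 0), var_mr_poly(m, 2, 0.5, 0), var_ratio_poly(m, 2, 0.5, 0))
        print(m, *(f"{v:.3f}" for v in row))
```

Under these assumptions the sketch gives, for example, 1.25 for the HT variance at r = 1, m = 0, k = 0.5, g = 0, and 0.25 for the ratio estimator at the same point.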
When g = 2 the variance of the mean-of-the-ratios estimator is constant and smaller than that of the HT estimator, except at the point where the variance of the HT estimator takes its minimum value, which equals the variance of the mean-of-the-ratios estimator. The variance of the HT estimator is smallest at m = 1.0.

Generally, as m increases, the variances of the ratio and the mean-of-the-ratios estimators are reduced, but the reduction that a given large m can achieve is lessened as g takes higher values. At m = 4, for example, the variances of the ratio and the mean-of-the-ratios estimators are approximately zero for g = 0 and about 0.05μ_y²/n for g = 1, while the mean-of-the-ratios estimator has a constant variance of 0.25μ_y²/n when g = 2. For large values of m, the difference between the variance of the ratio estimator and that of the mean-of-the-ratios estimator is very small and depends little on the value of r.

The smallest possible value taken by the variance of the HT estimator, and the range over which it is finite, are greatly affected by the value r takes. As r takes bigger values, the variance of the HT estimator can take smaller values and is finite for a wider range of the values of m.

TABLE I
Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 0.5, g = 0.
(Each entry should be multiplied by μ_y²/n; a hyphen marks an empty cell.)

        r = 1                   r = 2                   r = 3
m       μ̂_YHT   μ̂_Ymr   μ̂_YR    μ̂_YHT   μ̂_Ymr   μ̂_YR    μ̂_YHT   μ̂_Ymr   μ̂_YR
-1.0    ∞       ∞       -       5.13    -       0.25    2.38    ∞       0.063
-0.5    5.28    -       1.00    1.65    -       0.11    0.97    0.33    0.040
 0.0    1.25    -       0.25    0.56    -       0.063   0.36    0.13    0.028
 0.5    0.57    -       0.11    0.22    0.33    0.040   0.11    0.067   0.020
 1.0    ∞       ∞       0.063   0.13    0.13    0.028   0.042   0.042   0.016
 1.5    ∞       0.33    0.040   0.47    0.067   0.020   0.15    0.029   0.012
 2.0    -       0.13    0.028   ∞       0.042   0.016   0.50    0.021   0.010
 3.0    -       0.042   0.016   -       0.021   0.010   ∞       0.013   0.0069
 4.0    -       0.02    0.01    -       0.013   0.0069  -       0.0083  0.0050

TABLE II
Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 0.5, g = 1.
(Each entry should be multiplied by μ_y²/n; a hyphen marks an empty cell.)

        r = 1                   r = 2                   r = 3
m       μ̂_YHT   μ̂_Ymr   μ̂_YR    μ̂_YHT   μ̂_Ymr   μ̂_YR    μ̂_YHT   μ̂_Ymr   μ̂_YR
-1.0    ∞       ∞       -       5.37    ∞       0.25    2.50    0.25    0.13
-0.5    5.48    -       0.5     1.762   0.50    0.17    1.04    0.17    0.10
 0.0    1.25    ∞       0.25    0.63    0.25    0.13    0.42    0.13    0.083
 0.5    0.37    0.5     0.17    0.21    0.17    0.10    0.14    0.10    0.073
 1.0    0.25    0.25    0.13    0.13    0.13    0.083   0.08    0.083   0.063
 1.5    2.52    0.17    0.10    0.29    0.10    0.071   0.18    0.071   0.055
 2.0    ∞       0.13    0.08    0.86    0.08    0.063   0.50    0.063   0.050
 3.0    -       0.08    0.06    ∞       0.063   0.050   3.17    0.050   0.043
 4.0    -       0.06    0.05    -       0.05    0.042   ∞       0.041   0.035

TABLE III
Variances of μ̂_YHT and μ̂_Ymr when X is Gamma Distributed, P(x) is Polynomial, k = 0.5, g = 2.
(Each entry should be multiplied by μ_y²/n; a hyphen marks an empty cell.)

        r = 1           r = 2           r = 3
m       μ̂_YHT   μ̂_Ymr   μ̂_YHT   μ̂_Ymr   μ̂_YHT   μ̂_Ymr
-1.0    ∞       0.25    6.50    0.25    3.17    0.25
-0.5    6.36    0.25    2.21    0.25    1.41    0.25
 0.0    1.50    0.25    0.88    0.25    0.67    0.25
 0.5    0.47    0.25    0.38    0.25    0.34    0.25
 1.0    0.25    0.25    0.25    0.25    0.25    0.25
 1.5    0.47    0.25    0.38    0.25    0.34    0.25
 2.0    1.50    0.25    0.86    0.25    0.67    0.25
 3.0    -       0.25    6.25    0.25    3.17    0.25
 4.0    -       0.25    -       0.25    24.00   0.25

[Figure 3. Variances of μ̂_YHT and μ̂_Ymr when X is gamma distributed and P(x) is polynomial (to illustrate Table III).]

Results for P(x) Polynomial and k = 1

The results are given in Tables IV-VI and illustrated by Figures 4-6.

TABLE IV
Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 1, g = 0.
(Each entry should be multiplied by μ_y²/n; a hyphen marks an empty cell.)

        r = 1                   r = 2                   r = 3
m       μ̂_YHT   μ̂_Ymr   μ̂_YR    μ̂_YHT   μ̂_Ymr   μ̂_YR    μ̂_YHT   μ̂_Ymr   μ̂_YR
-1.0    -       -       -       5.50    -       1.00    2.50    -       0.25
-0.5    6.63    -       4.00    1.86    -       0.43    1.08    1.32    0.16
 0.0    1.00    -       1.00    0.75    ∞       0.25    0.44    0.52    0.11
 0.5    1.75    -       0.44    0.40    1.33    0.16    0.19    0.27    0.08
 1.0    ∞       -       0.25    0.50    0.50    0.11    0.17    0.17    0.06
 1.5    ∞       1.33    0.16    1.50    0.27    0.082   0.36    0.12    0.048
 2.0    ∞       0.50    0.11    ∞       0.17    0.063   1.00    0.084   0.04
 3.0    -       0.17    0.063   ∞       0.083   0.04    ∞       0.052   0.028
 4.0    -       0.083   0.040   -       0.050   0.028   -       0.033   0.020

TABLE V
Variances of the Estimators when X is Gamma Distributed, P(x) is Polynomial, k = 1, g = 1.
(Each entry should be multiplied by μ_y²/n; a hyphen marks an empty cell.)

        r = 1                   r = 2                   r = 3
m       μ̂_YHT   μ̂_Ymr   μ̂_YR    μ̂_YHT   μ̂_Ymr   μ̂_YR    μ̂_YHT   μ̂_Ymr   μ̂_YR
-1.0    -       -       -       6.50    ∞       1.00    3.00    1.00    0.50
-0.5    7.44    -       2.00    2.32    2.00    0.67    1.36    0.67    0.40
 0.0    2.00    ∞       1.00    1.00    1.00    0.50    0.67    0.50    0.33
 0.5    0.96    2.00    0.67    0.55    0.67    0.40    0.37    0.40    0.29
 1.0    1.00    1.00    0.50    0.50    0.50    0.33    0.33    0.33    0.25
 1.5    2.53    0.67    0.40    0.84    0.40    0.29    0.50    0.29    0.22
 2.0    ∞       0.50    0.33    2.00    0.33    0.25    1.00    0.25    0.20
 3.0    -       0.33    0.25    ∞       0.25    0.20    5.67    0.20    0.17
 4.0    -       0.25    0.20    -       0.20    0.17    -       0.17    0.14

TABLE VI
Variances of μ̂_YHT and μ̂_Ymr when X is Gamma Distributed, P(x) is Polynomial, k = 1, g = 2.
(Each entry should be multiplied by μ_y²/n; a hyphen marks an empty cell.)

        r = 1           r = 2           r = 3
m       μ̂_YHT   μ̂_Ymr   μ̂_YHT   μ̂_Ymr   μ̂_YHT   μ̂_Ymr
-1.0    -       1       11.00   1       5.67    1
-0.5    10.78   1       4.15    1       2.86    1
 0.0    3.00    1       2.00    1       1.67    1
 0.5    1.36    1       1.21    1       1.17    1
 1.0    1.00    1       1.00    1       1.00    1
 1.5    1.36    1       1.21    1       1.17    1
 2.0    3.0     1       2.0     1       1.67    1
 3.0    -       1       11.00   1       5.67    1
 4.0    -       1       ∞       1       39.00   1

When g = 0 and r = 1, the variances of the HT estimator and of the mean-of-the-ratios estimator are not comparable, as the values of m for which these variances are finite are not the same. In the range of values of m for which either the variance of the HT estimator or the variance of the mean-of-the-ratios estimator and that of the ratio estimator are finite, the ratio estimator is more efficient. When g = 0, r = 2 and 0 < m ≤ 1, the ratio estimator has the smallest variance, followed by the HT estimator and then by the mean-of-the-ratios estimator, but the mean-of-the-ratios estimator beats the HT estimator for m > 1. When g = 0, r = 3 and m < 0.8, the ratio estimator is best, followed by the HT estimator, and the mean-of-the-ratios estimator is last. The ratio and mean-of-the-ratios estimators become more efficient with an increase in the value of both m and r. The HT estimator's variance takes its minimum value for values of m near 0.8; it also takes smaller values as r increases.

When g = 1 the ratio estimator always has the smallest variance. When r = 1 and m < 1 the HT estimator is better than the mean-of-the-ratios estimator. The reverse is true when m > 1. When r = 2 and either m < 0 or m > 1, the HT estimator has larger variance than the mean-of-the-ratios estimator; for 0 < m ≤ 1 the HT estimator's variance is smaller than that of the mean-of-the-ratios estimator. When r = 3 the HT estimator beats the mean-of-the-ratios estimator for 0.3 < m < 1, and the mean-of-the-ratios estimator beats the HT estimator for other values of m.

When g = 2 the variance of the mean-of-the-ratios estimator is constant at μ_y²/n and, except when the variance of the HT estimator takes its minimum value, it is smaller than that of the HT estimator. As in the other case, an increase in the value of g has the general effect of lessening the reduction in the variances of the ratio and mean-of-the-ratios estimators that can be obtained by increasing m.
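The flattening effect of g described above can be illustrated numerically. The sketch below rests on an assumption that is a reading of the tabulated values rather than a formula quoted from this chapter: under the polynomial design the sampled X values are taken to follow a gamma distribution with shape m + r, so that the variance of the mean-of-the-ratios estimator, in units of μ_y²/n, is k² Γ(m + r + g - 2)/Γ(m + r). The function name and signature are hypothetical.

```python
import math


def var_mean_of_ratios(m, r, g, k):
    """Variance of the mean-of-the-ratios estimator in units of mu_y**2 / n.

    Assumption (not quoted from the thesis): the polynomial design makes the
    sampled X gamma distributed with shape m + r, so the variance factor is
    k**2 * E[X**(g - 2)] = k**2 * Gamma(m + r + g - 2) / Gamma(m + r).
    """
    shape = m + r
    if shape <= 0 or shape + g - 2 <= 0:
        return math.inf  # the required moment does not exist
    # Work with log-gamma values to avoid overflow for large arguments.
    return k ** 2 * math.exp(math.lgamma(shape + g - 2) - math.lgamma(shape))


# How much can increasing m reduce the variance for each g (r = 3, k = 1)?
for g in (0, 1, 2):
    row = [round(var_mean_of_ratios(m, r=3, g=g, k=1.0), 3) for m in (0.0, 1.0, 2.0, 4.0)]
    print("g =", g, row)
```

Under this assumption the g = 2 row does not change with m at all, which is the constant variance reported in Tables III and VI.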
No improvement can be made on the variance of the mean-of-the-ratios estimator when g = 2. The variance of the HT estimator is smallest for values of m near 1.0. The change in the value of k from 0.5 to 1.0 has the effect of slightly increasing the values of the variances of the estimators for fixed values of g, r and m.

Results for P(x) Exponential and k = 0.5

The results are given in Tables VII-IX and illustrated by Figures 7-9. When g = 0 or 1, the ratio estimator is most efficient, followed by the mean-of-the-ratios estimator and then by the HT estimator. The variances of the ratio and mean-of-the-ratios estimators are zero when c = 1. The greater the value r takes, the more efficient the estimators become. The graph of the variance of the HT estimator is U-shaped; an increase in r has the effect of pushing the minimum point of the variance of the HT estimator to the left.

When g = 2, the variance of the mean-of-the-ratios estimator is constant at 0.25μ_y²/n and is smaller than that of the HT estimator. The effect of increasing r on the shape of the HT estimator's variance curve is simply to shift its minimum point to the left. The rate of improvement on the estimators by increasing c is reduced as g changes from 0 to 1, and the estimators become less efficient too.

TABLE VII
Variances of the Estimators when X is Gamma Distributed, P(x) is Exponential, k = 0.5, g = 0.
(Each entry should be multiplied by μ_y²/n; a hyphen marks an empty cell.)

        r = 1           r = 2           r = 3
c       μ̂_YHT   μ̂_YR    μ̂_YHT   μ̂_YR    μ̂_YHT   μ̂_Ymr   μ̂_YR
-1.0    -       1.00    -       0.25    ∞       0.50    0.11
-0.8    138.58  0.81    288.50  0.20    713.50  0.40    0.09
-0.6    18.90   0.64    21.25   0.16    29.84   0.32    0.07
-0.4    6.00    0.49    4.96    0.12    4.70    0.24    0.05
-0.2    2.50    0.36    1.61    0.09    1.38    0.18    0.04
 0.0    1.25    0.25    0.56    0.06    0.36    0.12    0.03
 0.2    0.70    0.16    0.21    0.04    0.10    0.08    0.02
 0.4    0.52    0.09    -       0.02    0.24    0.04    0.01
 0.6    0.60    0.04    0.58    0.01    1.21    0.02    0.004
 0.8    1.44    0.01    2.53    0.003   12.68   0.005   0.001
 1.0    -       0.00    ∞       0.00    -       0.00    0.00

[Figure: Variances of the estimators when X is gamma distributed and P(x) is exponential, k = 0.5, g = 0 (to illustrate Table VII).]

TABLE VIII
Variances of the Estimators when X is Gamma Distributed, P(x) is Exponential, k = 0.5, g = 1.
(Each entry should be multiplied by μ_y²/n.)

        r = 1           r = 2                   r = 3
c       μ̂_YHT   μ̂_YR    μ̂_YHT   μ̂_Ymr   μ̂_YR    μ̂_YHT   μ̂_Ymr   μ̂_YR
-1.0    ∞       0.50    ∞       0.50    0.25    ∞       0.25    0.17
-0.8    141.30  0.45    290.30  0.45    0.23    713.50  0.23    0.15
-0.6    19.51   0.40    22.20   0.40    0.20    30.93   0.20    0.13
-0.4    6.11    0.35    5.08    0.35    0.18    5.33    0.18    0.12
-0.2    2.53    0.30    1.60    0.30    0.15    1.47    0.15    0.10
 0.0    1.25    0.25    0.63    0.25    0.13    0.42    0.13    0.083
 0.2    0.66    0.20    0.27    0.20    0.10    0.16    0.10    0.067
 0.4    0.43    0.15    0.25    0.15    0.075   0.39    0.08    0.050
 0.6    0.47    0.10    0.73    0.10    0.050   1.50    0.05    0.033
 0.8    1.10    0.05    3.54    0.05    0.025   11.15   0.03    0.017
 1.0    ∞       0.00    ∞       0.00    0.00    ∞       0.00    0.00

Var(μ̂_Ymr) is not defined for r = 1.

TABLE IX
Var(μ̂_YHT) and Var(μ̂_Ymr) when X is Gamma Distributed, P(x) is Exponential, k = 0.5, g = 2.
(Each entry should be multiplied by μ_y²/n; a hyphen marks an empty cell.)

        r = 1           r = 2           r = 3
c       μ̂_YHT   μ̂_Ymr   μ̂_YHT   μ̂_Ymr   μ̂_YHT   μ̂_Ymr
-1.0    -       0.25    ∞       0.25    ∞       0.25
-0.8    165.67  0.25    301.10  0.25    720.6   0.25
-0.6    15.13   0.25    24.20   0.25    32.06   0.25
-0.4    7.00    0.25    5.79    0.25    5.81    0.25
-0.2    2.91    0.25    2.05    0.25    1.73    0.25
 0.0    1.50    0.25    0.88    0.25    0.67    0.25
 0.2    0.74    0.25    0.47    0.25    0.42    0.25
 0.4    0.46    0.25    0.46    0.25    0.71    0.25
 0.6    0.46    0.25    1.00    0.25    2.26    0.25
 0.8    1.06    0.25    4.13    0.25    14.97   0.25
 1.0    ∞       0.25    ∞       0.25    -       0.25

Results for P(x) Exponential and k = 1

The results are given in Tables X-XII and illustrated by Figures 10-12. The ratio estimator has the smallest variance throughout. When g = 0, r = 3 and either c > 0.23 or c ≤ -0.1, the variance of the mean-of-the-ratios estimator is smaller than that of the HT estimator. The HT estimator beats the mean-of-the-ratios estimator for -0.1 ≤ c < 0.28. When g = 1, r = 2 and either c ≤ 0.12 or c > 0.3, the mean-of-the-ratios estimator is more efficient than the HT estimator, which beats the mean-of-the-ratios estimator for 0.12 < c ≤ 0.3. When r = 3 the mean-of-the-ratios estimator has smaller variance than the HT estimator.

When g = 2 the variance of the mean-of-the-ratios estimator is constant at μ_y²/n and is smaller than that of the HT estimator. In this particular case the minimum point of the graph of the variance of the HT estimator moves to the left, and the value of the variance at its minimum increases, as r increases. An increase in the value of g has effects similar to those mentioned in the other cases above. As with the polynomial case, an increase in the value of k from 0.5 to 1 has the effect of making the estimators less efficient for fixed values of c, r and g.

TABLE X
Variances of the Estimators when X is Gamma Distributed, P(x) is Exponential, k = 1, g = 0.
(Each entry should be multiplied by μ_y²/n.)

        r = 1           r = 2           r = 3
c       μ̂_YHT   μ̂_YR    μ̂_YHT   μ̂_YR    μ̂_YHT   μ̂_Ymr   μ̂_YR
-1.0    ∞       4.00    ∞       1.00    ∞       2.00    0.47
-0.8    140.7   3.24    288.70  0.81    713.4   1.62    0.36
-0.6    20.10   2.56    22.13   0.64    30.80   1.28    0.28
-0.4    5.61    1.96    5.13    0.49    5.25    0.98    0.22
-0.2    3.30    1.44    1.77    0.36    1.41    0.72    0.16
 0.0    2.00    1.00    0.75    0.25    0.31    0.50    0.11
 0.2    1.50    0.64    0.45    0.16    0.23    0.32    0.07
 0.4    1.44    0.36    0.57    0.09    0.52    0.18    0.04
 0.6    1.77    0.16    1.41    0.04    2.07    0.08    0.018
 0.8    3.53    0.04    6.29    0.01    15.58   0.02    0.0044
 1.0    ∞       0.00    ∞       0.00    ∞       0.00    0.00

Var(μ̂_Ymr) is not defined for k = 1, g = 0, r ≤ 2.

TABLE XI
Variances of the Estimators when X is Gamma Distributed, P(x) is Exponential, k = 1, g = 1.
(Each entry should be multiplied by μ_y²/n; a hyphen marks an empty cell.)

        r = 1           r = 2                   r = 3
c       μ̂_YHT   μ̂_YR    μ̂_YHT   μ̂_Ymr   μ̂_YR    μ̂_YHT   μ̂_Ymr   μ̂_YR
-1.0    -       2.00    -       2.00    1.00    ∞       1.00    0.67
-0.8    151.8   1.80    292.20  1.80    0.90    714.80  0.90    0.60
-0.6    22.44   1.60    59.55   1.60    0.80    31.30   0.80    0.53
-0.4    7.60    1.40    5.61    1.40    0.70    5.59    0.70    0.47
-0.2    3.56    1.20    2.25    1.20    0.60    1.70    0.60    0.40
 0.0    2.00    1.00    2.00    1.00    0.50    0.67    0.50    0.33
 0.2    1.31    0.80    0.67    0.80    0.40    0.50    0.40    0.27
 0.4    1.07    0.60    0.78    0.60    0.30    0.98    0.30    0.20
 0.6    1.20    0.40    1.64    0.40    0.20    3.03    0.20    0.13
 0.8    2.26    0.20    6.43    0.20    0.10    20.79   0.10    0.07
 1.0    ∞       0.00    ∞       0.00    0.00    ∞       0.00    0.00

Var(μ̂_Ymr) is not defined for r = 1.

[Figure: Variances of the estimators when X is gamma distributed.]

TABLE XII
Var(μ̂_YHT) and Var(μ̂_Ymr) when X is Gamma Distributed, P(x) is Exponential, k = 1, g = 2.
(Each entry should be multiplied by μ_y²/n; a hyphen marks an empty cell.)

        r = 1           r = 2           r = 3
c       μ̂_YHT   μ̂_Ymr   μ̂_YHT   μ̂_Ymr   μ̂_YHT   μ̂_Ymr
-1.0    ∞       1       ∞       1       ∞       1
-0.8    276.78  1       346.30  1       742.0   1
-0.6    37.46   1       31.04   1       37.87   1
-0.4    12.16   1       8.95    1       7.50    1
-0.2    5.51    1       3.58    1       2.86    1
 0.0    3.00    1       2.00    1       1.67    1
 0.2    1.90    1       1.49    1       1.55    1
 0.4    1.43    1       1.60    1       2.40    1
 0.6    1.45    1       2.72    1       6.08    1
 0.8    2.51    1       9.00    1       36.41   1
 1.0    ∞       1       ∞       1       -       1

Summary of Results and Some Conclusions

The following observations can be made from the results:

(1) Under the given assumptions about the distribution of the auxiliary variable, X, and the sampling designs, and for g = 0 or 1, the ratio estimator is better than both the mean-of-the-ratios and the HT estimators.
Under a very wide range of the values of m, the mean-of-the-ratios estimator performs better than the HT estimator.

(2) The HT estimator is very sensitive to changes in the values of the design function parameters m and c. Within a certain (small) range of the values of these parameters the HT estimator sometimes has smaller variance than the mean-of-the-ratios estimator.

(3) The variances of these estimators are usually smaller when k = 0.5 than when k = 1.

(4) For fixed values of k and g, the variances of the ratio and mean-of-the-ratios estimators usually increase with a decrease in the value of r. With the HT estimator, it depends on the range of values of m or c considered and the design function used. With the polynomial design function, the efficiency of the HT estimator usually increases with r. With the exponential design function and for, say, c < 0.3, the efficiency of the HT estimator improves with an increase in the value of r; when c > 0.4 the HT estimator usually becomes less efficient as r takes higher values.

(5) For the same values of k, and for the same ranges of the values of the design function parameters, m or c, the effect of increasing g is usually to make the ratio and the mean-of-the-ratios estimators less efficient.

(6) Other things fixed, the variances of the ratio and the mean-of-the-ratios estimators decrease with an increase in the values of m or c.

(7) The ratio and mean-of-the-ratios estimators have smaller variances when selection is 'purposive' (m > 0 or c > 0) than when simple random sampling (m = c = 0) is employed.

(8) The HT estimator is most efficient when the sampling scheme is approximately pps, i.e. when m is approximately 1 (in the polynomial case).

(9) In most cases, the variances of these estimators can be equalized by simply choosing different combinations of the parameters.
For example, the variance of the ratio estimator for k = 1, g = 1 and r = 1 equals the variance of the mean-of-the-ratios estimator with k = 1, g = 1 and r = 2 (when P(x) is polynomial).

(10) For a greater range of the values of c or m, the estimators have smaller variances when P(x) is polynomial than when it is exponential.

(11) With an exponential design function, the variances of the estimators are finite for a very small range of the values of c. The variances of the ratio and mean-of-the-ratios estimators are zero when c = 1.

CHAPTER 4

DISCUSSION OF RESULTS

Some Empirical Comparisons

P.S.R.S. Rao (1969) compared four ratio-type estimators under the regression model

\[ y_i = \alpha + \beta x_i + e_i, \]

where

\[ E(e_i \mid x_i) = 0, \qquad E(e_i e_j \mid x_i, x_j) = 0, \qquad \operatorname{Var}(e_i \mid x_i) = \delta x_i^{g}, \]

and X is gamma distributed with parameter h. He found that when α = 0 and g > 1, the MSE's increase as h increases. He considered values of h > 2 and samples of size 2-10. These results are not quite compatible with my results for the same MSE's of the ratio and mean-of-the-ratios estimators (Rao's first estimator is the ratio estimator). But from (2.5) and (2.9), it is clear that, for g > 2, the variance of the mean-of-the-ratios estimator increases with r. For g < 2, the variance decreases with r. Since I did not work out the variance of the ratio estimator for g > 1, I do not have much to compare with. Certainly when g = 1, the variance of the ratio estimator decreases as r increases.

Rao also found that for fixed n and h (or r) the MSE's increase as g increases. This agrees with my results and can be inferred from (2.5)-(2.7). Lastly, his results show that for α = 0, n > 2 and g = 1 or 2, the ratio estimator has smaller MSE than the other three. It is implied by his paper that the choice of α and g combinations does affect the MSE's of the estimators.

My results show that the variances of the ratio and mean-of-the-ratios estimators become smaller as the design function parameter m or c increases.
(2.5)-(2.7) and (2.10)-(2.11) show similar results. But the larger the values of m or c, the higher will be the inclusion probabilities for units with large values of X compared with those with small values of X. This leads to the notion that if units with large values of X are purposely included in the sample, the estimation procedure will be more efficient. Many researchers, notably Royall (1970, 1971), have found similar results under similar conditions. Royall's results also show that the ratio estimator is better when combined with designs other than simple random sampling. For g = 1, k = 1, he shows that the ratio estimator is the best linear unbiased estimator. When used with a simple random sampling design, it remains optimal only if M is infinite.

Since Godambe proved the non-existence of a uniformly minimum variance unbiased estimator among the class of all unbiased linear estimators for any sampling design, it may be proper to note that while discussing these optimality properties of the ratio estimator we are, most of the time, restricting ourselves to a certain class of linear estimators only (Godambe's 1955 class (iii)). The same remarks apply to the optimality conditions given by Cochran (1963). The HT estimator belongs to Godambe's (1955) sub-class (i) of estimators. It is the best and the only unbiased linear estimator in the sub-class.

Under a super-population set-up, Rao (1967) considered the problem of choosing a suitable strategy for the ratio, mean-of-the-ratios and HT estimators. His results suggest that the mean-of-the-ratios estimator with πps sampling is better than the other two estimators used with the Midzuno-Sen sampling scheme. My results suggest that if the ratio and mean-of-the-ratios estimators are combined with simple random sampling, there always exists an m or c such that the variance of the mean-of-the-ratios estimator is less than the variances of the ratio and of the HT estimators.
Similarly, the variance of the ratio estimator with a varying-probability-of-inclusion procedure can be made small compared with those of the HT and mean-of-the-ratios estimators with simple random sampling.

If in formula (2.2) for the mean-of-the-ratios estimator we let X_i = r P(X_i), the HT estimator (2.1) is obtained. Hanurav (1967) was interested in finding sampling designs under which this P(X_i) = X_i / r and the variance of the mean-of-the-ratios estimator is unbiased (he was estimating the population total). For n = 2, he gives two sequential sampling procedures that solve the problem. With my study, the conditions are easy to find. When P(X) is polynomial, we want

\[ P(X_i) = \frac{\Gamma(r)}{\Gamma(m+r)} X_i^{m} = \frac{X_i}{r}; \]

when P(X) is exponential we have

\[ P(X_i) = (1-c)^{r} e^{cX_i} = \frac{X_i}{r}. \]

It is easy to find an r (c or m) that will give the required P(X_i) for fixed m or c (or r), provided c, m ≠ 0.

With polynomial P(x), this study is really a special case of that considered by Cassel and Sarndal (1973). Nearly all of my findings are in agreement with theirs. They find, for example, that:

(i) When g = 2, the mean-of-the-ratios estimator has constant variance regardless of both the design and the distribution function, f(x).

(ii) The ratio estimator combined with simple random sampling can be quite inefficient compared with the same estimator used with other sampling designs.

(iii) Under certain conditions, one should use purposive selection of the units with largest X-values.

(iv) When g ≤ 1 the ratio estimator is at least as efficient as the mean-of-the-ratios estimator for any design P(x). In the case that I am considering, for n > 1 and g ≤ 1, the expressions for the variances of the ratio and the mean-of-the-ratios estimators show that the ratio estimator has smaller variance than the mean-of-the-ratios estimator, and the difference between the two variances increases with n.
For example, when g = 0 and P(x) is polynomial, we have

\[ \operatorname{Var}(\hat{\mu}_{Y_{mr}}) = \frac{n k^{2} \mu_y^{2}}{n^{2}(m+r-1)(m+r-2)}, \qquad \operatorname{Var}(\hat{\mu}_{Y_R}) = \frac{n k^{2} \mu_y^{2}}{(nm+nr-1)(nm+nr-2)}. \]

(v) Variances of the ratio and mean-of-the-ratios estimators increase with m or c.

(vi) If g = 2, regardless of the design function used, the mean-of-the-ratios estimator has variance that is smaller than, or equal to, that of the ratio estimator (I did not work out the variance of the ratio estimator when g = 2).

(vii) If g > 2 the variance of the mean-of-the-ratios estimator becomes small if the design assigns the bulk of the selection probabilities to the units with smallest X-values.

(viii) The HT estimator is generally highly sensitive to shifts in the design. Its variance is a minimum when selection probabilities are somewhere in the vicinity of the pps procedure. The variance can become very large due to minor deviations from the point of minimum.

Comments and Some Implications

From the results of this study, it would seem that instead of trying to look for estimators that are generally optimal, like uniformly minimum variance estimators, more effort should be put into defining clearly and simply the conditions under which the popular estimators are efficient.

Under the usual regression models, the choice of g, it seems, determines the extent to which the best choice of sampling strategy can improve the estimation process. In most cases studied, it has been found that the value of g lies between 1 and 2. This may be unfortunate, as the three common estimators I have considered can be more efficient when g = 0. On the other hand, g = 0 implies that the variance of the error term, Z_i, is constant, which is a very unlikely situation in practice. For 1 ≤ g ≤ 2, the variances of the three estimators can be made to attain their minimum values by a proper choice of m, r and g. Perhaps the good thing about the continuous variable model is that the expressions for the variances of the estimators are very simple.
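As an illustration of how simple these expressions are, the two g = 0 polynomial-design variances quoted under item (iv) can be evaluated directly. This is a minimal sketch, not the computation used in the study: the function names are hypothetical, and the numerator of the ratio-estimator variance is read here as n k² μ_y², which is an assumption about a partly illegible formula (it reproduces tabulated large-n values such as 0.028 for k = 0.5, r = 1, m = 2 in Table I).

```python
import math


def var_mr_g0(m, r, k, n, mu_y=1.0):
    """Variance of the mean-of-the-ratios estimator, g = 0, polynomial P(x)."""
    a = m + r
    if a <= 2:
        return math.inf  # the expression is not finite in this range
    return k ** 2 * mu_y ** 2 / (n * (a - 1) * (a - 2))


def var_ratio_g0(m, r, k, n, mu_y=1.0):
    """Variance of the ratio estimator, g = 0, polynomial P(x).

    Assumption: the numerator is n * k**2 * mu_y**2.
    """
    s = n * (m + r)
    if s <= 2:
        return math.inf
    return n * k ** 2 * mu_y ** 2 / ((s - 1) * (s - 2))


# Multiply by n to put the values on the scale used in the tables.
for n in (2, 10, 100):
    mr = n * var_mr_g0(2.0, 1.0, 0.5, n)
    ratio = n * var_ratio_g0(2.0, 1.0, 0.5, n)
    print(n, round(mr, 4), round(ratio, 4))
```

On this scale the mean-of-the-ratios value does not depend on n, while the ratio-estimator value approaches k²/(m + r)² as n grows.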
If one were interested in getting the exact minimum values of the variances of these estimators, it should not be too difficult to do so. One would likely have to use the computer and some mathematical programming techniques.

The results also call for more attention to the choice of estimators and sampling designs when doing survey sampling. In particular, the results show again that, in most cases, simple random sampling is not an optimal sampling design. There are other, better ones. And when estimating the population mean, the sample average need not give optimal results. There may be other, better estimators. In practice, the sampler, under this model, will partly be able to control the design function parameters. In such situations, studies like this may help the sampler make a proper choice of the design function parameter that will give the best results.

In this study, as in other similar studies, the ratio and mean-of-the-ratios estimators promise good results when selection is purposive under certain conditions. Most sample survey experts object to this method of selection because, as Hansen, Hurwitz and Madow (1953, p. 9) put it:

(a) Methods of selecting samples based on the theory of probability are the only general methods known to us which can provide a measure of precision. Only by using probability methods can objective numerical statements be made concerning the precision of the results of the survey;

(b) It is necessary to be sure that the conditions imposed by the use of probability methods are satisfied. It is not enough to hope or expect that they are. Steps must be taken to meet these conditions by selecting methods that are tested and are demonstrated to conform to the probability model.
They continue:

We assert that, with rare exceptions, the precision of estimates not based on known probabilities of selecting the samples cannot be predicted before the survey is made, nor can the probabilities or precision be estimated after the sample is obtained. If we know nothing of the precision, then we do not know whether to have much faith in our estimates, even though highly accurate measurements are made on the units in the sample.

Random sampling is usually supported by similar arguments, namely: it protects against failure of certain probabilistic assumptions; it averages out effects of unobserved or unknown random variables; it guards against unconscious bias on the part of the experimenter; it will usually produce a sample in which the X's are spread throughout the range of X values in the population, and this enables the sampler to check the accuracy of assumptions concerning the relation of the Y's to the X's; and, again, it enables the sampler to estimate, from the sample, the precision of his estimate.

But probability methods can do nothing more than give us expectations about, say, the possible precision of the results of the survey. The precision would usually be stated in terms of the probability that the estimate deviates from the real value, and as long as it is given in these probability terms, it does nothing more than give expectations, however high and refined the probabilities may be. On the other hand, if many studies point to the fact that non-probability methods, like purposive selection, lead, under certain conditions, to efficient estimation, the same probability theory may allow that under similar sampling conditions we can expect, with high probability, to obtain similarly good results. As Royall (1970) argues, if the sampler believes it to be important that he obtain
a sample in which the X values have a certain configuration, then he should choose a sample deliberately and not leave it to the choice of a certain chance mechanism.

In this study, the ratio estimator has once again shown its superiority over the mean-of-the-ratios and the HT estimators. The HT estimator has revealed its sensitivity to the choice of parameters of the model and of the sampling design. The mean-of-the-ratios estimator may not be very far off from the ratio estimator. The results of the study are quite similar to those of similar studies under different regression-type models. I would like to note that stratified sampling combined with simple random sampling in each stratum can be achieved, under this model, by assigning the same P(x) to all members of one stratum and different P(x) to members of different strata.

Some Limitations

In putting the ideas contained in this study into practice, the order of events is (1) estimate F(x), the distribution function of x, (2) approximate g and θ, and (3) investigate some sampling designs and estimators and choose the ones that give the best results. This study would help the sampler to examine approximately the behaviour of the different designs and estimators he may be considering. The assumption of an infinite population that has approximately a continuous frequency distribution, while it helps simplify the investigation of different designs, also makes the situation considered an idealization of the real situation.

To estimate the frequency function of x, one could start by observing the histogram of the x values and choose or fit an approximate continuous function, possibly by some mathematical curve-fitting techniques, that closely resembles the histogram; the continuous function thus obtained has to be standardized to become a cumulative distribution function. To approximate θ and g, we could use likelihood methods like the ones suggested by Brewer (1963).
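Step (1) above, estimating F(x), could be approached as follows when a gamma shape looks plausible for the auxiliary variable. This is only a sketch under assumptions not made in the study: it fits a two-parameter gamma distribution by the method of moments instead of the curve-fitting techniques mentioned above, and the function name and the sample values are hypothetical.

```python
import statistics


def fit_gamma_by_moments(x_values):
    """Return (shape, scale) of a gamma distribution matched to the data.

    Method of moments: mean = shape * scale and variance = shape * scale**2,
    so shape = mean**2 / variance and scale = variance / mean.
    """
    mean = statistics.fmean(x_values)
    variance = statistics.pvariance(x_values, mu=mean)
    if mean <= 0 or variance <= 0:
        raise ValueError("need positive, non-constant x values")
    shape = mean ** 2 / variance
    scale = variance / mean
    return shape, scale


# Hypothetical auxiliary values observed for the units of a small population.
shape, scale = fit_gamma_by_moments([0.4, 0.9, 1.1, 1.9, 2.2, 2.6, 3.2, 4.8])
print("shape:", round(shape, 2), "scale:", round(scale, 2))
```

If the gamma distribution used in this study is taken to have unit scale, which is how the exponential design function reads, a fitted scale far from 1 would suggest rescaling X before treating the fitted shape as r.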
It is quite possible that some problems will be encountered in evaluating the variances of the estimators we may want to investigate, given the approximate distribution function of x. Should it, for example, turn out that the X values are approximately normally distributed and the sampler wants to investigate the sampling designs and estimators I have studied, it would not be easy to evaluate the variances. But with other distribution functions and sampling designs the evaluation should be straightforward, and studying the properties of such sampling strategies is easy. Cassel and Sarndal (1973) show that results obtained under this model are valid for values of N as low as N = 10.

Some Recommendations

I think sample survey theorists should spend more time simplifying and unifying the results of their research. They should spend more time refining and reducing the number of different sampling designs and estimators that they are considering. They should, somehow, formulate a simple unified theory of sampling that can easily be put into practice. In this connection, the estimators I have considered may prove useful. Surely, some problems may crop up initially. I also think that some ideas from General Statistical Theory can be useful in formulating a simple sample survey theory; there is no need to divorce one of the sampling theories from the other.

REFERENCES

Basu, D. (1971). An essay on the logical foundations of survey sampling, Part 1. In Foundations of Statistical Inference, ed. by V.P. Godambe and D.A. Sprott, Toronto: Holt, Rinehart and Winston of Canada Ltd., 203-242.

Brewer, K.R.W. (1963). Ratio estimation and finite populations: some results deducible from the assumption of an underlying stochastic process. Aust. J. Statist., 5, 93-105.

Cassel, C.M. and Sarndal, C.E. (1972a). A model for studying robustness of estimators and informativeness of labels in sampling with varying probabilities. J. R. Statist. Soc. B, 35, 279-289.

Cassel, C.M. and Sarndal, C.E. (1972b). An analytic framework offering some guidelines to the choice of sampling design and the choice of estimator for finite populations. Working paper No. 142, Faculty of Commerce, UBC.

Cassel, C.M. and Sarndal, C.E. (1973). Evaluation of some sampling strategies for finite populations using a continuous variable framework. To appear in the April issue of Communications in Statistics.

Cochran, W.G. (1946). Relative accuracy of systematic and stratified random samples for a certain class of populations. Ann. Math. Statist., 17, 164-177.

Cochran, W.G. (1963). Sampling Techniques. New York: Wiley.

Durbin, J. (1953). Some results in sampling theory when the units are selected with unequal probabilities. J. R. Statist. Soc. B, 15, 262-269.

Foreman, E.K. and Brewer, K.R.W. (1971). The efficient use of supplementary information in standard sampling procedures. J. R. Statist. Soc. B, 33, 391-400.

Godambe, V.P. (1955). A unified theory of sampling from finite populations. J. R. Statist. Soc. B, 17, 269-278.

Godambe, V.P. (1965). A review of the contributions towards a unified theory of sampling from finite populations. Rev. Int. Statist. Inst., 32, 242-253.

Godambe, V.P. (1969). Some aspects of the theoretical developments in survey sampling. In New Developments in Survey Sampling, ed. by N.L. Johnson and H. Smith, Jr., 27-58. New York: Wiley.

Godambe, V.P. (1970). Foundations of survey-sampling. The American Statistician, 24, 33-38.

Godambe, V.P. and Joshi, V.M. (1965). Admissibility and Bayes estimation in sampling finite populations, I. Ann. Math. Statist., 36, 1707-1722.

Goodman, R. and Kish, L. (1950). Controlled selection - a technique in probability sampling. J. Amer. Statist. Assoc., 45, 350-372.

Hansen, M.H., Hurwitz, W.N. and Madow, W.G. (1953). Sample Survey Methods and Theory, Vol. II. New York: John Wiley & Sons.

Hanurav, T.V. (1967). Optimum utilization of auxiliary information: πPS sampling of two units from a stratum. J. R. Statist. Soc. B, 29, 374-391.

Hartley, H.O. and Rao, J.N.K. (1962). Sampling with unequal probabilities and without replacement. Ann. Math. Statist. 2. 77-86.

Horvitz, D.G. and Thompson, D.J. (1952). A generalization of sampling without replacement from a finite universe. J. Amer. Statist. Assoc., 47, 663-685.

Joshi, V.M. (1969). Admissibility of estimates of the mean of a finite population. In New Developments in Survey Sampling, ed. by N.L. Johnson and H. Smith, Jr., 188-207. New York: Wiley.

Neyman, J. (1934). On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. J. R. Statist. Soc., 97, 558-606.

Rao, J.N.K. (1969). Ratio and regression estimators. In New Developments in Survey Sampling, ed. by N.L. Johnson and H. Smith, Jr., 213-234. New York: Wiley.

Rao, P.S.R.S. (1969). Comparison of four ratio-type estimates under a model. J. Amer. Statist. Assoc., 64, 574-580.

Rao, T.J. (1967). On the choice of a strategy for the ratio method of estimation. J. R. Statist. Soc. B, 29, 392-397.

Royall, R.M. (1970). On finite population sampling theory under certain linear regression models. Biometrika, 57, 377-387.

Royall, R.M. (1971). Linear regression models in finite population sampling theory. In Foundations of Statistical Inference, ed. by V.P. Godambe and D.A. Sprott, Toronto: Holt, Rinehart and Winston of Canada Ltd., 259-279.

Sarndal, C.E. (1972). Sample survey theory vs. general statistical theory: estimation of the population mean. Rev. Int. Statist. Inst., 40, 1-12.

Sukhatme, P.V. and Sukhatme, B.V. (1970). Sampling Theory of Surveys with Applications. Asia Publishing House, Bombay 1.
The ratio, mean-of-the ratios and Horvitz-Thompson estimators under the continuous variable model Chamwali, Anthony Alifa 1974-01-21
Item Metadata
Title | The ratio, mean-of-the ratios and Horvitz-Thompson estimators under the continuous variable model |
Creator | Chamwali, Anthony Alifa |
Date Issued | 1974 |
Description | This study investigates the performances of the ratio estimator, the mean-of-the-ratios estimator and the Horvitz-Thompson (HT) estimator under the continuous variable model of Cassel and Sarndal (1972a, 1972b, 1973). Under this model, the character, Y, which is of interest to the investigator, is assumed to be related to an auxiliary variable, X, by Y(Xi) = θ(Xi + Z(Xi)) where ℇ(Zi | Xi) = 0 ∀ Xi ∈ (0, ∞); ℇ(Zi² | Xi) = σ²(Xi) = k² Xi^g; ℇ(ZiZj | Xi, Xj) = 0 (i ≠ j). It is assumed, in this paper, that X is gamma distributed over (0, ∞) with parameter r. The mean of Y is to be estimated, under the additional assumptions that the design function, P(x), is 1) polynomial 2) exponential, i.e. [formulas are not included]. It is observed that for g = 0 or 1, the ratio estimator performs better than the other two. For g = 0, 1 or 2, and for a wider range of values of m or c, the mean-of-the-ratios estimator performs better than the HT estimator. When P(X) is polynomial, the HT estimator is most efficient if the sampling design is approximately pps. The results compare well with those of other researchers under similar assumptions. |
Genre | Thesis/Dissertation |
Type | Text |
Language | eng |
Date Available | 2010-01-20 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0092965 |
URI | http://hdl.handle.net/2429/18800 |
Degree | Master of Science in Business - MScB |
Program | Business Administration |
Affiliation | Business, Sauder School of |
Degree Grantor | University of British Columbia |
Campus | UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |