UBC Theses and Dissertations

Small area quantile estimation under unit-level models Zhang, Qiong 2017

Small Area Quantile Estimation under Unit-Level Models

by

Qiong Zhang

B.Sc. Statistics, University of Science and Technology of China, 2015

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Statistics)

The University of British Columbia (Vancouver)

August 2017

© Qiong Zhang, 2017

Abstract

Sample surveys are widely used as a cost-effective way to collect information on variables of interest in target populations. In applications, we are generally interested in parameters such as population means, totals, and quantiles. Similar parameters for subpopulations or areas, formed by geographic areas and socio-demographic groups, are also of interest in applications. However, the sample size in a subpopulation might be small or even zero due to the probability sampling and the budget limitation. There has been intensive research on how to produce reliable estimates of characteristics of interest for subpopulations for which the sample size is small or even zero. We call this line of research Small Area Estimation (SAE). In this thesis, we study the performance of a number of small area quantile estimators based on a popular unit-level model and its variations.

When a finite population can be regarded as a sample from some model, we may use the whole sample from the finite population to determine the model structure with good precision. The information can then be used to produce more reliable estimates for small areas. However, if the model assumption is wrong, the resulting estimates can be misleading and their mean squared errors can be underestimated. Therefore, it is critical to check the robustness of estimators under various model mis-specification scenarios. In this thesis, we first conduct simulation studies to investigate the performance of three small area quantile estimators in the literature. They are found not to be very robust in some likely situations.
Based on these observations, we propose an approach to obtain more robust small area quantile estimators. Simulation results show that the proposed new methods have superior performance when the error distribution in the model is non-normal or the data set contains many outliers.

Lay Summary

Sample surveys are cost-effective ways to collect information on different characteristics of interest. However, the sample size for a subpopulation might be very small or even zero due to randomness and budget limitations. We are interested in providing reliable quantile estimators for subpopulations for which the sample size is small. We first study three small area quantile estimators and then investigate other possible small area quantile estimators that are potentially more robust.

Preface

This dissertation is original, unpublished work by the author, Qiong Zhang. The research topic was suggested by the supervisor, Prof. Jiahua Chen. The supporting simulation studies were designed and carried out by the author.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements

1 Introduction
  1.1 Small area estimation
  1.2 Quantile estimation
  1.3 Small area quantile estimation
  1.4 Outline of the thesis

2 Preliminaries
  2.1 Notation and assumptions
  2.2 Model-based and design-based methods
  2.3 Area-level models and unit-level models
  2.4 Nested error linear regression model with applications to small area mean estimation
    2.4.1 Nested error linear regression model
    2.4.2 Real data example for small area mean estimation
  2.5 Transformation of quantile estimation into cumulative distribution function prediction

3 Small Area Quantile Estimation
  3.1 Molina-Rao approach
  3.2 Chen-Liu approach
    3.2.1 Estimation under nested error linear regression model
    3.2.2 Estimation under density ratio model
  3.3 M-quantile approach
    3.3.1 M-quantile regression model
    3.3.2 M-quantile approach for small area estimation
  3.4 Performance of quantile estimators
    3.4.1 Simulation setting
    3.4.2 Estimators under comparison
    3.4.3 Performance measures
    3.4.4 Simulation results

4 Small Area Quantile Estimators based on Linear Mixed Effect Median Regression Models
  4.1 Linear mixed effect median regression model
  4.2 New quantile estimation
    4.2.1 Estimation under DRM
    4.2.2 Estimation under identically distributed errors
    4.2.3 The equivalence of quantile estimation under DRM and under identically distributed errors
  4.3 Simulation study
    4.3.1 Simulation setting
    4.3.2 Estimators for comparison
    4.3.3 Simulation results

5 Conclusion and Future Topics

Bibliography

List of Tables

Table 2.1: Crop area survey and satellite data for corn and soybeans in 12 Iowa counties.
Table 2.2: Predicted hectares of corn, with standard errors, for the sample mean predictors and the predictor based on the NELRM.
Table 3.1: The AAB and AMSE under scenarios [1]-[3] with $n_i = 10$ and $N_i = 500$. The estimators are sorted so that the AMSEs of the 50% quantile estimators are in ascending order.
Table 3.2: The AAB and AMSE under scenarios [4]-[6] with $n_i = 10$ and $N_i = 500$. The estimators are sorted so that the AMSEs of the 50% quantile estimators are in ascending order.
Table 4.1: The average coefficient of determination across 1000 repetitions under scenarios (i)-(vi) when $n_i = 10$ and $N_i = 500$.
Table 4.2: The AAB and AMSE under scenarios (i)-(iii) when $n_i = 10$ and $N_i = 500$. The estimators are sorted so that the AMSEs of the 50% quantile estimators are in ascending order.
Table 4.3: The AAB and AMSE under scenarios (iv)-(vi) when $n_i = 10$ and $N_i = 500$. The estimators are sorted so that the AMSEs of the 50% quantile estimators are in ascending order.
Table 4.4: The average MSEs of the estimated regression coefficients based on different approaches under scenarios (i)-(vi) when $n_i = 10$ and $N_i = 500$.
Table 4.5: The AAB and AMSE under scenarios (i)-(iii) when $n_i = 30$ and $N_i = 1000$. The estimators are sorted so that the AMSEs of the 50% quantile estimators are in ascending order.
Table 4.6: The AAB and AMSE under scenarios (iv)-(vi) when $n_i = 30$ and $N_i = 1000$. The estimators are sorted so that the AMSEs of the 50% quantile estimators are in ascending order.

List of Figures

Figure 1.1: Histogram of the 1987 annual salary on opening day, in thousands of dollars.
Figure 2.1: Plot of reported hectares versus number of pixels by county for (a) corn; (b) soybeans.
Figure 3.1: Different choices of $\psi(u)$ in the M-quantile regression model.
Figure 3.2: The AB and MSE of quantile estimators with respect to small areas under scenario [1].
Figure 3.3: The AB and MSE of quantile estimators with respect to small areas under scenario [2].
Figure 3.4: The AB and MSE of quantile estimators with respect to small areas under scenario [3].
Figure 3.5: The AB and MSE of quantile estimators with respect to small areas under scenario [4].
Figure 3.6: The AB and MSE of quantile estimators with respect to small areas under scenario [5].
Figure 3.7: The AB and MSE of quantile estimators with respect to small areas under scenario [6].
Figure 4.1: Probability density functions of the normal, Laplace, and mixture normal distributions.
Figure 4.2: Scatter plot of $y$ versus $x$ in an example sample generated under: (a) an error distribution matching the assumption for the NELRM; (b) non-normal but symmetric and identical error distributions; (c) non-normal, non-symmetric but identical error distributions; (d) non-normal, non-symmetric and non-identical error distributions; (e) outliers created by a large variance of the error distribution; (f) mean-shift outliers. The observations in each area are sorted so that $x$ is in ascending order, and a line is drawn in each area to connect these observations. The black line is the regression with slope 50 and intercept 10.

Acknowledgements

I would like to express my deepest gratitude to my supervisor, Professor Jiahua Chen. Thank you for your support, encouragement, patience, and excellent ideas. Thank you for helping me learn to think critically and accurately. Thank you for everything, from the bottom of my heart.

I would like to thank CANSSI for supporting my internship at Statistics Canada. I would like to express my sincere gratitude to Professor J.N.K. Rao for his willingness to share his remarkable knowledge and insight. I would also like to thank my supervisor at Statistics Canada, Yong You, and Jean-François for their comments, suggestions, and support during my internship. Special thanks to the small area estimation group at Statistics Canada for their kindness and help. I am also grateful to Prof. William J. Welch for being my second reader and for his helpful suggestions concerning my thesis.

Big thanks to my mother, Decun Li, and my brother, Teng Zhang, for their unchanging love. I also wish to thank Kuan Wang for being my best friend and supporting me all the time.
Lastly, special thanks to everyone in the Department of Statistics at UBC for making our department such a wonderful place.

Chapter 1
Introduction

1.1 Small area estimation

Sample surveys are widely used as a cost-effective way for both public and private agencies to gather information on variables of interest in target populations. People are generally interested in estimating population parameters such as the total, mean, and median based on a random sample drawn from the target population. Such parameters for subpopulations or areas, formed by geographic areas and socio-demographic groups, are also of interest in applications. Design-based estimators make use of mostly direct information contained in the sample. They have satisfactory performance when the sample size from the target population is sufficiently large, which a proper sampling plan ensures. Although the sampling plan and sample size in a survey are usually well designed at the population level, such as a country or province, the sample size might be small or even zero in subpopulations (e.g., counties) due to probability sampling and budget limitations. In this case, the design-based estimators of subpopulation parameters may fail to attain the desired accuracy and become unreliable. The research problem of improving estimation accuracy and producing reliable estimates of parameters for subpopulations in this situation is called Small Area Estimation (Rao and Molina, 2015). "Small area" does not refer to the small geographical area of the subpopulation, but to the small sample size in the subpopulation.

To overcome the limitation induced by small sample sizes, statisticians usually use auxiliary information and indirect information from other areas to "borrow" strength and thereby "increase" the effective sample size. The strength-borrowing strategy can be achieved by both design-based and model-based methods. In this thesis, we only discuss methods that implement the strength-borrowing strategy through models.
The definition of model-based methods and their differences from design-based methods are discussed in Section 2.2. We also illustrate the idea of model-based methods and strength borrowing via a real example of small area mean estimation in Section 2.4.2.

1.2 Quantile estimation

A quantile of a population can be a more informative summary than the mean when the population distribution is skewed, which happens in many applications. To illustrate the usefulness of population quantiles, let us take the 1987 annual salaries of USA Major League Baseball players, obtained from the ISLR package in R, as an example. Figure 1.1 is a histogram of the players' salaries.

[Figure 1.1: Histogram of the 1987 annual salary on opening day, in thousands of dollars.]

The histogram shows that the salary distribution is highly skewed to the right, as a few excellent players earn far more than the other players. The population mean is therefore not indicative of the salary of an average player: one would grossly overestimate that salary if the mean were used. In comparison, the median is more reliable than the mean in this case. In addition, the median is much less influenced than the mean by the presence of a few extremely large observations. In the current example, the highest salary among the players is 2460 (thousands of dollars), the mean of the annual salaries is 535.9, and the median is 425. When we manually change the highest salary to 8000, an extremely high value, the mean increases to 557 while the median remains the same. That is, the median is hardly influenced by the presence of a very large observation, but the mean is affected.
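The mean/median comparison above can be reproduced with a short script. This is a toy illustration with made-up salary values, not the ISLR Hitters data:

```python
# The mean chases a single inflated value while the median does not.
import statistics

salaries = [90, 110, 150, 200, 300, 425, 500, 740, 900, 2460]  # thousands

mean_before = statistics.mean(salaries)      # 587.5
median_before = statistics.median(salaries)  # 362.5

salaries[-1] = 8000  # inflate the top salary, as in the text
mean_after = statistics.mean(salaries)       # 1141.5: jumps by (8000 - 2460) / 10
median_after = statistics.median(salaries)   # 362.5: unchanged

print(mean_before, mean_after)
print(median_before, median_after)
```

The same pattern holds for the real salary data: only the mean moves when one observation is pushed far into the tail.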
Lastly, one can use multiple quantiles to provide an informative summary of the distribution under investigation.

1.3 Small area quantile estimation

The small area mean estimation problem has been studied intensively over many years, and some methods have been successfully applied to real survey data (Elbers et al., 2003; Schaible, 2013; Tzavidis et al., 2008). Although both the mean and the quantiles of a variable of interest are functionals of its distribution function, the success of small area mean estimation relies on the fact that the mean is linear in the variable of interest. Since quantiles are nonlinear in the variable of interest, this nonlinearity makes small area quantile estimation difficult (Diallo and Rao, 2014). Fewer, but still many, methods have been proposed in the literature for small area quantile estimation. We now give a short review of this area of research; a more detailed review of some existing methods is given in Chapter 3.

Based on a unit-level model, Molina and Rao (2010) proposed Empirical Best (EB) estimators of complex parameters. The EB estimator was initially proposed for assessing poverty mapping, but it can also be used for estimating small area quantiles. A key assumption of this model is that the observations can be described by a linear regression with a normally distributed error term. Naturally, violation of the normality assumption may lead to large bias and a significant increase in the mean squared error (MSE) of the estimators of complex parameters. To overcome this shortcoming, Diallo and Rao (2014) developed EB estimators of complex parameters such as quantiles under the basic unit-level model where the error terms follow a skewed-normal distribution. This problem is also tackled from another angle by Chen and Liu (2017). They did not place any parametric assumptions on the error terms, but assumed that the distributions of the different small areas are linked through a Density Ratio Model (DRM).
They subsequently proposed a semi-parametric method for quantile estimation. Chambers and Tzavidis (2006) proposed an M-quantile approach for small area estimation which is claimed to be robust to outliers. We refer to Pfeffermann et al. (2013) for a comprehensive review of the important new developments in Small Area Estimation (SAE), and to Rao and Molina (2015) for technical details and examples of these methods.

We study the performance of the M-quantile approach for small area quantile estimation and find that it is not as robust to outliers as anticipated. Outliers are very common in survey data, especially in business surveys (Beaumont and Rivest, 2009). Therefore, developing robust small area quantile estimators is an important research problem. In this thesis, we explore and develop small area quantile estimators that are robust to outliers and non-normal error distributions. Our idea is to combine the ideas of Chen and Liu (2017) and Chambers and Tzavidis (2006). In commonly used linear regression models, the conditional expectation of the variable of interest is assumed to be linear in the explanatory variables. In this thesis, we explore the use of a median regression model, in which the conditional median of the variable of interest is assumed to be linear in the explanatory variables. Based on the median regression model, we adopt the same semi-parametric approach as Chen and Liu (2017) for small area quantile estimation. We conjecture that the median-regression-based small area quantile estimators are more robust to outliers. The conjecture is supported by the simulation studies.

1.4 Outline of the thesis

The thesis is structured as follows. In Chapter 2, we introduce some basic concepts in small area estimation, including but not limited to design-based and model-based methods. In Chapter 3, we review three approaches in the literature for small area quantile estimation: Molina and Rao (2010), Chambers and Tzavidis (2006), and Chen and Liu (2017).
Chapter 3 concludes with simulation studies showing the pros and cons of these approaches. In Chapter 4, we introduce the proposed quantile estimators based on a linear mixed effect median regression model. Simulation studies and the corresponding results are presented, both under the correct model and under different model misspecification scenarios, to reveal the performance of the proposed estimator and of several other estimators discussed in the thesis. Lastly, in Chapter 5, we draw conclusions and discuss possible future work on small area quantile estimation.

Chapter 2
Preliminaries

2.1 Notation and assumptions

Consider a finite population of size $N$ made of $m$ areas such that the population size of the $i$th area is $N_i$. Let $U_i$ be the index set of population units in small area $i$, and let $s_i$ and $r_i = U_i - s_i$ respectively denote the index sets of sampled and non-sampled units in area $i$. Let
$$\{(y_{ij}, \mathbf{x}_{ij}),\ i = 1, \ldots, m;\ j = 1, \ldots, N_i\}$$
be the values of the response and explanatory/auxiliary variables. For convenience, we use $\{(y_{ij}, \mathbf{x}_{ij}),\ i = 1, \ldots, m;\ j = 1, \ldots, n_i\}$ for the random sample drawn from the finite population, where $i$ is the area index and $j$ is the unit index in the corresponding small area. We also assume the sample is non-informative, namely that the population and the observed sampling units can both be regarded as random samples from the same probability model. This probability model will be referred to as a superpopulation model. Informative samples need to be treated more carefully (Guadarrama et al., 2016). The total sample size is denoted $n = n_1 + n_2 + \cdots + n_m$. We take $y_{ij}$ as the variable of interest, known only for sampled units $j \in s_i$; $\mathbf{x}_{ij} = (x_{ij1} = 1, x_{ij2}, \ldots, x_{ijp})'$ are the auxiliary variables and are known for all units in the finite population.
This is a reasonable assumption in many cases, since the auxiliary variables can often be obtained from a census in real-world applications.

2.2 Model-based and design-based methods

There are generally two types of inference methods for analysing survey data: model-based methods and design-based methods. In this section we give a brief discussion of the difference between them, illustrated through their treatments of estimating the mean of a finite population based on a simple random sample drawn without replacement. As mentioned in Section 1.1, this thesis focuses on model-based small area quantile estimation methods. We also discuss the strengths and weaknesses of model-based methods.

Suppose $P = \{Y_1, Y_2, \ldots, Y_N\}$ is the finite population and $S = \{y_1, y_2, \ldots, y_n\}$ is a simple random sample without replacement of size $n$ drawn from $P$. The population mean is given by $\bar{Y} = N^{-1}\sum_{i=1}^N Y_i$, and the sample mean $\bar{y} = n^{-1}\sum_{i=1}^n y_i$ is an estimator of the population mean. The sample mean is a sensible estimator of the population mean under both design-based and model-based analyses.

When the sample mean is regarded as a design-based estimator, we study its variance from a specific angle. In this case, we regard the $N$ response values in $P$ as non-random. However, because the $n$ units are drawn at random according to the sampling plan, the outcome of the sample mean differs from one realization to another. Hence, from this angle, the sample mean is a random variable. Is it an unbiased estimator of the finite population mean, and how large is its variance?

Let us start with the design-based analysis, in which the response values $Y_1, \ldots, Y_N$ are regarded as non-random constants. We use $E_d$ to indicate that the expectation is with respect to the design-induced randomness.
It is seen that
$$
E_d(\bar{y}) = E_d\Big(\frac{1}{n}\sum_{i=1}^n y_i\Big)
= E_d\Big(\frac{1}{n}\sum_{i=1}^N I(Y_i \in S) Y_i\Big)
= \frac{1}{n}\sum_{i=1}^N Y_i\, E_d\big(I(Y_i \in S)\big)
= \frac{1}{n}\sum_{i=1}^N Y_i\,\frac{n}{N} = \bar{Y}.
$$
In the above calculation, we used the fact that $E_d(I(Y_i \in S)) = n/N$ under simple random sampling without replacement. The variance of the sample mean is given by
$$
\begin{aligned}
\mathrm{var}(\bar{y})
&= \mathrm{Cov}\Big(\frac{1}{n}\sum_{i=1}^N I(Y_i \in S) Y_i,\ \frac{1}{n}\sum_{i=1}^N I(Y_i \in S) Y_i\Big)\\
&= \frac{1}{n^2}\sum_{i=1}^N \mathrm{var}\big(I(Y_i \in S)\big) Y_i^2
 + \frac{1}{n^2}\sum_{i \neq j} \mathrm{cov}\big(I(Y_i \in S), I(Y_j \in S)\big) Y_i Y_j\\
&= \frac{1}{n^2}\sum_{i=1}^N \frac{n}{N}\Big(1 - \frac{n}{N}\Big) Y_i^2
 - \frac{1}{n^2}\sum_{i \neq j} \frac{1}{N-1}\,\frac{n}{N}\Big(1 - \frac{n}{N}\Big) Y_i Y_j\\
&= \Big(1 - \frac{n}{N}\Big)\frac{v^2}{n}
\end{aligned}
$$
where $v^2 = (N-1)^{-1}\sum_{i=1}^N (Y_i - \bar{Y})^2$. The key fact in the above calculation is that
$$\mathrm{cov}\big(I(Y_i \in S), I(Y_j \in S)\big) = \frac{n(n-1)}{N(N-1)} - \frac{n^2}{N^2}.$$

In the model-based analysis, the response values in the finite population, $Y_1, Y_2, \ldots, Y_N$, are regarded as independent and identically distributed observations from some distribution such as $N(\mu, \sigma^2)$. Note that this distribution is also regarded as a superpopulation. Under simple random sampling without replacement, the response values in the sample, $y_1, y_2, \ldots, y_n$, are also independent and identically distributed observations from, say, $N(\mu, \sigma^2)$ as well. The finite population mean $\bar{Y}$ is a function of the random variables. It is not a model parameter, but is generally regarded as a finite population parameter. The sample mean $\bar{y}$ may be regarded as an estimator of the superpopulation mean $\mu$ or as a predictor of $\bar{Y}$. Let $E_m$ be the expectation operator under the model. It is seen that
$$
\begin{aligned}
E_m(\bar{y} - \bar{Y})
&= E_m\Big(\frac{1}{n}\sum_{y_i \in S} y_i - \frac{1}{N}\sum_{i=1}^N y_i\Big)\\
&= \Big(\frac{1}{n} - \frac{1}{N}\Big)\sum_{y_i \in S} E_m(y_i) - \frac{1}{N}\sum_{y_i \notin S} E_m(y_i)\\
&= \Big(\frac{1}{n} - \frac{1}{N}\Big) n\mu - \frac{N-n}{N}\mu = 0.
\end{aligned}
$$
Therefore, $\bar{y}$ is a model-unbiased predictor of the population mean. Under the superpopulation model, the variance of the sample mean is $\sigma^2/n$. One may be interested to find that the model-based calculation gives
$$\mathrm{var}(\bar{y} - \bar{Y}) = \Big(1 - \frac{n}{N}\Big)\frac{\sigma^2}{n}.$$

Note that in the above two analyses the same sample mean is regarded as the point estimator or the predictor of the finite population mean, respectively. In practice, if the normal model $N(\mu, \sigma^2)$ is adopted, one would estimate $\sigma^2$ by the sample variance $v^2$.
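The design-based identities above can be checked numerically. The following is a minimal sketch on a synthetic population; the population values and sizes are invented for illustration:

```python
# Monte Carlo check of E_d(ybar) = Ybar and var(ybar) = (1 - n/N) v^2 / n
# under simple random sampling without replacement.
import random
import statistics

random.seed(1)
N, n = 50, 10
Y = [random.gauss(100.0, 15.0) for _ in range(N)]  # fixed finite population
Ybar = statistics.mean(Y)
v2 = statistics.variance(Y)  # divisor N - 1, matching the definition of v^2

# Repeatedly draw SRSWOR samples and record the sample mean.
draws = [sum(random.sample(Y, n)) / n for _ in range(100_000)]
mc_mean = statistics.mean(draws)
mc_var = statistics.variance(draws)
theory_var = (1 - n / N) * v2 / n

print(abs(mc_mean - Ybar))  # close to 0: design-unbiasedness
print(mc_var / theory_var)  # close to 1: finite population correction works
```

The factor $(1 - n/N)$ is the finite population correction; the simulated variance would overshoot the theoretical one by exactly this factor if it were omitted.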
Thus the design-based approach and the model-based approach with the normal model lead to the same estimator of the population mean and the same variance. If a different model were adopted, however, a design-based approach and a model-based approach can lead to the same point estimator while the variances under the two approaches differ (Lohr, 2009). What is the reason behind this difference? It lies in the assumed sources of randomness. In the design-based analysis, the variance is attributed to the differences in the $Y$ values between units in the finite population. In the model-based analysis, the variance is attributed to the superpopulation: the response value of unit $i$ differs from one realization of the finite population to another, and many possible finite populations could be generated from the superpopulation.

In the above example, we use the normal model for the purpose of illustration. In other applications, other probability models may be assumed based on our knowledge of the data. For example, if an auxiliary variable is highly correlated with the variable of interest, a linear regression model may be a good description of the relationship between them. When a probability model suits the nature of the finite population, effective data analysis strategies can be designed accordingly. However, when the assumed model does not appropriately reflect the nature of the finite population, the subsequent data analysis strategies can be misleading. Sometimes the predictor is more biased than we believe; in addition, the MSE of the predictor can be underestimated. Therefore, it is crucial to check the validity of model assumptions, which can be a hard task in real applications. We refer to Thompson (1997) and Valliant et al.
(2000) for discussions of the differences between design-based and model-based methods.

In model-based methods, our objects of interest are often not model parameters but summary statistics of the finite population. Therefore, we generally predict rather than estimate the object of interest. This will be the case for small area problems, and we should refer to many of our research problems as prediction problems. Nevertheless, in this thesis we adopt the terminology in the literature and use "estimation" instead of "prediction". We distinguish between "estimation" and "prediction" only when necessary.

2.3 Area-level models and unit-level models

For the purpose of small area estimation, two types of models are often utilized, depending on the source of the auxiliary information: area-level and unit-level models. We introduce these two types of models and discuss their differences in this section.

Area-level models relate direct estimators of the characteristic of interest of small areas to corresponding area-specific auxiliary variables. The area-level model was first used by Fay and Herriot (1979), and hence the model is named the Fay-Herriot model. The common representation of the Fay-Herriot model is
$$\hat{\theta}_i = \theta_i + e_i = \mathbf{x}_i'\boldsymbol{\beta} + u_i + e_i, \quad i = 1, \ldots, m \tag{2.1}$$
where the area-specific errors $u_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma_u^2)$ and the sampling errors $e_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma_e^2)$. In
The most widely used unit-level model is the nested error linearregression model to be discussed in Section 2.4.Both area-level models and unit-level models are special linear mixed effectmodels where the random effects and the random errors account for the betweenarea and within area correlation. The principal difference between area-level mod-els and unit-level models are the availability of the auxiliary information at differ-ent levels. The difference leads to different response variables and their relation-ship with the auxiliary information. In this thesis, we only focus on the unit-levelmodels for small area quantile estimation.2.4 Nested error linear regression model withapplications to small area mean estimationSince we focus on unit-level models for small area quantile estimation, it is criticalto clearly describe and begin with the most widely used unit-level model in smallarea estimation called the Nested Error Linear Regression Model (NELRM). TheNELRM is used by Battese et al. (1988) for small area mean estimation. In thissection, we introduce the NELRM and show how the model was applied to IowaAgriculture Survey and for small area mean estimation in Battese et al. (1988).2.4.1 Nested error linear regression modelIn a nested error linear regression model, the variable of interest, yi j, is associatedwith the auxiliary variables at the population level byyi j = x′i jβ + vi+ ei j, i = 1, · · · ,m; j = 1, · · · ,Ni11where area-specific random effects vii.i.d∼ N(0,σ2v ) are independent of unit errorsei ji.i.d∼ N(0,σ2e ). We obtain a non-informative sample from the finite populationwhich means the sample also follows the NELRM given byyi j = x′i jβ + vi+ ei j, i = 1, · · · ,m; j = 1, · · · ,ni (2.2)where vii.i.d∼ N(0,σ2v ) are independent of ei j i.i.d∼ N(0,σ2e ). 
The NELRM, whenapplied to the sample, can be written in a matrix form asy = Xβ +Zv+ e (2.3)wherey′ = (y′1,y′2, · · · ,y′m), y′i = (yi1,yi2, · · · ,yini)X′ = (X′1,X′2, · · · ,X′m), X′i = (xi1,xi2, · · · ,xini)v′ = (v1, · · · ,vm)Z = diag(Z1,Z2, · · · ,Zm), Zi = 1nie′ = (e′1,e′2, · · · ,e′m), e′i = (ei1,ei2, · · · ,eini).The NELRM is the special case of general linear mixed effect model whichis also called linear random intercept model. The elements of β are called fixedeffects and the elements of v are called random effects. The model is “linear”because the conditional mean of yi j is a linear function of fixed and random effects.E(y|v,X) = Xβ +ZvMoreover, [ve]∼ N([00],[G 00 R])where G = σ2v Im and R = σ2e In. Therefore,[ye]=[Xβ0]+[Z II 0][ve]12and [ye]∼ N([Xβ0],[Z II 0][G 00 R][Z′ II 0]).The variable of interest y given X follows a multivariate normal distribution withmean and covariance matrix given byE(y|X) = XβVar(y|X) = R+ZGZ′ := Σ(σ2e ,σ2v )When the parameters β and η ′ = (σ2v ,σ2e ) are known, we predict the random ef-fects v by the corresponding Best Linear Unbiased Predictor (BLUP) denoted asv˜ which is an unbiased predictor linear in y with the smallest MSE. Given v˜ is anunbiased predictor for v which is linear in y, v˜ has the form v˜ = Ay+ c, where Aand c are some matrix and vector, the unbiasedness of v˜ implies thatE(v˜) = E(Ay+ c)= E [E(Ay+ c|v,X)] = E [AXβ ]+ c= AXβ + c = E(v) = 0Therefore, we must have c = −AXβ and v˜ = A(y−Xβ ) when the parameters βand η are known. Suppose d(y) is another linear unbiased predictor of v, by thefact thatE[(d(y)−v)2|y]≥ E[(E(v|y)−v)2|y]The BLUP of v is the BLUE of E(v|y) = GZ′Σ−1(y−Xβ ). When the parametersare known, by the specific form of Z and Σ, the BLUP for the random effects arev˜i = γi(y¯i− X¯′i)β (2.4)where γi = σ2vσ2v +σ2e /ni. However, in applications, the parameters η and β are un-known. 
Then we substitute these unknown parameters with their estimates and obtain the Empirical Best Linear Unbiased Predictor (EBLUP) for $v$.

These parameters can be estimated by the method of moments (Henderson, 1953), or alternatively by maximum likelihood (ML) or REstricted Maximum Likelihood (REML) based on the normal likelihood (Hartley and Rao, 1967; Patterson and Thompson, 1971). If the maximum likelihood method is employed, the estimators for the unknown parameters $\eta$ and $\beta$ maximize the log-likelihood function

$$l(\beta, \eta) = -\frac{1}{2}\log(|\Sigma(\eta)|) - \frac{1}{2}(y - X\beta)'\Sigma(\eta)^{-1}(y - X\beta). \quad (2.5)$$

For numerical computation, the profile likelihood is usually used to obtain the estimator. Following Battese et al. (1988) and Prasad and Rao (1990), when the variance components $\eta$ are known, the MLE of $\beta$ is given by

$$\hat{\beta}(\eta) = \big(X'\Sigma(\eta)^{-1}X\big)^{-1}X'\Sigma(\eta)^{-1}y. \quad (2.6)$$

The MLE $\hat{\eta}$ of $\eta$ then maximizes

$$l(\eta) = -\frac{1}{2}\log(|\Sigma(\eta)|) - \frac{1}{2}\big(y - X\hat{\beta}(\eta)\big)'\Sigma(\eta)^{-1}\big(y - X\hat{\beta}(\eta)\big). \quad (2.7)$$

Substituting $\hat{\eta}$ into (2.6), we get the ML estimators $\hat{\beta}_{ML}$ and $\hat{\eta}_{ML}$. Replacing the unknown parameters in (2.4) by the corresponding estimators, the resulting predictor, denoted $\hat{v}$, is the EBLUP of $v$.

So far, we have presented the setup and parameter estimation in the nested error linear regression model. One may ask how the nested error linear regression model can be applied for small area estimation. We give a real data example of its use in small area mean estimation in the next section. Extensions of the nested error linear regression model, to cases where the variances of the error terms $e_{ij}$ differ over small areas and where the variable of interest is multivariate, can be found in Rao and Molina (2015).

2.4.2 Real data example for small area mean estimation

The use of the nested error linear regression model for small area estimation is motivated by real survey data in Battese et al. (1988), where they were engaged in the prediction of mean hectares of corn and soybeans per segment for 12 counties in north-central Iowa.
The survey, called the 1978 June Enumerative Survey (JES), was conducted to improve crop area estimates for large regions such as states and also to develop estimates of crop areas for small regions such as counties. The sampling unit in the JES, called a segment, is an area of land, typically 250 hectares in size. A two-level stratified sampling plan was employed in the survey (Sigman et al., 1978). The first-level strata are the individual states. Secondary strata are areas of land within a state which have similar patterns of land use. The secondary strata were determined by visual interpretation of aerial photographs from the land observatory satellites (LANDSAT) data. The hectares of corn and soybeans and other information related to the segments in the sample were obtained by interviewing farm operators.

Table 2.1 presents the number of segments and the reported hectares in each sampled segment in each of the 12 counties of interest in Iowa. As shown in the table, the sample sizes in these counties are very small. Battese et al. (1988) were interested in the possible use of the LANDSAT data to improve the precision of the estimation of the mean crop areas. The LANDSAT satellites collect measurements of the earth's surface in units called "pixels", each about 0.4 hectares in size. A procedure of the U.S. Department of Agriculture (USDA) can determine the type of crop grown in a pixel based on the satellite data. Based on this procedure, the number of pixels growing corn and soybeans in each segment can be obtained. This information is given in the last two columns of Table 2.1.

Plots of reported hectares versus number of pixels by county in the sample are shown in Figure 2.1. From the plots, Battese et al. (1988) found it evident that there is a strong relationship between the reported hectares of corn and the number of pixels of corn, and between the reported hectares of soybeans and the number of pixels of soybeans.
In addition, the plots indicate that observations for segments within counties tend to be closer to each other than observations between counties. Based on these observations, they proposed the nested error linear regression model for the relationship between the reported hectares and the number of pixels of corn and soybeans from the satellite.

Let $y_{ij}$ be the reported hectares of corn (or soybeans) of the $j$th segment in the $i$th area. Let $x_{ij1}$ be the corresponding number of pixels of corn, and $x_{ij2}$ the number of pixels of soybeans. Let the $v_i$ be the random effects of the counties and the $e_{ij}$ be segment-specific errors.

Table 2.1: Crop Area Survey and Satellite Data for Corn and Soybeans in 12 Iowa Counties.

County      | Segments (sample) | Segments (county) | Corn hectares | Soybean hectares | Corn pixels | Soybean pixels
Cerro Gordo | 1 | 545 | 165.76 |   8.09 | 374 |  55
Hamilton    | 1 | 566 |  96.32 | 106.32 | 209 | 218
Worth       | 1 | 394 |  76.08 | 103.60 | 253 | 250
Humboldt    | 2 | 424 | 185.35 | 116.43 | 432 |  96
            |   |     | 116.43 |  63.82 | 367 | 178
Franklin    | 3 | 564 | 162.08 |  43.50 | 361 | 137
            |   |     | 152.04 |  71.43 | 288 | 206
            |   |     | 161.75 |  42.49 | 369 | 165
Pocahontas  | 3 | 570 |  92.88 | 105.26 | 206 | 218
            |   |     | 149.94 |  76.49 | 316 | 221
            |   |     |  64.75 | 174.34 | 145 | 338
Winnebago   | 3 | 402 | 127.07 |  95.67 | 355 | 128
            |   |     | 133.55 |  76.57 | 295 | 147
            |   |     |  77.70 |  93.48 | 223 | 204
Wright      | 3 | 567 | 206.39 |  37.84 | 459 |  77
            |   |     | 108.33 | 131.12 | 290 | 147
            |   |     | 118.17 | 124.44 | 207 | 258
Webster     | 4 | 687 |  99.96 | 144.15 | 252 | 303
            |   |     | 140.43 | 103.60 | 293 | 221
            |   |     |  98.95 |  88.59 | 206 | 222
            |   |     | 131.04 | 115.58 | 302 | 274
Hancock     | 5 | 569 | 114.12 |  99.15 | 313 | 190
            |   |     | 100.60 | 124.56 | 246 | 270
            |   |     | 127.88 | 110.88 | 353 | 172
            |   |     | 116.90 | 109.14 | 271 | 228
            |   |     |  87.41 | 143.66 | 237 | 297
Kossuth     | 5 | 965 |  93.48 |  91.05 | 221 | 167
            |   |     | 121.00 | 132.33 | 369 | 191
            |   |     | 109.91 | 143.14 | 243 | 249
            |   |     | 122.66 | 104.13 | 342 | 182
            |   |     | 104.21 | 118.57 | 294 | 179
Hardin      | 6 | 556 |  88.59 | 102.59 | 220 | 262
            |   |     |  88.59 |  29.46 | 340 |  87
            |   |     | 165.35 |  69.28 | 355 | 160
            |   |     | 104.00 |  99.15 | 261 | 221
            |   |     |  88.63 | 143.66 | 187 | 345
            |   |     | 153.70 |  94.49 | 350 | 190
The model at the population level is given by

$$y_{ij} = \beta_0 + \beta_1 x_{ij1} + \beta_2 x_{ij2} + v_i + e_{ij} = x_{ij}'\beta + v_i + e_{ij}, \quad i = 1, \ldots, 12; \ j = 1, \ldots, N_i \quad (2.8)$$

where $x_{ij} = (1, x_{ij1}, x_{ij2})'$, $\beta = (\beta_0, \beta_1, \beta_2)'$, $v_i \overset{i.i.d.}{\sim} N(0, \sigma_v^2)$, and $e_{ij} \overset{i.i.d.}{\sim} N(0, \sigma_e^2)$. Due to the non-informativeness of the sampling plan, the model also holds for the sampled units.

The parameters of interest, $\vartheta_i = \frac{1}{N_i}\sum_{j=1}^{N_i} y_{ij}$, written in terms of the auxiliary information based on the NELRM, are given by

$$\vartheta_i = \frac{1}{N_i}\sum_{j=1}^{N_i} y_{ij} = \bar{X}_{i\cdot}'\beta + v_i + \bar{e}_{i\cdot} \approx \bar{X}_{i\cdot}'\beta + v_i \quad (2.9)$$

where $\bar{X}_{i\cdot} = \frac{1}{N_i}\sum_{j=1}^{N_i} x_{ij}$ contains the population mean pixel values. The approximation is justified by the strong law of large numbers, since the error terms have mean zero. The population mean pixel values are known because the numbers of pixels of corn and soybeans are available from the satellite classifications for all segments in the $i$th county. The authors then derived the empirical best linear unbiased predictor for the subpopulation means:

$$\hat{\vartheta}_i = \hat{\gamma}_i\bar{y}_i + (\bar{X}_{i\cdot} - \hat{\gamma}_i\bar{x}_i)'\hat{\beta} \quad (2.10)$$

where $\hat{\gamma}_i = \hat{\sigma}_v^2/(\hat{\sigma}_v^2 + \hat{\sigma}_e^2/n_i)$, $\bar{y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}$, and $\bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}$. The estimators $\hat{\sigma}_v^2$, $\hat{\sigma}_e^2$, and $\hat{\beta}$ of the corresponding unknown parameters in (2.10) are obtained by the maximum likelihood method as described in Section 2.4.1.

Table 2.2: Predicted Hectares of Corn With Standard Errors of Sample Mean Predictors and the Predictor Based on the NELRM.

County      | Sample segments | Predicted hectares | SE (NELRM predictor) | SE (sample mean)
Cerro Gordo | 1 | 122.2 | 9.6 | 30.5
Hamilton    | 1 | 126.3 | 9.5 | 30.5
Worth       | 1 | 106.2 | 9.3 | 30.5
Humboldt    | 2 | 108.0 | 8.1 | 21.5
Franklin    | 3 | 145.0 | 6.5 | 17.6
Pocahontas  | 3 | 112.6 | 6.6 | 17.6
Winnebago   | 3 | 112.4 | 6.6 | 17.6
Wright      | 3 | 122.1 | 6.7 | 17.6
Webster     | 4 | 115.8 | 5.8 | 15.2
Hancock     | 5 | 124.3 | 5.3 | 13.6
Kossuth     | 5 | 106.3 | 5.2 | 13.6
Hardin      | 5 | 143.6 | 5.7 | 13.6

The fourth column of Table 2.2 contains the estimated standard errors of the subpopulation mean estimators based on the NELRM given by (2.10).
The standard errors of the sample means $\bar{y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}$ are given in the last column of Table 2.2. As shown in Table 2.2, the estimated standard errors of the design-based sample mean estimator are almost three times those of the small area mean estimators based on the NELRM. This shows the improvement in the quality of the estimation. The improvement is explainable: because the $\hat{\vartheta}_i$ across different areas share the common regression coefficient $\beta$, estimating the regression coefficient using sampled information from all areas enables the strength-borrowing strategy, so that the parameter can be estimated more accurately.

The success of the NELRM-based small area mean estimator relies on the validity of $\vartheta_i = \bar{X}_{i\cdot}'\beta + v_i$. This representation in terms of the auxiliary variables makes the estimation of the subpopulation means straightforward. Unfortunately, the idea of representing the population parameter by the auxiliary variables cannot be directly used for small area quantile estimation, for two reasons. First, the population quantiles are non-linear in the $y_{ij}$. If we follow the same procedure, the subpopulation quantiles cannot be expressed explicitly. Therefore, we transform the quantile estimation problem into a cumulative distribution function (CDF) prediction problem, as described in Section 2.5. Second, even after transforming the quantile estimation problem into CDF prediction, the CDF is still non-linear in the $y_{ij}$ and cannot be handled by a straightforward extension of the small area mean estimator. Hence, small area quantile estimators are reviewed in Chapter 3 to show how unit-level models can be used for small area quantile estimation.

2.5 Transformation of quantile estimation into cumulative distribution function prediction

The definition of a population quantile is through its cumulative distribution function (CDF).
The CDF of the variable of interest $y$ in the $i$th small area is given by

$$F_i(t) = \frac{1}{N_i}\sum_{j=1}^{N_i} I(y_{ij} \leq t) = \frac{1}{N_i}\Big[\sum_{j \in s_i} I(y_{ij} \leq t) + \sum_{j \notin s_i} I(y_{ij} \leq t)\Big].$$

For any $\alpha \in (0, 1)$, the $\alpha$th quantile $\xi_{i,\alpha}$ of the distribution of $y$ in the $i$th area is usually defined as

$$\xi_{i,\alpha} = F_i^{-1}(\alpha) = \inf\{t : F_i(t) \geq \alpha\}. \quad (2.11)$$

Recall that in model-based approaches, the $y_{ij}$ in each area are treated as random samples from some probability model. Therefore, $\xi_{i,\alpha}$ in (2.11) is a finite population parameter which is random in the eyes of model-based data analysis. Once a predictor for $F_i$ is given, a natural predictor for a quantile can be obtained via (2.11). Hence, the problem of predicting quantiles for a finite population becomes the problem of predicting the CDF. In fact, one may also predict small area means through their connections with the corresponding CDFs:

$$\vartheta_i = \int t\, dF_i(t) = \frac{1}{N_i}\sum_{j=1}^{N_i} y_{ij}.$$

Following Molina and Rao (2010), many finite population parameters have the form

$$H_i = \frac{1}{N_i}\sum_{j=1}^{N_i} h(y_{ij}) \quad (2.12)$$

for some function $h(\cdot)$. When $h(x) = x$, $H_i$ is the finite population mean. When $h(x) = I(x \leq t)$, $H_i$ is the CDF of the variable of interest $y$ evaluated at $t$. To make the notation simpler and clearer, we use $H_{it}$ to denote the CDF of the variable of interest in the $i$th area evaluated at $t$. We discuss the problem of predicting $H_{it}$ for $t \in \mathbb{R}$.

Figure 2.1: Plot of Reported Hectares Versus Number of Pixels by County for (a) Corn; (b) Soybeans.

Chapter 3

Small Area Quantile Estimation

In this chapter, we study three existing small area quantile estimation methods in the literature.
The first is the parametric approach proposed by Molina and Rao (2010); it is based on the nested error linear regression model in which the error term is assumed to be normally distributed. The second method to be investigated is the one given by Chen and Liu (2017), who also assumed the nested error linear regression model. However, they removed the normality assumption on the error term and instead allow one nonparametric distribution for each area, while assuming that these distributions are connected via a density ratio model. The third method we investigate is the M-quantile regression based approach proposed by Chambers and Tzavidis (2006). We give a clear and detailed description of these three methods, after which we conduct a simulation study to examine their performance.

3.1 Molina-Rao approach

The Molina-Rao approach was proposed by Molina and Rao (2010) for estimating complex small area parameters such as poverty indicators. The method can also be used for estimating small area quantiles. In this section, we review the Molina-Rao approach and give details of the numerical implementation of the estimator. We specifically discuss the numerical implementation when the basic unit-level model is assumed; the numerical computation is then simplified by utilizing the properties of the multivariate normal distribution.

As discussed in Section 2.5, we wish to predict the CDF of the variable of interest evaluated at any $t \in \mathbb{R}$ in each area, which is given by $H_{it}$ in its more general form. Let $y_{is}$ and $y_{ir}$ respectively be the sub-vectors of $y_i$ corresponding to the sampled and non-sampled $y$ values.
The idea of the method is that for finite population parameters of the form (2.12), the unbiased predictor with minimum variance for $H_{it}$ under a probability model, when the parameters in the probability model are known, is given by

$$\hat{H}_{it} = E(H_{it}|y_{1s}, \ldots, y_{ms}) = \frac{1}{N_i}\Big[\sum_{j \in s_i} h(y_{ij}) + \sum_{j \notin s_i} E(h(y_{ij})|y_{1s}, \ldots, y_{ms})\Big] = \frac{1}{N_i}\Big[\sum_{j \in s_i} h(y_{ij}) + \sum_{j \notin s_i} \hat{h}_{ij}\Big]$$

where $\hat{h}_{ij} = E(h(y_{ij})|y_{1s}, \ldots, y_{ms})$, $j \notin s_i$, may be regarded as the predicted value of $h(y_{ij})$. Molina-Rao therefore predict the nonlinear function $h(y_{ij})$ evaluated at $t$ instead of predicting the non-sampled $y_{ij}$ values themselves. When the parameters in the probability model are unknown, as in a real application, $\hat{H}_{it}$ depends on unknown parameters. Molina and Rao (2010) then substituted these unknown parameters in $\hat{H}_{it}$ with suitable estimates, and the corresponding predictor for $H_{it}$ is called the Empirical Best (EB) predictor.

To be more specific, when the parameter values in the probability model are known, the mean squared error of a predictor $\hat{h}_{ij}$ is given by

$$\mathrm{MSE}(\hat{h}_{ij}) = E_{y_i}\{(\hat{h}_{ij} - h(y_{ij}))^2\} \quad (3.1)$$

where $E_{y_i}$ denotes the expectation with respect to the joint distribution of the population vector in the $i$th area. The Best Predictor (BP) of $h(y_{ij})$, which minimizes the mean squared error, is given by the conditional expectation

$$\hat{h}_{ij}^B = E_{y_{ir}}(h(y_{ij})|y_{is}) = \int_{\mathbb{R}} h(y)\, f_{y_{ij}}(y|y_{is})\, dy, \quad j \in r_i \quad (3.2)$$

where $f_{y_{ij}}(y|y_{is})$ is the conditional probability density function (pdf) of $y_{ij}$ given the observed $y_{is}$. We may regard (3.2) as an unbiased predictor since

$$E(\hat{h}_{ij}^B) = E_{y_{is}}\{E_{y_{ir}}(h(y_{ij})|y_{is})\} = E_{y_i}(h(y_{ij})).$$

Whenever $\hat{h}_{ij}^B$ depends on unknown parameters in the probability model, substituting these parameters with suitable estimators in (3.2) yields the corresponding predictor $\hat{h}_{ij}^{EB}$, called the Empirical Best Predictor (EBP). Thus, we may predict $F_i$ by

$$\hat{F}_i^{EB}(t) = \frac{1}{N_i}\Big[\sum_{j \in s_i} I(y_{ij} \leq t) + \sum_{j \in r_i} \hat{h}_{ij}^{EB}\Big]. \quad (3.3)$$

Subsequently, we may predict the $p$th quantile by

$$\hat{\xi}_{p,i}^{MR} = \inf\{t : \hat{F}_i^{EB}(t) \geq p\}. \quad (3.4)$$

Numerical implementation under the NELRM

To evaluate the predictor $\hat{F}_i^{EB}(t)$, one may first derive the conditional distribution $f_{y_{ij}}(y|y_{is})$ and then replace any unknown parameters in the conditional distribution with their estimates. We may then obtain numerical values of $\hat{h}_{ij}^{EB}$ via (3.2).

Deriving the conditional pdf of $y_{ij}|y_{is}$, $j \in r_i$, under the NELRM is relatively simple. Note that $y_i \sim N(\mu_i, V_i)$ with

$$\mu_i = X_i\beta, \quad V_i = \sigma_v^2 1_{N_i}1_{N_i}' + \sigma_e^2 I_{N_i}$$

where $1_{N_i}$ is a column vector of ones of length $N_i$ and $I_{N_i}$ is the $N_i \times N_i$ identity matrix. Using the decomposition $y_i' = (y_{is}', y_{ir}')$ and the properties of the multivariate normal distribution, we get

$$y_{ir}|y_{is} \sim N(\mu_{ir|s}, V_{ir|s}) \quad (3.5)$$

where

$$\mu_{ir|s} = X_{ir}\beta + \sigma_v^2 1_{N_i - n_i}1_{n_i}'V_{is}^{-1}(y_{is} - X_{is}\beta), \quad (3.6)$$

$$V_{ir|s} = \sigma_v^2(1 - \gamma_i)1_{N_i - n_i}1_{N_i - n_i}' + \sigma_e^2 I_{N_i - n_i}, \quad (3.7)$$

and

$$V_{is} = \sigma_v^2 1_{n_i}1_{n_i}' + \sigma_e^2 I_{n_i}, \quad \gamma_i = \frac{\sigma_v^2}{\sigma_v^2 + \sigma_e^2/n_i}.$$

The conditional distribution of $y_{ij}|y_{is}$ for $j \in r_i$ can be obtained accordingly. However, the conditional distribution contains the unknown parameters $\beta$, $\sigma_v^2$, and $\sigma_e^2$. They must be replaced by appropriate estimates before the proposed EB predictor can be evaluated. Molina and Rao (2010) did not recommend specific estimators for these parameters. We may estimate these parameters by their maximum likelihood estimates (MLE) as described below; the MLEs of the parameters are used in our simulation study.

When $h(x) = I(x \leq t)$ and under the basic unit-level model, the integral

$$\hat{h}_{ij}^B = \int_{\mathbb{R}} h(y)\, f_{y_{ij}}(y|y_{is})\, dy$$

becomes the CDF of the conditional distribution of $y_{ij}|y_{is}$ evaluated at $t$. The conditional distribution under the current model is normal with mean and variance given by (3.6) and (3.7), so we can easily compute its CDF at any $t$. For a complex function $h(x)$, we may not be able to find an explicit expression for (3.2). Molina and Rao (2010) proposed to approximate (3.2) by Monte Carlo simulation. Suppose

$$y_{ir}^{(\ell)} = \mathrm{col}(y_{ij}, j \in r_i)^{(\ell)} \overset{i.i.d.}{\sim} f_{y_{ir}}(y|y_{is}), \quad \ell = 1, \ldots, L$$

is the $\ell$th Monte Carlo sample, where the total number of Monte Carlo samples is $L$ and the parameter values in $f_{y_{ir}}(y|y_{is})$ are known.
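For a single non-sampled unit with covariate value $x_r$, (3.5)-(3.7) reduce to a univariate normal with mean $x_r'\beta + \gamma_i(\bar{y}_{is} - \bar{x}_{is}'\beta)$ and variance $\sigma_v^2(1 - \gamma_i) + \sigma_e^2$, so the best predictor of $I(y_{ij} \leq t)$ has a closed form which the Monte Carlo average must reproduce. A self-contained sketch (all parameter values hypothetical, not from the thesis):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
# Hypothetical "known" parameters and a simulated sample for one area
beta0, beta1 = 2.0, 0.5
sv2, se2 = 4.0, 9.0
n_i, t = 10, 12.0
x_s = rng.uniform(0.0, 10.0, n_i)
v_i = rng.normal(0.0, np.sqrt(sv2))
y_s = beta0 + beta1 * x_s + v_i + rng.normal(0.0, np.sqrt(se2), n_i)

# Conditional law of one non-sampled unit with covariate x_r
gamma = sv2 / (sv2 + se2 / n_i)
x_r = 6.0
mu_cond = beta0 + beta1 * x_r + gamma * (y_s.mean() - (beta0 + beta1 * x_s.mean()))
var_cond = sv2 * (1.0 - gamma) + se2

# Best predictor of h(y_ij) = I(y_ij <= t): exact normal CDF ...
h_exact = norm.cdf(t, loc=mu_cond, scale=np.sqrt(var_cond))
# ... versus its Monte Carlo approximation (3.8)
L = 100_000
draws = mu_cond + rng.normal(0.0, np.sqrt(var_cond), L)
h_mc = np.mean(draws <= t)
print(h_exact, h_mc)  # agree up to Monte Carlo error
```

For $h(x) = I(x \leq t)$ the exact value makes the simulation unnecessary; the Monte Carlo route matters for complex $h$, where no closed form is available.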
By the Strong Law of Large Numbers, we have

$$\hat{h}^B(y_{ij}) \approx \frac{1}{L}\sum_{\ell=1}^{L} I(y_{ij}^{(\ell)} \leq t). \quad (3.8)$$

Based on this fact, one may generate the $y_{ir}$ from a multivariate normal distribution of dimension $N_i - n_i$. In small area estimation, $N_i$ is large and $n_i$ is small, so it is computationally time-consuming or nearly impossible to generate a random vector of dimension $N_i - n_i$ in general. However, our target has a specific structure in the conditional mean and variance of $y_{ir}$. Taking advantage of this structure, we may first generate

$$u_i \sim N(0, \sigma_v^2(1 - \gamma_i)), \quad i = 1, \ldots, m, \quad \text{and} \quad \varepsilon_{ir} \sim N(0_{N_i - n_i}, \sigma_e^2 I_{N_i - n_i}),$$

and then create $y_{ir}$ by another nested error linear regression model specified as

$$y_{ir} = \mu_{ir|s} + u_i 1_{N_i - n_i} + \varepsilon_{ir}. \quad (3.9)$$

Thus, substituting the unknown parameters $\sigma_e^2$, $\sigma_v^2$, and $\beta$ by their MLEs $(\hat{\sigma}_e^2)_{ML}$, $(\hat{\sigma}_v^2)_{ML}$, and $\hat{\beta}_{ML}$, we generate the univariate normal variables

$$u_i^{(\ell)} \sim N(0, (1 - \gamma_i)(\hat{\sigma}_v^2)_{ML}) \quad \text{and} \quad \varepsilon_{ij}^{(\ell)} \sim N(0, (\hat{\sigma}_e^2)_{ML}).$$

With these quantities ready to use, we create

$$y_{ij}^{(\ell)} = \hat{\mu}_{ir|s}^{(j)} + u_i^{(\ell)} + \varepsilon_{ij}^{(\ell)}$$

where $\hat{\mu}_{ir|s}^{(j)}$ denotes the $j$th element of the vector

$$\hat{\mu}_{ir|s} = X_{ir}\hat{\beta}_{ML} + \hat{\sigma}_v^2 1_{N_i - n_i}1_{n_i}'\hat{V}_{is}^{-1}(y_{is} - X_{is}\hat{\beta}_{ML}).$$

The corresponding estimator after the Monte Carlo approximation becomes $\hat{h}^{EB}(y_{ij})$.

The Molina-Rao approach can be sensitive to violations of the normality assumption on the error distribution. To overcome this shortcoming, Diallo and Rao (2014) studied the empirical best predictor of complex small area finite population parameters under the assumption that the error distribution is skew normal. Although their method follows the same principle, the computation is much more involved due to technical complexities attributable to the skew normal distribution family. We do not review their numerical implementation in this thesis but refer to Diallo and Rao (2014) for more detailed information.

3.2 Chen-Liu approach

To obtain reliable estimates of small area quantiles, it can be helpful to develop methods that do not rely on a normality assumption on the error terms $e_{ij}$.
Following this line of thinking, Chen and Liu (2017) adopted the linear regression model with a nested error structure, but allowed more general distributions for the error terms. They assume

$$y_{ij} = x_{ij}'\beta + v_i + e_{ij} \quad (3.10)$$

where $v_i \overset{i.i.d.}{\sim} N(0, \sigma_v^2)$ are independent of the error terms, and

$$e_{ij} \overset{i.i.d.}{\sim} G_i(t)$$

for some $G_i$ linked by the following density ratio model (DRM):

$$\log\{dG_i(t)/dG_1(t)\} = \theta_i'q(t), \quad i = 2, \ldots, m$$

where the $\theta_i$ are free parameters and $q(t)$ is a vector of pre-specified basis functions. A possible choice is $q(t) = (1, t)'$.

We have already formulated the quantile estimation problem as a CDF prediction problem. To get an appropriate CDF predictor, we need appropriate estimators of the unknown parameters in the probability model and appropriate predictors of $h(y_{ij})$, $j \notin s_i$. The Chen-Liu approach concentrates on procedures that build predictors of $h(y_{ij})$, $j \notin s_i$, without relying on a normality assumption. If the auxiliary variables, $\beta$, the $v_i$, and the error distributions $G_i$ were known, they suggested that an estimator mimicking the Molina-Rao approach is possible. It is readily given by

$$\hat{F}_i(t) = \frac{1}{N_i}\Big[\sum_{j \in s_i} h(y_{ij}) + \sum_{j \notin s_i} G_i(t - x_{ij}'\beta - v_i)\Big]$$

because

$$E(h(y_{ij})) = P(y_{ij} \leq t) = E_{v,e}[I(y_{ij} \leq t)] = E_{v,e}\big[I(e_{ij} \leq t - x_{ij}'\beta - v_i)\big] = E_v\big\{E_e\big[I(e_{ij} \leq t - x_{ij}'\beta - v_i)\,\big|\,v_i\big]\big\} = E_v\big[G_i(t - x_{ij}'\beta - v_i)\big].$$

The error distributions $G_i$ and the parameters in the proposed model are unknown in applications. Therefore, a way must be found to estimate $G_i$ and the other parameters appropriately. The details of their approach are given in the following sections.

3.2.1 Estimation under the nested error linear regression model

It is helpful to first investigate the situation where the error distributions for the small areas are in fact identical and normal. That is,

$$G_1(t) = G_2(t) = \cdots = G_m(t) = \Phi\Big(\frac{t}{\sigma_e}\Big).$$

In this case, to obtain an estimator of the CDF, we only need to estimate the parameters in the NELRM to obtain small area quantile estimates.
Denote the ML estimators of $\beta$, $\sigma_v$, $\sigma_e$ as $\hat{\beta}_{ML}$, $\hat{\sigma}_v^{ML}$, $\hat{\sigma}_e^{ML}$ respectively, and denote the EBLUP of $v_i$ as $\hat{v}_i$. When the auxiliary variables are available for all population units, an EBP version of the CDF estimator is

$$\hat{F}_i^{Chen1}(t) = \frac{1}{N_i}\Big\{\sum_{j \in s_i} I(y_{ij} \leq t) + \sum_{j \notin s_i} \Phi\Big[\frac{t - x_{ij}'\hat{\beta}_{ML} - \hat{v}_i}{\hat{\sigma}_e^{ML}}\Big]\Big\}. \quad (3.11)$$

When the auxiliary variables are known only for the sampled units, a natural estimator is

$$\hat{F}_i^{Chen2}(t) = \frac{1}{n_i}\sum_{j=1}^{n_i} \Phi\Big[\frac{t - x_{ij}'\hat{\beta}_{ML} - \hat{v}_i}{\hat{\sigma}_e^{ML}}\Big]. \quad (3.12)$$

Once these CDF predictors are available, the corresponding $p$th quantile estimators are respectively given by

$$\hat{\xi}_{p,i}^{Chen1} = \inf\{t : \hat{F}_i^{Chen1}(t) \geq p\}, \quad (3.13)$$

$$\hat{\xi}_{p,i}^{Chen2} = \inf\{t : \hat{F}_i^{Chen2}(t) \geq p\}. \quad (3.14)$$

3.2.2 Estimation under the density ratio model

Recall that Chen and Liu (2017) adopted a semi-parametric DRM for the $G_i$ such that

$$\log\{dG_i(t)/dG_1(t)\} = \theta_i'q(t), \quad i = 2, \ldots, m$$

with the natural constraints

$$\int dG_i(t) = \int \exp(\theta_i'q(t))\, dG_1(t) = 1, \quad i = 1, \ldots, m. \quad (3.15)$$

One may, without loss of generality, set $\theta_1 = 0$. The form of the baseline distribution $G_1(t)$ is left unspecified. Because

$$\log\{dG_i(t)/dG_j(t)\} = (\theta_i - \theta_j)'q(t),$$

any distribution $G_j$ can be regarded as the baseline distribution. Without loss of generality, the distribution function of the first small area is used as the baseline distribution function.

The semiparametric nature of the DRM makes the empirical likelihood (EL) most appropriate for statistical inference. Suppose we have $m$ samples $\{e_{ij}, j = 1, \ldots, n_i\}$ from a DRM. Following the common convention of EL, Chen and Liu (2017) required the baseline distribution function to have the form

$$G_1(t) = \sum_{i,j} p_{ij}I(e_{ij} \leq t).$$

The support points of $G_1(t)$ include the error terms $e_{ij}$ from all the small areas. Under this setting, we have $dG_1(e_{ij}) = p_{ij}$ and $dG_k(e_{ij}) = \exp(\theta_k'q(e_{ij}))p_{ij}$ for $k = 2, \ldots, m$. As a result, the CDF of the error term in the $i$th area can be written as

$$G_i(t) = \sum_{k,j} p_{kj}\exp(\theta_i'q(e_{kj}))I(e_{kj} \leq t). \quad (3.16)$$

Because $e_{ij}$ follows $G_i(t)$, it contributes to the likelihood only through $dG_i(e_{ij})$.
The empirical likelihood function is given by

$$L_n(G_1, \ldots, G_m) = \prod_{i=1}^{m}\prod_{j=1}^{n_i} dG_i(e_{ij}) = \prod_{i,j}\big[p_{ij}\exp(\theta_i'q(e_{ij}))\big] = \Big(\prod_{i,j} p_{ij}\Big)\exp\Big[\sum_{i,j}\theta_i'q(e_{ij})\Big] \quad (3.17)$$

with the non-parametric constraints analogous to (3.15) given by

$$\sum_{k,j} p_{kj}\exp(\theta_i'q(e_{kj})) = 1 \quad (3.18)$$

for all $i = 1, 2, \ldots, m$.

Let $\theta' = (\theta_1', \ldots, \theta_m')$. Taking the logarithm of (3.17), we obtain the empirical log-likelihood function

$$\ell_n(\theta, G_1) = \sum_{i,j}\log p_{ij} + \sum_{i,j}\theta_i'q(e_{ij}).$$

We now maximize the empirical log-likelihood function with respect to $G_1$ under constraint (3.18) using the Lagrange multiplier method. Let

$$K = \sum_{i,j}\log p_{ij} + \sum_{i,j}\theta_i'q(e_{ij}) - \gamma\Big(\sum_{i,j} p_{ij} - 1\Big) - n\sum_{l=2}^{m}\lambda_l\Big\{\sum_{i,j} p_{ij}\big[\exp(\theta_l'q(e_{ij})) - 1\big]\Big\}.$$

Setting

$$\frac{\partial K}{\partial p_{ij}} = \frac{1}{p_{ij}} - \gamma - n\sum_{l=2}^{m}\lambda_l\big[\exp(\theta_l'q(e_{ij})) - 1\big] = 0$$

gives

$$p_{ij} = \frac{1}{\gamma + n\sum_{l=2}^{m}\lambda_l\big[\exp(\theta_l'q(e_{ij})) - 1\big]}.$$

Summing $p_{ij}\,\partial K/\partial p_{ij}$ over $i, j$, we find that $\gamma = n$. Therefore, the fitted probabilities must satisfy

$$p_{ij} = \frac{1}{n}\Big\{1 + \sum_{l=2}^{m}\lambda_l\big[\exp(\theta_l'q(e_{ij})) - 1\big]\Big\}^{-1} \quad (3.19)$$

with $\lambda_2, \ldots, \lambda_m$ being the solution to

$$\sum_{k=1}^{m}\sum_{j=1}^{n_k}\frac{\exp(\theta_i'q(e_{kj})) - 1}{1 + \sum_{l=2}^{m}\lambda_l\big[\exp(\theta_l'q(e_{kj})) - 1\big]} = 0, \quad i = 2, \ldots, m. \quad (3.20)$$

Making use of these expressions, we get the profile empirical likelihood function of $\theta$:

$$\tilde{\ell}_n(\theta) = -\sum_{k=1}^{m}\sum_{j=1}^{n_k}\log\Big\{1 + \sum_{l=2}^{m}\lambda_l\big[\exp(\theta_l'q(e_{kj})) - 1\big]\Big\} + \sum_{k=1}^{m}\sum_{j=1}^{n_k}\theta_k'q(e_{kj}). \quad (3.21)$$

Statistical inference can be carried out by regarding this profile likelihood as if it were a usual parametric likelihood. For instance, we may estimate $\theta$ by its maximum profile likelihood estimator.

Let $\hat{\theta}$ be the maximum point of $\tilde{\ell}_n(\theta)$. Clearly, this maximum point is a solution to

$$\frac{\partial\tilde{\ell}_n(\theta)}{\partial\theta_i} = 0.$$

That is,

$$-\sum_{k=1}^{m}\sum_{j=1}^{n_k}\frac{\lambda_i\big[\exp(\theta_i'q(e_{kj})) - 1\big]q(e_{kj}) + \sum_{l=2}^{m}\frac{\partial\lambda_l}{\partial\theta_i}\big[\exp(\theta_l'q(e_{kj})) - 1\big]}{1 + \sum_{l=2}^{m}\lambda_l\big[\exp(\theta_l'q(e_{kj})) - 1\big]} + \sum_{j=1}^{n_i}q(e_{ij}) = 0.$$

Because the Lagrange multipliers $\lambda_i$, $i = 2, \ldots, m$, satisfy (3.20), the above equation can be simplified to

$$-\lambda_i\sum_{k=1}^{m}\sum_{j=1}^{n_k}\frac{\big[\exp(\theta_i'q(e_{kj})) - 1\big]q(e_{kj})}{1 + \sum_{l=2}^{m}\lambda_l\big[\exp(\theta_l'q(e_{kj})) - 1\big]} + \sum_{j=1}^{n_i}q(e_{ij}) = 0.$$

Note that $q(t)$ is a vector, so the above equation is in fact a system of equations.
In particular, since the first element of $q(t)$ is the constant 1, this part of the system implies $\lambda_l = \rho_l = n_l/n$. Namely, at the maximum point of the profile likelihood, the Lagrange multipliers take these specific values.

Setting $\lambda_l = \rho_l$ in (3.21), we find that the stationary point of $\tilde{\ell}_n(\theta)$ coincides with the stationary point of

$$\breve{\ell}_n(\theta) = -\sum_{k=1}^{m}\sum_{j=1}^{n_k}\log\Big[\sum_{r=1}^{m}\rho_r\exp(\theta_r'q(e_{kj}))\Big] + \sum_{k,j}\theta_k'q(e_{kj}) \quad (3.22)$$

where $\rho_r = n_r/n$, $r = 1, 2, \ldots, m$.

For this reason, (3.22) is regarded as a dual empirical log-likelihood function. The dual empirical likelihood function (3.22) is concave, so it is numerically much simpler to find its maximum point than to work directly with (3.21).

Parameter estimation with fitted residuals

The procedure introduced in the last section assumes that the values of the error terms $e_{ij}$ are known. This is certainly not the case in applications. The practical implementation is to replace the virtual error terms $\{e_{ij}; i = 1, \ldots, m; j = 1, \ldots, n_i\}$ by the residuals $\{\hat{e}_{ij}; i = 1, \ldots, m; j = 1, \ldots, n_i\}$ obtained from fitting the nested error linear regression model.

Suppose there is a sample $(y_{ij}, x_{ij})$, $i = 1, \ldots, m$, $j = 1, \ldots, n_i$, satisfying the nested error linear regression model with error distributions following a DRM. Treating the $v_i$ as fixed effects, the parameters are estimated by

$$(\hat{\beta}^{CL}, \hat{v}_i^{CL}) = \arg\min_{\beta, v_i}\sum_{i,j}(y_{ij} - x_{ij}'\beta - v_i)^2.$$

The resulting estimators are given by

$$\hat{v}_i^{CL} = \bar{y}_i - \bar{x}_i'\hat{\beta}^{CL} \quad (3.23)$$

and

$$\hat{\beta}^{CL} = \Big\{\sum_{i,j}(x_{ij} - \bar{x}_i)(x_{ij} - \bar{x}_i)'\Big\}^{-1}\Big\{\sum_{i,j}(x_{ij} - \bar{x}_i)(y_{ij} - \bar{y}_i)\Big\} \quad (3.24)$$

where $\bar{x}_i$ and $\bar{y}_i$ are the sample means over small area $i$. The residuals of this fit are given by

$$\hat{e}_{ij}^{CL} = y_{ij} - x_{ij}'\hat{\beta}^{CL} - \hat{v}_i^{CL} = (y_{ij} - \bar{y}_i) - (x_{ij} - \bar{x}_i)'\hat{\beta}^{CL}. \quad (3.25)$$

Let us now treat $\{\hat{e}_{ij}^{CL}: j = 1, 2, \ldots, n_i\}$, $i = 1, \ldots, m$, as random samples from a DRM. Replacing $e_{ij}$ by $\hat{e}_{ij}^{CL}$ in (3.22), we obtain

$$\ell_n(\theta) = -\sum_{k=1}^{m}\sum_{j=1}^{n_k}\log\Big[\sum_{r=1}^{m}\rho_r\exp(\theta_r'q(\hat{e}_{kj}^{CL}))\Big] + \sum_{k=1}^{m}\sum_{j=1}^{n_k}\theta_k'q(\hat{e}_{kj}^{CL}). \quad (3.26)$$

Maximizing (3.26), we obtain the estimator $\hat{\theta}^{CL}$ of the parameters in the DRM.
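The maximization of the dual empirical log-likelihood (3.22) can be sketched as follows; the two-sample setup, the basis $q(t) = (1, t)'$, and all constants are hypothetical illustrations, not from Chen and Liu (2017):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
# Two areas whose error distributions satisfy a DRM with q(t) = (1, t)':
# N(0, 1) versus N(0.5, 1), for which log(dG2/dG1)(t) = -0.125 + 0.5 t.
e = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(0.5, 1.0, 200)])
labels = np.repeat([0, 1], 200)
rho = np.array([0.5, 0.5])                    # rho_r = n_r / n
Q = np.column_stack([np.ones_like(e), e])     # rows are q(e_kj)'

def neg_dual_loglik(theta2):
    theta = np.vstack([np.zeros(2), theta2])  # theta_1 = 0 by convention
    expo = Q @ theta.T                        # (n, m) array of theta_r' q(e_kj)
    mix = np.log(np.exp(expo) @ rho)          # log sum_r rho_r exp(theta_r' q)
    own = expo[np.arange(len(e)), labels]     # theta_k' q(e_kj) for own area k
    return np.sum(mix) - np.sum(own)          # negative of (3.22)

res = minimize(neg_dual_loglik, x0=np.zeros(2), method="BFGS")
print(res.x)  # rough estimate of (theta_21, theta_22); true value (-0.125, 0.5)
```

Because (3.22) is concave, a generic quasi-Newton routine started from zero suffices here.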
Then we can obtain the estimators $\hat{p}_{ij}^{CL}$ of the probabilities via (3.19). Hence, the estimator of $G_i$ is obtained as

$$\hat{G}_i^{CL}(t) = \sum_{k=1}^{m}\sum_{j=1}^{n_k}\hat{p}_{kj}^{CL}\exp\{\hat{\theta}_i'q(\hat{e}_{kj}^{CL})\}I(\hat{e}_{kj}^{CL} \leq t) \quad (3.27)$$

with $\hat{\theta}_1 = 0$ and

$$\hat{p}_{ij}^{CL} = n^{-1}\Big\{1 + \sum_{l=2}^{m}\rho_l\big[\exp\{\hat{\theta}_l'q(\hat{e}_{ij}^{CL})\} - 1\big]\Big\}^{-1}. \quad (3.28)$$

Consequently, if census information is available, the EBP version of the CDF estimator $F_i(t)$ based on the DRM becomes

$$\hat{F}_i^{CL\text{-}DRM}(t) = \frac{1}{N_i}\Big\{\sum_{j \in s_i} I(y_{ij} \leq t) + \sum_{j \notin s_i}\hat{G}_i^{CL}(t - x_{ij}'\hat{\beta}^{CL} - \hat{v}_i^{CL})\Big\}. \quad (3.29)$$

The corresponding $p$th quantile estimator is given by

$$\hat{\xi}_{p,i}^{CL\text{-}DRM} = \inf\{t : \hat{F}_i^{CL\text{-}DRM}(t) \geq p\}. \quad (3.30)$$

3.3 M-quantile approach

In this section we review another approach for small area quantile estimation, proposed by Chambers and Tzavidis (2006). This method is based on an M-quantile regression model. By fitting quantile-type regressions to data collected from small areas, the method is anticipated to be robust to outliers. The fitted and predicted values of the variable of interest are then used to construct an estimator of the small area CDFs, from which the method derives small area quantile estimates. To explain this approach, we first give a brief introduction to the M-quantile regression model. We then provide sufficient details of the numerical implementation of the M-quantile approach for small area quantile estimation.

3.3.1 M-quantile regression model

M-quantile regression is a generalization of quantile regression. Breckling and Chambers (1988) defined the M-quantile of order $q \in (0, 1)$, for any real-valued random variable $y$ with probability density function $f(y)$, as the solution $Q_\psi(q)$ of the following equation in $Q$:

$$\int_{-\infty}^{\infty}\psi_q(y - Q)f(y)\,dy = 0 \quad (3.31)$$

for

$$\psi_q(u) = \psi(u)\{qI(u > 0) + (1 - q)I(u \leq 0)\}$$

where $\psi(u)$ can be any monotone non-decreasing function such that $\psi(-\infty) < \psi(0) = 0 < \psi(\infty)$. When $\psi(u) = \mathrm{sign}(u)$, the solution $Q_\psi(q)$ is the usual $q$th quantile of the distribution. When $\psi(u) = u$ and $q = \frac{1}{2}$, the solution $Q_\psi(q)$ is the expectation of the random variable $y$.
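The equivalence just stated can be checked numerically by solving the empirical version of (3.31) with $\psi(u) = \mathrm{sign}(u)$ and comparing against the ordinary sample quantile. A small sketch (the grid search is only for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(10.0, 2.0, 501)
q = 0.25

def score(Q):
    u = y - Q
    return np.sum(np.sign(u) * np.where(u > 0, q, 1.0 - q))  # sum of psi_q(u)

# score(Q) is non-increasing in Q; locate its zero crossing on a fine grid.
grid = np.linspace(y.min(), y.max(), 5001)
vals = np.array([score(Q) for Q in grid])
Q_hat = grid[np.argmin(np.abs(vals))]
print(Q_hat, np.quantile(y, q))  # the two essentially coincide
```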
Another widely used choice for $\psi$ is

$$\psi(u) = uI(|u| \leq c) + c\,\mathrm{sign}(u)I(|u| > c).$$

This function takes the form $\psi(u) = u$ when $u$ is not very large and is bounded by $c$ when $u$ is large. Because of this, the solution is a location-type parameter of the population. When applied to a random sample, it retains much of the efficiency of the least squares estimator while limiting the strong influence of large observations.

Figure 3.1 depicts these three choices of $\psi(u)$.

Figure 3.1: Different choices of $\psi(u)$ in the M-quantile regression model.

In the context of linear regression, we are interested in the conditional distribution of $y$ given some explanatory variables, $f(y|x)$, rather than the marginal distribution $f(y)$. The M-quantile $Q_\psi(q)$ must be generalized so that $Q_\psi(x; q)$ depends on the explanatory variables. An M-quantile regression model assumes the relationship

$$Q_\psi(x; q) = x'\beta_\psi(q). \quad (3.32)$$

As an extension of quantile regression, M-quantile regression models assume a linear relationship between the conditional $q$th M-quantile of the response variable and the explanatory variables. For any pre-specified $q$, we may estimate the M-quantile regression coefficient $\beta_\psi(q)$ based on a sample $\{(y_i, x_i); i = 1, \ldots, n\}$. Namely, we estimate the coefficient through the estimating equation

$$\sum_{i=1}^{n}\psi_q\Big(\frac{y_i - x_i'\beta_\psi(q)}{s}\Big)x_i' = 0 \quad (3.33)$$

where

$$s = \mathrm{med}|y_i - x_i'\beta_\psi(q)|/0.6745$$

is the median absolute deviation (MAD) robust scale estimator. When $\psi(u) = u$ and $q = \frac{1}{2}$, the estimating equation (3.33) reduces to that of the ordinary least squares (OLS) estimator of the regression model. When $\psi(u) = \mathrm{sign}(u)$, it reduces to the estimating equation for fitting a quantile regression model (Koenker and Bassett Jr, 1978).

Numerical solution to M-quantile regression

We may find the numerical solution of (3.33) for any prespecified $q$ and $\psi(u)$ by an iteratively re-weighted least squares (IWRLS) method.
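A minimal self-contained IWRLS sketch in the spirit of this method, using the Huber $\psi$ and the MAD scale (the function name and all constants are ours, not from Chambers and Tzavidis (2006)):

```python
import numpy as np

def mquantile_fit(X, y, q, c=1.345, tol=1e-6, max_iter=200):
    """IWRLS sketch for the M-quantile coefficients beta_psi(q), with the
    Huber psi and MAD scale (hypothetical helper, not from any package)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]       # OLS initialization
    for _ in range(max_iter):
        r = y - X @ beta
        s = np.median(np.abs(r)) / 0.6745             # MAD robust scale
        u = r / s
        psi = np.clip(u, -c, c) * np.where(u > 0, q, 1.0 - q)
        # Weights w_i = psi_q(u_i) / u_i >= 0 (psi and u share the same sign).
        w = np.abs(psi) / np.maximum(np.abs(u), 1e-8)
        WX = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ y)
        if np.linalg.norm(beta_new - beta) < tol:
            return beta_new
        beta = beta_new
    return beta

rng = np.random.default_rng(4)
x = rng.uniform(0.0, 10.0, 300)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, 300)

b50 = mquantile_fit(X, y, q=0.5)  # close to the OLS fit for symmetric errors
b90 = mquantile_fit(X, y, q=0.9)  # a line lying above most of the data
print(b50.round(2), b90.round(2))
```

The derivation of these weights, and the pseudo-code of the full algorithm, follow in the text below.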
To illustrate the idea of this algorithm, let us first ignore the robust scale estimator $s$. Let

$$w_i = \psi_q(y_i - x_i'\beta_\psi(q))/(y_i - x_i'\beta_\psi(q)).$$

Then (3.33) can be written as

$$\sum_{i=1}^{n}w_i(y_i - x_i'\beta_\psi(q))x_i' = 0. \quad (3.34)$$

Let $Y = (y_1, \ldots, y_n)'$, $X = (x_1', \ldots, x_n')'$, and $W = \mathrm{diag}(w_1, \ldots, w_n)$. We obtain a matrix expression of (3.34):

$$(Y - X\beta_\psi(q))'WX = 0. \quad (3.35)$$

When $W$ is regarded as a constant matrix not dependent on $\beta$, the solution to (3.35) is given by

$$\hat{\beta}_\psi(q) = [X'WX]^{-1}[X'WY].$$

Pseudo-code for iteratively solving the M-quantile regression estimating equation is given in Algorithm 1.

Algorithm 1: Iteratively Reweighted Least Squares for solving the M-quantile regression estimating equation

1: procedure Solve M-quantile regression estimating equation
2:   Initialize $\beta_\psi^{(0)}(q)$; Tol = $10^{-6}$; nIter = 0
3:   repeat
4:     nIter = nIter + 1
5:     $w_i^{(new)} = \psi_q(y_i - x_i'\beta_\psi^{(old)}(q))/(y_i - x_i'\beta_\psi^{(old)}(q))$
6:     Update the coefficient estimate: $\beta_\psi^{(new)}(q) = [X'W^{(new)}X]^{-1}[X'W^{(new)}Y]$
7:     diff = $\|\beta_\psi^{(new)}(q) - \beta_\psi^{(old)}(q)\|$
8:     $\beta_\psi^{(old)}(q) \leftarrow \beta_\psi^{(new)}(q)$
9:   until diff < Tol or nIter > maxIter
10:  return $\beta_\psi^{(new)}(q)$

A common choice for the initialization $\beta_\psi^{(0)}(q)$ is the OLS estimate of the regression coefficient in an ordinary linear regression model. In the popular software package R, the function rlm in the MASS package (Venables and Ripley, 2002) is convenient for computing the IWRLS estimator. The function psi needs to be specified depending on the user's choice. As for the scale estimator $s$ in (3.33), the default setting in rlm is the re-scaled MAD of the residuals.

3.3.2 M-quantile approach for small area estimation

Chambers and Tzavidis (2006) proposed to use the M-quantile regression model for small area estimation. They assumed the population of the $i$th small area follows an M-quantile regression model. Namely,

$$Q_\psi(x_{ij}; q) = x_{ij}'\beta_\psi(\theta_i), \quad j \in U_i, \ i = 1, \ldots, m. \quad (3.36)$$

The motivation for proposing such M-quantile regression models in each area is based on the following arguments:
For each sampled unit in small area i, one can find a value q_ij ∈ (0, 1) such that Q_ψ(x_ij; q_ij) = y_ij. They define this q_ij as the M-quantile coefficient of this sampled unit.

2. If a hierarchical structure does explain the between-area and within-area variability of y_ij, then all sampled units within the same area behave similarly. In other words, they share the same M-quantile coefficients. Based on this premise, they defined the M-quantile coefficient of small area i as
θ_i = (∑_{j=1}^{N_i} q_ij) / N_i.

To make use of the M-quantile approach for small area estimation, Chambers and Tzavidis (2006) chose
ψ(u) = u I(|u| ≤ c) + c sign(u) I(|u| > c),
called Huber Proposal 2. This leads to
ψ_q(u) = −(1 − q)c,  u < −c;
ψ_q(u) = (1 − q)u,  −c ≤ u < 0;
ψ_q(u) = qu,  0 ≤ u < c;
ψ_q(u) = qc,  u ≥ c,
and they chose c = 1.345. Chambers and Tzavidis (2006) motivated the choice of Huber Proposal 2 over ψ(u) = sign(u) in M-quantile regression as follows. The latter choice defines ordinary quantile regression. When applied to a finite sample, the solution to the corresponding estimating equation is not unique. In comparison, when ψ(u) is a continuous function, the corresponding estimating equation has a unique solution. The IWRLS method is well suited, and the iterative procedure produces a sequence that converges to this unique solution (Kokic et al., 1997).

One way to construct a predictor of the small area CDF is to provide a fitted value of y_ij for every unit not sampled. This is what Chambers and Tzavidis (2006) proposed via M-quantiles.

Under the M-quantile approach, Chambers and Tzavidis (2006) proposed to estimate the quantile regression coefficient θ_i of small area i by
θ̂_i = (1/n_i) ∑_{j∈s_i} q̂_ij
where q̂_ij, j ∈ s_i, are the approximated M-quantile coefficients of the jth sampled unit in area i obtained from linear interpolation. They then constructed the fitted value for every non-sampled unit by
ŷ_ij^MQ = x_ij′ β̂_ψ(θ̂_i).
(3.37)

With all response values in the finite population either observed or predicted, Chambers and Tzavidis (2006) gave a naïve CDF predictor
F̂_i^MQ-Naïve(t) = (1/N_i) [ ∑_{j∈s_i} I(y_ij ≤ t) + ∑_{j∈r_i} I(ŷ_ij^MQ ≤ t) ].   (3.38)
However, as pointed out by Tzavidis et al. (2010), the expected difference between F_i(t) and F̂_i(t) is non-diminishing even when n_i → ∞. This can easily be seen since the difference between
I(ŷ_ij^MQ ≤ t) = I(y_ij ≤ t + ê_ij)
and I(y_ij ≤ t) does not get smaller as n → ∞. This will also be seen in our simulation.

To overcome the problem of the naïve CDF predictor, a bias-adjusted CDF estimator is proposed:
F̂_i^MQ-CD(t) = (1/N_i) { ∑_{j∈s_i} I(y_ij ≤ t) + (1/n_i) ∑_{j∈r_i} ∑_{k∈s_i} I( ŷ_ij^MQ + (y_ik − ŷ_ik^MQ) ≤ t ) }.   (3.39)
This estimator can be motivated by the idea of Chambers and Dunstan (1986). The corresponding pth quantile estimators are respectively given by:
ξ̂_p,i^MQ-Naïve = inf{ t : F̂_i^MQ-Naïve(t) ≥ p },
ξ̂_p,i^MQ-CD = inf{ t : F̂_i^MQ-CD(t) ≥ p }.

Numerical implementation

We have provided a general review of the M-quantile approach for small area quantile estimation. To apply this method in practice, one must provide a concrete numerical method. For this purpose, we describe one way to numerically implement the M-quantile regression as follows.

1. Create a grid of values from 0.01 to 0.99 with increment 0.01. For each value q_k on the grid, fit an M-quantile model to the sample {(x_ij, y_ij); j = 1, …, n_i, i = 1, …, m}. Compute the M-quantile regression coefficients β̂_ψ(q_k).

2. For the jth sampled unit in small area i, use linear interpolation on the paired observations {(q_k, x_ij′β̂_ψ(q_k)), k = 1, …, 99} to get a q̂_ij such that
x_ij′ β̂_ψ(q̂_ij) = y_ij.
If y_ij < min_k {x_ij′β̂_ψ(q_k)}, we set q̂_ij = 0.01; if y_ij > max_k {x_ij′β̂_ψ(q_k)}, we set q̂_ij = 0.99. Because x_ij′β̂_ψ(q_k) is not necessarily monotone in k, there might be multiple solutions to x_ij′β̂_ψ(q̂_ij) = y_ij. When this happens, we choose the q̂_ij that is closest to 0.5.

3.
After the M-quantile coefficient q̂_ij is obtained for all sampled units, we estimate the M-quantile coefficient of area i as θ̂_i = (∑_{j∈s_i} q̂_ij) / n_i.

4. For each small area i, put q = θ̂_i and fit an M-quantile regression model based on the whole data set {(x_ij, y_ij); j = 1, …, n_i, i = 1, …, m}. The outcome will be denoted as β̂_ψ(θ̂_i). This is the M-quantile regression coefficient for the ith small area.

5. Compute the predicted values for the non-sampled units in small area i as
ŷ_ij^MQ = x_ij′ β̂_ψ(θ̂_i), j ∈ r_i.

6. For the MQ-Naïve predictor of the CDF in the ith area, use the predicted y values obtained in the last step and obtain the predictor of the CDF through (3.38). For the bias-adjusted MQ-CD predictor of the CDF in the ith area, we first compute
ê_ij^MQ = y_ij − ŷ_ij^MQ.
We then predict F̂_i^MQ-CD(t) via (3.39).

After the two versions of the CDF predictors are obtained, calculate the corresponding quantile estimators accordingly.

3.4 Performance of quantile estimators

We are interested in the performance of the quantile estimators reviewed above under different model scenarios. In this section, we report some simulation results on the performance of these estimators.

3.4.1 Simulation setting

Our simulation is done under model-based analysis. In this case, the finite populations are regarded as random samples from some superpopulation. The estimators/predictors are investigated through their variation from one finite population to another generated from the same superpopulation. In total, we generate R = 1000 finite populations from each superpopulation under consideration.

The superpopulation is made to have m = 30 small areas with population size N_i = 500 in each area. From each small area, n_i = 10 sampling units are drawn by simple random sampling without replacement.

The superpopulation is chosen as the nested error regression model
y_ij = β_0i + β_1i x_ij2 + v_i + e_ij
such that v_i i.i.d. ∼ N(0, σ_v²) and e_ij ∼ G_i.
The auxiliary variable x_ij2 is generated from a gamma distribution with mean 4 and variance 8 (i.e., shape α = 2, rate β = 1/2).

We consider the following 6 scenarios for different choices of the regression coefficients β, σ_v, and the error distribution G_i:

[1] All assumptions for the nested error linear regression model are satisfied. We let
β_01 = ··· = β_0,30 = 50, β_11 = ··· = β_1,30 = 10, σ_v = 10, and e_ij i.i.d. ∼ N(0, 15²).

[2] All assumptions for the nested error linear regression model are satisfied but with a smaller random effect variance. More specifically, we let
β_01 = ··· = β_0,30 = 50, β_11 = ··· = β_1,30 = 10,
which are the same as above, but σ_v = 0.0001, which is smaller, and keep e_ij i.i.d. ∼ N(0, 15²).

[3] In this scenario, we let the small area error variance differ from one area to another: e_ij i.i.d. ∼ N(0, 100i). The rest of the specifications are
β_01 = ··· = β_0,30 = 50, β_11 = ··· = β_1,30 = 10, and σ_v = 10.

[4] In this scenario, we let the small areas have different regression coefficients:
β_0i = 50i, β_1i = 31 − i,
but still have σ_v = 10 and e_ij i.i.d. ∼ N(0, 15²).

[5] In this scenario, we remove the normality assumption on the error distribution:
e_ij i.i.d. ∼ 0.9 N(−µ_i/18, 9) + 0.1 N(µ_i/2, 3)
where µ_i i.i.d. ∼ U(90, 95). The other aspects of the model are
β_01 = ··· = β_0,30 = 50, β_11 = ··· = β_1,30 = 10, and σ_v = 10.

[6] In this scenario, we study the situation where some observed y values are outliers. We still generate data from the nested error linear regression model. The parameter values are chosen to be
β_01 = ··· = β_0,30 = 50, β_11 = ··· = β_1,30 = 10, and σ_v = 10.
In the first 15 small areas, we randomly let 80% of the error terms e_ij be independently generated from N(0, 15²), and generate the other 20% from the N(0, 90²) distribution in each area. In the remaining 15 areas, the e_ij are independently generated from N(0, 15²).

These choices are based on the following considerations. Scenario [1] satisfies all the assumptions for the nested error linear regression model.
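For concreteness, one finite population under scenario [1] can be generated along these lines. This is a sketch; the function name and interface are ours, not from the thesis.

```python
import numpy as np

def generate_population(rng, m=30, N=500, beta0=50.0, beta1=10.0,
                        sigma_v=10.0, sigma_e=15.0):
    """Generate one finite population under scenario [1] of the nested
    error regression model y_ij = beta0 + beta1*x_ij + v_i + e_ij."""
    pops = []
    for i in range(m):
        # gamma with shape 2 and rate 1/2 (scale 2): mean 4, variance 8
        x = rng.gamma(shape=2.0, scale=2.0, size=N)
        v = rng.normal(0.0, sigma_v)          # area-level random effect
        e = rng.normal(0.0, sigma_e, size=N)  # unit-level errors
        pops.append((x, beta0 + beta1 * x + v + e))
    return pops
```

A simulation repetition would draw n_i = 10 units per area from such a population by simple random sampling without replacement.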
In scenario [2], the assumptions for the nested error linear regression model still hold but the random effect is negligible. This implies that the correlation of the variable of interest within areas is very small. In scenario [3], all error distributions are still normal but they have different variances in different small areas. In scenario [4], we consider the case where the regression coefficients are different across small areas. The error distribution is not normal in scenario [5]; in scenario [6] the error distribution is still normal, but there are outliers in the data due to the very large variances of some e_ij.

3.4.2 Estimators under comparison

We include the following 8 estimators of the small area quantiles for comparison in the simulation study. They are:

1. Direct: we compute the sample quantiles for small area i based only on the observed response variable y_ij, j = 1, …, n_i.

2. MR: we compute the Molina-Rao quantile estimators given by (3.3). The unknown parameters β, σ_v², and σ_e² in the Molina-Rao approach are estimated by a restricted maximum likelihood method.

3. MQ-Naïve: we compute the naïve quantile estimators based on the M-quantile approach given by (3.38).

4. MQ-CD: we estimate the small area quantiles by (3.39), which is converted from the Chambers-Dunstan version of the bias-adjusted predictor of the CDF F_i.

5. Chen1 and Chen2: we compute these two estimators specified by (3.13) and (3.14), respectively.

6. CL-DRM: we compute the quantile estimators proposed by Chen-Liu under the DRM assumption according to (3.29) with q(t) = (1, sign-root(t))′.

7. Regression: we assume that the regression coefficients in different small areas can be different. Hence, we fit
y_ij = x_ij′ β_i + ε_ij, j = 1, 2, …, n_i,   (3.40)
for each small area based on OLS. Let β̂_i be the OLS estimators of the regression coefficients.
We predict the non-sampled response values by
ŷ_ij = x_ij′ β̂_i, j ∈ r_i.
We mimic the Chambers-Dunstan version of the CDF estimator and compute
F̂_i^Regr(t) = (1/N_i) { ∑_{j∈s_i} I(y_ij ≤ t) + (1/n_i) ∑_{j∈r_i} ∑_{k∈s_i} I( ŷ_ij + (y_ik − ŷ_ik) ≤ t ) }.   (3.41)
The small area quantile estimators are obtained accordingly.

The motivation behind including the above regression estimator is as follows. In situations where no single linear regression model is appropriate for all small areas, the performance of the M-quantile approach is much better than that of the other estimators in preliminary simulation studies. Our investigation shows that this might be due to the fact that the model (3.37) behind the M-quantile approach permits different regression coefficients for different small areas. Such a model assumption is against the strength-borrowing strategy; hence, it should not perform well when a single linear regression model is suitable for all small areas. However, when different small areas do have different relationships between the response variable y and the auxiliary variables x, all methods based on the strength-borrowing strategy will perform poorly. Because of this, the performance of the M-quantile approach stands out.

If this is the case and our idea is right, then even a naïve regression approach will outperform all other methods, including the MQ-CD and MQ-Naïve quantile estimators. This is indeed the case, as will be seen in the reported simulation results for scenario [4].

3.4.3 Performance measures

Let η_i^(r) be the population quantile of the ith area in the rth repetition and η̂_i^(r) be the corresponding estimator. We compute its absolute bias (AB) as
AB_i = | (1/R) ∑_{r=1}^R ( η̂_i^(r) − η_i^(r) ) |.
We compute its MSE as
MSE_i = (1/R) ∑_{r=1}^R ( η̂_i^(r) − η_i^(r) )².
The overall performance of an estimator is measured by its average absolute bias (AAB) and average MSE (AMSE):
AAB = (1/m) ∑_{i=1}^m AB_i,  AMSE = (1/m) ∑_{i=1}^m MSE_i.
In general, a model-unbiased estimator should have AAB close to 0.
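The measures above are straightforward to compute from the simulation output; a minimal numpy sketch (function and argument names are ours):

```python
import numpy as np

def performance_measures(est, truth):
    """Compute AB_i, MSE_i, AAB and AMSE from an (R, m) array of estimates
    and an (R, m) array of true area quantiles."""
    diff = est - truth                     # eta_hat - eta per repetition and area
    ab = np.abs(diff.mean(axis=0))         # AB_i = |average error over repetitions|
    mse = (diff ** 2).mean(axis=0)         # MSE_i = average squared error
    return ab, mse, ab.mean(), mse.mean()  # AAB and AMSE average over areas
```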
The smaller the average MSE, the more accurate the estimator is.

3.4.4 Simulation results

We report our simulation results in Tables 3.1 and 3.2 and plot the absolute bias and MSE of the quantile estimators in each area in Figures 3.2-3.7.

Table 3.1: The AAB and AMSE under scenarios [1]-[3] with n_i = 10 and N_i = 500. The estimators are sorted so that the AMSEs of the 50% quantile estimators are in ascending order.

Scenario  Estimator    Average AB: 10%   25%   50%   75%   90%    Average MSE: 10%     25%     50%     75%     90%
[1]       MR                       1.28  0.44  0.38  0.86  1.20                25.14   21.47   20.94   22.71   27.10
          Chen1                    0.16  0.13  0.13  0.14  0.25                24.48   23.37   23.08   23.89   27.42
          MQ-CD                    0.47  0.18  0.10  0.22  0.22                32.44   25.04   23.63   25.79   31.01
          CL-DRM                   0.76  0.29  0.10  0.24  0.20                26.31   24.83   24.35   25.10   28.60
          MQ-Naive                 6.48  1.51  1.72  3.31  4.02                67.78   26.68   26.82   35.94   46.54
          Regression               0.95  0.26  0.20  0.39  0.41                55.50   37.76   27.52   39.73  101.38
          Chen2                    1.15  0.93  0.79  0.68  1.33                57.68   66.10   94.37  179.67  435.64
          Direct                   4.76  3.01  2.80  4.47 10.27               165.96  100.61  128.59  205.51  485.43
[2]       MR                       0.86  0.30  0.27  0.60  0.88                 7.12    4.19    3.75    5.22    9.14
          MQ-Naive                 6.48  1.51  1.72  3.31  4.03                54.01   13.12   13.51   22.66   32.46
          Chen1                    0.21  0.14  0.12  0.13  0.21                24.50   23.38   23.07   23.88   27.40
          MQ-CD                    0.48  0.18  0.10  0.22  0.22                32.03   24.77   23.56   25.74   30.34
          CL-DRM                   0.76  0.29  0.10  0.24  0.20                26.31   24.83   24.35   25.10   28.60
          Regression               0.95  0.26  0.20  0.39  0.41                55.50   37.76   27.52   39.73  101.38
          Chen2                    1.22  0.96  0.78  0.65  1.29                57.81   66.14   94.43  179.95  436.31
          Direct                   4.76  3.01  2.80  4.47 10.27               165.96  100.61  128.59  205.51  485.43
[3]       MR                      10.62  5.01  0.81  5.81  9.54               264.55  123.65   85.82  137.85  230.82
          MQ-Naive                29.05 11.71  3.98 15.68 22.97              1101.75  254.68   93.52  376.57  754.80
          Chen1                   11.13  5.26  0.91  6.20 10.13               338.23  195.61  157.47  213.82  311.29
          CL-DRM                  11.39  5.61  0.98  6.40 10.10               348.49  209.30  165.75  222.06  315.60
          MQ-CD                    1.63  0.55  0.21  0.69  1.52               288.28  203.31  177.52  201.26  289.84
          Regression               2.07  0.47  0.74  1.26  1.03               374.34  249.27  201.74  258.46  501.73
          Chen2                   11.10  5.24  1.11  6.23 10.16               377.52  245.28  226.78  328.43  547.60
          Direct                  10.42  6.44  5.25  7.12 13.76               769.99  385.07  359.18  427.76  801.73

The results in Table 3.1 show that the performance of the MR estimator is the best among these 8 estimators in the first three scenarios. All the model-based estimators perform better than the direct estimator. MQ-Naïve has large biases. MQ-CD is indeed less biased, as shown by the simulation results. MQ-CD is good for estimating the middle quantiles but has relatively larger MSEs for estimating the quantiles in the two tails.

In scenario [1], where all the model assumptions for the nested error linear regression model are satisfied, MR has the best performance. The performance of Chen1 is also quite good. In scenario [2], the assumptions for the NELRM still hold but the random effects are negligible. Under the NELRM, the covariance of the variable of interest between units within the same area is σ_v². We have selected a small σ_v value, hence this within-area correlation is small. In this case, MR has superior performance over the other estimators. We conjecture this is due to the shrinkage effect of the random effects, which makes the MR estimator more robust.

In scenario [3], the variances of the error distributions in different small areas are different. The MQ-CD and the regression estimators fit different models for different small areas, and this seems to help them achieve lower biases. None of the methods works well, though MR has the smallest MSE.

Table 3.2: The AAB and AMSE under scenarios [4]-[6] with n_i = 10 and N_i = 500.
The estimators are sorted so that the AMSEs of the 50% quantile estimators are in ascending order.

Scenario  Estimator    Average AB: 10%   25%   50%   75%   90%    Average MSE: 10%     25%     50%     75%     90%
[4]       Regression               0.72  0.22  0.20  0.28  0.31                56.55   38.89   28.38   41.57  103.54
          MQ-CD                    7.38  3.98  0.82  3.57  7.49               426.86  185.76   69.86  145.27  566.82
          CL-DRM                  18.00 12.79  3.89 10.32 27.06               510.55  292.59   99.57  228.79 1082.50
          Chen1                   20.15 14.25  5.24  9.75 26.41               629.52  346.54  119.13  216.39 1030.66
          MR                      20.75 14.35  5.29  9.87 26.27               678.83  355.59  123.27  224.61 1032.62
          Chen2                   19.85 14.21  5.60  9.95 26.31               686.16  444.55  290.89  573.89 1945.66
          Direct                   5.21  3.45  3.81  6.68 15.85               244.72  194.08  324.05  593.31 1485.26
          MQ-Naive                14.04 12.63 13.21 14.62 16.21              1381.68 1207.45 1053.33 1077.35 1484.15
[5]       MQ-Naive                 1.87  0.88  1.35  6.46  9.03                16.24   12.94   13.87   56.53  104.04
          CL-DRM                   0.25  0.69  0.96  0.37  1.11                18.97   17.61   16.18   19.35   25.67
          MQ-CD                    0.14  0.26  0.76  0.22  0.99                 3.51    5.68   27.03   63.52   85.44
          Regression               0.24  0.71  1.02  0.45  1.33                28.05   19.06   31.04   74.78  142.56
          MR                       3.68  1.40  3.48  0.65  0.90                35.85   23.34   34.62   25.84   31.30
          Chen1                    2.35  1.95  3.25  0.15  1.84                30.38   28.51   35.98   27.41   35.59
          Chen2                    1.34  2.79  3.95  0.47  0.84                57.87   73.86  110.36  179.18  431.77
          Direct                   1.72  1.45  2.17  5.44 11.45                44.27   57.51  132.62  260.32  569.47
[6]       MQ-Naive                10.85  2.95  1.86  5.70 11.15               170.51   41.26   33.36   70.10  217.91
          MQ-CD                    3.31  0.47  0.16  0.69  3.72               371.87   63.97   41.40   97.23  644.03
          CL-DRM                   4.12  1.35  0.30  2.29  6.77                80.12   64.21   61.29   65.63  116.97
          MR                      12.42  5.74  2.06  7.10  6.89               252.84  100.44   65.57  123.44  177.82
          Regression               5.00  1.55  0.23  1.40  3.19               523.55  128.00   69.09  165.89  644.12
          Chen1                   11.28  5.24  1.98  6.46  7.19               263.96  133.45  103.75  153.17  205.63
          Direct                  17.74  4.76  2.99  4.71 11.74              1879.98  218.32  153.78  270.91  762.27
          Chen2                   10.54  4.65  2.40  6.68  7.23               285.80  174.61  173.73  277.93  483.87

Scenario [4] in Table 3.2 is an example where the small area quantile estimators based on the M-quantile approach work better than the Chen-Liu and Molina-Rao approaches.
As we pointed out earlier, when no single regression model fits all small areas, it is better to fit different models for different small areas. The M-quantile approach implicitly does this, and this seems to be the reason behind its good performance. For example, when estimating the 25% quantiles, the average MSE of MQ-CD is 185.76 while the average MSEs of the other estimators are larger than 200.

Recall that the regression estimator explicitly fits different models for different small areas. This indeed makes it the best performer in this scenario. It has much lower biases and AMSEs compared with all the other methods. This example shows that when the model assumption is severely violated, the borrowing-strength strategy may fail, and an estimator using only area-specific information may work better.

The error distributions are generated from a skewed mixture of normal distributions in scenario [5]. The CL-DRM, MQ-CD, and regression estimators have low biases. Although the MQ-CD estimator has the best performance for estimating lower quantiles, its MSE is larger for estimating upper quantiles. The CL-DRM estimator has a reasonable performance in this scenario.

In scenario [6], we added some outliers to the data set. In the presence of outliers, only MQ-CD has reasonably low bias. In terms of MSE, the MQ-Naïve estimator has the smallest MSE for estimating the 50% quantile. However, when estimating the 90% quantile, the CL-DRM estimator has the smallest MSE. Therefore, the performance of these estimators cannot be described uniformly. The comparison of the MSEs of these estimators shows that the small area quantile estimators based on the M-quantile approach are not as robust to outliers as promised.

In summary, based on these simulation results, we find that the MR estimator has the best performance when the assumptions for the nested error linear regression model are not violated. When the error distribution is not normal or in the presence of outliers, the MR estimator may lose efficiency.
We also find that the Chen-Liu approach performs as well as the MR estimator when the assumptions for the NELRM are not violated. At the same time, the Chen-Liu approach is robust to violations of the normality assumption. The MQ-CD estimator performs better than the MQ-Naïve estimator most of the time. However, these two estimators are robust neither to non-normality nor to outliers. The M-quantile approaches are better than the Chen-Liu and Molina-Rao approaches when the regression coefficients differ across small areas; however, their performance is worse than that of the regression estimator. This example also illustrates the importance of the validity of the model assumption, which is, however, very difficult to check in reality. When the model assumption is violated and a quantile estimator is not robust, the strength-borrowing strategy may fail. Such estimators perform worse than estimators using only area-specific information. Therefore, we attempt to modify and propose new estimators and investigate their robustness and performance under different model misspecification scenarios in the next chapter.

[Each of Figures 3.2-3.7 contains panels of absolute bias and MSE against area index (1-30) for the 10%, 25%, 50%, 75%, and 90% quantile estimators of the 8 methods; the plots themselves are omitted in this text version.]

Figure 3.2: The AB and MSE of quantile estimators with respect to small areas under scenario [1].
Figure 3.3: The AB and MSE of quantile estimators with respect to small areas under scenario [2].
Figure 3.4: The AB and MSE of quantile estimators with respect to small areas under scenario [3].
Figure 3.5: The AB and MSE of quantile estimators with respect to small areas under scenario [4].
Figure 3.6: The AB and MSE of quantile estimators with respect to small areas under scenario [5].
Figure 3.7: The AB and MSE of quantile estimators with respect to small areas under scenario [6].

Chapter 4
Small Area Quantile Estimators based on Linear Mixed Effect Median Regression Models

We begin this chapter with a summary of the procedures for small area quantile estimation proposed in the literature. Under a unit-level linear regression model, such procedures first obtain a predictor of the CDF F_i of the response variable of small area i in two steps:

(a) estimate the shared regression coefficient β in a unit-level model;

(b) build an estimator of the error distribution G_i based on the estimated regression coefficient from step (a).

Chen and Liu (2017) focused on step (b) by using the DRM to improve the estimation precision of G_i without imposing a normality assumption. The M-quantile approach reformulates the regression relationship and robustifies the estimation of β. Although the Chen-Liu approach does not assume normality, it may lose efficiency because the estimation of β is distorted by outliers. As shown by the examples in Section 3.4 and suggested by our intuition, the MQ-Naïve and MQ-CD estimators are found not to be as robust as we hoped against violations of the normality assumption or the presence of outliers.
These observations lead to the following question: "Is it possible to combine these two approaches to achieve improved robustness while retaining good estimation precision?" We hope to develop such an approach in this chapter.

Inspired by the M-quantile approach, we estimate the regression coefficient in step (a) differently. In Molina and Rao (2010), the regression coefficient is estimated by the REML method. In Chen and Liu (2017), the regression coefficient is estimated by minimizing the sum of squared residuals, treating the v_i's as fixed effects. In Chambers and Tzavidis (2006), an M-quantile regression model is a strong assumption. We estimate the regression coefficient β by minimizing the sum of absolute deviations instead of the sum of squared residuals, as described in Section 4.1. Intuitively, the regression coefficient would be more robust if estimated this way. We then follow Chen and Liu (2017) for the estimation of G_i, which is given in Section 4.2. During this process, we find that step (b) of Chen and Liu (2017) can be directly used for the estimation of F_i in combination with a regression coefficient β estimated by other approaches. The resulting method is seen to improve the performance of the M-quantile approach for small area quantile estimation.

4.1 Linear mixed effect median regression model

In this section, we adopt the nested error linear regression model in which the error distributions follow the DRM. More specifically, we assume that
y_ij = x_ij′β + v_i + ε_ij, i = 1, …, m, j = 1, …, n_i,   (4.1)
where v_i i.i.d. ∼ N(0, σ_v²) are independent of ε_ij i.i.d. ∼ G_i(t). We assume that the G_i's follow the density ratio model (DRM):
log{ dG_i(t) / dG_1(t) } = θ_i′ q(t), i = 2, …, m,
where the θ_i's are free parameters and q(t) is a pre-specified vector of basis functions. A possible choice would be q(t) = (1, t)′. We make the extra assumption G_i(0) = 0.5, i = 1, …, m.
With this assumption, the model is equivalent to a median regression model
Median(y_ij | v_i, x_ij) = x_ij′β   (4.2)
where v_i i.i.d. ∼ N(0, σ_v²). In comparison, the nested error linear regression model assumes that the conditional expectation E(y_ij | x_ij, v_i) is a linear function of the auxiliary variables x_ij with a nested error structure.

Following Koenker (2004), one may estimate the unknown parameters by treating the v_i's as fixed effects. We estimate β and the v_i's by the minimizer of the sum of absolute deviations:
(β̂^AD, v̂_i^AD) = argmin_{β, v_i} ∑_{i=1}^m ∑_{j=1}^{n_i} | y_ij − x_ij′β − v_i |.   (4.3)
Koenker (2004) noticed that when a model contains a large number of "parameters" v_i, the variability of the estimators can be very high. We may limit this effect by shrinking these individual effects toward a common value. A general approach to estimating the parameters in a median regression model with nested error structure was proposed by Koenker (2004) by adding an ℓ1 regularization term for the fixed effects v_i:
(β̂^PAD, v̂_i^PAD) = argmin_{β, v_i} ∑_{i=1}^m ∑_{j=1}^{n_i} | y_ij − x_ij′β − v_i | + λ ∑_{i=1}^m |v_i|.   (4.4)
The effect of introducing the penalty on the random effects is examined by a simulation study in Section 4.3.

In this approach, the parameter λ, which controls the intensity of the shrinkage, needs to be selected or estimated. We choose the most appropriate λ by cross-validation in this thesis. We first define a grid on a range of possible choices of λ. For each λ in the grid, a k-fold cross-validation is performed. The sample {(y_ij, x_ij), i = 1, …, m; j = 1, …, n_i} is partitioned into k equal-sized subsamples. One of the k subsamples is used as the validation set; the other k − 1 subsamples are used to fit the median regression model with penalized random effects given by (4.4). After the median regression model is fitted, the mean squared error of the prediction on the validation set is obtained. We then iterate over which fold is the validation fold and choose the λ value that minimizes the total mean squared error.
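The penalized criterion (4.4) is straightforward to evaluate for a candidate (β, v); a minimal sketch (function and argument names are ours):

```python
import numpy as np

def pad_objective(beta, v, X_by_area, y_by_area, lam):
    """Penalized absolute deviation criterion (4.4):
    sum_i sum_j |y_ij - x_ij'beta - v_i| + lam * sum_i |v_i|."""
    total = lam * np.sum(np.abs(v))  # l1 penalty on the area effects
    for i, (Xi, yi) in enumerate(zip(X_by_area, y_by_area)):
        total += np.sum(np.abs(yi - Xi @ beta - v[i]))  # area i deviations
    return total
```

Cross-validation then compares the held-out prediction errors of the minimizers of this criterion over the λ grid.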
We may notice that the random effect v_i is detectable only if there are samples in the ith area. If there are no samples from the ith area in the k − 1 training subsamples during an iteration of the cross-validation, its random effect cannot be estimated and the prediction might be problematic. To avoid this problem, the area structure and the sample sizes in different areas need to be taken into consideration so that each subsample contains observations from every area. One possible way to form the k-fold sample partition is to divide the sample within each area into k folds and then combine the observations from the same fold across areas to form the folds of the whole sample. The pseudo-code for performing a k-fold cross-validation is given in Algorithm 2.

Algorithm 2: k-fold cross-validation for choosing the parameter λ in fitting the median linear mixed effect regression model
1: procedure k-FOLD CROSS-VALIDATION
2:   for i in 1 to m do
3:     Randomly divide {(y_ij, x_ij), j = 1, …, n_i} into D_i1, D_i2, …, D_ik with equal sizes.
4:   for l in 1 to k do
5:     Combine D_1l, D_2l, …, D_ml into D_l.
6:   for λ in {λ_1, λ_2, …, λ_M} do
7:     for i in 1 to k do
8:       (β̂_λ, v̂_λ,l) = argmin_{β, v_l} ∑_{(y_lj, x_lj) ∈ D_−i} |y_lj − x_lj′β − v_l| + λ ∑_{l=1}^m |v_l|, where D_−i denotes the sampled data set without the ith fold D_i;
9:       Predict the ys in the validation set: ŷ_ij = x_ij′β̂_λ + v̂_λ,l;
10:      error_i = ∑_{j ∈ D_i} (y_ij − ŷ_ij)².
11:    error = (error_1 + ··· + error_k) / k.
12:    if error < minError then
13:      minError = error
14:      bestLambda = λ
15: return bestLambda

When k = n, the k-fold cross-validation becomes leave-one-out cross-validation. The larger the k, the more expensive it becomes to select λ; the cross-validation result is less biased, but may suffer from larger variability (Friedman et al., 2001).
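The within-area fold assignment in steps 2-5 of Algorithm 2 can be sketched as follows (a minimal illustration; names are ours):

```python
import random

def area_stratified_folds(area_sizes, k, seed=0):
    """Assign each unit (area i, unit j) to one of k folds so that every fold
    contains observations from every area, as required by Algorithm 2."""
    rng = random.Random(seed)
    folds = {l: [] for l in range(k)}
    for i, n_i in enumerate(area_sizes):
        units = list(range(n_i))
        rng.shuffle(units)                    # random division within area i
        for pos, j in enumerate(units):
            folds[pos % k].append((i, j))     # spread area i's units across folds
    return folds
```

This guarantees that whenever n_i ≥ k, every training set of k − 1 folds contains units from every area, so all random effects remain estimable.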
A 10-fold cross-validation is adopted for choosing the most appropriate λ in our simulation study.

Numerical implementation for minimizing the sum of absolute deviations

We show how to numerically obtain a solution of (4.4) after the parameter λ has been chosen by cross-validation. For simplicity of notation, let
f(β, v_1, …, v_m) = ∑_{i=1}^m ∑_{j=1}^{n_i} |y_ij − x_ij′β − v_i| + λ ∑_{i=1}^m |v_i|   (4.5)
be our objective function.

When λ = 0, the objective function becomes
f(β, v_1, …, v_m) = ∑_{i=1}^m ∑_{j=1}^{n_i} |y_ij − x_ij′β − v_i|.
The function is convex in β and the v_i's; therefore any local minimum is also a global minimum. However, a minimizer cannot be obtained by finding a stationary point of the objective function, since the function is non-differentiable. We may instead use the iterative re-weighted least squares method as discussed in Section 3.3.1. In this thesis, we use linear programming as follows. Finding the minimizer of the objective function is equivalent to solving the problem
argmin_{β, v_1, …, v_m, r_ij} ∑_{i=1}^m ∑_{j=1}^{n_i} r_ij
with constraints
r_ij ≥ max{ y_ij − x_ij′β − v_i, x_ij′β + v_i − y_ij }
for all i, j. This is equivalent to solving
argmin_{β, v_i, r_ij} ∑_{i=1}^m ∑_{j=1}^{n_i} r_ij
with the constraints
r_ij ≥ y_ij − x_ij′β − v_i,  r_ij ≥ x_ij′β + v_i − y_ij,
over i = 1, …, m, j = 1, …, n_i.

We now have a linear objective function with 2n constraints. The optimization problem can be solved efficiently by the function lp in the lpSolve package in R (Berkelaar et al., 2015) when λ = 0. When λ > 0, the problem is formulated as the linear program
argmin_{β, b_i, v_i, r_ij} ∑_{i=1}^m ∑_{j=1}^{n_i} r_ij + λ ∑_{i=1}^m b_i
with the constraints
r_ij ≥ y_ij − x_ij′β − v_i,  r_ij ≥ x_ij′β + v_i − y_ij,  b_i ≥ v_i,  b_i ≥ −v_i
for all i = 1, …, m, j = 1, …, n_i. The lp function in R can also be used for this problem.

We think it is worth mentioning that the idea of linear programming is very neat for optimizing (4.5). The idea of linear programming can be used for similar objective functions in regression.
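The constraint construction above can be sketched in Python for the simplest case, plain least absolute deviations regression without area effects (λ = 0, no v_i), using scipy's linprog. This is an illustration only; the thesis uses lpSolve and rqpd in R, and the function name here is ours. The free coefficient β is split into nonnegative parts β⁺ − β⁻ because the LP variables are constrained to be nonnegative.

```python
import numpy as np
from scipy.optimize import linprog

def lad_fit(X, y):
    """Least absolute deviations regression via the linear program:
    minimize sum_i r_i subject to r_i >= |y_i - x_i'beta|."""
    n, p = X.shape
    # variables: [beta_plus (p), beta_minus (p), r (n)], all >= 0
    c = np.concatenate([np.zeros(2 * p), np.ones(n)])  # minimize sum of r_i
    I = np.eye(n)
    # r_i >= y_i - x_i'beta   <=>  -X b+ + X b- - r <= -y
    # r_i >= x_i'beta - y_i   <=>   X b+ - X b- - r <=  y
    A_ub = np.vstack([np.hstack([-X, X, -I]),
                      np.hstack([X, -X, -I])])
    b_ub = np.concatenate([-y, y])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
    return res.x[:p] - res.x[p:2 * p]
```

Adding the area effects v_i and the penalty terms b_i ≥ |v_i| only enlarges the variable and constraint blocks in the same pattern.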
Once an optimization problem can be formulated as a linear/quadratic programming problem, the computation is faster than with traditional iterative methods. There is an R package called rqpd by Koenker and Bache (2012) that is specifically designed for fitting linear quantile mixed effect models. To decrease the chance of making mistakes in fitting the model, we used the well-tested and published package rqpd in our simulation study.

4.2 New quantile estimation

We have proposed a potentially robust procedure for estimating the shared regression coefficients β. This is to be used to build an estimator for the CDF F_i and subsequently small area quantile estimators. In this section, we present our idea of how to estimate F_i both when the error distributions follow the DRM and when the error distributions are identical. Meanwhile, we show how these CDF predictors can be extended so that they can be used when the regression coefficient is estimated by the M-quantile approach. Hopefully, this attempt can improve the prediction accuracy of the small area quantile estimators.

4.2.1 Estimation under the DRM

We now investigate the possible approach when $G_i$, i = 1, 2, ..., m, follow the DRM. That is, they satisfy
$$ \log\{dG_i(t)/dG_1(t)\} = \theta_i' q(t). $$
If the DRM assumption is appropriate, we adopt the estimation method for $G_k$ of Chen and Liu (2017) as reviewed in Section 3.2. In Chen and Liu (2017), the estimator for F_i is constructed based on residuals from the fitted linear random effect model. The linear model is fitted by the least squares method, and the residuals are then regarded as samples from the density ratio model. In our proposed method, the only needed change is how the linear model is fitted, and this has been discussed in the last section.

The basic procedure is as follows. Firstly, we estimate the regression coefficient by some approach and predict the y values using the corresponding fitted model. Secondly, the residuals for the sampled units are obtained.
Thirdly, we use these residuals as samples from the DRM and obtain the predictor for the CDF as described in Section 3.2.2. The small area quantile estimates are computed accordingly. We now illustrate how to build a predictor for F_i based on a linear mixed effect median regression model.

Suppose we have a sample $(y_{ij}, x_{ij})$ for i = 1, ..., m and j = 1, ..., n_i satisfying the linear mixed effect median regression model with error distributions following the DRM. By performing k-fold cross-validation, the most appropriate λ is obtained. We minimize
$$ \sum_{i=1}^m \sum_{j=1}^{n_i} |y_{ij} - x_{ij}'\beta - v_i| + \lambda \sum_{i=1}^m |v_i| $$
with respect to β and the v_i to obtain $\tilde{\beta}^{\mathrm{PAD}}$ and $\tilde{v}_i^{\mathrm{PAD}}$. The residuals from this fit are calculated as
$$ \tilde{\varepsilon}_{ij}^{\mathrm{PAD}} = y_{ij} - x_{ij}'\tilde{\beta}^{\mathrm{PAD}} - \tilde{v}_i^{\mathrm{PAD}}. \quad (4.6) $$
We treat $\{\tilde{\varepsilon}_{ij}^{\mathrm{PAD}} : j = 1, 2, \ldots, n_i\}$, i = 1, ..., m, as samples from the DRM and apply the empirical likelihood method of Chen and Liu (2017).

Let $\ell_n(\theta)$ denote the log empirical likelihood function (3.22) with $e_{kj}$ replaced by $\tilde{\varepsilon}_{kj}^{\mathrm{PAD}}$:
$$ \ell_n(\theta) = -\sum_{k,j} \log\Big[\sum_{r=1}^m \rho_r \exp\{\theta_r' q(\tilde{\varepsilon}_{kj}^{\mathrm{PAD}})\}\Big] + \sum_{k,j} \theta_k' q(\tilde{\varepsilon}_{kj}^{\mathrm{PAD}}) \quad (4.7) $$
where we may choose q(t) = (1, t)'. We define the maximum EL estimator of θ by
$$ \tilde{\theta} = \operatorname{argmax} \ell_n(\theta) $$
and estimate $G_i$ by
$$ \tilde{G}_i^{\mathrm{PAD}}(t) = \sum_{k=1}^m \sum_{j=1}^{n_k} \tilde{p}_{kj}^{\mathrm{PAD}} \exp\{\tilde{\theta}_i' q(\tilde{\varepsilon}_{kj}^{\mathrm{PAD}})\} I(\tilde{\varepsilon}_{kj}^{\mathrm{PAD}} \le t) \quad (4.8) $$
with $\tilde{\theta}_1 = 0$ and
$$ \tilde{p}_{kj}^{\mathrm{PAD}} = n^{-1}\Big\{1 + \sum_{l=1}^m \rho_l\big[\exp\{\tilde{\theta}_l' q(\tilde{\varepsilon}_{kj}^{\mathrm{PAD}})\} - 1\big]\Big\}^{-1}. \quad (4.9) $$
Consequently, if the census information is available, the EBP version of the CDF estimator $F_i(t)$ becomes
$$ \hat{F}_i^{\mathrm{PAD\text{-}DRM}}(y) = \frac{1}{N_i}\Big\{\sum_{j \in s_i} I(y_{ij} \le y) + \sum_{j \notin s_i} \tilde{G}_i^{\mathrm{PAD}}(y - x_{ij}'\tilde{\beta}^{\mathrm{PAD}} - \tilde{v}_i^{\mathrm{PAD}})\Big\}. \quad (4.10) $$
Similarly, the procedure above is applicable to the residuals obtained from the M-quantile regression model given by (3.36).

4.2.2 Estimation under identically distributed errors

In this section, we consider the situation where the error distributions are identical.
That is,
$$ G_1(t) = G_2(t) = \cdots = G_m(t). $$
We propose to apply the idea of the Chambers-Dunstan CDF predictor here based on the assumption that the error distributions are identical.

Chen and Liu (2017) suggested that the Chambers-Dunstan CDF estimator may also be written as
$$ \hat{F}_i(t) = \frac{1}{N_i}\Big[\sum_{j \in s_i} I(y_{ij} \le t) + \sum_{j \notin s_i} \hat{G}_i(t - \hat{y}_{ij})\Big] $$
where the $\hat{y}_{ij}$ are the predicted y values based on some approach and $\hat{G}_i(t)$ is the empirical distribution of the residuals:
$$ \hat{G}_i(t) = \frac{1}{n_i} \sum_{j \in s_i} I(\hat{e}_{ij} \le t). $$
The above empirical distribution $\hat{G}_i$ is constructed using only residuals from the ith area. If the error distributions of the small areas are identical, it is best to pool the residuals and use information across areas to build an empirical distribution. The resulting $\hat{G}_i$ would activate the "strength-borrowing" strategy and achieve higher precision. The pooled empirical distribution functions are hence
$$ \hat{G}_1^{\mathrm{ED}}(t) = \cdots = \hat{G}_m^{\mathrm{ED}}(t) = \frac{1}{n} \sum_i \sum_{j \in s_i} I(\hat{e}_{ij} \le t) \quad (4.11) $$
where $\hat{e}_{ij}$ can be the residuals obtained from any one of the regression approaches discussed above.

Once the $G_i$'s have been predicted appropriately, we mimic the Chambers-Dunstan predictor for the CDF:
$$ \hat{F}_i^{\mathrm{ED}}(t) = \frac{1}{N_i}\Big[\sum_{j \in s_i} I(y_{ij} \le t) + \sum_{j \notin s_i} \hat{G}_i^{\mathrm{ED}}(t - \hat{y}_{ij})\Big] \quad (4.12) $$
where the $\hat{y}_{ij}$ are the predicted values from any regression approach. For example, the pooled empirical distribution (ED) version of the CDF estimator $F_i(t)$, where the parameters are estimated from the linear mixed effect median regression model, becomes
$$ \hat{F}_i^{\mathrm{PAD\text{-}ED}}(y) = \frac{1}{N_i}\Big\{\sum_{j \in s_i} I(y_{ij} \le y) + \sum_{j \notin s_i} \hat{G}_i^{\mathrm{ED}}(y - x_{ij}'\tilde{\beta}^{\mathrm{PAD}} - \tilde{v}_i^{\mathrm{PAD}})\Big\} \quad (4.13) $$
where
$$ \hat{G}_1^{\mathrm{ED}}(y) = \cdots = \hat{G}_m^{\mathrm{ED}}(y) = \frac{1}{n} \sum_{i=1}^m \sum_{j=1}^{n_i} I(\tilde{\varepsilon}_{ij}^{\mathrm{PAD}} \le y). $$

4.2.3 The equivalence of quantile estimation under the DRM and under identically distributed errors

We have given two CDF predictors: one is constructed when the error distributions satisfy the DRM, and the other when the error distributions are known to be identically distributed. It turns out that the two estimators are identical if we choose q(t) = (1, t)' in the DRM and fit the regression model in a specific way.
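The pooled-residual predictor (4.11)-(4.12) can be sketched numerically as follows. This is an illustrative Python version, assuming the residuals and the predicted values for the sampled and non-sampled units are already available; the function names are ours.

```python
import numpy as np

def pooled_ed_cdf(t, y_sampled, resid_all, yhat_nonsampled):
    """Chambers-Dunstan type CDF predictor with a pooled residual ECDF.

    y_sampled:       observed y values in area i (the j in s_i terms)
    resid_all:       residuals pooled across ALL areas (defines G^ED)
    yhat_nonsampled: predicted y for the non-sampled units of area i
    """
    N = len(y_sampled) + len(yhat_nonsampled)
    # G^ED(u) = (1/n) * #{pooled residuals <= u}, evaluated at u = t - yhat
    g_ed = np.mean(resid_all[None, :] <= (t - yhat_nonsampled)[:, None],
                   axis=1)
    return (np.sum(y_sampled <= t) + np.sum(g_ed)) / N

def quantile_from_cdf(p, grid, cdf_fun, **kw):
    """Smallest grid value t with F(t) >= p (the definition of a quantile)."""
    vals = np.array([cdf_fun(t, **kw) for t in grid])
    return grid[np.searchsorted(vals, p)]
```

The quantile is then read off from the predicted CDF by its definition, as the estimators in this chapter do.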
The detailed description is given in the following theorem.

Theorem 4.2.1. Suppose
$$ \{(y_{ij}, x_{ij}),\; i = 1, \ldots, m;\; j = 1, \ldots, n_i\} $$
are m samples from the linear mixed effect regression model (3.10). Let
$$ n = n_1 + n_2 + \cdots + n_m $$
be the total sample size, $\hat{\beta}$ the estimated regression coefficient, and $\hat{e}_{ij} = y_{ij} - \hat{y}_{ij}$ the corresponding residuals of the sampled units across all small areas. Suppose the residuals satisfy $\sum_{j=1}^{n_i} \hat{e}_{ij} = 0$ for each i, and the basis function is q(t) = (1, t)'. Then the dual likelihood function
$$ \tilde{\ell}_n(\theta) = -\sum_{k=1}^m \sum_{j=1}^{n_k} \log\Big\{1 + \sum_{l=2}^m \rho_l\big[\exp(\theta_l' q(\hat{e}_{kj})) - 1\big]\Big\} + \sum_{k=1}^m \sum_{j=1}^{n_k} \theta_k' q(\hat{e}_{kj}) \quad (4.14) $$
is maximized at $\hat{\theta}_i = 0$ for all i, where $\rho_l = n_l/n$. Consequently, we have
$$ \hat{p}_{ij} = \frac{1}{n}\Big\{1 + \sum_{l=2}^m \rho_l\big[\exp(\hat{\theta}_l' q(\hat{e}_{ij})) - 1\big]\Big\}^{-1} = \frac{1}{n}. \quad (4.15) $$

Suppose one uses the least squares fit of the linear mixed effect regression model to obtain the corresponding residuals. Due to the properties of least squares, we would have $\sum_{j=1}^{n_i} \hat{e}_{ij} = 0$ for all small areas. If we also use q(t) = (1, t)' as the basis function for the subsequent data analysis under the DRM, then the resulting predictors $\hat{G}_i$ will be identical for all i. In fact, the CDF predictor becomes the pooled empirical version given by (4.12).

Proof. It is obvious that (4.14) is a concave function of θ. Therefore, if $\theta = \hat{\theta}$ is a local maximum, it must satisfy
$$ \frac{\partial \tilde{\ell}_n}{\partial \theta_k} = 0, \quad k = 2, \ldots, m. $$
Hence
$$ \frac{\partial \tilde{\ell}_n}{\partial \theta_k} = -\sum_{i,j} \frac{\rho_k \exp(\theta_k' q(\hat{e}_{ij}))}{\sum_{r=1}^m \rho_r \exp(\theta_r' q(\hat{e}_{ij}))}\, q(\hat{e}_{ij}) + \sum_{j=1}^{n_k} q(\hat{e}_{kj}) = 0, \quad k = 2, \ldots, m. $$
Let $\hat{\theta}_i = 0$ for all i and $\hat{\theta} = (\hat{\theta}_1, \ldots, \hat{\theta}_m)$. Because q(t) = (1, t)' and the residuals sum to zero within each area, we can directly verify that $\partial \tilde{\ell}_n / \partial \theta_k |_{\theta = \hat{\theta}}$ equals 0. Since the function (4.14) is concave, its local maximum is also a global maximum. Therefore, (4.14) attains its maximum value at $\hat{\theta} = 0$. Consequently, we have $\hat{p}_{ij} = 1/n$ by (4.15).

4.3 Simulation study

In this section, we study the performance of various small area quantile estimators under a number of scenarios via a simulation study.
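The stationarity claim in Theorem 4.2.1 can be checked numerically. The sketch below, an illustrative Python version with our own function name, evaluates the dual likelihood (4.14) with q(t) = (1, t)' on residuals centered within each area; the maximum value 0 is attained at θ = 0 and perturbations can only decrease it.

```python
import numpy as np

def dual_loglik(theta, resid_by_area):
    """Dual empirical log-likelihood (4.14) with q(t) = (1, t)'.

    theta: array of shape (m, 2), with theta[0] held at 0 (area 1 baseline).
    resid_by_area: list of m residual arrays, one per area.
    """
    n = sum(len(r) for r in resid_by_area)
    rho = np.array([len(r) / n for r in resid_by_area])
    total = 0.0
    for k, rk in enumerate(resid_by_area):
        q = np.column_stack([np.ones(len(rk)), rk])   # q(e) = (1, e)'
        # 1 + sum_{l>=2} rho_l (exp(theta_l' q) - 1), per observation
        mix = 1.0 + sum(rho[l] * (np.exp(q @ theta[l]) - 1.0)
                        for l in range(1, len(resid_by_area)))
        total += -np.sum(np.log(mix)) + np.sum(q @ theta[k])
    return total
```

Because the dual likelihood is concave, verifying a zero gradient at θ = 0 (equivalently, that nearby values are no larger) confirms the global maximum claimed by the theorem.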
In this simulation, we aim to estimate the 10%, 25%, 50%, 75% and 90% small area quantiles.

4.3.1 Simulation setting

Similar to Section 3.4, a model-based simulation is conducted. For the purpose of the simulation study, we first create some superpopulation models. We consider the following model:
$$ y_{ij} = \beta_0 + \beta_1 x_{ij1} + v_i + e_{ij}, \quad i = 1, \ldots, m,\; j = 1, \ldots, N_i \quad (4.16) $$
with $(\beta_0, \beta_1) = (50, 10)$, $x_{ij1} \overset{i.i.d.}{\sim} \mathrm{Gamma}(2, 1/2)$, and $v_i \overset{i.i.d.}{\sim} N(0, 100)$.

This model incorporates a linear structure between the response variable y and the pre-determined auxiliary variable $x_{ij1}$. A finite population is repeatedly generated, R = 1000 times, for each setting. The small area population quantiles are also repeatedly computed and regarded as the targets for prediction. Simple random sampling without replacement is used to obtain samples from each area. We give two choices for the sample size and population size: (1) n_i = 10 and N_i = 500; (2) n_i = 30 and N_i = 1000. We study the properties of the selected small area quantile estimators under a number of choices of distribution for $e_{ij}$. Many of them are designed to examine the robustness of the quantile estimators. We consider the following five scenarios:

(i) $e_{ij} \overset{i.i.d.}{\sim} N(0, 15^2)$.
(ii) $e_{ij} \overset{i.i.d.}{\sim} \mathrm{Laplace}(0, 15/\sqrt{2})$.
(iii) $e_{ij} \overset{i.i.d.}{\sim} 0.9\,N(-5, 9) + 0.1\,N(45, 9)$.
(iv) $e_{ij} \overset{i.i.d.}{\sim} 0.9\,N(\mu_i/18, 9) + 0.1\,N(-\mu_i/2, 9)$, where $\mu_i \overset{i.i.d.}{\sim} U(90, 95)$ is fixed over all repetitions.
(v) In each of the first m/2 areas, 80% of the $e_{ij}$ are independently generated from $N(0, 15^2)$ and the other 20% from $N(0, 50^2)$. In each of the last m/2 areas, $e_{ij} \overset{i.i.d.}{\sim} N(0, 15^2)$, i = m/2 + 1, ..., m.

We also consider a scenario (vi) where mean shift outliers are created (Sinha, 2004). The outliers are high leverage points and are used to investigate the robustness of the quantile estimators. In this scenario, the error distributions follow (i), and the population and samples in each replicate are obtained accordingly.
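A finite population under model (4.16) with scenario (i) errors, together with an SRSWOR draw from one area, can be sketched as follows. This is an illustrative Python version with our own function names; note that we read Gamma(2, 1/2) as shape 2 and scale 1/2 (the thesis's parameterization convention may differ).

```python
import numpy as np

def generate_population(m=10, N_i=500, beta=(50.0, 10.0), seed=0):
    """Finite population from y_ij = b0 + b1 * x_ij1 + v_i + e_ij with
    x ~ Gamma(2, scale=1/2), v_i ~ N(0, 100), e_ij ~ N(0, 15^2)."""
    rng = np.random.default_rng(seed)
    pops = []
    for i in range(m):
        x = rng.gamma(shape=2.0, scale=0.5, size=N_i)
        v = rng.normal(0.0, 10.0)               # sd 10 -> variance 100
        e = rng.normal(0.0, 15.0, size=N_i)     # scenario (i) errors
        y = beta[0] + beta[1] * x + v + e
        pops.append((x, y))
    return pops

def srswor(pop_y, n_i, rng):
    """Index set of a simple random sample without replacement."""
    return rng.choice(len(pop_y), size=n_i, replace=False)
```

Regenerating the population in each of the R = 1000 repetitions and recomputing the area population quantiles gives the prediction targets described above.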
The outliers are created by replacing the corresponding auxiliary variable $x_{ij1}$ values by $x_{ij1} - 30$ for 1% of the data points, chosen at random from the sample. An example of the sampled y values versus x values under scenarios (i)-(vi) is presented in Figure 4.2.

The above choices are based on the following considerations: (i) does not violate the assumptions of the nested error linear regression model; in (ii) the error distributions are non-normal but symmetric and identical; in (iii) the error distributions are non-normal, non-symmetric and identical; in (iv) the error distributions are non-normal, non-symmetric and non-identical; in (v) outliers are present. To make the comparison under different scenarios meaningful, the parameters of the distributions in (ii)-(iv) are chosen so that their variances are equal to that in scenario (i). A plot of the densities of the error distributions in scenarios (i)-(iii) is given in Figure 4.1.

Figure 4.1: Probability density functions of the normal, Laplace, and mixture normal distributions.
Figure 4.2: Scatter plots of y versus x in example samples generated under: (a) error distributions matching the assumptions of the NELRM; (b) non-normal but symmetric and identical error distributions; (c) non-normal, non-symmetric but identical error distributions; (d) non-normal, non-symmetric and non-identical error distributions; (e) outliers created by a large variance of the error distribution; (f) mean shift outliers. The observations in each area are sorted so that x is in ascending order, and a line is drawn in each area to connect these observations. The black line is the regression line with intercept 50 and slope 10.

4.3.2 Estimators for comparison

In the simulation study, we have 12 estimators in total for comparison. Here we specify their acronyms and the mathematical formulas for the CDF predictors. The quantiles are estimated from the CDF predictor using the definition of the quantiles. The hyphen in the acronym of a quantile estimator joins the method for parameter estimation and the type of CDF predictor. We choose q(t) = (1, t)' for all the DRM based approaches.

1. The Direct, MR, CL-DRM, MQ-Naïve, and MQ-CD estimators are the same as described in Section 3.4.2.

2. CL-ED: For this estimator, we estimate the regression coefficient as given by Chen-Liu in (3.23) and (3.24). The residuals $\hat{e}_{ij}^{CL}$ are then obtained according to (3.25). By substituting $\hat{e}_{ij}$ with $\hat{e}_{ij}^{CL}$ in (4.11) and (4.12), the corresponding CL-ED predictor for $F_i$ is obtained.

3. MQ-DRM: For this estimator, the regression coefficient is estimated by the M-quantile approach. The predicted y values are given by (3.37), and the residuals $\hat{e}_{ij}^{MQ}$ are obtained accordingly. The DRM version of the CDF predictor is obtained by replacing $\tilde{\varepsilon}_{ij}^{PAD}$ with $\hat{e}_{ij}^{MQ}$ in (4.7) and maximizing the log empirical likelihood function with respect to θ.

4.
MQ-ED: The residuals $\hat{e}_{ij}^{MQ}$ are obtained as described for MQ-DRM. By substituting $\hat{e}_{ij}$ with $\hat{e}_{ij}^{MQ}$ in (4.11) and (4.12), the corresponding MQ-ED predictor for $F_i$ is obtained.

5. AD-DRM: For this estimator, the regression coefficient and the $v_i$ are estimated via (4.3). The residuals $\hat{e}_{ij}^{AD}$ are obtained accordingly. The DRM version of the CDF predictor is obtained by replacing $\tilde{\varepsilon}_{ij}^{PAD}$ with $\hat{e}_{ij}^{AD}$ in (4.7) and maximizing the log empirical likelihood function with respect to θ.

6. AD-ED: The residuals $\hat{e}_{ij}^{AD}$ are obtained as described for AD-DRM. By substituting $\hat{e}_{ij}$ with $\hat{e}_{ij}^{AD}$ in (4.11) and (4.12), the corresponding AD-ED predictor for $F_i$ is obtained.

7. PAD-DRM: the CDF predictor for $F_i$ is defined by (4.10).

8. PAD-ED: the CDF predictor for $F_i$ is defined by (4.13).

Given these estimators, based on the different approaches for estimating the regression coefficient and the model settings (i)-(vi), we calculate the coefficient of determination R² (Nagelkerke, 1991). It indicates the proportion of the variance in the dependent variable that is predictable from the linear structure $\beta_0 + \beta_1 x_{ij1}$; the larger R² is, the better the fitted model predicts the value of the dependent variable. The coefficient of determination is defined as
$$ R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} $$
where
$$ SSR = \sum_{i,j} (\hat{y}_{ij} - \bar{y})^2, \quad SSE = \sum_{i,j} (\hat{y}_{ij} - y_{ij})^2, \quad SST = \sum_{i,j} (y_{ij} - \bar{y})^2, $$
$\hat{y}_{ij}$ is the predicted y value, and $\bar{y} = n^{-1} \sum_{i,j} y_{ij}$. Suppose $\{(y_{ij}, x_{ij})^{(r)}, i = 1, \ldots, m; j = 1, \ldots, n_i\}$ is the sample from the rth simulation, $\hat{y}_{ij}$ is the corresponding prediction based on the fitted regression model, and the corresponding coefficient of determination is denoted $R^2(r)$.
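The R² computation above is a one-liner; a minimal Python sketch (function name ours):

```python
import numpy as np

def coefficient_of_determination(y, yhat):
    """R^2 = 1 - SSE/SST for predictions yhat of observations y."""
    sse = np.sum((yhat - y) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse / sst
```

A perfect fit gives R² = 1, while predicting the sample mean for every unit gives R² = 0.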
Then the average coefficient of determination across the 1000 repetitions is defined as
$$ \bar{R}^2 = \frac{1}{1000} \sum_{r=1}^{1000} R^2(r). $$
The average coefficient of determination for each regression fit is given in Table 4.1.

Table 4.1: The average coefficient of determination across 1000 repetitions under scenarios (i)-(vi) when m = 10, n_i = 10 and N_i = 500.

Scenario | REML  | CL    | AD    | PAD   | MQ
(i)      | 0.808 | 0.814 | 0.807 | 0.802 | 0.801
(ii)     | 0.808 | 0.814 | 0.806 | 0.802 | 0.801
(iii)    | 0.801 | 0.807 | 0.772 | 0.769 | 0.788
(iv)     | 0.790 | 0.797 | 0.760 | 0.757 | 0.776
(v)      | 0.677 | 0.692 | 0.678 | 0.671 | 0.671
(vi)     | 0.438 | 0.469 | 0.110 | 0.100 | 0.190

Based on the observations from the simulation study in Section 3.4, we anticipate the performance of these estimators under each scenario as follows:

1. In scenario (i), the MR and CL based estimators would perform well since the model assumptions are not violated.

2. In scenarios (ii), (iii) and (iv), the PAD, CL, and AD based estimators, which do not assume normality, would perform well. Comparing scenario (iii) with scenario (iv), we expect the ED based estimators to perform better than the DRM based estimators in scenario (iii), since the error distributions there are identical, and the DRM based estimators to perform better than the ED based estimators in scenario (iv), since the error distributions there are non-identical.

3. In scenario (v), where there are outliers, we expect the PAD and AD based estimators and the modified MQ estimators to perform well.

4. In scenario (vi), we expect the PAD and AD based estimators to perform well, given how the regression coefficient is estimated in these two methods. Intuitively, the estimated regression coefficient is more robust in these two methods.

4.3.3 Simulation results

The average absolute biases (AAB) and the average MSEs (AMSE) as described in Section 3.4.3 are adopted to measure the performance of the different estimators. Tables 4.2 and 4.3 report the average absolute biases and average MSEs for the 12 quantile estimators under scenarios (i)-(vi).
We also report the MSEs of the estimated regression coefficients, based on the different approaches under the different scenarios, in Table 4.4.

Across the six scenarios, we find that all the model-based estimators have an average MSE that is substantially lower than that of the direct estimator. This observation indicates that the model-based methods and the strength-borrowing strategy work.

Table 4.2: The AAB and AMSE under scenarios (i)-(iii) when n_i = 10 and N_i = 500. The estimators are sorted so that the AMSEs of the 50% quantile estimators are in ascending order. Columns: Average AB at the 10%, 25%, 50%, 75%, 90% quantiles | Average MSE at the 10%, 25%, 50%, 75%, 90% quantiles.

Scenario (i):
MR       0.25 0.09 0.18 0.34  0.47 |  25.39  22.97  22.05  23.62  29.13
MQ-DRM   0.95 0.37 0.13 0.27  0.31 |  27.52  24.22  22.70  24.27  31.34
CL-ED    0.84 0.34 0.12 0.26  0.29 |  26.70  24.08  23.00  24.47  30.06
CL-DRM   0.84 0.34 0.12 0.26  0.29 |  26.70  24.08  23.00  24.47  30.06
PAD-DRM  1.06 0.32 0.16 0.25  0.22 |  28.41  24.65  23.13  24.73  31.81
AD-DRM   1.04 0.30 0.17 0.24  0.20 |  28.53  24.95  23.27  24.82  32.21
MQ-CD    0.67 0.27 0.11 0.26  0.37 |  33.97  25.79  23.28  25.82  33.86
MQ-ED    0.62 0.26 0.09 0.10  0.12 |  27.54  24.83  23.53  25.17  32.17
MQ-Naïve 6.65 1.64 1.69 3.33  4.18 |  72.50  28.27  27.06  37.21  52.54
PAD-ED   0.87 0.23 0.11 0.11  0.05 |  32.85  29.63  27.95  29.48  36.52
AD-ED    0.94 0.26 0.15 0.18  0.13 |  33.63  30.20  28.43  29.94  37.43
Direct   4.73 2.88 2.64 4.29 10.25 | 162.84  98.86 125.84 197.43 481.89

Scenario (ii):
PAD-ED   0.48 0.18 0.11 0.13  0.18 |  20.28  17.64  16.81  18.88  24.96
AD-ED    0.54 0.22 0.13 0.16  0.17 |  21.44  18.68  17.69  19.72  26.10
MQ-ED    0.28 0.21 0.13 0.13  0.17 |  24.17  21.03  19.58  22.11  30.06
MQ-CD    0.17 0.20 0.14 0.20  0.30 |  37.39  21.50  20.94  28.97  43.92
MQ-Naïve 5.25 1.39 1.29 3.03  4.09 |  51.77  23.32  21.43  31.24  47.61
PAD-DRM  0.53 0.23 0.15 0.19  0.21 |  26.56  21.60  21.59  25.43  33.29
MQ-DRM   0.42 0.28 0.15 0.20  0.23 |  27.84  22.56  21.61  25.28  34.62
AD-DRM   0.56 0.23 0.16 0.20  0.21 |  26.57  22.30  21.95  25.35  33.06
MR       1.60 0.28 0.58 0.65  0.57 |  29.57  23.34  22.24  24.31  29.96
CL-ED    0.37 0.26 0.17 0.19  0.22 |  27.21  24.31  23.04  24.95  31.20
CL-DRM   0.37 0.26 0.17 0.19  0.22 |  27.21  24.31  23.04  24.95  31.20
Direct   6.19 2.83 2.67 4.47 10.27 | 224.21  92.89 121.30 206.66 492.44

Scenario (iii):
PAD-ED   0.13 0.14 0.21 0.08  0.06 |   3.03   3.14   4.87  10.86  15.48
AD-ED    0.15 0.16 0.23 0.06  0.05 |   3.71   3.85   5.55  11.40  16.00
MQ-ED    0.10 0.30 0.49 0.10  0.12 |  13.59  12.28  12.87  18.68  26.69
MQ-Naïve 1.76 0.81 1.38 6.16  8.50 |  18.22  13.97  13.97  53.21  98.07
CL-ED    0.16 0.59 0.85 0.19  0.68 |  23.76  23.35  24.61  29.12  36.74
CL-DRM   0.16 0.59 0.85 0.19  0.68 |  23.76  23.35  24.61  29.12  36.74
AD-DRM   0.21 0.30 0.70 0.23  0.71 |   4.70   7.60  25.07  53.35  67.48
MQ-DRM   0.13 0.41 0.79 0.21  0.61 |  13.17  13.92  25.09  45.12  59.11
PAD-DRM  0.20 0.29 0.72 0.23  0.73 |   4.00   6.98  25.59  54.38  69.16
MQ-CD    0.08 0.27 0.82 0.23  0.75 |   5.77   6.93  26.51  59.41  81.71
MR       2.63 1.69 3.13 0.22  1.18 |  28.53  24.32  33.25  27.47  36.31
Direct   1.81 1.54 2.31 5.30 11.53 |  44.65  56.96 129.86 251.65 556.62

Table 4.3: The AAB and AMSE under scenarios (iv)-(vi) when n_i = 10 and N_i = 500. The estimators are sorted so that the AMSEs of the 50% quantile estimators are in ascending order. Columns as in Table 4.2.

Scenario (iv):
PAD-ED    0.60 0.08 0.06 0.06  0.18 |  22.52   4.53   3.70   3.76   5.00
AD-ED     0.55 0.10 0.07 0.05  0.18 |  22.37   5.50   4.61   4.65   5.87
MQ-CD     4.03 0.55 0.16 0.12  0.06 | 179.65  27.93  11.52   9.80  15.48
MQ-ED     0.98 0.24 0.08 0.05  0.12 |  32.07  15.07  12.09  12.66  19.30
MQ-Naïve  4.30 1.29 0.34 0.12  0.38 |  37.69  15.99  12.42  13.64  21.31
PAD-DRM   3.62 0.43 0.17 0.09  0.12 | 159.66  23.36  12.51  10.54  10.91
AD-DRM    3.56 0.41 0.18 0.09  0.11 | 156.40  22.21  12.99  11.14  11.60
MQ-DRM    2.94 0.44 0.17 0.09  0.09 | 117.36  25.15  16.86  16.06  22.03
CL-ED     1.46 0.55 0.21 0.10  0.20 |  42.02  26.52  23.67  23.93  28.73
CL-DRM    1.46 0.55 0.21 0.10  0.20 |  42.02  26.52  23.67  23.93  28.73
MR        5.91 3.29 0.38 1.21  2.07 |  72.50  38.26  23.09  24.38  31.84
Direct   14.86 4.11 2.36 4.19 10.08 | 650.00 139.11 101.45 183.32 454.21

Scenario (v):
MQ-ED     1.67 0.56 0.18 1.11  3.20 |   36.64  28.94  26.81  30.81  57.65
MQ-Naïve  8.32 2.21 1.76 4.34  7.13 |  105.05  34.37  30.34  49.39 102.95
MQ-CD     1.84 0.23 0.14 0.34  2.58 |  180.12  39.04  31.42  53.57 403.84
PAD-ED    1.70 0.62 0.19 1.09  3.19 |   42.57  34.59  32.46  36.46  62.38
AD-ED     1.68 0.61 0.21 1.11  3.20 |   45.99  37.85  35.47  39.49  65.84
MQ-DRM    0.50 0.39 0.16 0.76  1.70 |  104.66  39.27  36.39  48.85 141.25
PAD-DRM   0.55 0.44 0.17 0.75  1.71 |  107.01  40.93  38.26  49.87 139.66
AD-DRM    0.71 0.46 0.19 0.80  1.87 |   97.96  43.48  40.80  50.90 130.77
MR        7.11 3.01 1.29 4.04  4.17 |  145.23  67.82  51.37  79.90 115.83
CL-ED     1.63 0.53 0.22 1.13  3.23 |   72.76  64.04  60.72  65.63  96.14
CL-DRM    1.63 0.53 0.22 1.13  3.23 |   72.76  64.04  60.72  65.63  96.14
Direct   12.49 3.69 2.96 4.50 10.81 | 1181.32 140.18 141.14 228.19 599.46

Scenario (vi):
MQ-CD     1.30 1.75 0.45 1.95  5.04 |  80.25  33.38  27.55  39.88  86.20
MQ-ED     2.21 1.97 0.63 1.78  5.03 |  42.10  35.64  27.56  32.79  75.50
PAD-ED    1.60 1.08 0.12 1.21  2.91 |  35.73  31.90  29.19  32.15  46.40
MQ-Naïve 10.72 4.31 1.35 6.25 10.91 | 162.62  54.85  29.33  73.19 190.10
MQ-DRM    1.97 2.00 0.55 1.97  5.28 |  65.26  38.90  31.10  42.42  95.93
AD-ED     1.71 1.12 0.12 1.28  3.03 |  38.58  34.38  31.37  34.55  50.18
PAD-DRM   1.07 1.06 0.09 1.37  3.12 |  70.80  37.32  31.58  34.06  49.08
AD-DRM    1.16 1.09 0.08 1.39  3.17 |  70.55  39.20  33.25  35.82  51.76
MR        2.02 1.19 2.68 0.55  5.64 |  61.64  51.96  56.85  55.14 100.29
CL-ED     5.12 4.63 2.45 2.44 10.04 |  90.68  83.77  68.07  70.91 175.22
CL-DRM    5.12 4.63 2.45 2.44 10.04 |  90.68  83.77  68.07  70.91 175.22
Direct    4.73 2.88 2.64 4.29 10.25 | 162.84  98.86 125.84 197.43 481.89

Table 4.4: The average MSEs of the estimated regression coefficients based on the different approaches under scenarios (i)-(vi) when n_i = 10 and N_i = 500. Columns: MSE of the slope (β_1 = 10) | MSE of the intercept plus the mean random effect (β_0 + v̄ = 50 + v̄), for REML, CL, AD, PAD, MQ.

Scenario | REML  CL    AD   PAD  MQ   | REML   CL     AD     PAD    MQ
(i)      |  0.31  0.31 0.49 0.47 0.46 |  24.99  27.32  36.61  36.20  30.21
(ii)     |  0.32  0.32 0.29 0.26 0.47 |  25.67  28.36  21.47  20.25  26.10
(iii)    |  0.35  0.36 0.03 0.03 0.49 |  26.35  29.62  23.18  23.57  26.29
(iv)     |  1.41  1.44 0.03 0.03 1.10 |  85.72 121.09 100.98  99.83 100.07
(v)      |  0.64  0.66 0.63 0.56 0.65 |  47.19  56.67  48.30  43.87  37.75
(vi)     | 29.21 29.15 1.30 1.24 3.09 | 568.18 581.79  53.81  50.49  72.89

We observe that the MR estimator performs well when the nested error linear regression model holds. We also observe that the CL-ED and CL-DRM estimators have identical performance, as implied by Theorem 4.2.1.

Here is a description of the performance of these estimators:

1. Under scenario (i), where the assumptions of the nested error linear regression model are satisfied, the Molina-Rao approach has the lowest average MSEs. The average MSEs of CL-ED, CL-DRM, MQ-ED, MQ-DRM, PAD-DRM, and AD-DRM are almost the same as those of MR. The MQ-Naïve estimator has large bias, and its average MSEs are larger than those of MR, especially for estimating the 10% and 90% quantiles. The average MSEs of PAD-ED, AD-ED, and MQ-CD are noticeably larger than those of MR, although the difference is still moderate. The two modified M-quantile based quantile estimators have lower average MSEs than the MQ-CD and MQ-Naïve estimators.

2. Under scenario (ii), where the error distributions are non-normal but symmetric and identical, the PAD-ED estimator has the lowest average MSEs, while AD-ED has comparable average MSEs. The simulation results also confirm that the estimator with a penalty term works slightly better than that without a penalty term. The MQ-Naïve and MQ-CD estimators have noticeably larger MSEs than these two estimators, while the other model-based estimators have average MSEs comparable with those of PAD-ED and AD-ED. The two modified M-quantile based quantile estimators have lower average MSEs than the MQ-CD and MQ-Naïve estimators.

3. The error distributions are non-normal, non-symmetric, and identical in scenario (iii). As shown in Figure 4.1, the error distribution is bimodal; the overall variance is large, but the two modes center around -5 and 45 with 0.9 weight on N(-5, 9).
Therefore, in this case, as shown in Figure 4.2 (c), the data points within the same area are closer together than in Figure 4.2 (a). In this case, PAD-ED and AD-ED again have the best performance. The average MSEs of the other estimators are almost four times as large as those of PAD-ED and AD-ED, although the performance of these estimators cannot be described uniformly. For example, PAD-DRM and AD-DRM have relatively small average MSEs when estimating the 10% and 25% quantiles, but their average MSEs are larger than those of CL-ED and MR when estimating the 75% and 90% quantiles. In this scenario, the two modified M-quantile estimators do not uniformly perform better than MQ-CD and MQ-Naïve. For example, when estimating the 10% quantile, MQ-CD is better than MQ-DRM, but MQ-DRM is better than MQ-CD for estimating the 90% quantile.

4. Under scenario (iv), where the error distributions are non-normal, non-symmetric and non-identical, we expected the DRM based estimators to perform better than the ED based estimators. The simulation results do not support this conjecture. The PAD-ED estimator has the lowest average MSEs, while AD-ED has comparable average MSEs. Similar to scenario (iii), the average MSEs of the other estimators are almost three times as large. In this scenario, the MQ-ED estimator is better than MQ-CD and MQ-Naïve. However, MQ-DRM performs worse than MQ-Naïve in places; for example, when estimating the 10% quantile, MQ-Naïve has a smaller average MSE than MQ-DRM.

5. Under scenario (v), where outliers are included, MQ-ED has the best performance, and PAD-ED and AD-ED have slightly larger average MSEs. The other estimators have evidently larger MSEs than these three. The two modified M-quantile based quantile estimators have lower average MSEs than the MQ-CD and MQ-Naïve estimators.

6. Under scenario (vi), where there are mean shift outliers, the average MSEs of the estimators based on Chen and Liu (2017) and Molina and Rao (2010) are large, as we expected.
In this case, as shown in Figure 4.2 (f), the presence of the leverage points may distort the estimation of the parameters in the regression model, and the ultimate quantile estimator therefore has a large MSE. The performance of the estimators for estimating different quantiles cannot be described uniformly. For example, MQ-CD performs well when estimating the 50% quantile but has a noticeably large MSE when estimating the 10% and 90% quantiles. Overall, PAD-ED, AD-ED, and MQ-ED are comparably good and are the best among these estimators.

The observations above compare the performance of different estimators under the same scenario; we also compare the performance of the same estimator under different scenarios.

1. The Molina-Rao quantile estimator performs well only if the model assumptions of the nested error linear regression model are satisfied.

2. CL-DRM (CL-ED) is not robust to model mis-specification either, but its performance is as good as that of Molina-Rao under the correct model.

3. MQ-ED and MQ-DRM perform better than MQ-CD and MQ-Naïve in most cases. Hence, the proposed approach works to some degree. However, under scenarios (iii) and (iv), where the error distributions are non-normal and non-symmetric, the modification does not work.

4. The PAD-ED and AD-ED estimators perform relatively better under model mis-specification. Their performance under the nested error linear regression model is worse than that of MR, but still acceptable.

5. The performance of the quantile estimators does not relate positively to the coefficient of determination. For example, in all model scenarios, the Chen and Liu (2017) approach has the highest coefficient of determination, but the corresponding average MSEs are not the smallest. This is explainable, since the coefficient of determination reflects the variance rather than the absolute deviance.

6.
When the forms of the CDF predictors are the same, the performance of the ultimate quantile estimators is negatively correlated with the MSE of the estimated regression coefficient. In other words, the larger the MSE of the regression coefficient, the worse the performance of the ultimate estimator.

To summarise, the final observations from the simulation study are as follows. The PAD-ED and AD-ED quantile estimators show superior performance under non-normal error distributions and relatively good performance in the presence of outliers in the scenarios considered above. When the nested error linear regression model is valid, the average MSEs of these two estimators are larger than those of the Molina-Rao estimator; the latter has the best performance so far under the nested error linear regression model. The MQ-ED estimator performs well when the data contain outliers.

We only present the simulation results for n_i = 10 and N_i = 500 here. The simulation results for n_i = 30 and N_i = 1000 are similar; they are given in Tables 4.5 and 4.6.

Table 4.5: The AAB and AMSE under scenarios (i)-(iii) when n_i = 30 and N_i = 1000.
The estimators are sorted so that the AMSEs of the 50% quantile estimators are in ascending order.

Scenario  Estimator           Average AB                        Average MSE
                      10%   25%   50%   75%   90%     10%    25%    50%    75%    90%
(i)       MQ-DRM     0.26  0.09  0.07  0.09  0.06    8.39   7.80   7.64   8.08    9.76
          CL-ED      0.22  0.08  0.07  0.08  0.06    8.29   7.79   7.66   8.07    9.56
          CL-DRM     0.22  0.08  0.07  0.08  0.06    8.29   7.79   7.66   8.07    9.56
          PAD-DRM    0.35  0.09  0.08  0.09  0.06    8.47   7.85   7.67   8.10    9.81
          AD-DRM     0.34  0.09  0.08  0.09  0.06    8.48   7.86   7.67   8.10    9.83
          MQ-CD      0.15  0.07  0.07  0.08  0.07   10.89   8.32   7.84   8.70   10.79
          MR         1.13  0.41  0.26  0.65  0.90   11.52   8.84   8.46   9.56   11.68
          AD-ED      0.27  0.10  0.08  0.09  0.10   11.19  10.63  10.44  10.84   12.66
          PAD-ED     0.26  0.09  0.08  0.08  0.11   11.44  10.89  10.71  11.10   12.88
          MQ-ED      0.13  0.10  0.10  0.15  0.24   16.38  15.90  15.71  16.13   17.74
          MQ-Naïve   6.40  1.48  1.73  3.28  3.99   57.76  18.47  18.93  27.21   34.44
          Direct     1.57  1.01  0.95  1.56  3.72   46.51  33.51  41.79  74.29  177.85
(ii)      PAD-ED     0.19  0.08  0.07  0.07  0.08    5.74   5.17   5.15   5.61    7.08
          AD-ED      0.20  0.08  0.06  0.06  0.08    5.73   5.18   5.15   5.61    7.09
          MQ-CD      0.10  0.08  0.08  0.08  0.10   11.04   6.57   6.86   9.36   14.00
          PAD-DRM    0.21  0.11  0.08  0.09  0.09    8.05   6.86   7.15   8.22   10.27
          AD-DRM     0.22  0.11  0.08  0.08  0.09    8.08   6.94   7.20   8.21   10.21
          MQ-DRM     0.09  0.10  0.10  0.08  0.09    8.50   7.19   7.27   8.28   10.60
          CL-ED      0.14  0.11  0.09  0.08  0.09    8.31   7.74   7.69   8.17    9.74
          CL-DRM     0.14  0.11  0.09  0.08  0.09    8.31   7.74   7.69   8.17    9.74
          MR         2.41  0.59  0.65  0.90  0.84   16.24   8.97   8.83   9.98   11.62
          MQ-ED      0.20  0.13  0.13  0.16  0.23   14.25  13.55  13.44  13.99   15.75
          MQ-Naïve   5.08  1.28  1.33  3.03  4.01   40.22  15.43  15.39  23.32   32.38
          Direct     1.78  0.80  0.86  1.58  3.79   48.19  29.29  41.27  76.39  181.68
(iii)     AD-ED      0.06  0.06  0.08  0.09  0.13    0.71   0.74   1.10   2.36    3.93
          PAD-ED     0.06  0.05  0.08  0.09  0.13    0.74   0.77   1.13   2.38    3.95
          AD-DRM     0.08  0.10  0.25  0.10  0.18    1.12   1.91   6.79  19.41   24.63
          PAD-DRM    0.08  0.10  0.25  0.10  0.19    1.13   1.91   6.80  19.49   24.76
          MQ-CD      0.07  0.11  0.28  0.11  0.23    1.08   1.58   6.85  21.45   28.26
          MQ-DRM     0.07  0.20  0.39  0.11  0.27    4.80   4.86   7.51  16.56   21.74
          CL-ED      0.08  0.21  0.31  0.07  0.16    7.49   7.42   7.73   8.81   10.70
          CL-DRM     0.08  0.21  0.31  0.07  0.16    7.49   7.42   7.73   8.81   10.70
          MQ-ED      0.08  0.14  0.22  0.12  0.15    8.43   8.37   8.68   9.88   11.85
          MQ-Naïve   1.91  0.93  1.28  6.08  8.39   12.36   9.49  10.40  46.89   83.42
          MR         3.27  1.44  3.21  0.47  0.94   20.12  10.38  19.04  10.16   13.19
          Direct     0.54  0.47  0.69  1.94  4.09   14.90  20.97  46.58  94.04  195.16

Table 4.6: The AAB and AMSE under scenarios (iv)-(vi) when ni = 30 and Ni = 1000. The estimators are sorted so that the AMSEs of the 50% quantile estimators are in ascending order.

Scenario  Estimator           Average AB                        Average MSE
                      10%   25%   50%   75%   90%     10%    25%    50%    75%    90%
(iv)      AD-ED      0.09  0.03  0.03  0.04  0.12    2.70   1.00   0.89   1.02    1.56
          PAD-ED     0.09  0.03  0.03  0.04  0.12    2.73   1.03   0.92   1.05    1.59
          MQ-CD      2.53  0.12  0.06  0.05  0.08   75.27   6.04   3.37   2.88    3.71
          PAD-DRM    2.14  0.11  0.06  0.05  0.09   63.65   5.97   3.82   3.32    3.60
          AD-DRM     2.12  0.11  0.06  0.05  0.09   63.09   5.98   3.83   3.33    3.61
          MQ-DRM     2.28  0.21  0.07  0.07  0.14   52.44   7.23   5.60   5.46    6.47
          CL-ED      0.48  0.16  0.09  0.09  0.13   10.28   8.20   7.91   8.07    8.98
          CL-DRM     0.48  0.16  0.09  0.09  0.13   10.28   8.20   7.91   8.07    8.98
          MQ-ED      0.46  0.15  0.09  0.13  0.22   11.49   9.27   8.81   9.01   10.17
          MR         6.76  3.49  0.27  1.52  2.37   58.09  21.71   8.88  11.60   16.35
          MQ-Naïve   3.97  1.18  0.28  0.14  0.33   26.40  10.75   9.08   9.31   10.73
          Direct     7.19  0.75  0.76  1.47  3.73  239.67  24.00  34.49  69.43  173.38
(v)       MQ-CD      0.26  0.08  0.09  0.11  0.17   27.16  12.18  10.94  16.29   36.24
          MQ-DRM     2.17  0.80  0.19  1.38  3.00   20.62  13.27  12.38  16.40   31.65
          PAD-ED     2.33  0.83  0.22  1.51  3.30   19.68  13.67  12.80  15.76   27.21
          PAD-DRM    2.22  0.82  0.19  1.41  3.07   20.76  13.66  12.83  16.61   30.29
          AD-ED      2.34  0.84  0.21  1.50  3.29   19.80  13.76  12.86  15.79   27.23
          AD-DRM     2.24  0.82  0.19  1.42  3.09   20.87  13.92  13.07  16.72   30.04
          CL-ED      2.36  0.86  0.20  1.48  3.27   21.63  15.66  14.69  17.60   28.88
          CL-DRM     2.36  0.86  0.20  1.48  3.27   21.63  15.66  14.69  17.60   28.88
          MR         3.93  1.61  0.92  2.06  3.73   42.61  19.82  16.41  24.11   35.84
          MQ-ED      2.33  0.83  0.22  1.51  3.30   24.69  18.94  18.03  21.02   32.48
          MQ-Naïve   8.79  2.34  1.92  4.77  7.26  102.47  24.82  21.99  44.03   85.63
          Direct     2.55  1.13  1.02  1.71  4.09   92.04  40.50  47.14  84.72  209.62
(vi)      MQ-CD      2.79  2.27  1.28  0.29  1.15   21.75  15.91  11.27  11.74   50.24
          PAD-ED     2.00  1.49  0.85  0.10  0.10   15.71  13.50  11.82  11.50   13.34
          AD-ED      2.01  1.50  0.84  0.10  0.11   15.80  13.54  11.83  11.53   13.40
          PAD-DRM    2.07  1.53  0.86  0.11  0.24   15.94  13.70  12.58  14.82   39.99
          AD-DRM     2.07  1.54  0.85  0.11  0.23   16.13  13.87  12.72  14.97   39.51
          MQ-DRM     2.76  2.24  1.28  0.22  1.30   21.06  18.09  15.37  17.34   38.77
          MQ-Naïve  10.38  4.36  0.69  4.80  8.49  121.37  33.12  17.11  46.53  108.55
          MQ-ED      2.52  2.14  1.30  0.16  1.37   20.16  18.90  18.10  21.97   35.31
          CL-ED      5.85  5.12  3.04  1.30  7.50   55.39  47.14  30.21  23.47   80.78
          CL-DRM     5.85  5.12  3.04  1.30  7.50   55.39  47.14  30.21  23.47   80.78
          Direct     1.57  1.01  0.96  1.56  3.73   46.49  33.53  41.85  74.31  177.87
          MR         0.60  3.46  5.38  3.64  2.22   24.72  33.81  50.39  36.57   33.35

Chapter 5

Conclusion and Future Topics

In this thesis, we have studied some small area quantile estimators motivated by the nested error linear regression model. We have also investigated potentially more robust estimators for small area quantiles.

To achieve this goal, we have tried many combinations of methods for fitting the regression model and methods for constructing CDF predictors. We investigate the use of the least sum of absolute deviations, instead of the least sum of squared residuals, to fit the regression model. We believe this change helps to robustify the estimation of the regression coefficient. We also follow the idea of Chen and Liu (2017) to estimate the small area error distributions under the DRM. The idea of Chambers and Dunstan is then applied to obtain CDF predictors. The resulting small area quantile estimators have some advantages over the direct estimator. In particular, PAD-ED, AD-ED, and MQ-ED show satisfactory performance under some types of model misspecification. When the error distributions are skewed and the data contain outliers, PAD-ED and AD-ED are the best so far. The estimators based on the DRM assumption are less satisfactory in the examples we have investigated.
We also show that our modification of the M-quantile based small area quantile estimators is effective.

For the estimators investigated so far, we adopt a two-step procedure: (a) estimate the regression coefficient; (b) construct the CDF predictor from the residuals obtained in step (a). The two steps are carried out independently. For future work, we wish to find a way to estimate the regression coefficient and the CDF jointly. We have also not discussed how to estimate the MSE of the quantile estimators, which is important in real applications. Moreover, this thesis only considers samples that are non-informative, in the sense that the sample selection probability is independent of the variable of interest. Many real-world samples are informative, and the methods here are not applicable to them.

Bibliography

Battese, G. E., Harter, R. M., and Fuller, W. A. (1988). An error-components model for prediction of county crop areas using survey and satellite data. Journal of the American Statistical Association, 83(401):28-36.

Beaumont, J.-F. and Rivest, L.-P. (2009). Dealing with outliers in survey data. Handbook of Statistics, 29:247-279.

Berkelaar, M. et al. (2015). Package lpSolve.

Breckling, J. and Chambers, R. (1988). M-quantiles. Biometrika, 75(4):761-771.

Chambers, R. and Tzavidis, N. (2006). M-quantile models for small area estimation. Biometrika, pages 255-268.

Chambers, R. L. and Dunstan, R. (1986). Estimating distribution functions from survey data. Biometrika, pages 597-604.

Chen, J. and Liu, Y. (2017). Small area quantile estimation. arXiv preprint arXiv:1705.10063.

Diallo, M. and Rao, J. (2014). Small area estimation of complex parameters under unit-level models with skew-normal errors. Proceedings of the Survey Research Section, American Statistical Association.

Elbers, C., Lanjouw, J. O., and Lanjouw, P. (2003). Micro-level estimation of poverty and inequality. Econometrica, 71(1):355-364.

Fay, R. E. and Herriot, R. A. (1979). Estimates of income for small places: an application of James-Stein procedures to census data. Journal of the American Statistical Association, 74(366a):269-277.

Friedman, J., Hastie, T., and Tibshirani, R. (2001). The Elements of Statistical Learning, volume 1. Springer Series in Statistics. Springer, Berlin.

Guadarrama, M., Molina, I., and Rao, J. (2016). Small area estimation of general parameters under complex sampling designs.

Hartley, H. O. and Rao, J. N. (1967). Maximum-likelihood estimation for the mixed analysis of variance model. Biometrika, 54(1-2):93-108.

Henderson, C. R. (1953). Estimation of variance and covariance components. Biometrics, 9(2):226-252.

Koenker, R. (2004). Quantile regression for longitudinal data. Journal of Multivariate Analysis, 91(1):74-89.

Koenker, R. and Bache, S. (2012). rqpd: Regression quantiles for panel data. R package version 0.5, URL http://rqpd.r-forge.r-project.org.

Koenker, R. and Bassett Jr, G. (1978). Regression quantiles. Econometrica, pages 33-50.

Kokic, P., Chambers, R., Breckling, J., and Beare, S. (1997). A measure of production performance. Journal of Business & Economic Statistics, 15(4):445-451.

Lohr, S. (2009). Sampling: Design and Analysis. Nelson Education.

Molina, I. and Rao, J. (2010). Small area estimation of poverty indicators. Canadian Journal of Statistics, 38(3):369-385.

Nagelkerke, N. J. (1991). A note on a general definition of the coefficient of determination. Biometrika, 78(3):691-692.

Patterson, H. D. and Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika, pages 545-554.

Pfeffermann, D. et al. (2013). New important developments in small area estimation. Statistical Science, 28(1):40-68.

Prasad, N. and Rao, J. N. (1990). The estimation of the mean squared error of small-area estimators. Journal of the American Statistical Association, 85(409):163-171.

Rao, J. N. and Molina, I. (2015). Small Area Estimation. John Wiley & Sons.

Schaible, W. L. (2013). Indirect Estimators in US Federal Programs, volume 108. Springer Science & Business Media.

Sigman, R. S., Hanuschak, G. A., Craig, M., Cook, P. W., and Cardenas, M. (1978). The use of regression estimation with Landsat and probability ground sample data. Survey Section Proceedings.

Sinha, S. K. (2004). Robust analysis of generalized linear mixed models. Journal of the American Statistical Association, 99(466):451-460.

Thompson, M. (1997). Theory of Sample Surveys, volume 74. CRC Press.

Tzavidis, N., Marchetti, S., and Chambers, R. (2010). Robust estimation of small-area means and quantiles. Australian & New Zealand Journal of Statistics, 52(2):167-186.

Tzavidis, N., Salvati, N., Pratesi, M., and Chambers, R. (2008). M-quantile models with application to poverty mapping. Statistical Methods & Applications, 17(3):393-411.

Valliant, R., Dorfman, A. H., and Royall, R. M. (2000). Finite Population Sampling and Inference: A Prediction Approach.

Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S. Springer, New York, fourth edition. ISBN 0-387-95457-0.
