Properties of Empirical and Adjusted Empirical Likelihoods

by

Yi Huang

B.Sc., University of Science and Technology of China, 2008

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF SCIENCE

in

The Faculty of Graduate Studies

(Statistics)

THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

August 2010

© Yi Huang 2010

Abstract

Likelihood-based statistical inference has been advocated by generations of statisticians. As an alternative to the traditional parametric likelihood, empirical likelihood (EL) is appealing for its nonparametric setting and desirable asymptotic properties.

In this thesis, we first review and investigate the asymptotic and finite-sample properties of the empirical likelihood, particularly its implications for constructing confidence regions for the population mean. We then study the properties of the adjusted empirical likelihood (AEL) proposed by Chen et al. (2008). The adjusted empirical likelihood was introduced to overcome the shortcomings of the empirical likelihood when it is applied to statistical models specified through general estimating equations. The adjusted empirical likelihood preserves the first-order asymptotic properties of the empirical likelihood, and its numerical problem is substantially simplified.

A major application of the empirical likelihood or adjusted empirical likelihood is the construction of confidence regions for the population mean. In addition, we discover that adjusted empirical likelihood, like empirical likelihood, has an important monotonicity property.

One major discovery of this thesis is that the adjusted empirical likelihood ratio statistic is always smaller than the empirical likelihood ratio statistic. It implies that the AEL-based confidence regions always contain the corresponding EL-based confidence regions and hence have higher coverage probability.
This result has been observed in many empirical studies, and we prove it rigorously.

We also find that the original adjusted empirical likelihood as specified by Chen et al. (2008) has a bounded likelihood ratio statistic. This may result in confidence regions of infinite size, particularly when the sample size is small. We further investigate approaches to modify the adjusted empirical likelihood so that the resulting confidence regions of the population mean are always bounded.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
2 Empirical Likelihood
  2.1 Parametric Likelihood
  2.2 Definition of Empirical Likelihood
  2.3 Profile Empirical Likelihood of the Population Mean
  2.4 Empirical Likelihood and General Estimating Equations
  2.5 Asymptotic Properties and EL-Based Confidence Regions
  2.6 Limitations of Empirical Likelihood
    2.6.1 Under-Coverage Problem
    2.6.2 The No-Solution Problem
3 Adjusted Empirical Likelihood
  3.1 Adjusted Empirical Likelihood and AEL-Based Confidence Regions
  3.2 Finite-Sample Properties of Adjusted Empirical Likelihood
    3.2.1 Monotonicity of $W_n^*(\mu; a_n)$ in $\mu$
    3.2.2 Monotonicity of $W_n^*(\mu; a_n)$ in $a_n$
    3.2.3 Boundedness of $W_n^*(\mu; a_n)$
4 Empirical Results
  4.1 Confidence Intervals for One-Dimensional Mean
  4.2 Confidence Regions for Two-Dimensional Mean
  4.3 Summary
5 Conclusion
Bibliography

List of Tables

4.1 Coverage rates for one-dimensional mean
4.2 Confidence interval lengths for one-dimensional mean
4.3 Coverage rates for two-dimensional mean
4.4 Confidence region areas for two-dimensional mean

List of Figures

2.1 A two-dimensional example of EL-based confidence region
3.1 A two-dimensional example showing the position of the pseudo point
3.2 An example showing the effect of the pseudo point on the likelihood ratio statistic
3.3 Plot of upper bound against sample size
3.4 Plot of required sample size for various critical values
3.5 The EL-based and AEL-based 95% confidence intervals for the population mean
3.6 $W_n^*(\mu; a_n)$ as a function of $a_n$
3.7 The effect of $a_n(\mu)$
3.8 The 95% approximate confidence regions produced by EL, AEL and modified AEL

Acknowledgements

Foremost, I would like to express my deep gratitude to my advisor, Professor Jiahua Chen. His inspiration, guidance, encouragement and insight helped me through the last two years.

I am grateful to Dr.
Matías Salibián-Barrera for serving on my examining committee and for his valuable comments and suggestions.

Finally, I would like to give my heartfelt appreciation and gratitude to my parents for their support and encouragement. This thesis is dedicated to them.

Chapter 1
Introduction

Likelihood-based statistical inference has been advocated by generations of statisticians. Let us illustrate the likelihood approach through the problem of modeling the randomness of wind speed, which is an important covariate in weather forecasting. In meteorology, the Weibull distribution with shape and scale parameters is used to model the distribution of wind speed (Corotis et al., 1978; Lun and Lam, 2000); that is, we postulate that the distribution of the wind speed is Weibull with two unspecified parameter values. Suppose we are given a set of observations of the wind speed, and it is reasonable to assume that they are a random sample from a Weibull distribution. Based on this assumption, we may calculate the probability of obtaining the observed data, which is a function of these two parameters. This function of the parameters is called the likelihood function. The likelihood function is an effective means of summarizing the information about the unknown parameter values contained in the data: (1) the values that maximize the likelihood function are often used as point estimates of the unknown parameters, and are called the maximum likelihood estimates (MLEs); (2) the likelihood function can be used to perform statistical tests of hypotheses on the parameters, and to construct a confidence region for the parameters.
These likelihood-based statistical inferences possess many optimality properties under regularity conditions: (1) the MLE is asymptotically efficient in many senses, and gives an intuitively best explanation of the data; (2) likelihood-based statistical tests and confidence intervals or confidence regions have good asymptotic and small-sample properties; (3) likelihood is convenient for combining information from several data sources, and for incorporating knowledge arising from outside the data, such as the domain of the parameter(s) and a prior distribution.

Traditionally, the likelihood is defined through a pre-specified parametric model. However, the choice of the parametric model can be a difficult issue in some applications. In the previous example, the Weibull distribution is widely used to characterize wind speed because it has been found to fit a wide collection of wind data in many empirical studies. If the true distribution of the wind speed cannot be fit well by a Weibull distribution, the optimality properties of the likelihood approach will be in question. In comparison, nonparametric methods for statistical inference do not require specific parametric assumptions on the shape of the population distribution. Among these methods, the empirical likelihood (EL) approach proposed by Owen (1988, 1990) has gained increasing popularity. This approach retains a likelihood setting without invoking a parametric assumption, and shares many desirable properties with the parametric likelihood.

In this thesis, we review and investigate the asymptotic and finite-sample properties of empirical likelihood. In Chapter 2, we first present a short summary of the properties of the parametric likelihood, followed by an introduction to the empirical likelihood. The profile empirical likelihood is then introduced for the population mean and for parameters defined through general estimating equations.
We also discuss the numerical algorithm for computing the empirical likelihood and some asymptotic properties of the empirical likelihood. One of the major successes of the empirical likelihood is the ease with which it yields approximate confidence regions for the parameter of interest. The EL-based confidence region possesses many advantages: it has a data-driven shape; it is invariant under parameter transformation; and it is range respecting.

On the other hand, the empirical likelihood method has a few shortcomings. The EL-based confidence regions often have lower than specified coverage probabilities, particularly when the sample size is small. This problem can be alleviated through Bartlett correction. However, the confidence region of the population mean, for instance, is confined within the convex hull of the data. In some cases, even the convex hull of the data does not have large enough coverage probability of the population mean. In addition, the empirical likelihood is not defined at certain parameter values, which may occur especially when the parameters are defined through general estimating equations. Consequently, the empirical likelihood approach may fail to make a sensible inference in a particular application.

To overcome these shortcomings of the EL approach, Chen et al. (2008) proposed an adjusted empirical likelihood (AEL). The adjusted empirical likelihood is well defined at all parameter values defined through estimating equations. It shares the same desirable first-order asymptotic properties with the empirical likelihood. Its numerical computation is much simpler and faster. In Chapter 3, we first introduce the adjusted empirical likelihood and present some of its properties. We further investigate some finite-sample properties of the adjusted empirical likelihood. One major discovery is that the adjusted empirical likelihood ratio statistic is always smaller than that of the empirical likelihood.
Consequently, the AEL-based confidence regions always contain the corresponding EL-based confidence regions. Thus, the adjusted empirical likelihood effectively rectifies the under-coverage problem suffered by the empirical likelihood when the sample size is not large. We also discover that the adjusted empirical likelihood has a monotonicity property. Because of this, the AEL-based confidence region for the population mean is star-shaped. This property also enables us to design a simple algorithm for computing confidence regions of a multivariate population mean. It leaves open the question of whether the AEL-based confidence region for the population mean is convex. In addition, we find that the original recipe of the adjusted empirical likelihood given by Chen et al. (2008) results in a bounded likelihood ratio statistic. As a result, the AEL-based confidence regions can be unbounded. However, this problem can be easily fixed. We propose one possible modification to the adjusted empirical likelihood so that the corresponding likelihood ratio statistic becomes unbounded.

In Chapter 4, we empirically examine how well the foregoing methods perform in statistical inference about the population mean. Several simulation settings are considered to investigate their finite-sample performance.

Chapter 2
Empirical Likelihood

Empirical likelihood (EL) is a nonparametric analogue of the classical parametric likelihood. The empirical likelihood method was first formalized in the pioneering works of Owen (1988, 1990) for statistical inference on the population mean.
Qin and Lawless (1994) generalize empirical likelihood to the case where parameters are defined through general estimating equations. We call this method "empirical" because the empirical distribution, which assigns equal point mass to each data point, plays a key role in the setting of this method.

The empirical likelihood method provides a versatile approach that may be applied to perform inference for a wide variety of parameters of interest, and has been employed in a number of different areas of statistics. Qin and Zhang (2007) apply the empirical likelihood method to make a constrained likelihood estimation of mean response in missing data problems. Chen et al. (2003) consider constructing EL-based confidence intervals for the mean of a population containing many zero values in survey sampling. Chen et al. (2002) design an EL-based algorithm to determine design weights in surveys that meet pre-specified range restrictions. Chen and Sitter (1999) develop a pseudo empirical likelihood approach to incorporating auxiliary information into estimates from complex surveys. Chen et al. (2009) examine the performance of EL-based confidence intervals for copulas. Chan and Ling (2006) develop an empirical likelihood ratio test for GARCH models in time series. Qin and Zhou (2005) propose an empirical likelihood approach for constructing confidence intervals for the area under the ROC curve. Nordman and Caragea (2008) present a spatial blockwise empirical likelihood method for estimating variogram model parameters in the analysis of spatial data on a grid. Many other applications can be found in the literature.

In Section 2.1, we briefly review parametric likelihood inference. Sections 2.2, 2.3 and 2.4 are devoted to summarizing the setting of empirical likelihood. In Section 2.5, some asymptotic properties of empirical likelihood are presented; we focus on results related to constructing approximate confidence regions for the parameter of interest.
We also present several finite-sample properties of the EL-based confidence region. Finally, in Section 2.6, we point out the limitations of the empirical likelihood method and the corresponding remedies, which lead to the subject of Chapter 3.

2.1 Parametric Likelihood

Let $\mathcal{F} = \{f(x;\theta) : \theta \in \Theta\}$ be a collection of probability density functions with respect to some $\sigma$-finite measure, where $\theta \in \mathbb{R}^p$ is a parameter that uniquely determines the form of the density and $\Theta$ is a $p$-dimensional set of possible values for $\theta$. Suppose a random sample $X = (X_1, X_2, \ldots, X_n)$ is generated from one distribution of this probability family. Given that $X = x$, the likelihood function of $\theta$ is defined as
\[ L_n(\theta \mid x) = \prod_{i=1}^{n} f(x_i;\theta). \]
The likelihood function is interpreted as the probability of obtaining the observed sample if the parameter value equals $\theta$. Hence, the likelihood function provides a way to measure the plausibility of different parameter values. If we compare the values of the likelihood function at two parameter values $\theta_1$ and $\theta_2$ and find that
\[ L_n(\theta_1 \mid x) > L_n(\theta_2 \mid x), \]
then the sample we observed is more likely to have occurred if $\theta = \theta_1$ than if $\theta = \theta_2$. That is, $\theta_1$ is a more plausible value of $\theta$ than $\theta_2$.

Take the wind speed example, where the two-parameter Weibull distribution is postulated. The probability density function of the two-parameter Weibull distribution is given by
\[ f(x; k, \lambda) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} \exp\left\{-\left(\frac{x}{\lambda}\right)^{k}\right\}, \quad x \ge 0, \]
where $k > 0$ is the shape parameter and $\lambda > 0$ is the scale parameter. In this example, we have $\theta = (k, \lambda)$ and $\Theta = (0,\infty) \times (0,\infty)$. If a collection of wind data is available in the form of $n$ independent observations $x = (x_1, x_2, \ldots, x_n)$, the likelihood function is
\[ L_n(k, \lambda \mid x) = \prod_{i=1}^{n} f(x_i; k, \lambda) = \left(\frac{k}{\lambda}\right)^{n} \prod_{i=1}^{n} \left(\frac{x_i}{\lambda}\right)^{k-1} \exp\left\{-\left(\frac{x_i}{\lambda}\right)^{k}\right\} = k^{n} \lambda^{-nk} \exp\left\{ -\frac{1}{\lambda^{k}} \sum_{i=1}^{n} x_i^{k} + (k-1) \sum_{i=1}^{n} \log x_i \right\}. \]
In this example, we may be interested in answering the following question: given the observed sample, what value of $\theta$ is the most plausible? Let $\hat\theta_n(x)$ be the global maximizer of $L_n(\theta \mid x)$:
\[ \hat\theta_n(x) = \arg\max_{\theta \in \Theta} L_n(\theta \mid x). \]
We call $\hat\theta_n(x)$ the maximum likelihood estimate of $\theta$.
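As a rough illustration of the wind speed example, the following Python sketch evaluates the Weibull log-likelihood derived above and locates its maximizer numerically. It is not taken from the thesis: the function names `weibull_loglik` and `weibull_mle`, the search bracket for $k$, and the sample values are illustrative assumptions. The sketch uses the fact that, for fixed $k$, the likelihood is maximized at $\lambda = (n^{-1}\sum_i x_i^k)^{1/k}$, and then solves the remaining one-dimensional score equation in $k$ by bisection.

```python
import numpy as np

def weibull_loglik(k, lam, x):
    """Log of L_n(k, lam | x) for the two-parameter Weibull density."""
    x = np.asarray(x, dtype=float)
    n = x.size
    return (n * np.log(k) - n * k * np.log(lam)
            - np.sum((x / lam) ** k) + (k - 1.0) * np.sum(np.log(x)))

def weibull_mle(x, lo=1e-3, hi=100.0, tol=1e-10):
    """Maximum likelihood estimates (k, lam) for positive data x.

    Assumes the shape estimate lies in [lo, hi]; this bracket is an
    illustrative choice, not something prescribed by the text.
    """
    x = np.asarray(x, dtype=float)
    logx = np.log(x)
    xs = x / x.max()  # rescaling leaves the weighted mean below unchanged

    def score(k):
        # Profile score in k: increasing in k, negative near 0.
        w = xs ** k
        return np.sum(w * logx) / np.sum(w) - 1.0 / k - logx.mean()

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    k = 0.5 * (lo + hi)
    lam = np.mean(x ** k) ** (1.0 / k)  # profile maximizer of the scale
    return k, lam

# Hypothetical wind speeds (illustrative numbers only).
speeds = [3.1, 4.7, 5.2, 6.8, 7.4, 9.0]
k_hat, lam_hat = weibull_mle(speeds)
print(k_hat, lam_hat, weibull_loglik(k_hat, lam_hat, speeds))
```

Bisection is used here only because the profile score is monotone in $k$; any general-purpose optimizer applied to `weibull_loglik` would serve the same purpose.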
As a function of the random sample $X$, $\hat\theta_n = \hat\theta_n(X)$ is called the maximum likelihood estimator (MLE) of $\theta$. Intuitively, the MLE is a reasonable choice for a point estimator: the observed sample is most likely when the parameter equals the MLE.

The MLE possesses two important properties by its construction. First, the MLE is range respecting: the range of the MLE coincides with the range of the parameter. Second, the MLE is invariant under parameter transformation. Suppose a distribution family is indexed by a parameter $\theta$, but the interest lies in finding an estimator of some function of $\theta$, say $\tau(\theta)$. If $\hat\theta_n$ is the MLE of $\theta$, then $\tau(\hat\theta_n)$ is the MLE of $\tau(\theta)$. The second property allows us to study a parameter that does not appear in the density function. In the wind speed example, we may be interested in the mean wind speed, say $\mu$. Note that $\mu$ can be expressed as a function of $k$ and $\lambda$: $\mu = \lambda\,\Gamma(1 + k^{-1})$, where $\Gamma(\cdot)$ is the gamma function. Hence, if $\hat k$ and $\hat\lambda$ are the MLEs of $k$ and $\lambda$ respectively, then $\hat\mu = \hat\lambda\,\Gamma(1 + \hat k^{-1})$ is the MLE of $\mu$.

The MLE possesses many nice asymptotic properties under some mild conditions on $f(x;\theta)$. First, the MLE is a consistent estimator of the parameter; that is, the MLE converges to the true parameter value almost surely as the sample size increases. Second, the MLE is asymptotically efficient in the sense that its asymptotic variance attains the Cramér-Rao bound as the sample size tends to infinity.

In applications, we may prefer a region of plausible parameter values to a single guess. We can imagine that parameter values that are only slightly "different" from the MLE are also good candidates for the true parameter value. The likelihood function can be used to quantify the "difference" between any parameter value and the MLE. By the definition of the MLE, the likelihood ratio $R_n(\theta \mid x) = L_n(\theta \mid x)/L_n(\hat\theta_n \mid x)$ never exceeds 1.
Thus, we may choose some constant $c \in (0,1)$ and claim that the true parameter value is likely contained in the following region of parameter values:
\[ CR_c = \{\theta : R_n(\theta \mid x) \ge c\}. \tag{2.1} \]
The purpose of using a region estimator rather than a point estimator is to have some guarantee of capturing the true value of the parameter. The certainty of this guarantee is quantified by the probability that $CR_c$ covers the true parameter value, $\Pr\{\theta \in CR_c\} = \Pr\{R_n(\theta \mid X) \ge c\}$. With this in mind, we may choose the constant $c$ such that $CR_c$ has a pre-specified coverage probability. The guaranteed coverage probability is also called the confidence level of $CR_c$. Thus, we need to know the distribution of $R_n(\theta \mid X)$.

In general, the exact distribution of $R_n(\theta \mid X)$ is hard to determine. Wilks (1938) proves that, under some mild conditions, $-2\log R_n(\theta \mid X)$ converges to $\chi^2_p$ in distribution as $n \to \infty$, provided that the true parameter value is $\theta$. This asymptotic result is known as Wilks' theorem.

Using this $\chi^2$ approximation, we may choose $c$ in equation (2.1) to be $\exp\{-\chi^2_p(\alpha)/2\}$, where $\chi^2_p(\alpha)$ denotes the upper $\alpha$ quantile of $\chi^2_p$, for small $\alpha$. The resulting approximate $100(1-\alpha)\%$ confidence region for $\theta$ is
\[ CR = \{\theta : R_n(\theta \mid x) \ge \exp\{-\chi^2_p(\alpha)/2\}\} = \{\theta : -2\log R_n(\theta \mid x) \le \chi^2_p(\alpha)\}. \]
Similar to the MLE, the foregoing confidence region is also range respecting and, to some degree, invariant under parameter transformation. In the wind speed example, if $CR$ is an approximate 95% confidence region for the parameter $(k, \lambda)$, then $CR' = \{\lambda\,\Gamma(1 + k^{-1}) : (k, \lambda) \in CR\}$ is a confidence region for the mean $\mu$ with approximate coverage of at least 95%.

As is widely recognized, statistical inference based on parametric likelihood carries its own risk: if the true distribution deviates from the parametric distribution that we assume for the data, the foregoing nice properties of these inferences on the parameter of interest may be lost.

The difficulties in choosing a parametric family make many statisticians turn to nonparametric methods for statistical inferences.
These nonparametric methods include the jackknife, the infinitesimal jackknife, the bootstrap method, and the empirical likelihood method. Each nonparametric method has its own advantages, but most of them lack a likelihood setting. The empirical likelihood method stands out since it combines the reliability of the nonparametric methods with the flexibility and effectiveness of the likelihood approach.

2.2 Definition of Empirical Likelihood

In this section, we present the setting of empirical likelihood.

Suppose $X_1, X_2, \ldots, X_n$ are independent and identically distributed (i.i.d.) $d$-dimensional random vectors with unknown distribution $F_0$ for some $d \ge 1$. The empirical likelihood of any distribution $F$ is defined as
\[ L_n(F) = \prod_{i=1}^{n} F(\{X_i\}), \]
where $F(A)$ is $\Pr(X \in A)$ for $X \sim F$ and $A \subseteq \mathbb{R}^d$.

The definition of empirical likelihood is a direct analogue of parametric likelihood: it is the probability of observing the sample under the assumed distribution. The major difference between empirical likelihood and parametric likelihood is that the former is defined over a very broad range of distributions. That is, there are practically no restrictions on the shape of the distribution under consideration. The name "empirical likelihood" is adopted because the empirical distribution of the sample plays a key role in the setting of empirical likelihood. The empirical distribution is defined as
\[ F_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i}, \]
where $\delta_x$ denotes the distribution under which $\Pr(X = x) = 1$. The empirical likelihood is maximized at the empirical distribution.

Proposition 2.1. Suppose $X_1, X_2, \ldots, X_n \in \mathbb{R}^d$ for some $d \ge 1$ are independent random vectors with a common distribution $F_0$, and $F_n$ is the corresponding empirical distribution. For any distribution $F \ne F_n$, we have $L_n(F) < L_n(F_n)$.

Proof. Let $p_i = F(\{X_i\})$ for $i = 1, 2, \ldots, n$. It is easy to see that $p_i \ge 0$ and $\sum_{i=1}^{n} p_i \le 1$.
Using the well-known fact that the arithmetic mean of a sequence of nonnegative numbers is always larger than or equal to its geometric mean, we have
\[ L_n(F) = \prod_{i=1}^{n} p_i \le \left( \frac{1}{n} \sum_{i=1}^{n} p_i \right)^{n} \le n^{-n}. \tag{2.2} \]
Both inequalities in (2.2) hold with equality if and only if all the $p_i$ are equal and $\sum_{i=1}^{n} p_i = 1$. This implies that $L_n(F)$ attains its maximum $n^{-n}$ at $F = F_n$.

By analogy with the definition of the MLE under a parametric model, we say that the empirical distribution $F_n$ is the maximum empirical likelihood estimate (MELE) of the distribution $F$. In this spirit, the MELE of the population mean $\mu = \int x\,dF(x)$ is $\hat\mu_n = \int x\,dF_n(x) = \sum_{i=1}^{n} X_i/n = \bar X_n$, which is the sample mean.

The properties of $\bar X_n$ under some mild conditions have already been well studied: $\bar X_n$ is an unbiased and consistent estimator of $\mu$; it has the smallest asymptotic variance among all the unbiased estimators of $\mu$; it is asymptotically normally distributed; and so on. In this thesis, we mainly focus on the problem of constructing confidence regions for $\mu$ through the empirical likelihood. For this purpose, we introduce the profile empirical likelihood in the next section.

2.3 Profile Empirical Likelihood of the Population Mean

Let $X_1, X_2, \ldots, X_n$ be i.i.d. $d$-dimensional random vectors with unknown distribution $F$.

By analogy with Wilks' theorem, we may also use the ratio of the empirical likelihood as a basis for constructing confidence regions. The empirical likelihood ratio for a distribution $F$ is defined as
\[ R_n(F) = \frac{L_n(F)}{L_n(F_n)}. \]
By Proposition 2.1, $R_n(F) \le 1$ and the equality holds if and only if $F = F_n$. Recall that the population mean $\mu$ is a functional of the population distribution. The plausibility of a specific value of $\mu$ can be inferred from this relationship. In the literature of empirical likelihood, we define the profile
empirical likelihood of $\mu$ as
\[ L_n(\mu) = \sup\left\{ L_n(F) : \int x\,dF(x) = \mu \right\}. \tag{2.3} \]
By analogy with the parametric likelihood ratio, we may define the profile empirical likelihood ratio function of $\mu$ as
\[ R_n(\mu) = \frac{L_n(\mu)}{L_n(\bar X_n)} = \sup\left\{ R_n(F) : \int x\,dF(x) = \mu \right\}. \tag{2.4} \]
Yet unless the support of $F$ is required to be confined to the set of observed values of $X$, this profile empirical likelihood ratio for the population mean always equals 1. We illustrate this point as follows. For any given $\mu$, let $\varepsilon$ be a positive constant smaller than 1 and
\[ x_\varepsilon = \frac{1}{\varepsilon}\,\mu - \frac{1-\varepsilon}{\varepsilon}\,\bar X_n. \]
We construct a mixture distribution $F_{\mu,\varepsilon} = (1-\varepsilon) F_n + \varepsilon\,\delta_{x_\varepsilon}$. Note that the mean of $F_{\mu,\varepsilon}$ is $(1-\varepsilon)\bar X_n + \varepsilon x_\varepsilon = \mu$ and
\[ R_n(F_{\mu,\varepsilon}) = \frac{L_n(F_{\mu,\varepsilon})}{L_n(F_n)} = \frac{[(1-\varepsilon)/n]^{n}}{(1/n)^{n}} = (1-\varepsilon)^{n}. \]
Hence, for any pre-specified value of $\mu$, $R_n(F_{\mu,\varepsilon})$ can be made arbitrarily close to 1 as long as $\varepsilon$ is sufficiently small. As a result, $R_n(\mu)$ defined by equation (2.4) always equals 1 for any $\mu$. Hence, this definition of the profile likelihood ratio function is not useful for constructing confidence regions.

The above problem can be easily solved by requiring the support of $F$ to be contained in the set of observed values of $X$. As proposed by Owen (1988), the empirical likelihood ratio is profiled for a parameter over only the distributions with support on the data set. In other words, only distributions such that $p_i = F(\{X_i\}) > 0$ and $\sum_{i=1}^{n} p_i = 1$ are considered. We denote such a distribution $F$ as $F \ll F_n$. The definition of
the profile empirical likelihood for $\mu$ in the literature is given by
\[ L_n(\mu) = \sup\left\{ L_n(F) : F \ll F_n,\ \int x\,dF(x) = \mu \right\} = \sup\left\{ \prod_{i=1}^{n} p_i : p_i > 0,\ \sum_{i=1}^{n} p_i = 1,\ \sum_{i=1}^{n} p_i X_i = \mu \right\}. \tag{2.5} \]
Without further clarification, we refer to "profile empirical likelihood" as "empirical likelihood" from now on.

Under this definition of $L_n(\mu)$, the sample mean $\bar X_n$ is its maximum point. We naturally define the profile empirical likelihood ratio for $\mu$ as
\[ R_n(\mu) = \frac{L_n(\mu)}{L_n(\bar X_n)} = \sup\left\{ \frac{L_n(F)}{L_n(F_n)} : F \ll F_n,\ \int x\,dF(x) = \mu \right\} = \sup\left\{ \prod_{i=1}^{n} n p_i : p_i > 0,\ \sum_{i=1}^{n} p_i = 1,\ \sum_{i=1}^{n} p_i X_i = \mu \right\}. \tag{2.6} \]
For the convenience of discussing asymptotic properties, we prefer working with the profile empirical likelihood ratio statistic defined as
\[ W_n(\mu) = -2\log R_n(\mu). \]
Because $R_n(\mu)$ is the maximum value of $\prod_{i=1}^{n} n p_i$ subject to some constraints, $W_n(\mu)$ is the minimum value of $-2\sum_{i=1}^{n} \log(n p_i)$ subject to the same constraints. We will refer to a set of weights $\{p_i\}_{i=1}^{n}$ that satisfy these constraints as sub-optimal weights for $W_n(\mu)$.

The constraint on the mean may also be written as
\[ \sum_{i=1}^{n} p_i (X_i - \mu) = 0. \tag{2.7} \]
The calculation of $R_n(\mu)$ and $W_n(\mu)$ at a given parameter value amounts to solving a constrained optimization problem. Lagrange's method is well suited to this situation. Take $W_n(\mu)$ as an example. Minimizing $-2\sum_{i=1}^{n}\log(n p_i)$ is equivalent to maximizing $\sum_{i=1}^{n}\log(n p_i)$, so let us define
\[ H(p_1, p_2, \ldots, p_n; \lambda, \gamma) = \sum_{i=1}^{n} \log(n p_i) - n\lambda^{T} \left[ \sum_{i=1}^{n} p_i (X_i - \mu) \right] + \gamma \left( \sum_{i=1}^{n} p_i - 1 \right) \]
with $\lambda \in \mathbb{R}^d$ and $\gamma \in \mathbb{R}$ being the Lagrange multipliers.

Setting the derivatives of $H$ with respect to $\lambda$ and $\gamma$ to zero, we recover the two equality constraints on the $p_i$. Differentiating $H$ with respect to $p_i$ and setting the derivatives equal to zero, we get
\[ 0 = \frac{\partial H}{\partial p_i} = \frac{1}{p_i} - n\lambda^{T}(X_i - \mu) + \gamma. \tag{2.8} \]
Multiplying the above equation by $p_i$ and summing over $i$, with the help of the two constraints, we get
\[ 0 = \sum_{i=1}^{n} p_i \frac{\partial H}{\partial p_i} = n + \gamma. \]
It gives us $\gamma = -n$.
Substituting this result into equation (2.8) gives the optimal weights
\[ p_i = \frac{1}{n} \cdot \frac{1}{1 + \lambda^{T}(X_i - \mu)}, \quad i = 1, \ldots, n. \tag{2.9} \]
The value of $\lambda$ can be computed through the constraint
\[ \sum_{i=1}^{n} p_i (X_i - \mu) = \sum_{i=1}^{n} \frac{1}{n} \cdot \frac{X_i - \mu}{1 + \lambda^{T}(X_i - \mu)} = 0. \]
Equivalently, we have
\[ \sum_{i=1}^{n} \frac{X_i - \mu}{1 + \lambda^{T}(X_i - \mu)} = 0. \tag{2.10} \]
The above equation can be easily solved numerically. From now on, we will refer to the weights given by equation (2.9) as the optimal weights for $W_n(\mu)$.

Once the value of $\lambda$ is obtained, we can compute $W_n(\mu)$ through
\[ W_n(\mu) = 2 \sum_{i=1}^{n} \log[1 + \lambda^{T}(X_i - \mu)]. \tag{2.11} \]
The primal constrained optimization problem for $W_n(\mu)$ must work on the $n$ variables $p_1, p_2, \ldots, p_n$. Equation (2.11) shows that $W_n(\mu)$ has a simple analytic expression, and the constrained optimization problem is reduced to finding an appropriate root of equation (2.10), which involves only the $d$ components of $\lambda$. This simple expression of $W_n(\mu)$ has two advantages. First, it provides a feasible approach to calculating $W_n(\mu)$ numerically. Chen et al. (2002) propose a modified Newton's algorithm for finding the root of equation (2.10), whose algorithmic convergence is guaranteed when the solution exists. Second, this expression helps us study the asymptotic behavior of $W_n(\mu)$. In the investigation of the asymptotic properties of $W_n(\mu)$, the properties of $\lambda$ play a key role.

2.4 Profile Empirical Likelihood for Parameters Defined Through General Estimating Equations

In addition to inference on population means, empirical likelihood finds many applications to parameters defined in a nonparametric way. For instance, Owen (1991) applies empirical likelihood to make inference on the regression coefficients in linear models. In general, we can often define parameters of interest through the so-called "general estimating equations". For a random variable $X \sim F$, a $p$-dimensional parameter $\theta$ can be defined as the solution to
\[ E_F[g(X;\theta)] = 0 \tag{2.12} \]
for some $q$-dimensional mapping $g(X;\theta)$ with $q \ge p$.
The system (2.12) is known as the general estimating equation (GEE), and $g(x;\theta)$ is called the estimating function. When $g(x;\theta) = x - \theta$, the parameter $\theta$ is the mean of $X$. When $g(x;\theta) = I(x \le \theta) - \alpha$ for some $\alpha \in (0,1)$, $\theta$ is the $\alpha$ quantile of $X$.

The classic setting of GEE has $q = p$. Given a simple random sample $X_1, X_2, \ldots, X_n$, an estimator of $\theta$, say $\hat\theta$, can be obtained as the solution to
\[ E_{F_n}[g(X;\theta)] = \frac{1}{n} \sum_{i=1}^{n} g(X_i;\theta) = 0. \tag{2.13} \]
Since $F_n$ is the MELE of $F$, this estimator is the MELE of $\theta$.

In econometrics applications, however, most interest attaches to the case of over-identification with $q > p$ (Imbens, 2002; Hansen, 1982; Hall, 2005). In this case, equation (2.13) may not have any solution.

Empirical likelihood provides a natural approach to overcome this problem. Qin and Lawless (1994) develop a theory for EL-based statistical inference for parameters defined through general estimating functions. They propose to define the profile empirical likelihood for $\theta$ as
\[ L_n(\theta) = \max\left\{ \prod_{i=1}^{n} p_i : p_i > 0,\ \sum_{i=1}^{n} p_i = 1,\ \sum_{i=1}^{n} p_i\, g(X_i;\theta) = 0 \right\}. \]
The corresponding profile empirical likelihood ratio for $\theta$ becomes
\[ R_n(\theta) = \frac{L_n(\theta)}{L_n(F_n)} = \max\left\{ \prod_{i=1}^{n} n p_i : p_i > 0,\ \sum_{i=1}^{n} p_i = 1,\ \sum_{i=1}^{n} p_i\, g(X_i;\theta) = 0 \right\}. \tag{2.14} \]
The likelihood ratio statistic is then given by
\[ W_n(\theta) = -2\log R_n(\theta). \]
Similar to the case of the population mean discussed in Section 2.3, $W_n(\theta)$ can be written as
\[ W_n(\theta) = 2 \sum_{i=1}^{n} \log[1 + \lambda^{T} g(X_i;\theta)] \tag{2.15} \]
with $\lambda$ being the solution to
\[ \sum_{i=1}^{n} \frac{g(X_i;\theta)}{1 + \lambda^{T} g(X_i;\theta)} = 0. \tag{2.16} \]
In the framework of general estimating equations, the MELE of $\theta$, which is defined as the maximum point of $L_n(\theta)$, is not as trivial as that for the population mean. One of the main contributions of Qin and Lawless (1994) is that they demonstrate the asymptotic normality of the MELE of $\theta$ under some regularity conditions on the estimating function.
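To make equations (2.15) and (2.16) concrete, here is a minimal Python sketch that solves (2.16) for $\lambda$ and then evaluates (2.15). It is an illustration under stated assumptions, not the modified Newton algorithm of Chen et al. (2002) and not code from the thesis: the function name `el_ratio_statistic` is hypothetical, the solver is a plain Newton ascent with step halving on the concave function $\lambda \mapsto \sum_i \log[1 + \lambda^{T} g(X_i;\theta)]$ (whose stationary point is the root of (2.16)), and it assumes that 0 lies in the interior of the convex hull of the $g(X_i;\theta)$, so that a solution exists.

```python
import numpy as np

def el_ratio_statistic(G, tol=1e-10, max_iter=100):
    """Return (W_n, lam) for an (n, q) array G whose i-th row is g(X_i; theta).

    Assumes 0 is interior to the convex hull of the rows of G; otherwise
    (2.16) has no solution and the returned values are not meaningful.
    """
    G = np.asarray(G, dtype=float)
    lam = np.zeros(G.shape[1])
    f = 0.0  # current value of sum(log(1 + lam' g_i))
    for _ in range(max_iter):
        Gs = G / (1.0 + G @ lam)[:, None]
        grad = Gs.sum(axis=0)                    # left-hand side of (2.16)
        if np.linalg.norm(grad) < tol:
            break
        step = np.linalg.solve(Gs.T @ Gs, grad)  # Newton ascent direction
        t = 1.0
        while t > 1e-12:
            # Halve the step until all weights stay positive and f does not drop.
            d = 1.0 + G @ (lam + t * step)
            if np.all(d > 0) and np.log(d).sum() >= f:
                break
            t /= 2.0
        else:
            break  # no acceptable step found; keep the current iterate
        lam, f = lam + t * step, np.log(d).sum()
    return 2.0 * f, lam                          # equation (2.15)

# Mean of a small sample: g(x; mu) = x - mu.
x = np.array([1.0, 2.0, 3.0, 4.0])
W_mean, _ = el_ratio_statistic((x - 2.0)[:, None])
# Median of the same sample: g(x; theta) = I(x <= theta) - 0.5.
W_median, _ = el_ratio_statistic(((x <= 1.5) - 0.5)[:, None])
print(W_mean, W_median)
```

For the population mean, the rows of `G` are $X_i - \mu$, which recovers equations (2.10) and (2.11); other estimating functions only change how `G` is built.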
In addition, Qin and Lawless (1994) justify the use of the empirical likelihood ratio statistic for testing or obtaining confidence regions for parameters in a way completely analogous to the parametric likelihood approach. The main interest of this thesis, however, lies in statistical inference for the population mean, so we will not explore this topic further here.

2.5 Asymptotic Properties of Empirical Likelihood and EL-Based Confidence Regions

The most impressive result in Owen (1988, 1990) is the following asymptotic limiting distribution of $W_n(\mu)$.

Theorem 2.2. Let $X_1, X_2, \ldots, X_n$ be a simple random sample from some $d$-dimensional population $X$ and let $W_n(\mu)$ be the empirical likelihood ratio statistic for the population mean $\mu$. If the variance-covariance matrix of $X$ is positive definite and the true value of $\mu$ is $\mu_0$, then
\[ W_n(\mu_0) \xrightarrow{d} \chi^2_d, \quad \text{as } n \to \infty. \tag{2.17} \]

Theorem 2.2 suggests an approximate $100(1-\alpha)\%$ confidence region for $\mu$ in the form of
\[ CR_\alpha = \{\mu \mid W_n(\mu) \le \chi^2_d(\alpha)\}, \]
where $\chi^2_d(\alpha)$ is the upper $\alpha$ quantile of $\chi^2_d$. Hall and La Scala (1990) and Owen (2001) point out that the confidence region $CR_\alpha$ is always convex. This is clearly a nice property for practitioners. We summarize their result as Proposition 2.3 with a simple proof.

Proposition 2.3. Let $X_1, X_2, \ldots, X_n$ be a random sample from some population $X$ and let $W_n(\mu)$ be the empirical likelihood ratio statistic for the population mean. Suppose $\mu_1 \ne \mu_2$ and $\mu_1, \mu_2 \in CR_\alpha$, and $\mu$ is a convex combination of $\mu_1$ and $\mu_2$. Then $\mu \in CR_\alpha$.

Proof. Let $\{p_i\}_{i=1}^{n}$ and $\{q_i\}_{i=1}^{n}$ be the optimal weights for $W_n(\mu_1)$ and $W_n(\mu_2)$, respectively. For any $\mu$ such that $\mu = \rho\mu_1 + (1-\rho)\mu_2$ for some $0 \le \rho \le 1$, it is easy to verify that $\{r_i = \rho p_i + (1-\rho) q_i\}_{i=1}^{n}$ are sub-optimal weights for $L_n(\mu)$. Note also that $\rho p_i + (1-\rho) q_i \ge p_i^{\rho} q_i^{1-\rho}$ for $i = 1, 2, \ldots, n$. Hence, we have
\[ L_n(\mu) \ge \prod_{i=1}^{n} r_i = \prod_{i=1}^{n} [\rho p_i + (1-\rho) q_i] \ge \prod_{i=1}^{n} p_i^{\rho} q_i^{1-\rho} = [L_n(\mu_1)]^{\rho} [L_n(\mu_2)]^{1-\rho}. \]
It follows that
\[ W_n(\mu) \le \rho W_n(\mu_1) + (1-\rho) W_n(\mu_2) \le \rho\chi^2_d(\alpha) + (1-\rho)\chi^2_d(\alpha) = \chi^2_d(\alpha). \]
By the definition of $CR_\alpha$, we conclude that $\mu \in CR_\alpha$.
This completes the proof.

Following this proposition, we can see that $W_n(\mu)$ has a monotonicity property.

Corollary 2.4. Assume the same conditions as in Proposition 2.3. Let $v$ be a $d$-dimensional unit vector and consider the half line defined by $\bar X_n + tv$ for $t > 0$. Then $W_n(\bar X_n + tv)$ is an increasing function of $t$.

Proof. For any $0 < t_1 < t_2$, let $\mu_1 = \bar X_n + t_1 v$ and $\mu_2 = \bar X_n + t_2 v$. Note that $\mu_1$ is a convex combination of $\mu_2$ and $\bar X_n$. Consider a confidence region for $\mu$, $CR = \{\mu : W_n(\mu) \le W_n(\mu_2)\}$. Note that $\mu_2$ and $\bar X_n$ always fall inside this region. By Proposition 2.3, $\mu_1$ also belongs to $CR$ and thus $W_n(\mu_1) \le W_n(\mu_2)$.

Corollary 2.4 justifies a simple algorithm for numerically finding the boundary of the confidence region. We briefly describe this algorithm in the case of the bivariate mean:

1. Choose a sufficiently dense sequence of angles from $0$ to $2\pi$, for example, an arithmetic sequence from $0$ to $2\pi$ with common difference $2\pi/M$ for some sufficiently large positive integer $M$.

2. Along each direction defined by a unit vector $v_m = (\cos\theta_m, \sin\theta_m)^T$, with $\theta_m$ being an angle selected in Step 1, search for a positive real number $t_m$ such that $W_n(\bar X_n + t_m v_m)$ is sufficiently close to the critical value determined by the $\chi^2$ approximation. Since $W_n(\mu)$ is increasing along any direction starting from $\bar X_n$, as asserted in Corollary 2.4, a simple bisection algorithm is effective.

3. Let $\{\bar X_n + t_m v_m,\ m = 1, 2, \ldots, M\}$ be the boundary points obtained in Step 2.

With this set of points, we can not only visualize the confidence region through a two-dimensional plot, but also calculate its approximate area. The approximation improves as $M$ gets larger, but the exact accuracy is difficult to determine.

EL-based confidence regions have many celebrated properties. We summarize some as follows:

1. EL-based confidence region has a data-driven shape.
Figure 2.1 shows the boundary of the EL-based 95% confidence region for the bivariate mean based on a data set of 10 observations generated from a bivariate gamma distribution. It is seen that the shape of the confidence region based on the widely-used normal approximation is pre-determined even before the data are available. On the contrary, the EL-based confidence region automatically reflects the emphasis on the data set. It is an appealing property to many practitioners since it upholds the principle of "letting the data speak."

[Figure 2.1: A two-dimensional example of EL-based confidence region. The plot shows the data points, the sample mean $\bar X_n$, and the boundaries of the EL-based and Hotelling's $T^2$ confidence regions.]

2. EL-based confidence region is range respecting and transformation invariant. For example, the confidence interval for the correlation always lies between $-1$ and $1$.

3. The EL-based confidence region is Bartlett-correctable. In both parametric likelihood based and EL-based confidence regions, we select the critical value using the limiting distribution of the likelihood ratio statistic. Such approximations introduce error to the coverage accuracy of the resulting confidence regions. The actual coverage probability of the confidence region does not exactly agree with the nominal level. In the parametric setting, the coverage accuracy can be improved by the so-called Bartlett correction on the likelihood ratio statistic (Barndorff-Nielsen and Cox, 1984). As shown by Diciccio et al. (1991), the empirical likelihood ratio statistic is also Bartlett correctable.
We will discuss it further in Section 2.6.1.

2.6 Limitations of Empirical Likelihood

While empirical likelihood has many nice properties as shown in Section 2.5, there are situations where its application meets practical obstacles. In this section, we discuss two related issues which lead to the adjusted empirical likelihood in Chapter 3.

2.6.1 Under-Coverage Problem

Theorem 2.2 suggests using the limiting distribution of $W_n(\mu)$ to calibrate the EL-based confidence region for $\mu$. As expected, the coverage probability of the resulting confidence region does not exactly match the pre-specified confidence level. Diciccio et al. (1991) show that
\[
\Pr\{\mu_0 \in CR_\alpha\} = \Pr\{W_n(\mu_0) \le \chi^2_d(\alpha)\} = 1 - \alpha + O(n^{-1}). \tag{2.18}
\]
Simulation results reveal that EL-based confidence regions suffer from the so-called "under-coverage" problem. That is, their coverage probability is lower than the nominal level, particularly when the sample size is small or the population is skewed. Diciccio et al. (1991) prove that the empirical likelihood is Bartlett correctable; a simple correction on $W_n(\mu_0)$ can improve the approximating precision given in equation (2.18) from $O(n^{-1})$ to $O(n^{-2})$. Empirical studies reveal that the Bartlett correction significantly improves the coverage rate of the EL-based confidence regions.

The error in the approximation is partly attributable to the fact that the expectation of $W_n(\mu_0)$ does not match the expectation of the corresponding limiting distribution. Thus, the coverage accuracy may be improved by rescaling $W_n(\mu)$. Asymptotically, it is found that
\[
E[W_n(\mu_0)] = d\left\{1 + \frac{b}{n} + O(n^{-2})\right\}
\]
with $b$ being a constant depending on the first four moments of the population $X$. By applying the $\chi^2$ approximation to $W_n(\mu)/(1 + b n^{-1})$, the resulting confidence region has higher precision; the coverage error is reduced from order $n^{-1}$ to order $n^{-2}$. More precisely,
\[
\Pr\left\{\frac{W_n(\mu_0)}{1 + b/n} \le \chi^2_d(\alpha)\right\} = \Pr\left\{W_n(\mu_0) \le \chi^2_d(\alpha)\left(1 + \frac{b}{n}\right)\right\} = 1 - \alpha + O(n^{-2}). \tag{2.19}
\]
The asymptotic derivation of equation (2.19) is long and complex.
The details can be found in Diciccio et al. (1991).

The constant $b$ in equation (2.19) is called the Bartlett correction factor. The value of $b$ depends on the first four moments of the population distribution, so in applications it must be estimated from the data. Replacing $b$ by a $\sqrt{n}$-consistent estimator in equation (2.19) does not affect the theoretical result.

The Bartlett correction is also applicable to EL-based confidence regions for parameters defined through the general estimating equation (2.12). The Bartlett correction factor $b$ is determined by the distribution of $g(X;\theta)$. Liu and Chen (2010) provide a detailed discussion on how to calculate the Bartlett correction factor in the framework of general estimating equations.

2.6.2 The No-Solution Problem

As described in Section 2.4, $W_n(\theta)$ equals the minimum value of $-2\sum_{i=1}^{n}\log(n p_i)$ over all sub-optimal weights $\{p_i\}_{i=1}^n$. Hence $W_n(\theta)$ is well defined if and only if there exists at least one set of sub-optimal weights. Let $CH\{\cdot\}$ be the convex hull spanned by the set of points inside $\{\cdot\}$. Then, $W_n(\theta)$ is well defined when
\[
0 \in CH\{g(X_i;\theta),\ i = 1, 2, \ldots, n\}. \tag{2.20}
\]
We take the population mean as an example. Condition (2.20) is satisfied if and only if $0 \in CH\{X_i - \mu,\ i = 1, 2, \ldots, n\}$. Equivalently, we must have $\mu \in CH\{X_i,\ i = 1, 2, \ldots, n\}$. When $\mu$ is one-dimensional, condition (2.20) can be further simplified to $X_{(1)} < \mu < X_{(n)}$. Let $\Theta_0$ be the set of $\theta$ over which condition (2.20) is satisfied. We can see that $\Theta_0$ is determined by the data. For complex estimating equations, it can be hard to specify the structure of $\Theta_0$.

Owen (2001) proves that the true parameter value $\theta_0$, defined as the unique solution to equation (2.12), is contained in $\Theta_0$ almost surely as $n \to \infty$ under some regularity conditions on the estimating function. When $\theta$ is not close to $\theta_0$ or when the sample size is small, it is quite possible that $\theta \notin \Theta_0$ and thus equation (2.16) is not solvable. When $\theta \notin \Theta_0$, it is conventional to define $W_n(\theta) = \infty$.
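To make the convention concrete, the following is a minimal sketch for the univariate mean, not the implementation used in this thesis. It evaluates $W_n(\mu)$ through equations (2.15) and (2.16) and returns $\infty$ when $\mu$ lies outside $(X_{(1)}, X_{(n)})$. The function name `el_stat_mean` and the use of plain bisection to solve (2.16) are illustrative assumptions.

```python
import math


def el_stat_mean(x, mu):
    """Sketch of W_n(mu) for a univariate mean (hypothetical helper).

    Returns math.inf when mu is outside (X_(1), X_(n)), i.e. when
    condition (2.20) fails and equation (2.16) has no solution.
    """
    g = [xi - mu for xi in x]          # estimating function g(x; mu) = x - mu
    lo_g, hi_g = min(g), max(g)
    if not (lo_g < 0.0 < hi_g):
        return math.inf                # conventional value outside the hull

    # All weights are positive only if 1 + lam * g_i > 0 for every i,
    # which restricts lam to the open interval (-1/hi_g, -1/lo_g).
    lo, hi = -1.0 / hi_g, -1.0 / lo_g

    def score(lam):
        # Left-hand side of (2.16); strictly decreasing in lam on (lo, hi).
        return sum(gi / (1.0 + lam * gi) for gi in g)

    for _ in range(200):               # plain bisection on the score
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * gi) for gi in g)   # equation (2.15)
```

For a sample such as `[1, 2, 3, 4, 5]`, the statistic is zero at the sample mean, grows as $\mu$ moves towards either extreme observation, and equals `math.inf` at or beyond them.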
However, this setting has its own limitations. Firstly, for any two different parameter values $\theta_1, \theta_2 \notin \Theta_0$, we are unable to evaluate their relative plausibility based on $W_n(\theta)$. Secondly, using this setting implies that the confidence region is always a subset of $\Theta_0$, which is determined by the data. This can be a problem when even $\Theta_0$ itself does not achieve the desired confidence level, especially when the sample size is small.

Aiming to solve the no-solution problem of empirical likelihood, Chen et al. (2008) propose an adjustment to the original empirical likelihood such that the resulting adjusted empirical likelihood (AEL) is well defined for all possible parameter values. Chapter 3 is devoted to summarizing the well-studied asymptotic properties of the adjusted empirical likelihood, and to investigating the finite-sample properties of AEL-based confidence regions, mainly in the case of the population mean.

Chapter 3

Adjusted Empirical Likelihood

To overcome the obstacle caused by the no-solution problem in the application of empirical likelihood, Chen et al. (2008) propose an adjustment to empirical likelihood. The resulting adjusted empirical likelihood is attractive for its easy computation and desirable asymptotic properties. Recently, this method has found applications in various areas. Zhu et al. (2009) combine the adjusted empirical likelihood with the exponentially tilted likelihood, and apply it to the analysis of morphometric measures in MRI studies. Liu and Yu (2010) propose a two-sample adjusted empirical likelihood approach to construct confidence regions for the difference of two population means. Variyath et al. (2010) introduce information criteria under adjusted empirical likelihood for variable/model selection problems.

In this chapter, we first review the setting of the adjusted empirical likelihood (AEL) and its asymptotic properties.
In addition, we present some new results on the finite-sample properties of the adjusted empirical likelihood.

3.1 Adjusted Empirical Likelihood and AEL-Based Confidence Regions

Let us start with a simple example. Suppose that we have a random sample of $n$ bivariate observations, and we are interested in the population mean $\mu$. Now consider a value of $\mu$ outside of $CH\{X_i,\ i = 1, 2, \ldots, n\}$. Such a value of $\mu$ does not satisfy condition (2.20), and therefore $W_n(\mu)$ is not well defined. The idea of the adjustment proposed by Chen et al. (2008) is to add a pseudo observation $X_{n+1}$ to the data set such that $\mu \in CH\{X_i,\ i = 1, 2, \ldots, n+1\}$. More specifically, we may choose $X_{n+1}$ as
\[
X_{n+1} = \mu + a_n(\mu - \bar X_n), \tag{3.1}
\]
for some positive constant $a_n$; or equivalently, we may write
\[
X_{n+1} - \mu = -a_n(\bar X_n - \mu). \tag{3.2}
\]

The rationale of adding such a pseudo point is illustrated in Figure 3.1. Note that $\bar X_n = \sum_{i=1}^{n} X_i / n$ is always an interior point of $CH\{X_i,\ i = 1, 2, \ldots, n\}$. Suppose $\mu$ is a parameter value outside of the convex hull of $\{X_1, X_2, \ldots, X_n\}$. Let us first draw a ray from $\bar X_n$ towards $\mu$, and let $X_{n+1}$ be a point on the further side of $\mu$. The constant $a_n$ determines how far $X_{n+1}$ should be placed. It is seen that $\mu$ is an interior point of the convex hull of $\{X_1, X_2, \ldots, X_{n+1}\}$.

[Figure 3.1: A two-dimensional example showing the position of the pseudo point. Panel (a): plot of the $X_i$'s, with $\mu$ and $\bar X_n$ marked; panel (b): after adding the pseudo point $X_{n+1}$.]

This adjustment is generally applicable. For any given value of $\theta$, let $g_i(\theta) = g(X_i;\theta)$ and $\bar g_n(\theta) = \sum_{i=1}^{n} g_i(\theta)/n$.
The pseudo observation is then defined as
\[
g_{n+1}(\theta) = -a_n\, \bar g_n(\theta). \tag{3.3}
\]
By including this pseudo observation in the data set, the empirical likelihood ratio statistic for $\theta$ becomes
\[
W_n^*(\theta; a_n) = -2 \log R_n^*(\theta; a_n) \tag{3.4}
\]
with
\[
R_n^*(\theta; a_n) = (n+1)^{n+1} L_n^*(\theta; a_n),
\]
and
\[
L_n^*(\theta; a_n) = \max\left\{ \prod_{i=1}^{n+1} p_i : p_i > 0,\ \sum_{i=1}^{n+1} p_i = 1,\ \sum_{i=1}^{n+1} p_i\, g_i(\theta) = 0 \right\}.
\]
We call a set of weights $\{p_i\}_{i=1}^{n+1}$ sub-optimal for $W_n^*(\theta; a_n)$, $R_n^*(\theta; a_n)$ or $L_n^*(\theta; a_n)$ if they satisfy the above constraints.

Using the method of Lagrange multipliers, we can easily show that the optimal weights are given by
\[
p_i = \frac{1}{n+1} \cdot \frac{1}{1 + \lambda^T g_i(\theta)}, \quad i = 1, 2, \ldots, n+1,
\]
where $\lambda$ is the solution to
\[
\sum_{i=1}^{n+1} \frac{g_i(\theta)}{1 + \lambda^T g_i(\theta)} = 0.
\]
As a consequence, $W_n^*(\theta; a_n)$ can be expressed as
\[
W_n^*(\theta; a_n) = 2\sum_{i=1}^{n+1} \log\left[1 + \lambda^T g_i(\theta)\right].
\]

Compared to the original empirical likelihood, adjusted empirical likelihood has many desirable properties. Firstly, adjusted empirical likelihood yields a sensible value of likelihood at any putative parameter value, and this allows us to evaluate the plausibility of any parameter value. On the contrary, the original empirical likelihood is well defined only over a data-dependent subset of the parameter space, and this subset is difficult to specify numerically when the estimating function $g(x;\theta)$ is complex.

Secondly, the first-order asymptotic property of $W_n(\theta_0)$, where $\theta_0$ is the true value of $\theta$, is largely preserved for $W_n^*(\theta_0; a_n)$. For example, $W_n^*(\theta_0; a_n)$ has the same limiting distribution as that of $W_n(\theta_0)$ as long as $a_n = o_p(n^{2/3})$. Thus, the $\chi^2$ calibration is still applicable to constructing confidence regions for the parameter of interest. That is, $CR^*_\alpha = \{\theta : W_n^*(\theta; a_n) \le \chi^2_q(\alpha)\}$ remains an approximate $100(1-\alpha)\%$ confidence region for $\theta$.

Thirdly, AEL-based confidence regions can achieve coverage precision of higher order with appropriately chosen $a_n$. The positive constant $a_n$ in the definition of the pseudo point can be used as a tuning parameter which controls the level of adjustment.
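The construction can be sketched in a few lines for the univariate mean. This is an illustrative sketch under stated assumptions rather than the code behind the results reported here: the helper names `_lr_stat`, `el_stat` and `ael_stat` are hypothetical, the multiplier $\lambda$ is found by plain bisection, and the default $a_n = \log(n)/2$ is the choice attributed to Chen et al. (2008) later in the text.

```python
import math


def _lr_stat(g):
    """2 * sum(log(1 + lam * g_i)), with lam solving sum(g_i / (1 + lam * g_i)) = 0.

    Hypothetical helper for scalar g_i; returns math.inf when 0 is not
    strictly between min(g) and max(g) (unless every g_i is zero).
    """
    lo_g, hi_g = min(g), max(g)
    if not (lo_g < 0.0 < hi_g):
        return 0.0 if all(gi == 0.0 for gi in g) else math.inf
    lo, hi = -1.0 / hi_g, -1.0 / lo_g      # keeps every 1 + lam * g_i positive
    for _ in range(200):                   # the score is decreasing in lam
        mid = 0.5 * (lo + hi)
        if sum(gi / (1.0 + mid * gi) for gi in g) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * gi) for gi in g)


def el_stat(x, mu):
    """Original EL statistic W_n(mu) for a univariate mean."""
    return _lr_stat([xi - mu for xi in x])


def ael_stat(x, mu, a_n=None):
    """Adjusted EL statistic W_n^*(mu; a_n) using the pseudo observation (3.3)."""
    n = len(x)
    if a_n is None:
        a_n = math.log(n) / 2.0            # assumed default, per Chen et al. (2008)
    g = [xi - mu for xi in x]
    g_bar = sum(g) / n
    return _lr_stat(g + [-a_n * g_bar])    # append g_{n+1} = -a_n * g_bar
```

With this sketch, `el_stat` returns `math.inf` for a value of $\mu$ outside the range of the data, while `ael_stat` returns a finite number at the same $\mu$, which is the first property listed above.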
Recall that EL-based confidence regions have the under-coverage problem, and they are Bartlett correctable to achieve higher order precision. Tuning the size of $a_n$ may achieve the same goal. This is exactly what has been proposed in Liu and Chen (2010). They discover that when $a_n = b/2$ with $b$ being the Bartlett correction factor, the coverage accuracy of AEL-based confidence regions is of order $n^{-2}$, which is the same as that of Bartlett-corrected EL-based confidence regions. The sign of $b$ matters in the adjusted empirical likelihood. In the one-dimensional case, $b$ is positive for any distribution. When the dimension is higher than 1, empirical studies (Liu and Chen, 2010) seem to support that $b$ is positive, but theoretical justification is still needed.

Fourthly, although the original motivation of the adjusted empirical likelihood method is to handle the no-solution problem confronting the EL method, empirical studies (Chen et al., 2008; Liu and Chen, 2010) reveal that AEL-based confidence regions have higher coverage rates than EL-based confidence regions. Note that this does not imply that the AEL-based confidence regions have more accurate coverage rates than the corresponding EL-based confidence regions, though it is often the case because EL-based confidence regions have the under-coverage problem. We will demonstrate this empirical discovery rigorously in Section 3.2.

3.2 Finite-Sample Properties of Adjusted Empirical Likelihood for Population Mean

We devote this section to the finite-sample properties of the adjusted empirical likelihood, mainly in the case of the population mean.

3.2.1 Monotonicity of $W_n^*(\mu; a_n)$ in $\mu$

It is desirable that the confidence region of any parameter is convex. Empirical evidence seems to support that the AEL-based confidence region for the population mean is convex. This is yet to be confirmed theoretically.
In this section, we prove that, like $W_n(\mu)$, the adjusted likelihood ratio statistic $W_n^*(\mu; a_n)$ also has a monotonicity property. As mentioned earlier in Section 2.5, this property is critical for the numerical computation of multidimensional confidence regions.

Theorem 3.1. Suppose we have a random sample $X_1, X_2, \ldots, X_n$ from some population $X$, and $W_n^*(\mu; a_n)$ is the adjusted empirical likelihood ratio statistic for the population mean. For any $d$-dimensional unit vector $v$, consider the half line $\bar X_n + tv$ for $t \ge 0$. Then $W_n^*(\bar X_n + tv; a_n)$ is an increasing function of $t$.

Proof. For any $0 < t_1 < t_2$, let $\mu_1 = \bar X_n + t_1 v$ and $\mu_2 = \bar X_n + t_2 v$. Note that
\[
W_n^*(\mu; a_n) = -2\log R_n^*(\mu; a_n) = -2\log L_n^*(\mu; a_n) - 2\log (n+1)^{n+1},
\]
and the logarithm transformation is monotone. It suffices to show $L_n^*(\mu_1; a_n) \ge L_n^*(\mu_2; a_n)$.

Let $\{p_i\}_{i=1}^{n+1}$ be the optimal weights for $L_n^*(\mu_2; a_n)$. If we can find a set of sub-optimal weights $\{q_i\}_{i=1}^{n+1}$ for $L_n^*(\mu_1; a_n)$ such that $\prod_{i=1}^{n+1} q_i \ge \prod_{i=1}^{n+1} p_i$, then the conclusion of the theorem follows since $L_n^*(\mu_1; a_n) \ge \prod_{i=1}^{n+1} q_i$.

For $i = 1, 2, \ldots, n$, we define
\[
r_i = \frac{p_i}{1 - p_{n+1}}.
\]
Then we have
\[
\sum_{i=1}^{n} r_i (X_i - \mu_2) + \frac{p_{n+1}}{1 - p_{n+1}} (X_{n+1} - \mu_2) = 0.
\]
Substituting $X_{n+1} - \mu_2 = -a_n(\bar X_n - \mu_2)$ and letting $k = p_{n+1} a_n / (1 - p_{n+1})$, we have
\[
\sum_{i=1}^{n} r_i (X_i - \mu_2) = k(\bar X_n - \mu_2) = (\bar X_n - \mu_2) + (k - 1)(\bar X_n - \mu_2).
\]
Define
\[
\tilde L_n(\delta) = \max\left\{ \prod_{i=1}^{n} s_i : s_i > 0,\ \sum_{i=1}^{n} s_i = 1,\ \sum_{i=1}^{n} s_i (X_i - \mu_2) = \delta \right\}.
\]
That is, $\tilde L_n(\delta)$ is the profile empirical likelihood for $\delta = E[X_1 - \mu_2]$. It is easy to verify that $\tilde L_n(\delta)$ is maximized at $\bar\delta_n = \sum_{i=1}^{n} (X_i - \mu_2)/n = \bar X_n - \mu_2$, and that $\{r_i\}_{i=1}^n$ are the optimizing weights for $\tilde L_n(\delta_2)$ with $\delta_2 = (\bar X_n - \mu_2) + (k - 1)(\bar X_n - \mu_2)$. Denote $\delta_1 = (\bar X_n - \mu_2) + (k - 1)(\bar X_n - \mu_1)$. Note that $\mu_1$ is a convex combination of $\mu_2$ and $\bar X_n$. Hence $\delta_1$ is also a convex combination of $\delta_2$ and $\bar\delta_n$. By the monotonicity property of $\tilde L_n(\delta)$ (Corollary 2.4), we have $\tilde L_n(\delta_1) \ge \tilde L_n(\delta_2)$.

Let $\{s_i\}_{i=1}^n$ be the optimal weights for $\tilde L_n(\delta_1)$, and define $q_{n+1} = p_{n+1}$ and $q_i = (1 - p_{n+1}) s_i$ for $i = 1, 2, \ldots, n$.
We can easily verify that $\{q_i\}_{i=1}^{n+1}$ is a set of sub-optimal weights for $L_n^*(\mu_1; a_n)$. Hence,
\[
\begin{aligned}
L_n^*(\mu_1; a_n) &\ge \prod_{i=1}^{n+1} q_i = p_{n+1}(1 - p_{n+1})^n \prod_{i=1}^{n} s_i \\
&\ge p_{n+1}(1 - p_{n+1})^n \prod_{i=1}^{n} r_i = p_{n+1}(1 - p_{n+1})^n \prod_{i=1}^{n} \frac{p_i}{1 - p_{n+1}} \\
&= \prod_{i=1}^{n+1} p_i = L_n^*(\mu_2; a_n).
\end{aligned}
\]
Hence, $W_n^*(\mu_1; a_n) \le W_n^*(\mu_2; a_n)$.

According to Theorem 3.1, $\bar X_n$ is the minimum point of $W_n^*(\mu; a_n)$, and AEL-based confidence regions for the population mean are at least star-shaped with $\bar X_n$ being the center. In the case of the univariate mean, the AEL-based confidence regions are still intervals. This result guarantees that the bisection algorithm described in Section 2.5 also works in finding the boundary of the AEL-based confidence region.

Whether AEL-based confidence regions for the population mean are convex is still not clear, though it seems to be the case in the simulation studies.

3.2.2 Monotonicity of $W_n^*(\theta; a_n)$ in $a_n$

Empirical studies in Chen et al. (2008) and Liu and Chen (2010) reveal that the AEL-based confidence regions have higher coverage rates than the corresponding EL-based confidence regions. Intuitively, the gain in coverage rate of AEL-based confidence regions may be explained by the way the adjusted empirical likelihood ratio statistic is constructed. As argued by Hua (2009), for testing the null hypothesis $H_0: \theta = \theta_0$, the pseudo point $g_{n+1}(\theta_0)$ is always placed at a position that is in favor of the null hypothesis. Thus, the adjusted empirical likelihood ratio statistic tends to favor the null hypothesis and deflates the type-I error. Consequently, the AEL-based confidence regions have higher coverage rates compared to the corresponding EL-based confidence regions. It turns out that this observation can be proved rigorously. Theorem 3.2 reveals the monotonicity of $W_n^*(\theta; a_n)$ in $a_n$, and an interesting relationship between adjusted empirical likelihood and empirical likelihood as a special case.
It implies that the AEL-based confidence region strictly contains the corresponding EL-based confidence region.

Theorem 3.2. Suppose $X_1, X_2, \ldots, X_n$ is a random sample from some population $X$, and $W_n(\theta)$ and $W_n^*(\theta; a_n)$ are the empirical likelihood ratio statistic and the adjusted empirical likelihood ratio statistic defined by equations (2.15) and (3.4), respectively. We adopt the conventional value $\infty$ for $W_n(\theta)$ when it is not well defined. Then we have

(1) $W_n^*(\theta; a_n) = W_n(\theta)$ if $a_n = 0$.

(2) $W_n^*(\theta; a_n)$ is a decreasing function of $a_n$ on the closed interval $[0, n]$.

Proof. (1) When $a_n = 0$, we have $g_{n+1}(\theta) = 0$. Hence, $W_n^*(\theta; 0)$ becomes
\[
W_n^*(\theta; 0) = 2\sum_{i=1}^{n+1} \log\left[1 + \lambda^T g_i(\theta)\right] = 2\sum_{i=1}^{n} \log\left[1 + \lambda^T g_i(\theta)\right],
\]
where $\lambda$ is the solution to
\[
\sum_{i=1}^{n+1} \frac{g_i(\theta)}{1 + \lambda^T g_i(\theta)} = \sum_{i=1}^{n} \frac{g_i(\theta)}{1 + \lambda^T g_i(\theta)} = 0.
\]
It is clear that the expression of $W_n^*(\theta; 0)$ and the equation for $\lambda$ coincide with those in the definition of $W_n(\theta)$ (equations (2.15) and (2.16)). That is, we have $W_n^*(\theta; 0) = W_n(\theta)$.

(2) We only give the proof in the case of the population mean for simplicity; the proof is the same for the case of general estimating equations. Without loss of generality, we also fix $\mu = 0$ and assume $\bar X_n \ne 0$.

When $a_n = n$, it is easy to verify that the weights $\{p_i = 1/(n+1)\}_{i=1}^{n+1}$ are sub-optimal for $W_n^*(0; n)$ and thus are optimal for $W_n^*(0; n)$. Therefore, $W_n^*(0; n) = 0 \le W_n^*(0; a_n)$ for any $a_n < n$. Next we only consider $a_n \in [0, n)$.

Note that $W_n^*(0; a_n)$ can be expressed as
\[
W_n^*(0; a_n) = 2\sum_{i=1}^{n+1} \log(1 + \lambda^T X_i),
\]
where $X_{n+1} = -a_n \bar X_n$, and $\lambda$ satisfies
\[
\sum_{i=1}^{n+1} \frac{X_i}{1 + \lambda^T X_i} = 0. \tag{3.5}
\]
The derivative of $W_n^*(0; a_n)$ with respect to $a_n$ is
\[
\begin{aligned}
\frac{dW_n^*(0; a_n)}{da_n}
&= 2\sum_{i=1}^{n+1} \frac{\left(\frac{d\lambda}{da_n}\right)^T X_i}{1 + \lambda^T X_i} + \frac{2\lambda^T \frac{dX_{n+1}}{da_n}}{1 + \lambda^T X_{n+1}} \\
&= 2\left(\frac{d\lambda}{da_n}\right)^T \sum_{i=1}^{n+1} \frac{X_i}{1 + \lambda^T X_i} + \frac{2\lambda^T(-\bar X_n)}{1 + \lambda^T X_{n+1}} \\
&= -\frac{2\lambda^T \bar X_n}{1 + \lambda^T X_{n+1}},
\end{aligned}
\]
where the last step uses equation (3.5). Since the optimal weight $p_{n+1}$ is positive, the denominator $1 + \lambda^T X_{n+1}$ is positive.

If the derivative of $W_n^*(0; a_n)$ is always negative, then $W_n^*(0; a_n)$ is a decreasing function of $a_n$. Our task is to prove that the derivative of $W_n^*(0; a_n)$ is negative for $a_n \in [0, n)$, or equivalently to prove $\lambda^T \bar X_n > 0$.

Consider the following function
\[
f(t) = \sum_{i=1}^{n+1} \frac{\lambda^T X_i}{1 + t\lambda^T X_i}.
\]
Note that
\[
f(0) = \sum_{i=1}^{n+1} \lambda^T X_i = \lambda^T \sum_{i=1}^{n} X_i + \lambda^T(-a_n \bar X_n) = (n - a_n)\lambda^T \bar X_n,
\]
\[
f(1) = \sum_{i=1}^{n+1} \frac{\lambda^T X_i}{1 + \lambda^T X_i} = \lambda^T \sum_{i=1}^{n+1} \frac{X_i}{1 + \lambda^T X_i} = 0.
\]
We also notice that the derivative of $f(t)$,
\[
\frac{df(t)}{dt} = -\sum_{i=1}^{n+1} \frac{\lambda^T X_i \cdot \lambda^T X_i}{(1 + t\lambda^T X_i)^2} = -\sum_{i=1}^{n+1} \left(\frac{\lambda^T X_i}{1 + t\lambda^T X_i}\right)^2,
\]
is always negative, and thus $f(t)$ is a decreasing function of $t$. Therefore, we have $f(0) > f(1)$, that is, $(n - a_n)\lambda^T \bar X_n > 0$. Since $a_n < n$, we find $\lambda^T \bar X_n > 0$. Consequently, $W_n^*(0; a_n)$ is a decreasing function of $a_n$, which completes the proof.

Figure 3.2 plots the EL and AEL likelihood ratio statistics based on an artificially generated data set. It clearly shows that $W_n^*(\mu; a_n) \le W_n(\mu)$ for all $\mu$. As a consequence, the AEL-based confidence interval for $\mu$ contains the corresponding EL-based confidence interval and hence the former has higher coverage probability. This conclusion is generally true for parameters defined through general estimating equations. It strengthens the result in Liu and Chen (2010): AEL-based confidence regions with $a_n = b/2$, where $b$ is the Bartlett correction factor, have not only higher coverage accuracy but also higher coverage probability.

[Figure 3.2: An example showing the effect of the pseudo point on the likelihood ratio statistic. Axes: $\mu$ (with $X_{(1)}$ and $X_{(n)}$ marked) and likelihood ratio statistic; curves: EL and AEL.]

3.2.3 Boundedness of $W_n^*(\theta; a_n)$

Suppose $X_1, X_2, \ldots, X_n \in R^d$ are independent random vectors from some population $X$, and $\theta \in R^p$ is the parameter of interest defined through the general estimating equation (2.12). The EL-based approximate $100(1-\alpha)\%$ confidence region for $\theta$ is defined as
\[
CR_\alpha = \{\theta : W_n(\theta) \le \chi^2_q(\alpha)\}
\]
where $\chi^2_q(\alpha)$ is the upper $\alpha$ quantile of the $\chi^2$ distribution with $q$ degrees of freedom.
The AEL-based $100(1-\alpha)\%$ confidence region is defined in the same way except for replacing the foregoing $W_n(\theta)$ by $W_n^*(\theta; a_n)$.

In the case of the population mean, since $W_n(\mu)$ is only well defined for $\mu$ in the convex hull of the sample, and $W_n(\mu)$ tends to infinity as $\mu$ approaches the boundary of the convex hull, the EL-based confidence region for the population mean is always of finite size. On the other hand, we find that $W_n^*(\theta; a_n)$ is bounded from above for any given $n$. Hence, AEL may give an unbounded confidence region when the sample size is not large enough, or the confidence level $(1-\alpha)$ is too high. We state this result as follows.

Theorem 3.3. Suppose we have a finite sample $X_1, X_2, \ldots, X_n$ and let $W_n^*(\theta; a_n)$ be the adjusted empirical likelihood ratio statistic defined in equation (3.4). For any $\theta$, we have
\[
W_n^*(\theta; a_n) \le -2n \log\left[\frac{(n+1)a_n}{n(1 + a_n)}\right] - 2\log\left[\frac{n+1}{1 + a_n}\right].
\]

Proof. Let
\[
q_1 = q_2 = \cdots = q_n = \frac{1}{n} \cdot \frac{a_n}{1 + a_n}, \qquad q_{n+1} = \frac{1}{1 + a_n}.
\]
It is clear that $q_i > 0$ and $\sum_{i=1}^{n+1} q_i = 1$. In addition, it is seen that
\[
\sum_{i=1}^{n+1} q_i g_i(\theta) = \frac{a_n}{1 + a_n} \cdot \frac{1}{n}\sum_{i=1}^{n} g_i(\theta) + \frac{1}{1 + a_n}\left[-a_n \bar g_n(\theta)\right] = \frac{a_n}{1 + a_n}\bar g_n(\theta) - \frac{a_n}{1 + a_n}\bar g_n(\theta) = 0.
\]
Hence, $\{q_i\}_{i=1}^{n+1}$ is a set of sub-optimal weights for $W_n^*(\theta; a_n)$. According to the definition of $W_n^*(\theta; a_n)$, we thus have
\[
W_n^*(\theta; a_n) \le -2\sum_{i=1}^{n+1} \log[(n+1)q_i] = -2n \log\left[\frac{(n+1)a_n}{n(1 + a_n)}\right] - 2\log\left[\frac{n+1}{1 + a_n}\right].
\]
This completes the proof.

For the population mean, the next theorem shows that the upper bound in Theorem 3.3 is the supremum of $W_n^*(\mu; a_n)$.

Theorem 3.4. Let $X_1, X_2, \ldots, X_n$ be i.i.d. $d$-dimensional random vectors and $\mu$ be the population mean. Denote by $M$ the upper bound in Theorem 3.3. For any $d$-dimensional unit vector $v$, consider the half line $\bar X_n + tv$ with $t > 0$. We have
\[
\lim_{t \to \infty} W_n^*(\bar X_n + tv; a_n) = M.
\]

Proof. We present the proof for the case $d = 1$ and for the case $d > 1$ separately.

Case 1: $d = 1$. We only present the proof of the theorem in the case where $\mu \to -\infty$; the proof in the case where $\mu \to \infty$ is similar.

Let $\{p_i\}_{i=1}^{n+1}$ be the optimal weights for $W_n^*(\mu; a_n)$. We prove the result in three steps. Firstly, we demonstrate that $\lim_{\mu \to -\infty} p_{n+1} = 1/(1 + a_n)$. Secondly, we further show that $\lim_{\mu \to -\infty} p_i = a_n/[n(1 + a_n)]$ for $i = 1, 2, \ldots, n$. In the final step, the conclusion of the theorem readily follows.

Step 1. Consider any $\mu$ such that $\mu < X_{(1)}$. Note that $\{p_i\}_{i=1}^{n+1}$ satisfy
\[
\sum_{i=1}^{n+1} p_i (X_i - \mu) = 0.
\]
From the above equation, we get
\[
\sum_{i=1}^{n} p_i (X_i - \mu) = -p_{n+1}(X_{n+1} - \mu) = p_{n+1} a_n (\bar X_n - \mu).
\]
Thus,
\[
p_{n+1} = \frac{\sum_{i=1}^{n} p_i (X_i - \mu)}{a_n(\bar X_n - \mu)}.
\]
Note that $0 < X_{(1)} - \mu \le X_i - \mu \le X_{(n)} - \mu$ for $i = 1, 2, \ldots, n$, and $\sum_{i=1}^{n} p_i = 1 - p_{n+1}$. Hence
\[
p_{n+1} \le \frac{\sum_{i=1}^{n} p_i (X_{(n)} - \mu)}{a_n(\bar X_n - \mu)} = \frac{\sum_{i=1}^{n} p_i}{a_n} \cdot \frac{X_{(n)} - \mu}{\bar X_n - \mu} = \frac{1 - p_{n+1}}{a_n} \cdot \frac{X_{(n)} - \mu}{\bar X_n - \mu}.
\]
Since $p_{n+1} < 1$, we get an upper bound for $p_{n+1}$ from the above inequality:
\[
p_{n+1} \le \frac{X_{(n)} - \mu}{a_n(\bar X_n - \mu) + (X_{(n)} - \mu)}.
\]
Similarly, we get a lower bound for $p_{n+1}$:
\[
p_{n+1} \ge \frac{X_{(1)} - \mu}{a_n(\bar X_n - \mu) + (X_{(1)} - \mu)}.
\]
Letting $\mu \to -\infty$, we get
\[
\frac{1}{1 + a_n} \le \liminf_{\mu \to -\infty} p_{n+1} \le \limsup_{\mu \to -\infty} p_{n+1} \le \frac{1}{1 + a_n}.
\]
Hence,
\[
\lim_{\mu \to -\infty} p_{n+1} = \frac{1}{1 + a_n}. \tag{3.6}
\]

Step 2. For $i = 1, 2, \ldots, n+1$, $p_i$ can be expressed as
\[
p_i = \frac{1}{n+1} \cdot \frac{1}{1 + \lambda(X_i - \mu)} \tag{3.7}
\]
for some $\lambda$. By equation (3.7) with $i = n+1$, we have
\[
\lambda = -\frac{(n+1) - p_{n+1}^{-1}}{(n+1)(X_{n+1} - \mu)} = \frac{(n+1) - p_{n+1}^{-1}}{a_n(n+1)(\bar X_n - \mu)}.
\]
For $i = 1, 2, \ldots, n$, substituting this expression of $\lambda$ into equation (3.7) leads to
\[
p_i = \frac{1}{n+1}\left[1 + \frac{(n+1) - p_{n+1}^{-1}}{(n+1)a_n} \cdot \frac{X_i - \mu}{\bar X_n - \mu}\right]^{-1} = \left[(n+1) + \frac{(n+1) - p_{n+1}^{-1}}{a_n} \cdot \frac{X_i - \mu}{\bar X_n - \mu}\right]^{-1}.
\]
Letting $\mu \to -\infty$ and using equation (3.6), we get
\[
\lim_{\mu \to -\infty} p_i = \left[(n+1) + \frac{(n+1) - (1 + a_n)}{a_n}\right]^{-1} = \frac{1}{n} \cdot \frac{a_n}{1 + a_n}.
\]

Step 3. Since $\{p_i\}_{i=1}^{n+1}$ are the optimal weights for $W_n^*(\mu; a_n)$, we have
\[
W_n^*(\mu; a_n) = -2\sum_{i=1}^{n+1} \log[(n+1)p_i].
\]
Consequently,
\[
\lim_{\mu \to -\infty} W_n^*(\mu; a_n) = \lim_{\mu \to -\infty} -2\sum_{i=1}^{n+1} \log[(n+1)p_i] = -2n \log\left[\frac{(n+1)a_n}{n(1 + a_n)}\right] - 2\log\left[\frac{n+1}{1 + a_n}\right],
\]
which is the conclusion.

Case 2: $d > 1$. For any $d$-dimensional unit vector $v$ and $t > 0$, write $\mu = \bar X_n + tv$ and let $\{p_i\}_{i=1}^{n+1}$ be the optimal weights for $W_n^*(\mu; a_n)$. Consider $Y_i = v^T(X_i - \bar X_n)$ for $i = 1, 2, \ldots, n+1$. It is easy to verify that
\[
\sum_{i=1}^{n+1} p_i (Y_i - t) = 0.
\]
Define
\[
\tilde R_n^*(t; a_n) = \max\left\{ \prod_{i=1}^{n+1} (n+1)p_i : p_i > 0,\ \sum_{i=1}^{n+1} p_i = 1,\ \sum_{i=1}^{n+1} p_i (Y_i - t) = 0 \right\},
\]
and $\tilde W_n^*(t; a_n) = -2\log \tilde R_n^*(t; a_n)$. Since $\{p_i\}_{i=1}^{n+1}$ are sub-optimal weights for $\tilde W_n^*(t; a_n)$, we have
\[
\tilde W_n^*(t; a_n) \le W_n^*(\mu; a_n) \le M.
\]
We have already proved that $\lim_{t \to \infty} \tilde W_n^*(t; a_n) = M$. Consequently, we get
\[
M = \lim_{t \to \infty} \tilde W_n^*(t; a_n) \le \liminf_{t \to \infty} W_n^*(\mu; a_n) \le \limsup_{t \to \infty} W_n^*(\mu; a_n) \le M.
\]
Hence
\[
\lim_{t \to \infty} W_n^*(\mu; a_n) = M.
\]
This completes the proof.

Theorem 3.3 reveals that $W_n^*(\theta; a_n)$ is a bounded function of $\theta$; Figure 3.3 shows the relationship between the upper bound and the sample size when $a_n = \log(n)/2$. When the sample size is small or the dimension is high, the upper bound of $W_n^*(\theta; a_n)$ is likely to be smaller than the upper $\alpha$ quantile of the $\chi^2$ distribution. When this happens, the approximate confidence region based on the $\chi^2$ calibration becomes the entire parameter space. Figure 3.4 shows the minimum sample size needed for the adjusted empirical likelihood method to give bounded confidence regions for different degrees of freedom and confidence levels.

[Figure 3.3: Plot of upper bound against sample size. Axes: sample size (2 to 20) and upper bound.]

Even if the minimum sample size is attained in a particular situation, the adjusted empirical likelihood method may still produce unreasonably large confidence regions. For example, suppose we have a univariate sample of size 5 and we would like to construct the AEL-based 95% confidence interval for the population mean. As proposed by Chen et al. (2008), $a_n$ is chosen to be $\log(5)/2 = 0.805$. For this choice of $a_n$, the upper bound of $W_n^*(\mu; a_n)$ is 3.851, which is only slightly larger than the upper 5% quantile of the $\chi^2_1$ distribution, 3.841. Because of this, the resulting confidence interval is very long. We can imagine that the coverage rate is much higher than the nominal level 95%.
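The numbers in this example can be reproduced directly from the bound in Theorem 3.3. The sketch below is illustrative only: the function names are hypothetical, the $\chi^2_1$ quantile is obtained as the square of a standard normal quantile (valid only for one degree of freedom), and the search assumes the choice $a_n = \log(n)/2$.

```python
import math
from statistics import NormalDist


def ael_upper_bound(n, a_n):
    """Upper bound M of Theorem 3.3 for sample size n and constant a_n > 0."""
    return (-2.0 * n * math.log((n + 1) * a_n / (n * (1.0 + a_n)))
            - 2.0 * math.log((n + 1) / (1.0 + a_n)))


def chi2_1_upper(alpha):
    """Upper-alpha quantile of chi-square with 1 degree of freedom.

    Uses the fact that a chi-square(1) variable is a squared standard normal.
    """
    return NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2


def min_sample_size(crit, n_max=1000):
    """Smallest n >= 2 with M(n, log(n)/2) > crit, or None if none up to n_max."""
    for n in range(2, n_max + 1):
        if ael_upper_bound(n, math.log(n) / 2.0) > crit:
            return n
    return None


if __name__ == "__main__":
    m5 = ael_upper_bound(5, math.log(5) / 2.0)
    print(f"M for n = 5: {m5:.3f}, chi2_1(0.05): {chi2_1_upper(0.05):.3f}")
    print("minimum n for a bounded 95% interval:", min_sample_size(chi2_1_upper(0.05)))
```

Under these assumptions the bound for $n = 5$ exceeds the 5% critical value only narrowly, while the bound for $n = 4$ falls below it, so $n = 5$ is the smallest sample size for which the univariate 95% AEL interval is bounded.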
Figure 3.5 illustrates this point with a data set of 5 points generated from $N(0, 1)$.

[Figure 3.5: The EL-based and AEL-based 95% confidence intervals for the population mean. Axes: $\mu$ (marked values $-7.073$, $0.114$, $1.299$, $8.477$) and likelihood ratio statistic; curves: EL and AEL; reference line: upper $\alpha$ quantile of $\chi^2_1$.]

In the case of the population mean, we may modify the adjusted empirical likelihood method so that the resulting $W_n^*(\mu; a_n)$ becomes unbounded from above. To motivate such a modification, let us once again look at Theorem 3.2. We view $W_n^*(\mu; a_n)$ as a function of $a_n$ while regarding $\mu$ as a fixed constant satisfying $\mu \ne \bar X_n$. It is seen that $W_n^*(\mu; a_n)$ equals $W_n(\mu)$ when $a_n = 0$, and $W_n^*(\mu; a_n)$ is a decreasing function of $a_n$ on the closed interval $[0, n]$. See Figure 3.6 for an illustration. Attempting to make $W_n^*(\mu; a_n)$ unbounded from above, we consider replacing the constant $a_n$ by
\[
a_n(\mu) = a_n \exp\left\{-\sqrt{(\bar X_n - \mu)^T S_n^{-1} (\bar X_n - \mu)}\right\}, \tag{3.8}
\]
where $S_n$ is the sample variance-covariance matrix. We assume that $S_n$ is nonsingular.

[Figure 3.4: Plot of required sample size for various critical values. Axes: critical value, with $\chi^2_1(0.1)$, $\chi^2_2(0.2)$, $\chi^2_1(0.05)$, $\chi^2_2(0.1)$, $\chi^2_2(0.05)$, $\chi^2_1(0.01)$, $\chi^2_3(0.05)$, $\chi^2_2(0.01)$, $\chi^2_3(0.01)$ marked, and required sample size (4 to 16).]

The resulting $W_n^*(\mu; a_n(\mu))$ is always larger than $W_n^*(\mu; a_n)$ but smaller than $W_n(\mu)$ for any $\mu \ne \bar X_n$, since $a_n(\mu)$ is then smaller than $a_n$ but larger than 0. As $\mu$ deviates from $\bar X_n$, $a_n(\mu)$ tends to zero and thus $W_n^*(\mu; a_n(\mu))$ approaches $W_n(\mu)$. As a result, $W_n^*(\mu; a_n(\mu))$ is unbounded from above. Figure 3.7 visualizes the effect of $a_n(\mu)$ in the univariate case.

The modified adjusted empirical likelihood possesses two key advantages. Firstly, the modified adjusted empirical likelihood ratio statistic preserves the monotonicity of the adjusted empirical likelihood, and therefore the corresponding confidence region is star-shaped and bounded.
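For the univariate mean, the modification in equation (3.8) and the bisection search from Section 2.5 can be combined as sketched below. This is an illustration under stated assumptions, not the code used for the figures: all function names are hypothetical, $S_n$ is taken to be the usual sample variance with divisor $n - 1$, and the search relies on the statistic increasing as $\mu$ moves away from $\bar X_n$.

```python
import math
from statistics import NormalDist, mean, stdev


def _lr_stat(g):
    """2 * sum(log(1 + lam * g_i)), with lam solving sum(g_i / (1 + lam * g_i)) = 0."""
    lo_g, hi_g = min(g), max(g)
    if not (lo_g < 0.0 < hi_g):
        return 0.0 if all(gi == 0.0 for gi in g) else math.inf
    lo, hi = -1.0 / hi_g, -1.0 / lo_g      # keeps every 1 + lam * g_i positive
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sum(gi / (1.0 + mid * gi) for gi in g) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * gi) for gi in g)


def stat(x, mu, a):
    """W_n^*(mu; a) for a univariate mean; a = 0 gives the original EL statistic."""
    g = [xi - mu for xi in x]
    return _lr_stat(g + [-a * mean(g)])


def modified_a(x, mu, a_n):
    """Equation (3.8) for d = 1: a_n * exp(-|xbar - mu| / s_n)."""
    return a_n * math.exp(-abs(mean(x) - mu) / stdev(x))


def upper_endpoint(w, xbar, crit, step=1.0):
    """Right end of {mu >= xbar : w(mu) <= crit}, by bracketing then bisection."""
    lo, hi = xbar, xbar + step
    while w(hi) <= crit:                   # expand until the statistic exceeds crit
        lo, hi = hi, hi + 2.0 * (hi - xbar)
        if hi - xbar > 1e12:
            return math.inf                # treat the region as unbounded
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if w(mid) <= crit:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    x = [0.2, -1.1, 0.7, 1.5, -0.4, 0.9, -0.3, 2.1, 0.0, -0.8]   # made-up data
    a_n = math.log(len(x)) / 2.0
    crit = NormalDist().inv_cdf(0.975) ** 2                      # chi2_1(0.05)
    for name, w in [("EL", lambda m: stat(x, m, 0.0)),
                    ("MAEL", lambda m: stat(x, m, modified_a(x, m, a_n))),
                    ("AEL", lambda m: stat(x, m, a_n))]:
        print(name, upper_endpoint(w, mean(x), crit))
```

Because the three statistics are ordered pointwise, the right endpoints returned for EL, modified AEL and AEL should appear in increasing order; the left endpoints can be found in the same way by searching in the opposite direction.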
Based on the foregoing discussion, we can imagine that the AEL-based confidence region contains the confidence region based on the modified adjusted empirical likelihood, while the latter contains the EL-based confidence region. Figure 3.8 shows the 95% approximate confidence regions based on the empirical likelihood method, the adjusted empirical likelihood method and the modified adjusted empirical likelihood method.

Secondly, it is seen that the multiplier in the definition of $a_n(\mu)$ converges to 1 at order $n^{-1/2}$ as $n \to \infty$ when $\mu$ equals the true value $\mu_0$. This fact implies that the modified adjusted empirical likelihood preserves both the first-order and second-order asymptotic properties of the original adjusted empirical likelihood.

[Figure 3.6: $W_n^*(\mu; a_n)$ as a function of $a_n$. Axes: $a_n$ and $W_n^*(\mu; a_n)$.]

Obviously, there are many choices for the multiplier in the definition of $a_n(\mu)$. Based on the foregoing discussion, we may require that the multiplier satisfy:

(1) it decreases to 0 as $\mu$ deviates from $\bar X_n$; and

(2) it converges to 1 at order $n^{-1/2}$ as $n$ increases.

If a multiplier satisfies these two conditions, the corresponding modified adjusted empirical likelihood not only gives bounded confidence regions, but also preserves the asymptotic and finite-sample properties of the original adjusted empirical likelihood presented in this thesis.

Even with these two conditions in mind, there are still many possible choices for the multiplier. The "optimal" choice of multiplier would be an interesting topic for future research.

In the literature, there is another variant of adjusted empirical likelihood that gives an unbounded likelihood ratio statistic. Emerson and Owen (2009) also discover the boundedness of $W_n^*(\mu; a_n)$, and they consider a different way to modify the adjusted empirical likelihood so as to get an unbounded likelihood ratio statistic in the case of the population mean. They propose adding two pseudo points to the original sample. More specifically, for any $\mu \ne \bar X_n$, the first pseudo point $X_{n+1}$ is also added on the further side of $\mu$ but the distance between $X_{n+1}$ and $\mu$ is a constant $s$, and the second pseudo point $X_{n+2}$ is added such that $\bar X_n$ is the midpoint of $X_{n+1}$ and $X_{n+2}$. Then the likelihood ratio statistic is defined as
\[
W_n^{**}(\mu) = -2\max\left\{ \sum_{i=1}^{n+2} \log[(n+2)p_i] : p_i > 0,\ \sum_{i=1}^{n+2} p_i = 1,\ \sum_{i=1}^{n+2} p_i (X_i - \mu) = 0 \right\}.
\]
The resulting method is called the balanced augmented empirical likelihood method. The method is called "balanced" because the sample mean of $\{X_1, X_2, \ldots, X_{n+2}\}$ is maintained at $\bar X_n$. They demonstrate that this likelihood ratio statistic is unbounded from above, and establish the finite-sample relationship between this modified likelihood ratio method and the well-known Hotelling's $T^2$ test through the tuning parameter $s$. This topic is beyond the scope of the thesis; details can be found in Emerson and Owen (2009).

[Figure 3.7: The effect of $a_n(\mu)$. Axes: $\mu$ (with $X_{(1)}$ and $X_{(n)}$ marked) and likelihood ratio statistic; curves: EL, AEL and modified AEL; reference line: upper bound of AEL.]

[Figure 3.8: The 95% approximate confidence regions produced by EL, AEL and modified AEL. The plot shows the data points, the sample mean $\bar X_n$, and the three region boundaries.]

Chapter 4

Empirical Results

In this chapter, we conduct simulation studies on the finite-sample properties of the empirical likelihood method and several of its variants. In particular, we investigate the coverage probabilities and the sizes of a number of confidence regions for the population mean.

Constructing confidence regions for the population mean based on a simple random sample of $n$ observations is a classical problem in statistical inference.
The most widely-used method of constructing confidence regions for population mean is based on Hotelling's $T^2$ statistic

\[
T_n^2(\mu) = n(\bar{X}_n - \mu)^T S_n^{-1} (\bar{X}_n - \mu),
\]

where $\bar{X}_n$ and $S_n$ are the sample mean and the sample variance-covariance matrix, respectively. If the population distribution is multivariate normal of dimension $d$, then $(n-d)T_n^2(\mu_0)/[d(n-1)]$ is known to have an F distribution with $d$ and $n-d$ degrees of freedom, where $\mu_0$ is the true parameter value. A $100(1-\alpha)\%$ confidence region for $\mu$ is given by

\[
CR = \left\{ \mu : T_n^2(\mu) \le \frac{d(n-1)}{n-d} F_{d,n-d}(\alpha) \right\},
\]

where $F_{d,n-d}(\alpha)$ denotes the upper $\alpha$ quantile of the F distribution with $d$ and $n-d$ degrees of freedom. When $d = 1$, Hotelling's $T^2$ statistic becomes the square of the well-known Student's t statistic.

Many practitioners prefer this kind of confidence region based on normal approximation because of its easy calculation and straightforward interpretation. Moreover, many numerical studies have found that the foregoing form of confidence region has surprisingly accurate coverage rate even when the population distribution is not normal and the sample size is small.

We investigate the coverage rates and the sizes of approximate 90% and 95% confidence intervals/regions in the cases of one-dimensional mean and two-dimensional mean. Seven methods are considered:
1. Hotelling's $T^2$ method, denoted as $T^2$;
2. The original empirical likelihood method, denoted as EL;
3. The adjusted empirical likelihood method with $a_n = \log(n)/2$, denoted as AEL;
4. The modified adjusted empirical likelihood method with $a_n = \log(n)/2$, denoted as MAEL;
5. The Bartlett corrected empirical likelihood method, denoted as EL*;
6. The adjusted empirical likelihood method with $a_n = b/2$ where $b$ is estimated by the method of moments, denoted as AEL*;
7. The modified adjusted empirical likelihood method with $a_n = b/2$ where $b$ is estimated by the method of moments, denoted as MAEL*.

4.1 Confidence Intervals for One-Dimensional Mean

Three sample sizes ($n = 5, 10, 50$) are considered.
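Since this section treats the one-dimensional case, it may help to spell out how the $T^2$ region defined above specializes when $d = 1$. The following is a sketch; $S_n^2$ (the sample variance) and $t_{n-1}(\alpha/2)$ (the upper $\alpha/2$ quantile of Student's t distribution with $n-1$ degrees of freedom) are notation introduced here for illustration. With $d = 1$ the factor $d(n-1)/(n-d)$ equals 1, and $F_{1,n-1}(\alpha) = t_{n-1}^2(\alpha/2)$, so

\[
T_n^2(\mu) = \frac{n(\bar{X}_n - \mu)^2}{S_n^2},
\qquad
CR = \left\{ \mu : |\bar{X}_n - \mu| \le t_{n-1}(\alpha/2)\, \frac{S_n}{\sqrt{n}} \right\},
\]

which is the usual Student's t interval with length $2\, t_{n-1}(\alpha/2)\, S_n / \sqrt{n}$.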
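For each sample size, we generated 1,000 samples from each of the following four distributions:
1. Standard normal distribution, denoted as $N(0, 1)$;
2. $\chi^2$ distribution with 1 degree of freedom, denoted as $\chi^2_1$;
3. Exponential distribution with rate 1, denoted as Exp(1); and
4. A normal mixture $0.1\,N(-9, 1) + 0.9\,N(1, 1)$, denoted as $0.1\,N_1 + 0.9\,N_2$.
For each sample, we calculated the 90% and 95% confidence intervals. Table 4.1 reports the coverage frequencies, and Table 4.2 gives the average lengths of the corresponding intervals.

Table 4.1: Coverage rates for one-dimensional mean (column headings give nominal level, sample size)

Distribution | Method | 0.9, n = 5 | 0.9, n = 10 | 0.9, n = 50 | 0.95, n = 5 | 0.95, n = 10 | 0.95, n = 50 |
---|---|---|---|---|---|---|---|
$N(0,1)$ | $T^2$ | 0.895 | 0.909 | 0.892 | 0.956 | 0.952 | 0.949 |
$N(0,1)$ | EL | 0.757 | 0.858 | 0.889 | 0.815 | 0.913 | 0.935 |
$N(0,1)$ | AEL | 0.881 | 0.910 | 0.897 | 1.000 | 0.954 | 0.945 |
$N(0,1)$ | MAEL | 0.796 | 0.884 | 0.896 | 0.845 | 0.928 | 0.942 |
$N(0,1)$ | EL* | 0.782 | 0.876 | 0.891 | 0.834 | 0.921 | 0.939 |
$N(0,1)$ | AEL* | 0.797 | 0.880 | 0.891 | 0.855 | 0.923 | 0.939 |
$N(0,1)$ | MAEL* | 0.779 | 0.870 | 0.889 | 0.831 | 0.917 | 0.937 |
$\chi^2_1$ | $T^2$ | 0.785 | 0.809 | 0.898 | 0.840 | 0.857 | 0.940 |
$\chi^2_1$ | EL | 0.663 | 0.766 | 0.892 | 0.733 | 0.832 | 0.943 |
$\chi^2_1$ | AEL | 0.789 | 0.827 | 0.905 | 0.995 | 0.881 | 0.951 |
$\chi^2_1$ | MAEL | 0.709 | 0.804 | 0.904 | 0.765 | 0.856 | 0.947 |
$\chi^2_1$ | EL* | 0.691 | 0.784 | 0.901 | 0.757 | 0.850 | 0.946 |
$\chi^2_1$ | AEL* | 0.715 | 0.792 | 0.901 | 0.770 | 0.853 | 0.947 |
$\chi^2_1$ | MAEL* | 0.688 | 0.782 | 0.900 | 0.749 | 0.843 | 0.945 |
Exp(1) | $T^2$ | 0.829 | 0.849 | 0.884 | 0.875 | 0.896 | 0.932 |
Exp(1) | EL | 0.712 | 0.799 | 0.878 | 0.765 | 0.869 | 0.934 |
Exp(1) | AEL | 0.829 | 0.864 | 0.892 | 0.999 | 0.913 | 0.945 |
Exp(1) | MAEL | 0.751 | 0.831 | 0.888 | 0.800 | 0.889 | 0.945 |
Exp(1) | EL* | 0.734 | 0.824 | 0.886 | 0.785 | 0.884 | 0.941 |
Exp(1) | AEL* | 0.749 | 0.828 | 0.886 | 0.813 | 0.887 | 0.941 |
Exp(1) | MAEL* | 0.722 | 0.819 | 0.884 | 0.773 | 0.878 | 0.940 |
$0.1\,N_1 + 0.9\,N_2$ | $T^2$ | 0.648 | 0.677 | 0.896 | 0.724 | 0.764 | 0.944 |
$0.1\,N_1 + 0.9\,N_2$ | EL | 0.474 | 0.625 | 0.909 | 0.534 | 0.660 | 0.952 |
$0.1\,N_1 + 0.9\,N_2$ | AEL | 0.636 | 0.660 | 0.923 | 0.999 | 0.718 | 0.959 |
$0.1\,N_1 + 0.9\,N_2$ | MAEL | 0.522 | 0.645 | 0.918 | 0.573 | 0.680 | 0.959 |
$0.1\,N_1 + 0.9\,N_2$ | EL* | 0.497 | 0.632 | 0.920 | 0.547 | 0.666 | 0.957 |
$0.1\,N_1 + 0.9\,N_2$ | AEL* | 0.516 | 0.637 | 0.921 | 0.584 | 0.670 | 0.957 |
$0.1\,N_1 + 0.9\,N_2$ | MAEL* | 0.495 | 0.632 | 0.917 | 0.545 | 0.665 | 0.957 |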
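The $T^2$ entries for $N(0,1)$ with $n = 5$ at the 95% level can be roughly cross-checked with a short simulation. The following is a minimal sketch, not the code used in the thesis: the function names, the seed and the number of replications are assumptions made for illustration, and the critical value 2.776 (upper 2.5% point of Student's t with 4 degrees of freedom) is hard-coded from standard tables so that only the Python standard library is needed.

```python
import math
import random
import statistics

# Upper 2.5% point of Student's t with 4 degrees of freedom (n = 5),
# hard-coded from standard tables; an assumption of this sketch.
T_CRIT_DF4_95 = 2.776


def t_interval(sample, t_crit):
    """Return the (lower, upper) Student's t interval for the mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    half_width = t_crit * statistics.stdev(sample) / math.sqrt(n)
    return mean - half_width, mean + half_width


def simulate_t_coverage(n=5, t_crit=T_CRIT_DF4_95, reps=20000, seed=2010):
    """Estimate coverage rate and average length under N(0, 1).

    The caller must pass a t_crit that matches n - 1 degrees of freedom.
    """
    rng = random.Random(seed)
    hits = 0
    total_length = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        lower, upper = t_interval(sample, t_crit)
        hits += lower <= 0.0 <= upper  # true mean is 0
        total_length += upper - lower
    return hits / reps, total_length / reps


coverage, avg_length = simulate_t_coverage()
print(f"coverage = {coverage:.3f}, average length = {avg_length:.3f}")
```

Under these assumptions the estimated coverage should sit close to the nominal 0.95, since the t interval is exact for normal data, and the average length can be compared with the corresponding $T^2$ entry for $N(0,1)$ in the table of interval lengths. The empirical likelihood variants are not reproduced here because they require a numerical solver.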
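Table 4.2: Confidence interval lengths for one-dimensional mean (column headings give nominal level, sample size)

Distribution | Method | 0.9, n = 5 | 0.9, n = 10 | 0.9, n = 50 | 0.95, n = 5 | 0.95, n = 10 | 0.95, n = 50 |
---|---|---|---|---|---|---|---|
$N(0,1)$ | $T^2$ | 1.797 | 1.130 | 0.474 | 2.341 | 1.395 | 0.567 |
$N(0,1)$ | EL | 1.178 | 0.966 | 0.465 | 1.371 | 1.150 | 0.556 |
$N(0,1)$ | AEL | 1.729 | 1.135 | 0.485 | 18.201 | 1.398 | 0.581 |
$N(0,1)$ | MAEL | 1.307 | 1.046 | 0.480 | 1.521 | 1.239 | 0.574 |
$N(0,1)$ | EL* | 1.266 | 1.019 | 0.472 | 1.466 | 1.213 | 0.564 |
$N(0,1)$ | AEL* | 1.328 | 1.031 | 0.472 | 1.596 | 1.232 | 0.565 |
$N(0,1)$ | MAEL* | 1.240 | 1.002 | 0.470 | 1.438 | 1.189 | 0.562 |
$\chi^2_1$ | $T^2$ | 2.252 | 1.497 | 0.660 | 2.933 | 1.847 | 0.791 |
$\chi^2_1$ | EL | 1.447 | 1.265 | 0.657 | 1.675 | 1.500 | 0.790 |
$\chi^2_1$ | AEL | 2.156 | 1.491 | 0.685 | 22.802 | 1.844 | 0.825 |
$\chi^2_1$ | MAEL | 1.614 | 1.370 | 0.678 | 1.882 | 1.621 | 0.815 |
$\chi^2_1$ | EL* | 1.547 | 1.339 | 0.675 | 1.782 | 1.587 | 0.813 |
$\chi^2_1$ | AEL* | 1.624 | 1.361 | 0.676 | 1.941 | 1.626 | 0.814 |
$\chi^2_1$ | MAEL* | 1.542 | 1.327 | 0.673 | 1.791 | 1.572 | 0.809 |
Exp(1) | $T^2$ | 1.686 | 1.077 | 0.468 | 2.196 | 1.329 | 0.561 |
Exp(1) | EL | 1.091 | 0.913 | 0.464 | 1.266 | 1.084 | 0.557 |
Exp(1) | AEL | 1.617 | 1.075 | 0.484 | 17.075 | 1.327 | 0.581 |
Exp(1) | MAEL | 1.215 | 0.989 | 0.479 | 1.415 | 1.170 | 0.574 |
Exp(1) | EL* | 1.169 | 0.965 | 0.474 | 1.349 | 1.145 | 0.569 |
Exp(1) | AEL* | 1.226 | 0.979 | 0.474 | 1.467 | 1.169 | 0.569 |
Exp(1) | MAEL* | 1.150 | 0.949 | 0.472 | 1.333 | 1.125 | 0.566 |
$0.1\,N_1 + 0.9\,N_2$ | $T^2$ | 4.796 | 3.309 | 1.498 | 6.246 | 4.083 | 1.796 |
$0.1\,N_1 + 0.9\,N_2$ | EL | 3.076 | 2.801 | 1.468 | 3.562 | 3.327 | 1.756 |
$0.1\,N_1 + 0.9\,N_2$ | AEL | 4.591 | 3.302 | 1.532 | 48.563 | 4.088 | 1.834 |
$0.1\,N_1 + 0.9\,N_2$ | MAEL | 3.433 | 3.032 | 1.517 | 4.006 | 3.592 | 1.812 |
$0.1\,N_1 + 0.9\,N_2$ | EL* | 3.289 | 2.978 | 1.497 | 3.792 | 3.536 | 1.791 |
$0.1\,N_1 + 0.9\,N_2$ | AEL* | 3.456 | 3.032 | 1.498 | 4.138 | 3.633 | 1.792 |
$0.1\,N_1 + 0.9\,N_2$ | MAEL* | 3.234 | 2.921 | 1.491 | 3.739 | 3.460 | 1.782 |

4.2 Confidence Regions for Two-Dimensional Mean

We also consider constructing confidence regions for the population mean of the following bivariate distributions:
1. Standard normal distribution, $N(0, I_2)$;
2. Distribution of $(X_1, X_2)$ where $X_1 \sim \Gamma(U, 1)$ and $X_2 \sim \Gamma(U^{-1}, 1)$ with $U \sim \mathrm{Uniform}(1.5, 2)$, denoted as Gamma-Gamma;
3. $(X_1, X_2)$ is bivariate normally distributed with $\mathrm{Var}(X_1) = \mathrm{Var}(X_2) = 1$ and $\mathrm{Cov}(X_1, X_2) = \rho$ given $\rho \sim \mathrm{Uniform}(0, 1)$, denoted as Normal-Uniform.
Two sample sizes ($n = 10, 50$) are considered. For each sample size, we generated 1,000 samples from each of the above three distributions. Approximate 90% and 95% confidence regions were calculated.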
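For the bivariate standard normal case, the $T^2$ region is an ellipse, so both its coverage and its exact area can be simulated without special libraries. The sketch below is illustrative only and is not the thesis's code: all function names, the seed and the replication count are assumptions. It relies on two standard facts: for numerator degrees of freedom 2, the F distribution satisfies $P(F > x) = (1 + 2x/m)^{-m/2}$ with $m$ denominator degrees of freedom, and the ellipse $\{v : v^T S^{-1} v \le r\}$ has area $\pi r \sqrt{\det S}$.

```python
import math
import random


def f_upper_quantile_num_df2(alpha, df2):
    """Upper-alpha quantile of the F(2, df2) distribution.

    Uses the closed form P(F > x) = (1 + 2x/df2) ** (-df2 / 2),
    which holds only when the numerator degrees of freedom equal 2.
    """
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)


def t2_ellipse(sample, alpha):
    """Return (mean, cov, radius2) for the bivariate T^2 region
    {mu : (mean - mu)^T S^{-1} (mean - mu) <= radius2}."""
    n = len(sample)
    mx = sum(x for x, _ in sample) / n
    my = sum(y for _, y in sample) / n
    sxx = sum((x - mx) ** 2 for x, _ in sample) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in sample) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in sample) / (n - 1)
    # Threshold d(n-1)/(n-d) * F_{d,n-d}(alpha) with d = 2, divided by n.
    threshold = 2.0 * (n - 1) / (n - 2) * f_upper_quantile_num_df2(alpha, n - 2)
    return (mx, my), (sxx, sxy, syy), threshold / n


def contains(mean, cov, radius2, mu):
    """Check whether mu lies inside the T^2 ellipse."""
    sxx, sxy, syy = cov
    dx, dy = mean[0] - mu[0], mean[1] - mu[1]
    det = sxx * syy - sxy * sxy
    # Quadratic form d^T S^{-1} d written out for a 2 x 2 matrix.
    quad = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return quad <= radius2


def ellipse_area(cov, radius2):
    """Area of {v : v^T S^{-1} v <= radius2}."""
    sxx, sxy, syy = cov
    return math.pi * radius2 * math.sqrt(sxx * syy - sxy * sxy)


def simulate_t2_2d(n=10, alpha=0.05, reps=20000, seed=2010):
    """Estimate coverage and average area under N(0, I_2)."""
    rng = random.Random(seed)
    hits, total_area = 0, 0.0
    for _ in range(reps):
        sample = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
        mean, cov, radius2 = t2_ellipse(sample, alpha)
        hits += contains(mean, cov, radius2, (0.0, 0.0))
        total_area += ellipse_area(cov, radius2)
    return hits / reps, total_area / reps


coverage_2d, avg_area_2d = simulate_t2_2d()
print(f"coverage = {coverage_2d:.3f}, average area = {avg_area_2d:.3f}")
```

The closed-form quantile avoids a dependency on a statistics package but applies only to $d = 2$; for other dimensions a library routine for the F quantile would be needed. The resulting average area can be compared with the $T^2$ entry for $N(0, I_2)$, $n = 10$, nominal level 0.95 in the table of region areas.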
The coverage frequencies and average areas of the confidence regions based on the various methods are summarized in Tables 4.3 and 4.4. Note that the area of each confidence region is calculated approximately.

Table 4.3: Coverage rates for two-dimensional mean (column headings give nominal level, sample size)

Distribution | Method | 0.9, n = 10 | 0.9, n = 50 | 0.95, n = 10 | 0.95, n = 50 |
---|---|---|---|---|---|
$N(0, I_2)$ | $T^2$ | 0.904 | 0.909 | 0.952 | 0.958 |
$N(0, I_2)$ | EL | 0.766 | 0.898 | 0.833 | 0.946 |
$N(0, I_2)$ | AEL | 0.883 | 0.917 | 0.966 | 0.958 |
$N(0, I_2)$ | MAEL | 0.813 | 0.914 | 0.861 | 0.956 |
$N(0, I_2)$ | EL* | 0.801 | 0.907 | 0.852 | 0.952 |
$N(0, I_2)$ | AEL* | 0.826 | 0.907 | 0.876 | 0.952 |
$N(0, I_2)$ | MAEL* | 0.795 | 0.905 | 0.848 | 0.952 |
Gamma-Gamma | $T^2$ | 0.827 | 0.877 | 0.878 | 0.930 |
Gamma-Gamma | EL | 0.708 | 0.876 | 0.766 | 0.921 |
Gamma-Gamma | AEL | 0.827 | 0.892 | 0.926 | 0.935 |
Gamma-Gamma | MAEL | 0.736 | 0.887 | 0.795 | 0.929 |
Gamma-Gamma | EL* | 0.742 | 0.887 | 0.797 | 0.928 |
Gamma-Gamma | AEL* | 0.760 | 0.884 | 0.816 | 0.928 |
Gamma-Gamma | MAEL* | 0.727 | 0.882 | 0.781 | 0.928 |
Normal-Uniform | $T^2$ | 0.920 | 0.899 | 0.959 | 0.960 |
Normal-Uniform | EL | 0.752 | 0.890 | 0.831 | 0.944 |
Normal-Uniform | AEL | 0.897 | 0.907 | 0.977 | 0.958 |
Normal-Uniform | MAEL | 0.807 | 0.896 | 0.867 | 0.953 |
Normal-Uniform | EL* | 0.797 | 0.894 | 0.858 | 0.951 |
Normal-Uniform | AEL* | 0.822 | 0.894 | 0.882 | 0.951 |
Normal-Uniform | MAEL* | 0.790 | 0.894 | 0.849 | 0.948 |

Table 4.4: Confidence region areas for two-dimensional mean (column headings give nominal level, sample size)

Distribution | Method | 0.9, n = 10 | 0.9, n = 50 | 0.95, n = 10 | 0.95, n = 50 |
---|---|---|---|---|---|
$N(0, I_2)$ | $T^2$ | 1.963 | 0.304 | 2.812 | 0.401 |
$N(0, I_2)$ | EL | 1.115 | 0.287 | 1.422 | 0.378 |
$N(0, I_2)$ | AEL | 1.821 | 0.314 | 3.511 | 0.414 |
$N(0, I_2)$ | MAEL | 1.293 | 0.306 | 1.644 | 0.401 |
$N(0, I_2)$ | EL* | 1.260 | 0.299 | 1.600 | 0.392 |
$N(0, I_2)$ | AEL* | 1.356 | 0.299 | 1.799 | 0.393 |
$N(0, I_2)$ | MAEL* | 1.211 | 0.296 | 1.537 | 0.388 |
Gamma-Gamma | $T^2$ | 1.872 | 0.316 | 2.681 | 0.417 |
Gamma-Gamma | EL | 1.031 | 0.302 | 1.307 | 0.398 |
Gamma-Gamma | AEL | 1.715 | 0.330 | 3.331 | 0.436 |
Gamma-Gamma | MAEL | 1.194 | 0.321 | 1.519 | 0.422 |
Gamma-Gamma | EL* | 1.161 | 0.318 | 1.466 | 0.419 |
Gamma-Gamma | AEL* | 1.269 | 0.319 | 1.779 | 0.420 |
Gamma-Gamma | MAEL* | 1.117 | 0.314 | 1.410 | 0.413 |
Normal-Uniform | $T^2$ | 1.676 | 0.263 | 2.401 | 0.347 |
Normal-Uniform | EL | 0.954 | 0.251 | 1.218 | 0.330 |
Normal-Uniform | AEL | 1.558 | 0.274 | 3.001 | 0.362 |
Normal-Uniform | MAEL | 1.107 | 0.267 | 1.408 | 0.351 |
Normal-Uniform | EL* | 1.081 | 0.261 | 1.374 | 0.344 |
Normal-Uniform | AEL* | 1.168 | 0.262 | 1.568 | 0.345 |
Normal-Uniform | MAEL* | 1.038 | 0.259 | 1.319 | 0.340 |

4.3 Summary

Under the standard normal model, $T^2$ has very accurate coverage rates compared to its nonparametric alternatives except the AEL. This is because the $T^2$-based confidence interval/region achieves the nominal level in theory regardless of the sample size. The performance of the nonparametric methods improves as the sample size increases. Under the other distribution models, the performance of all methods in small-sample cases is unsatisfactory. In the normal mixture model especially, the coverage rates are dramatically lower than the nominal level.

On average, the $T^2$-based confidence interval/region has larger size than its nonparametric alternatives. This makes sense since the $T^2$-based confidence interval/region has higher coverage rate. Because of this trade-off between coverage probability and the size of confidence region, it is difficult to say which method has better performance. How to evaluate the performance of a given kind of confidence region based on both the coverage probability and the region size is still an open question.

Surprisingly, the AEL keeps up with $T^2$ in terms of both coverage rate and confidence interval/region size in most cases. However, note that the AEL-based confidence interval has a substantially higher than nominal coverage rate when the sample size is 5 and the nominal level is 95% in the univariate case. This is in accordance with the discussion in Section 3.2.3. In this situation, the upper bound of the adjusted empirical likelihood ratio statistic is only slightly larger than the critical value. This results in a very long interval, which in turn possesses higher-than-expected coverage probability.

As expected, the MAEL is a compromise between the EL and the AEL. We observe that the performance of MAEL is more similar to that of EL than to that of AEL, especially in the bivariate case. This may be because the multiplier in the definition of the pseudo point in MAEL decreases to 0 very fast as $\mu$ deviates from $\bar{X}_n$; the decrease is even faster as the dimension increases.
It implies that the difference between the MAEL and EL likelihood ratio statistics is smaller than that between the MAEL and AEL likelihood ratio statistics.

The EL*, AEL* and MAEL* have similar performance because they are known to be precise up to the same order $n^{-2}$.

Chapter 5
Conclusion

The main interest of this thesis lies in the finite-sample properties of adjusted empirical likelihood and its implication to constructing confidence regions for population mean. The monotonicity property of the adjusted empirical likelihood ratio statistic guarantees that AEL-based confidence regions for population mean are at least star-shaped. This is a desirable property for confidence regions because of its intuitive interpretation. We also discovered the connection between empirical likelihood and adjusted empirical likelihood as a special case of a more general conclusion, which justified the empirical observation that AEL-based confidence regions have higher coverage probability than the corresponding EL-based confidence regions. The boundedness of the adjusted empirical likelihood ratio statistic reveals that a constant level of adjustment may produce inappropriate confidence regions when the sample size is not large enough or the nominal confidence level is too high. We attempted to modify the level of adjustment so as to obtain an unbounded likelihood ratio statistic, and showed that the proposed modification preserves both asymptotic and finite-sample properties of the original adjusted empirical likelihood.

As future research, the convexity of AEL-based confidence regions for population mean is of interest; current empirical studies support this proposition. On the other hand, the choice of $a_n(\mu)$ also remains an interesting topic. As discussed in Section 3.2.3, we may adjust the trade-off between the coverage probability and the size of confidence region for population mean.
If we can come up with a sensible criterion to evaluate a given kind of confidence region by taking both its coverage probability and its size into consideration, we may be able to find a family of $a_n(\mu)$ such that the resulting AEL-based confidence region for population mean has both asymptotic and finite-sample advantages.

Bibliography

Barndorff-Nielsen, O. E. and Cox, D. R. (1984). Bartlett adjustments to the likelihood ratio statistic and the distribution of the maximum likelihood estimator. Journal of the Royal Statistical Society, Series B, 46(3):483–495.

Chan, N. H. and Ling, S. (2006). Empirical likelihood for GARCH models. Econometric Theory, 22:403–428.

Chen, J., Chen, S.-Y., and Rao, J. N. K. (2003). Empirical likelihood confidence intervals for the mean of a population containing many zero values. The Canadian Journal of Statistics, 31(1):53–68.

Chen, J., Peng, L., and Zhao, Y. (2009). Empirical likelihood based confidence intervals for copulas. Journal of Multivariate Analysis, 100(1):137–151.

Chen, J. and Sitter, R. R. (1999). A pseudo empirical likelihood approach to the effective use of auxiliary information in complex surveys. Statistica Sinica, 9:385–406.

Chen, J., Sitter, R. R., and Wu, C. (2002). Using empirical likelihood method to obtain range restricted weights in regression estimators for surveys. Biometrika, 89(1):230–237.

Chen, J., Variyath, A. M., and Abraham, B. (2008). Adjusted empirical likelihood and its properties. Journal of Computational and Graphical Statistics, 17(2):426–443.

Corotis, R. B., Sigl, A. B., and Klein, J. (1978). Probability models of wind velocity magnitude and persistence. Solar Energy, 20(6):483–493.

DiCiccio, T., Hall, P., and Romano, J. (1991). Empirical likelihood is Bartlett-correctable. The Annals of Statistics, 19(2):1053–1061.

Emerson, S. C. and Owen, A. B. (2009). Calibration of the empirical likelihood method for a vector mean. Electronic Journal of Statistics, 3:1161–1192.

Hall, A. R. (2005). Generalized Method of Moments. Oxford University Press.

Hall, P. and La Scala, B. (1990). Methodology and algorithms of empirical likelihood. International Statistical Review, 58(2):109–127.

Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica, 50(4):1029–1054.

Hua, L. (2009). A report on an adjusted empirical likelihood. Manuscript, 16 pages.

Imbens, G. W. (2002). Generalized method of moments and empirical likelihood. Journal of Business and Economic Statistics, 20(4):493–506.

Liu, Y. and Chen, J. (2010). Adjusted empirical likelihood with high-order precision. The Annals of Statistics, 38(3):1341–1362.

Liu, Y. and Yu, C. W. (2010). Bartlett correctable two-sample adjusted empirical likelihood. Journal of Multivariate Analysis, 101(7):1701–1711.

Lun, I. Y. and Lam, J. C. (2000). A study of Weibull parameters using long-term wind observations. Renewable Energy, 20(2):145–153.

Nordman, D. J. and Caragea, P. C. (2008). Point and interval estimation of variogram models using spatial empirical likelihood. Journal of the American Statistical Association, 103(481):350–361.

Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75(2):237–249.

Owen, A. B. (1990). Empirical likelihood ratio confidence regions. The Annals of Statistics, 18(1):90–120.

Owen, A. B. (1991). Empirical likelihood for linear models. The Annals of Statistics, 19(4):1725–1747.

Owen, A. B. (2001). Empirical Likelihood. Chapman & Hall/CRC.

Qin, G. and Zhou, X.-H. (2005). Empirical likelihood inference for the area under the ROC curve. Biometrics, 62(2):613–622.

Qin, J. and Lawless, J. (1994). Empirical likelihood and general estimating equations. The Annals of Statistics, 22(1):300–325.

Qin, J. and Zhang, B. (2007). Empirical-likelihood-based inference in missing response problems and its application in observational studies. Journal of the Royal Statistical Society: Series B, 69(1):101–122.

Variyath, A. M., Chen, J., and Abraham, B. (2010). Empirical likelihood based variable selection. Journal of Statistical Planning and Inference, 140(4):971–981.

Wilks, S. S. (1938). The large-sample distribution of the likelihood ratio for testing composite hypotheses. The Annals of Mathematical Statistics, 9(1):60–62.

Zhu, H., Zhou, H., Chen, J., Li, Y., Lieberman, J., and Styner, M. (2009). Adjusted exponentially tilted likelihood with applications to brain morphology. Biometrics, 65(3):919–927.
Properties of empirical and adjusted empirical likelihoods Huang, Yi 2010
Item Metadata
Title | Properties of empirical and adjusted empirical likelihoods |
Creator |
Huang, Yi |
Publisher | University of British Columbia |
Date | 2010 |
Date Issued | 2010-08-26T17:53:40Z |
Description | Likelihood based statistical inferences have been advocated by generations of statisticians. As an alternative to the traditional parametric likelihood, empirical likelihood (EL) is appealing for its nonparametric setting and desirable asymptotic properties. In this thesis, we first review and investigate the asymptotic and finite-sample properties of the empirical likelihood, particularly its implication to constructing confidence regions for population mean. We then study the properties of the adjusted empirical likelihood (AEL) proposed by Chen et al. (2008). The adjusted empirical likelihood was introduced to overcome the shortcomings of the empirical likelihood when it is applied to statistical models specified through general estimating equations. The adjusted empirical likelihood preserves the first order asymptotic properties of the empirical likelihood and its numerical problem is substantially simplified. A major application of the empirical likelihood or adjusted empirical likelihood is the construction of confidence regions for the population mean. In addition, we discover that adjusted empirical likelihood, like empirical likelihood, has an important monotonicity property. One major discovery of this thesis is that the adjusted empirical likelihood ratio statistic is always smaller than the empirical likelihood ratio statistic. It implies that the AEL-based confidence regions always contain the corresponding EL-based confidence regions and hence have higher coverage probability. This result has been observed in many empirical studies, and we prove it rigorously. We also find that the original adjusted empirical likelihood as specified by Chen et al. (2008) has a bounded likelihood ratio statistic. This may result in confidence regions of infinite size, particularly when the sample size is small. 
We further investigate approaches to modify the adjusted empirical likelihood so that the resulting confidence regions of population mean are always bounded. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Collection |
Electronic Theses and Dissertations (ETDs) 2008+ |
Date Available | 2010-08-26 |
Provider | Vancouver : University of British Columbia Library |
DOI | 10.14288/1.0071220 |
URI | http://hdl.handle.net/2429/27819 |
Degree |
Master of Science - MSc |
Program |
Statistics |
Affiliation |
Science, Faculty of Statistics, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2010-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |