Properties of Empirical and Adjusted Empirical Likelihoods

by Yi Huang
B.Sc., University of Science and Technology of China, 2008

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The Faculty of Graduate Studies (Statistics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)
August 2010
© Yi Huang 2010

Abstract

Likelihood-based statistical inference has been advocated by generations of statisticians. As an alternative to the traditional parametric likelihood, empirical likelihood (EL) is appealing for its nonparametric setting and desirable asymptotic properties.

In this thesis, we first review and investigate the asymptotic and finite-sample properties of the empirical likelihood, particularly its implications for constructing confidence regions for the population mean. We then study the properties of the adjusted empirical likelihood (AEL) proposed by Chen et al. (2008). The adjusted empirical likelihood was introduced to overcome the shortcomings of the empirical likelihood when it is applied to statistical models specified through general estimating equations. The adjusted empirical likelihood preserves the first-order asymptotic properties of the empirical likelihood, and its numerical problem is substantially simplified.

A major application of the empirical likelihood or adjusted empirical likelihood is the construction of confidence regions for the population mean. In addition, we discover that adjusted empirical likelihood, like empirical likelihood, has an important monotonicity property.

One major discovery of this thesis is that the adjusted empirical likelihood ratio statistic is always smaller than the empirical likelihood ratio statistic. This implies that the AEL-based confidence regions always contain the corresponding EL-based confidence regions and hence have higher coverage probability. This result has been observed in many empirical studies, and we prove it rigorously.
We also find that the original adjusted empirical likelihood as specified by Chen et al. (2008) has a bounded likelihood ratio statistic. This may result in confidence regions of infinite size, particularly when the sample size is small. We further investigate approaches to modify the adjusted empirical likelihood so that the resulting confidence regions of population mean are always bounded.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
2 Empirical Likelihood
  2.1 Parametric Likelihood
  2.2 Definition of Empirical Likelihood
  2.3 Profile Empirical Likelihood of the Population Mean
  2.4 Empirical Likelihood and General Estimating Equations
  2.5 Asymptotic Properties and EL-Based Confidence Regions
  2.6 Limitations of Empirical Likelihood
    2.6.1 Under-Coverage Problem
    2.6.2 The No-Solution Problem
3 Adjusted Empirical Likelihood
  3.1 Adjusted Empirical Likelihood and AEL-Based Confidence Regions
  3.2 Finite-Sample Properties of Adjusted Empirical Likelihood
    3.2.1 Monotonicity of $W_n^*(\mu; a_n)$ in $\mu$
    3.2.2 Monotonicity of $W_n^*(\theta; a_n)$ in $a_n$
    3.2.3 Boundedness of $W_n^*(\mu; a_n)$
4 Empirical Results
  4.1 Confidence Intervals for One-Dimensional Mean
  4.2 Confidence Regions for Two-Dimensional Mean
  4.3 Summary
5 Conclusion
Bibliography

List of Tables

4.1 Coverage rates for one-dimensional mean
4.2 Confidence interval lengths for one-dimensional mean
4.3 Coverage rates for two-dimensional mean
4.4 Confidence region areas for two-dimensional mean

List of Figures

2.1 A two-dimensional example of EL-based confidence region
3.1 A two-dimensional example showing the position of the pseudo point
3.2 An example showing the effect of the pseudo point on the likelihood ratio statistic
3.3 Plot of upper bound against sample size
3.4 Plot of required sample size for various critical values
3.5 The EL-based and AEL-based 95% confidence intervals for the population mean
3.6 $W_n^*(\mu; a_n)$ as a function of $a_n$
3.7 The effect of $a_n(\mu)$
3.8 The 95% approximate confidence regions produced by EL, AEL and modified AEL

Acknowledgements

Foremost I would like to express my deep gratitude to my advisor, Professor Jiahua Chen. His inspiration, guidance, encouragement and insight helped me through the last two years. I am grateful to Dr. Matías Salibián-Barrera for serving on my examining committee and for his valuable comments and suggestions.
Finally, I would like to give my heartfelt appreciation and gratitude to my parents for their support and encouragement. This thesis is dedicated to them.

Chapter 1. Introduction

Likelihood-based statistical inference has been advocated by generations of statisticians. Let us illustrate the likelihood approach through the problem of modeling the randomness of the wind speed, which is an important covariate in weather forecasting. In meteorology, the Weibull distribution with shape and scale parameters is used to model the distribution of wind speed (Corotis et al., 1978; Lun and Lam, 2000); that is, we postulate that the distribution of the wind speed is Weibull with two unspecified parameter values. Suppose we are given a set of observations of the wind speed, and it is reasonable to assume that they are a random sample from a Weibull distribution. Based on this assumption, we may calculate the probability of obtaining the observed data, which is a function of these two parameters. This function of the parameters is called the likelihood function. The likelihood function is an effective means of summarizing the information about the unknown values of the parameters contained in the data: (1) the values that maximize the likelihood function are often used as point estimates of the unknown parameters, which are called the maximum likelihood estimates (MLEs); (2) the likelihood function can be used to perform statistical tests of hypotheses on the parameters, and to construct a confidence region for the parameters.
These likelihood-based statistical inferences possess many optimality properties under regularity conditions: (1) the MLE is asymptotically efficient in many senses and gives the intuitively best explanation of the data; (2) likelihood-based statistical tests and confidence intervals or confidence regions have good asymptotic and small-sample properties; (3) the likelihood is convenient for combining information from several data sources and for incorporating knowledge arising from outside the data, such as the domain and a prior distribution of the parameter(s).

Traditionally, the likelihood is defined through a pre-specified parametric model. However, the choice of the parametric model in some applications can be a difficult issue. In the previous example, the Weibull distribution is widely used to characterize the wind speed because it has been found to fit a wide collection of wind data in many empirical studies. If the true distribution of the wind speed cannot be fit well by a Weibull distribution, the optimality properties of the likelihood approach will be in question. In comparison, nonparametric methods for statistical inference do not require specific parametric assumptions on the shape of the population distribution. Among these methods, the empirical likelihood (EL) approach proposed by Owen (1988, 1990) has gained increasing popularity. This approach retains a likelihood setting without invoking a parametric assumption, and shares many desirable properties with the parametric likelihood.

In this thesis, we review and investigate the asymptotic and finite-sample properties of empirical likelihood. In Chapter 2, we first present a short summary of the properties of the parametric likelihood, followed by an introduction to the empirical likelihood. The profile empirical likelihood is then introduced for the population mean and for parameters defined through general estimating equations.
We also discuss the numerical algorithm for computing the empirical likelihood and some asymptotic properties of the empirical likelihood. One of the major successes of the empirical likelihood is the ease with which it yields approximate confidence regions for the parameter of interest. The EL-based confidence region possesses many advantages: it has a data-driven shape; it is invariant under parameter transformation; and it is range respecting.

On the other hand, the empirical likelihood method has a few shortcomings. The EL-based confidence regions often have lower than specified coverage probabilities, particularly when the sample size is small. This problem can be alleviated through Bartlett correction. However, the confidence region of the population mean, for instance, is confined within the convex hull of the data. In some cases, even the convex hull of the data does not have large enough coverage probability of the population mean. In addition, the empirical likelihood is not defined at certain parameter values, which may occur especially when the parameters are defined through general estimating equations. Consequently, the empirical likelihood approach may fail to make a sensible inference in a particular application.

To overcome these shortcomings of the EL approach, Chen et al. (2008) proposed an adjusted empirical likelihood (AEL). The adjusted empirical likelihood is well defined at all parameter values defined through estimating equations. It shares the same desirable first-order asymptotic properties with the empirical likelihood. Its numerical computation is much simpler and faster. In Chapter 3, we first introduce and present some properties of the adjusted empirical likelihood. We further investigate some finite-sample properties of the adjusted empirical likelihood. One major discovery is that the adjusted empirical likelihood ratio statistic is always smaller than that of the empirical likelihood.
Consequently, the AEL-based confidence regions always contain the corresponding EL-based confidence regions. Thus, it effectively rectifies the under-coverage problem suffered by the empirical likelihood when the sample size is not large. We also discovered that the adjusted empirical likelihood has a monotonicity property. Because of this, the AEL-based confidence region for the population mean is star-shaped. It also enables us to design a simple algorithm for computing confidence regions of a multivariate population mean. It also leaves open the question of whether the AEL-based confidence region for the population mean is convex. In addition, we find that the original recipe of the adjusted empirical likelihood given by Chen et al. (2008) results in a bounded likelihood ratio statistic. As a result, the AEL-based confidence regions can be unbounded. However, this problem can be easily fixed. We propose one possible modification to the adjusted empirical likelihood so that the corresponding likelihood ratio statistic becomes unbounded.

In Chapter 4, we empirically examine the performance of the foregoing methods for statistical inference about the population mean. Several settings are considered to investigate their finite-sample performance.

Chapter 2. Empirical Likelihood

Empirical likelihood (EL) is a nonparametric analogue of the classical parametric likelihood. The empirical likelihood method was first formalized in the pioneering works of Owen (1988, 1990) for statistical inference on the population mean. Qin and Lawless (1994) generalize empirical likelihood to the case where parameters are defined through general estimating equations. We call this method "empirical" because the empirical distribution, which assigns equal point mass to each data point, plays a key role in the setting of this method.
The empirical likelihood method provides a versatile approach that may be applied to perform inference for a wide variety of parameters of interest, and has been employed in a number of different areas of statistics. Qin and Zhang (2007) apply the empirical likelihood method to make a constrained likelihood estimation of mean response in missing data problems. Chen et al. (2003) consider constructing EL-based confidence intervals for the mean of a population containing many zero values in the area of survey. Chen et al. (2002) design an EL-based algorithm to determine design weights in surveys that meet pre-specified range restrictions. Chen and Sitter (1999) develop a pseudo empirical likelihood approach to incorporating auxiliary information into estimates from complex surveys. Chen et al. (2009) examine the performance of EL-based confidence intervals for copulas. Chan and Ling (2006) develop an empirical likelihood ratio test for GARCH models in time series. Qin and Zhou (2005) propose an empirical likelihood approach for constructing confidence intervals for the area under the ROC curve. Nordman and Caragea (2008) present a spatial blockwise empirical likelihood method for estimating variogram model parameters in the analysis of spatial data on a grid. Many other topics have been treated as well.

In Section 2.1, we briefly review parametric likelihood inference. Sections 2.2, 2.3 and 2.4 are devoted to summarizing the setting of empirical likelihood. In Section 2.5, some asymptotic properties of empirical likelihood are presented; we focus on results related to constructing approximate confidence regions for the parameter of interest. We also present several finite-sample properties of the EL-based confidence region. Finally, in Section 2.6 we point out the limitations of the empirical likelihood method and the corresponding remedies, which lead to the subject of Chapter 3.
2.1 Parametric Likelihood

Let $\mathcal{F} = \{f(x; \theta) : \theta \in \Theta\}$ be a collection of probability density functions with respect to some $\sigma$-finite measure, where $\theta \in \mathbb{R}^p$ is a parameter that uniquely determines the form of the density and $\Theta$ is a $p$-dimensional set of possible values for $\theta$. Suppose a random sample $X = (X_1, X_2, \ldots, X_n)$ is generated from one distribution of this probability family. Given that $X = x$, the likelihood function of $\theta$ is defined as

$$L_n(\theta \mid x) = \prod_{i=1}^{n} f(x_i; \theta).$$

The likelihood function is interpreted as the probability of obtaining the observed sample if the parameter value equals $\theta$. Hence, the likelihood function provides a way to measure the plausibility of different parameter values. If we compare the values of the likelihood function at two parameter values $\theta_1$ and $\theta_2$ and find that $L_n(\theta_1 \mid x) > L_n(\theta_2 \mid x)$, then the sample we observed is more likely to have occurred if $\theta = \theta_1$ than if $\theta = \theta_2$. That is, $\theta_1$ is a more plausible value of $\theta$ than is $\theta_2$.

Take the wind speed example, where the two-parameter Weibull distribution is postulated. The probability density function of the two-parameter Weibull distribution is given by

$$f(x; k, \lambda) = \frac{k}{\lambda} \left(\frac{x}{\lambda}\right)^{k-1} \exp\left\{-\left(\frac{x}{\lambda}\right)^{k}\right\}, \quad x \ge 0,$$

where $k > 0$ is the shape parameter and $\lambda > 0$ is the scale parameter. In this example, we have $\theta = (k, \lambda)$ and $\Theta = (0, \infty) \times (0, \infty)$. If a collection of wind data is available in the form of $n$ independent observations $x = (x_1, x_2, \ldots, x_n)$, the likelihood function will be

$$L_n(k, \lambda \mid x) = \prod_{i=1}^{n} f(x_i; k, \lambda) = \left(\frac{k}{\lambda}\right)^{n} \prod_{i=1}^{n} \left(\frac{x_i}{\lambda}\right)^{k-1} \exp\left\{-\left(\frac{x_i}{\lambda}\right)^{k}\right\} = \frac{k^n}{\lambda^{nk}} \exp\left\{-\frac{1}{\lambda^{k}} \sum_{i=1}^{n} x_i^{k} + (k-1) \sum_{i=1}^{n} \log x_i\right\}.$$

In this example, we may be interested in answering the following question: given the observed sample, what value of $\theta$ is the most plausible? Let $\hat\theta_n(x)$ be the global maximizer of $L_n(\theta \mid x)$:

$$\hat\theta_n(x) = \arg\max_{\theta} L_n(\theta \mid x).$$

We call $\hat\theta_n(x)$ the maximum likelihood estimate of $\theta$.
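As a rough numerical illustration of maximizing the Weibull likelihood above, the following sketch evaluates the log-likelihood and locates its maximizer. It is not taken from the thesis: the function names, the search interval for $k$, and the data values are all hypothetical. The sketch relies on the fact that, for fixed $k$, the likelihood is maximized at $\lambda^k = n^{-1}\sum_i x_i^k$, which leaves a single equation in $k$ that is solved here by bisection.

```python
import math

def weibull_loglik(k, lam, xs):
    """Logarithm of L_n(k, lambda | x) for the two-parameter Weibull model."""
    n = len(xs)
    return (n * math.log(k) - n * k * math.log(lam)
            + (k - 1.0) * sum(math.log(x) for x in xs)
            - sum((x / lam) ** k for x in xs))

def weibull_mle(xs, lo=1e-3, hi=100.0, tol=1e-10):
    """Maximize the likelihood by bisection on the profile score in k.

    For fixed k the maximizing scale satisfies lam^k = mean of x_i^k;
    substituting it back leaves one increasing equation in k.
    Assumes positive, not-all-equal data and a root inside [lo, hi].
    """
    n = len(xs)
    mean_log = sum(math.log(x) for x in xs) / n

    def score(k):
        s_k = sum(x ** k for x in xs)
        s_kl = sum(x ** k * math.log(x) for x in xs)
        return s_kl / s_k - 1.0 / k - mean_log

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid   # root lies to the left of mid
        else:
            lo = mid   # root lies to the right of mid
    k_hat = 0.5 * (lo + hi)
    lam_hat = (sum(x ** k_hat for x in xs) / n) ** (1.0 / k_hat)
    return k_hat, lam_hat

# Hypothetical wind speeds (made-up numbers, for illustration only).
wind = [3.2, 5.1, 4.4, 7.8, 6.0, 2.9, 5.5, 8.3, 4.0, 6.7]
k_hat, lam_hat = weibull_mle(wind)
print(k_hat, lam_hat, weibull_loglik(k_hat, lam_hat, wind))
```

Working with the logarithm of the likelihood avoids the numerical underflow that the raw product of densities would suffer for larger samples; the maximizer is unchanged.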
As a function of the random sample $X$, $\hat\theta_n = \hat\theta_n(X)$ is called the maximum likelihood estimator (MLE) of $\theta$. Intuitively, the MLE is a reasonable choice for a point estimator: the observed sample is the most likely when the MLE is the parameter value.

The MLE possesses two important properties by its construction. Firstly, the MLE is range respecting: the range of the MLE coincides with the range of the parameter. Secondly, the MLE is invariant under parameter transformation. Suppose a distribution family is indexed by a parameter $\theta$, but the interest lies in finding an estimator of some function of $\theta$, say $\eta(\theta)$. If $\hat\theta_n$ is the MLE of $\theta$, then $\eta(\hat\theta_n)$ is the MLE of $\eta(\theta)$. The second property allows us to study a parameter that does not appear in the density function. In the wind speed example, we may be interested in the mean of the wind speed, say $\mu$. Note that $\mu$ can be expressed as a function of $k$ and $\lambda$: $\mu = \lambda \Gamma(1 + k^{-1})$, where $\Gamma(\cdot)$ is the gamma function. Hence, if $\hat{k}$ and $\hat\lambda$ are the MLEs of $k$ and $\lambda$ respectively, then $\hat\mu = \hat\lambda \Gamma(1 + \hat{k}^{-1})$ is the MLE of $\mu$.

The MLE possesses many nice asymptotic properties under some mild conditions on $f(x; \theta)$. Firstly, the MLE is a consistent estimator of the parameter, i.e. it converges to the true parameter value almost surely as the sample size increases. Secondly, the MLE is asymptotically efficient in the sense that its asymptotic variance attains the Cramér-Rao bound as the sample size tends to infinity.

In applications, we may prefer a guess of a region of parameter values to a guess of a single parameter value. We can imagine that those parameter values that are slightly "different" from the MLE are also good candidates for the true parameter value. The likelihood function can be used to quantify the "difference" between any parameter value and the MLE. According to the definition of the MLE, the likelihood ratio $R_n(\theta \mid x) = L_n(\theta \mid x) / L_n(\hat\theta_n \mid x)$ never exceeds 1.
Thus, we may choose some constant $c \in (0, 1)$ and claim that the true parameter value is likely contained in the following region of parameter values:

$$CR_c = \{\theta : R_n(\theta \mid x) \ge c\}. \tag{2.1}$$

The purpose of using a region estimator rather than a point estimator is to have some guarantee of capturing the true value of the parameter. The certainty of this guarantee is quantified by the probability of $CR_c$ covering the true parameter value, $\Pr\{\theta \in CR_c\} = \Pr\{R_n(\theta \mid X) \ge c\}$. With this in mind, we may choose the constant $c$ such that $CR_c$ has a pre-specified coverage probability. The guaranteed coverage probability is also called the confidence level of $CR_c$. Thus, we need to know the distribution of $R_n(\theta \mid X)$.

In general, the exact distribution of $R_n(\theta \mid X)$ is hard to determine. Wilks (1938) proves that under some mild conditions $-2 \log R_n(\theta \mid X)$ converges to $\chi^2_p$ in distribution as $n \to \infty$, provided that the true parameter value is $\theta$. This asymptotic result is known as Wilks' theorem. Using this $\chi^2$ approximation, we may choose $c$ in equation (2.1) to be $\exp\{-\chi^2_p(\alpha)/2\}$, where $\chi^2_p(\alpha)$ denotes the upper $\alpha$ quantile of $\chi^2_p$, for small $\alpha$. The resulting approximate $100(1-\alpha)\%$ confidence region for $\theta$ is

$$CR = \{\theta : R_n(\theta \mid x) \ge \exp\{-\chi^2_p(\alpha)/2\}\} = \{\theta : -2 \log R_n(\theta \mid x) \le \chi^2_p(\alpha)\}.$$

Similar to the MLE, the foregoing confidence region is also range respecting and invariant under parameter transformation to some degree. In the example of the wind speed, if $CR$ is an approximate 95% confidence region for the parameter $(k, \lambda)$, then $CR' = \{\lambda \Gamma(1 + k^{-1}) : (k, \lambda) \in CR\}$ is a confidence region for the mean $\mu$ with approximate confidence level at least 95%.

As is widely recognized, statistical inference based on parametric likelihood carries its own risk: if the true distribution deviates from the parametric distribution that we assume for the data, the foregoing nice properties of these inferences on the parameter of interest may be lost.
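To make the Wilks-based construction of the confidence region concrete, here is a minimal sketch for a deliberately simple case. It assumes a one-parameter exponential model with mean $\theta$ (so $p = 1$), which is not a model used in the thesis and is chosen only because $-2\log R_n(\theta \mid x) = 2n[\log(\theta/\bar{x}) + \bar{x}/\theta - 1]$ has a closed form. The function names and data are hypothetical, and the critical value is the upper 0.05 quantile of $\chi^2_1$.

```python
import math

# Upper 0.05 quantile of the chi-square distribution with 1 degree of freedom.
CHI2_1_UPPER_05 = 3.841458820694124

def neg2_log_lr(theta, xs):
    """-2 log R_n(theta | x) for an exponential model with mean theta."""
    n = len(xs)
    xbar = sum(xs) / n  # the MLE of theta under this model
    return 2.0 * n * (math.log(theta / xbar) + xbar / theta - 1.0)

def wilks_interval(xs, crit=CHI2_1_UPPER_05, tol=1e-10):
    """Approximate confidence interval {theta : -2 log R_n(theta) <= crit}."""
    xbar = sum(xs) / len(xs)

    def boundary(inside, outside):
        # Bisection between a point inside the region and one outside it.
        while abs(outside - inside) > tol:
            mid = 0.5 * (inside + outside)
            if neg2_log_lr(mid, xs) <= crit:
                inside = mid
            else:
                outside = mid
        return inside

    # Step away from the MLE in each direction until the statistic exceeds crit.
    lower_out = xbar
    while neg2_log_lr(lower_out, xs) <= crit:
        lower_out /= 2.0
    upper_out = xbar
    while neg2_log_lr(upper_out, xs) <= crit:
        upper_out *= 2.0
    return boundary(xbar, lower_out), boundary(xbar, upper_out)

sample = [0.8, 2.5, 1.1, 3.9, 0.4, 2.2, 1.7, 5.0]  # made-up observations
lower, upper = wilks_interval(sample)
print(lower, upper)
```

The bisection is valid here because, for this model, the statistic decreases on $(0, \bar{x})$ and increases on $(\bar{x}, \infty)$, so the region is an interval containing the MLE.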
The difficulties in choosing a parametric family make many statisticians turn to nonparametric methods for statistical inference. These nonparametric methods include the jackknife, the infinitesimal jackknife, the bootstrap method, and the empirical likelihood method. Each nonparametric method has its own advantages, but most of them lack a likelihood setting. The empirical likelihood method stands out since it combines the reliability of the nonparametric methods with the flexibility and effectiveness of the likelihood approach.

2.2 Definition of Empirical Likelihood

In this section, we present the setting of empirical likelihood. Suppose $X_1, X_2, \ldots, X_n$ are independent and identically distributed (i.i.d.) $d$-dimensional random vectors with unknown distribution $F_0$ for some $d \ge 1$. The empirical likelihood of any distribution $F$ is defined as

$$L_n(F) = \prod_{i=1}^{n} F(\{X_i\}),$$

where $F(A)$ is $\Pr(X \in A)$ for $X \sim F$ and $A \subseteq \mathbb{R}^d$.

The definition of empirical likelihood is a direct analogue of parametric likelihood: the probability of observing the sample under the assumed distribution. The major difference between empirical likelihood and parametric likelihood is that the former is defined over a very broad range of distributions. That is, there are practically no restrictions on the shape of the distribution under consideration. The name "empirical likelihood" is adopted because the empirical distribution of the sample plays a key role in the setting of empirical likelihood. The empirical distribution is defined as

$$F_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i},$$

where $\delta_x$ denotes the distribution under which $\Pr(X = x) = 1$. The empirical likelihood is maximized at the empirical distribution.

Proposition 2.1. Suppose $X_1, X_2, \ldots, X_n \in \mathbb{R}^d$ for some $d \ge 1$ are independent random vectors with a common distribution $F_0$ and $F_n$ is the corresponding empirical distribution. For any distribution $F \ne F_n$, we have $L_n(F) < L_n(F_n)$.

Proof.
Let $p_i = F(\{X_i\})$ for $i = 1, 2, \ldots, n$. It is easy to see that $p_i \ge 0$ and $\sum_{i=1}^{n} p_i \le 1$. Using the well-known fact that the arithmetic mean of a sequence of nonnegative numbers is always larger than or equal to its geometric mean, we have

$$L_n(F) = \prod_{i=1}^{n} p_i \le \left(\frac{1}{n} \sum_{i=1}^{n} p_i\right)^{n} \le n^{-n}. \tag{2.2}$$

Equality holds throughout (2.2) if and only if all the $p_i$ are equal and $\sum_{i=1}^{n} p_i = 1$. This inequality implies that $L_n(F)$ attains its maximum $n^{-n}$ at $F = F_n$.

By analogy with the definition of the MLE under a parametric model, we say that the empirical distribution $F_n$ is the maximum empirical likelihood estimate (MELE) of the distribution $F$. In this spirit, the MELE of the population mean $\mu = \int x \, dF(x)$ is $\hat\mu_n = \int x \, dF_n(x) = \sum_{i=1}^{n} X_i / n = \bar{X}_n$, which is the sample mean. The properties of $\bar{X}_n$ under some mild conditions have already been well studied: $\bar{X}_n$ is an unbiased and consistent estimator of $\mu$; it has the smallest asymptotic variance among all the unbiased estimators of $\mu$; it is asymptotically normally distributed; and so on. In this thesis, we mainly focus on the problem of constructing confidence regions for $\mu$ through the empirical likelihood. For this purpose, we introduce the profile empirical likelihood in the next section.

2.3 Profile Empirical Likelihood of the Population Mean

Let $X_1, X_2, \ldots, X_n$ be i.i.d. $d$-dimensional random vectors with unknown distribution $F$. By analogy with Wilks' theorem, we may also use the ratio of the empirical likelihood as a basis for constructing confidence regions. The empirical likelihood ratio for a distribution $F$ is defined as

$$R_n(F) = \frac{L_n(F)}{L_n(F_n)}.$$

By Proposition 2.1, $R_n(F) \le 1$ and equality holds if and only if $F = F_n$.

Recall that the population mean is a functional of the population distribution. The likeliness of a specific value of $\mu$ can be inferred from this relationship. In the literature of empirical likelihood, we define the profile empirical likelihood of $\mu$ as

$$L_n(\mu) = \sup\left\{L_n(F) : \int x \, dF(x) = \mu\right\}. \tag{2.3}$$

By analogy with the parametric likelihood ratio, we may define the profile empirical likelihood ratio function of $\mu$ as

$$R_n(\mu) = \frac{L_n(\mu)}{L_n(\bar{X}_n)} = \sup\left\{R_n(F) : \int x \, dF(x) = \mu\right\}. \tag{2.4}$$

Yet without requiring the support of $F$ to be confined within the set of observed values of $X$, this profile empirical likelihood ratio for the population mean always equals 1. We illustrate this point as follows. For any given $\mu$, let $\varepsilon$ be a positive constant smaller than 1 and

$$x_\mu = \frac{1}{\varepsilon} \mu - \frac{1 - \varepsilon}{\varepsilon} \bar{X}_n.$$

We construct a mixture distribution $F_{\mu,\varepsilon} = (1 - \varepsilon) F_n + \varepsilon \, \delta_{x_\mu}$. Note that the mean of $F_{\mu,\varepsilon}$ is $(1 - \varepsilon) \bar{X}_n + \varepsilon x_\mu = \mu$ and

$$R_n(F_{\mu,\varepsilon}) = \frac{L_n(F_{\mu,\varepsilon})}{L_n(F_n)} = \frac{[(1 - \varepsilon)/n]^n}{(1/n)^n} = (1 - \varepsilon)^n.$$

Hence for any pre-specified value of $\mu$, $R_n(F_{\mu,\varepsilon})$ can be made arbitrarily close to 1 as long as $\varepsilon$ is sufficiently small. As a result, $R_n(\mu)$ defined by equation (2.4) always equals 1 for any $\mu$. Hence, this definition of the profile likelihood ratio function is not useful for constructing confidence regions.

The above problem can be easily solved by requiring the support of $F$ to be contained in the set of observed values of $X$. As proposed by Owen (1988), the empirical likelihood ratio will be profiled for a parameter over only the distributions with support on the data set. In other words, only distributions such that $p_i = F(\{X_i\}) > 0$ and $\sum_{i=1}^{n} p_i = 1$ will be considered. We denote such a distribution $F$ as $F \ll F_n$. The definition of the profile empirical likelihood for $\mu$ in the literature is given by

$$L_n(\mu) = \sup\left\{L_n(F) : F \ll F_n, \int x \, dF(x) = \mu\right\} = \sup\left\{\prod_{i=1}^{n} p_i : p_i > 0, \sum_{i=1}^{n} p_i = 1, \sum_{i=1}^{n} p_i X_i = \mu\right\}. \tag{2.5}$$

Without further clarification, we refer to "profile empirical likelihood" as "empirical likelihood" from now on. Under this definition, $L_n(\mu)$ is maximized at the sample mean $\bar{X}_n$.
We naturally define the profile empirical likelihood ratio for $\mu$ as

$$R_n(\mu) = \frac{L_n(\mu)}{L_n(\bar{X}_n)} = \sup\left\{\frac{L_n(F)}{L_n(F_n)} : F \ll F_n, \int x \, dF(x) = \mu\right\} = \sup\left\{\prod_{i=1}^{n} n p_i : p_i > 0, \sum_{i=1}^{n} p_i = 1, \sum_{i=1}^{n} p_i X_i = \mu\right\}. \tag{2.6}$$

For the convenience of discussing asymptotic properties, we prefer working with the profile empirical likelihood ratio statistic defined as

$$W_n(\mu) = -2 \log R_n(\mu).$$

Because $R_n(\mu)$ is the maximum value of $\prod_{i=1}^{n} n p_i$ subject to some constraints, $W_n(\mu)$ is the minimum value of $-2 \sum_{i=1}^{n} \log(n p_i)$ subject to the same constraints. We will refer to a set of weights $\{p_i\}_{i=1}^{n}$ that satisfy these constraints as sub-optimal weights for $W_n(\mu)$. The mean constraint may also be written as

$$\sum_{i=1}^{n} p_i (X_i - \mu) = 0. \tag{2.7}$$

The calculation of $R_n(\mu)$ and $W_n(\mu)$ at a given parameter value amounts to solving a constrained optimization problem. Lagrange's method is well suited to this situation. Take $W_n(\mu)$ as an example. Since minimizing $-2 \sum_{i=1}^{n} \log(n p_i)$ is equivalent to maximizing $\sum_{i=1}^{n} \log(n p_i)$, let us define

$$H(p_1, p_2, \ldots, p_n; \lambda, \eta) = \sum_{i=1}^{n} \log(n p_i) - n \lambda^T \left[\sum_{i=1}^{n} p_i (X_i - \mu)\right] + \eta \left(\sum_{i=1}^{n} p_i - 1\right)$$

with $\lambda \in \mathbb{R}^d$ and $\eta \in \mathbb{R}$ being the Lagrange multipliers. Setting the derivatives of $H$ with respect to $\lambda$ and $\eta$ to zero, we recover the two equality constraints on the $p_i$. Differentiating $H$ with respect to $p_i$ and setting the derivatives equal to zero, we get

$$0 = \frac{\partial H}{\partial p_i} = \frac{1}{p_i} - n \lambda^T (X_i - \mu) + \eta. \tag{2.8}$$

Multiplying the above equation by $p_i$ and summing over $i$, with the help of the two constraints, we get

$$0 = \sum_{i=1}^{n} p_i \frac{\partial H}{\partial p_i} = n + \eta.$$

This gives $\eta = -n$. Substituting this result into equation (2.8) gives the optimal weights

$$p_i = \frac{1}{n} \cdot \frac{1}{1 + \lambda^T (X_i - \mu)}, \quad i = 1, \ldots, n. \tag{2.9}$$

The value of $\lambda$ can be computed through the constraint

$$\sum_{i=1}^{n} p_i (X_i - \mu) = \sum_{i=1}^{n} \frac{1}{n} \cdot \frac{X_i - \mu}{1 + \lambda^T (X_i - \mu)} = 0.$$

Equivalently, we have

$$\sum_{i=1}^{n} \frac{X_i - \mu}{1 + \lambda^T (X_i - \mu)} = 0. \tag{2.10}$$

The above equation can be easily solved numerically.
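The following sketch shows one way to solve equation (2.10) numerically in the scalar case $d = 1$, recover the weights (2.9), and evaluate $W_n(\mu) = -2\sum_i \log(n p_i)$. It uses plain bisection, chosen here for simplicity and not necessarily the algorithm used by the author; the function names and data are hypothetical. It assumes $\mu$ lies strictly between the smallest and largest observation, since otherwise no positive weights satisfy (2.7).

```python
import math

def el_weights(xs, mu, tol=1e-12):
    """Optimal weights p_i of equation (2.9) for a scalar mean mu.

    Requires min(xs) < mu < max(xs); otherwise no positive weights
    satisfy the mean constraint (2.7).
    """
    n = len(xs)
    z = [x - mu for x in xs]
    if not (min(z) < 0.0 < max(z)):
        raise ValueError("mu must lie strictly inside the range of the data")

    def lhs(lam):  # left-hand side of equation (2.10)
        return sum(zi / (1.0 + lam * zi) for zi in z)

    # lam must keep every 1 + lam * z_i positive, which gives this interval.
    lo = -1.0 / max(z)
    hi = -1.0 / min(z)
    # lhs is strictly decreasing on (lo, hi), so bisection finds its root.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lhs(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return [1.0 / (n * (1.0 + lam * zi)) for zi in z], lam

def el_ratio_stat(xs, mu):
    """W_n(mu) = -2 * sum(log(n * p_i)) evaluated at the optimal weights."""
    p, _ = el_weights(xs, mu)
    n = len(xs)
    return -2.0 * sum(math.log(n * pi) for pi in p)

xs = [1.2, 0.7, 3.1, 2.4, 1.9, 0.3, 2.8]  # made-up observations
p, lam = el_weights(xs, 1.5)
print(sum(p), sum(pi * xi for pi, xi in zip(p, xs)), el_ratio_stat(xs, 1.5))
```

The printed sums should be close to 1 and to the hypothesized mean 1.5 respectively, which is a quick check that the computed weights satisfy both equality constraints.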
From now on, we will refer to the weights given by equation (2.9) as the optimal weights for $W_n(\mu)$. Once the value of $\lambda$ is obtained, we can compute $W_n(\mu)$ through

$$W_n(\mu) = 2 \sum_{i=1}^{n} \log[1 + \lambda^T (X_i - \mu)]. \tag{2.11}$$

The primal constrained optimization problem for $W_n(\mu)$ must work on $n$ variables $p_1, p_2, \ldots, p_n$. Equation (2.11) shows that $W_n(\mu)$ has a simple analytic expression, and the constrained optimization problem is reduced to finding an appropriate root $\lambda$ of equation (2.10). This simple expression of $W_n(\mu)$ has two advantages. Firstly, it provides a feasible approach to calculating $W_n(\mu)$ numerically. Chen et al. (2002) propose a modified Newton's algorithm for finding the root of equation (2.10), whose algorithmic convergence is guaranteed when the solution exists. Secondly, this expression helps us study the asymptotic behavior of $W_n(\mu)$. In the investigation of the asymptotic properties of $W_n(\mu)$, the property of $\lambda$ plays a key role.

2.4 Profile Empirical Likelihood for Parameters Defined Through General Estimating Equations

In addition to making inference on population means, empirical likelihood finds many applications to parameters defined in a nonparametric way. For instance, Owen (1991) applies empirical likelihood to make inference on the regression coefficients in linear models. In general, we can often define some parameters of interest through the so-called "general estimating equations". For a random variable $X \sim F$, a $p$-dimensional parameter $\theta$ can be defined as the solution to

$$E_F[g(X; \theta)] = 0 \tag{2.12}$$

for some $q$-dimensional mapping $g(X; \theta)$ with $q \ge p$. The above system is known as the general estimating equation (GEE), and $g(x; \theta)$ is called the estimating function. When $g(x; \theta) = x - \theta$, the parameter $\theta$ is the mean of $X$. When $g(x; \theta) = I(x \le \theta) - \alpha$ for some $\alpha \in (0, 1)$, $\theta$ is the $\alpha$ quantile of $X$.

The classic setting of GEE has $q = p$. Given a simple random sample $X_1, X_2, \ldots, X_n$, an estimator of $\theta$, say $\hat\theta$, can be obtained as the solution to

$$E_{F_n}[g(X; \theta)] = \frac{1}{n} \sum_{i=1}^{n} g(X_i; \theta) = 0. \tag{2.13}$$

Since $F_n$ is the MELE of $F$, this estimator is the MELE of $\theta$. In econometrics applications, however, most interest attaches to the case of over-identification with $q > p$ (Imbens, 2002; Hansen, 1982; Hall, 2005). In this case, equation (2.13) may not have any solutions.

Empirical likelihood provides a natural approach to overcome this problem. Qin and Lawless (1994) develop a theory for EL-based statistical inference for parameters defined through general estimating functions. They propose to define the profile empirical likelihood for $\theta$ as

$$L_n(\theta) = \max\left\{\prod_{i=1}^{n} p_i : p_i > 0, \sum_{i=1}^{n} p_i = 1, \sum_{i=1}^{n} p_i \, g(X_i; \theta) = 0\right\}.$$

The corresponding profile empirical likelihood ratio for $\theta$ becomes

$$R_n(\theta) = \frac{L_n(\theta)}{L_n(F_n)} = \max\left\{\prod_{i=1}^{n} n p_i : p_i > 0, \sum_{i=1}^{n} p_i = 1, \sum_{i=1}^{n} p_i \, g(X_i; \theta) = 0\right\}. \tag{2.14}$$

The likelihood ratio statistic is then given by

$$W_n(\theta) = -2 \log R_n(\theta).$$

Similar to the case of the population mean discussed in Section 2.3, $W_n(\theta)$ can be written as

$$W_n(\theta) = 2 \sum_{i=1}^{n} \log[1 + \lambda^T g(X_i; \theta)] \tag{2.15}$$

with $\lambda$ being the solution to

$$\sum_{i=1}^{n} \frac{g(X_i; \theta)}{1 + \lambda^T g(X_i; \theta)} = 0. \tag{2.16}$$

In the framework of general estimating equations, the MELE of $\theta$, which is defined as the maximum point of $L_n(\theta)$, is not as simple as that of the population mean. One of the main contributions of Qin and Lawless (1994) is that they demonstrate the asymptotic normality of the MELE of $\theta$ under some regularity conditions on the estimating function. In addition, they justify the use of the empirical likelihood ratio statistic for testing or obtaining confidence regions for parameters in a completely analogous way to the parametric likelihood approach. But the main interest of this thesis lies in statistical inference for the population mean, so we will not explore this topic further here.
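As a small illustration of equations (2.15) and (2.16), the sketch below evaluates $W_n(\theta)$ for a scalar estimating function, using the quantile example $g(x; \theta) = I(x \le \theta) - \alpha$ mentioned in this section. It is a simplified sketch under stated assumptions rather than the method of Qin and Lawless (1994): it handles only $q = 1$, solves (2.16) by bisection, and uses hypothetical function names and made-up data. It raises an error when zero is not strictly inside the range of the values $g(X_i; \theta)$, since then no positive weights can satisfy the constraint.

```python
import math

def el_stat_scalar_gee(g_values, tol=1e-12):
    """W_n(theta) from (2.15) for a scalar estimating function.

    g_values holds g(X_i; theta) for i = 1, ..., n. Raises ValueError when
    the values do not straddle zero, in which case the profile empirical
    likelihood is not defined at this theta.
    """
    if not (min(g_values) < 0.0 < max(g_values)):
        raise ValueError("0 is not inside the range of g(X_i; theta)")

    def lhs(lam):  # left-hand side of equation (2.16)
        return sum(gi / (1.0 + lam * gi) for gi in g_values)

    # Keep every 1 + lam * g_i positive; lhs is decreasing on this interval.
    lo = -1.0 / max(g_values)
    hi = -1.0 / min(g_values)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lhs(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + lam * gi) for gi in g_values)

def quantile_g(x, theta, alpha):
    """Estimating function g(x; theta) = I(x <= theta) - alpha."""
    return (1.0 if x <= theta else 0.0) - alpha

data = [0.4, 1.1, 1.9, 2.3, 2.8, 3.5, 4.2, 4.9, 5.6, 6.3]  # made-up values
theta = 2.0  # candidate median; 3 of the 10 points lie at or below it
w = el_stat_scalar_gee([quantile_g(x, theta, 0.5) for x in data])
print(w)
```

For this particular estimating function the optimal weights are constant within each of the two groups of observations (those at or below $\theta$ and those above it), so the numerical value can be checked by hand against $-2[k \log(n\alpha/k) + (n-k)\log(n(1-\alpha)/(n-k))]$, where $k$ is the number of observations at or below $\theta$.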
2.5 Asymptotic Properties of Empirical Likelihood and EL-Based Confidence Regions

The most impressive result in Owen (1988, 1990) is the following limiting distribution of $W_n(\mu)$.

Theorem 2.2. Let $X_1, X_2, \ldots, X_n$ be a simple random sample from some $d$-dimensional population $X$, and let $W_n(\mu)$ be the empirical likelihood ratio statistic for the population mean $\mu$. If the variance-covariance matrix of $X$ is positive definite and the true value of $\mu$ is $\mu_0$, then

$$W_n(\mu_0) \xrightarrow{d} \chi^2_d, \quad \text{as } n \to \infty. \tag{2.17}$$

Theorem 2.2 suggests an approximate $100(1-\alpha)\%$ confidence region for $\mu$ of the form

$$CR_\alpha = \{\mu \mid W_n(\mu) \le \chi^2_d(\alpha)\},$$

where $\chi^2_d(\alpha)$ is the upper $\alpha$ quantile of $\chi^2_d$. Hall and La Scala (1990) and Owen (2001) point out that the confidence region $CR_\alpha$ is always convex. This is clearly a useful property for practitioners. We summarize their result as Proposition 2.3 with a simple proof.

Proposition 2.3. Let $X_1, X_2, \ldots, X_n$ be a random sample from some population $X$ and let $W_n(\mu)$ be the empirical likelihood ratio statistic for the population mean. Suppose $\mu_1 \neq \mu_2$, $\mu_1, \mu_2 \in CR_\alpha$, and $\mu$ is a convex combination of $\mu_1$ and $\mu_2$. Then $\mu \in CR_\alpha$.

Proof. Let $\{p_i\}_{i=1}^{n}$ and $\{q_i\}_{i=1}^{n}$ be the optimal weights for $W_n(\mu_1)$ and $W_n(\mu_2)$, respectively. For any $\mu = \xi \mu_1 + (1-\xi) \mu_2$ with $0 \le \xi \le 1$, it is easy to verify that $\{r_i = \xi p_i + (1-\xi) q_i\}_{i=1}^{n}$ are sub-optimal weights for $L_n(\mu)$. Note also that, by the weighted arithmetic-geometric mean inequality, $\xi p_i + (1-\xi) q_i \ge p_i^{\xi} q_i^{1-\xi}$ for $i = 1, 2, \ldots, n$. Hence, we have

$$L_n(\mu) \ge \prod_{i=1}^{n} r_i = \prod_{i=1}^{n} [\xi p_i + (1-\xi) q_i] \ge \prod_{i=1}^{n} p_i^{\xi} q_i^{1-\xi} = [L_n(\mu_1)]^{\xi} [L_n(\mu_2)]^{1-\xi}.$$

Since $W_n(\mu) = -2 \log[n^n L_n(\mu)]$, it follows that

$$W_n(\mu) \le \xi W_n(\mu_1) + (1-\xi) W_n(\mu_2) \le \xi \chi^2_d(\alpha) + (1-\xi) \chi^2_d(\alpha) = \chi^2_d(\alpha).$$

By the definition of $CR_\alpha$, we conclude that $\mu \in CR_\alpha$. This completes the proof.

It follows from this proposition that $W_n(\mu)$ has a monotonicity property along any ray starting from $\bar X_n$.

Corollary 2.4. Assume the same conditions as in Proposition 2.3.
Let $v$ be a $d$-dimensional unit vector and consider the half line defined by $\bar X_n + t v$ for $t > 0$. Then $W_n(\bar X_n + t v)$ is an increasing function of $t$.

Proof. For any $0 < t_1 < t_2$, let $\mu_1 = \bar X_n + t_1 v$ and $\mu_2 = \bar X_n + t_2 v$. Note that $\mu_1$ is a convex combination of $\mu_2$ and $\bar X_n$. Consider the confidence region for $\mu$ given by $CR = \{\mu : W_n(\mu) \le W_n(\mu_2)\}$. Note that $\mu_2$ and $\bar X_n$ always fall inside this region. By Proposition 2.3, $\mu_1$ also belongs to $CR$ and thus $W_n(\mu_1) \le W_n(\mu_2)$.

Corollary 2.4 justifies a simple algorithm for numerically finding the boundary of the confidence region. We briefly describe this algorithm in the case of a bivariate mean:

1. Choose a sufficiently dense sequence of angles from 0 to $2\pi$, for example, an arithmetic sequence from 0 to $2\pi$ with common difference $2\pi/M$ for some sufficiently large positive integer $M$.

2. Along each direction defined by a unit vector $\Phi_m = (\cos\phi_m, \sin\phi_m)^T$, with $\phi_m$ being an angle selected in Step 1, search for a positive real number $t_m$ such that $W_n(\bar X_n + t_m \Phi_m)$ is sufficiently close to the critical value determined by the $\chi^2$ approximation. Since $W_n(\mu)$ is increasing along any direction starting from $\bar X_n$, as asserted in Corollary 2.4, a simple bisection algorithm is effective.

3. Let $\{\bar X_n + t_m \Phi_m, m = 1, 2, \ldots, M\}$ be the boundary points obtained in Step 2. With this set of points, we can not only visualize the confidence region through a two-dimensional plot, but also calculate its approximate area. Clearly, the approximation improves as $M$ gets larger, but the exact accuracy is difficult to determine.

The EL-based confidence region has many celebrated properties. We summarize some of them as follows:

1. The EL-based confidence region has a data-driven shape. Figure 2.1 shows the boundary of the EL-based 95% confidence region for the bivariate mean based on a data set of 10 observations generated from a bivariate gamma distribution.
It is seen that the shape of the confidence region based on the widely used normal approximation is predetermined even before the data are available. On the contrary, the EL-based confidence region automatically reflects the emphasis in the observed data set. This is an appealing property to many practitioners since it upholds the principle of "letting the data speak."

2. The EL-based confidence region is range respecting and transformation invariant. For example, the confidence interval for the correlation always lies between $-1$ and $1$.

3. The EL-based confidence region is Bartlett-correctable. In both parametric likelihood based and EL-based confidence regions, we select the critical value using the limiting distribution of the likelihood ratio statistic. Such approximations introduce error into the coverage accuracy of the resulting confidence regions: the actual coverage probability of the confidence region does not exactly agree with the nominal level. In the parametric setting, the coverage accuracy can be improved by the so-called Bartlett correction on the likelihood ratio statistic (Barndorff-Nielsen and Cox, 1984). As shown by Diciccio et al. (1991), the empirical likelihood ratio statistic is also Bartlett correctable. We will discuss this further in Section 2.6.1.

Figure 2.1: A two-dimensional example of EL-based confidence region (legend: sample points, $\bar X_n$, EL, Hotelling's $T^2$).

2.6 Limitations of Empirical Likelihood

While empirical likelihood has many nice properties, as shown in Section 2.5, there are situations where its application meets practical obstacles. In this section, we discuss two related issues which lead to the adjusted empirical likelihood in Chapter 3.

2.6.1 Under-Coverage Problem

Theorem 2.2 suggests using the limiting distribution of $W_n(\mu)$ to calibrate the EL-based confidence region for $\mu$.
As expected, the coverage probability of the resulting confidence region does not exactly match the pre-specified confidence level. Diciccio et al. (1991) show that

$$\Pr\{\mu_0 \in CR_\alpha\} = \Pr\{W_n(\mu_0) \le \chi^2_d(\alpha)\} = 1 - \alpha + O(n^{-1}). \tag{2.18}$$

Simulation results reveal that EL-based confidence regions suffer from the so-called "under-coverage" problem. That is, their coverage probability is lower than the nominal level, particularly when the sample size is small or the population is skewed. Diciccio et al. (1991) prove that the empirical likelihood is Bartlett correctable; a simple correction on $W_n(\mu_0)$ can improve the approximating precision given in equation (2.18) from $O(n^{-1})$ to $O(n^{-2})$. Empirical studies reveal that the Bartlett correction significantly improves the coverage rate of EL-based confidence regions.

The error in the approximation is partly attributable to the fact that the expectation of $W_n(\mu_0)$ does not match the expectation of the corresponding limiting distribution. Thus, the coverage accuracy may be improved by rescaling $W_n(\mu)$. Asymptotically, it is found that

$$E[W_n(\mu)] = d \left[ 1 + \frac{b}{n} + O(n^{-2}) \right]$$

with $b$ being a constant depending on the first four moments of the population $X$. By applying the $\chi^2$ approximation to $W_n(\mu)/(1 + b n^{-1})$, the resulting confidence region has higher precision; the coverage error is reduced from order $n^{-1}$ to order $n^{-2}$. More precisely,

$$\Pr\left\{ \frac{W_n(\mu_0)}{1 + b/n} \le \chi^2_d(\alpha) \right\} = \Pr\left\{ W_n(\mu_0) \le \chi^2_d(\alpha) \left( 1 + \frac{b}{n} \right) \right\} = 1 - \alpha + O(n^{-2}). \tag{2.19}$$

The asymptotic derivation of equation (2.19) is long and complex. The details can be found in Diciccio et al. (1991).

The constant $b$ in equation (2.19) is called the Bartlett correction factor. Its value depends on the first four moments of the population distribution and must be estimated from the data in applications. Replacing $b$ by a $\sqrt{n}$-consistent estimator in equation (2.19) does not affect the theoretical result.
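To make equation (2.19) concrete, the sketch below rescales a $\chi^2$ critical value by $1 + \hat b/n$ for a univariate mean. The moment formula used for $\hat b$, namely $b = \mu_4/(2\mu_2^2) - \mu_3^2/(3\mu_2^3)$ with $\mu_k$ the $k$-th central moment, is not stated in this section; it is quoted here as an assumption from the empirical likelihood literature for the univariate mean and should be checked against Diciccio et al. (1991) before use. The function names are hypothetical, and plugging in sample moments is only one possible estimator.

```python
def bartlett_factor_univariate(x):
    """Plug-in estimate of the Bartlett factor b for a univariate mean.

    ASSUMPTION: b = m4 / (2 * m2**2) - m3**2 / (3 * m2**3), where m_k is
    the k-th central moment. This formula is not given in the text above.
    """
    n = len(x)
    mean = sum(x) / n
    m2 = sum((xi - mean) ** 2 for xi in x) / n
    m3 = sum((xi - mean) ** 3 for xi in x) / n
    m4 = sum((xi - mean) ** 4 for xi in x) / n
    return 0.5 * m4 / m2 ** 2 - m3 ** 2 / (3.0 * m2 ** 3)


def bartlett_corrected_threshold(chi2_crit, b_hat, n):
    """Critical value chi2_crit * (1 + b/n), as in equation (2.19)."""
    return chi2_crit * (1.0 + b_hat / n)


if __name__ == "__main__":
    sample = [-2.0, -1.0, 0.0, 1.0, 2.0]  # made-up symmetric data
    b_hat = bartlett_factor_univariate(sample)
    # 3.841 is the upper 5% quantile of the chi-square(1) distribution.
    print(b_hat, bartlett_corrected_threshold(3.841, b_hat, len(sample)))
```

With a positive $\hat b$ the corrected threshold exceeds the uncorrected one, so the corrected confidence region contains the uncorrected region, which is the direction needed to counter under-coverage.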
The Bartlett correction is also applicable to EL-based confidence regions for parameters defined through the general estimating equation (2.12). The Bartlett correction factor $b$ is then determined by the distribution of $g(X; \theta)$. Liu and Chen (2010) provide a detailed discussion on how to calculate the Bartlett correction factor in the framework of general estimating equations.

2.6.2 The No-Solution Problem

As described in Section 2.4, $W_n(\theta)$ equals the minimum value of $-2 \sum_{i=1}^{n} \log(n p_i)$ over all sub-optimal weights $\{p_i\}_{i=1}^{n}$. Hence $W_n(\theta)$ is well defined if and only if there exists at least one set of sub-optimal weights. Let $CH\{\cdots\}$ denote the convex hull spanned by the set of points inside the braces. Then $W_n(\theta)$ is well defined when

$$0 \in CH\{g(X_i; \theta),\ i = 1, 2, \ldots, n\}. \tag{2.20}$$

We take the population mean $\mu$ as an example. Condition (2.20) is satisfied if and only if $0 \in CH\{X_i - \mu,\ i = 1, 2, \ldots, n\}$. Equivalently, we must have $\mu \in CH\{X_i,\ i = 1, 2, \ldots, n\}$. When $\mu$ is one-dimensional, condition (2.20) can be further simplified to $X_{(1)} < \mu < X_{(n)}$.

Let $\Theta_0$ be the set of $\theta$ over which condition (2.20) is satisfied. We can see that $\Theta_0$ is determined by the data. For complex estimating equations, it can be hard to specify the structure of $\Theta_0$. Owen (2001) proves that the true parameter value $\theta_0$, defined as the unique solution to equation (2.12), is contained in $\Theta_0$ almost surely as $n \to \infty$ under some regularity conditions on the estimating function. When $\theta$ is not close to $\theta_0$, or when the sample size is small, it is quite possible that $\theta \notin \Theta_0$ and thus equation (2.16) is not solvable. When $\theta \notin \Theta_0$, it is conventional to define $W_n(\theta) = \infty$. However, this convention has its own limitations. First, for any two different parameter values $\theta_1, \theta_2 \notin \Theta_0$, we are unable to evaluate their relative plausibility based on $W_n(\theta)$. Second, this convention implies that the confidence region is always a subset of $\Theta_0$, which is determined by the data.
This can be a problem when even $\Theta_0$ itself does not achieve the desired confidence level, especially when the sample size is small.

Aiming to solve the no-solution problem of empirical likelihood, Chen et al. (2008) propose an adjustment to the original empirical likelihood such that the resulting adjusted empirical likelihood (AEL) is well defined for all possible parameter values. Chapter 3 is devoted to summarizing the well-studied asymptotic properties of the adjusted empirical likelihood, and to investigating the finite-sample properties of AEL-based confidence regions, mainly in the case of the population mean.

Chapter 3

Adjusted Empirical Likelihood

To overcome the obstacle caused by the no-solution problem in the application of empirical likelihood, Chen et al. (2008) propose an adjustment to empirical likelihood. The resulting adjusted empirical likelihood is attractive for its easy computation and desirable asymptotic properties. Recently, this method has found applications in various areas. Zhu et al. (2009) combine the adjusted empirical likelihood with the exponentially tilted likelihood, and apply the result to the analysis of morphometric measures in MRI studies. Liu and Yu (2010) propose a two-sample adjusted empirical likelihood approach to construct confidence regions for the difference of two population means. Variyath et al. (2010) introduce information criteria under adjusted empirical likelihood for variable and model selection problems.

In this chapter, we first review the setting of the adjusted empirical likelihood (AEL) and its asymptotic properties. In addition, we present some new results on the finite-sample properties of the adjusted empirical likelihood.

3.1 Adjusted Empirical Likelihood and AEL-Based Confidence Regions

Let us start with a simple example. Suppose that we have a random sample of $n$ bivariate observations, and we are interested in the population mean $\mu$. Now consider a value of $\mu$ outside of $CH\{X_i,\ i = 1, 2, \ldots, n\}$. Such a value of $\mu$ does not satisfy condition (2.20), and therefore $W_n(\mu)$ is not well defined. The idea of the adjustment proposed by Chen et al. (2008) is to add a pseudo observation $X_{n+1}$ to the data set such that $\mu \in CH\{X_i,\ i = 1, 2, \ldots, n+1\}$. More specifically, we may choose $X_{n+1}$ as

$$X_{n+1} = \mu + a_n (\mu - \bar X_n), \tag{3.1}$$

for some positive constant $a_n$; or equivalently, we may write

$$X_{n+1} - \mu = -a_n (\bar X_n - \mu). \tag{3.2}$$

The rationale for adding such a pseudo point is illustrated in Figure 3.1. Note that $\bar X_n = \sum_{i=1}^{n} X_i / n$ is always an interior point of $CH\{X_i,\ i = 1, 2, \ldots, n\}$. Suppose $\mu$ is a parameter value outside of the convex hull of $\{X_1, X_2, \ldots, X_n\}$. Let us first draw a ray from $\bar X_n$ towards $\mu$, and let $X_{n+1}$ be a point on the far side of $\mu$. The constant $a_n$ determines how far away $X_{n+1}$ is placed. It is seen that $\mu$ is an interior point of the convex hull of $\{X_1, X_2, \ldots, X_{n+1}\}$.

Figure 3.1: A two-dimensional example showing the position of the pseudo point. (a) Plot of the $X_i$'s; (b) after adding a pseudo point.

This adjustment is generally applicable. For any given value of $\theta$, let $g_i(\theta) = g(X_i; \theta)$ and $\bar g_n(\theta) = \sum_{i=1}^{n} g_i(\theta) / n$. The pseudo observation is defined as

$$g_{n+1}(\theta) = -a_n\, \bar g_n(\theta). \tag{3.3}$$

By including this pseudo observation in the data set, the empirical likelihood ratio statistic for $\theta$ becomes

$$W_n^*(\theta; a_n) = -2 \log R_n^*(\theta; a_n) \tag{3.4}$$

with $R_n^*(\theta; a_n) = (n+1)^{n+1} L_n^*(\theta; a_n)$, and

$$L_n^*(\theta; a_n) = \max\left\{ \prod_{i=1}^{n+1} p_i : p_i > 0,\ \sum_{i=1}^{n+1} p_i = 1,\ \sum_{i=1}^{n+1} p_i\, g_i(\theta) = 0 \right\}.$$

We call a set of weights $\{p_i\}_{i=1}^{n+1}$ sub-optimal for $W_n^*(\theta; a_n)$, $R_n^*(\theta; a_n)$ or $L_n^*(\theta; a_n)$ if they satisfy the above constraints. Using Lagrange's method, we can easily show that the optimal weights are given by

$$p_i = \frac{1}{n+1} \cdot \frac{1}{1 + \lambda^T g_i(\theta)}, \quad i = 1, 2, \ldots, n+1,$$

where $\lambda$ is the solution to

$$\sum_{i=1}^{n+1} \frac{g_i(\theta)}{1 + \lambda^T g_i(\theta)} = 0.$$

As a consequence, $W_n^*(\theta; a_n)$ can be expressed as

$$W_n^*(\theta; a_n) = 2 \sum_{i=1}^{n+1} \log[1 + \lambda^T g_i(\theta)].$$

Compared to the original empirical likelihood, adjusted empirical likelihood has many desirable properties. First, adjusted empirical likelihood yields a sensible value of the likelihood at any putative parameter value, which allows us to evaluate the plausibility of any parameter value. On the contrary, the original empirical likelihood is well defined only over a data-dependent subset of the parameter space, and this subset is difficult to specify numerically when the estimating function $g(x; \theta)$ is complex.

Second, the first-order asymptotic property of $W_n(\theta_0)$, where $\theta_0$ is the true value of $\theta$, is largely preserved for $W_n^*(\theta_0; a_n)$. For example, $W_n^*(\theta_0; a_n)$ has the same limiting distribution as $W_n(\theta_0)$ as long as $a_n = o_p(n^{2/3})$. Thus, the $\chi^2$ calibration is still applicable to constructing confidence regions for the parameter of interest. That is,

$$CR_\alpha^* = \{\theta : W_n^*(\theta; a_n) \le \chi^2_q(\alpha)\}$$

remains an approximate $100(1-\alpha)\%$ confidence region for $\theta$.

Third, AEL-based confidence regions can achieve coverage precision of higher order with an appropriately chosen $a_n$. The positive constant $a_n$ in the definition of the pseudo point can be used as a tuning parameter which controls the level of adjustment. Recall that EL-based confidence regions have the under-coverage problem, and that they are Bartlett correctable to achieve higher-order precision. Tuning the size of $a_n$ may achieve the same goal. This is exactly what is proposed in Liu and Chen (2010). They discover that when $a_n = b/2$, with $b$ being the Bartlett correction factor, the coverage error of AEL-based confidence regions is of order $n^{-2}$, which is the same as that of Bartlett-corrected EL-based confidence regions. The sign of $b$ matters in the adjusted empirical likelihood.
In the one-dimensional case, $b$ is positive for any distribution. When the dimension is higher than 1, empirical studies (Liu and Chen, 2010) seem to support that $b$ is positive, but theoretical justification is still needed.

Fourth, although the original motivation of the adjusted empirical likelihood method is to handle the no-solution problem of the EL method, empirical studies (Chen et al., 2008; Liu and Chen, 2010) reveal that AEL-based confidence regions have higher coverage rates than EL-based confidence regions. Note that this does not imply that AEL-based confidence regions have more accurate coverage rates than the corresponding EL-based confidence regions, though this is often the case because EL-based confidence regions have the under-coverage problem. We will demonstrate this empirical discovery rigorously in Section 3.2.

3.2 Finite-Sample Properties of Adjusted Empirical Likelihood for Population Mean

We devote this section to the finite-sample properties of the adjusted empirical likelihood, mainly in the case of the population mean.

3.2.1 Monotonicity of $W_n^*(\mu; a_n)$ in $\mu$

It is desirable that the confidence region of any parameter be convex. Empirical evidence seems to support that the AEL-based confidence region for the population mean is convex, but this is yet to be confirmed theoretically. In this section, we prove that, like $W_n(\mu)$, the adjusted likelihood ratio statistic $W_n^*(\mu; a_n)$ also has a monotonicity property. As mentioned earlier in Section 2.5, this property is critical for the numerical computation of multidimensional confidence regions.

Theorem 3.1. Suppose we have a random sample $X_1, X_2, \ldots, X_n$ from some population $X$, and $W_n^*(\mu; a_n)$ is the adjusted empirical likelihood ratio statistic for the population mean. For any $d$-dimensional unit vector $v$, consider the half line $\bar X_n + t v$ for $t \ge 0$. Then $W_n^*(\bar X_n + t v; a_n)$ is an increasing function of $t$.

Proof.
For any $0 < t_1 < t_2$, let $\mu_1 = \bar X_n + t_1 v$ and $\mu_2 = \bar X_n + t_2 v$. Note that

$$W_n^*(\mu; a_n) = -2 \log R_n^*(\mu; a_n) = -2 \log L_n^*(\mu; a_n) - 2 \log (n+1)^{n+1},$$

and the logarithm is monotone. It therefore suffices to show that $L_n^*(\mu_1; a_n) \ge L_n^*(\mu_2; a_n)$. Let $\{p_i\}_{i=1}^{n+1}$ be the optimal weights for $L_n^*(\mu_2; a_n)$. If we can find a set of sub-optimal weights $\{q_i\}_{i=1}^{n+1}$ for $L_n^*(\mu_1; a_n)$ such that $\prod_{i=1}^{n+1} q_i \ge \prod_{i=1}^{n+1} p_i$, then the conclusion of the theorem follows since $L_n^*(\mu_1; a_n) \ge \prod_{i=1}^{n+1} q_i$.

For $i = 1, 2, \ldots, n$, we define

$$r_i = \frac{p_i}{1 - p_{n+1}}.$$

Then we have

$$\sum_{i=1}^{n} r_i (X_i - \mu_2) + \frac{p_{n+1}}{1 - p_{n+1}} (X_{n+1} - \mu_2) = 0.$$

Substituting $X_{n+1} - \mu_2 = -a_n (\bar X_n - \mu_2)$, which is equation (3.2) with $\mu = \mu_2$, and letting $k = p_{n+1} a_n / (1 - p_{n+1})$, we have

$$\sum_{i=1}^{n} r_i (X_i - \mu_2) = k (\bar X_n - \mu_2) = (\bar X_n - \mu_2) + (k - 1)(\bar X_n - \mu_2).$$

Define

$$\tilde L_n(\phi) = \max\left\{ \prod_{i=1}^{n} s_i : s_i > 0,\ \sum_{i=1}^{n} s_i = 1,\ \sum_{i=1}^{n} s_i (X_i - \mu_2) = \phi \right\}.$$

That is, $\tilde L_n(\phi)$ is the profile empirical likelihood for $\phi = E[X_1 - \mu_2]$. It is easy to verify that $\tilde L_n(\phi)$ is maximized at $\bar\phi_n = \sum_{i=1}^{n} (X_i - \mu_2)/n = \bar X_n - \mu_2$, and that $\{r_i\}_{i=1}^{n}$ are the optimal weights for $\tilde L_n(\phi_2)$ with $\phi_2 = (\bar X_n - \mu_2) + (k - 1)(\bar X_n - \mu_2)$. Denote $\phi_1 = (\bar X_n - \mu_2) + (k - 1)(\bar X_n - \mu_1)$. Note that $\mu_1$ is a convex combination of $\mu_2$ and $\bar X_n$. Hence $\phi_1$ is also a convex combination of $\phi_2$ and $\bar\phi_n$. By the monotonicity property of $\tilde L_n(\phi)$ (Corollary 2.4), we have $\tilde L_n(\phi_1) \ge \tilde L_n(\phi_2)$.

Let $\{s_i\}_{i=1}^{n}$ be the optimal weights for $\tilde L_n(\phi_1)$, and define $q_{n+1} = p_{n+1}$ and $q_i = (1 - p_{n+1}) s_i$ for $i = 1, 2, \ldots, n$. We can verify that $\{q_i\}_{i=1}^{n+1}$ is a set of sub-optimal weights for $L_n^*(\mu_1; a_n)$: indeed, $\sum_{i=1}^{n} s_i (X_i - \mu_1) = \phi_1 + (\mu_2 - \mu_1) = k (\bar X_n - \mu_1)$, which cancels the contribution $-p_{n+1} a_n (\bar X_n - \mu_1)$ of the pseudo point after multiplication by $1 - p_{n+1}$. Hence,

$$L_n^*(\mu_1; a_n) \ge \prod_{i=1}^{n+1} q_i = p_{n+1} (1 - p_{n+1})^n \prod_{i=1}^{n} s_i \ge p_{n+1} (1 - p_{n+1})^n \prod_{i=1}^{n} r_i = p_{n+1} (1 - p_{n+1})^n \prod_{i=1}^{n} \frac{p_i}{1 - p_{n+1}} = \prod_{i=1}^{n+1} p_i = L_n^*(\mu_2; a_n).$$

Hence, $W_n^*(\mu_1; a_n) \le W_n^*(\mu_2; a_n)$.
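The monotonicity stated in Theorem 3.1 can be checked numerically in the univariate case. The sketch below is only an illustration under stated assumptions: it adds the pseudo point of equation (3.2), solves for the scalar $\lambda$ by plain bisection (an assumed solver, not one prescribed in the text), and evaluates $W_n^*(\bar X_n + t; a_n)$ on a grid of $t$. The names `solve_lambda` and `ael_statistic`, the data, and the choice $a_n = \log(n)/2$ are illustrative.

```python
import math


def solve_lambda(z, iters=200):
    """Bisection root of sum(z_i / (1 + lam * z_i)) = 0, scalar lam.

    Assumes min(z) < 0 < max(z); the score is strictly decreasing on
    the interval (-1/max(z), -1/min(z)) where all weights are positive.
    """
    lo, hi = -1.0 / max(z), -1.0 / min(z)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(zi / (1.0 + mid * zi) for zi in z) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ael_statistic(x, mu, a_n):
    """Adjusted EL ratio statistic W*_n(mu; a_n), univariate mean, a_n > 0."""
    n = len(x)
    xbar = sum(x) / n
    if mu == xbar:
        return 0.0  # lambda = 0 solves the equation, so the statistic is 0
    z = [xi - mu for xi in x]
    z.append(-a_n * (xbar - mu))  # pseudo observation, equation (3.2)
    lam = solve_lambda(z)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)


if __name__ == "__main__":
    sample = [1.2, 0.4, 2.5, 3.1, 0.9]  # made-up data for illustration
    a_n = math.log(len(sample)) / 2.0
    xbar = sum(sample) / len(sample)
    for t in (0.0, 0.5, 1.0, 2.0, 5.0, 20.0, 100.0):
        print(t, ael_statistic(sample, xbar + t, a_n))
```

Because the pseudo point always lies on the opposite side of $\mu$ from $\bar X_n$, the statistic is finite for every $\mu$, including values far outside the range of the data, and the printed values should be non-decreasing in $t$.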
According to Theorem 3.1, $\bar X_n$ is the minimum point of $W_n^*(\mu; a_n)$, and AEL-based confidence regions for the population mean are at least star-shaped with $\bar X_n$ as the center. In the case of the univariate mean, the AEL-based confidence regions are still intervals. This result guarantees that the bisection algorithm described in Section 2.5 also works for finding the boundary of the AEL-based confidence region. Whether AEL-based confidence regions for the population mean are convex is still not clear, though it seems to be the case in simulation studies.

3.2.2 Monotonicity of $W_n^*(\theta; a_n)$ in $a_n$

Empirical studies in Chen et al. (2008) and Liu and Chen (2010) reveal that AEL-based confidence regions have higher coverage rates than the corresponding EL-based confidence regions. Intuitively, the gain in coverage rate may be explained by the way the adjusted empirical likelihood ratio statistic is constructed. As argued by Hua (2009), for testing the null hypothesis $H_0 : \theta = \theta_0$, the pseudo point $g_{n+1}(\theta_0)$ is always placed at a position that is in favor of the null hypothesis. Thus, the adjusted empirical likelihood ratio statistic tends to favor the null hypothesis and deflates the type-I error. Consequently, AEL-based confidence regions have higher coverage rates than the corresponding EL-based confidence regions. It turns out that this observation can be proved rigorously. Theorem 3.2 reveals the monotonicity of $W_n^*(\theta; a_n)$ in $a_n$, and, as a special case, an interesting relationship between adjusted empirical likelihood and empirical likelihood. It implies that the AEL-based confidence region contains the corresponding EL-based confidence region.

Theorem 3.2. Suppose $X_1, X_2, \ldots$
, $X_n$ is a random sample from some population $X$, and $W_n(\theta)$ and $W_n^*(\theta; a_n)$ are the empirical likelihood ratio statistic and the adjusted empirical likelihood ratio statistic defined by equations (2.15) and (3.4), respectively. We adopt the conventional value $\infty$ for $W_n(\theta)$ when it is not well defined. Then we have

(1) $W_n^*(\theta; a_n) = W_n(\theta)$ if $a_n = 0$.

(2) $W_n^*(\theta; a_n)$ is a decreasing function of $a_n$ on the closed interval $[0, n]$.

Proof. (1) When $a_n = 0$, we have $g_{n+1}(\theta) = 0$. Hence, $W_n^*(\theta; 0)$ becomes

$$W_n^*(\theta; 0) = 2 \sum_{i=1}^{n+1} \log[1 + \lambda^T g_i(\theta)] = 2 \sum_{i=1}^{n} \log[1 + \lambda^T g_i(\theta)],$$

where $\lambda$ is the solution to

$$\sum_{i=1}^{n+1} \frac{g_i(\theta)}{1 + \lambda^T g_i(\theta)} = \sum_{i=1}^{n} \frac{g_i(\theta)}{1 + \lambda^T g_i(\theta)} = 0.$$

It is clear that the expression of $W_n^*(\theta; 0)$ and the equation for $\lambda$ coincide with those in the definition of $W_n(\theta)$ (equations (2.15) and (2.16)). That is, we have $W_n^*(\theta; 0) = W_n(\theta)$.

(2) For simplicity, we only give the proof in the case of the population mean $\mu$; the proof is the same for the case of general estimating equations. Without loss of generality, we also fix $\mu = 0$ and assume $\bar X_n \neq 0$.

When $a_n = n$, it is easy to verify that the weights $\{p_i = 1/(n+1)\}_{i=1}^{n+1}$ are sub-optimal for $W_n^*(0; n)$ and thus are optimal for $W_n^*(0; n)$. Therefore, $W_n^*(0; n) = 0 \le W_n^*(0; a_n)$ for any $a_n < n$.

Next we only consider $a_n \in [0, n)$. Note that $W_n^*(0; a_n)$ can be expressed as

$$W_n^*(0; a_n) = 2 \sum_{i=1}^{n+1} \log(1 + \lambda^T X_i),$$

where $X_{n+1} = -a_n \bar X_n$, and $\lambda$ satisfies

$$\sum_{i=1}^{n+1} \frac{X_i}{1 + \lambda^T X_i} = 0. \tag{3.5}$$

The derivative of $W_n^*(0; a_n)$ with respect to $a_n$ is

$$\frac{dW_n^*(0; a_n)}{da_n} = 2 \left[ \sum_{i=1}^{n+1} \left( \frac{d\lambda}{da_n} \right)^T \frac{X_i}{1 + \lambda^T X_i} + \frac{\lambda^T \, dX_{n+1}/da_n}{1 + \lambda^T X_{n+1}} \right] = 2 \left[ \left( \frac{d\lambda}{da_n} \right)^T \sum_{i=1}^{n+1} \frac{X_i}{1 + \lambda^T X_i} + \frac{\lambda^T (-\bar X_n)}{1 + \lambda^T X_{n+1}} \right] = -\frac{2 \lambda^T \bar X_n}{1 + \lambda^T X_{n+1}},$$

where the last step uses equation (3.5). If the derivative of $W_n^*(0; a_n)$ is always negative, then $W_n^*(0; a_n)$ is a decreasing function of $a_n$. Since $1 + \lambda^T X_{n+1} > 0$ by the positivity of the weights, our task is to prove that $\lambda^T \bar X_n > 0$ for $a_n \in [0, n)$.

Consider the following function

$$f(t) = \sum_{i=1}^{n+1} \frac{\lambda^T X_i}{1 + t \cdot \lambda^T X_i}.$$

Note that

$$f(0) = \sum_{i=1}^{n+1} \lambda^T X_i = \lambda^T \sum_{i=1}^{n} X_i + \lambda^T (-a_n \bar X_n) = (n - a_n) \lambda^T \bar X_n,$$

$$f(1) = \sum_{i=1}^{n+1} \frac{\lambda^T X_i}{1 + \lambda^T X_i} = \lambda^T \sum_{i=1}^{n+1} \frac{X_i}{1 + \lambda^T X_i} = 0.$$

We also notice that the derivative of $f(t)$,

$$\frac{df(t)}{dt} = \sum_{i=1}^{n+1} \left[ -\frac{\lambda^T X_i \cdot \lambda^T X_i}{(1 + t \cdot \lambda^T X_i)^2} \right] = -\sum_{i=1}^{n+1} \left( \frac{\lambda^T X_i}{1 + t \cdot \lambda^T X_i} \right)^2,$$

is always negative, and thus $f(t)$ is a decreasing function of $t$. Therefore, we have $f(0) > f(1)$, that is, $(n - a_n) \lambda^T \bar X_n > 0$. Since $a_n < n$, we find $\lambda^T \bar X_n > 0$. Consequently, $W_n^*(0; a_n)$ is a decreasing function of $a_n$, which completes the proof.

Figure 3.2 plots the EL and AEL likelihood ratio statistics based on an artificially generated data set. It clearly shows that $W_n^*(\mu; a_n) \le W_n(\mu)$ for all $\mu$. As a consequence, the AEL-based confidence interval for $\mu$ contains the corresponding EL-based confidence interval, and hence the former has higher coverage probability. This conclusion is generally true for parameters defined through general estimating equations. It strengthens the result in Liu and Chen (2010): AEL-based confidence regions with $a_n = b/2$, where $b$ is the Bartlett correction factor, have not only higher coverage accuracy but also higher coverage probability.

Figure 3.2: An example showing the effect of the pseudo point on the likelihood ratio statistic (x-axis: $\mu$, with $X_{(1)}$ and $X_{(n)}$ marked; y-axis: likelihood ratio statistic; curves: EL, AEL).

3.2.3 Boundedness of $W_n^*(\mu; a_n)$

Suppose $X_1, X_2, \ldots, X_n \in R^d$ are independent random vectors from some population $X$, and $\theta \in R^p$ is the parameter of interest defined through the general estimating equation (2.12). The EL-based approximate $100(1-\alpha)\%$ confidence region for $\theta$ is defined as

$$CR_\alpha = \{\theta : W_n(\theta) \le \chi^2_q(\alpha)\}$$

where $\chi^2_q(\alpha)$ is the upper $\alpha$ quantile of the $\chi^2$ distribution with $q$ degrees of freedom.
The AEL-based $100(1-\alpha)\%$ confidence region is defined in the same way except that $W_n(\theta)$ is replaced by $W_n^*(\theta; a_n)$.

In the case of the population mean, since $W_n(\mu)$ is only well defined for $\mu$ in the convex hull of the sample, and $W_n(\mu)$ tends to infinity as $\mu$ approaches the boundary of the convex hull, the EL-based confidence region for the population mean is always of finite size. On the other hand, we find that $W_n^*(\theta; a_n)$ is bounded from above for any given $n$. Hence, AEL may give an unbounded confidence region when the sample size is not large enough, or when the confidence level $(1-\alpha)$ is too high. We state this result as follows.

Theorem 3.3. Suppose we have a finite sample $X_1, X_2, \ldots, X_n$ and let $W_n^*(\theta; a_n)$ be the adjusted empirical likelihood ratio statistic defined in equation (3.4). For any $\theta$, we have

$$W_n^*(\theta; a_n) \le -2n \log\left[ \frac{(n+1)\, a_n}{n (1 + a_n)} \right] - 2 \log\left[ \frac{n+1}{1 + a_n} \right].$$

Proof. Let

$$q_1 = q_2 = \cdots = q_n = \frac{1}{n} \cdot \frac{a_n}{1 + a_n}, \quad q_{n+1} = \frac{1}{1 + a_n}.$$

It is clear that $q_i > 0$ and $\sum_{i=1}^{n+1} q_i = 1$. In addition, it is seen that

$$\sum_{i=1}^{n+1} q_i\, g_i(\theta) = \frac{a_n}{1 + a_n} \cdot \frac{1}{n} \sum_{i=1}^{n} g_i(\theta) + \frac{1}{1 + a_n} [-a_n \bar g_n(\theta)] = \frac{a_n}{1 + a_n} \bar g_n(\theta) - \frac{a_n}{1 + a_n} \bar g_n(\theta) = 0.$$

Hence, $\{q_i\}_{i=1}^{n+1}$ is a set of sub-optimal weights for $W_n^*(\theta; a_n)$. According to the definition of $W_n^*(\theta; a_n)$, we thus have

$$W_n^*(\theta; a_n) \le -2 \sum_{i=1}^{n+1} \log[(n+1)\, q_i] = -2n \log\left[ \frac{(n+1)\, a_n}{n (1 + a_n)} \right] - 2 \log\left[ \frac{n+1}{1 + a_n} \right].$$

This completes the proof.

For the population mean, the next theorem shows that the upper bound in Theorem 3.3 is the supremum of $W_n^*(\mu; a_n)$.

Theorem 3.4. Let $X_1, X_2, \ldots, X_n$ be i.i.d. $d$-dimensional random vectors and let $\mu$ be the population mean. Denote by $M$ the upper bound in Theorem 3.3. For any $d$-dimensional unit vector $v$, consider the half line $\bar X_n + t v$ with $t > 0$. We have

$$\lim_{t \to \infty} W_n^*(\bar X_n + t v; a_n) = M.$$

Proof.
We present the proof for the case $d = 1$ and for the case $d > 1$ separately.

Case 1: $d = 1$. We only present the proof for $\mu \to -\infty$; the proof for $\mu \to \infty$ is similar. Let $\{p_i\}_{i=1}^{n+1}$ be the optimal weights for $W_n^*(\mu; a_n)$. We prove the result in three steps. First, we demonstrate that $\lim_{\mu \to -\infty} p_{n+1} = 1/(1 + a_n)$. Second, we show that $\lim_{\mu \to -\infty} p_i = a_n / [n (1 + a_n)]$ for $i = 1, 2, \ldots, n$. In the final step, the conclusion of the theorem readily follows.

Step 1. Consider any $\mu$ such that $\mu < X_{(1)}$. Note that $\{p_i\}_{i=1}^{n+1}$ satisfy

$$\sum_{i=1}^{n+1} p_i (X_i - \mu) = 0.$$

From the above equation, we get

$$\sum_{i=1}^{n} p_i (X_i - \mu) = -p_{n+1} (X_{n+1} - \mu) = p_{n+1}\, a_n (\bar X_n - \mu).$$

Thus,

$$p_{n+1} = \frac{\sum_{i=1}^{n} p_i (X_i - \mu)}{a_n (\bar X_n - \mu)}.$$

Note that $0 < X_{(1)} - \mu \le X_i - \mu \le X_{(n)} - \mu$ for $i = 1, 2, \ldots, n$, and $\sum_{i=1}^{n} p_i = 1 - p_{n+1}$. Hence

$$p_{n+1} \le \frac{\sum_{i=1}^{n} p_i (X_{(n)} - \mu)}{a_n (\bar X_n - \mu)} = \frac{\sum_{i=1}^{n} p_i}{a_n} \cdot \frac{X_{(n)} - \mu}{\bar X_n - \mu} = \frac{1 - p_{n+1}}{a_n} \cdot \frac{X_{(n)} - \mu}{\bar X_n - \mu}.$$

Solving this inequality for $p_{n+1}$, we get an upper bound:

$$p_{n+1} \le \frac{X_{(n)} - \mu}{a_n (\bar X_n - \mu) + (X_{(n)} - \mu)}.$$

Similarly, we get a lower bound for $p_{n+1}$:

$$p_{n+1} \ge \frac{X_{(1)} - \mu}{a_n (\bar X_n - \mu) + (X_{(1)} - \mu)}.$$

Letting $\mu \to -\infty$, we get

$$\frac{1}{1 + a_n} \le \liminf_{\mu \to -\infty} p_{n+1} \le \limsup_{\mu \to -\infty} p_{n+1} \le \frac{1}{1 + a_n}.$$

Hence,

$$\lim_{\mu \to -\infty} p_{n+1} = \frac{1}{1 + a_n}. \tag{3.6}$$

Step 2. For $i = 1, 2, \ldots, n+1$, $p_i$ can be expressed as

$$p_i = \frac{1}{n+1} \cdot \frac{1}{1 + \lambda (X_i - \mu)} \tag{3.7}$$

for some $\lambda$. By equation (3.7) with $i = n+1$, we have

$$\lambda = -\frac{(n+1) - p_{n+1}^{-1}}{(n+1)(X_{n+1} - \mu)} = \frac{(n+1) - p_{n+1}^{-1}}{a_n (n+1)(\bar X_n - \mu)}.$$

For $i = 1, 2, \ldots, n$, substituting this expression of $\lambda$ into equation (3.7) leads to

$$p_i = \frac{1}{n+1} \left[ 1 + \frac{(n+1) - p_{n+1}^{-1}}{(n+1)\, a_n} \cdot \frac{X_i - \mu}{\bar X_n - \mu} \right]^{-1} = \left[ (n+1) + \frac{(n+1) - p_{n+1}^{-1}}{a_n} \cdot \frac{X_i - \mu}{\bar X_n - \mu} \right]^{-1}.$$

Letting $\mu \to -\infty$ and using equation (3.6), we get

$$\lim_{\mu \to -\infty} p_i = \left[ (n+1) + \frac{(n+1) - (1 + a_n)}{a_n} \right]^{-1} = \frac{1}{n} \cdot \frac{a_n}{1 + a_n}.$$

Step 3.
Since $\{p_i\}_{i=1}^{n+1}$ are the optimal weights for $W_n^*(\mu; a_n)$, we have

$$W_n^*(\mu; a_n) = -2 \sum_{i=1}^{n+1} \log[(n+1)\, p_i].$$

Consequently,

$$\lim_{\mu \to -\infty} W_n^*(\mu; a_n) = \lim_{\mu \to -\infty} -2 \sum_{i=1}^{n+1} \log[(n+1)\, p_i] = -2n \log\left[ \frac{(n+1)\, a_n}{n (1 + a_n)} \right] - 2 \log\left[ \frac{n+1}{1 + a_n} \right],$$

which is the conclusion.

Case 2: $d > 1$. For any $d$-dimensional unit vector $v$ and $t > 0$, write $\mu = \bar X_n + t v$ and let $\{p_i\}_{i=1}^{n+1}$ be the optimal weights for $W_n^*(\bar X_n + t v; a_n)$. Consider $Y_i = v^T (X_i - \bar X_n)$ for $i = 1, 2, \ldots, n+1$. It is easy to verify that

$$\sum_{i=1}^{n+1} p_i (Y_i - t) = 0.$$

Define

$$\tilde R_n^*(t; a_n) = \max\left\{ \prod_{i=1}^{n+1} (n+1)\, p_i : p_i > 0,\ \sum_{i=1}^{n+1} p_i = 1,\ \sum_{i=1}^{n+1} p_i (Y_i - t) = 0 \right\},$$

and $\tilde W_n^*(t; a_n) = -2 \log \tilde R_n^*(t; a_n)$. Since $\{p_i\}_{i=1}^{n+1}$ are sub-optimal weights for $\tilde W_n^*(t; a_n)$, we have

$$\tilde W_n^*(t; a_n) \le W_n^*(\mu; a_n) \le M.$$

By Case 1, we have already proved that $\lim_{t \to \infty} \tilde W_n^*(t; a_n) = M$. Consequently, we get

$$M = \lim_{t \to \infty} \tilde W_n^*(t; a_n) \le \liminf_{t \to \infty} W_n^*(\mu; a_n) \le \limsup_{t \to \infty} W_n^*(\mu; a_n) \le M.$$

Hence

$$\lim_{t \to \infty} W_n^*(\mu; a_n) = M.$$

This completes the proof.

Theorem 3.3 reveals that $W_n^*(\theta; a_n)$ is a bounded function of $\theta$; Figure 3.3 shows the relationship between the upper bound and the sample size when $a_n = \log(n)/2$. When the sample size is small or the dimension is high, the upper bound of $W_n^*(\theta; a_n)$ is likely to be smaller than the upper $\alpha$ quantile of the $\chi^2$ distribution. When this happens, the approximate confidence region based on the $\chi^2$ calibration becomes the entire parameter space. Figure 3.4 shows the minimum sample size needed for the adjusted empirical likelihood method to give bounded confidence regions, for different degrees of freedom and confidence levels.

Even if the minimum sample size is attained in a particular situation, the adjusted empirical likelihood method may still produce unreasonably large confidence regions. For example, suppose we have a univariate sample of size 5 and we would like to construct the AEL-based 95% confidence interval for the population mean. As proposed by Chen et al.
(2008), $a_n$ is chosen to be $\log(5)/2 = 0.805$. For this choice of $a_n$, the upper bound of $W_n^*(\mu; a_n)$ is 3.851, which is only slightly larger than the upper 5% quantile of the $\chi^2_1$ distribution, 3.841. Because of this, the resulting confidence interval is very long, and we can expect its coverage rate to be much higher than the nominal level of 95%. Figure 3.5 illustrates this point with a data set of 5 points generated from $N(0, 1)$.

Figure 3.3: Plot of upper bound against sample size (x-axis: sample size, from 2 to 20; y-axis: upper bound).

Figure 3.4: Plot of required sample size for various critical values (x-axis: critical value, ranging over $\chi^2_1(0.1)$, $\chi^2_2(0.2)$, $\chi^2_1(0.05)$, $\chi^2_2(0.1)$, $\chi^2_2(0.05)$, $\chi^2_1(0.01)$, $\chi^2_3(0.05)$, $\chi^2_2(0.01)$, $\chi^2_3(0.01)$; y-axis: required sample size).

In the case of the population mean, we may modify the adjusted empirical likelihood method so that the resulting $W_n^*(\mu; a_n)$ becomes unbounded from above. To motivate such a modification, let us once again look into Theorem 3.2. We view $W_n^*(\mu; a_n)$ as a function of $a_n$ while regarding $\mu$ as a fixed constant satisfying $\mu \neq \bar X_n$. It is seen that $W_n^*(\mu; a_n)$ equals $W_n(\mu)$ when $a_n = 0$, and that $W_n^*(\mu; a_n)$ is a decreasing function of $a_n$ on the closed interval $[0, n]$. See Figure 3.6 for an illustration. Attempting to make $W_n^*(\mu; a_n)$ unbounded from above, we consider replacing the constant $a_n$ by

$$a_n(\mu) = a_n \cdot \exp\left\{ -\sqrt{(\bar X_n - \mu)^T S_n^{-1} (\bar X_n - \mu)} \right\}, \tag{3.8}$$

where $S_n$ is the sample variance-covariance matrix. We assume that $S_n$ is nonsingular.

The resulting $W_n^*(\mu; a_n(\mu))$ is always larger than $W_n^*(\mu; a_n)$ but smaller than $W_n(\mu)$ for any value of $\mu \neq \bar X_n$, since $a_n(\mu)$ is then always smaller than $a_n$ but larger than 0. As $\mu$ deviates from $\bar X_n$, $a_n(\mu)$ tends to zero and thus $W_n^*(\mu; a_n(\mu))$ approaches $W_n(\mu)$.
As a result, $W_n^*(\mu; a_n(\mu))$ is unbounded from above. Figure 3.7 visualizes the effect of $a_n(\mu)$ in the univariate case.

The modified adjusted empirical likelihood possesses two key advantages. Firstly, the modified adjusted empirical likelihood ratio statistic preserves the monotonicity of the adjusted empirical likelihood, and therefore the corresponding confidence region is star-shaped and bounded. Based on the foregoing discussion, we can imagine that the AEL-based confidence region contains the confidence region based on the modified adjusted empirical likelihood while the latter one contains the EL-based confidence region. Figure 3.8 shows the 95% approximate confidence regions based on the empirical likelihood method, the adjusted empirical likelihood method and the modified adjusted empirical likelihood method.

Secondly, it is seen that the multiplier in the definition of $a_n(\mu)$ converges to 1 of order $n^{-1/2}$ as $n \to \infty$ when $\mu$ equals the true value $\mu_0$. This fact implies the modified adjusted empirical likelihood preserves both the first-order and second-order asymptotic properties of the original adjusted empirical likelihood.

[Figure 3.5: The EL-based and AEL-based 95% confidence intervals for the population mean. Horizontal axis: $\mu$, with ticks at −7.073, 0.114, 1.299 and 8.477; vertical axis: likelihood ratio statistic; reference line: upper $\alpha$ quantile of $\chi^2_1$; curves: EL, AEL.]

[Figure 3.6: $W_n^*(\mu; a_n)$ as a function of $a_n$. Horizontal axis: $a_n$ (0 to 5); vertical axis: $W_n^*(\mu; a_n)$ (0 to 6).]

Obviously, there are many choices for the multiplier in the definition of $a_n(\mu)$. Based on the foregoing discussion, we may require the multiplier should satisfy: (1) it decreases to 0 as $\mu$ deviates from $\bar{X}_n$; and (2) it converges to 1 of order $n^{-1/2}$ as $n$ increases.
If a multiplier satisfies these two conditions, the corresponding modified adjusted empirical likelihood not only gives bounded confidence regions, but also preserves the asymptotic and finite-sample properties of the original adjusted empirical likelihood presented in this thesis. Even with these two conditions in mind, there are still many possible choices for the multiplier. The "optimal" choice of multiplier would be an interesting topic for future research.

[Figure 3.7: The effect of $a_n(\mu)$. Horizontal axis: $\mu$, with $X_{(1)}$ and $X_{(n)}$ marked; vertical axis: likelihood ratio statistic; reference line: upper bound of AEL; curves: EL, AEL, Modified AEL.]

[Figure 3.8: The 95% approximate confidence regions produced by EL, AEL and modified AEL. The sample mean $\bar{X}_n$ is marked.]

In the literature, there is another variant of adjusted empirical likelihood that gives an unbounded likelihood ratio statistic. Emerson and Owen (2009) also discover the boundedness of $W_n^*(\mu; a_n)$, and they consider a different way to modify the adjusted empirical likelihood so as to get an unbounded likelihood ratio statistic in the case of population mean. They propose adding two pseudo points to the original sample. More specifically, for any $\mu \ne \bar{X}_n$, the first pseudo point $X_{n+1}$ is also added on the far side of $\mu$, but the distance between $X_{n+1}$ and $\mu$ is a constant $s$, and the second pseudo point $X_{n+2}$ is added such that $\bar{X}_n$ is the midpoint of $X_{n+1}$ and $X_{n+2}$. Then the likelihood ratio statistic is defined as

$$W_n^*(\mu) = -2 \max\left\{ \sum_{i=1}^{n+2} \log[(n+2)\, p_i] : p_i > 0,\ \sum_{i=1}^{n+2} p_i = 1,\ \sum_{i=1}^{n+2} p_i\, (X_i - \mu) = 0 \right\}.$$

The resulting method is called the balanced augmented empirical likelihood method. The method is called "balanced" because the sample mean of $\{X_1, X_2, \ldots, X_{n+2}\}$ is maintained at $\bar{X}_n$.
They demonstrate that this likelihood ratio statistic is unbounded from above, and establish the finite-sample relationship between this modified likelihood ratio method and the well-known Hotelling's $T^2$ test through the tuning parameter $s$. This topic is beyond the scope of the thesis; details can be found in Emerson and Owen (2009).

Chapter 4 Empirical Results

In this chapter, we conduct simulation studies on the finite-sample properties of the empirical likelihood method and several of its variants. In particular, we investigate the coverage probabilities and the sizes of a number of confidence regions for the population mean.

Constructing confidence regions for the population mean based on a simple random sample of $n$ observations is a classical problem in statistical inference. The most widely used method of constructing confidence regions for the population mean is based on Hotelling's $T^2$ statistic

$$T_n^2(\mu) = n\, (\bar{X}_n - \mu)^T S_n^{-1} (\bar{X}_n - \mu),$$

where $\bar{X}_n$ and $S_n$ are the sample mean and the sample variance-covariance matrix, respectively. If the population distribution is multivariate normal of dimension $d$, then $(n-d)\, T_n^2(\mu_0) / [d\, (n-1)]$ is known to have an $F$ distribution with $d$ and $n-d$ degrees of freedom, where $\mu_0$ is the true parameter value. A $100(1-\alpha)\%$ confidence region for $\mu$ is given by

$$\mathrm{CR} = \left\{ \mu : T_n^2(\mu) \le \frac{d\, (n-1)}{n-d}\, F_{d,\, n-d}(\alpha) \right\},$$

where $F_{d,\, n-d}(\alpha)$ denotes the upper $\alpha$ quantile of the $F$ distribution with $d$ and $n-d$ degrees of freedom. When $d = 1$, Hotelling's $T^2$ statistic becomes the square of the well-known Student's $t$ statistic.

Many practitioners prefer using this kind of confidence region based on normal approximation because of its easy calculation and straightforward interpretation. Moreover, many numerical studies have found that the foregoing form of confidence region has surprisingly accurate coverage rate even when the population distribution is not normal and the sample size is small.
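As a small illustration of the $T^2$ region in the univariate case, the following sketch computes the statistic and the resulting interval for $d = 1$, where the bound $d(n-1)/(n-d)\, F_{d,\, n-d}(\alpha)$ reduces to $F_{1,\, n-1}(\alpha)$. The function names, the sample and the way the critical value is passed in are assumptions made for this example only; the critical value 7.7086 is the tabulated upper 5% quantile of $F(1, 4)$.

```python
import math
import statistics


def t2_statistic(x, mu):
    """Hotelling's T^2 for a univariate sample: n * (x_bar - mu)^2 / s^2."""
    n = len(x)
    x_bar = statistics.mean(x)
    s2 = statistics.variance(x)  # sample variance with divisor n - 1
    return n * (x_bar - mu) ** 2 / s2


def t2_interval(x, f_crit):
    """Interval {mu : T^2(mu) <= f_crit} for d = 1.

    f_crit is the upper-alpha quantile of F(1, n - 1). It is passed in
    because the Python standard library has no F quantile function.
    """
    n = len(x)
    x_bar = statistics.mean(x)
    half_width = math.sqrt(f_crit * statistics.variance(x) / n)
    return x_bar - half_width, x_bar + half_width


# Made-up sample with n = 5; 7.7086 is the tabulated upper 5% quantile
# of F(1, 4), i.e. the square of the Student t quantile 2.7764.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
F_CRIT_1_4 = 7.7086
lower, upper = t2_interval(sample, F_CRIT_1_4)
print(lower, upper)
```

By construction, the statistic evaluated at either endpoint equals the critical value, which gives a quick consistency check on any implementation of the region.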
We investigate the coverage rates and the sizes of approximate 90% and 95% confidence intervals/regions in the cases of one-dimensional mean and two-dimensional mean. Seven methods are considered:

1. Hotelling's $T^2$ method, denoted as $T^2$;
2. The original empirical likelihood method, denoted as EL;
3. The adjusted empirical likelihood method with $a_n = \log(n)/2$, denoted as AEL;
4. The modified adjusted empirical likelihood method with $a_n = \log(n)/2$, denoted as MAEL;
5. The Bartlett corrected empirical likelihood method, denoted as EL∗;
6. The adjusted empirical likelihood method with $a_n = b/2$ where $b$ is estimated by the method of moments, denoted as AEL∗;
7. The modified adjusted empirical likelihood method with $a_n = b/2$ where $b$ is estimated by the method of moments, denoted as MAEL∗.

4.1 Confidence Intervals for One-Dimensional Mean

Three sample sizes ($n = 5, 10, 50$) are considered. For each sample size, we generated 1,000 samples from each of the following four distributions:

1. Standard normal distribution, denoted as $N(0, 1)$;
2. $\chi^2$ distribution with 1 degree of freedom, denoted as $\chi^2_1$;
3. Exponential distribution with rate 1, denoted as Exp(1); and
4. A normal mixture $0.1\, N(-9, 1) + 0.9\, N(1, 1)$, denoted as $0.1\, N_1 + 0.9\, N_2$.

For each sample, we calculated the 90% and 95% confidence intervals. Table 4.1 reports the coverage frequencies, and Table 4.2 gives the average lengths of the corresponding intervals.

Table 4.1: Coverage rates for one-dimensional mean

| Distribution | Method | 0.90, n = 5 | 0.90, n = 10 | 0.90, n = 50 | 0.95, n = 5 | 0.95, n = 10 | 0.95, n = 50 |
|---|---|---|---|---|---|---|---|
| N(0, 1) | T² | 0.895 | 0.909 | 0.892 | 0.956 | 0.952 | 0.949 |
| | EL | 0.757 | 0.858 | 0.889 | 0.815 | 0.913 | 0.935 |
| | AEL | 0.881 | 0.910 | 0.897 | 1.000 | 0.954 | 0.945 |
| | MAEL | 0.796 | 0.884 | 0.896 | 0.845 | 0.928 | 0.942 |
| | EL∗ | 0.782 | 0.876 | 0.891 | 0.834 | 0.921 | 0.939 |
| | AEL∗ | 0.797 | 0.880 | 0.891 | 0.855 | 0.923 | 0.939 |
| | MAEL∗ | 0.779 | 0.870 | 0.889 | 0.831 | 0.917 | 0.937 |
| χ²₁ | T² | 0.785 | 0.809 | 0.898 | 0.840 | 0.857 | 0.940 |
| | EL | 0.663 | 0.766 | 0.892 | 0.733 | 0.832 | 0.943 |
| | AEL | 0.789 | 0.827 | 0.905 | 0.995 | 0.881 | 0.951 |
| | MAEL | 0.709 | 0.804 | 0.904 | 0.765 | 0.856 | 0.947 |
| | EL∗ | 0.691 | 0.784 | 0.901 | 0.757 | 0.850 | 0.946 |
| | AEL∗ | 0.715 | 0.792 | 0.901 | 0.770 | 0.853 | 0.947 |
| | MAEL∗ | 0.688 | 0.782 | 0.900 | 0.749 | 0.843 | 0.945 |
| Exp(1) | T² | 0.829 | 0.849 | 0.884 | 0.875 | 0.896 | 0.932 |
| | EL | 0.712 | 0.799 | 0.878 | 0.765 | 0.869 | 0.934 |
| | AEL | 0.829 | 0.864 | 0.892 | 0.999 | 0.913 | 0.945 |
| | MAEL | 0.751 | 0.831 | 0.888 | 0.800 | 0.889 | 0.945 |
| | EL∗ | 0.734 | 0.824 | 0.886 | 0.785 | 0.884 | 0.941 |
| | AEL∗ | 0.749 | 0.828 | 0.886 | 0.813 | 0.887 | 0.941 |
| | MAEL∗ | 0.722 | 0.819 | 0.884 | 0.773 | 0.878 | 0.940 |
| 0.1 N₁ + 0.9 N₂ | T² | 0.648 | 0.677 | 0.896 | 0.724 | 0.764 | 0.944 |
| | EL | 0.474 | 0.625 | 0.909 | 0.534 | 0.660 | 0.952 |
| | AEL | 0.636 | 0.660 | 0.923 | 0.999 | 0.718 | 0.959 |
| | MAEL | 0.522 | 0.645 | 0.918 | 0.573 | 0.680 | 0.959 |
| | EL∗ | 0.497 | 0.632 | 0.920 | 0.547 | 0.666 | 0.957 |
| | AEL∗ | 0.516 | 0.637 | 0.921 | 0.584 | 0.670 | 0.957 |
| | MAEL∗ | 0.495 | 0.632 | 0.917 | 0.545 | 0.665 | 0.957 |

Table 4.2: Confidence interval lengths for one-dimensional mean

| Distribution | Method | 0.90, n = 5 | 0.90, n = 10 | 0.90, n = 50 | 0.95, n = 5 | 0.95, n = 10 | 0.95, n = 50 |
|---|---|---|---|---|---|---|---|
| N(0, 1) | T² | 1.797 | 1.130 | 0.474 | 2.341 | 1.395 | 0.567 |
| | EL | 1.178 | 0.966 | 0.465 | 1.371 | 1.150 | 0.556 |
| | AEL | 1.729 | 1.135 | 0.485 | 18.201 | 1.398 | 0.581 |
| | MAEL | 1.307 | 1.046 | 0.480 | 1.521 | 1.239 | 0.574 |
| | EL∗ | 1.266 | 1.019 | 0.472 | 1.466 | 1.213 | 0.564 |
| | AEL∗ | 1.328 | 1.031 | 0.472 | 1.596 | 1.232 | 0.565 |
| | MAEL∗ | 1.240 | 1.002 | 0.470 | 1.438 | 1.189 | 0.562 |
| χ²₁ | T² | 2.252 | 1.497 | 0.660 | 2.933 | 1.847 | 0.791 |
| | EL | 1.447 | 1.265 | 0.657 | 1.675 | 1.500 | 0.790 |
| | AEL | 2.156 | 1.491 | 0.685 | 22.802 | 1.844 | 0.825 |
| | MAEL | 1.614 | 1.370 | 0.678 | 1.882 | 1.621 | 0.815 |
| | EL∗ | 1.547 | 1.339 | 0.675 | 1.782 | 1.587 | 0.813 |
| | AEL∗ | 1.624 | 1.361 | 0.676 | 1.941 | 1.626 | 0.814 |
| | MAEL∗ | 1.542 | 1.327 | 0.673 | 1.791 | 1.572 | 0.809 |
| Exp(1) | T² | 1.686 | 1.077 | 0.468 | 2.196 | 1.329 | 0.561 |
| | EL | 1.091 | 0.913 | 0.464 | 1.266 | 1.084 | 0.557 |
| | AEL | 1.617 | 1.075 | 0.484 | 17.075 | 1.327 | 0.581 |
| | MAEL | 1.215 | 0.989 | 0.479 | 1.415 | 1.170 | 0.574 |
| | EL∗ | 1.169 | 0.965 | 0.474 | 1.349 | 1.145 | 0.569 |
| | AEL∗ | 1.226 | 0.979 | 0.474 | 1.467 | 1.169 | 0.569 |
| | MAEL∗ | 1.150 | 0.949 | 0.472 | 1.333 | 1.125 | 0.566 |
| 0.1 N₁ + 0.9 N₂ | T² | 4.796 | 3.309 | 1.498 | 6.246 | 4.083 | 1.796 |
| | EL | 3.076 | 2.801 | 1.468 | 3.562 | 3.327 | 1.756 |
| | AEL | 4.591 | 3.302 | 1.532 | 48.563 | 4.088 | 1.834 |
| | MAEL | 3.433 | 3.032 | 1.517 | 4.006 | 3.592 | 1.812 |
| | EL∗ | 3.289 | 2.978 | 1.497 | 3.792 | 3.536 | 1.791 |
| | AEL∗ | 3.456 | 3.032 | 1.498 | 4.138 | 3.633 | 1.792 |
| | MAEL∗ | 3.234 | 2.921 | 1.491 | 3.739 | 3.460 | 1.782 |

4.2 Confidence Regions for Two-Dimensional Mean

We also consider constructing confidence regions for the population mean of the following bivariate distributions:

1. Standard normal distribution, $N(0, I_2)$;
2. Distribution of $(X_1, X_2)$ where $X_1 \sim \Gamma(U, 1)$ and $X_2 \sim \Gamma(U-1, 1)$ with $U \sim \mathrm{Uniform}(1.5, 2)$, denoted as Gamma-Gamma;
3. $(X_1, X_2)$ is bivariate normal with $\mathrm{Var}(X_1) = \mathrm{Var}(X_2) = 1$ and $\mathrm{Cov}(X_1, X_2) = \rho$ given $\rho \sim \mathrm{Uniform}(0, 1)$, denoted as Normal-Uniform.

Two sample sizes ($n = 10, 50$) are considered.
For each sample size, we generated 1,000 samples from each of the above three distributions. Approximate 90% and 95% confidence regions were calculated. The coverage frequencies and average areas of the confidence regions based on the various methods are summarized in Tables 4.3 and 4.4. Note that the area of each confidence region is calculated approximately.

Table 4.3: Coverage rates for two-dimensional mean

| Distribution | Method | 0.90, n = 10 | 0.90, n = 50 | 0.95, n = 10 | 0.95, n = 50 |
|---|---|---|---|---|---|
| N(0, I₂) | T² | 0.904 | 0.909 | 0.952 | 0.958 |
| | EL | 0.766 | 0.898 | 0.833 | 0.946 |
| | AEL | 0.883 | 0.917 | 0.966 | 0.958 |
| | MAEL | 0.813 | 0.914 | 0.861 | 0.956 |
| | EL∗ | 0.801 | 0.907 | 0.852 | 0.952 |
| | AEL∗ | 0.826 | 0.907 | 0.876 | 0.952 |
| | MAEL∗ | 0.795 | 0.905 | 0.848 | 0.952 |
| Gamma-Gamma | T² | 0.827 | 0.877 | 0.878 | 0.930 |
| | EL | 0.708 | 0.876 | 0.766 | 0.921 |
| | AEL | 0.827 | 0.892 | 0.926 | 0.935 |
| | MAEL | 0.736 | 0.887 | 0.795 | 0.929 |
| | EL∗ | 0.742 | 0.887 | 0.797 | 0.928 |
| | AEL∗ | 0.760 | 0.884 | 0.816 | 0.928 |
| | MAEL∗ | 0.727 | 0.882 | 0.781 | 0.928 |
| Normal-Uniform | T² | 0.920 | 0.899 | 0.959 | 0.960 |
| | EL | 0.752 | 0.890 | 0.831 | 0.944 |
| | AEL | 0.897 | 0.907 | 0.977 | 0.958 |
| | MAEL | 0.807 | 0.896 | 0.867 | 0.953 |
| | EL∗ | 0.797 | 0.894 | 0.858 | 0.951 |
| | AEL∗ | 0.822 | 0.894 | 0.882 | 0.951 |
| | MAEL∗ | 0.790 | 0.894 | 0.849 | 0.948 |

Table 4.4: Confidence region areas for two-dimensional mean

| Distribution | Method | 0.90, n = 10 | 0.90, n = 50 | 0.95, n = 10 | 0.95, n = 50 |
|---|---|---|---|---|---|
| N(0, I₂) | T² | 1.963 | 0.304 | 2.812 | 0.401 |
| | EL | 1.115 | 0.287 | 1.422 | 0.378 |
| | AEL | 1.821 | 0.314 | 3.511 | 0.414 |
| | MAEL | 1.293 | 0.306 | 1.644 | 0.401 |
| | EL∗ | 1.260 | 0.299 | 1.600 | 0.392 |
| | AEL∗ | 1.356 | 0.299 | 1.799 | 0.393 |
| | MAEL∗ | 1.211 | 0.296 | 1.537 | 0.388 |
| Gamma-Gamma | T² | 1.872 | 0.316 | 2.681 | 0.417 |
| | EL | 1.031 | 0.302 | 1.307 | 0.398 |
| | AEL | 1.715 | 0.330 | 3.331 | 0.436 |
| | MAEL | 1.194 | 0.321 | 1.519 | 0.422 |
| | EL∗ | 1.161 | 0.318 | 1.466 | 0.419 |
| | AEL∗ | 1.269 | 0.319 | 1.779 | 0.420 |
| | MAEL∗ | 1.117 | 0.314 | 1.410 | 0.413 |
| Normal-Uniform | T² | 1.676 | 0.263 | 2.401 | 0.347 |
| | EL | 0.954 | 0.251 | 1.218 | 0.330 |
| | AEL | 1.558 | 0.274 | 3.001 | 0.362 |
| | MAEL | 1.107 | 0.267 | 1.408 | 0.351 |
| | EL∗ | 1.081 | 0.261 | 1.374 | 0.344 |
| | AEL∗ | 1.168 | 0.262 | 1.568 | 0.345 |
| | MAEL∗ | 1.038 | 0.259 | 1.319 | 0.340 |

4.3 Summary

Under the standard normal model, $T^2$ has very accurate coverage rates compared to its nonparametric alternatives, except the AEL. This is because the $T^2$-based confidence interval/region achieves the nominal level in theory regardless of the sample size. The performance of the nonparametric methods improves as the sample size increases. Under the other distribution models, the performance of all methods in small-sample cases is unsatisfactory. In the normal mixture model especially, the coverage rates are dramatically lower than the nominal level.

On average, the $T^2$-based confidence interval/region is larger than its nonparametric alternatives. This makes sense, since the $T^2$-based confidence interval/region has a higher coverage rate. Because of this trade-off between coverage probability and the size of the confidence region, it is difficult to say which method performs better. How to evaluate the performance of a given kind of confidence region based on both the coverage probability and the region size is still an open question.

Surprisingly, the AEL keeps up with $T^2$ in terms of both coverage rate and confidence interval/region size in most cases. However, note that the AEL-based confidence interval has a substantially higher-than-nominal coverage rate when the sample size is 5 and the nominal level is 95% in the univariate case. This is in accordance with the discussion in Section 3.2.3. In this situation, the upper bound of the adjusted empirical likelihood ratio statistic is only slightly larger than the critical value. This results in a very long interval, which in turn has a higher-than-expected coverage probability.

As expected, the MAEL is a compromise between the EL and the AEL. We observe that the performance of MAEL is more similar to that of EL than to that of AEL, especially in the bivariate case. This may be due to the fact that the multiplier in the definition of the pseudo point in MAEL decreases to 0 very fast as $\mu$ deviates from $\bar{X}_n$; the decrease is even faster as the dimension increases. It implies that the difference between the MAEL and EL likelihood ratio statistics is smaller than that between the MAEL and AEL likelihood ratio statistics.

The EL∗, AEL∗ and MAEL∗ have similar performance because they are known to be precise up to the same order $n^{-2}$.
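The ordering of the EL, MAEL and AEL statistics discussed in this summary can be checked numerically in the univariate case. The sketch below is an illustration only, not the code used for the simulations: it assumes the standard Lagrange-multiplier form of the profile empirical likelihood solved by bisection, the AEL pseudo point $X_{n+1} - \mu = -a_n(\bar{X}_n - \mu)$ of Chen et al. (2008), and the multiplier of equation (3.8) with $S_n$ taken as the sample variance. All function names and the sample are hypothetical.

```python
import math
import statistics


def _neg2_log_el(z):
    """-2 log EL ratio for the hypothesis that the points z have mean 0."""
    z_min, z_max = min(z), max(z)
    if z_min == 0.0 and z_max == 0.0:
        return 0.0
    if z_min >= 0.0 or z_max <= 0.0:
        return math.inf  # 0 is not inside the convex hull: no solution
    # Optimal weights are p_i = 1 / (m * (1 + lam * z_i)) with m = len(z);
    # lam solves sum z_i / (1 + lam * z_i) = 0, which is decreasing in lam.
    lo, hi = -1.0 / z_max, -1.0 / z_min
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sum(zi / (1.0 + mid * zi) for zi in z) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + lam * zi) for zi in z)


def el_stat(x, mu):
    """Empirical likelihood ratio statistic W_n(mu)."""
    return _neg2_log_el([xi - mu for xi in x])


def ael_stat(x, mu, a_n):
    """Adjusted statistic W*_n(mu; a_n): add the pseudo point -a_n * z_bar."""
    z = [xi - mu for xi in x]
    z_bar = sum(z) / len(z)
    return _neg2_log_el(z + [-a_n * z_bar])


def mael_stat(x, mu, a_n):
    """Modified statistic: a_n is shrunk by exp(-|x_bar - mu| / s), as in (3.8)."""
    dist = abs(statistics.mean(x) - mu) / statistics.stdev(x)
    return ael_stat(x, mu, a_n * math.exp(-dist))


x = [-0.5, 0.1, 0.4, 1.2, 2.0]  # made-up sample, n = 5
a_n = math.log(len(x)) / 2
mu = 1.5
w_el, w_mael, w_ael = el_stat(x, mu), mael_stat(x, mu, a_n), ael_stat(x, mu, a_n)
print(w_ael, w_mael, w_el)
```

For a value of $\mu$ outside the range of the data, `el_stat` returns infinity (the no-solution problem of Section 2.6.2), while `ael_stat` stays finite and, for $\mu$ far from the sample, approaches the upper bound 3.851 quoted in Section 3.2.3.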
Chapter 5 Conclusion

The main interest of this thesis lies in the finite-sample properties of adjusted empirical likelihood and their implications for constructing confidence regions for the population mean. The monotonicity property of the adjusted empirical likelihood ratio statistic guarantees that AEL-based confidence regions for the population mean are at least star-shaped. This is a desirable property for confidence regions because of its intuitive interpretation. We also discovered the connection between empirical likelihood and adjusted empirical likelihood as a special case of a more general conclusion, which justifies the empirical observation that AEL-based confidence regions have higher coverage probability than the corresponding EL-based confidence regions.

The boundedness of the adjusted empirical likelihood ratio statistic reveals that a constant level of adjustment may produce inappropriate confidence regions when the sample size is not large enough or the nominal confidence level is too high. We attempted to modify the level of adjustment so as to obtain an unbounded likelihood ratio statistic, and showed that the proposed modification preserves both asymptotic and finite-sample properties of the original adjusted empirical likelihood.

As future research, the convexity of AEL-based confidence regions for the population mean is of interest; current empirical studies support this proposition. On the other hand, the choice of $a_n(\mu)$ also remains an interesting topic. As discussed in Section 3.2.3, we may adjust the trade-off between the coverage probability and the size of the confidence region for the population mean. If we can come up with a sensible criterion to evaluate a given kind of confidence region by taking both its coverage probability and its size into consideration, we may be able to find a family of $a_n(\mu)$ such that the resulting AEL-based confidence region for the population mean has both asymptotic and finite-sample advantages.
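The boundedness result recalled above is easy to evaluate numerically. The following sketch codes the limit from the proof of Theorem 3.3, $-2n \log[(n+1) a_n / (n(1+a_n))] - 2 \log[(n+1)/(1+a_n)]$, and searches for the smallest sample size at which the bound exceeds a given critical value when $a_n = \log(n)/2$. The function names and the search range are assumptions made for illustration; the critical value 3.841 is the $\chi^2_1$ upper 5% quantile quoted in Section 3.2.3.

```python
import math


def ael_upper_bound(n, a_n):
    """Upper bound of W*_n(mu; a_n) given by the limit in Theorem 3.3."""
    return (-2.0 * n * math.log((n + 1) * a_n / (n * (1.0 + a_n)))
            - 2.0 * math.log((n + 1) / (1.0 + a_n)))


def min_sample_size(critical_value, n_max=1000):
    """Smallest n >= 2 whose bound, with a_n = log(n)/2, exceeds critical_value.

    Returns None if no such n is found up to n_max (a hypothetical cap).
    """
    for n in range(2, n_max + 1):
        if ael_upper_bound(n, math.log(n) / 2) > critical_value:
            return n
    return None


bound_5 = ael_upper_bound(5, math.log(5) / 2)
print(round(bound_5, 3))        # 3.851, the value quoted in Section 3.2.3
print(min_sample_size(3.841))   # 5
```

With $n = 5$ the bound clears the 95% critical value by only about 0.01, which is why the AEL-based interval is finite but very long in that setting.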
Bibliography

Barndorff-Nielsen, O. E. and Cox, D. R. (1984). Bartlett adjustments to the likelihood ratio statistic and the distribution of the maximum likelihood estimator. Journal of the Royal Statistical Society, Series B, 46(3):483–495.

Chan, N. H. and Ling, S. (2006). Empirical likelihood for GARCH models. Econometric Theory, 22:403–428.

Chen, J., Chen, S.-Y., and Rao, J. N. K. (2003). Empirical likelihood confidence intervals for the mean of a population containing many zero values. The Canadian Journal of Statistics, 31(1):53–68.

Chen, J., Peng, L., and Zhao, Y. (2009). Empirical likelihood based confidence intervals for copulas. Journal of Multivariate Analysis, 100(1):137–151.

Chen, J. and Sitter, R. R. (1999). A pseudo empirical likelihood approach to the effective use of auxiliary information in complex surveys. Statistica Sinica, 9:385–406.

Chen, J., Sitter, R. R., and Wu, C. (2002). Using empirical likelihood method to obtain range restricted weights in regression estimators for surveys. Biometrika, 89(1):230–237.

Chen, J., Variyath, A. M., and Abraham, B. (2008). Adjusted empirical likelihood and its properties. Journal of Computational and Graphical Statistics, 17(2):426–443.

Corotis, R. B., Sigl, A. B., and Klein, J. (1978). Probability models of wind velocity magnitude and persistence. Solar Energy, 20(6):483–493.

DiCiccio, T., Hall, P., and Romano, J. (1991). Empirical likelihood is Bartlett-correctable. The Annals of Statistics, 19(2):1053–1061.

Emerson, S. C. and Owen, A. B. (2009). Calibration of the empirical likelihood method for a vector mean. Electronic Journal of Statistics, 3:1161–1192.

Hall, A. R. (2005). Generalized Method of Moments. Oxford University Press.

Hall, P. and La Scala, B. (1990). Methodology and algorithms of empirical likelihood. International Statistical Review, 58(2):109–127.

Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica, 50(4):1029–1054.

Hua, L. (2009). A report on an adjusted empirical likelihood. Manuscript, 16 pages.

Imbens, G. W. (2002). Generalized method of moments and empirical likelihood. Journal of Business and Economic Statistics, 20(4):493–506.

Liu, Y. and Chen, J. (2010). Adjusted empirical likelihood with high-order precision. The Annals of Statistics, 38(3):1341–1362.

Liu, Y. and Yu, C. W. (2010). Bartlett correctable two-sample adjusted empirical likelihood. Journal of Multivariate Analysis, 101(7):1701–1711.

Lun, I. Y. and Lam, J. C. (2000). A study of Weibull parameters using long-term wind observations. Renewable Energy, 20(2):145–153.

Nordman, D. J. and Caragea, P. C. (2008). Point and interval estimation of variogram models using spatial empirical likelihood. Journal of the American Statistical Association, 103(481):350–361.

Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75(2):237–249.

Owen, A. B. (1990). Empirical likelihood ratio confidence regions. The Annals of Statistics, 18(1):90–120.

Owen, A. B. (1991). Empirical likelihood for linear models. The Annals of Statistics, 19(4):1725–1747.

Owen, A. B. (2001). Empirical Likelihood. Chapman & Hall/CRC.

Qin, G. and Zhou, X.-H. (2005). Empirical likelihood inference for the area under the ROC curve. Biometrics, 62(2):613–622.

Qin, J. and Lawless, J. (1994). Empirical likelihood and general estimating equations. The Annals of Statistics, 22(1):300–325.

Qin, J. and Zhang, B. (2007). Empirical-likelihood-based inference in missing response problems and its application in observational studies. Journal of the Royal Statistical Society: Series B, 69(1):101–122.

Variyath, A. M., Chen, J., and Abraham, B. (2010). Empirical likelihood based variable selection. Journal of Statistical Planning and Inference, 140(4):971–981.

Wilks, S. S. (1938). The large-sample distribution of the likelihood ratio for testing composite hypotheses. The Annals of Mathematical Statistics, 9(1):60–62.

Zhu, H., Zhou, H., Chen, J., Li, Y., Lieberman, J., and Styner, M. (2009). Adjusted exponentially tilted likelihood with applications to brain morphology. Biometrics, 65(3):919–927.
Properties of empirical and adjusted empirical likelihoods Huang, Yi 2010
Item Metadata
Title | Properties of empirical and adjusted empirical likelihoods |
Creator |
Huang, Yi |
Publisher | University of British Columbia |
Date Issued | 2010 |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-08-26 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0071220 |
URI | http://hdl.handle.net/2429/27819 |
Degree |
Master of Science - MSc |
Program |
Statistics |
Affiliation |
Science, Faculty of Statistics, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2010-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
AggregatedSourceRepository | DSpace |