Essays on Empirical Likelihood

by

Jun Ma

B.A., Beijing Foreign Studies University, 2007
M.Sc., City University of Hong Kong, 2008
M.A., The University of British Columbia, 2009

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

The Faculty of Graduate and Postdoctoral Studies
(Economics)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

August 2014

© Jun Ma 2014

Abstract

This thesis consists of three research chapters on the theory of empirical likelihood (EL), a class of inferential methods widely used in econometrics. In Chapter 2, we focus on estimation and testing of moment restriction models with weakly dependent stationary time series data using the blockwise empirical likelihood method. EL-based methods often encounter the finite-sample problem that the constraint set of the profiling step becomes empty. This issue undermines the validity of EL-based methods in empirical applications. We first show first-order validity of the pseudo observation adjustment of Chen, Variyath and Abraham (2008), which is used to overcome this shortcoming. Under regularity conditions, key higher-order properties are found. The first property is that blockwise EL ratio statistics admit higher-order refinement, and this refinement can be implemented either via a mean adjustment to the EL ratio statistic or by creating a pseudo observation with a specific level of adjustment. By the latter approach, we address both the empty-constraint-set issue and the low precision of the chi-square approximation. We also find that, for testing problems, the optimal block length that minimizes the higher-order approximation error is of the order of the sample size to the power 2/5. In Chapter 3, we focus on parameter hypothesis testing problems for moment restriction models using EL ratio tests. We substantially extend existing theorems on Bartlett correctability of EL ratio tests for parameter testing problems in Chen and Cui (2007) and Chen and Cui (2006.a).
We consider tests of general nonlinear restrictions on the parameter under the null hypothesis. We show Bartlett correctability of EL ratio tests for this large family of testing problems, which are potentially useful in many empirical applications. In Chapter 4, we focus on estimation and testing of conditional moment restrictions with i.i.d. data. Following the adjusted empirical likelihood (AEL) approach proposed by Chen, Variyath and Abraham (2008), this paper develops AEL-based methods for conditional moment restrictions and establishes that the new methods produce semiparametrically efficient estimators and consistent specification tests. The new methods show improved computational efficiency and accuracy in finite samples compared to some existing alternatives.

Preface

This thesis was written under the supervision of Professor Vadim Marmer. I initiated the ideas of all the research chapters (Chapters 2, 3 and 4) and worked out all the theoretical and Monte Carlo simulation results independently. During this process, Professors Vadim Marmer, Jiahua Chen, Hiroyuki Kasahara and Kyungchul Song gave me valuable criticism that helped me improve the presentation of my work.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
2 Pseudo Observation Adjustment and Higher Order Properties of Blockwise Empirical Likelihood with Moment Restrictions
  2.1 Introduction
  2.2 Basic Setup and Review of Blockwise Empirical Likelihood
  2.3 Pseudo-observation Adjustment and its First-order Validity
  2.4 Higher-Order Properties of Blockwise Empirical Likelihood and Adjusted Blockwise Empirical Likelihood
  2.5 Monte Carlo Simulations
  2.6 Conclusion
  2.7 Proofs
3 Second Order Refinement of Empirical Likelihood Ratio Tests for General Parameter Testing Problems
  3.1 Introduction
  3.2 Setup
  3.3 The Main Results
  3.4 Monte Carlo Simulations
  3.5 Conclusion
  3.6 Proofs
4 On Pseudo Observation Adjustment of Empirical Likelihood Based Methods for Conditional Moment Restriction Models
  4.1 Introduction
  4.2 Basic Setup
  4.3 Review of Empirical Likelihood Method for Moment Restriction Models
  4.4 Problem of Infeasible Inner Loop Optimizations
  4.5 Pseudo Observation Adjustment
  4.6 Asymptotic Properties
  4.7 Monte Carlo Experiment
  4.8 Conclusion
  4.9 Proofs
5 Conclusion
Bibliography

List of Tables

2.1 Monte Carlo Distribution of #Θ̃1, Replications = 2000
2.2 Monte Carlo Distribution of #Θ̃1, Replications = 2000
2.3 Monte Carlo Distribution of #Θ̃1, Replications = 2000
2.4 Linear Model (ρ = 0.5): Empirical Sizes of BEL Based Tests and Adjusted BEL Based Tests
2.5 Linear Model (ρ = 0.9): Empirical Sizes of BEL Based Tests and Adjusted BEL Based Tests
2.6 Linear Model: Power Property of Adjusted BEL Based t-Test and BEL Based t-Test
2.7 Nonlinear Model (CAPM): Empirical Size of BEL Based Tests and Adjusted BEL Based Tests
2.8 Nonlinear Model (CAPM): Power Property of Adjusted BEL Based J-Test and BEL Based J-Test
3.1 Rejection Frequencies of Tests for H0 : β = 0 (T = 200)
3.2 Rejection Frequencies of Tests for H0 : α = β (T = 200)
4.1 Average of Fractions of Convex-Hull-Condition-Failing Points (500 Replications)
4.2 Relative RMSE of AKSEL
4.3 Relative RMSE of ABFEL

List of Figures

2.1 Shrinkage of Convex Hulls Due to Blocking
2.2 Construction of the Pseudo Observation
3.1 Rejection Frequencies for Different Sample Sizes (F = t5, λ = 4, d = 6)
3.2 Size-Corrected Power of EL Ratio, BEL Ratio and AEL Ratio Tests for H0 : β = 0 (T = 200)
3.3 Size-Corrected Power of EL Ratio, BEL Ratio and AEL Ratio Tests for H0 : α = β (T = 200)

Acknowledgements

I am greatly indebted to my thesis supervisor, Professor Vadim Marmer, for his guidance and great patience during my graduate studies. I am very grateful to Professors Jiahua Chen, Hiroyuki Kasahara and Kyungchul Song, whose comments helped me improve the presentation of my work. I am also grateful to Professor Shinichi Sakata for his excellent instruction in advanced econometrics and great patience. I thank Peisong Han, Yukun Liu, Paul Rilstone, Paul Schrimpf and Zhengfei Yu for their helpful comments and discussions.
I am grateful to the University of British Columbia for providing me with the fantastic opportunity to work on these interesting and exciting research projects, especially to the faculty members of the economics, mathematics and statistics departments, where I received the training necessary for being a rookie researcher in econometrics. I thank my parents for their support. I thank Renmin University of China for providing me an opportunity to continue working on my research after graduation.

Chapter 1

Introduction

In this thesis, I make contributions to the theory of the empirical likelihood method. It includes two very closely related topics. The first one is the pseudo observation adjustment approach to address the convex hull problem.

EL-based methods often encounter the finite-sample problem that the constraint set of the parametric optimization problem in the EL profiling step becomes empty for some or all parameter values. This makes it challenging to find an accurate EL estimate in practice or, in worse cases, we do not even have a well-defined EL profile likelihood function. A thorough discussion of the convex hull problem can be found in Kitamura (2006), Grendár and Judge (2009) and Owen (2001). The pseudo observation adjustment approach was proposed by Chen, Variyath and Abraham (2008). The original approach was proposed for unconditional moment restriction models with i.i.d. data. In this thesis, I extend this approach to unconditional moment restriction models with dependent data and conditional moment restriction models with i.i.d. data.

Another topic covered is the higher-order asymptotics of EL. We consider testing problems and aim at reducing the gap between the asymptotic distribution and the finite-sample distribution of the test statistics. The approach is known as Bartlett correction in the literature.
Previous research includes DiCiccio, Hall and Romano (1989) for smooth function models, Chen and Cui (1993, 1994) for linear regression models, Chen and Cui (2007) for overidentified moment restriction models, Chen and Cui (2006.a) for just-identified models with nuisance parameters, and Matsushita and Otsu for testing overidentifying restrictions. There are also many other papers on this topic. Liu and Chen (2010) found that the same goal can be achieved using pseudo observation adjustment. Pseudo observation adjustment is a novel approach that can address the same issue as Bartlett correction. In this thesis, I extend Liu and Chen (2010) to dependent data. Another important theorem missing in the literature is Bartlett correction of EL tests for general nonlinear parameter hypothesis testing problems. The papers mentioned tackle only a small set of parameter testing problems, while a large set of interesting testing problems is excluded, for example, overidentified models with nuisance parameters. In this thesis, I prove that EL ratio tests are still Bartlett correctable even if we consider a much larger class of testing problems.

Chapter 2

Pseudo Observation Adjustment and Higher Order Properties of Blockwise Empirical Likelihood with Moment Restrictions

2.1 Introduction

Moment restriction is an important modelling framework in economics which includes many useful econometric models as special cases. Many examples are found in the empirical macroeconomics, finance and labor economics literature.
In most of the papers with empirical applications in this framework, the generalized method of moments (GMM) of Hansen (1982), a family of estimation and testing methods for moment restrictions, is the most popular toolbox of applied econometricians.

There are examples of both linear and nonlinear moment restrictions, and of both static models (the data are cross-sectional and assumed to be independently and identically distributed (i.i.d.)) and dynamic models (the data are time-dependent and assumed to be serially dependent). To name a few, see Clarida, Gali and Gertler (2000) and Blinder and Maccini (1991) for examples of dynamic models. Many empirical labor economics papers that use instrumental variables are examples of static models. For a comprehensive list of papers with empirical applications in this modelling framework, see Hall (2005) and Hansen and West (2002).

Despite the popularity of GMM, Monte Carlo simulations for some models show that its finite-sample properties are poor. It is found that the finite-sample mean square error (MSE) of the (two-step) GMM estimator with optimal weighting matrix (efficient GMM) can be larger than that of GMM with identity weighting matrix. The need for a first-step estimate for estimating the optimal weighting matrix is a source of finite-sample bias. Some Monte Carlo studies in the literature showed that the $\chi^2$ distribution provides a poor approximation to the finite-sample distribution of GMM-based test statistics for static models and an even poorer one for dynamic models, which leads to severe distortion of the actual size of the tests from their nominal size. To name a few simulation studies of the finite-sample behavior of GMM with dynamic models, see Tauchen (1986), Hansen, Heaton and Yaron (1996) and Clark (1996); also see Hall (2005), chapter 6, for a comprehensive introduction.
In this paper, we focus on testing and propose new tests which are shown to have better statistical properties than tests based on GMM-based statistics (both the Wald test for parametric hypotheses (t-test) and the Sargan test (J-test) for testing validity of overidentifying restrictions).

As alternatives to GMM, the family of generalized empirical likelihood (GEL) methods (Newey and Smith (2004)), which includes empirical likelihood (EL) (Owen (1988), Qin and Lawless (1994)), the information-theoretic estimator (Kitamura and Stutzer (1997)) and the continuous updating estimator (Hansen, Heaton and Yaron (1996)) as special cases, was proposed for estimation and testing of static moment restriction models. To account for the serial dependence in time series data nonparametrically, GEL has been modified using the blocking and averaging technique (the blockwise empirical likelihood (BEL) of Kitamura (1997) and the blockwise GEL of Bravo (2009)) or, more generally, kernel smoothing (the smoothed GEL of Smith (2011)). The blocking technique was also used in the bootstrap for time series data (Politis and Romano (1992), Lahiri (2006)). See also Kitamura (2006) for another approach that first proposes a parametric model for the data-generating process (DGP) and then applies a GEL-type procedure. It has been proved that the GEL-based estimator is consistent and asymptotically normally distributed with the same asymptotic covariance matrix as the two-step GMM estimator with optimal weighting matrix. In parallel to parametric likelihood theory, we also define two different empirical likelihood ratio statistics for the t-test and the J-test, respectively. Kitamura (2001), Otsu (2010) and Newey and Smith (2004) established optimality of EL within the GEL family under different criteria.

The GEL estimator is computed in one step, while to compute the efficient GMM estimator we need a preliminary estimator to obtain the weighting matrix. GEL-based tests achieve asymptotic pivotalness without explicit studentization.
So intuitively we expect that GEL should have better finite-sample performance than efficient GMM. Newey and Smith (2004) derived higher-order asymptotic bias terms of GMM and GEL estimators for static models and found that, compared to that of GMM, the second-order asymptotic bias of GEL lacks two components. They also showed that, in parallel to parametric likelihood, bias-corrected EL is higher-order asymptotically efficient. Anatolyev (2005) showed that for dynamic models, smoothed GEL inherits these desirable higher-order properties. EL-based methods also have better second-order properties for testing problems. We have the following key results for static models in the literature. DiCiccio, Hall and Romano (1991) showed that, in parallel to parametric likelihood, EL for smooth function models (i.e., the parameter of interest is a smooth function of population moments, for example, the population variance) is Bartlett correctable, which means we can adjust the EL ratio statistic so that the $\chi^2$ approximation of the finite-sample distribution has an error of smaller order. Chen and Cui (2007) showed that EL for moment restrictions, which include smooth function models as special cases, admits this type of second-order refinement (Bartlett correction) for the EL ratio statistic for parametric hypotheses (t-test). Matsushita and Otsu (2012) showed that EL for moment restrictions admits second-order refinement for the EL ratio statistic for testing overidentifying restrictions (J-test). For dynamic models, Kitamura (1997) showed that BEL admits a higher-order refinement (Bartlett correction) for the smooth function model in a similar fashion.

There are several issues with GEL-based methods. Firstly, GEL is computationally costly relative to GMM. GEL requires working on a two-fold nested optimization routine. There is another practical issue (named the convex hull problem (CHP), as in Owen (2001)) with the computation of GEL in finite samples.
This issue originates from the structure of the GEL optimization problem we need to solve to get the GEL estimate. To obtain GEL estimates, we need to find the global minimizer of the objective function (cf. equation (2.4) below) numerically. However, this is an irregular optimization problem, which makes it challenging to find the minimizer using conventional numerical methods. Namely, we may have an implicitly defined feasible region and an objective function not defined outside of the feasible region. This issue may occur even if our moment restrictions (cf. equation (2.2)) are defined by a regular function. We do not have this problem with GMM-based methods (cf. equation (2.3) below). Moreover, it is hard to determine whether we have such irregularities, and consequently using conventional numerical methods without more careful consideration is very risky (cf. Kitamura (2006)). In worse cases, the estimator objective function can be nowhere defined analytically, so a valid GEL estimate does not exist (cf. Grendár and Judge (2005)). Ignoring this issue and directly carrying out computation of a GEL estimate with conventional numerical methods may lead to invalid estimation and testing results. Therefore the validity of GEL-based inference is seriously undermined in real applications when we encounter such irregularities. Owen (2001), Kitamura (2006) and Grendár and Judge (2009) analyzed this problem in detail (also cf. Section 2.2.3). The situation is even more severe for GEL-based methods with blocking and averaging or kernel smoothing for dynamic models (cf. Section 2.2.3).

Monte Carlo studies (cf. Chen, Variyath and Abraham (2008) and Liu and Chen (2010)) also reveal that there is an under-coverage problem with GEL-based confidence intervals in the context of static models. Equivalently, there is distortion of the size of GEL-based tests and, in particular, these tests tend to over-reject the null.
Monte Carlo simulations show that the distortion is bigger with GEL-based methods for dynamic models (cf. Gregory, Lamarche and Smith (2002)). As noticed by Tsao (2004), this under-coverage problem also originates from the structure of GEL and is related to CHP in some sense. Another related issue is that the coverage probability of GEL-based confidence intervals has a nontrivial upper bound.

In the statistical literature, several remedies have been proposed. In this paper, we follow the adjusted empirical likelihood (AEL) approach of Chen, Variyath and Abraham (2008). This method introduces a simple "pseudo observation" adjustment to GEL. The adjusted GEL method is robust to CHP and remains first-order valid in the sense that it has the same first-order asymptotic properties as GEL. The advantage of this approach is that the adjustment is flexible, and the researcher can set the adjustment in such a way that the adjusted EL ratio test has better second-order properties than the original EL ratio test (Liu and Chen (2010)). Chen, Variyath and Abraham (2008), Liu and Chen (2010) and Matsushita and Otsu (2012) only studied properties of AEL in the context of static models.

In this paper, we make the following contributions. First, following the pseudo observation adjustment approach, we define the adjusted BEL method, which is robust to CHP and has the same first-order asymptotic properties as BEL. Second, we find that the adjusted BEL method can improve on the size properties of GEL-based tests for dynamic models. This gives a new solution to the size distortion problem of some existing testing methods, including GMM and GEL. Third, we find the optimal order of the block length for testing problems of the adjusted BEL with higher-order refinement.

The paper is organized as follows. In Section 2.2, we describe the model. The data are assumed to be stationary and weakly dependent. We then review the BEL method.
We also discuss in more detail the CHP as well as the problem of poor size properties of existing tests. In Section 2.3, we define the pseudo observation adjustment in application to BEL and show that the new adjusted BEL inherits the first-order asymptotic properties of BEL and is robust to CHP. In Section 2.4, we first formally derive higher-order asymptotics of BEL ratio statistics and adjusted BEL ratio statistics for dynamic moment restriction models. Based on new results in probability theory (the Edgeworth expansion results of Lahiri (2006, 2007)), we formally extend the higher-order refinement result of Kitamura (1997) for the smooth function model to a bigger class of models, the dynamic moment restriction models. We show that higher-order refinements can be obtained for BEL with moment restrictions. We then show that, similarly to Liu and Chen (2010) in the context of static models, we achieve higher-order refinement by creating the pseudo observation in a specific way. When applying GEL with blocking and averaging or kernel smoothing, there is a tuning parameter, the block length, chosen by the econometrician. The block length grows with the sample size in the asymptotic analysis. We also show that, from the higher-order Edgeworth expansion, we are able to pin down the optimal order of the block length for both adjusted BEL and BEL. In Section 2.5, we report Monte Carlo results on the finite-sample performance of our method.

2.2 Basic Setup and Review of Blockwise Empirical Likelihood

2.2.1 Basic Setup

In this paper, the transpose of a matrix $A$ is denoted by $A^\tau$. We use $I$ to denote identity matrices. $\|\cdot\|$ denotes the Frobenius matrix norm for real matrices ($\|A\| = (\mathrm{trace}(A^\tau A))^{1/2}$) and the Euclidean norm for vectors. We use superscripts to denote coordinates of vectors or matrices: $x^j$ is the $j$-th element of a vector $x \in \mathbb{R}^s$.
Elements of multi-dimensional Euclidean space are understood to be column vectors. We use $B(x, \delta)$ to denote an open ball with center $x$ and radius $\delta$ in a Euclidean space. As is common in the multivariate statistics literature, we use Kronecker product notation to arrange derivatives. We adopt the convention that $\frac{\partial}{\partial x \otimes \partial x^\tau}$ is an $s \times s$ matrix of differential operators with $\frac{\partial}{\partial x^i \partial x^j}$ as the $(i, j)$-th element. For a matrix-valued function $G : \mathbb{R}^s \to \mathbb{R}^{d_1 \times d_2}$, we denote
\[ \left. \frac{\partial}{\partial x \otimes \partial x^\tau} G(x) \right|_{x = x'} \equiv \left. \frac{\partial}{\partial x \otimes \partial x^\tau} \otimes G(x) \right|_{x = x'}. \]
So $\left. \frac{\partial}{\partial x \otimes \partial x^\tau} G(x) \right|_{x = x'}$ is understood to be a $d_1 s \times d_2 s$ matrix of second-order cross partial derivatives of $G$ evaluated at $x' \in \mathbb{R}^s$.

Let $(X_t)_{t=1}^{T}$ be a set of $\mathbb{R}^s$-valued random vectors that are observed by econometricians, and let $T$ be the sample size. In this paper, we assume the existence of serial correlation in the data. For a stationary discrete stochastic process $(X_t)_{t \in \mathbb{Z}}$ defined on a probability space $(\Omega, \mathcal{F}, P)$, we define the $\alpha$-mixing (strong mixing) coefficients to be
\[ \alpha_k \equiv \sup_{A \in \mathcal{F}_{-\infty}^{0},\; B \in \mathcal{F}_{k}^{\infty}} \left| P(A \cap B) - P(A) P(B) \right| \tag{2.1} \]
where $\mathcal{F}_m^n$ ($m < n$) is the $\sigma$-field generated by $(X_m, X_{m+1}, \dots, X_n)$. For a sequence of $\sigma$-fields $(\mathcal{D}_j)_{j=1}^{\infty}$, we use $\mathcal{D}_m^n$ ($m < n$) to denote the $\sigma$-field generated by $\bigcup_{j=m}^{n} \mathcal{D}_j$. As is common in the literature, we concentrate on $(\alpha_k)_{k \in \mathbb{N}}$ as a measure of the serial dependence of the DGP and make some mild assumptions on the decay rate of $(\alpha_k)_{k \in \mathbb{N}}$. Throughout this paper, we assume that the following regularity condition holds.

Assumption 1. Our data $(X_t)_{t=1}^{T}$ is a finite-sample realization of a stationary $\mathbb{R}^s$-valued discrete stochastic process $(X_t)_{t \in \mathbb{Z}}$ with mixing coefficients satisfying $\sum_{k=1}^{\infty} \alpha_X(k)^{1 - 2/(2+\eta)} < \infty$ for some $\eta > 0$.

A large number of interesting time series processes satisfy this size condition on the $\alpha$-mixing coefficients under some mild technical conditions. See Doukhan (1994) for examples and details on the mixing properties of various families of time series processes.

Many economic models imply moment restrictions on the time series.
Hence we assume that for some known set $\Theta \subseteq \mathbb{R}^p$, there exists some $\theta^* \in \Theta$ such that
\[ E\left[ g(X_1, \theta^*) \right] = 0 \tag{2.2} \]
holds for some known function $g : \mathbb{R}^s \times \Theta \to \mathbb{R}^d$ ($d > p$). See Hall (2005) and Hansen and West (2002) for lists of examples of dynamic empirical models in this framework. We also make the following standard assumption.

Assumption 2. (a) The model is identifiable: $\theta^*$ is the unique point in $\Theta$ with $E[g(X_1, \theta^*)] = 0$;¹ (b) the parameter space $\Theta$ is compact.

¹ See Komunjer (2012) for some primitive conditions for the global identifiability assumption.

The GMM estimator is defined to be the minimizer of a quadratic form, which is a weighted sample analogue of the moment restrictions. The objective function of the efficient GMM estimator is
\[ \ell_{gmm}(\theta) \equiv \left( \frac{1}{T} \sum_{t=1}^{T} g(X_t, \theta) \right)^{\tau} W \left( \frac{1}{T} \sum_{t=1}^{T} g(X_t, \theta) \right) \tag{2.3} \]
where $W$ is the optimal weighting matrix. In constructing $W$, we need an estimate of the long-run covariance matrix $\sum_{j=-\infty}^{\infty} E\left[ g(X_t, \theta^*) g(X_{t-j}, \theta^*)^{\tau} \right]$. A popular choice is the heteroskedasticity and autocorrelation consistent (HAC) covariance estimator with the Bartlett kernel (cf. Andrews (1991)). In the literature, bootstrap estimates of the distribution of GMM-based test statistics have been proposed to help these tests achieve better size properties than those obtained using the asymptotic distribution. See Hall and Horowitz (1996) and Brown and Newey (2002) for bootstrapping in the context of static models, and see Inoue and Shintani (2006) and Allen, Gregory and Shimotsu (2011) for bootstrapping in the context of dynamic models assuming that the data are weakly dependent.

2.2.2 Review of Blockwise Empirical Likelihood

Next we describe the blocking technique. Let $M \le T$ be the block length and let some $L \le M$ be the separation between consecutive blocks. For any $\theta \in \Theta$, we consider the $i$-th data block
\[ \left\{ g(X_{(i-1)L+1}, \theta), \dots, g(X_{(i-1)L+M}, \theta) \right\}, \]
which is defined to be $M$ consecutive observations from $(g(X_t, \theta))_{t \in \mathbb{Z}}$.
Since the sample size is $T$, we have in total $Q = \lfloor (T - M)/L \rfloor + 1$ blocks, where $\lfloor c \rfloor$ is the largest integer not exceeding $c$. Let
\[ Y_i(\theta) \equiv \frac{1}{M} \sum_{v=1}^{M} g(X_{(i-1)L+v}, \theta), \qquad i = 1, \dots, Q, \]
be the averages of these blocks. More generally, instead of equal weights for each element in the block, Smith (2011) considered weighting based on a kernel function. $L = 1$ corresponds to fully overlapped blocking and $L = M$ corresponds to nonoverlapped blocking. These are the most commonly used blocking schemes in the bootstrap literature. $L = 1$ is known as Künsch's blocking rule and $L = M$ is known as Carlstein's blocking rule.

The block length is a tuning parameter chosen by the researcher. GMM-based methods also require choosing a tuning parameter, namely the window width in the HAC estimator of the long-run covariance matrix. We define the blockwise empirical likelihood (BEL) profile likelihood function $\ell$ to be
\[ \ell(\theta) \equiv \inf \left\{ -\sum_{i=1}^{Q} \log(w_i) : (w_1, \dots, w_Q) \in S_Q, \; \sum_{i=1}^{Q} w_i Y_i(\theta) = 0 \right\} \tag{2.4} \]
where $S_k \equiv \left\{ (w_1, \dots, w_k) \in \mathbb{R}_+^k : \sum_{i=1}^{k} w_i = 1 \right\}$ is the $(k-1)$-dimensional unit simplex. See Kitamura (2006) for the intuitive motivation behind this construction. The GEL family is characterized by replacing the logarithm function in (2.4) with any other smooth concave function. An approximate minimizer $\hat{\theta}$ of $\ell$ on $\Theta$ is defined to be the BEL estimator, and the minimum of $\ell$ on $\Theta$ is used to test overidentifying restrictions. We define the BEL ratio statistics
\[ LR_I \equiv 2 \left( \frac{QM}{T} \right)^{-1} \left\{ \ell(\hat{\theta}) - \ell(\theta_0) \right\}, \qquad LR_{II} \equiv 2 \left( \frac{QM}{T} \right)^{-1} \ell(\hat{\theta}). \tag{2.5} \]
$LR_I$ is used to test the parametric hypothesis $\theta^* = \theta_0$ and $LR_{II}$ is used to test overidentifying restrictions. Both statistics are $\chi^2$ distributed asymptotically under the corresponding nulls.
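As a minimal sketch of the blocking step described above, the following Python function forms the block averages $Y_i(\theta)$ from the moment values $g(X_t, \theta)$ evaluated at a fixed $\theta$. The function name `block_means` and the list-of-lists data layout are illustrative assumptions and do not come from the text; the code only mirrors the formulas $Q = \lfloor (T - M)/L \rfloor + 1$ and $Y_i(\theta) = M^{-1} \sum_{v=1}^{M} g(X_{(i-1)L+v}, \theta)$.

```python
def block_means(g_values, M, L):
    """Block averages Y_1, ..., Y_Q of a sequence of moment vectors.

    g_values: list of T vectors (lists of floats); the t-th entry stands
              for g(X_t, theta) at a fixed theta (hypothetical layout).
    M: block length, with 1 <= M <= T.
    L: separation between consecutive block starts, with 1 <= L <= M.
    """
    T = len(g_values)
    if not (1 <= L <= M <= T):
        raise ValueError("need 1 <= L <= M <= T")
    d = len(g_values[0])
    Q = (T - M) // L + 1  # number of blocks, floor((T - M) / L) + 1
    means = []
    for i in range(Q):
        start = i * L  # zero-based position of X_{(i-1)L+1}
        block = g_values[start:start + M]
        means.append([sum(row[k] for row in block) / M for k in range(d)])
    return means


if __name__ == "__main__":
    g = [[1.0], [2.0], [3.0], [4.0], [5.0]]
    print(block_means(g, M=2, L=1))  # fully overlapped blocking (L = 1)
    print(block_means(g, M=2, L=2))  # nonoverlapped blocking (L = M)
```

In this sketch, nonoverlapped blocking ($L = M$) uses each observation at most once and may drop up to $M - 1$ trailing observations, whereas fully overlapped blocking ($L = 1$) produces $T - M + 1$ block averages.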
It is noted in Kitamura (1997) that if we ignore serial dependence in the data and carry out the original EL estimation, the estimator is still consistent and asymptotically normal but is asymptotically inefficient relative to efficient GMM using the inverse of a HAC estimate of the long-run covariance matrix
\[ S \equiv \sum_{j=-\infty}^{\infty} E\left[ g(X_t, \theta^*) g(X_{t-j}, \theta^*)^{\tau} \right] \tag{2.6} \]
as the weighting matrix. Moreover, in this case the ratio statistics $LR_I$ and $LR_{II}$ are not $\chi^2$ distributed asymptotically under the nulls.

Seeing (2.4), we notice that to get the BEL estimate, we need to work with a nested optimization routine, since the profile likelihood function is a minimum value function. Following Kitamura (2006), we call the computation of $\ell(\theta)$ at each $\theta \in \Theta$ the inner loop and the minimization of $\ell(\theta)$ over $\theta \in \Theta$ the outer loop. The inner loop is a convex programming problem and the outer loop is a general nonlinear programming problem. For the inner loop, we usually solve its dual optimization problem (the Lagrange multiplier method) in practice: that is, for each $\theta \in \Theta$, we have the representation
\[ \ell(\theta) = \sup_{\gamma \in \Gamma(\theta)} \sum_{i=1}^{Q} \log\left( 1 + \gamma^{\tau} Y_i(\theta) \right) \tag{2.7} \]
where $\Gamma(\theta) \equiv \left\{ \gamma \in \mathbb{R}^d : 1 + \gamma^{\tau} Y_i(\theta) > 0, \; i = 1, \dots, Q \right\}$. The inner loop objective function in (2.7) has $d$ arguments, far fewer than the $Q$ arguments of the objective function in the original problem (2.4). From the nested nature of $\inf_{\theta \in \Theta} \ell(\theta)$, it is also clear that the BEL estimator is computationally much more costly than two-step GMM, especially when we have an overidentified system and the number of moment restrictions is large. From a more sophisticated computational perspective, minimizing (2.3) is a much simpler problem than (2.7), since the problem $\inf_{\theta \in \Theta} \ell(\theta)$ is irregular.

2.2.3 The Convex Hull Problem

A common practical problem is that, with our realized data, the constraint set in (2.4),
\[ \left\{ (w_1, \dots, w_Q) \in S_Q : \sum_{i=1}^{Q} w_i Y_i(\theta) = 0 \right\}, \tag{2.8} \]
can be empty at possibly many points, even at all the points in $\Theta$.
If (2.8) is empty at some point $\theta \in \Theta$, we set the value of $\ell$ to $\infty$ (by the convention that $\inf \emptyset = \infty$). Emptiness of (2.8) is equivalent to saying that the convex hull of the $Q$ real vectors $(Y_i(\theta))_{i=1}^{Q}$ does not contain the origin. This is known as the "convex hull problem (CHP)" of EL-based methods (cf. Owen (2001), Kitamura (2006)). We say that the convex hull condition is satisfied at $\theta$ if (2.8) is nonempty.

In finite samples, the optimization problem $\inf_{\theta \in \Theta} \ell(\theta)$ is irregular in the sense that it may have an implicitly defined feasible region and an objective function that is not defined outside of the feasible region. The optimization problem $\inf_{\theta \in \Theta} \ell_{gmm}(\theta)$ used to compute GMM estimates does not have this problem. The feasible region of the outer loop optimization problem, denoted by $\Theta_0$, is by definition the set of points in $\Theta$ at which the profile likelihood function is not $\infty$. The fact that (2.8) is empty (equivalently, the inner loop is an infeasible optimization problem) at a proportion of the points in $\Theta$ may prevent us from obtaining a reasonably accurate global minimizer. In cases with $\Theta_0 = \emptyset$, the profile likelihood function is not well-defined and EL-based estimates do not exist. In cases with $\Theta_0 \neq \emptyset$, $\Theta_0$ can have any shape and can be small relative to $\Theta$. It is also hard to specify a priori^2 and it can be disconnected. It can be hard to find a feasible initial point, but the initial point must be feasible when carrying out the outer loop optimization using any conventional algorithm^3. In practice, we solve the dual optimization problem $\sup_{\gamma \in \Gamma(\theta)} \sum_{i=1}^{Q} \log(1 + \gamma^{\tau} Y_i(\theta))$ instead of the primal to get a value for the profile likelihood function at each point $\theta \in \Theta$. By the separating hyperplane theorem, $\sup_{\gamma \in \Gamma(\theta)} \sum_{i=1}^{Q} \log(1 + \gamma^{\tau} Y_i(\theta)) = \infty$ if and only if the constraint set (2.8) is empty. So in our case the primal and dual values agree even when the inner loop primal (2.4) is infeasible.

^2 Except in simpler cases. For example, we can use theorems from parametric optimization theory (Bank, Guddat, Klatte, Kummer and Tammer (1983)) to prove that if $\theta \mapsto g(x, \theta)$ is linear for each $x$, the outer loop feasible region is convex, although it can still be empty. It is hard to get more results about the outer loop feasible region under weaker conditions.

^3 For a general nonlinear constrained programming problem, if we cannot immediately find a feasible initial point from the setup, finding a feasible initial point or determining whether a point is feasible can be hard. In fact, we often need to solve another optimization problem. For the computation of EL-based methods, we need to solve a linear programming problem. The two-phase simplex method (see Vanderbei (2008)) is a reliable numerical method to check whether the convex hull of a collection of vectors contains the origin.

Figure 2.1: Shrinkage of Convex Hulls Due to Blocking. (a) Fully-Overlapped Blocking; (b) Non-Overlapped Blocking. Circles: original observations; stars: effective observations after blocking; square: the origin.

As noticed by Kitamura (2006), a conventional algorithm may wrongly return a point in the infeasible region as a local minimizer if the initial point is infeasible. Therefore, if an applied econometrician is unaware of the presence of CHP and uses a conventional algorithm for the nested optimization routine, it is likely that a seemingly meaningful but invalid estimate is obtained. The irregularities of the outer loop make it difficult to get a valid BEL estimate as the global optimizer of the profile likelihood function. It has been proposed in the computer science literature that more sophisticated algorithms are needed to solve this type of irregular optimization problem (cf. Zilinskas, Fraga, Mackute and Varoneckas (2004)).
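For intuition, the inner loop dual (2.7) can be sketched for a scalar moment function ($d = 1$), where the feasible set of $\gamma$ is an interval and the first-order condition can be solved by bisection. The function name `inner_loop_dual_scalar` and the use of bisection are illustrative assumptions, not the algorithm used in this thesis; in higher dimensions a Newton-type method would be the usual choice.

```python
import math


def inner_loop_dual_scalar(y, iters=200):
    """Return (sup_gamma sum_i log(1 + gamma * y_i), maximizing gamma) for scalar y_i.

    The supremum is finite only if 0 lies strictly between min(y) and max(y)
    (or all y_i are zero); otherwise the dual value is +infinity, which
    corresponds to an empty or degenerate constraint set in the primal problem.
    """
    if all(v == 0.0 for v in y):
        return 0.0, 0.0
    lo_y, hi_y = min(y), max(y)
    if not (lo_y < 0.0 < hi_y):
        return math.inf, None          # convex hull problem: no finite value
    # 1 + gamma * y_i > 0 for all i  <=>  -1/max(y) < gamma < -1/min(y)
    lo, hi = -1.0 / hi_y, -1.0 / lo_y

    def score(gamma):
        # derivative of the concave dual objective; strictly decreasing in gamma
        return sum(v / (1.0 + gamma * v) for v in y)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    gamma = 0.5 * (lo + hi)
    return sum(math.log1p(gamma * v) for v in y), gamma


value, gamma = inner_loop_dual_scalar([-1.0, 2.0])
print(value, gamma)
print(inner_loop_dual_scalar([1.0, 2.0]))   # origin outside the hull
```

In the first call the origin lies inside the interval spanned by the two observations and the dual has a finite maximizer; in the second call both observations are positive, so the sketch reports an infinite value rather than searching for a spurious optimum.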
The risk of getting an invalid estimate is high if we do not try sufficiently many initial points and $\Theta_0$ is small relative to $\Theta$. It is necessary to use numerical methods to get evidence of the presence of CHP (cf. Grendár and Judge (2009) and also Section 2.5) before applying EL-based methods. But this procedure is computationally very costly and hard to implement, especially when the dimension of $\Theta$ is large. It is also necessary to check whether the chosen initial point is feasible. In view of these aspects, CHP is a serious limitation of GEL-based methods.

Table 2.1: Monte Carlo Distribution of $\#\tilde\Theta_1$ (Replications = 2000, T = 100)

Number of points where CHC fails   [0, 50]   (50, 100]   (100, 150]   (150, 200]   (200, 250]   (250, 300]   (300, 350]   (350, 400]
Fully overlapped                   0.0765    0.2485      0.3430       0.2370       0.0755       0.0145       0.005        0
Non-overlapped                     0         0.0005      0.0050       0.0495       0.1520       0.3595       0.3305       0.1030

Most EL-based methods in the literature (cf. Kitamura, Tripathi and Ahn (2007), Donald, Imbens and Newey (2008) for examples) require a two-fold optimization routine similar to $\inf_{\theta \in \Theta} \ell(\theta)$ for BEL. If the DGP $(X_t)_{t \in \mathbb{Z}}$ is i.i.d., then we can simply use EL without blocking. We compute $\inf_{\theta \in \Theta} \ell_{el}(\theta)$ where
$$\ell_{el}(\theta) \equiv \inf\left\{-\sum_{t=1}^{T} \log(w_t) : (w_1, \cdots, w_T) \in S_T,\ \sum_{t=1}^{T} w_t\, g(X_t, \theta) = 0\right\}. \tag{2.9}$$
Comparing (2.4) and (2.9), it is easy to see that for any $\theta \in \Theta$, the convex hull spanned by the block averages $(Y_i(\theta))_{i=1}^{Q}$ is a subset (typically a proper subset) of the convex hull spanned by $(g(X_t, \theta))_{t=1}^{T}$, because each block average is a convex combination of the original vectors. It is therefore possible that the origin is contained in the convex hull spanned by the original vectors $(g(X_t, \theta))_{t=1}^{T}$ but is not contained in the convex hull of the block averages $(Y_i(\theta))_{i=1}^{Q}$. But when working with a dynamic moment restriction model, we need to use the blocking technique to account for serial dependence in the data. Therefore BEL is more likely to be subject to CHP than the original EL. If we use GEL with kernel smoothing (cf. Smith (2011)), we have the same problem of shrunken convex hulls.
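The shrinkage of the convex hull under blocking can be reproduced with a toy example in $\mathbb{R}^2$. The sketch below is an illustration only: it uses an angular-gap test that works in two dimensions, not the linear programming check used in this chapter, and the data and the function names `origin_in_hull_2d` and `nonoverlapping_block_means` are hypothetical.

```python
import math


def origin_in_hull_2d(points, tol=1e-12):
    """Return True if the origin lies in the convex hull of the given 2-D points.

    The origin is outside the hull exactly when all points fit in an open
    half-plane through the origin, i.e. when some angular gap between
    consecutive directions exceeds pi.
    """
    angles = []
    for x, y in points:
        if abs(x) <= tol and abs(y) <= tol:
            return True                      # a point coincides with the origin
        angles.append(math.atan2(y, x))
    if not angles:
        return False
    angles.sort()
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2.0 * math.pi - (angles[-1] - angles[0]))   # wrap-around gap
    return max(gaps) <= math.pi + tol


def nonoverlapping_block_means(points, M):
    """Averages of consecutive non-overlapped blocks of length M."""
    Q = len(points) // M
    return [
        (sum(p[0] for p in points[i * M:(i + 1) * M]) / M,
         sum(p[1] for p in points[i * M:(i + 1) * M]) / M)
        for i in range(Q)
    ]


g = [(1.0, 0.0), (1.0, 0.2), (-1.0, 1.0), (-1.0, -1.0)]   # toy moment vectors
Y = nonoverlapping_block_means(g, M=2)
print(origin_in_hull_2d(g))   # hull of the original vectors
print(origin_in_hull_2d(Y))   # hull of the block averages
```

Here the origin lies inside the hull of the four original vectors, while the two block averages span only a line segment that misses the origin, so the convex hull condition holds before blocking and fails after it.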
The fact that blocking (or more generally, kernel smoothing) may result in shrunken convex hulls is illustrated in Figure 2.1. We generated 100 data points from a dependent process in $\mathbb{R}^2$ and then used fully overlapped blocking and non-overlapped blocking to construct effective data points.

As suggested by Grendár and Judge (2009), Monte Carlo simulation results on the finite-sample performance of GEL-based methods fail to reflect the hazard associated with the presence of CHP. In simulation settings we know the true parameter, and since the model is correctly specified, $\Theta_0$ is very likely to be around the true parameter. In this case, CHP may not prevent us from getting a valid estimate if we simply set the true parameter as the initial point. Grendár and Judge (2009) took several examples of moment restriction models from the statistical literature and derived conditions on the observations that lead to an empty feasible region. They then drew 1000 Monte Carlo samples and found that a fraction of these samples have an empty feasible region. We take a similar approach. We use the same simulation setting (linear instrumental variable model) as considered in Section 2.5. While in Section 2.5 we report the performance of fully overlapped and non-overlapped blocking, here we first give evidence of the effect of blocking and averaging on increasing the risk associated with CHP.

Table 2.2: Monte Carlo Distribution of $\#\tilde\Theta_1$ (Replications = 2000, T = 250)

Number of points where CHC fails   [0, 50]   (50, 100]   (100, 150]   (150, 200]   (200, 250]   (250, 300]   (300, 350]   (350, 400]
Fully overlapped                   0.0745    0.3290      0.4030       0.1610       0.0280       0.0045       0            0
Non-overlapped                     0         0.001       0.0050       0.0595       0.2755       0.4520       0.1955       0.0115

We run two simulations, with sample sizes equal to 100 and 250. The parameter space is assumed to be $[-1, 1] \times [-1, 1]$ and the true parameter is $(0, 0)$.
Let $\tilde\Theta$ be a 400-point grid on $[-1, 1] \times [-1, 1]$ (with equal distance between consecutive points). In Tables 2.1 and 2.2, we report the Monte Carlo distribution of $\#\tilde\Theta_1$, where $\tilde\Theta_1$ is the set of points in $\tilde\Theta$ that fail the convex hull condition^4. We find that in all of the 2000 replications, the convex hull condition holds at every point of the grid $\tilde\Theta$ if the original observations are used, but after blocking, many points of $\tilde\Theta$ fail the convex hull condition. This finding indicates a large risk of getting an invalid estimate using the BEL method if we do not have prior information to help us choose the initial point.

^4 To check whether the convex hull of a given collection of multi-dimensional vectors contains the origin, we use the two-phase simplex method from linear programming theory. Cf. Vanderbei (2008).

We also consider the consumption-based asset pricing model in Section 2.5. The parameter space is assumed to be $[0, 6]$. Let $\tilde\Theta$ be a 120-point grid on $[0, 6]$. In Table 2.3, we report the Monte Carlo distribution of the number of points on the grid that fail the convex hull condition.

Table 2.3: Monte Carlo Distribution of $\#\tilde\Theta_1$ (Replications = 2000, T = 100)

Number of points where CHC fails   [0, 20]   (20, 30]   (30, 40]   (40, 60]
Fully overlapped                   0.200     0.581      0.197      0.022
Non-overlapped                     0.048     0.414      0.432      0.106

Caution should be taken before applying GEL-based methods to dynamic models. It is also necessary to find a remedy for CHP with BEL in the context of dynamic models. In the statistical literature, researchers have proposed several remedies, including Owen (2001) (Chapter 10), Chen, Variyath and Abraham (2008) and Brown and Chen (1998).
We suggest using Chen, Variyath and Abraham (2008)'s method of pseudo observation adjustment, which was originally proposed in the context of static models. In Section 2.3, we show that pseudo observation adjustment is effective in addressing CHP in our setting. In Section 2.4, we show that this approach also has theoretically appealing features which may help us address another important theoretical issue.

2.3 Pseudo-observation Adjustment and its First-order Validity

Let $a > 0$ be a possibly data-dependent tuning parameter chosen by the econometrician, which is called the "level of adjustment". Following the construction proposed in Chen, Variyath and Abraham (2008), we define a pseudo observation as follows: for each $\theta \in \Theta$,
$$Y_{Q+1}(\theta) \equiv -\frac{a}{Q}\sum_{i=1}^{Q} Y_i(\theta) \tag{2.10}$$
where, as before, $Y_i(\theta) \equiv \frac{1}{M}\sum_{t=1}^{M} g(X_{(i-1)L+t}, \theta)$. The adjusted BEL profile likelihood function is defined as
$$\ell_a(\theta) \equiv \inf\left\{-\sum_{i=1}^{Q+1} \log(w_i) : (w_1, \cdots, w_{Q+1}) \in S_{Q+1},\ \sum_{i=1}^{Q+1} w_i Y_i(\theta) = 0\right\}. \tag{2.11}$$
We define the adjusted BEL estimator for $\theta^*$ to be an approximate minimizer of $\ell_a$, $\hat\theta_a \equiv \operatorname{argmin}_{\theta \in \Theta} \ell_a(\theta)$, where the approximation error is understood in an appropriate probabilistic sense. We also have its representation in dual optimization form
$$\ell_a(\theta) = \sup_{\gamma \in \Gamma_a(\theta)} \sum_{i=1}^{Q+1} \log\left(1 + \gamma^{\tau} Y_i(\theta)\right)$$
where $\Gamma_a(\theta) \equiv \left\{\gamma \in \mathbb{R}^d : 1 + \gamma^{\tau} Y_i(\theta) > 0 \text{ for } 1 \leq i \leq Q+1\right\}$, for every $\theta \in \Theta$.

Figure 2.2: Construction of the Pseudo Observation. (a) Convex Hull of the Original Observations; (b) Convex Hull of the Original and the Pseudo Observations.

For any realized data points, the constraint set in the inner loop optimization problem,
$$\left\{(w_1, \cdots, w_{Q+1}) \in S_{Q+1} : \sum_{i=1}^{Q} w_i Y_i(\theta) + w_{Q+1}\left(-\frac{a}{Q}\sum_{i=1}^{Q} Y_i(\theta)\right) = 0\right\}, \tag{2.12}$$
is always nonempty, since $\left(\frac{1}{Q}, \cdots, \frac{1}{Q}, \frac{1}{a}\right) \big/ \left(1 + \frac{1}{a}\right)$ is a point that satisfies the constraint and lies in the relative interior of $S_{Q+1}$.
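The feasibility claim for (2.12) can be checked numerically. The following sketch builds the pseudo observation (2.10) and the weight vector $(1/Q, \cdots, 1/Q, 1/a)/(1 + 1/a)$ for a toy set of block averages whose convex hull excludes the origin. The function names and the toy data are illustrative assumptions.

```python
def pseudo_observation(Y, a):
    """Y_{Q+1} = -(a / Q) * sum_i Y_i, as in (2.10). Y is a list of Q vectors."""
    Q, d = len(Y), len(Y[0])
    return [-(a / Q) * sum(y[j] for y in Y) for j in range(d)]


def feasible_weights(Q, a):
    """The point (1/Q, ..., 1/Q, 1/a) / (1 + 1/a) in the interior of the simplex."""
    scale = 1.0 + 1.0 / a
    return [(1.0 / Q) / scale] * Q + [(1.0 / a) / scale]


Y = [[1.0, 2.0], [3.0, 1.0]]          # both block averages in the positive quadrant
a = 1.0                                # level of adjustment, chosen arbitrarily here
Y_aug = Y + [pseudo_observation(Y, a)]
w = feasible_weights(len(Y), a)

weighted_sum = [sum(w[i] * Y_aug[i][j] for i in range(len(Y_aug))) for j in range(2)]
print(w, sum(w))        # strictly positive weights that sum to one
print(weighted_sum)     # the moment constraint holds at these weights
```

Because the pseudo observation points in the direction opposite to the sample mean of the block averages, the augmented set always admits these strictly positive weights, regardless of where the original block averages lie.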
The effect of the pseudo observation is illustrated in Figure 2.2.

In finite samples, it can be shown that the adjusted BEL profile likelihood function lies below the BEL profile likelihood function and is continuous everywhere on $\Theta$ if $g(x, \cdot)$ is continuous on $\Theta$ for every $x$. Since the constraint set (2.12) is always nonempty, the dual problem that we solve to construct the profile likelihood function is bounded and the existence of a solution is always guaranteed. See Chen and Huang (2012) for more finite-sample properties of pseudo observation adjustment, which can easily be extended to EL with blocking.

For any realized data points, the feasible region of the realized adjusted EL (AEL) profile likelihood function is the whole of $\Theta$. A set of good initial values from $\Theta$ is still needed when we use nonlinear optimization algorithms to find an estimate; however, we no longer have the problem of a disconnected feasible region, and any point in $\Theta$ is a valid initial point for the outer loop optimization. If we are interested in obtaining an estimate from the data using AEL, it is believed that in many cases computational efficiency is greatly enhanced relative to optimizing the original EL profile likelihood function.

Chen, Variyath and Abraham (2008) showed that, under some conditions on the tuning parameter $a$, the adjusted empirical likelihood has the same first-order asymptotic properties as the original EL, i.e. the estimator is consistent and asymptotically normally distributed (with the same asymptotic variance as the two-step efficient GMM estimator) and the EL ratio statistics have $\chi^2$ limiting distributions. In this paper, we show that this argument extends to BEL for dynamic models.
We define two adjusted blockwise empirical likelihood ratio statistics:
$$ALR_I \equiv 2\left(\frac{QM}{T}\right)^{-1}\left\{\ell_a(\theta_0) - \ell_a(\hat\theta_a)\right\} \tag{2.13}$$
$$ALR_{II} \equiv 2\left(\frac{QM}{T}\right)^{-1}\ell_a(\hat\theta_a).$$
Notice that $ALR_{II}$ is the appropriate statistic for testing the validity of the overidentifying restrictions:

$H_0 : E[g(X_1, \theta)] = 0$ for some $\theta \in \Theta$
$H_1 : E[g(X_1, \theta)] \neq 0$ for all $\theta \in \Theta$.

This is known as the J-test in the econometric literature. $ALR_I$ is the appropriate statistic for testing the parametric hypothesis $H_0 : \theta^* = \theta_0$ vs. $H_1 : \theta^* \neq \theta_0$. This is known as the t-test in the econometric literature. We further assume the following regularity conditions:

Assumption 3.
(a) For some small real number $\kappa \in (0, 1)$ and some $\eta > 0$, $E\left[\sup_{\theta' \in B(\theta, \kappa)} \|g(X_1, \theta')\|^{2+\eta}\right] < \infty$ for all $\theta \in \Theta$.
(b) For each $\theta \in \Theta$, there exists some $D_\theta \subseteq \mathbb{R}^s$ with $P[X_1 \in D_\theta] = 1$ such that for all $x \in D_\theta$, $\theta' \mapsto g(x, \theta')$ is continuous at $\theta$.

Assumption 4.
(a) The true parameter $\theta^*$ is an interior point of $\Theta$.
(b) There exists some $D^* \subseteq \mathbb{R}^s$ with $P[X_1 \in D^*] = 1$ such that for all $x \in D^*$, $\theta' \mapsto g(x, \theta')$ is twice continuously differentiable at $\theta^*$.
(c) $S \equiv \lim_{T \to \infty} T^{-1}\mathrm{Var}\left[\sum_{t=1}^{T} g(X_t, \theta^*)\right]$ is positive definite.
(d) $E\left[\|g(X_1, \theta^*)\|^{2+\eta}\right] < \infty$ for $\eta > 0$ given in Assumption 1.
(e) There exists some $\kappa \in (0, 1)$ such that $E\left[\sup_{\theta' \in B(\theta^*, \kappa)} \left\|\frac{\partial}{\partial\theta^{\tau}} g(X_1, \theta)\big|_{\theta = \theta'}\right\|^{2}\right] < \infty$.
(f) $G \equiv E\left[\frac{\partial}{\partial\theta^{\tau}} g(X_1, \theta)\big|_{\theta = \theta^*}\right]$ is of full column rank.

Under Assumption 1 on the decay rate of the mixing coefficients, if Assumption 4(d) holds, then $\lim_{T \to \infty} T^{-1}\mathrm{Var}\left[\sum_{t=1}^{T} g(X_t, \theta^*)\right]$ exists and a central limit theorem applies to the scaled sum $T^{-1/2}\sum_{t=1}^{T} g(X_t, \theta^*)$ as $T \to \infty$ (Ibragimov (1962) Theorem 1.7). The case of long memory is ruled out under these assumptions.

Proposition 2.1. Suppose Assumptions 1, 2, 3 hold. If we set $M = O(T^{1/2 - 1/(2+\eta)})$ where $\eta > 0$ is given in Assumption 4(e) and $a = o_p(T^{1/2})$, then $\hat\theta_a \to_p \theta^*$, i.e. the adjusted BEL estimator is weakly consistent.

Proposition 2.2. Suppose Assumptions 1 to 4 hold. If we set $M = O(T^{1/2 - 1/(2+\eta)})$ where $\eta > 0$ is given in Assumption 4(e) and $a = o_p(T^{1/2})$, then
$$\begin{pmatrix} T^{1/2}(\hat\theta_a - \theta^*) \\ M^{-1}T^{1/2}\hat\lambda_a \end{pmatrix} \to_d \mathrm{Normal}\left(0, \begin{pmatrix} V_\theta & 0 \\ 0 & V_\lambda \end{pmatrix}\right)$$
where $V_\theta \equiv (G^{\tau}S^{-1}G)^{-1}$, $V_\lambda \equiv S^{-1}(I - GV_\theta G^{\tau})$ and $\hat\lambda_a \equiv \operatorname{argmax}_{\gamma \in \Gamma_a(\hat\theta_a)} \sum_{i=1}^{Q+1} \log(1 + \gamma^{\tau}Y_i(\hat\theta_a))$.

The asymptotic covariance matrix $V_\theta$ is the same as the asymptotic covariance matrix of the efficient GMM estimator. But like the unadjusted BEL estimator, this estimator can be computed in one step, while the efficient GMM estimator requires a first-step preliminary estimator.

Proposition 2.3. Suppose Assumptions 1 to 4 hold. If we set $M = O(T^{1/2 - 1/(2+\eta)})$ where $\eta > 0$ is given in Assumption 4(e) and $a = o_p(T^{1/2})$, then:
(a) Under $H_0 : \theta^* = \theta_0$, $ALR_I \to_d \chi^2_p$.
(b) $ALR_{II} \to_d \chi^2_{d-p}$.

For tests using adjusted BEL ratio statistics, estimation of the covariance matrix is internalized. Adjusted BEL retains this favorable feature of BEL. These propositions generalize Chen, Variyath and Abraham (2008)'s first-order validity results in the context of static models to BEL for dynamic models. The regularity conditions are almost the same as in Kitamura (1997).

With the pseudo observation, we need to choose two tuning parameters: $a$ and the block length $M$. Therefore, an apparent shortcoming of pseudo observation adjustment is that we introduce an additional tuning parameter and our estimate is sensitive to our choice of the adjustment parameter. A conventional choice for $a$ is $\max\left\{1, \frac{\log(Q)}{2}\right\}$, as proposed by Chen, Variyath and Abraham (2008). From the simulation results in Chen, Variyath and Abraham (2008) and Liu and Chen (2010), it is seen that with such a choice, the size properties of the tests are better than those of the original EL-based tests.

Since Propositions 2.1, 2.2 and 2.3 allow for a data-driven choice of $a$, it is interesting to find an optimal choice of the tuning parameter $a$ based on the data. Compared to the original EL, the tuning parameter $a$ is an additional degree of freedom.
Liu and Chen (2010) and Matsushita and Otsu (2012) show that for adjusted EL with i.i.d. data, we can find an optimal data-driven choice of the adjustment parameter for testing problems. The finite-sample distributions of the adjusted EL ratio statistics (for both the t-test and the J-test) with this specific data-driven choice of adjustment level can be better approximated by the $\chi^2$ distribution: the order of the approximation error becomes smaller than the order of the error of the $\chi^2$ approximation to the finite-sample distribution of the unadjusted EL ratio statistics. In Section 2.4, using higher-order analysis, we find a data-driven choice of $a$ that achieves the same type of higher-order refinement. The precision of the $\chi^2$ approximation is enhanced. Our results extend Liu and Chen (2010) and Matsushita and Otsu (2012) to this time series setting.

2.4 Higher-Order Properties of Blockwise Empirical Likelihood and Adjusted Blockwise Empirical Likelihood

As discussed in Section 2.1, for hypothesis testing (both the t-test and the J-test) in dynamic moment restriction models, GMM-based versions of both the t-test (Wald test) and the J-test (Sargan test) have been shown to have severe size distortion in many papers with Monte Carlo studies. GMM-based tests tend to over-reject. One way to get tests with better size properties is to disregard the $\chi^2$ approximation and use a bootstrap estimate of the finite-sample distributions of GMM-based test statistics (Hall and Horowitz (1996), Inoue and Shintani (2006)). Allen, Gregory and Shimotsu (2011) proposed a combination of bootstrap and EL, and their Monte Carlo simulations show that this method is more effective in addressing the size distortion problem.

Another approach is to give up GMM-based methods and use alternatives, including GEL-based tests. For static models, researchers have noticed that GEL-based tests also over-reject in finite samples, both theoretically (Tsao (2004)) and via simulation studies (Chen, Variyath and Abraham (2008), Imbens, Spady and Johnson (1998)). For dynamic models, Gregory, Lamarche and Smith (2002)'s Monte Carlo simulations showed that tests based on one member of the GEL family (Kitamura and Stutzer (1997)) still have the problem. The magnitude of the size distortion is comparable to that of GMM-based tests. Therefore switching to GEL-based tests may not solve the problem.

Monte Carlo simulations in Chen, Variyath and Abraham (2008) show that AEL already improves the approximation precision for static models, even without implementing the second-order refinement proposed by Liu and Chen (2010). Therefore it would be interesting to see whether pseudo observation adjustment to BEL can address the issue of size distortion of existing testing methods for dynamic models. In Section 2.3, we showed the first-order validity of adjusted BEL, i.e. adjusted BEL has the same first-order asymptotic properties as BEL. In this section we derive higher-order properties.

This approach achieves refinement in a different way from the bootstrap. With this approach, we modify the original EL ratio statistics in such a way that the $\chi^2$ distribution better approximates their finite-sample distribution. With bootstrapping, we calibrate another distribution that may provide a better approximation to the finite-sample distribution of the test statistic. Therefore, it is interesting to see whether we can achieve an improvement in size properties over conventional methods that is comparable to the improvement due to Allen, Gregory and Shimotsu (2011)'s bootstrapping.

In Section 2.3, we showed that Propositions 2.1, 2.2 and 2.3 hold for a wide range of choices of the block length, and thus first-order properties of BEL are derived. The block length is a tuning parameter, and our estimation and testing results can be sensitive to the choice of block length in finite samples.
Another interesting question is to pin down the order of the optimal block length choice for testing problems.

In this section, we first show that the BEL ratio statistics can be mean-adjusted such that the order of the error of the $\chi^2$ approximation to their finite-sample distributions under the null hypothesis becomes smaller. Based on this result, we then show that there is a data-driven choice of the adjustment level such that the pseudo observation-adjusted BEL ratio statistics using this specific adjustment level also have this favorable feature.

In the literature, Chen and Cui (2007) and Matsushita and Otsu (2012) derived the higher-order properties of EL-based tests for static models. We first give an outline of the general approach. We first find a signed-root decomposition $R_k$ of the EL ratio statistics for $k = I, II$ with $LR_k = T \cdot R_k^{\tau}R_k + O_p(T^{-2})$. Recall that $k = I$ corresponds to the EL ratio statistic used for the t-test and $k = II$ corresponds to the EL ratio statistic used for the J-test. In the literature, the probabilistic approximation $T \cdot R_k^{\tau}R_k$ to $LR_k$ is considered. We call $T \cdot R_k^{\tau}R_k$ the signed-root approximation. To the first order, $T^{1/2}R_k$ is asymptotically normally distributed. $R_k$ is a polynomial in centralized sample averages of functions of $(X_t)_{t=1}^{T}$ (cf. page 35, Chen and Cui (2005) and page 330, Matsushita and Otsu (2012)).

Then either a mean adjustment or a pseudo observation adjustment (Liu and Chen (2010)) can be applied to implement the higher-order refinement. By either approach, we obtain an adjusted EL ratio statistic. It can be shown that the $\chi^2$ approximation to the finite-sample distribution of $T \cdot R_k^{\tau}R_k$ has an error of order $T^{-2}$. The error of the $\chi^2$ approximation to the original EL ratio statistics is of order $T^{-1}$. In this sense we say that EL-based tests admit higher-order refinement. In this section, we show how such refinement can be extended to our time series setting.
We expect that tests based on adjusted EL ratio statistics will have better size properties. Kitamura (1997) derived higher-order properties of EL-based tests in the framework of smooth function models. We formally extend the results in Kitamura (1997) to dynamic moment restriction models, a more general modelling framework.

In this section, we establish several key higher-order properties of BEL and adjusted BEL. Most interestingly, we show that a higher-order refinement can be implemented. The derivation of these new results hinges on some key algebraic results derived in the literature (Chen and Cui (2007) and Matsushita and Otsu (2013)) as well as some key Edgeworth expansion results in the literature. The validity of Edgeworth expansions of averages of block variables (fully overlapped blocking and non-overlapped blocking) was studied in Lahiri (2006, 2007); this paper is an application of his results. Another closely related application is Lahiri (2010), where the validity of general-order Edgeworth expansions for studentized sample means under weak dependence is established and an explicit expression of the expansion is derived. It is readily seen from the proof of Theorem 2 of Kitamura (1997) that, to the first order, $\ell_T(\theta^*) = Q \cdot \tilde{R}^{\tau}\tilde{R} + o_p(1)$, where $\tilde{R}$ is the sample mean of $(g(X_t, \theta^*))_{t=1}^{T}$ studentized by a block bootstrap estimator (cf. Lahiri (2003) chapter 3) of the long-run covariance matrix $S$.

BEL is a class of methods indexed by the blocking schemes. In this section, we restrict attention to non-overlapped blocking. With blocking, the serial correlation of the effective observations (i.e. the block averages) $(Y_i(\theta))_{i=1}^{Q}$ is less than that of the original observations. Non-overlapped blocking reduces the serial correlation more than other blocking schemes. So restricting to non-overlapped blocking simplifies our problem of deriving higher-order properties^5.

Let $V \equiv \mathrm{Var}(Y_i(\theta^*))$.
Under Assumption 1, $\lim_{M \to \infty} \mathrm{Var}(M^{1/2}Y_i(\theta^*))$ exists and is equal to the long-run covariance matrix. Therefore $V = O(\frac{1}{M})$. Now we define $\Psi_L$ and $\Psi_R$ to be unitary matrices associated with the singular value decomposition of $V^{-1/2}E\left[\frac{\partial}{\partial\theta^{\tau}}Y_i(\theta)\big|_{\theta = \theta^*}\right]$ such that
$$\Psi_L V^{-1/2} E\left[\frac{\partial}{\partial\theta^{\tau}}Y_i(\theta)\Big|_{\theta = \theta^*}\right]\Psi_R = \begin{pmatrix} \Lambda \\ 0 \end{pmatrix} \tag{2.14}$$
where $\Lambda$ is a $p \times p$ diagonal matrix and $0$ is a $(d-p) \times p$ zero matrix. We transform the effective observations to $W_i(\theta) \equiv \Psi_L V^{-1/2} Y_i(\theta)$. The transformation matrix $\Psi_L V^{-1/2}$ is estimable from the data. Let $W_i \equiv W_i(\theta^*)$. With probability approaching one, the transformation does not change $\ell$, i.e. for every $\theta \in \Theta$,
$$\ell(\theta) = \inf\left\{-\sum_{i=1}^{Q}\log(w_i) : (w_1, \cdots, w_Q) \in S_Q,\ \sum_{i=1}^{Q} w_i W_i(\theta) = 0\right\}.$$

^5 Kitamura (1997) also considered only non-overlapped blocking.

We use the convention that summation over repeated superscripts is understood, with suitable range. We use the following rule for the indices: $a, b, c \in \{1, \ldots, d-p\}$, $f, g, h, i, j \in \{1, \ldots, d\}$, $k, l, m, n, o \in \{1, \ldots, p\}$. For example, $A^jA^j = \sum_{j=1}^{d} A^jA^j$. Let
$$\alpha^{j_1 \ldots j_k} \equiv E\left[W_0^{j_1} \cdots W_0^{j_k}\right]. \tag{2.15}$$
We also use the notation
$$\alpha^{j_1 \ldots j_{k_1}\,\widetilde{j_{k_1+1} \ldots j_{k_2}}} \equiv E\left[W_0^{j_1} \cdots W_0^{j_{k_1}}\, W_1^{j_{k_1+1}} \cdots W_1^{j_{k_2}}\right]$$
to denote the first-order autocovariance of the transformed block variables $W_j$, $j \in \mathbb{Z}$. By applying the moment bound of Kim (1993), it is easy to check that $\alpha^{j_1 \ldots j_k} = O(1)$ under Assumption 4.

We make the following assumptions on the DGP $(X_t)_{t \in \mathbb{Z}}$. These are much stronger than Assumptions 3 and 4, under which we proved the first-order results.

Assumption 5.
(a) For some $\kappa \in (0, 1)$, $E\left[\|g(X_1, \theta^*)\|^{18+\kappa}\right] < \infty$ and $E\left[\left\|\frac{\partial}{\partial\theta^{\tau}}g(X_t, \theta)\big|_{\theta = \theta^*}\right\|^{6+\kappa}\right] < \infty$.
(b) For any $1 \leq j \leq p$, all partial derivatives of $(x, \theta) \mapsto g_j(x, \theta)$ with respect to $\theta$ up to third order exist in a neighborhood $B(\theta^*, \kappa)$ for some $\kappa \in (0, 1)$, and
$$E\left[\sup_{\theta' \in B(\theta^*, \kappa)} \left\|\frac{\partial^3}{\partial\theta \otimes \partial\theta^{\tau} \otimes \partial\theta^{\tau}} g(X_1, \theta)\Big|_{\theta = \theta'}\right\|^{3}\right] < \infty.$$
(c) $\sum_{k=1}^{\infty} k^{1+\kappa}\alpha_X(k)^{1-\kappa} < \infty$ for some $\kappa \in (0, 1)$.

Assumptions 5(a) and 5(b) are necessary assumptions on moments so that we can apply Lahiri (2006, 2007)'s Edgeworth expansion results to derive higher-order properties of our EL-based tests. Assumption 5(c) is a stronger restriction on serial dependence than Assumption 1, which is sufficient for the first-order results. Let
$$Z_t \equiv \begin{pmatrix} g(X_t, \theta^*) \\ \mathrm{Vec}\left(\frac{\partial}{\partial\theta^{\tau}}g(X_t, \theta)\big|_{\theta = \theta^*}\right) \end{pmatrix}. \tag{2.16}$$
We assume that the process $(X_t)_{t \in \mathbb{Z}}$ can be approximated in $L^2$ norm by some process that has exponentially decaying mixing coefficients.

Assumption 6. There exists a sequence of auxiliary $\sigma$-fields $(D_j \subseteq F)_{j \in \mathbb{Z}}$ such that
(a) There exists $\kappa \in (0, 1)$ such that for all $m > \kappa^{-1}$ and for all $j \geq 1$, there exists a $D_{j-m}^{j+m}$-measurable $Z^{\dagger}_{j,m}$ such that
$$E\left[\left\|Z_j - Z^{\dagger}_{j,m}\right\|^{2}\right] \leq \kappa^{-1}\exp(-\kappa m).$$
(b) There exists a constant $\kappa \in (0, 1)$ such that for all $n, m \in \mathbb{N}$,
$$\sup_{A \in D_{-\infty}^{n},\ B \in D_{n+m}^{\infty}} |P(A \cap B) - P(A)P(B)| \leq \kappa^{-1}\exp(-\kappa m).$$
(c) There exists $\kappa \in (0, 1)$ such that for all $i, j, k, r, m \in \mathbb{N}$ with $i < k < r < j$ and $m > \kappa^{-1}$,
$$E\left[\left|P(A \mid D_j : j \notin [k, r]) - P(A \mid D_j : j \in [j-m, k) \cup (r, j+m))\right|\right] \leq \kappa^{-1}\exp(-\kappa m).$$
(d) There exists $\kappa \in (0, 1)$ such that for all $m, j_0 \in \mathbb{N}$ with $\kappa^{-1} < m < j_0$ and for all $s \in \mathbb{R}^{d(1+p+p^2)}$ with $\|s\| \geq \kappa$,
$$E\left[\left|E\left[\exp\left(\mathrm{i}\,s^{\tau}\left(\sum_{t=j_0-m}^{j_0+m} Z_t\right)\right) \,\Big|\, D_t,\ t \neq j_0\right]\right|\right] \leq 1 - \kappa.$$

Under these assumptions, by applying theorems in Lahiri (2006, 2007), we can show that there exists a valid Edgeworth expansion for the sums of blocks of the variables $(Z_t)_{t \in \mathbb{Z}}$. Assumptions 6(a), 6(b), 6(c) are stated in terms of auxiliary $\sigma$-fields $(D_j \subseteq F)_{j \in \mathbb{Z}}$ to allow for more generality. Examples of DGPs $(X_t)_{t \in \mathbb{Z}}$ that satisfy these assumptions include linear processes, Gaussian processes, and nonlinear AR processes under some mild restrictions.
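See Götze and Hipp (1983, 1994) for examples of time series processes $(X_t)_{t \in \mathbb{Z}}$ such that their transformations by a regular function $g : \mathbb{R}^s \times \Theta \to \mathbb{R}^d$ and its derivative functions satisfy these regularity conditions, and for details about verification.

For $l, o \in \{1, \ldots, p\}$ and $a, b \in \{1, \ldots, d-p\}$, we define the following constants:
$$\Delta^{l}_{I,[1,1]} \equiv -\frac{1}{6}\alpha^{lkk}, \qquad \Delta^{lo}_{I,[2,1]} \equiv M\alpha^{l\,\tilde{o}}[2; l, o]$$
and
$$\begin{aligned}
\Delta^{lo}_{I,[2,2]} \equiv{}& \frac{1}{2}\alpha^{lokk} - \frac{1}{3}\alpha^{lkm}\alpha^{okm} - \frac{1}{36}\alpha^{lok}\alpha^{kmm} + \alpha^{lo\,p+a\,p+a} - \alpha^{l\,p+a\,p+b}\alpha^{o\,p+a\,p+b} \\
& - \alpha^{lk\,p+a}\alpha^{ok\,p+a} - \alpha^{lok}\alpha^{k\,p+a\,p+a} + \frac{1}{3}\left(\alpha^{lko\,\tilde{k}} + \alpha^{\widetilde{lko}\,k}\right) \\
& - \frac{1}{6}\left(\alpha^{lkk\,\tilde{o}}[2; l, o] + \alpha^{\widetilde{lkk}\,o}[2; l, o]\right) - \left(\alpha^{p+a\,p+a\,l\,\tilde{o}}[2; l, o] + \alpha^{\widetilde{p+a\,p+a\,l}\,o}[2; l, o]\right)
\end{aligned} \tag{2.17}$$
and
$$\Delta^{p+a}_{II,[1,1]} \equiv -\frac{1}{6}\alpha^{p+a\,p+b\,p+b}, \qquad \Delta^{p+a\,p+b}_{II,[2,1]} \equiv M\alpha^{p+a\,\widetilde{p+b}}[2; a, b]$$
and
$$\begin{aligned}
\Delta^{p+a\,p+b}_{II,[2,2]} \equiv{}& \frac{1}{2}\alpha^{p+a\,p+b\,p+c\,p+c} - \frac{1}{3}\alpha^{p+a\,p+c\,p+d}\alpha^{p+b\,p+c\,p+d} - \frac{1}{36}\alpha^{p+a\,p+b\,p+c}\alpha^{p+c\,p+d\,p+d} \\
& + \frac{1}{3}\left(\alpha^{p+a\,p+b\,p+c\,\widetilde{p+c}} + \alpha^{\widetilde{p+a\,p+b\,p+c}\,p+c}\right) \\
& - \frac{1}{6}\left(\alpha^{p+a\,p+c\,p+c\,\widetilde{p+b}}[2; a, b] + \alpha^{\widetilde{p+a\,p+c\,p+c}\,p+b}[2; a, b]\right)
\end{aligned} \tag{2.18}$$
where it is understood that $\alpha^{l\,\tilde{o}}[2; l, o] \equiv \alpha^{l\,\tilde{o}} + \alpha^{o\,\tilde{l}}$. We use the notation $[2; l, o]$ to indicate that there are two terms obtained by interchanging $l$ and $o$ in $\alpha^{l\,\tilde{o}}$. By Lemma 2.7 in the appendix and Kim (1993)'s moment bound, the constants $\Delta$ are $O(1)$. Let $f_p$ be the density of the $\chi^2_p$ distribution and $f_{d-p}$ be the density of the $\chi^2_{d-p}$ distribution. Let
$$\varrho_{I,1} \equiv p^{-1}\sum_{l=1}^{p}\Delta^{ll}_{I,[2,1]}, \qquad \varrho_{I,2} \equiv p^{-1}\sum_{l=1}^{p}\left(\Delta^{ll}_{I,[2,2]} + \Delta^{l}_{I,[1,1]}\Delta^{l}_{I,[1,1]}\right)$$
and
$$\varrho_{II,1} \equiv (d-p)^{-1}\sum_{a=1}^{d-p}\Delta^{p+a\,p+a}_{II,[2,1]}, \qquad \varrho_{II,2} \equiv (d-p)^{-1}\sum_{a=1}^{d-p}\left(\Delta^{p+a\,p+a}_{II,[2,2]} + \Delta^{p+a}_{II,[1,1]}\Delta^{p+a}_{II,[1,1]}\right).$$
These unknown $O(1)$ constants are defined to be the higher-order refinement factors. Compared to the refinement factors that refine the $\chi^2$ approximation to the distribution of EL ratio statistics with i.i.d. data (cf. 3.1 and 3.2 of Chen and Cui (2007), A4 of Matsushita and Otsu (2013)), these factors include far fewer terms.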
As we introduce blocking to account for serial dependence in the data, the higher-order expansion becomes a superposition of two series, one in powers of $M^{-1/2}$ and one in powers of $Q^{-1/2}$. The shorter form of the refinement factors is a natural consequence of our finer definition of "orders". The refinement factors depend on the second-order cumulants of the signed-root decomposition $Q^{1/2}R_k$. The $O(T^{-1})$ terms in the expansions of the second-order cumulants of the signed-root decomposition of the EL ratio statistics for the i.i.d. case are algebraically identical to a large part of the $O(Q^{-1})$ terms in the expansions of the second-order cumulants of the signed-root decomposition of the BEL ratio statistics, except that these cross moments are defined in terms of block averages. The $O(Q^{-1})$ terms also include a number of autocovariance terms. But because of blocking, and because the block length increases with the sample size ($M \to \infty$ as $T \to \infty$), some of these $O(Q^{-1})$ terms are in fact $O(Q^{-1}M^{-1/2})$, which means that these terms are asymptotically smaller. It is clear from the proofs of Propositions 2.4 and 2.5 that we need to look at only the leading terms.

Now we define the signed-root decomposition for BEL. We find a $p$-dimensional random vector $R_I$ and a $(d-p)$-dimensional random vector $R_{II}$ with $LR_I = Q \cdot R_I^{\tau}R_I + O_p(Q^{-1}M^{-1/2})$ and $LR_{II} = Q \cdot R_{II}^{\tau}R_{II} + O_p(Q^{-1}M^{-1/2})$. We call $R_I$ and $R_{II}$ the signed-root decomposition, and $Q \cdot R_I^{\tau}R_I$ and $Q \cdot R_{II}^{\tau}R_{II}$ the signed-root approximation to the original BEL ratio statistics. The derivation of $R_I$ and $R_{II}$ will be reviewed in Section 2.7.2. The signed-root statistics $R_I$ and $R_{II}$ are both polynomials in a vector of centralized sample means of block variables (cf. (2.43) and (2.44)). The derivation of the explicit forms of $R_I$ and $R_{II}$ follows the same steps as in Chen and Cui (2007) and Matsushita and Otsu (2013), where signed-root approximations to EL ratio statistics are derived.
Now we state the first higher-order property.

Proposition 2.4. Under Assumptions 1 to 6, we have
\[
\sup_{x \in \mathbb{R}} \left| P\left[ Q \cdot R_{\mathrm{I}}^{\tau} R_{\mathrm{I}} \le x \right] - P\left[ \chi^2_p \le x \right] + M^{-1} \varrho_{\mathrm{I},1}\, x f_p(x) + Q^{-1} \varrho_{\mathrm{I},2}\, x f_p(x) \right| = O(M^{-2}) + O(Q^{-1}M^{-1/2}) \tag{2.19}
\]
and
\[
\sup_{x \in \mathbb{R}} \left| P\left[ Q \cdot R_{\mathrm{II}}^{\tau} R_{\mathrm{II}} \le x \right] - P\left[ \chi^2_{d-p} \le x \right] + M^{-1} \varrho_{\mathrm{II},1}\, x f_{d-p}(x) + Q^{-1} \varrho_{\mathrm{II},2}\, x f_{d-p}(x) \right| = O(M^{-2}) + O(Q^{-1}M^{-1/2}). \tag{2.20}
\]

Proposition 2.4 extends Theorem 1 of Chen and Cui (2007) and Theorem 1 of Matsushita and Otsu (2013) to the time series setting. The probabilistic error of the signed-root approximation to the BEL ratio statistics is of smaller order than $O(M^{-2}) + O(Q^{-1}M^{-1/2})$. Proposition 2.4 shows that the error of the $\chi^2$ approximation to the finite-sample distribution of the BEL ratio statistics is of order $O(M^{-1})$. The proof can be found in the proof section. The idea of the proof is the same as in the proofs of Theorem 1 of Chen and Cui (2007) and Theorem 1 of Matsushita and Otsu (2013). The proof of (2.19) and (2.20) hinges on the fact that the third- and fourth-order joint cumulants of $R_{\mathrm{I}}$ and $R_{\mathrm{II}}$ have a very special property: after expanding, the sum of the terms of the largest asymptotic order is equal to zero, and therefore only terms of smaller asymptotic order ($O(Q^{-1/2}M)$ and $O(M^{-2}) + O(Q^{-1}M^{-1/2})$) are left (cf. (2.55) and (2.57) in the proof section).

Many important algebraic results in Chen and Cui (2007) and Matsushita and Otsu (2013) are used in our proof. However, a big difference between our BEL with time series data and their EL with i.i.d. data is that the expansion of the joint cumulants of $R_{\mathrm{I}}$ and $R_{\mathrm{II}}$ also involves many autocovariance terms, which requires additional algebraic work.
Moreover, the presence of autocovariance terms makes the form of the expansion of $P[Q \cdot R_{\mathrm{I}}^{\tau} R_{\mathrm{I}} \le x]$ very different from its EL counterpart. Because blocking is used to account for serial correlation, a superposition of a series in $M^{-1/2}$ and a series in $Q^{-1/2}$ appears in this expansion as well as in the expansion of the joint cumulants of $R_{\mathrm{I}}$ and $R_{\mathrm{II}}$. The same observation is made by Lahiri (2006, 2007, 2010) for the Edgeworth expansion of the studentized time series sample mean. The signed-root statistics $R_{\mathrm{I}}$ and $R_{\mathrm{II}}$ are, to first order, studentized time series sample means.

Proposition 2.4 gives rise to the following higher-order refinement result via an infeasible mean adjustment.

Proposition 2.5. Under Assumptions 1 to 6, we have
\[
\sup_{x \in \mathbb{R}} \left| P\left[ Q \cdot R_{\mathrm{I}}^{\tau} R_{\mathrm{I}} \le x \left( 1 + M^{-1}\varrho_{\mathrm{I},1} + Q^{-1}\varrho_{\mathrm{I},2} \right) \right] - P\left[ \chi^2_p \le x \right] \right| = O(M^{-2}) + O(Q^{-1}M^{-1/2}) \tag{2.21}
\]
and
\[
\sup_{x \in \mathbb{R}} \left| P\left[ Q \cdot R_{\mathrm{II}}^{\tau} R_{\mathrm{II}} \le x \left( 1 + M^{-1}\varrho_{\mathrm{II},1} + Q^{-1}\varrho_{\mathrm{II},2} \right) \right] - P\left[ \chi^2_{d-p} \le x \right] \right| = O(M^{-2}) + O(Q^{-1}M^{-1/2}). \tag{2.22}
\]

Proposition 2.5 indicates that the finite-sample distribution of an infeasible mean-adjusted version of the BEL ratio statistic, $LR_{\mathrm{I}} / (1 + M^{-1}\varrho_{\mathrm{I},1} + Q^{-1}\varrho_{\mathrm{I},2})$, can potentially be better approximated by the $\chi^2$ distribution. The unknown constants $\varrho$ can be estimated by sample moments. It can be proved that if we replace the unknown constants $\varrho$ by their $Q^{1/2}$-consistent estimates and define the mean-adjusted BEL ratio statistics with these sample estimates, the feasible version of the mean-adjusted BEL ratio statistic has the same higher-order properties (2.21) and (2.22). Proposition 2.5 can be viewed as an extension of the Bartlett correction results of Chen and Cui (2007) and Matsushita and Otsu (2013) to BEL with a dynamic moment restriction model. It is also an extension of Kitamura's (1997) Bartlett correction result for the smooth function model to the more general framework of dynamic moment restrictions.

We now look at a novel approach that achieves the higher-order refinement by defining pseudo observations. Let $\hat{\varrho}_{\mathrm{I},1} = \varrho_{\mathrm{I},1} + O_p(Q^{-1/2})$ and $\hat{\varrho}_{\mathrm{I},2} = \varrho_{\mathrm{I},2} + O_p(Q^{-1/2})$ be estimators of the unknown constants $\varrho_{\mathrm{I},1}$ and $\varrho_{\mathrm{I},2}$. By Lemma 2.7 in the proof section (cf. also Lahiri (2003)), estimators defined using sample analogues of the block variables have approximation error of order $O_p(Q^{-1/2})$. We define the feasible adjustment parameter
\[
a_{\mathrm{hr-I}} \equiv \frac{1}{2} \left( \frac{Q}{M} \hat{\varrho}_{\mathrm{I},1} + \hat{\varrho}_{\mathrm{I},2} \right). \tag{2.23}
\]
Then let $\ell_{\mathrm{hr-I}}$ be the "type-I" higher-order refined adjusted BEL profile likelihood function
\[
\ell_{\mathrm{hr-I}}(\theta) \equiv \inf \left\{ -\sum_{i=1}^{Q+1} \log(w_i) : (w_1, \ldots, w_{Q+1}) \in S_{Q+1},\ \sum_{i=1}^{Q} w_i Y_i(\theta) + w_{Q+1} \left( -\frac{a_{\mathrm{hr-I}}}{Q} \sum_{i=1}^{Q} Y_i(\theta) \right) = 0 \right\},
\]
which is the adjusted BEL profile likelihood function defined in (2.11) with the refinement factor (2.23) as the adjustment level. We can define the "type-II" higher-order refined adjusted BEL profile likelihood function $\ell_{\mathrm{hr-II}}$ in a similar fashion, using estimators $\hat{\varrho}_{\mathrm{II},1}$ and $\hat{\varrho}_{\mathrm{II},2}$ (such as sample moments of the block variables) to define the refinement factor as in (2.23). We now define the adjusted BEL ratio statistics for the t-test and the J-test. Let
\[
ALR_{\mathrm{hr-I}} \equiv 2 \left\{ \ell_{\mathrm{hr-I}}(\hat{\theta}_{\mathrm{hr-I}}) - \ell_{\mathrm{hr-I}}(\theta_*) \right\}, \qquad ALR_{\mathrm{hr-II}} \equiv 2\, \ell_{\mathrm{hr-II}}(\hat{\theta}_{\mathrm{hr-II}}), \tag{2.24}
\]
where $\hat{\theta}_{\mathrm{hr-I}} \equiv \operatorname{argmin}_{\theta \in \Theta} \ell_{\mathrm{hr-I}}(\theta)$ and $\hat{\theta}_{\mathrm{hr-II}} \equiv \operatorname{argmin}_{\theta \in \Theta} \ell_{\mathrm{hr-II}}(\theta)$ are the corresponding adjusted BEL estimators.

We find signed-root decompositions $R_{\mathrm{hr-I}}$ and $R_{\mathrm{hr-II}}$ for $ALR_{\mathrm{hr-I}}$ and $ALR_{\mathrm{hr-II}}$, with $ALR_{\mathrm{hr-I}} = Q \cdot R_{\mathrm{hr-I}}^{\tau} R_{\mathrm{hr-I}} + O_p(Q^{-1}M^{-1/2}) + O_p(M^{-2})$ and $ALR_{\mathrm{hr-II}} = Q \cdot R_{\mathrm{hr-II}}^{\tau} R_{\mathrm{hr-II}} + O_p(Q^{-1}M^{-1/2}) + O_p(M^{-2})$. The following proposition is the main result of this section.

Proposition 2.6. Under Assumptions 1 to 6,
\[
\sup_{x \in \mathbb{R}} \left| P\left[ Q \cdot R_{\mathrm{hr-I}}^{\tau} R_{\mathrm{hr-I}} \le x \right] - P\left[ \chi^2_p \le x \right] \right| = O(M^{-2}) + O(Q^{-1}M^{-1/2}) \tag{2.25}
\]
and
\[
\sup_{x \in \mathbb{R}} \left| P\left[ Q \cdot R_{\mathrm{hr-II}}^{\tau} R_{\mathrm{hr-II}} \le x \right] - P\left[ \chi^2_{d-p} \le x \right] \right| = O(M^{-2}) + O(Q^{-1}M^{-1/2}). \tag{2.26}
\]

This proposition generalizes Theorem 2 of Liu and Chen (2010) and Theorem 3.3 of Matsushita and Otsu (2013) to adjusted BEL with time series data.
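The pseudo-observation device in the adjusted profile likelihood can be illustrated numerically. The sketch below is not the estimation procedure of this chapter: it only appends a pseudo observation of the form $-(a/Q)\sum_{i=1}^{Q} Y_i$ to a set of block averages and checks that, for a positive adjustment level, explicit positive weights on the simplex satisfy the moment constraint. The function names, the simulated block averages and the value a = 1.5 are assumptions chosen purely for illustration.

```python
import numpy as np


def add_pseudo_observation(Y, a):
    """Append the pseudo observation -(a / Q) * sum_i Y_i to the Q x d array Y."""
    pseudo = -a * Y.mean(axis=0)  # -(a / Q) * sum_i Y_i equals -a times the average
    return np.vstack([Y, pseudo])


def feasible_weights(Q, a):
    """Positive weights summing to one that satisfy the constraint when a > 0.

    Equal weight a / (Q * (1 + a)) on each original block and weight
    1 / (1 + a) on the pseudo observation.
    """
    w = np.full(Q + 1, a / (Q * (1.0 + a)))
    w[-1] = 1.0 / (1.0 + a)
    return w


rng = np.random.default_rng(1)
# Hypothetical block averages shifted away from zero, so that zero is very
# likely outside the convex hull of the original Q points.
Y = rng.standard_normal((20, 3)) + 5.0
a = 1.5  # illustrative positive adjustment level (an assumption)

Y_adj = add_pseudo_observation(Y, a)
w = feasible_weights(Y.shape[0], a)
residual = w @ Y_adj  # weighted sum of all Q + 1 points
```

Because such weights exist whenever a > 0, the constraint set of the profiling step is non-empty at every parameter value. With a negative adjustment level these weights are no longer positive, which is the situation handled later with two pseudo observations.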
Proposition 2.6 shows that $ALR_{\mathrm{hr-I}}$ and $ALR_{\mathrm{hr-II}}$ are feasible BEL ratio statistics whose approximation errors are of a smaller order of magnitude. The probabilistic error of the signed-root approximation to the adjusted BEL ratio statistics is of smaller order than $O(M^{-2}) + O(Q^{-1}M^{-1/2})$. Proposition 2.6 gives the rationale for using the adjusted BEL ratio statistics in testing problems: we expect the $\chi^2$ approximation to the finite-sample distribution of the adjusted BEL ratio statistics with the specific data-driven adjustment (2.23) to be better than the $\chi^2$ approximation for the unadjusted BEL ratio test.

Two remarks are necessary. Firstly, $\varrho_{\mathrm{I},1}$ and $\varrho_{\mathrm{I},2}$ depend on the unknown parameter $\theta_*$. To obtain the estimated optimal adjustment parameter (2.23), we need a preliminary estimator. We propose to use the adjusted BEL estimate based on the conventional adjustment parameter $a = \max\{\log(Q)/2,\ 1\}$ to construct moment estimates of $\varrho_{\mathrm{I},1}$ and $\varrho_{\mathrm{I},2}$. Secondly, we may have $a_{\mathrm{hr-}k} < 0$ in finite samples, and in such cases we do not escape the convex hull problem. As proposed by Liu and Chen (2010), we can solve the problem by adding two pseudo observations, and Proposition 2.6 still applies. Because $\varrho_{\mathrm{I},2}$ is a sum of population moments of block variables, we simply decompose the estimator $\hat{\varrho}_{\mathrm{I},2}$ into $\hat{\varrho}_{[\mathrm{I},2],1}$ and $\hat{\varrho}_{[\mathrm{I},2],2}$ such that $\hat{\varrho}_{\mathrm{I},2} = \hat{\varrho}_{[\mathrm{I},2],1} - \hat{\varrho}_{[\mathrm{I},2],2}$. Then define $a_{\mathrm{hr-I},1} \equiv \frac{Q}{M}\hat{\varrho}_{\mathrm{I},1} + \hat{\varrho}_{[\mathrm{I},2],1}$ and $a_{\mathrm{hr-I},2} \equiv \hat{\varrho}_{[\mathrm{I},2],2}$. Define
\[
\ell_{\mathrm{hr-I}}(\theta) \equiv \inf \left\{ -\sum_{i=1}^{Q+2} \log(w_i) : (w_1, \ldots, w_{Q+2}) \in S_{Q+2},\ \sum_{i=1}^{Q+2} w_i Y_i(\theta) = 0 \right\},
\]
where $Y_{Q+1}(\theta) \equiv -\frac{a_{\mathrm{hr-I},1}}{Q} \sum_{i=1}^{Q} Y_i(\theta)$ and $Y_{Q+2}(\theta) \equiv \frac{a_{\mathrm{hr-I},2}}{Q} \sum_{i=1}^{Q} Y_i(\theta)$. We construct the BEL ratio statistics as in (2.24), and the approximation error result of Proposition 2.6 still holds.

The adjusted BEL ratio statistic using the higher-order-refining adjustment parameter still depends on the choice of block length. For simplicity, we denote the statistic by $ALR_M$ to emphasize its dependence on the block length $M$. The discussion in this subsection applies to both $ALR_{\mathrm{hr-I}}$ and $ALR_{\mathrm{hr-II}}$, so we omit the subscripts I and II.
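A rough way to see how the block length enters is to balance the two error terms appearing in (2.25). The sketch below is only a heuristic: it assumes that the number of blocks is of order $T/M$ (as with non-overlapping blocks, $L = M$), an assumption made here for illustration and not a restatement of the formal argument.

```latex
\[
M^{-2} \;\asymp\; Q^{-1} M^{-1/2}
\;\asymp\; \frac{M}{T}\, M^{-1/2} \;=\; \frac{M^{1/2}}{T}
\quad\Longrightarrow\quad
M^{5/2} \asymp T
\quad\Longrightarrow\quad
M \asymp T^{2/5}.
\]
```

Under this assumption both error terms are of order $T^{-4/5}$ when $M$ grows at the rate $T^{2/5}$. A shorter block length lets the $O(M^{-2})$ term dominate, while a longer one lets the $O(Q^{-1}M^{-1/2})$ term dominate.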
We can define the theoretical optimal block length $M^*$ as the one that minimizes the absolute difference between the nominal size $\alpha$ and the actual size of the test that uses the $\chi^2$ ($\chi^2_p$ or $\chi^2_{d-p}$) quantile $c_{1-\alpha}$ to define the rejection rule, i.e.
\[
M^* \equiv \operatorname*{argmin}_{M \in \mathcal{M}} \left| P\left[ ALR_{T,M} > c_{1-\alpha} \right] - \alpha \right| \tag{2.27}
\]
where $\mathcal{M}$ is the collection of block lengths for which the first-order validity results hold. Omitting the higher-order terms and not distinguishing between the adjusted BEL ratio statistic and its signed-root approximation, it is observed from (2.25) that the constraint $M \in \mathcal{M}$ is not binding and that the optimal block length is of the form $M^* = C^* T^{2/5}$, where $C^*$ is a positive constant depending on unknown population moments. Estimation of the constant $C^*$, or a data-driven choice of the block length, is left for future research.

2.5 Monte Carlo Simulations

In this section, we report the results of Monte Carlo simulations. We use simulations to see whether adjusted BEL with higher-order refinement is effective in addressing the size distortion problem of existing tests. Specifically, we compare the finite-sample performance of adjusted BEL with higher-order refinement and of the original BEL in testing problems. We report the empirical size and the size-adjusted power of these tests. We consider both the t-test and the J-test. We use two different simulation setups: a time series regression model and a consumption-based asset pricing model (CAPM). The former is an example of a linear model and the latter is an example of a nonlinear model. We choose the block length to be $M = T^{2/5}$ in the simulations. In practice it is desirable to have a data-dependent way of choosing the block length. A heuristically sensible data-dependent choice is to set the block length equal to the data-driven lag length of Newey and West (1994) with the Bartlett kernel.
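The blocking that underlies these simulations can be sketched in a few lines. The number of blocks follows the chapter's definition $Q = \lfloor (T - M)/L \rfloor + 1$ for block length $M$ and separation $L$ between consecutive blocks. The function name `block_means`, rounding $T^{2/5}$ down to an integer, the placeholder data, and the identification of non-overlapped blocks with $L = M$ and fully overlapped blocks with $L = 1$ are assumptions made for illustration.

```python
import numpy as np


def block_means(g, M, L):
    """Average the rows of g (a T x d array of moment functions) over blocks.

    Block i (i = 1, ..., Q) covers observations (i - 1) * L + 1, ..., (i - 1) * L + M,
    and Q = floor((T - M) / L) + 1.
    """
    T = g.shape[0]
    Q = (T - M) // L + 1
    starts = np.arange(Q) * L  # zero-based start index of each block
    return np.stack([g[s:s + M].mean(axis=0) for s in starts])


T = 250
M = int(np.floor(T ** 0.4))  # block length of order T^(2/5); rounding down is an assumption

rng = np.random.default_rng(0)
g = rng.standard_normal((T, 2))  # placeholder for g(X_t, theta) evaluated at some theta

Y_nonoverlap = block_means(g, M, L=M)  # consecutive blocks share no observations
Y_overlap = block_means(g, M, L=1)     # consecutive blocks are shifted by one observation
```

The rows of the returned arrays play the role of the block averages $Y_i(\theta)$ that enter the BEL and adjusted BEL profile likelihoods.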
Setting the block length equal to the Newey and West (1994) lag length is sensible because the EL ratio statistic is first-order asymptotically equivalent to a GMM Wald statistic using a HAC estimator with the Bartlett kernel.

2.5.1 A Linear Model

We use the Monte Carlo simulation setup of Inoue and Shintani (2006) and give evidence on the finite-sample behavior of our method. The model is
\[
Y_t = \theta_{*,1} + \theta_{*,2} X_t + U_t
\]
for $t = 1, \ldots, T$. $(U_t)_{t=1}^{T}$ and $(X_t)_{t=1}^{T}$ are generated by the AR(1) processes $U_t = \rho U_{t-1} + \varepsilon_{1t}$ and $X_t = \rho X_{t-1} + \varepsilon_{2t}$, where $(\varepsilon_{1t}, \varepsilon_{2t})_{t \in \mathbb{Z}}$ is an i.i.d. normal process with mean $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$ and covariance $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. Estimation is by instrumental variables with instruments $Z_t \equiv (1, X_t, X_{t-1}, X_{t-2})$. The null hypothesis of the t-test is $H_0 : (\theta_{*,1}, \theta_{*,2}) = (0, 0)$. The refinement factor is estimated using its sample analogue.

Table 2.4: Linear Model (ρ = 0.5). Empirical Sizes of BEL Based Tests and Adjusted BEL Based Tests

                                          t-test                      J-test
         Nominal Size             0.1     0.05    0.01        0.1     0.05    0.01
T = 100  BEL (Non-Overlapped)     0.3910  0.3095  0.2000      0.1980  0.1305  0.0580
         BEL (Fully Overlapped)   0.2820  0.2025  0.0875      0.1575  0.0940  0.0325
         Adjusted BEL             0.1640  0.1085  0.0480      0.1080  0.0655  0.0215
T = 250  BEL (Non-Overlapped)     0.2475  0.1665  0.0725      0.1405  0.0875  0.0280
         BEL (Fully Overlapped)   0.2200  0.1410  0.0480      0.0910  0.0445  0.0160
         Adjusted BEL             0.1125  0.0620  0.0175      0.0915  0.0510  0.0110
T = 500  BEL (Non-Overlapped)     0.1845  0.1235  0.0395      0.1185  0.0715  0.0190
         BEL (Fully Overlapped)   0.1900  0.1250  0.0345      0.0730  0.0340  0.0040
         Adjusted BEL             0.0965  0.0565  0.0100      0.0870  0.0415  0.0110

First, we compute the empirical sizes of the t-test and the J-test based on the original non-overlapped and fully overlapped BEL ratio statistics and $\chi^2$ quantiles, at three different nominal sizes. We find that these t-tests and J-tests severely over-reject the null hypothesis. The actual size of the J-test is closer to the nominal size than that of the t-test.
Second, we find that the empirical sizes of the t-test and the J-test based on the adjusted BEL ratio statistics with higher-order precision and the $\chi^2$ approximation are much closer to the nominal size. However, for the t-test the size distortion of our adjusted BEL ratio test with higher-order precision is still large when the block length is chosen to be $M = T^{2/5}$ and the sample size is small.

We now consider the power of the t-test based on the adjusted BEL ratio statistics. We consider power against five different alternatives: $\theta_* = (0.5, 0)$, $\theta_* = (0, 0.5)$, $\theta_* = (0.5, 0.5)$, $\theta_* = (0.5, 1)$ and $\theta_* = (1, 0.5)$. We report both the rejection frequency and the size-adjusted power. We run 2000 replications, with $T = 100$ and $T = 250$. From Table 2.6, we find a decay of power against all of the alternatives considered in our simulations when adjusted BEL is used. There is a heuristic reason behind this decay of power. Tsao and Zhou (2001) argued that the length of an EL based confidence interval can be heavily affected by outliers. The pseudo observation is likely to be an outlier, since it can be far away from the original observations.

Table 2.5: Linear Model (ρ = 0.9). Empirical Sizes of BEL Based Tests and Adjusted BEL Based Tests

                                          t-test                      J-test
         Nominal Size             0.1     0.05    0.01        0.1     0.05    0.01
T = 100  BEL (Non-Overlapped)     0.4225  0.2900  0.0340      0.2090  0.1520  0.0785
         BEL (Fully Overlapped)   0.4180  0.3535  0.2145      0.1665  0.1095  0.0520
         Adjusted BEL             0.1605  0.1145  0.0640      0.1135  0.0775  0.0370
T = 250  BEL (Non-Overlapped)     0.3910  0.2960  0.1620      0.1405  0.0850  0.0315
         BEL (Fully Overlapped)   0.4650  0.3695  0.1220      0.0800  0.0380  0.0140
         Adjusted BEL             0.0695  0.0385  0.0120      0.0680  0.0380  0.0105
T = 500  BEL (Non-Overlapped)     0.3730  0.2790  0.1535      0.1060  0.0590  0.0145
         BEL (Fully Overlapped)   0.3780  0.2840  0.1425      0.0550  0.0260  0.0055
         Adjusted BEL             0.0600  0.0325  0.0130      0.0680  0.0320  0.0090

Table 2.6: Linear Model. Power Property of Adjusted BEL Based t-Test and BEL Based t-Test

                                 T = 100                                 T = 250
                    Rejection         Size-Adjusted         Rejection         Size-Adjusted
                    Frequency         Power                 Frequency         Power
True Value  α       BEL     ABEL      BEL     ABEL          BEL     ABEL      BEL     ABEL
(0, 0.5)    0.01    1       0.3850    0.3835  0.1775        1       0.5385    0.9313  0.5120
            0.05    1       0.5070    0.6925  0.3900        1       0.6670    1       0.6370
            0.10    1       0.5745    0.8750  0.4850        1       0.7425    1       0.7290
(0.5, 0)    0.01    0.9660  0.2640    0.1210  0.1000        0.9107  0.4595    0.4140  0.4250
            0.05    0.9810  0.3850    0.3300  0.2655        0.9703  0.5975    0.9127  0.5600
            0.10    0.9910  0.4620    0.5175  0.3685        0.9857  0.6670    0.9637  0.6655
(0.5, 0.5)  0.01    1       0.4205    0.5555  0.2015        1       0.5210    1       0.4940
            0.05    1       0.5525    0.8270  0.4220        1       0.6465    1       0.6185
            0.10    1       0.5990    0.9420  0.5065        1       0.7125    1       0.7010
(0.5, 1)    0.01    1       0.4515    0.9560  0.2585        1       0.5380    1       0.5075
            0.05    1       0.5225    0.9960  0.4540        1       0.6670    1       0.6330
            0.10    1       0.6375    0.9995  0.5520        1       0.7385    1       0.7230
(1, 0.5)    0.01    1       0.4195    0.8550  0.2375        1       0.5525    1       0.4930
            0.05    1       0.5290    0.9760  0.4225        1       0.7405    1       0.6240
            0.10    1       0.6050    0.9975  0.5125        1       0.6585    1       0.7235

2.5.2 A Nonlinear Model (CAPM)

We consider a simple version of a consumption-based asset pricing model (CAPM) with the following moment conditions:
\[
E\left[ \exp\left( \mu - \theta (X_t + Z_t) + 3 Z_t \right) - 1 \right] = 0 \tag{2.28}
\]
\[
E\left[ Z_t \left( \exp\left( \mu - \theta (X_t + Z_t) + 3 Z_t \right) - 1 \right) \right] = 0 \tag{2.29}
\]
with $\mu = -0.72$. The data are generated by
\[
X_t = \rho X_{t-1} + \sqrt{1 - \rho^2}\, \varepsilon_{1t}, \qquad Z_t = \rho Z_{t-1} + \sqrt{1 - \rho^2}\, \varepsilon_{2t},
\]
where $(\varepsilon_{1t}, \varepsilon_{2t}) \sim N\left( 0, \begin{pmatrix} 0.16 & 0 \\ 0 & 0.16 \end{pmatrix} \right)$ i.i.d. and $\rho = 0.6$. $\theta = 3$ is the unique point at which the moment restrictions (2.28) are satisfied. This setup is the same as the simulation setup considered in Section 5.2.2 of Allen, Gregory and Shimotsu (2011). We also look at the same testing problems as Allen, Gregory and Shimotsu (2011). In Table 2.7, we report the empirical sizes of the t-tests and J-tests. We find that the adjusted BEL based tests have better size properties. Comparing the output in Table 2.7 with the output in Tables 2.5 and 2.6, we find that, when the sample size is small, adjusted BEL is more effective in addressing the size distortion problem in the nonlinear model than in the linear model. We can also compare Table 2.7 with Table 4 of Allen, Gregory and Shimotsu (2011). For the t-test in this nonlinear setup, our adjusted BEL based test is more effective in addressing the size distortion problem than the bootstrap method of Allen, Gregory and Shimotsu (2011). For the J-test, our adjusted BEL is as effective as their bootstrap.

Using this nonlinear setup, we also look at the power of J-tests based on the BEL and ABEL ratio test statistics.
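The base design of this subsection can be simulated directly, which also gives a quick numerical check that the moment conditions (2.28) and (2.29) hold at θ = 3 when µ = −0.72. The sketch below is illustrative only: the function names, the seed, the sample length and the draw of the initial value from the stationary distribution are assumptions, and no BEL estimation is carried out.

```python
import numpy as np


def simulate_ar1(T, rho, sigma2, rng):
    """Simulate x_t = rho * x_{t-1} + sqrt(1 - rho^2) * eps_t with eps_t ~ N(0, sigma2).

    The first value is drawn from N(0, sigma2), the stationary distribution
    (an illustrative choice for the initial condition).
    """
    eps = rng.normal(0.0, np.sqrt(sigma2), size=T)
    x = np.empty(T)
    x[0] = eps[0]
    scale = np.sqrt(1.0 - rho ** 2)
    for t in range(1, T):
        x[t] = rho * x[t - 1] + scale * eps[t]
    return x


def capm_moments(theta, X, Z, mu=-0.72):
    """Return the T x 2 array of moment functions in (2.28) and (2.29)."""
    e = np.exp(mu - theta * (X + Z) + 3.0 * Z) - 1.0
    return np.column_stack([e, Z * e])


rng = np.random.default_rng(0)
T = 200_000  # long sample so that averages are close to expectations
X = simulate_ar1(T, rho=0.6, sigma2=0.16, rng=rng)
Z = simulate_ar1(T, rho=0.6, sigma2=0.16, rng=rng)

gbar_at_3 = capm_moments(3.0, X, Z).mean(axis=0)  # sample moments at theta = 3
gbar_at_2 = capm_moments(2.0, X, Z).mean(axis=0)  # sample moments away from theta = 3
```

At θ = 3 the exponent reduces to µ − 3X_t, and since X_t is N(0, 0.16) in the stationary distribution, E[exp(−3X_t)] = exp(0.72), which µ = −0.72 offsets exactly. The sample moments at θ = 3 should therefore be close to zero, while those at θ = 2 should not.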
We consider variations of the previous DGP,
\[
X_t = \iota + \rho X_{t-1} + \sqrt{1 - \rho^2}\, \varepsilon_{1t}, \qquad Z_t = \iota + \rho Z_{t-1} + \sqrt{1 - \rho^2}\, \varepsilon_{2t},
\]
and the moment restrictions (2.28) with $\mu = -0.18$. We consider the following three DGPs.

DGP 0: $(\varepsilon_{1t}, \varepsilon_{2t}) \sim N\left( 0, \begin{pmatrix} 0.16 & 0 \\ 0 & 0.16 \end{pmatrix} \right)$ i.i.d. and $\rho = 0.6$, $\iota = 0$.
DGP 1: $(\varepsilon_{1t}, \varepsilon_{2t}) \sim N\left( 0, \begin{pmatrix} 0.09 & 0 \\ 0 & 0.09 \end{pmatrix} \right)$ i.i.d. and $\rho = 0.6$, $\iota = 0$.
DGP 2: $(\varepsilon_{1t}, \varepsilon_{2t}) \sim N\left( 0, \begin{pmatrix} 0.04 & 0 \\ 0 & 0.04 \end{pmatrix} \right)$ i.i.d. and $\rho = 0.6$, $\iota = 0.02$.

If the data are generated by DGP 0, (2.28) with $\mu = -0.18$ is satisfied at $\theta_* = 3$. If the data are generated by DGP 1 or DGP 2, we can show that in both cases there is no $\theta$ that satisfies (2.28). In Table 2.8, we report rejection frequencies and size-adjusted power for two different sample sizes, based on 10000 replications. We find that although the adjusted BEL based tests have favorable size properties, there is a decay of power with adjusted BEL, especially if the data are generated by DGP 2.

Table 2.7: Nonlinear Model (CAPM). Empirical Size of BEL Based Tests and Adjusted BEL Based Tests

                                         t-test                   J-test
         Nominal Size             0.01   0.05   0.1        0.01   0.05   0.1
T = 100  BEL (Non-Overlapped)     0.177  0.275  0.334      0.130  0.227  0.294
         BEL (Fully Overlapped)   0.145  0.242  0.314      0.106  0.213  0.279
         Adjusted BEL             0.027  0.061  0.090      0.037  0.080  0.119
T = 250  BEL (Non-Overlapped)     0.081  0.171  0.238      0.070  0.157  0.233
         BEL (Fully Overlapped)   0.070  0.160  0.231      0.059  0.139  0.211
         Adjusted BEL             0.018  0.051  0.094      0.027  0.077  0.124
T = 500  BEL (Non-Overlapped)     0.046  0.117  0.186      0.040  0.106  0.175
         BEL (Fully Overlapped)   0.044  0.116  0.187      0.028  0.095  0.168

Table 2.8: Nonlinear Model (CAPM). Power Property of Adjusted BEL Based J-Test and BEL Based J-Test

                         Rejection Frequency    Size-Adjusted Power
                  α      BEL     ABEL           BEL     ABEL
T = 100  DGP 0    0.01   0.071   0.009
                  0.05   0.153   0.040
                  0.10   0.227   0.078
         DGP 1    0.01   0.166   0.106          0.018   0.047
                  0.05   0.313   0.216          0.119   0.150
                  0.10   0.410   0.340          0.227   0.228
         DGP 2    0.01   0.262   0.031          0.084   0.017
                  0.05   0.394   0.100          0.216   0.063
                  0.10   0.476   0.177          0.315   0.123
T = 250  DGP 0    0.01   0.120   0.009
                  0.05   0.339   0.040
                  0.10   0.476   0.078
         DGP 1    0.01   0.288   0.106          0.119   0.108
                  0.05   0.485   0.216          0.339   0.245
                  0.10   0.592   0.340          0.476   0.352
         DGP 2    0.01   0.359   0.031          0.192   0.033
                  0.05   0.538   0.100          0.400   0.120
                  0.10   0.633   0.177          0.528   0.229

2.6 Conclusion

In this paper, we propose a pseudo observation adjustment to blockwise empirical likelihood (BEL) for dynamic moment restriction models. The pseudo observation adjustment is a device that helps us avoid the convex hull problem, which undermines the validity of BEL estimates. We prove first-order validity of this adjustment: the estimator is consistent, asymptotically normally distributed and asymptotically efficient, and the BEL ratio statistics have $\chi^2$ limiting distributions. The method that escapes the convex hull problem introduces a new tuning parameter. In the second part of the paper, we find a data-driven choice of the tuning parameter which is optimal for testing problems. This method is favorable because it escapes the convex hull problem and achieves higher-order refinement simultaneously. The new method is also proposed as a solution to the size distortion problem of existing tests in testing problems with dynamic moment restriction models. We also show that the optimal block length choice that minimizes the higher-order approximation error is of order $O(T^{2/5})$. This finding gives some guidance on selecting the block length in practice. A data-driven choice of the block length is a topic for future research. Lastly, we show the finite-sample performance of our method using Monte Carlo simulations.

2.7 Proofs

2.7.1 Proofs of Propositions in Section 3

"With probability approaching one" will be abbreviated to w.p.a.1. We use $\kappa$ to denote a generic positive constant that can be different in different places.
FOC is short for “first order condition”.“p.d.” and “p.s.d.” are short for positive definite and positive semi-definite. Remember that wedefined Yi(θ) ≡ 1M∑Mv=1 g(X(i−1)L+v, θ), i = 1, · · · , Q be averages of these blocks and assumed S ≡limT−→∞T−1Var[∑Tt=1 g(Xt, θ∗)]exists and is positive definite.First we extend Fitzenberger (1997)’s Lemmas A1 and A2 to more general blocking schemes. Wedefined Q ≡⌊T−ML⌋+ 1, where bcc is the biggest integer that is smaller than c, where M 6 T be blocklength and L 6M is the seperation between consecutive blocks.Lemma 2.1. Let (Xt)Tt=1 be stationary and strong mixing of sizeηη−1 for some η > 1 process withE[|X1|2η]<∞ then we have1Q∑Qi=11M∑Mv=1X(i−1)L+v =1T∑Tt=1Xt +Op(MT )and1QQ∑i=11M(M∑v=1X(i−1)L+v)(M∑v=1X(i−1)L+v)τ=1TT∑t=1XtXτt +M−1∑d=1(1−dM)(1TT−d∑t=1XtXτt+d)+M−1∑d=1(1−dM)(1TT−d∑t=1Xt+dXτt)+Op(MT)Proof. By interchanging the order of summations, we haveQ∑i=1M∑v=1X(i−1)L+v =Q∑i=1T∑t=11[(i− 1)L+M > t > (i− 1)L+ 1]Xt=T∑t=1(min{Q,⌊t− 1L⌋+ 1}−max{1,⌊t−ML⌋}+ 1)Xt=MLT∑t=1Xt +M∑t=1(⌊t− 1L⌋−ML+ 1)Xt+T∑t=(Q−1)L+1(Q−⌊t−ML⌋−ML)Xt +(Q−1)L+1∑t=M(1−1L)Xt362.7. Proofshold for all T large enough. By definition of Q, we have 1QM ≈1T−M+LLM . Therefore the differencebetween 1T∑Tt=1Xt and the leading term1QL∑Tt=1Xt is Op(MT ). For the second part, by interchangingsummation operations, we have1QQ∑i=11M(M∑v=1X(i−1)L+v)(M∑v=1X(i−1)L+v)τ=1QMQ∑i=1(i−1)L+M∑t=(i−1)L+1XtXτt +M−1∑d=1(i−1)L+M−d∑t=(i−1)L+1XtXτt+d+M−1∑d=1(i−1)L+M−d∑t=(i−1)L+1Xt+dXτt=1QMQ∑i=1(i−1)L+M∑t=(i−1)L+1XtXτt +M−1∑d=1Q∑i=1(i−1)L+M−d∑t=(i−1)L+1XtXτt+d+M−1∑d=1Q∑i=1(i−1)L+M−d∑t=(i−1)L+1Xt+dXτtNow apply the algebraic steps and results from proof of the first part to∑Qi=1∑(i−1)L+Mt=(i−1)L+1XtXτt ,∑Qi=1∑(i−1)L+M−dt=(i−1)L+1 XtXτt+d and∑Qi=1∑(i−1)L+M−dt=(i−1)L+1 Xt+dXτt . 
By some lengthy algebra, we can find thecorrect expression of the leading term and by applying we can see that the remainder terms are Op(MT )under the assumptions by applying the moment bound of Ibragimov (1962)’s Theorem 1.7 and Theorem11.14 of Severini (2005).Corollary 2.1. Under Assumptions 1,2, we have(a). T 1/2 1Q∑Qi=1 Yi(θ∗) −→d Normal(0, S)(b). MQ∑Qi=1 Yi(θ∗)Yi(θ∗)τ −→p S.Proof. By applying Lemma 2.1, we can see that T 1/2 1Q∑Qi=1 Yi(θ∗) = T−1/2∑Tt=1 g(Xt, θ∗) + op(1). Thenunder our assumptions, a central limit theorem (cf. Ibragimov (1962) Theorem 1.7) for stationary strongmixing process applies and we get the asymptotic normality. For part (b), we see that MQ∑Qi=1 Yi(θ∗)Yi(θ∗)τ =Sˆ + op(1) where Sˆ is the heteroskedasticity and autocorrelation consistent (HAC) variance estimator withBartlett kernel. Andrews (1991) showed that Sˆ is consistent for S given M = o(T ).Lemma 2.2. Let (Zt)t∈Z be a stationary process with E[‖Z1‖2+η]<∞ for some η > 0. Let the double-array stochatic process(BT,i ≡ 1M∑Mj=1 Z(t−1)L+j)T∈N,i∈Nbe a blocking transformation of (Zt)t∈Z, where372.7. ProofsM = o(T ) and L = O(M). Let Q ≡⌊T−ML⌋+ 1, then with probability 1, maxi=1,...,Q‖BT,i‖ = o(T1/2+η) andmaxt=1,...,T‖Zt‖ = o(T1/2+η).Proof. See Owen (1990) Lemma 3 and Kunsch (1989) Lemma 3.2.Lemma 2.3. Suppose that θ˜ is some estimator of θ∗ with θ˜ = θ∗ + OP (T−1/2) and M = o(T 1/2), thenMQ∑Qi=1 Yi(θ˜)Yi(θ˜)τ −→p S.Proof. By Lemma 2.1, under the assumption M = o(T 1/2), we have MQ∑Qi=1 Yi(θ∗)Yi(θ∗)τ −→p S. 
Wethen have∥∥∥∥∥MQQ∑i=1Yi(θ˜)Yi(θ˜)τ −MQQ∑i=1Yi(θ∗)Yi(θ∗)τ∥∥∥∥∥6∥∥∥θ˜ − θ∗∥∥∥ 2MQQ∑i=1(supθ∈Θ∥∥∥∥∂∂θτTi(θ)∥∥∥∥)(supθ∈Θ‖Ti(θ)‖)6∥∥∥θ˜ − θ∗∥∥∥ 2M(1QQ∑i=1(1M∑v=1supθ∈Θ∥∥∥∥∂∂θτg(X(i−1)L+v, θ)∥∥∥∥− E[supθ∈Θ∥∥∥∥∂∂θτg(X(i−1)L+v, θ)∥∥∥∥])×(1M∑v=1supθ∈Θ∥∥g(X(i−1)L+v, θ)∥∥− E[supθ∈Θ∥∥g(X(i−1)L+v, θ)∥∥])+Op(1))=∥∥∥θ˜ − θ∗∥∥∥ 2M ×Op(1)where the first and second inequalities follow by applying Lemmas 2.1 and 2.2.We first prove that the lagrange multiplier associated with the adjusted BEL has desired asymptoticproperties.Lemma 2.4. Suppose that θ˜ = θ∗ + OP (T−1/2),∥∥∥1Q∑Qi=1 Yi(θ˜)∥∥∥ = Op(T−1/2), M = O(T 1/2−1/η) anda = op(Q) hold. Let λ˜ be a random vector that attains supγ∈Γa(θ˜)∑Q+1i=1 log(1 + γτYi(θ˜)), then λ˜ = Op( MT 1/2 ).Proof. We have the FOCQ∑i=1Yi(θ˜)1 + λ˜τYi(θ˜)+YQ+1(θ˜)1 + λ˜τYQ+1(θ˜)= 0.382.7. ProofsDenote Y˜i ≡ Yi(θ˜). Put ξ ≡ λ˜‖λ˜‖. Then we have0 =ξτQQ+1∑i=1Y˜i1 + λ˜τ Y˜i=ξτQQ+1∑i=1λ˜−(λ˜τ Y˜i)Y˜i1 + λ˜τ Y˜i= ξτ(1QQ+1∑i=1Y˜i +1QY˜Q+1)−ξτQQ+1∑i=1(λ˜τ Y˜i)Y˜i1 + λ˜τ Y˜i= ξτ(1QQ∑i=1Y˜i)(1−aQ)−∥∥∥λ˜∥∥∥QQ+1∑i=1(ξτ Y˜i)21 + λ˜τ Y˜i.Therefore, we have∥∥∥λ˜∥∥∥(ξτ S˜?ξ)= ξτ(1QQ∑i=1Y˜i)(1−aQ)−∥∥∥λ˜∥∥∥Q(ξτ Y˜Q+1)21 + λ˜τ Y˜Q+16∥∥∥∥∥1QQ∑i=1Y˜i∥∥∥∥∥(1−aQ)where S˜? ≡ 1Q∑Qi=1Y˜iY˜ τi1+λ˜τ Y˜i. Let S˜ ≡ 1Q∑Qi=1 Y˜iY˜τi . By construction we have 1 + λ˜τ Y˜i > 0 for every1 6 i 6 Q + 1. It is easy to see that S˜?(1 + max16i6Qλ˜τ Y˜i) − S˜ is p.s.d. Therefore, by applyingCauchy-Schwartz inequality we have∥∥∥λ˜∥∥∥(ξτ S˜ξ)6∥∥∥λ˜∥∥∥(ξτ S˜?ξ)(1 + max16i6Qλ˜τ Y˜i)6∥∥∥1Q∑Qi=1 Y˜i∥∥∥(1− aQ)(1 +∥∥∥λ˜∥∥∥(max16i6Q∥∥∥Y˜i∥∥∥)).And we can solve for∥∥∥λ˜∥∥∥ and this gives∥∥∥λ˜∥∥∥(ξτ S˜ξ −∥∥∥1Q∑Qi=1 Y˜i∥∥∥(1− aQ)(max16i6Q∥∥∥Y˜i∥∥∥))6∥∥∥1Q∑Qi=1 Y˜i∥∥∥(1− aQ).under Assumption 1, by Lemma 2.1, have max16i6Q∥∥∥Y˜i∥∥∥ = op(T1/η). By Lemma 2.2, under Assumption3, we have M(ξτ S˜ξ)> κ w.p.a.1. 
Under the assumption M = O(T 1/2−1/η) and∥∥∥1Q∑Qi=1 Y˜i∥∥∥ = Op(T−1/2),WT ≡ −M∥∥∥1Q∑Qi=1 Y˜i∥∥∥(1− aQ)(max16i6Q∥∥∥Y˜i∥∥∥)= op(1) and therefore M(ξ′S˜ξ +WT)> κ w.p.a.1.Therefore, w.p.a.1, we have∥∥∥λ˜∥∥∥ 6 1κM∥∥∥1Q∑Qi=1 Y˜i∥∥∥(1− aQ), and the desired result follows.392.7. ProofsNow let θˆa be the adjusted BEL estimator. Now we prove consistency of θˆa. By definition, θˆa ≡argminθ∈Θ`a(θ), where `a is defined in (2.11). By letting λ(θ) denote a random vector that attainssupγ∈Γa(θ)Q+1∑i=1log (1 + γτYi(θ))for every θ ∈ Θ, we have `a(θ) =∑Q+1i=1 log (1 + λ(θ)τYi(θ)). λ(θ) depends on the whole sample pathand this makes it hard to directly extend Wald (1949)’s consistency proof for parametric likelihoodestimator (cf. Van Der Vaart (1998) chapter 5.6) to EL. We follow Kitamura (1997)’s approach bydefining ΨT (θ) ≡∑Q+1i=1 log(1 + T−1/2γ(θ)τYi(θ))with some γ(θ) of simple structure and apply Wald(1949)’s type of argument to this pseudo sample estimator function. Kitamura (1997) proposed to replaceλ(θ) by its population counterpart argmaxγ∈ΓTE [log(1 + γτYi(θ))] with ΓT ≡{T−1/2u : ‖u‖ = 1}. In ourprove, we replace λ(θ) by a simpler deterministic object T−1/2γ(θ)τ with γ(θ) ≡ E[g(X1,θ)]1+‖E[g(X1,θ)]‖ and thisleads to a simpler proof.Proof of Proposition 2.1. The construction of γ(θ) gives that ‖γ(θ)‖ 6 1 for every θ ∈ Θ and thatΨ∗(θ) ≡ γ(θ)τE [g(X1, θ)] is minimized at θ∗ under Assumption 2. It is easy to see that by applying Lemma2.1, under Assumption 3 (a) maxi=1,··· ,Qsupθ∈Θ‖Yi(θ)‖ = op(T1/2+η). Also by applying Fitzenberger Lemma A1,under Assumption 3 (a), we have supθ∈Θ‖YQ+1(θ)‖ 6 a × Op(1) = op(T1/2+η). Therefore, w.p.a.1, ΨT iswell-defined everywhere on Θ. LetET ≡[maxi=1,··· ,Q+1supθ∈Θ∣∣∣T−1/2γ(θ)τYi(θ)∣∣∣ < κ]for some κ ∈ (0, 1), it is easy to verify that under the assumptions, limT−→∞P (ET ) = 1. So we can defineΨT arbitrarily outside of the event ET .402.7. 
ProofsFor every θ ∈ Θ, we have−`a(θ) 61QQ∑i=1−log(1 + T−1/2γ(θ)τYi(θ))+1Q(−log(1 + T−1/2γ(θ)τYQ+1(θ)))61QQ∑i=1M∑v=11M− log(1 + T−1/2γ(θ)τg(X(i−1)L+v, θ))+1Q(−log(1 + T−1/2γ(θ)τYQ+1(θ)))=1TT∑t=1−log(1 + T−1/2γ(θ)τg(Xt, θ))+Op(MT) +1Q(−log(1 + T−1/2γ(θ)τYQ+1(θ)))where the second inequality is by convexity of x 7→ −log (1 + x) and the equality is by Fitzenberger (1997)Lemma A1. It is straightforward to verify that the Op(MT ) term is uniform in θ. We get the mean-valueexpansion1Qlog(1 + T−1/2γ(θ)τYQ+1(θ))=1Qγ(θ)τ(T−1/2YQ+1(θ))+1QT−2/2(γ(θ)τYQ+1(θ))2(1 + ψ(θ)T−1/2γ(θ)τYQ+1(θ))2therefore 1Q(−log(1 + T−1/2γ(θ)τYQ+1(θ)))= O(Q−1) uniformly in θ. By another mean value expansion,for some mean value ψt(θ) with maxt=1,··· ,Tsupθ∈Θ|ψt(θ)| < 1, we have1T 1−1/2T∑t=1log(1 + T−1/2γ(θ)τg(Xt, θ))=1TT∑t=1γ(θ)τg(Xt, θ) +1TT∑t=1T−1/2 (γ(θ)τg(Xt, θ))2(1 + ψt(θ)T−1/2γ(θ)τg(Xt, θ))2 .DefineΨ˜T (θ) ≡1TT∑t=1γ(θ)τg(Xt, θ)andRT (θ) ≡1TT∑t=1T−1/2 (γ(θ)τg(Xt, θ))2(1 + ψt(θ)T−1/2γ(θ)τg(Xt, θ))2 .It is easy to check that a uniform law of large numbers for stationary and mixing processes (Lemma 2.4 ofNewey and McFadden (1994)) applies to Ψ˜T . By Lemma 2.1, we have maxt=1,··· ,Tsupθ∈Θ‖g(Xt, θ)‖ = op(T1/2+η),therefore w.p.a.1, on a sequence of events defined as[maxt=1,··· ,Tsupθ∈Θ∣∣ψt(θ)T−1/2γ(θ)τg(Xt, θ)∣∣ < κ]for some412.7. Proofsκ ∈ (0, 1), we havesupθ∈Θ|RT (θ)| 6 T−1/2 1TT∑t=1supθ∈Θ‖g(Xt, θ)‖2(1− κ)2= op(1).Under Assumption 3(a), by generalized monotone convergence theorem (Theorem 9.14 of Yeh (2006)),for each θ ∈ Θ, we havelimδ−→0E[supθ′∈B(θ,δ)− γ(θ′)τg(X1, θ′)]= −γ(θ)τE [g(X1, θ)] .By Assumption 2 (a), for each θ 6= θ∗, −γ(θ)τE [g(X1, θ)] < 0. Fix some > 0 and let B ≡ Θ \ B(θ∗, ).For each θ ∈ B, there exists some δθ such that E[supθ′∈B(θ,δθ)− γ(θ′)τg(X1, θ′)]< 0. Since B is compact,there exists θ1, · · · , θm with B ⊆⋃mj=1 B(θj , δθj ). 
Then we havesupθ∈BΨ˜(θ) 6 maxj=1,··· ,m1TT∑t=1supθ∈B(θj ,δθj )γ(θ)τg(Xt, θ)andmaxj=1,··· ,m1TT∑t=1supθ∈B(θj ,δθj )γ(θ)τg(Xt, θ) −→p maxj=1,··· ,mE supθ′∈B(θj ,δθj )− γ(θ)τg(X1, θ′)which is strictly negative. We also have[θˆa ∈ B]⊆[supθ∈B− T 1/2`a(θ) > −T1/2`a(θ∗)]⊆[supθ∈B− Ψ˜T (θ) > −T1/2`a(θ∗) + op(1)].If T 1/2`a(θ∗) = op(1), in light of supθ∈B− Ψ˜T (θ) −→p supθ∈B− Ψ∗(θ) < 0, it is straightforward to show theprobability of the above event goes to 0 as T −→∞. The inequality−λ(θ∗)τ((1−aQ)1QQ∑i=1Yi(θ∗))61QQ+1∑i=1−log (1 + λ(θ∗)τYi(θ∗)) 6 0and Corollary 2.1 implies that 1Q∑Qi=1 Yi(θ∗) = Op(T−1/2). By Lemma 2.3, we haveλ(θ∗)τ((1−aQ)1QQ∑i=1Yi(θ∗))= Op(MT).422.7. ProofsThen it follows that T 1/2`a(θ∗) = Op(T−1/2+η).Lemma 2.5. We have∥∥∥1Q∑Qi=1 Yi(θˆa)∥∥∥ = Op(T−1/2) and θˆa = θ∗ +OP (T−1/2).Proof. We can use the same arguments as used in proofs of Lemma A3 of Newey and Smith (2004) andLemma A15 of Donald, Imbens and Newey (2003).By Lemma 2.3 and Lemma 2.5, it follows that MQ∑Qi=1 Yi(θˆa)Yi(θˆa)τ is consistent for the long-run covari-ance matrix S. Moreover,∥∥∥∥MQYQ+1(θˆ)YQ+1(θˆ)τ∥∥∥∥ 6MQa2∥∥∥∥∥1QQ∑i=1Yi(θˆ)∥∥∥∥∥2= op(1)if a = op(T1/2) andM = O(T 1/2−1/2+η). We are now ready to establish asymptotic normality of the adjustedBEL estimator. Using the expansion technique proposed in Newey and Smith (2004), we prove that theasymptotic normality holds under weaker assumption on the derivative of the moment restrictions.Proof of Proposition 2.2. Let λˆ denote λ(θˆa) ≡ argmaxγ∈Γa(θˆa)∑Q+1i=1 log(1 + γτYi(θˆa)). We apply implicitfunction theorem (Theorem 9.28 of Rudin (1976)). For each θ, the real-valued random maximizer λ(θ)that attains supγ∈Γa(θ)∑Q+1i=1 log(1 + γτYi(θ)) is a solution to the equations∑Q+1i=1Yi(θ)1+λ(θ)τYi(θ)= 0 . Itis easy to see that if∑Q+1i=1 Yi(θ)Yi(θ)τ is p.d., then∑Q+1i=1Yi(θ)Yi(θ)τ(1+λ(θ)τYi(θ))2 is p.d. and by implicit functiontheorem, λ is one to one over a small neighborhood around θ and differentiable at θ. 
We assumed the trueparameter θ∗ is an interior point of Θ and we have proved that θˆa is consistent. So w.p.a.1, θˆa ∈ int(Θ).As a consequence of Lemma 2.4 and Lemma 2.2, w.p.a.1 MQ∑Q+1i=1 Yi(θˆa)Yi(θˆa)τ is p.d.. Therefore, on asequence of events w.p.a.1, `a is differentiable at θˆa and FOC is satisfied:0 =1QQ+1∑i=1∂∂θτlog(1 + λˆτYi(θˆa))=(1QQ+1∑i=1∂∂θτ Yi(θˆa)τ1 + λˆτYi(θˆa))λˆ. (2.30)The second equality is by the envelope theorem. By Lemma 2.3 and Lemma 2.4, λˆ = Op( MT 1/2 ). Expanding432.7. Proofsthe FOC for λˆ around 0 gives0 =1QQ+1∑i=1Yi(θˆa)−1QQ+1∑i=1Yi(θˆa)Yi(θˆa)τ(1 + λ˙τYi(θˆa))2 λˆ, (2.31)where λ˙ is the mean value. Define Gˆ ≡ 1Q∑Qi=1∂∂θτ Yi(θˆa)τ1+λˆτYi(θˆa). Then from the expression (2.30), we have(Gˆ+1Q11 + λˆτYQ+1(θˆa)(∂∂θTYQ+1(θˆa))τ)λˆ = op(T−1/2). (2.32)From (2.31), define Ωˆ ≡ 1Q∑Q+1i=1 Yi(θˆa)Yi(θˆa)τ and Yˆ ≡ 1Q∑Qi=1 Yi(θˆa) and then we haveYˆ +Q−1YQ+1(θˆa)−Ωˆ +1QYQ+1(θˆa)YQ+1(θˆa)τ(1 + λ˙τYQ+1(θˆa)τ)2 λˆ = 0. (2.33)Then from the expression (2.31), we have: w.p.a.1,λˆ = Ωˆ−1Yˆ + Ωˆ−11QYQ+1(θˆa). (2.34)Plug the expression (2.34) into (2.32), we obtainop(T−1/2) = GˆΩˆ−1Yˆ + GˆΩˆ−11QYQ+1(θˆa) +1Q11 + λˆτYQ+1(θˆa)(∂∂θτYQ+1(θˆa))τλˆ. (2.35)Expanding around θ∗ givesGˆΩˆ−1Yˆ = GˆΩˆ−1G˙(θˆa − θ∗) + GˆΩˆ−1 1QQ∑i=1Yi(θ∗) (2.36)where G˙ ≡ 1Q∑Qi=1∂∂θτ Yi(θ˙) for mean value θ˙ that could be different in each row. By uniform weak lawof large numbers for stationary and mixing processes (Lemma 2.4 of Newey and McFadden (1994)), wehave G˙ −→p G ≡ E[∂∂θτ g(X1, θ∗)].By applying Lemma 2.5, we can get Gˆ(M Ωˆ)−1T 1/2 1QYQ+1(θˆa) = Op(T 1/2+ηQ ). By applying Lemma442.7. Proofs2.4 and Lemma 2.1, we can get T1/2M1Q11+λˆτYQ+1(θˆa)(∂∂θτ YQ+1(θˆa))τλˆ = Op(T1/2+ηQ ). Then we haveop(1) = Gˆ(M Ωˆ)−1G˙T 1/2(θˆa − θ∗) + Gˆ(M Ωˆ)−1T 1/21QQ∑i=1Yi(θ∗).By Corollary 2.1, T 1/2 1Q∑Qi=1 Yi(θ∗) −→d Normal(0,S). By Lemma 2.3 and Lemma 2.5, we have(M Ωˆ)−1−→p S−1 and therefore Gˆ(M Ωˆ)−1T 1/2 1Q∑Qi=1 Yi(θ∗) −→d Normal(0,GTS−1G). 
Then fromthe last display, we get T 1/2(θˆa − θ∗) −→d Normal(0,(GTS−1G)−1). From the expression (2.34), we getT 1/2Mλˆ =(M Ωˆ)−1(I− G˙(Gˆ(M Ωˆ)−1G˙)−1Gˆ(M Ωˆ)−1)(T 1/21QQ∑i=1Yi(θ∗))+ op(1)which gives the second result.To prove Proposition 2.3, we can use the same expansion argument as used by Theorem 2 of Kitamura(1997) to expand the BEL ratio statistics ALRI and ALRII.Then we can show that leading terms haveχ2 limiting distribution. We then use arguments similar to proof of Proposition 2.2 to show that theadditional terms given by the pseudo observation are of smaller asymptotic order.2.7.2 Basic Setup for Deriving Higher-Order PropertiesRemember that we transform the effective observations to Wi(θ) ≡ ΨLV−1/2Yi(θ) where ΨL is given by(2.14) and V ≡ Var(Yi(θ∗)). The transformation makes the derivation easier since we can apply some keyalgebraic results in Diccicio, Hall and Romano (1991), Chen and Cui (2007) and Matsushita and Otsu(2012). We also defineΥ1(λ, θ) ≡1QQ∑i=1Wi(θ)1 + λτWi(θ)Υ2(λ, θ) ≡1QQ∑i=1(∂∂θTWi(θ))τλ1 + λτWi(θ)andΞ ≡−I Ξ12Ξ21 0452.7. Proofswhere Ξ21 ≡ ΨR [Λ 0], Ξ12 ≡ Ξτ21 and ΨR is given by (2.14). Let Ω ≡ ΨRΛ−1 and we use ωkl to denotethe kl-th element of Ω. It is easy to check that ωkl = O(M−1/2). In this section superscript alwaysdenotes coordinate, i.e. let aj denote the j-th component of a real vector a. Denote µ ≡ (λ, θ). We defineΥ(µ) = S−1Υ1(λ, θ)Υ2(λ, θ). Define the terms (α−A system, β −B system, γ − C system respectively)αj1...jk ≡ E[W j10 · · ·Wjk0 ] ; Aj1...jk ≡1QQ∑i=1W j1i · · ·Wjki − αj1...jk (2.37)βj,j1...jk ≡ E[∂kΥj(µ)∂µj1 . . . ∂µjk∣∣∣∣µ=(0,θ∗)]; Bj,j1...jk ≡1QQ∑i=1∂kΥj(µ)∂µj1 . . . ∂µjk∣∣∣∣µ=(0,θ∗)− βj,j1...jkandγj,j1...jl;k,k1...km;p,p1...pn ≡ E[∂lW0(θ)∂θj1 . . . ∂θjl∂kW0(θ)∂θk1 . . . ∂θkm∂pW0(θ)∂θp1 . . . ∂θpn∣∣∣∣θ=θ∗];Cj,j1...jl;k,k1...km;p,p1...pn ≡1QQ∑i=1∂lWi(θ)∂θj1 . . . ∂θjl∂kWi(θ)∂θk1 . . . ∂θkm∂pWi(θ)∂θp1 . . . 
∂θpn∣∣∣∣θ=θ∗− γj,j1...jl;k,k1...km;p,p1...pn .We also use the notation αj1...jk1˜jk1+1 ...jk2 ≡ E[W j10 · · ·Wjk10 Wjk1+11 · · ·Wjk21]to denote first-order auto-covariance of the transformed block variables Wj , j ∈ Z. Similar to Diccicio, Hall and Romano (1988)step 2, linearizing the first order condition 1Q∑Qi=1Wi(θ∗)1+λ˜τWi(θ∗)= 0 we get the asymptotic expansionλ˜j = Aj −AjkAk + αjklAkAl +AjlAklAk +AjklAkAl − αklmAjmAkAl − 2αjkmAlmAkAl+ 2αjknαlmnAkAlAm − αjklmAkAlAm +Op(Q−2) (2.38)and the remaining term can be shown to be Op(Q−2) in a similar way. By Lemma 2.7, it is observed thatAj1...jk = Op(Q−1/2). By applying the moment bound (Theorem 2) of Kim (1993) it is easy to check thatαj1...jk = O(1) under Assumption 5. We also have`(θ∗) = 2Q∑i=1{λ˜τWi(θ∗)−12[λ˜τWi(θ∗)]2+13[λ˜τWi(θ∗)]3−14[λ˜τWi(θ∗)]4}+Op(Q−3/2) (2.39)462.7. ProofsBy plugging (2.38) into (2.39), we can get the expansion`(θ∗) = AjAj −AjiAjAi +23αjihAjAiAh +AjiAhiAjAh +23AjihAjAiAh − 2αjihAghAjAi+ αjgfαihfAjAiAhAg −12αjihgAjAiAjAg +Op(Q−5/2). (2.40)The algebraic expression of the expansions (2.38) and (2.40) is identical to its counterpart expansion ofEL lagrange multiplier and EL ratio statistic, except that the averages A′s in this paper are defined interms of block variables. We use (θˆ, λˆ) to denote the BEL estimator and the corresponding Lagrangemultiplier which is the saddle point of (θ, λ) 7→∑Qi=1 log(1 + λτWi(θ)) under Assumption 4, the first-order condition Υ(λˆ, θˆ) = 0 is satisfied at (θˆ, λˆ) w.p.a.1. Linearizing these first-order conditions, we getasymptotic expansion for (θˆ, λˆ) with remainder of order Op(Q−2). 
This expansion is in the same algebraicexpression as Chen and Cui (2007)’s equation (2.6) except that the remainder is of order Op(Q−2) andthe averages are defined in terms of block variablesµˆj − µj∗ = −Bj +Bj,kBk −12βj,klBkBl −Bj,kBk,lBl + βk,lmBj,kBlBm + βj,klBk,mBmBl (2.41)−12βj,klβk,mnBmBnBl −12Bj,klBkBl +16βj,klmBkBlBm +Op(Q−2).We also have the expansion`(θˆ) = 2Q∑i=1{λˆτWi(θˆ)−12[λˆτWi(θˆ)]2+13[λˆτWi(θˆ)]3−14[λˆτWi(θˆ)]4}+Op(Q−3/2) (2.42)By plugging (2.41) into (2.42), we get an asymptotic expansion expressed as polynomials of the β−Bsystem with error term of order Op(Q−3/2). We then find asymptotic expansions of the BEL ratio statisticsLRI and LRII. See Chen and Cui (2007) formula (2.9) and Matsushita and Otsu (2013) formula (2) fordetailed algebraic expression. Chen and Cui (2007)’s fundamental formula (A1 to A6 of Chen and Cui(2007)) that expresses the β − B system in terms of α − A system and γ − C system can be applied inour case where the β − B, α − A and γ − C systems are defined in terms of block variables. Applyingthese key algebraic equalities, from the algebraic expression of signed root decomposition of the EL ratiostatistics in Chen and Cui (2007) and Matsushita and Otsu (2013), we find a signed-root decomposition472.7. Proofsof the BEL ratio statistic, which are some p−dimensional random vectors RI,1, RI,2 and RI,3 and some(d− p)−dimensional random vectors RII,1, RII,2 and RII,3 withLRI = Q (RI,1 +RI,2 +RI,3)τ (RI,1 +RI,2 +RI,3) +Op(Q−1M−1/2)andLRII = Q (RII,1 +RII,2 +RII,3)τ (RII,1 +RII,2 +RII,3) +Op(Q−1M−1/2).We can find exlicit expressions of RI,1, RI,2 and RI,3, RII,1, RII,2 and RII,3. For any l ∈ {1, . . . 
, p}, wedefine RlI,1 ≡ Al;RlI,2.1 ≡ −12AklAl −Al p+aAp+a +13αklmAkAm + αkl p+aAp+aAk + αl p+a p+bAp+aAp+b;RlI,2.2 ≡ ωklCp+a,kAp+a −12γp+a,mnωmkωnlAp+aAk − γp+a;p+b,kωklAp+aAp+b;RlI,3.1 ≡38AlmAkmAk +13AlkmAkAm −512αlkmAnmAkAn −512αknmAlmAkAn+49αlknαomnAmAkAo −14αlknmAmAkAn;RlI,3.2 ≡ Alk p+aAp+aAk +Al p+a p+bAp+aAp+b +12AlkAk p+aAp+a +Al p+aAp+a p+bAp+b+12Al p+aAp+a kAk − αln p+bAp+b p+aAp+aAn − αln p+aAn p+aAp+aAp+b;RlI,3.3 ≡− αp+a p+b p+cAl p+cAp+aAp+b − 2αl p+a p+cAp+c p+bAp+aAp+b− αlk p+aAmp+aAkAm −23αlkmAmp+aAp+aAk −32αko p+aAloAp+aAk− 2αk p+a p+bAl p+bAp+aAk − αmp+a p+bAlmAp+aAp+b482.7. ProofsandRlI,3.4 ≡12αlk p+aαmnp+aAmAnAk +[2αp+a kfαlmf − αlkmp+a −13αlmnαkn p+a]Ap+aAkAm−[32αlk p+a p+b +13αklvαv p+a p+b +12αkv p+aαlv p+b]Ap+aAp+bAk+[αl p+a fαp+b p+c f − αl p+a p+b p+c]Ap+aAp+bAp+c+[2αl p+a fαmp+b f + αlmfαp+a p+b f]Ap+aAp+bAm.We also define RI,2 ≡ RI,2.1 + RI,2.2 and RI,3 ≡ RlI,3.1 + RlI,3.2 + RlI,3.3 + RlI,3.4. RI,1 + RI,2 + RI,3 is thedesired signed root decomposition. For any a ∈ {1, . . . , d− p}, we defind Rp+aII,1 ≡ Ap+a;Rp+aII,2.1 ≡ −12Ap+bAp+a p+b +13αp+a p+b p+cAp+bAp+c;Rp+aII,2.2 ≡ −ωklCp+a,kAl +12ωkmωlnγp+a,klAmAn + ωlmγp+a;p+b,lAp+bAm;andRp+aII,3 ≡38Ap+a p+cAp+b p+cAp+b +49αp+e p+a p+bαp+e p+c p+dAp+bAp+cAp+d−14αp+a p+b p+c p+dAp+bAp+cAp+d +13Ap+a p+b p+cAp+bAp+c−56αp+a p+b p+cAp+c p+dAp+bAp+d.We also define RII,2 ≡ RII,2.1 + RII,2.2. RI,1 + RI,2 + RI,3 is the desired signed root decomposition. Kim(1993)’s moment bound shows that αfgh = O(1) for any (f, g, h) ∈ {1, . . . , d}3 and γp+a;p+b,l = O(1)for any a, b ∈ {1, . . . , d− p} and any l ∈ {1, . . . , p}. It is also easy to see that γp+a,kl = O(M 1/2). Itis readily seen that RI,1, RII,1 = Op(Q−1/2), RI,2.1, RII,2.1 = Op(Q−1), RI,2.2, RII,2.2 = Op(Q−1M−1/2) andRI,3, RII,3 = Op(Q−3/2). LetU¯A ≡ (A1, . . . , Ad, A11, . . . , Add, A111, . . . , Addd)τ (2.43)andU¯C ≡ (C1,1, . . . , C1,q, . . . , Cd,1, . . . , Cd,q, C1;1,1, . . . , Cd;d,q)τ . (2.44)492.7. 
ProofsUsing theorems in Lahiri (2006,2007), we establish formal edgeworth expansion of (U¯ τA, U¯τC)τ . The signedroot statistics Q1/2RI and Q1/2RII are both polynomials in (U¯ τA, U¯τC)τ , which is a vector of centralizedsample means of block variables.2.7.3 Proofs of Propositions in Section 2.4Some Preliminary ResultsWe use the following notation. For an integral vector ν ∈ Zk+ and a real vector x ∈ Rk, we writexν =(x1)ν1· · ·(xk)νk. Let |·|1 denote the norm |x|1 =∑kj=1∣∣xk∣∣. As in Kitamura (1997), we needto modify the formula in step 6 of Diciccio, Hall and Romano (1989) step 6 to take account of serialcorrelation. LetUi ≡Wi(θ∗)Vec(∂∂θτWi(θ)∣∣θ=θ∗− E[∂∂θτWi(θ)∣∣θ=θ∗]) .For some νj ∈ Z+, j ∈ {1, . . . , 6}, let Tji ≡ Uνji − E[Uνji ], and Vj ≡ 1Q∑Qi=1 Tji . Remember that weonly consider non-overlapped blocking. Remember that we did the transformation Wi(θ) ≡ ΨLV−1/2Yi(θ)and that V = O(M−1). By using Kim (1993)’s moment bound for mixing processes, it is easy to checkthat under Assumption 5, the first to sixth cross moments of the variables(T 10 , . . . , T60)τare O(1). Wehave the following expansion result for moments of the averages of the block variables(T 1, . . . , T 6)τ.It is understood that T = T0 and T+ = T1 and the short notation E[T 1T 2+][C12 ; 1, 2] (Cnm ≡n!(n−m)!)means we have C12 terms by choosing 1 index number from 2 indices and putting it in the position of theunderlined index, i.e. anyE[T 1T 2+][C12 ; 1, 2] = E[T1T 2+] + E[T2T 1+]and similarly, for example, it is understood thatE[T 1T 2+T3+][C23 ; 3, 2, 1] = E[T1T 2+T3+] + E[T3T 1+T2+] + E[T2T 1+T3+].Lemma 2.6. Under Assumptions 5,6, the following expansion holds:E[V 1V 2] =1Q{E[T 1T 2] + E[T 1T 2+][C12 ; 1, 2]}−1Q2E[T 1T 2+][C12 ; 1, 2] +O(1Q3) (2.45)502.7. 
ProofsE[V 1V 2V 3] =1Q2{E[T 1T 2T 3] + E[T 1T 2T 3+][C13 ; 3, 2, 1] + E[T1T 2+T3+][C23 ; 3, 2, 1]}−1Q3{E[T 1T 2T 3+](C13 ; 3, 2, 1) + E[T1T 2+T3+][C23 ; 3, 2, 1]}+O(1Q4) (2.46)E[V 1V 2V 3V 4] =1Q2{12E[T 1T 2]E[T 3T 4][C24 ; 1, 2, 3, 4] +(E[T 1T 2]E[T 3T 4+])[C12 ; 3, 4][C24 ; 1, 2, 3, 4]}+1Q3{−12E[T 1T 2]E[T 3T 4][C24 ; 1, 2, 3, 4]− 2(E[T 1T 2]E[T 3T 4+])[C12 ; 3, 4][C24 ; 1, 2, 3, 4]}+1Q3{E[(T 1T 2 − E[T 1T 2]) (T 3+T4+ − E[T3+T4+])][C24 ; 1, 2, 3, 4] + E[T1T 2T 3T 4]+E[T 1T 2+T3+T4+][C34 ; 1, 2, 3, 4] + E[T1T 2T 3T 4+][C14 ; 1, 2, 3, 4]}+O(1Q4) (2.47)E[V 1V 2V 3V 4V 5] =1Q3{E[T 1T 2]E[T 3T 4T 5][C25 ; 1, 2, 3, 4, 5]+(E[T 1T 2]E[T 3T 4+T5+])[C23 ; 3, 4, 5][C25 ; 1, 2, 3, 4, 5]+(E[T 1T 2]E[T 3T 4T 5+])[C13 ; 3, 4, 5][C25 ; 1, 2, 3, 4, 5]+(E[T 1T 2+]E[T3T 4T 5])[C12 ; 1, 2][C35 ; 1, 2, 3, 4, 5]}+O(1Q4) (2.48)E[V 1V 2V 3V 4V 5V 6] =1Q3{E[T 1T 2]E[T 3T 4]E[T 5T 6]+(E[T 1T 2]E[T 3T 4]E[T 5T 6+])[C16 ; 1, 2, 3, 4, 5, 6]}+O(1Q4) (2.49)Proof. First, we assume that Dj = σ(Xj) so that under Assumption 6, (Xt)t∈Z has exponentially decayingstrong mixing coefficient: αX(m) 6 c1exp(−c2m) for some positive constants c1 and c2 so that (Zt)t∈Z(cf. (2.16) for definition) also has exponentially decaying mixing coefficients. Then applying the argumentin Kitamura (1997) (c.f. Page 2099) with covariance inequality for strong mixing processes (c.f. LemmaA2 of Hall and Heyde (1980)) E[TiTi+1Ti+2], {E[TiTi+k] : k > 2}, E[TiTi+1Ti+2Ti+3] are exponentiallydecaying. Furthermore, it is easy to see that if replacing the exponential decaying mixing coefficientsassumption on (Xt)t∈Z with that Zt can be approximated in L2 norm with exponentially decaying errorby another sequence of random vectors(Z†t,m)m∈Nsatisfying the Dj+mj−m -measurable assumption withsome sequence of σ−fields (Dj)j∈Z with exponentially decaying strong mixing coefficients, E[TiTi+1Ti+2],512.7. 
Proofs{E[TiTi+k] : k > 2}, E[TiTi+1Ti+2Ti+3] still have the property of exponential decaying rate and the results(2.45), (2.46), (2.47), (2.48) and (2.49) still hold.Next we need to find the asymptotic order of the autocovariance terms like E[T 1T 2T 3+]. This is givenby the next lemma.Lemma 2.7. Under Assumption 5,6, the following results hold:(a) For ν1 and ν2 with |ν1| = |ν2| = 1, we have Cov(Uν1i , Uν2i+1) = O(M−1) and ν1 and ν2 with |ν1| = 1,|ν2| = 2 we have Cov(Uν1i , Uν2i+1) = O(M−1/2).(b) For |ν1| 6 3 and |ν2| 6 3, we have Cov(Uν1i , Uν2i+1) = o(1).(c) For |ν1| 6 3 and |ν2| 6 3, limT−→∞Cov(Q−1/2∑Qi=1 Uν1i , Q−1/2∑Qi=1 Uν2i ) exists.Proof. We assume for now that Dj = σ(Xj) so that under Assumption 6, (Xt)t∈Z has exponentiallydecaying strong mixing coefficients as in proof of previous lemma. This can be relaxed easily so thatthe lemma holds under more general conditions. For (a), we notice that if |ν1| = |ν2| = 1, assumeν1 = ν2 = (1, 0, . . . , 0) without loss of generality. Denote ψt ≡ ΨLV−1/2g(Xt, θ∗). Then for some r > 2,∣∣E[Uν1i , Uν2i+1]∣∣ 61M2M∑i=1M∑j=1∣∣E[ψ1i ψ1M+j ]∣∣61M2M∑i=1M∑j=1κ · αX(M + j − i)1/2−1/r(E∣∣ψ11∣∣2)1/2 (E∣∣ψ11∣∣r)1/rwhere we applied Lemma A2 of Hall and Heyde (1980). Then it is easy to see that the result follows fromthe observation that∑Mi=1∑Mj=1 κ·αX(M+j−i)1/2−1/r = O(1) for some κ > 0 and(E∣∣ψ11∣∣2)1/2 (E∣∣ψ11∣∣r)1/r=O(M). Proof of second part of (a) is similar. Part (b) follows from the “subseries” argument used inproofs of many results in time series statistics. Let M˜ ≡ M 1/4. For simplicity, without loss of generality,we assume ν1 = ν2 = (1, 1, 0, . . . , 0), decompose the sumWi(θ∗) =1MM˜∑v=1ΨLV−1/2g(XM+v, θ∗) +M∑v=M˜+1ΨLV−1/2g(XM+v, θ∗) .522.7. 
ProofsThen∣∣Cov(Uν1i , Uν2i+1)∣∣ 6∣∣∣∣∣∣Cov(Uν1i ,1M2M∑v=M˜+1ΨLV−1/2g(XM+v, θ∗))ν2∣∣∣∣∣∣+ 2∣∣∣∣∣∣Cov(Uν1i ,1M2M∑v=M˜+1ΨLV−1/2g(XM+v, θ∗))1M˜∑v=1ΨLV−1/2g(XM+v, θ∗))1)∣∣∣∣∣∣+∣∣∣∣∣∣Cov(Uν1i ,1M2M˜∑v=1ΨLV−1/2g(XM+v, θ∗))1M˜∑v=1ΨLV−1/2g(XM+v, θ∗))1)∣∣∣∣∣∣and under the assumption of exponentially decaying the mixing coefficients, by the mixing inequality,the first term has an exponential decaying rate. By Cauchy-Schwartz inequality and Kim (1993)’smoment bound, we get that the second term is O(M˜ 1/2M−1/2) and the third term is O(M˜M−1). Bypart (b), Cov(Q−1/2∑Qi=1 Uν1i , Q−1/2∑Qi=1 Uν2i ) = Cov(Uν11 , Uν21 ) + op(1). Remember that we assumedΣ ≡ limM−→∞Var(U1) is nonsingular. Under Assumption 5, a central limit theorem for strong mixing process(Ibragimov, I. (1962)) applies and U1 −→d Normal(0,Σ) as M −→ ∞. Let U∞ denote a random vectorwith Normal(0,Σ) distribution. By Kim (1993)’s moment bound, under Assumption 5, supM∈NE |U1|6+ <∞for some > 0. Then Corallary 11.6 of Severini (2005) applies and we have limT−→∞E[Uν1i Uν2i ] = E[Uν1∞Uv2∞ ],limT−→∞E[Uν1i ] = E[Uν1∞ ] and limT−→∞E[Uν21 ] = E[Uν2∞ ].In addition to finding asymptotic order of these autocovariance terms that appear in expansion ofcumulants, part (c) helps to verify one key condition of Lahiri (2006)’s edgeworth expansion theorem forblock variables (c.f. Lahiri (2006)’s C2 (ii)).Expansion of Cumulants of RI and RIIUsing the basic formulas (2.45)(2.46)(2.47)(2.48)(2.49), we are able to formally expand the cumulantsCum(Rl, Ro), Cum(Rl, Ro, Rm, Rn) and Cum(Rl, Ro, Rk). For Cum(Rl, Ro, Rk), the leading term shouldbe of order Q−2. The algebraic expression for leading terms of Cum(Rl, Ro, Rk) is identical to thethird-order cumulant of the signed-root decomposition of the EL ratio statistic. Remember that Rk =Rk,1 + Rk,2 + Rk,3 for k = I, II. We apply Lemma 2.6 and Lemma 2.7 to get expansions of momentsand cumulants. For example, we obtain the following expansion for E[RlI,2]and E[Rp+aII.2]. 
We haveE [RI,1] = 0 and E [RII,2] = 0 and532.7. ProofsE[RlI,2.1]= Q−1(−16αlkk)+Q−1{−12(αk lk˜ + αk k˜l)−(αp+a l p˜+a + αp+a p˜+a l)+13(αklmαk m˜[2; k,m])+αkl p+a(αp+a k˜ + αp˜+a k)+ αl p+a p+bαp˜+a p+b[2; a, b]}+O(Q−2) (2.50)E[RlI,2.2]= Q−1{ωkl(γp+a,k;p˜+a + γp˜+a,k;p+a)−12γp+a,mnωmkωnl(αp+a k˜ + αp˜+a k)−γp+a;p+b,kωklαp+a p˜+b[2; a, b]}+O(Q−2) (2.51)E[Rp+aII,2.1]= Q−1(−16αp+a p+b p+b)+Q−1{−12[αp˜+b p+a p+b + αp+b˜p+a p+b]+13αp+a p+b p+c(αp+b p˜+c[2; b, c])}+O(Q−2) (2.52)E[Rp+aII.2.1] = Q−1{+12ωkmωlmγp+a,kl − ωklγl;p+a,k − ωkl[γl;p˜+a,k + γ l˜;p+a,k]+ωlmγp+a;p+b,l[αp+b m˜ + αp˜+bm]+12ωkmωlnγp+a,kl(αmn˜[2;m,n])}+O(Q−2). (2.53)We can also obtain E [RI,3] ,E [RII,3] = O(Q−2). The difference between the expansions (2.50)(2.51)(2.52)(2.53) and expansions of first-order cumulants of the signed root decomposition of EL ratio statistics isthe existence of autocovariance terms in (2.50)(2.51)(2.52) (2.53). These autocovariance terms are all o(1)and of different asymptotic sizes.We define the constant∆lI,[1,1] ≡ −16αlkk∆lI,[1,2] ≡ M1/2(−12(αk lk˜ + αk k˜l)−(αp+a l p˜+a + αp+a p˜+a l))∆p+aII,[1,1] ≡ −16αp+a p+b p+b∆p+aII,[1,2] ≡ M1/2(−12[αp˜+b p+a p+b + αp+b˜p+a p+b]+12ωkmωlmγp+a,kl − ωklγl;p+a,k).542.7. ProofsBy Lemma 2.7, we have ∆lI,[1,1],∆lI,[1,2] = O(1) and ∆p+aII,[1,1] = ∆p+aII,[1,2] = O(1). For the third-ordercumulants, we have the formula (cf. 
McCullagh (1987) chapter 2)Cum(RlI, RoI , RvI)= E[RlIRoIRvI]− E[Rl]E [RoRv] [3; l, o, v] + 2E[Rl]E [Ro] E [Rv] .By applying Lemma 2.6 Lemma 2.7, we find thatCum(RlI, RoI , RvI)= E[RlI,1RoI,1RvI,1]+ E[RlI,2.1RoI,1RvI,1][3; l, o, v]− E[RlI,2.1]E[RoI,1RvI,1][3; l, o, v] +O(Q−2M−1/2) (2.54)andCum(Rp+aII , Rp+dII , Rp+eII)=E[Rp+aII,1 Rp+dII,1 Rp+eII,1]+ E[Rp+aII,2.1Rp+dII,1 Rp+eII,1][3; a, d, e]− E[Rp+aII,2.1]E[Rp+dI,1 Rp+eII,1][3; a, d, e] +O(Q−2M−1/2).By applying Lemma 2.6 and Lemma 2.7, we obtainE[RlI,2.1]E[RoI,1RvI,1]= Q−2(−16αlkkαov)+O(Q−2M−1/2),E[RlI,2.1RoI,1RvI,1]= Q−2(−16αkklαov −13αolv)+O(Q−2M−1/2)andE[RlI,1RoI,1RvI,1]= Q−2αlov +O(Q−2M−1/2).It follows that Cum(RlI, RoI , RvI)= O(Q−2M−1/2) by the observation that the sum of the leading terms ofE[RlI,1RoI,1RvI,1]+ E[RlI,2.1RoI,1RvI,1][3; l, o, v]− E[RlI,2.1]E[RoI,1RvI,1][3; l, o, v]is equal to zero. These are a natural extension of the property of the fourth order joint cumulants ofEL ratio statistics (cf. Chen and Cui (2007) and Matsushita and Otsu (2013)). This result is crucialfor the existence of a higher-order refinement. In fact, working out lengthy algebra, we obtain a further552.7. Proofsexpansion of Cum(RlI, RoI , RvI ):Cum(RlI, RoI , RvI ) = Q−2{13αklo((αkv˜[2; k, v])[3; l, o, v])+ ωkl(γp+a,k;o(αv p˜+a + αp+a v˜)+ γp+a,k;v(αo p˜+a + αp+a o˜))[3; l, o, v]−12γp+a,mnωnl(ωmv(αp+a o˜ + αo p˜+a)+ ωmo(αp+a v˜ + αv p˜+a))[3 : l, o, v]−(αov˜ + αvo˜)[−12(αklk˜ + αkk˜l)−(αp+a l p˜+a + αp+a p˜+a l)+13(αklmαkm˜[2; k,m])+ ωkl(γp+a,k;p˜+a + γp˜+a,k;p+a)+[αkl p+a −12γp+a,mnωmkωnl](αp+a k˜ + αp˜+a k)+[αl p+a p+b − γp+a;p+b,kωkl] (αp+a p˜+b + αp˜+a p+b)]}+O(Q−3) (2.55)It is observed that the leading terms in (2.55) are of size (Q−2M−1). For Cum(Rp+aII , Rp+dII , Rp+eII), wealso obtain that the leading terms ofE[Rp+aII,1 Rp+dII,1 Rp+eII,1]+ E[Rp+aII,2.1Rp+dII,1 Rp+eII,1][3]− E[Rp+aII,2.1]E[Rp+dI,1 Rp+eII,1][3]is equal to zero and thus Cum(Rp+aII , Rp+dII , Rp+eII)= O(Q−2M−1/2). 
A further expansion gives thatCum(Rp+aII , Rp+dII , Rp+eII)= O(Q−2M−1).For the fourth-order cumulants, we have the formula (cf. McCullagh (1987) chapter 2)Cum(RlI, RoI , RmI , RnI)= E[RlIRoIRmI RnI]−12E[RlIRoI]E [RmI RnI ] [6; l, o,m, n]− E[RlI]E [RoIRmI RnI ] [4; l, o,m, n]+ 2E[RlI]E [RoI ] E [RmI RnI ] [6;m,n, l, o]− 6E[RlI]E [RoI ] E [RmI ] E [RnI ] .562.7. ProofsBy applying Lemma 2.6 Lemma 2.7, we find thatCum(RlI, RoI , RmI , RnI)= E[RlI,1RoI,1RmI,1RnI,1]+ E[RlI,2.1RoI,1RmI,1RnI,1]+ E[RlI,3RoI,1RmI,1RnI,1]−12E[RlI,1RoI,1]E[RmI,1RnI,1][6; l, o,m, n] + E[RlI,2.1RoI,2.1RmI,1RnI,1][6; l, o,m, n]− E[RlI,2.1RoI,1]E[RmI,1RnI,1][3; o,m, n][4; l, o,m, n]− E[RlI,3RoI,1]E[RmI,1RnI,1][3; o,m, n][4; l, o,m, n]− E[RlI,2.1RoI,2.1]E [RmI RnI ] [6; l, o,m, n]− E[RlI,2.1]E [RoIRmI RnI ] [4; l, o,m, n]− E[RlI,2.1]E[RoI,2.1RmI RnI][3; o,m, n][4; l, o,m, n]+ 2E[RlI,2.1]E[RoI,2.1]E[RmI,1RnI,1][6;m,n, l, o] +Op(Q−3M−1/2) (2.56)Similar to what we did for third-order cumulants, we apply the formulas in Lemma 2.6 to expand theexpectations in (2.56). We find that the sum of O(Q−2) terms (non-autocovariance terms) is equal tozero. This is still a natural extension of the property of the fourth order joint cumulants of EL ratiostatistics. By the last part of Lemma 2.7, some of the autocovariance terms in the expansion are largerthan Op(Q−3M−1/2). Specifically, after working out the algebra, we find thatCum(RlI, RoI , RmI , RnI ) =1Q2(3(αl˜k[2; l, k])(αm˜n[2;m,n]))+1Q3{(αlkm˜n − αlkαmn)[C13 ; k,m, n][C14 ; l, k,m, n]+(αlkm˜n − αlkαmn)[C24 ;m,n, l, k] + αlk˜mn[C14 ; l, k,m, n] + αlkmn˜[C14 ; l, k,m, n]+ αlk˜mn[C13 ; k,m, n][C12 ;m,n][C14 ; l, k,m, n]+(αmnl˜k − αlkαmn)[C13 ;m,n, k][C14 ; l, k,m, n]+14((αlmn˜k − αnkαlm)+(αlmn˜k − αnkαlm))[C12 ;n,m][C24 ; l, k,m, n]+13(αl˜mnk + αlmnk˜)[C13 ; k,m, n][C12 ;m,n][C14 ; l, k,m, n]}+Op(Q−3M−1/2)(2.57)The first autocovariance term 3(αl˜k[2; l, k]) (αm˜n[2;m,n])is O(M−2) by Lemma (2.7). 
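The moment-to-cumulant formulas quoted from McCullagh (1987) for the third- and fourth-order cumulants can be sanity-checked in the scalar case, where all indices coincide and the bracket counts simply multiply the corresponding terms (3 terms for each of the two symmetrized products in the third-order formula; 3, 4 and 12 terms in the fourth-order formula). The sketch below is only an illustration under that scalar assumption; the function names are hypothetical, and the inputs are the known raw moments $E[X^k] = k!$ of a unit exponential variable, whose cumulants are $(k-1)!$.

```python
from math import factorial


def third_cumulant(m1, m2, m3):
    """Scalar version of E[XYZ] - E[X]E[YZ][3] + 2E[X]E[Y]E[Z]."""
    return m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3


def fourth_cumulant(m1, m2, m3, m4):
    """Scalar version of the fourth-order formula:
    E[X^4] - 3 E[X^2]^2 - 4 E[X]E[X^3] + 12 E[X]^2 E[X^2] - 6 E[X]^4.
    """
    return (m4 - 3.0 * m2 ** 2 - 4.0 * m1 * m3
            + 12.0 * m1 ** 2 * m2 - 6.0 * m1 ** 4)


# Raw moments E[X^k] = k! of a unit exponential variable (k = 1, ..., 4).
exp_moments = [float(factorial(k)) for k in range(1, 5)]

kappa3 = third_cumulant(*exp_moments[:3])   # (3 - 1)! = 2
kappa4 = fourth_cumulant(*exp_moments)      # (4 - 1)! = 6

# Raw moments of a standard normal variable: both cumulants vanish.
normal_moments = [0.0, 1.0, 0.0, 3.0]
normal_kappa3 = third_cumulant(*normal_moments[:3])
normal_kappa4 = fourth_cumulant(*normal_moments)
```

The vanishing third and fourth cumulants in the normal case are the scalar analogue of the cancellation of leading terms that the proofs rely on: the closer these cumulants are to zero, the closer the signed root is to normality.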
The autocovari-ance terms in line 2 to 7 in (2.57) are of order o(1) but slightly bigger than O(M−1/2). Surprisingly, it isfound that the sum of these terms is zero. So the stochastic order of Cum(RlI, RoI , RmI , RnI ) is determined572.7. Proofsby the bigger one of Q−3M−1/2 and Q−2M−2. Similarly we can show thatCum(Rp+aII , Rp+bII , Rp+cII , Rp+dII ) = Op(Q−2M−2) +Op(Q−3M−1/2).We also need to expand the second-order cumulant and the refinement factor depends on the O(Q−2)terms in the expansion. Using the formulas in Lemma 2.6 and working out the algebra, we can get theexpansionE[RlI,2RoI,2] = Q−2J loI +Op(Q−2M−1/2)andE[RlI,3RoI,1] + E[RlI,2RoI,1] = Q−2K loI +Op(Q−2M−1/2)where it is defined thatJ loI ≡14(αlokk − αlo)+136αlkkαomm −736αlkmαokm + αlo p+a p+a − αl p+a p+bαo p+a p+b− αlk p+aαok p+a +14(αko k˜l − αlo)+(αp+a o p˜+a l)K loI ≡18(αlokk + αlo)−572αlkmαokm −172αlokαkmm −12αlokαk p+a p+a−14(αko k˜l − αlo)+16(αlko k˜ + αl˜ko k)−16(αlkk o˜ + αl˜kk o)−12(αl p+a o˜ p+a)[2; l, o]−(αp+a p+a l o˜ + α˜p+a p+a l o)and we observe that J loI = O(1) and KloI = O(1). Now we can get the expansion for the second cummulant:Cum(RlI, RoI ) = E[RlI,1RoI,1] + E[RlI,2RoI,2] +(E[RlI,3RoI,1] + E[RlI,2RoI,1])[2; l, o]− E[RlI,2]E[RoI,2] +Op(Q−3)=1Q1[l = o] +1Q(αlo˜ + αl˜o)+1Q2(J loI +KloI [2; l, o])−1Q2∆lI,[1,1]∆oI,[1,1] +Op(Q−2M−1/2).An expansion for Cum(Rp+aII , Rp+bII ) can be obtained in a similar fashion. Let ∆loI,[2,1] ≡ M(αlo˜ + αl˜o)=O(1). Summarily, We have the following expansion result for the first to fourth order cumulants of the582.7. Proofsscaled signed root statistic Q1/2RI:Cum(Q1/2RlI) =1Q1/2∆lI,[1,1] +1Q1/2M 1/2∆lI,[1,2] +1Q1/2M∆lI,[1,3] + o(Q−1/2M−1)Cum(Q1/2RlI, Q1/2RoI ) = 1[l = o] +1M∆loI,[2,1] +1Q∆loI,[2,2] +O(Q−1M−1/2)Cum(Q1/2RlI, Q1/2RoI , Q1/2RmI ) =1Q1/2M∆lomI,[3,1] + o(Q−1/2M−1)Cum(Q1/2RlI, Q1/2RoI , Q1/2RmI , Q1/2RnI ) =1M2∆lomnI,[4,1] +1QM 1/2∆lomnI,[4,2] + o(M−2) + o(Q−1M−1/2) (2.58)where the ∆′s are all O(1). 
It can also be checked that the fifth and sixth cumulants are of orders smaller than $O(M^{-2}) + O(Q^{-1}M^{-1/2})$. Expansion results similar to (2.58) hold for $Q^{1/2}R_{II}$, with different coefficients $\Delta$.

Proofs of Propositions 2.4, 2.5, 2.6

Proof of Proposition 2.4. For notational simplicity, let $\kappa_{(j)}$ denote the $j$-th cumulant of $Q^{1/2}R_I$, which is a $p^j$-dimensional array. Under Assumptions 5 and 6 on the DGP, Lahiri (2006)'s conditions C1 to C6 for nonoverlapping blocks (cf. page 18 of Lahiri (2006)) hold. Lemma 2.7 verifies C2(ii). Verification of the rest of the conditions is identical to the proofs of (5.35) and (5.36) of Lahiri (2006). Therefore the sums of block variables $(\bar U_A^\tau, \bar U_C^\tau)^\tau$ admit a formal Edgeworth expansion. The signed root statistic $Q^{1/2}R_I$ is a smooth function of $(\bar U_A^\tau, \bar U_C^\tau)^\tau$. By applying Skovgaard (1981)'s transformation technique, we establish a formal Edgeworth expansion for $Q^{1/2}R_I$, and the distribution of $Q^{1/2}R_I$ can be approximated uniformly over the collection of convex sets in $\mathbb{R}^p$, denoted by $\mathcal{C}$, by a signed measure whose density is the inverse transform of

\[
\begin{aligned}
\exp\left(-\tfrac12 t^l t^l\right)\times\Big\{ 1 &+ \Big[\kappa^l_{(1)}(it^l) + \tfrac12\left(\kappa^{lk}_{(2)} - 1[k=l]\right)(it^l)(it^k) + \tfrac16\kappa^{klm}_{(3)}(it^l)(it^k)(it^m)\Big] \\
&+ \tfrac1{24}\kappa^{klmn}_{(4)}(it^l)(it^k)(it^m)(it^n) \\
&+ \tfrac12\Big[\kappa^l_{(1)}(it^l) + \tfrac12\left(\kappa^{lk}_{(2)} - 1[k=l]\right)(it^l)(it^k) + \tfrac16\kappa^{klm}_{(3)}(it^l)(it^k)(it^m)\Big]^2 \Big\}
\end{aligned}
\tag{2.59}
\]

as a function of $t \in \mathbb{R}^p$, with a coverage error of order $o(Q^{-1}M^{-1/2})$. For any $(j_1,\ldots,j_k) \in \{1,\ldots,p\}^k$, the $(j_1,\ldots,j_k)$-th multivariate Hermite polynomial $H_{(j_1,\ldots,j_k)}$ is defined via the equality

\[
\frac{\partial^k}{\partial x^{j_1}\cdots\partial x^{j_k}}\phi(x) = (-1)^k H_{(j_1,\ldots,j_k)}(x)\phi(x)
\]

for every $x \in \mathbb{R}^p$. We apply the inversion equality

\[
\exp\left(-\tfrac12 t^\tau t\right)(-1)^k\left((it^{j_1})\cdots(it^{j_k})\right) = \int \exp(it^\tau x)H_{(j_1,\ldots,j_k)}(x)\phi(x)\,dx
\]

for every $t \in \mathbb{R}^p$ and define the polynomial $\pi$ by

\[
\begin{aligned}
\pi(x) \equiv 1 &- \kappa^l_{(1)}H_l(x) + \frac12\frac{1}{M}\left(\Delta^{lk}_{I,[2,1]}H_{lk}(x)\right) + \frac12\frac{1}{Q}\left(\Delta^{lk}_{I,[2,2]}H_{lk}(x)\right) + \frac12\frac{1}{Q}\left(\Delta^{l}_{I,[1,1]}\Delta^{k}_{I,[1,1]}H_{lk}(x)\right) \\
&+ \frac16\kappa^{klm}_{(3)}H_{klm}(x) + \frac12\frac{1}{Q^{1/2}M}\left(\Delta^{l}_{I,[1,1]}\Delta^{km}_{I,[2,1]}H_{lkm}(x)\right).
\end{aligned}
\]

We know that the distribution of $Q^{1/2}R_I$ can be approximated by the signed measure whose density is the inverse transform of (2.59).
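Checking the expansion of cumulants (2.58), we have

\[
\sup_{C\in\mathcal{C}}\left|P\left[Q^{1/2}R_I \in C\right] - \int_C \pi(x)\phi(x)\,dx\right| = O(Q^{-1}M^{-1/2}) + O(M^{-2}).
\]

The second-order Hermite polynomial is $H_{lk}(x) = x^l x^k - 1[l=k]$ for $k,l \in \{1,\ldots,p\}$, and the first- and third-order Hermite polynomials $H_l$, $H_{lkm}$ are odd functions. Therefore we have

\[
P\left[Q\cdot R_I^\tau R_I \le c_\alpha\right] = \int_{\|x\|\le c_\alpha^{1/2}}\pi(x)\phi(x)\,dx + O(Q^{-1}M^{-1/2}) + O(M^{-2}).
\]

We then expand the first term to get

\[
\begin{aligned}
\int_{\|x\|\le c_\alpha^{1/2}}\pi(x)\phi(x)\,dx ={}& P\left[\chi^2_p \le c_\alpha\right] \\
&+ \frac12 M^{-1}\int_{\|x\|\le c_\alpha^{1/2}}\Big\{\sum_{l=1}^{p}\Delta^{ll}_{I,[2,1]}\left(x^l x^l - 1\right) + \sum_{l\ne k}\Delta^{lk}_{I,[2,1]}x^l x^k\Big\}\phi(x)\,dx \\
&+ \frac12 Q^{-1}\int_{\|x\|\le c_\alpha^{1/2}}\sum_{l=1}^{p}\left(\Delta^{ll}_{I,[2,2]} + \Delta^{l}_{I,[1,1]}\Delta^{l}_{I,[1,1]}\right)\left(x^l x^l - 1\right)\phi(x)\,dx \\
&+ \frac12 Q^{-1}\int_{\|x\|\le c_\alpha^{1/2}}\sum_{l\ne k}\left(\Delta^{lk}_{I,[2,2]} + \Delta^{l}_{I,[1,1]}\Delta^{k}_{I,[1,1]}\right)x^l x^k\phi(x)\,dx \\
={}& \alpha - M^{-1}\varrho_{I,1}c_\alpha f_p(c_\alpha) - Q^{-1}\varrho_{I,2}c_\alpha f_p(c_\alpha),
\end{aligned}
\tag{2.60}
\]

where the last equality uses the definitions

\[
\varrho_{I,1} \equiv p^{-1}\sum_{l=1}^{p}\Delta^{ll}_{I,[2,1]}
\qquad\text{and}\qquad
\varrho_{I,2} \equiv p^{-1}\sum_{l=1}^{p}\left(\Delta^{ll}_{I,[2,2]} + \Delta^{l}_{I,[1,1]}\Delta^{l}_{I,[1,1]}\right),
\]

the fact that the off-diagonal terms $x^l x^k$, $l \ne k$, integrate to zero over a ball by symmetry, and the equality

\[
\sum_{l=1}^{p}\int_{\|x\|^2\le c}\left(x^l x^l - 1\right)\phi(x)\,dx = \int_{\|x\|^2\le c}\Big(\sum_{l=1}^{p}x^l x^l - p\Big)\phi(x)\,dx = \int_0^c (x-p)f_p(x)\,dx = -2cf_p(c).
\]

Proof of Proposition 2.5. By the uniform approximation error result of Proposition 2.4, we have

\[
\begin{aligned}
P\left[Q\cdot R_I^\tau R_I \le x\left(1+M^{-1}\varrho_{I,1}+Q^{-1}\varrho_{I,2}\right)\right] ={}& P\left[\chi^2_p \le x\left(1+M^{-1}\varrho_{I,1}+Q^{-1}\varrho_{I,2}\right)\right] \\
&- M^{-1}\varrho_{I,1}\,x f_p\left(x\left(1+M^{-1}\varrho_{I,1}+Q^{-1}\varrho_{I,2}\right)\right) \\
&- Q^{-1}\varrho_{I,2}\,x f_p\left(x\left(1+M^{-1}\varrho_{I,1}+Q^{-1}\varrho_{I,2}\right)\right) \\
&+ O(M^{-2}) + O(Q^{-1}M^{-1/2}).
\end{aligned}
\tag{2.61}
\]

Applying a mean value approximation,

\[
P\left[\chi^2_p \le x\left(1+M^{-1}\varrho_{I,1}+Q^{-1}\varrho_{I,2}\right)\right] = P\left[\chi^2_p \le x\right] + f_p(x)\,x\left(M^{-1}\varrho_{I,1}+Q^{-1}\varrho_{I,2}\right) + O_p(M^{-2})
\]

and $f_p\left(x\left(1+M^{-1}\varrho_{I,1}+Q^{-1}\varrho_{I,2}\right)\right) = f_p(x) + O(M^{-1})$. Then it is easy to see that the desired result follows.

Proof of Proposition 2.6. The idea of the proof follows the proof of Theorems 1 and 2 of Liu and Chen (2010). We first expand $ALR_{hr-I}$ and relate its signed root decomposition to that of $LR_I$. This gives a simpler way to derive the signed root decomposition of $ALR_{hr-I}$, with $ALR_{hr-I} = Q\cdot R_{hr-I}^\tau R_{hr-I} + O_p(Q^{-3/2})$. We then establish a formal Edgeworth expansion of $Q^{1/2}R_{hr-I}$.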
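As a numerical aside on the proof of Proposition 2.4, the closed form of $\int_0^c (x-p)f_p(x)\,dx$ can be checked directly. Since $\frac{d}{dx}[x f_p(x)] = \frac12(p-x)f_p(x)$ for the $\chi^2_p$ density $f_p$, the integral equals $-2cf_p(c)$, which is the sign that produces the negative correction terms in (2.60). The following is a minimal sketch using only the Python standard library and a composite Simpson rule; the function names are illustrative, and the density is coded for $p \ge 2$ only so that the integrand is finite at zero.

```python
import math


def chi2_pdf(x, p):
    """Density f_p of the chi-square distribution, coded for p >= 2."""
    norm = 2.0 ** (p / 2.0) * math.gamma(p / 2.0)
    return x ** (p / 2.0 - 1.0) * math.exp(-x / 2.0) / norm


def simpson(func, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * h)
    return total * h / 3.0


def integral_side(c, p):
    """Numerical value of the integral of (x - p) f_p(x) over [0, c]."""
    return simpson(lambda x: (x - p) * chi2_pdf(x, p), 0.0, c)


def closed_form_side(c, p):
    """Closed form -2 c f_p(c) implied by d/dx [x f_p(x)] = (p - x) f_p(x) / 2."""
    return -2.0 * c * chi2_pdf(c, p)


# Largest discrepancy over a few degrees of freedom and cut-off values.
max_gap = max(
    abs(integral_side(c, p) - closed_form_side(c, p))
    for p in (2, 3, 5)
    for c in (1.0, 3.84, 9.0)
)
```

For $p = 2$ the identity can also be verified by hand: $f_2(x) = \frac12 e^{-x/2}$ and $-xe^{-x/2}$ is an antiderivative of $(x-2)f_2(x)$, so the integral is $-ce^{-c/2} = -2cf_2(c)$.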
Then we find that in the expansion ofP [Q · Rτhr−IRhr−I 6 x], similar to (2.61), the leading terms with order O(M−1) + O(Q−1) disappear inthe sense that the coefficients are zeros.Define λ˜, λ˜hr to beλ˜ ≡ argmaxγ∈Γ(θ∗)Q∑i=1log (1 + γτWi(θ∗))andλ˜hr ≡ argmaxγ∈Γa(θ)Q+1∑i=1log (1 + γτWi(θ∗))where WQ+1(θ) ≡ −ahr−IQ∑Qi=1Wi(θ). Let θˆhr be the adjusted BEL estimator using the estimated ad-justment parameter ahr−I (c.f. (2.23)) and let θˆ be BEL estimator. Let λˆ beλˆ ≡ argmaxγ∈Γ(θˆ)Q∑i=1log(1 + γτWi(θˆ))andλˆhr ≡ argmaxγ∈Γa(θˆhr)Q+1∑i=1log(1 + γτWi(θˆhr)).Define Wi ≡Wi(θ∗), Wˆi ≡Wi(θˆ) and Aˆ ≡ Q−1∑Qi=1 Wˆi. We use the convention that summation over thesuperscript is understood with suitable range. Under Assumption 4, w.p.a.1, λ˜ and λ˜hr satisfy F.O.Cs:λ˜ solves f(ξ) = 0 where it is defined thatf(ξ) =1QQ∑i=1Wi1 + ξTWi.Therefore we have: w.p.a.1,f(λ˜hr) +1QWQ+1(θ∗)1 + λ˜ThrTQ+1(θ∗)= 0.We first look for an asymptotic expansion for λ˜hr − λ˜. Define a∗ ≡ 12(QM %I,1 + %I,2). Now we haveahr−I − a∗ =12QM(%̂I,1 − %I,1) +12(%̂I,2 − %I,2) .622.7. ProofsThen λ˜hr should satisfy the equationf(λ˜hr)−a∗QA−121M(%̂I,1 − %I,1)A−1Qa2∗A (AτA) = Op(Q−2). (2.62)And we also have the expansion result:f(λ˜hr) =∂∂ξτf(ξ)∣∣∣∣ξ=λ˜(λ˜hr − λ˜)+ op((λ˜hr − λ˜)2)(2.63)=(−1QQ∑i=1WiWτi − I)(λ˜hr − λ˜)+ 2(1QQ∑i=1(λ˜τhrWi)WiWi)τ(λ˜hr − λ˜)−(λ˜hr − λ˜)+ op((λ˜hr − λ˜)2).Combining (2.62) and (2.63), we get an asymptotic expansion for λ˜hr − λ˜λ˜hr − λ˜ = −a∗QA−121M(%̂I,1 − %I,1)A−1Qa2∗A(ATA)− 2a∗Qη1 +a∗Qη2 +Op(Q−2)where ηg1 ≡ αghjAhAj and ηg2 ≡ AghAh for any g ∈ {1, . . . , p}. We use the following rule for the indicesa, b, c, e ∈ {1, . . . , d− p}, f, g, h, i, j ∈ {1, . . . , d}, k, l,m, n, o ∈ {1, . . . , p}. Letζ ≡ −121M(%̂I,1 − %I,1)A−1Qa2∗A(AτA)− 2a∗Qη1 +aQη2.We have the following expansions:2Q∑i=1log(1 + λ˜τhrWi)= 2Q∑i=1log(1 +(λ˜−a∗QA+ ζ)τWi)+Op(Q−3/2)(2.64)= 2Q∑i=1log(1 +(λ˜−a∗QA+ ζ)τWi)+Op(Q−3/2)= 2Q∑i=1log(1 +(λ˜−a∗QA)τWi)+ 2ζτ( Q∑i=1Wi)+Op(Q−3/2)632.7. 
Proofsand2log(1 + λ˜τhrWQ+1)= − 2a∗AhAh −QM(%̂I,1 − %I,1)AhAh + 2a∗AjhAjAh − 2a∗αgjhAgAhAj+ 2a2∗QAhAh − a2∗(AhAh)2+Op(Q−3/2). (2.65)Expanding 2∑Qi=1 log(1 +(λ˜− a∗QA)τWi)we get2Q∑i=1log(1 +(λ˜−a∗QA)τWi)= 2Q∑i=1log(1 + λ˜τWi)− 2a∗QAτ( Q∑i=1Wi1 + λ˜τWi)−1Qa2∗AhAh +Op(Q−3/2)(2.66)and the second term is equal to 0 since λ˜ satisfies F.O.C. Combining (2.64),(2.65),(2.66), we get anasymptotic expansion for 2∑Q+1i=1 log(1 + λ˜τhrWi)with remainder term of order Op(Q−3/2):2Q+1∑i=1log(1 + λ˜τhrWi)= 2Q∑i=1log(1 + λ˜τWi)− 2a∗AhAh−32QM(%̂I,1 − %I,1)AhAh + 4a∗AjhAjAh − 3a2∗(AhAh)2− 6a∗αghjAgAhAj+a2∗QAhAh − 2a2∗(AhAh)2+Op(Q−3/2). (2.67)Next we need an asymptotic expansion for∑Q+1i=1 log(1 + λˆτhrWi(θˆhr)). Remember that(λˆ, θˆ)satisfiesthe F.O.C. h1(λˆ, θˆ) = 0 and h2(λˆ, θˆ) = 0 whereh1(ξ1, ξ2) ≡1QQ∑i=1Wi(ξ2)1 + ξτ1Wi(ξ2)h2(ξ1, ξ2) ≡1QQ∑i=1∂∂θWi(θ)τ∣∣θ=ξ2ξ11 + ξτ1Wi(ξ2).642.7. ProofsWe know that(λˆhr, θˆhr)satisfy the F.O.C.0 = h1(λˆhr, θˆhr) +1QWQ+1(θˆhr)1 + λˆτhrWQ+1(θˆhr)0 = h2(λˆhr, θˆhr) +1Q∂∂θWQ+1(θ)τ∣∣θ=θˆhrλˆhr1 + λˆThrWQ+1(θˆhr)Then we linearize h1 and h2 around(λˆ, θˆ). We then get a simultaneous equation in λˆhr − λˆ and θˆhr − θˆ.Firstly, going through lengthy but routine algebra, it can be shown that θˆhr − θˆ = Op(∥∥∥λˆhr − λˆ∥∥∥) ×Op(Q−1/2M−1/2) andλˆhr − λˆ = −a∗QAˆ−121M(%̂I,1 − %I,1) Aˆ+[1QQ∑i=1WiWτi − I](a∗QAˆ)+[1QQ∑i=1WiWτi −1QQ∑i=1WˆiWˆτi](a∗QAˆ)− 2[1QQ∑i=1(λˆτWˆi)WˆiWˆτi]a∗QAˆ+[1QQ∑i=1∂∂θτWi(θ)∣∣∣∣∣θ=θˆ](θˆhr − θˆ)+Op(Q−2).We apply Chen and Cui (2005)’s formulas (A.1 to A.6) to get asymptotic expansion for λˆ and Aˆ ex-pressed in terms of A − α and C − γ systems with remainder term of order O(Q−3/2). By steps similarto(2.64),(2.65),(2.66), we get the expansion2Q+1∑i=1log(1 + λˆτhrWi(θˆhr))= 2Q∑i=1log(1 + λˆτWi(θˆ))− 2a∗Ap+aAp+a−32QM(%̂I,1 − %I,1)Ap+aAp+a + 4a∗Ap+a p+bAp+aAp+b− 3a2∗(Ap+aAp+a)2− 6a∗αp+c p+a p+bAp+aAp+bAp+c+a2∗QAp+aAp+a − 2a2∗(Ap+aAp+a)2− a∗γp+a,knωkmωnm′AmAm′Ap+a+Op(Q−3/2). 
(2.68)Remember that we have a signed root decomposition for LRI, which isLRI = Q (RI,1 +RI,2 +RI,3)τ (RI,1 +RI,2 +RI,3) +Op(Q−3/2)652.7. Proofsfor RI,1 = Op(Q−1/2), RI,2 = Op(Q−1) and RI,3 = Op(Q−3/2) . By definition, we haveLRI = 2{ Q∑i=1log(1 + λˆτWi(θˆ))−Q∑i=1log(1 + λ˜τWi(θ∗))}.We link the signed root decomposition of ALRhr−I to that of LRI. Combining (2.67) and (2.68), we get asigned root decomposition for ALRhr−I, that is ALRhr−I = Q ·Rτhr−IRhr−I +Op(Q−1M−1/2) +Op(M−2)where Rhr−I is a p−dimensional random vector: for each k ∈ {1, . . . , p},Rkhr−I ≡ RkI +1M%I,1Ak +1Q%I,2Ak +1M%I,1(AjkAj +Ak p+aAp+a)−321M%I,1(αghkAgAh + αgk p+aAgAp+a + αk p+a p+bAp+aAp+b)With some algebraic work, it is found thatCum(Q1/2Rlhr−I, Q1/2Rohr−I, Q1/2Rmhr−I) =1Q1/2M∆lomhr−I,[3,1] + o(Q−1/2M−1)Cum(Q1/2Rlhr−I, Q1/2Rohr−I, Q1/2Rmhr−I, Q1/2Rnhr−I) =1M2∆lomnhr−I,[4,1] +1QM 1/2∆lomnhr−I,[4,2]+ o(M−2) + o(Q−1M−1/2)for some O(1) terms ∆lomhr−I,[3,1], ∆lomnhr−I,[4,1] and ∆lomnhr−I,[4,2] which are different from ∆lomI,[3,1], ∆lomnI,[4,1] and∆lomnI,[4,2] but still third and fourth order cumulants of Q1/2Rhr−I has the same key property that sums ofleading terms are equal to zero. We also have expansion for second-order cumulant:Cum(Q1/2Rlhr−I, Q1/2Rohr−I) = 1[l = o] +1M∆lohr−I,[2,1] +1Q∆lohr−I,[2,2]−1M%I,11[l = o]−1Q%I,21[l = o] +O(Q−1M−1/2)where ∆loI,[2,1] and ∆loI,[2,2] are O(1) terms as defined in formula (2.17). Then applying edgeworth expansionresult for Q1/2Rhr−I, and obtaining a similar expansion as in (2.60) for P[Q ·Rτhr−IRhr−I 6 x], it can be662.7. 
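shown that the $M^{-1}$ and $Q^{-1}$ terms disappear:

\[
\begin{aligned}
P\left[Q\cdot R_{hr-I}^\tau R_{hr-I} \le x\right] ={}& P\left[\chi^2_p \le x\right] + \frac12 M^{-1}\sum_{l=1}^{p}\left(\Delta^{ll}_{I,[2,1]} - \varrho_{I,1}\right)\int_{\|t\|\le x^{1/2}}\left(t^l t^l - 1\right)\phi(t)\,dt \\
&+ \frac12 Q^{-1}\sum_{l=1}^{p}\left(\Delta^{ll}_{I,[2,2]} + \Delta^{l}_{I,[1,1]}\Delta^{l}_{I,[1,1]} - \varrho_{I,2}\right)\int_{\|t\|\le x^{1/2}}\left(t^l t^l - 1\right)\phi(t)\,dt \\
&+ O(Q^{-1}M^{-1/2}) + O(M^{-2}) \\
={}& P\left[\chi^2_p \le x\right] + M^{-1}\Big(\frac12 p^{-1}\sum_{l=1}^{p}\left(\Delta^{ll}_{I,[2,1]} - \varrho_{I,1}\right)\Big)\int_{\|t\|\le x^{1/2}}\Big(\sum_{l=1}^{p}t^l t^l - p\Big)\phi(t)\,dt \\
&+ Q^{-1}\Big(\frac12 p^{-1}\sum_{l=1}^{p}\left(\Delta^{ll}_{I,[2,2]} + \Delta^{l}_{I,[1,1]}\Delta^{l}_{I,[1,1]} - \varrho_{I,2}\right)\Big)\int_{\|t\|\le x^{1/2}}\Big(\sum_{l=1}^{p}t^l t^l - p\Big)\phi(t)\,dt \\
&+ O(Q^{-1}M^{-1/2}) + O(M^{-2}) \\
={}& P\left[\chi^2_p \le x\right] + O(Q^{-1}M^{-1/2}) + O(M^{-2})
\end{aligned}
\]

since

\[
\varrho_{I,1} \equiv p^{-1}\sum_{l=1}^{p}\Delta^{ll}_{I,[2,1]}
\qquad\text{and}\qquad
\varrho_{I,2} \equiv p^{-1}\sum_{l=1}^{p}\left(\Delta^{ll}_{I,[2,2]} + \Delta^{l}_{I,[1,1]}\Delta^{l}_{I,[1,1]}\right)
\]

by definition. Therefore we have

\[
\sup_{x\in\mathbb{R}}\left|P\left[Q\cdot R_{hr-I}^\tau R_{hr-I} \le x\right] - P\left[\chi^2_p \le x\right]\right| = O(M^{-2}) + O(Q^{-1}M^{-1/2}).
\]

We can find the signed root decomposition of $ALR_{hr-II}$ and use the same arguments to prove (2.26).

Chapter 3

Second Order Refinement of Empirical Likelihood Ratio Tests for General Parameter Testing Problems

3.1 Introduction

We have independent and identically distributed observations $(X_j)_{j=1}^{T}$ with sample size $T \in \mathbb{N}$ drawn from the distribution of some $\mathbb{R}^d$-valued random vector $X$. Our parameter of interest is defined by $r$ estimating equations. For some known function $g : \mathbb{R}^d \times \Theta \to \mathbb{R}^r$, we know that $E[g(X,\theta_*)] = 0$ for an unknown point $\theta_* \in \Theta \subseteq \mathbb{R}^p$, where $\Theta$ is the parameter space. The model we consider can be overidentified: the number of moment restrictions $r$ can be strictly larger than the dimension of the parameter $p$. This modelling framework is known as the moment restriction model in the econometric literature. It is a very general modelling framework that is widely used in empirical research. Overidentified moment restriction models are especially useful in applied econometrics. Many economic theories imply that the economic variables $X$ satisfy some moment restriction with some unknown parameter. See Hall (2005) for examples. The generalized method of moments (GMM) (Hansen (1982)) is a popular estimation and testing method for these models. See Hall (2005) and Imbens (2002) for reviews of GMM.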
For inference problems, we first define a GMM sample criterion function, which is a quadratic form in the sample moment restrictions. We have the flexibility of choosing the weighting matrix. Minimizing the GMM sample criterion function gives a GMM estimator. We are also interested in parameter hypothesis testing problems. Suppose that, for some set $\Theta_0$, the researcher poses the hypothesis that the true parameter value satisfies $\theta_* \in \Theta_0$. We can carry out a large-sample test by defining Wald test statistics, which have $\chi^2$ limiting distributions under the null hypothesis, and using the asymptotic distribution as an approximation to the true distribution of the test statistic to define the critical value corresponding to a given significance level. However, simulation studies in the literature have shown that GMM can have poor finite-sample performance in both estimation and testing problems. For estimation problems, GMM-based point estimates can be severely biased. For hypothesis testing problems, the difference between the true rejection probability under the null hypothesis and the nominal size of the test can be large, which is known as the size distortion problem.

The first-order asymptotic distribution of GMM-based test statistics for parameter testing problems can be a poor approximation in finite samples. To address this issue, Hall and Horowitz (1996) and Brown and Newey (2002) proposed replacing the first-order asymptotic distribution with the bootstrap distribution of the test statistic when doing hypothesis testing. Another potential solution is to use a different inferential method, generalized empirical likelihood (GEL) (Smith (2011)). Efficient GMM estimation requires a preliminary estimate to compute the optimal weighting matrix, which yields the estimator with the smallest asymptotic variance. A GEL estimate can be obtained in one step without the need for a preliminary estimate.
GEL includes empirical likelihood (EL) (Owen (1988, 1990), Qin and Lawless (1994)), exponential tilting (Kitamura and Stutzer (1997)) and continuous updating (Hansen, Heaton and Yaron (1996)) estimation as special cases. See also Hall and La Scala (1990), Kitamura (2006) and Owen (2001) for reviews of empirical likelihood.

In the statistics and econometrics literature, various papers have shown that EL-based estimation and hypothesis testing have desirable statistical properties. Newey and Smith (2004) showed that EL estimators have good properties in terms of second order bias and higher order efficiency. For testing problems, several papers showed that various versions of the EL ratio (ELR) test are Bartlett correctable: the distribution of a mean-adjusted ELR statistic converges to its $\chi^2$ limiting distribution at a faster rate. Chen and Cui (2007) showed that for overidentified models ($r > p$), the ELR test for a simple parameter hypothesis with $\Theta_0 = \{\theta_0\}$ for some $\theta_0 \in \Theta$ is Bartlett correctable. Chen and Cui (2006.a) showed that for just identified models ($r = p$), the ELR test for parameter testing problems in the presence of nuisance parameters, with $\Theta_0 = \{\theta \in \Theta : \theta = (\psi_0, \zeta) \text{ for some } \zeta\}$ for some fixed $\psi_0$, is Bartlett correctable. Matsushita and Otsu (2013) showed that the ELR test for testing the overidentifying restrictions is also Bartlett correctable. Earlier papers showing that EL-based tests are Bartlett correctable include DiCiccio, Hall and Romano (1991) in the context of smooth functions of means and Chen (1993, 1994) in the context of linear regression models. Otsu (2009) established that the ELR test for parameter testing problems is optimal in an asymptotic large deviations sense, which is known as generalized Neyman-Pearson optimality.

Bartlett correctability is a delicate statistical property.
There are also results about "non-correctability". Jing and Wood (1996) focused on inference for population means and showed that the exponential tilting based test, which is a special case of generalized ELR tests, is not Bartlett correctable. Baggerly (1998) also focused on inference for population means and further showed that the ELR test is the only Bartlett correctable test within the power divergence family. Lazar and Mykland (1999) gave a non-correctability result that is closely related to Chen and Cui (2006.a). They showed that, in the presence of nuisance parameters, Bartlett correctability of ELR tests can be lost when the way the nuisance parameter is removed is changed.

In this paper, our inferential framework is the overidentified moment restriction model with cross-sectional data. We focus on testing $H_0 : \rho_0(\theta^*) = 0$ for some nonlinear function $\rho_0$. The class of parameter testing problems considered in this paper includes the testing problems considered in Chen and Cui (2007) and Chen and Cui (2006.a) as special cases. We show that the EL ratio tests for this class of testing problems are all Bartlett correctable. This result is significant in two ways. Theoretically, it further confirms empirical likelihood as a favorable statistical method for testing problems: together with many other papers, it supports the conclusion that empirical likelihood ratio tests for parameter testing problems exhibit favorable statistical properties. Practically, it significantly enhances the applicability of second order refinement to EL in econometrics. For example, an applied econometrician may be interested in constructing a confidence region for a subvector of the parameter in an overidentified instrumental variable regression model using the EL method, but the theorems about second order refinement of EL in the literature are not general enough to allow for these cases.
Another example is that in some situations researchers want to test the null hypothesis that two components of the parameter vector are equal. With the results given in this paper, we now have feasible EL ratio tests with high order precision for these interesting testing problems.

3.2 Setup

Suppose we observe a random sample $(X_j)_{j=1}^T$ and, for some known function $g : \mathbb{R}^d \times \Theta \to \mathbb{R}^r$, we know that
\[
E[g(X_1, \theta^*)] = 0
\]
for an unknown point $\theta^* \in \Theta$, where $\Theta \subseteq \mathbb{R}^p$ is the parameter space. We wish to test the null hypothesis $\theta^* \in \Theta_0 \subseteq \Theta$. In this paper we assume that the null hypothesis is of the form $\Theta_0 = \{\theta \in \Theta : \rho_0(\theta) = 0\}$ for some known nonlinear function $\rho_0 : \mathbb{R}^p \to \mathbb{R}^{p_0}$ with $p_0 \le p$. The null and alternative hypotheses are
\[
H_0 : \rho_0(\theta^*) = 0 \quad \text{versus} \quad H_1 : \rho_0(\theta^*) \ne 0. \tag{3.1}
\]
We write $g(X, \theta) = (g^1(X, \theta), \ldots, g^r(X, \theta))^\tau$. In this paper, we use $A^\tau$ to denote the transpose of a matrix $A$. Let $V \equiv \mathrm{Var}[g(X, \theta^*)]$. We use $a^j$ to denote the $j$-th component of a vector $a$. We make the following assumptions.

Assumption 7. $(X_j)_{j=1}^T$ is i.i.d.

Assumption 8. $E[g(X, \theta)] = 0$ is uniquely satisfied at $\theta = \theta^*$.

Assumption 9. $\Theta$ is compact. $V$ is positive definite. $E\left[\frac{\partial}{\partial\theta^\tau} g(X, \theta)\big|_{\theta=\theta^*}\right]$ has full column rank.

Assumption 10. There exists a neighborhood $\mathcal{N}$ of $\theta^*$ such that for each $j = 1, \ldots, r$, $g^j(X, \cdot)$ is three times continuously differentiable in $\theta \in \mathcal{N}$ almost surely and the derivatives are bounded by integrable functions over $\mathcal{N}$.

Assumption 11. $E\left[\|g(X, \theta^*)\|^{15}\right] < \infty$ and $\limsup_{|t| \to \infty} \left|E\left[\exp(i t^\tau g(X, \theta^*))\right]\right| < 1$.

These regularity conditions are the same as those imposed by Chen and Cui (2006.a), Chen and Cui (2007) and Matsushita and Otsu (2013). In this paper, we exclude dependent data. Assumption 11 imposes boundedness of moments and a Cramér condition used to establish validity of the Edgeworth expansion. The Cramér condition is satisfied when the distribution of $g(X, \theta^*)$ has a nondegenerate and absolutely continuous component.
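For example, this condition is satisfied for a linear instrumental variable model when the error term has an absolutely continuous distribution. In addition, we need to put some restriction on the form of the null hypothesis.

Assumption 12. $\rho_0 : \Theta \to \mathbb{R}^{p_0}$ is twice continuously differentiable and the $p_0 \times p$ matrix $\frac{\partial \rho_0(\theta)}{\partial\theta^\tau}\big|_{\theta=\theta^*}$ is of full row rank.

For each $\theta \in \Theta$, we define the GEL profile likelihood function
\[
\ell_{GEL}(\theta) \equiv \inf\left\{ -\sum_{j=1}^{T} \phi(w_j) : w \in S_T, \ \sum_{j=1}^{T} w_j g(X_j, \theta) = 0 \right\} \tag{3.2}
\]
where $S_T \equiv \left\{(w_1, \ldots, w_T) \in \mathbb{R}_+^T : \sum_{j=1}^T w_j = 1\right\}$ and $\phi$ is a concave increasing function. The minimizer of $\ell_{GEL}$, $\hat\theta_{GEL} \equiv \operatorname{argmin}_{\theta \in \Theta} \ell_{GEL}(\theta)$, is the generalized EL estimator. Choosing $\phi(w) = \log(w)$ we get the empirical likelihood estimator and choosing $\phi(w) = w\log(w)$ we get the exponential tilting estimator. We transform $g(X_i, \theta)$ to
\[
W_i(\theta) \equiv \Psi_L V^{-1/2} g(X_i, \theta)
\]
where $\Psi_L$ is an $r \times r$ orthonormal matrix and $\Psi_R$ is a $p \times p$ orthonormal matrix such that
\[
\Psi_L V^{-1/2} E\left[\frac{\partial}{\partial\theta^\tau} g(X, \theta)\Big|_{\theta=\theta^*}\right] \Psi_R = \begin{pmatrix} \Lambda \\ 0 \end{pmatrix}_{r \times p} \tag{3.3}
\]
with a nonsingular diagonal matrix $\Lambda$. Let $\ell$ denote the EL profile likelihood function. Standard derivations show that
\[
\ell(\theta) = \sum_{i=1}^{T} \log\left(1 + \lambda(\theta)^\tau W_i(\theta)\right)
\]
for each $\theta \in \Theta$, where the Lagrange multiplier $\lambda(\theta)$ is the solution of $T^{-1} \sum_{i=1}^{T} \frac{W_i(\theta)}{1 + \lambda^\tau W_i(\theta)} = 0$. The (unconstrained) EL estimator $\hat\theta$ and its corresponding $\hat\lambda \equiv \lambda(\hat\theta)$ are solutions of $Q_{1T}(\hat\lambda, \hat\theta) = 0$ and $Q_{2T}(\hat\lambda, \hat\theta) = 0$ where
\[
Q_{1T}(\lambda, \theta) \equiv \frac{1}{T} \sum_{i=1}^{T} \frac{W_i(\theta)}{1 + \lambda^\tau W_i(\theta)} \quad \text{and} \quad Q_{2T}(\lambda, \theta) \equiv \frac{1}{T} \sum_{i=1}^{T} \frac{\left(\frac{\partial}{\partial\theta} W_i(\theta)^\tau\right) \lambda}{1 + \lambda^\tau W_i(\theta)}.
\]
Let
\[
S \equiv E\left[\frac{\partial}{\partial\eta^\tau} Q(\eta)\Big|_{\eta=(0,\theta^*)}\right] = \begin{pmatrix} -I & S_{12} \\ S_{21} & 0 \end{pmatrix}
\]
where $S_{12} \equiv \begin{pmatrix} \Lambda \Psi_R^\tau \\ 0 \end{pmatrix}$ and $S_{21} \equiv S_{12}^\tau$.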
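As a numerical illustration of the profile likelihood $\ell(\theta)$ defined above, the following is a minimal Python sketch of the inner step at a fixed $\theta$: it solves the first-order condition for the Lagrange multiplier by damped Newton iterations and returns $\sum_i \log(1 + \lambda^\tau g_i)$. It is not the code used in this thesis. The function name `el_log_ratio`, the tolerances and the damping rule are assumptions made for illustration. The sketch works with the untransformed moments $g(X_i, \theta)$, which gives the same value as using $W_i(\theta)$ because the criterion is invariant to nonsingular linear transformations of the moment function.

```python
import numpy as np


def el_log_ratio(G, max_iter=50, tol=1e-10):
    """Return (ell, lam) for a (T, r) array G whose rows are g(X_i, theta).

    ell = sum_i log(1 + lam' g_i), where lam solves sum_i g_i / (1 + lam' g_i) = 0.
    Hypothetical helper: if zero is outside the convex hull of the rows of G,
    the problem has no solution and the iterations will not converge.
    """
    G = np.asarray(G, dtype=float)
    T, r = G.shape
    lam = np.zeros(r)

    def objective(l):
        arg = 1.0 + G @ l
        # The objective is only defined where all implied weights are positive.
        return np.sum(np.log(arg)) if np.all(arg > 0.0) else -np.inf

    f_old = objective(lam)
    for _ in range(max_iter):
        arg = 1.0 + G @ lam
        scaled = G / arg[:, None]
        grad = scaled.sum(axis=0)            # first-order condition (times T)
        if np.linalg.norm(grad) / T < tol:
            break
        hess = -scaled.T @ scaled            # negative definite Hessian
        step = np.linalg.solve(hess, -grad)  # Newton ascent direction
        t = 1.0
        f_new = objective(lam + t * step)
        # Halve the step until it is feasible and does not lower the objective.
        while f_new < f_old - 1e-12 * (1.0 + abs(f_old)) and t > 1e-10:
            t *= 0.5
            f_new = objective(lam + t * step)
        if not np.isfinite(f_new):
            break
        lam = lam + t * step
        f_old = f_new
    return f_old, lam
```

Under these assumptions, an EL ratio statistic would be obtained by evaluating this function at the constrained and unconstrained minimizers of $\ell$ and taking twice the difference; the outer minimization over $\theta$ is not shown.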
We introduce the following notations.αj1...jk ≡ E[Wi(θ∗)j1 · · ·Wi(θ∗)jk]; Aj1...jk ≡1TT∑i=1Wi(θ∗)j1 · · ·Wi(θ∗)jk − αj1...jkandβj,j1...jk ≡ E∂kΓji (η)∂ηj1 · · · ∂ηjk∣∣∣∣∣η=(0,θ∗) ; Bj,j1...jk ≡1TT∑i=1∂kΓji (ξ)∂ξj1 · · · ∂ξjk∣∣∣∣∣ξ=(0,ζ∗)− βj,j1...jkandγj,j1...jl;k,k1...km;...;p,p1...pn ≡E[∂lW ji (θ)∂θj1 · · · ∂θjl∣∣∣∣∣θ=θ∗∂mW ki (θ)∂θk1 · · · ∂θkm∣∣∣∣θ=θ∗· · ·∂nW pi (θ)∂θp1 · · · ∂θpn∣∣∣∣θ=θ∗]Cj,j1...jl;k,k1...km;...;p,p1...pn ≡1TT∑i=1∂lW ji (θ)∂θj1 · · · ∂θjl∣∣∣∣∣θ=θ∗∂mW ki (θ)∂θk1 · · · ∂θkm∣∣∣∣θ=θ∗· · ·∂nW pi (θ)∂θp1 · · · ∂θpn∣∣∣∣θ=θ∗.− γj,j1...jl;k,k1...km;...;p,p1...pnThen it is shown in Chen and Cui (2007) that `(θˆ)admits a high-order stochastic expansion. In thispaper, we also adopt a convention where if a superscript is repeated, a summation over that superscriptis understood. We also fix the range of superscripts a, b, c, d ∈ {1, 2, . . . , r − p}, f, g, h, i, j ∈ {1, 2, . . . , r}and q, s, t, u ∈ {1, 2, . . . , r + p}. k, l,m, n, o could range from 1 to p or 1 to p − p0, depending on whatplace these dummies are used. This will not cause any ambiguity in this paper. It is shown in Chen and733.2. SetupCui (2007) (Equation (2.8)) that`(θˆ)=− 2BjAj −BjBj + 2Cj,k(BjBr+k −Bj,qBqBr+k)+ γj,kl(−BjBr+kBr+l +Br+kBr+lBj,qBq[3]−12Br+lBr+kβj,uqBuBq)− Cj,klBjBr+kBr+l +13γj,klmBjBr+kBr+lBr+m −Bj,uBuBj,qBq −14βj,uqβj,stBuBqBsBt−BjBiAji +BjBi,qBqAji[2] + 2γj;i,l(BjBiBr+l −BjBiBr+l,qBq−Br+lBiBj,qBq[2] +12βj,uqBuBqBiBr+l[2])−(γj;i,lk + γj,l;i,k)BjBiBr+lBr+k+ 2BjBiBr+lCj;i,l −23αjihBjBiBh + 2αjihBjBiBh,qBq − αjihβj,uqBuBqBiBh−23AjihBjBiBh + 2γj;i;h,kBjBiBhBr+k −12αjihgBjBiBhBg +Op(T−5/2) (3.4)where [3] is short notation for [3; j, r + k, r + l] which means there are three terms by putting each ofj, r + k, r + l in the position of j. [2] is a short notation for [2; j, r + k] which means that there are twoterms by exchanging the superscripts j and r + k.The constrained EL estimator under the null hypothesis is θ˜ ≡ argminθ:ρ0(θ)=0`(θ). 
We apply the techniquethat transforms the constrained EL to an unconstrained one. This is used in many other papers thatconsider testing nonlinear parametric restrictions (3.1). For example, see Theorem 4.2 of Kitamura,Tripathi and Ahn (2004). By the implicit function theorem (Theorem 9.28, Rudin (1976)), without lossof generality, there exists some open set U ⊆ Rp−p0 and some twice differentiable function ψ : U −→ Rp0such that {θ ∈ N : ρ0(θ) = 0} = {(ζ, ψ(ζ)) : ζ ∈ U}. for some neighborhood N around θ∗. Under the nullhypothesis that ρ0(θ∗) = 0, there uniquely exists some ζ∗ ∈ U such that θ∗ = (ζ∗, ψ(ζ∗)). Letζ˜ ≡ argminζ∈U`(ζ, ψ(ζ)),then it follows that θ˜ =(ζ˜, ψ(ζ˜)). We also do the transformation W˜i(ζ) ≡ ΨLV−1/2g(Xi, ζ, ψ(ζ)). Theconstrained EL estimator ζ˜ and the corresponding λ˜ satisfies: Q˜1T (λ˜, ζ˜) = 0 and Q˜2T (λ˜, ζ˜) = 0 whereQ˜1T (λ, ζ) ≡1TT∑i=1W˜i(ζ)1 + λτW˜i(ζ)and Q˜2T (λ, ζ) ≡1TT∑i=1(∂∂ζ W˜i(ζ)τ)λ1 + λτW˜i(ζ). (3.5)743.2. SetupLetS˜ ≡ E[∂∂ξτ Q˜(ξ)∣∣∣ξ=(0,ζ∗)]=−I S˜12S˜21 0where S˜12 ≡ΛΨτR0I∂∂ζτ φ(ζ)∣∣∣ζ=ζ∗ and S˜21 ≡ S˜τ12. Let Υ ≡I∂∂ζτ φ(ζ)∣∣∣ζ=ζ∗ and Π ≡ ΛΨτRΥ.Let Γ˜ (ξ) ≡ S˜−1Q˜(ξ) and Γ(η) ≡ S−1Q(η). Denote η ≡ (λ, θ) and ξ ≡ (λ, ζ). We introduce the followingnotations.β˜j,j1...jk ≡ E∂kΓ˜ji (ξ)∂ξj1 · · · ∂ξjk∣∣∣∣∣ξ=(0,ζ∗) ; B˜j,j1...jk ≡1TT∑i=1∂kΓ˜ji (ξ)∂ξj1 · · · ∂ξjk∣∣∣∣∣ξ=(0,ζ∗)− β˜j,j1...jk (3.6)andγ˜j,j1...jl;k,k1...km;...;p,p1...pn ≡ E∂lW˜ ji (ζ)∂ζj1 · · · ∂ζjl∣∣∣∣∣ζ=ζ∗∂mW˜ ki (ζ)∂ζk1 · · · ∂ζkm∣∣∣∣∣ζ=ζ∗· · ·∂nW˜ pi (ζ)∂ζp1 · · · ∂ζpn∣∣∣∣∣ζ=ζ∗ ;C˜j,j1...jl;k,k1...km;...;p,p1...pn ≡1TT∑i=1∂lW˜i(ζ)∂ζj1 · · · ∂ζjl∣∣∣∣∣ζ=ζ∗∂mW˜i(ζ)∂ζk1 · · · ∂ζkm∣∣∣∣∣ζ=ζ∗· · ·∂nW˜i(ζ)∂ζp1 · · · ∂ζpn∣∣∣∣∣ζ=ζ∗(3.7)− γ˜j,j1...jl;k,k1...km;...;p,p1...pn .We need stochastic expansions of both `(θ˜)and `(θˆ)with error of order Op(T−3/2). . We drive theexpansion of `(θ˜)in a similar manner. First we notice that we have`(θ˜)=T∑i=1{λ˜τW˜i(ζ˜)−12[λ˜τW˜i(ζ˜)]2+13[λ˜τW˜i(ζ˜)]3−14[λ˜τW˜i(ζ˜)]4}+Op(T−3/2). (3.8)By inverting Γ˜(λ˜, ζ˜)= 0, we get for j ∈ {1, . . . 
, r},λ˜j =− B˜j + B˜j,qB˜q −12β˜j,uqB˜uB˜q − B˜j,uB˜u,qB˜q +12β˜u,qsB˜j,uB˜qB˜s + β˜j,uqB˜u,sB˜sB˜q−12β˜j,uqβ˜u,stB˜sB˜tB˜q −12B˜j,uqB˜uB˜q +16β˜j,uqsB˜uB˜qB˜s +Op(T−2)(3.9)753.2. Setupand for k ∈ {1, . . . , p− p0},ζ˜k =− B˜r+k + B˜r+k,qB˜q −12β˜r+k,uqB˜uB˜q − B˜r+k,uB˜u,qB˜q +12β˜u,qsB˜r+k,uB˜qB˜s + β˜r+k,uqB˜u,sB˜sB˜q−12β˜r+k,uqβ˜u,stB˜sB˜tB˜q −12B˜r+k,uqB˜uB˜q +16β˜r+k,uqsB˜uB˜qB˜s +Op(T−2) (3.10)By substituting (3.9) and (3.10) into (3.8), it is shown in the appendix that we can get an expansionof `(θ˜)in terms of β˜ − B˜ system with an error of magnitude Op(T−5/2).`(θ˜)=− 2B˜jAj − B˜jB˜j + 2C˜j,k(B˜jB˜r+k − B˜j,qB˜qB˜r+k)+ γ˜j,kl(−B˜jB˜r+kB˜r+l + B˜r+kB˜r+lB˜j,qB˜q[3]−12B˜r+lB˜r+kβ˜j,uqB˜uB˜q)− C˜j,klB˜jB˜r+kB˜r+l +13γ˜j,klmB˜jB˜r+kB˜r+lB˜r+m − B˜j,uB˜uB˜j,qB˜q −14β˜j,uqβ˜j,stB˜uB˜qB˜sB˜t− B˜jB˜iAji + B˜jB˜i,qB˜qAji[2] + 2γ˜j;i,l(B˜jB˜iB˜r+l − B˜jB˜iB˜r+l,qB˜q−B˜r+lB˜iB˜j,qB˜q[2] +12β˜j,uqB˜uB˜qB˜iB˜r+l[2])−(γ˜j;i,lk + γ˜j,l;i,k)B˜jB˜iB˜r+lB˜r+k+ 2B˜jB˜iB˜r+lC˜j;i,l −23αjihB˜jB˜iB˜h + 2αjihB˜jB˜iB˜h,qB˜q − αjihβ˜j,uqB˜uB˜qB˜iB˜h−23AjihB˜jB˜iB˜h + 2γ˜j;i;h,kB˜jB˜iB˜hB˜r+k −12αjihgB˜jB˜iB˜hB˜g +Op(T−5/2) (3.11)Using the algebraic formulae derived in the appendix, we could express `(θ˜)in terms of γ˜−C˜ system.There are also algebraic relations between the γ˜ − C˜ system and the γ −C system. Let N ≡ (ΠτΠ)−1 Πτand M ≡ −I + Π (ΠτΠ)−1 Πτ . We notice that −M is a projection matrix onto the orthogonal complementof the column space of Π. Let Π0 be a p× p0 matrix with columns spanning the orthogonal complementof the column space of Π. By uniqueness of projection matrices (Theorem 12.3.1, Harville (1997)),M = Π0 (Πτ0Π0)−1 Πτ0 . Let O ≡ Π0 (Πτ0Π0)−1/2. Then it is clear that O is an p× p0 matrix with OτO = Iand −M = OOτ . 
All the algbraic formulae are collected in the appendix.When we have an overidentified model but testing a simple hypothesis: θ∗ = θ0 for some θ0, asconsidered by Chen and Cui (2007), the expansion of the constraine maximum EL `(θ˜)is very simple,since in this case θ˜ = θ0 and we do not need to take into account the randomness of θ˜. When we havea just identified model in the presence nuisance parameter, as considered by Chen and Cui (2006.a),the constrained EL estimator is θ˜ =(ψ0, ζ˜)where ζ˜ ≡ argminζ`(ψ0, ζ). In this case, expansion for763.3. The Main Resultsthe unconstrained maximum EL `(θˆ)is trivial: `(θˆ)= Op(T−3/2) but expansion for `(θ˜)is quitecomplicated. In the more general case we considered in this paper, both expansions of `(θˆ)and `(θ˜)arequite complicated. Combining the expansion of `(θ˜)and `(θˆ), we could obtain a stochastic expansionfor LR ≡ 2{`(θ˜)− `(θˆ)}which can be written in terms of the γ − C system.3.3 The Main Results3.3.1 Signed Root Decomposition and Properties of Its CumulantsTo derive the second order properties of EL ratio statistics, we first need to find the signed root de-composition of the EL ratio statistic which are p0 - dimensional random vectors R1, R2, R3 such thatLR = T · (R1 +R2 +R3)τ (R1 +R2 +R3) + Op(T−3/2) and R1 = Op(T−1/2), R2 = Op(T−1) andR3 = Op(T−3/2). R1, R2, R3 are polynomials in decentralized sample moments. In the appendix andthe supplemental appendix we give the detailed steps to show the existence and the explicit forms ofR1, R2, R3. For example, it is shown in the appendix that for each x ∈ {1, . . . , p0},Rx1 = OmxAmandRx2 =12MmkOnxAmnAk −OnxAn p+aAp+a +{−γm;v,wΩwn (I + M)nk MmlOvx+13αvmnMvlMmkOnx +12γm,wvΩwn (I + M)nl Ωvo (I + M)ok Omx}AlAk+{−γp+a,vwΩvkΩwoOox − αvmp+aMvkOmx +(γp+a,m;v + γp+a;v,m)Ωmw (I + M)wk Ovx}AkAp+a+{−γp+a;p+b,mΩmnOnx + αv p+a p+bOvx}Ap+aAp+b− Ωko (I + M)om OlxC l,kAm + ΩkmOmxCp+a,kAp+aThe expression for R3 is very lengthy and given in Section 3.6.6 of the appendix. 
It is also clear from the derivation that $2\{\ell(\tilde\theta) - \ell(\hat\theta)\} = T \cdot R_1^\tau R_1 + o_p(1) = -T \cdot M^{lk} A^l A^k + o_p(1)$, and this gives $2\{\ell(\tilde\theta) - \ell(\hat\theta)\} \to_d \chi^2_{p_0}$. This is exactly the first-order asymptotic result given by Qin and Lawless (1994).

We apply the conventional argument to derive the Bartlett correction for the empirical likelihood ratio statistic. In the appendix, we show that the third and the fourth order joint cumulants of $R \equiv R_1 + R_2 + R_3$ satisfy
\[
\mathrm{Cum}(R^x, R^y, R^z) = O(T^{-3}) \tag{3.12}
\]
for every $(x, y, z) \in \{1, \ldots, p_0\}^3$ and
\[
\mathrm{Cum}(R^x, R^y, R^z, R^t) = O(T^{-4}) \tag{3.13}
\]
for every $(x, y, z, t) \in \{1, \ldots, p_0\}^4$. It is known that (3.12) and (3.13) are the key claims that lead to Bartlett correctability.

Bartlett correctability is a delicate statistical property because the expansions for $\mathrm{Cum}(R^x, R^y, R^z)$ and $\mathrm{Cum}(R^x, R^y, R^z, R^t)$ are very complicated. Suppose we allow for the presence of nuisance parameters, i.e. our null hypothesis is $H_0 : \psi^* = \psi_0$ for some subvector $\psi$ of $\theta$. Lazar and Mykland (1999) showed that for the EL ratio statistic $2\{\ell(\psi_0, \hat\zeta) - \ell(\hat\theta)\}$, where $\hat\zeta$ is the corresponding subvector of the unconstrained estimator $\hat\theta$, (3.13) fails and the EL ratio test is not Bartlett correctable in general. The first order $\chi^2$ approximation is still valid. This means that Bartlett correctability depends on how the nuisance parameter is removed. Let $\ell_{et}$ be the (exponential tilting) profile likelihood function corresponding to taking $\phi(w) = w\log(w)$ in the definition (3.2). Jing and Wood (1996) show that in general (3.13) fails for $2\{\min_{\theta : \rho_0(\theta) = 0} \ell_{et}(\theta) - \min_\theta \ell_{et}(\theta)\}$ and therefore tests based on the exponential tilting method are not Bartlett correctable. Derivation of the expressions for $R_1, R_2, R_3$ and the proof of (3.12) and (3.13) are both very complicated. They are given in the appendix and the supplemental appendix.

3.3.2 Second Order Refinement via Bartlett Correction

Let $LR \equiv 2\{\ell(\tilde\theta) - \ell(\hat\theta)\}$. The first-order cumulants of $R$ are equal to its first-order moments.
By inspecting the algebraic expressions for $R_1$, $R_2$ and $R_3$, it is easy to see that $E[R_1^x] = 0$, $E[R_2^x] = O(T^{-1})$ and $E[R_3^x] = O(T^{-2})$. Let $\mu^x$ denote the coefficient of $T^{-1}$, i.e. $\mu^x = T \cdot E[R_2^x]$. We have for each $x \in \{1, \ldots, p_0\}$,
\[
\mathrm{Cum}(R^x) = E[R^x] = E[R_1^x] + E[R_2^x] + E[R_3^x] = T^{-1}\mu^x + O(T^{-2}).
\]
Let $\delta^{xy}$ denote the indicator $1[x = y]$. The second order cumulants can also be expanded: for each $(x, y) \in \{1, \ldots, p_0\}^2$,
\[
\mathrm{Cum}(R^x, R^y) = T^{-1}\delta^{xy} + T^{-2}\Delta^{xy} + O(T^{-3})
\]
for some matrix $\Delta$ depending on population moments; $\Delta^{xy}$ is defined to be the coefficient of the $T^{-2}$ term in the expansion. Let $B_c \equiv p_0^{-1} \sum_{x=1}^{p_0} (\Delta^{xx} + \mu^x\mu^x)$ be the Bartlett correction factor. We state the following proposition as our first result in this paper. After we prove (3.12) and (3.13), this result can be readily obtained. Let $\hat{B}_c$ denote the method of moments estimator of $B_c$, which is $\sqrt{T}$-consistent under mild conditions.

Proposition 3.1. Let $c_\alpha$ and $f_{p_0}$ be the $(1-\alpha)$-th quantile and the probability density function of the $\chi^2_{p_0}$ distribution. Under Assumptions 7, 8, 9, 10, 11 and 12, we have
\[
P[LR \le c_\alpha] = 1 - \alpha - T^{-1} c_\alpha f_{p_0}(c_\alpha) B_c + O(T^{-2}), \tag{3.14}
\]
\[
P\left[\frac{LR}{1 + T^{-1}B_c} \le c_\alpha\right] = 1 - \alpha + O(T^{-2}) \tag{3.15}
\]
and
\[
P\left[\frac{LR}{1 + T^{-1}\hat{B}_c} \le c_\alpha\right] = 1 - \alpha + O(T^{-2}) \tag{3.16}
\]
under $H_0 : \rho_0(\theta^*) = 0$.

Proposition 3.1 is a substantial generalization of the Bartlett correctability theorems of Chen and Cui (2006.a) and Chen and Cui (2007), where different proper subsets of the more general parametric testing problem (3.1) are considered. It is remarkable that Bartlett correctability holds under quite mild restrictions on the form of the parametric restrictions under the null hypothesis, so that Bartlett correction can be implemented when using empirical likelihood ratio tests for a much larger class of testing problems. The correction factor $B_c$ depends on $\rho_0$ and $\theta^*$ in general.

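To illustrate how (3.14) and (3.16) would be used once $LR$ and an estimate of $B_c$ are available, the following is a minimal Python sketch. The function names and the numerical inputs in the usage lines are hypothetical and chosen only for illustration; they are not results from this thesis. The critical value is supplied by the user, and the $\chi^2$ density is computed from its closed form.

```python
import math


def chi2_pdf(x, df):
    """Density of the chi-square distribution with df degrees of freedom at x > 0."""
    half = df / 2.0
    return x ** (half - 1.0) * math.exp(-x / 2.0) / (2.0 ** half * math.gamma(half))


def bartlett_corrected_statistic(lr, T, bc_hat):
    """Return LR / (1 + Bc_hat / T), the statistic appearing in (3.16)."""
    return lr / (1.0 + bc_hat / T)


def approx_null_rejection_prob(alpha, T, bc, crit, p0):
    """Null rejection probability of the uncorrected test implied by (3.14),
    dropping the O(T^-2) remainder: alpha + crit * f_p0(crit) * Bc / T."""
    return alpha + crit * chi2_pdf(crit, p0) * bc / T


# Illustrative (made-up) inputs: p0 = 1, alpha = 0.05, T = 200.
crit = 3.841458820694124  # 0.95 quantile of the chi-square(1) distribution
lr_bc = bartlett_corrected_statistic(lr=4.0, T=200, bc_hat=2.0)
reject = lr_bc > crit
```

The corrected test compares the rescaled statistic with the same $\chi^2_{p_0}$ critical value as the uncorrected test, so the only additional input is the estimate of $B_c$.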
It is remarkable that, with a method of moments estimator $\hat{B}_c$, the technical argument of Barndorff-Nielsen and Hall (1988) applies and $\frac{LR}{1+T^{-1}\hat{B}_c}$ achieves the same level of high order precision as $\frac{LR}{1+T^{-1}B_c}$ in terms of the order of magnitude of the $\chi^2$ approximation error. From the viewpoint of high order asymptotics, randomness in $\hat{B}_c$ is accounted for in deriving (3.16), and using $\frac{LR}{1+T^{-1}\hat{B}_c}$ is as well justified as using $\frac{LR}{1+T^{-1}B_c}$. Chen and Cui (2007) argued that, since the expression for $B_c$ can be very complex, the method of moments estimator cannot precisely estimate $B_c$ in small samples and therefore the finite sample performance of $\frac{LR}{1+T^{-1}\hat{B}_c}$ can be undermined. To address this problem, they propose a bootstrap estimator for $1 + T^{-1}B_c$, denoted by $\hat\beta_c$. A theoretical compromise is that we now have $P\left[\hat\beta_c^{-1} LR \le c_\alpha\right] = 1 - \alpha + O(T^{-3/2})$. Also, Liu and Chen (2010)'s simulation results show that bootstrap estimates for $B_c$ are also unstable. In practice, bootstrap estimation of $B_c$ is much easier to implement than method of moments estimation. The bootstrap estimation procedure in Chen and Cui (2007) can be easily adapted to estimate $B_c$ for the testing problem considered in this paper. Instead of using uniform weights for bootstrap resamples, we can also follow Matsushita and Otsu (2013) and resample from the EL implied probabilities. In our simulations, because the expression for $B_c$ is much simpler in the linear case, we use the method of moments estimator.

3.3.3 Second Order Refinement via Pseudo Observation

Another closely related approach to obtaining an EL ratio test with high-order precision is to use the pseudo observation adjustment. Chen, Variyath and Abraham (2008) proposed adjusted EL to avoid the convex hull problem (see Section 8 of Kitamura (2006)) and gain computational efficiency. Let $a$ be a scalar tuning parameter that may depend on the data. We define a "pseudo observation" by
\[
g_{T+1}(\theta) \equiv -\frac{a}{T} \sum_{j=1}^{T} g(X_j, \theta)
\]
and denote $g_j(\theta) \equiv g(X_j, \theta)$ for $j = 1, \ldots, T$.
We can define the adjusted profile likelihood function
\[
\ell_a(\theta) \equiv \inf\left\{ -\sum_{j=1}^{T+1} \log(w_j) : w \in S_{T+1}, \ \sum_{j=1}^{T+1} w_j g_j(\theta) = 0 \right\}
\]
by simply adding the pseudo point $g_{T+1}(\theta)$. If $a > 0$ in the finite sample, we escape the convex hull problem when minimizing $\ell_a$. We define the adjusted EL estimator $\hat\theta_a \equiv \operatorname{argmin}_\theta \ell_a(\theta)$ and, analogously, the adjusted EL ratio statistic. Suppose now that we are interested in the simple parameter hypothesis testing problem $H_0 : \theta^* = \theta_0$ and use $B_c^0$ to denote the Bartlett correction factor derived in Chen and Cui (2007). Let $\hat{B}_c^0$ be a $\sqrt{T}$-consistent estimator of $B_c^0$. Liu and Chen (2010) show that if we set $a = \hat{B}_c^0 / 2$, the adjusted EL ratio statistic $2\{\ell_a(\theta_0) - \ell_a(\hat\theta_a)\}$ converges to $\chi^2$ in distribution with an error of order $O(T^{-2})$. In this way, we obtain another test with the same high order precision as the Bartlett corrected EL ratio test. Moreover, by using the pseudo observation adjustment we obtain, in addition to second order refinement, robustness to the convex hull problem in finite samples. This result can be easily extended to the more general case considered in this paper. We obtain the following proposition.

Proposition 3.2. Let $c_\alpha$ be the $(1-\alpha)$-th quantile of the $\chi^2_{p_0}$ distribution. Under Assumptions 7, 8, 9, 10, 11 and 12, if $a = B_c/2 + O_p(T^{-1/2})$ and we let $\ell_a$ be the corresponding adjusted profile likelihood function, then we have
\[
P[ALR \le c_\alpha] = 1 - \alpha + O(T^{-2})
\]
under $H_0 : \rho_0(\theta^*) = 0$, where $ALR \equiv 2\{\min_{\theta : \rho_0(\theta) = 0} \ell_a(\theta) - \min_\theta \ell_a(\theta)\}$.

Although tests using the adjusted EL ratio statistic with $a = \hat{B}_c/2$ have the same level of high order precision as tests using the Bartlett corrected statistic $\frac{LR}{1+T^{-1}\hat{B}_c}$, in finite samples they may behave very differently.
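To see why a positive $a$ removes the convex hull problem, note that the weights $w_j = a / (T(a+1))$ for $j \le T$ and $w_{T+1} = 1/(a+1)$ are positive, sum to one and satisfy $\sum_{j=1}^{T+1} w_j g_j(\theta) = 0$ for every $\theta$, so the constraint set in the definition of $\ell_a$ is never empty. The following minimal Python sketch checks this numerically; the function names and the toy data are hypothetical and used only for illustration.

```python
import numpy as np


def add_pseudo_observation(G, a):
    """Append the pseudo observation g_{T+1} = -(a / T) * sum_j g_j to the (T, r) array G."""
    G = np.asarray(G, dtype=float)
    pseudo = -a * G.mean(axis=0)
    return np.vstack([G, pseudo])


def feasible_weights(T, a):
    """Weights in the simplex S_{T+1} that satisfy the moment constraint whenever a > 0."""
    w = np.full(T + 1, a / (T * (a + 1.0)))
    w[-1] = 1.0 / (a + 1.0)
    return w


# Toy data: all moments are strictly positive, so zero lies outside the
# convex hull of the original points and unadjusted EL is not defined here.
G = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
G_adj = add_pseudo_observation(G, a=0.5)
w = feasible_weights(T=3, a=0.5)
constraint = w @ G_adj  # equals zero up to rounding error
```

These weights only show feasibility; they are not the minimizing weights of $\ell_a$ in general.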
In the next section, we use Monte Carlo experiments to compare the finite sample performance of tests based on adjusted EL and Bartlett corrected EL.

3.4 Monte Carlo Simulations

In this section, we give some Monte Carlo results to assess the finite sample properties of the testing methods proposed in this paper. In linear instrumental variable models, when the correlation between the instruments and the endogenous regressor is weak, the validity of the two-stage least squares t-test for the coefficient of the endogenous regressor is questionable and the size can be severely distorted (Bound, Jaeger and Baker (1995)). It is interesting to investigate whether EL ratio tests with second order refinement have better finite sample size properties in these cases (see footnote 6). Motivated by this, our experiment design follows Andrews and Marmer (2008).

Footnote 6: Stock, Wright and Yogo (2002) discuss how Edgeworth expansions and weak instrument asymptotics can serve as alternatives to conventional first order asymptotic approximations to finite sample distributions and address this issue.

We consider a linear instrumental variable regression model. In each of our simulated samples, we have repeated observations of $(y_{1i}, y_{2i})$ with sample size $T$. $(y_{1i}, y_{2i})$ are generated by the following equations
\[
y_{1i} = \alpha + y_{2i}\beta + u_i
\]
\[
y_{2i} = Z_i^\tau \pi + (1 - \rho^2)^{1/2} \varepsilon_i + \rho u_i
\]
where $Z_i = (Z_{i1}, \ldots, Z_{id})^\tau$ and $Z_{ij}, u_i, \varepsilon_i$ are i.i.d. draws from a distribution $F$ for all $j = 1, \ldots, d$ and $i = 1, \ldots, T$. $(\alpha, \beta)$ is the parameter of interest and $y_{2i}$ is the endogenous regressor. We take different values for $(\alpha, \beta)$ in different experiments. $\pi$ determines the strength of the IVs. We take $\pi$ to be
\[
\pi = \frac{\rho_{IV}}{d^{1/2} (1 - \rho_{IV}^2)^{1/2}} (1, \ldots, 1)^\tau
\]
where $\rho_{IV}$ is determined by a parameter $\lambda$ through the first equality in
\[
\lambda = \frac{T\rho_{IV}^2}{1 - \rho_{IV}^2} = T \pi^\tau E[Z_1 Z_1^\tau] \pi.
\]
From the second equality, we note that $\lambda$ is closely related to the concentration parameter and directly measures the strength of the IVs.
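The design above can be sketched in Python as follows. This is a minimal illustration and not the code used for the reported results: the function names, the `dist` labels and the seed handling are assumptions. Because $F$ is standardized to mean 0 and variance 1, $E[Z_1 Z_1^\tau] = I$, so the second equality for $\lambda$ gives $\lambda = T\pi^\tau\pi$ and each component of $\pi$ equals $(\lambda / (Td))^{1/2}$; the sketch sets $\pi$ directly from $\lambda$ in this way.

```python
import numpy as np


def draw_standardized(rng, dist, size):
    """Draw from F standardized to mean 0 and variance 1."""
    if dist == "t5":
        # A t distribution with 5 degrees of freedom has variance 5/3.
        return rng.standard_t(5, size=size) / np.sqrt(5.0 / 3.0)
    if dist == "chi2_3":
        # A chi-square(3) distribution has mean 3 and variance 6.
        return (rng.chisquare(3, size=size) - 3.0) / np.sqrt(6.0)
    raise ValueError("dist must be 't5' or 'chi2_3'")


def simulate_iv_sample(T, d, lam, rho, alpha, beta, dist="t5", seed=None):
    """Simulate one sample (y1, y2, Z) from the linear IV design and return pi as well."""
    rng = np.random.default_rng(seed)
    Z = draw_standardized(rng, dist, (T, d))
    u = draw_standardized(rng, dist, T)
    eps = draw_standardized(rng, dist, T)
    # lam = T * pi' E[Z Z'] pi = T * pi' pi with equal components of pi.
    pi = np.full(d, np.sqrt(lam / (T * d)))
    y2 = Z @ pi + np.sqrt(1.0 - rho ** 2) * eps + rho * u
    y1 = alpha + beta * y2 + u
    return y1, y2, Z, pi


y1, y2, Z, pi = simulate_iv_sample(T=200, d=6, lam=4.0, rho=0.75,
                                   alpha=0.0, beta=0.0, dist="t5", seed=0)
```

The moment function and the tests themselves are not part of this sketch; it only generates the data on which they would be computed.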
In the following experiments, we take $\lambda$ to be 4, 9 and 20, which correspond to the cases of weak IVs, moderately weak IVs and strong IVs respectively. $\rho$ measures the level of endogeneity. We follow Andrews and Marmer (2008) and set it to 0.75 in all experiments. We consider cases with a relatively large number of IVs ($d = 6$) and cases with a relatively small number of IVs ($d = 2$). In both cases, the models are overidentified. We take $F$ to have mean 0 and variance 1. We consider a standardized $t_5$ case and a standardized $\chi^2(3)$ case.

We consider two parameter testing problems. The first is to test $H_0 : \beta = 0$, taking $\alpha$ as a nuisance parameter. We have an overidentified model in the presence of nuisance parameters. Bartlett correctability of EL ratio tests for this problem has not been established in the literature. In this case, we should take $\Upsilon = (1, 0)^\tau$ in the definition of the correction factor $B_c$. First we set $\alpha = \beta = 0$ for the true parameter value and examine the size properties of our tests with high order precision, using either the Bartlett corrected EL (BEL) ratio statistic or the adjusted EL (AEL) ratio statistic. We use the method of moments estimator to estimate the correction factor $B_c$. In Table 3.1, we report rejection frequencies at the 5% nominal significance level of the GMM Wald test, EL ratio test, BEL ratio test and AEL ratio test. Here the Wald test is based on two-step efficient GMM with the identity matrix as the first-step weighting matrix.

Table 3.1: Rejection Frequencies of Tests for $H_0 : \beta = 0$ ($T = 200$)

F          d    λ     GMM     EL      BEL     AEL
t5         2    4     0.146   0.081   0.047   0.034
                9     0.109   0.069   0.049   0.041
                20    0.083   0.068   0.058   0.055
t5         6    4     0.474   0.186   0.097   0.004
                9     0.335   0.125   0.071   0.008
                20    0.213   0.110   0.070   0.022
χ2(3)      2    4     0.150   0.082   0.049   0.036
                9     0.110   0.073   0.053   0.042
                20    0.078   0.069   0.059   0.055
χ2(3)      6    4     0.464   0.182   0.093   0.003
                9     0.334   0.132   0.079   0.007
                20    0.211   0.099   0.070   0.018

Nominal Significance Level = 0.05

These results are based on 10000 Monte Carlo replications (see footnote 7).

We obtain the following findings. First, the GMM Wald test and the EL ratio test can have severe size distortion when we have a relatively large number of IVs ($d = 6$) or the IVs are weak ($\lambda = 4$). In these cases, the rejection frequencies in the fourth and fifth columns are much larger than the nominal significance level. Second, compared with EL, BEL and AEL are quite effective in controlling the size of the tests. Rejection frequencies in the sixth and seventh columns are closer to the nominal significance level for all cases. Third, compared with BEL ratio tests, AEL ratio tests can be conservative and the overall performance of BEL is better. Fourth, changing $F$ from $t_5$ to $\chi^2(3)$ leaves the essential findings unchanged.

The second testing problem we consider is to test $H_0 : \alpha = \beta$. Bartlett correctability of EL ratio tests for this problem has not been established in the literature. In this case, we should take $\Upsilon = (1, 1)^\tau$ in the definition of the correction factor $B_c$. We again take the true value to be $\alpha = \beta = 0$ and report the rejection frequencies for various cases in Table 3.2. We obtain essentially the same findings as for testing $H_0 : \beta = 0$.

Footnote 7: To compute the profile likelihood function point by point, we use Professor Yuichi Kitamura's MATLAB code (available at http://kitamura.sites.yale.edu/sites/default/files/EL codes 03 22 2011 0.zip). For minimization with respect to $(\alpha, \beta)$, we use Ziena Optimization's KNITRO libraries.

Table 3.2: Rejection Frequencies of Tests for $H_0 : \alpha = \beta$ ($T = 200$)

F          d    λ     GMM     EL      BEL     AEL
t5         2    4     0.144   0.079   0.047   0.035
                9     0.109   0.066   0.049   0.040
                20    0.075   0.064   0.055   0.052
t5         6    4     0.461   0.183   0.092   0.006
                9     0.321   0.122   0.071   0.007
                20    0.196   0.093   0.063   0.023
χ2(3)      2    4     0.156   0.082   0.053   0.039
                9     0.122   0.073   0.053   0.045
                20    0.093   0.067   0.058   0.056
χ2(3)      6    4     0.490   0.182   0.097   0.009
                9     0.365   0.127   0.079   0.015
                20    0.251   0.100   0.074   0.034

Nominal Significance Level = 0.05

It is also interesting to look at how the rejection frequencies of these different tests under the null hypothesis change as we increase the sample size. For the case $F = t_5$, $\lambda = 4$, $d = 6$, we set the true value to be $\alpha = \beta = 0$ and compute rejection frequencies of the GMM, EL, BEL and AEL tests for testing $H_0 : \beta = 0$ (nominal significance level = 0.05) with sample sizes $T = 100, 200, 300, \ldots, 1200$. A plot of these rejection frequencies is given in Figure 3.1. We have the following two essential findings. First, the convergence pattern in this figure is consistent with the theory: the rejection frequencies of both the BEL ratio test and the AEL ratio test converge to the nominal significance level 0.05 faster than those of the EL ratio test and the GMM-based Wald test. Second, for small samples, AEL ratio tests tend to be conservative and BEL ratio tests tend to overreject the null hypothesis. The rejection frequencies of both tests converge quite fast.

Figure 3.1: Rejection Frequencies for Different Sample Sizes ($F = t_5$, $\lambda = 4$, $d = 6$). Solid: BEL; Dotted: AEL; Dashed: EL; Plusses: GMM.

Figure 3.2: Size-Corrected Power of EL Ratio, BEL Ratio and AEL Ratio Tests for $H_0 : \beta = 0$ ($T = 200$). Solid: BEL; Dotted: AEL; Dashed: EL.

Next, we focus on the power properties of the BEL ratio test and the AEL ratio test. First, we consider testing $H_0 : \beta = 0$.
We fix $\alpha = 0$ and vary the true value of $\beta$: $\beta = -1.5 + 0.15k$, for $k = 0, 1, \ldots, 20$. Then we compute the size-corrected power of the EL ratio, BEL ratio and AEL ratio tests against these alternatives. We use 10000 replications with $\beta = 0$ to compute the size-correcting critical values. The sample size of each Monte Carlo sample is 200. These results are based on 2000 replications. The results are given in Figure 3.2. The horizontal axis denotes different values of $\beta$ and the vertical axis denotes size-corrected power. Because of the much larger size distortion of the GMM Wald test shown in Table 3.1, its power is not reported. We report only the cases with $F = t_5$; the essential findings are the same for the case with $F = \chi^2(3)$. First, comparing EL and BEL, we find that overall BEL has no loss in power when size-corrected power is considered. Second, for the cases with $d = 2$, the overall performance of the AEL ratio test is as good as that of the other two tests. But for the cases with $d = 6$, the AEL ratio test can have very weak power against some alternatives. This can be seen as a cost of its robustness to the convex hull problem. Further investigation into this finding is left for future research.

Then we consider testing $H_0 : \alpha = \beta$. We fix $\beta = 0$ and vary the true value of $\alpha$: $\alpha = -1.5 + 0.15k$, for $k = 0, 1, \ldots, 20$. Then we compute the size-corrected power of the EL ratio, BEL ratio and AEL ratio tests against these alternatives. The results are given in Figure 3.3. The horizontal axis denotes different values of $\alpha$ and the vertical axis denotes size-corrected power. Figure 3.3 leads to the same conclusion as Figure 3.2.

Figure 3.3: Size-Corrected Power of EL Ratio, BEL Ratio and AEL Ratio Tests for $H_0 : \alpha = \beta$ ($T = 200$). Solid: BEL; Dotted: AEL; Dashed: EL.

In summary, our Monte Carlo results show that the BEL ratio and AEL ratio tests both have better finite sample size properties than the EL ratio test.
Considering power properties, the BEL ratio test has no loss in power but the AEL ratio test can be less powerful than the EL ratio test.

3.5 Conclusion

In this paper, we show that empirical likelihood ratio tests for a large class of parameter testing problems defined by nonlinear restrictions on the true parameter value are Bartlett correctable. This further confirms empirical likelihood as an appealing method for testing problems. For many testing problems of interest to applied econometricians, including the two considered in our Monte Carlo simulations, empirical likelihood based tests can be used but Bartlett correctability of these tests has not been formally established in the literature. Based on our correctability theorem, we propose feasible tests with high-order precision for such a class of testing problems, based on empirical Bartlett correction and adjusted empirical likelihood. Results in our Monte Carlo experiments show that these tests have good finite sample performance.

3.6 Proofs

3.6.1 Basic Formulae

Algebraic Formulae

In the main text, we defined $\Omega \equiv \Psi_R \Lambda^{-1}$, $\Pi \equiv \Lambda \Psi_R^\tau \Upsilon$, $N \equiv (\Pi^\tau \Pi)^{-1} \Pi^\tau$ and $M \equiv -I + \Pi (\Pi^\tau \Pi)^{-1} \Pi^\tau$. We also have $-M = OO^\tau$ with $O^\tau O = I$. Then it can be readily obtained that
\[
\Upsilon N = \Omega (I + M); \quad \Pi^\tau M = 0; \quad MM = -M; \quad (I + M) N^\tau = N^\tau; \quad N\Pi = I; \quad MO = -O. \tag{3.17}
\]
We also have for each $(l, k) \in \{1, \ldots, r\} \times \{1, \ldots, p - p_0\}$, $\tilde\gamma^{l,k} = \Pi^{lk}$ and for each $(a, k) \in \{1, \ldots, r - p\} \times \{1, \ldots, p - p_0\}$, $\tilde\gamma^{p+a,k} = 0$. Therefore we have $\tilde\gamma^{m,k} M^{ml} = 0$ and $N^{km} \tilde\gamma^{m,l} = \delta^{kl}$, where $\delta^{kl} = 1[k = l]$. We defined $\Gamma(\eta) = S^{-1} Q(\eta)$ and $\tilde\Gamma(\xi) = \tilde{S}^{-1} \tilde{Q}(\xi)$ where
\[
S^{-1} = \begin{pmatrix} -I + S_{12} (S_{12}^\tau S_{12})^{-1} S_{12}^\tau & S_{12} (S_{12}^\tau S_{12})^{-1} \\ (S_{12}^\tau S_{12})^{-1} S_{12}^\tau & (S_{12}^\tau S_{12})^{-1} \end{pmatrix} = \begin{pmatrix} 0 & 0 & \Omega^\tau \\ 0 & -I & 0 \\ \Omega & 0 & \Omega\Omega^\tau \end{pmatrix}
\]
and
\[
\tilde{S}^{-1} = \begin{pmatrix} M & 0 & N^\tau \\ 0 & -I & 0 \\ N & 0 & NN^\tau \end{pmatrix}.
\]
Therefore we have $\tilde\beta^{j,k} = E\left[\frac{\partial}{\partial\xi^k} \tilde\Gamma^j(\xi)\big|_{\xi=(0,\zeta^*)}\right] = \delta^{jk}$ and $\beta^{j,k} = E\left[\frac{\partial}{\partial\eta^k} \Gamma^j(\eta)\big|_{\eta=(0,\theta^*)}\right] = \delta^{jk}$ and
\[
B = \begin{pmatrix} 0 \\ -A_2 \\ \Omega A_1 \end{pmatrix} \quad \text{and} \quad \tilde{B} = \begin{pmatrix} M A_1 \\ -A_2 \\ N A_1 \end{pmatrix}. \tag{3.18}
\]
Here $A = (A^1, \ldots, A^r)^\tau = (A_1^\tau, A_2^\tau)^\tau$ where $A_1 = (A^1, \ldots, A^p)^\tau$ and $A_2 = (A^{p+1}, \ldots, A^r)^\tau$.
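Then
$$B^k = 0 \text{ for } k \le p; \quad B^{p+a} = -A^{p+a} \text{ for } a \le r - p; \quad B^{r+k} = \Omega^{kl}A^l \text{ for } k \le p$$
and
$$\tilde B^k = M^{kl}A^l \text{ for } k \le p; \quad \tilde B^{p+a} = -A^{p+a} \text{ for } a \le r - p; \quad \tilde B^{r+k} = N^{kl}A^l \text{ for } k \le p - p_0.$$
It is also easy to check that, for $j \le p$,
$$\gamma^{j,k}B^{r+k} = A^j \quad \text{and} \quad -M^{jk}A^k + \tilde\gamma^{j,k}\tilde B^{r+k} = A^j,$$
and, for each $k$,
$$\tilde\gamma^{l,k}\tilde B^l = 0.$$

The following formulae, which are derived in Chen and Cui (2007), express the $B$-$\beta$ system using the $C$-$\gamma$ system and the $A$-$\alpha$ system. By definition, we have
$$[B^{s,t}] = S^{-1}\begin{pmatrix} [-A^{ij}] & [C^{i,l}] \\ [C^{i,l}]^{\tau} & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & \Omega^{\tau} \\ 0 & -I & 0 \\ \Omega & 0 & \Omega\Omega^{\tau} \end{pmatrix} \begin{pmatrix} [-A^{kl}] & [-A^{k\,p+b}] & [C^{k,l}] \\ [-A^{p+a\,l}] & [-A^{p+a\,p+b}] & [C^{p+a,l}] \\ [C^{k,l}]^{\tau} & [C^{p+a,l}]^{\tau} & 0 \end{pmatrix}. \tag{3.19}$$
Here we use $[-A^{ij}]$ to denote a matrix whose $(i, j)$ element is $-A^{ij}$, with $i, j$ having suitable ranges. In this paper, we also adopt the convention that if a superscript is repeated, a summation over that superscript is understood. We fix the range of superscripts $a, b, c, d \in \{1, 2, \ldots, r - p\}$, $f, g, h, i, j \in \{1, 2, \ldots, r\}$ and $q, s, t, u \in \{1, 2, \ldots, r + p\}$. The superscripts $k, l, m, n, o$ could range from 1 to $p$ or from 1 to $p - p_0$, depending on where these dummies are used. This will not cause any ambiguity in this paper. For example, in (3.19), $[-A^{ij}]$ is an $r \times r$ matrix and $[-A^{kl}]$ is a $p \times p$ matrix. Working out the matrix multiplication element by element in (3.19), we obtain
$$\begin{pmatrix} [B^{k,l}] & [B^{k,p+b}] & [B^{k,r+l}] \\ [B^{p+a,l}] & [B^{p+a,p+b}] & [B^{p+a,r+l}] \\ [B^{r+k,l}] & [B^{r+k,p+b}] & [B^{r+k,r+l}] \end{pmatrix} = \begin{pmatrix} [\Omega^{nk}C^{l,n}] & [\Omega^{nk}C^{p+b,n}] & 0 \\ [A^{p+a\,l}] & [A^{p+a\,p+b}] & [-C^{p+a,l}] \\ [\Omega^{km}(\Omega^{nm}C^{l,n} - A^{ml})] & [\Omega^{km}(\Omega^{nm}C^{p+b,n} - A^{m\,p+b})] & [\Omega^{kn}C^{n,l}] \end{pmatrix}.$$
This gives a link between the $B$'s and the $A$'s and $C$'s. Similarly, we could also obtain
$$\begin{aligned}
\beta^{l,\,p+a\,p+c} &= -\Omega^{vl}\left(\gamma^{p+c;p+a,v} + \gamma^{p+a;p+c,v}\right); \\
\beta^{l,\,r+m\,p+c} &= \Omega^{vl}\gamma^{p+c,vm}; \\
\beta^{p+a,\,p+b\,p+c} &= -2\alpha^{p+a\,p+b\,p+c}; \\
\beta^{l,\,p+a\,r+n} &= \Omega^{vl}\gamma^{p+a,vn}; \\
\beta^{l,\,r+m\,r+n} &= 0; \\
\beta^{r+k,\,r+m\,r+n} &= \Omega^{kv}\gamma^{v,mn}; \\
\beta^{p+a,\,p+b\,r+n} &= \gamma^{p+a,n;p+b} + \gamma^{p+a;p+b,n}; \\
\beta^{p+a,\,r+m\,r+n} &= -\gamma^{p+a,mn}; \\
\beta^{r+k,\,p+a\,p+b} &= 2\Omega^{kv}\alpha^{v\,p+a\,p+b} - \Omega^{kw}\Omega^{nw}\left(\gamma^{p+b;p+a,n} + \gamma^{p+a;p+b,n}\right); \\
\beta^{r+k,\,p+a\,r+n} &= \Omega^{kv}\Omega^{mv}\gamma^{p+a,mn} - \Omega^{ko}\left(\gamma^{p+a;o,n} + \gamma^{o;p+a,n}\right); \\
\beta^{r+k,\,r+m\,p+a} &= \Omega^{ko}\Omega^{no}\gamma^{p+a,nm} - \Omega^{ko}\left(\gamma^{p+a;o,m} + \gamma^{o;p+a,m}\right).
\end{aligned}$$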
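Details on how these formulae are derived can be found in the appendix of Chen and Cui (2005); see page 15 of Chen and Cui (2005). Following the same approach for the constrained EL, we can express the $\tilde B$-$\tilde\beta$ system using the $A$-$\alpha$ system and the $\tilde C$-$\tilde\gamma$ system. By definition, we have
$$[\tilde B^{s,t}] = \tilde S^{-1}\begin{pmatrix} [-A^{ij}] & [\tilde C^{i,l}] \\ [\tilde C^{i,l}]^{\tau} & 0 \end{pmatrix} = \begin{pmatrix} M & 0 & N^{\tau} \\ 0 & -I & 0 \\ N & 0 & NN^{\tau} \end{pmatrix} \begin{pmatrix} [-A^{kl}] & [-A^{k\,p+b}] & [\tilde C^{k,l}] \\ [-A^{p+a\,l}] & [-A^{p+a\,p+b}] & [\tilde C^{p+a,l}] \\ [\tilde C^{k,l}]^{\tau} & [\tilde C^{p+a,l}]^{\tau} & 0 \end{pmatrix}.$$
We notice that $[\tilde B^{s,t}]$ is $(r + (p - p_0)) \times (r + (p - p_0))$ dimensional. Working out the matrix multiplication, we obtain
$$\begin{pmatrix} [\tilde B^{k,l}] & [\tilde B^{k,p+b}] & [\tilde B^{k,r+l}] \\ [\tilde B^{p+a,l}] & [\tilde B^{p+a,p+b}] & [\tilde B^{p+a,r+l}] \\ [\tilde B^{r+k,l}] & [\tilde B^{r+k,p+b}] & [\tilde B^{r+k,r+l}] \end{pmatrix} = \begin{pmatrix} [-M^{kn}A^{nl} + N^{nk}\tilde C^{l,n}] & [-M^{kn}A^{n\,p+b} + N^{nk}\tilde C^{p+b,n}] & [M^{kn}\tilde C^{n,l}] \\ [A^{p+a\,l}] & [A^{p+a\,p+b}] & [-\tilde C^{p+a,l}] \\ [N^{km}(N^{nm}\tilde C^{l,n} - A^{ml})] & [N^{km}(N^{nm}\tilde C^{p+b,n} - A^{m\,p+b})] & [N^{kn}\tilde C^{n,l}] \end{pmatrix}. \tag{3.20}$$
We notice that in (3.20), for example, the dummy variable $n$ in $\tilde C^{n,l}$ ranges from 1 to $p$ and the dummy variable $l$ ranges from 1 to $p - p_0$. The dummy variable $n$ in $\tilde C^{p+b,n}$ ranges from 1 to $p - p_0$. The suitable ranges of these dummy variables are clear if we refer to the definitions of the $\tilde C$'s and $\tilde B$'s (see (3.6) and (3.7)). The dimension of $\zeta$ is $p - p_0$. Similarly, we could also easily obtain the formulae that link the $\tilde\beta$'s to the $\alpha$'s and $\tilde\gamma$'s. We take the same approach as page 15 of Chen and Cui (2005). The equalities on page 15 of Chen and Cui (2005) still hold if we replace $Q_1$ and $Q_2$ by the constrained counterpart (3.5). For fixed $h \in \{1, \ldots, r\}$, by definition of the $\tilde\beta$'s,
$$[\tilde\beta^{s,t\,h}] = \begin{pmatrix} M & 0 & N^{\tau} \\ 0 & -I & 0 \\ N & 0 & NN^{\tau} \end{pmatrix} \times \begin{pmatrix} [2\alpha^{klh}] & [2\alpha^{k\,p+b\,h}] & [-(\tilde\gamma^{h;k,m} + \tilde\gamma^{k;h,m})] \\ [2\alpha^{p+a\,l\,h}] & [2\alpha^{p+a\,p+b\,h}] & [-(\tilde\gamma^{h;p+a,m} + \tilde\gamma^{p+a;h,m})] \\ [-(\tilde\gamma^{h;k,m} + \tilde\gamma^{k;h,m})]^{\tau} & [-(\tilde\gamma^{h;p+a,m} + \tilde\gamma^{p+a;h,m})]^{\tau} & [\tilde\gamma^{h,kl}] \end{pmatrix} \tag{3.21}$$
and working out the matrix multiplication we have
$$\begin{aligned}
\tilde\beta^{m,\,n\,h} &= 2M^{mv}\alpha^{vnh} - \left(\tilde\gamma^{h;n,k} + \tilde\gamma^{n;h,k}\right)N^{km}; \\
\tilde\beta^{m,\,p+a\,h} &= 2M^{mv}\alpha^{v\,p+a\,h} - \left(\tilde\gamma^{h;p+a,k} + \tilde\gamma^{p+a;h,k}\right)N^{km}; \\
\tilde\beta^{m,\,r+l\,h} &= -M^{mv}\left(\tilde\gamma^{h;v,l} + \tilde\gamma^{v;h,l}\right) + N^{km}\tilde\gamma^{h,kl}; \\
\tilde\beta^{p+a,\,n\,h} &= -2\alpha^{p+a\,n\,h}; \\
\tilde\beta^{p+a,\,p+b\,h} &= -2\alpha^{p+a\,p+b\,h}; \\
\tilde\beta^{p+a,\,r+l\,h} &= \tilde\gamma^{h;p+a,l} + \tilde\gamma^{p+a;h,l}; \\
\tilde\beta^{r+k,\,n\,h} &= 2N^{kv}\alpha^{vnh} - N^{kw}N^{mw}\left(\tilde\gamma^{h;n,m} + \tilde\gamma^{n;h,m}\right); \\
\tilde\beta^{r+k,\,p+a\,h} &= 2N^{kv}\alpha^{v\,p+a\,h} - N^{kw}N^{mw}\left(\tilde\gamma^{h;p+a,m} + \tilde\gamma^{p+a;h,m}\right); \\
\tilde\beta^{r+k,\,r+l\,h} &= -N^{kw}\left(\tilde\gamma^{h;w,l} + \tilde\gamma^{w;h,l}\right) + N^{kw}N^{mw}\tilde\gamma^{h,ml}.
\end{aligned}$$
For fixed $n \in \{1, \ldots, p - p_0\}$, by definition of the $\tilde\beta$'s,
$$[\tilde\beta^{s,t\,r+n}] = \begin{pmatrix} M & 0 & N^{\tau} \\ 0 & -I & 0 \\ N & 0 & NN^{\tau} \end{pmatrix} \times \begin{pmatrix} [-(\tilde\gamma^{m;l,n} + \tilde\gamma^{l;m,n})] & [-(\tilde\gamma^{p+a;l,n} + \tilde\gamma^{l;p+a,n})] & [\tilde\gamma^{l,mn}] \\ [-(\tilde\gamma^{m;p+a,n} + \tilde\gamma^{p+a;m,n})] & [-(\tilde\gamma^{p+b;p+a,n} + \tilde\gamma^{p+a;p+b,n})] & [\tilde\gamma^{p+a,mn}] \\ [\tilde\gamma^{l,mn}]^{\tau} & [\tilde\gamma^{p+a,mn}]^{\tau} & 0 \end{pmatrix} \tag{3.22}$$
and working out the matrix multiplication we have
$$\begin{aligned}
\tilde\beta^{m,\,n\,r+l} &= -M^{mk}\left(\tilde\gamma^{n;k,l} + \tilde\gamma^{k;n,l}\right) + N^{km}\tilde\gamma^{n,kl}; \\
\tilde\beta^{m,\,p+a\,r+l} &= -M^{mk}\left(\tilde\gamma^{p+a;k,l} + \tilde\gamma^{k;p+a,l}\right) + N^{km}\tilde\gamma^{p+a,kl}; \\
\tilde\beta^{m,\,r+l\,r+n} &= M^{mk}\tilde\gamma^{k,ln}; \\
\tilde\beta^{p+a,\,m\,r+n} &= \tilde\gamma^{m;p+a,n} + \tilde\gamma^{p+a;m,n}; \\
\tilde\beta^{p+a,\,p+b\,r+n} &= \tilde\gamma^{p+b;p+a,n} + \tilde\gamma^{p+a;p+b,n}; \\
\tilde\beta^{p+a,\,r+m\,r+n} &= -\tilde\gamma^{p+a,mn}; \\
\tilde\beta^{r+k,\,m\,r+n} &= -N^{kw}\left(\tilde\gamma^{m;w,n} + \tilde\gamma^{w;m,n}\right) + N^{kw}N^{lw}\tilde\gamma^{m,ln}; \\
\tilde\beta^{r+k,\,p+a\,r+n} &= -N^{kw}\left(\tilde\gamma^{p+a;w,n} + \tilde\gamma^{w;p+a,n}\right) + N^{kw}N^{lw}\tilde\gamma^{p+a,ln}; \\
\tilde\beta^{r+k,\,r+m\,r+n} &= N^{kw}\tilde\gamma^{w,mn}.
\end{aligned}$$
We also need formulae that link the $C$-$\gamma$ system and the $\tilde C$-$\tilde\gamma$ system. We can obtain the following formulae by applying the chain rule:
$$\tilde\gamma^{l,k} = \gamma^{l,m}\Upsilon^{mk}; \quad \tilde C^{j,k} = C^{j,m}\Upsilon^{mk}; \quad \tilde\gamma^{i;j,k} = \gamma^{i;j,m}\Upsilon^{mk}; \quad \tilde\gamma^{m,kl} = \gamma^{m,on}\Upsilon^{ok}\Upsilon^{nl}. \tag{3.23}$$

Formulae for Moments

Let $(X_1, \ldots, X_T)$ be $T$ i.i.d. mean-zero $\mathbb{R}^d$-valued random vectors. Let $\bar X \equiv \frac{1}{T}\sum_{i=1}^{T} X_i$ be the sample average. We will make intensive use of the following formulae for moments from DiCiccio, Hall and Romano (1988). We use superscripts to denote coordinates of vectors; for example, $X^j$ is the $j$-th coordinate of the random vector $X$. We use a special notation here.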
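$[k, l, m]$ indicates that there are three terms, obtained by putting each of $k, l, m$ in the position of $k$. Similarly, $[j, k, l, m, n]$ means there are $\binom{5}{2}$ terms, obtained by choosing two out of $j, k, l, m, n$ and putting them in the positions of $j, k$.

1. $E[\bar X^j \bar X^k] = T^{-1}E[X^j X^k]$;
2. $E[\bar X^j \bar X^k \bar X^l] = T^{-2}E[X^j X^k X^l]$;
3. $E[\bar X^j \bar X^k \bar X^l \bar X^m] = T^{-3}(T - 1)\left\{E[X^j X^k]E[X^l X^m][k, l, m]\right\} + T^{-3}E[X^j X^k X^l X^m]$;
4. $E[\bar X^j \bar X^k \bar X^l \bar X^m \bar X^n] = T^{-3}\left\{E[X^j X^k]E[X^l X^m X^n][j, k, l, m, n]\right\} + O(T^{-4})$;
5. $E[\bar X^j \bar X^k \bar X^l \bar X^m \bar X^n \bar X^o] = T^{-3}\left\{E[X^j X^k]\left(E[X^l X^m]E[X^n X^o][m, n, o]\right)[k, l, m, n, o]\right\} + O(T^{-4})$;

We can also easily derive the following three formulae as a corollary to these five formulae.

6. $E[\bar X^j \bar X^k \bar X^l \bar X^m \bar X^n] - E[\bar X^j \bar X^k \bar X^l]E[\bar X^m \bar X^n][l, m, n] = T^{-3}\left\{E[X^l X^m X^n]E[X^j X^k] + E[X^k X^l X^n]E[X^j X^m][m, l, n] + E[X^j X^l X^n]E[X^k X^m][m, l, n]\right\} + O(T^{-4})$;
7. $E[\bar X^j \bar X^k \bar X^l \bar X^m \bar X^n \bar X^o] - E[\bar X^j \bar X^k \bar X^l \bar X^m]E[\bar X^n \bar X^o] = T^{-3}\big\{E[X^j X^k]\left(E[X^l X^n]E[X^m X^o][2; o, n]\right) + E[X^j X^l]\left(E[X^k X^n]E[X^m X^o][2; o, n]\right) + E[X^j X^m]\left(E[X^k X^n]E[X^l X^o][2; o, n]\right) + E[X^k X^l]\left(E[X^j X^n]E[X^m X^o][2; o, n]\right) + E[X^k X^m]\left(E[X^j X^n]E[X^l X^o][2; o, n]\right) + E[X^l X^m]\left(E[X^j X^n]E[X^k X^o][2; o, n]\right)\big\} + O(T^{-4})$;
8. $E[\bar X^j \bar X^k \bar X^l \bar X^m \bar X^n \bar X^o] - E[\bar X^j \bar X^k \bar X^l \bar X^m]E[\bar X^n \bar X^o][m, n, o] = T^{-3}\big\{E[X^j X^m]\left(E[X^k X^n]E[X^l X^o][2; o, n]\right) + E[X^j X^n]\left(E[X^k X^m]E[X^l X^o][2; o, m]\right) + E[X^j X^o]\left(E[X^k X^m]E[X^l X^n][2; n, m]\right)\big\} + O(T^{-4})$.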
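The first three moment formulae hold exactly for mean-zero draws, so they can be checked by brute-force enumeration. The sketch below does this under stated assumptions: the three-point distribution on $\mathbb{R}^2$ and the sample size `T = 3` are hypothetical illustrative choices, and the helper names `pop_moment`, `mean_moment` and `formula_3` are not from the text. It compares the exact moments of $\bar X$ with the right-hand sides of formulae 1 to 3, expanding the bracket $[k, l, m]$ into its three pairings.

```python
import itertools
import numpy as np

# Hypothetical mean-zero distribution on R^2: three equally likely points.
support = np.array([[1.0, 2.0], [-2.0, 1.0], [1.0, -3.0]])
probs = np.full(3, 1.0 / 3.0)

def pop_moment(idx):
    """E[X^{i1} ... X^{in}] for a single draw X; idx lists the coordinates."""
    return float(np.sum(probs * np.prod(support[:, list(idx)], axis=1)))

def mean_moment(idx, T):
    """E[Xbar^{i1} ... Xbar^{in}], exact, by enumerating all T-tuples of draws."""
    total = 0.0
    for draw in itertools.product(range(len(support)), repeat=T):
        rows = list(draw)
        weight = np.prod(probs[rows])          # probability of this T-tuple
        xbar = support[rows].mean(axis=0)      # sample average for this T-tuple
        total += weight * np.prod(xbar[list(idx)])
    return float(total)

def formula_3(j, k, l, m, T):
    """Right-hand side of formula 3; [k, l, m] expands to three pairings."""
    pairings = (pop_moment((j, k)) * pop_moment((l, m))
                + pop_moment((j, l)) * pop_moment((k, m))
                + pop_moment((j, m)) * pop_moment((k, l)))
    return (T - 1) / T**3 * pairings + pop_moment((j, k, l, m)) / T**3

T = 3
max_gap = max(
    abs(mean_moment(idx, T) - formula_3(*idx, T))
    for idx in itertools.product(range(2), repeat=4)
)
print("largest discrepancy in formula 3:", max_gap)
```

Formulae 4 to 8 hold only up to $O(T^{-4})$, so a check of those would compare the discrepancy across several values of `T` rather than test exact equality.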
Proofs3.6.2 Derivation of (3.11)It is derived in Chen and Cui (2007) that the full stochastic expansions for `(θˆ) is given by`(θˆ)= −2BjAj −BjBj + 2Bj,qBq(Aj +Bj)− βj,uqBuBq(Aj +Bj)−2Bj,uBu,qBq(Aj +Bj)+ βu,qsBj,uBqBs(Aj +Bj)− βj,uqβu,stBqBsBt(Aj +Bj)−B˜j,uqB˜uB˜q(Aj + B˜j)+ 13 β˜j,uqsB˜uB˜qB˜s(Aj + B˜j)+ 2β˜j,uqB˜u,sB˜sB˜q(Aj + B˜j)+2γj,k{−Bj,qBqBr+k +[12βj,uqBuBqBr+k +Bj,uBu,qBqBr+k − 12βu,qsBj,uBqBsBr+k− βj,uqBu,sBqBsBr+k + 12βj,uqβu,stBqBsBtBr+k + 12Bj,uqBuBr+k,qBq−16βj,uqsBuBqBsBr+k − 12βj,uqBuBqBr+k,sBs][2; j, r + k]+Bj,uBuBr+k,qBq + 14βj,uqβr+k,stBuBqBsBt}+2Cj,k{BjBr+k −Bj,qBqBr+k[2; j, r + k] + 12βj,uqBuBqBr+k[2; j, r + k]}+γj,kl{−BjBr+kBr+l +Br+kBr+lBj,qBq[3]− 12Br+lBr+kβj,uqBuBq[3]}−Cj,klBjBr+kBr+l + 13γj,klmBjBr+kBr+lBr+m −Bj,uBuBj,qBq − 14βj,uqβj,stBuBqBsBt+βj,uqBuBqBj,sBs −BjBiAji +BjBi,qBqAji[2; j, i]− 12βj,uqBuBqBiAji[2; j, i]+2γj;i,l{BjBiBr+l −BjBiBr+l,qBq + 12βr+l,uqBjBiBuBq −Br+lBiBj,qBq[2; j, i]+12βj,uqBuBqBiBr+l[2; j, i]}+2BjBiBr+lCj;i,l −(γj;i,lk + γj,l;i,k)BjBiBr+lBr+k − 23αjihBjBiBh + 2αjihBjBiBh,qBq−αjihβj,uqBuBqBiBh − 23AjihBjBiBh + 2γj;i;h,kBjBiBhBr+k − 12αjihgBjBiBhBg +Op(T−5/2)We use that notations that [2; j, r + k] indicates that there are two terms by exchanging the superscriptsj and r + k. The same is understood for [3; j, r + k, r + l]: there are three terms by putting each ofj, r + k, r + l in the position of j. We used [3] as a short notation for [3; j, r + k, r + l]. Details on howthis expansion is derived can be found in Chen and Cui (2005) Section A2. Chen and Cui (2005) SectionA2 also uses the basic algebraic formulae presented in the appendix to simplify this stochastic expansionand obtain the form of (3.4). We take exactly the same approach as Chen and Cui (2005) Section A2and substitute (3.9) and (3.10) into (3.8). We simply replace the B − β system by the B˜ − β˜ system andget an expansion of `(θ˜)in terms of B˜ − β˜ system.`(θ˜)= −2B˜jAj − B˜jB˜j + 2B˜j,qB˜q(Aj + B˜j)− β˜j,uqB˜uB˜q(Aj + B˜j)933.6. 
Proofs−2B˜j,uB˜u,qB˜q(Aj + B˜j)+ β˜u,qsB˜j,uB˜qB˜s(Aj + B˜j)− β˜j,uqβ˜u,stB˜qB˜sB˜t(Aj + B˜j)−B˜j,uqB˜uB˜q(Aj + B˜j)+ 13 β˜j,uqsB˜uB˜qB˜s(Aj + B˜j)+ 2β˜j,uqB˜u,sB˜sB˜q(Aj + B˜j)+2γ˜j,k{−B˜j,qB˜qB˜r+k +[12 β˜j,uqB˜uB˜qB˜r+k + B˜j,uB˜u,qB˜qB˜r+k − 12 β˜u,qsB˜j,uB˜qB˜sB˜r+k− β˜j,uqBu,sBqBsBr+k + 12 β˜j,uqβ˜u,stB˜qB˜sB˜tB˜r+k+12B˜j,uqB˜uB˜r+k,qB˜q − 16 β˜j,uqsB˜uB˜qB˜sB˜r+k − 12 β˜j,uqB˜uB˜qB˜r+k,sB˜s][2; j, r + k]+B˜j,uB˜uB˜r+k,qB˜q + 14 β˜j,uqβ˜r+k,stB˜uB˜qB˜sB˜t}+2C˜j,k{B˜jB˜r+k − B˜j,qB˜qB˜r+k[2; j, r + k] + 12 β˜j,uqB˜uB˜qB˜r+k[2; j, r + k]}+γ˜j,kl{−B˜jB˜r+kB˜r+l + B˜r+kB˜r+lB˜j,qB˜q[3]− 12B˜r+lB˜r+kβ˜j,uqB˜uB˜q[3]}−C˜j,klB˜jB˜r+kB˜r+l + 13 γ˜j,klmB˜jB˜r+kB˜r+lB˜r+m − B˜j,uB˜uB˜j,qB˜q − 14 β˜j,uqβ˜j,stB˜uB˜qB˜sB˜t+β˜j,uqB˜uB˜qB˜j,sB˜s − B˜jB˜iAji + B˜jB˜i,qB˜qAji[2; j, i]− 12 β˜j,uqB˜uB˜qB˜iAji[2; j, i]+2γ˜j;i,l{B˜jB˜iB˜r+l − B˜jB˜iB˜r+l,qB˜q + 12 β˜r+l,uqB˜jB˜iB˜uB˜q − B˜r+lB˜iB˜j,qB˜q[2; j, i]+12 β˜j,uqB˜uB˜qB˜iB˜r+l[2; j, i]}+2B˜jB˜iB˜r+lC˜j;i,l −(γ˜j;i,lk + γ˜j,l;i,k)B˜jB˜iB˜r+lB˜r+k − 23αjihB˜jB˜iB˜h + 2αjihB˜jB˜iB˜h,qB˜q−αjihβ˜j,uqB˜uB˜qB˜iB˜h − 23AjihB˜jB˜iB˜h + 2γ˜j;i;h,kB˜jB˜iB˜hB˜r+k − 12αjihgB˜jB˜iB˜hB˜g +Op(T−5/2) .And using the basic formulae in Section 3.6.1, we can see that some of these terms cancel each otherand thus this expansion can be greatly simplified. We use the following observations. First, since forj 6 p, −MjkAk + γ˜j,kB˜r+k = Aj and Aj + Bj =(Aj + MjkAk)1 [j 6 p], it is readily obtained that thesum of the 3rd to the 18th terms in the expansion above is equal to zero. The sum of the 19th, 24th,32nd and 35th terms can be also found to be equal to zero. 
We first expand these terms to a form wherethe basic algebraic formulae that link the B˜ − β˜ system and the C˜ − γ˜ can be applied.β˜j,uq{C˜j,kB˜r+k − B˜r+k,sB˜sγ˜j,k + B˜j,sB˜s −AjiB˜i}B˜uB˜q={β˜j,uqC˜j,kB˜r+k − B˜l,uqγ˜l,k(B˜r+k,mB˜m + B˜r+k,p+aB˜p+a + B˜r+k,r+mB˜r+m)− β˜l,uq(AlmB˜m +Al p+aB˜p+a)− β˜p+b,uq(Ap+bmB˜m +Ap+a p+aB˜p+a)+ β˜l,uq(B˜l,mB˜m + B˜l,p+aB˜p+a + B˜l,r+nB˜r+n)+β˜p+b,uq(B˜p+b,mB˜m + B˜p+b,p+aB˜p+a + B˜p+b,r+nB˜r+n)}B˜uB˜q.943.6. ProofsBy using the basic formulae and the definition γ˜l,k = Πlk, we find that γ˜l,kB˜r+k,p+a = NnlC˜p+a,n −(I + M)lmAmp+a. Since the formula (3.20) gives that B˜l,p+a = −MlmAmp+a + NnlC˜p+a,n, we obtain theequality that γ˜l,kB˜r+k,p+a = B˜l,p+a −Al p+a, for any fixed l and a. Similarly we can obtain γ˜l,kB˜r+k,m =B˜l,m −Alm. Therefore, we have the following equalities:β˜j,uq{C˜j,kB˜r+k − B˜r+k,sB˜sγ˜j,k + B˜j,sB˜s −AjiB˜i}B˜uB˜q={β˜j,uqC˜j,kB˜r+k − β˜l,uq(B˜l,m −Alm)B˜m − β˜l,uq(B˜l,p+a −Al p+a)B˜p+a+ β˜l,uqγ˜l,kB˜r+k,r+nB˜r+n − β˜l,uq(Alm − B˜l,m)B˜m − β˜l,uq(Al p+a − B˜l,p+a)B˜p+a− β˜p+b,uq(Ap+bm − B˜p+b,m)B˜m − β˜p+b,uq(Ap+b p+a − B˜p+b,p+a)B˜p+a+β˜l,uqB˜l,r+nB˜r+n + β˜p+b,uqB˜p+b,r+nB˜r+n}B˜uB˜q={β˜j,uqC˜j,kB˜r+k − β˜l,uq (I + M)lk C˜k,nB˜r+n + β˜l,uqMlmC˜m,nB˜r+n−β˜p+b,uqC˜p+b,nB˜r+n}B˜uB˜q= 0. (3.24)Similarly, using the basic formulae, we can also obtain the following equalities:C˜j,kβ˜r+k,uqB˜uB˜qB˜j − γ˜j,kβ˜r+k,uqB˜uB˜qB˜j,sB˜s=(C˜j,kB˜j − γ˜j,kB˜j,sB˜s)β˜r+k,uqB˜uB˜q=(C˜j,kB˜j − γ˜l,kB˜l,mB˜m − γ˜l,kB˜l,p+aB˜p+a − γ˜l,kB˜l,r+mB˜r+m)β˜r+k,uqB˜uB˜q=(C˜j,kB˜j − γ˜l,kNnlC˜m,nB˜m − γ˜l,kNnlC˜p+a,nB˜p+a)β˜r+k,uqB˜uB˜q=(C˜j,kB˜j − δnkC˜m,nB˜m − δnkC˜p+a,nB˜p+a)β˜r+k,uqB˜uB˜q= 0. (3.25)(3.24) and (3.25) imply that he sum of the 19th, 24th, 32nd and 35th terms is equal to zero. The sum of953.6. 
Proofsthe 20th and the 23rd terms can be simplified to −2C˜j,kB˜r+kB˜j,qB˜q since we can obtain− 2C˜j,kB˜jB˜r+k,qB˜q + 2γ˜j,kB˜j,uB˜uB˜r+k,qB˜q=(−2C˜j,kB˜j + 2γ˜j,kB˜j,uB˜u)B˜r+k,qB˜q=0.We also notice that the sum of the 21st, 27th and 38th terms can be simplified to−12 γ˜j,klB˜r+lB˜r+kβ˜j,uqB˜uB˜q.We notice that for the 21st term,12β˜j,uqγ˜j,mβ˜r+m,stB˜uB˜qB˜sB˜t={12β˜k,noγ˜k,mB˜nB˜o + β˜k,n p+aγ˜k,mB˜nB˜p+a + β˜k,n r+oγ˜k,mB˜nB˜r+o+12β˜k,p+a p+bγ˜k,mB˜p+aB˜p+b +12β˜k,r+n r+oγ˜k,mB˜r+nB˜r+o + β˜k,p+a r+oγ˜k,mB˜p+aB˜r+o}β˜r+m,stB˜sB˜t(3.26)We can easily obtain the following expansions for the 27th and 38th terms:− γ˜j,klβ˜r+l,uqB˜jB˜r+kB˜uB˜q=− γ˜m,klβ˜r+l,uqB˜mB˜r+kB˜uB˜q − γ˜p+a,klβ˜r+l,uqB˜p+aB˜r+kB˜uB˜q,andγ˜j;i,lβ˜r+l,uqB˜jB˜iB˜uB˜q=(γ˜m;n,lB˜mB˜n + γ˜p+a;n,lB˜p+aB˜n + γ˜m;p+a,lB˜mB˜p+a + γ˜p+a;p+b,lB˜p+aB˜p+b)β˜r+l,uqB˜uB˜q.The third term in (3.26) can be transformed using the basic formulae:β˜k,n r+oγ˜k,mB˜nB˜r+oβ˜r+m,stB˜sB˜t=(Nvkγ˜n,voγ˜k,m)B˜nB˜r+oβ˜r+m,stB˜sB˜t=γ˜n,moB˜nB˜r+oβ˜r+m,stB˜sB˜t.963.6. 
ProofsThe sixth term of (3.26) can be transformed:β˜k,p+a r+oγ˜k,mB˜p+aB˜r+oβ˜r+m,stB˜sB˜t=(−Mkv(γ˜p+a;v,o + γ˜v;p+a,o)+ Nlkγ˜p+a,lo)γ˜k,mB˜p+aB˜r+oβ˜r+m,stB˜sB˜t=γ˜p+a,moB˜p+aB˜r+oβ˜r+m,stB˜sB˜t.The first term of (3.26) can be transformed:12β˜k,noγ˜k,mB˜nB˜oβ˜r+m,stB˜sB˜t=12(γ˜o;n,v + γ˜n;o,v) Nvkγ˜k,mB˜nB˜oβ˜r+m,stB˜sB˜t=12(γ˜o;n,m + γ˜n;o,m) B˜nB˜oβ˜r+m,stB˜sB˜t.The second term of (3.26) can be transformed:β˜k,n p+aγ˜k,mB˜nB˜p+aβ˜r+m,stB˜sB˜t=(2Mkvαvn p+a −(γ˜p+a;n,l + γ˜n;p+a,l)Nlk)γ˜k,mB˜nB˜p+aβ˜r+m,stB˜sB˜t=−(γ˜p+a;n,m + γ˜n;p+a,m)B˜nB˜p+aβ˜r+m,stB˜sB˜t.Therefore we find that the sum of 21st, 27th and 38th terms isγ˜j,kl(−β˜j,uqB˜r+lB˜r+kB˜uB˜q[3])+ γ˜j;i,lβ˜r+l,uqB˜jB˜iB˜uB˜q +12β˜j,uqγ˜j,mβ˜r+m,stB˜uB˜qB˜sB˜t=γ˜p+a;p+b,lB˜p+aB˜p+bβ˜r+l,uqB˜uB˜q +12β˜k,p+a p+bγ˜k,mB˜p+aB˜p+bβ˜r+m,stB˜sB˜t+12β˜k,r+n r+oγ˜k,mB˜r+nB˜r+oβ˜r+m,stB˜sB˜t−12γ˜j,klB˜r+lB˜r+kβ˜j,uqB˜uB˜q.The sum of the first two lines is equal to zero since12β˜k,p+a p+bγ˜k,m = −12(γ˜p+b;p+a,m + γ˜p+a;p+b,m)and β˜k,r+n r+oγ˜k,m = 0 by using the basic algebraic formulae.973.6. ProofsApplying these algebraic equalities, we get (3.11), which is a simplified form of the high order expansionof `(θ˜). Similarly, as in Chen and Cui (2005), we can obtain (3.4), which is a simplified form of the highorder expansion of `(θˆ).3.6.3 Derivation of R1 and R2In this section, we look for expressions for R1 and R2, where R1, R2, R3 are signed root decomposi-tion which are p0 - dimensional random vectors such that LR = T · (R1 +R2 +R3)τ (R1 +R2 +R3) +Op(T−3/2) and R1 = Op(T−1/2), R2 = Op(T−1) and R3 = Op(T−3/2). LR ≡ 2(`(θ˜)− `(θˆ)). 
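As a reading aid, the quadratic form in the signed root decomposition can be expanded and grouped by stochastic order. This is a sketch that uses only the stated orders $R_1 = O_p(T^{-1/2})$, $R_2 = O_p(T^{-1})$ and $R_3 = O_p(T^{-3/2})$; the grouping itself is not taken from the text.

```latex
\begin{aligned}
T^{-1} LR
  &= (R_1 + R_2 + R_3)^{\tau}(R_1 + R_2 + R_3) + O_p(T^{-5/2}) \\
  &= \underbrace{R_1^{\tau} R_1}_{O_p(T^{-1})}
   + \underbrace{2 R_1^{\tau} R_2}_{O_p(T^{-3/2})}
   + \underbrace{R_2^{\tau} R_2 + 2 R_1^{\tau} R_3}_{O_p(T^{-2})}
   + O_p(T^{-5/2}).
\end{aligned}
```

Read this way, $R_1$ is identified from the $O_p(T^{-1})$ terms of $T^{-1}LR$, $R_2$ from the $O_p(T^{-3/2})$ terms through $2R_1^{\tau}R_2$, and $R_3$ from the $O_p(T^{-2})$ terms once $R_2^{\tau}R_2$ has been subtracted.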
From(3.11) and (3.4), we obtainT−1LR =(−B˜jAj − B˜jB˜j −AjiB˜jB˜i + 2γ˜j;i,lB˜iB˜jB˜r+l + 2C˜j,kB˜jB˜r+k−23αjihB˜jB˜iB˜h − γ˜j,klB˜jB˜r+kB˜r+l)−(−BjAj −BjBj −AjiBjBi + 2γj;i,lBiBjBr+l + 2Cj,kBjBr+k−23αjihBjBiBh − γj,klBjBr+kBr+l)+Op(T−2).We observe that −BjAj −BjBj = Ap+aAp+a and−2B˜jAj − B˜jB˜j = −2(MlkAkAl −Ap+aAp+a)−Mkk′Ak′Mll′Al′−Ap+aAp+a.Apply this to the Op(T−1) terms in and also apply the basic formulae to Op(T−3/2) terms. We obtainT−1LR =−MklAkAl −MmnMlkAlmAnAk + 2MlkAl p+aAkAp+a + 2γ˜m;k,lMmnMkvNlwAnAvAw− 2γm;p+a,lMmnNlkAnAkAp+a + 2MlmNknC˜ l,kAmAn − 2γ˜p+b;k,lMkmNlnAmAnAp+a+(2γ˜p+a;p+b,lNlkAkAp+aAp+b − 2γp+a;p+b,lΩlkAkAp+aAp+b)−(2NklC˜p+a,kAp+aAl − 2ΩklCp+a,kAp+aAl)+(γ˜p+a,klNkmNlnAmAnAp+a − γp+a,klΩkmΩlnAmAnAp+a)− γ˜m,klMmnNkvNlwAnAvAw −23αlmnMlkMmvMnwAkAvAw+ 2αlm p+aMlkMmnAkAnAp+a − 2αl p+a p+bMlkAkAp+aAp+b +Op(T−2) (3.27)983.6. ProofsFrom the first term in (3.27), we obtain that Rx1 = OmxAm. To compute the 3rd to 5th lines, applyingthe basic formulae, we getC˜p+a,kAp+aNkk′Ak′= Cp+a,kΩklAp+aAl + Cp+a,kΩkmMmlAp+aAl,γ˜p+a;p+b,lNll′Al′Ap+aAp+b = γp+a;p+b,mΩmnAnAp+aAp+b+ γp+a;p+b,mΩmlMlnAnAp+aAp+bandγ˜p+a,klAp+aNknAnNlvAv = γp+a,onΩokΩnlAkAlAp+a + 2γp+a,onΩokΩnlMkmAmAlAp+a+ γp+a,onΩokΩnlMkmMlvAmAvAp+a.Then it follows thatT−1LR =−MklAkAl −MmkMlnAlmAkAn + 2MlkAl p+aAp+aAk+ 2γm;k,oΩol (I + M)ln MmvMkwAvAnAw − 2γm;p+a,vΩvw (I + M)wn MmwAnAwAp+a+ 2γp+a;p+b,mΩmlMlkAkAp+aAp+b − 2ΩkmMmlCp+a,kAlAp+a+ 2γp+a,onΩokΩnlMlvAkAvAp+a + γp+a,onΩokΩnlMkmMlwAmAwAp+a+ 2Ωnk (I + M)km MlvC l,nAmAv − 2γp+a;k,mΩmw (I + M)wv MkoAvAoAp+a− γm,olΩon (I + M)nv Ωlk (I + M)kw Mmk′AvAwAk′−23αlmnMlvMmwMnkAvAwAk − 2αl p+a p+bMlkAkAp+aAp+b +Op(T−2) (3.28)993.6. ProofsThen we obtain the expression for R2 by matching the Op(T−3/2) terms in (3.28) with 2Rx2Rx1 : for eachx ∈ {1, . . . 
, p0},Rx2 =12MmkOnxAmnAk −OnxAn p+aAp+a +{−γm;v,wΩwn (I + M)nk MmlOvx+13αvmnMvlMmkOnx +12γm,wvΩwn (I + M)nl Ωvo (I + M)ok Omx}AlAk+{−γp+a,vwΩvkΩwoOox − αvmp+aMvkOmx +(γp+a,m;v + γp+a;v,m)Ωmw (I + M)wk Ovx}AkAp+a+{−γp+a;p+b,mΩmnOnx + αv p+a p+bOvx}Ap+aAp+b− Ωko (I + M)om OlxC l,kAm + ΩkmOmxCp+a,kAp+a. (3.29)3.6.4 Expansion of Rx2Ry2Algebraically, from the expression for R2(Equation (3.29)), it can be readily obtained that for any fixedx, y, Rx2Ry2 is equal to the sum of the following 33 terms. [2] is short notation for [2;x, y].1.{14MmvMlwOnxOky}AmnAklAvAw ; 2. {OnxOmy}An p+aAmp+bAp+aAp+b ;3.{γl′;v,wγm′;o,w′Ωwk′Ωw′n′ (I + M)n′n (I + M)k′k Ml′lMm′mOoyOvx+ 19αl′k′vαn′m′wMl′lMk′kMn′nMm′mOvxOwy+14γo,wvγo′,w′v′Ωwl′Ωvk′Ωw′n′Ωv′m′ (I + M)l′l (I + M)k′k (I + M)n′n (I + M)m′m OoxOo′y}×AkAlAmAn ;4.{−13γl′;v,wαn′m′oΩwk′(I + M)k′k Mn′nMm′mMl′lOvxOoy− 12γl′;v,wγo,w′v′Ωwk′Ωw′n′Ωv′m′ (I + M)n′n (I + M)m′m (I + M)k′k Ml′lOvxOoy+16αl′k′oγw,v′vΩv′n′Ωvm′(I + M)n′n (I + M)m′m Ml′lMk′kOoxOwy}AkAlAmAn[2] ;5.{γp+a,vwγp+b,mnΩvkΩwoΩmlΩno′OoxOo′y+ 14γp+a,vwγp+b,mnΩvk′ΩwoΩml′Ωno′Mk′kMl′lOoxOo′y+ αvmp+aαwnp+bMvkMwlOmxOny+ (γp+a,m;v + γp+a;v,m)(γp+b,n;o + γp+a′;o,n)Ωmk′Ωnl′(I + M)k′k (I + M)l′l OvxOoy}×AkAlAp+aAp+b ;6.{12γp+a,vwγp+b,mnΩvkΩwoΩml′Ωno′Ml′lOoxOo′y+ γp+a,vwαl′mp+bΩvkΩwoMl′lOmyOox− γp+a,vw(γp+b,m;n + γp+b;n,m)ΩvkΩwoΩml′(I + M)l′l OnyOox1003.6. Proofs+ 12γp+a,vwαl′mp+bΩvk′ΩwoMk′kMl′lOmyOox− 12γp+a,vw(γp+b,n;m + γp+b;m,n)Ωvk′ΩwoΩnl′(I + M)l′l Mk′kOoxOmy−αvmp+a(γp+b,n;o + γp+b;o,n)Ωnw (I + M)wl MvkOmxOoy}AkAlAp+aAp+b[2] ;7.{γp+a;p+b,mΩmnγp+c;p+d,kΩklOnxOly + αv p+a p+bαo p+c p+dOvxOoy}Ap+aAp+bAp+cAp+d ;8.{−γp+a;p+b,mαo p+c p+dΩmnOnxOoy}Ap+aAp+bAp+cAp+d[2] ;9.{ΩkoΩno′(I + M)o′w (I + M)ov OlxOmy}C l,kAvCm,nAw ;10.{ΩlmΩkvOmxOvy}Cp+a,lCp+b,kAp+aAp+b ; 11.{−12MmlOnxOky}AnmAk p+aAlAp+a[2] ;12.{−12γk;l,w′Ωw′n′ (I + M)n′v MkwMmoOnxOly + 16αkln′MmoMkwMlvOnxOn′y12. 
+14γm′,klΩkn′Ωlo′(I + M)n′w (I + M)o′v MmoOnxOm′y}AmnAoAwAv[2] ;13.{−12γp+a,klΩkwΩloMmvOnxOoy− 14γp+a,klΩkm′ΩloMmvMm′wOnxOoy − 12αkl p+aMkwMmvOnxOly+12(γp+a,o;k + γp+a;k,o)Ωol (I + M)lw MmvOnxOky}AmnAvAwAp+a[2] ;14.{−12γp+a;p+b,oΩolMmvOnxOly + 12αo p+a p+bMmvOnxOoy}AnmAvAp+aAp+b[2] ;15.{−12Ωko (I + M)om MvnOwxOly}C l,kAwvAnAm[2] ;16.{12ΩnoMvmOwxOoy}Cp+a,nAwvAmAp+a[2] ;17.{γo;v,wΩwn (I + M)nk MolOmxOvy − 13αovwMolMvkOmxOwy−12γo,wvΩwnΩvk′(I + M)k′k (I + M)nl OmxOoy}Amp+aAp+aAlAk[2] ;18.{γp+a,vwΩvkΩwoOmxOoy + 12γp+a,vwΩvnΩwoMnkOmxOoy + αvn p+aMvkOmxOny− (γp+a,n;v + γp+a;v,m) Ωnw (I + M)wk OmxOvy}Amp+bAp+bAkAp+a[2] ;19.{γp+a;p+b,oΩonOmxOny − αv p+a p+bOmxOvy}Amp+cAp+cAp+aAp+b[2] ;20.{Ωko (I + M)on OmxOly}C l,kAmp+aAnAp+a[2] ;21. {−ΩmvOnxOvx}Cp+a,mAn p+bAp+aAp+b[2] ;22.{γl′;v,wγp+a,v′w′Ωwk′Ωv′mΩw′o (I + M)k′k Ml′lOvxOoy+ 12γl′;v,wγp+a,v′w′Ωwk′Ωv′m′Ωw′o (I + M)k′k Ml′lMm′mOvxOoy+ γl′;v,wαm′o p+aΩwk′(I + M)k′k Ml′lMm′mOvxOoy−γl′;v,w(γp+a,o;v′+ γp+a′;v′,o)Ωwk′Ωow′(I + M)k′k (I + M)w′m Ml′lOvxOv′y}AlAkAmAp+a[2] ;23.{−13αl′k′nγp+a,v′w′Ωv′mΩw′oMl′lMk′kOnxOoy− 16αl′k′nγp+a,v′w′Ωv′m′Ωw′oMl′lMk′kMm′mOnxOoy − 13αl′k′nαvo p+aMl′lMk′kMvmOnxOoy+13αl′k′n(γp+a,o;v′+ γp+a′;v′,o)Ωom′(I + M)m′m Ml′lMk′kOnxOv′y}AlAkAmAp+a[2] ;1013.6. 
Proofs24.{−12γn,wvγp+a,v′w′Ωwl′Ωvk′Ωv′mΩw′o (I + M)k′k (I + M)l′l OnxOoy− 14γn,wvγp+a,v′w′Ωwl′Ωvk′Ωv′m′Ωw′o (I + M)k′k (I + M)l′l Mm′mOnxOoy− 12γn,wvαm′o p+aΩwl′Ωvk′(I + M)k′k (I + M)l′l Mm′mOnxOoy+12γn,wv(γp+a,o;v′+ γp+a;v′,o)Ωwl′Ωvk′Ωom′(I + M)k′k (I + M)l′l (I + M)m′m OnxOv′y}×AlAkAmAp+a[2] ;25.{γl′;v,wγp+a;p+b,mΩwk′Ωmn (I + M)k′k Ml′lOvxOny − γl′;v,wαmp+a p+bΩwk′(I + M)k′k Ml′lOvxOmy− 13αvmnγp+a;p+b,wΩwoMvlMmkOnxOoy + 13αvmnαo p+a p+bMvlMmkOnxOoy− 12γm,wvγp+a;p+b,oΩwl′Ωvk′Ωon (I + M)l′l (I + M)k′k OmxOny+12γm,wvαo p+a p+bΩwl′Ωvk′(I + M)k′k (I + M)l′l OmxOoy}AlAkAp+aAp+b[2] ;26.{γl′;v,wΩwk′Ωnm′(I + M)k′k (I + M)m′m Ml′lOvxOoy − 13αl′k′vΩnm′(I + M)m′m Ml′lMk′kOvxOoy−12γo′,wvΩwl′Ωvk′Ωnm′(I + M)m′m (I + M)l′l (I + M)k′k Oo′xOoy}Co,nAmAlAk[2] ;27.{−γl′;v,wΩwk′Ωmo (I + M)k′k Ml′lOvxOoy + 13αvwnMvlMwkOnxΩmoOoy+12γn,wvΩmoΩwl′Ωvk′(I + M)k′k (I + M)l′l OnxOoy}Cp+a,mAp+aAlAk[2] ;28.{γp+a,vwγp+b;p+c,mΩmnΩvkΩwoOoxOny − γp+a,vwαmp+b p+cΩvkΩwoOoxOmy+ 12γp+a,vwγp+b;p+c,mΩvk′ΩwoΩmnMk′kOoxOny−12γp+a,vwαn p+a′ p+b′ΩvmΩwoMmkOoxOny}AkAp+aAp+bAp+c[2] ;29.{αvmp+aγp+b;p+c,oΩonMvkOmxOny − αvmp+aαn p+b p+cMvkOmxOny− (γp+a,m;v + γp+a;v,m) γp+b;p+c,nΩmwΩno (I + M)wk OvxOoy+ (γp+a,m;v + γp+a;v,m)αn p+b p+cΩmw (I + M)wk OvxOny}AkAp+aAp+bAp+c[2] ;30.{γp+a,vwΩvnΩwoΩkm′(I + M)m′m OoxOly+ 12γp+a,vwΩvn′ΩwoΩkm′(I + M)m′m Mn′nOoxOly + αvw p+aΩko (I + M)om MvnOwxOly− (γp+a,o;v + γp+a;v,o) Ωon′Ωkm′(I + M)n′n (I + M)m′m OvxOly}C l,kAmAnAp+a[2] ;31.{−γp+b,vwΩvlΩwoΩkmOoxOmy− 12γp+b,vwΩvnΩwoΩkmMnlOoxOmy − αvn p+bΩkmMvlOnxOmy+(γp+b,n;v + γp+b;v,n)ΩnwΩkm (I + M)wl OvxOmy}Cp+a,kAp+aAlAp+b[2] ;32.{γp+a;p+b,vΩvwΩko (I + M)om OwxOly−αv p+a p+bΩko (I + M)om OvxOly}C l,kAmAp+aAp+b[2] ;33.{−γp+a;p+b,vΩvwΩkmOwxOmy + αv p+a p+bΩkmOvxOmy}Cp+c,kAp+cAp+aAp+b[2] .1023.6. Proofs3.6.5 The Expression for Rτ2R2From the algebraic expansion for Rx2Ry2, we can readily obtain the expression for Rτ2R2 =∑p0x=1Rx2Rx2 .Rx2Rx2 is equal to the sum of the following 27 terms. We collected alike terms. 
This expression will beused when we derive the expression for R3.1.{−γk′;w,oΩom′Mww′γl′;w′,oΩon′Mk′kMl′l (I + M)n′n (I + M)m′m− 13αk′l′oMovγv,ww′Ωwm′Ωw′n′Mk′kMl′l (I + M)n′n (I + M)m′m− 19αvwoαk′m′n′Mon′MvkMwlMk′mMm′n + 23γo;w,vαk′m′n′Ωvl′Mwn′(I + M)l′l MokMk′mMm′n+ γk′;w,vγo′,ow′Ωvl′Mwo′Ωon′Ωw′m′ (I + M)l′l (I + M)n′n (I + M)m′m Mk′k−14γl′,owγm′,k′w′Ωoo′ΩwvMl′m′Ωk′n′Ωw′v′ (I + M)o′k (I + M)vl (I + M)n′m (I + M)v′n}AkAlAmAn ;2.{−γk′;w,vγp+a,n′nΩvl′MwoΩn′m′Ωno (I + M)m′m Mk′kMl′l− 2γk′;w,vαl′n p+aΩvm′Mwn (I + M)m′m Mk′kMl′l− 23αk′l′n (γo;p+a,v + γp+a;o,v) MnoΩvm′(I + M)m′m Mk′kMl′l+ 2γk′;w,v (γn;p+a,o + γp+a;n,o) Ωvl′MwnΩom′(I + M)m′m (I + M)l′l Mk′k+ γo,nwαk′v p+aΩnl′Ωwm′Mov (I + M)m′m (I + M)l′l Mk′k+ 23αk′l′nαm′v p+aMnvMm′mMk′kMl′l + 13αk′l′nγp+a,voMnwΩvm′ΩowMm′mMk′kMl′l+ 12γo,nwγp+a,v′vΩnm′Mow′Ωv′k′Ωvw′Ωwl′(I + M)m′m (I + M)l′l Mk′k+ 23αk′l′nγp+a,voMnwΩvmΩowMl′lMk′k − 2γk′;w,vγp+a,m′nΩvl′Mwn′Ωm′mΩnn′(I + M)l′l Mk′k+ γn,owγp+a,m′o′Ωok′Ωwl′Mnw′Ωm′mΩo′w′ (I + M)k′k (I + M)l′l−γn,ow(γw′;p+a,v + γp+a;w′,v)Ωok′Ωwl′Mnw′Ωvm′(I + M)k′k (I + M)l′l (I + M)m′m}AkAlAmAp+a ;3.{−14γp+a,vwΩvk′ΩwmMmm′γp+b,onΩol′Ωnm′Mk′kMl′l − αk′mp+aMmnαl′n p+bMk′kMl′l− γp+a,owΩok′ΩwmMmnαl′n p+bMk′kMl′l+ γp+a,owΩok′ΩwmMmn(γn;p+b,v + γp+b;n,v)Ωvl′Mk′k (I + M)l′l+ 2αk′mp+aMmo(γo;p+b,m′+ γp+b;o,m′)Ωm′l′Mk′k (I + M)l′l− γp+a,moΩmlΩowMwnγp+b,m′vΩm′k′ΩvnMk′k − 2γp+a,moΩmlΩowMwnαk′n p+bMk′k− 2γk′;w,oγp+a;p+b,nΩol′MwmΩnm (I + M)l′l Mk′k + 2γk′;w,oΩol′Mwvαv p+a p+b (I + M)l′l Mk′k+ 23αk′l′nMnvγp+a;p+b,wΩwvMk′kMl′l − 23αk′l′nMnvαv p+a p+bMk′kMl′l− γp+a,moΩmkΩowMww′γp+b,m′k′Ωm′lΩk′w′− (γp+a,m;w + γp+a;w,m) ΩmoMww′(γp+b,m′;w′ + γp+b;w′,m′)Ωm′k′ (I + M)ok (I + M)k′l+ 2γp+a,moΩmkΩowMww′(γp+b,m′;w′ + γp+b;w′,m′)Ωm′k′ (I + M)k′l1033.6. 
Proofs+ γm,owΩonΩwvMmn′γp+a;p+b,m′Ωm′n′ (I + M)nk (I + M)vl−γm,owΩonΩwvMmm′αm′ p+a p+b (I + M)nk (I + M)vl}AkAlAp+aAp+b ;4.{−γp+a,owγp+b;p+c,lΩok′ΩwmMmvΩlvMk′k + γp+a,owαl p+b p+cΩok′ΩwmMmlMk′k− 2αk′mp+aγp+b;p+c,vMmoΩvoMk′k + 2αk′mp+aαo p+b p+cMmoMk′k− 2γp+a,moγp+b;p+c,vΩmkΩowMwnΩvn + 2γp+a,moαn p+b p+cΩmkΩowMwn+ 2 (γp+a,m;w + γp+a;w,m) ΩmoMwlγp+b;p+c,vΩvl (I + M)ok−2 (γp+a,m;w + γp+a;w,m) ΩmoMwlαl p+b p+c (I + M)ok}AkAp+aAp+bAp+c ;5.{−γp+a;p+b,mγp+c;p+d,kΩmnMnlΩkl − αmp+a p+bMmlαl p+c p+d+2αmp+a p+bγp+c;p+d,kMmlΩkl}Ap+aAp+bAp+cAp+d ;6.{−14MmvMnlMkw}AnmAvAlkAw ; 7.{MmvMnlΩko (I + M)ow}C l,kAnmAvAw ;8. {−Mnm}An p+aAmp+bAp+aAp+b ; 9.{−Ωko (I + M)ov MlmΩnw′(I + M)w′w}Cm,nC l,kAwAv ;10.{−ΩkmΩlnMmn}Cp+a,kCp+b,lAp+aAp+b ; 11.{−2MknΩlm (I + M)mv}Cn,lAk p+aAp+aAv ;12.{2MloΩko}Cp+b,kAl p+aAp+aAp+b ; 13.{2Ωkw (I + M)wv Mlm′Ωmm′}C l,kCp+a,mAvAp+a ;14.{MmlMnk}AnmAk p+aAlAp+a ; 15.{−MmvΩokMnk}Cp+a,oAnmAvAp+a ;16.{γm′;w′,kΩkn′(I + M)n′w Mm′oMnw′Mmv − 13αkln′MkwMloMn′nMmv−12γm′,klΩkn′(I + M)n′w Ωll′(I + M)l′o Mm′nMmv}AnmAvAwAo ;17.{−2γm;l,v′Ωv′k′ (I + M)k′v MmkMloΩnl′(I + M)l′w + 23αv′mlMv′kMmvMloΩnl′(I + M)l′wγl′,mlΩmn′(I + M)n′k Ωlo′(I + M)o′w Ml′oΩnw′(I + M)w′v}Co,nAkAwAv ;18.{12γp+a,w′k′Ωk′oΩw′m′MowMm′vMmk + αlm′ p+aMlwMm′vMmk+γp+a,m′k′Ωm′wΩk′w′Mw′vMmk −(γp+a,m′;w′ + γp+a;w′,m′)Ωm′o (I + M)ow Mw′vMmk}AvmAkAwAp+a ;19.{−γp+a,onΩolΩnl′MlkMl′vΩmw′(I + M)w′w − 2αnl p+aMnkMlvΩmw′(I + M)w′w− 2γp+a,lnΩlkΩnoMovΩmk′(I + M)k′w+2(γp+a,l′;n + γp+a;n,l′)Ωl′o (I + M)ok MnvΩmk′(I + M)k′w}Cv,mAwAkAp+a ;20.{γm;w,oΩov (I + M)vl MmkMwm′Ωnm′− 23αomvMokMmlMvwΩnw−γm,owΩok′(I + M)k′k Ωwv (I + M)vl Mmm′Ωnm′}Cp+a,nAkAlAp+a ;21.{−γk′;w,oΩol′(I + M)l′l MwnMk′k + 23αk′l′vMl′lMvnMk′kγm,k′wΩk′n′ (I + M)n′k Ωwv (I + M)vl Mmn}An p+aAp+aAkAl ;22.{γp+a;p+b,kΩklMlvMmn − αk p+a p+bMkvMmn}AvmAnAp+aAp+b ;1043.6. 
Proofs23.{−γp+a,koΩkk′ΩovMvmMk′w − 2αk′n p+aMnmMk′w−2γp+a,klΩkwΩlvMvm + 2(γp+a,l;k + γp+a;k,l)Ωlk′(I + M)k′w Mkm}Amp+bAwAp+bAp+a ;24.{γp+a,kwΩklΩwmMlkMmvΩov + 2αwmp+aMwkMmvΩov+2γp+a,mlΩmkΩlwMwnΩon − 2 (γp+a,m;w + γp+a;w,m) Ωml (I + M)lk MwnΩon}Cp+b,oAkAp+bAp+a ;25.{−2γp+a;p+b,oΩonMnvΩml (I + M)lw + 2αl p+a p+bMlvΩmo (I + M)ow}Cv,mAp+aAp+bAw ;26.{2γp+a;p+b,mΩmnMnkΩok − 2αmp+a p+bMmkΩok}Cp+c,oAp+aAp+bAp+c ;27.{−2Mmkγp+a;p+b,lΩlk + 2Mmkαk p+a p+b}Amp+cAp+cAp+aAp+b .3.6.6 Derivation of R3We use the following notations to collect the Op(T−2) terms in the expansion of T−1LR into three groups.G˜1 ≡ −14 β˜j,uqβ˜j,stB˜uB˜qB˜sB˜t + γ˜j,kl(−12B˜r+lB˜r+kβ˜j,uqB˜uB˜q)+ 2γ˜j;i,l(12 β˜j,uqB˜uB˜qB˜iB˜r+l[2; j, i])−αjihβ˜j,uqB˜uB˜qB˜iB˜h ;Gˆ1 ≡ −14βj,uqβj,stBuBqBsBt + γj,kl(−12Br+lBr+kβj,uqBuBq)+ 2γj;i,l(12βj,uqBuBqBiBr+l[2; j, i])−αjihβj,uqBuBqBiBh ;G˜2 ≡ 2γ˜j,kB˜j,uB˜uB˜r+k,qB˜q − 2C˜j,kB˜j,qB˜qB˜r+k[2; j, r + k]− B˜j,uB˜uB˜j,qB˜q + B˜jB˜i,qB˜qAji[2; j, i] ;Gˆ2 ≡ 2γj,kBj,uBuBr+k,qBq − 2Cj,kBj,qBqBr+k[2; j, r + k]−Bj,uBuBj,qBq +BjBi,qBqAji[2; j, i] ;G˜3 ≡ γ˜j,klB˜r+kB˜r+lB˜j,qB˜q + 2γ˜j,klB˜jB˜r+lB˜r+k,qB˜q − C˜j,klB˜jB˜r+kB˜r+l+13 γ˜j,klmB˜jB˜r+kB˜r+lB˜r+m − 2γ˜j;i,lB˜jB˜iB˜r+l,qB˜q − 2(γ˜j;i,l + γ˜i;j,l)B˜r+lB˜iB˜j,qB˜q+2C˜j;i,lB˜jB˜iB˜r+l −(γ˜j;i,lk + γ˜j,l;i,k)B˜jB˜iB˜r+lB˜r+k + 2αjihB˜jB˜iB˜h,qB˜q − 23AjihB˜jB˜iB˜h+2γ˜j;i;h,kB˜jB˜iB˜hB˜r+k − 12αjihgB˜jB˜iB˜hB˜g ;Gˆ3 ≡ γj,klBr+kBr+lBj,qBq + 2γj,klBjBr+lBr+k,qBq − Cj,klBjBr+kBr+l+13γj,klmBjBr+kBr+lBr+m − 2γj;i,lBjBiBr+l,qBq − 2(γj;i,l + γi;j,l)Br+lBiBj,qBq+2Cj;i,lBjBiBr+l −(γj;i,lk + γj,l;i,k)BjBiBr+lBr+k + 2αjihBjBiBh,qBq − 23AjihBjBiBh+2γj;i;h,kBjBiBhBr+k − 12αjihgBjBiBhBg .Then it is clear that the Op(T−2) terms of T−1LR equal to(G˜1 − Gˆ1)+(G˜2 − Gˆ2)+(G˜3 − Gˆ3).Our objective is to get expansions of all these terms in written in terms of the A− α and C − γ systems.1053.6. 
ProofsExpansion of G˜1 − Gˆ1We notice thatG˜1 − Gˆ1 =(−14β˜k,uqβ˜k,stB˜uB˜qB˜sB˜t)−(−14βk,uqβk,stBuBqBsBt)+{(−14β˜p+a,uqβ˜p+a,stB˜uB˜qB˜sB˜t)−(−14βp+a,uqβp+a,stBuBqBsBt)}+{(−αjihβ˜j,uqB˜iB˜hB˜uB˜q)−(−αjihβj,uqBiBhBuBq)}+{((γ˜j;i,l + γ˜i;j,l)β˜j,uqB˜iB˜r+lB˜uB˜q)−((γj;i,l + γi;j,l)βj,uqBiBr+lBuBq)}+{(−12γ˜j,klβ˜j,uqB˜r+kB˜r+lB˜uB˜q)−(−12γj,klβj,uqBr+kBr+lBuBq)}(3.30)We want to get expansions of all the five lines in (3.30) written in terms of the A−α and C − γ systems.And the first line of (3.30) can be expanded to be the sum of the following 15 terms. We notice that wehave Bm = 0 for all m 6 p.1. −14 β˜k,mnβ˜k,ovB˜mB˜nB˜oB˜v ; 2. −β˜k,mp+aβ˜k,ovB˜mB˜oB˜vB˜p+a ; 3. −β˜k,mnβ˜k,o r+vB˜mB˜nB˜oB˜r+v ;4. −(12 β˜k,mnβ˜k,p+c p+d + β˜k,mp+dβ˜k,n p+c)B˜mB˜nB˜p+cB˜p+d ;5. −(2β˜k,mp+aβ˜k,o r+v + β˜k,moβ˜k,p+a r+v)B˜mB˜oB˜p+aB˜r+v ;6. −(12 β˜k,mnβ˜k,r+o r+v + β˜k,m r+oβ˜k,n r+v)B˜mB˜nB˜r+oB˜r+v ;7. −β˜k,mp+aβ˜k,p+c p+dB˜mB˜p+aB˜p+cB˜p+d ; 8. −β˜k,r+mr+nβ˜k,o r+vB˜oB˜r+mB˜r+nB˜r+v ;9. −(β˜k,p+a p+bβ˜k,o r+v + 2β˜k,o p+aβ˜k,o p+aβ˜k,p+b r+v)B˜oB˜p+aB˜p+bB˜r+v ;10. −(β˜k,mp+aβ˜k,r+o r+v + 2β˜k,m r+oβ˜k,p+a r+v)B˜mB˜p+aB˜r+oB˜r+v ;11.(12βk,p+a p+bβk,r+o r+v + βk,p+a r+oβk,p+b r+v)Bp+aBp+bBr+oBr+v−(12 β˜k,p+a p+bβ˜k,r+o r+v + β˜k,p+a r+oβ˜k,p+b r+v)B˜p+aB˜p+bB˜r+oB˜r+v ;12. 14βk,r+mr+nβk,r+o r+vBr+mBr+nBr+oBr+v − 14 β˜k,r+mr+nβ˜k,r+o r+vB˜r+mB˜r+nB˜r+oB˜r+v ;13. 14βk,p+a p+bβk,p+c p+dBp+aBp+bBp+cBp+d − 14 β˜k,p+a p+bβ˜k,p+c p+dB˜p+aB˜p+bB˜p+cB˜p+d ;14. βk,p+a p+bβk,p+c r+vBp+aBp+bBp+cBr+v − β˜k,p+a p+bβ˜k,p+c r+vB˜p+aB˜p+bB˜p+cB˜r+v ;15. βk,r+mr+nβk,p+c r+vBp+cBr+mBr+nBr+v − β˜k,r+mr+nβ˜k,p+c r+vB˜p+cB˜r+mB˜r+nB˜r+v .Then using the formula (3.23), the first term can be expanded and transformed to1063.6. 
Proofs−14β˜k,mnβ˜k,ovB˜mB˜nB˜oB˜v= −14{4αw′mnαwovMww′+(γ˜m;n,w′+ γ˜n;m,w′)(γ˜o;v,w + γ˜v;o,w) Nw′kNwk}B˜mB˜nB˜oB˜v={−αvm′n′αwk′l′Mvw −14(γm′;n′,w′ + γn′;m′,w′)(γl′;k′,v′ + γk′;l′,v′)Ωw′wΩv′v (I + M)wv}×Mm′mMn′nMk′kMl′lAkAlAmAnwhere for the first equality we use formulae (3.21) and (3.22) and for the second equality we use formulae(3.18), (3.17) and (3.23). And similarly the 13rd term can be expanded and transformed to14βk,p+a p+bβk,p+c p+dBp+aBp+bBp+cBp+d −14β˜k,p+a p+bβ˜k,p+c p+dB˜p+aB˜p+bB˜p+cB˜p+d={14(γp+b;p+a,v + γp+a;p+a,v)(γp+c;p+d,w + γp+d;p+c,w)ΩvkΩwk}Bp+aBp+bBp+cBp+d−{14(γ˜p+a;p+b,w′+ γ˜p+b;p+a,w′)(γ˜p+c;p+d,w + γ˜p+d;p+c,w)NmkNnk−αv p+a p+bαw p+c p+dMvw}B˜p+aB˜p+bB˜p+cB˜p+d={αv p+a p+bαw p+c p+dMvw −(γp+b;p+a,v + γp+a;p+a,v)(γp+c;p+d,w + γp+d;p+c,w)ΩvlΩwoMlo}×Ap+aAp+bAp+cAp+dwhere basic formulae in Section 3.6.1 are used. Similarly we can get expansions of all other 13 termswritten in terms of the A− α and C − γ systems.After we obtain the expanded forms of all the five lines in (3.30) written in terms of the A − α andC − γ systems, we combine alike terms with the same stochastic parts. It is found that G˜1 − Gˆ1 is equalto the sum of the following 15 terms.1.{−Mvwαvm′n′αwk′l′ + αp+am′n′αp+a k′l′ − 14(γm′;n′,w′ + γn′;m′,w′)(γl′;k′,v′ + γk′;l′,v′)Ωw′wΩv′v (I + M)wv+αvm′n′(γk′;l′,w′ + γl′;k′,w′)Ωw′w (I + M)wv}×Mm′mMn′nMk′kMl′lAkAlAmAn ;2.{Mwoγo,o′v′Ωo′n′Ωv′k′(γw;l′,w′ + γl′;w,w′)Ωw′m′− γp+a,o′v′Ωo′m′Ωv′n′(γl′;p+a,w′ + γp+a;l′,w′)Ωw′k′−γv,oo′Ωom′Ωo′n′γl′,v′w′Ωv′wΩw′k (I + M)wv}(I + M)m′m (I + M)n′n (I + M)k′k Ml′lAkAlAmAn ;1073.6. 
Proofs3.{−(γn′;m′,v′ + γm′;n′,v′)Ωv′v (I + M)vw γl′,o′w′Ωo′k′Ωw′w− 2αom′n′γl′,vv′Ωvk′Ωv′w (I + M)wo (I + M)k′k + 2Mwvαvl′m′(γw;n′,w′ + γn′;w,w′)Ωw′k′−(γo;n′,w′ + γn′;o,w′)Ωw′k′(γl′;m′,v + γm′;l′,v)Ωvw (I + M)wo−2αp+a l′m′(γp+a;n′,v + γn′;p+a,v)Ωvk′}(I + M)k′k Mn′nMl′lMm′mAkAlAmAn ;4.{αom′n′Mowγw,vv′Ωvl′Ωv′k′ −Mow(γo;n′,v + γn′;o,v)(γm′;w,v′ + γw;m′,v′)Ωvl′Ωv′k′− γm′,w′v′Ωw′wΩv′l′ (I + M)wv γn′,oo′ΩovΩo′k′ + αp+am′n′γp+a,wvΩwl′Ωvk′+(γp+a;n′,w + γn′;p+a,w)(γm′;p+a,v + γp+a;m′,v)Ωwk′Ωvl′+ 2γm′,v′w′Ωv′vΩw′k′(γo;n′,o′ + γn′;o,o′)Ωo′l′ (I + M)vo+γw,v′w′Ωv′l′Ωw′k′γm′;n′,o′Ωo′o (I + M)ow}(I + M)k′k (I + M)l′l Mm′mMn′nAkAlAmAn ;5.{−4Mwnαn l′ p+a(γk′;w,v + γw;k′,v)Ωvm′− 2(γl′;p+a,v′ + γp+a;l′,v′)Ωv′v (I + M)vw γk′,w′o′Ωw′wΩo′m′− 2Mwvαvl′k′ (γw;p+a,o + γp+a;w,o) Ωom′−(γk′;l′,v′ + γl′;k′,v′)Ωv′v (I + M)vw γp+a,w′o′Ωw′wΩo′m′+4αl′ p+a p+b(γk′;p+b,v + γp+b;k′,v)Ωvm′+ 2αl′k′ p+b(γp+b;p+a,v + γp+a;p+b,v)Ωvm′}+ 2(γ˜l′;p+a,n + γ˜p+a;l′,n)(γ˜o;k′,v + γ˜k′;o,v)Ωvm′Ωnw(I + M)wo+ (γ˜o;p+a,v + γ˜p+a;o,v)(γ˜l′;k′,n + γ˜k′;l′,n)Ωvm′Ωnw(I + M)wo+ 2αvl′k′γp+a,w′v′Ωw′wΩv′m′(I + M)wv + 4αol′ p+aγk′,w′v′Ωw′wΩv′m′(I + M)wo× (I + M)m′m Mk′kMl′lAkAlAmAp+a ;6.{2γp+c;p+b,m′Ωm′m (I + M)mw(γp+a;k′,w′ + γk′;p+a,w′)Ωw′w− 4αk′ p+d p+aαp+d p+c p+b + 4Mwvαk′w p+aαv p+b p+c− 2αvk′ p+a(γp+b;p+c,w′+ γp+c;p+b,w′)Ωw′w (I + M)wv−2αv p+a p+b(γk′;p+c,w′ + γp+c;k′,w′)Ωw′w (I + M)wv}Mk′kAkAp+aAp+bAp+c ;7.{−2Mwvαv p+a p+bαwk′l′ − 2γk′;l′,v′γp+a;p+b,w′Ωv′vΩw′w (I + M)wv − 4Mwvαv l′ p+aαwk′ p+b−(γk′;p+b,v′ + γp+b;k′,v′)(γp+a;l′,w′ + γl′;p+a,w′)Ωv′vΩw′w (I + M)wv+ 2αp+c k′l′αp+c p+a p+b + 4αp+c k′ p+aαp+c l′ p+b + 2γp+a;p+b,w′Ωw′w (I + M)wo αok′l′1083.6. 
Proofs+ 4αok′ p+b(γl′;p+a,v′ + γp+a;l′,v′)Ωv′v (I + M)vo+2αo p+a p+bγk′;l′,w′Ωw′w (I + M)wo}Mk′kMl′lAkAlAp+aAp+b ;8.{2γl′;k′,n′Ωn′n (I + M)nv(γp+a;m′,v′ + γm′;p+a,v′)Ωv′v + 4Movαom′ p+aαvk′l′− 4αp+bm′k′αl′ p+b p+a − 4αnm′ p+aγl′;k′,w′Ωw′w (I + M)wn−2αwm′k′(γp+a;l′,n′ + γl′;p+a,n′)Ωn′n (I + M)nw}Mm′mMk′kMl′lAmAkAlAp+a ;9.{2Mwvαwm′ p+aγv,w′v′Ωw′k′Ωv′l′ + 2Mnw(γn;m′,w′ + γm′;n,w′)(γp+a;w,v′+ γw;p+a,v′)Ωw′k′Ωv′l′+ 2γm′,oo′ΩovΩo′k′γp+a,n′w′Ωn′nΩw′l′ (I + M)nv − 2αm′ p+a p+bγp+b,wvΩwk′Ωvl′− 2γp+a,n′w′Ωn′nΩw′l′ (I + M)nw(γw;m′,v′ + γm′;w,v′)Ωv′k′− 2(γp+b;m′,v + γm′;p+b,v) (γp+a;p+b,w + γp+b;p+a,w)Ωvk′Ωwl′− 2(γn;p+a,n′+ γp+a;n,n′)Ωn′k′γw,v′o′Ωv′oΩo′l′ (I + M)on−(γm′;p+a,n′ + γp+a;m′,n′)Ωn′n (I + M)nw γw,v′o′Ωv′k′Ωo′l′}Mm′m (I + M)k′k (I + M)l′lAkAlAmAp+a ;10.{2Mmwαmp+a p+b(γk′;w,v + γw;k′,v)Ωvl′+ 2γp+a;p+b,m′Ωm′m (I + M)mn γk′,n′v′Ωn′nΩvl′+ 4Mmnαmk′ p+a(γp+b;n,w + γn;p+b,w)Ωwl′+ 2(γk′;p+a,m′ + γp+a;k′,m′)Ωm′m (I + M)mw γp+b,w′v′Ωw′wΩv′l′− 2αp+a p+b p+c(γk′;p+c,w + γp+c;k′,w)Ωwl′− 4αk′ p+a p+c(γp+b;p+c,w + γp+c;p+b,w)Ωwl′− 2γp+a;p+b,m′Ωm′m (I + M)mn(γk′;n,v + γn;k′,v)Ωvl′− 2(γn;p+b,w + γp+b;n,w)Ωwl′(γk′;p+a,m′ + γp+a;k′,m′)Ωm′m (I + M)mn−2αmp+a p+bγk′,n′w′Ωn′nΩw′l′ (I + M)nm − 4αmk′ p+aγp+b,n′w′Ωn′n (I + M)nm Ωw′l′}×Mk′k (I + M)l′lAkAlAp+aAp+b ;11.{−14Mvwγv,v′o′γw,w′oΩv′m′ (I + M)m′m Ωo′n′ (I + M)n′n Ωw′k′ (I + M)k′k Ωol′(I + M)l′l+ 14γp+a,oo′γp+a,vwΩom′Mm′mΩo′n′ (I + M)n′n Ωvk′(I + M)k′k Ωwl′(I + M)l′l+ 14γp+a,oo′γp+a,vwΩomΩo′n′Mn′nΩvk′(I + M)k′k Ωwl′(I + M)l′l+ 14γp+a,oo′γp+a,vwΩomΩo′nΩvk′Mk′kΩwl′(I + M)l′l+14γp+a,oo′γp+a,vwΩomΩo′nΩvkΩwl′Ml′l}AkAlAmAn ;1093.6. 
Proofs12.{−Mvwγv,onΩok′(I + M)k′k Ωnl′(I + M)l′l(γp+a;w,o′+ γw;p+a,o′)Ωo′m′ (I + M)m′m+ γv,onΩok′(I + M)k′k Ωnl′(I + M)l′l γp+a,ww′Ωwo′Mo′vΩw′m′ (I + M)m′m+ γv,onΩok′(I + M)k′k Ωnl′Ml′lγp+a,ww′ΩwvΩw′m′ (I + M)m′m+ γv,onΩok′Mk′kΩnlγp+a,ww′ΩwvΩw′m′ (I + M)m′m + γv,onΩokΩnlγp+a,ww′ΩwvΩw′m′Mm′m+ γp+b,vwΩvk′Mk′kΩwl′(I + M)l′l (γp+b;p+a,o + γp+a;p+b,o)Ωom′(I + M)m′m+ γp+b,vwΩvkΩwl′Ml′l(γp+b;p+a,o + γp+a;p+b,o)Ωom′(I + M)m′m+γp+b,vwΩvkΩwl(γp+b;p+a,o + γp+a;p+b,o)Ωom′Mm′m}AkAlAmAp+a ;13.{−Mvwαv p+a p+bγw,mnΩmk′(I + M)k′k Ωnl′(I + M)l′l−Mvw (γv;p+a,o + γp+a;v,o) Ωok′(I + M)k′k (γw;p+b,n + γp+b;w,n)Ωnl′(I + M)l′l− γp+a,wvΩwmMmnΩvk′(I + M)k′k γp+b,n′m′Ωn′nΩm′l′ (I + M)l′l− γp+a,wvΩwmΩvk′Mk′kγp+b,n′m′Ωn′mΩm′l′ (I + M)l′l − γp+a,wvΩwmΩvkγp+b,n′m′Ωn′mΩm′l′Ml′l+ 2 (γw;p+a,m + γp+a;w,m) Ωmk′(I + M)k′k γp+b,ovΩonMnwΩvl′(I + M)l′l+ 2 (γw;p+a,m + γp+a;w,m) Ωmk′Mk′kγp+b,ovΩowΩvl′(I + M)l′l+ 2 (γw;p+a,m + γp+a;w,m) Ωmkγp+b,ovΩowΩvl′Ml′l+ αp+c p+a p+bγp+c,ovΩok′Mk′kΩvl′(I + M)l′l + αp+c p+a p+bγp+c,ovΩokΩvl′Ml′l+ γp+a;p+b,vΩvoMowγw,mnΩmk′(I + M)k′k Ωnl′(I + M)l′l+γp+a;p+b,vΩvoγo,mnΩmk′Mk′kΩnl′(I + M)l′l + γp+a;p+b,vΩvoγo,mnΩmkΩnl′Ml′l}AkAlAp+aAp+b ;14.{−2Mmvαv p+a p+b(γp+c;m,l + γm;p+c,l)Ωlk′(I + M)k′k− 2γp+a;p+b,vΩvoMowΩmwγp+c,mnΩnk′(I + M)k′k − 2γp+a;p+b,vΩvoΩmoγp+c,mnΩnk′Mk′k+ 2αp+d p+a p+b(γp+c;p+d,v + γp+d;p+c,v)Ωvk′Mk′k+ 2γp+a;p+b,vΩvoMom (γp+c;m,n + γm;p+c,n) Ωnk′(I + M)k′k+ 2γp+a;p+b,vΩvo (γp+c;o,n + γo;p+c,n) Ωnk′Mk′k2αn p+a p+bγp+c,ovΩowMwnΩvk′(I + M)k′k + 2αn p+a p+bγp+c,ovΩonΩvk′Mk′k}AkAp+aAp+bAp+c ;15.{−Mvwαv p+a p+bαw p+c p+d − γp+a;p+b,vΩvwMwoΩnoγp+c;p+d,n+2γp+a;p+b,vΩvwMwoαo p+c p+d}Ap+aAp+bAp+cAp+c .1103.6. ProofsExpansion of G˜2 − Gˆ2We haveG˜2 − Gˆ2 ={(−2C˜j,kB˜r+k,qB˜qB˜j)−(−2Cj,kBr+k,qBqBj)}+{(−B˜j,uB˜uB˜j,qB˜q)−(−Bj,uBuBj,qBq)}+(2AjiB˜jB˜i,qB˜q − 2AjiBjBi,qBq).Using the approach we take in Section 3.6.6, we can obtain the expanded forms of all the terms in G˜2−Gˆ2written in terms of the A−α and C− γ systems. 
We again combine alike terms with the same stochasticparts. It is found that G˜2 − Gˆ2 is equal to the sum of the following 15 terms.1.{2MlnΩko (I + M)ov Mmw + 2Ωko (I + M)on MlvMmw}C l,kAmnAvAw ;2.{−MlmΩko (I + M)ov Ωno′(I + M)o′w − 2Ωno (I + M)ol Ωko′(I + M)o′v Mmw−Ωno (I + M)om′Ωkm′MmwMlv}C l,kCm,nAvAw ;3.{−MkmMlvMnw}AklAmnAvAw ; 4.{−2Ωko (I + M)om Mln}Cp+a,kAp+a lAmAn ;5.{ΩkoMomΩlw (I + M)wn + ΩkmΩlwMwn}Cp+a,kCp+a,lAmAn ; 6.{MklMmn}Ap+a kAp+amAlAn ;7.{−2Ωlo (I + M)ok Mnv − 2MknΩlo (I + M)ov}Cn,lAp+a kAp+aAv ;8.{−2Ωko (I + M)on Mml}Cp+a,kAmnAlAp+a ; 9.{2MmkMnl}AmnAk p+aAlAp+a ;10.{2ΩmlΩkwMwv + 2Ωkn (I + M)no ΩmoMlv + 2ΩmoMolΩkw (I + M)wv}C l,kCp+a,mAvAp+a ;11.{2ΩkmMml}Cp+b,kAp+b p+aAlAp+a ; 12.{−2Mkl}Ap+a kAp+a p+bAlAp+b ;13.{−ΩkoMowΩlw}Cp+a,kCp+b,lAp+aAp+b ; 14.{2ΩkoMol}Cp+b,kAp+a lAp+bAp+a ;15.{−Mkl}Ak p+aAl p+bAp+aAp+b .1113.6. ProofsExpansion of G˜3 − Gˆ3We haveG˜3 − Gˆ3 = γ˜j,klB˜r+kB˜r+lB˜j,qB˜q − γj,klBr+kBr+lBj,qBq+(2γ˜j,klB˜jB˜r+lB˜r+k,qB˜q − 2γj,klBjBr+lBr+k,qBq)+(13γ˜j,klmB˜jB˜r+kB˜r+lB˜r+m −13γj,klmBjBr+kBr+lBr+m)+{(−2γ˜j;i,lB˜jB˜iB˜r+l,qB˜q)−(−2γj;i,lBjBiBr+l,qBq)}+{(−2(γ˜j;i,l + γ˜i;j,l)B˜r+lB˜iB˜j,qB˜q)−(−2(γj;i,l + γi;j,l)Br+lBiBj,qBq)}+(2B˜jB˜iB˜r+lC˜j;i,l − 2BjBiBr+lCj;i,l)+{(−(γ˜j;i,lk + γ˜j,l;i,k)B˜jB˜iB˜r+lB˜r+k)−(−(γj;i,lk + γj,l;i,k)BjBiBr+lBr+k)}+(2αjihB˜jB˜iB˜h,qB˜q − 2αjihBjBiBh,qBq)+{(−23AjihB˜jB˜iB˜h)−(−23AjihBjBiBh)}+(2γ˜j;i;h,kB˜jB˜iB˜hB˜r+k − 2γj;i;h,kBjBiBhBr+k)+{(−12αjihgB˜jB˜iB˜hB˜g)−(−12αjihgBjBiBhBg)}.+{(−C˜j,klB˜jB˜r+kB˜r+l)−(−Cj,klBjBr+kBr+l)}Using the approach we take in Section 3.6.6, we can obtain the expanded forms of all the terms in G˜3−Gˆ3written in terms of the A−α and C− γ systems. We again combine alike terms with the same stochasticparts. 
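The step of combining alike terms can be illustrated in code. The following is a minimal sketch and not the procedure used to produce the expansions in this section: each term is stored as a (coefficient, stochastic part) pair, the stochastic part is a tuple of hypothetical factor labels such as `"A^mn"` or `"C^l,k"`, and the helper name `combine_alike` is an assumption made for illustration. Coefficients are plain rationals here, whereas in the text they are products of $M$, $\Omega$, $\alpha$ and $\gamma$ terms, and the sketch does not attempt the relabeling of summation indices that the full calculation requires.

```python
from collections import defaultdict
from fractions import Fraction


def combine_alike(terms):
    """Merge terms whose stochastic parts coincide up to the order of factors."""
    totals = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for coeff, factors in terms:
        key = tuple(sorted(factors))  # factor order is irrelevant in a product
        totals[key] += Fraction(coeff)
    # Drop stochastic parts whose coefficients cancel exactly.
    return {key: c for key, c in totals.items() if c != 0}


# Hypothetical input: two pairs of terms sharing a stochastic part.
terms = [
    (Fraction(1, 2), ("A^mn", "A^v", "A^w")),
    (Fraction(3, 2), ("A^w", "A^mn", "A^v")),
    (Fraction(-1), ("C^l,k", "A^v", "A^w")),
    (Fraction(1), ("A^v", "C^l,k", "A^w")),
]
combined = combine_alike(terms)
```

In this toy input the two `A^mn` terms merge into one coefficient and the two `C^l,k` terms cancel, which mirrors how the long lists of raw terms collapse to the numbered lists reported here.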
It is found that G˜3 − Gˆ3 is equal to the sum of the following 41 terms.1.{−Mm′nMmoγm′,k′l′Ωk′w′ (I + M)w′w Ωl′v′ (I + M)v′v − 2γm′,k′l′Ωk′k (I + M)ko Ωl′l (I + M)lm MnwMm′v+ 2γm′;n′,oΩol (I + M)ln Mm′vMn′wMmo + 2(γm′;k,l′ + γk;m′,l′)Ωl′l (I + M)lw Mm′nMkoMmv−2αm′klMlnMm′wMkvMmo}AnmAoAwAv ;2.{Momγm′,k′l′Ωk′w′ (I + M)w′w Ωl′v′ (I + M)v′v Ωnn′(I + M)n′m′+ γm′,w′v′Ωw′k (I + M)kv Ωv′l (I + M)lm Mm′oΩno′(I + M)o′w+ 2γm′,k′l′Ωk′k (I + M)km Ωl′l (I + M)lw′Ωnw′Mm′wMov+ 2γm′,k′l′Ωk′k (I + M)kw Ωl′l (I + M)lo Ωnn′(I + M)n′v Mm′m− 2γm′;k,v′Ωv′l (I + M)ln′Ωnn′Mm′mMkvMow − 2γm′;k,k′Ωk′l (I + M)lo Mm′vMkwΩnn′(I + M)n′m1123.6. Proofs− 2(γm′;k,l′ + γk;m′,l′)Ωl′l (I + M)lw Ωnw′(I + M)w′m′ MkvMom− 2(γm′;k,l′ + γk;m′,l′)Ωl′l (I + M)lv MkmMm′oΩnk′(I + M)k′w+2αm′klMm′wMkmMovΩnn′(I + M)n′l + 2αm′n′lMloMm′wMn′vΩnk (I + M)km}Co,nAwAvAm ;3.{Mmnγm,w′v′Ωw′k′ (I + M)k′k Ωv′l′ (I + M)l′l + 2γm,k′l′Ωk′w (I + M)wk Ωl′v (I + M)vn Mml− 2γm;v,oΩol′(I + M)l′n MmlMvk − 2(γm;k′,l′ + γk′;m,l′)Ωl′o (I + M)ol MmnMk′k+2αmwl′Ml′nMmlMwk}An p+aAlAkAp+a ;4.{−γm,w′v′Ωw′k′ (I + M)k′k Ωv′l′ (I + M)l′l Ωnn′Mn′m − γm,w′v′Ωw′k′Mk′kΩv′l′ (I + M)l′l Ωnn′Mn′m− γm,w′v′Ωw′kΩv′l′Ml′lΩnn′Mn′m − 2γm,k′l′Ωk′w (I + M)wk Ωl′v (I + M)vw′Ωnw′Mml+ 2γm;v,oΩol′(I + M)l′w ΩnwMmlMvk + 2(γm;k′,l′ + γk′;m,l′)Ωl′v (I + M)vl Mk′kΩnw (I + M)wm−2αmwvMmkMwlΩnn′(I + M)n′v}Cp+a,nAkAlAp+a ;5.{γp+a,w′v′Ωw′k′ (I + M)k′k Ωv′l′ (I + M)l′l Mno − 2(γp+a;k′,l′ + γk′;p+a,l′)Ωl′w (I + M)wo MnkMk′l+2αmnp+aMmlMnoMnk}Ap+anAkAlAo ;6.{−γp+b,w′v′Ωw′k′Mk′kΩv′l′ (I + M)l′l − γp+b,w′v′Ωw′k′Mk′kΩv′l′Ml′l+2(γp+b;k′,l′ + γk′;p+b,l′)Ωl′w (I + M)wl Mk′k − 2αmnp+bMmkMnl}Ap+b p+aAp+aAkAl ;7.{−γp+a,w′v′Ωw′k′Mk′kΩv′l′ (I + M)l′l Ωmo (I + M)on − γp+a,w′v′Ωw′kΩv′l′Ml′lΩmo (I + M)on− γp+a,w′v′Ωw′kΩv′lΩmoMon + 2(γp+a;k′,l′ + γk′;p+a,l′)Ωl′m′ (I + M)m′n Mk′kΩmv (I + M)vl−2αvw p+aMvlMwnΩmv′(I + M)v′k}Cp+a,mAkAlAn ;8.{−2γp+a,v′w′Ωv′k′ (I + M)k′k Ωw′l (I + M)ln ΩmnMvw− 2γp+a,v′w′Ωv′k′Mk′kΩw′l (I + M)lv Ωmn (I + M)nw − 2γp+a,v′w′Ωv′kΩw′lMlvΩmn (I + M)nw− 
2γp+a,v′w′Ωv′kΩw′vΩmnMnw + 2 (γm;p+a,o + γp+a;m,o) Ωol (I + M)ln′Ωmn′MmkMvw+ 2 (γp+a;n,o + γn;p+a,o) Ωol (I + M)lv MnkΩmn′(I + M)n′w+ 2(γn;p+a,l′+ γp+a;n,l′)Ωl′l (I + M)lk Ωmm′(I + M)m′n Mvw1133.6. Proofs+ 2(γn;p+a,l′+ γp+a;n,l′)Ωl′l (I + M)lk MnvΩmv′(I + M)v′w−4αo p+a lMokΩmn (I + M)nl Mvw − 4αn p+a lMnkMlvΩmm′(I + M)m′w}Cv,mAp+aAwAk ;9.{2γp+a,v′w′Ωv′k′ (I + M)k′k Ωw′l (I + M)lm Mvw − 2 (γp+a;n,o + γn;p+a,o) Ωol (I + M)lm MnkMvw−2(γn;p+a,l′+ γp+a;n,l′)Ωl′l (I + M)lk MnmMvw + 4αn p+a lMlmMnwMvk}AmvAkAwAp+a ;10.{2γp+a,v′w′Ωv′k′Mk′kΩw′l (I + M)ln′Ωon′+ 2γp+a,v′w′Ωv′kΩw′lMln′Ωon′− 2 (γm;p+a,w + γp+a;m,w) Ωwl (I + M)ln ΩonMmk− 2(γm;p+a,l′+ γp+a;m,l′)Ωl′l (I + M)lk ΩonMnm − 2(γm;p+a,l′+ γp+a;m,l′)Ωl′lMlkΩom+4αmp+a lMmkΩon (I + M)nl}Cp+b,oAkAp+aAp+b ;11.{−2γp+a,v′w′Ωv′kMkwΩw′l (I + M)lm − 2γp+a,v′w′Ωv′wΩw′lMlm+ 2 (γn;p+a,o + γp+a;n,o) Ωol (I + M)lm Mnw+2(γn;p+a,l′+ γp+a;n,l′)Ωl′l (I + M)lw Mnm − 4αn p+a lMlmMnw}Amp+bAwAp+aAp+b ;12.{−2γp+a;p+b,oΩol (I + M)ln ΩmnMvw− 2γp+a;p+b,oΩol (I + M)lv ΩmnMnw − 2γp+a;p+b,oΩolMlvΩmw+2αp+a p+b lΩmn (I + M)nl Mvw + 2αp+a p+b lMvlΩmn (I + M)nw}Cv,mAwAp+aAp+b ;13.{2γp+a;p+b,oΩol (I + M)lm Mvn − 2αp+a p+b lMlmMvn}AmvAnAp+aAp+b ;14.{2γp+a;p+b,mΩmlΩonMln − 2αp+a p+b lΩonMnl}Cp+c,oAp+cAp+aAp+b ;15.{−2γp+a;p+b,oΩolMlm + 2αp+a p+b lMlm}Amp+cAp+aAp+bAp+c ;16.{2(γp+a;p+b,l′+ γp+b;p+a,l′)Ωl′l (I + M)lo Mvw − 4αmp+a p+bMmwMvo}Ap+a vAoAwAp+b ;17.{−2(γp+a;p+b,l′+ γp+b;p+a,l′)Ωl′lMlo + 4αmp+a p+bMmo}Ap+a p+cAp+bAp+cAo ;18.{−2(γp+a;p+b,l′+ γp+b;p+a,l′)Ωl′lMloΩvn (I + M)nw − 2(γp+a;p+b,l + γp+b;p+a,l)ΩloΩvnMnw+4αmp+a p+bMmoΩvk (I + M)kw}Cp+a,vAp+bAoAw ;19.{13γn′,ovwΩok′Ωvl′Ωwm′}(I + M)k′k (I + M)l′l (I + M)m′m Mn′nAkAlAmAn ;1143.6. Proofs20.{−13γp+a,ovwΩok′Ωvl′Ωwm′+(γk′;p+a,ov + γk′,o;p+a,v)Ωol′Ωvm′+(γp+a;k′,ov + γp+a,o;k′,v)Ωol′Ωvm′}(I + M)l′l (I + M)m′m Mk′kAkAlAmAp+a ;21.{−13γp+a,ovwΩokΩvl′Ml′lΩwm′(I + M)m′m − 13γp+a,ovwΩokΩvlΩwm′Mm′m}AkAlAmAp+a ;22. 
Ωok′(I + M)k′k Ωvl′(I + M)l′l MmnCm,ovAkAlAn ;23.{−Ωok′Mk′kΩvl′(I + M)l′l − ΩokΩvl′Ml′l}Cp+a,ovAkAlAp+a ;24.{−(γk′;l′,ov + γk′,o;l′,v)Ωom′(I + M)m′m Ωvn′(I + M)n′n Mk′kMl′l}AkAlAmAn ;25.{−(γp+b;p+a,ov + γp+b,o;p+a,v)Ωok′Mk′kΩvl′(I + M)l′l+4γl′;p+a;p+b,oΩok′(I + M)k′k Ml′l + 2γp+b;p+a;l′,oΩok′(I + M)k′k Ml′l}AkAlAp+aAp+b ;26.{−(γp+b;p+a,ov + γp+b,o;p+a,v)ΩokΩvl′Ml′l}AkAlAp+aAp+b ;27.{2αp+a p+b p+cMkl}Ap+c kAp+aAp+bAl ; 28.{−2αp+a p+b p+cΩomMmk}Cp+c,oAkAp+aAp+b ;29.{−23MnkMolMvm}AnovAkAlAm ; 30.{2MmkMnlAmnp+aAp+aAkAl};31.{−2MmkAmp+a p+bAp+aAp+bAk}; 32.{2γm′;n′;l′,oΩok′}(I + M)k′k Mm′mMn′nMl′lAkAlAmAn ;33.{−4γm′;p+a;l′,oΩok′− 2γm′;l′;p+a,oΩok′}(I + M)k′k Mm′mMl′lAkAmAlAp+a ;34.{−2γp+c;p+a;p+b,oΩok′+ 2αk′ p+a p+b p+c}Mk′kAkAp+aAp+bAp+c ;35. −12αk′l′m′n′Mk′kMl′lMm′n′Mn′n′AkAlAmAn ; 36. 2αk′l′m′ p+aMk′kMl′lMm′mAkAlAmAp+a37. −3αk′l′ p+a p+bMk′kMl′lAkAlAp+aAp+b ; 38. 2Ωol (I + M)lk MwmMvnCw;v,oAkAmAn ;39. −2Ωol (I + M)lk MwmCw;p+a,oAkAmAp+a ; 40. −2Ωol (I + M)lk MwmCp+a;w,oAkAmAp+a ;41. 2ΩolMlkCp+a;p+b,oAkAp+aAp+bThe Expression for R3To derive the expression for R3, we first notice that it should be the case that 2Rτ3R1 =(G˜1 − Gˆ1)+(G˜2 − Gˆ2)+(G˜3 − Gˆ3)− Rτ2R2. It is found that after removing the contribution of Rτ2R2, the re-maining terms in(G˜1 − Gˆ1)+(G˜2 − Gˆ2)+(G˜3 − Gˆ3)can be written in the form of 2Rτ3R1 for somep0−dimensional random vector R3. This proves the existence of the signed root decomposition. We defineR3.j for j = 1, . . . , 34. It can be shown that R3 =∑34j=1R3.j . For each x,1153.6. 
ProofsRx3.1 ≡ −12{−89αvm′n′αwkl′Mvw + αp+am′n′αp+a kl′− 14(γm′;n′,w′ + γn′;m′,w′)(γl′;k,v′ + γk;l′,v′)Ωw′wΩv′v (I + M)wv+αvm′n′(γk;l′,w′ + γl′;k,w′)Ωw′w (I + M)wv}Mm′mMn′nMl′lOkxAlAmAn ;Rx3.2 ≡ −12{γo,o′v′Ωo′n′Ωv′k′γw;l,w′Ωw′m′Mwo − γp+a,o′v′(γl;p+a,w′+ γp+a;l,w′)Ωo′m′Ωv′n′Ωw′k′−γv,oo′γl,v′w′Ωom′Ωo′n′Ωv′wΩw′k (I + M)wv}(I + M)m′m (I + M)n′n (I + M)k′k OlxAkAmAn ;Rx3.3 ≡ −12{−(γn′;m′,v′ + γm′;n′,v′)γl,o′w′Ωv′vΩo′k′Ωw′w (I + M)vw− 2αom′n′γl,vv′Ωvk′Ωv′w (I + M)wo + 2αvlm′(γw;n′,w′ + γn′;w,w′)Ωw′k′Mwv−(γo;n′,w′ + γn′;o,w′)(γl;m′,v + γm′;l,v)Ωw′k′Ωvw (I + M)wo − 23γl;w,vαm′n′oΩvk′Mwo−2αp+a lm′(γp+a;n′,v + γn′;p+a,v)Ωvk′}(I + M)k′k Mn′nMm′mOlxAkAmAn ;Rx3.4 ≡ −12{−23αomn′γw,vv′Ωvl′Ωv′k′Mow + γm;w,oγn′;w′,oΩok′Ωol′Mww′−(γo;n′,v + γn′;o,v)(γm;w,v′+ γw;m,v′)Ωvl′Ωv′k′Mow− γm,w′v′γn′,oo′ΩovΩo′k′Ωw′wΩv′l′ (I + M)wv+ αp+amn′γp+a,wvΩwl′Ωvk′+(γp+a;n′,w + γn′;p+a,w)(γm;p+a,v + γp+a;m,v) Ωwk′Ωvl′+ 2γm,v′w′(γo;n′,o′ + γn′;o,o′)Ωv′vΩw′k′Ωo′l′ (I + M)vo+γw,v′w′γm;n′,o′Ωv′l′Ωw′k′Ωo′o (I + M)ow}(I + M)k′k (I + M)l′l Mn′nOmxAkAlAn ;Rx3.5 ≡ −12{+14γp+a,oo′γp+a,vwΩokΩo′n′Ωvm′Ωwl′(I + M)m′m (I + M)l′l (I + M)n′n+ 14γp+a,oo′γp+a,vwΩomΩo′kΩvn′Ωwl′(I + M)n′n (I + M)l′l+ 14γp+a,oo′γp+a,vwΩomΩo′nΩvkΩwl′(I + M)l′l+14γp+a,oo′γp+a,vwΩomΩo′nΩvlΩwk}OkxAlAmAn ;Rx3.6 ≡ −12{−12αkl′m′n′Ml′lMm′n′Mn′n′ + 13γk,ovwΩon′Ωvl′Ωwm′(I + M)n′n (I + M)l′l (I + M)m′m−(γl′;k,ov + γl′,o;k,v)Ωom′Ωvn′(I + M)m′m (I + M)n′n Ml′l+2γm′;n′;k,oΩol′(I + M)l′l Mm′mMn′n}OkxAnAlAm ;1163.6. 
ProofsRx3.7 ≡ −12{−4Mwnαn l′ p+a(γk;w,v + γw;k,v)Ωvm′+ 2γk;w,vαl′n p+aΩvm′Mwn− 43Mwvαvl′k (γw;p+a,o + γp+a;w,o) Ωom′− 2(γl′;p+a,v′ + γp+a;l′,v′)Ωv′v (I + M)vw γk,w′o′Ωw′wΩo′m′−(γk;l′,v′ + γl′;k,v′)Ωv′v (I + M)vw γp+a,w′o′Ωw′wΩo′m′+ γk;w,vγp+a,n′nΩvl′MwoΩn′m′Ωno + 4αl′ p+a p+b(γk;p+b,v + γp+b;k,v)Ωvm′+ 2(γ˜l′;p+a,n + γ˜p+a;l′,n) (γ˜o;k,v + γ˜k;o,v)Ωvm′Ωnw(I + M)wo+ (γ˜o;p+a,v + γ˜p+a;o,v)(γ˜l′;k,n + γ˜k;l′,n)Ωvm′Ωnw(I + M)wo+ 2αvl′kγp+a,w′v′Ωw′wΩv′m′(I + M)wv + 4αol′ p+aγk,w′v′Ωw′wΩv′m′(I + M)wo+2αl′k p+b(γp+b;p+a,v + γp+a;p+b,v)Ωvm′}(I + M)m′m Ml′lOkxAlAmAp+a ;Rx3.8 ≡ −12{2γl′;k,n′(γp+a;m′,v′ + γm′;p+a,v′)Ωn′nΩv′v (I + M)nv+ 103 αom′ p+aαvkl′Mov − 4αp+bm′kαl′ p+b p+a− 4αnm′ p+aγl′;k,w′Ωw′w (I + M)wn − 13αkl′nγp+a,voΩvm′ΩowMnw−2αwm′k(γp+a;l′,n′ + γl′;p+a,n′)Ωn′n (I + M)nw}Mm′mMl′lOkxAmAlAp+a ;Rx3.9 ≡ −12{αwk p+aγv,w′v′Ωw′m′Ωv′l′Mwv + 2γn;k,w′(γp+a;w,v′+ γw;p+a,v′)Ωw′,′Ωv′l′Mnw+ 2γk,oo′γp+a,n′w′ΩovΩo′m′Ωn′nΩw′l′ (I + M)nv − 2αk p+a p+bγp+b,wvΩwm′Ωvl′− 2γp+a,n′w′(γw;k,v′+ γk;w,v′)Ωn′nΩw′l′Ωv′m′ (I + M)nw− 2(γp+b;k,v + γk;p+b,v) (γp+a;p+b,w + γp+b;p+a,w)Ωvm′Ωwl′− 12γo,nwγp+a,v′vΩnm′Ωv′kΩvw′Ωwl′Mow′−3(γk;p+a,n′+ γp+a;k,n′)γw,v′o′Ωn′nΩv′m′Ωo′l′ (I + M)nw}× (I + M)m′m (I + M)l′l OkxAlAmAp+a ;Rx3.10 ≡ −12{+γv,onγp+a,ww′Ωom′Ωnl′Ωwo′Mo′vΩw′k (I + M)l′l (I + M)m′m+ γv,onγp+a,ww′Ωol′ΩnkΩwvΩw′m′ (I + M)m′m (I + M)l′l+ γv,onγp+a,ww′ΩokΩnlΩwvΩw′m′ (I + M)m′m+ γv,onγp+a,ww′ΩomΩnlΩwvΩw′k − 23αkl′nγp+a,voMnwΩvmΩowMl′l+ 2γk;w,vγp+a,m′nΩvl′Mwn′Ωm′mΩnn′(I + M)l′l1173.6. 
Proofs+ γp+b,vw(γp+b;p+a,o + γp+a;p+b,o)ΩvkΩwl′Ωom′(I + M)l′l (I + M)m′m+ γp+b,vw(γp+b;p+a,o + γp+a;p+b,o)ΩvlΩwkΩom′(I + M)m′m+γp+b,vw(γp+b;p+a,o + γp+a;p+b,o)ΩvmΩwlΩok}OkxAlAmAp+a ;Rx3.11 ≡ −12{−13γp+a,ovwΩokΩvl′Ωwm′(I + M)l′l (I + M)m′m+(γk;p+a,ov + γk,o;p+a,v)Ωol′Ωvm′(I + M)l′l (I + M)m′m+(γp+a;k,ov + γp+a,o;k,v)Ωol′Ωvm′(I + M)l′l (I + M)m′m− 13γp+a,ovwΩolΩvkΩwm′(I + M)m′m − 13γp+a,ovwΩomΩvlΩwk− 4γm′;p+a;k,oΩol′(I + M)l′l Mm′m − 2γm′;k;p+a,oΩol′(I + M)l′l Mm′m+2αkl′m′ p+aMl′lMm′m}OkxAlAmAp+a ;Rx3.12 ≡ −12{−43αv p+a p+bαwkl′Mwv − 2γk;l′,v′γp+a;p+b,w′Ωv′vΩw′w (I + M)wv− 23αkl′nγp+a;p+b,wΩwvMnv+ 14γp+a,vwγp+b,onΩvkΩwmΩol′Ωno′Mmo′+ γp+a,owαl′n p+bΩokΩwmMmn − 3αv l′ p+aαwk p+bMwv−(γk;p+b,v′+ γp+b;k,v′)(γp+a;l′,w′ + γl′;p+a,w′)Ωv′vΩw′w (I + M)wv+ 2αp+c kl′αp+c p+a p+b + 4αp+c k p+aαp+c l′ p+b+ 2γp+a;p+b,w′Ωw′w (I + M)wo αokl′+ 4αok p+b(γl′;p+a,v′ + γp+a;l′,v′)Ωv′v (I + M)vo+2αo p+a p+bγk;l′,w′Ωw′w (I + M)wo}Ml′lOkxAlAp+aAp+b ;Rx3.13 ≡ −12{2αmp+a p+b(γk;w,v + γw;k,v)Ωvl′Mmw + 2γp+a;p+b,m′γk,n′v′Ωn′nΩvl′Ωm′m (I + M)mn+ 2γk;w,oγp+a;p+b,nΩol′MwmΩnm − 2γk;w,oΩol′Mwvαv p+a p+b+ 4αmk p+a(γp+b;n,w + γn;p+b,w)Ωwl′Mmn+ 2γp+b,w′v′(γk;p+a,m′+ γp+a;k,m′)Ωm′mΩw′wΩv′l′ (I + M)mw− 2αp+a p+b p+c(γk;p+c,w + γp+c;k,w)Ωwl′− 4αk p+a p+c(γp+b;p+c,w + γp+c;p+b,w)Ωwl′− 2γp+a;p+b,m′ (γk;n,v + γn;k,v)Ωm′mΩvl′(I + M)mn− 2(γn;p+b,w + γp+b;n,w) (γk;p+a,m′+ γp+a;k,m′)Ωwl′Ωm′m (I + M)mn− 2αmp+a p+bγk,n′w′Ωn′nΩw′l′ (I + M)nm − γp+a,ow(γn;p+b,v + γp+b;n,v)ΩokΩwmΩvl′Mmn1183.6. 
Proofs− 2αkmp+a(γo;p+b,m′+ γp+b;o,m′)Ωm′l′Mmo−4αmk p+aγp+b,n′w′Ωn′nΩw′l′ (I + M)nm}(I + M)l′l OkxAlAp+aAp+b ;Rx3.14 ≡ −12{−γp+a,wvγp+b,n′m′ΩwmΩvkΩn′nΩm′l′Mmn (I + M)l′l− γp+a,wvγp+b,n′m′ΩwmΩvlΩn′nΩm′kMmn− γp+a,wvγp+b,n′m′ΩwmΩvkΩn′mΩm′l′ (I + M)l′l − γp+a,wvγp+b,n′m′ΩwmΩvlΩn′mΩm′k+ γp+a,moγp+b,m′vΩmlΩowΩm′kΩvnMwn + 2γp+a,moαkn p+bΩmlΩowMwn+ 2γp+b,ov (γw;p+a,m + γp+a;w,m) Ωml′ΩonΩvkMnw (I + M)l′l+ 2γp+b,ov (γw;p+a,m + γp+a;w,m) ΩmkΩowΩvl′(I + M)l′l+ 2γp+b,ov (γw;p+a,m + γp+a;w,m) ΩmlΩowΩvk+(γp+c;p+b,m + γp+b;p+c,m)Ωmk (γp+a;p+c,n + γp+c;p+a,n) Ωnl′(I + M)l′l+(γp+c;p+b,m + γp+b;p+c,m)Ωml (γp+a;p+c,n + γp+c;p+a,n) Ωnk+ αp+c p+a p+bγp+c,ovΩokΩvl′(I + M)l′l + αp+c p+a p+bγp+c,ovΩolΩvk+γp+a;p+b,vγo,mnΩvoΩmkΩnl′(I + M)l′l + γp+a;p+b,vγo,mnΩvoΩmlΩnk}OkxAlAp+aAp+b ;Rx3.15 ≡ −12{−(γp+b;p+a,ov + γp+b,o;p+a,v)ΩokΩvl′(I + M)l′l+ 4γk;p+a;p+b,oΩol′(I + M)l′l + 2γp+b;p+a;k,oΩol′(I + M)l′l−(γp+b;p+a,ov + γp+b,o;p+a,v)ΩolΩvk − 3αkl′ p+a p+bMl′l}OkxAlAp+aAp+b ;Rx3.16 ≡ −12{−2γp+a;p+b,vγp+c,mnΩvoΩmwΩnkMow− 2γp+a;p+b,vγp+c,mnΩvoΩmoΩnk + γp+a,owγp+b;p+c,lΩokΩwmΩlvMmv+ 2αp+d p+a p+b(γp+c;p+d,v + γp+d;p+c,v)Ωvk+ 2γp+a;p+b,v (γp+c;o,n + γo;p+c,n) ΩvoΩnk+ 2αn p+a p+bγp+c,ovΩowΩvkMwn + 2αn p+a p+bγp+c,ovΩonΩvk− γp+a,owαl p+b p+cΩokΩwmMml + 2αkmp+aγp+b;p+c,vMmoΩvo−2αkmp+aαo p+b p+cMmo}OkxAp+aAp+bAp+c ;Rx3.17 ≡ −12{2γp+c;p+b,m′(γp+a;k,w′+ γk;p+a,w′)Ωw′wΩm′m (I + M)mw− 4αk p+d p+aαp+d p+c p+b − 2αv p+a p+b(γk;p+c,w′+ γp+c;k,w′)Ωw′w (I + M)wv1193.6. 
Proofs+ 4Mwvαk′w p+aαv p+b p+cMk′k − 2αvk′ p+a(γp+b;p+c,w′+ γp+c;p+b,w′)Ωw′w (I + M)wv−2γp+c;p+a;p+b,oΩok + 2αk p+a p+b p+c}OkxAp+aAp+bAp+c ;Rx3.18 ≡ −12{MlnΩko (I + M)ov + 2Ωko (I + M)on Mlv}OmxC l,kAmnAv−12{−2ΩnoΩko′(I + M)ol (I + M)o′v − ΩnoΩkm′(I + M)om′Mlv}OmxC l,kCm,nAv−12{−2Ωko (I + M)om}OlxCp+a,kAp+a lAm − 12{ΩkoΩlw (I + M)wn + ΩknΩlo}OoxCp+a,kCp+a,lAn−12{−34MkmMlv}OnxAklAmnAv − 12{Mkl}OmxAp+a kAp+amAl ;Rx3.19 ≡ −12{−2Ωlo (I + M)ok}OnxCn,lAp+a kAp+a−12{2ΩmlΩkwOwxC l,kCp+a,mAp+a + 2Ωkn (I + M)no ΩmoOlxC l,kCp+a,mAp+a}−12{2ΩkmOmx}Cp+b,kAp+b p+aAp+a − 12{MmkOnx}AmnAk p+aAp+a−12{−2Okx}Ap+a kAp+a p+bAp+b − 12{−Ωkn − Ωko (I + M)on}OmxCp+a,kAmnAp+a ;Rx3.20 ≡ −12{−12γm′,klΩkw′Ωlv′Mm′n (I + M)w′w (I + M)v′v− 2γm′,k′l′Ωk′kΩl′l (I + M)kw (I + M)ln Mm′v + 2γm′;k,l′Ωl′l (I + M)lw Mm′nMkv+ γk;m′,l′Ωl′l (I + M)lw Mm′nMkv + 2γm′;n′,o′Ωo′l (I + M)ln Mm′vMn′w−53αm′klMlnMm′wMkv}OmxAnmAwAv ;Rx3.21 ≡ −12{γm,k′l′Ωk′w′Ωl′v′Ωnn′(I + M)n′m (I + M)v′v (I + M)w′w}OoxCo,nAwAv−12{2γm,k′l′Ωk′v′Ωl′lΩnw′(I + M)lw′(I + M)v′v Mok+ 2γm,k′l′Ωk′v′Ωl′lΩnn′(I + M)lo (I + M)n′v (I + M)v′k− 2γm;m′,v′Ωv′lΩnn′(I + M)ln′Mm′vMok − 2γm;v′,k′Ωk′l (I + M)lo Ωnn′(I + M)n′k Mv′v− 2(γv′;m,l′ + γm;v′,l′)Ωl′lΩnm′(I + M)m′v′ (I + M)lv Mok− 2(γk′;m,l′ + γm;k′,l′)Ωl′l (I + M)lv Mk′oΩnm′(I + M)m′k+ 2γm;l,v′Ωv′k′Ωnl′Mlo (I + M)k′v (I + M)l′k+2αmv′lΩnn′(I + M)n′l Mv′kMov + 43αmn′lΩnm′(I + M)m′k MloMn′v}OmxCo,nAvAk ;Rx3.22 ≡ −12{+2γw,k′l′Ωk′mΩl′v (I + M)vn (I + M)ml− 2γm;w,oΩov (I + M)vn Mml − 2γm;w,vΩvo (I + M)ol Mmn1203.6. 
Proofs−γw;m,vΩvo (I + M)ol Mmn + 43αmwvMvnMml}OwxAn p+aAlAp+a ;Rx3.23 ≡ −12{−γw,ovΩomΩvl′Ωnw (I + M)l′l− γw,ovΩolΩvmΩnw − 2γm,k′l′Ωk′wΩl′vΩnw′(I + M)vw′(I + M)wl+ 2γv;m,oΩol′Ωnw (I + M)l′w Mvl + 2(γo;m,l′+ γm;o,l′)Ωl′vΩnw (I + M)wo (I + M)vl− γm;w,oΩovΩnm′(I + M)vl Mwm′− 2αmwvΩnvMwl−43αmwvΩnn′Mn′vMwl}OmxCp+a,nAlAp+a ;Rx3.24 ≡ −12{γp+a,w′v′Ωw′k′Ωv′l′ (I + M)k′k (I + M)l′l − 2(γp+a;k′,l′ + γk′;p+a,l′)Ωl′w (I + M)wk Mk′l+2αmnp+aMmlMnk}OnxAp+anAkAl ;Rx3.25 ≡ −12{−γp+b,w′v′Ωw′kΩv′l′ (I + M)l′l − γp+b,w′v′Ωw′kΩv′l+2(γp+a;k,l′+ γk;p+a,l′)Ωl′w (I + M)wl − 2αkn p+bMnl}OkxAp+b p+aAp+aAl ;Rx3.26 ≡ −12{−γp+a,w′v′Ωw′kΩv′l′Ωmo (I + M)on (I + M)l′l − γp+a,w′v′Ωw′lΩv′kΩmo (I + M)on− γp+a,w′v′Ωw′nΩv′lΩmk + 2(γp+a;k,l′+ γk;p+a,l′)Ωl′m′Ωmv (I + M)m′n (I + M)vl−2αvk p+aΩmv′(I + M)v′n Mvl}OkxCp+a,mAlAn ;Rx3.27 ≡ −12{−2γp+a,v′w′Ωv′kΩw′lΩmn (I + M)lv (I + M)nw− 2γp+a,v′w′Ωv′wΩw′vΩmk+ γp+a,onΩokΩnl′Ωmw′(I + M)w′w Ml′v+ 2(γk;p+a,o + γp+a;k,o)ΩolΩmn′(I + M)ln′Mvw+ 2(γp+a;k,o + γk;p+a,o)ΩolΩmn (I + M)nw (I + M)lv−4αk p+a lΩmn (I + M)nl Mvw − 2αk p+a lΩmo (I + M)ow Mlv}OkxCv,mAp+aAw−12{−2γp+a,v′w′Ωv′k′ (I + M)k′k Ωw′l (I + M)ln Ωmn+2 (γn;p+a,o + γp+a;n,o) ΩolΩmm′(I + M)m′n (I + M)lk}OvxCv,mAp+aAk ;Rx3.28 ≡ −12{2γp+a,onΩowΩnl (I + M)lm (I + M)wk− 2 (γp+a;n,o + γn;p+a,o) Ωol (I + M)lm Mnk −(γn;p+a,l′+ γp+a;n,l′)Ωl′l (I + M)lk Mnm1213.6. 
Proofs− 12γp+a,wnΩnoΩwlMokMlm + 3αln p+aMlkMnmMvw−γp+a,m′k′Ωm′kΩk′w′Mw′mMvw}OvxAvmAkAp+a ;Rx3.29 ≡ −12{2γp+a,vwΩvkΩwlΩon (I + M)ln − γp+a,nwΩnkΩwmΩovMmv− 2(γk;p+a,w + γp+a;k,w)ΩwlΩon (I + M)ln − 2(γm;p+a,l + γp+a;m,l)ΩlkΩom+4αk p+a lΩol + 2αk p+a lΩonMnl}OkxCp+b,oAp+aAp+b ;Rx3.30 ≡ −12{−2γp+a,v′w′Ωv′nΩw′l (I + M)lm + γp+a,koΩknΩovMvm+2 (γp+a;n,o + γn;p+a,o) Ωol (I + M)lm − 2αn p+a lMlm}OnxAmp+bAp+aAp+b ;Rx3.31 ≡ −12{−2γp+a;p+b,oΩol (I + M)ln Ωmn + 2αp+a p+b lΩmn (I + M)nl}OvxCv,mAp+aAp+b−12{−2γp+a;p+b,oΩon (I + M)nv Ωml + 2γp+a;p+b,oΩonMnvΩml}OlxCv,mAp+aAp+b ;Rx3.32 ≡ −12{2γp+a;p+b,oΩol (I + M)lm − γp+a;p+b,kΩklMlm − αp+a p+b lMlm}OvxAmvAp+aAp+b−12{+2(γp+a;p+b,w + γp+b;p+a,w)Ωwl (I + M)lo − 4αmp+a p+bMmo}OvxAp+a vAoAp+b−12{−2(γp+a;p+b,v + γp+b;p+a,v)Ωvm + 4αmp+a p+b}OmxAp+a p+cAp+bAp+c−12{−2(γp+a;p+b,m + γp+b;p+a,m)ΩmlΩvn (I + M)nw − 2(γp+a;p+b,m + γp+b;p+a,m)ΩmwΩvl+4αl p+a p+bΩvk (I + M)kw}OlxCp+a,vAwAp+b ;Rx3.33 ≡ −12{Ωok′Ωvl′(I + M)k′k (I + M)l′l}OmxCm,ovAkAl−12{−23MnkMol}OvxAnovAkAl−12{2Ωol (I + M)lk Mvn}OwxCw;v,oAkAn ;Rx3.34 ≡ −12{ΩomΩvn (I + M)nk + ΩokΩvm}OmxCp+a,ovAkAp+a − 12{2αp+a p+b p+c}OkxAp+c kAp+aAp+b−12{−2αp+a p+b p+cΩom}OmxCp+c,oAp+aAp+b−12{2Mmk}OnxAmnp+aAp+aAk − 12 {−2Omx}Amp+a p+bAp+aAp+b−12{−2Ωol (I + M)lk Owx}Cw;p+a,oAkAp+a−12{−2Ωol (I + M)lk Owx}Cp+a;w,oAkAmAp+a − 12{2ΩolOlx}Cp+a;p+b,oAp+aAp+b .1223.6. Proofs3.6.7 Proof of Cum (Rx, Ry, Rz) = O(T−3)The joint third-order cumulants are: for each fixed (x, y, z) ∈ {1, . . . , p0}3,Cum (Rx, Ry, Rz) =E [RxRyRz]− E [Rx] E [RyRz] [x, y, z] + 2E [Rx] E [Ry] E [Rz]=E [Rx1Ry1Rz1] + (E [Rx2Ry1Rz1]− E [Rx2 ] E [Ry1Rz1]) [x, y, z] +O(T−3).(3.31)where the first equality is given by the algebraic relation between moments and cumulants.First, we can easily obtain thatE [Rx1Ry1Rz1] = αnmkOmxOnyOkz (3.32)andE [Rx2 ] =16αlmkMmkOlx +12γm,olΩonΩlk (I + M)nk Omx − γm;l,nΩnk (I + M)km OlxE [Ry1Rz1] =δyz. (3.33)We use R2.j to denote the j− th term in the expression of R2 (Equation (3.29)). 
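The moment-cumulant relation behind the first equality in (3.31) can be sanity-checked numerically. The sketch below assumes that the bracket $[x, y, z]$ denotes the sum over the three ways of singling out one index, so that $E[R^x]E[R^yR^z][x,y,z]$ stands for three terms; this is our reading of the notation. The joint distribution on $\{0,1\}^3$ is hypothetical and chosen only so that the three coordinates are dependent. The check confirms that the raw-moment formula agrees with the third central cross moment, which is the joint third-order cumulant.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint pmf on {0,1}^3 with dependent coordinates.
weights = {
    o: Fraction(1 + o[0] + 2 * o[1] * o[2] + o[0] * o[2])
    for o in product((0, 1), repeat=3)
}
total = sum(weights.values())
pmf = {o: w / total for o, w in weights.items()}


def expect(f):
    """Exact expectation of f(x, y, z) under the pmf."""
    return sum(p * f(*o) for o, p in pmf.items())


ex = expect(lambda x, y, z: x)
ey = expect(lambda x, y, z: y)
ez = expect(lambda x, y, z: z)

# Raw-moment form: E[XYZ] - E[X]E[YZ][3] + 2 E[X]E[Y]E[Z].
cum_from_moments = (
    expect(lambda x, y, z: x * y * z)
    - (
        ex * expect(lambda x, y, z: y * z)
        + ey * expect(lambda x, y, z: x * z)
        + ez * expect(lambda x, y, z: x * y)
    )
    + 2 * ex * ey * ez
)

# Third central cross moment, which equals the joint third cumulant.
cum_central = expect(lambda x, y, z: (x - ex) * (y - ey) * (z - ez))
```

Because exact rationals are used, the two quantities can be compared with `==` rather than with a tolerance.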
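We find that $E[R_2^xR_1^yR_1^z]$ is equal to the sum $\sum_{j=1}^{7}E[R_{2.j}^xR_1^yR_1^z]$. We use the formulae for moments in Section 3.6.1 to derive the $O(T^{-2})$ terms of $E[R_2^xR_1^yR_1^z]$. For example,
\[
\begin{aligned}
E[R_{2.1}^xR_1^yR_1^z]
&= \tfrac12 M^{mk}O^{nx}O^{vy}O^{wz}E[A^{mn}A^{k}A^{v}A^{w}] \\
&= T^{-2}\left(\tfrac12 M^{mk}O^{nx}O^{vy}O^{wz}\left\{\alpha^{mnk}\alpha^{vw}+\alpha^{mnv}\alpha^{kw}+\alpha^{mnw}\alpha^{kv}\right\}\right)+O(T^{-3}) \\
&= T^{-2}\left(\tfrac12\alpha^{mnk}M^{mk}O^{nx}\delta^{yz}-\tfrac12\alpha^{mnv}O^{mz}O^{nx}O^{vy}-\tfrac12\alpha^{mnw}O^{my}O^{nx}O^{wz}\right)+O(T^{-3}) \\
&= T^{-2}\left(\tfrac12\alpha^{mnk}M^{mk}O^{nx}\delta^{yz}-\alpha^{mnv}O^{mz}O^{nx}O^{vy}\right)+O(T^{-3}).
\end{aligned}
\]
Similarly, we can obtain
\[
\begin{aligned}
E[R_{2.2}^xR_1^yR_1^z] &= -\alpha^{n\;p+a\;p+a}O^{nx}\delta^{yz}; \\
E[R_{2.3}^xR_1^yR_1^z] &= T^{-2}\left(-\tfrac13\alpha^{vmn}M^{vm}O^{nx}\delta^{yz}+\tfrac13\alpha^{vmn}O^{vz}O^{my}O^{nx}+\tfrac13\alpha^{vmn}O^{vy}O^{mz}O^{nx}\right. \\
&\qquad\left.+\tfrac12\gamma^{m,wv}\Omega^{wn}(I+M)^{no}\Omega^{vo}O^{mx}\delta^{yz}\right)+O(T^{-3}); \\
E[R_{2.5}^xR_1^yR_1^z] &= T^{-2}\left(-\gamma^{p+a;p+a,m}\Omega^{mn}O^{nx}\delta^{yz}+\alpha^{v\;p+a\;p+a}O^{vx}\delta^{yz}\right)+O(T^{-3}); \\
E[R_{2.6}^xR_1^yR_1^z] &= T^{-2}\left(-\gamma^{l,k;m}\Omega^{ko}(I+M)^{om}O^{lx}\delta^{yz}\right)+O(T^{-3}); \\
E[R_{2.7}^xR_1^yR_1^z] &= T^{-2}\left(\gamma^{p+a;p+a,k}\Omega^{km}O^{mx}\delta^{yz}\right)+O(T^{-3}).
\end{aligned}
\]
It also follows that $E[R_{2.4}^xR_1^yR_1^z]=O(T^{-3})$, since $E[A^{m}A^{n}A^{k}A^{p+a}]=O(T^{-3})$. Therefore, it follows that
\[
\left(E[R_2^xR_1^yR_1^z]-E[R_2^x]E[R_1^yR_1^z]\right)[x,y,z]
=\left(-\tfrac13\alpha^{lmv}O^{my}O^{lx}O^{vz}\right)[x,y,z]
=-\alpha^{lmv}O^{my}O^{lx}O^{vz} \tag{3.34}
\]
and from (3.32) and (3.34) that $\mathrm{Cum}(R^x,R^y,R^z)=O(T^{-3})$.

3.6.8 Proof of $\mathrm{Cum}(R^x,R^y,R^z,R^t)=O(T^{-4})$

The fourth-order joint cumulants of $R$ are: for each fixed $(x,y,z,t)\in\{1,\ldots,p_0\}^4$,
\[
\begin{aligned}
\mathrm{Cum}(R^x,R^y,R^z,R^t)
&= E[R^xR^yR^zR^t]-E[R^xR^y]E[R^zR^t][y,z,t]-E[R^x]E[R^yR^zR^t][x,y,z,t] \\
&\quad+2E[R^xR^y]E[R^z]E[R^t][x,y,z,t]-6E[R^x]E[R^y]E[R^z]E[R^t] \\
&= E[R_1^xR_1^yR_1^zR_1^t]-E[R_1^xR_1^y]E[R_1^zR_1^t][y,z,t] \\
&\quad+\left(E[R_2^xR_1^yR_1^zR_1^t]-E[R_2^xR_1^y]E[R_1^zR_1^t][y,z,t]\right)[x,y,z,t] \\
&\quad+\left(E[R_2^xR_2^yR_1^zR_1^t]-E[R_2^xR_2^y]E[R_1^zR_1^t]\right)[x,y,z,t] \\
&\quad+\left(E[R_3^xR_1^yR_1^zR_1^t]-E[R_3^xR_1^y]E[R_1^zR_1^t][y,z,t]\right)[x,y,z,t] \\
&\quad-E[R_2^x]E[R_1^yR_1^zR_1^t][x,y,z,t] \\
&\quad-\left(E[R_2^x]E[R_2^yR_1^zR_1^t][y,z,t]\right)[x,y,z,t] \\
&\quad+2E[R_2^x]E[R_2^y]E[R_1^zR_1^t][x,y,z,t]+O(T^{-4}),
\end{aligned}
\tag{3.35}
\]
where the first equality is given by the algebraic relation between moments and cumulants.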
ProofsFrom the proof of Cum (Rx, Ry, Rz) = O(T−3), it is readily obtained that− E [Rx2 ] E[Ry1Rz1Rt1][x, y, z, t]−(E [Rx2 ] E[Ry2Rz1Rt1][y, z, t])[x, y, z, t]+ 2E [Rx1Ry1] E [Rz2] E[Rt2][x, y, z, t]=− E [Rx2 ]{E[Ry1Rz1Rt1]+ E [Rx2 ] E[Ry2Rz1Rt1][y, z, t]− E [Ry2] E[Rz1Rt1][y, z, t]}[x, y, z, t]=O(T−4).Then it suffices to show that the sum of the first four lines in (3.35) is O(T−4). We look for expressionsfor each of the four terms. We will show that the sum of these four terms is O(T−4). We will use the lastthree formulae for moments in Section 3.6.1.Expansion of E[Rx2Ry1Rz1Rt1]− E [Rx2Ry1] E[Rz1Rt1][y, z, t]Let R2.j be the j − th term of R2. E[Rx2Ry1Rz1Rt1]− E [Rx2Ry1] E[Rz1Rt1][y, z, t] is equal to the sum ofE[Rx2.jRy1Rz1Rt1]− E[Rx2.jRy1]E[Rz1Rt1][y, z, t] for j = 1, · · · , 7. Using the 6th formula in Section 3.6.1,we obtainE[Rx2.1Ry1Rz1Rt1]− E [Rx2.1Ry1] E[Rz1Rt1][y, z, t]=T−3{12MmkOnx(E[AmnAkOvyAvOwzAwOotAo]− E[AmnAkOvyAv]E[OwzAwOotAo][y, z, t])}+O(T−4)=T−3{12MmkOnxOvyOwzOot(E[AmnAkAvAwAo]− E[AmnAkAv]E [AwAo] [v, w, o])}+O(T−4)=T−3{12MmkOnxOvyOwzOot{αvwoαmnk + αmnvαkow[v, w, o] + αkv (αmnwo − αmnαwo) [v, w, o]}}+O(T−4)=T−3{12αvwoαmnkMmkOnxOvyOwtOoz +12αkwoαmnvMmkOnxOvyOwtOoz[y, t, z]−12αmnwoOmyOnxOwtOoz[y, t, z] +12δyxδtz[y, t, z]}+O(T−4).Similarly we obtain expansions for other terms. We use [3] as a short notation for [y, t, z] in this section.E[Rx2.2Ry1Rz1Rt1]− E [Rx2.2Ry1] E[Rz1Rt1][3]1253.6. 
Proofs= T−3{−αn p+a p+aαvwoOnxOvyOwtOoz − αnv p+aαwo p+aOnxOvyOwtOoz[3]}+O(T−4) ;E[Rx2.3Ry1Rz1Rt1]− E [Rx2.3Ry1] E[Rz1Rt1][3]= T−3{γm;v,wαkloΩwn (I + M)nk OmyOvxOltOoz[3]− 13αvmnαv′w′o′MvmOnxOv′yOw′tOo′z− 13αvmnαkw′o′OvyMmkOnxOw′tOo′z[3]− 13αvmnαlw′o′MvlOmyOnxOw′tOo′z[3]+12αv′w′o′γm,wvΩwnΩvo (I + M)no OmxOv′yOo′zOw′t}+O(T−4) ;E[Rx2.4Ry1Rz1Rt1]− E [Rx2.4Ry1] E[Rz1Rt1][3]= T−3{−αw′o′ p+aγp+a,vwΩvkΩwoOoxOkyOw′tOo′z[3]+ 12αw′o′ p+aγp+a,vwΩvmΩwoOmyOoxOw′tOo′z[3]+αvmp+aαw′o′ p+aOvyOmxOv′yOw′tOo′z[3]}+O(T−4) ;E[Rx2.5Ry1Rz1Rt1]− E [Rx2.5Ry1] E[Rz1Rt1][3]= T−3{−αvwoγp+a;p+a,mΩmnOnxOwtOvyOoz + αvwoαmp+a p+aOmxOwtOvyOoz}+O(T−4) ;E[Rx2.6Ry1Rz1Rt1]− E [Rx2.6Ry1] E[Rz1Rt1][3]= T−3{−γm;l,kαvwnΩko (I + M)om OlxOwtOnzOvy−γv;l,kαmwnΩko (I + M)om OlxOwtOnzOvy[3]}+O(T−4) ;E[Rx2.7Ry1Rz1Rt1]− E [Rx2.7Ry1] E[Rz1Rt1][3]= T−3{γp+a;p+a,kαvwoΩkmOmxOvyOwtOoz+γv;p+a,kαwo p+aΩkmOmxOvyOwtOoz[3]}+O(T−4) ;Expansion of E[Rx2Ry2Rz1Rt1]− E [Rx2Ry2] E[Rz1Rt1]We notice that E[Rx2Ry2Rz1Rt1]− E [Rx2Ry2] E[Rz1Rt1]is equal to the sum of 33 terms since Rx2Ry2 is equalto the sum of the 33 terms shown in Section 3.6.4. For example, by using the 7th formula in Section 3.6.1,1263.6. Proofsthe first term can be expanded:14MmvMlwOnxOky(E[AmnAklAvAwRz1Rt1]− E[AmnAklAvAw]E[Rz1Rt1])=T−314MmvMlwOnxOkyOv′zOw′t{(αmnkl − αmnαkl)(αvv′αww′[2; v′, w′])+ αmnv(αklv′αvw′[2; v′, w′])+ αmnv(αklv′αvw′[2; v′, w′])+ αklv(αmnv′αww′[2; v′, w′])+αklw(αmnv′αvw′[2; v′, w′])+ αvw(αmnv′αklw′[2; v′, w′])}=T−3{αmnklOmzOltOnxOky −14δxzδyt −14αmnv(αklwMmvOltOnxOkyOwz + αklwMmvOlzOnxOwtOky)= −14αmnw(αklvMlwOmtOnxOkyOvz + αklvMlwOmzOnxOkyOvt)−14αklv(αmnwMmvOltOnxOkyOwz + αmnwMmvOlzOnxOkyOwz)−14αklw(αmnvMlwOmtOnxOkyOvz + αmnvMlwOmzOnxOkyOvt)−14αmnvαklwMmlOnxOkyOvzOwt}+O(T−4).Omitting the T−3 in the front and the O(T−4) notation, the 2nd to the 33rd terms are:2. αln p+aαkmp+aOnxOmyOlzOkt[2] ;3. 
γl′;v,wγm′;o,w′ΩwkΩw′n (I + M)nk Ol′zOm′tOoyOvx[2]−19αlkvαnmwMlkOntOmzOvxOwy[2]− 19αlkvαnmwMkmOlzOntOvxOwy[2]−19αlkvαnmwMknOlzOmtOvxOwy[2]− 19αlkvαnmwMlnOkzOmtOvxOwy[2]−19αlkvαnmwMlmOkzOntOvxOwy[2]− 19αlkvαnmwMmnOktOlzOvxOwy[2] ;4. 16αlkoγw,v′vΩv′nΩvm (I + M)nm OktOlzOoxOwy[2] ;5. γp+a,vwγp+a,mnΩvkΩwoΩmlΩno′OoxOo′yOkzOlt[2]+14γp+a,vwγp+a,mnΩvkΩwoΩmlΩno′OkzOltOoxOo′y[2] ;6. −12γp+a,vwγp+a,mnΩvkΩwoΩmlΩno′OltOoxOo′yOkz[2]−γp+a,vwαlm p+aΩvkΩwoOltOmyOoxOkz[2] + 12γp+a,vwαlm p+aΩvkΩwoOkzOltOmyOox[2] ;9. γv′;l,kγw′;m,nΩkoΩno′(I + M)oo′OlxOmyOv′zOw′t[2] ; 10. γv′;p+a,lγw′;p+a,kΩlmΩkvOmxOvyOv′zOw′t[2] ;11. −12αnmp+aαkl p+aOmtOnxOkyOlz[2]− 12αk p+a p+aαnmlOmtOnxOkyOlz[2] ;12. −12γk;l,w′αmnvΩw′n′ (I + M)n′v OktOmzOnxOly[2]+16αklwαmnoMmoOkzOltOnxOwy[2] + 16αklvαmnwMkwOmzOltOnxOvy[2]+16αklwαmnvMlvOmzOktOnxOwy[2] + 16αklwαmnvMmkOltOnxOwyOvz[2]1273.6. Proofs+16αklwαmnvMmlOktOnxOwyOvz[2] + 16αklwαmnvMklOmtOnxOwyOvz[2]−14αmnvγw,klΩkn′Ωlo (I + M)n′o OmtOnxOwyOvz[2] ;13. +12αmnp+aγp+a,klΩkwΩloOmzOnxOoyOwt[2]− 12αmnp+aγp+a,klΩkm′ΩloOmzOm′tOnxOoy[2]−12αkl p+aαmnp+aOktOmzOnxOly[2] ;14. +12αnmvγp+a;p+a,oΩolOmtOnxOlyOvz[2] + 12αnmvαo p+a p+aOmtOnxOoyOvz[2] ;15. +12γl,k;mαwvv′Ωko (I + M)om OvtOvtOwxOlyOv′z[2] + 12αwvmγl,k;v′Ωko (I + M)om OvtOwxOlyOv′z[2] ;16. −12γp+a,n;p+aαwvv′′ΩnoOvtOwxOoyOv′′z[2]− 12αwv p+aγp+a,n;v′′ΩnoOvtOwxOoyOv′′z[2] ;17. −13αovwαmp+a p+aOozOvtOmxOwy[2] ;18. αml p+aγp+a,vwΩvkΩwoOmxOoyOlzOkt[2]− 12αml p+aγp+a,vwΩvnΩwoOntOmxOoyOlz[2]−αkmp+aαvn p+aOvtOmxOnyOkz[2] ;21. −γk;p+a,mαln p+aΩmvOnxOvxOkzOlt[2] ;25. −13αvmnγp+a;p+a,wΩwoOvzOmtOnxOoy[2] + 13αvmnαo p+a p+aOvzOmtOnxOoy[2] ;26. −γl;v,wγv′;o,nΩwkΩnm (I + M)km OltOvxOoyOv′z[2]− 13αlkvγm;o,nΩnm′(I + M)m′m OlzOktOvxOoy[2] ;27. +13αvwnγp+a;p+a,mOvzOwtOnxΩmoOoy[2] ;31. −γp+a,vwγv′;p+a,kΩvlΩwoΩkmOoxOmyOv′zOlt[2] + 12γp+a,vwγv′;p+a,kΩvnΩwoΩkmOntOoxOmyOv′z[2]+αvn p+aγv′;p+a,kΩkmOvtOnxOmyOv′z[2] ;with terms that are not shown here equal to zero. 
Here we used [2] as a short notation for [2; z, t].Expansion of E[Rx3Ry1Rz1Rt1]− E [Rx3Ry1] E[Rz1Rt1][y, z, t]We know thatE[Rl3Rk1Rm1 Rn1]− E[Rl3Rk1]E [Rm1 Rn1 ] [y, z, t] =34∑j=1(E[Rl3.jRk1Rm1 Rn1]− E[Rl3.jRk1]E [Rm1 Rn1 ] [y, z, t]).Then we find expansions of E[Rx3.jRy1Rz1Rt1]− E[Rx3.jRy1]E[Rz1Rt1][3] for j = 1, . . . , 34 term by term.Here we use [3] as a short notation for [y, z, t]. The 8th formula in Section 3.6.1 will be used here. Itis readily observed that many of these terms are equal to zeros, for example, all the R′3.js with Ap+a or(I + M)l′lAl.E[Rx3.1Ry1Rz1Rt1]− E [Rx3.1Ry1] E[Rz1Rt1][3]= T−3{−49αvmnαwklMvwOkx(OlyOmtOnz[2; t, z] + OltOmyOnz[2; y, z] + OlzOmyOnt[2; t, y])1283.6. Proofs+ 12αp+amnαp+a klOkx(OlyOmtOnz[2; t, z] + OltOmyOnz[2; y, z] + OlzOmyOnt[2; t, , y])− 18(γm;n,w′+ γn;m,w′)(γl;k,v′+ γk;l,v′)Ωw′wΩv′v (I + M)wv×(OlyOmtOnz[2; t, z] + OltOmyOnz[2; y, z] + OlzOmyOnt[2; t, y])+ 12αvmn(γk;l,o + γl;k,o)Ωow (I + M)wv Okx×(OlyOmtOnz[2; t, z] + OltOmyOnz[2; y, z] + OlzOmyOnt[2; t, y])}+O(T−4) ;E[Rx3.5Ry1Rz1Rt1]− E [Rx3.5Ry1] E[Rz1Rt1][3]= T−3{−18γp+a,oo′γp+a,vwΩomΩo′nΩvlΩwkOkx×(OlyOmtOnz[2; z, t] + OltOnzOmy[2; y, z] + OlzOntOmy[2; y, t])}+O(T−4) ;E[Rx3.6Ry1Rz1Rt1]− E [Rx3.6Ry1] E[Rz1Rt1][3]= T−3{−14αklmnOkx(OltOmzOny[2; t, z] + OlyOmzOnt[2; y, z] + OlyOmtOnz[2; t, y])}+O(T−4) ;E[Rx3.18Ry1Rz1Rt1]− E [Rx3.18Ry1] E[Rz1Rt1][3]= T−3{γw;l,kαmnvΩko (I + M)on Omx×(OlzOvtOwy[2; t, z] + OlzOwtOvy[2; z, y] + OltOvyOwz[2; y, t])− 12γw;l,kγv;m,nΩnoΩkm′(I + M)om′Omx×(OlzOwyOvt[2; t, z] + OlzOvyOwt[2; y, z] + OltOvyOwz[2; t, y])− 12γw;p+a,kγv;p+a,lΩknΩloOox×(OwyOvtOnz[2; t, z] + OvyOwtOnz[2; y, z] + OvyOntOwz[2; t, y])− 38αklwαmnvMkmOnx(OlzOwyOvt[2; t, z] + OlzOvyOwt[2; y, z] + OltOvyOwz[2; y, t])+ 12αp+a kwαp+amvOmx×(OkzOwyOvt[2; t, z] + OkzOvyOwt[2; y, z] + OktOvyOwz[2; y, t])}+O(T−4) ;E[Rx3.20Ry1Rz1Rt1]− E [Rx3.20Ry1] E[Rz1Rt1][3]= T−3{−γk;v,oΩol (I + M)ln αnmwOmx×(OvtOkzOwy[2; t, z] + OvyOkzOwt[2; y, z] + OvyOktOwz[2; t, y])+ 56αvklαnmwMlnOmx×(OvtOkzOwy[2; 
t, z] + OvyOkzOwt[2; y, z] + OvyOktOwz[2; t, y])}+O(T−4) ;E[Rx3.21Ry1Rz1Rt1]− E [Rx3.21Ry1] E[Rz1Rt1][3]= T−3{γm;k,vγw;o,nΩvlΩnn′(I + M)ln′Omx×(OktOozOwy[2; t, z] + OkyOozOwt[2; y, z] + OkyOotOwz[2; t, y])− αmvlγw;o,nΩnk (I + M)kl Omx1293.6. Proofs×(OvzOotOwy[2; t, z] + OvzOoyOwt[2; y, z] + OvtOoyOwz[2; t, y])}+O(T−4) ;E[Rx3.24Ry1Rz1Rt1]− E [Rx3.24Ry1] E[Rz1Rt1][3]= T−3{−αmnp+aαlw p+aOlx×(OmzOntOwy[2; t, z] + OmzOnyOwt[2; y, z] + OmtOnyOwz[2; t, y])}+O(T−4) ;E[Rx3.26Ry1Rz1Rt1]− E [Rx3.26Ry1] E[Rz1Rt1][3]= T−3{12γp+a,w′v′γw;p+a,mΩw′oΩv′vΩmkOkx×(OwyOvtOoz[2; t, z] + OwtOvyOoz[2; y, z] + OvyOotOwz[2; y, t])}+O(T−4) ;E[Rx3.33Ry1Rz1Rt1]− E [Rx3.33Ry1] E[Rz1Rt1][3]= T−3{13αnovwOvx(OntOozOwy[2; t, z] + OnyOozOwt[2; y, z] + OnyOotOwz[2; t, y])}+O(T−4) .Proof of the ClaimWe define the following 11 constants. [3] is a short notation for [t, y, z]. [6] is a short notation for [x, y, z, t].[4] is a short notation for [x, y, z, t]. Lets1 ≡ αklvαmnwMkwOnxOlzOvyOwt[3];s2 ≡ γp+a,vwαlm p+aΩvkΩwoOlxOmyOotOkz[6];s3 ≡ αlkoγw,v′vΩv′nΩvm (I + M)nm OktOlzOoyOwx[4];s4 ≡ αwvv′γm;l,kΩko (I + M)om OvtOwyOv′zOlx[4];s5 ≡ γp+a,onγp+a,vwΩokΩnmΩvlΩwkOkxOlyOmzOkt[3];s6 ≡ αklmnOltOmzOnyOkx;s7 ≡ αwv p+a(γk;p+a,nΩno + γo;p+a,nΩnk)OvyOwxOotOkz[6];s8 ≡(γv;p+a,kΩkn + γn;p+a,kΩkv) (γw;p+a,l + γo;p+a,lΩlw)OoxOvyOwzOnt[3];s9 ≡ γp+a,w′v′(γv;p+a,mΩmk + γk;p+a,mΩmv)Ωw′oΩv′wOkxOvyOwtOoz[6];s10 ≡ δxtδyz[3];s11 ≡ αmnvαklwMmvOnxOltOkyOwz[4].We work out simplifications to the expansions of E[Rx2Ry1Rz1Rt1]−E [Rx2Ry1] E[Rz1Rt1][3], E[Rx2Ry2Rz1Rt1]−E [Rx2Ry2] E[Rz1Rt1]and E[Rx3Ry1Rz1Rt1]−E [Rx3Ry1] E[Rz1Rt1][3] in the previous sections. We simply com-1303.6. Proofsbine alike terms in the expansions. 
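The four simplified expansions stated next, (3.36) to (3.39), are written in terms of the constants $s_1,\ldots,s_{11}$, and the proof concludes by adding them. As a hedged sanity check of that final step, the sketch below transcribes the coefficient of each $s_j$ from those four displays into dictionaries and adds them exactly. The dictionary and function names are ours, chosen for illustration; only the numerical coefficients come from the text.

```python
from fractions import Fraction as F

# Coefficients of s_1, ..., s_11 in the T^{-3} terms of (3.36) to (3.39).
eq_3_36 = {"s11": F(1, 6), "s1": F(-2, 3), "s6": F(-6), "s10": F(2),
           "s3": F(1, 2), "s2": F(-1), "s4": F(-1), "s7": F(1)}
eq_3_37 = {"s6": F(3), "s10": F(-1), "s11": F(-1, 6), "s1": F(5, 9),
           "s3": F(-1, 2), "s5": F(1), "s8": F(1), "s4": F(1),
           "s7": F(-1), "s9": F(-1), "s2": F(1)}
eq_3_38 = {"s1": F(1, 9), "s8": F(-1), "s5": F(-1), "s6": F(2), "s9": F(1)}
eq_3_39 = {"s6": F(1), "s10": F(-1)}


def add_coefficients(*expansions):
    """Add the coefficient of each constant across several expansions."""
    totals = {}
    for expansion in expansions:
        for name, coeff in expansion.items():
            totals[name] = totals.get(name, F(0)) + coeff
    return totals


totals = add_coefficients(eq_3_36, eq_3_37, eq_3_38, eq_3_39)
all_cancel = all(coeff == 0 for coeff in totals.values())
```

Every one of the eleven constants appears in `totals`, and exact rational arithmetic avoids any rounding question when checking that each total vanishes.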
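We obtain
\[
\left(E[R_2^xR_1^yR_1^zR_1^t]-E[R_2^xR_1^y]E[R_1^zR_1^t][y,z,t]\right)[x,y,z,t]
=T^{-3}\left\{\tfrac16 s_{11}-\tfrac23 s_1-6s_6+2s_{10}+\tfrac12 s_3-s_2-s_4+s_7\right\}+O(T^{-4}), \tag{3.36}
\]
\[
\left(E[R_2^xR_2^yR_1^zR_1^t]-E[R_2^xR_2^y]E[R_1^zR_1^t]\right)[x,y,z,t]
=T^{-3}\left\{3s_6-s_{10}-\tfrac16 s_{11}+\tfrac59 s_1-\tfrac12 s_3+s_5+s_8+s_4-s_7-s_9+s_2\right\}+O(T^{-4}) \tag{3.37}
\]
and
\[
\left(E[R_3^xR_1^yR_1^zR_1^t]-E[R_3^xR_1^y]E[R_1^zR_1^t][y,z,t]\right)[x,y,z,t]
=T^{-3}\left(\tfrac19 s_1-s_8-s_5+2s_6+s_9\right)+O(T^{-4}). \tag{3.38}
\]
First, it is easy to obtain by using the 8th formula of the formulae for moments in Section 3.6.1 that
\[
\begin{aligned}
&E[R_1^xR_1^yR_1^zR_1^t]-E[R_1^xR_1^y]E[R_1^zR_1^t][y,z,t] \\
&\quad=T^{-3}\left\{\alpha^{klmn}O^{kx}O^{ly}O^{mz}O^{nt}-\left(O^{lx}O^{ky}\alpha^{lk}\right)\left(O^{mz}O^{nt}\alpha^{mn}\right)
-\left(O^{lx}O^{mz}\alpha^{lm}\right)\left(O^{ky}O^{nt}\alpha^{kn}\right)-\left(O^{lx}O^{nt}\alpha^{ln}\right)\left(O^{mz}O^{ky}\alpha^{mk}\right)\right\}+O(T^{-4}) \\
&\quad=T^{-3}\{s_6-s_{10}\}+O(T^{-4}).
\end{aligned}
\tag{3.39}
\]
Then it is readily obtained that $\mathrm{Cum}(R^x,R^y,R^z,R^t)=O(T^{-4})$ by summing up (3.36), (3.37), (3.38) and (3.39). The sum of the $O(T^{-3})$ terms is equal to zero, because the coefficients of each $s_j$ cancel: for instance, those of $s_1$ are $-\tfrac23+\tfrac59+\tfrac19=0$ and those of $s_6$ are $-6+3+2+1=0$.

3.6.9 Proofs of Propositions 3.1 and 3.2

We derived that for some $R_1$, $R_2$, $R_3$ and $R\equiv R_1+R_2+R_3$, $LR=T\cdot R^{\tau}R+O_p(T^{-3/2})$. Once we have established that $\mathrm{Cum}(R^x,R^y,R^z)=O(T^{-3})$ and $\mathrm{Cum}(R^x,R^y,R^z,R^t)=O(T^{-4})$, the proofs of (3.14) and (3.15) are exactly the same as the proofs of Theorems 1 and 2 in Chen and Cui (2007). For (3.16), first we observe that
\[
\left(1+T^{-1}\hat{B}_c\right)^{-1}=1-T^{-1}B_c+T^{-3/2}\xi+O_p(T^{-2})
\]
for some $\xi=O_p(1)$. Then we have
\[
\left(1+T^{-1}\hat{B}_c\right)^{-1}LR=T\left(R_1+R_2+R_{3b}\right)^{\tau}\left(R_1+R_2+R_{3b}\right)+O_p(T^{-3/2})
\]
for $R_{3b}\equiv R_3-T^{-1}\left(\tfrac{B_c}{2}\right)R_1$. Applying the delta method (Section 2.7 of Hall (1992)), we obtain
\[
P\left[\left(1+T^{-1}\hat{B}_c\right)^{-1}LR\le c\right]=P\left[T\left(R_1+R_2+R_{3b}\right)^{\tau}\left(R_1+R_2+R_{3b}\right)\le c\right]+O(T^{-3/2}).
\]
It is straightforward to verify that we still have $\mathrm{Cum}(R_b^x,R_b^y,R_b^z)=O(T^{-3})$ and $\mathrm{Cum}(R_b^x,R_b^y,R_b^z,R_b^t)=O(T^{-4})$ for $R_b\equiv R_1+R_2+R_{3b}$ by inspecting (3.13). A consequence of this result is that in the Edgeworth expansion of the density $f$ of $T^{1/2}R_b$,
\[
f(t)=\phi(t)+T^{-1/2}\pi_1(t)\phi(t)+T^{-1}\pi_2(t)\phi(t)+T^{-3/2}\pi_3(t)\phi(t)+O(T^{-2}),
\]
the polynomial $\pi_2$ is of degree 2. It is also easy to verify that
\[
\mathrm{Cum}(R_b^x,R_b^y)=T^{-1}\delta^{xy}+T^{-2}\left(\Delta^{xy}-B_c\delta^{xy}\right)+O(T^{-3})
\]
and therefore $\pi_2(t)=\tfrac12\sum_{x=1}^{p_0}\sum_{y=1}^{p_0}\left(\Delta^{xy}+\mu^x\mu^y-B_c\delta^{xy}\right)\left(t^xt^y-\delta^{xy}\right)$.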
Then it is easy to get
$$\int_{t^\tau t \le c} \pi_2(t)\phi(t)\,dt = 0$$
by applying the definition of $B_c$ ($B_c \equiv p_0^{-1}\sum_{x=1}^{p_0}(\Delta^{xx} + \mu^x\mu^x)$) and using the fact that
$$\int_{t^\tau t \le c} (t^x t^y - \delta^{xy})\phi(t)\,dt = 0$$
if $x \ne y$ and
$$\int_{t^\tau t \le c} (t^x t^y - \delta^{xy})\phi(t)\,dt = p_0^{-1}\int_{t^\tau t \le c} (t^\tau t - p_0)\phi(t)\,dt$$
if $x = y$. Since $\pi_1$ and $\pi_3$ in the expansion are odd polynomials, the integrals of $\pi_1(t)\phi(t)$ and $\pi_3(t)\phi(t)$ are zero. We obtain $P\big[(1 + T^{-1}\hat{B}_c)^{-1}LR \le c\big] = P\big[\chi^2_{p_0} \le c\big] + O(T^{-3/2})$. Then we can apply the symmetry argument of Barndorff-Nielsen and Hall (1988) to obtain that the error term $O(T^{-3/2})$ is actually $O(T^{-2})$.

Once we have established that $\mathrm{Cum}(R^x, R^y, R^z) = O(T^{-3})$ and $\mathrm{Cum}(R^x, R^y, R^z, R^t) = O(T^{-4})$, we can easily adapt the proof of Theorem 2 of Liu and Chen (2010) to get a proof of Proposition 3.2. By some routine algebraic work, similar to the proof of Theorem 2 of Liu and Chen (2010), we can also find that
$$\begin{aligned}
ALR &= LR - B_c \cdot R_1^\tau R_1 + O_p(T^{-3/2}) \\
&= T\,(R_1 + R_2 + R_{3a})^\tau (R_1 + R_2 + R_{3a}) + O_p(T^{-3/2})
\end{aligned}$$
for $R_{3a} \equiv R_3 - T^{-1}\left(\tfrac{B_c}{2}\right) R_1$. We can apply the same argument to show that the distribution of $T\,(R_1 + R_2 + R_{3a})^\tau (R_1 + R_2 + R_{3a})$ can be approximated by $\chi^2_{p_0}$ to $T^{-2}$ precision.

3.6.10 Expression for $B_c$

We defined $B_c \equiv p_0^{-1}\sum_{x=1}^{p_0}(\Delta^{xx} + \mu^x\mu^x)$, where $\Delta^{xy}$ and $\mu^x$ are defined by the expansions
$$\mathrm{Cum}(R^x) = E[R^x] = E[R_2^x] + O(T^{-2}) = T^{-1}\mu^x + O(T^{-2})$$
for each $x \in \{1, \ldots, p_0\}$, and
$$\mathrm{Cum}(R^x, R^y) = T^{-1}\delta^{xy} + T^{-2}\Delta^{xy} + O(T^{-3})$$
for each $(x, y) \in \{1, \ldots, p_0\}^2$. Then it is easy to obtain that $\mu^x = T \cdot E[R_2^x]$ for each $x \in \{1, \ldots, p_0\}$. The algebraic relation between cumulants and moments gives
$$\begin{aligned}
\mathrm{Cum}(R^x, R^y) &= E[R^x R^y] - E[R^x]\,E[R^y] \\
&= E[R_1^x R_1^y] + E[R_2^x R_2^y] + E[R_2^x R_1^y][2; x, y] + E[R_3^x R_1^y][2; x, y] - E[R_2^x]\,E[R_2^y] + O(T^{-3}).
\end{aligned}$$
Here we note that $E[R_1^x R_1^y] = T^{-1}\delta^{xy}$. In Section 3.6.4, we derived that for fixed $x, y$, $R_2^x R_2^y$ is equal to the sum of 33 terms. Therefore $E[R_2^x R_2^y]$ is equal to the sum of the expectations of these 33 terms. For
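example, the expectation of the first term can be expanded:
$$E\Big[\big\{\tfrac{1}{4}M^{mv}M^{lw}O^{nx}O^{ky}\big\}A^{mn}A^{kl}A^{v}A^{w}\Big] = T^{-2}\big\{\tfrac{1}{4}M^{mv}M^{lw}O^{nx}O^{ky}\big\}\big\{(\alpha^{mnkl} - \alpha^{mn}\alpha^{kl})\alpha^{vw} + \alpha^{mnv}\alpha^{klw} + \alpha^{mnw}\alpha^{klv}\big\} + O(T^{-3}). \tag{3.40}$$
We can obtain expansions for all of these 33 expectations, and we have
$$E[R_2^x R_2^y] = T^{-2}J^{xy} + O(T^{-3}),$$
where $J^{xy}$ is the sum of the coefficients of the $T^{-2}$ terms in the 33 expansions. The form of $J^{xy}$ is quite complex and will not be given here. Similarly, we find that $E[R_3^x R_1^y] = \sum_{j=1}^{34} E[R_{3.j}^x R_1^y]$ and $E[R_2^x R_1^y] = \sum_{j=1}^{7} E[R_{2.j}^x R_1^y]$. For each $E[R_{3.j}^x R_1^y]$, we can work out an expansion in a form similar to (3.40). After a lot of relatively routine algebra, we get
$$E[R_2^x R_1^y] + E[R_3^x R_1^y] = T^{-2}K^{xy} + O(T^{-3})$$
for some $K^{xy}$ defined as the sum of coefficients. Then we find that
$$\Delta^{xy} = K^{xy}[2; x, y] + J^{xy} - \mu^x\mu^y.$$
After some more algebraic work, we obtain the explicit expression for $B_c \equiv p_0^{-1}\sum_{x=1}^{p_0}(\Delta^{xx} + \mu^x\mu^x)$:
$$\begin{aligned}
B_c = p_0^{-1}\Big\{ & \tfrac{1}{3}\alpha^{vmn}\alpha^{wkl}M^{vw}M^{ml}M^{nk} - \alpha^{v\,m\,p+a}\alpha^{w\,n\,p+a}M^{vw}M^{mn} + \tfrac{1}{2}\alpha^{novk}M^{nk}M^{vo} - \alpha^{m\,n\,p+a\,p+a}M^{nm} \\
& + \alpha^{n\,p+a\,p+b}\alpha^{m\,p+a\,p+b}M^{nm} + \gamma^{o;n,v}\gamma^{w;m,v'}\Omega^{vl}\Omega^{v'k}M^{ow}M^{nm}(I+M)^{kl} \\
& + \big(\gamma^{v;o,n}\gamma^{w;m,k} - \gamma^{w;o,n}\gamma^{v;m,k}\big)\Omega^{kl}\Omega^{nm'}M^{om}(I+M)^{m'w}(I+M)^{lv} \\
& - 2\alpha^{nmw}\gamma^{v;k,l'}\Omega^{l'l}M^{vn}M^{mk}(I+M)^{lw} + 2\alpha^{k\,l\,p+a}\big(\gamma^{p+a;v,m} + \gamma^{v;p+a,m}\big)\Omega^{mn}M^{kv}(I+M)^{nl} \\
& - 2\alpha^{o\,p+a\,p+b}\gamma^{p+a;p+b,m}\Omega^{mn}M^{no} + \alpha^{nmv}\gamma^{p+a;p+b,o}\Omega^{ol}M^{mv}M^{nl} \\
& + \big(\gamma^{m;p+a,k}\gamma^{n;p+a,l} - \gamma^{n;p+a,k}\gamma^{m;p+a,l}\big)\Omega^{ko}\Omega^{lw}M^{om}(I+M)^{wn} \\
& - 2\gamma^{p+a;v,n}\gamma^{l;p+a,k}\Omega^{nw}\Omega^{km}M^{vm}(I+M)^{wl} + \big(\gamma^{n;p+a,k}\gamma^{m;p+a,l} - \gamma^{m;p+a,k}\gamma^{n;p+a,l}\big)\Omega^{kn}\Omega^{lo}M^{om} \\
& - 2\gamma^{p+a;k,o}\gamma^{p+a;v,m}\Omega^{ol}\Omega^{mn}M^{kv}(I+M)^{ln} - \big(\gamma^{v;p+a,m} + \gamma^{p+a;v,m}\big)\gamma^{o;p+a,n}\Omega^{mk}\Omega^{nl}M^{vo}(I+M)^{kl} \\
& + \gamma^{p+a;p+b,m}\gamma^{p+a;p+b,k}\Omega^{mn}\Omega^{kl}M^{nl} - 2\gamma^{m;l,k}\gamma^{p+a;p+b,v}\Omega^{vw}\Omega^{ko}M^{wl}(I+M)^{om} \\
& - 2\gamma^{m;n;l,k}\Omega^{ko}M^{ml}(I+M)^{on} + 2\gamma^{n;p+a;p+a,m}\Omega^{mv}M^{nv} - \gamma^{p+a,l;p+a,k}\Omega^{lm}\Omega^{kv}M^{mv} \\
& + \big(\gamma^{k;n,o'}\gamma^{m,v'w}\Omega^{wo} - 2\gamma^{m,v'w}\gamma^{o;n,o'}\Omega^{wk}\big)\Omega^{v'v}\Omega^{o'l}M^{nm}(I+M)^{vo}(I+M)^{kl} \\
& + \big(\gamma^{m,w'v'}\gamma^{n,oo'}\Omega^{ov}\Omega^{v'l} - \tfrac{1}{2}\gamma^{m,w'v'}\gamma^{n,oo'}\Omega^{ol}\Omega^{v'v}\big)\Omega^{o'k}\Omega^{w'w}M^{nm}(I+M)^{wv}(I+M)^{kl} \\
& + \alpha^{nmw}\gamma^{v,k'l'}\Omega^{k'k}\Omega^{l'l}M^{mv}(I+M)^{kw}(I+M)^{ln} - 2\alpha^{m\,o\,p+a}\gamma^{p+a,v'w'}\Omega^{v'n}\Omega^{w'l}M^{no}(I+M)^{lm} \\
& + \tfrac{1}{4}\big(\gamma^{p+a,oo'}\gamma^{p+a,vw}\Omega^{o'k}\Omega^{vn} + \gamma^{p+a,oo'}\gamma^{p+a,vw}\Omega^{o'n}\Omega^{vk}\big)\Omega^{om}\Omega^{wl}M^{mk}(I+M)^{nl} \\
& + \tfrac{1}{4}\big(\gamma^{p+a,oo'}\gamma^{p+a,vw}\Omega^{o'm}\Omega^{vn} - \gamma^{p+a,oo'}\gamma^{p+a,vw}\Omega^{o'n}\Omega^{vm}\big)\Omega^{om}\Omega^{wk}M^{kn} \\
& + \big(\gamma^{p+a,wv}\gamma^{k;p+a,m}\Omega^{vn} - \gamma^{p+a,wv}\gamma^{n;p+a,m}\Omega^{vk}\big)\Omega^{wl}\Omega^{mo}M^{kl}(I+M)^{no} \\
& + \big(\gamma^{p+a,wv}\gamma^{o;p+a,m}\Omega^{vk}\Omega^{mn} - \gamma^{p+a,wv}\gamma^{o;p+a,m}\Omega^{vn}\Omega^{mk}\big)\Omega^{wn}M^{ko} \\
& - \tfrac{1}{4}\gamma^{p+a,vw}\gamma^{p+a,mn}\Omega^{vk}\Omega^{wo}\Omega^{mk}\Omega^{no'}M^{oo'} - \tfrac{3}{4}\gamma^{p+a,vw}\gamma^{p+a,mn}\Omega^{vk}\Omega^{wo}\Omega^{ml}\Omega^{no'}M^{oo'}(I+M)^{lk} \\
& - \gamma^{n,wv}\gamma^{p+a;p+a,m}\Omega^{mo}\Omega^{wl}\Omega^{vk}M^{no}(I+M)^{kl} + \gamma^{p+a,vw}\gamma^{l;p+a,k}\Omega^{vl}\Omega^{wo}\Omega^{km}M^{om} \\
& + \gamma^{p+a,vw}\gamma^{l;p+a,k}\Omega^{vn}\Omega^{wo}\Omega^{km}M^{om}(I+M)^{nl} \Big\}.
\end{aligned}$$

Chapter 4

On Pseudo Observation Adjustment of Empirical Likelihood Based Methods for Conditional Moment Restriction Models

4.1 Introduction

Estimation and testing for moment restriction models is important in economic research. Many useful econometric models are special cases of moment restriction models. Hansen (1982)'s generalized method of moments (GMM) gives a family of important estimation and testing methods for these models. Since the pioneering work of Owen (1988) and Qin and Lawless (1994), the family of generalized empirical likelihood (GEL) methods (Newey and Smith (2004)), which includes empirical likelihood (EL) as a special case, has been a popular alternative to GMM. It has been proved that GEL has the same first-order asymptotic properties as two-step GMM with the optimal weighting matrix. Theorists believe that EL has theoretical advantages over efficient GMM in terms of second-order properties.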
For example, it can be shown that EL (GEL) has less asymptotic bias than efficient GMM and that bias-corrected EL is higher-order asymptotically efficient (Newey and Smith (2004)), and EL-based tests for point hypotheses and overidentification restrictions can be shown to be Bartlett correctable (DiCiccio, Hall and Romano (1991), Chen and Cui (2007), Matsushita and Otsu (2013)). It has also been noticed that GEL-based tests achieve asymptotic pivotalness without explicit studentization. This feature is useful when estimating the asymptotic variance matrix is difficult. However, it is computationally easier to work with GMM, since EL (GEL) requires a two-fold nested optimization routine. More importantly, there is a technical issue with the computation step of the EL (GEL) method in finite samples. This issue makes it challenging to find an estimate that is (approximately) the true global minimizer of the estimator objective function (for EL (GEL), this is called the EL (GEL) profile likelihood function). In the worst cases the estimator objective function is analytically nowhere defined, so that the EL (GEL) method cannot be applied. Ignoring this issue and directly carrying out brute-force numerical computation of an EL (GEL) estimate may lead to invalid estimation and testing results, especially when the researcher uses elementary algorithms.

Chen, Variyath and Abraham (2008) proposed a novel adjustment to the construction of the EL profile likelihood function. Their adjustment is based on creating a "pseudo observation" and depends on a tuning parameter which can be data-dependent. With this adjustment, the newly defined adjusted empirical likelihood (AEL) profile likelihood function is analytically well-defined and the whole parameter space is its domain. Consequently, finding its optimizer is much simpler.
It can also be proved that the estimator and test statistics obtained from the AEL profile likelihood function have the same first-order asymptotic properties as their EL counterparts under the same regularity assumptions. Therefore, from an asymptotic analytical perspective, the use of AEL can be justified. In cases when the EL (GEL) profile likelihood function is nowhere defined, AEL is a valid alternative to GMM which also has the feature of implicit pivotalness. Moreover, the fact that AEL introduces additional flexibility in the form of a data-dependent tuning parameter allows us to achieve second-order refinement and solve the finite-sample problem simultaneously. Liu and Chen (2010) show that by choosing this tuning parameter in a suitable data-dependent way, we can decrease the error of the $\chi^2$ approximation to the distribution of the empirical likelihood ratio statistic to order $n^{-2}$, which is equivalent to Bartlett correction. Matsushita and Otsu (2012) extended these results to testing overidentification restrictions. Chen and Huang (2012) discussed several important finite-sample properties of AEL.

As a variation on ordinary moment restriction models, conditional moment restriction models are also widely used in economic research. In fact, ordinary moment restriction models are theoretically special forms of conditional moment restriction models. The conditional moment restriction model is an abstract modelling framework that covers a wide range of interesting cases in economic research. Examples include linear or nonlinear conditional mean regression, fixed effect panel data models and sample selection models, as well as some structural econometric models.

Chamberlain (1983) derived a semiparametric efficiency bound for conditional moment restriction models. Earlier estimators that attain the semiparametric efficiency bound include Robinson (1987) and Newey (1990).
Donald, Imbens and Newey (2003) and Kitamura, Tripathi and Ahn (2004) proposed two different estimation and testing methods that are based on the mechanism of constructing EL (GEL) for ordinary moment restriction models. Roughly, these two methods correspond to the two devices for nonparametric estimation, namely kernel smoothing and approximation by functional series. Each of these two methods leads to an estimator that attains the semiparametric efficiency bound and to valid tests. When using either of these two methods, we construct a profile likelihood function and define the estimator to be an approximate minimizer of the profile likelihood function over the parameter space. But the same technical issue may cause difficulty when applying these methods in practice. In fact, the analysis in Section 4.4 shows that this issue is more likely to render EL-based methods unfruitful or invalid for conditional moment restriction models than for ordinary moment restriction models. Therefore, in our opinion, the usefulness of these methods is largely diminished and solving this problem is necessary. In this paper, we follow the "pseudo observation adjustment" approach of Chen, Variyath and Abraham (2008) and apply similar adjustments to the construction of EL profile likelihood functions for conditional moment restriction models. This gives us two new estimators that attain the semiparametric efficiency bound as well as two test procedures. We can always compute the AEL-based estimates with good accuracy.
Our contribution is to prove that the AEL point estimator and specification test statistic have the same first-order asymptotic properties as the EL-based ones, under the same assumptions on the true data-generating process (DGP) as those in Donald, Imbens and Newey (2003) and Kitamura, Tripathi and Ahn (2004).

This pseudo observation adjustment is of great practical use to applied econometricians working with conditional moment restriction models. First, when the original EL-based estimators cannot be convincingly defined or computed, AEL-based estimators with greatly enhanced computational efficiency can be used. This can be valuable when researchers are interested in using different estimation methods to enhance robustness. AEL is also easy to implement. Additionally, because it introduces some additional flexibility, AEL has the potential to deliver higher-order refinement. This topic is left to future research.

We first give the mathematical formulation of conditional moment restriction models in the second section, along with two examples. In the third section, we outline the general mechanism behind the EL method for ordinary moment restriction models and, based on it, construct EL-based estimators for conditional moment restriction models. We then discuss the technical issue with the original EL-based methods that we want to solve using AEL (Section 4.4) and introduce the method of pseudo observation adjustment (Section 4.5). Finally, we provide large sample results that justify the use of AEL. Notice that the regularity conditions assumed in this paper are almost the same as those assumed in Donald, Imbens and Newey (2003) and Kitamura, Tripathi and Ahn (2004).

In this paper, $\mathrm{int}(A)$ means the interior of a set $A$, and $\mathrm{supp}(X)$ means the support of a random variable $X$. $\|\cdot\|$ is the usual Euclidean norm for real vectors. For a real matrix $A$, $\|A\| = \mathrm{trace}(A^\tau A)^{1/2}$. We use $A^\tau$ to denote the transpose of a matrix $A$.
$\nabla_\theta$ is used to denote the gradient with respect to $\theta$ (i.e. $\partial f(\theta)/\partial\theta^\tau$) and $\nabla_{\theta\theta^\tau}$ denotes the Hessian with respect to $\theta$. Vectors are understood to be column vectors. So for an $\mathbb{R}^J$-valued function $f$, $\nabla_\theta f(\theta)^\tau$ is a $d \times J$ matrix ($\theta$ is $d$-dimensional). Because both asymptotic results and computational issues are discussed heavily in this paper, we insist on distinguishing random vectors (upper case letters) from their realizations (lower case letters). $a \otimes b$ denotes the Kronecker product of $a$ and $b$. "With probability 1" is abbreviated as "w.p.1".

4.2 Basic Setup

Let $((Z_j, X_j))_{j=1}^n$ be a set of $\mathbb{R}^d \times \mathbb{R}^s$-valued random vectors from an i.i.d. process that are observable by econometricians. Usually $X_j$ is a subvector of $Z_j$ in econometric models. In econometric applications, these are usually some observable economic variables. Let $((z_j, x_j) \in \mathbb{R}^d \times \mathbb{R}^s)_{j=1}^n$ denote the realized values, which are the data we have. Suppose now that some economic theory provides a conditional moment restriction: we assume that for some known compact set $\Theta \subseteq \mathbb{R}^L$, there exists some $\theta^* \in \Theta$ such that
$$E[\rho(Z_1, \theta^*) \mid X_1] = 0 \tag{4.1}$$
with probability one, for some known function $\rho : \mathbb{R}^d \times \Theta \to \mathbb{R}^J$. $\theta^*$ is the parameter we want to estimate and $\Theta$ is the parameter space. The presence of the conditioning variable $X_1$ makes conditional moment restriction models considerably more complicated mathematical objects than unconditional ones. In empirical studies, we are interested in using our data to obtain a point estimate of $\theta^*$, to conduct point hypothesis tests on $\theta^*$ and to test the validity of the restriction (4.1) (i.e. an overidentification test). The result of this specification test gives evidence on the validity of the economic theory that leads to the restriction. We always assume that the model is identifiable and the parameter space is compact.

Assumption 13. (a). The model is identifiable: $\theta^*$ is the unique point in $\Theta$ with $E[\rho(Z_1, \theta^*) \mid X_1] = 0$ w.p.1.
(b).
The parameter space $\Theta$ is compact.

We use the following notation. Let $D(X, \theta) = E[\nabla_\theta \rho(Z, \theta) \mid X]$ and $V(X, \theta) = E[\rho(Z, \theta)\rho(Z, \theta)^\tau \mid X]$. Then the semiparametric efficiency bound (Chamberlain (1987)) is given by
$$\Sigma^* \equiv \left(E\left[D(X, \theta^*)^\tau V(X, \theta^*)^{-1} D(X, \theta^*)\right]\right)^{-1}.$$
To estimate the model, Robinson (1987) and Newey (1990) proposed two-step estimators that attain this bound. This approach needs a first-step consistent estimate of $\theta^*$, which is then used to construct an estimate of the "optimal instruments" $a^*(X) = D(X, \theta^*)^\tau V(X, \theta^*)^{-1}$. They then estimate an unconditional moment restriction model that is implied by (4.1), $E[a^*(X_1)\rho(Z_1, \theta^*)] = 0$. A theoretical problem with the two-step approach, as noticed by Dominguez and Lobato (2004), is that the unconditional moment restriction model may not identify $\theta^*$ even if Assumption 13(a) is true. Finding a first-step estimate is also inconvenient. Donald, Imbens and Newey (2003) and Kitamura, Tripathi and Ahn (2004) both address these issues with the two-step approach. Although this is a big theoretical improvement, there are other issues with these two EL-based methods, which we want to address using Chen, Variyath and Abraham (2008)'s pseudo observation adjustment. We will prove that our two new estimators both have asymptotic covariance that attains this efficiency bound.

4.3 Review of Empirical Likelihood Method for Moment Restriction Models

Let $(Z_j)_{j=1}^n$ be a set of $\mathbb{R}^d$-valued observations from an i.i.d. process with true probability distribution $P^*$. If we have an (unconditional) moment restriction model defined by some function $g : \mathbb{R}^d \times \Theta \to \mathbb{R}^J$ and the restriction that $E[g(Z_1, \theta^*)] = 0$ for some $\theta^* \in \Theta$, we can make inference on the unknown parameter by using the EL method. We define the EL profile likelihood function
$$\ell(\theta) \equiv \inf\left\{-\sum_{j=1}^n \log(w_j) : (w_1, \cdots, w_n) \in S_n,\ \sum_{j=1}^n w_j g(Z_j, \theta) = 0\right\}, \tag{4.2}$$
and define the EL estimator to be the minimizer of $\ell$ over the parameter space $\Theta$, $\hat\theta \equiv \operatorname{argmin}_{\theta \in \Theta} \ell(\theta)$.
We can also carry out tests for different hypothesis testing problems by using likelihood ratio statistics (Qin and Lawless (1994)). An appealing feature of the EL estimator is that it can be obtained in one step. This is very different from the efficient GMM estimator, which requires a first-step preliminary estimator to estimate the optimal weighting matrix. Newey and Smith (2004) also show that this appealing one-step feature of EL gives better statistical properties in some sense. In more general cases, we have a conditioning variable $X$. We need to modify the construction of the original EL profile likelihood function (4.2) to obtain "EL-based" methods for conditional moment restriction models.

We assume our data to be a finite-sample realization of some independently and identically distributed (i.i.d.) discrete-time stochastic process $((Z_j, X_j) \in \mathbb{R}^d \times \mathbb{R}^s)_{j=1}^\infty$ on a probability space $(\Omega, \mathcal{F}, P)$. Let $P^*$ denote the true distribution of $X_1$, our conditioning variable. As in Section 4.2, suppose we have a conditional moment restriction model defined by some known function $\rho : \mathbb{R}^d \times \Theta \to \mathbb{R}^J$. We assume there exists some $\theta^* \in \Theta \subseteq \mathbb{R}^L$ such that $E[\rho(Z_1, \theta^*) \mid X_1] = 0$ almost surely. $\Theta$ is our parameter space.

Donald, Imbens and Newey (2003) and Kitamura, Tripathi and Ahn (2004) proposed two different EL-based estimation and testing methods. Donald, Imbens and Newey (2003) is based on an equivalent form of the restriction (4.1) that is given by approximation by basis functions of the space $L^2(\mathbb{R}^s, \mathcal{B}^s, P^*)$, the space of square integrable functions on the measure space $(\mathbb{R}^s, \mathcal{B}^s, P^*)$. Kitamura, Tripathi and Ahn (2004) is based on localizing (4.1) at each observed data point of the conditioning variable. It is worth noting that these two methods are not perfect substitutes from a theorist's point of view. The validity of these methods depends on different sets of regularity assumptions which are not equivalent.
It is suggested that in empirical applications, it would be better to compute both estimates for robustness. This will be clear after our section on asymptotic results.

It is well known that (4.1) is equivalent to the following statement: for every measurable function $f : \mathbb{R}^s \to \mathbb{R}$ with $E[f(X_1)^2] < \infty$, we have $E[\rho(Z_1, \theta^*) f(X_1)] = 0$. It is also well known that, under some assumptions on $P^*$, (4.1) holds if $E[\rho(Z_1, \theta^*) q_k(X_1)] = 0$ for each $k \in \mathbb{N}$, for a suitable countable collection of functions $\{q_1, q_2, \cdots\}$. The collection $\{q_1, q_2, \cdots\}$ can be taken to be a countable basis of the usual approximating functions, for example the monomials $\{1, t, t^2, t^3, \cdots\}$ as the standard basis for the space of algebraic polynomials. The two other most commonly used approximating function spaces are trigonometric polynomials and splines. In several ways, splines are the most attractive approximating functions for our purpose. Let $s$ be the order of the splines we use. Let $\xi(x) \equiv 1_{(0,\infty)}(x)\,x$ and let $t_1, \cdots, t_{K-s-1}$ denote the knots we choose. The basis for splines is $\big(x \mapsto (1, x, \cdots, x^s, \xi(x - t_1)^s, \cdots, \xi(x - t_{K-s-1})^s)\big)_{K=1}^\infty$ (see De Villiers (2012), Chapter 12). Linear combinations of these basis functions are continuous piecewise polynomials that are joined together at the knots $t_1, \cdots, t_{K-s-1}$. Donald, Imbens and Newey (2003) proposed to construct an estimator and a test based on this equivalent form of the conditional moment restriction (4.1).

Now we state the approximating functions argument formally and define the EL profile likelihood function for the conditional moment restriction model. Suppose that we have a triangular array of basis functions $\big((q_{K,k})_{k=1}^K\big)_{K=1}^\infty$, which are bases of some commonly used approximating functions, and let $q^K \equiv (q_{K,1}, \cdots, q_{K,K})^\tau$. We put the following assumption on the distribution of $X_1$ so that every function $a : \mathbb{R}^s \to \mathbb{R}$ with $E[a(X_1)^2] < \infty$ can be approximated with arbitrary precision.

Assumption 14. (a).
For each $K$, $E[q^K(X_j)^\tau q^K(X_j)]$ is finite.
(b). For any $a : \mathbb{R}^s \to \mathbb{R}$ with $E[a(X_j)^2] < \infty$, there is an array $(\gamma_K \in \mathbb{R}^K)_{K=1}^\infty$ such that as $K \to \infty$, $E[(a(X_j) - q^K(X_j)^\tau \gamma_K)^2] \to 0$.

Notice that Assumption 14(b) is not too restrictive an assumption on the distribution of $X_j$. It is true for common approximating functions under mild conditions on the distribution of $X_j$ (see Bierens (2013) and De Villiers (2012)). So we can work with a countable collection of unconditional moment conditions which is equivalent to the original conditional moment restriction (for a proof of this equivalence, see Donald, Imbens and Newey (2003), Lemma 2.1), in the form of
$$E[\rho(Z_1, \theta^*) \otimes q^K(X_1)] = 0 \tag{4.3}$$
for every $K \in \mathbb{N}$. They proposed an EL estimator with unconditional moment restrictions in the form of (4.3) that grow in number and variety with the sample size.

The intuition is that by letting $K$ grow with the sample size, all information in the conditional moment restriction is eventually accounted for, and therefore we can construct a sequence of tests based on the countable collection of unconditional moment restrictions (4.3). Then, as in the introduction, after choosing an increasing sequence $(k_n)_{n=1}^\infty$ with $\lim_{n \to \infty} k_n = \infty$, we define our first version of the EL profile likelihood function $\ell_b$ on $\Theta$ (here $b$ stands for basis functions) by
$$\ell_b(\theta) = \inf\left\{-\sum_{j=1}^n \log w_j : (w_1, \cdots, w_n) \in S_n,\ \sum_{j=1}^n w_j\,\rho(Z_j, \theta) \otimes q^{k_n}(X_j) = 0\right\}, \quad \theta \in \Theta. \tag{4.4}$$
The EL estimator for $\theta^*$ is defined to be the approximate global minimizer of $\ell_b$ over $\Theta$, i.e. $\hat\theta_b \equiv \operatorname{argmin}_{\theta \in \Theta} \ell_b(\theta)$, and the approximate minimum of $\ell_b(\theta)$ can be used to form a test statistic for the specification test of restriction (4.1). The BFEL profile likelihood function is the optimal value function of a parametric optimization problem with parameter $\theta \in \Theta$. For the estimator to have the desired asymptotic properties, we need to put restrictions on the growth rate of $k_n$.
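As a purely illustrative sketch of how the moment vectors $\rho(z_j, \theta) \otimes q^{K}(x_j)$ entering (4.4) might be assembled, the following Python fragment uses a monomial basis and a hypothetical scalar residual function for a linear conditional mean model. The names `monomial_basis`, `kron`, `rho_linear` and `moment_vectors` are assumptions introduced only for this example and do not come from the cited papers.

```python
def monomial_basis(x, K):
    """Return q^K(x) = (1, x, ..., x^(K-1)) for a scalar conditioning variable x."""
    return [x ** k for k in range(K)]


def kron(a, b):
    """Kronecker product of two vectors stored as Python lists."""
    return [a_i * b_k for a_i in a for b_k in b]


def rho_linear(z, theta):
    """Hypothetical residual rho(z, theta) = y - theta_0 - theta_1 * x (so J = 1)."""
    y, x = z
    return [y - theta[0] - theta[1] * x]


def moment_vectors(data, theta, K):
    """Build g_j = rho(z_j, theta) (x) q^K(x_j); each g_j has length J * K."""
    return [kron(rho_linear(z, theta), monomial_basis(z[1], K)) for z in data]


# Toy data points z_j = (y_j, x_j); the numbers are illustrative only.
data = [(5.0, 2.0), (1.0, 0.0), (2.5, 1.0)]
g = moment_vectors(data, theta=(1.0, 1.0), K=3)
```

With $J = 1$ and $K = 3$ each vector has three entries; in general the dimension $J \times K$ grows with $K$, which is why the choice of $k_n$ matters both for asymptotics and for computation.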
We need to put another high-level assumption on the distribution of $X_1$ and the approximating functions we use.

Assumption 15. For each $K \in \mathbb{N}$, there exist a constant $\zeta(K) > K^{1/2}$ and a matrix $B(K)$ such that, defining $\tilde q^K(x) = B(K) q^K(x)$ for all $x \in \mathrm{supp}(X_1)$, we have:
(1). $\sup_{x \in \mathrm{supp}(X_1)} \|\tilde q^K(x)\| \le \zeta(K)$;
(2). There exists some small number $\delta > 0$ such that for all $K \in \mathbb{N}$,
$$\inf\left\{\xi^\tau E[\tilde q^K(X_1)\tilde q^K(X_1)^\tau]\,\xi : \xi \in S_K\right\} > \delta.$$

Assumption 15(2) says that $E[\tilde q^K(X_1)\tilde q^K(X_1)^\tau]$ has its smallest eigenvalue bounded away from zero uniformly in $K$. The constant $\zeta(K)$ in the assumption is important in determining the growth rate of $(k_n)_{n=1}^\infty$ we should choose in order to have the desired asymptotic results. Explicit formulas for $\zeta$ are available for each set of approximating functional bases. For summarized results on these explicit formulas, see Donald, Imbens and Newey (2003). For more details, see Andrews (1991) and Newey (1997). Another implication is that a different $(k_n)_{n=1}^\infty$ should be chosen if we use a different functional basis.

The optimization problem to compute the realized $\ell_b(\theta)$ at each $\theta$ is called the inner loop optimization, and the optimization of the realized $\ell_b(\theta)$ over $\Theta$ is called the outer loop optimization. These are standard terms in the literature. Notice that the inner loop is a convex programming problem for which reliable algorithms are available. The outer loop is a general nonlinear programming problem. In both numerical optimization and the derivation of asymptotic properties, we consider the dual optimization problem to the inner loop problem. That is, for each $\theta \in \Theta$, we have
$$\ell_b(\theta) = \sup_{\gamma \in \Gamma_b(\theta)} \sum_{j=1}^n \log\big(1 + \gamma^\tau \rho(Z_j, \theta) \otimes q^{k_n}(X_j)\big), \tag{4.5}$$
where $\Gamma_b(\theta) \equiv \big\{\gamma \in \mathbb{R}^{J k_n} : 1 + \gamma^\tau\big(\rho(Z_j, \theta) \otimes q^{k_n}(X_j)\big) > 0,\ j = 1, \cdots, n\big\}$. This dual optimization problem is more convenient to work with for both purposes, because the dimension of the choice variable, which is just the Lagrange multiplier, is much smaller than $n$. In practice, to construct the profile likelihood
function, we use a convex programming algorithm to solve (4.5).

Now we outline Kitamura, Tripathi and Ahn (2004)'s approach. Given our random sample $((Z_j, X_j))_{j=1}^n$, we consider empirical likelihood localized at some $x \in \mathrm{supp}(X_1)$. Now (4.1) gives $E[\rho(Z_1, \theta^*) \mid X_1 = x] = 0$. Given our random sample $((Z_j, X_j))_{j=1}^n$, we could form a random subsample localized at $x$, i.e. the collection of points in $((Z_j, X_j))_{j=1}^n$ such that $X_j = x$. In finite samples, it is very likely that this subsample is empty. A basic principle behind the kernel smoothing approach to nonparametric statistics is that we can take a bandwidth $b > 0$ and treat all $Z_j$'s whose observed $X_j$'s fall into a neighborhood around $x$ with radius $b$ as observations of the random vector $Z$ when we know that $X$ is equal to $x$. For each $i = 1, \ldots, n$, we can construct a localized subsample in this way. But as in other kernel-based methods, we can let data points in $((Z_j, X_j))_{j=1}^n$ with $X_j$ closer to $x$ receive "more weight". This is achieved by using a kernel function $\kappa : \mathbb{R}^s \to \mathbb{R}_+$. It is assumed that the kernel function $\kappa$ we use satisfies the following assumption. Most popular kernels satisfy this assumption.

Assumption 16. For any $x = (x^{(1)}, \ldots, x^{(s)}) \in \mathbb{R}^s$, $\kappa(x) = \prod_{i=1}^s \tilde\kappa(x^{(i)})$. Here $\tilde\kappa : \mathbb{R} \to \mathbb{R}$ is a continuously differentiable p.d.f. with support $[-1, 1]$. $\tilde\kappa$ is symmetric about the origin and, for some $a \in (0, 1)$, bounded away from zero on $[-a, a]$.

For each $X_i$, we define a vector of weights
$$\pi_{i,j} \equiv \frac{\kappa\left(\frac{X_j - X_i}{b_n}\right)}{\sum_{l=1}^n \kappa\left(\frac{X_l - X_i}{b_n}\right)}, \quad j = 1, \ldots, n,$$
where $(b_n)_{n=1}^\infty$ is a null sequence. $\pi_{i,j} > 0$ if and only if $X_j$ is within the $b_n$-neighborhood of $X_i$. To have the desired asymptotic results, we need to put more restrictions on the rate of $(b_n)_{n=1}^\infty$. Then we can define the localized (at $X_i$) EL profile function to be
$$\ell_i(\theta) \equiv \inf\left\{-\sum_{j=1}^n \pi_{i,j}\log(w_j) : (w_1, \cdots, w_n) \in S_n,\ \sum_{j=1}^n w_j \rho(Z_j, \theta) = 0\right\}, \quad \theta \in \Theta. \tag{4.6}$$
We can also consider the dual optimization problem, where the dimension of the choice variable is much smaller. For each $\theta \in \Theta$,
$$\ell_i(\theta) = \sup_{\gamma \in \Gamma_i(\theta)} \sum_{j=1}^n \pi_{i,j}\log\big(1 + \gamma^\tau \rho(Z_j, \theta)\big), \tag{4.7}$$
where $\Gamma_i(\theta) \equiv \big\{\gamma \in \mathbb{R}^J : 1 + \gamma^\tau \rho(Z_j, \theta) > 0 \text{ for all } j \text{ such that } \pi_{i,j} > 0\big\}$. As in all EL-based methods, we use (4.7) both when deriving asymptotic properties and when constructing the profile likelihood function in practice. Notice that for any $x \in \mathrm{supp}(X_1)$ the kernel estimator for $h(x)$ is $\hat h(x) = \frac{1}{n b_n^s}\sum_{j=1}^n \kappa\left(\frac{X_j - x}{b_n}\right)$. We define $T_{i,n} \equiv 1\big[\hat h(X_i) > b_n^\varsigma\big]$ for some $\varsigma \in (0, 1)$, a trimming factor that is used to deal with the "denominator problem" of this kernel-based method. Then we define the kernel smoothing based EL profile function $\ell_k$ to be
$$\ell_k(\theta) = \sum_{i=1}^n T_{i,n}\,\ell_i(\theta), \quad \theta \in \Theta,$$
and we can define an estimator as an approximate minimizer $\hat\theta_k \equiv \operatorname{argmin}_{\theta \in \Theta} \ell_k(\theta)$.

Working with KSEL carries a much heavier computational burden than working with BFEL, since to evaluate the KSEL profile likelihood function at any point we need to solve $n$ convex programming problems.

4.4 Problem of Infeasible Inner Loop Optimizations

A problem that we can encounter in practice when using either of the two methods and constructing the profile likelihood function is that, with our realized data points $((z_j, x_j) \in \mathbb{R}^d \times \mathbb{R}^s)_{j=1}^n$ in finite samples, the constraint set in (4.4),
$$\left\{(w_1, \cdots, w_n) \in S_n : \sum_{j=1}^n w_j\,\rho(z_j, \theta) \otimes q^{k_n}(x_j) = 0\right\}, \tag{4.8}$$
can be empty at possibly many points, even at all the points in $\Theta$. If at some point $\theta \in \Theta$ the set (4.8) is empty, we should set the value of the realized $\ell_b$ to be $\infty$ (by the convention that $\inf \emptyset = \infty$). Emptiness of (4.8) is equivalent to saying that the convex hull of the $n$ real vectors $\big(\rho(z_j, \theta) \otimes q^{k_n}(x_j)\big)_{j=1}^n$ does not contain the origin. This is also known as the "convex hull problem" of EL-based methods (Owen (2001), Kitamura (2006)).
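To make the link between the convex hull condition and the dual problems (4.5) and (4.7) concrete, here is a minimal Python sketch for the special case of a scalar moment function ($J = 1$, no basis expansion). It is not the algorithm used in the cited papers: the function name `inner_loop_scalar`, the bisection scheme and the tolerance are assumptions made for illustration. The sketch also assumes the strict condition that the origin lies strictly between the smallest and largest moment values; otherwise the dual objective is unbounded and the function returns infinity.

```python
import math


def inner_loop_scalar(g, weights=None, tol=1e-12):
    """Maximize sum_j w_j * log(1 + gamma * g_j) over gamma for scalar g_j.

    Returns (value, gamma). If 0 is not strictly inside [min g, max g]
    (taken over points with positive weight), returns (inf, None).
    """
    if weights is None:
        weights = [1.0] * len(g)
    pts = [(w, x) for w, x in zip(weights, g) if w > 0]
    g_min = min(x for _, x in pts)
    g_max = max(x for _, x in pts)
    if not (g_min < 0.0 < g_max):
        return math.inf, None  # convex hull condition fails

    # Feasible multipliers satisfy 1 + gamma * g_j > 0 for every retained j.
    lo, hi = -1.0 / g_max, -1.0 / g_min

    def score(gamma):
        # Derivative of the concave dual objective; strictly decreasing in gamma.
        return sum(w * x / (1.0 + gamma * x) for w, x in pts)

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    gamma = 0.5 * (lo + hi)
    value = sum(w * math.log(1.0 + gamma * x) for w, x in pts)
    return value, gamma


value_ok, gamma_ok = inner_loop_scalar([-1.0, 2.0])
value_bad, gamma_bad = inner_loop_scalar([1.0, 2.0, 3.0])
```

In the first call the two moment values straddle zero and the maximizer is interior; in the second all values are positive, so the check fails and the profile value is infinite. Passing kernel weights through `weights` mimics the localized objective in (4.7), where only points with positive weight enter the hull check.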
The feasible region of the outer loop optimization problem is by definition the set of points in $\Theta$ at which the profile likelihood function is not $\infty$.

The fact that (4.8) is empty (in other words, that the inner loop is an infeasible optimization problem) at a proportion of points in $\Theta$ may have consequences that prevent us from obtaining a reasonably accurate global minimizer. There are several reasons. First, with our data in finite samples, the analytical properties of the realized profile likelihood function as a function of $\theta \in \Theta$ are largely unknown before numerical optimization is carried out. It is often not possible to specify the feasible region beforehand. Second, it can be hard to find a feasible initial point, but we need our initial point to be feasible when carrying out the outer loop optimization. For a general nonlinear constrained programming problem, if we cannot immediately find a feasible initial point from the setup, finding a feasible initial point or determining whether a point is feasible can be hard. In fact, we often need to solve another optimization problem. In the case of EL-based methods, we need to solve a linear programming problem. We will discuss this algorithm later. Except in simpler cases when $\theta \mapsto \rho(z, \theta)$ is linear for each $z$ (then the outer loop feasible region is convex, but can still be empty), we do not have more analytical results about the outer loop feasible region that can be derived from parametric optimization theory (Bank et al. (1983)). According to our simulation results, it is possible for the outer loop feasible region to consist of disjoint pieces. This causes another difficulty when trying to solve the outer loop optimization.
All these factors can make it very difficult to approach the true global optimizer of the outer loop problem without sophisticated algorithms and much computational power.

In practice, we numerically solve the dual optimization problem
$$\sup_{\gamma \in \Gamma_b(\theta)} \sum_{j=1}^n \log\big(1 + \gamma^\tau \rho(z_j, \theta) \otimes q^{k_n}(x_j)\big)$$
instead of the primal to get a value for the profile likelihood function at each point $\theta \in \Theta$. By the separating hyperplane theorem,
$$\sup_{\gamma \in \Gamma_b(\theta)} \sum_{j=1}^n \log\big(1 + \gamma^\tau \rho(z_j, \theta) \otimes q^{k_n}(x_j)\big) = \infty$$
if and only if the constraint set (4.8) is empty. So in our case the primal and dual optimal values agree even when the inner loop primal is infeasible. This creates another potential problem: it becomes possible to get spurious results in the outer loop optimization. As noticed by Kitamura (2006), a naive algorithm may wrongly return a point in the infeasible region as a local minimizer. It is possible to get a spurious result if we do not try sufficiently many different initial points.

For the kernel smoothing based method, we have the same problem. We solve
$$\sup_{\gamma \in \Gamma_i(\theta)} \sum_{j=1}^n \pi_{i,j}\log\big(1 + \gamma^\tau \rho(z_j, \theta)\big)$$
to obtain the value of the localized EL profile likelihood function at $\theta$. Then we can see that
$$\sup_{\gamma \in \Gamma_i(\theta)} \sum_{j=1}^n \pi_{i,j}\log\big(1 + \gamma^\tau \rho(z_j, \theta)\big) = \infty$$
if and only if the convex hull of a subcollection of the vectors $(\rho(z_j, \theta))_{j=1}^n$ (those vectors corresponding to the $x_j$'s within the $b_n$-neighborhood of $x_i$) does not contain the origin. Recall that the EL profile likelihood function is defined to be $\ell_k(\theta) = \sum_{i=1}^n T_{i,n}\,\ell_i(\theta)$. For any $\theta \in \Theta$, if one of the terms $T_{i,n}\,\ell_i(\theta)$ is equal to $\infty$, then $\theta$ is not a feasible point of the outer loop optimization. We then face the same difficulty when carrying out the outer loop and searching for an estimate.

Therefore, we can see that, compared to the EL method for ordinary moment restriction models, solving the problem of infeasible inner loops is more important if we use EL-based methods to work with conditional moment restriction models.
For the functional basis based method, the use of the functional basis tends to make the dimension of the effective unconditional moment restrictions used in the construction of the EL profile likelihood function much larger. It therefore becomes more difficult for the convex hull of the vectors $\big(\rho(z_j, \theta) \otimes q^{k_n}(x_j)\big)_{j=1}^n$ (of dimension $J \times k_n$) to contain the origin. For the kernel smoothing based method, constructing a localized EL profile function at each observed data point of the conditioning variable and summing them up tends to squeeze our sample size. For $\ell_k(\theta)$ to be real-valued, we need the convex hull condition to be satisfied by all $n$ different subcollections of the vectors $(\rho(z_j, \theta))_{j=1}^n$.

In even worse cases, from an analytical point of view it is not possible to have an estimate, because the outer loop feasible region is empty. This may happen when we make conventional choices of the variety parameter $k_n$ and bandwidth $b_n$. We run some simulations to demonstrate. For a collection of vectors $(y_j)_{j=1}^n$ in $\mathbb{R}^v$, a popular algorithm to determine whether the origin is in the convex hull of $(y_j)_{j=1}^n$ is the two-phase simplex method (see Vanderbei (2008)). This algorithm is very stable and reliable. For $v = 1$, it simplifies to checking the maximum and minimum of $(y_j)_{j=1}^n$. We adopt the simulation setup of Section 4.7 (which is the same as the simulation setup in Kitamura, Tripathi and Ahn (2004)). We choose the variety parameter $k_n$ and bandwidth $b_n$ as suggested in the literature. For a $100 \times 100$ grid over $[0, 2] \times [0, 2]$, looking at each point in the grid one by one, we check whether this point is a feasible point of the outer loops (both BFEL and KSEL). We made a large number of replications. These simulation results are reported in Table 4.1 (see footnote 8). They show that if we use kernel smoothing, we are much more likely to be subject to the convex hull problem. Surprisingly, almost all of the points in all of the replications fail to be feasible. BFEL, by contrast, is more robust to the convex hull problem.
The simulation results give strong evidence of the difficulty of applying these existing EL-based methods to conditional moment restriction models, and they suggest that solving this problem is clearly necessary.

Footnote 8: In these simulations, we generate data for the case $N = 200$ and the case $N = 50$ separately.

Table 4.1: Average of Fractions of Convex-Hull-Condition-Failing Points (500 Replications)

Method             Tuning parameter     N = 50    N = 200
Kernel Smoothing   b_N = N^{-1/4}       0.9987    0.9999
Kernel Smoothing   b_N = N^{-1/14}      0.9775    0.9876
Basis Functions    b_N = N^{1/4}        0.0965    0.0112

4.5 Pseudo Observation Adjustment

We adopt the pseudo observation adjustment method of Chen, Variyath and Abraham (2008). Denote $g_j(\theta) = \rho(Z_j,\theta)\otimes q^{k_n}(X_j)$ for $1 \le j \le n$. We choose an adjustment tuning parameter $a_n > 0$, which can be data-dependent, and define
\[
g_{n+1}(\theta) \equiv -\frac{a_n}{n}\sum_{j=1}^{n}\rho(Z_j,\theta)\otimes q^{k_n}(X_j). \tag{4.9}
\]
We propose adding this pseudo observation and defining an adjusted EL profile likelihood function. The pseudo observation can take other forms, but a stochastic order condition must be satisfied (see Chen, Variyath and Abraham (2008)).

The adjusted BFEL (ABFEL) estimator for $\theta^*$ is defined to be the approximate minimizer of the ABFEL profile likelihood function $\ell_{ab}$, defined as
\[
\ell_{ab}(\theta) \equiv \inf\Big\{-\sum_{j=1}^{n+1}\log w_j : (w_1,\dots,w_{n+1})\in S_{n+1},\ \sum_{j=1}^{n+1} w_j g_j(\theta) = 0\Big\},\quad \theta\in\Theta. \tag{4.10}
\]
By construction of this AEL function, for every $\omega\in\Omega$ the duality gap of this minimization problem is zero, and therefore the function also has a representation in dual form: for every $\theta\in\Theta$,
\[
\ell_{ab}(\theta) = \sup_{\lambda\in\Gamma_{ab}(\theta)}\sum_{j=1}^{n+1}\log\big(1+\lambda^{\tau}g_j(\theta)\big),\qquad
\Gamma_{ab}(\theta) = \big\{\gamma\in\mathbb{R}^{Jk_n} : 1+\gamma^{\tau}g_j(\theta) > 0 \text{ for } 1\le j\le n+1\big\}.
\]
Observe that for any realized data points $((y_j,x_j)\in\mathbb{R}^d\times\mathbb{R}^s)_{j=1}^{n}$, the constraint set of the inner loop optimization problem,
\[
\Big\{(w_1,\dots,w_{n+1})\in S_{n+1} : \sum_{j=1}^{n} w_j\,\rho(y_j,x_j,\theta)\otimes q^{k_n}(x_j) + w_{n+1}\Big(-\frac{a_n}{n}\sum_{j=1}^{n}\rho(y_j,x_j,\theta)\otimes q^{k_n}(x_j)\Big) = 0\Big\}, \tag{4.11}
\]
is always nonempty, since $\big(\tfrac1n,\dots,\tfrac1n,\tfrac1{a_n}\big)\big/\big(1+\tfrac1{a_n}\big)$ satisfies the constraint and lies in the (relative) interior of $S_{n+1}$. In finite samples, it can be shown that the ABFEL profile likelihood function is pointwise below the BFEL profile likelihood function, and that it is continuous everywhere on $\Theta$ if $\rho(z,\cdot)$ is continuous on $\Theta$ for every $z\in\operatorname{supp}(Z_1)$.

Since the constraint set (4.8) is always nonempty, the dual problem that we solve to construct the profile likelihood function must be bounded, and existence of a solution is always guaranteed. Chen, Sitter and Wu (2002) proposed a modified Newton-Raphson algorithm and proved its algorithmic convergence. Convergence of their iterative method is guaranteed because the dual problem is a bounded convex programming problem and therefore has a solution. For any realized data points, the feasible region of the realized AEL profile likelihood function is the whole of $\Theta$. Good initial values from $\Theta$ are still needed when we use a nonlinear optimization algorithm to find an estimate, but the problem of a disconnected feasible region no longer arises, and any point in $\Theta$ is a valid initial point for the outer loop optimization.
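The feasibility argument above can be checked numerically. The sketch below is illustrative only: the helper names and the toy vectors are hypothetical, and the toy data are chosen so that the origin lies outside the convex hull of the original observations, the case in which unadjusted EL fails.

```python
def pseudo_observation(g, a_n):
    """g_{n+1} = -(a_n / n) * sum_j g_j, computed coordinate by coordinate."""
    n, dim = len(g), len(g[0])
    return [-a_n / n * sum(row[k] for row in g) for k in range(dim)]


def feasible_weights(n, a_n):
    """The point (1/n, ..., 1/n, 1/a_n) / (1 + 1/a_n) of the simplex S_{n+1}."""
    scale = 1.0 + 1.0 / a_n
    return [1.0 / n / scale] * n + [1.0 / a_n / scale]


# Toy moment vectors, all in the positive orthant, so their hull excludes 0.
g = [[1.0, 2.0], [3.0, 0.5], [2.0, 4.0]]
a_n = 0.5
augmented = g + [pseudo_observation(g, a_n)]
w = feasible_weights(len(g), a_n)

# The weighted moment sum_j w_j g_j should vanish in every coordinate.
moment = [sum(wj * gj[k] for wj, gj in zip(w, augmented)) for k in range(2)]
print(sum(w), moment)
```

The weights are strictly positive and sum to one, and the weighted moment is zero up to rounding, so the constraint set contains this point for any data and any $a_n > 0$.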
If we are interested in obtaining an estimate from the data and use AEL to do so, it is believed that in many cases computation is considerably more efficient than when the optimizer of the original EL profile likelihood function is used as the estimate. We then define the ABFEL estimator for $\theta^*$ to be an approximate minimizer of $\ell_{ab}$, $\hat\theta_{ab} \equiv \operatorname{argmin}_{\theta\in\Theta}\ell_{ab}(\theta)$, where the approximation error is understood in an appropriate probabilistic sense.

For the KSEL method, we define pseudo observations in almost the same way. The difference is that we add one pseudo observation when constructing each of the $n$ localized EL functions. Denote $\rho_j(\theta) = \rho(Z_j,\theta)$. For each $i = 1,\dots,n$, we define a pseudo observation
\[
\rho_{n+1}(\theta) = -a_n\sum_{j=1}^{n}\pi_{i,j}\,\rho_j(\theta) \tag{4.12}
\]
($\rho_{n+1}$ also depends on $i$; we suppress this in the notation) and the localized EL profile likelihood function
\[
\ell_{ia}(\theta) = \inf\Big\{-\sum_{j=1}^{n+1}\pi_{i,j}\log(w_j) : (w_1,\dots,w_{n+1})\in S_{n+1},\ \sum_{j=1}^{n+1} w_j\rho_j(\theta) = 0\Big\},\quad \theta\in\Theta, \tag{4.13}
\]
where we put $\pi_{i,n+1} = 1/n$. As before, it is easy to see that at each $\theta\in\Theta$ the convex hull of the vectors in $(\rho_j(\theta))_{j=1}^{n+1}$ with $\pi_{i,j} > 0$ contains the origin, and thus the dual optimization problem
\[
\sup_{\gamma\in\Gamma_{ia}(\theta)}\sum_{j=1}^{n+1}\pi_{i,j}\log\big(1+\gamma^{\tau}\rho_j(\theta)\big),\qquad
\Gamma_{ia}(\theta) = \big\{\gamma\in\mathbb{R}^{J} : 1+\gamma^{\tau}\rho_j(\theta) > 0 \text{ for all } j \text{ such that } \pi_{i,j} > 0\big\},
\]
is bounded. We define the AKSEL profile likelihood function to be
\[
\ell_{ak}(\theta) = \sum_{i=1}^{n}\ell_{ia}(\theta),\quad \theta\in\Theta.
\]
The AKSEL estimator is defined to be an approximate minimizer of $\ell_{ak}$ on $\Theta$, i.e. $\hat\theta_{ak} \equiv \operatorname{argmin}_{\theta\in\Theta}\ell_{ak}(\theta)$.

Each of the original methods has one tuning parameter to be chosen by the econometrician, which can depend on the sample size and on the data: the bandwidth for KSEL and the variety parameter for BFEL. Each of our AEL-based methods has two tuning parameters to choose.
We prove that, under restrictions on the stochastic order of $a_n$, our AEL-based estimator (AKSEL or ABFEL) is consistent and asymptotically normal, and that the $\chi^2$ approximation to the distribution of the likelihood ratio statistic is asymptotically valid.

4.6 Asymptotic Properties

4.6.1 Asymptotic Properties of ABFEL

Our next task is to show that all asymptotic results for the EL-based point estimator and specification test statistic remain valid for the AEL-based ones. If they share the same asymptotic properties, using AEL has essentially the same justification as using EL. Below, we provide three theorems about the first-order asymptotic properties of AEL. The first-order properties of AEL are the same as those of EL (see footnote 9). Some assumptions on the true DGP are given below; they are exactly the same set of assumptions as in Donald, Imbens and Newey (2003). Some restrictions on the choice of the tuning parameters $k_n$ and $a_n$ are needed for the AEL-based method to maintain the desired first-order properties. We choose $k_n$ first and then $a_n$. The rate restrictions on $k_n$ are similar to those in the previous literature, and they leave a wide range of choices for $a_n$. Assumptions 17, 18 and 19 can also be found in Donald, Imbens and Newey (2003).

Assumption 17.
(a) $E\big[\sup_{\theta\in\Theta}\|\rho(Z_1,\theta)\|^2 \,\big|\, X_1\big]$ is bounded w.p.1.
(b) There is some constant $\gamma > 2$ with $E\big[\sup_{\theta\in\Theta}\|\rho(Z_1,\theta)\|^{\gamma}\big] < \infty$.
(c) There exist $\delta(Z_1)$ with $E[\delta(Z_1)^2] < \infty$ and $\alpha > 0$ such that for all $\theta',\theta''\in\Theta$,
\[
\|\rho(Z_1,\theta') - \rho(Z_1,\theta'')\| \le \delta(Z_1)\,\|\theta'-\theta''\|^{\alpha}.
\]
Let $\rho_l$ denote the $l$-th coordinate of the vector-valued function $\rho$.

Footnote 9: In fact, since we can make the tuning parameter $a_n$ data-dependent, we have an additional degree of flexibility. This important aspect makes it possible for AEL to achieve better asymptotic properties than the original EL. For higher-order theories of AEL for unconditional moment restriction models, see Liu and Chen (2010) and Matsushita and Otsu (2012).

Assumption 18.
(a) $\theta^*\in\operatorname{int}(\Theta)$.
(b) $\rho(Z_1,\cdot)$ is twice continuously differentiable in a neighborhood $\mathcal{N}$ of $\theta^*$ and, with probability 1, both $E\big[\sup_{\theta\in\mathcal{N}}\|\nabla_\theta\rho(Z_1,\theta)^{\tau}\|^2 \,\big|\, X_1\big]$ and $E\big[\sum_{l=1}^{J}\|\nabla_{\theta\theta'}\rho_l(Z_1,\theta^*)^{\tau}\|^2 \,\big|\, X_1\big]$ are bounded.
(c) $E\big[E[\nabla_\theta\rho(Z_1,\theta^*)^{\tau}\,|\,X_1]^{\tau}\,E[\nabla_\theta\rho(Z_1,\theta^*)^{\tau}\,|\,X_1]\big]$ is nonsingular.

Assumption 19.
(a) $E[\rho(Z_1,\theta^*)\rho(Z_1,\theta^*)^{\tau}\,|\,X_1]$ has smallest eigenvalue bounded away from zero.
(b) For a neighborhood $\mathcal{N}$ of $\theta^*$, $E\big[\sup_{\theta\in\mathcal{N}}\|\nabla_\theta\rho(Z_1,\theta)^{\tau}\|^4 \,\big|\, X_1\big]$ is bounded.
(c) For a neighborhood $\mathcal{N}$ of $\theta^*$, there exists some $\delta(Z_1)$ with $E[\delta(Z_1)^2] < \infty$ such that for all $\theta\in\mathcal{N}$,
\[
\|\rho(Z_1,\theta) - \rho(Z_1,\theta^*)\| \le \delta(Z_1)\,\|\theta-\theta^*\|.
\]

Next we provide three theorems on the asymptotic properties of the ABFEL method. These correspond to the three main asymptotic results in Donald, Imbens and Newey (2003). We denote the ABFEL estimator by $\hat\theta_{ab}$, which for each sample size $n\in\mathbb{N}$ is defined to be the approximate global minimizer of the AEL profile likelihood function $\ell_{ab}$ defined in (4.10). We can then prove consistency and asymptotic normality of the ABFEL estimator.

Theorem 4.1. Assume that Assumptions 13, 14, 15, 17, 19 are satisfied by the true DGP and that the tuning parameters satisfy (1) $\frac{k_n}{n} = O\big(\frac{1}{\zeta(k_n)^2 n^{2/\gamma}}\big)$ and (2) $a_n = o_p\big(\sqrt{n/k_n}\big)$. Then $\hat\theta_{ab}\to_p\theta^*$.

Theorem 4.2. Assume that Assumptions 13, 14, 15, 17, 18, 19 are satisfied by the true DGP and that the tuning parameters satisfy (1) $\frac{k_n}{n} = O\big(\frac{1}{\zeta(k_n)^2 n^{2/\gamma}}\big)$, (2) $\frac{\zeta(k_n)^2 k_n^2}{n} = o(1)$, and (3) $a_n = o_p\big(\frac{n}{k_n^{1/2}\zeta(k_n)}\big)$. Then the AEL estimator is asymptotically normal: $n^{1/2}(\hat\theta_{ab}-\theta^*)\to_d \mathrm{Normal}(0,\Sigma^*)$, where $\Sigma^*$ is the semiparametric efficiency bound.

Notice that the asymptotic covariance matrix can be consistently estimated by $\hat\Sigma \equiv \big(\hat G^{\tau}\hat W\hat G\big)^{-1}$, where $\hat G \equiv \frac1n\sum_{j=1}^{n}\nabla_\theta g_j(\hat\theta_{ab})$ and $\hat W \equiv \big(\frac1n\sum_{j=1}^{n} g_j(\hat\theta_{ab})g_j(\hat\theta_{ab})^{\tau}\big)^{-1}$ (A proof of consistency of these covariance estimators can be found in Donald, Imbens and Newey (2003). 
Their proof extends easily to the case where $\hat\theta_{ab}$ is used to construct the estimator instead of the original EL-based estimator proposed in their paper).

Another statistic of interest is the statistic for testing the specification hypothesis. The ABFEL-based specification test statistic is defined to be
\[
\hat T_{ab} \equiv 2\sup_{\gamma\in\Gamma_{ab}(\hat\theta_{ab})}\sum_{j=1}^{n+1}\log\big(1+\gamma^{\tau}g_j(\hat\theta_{ab})\big),
\]
which is just twice the optimal value obtained from the outer loop optimization. We have the following asymptotic result.

Theorem 4.3. Assume that Assumptions 13, 14, 15, 17, 18, 19 are satisfied by the true DGP and that the tuning parameters satisfy (1) $\frac{k_n}{n} = O\big(\frac{1}{\zeta(k_n)^2 n^{2/\gamma}}\big)$, (2) $\frac{\zeta(k_n)^2 k_n^3}{n} = o(1)$, and (3) $a_n = o_p\big(\frac{n^{1/2}}{k_n^{3/4}}\big)$. Then
\[
\frac{\hat T_{ab} - (Jk_n - L)}{\sqrt{2(Jk_n - L)}} \to_d \mathrm{Normal}(0,1). \tag{4.14}
\]
An implication of (4.14) is $\lim_{n\to\infty}P\big[\hat T_{ab} > q_{\alpha,Jk_n-L}\big] = \alpha$, where $q_{\alpha,Jk_n-L}$ is the $1-\alpha$ quantile of $\chi^2(Jk_n-L)$ (this is easy to show by applying a uniform convergence result for the normal approximation; see Ash and Doleans-Dade (2000)). This means that the $\chi^2$ approximation to the distribution of the test statistic remains valid even when the number of unconditional moment restrictions grows.

4.6.2 Asymptotic Properties of AKSEL

For the first-order asymptotic results of Kitamura, Tripathi and Ahn (2004) to hold with the pseudo observation adjustment, we need a more restrictive condition on the rate of $a_n$. First, we impose assumptions on the distribution of $(X_1,Z_1)$. Let $h:\mathbb{R}^s\to\mathbb{R}_+$ be the density function of $X_1$.

Assumption 20.
(a) $E\big[\sup_{\theta\in\Theta}\|\rho(Z_1,\theta)\|^{m}\big] < \infty$ for some $m > 8$.
(b) $h$ is strictly positive everywhere and twice continuously differentiable, with $\sup_{x\in\mathbb{R}^s}h(x) < \infty$, $\sup_{x\in\mathbb{R}^s}\|\nabla_x h(x)\| < \infty$ and $\sup_{x\in\mathbb{R}^s}\|\nabla_{xx^{\tau}}h(x)\| < \infty$.
(c) $E\big[\|X_1\|^{1+\varrho}\big] < \infty$ for some $\varrho > 1$.
(d) $\theta\mapsto\rho(Z_1,\theta)$ is continuous on $\Theta$ w.p.1 and $E\big[\sup_{\theta\in\Theta}\|\nabla_\theta\rho(Z_1,\theta)^{\tau}\|\big] < \infty$.
(e) $(\theta,x)\mapsto\|\nabla_{xx^{\tau}}E[\rho_l(Z_1,\theta)\,|\,X = x]\|$ is uniformly bounded on $\Theta\times\mathbb{R}^s$ for $1\le l\le J$.
(f) For some neighborhood $\mathcal{N}$ of $\theta^*$, the smallest eigenvalue of $V(x,\theta)$ is bounded away from zero (by some constant $\sigma > 0$) uniformly on $\mathbb{R}^s\times\mathcal{N}$. The largest eigenvalue of $V(x,\theta)$ is bounded above on $\mathbb{R}^s\times\mathcal{N}$.

Notice that Assumption 20 (a) is stronger than the moment assumption on $\sup_{\theta\in\Theta}\|\rho(Z_1,\theta)\|$ needed for BFEL. Assumptions 20 (a) and 20 (b) together are somewhat weaker than Assumption 14, which is the high-level assumption restricting the distribution of the conditioning random vector. For Assumption 14 to hold with conventional basis functions, we need more restrictive assumptions on the distribution of $X$. The theoretical constants $m$ and $\varrho$ appear in the restrictions on the rate at which the bandwidth decreases. We also notice that BFEL does not need an assumption on the smoothness of $x\mapsto E[\rho(Z_1,\theta)\,|\,X = x]$ such as Assumption 20 (e).

Assumption 21. There exists a neighborhood $\mathcal{N}$ of $\theta^*$ such that:
(a) $\theta\mapsto D(X_1,\theta)$ and $\theta\mapsto V(X_1,\theta)$ are continuous on $\mathcal{N}$ w.p.1.
(b) $E\big[\sup_{\theta\in\Theta}\|\nabla_\theta\rho(Z_1,\theta)\|^{\eta}\big] < \infty$ for some $\eta > 4$.
(c) $\sup_{x\in\mathbb{R}^s}\|\nabla_x D_{ij}(x,\theta^*)\| < \infty$ and $\sup_{(x,\theta)\in\mathbb{R}^s\times\mathcal{N}}\|\nabla_{xx^{\tau}}D_{ij}(x,\theta)\| < \infty$ for all $1\le i\le L$, $1\le j\le J$.
(d) $\sup_{x\in\mathbb{R}^s}\|\nabla_x V_{ij}(x,\theta^*)\| < \infty$ and $\sup_{(x,\theta)\in\mathbb{R}^s\times\mathcal{N}}\|\nabla_{xx^{\tau}}V_{ij}(x,\theta)\| < \infty$ for all $1\le i\le J$, $1\le j\le J$.

Theorem 4.4. Assume that Assumptions 13, 16, 20 hold. If $\varrho > 2/m$, $a_n = O_p(1)$ and the following restrictions on $b_n$ are satisfied for some $\beta\in(0,1)$,
\[
\max\Big\{\frac{n^{\beta}}{n\,b_n^{3s/2+2\varsigma}},\ \frac{1}{n^{\varrho}\,b_n^{\varsigma}},\ \frac{n^{\beta}}{n\,b_n^{((m+2)/(m-2))s}},\ \frac{n^{\beta+1/m}}{n\,b_n^{s+2\varsigma}}\Big\}\to 0, \tag{4.15}
\]
then $\hat\theta_{ak}\to_p\theta^*$.

Recall that $\varsigma\in(0,1)$ is the parameter we choose in the trimming factor $T_{i,n} \equiv 1[\hat h(X_i) > b_n^{\varsigma}]$. Notice that for AKSEL we need a more restrictive assumption on the rate of the adjustment tuning parameter.

For asymptotic normality, notice that we can use the proof techniques of Newey and Smith (2004) and drop Assumption 3.6 of Kitamura, Tripathi and Ahn (2004). 
We can prove directly that, under the other assumptions, if $\hat\lambda_i$ is a random vector that attains $\sup_{\gamma\in\Gamma_{ia}(\hat\theta_{ak})}\sum_{j=1}^{n+1}\pi_{i,j}\log\big(1+\gamma^{\tau}\rho_j(\hat\theta_{ak})\big)$, then $\max_{1\le i\le n}\|\hat\lambda_i\| \le n^{-1/m}$ w.p.a.1.

Theorem 4.5. Assume that Assumptions 13, 16, 20, 21 hold. If $\varrho > \max\{1/\eta + 1/2,\ 2/m + 1/2\}$, $a_n = O_p(1)$ and, in addition to (4.15), the bandwidth satisfies for some $\beta\in(0,1)$
\[
\max\Big\{\frac{n^{2\beta}}{n\,b_n^{5s/2+6\varsigma}},\ \frac{1}{n^{2\varrho-1/\eta-1/m-1/2}\,b_n^{2\varsigma}},\ \frac{1}{n^{2\varrho-3/m-1/2}\,b_n^{3\varsigma}}\Big\}\to 0,
\]
then $n^{1/2}(\hat\theta_{ak}-\theta^*)\to_d\mathrm{Normal}(0,\Sigma^*)$, where $\Sigma^*$ is the semiparametric efficiency bound.

4.7 Monte Carlo Experiment

We use a Monte Carlo experiment to compare the performance of ABFEL and AKSEL. This comparison is also new to the literature. It is interesting to compare these two EL-based methods and see which one performs better in finite samples. Notice that without the pseudo observation adjustment, the problem of infeasible inner loops can make the Monte Carlo comparison invalid.

We adopt the same simulation design as Kitamura, Tripathi and Ahn (2004): a univariate linear model with heteroskedastic errors. The setup is
\[
Y_j = \beta_1 + \beta_2 X_j + U_j,\qquad U_j = \epsilon_j\sqrt{0.1 + 0.2X_j + 0.3X_j^2},
\]
where we set $\beta_1 = \beta_2 = 1$, and $X_j\sim\mathrm{lognormal}(0,1)$ and $\epsilon_j\sim\mathrm{Normal}(0,1)$ are drawn independently. The number of replications is 500. As in Kitamura, Tripathi and Ahn (2004), we use infeasible GLS as the baseline. By referring to the results of Kitamura, Tripathi and Ahn (2004), we can then also compare ABFEL and AKSEL to Newey's semiparametric efficient IV estimator. We run two experiments, with $n = 50$ and $n = 200$. In each experiment, we show results for two different kernels (Epanechnikov and quartic) and two different sets of basis functions (polynomials and splines). We also choose the adjustment parameter $a_n$ to be 0.1 and $\log(n)/2$, the latter being used in the simulations of Chen, Variyath and Abraham (2008). For AKSEL, we report results for two bandwidths, 0.5 and 1, similar to the results reported in Kitamura, Tripathi and Ahn (2004). 
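The data-generating process and the infeasible GLS baseline described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code behind Tables 4.2 and 4.3: the function names and the seed are hypothetical, and infeasible GLS is implemented here as weighted least squares with the true conditional variance $0.1 + 0.2x + 0.3x^2$ as weights.

```python
import math
import random


def simulate(n, rng, beta1=1.0, beta2=1.0):
    """Draw (X_j, Y_j) from the heteroskedastic linear design."""
    xs = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
    ys = [
        beta1 + beta2 * x
        + rng.gauss(0.0, 1.0) * math.sqrt(0.1 + 0.2 * x + 0.3 * x * x)
        for x in xs
    ]
    return xs, ys


def infeasible_gls(xs, ys):
    """Weighted least squares using the true conditional variance."""
    ws = [1.0 / (0.1 + 0.2 * x + 0.3 * x * x) for x in xs]
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def rmse(estimates, truth):
    """Root mean square error of a list of estimates around the truth."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
slopes = [infeasible_gls(*simulate(200, rng))[1] for _ in range(500)]
baseline_rmse = rmse(slopes, 1.0)
print(baseline_rmse)
```

A relative RMSE as reported in the tables would then be the RMSE of the estimator of interest divided by `baseline_rmse` for the same coefficient and sample size.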
For ABFEL, we simply choose the variety tuning parameter to be $k_n = \lfloor n^{1/4}\rfloor$.

In Table 4.2, we report the RMSE (root mean square error) of AKSEL relative to infeasible GLS. In Table 4.3, we report the RMSE of ABFEL relative to infeasible GLS. The level of the RMSE of infeasible GLS is similar to that reported in Kitamura, Tripathi and Ahn (2004).

The simulation results show that, for this linear model, the finite-sample performance of AKSEL is generally better than that of ABFEL, and that for both AKSEL and ABFEL a smaller adjustment parameter is better. Referring to Kitamura, Tripathi and Ahn (2004), we find that the performance of AKSEL is comparable to that of the two-step estimator of Newey (1990) in terms of RMSE.

Table 4.2: Relative RMSE of AKSEL

                                       n = 50         n = 200
Kernel         Bandwidth  Adjustment   RMSE           RMSE
Epanechnikov   0.5        0.1          1.51  1.58     1.84  1.79
Epanechnikov   0.5        1            1.96  1.66     2.38  1.95
Epanechnikov   1          0.1          1.13  1.10     1.51  1.29
Epanechnikov   1          1            1.46  1.37     1.45  2.25
Quartic        0.5        0.1          1.13  1.04     1.42  1.28
Quartic        0.5        1            1.21  1.07     1.15  1.25
Quartic        1          0.1          1.08  1.04     1.22  1.20
Quartic        1          1            1.32  1.25     1.49  1.45

Table 4.3: Relative RMSE of ABFEL

                            n = 50         n = 200
Basis         Adjustment    RMSE           RMSE
Polynomials   0.1           1.62  1.49     1.94  1.84
Polynomials   1             1.53  1.40     1.77  1.65
Splines       0.1           1.88  1.61     1.82  1.60
Splines       1             2.01  1.54     2.32  2.05

4.8 Conclusion

In this paper, we showed that the pseudo observation adjustment of Chen, Variyath and Abraham (2008), which solves the problem of infeasible inner loop optimization in the computation of the EL profile likelihood function for moment restriction models, can be extended to the more general case of EL-based methods for conditional moment restriction models. We argued that this problem causes serious practical difficulties when EL-based methods are used in empirical applications. 
We considered two separate EL-based methods and showed the validity of the pseudo observation adjustment for each. A Monte Carlo experiment demonstrates that the two new estimators have good properties in finite samples.

4.9 Proofs

4.9.1 Proofs of Theorems in Section 4.6.1

The proofs of the theorems use many common techniques and lemmas that are also used in the proofs of the results in Donald, Imbens and Newey (2003) (abbreviated as DIN in the following proofs). Our proofs will therefore be brief. One difference is that, by using a lemma that gives a stronger order assessment for $\max_{1\le j\le n}|Y_j|$ for an i.i.d. process $(Y_j)_{j=1}^{\infty}$, we can strengthen the results in DIN in the sense that a weaker restriction on the growth rate of the tuning parameter $(k_n\in\mathbb{N})_{n=1}^{\infty}$ is needed. Let $(\Omega,\mathcal{F},P)$ denote the underlying probability space. For a symmetric $k\times k$ matrix $A$, $\lambda_{\max}(A)$ denotes the largest eigenvalue and $\lambda_{\min}(A)$ denotes the smallest eigenvalue. For matrices, $\|\cdot\|$ denotes the Frobenius norm, that is, $\|A\| = (\operatorname{trace}(A^{\tau}A))^{1/2}$. For real vectors, $\|\cdot\|$ denotes the Euclidean norm.

Notice that on the linear space of symmetric real matrices, $A\mapsto\lambda_{\max}(A)$ is also a matrix norm, and it satisfies $\|Ax\|\le\lambda_{\max}(A)\|x\|$. Notice also that all norms on a finite-dimensional vector space are equivalent. "With probability approaching one" will be abbreviated as w.p.a.1. "Infinitely often with respect to n" will be abbreviated as "i.o.-n"; similarly, "almost always with respect to n" will be abbreviated as "a.a.-n". "const" denotes a generic positive constant that can differ from place to place. FOC is short for "first order condition". 
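"p.d." and "p.s.d." are short for positive definite and positive semi-definite.

We also use the following short-form notation:
\[
g_j(\theta) \equiv \rho(Z_j,\theta)\otimes q^{k_n}(X_j);\quad g_j \equiv \rho(Z_j,\theta^*)\otimes q^{k_n}(X_j);\quad \hat g(\theta) \equiv \frac1n\sum_{j=1}^{n}g_j(\theta);\quad g_{n+1}(\theta) \equiv -a_n\hat g(\theta);
\]
\[
\Lambda(\theta) \equiv \big\{\lambda\in\mathbb{R}^{Jk_n} : 1+\lambda^{\tau}g_j(\theta) > 0,\ j = 1,\dots,n\big\};\qquad
\Gamma_{ab}(\theta) \equiv \big\{\lambda\in\mathbb{R}^{Jk_n} : 1+\lambda^{\tau}g_j(\theta) > 0,\ j = 1,\dots,n,n+1\big\}.
\]
Let $\hat\theta_{ab}$ be the sequence of ABFEL estimators, defined as the approximate minimizer of $\ell_{ab}(\theta)$ over $\theta\in\Theta$. We first provide some lemmas that are useful in the proofs of the theorems. We will also use some lemmas from DIN directly. For lemmas that are identical to those in DIN, proofs can be found in DIN and are not repeated here.

Lemma 4.1. Let $(Y_j)_{j=1}^{\infty}$ be an i.i.d. stochastic process with $E[|Y_j|^{\gamma}] < \infty$ for some $\gamma > 2$. Then $(Z_n \equiv \max_{1\le j\le n}|Y_j|)_{n=1}^{\infty}$ is $o_p(n^{1/\gamma})$.

Proof. By Rosenthal (2000) Proposition 4.2.9 and the i.i.d. assumption, for any $m\in\mathbb{N}$ we have
\[
\sum_{k=1}^{\infty}P\big[m^{\gamma}|Y_k|^{\gamma} > k\big] = \sum_{k=1}^{\infty}P\big[m^{\gamma}|Y_1|^{\gamma} > k\big] = E\big[\lfloor m^{\gamma}|Y_1|^{\gamma}\rfloor\big] < \infty.
\]
Therefore, by the Borel-Cantelli lemma, $P\big([|mY_n| > n^{1/\gamma}]\ \text{i.o.-}n\big) = 0$. Hence, with probability 1, $m\max_{1\le j\le n}|Y_j| < n^{1/\gamma}$ a.a.-$n$. Therefore,
\[
P\Big(\bigcap_{m=1}^{\infty}\big\{[Z_n/n^{1/\gamma} < 1/m]\ \text{a.a.-}n\big\}\Big) = 1.
\]
By definition, we have $Z_n/n^{1/\gamma}\to_{a.s.}0$.

Notice that, by construction of our EL profiling step, there must exist some $\tilde\lambda$ such that, for all $\omega\in\Omega$, $\tilde\lambda$ attains $\sup_{\gamma\in\Gamma_{ab}(\tilde\theta)}\sum_{j=1}^{n+1}\log(1+\gamma^{\tau}g_j(\tilde\theta))$ for all $n\in\mathbb{N}$. Next we give an order assessment for $\tilde\lambda$.

Lemma 4.2. Suppose $\tilde\theta = \theta^* + O_p(\tau_n)$, where $\tau_n = o(k_n^{-1})$, and $\|\hat g(\tilde\theta)\| = O_p\big(\sqrt{k_n/n}\big)$. Suppose we also have the rate restrictions $\sqrt{k_n/n} = O\big(n^{-1/\gamma}\zeta(k_n)^{-1}\big)$ and $a_n = o_p\big(\sqrt{n/k_n}\big)$. Let $\tilde\lambda_n$ be a random vector that attains $\sup_{\gamma\in\Gamma_{ab}(\tilde\theta)}\sum_{j=1}^{n+1}\log(1+\gamma^{\tau}g_j(\tilde\theta))$. Then $\tilde\lambda = O_p\big(\sqrt{k_n/n}\big)$.

Proof. We have
\[
\sum_{j=1}^{n}\frac{g_j(\tilde\theta)}{1+\tilde\lambda_n^{\tau}g_j(\tilde\theta)} + \frac{g_{n+1}(\tilde\theta)}{1+\tilde\lambda_n^{\tau}g_{n+1}(\tilde\theta)} = 0
\]
everywhere on $\Omega$. Denote $g_j(\tilde\theta)$ by $\tilde g_j$. Put $\xi \equiv \tilde\lambda_n/\|\tilde\lambda_n\|$. 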
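Then we have
\[
\begin{aligned}
0 &= \frac{\xi^{\tau}}{n}\sum_{j=1}^{n+1}\frac{\tilde g_j}{1+\tilde\lambda^{\tau}\tilde g_j}
 = \frac{\xi^{\tau}}{n}\sum_{j=1}^{n+1}\Big(\tilde g_j - \frac{(\tilde\lambda^{\tau}\tilde g_j)\tilde g_j}{1+\tilde\lambda^{\tau}\tilde g_j}\Big) \\
 &= \xi^{\tau}\Big(\frac1n\sum_{j=1}^{n}\tilde g_j + \frac1n\tilde g_{n+1}\Big) - \frac{\xi^{\tau}}{n}\sum_{j=1}^{n+1}\frac{(\tilde\lambda^{\tau}\tilde g_j)\tilde g_j}{1+\tilde\lambda^{\tau}\tilde g_j} \\
 &= \xi^{\tau}\frac1n\sum_{j=1}^{n}\tilde g_j\Big(1-\frac{a_n}{n}\Big) - \frac{\|\tilde\lambda\|}{n}\sum_{j=1}^{n+1}\frac{(\xi^{\tau}\tilde g_j)^2}{1+\tilde\lambda^{\tau}\tilde g_j}.
\end{aligned}
\]
Therefore, we have
\[
\|\tilde\lambda\|\big(\xi^{\tau}\tilde S^{\star}\xi\big) = \xi^{\tau}\frac1n\sum_{j=1}^{n}\tilde g_j\Big(1-\frac{a_n}{n}\Big) - \frac{\|\tilde\lambda\|}{n}\frac{(\xi^{\tau}\tilde g_{n+1})^2}{1+\tilde\lambda^{\tau}\tilde g_{n+1}}
\le \Big\|\frac1n\sum_{j=1}^{n}\tilde g_j\Big\|\Big(1-\frac{a_n}{n}\Big),
\]
where $\tilde S^{\star} \equiv \frac1n\sum_{j=1}^{n}\frac{\tilde g_j\tilde g_j^{\tau}}{1+\tilde\lambda^{\tau}\tilde g_j}$. Let $\tilde S \equiv \frac1n\sum_{j=1}^{n}\tilde g_j\tilde g_j^{\tau}$. Notice that by construction $1+\tilde\lambda^{\tau}\tilde g_j > 0$ for every $1\le j\le n+1$. It is easy to see that $\tilde S^{\star}\big(1+\max_{1\le j\le n}\tilde\lambda^{\tau}\tilde g_j\big) - \tilde S$ is p.s.d. Therefore, by applying the Cauchy-Schwarz inequality, we have
\[
\begin{aligned}
\|\tilde\lambda\|\big(\xi^{\tau}\tilde S\xi\big)
 &\le \|\tilde\lambda\|\big(\xi^{\tau}\tilde S^{\star}\xi\big)\Big(1+\max_{1\le j\le n}\tilde\lambda^{\tau}\tilde g_j\Big)
 \le \|\tilde\lambda\|\big(\xi^{\tau}\tilde S^{\star}\xi\big)\Big(1+\max_{1\le j\le n}\|\tilde\lambda\|\,\|\tilde g_j\|\Big) \\
 &\le \Big\|\frac1n\sum_{j=1}^{n}\tilde g_j\Big\|\Big(1-\frac{a_n}{n}\Big)\Big(1+\|\tilde\lambda\|\max_{1\le j\le n}\|\tilde g_j\|\Big).
\end{aligned}
\]
Solving for $\|\tilde\lambda\|$ gives
\[
\|\tilde\lambda\|\Big(\xi^{\tau}\tilde S\xi - \Big\|\frac1n\sum_{j=1}^{n}\tilde g_j\Big\|\Big(1-\frac{a_n}{n}\Big)\max_{1\le j\le n}\|\tilde g_j\|\Big) \le \Big\|\frac1n\sum_{j=1}^{n}\tilde g_j\Big\|\Big(1-\frac{a_n}{n}\Big).
\]
We have
\[
\max_{1\le j\le n}\|\tilde g_j\| \le \zeta(k_n)\max_{1\le j\le n}\|\tilde\rho_j\| = \zeta(k_n)\,o_p(n^{1/\gamma})
\]
by Lemma 4.1. Let
\[
W \equiv -\Big\|\frac1n\sum_{j=1}^{n}\tilde g_j\Big\|\Big(1-\frac{a_n}{n}\Big)\max_{1\le j\le n}\|\tilde g_j\|.
\]
Then by the assumptions $\|\hat g(\tilde\theta)\| = O_p\big(\sqrt{k_n/n}\big)$ and $a_n = o_p\big(\sqrt{n/k_n}\big)$, we have $W = o_p(1)$. We now have $\|\tilde\lambda\|\big(\xi^{\tau}\tilde S\xi + W\big) \le O_p\big(\sqrt{k_n/n}\big)$. By DIN Lemma A6, $\xi^{\tau}\tilde S\xi + W \ge \mathrm{const}$ w.p.a.1. Therefore $\frac{1}{\xi^{\tau}\tilde S\xi + o_p(1)} = O_p(1)$ and $\|\tilde\lambda\| \le O_p\big(\sqrt{k_n/n}\big)$.

Lemma 4.3. Suppose $\tilde\theta = \theta^* + O_p(\tau_n)$, where $\tau_n = o(k_n^{-1})$, and $\|\hat g(\tilde\theta)\| = O_p\big(\sqrt{k_n/n}\big)$. Suppose we also have the rate restrictions $\sqrt{k_n/n} = O\big(n^{-1/\gamma}\zeta(k_n)^{-1}\big)$ and $a_n = o_p\big(\sqrt{n/k_n}\big)$. Then
\[
\sup_{\gamma\in\Gamma_{ab}(\tilde\theta)}\frac1n\sum_{j=1}^{n+1}\log\big(1+\gamma^{\tau}g_j(\tilde\theta)\big) = O_p\Big(\frac{k_n}{n}\Big).
\]

Proof. A mean value expansion, with $\dot\lambda$ a random vector lying between $\tilde\lambda$ and $0$, gives
\[
\frac1n\sum_{j=1}^{n+1}\log\big(1+\tilde\lambda^{\tau}g_j(\tilde\theta)\big) = \tilde\lambda^{\tau}\hat g(\tilde\theta) - \frac12\tilde\lambda^{\tau}\Big(\frac1n\sum_{j=1}^{n}\frac{g_j(\tilde\theta)g_j(\tilde\theta)^{\tau}}{(1+\dot\lambda^{\tau}g_j(\tilde\theta))^2}\Big)\tilde\lambda + \frac1n\log\big(1+\tilde\lambda^{\tau}g_{n+1}(\tilde\theta)\big).
\]
It is easy to see that $\max_{1\le j\le n}|\dot\lambda^{\tau}g_j(\tilde\theta)| = o_p(1)$ if $\sqrt{k_n/n} = O\big(n^{-1/\gamma}\zeta(k_n)^{-1}\big)$. Therefore
\[
\frac1n\sum_{j=1}^{n+1}\log\big(1+\tilde\lambda^{\tau}g_j(\tilde\theta)\big) \le \|\tilde\lambda\|\,\|\hat g(\tilde\theta)\| - \mathrm{const}\cdot\|\tilde\lambda\|^2 + \frac1n\log\big(1+\tilde\lambda^{\tau}g_{n+1}(\tilde\theta)\big) = O_p\Big(\frac{k_n}{n}\Big) + o_p(n^{-1}).
\]

Lemma 4.4. Let $\Gamma_n \equiv \big\{\lambda : \|\lambda\|\le\sqrt{k_n/n}\big\}$. Then $\max_{1\le j\le n}\sup_{\theta\in\Theta,\lambda\in\Gamma_n}|\lambda^{\tau}g_j(\theta)|\to_p 0$ as $n\to\infty$, and
\[
P\Big[\Gamma_n\subseteq\bigcap_{\theta\in\Theta}\Gamma_{ab}(\theta)\Big]\to 1
\]
as $n\to\infty$.

Proof. 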
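It is clear that $\max_{1\le j\le n}\sup_{\theta\in\Theta}\|g_j(\theta)\| = o_p(n^{1/\gamma})\zeta(k_n)$. Then by the Cauchy-Schwarz inequality, we have
\[
\max_{1\le j\le n}\sup_{\theta\in\Theta,\lambda\in\Gamma_n}|\lambda^{\tau}g_j(\theta)| \le \sqrt{\frac{k_n}{n}}\max_{1\le j\le n}\sup_{\theta\in\Theta}\|g_j(\theta)\| = o_p(1).
\]
Therefore $P\big[\Gamma_n\subseteq\bigcap_{\theta\in\Theta}\Gamma_{ab}(\theta)\big] \ge P\big[\max_{1\le j\le n}\sup_{\theta\in\Theta,\lambda\in\Gamma_n}|\lambda^{\tau}g_j(\theta)| < 1\big]\to 1$ as $n\to\infty$.

The next result is key to the proof of consistency of the AEL estimator.

Lemma 4.5. $\big\|\frac1n\sum_{j=1}^{n}g_j(\hat\theta_{ab})\big\| = O_p\big(\sqrt{k_n/n}\big)$.

Proof. By the dominating condition, we have $\big\|\frac1n\sum_{j=1}^{n}g_j(\hat\theta_{ab})\big\| \le \frac1n\sum_{j=1}^{n}\sup_{\theta\in\Theta}\|g_j(\theta)\| = O_p(\zeta(k_n))$. We need to refine this result. Let $\bar\lambda \equiv \sqrt{k_n/n}\;\hat g(\hat\theta_{ab})/\|\hat g(\hat\theta_{ab})\|$. By Lemma 4.4, $\max_{1\le j\le n}|\bar\lambda^{\tau}g_j(\hat\theta_{ab})|\to_p 0$ and $P\big[\bar\lambda\in\Gamma_{ab}(\hat\theta_{ab})\big]\to 1$ as $n\to\infty$. Thus, for any $\dot\lambda$ that is a convex combination of $0$ and $\bar\lambda$, we similarly have $\max_{1\le j\le n}\big(1+\dot\lambda^{\tau}g_j(\hat\theta_{ab})\big)^{-2}\to_p 1$. Moreover,
\[
\sup_{\|\lambda\|\le\sqrt{k_n/n},\,\theta\in\Theta}|\lambda^{\tau}g_{n+1}(\theta)| \le \sqrt{\frac{k_n}{n}}\sup_{\theta\in\Theta}\Big\|-\frac{a_n}{n}\sum_{j=1}^{n}g_j(\theta)\Big\| \le a_n\sqrt{\frac{k_n}{n}}\,\frac1n\sum_{j=1}^{n}\sup_{\theta\in\Theta}\|g_j(\theta)\| = o_p(1)
\]
if $a_n = o_p\big(\sqrt{n/k_n}\big)$ under the assumptions. Therefore we have
\[
P\big[1+\bar\lambda^{\tau}g_{n+1}(\hat\theta_{ab}) > 0\big]\to 1\quad\text{and}\quad P\big[\bar\lambda\in\Gamma_{ab}(\hat\theta_{ab})\big]\to 1
\]
as $n\to\infty$. On a sequence of events whose probability goes to 1 as $n\to\infty$, an expansion gives, where $\dot\lambda$ is a mean value that could be different for each row,
\[
\begin{aligned}
\frac1n\sum_{j=1}^{n+1}\log\big(1+\bar\lambda^{\tau}g_j(\hat\theta_{ab})\big)
 &= \bar\lambda^{\tau}\hat g(\hat\theta_{ab}) - \frac12\bar\lambda^{\tau}\Big(\frac1n\sum_{j=1}^{n}\frac{g_j(\hat\theta_{ab})g_j(\hat\theta_{ab})^{\tau}}{(1+\dot\lambda^{\tau}g_j(\hat\theta_{ab}))^2}\Big)\bar\lambda + \frac1n\log\big(1+\bar\lambda^{\tau}g_{n+1}(\hat\theta_{ab})\big) \\
 &\ge \sqrt{\frac{k_n}{n}}\,\|\hat g(\hat\theta_{ab})\| - \mathrm{const}\cdot\frac{k_n}{n} + \frac1n\log\big(1+\bar\lambda^{\tau}g_{n+1}(\hat\theta_{ab})\big).
\end{aligned}
\]
The inequality follows by applying DIN Lemma A12. For the last term,
\[
\frac1n\log\big(1+\bar\lambda^{\tau}g_{n+1}(\hat\theta_{ab})\big) = \frac1n\log\Big(1-\bar\lambda^{\tau}\frac{a_n}{n}\sum_{j=1}^{n}g_j(\hat\theta_{ab})\Big) = \frac1n\log\Big(1-a_n\Big(\frac{k_n}{n}\Big)^{1/2}\Big\|\frac1n\sum_{j=1}^{n}g_j(\hat\theta_{ab})\Big\|\Big) = o_p(n^{-1})
\]
if $a_n = o_p\big(\sqrt{n/k_n}\big)$. By the definition of the estimator $\hat\theta_{ab}$ and the saddle point structure of the estimation method, we have
\[
\begin{aligned}
\sqrt{\frac{k_n}{n}}\,\|\hat g(\hat\theta_{ab})\| - \mathrm{const}\cdot\frac{k_n}{n} + o_p(n^{-1})
 &\le \frac1n\sum_{j=1}^{n+1}\log\big(1+\bar\lambda^{\tau}g_j(\hat\theta_{ab})\big)
 \le \sup_{\lambda\in\Gamma_{ab}(\hat\theta_{ab})}\frac1n\sum_{j=1}^{n+1}\log\big(1+\lambda^{\tau}g_j(\hat\theta_{ab})\big) \\
 &\le \sup_{\lambda\in\Gamma_{ab}(\theta^*)}\frac1n\sum_{j=1}^{n+1}\log\big(1+\lambda^{\tau}g_j(\theta^*)\big).
\end{aligned}
\]
By Lemma A9 of DIN, $\|\hat g(\theta^*)\| = O_p\big(\sqrt{k_n/n}\big)$. Then we can take $\tilde\theta = \theta^*$ and apply Lemma 4.3. 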
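Then we have
\[
\sup_{\lambda\in\Gamma_{ab}(\theta^*)}\frac1n\sum_{j=1}^{n+1}\log\big(1+\lambda^{\tau}g_j(\theta^*)\big) = O_p\Big(\frac{k_n}{n}\Big).
\]
Rearranging terms, we get $\big\|\frac1n\sum_{j=1}^{n}g_j(\hat\theta_{ab})\big\| = O_p\big(\sqrt{k_n/n}\big)$.

Proof of Theorem 4.1. The proof is identical to the proof of Theorem 5.5 of DIN (consistency of the original EL estimator), given that we have shown $\big\|\frac1n\sum_{j=1}^{n}g_j(\hat\theta_{ab})\big\| = O_p\big(\sqrt{k_n/n}\big)$.

Proof of Theorem 4.2. We apply the implicit function theorem (9.28 of Rudin (1976)). For each $\theta$, the real-valued random maximizer $\hat\lambda(\theta)$ that attains $\sup_{\gamma\in\Gamma_{ab}(\theta)}\sum_{j=1}^{n+1}\log(1+\gamma^{\tau}g_j(\theta))$ is a solution to the equations $\sum_{j=1}^{n+1}\frac{g_j(\theta)}{1+\hat\lambda(\theta)^{\tau}g_j(\theta)} = 0$. We know that there must exist a solution in $\hat\Lambda^{\dagger}(\theta)$ for every $\omega\in\Omega$, and this defines an implicit random multivalued function of $\theta$. It is easy to see that $\sum_{j=1}^{n+1}\frac{g_j(\theta)g_j(\theta)^{\tau}}{(1+\lambda^{\tau}g_j(\theta))^2}$ is p.d. if $\sum_{j=1}^{n}g_j(\theta)g_j(\theta)^{\tau}$ is p.d. By the implicit function theorem, $\hat\lambda(\theta)$ is uniquely defined over a small neighborhood around $\theta$ and differentiable at $\theta$. We assumed $\theta^*\in\operatorname{int}(\Theta)$ and we have proved that $\hat\theta_{ab}$ is consistent, so w.p.a.1, $\hat\theta_{ab}\in\operatorname{int}(\Theta)$. Lemma A.15 of DIN also applies to the AEL estimator, so we have $\hat\theta_{ab} = \theta^* + O_p\big(\sqrt{k_n/n}\big)$. Then DIN Lemma A.6 applies and, w.p.a.1, $\frac1n\sum_{j=1}^{n}g_j(\hat\theta_{ab})g_j(\hat\theta_{ab})^{\tau}$ is nonsingular. Therefore, on a sequence of events w.p.a.1, $\hat\theta_{ab}$ satisfies the FOC (see footnote 10):
\[
o_p(n^{-1/2}) = \frac1n\sum_{j=1}^{n+1}\nabla_\theta\log\big(1+\hat\lambda(\hat\theta_{ab})^{\tau}g_j(\hat\theta_{ab})\big) = \frac1n\sum_{j=1}^{n+1}\frac{\nabla_\theta g_j(\hat\theta_{ab})^{\tau}}{1+\hat\lambda(\hat\theta_{ab})^{\tau}g_j(\hat\theta_{ab})}\,\hat\lambda(\hat\theta_{ab}). \tag{4.16}
\]
The first equality is by the envelope theorem. Use $\hat\lambda$ to denote $\hat\lambda(\hat\theta_{ab})$. Expanding the FOC for $\hat\lambda(\hat\theta_{ab})$ around $0$ gives
\[
0 = \frac1n\sum_{j=1}^{n+1}g_j(\hat\theta_{ab}) - \frac1n\sum_{j=1}^{n+1}\frac{g_j(\hat\theta_{ab})g_j(\hat\theta_{ab})^{\tau}}{(1+\dot\lambda^{\tau}g_j(\hat\theta_{ab}))^2}\,\hat\lambda, \tag{4.17}
\]
where $\dot\lambda$ is the mean value. Denote $\check G = \frac1n\sum_{j=1}^{n}\frac{\nabla_\theta g_j(\hat\theta_{ab})^{\tau}}{1+\hat\lambda^{\tau}g_j(\hat\theta_{ab})}$. Then from (4.16), we have
\[
\Big(\check G + \frac1n\,\frac{1}{1+\hat\lambda^{\tau}g_{n+1}(\hat\theta_{ab})}\,\nabla_\theta g_{n+1}(\hat\theta_{ab})^{\tau}\Big)\hat\lambda = o_p(n^{-1/2}). \tag{4.18}
\]
From (4.17), denoting $\check\Omega = \frac1n\sum_{j=1}^{n}\frac{g_j(\hat\theta_{ab})g_j(\hat\theta_{ab})^{\tau}}{(1+\dot\lambda^{\tau}g_j(\hat\theta_{ab}))^2}$ and $\hat g = \frac1n\sum_{j=1}^{n}g_j(\hat\theta_{ab})$, we have
\[
\hat g + \frac1n g_{n+1}(\hat\theta_{ab}) - \Big(\check\Omega + \frac1n\,\frac{g_{n+1}(\hat\theta_{ab})g_{n+1}(\hat\theta_{ab})^{\tau}}{(1+\dot\lambda^{\tau}g_{n+1}(\hat\theta_{ab}))^2}\Big)\hat\lambda = 0. \tag{4.19}
\]
Then from (4.19), we have
\[
\hat\lambda = \check\Omega^{-1}\hat g + \check\Omega^{-1}\frac1n g_{n+1}(\hat\theta_{ab}) - \check\Omega^{-1}\frac1n\,\frac{g_{n+1}(\hat\theta_{ab})g_{n+1}(\hat\theta_{ab})^{\tau}}{(1+\dot\lambda^{\tau}g_{n+1}(\hat\theta_{ab}))^2}\,\hat\lambda. \tag{4.20}
\]

Footnote 10: We assume our estimator satisfies the FOC approximately, to the order of $o_p(n^{-1/2})$.

Plugging the expression (4.20) for $\hat\lambda$ into (4.18), we obtain
\[
o_p(n^{-1/2}) = \check G\check\Omega^{-1}\hat g + \check G\check\Omega^{-1}\frac1n g_{n+1}(\hat\theta_{ab}) - \check G\check\Omega^{-1}\frac1n\,\frac{g_{n+1}(\hat\theta_{ab})g_{n+1}(\hat\theta_{ab})^{\tau}}{(1+\dot\lambda^{\tau}g_{n+1}(\hat\theta_{ab}))^2}\,\hat\lambda + \frac1n\,\frac{1}{1+\hat\lambda^{\tau}g_{n+1}(\hat\theta_{ab})}\,\nabla_\theta g_{n+1}(\hat\theta_{ab})^{\tau}\hat\lambda. \tag{4.21}
\]
Expanding $\hat g$ around $\theta^*$ gives
\[
\check G\check\Omega^{-1}\hat g = \check G\check\Omega^{-1}\dot G(\hat\theta_{ab}-\theta^*) + \check G\check\Omega^{-1}\frac1n\sum_{j=1}^{n}g_j(\theta^*), \tag{4.22}
\]
where $\dot G = \frac1n\sum_{j=1}^{n}\nabla_\theta g_j(\dot\theta)$ for a mean value $\dot\theta$ that could be different in each row. Using the facts $\hat\theta_{ab} = \theta^* + O_p\big(\sqrt{k_n/n}\big)$, $\check G\check\Omega^{-1}\dot G = O_p(1)$ and $\check G\check\Omega^{-1}\frac1n\sum_{j=1}^{n}g_j(\theta^*) = O_p(n^{-1/2})$, we obtain that $\check G\check\Omega^{-1}\hat g$ is $O_p\big(\sqrt{k_n/n}\big)$. Proofs of these facts can be found in the proof of DIN Theorem 5.4. Moreover,
\[
\big|\hat\lambda^{\tau}g_{n+1}(\hat\theta_{ab})\big| \le \|\hat\lambda\|\,\Big\|-\frac{a_n}{n}\sum_{j=1}^{n}g_j(\hat\theta_{ab})\Big\| = o_p(1)
\]
if $a_n = o_p(n/k_n)$, and
\[
\check G\check\Omega^{-1}\frac1n g_{n+1}(\hat\theta_{ab}) = -\frac{a_n}{n}\check G\check\Omega^{-1}\hat g = o_p(n^{-1/2})
\]
if $a_n = o_p(nk_n^{-1/2})$. For the last term in (4.21), under Assumption 15 (b),
\[
\frac1n\,\frac{1}{1+\hat\lambda^{\tau}g_{n+1}(\hat\theta_{ab})}\,\nabla_\theta g_{n+1}(\hat\theta_{ab})^{\tau}\hat\lambda = o_p(n^{-1/2})
\]
if $\frac{a_n}{n}\zeta(k_n)\sqrt{k_n/n} = o_p(n^{-1/2})$. Now we have
\[
\check G\check\Omega^{-1}\dot G(\hat\theta_{ab}-\theta^*) + \check G\check\Omega^{-1}\frac1n\sum_{j=1}^{n}g_j(\theta^*) = o_p(n^{-1/2}).
\]
The same arguments as in the proof of DIN Theorem 5.6 give the remainder of our asymptotic normality proof.

Proof of Theorem 4.3. Expanding around $\lambda = 0$ gives
\[
\hat T_{ab} = 2n\Big[\Big(\hat g + \frac1n g_{n+1}(\hat\theta_{ab})\Big)^{\tau}\hat\lambda - \frac12\hat\lambda^{\tau}\Big(\frac1n\sum_{j=1}^{n+1}\frac{g_j(\hat\theta_{ab})g_j(\hat\theta_{ab})^{\tau}}{(1+\ddot\lambda^{\tau}g_j(\hat\theta_{ab}))^2}\Big)\hat\lambda\Big],
\]
where $\ddot\lambda$ is a mean value that could be different for each row. Let $\ddot\Omega \equiv \frac1n\sum_{j=1}^{n}\frac{g_j(\hat\theta_{ab})g_j(\hat\theta_{ab})^{\tau}}{(1+\ddot\lambda^{\tau}g_j(\hat\theta_{ab}))^2}$. Then we can write
\[
\hat T_{ab} = 2n\Big[\hat g^{\tau}\hat\lambda + \frac1n g_{n+1}(\hat\theta_{ab})^{\tau}\hat\lambda - \frac12\hat\lambda^{\tau}\ddot\Omega\hat\lambda - \frac12\,\frac1n\hat\lambda^{\tau}\frac{g_{n+1}(\hat\theta_{ab})g_{n+1}(\hat\theta_{ab})^{\tau}}{(1+\ddot\lambda^{\tau}g_{n+1}(\hat\theta_{ab}))^2}\hat\lambda\Big]. \tag{4.23}
\]
Using expansion (4.17) again and plugging expression (4.20) into (4.23), we get
\[
\begin{aligned}
\hat T_{ab} = 2n\Big[&\hat g^{\tau}\check\Omega^{-1}\hat g + \hat g^{\tau}\check\Omega^{-1}\frac1n g_{n+1}(\hat\theta_{ab}) - \hat g^{\tau}\check\Omega^{-1}\frac1n\,\frac{g_{n+1}(\hat\theta_{ab})g_{n+1}(\hat\theta_{ab})^{\tau}}{(1+\dot\lambda^{\tau}g_{n+1}(\hat\theta_{ab}))^2}\hat\lambda + \frac1n g_{n+1}(\hat\theta_{ab})^{\tau}\hat\lambda \\
 &- \frac12\,\frac1n\hat\lambda^{\tau}\frac{g_{n+1}(\hat\theta_{ab})g_{n+1}(\hat\theta_{ab})^{\tau}}{(1+\ddot\lambda^{\tau}g_{n+1}(\hat\theta_{ab}))^2}\hat\lambda - \frac12\hat g^{\tau}\check\Omega^{-1}\ddot\Omega\check\Omega^{-1}\hat g \\
 &- \Big(\frac1n g_{n+1}(\hat\theta_{ab})^{\tau}\check\Omega^{-1}\ddot\Omega\check\Omega^{-1}\frac1n g_{n+1}(\hat\theta_{ab})\Big)\Big(1+\frac{\frac1n g_{n+1}(\hat\theta_{ab})^{\tau}\hat\lambda}{(1+\dot\lambda^{\tau}g_{n+1}(\hat\theta_{ab}))^2}\Big)\Big].
\end{aligned}
\]
Under the assumption on the order of $a_n$, $a_n = o_p\big(n^{1/2}/k_n^{3/4}\big)$, all the terms except $2n\hat g^{\tau}\check\Omega^{-1}\hat g$ and $n\hat g^{\tau}\check\Omega^{-1}\ddot\Omega\check\Omega^{-1}\hat g$ are $o_p(k_n^{-1/2})$. 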
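Then we can show the asymptotic normality result in exactly the same way as in the proofs of DIN Theorems 6.4 and 6.3.

4.9.2 Proofs of Theorems in Section 4.6.2

In this section, we denote
\[
\hat V(X_i,\theta) \equiv \sum_{j=1}^{n}\pi_{i,j}\,\rho_j(\theta)\rho_j(\theta)^{\tau}
\quad\text{and}\quad
\hat D(X_i,\theta) \equiv \sum_{j=1}^{n}\pi_{i,j}\,\nabla_\theta\rho_j(\theta)^{\tau}.
\]
Because we need some lemmas from Kitamura, Tripathi and Ahn (2004), we abbreviate that paper as KTA in the proofs. The following two lemmas are analogous to KTA Lemma B1. They show that the Lagrange multiplier of the AKSEL inner loop has the same stochastic properties as that of the KSEL inner loop.

Lemma 4.6. Suppose $a_n = o_p(n)$. Let $\lambda_i$ be a random vector that attains
\[
\sup_{\gamma\in\Gamma_{ia}(\theta^*)}\sum_{j=1}^{n+1}\pi_{i,j}\log\big(1+\gamma^{\tau}\rho_j(\theta^*)\big)
\]
for every $\omega\in\Omega$. Then $\max_{1\le i\le n}T_{i,n}\|\lambda_i\| = o_p\Big(\sqrt{\frac{n^{\beta}}{n\,b_n^{s+2\varsigma}}}\Big) + o_p\Big(\frac{1}{n^{\varrho-1/m}}\Big)$.

Proof. By construction of AKSEL, we know that $\lambda_i$ exists, and for every $\omega\in\Omega$ it must satisfy the first order condition $\sum_{j=1}^{n+1}\pi_{i,j}\frac{\rho_j(\theta^*)}{1+\lambda_i^{\tau}\rho_j(\theta^*)} = 0$. Premultiplying by $\lambda_i^{\tau}$, we have
\[
\begin{aligned}
0 &= \sum_{j=1}^{n+1}\pi_{i,j}\Big(\lambda_i^{\tau}\rho_j(\theta^*) - \frac{(\lambda_i^{\tau}\rho_j(\theta^*))^2}{1+\lambda_i^{\tau}\rho_j(\theta^*)}\Big) \\
 &= \lambda_i^{\tau}\Big(\sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta^*) + \frac1n\rho_{n+1}(\theta^*)\Big) - \sum_{j=1}^{n+1}\pi_{i,j}\frac{(\lambda_i^{\tau}\rho_j(\theta^*))^2}{1+\lambda_i^{\tau}\rho_j(\theta^*)} \\
 &\le \lambda_i^{\tau}\Big(\sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta^*) + \frac1n\rho_{n+1}(\theta^*)\Big) - \sum_{j=1}^{n}\pi_{i,j}\frac{(\lambda_i^{\tau}\rho_j(\theta^*))^2}{1+\lambda_i^{\tau}\rho_j(\theta^*)}.
\end{aligned}
\]
We therefore have
\[
\lambda_i^{\tau}\Big(\sum_{j=1}^{n}\pi_{i,j}\frac{\rho_j(\theta^*)\rho_j(\theta^*)^{\tau}}{1+\lambda_i^{\tau}\rho_j(\theta^*)}\Big)\lambda_i \le \lambda_i^{\tau}\sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta^*)\Big(1-\frac{a_n}{n}\Big) \le \|\lambda_i\|\,\Big\|\sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta^*)\Big\|\Big(1-\frac{a_n}{n}\Big).
\]
It is easy to see that
\[
\lambda_i^{\tau}\Big(\sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta^*)\rho_j(\theta^*)^{\tau}\Big)\lambda_i \le \lambda_i^{\tau}\Big(\sum_{j=1}^{n}\pi_{i,j}\frac{\rho_j(\theta^*)\rho_j(\theta^*)^{\tau}}{1+\lambda_i^{\tau}\rho_j(\theta^*)}\Big)\lambda_i\Big(1+\max_{1\le j\le n}\lambda_i^{\tau}\rho_j(\theta^*)\Big).
\]
Notice that by construction, $1+\lambda_i^{\tau}\rho_j(\theta^*) > 0$ for every $j$. Then we have
\[
\lambda_i^{\tau}\hat V(X_i,\theta^*)\lambda_i \le \|\lambda_i\|\,\Big\|\sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta^*)\Big\|\Big(1-\frac{a_n}{n}\Big)\Big(1+\|\lambda_i\|\max_{1\le j\le n}\|\rho_j(\theta^*)\|\Big). \tag{4.24}
\]
Notice that by Lemma 4.1, we have $\max_{1\le j\le n}\|\rho_j(\theta^*)\| = o_p(n^{1/m})$. Letting $\xi \equiv \lambda_i/\|\lambda_i\|$ in the inequality (4.24), we have
\[
\|\lambda_i\|\Big(\xi^{\tau}\hat V(X_i,\theta^*)\xi - \Big\|\sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta^*)\Big\|\Big(1-\frac{a_n}{n}\Big)\max_{1\le j\le n}\|\rho_j(\theta^*)\|\Big) \le \Big\|\sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta^*)\Big\|\Big(1-\frac{a_n}{n}\Big).
\]
Lemma B.6 of KTA implies that $\max_{1\le i\le n}T_{i,n}\big\|\hat V(X_i,\theta^*) - V(X_i,\theta^*)\big\|\to_p 0$ as $n\to\infty$. Therefore we have $P\big[\min_{1\le i\le n}\xi^{\tau}\hat V(X_i,\theta^*)\xi > \sigma/2\big]\to 1$. 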
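Lemma B.3 of KTA gives that, under our assumption on the bandwidth,
\[
\max_{1\le i\le n}T_{i,n}\Big\|\sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta^*)\Big\| = o_p\Big(\sqrt{\frac{n^{\beta}}{n\,b_n^{s+2\varsigma}}}\Big) + o_p\Big(\frac{1}{n^{\varrho-1/m}}\Big). \tag{4.25}
\]
Under our assumptions, we have $\sqrt{\frac{n^{\beta+2/m}}{n\,b_n^{s+2\varsigma}}}\to 0$ and $\frac{1}{n^{\varrho-2/m}}\to 0$, and therefore
\[
\max_{1\le i\le n}\Big\|\sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta^*)\Big\|\Big(1-\frac{a_n}{n}\Big)\Big(\max_{1\le j\le n}\|\rho_j(\theta^*)\|\Big) = o_p(1).
\]
So, letting
\[
W_n \equiv \min_{1\le i\le n}\xi^{\tau}\hat V(X_i,\theta^*)\xi - \max_{1\le i\le n}\Big\|\sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta^*)\Big\|\Big(1-\frac{a_n}{n}\Big)\Big(\max_{1\le j\le n}\|\rho_j(\theta^*)\|\Big),
\]
we have $P[W_n > \sigma/4]\to 1$ as $n\to\infty$. Therefore, on such a sequence of events, we have
\[
\max_{1\le i\le n}T_{i,n}\|\lambda_i\| \le \max_{1\le i\le n}T_{i,n}\Big\|\sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta^*)\Big\|\Big(1-\frac{a_n}{n}\Big)\frac{4}{\sigma}.
\]
The stochastic order (as $n\to\infty$) of $\max_{1\le i\le n}T_{i,n}\|\lambda_i\|$ is therefore the same as that of $\max_{1\le i\le n}T_{i,n}\big\|\sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta^*)\big\|$, which is given in (4.25).

Lemma 4.7. Suppose $a_n = o_p(n)$. Let $\lambda_i$ be a random vector that attains $\sup_{\gamma\in\Gamma_{ia}(\theta^*)}\sum_{j=1}^{n+1}\pi_{i,j}\log(1+\gamma^{\tau}\rho_j(\theta^*))$ for every $\omega\in\Omega$. Then
\[
T_{i,n}\lambda_i = T_{i,n}\hat V(X_i,\theta^*)^{-1}\sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta^*) + T_{i,n}r_i,
\]
where $\max_{1\le i\le n}T_{i,n}\|r_i\| = o_p\Big(\frac{n^{\beta+1/m}}{n\,b_n^{s+2\varsigma}}\Big) + o_p\Big(\frac{1}{n^{2\varrho-3/m}}\Big)$.

Proof. As in the proof of Lemma 4.6, we have the first order condition
\[
\begin{aligned}
0 &= \sum_{j=1}^{n+1}\pi_{i,j}\frac{\rho_j(\theta^*)}{1+\lambda_i^{\tau}\rho_j(\theta^*)}
 = \sum_{j=1}^{n}\pi_{i,j}\frac{\rho_j(\theta^*)}{1+\lambda_i^{\tau}\rho_j(\theta^*)} + \frac1n\,\frac{\rho_{n+1}(\theta^*)}{1+\lambda_i^{\tau}\rho_{n+1}(\theta^*)} \\
 &= \sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta^*) - \sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta^*)\rho_j(\theta^*)^{\tau}\lambda_i + \sum_{j=1}^{n}\pi_{i,j}\frac{(\lambda_i^{\tau}\rho_j(\theta^*))^2}{1+\lambda_i^{\tau}\rho_j(\theta^*)}\rho_j(\theta^*) + \frac1n\,\frac{\rho_{n+1}(\theta^*)}{1+\lambda_i^{\tau}\rho_{n+1}(\theta^*)}.
\end{aligned}
\]
Consequently, we have
\[
T_{i,n}\lambda_i = T_{i,n}\hat V(X_i,\theta^*)^{-1}\sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta^*) + T_{i,n}\hat V(X_i,\theta^*)^{-1}\Big(\sum_{j=1}^{n}\pi_{i,j}\frac{(\lambda_i^{\tau}\rho_j(\theta^*))^2}{1+\lambda_i^{\tau}\rho_j(\theta^*)}\rho_j(\theta^*) + \frac1n\,\frac{\rho_{n+1}(\theta^*)}{1+\lambda_i^{\tau}\rho_{n+1}(\theta^*)}\Big).
\]
Notice that from the first order condition we have
\[
\sum_{j=1}^{n}\pi_{i,j}\frac{(\lambda_i^{\tau}\rho_j(\theta^*))^2}{1+\lambda_i^{\tau}\rho_j(\theta^*)} = \sum_{j=1}^{n+1}\pi_{i,j}\lambda_i^{\tau}\rho_j(\theta^*) - \frac1n\,\frac{(\lambda_i^{\tau}\rho_{n+1}(\theta^*))^2}{1+\lambda_i^{\tau}\rho_{n+1}(\theta^*)}\rho_{n+1}(\theta^*). \tag{4.26}
\]
Let
\[
r_{1,i} \equiv \hat V(X_i,\theta^*)^{-1}\sum_{j=1}^{n}\pi_{i,j}\frac{(\lambda_i^{\tau}\rho_j(\theta^*))^2}{1+\lambda_i^{\tau}\rho_j(\theta^*)}\rho_j(\theta^*)
\quad\text{and}\quad
r_{2,i} \equiv \hat V(X_i,\theta^*)^{-1}\Big(\frac1n\,\frac{\rho_{n+1}(\theta^*)}{1+\lambda_i^{\tau}\rho_{n+1}(\theta^*)}\Big).
\]
The remainder term is $r_i = r_{1,i} + r_{2,i}$. For the first term,
\[
\begin{aligned}
\Big\|\sum_{j=1}^{n}\pi_{i,j}\frac{(\lambda_i^{\tau}\rho_j(\theta^*))^2}{1+\lambda_i^{\tau}\rho_j(\theta^*)}\rho_j(\theta^*)\Big\|
 &\le \Big(\max_{1\le j\le n}\|\rho_j(\theta^*)\|\Big)\sum_{j=1}^{n}\pi_{i,j}\frac{(\lambda_i^{\tau}\rho_j(\theta^*))^2}{1+\lambda_i^{\tau}\rho_j(\theta^*)} \\
 &\le \Big(\max_{1\le j\le n}\|\rho_j(\theta^*)\|\Big)\|\lambda_i\|\,\Big\|\sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta^*)\Big\|\Big(1-\frac{a_n}{n}\Big) - \Big(\max_{1\le j\le n}\|\rho_j(\theta^*)\|\Big)\frac1n\,\frac{(\lambda_i^{\tau}\rho_{n+1}(\theta^*))^2}{1+\lambda_i^{\tau}\rho_{n+1}(\theta^*)}\rho_{n+1}(\theta^*).
\end{aligned}
\tag{4.27}
\]
For the last term in the above expression, we have
\[
\Big\|\frac1n\,\frac{(\lambda_i^{\tau}\rho_{n+1}(\theta^*))^2}{1+\lambda_i^{\tau}\rho_{n+1}(\theta^*)}\rho_{n+1}(\theta^*)\Big\|
 = \frac{a_n}{n}\,\frac{(\lambda_i^{\tau}\rho_{n+1}(\theta^*))^2}{1+\lambda_i^{\tau}\rho_{n+1}(\theta^*)}\Big\|\sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta^*)\Big\|
 \le \frac{a_n}{n}\Big\|\sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta^*)\Big\|\frac{\|\lambda_i\|^2\,\|\rho_{n+1}(\theta^*)\|^2}{1+\lambda_i^{\tau}\rho_{n+1}(\theta^*)}.
\]
Notice that
\[
\max_{1\le i\le n}\big|\lambda_i^{\tau}\rho_{n+1}(\theta^*)\big| \le a_n\Big(\max_{1\le i\le n}\|\lambda_i\|\Big)\max_{1\le i\le n}\Big\|\sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta^*)\Big\| = o_p(1).
\]
Therefore (on a sequence of events whose probability tends to one) the second term of (4.27) is negligible. By Lemma B.7 of KTA, we have
\[
\max_{1\le i\le n}T_{i,n}\big\|\hat V(X_i,\theta^*)^{-1}\big\| = O_p(1).
\]
For $r_{2,i}$, we have
\[
\Big\|\frac1n\,\frac{\rho_{n+1}(\theta^*)}{1+\lambda_i^{\tau}\rho_{n+1}(\theta^*)}\Big\| = \frac{a_n}{n}\Big\|\sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta^*)\Big\|\,\Big|\frac{1}{1+\lambda_i^{\tau}\rho_{n+1}(\theta^*)}\Big|.
\]
It is easy to see that if $a_n = O_p(1)$, $\max_{1\le i\le n}T_{i,n}\|r_{2,i}\|$ is negligible relative to $\max_{1\le i\le n}T_{i,n}\|r_{1,i}\|$. The stochastic order of $\max_{1\le i\le n}T_{i,n}\|r_i\|$ is determined by $\big(\max_{1\le j\le n}\|\rho_j(\theta^*)\|\big)\big(\max_{1\le i\le n}T_{i,n}\|\lambda_i\|\big)\big(\max_{1\le i\le n}T_{i,n}\big\|\sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta^*)\big\|\big)$. The claimed result then follows readily.

The strategy of the consistency proof is the same as in the proof of KTA Theorem 3.1. Recall that the AKSEL profile likelihood function is
\[
\ell_{ak}(\theta) = \sum_{i=1}^{n}T_{i,n}\sup_{\gamma\in\Gamma_{ia}(\theta)}\sum_{j=1}^{n+1}\pi_{i,j}\log\big(1+\gamma^{\tau}\rho_j(\theta)\big)
\]
with $\Gamma_{ia}(\theta) \equiv \big\{\gamma\in\mathbb{R}^{J} : 1+\gamma^{\tau}\rho_j(\theta) > 0 \text{ for all } j \text{ such that } \pi_{i,j} > 0\big\}$. We define
\[
G_n(\theta) \equiv -\frac1n\ell_{ak}(\theta),\quad\theta\in\Theta.
\]
Let $\lambda_i(\theta)$ denote the point in $\Gamma_{ia}(\theta)$ that attains $\sup_{\gamma\in\Gamma_{ia}(\theta)}\sum_{j=1}^{n+1}\pi_{i,j}\log(1+\gamma^{\tau}\rho_j(\theta))$. Notice that $\lambda_i(\theta)$ is real-valued for each $\theta\in\Theta$ and every $\omega\in\Omega$ by construction of the pseudo observation. Then we have, for each $\theta\in\Theta$,
\[
G_n(\theta) = -\frac1n\sum_{i=1}^{n}T_{i,n}\sum_{j=1}^{n+1}\pi_{i,j}\log\big(1+\lambda_i(\theta)^{\tau}\rho_j(\theta)\big). \tag{4.28}
\]
The AKSEL estimator can be defined as the approximate maximizer of $G_n$ over $\theta\in\Theta$. Notice that $\lambda_i(\theta)$ depends on the whole sample. This is inconvenient, and it means that the consistency proof for a general M-estimator is not directly applicable (see Van der Vaart (1998) Chapter 5). 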
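The method used in KTA is to find another random function $\tilde Q_n$ with a simpler structure such that $G_n$ is majorized by $\tilde Q_n$ everywhere on $\Theta$, on a sequence of events whose probabilities tend to one, and such that there exists another random function $Q_n$ with $\sup_{\theta\in\Theta}\|Q_n(\theta)-\tilde Q_n(\theta)\| \xrightarrow{p} 0$ and the following properties.

The random function $Q_n$ has a uniform probability limit $Q$ on $\Theta$, i.e. $\sup_{\theta\in\Theta}\|Q_n(\theta)-Q(\theta)\| \xrightarrow{p} 0$, such that $\theta^*$ attains the unique maximum of $Q$ in a strong sense, i.e. for every $\delta > 0$, $\sup_{\theta\in B(\theta^*,\delta)^c} Q(\theta) < Q(\theta^*)$. Suppose we can show that for every $\delta > 0$ there exists $H(\delta) > 0$ such that (1) $P\left[\sup_{\theta\in B(\theta^*,\delta)^c} Q_n(\theta) < Q(\theta^*) - H(\delta)\right] \to 1$ as $n\to\infty$ and (2) $G_n(\theta^*) \xrightarrow{p} Q(\theta^*)$. Then we can use standard arguments to show that $\hat\theta_{ak} \xrightarrow{p} \theta^*$.

Proof of Theorem 4.4. Following the strategy, we first define
\[
u(x,\theta) = \frac{E[g(Z,\theta)\mid X=x]}{1+\|E[g(Z,\theta)\mid X=x]\|}.
\]
Notice that $\|u(x,\theta)\| \le 1$ for every $(x,\theta)$. In the definition (4.28), we replace $\lambda_i(\theta)$ by $n^{-1/m}u(X_i,\theta)$ and get the random function $\tilde Q_n$:
\[
\tilde Q_n(\theta) \equiv -\frac{1}{n}\sum_{i=1}^{n} T_{i,n}\sum_{j=1}^{n+1}\pi_{i,j}\log(1+n^{-1/m}u(X_i,\theta)^{\tau}\rho_j(\theta)).
\]
Under the assumption that $E\left[\sup_{\theta\in\Theta}\|\rho(Z_1,\theta)\|^{m}\right] < \infty$, we have $\max_{1\le j\le n}\sup_{\theta\in\Theta}\|\rho(Z_j,\theta)\| = o_p(n^{1/m})$. Therefore, we have
\[
\max_{1\le i,j\le n} n^{-1/m}\sup_{\theta\in\Theta}\|u(X_i,\theta)^{\tau}\rho_j(\theta)\| = o_p(1).
\]
Additionally, if $a_n = O_p(1)$, then we have
\[
\max_{1\le i\le n} n^{-1/m}\sup_{\theta\in\Theta}\|u(X_i,\theta)^{\tau}\rho_{n+1}(\theta)\| \le a_n\max_{1\le i\le n} n^{-1/m}\sum_{j=1}^{n}\pi_{i,j}\sup_{\theta\in\Theta}\|\rho_j(\theta)\| = o_p(1),
\]
where we applied KTA Lemma D5. Fix any $\tilde c\in(0,1)$. Then we have
\[
P\left[\sup_{\theta\in\Theta}\left\|n^{-1/m}u(X_i,\theta)^{\tau}\rho_j(\theta)\right\| < \tilde c \text{ for all } i, j\right] \to 1
\]
as $n\to\infty$. On such a sequence of events, $\tilde Q_n$ is defined everywhere on $\Theta$ and $G_n$ is majorized by $\tilde Q_n$ everywhere on $\Theta$. Consider $\bar Q_n$ defined by
\[
\bar Q_n(\theta) \equiv \frac{1}{n^{1+1/m}}\sum_{i=1}^{n} -T_{i,n}u(X_i,\theta)^{\tau}E[\rho(Z_i,\theta)\mid X_i].
\]
We next need to prove that $\sup_{\theta\in\Theta}\left|n^{1/m}\bar Q_n(\theta) - n^{1/m}\tilde Q_n(\theta)\right| \xrightarrow{p} 0$.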
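Then, by the arguments used in the proof of Theorem 3.1 of KTA, we also have $\sup_{\theta\in\Theta}\left|n^{1/m}Q_n(\theta) - n^{1/m}\tilde Q_n(\theta)\right| \xrightarrow{p} 0$, where
\[
Q_n(\theta) = \frac{1}{n^{1+1/m}}\sum_{i=1}^{n} -u(X_i,\theta)^{\tau}E[\rho(Z_i,\theta)\mid X_i].
\]
By Lemma B8 of KTA, $\sup_{\theta\in\Theta}\left|n^{1/m}\bar Q_n(\theta) - n^{1/m}\tilde Q_n(\theta)\right| \xrightarrow{p} 0$ is true if and only if
\[
\left\|\frac{n^{1/m}}{n}\sum_{i=1}^{n} -T_{i,n}\frac{1}{n}\log(1+n^{-1/m}u(X_i,\theta)^{\tau}\rho_{n+1}(\theta))\right\| = o_p(1)
\]
uniformly. For some random variables $\psi_i\in(0,1)$, $i = 1,\dots,n$, we have the expansion
\[
\frac{n^{1/m}}{n}\sum_{i=1}^{n} T_{i,n}\frac{1}{n}\log(1+n^{-1/m}u(X_i,\theta)^{\tau}\rho_{n+1}(\theta))
= \frac{n^{1/m}}{n}\sum_{i=1}^{n} T_{i,n}\frac{1}{n}\left(n^{-1/m}u(X_i,\theta)^{\tau}\rho_{n+1}(\theta) - \frac{n^{-2/m}\left|u(X_i,\theta)^{\tau}\rho_{n+1}(\theta)\right|^2}{2\left(1+\psi_i n^{-1/m}u(X_i,\theta)^{\tau}\rho_{n+1}(\theta)\right)^2}\right).
\]
The uniform convergence result follows easily from the fact that $\max_{1\le i\le n}\sup_{\theta\in\Theta}\left\|\sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta)\right\| = o(n^{1/m})$. Then, as in the proof of Theorem 3.1 of KTA, for each $\theta\in\Theta$, $n^{1/m}Q_n(\theta)$ is an average of i.i.d. random variables, and its probability limit, denoted by $Q$, can easily be shown to have the desired properties discussed in the remarks. In addition, it is easy to see that $Q(\theta^*) = 0$ by the identifiability assumption. To finish the proof, by using the results on the stochastic order of the AKSEL Lagrange multiplier, i.e. Lemmas 4.6 and 4.7, it is clear that
\[
\begin{aligned}
G_n(\theta^*) &= -\frac{1}{n}\sum_{i=1}^{n} T_{i,n}\sum_{j=1}^{n+1}\pi_{i,j}\log(1+\lambda_i(\theta^*)^{\tau}\rho_j(\theta^*)) \\
&\ge -\frac{1}{n}\sum_{i=1}^{n} T_{i,n}\lambda_i(\theta^*)^{\tau}\sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta^*)\left(1-\frac{a_n}{n}\right) \\
&= o_p\!\left(\frac{n^{\beta}}{n b_n^{s+2\varsigma}}\right) + o_p\!\left(\frac{1}{n^{\varrho-1/m}}\right).
\end{aligned}
\]
Together with $G_n(\theta^*) \le \tilde Q_n(\theta^*) \xrightarrow{p} 0$ and the restriction put on the rate of the bandwidth, this gives $G_n(\theta^*) \xrightarrow{p} 0$.

Next we give a proof of asymptotic normality of the AKSEL estimator. The proof technique used is different from the original proof of KTA.

Lemma 4.8.
\[
\max_{1\le i\le n}\sup_{\gamma\in\Gamma^{i}_{a}(\theta^*)}\sum_{j=1}^{n+1}\pi_{i,j}\log(1+\gamma^{\tau}\rho_j(\theta^*)) = o_p\!\left(\frac{n^{\beta}}{n b_n^{s+2\varsigma}}\right) + o_p\!\left(\frac{1}{n^{2\varrho-2/m}}\right).
\]

Lemma 4.8 gives us an order assessment of $\max_{1\le i\le n} T_{i,n}\left\|\sum_{j=1}^{n}\pi_{i,j}\rho_j(\hat\theta_{ak})\right\|$. The proof technique mimics that of Newey and Smith (2004). This is an important step if we want to drop Assumption 3.6 of KTA.

Lemma 4.9. $\max_{1\le i\le n} T_{i,n}\left\|\sum_{j=1}^{n}\pi_{i,j}\rho_j(\hat\theta_{ak})\right\| = o_p(n^{-1/m})$.

Lemma 4.10. Let $\hat\lambda_i$ be a random vector that attains $\sup_{\gamma\in\Gamma^{i}_{a}(\hat\theta_{ak})}\sum_{j=1}^{n+1}\pi_{i,j}\log(1+\gamma^{\tau}\rho_j(\hat\theta_{ak}))$ for every $\omega\in\Omega$. Then we have $P\left[\max_{1\le i\le n}\|\hat\lambda_i\| \le n^{-1/m}\right] \to 1$ as $n\to\infty$.

Lemma 4.11. For some neighborhood $\mathcal{N}$ around $\theta^*$, we have
\[
\sup_{\theta\in\mathcal{N}}\left\|\frac{1}{n}\sum_{i=1}^{n} T_{i,n}\sum_{j=1}^{n}\pi_{i,j}\nabla_{\theta}\rho_j(\theta)^{\tau}\,\hat V(X_i,\theta)^{-1}\sum_{j=1}^{n}\pi_{i,j}\nabla_{\theta}\rho_j(\theta) - I(\theta)\right\| = o_p(1),
\]
where $I(\theta) \equiv E\left[D(X_1,\theta)^{\tau}V(X_1,\theta)^{-1}D(X_1,\theta)\right]$.

Lemma 4.12. Assume that Assumptions 13, 16, 20 and 21 hold. Suppose that $\varrho > \max\{1/\eta + 1/2,\ 2/m + 1/2\}$ and that, for some $\beta\in(0,1)$, the bandwidth satisfies
\[
\max\left\{\frac{n^{2\beta}}{n b_n^{5s/2+6\varsigma}},\ \frac{1}{n^{2\varrho-1/\eta-1/m-1/2}\,b_n^{2\varsigma}},\ \frac{1}{n^{2\varrho-3/m-1/2}\,b_n^{3\varsigma}}\right\} \to 0.
\]
Then
\[
n^{-1/2}\sum_{i=1}^{n} T_{i,n}\hat D(X_i,\hat\theta_{ak})\hat V(X_i,\hat\theta_{ak})^{-1}\left(\sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta^*)\right) \xrightarrow{d} \mathrm{Normal}(0, I(\theta^*)).
\]

Lemma 4.13. Denote
\[
\check V_i \equiv \frac{1}{n}\sum_{j=1}^{n}\pi_{i,j}\frac{\rho_j(\hat\theta_{ak})\rho_j(\hat\theta_{ak})^{\tau}}{(1+\rho_j(\hat\theta_{ak})^{\tau}\check\lambda_i)^2}
\quad\text{and}\quad
\check D_i \equiv \frac{1}{n}\sum_{j=1}^{n}\pi_{i,j}\frac{\nabla_{\theta}\rho_j(\hat\theta_{ak})}{1+\rho_j(\hat\theta_{ak})^{\tau}\check\lambda_i},
\]
where $\max_{1\le i\le n}\|\check\lambda_i\| = O_p(n^{-1/m})$. If we have
\[
n^{-1/2}\sum_{i=1}^{n} T_{i,n}\check D_i\check V_i^{-1}\sum_{j=1}^{n}\pi_{i,j}\rho_j(\theta^*) + n^{-1/2}\sum_{i=1}^{n} T_{i,n}\check D_i\check V_i^{-1}\sum_{j=1}^{n}\pi_{i,j}\nabla_{\theta}\rho_j(\tilde\theta)(\hat\theta_{ak}-\theta^*) = o_p(1),
\]
then we have $n^{1/2}(\hat\theta_{ak}-\theta^*) \xrightarrow{d} \mathrm{Normal}(0,\Sigma^*)$.

Proof of Theorem 4.5. Under our assumptions, $\hat\theta_{ak}$ is weakly consistent for $\theta^*$, which is assumed to be an interior point of $\Theta$. Then there exists some neighborhood $\mathcal{N}$ around $\theta^*$ such that $\mathcal{N}\subseteq\Theta$ and $P[\hat\theta_{ak}\in\mathcal{N}] \to 1$ as $n\to\infty$. Applying the envelope theorem, the first order condition that $\hat\theta_{ak}$ satisfies can be written as
\[
\begin{aligned}
0 &= \sum_{i=1}^{n} T_{i,n}\sum_{j=1}^{n+1}\pi_{i,j}\frac{\nabla_{\theta}\rho_j(\hat\theta_{ak})\hat\lambda_i}{1+\rho_j(\hat\theta_{ak})^{\tau}\hat\lambda_i} \\
&= \sum_{i=1}^{n} T_{i,n}\left(\sum_{j=1}^{n+1}\pi_{i,j}\frac{\nabla_{\theta}\rho_j(\hat\theta_{ak})}{1+\rho_j(\hat\theta_{ak})^{\tau}\hat\lambda_i}\right)\left(\sum_{j=1}^{n+1}\pi_{i,j}\frac{\rho_j(\hat\theta_{ak})\rho_j(\hat\theta_{ak})^{\tau}}{(1+\rho_j(\hat\theta_{ak})^{\tau}\tilde\lambda_i)^2}\right)^{-1}\sum_{j=1}^{n+1}\pi_{i,j}\rho_j(\hat\theta_{ak}).
\end{aligned}
\tag{4.29}
\]
We further expand
\[
\sum_{j=1}^{n+1}\pi_{i,j}\rho_j(\hat\theta_{ak}) = \sum_{j=1}^{n+1}\pi_{i,j}\rho_j(\theta^*) + \sum_{j=1}^{n+1}\pi_{i,j}\nabla_{\theta}\rho_j(\tilde\theta)(\hat\theta_{ak}-\theta^*). \tag{4.30}
\]
In (4.29) and (4.30), $\tilde\lambda_i$ and $\tilde\theta$ are appropriate mean values.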
Then we prove that the following terms vanish at rate $o_p(n^{-1/2})$:
\[
\begin{aligned}
A &\equiv \frac{1}{n}\sum_{i=1}^{n} T_{i,n}\left(\sum_{j=1}^{n+1}\pi_{i,j}\frac{\nabla_{\theta}\rho_j(\hat\theta_{ak})}{1+\rho_j(\hat\theta_{ak})^{\tau}\hat\lambda_i}\right)\check V_i^{-1}\left(\frac{1}{n}\rho_{n+1}(\hat\theta_{ak})\right), \\
B &\equiv \frac{1}{n}\sum_{i=1}^{n} T_{i,n}\left(\sum_{j=1}^{n+1}\pi_{i,j}\frac{\nabla_{\theta}\rho_j(\hat\theta_{ak})}{1+\rho_j(\hat\theta_{ak})^{\tau}\hat\lambda_i}\right)\check V_i^{-1}\frac{1}{n}\frac{\rho_{n+1}(\hat\theta_{ak})\rho_{n+1}(\hat\theta_{ak})^{\tau}}{(1+\rho_{n+1}(\hat\theta_{ak})^{\tau}\tilde\lambda_i)^2}, \\
C &\equiv \frac{1}{n}\sum_{i=1}^{n} T_{i,n}\frac{1}{n}\frac{\nabla_{\theta}\rho_{n+1}(\hat\theta_{ak})}{1+\rho_{n+1}(\hat\theta_{ak})^{\tau}\hat\lambda_i}\hat\lambda_i.
\end{aligned}
\]
Then we can apply Lemmas 4.11, 4.12 and 4.13 and claim that the AKSEL estimator has the desired asymptotic distribution.

Chapter 5
Conclusion

In this thesis, I proposed methods to improve the finite sample performance of existing EL-based methods. The approach proposed in Chapter 2 gives new testing methods that can have much better finite sample size properties than existing tests. The Bartlett correctability theorem proved in Chapter 3 substantially enhances the usefulness of Bartlett correction and pseudo observation adjustment of EL, because these methods can now be used to improve finite sample size properties for a much larger class of testing problems. Chapter 4 considers a larger class of statistical models and extends the pseudo observation adjustment approach to address the convex hull problem of EL-based methods in this setup. There are many potential applications of our findings, because we consider very general modelling frameworks.

The limitation of the method in Chapter 2 is that using it entails a loss of power. The limitation of Chapter 3 is that, to make the method practically implementable and appealing to applied econometricians, we should find an easier way to estimate the correction factor than using sample analogues. The limitation of Chapter 4 is that our method introduces a new tuning parameter, and we do not yet know a good data-driven way to choose it. Tackling these limitations is left for future research.

Bibliography

[1] Allen, J., Gregory, A., & Shimotsu, K. (2011). Empirical Likelihood Block Bootstrapping. Journal of Econometrics, 161(2), 110-121.
[2] Amemiya, T. (1985). Advanced Econometrics. Harvard University Press.
[3] Anatolyev, S. (2005). GMM, GEL, Serial Correlation, and Asymptotic Bias. Econometrica, 73(3), 983-1002.
[4] Andrews, D. (1991). Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation. Econometrica, 59(3), 817-858.
[5] Andrews, D. (1991). Asymptotic Normality of Series Estimators for Nonparametric and Semiparametric Models. Econometrica, 59(2), 307-345.
[6] Andrews, D., & Marmer, V. (2007). Exactly Distribution-free Inference in Instrumental Variables Regression with Possibly Weak Instruments. Journal of Econometrics, 142, 183-200.
[7] Ash, R., & Doleans-Dade, C. (2000). Probability and Measure Theory. Academic Press.
[8] Baggerly, K. (1998). Empirical Likelihood as a Goodness-of-Fit Measure. Biometrika, 85, 535-547.
[9] Bank, B., Guddat, J., Klatte, D., Kummer, B., & Tammer, K. (1983). Nonlinear Parametric Optimization. Birkhäuser.
[10] Barndorff-Nielsen, O. E., & Hall, P. (1988). On the Level-error after Bartlett Adjustment of the Likelihood Ratio Statistic. Biometrika, 75, 374-378.
[11] Bierens, H. (2013). The Hilbert Space Theoretical Foundation of Semi-Nonparametric Modeling. Forthcoming in J. Racine, L. Su and A. Ullah (eds), Handbook of Applied Nonparametric and Semiparametric Econometrics and Statistics, Oxford University Press.
[12] Blinder, A. S., & Maccini, L. J. (1991). The Resurgence of Inventory Research: What have we Learned? Journal of Economic Surveys, 5(4), 291-328.
[13] Brown, B. W., & Newey, W. K. (2002). Generalized Method of Moments, Efficient Bootstrapping, and Improved Inference. Journal of Business & Economic Statistics, 20(4), 507-517.
[14] Brown, B. M., & Chen, S. (1998). Combined and Least Squares Empirical Likelihood. Annals of the Institute of Statistical Mathematics, 1-18.
[15] Chamberlain, G. (1987). Asymptotic Efficiency in Estimation with Conditional Moment Restrictions. Journal of Econometrics, 34(3), 305-334.
[16] Chen, J., & Huang, Y. (2013). Finite-Sample Properties of the Adjusted Empirical Likelihood. Journal of Nonparametric Statistics, 25(1), 147-159.
[17] Chen, J., Variyath, A., & Abraham, B. (2008). Adjusted Empirical Likelihood and its Properties. Journal of Computational and Graphical Statistics, 17(2), 426-443.
[18] Chen, J., Sitter, R., & Wu, C. (2002). Using Empirical Likelihood Methods to Obtain Range Restricted Weights in Regression Estimators for Surveys. Biometrika, 89(1), 230-237.
[19] Chen, S. X. (1993). On the Coverage Accuracy of Empirical Likelihood Confidence Regions for Linear Regression Model. Annals of the Institute of Statistical Mathematics, 45, 621-637.
[20] Chen, S. X. (1994). Empirical Likelihood Confidence Intervals for Linear Regression Coefficients. Journal of Multivariate Analysis, 49, 24-40.
[21] Chen, S. X., & Cui, H. (2005). On the Bartlett Properties of Empirical Likelihood with Moment Restrictions. Technical Report, Department of Statistics, Iowa State University, available at http://www.public.iastate.edu/~songchen/Chen-Cui-ELBC-report.pdf
[22] Chen, S. X., & Cui, H. (2006.a). On Bartlett Correction of Empirical Likelihood in the Presence of Nuisance Parameters. Biometrika, 93, 215-220.
[23] Chen, S. X., & Cui, H. (2006.b). On Bartlett Correction of Empirical Likelihood in the Presence of Nuisance Parameters. Technical Report, Department of Statistics, Iowa State University, available at http://www.public.iastate.edu/~songchen/el-nus-bc-report1.pdf
[24] Chen, S. X., & Cui, H. (2007). On the Second-Order Properties of Empirical Likelihood with Moment Restrictions. Journal of Econometrics, 141(2), 492-516.
[25] Clarida, R., Gali, J., & Gertler, M. (2000). Monetary Policy Rules and Macroeconomic Stability: Evidence and Some Theory. Quarterly Journal of Economics, 115(1), 147-180.
[26] Clark, T. E. (1996). Small-Sample Properties of Estimators of Nonlinear Models of Covariance Structure. Journal of Business & Economic Statistics, 14(3), 367-373.
[27] Corbae, D., Stinchcombe, M., & Zeman, J. (2009). An Introduction to Mathematical Analysis for Economic Theory and Econometrics. Oxford University Press.
[28] De Villiers, J. (2012). Mathematics of Approximation. Springer.
[29] DiCiccio, T., Hall, P., & Romano, J. (1991). Empirical Likelihood is Bartlett Correctable. The Annals of Statistics, 19(2), 1053-1061.
[30] DiCiccio, T., Hall, P., & Romano, J. (1989). Bartlett Adjustment for Empirical Likelihood. Technical Report #298, Department of Statistics, Stanford University, available at http://statistics.stanford.edu/~ckirby/techreports/NSF/EFS%20NSF%20298.pdf
[31] Dominguez, M., & Lobato, I. (2004). Consistent Estimation of Models Defined by Conditional Moment Restrictions. Econometrica, 72(5), 1601-1615.
[32] Donald, S., Imbens, G., & Newey, W. (2003). Empirical Likelihood Estimation and Consistent Tests with Conditional Moment Restrictions. Journal of Econometrics, 117(1), 55-93.
[33] Doukhan (1994). Mixing: Properties and Examples. Lecture Notes in Statistics, Springer.
[34] Fitzenberger, B. (1998). The Moving Blocks Bootstrap and Robust Inference for Linear Least Squares and Quantile Regressions. Journal of Econometrics, 82(2), 235-287.
[35] Francq, C., & Zakoïan, J. (2005). A Central Limit Theorem for Mixing Triangular Arrays of Variables Whose Dependence Is Allowed to Grow with the Sample Size. Econometric Theory, 21(6), 1165-1171.
[36] Götze, F., & Hipp, C. (1983). Asymptotic Expansions for Sums of Weakly Dependent Random Vectors. Probability Theory and Related Fields, 1-29.
[37] Götze, F., & Hipp, C. (1994). Asymptotic Distribution of Statistics in Time Series. The Annals of Statistics, 22(4), 2062-2088.
[38] Gregory, A., Lamarche, J., & Smith, G. (2002). Information-Theoretic Estimation of Preference Parameters: Macroeconomic Applications and Simulation Evidence. Journal of Econometrics, 107(2), 213-233.
[39] Grendár, M., & Judge, G. (2009). Empty Set Problem of Maximum Empirical Likelihood Methods. Electronic Journal of Statistics, 3(0), 1542-1555.
[40] Hall, A. (2005). Generalized Method of Moments. Advanced Texts in Econometrics, Oxford University Press.
[41] Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer.
[42] Hall, P., & Heyde, C. C. (1980). Martingale Limit Theory and Its Application. Academic Press.
[43] Hall, P., & Horowitz, J. L. (1996). Bootstrap Critical Values for Tests Based on Generalized-Method-of-Moments Estimators. Econometrica, 64(4), 891-916.
[44] Hall, P., Horowitz, J. L., & Jing, B. (1995). On Blocking Rules for the Bootstrap with Dependent Data. Biometrika, 82(3), 561-574.
[45] Hall, P., & La Scala, B. (1990). Methodology and Algorithms of Empirical Likelihood. International Statistical Review, 58(2), 109-127.
[46] Hansen, B., & West, K. (2002). Generalized Method of Moments and Macroeconomics. Journal of Business & Economic Statistics, 20(4), 460-469.
[47] Hansen, L. (1982). Large Sample Properties of Generalized Method of Moments Estimators. Econometrica, 50(4), 1029-1054.
[48] Hansen, L., Heaton, J., & Yaron, A. (1996). Finite-Sample Properties of Some Alternative GMM Estimators. Journal of Business & Economic Statistics, 14(3), 262-280.
[49] Harville, D. (1997). Matrix Algebra From a Statistician's Perspective. Springer.
[50] Ibragimov, I. A. (1962). Some Limit Theorems for Stationary Processes. Theory of Probability & Its Applications, 7(4), 349-382.
[51] Imbens, G. (2002). Generalized Method of Moments and Empirical Likelihood. Journal of Business & Economic Statistics, 20(4), 493-506.
[52] Imbens, G. W., Spady, R. H., & Johnson, P. (1998). Information Theoretic Approaches to Inference in Moment Condition Models. Econometrica, 66(2), 333-357.
[53] Inoue, A., & Shintani, M. (2006). Bootstrapping GMM Estimators for Time Series. Journal of Econometrics, 133(2), 531-555.
[54] Jing, B., & Wood, A. (1996). Exponential Empirical Likelihood is not Bartlett Correctable. The Annals of Statistics, 24, 365-369.
[55] Kim, T. Y. (1993). A Note on Moment Bounds for Strong Mixing Sequences. Statistics & Probability Letters, 16(2), 163-168.
[56] Kitamura, Y. (1997). Empirical Likelihood Methods with Weakly Dependent Processes. The Annals of Statistics, 25(5), 2084-2102.
[57] Kitamura, Y. (2001). Asymptotic Optimality of Empirical Likelihood for Testing Moment Restrictions. Econometrica, 69(6), 1661-1672.
[58] Kitamura, Y. (2006). Empirical Likelihood Method in Econometrics: Theory and Practice. Cowles Foundation Discussion Paper, Yale University.
Kitamura, Y., & Stutzer, M. (1997). An Information-Theoretic Alternative to Generalized Method of Moments Estimation. Econometrica, 65(4), 861-874.
[59] Kitamura, Y., Tripathi, G., & Ahn, H. (2004). Empirical Likelihood-Based Inference in Conditional Moment Restriction Models. Econometrica, 72(6), 1667-1714.
[60] Komunjer, I. (2012). Global Identification in Nonlinear Models with Moment Restrictions. Econometric Theory, 28(04), 719-729.
[61] Lahiri, S. (2003). Resampling Methods for Dependent Data. Springer.
[62] Lahiri, S. (2007). Asymptotic Expansions for Sums of Block-Variables under Weak Dependence. The Annals of Statistics, 35(3), 1324-1350.
[63] Lahiri, S. (2006). Asymptotic Expansions of Sums of Block-Variables under Weak Dependence. Technical Report, Department of Statistics, Iowa State University, available at arxiv.org/abs/math.st/0606739
[64] Lahiri, S. (2010). Edgeworth Expansions for Studentized Statistics under Weak Dependence. The Annals of Statistics, 38(1), 388-434.
[65] Lazar, N., & Mykland, P. (1999). Empirical Likelihood in the Presence of Nuisance Parameters. Biometrika, 86, 203-211.
[66] Liu, Y., & Chen, J. (2010). Adjusted Empirical Likelihood with High-Order Precision. The Annals of Statistics, 38(3), 1341-1362.
[67] Matsushita, Y., & Otsu, T. (2013). Second-Order Refinement of Empirical Likelihood for Testing Overidentifying Restrictions. Econometric Theory, 29(02), 324-353.
[68] McCullagh, P. (1987). Tensor Methods in Statistics. Chapman & Hall/CRC Monographs on Statistics & Applied Probability.
[69] Newey, W. (1990). Efficient Instrumental Variables Estimation of Nonlinear Models. Econometrica, 58(4), 809-837.
[70] Newey, W. (1997). Convergence Rates and Asymptotic Normality for Series Estimators. Journal of Econometrics, 79(1), 147-168.
[71] Newey, W., & McFadden, D. (1994). Large Sample Estimation and Hypothesis Testing. Chapter 36, Handbook of Econometrics, Volume IV, edited by R. F. Engle and D. L. McFadden.
[72] Newey, W., & Smith, R. (2004). Higher Order Properties of GMM and Generalized Empirical Likelihood Estimators. Econometrica, 72(1), 219-255.
[73] Newey, W., & West, K. (1994). Automatic Lag Selection in Covariance Matrix Estimation. Review of Economic Studies, 61, 631-654.
[74] Otsu, T. (2010). Generalized Neyman-Pearson Optimality of Empirical Likelihood for Testing Parameter Hypotheses. Annals of the Institute of Statistical Mathematics, 61, 773-787.
[75] Otsu, T. (2010). On Bahadur Efficiency of Empirical Likelihood. Journal of Econometrics, 157(2), 248-256.
[76] Owen, A. (1988). Empirical Likelihood Ratio Confidence Intervals for a Single Functional. Biometrika, 75(2), 237-249.
[77] Owen, A. (1990). Empirical Likelihood Ratio Confidence Regions. The Annals of Statistics, 18(1), 90-120.
[78] Owen, A. (2001). Empirical Likelihood. Academic Press.
[79] Politis, D. N., & Romano, J. P. (1994). The Stationary Bootstrap. Journal of the American Statistical Association, 89(428), 1303-1313.
[80] Qin, J., & Lawless, J. (1994). Empirical Likelihood and General Estimating Equations. The Annals of Statistics, 22(1), 300-325.
[81] Robinson, P. (1987). Asymptotically Efficient Estimation in the Presence of Heteroskedasticity of Unknown Form. Econometrica, 55(4), 875-891.
[82] Rosenthal, J. (2000). A First Look at Rigorous Probability Theory, Second Edition. World Scientific.
[83] Rudin, W. (1976). Principles of Mathematical Analysis, Third Edition. McGraw-Hill.
[84] Severini, T. (2005). Elements of Distribution Theory. Cambridge University Press.
[85] Smith, R. J. (2011). GEL Criteria for Moment Condition Models. Econometric Theory, 27(06), 1192-1235.
[86] Stock, J. H., Wright, J. H., & Yogo, M. (2002). A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments. Journal of Business & Economic Statistics, 20(4), 518-529.
[87] Tsao, M., & Zhou, J. (2001). On the Robustness of Empirical Likelihood Ratio Confidence Intervals for Location. Canadian Journal of Statistics, 29(1), 129-140.
[88] Tsao, M. (2004). Bounds on Coverage Probabilities of the Empirical Likelihood Ratio Confidence Regions. The Annals of Statistics, 32(3), 1215-1221.
[89] Vanderbei, R. (2008). Linear Programming: Foundations and Extensions, Third Edition. Springer.
[90] Van der Vaart, A. (1998). Asymptotic Statistics. Cambridge University Press.
[91] Wald, A. (1949). Note on the Consistency of the Maximum Likelihood Estimate. The Annals of Mathematical Statistics, 20(4), 595-601.
[92] Yeh, J. (2006). Real Analysis: Theory of Measure and Integration, Second Edition. Academic Press.
[93] Zilinskas, A., Praga, E., Mackute, A., & Varoneckas, A. (2004). Adaptive Search for Optimum in a Problem of Oil Stabilization Process. Adaptive Computing in Design and Manufacture VI, Computer-Aided Chemical Engineering, Springer, London.
Item Metadata
Title | Essays on empirical likelihood |
Creator | Ma, Jun |
Publisher | University of British Columbia |
Date Issued | 2014 |
Description | This thesis consists of three research chapters on the theory of empirical likelihood (EL), which is a class of inferential methods widely used in econometrics. In Chapter 2, we focus on estimation and testing of moment restriction models with weakly dependent stationary time series data using the blockwise empirical likelihood method. Empirical likelihood based methods often encounter the finite sample problem that the constraint set of the profiling step becomes empty. This issue undermines the validity of EL-based methods in empirical applications. We first show first-order validity of Chen, Variyath and Abraham (2008)'s pseudo observation adjustment, which is used to overcome this shortcoming. Under regularity conditions, key higher-order properties are found. The first property is that blockwise EL ratio statistics admit higher-order refinement and this refinement can be implemented via either mean adjustment to the EL ratio statistic or creating a pseudo observation with a specific level of adjustment. By the latter approach, we address both the empty-constraint-set issue and the low precision of the chi-square approximation. We also find that for testing problems, the optimal block length choice that minimizes the higher-order approximation error has an order of magnitude the sample size to the power of 2/5. In Chapter 3, we focus on parameter hypothesis testing problems for moment restriction models using EL ratio tests. We substantially extend existing theorems on Bartlett correctability of EL ratio tests for parameter testing problems in Chen and Cui (2007) and Chen and Cui (2006.a). We consider tests of general nonlinear restrictions on the parameter under the null hypothesis. We show Bartlett correctability of EL ratio tests of such a large family of testing problems, which are potentially useful in many empirical applications. In Chapter 4, we focus on estimation and testing of conditional moment restrictions with i.i.d. data. Following the approach of adjusted empirical likelihood (AEL) proposed by Chen, Variyath and Abraham (2008), this paper develops AEL-based methods for conditional moment restrictions, and establishes that the new methods produce semiparametrically efficient estimators and consistent specification tests. This new method shows improved computational efficiency and accuracy in finite samples, as compared to some existing alternatives. |
Genre | Thesis/Dissertation |
Type | Text |
Language | eng |
Date Available | 2014-08-26 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivs 2.5 Canada |
DOI | 10.14288/1.0166952 |
URI | http://hdl.handle.net/2429/50195 |
Degree | Doctor of Philosophy - PhD |
Program | Economics |
Affiliation | Arts, Faculty of; Vancouver School of Economics |
Degree Grantor | University of British Columbia |
Graduation Date | 2014-11 |
Campus | UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/2.5/ca/ |
Aggregated Source Repository | DSpace |