On Dual Empirical Likelihood Inference under Semiparametric Density Ratio Models in the Presence of Multiple Samples
With Applications to Long Term Monitoring of Lumber Quality

by

Song Cai

B.Sc. in Atmospheric Science, Peking University, 1999
M.Sc. in Atmospheric Science, The University of British Columbia, 2008
M.Sc. in Statistics, The University of British Columbia, 2010

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
in
The Faculty of Graduate and Postdoctoral Studies
(Statistics)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

May 2014

© Song Cai 2014

Abstract

Maintaining a high quality of lumber products is of great social and economic importance. This thesis develops theories as part of a research program aimed at developing a long term program for monitoring change in the strength of lumber. These theories are motivated by two important tasks of the monitoring program, testing for change in strength populations of lumber produced over the years and making statistical inference on strength populations based on Type I censored lumber samples. Statistical methods for these inference tasks should ideally be efficient and nonparametric. These desiderata lead us to adopt a semiparametric density ratio model to pool the information across multiple samples and use the nonparametric empirical likelihood as the tool for statistical inference.

We develop a dual empirical likelihood ratio test for composite hypotheses about the parameter of the density ratio model based on independent samples from different populations. This test encompasses testing differences in population distributions as a special case. We find the proposed test statistic to have a classical chi-square null limiting distribution. We also derive the power function of the test under a class of local alternatives.
It reveals that the local power is often increased when strength is borrowed from additional samples even when their underlying distributions are unrelated to the hypothesis of interest. Simulation studies show that this test has better power properties than all potential competitors adapted to the multiple sample problem under investigation, and is robust to model misspecification. The proposed test is then applied to assess strength properties of lumber with intuitively reasonable implications for the forest industry.

We also establish a powerful inference framework for performing empirical likelihood inference under the density ratio model when Type I censored samples are present. This inference framework centers on the maximization of a concave dual partial empirical likelihood function, and features easy computation. We study the properties of this dual partial empirical likelihood, and find its corresponding likelihood ratio test to have a simple chi-square limiting distribution under the null model and a non-central chi-square limiting distribution under local alternatives.

Preface

This thesis was written under the supervision of Dr. Jiahua Chen and Dr. James V. Zidek. Chapters 3 and 4 are based on a submitted manuscript coauthored with them. Dr. Chen initiated the idea of using the density ratio model as the platform for inference and suggested the direction of studying the dual empirical likelihood ratio test. Following his advice, I worked out all the theoretical results, wrote the initial draft of the manuscript, and conducted most of the follow-up revisions. During this process, Dr. Chen gave me valuable constructive criticism, helped me to organize my ideas and improve my proofs, and took on a few rounds of revisions. Dr.
Zidek supported, encouraged and inspired me during the writing of the manuscript, gave me helpful suggestions, and revised and greatly improved the final draft.

Chapter 5 is based on a manuscript in preparation coauthored with Dr. Jiahua Chen. Dr. Chen suggested the topic of the manuscript. I independently developed all the theory and wrote the manuscript. Dr. Chen helped me to improve some details of the theory.

Chapter 6 is based on an R software package called drmdel that has been published on The Comprehensive R Archive Network. I independently designed and wrote the entire package.

Table of Contents

Abstract
Preface
List of Abbreviations
Acknowledgements
Dedication
1 Introduction
  1.1 Application background
  1.2 Density ratio models: concepts and examples
    1.2.1 Exponential families of distributions
    1.2.2 Relationship between logistic regression models and DRMs
    1.2.3 Biased sampling and DRM
    1.2.4 Other examples
  1.3 EL inference under DRMs: historical and recent development
  1.4 Highlights of the contributions of this thesis
  1.5 Outline of the thesis
2 Fundamentals of DEL Inference under the DRM
  2.1 EL for a single sample
  2.2 EL for multiple samples under the DRM
  2.3 Non-regularity of the DRM and DEL
  2.4 Properties of the DEL
  2.5 Proofs
    2.5.1 Theorem 2.1: Properties of the information matrix
    2.5.2 Theorem 2.2: Asymptotic properties of the score function
    2.5.3 Lemma 2.3: Concavity of the DEL
    2.5.4 Lemma 2.4: $\sqrt[3]{n}$-consistency of the MELE
    2.5.5 Theorem 2.5: Asymptotic normality of the MELE
3 DEL Ratio Test for Hypothesis about DRM Parameters
  3.1 Introduction
  3.2 DELR statistic and its limiting distributions
    3.2.1 DELR statistic and its null limiting distribution
    3.2.2 Limiting distribution of the DELR statistic under local alternatives
    3.2.3 On the condition for the positiveness of the non-central parameter
  3.3 Simulation studies
    3.3.1 Approximation to the distribution of the DELR
    3.3.2 Power comparison
  3.4 Robustness of DELR test against model misspecification
    3.4.1 Null limiting distribution of the DELR statistic
    3.4.2 Power of the DELR test
  3.5 Analysis of lumber quality data
    3.5.1 Assessing the DRM fit: an exploratory approach
    3.5.2 Testing for equality of strength populations
  3.6 Proofs
    3.6.1 Theorem 3.1: Null limiting distribution of the DELR statistic
    3.6.2 Theorem 3.2: Limiting distribution of the DELR under local alternatives
  3.7 Appendix: Parameter values in simulation studies
4 Effects of Information Pooling by DRM
  4.1 Effects on the estimation accuracy of the MELE
  4.2 Effects on the power of the DELR test
  4.3 Simulation studies
    4.3.1 Comparison of estimation accuracy
    4.3.2 Comparison of testing power
  4.4 Proofs
    4.4.1 Theorem 4.1: Estimation accuracy comparison
    4.4.2 Theorem 4.2: Local power comparison in general
    4.4.3 Theorem 4.3: Local power comparison in a special case
5 EL Inference under the DRM Based on Multiple Type I Censored Samples
  5.1 Type I censored single random samples and the corresponding EL
  5.2 EL for multiple Type I censored samples under the DRM
  5.3 Calculating the MELE
    5.3.1 Partial EL and its relation to EL
    5.3.2 Maximization of the PEL
  5.4 Interpretation of the PEL
  5.5 Properties of the MPELE
  5.6 EL ratio test for the DRM parameter
  5.7 Other inference tasks
  5.8 Proofs
    5.8.1 Lemma 5.2: Properties of the partial information matrix
    5.8.2 Lemma 5.3: Asymptotic properties of the score function
    5.8.3 Theorem 5.4: Asymptotic normality of the MPELE
    5.8.4 Theorem 5.5: Asymptotic properties of the ELR test
  5.9 Appendix: The WEL inference for Type I censored samples
6 R software package "drmdel" for DEL inference
  6.1 Under the hood: consideration and implementation
  6.2 DRM fitting
  6.3 The DELR test
  6.4 EL population CDF estimation
  6.5 EL quantile estimation
  6.6 Quantile comparison
  6.7 EL kernel density estimation
7 Summary and Future Work
  7.1 Summary of the present work
    7.1.1 Contribution I: DELR test for hypothesis about the DRM parameter
    7.1.2 Contribution II: Effects of information pooling by the DRM
    7.1.3 Contribution III: EL inference under the DRM based on Type I censored samples
    7.1.4 Contribution IV: Software package "drmdel" for DEL inference under the DRM
  7.2 Outlook on future work
    7.2.1 EL ratio test for comparing quantiles under the DRM
    7.2.2 Effects of information pooling on quantile estimation under the DRM
    7.2.3 Inference under the DRM based on randomly censored samples
    7.2.4 Basis function selection in the DRM
    7.2.5 Random-effect DRMs
    7.2.6 Other projects: high dimensional DRM and finite sample corrections for the DELR test
References

List of Tables

3.1 The type-I error rates of the DELR tests at nominal sizes of 0.10 and 0.05 for Weibull samples under a misspecified DRM.
3.2 The type-I error rates of the DELR tests at nominal sizes of 0.10 and 0.05 for samples from different families of distributions under the misspecified DRM 1.
3.3 The type-I error rates of the DELR tests at nominal sizes of 0.10 and 0.05 for samples from different families of distributions under the misspecified DRM 2.
3.4 The type-I error rates of the DELR tests at nominal sizes of 0.10 and 0.05 for samples from different families of distributions under the misspecified DRM 3.
3.5 The type-I error rates of the DELR tests at nominal sizes of 0.10 and 0.05 for samples from different families of distributions under the misspecified DRM 4.
3.6 The type-I error rates of the DELR tests at nominal sizes of 0.10 and 0.05 for samples from different families of distributions under the misspecified DRM 5.
3.7 The type-I error rates of the DELR tests at nominal sizes of 0.10 and 0.05 for samples from different families of distributions under the misspecified DRM 6.
3.8 The p-values of pairwise comparisons among three MOR populations.
3.9 The p-values of the DELR and Wald tests based on two-sample DRMs for pairwise comparisons among the three MOR populations.
3.10 Parameter values for power comparison under non-normal distributions (Section 3.3.2).
3.11 Parameter values for power comparison under misspecified DRMs (Section 3.4.2).
4.1 Comparison of the estimation accuracies of $\hat{\nu}^{(1)}$ and $\hat{\nu}^{(2)}$ under the setting of Example 4.1. $\beta_1[1]$, $\beta_1[2]$: the two components of $\beta_1$.
4.2 Comparison of the estimation accuracies of $\hat{\nu}^{(1)}$ and $\hat{\nu}^{(2)}$ under the setting of Example 4.2.
4.3 Gamma parameter values for power comparison of $R_n^{(1)}$ and $R_n^{(2)}$.
4.4 Parameter values for power comparison of $R_n^{(1)}$ and $R_n^{(2)}$ under the setting of Example 4.3.
4.5 Parameter values for power comparison of $R_n^{(1)}$ and $R_n^{(2)}$ under the setting of Example 4.4.

List of Figures

3.1 Q-Q plots of the simulated and the null limiting distributions of the DELR statistic.
3.2 Q-Q plots of the distributions of the DELR statistics under the local alternative model against the corresponding asymptotic theoretical distributions.
3.3 Power curves for normal data. The parameter setting 0 corresponds to the null model and the settings 1-6 correspond to alternative models.
3.4 Power curves for non-normal data. The parameter setting 0 corresponds to the null model and the settings 1-5 correspond to alternative models.
3.5 Q-Q plots of the simulated and the null limiting distribution of the DELR statistics for Weibull samples under a misspecified DRM.
3.6 Q-Q plots of the simulated and the null limiting distribution of the DELR statistics for samples from different families of distributions under the misspecified DRM 1.
3.7 Q-Q plots of the simulated and the null limiting distribution of the DELR statistics for samples from different families of distributions under the misspecified DRM 2.
3.8 Q-Q plots of the simulated and the null limiting distribution of the DELR statistics for samples from different families of distributions under the misspecified DRM 3.
3.9 Q-Q plots of the simulated and the null limiting distribution of the DELR statistics for samples from different families of distributions under the misspecified DRM 4.
3.10 Q-Q plots of the simulated and the null limiting distribution of the DELR statistics for samples from different families of distributions under the misspecified DRM 5.
3.11 Q-Q plots of the simulated and the null limiting distribution of the DELR statistics for samples from different families of distributions under the misspecified DRM 6.
3.12 Power curves of the five tests with DELR and Wald tests based on misspecified DRMs. The parameter setting 0 corresponds to the null model and the settings 1-5 correspond to alternative models.
3.13 Kernel density plots of the MOR and MOT samples.
3.14 The histograms, EL kernel density estimates (solid curves), classical kernel density estimates (dashed curves) and three-parameter Weibull density estimates (dot-dashed curves) for MOR samples.
3.15 The histograms, EL kernel density estimates (solid curves), classical kernel density estimates (dashed curves) and three-parameter Weibull density estimates (dot-dashed curves) for MOT samples.
4.1 Power curves of $R_n^{(1)}$, $R_n^{(2)}$, Wald$^{(1)}$ and Wald$^{(2)}$. The parameter setting 0 corresponds to the null model and the settings 1-5 correspond to alternative models.
4.2 Power curves of $R_n^{(1)}$, $R_n^{(2)}$, Wald$^{(1)}$ and Wald$^{(2)}$ under the data settings of Examples 4.3 and 4.4. The parameter setting 0 corresponds to the null model and the settings 1-5 correspond to alternative models.
6.1 Comparative plot of the EL kernel density estimator, classical kernel density estimator and true density of F3 in Example 6.6.

List of Abbreviations

AD      Anderson–Darling test
ANOVA   Analysis of variance
ASTM    American Society for Testing and Materials
BFGS    Broyden–Fletcher–Goldfarb–Shanno algorithm
CDF     Cumulative distribution function
CRAN    The Comprehensive R Archive Network
DEL     Dual empirical likelihood
DELR    Dual empirical likelihood ratio
DELRT   Dual empirical likelihood ratio test
DPEL    Dual partial empirical likelihood
DRM     Density ratio model
EL      Empirical likelihood
ELR     Empirical likelihood ratio
IQR     Interquartile range
KW      Kruskal–Wallis rank-sum test
LHS     Left hand side
MELE    Maximum empirical likelihood estimator
MPELE   Maximum partial empirical likelihood estimator
MWELE   Maximum weighted empirical likelihood estimator
MOR     Modulus of rupture
MOT     Modulus of tension
PEL     Partial empirical likelihood
RHS     Right hand side
WEL     Weighted empirical likelihood

Acknowledgements

I am deeply indebted to my principal supervisor, Dr. Jiahua Chen, who has inspired and guided me during the course of my Ph.D. study. Dr. Chen gave me lots of constructive criticism that cannot be heard from anyone else, and helped me to sharpen my view and form the shape of my research. From him, I have learned not only skills and knowledge but also a critical and serious attitude toward research. Thank you, Jiahua, for guiding me and pushing me toward a high research standard.

I also owe my deepest gratitude to my co-supervisor, Dr. James V. Zidek. Since the day I started my study in Statistics as a Master's student, he has been a great supervisor, a mentor, and a friend. He has been supporting me spiritually, financially, and in every other way. Thank you, Jim, for all you have done for me!

In addition, I would like to thank Dr.
Yukun Liu, a friend and colleague, who helped me with the theory of empirical likelihood inference and frequently discussed my research with me when I started the work presented in this thesis.

Most importantly, I thank my dear wife and parents-in-law for their wholehearted love and support. They have dedicated all their time and energy to help me take care of my young children, one born when I started my PhD, and the other when I was in the third year of my PhD study. They also have encouraged me and brought me much joy and happiness.

Without the help from my academic fathers, Dr. Zidek and Dr. Chen, or the help from my dear friend, Dr. Liu, or the support from my beloved family, the writing of this thesis would never have been possible.

To Zhao, Tiejun, Tieli, Brendan, and Heather

Chapter 1
Introduction

1.1 Application background

With nearly half of Canada's entire land surface covered by trees, lumber has been a vital natural resource and major construction material for this country. Maintaining a high quality of lumber products hence is of great social and economic importance: it is crucial to construction safety as well as to the Canadian lumber industry. This thesis develops statistical methods as part of a research project aimed at developing a program for monitoring change in the strength of lumber. Interest in such a program also has been sparked by climate change, which will affect the way trees grow, as well as by the changing resource mix, for example due to increasing reliance on plantation lumber.

The monitoring program includes the following two important tasks. First, it is crucial to examine whether the overall strength of lumber changes over time, which translates into a hypothesis testing problem for detecting differences in the population distributions of the strength of lumber from different years.
Second, it is desirable to make smart strength test plans to maximize the scientific value of each piece of lumber: collecting and testing lumber costs time and money; for example, lumber must be conditioned in the lab over a period of months before being destructively tested. One way to achieve this goal is to collect Type I right-censored lumber strength samples: we stop a destructive strength test, say a bending strength test, at a prespecified level such that not all the lumber is broken; the unbroken lumber can then be used afterwards for other strength tests, say a tension strength test. The sample collected in the first test, although censored, is a representative sample of bending strength, and can be used to infer population characteristics of that strength. The sample collected in the second test is useful for studying the relationship between the two different strengths.

Echoing the above tasks, this thesis develops statistical methods aimed at the following inference goals: (i) testing hypotheses about parameters of a number of different population distributions, given a random sample from each; and (ii) establishing an inference framework for estimation and hypothesis testing problems concerning several different populations based on Type I censored multiple samples.

1.2 Density ratio models: concepts and examples

Desiderata for the statistical methods used in the long term monitoring program of lumber include two key goals. First, the methods must be efficient to reduce the sizes of the required samples. Moreover, the reduced effective sample size of Type I censored samples demands highly efficient statistical methods to reach the required estimation accuracy and hypothesis testing power.
Toward the goal of efficiency, this thesis proposes methods that borrow strength among lumber samples by exploiting an obvious feature of the resource, that distinct populations of lumber over years, species, regions and so on will share some latent strength characteristics. Second, the methods should ideally be nonparametric in accordance with the well-ingrained practice in setting standards for forest products like those in American Society for Testing and Materials (ASTM) protocols (ASTM D1990-07).

These desiderata lead to the semiparametric density ratio model (DRM) adopted in this thesis. In the targeted application, we have multiple lumber strength samples collected from different years. While the size of the sample from each population may be small because of the high data collection costs, the total sample size could be large. If we can pool the information from different samples, efficient inference based on the pooled sample can then be expected. Since the lumber samples from different years share some common physical characteristics, it is reasonable to assume that the corresponding distribution functions have a certain relationship. In particular, we assume that the lumber quality populations connect with each other through their density functions. Suppose that we have $m + 1$ independent random samples from populations with cumulative distribution functions (CDFs) $F_k(x)$, $k = 0, 1, \ldots, m$, with the same support. The DRM assumes that

\[ dF_k(x) = \exp\{\alpha_k + \beta_k^\top q(x)\}\, dF_0(x), \quad \text{for } k = 1, 2, \ldots, m, \]

where $q(x)$, which we call the basis function of the DRM, is a prespecified $d$-dimensional function, and $(\alpha_k, \beta_k)$ are model parameters. The baseline distribution $F_0(x)$, however, is left completely unspecified.

The DRM assumption serves as a device for pooling information across samples, while the nonparametric baseline distribution keeps the model flexible.
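As a quick numerical illustration of the DRM relation, the sketch below checks that two normal densities differ by a factor of the form $\exp\{\alpha + \beta^\top q(x)\}$ with $q(x) = (x, x^2)^\top$. The choice of normal populations, the parameter values, and the helper names (`normal_pdf`, `drm_params_normal`, `drm_density`) are assumptions made here purely for illustration; the DRM itself leaves the baseline unspecified.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def drm_params_normal(mu0, sigma0, muk, sigmak):
    """(alpha_k, beta_k) linking N(muk, sigmak^2) to the baseline
    N(mu0, sigma0^2) under the basis function q(x) = (x, x^2)."""
    alpha = (math.log(sigma0 / sigmak)
             + mu0 ** 2 / (2.0 * sigma0 ** 2)
             - muk ** 2 / (2.0 * sigmak ** 2))
    beta = (muk / sigmak ** 2 - mu0 / sigma0 ** 2,
            1.0 / (2.0 * sigma0 ** 2) - 1.0 / (2.0 * sigmak ** 2))
    return alpha, beta

def drm_density(x, alpha, beta, baseline_pdf):
    """Tilted density exp{alpha + beta^T q(x)} * f_0(x), with q(x) = (x, x^2)."""
    tilt = alpha + beta[0] * x + beta[1] * x * x
    return math.exp(tilt) * baseline_pdf(x)

# Hypothetical parameter values, chosen only for this check.
mu0, sigma0 = 0.0, 1.0
muk, sigmak = 0.5, 1.5
alpha_k, beta_k = drm_params_normal(mu0, sigma0, muk, sigmak)

# Largest discrepancy between the tilted baseline and the target density.
max_abs_err = max(
    abs(drm_density(x, alpha_k, beta_k, lambda t: normal_pdf(t, mu0, sigma0))
        - normal_pdf(x, muk, sigmak))
    for x in [-3.0, -1.0, 0.0, 0.7, 2.0, 4.0]
)
print(max_abs_err)
```

The discrepancy should be at the level of floating-point rounding, which is what one expects when the tilt is exact rather than approximate.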
In fact, the DRM covers a large range of distribution families, and we demonstrate this with typical examples in the following subsections.

1.2.1 Exponential families of distributions

Every exponential family of distributions satisfies the assumption of the DRM. A family of distributions is called an exponential family if it has a density of the form

\[ f(x; \vartheta) = k(x) \exp\{\eta^\top(\vartheta)\, t(x) - A(\vartheta)\}, \quad x \in S, \]

where $\vartheta$ is a parameter vector, $k(\cdot)$, $\eta(\cdot)$, $t(\cdot)$ and $A(\cdot)$ are given functions, and $S$, the support of $X$, does not depend upon $\vartheta$. Densities $\{f_k(x)\}$ of the same exponential family with parameter values $\{\vartheta_k\}$ satisfy

\[ f_k(x) = \exp\{(\eta^\top(\vartheta_k) - \eta^\top(\vartheta_0))\, t(x) + (A(\vartheta_0) - A(\vartheta_k))\}\, f_0(x). \]

This relationship shows that distributions from the same exponential family fulfill the DRM assumption with basis function $q(x) = t(x)$ and parameters

\[ \alpha_k = A(\vartheta_0) - A(\vartheta_k), \quad \beta_k = \eta(\vartheta_k) - \eta(\vartheta_0). \]

In order to define an exponential family, the function $k(x)$ must be completely specified. In the DRM, the baseline density function is the counterpart of $k(x)$; it is, however, a nonparametric component of the model. This is the fundamental difference between the parametric exponential family and the semiparametric DRM, and also the reason that the DRM encompasses each exponential family as a special case.

Example 1.1. The gamma distribution family with shape $\lambda$ and rate $\kappa$, $\Gamma(\lambda, \kappa)$, is an exponential family with

\[ \eta(\lambda, \kappa) = (-\kappa,\ \lambda - 1)^\top, \quad t(x) = (x,\ \ln x)^\top, \quad A(\lambda, \kappa) = \ln \Gamma(\lambda) - \lambda \ln \kappa, \]

where $\Gamma(\cdot)$ is the gamma function. Therefore, gamma distributions with parameter values $\{(\lambda_k, \kappa_k)\}$ satisfy the DRM assumption with basis function $q(x) = (x, \ln x)^\top$ and parameters

\[ \alpha_k = \ln \frac{\Gamma(\lambda_0)}{\Gamma(\lambda_k)} + \lambda_k \ln \kappa_k - \lambda_0 \ln \kappa_0, \quad \beta_k = (\kappa_0 - \kappa_k,\ \lambda_k - \lambda_0)^\top. \]

Example 1.2. The normal distribution family with mean $\mu$ and standard deviation $\sigma$, $N(\mu, \sigma^2)$, is an exponential family with

\[ \eta(\mu, \sigma) = \left(\frac{\mu}{\sigma^2},\ -\frac{1}{2\sigma^2}\right)^\top, \quad t(x) = (x,\ x^2)^\top, \quad A(\mu, \sigma) = \frac{\mu^2}{2\sigma^2} + \ln \sigma. \]

Hence, normal distributions with parameter values $\{(\mu_k, \sigma_k)\}$ satisfy the DRM assumption with basis function $q(x) = (x, x^2)^\top$ and parameters

\[ \alpha_k = \ln \frac{\sigma_0}{\sigma_k} + \frac{\mu_0^2}{2\sigma_0^2} - \frac{\mu_k^2}{2\sigma_k^2}, \quad \beta_k = \left(\frac{\mu_k}{\sigma_k^2} - \frac{\mu_0}{\sigma_0^2},\ \frac{1}{2\sigma_0^2} - \frac{1}{2\sigma_k^2}\right)^\top. \]

Example 1.3. The log-normal distribution family with mean $\mu$ and standard deviation $\sigma$ on the log-scale, $LN(\mu, \sigma^2)$, is an exponential family with

\[ \eta(\mu, \sigma) = \left(\frac{\mu}{\sigma^2},\ -\frac{1}{2\sigma^2}\right)^\top, \quad t(x) = (\ln x,\ (\ln x)^2)^\top, \quad A(\mu, \sigma) = \frac{\mu^2}{2\sigma^2} + \ln \sigma. \]

Hence, log-normal distributions with parameter values $\{(\mu_k, \sigma_k)\}$ satisfy the DRM assumption with basis function $q(x) = (\ln x, (\ln x)^2)^\top$ and parameters

\[ \alpha_k = \ln \frac{\sigma_0}{\sigma_k} + \frac{\mu_0^2}{2\sigma_0^2} - \frac{\mu_k^2}{2\sigma_k^2}, \quad \beta_k = \left(\frac{\mu_k}{\sigma_k^2} - \frac{\mu_0}{\sigma_0^2},\ \frac{1}{2\sigma_0^2} - \frac{1}{2\sigma_k^2}\right)^\top. \]

Example 1.4. The Weibull distribution family with known shape $\lambda$ and unknown scale parameter $\kappa$, $W(\kappa)$, is an exponential family with

\[ \eta(\kappa) = -\frac{1}{\kappa^\lambda}, \quad t(x) = x^\lambda, \quad A(\kappa) = \lambda \ln \kappa - \ln \lambda. \]

Weibull distributions with known common shape $\lambda$ and different scales $\{\kappa_k\}$ then satisfy the DRM assumption with basis function $q(x) = x^\lambda$ and parameters

\[ \alpha_k = \lambda \ln \frac{\kappa_0}{\kappa_k}, \quad \beta_k = \frac{1}{\kappa_0^\lambda} - \frac{1}{\kappa_k^\lambda}. \]

Example 1.5. The Pareto distribution family with location $x_{\min}$ and shape $\lambda$, $P(x_{\min}, \lambda)$, has a density of the form

\[ f(x) = \exp\{(\ln \lambda + \lambda \ln x_{\min}) - (\lambda + 1) \ln x\}, \quad \text{for } x \ge x_{\min} > 0 \text{ and } \lambda > 0. \]

When $x_{\min}$ is fixed, the Pareto family is an exponential family. Pareto distributions with common location $x_{\min}$ and different shapes $\{\lambda_k\}$ satisfy the DRM assumption with basis function $q(x) = \ln x$ and parameters

\[ \alpha_k = \ln \frac{\lambda_k}{\lambda_0} + (\lambda_k - \lambda_0) \ln x_{\min}, \quad \beta_k = \lambda_0 - \lambda_k. \]

In addition to the exponential families, the DRM also naturally arises from many other statistical models, such as logistic regression models and biased sampling models, as described in the following subsections.

1.2.2 Relationship between logistic regression models and DRMs

The logistic regression model in case-control studies has a close relationship with the two-sample DRM (Qin and Zhang, 1997).
A case-control study aims to identify factors that may contribute to a medical condition by comparing two groups of individuals: those with the disease (case group) and those without the disease (control group). Let $Y$ be the group indicator variable, with 0 being control and 1 being case, and let $X$ be the random vector representing the exposures for an individual. A classical model for case-control data is the logistic regression model, used for example by Prentice and Pyke (1979), Farewell (1979) and Mantel (1973),

\[ \Pr(Y = 1 \mid X = x) = \frac{\exp(a + b^\top x)}{1 + \exp(a + b^\top x)}, \]

where $a$ and $b$ are unknown parameters. Suppose the exposures $X$ have an unspecified marginal density $f(x)$. Denote the conditional density of exposures $X$ given group $Y = i$, $i = 0, 1$, as $f_i(x)$. Then by Bayes' rule, we have

\[ f_0(x) = \frac{\Pr(Y = 0 \mid X = x)\, f(x)}{\Pr(Y = 0)} = \frac{f(x)}{\Pr(Y = 0)\{1 + \exp(a + b^\top x)\}}, \]
\[ f_1(x) = \frac{\Pr(Y = 1 \mid X = x)\, f(x)}{\Pr(Y = 1)} = \frac{\exp(a + b^\top x)\, f(x)}{\Pr(Y = 1)\{1 + \exp(a + b^\top x)\}}. \]

Hence the conditional densities of exposures $X$ given the group $Y = 0, 1$ constitute a two-sample DRM with basis function $q(x) = x$, $\alpha = a + \log\{\Pr(Y = 0)/\Pr(Y = 1)\}$, and $\beta = b$:

\[ f_1(x) = \exp\big(a + \log\{\Pr(Y = 0)/\Pr(Y = 1)\} + b^\top x\big)\, f_0(x). \]

As in the case of binary case-control data, if categorical data with categories $Y = 0, 1, \ldots, m$ and covariates $X$ satisfy the multinomial logit model assumption,

\[ \frac{\Pr(Y = k \mid X = x)}{\Pr(Y = 0 \mid X = x)} = \exp(a_k + b_k^\top x), \quad k = 1, 2, \ldots, m, \]

where $a_k$ and $b_k$, $k = 1, \ldots, m$, are unknown parameters, then the conditional density $f_k(x)$ of covariates $X$ given category $Y = k$ fulfills the DRM assumption with basis function $q(x) = x$, $\alpha_k = a_k + \log\{\Pr(Y = 0)/\Pr(Y = k)\}$, $k = 1, 2, \ldots, m$, and $\beta_k = b_k$:

\[ f_k(x) = \exp\big(a_k + \log\{\Pr(Y = 0)/\Pr(Y = k)\} + b_k^\top x\big)\, f_0(x). \]

1.2.3 Biased sampling and DRM

Another example of the DRM is the biased sampling model. Selection bias happens in survey sampling if not every unit in the population is given an equal chance to enter the sample.
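A minimal simulation sketch of such unequal selection chances: units are drawn with chance proportional to their own value, so the sample over-represents large values. The exponential population, the proportional weighting, the seed, and the helper `biased_sample` are all hypothetical choices made for illustration.

```python
import random

def biased_sample(population, weight, size, rng):
    """Draw `size` units with replacement, where a unit with value x
    is selected with chance proportional to weight(x)."""
    weights = [weight(x) for x in population]
    return rng.choices(population, weights=weights, k=size)

rng = random.Random(2014)

# Hypothetical population of positive measurements with mean about 1.
population = [rng.expovariate(1.0) for _ in range(20000)]

# Assumed weighting: larger values are proportionally more likely to be picked.
sample = biased_sample(population, lambda x: x, 5000, rng)

pop_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)
print(pop_mean, sample_mean)
```

The sample mean comes out clearly above the population mean, which is the sense in which the distribution of the sampled units differs from the population distribution.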
For example, in a survey of hospital patients, patients with longer visits may have a greater chance to be sampled than those with shorter visits; in a survey of life times of light bulbs, the bulbs with longer lives may be more likely to be chosen than those with shorter lives. In such cases, the distribution of the units in a sample differs from the population distribution. Let $F(x)$ denote the population distribution function. Let $w(x; \vartheta)$ be a weighting function specifying the "chance" of a unit with value $x$ being chosen, where $\vartheta$ is an unknown parameter. The density, $dG(x)$, of the sampled units is

\[ dG(x) = \frac{w(x; \vartheta)\, dF(x)}{\int w(x; \vartheta)\, dF(x)}. \]

This weighted density model can be extended to the multiple sample case. Suppose we have $m + 1$ samples, labeled as $0, 1, \ldots, m$, from unknown distributions, and the population distribution function of the $k$th sample, $F_k$, is a weighted version of the population distribution $F_0$ of sample 0, i.e.

\[ F_k(x) = \frac{\int_{-\infty}^{x} w_k(t; \vartheta)\, dF_0(t)}{\int_{-\infty}^{\infty} w_k(t; \vartheta)\, dF_0(t)}, \quad k = 1, \ldots, m, \tag{1.1} \]

where $w_k(x; \vartheta)$ is a given weighting function with parameter $\vartheta$. Vardi (1982, 1985) and Gill et al. (1988) studied the estimation of the $\{F_k\}$ under this multi-sample biased sampling model with a given parameter in the weighting function, and Gilbert et al. (1999) and Gilbert (2000) studied the same estimation problem in a more general case where the parameter $\vartheta$ in the weighting function is unknown. It is easily seen that the $\{F_k\}$ with weighting function $w_k(x; \beta) = \exp\{\beta_k^\top q(x)\}$ satisfy the DRM assumption, with $\alpha_k$ absorbing the normalizing constant in the denominator of (1.1).

1.2.4 Other examples

Density ratio models are also used for life data modeling. For example (Marshall and Olkin, 2007), given a distribution function $F(x)$, the moment parameter family of distributions,

\[ F_b(x) = \frac{1}{\mu_b} \int_0^x t^b\, dF(t), \quad x \ge 0, \]

where $\mu_b = \int_0^\infty t^b\, dF(t) < \infty$, and the Laplace transform parameter family of distributions,

\[ F_s(x) = \frac{1}{L(s)} \int_{-\infty}^{x} e^{-st}\, dF(t), \]

where $L(s) = \int_{-\infty}^{\infty} e^{-st}\, dF(t) < \infty$ is the Laplace transform of $F$, both satisfy the DRM assumption.

Moreover, specific forms of DRMs have been used by Anderson (1972) for logistic discrimination, Anderson (1979) for multivariate logistic compounds modeling, and Efron and Tibshirani (1996) for density estimation of exponential families.

1.3 Empirical likelihood inference under DRMs: historical and recent development

Although the use of the density ratio model can be traced back to Anderson (1972) for logistic discrimination, it was not until the 1990s that the DRM started to gain popularity, after Qin et al. published a series of papers (Qin 1993, Qin and Zhang 1997, Qin 1998) on inference problems under two-sample DRMs using the empirical likelihood (EL). Indeed, on the one hand, the EL, given its nonparametric characteristic, is a natural inference framework for the semiparametric DRM; on the other, the theoretical foundation of EL established by Owen (2001) and Qin's (1994) extension of EL for estimating equations make it a ready-to-use tool for inference under DRMs.

Qin (1998) formally introduced EL to inference problems under the DRM, and in a two-sample case, established the asymptotic normality of the maximum EL estimator of DRM parameters. After that, EL became a standard inference tool under the DRM, and numerous papers about various aspects of inference under the DRM using EL have been published. For estimation problems, Cheng and Chu (2004) and Fokianos (2004) studied density estimation under two-sample and multi-sample DRMs, respectively; Zhang (2000) and Chen and Liu (2013) studied quantile estimation under two-sample and multi-sample DRMs, respectively.
For hypothesis testing problems, Qin and Zhang (1997) and Zhang (2002) studied goodness-of-fit tests for logistic models and generalized logit models, respectively, based on case-control data using the DRM formulation; Fokianos et al. (2001) proposed a simple Wald-type test for linear hypotheses about the parameters of multi-sample DRMs; Keziou and Leoni-Aubin (2008) studied the EL ratio test for the equality of two distributions that satisfy a two-sample DRM. The effect of misspecifying the basis function of the DRM was assessed by Fokianos and Kaimi (2006), and the basis function selection problem was studied by Fokianos (2007).

The DRM has been adopted for inference based on censored samples. Ren (2008) proposed a weighted EL approach for inference under a two-sample DRM based on randomly censored observations. Wang et al. (2011) studied EL inference under a two-sample DRM for randomly right-censored data. Shen et al. (2012) studied EL inference under the DRM with randomly censored biased-sampling data.

The DRM is also used in the context of finite mixture models for its flexibility and robustness against model misspecification. Zou et al. (2002) and Zou (2002) proposed a finite mixture model, in which each mixing component is a further mixture of two distributions that satisfy a two-sample DRM, for modeling genetic loci influencing quantitative traits. They discovered that the DRM is not a regular model: when the slope DRM parameter $\beta = 0$, the normalization constant $\alpha$ must also be 0. Consequently, when the true DRM parameter $(\alpha^*, \beta^*) = 0$, the EL function is not well-defined in a neighbourhood of the true parameter, which violates an important regularity condition for likelihood type inference. Zou et al. hence proposed the so-called partial empirical likelihood (PEL) to get around the non-regularity of the DRM, and studied the asymptotic properties of the PEL.
Zhang's (2006) score test for homogeneity of population distributions under the same mixture model is also based on the PEL. A main criticism of the PEL is that it does not use all the information contained in the data, and hence is less efficient than the full EL. For the same mixture model as used by Zou et al. (2002), Tan (2009) proposed to treat $\alpha_k$ as a function of $\beta_k$ and the baseline distribution $F_0$. The resulting EL function does not contain the normalization constant $\alpha$, and hence the non-regularity issue is avoided. However, that EL is a very complicated function of $\beta$, and the computation of the maximum EL estimator is cumbersome. The work by Qin and Liang (2011) on hypothesis testing in a mixture case-control model is also along the lines of Tan (2009).

Luo and Tsai (2012) considered an extension of the DRM in which the slope parameter $\beta$ is regressed on a set of covariates. On the other hand, the DRM is also used for generalizing regression models. For example, Rathouz and Gao (2009), Huang and Rathouz (2012) and Huang (2014) adopted the DRM for mean regression when data are generated from a generalized linear model.

There is much more work on the DRM, including applications to the estimation and comparison of the receiver operating characteristic (ROC) curve (Wan and Zhang 2008, Guan et al. 2012) and to inference about the measurement of treatment effects (Fokianos and Troendle 2007, Jiang and Tu 2012).

The above literature review is meant as a general overview of the work that has been done on EL inference under the DRM. More specific and detailed reviews of the literature closely related to the theory established in this thesis are given in each subsequent chapter.

1.4 Highlights of the contributions of this thesis

As noted in previous sections, this thesis develops a theory of EL inference for parameter estimation and hypothesis testing concerning a number of different population distributions under the DRM, with multiple complete or Type I censored random samples from each. In particular, it presents the following new results.

(i) A dual EL ratio test for a general composite hypothesis about the DRM parameter based on multiple samples is developed. It embraces testing for change in population distributions as a special case. The limiting distributions of the corresponding test statistic under the null model and under a local alternative model are derived. The null limiting distribution is useful for approximating the p-values of the proposed test; the limiting distribution under the local alternative model is useful for approximating the power of the proposed test, calculating the sample size required for achieving a given power, and comparing the local asymptotic powers of dual EL ratio tests formulated in different ways.

(ii) The effects of information pooling by the DRM on the estimation accuracy of the maximum EL estimator of the DRM parameter and on the local asymptotic power of the dual EL ratio test are assessed in theory. It is shown that when additional samples are incorporated by the DRM, the estimation accuracy of the maximum EL estimator of the DRM parameter is usually increased and the local asymptotic power of the dual EL ratio test is often improved, even if the underlying distributions of the additional samples are not related to the population distributions of direct interest.

(iii) A general EL inference framework under the DRM based on multiple Type I censored samples is established. This inference framework is computationally efficient and equipped with rich asymptotic results. The theory of hypothesis testing for the DRM parameter under this framework is also developed. Moreover, this inference framework can potentially be used to extend any EL inference result that is available for complete samples under the DRM to the case of Type I censored samples.

In addition to the above contributions in theory, this thesis also presents an application of the proposed dual EL ratio test to assessing the bending and tension strengths of lumber produced in the years 2007, 2010 and 2011, with intuitively reasonable implications for the forest industry.

Moreover, the thesis introduces a user-friendly R software package we developed, called "drmdel", for dual EL inference under the DRM. This software package is fast because its core is written in C. It covers a broad range of methods, including the ones developed in this thesis as well as those developed by Chen and Liu (2013) for quantile estimation and by Fokianos (2004) for density estimation.

1.5 Outline of the thesis

The rest of the thesis is organized as follows. Chapter 2 introduces the concept of dual empirical likelihood and presents some preliminary results that are essential for the development of the theories in the succeeding chapters. Chapter 3 proposes a dual EL ratio test for a general composite hypothesis about the DRM parameter, and presents the asymptotic properties of that test. Chapter 4 studies the effects of information pooling by the DRM on the estimation accuracy of the maximum EL estimator and on the local asymptotic power of the dual EL ratio test. Chapter 5 establishes an EL inference framework under the DRM based on multiple Type I censored samples, and presents the theory of the EL ratio test under that framework. Chapter 6 introduces our software package drmdel for dual EL inference under the DRM and demonstrates its use.
The last chapter summarizes the results of this thesis and discusses some future work.

Chapter 2
Fundamentals of Dual Empirical Likelihood Inference under the DRM

This chapter lays the foundation of the dual empirical likelihood (DEL), which is the framework this thesis adopts for inference under the DRM. We first review some basics of EL inference for a single sample and for multiple samples under the DRM, then define the DEL and summarize its important properties. All the theorems presented in this chapter stem from the literature, although we present them under more general settings and conditions. We highlight a previously unnoticed result, Theorem 2.2, on the relationship between the information matrix and the asymptotic variance of the score function evaluated at the so-called true parameter value under the DRM, which is key to proving some of our results in succeeding chapters. The proofs are given in the last section.

We use bold symbols for vectors, normal symbols for scalars or matrices, upper case letters for random variables and lower case letters for the corresponding realized values.

2.1 EL for a single sample

Let $x_1, x_2, \ldots, x_n$ be an independent sample of size $n$ from a population with cumulative distribution function (CDF) $F(x)$. When $F$ is a discrete distribution, we have $dF(x_i) = F(x_i) - F(x_i^-)$. The EL of the distribution function $F$ is defined as if $F$ were discrete:
\[
L_n(F) = \prod_{i=1}^{n} dF(x_i), \quad \text{subject to} \quad \sum_{i=1}^{n} dF(x_i) = 1.
\]
When there are no ties in the observations, $L_n(F)$ is maximized when $dF(x_i) = 1/n$ for $i = 1, 2, \ldots, n$. This maximum corresponds to the empirical distribution of $\{x_i\}_{i=1}^n$, $F_n(x) = n^{-1} \sum_{i=1}^{n} 1(x_i \le x)$, where $1(\cdot)$ is the indicator function.

Many classical nonparametric inferences about $F$, e.g. quantile estimation, are based on this unconstrained maximum EL estimator of the distribution function, or equivalently the empirical distribution, $F_n$.
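As a small numerical illustration of the statement that equal weights $1/n$ maximize the EL, the following Python sketch evaluates the log EL for two weight vectors and builds the empirical CDF. The function names (`log_el`, `ecdf`), the toy data and the "tilted" weights are illustrative assumptions and are not taken from the thesis.

```python
import math

def log_el(weights):
    """Log empirical likelihood sum(log p_i) for weights p_i = dF(x_i)."""
    if abs(sum(weights) - 1.0) > 1e-12 or min(weights) <= 0.0:
        raise ValueError("weights must be positive and sum to one")
    return sum(math.log(p) for p in weights)

def ecdf(sample):
    """Return the empirical CDF F_n(x) = n^{-1} * #{x_i <= x}."""
    n = len(sample)
    return lambda x: sum(1 for xi in sample if xi <= x) / n

# Hypothetical toy sample with no ties.
sample = [2.3, 0.7, 1.9, 3.1, 1.2]
n = len(sample)

uniform = [1.0 / n] * n                  # the EL maximizer dF(x_i) = 1/n
tilted = [0.30, 0.10, 0.20, 0.25, 0.15]  # another valid weight vector

F_n = ecdf(sample)
print(log_el(uniform), log_el(tilted), F_n(1.9))
```

Any weight vector other than the uniform one gives a strictly smaller log EL, which is the sense in which $F_n$ is the unconstrained maximum EL estimator.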
A breakthrough came from a result on the so-called profile EL of the population mean. The profile log EL of the population mean $\mu$, $l_n(\mu)$, is defined as the supremum of $\log L_n(F)$ over the class of distribution functions with $\{x_i\}_{i=1}^n$ as support, i.e. the distribution functions of the form
\[
F(x) = \sum_{i=1}^{n} p_i 1(x_i \le x) \quad \text{with} \quad \sum_{i=1}^{n} p_i = 1,
\]
such that $\int x\, dF(x) = \mu$ for fixed $\mu$. In other words,
\[
l_n(\mu) = \sup\Big\{\log L_n(F) : F(x) = \sum_{i=1}^{n} p_i 1(x_i \le x),\ \sum_{i=1}^{n} p_i = 1,\ \sum_{i=1}^{n} x_i p_i = \mu\Big\}.
\]
The maximum of $l_n(\mu)$ as a function of $\mu$ is easily found to be attained at $\mu = \bar{x} = n^{-1} \sum_{i=1}^{n} x_i$. Owen (1988) showed that the likelihood ratio statistic
\[
2\{l_n(\bar{x}) - l_n(\mu^*)\},
\]
where $\mu^*$ is the true population mean, converges in distribution to a chi-square random variable with one degree of freedom under mild moment conditions. This elegant theorem is the foundation of various EL based inferences.

Another remarkable piece of work is that of Qin and Lawless (1994), which extends the empirical likelihood framework to incorporate a set of estimating functions. Suppose a parameter vector of interest, $\vartheta$, can be defined as the solution to
\[
E\{g(X; \vartheta)\} = 0,
\]
where $g$ is a smooth function of $\vartheta$. The profile log EL, $l_n(\vartheta)$, of $\vartheta$ is defined to be the supremum of $\log L_n(F)$ over the class of distribution functions with $\{x_i\}_{i=1}^n$ as support, subject to $\int g(x; \vartheta)\, dF(x) = 0$ for fixed $\vartheta$, i.e.
\[
l_n(\vartheta) = \sup\Big\{\log L_n(F) : F(x) = \sum_{i=1}^{n} p_i 1(x_i \le x),\ \sum_{i=1}^{n} p_i = 1,\ \sum_{i=1}^{n} p_i g(x_i; \vartheta) = 0\Big\}.
\]
This profile log EL $l_n(\vartheta)$ is again found to be useful for inference on $\vartheta$. In particular, the maximum profile log EL estimator of $\vartheta$ is asymptotically normal, and the EL ratio statistic still has a chi-square limiting distribution, just as in the parametric case.

2.2 EL for multiple samples under the DRM

Suppose we have $m + 1$ independent random samples, denoted
\[
\{x_{kj} : j = 1, 2, \ldots, n_k\}_{k=0}^{m},
\]
with $n_k > 0$ being the size of the $k$th sample, which are collected from populations with distribution functions $F_k$, $k = 0, 1, \ldots, m$. Denote the total sample size as $n = \sum_k n_k$.
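Let $dF_k(x) = F_k(x) - F_k(x^-)$, and put $p_{kj} = dF_0(x_{kj})$. Suppose the $\{F_k\}$ satisfy the DRM assumption postulated in Section 1.2:
\[
dF_k(x) = \exp\{\alpha_k + \beta_k^\top q(x)\}\, dF_0(x), \quad \text{for } k = 1, 2, \ldots, m, \tag{2.1}
\]
where the basis function $q(x)$ is a prespecified $d$-dimensional function, and $\theta_k^\top = (\alpha_k, \beta_k^\top)$ are model parameters. We denote $\theta_0 = 0$ for ease of exposition. This assumption implies that the $\{F_k\}$ satisfy
\[
\int dF_k(x) = \int \exp\{\alpha_k + \beta_k^\top q(x)\}\, dF_0(x) = 1. \tag{2.2}
\]
Under the DRM assumption, the EL of the $\{F_k\}$ is given by
\[
L_n(F_0, F_1, \ldots, F_m) = \prod_{k, j} dF_k(x_{kj}) = \prod_{k, j} \exp\{\alpha_k + \beta_k^\top q(x_{kj})\}\, dF_0(x_{kj})
= \Big\{\prod_{k, j} p_{kj}\Big\} \cdot \exp\Big\{\sum_{k, j} \big(\alpha_k + \beta_k^\top q(x_{kj})\big)\Big\}, \tag{2.3}
\]
where the sum and product are over all possible $(k, j)$ combinations. Let $\alpha = (\alpha_1, \ldots, \alpha_m)^\top$, $\beta^\top = (\beta_1^\top, \ldots, \beta_m^\top)$, and $\theta^\top = (\alpha^\top, \beta^\top)$. We may also write the EL as $L_n(\theta, F_0)$.

The maximum EL estimator (MELE) of $\theta$ and $F_0$ is the maximum point of $L_n(\theta, F_0)$ over the space of $\theta$ and $F_0$ such that (2.2) is satisfied. As in the case of a single sample, for both theoretical discussion and numerical computation, the maximization is carried out in two steps. First, we define the profile log EL:
\[
\tilde{l}_n(\theta) = \sup_{F_0}\Big\{\log L_n(\theta, F_0) : \sum_{k, j} \exp\{\alpha_r + \beta_r^\top q(x_{kj})\}\, p_{kj} = 1,\ r = 0, \ldots, m\Big\},
\]
where the supremum is taken over the space of $F_0$ with $\theta$ fixed. This supremum can be obtained by the method of Lagrange multipliers. For a fixed $\theta$, define the Lagrange function:
\[
\Phi(\{p_{kj}\}, \{\lambda_r\}_{r=0}^{m}) = \log L_n(\theta, F_0) + n \sum_{r=0}^{m} \lambda_r \Big\{1 - \sum_{k, j} p_{kj} \exp\{\alpha_r + \beta_r^\top q(x_{kj})\}\Big\}.
\]
The point $\{p_{kj}\}$ at which $L_n(\theta, F_0)$ is maximized must be a stationary point of $\Phi(\{p_{kj}\}, \{\lambda_r\})$ satisfying
\[
\partial \Phi(\{p_{kj}\}, \{\lambda_r\}) / \partial p_{kj} = 0, \tag{2.4}
\]
\[
\partial \Phi(\{p_{kj}\}, \{\lambda_r\}) / \partial \lambda_r = 0. \tag{2.5}
\]
Note that, at this stationary point,
\[
\begin{aligned}
0 &= \sum_{k, j} p_{kj} \big\{\partial \Phi(\{p_{kj}\}, \{\lambda_r\}) / \partial p_{kj}\big\} \\
&= \sum_{k, j} p_{kj} \Big\{1/p_{kj} - n \sum_{r=0}^{m} \lambda_r \exp\{\alpha_r + \beta_r^\top q(x_{kj})\}\Big\} \\
&= n - n \sum_{r=0}^{m} \lambda_r \Big\{\sum_{k, j} \exp\{\alpha_r + \beta_r^\top q(x_{kj})\}\, p_{kj}\Big\} \\
&= n - n \sum_{r=0}^{m} \lambda_r,
\end{aligned}
\]
where the last equality is obtained from the constraint $\sum_{k, j} \exp\{\alpha_r + \beta_r^\top q(x_{kj})\}\, p_{kj} = 1$ for all $r = 0, 1, \ldots, m$.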
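Solving equations (2.4) and using the relationship $0 = n - n \sum_{r=0}^{m} \lambda_r$, we find that the supremum of $L_n(\theta, F_0)$ is attained when $\lambda_0 = 1 - \sum_{r=1}^{m} \lambda_r$ and
\[
p_{kj} = n^{-1} \Big\{1 + \sum_{r=1}^{m} \lambda_r \big[\exp\{\alpha_r + \beta_r^\top q(x_{kj})\} - 1\big]\Big\}^{-1}, \tag{2.6}
\]
where the Lagrange multipliers $\{\lambda_r\}_{r=1}^{m}$ solve, for $t = 0, 1, \ldots, m$,
\[
\sum_{k, j} \exp\{\alpha_t + \beta_t^\top q(x_{kj})\}\, p_{kj} = 1. \tag{2.7}
\]
The profile log EL can hence be written as
\[
\tilde{l}_n(\theta) = -\sum_{k, j} \log\Big\{1 + \sum_{r=1}^{m} \lambda_r \big[\exp\{\alpha_r + \beta_r^\top q(x_{kj})\} - 1\big]\Big\} + \sum_{k, j} \{\alpha_k + \beta_k^\top q(x_{kj})\}. \tag{2.8}
\]
The MELE $\hat\theta$ is then the point at which $\tilde{l}_n(\theta)$ is maximized. Given $\hat\theta$, we solve for the Lagrange multipliers $\hat\lambda_r$ through (2.7). The MELE must satisfy $\partial \tilde{l}_n(\hat\theta)/\partial \alpha_k = 0$ for $k = 1, \ldots, m$. We see that
\[
\begin{aligned}
\frac{\partial \tilde{l}_n(\hat\theta)}{\partial \alpha_k}
&= n_k - \sum_{t, j} \frac{\hat\lambda_k \exp\{\hat\alpha_k + \hat\beta_k^\top q(x_{tj})\}}{1 + \sum_{r=1}^{m} \hat\lambda_r \big[\exp\{\hat\alpha_r + \hat\beta_r^\top q(x_{tj})\} - 1\big]} \\
&= n_k - \hat\lambda_k n \sum_{t, j} \exp\{\hat\alpha_k + \hat\beta_k^\top q(x_{tj})\}\, \hat{p}_{tj} \\
&= n_k - \hat\lambda_k n,
\end{aligned}
\]
where the second equality is by (2.6) and the last equality is by (2.7). Therefore $\partial \tilde{l}_n(\hat\theta)/\partial \alpha_k = n_k - \hat\lambda_k n = 0$, and so, when $\theta = \hat\theta$, we have
\[
\hat\lambda_k = n_k / n.
\]
Subsequently, we obtain $\hat{p}_{kj}$ by plugging $\hat\theta$ and $\hat\lambda_k$ into (2.6):
\[
\hat{p}_{kj} = n^{-1} \Big\{1 + \sum_{r=1}^{m} \hat\lambda_r \big[\exp\{\hat\alpha_r + \hat\beta_r^\top q(x_{kj})\} - 1\big]\Big\}^{-1}. \tag{2.9}
\]
Finally, since the factor $n^{-1}$ is already contained in $\hat{p}_{rj}$, the MELE of $F_k$, $k = 0, 1, \ldots, m$, is given by
\[
\hat{F}_k(x) = \sum_{r, j} \exp\{\hat\alpha_k + \hat\beta_k^\top q(x_{rj})\}\, \hat{p}_{rj}\, 1(x_{rj} \le x), \tag{2.10}
\]
which is a proper distribution function by (2.7).

2.3 Non-regularity of the DRM and dual empirical likelihood

In applications such as the one to the forest products industry described in the Introduction, giving a point estimate is a minor part of the data analysis. Assessing the uncertainty in the point estimator and testing hypotheses would be judged of greater practical importance. Asymptotic properties of the point estimator and the likelihood function enable such in-depth data analyses. However, classical asymptotic theories usually rely on differential properties of the likelihood function in the neighbourhood of the true parameter value.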
Consequently, these results are applicable only if this neighbourhood lies in the parameter space.

According to (2.2), we have
\[
\alpha_k = -\log \int \exp\{\beta_k^\top q(x)\}\, dF_0(x).
\]
Thus, $\alpha_k = 0$ whenever $\beta_k = 0$. When the true value $\theta_1 = 0$, its neighbourhood will not be contained in the parameter space. In statistical terminology, the DRM is not regular at this $\theta$, as noticed by Zou et al. (2002). Clearly, regularity is also violated when $\beta_k = \beta_j$, $k \ne j$, which implies $\alpha_k = \alpha_j$. In our targeted applications, $\theta_k$ would be the parameter of the lumber population, $F_k$, at year $k$, and we are particularly concerned about the stability of lumber quality. Note that $\theta_k = \theta_j$ would signify the stability of the wood quality over these two years, because under the DRM, $F_k = F_0$ is equivalent to $\theta_k = 0$, and $F_k = F_j$, $k \ne j$, is equivalent to $\theta_k = \theta_j$. In a statistical context, we test this stability by detecting differences among lumber distributions:
\[
H_0 : F_k = F_j \text{ for all } k, j \in \{0, \ldots, m\} \quad \text{against} \quad H_1 : F_k \ne F_j \text{ for some } k, j,
\]
or equivalently,
\[
H_0 : \theta_k = \theta_j \text{ for all } k, j \in \{0, \ldots, m\} \quad \text{against} \quad H_1 : \theta_k \ne \theta_j \text{ for some } k, j.
\]
Non-regularity rules out a straightforward application of the EL ratio test to this important hypothesis. This creates a need for other effective inferential methods.

To enable likelihood type inference in the presence of non-regularity of the DRM, Keziou and Leoni-Aubin (2008) proposed to use a "dual" form of the EL in the case of two samples. We extend their notion to the case of multiple samples and refer to it as the dual empirical likelihood function. Recall that when the profile log EL (2.8) is maximized, i.e. when $\theta = \hat\theta$, we have $\hat\lambda_r = n_r / n$.
We define the DEL by replacing the $\{\lambda_r\}$ with $\{\hat\lambda_r\}$ in (2.8):
\[
l_n(\theta) = -\sum_{k, j} \log\Big\{\sum_{r=0}^{m} \hat\lambda_r \exp\{\alpha_r + \beta_r^\top q(x_{kj})\}\Big\} + \sum_{k, j} \{\alpha_k + \beta_k^\top q(x_{kj})\}. \tag{2.11}
\]
Clearly, the MELE is also the point at which the DEL is maximized,
\[
\hat\theta = \operatorname*{argmax}_{\theta}\, l_n(\theta),
\]
and the profile log EL and the DEL have the same maximal value, $l_n(\hat\theta) = \tilde{l}_n(\hat\theta)$.

The DEL has the following appealing features: (i) it is well-defined for all values of $\theta$ in the corresponding Euclidean space, and thus can be "safely" used for likelihood type inference even when $\theta_k = \theta_j$ for some $k \ne j$, $k, j = 0, \ldots, m$; (ii) it has a much simpler analytical form than the profile log EL, since the $\{\lambda_k\}$ in the profile log EL are replaced by the $\{\hat\lambda_k\}$, which do not depend on the data values; and (iii) as we will prove in the next section, the DEL is a smooth concave function of the DRM parameter $\theta$ (while the profile log EL is not), which leads to nice theoretical properties of the DEL and makes numerical computation of the MELE a pleasant task. The inference methods developed in this thesis are based on the DEL because of these attractive characteristics.

2.4 Properties of the DEL

This section presents some properties of the DEL that are useful for the development of the theory in the sequel. Most of these properties have been given in the literature under a two-sample DRM and/or under slightly different conditions. We highlight an unnoticed result, the relationship between the information matrix and the asymptotic variance of the score function, which is key to our study of the asymptotic properties of the DEL ratio statistic in subsequent chapters. The well-known asymptotic normality of the MELE is also included to keep the thesis self-contained. The proofs of the results are given in the last section of this chapter.

For a matrix $A$, we will use $A > 0$ to denote that $A$ is positive definite, and $A \ge 0$ to denote that $A$ is positive semidefinite.
For a differentiable function $g(x)$ and a particular value $x_0$, we use $\partial g(x_0)/\partial x$ to denote $(\partial g(x)/\partial x)|_{x = x_0}$.

Theorem 2.1 (Properties of the information matrix). Suppose we have $m + 1$ random samples from populations with distributions of the DRM form given in (2.1) and a true parameter value $\theta^*$ such that
\[
\int \exp\{\beta_k^\top q(x)\}\, dF_0(x) < \infty
\]
for $\theta$ in a neighbourhood of $\theta^*$, $\int Q(x) Q^\top(x)\, dF_0(x) > 0$ with $Q^\top(x) = (1, q^\top(x))$, and $\hat\lambda_k = n_k / n = \rho_k + o(1)$ for some constant $\rho_k \in (0, 1)$. Then the empirical information matrix $U_n = -n^{-1} \partial^2 l_n(\theta^*) / \partial\theta \partial\theta^\top$ converges almost surely to a positive definite matrix $U = \lim_{n \to \infty} U_n$.

Remark 2.1. The condition that $\int Q(x) Q^\top(x)\, dF_0(x) > 0$ is equivalent to saying that $Q(x) Q^\top(x) > 0$ on a set (of $x$ values) of positive probability with respect to $F_0$. It is a model identifiability condition. If any two components of the extended basis function $Q(x)$ are linearly dependent with probability one, say one component of $q(x)$ is a constant, or $x^2$ and $x^2 + 2$ are two components of $q(x)$, then the DRM is clearly not identifiable. This assumption ensures that we do not use a non-identifiable model.

The limiting matrix $U$ may be regarded as an information matrix. We partition the entries of $U$ in agreement with $\alpha$ and $\beta$ and represent them as $U_{\alpha\alpha}$, $U_{\alpha\beta}$, $U_{\beta\alpha}$ and $U_{\beta\beta}$. Let $\varphi_k(\theta, x) = \exp\{\alpha_k + \beta_k^\top q(x)\}$, $k = 0, \ldots, m$, and
\[
\begin{aligned}
h(\theta, x) &= (\rho_1 \varphi_1(\theta, x), \ldots, \rho_m \varphi_m(\theta, x))^\top, \\
s(\theta, x) &= \rho_0 + \sum_{k=1}^{m} \rho_k \varphi_k(\theta, x), \\
H(\theta, x) &= \operatorname{diag}\{h(\theta, x)\} - h(\theta, x) h^\top(\theta, x) / s(\theta, x).
\end{aligned} \tag{2.12}
\]
Let $E_k(\cdot)$, $k = 0, 1, \ldots, m$, be the expectation operator with respect to $F_k$, i.e. $E_k\{g(x)\} = \int g(x)\, dF_k(x)$ for a measurable function $g(x)$. Then the blockwise algebraic expressions of the information matrix $U$ in terms of $H(\theta^*, x)$ and $q(x)$ can be written as
\[
\begin{aligned}
U_{\alpha\alpha} &= -\lim_{n \to \infty} n^{-1} \partial^2 l_n(\theta^*) / \partial\alpha \partial\alpha^\top = E_0\{H(\theta^*, x)\}, \\
U_{\beta\beta} &= -\lim_{n \to \infty} n^{-1} \partial^2 l_n(\theta^*) / \partial\beta \partial\beta^\top = E_0\{H(\theta^*, x) \otimes (q(x) q^\top(x))\}, \\
U_{\alpha\beta} &= -\lim_{n \to \infty} n^{-1} \partial^2 l_n(\theta^*) / \partial\alpha \partial\beta^\top = U_{\beta\alpha}^\top = E_0\{H(\theta^*, x) \otimes q^\top(x)\},
\end{aligned} \tag{2.13}
\]
where $\otimes$ is the Kronecker product operator.

Put
\[
v = n^{-1/2} \{\partial l_n(\theta^*) / \partial\theta\}.
\]
Let $E(\cdot)$ be the usual expectation operator. Let
\[
T = \rho_0^{-1} 1_m 1_m^\top + \operatorname{diag}\{\rho_1^{-1}, \rho_2^{-1}, \ldots, \rho_m^{-1}\}
\quad \text{and} \quad
W = \begin{pmatrix} T & 0_{m \times md} \\ 0_{md \times m} & 0_{md \times md} \end{pmatrix},
\]
where $1_k$, in general, is a vector of 1s of length $k$.

Theorem 2.2 (Asymptotic properties of the score function). Under the conditions of Theorem 2.1, we have $E v = 0$, and $v$ is asymptotically multivariate normal with mean 0 and covariance matrix
\[
V = U - U W U. \tag{2.14}
\]

Remark 2.2. In the parametric likelihood setting, the information matrix equals the asymptotic variance of the $1/\sqrt{n}$-scaled score function evaluated at the true parameter. In the DEL framework under the DRM, the relationship between the information matrix and the asymptotic variance of the score function evaluated at the true value, as shown in (2.14), is different. This is a previously unnoticed result, although it has been implicitly used by Chen and Liu (2013) and Zhang (2002). Due to the complicated algebraic expressions of $U$ and $V$, this relationship is not obvious. However, once it is observed, as we will see later, we can intentionally "forget" the algebraic expression of $U$ given by (2.13), and use this relationship alone to derive the asymptotic distribution of the DEL ratio statistic.

The following two lemmas are useful for establishing the consistency of the MELE $\hat\theta$.

Lemma 2.3. The DEL function $l_n(\theta)$ defined in (2.11) is a concave function. Moreover, when $\sum_{k, j} Q(x_{kj}) Q^\top(x_{kj}) > 0$, $l_n(\theta)$ is strictly concave.

Remark 2.3. The condition that $\sum_{k, j} Q(x_{kj}) Q^\top(x_{kj}) > 0$ is a sample version of the condition that $\int Q(x) Q^\top(x)\, dF_0(x) > 0$ in Theorem 2.1. This condition is usually fulfilled if the number of distinct data points is larger than $d + 1$ and the components of $Q(x)$ are not functionally linearly dependent.

The concavity of the DEL in the case of two samples was pointed out by Keziou and Leoni-Aubin (2008). Since the DEL is concave, whenever it has a maximum, the maximum is a global one, and if the DEL is strictly concave, this maximum is unique.
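To make the practical meaning of concavity concrete, here is a minimal Python sketch that maximizes the DEL (2.11) by a damped Newton iteration in the simplest setting: two samples ($m = 1$) and a scalar basis function $q(x) = x$. The simulated data, the function names and the choice of Newton's method are assumptions made for illustration only; this is not the algorithm used in drmdel.

```python
import math
import random

def del_value(theta, x0, x1):
    """Two-sample DEL (2.11) with q(x) = x and theta = (alpha, beta)."""
    alpha, beta = theta
    n0, n1 = len(x0), len(x1)
    lam0, lam1 = n0 / (n0 + n1), n1 / (n0 + n1)
    total = sum(alpha + beta * x for x in x1)
    for x in x0 + x1:
        total -= math.log(lam0 + lam1 * math.exp(alpha + beta * x))
    return total

def score_and_hessian(theta, x0, x1):
    """Gradient and Hessian of the two-sample DEL with respect to (alpha, beta)."""
    alpha, beta = theta
    n0, n1 = len(x0), len(x1)
    lam0, lam1 = n0 / (n0 + n1), n1 / (n0 + n1)
    g = [float(n1), sum(x1)]
    h = [[0.0, 0.0], [0.0, 0.0]]
    for x in x0 + x1:
        e = lam1 * math.exp(alpha + beta * x)
        w = e / (lam0 + e)        # sample version of h / s
        c = w * (1.0 - w)         # sample version of H / s when m = 1
        g[0] -= w
        g[1] -= w * x
        h[0][0] -= c
        h[0][1] -= c * x
        h[1][1] -= c * x * x
    h[1][0] = h[0][1]
    return g, h

def maximize_del(x0, x1, tol=1e-8, max_iter=50):
    """Damped Newton ascent started from theta = 0."""
    theta = (0.0, 0.0)
    for _ in range(max_iter):
        g, h = score_and_hessian(theta, x0, x1)
        if max(abs(g[0]), abs(g[1])) < tol:
            break
        det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
        # Newton direction d = -H^{-1} g for the 2 x 2 Hessian.
        d0 = -(h[1][1] * g[0] - h[0][1] * g[1]) / det
        d1 = -(-h[1][0] * g[0] + h[0][0] * g[1]) / det
        old = del_value(theta, x0, x1)
        step = 1.0
        new = (theta[0] + d0, theta[1] + d1)
        # Halve the step until the DEL does not decrease (small slack for rounding).
        while del_value(new, x0, x1) < old - 1e-10 and step > 1e-6:
            step /= 2.0
            new = (theta[0] + step * d0, theta[1] + step * d1)
        theta = new
    return theta

# Hypothetical data: N(0, 1) versus N(1, 1), a DRM with (alpha, beta) = (-0.5, 1).
random.seed(2014)
x0 = [random.gauss(0.0, 1.0) for _ in range(200)]
x1 = [random.gauss(1.0, 1.0) for _ in range(200)]

theta_hat = maximize_del(x0, x1)
n = len(x0) + len(x1)
lam0, lam1 = len(x0) / n, len(x1) / n
# Fitted baseline weights, as in (2.9) with lambda_r = n_r / n.
p_hat = [1.0 / (n * (lam0 + lam1 * math.exp(theta_hat[0] + theta_hat[1] * x)))
         for x in x0 + x1]
print(theta_hat, sum(p_hat))
```

Because the DEL is concave, the starting value $\theta = 0$ is unproblematic even though the DRM itself is non-regular there, and the fitted weights sum to one at the maximizer because the score equation in $\alpha$ holds.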
The next lemma states that when the total sample size $n$ goes to infinity, the DEL $l_n(\theta)$ has a maximum with probability tending to one, and this maximum is in an $n^{-1/3}$-neighbourhood of the true parameter $\theta^*$. Let $\|\cdot\|$ denote the Euclidean norm of a vector.

Lemma 2.4. Adopt the conditions postulated in Theorem 2.1. As $n \to \infty$, with probability tending to one, the DEL $l_n(\theta)$ attains its maximum at some point in the interior of the closed ball $B_{\theta^*} = \{\theta : \|\theta - \theta^*\| \le n^{-1/3}\}$, which is centered on the true parameter value $\theta^*$.

A two-sample version of Lemma 2.4 was given by Keziou and Leoni-Aubin (2008).

Remark 2.4. Lemmas 2.3 and 2.4 together confirm that when the sample size is large, the MELE is well-defined and easy to compute: it is a global maximal point of a concave function whose maximum exists with probability tending to one. Furthermore, they imply that the MELE $\hat\theta$ is $\sqrt[3]{n}$-consistent: by concavity, all the maximal points of the DEL must be interior points of the closed ball $B_{\theta^*}$; hence the MELE $\hat\theta$, as a maximal point of the DEL, must also be an interior point of $B_{\theta^*}$.

With Theorems 2.1 and 2.2 and Lemmas 2.3 and 2.4, the asymptotic normality of the MELE $\hat\theta$ is an easy consequence.

Theorem 2.5 (Asymptotic normality of the MELE). Under the conditions of Theorem 2.1, $\sqrt{n}(\hat\theta - \theta^*)$ has an asymptotic multivariate normal distribution with mean 0 and covariance matrix $U^{-1} - W$, where $W$ is given in Theorem 2.2.

The asymptotic normality of $\hat\theta$ was also established by Chen and Liu (2013) and by Zhang (2002) under slightly different conditions. Theorem 2.5 reveals that the MELE is root-$n$ consistent, an important fact that we will use in the subsequent chapters.

2.5 Proofs

This section gives proofs of the theorems and lemmas presented in the last section. We first introduce more notation, applicable to $k = 0, \ldots, m$. Recall that $\varphi_k(\theta, x) = \exp\{\alpha_k + \beta_k^\top q(x)\}$. We write
\[
L_{n,k}(\theta, x) = -\log\Big\{\sum_{r=0}^{m} \hat\lambda_r \varphi_r(\theta, x)\Big\} + \{\alpha_k + \beta_k^\top q(x)\}
\]
with $\hat\lambda_r = n_r / n$ being the sample proportion.
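Hence, the DEL is $l_n(\theta) = \sum_{k, j} L_{n,k}(\theta, x_{kj})$, where the summation is over all possible $(k, j)$. Let $L_k(\theta, x)$ be the "population" version of $L_{n,k}(\theta, x)$, obtained by replacing $\hat\lambda_r$ with its limit $\rho_r$ in the above definition. Let $e_k$ be a vector of length $m$ with the $k$th entry being 1 and the others being 0, and let $\delta_{ij} = 1$ when $i = j$, and 0 otherwise. Recall the definitions (2.12) of $h(\theta, x)$, $s(\theta, x)$ and $H(\theta, x)$. The first order derivatives of $L_k(\theta, x)$ can be written as
\[
\begin{aligned}
\partial L_k(\theta, x) / \partial\alpha &= (1 - \delta_{k0}) e_k - h(\theta, x) / s(\theta, x), \\
\partial L_k(\theta, x) / \partial\beta &= \{\partial L_k(\theta, x) / \partial\alpha\} \otimes q(x).
\end{aligned} \tag{2.15}
\]
Similarly, we have
\[
\begin{aligned}
\partial^2 L_k(\theta, x) / \partial\alpha \partial\alpha^\top &= -H(\theta, x) / s(\theta, x), \\
\partial^2 L_k(\theta, x) / \partial\beta \partial\beta^\top &= -\{H(\theta, x) / s(\theta, x)\} \otimes \{q(x) q^\top(x)\}, \\
\partial^2 L_k(\theta, x) / \partial\alpha \partial\beta^\top &= -\{H(\theta, x) / s(\theta, x)\} \otimes q^\top(x).
\end{aligned} \tag{2.16}
\]
The algebraic expressions of the derivatives of $L_{n,k}(\theta, x)$ are similar to those of $L_k(\theta, x)$, with $\rho_r$ replaced by the sample proportion $\hat\lambda_r$. Note that all entries of $h(\theta, x)$ are non-negative, and $s(\theta, x)$ exceeds the sum of all entries of $h(\theta, x)$. Thus, $\|h(\theta, x) / s(\theta, x)\| \le 1$, and the absolute value of each entry of $H(\theta, x) / s(\theta, x)$ is bounded by 1. By examining the algebraic expressions closely, this result implies
\[
\begin{aligned}
\big|\partial^2 L_{n,k}(\theta, x) / \partial\theta_i \partial\theta_j\big| &\le 1 + q^\top(x) q(x), \\
\big|\partial^3 L_{n,k}(\theta, x) / \partial\theta_i \partial\theta_j \partial\theta_k\big| &\le \{1 + q^\top(x) q(x)\}^{3/2},
\end{aligned} \tag{2.17}
\]
where $\theta_i$ in general denotes the $i$th entry of $\theta$.

We also observe the following important relationships between the first and second order derivatives of $L_k(\theta, x)$:
\[
E_0\Big\{\frac{\partial L_0(\theta^*, x)}{\partial\alpha}\Big\} = -\rho_0^{-1} U_{\alpha\alpha} 1_m, \quad
E_0\Big\{\frac{\partial L_0(\theta^*, x)}{\partial\beta}\Big\} = -\rho_0^{-1} U_{\beta\alpha} 1_m, \tag{2.18}
\]
and, for $k = 1, 2, \ldots, m$,
\[
\begin{aligned}
E_k\Big\{\frac{\partial L_k(\theta^*, x)}{\partial\alpha}\Big\} &= \rho_k^{-1} U_{\alpha\alpha} e_k, &
E_k\Big\{\frac{\partial L_k(\theta^*, x)}{\partial\alpha} q^\top(x)\Big\} &= \rho_k^{-1} U_{\alpha\beta} (e_k \otimes I_d), \\
E_k\Big\{\frac{\partial L_k(\theta^*, x)}{\partial\beta}\Big\} &= \rho_k^{-1} U_{\beta\alpha} e_k, &
E_k\Big\{\frac{\partial L_k(\theta^*, x)}{\partial\beta} q^\top(x)\Big\} &= \rho_k^{-1} U_{\beta\beta} (e_k \otimes I_d).
\end{aligned} \tag{2.19}
\]
As a reminder, $E_k(\cdot)$, $k = 0, 1, \ldots, m$, is the expectation operator with respect to $F_k$.

The assumption that $\int \exp\{\beta_k^\top q(x)\}\, dF_0(x) < \infty$ for $\theta$ in a neighbourhood of $\theta^*$ implies that the moment generating function of $q(x)$ with respect to each $F_k$ exists in a neighbourhood of 0. Hence, all finite order moments of $q(x)$ with respect to each $F_k$ are finite.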
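This fact and inequalities (2.17) reveal that the second and third order derivatives of $l_n(\theta)$ are bounded by an integrable function.

With the above preparation, we are ready to prove the theorems given in the chapter.

2.5.1 Theorem 2.1: Properties of the information matrix

We now show that the empirical information matrix $U_n = -n^{-1} \partial^2 l_n(\theta^*) / \partial\theta \partial\theta^\top$ converges almost surely to a positive definite information matrix, and give its algebraic expression.

Recalling that $l_n(\theta) = \sum_{k, j} L_{n,k}(\theta, x_{kj})$ and $\hat\lambda_k = n_k / n$, we have
\[
U_n = -\frac{1}{n} \frac{\partial^2 l_n(\theta^*)}{\partial\theta \partial\theta^\top}
= -\sum_{k=0}^{m} \hat\lambda_k \Big\{\frac{1}{n_k} \sum_{j=1}^{n_k} \frac{\partial^2 L_{n,k}(\theta^*, x_{kj})}{\partial\theta \partial\theta^\top}\Big\}.
\]
Each term in the curly brackets is an average of independent and identically distributed (iid) random variables. By bound (2.17) and the fact that $q(x)$ has finite second moments, these random variables have finite covariance matrices. Hence, by the strong law of large numbers (Chow and Teicher, 1997, 5.4, Theorem 1), each term in the curly brackets has an almost sure limit. Along with $\lim_{n \to \infty} n_k / n = \rho_k$, we have that $\{U_n\}$ has an almost sure limit
\[
U = \lim_{n \to \infty} U_n = -\sum_{k=0}^{m} \rho_k E_k\Big\{\frac{\partial^2 L_k(\theta^*, x)}{\partial\theta \partial\theta^\top}\Big\}.
\]
By expressions (2.16) of $\partial^2 L_k(\theta^*, x) / \partial\theta \partial\theta^\top$, we easily get the blockwise algebraic expressions of $U$ as given in (2.13):
\[
\begin{aligned}
U_{\alpha\alpha} &= -\lim_{n \to \infty} n^{-1} \partial^2 l_n(\theta^*) / \partial\alpha \partial\alpha^\top = E_0\{H(\theta^*, x)\}, \\
U_{\beta\beta} &= -\lim_{n \to \infty} n^{-1} \partial^2 l_n(\theta^*) / \partial\beta \partial\beta^\top = E_0\{H(\theta^*, x) \otimes (q(x) q^\top(x))\}, \\
U_{\alpha\beta} &= -\lim_{n \to \infty} n^{-1} \partial^2 l_n(\theta^*) / \partial\alpha \partial\beta^\top = U_{\beta\alpha}^\top = E_0\{H(\theta^*, x) \otimes q^\top(x)\}.
\end{aligned}
\]
We now show that for any given $\theta^*$, $U_{\alpha\alpha} = E_0\{H(\theta^*, x)\}$ is positive definite, which is implied if, for any given values of $\theta$ and $x$, $H(\theta, x)$ is positive definite.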
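Before the algebraic argument, the claim that $H(\theta, x)$ is positive definite can be sanity-checked numerically. The Python sketch below, for a scalar basis function $q(x) = x$ and hypothetical values of $\rho$, $\theta$, $x$ and $a$, compares the quadratic form $a^\top H a$ computed directly from definition (2.12) with a sum-of-squares form of the kind derived in this proof. It is a numerical check under these assumptions, not a substitute for the proof, and all function names are illustrative.

```python
import math
import random

def h_vector(rho, alpha, beta, x):
    """h(theta, x) from (2.12) with q(x) = x; rho[0] is the baseline proportion."""
    return [rho[k + 1] * math.exp(alpha[k] + beta[k] * x) for k in range(len(alpha))]

def quad_form_direct(a, h, s):
    """a^T H a with H = diag(h) - h h^T / s."""
    return (sum(ai * ai * hi for ai, hi in zip(a, h))
            - sum(ai * hi for ai, hi in zip(a, h)) ** 2 / s)

def quad_form_sum_of_squares(a, h, rho0, s):
    """(rho0 * sum_i a_i^2 h_i + sum_{i<j} (a_i - a_j)^2 h_i h_j) / s."""
    m = len(a)
    first = rho0 * sum(ai * ai * hi for ai, hi in zip(a, h))
    second = sum((a[i] - a[j]) ** 2 * h[i] * h[j]
                 for i in range(m) for j in range(i + 1, m))
    return (first + second) / s

random.seed(1)
rho = [0.4, 0.3, 0.2, 0.1]   # hypothetical limiting proportions, m = 3
m = len(rho) - 1
max_gap = 0.0
min_value = float("inf")
for _ in range(200):
    alpha = [random.uniform(-1.0, 1.0) for _ in range(m)]
    beta = [random.uniform(-1.0, 1.0) for _ in range(m)]
    x = random.uniform(-2.0, 2.0)
    a = [random.gauss(0.0, 1.0) for _ in range(m)]
    h = h_vector(rho, alpha, beta, x)
    s = rho[0] + sum(h)
    direct = quad_form_direct(a, h, s)
    squares = quad_form_sum_of_squares(a, h, rho[0], s)
    max_gap = max(max_gap, abs(direct - squares))
    min_value = min(min_value, direct, squares)

print(max_gap, min_value)
```

The two forms agree up to rounding error, and the sum-of-squares form makes the positivity visible term by term.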
Let a be a nonzero vector of length m and ai be its ith component.Recalling the definition (2.12) of H(θ, x), we haveaᵀH(θ, x)a = s−1(θ, x){m∑i=1a2i ρiϕi(θ, x) (s(θ, x)− ρiϕi(θ, x))−2m∑1≤i<jaiajρiϕi(θ, x)ρjϕj(θ, x)}.Note that since s(θ, x)−ρiϕi(θ, x) = ρ0+∑mj 6=i ρjϕj(θ, x), the above equalitycan be further written asaᵀH(θ, x)a=s−1(θ, x){m∑i=1a2i ρ0ρiϕi(θ, x) +m∑i 6=ja2i ρiϕi(θ, x)ρjϕj(θ, x)−2m∑1≤i<jaiajρiϕi(θ, x)ρjϕj(θ, x)}=s−1(θ, x){m∑i=1a2i ρ0ρiϕi(θ, x) +m∑1≤i<j(ai − aj)2ρiϕi(θ, x)ρjϕj(θ, x)}Since s(θ, x) is positive, the first term in the curly brackets on right hand side(RHS) of the above equality is positive and the second term is nonnegative,302.5. Proofswe have aᵀH(θ, x)a > 0 and so H(θ, x) > 0 for any value of θ and x.Therefore Uαα = E0{H(θ∗, x)}> 0.Finally we show that U > 0. Recall that Q(x) = (1, q(x)ᵀ)ᵀ. By expres-sions (2.16), we see that U can be obtained from the matrixE0{H(θ∗, x)⊗{Q(x)Qᵀ(x)}}(2.20)by simply permuting rows and columns respectively. Since H(θ∗, x) > 0 forany value of x andQ(x)Qᵀ(x) > 0 on a set of positive probability with respectto F0, by a property of Kronecker product, the matrix (2.20) is positivedefinite, and so is U . This completes the proof.2.5.2 Theorem 2.2: Asymptotic properties of the scorefunctionWe now show the asymptotic normality of v = n−1/2{∂ln(θ∗)/∂θ}, the scaledscore function evaluated at the true parameter value θ∗. Recall thatLn,k(θ, x) = − log{ m∑r=0λˆrϕr(θ, x)}+{αk + βᵀkq(x)}and ln(θ) =∑k, j Ln,k(θ, xkj). We havev = n−1/2{∂ln(θ∗)/∂θ} = n−1/2∑k, j{∂Ln,k(θ, xkj)/∂θ}.We first show that Ev = 0. Denote µn,k = Ek{∂Ln,k(θ∗, x)/∂θ}. Par-tition v to subvectors vα and vβ in agreement with parameters α and β.Let λˆ = (λˆ1, . . . , λˆm)ᵀ. Let hn(θ, x), sn(θ, x) and Hn(θ, x) be the sampleversions of h(θ, x), s(θ, x) and H(θ, x) defined in (2.12) with ρk replacedby λˆk. By expression (2.15) and noticing that Ek{g(x)} = E0{g(x)ϕk(θ∗, x)}312.5. 
Proofsfor a measurable function g(x), we haveEvα = n1/2m∑k=0λˆkEk{∂Ln,k(θ∗, x)/∂α}= n1/2{λˆ− E0{hn(θ∗, x)(m∑k=0λˆkϕk(θ∗, x))/sn(θ∗, x)}}= n1/2{λˆ− E0{hn(θ∗, x)}}= 0,where the second last equality holds because sn(θ∗, x) =∑mk=0 λˆkϕm(θ∗, x)by definition, and the last equality holds because the kth entry of E0{hn(θ∗, x)}is E0{λˆkϕk(θ∗, x)} = Ekλˆk = λˆk. Similarly,Evβ = n1/2m∑k=0λˆkEk{{∂Ln,k(θ∗, x)/∂α} ⊗ q(x)}= n1/2{λˆ⊗ E0{q(x)} − E0{hn(θ∗, x)⊗ q(x)}}= 0.Hence Ev = n1/2∑mk=0 λˆkµn,k = 0.Given the above result, we havev = v − Ev =m∑k=0λˆ1/2k{n−1/2knk∑j=1(∂Ln,k(θ∗, xkj)/∂θ − µn,k)}.Clearly, each term in curly brackets is a centered sum of iid random vari-ables with finite covariance matrices. Thus, by a triangular array version ofcentral limit theorem (Chow and Teicher, 1997, 9.1, Corollary 1), they areall asymptotically normal with appropriate covariance matrices. In addition,these terms are independent of each other, λˆk = nk/n are non–random withlimits ρk. Therefore, the linear combination is also asymptotically normal.What left is to verify the form of the asymptotic covariance matrix. The322.5. Proofsasymptotic covariance matrix of each term in curly brackets is given byVk = Ek{(∂Lk(θ∗, x)/∂θ)(∂Lk(θ∗, x)/∂θᵀ)}− µkµᵀk, (2.21)where µk = limn→∞µn,k = Ek{∂Lk(θ∗, x)/∂θ}. Hence the overall asymp-totic variance matrix is V =∑mk=0 ρkVk.We now show that V = U − UWU . First we showm∑k=0ρkEk{(∂Lk(θ∗, x)/∂α)(∂Lk(θ∗, x)/∂αᵀ)}= U. (2.22)By (2.15), we havem∑k=0ρkEk{(∂Lk(θ∗, x)/∂α)(∂Lk(θ∗, x)/∂αᵀ)}=m∑k=1ρkekeᵀk +m∑k=0ρkEk{h(θ∗, x)hᵀ(θ∗, x)/s2(θ∗, x)}−m∑k=1ρkEk{ekhᵀ(θ∗, x)/s(θ∗, x)} −m∑k=1ρkEk{h(θ∗, x)eᵀk/s(θ∗, x)}.Note that∑mk=1 ρkekeᵀk = E0{diag{h(θ∗, x)}},m∑k=0ρkEk{h(θ∗, x)hᵀ(θ∗, x)/s2(θ∗, x)}=E0{h(θ∗, x)hᵀ(θ∗, x){m∑k=0ρkϕ(θ∗, x)}/s2(θ∗, x)}=E0{h(θ∗, x)hᵀ(θ∗, x)/s(θ∗, x)},m∑k=1ρkEk{ekhᵀ(θ∗, x)/s(θ∗, x)} = E0{{m∑k=1ρkϕ(θ∗, x)ek}hᵀ(θ∗, x)/s(θ∗, x)}}= E0{h(θ∗, x)hᵀ(θ∗, x)/s(θ∗, x)},332.5. 
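and similarly
\[
\sum_{k=1}^{m}\rho_k E_k\{h(\theta^*, x)e_k^\top/s(\theta^*, x)\} = E_0\{h(\theta^*, x)h^\top(\theta^*, x)/s(\theta^*, x)\}.
\]
By the above expressions and the definition (2.12) of $H(\theta^*, x)$, we have
\[
\sum_{k=0}^{m}\rho_k E_k\big\{(\partial L_k(\theta^*, x)/\partial\alpha)(\partial L_k(\theta^*, x)/\partial\alpha^\top)\big\} = E_0\{H(\theta^*, x)\} = U_{\alpha\alpha}.
\]
Similarly, we get
\[
\sum_{k=0}^{m}\rho_k E_k\big\{(\partial L_k(\theta^*, x)/\partial\alpha)(\partial L_k(\theta^*, x)/\partial\beta^\top)\big\} = U_{\alpha\beta}
\]
and
\[
\sum_{k=0}^{m}\rho_k E_k\big\{(\partial L_k(\theta^*, x)/\partial\beta)(\partial L_k(\theta^*, x)/\partial\beta^\top)\big\} = U_{\beta\beta}.
\]
Therefore, identity (2.22) holds.

Lastly we show that $\sum_{k=0}^{m}\rho_k\mu_k\mu_k^\top = UWU$. By observation (2.18), we have
\[
\mu_0\mu_0^\top = \frac{1}{\rho_0^2}
\begin{pmatrix}
U_{\alpha\alpha}\{1_m 1_m^\top\}U_{\alpha\alpha} & U_{\alpha\alpha}\{1_m 1_m^\top\}U_{\alpha\beta} \\
U_{\beta\alpha}\{1_m 1_m^\top\}U_{\alpha\alpha} & U_{\beta\alpha}\{1_m 1_m^\top\}U_{\alpha\beta}
\end{pmatrix},
\]
and by observation (2.19), we have, for any $k = 1, 2, \ldots, m$,
\[
\mu_k\mu_k^\top = \frac{1}{\rho_k^2}
\begin{pmatrix}
U_{\alpha\alpha}\{\mathrm{diag}(e_k)\}U_{\alpha\alpha} & U_{\alpha\alpha}\{\mathrm{diag}(e_k)\}U_{\alpha\beta} \\
U_{\beta\alpha}\{\mathrm{diag}(e_k)\}U_{\alpha\alpha} & U_{\beta\alpha}\{\mathrm{diag}(e_k)\}U_{\alpha\beta}
\end{pmatrix}.
\]
Hence,
\[
\sum_{k=0}^{m}\rho_k\mu_k\mu_k^\top =
\begin{pmatrix}
U_{\alpha\alpha}TU_{\alpha\alpha} & U_{\alpha\alpha}TU_{\alpha\beta} \\
U_{\beta\alpha}TU_{\alpha\alpha} & U_{\beta\alpha}TU_{\alpha\beta}
\end{pmatrix}
= UWU. \tag{2.23}
\]
By (2.21), (2.22) and (2.23), we have
\[
V = \sum_{k=0}^{m}\rho_k E_k\big\{(\partial L_k(\theta^*, x)/\partial\theta)(\partial L_k(\theta^*, x)/\partial\theta^\top)\big\} - \sum_{k=0}^{m}\rho_k\mu_k\mu_k^\top = U - UWU,
\]
which completes the proof.

2.5.3 Lemma 2.3: Concavity of the DEL

In this subsection, we show that $l_n(\theta)$ is a concave function. It suffices to show that $\partial^2 l_n(\theta)/\partial\theta\partial\theta^\top \le 0$ for all values of $\theta$, and since $\partial^2 l_n(\theta)/\partial\theta\partial\theta^\top = \sum_{k,j}\{\partial^2 L_{n,k}(\theta, x_{kj})/\partial\theta\partial\theta^\top\}$, it is enough to show that
\[
\partial^2 L_{n,k}(\theta, x)/\partial\theta\partial\theta^\top \le 0
\]
for any given $\theta$, $x$ and $k$.

Recall that $\theta$ is composed of $\alpha$ and $\beta$. We first show that $\partial^2 L_{n,k}(\theta, x)/\partial\alpha\partial\alpha^\top < 0$. By (2.16),
\[
\partial^2 L_{n,k}(\theta, x)/\partial\alpha\partial\alpha^\top = -H_n(\theta, x)/s_n(\theta, x).
\]
The matrix $H_n(\theta, x)$ has the same expression as $H(\theta, x)$, which is positive definite as shown in the proof of Theorem 2.1, only with $\rho_k$ replaced by $\hat\lambda_k$, so $H_n(\theta, x)$ is also positive definite. Along with the fact that $s_n(\theta, x) > 0$, we have $\partial^2 L_{n,k}(\theta, x)/\partial\alpha\partial\alpha^\top = -H_n(\theta, x)/s_n(\theta, x) < 0$.

Secondly, by expression (2.16), we see that $\partial^2 L_{n,k}(\theta, x)/\partial\theta\partial\theta^\top$ is just a row and column permuted version of the matrix
\[
\{\partial^2 L_{n,k}(\theta, x)/\partial\alpha\partial\alpha^\top\} \otimes \{Q(x)Q^\top(x)\},
\]
which is negative semidefinite for any value of $\theta$ and $x$ because $\partial^2 L_{n,k}(\theta, x)/\partial\alpha\partial\alpha^\top < 0$ and $Q(x)Q^\top(x) \ge 0$.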
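Therefore
\[
\partial^2 l_n(\theta)/\partial\theta\partial\theta^\top = \sum_{k,j}\{\partial^2 L_{n,k}(\theta, x_{kj})/\partial\theta\partial\theta^\top\} \le 0
\]
for any value of $\theta$, and so $l_n(\theta)$ is concave.

Lastly, when $Q(x)Q^\top(x) > 0$ for a given $x$, $-\partial^2 l_n(\theta)/\partial\theta\partial\theta^\top > 0$. Hence, when $\sum_{k,j} Q(x_{kj})Q^\top(x_{kj}) > 0$, $-\partial^2 l_n(\theta)/\partial\theta\partial\theta^\top > 0$ and the DEL is strictly concave. The proof is complete.

2.5.4 Lemma 2.4: $\sqrt[3]{n}$–consistency of the MELE

We show in this subsection that the MELE $\hat\theta$ is attained at an interior point of an $n^{-1/3}$–neighbourhood of the true parameter value $\theta^*$ with probability tending to one. Note that $\hat\theta$ is a maximum point of the DEL $l_n(\theta)$. The idea is to show that for any $\theta$ on the surface of the closed ball $B_{\theta^*} = \{\theta : \|\theta - \theta^*\| \le n^{-1/3}\}$, $l_n(\theta) < l_n(\theta^*)$ with probability tending to one. Then, by concavity of $l_n(\theta)$, all the maximum points, including $\hat\theta$, must be interior points of $B_{\theta^*}$ with probability tending to one.

We first expand $l_n(\theta)$ around $\theta^*$. Recalling that $\partial l_n(\theta^*)/\partial\theta = \sqrt{n}\,v$ and $\partial^2 l_n(\theta^*)/\partial\theta\partial\theta^\top = -nU_n$, we get
\[
l_n(\theta) = l_n(\theta^*) + \sqrt{n}\,v^\top(\theta - \theta^*) - (1/2)\,n(\theta - \theta^*)^\top U_n(\theta - \theta^*) + \epsilon_n,
\]
where
\[
\epsilon_n = \frac{n}{6}\sum_{i,j,k}\frac{1}{n}\frac{\partial^3 l_n(\tilde\theta)}{\partial\theta_i\partial\theta_j\partial\theta_k}(\theta_i - \theta_i^*)(\theta_j - \theta_j^*)(\theta_k - \theta_k^*),
\]
with $\tilde\theta$ being some parameter value. Notice that, for any value of $\theta$,
\[
\begin{aligned}
\big|n^{-1}\partial^3 l_n(\theta)/\partial\theta_i\partial\theta_s\partial\theta_t\big|
&= \Big|n^{-1}\sum_{k=0}^{m}\sum_{j=1}^{n_k}\partial^3 L_{n,k}(\theta, x_{kj})/\partial\theta_i\partial\theta_s\partial\theta_t\Big| \\
&\le \sum_{k=0}^{m}\Big\{n_k^{-1}\sum_{j=1}^{n_k}\big|\partial^3 L_{n,k}(\theta, x_{kj})/\partial\theta_i\partial\theta_s\partial\theta_t\big|\Big\} \\
&\le \sum_{k=0}^{m}\Big\{n_k^{-1}\sum_{j=1}^{n_k}\|Q(x_{kj})\|^3\Big\},
\end{aligned}
\]
where the last inequality is by (2.17). Since $\|Q(x)\|^3$ is integrable, and the $x_{kj}$ are iid across $j$ for each $k$, by the strong law of large numbers, the last term on the RHS of the above inequality is $O(1)$, and so is $n^{-1}\partial^3 l_n(\theta)/\partial\theta_i\partial\theta_j\partial\theta_l$. This implies that for any $\theta = \theta^* + O_p(n^{-1/3})$, we have $\epsilon_n = O_p(1)$.

By the above result, for any $\theta$ on the surface of the closed ball $B_{\theta^*}$, i.e. for any $\theta = \theta^* + a n^{-1/3}$ with $\|a\| = 1$, we have
\[
\begin{aligned}
l_n(\theta^* + a n^{-1/3}) - l_n(\theta^*) &= n^{1/6}v^\top a - (1/2)\,n^{1/3}a^\top U_n a + O(1) \\
&= n^{1/6}v^\top a - (1/2)\,n^{1/3}a^\top U a + o(n^{1/3}),
\end{aligned}
\]
where the last equality is by the fact that $U_n = U + o(1)$. Let $c$ be the smallest eigenvalue of $U$. By Theorem 2.2, $v = O_p(1)$, so we get
\[
\begin{aligned}
l_n(\theta^* + a n^{-1/3}) - l_n(\theta^*) &= n^{1/6}v^\top a - (1/2)\,n^{1/3}a^\top U a + o(n^{1/3}) \\
&\le O_p(n^{1/6}) - (1/2)\,c\,n^{1/3} + o(n^{1/3}) \\
&= -(1/2)\,c\,n^{1/3} + o_p(n^{1/3}),
\end{aligned}
\]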
uniformly in $a$ that satisfies $\|a\| = 1$. Since $U$ is positive definite, $c > 0$. Clearly, with probability tending to one, the last term on the RHS is strictly smaller than 0 and hence $l_n(\theta^* + a n^{-1/3}) < l_n(\theta^*)$, as $n \to \infty$. By the continuity of $l_n(\theta)$, $l_n(\theta)$ must have a maximum in the interior of the ball $B_{\theta^*}$ with probability tending to one.

2.5.5 Theorem 2.5: Asymptotic normality of the MELE

We now show the asymptotic normality of the MELE $\hat\theta$. The idea is to show that $\sqrt{n}(\hat\theta - \theta^*)$ is well approximated by $U^{-1}v$. As a reminder, $U$ is the information matrix, and $v = n^{-1/2}\partial l_n(\theta^*)/\partial\theta$.

Expanding $n^{-1/2}\partial l_n(\hat\theta)/\partial\theta$ at $\theta^*$, we get
\[
n^{-1/2}\partial l_n(\hat\theta)/\partial\theta = v - U_n\{\sqrt{n}(\hat\theta - \theta^*)\} + \epsilon_n,
\]
where $U_n = -n^{-1}\partial^2 l_n(\theta^*)/\partial\theta\partial\theta^\top$ is the empirical information matrix and $\epsilon_n$ is a vector of length $m(d+1)$ whose $i$th entry is
\[
\sqrt{n}(\hat\theta - \theta^*)^\top\Big\{\frac{1}{n}\frac{\partial^3 l_n(\tilde\theta)}{\partial\theta_i\partial\theta\partial\theta^\top}\Big\}(\hat\theta - \theta^*),
\]
with $\theta_i$ being the $i$th component of $\theta$ and $\tilde\theta$ being some parameter value. We have shown in the proof of Lemma 2.4 that the third order derivatives of $l_n(\theta)$ are uniformly bounded by an integrable function, so $n^{-1}\partial^3 l_n(\tilde\theta)/\partial\theta_i\partial\theta_j\partial\theta_l = O_p(1)$. This, along with the fact that $\hat\theta - \theta^* = O_p(n^{-1/3})$, implies that $\epsilon_n = O_p(n^{-1/6}) = o_p(1)$ and the expansion can be written as
\[
n^{-1/2}\partial l_n(\hat\theta)/\partial\theta = v - U_n\{\sqrt{n}(\hat\theta - \theta^*)\} + o_p(1).
\]
Note that $\partial l_n(\hat\theta)/\partial\theta = 0$ because the MELE $\hat\theta$ is the point at which the smooth function $l_n(\theta)$ is maximized. By equating the left hand side (LHS) of the above expansion to 0 and reorganizing terms, we get
\[
U_n\sqrt{n}(\hat\theta - \theta^*) = v + o_p(1). \tag{2.24}
\]
By Theorem 2.2, $v$ is asymptotically normal, hence $O_p(1)$, so the LHS of the above equality must be $O_p(1)$. Note that, on the LHS, the first factor $U_n$ has a positive definite limit by Theorem 2.1. We then deduce that the second factor, $\sqrt{n}(\hat\theta - \theta^*)$, must also be $O_p(1)$. Furthermore, by Theorem 2.1, $U_n = U + o_p(1)$.
Hence,
\[
U_n\sqrt{n}(\hat\theta - \theta^*) = (U + o_p(1))\sqrt{n}(\hat\theta - \theta^*) = U\sqrt{n}(\hat\theta - \theta^*) + o_p(1).
\]
Substituting the RHS of the above equality for the LHS of (2.24) and reorganizing terms, we get
\[
U\sqrt{n}(\hat\theta - \theta^*) = v + o_p(1).
\]
Since $U$ is positive definite, we can left multiply both sides of the above equality by $U^{-1}$ to get
\[
\sqrt{n}(\hat\theta - \theta^*) = U^{-1}v + o_p(1). \tag{2.25}
\]
Combining the above equality with the asymptotic normality of $v$, whose asymptotic covariance matrix is $U - UWU$, we get the claimed result that $\sqrt{n}(\hat\theta - \theta^*) \to N(0, U^{-1} - W)$ in distribution as $n \to \infty$.

Chapter 3
Dual Empirical Likelihood Ratio Test for Hypotheses about DRM Parameters

This chapter develops a DEL ratio test for composite hypotheses about the parameter of the DRM based on independent samples from different populations. The proposed test encompasses testing differences in population distributions as a special case. The DEL ratio test statistic is found to have a classical chi–square null limiting distribution and a non–central chi–square limiting distribution under a class of local alternatives. The null limiting distribution is useful for approximating the p–values of the proposed test; the limiting distribution under the local alternative model is useful for approximating the power of the proposed test, calculating the sample size required for achieving a given power, and comparing the local asymptotic powers of DEL ratio tests formulated in different ways. Simulation studies show that this test has better power properties than all potential competitors adapted to the multiple sample problem under investigation, and is robust to model misspecification. The proposed test is then applied to assess strength properties of lumber with intuitively reasonable implications for the forest industry.
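To make the role of the null limiting distribution concrete, the following minimal Python sketch approximates a p–value by a chi–square upper tail probability. It is only an illustration: the helper `chi2_sf`, the observed statistic `r_n` and the number of constraints `q` are hypothetical choices that do not come from the thesis, and the tail probability is evaluated with a standard power series for the regularized incomplete gamma function rather than with any particular statistical library.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability Pr(chi2_df >= x), computed from the power series
    of the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series term n = 0: z^a * exp(-z) / Gamma(a + 1).
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    cdf, n = 0.0, 0
    # Terms rise until about n = z and then decay, so sum past that point.
    while term > 1e-17 or n < z:
        cdf += term
        n += 1
        term *= z / (a + n)
    return max(0.0, 1.0 - cdf)

r_n = 7.2  # hypothetical observed value of the likelihood ratio statistic
q = 2      # hypothetical number of constraints under the null hypothesis
p_value = chi2_sf(r_n, q)
print(p_value)
```

With two degrees of freedom the tail probability has the closed form $\exp(-x/2)$, which gives a simple way to check the series.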
3.1 Introduction

An important task of the long term monitoring project is to monitor change in population distributions of the strength of lumber produced over the years. Recall from Section 2.3 that, under the DRM (2.1), the equality of two distribution functions is equivalent to the equality of the corresponding DRM slope parameters: $F_k = F_0$ is equivalent to $\beta_k = 0$ and $F_k = F_j$ is equivalent to $\beta_k = \beta_j$, $k, j = 1, \ldots, m$. Hence under the DRM, a hypothesis about the differences in distribution functions ultimately translates to a hypothesis about the DRM parameter $\beta$.

In principle, for a linear hypothesis about the DRM parameter $\beta$:
\[
H_0 : A\beta = c \quad \text{against} \quad H_1 : A\beta \ne c
\]
for some given matrix $A$ and vector $c$, a Wald type test (Fokianos et al., 2001, (17)) can be easily constructed based on the asymptotic normality of the MELE $\hat\theta$. According to Theorem 2.5,
\[
\sqrt{n}(\hat\beta - \beta^*) \longrightarrow N(0, \Sigma_\beta)
\]
in distribution for some positive definite covariance matrix $\Sigma_\beta$. Under the null of the above linear hypothesis, $A\beta^* = c$, so
\[
\sqrt{n}(A\hat\beta - A\beta^*) = \sqrt{n}(A\hat\beta - c) \longrightarrow N(0, A\Sigma_\beta A^\top)
\]
in distribution. Let $\hat\Sigma_\beta$ be a consistent estimator of the asymptotic covariance matrix of $\hat\beta$. When $A$ is a full rank $q \times md$ matrix with $q \le md$, the dimension of $\beta$, the test statistic
\[
W_n = n(A\hat\beta - c)^\top(A\hat\Sigma_\beta A^\top)^{-1}(A\hat\beta - c)
\]
has a chi–square limiting distribution, $\chi^2_q$, with $q$ degrees of freedom. Such a test, however, suffers from a few drawbacks. First, it is usually not very powerful when the sample size is not very large because the estimation accuracy of the asymptotic covariance estimator $\hat\Sigma_\beta$ in the denominator could be low in that case.
Second, it is not invariant to transformations: if we transform the parameter and the hypothesis accordingly, the value of the test statistic and the corresponding p–value of the test may be different from those based on the original scale.

In contrast to Wald tests, likelihood ratio tests are usually more powerful because they do not need an estimate of the asymptotic covariance matrix, and are invariant to transformation. However, as described in Section 2.3, a simplistic application of the straightforward EL ratio test to hypotheses that compare the DRM slope parameters $\beta_k$ of different population distributions is negated by the non–regularity of the DRM, under which the EL function is not well–defined in a neighbourhood of the true parameter value $\beta^*$ if $\beta^*_k = \beta^*_j$ for some $k, j \in \{0, 1, \ldots, m\}$. Therefore we look for a likelihood ratio test based on the DEL because it is well–defined for all values of $\theta$ in the corresponding Euclidean space, has a simple analytical form, and is concave.

The next section presents a DEL ratio (DELR) test for a general composite hypothesis about the DRM slope parameter $\beta$, which encompasses testing differences in population distributions as a special case, and gives the limiting distributions of the proposed test statistic under both the null and a class of local alternatives of that hypothesis testing problem. The proofs of these properties are given in Section 3.6. Section 3.3 assesses, via simulation, the finite sample distributions of the DELR statistic under the null and local alternative models, as well as the power of the DELR test. The robustness of the proposed test against the misspecification of the DRM is studied via simulation in Section 3.4. In Section 3.5, we apply the DELR test to lumber bending strength data and find that the outcome leads to intuitively reasonable implications for the forest industry.
3.2 DELR statistic and its limiting distributions

Recall from Section 2.3 that we defined the DEL as
\[
l_n(\theta) = -\sum_{k,j}\log\Big\{\sum_{r=0}^{m}\hat\lambda_r\exp\{\alpha_r + \beta_r^\top q(x_{kj})\}\Big\} + \sum_{k,j}\{\alpha_k + \beta_k^\top q(x_{kj})\},
\]
where $\hat\lambda_r = n_r/n$, $n_r$ is the size of the $r$th sample and $n$ is the total sample size. The MELE $\hat\theta$ is the point at which the DEL is maximized, $\hat\theta = \arg\max_\theta l_n(\theta)$. This DEL, unlike the EL, is well–defined for all values of $\theta$, so we expect to derive the limiting distribution of the corresponding likelihood ratio statistic using classical techniques. Under a two–sample DRM ($m + 1 = 2$), Keziou and Leoni-Aubin (2008) found that for the simple hypothesis $H_0 : \beta_1 = 0$, or equivalently $H_0 : F_1 = F_0$, the corresponding likelihood ratio test statistic, $2l_n(\hat\theta)$, has the usual chi–square limiting distribution, $\chi^2_d$, with $d$ degrees of freedom.

The success of Keziou and Leoni-Aubin leads us to wonder if the result is more generally applicable. In the long term monitoring program for lumber quality, we may encounter situations such as the following: we have five ($m + 1 = 5$) lumber samples, with the first two being spruce samples, the third and fourth being pine samples, and the fifth being a Douglas fir sample; we are interested in testing whether the two spruce populations ($F_0$ and $F_1$) have the same overall quality, which amounts to the hypothesis testing problem of
\[
H_0 : \beta_1 = 0 \quad \text{against} \quad H_1 : \beta_1 \ne 0.
\]
If we are also concerned about the stability of the qualities of the two pine populations ($F_2$ and $F_3$) simultaneously, the hypothesis testing problem would be
\[
H_0 : \beta_1 = 0 \text{ and } \beta_2 = \beta_3 \quad \text{against} \quad H_1 : \beta_1 \ne 0 \text{ or } \beta_2 \ne \beta_3.
\]
The first hypothesis testing problem above, despite its simple appearance, is a composite hypothesis testing problem that is fundamentally different from the two–sample problem that Keziou and Leoni-Aubin have studied.
It is clear that three other distributions $F_2$, $F_3$ and $F_4$ are also modeled by the DRM, but their corresponding slope parameters $\beta_2$, $\beta_3$ and $\beta_4$ are nuisance parameters that are not specified in the hypothesis. The second hypothesis testing problem above is more complicated and also has a nuisance parameter $\beta_4$ for the fifth population which is not specified in the hypothesis. Neither testing problem is covered by the results of Keziou and Leoni-Aubin. In addition, their proof of the result does not readily extend to more complicated hypotheses, because it is tailored for the true parameter $\beta^* = 0$, in which case the analytical expression of a key quadratic form that approximates the corresponding DELR statistic is much simpler than that for a composite hypothesis.

The above limitation of Keziou and Leoni-Aubin's result leads us to investigate the properties of the DEL ratio in a much more general setting.

3.2.1 DELR statistic and its null limiting distribution

All the above hypothesis testing problems can be abstractly stated as testing
\[
H_0 : g(\beta) = 0 \quad \text{against} \quad H_1 : g(\beta) \ne 0 \tag{3.1}
\]
for some smooth function $g : \mathbb{R}^{md} \to \mathbb{R}^{q}$, with $q \le md$, the length of $\beta$. We will always assume that $g$ is thrice differentiable with a full rank Jacobian matrix $\partial g/\partial\beta$. The parameters $\{\alpha_k\}$ are usually not a part of the hypothesis, because, by (2.2), their values are fully determined by the $\{\beta_k\}$ and $F_0$ under the DRM assumption:
\[
\alpha_k = -\log\int\exp\{\beta_k^\top q(x)\}\,dF_0,
\]
although they are regarded as independent parameters in the DEL.

Let $\tilde\theta$ be the point at which the maximum of the DEL $l_n(\theta)$ is attained under the null constraint $g(\beta) = 0$. Recall that the MELE $\hat\theta$ is the point at which $l_n(\theta)$ is maximized without the null constraint. The DELR test statistic is defined to be
\[
R_n = 2\{l_n(\hat\theta) - l_n(\tilde\theta)\}.
\]
Does $R_n$ have the properties of a regular likelihood ratio test statistic?
The answer is positive and we state the result as follows, the proof of which is given in Section 3.6.

Recall the conditions of Theorem 2.1: we have $m + 1$ random samples of sizes $n_k$, $k = 0, 1, \ldots, m$, from populations with distributions of the DRM form given in (2.1) and a true parameter value $\theta^*$ such that $\int\exp\{\beta_k^\top q(x)\}\,dF_0 < \infty$ for $\theta$ in a neighbourhood of $\theta^*$, $\int Q(x)Q^\top(x)\,dF_0$ is positive definite with $Q^\top(x) = (1, q^\top(x))$, and $\hat\lambda_k = n_k/n = \rho_k + o(1)$, where $n = \sum_{k=0}^{m} n_k$ is the total sample size, for some $\rho_k \in (0, 1)$.

Theorem 3.1 (Null limiting distribution of the DELR statistic). Adopt the conditions posited in Theorem 2.1. Under the null hypothesis $g(\beta) = 0$, $R_n \to \chi^2_q$ in distribution as $n \to \infty$, where $q$ is the dimension of the null mapping $g(\cdot)$ and $\chi^2_q$ is a chi–squared random variable with $q$ degrees of freedom.

When $m = 1$, Theorem 3.1 reduces to the result of Keziou and Leoni-Aubin (2008) for $g(\beta) = \beta_1$. This theorem covers additional ground, for instance, the two composite hypothesis testing examples given at the beginning of this section and the case when we test the hypothesis $g(\beta) = \beta_1 - \beta_2 = 0$ based on all $m + 1 = 5$ samples.

The null limiting distribution of $R_n$ is most useful for approximating the p–value, $p$, of a DELR test:
\[
p \approx \Pr(\chi^2_q \ge R_n).
\]
At the significance level of $\alpha$, we reject the null hypothesis of $g(\beta) = 0$ when $R_n \ge \chi^2_{q, 1-\alpha}$, where $\chi^2_{q, p}$ in general is the $p$th quantile of the $\chi^2_q$ distribution.

3.2.2 Limiting distribution of the DELR statistic under local alternatives

Theorem 3.1 provides an approximation to the p–value of a test but it does not give the power of the test. As is well known, most sensible tests are consistent: the asymptotic power at any fixed alternative model goes to 1 as the sample size $n \to \infty$; this is true for the DELR test.
Hence, instead of looking at a fixed alternative, we here study the asymptotic power of the DELR test under a class of local alternatives, under which the limiting distribution of $R_n$ usually is not a point mass. The finite–sample power properties of the test are studied by simulation in Section 3.3.2.

Let $\{\beta^*_k\}$ be a set of parameter values which form a null model satisfying $H_0 : g(\beta) = 0$ under the DRM assumption. Let
\[
\beta_k = \beta^*_k + n_k^{-1/2}c_k, \tag{3.2}
\]
for some constants $\{c_k\}$, be a set of parameter values which form a local alternative. We denote the distribution functions corresponding to $\beta^*_k$ and $\beta_k$ as $F_k$ and $G_k$ respectively, with $G_0 = F_0$. Note that the $\{G_k\}$ are placed at a distance of order $n^{-1/2}$ from the $\{F_k\}$. As $n \to \infty$, the limiting distribution of $R_n$ under this local alternative is usually non–degenerate and provides useful information on the power of the test.

We now express the null hypothesis $g(\beta) = 0$ in an equivalent form. Recall that $g : \mathbb{R}^{md} \to \mathbb{R}^{q}$ is thrice differentiable in a neighbourhood of $\beta^*$ with a full rank Jacobian matrix evaluated at $\beta^*$. Denote $\nabla = \partial g(\beta^*)/\partial\beta$ and partition $\nabla$ into $(\nabla_1, \nabla_2)$, with $q$ and $md - q$ columns respectively. When $q < md$, by the implicit function theorem (Zorich, 2004, 8.5.4, Theorem 1), there exists a unique function $G : \mathbb{R}^{md-q} \to \mathbb{R}^{md}$, such that $g(\beta) = 0$ if and only if $\beta = G(\gamma)$ for $\beta$ and $\gamma$ in corresponding neighbourhoods of $\beta^*$ and $\gamma^*$ respectively. In addition, $G$ is also thrice differentiable in a neighbourhood of $\gamma^*$, and its Jacobian matrix evaluated at $\gamma^*$,
\[
J = \partial G(\gamma^*)/\partial\gamma,
\]
has full rank.
Furthermore, if $\nabla_1$ has full rank, then
\[
J = \big(-(\nabla_1^{-1}\nabla_2)^\top,\ I_{md-q}\big)^\top, \tag{3.3}
\]
where $I_k$ is an identity matrix of size $k \times k$.

Let $U$ be the information matrix (2.13) under the null model $H_0$ represented by the $\{\beta^*_k\}$ and $\{F_k\}$.

Theorem 3.2 (Limiting distribution of the DELR under local alternatives). Under the conditions of Theorem 2.1 and the local alternative defined by (3.2),
\[
R_n \to \chi^2_q(\delta^2)
\]
in distribution as $n \to \infty$, where $\chi^2_q(\delta^2)$ is a non–central chi–square random variable with $q$ degrees of freedom and a nonnegative non–central parameter
\[
\delta^2 =
\begin{cases}
\eta^\top\{\Lambda - \Lambda J(J^\top\Lambda J)^{-1}J^\top\Lambda\}\eta & \text{if } q < md, \\
\eta^\top\Lambda\eta & \text{if } q = md,
\end{cases}
\]
where $\eta^\top = (\rho_1^{-1/2}c_1^\top, \rho_2^{-1/2}c_2^\top, \ldots, \rho_m^{-1/2}c_m^\top)$ and $\Lambda = U_{\beta\beta} - U_{\beta\alpha}U_{\alpha\alpha}^{-1}U_{\alpha\beta}$. Moreover, $\delta^2 > 0$ unless $\eta$ is in the column space of $J$.

The proof is given in Section 3.6. This result is useful for: (1) computing the local power of the DELR test under specific distributional settings, (2) calculating the sample size required for achieving a certain power at a given alternative, and (3) comparing the powers of DELR tests formulated in different ways, which helps us to determine the most efficient use of information contained in multiple samples. We illustrate the first two points using the examples below, and discuss the last point in Chapter 4.

Example 3.1 (Computing the local asymptotic power of the DELR test for a composite hypothesis). Consider the situation where $m + 1 = 3$, samples are from a DRM with basis function $q(x) = (x, \log x)^\top$, and the sample proportions are $(0.4, 0.3, 0.3)$. Let $F_k$, $k = 1, 2$, be the distributions with parameters $\beta^*_1 = (-1, 1)^\top$ and $\beta^*_2 = (-2, 2)^\top$. Let $H_0$ be $g(\beta) = 2\beta_1 - \beta_2 = 0$. Consider the local alternative
\[
\beta_k = \beta^*_k + n_k^{-1/2}c_k, \quad \text{for } k = 1, 2, \tag{3.4}
\]
with $c_1 = (2, 3)^\top$ and $c_2 = (-1, 0)^\top$.

We find $\nabla = (2I_2, -I_2)$ so $J = ((1/2)I_2, I_2)^\top$, and $\eta \approx (3.65, 5.48, -1.83, 0)^\top$. The information matrix $U$ is $F_0$ dependent.
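The vector $\eta$ and the matrix $J$ of Example 3.1 can be checked with a few lines of arithmetic. The sketch below is only an illustration in plain Python: the helper names `inv2` and `matmul` are hypothetical, the partition takes the first $q = 2$ columns of the Jacobian as $\nabla_1$, and $J$ is built from (3.3). It also confirms that the Jacobian of $g$ annihilates the columns of $J$, while it does not annihilate $\eta$, so that $\eta$ lies outside the column space of $J$.

```python
import math

rho = [0.4, 0.3, 0.3]            # sample proportions rho_0, rho_1, rho_2
c = [[2.0, 3.0], [-1.0, 0.0]]    # c_1 and c_2 from (3.4)

# eta stacks rho_k^{-1/2} c_k for k = 1, 2.
eta = [v / math.sqrt(rho[k + 1]) for k in range(2) for v in c[k]]

# Jacobian of g(beta) = 2*beta_1 - beta_2, split into its first two columns and the rest.
grad1 = [[2.0, 0.0], [0.0, 2.0]]
grad2 = [[-1.0, 0.0], [0.0, -1.0]]

def inv2(a):
    """Inverse of a 2 x 2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

# (3.3): J stacks -grad1^{-1} grad2 on top of an identity block.
top = [[-v for v in row] for row in matmul(inv2(grad1), grad2)]
J = top + [[1.0, 0.0], [0.0, 1.0]]

grad = [r1 + r2 for r1, r2 in zip(grad1, grad2)]   # the 2 x 4 matrix (grad1, grad2)
grad_J = matmul(grad, J)                           # zero matrix: columns of J lie in the null space
grad_eta = [sum(gv * ev for gv, ev in zip(row, eta)) for row in grad]

print([round(v, 2) for v in eta])
print(J)
print(grad_J, grad_eta)
```

Computing $\delta^2$ itself additionally requires $\Lambda$, which depends on $F_0$ and is not reproduced in this sketch.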
When $F_0$ is $\Gamma(2, 1)$, where in general $\Gamma(\lambda, \kappa)$ denotes the gamma distribution with shape $\lambda$ and rate $\kappa$, we obtain the information matrix (2.13) and hence $\Lambda$, based on numerical computation. We therefore get $\delta^2 \approx 10.29$ based on the formula given in the above theorem.

The null limiting distribution of $R_n$ is $\chi^2_2$. At the 5% level, the null is rejected when $R_n \ge \chi^2_{2, 0.95} \approx 5.99$. Hence under the local alternative, the power of the DELR test is approximately $\Pr(\chi^2_2(10.29) \ge 5.99) \approx 0.83$.

Example 3.2 (Sample size calculation for Example 3.1). Adopt the settings of Example 3.1. Suppose we require the power of the DELR test to be at least 0.8 at the alternative of $\beta_1 = \beta^*_1 + (0.5, 1.5)^\top$ and $\beta_2 = \beta^*_2 + (0.5, 0.5)^\top$ at the 5% significance level. Recall that the sample proportions are $(0.4, 0.3, 0.3)$. This alternative corresponds to a local alternative of the form (3.4) with $c_1 = (0.5\sqrt{n_1}, 1.5\sqrt{n_1})^\top = 0.5(\sqrt{0.3n}, 3\sqrt{0.3n})^\top$ and $c_2 = (0.5\sqrt{n_2}, 0.5\sqrt{n_2})^\top = 0.5(\sqrt{0.3n}, \sqrt{0.3n})^\top$.

Using the above $c_1$ and $c_2$, we obtain $\eta = (0.3^{-1/2}c_1^\top, 0.3^{-1/2}c_2^\top)^\top = 0.5\sqrt{n}(1, 3, 1, 1)^\top$ as a function of the total sample size $n$. With the same $J$, $F_0$ and $U$ as obtained in Example 3.1, and applying the formula given in Theorem 3.2, we obtain the non–central parameter $\delta^2$ as a function of $n$, which we denote as $\delta^2(n)$. To attain a minimal power of 0.8, we solve
\[
\Pr\big(\chi^2_2(\delta^2(n)) \ge \chi^2_{2, 0.95}\big) \ge 0.8
\]
for the total sample size $n$ and get $n \ge 50$.

3.2.3 On the condition for the positiveness of the non–central parameter

A meaningful test should be unbiased: at the significance level of $\alpha$, for any given alternative, the power of the test should be at least as large as $\alpha$. Is the DELR test asymptotically unbiased under the local alternatives of the form (3.2)? The answer is positive. Recall that at the significance level of $\alpha$, we reject the null hypothesis of $g(\beta) = 0$ when $R_n \ge \chi^2_{q, 1-\alpha}$.
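The power approximation of Example 3.1 and the power requirement of Example 3.2 can be reproduced numerically from the rejection rule just recalled. The sketch below is an illustration under stated assumptions, not the computation used in the thesis: it writes the non–central chi–square tail as a Poisson mixture of central chi–square tails, the function names `chi2_sf`, `ncx2_sf` and `noncentrality_for_power` are hypothetical, and the 5% critical value uses the closed form $-2\log(0.05)$ that is valid only for two degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Pr(chi2_df >= x) via the series of the regularized lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    cdf, n = 0.0, 0
    while term > 1e-17 or n < z:
        cdf += term
        n += 1
        term *= z / (a + n)
    return max(0.0, 1.0 - cdf)

def ncx2_sf(x, df, nc):
    """Pr(chi2_df(nc) >= x) as a Poisson(nc / 2) mixture of central tails."""
    lam = nc / 2.0
    weight = math.exp(-lam)          # Poisson probability of i = 0
    total, cum, i = 0.0, 0.0, 0
    while cum < 1.0 - 1e-12 and i < 1000:
        total += weight * chi2_sf(x, df + 2 * i)
        cum += weight
        i += 1
        weight *= lam / i
    return total

def noncentrality_for_power(target, x, df, hi=100.0):
    """Smallest non-centrality with tail probability >= target, by bisection.
    Relies on the tail probability increasing in the non-centrality."""
    lo = 0.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if ncx2_sf(x, df, mid) < target:
            lo = mid
        else:
            hi = mid
    return hi

crit = -2.0 * math.log(0.05)             # 0.95 quantile of chi-square with 2 df
power = ncx2_sf(crit, 2, 10.29)          # Example 3.1
delta2_needed = noncentrality_for_power(0.8, crit, 2)
print(round(power, 3), round(delta2_needed, 2))
```

The first number agrees with the value of about 0.83 reported in Example 3.1. The second is the value that $\delta^2(n)$ must reach in Example 3.2; turning it into a sample size still requires $\Lambda$, which is not computed here.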
Hence, by Theorem 3.2, at any given local alternative of the form (3.2), the asymptotic power of the test is
\[
\lim_{n\to\infty}\Pr(R_n \ge \chi^2_{q, 1-\alpha}) = \Pr\big(\chi^2_q(\delta^2) \ge \chi^2_{q, 1-\alpha}\big). \tag{3.5}
\]
By a result about the non–central chi–square distribution (Johnson et al., 1995, (29.25a)), if $0 \le \delta_1^2 < \delta_2^2$, then for any $x > 0$,
\[
\Pr\big(\chi^2_d(\delta_1^2) \ge x\big) < \Pr\big(\chi^2_d(\delta_2^2) \ge x\big). \tag{3.6}
\]
Note that a non–central chi–square distribution with a zero non–central parameter is just a usual chi–square distribution. Therefore, in view of (3.5), the local asymptotic power of the DELR test satisfies
\[
\Pr\big(\chi^2_q(\delta^2) \ge \chi^2_{q, 1-\alpha}\big) \ge \Pr\big(\chi^2_q \ge \chi^2_{q, 1-\alpha}\big) = \alpha
\]
with equality if and only if $\delta^2 = 0$. Thus the DELR test is asymptotically unbiased under the local alternative model (3.2).

In practice, we always hope that the power of a test at an alternative is strictly larger than the significance level. We now take one step further to study in what situations the local asymptotic power of the DELR test is strictly larger than the significance level $\alpha$, or equivalently, $\delta^2 > 0$. Roughly, the answer lies in whether a $\beta$ defined by the local alternative model (3.2) is truly a local alternative.

We first look at the case that $q$, the dimension of the $g$ function in the hypothesis (3.1), equals $md$, the dimension of the DRM slope parameter $\beta$. In this case, by the inverse function theorem (Zorich, 2004, 8.6.1, Theorem 1), $g$ is invertible at $\beta^*$, i.e. $\beta^* = g^{-1}(0)$. Hence $g$ defines a simple hypothesis testing problem with $\beta$ being fully specified to be $g^{-1}(0)$ in the null. Then any $\beta$ value defined by the local alternative model (3.2), as long as not all the $\{c_k\}$ are 0, is a real alternative, i.e. does not satisfy the null constraint of $g(\beta) = 0$ for any given sample size $n$. In this case, by Theorem 3.2, the non–central parameter is $\delta^2 = \eta^\top\Lambda\eta$. Since $\Lambda$ is positive definite and $\eta \ne 0$, we have $\delta^2 > 0$.

When $q < md$, for some choices of $c_k$, the $\beta$ defined by the local alternative model may still satisfy the null model of $g(\beta) = 0$.
For example, let $m + 1 = 3$, the null hypothesis be $g(\beta) = 2\beta_1 - \beta_2 = 0$, and $\beta^*$ be a parameter value satisfying this null model. Then, when $n_1 = n_2$, the parameter $\beta$ satisfying (3.2) with $\beta_1 = \beta^*_1 + n_1^{-1/2}v$ and $\beta_2 = \beta^*_2 + 2n_2^{-1/2}v$ for some vector $v$ always satisfies the null model for any given $n$. In this case, $\delta^2 = 0$ as we will see soon.

The mathematical condition for $\delta^2 > 0$ is given in Theorem 3.2: $\eta$ is not in the column space of $J$, the Jacobian matrix $\partial G(\gamma^*)/\partial\gamma$. We now show that this condition is equivalent to the following: for the $\beta$ defined by the local alternative model (3.2), the speed at which $g(\beta)$ converges to 0 is no faster than the speed at which $\beta$ converges to $\beta^*$, which is of the order $O(n^{-1/2})$.

Let $\beta^*$ be a parameter value satisfying the null model $g(\beta) = 0$, and $\beta$ satisfy the local alternative model (3.2). Recall that we have defined $\nabla = \partial g(\beta^*)/\partial\beta$. Expanding $g(\beta)$ around $\beta^*$ and using the fact that $g(\beta^*) = 0$, we get
\[
\begin{aligned}
g(\beta) &= g(\beta^*) + \{\partial g(\beta^*)/\partial\beta\}(\beta - \beta^*) + O(\|\beta - \beta^*\|^2) \\
&= \nabla(\beta - \beta^*) + O(\|\beta - \beta^*\|^2).
\end{aligned}
\]
Notice that, when $\beta$ satisfies the local alternative model (3.2), we have
\[
\beta - \beta^* = n^{-1/2}\big(\hat\lambda_1^{-1/2}c_1^\top, \hat\lambda_2^{-1/2}c_2^\top, \ldots, \hat\lambda_m^{-1/2}c_m^\top\big)^\top = n^{-1/2}\eta + o(n^{-1/2}).
\]
Consequently
\[
g(\beta) = \nabla(\beta - \beta^*) + O(\|\beta - \beta^*\|^2) = n^{-1/2}\nabla\eta + o(n^{-1/2}).
\]
Without loss of generality, assume that the submatrix $\nabla_1$ of $\nabla = (\nabla_1, \nabla_2)$ is of full rank. Recall that, in this case, the Jacobian matrix, $J$, of $G(\cdot)$ evaluated at $\gamma^*$ is given by (3.3): $J = (-(\nabla_1^{-1}\nabla_2)^\top, I_{md-q})^\top$. We find that the column space of $J$ is exactly the null space of $\nabla$. Hence, $\eta$ is in the column space of $J$ if and only if $\nabla\eta = 0$, in which case, by the above expansion, we have $g(\beta) = o(n^{-1/2})$. Now, by Theorem 3.2, $\delta^2 = 0$ if and only if $\eta$ is in the column space of $J$, so we conclude that $\delta^2 = 0$ if and only if $g(\beta) = o(n^{-1/2})$. On the other hand, if we know $g(\beta) = 0$ for all $n$, as in the previous example, then we must have $\nabla\eta = 0$ and so $\delta^2 = 0$.

The above analysis shows that $\delta^2 = 0$ if and only if $g(\beta)$ converges to
0 faster than the order of $O(n^{-1/2})$, which is the speed at which $\beta$ converges to $\beta^*$. In this case, such a $\beta$ defined by the local alternative model should be considered to be under the null model of $g(\beta) = 0$ in an asymptotic sense.

3.3 Simulation studies

In this section, we conduct simulations to study: (1) the approximation accuracy of the limiting distributions to the finite–sample distributions of the DELR statistic under both the null and the alternative models, and (2) the power of the DELR test under correctly specified DRMs. The number of simulation runs in this and the next section (Section 3.4) is set to 10,000. Our simulations were more extensive than what is presented here in terms of hypotheses, population distributions, and sample sizes. We selected the most representative cases and included them here; the other results are similar. All computations are carried out by our R package drmdel for EL inference under DRMs, which is introduced in Chapter 6 and available on The Comprehensive R Archive Network (CRAN).

3.3.1 Approximation to the distribution of the DELR

Null limiting distribution

We first study how well the chi–square distribution approximates the finite–sample distribution of the DELR statistic under the null hypothesis of (3.1). Set $m + 1 = 6$ and consider the hypothesis (3.1) with $g(\beta) = (\beta_1^\top, \beta_3^\top) - (\beta_2^\top, \beta_4^\top)$. The null hypothesis is equivalent to $F_1 = F_2$ and $F_3 = F_4$. We generated six samples of sizes (90, 60, 120, 80, 110, 30) from two distribution families. The first one is from normal distributions with means (0, 2, 2, 1, 1, 3.2) and standard deviations (1, 1.5, 1.5, 3, 3, 2). The second one is from gamma distributions with shapes (3, 4, 4, 5, 5, 3.2) and rates (0.5, 0.8, 0.8, 1.1, 1.1, 1.5).

When the basis function $q(x)$ is correctly specified, i.e. $q(x) = (x, x^2)^\top$ for the normal family and $q(x) = (\log x, x)^\top$ for the gamma family, the DELR statistic, $R_n$, has a $\chi^2_4$ null limiting distribution in both cases.
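That the normal populations of this simulation satisfy a DRM with basis function $q(x) = (x, x^2)^\top$ can be verified directly from the normal log–density. The sketch below is an illustration only: the function names are hypothetical and the closed forms for $\alpha_k$ and $\beta_k$ are obtained by expanding $\log\{f_k(x)/f_0(x)\}$ for two normal densities. It checks the identity at a few points and confirms that $g(\beta) = 0$ for the simulated configuration.

```python
import math

# Means and standard deviations of the six normal populations F_0, ..., F_5.
means = [0.0, 2.0, 2.0, 1.0, 1.0, 3.2]
sds = [1.0, 1.5, 1.5, 3.0, 3.0, 2.0]

def normal_logpdf(x, mu, sd):
    return -math.log(sd * math.sqrt(2.0 * math.pi)) - (x - mu) ** 2 / (2.0 * sd ** 2)

def drm_params(k):
    """(alpha_k, beta_k) with log f_k(x) - log f_0(x) = alpha_k + beta_k . (x, x^2)."""
    mu0, s0, mu, s = means[0], sds[0], means[k], sds[k]
    alpha = math.log(s0 / s) - mu ** 2 / (2 * s ** 2) + mu0 ** 2 / (2 * s0 ** 2)
    beta = (mu / s ** 2 - mu0 / s0 ** 2, 1 / (2 * s0 ** 2) - 1 / (2 * s ** 2))
    return alpha, beta

# Largest discrepancy between the exact log density ratio and its DRM form.
max_err = 0.0
for k in range(1, 6):
    alpha, beta = drm_params(k)
    for x in (-2.0, -0.5, 0.3, 1.7, 4.0):
        log_ratio = normal_logpdf(x, means[k], sds[k]) - normal_logpdf(x, means[0], sds[0])
        max_err = max(max_err, abs(log_ratio - (alpha + beta[0] * x + beta[1] * x * x)))

betas = [drm_params(k)[1] for k in range(1, 6)]   # beta_1, ..., beta_5
# g(beta) = (beta_1, beta_3) - (beta_2, beta_4), a vector of length 4.
g = [b1 - b2 for i, j in ((0, 1), (2, 3)) for b1, b2 in zip(betas[i], betas[j])]
print(max_err, g)
```

Because $F_1 = F_2$ and $F_3 = F_4$ in this configuration, all four components of $g(\beta)$ vanish, which matches the four degrees of freedom of the limiting distribution.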
The quantile–quantile (Q–Q) plots of the distribution of $R_n$ and $\chi^2_4$ are shown in Figure 3.1. In both cases, the approximations are very accurate. The type I error rates of $R_n$ at the 5% level are 0.056 and 0.058 for normal and gamma data respectively.

[Figure 3.1: Q–Q plots of the simulated and the null limiting distributions of the DELR statistic. Panels: normal data and gamma data; quantiles of the DELR statistic plotted against quantiles of $\chi^2_4$, with the 0.95 quantile marked.]

In general, for data from distributions such as Weibull and log–normal, the chi–square approximation has satisfactory precision when $n_k \ge 70$.

Distribution under local alternatives

We next examine the precision of the non–central chi–square distribution under the local alternative model (3.2). We set $m + 1 = 4$ with sample sizes 120, 160, 80 and 60.

In the first scenario, we test the hypothesis (3.1) with $g(\beta) = \beta_1^\top - \beta_2^\top$. The perceived null model is specified by $\beta^*_1 = \beta^*_2 = (0.25, 1.875)^\top$, $\beta^*_3 = (0.125, 1.97)^\top$ with basis function $q(x) = (x, x^2)^\top$. The data were generated from $G_0 = N(0, 0.5^2)$, $G_1$ and $G_3$ with $\beta^*_1$ and $\beta^*_3$ respectively, and $G_2$ with $\beta_2 = \beta^*_2 + n_2^{-1/2}(1, 0)^\top$. According to Theorem 3.2, the limiting distribution of $R_n$ is $\chi^2_2(2.67)$.

In the second scenario, we test (3.1) with $g(\beta) = (\beta_1^\top, \beta_3^\top) - (\beta_2^\top, (-6, 9))$. The perceived null model is specified by $\beta^*_1 = \beta^*_2 = (-4, 5)^\top$, $\beta^*_3 = (-6, 9)^\top$ with basis function $q(x) = (\log x, x)^\top$. We generated data from $G_0 = \Gamma(3, 2)$ and $G_k$, $k = 1, 2, 3$, specified by (3.2) with $c_1 = (0.5, 0.5)^\top$, $c_2 = (1, 1)^\top$ and $c_3 = (2, 2)^\top$. According to Theorem 3.2, the limiting distribution of $R_n$ is $\chi^2_4(1.80)$.

The Q–Q plots under the two scenarios are shown in Figure 3.2.
It is clear that the non–central chi–square limiting distributions approximate those of $R_n$ very well. In unreported simulation studies under various settings, we find that the non–central chi–square approximation is generally satisfactory when $n_k \approx 100$.

[Figure 3.2: Q–Q plots of the distributions of the DELR statistics under the local alternative model against the corresponding asymptotic theoretical distributions. Panels: normal data, against $\chi^2_2(2.67)$, and gamma data, against $\chi^2_4(1.80)$.]

3.3.2 Power comparison

We now compare the power of the DELR test (DELRT) with a number of popular methods for detecting differences between distribution functions, testing $H_0 : F_0 = F_1 = \ldots = F_m$. This is the same as (3.1) with $g(\beta) = \beta$. We use the nominal level of 5%.

The competitors include the Wald test based on the DRM (Wald) (Fokianos et al., 2001, (17)), analysis of variance (ANOVA), the Kruskal–Wallis rank–sum test (KW) (Wilcox, 1995), and the k–sample Anderson–Darling test (AD) (Scholz and Stephens, 1987).

The Wald test, as described in Section 3.1, is based on the test statistic $n\hat\beta^\top\hat\Sigma^{-1}\hat\beta$ with $\hat\Sigma$ being a consistent estimator of the asymptotic covariance matrix of $\hat\beta$. It uses a chi–square reference distribution. KW is a rank–based nonparametric test for equal population medians. AD is a nonparametric test for equal population distributions based on the quadratic distances of empirical distribution functions.

We first compare their powers based on normal data with $m + 1 = 2$ and sample sizes $n_0 = 30$ and $n_1 = 40$.
In this case, the two–sample t–test is the most powerful unbiased test when the two populations have the same variance.

We consider two different scenarios for alternatives, both having $F_0 = N(0, 2^2)$. In the first scenario, $F_1 = N(\mu, 2^2)$ with $\mu$ increasing in absolute value in a sequence of simulation experiments. In the second scenario, we consider seven parameter settings (settings 0–6) for $F_1 = N(\mu, \sigma^2)$ with $\mu$ and $\sigma$ taking values in (0, 0.05, 0.1, 0.15, 0.25, 0.36, 0.55) and (2, 1.9, 1.8, 1.7, 1.62, 1.56, 1.50) respectively.

The power curves are shown in Figure 3.3. In the equal–variance scenario, the DELR test is comparable to the optimal two–sample t–test. In the unequal–variance scenario, the DELR test clearly has much higher power than its competitors, and its type I error is close to the nominal 0.05.

[Figure 3.3: Power curves for normal data. Panels: two normals with equal variances (power against the mean of the second normal) and two normals with unequal variances (power against the parameter setting for the second normal); curves for DELRT, Wald, two–sample t, Wilcoxon and AD. The parameter setting 0 corresponds to the null model and the settings 1–6 correspond to alternative models.]

We next compare these tests on non–normal samples with $m + 1 = 5$ and sample sizes 30, 40, 25, 45 and 50. We generated data from four families of distributions: gamma, log–normal, Pareto with common support, and Weibull distributions with shape parameter equal to 0.8. The log–normal, Pareto and Weibull distributions satisfy DRMs with basis functions $q(x) = (\log x, \log^2 x)^\top$, $q(x) = \log x$, and $q(x) = x^{0.8}$, respectively.

We used six DRM parameter settings (settings 0–5; shown in Table 3.10 in Section 3.7). Setting 0 satisfies the null hypothesis and settings 1–5 do not. The simulated rejection rates are shown in Figure 3.4.
The rejection rates in Figure 3.4 make it clear that the DELR test has the highest power, while its type I error rates are close to the nominal level.

[Figure 3.4 image omitted: four panels, "Gamma data", "Log–normal data", "Pareto data" and "Weibull data with known and common shapes"; horizontal axis: parameter settings 0–5; vertical axis: power; curves: DELRT, Wald, ANOVA, KW, AD.]

Figure 3.4: Power curves for non–normal data. The parameter setting 0 corresponds to the null model and the settings 1–5 correspond to alternative models.

3.4 Robustness of DELR test against model misspecification

The DRM is very flexible and includes a large number of distribution families as special cases. The risk of misspecification is therefore considered low. Nevertheless, examining the effect of misspecification remains an important topic. Fokianos and Kaimi (2006) suggested that misspecifying the basis function $q(x)$ has an adverse effect on estimating $\beta$. Chen and Liu (2013) found that estimation of population quantiles is robust against misspecification. In this section, we use simulation studies to demonstrate that, even if the DRM is misspecified, the null distribution of the DELR statistic is still well approximated by the chi–square distribution for large sample sizes when a high–dimensional basis function $q(x)$ is used, and that the DELR test retains high power and a reasonable type I error rate for testing the hypothesis of equal population distributions.

3.4.1 Null limiting distribution of the DELR statistic

We first study the chi–square approximation to the finite–sample distribution of the DELR statistic based on misspecified DRMs under the null hypothesis of (3.1).
As in the simulation under the correctly specified DRM in Section 3.3.1, we set $m + 1 = 6$ and consider the hypothesis (3.1) with $g(\beta) = (\beta_1^{\mathsf T}, \beta_3^{\mathsf T}) - (\beta_2^{\mathsf T}, \beta_4^{\mathsf T})$, which is equivalent to $F_1 = F_2$ and $F_3 = F_4$. We generated six samples of equal size under each of two distributional settings. The first setting consists of Weibull distributions with shapes (2.5, 1, 1, 2, 2, 1.8) and scales (1.2, 2.8, 2.8, 4, 4, 0.9). The second setting consists of distributions from different families:

$$X_0 \sim \text{Gamma}(3, 0.5), \quad X_1 \sim \text{log–normal}(0, 0.6), \quad X_2 \sim \text{log–normal}(0, 0.6),$$
$$X_3 \sim \text{Weibull}(2, 4), \quad X_4 \sim \text{Weibull}(2, 4), \quad X_5 \sim \text{Gamma}(2, 0.8).$$

Under neither of the above settings do the distributions satisfy a DRM; nevertheless, we fit a DRM under each setting. For the first setting, we fit a DRM with the basis function suitable for the gamma family, $q(x) = (x, \log x)^{\mathsf T}$, to the Weibull samples, because the shapes of Weibull densities are similar to those of gamma densities. For the second setting of mixed families of distributions, we fit DRMs with the following basis functions:

DRM 1: $q(x) = \log x$,
DRM 2: $q(x) = (x, x^2)^{\mathsf T}$,
DRM 3: $q(x) = (x, \log x)^{\mathsf T}$,
DRM 4: $q(x) = (\log x, \sqrt{x}, x)^{\mathsf T}$,
DRM 5: $q(x) = (\log x, x, x^2)^{\mathsf T}$,
DRM 6: $q(x) = (\log x, \sqrt{x}, x, x^2)^{\mathsf T}$.

For each DRM, the theoretical limiting distribution of the DELR statistic for the null hypothesis $g(\beta) = (\beta_1^{\mathsf T}, \beta_3^{\mathsf T}) - (\beta_2^{\mathsf T}, \beta_4^{\mathsf T}) = 0$ is $\chi^2_{2d}$, where $d$ is the dimension of the basis function.

Under the first setting, we calculate the DELR test statistic for common sample sizes $n_k = 20, 40, 70, 100, 150$ and $300$, $k = 0, \ldots, 5$. The Q–Q plots of the DELR statistics against the quantiles of the theoretical limiting $\chi^2_4$ distribution are shown in Figure 3.5, and the corresponding type–I error rates of the DELR tests at the nominal sizes of 0.10 and 0.05 are shown in Table 3.1.
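Because the reference distribution $\chi^2_{2d}$ always has an even number of degrees of freedom here, its upper tail probability has a closed form, and the critical values used to read the Q–Q plots and type–I error tables can be computed without a statistics library. The following is a minimal sketch, not the thesis code; the function names `chi2_sf_even` and `chi2_quantile_even` are illustrative, and the basis dimensions are those of DRM 1 to DRM 6 as listed above.

```python
import math


def chi2_sf_even(x: float, df: int) -> float:
    """Upper tail P(X > x) of a chi-square variable with even df.

    For df = 2k the tail equals exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0  # j = 0 term of the sum
    for j in range(1, df // 2):
        term *= half / j  # (x/2)^j / j!, built up recursively
        total += term
    return math.exp(-half) * total


def chi2_quantile_even(prob: float, df: int) -> float:
    """Quantile of the chi-square(df) law, by bisection on the tail."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even(hi, df) > 1.0 - prob:  # bracket the quantile
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf_even(mid, df) > 1.0 - prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Basis dimensions d of DRM 1-6; the DELR statistic is referred to
# a chi-square distribution with 2d degrees of freedom.
basis_dims = {"DRM 1": 1, "DRM 2": 2, "DRM 3": 2,
              "DRM 4": 3, "DRM 5": 3, "DRM 6": 4}
critical_values = {name: chi2_quantile_even(0.95, 2 * d)
                   for name, d in basis_dims.items()}

if __name__ == "__main__":
    for name, value in critical_values.items():
        print(f"{name}: 0.95 quantile of chi-square({2 * basis_dims[name]}) = {value:.4f}")
```

A simulated type–I error rate at nominal size 0.05 is then the fraction of simulated DELR statistics exceeding the corresponding entry of `critical_values`.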
For small sample sizes, the Q–Q plots in Figure 3.5 lie above the diagonal line, and the type–I error rates are higher than the nominal sizes. As the sample size increases, the Q–Q plot slowly moves towards the diagonal line. When the size of each sample reaches 150 or more, the $\chi^2_4$ distribution approximates the distribution of $R_n$ well, and the type–I error rates are close to the nominal sizes. This tells us that, although the DRM is misspecified, the chi–square approximation to the DELR statistic is still useful in this case for samples of large sizes.

Table 3.1: The type–I error rates of the DELR tests at nominal sizes of 0.10 and 0.05 for Weibull samples under a misspecified DRM.

| Nominal size | $n_k = 20$ | $n_k = 40$ | $n_k = 70$ | $n_k = 100$ | $n_k = 150$ | $n_k = 300$ |
|---|---|---|---|---|---|---|
| 0.10 | 0.1483 | 0.1322 | 0.128 | 0.123 | 0.1178 | 0.1102 |
| 0.05 | 0.0839 | 0.0728 | 0.0664 | 0.0664 | 0.0624 | 0.0543 |

For the second setting of mixed families of distributions, the Q–Q plots of the DELR statistics against the theoretical quantiles of the limiting distribution under DRM 1 to DRM 6 are shown in Figures 3.6–3.11. The corresponding type–I error rates at the nominal sizes of 0.10 and 0.05 are shown in Tables 3.2–3.7.

For the DRM with the simplest basis function (DRM 1), the Q–Q line is always slightly under the diagonal line for all sample sizes, implying conservative DELR tests at all nominal sizes. For the two DRMs with two–dimensional basis functions (DRM 2 and 3), the Q–Q lines are always above the diagonal line for all sample sizes, which indicates anti–conservative DELR tests at all nominal sizes. For the two DRMs with three–dimensional basis functions (DRM 4 and 5), the Q–Q line is above the diagonal line when the sample size is small ($n_k = 20, 40$). It moves close to the diagonal line when the sample size becomes moderately large ($n_k = 70, 100$), and below the diagonal line when the sample size gets larger ($n_k = 150, 300$).
If we increase the size of each sample to 500 or 1,000, the Q–Q line stays slightly below, and a little further away from, the diagonal line. For all the above cases (DRM 1–5), the Q–Q plots show that the DELR test has some bias that does not diminish as the sample size becomes larger.

The DELR test under DRM 6, which has a four–dimensional basis function, behaves differently with respect to the sample size. The Q–Q line is well above the diagonal line when the sample size is small. As the sample size increases, the Q–Q line moves gradually towards the diagonal line. When the size of each sample reaches 300, the Q–Q line is very close to the diagonal line. If we further increase the size of each sample to 500 and 1,000, the Q–Q line approaches the diagonal line even more closely from above, and it never goes below it, unlike the case of the DRMs with three–dimensional basis functions. This behaviour is similar to that of the DELR test under a correctly specified DRM, except that convergence to the chi–square distribution is slower. It indicates that a DRM with a high–dimensional basis function is more likely to fit the samples well.

[Figure 3.5 image omitted: six Q–Q panels for $m = 5$ and $n_k = 20, 40, 70, 100, 150, 300$; horizontal axis: theoretical quantiles of the $\chi^2_4$ distribution; vertical axis: sample quantiles of the DLR statistic; the 0.95 quantile is marked on both axes.]

Figure 3.5: Q–Q plots of the simulated and the null limiting distribution of the DELR statistics for Weibull samples under a misspecified DRM.

Table 3.2: The type–I error rates of the DELR tests at nominal sizes of 0.10 and 0.05 for samples from different families of distributions under the misspecified DRM 1.

| Nominal size | $n_k = 20$ | $n_k = 40$ | $n_k = 70$ | $n_k = 100$ | $n_k = 150$ | $n_k = 300$ |
|---|---|---|---|---|---|---|
| 0.10 | 0.0867 | 0.0749 | 0.071 | 0.0739 | 0.07 | 0.0704 |
| 0.05 | 0.0419 | 0.0348 | 0.0311 | 0.0358 | 0.0321 | 0.0335 |

Table 3.3: The type–I error rates of the DELR tests at nominal sizes of 0.10 and 0.05 for samples from different families of distributions under the misspecified DRM 2.

| Nominal size | $n_k = 20$ | $n_k = 40$ | $n_k = 70$ | $n_k = 100$ | $n_k = 150$ | $n_k = 300$ |
|---|---|---|---|---|---|---|
| 0.10 | 0.143 | 0.1251 | 0.1254 | 0.1282 | 0.1201 | 0.1215 |
| 0.05 | 0.0759 | 0.0683 | 0.0644 | 0.0702 | 0.064 | 0.0649 |
Table 3.4: The type–I error rates of the DELR tests at nominal sizes of 0.10 and 0.05 for samples from different families of distributions under the misspecified DRM 3.

| Nominal size | $n_k = 20$ | $n_k = 40$ | $n_k = 70$ | $n_k = 100$ | $n_k = 150$ | $n_k = 300$ |
|---|---|---|---|---|---|---|
| 0.10 | 0.149 | 0.1419 | 0.1367 | 0.1408 | 0.1325 | 0.1364 |
| 0.05 | 0.0815 | 0.0794 | 0.0738 | 0.0778 | 0.0712 | 0.0742 |

Table 3.5: The type–I error rates of the DELR tests at nominal sizes of 0.10 and 0.05 for samples from different families of distributions under the misspecified DRM 4.

| Nominal size | $n_k = 20$ | $n_k = 40$ | $n_k = 70$ | $n_k = 100$ | $n_k = 150$ | $n_k = 300$ |
|---|---|---|---|---|---|---|
| 0.10 | 0.1554 | 0.1204 | 0.0987 | 0.0981 | 0.0877 | 0.0819 |
| 0.05 | 0.0848 | 0.0616 | 0.0472 | 0.0501 | 0.0391 | 0.0387 |

Table 3.6: The type–I error rates of the DELR tests at nominal sizes of 0.10 and 0.05 for samples from different families of distributions under the misspecified DRM 5.

| Nominal size | $n_k = 20$ | $n_k = 40$ | $n_k = 70$ | $n_k = 100$ | $n_k = 150$ | $n_k = 300$ |
|---|---|---|---|---|---|---|
| 0.10 | 0.1608 | 0.1198 | 0.1019 | 0.1006 | 0.0872 | 0.0817 |
| 0.05 | 0.0871 | 0.0642 | 0.0503 | 0.0517 | 0.0443 | 0.0416 |

Table 3.7: The type–I error rates of the DELR tests at nominal sizes of 0.10 and 0.05 for samples from different families of distributions under the misspecified DRM 6.

| Nominal size | $n_k = 20$ | $n_k = 40$ | $n_k = 70$ | $n_k = 100$ | $n_k = 150$ | $n_k = 300$ | $n_k = 500$ | $n_k = 1{,}000$ |
|---|---|---|---|---|---|---|---|---|
| 0.10 | 0.2228 | 0.1602 | 0.1336 | 0.1343 | 0.1201 | 0.1109 | 0.1132 | 0.1038 |
| 0.05 | 0.1343 | 0.0866 | 0.0708 | 0.0716 | 0.0628 | 0.0583 | 0.0578 | 0.0529 |

[Figure 3.6 image omitted: six Q–Q panels for $m = 5$ and $n_k = 20, 40, 70, 100, 150, 300$; horizontal axis: theoretical quantiles of the $\chi^2_2$ distribution; vertical axis: sample quantiles of the DLR statistic; the 0.95 quantile is marked on both axes.]

Figure 3.6: Q–Q plots of the simulated and the null limiting distribution of the DELR statistics for samples from different families of distributions under the misspecified DRM 1.

[Figure 3.7 image omitted: six Q–Q panels for $m = 5$ and $n_k = 20, 40, 70, 100, 150, 300$; horizontal axis: theoretical quantiles of the $\chi^2_4$ distribution; vertical axis: sample quantiles of the DLR statistic.]

Figure 3.7: Q–Q plots of the simulated and the null limiting distribution of the DELR statistics for samples from different families of distributions under the misspecified DRM 2.

[Figure 3.8 image omitted: six Q–Q panels for $m = 5$ and $n_k = 20, 40, 70, 100, 150, 300$; horizontal axis: theoretical quantiles of the $\chi^2_4$ distribution; vertical axis: sample quantiles of the DLR statistic.]

Figure 3.8: Q–Q plots of the simulated and the null limiting distribution of the DELR statistics for samples from different families of distributions under the misspecified DRM 3.

[Figure 3.9 image omitted: six Q–Q panels for $m = 5$ and $n_k = 20, 40, 70, 100, 150, 300$; horizontal axis: theoretical quantiles of the $\chi^2_6$ distribution; vertical axis: sample quantiles of the DLR statistic.]

Figure 3.9: Q–Q plots of the simulated and the null limiting distribution of the DELR statistics for samples from different families of distributions under the misspecified DRM 4.

[Figure 3.10 image omitted: six Q–Q panels for $m = 5$ and $n_k = 20, 40, 70, 100, 150, 300$; horizontal axis: theoretical quantiles of the $\chi^2_6$ distribution; vertical axis: sample quantiles of the DLR statistic.]

Figure 3.10: Q–Q plots of the simulated and the null limiting distribution of the DELR statistics for samples from different families of distributions under the misspecified DRM 5.
[Figure 3.11 image omitted: eight Q–Q panels for $m = 5$ and $n_k = 20, 40, 70, 100, 150, 300, 500, 1000$; horizontal axis: theoretical quantiles of the $\chi^2_8$ distribution; vertical axis: sample quantiles of the DLR statistic; the 0.95 quantile is marked on both axes.]

Figure 3.11: Q–Q plots of the simulated and the null limiting distribution of the DELR statistics for samples from different families of distributions under the misspecified DRM 6.

3.4.2 Power of the DELR test

We now study the power of the DELR test under misspecification of the DRM. As in Section 3.3.2, we consider the null hypothesis $H_0: F_0 = F_1 = \cdots = F_m$, which is the same as (3.1) with $g(\beta) = \beta$. We use the nominal level of 5%.

We set $m + 1 = 5$ with sample sizes 90, 120, 75, 135 and 150. In the first simulation experiment, we generated data from Weibull distributions. We use the DELR test and the Wald test to test the hypothesis of equal distributions, based on a DRM with $q(x) = (x, \log x)^{\mathsf T}$. We used six parameter settings, with setting 0 satisfying the null hypothesis. Note that the basis function is misspecified.

We repeated the experiment with data from mixtures of two normals, non–central t distributions, and mixtures of a gamma and a Weibull. We conducted the DELR test based on a DRM with $q(x) = (x, x^2)^{\mathsf T}$ for the non–central t data and for the normal mixture data, and with $q(x) = (x, \log x)^{\mathsf T}$ for data from the gamma–Weibull mixture. All these DRMs are misspecified. The detailed parameter settings 0–5 are given in Table 3.11 in Section 3.7.

We also applied ANOVA, KW and AD tests.
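To make the simulation design concrete, the sketch below implements one of the competitors, the Kruskal–Wallis test, and estimates its rejection rate by Monte Carlo for five samples of sizes 90, 120, 75, 135 and 150. It is an illustration only: the null distribution (Weibull with scale 1 and shape 2), the number of replications and the seed are assumptions made here and are not the settings of Table 3.11, and the implementation assumes continuous data without ties. With five samples the chi–square reference distribution has four degrees of freedom, so its tail has a closed form.

```python
import math
import random

SIZES = (90, 120, 75, 135, 150)  # sample sizes quoted in the text


def kruskal_wallis(samples):
    """Return (H, p-value) for the Kruskal-Wallis test, assuming no ties.

    The p-value uses the chi-square approximation with k - 1 degrees of
    freedom; the closed-form tail used here requires k - 1 to be even.
    """
    k = len(samples)
    df = k - 1
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles an odd number (>= 3) of samples")
    # Pool the observations, remembering which sample each came from.
    pooled = sorted((x, i) for i, s in enumerate(samples) for x in s)
    n_total = len(pooled)
    rank_sums = [0.0] * k
    for rank, (_, i) in enumerate(pooled, start=1):
        rank_sums[i] += rank
    h = 12.0 / (n_total * (n_total + 1)) * sum(
        r * r / len(s) for r, s in zip(rank_sums, samples)
    ) - 3.0 * (n_total + 1)
    h = max(h, 0.0)  # guard against tiny negative rounding error
    # Chi-square upper tail for even df: exp(-h/2) * sum_j (h/2)^j / j!.
    half = h / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return h, math.exp(-half) * total


def null_weibull(rng):
    """Hypothetical null setting: five Weibull(scale=1, shape=2) samples."""
    return [[rng.weibullvariate(1.0, 2.0) for _ in range(n)] for n in SIZES]


def rejection_rate(draw_samples, reps=200, level=0.05, seed=1):
    """Fraction of simulated data sets on which the test rejects."""
    rng = random.Random(seed)
    rejections = sum(
        kruskal_wallis(draw_samples(rng))[1] < level for _ in range(reps)
    )
    return rejections / reps


if __name__ == "__main__":
    print("estimated type I error rate:", rejection_rate(null_weibull))
```

Replacing `null_weibull` with a generator whose five distributions differ gives an estimate of power at that alternative; repeating this over settings 0–5 for each test would trace out one power curve per test.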
The results of all five tests are summarized as power curves in Figure 3.12. Although all the DRMs are misspecified, we notice that, under all these data settings, the DELR test has close to nominal type I error rates and superior power in detecting distributional differences.

[Figure 3.12 image omitted: four panels, "Weibull data", "Data from a mixture of two Normals", "T data" and "Data from a mixture of a gamma and a Weibull"; horizontal axis: parameter settings 0–5; vertical axis: power; curves: DELRT, Wald, ANOVA, KW, AD.]

Figure 3.12: Power curves of the five tests with DELR and Wald tests based on misspecified DRMs. The parameter setting 0 corresponds to the null model and the settings 1–5 correspond to alternative models.

3.5 Analysis of lumber quality data

We now turn to the primary application of this thesis: analysis of lumber strength. As members of the Forest Products Stochastic Modeling Group centered at the University of British Columbia, we are helping develop methods for assessing the engineering strength properties of lumber. A primary goal, one noted in Chapter 1, is an effective but relatively inexpensive long term monitoring program for the strength of lumber. Two strengths of great importance are the so–called modulus of rupture (MOR) or “bending strength”, and modulus of tension (MOT) or “tension strength”, both of which are measured in units of $10^3$ pound–force per square inch (psi). The Forest Products Stochastic Modeling Group collected three MOR samples in 2007, 2010 and 2011 with sample sizes 98, 282 and 445 respectively, and two MOT samples in 2007 and 2011 with sample sizes 98 and 425 respectively. Our interest in change of lumber quality over time leads us to test, separately for MOR and for MOT, the hypothesis that the samples come from populations with the same distribution.
We do so using the DELR test, Wald test, ANOVA and Kruskal–Wallis rank–sum test.

3.5.1 Assessing the DRM fit: an exploratory approach

A good DRM fit to the data is crucial to the effectiveness of the DELR test. We now assess this by comparing the EL density estimates obtained using the fitted DRM to the histograms of the observed samples.

The kernel density plots of the MOR and MOT samples are shown in Figure 3.13. The three densities with modes around 6,100 psi correspond to the MOR samples, while the two with modes around 3,600 psi correspond to the MOT samples. The MOR samples of the year 2007 (MOR07) and the year 2010 (MOR10) seem to have similar density plots: both are slightly right–skewed, although MOR10 has a small lump to the right of its mode. Both seem well approximated by gamma distributions. The MOR sample of the year 2011 (MOR11) seems to have quite different characteristics from MOR07 and MOR10: it has a clearly larger spread and looks more symmetric, mimicking the shape of a normal density. The density plots of the MOT samples of the year 2007 (MOT07) and the year 2011 (MOT11) look similar: both are roughly bell–shaped with a slight right skew, resembling something between a normal and a gamma distribution.

[Figure 3.13 image omitted: overlaid kernel density curves for MOR07, MOR10, MOR11, MOT07 and MOT11; horizontal axis: wood quality measures, MOR and MOT ($10^3$ psi); vertical axis: density.]

Figure 3.13: Kernel density plots of the MOR and MOT samples.

Since the above density plots look like either gamma or normal densities, when fitting a DRM to the MOR or MOT samples we use the basis function $q(x) = (\log x, x, x^2)^{\mathsf T}$, which combines the basis function for the gamma distributions, $q(x) = (\log x, x)^{\mathsf T}$, and that for the normal distributions, $q(x) = (x, x^2)^{\mathsf T}$. The corresponding DRM can be understood as a generalization of both the gamma and the normal families. Below we assess how well this DRM fits the MOR and MOT samples respectively.
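The two basis functions combined above can be read off from the log density ratios of the gamma and normal families. The short derivation below is a sketch under assumed parametrizations (shape $a_k$ and rate $b_k$ for the gamma, mean $\mu_k$ and standard deviation $\sigma_k$ for the normal); these symbols are introduced here for illustration only.

```latex
% Gamma with shape a_k and rate b_k:
%   f_k(x) = b_k^{a_k} x^{a_k-1} e^{-b_k x} / \Gamma(a_k)
\log\frac{f_k(x)}{f_0(x)}
  = \Bigl\{ a_k \log b_k - a_0 \log b_0
      - \log\frac{\Gamma(a_k)}{\Gamma(a_0)} \Bigr\}
  + (a_k - a_0)\log x - (b_k - b_0)\,x,
  \qquad q(x) = (\log x,\ x)^{\mathsf T}.

% Normal with mean \mu_k and standard deviation \sigma_k:
\log\frac{f_k(x)}{f_0(x)}
  = \Bigl\{ \log\frac{\sigma_0}{\sigma_k}
      - \frac{\mu_k^2}{2\sigma_k^2} + \frac{\mu_0^2}{2\sigma_0^2} \Bigr\}
  + \Bigl(\frac{\mu_k}{\sigma_k^2} - \frac{\mu_0}{\sigma_0^2}\Bigr) x
  + \Bigl(\frac{1}{2\sigma_0^2} - \frac{1}{2\sigma_k^2}\Bigr) x^2,
  \qquad q(x) = (x,\ x^2)^{\mathsf T}.

% Union of the two bases:
q(x) = (\log x,\ x,\ x^2)^{\mathsf T}.
```

With the combined basis, a gamma family corresponds to a zero coefficient on $x^2$ and a normal family to a zero coefficient on $\log x$, which is the sense in which the three–dimensional DRM nests both.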
The DRM fit can be assessed using goodness–of–fit tests like those developed by Qin and Zhang (1997) and Zhang (2002). Here we use a less formal but visually straightforward exploratory approach: checking whether the EL kernel density estimators of the different populations based on the DRM agree with the histograms of the observed samples.

Recall that under the DRM the EL estimator $\hat p_{kj}$ for $dF_0(x_{kj})$ of the baseline distribution is given in (2.9). Applying the DRM assumption (2.1), we obtain the EL estimators for $dF_l(x_{kj})$, $l = 1, \ldots, m$, as

$$\hat p^{(l)}_{kj} = \exp\{\hat\alpha_l + \hat\beta_l^{\mathsf T} q(x_{kj})\}\, \hat p_{kj}.$$

An EL kernel density estimator of population $l$ is defined to be the kernel density estimator with the $\{\hat p^{(l)}_{kj}\}$ as weights and all the $\{x_{kj}\}$ as observations. Fokianos (2004) showed that under mild conditions this EL kernel density estimator is consistent and more efficient than the classical kernel density estimator with empirical weight $1/n_k$ for every data point within sample $k$.

The DRM fit to the MOR samples

We fit the DRM with basis function $q(x) = (\log x, x, x^2)^{\mathsf T}$ to the three MOR samples, and compute the EL kernel density estimates of the MOR populations. We compare the EL kernel density estimates based on the DRM to the histograms, to classical kernel density estimates with empirical weights, and to parametric density estimates based on a three–parameter Weibull distribution with density function

$$f(x; \lambda, \kappa, c) = \{\kappa (x - c)^{\kappa - 1} / \lambda^{\kappa}\} \cdot \exp[-\{(x - c)/\lambda\}^{\kappa}],$$

for $x > c$ and $\lambda, \kappa > 0$. Such a Weibull distribution seems to be a flexible model for distributions with slightly heavy tails on the right. We calculate the maximum likelihood estimates (MLE) of the Weibull parameters based on each MOR sample, then estimate the density of the corresponding population by plugging the MLEs into the above density function. Note that when the shape parameter $\kappa < 1$ and the location parameter $c$ tends to the smallest observation from below, the likelihood tends to infinity. Thus the MLEs of the parameters for this Weibull distribution are not well–defined. In practice, when calculating the MLE, we constrained the location parameter $c$ to be smaller than the smallest observation minus a small positive constant, $10^{-6}$.

The histograms and density estimates of the MOR samples are shown in Figure 3.14. For all samples, the EL kernel density estimates agree well with the histograms. Compared to the classical kernel density estimates with empirical weights, the EL kernel density estimates based on the DRM are smoother, especially for MOR10. The mode of the EL kernel density estimate for MOR07 is slightly to the right of the mode of the corresponding classical kernel density estimate. Such differences exist because the EL kernel density estimators are obtained using the combined data from all samples, while the classical kernel density estimators are obtained using single samples. Compared to the Weibull density estimator, the EL kernel density estimator looks more flexible: it captures the small lumps to the right of the modes of MOR10 and MOR11, while the Weibull estimates do not. Overall, the DRM seems to fit the three MOR samples well.

[Figure 3.14 image omitted: three panels, MOR07, MOR10 and MOR11, each showing a histogram with three overlaid curves labelled "DRM fit", "Empirical fit" and "Weibull (3 parameter) fit"; vertical axis: density.]

Figure 3.14: The histograms, EL kernel density estimates (solid curves), classical kernel density estimates (dashed curves) and three parameter Weibull density estimates (dot–dashed curves) for MOR samples.

The EL kernel density estimates are also found to be quite robust to the choice of the DRM basis function. We fit DRMs with other basis functions, including that for the normal family, that for the gamma family, $q(x) = (\log x, \sqrt{x}, x)^{\mathsf T}$, $q(x) = (\log x, \sqrt{x}, x, x^2)^{\mathsf T}$, etc. The corresponding EL kernel density estimates are very similar to those obtained using the current basis function $q(x) = (\log x, x, x^2)^{\mathsf T}$.

The DRM fit to the MOT samples

As for the MOR samples, we fit the DRM with basis function $q(x) = (\log x, x, x^2)^{\mathsf T}$ to the two MOT samples, for the years 2007 and 2011, and calculate the EL kernel density estimates based on the DRM for the corresponding populations. Again, we compare these density estimates to the histograms, the classical kernel density estimates with empirical weights and the three–parameter Weibull density estimates (Figure 3.15). The EL kernel density estimates agree well with the histograms and are quite close to the classical kernel density estimates. Both the EL and classical kernel density estimators are more flexible and show more curvature than the Weibull density estimator.

[Figure 3.15 image omitted: two panels, MOT07 and MOT11, each showing a histogram with three overlaid curves labelled "DRM fit", "Empirical fit" and "Weibull (3 parameter) fit"; vertical axis: density.]

Figure 3.15: The histograms, EL kernel density estimates (solid curves), classical kernel density estimates (dashed curves) and three parameter Weibull density estimates (dot–dashed curves) for MOT samples.

3.5.2 Testing for equality of strength populations

Comparing the MOR populations

We now test the hypothesis that all MOR samples are from populations with the same distribution. As mentioned, we use the DELR test, Wald test, ANOVA and KW test for this hypothesis. The first two tests were carried out under the DRM assumption with basis function $q(x) = (\log x, x, x^2)^{\mathsf T}$. The DELR test and Wald test based on other basis functions lead to the same conclusion as below, although the p–values are not identical.

The p–values obtained using the DELR test, Wald test, ANOVA and KW test are respectively 3.05e-8, 2.04e-6, 0.0029 and 0.00108. All tests strongly reject the null hypothesis of equal MOR distributions.
The DRM–based tests, especially the DELR test, have much smaller p–values.

Following the rejection of that hypothesis, it is natural to look for its cause through pairwise comparisons. For comparing pairs, we use the two sample t–test adjusted for unequal variances (also known as Welch's t–test) and the Wilcoxon rank–sum test with continuity correction, instead of the plain ANOVA and KW test. We note that the plain ANOVA and KW test give slightly different p–values but the same conclusions. Let $F_{\mathrm{MOR07}}$, $F_{\mathrm{MOR10}}$ and $F_{\mathrm{MOR11}}$ denote the MOR population distributions for years 2007, 2010 and 2011, respectively. The p–values for pairwise comparisons are given in Table 3.8. Note that the two DRM–based tests strongly suggest $F_{\mathrm{MOR11}}$ is markedly different from $F_{\mathrm{MOR07}}$ and $F_{\mathrm{MOR10}}$, while $F_{\mathrm{MOR07}}$ and $F_{\mathrm{MOR10}}$ are not significantly different. The other two tests point in the same direction, but their p–values for comparing $F_{\mathrm{MOR07}}$ with $F_{\mathrm{MOR11}}$ (0.0579 and 0.0604) fall short of statistical significance at the 5% level. We also remark that the conclusion of the DRM–based tests does not change at the 5% level when a Bonferroni correction is applied to account for the multiple comparisons.

In addition, if the 5% size is strictly observed, the t–test and the Wilcoxon test would imply $F_{\mathrm{MOR07}} = F_{\mathrm{MOR10}}$ and $F_{\mathrm{MOR07}} = F_{\mathrm{MOR11}}$, but $F_{\mathrm{MOR10}} \neq F_{\mathrm{MOR11}}$. This is much harder to interpret in applications.

Table 3.8: The p–values of pairwise comparisons among three MOR populations.

                           DELRT      Wald       t–test     Wilcoxon
H0: F_MOR07 = F_MOR10      0.871      0.875      0.516      0.431
H0: F_MOR07 = F_MOR11      5.40e-4    7.01e-3    0.0579     0.0604
H0: F_MOR10 = F_MOR11      4.54e-8    1.82e-6    6.09e-4    3.95e-4

The three–sample DRM fit versus two–sample DRM fit. In the above pairwise comparisons, the DELR and Wald tests are based on a DRM fitted to all three MOR samples. We can also fit the DRM to only the two samples being compared. How do the tests based on the two–sample DRMs compare to those based on the three–sample DRM? The p–values of such two–sample tests for pairwise comparisons are shown in Table 3.9.
The p–values are very close to those based on the DRM for all three MOR populations, and the corresponding conclusions are the same. This observation agrees with the conclusion of Theorem 4.3 that we will present in the next chapter: when testing the equality of a set of distribution functions, incorporating more samples, whose population distributions are unrelated to the hypothesis, into the DRM does not change the local asymptotic power of the DELR test.

Table 3.9: The p–values of the DELR and Wald tests based on two–sample DRMs for pairwise comparisons among the three MOR populations.

                           DELRT       Wald
H0: F_MOR07 = F_MOR10      0.856       0.859
H0: F_MOR07 = F_MOR11      6.51e-04    7.8e-03
H0: F_MOR10 = F_MOR11      4.75e-08    2.04e-06

Comparing the MOT populations

We now test the hypothesis that the distribution functions of the two MOT populations are equal. The p–values of the DELR test, Wald test, Welch's t–test and Wilcoxon rank–sum test for this hypothesis are 0.0877, 0.228, 0.0749 and 0.295, respectively. No test rejects the null hypothesis at the 5% significance level. The DELR test and the t–test provide marginal evidence against the null hypothesis, showing that they are picking up the vague difference between the two samples. These results agree with our observation from the kernel density plots of the MOT samples (Figure 3.13).

3.6 Proofs

3.6.1 Theorem 3.1: Null limiting distribution of the DELR statistic

In this subsection, we show that the DELR test statistic, $R_n$, has a simple chi–square limiting distribution under the null hypothesis of (3.1). The idea is to show that $R_n$ is well approximated by a quadratic form that is asymptotically chi–square. We first give two important lemmas.

Lemma 3.3 (Block matrix inversion formula).
Let $M$ be an $(s+t) \times (s+t)$ nonsingular matrix with partition
\[
M = \begin{pmatrix} A_{s \times s} & B_{s \times t} \\ C_{t \times s} & D_{t \times t} \end{pmatrix}.
\]
If $A$ is nonsingular, then so is $S_A = D - CA^{-1}B$ and
\[
M^{-1} = \begin{pmatrix} A^{-1} + A^{-1}BS_A^{-1}CA^{-1} & -A^{-1}BS_A^{-1} \\ -S_A^{-1}CA^{-1} & S_A^{-1} \end{pmatrix}.
\]
This is the conclusion of Theorem 8.5.11 of Harville (2008). We sketch the proof here.

Proof of Lemma 3.3. We first show that if $A$ is nonsingular, then $S_A = D - CA^{-1}B$ is also nonsingular. Observe the following equality,
\[
\begin{pmatrix} I_s & 0 \\ -CA^{-1} & I_t \end{pmatrix} M
= \begin{pmatrix} I_s & 0 \\ -CA^{-1} & I_t \end{pmatrix} \begin{pmatrix} A & B \\ C & D \end{pmatrix}
= \begin{pmatrix} A & B \\ 0 & S_A \end{pmatrix}.
\]
The first matrix on the LHS is lower block–triangular, square, and of full rank, so the rank of the second matrix on the LHS, $M$, is the same as the rank of the matrix on the RHS. Note that the matrix on the RHS is an upper block–triangular matrix, so its rank equals the sum of the ranks of its diagonal blocks $A$ and $S_A$. Therefore,
\[
\mathrm{rank}(M) = \mathrm{rank}\begin{pmatrix} A & B \\ 0 & S_A \end{pmatrix} = \mathrm{rank}(A) + \mathrm{rank}(S_A).
\]
Since $M$ and $A$ are both of full rank, $S_A$ also has full rank by the above equality. Hence $S_A$ is nonsingular.

Since the above RHS matrix is nonsingular, we can solve for $M^{-1}$ as
\[
M^{-1} = \begin{pmatrix} A & B \\ 0 & S_A \end{pmatrix}^{-1} \begin{pmatrix} I_s & 0 \\ -CA^{-1} & I_t \end{pmatrix}
= \begin{pmatrix} A^{-1} & -A^{-1}BS_A^{-1} \\ 0 & S_A^{-1} \end{pmatrix} \begin{pmatrix} I_s & 0 \\ -CA^{-1} & I_t \end{pmatrix}
= \begin{pmatrix} A^{-1} + A^{-1}BS_A^{-1}CA^{-1} & -A^{-1}BS_A^{-1} \\ -S_A^{-1}CA^{-1} & S_A^{-1} \end{pmatrix}.
\]

Lemma 3.4 (Quadratic form decomposition formula). Adopt the settings of Lemma 3.3. Let $z^{\top} = (z_1^{\top}, z_2^{\top})$ be a vector of length $s + t$, partitioned in agreement with $s$ and $t$. If $A$ is nonsingular, then
\[
z^{\top}M^{-1}z = \{z_2 - B^{\top}(A^{-1})^{\top}z_1\}^{\top}(D - CA^{-1}B)^{-1}(z_2 - CA^{-1}z_1) + z_1^{\top}A^{-1}z_1.
\]

Lemma 3.4 is an easy consequence of Lemma 3.3. Its proof is thus omitted.

Proof of Theorem 3.1. We now prove the asymptotic chi–squareness of the DELR statistic $R_n = 2\{l_n(\hat{\theta}) - l_n(\tilde{\theta})\}$ under the null model of (3.1). We first work on quadratic expansions of $l_n(\hat{\theta})$ and $l_n(\tilde{\theta})$ under the null model. The difference of the two quadratic forms is then shown to have a chi–square limiting distribution.

Recall that $v = n^{-1/2}\partial l_n(\theta^*)/\partial\theta$ and that $\theta^*$ is the true DRM parameter.
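By expanding $l_n(\hat{\theta})$ at $\theta^*$, we get
\[
l_n(\hat{\theta}) = l_n(\theta^*) + \sqrt{n}\,v^{\top}(\hat{\theta} - \theta^*) - (1/2)\,n(\hat{\theta} - \theta^*)^{\top}U_n(\hat{\theta} - \theta^*) + \varepsilon_n,
\]
where $\varepsilon_n = O_p(n^{-1/2})$ since $\hat{\theta} - \theta^* = O_p(n^{-1/2})$ and the third derivative is bounded by an integrable function shown in (2.17). Also, in the proof of Theorem 2.5, we have shown in (2.25) that
\[
\sqrt{n}(\hat{\theta} - \theta^*) = U^{-1}v + o_p(1).
\]
Plugging the above expression of $\sqrt{n}(\hat{\theta} - \theta^*)$ into the expansion of $l_n(\hat{\theta})$, we get
\[
l_n(\hat{\theta}) = l_n(\theta^*) + (1/2)\,v^{\top}U^{-1}v + o_p(1). \tag{3.7}
\]

Next, we work on an expansion for $l_n(\tilde{\theta})$ under the null model $g(\beta) = 0$. Recall from Section 3.2.2 that, when $q < md$, the null model can be equivalently represented as $\beta = G(\gamma)$ for some function $G: \mathbb{R}^{md-q} \to \mathbb{R}^{md}$ and parameter $\gamma$ of dimension $md - q$. In addition, $G$ is thrice differentiable in a neighbourhood of $\gamma^*$, and its Jacobian matrix $J = \partial G(\gamma^*)/\partial\gamma$ is of full rank. With this representation, the DRM parameter under the null hypothesis is $\theta = (\alpha, G(\gamma))$. Hence, we may write the likelihood function under the null model as
\[
\ell_n(\alpha, \gamma) = l_n(\alpha, G(\gamma)).
\]
Let $(\tilde{\alpha}, \tilde{\gamma})$ be the maximal point of $\ell_n(\alpha, \gamma)$. Note that $\tilde{\theta} = (\tilde{\alpha}, G(\tilde{\gamma}))$ and so $l_n(\tilde{\theta}) = \ell_n(\tilde{\alpha}, \tilde{\gamma})$. Clearly, $\ell_n(\alpha, \gamma)$ has the same properties as $l_n(\theta)$ and $\ell_n(\tilde{\alpha}, \tilde{\gamma})$ has an expansion similar to (3.7):
\[
l_n(\tilde{\theta}) = \ell_n(\tilde{\alpha}, \tilde{\gamma}) = \ell_n(\alpha^*, \gamma^*) + (1/2)\,\tilde{v}^{\top}\tilde{U}^{-1}\tilde{v} + o_p(1),
\]
where $\tilde{v} = n^{-1/2}\partial\ell_n(\alpha^*, \gamma^*)/\partial(\alpha, \gamma)$ and $\tilde{U}$ is the corresponding information matrix. Partition $v$ into $v_1 = n^{-1/2}\partial l_n(\theta^*)/\partial\alpha$ and $v_2 = n^{-1/2}\partial l_n(\theta^*)/\partial\beta$. Note that
\[
n^{-1/2}\partial\ell_n(\alpha^*, \gamma^*)/\partial\alpha = n^{-1/2}\partial l_n(\theta^*)/\partial\alpha = v_1.
\]
By the chain rule,
\[
n^{-1/2}\partial\ell_n(\alpha^*, \gamma^*)/\partial\gamma = n^{-1/2}J^{\top}\{\partial l_n(\theta^*)/\partial\beta\} = J^{\top}v_2. \tag{3.8}
\]
Similarly, the new information matrix is found to be
\[
\tilde{U} = \begin{pmatrix} I_m & 0 \\ 0 & J^{\top} \end{pmatrix} \begin{pmatrix} U_{\alpha\alpha} & U_{\alpha\beta} \\ U_{\beta\alpha} & U_{\beta\beta} \end{pmatrix} \begin{pmatrix} I_m & 0 \\ 0 & J \end{pmatrix}
= \begin{pmatrix} U_{\alpha\alpha} & U_{\alpha\beta}J \\ J^{\top}U_{\beta\alpha} & J^{\top}U_{\beta\beta}J \end{pmatrix}.
\]
Consequently, we have
\[
l_n(\tilde{\theta}) = \ell_n(\tilde{\alpha}, \tilde{\gamma}) = \ell_n(\alpha^*, \gamma^*) + (1/2)(v_1^{\top}, v_2^{\top}J)\tilde{U}^{-1}(v_1^{\top}, v_2^{\top}J)^{\top} + o_p(1).
\]
Combining (3.7) and the above expansion, and noticing that $\ell_n(\alpha^*, \gamma^*) = l_n(\theta^*)$, we have
\[
R_n = 2\{l_n(\hat{\theta}) - l_n(\tilde{\theta})\} = v^{\top}U^{-1}v - (v_1^{\top}, v_2^{\top}J)\tilde{U}^{-1}(v_1^{\top}, v_2^{\top}J)^{\top} + o_p(1).
\]
Applying the quadratic form decomposition formula given in Lemma 3.4 to the two quadratic forms on the RHS of the above expansion, we get
\[
\begin{aligned}
v^{\top}U^{-1}v &= \xi^{\top}\Lambda^{-1}\xi + v_1^{\top}U_{\alpha\alpha}^{-1}v_1, \\
(v_1^{\top}, v_2^{\top}J)\tilde{U}^{-1}(v_1^{\top}, v_2^{\top}J)^{\top} &= \xi^{\top}J(J^{\top}\Lambda J)^{-1}J^{\top}\xi + v_1^{\top}U_{\alpha\alpha}^{-1}v_1,
\end{aligned} \tag{3.9}
\]
where $\xi = (-U_{\beta\alpha}U_{\alpha\alpha}^{-1}, I_{md})v$ and $\Lambda = U_{\beta\beta} - U_{\beta\alpha}U_{\alpha\alpha}^{-1}U_{\alpha\beta}$ is defined in Theorem 3.2. We then obtain the following expansion
\[
R_n = 2\{l_n(\hat{\theta}) - l_n(\tilde{\theta})\} = \xi^{\top}\{\Lambda^{-1} - J(J^{\top}\Lambda J)^{-1}J^{\top}\}\xi + o_p(1). \tag{3.10}
\]

Recall that, by Theorem 2.2, $v$ is asymptotically $N(0, U - UWU)$, where $W = \mathrm{diag}\{T, 0_{md \times md}\}$ and $T = \rho_0^{-1}1_m1_m^{\top} + \mathrm{diag}\{\rho_1^{-1}, \ldots, \rho_m^{-1}\}$ as given in Theorem 2.2. Thus, $\xi$ is asymptotically normal with mean 0 and covariance matrix
\[
(-U_{\beta\alpha}U_{\alpha\alpha}^{-1}, I_{md})(U - UWU)(-U_{\beta\alpha}U_{\alpha\alpha}^{-1}, I_{md})^{\top}.
\]
Noting that
\[
(-U_{\beta\alpha}U_{\alpha\alpha}^{-1}, I_{md})\,U\,(-U_{\beta\alpha}U_{\alpha\alpha}^{-1}, I_{md})^{\top} = U_{\beta\beta} - U_{\beta\alpha}U_{\alpha\alpha}^{-1}U_{\alpha\beta} = \Lambda
\]
and
\[
(-U_{\beta\alpha}U_{\alpha\alpha}^{-1}, I_{md})\,UW = (-U_{\beta\alpha}U_{\alpha\alpha}^{-1}U_{\alpha\alpha}T + U_{\beta\alpha}T, \; 0) = 0,
\]
we get that the asymptotic covariance matrix of $\xi$ is $\Lambda$.

The last step is to verify that the quadratic form in the above expansion of $R_n$ has the claimed limiting distribution. We can easily check that
\[
\Lambda^{1/2}\{\Lambda^{-1} - J(J^{\top}\Lambda J)^{-1}J^{\top}\}\Lambda^{1/2} = I_{md} - \Lambda^{1/2}J(J^{\top}\Lambda J)^{-1}J^{\top}\Lambda^{1/2}
\]
is idempotent. Moreover, by the additivity and commutativity of the trace operation, we find the trace of the above idempotent matrix to be
\[
\begin{aligned}
\mathrm{tr}\{I_{md} - \Lambda^{1/2}J(J^{\top}\Lambda J)^{-1}J^{\top}\Lambda^{1/2}\}
&= \mathrm{tr}(I_{md}) - \mathrm{tr}\{\Lambda^{1/2}J(J^{\top}\Lambda J)^{-1}J^{\top}\Lambda^{1/2}\} \\
&= md - \mathrm{tr}\{(J^{\top}\Lambda J)^{-1}(J^{\top}\Lambda J)\} \\
&= md - \mathrm{tr}(I_{md-q}) \\
&= q.
\end{aligned}
\]
Therefore, by Theorem 5.1.1 of Mathai (1992), the quadratic form in expansion (3.10), and hence also $R_n$, has a $\chi^2_q$ limiting distribution.

The above proof is applicable to $q < md$. When $q = md$, the value of $\beta$ is fully specified.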
Hence, the maximization under the null is solely with respect to $\alpha$ and we easily find
\[
l_n(\tilde{\theta}) = l_n(\theta^*) + (1/2)\,v_1^{\top}U_{\alpha\alpha}^{-1}v_1 + o_p(1).
\]
This, along with the expansion (3.7) of $l_n(\hat{\theta})$ and the expression of $v^{\top}U^{-1}v$ given in (3.9), implies that $R_n = \xi^{\top}\Lambda^{-1}\xi + o_p(1)$. Just as in the proof for the case of $q < md$, the limiting distribution of the above $R_n$ is seen to be $\chi^2_{md}$.

3.6.2 Theorem 3.2: Limiting distribution of the DELR under local alternatives

In this subsection, we prove that the DELR test statistic, $R_n$, has a non–central chi–square limiting distribution under the local alternative model (3.2). We first sketch out the proof. Let $\beta^*$ be a specific parameter value under the null hypothesis and $\{F_k\}$ be the corresponding distribution functions. Let $\{G_k\}$ be the set of distribution functions satisfying the DRM with parameter given by $\beta_k = \beta_k^* + n_k^{-1/2}c_k$, $k = 1, \ldots, m$, and $G_0 = F_0$. When the samples are generated from the $\{G_k\}$, we still have that the DELR statistic is approximated by the quadratic form on the RHS of (3.10). The limiting distribution of $R_n$ is therefore determined by that of $v = n^{-1/2}\partial l_n(\theta^*)/\partial\theta$. According to Le Cam's third lemma (van der Vaart 2000, 6.7), $v$ has a specific limiting distribution under the $\{G_k\}$ if $v$ and $\sum_{k,j}\log\{dG_k(x_{kj})/dF_k(x_{kj})\}$, under the $\{F_k\}$, are jointly asymptotically normal with a particular mean and variance structure. The core of the proof then is to establish that structure.

For each $k = 0, 1, \ldots, m$, let $\mathrm{Var}_k(\cdot)$ and $\mathrm{Cov}_k(\cdot, \cdot)$ be the variance and covariance operators with respect to $F_k$, respectively.

Lemma 3.5. Under the conditions of Theorem 3.1 and the distribution functions $\{G_k\}$, $v$ is asymptotically normal with mean
\[
\tau = \sum_{k=1}^{m}\sqrt{\rho_k}\,\mathrm{Cov}_k\{\partial L_k(\theta^*, x)/\partial\theta, \; q^{\top}(x)\}\,c_k
\]
and covariance matrix $V = U - UWU$ as given in Theorem 2.2.

Proof of Lemma 3.5.
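We first expand $w_k = \sum_{j=1}^{n_k}\log\{dG_k(x_{kj})/dF_k(x_{kj})\}$. Notice that
\[
dG_k(x)/dF_k(x) = \exp\{\alpha_k + \beta_k^{\top}q(x)\}/\exp\{\alpha_k^* + \beta_k^{*\top}q(x)\} = \exp\{\alpha_k - \alpha_k^* + n_k^{-1/2}c_k^{\top}q(x)\}.
\]
Because the $\alpha_k$ are normalization constants that can be expressed as
\[
\alpha_k = -\log\int\exp\{\beta_k^{\top}q(x)\}\,dF_0(x) = -\log\int\exp\{(\beta_k^{*\top} + n_k^{-1/2}c_k^{\top})q(x)\}\,dF_0(x),
\]
we have
\[
\begin{aligned}
\exp\{\alpha_k^* - \alpha_k\} &= \int\exp\{\alpha_k^* + (\beta_k^{*\top} + n_k^{-1/2}c_k^{\top})q(x)\}\,dF_0(x) \\
&= \int\exp\{n_k^{-1/2}c_k^{\top}q(x)\}\exp\{\alpha_k^* + \beta_k^{*\top}q(x)\}\,dF_0(x) \\
&= \int\exp\{n_k^{-1/2}c_k^{\top}q(x)\}\,dF_k(x).
\end{aligned}
\]
Expanding the exponential term on the RHS, we get
\[
\exp\{\alpha_k^* - \alpha_k\} = \int\{1 + n_k^{-1/2}c_k^{\top}q(x) + (2n_k)^{-1}(c_k^{\top}q(x))^2\}\,dF_k(x) + \varepsilon_n,
\]
where $\varepsilon_n \propto n_k^{-3/2}\int\|q(x)\|^3\,dF_k(x) = O(n^{-3/2})$, because the third order moment of $q(x)$ is finite. Denote $\nu_k = E_kq(x)$. Then, the above equality is further simplified to
\[
\exp\{\alpha_k^* - \alpha_k\} = 1 + n_k^{-1/2}c_k^{\top}\nu_k + (2n_k)^{-1}c_k^{\top}E_k\{q(x)q^{\top}(x)\}c_k + O(n^{-3/2}).
\]
Hence, ignoring the $O(n^{-3/2})$ term, which is uniform in $x$, we have
\[
\log\{dG_k(x)/dF_k(x)\} \approx n_k^{-1/2}c_k^{\top}q(x) - \log\{1 + n_k^{-1/2}c_k^{\top}\nu_k + (2n_k)^{-1}c_k^{\top}E_k\{q(x)q^{\top}(x)\}c_k\}.
\]
Write $\sigma_k = \mathrm{Var}_k(q(x))$. Expanding the logarithmic term on the RHS, we get
\[
\begin{aligned}
&\log\{1 + n_k^{-1/2}c_k^{\top}\nu_k + (2n_k)^{-1}c_k^{\top}E_k\{q(x)q^{\top}(x)\}c_k\} \\
&\quad= n_k^{-1/2}c_k^{\top}\nu_k + (2n_k)^{-1}c_k^{\top}E_k\{q(x)q^{\top}(x)\}c_k - (2n_k)^{-1}c_k^{\top}\{\nu_k\nu_k^{\top}\}c_k + O(n^{-3/2}) \\
&\quad= n_k^{-1/2}c_k^{\top}\nu_k + (2n_k)^{-1}c_k^{\top}\sigma_kc_k + O(n^{-3/2}),
\end{aligned}
\]
where the remainder $O(n^{-3/2})$ is again uniform in $x$. Therefore
\[
\log\{dG_k(x)/dF_k(x)\} = n_k^{-1/2}c_k^{\top}\{q(x) - \nu_k\} - (2n_k)^{-1}c_k^{\top}\sigma_kc_k + O(n^{-3/2}),
\]
uniformly in $x$. Summing over $j$, we get, for each $k$,
\[
w_k = \sum_{j=1}^{n_k}\log\{dG_k(x_{kj})/dF_k(x_{kj})\} = n_k^{-1/2}c_k^{\top}\sum_{j=1}^{n_k}\{q(x_{kj}) - \nu_k\} - (1/2)\,c_k^{\top}\sigma_kc_k + O(n^{-1/2}).
\]
When $k = 0$, we have $c_0 = 0$.

Recalling that $v = n^{-1/2}\partial l_n(\theta^*)/\partial\theta$, that $l_n(\theta^*) = \sum_{k,j}L_{n,k}(\theta^*, x_{kj})$, and that $\hat{\lambda}_k = n_k/n$ has limit $\rho_k$, we have
\[
\begin{pmatrix} v \\ \sum_k w_k \end{pmatrix} \approx \sum_{k=0}^{m}\frac{1}{\sqrt{n_k}}\sum_{j=1}^{n_k}\begin{pmatrix} \sqrt{\rho_k}\,\{\partial L_{n,k}(\theta^*, x_{kj})/\partial\theta - \mu_k\} \\ c_k^{\top}\{q(x_{kj}) - \nu_k\} \end{pmatrix} - \sum_{k=0}^{m}\begin{pmatrix} 0 \\ \frac{1}{2}c_k^{\top}\sigma_kc_k \end{pmatrix},
\]
which is seen to be jointly asymptotically normal under the null distributions $\{F_k\}$. The corresponding mean vector and variance matrix are given by
\[
\Big(0^{\top}, \; -\frac{1}{2}\sum_k c_k^{\top}\sigma_kc_k\Big)^{\top} \quad\text{and}\quad \begin{pmatrix} V & \tau \\ \tau^{\top} & \sum_k c_k^{\top}\sigma_kc_k \end{pmatrix},
\]
where $\tau$ is the one given in the Lemma, and $V = U - UWU$ is the asymptotic covariance matrix of $v$ as given in Theorem 2.2.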
Because the second entry of the mean vector equals negative half of the lower–right entry of the covariance matrix, the condition of Le Cam's third lemma is satisfied. By that lemma, we conclude that $v$ has a normal limiting distribution with mean $\tau$ and covariance matrix $V$ under the local alternative distributions $\{G_k\}$.

Proof of Theorem 3.2. We now show that the DELR statistic $R_n$ has a non–central chi–square limiting distribution under the distributions $\{G_k\}$ that satisfy the local alternative model (3.2). We first show that, under the $\{G_k\}$, $R_n$ is still approximated by the quadratic form on the RHS of (3.10).

Under the $\{G_k\}$, we still have $-n^{-1}\partial^2l_n(\theta^*)/\partial\theta\partial\theta^{\top} \to U$ and, by Lemma 3.5, $v = O_p(1)$. In addition, $\hat{\theta}$ still admits the expansion
\[
\sqrt{n}(\hat{\theta} - \theta^*) = U^{-1}v + o_p(1) = O_p(1),
\]
and hence it is root–n consistent for $\theta^*$. Similarly, the constrained MELE $\tilde{\theta}$ is also root–n consistent for $\theta^*$ under the $\{G_k\}$. The root–n consistency of $\hat{\theta}$ and $\tilde{\theta}$ implies
\[
R_n = \xi^{\top}\{\Lambda^{-1} - J(J^{\top}\Lambda J)^{-1}J^{\top}\}\xi + o_p(1)
\]
when $q < md$, and $R_n = \xi^{\top}\Lambda^{-1}\xi + o_p(1)$ when $q = md$. The matrix in the quadratic form of the expansion of $R_n$ is the same as that in (3.10). What has changed is the distribution of $\xi = (-U_{\beta\alpha}U_{\alpha\alpha}^{-1}, I_{md})v$.

By Lemma 3.5, under the local alternative $\{G_k\}$, $v$ is asymptotically $N(\tau, V)$. Hence $\xi$ also has a normal limiting distribution. Since the asymptotic covariance matrix of $v$ is the same as that under the $\{F_k\}$, the asymptotic covariance matrix of $\xi$ is still $\Lambda$ as we have shown in the proof of Theorem 3.1.
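The mean of the limiting distribution of $\xi$ now is $\mu = (-U_{\beta\alpha}U_{\alpha\alpha}^{-1}, I_{md})\tau$. Partition $\tau$ into $(\tau_{\alpha}^{\top}, \tau_{\beta}^{\top})^{\top}$ as
\[
\tau_{\alpha} = \sum_{k=1}^{m}\sqrt{\rho_k}\,\mathrm{Cov}_k\{\partial L_k(\theta^*, x)/\partial\alpha, \; q^{\top}(x)\}\,c_k, \qquad
\tau_{\beta} = \sum_{k=1}^{m}\sqrt{\rho_k}\,\mathrm{Cov}_k\{\partial L_k(\theta^*, x)/\partial\beta, \; q^{\top}(x)\}\,c_k.
\]
By relationships (2.18) and (2.19), we have
\[
\begin{aligned}
\mathrm{Cov}_k\Big(\frac{\partial L_k(\theta^*, x)}{\partial\alpha}, \; q^{\top}(x)\Big) &= \frac{1}{\rho_k}\{U_{\alpha\beta}(e_k \otimes I_d) - U_{\alpha\alpha}e_kE_k(q^{\top}(x))\}, \\
\mathrm{Cov}_k\Big(\Big\{e_k - \frac{h(\theta^*, x)}{s(\theta^*, x)}\Big\} \otimes q(x), \; q^{\top}(x)\Big) &= \frac{1}{\rho_k}\{U_{\beta\beta}(e_k \otimes I_d) - U_{\beta\alpha}e_kE_k(q^{\top}(x))\}.
\end{aligned}
\]
Plugging the above expressions into the expressions of $\tau_{\alpha}$ and $\tau_{\beta}$, we get
\[
\begin{aligned}
\tau_{\alpha} &= \sum_{k=1}^{m}\rho_k^{-1/2}\{U_{\alpha\beta}(e_k \otimes I_d) - U_{\alpha\alpha}e_kE_k(q^{\top}(x))\}\,c_k, \\
\tau_{\beta} &= \sum_{k=1}^{m}\rho_k^{-1/2}\{U_{\beta\beta}(e_k \otimes I_d) - U_{\beta\alpha}e_kE_k(q^{\top}(x))\}\,c_k.
\end{aligned} \tag{3.11}
\]
Consequently,
\[
\mu = -U_{\beta\alpha}U_{\alpha\alpha}^{-1}\tau_{\alpha} + \tau_{\beta} = \Lambda\sum_{k=1}^{m}\rho_k^{-1/2}(e_k \otimes I_d)\,c_k = \Lambda\eta,
\]
where the second last equality is by (3.11) and $\eta$ is defined in Theorem 3.2.

In the proof of Theorem 3.1, we have verified that the matrix
\[
A = \Lambda^{1/2}\{\Lambda^{-1} - J(J^{\top}\Lambda J)^{-1}J^{\top}\}\Lambda^{1/2}
\]
is idempotent with rank $q$. Hence, by Corollary 5.1.3a of Mathai (1992), the quadratic form in the above expansion of $R_n$, and hence $R_n$, has a non–central chi–square limiting distribution with $q$ degrees of freedom and non–central parameter
\[
\delta^2 = \mu^{\top}\{\Lambda^{-1} - J(J^{\top}\Lambda J)^{-1}J^{\top}\}\mu = \eta^{\top}\{\Lambda - \Lambda J(J^{\top}\Lambda J)^{-1}J^{\top}\Lambda\}\eta
\]
in the case of $q < md$, and
\[
\delta^2 = \mu^{\top}\Lambda^{-1}\mu = \eta^{\top}\Lambda\eta
\]
in the case of $q = md$.

In the last step we verify the condition for positiveness of the non–central parameter $\delta^2$. When $q = md$, $\delta^2 = \eta^{\top}\Lambda\eta > 0$ because $\Lambda$ is positive definite. When $q < md$, $\delta^2 = (\eta^{\top}\Lambda^{1/2})A(\Lambda^{1/2}\eta)$. We verified that $A$ is an idempotent matrix. Hence, $A$ is positive semidefinite and $\delta^2 \geq 0$. Moreover, $\delta^2 = 0$ if and only if $\Lambda^{1/2}\eta$ is in the null space of $A$. The null space of $A$ is the column space of $I - A = \Lambda^{1/2}J(J^{\top}\Lambda J)^{-1}J^{\top}\Lambda^{1/2}$, which is just the column space of $\Lambda^{1/2}J$. It is easily verified that $\Lambda^{1/2}\eta$ is in the column space of $\Lambda^{1/2}J$ if and only if $\eta$ is in the column space of $J$. Hence $\Lambda^{1/2}\eta$ is in the null space of $A$ and $\delta^2 = 0$ if and only if $\eta$ is in the column space of $J$.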
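To make the role of the non–central parameter concrete, the following is a minimal sketch (not part of the thesis's own computations) of how a value of $\delta^2$ translates into limiting power at a given size when $q = 2$. It assumes the standard Poisson–mixture representation of the non–central chi–square distribution and the closed form of the chi–square survival function for even degrees of freedom. The function names `chi2_sf_even` and `local_power_q2` are hypothetical, and the two $\delta^2$ values are the ones reported in Example 4.3 of Chapter 4.

```python
import math


def chi2_sf_even(x, df):
    """P(X > x) for a central chi-square variable with even df (closed form)."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        total += term            # adds (x/2)^i / i!
        term *= half / (i + 1)
    return math.exp(-half) * total


def local_power_q2(delta2, alpha=0.05, terms=200):
    """Limiting power P(chi2_2(delta2) > c), with c the (1 - alpha) quantile of chi2_2.

    Assumption: a non-central chi2_q(delta2) is a Poisson(delta2 / 2) mixture
    of central chi2_{q + 2j} distributions, j = 0, 1, 2, ...
    """
    crit = -2.0 * math.log(alpha)    # chi2_2 survival function is exp(-x / 2)
    lam = delta2 / 2.0
    weight = math.exp(-lam)          # Poisson probability of j = 0
    power = 0.0
    for j in range(terms):
        power += weight * chi2_sf_even(crit, 2 + 2 * j)
        weight *= lam / (j + 1)      # Poisson probability of j + 1
    return power


for d2 in (0.0, 5.90, 6.67):
    print(d2, round(local_power_q2(d2), 3))
```

With $\delta^2 = 0$ the sketch returns the nominal size 0.05. For $\delta^2 = 5.90$ and $6.67$ it returns roughly 0.58 and 0.63, in line with the powers quoted for Example 4.3.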
Appendix: Parameter values in simulation studies3.7 Appendix: Parameter values in simulationstudiesTable 3.10: Parameter values for power comparison under non–normal dis-tributions (Section 3.3.2).Γ(λ, κ): gamma distribution with shape λ and rate κ;LN(µ, σ): log–normal distribution with mean µ and standard deviation σ on log scale;Pa(γ): Pareto distribution with shape γ and common support of x > 1;W (b): Weibull distribution with scale b and common shape of 0.8.F0 remains unchanged across parameter settings 0–5.Parameter settingsF0 1 2 3 4 5λ κ λ κ λ κ λ κ λ κΓ(0.2, 0.8)F1: 0.18 0.7 0.17 0.6 0.16 0.5 0.155 0.45 0.14 0.4F2: 0.22 0.85 0.24 0.95 0.255 1.05 0.18 0.7 0.17 0.6F3: 0.23 0.95 0.255 1.2 0.275 1.25 0.29 1.4 0.33 1.6F4: 0.24 1.05 0.27 1.3 0.29 1.4 0.31 1.55 0.35 1.85µ σ µ σ µ σ µ σ µ σLN(0, 1.5)F1: 0.44 1.3 0.7 1.2 0.9 1.15 1 1 1.2 0.85F2: 0.22 1.32 0.57 1.30 0.62 1.25 0.67 1.20 0.87 1F3: 0.18 1.35 0.63 1.33 0.73 1.30 0.83 1.28 0.85 1.28F4: 0.37 1.38 0.60 1.35 0.70 1.33 0.75 1.32 0.95 1.30γ γ γ γ αPa(2)F1: 1.9 1.85 1.8 1.75 1.7F2: 2.1 2.2 2.3 1.85 1.75F3: 2.35 2.55 2.70 2.85 3.25F4: 2.5 2.78 2.98 3.2 3.75b b b b bW (1)F1: 0.76 0.65 0.59 0.53 0.42F2: 1.2 1.26 1.31 1.35 1.42F3: 1.08 1.05 1.10 1.12 1.14F3: 0.90 0.89 0.85 0.82 0.78913.7. 
Table 3.11: Parameter values for power comparison under misspecified DRMs (Section 3.4.2).
W(a, b): Weibull distribution with shape a and scale b;
t(ν, c): non–central t distribution with ν degrees of freedom and non–central parameter c;
F0 remains unchanged across parameter settings 0–5.

Weibull, F0 = W(1, 1); entries are (a, b)
Setting    1               2               3               4               5
F1         (0.9, 0.95)     (0.85, 0.94)    (0.82, 0.92)    (0.79, 0.91)    (0.75, 0.88)
F2         (0.98, 0.98)    (0.96, 0.96)    (0.95, 0.95)    (0.94, 0.94)    (0.91, 0.92)
F3         (1.03, 1.04)    (1.05, 1.06)    (1.07, 1.07)    (1.09, 1.08)    (1.12, 1.12)
F4         (1.01, 0.95)    (1.02, 0.92)    (1.03, 0.90)    (1.05, 0.89)    (1.07, 0.85)

Non–central t, F0 = t(4, 0); entries are (ν, c)
Setting    1               2               3               4               5
F1         (3.85, -0.05)   (3.75, -0.1)    (3.65, -0.1)    (3.45, -0.15)   (3.4, -0.2)
F2         (5, 0.1)        (6, 0.05)       (6.5, 0.1)      (3.8, 0.1)      (3.7, 0.15)
F3         (4.5, 0.05)     (4.8, 0.1)      (5.5, 0.15)     (6, 0.15)       (8, 0.2)
F4         (6, 0.1)        (6.5, 0.15)     (8.5, 0.2)      (8, 0.23)       (10, 0.28)

Normal mixtures, F0 = 0.6N(1, 1.3) + 0.4N(0.8, 1)
F1, settings 1–5:
  1: 0.8N(1, 1.3) + 0.2N(0.8, 1)
  2: 0.8N(1.05, 1.3) + 0.2N(0.8, 1)
  3: 0.8N(1.1, 1.2) + 0.2N(0.8, 0.95)
  4: 0.8N(1.15, 1.18) + 0.2N(0.85, 0.9)
  5: 0.8N(1.2, 1.15) + 0.2N(0.9, 0.9)
F2, settings 1–5:
  1: 0.6N(1, 1.3) + 0.4N(0.8, 1)
  2: 0.5N(1, 1.3) + 0.5N(0.75, 0.95)
  3: 0.5N(1, 1.25) + 0.5N(0.7, 1)
  4: 0.5N(1.1, 1.25) + 0.5N(0.7, 0.95)
  5: 0.5N(1.1, 1.3) + 0.5N(0.65, 0.92)
F3, settings 1–5:
  1: 0.3N(1, 1.3) + 0.7N(0.8, 1)
  2: 0.3N(1, 1.25) + 0.7N(0.8, 0.95)
  3: 0.3N(0.9, 1.15) + 0.7N(0.8, 0.87)
  4: 0.3N(0.8, 1.07) + 0.7N(0.8, 0.83)
  5: 0.3N(0.75, 1.02) + 0.7N(0.75, 0.78)
F4, settings 1–5:
  1: 0.6N(1, 1.3) + 0.4N(0.8, 1)
  2: 0.2N(1, 1.3) + 0.8N(0.8, 1)
  3: 0.1N(1, 1.3) + 0.9N(0.95, 1)
  4: 0.1N(0.95, 1.3) + 0.9N(1, 1)
  5: 0.1N(0.9, 1.25) + 0.9N(1.1, 0.95)

Gamma–Weibull mixtures, F0 = 0.3Γ(0.8, 1) + 0.7W(1, 1.3)
F1, settings 1–5:
  1: 0.45Γ(0.85, 1) + 0.55W(1.15, 1.22)
  2: 0.45Γ(0.85, 1) + 0.55W(1.3, 1.15)
  3: 0.4Γ(0.9, 1) + 0.6W(1.4, 1.15)
  4: 0.4Γ(1, 0.9) + 0.6W(1.35, 0.9)
  5: 0.4Γ(1.4, 0.8) + 0.6W(1.5, 0.8)
F2, settings 1–5:
  1: 0.3Γ(1.05, 0.9) + 0.7W(1, 1.25)
  2: 0.3Γ(1.15, 0.9) + 0.7W(1, 1.23)
  3: 0.3Γ(1.25, 0.9) + 0.7W(1, 1.2)
  4: 0.3Γ(1.4, 0.8) + 0.7W(1, 1.2)
  5: 0.3Γ(2, 0.7) + 0.7W(1, 1)
F3, settings 1–5:
  1: 0.25Γ(0.8, 1.4) + 0.75W(1, 1.3)
  2: 0.25Γ(0.85, 1.65) + 0.75W(1.05, 1.28)
  3: 0.25Γ(0.85, 1.85) + 0.75W(1.1, 1.25)
  4: 0.25Γ(0.95, 2.05) + 0.75W(1.1, 1.25)
  5: 0.25Γ(1.2, 2.5) + 0.75W(1.2, 1.2)
F4, settings 1–5:
  1: 0.3Γ(0.8, 0.93) + 0.7W(1.05, 1.3)
  2: 0.3Γ(0.8, 0.88) + 0.7W(1, 1.3)
  3: 0.3Γ(0.8, 0.85) + 0.7W(1, 1.3)
  4: 0.3Γ(0.8, 0.7) + 0.7W(1.3, 1)
  5: 0.3Γ(1.6, 0.5) + 0.7W(1.4, 0.9)

Chapter 4

Effects of Information Pooling by DRM

Our use of DRM is motivated by its ability to pool information across a number of samples. We believe the resulting inferences are more efficient than inferences based on individual samples. Moreover, strong evidence about this improved efficiency already exists: Fokianos (2004) obtained more efficient density estimators under DRM than the classical kernel density estimators based on individual samples; Chen and Liu (2013) found DRM–based quantile estimators to be more efficient than the empirical quantile estimators. We also anticipate that there will be a gain in the estimation accuracy of the MELE of the DRM parameter $\theta$ and in the power of the DELR test if we combine information from additional samples using the DRM, a topic that has not been studied in the literature. This chapter provides both theoretical and simulation support for this conjecture.

4.1 Effects on the estimation accuracy of the MELE

Suppose we have $m + 1$ independent samples whose distributions satisfy the DRM (2.1). Yet our interest may well focus on inference about a subset of these distributions, without loss of generality, the first $r + 1$ distributions $F_0, F_1, \ldots, F_r$ with $1 \leq r < m$. Shall we base the inference on a DRM fitted to the first $r + 1$ samples or on one fitted to all the $m + 1$ samples? We here prove that the latter DRM yields a higher estimation accuracy for the MELE of the DRM parameter that corresponds to the distributions of interest.

Let $\nu = (\alpha_1, \ldots, \alpha_r, \beta_1^{\top}, \ldots, \beta_r^{\top})^{\top}$ denote the DRM parameter for the first $r$ non–baseline distributions. Denote the MELEs of $\nu$ based on the DRM for the first $r + 1$ distributions and that for all distributions as $\hat{\nu}^{(1)}$ and $\hat{\nu}^{(2)}$ respectively. Let $\Sigma^{(1)}$ and $\Sigma^{(2)}$ be the corresponding asymptotic covariance matrices.
Recall that the size of the $k$th sample is $n_k$, $k = 0, 1, \ldots, m$. Put $n^{(1)} = \sum_{k=0}^{r}n_k$ and $\rho = \lim_{n\to\infty}n^{(1)}/n$, where $n$ is the total size of all the $m + 1$ samples. By the asymptotic normality of the MELE (Theorem 2.5), we have
\[
\sqrt{n^{(1)}}\,(\hat{\nu}^{(1)} - \nu^*) \xrightarrow{d} N(0, \Sigma^{(1)}), \qquad \sqrt{n}\,(\hat{\nu}^{(2)} - \nu^*) \xrightarrow{d} N(0, \Sigma^{(2)}),
\]
where $\nu^*$ is the true parameter value. Hence the covariance matrix of $\hat{\nu}^{(1)}$ is approximated by $\Sigma^{(1)}/n^{(1)}$ and that of $\hat{\nu}^{(2)}$ is approximated by $\Sigma^{(2)}/n$. The estimation accuracy of an estimator is measured by the inverse of its covariance matrix. The scaling factors in the above results are different, so, to ensure fairness, we should compare $\Sigma^{(1)}$ to $\lim_{n\to\infty}(n^{(1)}/n)\Sigma^{(2)} = \rho\Sigma^{(2)}$. The following theorem tells us that the estimation accuracy of $\hat{\nu}^{(2)}$ is never lower than that of $\hat{\nu}^{(1)}$.

Recall that, for a matrix $A$, we use $A > 0$ ($A \geq 0$) to denote that $A$ is positive definite (positive semidefinite). For matrices $A$ and $B$, we will use $A > B$ ($A \geq B$) to represent $A - B > 0$ ($A - B \geq 0$).

Theorem 4.1. Under the conditions of Theorem 2.1, $\Sigma^{(1)} \geq \rho\Sigma^{(2)}$.

Example 4.1. Consider the situation where $m + 1 = 4$, samples are from a DRM with basis function $q(x) = (x, \log x)^{\top}$, and the sample proportions are $(3/11, 3/22, 2/11, 9/22)$. Let $F_k$, $k = 1, 2, 3$, be the distributions with parameters $\beta_1^* = (-2, 2)^{\top}$, $\beta_2^* = (-0.25, 0.2)^{\top}$ and $\beta_3^* = (-0.5, 0.1)^{\top}$, respectively.

Let $r + 1 = 2$ and $\nu = (\alpha_1, \beta_1^{\top})^{\top}$, then $\rho = 9/22$. When $F_0$ is $\Gamma(2, 1)$, we numerically compute the information matrices (2.13) for the parameters of the DRM based on the first $r + 1 = 2$ samples and that based on all the $m + 1 = 4$ samples. Based on the information matrices, we obtain the asymptotic covariance matrices, $\Sigma^{(1)}$ and $\Sigma^{(2)}$, of the MELEs for $\nu$ under these two DRMs by (2.14). The smallest eigenvalue of $\Sigma^{(1)} - \rho\Sigma^{(2)}$ is then found to be approximately 0.237, so $\Sigma^{(1)} > \rho\Sigma^{(2)}$.

Example 4.2. Consider the situation where $m + 1 = 5$, samples are from a DRM with basis function $q(x) = x^{1.5}$, and the sample proportions are (0.2, 0.1, 0.1, 0.2, 0.2).
Let $F_k$, $k = 1, \ldots, 4$, be the distributions with parameters $\beta_1^* = -9.78$, $\beta_2^* = 0.72$, $\beta_3^* = 1.11$, and $\beta_4^* = -1.43$, respectively.

Let $r + 1 = 3$ and $\nu = (\alpha_1, \alpha_2, \beta_1^{\top}, \beta_2^{\top})^{\top}$, then $\rho = 0.4$. When $F_0$ is the Weibull distribution with shape of 1.5 and scale of 0.8, we numerically compute the information matrices (2.13) for the parameters of the DRM based on the first $r + 1 = 3$ samples and that based on all the $m + 1 = 5$ samples. Based on the information matrices, we obtain the asymptotic covariance matrices, $\Sigma^{(1)}$ and $\Sigma^{(2)}$, of the MELEs for $\nu$ under these two DRMs by (2.14). The smallest eigenvalue of $\Sigma^{(1)} - \rho\Sigma^{(2)}$ is then found to be approximately 0.013, so $\Sigma^{(1)} > \rho\Sigma^{(2)}$.

4.2 Effects on the power of the DELR test

In this section, we show that the local asymptotic power of the DELR test is often increased when strength is borrowed from additional samples even when their underlying distributions are unrelated to the hypothesis of interest. We adopt the setting posited in the last section for multiple samples from distributions satisfying the DRM assumption. A hypothesis of interest may also focus on a characteristic of just a subset of these populations. If so, why should our tests be based on all the samples? One answer is found in their improved local power as we now demonstrate.

Without loss of generality, consider a null hypothesis regarding subpopulations $F_0, F_1, \ldots, F_r$ with $r < m$ and let $\zeta^{\top} = (\beta_1^{\top}, \ldots, \beta_r^{\top})$. The composite hypotheses are specified as
\[
H_0: g(\zeta) = 0 \quad\text{against}\quad H_1: g(\zeta) \neq 0 \tag{4.1}
\]
for some smooth function $g: \mathbb{R}^{rd} \to \mathbb{R}^{q}$ with $q \leq rd$. A DELR test can be based either on samples from just $F_0, F_1, \ldots, F_r$, or on the samples from all the populations $F_0, F_1, \ldots, F_m$. We denote the corresponding test statistics as $R_n^{(1)}$ and $R_n^{(2)}$, respectively.

Theorem 3.1 implies that $R_n^{(1)}$ and $R_n^{(2)}$ have the same $\chi^2_q$ distribution in the limit under the null model of (4.1).
Theorem 3.2, on the other hand, provides a useful tool for comparing their local asymptotic powers. It implies that $R_n^{(1)}$ and $R_n^{(2)}$ have non–central chi–square limiting distributions with the same $q$ degrees of freedom, but with possibly different non–central parameter values at a local alternative. By the result (3.6) in Section 3.2.3, a power comparison can therefore be made using these non–central parameter values. The following two theorems, whose proofs are given in Section 4.4, implement this idea and provide that power comparison both in a general and a special situation.

Theorem 4.2. Adopt the conditions of Theorem 2.1, the hypotheses (4.1) and the local alternatives
\[
\beta_k = \begin{cases} \beta_k^* + n_k^{-1/2}c_k, & \text{if } k = 1, \ldots, r, \\ \beta_k^*, & \text{otherwise,} \end{cases} \tag{4.2}
\]
for some given set of constants $\{c_k\}$. Let $\delta_1^2$ and $\delta_2^2$ be the non–central parameter values of the limiting distributions of $R_n^{(1)}$ and $R_n^{(2)}$ under the local alternative model. Then $\delta_2^2 \geq \delta_1^2$.

Example 4.3. Consider the situation where $m + 1 = 3$, samples are from a DRM with basis function $q(x) = (x, x^2)^{\top}$, and the sample proportions are (0.5, 0.25, 0.25). Let $F_k$, $k = 1, 2$, be the distributions with parameters $\beta_1^* = (6, -1.5)^{\top}$ and $\beta_2^* = (-0.25, 0.375)^{\top}$. Suppose $H_0$ is given as $g(\zeta) = \beta_1 - (6, -1.5)^{\top} = 0$, and the local alternative is
\[
\beta_1 = \beta_1^* + n_1^{-1/2}c_1; \qquad \beta_2 = \beta_2^*
\]
with $c_1 = (2, 2)^{\top}$.

Let $R_n^{(1)}$ and $R_n^{(2)}$ be the DELR test statistics based on $F_0, F_1$, and on $F_0, F_1, F_2$, respectively. When $F_0$ is $N(0, 1)$, the standard normal distribution, we obtain information matrices (2.13), and hence $\Lambda = U_{\beta\beta} - U_{\beta\alpha}U_{\alpha\alpha}^{-1}U_{\alpha\beta}$, for $R_n^{(1)}$ and $R_n^{(2)}$ based on numerical computation. For $R_n^{(1)}$, we have $\eta = (4, 4)^{\top}$ and $q = d = 2$. Then by Theorem 3.2, we find $\delta_1^2 = \eta^{\top}\Lambda\eta \approx 5.90$. For $R_n^{(2)}$, we find $\eta = (4, 4, 0, 0)^{\top}$, $\nabla = (I_2, 0_{2\times 2})$, and $J = (0_{2\times 2}, I_2)^{\top}$. By Theorem 3.2, we get $\delta_2^2 \approx 6.67$.
Now, since $\delta_1^2 \approx 5.90 < \delta_2^2 \approx 6.67$ and the degrees of freedom of the limiting distributions of both $R_n^{(1)}$ and $R_n^{(2)}$ are 2, $R_n^{(2)}$ is more powerful than $R_n^{(1)}$ even though the null hypothesis concerns the parameter of just population 1. Note that at the 5% level, the powers of $R_n^{(1)}$ and $R_n^{(2)}$ are approximately 0.577 and 0.633, respectively.

The asymptotic power of $R_n^{(2)}$ (based on all samples) is not always higher than that of $R_n^{(1)}$. We demonstrate this fact in the following special case. Partition $\{1, \ldots, r\}$ into $K$ parts denoted by $S_k$, $k = 1, \ldots, K$, such that the size, $s_k$, of $S_k$ satisfies $s_1 \geq 0$ and $s_k \geq 2$ for $k = 2, \ldots, K$. Let $\zeta_k$ be the vector consisting of all the $\beta_j$ with $j \in S_k$. Consider the null hypothesis $H_0$ composed of
\[
g_k(\zeta) = A_k\zeta_k = 0 \quad\text{for } k = 1, \ldots, K, \tag{4.3}
\]
with $A_1 = I_{s_1d}$ and $A_k = (1_{(s_k-1)} \otimes I_d, \; -I_{(s_k-1)d})$ for $k = 2, \ldots, K$. In other words, $H_0$ posits that the distributions within the first group are all identical to $F_0$ and those within any other given group are identical to each other. When $s_1$, the size of $S_1$, is 0, no non–baseline distribution is compared to the baseline $F_0$.

Theorem 4.3. Adopt the conditions postulated in Theorem 4.2. For testing the null hypothesis (4.3), the limiting distributions of $R_n^{(1)}$ and $R_n^{(2)}$ under local alternative (4.2) have equal non–central parameters: $\delta_1^2 = \delta_2^2$.

Example 4.4. Consider the situation where $m + 1 = 6$, samples are from a DRM with basis function $q(x) = (\log x, x)^{\top}$, and the sample proportions are (0.25, 0.22, 0.16, 0.10, 0.17, 0.10). Let $F_k$, $k = 1, \ldots, 5$, be the distributions with parameters $\beta_1^* = (0, 0)^{\top}$, $\beta_2^* = \beta_3^* = (-1, -0.5)^{\top}$, $\beta_4^* = (0.3, 1.2)^{\top}$, and $\beta_5^* = (-1.2, -0.4)^{\top}$. We partition $\{1, 2, 3\}$ into $S_1 = \{1\}$ and $S_2 = \{2, 3\}$. The null hypothesis (4.3) postulates $F_0 = F_1$ and $F_2 = F_3$. Let the local alternative be
\[
\beta_k = \beta_k^* + n_k^{-1/2}c_k, \quad\text{for } k = 1, 2, 3,
\]
with $c_1 = c_2 = (1, 1)^{\top}$ and $c_3 = (-1, 2)^{\top}$.

Let $R_n^{(1)}$ and $R_n^{(2)}$ be the DELR test statistics based on $F_0, \ldots, F_3$, and on $F_0, \ldots, F_5$, respectively. When $F_0$ is $\Gamma(2, 1)$, we obtain information matrices (2.13), and hence $\Lambda$, for $R_n^{(1)}$ and $R_n^{(2)}$ based on numerical computation. For $R_n^{(1)}$, we find $\eta \approx (2.13, 2.13, 2.5, 2.5, -3.16, 6.32)^{\top}$, $\nabla = \mathrm{diag}\{I_2, (I_2, -I_2)\}$, and $J = (0_{2\times 2}, I_2, I_2)^{\top}$. For $R_n^{(2)}$, we get $\eta \approx (2.13, 2.13, 2.5, 2.5, -3.16, 6.32, 0, 0, 0, 0)^{\top}$, $\nabla = (\mathrm{diag}\{I_2, (I_2, -I_2)\}, 0_{4\times 4})$, and $J = (0_{6\times 2}, (I_2, 0_{2\times 4})^{\top}, I_6)^{\top}$. By Theorem 3.2, we confirm that $\delta_1^2 = \delta_2^2 \approx 2.72$. Hence $R_n^{(1)}$ and $R_n^{(2)}$ are asymptotically equally powerful.

The scenario presented in Theorem 4.3 is similar to the one–way ANOVA used in experimental design (Wu and Hamada, 2009). Suppose there are five treatments under investigation and we want to test the equal mean hypothesis of the first two treatments. One may use the pooled variance estimator from all samples to construct the two–sample t–test. This test gains in the degrees of freedom compared to the t–test based on the first two samples alone, but not in the first order asymptotics.

4.3 Simulation studies

4.3.1 Comparison of estimation accuracy: $\hat{\nu}^{(1)}$ versus $\hat{\nu}^{(2)}$

We now conduct simulation studies to compare the estimation accuracy of the MELE based on a subset of the samples to that based on all the samples. The number of repetitions for simulation is set to 100,000 for a high simulation accuracy.

Recall that we have used $\nu$ to denote the DRM parameter of interest, and $\hat{\nu}^{(1)}$ and $\hat{\nu}^{(2)}$ to denote the MELEs of $\nu$ based on the samples from the populations of interest and the samples from all the populations, respectively. Consider the data settings of Examples 4.1 and 4.2. Recall that, for Example 4.1, $m + 1 = 4$ and $\nu = (\alpha_1, \beta_1^{\top})^{\top}$, and for Example 4.2, $m + 1 = 5$ and $\nu = (\alpha_1, \alpha_2, \beta_1, \beta_2)^{\top}$. The bias, standard deviation (sd), and root mean squared error (rmse) of $\hat{\nu}^{(1)}$ and $\hat{\nu}^{(2)}$ under the two settings are shown in Tables 4.1 and 4.2, respectively.
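The three summaries are linked by the identity $\mathrm{rmse}^2 = \mathrm{bias}^2 + \mathrm{sd}^2$, which gives a quick consistency check on the reported numbers. The following is a minimal sketch using the rounded entries of Table 4.1 as printed. The helper names `rmse_from` and `max_discrepancy` are hypothetical, and agreement is only expected up to rounding of the table entries.

```python
import math

# (bias, sd, reported rmse) for each column of Table 4.1, as printed (rounded).
TABLE_4_1 = {
    "alpha1, DRM on 2 samples": (0.44, 1.31, 1.39),
    "alpha1, DRM on 4 samples": (0.392, 1.2, 1.26),
    "beta1[1], DRM on 2 samples": (-0.403, 1.15, 1.22),
    "beta1[1], DRM on 4 samples": (-0.362, 1.05, 1.11),
    "beta1[2], DRM on 2 samples": (0.477, 1.46, 1.54),
    "beta1[2], DRM on 4 samples": (0.422, 1.34, 1.41),
}


def rmse_from(bias, sd):
    """Root mean squared error implied by a bias and a standard deviation."""
    return math.hypot(bias, sd)


def max_discrepancy(table):
    """Largest gap between the implied and the reported rmse over all columns."""
    return max(abs(rmse_from(b, s) - r) for b, s, r in table.values())


print(round(max_discrepancy(TABLE_4_1), 4))
```

The largest gap is on the order of the rounding error of the table entries, so the three reported rows are mutually consistent.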
We see that under both settings, $\hat{\nu}^{(2)}$ always has smaller absolute bias, sd and rmse, and thereby a higher estimation accuracy. This observation agrees with our theoretical conclusion given in Theorem 4.1.

Table 4.1: Comparison of the estimation accuracies of $\hat{\nu}^{(1)}$ and $\hat{\nu}^{(2)}$ under the setting of Example 4.1. $\beta_1[1]$, $\beta_1[2]$: the two components of $\beta_1$. Superscripts (1) and (2) indicate the DRM fitted to the samples of interest and to all samples, respectively.

        α1 (1)    α1 (2)    β1[1] (1)   β1[1] (2)   β1[2] (1)   β1[2] (2)
bias    0.44      0.392     -0.403      -0.362      0.477       0.422
sd      1.31      1.2       1.15        1.05        1.46        1.34
rmse    1.39      1.26      1.22        1.11        1.54        1.41

Table 4.2: Comparison of the estimation accuracies of $\hat{\nu}^{(1)}$ and $\hat{\nu}^{(2)}$ under the setting of Example 4.2.

        α1 (1)    α1 (2)    β1 (1)    β1 (2)    α2 (1)     α2 (2)     β2 (1)    β2 (2)
bias    0.071     0.0376    -0.627    -0.424    -0.0339    -0.0121    0.0364    0.0184
sd      0.346     0.254     2.51      2.09      0.246      0.218      0.242     0.218
rmse    0.354     0.257     2.59      2.13      0.248      0.218      0.244     0.219

4.3.2 Comparison of testing power: $R_n^{(1)}$ versus $R_n^{(2)}$

Recall that $R_n^{(1)}$ and $R_n^{(2)}$ are DELR statistics based on partial data sets and full data sets, respectively. We now conduct simulations to compare the power of $R_n^{(1)}$ and $R_n^{(2)}$. The number of simulation repetitions is set to 10,000.

We first let $m + 1 = 4$ and consider a hypothesis test for $\beta_1$. The DELR test can be conducted based on the first two samples ($R_n^{(1)}$) or based on all four samples ($R_n^{(2)}$). The Wald test can also be conducted in two different ways. We denote them as Wald(1) and Wald(2) respectively.

We generated samples with sizes (60, 30, 40, 90) from gamma distributions under six parameter settings (Table 4.3).

Table 4.3: Gamma parameter values for power comparison of $R_n^{(1)}$ and $R_n^{(2)}$.

Common parameter settings: F0: Γ(2, 1), F2: Γ(2.2, 1.25), F3: Γ(2.1, 1.5)

Parameter settings for F1
Setting    0          1               2             3             4             5
Case 1:    Γ(4, 3)    Γ(5.3, 4.3)     Γ(6.3, 5.3)   Γ(7.1, 6.1)   Γ(8.3, 7.3)   Γ(10, 9)
Case 2:    Γ(2, 1)    Γ(2.45, 1.45)   Γ(2.9, 1.9)   Γ(3.3, 2.3)   Γ(3.8, 2.8)   Γ(5, 4)

We consider two null hypotheses: one is $\beta_1 = (-2, 2)^{\top}$ and the other is $\beta_1 = 0$.
The first hypothesis asks whether $F_1$ differs from $F_0$ in a specific way, while the second asks whether $F_0 = F_1$. By Theorems 4.2 and 4.3, $R_n^{(2)}$ is more powerful than $R_n^{(1)}$ for testing the first hypothesis, and the two tests are asymptotically equally powerful for the second.

The simulated power curves are shown in Figure 4.1. The Wald tests are generally not as powerful as the DELR tests. The results on $R_n^{(1)}$ and $R_n^{(2)}$ closely match the predictions of Theorems 4.2 and 4.3.

[Figure 4.1 placeholder. Two panels, titled $H_0: \beta_1 = (-2, 2)^\top$ and $H_0: \beta_1 = (0, 0)^\top$; horizontal axis: parameter settings 0 to 5; vertical axis: power; curves: $R_n^{(2)}$, $R_n^{(1)}$, Wald(2), Wald(1).]

Figure 4.1: Power curves of $R_n^{(1)}$, $R_n^{(2)}$, Wald(1) and Wald(2). The parameter setting 0 corresponds to the null model and the settings 1 to 5 correspond to alternative models.

We conduct a second set of simulations where the data settings are taken from Examples 4.3 and 4.4. For the setting of Example 4.3, we set the total sample size to $n = 240$, and for that of Example 4.4, we set $n = 600$. Under each data setting, we again calculate the powers of $R_n^{(1)}$, $R_n^{(2)}$, Wald(1) and Wald(2) under six different DRM parameter settings, as shown in Tables 4.4 and 4.5. The corresponding power curves are shown in Figure 4.2. The simulation results again agree with our expectation.

Table 4.4: Parameter values for power comparison of $R_n^{(1)}$ and $R_n^{(2)}$ under the setting of Example 4.3.

Common parameter settings
  $F_0$: $N(0, 1)$,  $F_2$: $N(-1, 2)$

Parameter settings for $F_1$
  0               1                 2                 3                4                 5
  $N(1.5, 0.5)$   $N(1.57, 0.45)$   $N(1.58, 0.41)$   $N(1.6, 0.39)$   $N(1.62, 0.36)$   $N(1.64, 0.31)$

Table 4.5: Parameter values for power comparison of $R_n^{(1)}$ and $R_n^{(2)}$ under the setting of Example 4.4.

Common parameter settings
  $F_0$: $\Gamma(2, 1)$,  $F_4$: $\Gamma(3.2, 0.7)$,  $F_5$: $\Gamma(1.6, 2.2)$

Parameter settings for $F_1$, $F_2$ and $F_3$
          0                 1                   2                   3                   4                   5
$F_1$:    $\Gamma(2, 1)$     $\Gamma(2.3, 1.1)$   $\Gamma(2.3, 1)$     $\Gamma(2.4, 1)$     $\Gamma(2.4, 1)$     $\Gamma(2.4, 0.9)$
$F_2$:    $\Gamma(1.5, 2)$   $\Gamma(1.6, 2.1)$   $\Gamma(1.5, 2)$     $\Gamma(1.5, 2)$     $\Gamma(1.4, 1.9)$   $\Gamma(1.4, 1.9)$
$F_3$:    $\Gamma(1.5, 2)$   $\Gamma(1.9, 2.2)$   $\Gamma(1.9, 2.4)$   $\Gamma(1.9, 2.3)$   $\Gamma(2, 2.2)$     $\Gamma(2.1, 2.2)$

4.4 Proofs

4.4.1 Theorem 4.1: Estimation accuracy comparison

We first introduce a useful notation for Schur complements, which will be encountered frequently in the subsequent proofs. Let the square matrix
\[
M = \begin{pmatrix} A & B \\ C & D \end{pmatrix}
\]
be nonsingular. We write $M/A = D - CA^{-1}B$ and call it the Schur complement of $M$ with respect to its upper-left block $A$. Also, we write $M/D = A - BD^{-1}C$ and call it the Schur complement of $M$ with respect to its lower-right block $D$.

[Figure 4.2 placeholder. Two panels, titled "Example 4.5: $H_0$: $\beta_1 = (6, -1.5)^\top$" and "Example 4.7: $H_0$: $\beta_1 = 0$ and $\beta_2 = \beta_3$"; horizontal axis: parameter settings 0 to 5; vertical axis: power; curves: $R_n^{(2)}$, $R_n^{(1)}$, Wald(2), Wald(1).]

Figure 4.2: Power curves of $R_n^{(1)}$, $R_n^{(2)}$, Wald(1) and Wald(2) under the data settings of Examples 4.3 and 4.4. The parameter setting 0 corresponds to the null model and the settings 1 to 5 correspond to alternative models.

Lemma 4.4. Adopt the above partition for a symmetric matrix $M$ of size $(s + t) \times (s + t)$, so that $C = B^\top$. When $A > 0$, $M \ge 0$ if and only if $M/A \ge 0$. When $D > 0$, $M \ge 0$ if and only if $M/D \ge 0$.

This is Theorem 1.4 of Zhan (2002). The outline of the proof is given below.

Proof. We prove the claimed result for the case that $A > 0$.
The proof for the case that $D > 0$ is similar and so is omitted.

Since the matrix
\[
N = \begin{pmatrix} I_s & -A^{-1}B \\ 0 & I_t \end{pmatrix}
\]
is of full rank, any nonzero vector $u$ of length $s + t$ can be written as $u = Nw$ for some nonzero vector $w$. Hence, we have
\[
u^\top M u = w^\top N^\top M N w
= w^\top \begin{pmatrix} I_s & 0 \\ -B^\top A^{-1} & I_t \end{pmatrix}
\begin{pmatrix} A & B \\ B^\top & D \end{pmatrix}
\begin{pmatrix} I_s & -A^{-1}B \\ 0 & I_t \end{pmatrix} w
= w^\top \begin{pmatrix} A & 0 \\ 0 & M/A \end{pmatrix} w.
\]
Therefore, $M \ge 0$ if and only if
\[
\begin{pmatrix} A & 0 \\ 0 & M/A \end{pmatrix} \ge 0.
\]
Since $A > 0$, the latter condition is equivalent to $M/A \ge 0$.

Recall that we defined $\theta_k^\top = (\alpha_k, \beta_k^\top)$. Denote the information matrix with respect to $(\theta_1^\top, \ldots, \theta_r^\top)^\top$ under the DRM based on the first $r + 1$ samples as $U_1$, and that with respect to $(\theta_1^\top, \ldots, \theta_m^\top)^\top$ under the DRM based on all $m + 1$ samples as $U_2$. Let $U_{2,c}$ be the lower-right $(m - r)(d + 1) \times (m - r)(d + 1)$ block of $U_2$. Recall that $\rho = \lim_{n \to \infty} \big(\sum_{k=0}^{r} n_k\big)/n$.

Lemma 4.5. Under the conditions of Theorem 2.1, $U_2/U_{2,c} \ge \rho U_1$.

Proof of Lemma 4.5. We prove the result for $m = r + 1$; namely, we compare the DRM based on the first $r + 1 = m$ samples with that based on all the $m + 1$ samples. The general result then follows by mathematical induction.

Let $U_{2,a}$ be the upper-left $r(d + 1) \times r(d + 1)$ block, and $U_{2,b}$ be the upper-right $r(d + 1) \times (m - r)(d + 1)$ block, of $U_2$. Note that $U_2/U_{2,c} = U_{2,a} - U_{2,b} U_{2,c}^{-1} U_{2,b}^\top$, so to show the claimed result $U_2/U_{2,c} \ge \rho U_1$, it suffices to show that
\[
(U_{2,a} - \rho U_1) - U_{2,b} U_{2,c}^{-1} U_{2,b}^\top
\]
is positive semidefinite. Notice that the above matrix is the Schur complement, with respect to the lower-right block $U_{2,c}$, of
\[
D = \begin{pmatrix} U_{2,a} - \rho U_1 & U_{2,b} \\ U_{2,b}^\top & U_{2,c} \end{pmatrix}
= U_2 - \mathrm{diag}(\rho U_1, 0). \tag{4.4}
\]
By Lemma 4.4, its positive semidefiniteness is implied by that of $D$.

We now show that $D$ is positive semidefinite. We first give useful algebraic expressions for $U_2$ and $\rho U_1$. Notice that $(\theta_1^\top, \ldots, \theta_m^\top)$ is just a permutation of $\theta^\top = (\alpha^\top, \beta^\top)$, the information matrix (2.13) of which helps us to obtain algebraic expressions for $U_1$ and $U_2$. Recall $Q^\top(x) = (1, q^\top(x))$.
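For the DRM based on all the $m + 1$ samples, we get
\[
U_2 = E_0\{H(\theta^*, x) \otimes \{Q(x)Q^\top(x)\}\}.
\]
For the DRM based on the first $r + 1 = m$ samples, we find
\[
\rho U_1 = E_0\{H_r(\theta^*, x) \otimes \{Q(x)Q^\top(x)\}\},
\]
where $H_r(\theta, x)$ is the $H$ matrix defined in (2.12) based on the first $r + 1$ samples:
\[
H_r(\theta, x) = \mathrm{diag}\{h_r(\theta, x)\} - h_r(\theta, x) h_r^\top(\theta, x)/s_r(\theta, x),
\]
with $h_r(\theta, x) = (\rho_1\varphi_1(\theta, x), \ldots, \rho_r\varphi_r(\theta, x))^\top$ and $s_r(\theta, x) = \rho_0 + \sum_{k=1}^{r} \rho_k\varphi_k(\theta, x)$. As a reminder, $\varphi_k(\theta, x) = \exp\{\alpha_k + \beta_k^\top q(x)\}$, $k = 0, 1, \ldots, m$. Put $s^*(x) = s(\theta^*, x)$ for simplicity and similarly define $s_r^*(x)$, $h_r^*(x)$ and $\varphi_k^*(x)$ for $k = 0, 1, \ldots, m$. Substituting the above expressions of $U_2$ and $\rho U_1$ into the expression (4.4) of $D$ and noticing that $s^* - s_r^* = \rho_m\varphi_m^*$ because $m = r + 1$, we get
\[
\begin{aligned}
D &= E_0\left\{
\begin{pmatrix}
\rho_m\varphi_m^*(x) h_r^*(x) h_r^{*\top}(x)/\{s^*(x) s_r^*(x)\} & -\rho_m\varphi_m^*(x) h_r^*(x)/s^*(x) \\
-\rho_m\varphi_m^*(x) h_r^{*\top}(x)/s^*(x) & \rho_m\varphi_m^*(x) - \{\rho_m\varphi_m^*(x)\}^2/s^*(x)
\end{pmatrix}
\otimes \{Q(x)Q^\top(x)\}\right\} \\
&= \rho_m E_0\left\{\varphi_m^*(x)
\begin{pmatrix}
h_r^*(x) h_r^{*\top}(x)/\{s^*(x) s_r^*(x)\} & -h_r^*(x)/s^*(x) \\
-h_r^{*\top}(x)/s^*(x) & s_r^*(x)/s^*(x)
\end{pmatrix}
\otimes \{Q(x)Q^\top(x)\}\right\} \\
&= \rho_m E_0\{\{w(x)w^\top(x)\} \otimes \{Q(x)Q^\top(x)\}\},
\end{aligned}
\]
with
\[
w(x) = \sqrt{\varphi_m^*(x)}\,\big(h_r^{*\top}(x), -s_r^*(x)\big)^\top \big/ \sqrt{s^*(x) s_r^*(x)}.
\]
Since $D$ is the expectation of the Kronecker product of two matrices of the form $vv^\top$, it is positive semidefinite. This completes the proof.

Proof of Theorem 4.1. We now show that the asymptotic covariance matrix of the MELE based on the samples from the first $r + 1$ populations is no smaller than that based on the samples from all the populations. We prove the result for the estimation of $(\theta_1^\top, \ldots, \theta_r^\top)^\top$. The claimed result for $\nu = (\alpha_1, \ldots, \alpha_r, \beta_1^\top, \ldots, \beta_r^\top)^\top$ is then true because $\nu$ is just a permutation of $(\theta_1^\top, \ldots, \theta_r^\top)^\top$.

The asymptotic covariance matrix, $\Sigma^{(1)}$, of the MELE for $(\theta_1^\top, \ldots, \theta_r^\top)^\top$ under the DRM based on the first $r + 1$ samples is, by Theorem 2.5,
\[
\Sigma^{(1)} = U_1^{-1} - \rho W_1,
\]
where $U_1$ is the information matrix under this DRM and
\[
W_1 = \big\{\rho_0^{-1} 1_r 1_r^\top + \mathrm{diag}(\rho_1^{-1}, \ldots, \rho_r^{-1})\big\} \otimes \mathrm{diag}\{\tilde e_1\},
\]
with $\tilde e_k$, in general, being the vector of length $d + 1$ with the $k$th entry being 1 and the others being 0.

Similarly, the asymptotic covariance matrix of the MELE for $(\theta_1^\top, \ldots, \theta_m^\top)^\top$ under the DRM based on all the $m + 1$ samples is found to be
\[
U_2^{-1} - W_2,
\]
where $U_2$ is the information matrix with respect to $(\theta_1^\top, \ldots, \theta_m^\top)^\top$ under the current DRM and
\[
W_2 = \big\{\rho_0^{-1} 1_m 1_m^\top + \mathrm{diag}(\rho_1^{-1}, \ldots, \rho_m^{-1})\big\} \otimes \mathrm{diag}\{\tilde e_1\}.
\]
Therefore, the asymptotic covariance matrix, $\Sigma^{(2)}$, of the MELE for the subvector $(\theta_1^\top, \ldots, \theta_r^\top)^\top$ under this DRM based on all the $m + 1$ samples is just the upper-left $r(d + 1) \times r(d + 1)$ submatrix of $U_2^{-1} - W_2$. By the block matrix inversion formula (Lemma 3.3), the upper-left $r(d + 1) \times r(d + 1)$ submatrix of $U_2^{-1}$ is $(U_2/U_{2,c})^{-1}$. Also notice that the corresponding upper-left submatrix of $W_2$ is just $W_1$. Hence,
\[
\Sigma^{(2)} = (U_2/U_{2,c})^{-1} - W_1.
\]
By the above expressions of $\Sigma^{(1)}$ and $\Sigma^{(2)}$, to show the claimed result $\Sigma^{(1)} \ge \rho\Sigma^{(2)}$, it suffices to show that
\[
U_1^{-1} \ge \rho (U_2/U_{2,c})^{-1},
\]
which is equivalent to
\[
\rho U_1 \le U_2/U_{2,c}
\]
because the matrices on both sides of the inequality are positive definite. The latter inequality is true by Lemma 4.5, so the claimed result of the theorem is true.

4.4.2 Theorem 4.2: Local power comparison in general

Lemma 4.6. Let $M$ and $N$ be $(s + t) \times (s + t)$ positive definite matrices with partitions
\[
M = \begin{pmatrix} M_a & M_b \\ M_c & M_d \end{pmatrix}
\quad\text{and}\quad
N = \begin{pmatrix} N_a & N_b \\ N_c & N_d \end{pmatrix},
\]
where $M_a$ and $N_a$ are $s \times s$, $M_b$ and $N_b$ are $s \times t$, $M_c$ and $N_c$ are $t \times s$, and $M_d$ and $N_d$ are $t \times t$. If $M \ge N$, then $M/M_a \ge N/N_a$ and $M/M_d \ge N/N_d$.

Let $U$ be the information matrix with respect to $\theta$ based on all $m + 1$ samples (corresponding to $R_n^{(2)}$), and $\tilde U$ be that with respect to $\nu = (\alpha_1, \ldots, \alpha_r, \beta_1^\top, \ldots, \beta_r^\top)^\top$ based on the first $r + 1$ samples (corresponding to $R_n^{(1)}$). Similar to the partition of $U$, we partition $\tilde U$ into $\tilde U_{\alpha\alpha}$, $\tilde U_{\alpha\beta}$, $\tilde U_{\beta\alpha}$ and $\tilde U_{\beta\beta}$, and similar to the definition $\Lambda = U/U_{\alpha\alpha}$ given in Theorem 3.2, we define $\tilde\Lambda = \tilde U/\tilde U_{\alpha\alpha}$. Moreover, we partition $\Lambda$ into
\[
\Lambda = \begin{pmatrix} \Lambda_a & \Lambda_b \\ \Lambda_b^\top & \Lambda_c \end{pmatrix}
\]
with $\Lambda_a$ being the upper-left $rd \times rd$ block of $\Lambda$.

Lemma 4.7. Under the conditions of Theorem 2.1, $\Lambda/\Lambda_c \ge \rho\tilde\Lambda$.

Lemma 4.8. Let $M$ be an $s \times s$ positive definite matrix and $N$ be an $s \times s$ positive semidefinite matrix. Also let $X$ and $Y$ be $s \times t$ matrices, and suppose the column space of $Y$ is contained in that of $N$.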
Then
\[
(X + Y)^\top (M + N)^{-1} (X + Y) \le X^\top M^{-1} X + Y^\top N^\dagger Y,
\]
where $N^\dagger$ is the Moore-Penrose pseudoinverse of $N$.

The proofs of the above lemmas, being lengthy, are given after the proof of Theorem 4.2.

Proof of Theorem 4.2. We now show that under the local alternative model (4.2), the non-centrality parameter of the limiting distribution of the DELR statistic based on the samples from the first $r + 1$ populations is in general no greater than that based on the samples from all the populations.

We have defined two DELR statistics, $R_n^{(1)}$ and $R_n^{(2)}$, which are constructed using the samples from only the first $r + 1$ populations $F_0, \ldots, F_r$, and the samples from all the populations, respectively. Recall that the null hypothesis of (4.1) contains a constraint $g(\zeta) = 0$ with $\zeta^\top = (\beta_1^\top, \ldots, \beta_r^\top)$ related only to populations $F_0, \ldots, F_r$. By Theorem 3.2, under the $\{G_k\}$ defined by the local alternative model (4.2), $R_n^{(1)}$ and $R_n^{(2)}$ both have non-central chi-square limiting distributions with $q$ degrees of freedom, but with different non-centrality parameters $\delta_1^2$ and $\delta_2^2$. We also know that
\[
\delta_1^2 = \rho\,\tilde\eta^\top \{\tilde\Lambda - \tilde\Lambda J (J^\top \tilde\Lambda J)^{-1} J^\top \tilde\Lambda\}\tilde\eta,
\]
where $\tilde\eta^\top = (\rho_1^{-1/2} c_1^\top, \ldots, \rho_r^{-1/2} c_r^\top)$ is a subvector of the $\eta$ defined in Theorem 3.2. Moreover, noticing that for the local alternative (4.2) under investigation, $\eta^\top = (\tilde\eta^\top, 0^\top_{m-r})$, we get
\[
\delta_2^2 = \tilde\eta^\top \{(\Lambda/\Lambda_c) - (\Lambda/\Lambda_c) J (J^\top (\Lambda/\Lambda_c) J)^{-1} J^\top (\Lambda/\Lambda_c)\}\tilde\eta
\]
by applying Theorem 3.2 and the quadratic form decomposition formula given in Lemma 3.4. Hence, to show the claimed result $\delta_2^2 \ge \delta_1^2$, it suffices to show that
\[
(\Lambda/\Lambda_c) - (\Lambda/\Lambda_c) J (J^\top (\Lambda/\Lambda_c) J)^{-1} J^\top (\Lambda/\Lambda_c)
\ge \rho\{\tilde\Lambda - \tilde\Lambda J (J^\top \tilde\Lambda J)^{-1} J^\top \tilde\Lambda\}. \tag{4.5}
\]
Define $M = \Lambda/\Lambda_c - \rho\tilde\Lambda$. Let $A = \rho J^\top \tilde\Lambda J$, $B = J^\top M J$, $X = \rho J^\top \tilde\Lambda$ and $Y = J^\top M$, and apply Lemma 4.8 with $A$ and $B$ playing the roles of the matrices called $M$ and $N$ in that lemma. Then $A + B = J^\top (\Lambda/\Lambda_c) J$ and $X + Y = J^\top (\Lambda/\Lambda_c)$. Matrix $A$ is positive definite because $\tilde\Lambda$ is positive definite and $J$ is of full rank. Matrix $B$ is positive semidefinite because $M$ is positive semidefinite by Lemma 4.7. Moreover, it is easily seen that the column space of $Y$ is the same as that of $B$.
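Hence the conditions of Lemma 4.8 are satisfied, and we have
\[
(\Lambda/\Lambda_c) J (J^\top (\Lambda/\Lambda_c) J)^{-1} J^\top (\Lambda/\Lambda_c)
\le \rho\tilde\Lambda J (J^\top \tilde\Lambda J)^{-1} J^\top \tilde\Lambda + M J (J^\top M J)^\dagger J^\top M.
\]
The above inequality and $\Lambda/\Lambda_c = \rho\tilde\Lambda + M$ imply that
\[
(\Lambda/\Lambda_c) - (\Lambda/\Lambda_c) J (J^\top (\Lambda/\Lambda_c) J)^{-1} J^\top (\Lambda/\Lambda_c)
\ge \rho\{\tilde\Lambda - \tilde\Lambda J (J^\top \tilde\Lambda J)^{-1} J^\top \tilde\Lambda\} + \{M - M J (J^\top M J)^\dagger J^\top M\}.
\]
The term $M - M J (J^\top M J)^\dagger J^\top M$ is positive semidefinite because
\[
M - M J (J^\top M J)^\dagger J^\top M = M^{1/2}\{I - M^{1/2} J (J^\top M J)^\dagger J^\top M^{1/2}\} M^{1/2},
\]
and $I - M^{1/2} J (J^\top M J)^\dagger J^\top M^{1/2}$ is easily verified to be symmetric and idempotent, hence positive semidefinite. Therefore inequality (4.5) holds and the claimed result is true.

Proof of Lemma 4.6. By Lemma 3.3, the upper-left $s \times s$ submatrices of $M^{-1}$ and $N^{-1}$ are $(M/M_d)^{-1}$ and $(N/N_d)^{-1}$ respectively, and the lower-right $t \times t$ submatrices of $M^{-1}$ and $N^{-1}$ are $(M/M_a)^{-1}$ and $(N/N_a)^{-1}$ respectively.

Since both $M$ and $N$ are positive definite and $M \ge N$, we have $M^{-1} \le N^{-1}$. The corresponding submatrices of $M^{-1}$ and $N^{-1}$ satisfy the same inequality, so we have $(M/M_d)^{-1} \le (N/N_d)^{-1}$ and $(M/M_a)^{-1} \le (N/N_a)^{-1}$. By the positive definiteness of these Schur complements, we can take inverses on both sides and reverse the directions of the above two inequalities, which gives us the claimed result.

To prove Lemma 4.7, partition $U_{\alpha\alpha}$, $U_{\alpha\beta}$ and $U_{\beta\beta}$ as follows:
\[
U_{\alpha\alpha} = \begin{pmatrix} U_{\alpha\alpha,a} & U_{\alpha\alpha,b} \\ U_{\alpha\alpha,b}^\top & U_{\alpha\alpha,c} \end{pmatrix},\quad
U_{\alpha\beta} = \begin{pmatrix} U_{\alpha\beta,a} & U_{\alpha\beta,b} \\ U_{\alpha\beta,c} & U_{\alpha\beta,d} \end{pmatrix},\quad
U_{\beta\beta} = \begin{pmatrix} U_{\beta\beta,a} & U_{\beta\beta,b} \\ U_{\beta\beta,b}^\top & U_{\beta\beta,c} \end{pmatrix},
\]
where $U_{\alpha\alpha,a}$, $U_{\alpha\beta,a}$ and $U_{\beta\beta,a}$ are the corresponding upper-left $r \times r$, $r \times rd$ and $rd \times rd$ blocks.

We also introduce an important property of the Schur complement. Let
\[
M = \begin{pmatrix} A_{s\times s} & B_{s\times t} \\ C_{t\times s} & D_{t\times t} \end{pmatrix}
\quad\text{and}\quad
D = \begin{pmatrix} E_{u\times u} & F_{u\times v} \\ G_{v\times u} & H_{v\times v} \end{pmatrix},
\]
where $u + v = t$. Suppose $M$, $A$, $D$ and $H$ are nonsingular. By Theorem 1.4 of Zhang (2005), the lower-right $u \times u$ block of $M/H$ is just $D/H$, and
\[
M/D = (M/H)/(D/H). \tag{4.6}
\]
The above equality is known as the quotient formula. A similar quotient formula holds for $M/A$.

Proof of Lemma 4.7. We first give an algebraic expression for $\Lambda/\Lambda_c$. Recall the definition $\Lambda = U_{\beta\beta} - U_{\beta\alpha} U_{\alpha\alpha}^{-1} U_{\alpha\beta}$, so
\[
\Lambda = \Psi/U_{\alpha\alpha},
\]
where
\[
\Psi = \begin{pmatrix} U_{\beta\beta} & U_{\beta\alpha} \\ U_{\alpha\beta} & U_{\alpha\alpha} \end{pmatrix}.
\]
Let $\Psi_1$ be the lower-right $\{(m - r)d + m\} \times \{(m - r)d + m\}$ block of $\Psi$. Then $\Lambda_c$, the lower-right $(m - r)d \times (m - r)d$ block of $\Lambda = \Psi/U_{\alpha\alpha}$, satisfies $\Lambda_c = \Psi_1/U_{\alpha\alpha}$. Therefore
\[
\Lambda/\Lambda_c = (\Psi/U_{\alpha\alpha})/(\Psi_1/U_{\alpha\alpha}) = \Psi/\Psi_1,
\]
where the second equality is by the quotient formula (4.6).

It is easily seen that $\Psi/\Psi_1 = \Omega/\Omega_1$, where
\[
\Omega = \begin{pmatrix}
U_{\beta\beta,a} & U_{\beta\alpha,a} & U_{\beta\beta,b} & U_{\beta\alpha,b} \\
U_{\alpha\beta,a} & U_{\alpha\alpha,a} & U_{\alpha\beta,b} & U_{\alpha\alpha,b} \\
U_{\beta\beta,b}^\top & U_{\beta\alpha,c} & U_{\beta\beta,c} & U_{\beta\alpha,d} \\
U_{\alpha\beta,c} & U_{\alpha\alpha,b}^\top & U_{\alpha\beta,d} & U_{\alpha\alpha,c}
\end{pmatrix}
\]
and $\Omega_1$ is the lower-right block of $\Omega$ with the same size as that of $\Psi_1$. Thus we get
\[
\Lambda/\Lambda_c = \Psi/\Psi_1 = \Omega/\Omega_1.
\]
Let $\Omega_2$ be the lower-right $(m - r)(d + 1) \times (m - r)(d + 1)$ block of $\Omega_1$. Matrix $\Omega_1/\Omega_2$ is just the lower-right $r \times r$ block of $\Omega/\Omega_2$, and $\Omega/\Omega_1 = (\Omega/\Omega_2)/(\Omega_1/\Omega_2)$ by the quotient formula (4.6). Hence, we finally get
\[
\Lambda/\Lambda_c = \Omega/\Omega_1 = (\Omega/\Omega_2)/(\Omega_1/\Omega_2).
\]
The above identity implies that our claim $\Lambda/\Lambda_c \ge \rho\tilde\Lambda$ is equivalent to
\[
(\Omega/\Omega_2)/(\Omega_1/\Omega_2) \ge \rho\tilde\Lambda.
\]
Further notice that $\tilde\Lambda = \check U/\tilde U_{\alpha\alpha}$, where
\[
\check U = \begin{pmatrix} \tilde U_{\beta\beta} & \tilde U_{\beta\alpha} \\ \tilde U_{\alpha\beta} & \tilde U_{\alpha\alpha} \end{pmatrix},
\]
so the above inequality is equivalent to
\[
(\Omega/\Omega_2)/(\Omega_1/\Omega_2) \ge \rho(\check U/\tilde U_{\alpha\alpha}). \tag{4.7}
\]
In the last step, we prove inequality (4.7). Recall from Lemma 4.6 that if matrices $M$ and $N$ are both positive definite and $M \ge N$, then the corresponding Schur complements satisfy the same inequality. Note that both $\Omega/\Omega_2$ and $\check U$ are positive definite and the terms $(\Omega/\Omega_2)/(\Omega_1/\Omega_2)$ and $\check U/\tilde U_{\alpha\alpha}$ in (4.7) are corresponding Schur complements. So, by Lemma 4.6, to show (4.7), it is enough to show that
\[
\Omega/\Omega_2 \ge \rho\check U.
\]
Note that the parameter $\phi^\top = (\beta_1^\top, \ldots, \beta_r^\top, \alpha_1, \ldots, \alpha_r, \beta_{r+1}^\top, \ldots, \beta_m^\top, \alpha_{r+1}, \ldots, \alpha_m)$ is just a permutation of $(\theta_1^\top, \ldots, \theta_m^\top)$, so the conclusion of Lemma 4.5 also applies to the information matrix with respect to $\phi$. The information matrix with respect to $\phi$ for $R_n^{(2)}$ is just $\Omega$, and its lower-right $(m - r)(d + 1) \times (m - r)(d + 1)$ block is $\Omega_2$. For $R_n^{(1)}$, the information matrix is just $\check U$. Thus by Lemma 4.5, we have $\Omega/\Omega_2 \ge \rho\check U$. The proof is complete.

Proof of Lemma 4.8. Notice that
\[
\begin{pmatrix} M + N & X + Y \\ (X + Y)^\top & X^\top M^{-1} X + Y^\top N^\dagger Y \end{pmatrix}
= \begin{pmatrix} M & X \\ X^\top & X^\top M^{-1} X \end{pmatrix}
+ \begin{pmatrix} N & Y \\ Y^\top & Y^\top N^\dagger Y \end{pmatrix}.
\]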
The first matrix on the RHS is positive semidefinite by Lemma 4.4, since $M > 0$ and the Schur complement of this matrix with respect to $M$ is 0. By a generalized version of Lemma 4.4 (Zhang (2005)), observing that $N \ge 0$, that the column space of $Y$ is contained in that of $N$, and that the Schur complement of the second matrix on the RHS with respect to $N$ is 0, we have that the second matrix on the RHS is also positive semidefinite. Therefore the matrix on the LHS is positive semidefinite. Also note that $M + N > 0$. Hence, by Lemma 4.4, the Schur complement of the LHS with respect to its upper-left block $M + N$,
\[
X^\top M^{-1} X + Y^\top N^\dagger Y - (X + Y)^\top (M + N)^{-1} (X + Y),
\]
must also be positive semidefinite. The claimed result then follows.

4.4.3 Theorem 4.3: Local power comparison in a special case

In this section, we show that for the special hypothesis (4.3), under the local alternative model (4.2), the non-centrality parameters of the limiting distributions of $R_n^{(1)}$ and $R_n^{(2)}$ are equal.

Recall that $\delta_1^2$ and $\delta_2^2$ are the non-centrality parameters of the limiting distributions of $R_n^{(1)}$ and $R_n^{(2)}$ under the local alternative model (4.2). Under the special hypothesis (4.3) in this theorem, the Jacobian $J$ has a special structure. We use this structure to show the equality of $\delta_1^2$ and $\delta_2^2$. Without loss of generality, we assume that the indices in $S_k$, $k = 1, \ldots, K$, are in natural order.

By Theorem 3.2, we have
\[
\delta_1^2 = \rho\,\tilde\eta^\top \{\tilde\Lambda - \tilde\Lambda J_1 (J_1^\top \tilde\Lambda J_1)^{-1} J_1^\top \tilde\Lambda\}\tilde\eta,
\]
where $\tilde\eta$ is defined in the proof of Theorem 4.2 and
\[
J_1 = \big(0_{(K-1)\times s_1},\ \mathrm{diag}(1_{s_2}^\top, \ldots, 1_{s_K}^\top)\big)^\top \otimes I_d.
\]
As in the proof of Theorem 4.2, we find
\[
\delta_2^2 = \tilde\eta^\top A \tilde\eta,
\]
where $A$ is the upper-left $rd \times rd$ block of $\Lambda - \Lambda J_2 (J_2^\top \Lambda J_2)^{-1} J_2^\top \Lambda$ with $J_2 = \mathrm{diag}(J_1, I_{(m-r)d})$. Therefore, to show the claimed $\delta_1^2 = \delta_2^2$, it suffices to show that
\[
\rho\{\tilde\Lambda - \tilde\Lambda J_1 (J_1^\top \tilde\Lambda J_1)^{-1} J_1^\top \tilde\Lambda\} = A. \tag{4.8}
\]
We first simplify the LHS of (4.8). Let $\varrho_k$, $k = 1, \ldots, K$, be the vector consisting of $\rho_i$, $i \in S_k$, and $\varsigma_k = \sum_{i \in S_k} \rho_i$. We observe that
\[
J_1^\top \tilde\Lambda = (J_1^\top \tilde\Lambda J_1) B,
\]
where
\[
B = \big(-(\varsigma_1 + \rho_0)^{-1} 1_{K-1} \otimes \varrho_1^\top,\ \mathrm{diag}(\varsigma_2^{-1}\varrho_2^\top, \ldots, \varsigma_K^{-1}\varrho_K^\top)\big) \otimes I_d.
\]
Thus, we have $(J_1^\top \tilde\Lambda J_1)^{-1} J_1^\top \tilde\Lambda = B$ and
\[
\rho\{\tilde\Lambda - \tilde\Lambda J_1 (J_1^\top \tilde\Lambda J_1)^{-1} J_1^\top \tilde\Lambda\} = \rho\{\tilde\Lambda - \tilde\Lambda J_1 B\}. \tag{4.9}
\]
We then simplify $\Lambda - \Lambda J_2 (J_2^\top \Lambda J_2)^{-1} J_2^\top \Lambda$ to get another expression of $A$. We find that
\[
J_2^\top \Lambda = J_2^\top \Lambda J_2 \begin{pmatrix} B & 0 \\ C & I_{(m-r)d} \end{pmatrix},
\]
where $C = \big(-\{\varsigma_1 + \rho_0\}^{-1}(1_{m-r}\varrho_1^\top \otimes I_d),\ 0\big)$. Thus,
\[
\Lambda - \Lambda J_2 (J_2^\top \Lambda J_2)^{-1} J_2^\top \Lambda = \Lambda - \Lambda J_2 \begin{pmatrix} B & 0 \\ C & I_{(m-r)d} \end{pmatrix}.
\]
Recall that $A$ is the upper-left $rd \times rd$ block of $\Lambda - \Lambda J_2 (J_2^\top \Lambda J_2)^{-1} J_2^\top \Lambda$. Hence, the above identity gives another expression of $A$ as
\[
A = \Lambda_a - \Lambda_a J_1 B - \Lambda_b C, \tag{4.10}
\]
where $\Lambda_a$ and $\Lambda_b$ are respectively the $rd \times rd$ and $rd \times (m - r)d$ blocks of $\Lambda$.

Finally, the proof is completed by showing that the RHS expressions of (4.9) and (4.10) are equal. This is done by linking the $\Lambda$-matrices to the information matrix (2.13), and applying the block matrix inversion formula given in Lemma 3.3 and the quadratic form decomposition formula given in Lemma 3.4.

Chapter 5

Empirical Likelihood Inference under the DRM Based on Multiple Type I Censored Samples

As noted in Chapter 1, it is desirable to make smart strength test plans to maximize the scientific value of each piece of lumber, and one such plan is to collect Type I right-censored strength samples such that some lumber can be used for multiple strength tests. However, the theory of EL inference under the DRM for complete samples does not carry over automatically to the case of Type I censored samples, because the analytical form of the EL is substantially more complicated in the latter case. This chapter creates a powerful EL inference framework for the latter case. In particular, we (1) show that the maximization of the EL can be reduced to the maximization of a concave function, which we call the dual partial EL, (2) give the asymptotic properties of the dual partial EL, and (3) study the properties of the EL ratio test for hypotheses about the DRM parameter $\beta$.
We argue that the inference framework established in this chapter can potentially be used to extend any EL inference result that is available for multiple complete samples under the DRM to the case of multiple Type I censored samples.

The chapter is organized as follows. Section 5.1 introduces the concept of Type I censoring and gives the definition of the corresponding EL for a single sample. Section 5.2 defines the EL function under the DRM based on multiple Type I censored samples, and the associated maximum EL estimator. Section 5.3 explores the relationship between the EL and the partial EL and reduces the constrained maximization problem for the EL to the maximization of a concave function. Section 5.4 gives an interpretation for the PEL and Section 5.5 studies the asymptotic properties of the MELE for the DRM parameters. The theory of the EL ratio test based on Type I censored samples is established in Section 5.6, followed by a short discussion in Section 5.7 about other inference tasks under this framework. The proofs are given in Section 5.8.

5.1 Type I censored single random samples and the corresponding EL

Let $\{x_i\}_{i=1}^n$ be an independent sample from a population with CDF $F(x)$. Let $\{C_i\}_{i=1}^n$ be a set of iid random variables from another population, and $\{c_i\}_{i=1}^n$ be a set of corresponding realizations. The sample $\{x_i\}_{i=1}^n$ is said to be right-censored if we only observe the value $z_i = \min(x_i, c_i)$ and the indicator $1(x_i \le c_i)$, which shows whether an observation is censored or not. When $1(x_i \le c_i) = 1$, the $i$th observation is said to be uncensored; otherwise, it is censored. If the underlying distribution of $C_i$ is a point mass at a given constant $c$, that is, when $z_i = \min(x_i, c)$, then the sample is said to be Type I right-censored.
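The definition of Type I right-censoring can be illustrated with a short sketch. The function name `type1_right_censor` and the numerical values are hypothetical and serve only to show what is observed: the value $z_i = \min(x_i, c)$ and the indicator $1(x_i \le c)$.

```python
def type1_right_censor(xs, c):
    """Apply Type I right-censoring at the fixed cutoff c.

    Returns (z, delta) with z_i = min(x_i, c) and delta_i = 1(x_i <= c).
    """
    z = [min(x, c) for x in xs]
    delta = [1 if x <= c else 0 for x in xs]
    return z, delta

# Hypothetical strength values, censored at c = 5.0
strengths = [3.2, 5.1, 4.0, 6.7, 2.9]
z, delta = type1_right_censor(strengths, c=5.0)
n_uncensored = sum(delta)  # number of uncensored observations
print(z, delta, n_uncensored)
```

Only `z` and `delta` are available to the analyst; the values above the cutoff are known only to exceed $c$.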
Type I right censoring usually arises in reliability engineering and medical studies when an experiment stops at a prespecified value $c$ and the values of the sample points that are larger than $c$ are unknown to observers.

Similarly we can define other kinds of Type I censoring, for example:

Type I left-censored sample: we observe $z_i = \max(x_i, c)$ and the censoring indicator $1(x_i \ge c)$ for some given constant $c$;

Type I left and right-censored sample: we observe the censoring indicators $1(x_i \in [c_1, c_2])$ and $1(x_i < c_1)$ for given constants $c_1 < c_2$, and $z_i = x_i 1(x_i \in [c_1, c_2]) + c_1 1(x_i < c_1) + c_2 1(x_i > c_2)$.

This chapter focuses on the above three kinds of Type I censored samples, and in the sequel, we simply refer to them as "Type I censored samples". Clearly, the Type I right-censored sample and the Type I left-censored sample are both special cases of the Type I left and right-censored sample, with $c_1 = -\infty$ and $c_2 = \infty$ respectively. We focus on studying the most general one of the three: the Type I left and right-censored sample.

Let $\tilde n$ be the number of uncensored observations, i.e. $\tilde n = \sum_{i=1}^{n} 1(x_i \in [c_1, c_2])$, and denote the uncensored observations as $\tilde x_i$, $i = 1, \ldots, \tilde n$. Let $\check n$ be the number of left-censored observations, i.e. $\check n = \sum_{i=1}^{n} 1(x_i < c_1)$. Put
\[
\varsigma = \Pr(X \in [c_1, c_2]) = F(c_2) - F(c_1^-), \qquad
\iota = \Pr(X < c_1) = F(c_1^-).
\]
We have $\Pr(X > c_2) = 1 - \iota - \varsigma$. Recall that we defined $dF(x) = F(x) - F(x^-)$ in Section 2.1. The EL based on a Type I censored sample is given by
\[
L_n(F) = \prod_{i=1}^{n} \iota^{1(x_i < c_1)} \{dF(x_i)\}^{1(x_i \in [c_1, c_2])} (1 - \iota - \varsigma)^{1 - 1(x_i < c_1) - 1(x_i \in [c_1, c_2])}
= \iota^{\check n} \Big\{\prod_{i=1}^{\tilde n} dF(\tilde x_i)\Big\} (1 - \iota - \varsigma)^{n - \check n - \tilde n}.
\]

5.2 EL for multiple Type I censored samples under the DRM

Suppose we have $m + 1$ independent Type I censored samples
\[
\{x_{kj} : j = 1, 2, \ldots, n_k\}_{k=0}^{m}
\]
from populations $\{F_k(x)\}$ with the same support $S$, which satisfy the DRM assumption (2.1). Our question is, based on such censored samples, how should we estimate the DRM parameter and test hypotheses about it? As noted in Chapter 1, this is an important inference task in our long term monitoring program for lumber strength.

As for complete data, we consider EL inference, as it seems to be an effective and the most natural inference tool under the semiparametric DRM. Let $c_{k,1}$ and $c_{k,2}$, $k = 0, 1, \ldots, m$, be the left and right censoring cutoff points for the $k$th Type I censored sample, respectively. Let $S_k = [c_{k,1}, c_{k,2}] \cap S$. The set $S_k$ is the support of the distribution of the uncensored observations in the $k$th sample. Let $\tilde n_k$ be the number of uncensored observations in the $k$th sample and denote the uncensored observations as $\{\tilde x_{kj} : j = 1, \ldots, \tilde n_k\}$. Let $\check n_k$ be the number of left-censored observations in the $k$th sample. Define $\varsigma_k = \Pr(X_{k1} \in S_k) = F_k(c_{k,2}) - F_k(c_{k,1}^-)$ and $\iota_k = F_k(c_{k,1}^-)$.

We will always assume that, for all $k = 0, 1, \ldots, m$, $\varsigma_k > 0$ and $S_k \subseteq S_0$. When the samples are not both left and right-censored, we can always choose a baseline such that this assumption is satisfied. As in the case of a single sample, the EL of the $\{F_k\}$ based on these Type I censored samples is defined to be
\[
L_n(F_0, \ldots, F_m) = \Big\{\prod_{k=0}^{m}\prod_{j=1}^{\tilde n_k} dF_k(\tilde x_{kj})\Big\}
\Big\{\prod_{k=0}^{m} \iota_k^{\check n_k}\Big\}
\Big\{\prod_{k=0}^{m} \{1 - \iota_k - \varsigma_k\}^{n_k - \check n_k - \tilde n_k}\Big\}.
\]
The first factor on the RHS of the above definition is the contribution of the uncensored observations to the likelihood; the second factor, the contribution of the left-censored observations; and the third, the contribution of the right-censored observations. When the $\{F_k\}$ satisfy the DRM assumption, the above likelihood function can be further written as
\[
L_n(F_0, \ldots, F_m) = \Big\{\prod_{k=0}^{m}\prod_{j=1}^{\tilde n_k} dF_0(\tilde x_{kj})\Big\}
\Big\{\prod_{k=0}^{m}\prod_{j=1}^{\tilde n_k} \exp\{\alpha_k + \beta_k^\top q(\tilde x_{kj})\}\Big\}
\Big\{\prod_{k=0}^{m} \iota_k^{\check n_k} \{1 - \iota_k - \varsigma_k\}^{n_k - \check n_k - \tilde n_k}\Big\}, \tag{5.1}
\]
where $\alpha_k$ and $\beta_k$ satisfy
\[
\int_{x \in S_k} \exp\{\alpha_k + \beta_k^\top q(x)\}\, dF_0(x) = \varsigma_k.
\]
Put $p_{kj} = dF_0(\tilde x_{kj})$. Define
\[
p = \{p_{kj} : j = 1, \ldots, \tilde n_k\}_{k=0}^{m}, \quad \iota = \{\iota_k\}_{k=0}^{m}, \quad\text{and}\quad \varsigma = \{\varsigma_k\}_{k=0}^{m}.
\]
We see from (5.1) that, under the DRM, the EL is a function of $\theta$, $p$, $\iota$ and $\varsigma$. We therefore write the EL as $L_n(\theta, p, \iota, \varsigma)$. Recall that we have defined $\alpha_0 = 0$ and $\beta_0 = 0$. Let
\[
(\hat\theta, \hat p, \hat\iota, \hat\varsigma) = \operatorname*{argmax}_{\theta, p, \iota, \varsigma}
\Big\{L_n(\theta, p, \iota, \varsigma) :
\sum_{k=0}^{m}\sum_{j=1}^{\tilde n_k} \exp\{\alpha_r + \beta_r^\top q(\tilde x_{kj})\}\, p_{kj} 1(\tilde x_{kj} \in S_r) = \varsigma_r,\
p_{kj} \ge 0,\ 0 < \varsigma_r \le 1,\ 0 \le \iota_r \le 1 - \varsigma_r,\ r = 0, \ldots, m\Big\}. \tag{5.2}
\]
We call $\hat\theta$ the MELE of the DRM parameter $\theta$, and $(\hat p, \hat\iota, \hat\varsigma)$ the MELE of the baseline distribution $F_0$. The next section addresses how to calculate these MELEs.

5.3 Calculating the MELE

The constrained maximization (5.2) of the EL appears to be a complicated problem. We now show that it can nevertheless be reduced to a simple concave maximization problem.

5.3.1 Partial EL and its relation to EL

The EL (5.1) of the multiple Type I censored samples can be factorized as
\[
L_n(\theta, p, \iota, \varsigma) = PL_n(\theta, p, \varsigma) \cdot L_n(\iota, \varsigma), \tag{5.3}
\]
where
\[
PL_n(\theta, p, \varsigma) = \Big\{\prod_{k=0}^{m}\prod_{j=1}^{\tilde n_k} \frac{p_{kj}}{\varsigma_0}\Big\}
\Big\{\prod_{k=1}^{m}\prod_{j=1}^{\tilde n_k} \frac{\varsigma_0}{\varsigma_k} \exp\{\alpha_k + \beta_k^\top q(\tilde x_{kj})\}\Big\},
\]
\[
L_n(\iota, \varsigma) = \prod_{k=0}^{m} \iota_k^{\check n_k} \varsigma_k^{\tilde n_k} \{1 - \iota_k - \varsigma_k\}^{n_k - \check n_k - \tilde n_k}.
\]
We call $PL_n(\theta, p, \varsigma)$ the partial empirical likelihood (PEL) function. Our next proposition indicates that the EL attains its maximum under the corresponding constraints given in (5.2) when both the PEL under the same constraints and $L_n(\iota, \varsigma)$ attain their maxima independently.

Proposition 5.1. Under the constraint
\[
\sum_{k=0}^{m}\sum_{j=1}^{\tilde n_k} \exp\{\alpha_r + \beta_r^\top q(\tilde x_{kj})\}\, p_{kj} 1(\tilde x_{kj} \in S_r) = \varsigma_r, \tag{5.4}
\]
for $r = 0, 1, \ldots, m$, $\sup_{\theta, p} PL_n(\theta, p, \varsigma)$ does not depend on $\varsigma$.

Proof of Proposition 5.1. After reorganizing terms, the PEL can be written as
\[
PL_n(\theta, p, \varsigma) = \Big\{\prod_{k=0}^{m}\prod_{j=1}^{\tilde n_k} \frac{p_{kj}}{\varsigma_0}\Big\}
\Big\{\prod_{k=1}^{m}\prod_{j=1}^{\tilde n_k} \exp\{(\alpha_k + \log\varsigma_0 - \log\varsigma_k) + \beta_k^\top q(\tilde x_{kj})\}\Big\},
\]
and the corresponding constraint (5.4) can be written as
\[
\sum_{k=0}^{m}\sum_{j=1}^{\tilde n_k} \exp\{(\alpha_r + \log\varsigma_0 - \log\varsigma_r) + \beta_r^\top q(\tilde x_{kj})\}\, \frac{p_{kj}}{\varsigma_0} 1(\tilde x_{kj} \in S_r) = 1, \tag{5.5}
\]
for $r = 0, 1, \ldots, m$. Now suppose $(\hat p, \hat\varsigma, \hat\alpha, \hat\beta)$ is a point at which the PEL is maximized under the constraint (5.5).
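Let $\check\varsigma$ be an arbitrary vector each component, $\check\varsigma_k$, of which is in the interval $(0, 1]$. Put, for $k = 0, 1, \ldots, m$ and $j = 1, \ldots, \tilde n_k$,
\[
\check p_{kj} = \hat p_{kj}\frac{\check\varsigma_0}{\hat\varsigma_0}
\quad\text{and}\quad
\check\alpha_k = \hat\alpha_k + \log\frac{\hat\varsigma_0}{\check\varsigma_0} - \log\frac{\hat\varsigma_k}{\check\varsigma_k}.
\]
It is easily seen that $(\check p, \check\varsigma, \check\alpha, \hat\beta)$ also satisfies the constraint (5.5) and that $PL_n(\check\theta, \check p, \check\varsigma)$, with $\check\theta = (\check\alpha, \hat\beta)$, has the same value as $PL_n(\hat\theta, \hat p, \hat\varsigma)$. Hence the claimed result holds.

Proposition 5.1 has a few important implications that help us calculate, as well as study the properties of, the MELEs.

(i) The EL $L_n(\theta, p, \iota, \varsigma)$ attains its maximum when both the PEL $PL_n(\theta, p, \varsigma)$ and $L_n(\iota, \varsigma)$ attain their maxima, respectively. Hence, the MELE $(\hat\iota, \hat\varsigma)$ is exactly the point at which $L_n(\iota, \varsigma)$ is maximized given that $0 < \varsigma_k \le 1$ and $0 \le \iota_k \le 1 - \varsigma_k$. This point is easily seen to be
\[
\hat\iota_k = \check n_k/n_k \quad\text{and}\quad \hat\varsigma_k = \tilde n_k/n_k.
\]

(ii) The constrained maximization of the PEL $PL_n(\theta, p, \varsigma)$ is overparameterized: the maximum of the PEL is independent of the value of $\varsigma$. Let $\wp_{kj} = \varsigma_0^{-1} p_{kj}$, $\wp = \{\wp_{kj} : j = 1, \ldots, \tilde n_k\}_{k=0}^{m}$, $\kappa_k = \alpha_k + \log\varsigma_0 - \log\varsigma_k$, and $\kappa = (\kappa_1, \ldots, \kappa_m)^\top$. From the proof of Proposition 5.1, we see that the overparameterization issue can be removed by reparameterizing the PEL to
\[
PL_n(\kappa, \beta, \wp) = \Big\{\prod_{k=0}^{m}\prod_{j=1}^{\tilde n_k} \wp_{kj}\Big\}
\Big\{\prod_{k=1}^{m}\prod_{j=1}^{\tilde n_k} \exp\{\kappa_k + \beta_k^\top q(\tilde x_{kj})\}\Big\}, \tag{5.6}
\]
and the corresponding constraint (5.4) to
\[
\sum_{k=0}^{m}\sum_{j=1}^{\tilde n_k} \exp\{\kappa_r + \beta_r^\top q(\tilde x_{kj})\}\, \wp_{kj} 1(\tilde x_{kj} \in S_r) = 1, \tag{5.7}
\]
for $r = 0, 1, \ldots, m$. Define the maximum PEL estimator (MPELE) of $(\kappa, \beta, \wp)$ as
\[
(\hat\kappa, \hat\beta, \hat\wp) = \operatorname*{argmax}_{\kappa, \beta, \wp}\{PL_n(\kappa, \beta, \wp) : \text{constraint (5.7)}\}.
\]
Then the MELEs of $\alpha$, $\beta$ and $p$ are just $\hat\alpha_k = \hat\kappa_k - \log\hat\varsigma_0 + \log\hat\varsigma_k$ for $k = 1, 2, \ldots, m$, $\hat\beta$ and $\hat p = \hat\varsigma_0\hat\wp$, respectively.

(iii) All the information about the slope DRM parameter $\beta$ is contained in the PEL, $PL_n(\kappa, \beta, \wp)$, so basing the inference about $\beta$ on the PEL causes no loss of efficiency for estimation or loss of statistical power for hypothesis testing.

5.3.2 Maximization of the PEL

We have seen that, to calculate the MELEs, the key is to maximize the PEL (5.6) under the constraint (5.7).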
Note that the PEL $PL_n(\kappa, \beta, \wp)$ and the corresponding constraint have mathematical expressions similar to those of the EL (2.3) for uncensored data under the DRM and the corresponding constraint (2.2), except that in the current problem the $\{n_k\}$ are replaced by the $\{\tilde n_k\}$ and indicator terms are added to represent the supports of the uncensored observations. The constrained maximization of the PEL can then be solved using an approach similar to that for the constrained maximization of the EL in the uncensored case. We first obtain the profile log-PEL
\[
\tilde\ell_n(\kappa, \beta) = \sup_{\wp}\{\log PL_n(\kappa, \beta, \wp) : \text{constraint (5.7)}\}. \tag{5.8}
\]
Let $\tilde n = \sum_{k=0}^{m} \tilde n_k$ be the total number of uncensored observations. This constrained maximization problem can again be solved by the method of Lagrange multipliers, and the supremum is found to be attained at
\[
\wp_{kj} = \tilde n^{-1}\Big\{1 + \sum_{r=1}^{m} \lambda_r\big[\exp\{\kappa_r + \beta_r^\top q(\tilde x_{kj})\} 1(\tilde x_{kj} \in S_r) - 1\big]\Big\}^{-1}, \tag{5.9}
\]
where the $\{\lambda_r\}_{r=1}^{m}$ are the solution to
\[
\sum_{k=0}^{m}\sum_{j=1}^{\tilde n_k} \wp_{kj} \exp\{\kappa_t + \beta_t^\top q(\tilde x_{kj})\} 1(\tilde x_{kj} \in S_t) = 1,
\]
for $t = 0, 1, \ldots, m$. The resulting profile log-PEL is
\[
\tilde\ell_n(\kappa, \beta) = -\sum_{k=0}^{m}\sum_{j=1}^{\tilde n_k} \log\Big\{1 + \sum_{r=1}^{m} \lambda_r\big[\exp\{\kappa_r + \beta_r^\top q(\tilde x_{kj})\} 1(\tilde x_{kj} \in S_r) - 1\big]\Big\}
+ \sum_{k=1}^{m}\sum_{j=1}^{\tilde n_k} \{\kappa_k + \beta_k^\top q(\tilde x_{kj})\}.
\]
As in the case of the profile log-EL for uncensored data, we find that the maximum of $\tilde\ell_n(\kappa, \beta)$ is attained when $\lambda_k = \tilde n_k/\tilde n$, for $k = 1, \ldots, m$. We define the dual PEL (DPEL) of $\kappa$ and $\beta$ by replacing the $\{\lambda_k\}$ in $\tilde\ell_n(\kappa, \beta)$ with $\{\tilde n_k/\tilde n\}$:
\[
\ell_n(\kappa, \beta) = -\sum_{k=0}^{m}\sum_{j=1}^{\tilde n_k} \log\Big\{\sum_{r=0}^{m} (\tilde n_r/\tilde n) \exp\{\kappa_r + \beta_r^\top q(\tilde x_{kj})\} 1(\tilde x_{kj} \in S_r)\Big\}
+ \sum_{k=1}^{m}\sum_{j=1}^{\tilde n_k} \{\kappa_k + \beta_k^\top q(\tilde x_{kj})\}. \tag{5.10}
\]
Clearly, the DPEL $\ell_n(\kappa, \beta)$ shares the same maximal point and value as the profile log-PEL $\tilde\ell_n(\kappa, \beta)$. Therefore the MPELE $(\hat\kappa, \hat\beta)$ can be calculated as
\[
(\hat\kappa, \hat\beta) = \operatorname*{argmax}_{\kappa, \beta} \ell_n(\kappa, \beta).
\]
The DPEL, just like the DEL for complete data, has a simple analytical form and is concave, so the above maximum can be easily computed.

Plugging $(\hat\kappa, \hat\beta)$ into (5.9) and replacing $\lambda_r$ with $\tilde n_r/\tilde n$, we get
\[
\hat\wp_{kj} = \Big\{\sum_{r=0}^{m} \tilde n_r \exp\{\hat\kappa_r + \hat\beta_r^\top q(\tilde x_{kj})\} 1(\tilde x_{kj} \in S_r)\Big\}^{-1}. \tag{5.11}
\]

5.4 Interpretation of the PEL

We now give an interpretation for the PEL (5.6): it is exactly the EL of the underlying distributions of the uncensored observations $\{\tilde x_{kj} : j = 1, \ldots, \tilde n_k\}$.

For a given $k \in \{0, 1, \ldots, m\}$, denote the CDF of the uncensored observations as $\tilde F_k$. Then $\tilde F_k$ is a truncated version of $F_k$:
\[
d\tilde F_k(x) = \varsigma_k^{-1} 1(x \in S_k)\, dF_k(x). \tag{5.12}
\]
Recall that $\kappa_k = \alpha_k + \log\varsigma_0 - \log\varsigma_k$. Since the $\{F_k\}$ satisfy the DRM (2.1) and $S_k \subseteq S_0$, we have
\[
\begin{aligned}
d\tilde F_k(x) &= \varsigma_k^{-1} 1(x \in S_k) \exp\{\alpha_k + \beta_k^\top q(x)\}\, dF_0(x) \\
&= \varsigma_0\varsigma_k^{-1} \exp\{\alpha_k + \beta_k^\top q(x)\} 1(x \in S_k)\{\varsigma_0^{-1} 1(x \in S_0)\, dF_0(x)\} \\
&= \exp\{\kappa_k + \beta_k^\top q(x)\} 1(x \in S_k)\, d\tilde F_0(x).
\end{aligned}
\]
Note that we can add the factor $1(x \in S_0)$ in the second equality because $S_k \subseteq S_0$ implies $1(x \in S_k) = 1(x \in S_k) 1(x \in S_0)$.

We now see that the $\{\tilde F_k\}$ satisfy a DRM of the form
\[
d\tilde F_k(x) = \exp\{\kappa_k + \beta_k^\top q(x)\} 1(x \in S_k)\, d\tilde F_0(x), \quad\text{for } k = 1, \ldots, m. \tag{5.13}
\]
This model differs from the DRM (2.1) in that the supports of the non-baseline distributions are allowed to differ from each other, although all have to be contained in the support of the baseline distribution, while (2.1) requires all the distributions to have the same support. We therefore call model (5.13) the varying support density ratio model (VSDRM). Since the $\tilde F_k$ are distribution functions, $(\kappa_k, \beta_k^\top)^\top$ must satisfy
\[
\int \exp\{\kappa_k + \beta_k^\top q(x)\} 1(x \in S_k)\, d\tilde F_0(x) = 1, \quad\text{for } k = 1, \ldots, m. \tag{5.14}
\]
Just like the EL (2.3) for uncensored data, the EL of the $\{\tilde F_k\}$ is defined to be
\[
L_n(\tilde F_0, \ldots, \tilde F_m) = \prod_{k=0}^{m}\prod_{j=1}^{\tilde n_k} d\tilde F_k(\tilde x_{kj})
= \Big\{\prod_{k=0}^{m}\prod_{j=1}^{\tilde n_k} \wp_{kj}\Big\}
\Big\{\prod_{k=1}^{m}\prod_{j=1}^{\tilde n_k} \exp\{\kappa_k + \beta_k^\top q(\tilde x_{kj})\}\Big\},
\]
with $\wp_{kj} = d\tilde F_0(\tilde x_{kj}) = \varsigma_0^{-1} dF_0(\tilde x_{kj}) = \varsigma_0^{-1} p_{kj}$. We see that this is exactly the PEL (5.6). The constraint (5.14) is exactly the constraint (5.7) corresponding to the PEL, when we confine the support of $\tilde F_0$ to the $\{\tilde x_{kj}\}$.
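The maximization of the DPEL (5.10) and the relation $\kappa_k = \alpha_k + \log\varsigma_0 - \log\varsigma_k$ can be illustrated numerically in the simplest case. The following is a minimal sketch under stated assumptions, not the implementation used in this thesis: two samples ($m = 1$), $q(x) = x$, right-censoring only with cutoffs $c_0 = 3$ and $c_1 = 2$ (so that $S_1 \subseteq S_0$), $F_0 = N(0, 1)$ and $F_1 = N(1, 1)$, for which $\alpha_1 = -0.5$ and $\beta_1 = 1$. The function `maximize_dpel`, the damped Newton iteration and all numerical settings are hypothetical choices for the illustration.

```python
import math
import random

def maximize_dpel(x0, x1, c1, iters=50):
    """Damped Newton ascent on the two-sample DPEL with q(x) = x.

    x0, x1: uncensored observations of samples 0 and 1; S_1 = (-inf, c1]
    is assumed to lie inside S_0. Returns (kappa, beta, gradient).
    """
    n0, n1 = len(x0), len(x1)
    w0, w1 = n0 / (n0 + n1), n1 / (n0 + n1)
    # Points outside S_1 contribute only the constant -log(w0), dropped here.
    pts = [x for x in x0 + x1 if x <= c1]
    sum1, sumx1 = float(n1), sum(x1)

    def dpel(k, b):
        return (-sum(math.log(w0 + w1 * math.exp(k + b * x)) for x in pts)
                + k * sum1 + b * sumx1)

    def derivs(k, b):
        g0, g1, h00, h01, h11 = sum1, sumx1, 0.0, 0.0, 0.0
        for x in pts:
            e = w1 * math.exp(k + b * x)
            p = e / (w0 + e)
            g0 -= p
            g1 -= p * x
            v = p * (1.0 - p)
            h00 += v
            h01 += v * x
            h11 += v * x * x
        return (g0, g1), (h00, h01, h11)  # gradient, negative Hessian

    k = b = 0.0
    for _ in range(iters):
        (g0, g1), (h00, h01, h11) = derivs(k, b)
        if max(abs(g0), abs(g1)) < 1e-8:
            break
        det = h00 * h11 - h01 * h01
        dk = (h11 * g0 - h01 * g1) / det  # Newton direction for a concave
        db = (h00 * g1 - h01 * g0) / det  # objective: (neg. Hessian)^{-1} grad
        step, cur = 1.0, dpel(k, b)
        while dpel(k + step * dk, b + step * db) < cur - 1e-9 and step > 1e-8:
            step /= 2.0  # step halving keeps the iteration an ascent method
        k, b = k + step * dk, b + step * db
    return k, b, derivs(k, b)[0]

random.seed(1)
n, c0, c1 = 2000, 3.0, 2.0
x0 = [x for x in (random.gauss(0.0, 1.0) for _ in range(n)) if x <= c0]
x1 = [x for x in (random.gauss(1.0, 1.0) for _ in range(n)) if x <= c1]
kappa_hat, beta_hat, grad = maximize_dpel(x0, x1, c1)

def Phi(t):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

# kappa_1 = alpha_1 + log(sigma_0) - log(sigma_1) for this design
kappa_true = -0.5 + math.log(Phi(c0)) - math.log(Phi(c1 - 1.0))
print(kappa_hat, kappa_true, beta_hat)
```

In this two-sample case the DPEL has the form of a logistic-regression log-likelihood on the uncensored observations that fall in $S_1$, which is one way to see its concavity. The estimate $\hat\kappa_1$ targets $\kappa_1$ rather than $\alpha_1$; the MELE of $\alpha_1$ is recovered afterwards from $\hat\varsigma_0$ and $\hat\varsigma_1$.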
Therefore, maximizing the PEL (5.6) given constraint (5.7) is equivalent to maximizing the EL of the distribution functions of the uncensored observations given the corresponding constraint (5.14).

5.5 Properties of the MPELE

Put $\vartheta = (\kappa, \beta)$. Given that the MPELE $\hat\vartheta = (\hat\kappa, \hat\beta)$ is the point at which the concave DPEL $\ell_n(\kappa, \beta)$, a function much like the DEL for complete data, is maximized, we wonder whether $\hat\vartheta$ is asymptotically normal just as the MELE $\hat\theta$ is in the case of complete data. Recall that the asymptotic normality of $\hat\theta$ in the case of complete data is determined by two facts: (1) the negative second-order derivative of the DEL, which we call the empirical information matrix, has a limit; and (2) the score function evaluated at the true parameter value $\theta^*$ has a normal limiting distribution. If these two properties also hold for the DPEL in the case of Type I censored data, the asymptotic normality of $\hat\vartheta$ will follow.

The differences between the algebraic expressions of the DPEL (5.10) and the DEL (2.11) pose a challenge for showing these properties when data are Type I censored. In the case of complete data, both limits are derived based on the fact that the DEL is a sum of iid random variables. However, the DPEL is no longer a sum of iid random variables, but rather a sum of dependent random variables, because of the repeated appearance of the same random number $\tilde n_k$ in each summand. Does the DPEL have asymptotic properties similar to those of the DEL? The answer is positive, as we summarize in the following lemmas. The proofs are given in Section 5.8.

Let $\vartheta^*$ denote the true parameter value $(\kappa^*, \beta^*)$. Call
\[
U_n = -n^{-1}\,\partial^2\ell_n(\vartheta^*)/\partial\vartheta\,\partial\vartheta^\top
\]
the partial empirical information matrix. Recall that $Q^\top(x) = (1, q^\top(x))$, $n_k$ is the size of the $k$th sample, and $n = \sum_{k=0}^{m} n_k$ is the total sample size.

Lemma 5.2 (Properties of the partial information matrix).
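Suppose we have $m + 1$ random samples from populations with distributions of the DRM form given in (2.1) and a true parameter value $\theta^*$ such that
$$ \int \exp\{\beta_k^\top q(x)\}\,dF_0(x) < \infty $$
for $\theta$ in a neighbourhood of $\theta^*$. Each sample is Type I censored, with the uncensored observations falling in the range $S_k = [c_{k,1}, c_{k,2}]$ for given $c_{k,1} < c_{k,2}$, and $S_k \subseteq S_0$ for all $k$. Also, $\int Q(x)Q^\top(x)\,dF_0(x) > 0$, and $n_k/n = \rho_k + O(n^{-\delta})$ for some constants $\rho_k \in (0, 1)$ and $\delta > 0$.

Then the partial empirical information matrix $U_n$ converges almost surely to a positive definite matrix $U$.

We call the limiting matrix $U$ the partial information matrix. We partition its entries in agreement with $\kappa$ and $\beta$ and denote the blocks by $U_{\kappa\kappa}$, $U_{\kappa\beta}$, $U_{\beta\kappa}$ and $U_{\beta\beta}$. Recall that $\varsigma_k = F_k(c_{k,2}) - F_k(c_{k,1}^-)$. Define $\varphi_k(\vartheta, x) = \exp\{\kappa_k + \beta_k^\top q(x)\}$ for $k = 0, \ldots, m$. Let
$$
\begin{aligned}
h(\vartheta, x) &= \big(\rho_1\varsigma_1\varphi_1(\vartheta, x)\mathbf{1}(x \in S_1), \ldots, \rho_m\varsigma_m\varphi_m(\vartheta, x)\mathbf{1}(x \in S_m)\big)^\top, \\
s(\vartheta, x) &= \sum_{k=0}^{m}\rho_k\varsigma_k\varphi_k(\vartheta, x)\mathbf{1}(x \in S_k), \\
H(\vartheta, x) &= \mathrm{diag}\{h(\vartheta, x)\} - h(\vartheta, x)h^\top(\vartheta, x)/s(\vartheta, x).
\end{aligned}
\tag{5.15}
$$
Recall that $E_k(\cdot)$, $k = 0, 1, \ldots, m$, is the expectation operator with respect to $F_k$. The blockwise expressions of the partial information matrix $U$ in terms of $H(\vartheta^*, x)$ and $q(x)$ are
$$
\begin{aligned}
U_{\kappa\kappa} &= -\lim_{n\to\infty} n^{-1}\,\partial^2\ell_n(\vartheta^*)/\partial\kappa\,\partial\kappa^\top = \varsigma_0^{-1}E_0\{H(\vartheta^*, x)\}, \\
U_{\beta\beta} &= -\lim_{n\to\infty} n^{-1}\,\partial^2\ell_n(\vartheta^*)/\partial\beta\,\partial\beta^\top = \varsigma_0^{-1}E_0\{H(\vartheta^*, x) \otimes (q(x)q^\top(x))\}, \\
U_{\kappa\beta} &= -\lim_{n\to\infty} n^{-1}\,\partial^2\ell_n(\vartheta^*)/\partial\kappa\,\partial\beta^\top = U_{\beta\kappa}^\top = \varsigma_0^{-1}E_0\{H(\vartheta^*, x) \otimes q^\top(x)\}.
\end{aligned}
\tag{5.16}
$$
Let $v = n^{-1/2}\,\partial\ell_n(\vartheta^*)/\partial\vartheta$. Define
$$
T = (\rho_0\varsigma_0)^{-1}\mathbf{1}_m\mathbf{1}_m^\top + \mathrm{diag}\{(\rho_1\varsigma_1)^{-1}, (\rho_2\varsigma_2)^{-1}, \ldots, (\rho_m\varsigma_m)^{-1}\},
\qquad
W = \begin{pmatrix} T & 0_{m\times md} \\ 0_{md\times m} & 0_{md\times md} \end{pmatrix}.
$$

Lemma 5.3 (Asymptotic properties of the score function). Under the conditions of Lemma 5.2, $Ev = 0$ and $v$ is asymptotically multivariate normal with mean $0$ and covariance matrix $V = U - UWU$.

The asymptotic normality of the MPELE, $\hat\vartheta = (\hat\kappa, \hat\beta)$, follows from Lemmas 5.2 and 5.3.

Theorem 5.4 (Asymptotic normality of the MPELE).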
Under the conditions of Lemma 5.2, $\sqrt{n}(\hat\vartheta - \vartheta^*)$ has an asymptotic multivariate normal distribution with mean $0$ and covariance matrix $U^{-1} - W$.

5.6 EL ratio test for the DRM parameter

A primary inference problem of interest, as noted in Chapter 1, is to test whether the underlying distributions of a few Type I censored lumber samples are equal. As commented in Chapter 3, such hypotheses can be translated into equalities among the $\{\beta_k\}$ parameters under the DRM. They are special cases of the general composite hypothesis testing problem (3.1),
$$ H_0 : g(\beta) = 0 \quad \text{against} \quad H_1 : g(\beta) \neq 0 $$
for some smooth function $g : \mathbb{R}^{md} \to \mathbb{R}^{q}$, with $q \le md$, the length of $\beta$. The function $g$ is assumed to be thrice differentiable with a full rank Jacobian matrix. Does the corresponding EL ratio statistic based on Type I censored samples have a chi-square limiting distribution, just as in the case of complete data? The answer is affirmative, as we will see in this section.

Let $(\tilde\theta, \tilde p, \tilde\iota, \tilde\varsigma)$ denote the point at which the maximum of the EL (5.1) is attained under the null constraint $g(\beta) = 0$. Recall that the MELE $(\hat\theta, \hat p, \hat\iota, \hat\varsigma)$ defined in (5.2) is the point at which the EL is maximized without the null constraint. The EL ratio (ELR) test statistic is defined to be
$$ R_n = 2\{\log L_n(\hat\theta, \hat p, \hat\iota, \hat\varsigma) - \log L_n(\tilde\theta, \tilde p, \tilde\iota, \tilde\varsigma)\}. $$
Factorizing $L_n(\theta, p, \iota, \varsigma)$ in the above definition into the product of the PEL and $L_n(\iota, \varsigma)$ as in (5.3), we get
$$ R_n = 2\big(\{\log PL_n(\hat\theta, \hat p, \hat\varsigma) + \log L_n(\hat\iota, \hat\varsigma)\} - \{\log PL_n(\tilde\theta, \tilde p, \tilde\varsigma) + \log L_n(\tilde\iota, \tilde\varsigma)\}\big). $$
By Proposition 5.1, the maximum value of $PL_n(\theta, p, \varsigma)$ does not depend on the value of $\varsigma$, so the MELE $(\hat\iota, \hat\varsigma)$ is just the point at which $L_n(\iota, \varsigma)$ is maximized. Under the null constraint on $\beta$, the conclusion of Proposition 5.1 still applies, because the proof of that proposition does not involve any algebraic operation on $\beta$. Hence $(\tilde\iota, \tilde\varsigma)$ is also the point at which $L_n(\iota, \varsigma)$ is maximized, and is therefore not influenced by the null constraint.
Consequently, $(\tilde\iota, \tilde\varsigma) = (\hat\iota, \hat\varsigma)$, $L_n(\hat\iota, \hat\varsigma) = L_n(\tilde\iota, \tilde\varsigma)$, and $R_n$ becomes
$$ R_n = 2\{\log PL_n(\hat\theta, \hat p, \hat\varsigma) - \log PL_n(\tilde\theta, \tilde p, \tilde\varsigma)\}. $$
Recalling that the DPEL $\ell_n(\vartheta)$ shares the same maximum value with the PEL, $R_n$ further simplifies to
$$ R_n = 2\{\ell_n(\hat\vartheta) - \ell_n(\tilde\vartheta)\}. $$
In other words, the EL ratio statistic equals the DPEL ratio statistic. This agrees with argument (iii) in Section 5.3.1, that basing the inference about $\beta$ on the PEL causes no loss of statistical power for hypothesis testing. Based on this fact, we obtain the asymptotic properties of the ELR test summarized in the theorem below.

Recall that, when $q < md$, the null hypothesis $g(\beta) = 0$ can be equivalently expressed as $\beta = G(\gamma)$ for some lower dimensional parameter $\gamma$ and a unique function $G : \mathbb{R}^{md-q} \to \mathbb{R}^{md}$, which is thrice differentiable with full rank Jacobian matrix $J = \partial G(\gamma^*)/\partial\gamma$. When $q = md$, $g$ is invertible and the null hypothesis is fully specified as $\beta = g^{-1}(0)$.

Theorem 5.5 (Asymptotic properties of the ELR test). Adopt the conditions postulated in Lemma 5.2.

(i) Under the null hypothesis $g(\beta) = 0$ of (3.1), $R_n \to \chi^2_q$ in distribution as $n \to \infty$.

(ii) Under the local alternative (3.2):
$$ \beta_k = \beta_k^* + n_k^{-1/2}c_k, $$
$R_n \to \chi^2_q(\delta^2)$ in distribution as $n \to \infty$, where $\delta^2$ is a nonnegative non-centrality parameter given by
$$
\delta^2 =
\begin{cases}
\eta^\top\{\tilde\Lambda - \tilde\Lambda J(J^\top\tilde\Lambda J)^{-1}J^\top\tilde\Lambda\}\eta & \text{if } q < md, \\
\eta^\top\tilde\Lambda\eta & \text{if } q = md,
\end{cases}
$$
where $\tilde\Lambda = U_{\beta\beta} - U_{\beta\kappa}U_{\kappa\kappa}^{-1}U_{\kappa\beta}$ and $\eta = (\rho_1^{-1/2}c_1^\top, \rho_2^{-1/2}c_2^\top, \ldots, \rho_m^{-1/2}c_m^\top)^\top$ is the one given in Theorem 3.2.
Also, $\delta^2 > 0$ unless $\eta$ is in the column space of $J$.

Results analogous to Theorems 4.2 and 4.3, which concern local power comparison of ELR tests, also hold for Type I censored data. We omit them since they are straightforward extensions.

5.7 Other inference tasks

As we have seen, the above inference framework for multiple Type I censored samples under the DRM centers on the DPEL (5.10), a function that looks remarkably similar to the DEL (2.11) for complete data under the DRM. In Section 5.5, we showed that the DPEL has asymptotic properties similar to those of the DEL. Based on these properties, the DEL ratio test that is available for complete data can be adapted to the case of Type I censored data. In fact, with the DPEL, we can show that the results on EL quantile estimation given by Chen and Liu (2013) and on density estimation given by Fokianos (2004), all of which are derived under the DRM for complete samples, also extend to the Type I censored case. Similarly, based on the EL inference framework given in this chapter, any EL inference result that holds for complete samples under the DRM may be extended to the case of Type I censored samples.

5.8 Proofs

We first introduce a few results and some more notation applicable to $k = 0, \ldots, m$. A common condition for the theorems of this chapter is $n_k/n = \rho_k + O(n^{-\delta})$ for some positive constant $\delta$. Without loss of generality, we assume $\delta \le 1/3$: if the condition holds for some positive constant $\delta$, it also holds for every smaller positive $\delta$. Recall that $\varsigma_k = \Pr(X_{k1} \in S_k)$, $\tilde n_k$ is the number of uncensored observations in the $k$th sample, and $\tilde n = \sum_{k=0}^{m}\tilde n_k$. By the law of the iterated logarithm,
$$ \tilde n_k/n_k = n_k^{-1}\sum_{j=1}^{n_k}\mathbf{1}(x_{kj} \in S_k) = \varsigma_k + O(n^{-1/2}\log\log n). $$
Therefore,
$$
\frac{\tilde n_k}{n} = \frac{\tilde n_k}{n_k}\cdot\frac{n_k}{n}
= \big(\varsigma_k + O(n^{-1/2}\log\log n)\big)\big(\rho_k + O(n^{-\delta})\big)
= \rho_k\varsigma_k + O(n^{-\delta}).
\tag{5.17}
$$
We use the symbol $\sum_{k,j}$ to denote $\sum_{k=0}^{m}\sum_{j=1}^{\tilde n_k}$, the sum over all $k$ and over $j = 1, \ldots, \tilde n_k$ for each given $k$. Recall that $\varphi_k(\vartheta, x) = \exp\{\kappa_k + \beta_k^\top q(x)\}$ and $\tilde\lambda_k = \tilde n_k/\tilde n$.
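Write
$$ L_{n,k}(\vartheta, x) = -\log\Big\{\sum_{r=0}^{m}(\tilde n_r/\tilde n)\varphi_r(\vartheta, x)\mathbf{1}(x \in S_r)\Big\} + \{\kappa_k + \beta_k^\top q(x)\}. $$
The DPEL (5.10) can be written as $\ell_n(\vartheta) = \sum_{k,j}L_{n,k}(\vartheta, \tilde x_{kj})$. Let $h_n(\vartheta, x)$, $s_n(\vartheta, x)$ and $H_n(\vartheta, x)$ be defined as the $h(\vartheta, x)$, $s(\vartheta, x)$ and $H(\vartheta, x)$ in (5.15) with $\rho_k\varsigma_k$ replaced by $\tilde n_k/n$. Since $\tilde n_k/n \to \rho_k\varsigma_k$ as $n \to \infty$, the functions $h(\vartheta, x)$, $s(\vartheta, x)$ and $H(\vartheta, x)$ are the limits of $h_n(\vartheta, x)$, $s_n(\vartheta, x)$ and $H_n(\vartheta, x)$, respectively. The first order derivatives of $L_{n,k}(\vartheta, x)$ can be written as
$$
\begin{aligned}
\partial L_{n,k}(\vartheta, x)/\partial\kappa &= (1 - \delta_{k0})e_k - h_n(\vartheta, x)/s_n(\vartheta, x), \\
\partial L_{n,k}(\vartheta, x)/\partial\beta &= \{\partial L_{n,k}(\vartheta, x)/\partial\kappa\} \otimes q(x).
\end{aligned}
\tag{5.18}
$$
Similarly, we have
$$
\begin{aligned}
\partial^2 L_{n,k}(\vartheta, x)/\partial\kappa\,\partial\kappa^\top &= -H_n(\vartheta, x)/s_n(\vartheta, x), \\
\partial^2 L_{n,k}(\vartheta, x)/\partial\beta\,\partial\beta^\top &= -\{H_n(\vartheta, x)/s_n(\vartheta, x)\} \otimes \{q(x)q^\top(x)\}, \\
\partial^2 L_{n,k}(\vartheta, x)/\partial\kappa\,\partial\beta^\top &= -\{H_n(\vartheta, x)/s_n(\vartheta, x)\} \otimes q^\top(x).
\end{aligned}
\tag{5.19}
$$
Note that all entries of $h_n(\vartheta, x)$ are non-negative, and $s_n(\vartheta, x)$ exceeds the sum of all entries of $h_n(\vartheta, x)$. Thus $\|h_n(\vartheta, x)/s_n(\vartheta, x)\| \le 1$, and the absolute value of each entry of $H_n(\vartheta, x)/s_n(\vartheta, x)$ is bounded by 1. A close examination of the algebraic expressions shows that this implies
$$
\begin{aligned}
\big|\partial^2 L_{n,k}(\vartheta, x)/\partial\vartheta_i\,\partial\vartheta_j\big| &\le 1 + q^\top(x)q(x), \\
\big|\partial^3 L_{n,k}(\vartheta, x)/\partial\vartheta_i\,\partial\vartheta_j\,\partial\vartheta_l\big| &\le \{1 + q^\top(x)q(x)\}^{3/2},
\end{aligned}
\tag{5.20}
$$
where $\vartheta_i$ denotes the $i$th entry of $\vartheta$. These inequalities are just a "censored" version of inequalities (2.17) for complete data.

Let $L_k(\vartheta, x)$ be the "population" version of $L_{n,k}(\vartheta, x)$, obtained by replacing $\tilde n_r/n$ with its limit $\rho_r\varsigma_r$ in the above definition. The first and second order derivatives of $L_k(\vartheta, x)$ are the same as those of $L_{n,k}(\vartheta, x)$ with $h_n(\vartheta, x)$, $s_n(\vartheta, x)$ and $H_n(\vartheta, x)$ replaced by $h(\vartheta, x)$, $s(\vartheta, x)$ and $H(\vartheta, x)$.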
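As a sanity check on the first-order derivatives (5.18), the R sketch below compares the analytic gradient with central finite differences. The setting is assumed purely for illustration: $m = 2$, scalar basis function $q(x) = x$, fixed weights `w` standing in for the sample proportions, and hypothetical ranges $S_r = [0, \text{upper}_r]$. The function names are not part of any package.

```r
# Illustrative setting (assumed): m = 2 non-baseline samples, scalar q(x) = x.
q_fun <- function(x) x
w     <- c(0.5, 0.3, 0.2)   # stand-ins for the sample proportions, r = 0, 1, 2
upper <- c(3, 2.5, 2)       # hypothetical S_r = [0, upper_r], each within S_0

# L_{n,k}(theta, x) with theta = (kappa_1, kappa_2, beta_1, beta_2);
# the baseline has kappa_0 = 0 and beta_0 = 0.
L_nk <- function(theta, x, k) {
  kappa <- c(0, theta[1:2])
  beta  <- c(0, theta[3:4])
  phi   <- exp(kappa + beta * q_fun(x))
  ind   <- as.numeric(x >= 0 & x <= upper)
  -log(sum(w * phi * ind)) + kappa[k + 1] + beta[k + 1] * q_fun(x)
}

# Analytic gradient following (5.18).
grad_L_nk <- function(theta, x, k) {
  kappa <- c(0, theta[1:2])
  beta  <- c(0, theta[3:4])
  phi   <- exp(kappa + beta * q_fun(x))
  ind   <- as.numeric(x >= 0 & x <= upper)
  s_n   <- sum(w * phi * ind)
  h_n   <- (w * phi * ind)[-1]           # entries for r = 1, ..., m
  e_k   <- as.numeric(seq_len(2) == k)   # zero vector when k = 0
  d_kappa <- e_k - h_n / s_n
  c(d_kappa, d_kappa * q_fun(x))         # Kronecker product with scalar q(x)
}

# Central finite differences for comparison.
num_grad <- function(theta, x, k, eps = 1e-6) {
  sapply(seq_along(theta), function(i) {
    step <- replace(numeric(length(theta)), i, eps)
    (L_nk(theta + step, x, k) - L_nk(theta - step, x, k)) / (2 * eps)
  })
}

theta0 <- c(0.2, -0.1, 0.5, -0.3)
gaps <- sapply(0:2, function(k)
  max(abs(grad_L_nk(theta0, x = 1.7, k = k) - num_grad(theta0, x = 1.7, k = k))))
```

Each entry of `gaps` should be close to zero. Because the ratio $h_n/s_n$ is unchanged when all weights are rescaled by a common factor, the check does not depend on whether the weights are normalized by $n$ or by $\tilde n$.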
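Also, $L_k(\vartheta, x)$ satisfies inequalities (5.20).

5.8.1 Lemma 5.2: Properties of the partial information matrix

Since $\ell_n(\vartheta) = \sum_{k,j}L_{n,k}(\vartheta, \tilde x_{kj})$ and $\tilde n_k/n = \rho_k\varsigma_k + o(1)$, we have
$$
U_n = -\frac{1}{n}\frac{\partial^2\ell_n(\vartheta^*)}{\partial\vartheta\,\partial\vartheta^\top}
= -\sum_{k=0}^{m}(\rho_k\varsigma_k + o(1))\Big\{\frac{1}{\tilde n_k}\sum_{j=1}^{\tilde n_k}\frac{\partial^2 L_{n,k}(\vartheta^*, \tilde x_{kj})}{\partial\vartheta\,\partial\vartheta^\top}\Big\}.
\tag{5.21}
$$
Note that, for any given $k$, the sum in the curly brackets is not a sum of iid random variables, because the random variable $\tilde n_k$ is present in the expressions of $h_n(\vartheta, x)$, $s_n(\vartheta, x)$ and $H_n(\vartheta, x)$, which appear in the expression (5.19) of $\partial^2 L_{n,k}(\vartheta^*, \tilde x_{kj})/\partial\vartheta\,\partial\vartheta^\top$. This rules out a simple application of the law of large numbers. However, as we will show later,
$$
\frac{1}{\tilde n_k}\sum_{j=1}^{\tilde n_k}\frac{\partial^2 L_{n,k}(\vartheta^*, \tilde x_{kj})}{\partial\vartheta\,\partial\vartheta^\top}
= \frac{1}{\tilde n_k}\sum_{j=1}^{\tilde n_k}\frac{\partial^2 L_k(\vartheta^*, \tilde x_{kj})}{\partial\vartheta\,\partial\vartheta^\top} + o(1).
\tag{5.22}
$$
With $\{\tilde n_k/n\}$ replaced by the constants $\{\rho_k\varsigma_k\}$ in the expression of the second order derivatives of $L_k(\vartheta, x)$, the sum on the RHS of the above equality is a sum of iid random variables. By inequalities (5.20), each summand on the RHS is dominated by an integrable function. Hence, by the strong law of large numbers, the first term on the RHS satisfies
$$
\frac{1}{\tilde n_k}\sum_{j=1}^{\tilde n_k}\frac{\partial^2 L_k(\vartheta^*, \tilde x_{kj})}{\partial\vartheta\,\partial\vartheta^\top}
= \tilde E_k\frac{\partial^2 L_k(\vartheta^*, x)}{\partial\vartheta\,\partial\vartheta^\top} + o(1),
$$
where $\tilde E_k(\cdot)$ is the expectation operator with respect to $\tilde F_k$. Consequently, (5.21) simplifies to
$$
U_n = -\sum_{k=0}^{m}(\rho_k\varsigma_k + o(1))\Big\{\tilde E_k\frac{\partial^2 L_k(\vartheta^*, x)}{\partial\vartheta\,\partial\vartheta^\top} + o(1)\Big\}
= -\sum_{k=0}^{m}\rho_k\varsigma_k\tilde E_k\frac{\partial^2 L_k(\vartheta^*, x)}{\partial\vartheta\,\partial\vartheta^\top} + o(1).
$$
Therefore $U_n$ converges almost surely to
$$ U = -\sum_{k=0}^{m}\rho_k\varsigma_k\tilde E_k\frac{\partial^2 L_k(\vartheta^*, x)}{\partial\vartheta\,\partial\vartheta^\top}. $$
Recall that when $S_k \subseteq S_0$, the $\{\tilde F_k\}$ satisfy the VSDRM (5.13), so the above expression of $U$ can be further written as
$$ U = -\sum_{k=0}^{m}\rho_k\varsigma_k\tilde E_0\Big\{\frac{\partial^2 L_k(\vartheta^*, x)}{\partial\vartheta\,\partial\vartheta^\top}\varphi_k(\vartheta^*, x)\mathbf{1}(x \in S_k)\Big\}. $$
By expressions (5.19) of $\partial^2 L_k(\vartheta^*, x)/\partial\vartheta\,\partial\vartheta^\top$, we have
$$
\begin{aligned}
U_{\kappa\kappa} &= \sum_{k=0}^{m}\rho_k\varsigma_k\tilde E_0\big\{\{H(\vartheta^*, x)/s(\vartheta^*, x)\}\varphi_k(\vartheta^*, x)\mathbf{1}(x \in S_k)\big\} \\
&= \tilde E_0\Big\{\{H(\vartheta^*, x)/s(\vartheta^*, x)\}\Big\{\sum_{k=0}^{m}\rho_k\varsigma_k\varphi_k(\vartheta^*, x)\mathbf{1}(x \in S_k)\Big\}\Big\} \\
&= \tilde E_0\{H(\vartheta^*, x)\},
\end{aligned}
$$
where the last equality follows from $s(\vartheta^*, x) = \sum_{k=0}^{m}\rho_k\varsigma_k\varphi_k(\vartheta^*, x)\mathbf{1}(x \in S_k)$. Recall that $d\tilde F_0(x) = \varsigma_0^{-1}\mathbf{1}(x \in S_0)\,dF_0(x)$ by (5.12), and $S_k \subseteq S_0$ for all $k$.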
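The above expression of $U_{\kappa\kappa}$ therefore simplifies to
$$ U_{\kappa\kappa} = \tilde E_0\{H(\vartheta^*, x)\} = \varsigma_0^{-1}E_0\{H(\vartheta^*, x)\}. $$
As a reminder, $E_k(\cdot)$ in general is the expectation operator with respect to $F_k$. Similarly, we find the expressions for $U_{\beta\beta}$ and $U_{\kappa\beta}$ given in (5.16).

That $U$ and $U_{\kappa\kappa}$ are positive definite can be shown by an argument similar to that for the positive definiteness of $U$ and $U_{\alpha\alpha}$ in the proof of Lemma 2.1 (Section 2.5.1).

To complete the proof, we show the matrix equality (5.22) block by block. First we show that
$$
\frac{1}{\tilde n_k}\sum_{j=1}^{\tilde n_k}\frac{\partial^2 L_{n,k}(\vartheta^*, \tilde x_{kj})}{\partial\kappa\,\partial\kappa^\top}
= \frac{1}{\tilde n_k}\sum_{j=1}^{\tilde n_k}\frac{\partial^2 L_k(\vartheta^*, \tilde x_{kj})}{\partial\kappa\,\partial\kappa^\top} + o_p(1).
\tag{5.23}
$$
Define, for $k = 0, 1, \ldots, m$,
$$
\phi_k(x) = \varphi_k(\vartheta^*, x)\mathbf{1}(x \in S_k), \qquad
\Delta_k = \tilde n_k/n - \rho_k\varsigma_k, \qquad
\Delta = \max_{0\le k\le m}|\Delta_k|.
$$
By (5.17), $\Delta_k = O(n^{-\delta})$ for each $k$ and $\Delta = O(n^{-\delta})$. By expressions (5.19), the element in the $i$th row and $t$th column, $i, t \in \{1, \ldots, m\}$ and $i \neq t$, of the LHS matrix of equality (5.23) can be written as
$$
\frac{1}{\tilde n_k}\sum_{j=1}^{\tilde n_k}\frac{\partial^2 L_{n,k}(\vartheta^*, \tilde x_{kj})}{\partial\kappa_i\,\partial\kappa_t}
= \frac{1}{\tilde n_k}\sum_{j=1}^{\tilde n_k}\frac{(\rho_i\varsigma_i + \Delta_i)\phi_i(\tilde x_{kj})(\rho_t\varsigma_t + \Delta_t)\phi_t(\tilde x_{kj})}{s_n^2(\vartheta^*, \tilde x_{kj})}.
\tag{5.24}
$$
Note that
$$ \frac{1}{s_n^2(\vartheta^*, x)} = \frac{1}{s^2(\vartheta^*, x)}\cdot\frac{1}{\{1 + (\sum_{r=0}^{m}\Delta_r\phi_r(x))/s(\vartheta^*, x)\}^2}. $$
The key step is a Taylor expansion of the second factor on the RHS of the above equality,
$$
\frac{1}{\{1 + (\sum_{r=0}^{m}\Delta_r\phi_r(x))/s(\vartheta^*, x)\}^2}
= 1 - \frac{2\Delta\{\sum_{r=0}^{m}(\Delta_r/\Delta)\phi_r(x)\}/s(\vartheta^*, x)}{\{1 + a_n(\sum_{r=0}^{m}\Delta_r\phi_r(x))/s(\vartheta^*, x)\}^3},
$$
where $a_n$ is a non-random number in the interval $[0, 1]$ that may change with the total sample size $n$. With this expansion and (5.24), we get
$$
\frac{1}{\tilde n_k}\sum_{j=1}^{\tilde n_k}\frac{\partial^2 L_{n,k}(\vartheta^*, \tilde x_{kj})}{\partial\kappa_i\,\partial\kappa_t}
= (\rho_i\varsigma_i + \Delta_i)(\rho_t\varsigma_t + \Delta_t)\Big\{\frac{1}{\tilde n_k}\sum_{j=1}^{\tilde n_k}\frac{\phi_i(\tilde x_{kj})\phi_t(\tilde x_{kj})}{s^2(\vartheta^*, \tilde x_{kj})} - R_n\Big\},
\tag{5.25}
$$
where
$$
R_n = \frac{2\Delta}{\tilde n_k}\sum_{j=1}^{\tilde n_k}\Big\{\frac{\phi_i(\tilde x_{kj})\phi_t(\tilde x_{kj})}{s^2(\vartheta^*, \tilde x_{kj})}\cdot\frac{(\sum_{r=0}^{m}(\Delta_r/\Delta)\phi_r(\tilde x_{kj}))/s(\vartheta^*, \tilde x_{kj})}{\{1 + a_n(\sum_{r=0}^{m}\Delta_r\phi_r(\tilde x_{kj}))/s(\vartheta^*, \tilde x_{kj})\}^3}\Big\}.
\tag{5.26}
$$
The first term in the curly brackets on the RHS of (5.25) is the average of a sum of iid random variables. Moreover, by the definition (5.15) of $s(\vartheta, x)$, and recalling $\phi_k(x) = \varphi_k(\vartheta^*, x)\mathbf{1}(x \in S_k)$, we have that, for all $x$ and every $k \in \{0, 1, \ldots, m\}$,
$$ 0 < \phi_k(x)/s(\vartheta^*, x) < 1/(\rho_k\varsigma_k) \le 1/\big(\min_{0\le i\le m}\rho_i\varsigma_i\big). $$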
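(5.27)

Thus for all $x$ and any $i, t \in \{0, 1, \ldots, m\}$,
$$ 0 < \phi_i(x)\phi_t(x)/s^2(\vartheta^*, x) < 1/\big(\min_{0\le i\le m}\rho_i\varsigma_i\big)^2. $$
We can therefore invoke the strong law of large numbers to conclude that the first term in the curly brackets on the RHS of (5.25) is $O(1)$. The second term $R_n$, as we will show shortly, is $o(1)$. Hence, since $\Delta_k = O(n^{-\delta}) = o(1)$, we have
$$
\frac{1}{\tilde n_k}\sum_{j=1}^{\tilde n_k}\frac{\partial^2 L_{n,k}(\vartheta^*, \tilde x_{kj})}{\partial\kappa_i\,\partial\kappa_t}
= \frac{1}{\tilde n_k}\sum_{j=1}^{\tilde n_k}\frac{\rho_i\varsigma_i\phi_i(\tilde x_{kj})\,\rho_t\varsigma_t\phi_t(\tilde x_{kj})}{s^2(\vartheta^*, \tilde x_{kj})} + o(1)
= \frac{1}{\tilde n_k}\sum_{j=1}^{\tilde n_k}\frac{\partial^2 L_k(\vartheta^*, \tilde x_{kj})}{\partial\kappa_i\,\partial\kappa_t} + o(1).
$$
Similarly, for $i = 1, \ldots, m$, we can show
$$
\frac{1}{\tilde n_k}\sum_{j=1}^{\tilde n_k}\frac{\partial^2 L_{n,k}(\vartheta^*, \tilde x_{kj})}{\partial\kappa_i^2}
= \frac{1}{\tilde n_k}\sum_{j=1}^{\tilde n_k}\frac{\partial^2 L_k(\vartheta^*, \tilde x_{kj})}{\partial\kappa_i^2} + o(1).
$$
Therefore (5.23) holds.

By expressions (5.19) and the fact that $q(x)q^\top(x)$ is an integrable function, we can show that equalities similar to (5.23) hold for $\partial^2 L_{n,k}(\vartheta^*, \tilde x_{kj})/\partial\beta\,\partial\beta^\top$ and $\partial^2 L_{n,k}(\vartheta^*, \tilde x_{kj})/\partial\kappa\,\partial\beta^\top$. Hence (5.22) is true and the lemma is proved.

To finish, we show that $R_n$ is $o(1)$. Recalling from (5.15) that $s(\vartheta^*, x)$ is the sum of the non-negative terms $\rho_r\varsigma_r\phi_r(x)$ over $r = 0, 1, \ldots, m$, we have, for all $x$,
$$
\Big|\frac{\sum_{r=0}^{m}(\Delta_r/\Delta)\phi_r(x)}{s(\vartheta^*, x)}\Big|
\le \frac{\sum_{r=0}^{m}|\Delta_r/\Delta|\phi_r(x)}{s(\vartheta^*, x)}
\le \frac{\sum_{r=0}^{m}\phi_r(x)}{s(\vartheta^*, x)}
\le \frac{1}{\min_{0\le i\le m}\rho_i\varsigma_i}.
\tag{5.28}
$$
Consequently, for all $x$,
$$
\Big|\frac{\sum_{r=0}^{m}\Delta_r\phi_r(x)}{s(\vartheta^*, x)}\Big|
= |\Delta|\,\Big|\frac{\sum_{r=0}^{m}(\Delta_r/\Delta)\phi_r(x)}{s(\vartheta^*, x)}\Big|
\le \frac{\Delta}{\min_{0\le i\le m}\rho_i\varsigma_i}.
$$
Since $\Delta = o(1)$ and $0 \le a_n \le 1$ for all $n$, we can find an $N$ such that, whenever $n > N$ and uniformly in $x$,
$$ \Big|a_n\Big(\sum_{r=0}^{m}\Delta_r\phi_r(x)\Big)/s(\vartheta^*, x)\Big| < 1/2, $$
and so
$$ \frac{2}{3} < \frac{1}{1 + a_n(\sum_{r=0}^{m}\Delta_r\phi_r(x))/s(\vartheta^*, x)} < 2. \tag{5.29} $$
Therefore, by bounds (5.27), (5.28), (5.29) and the expression (5.26) of $R_n$, we have, for all $n$ large enough,
$$ |R_n| < \frac{16\Delta}{(\min_{0\le i\le m}\rho_i\varsigma_i)^3}. $$
Since $\Delta = o(1)$, we have $R_n = o(1)$. The proof is now complete.

5.8.2 Lemma 5.3: Asymptotic properties of the score function

For ease of presentation, we first prove the lemma for $\delta = 1/3$, i.e. when $n_k/n = \rho_k + O(n^{-1/3})$. Then we show that the lemma also holds for arbitrary $0 < \delta < 1/3$.

Recall that $v = n^{-1/2}\,\partial\ell_n(\vartheta^*)/\partial\vartheta = n^{-1/2}\sum_{k,j}\{\partial L_{n,k}(\vartheta^*, \tilde x_{kj})/\partial\vartheta\}$. We first show that $v$ can be centered in a particular sense.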
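For a function $g(x)$ that functionally involves the random variables $\{\tilde n_k\}$, we use $\widetilde{\int} g(x)\,d\tilde F_r(x)$, $r = 0, \ldots, m$, to denote an integral in which the $\{\tilde n_k\}$ are treated as non-random constants. For example, for the previously defined $s_n(\vartheta^*, x) = \sum_{k=0}^{m}(\tilde n_k/n)\phi_k(x)$,
$$ \widetilde{\int} s_n(\vartheta^*, x)\,d\tilde F_r(x) = \sum_{k=0}^{m}(\tilde n_k/n)\int\phi_k(x)\,d\tilde F_r(x). $$
By the VSDRM assumption (5.13) and expression (5.18), we find
$$ \sum_{k=0}^{m}(\tilde n_k/n)\widetilde{\int}\{\partial L_{n,k}(\vartheta^*, x)/\partial\vartheta\}\,d\tilde F_k = 0. $$
Hence, $v$ can be "centered" as
$$
v = n^{-1/2}\sum_{k,j}\partial L_{n,k}(\vartheta^*, \tilde x_{kj})/\partial\vartheta
= \sum_{k=0}^{m}\frac{\sqrt{\tilde n_k}}{\sqrt{n}}\Big\{\frac{1}{\sqrt{\tilde n_k}}\sum_{j=1}^{\tilde n_k}\Big(\frac{\partial L_{n,k}(\vartheta^*, \tilde x_{kj})}{\partial\vartheta} - \widetilde{\int}\frac{\partial L_{n,k}(\vartheta^*, x)}{\partial\vartheta}\,d\tilde F_k\Big)\Big\}.
\tag{5.30}
$$
In the second step, we show that for each given $k$, $k = 0, \ldots, m$, the term in the curly brackets of (5.30) satisfies
$$
\frac{1}{\sqrt{\tilde n_k}}\sum_{j=1}^{\tilde n_k}\Big(\frac{\partial L_{n,k}(\vartheta^*, \tilde x_{kj})}{\partial\vartheta} - \widetilde{\int}\frac{\partial L_{n,k}(\vartheta^*, x)}{\partial\vartheta}\,d\tilde F_k\Big)
= \frac{1}{\sqrt{\tilde n_k}}\sum_{j=1}^{\tilde n_k}\Big(\frac{\partial L_k(\vartheta^*, \tilde x_{kj})}{\partial\vartheta} - \tilde E_k\frac{\partial L_k(\vartheta^*, x)}{\partial\vartheta}\Big) + o_p(1).
\tag{5.31}
$$
By (5.18) and $\Delta_k = \tilde n_k/n - \rho_k\varsigma_k$, to show the above equality it is enough to show that, for any given $i \in \{1, 2, \ldots, m\}$,
$$
\frac{1}{\sqrt{\tilde n_k}}\sum_{j=1}^{\tilde n_k}\Big\{\frac{(\rho_i\varsigma_i + \Delta_i)\phi_i(\tilde x_{kj})}{s_n(\vartheta^*, \tilde x_{kj})} - \widetilde{\int}\frac{(\rho_i\varsigma_i + \Delta_i)\phi_i(x)}{s_n(\vartheta^*, x)}\,d\tilde F_k\Big\}
= \frac{1}{\sqrt{\tilde n_k}}\sum_{j=1}^{\tilde n_k}\Big\{\frac{\rho_i\varsigma_i\phi_i(\tilde x_{kj})}{s(\vartheta^*, \tilde x_{kj})} - \tilde E_k\frac{\rho_i\varsigma_i\phi_i(x)}{s(\vartheta^*, x)}\Big\} + o_p(1).
\tag{5.32}
$$
Note that
$$ \frac{1}{s_n(\vartheta^*, x)} = \frac{1}{s(\vartheta^*, x)}\cdot\frac{1}{1 + (\sum_{r=0}^{m}\Delta_r\phi_r(x))/s(\vartheta^*, x)}. $$
Recall that $\Delta = \max_{0\le k\le m}|\Delta_k|$. The second factor on the RHS of the above equality admits the expansion
$$
\frac{1}{1 + (\sum_{r=0}^{m}\Delta_r\phi_r(x))/s(\vartheta^*, x)}
= 1 - \frac{\sum_{r=0}^{m}\Delta_r\phi_r(x)}{s(\vartheta^*, x)} + \Delta^2 Q_n(x),
\tag{5.33}
$$
where
$$
Q_n(x) = \frac{(\sum_{r=0}^{m}(\Delta_r/\Delta)\phi_r(x))^2}{s^2(\vartheta^*, x)}\cdot\frac{1}{\{1 + a_n(\sum_{r=0}^{m}\Delta_r\phi_r(x))/s(\vartheta^*, x)\}^3},
$$
with $a_n$ being a non-random number in the interval $[0, 1]$. With the above expansion, we then have
$$
\frac{1}{\sqrt{\tilde n_k}}\sum_{j=1}^{\tilde n_k}\Big\{\frac{(\rho_i\varsigma_i + \Delta_i)\phi_i(\tilde x_{kj})}{s_n(\vartheta^*, \tilde x_{kj})} - \widetilde{\int}\frac{(\rho_i\varsigma_i + \Delta_i)\phi_i(x)}{s_n(\vartheta^*, x)}\,d\tilde F_k\Big\}
= (\rho_i\varsigma_i + \Delta_i)\Big(a - \sum_{r=0}^{m}\Delta_r b_r + \Delta^2 c\Big),
\tag{5.34}
$$
where
$$
\begin{aligned}
a &= \frac{1}{\sqrt{\tilde n_k}}\sum_{j=1}^{\tilde n_k}\Big\{\frac{\phi_i(\tilde x_{kj})}{s(\vartheta^*, \tilde x_{kj})} - \tilde E_k\frac{\phi_i(x)}{s(\vartheta^*, x)}\Big\}, \\
b_r &= \frac{1}{\sqrt{\tilde n_k}}\sum_{j=1}^{\tilde n_k}\Big\{\frac{\phi_i(\tilde x_{kj})\phi_r(\tilde x_{kj})}{s^2(\vartheta^*, \tilde x_{kj})} - \tilde E_k\frac{\phi_i(x)\phi_r(x)}{s^2(\vartheta^*, x)}\Big\}, \\
c &= \frac{1}{\sqrt{\tilde n_k}}\sum_{j=1}^{\tilde n_k}\Big\{\frac{\phi_i(\tilde x_{kj})}{s(\vartheta^*, \tilde x_{kj})}Q_n(\tilde x_{kj}) - \widetilde{\int}\Big(\frac{\phi_i(x)}{s(\vartheta^*, x)}Q_n(x)\Big)\,d\tilde F_k\Big\}.
\end{aligned}
$$
Now, for any given $i$ and $k$, $\{\phi_i(\tilde x_{kj})\}_{j=1}^{\infty}$ is an iid sequence and, by (5.27),
$$ \phi_i^2(x)/s^2(\vartheta^*, x) < 1/\big(\min_{0\le r\le m}\rho_r\varsigma_r\big)^2, $$
so by the central limit theorem, the term $a$ is $O_p(1)$.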
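Similarly, $b_r$ is $O_p(1)$ for each $r$. Recalling that $\Delta_r = o(1)$, we have
$$ \sum_{r=0}^{m}\Delta_r b_r = o_p(1). $$
We then look at the term $c$. By bounds (5.27), (5.28) and (5.29), we have, for all large enough $n$ and all $x$,
$$ |\{\phi_i(x)/s(\vartheta^*, x)\}Q_n(x)| < 8/\big(\min_{0\le r\le m}\rho_r\varsigma_r\big)^3. $$
Consequently,
$$
\begin{aligned}
|c| &\le \frac{1}{\sqrt{\tilde n_k}}\sum_{j=1}^{\tilde n_k}\Big|\frac{\phi_i(\tilde x_{kj})}{s(\vartheta^*, \tilde x_{kj})}Q_n(\tilde x_{kj}) - \widetilde{\int}\Big(\frac{\phi_i(x)}{s(\vartheta^*, x)}Q_n(x)\Big)\,d\tilde F_k\Big| \\
&\le \frac{1}{\sqrt{\tilde n_k}}\sum_{j=1}^{\tilde n_k}\Big\{\Big|\frac{\phi_i(\tilde x_{kj})}{s(\vartheta^*, \tilde x_{kj})}Q_n(\tilde x_{kj})\Big| + \widetilde{\int}\Big|\frac{\phi_i(x)}{s(\vartheta^*, x)}Q_n(x)\Big|\,d\tilde F_k\Big\} \\
&< \frac{16\sqrt{n_k}}{(\min_{0\le l\le m}\rho_l\varsigma_l)^3}.
\end{aligned}
$$
Recall that $\Delta = O(n^{-\delta})$. When $\delta = 1/3$, we have
$$ \Delta^2 c = O(n^{-2/3})\cdot O(n_k^{1/2}) = o(1). $$
With the above orders of $a$, $b_r$ and $c$, and the expression (5.34), we see that equality (5.32) holds. The terms in the curly brackets on the RHS of (5.32) are iid across $j$, so the LHS of that equality has a normal limiting distribution. It follows that the term in the curly brackets of (5.30) has a normal limiting distribution:
$$
\frac{1}{\sqrt{\tilde n_k}}\sum_{j=1}^{\tilde n_k}\Big\{\frac{\partial L_{n,k}(\vartheta^*, \tilde x_{kj})}{\partial\vartheta} - \widetilde{\int}\frac{\partial L_{n,k}(\vartheta^*, x)}{\partial\vartheta}\,d\tilde F_k\Big\} \longrightarrow N(0, V_k)
$$
in distribution, where
$$
V_k = \tilde E_k\Big\{\frac{\partial L_k(\vartheta^*, x)}{\partial\vartheta}\frac{\partial L_k(\vartheta^*, x)}{\partial\vartheta^\top}\Big\} - \tilde E_k\Big\{\frac{\partial L_k(\vartheta^*, x)}{\partial\vartheta}\Big\}\tilde E_k\Big\{\frac{\partial L_k(\vartheta^*, x)}{\partial\vartheta^\top}\Big\}.
$$
By the above asymptotic normality, (5.30) and $\sqrt{\tilde n_k/n} = \sqrt{\rho_k\varsigma_k} + o(1)$, we have
$$ v \longrightarrow N(0, V) $$
with $V = \sum_{k=0}^{m}\rho_k\varsigma_k V_k$. As in the proof of Theorem 2.2 for complete data, we find $V = U - UWU$. The proof is complete.

Remark 5.1. In the above proof, we assumed $n_k/n = \rho_k + O(n^{-\delta})$ with $\delta = 1/3$. We now treat the general case $0 < \delta < 1/3$.

Recall that we have shown $\tilde n_k/n = \rho_k\varsigma_k + O(n^{-\delta})$. The only place where we used the order $O(n^{-1/3})$ for $\Delta_k = \tilde n_k/n - \rho_k\varsigma_k$ is in the proof of equality (5.32). The key to showing (5.32) lies in the expansion (5.33). Now suppose $\Delta_k = O(n^{-\delta})$ for some $1/4 > \delta > 0$. Let $T = \lceil 1/(2\delta)\rceil + 1$, where $\lceil\cdot\rceil$ is the ceiling function. We expand $1/\{1 + (\sum_{r=0}^{m}\Delta_r\phi_r(x))/s(\vartheta^*, x)\}$ to the $T$th order instead of the 2nd order as in (5.33). Then, similar to expansion (5.34),
$$
\frac{1}{\sqrt{\tilde n_k}}\sum_{j=1}^{\tilde n_k}\Big\{\frac{(\rho_i\varsigma_i + \Delta_i)\phi_i(\tilde x_{kj})}{s_n(\vartheta^*, \tilde x_{kj})} - \widetilde{\int}\frac{(\rho_i\varsigma_i + \Delta_i)\phi_i(x)}{s_n(\vartheta^*, x)}\,d\tilde F_k\Big\}
$$
correspondingly has a $T$th order expansion of the form
$$ (\rho_i\varsigma_i + \Delta_i)(a_1 + a_2 + \cdots + a_T + r_T a_{T+1}). $$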
(5.35)

The leading term $a_1$ is exactly the same as the leading term $a$ in (5.34). Each $a_t$, $t = 2, 3, \ldots, T$, just like the term $\sum_{r=0}^{m}\Delta_r b_r$ in (5.34), is by the multinomial theorem a finite sum of $o_p(1)$ terms, and so is also $o_p(1)$. Lastly, the residual term $a_{T+1}$, similar to the $c$ in (5.34), can be shown to be bounded by
$$ \frac{2^{T+2}\sqrt{n_k}}{(\min_{0\le r\le m}\rho_r\varsigma_r)^{T+1}}. $$
This bound, along with $\Delta = \max_{0\le r\le m}|\Delta_r| = O(n^{-\delta})$ and $T = \lceil 1/(2\delta)\rceil + 1$, gives us
$$ \Delta^T a_{T+1} = o(1). $$
Then, in view of (5.35), we conclude that, even when $n_k/n = \rho_k + O(n^{-\delta})$ for an arbitrary $\delta > 0$, equality (5.32) is still true and so Lemma 5.3 still holds.

5.8.3 Theorem 5.4: Asymptotic normality of the MPELE

Based on the properties of the partial information matrix (Lemma 5.2) and the score function (Lemma 5.3), and arguing as in the proof of Lemma 2.4 for complete data (Section 2.5.2), we can show that the DPEL $\ell_n(\vartheta)$ attains a maximum in the interior of a $\sqrt[3]{n}$-neighbourhood of the true parameter value $\vartheta^*$. Together with the concavity of the DPEL, this implies that all the maxima of the DPEL must lie in the interior of that neighbourhood. Consequently, the MPELE, which is a maximum of the DPEL, must be $\sqrt[3]{n}$-consistent.

Recall that the partial empirical information matrix is defined as
$$ U_n = -n^{-1}\,\partial^2\ell_n(\vartheta^*)/\partial\vartheta\,\partial\vartheta^\top. $$
Expanding $n^{-1/2}\,\partial\ell_n(\hat\vartheta)/\partial\vartheta$ around $\vartheta^*$, we get
$$ n^{-1/2}\,\partial\ell_n(\hat\vartheta)/\partial\vartheta = v - U_n\{\sqrt{n}(\hat\vartheta - \vartheta^*)\} + o_p(1), $$
where the third-order remainder term is $o_p(1)$ because $\hat\vartheta$ is $\sqrt[3]{n}$-consistent and the third-order derivatives of the DPEL $\ell_n(\vartheta)$ are bounded by an integrable function, as implied by (5.20). Since $\hat\vartheta$ is the point at which $\ell_n(\vartheta)$ is maximized, the LHS of the above expansion is 0. Rearranging terms and recalling that $U_n$ converges to $U$ almost surely by Lemma 5.2, we get
$$ \sqrt{n}(\hat\vartheta - \vartheta^*) = U^{-1}v + o_p(1). \tag{5.36} $$
The claimed asymptotic normality of $\sqrt{n}(\hat\vartheta - \vartheta^*)$ then follows from the asymptotic normality of $v$ given by Lemma 5.3: the limiting covariance matrix is $U^{-1}VU^{-1} = U^{-1}(U - UWU)U^{-1} = U^{-1} - W$.

5.8.4 Theorem 5.5: Asymptotic properties of the ELR test

Proof of Theorem 5.5 (i).
Recall that the ELR statistic $R_n$ equals the DPEL ratio statistic given by $R_n = 2\{\ell_n(\hat\vartheta) - \ell_n(\tilde\vartheta)\}$. As in the proof of Theorem 3.1 for complete data, the idea is to find suitable quadratic expansions for $\ell_n(\hat\vartheta)$ and $\ell_n(\tilde\vartheta)$ under the null model, and show that the difference of the two, which equals the ELR statistic $R_n$, has a chi-square limiting distribution.

Expanding $\ell_n(\hat\vartheta)$ around $\vartheta^*$, we get
$$ \ell_n(\hat\vartheta) = \ell_n(\vartheta^*) + \sqrt{n}\,v^\top(\hat\vartheta - \vartheta^*) - (1/2)\,n(\hat\vartheta - \vartheta^*)^\top U_n(\hat\vartheta - \vartheta^*) + o_p(1), $$
where the last term is $o_p(1)$ since $\hat\vartheta - \vartheta^* = O_p(n^{-1/2})$ and the third derivatives of $\ell_n(\vartheta)$ are bounded by an integrable function. Combining the above expansion with (5.36) and using the fact that $U_n = U + o_p(1)$, we obtain
$$ \ell_n(\hat\vartheta) = \ell_n(\vartheta^*) + (1/2)\,v^\top U^{-1}v + o_p(1). $$
We then give an expansion of $\ell_n(\tilde\vartheta)$ under the null model $g(\beta) = 0$. As noted in Section 3.2.2, when $q < md$, the null model is equivalent to $\beta = G(\gamma)$ for some function $G : \mathbb{R}^{md-q} \to \mathbb{R}^{md}$ and parameter $\gamma$ of dimension $md - q$. In addition, $G$ is thrice differentiable, and its Jacobian matrix $J = \partial G(\gamma^*)/\partial\gamma$ is of full rank. Using exactly the same technique as in the proof of Theorem 3.1, we find the following expansion for the DPEL under the null model:
$$ \ell_n(\tilde\vartheta) = \ell_n(\vartheta^*) + (1/2)\,\tilde v^\top\tilde U^{-1}\tilde v + o_p(1), $$
where $\tilde v = \{\mathrm{diag}(I_m, J)\}^\top v$ and $\tilde U = \{\mathrm{diag}(I_m, J)\}^\top U\{\mathrm{diag}(I_m, J)\}$.

With the above expansions of $\ell_n(\hat\vartheta)$ and $\ell_n(\tilde\vartheta)$, we then get
$$ R_n = 2\{\ell_n(\hat\vartheta) - \ell_n(\tilde\vartheta)\} = v^\top U^{-1}v - \tilde v^\top\tilde U^{-1}\tilde v + o_p(1). $$
Applying the quadratic form decomposition formula given in Lemma 3.4 to the above two quadratic forms and cancelling terms, $R_n$ finally simplifies to
$$ R_n = \tilde\xi^\top\{\tilde\Lambda^{-1} - J(J^\top\tilde\Lambda J)^{-1}J^\top\}\tilde\xi + o_p(1), \tag{5.37} $$
where $\tilde\xi = (-U_{\beta\kappa}U_{\kappa\kappa}^{-1}, I_{md})\,v$ and $\tilde\Lambda = U_{\beta\beta} - U_{\beta\kappa}U_{\kappa\kappa}^{-1}U_{\kappa\beta}$ is defined in Theorem 5.5 (ii). The above expression of $R_n$ is the same as that of the DEL ratio statistic based on complete samples given in (3.10), so by the same technique of proof, we find that $R_n$ has the claimed chi-square limiting distribution.

Proof of Theorem 5.5 (ii).
Let $\beta^*$ be a specific parameter value under the null hypothesis and $\{F_k\}$ be the corresponding distribution functions. Let $\{G_k\}$ be the set of distribution functions satisfying the DRM with parameter given by the alternative model $\beta_k = \beta_k^* + n_k^{-1/2}c_k$, $k = 1, \ldots, m$, and $G_0 = F_0$. Denote the distributions of the uncensored observations under the null model and the local alternative model by $\{\tilde F_k\}$ and $\{\tilde G_k\}$, respectively. When the samples are generated from the $\{G_k\}$, the ELR statistic still follows the expansion (5.37), just as shown in the proof of Theorem 3.2 for complete samples. The limiting distribution of $R_n$ is therefore again determined by that of $v = n^{-1/2}\,\partial\ell_n(\vartheta^*)/\partial\vartheta$, which can be found by using Le Cam's third lemma.

We now derive the limiting distribution of $v$ under the local alternative distributions $\{G_k\}$. Let $\tilde w_k = \sum_{j=1}^{\tilde n_k}\log\{d\tilde G_k(\tilde x_{kj})/d\tilde F_k(\tilde x_{kj})\}$. Note that $v$ involves only the uncensored observations $\{\tilde x_{kj}\}$, so by Le Cam's third lemma, the key to finding this limiting distribution lies in finding the joint limiting distribution of $v$ and $\sum_{k=0}^{m}\tilde w_k$ under the null distributions $\{\tilde F_k\}$ of the uncensored observations.

We first work on an expansion for $\sum_{k=0}^{m}\tilde w_k$. For each $k = 0, 1, \ldots, m$, let $\widetilde{\mathrm{Var}}_k(\cdot)$ and $\widetilde{\mathrm{Cov}}_k(\cdot)$ be the variance and covariance operators with respect to $\tilde F_k$, respectively. Just as in the proof of Lemma 3.5 for complete data, we find that
$$ \log\{d\tilde G_k(x)/d\tilde F_k(x)\} = n_k^{-1/2}c_k^\top\{q(x) - \tilde\nu_k\} - (2n_k)^{-1}c_k^\top\tilde\sigma_k c_k + O(n^{-3/2}) $$
uniformly in $x$, where $\tilde\nu_k = \tilde E_k q(x)$ and $\tilde\sigma_k = \widetilde{\mathrm{Var}}_k(q(x))$.
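Therefore we have
$$
\begin{aligned}
\tilde w_k &= \sum_{j=1}^{\tilde n_k}\log\{d\tilde G_k(\tilde x_{kj})/d\tilde F_k(\tilde x_{kj})\} \\
&= (\tilde n_k/n_k)^{1/2}c_k^\top\Big\{\tilde n_k^{-1/2}\sum_{j=1}^{\tilde n_k}\{q(\tilde x_{kj}) - \tilde\nu_k\}\Big\} - (1/2)(\tilde n_k/n_k)\,c_k^\top\tilde\sigma_k c_k + O(n^{-1/2}) \\
&= \varsigma_k^{1/2}c_k^\top\Big\{\tilde n_k^{-1/2}\sum_{j=1}^{\tilde n_k}\{q(\tilde x_{kj}) - \tilde\nu_k\}\Big\} - (1/2)\,\varsigma_k c_k^\top\tilde\sigma_k c_k + o_p(1),
\end{aligned}
$$
where the last equality follows from $\tilde n_k/n_k = \varsigma_k + o(1)$ and $\tilde n_k^{-1/2}\sum_{j=1}^{\tilde n_k}\{q(\tilde x_{kj}) - \tilde\nu_k\} = O_p(1)$.

With the above expansion of $\tilde w_k$, the expression (5.30) of $v$, the expansion (5.31) and the fact that $\tilde n_k/n = \rho_k\varsigma_k + o(1)$, we get the following joint expansion for $v$ and $\sum_{k=0}^{m}\tilde w_k$:
$$
\begin{pmatrix} v \\ \sum_k\tilde w_k \end{pmatrix}
= \sum_{k=0}^{m}\frac{1}{\sqrt{\tilde n_k}}\sum_{j=1}^{\tilde n_k}
\begin{pmatrix}
(\rho_k\varsigma_k)^{1/2}\big\{\partial L_k(\vartheta^*, \tilde x_{kj})/\partial\vartheta - \tilde E_k\{\partial L_k(\vartheta^*, x)/\partial\vartheta\}\big\} \\
\varsigma_k^{1/2}c_k^\top\{q(\tilde x_{kj}) - \tilde\nu_k\}
\end{pmatrix}
- \sum_{k=0}^{m}\begin{pmatrix} 0 \\ (1/2)\,\varsigma_k c_k^\top\tilde\sigma_k c_k \end{pmatrix}
+ o_p(1).
$$
Hence $v$ and $\sum_{k=0}^{m}\tilde w_k$ have a joint normal limiting distribution with mean vector and covariance matrix given by
$$
\Big(0^\top,\ -\frac{1}{2}\sum_k\varsigma_k c_k^\top\tilde\sigma_k c_k\Big)^\top
\quad\text{and}\quad
\begin{pmatrix} V & \tilde\tau \\ \tilde\tau^\top & \sum_k\varsigma_k c_k^\top\tilde\sigma_k c_k \end{pmatrix},
$$
with
$$ \tilde\tau = \sum_{k=1}^{m}\rho_k^{1/2}\varsigma_k\,\widetilde{\mathrm{Cov}}_k\{\partial L_k(\vartheta^*, x)/\partial\vartheta,\ q^\top(x)\}\,c_k. $$
Because the second entry of the mean vector equals negative half of the lower-right entry of the covariance matrix, the condition of Le Cam's third lemma is satisfied. By that lemma, we conclude that
$$ v \longrightarrow N(\tilde\tau, V) $$
in distribution, under the local alternative distributions $\{G_k\}$.

We have argued at the beginning of the proof that, under the $\{G_k\}$, the ELR statistic $R_n$ is still approximated by $\tilde\xi^\top\{\tilde\Lambda^{-1} - J(J^\top\tilde\Lambda J)^{-1}J^\top\}\tilde\xi$ with $\tilde\xi = (-U_{\beta\kappa}U_{\kappa\kappa}^{-1}, I_{md})\,v$. The vector $\tilde\xi$ has a normal limiting distribution because $v$ has one, as we have just shown. Based on this result, just as in the proof of Theorem 3.2 for complete samples, the above quadratic form is found to have the claimed non-central chi-square limiting distribution. This completes the proof.

5.9 Appendix: Some thoughts on the weighted EL inference for Type I censored samples

For inference under the DRM for two samples with randomly censored observations, Ren (2008) proposed to use the so-called weighted empirical likelihood (WEL) function. This leads us to wonder whether this WEL is also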
a useful tool for inference based on Type I censored samples. We find that in fact it results in an inconsistent estimator of the DRM scaling parameter $\alpha$, as demonstrated in this section.

Suppose the observations are randomly right-censored. The idea of the WEL is to construct a likelihood-type function based on the Kaplan-Meier estimator (Kaplan and Meier, 1958) of the $\{F_k\}$, which is defined as
$$ \check F_k(t) = \sum_{j=1}^{s_k} w_{kj}\mathbf{1}\{y_{kj} \le t\}, $$
where $s_k$ is the total number of distinct uncensored observations in the $k$th sample and $y_{k1} < y_{k2} < \cdots < y_{ks_k}$ are the ordered values of those distinct uncensored observations. The $\{w_{kj}\}_{j=1}^{s_k}$ are a set of positive weights. Let $d_{kj}$, $j = 1, \ldots, s_k$, be the number of failures at $y_{kj}$ in the $k$th sample, and $r_{kj}$ be the number at risk just prior to $y_{kj}$ in that sample. The weights $\{w_{kj}\}$ are given by
$$ w_{k1} = \frac{d_{k1}}{r_{k1}} \quad\text{and}\quad w_{kj} = \frac{d_{kj}}{r_{kj}}\prod_{l=1}^{j-1}\frac{r_{kl} - d_{kl}}{r_{kl}}, \quad j = 2, \ldots, s_k, $$
and for a positive integer $t \le s_k$, we have
$$ \sum_{j=1}^{t} w_{kj} = 1 - \prod_{j=1}^{t}\frac{r_{kj} - d_{kj}}{r_{kj}}. \tag{5.38} $$
When the largest censored observation is larger than the largest uncensored observation, $\sum_{j=1}^{s_k} w_{kj} < 1$ and $\check F_k$ is not a proper CDF.

For the $\{F_k\}$ that satisfy the DRM assumption, Ren (2008) defined the WEL to be
$$
\begin{aligned}
L_n^{(w)}(F_0, \alpha, \beta) &= \prod_{k=0}^{m}\prod_{j=1}^{s_k}\{dF_k(y_{kj})\}^{n_k w_{kj}} \\
&= \Big\{\prod_{k=0}^{m}\prod_{j=1}^{s_k}\{dF_0(y_{kj})\}^{n_k w_{kj}}\Big\}\cdot\exp\Big\{\sum_{k=1}^{m}\sum_{j=1}^{s_k} n_k w_{kj}\{\alpha_k + \beta_k^\top q(y_{kj})\}\Big\}.
\end{aligned}
\tag{5.39}
$$
The corresponding profile log-WEL is defined as
$$
l_n^{(w)}(\alpha, \beta) = \sup_{F_0}\Big\{\log L_n^{(w)}(F_0, \alpha, \beta) : \sum_{k=0}^{m}\sum_{j=1}^{s_k}\exp\{\alpha_l + \beta_l^\top q(y_{kj})\}\,dF_0(y_{kj}) = 1,\ l = 0, 1, \ldots, m\Big\}.
$$
In the case of two samples with randomly right-censored observations, Ren showed that, under suitable conditions, the maximum WEL estimator (MWELE) of $(\alpha, \beta)$ is consistent and asymptotically normal, and that for a special form of the density ratio, the corresponding likelihood ratio statistic has a scaled chi-square limiting distribution.

We now look at a direct adaptation of the WEL to the case of Type I right-censored samples.
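For Type I right-censored samples, $r_{kj} - d_{kj} = r_{k(j+1)}$ for $j = 1, \ldots, s_k - 1$, and $r_{ks_k} - d_{ks_k} = n_k - \tilde n_k$. Hence the product in (5.38) telescopes, and we get
$$ \sum_{j=1}^{s_k} w_{kj} = \frac{\tilde n_k}{n_k}. $$
Note that $\tilde n_k/n_k \to \varsigma_k$ almost surely as $n \to \infty$, so when $\varsigma_k < 1$, $\sum_{j=1}^{s_k} w_{kj} < 1$ almost surely for all large $n$, and consequently $\check F_k$ is not a proper CDF. In this case, the constraint in the profile log-WEL should be changed to
$$ \sum_{k=0}^{m}\sum_{j=1}^{s_k}\exp\{\alpha_l + \beta_l^\top q(y_{kj})\}\mathbf{1}(y_{kj} \in S_l)\,dF_0(y_{kj}) = \varsigma_l, \tag{5.40} $$
because the WEL is essentially constructed using the uncensored observations only. Assuming no ties among the uncensored observations, $s_k = \tilde n_k$ and $w_{kj} = 1/n_k$ for all $j = 1, \ldots, \tilde n_k$. The WEL (5.39) can then be written as
$$
\begin{aligned}
L_n^{(w)}(F_0, \alpha, \beta) &= \Big\{\prod_{k=0}^{m}\prod_{j=1}^{s_k}\{dF_0(y_{kj})\}^{n_k w_{kj}}\Big\}\cdot\exp\Big\{\sum_{k=1}^{m}\sum_{j=1}^{s_k} n_k w_{kj}\{\alpha_k + \beta_k^\top q(y_{kj})\}\Big\} \\
&= \Big\{\prod_{k=0}^{m}\prod_{j=1}^{\tilde n_k} dF_0(\tilde x_{kj})\Big\}\exp\Big\{\sum_{k=1}^{m}\sum_{j=1}^{\tilde n_k}\{\alpha_k + \beta_k^\top q(\tilde x_{kj})\}\mathbf{1}(\tilde x_{kj} \in S_k)\Big\},
\end{aligned}
$$
and the corresponding constraint (5.40) can alternatively be written as
$$ \sum_{k=0}^{m}\sum_{j=1}^{\tilde n_k}\exp\{\alpha_l + \beta_l^\top q(\tilde x_{kj})\}\mathbf{1}(\tilde x_{kj} \in S_l)\,dF_0(\tilde x_{kj}) = \varsigma_l. $$
Strikingly, this WEL has the same expression as the PEL (5.6), except that the $\wp_{kj}$, which is $d\tilde F_0(\tilde x_{kj})$, and $\kappa_k$ in (5.6) are replaced by $dF_0(\tilde x_{kj})$ and $\alpha_k$ here. The profile log-WEL of $\alpha$ and $\beta$ is then defined to be
$$
\begin{aligned}
l_n^{(w)}(\alpha, \beta) = \sup_{F_0, \{\varsigma_k\}}\Big\{\log L_n^{(w)}(F_0, \alpha, \beta) :\ & \sum_{k=0}^{m}\sum_{j=1}^{\tilde n_k}\exp\{\alpha_l + \beta_l^\top q(\tilde x_{kj})\}\mathbf{1}(\tilde x_{kj} \in S_l)\,dF_0(\tilde x_{kj}) = \varsigma_l, \\
& dF_0(\tilde x_{kj}) \ge 0,\ 0 < \varsigma_l \le 1,\ l = 0, \ldots, m\Big\}.
\end{aligned}
$$
Clearly, $L_n^{(w)}(F_0, \alpha, \beta)$ is functionally independent of the $\{\varsigma_l\}$, so the above supremum is attained on the boundary of the space of the $\varsigma_l$, where $\varsigma_l = 1$ for all $l = 0, 1, \ldots, m$. Hence,
$$
\begin{aligned}
l_n^{(w)}(\alpha, \beta) = \sup_{F_0}\Big\{\log L_n^{(w)}(F_0, \alpha, \beta) :\ & \sum_{k=0}^{m}\sum_{j=1}^{\tilde n_k}\exp\{\alpha_l + \beta_l^\top q(\tilde x_{kj})\}\mathbf{1}(\tilde x_{kj} \in S_l)\,dF_0(\tilde x_{kj}) = 1, \\
& dF_0(\tilde x_{kj}) \ge 0,\ l = 0, \ldots, m\Big\}.
\end{aligned}
$$
Again, $l_n^{(w)}(\alpha, \beta)$ has the same mathematical expression as the profile log-PEL (5.8), except that the $\wp_{kj}$ and $\kappa_k$ in (5.8) are now replaced by $dF_0(\tilde x_{kj})$ and $\alpha_k$. Hence the MWELE of $(\alpha, \beta)$ has exactly the same value as the MPELE of $(\kappa, \beta)$.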
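The two weight identities used in this argument, $w_{kj} = 1/n_k$ and $\sum_j w_{kj} = \tilde n_k/n_k$ under Type I right censoring without ties, are easy to verify numerically. The R sketch below does this for one small made-up sample; the helper `km_weights` and the data are illustrative assumptions, not code from the thesis or from any package.

```r
# Kaplan-Meier weights w_j = (d_j / r_j) * prod_{l < j} (r_l - d_l) / r_l
# at the distinct uncensored values of one sample.
km_weights <- function(time, uncensored) {
  y <- sort(unique(time[uncensored]))
  d <- sapply(y, function(t) sum(time == t & uncensored))  # failures at y_j
  r <- sapply(y, function(t) sum(time >= t))               # at risk just before y_j
  surv_before <- c(1, cumprod((r - d) / r))[seq_along(y)]  # product over l < j
  (d / r) * surv_before
}

# Hypothetical sample, Type I right-censored at the fixed point 1.2:
# values above the cutoff are only known to exceed it.
x_full     <- c(0.3, 2.1, 0.7, 1.5, 0.1, 0.9, 3.0, 1.1)
cutoff     <- 1.2
uncensored <- x_full <= cutoff
time       <- pmin(x_full, cutoff)

w     <- km_weights(time, uncensored)
n_k   <- length(x_full)    # sample size n_k
n_unc <- sum(uncensored)   # number of uncensored observations
```

Here every weight in `w` should equal `1 / n_k`, and `sum(w)` should equal `n_unc / n_k`, which is strictly less than one whenever some observations are censored.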
Based on this result and the fact that the MPELE of $(\kappa, \beta)$ is a consistent estimator, we conclude that the MWELE of $(\alpha, \beta)$ converges to $(\kappa^*, \beta^*)$ as the total sample size $n$ goes to infinity. Since in general $\kappa_k^* = \alpha_k^* + \log\varsigma_0^* - \log\varsigma_k^* \neq \alpha_k^*$, where $\varsigma_k^*$ is the true value of $\varsigma_k$, the MWELE of $(\alpha, \beta)$ is not a consistent estimator.

Chapter 6

R software package "drmdel" for DEL inference under the DRM

This chapter introduces and illustrates the use of an R software package, drmdel, that we wrote for DEL inference under the DRM based on multiple complete samples. This package can be used to calculate the MELE of the DRM parameter, perform the DELR test, estimate population distribution functions, estimate quantiles of the population distributions as in Chen and Liu (2013), compare quantiles from different distributions using a Wald test, and estimate densities of different populations as in Fokianos (2004). The package and its manual can be downloaded from The Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/web/packages/drmdel/index.html; alternatively, the package can be installed within R using the command install.packages("drmdel").

An extension of this package to the case of Type I censored samples is under development at the time of writing and will soon be available.

6.1 Under the hood: consideration and implementation

As we have seen in earlier chapters, the first and key step of EL inference under the DRM is to compute the MELEs of the DRM parameter $\theta$ and the baseline distribution $F_0$. For multiple complete samples, the MELE $\hat\theta$ can be computed as the point at which the concave function DEL (2.11) is maximized.
After obtaining $\hat\theta$, the MELE $\{\hat p_{kj}\}$ of the baseline distribution can then be computed through (2.9).

The core of the above computational procedure is a smooth concave maximization problem. For such a problem, quasi-Newton methods (Nocedal and Wright, 2006) are probably the most popular choice because they are fast, reliable, and implemented in most computational software packages. A quasi-Newton method has a super-linear convergence rate, i.e. faster than linear but slower than quadratic, and the computational complexity of each iteration is $O(p^2 + pC_f)$, where $p$ is the dimension of the parameter of the function to be optimized and $C_f$ is the number of operations needed for one function evaluation. Our particular choice is the well-known Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm, a quasi-Newton method discovered simultaneously by Broyden (1970), Fletcher (1970), Goldfarb (1970) and Shanno (1970), because it is built into the R optim function.

As noted above, the speed of the maximization procedure is influenced by the speed of function evaluation, and function evaluation becomes the main factor when the dimension $p$ of the parameter is not very high. The evaluation of the objective function DEL (2.11) is straightforward but involves a multiple summation. With such a structure, an R implementation of function evaluation will be either cumbersome without loops or very slow with loops. We hence implement the evaluation of the function and its Jacobian in C for both speed and convenience. The evaluated function and its Jacobian are then passed to the R optim function for maximization. In principle, the optimization could be made even faster by using a BFGS procedure implemented in C, which would save the communication time between R and C. In practice, we found that such an approach does not provide a noticeable speed gain.
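The maximization step can be sketched in pure R as follows. This is not the package's C implementation; it is a minimal two-sample illustration under stated assumptions: the DEL is taken to have the same form as the summands $L_{n,k}$ of Section 5.8 with every indicator equal to one (complete data), the basis function is $q(x) = x$, and the data are synthetic normal quantiles chosen so that the true parameter is $(\alpha_1, \beta_1) = (-0.5, 1)$. The names `del` and `del_grad` are hypothetical.

```r
# Synthetic complete samples (assumed for illustration): normal quantiles,
# so that F0 ~ N(0, 1), F1 ~ N(1, 1) and the DRM holds with q(x) = x,
# alpha1 = -0.5, beta1 = 1.
n0 <- 200
n1 <- 200
x0 <- qnorm(ppoints(n0), mean = 0)
x1 <- qnorm(ppoints(n1), mean = 1)
x_all  <- c(x0, x1)
lambda <- c(n0, n1) / (n0 + n1)   # sample proportions

# Dual empirical log-likelihood for two samples, theta = (alpha1, beta1).
del <- function(theta) {
  eta <- theta[1] + theta[2] * x_all
  -sum(log(lambda[1] + lambda[2] * exp(eta))) + sum(theta[1] + theta[2] * x1)
}

# Its gradient, supplied to optim to avoid numerical differentiation.
del_grad <- function(theta) {
  eta <- theta[1] + theta[2] * x_all
  p   <- lambda[2] * exp(eta) / (lambda[1] + lambda[2] * exp(eta))
  c(n1 - sum(p), sum(x1) - sum(p * x_all))
}

# fnscale = -1 makes optim maximize instead of minimize.
fit <- optim(par = c(0, 0), fn = del, gr = del_grad, method = "BFGS",
             control = list(fnscale = -1, maxit = 10000))
fit$par          # estimates of (alpha1, beta1)
fit$convergence  # 0 indicates successful convergence
```

Supplying the analytic gradient mirrors the design described above, where the function and its Jacobian are evaluated outside the optimizer and handed to optim. With vectorized R code as in this sketch the loops disappear for two samples, but the same approach becomes cumbersome for general $m$ and vector-valued $q(x)$.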
Moreover, since our intention is to write an R package, relying on the R implementation of the algorithm should provide extra reliability, and R users who are familiar with the optim function will find it more convenient. These considerations lead to our final implementation: evaluation of the function and its Jacobian in C, and optimization in R.

The computation of $\hat p_{kj}$ (2.9) is also implemented in C for maximal computational efficiency. With this estimator of $dF_0(x_{kj})$, by applying the DRM assumption (2.1), we estimate $dF_r(x_{kj})$, $r = 1, \ldots, m$, by

\[
\hat p_{kj}^{(r)} = d\hat F_r(x_{kj}) = \exp\{\hat\alpha_r + \hat\beta_r^{\top} q(x_{kj})\}\, \hat p_{kj}.
\]

The other inference tasks described in the sequel are all based on the values $\hat\theta$ and $\{\hat p_{kj}^{(r)}\}$, and so are mostly implemented in R unless otherwise noted.

6.2 DRM fitting

The primary function of the package is drmdel, which fits a DRM to data, calculates the MELEs $\hat\theta$ and $\{\hat p_{kj}\}$, and performs the DELR test of Chapter 3 for hypotheses about the DRM parameter $\beta$. This function should be used prior to any other function in the package for performing EL inference, since all others depend on its output.

The function has the following generic form:

    drmdel(x, n_samples, basis_func, g_null=NULL,
           g_null_jac=NULL, par_dim_null=NULL, ...)

The arguments are:

x: a long vector containing all $m + 1$ samples in the order $x_0, x_1, \ldots, x_m$, where $x_0$ is the sample from the baseline distribution and $x_k$, $k = 1, \ldots, m$, is the $k$th non–baseline sample.

n_samples: a vector indicating the size of each sample, i.e. $(n_0, n_1, \ldots, n_m)$.

basis_func: the basis function $q(x)$ of the DRM to be used. It could be either an integer between 1 and 11 or an R function.
The integers represent 11 different built–in basis functions as follows:

    1:  $q(x) = x$.
    2:  $q(x) = \log|x|$.
    3:  $q(x) = \sqrt{|x|}$.
    4:  $q(x) = x^2$.
    5:  $q(x) = (x, x^2)^{\top}$.
    6:  $q(x) = (x, \log|x|)^{\top}$.
    7:  $q(x) = (\log|x|, \sqrt{|x|}, x)^{\top}$.
    8:  $q(x) = (\log|x|, \sqrt{|x|}, x^2)^{\top}$.
    9:  $q(x) = (\log|x|, x, x^2)^{\top}$.
    10: $q(x) = (\sqrt{|x|}, x, x^2)^{\top}$.
    11: $q(x) = (\log|x|, \sqrt{|x|}, x, x^2)^{\top}$.

g_null: the function $G$ specifying the null hypothesis of (6.1), if there is one. The default value, NULL, means that no hypothesis is specified.

g_null_jac: the Jacobian matrix (first–order derivatives) of $G$, if available.

par_dim_null: dimension of the null parameter $\gamma$ in the hypothesis testing problem (6.1), if there is one.

...: further arguments to be passed to the R function optim for maximizing the DEL. See help(optim) for details. The current default values of the optim settings method and control$maxit are "BFGS" and 10000, respectively.

The output of the function is an R list object containing many elements. A complete list of these output elements can be found by using the R command help(drmdel). Here we describe only the most important ones:

mele: the MELE of the DRM parameter in the form $(\hat\theta_1^{\top}, \ldots, \hat\theta_m^{\top})$, where $\theta_k = (\alpha_k, \beta_k^{\top})^{\top}$.

p_est: the MELE $\{\hat p_{kj}^{(r)}\}$ of $\{dF_r(x_{kj})\}$, $r = 0, 1, \ldots, m$. This is a data frame with the following three columns:

    k: label of the populations, $k = 0, 1, \ldots, m$.
    x: data points. It specifies the value of $x_{kj}$ at which $dF_k(x)$ is estimated.
    p_est: values of $\{\hat p_{kj}^{(r)}\}$ for $r = 0, 1, \ldots, m$.

info_mat: the estimated information matrix, $\hat U$.

mele_null: the MELE of the DRM parameter under the null model of (6.1), if available. This is a list object with two elements:

    alpha: the MELE of the DRM parameter $\alpha$ under the null model.
    gamma: the MELE of the null parameter $\gamma$.

delr: the value of the DELR statistic for the hypothesis testing problem (6.1).
When no hypothesis (g_null) is given, this is the value of the DELR statistic for testing the hypothesis that all the distribution functions are equal.

df: degrees of freedom of the chi–square limiting distribution of the DELR statistic under the null model.

p_val: p–value of the DELR test.

We now give an example to illustrate the use of the drmdel function for DRM fitting.

Example 6.1 (Fitting a DRM and calculating the MELE). Suppose we have $m + 1 = 5$ samples generated from gamma distributions as follows.

    set.seed(25)
    # sample sizes
    n_samples <- c(100, 200, 180, 150, 175)
    # data generation
    x0 <- rgamma(n_samples[1], shape=5, rate=1.8)
    x1 <- rgamma(n_samples[2], shape=12, rate=1.2)
    x2 <- rgamma(n_samples[3], shape=12, rate=1.2)
    x3 <- rgamma(n_samples[4], shape=18, rate=5)
    x4 <- rgamma(n_samples[5], shape=25, rate=2.6)
    # concatenating the samples to a long data vector
    x <- c(x0, x1, x2, x3, x4)

We now fit a DRM to these samples. Recall that the appropriate DRM basis function for gamma distributions is $q(x) = (x, \log x)^{\top}$, which is the built–in basis function 6 of the drmdel function. We hence fit a DRM with this basis function to the data as follows.

    # load the drmdel package into R
    library(drmdel)
    # fit the DRM to data
    drmfit_ex1 <- drmdel(x=x, n_samples=n_samples, basis_func=6)
    # checking the names of the outputs
    names(drmfit_ex1)

As we have noted, the output of a fitted DRM object has many components and is therefore hard to read.
We hence provide a function, summaryDRM, to accompany drmdel, which reads a fitted DRM object and gives a nicely formatted summary of its output.

    # checking the summary of the fitted DRM object
    summaryDRM(drmfit_ex1)

The returned summary is:

    Basic information about the DRM:
    Number of samples (m+1): 5
    Basis function: 6
    Dimension of the basis function (d): 2
    Sample sizes: 100 200 180 150 175
    Sample proportions (rho): 0.124 0.248 0.224 0.186 0.217

    Maximum empirical likelihood estimator (MELE) of the DRM
    parameter:
       alpha[] beta[,1] beta[,2]
    F1  -22.14  -0.1935     13.5
    F2  -19.49   0.0243     11.4
    F3   -4.95  -4.6389     17.8
    F4  -32.88  -1.2457     22.8

    Default hypothesis testing problem:
    H_0: All distribution functions, F_k’s, are equal.
    H_1: One of the distribution functions is different from
    the others.
    Dual empirical likelihood ratio (DELR) statistc: 1035.261
    Degree of freedom: 8
    p-value: 0

    Summary statistics of the estimated F_k’s (mean, var --
    variance, sd -- standard deviation, Q1 -- first quartile, Q3
    -- third quartile, IQR -- inter-quartile range):
        mean  var    sd   Q1    Q3  IQR
    F0  2.64 1.59 1.262 1.75  3.25 1.51
    F1 10.20 7.70 2.776 8.11 11.82 3.71
    F2 10.29 9.31 3.051 7.99 11.96 3.98
    F3  3.61 0.71 0.843 3.06  4.20 1.14
    F4  9.61 4.10 2.024 8.07 10.96 2.89

In the above DRM fitting, we have used the built–in basis function 6. An alternative way to specify the basis function of the DRM is to pass an R function directly to the drmdel function. In this way we can use any basis function we want, and are not limited to the built–in ones. However, if the basis function of our choice is one of the built–ins, we should always use the built–in for higher computational efficiency.
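Before writing a custom basis function, it can be useful to check that the intended q(x) really matches the populations at hand. For gamma populations this is easy, because the log ratio of two gamma densities is linear in x and log x. The base–R sketch below carries out this check for F0 and F1 of Example 6.1; the names q_gamma, alpha1 and beta1 are illustrative and are not part of drmdel.

```r
# Candidate basis function for gamma populations: q(x) = (x, log x)
q_gamma <- function(x) c(x, log(x))

# Shape and rate parameters of F0 and F1 in Example 6.1
shape0 <- 5;  rate0 <- 1.8
shape1 <- 12; rate1 <- 1.2

# DRM parameters implied by the two gamma densities
alpha1 <- (shape1 * log(rate1) - lgamma(shape1)) -
          (shape0 * log(rate0) - lgamma(shape0))
beta1  <- c(-(rate1 - rate0), shape1 - shape0)

# Compare log{dF1(x)/dF0(x)} with alpha1 + beta1' q(x) on a grid
x_grid    <- seq(0.5, 20, by = 0.5)
log_ratio <- dgamma(x_grid, shape1, rate1, log = TRUE) -
             dgamma(x_grid, shape0, rate0, log = TRUE)
linear    <- alpha1 + sapply(x_grid, function(x) sum(beta1 * q_gamma(x)))
max(abs(log_ratio - linear))   # numerically zero
```

The same comparison, run with a different pair of densities and a different candidate q(x), gives a quick indication of whether a proposed basis function is adequate.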
The following code illustrates the use of a user–specified basis function.

    # specify a basis function
    basis_gamma <- function(x) return(c(x, log(abs(x))))
    # fit the DRM with this specified basis function
    drmfit_ex1a <- drmdel(x=x, n_samples=n_samples,
                          basis_func=basis_gamma)
    # One can see the summary of this DRM fit is exactly the
    # same as that of the previous fit with basis_func=6
    summaryDRM(drmfit_ex1a)

In addition to summaryDRM, we give another function, meleCov, to complement the drmdel function. The meleCov function estimates the asymptotic covariance matrix of the centered and scaled MELE, $\sqrt{n}(\hat\theta - \theta^*)$. Recall that the asymptotic covariance matrix of $\sqrt{n}(\hat\theta - \theta^*)$ is given by Theorem 2.5 as $U^{-1} - W$, where $U$ is the information matrix and $W$, which is defined in Theorem 2.2, is a matrix determined by the sample proportions. Thus, to estimate the asymptotic covariance matrix of $\sqrt{n}(\hat\theta - \theta^*)$, the key is to estimate $U^{-1}$. A consistent estimator of $U$ is given by $\hat U = -n^{-1}\, \partial^2 l_n(\hat\theta) / \partial\theta\,\partial\theta^{\top}$. Although $U$ is invertible, $\hat U$ is not guaranteed to be invertible; when it is not, R will return an error and stop the program. Hence we implement this computation in a separate function, meleCov, to increase the stability of the drmdel function. This function should be used in the form meleCov(drmfit), where drmfit is a fitted DRM object, i.e. an output from the drmdel function. For example, the asymptotic covariance matrix of $\sqrt{n}(\hat\theta - \theta^*)$ in Example 6.1 can be estimated with the command meleCov(drmfit_ex1).

6.3 The DELR test

From the output of the DRM fit of Example 6.1, we see that the drmdel function by default tests the hypothesis

\[
H_0: F_0 = F_1 = \cdots = F_m \quad \text{against} \quad H_1: F_i \neq F_j \ \text{for some } i \text{ and } j,
\]

which is equivalent to

\[
H_0: \beta = 0 \quad \text{against} \quad H_1: \beta \neq 0
\]

under the DRM. We now illustrate how to perform DELR tests for more complicated hypotheses.

In our package, a DELR test for a general composite hypothesis of the form (3.1) is also carried out by the drmdel function. Recall that an equivalent form of that hypothesis is given by

\[
H_0: \beta = G(\gamma) \quad \text{against} \quad H_1: \beta \neq G(\gamma), \tag{6.1}
\]

where $G: \mathbb{R}^{md-q} \to \mathbb{R}^{md}$ is a smooth function and $\gamma$ is a lower dimensional parameter. With this equivalent representation of the hypothesis (3.1), the maximization of the DEL under the null model is with respect to $\gamma$ and therefore does not involve any constraint. This allows a more efficient implementation of the DELR test, and hence we adopt this representation in our software.

Example 6.2 (The DELR test for composite hypothesis). Adopt the data setting of Example 6.1. Consider the hypothesis testing problem

\[
H_0: \beta_1 = \beta_2 \ \text{and} \ \beta_3 = (-3.2, 13)^{\top}
\quad \text{against} \quad
H_1: \beta_1 \neq \beta_2 \ \text{or} \ \beta_3 \neq (-3.2, 13)^{\top}. \tag{6.2}
\]

Under the null model, the free DRM slope parameter becomes $\gamma = (\beta_1^{\top}, \beta_4^{\top})^{\top}$, and the hypothesis testing problem can be equivalently represented in the form (6.1) with

\[
G(\gamma) = A\gamma + b,
\]

where

\[
A = \begin{pmatrix}
I_2 & I_2 & 0_{2\times 2} & 0_{2\times 2} \\
0_{2\times 2} & 0_{2\times 2} & 0_{2\times 2} & I_2
\end{pmatrix}^{\top}
\quad \text{and} \quad
b = (0_4^{\top}, -3.2, 13, 0_2^{\top})^{\top}.
\]

To test such a hypothesis, we need to pass the function $G$ to the argument g_null of the drmdel function.

    m <- 4 # number of non-baseline distributions
    d <- 2 # dimension of the basis function
    # dimension of the DRM parameter beta
    dim_beta <- m*d
    # dimension of the null parameter gamma
    dim_gamma <- dim_beta - 2*d
    # A matrix
    A <- matrix(rep(0, dim_beta*dim_gamma), dim_beta, dim_gamma)
    A[1:d, 1:d] <- diag(d)
    A[(d+1):(2*d), 1:d] <- diag(d)
    A[(3*d+1):(4*d), (d+1):(2*d)] <- diag(d)
    # b vector: beta_3 is fixed at (-3.2, 13) under the null
    b <- numeric(dim_beta)
    b[(2*d+1):(3*d)] <- c(-3.2, 13)
    # null mapping
    g_null <- function(par_gamma) {
      par_beta <- as.vector(A %*% par_gamma) + b
      return(par_beta)
    }

Optionally, we can also pass the Jacobian matrix of $G$ to the drmdel function through the g_null_jac argument to accelerate the maximization of the DEL under the null model. In general, this Jacobian matrix is a function of $\gamma$, and so has to be passed to drmdel as a function.
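When G is nonlinear, a hand–derived Jacobian is easy to get wrong, so it may be worth comparing it with a finite–difference approximation before using it. The base–R sketch below does this for a hypothetical mapping, G_demo, from R^2 to R^3 that is unrelated to the hypotheses of this chapter. It assumes the usual Jacobian layout, with rows indexing the components of G and columns indexing the components of γ; all names are illustrative.

```r
# Hypothetical nonlinear null mapping (for illustration only)
G_demo <- function(par_gamma) {
  c(par_gamma[1], par_gamma[1]^2, par_gamma[1] * par_gamma[2])
}

# Hand-derived Jacobian: rows = components of G, columns = components of gamma
G_demo_jac <- function(par_gamma) {
  rbind(c(1, 0),
        c(2 * par_gamma[1], 0),
        c(par_gamma[2], par_gamma[1]))
}

# Central finite-difference approximation of the Jacobian of any mapping
num_jac <- function(fun, par, h = 1e-6) {
  sapply(seq_along(par), function(i) {
    e <- replace(numeric(length(par)), i, h)
    (fun(par + e) - fun(par - e)) / (2 * h)
  })
}

gamma0 <- c(0.7, -1.3)
max(abs(G_demo_jac(gamma0) - num_jac(G_demo, gamma0)))   # should be close to 0
```

A discrepancy that does not shrink as h is reduced usually points to an error in the analytic derivative rather than in the approximation.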
In this particular example, the Jacobian matrix of $G$ is just the constant matrix $A$.

    # Jacobian matrix of the null mapping
    g_null_jac <- function(par_gamma) return(A)

With the above preparation, we are ready to perform a DELR test for the hypothesis testing problem (6.2).

    drmfit_ex2 <- drmdel(x=x, n_samples=n_samples, basis_func=6,
                         g_null=g_null, g_null_jac=g_null_jac,
                         par_dim_null=dim_gamma)
    summaryDRM(drmfit_ex2)

The part of the output from summaryDRM(drmfit_ex2) for the DELR test is

    Dual empirical likelihood ratio (DELR) statistc: 4.067291
    Degree of freedom: 4
    p-value: 0.397

showing that we find no strong evidence to reject the null hypothesis. The other part of the output is the same as the one given in Example 6.1 because we used the same data and basis function to fit the DRM.

6.4 EL population CDF estimation

As noted in Section 6.1, $dF_r(x_{kj})$, $r = 1, \ldots, m$, is estimated by $\hat p_{kj}^{(r)}$. The corresponding estimator of $F_r(x)$ is then given by

\[
\hat F_r(x) = \sum_{k,j} \hat p_{kj}^{(r)}\, 1(x_{kj} \le x).
\]

This is exactly the MELE of $F_r(x)$ defined in (2.10). This CDF estimator is a step function which jumps at each distinct data value. One may linearly interpolate between every two adjacent distinct data values to get a continuous version of this CDF estimator.

The drmdel package implements the above population CDF estimator through the cdfDRM function:

    cdfDRM(k, x=NULL, drmfit, interpolation=TRUE)

The arguments are:

k: a vector of labels of populations whose CDFs are to be estimated, with k[i] = 0, 1, ..., m.

x: can be:

    (1) a list whose length is the same as the argument “k”. The ith component of this list must be a vector of values at which the CDF of population k[i] is estimated.
    (2) a single vector of values, in which case each CDF is estimated at the same values given by this vector.
    (3) NULL (default), in which case each CDF is estimated at the values of all the observed data points.

drmfit: a fitted DRM object, i.e.
an output from the drmdel function.

interpolation: a logical variable specifying whether to linearly interpolate the EL CDF estimator to make it continuous. The default value is TRUE.

The output of the function is an R list object whose length is the same as its argument “k”. The ith component of this list is a data frame with the following two columns:

    x: values at which the CDF of population k[i] is estimated.
    cdf_est: the corresponding estimated CDF values of population k[i].

Example 6.3 (EL CDF estimation under the DRM). Adopt the data and DRM setting of Example 6.1.

To estimate $F_1(x)$ evaluated at (3, 7.5, 11) and $F_3(x)$ evaluated at (2, 6), we use:

    cdf_est <- cdfDRM(k=c(1, 3), x=list(c(3, 7.5, 11), c(2, 6)),
                      drmfit=drmfit_ex1)
    # show the output
    names(cdf_est)
    cdf_est$F1
    cdf_est$F3

To estimate $F_2(x)$ and $F_4(x)$ at the values of all the observed data points, we use:

    cdf_est1 <- cdfDRM(k=c(2, 4), drmfit=drmfit_ex1)
    # show the output
    names(cdf_est1)
    cdf_est1$F2
    cdf_est1$F4

6.5 EL quantile estimation

Based on the EL CDF estimator $\hat F_r(x)$, Chen and Liu (2013) proposed to estimate the $\alpha$th, $\alpha \in (0, 1)$, quantile $\xi_r$ of $F_r(x)$ as

\[
\hat\xi_r = \inf\{x : \hat F_r(x) \ge \alpha\}.
\]

They showed that the EL quantile estimator of a vector of quantiles from possibly different distributions is jointly asymptotically normal and is more efficient than the sample quantile based on the $r$th sample alone.

As noted by Chen and Liu (2013), a quantile estimator based on a discrete distribution function has a disadvantage when estimating lower (or higher) quantiles of a continuous distribution: it tends to overestimate a lower quantile and to underestimate a higher one. To adjust for this effect, they modified $\hat\xi_r$ to $\hat\xi_r - (2n_r)^{-1}$ for lower quantile estimation, and commented that such a modification does not change the first–order asymptotics of the EL quantile estimator.
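Both the estimator and its lower–quantile adjustment are simple to compute once the weights are available. The base–R sketch below works with the step–function (non–interpolated) CDF only. The function name el_quantile and the toy weights are hypothetical; the weights stand in for the values of p̂(r)kj that a DRM fit would supply.

```r
# Quantile of a discrete distribution with support `pts` and weights `w`:
# the smallest point x with F(x) >= alpha, optionally shifted by -(2 n_r)^{-1}
el_quantile <- function(alpha, pts, w, adjust = FALSE, n_r = NULL) {
  ord <- order(pts)
  cdf <- cumsum(w[ord])                    # step CDF at the sorted points
  # small tolerance guards against rounding error in the cumulative sums
  xi <- pts[ord][which(cdf >= alpha - 1e-12)[1]]
  if (adjust) xi <- xi - 1 / (2 * n_r)     # lower-quantile adjustment
  xi
}

# Toy data: four support points with weights summing to one
pts <- c(3, 1, 2, 4)
w   <- c(0.1, 0.2, 0.3, 0.4)

el_quantile(0.50, pts, w)                            # 2
el_quantile(0.55, pts, w)                            # 3
el_quantile(0.05, pts, w, adjust = TRUE, n_r = 4)    # 1 - 1/8 = 0.875
```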
One can similarly modify $\hat\xi_r$ for higher quantile estimation.

The function quantileDRM calculates the above quantile estimator $\hat\xi$ for a vector of quantiles of possibly different populations, and provides an estimate of the asymptotic covariance matrix of $\sqrt{n}(\hat\xi - \xi^*)$, where $\xi^*$ is the true value of $\xi$. The form of quantileDRM is:

    quantileDRM(k, p, drmfit, cov=TRUE, interpolation=TRUE,
                adj=FALSE, adj_val=NULL, bw=NULL,
                show_bw=FALSE)

Its arguments are:

k: a vector of labels of populations whose quantiles are to be estimated, with k[i] = 0, 1, ..., m.

p: a vector of probabilities at which the quantiles are to be estimated.

Three combinations of k and p are allowed:

    (1) k and p have the same length: the p[i]th quantile of population k[i] will be estimated for each i.
    (2) k is a single integer but p is a vector: the p[i]th quantile of the same population k will be estimated for each i.
    (3) k is a vector but p is a single value: the pth quantile of population k[i] will be estimated for each i.

drmfit: a fitted DRM object.

cov: a logical variable specifying whether to estimate the asymptotic covariance matrix of the quantile estimator. With a TRUE value, the quantile estimation will be slower. The default is TRUE.

interpolation: The EL quantile estimator is based on the EL CDF estimator. Hence the way the EL CDF estimate is calculated affects the result of the quantile estimation. This argument is passed to the cdfDRM function for tweaking the EL CDF estimator. Its meaning and usage are explained where the cdfDRM function is introduced in Section 6.4.

adj: a logical variable specifying whether to adjust the EL quantile estimator by adding a term for lower or higher quantile estimation. The default value is FALSE.

adj_val: a vector of the same length as k (or as p if the length of k is 1) containing the values of adjustment terms for lower or higher quantile estimation, if adj=TRUE.
The default value, NULL, uses $-(2n_{k[i]})^{-1}$, where $n_{k[i]}$ is the size of the k[i]th sample, for each i, to adjust the EL quantile estimator for lower quantile estimation.

bw: to estimate the asymptotic covariance matrix of the EL quantile estimator, the densities of the population distributions have to be estimated. An EL kernel density estimator (described in Section 6.7) is used for this purpose. The argument bw is a vector of the same length as k containing the bandwidths needed by that kernel density estimator. If bw is a single value, the same bandwidth will be used for each population k[i]. The default bw value, NULL, uses that given by (6.4) for each different k[i]. Note that bw is only needed for estimating the asymptotic covariance matrix of the EL quantile estimator, but not the population quantiles themselves.

show_bw: a logical variable specifying whether to output bandwidths when cov=TRUE. The default value is FALSE.

The output of the function is a list object containing the following components:

est: estimated quantiles.

cov: estimated asymptotic covariance matrix of the quantile estimator, available only if cov=TRUE at input.

bw: bandwidths used for EL kernel density estimation required for estimating the asymptotic covariance matrix of the EL quantile estimator, available only if cov=TRUE and show_bw=TRUE at input.

Example 6.4 (EL quantile estimation under the DRM). Adopt the data and DRM setting of Example 6.1. Denote the $\alpha$th, $\alpha \in (0, 1)$, quantile of the $k$th, $k = 0, 1, \ldots$
, $m$, population as $\xi_{k,\alpha}$.

To estimate $\xi_{0, 0.25}$, $\xi_{0, 0.6}$, $\xi_{1, 0.1}$ and $\xi_{2, 0.1}$, we do:

    # estimate quantiles and show the output
    (qe <- quantileDRM(k=c(0, 0, 1, 2), p=c(0.25, 0.6, 0.1, 0.1),
                       drmfit=drmfit_ex1))

To estimate the 0.05, 0.2 and 0.8 quantiles of $F_3(x)$, we do:

    (qe1 <- quantileDRM(k=3, p=c(0.05, 0.2, 0.8),
                        drmfit=drmfit_ex1))

To estimate the 0.05 quantiles of $F_1(x)$, $F_3(x)$ and $F_4(x)$, we do:

    (qe2 <- quantileDRM(k=c(1, 3, 4), p=0.05,
                        drmfit=drmfit_ex1))

6.6 Quantile comparison

A bonus provided by the asymptotic theory of the EL quantile estimator is that it enables a Wald test for comparing the quantiles of different populations, an important task of our long term monitoring program for lumber quality as noted in Chapter 1.

Let $\xi$ be a $K$–vector of quantiles of possibly different populations. Let $A$ be a given $M \times K$ matrix with $M \le K$, and $b$ be a given vector of length $M$. For the following linear hypothesis about $\xi$,

\[
H_0: A\xi = b \quad \text{against} \quad H_1: A\xi \neq b,
\]

a Wald test statistic is defined to be

\[
W_n = n (A\hat\xi - b)^{\top} (A \hat\Sigma_{\xi} A^{\top})^{-1} (A\hat\xi - b),
\]

where $\hat\Sigma_{\xi}$ is a consistent estimator of the asymptotic covariance matrix of $\sqrt{n}(\hat\xi - \xi^*)$. By the asymptotic normality of $\hat\xi$, $W_n$ has a $\chi^2_M$ limiting distribution under the null hypothesis.

The above Wald test for a linear hypothesis about population quantiles can be carried out by the quantileCompWald function:

    quantileCompWald(quantileDRMObject, n_total, pairwise=TRUE,
                     p_adj_method="none", A=NULL, b=NULL)

The arguments are:

quantileDRMObject: an output from the quantileDRM function. It must contain an estimate of the asymptotic covariance matrix of the EL quantile estimator, quantileDRMObject$cov; that is, the argument cov of the quantileDRM function must be set to TRUE when running quantileDRM.

n_total: the total sample size.

pairwise: a logical variable specifying whether to perform pairwise comparisons of the quantiles. The default is TRUE.

p_adj_method: the method for adjusting p–values for multiple comparisons, provided pairwise=TRUE. This is implemented through the R function p.adjust, and the available methods are: holm, hochberg, hommel, bonferroni, BH, BY, fdr and none. See help(p.adjust) for details. The default value is none, i.e. no adjustment.

A: matrix $A$ in the linear hypothesis. The default is NULL, i.e. no linear hypothesis to test.

b: vector $b$ in the linear hypothesis. This must be given if the argument A is not NULL.

The output is a list object containing the following components:

p_val_pair: p–values of pairwise comparisons, in the format of a lower triangular matrix, available only if pairwise=TRUE at input.

p_val: p–value of the linear hypothesis test, available only if the arguments A and b are not NULL at input.

Example 6.5 (Comparison of population quantiles under the DRM). Adopt the data and DRM setting of Example 6.1 and the notation of Example 6.4. We compare the 5th percentiles of populations 0, 1, 2 and 3, i.e.

\[
H_0: \xi_{0, 0.05} = \xi_{1, 0.05} = \xi_{2, 0.05} = \xi_{3, 0.05}
\quad \text{against} \quad
H_1: \xi_{i, 0.05} \neq \xi_{j, 0.05} \ \text{for some } i \text{ and } j. \tag{6.3}
\]

We first estimate these quantiles and the asymptotic covariance matrix of the EL quantile estimator using

    qe <- quantileDRM(k=c(0, 1, 2, 3), p=0.05,
                      drmfit=drmfit_ex1)

To compare the quantiles using a Wald test, we need to specify the contrast matrix $A$ and the vector $b$; for the hypothesis testing problem (6.3), one way of doing this is to set

\[
A = \begin{pmatrix}
1 & -1 & 0 & 0 \\
1 & 0 & -1 & 0 \\
1 & 0 & 0 & -1
\end{pmatrix}
\]

and correspondingly $b = 0$.

    # specify the matrix A
    A <- matrix(rep(0, 12), 3, 4)
    A[1,] <- c(1, -1, 0, 0)
    A[2,] <- c(1, 0, -1, 0)
    A[3,] <- c(1, 0, 0, -1)
    # specify the vector b
    b <- rep(0, 3)

With the above preparation, we can now compare the quantiles as follows:

    # Adjust the p-values for pairwise comparisons using the
    # "holm" method.
    (quantComp <- quantileCompWald(qe, n_total=sum(n_samples),
                                   p_adj_method="holm", A=A,
                                   b=b))

The output is

    $p_val_pair
                 q1       q2 q3 q4
    q1           NA       NA NA NA
    q2 0.000000e+00       NA NA NA
    q3 0.000000e+00 0.537371 NA NA
    q4 1.159073e-13 0.000000  0 NA

    $p_val
    [1] 0

indicating that overall the quantiles are different, but that there is not enough evidence (p–value = 0.537371) to reject the hypothesis that $\xi_{1, 0.05} = \xi_{2, 0.05}$; these two quantiles are, in truth, equal.

6.7 EL kernel density estimation

With the EL CDF estimator $\hat F_r(x)$, it is easy to construct an EL kernel density estimator for $F_r(x)$. Let $K(\cdot) \ge 0$ be a commonly used kernel function such that $\int K(x)\,dx = 1$, $\int xK(x)\,dx = 0$ and $\int x^2 K(x)\,dx < \infty$. For a given bandwidth $b > 0$, put $K_b(x) = (1/b)K(x/b)$. The EL kernel density estimator of $F_r(x)$ is defined as

\[
\hat f_r(x) = \int K_b(x - y)\, d\hat F_r(y) = \sum_{k,j} K_b(x - x_{kj})\, \hat p_{kj}^{(r)}.
\]

This estimator was originally proposed by Fokianos (2004). He showed that the asymptotic mean integrated squared error of this estimator is smaller than that of the classical kernel density estimator with empirical weights of $1/n_r$ based on the $r$th sample alone.

Kernel density estimation requires a bandwidth $b$ to be specified in advance. For classical density estimation based on $n$ iid observations, Deheuvels (1977) and Silverman (1986) suggested using $b = 1.06\, n^{-1/5} \min\{\hat\sigma, \hat R/1.34\}$, where $\hat\sigma$ and $\hat R$ are the estimated standard deviation and interquartile range (IQR) of the population, respectively. We adopt this formula in EL kernel density estimation, and our software uses the default bandwidth

\[
b_r = 1.06\, n^{-1/5} \min\{\hat\sigma_r, \hat R_r/1.34\} \tag{6.4}
\]

for the density estimation of the $r$th population, where $n$ is the total sample size, and $\hat\sigma_r$ and $\hat R_r$ are the standard deviation and IQR of the estimated CDF $\hat F_r(x)$ (2.10), respectively.

The above $\hat f_r(x)$ conforms with the general definition of a kernel density estimator with weights given by $\{\hat p_{kj}^{(r)}\}$, and so can be easily implemented using the R function density. The densityDRM function of our package provides an easy interface for automatically passing the data, $\{x_{kj}\}$, and the weights, $\{\hat p_{kj}^{(r)}\}$, to the R density function.
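The construction above can be sketched with base R alone. The example below pools two gamma samples and builds weights for the non–baseline population from the true densities. These weights are a stand–in, assumed here purely for illustration, for the values of p̂(r)kj that a DRM fit would supply. The bandwidth (6.4) is then computed from the weighted CDF, and data, weights and bandwidth are passed to density. The helper w_quantile is hypothetical.

```r
set.seed(2)
n0 <- 100; n3 <- 150
x <- c(rgamma(n0, shape = 5, rate = 1.8),    # baseline sample
       rgamma(n3, shape = 18, rate = 5))     # sample from the target population
n <- n0 + n3

# Stand-in weights: proportional to f_3(x) / {rho_0 f_0(x) + rho_3 f_3(x)}
f0 <- dgamma(x, shape = 5, rate = 1.8)
f3 <- dgamma(x, shape = 18, rate = 5)
w  <- f3 / ((n0 / n) * f0 + (n3 / n) * f3)
w  <- w / sum(w)                             # weights must sum to one

# Standard deviation and IQR of the weighted (step) CDF
w_quantile <- function(p) {
  ord <- order(x)
  x[ord][which(cumsum(w[ord]) >= p - 1e-12)[1]]
}
sd_w  <- sqrt(sum(w * x^2) - sum(w * x)^2)
iqr_w <- w_quantile(0.75) - w_quantile(0.25)

# Bandwidth (6.4), with n the total sample size
bw_r <- 1.06 * n^(-1/5) * min(sd_w, iqr_w / 1.34)

# Weighted kernel density estimate (Gaussian kernel by default)
dens <- density(x, weights = w, bw = bw_r)
```

Because every pooled observation contributes with its own weight, the estimate draws on all n data points rather than on the n3 observations of the target sample alone.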
The densityDRM function has a generic form of

    densityDRM(k, drmfit, interpolation=TRUE, ...)

The arguments are:

k: a single label of the population whose density is to be estimated, $k \in \{0, 1, \ldots, m\}$.

drmfit: a fitted DRM object.

interpolation: a logical variable to be passed to quantileDRM and then ultimately to cdfDRM, for estimating the population standard deviations and IQRs required for calculating the default bandwidth (6.4). The default value is TRUE.

...: further arguments to be passed to the R density function for kernel density estimation. See help(density) for details. One can customize the bandwidth using the bw argument of the density function. The arguments x and weights should not be specified because they are extracted automatically from the fitted DRM object drmfit. If specified, they will be automatically replaced by those extracted from drmfit and a warning message is issued.

The output is an object of class “density”, which comes from the R density function.

Example 6.6 (EL kernel density estimation). Adopt the data and DRM setting of Example 6.1.

We can estimate the density of $F_3$ using the command

    dens_pop3 <- densityDRM(k=3, drmfit=drmfit_ex1)

We compare the EL kernel density estimator, the classical kernel density estimator based on the third sample alone, and the true density curve by plotting them.

    # Plotting the EL kernel density estimates
    plot(dens_pop3, xlim=range(c(0, 10)), ylim=range(c(0, 0.5)),
         main=expression(paste("Kernel density estimators (KDE) of ",
                               F[3], sep="")),
         xlab="x")
    # Adding the classical kernel density curve of F_3 based
    # on the third sample alone
    lines(density(x3), col="blue", lty="28F8")
    # Add the true density curve of F_3
    lines(seq(0, 10, 0.01), dgamma(seq(0, 10, 0.01), 18, 5),
          type="l", col="red", lty="dotted")
    legend(6.5, 0.48,
           legend=c("EL KDE", "Classical KDE", "True density"),
           col=c("black", "blue", "red"),
           lty=c("solid", "28F8", "dotted"))

The comparative plot is shown in Figure 6.1.

[Figure 6.1: plot titled “Kernel density estimators (KDE) of F3”, with x on the horizontal axis and Density on the vertical axis, showing the EL KDE, the classical KDE and the true density.]

Figure 6.1: Comparative plot of the EL kernel density estimator, classical kernel density estimator and true density of $F_3$ in Example 6.6.

Chapter 7

Summary and Future Work

7.1 Summary of the present work

This thesis has presented new theories for inference, especially for hypothesis testing, concerning a number of populations from which multiple complete or Type I censored independent samples are observed. The work was motivated by an important application, the development of a new long term monitoring program for the North American lumber industry. Traditional reliance in that industry on standards based on nonparametric statistical procedures led to our adoption of a semiparametric approach, with a large nonparametric component. The need for efficiency and hence small sample sizes led to our density ratio model approach, where common information across samples could be borrowed to gain strength.

7.1.1 Contribution I: DELR test for hypothesis about the DRM parameter

The first contribution of the thesis is a theory of the dual EL ratio test for a general class of composite hypotheses about the DRM slope parameter $\beta$, which encompasses testing differences among population distributions as a special case. The new theory is assessed both through theoretical analysis for large samples and by simulation studies for small ones. The theoretical results are illustrated by examples, and the use of the proposed test is demonstrated on a dataset collected by our group over five years.

Our overall conclusion is that the new theory works well and achieves its intended objectives.
We recommend its use in applications for comparing population parameters when independent samples from each population are available.

More specifically, the new theory is very general and flexible in that it embraces in the DRM a large family of familiar distributions such as the normal and gamma, making it quite robust against misspecification of population distributions. It comes equipped with an asymptotic theory that enables its properties to be assessed. In particular, easy–to–apply asymptotic approximations are available for the distributions of the test statistics involved. The asymptotic distribution of the proposed DELR test statistic is derived under the null model and also under a local alternative model. The null limiting distribution allows us to approximate the p–values of the DELR tests; the limiting distribution under the local alternative model enables us to approximate the power of a DELR test, to calculate the sample size required for attaining a given power, and to compare the local asymptotic powers of DELR tests constructed in different ways.

Simulation studies show that when the basis function $q(x)$ is correctly specified, the distribution of the DELR test statistic, $R_n$, is well approximated by a chi–square distribution under the null model and by a non–central chi–square distribution under the local alternative model; that for normal data with equal variances, the power of the DELR test is comparable to that of the optimal two–sample t–test; and that for normal data with unequal variances and for non–normal data, the DELR test has a much higher power than all its competitors and its type I error rate is close to the nominal size of 0.05. When the DRM is misspecified, we observe similar results on power comparison and type I error rate.
Also, Wald tests are generally not as powerful as the DELR tests.

The demonstration of the method on three lumber samples shows that it gives a more incisive assessment than its competitors through paired comparisons of the populations.

7.1.2 Contribution II: Effects of information pooling by the DRM

The second contribution of the thesis is a theoretical assessment of the effects of information pooling by the DRM on the estimation accuracy of the MELE of the DRM parameter and on the local asymptotic power of the DELR test. We show that when additional samples are incorporated by the DRM, the estimation accuracy of the MELE $\hat\theta$ is usually increased and the local asymptotic power of the DELR test is often improved, even if the underlying distributions of the additional samples are not related to the population distributions of direct interest. In the special case of testing equality among distribution functions within subgroups of populations, including extra samples does not change the local asymptotic power of the DELR test. This is similar to the classical t–test: if we construct a t–test using a pooled sample variance by including additional samples, the gain is only in the degrees of freedom, and there is no gain in the asymptotic sense.

Our simulations support these theoretical results: the DRM does borrow strength as intended, reducing the sample size needed to achieve the required power even against local alternatives.

7.1.3 Contribution III: EL inference under the DRM based on Type I censored samples

The third contribution is an EL inference framework for population distributions under the DRM from which multiple Type I censored samples are observed.
In particular, we solve the maximization problem of the EL based on Type I censored samples by reducing it to the maximization of a concave DPEL, study the asymptotic properties of the DPEL, and establish the theory of the EL ratio test for hypotheses about the DRM parameter under this DPEL inference framework.

The proposed EL inference framework features fast computation because of the concavity of the DPEL, and is very general in the sense that (1) it may be used to extend any EL inference result that is derived for complete samples under the DRM to the case of Type I censored samples, (2) it applies to general Type I left– and right–censored samples, and (3) it does not require the censoring cutoff points to be the same for all samples.

7.1.4 Contribution IV: Software package “drmdel” for DEL inference under the DRM

The last but not least contribution of this thesis is a user friendly R software package, drmdel, for DEL inference under the DRM for multiple complete samples, which is available on CRAN at http://cran.r-project.org/web/packages/drmdel/index.html. The core of the package is written in C, so it is fast. It can calculate the MELE of the DRM parameter, perform the DELR test as found in Chapter 3, estimate population distribution functions, estimate quantiles of the population distributions as found in Chen and Liu (2013), compare quantiles from different distributions using a Wald test, and estimate densities of different populations as found in Fokianos (2004). The use of the package is fully illustrated by examples in Chapter 6.

An implementation of this package for EL inference under the DRM based on Type I censored samples is currently under way and will soon be available.

7.2 Outlook on future work

7.2.1 EL ratio test for comparing quantiles under the DRM

Quantile estimation under the DRM based on complete samples has been studied by Chen and Liu (2013).
A Wald test for comparing quantiles of different populations can be constructed based on the asymptotic normality of that quantile estimator. However, there is no existing result on an EL ratio test for comparing quantiles. A population quantile can be defined as the solution of a non–smooth estimating equation. My proposal is to construct an EL ratio test by incorporating this estimating equation as a constraint when profiling the EL function. The resulting profile EL then is a function of the quantiles to be compared, and so is the corresponding EL ratio statistic. The limiting distribution of this EL ratio statistic is hard to derive because the estimating equation defining a quantile is not smooth. However, when the population distribution functions are smooth, this estimating equation is asymptotically smooth. I have proved that, for parameters defined by smooth estimating equations, the EL ratio statistic under the DRM has a chi–square limiting distribution. Therefore, based on the “asymptotic smoothness” argument, I conjecture that, for quantile comparisons, the corresponding EL ratio statistic also has a chi–square limiting distribution. I aim to prove this conjecture.

Another way to construct an EL ratio test for quantile comparisons is to incorporate quantiles in the EL using a kernel–smoothed estimating equation. In the one–sample case (not under the DRM), this approach was proposed by Chen and Hall (1993). One issue with this approach is that one has to choose a tuning constant, the “bandwidth”, that affects the asymptotic properties of the EL ratio statistic. I plan to study this approach under the DRM for multiple samples, and study the optimal choice of the associated bandwidth.

7.2.2 Effects of information pooling on quantile estimation under the DRM

As we have shown in Chapter 4, incorporating extra samples using the DRM in general increases the estimation accuracy of the MELE and improves the local asymptotic power of the DELR test.
But does it also improve the estimation accuracy of the Chen–Liu (2013) EL quantile estimator? This is a hard question due to the complicated algebraic expression of the asymptotic covariance matrix of that estimator. I seek an answer.

7.2.3 Inference under the DRM based on randomly censored samples

Chapter 5 establishes the theory of EL inference under the DRM based on Type I censored samples. What if the samples are randomly censored? In that case, the EL function under the DRM is very complicated, and its properties are hard to study. As outlined in Appendix 5.9 of Chapter 5, Ren (2008) proposed to base the inference about randomly right–censored samples under the DRM on the WEL, an EL–like function constructed upon the Kaplan–Meier estimators of the population distribution functions instead of the population distributions themselves. An advantage of the WEL over the EL is that it has a simple analytic form. Ren showed that under a two–sample DRM, the WEL ratio statistic has a scaled chi–square limiting distribution. The scaling factor depends on the distributions of the populations as well as those of the censoring variables, which are generally unknown. I found that, under a DRM of more than two samples, the limiting distribution of the WEL ratio statistic is a mixture of chi–squares. The mixing coefficients, which also depend on the distributions of the populations and those of the censoring variables, are unknown. Hence the WEL ratio is not useful for constructing confidence intervals or for testing hypotheses about the DRM parameters.

I aim to construct a usable WEL ratio statistic for randomly right–censored samples under the DRM. The reason the limiting distribution of the WEL ratio statistic is a mixture of chi–squares instead of a simple chi–square is that, under the WEL, the information matrix and the asymptotic covariance matrix of the score function do not have a neat relationship like that found for the DEL based on complete samples.
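For reference, the Kaplan–Meier estimator on which the WEL is built can be sketched as below. This is a minimal illustration of the standard product–limit formula for right–censored data, not the weighted variant discussed in this section; the function name and the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function S(t).

    times  : observed times (failure or censoring times)
    events : 1 if the failure was observed, 0 if right-censored
    Returns a list of (t, S(t)) pairs, one per distinct failure time.
    The distribution function estimate is F(t) = 1 - S(t).
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            removed += 1
            i += 1
        if failures > 0:
            surv *= 1.0 - failures / n_at_risk
            curve.append((t, surv))
        # Failures and censored units both leave the risk set after t.
        n_at_risk -= removed
    return curve


# Toy data (illustration only): the second and fifth units are censored.
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
```

The estimator jumps only at observed failure times, and the size of each jump depends on how many units were censored earlier, which is one way to see why the censoring distributions enter the limiting distribution of the WEL ratio statistic.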
My idea is to add weights to the Kaplan–Meier estimators of the distribution functions and to construct a WEL based on the tweaked Kaplan–Meier estimators such that, under the new WEL, the information matrix and the asymptotic covariance matrix of the score function have a “nice” relationship. The corresponding WEL ratio may then have a simple chi–square limiting distribution.

7.2.4 Basis function selection in the DRM

Whether EL inference under the DRM is effective relies heavily on whether the model fits the data well, which in turn depends on the selection of the basis function q(x). Simulations show that when there is only vague information about the functional form of q(x), using a long vector containing many linearly independent components as the basis function usually gives a good model fit. However, some components might be superfluous. Ideally, we want to choose a subset of this long vector as the final basis function such that it yields both a good model fit to the data and a high estimation accuracy. I plan to study the following two approaches for the selection of the basis function: (1) derive an information criterion, similar to Akaike’s information criterion, that estimates the Kullback–Leibler divergence between the assumed DRM and the true distributions; and (2) study the maximum penalized EL approach with an L1 or nonconcave penalty function.

7.2.5 Random–effect DRMs

The lumber quality samples are collected from different mills across Canada each year. Although our purpose is to assess change in lumber strength over time, which does not involve differences at the mill level, the mill–to–mill variability should be considered for a more realistic model. My rough idea is to add random effects to the DRM slope parameters, {β_k}, to represent this mill–level variability.
The model constructed this way is flexible; however, my initial investigation showed that the resulting EL function is very complicated, and that both the study of the properties and the computation of the corresponding maximum EL estimator are hard. Given these difficulties, I plan to start from a simpler model, where the mill–level variability is assumed to be fully represented by a random effect on the population means only. Hopefully, this investigation can help me to ultimately overcome the difficulties in the former, more sophisticated random–effect DRM.

7.2.6 Other projects: high dimensional DRM and finite sample corrections for the DELR test

The current theory of EL inference under the DRM assumes that the number of samples is a constant that does not change with the total sample size n. However, in our targeted applications, new lumber samples are added year by year. Hence, in reality, the number of DRM parameters increases as the sample size n increases, thereby nullifying the existing asymptotic results on the DELR test and the EL quantile estimation under the DRM. I plan to study the conditions under which these asymptotic results are still valid when the number of samples increases with n.

Another practical issue is that the DELR test for the DRM parameter is found to be anti–conservative when the total sample size is small. A finite sample correction may be useful for adjusting for this effect. I intend to study whether the DELR test is Bartlett correctable; if it is, I hope the corresponding Bartlett correction will have second–order accuracy and can make the DELR test less anti–conservative.

References

Anderson, J. A. (1972). Separate sample logistic discrimination. Biometrika, 59(1):19–35.

Anderson, J. A. (1979). Multivariate logistic compounds. Biometrika, 66(1):17–26.

ASTM (2007). ASTM D1990 – 07: Standard Practice for Establishing Allowable Properties for Visually-Graded Dimension Lumber from In-Grade Tests of Full-Size Specimens.
ASTM International, West Conshohocken, USA.

Broyden, C. G. (1970). The convergence of a class of double–rank minimization algorithms. Journal of the Institute of Mathematics and Its Applications, 6(1):76–90.

Chen, J. and Liu, Y. (2013). Quantile and quantile–function estimations under density ratio model. The Annals of Statistics, 41(3):1669–1692.

Chen, S. X. and Hall, P. (1993). Smoothed empirical likelihood confidence intervals for quantiles. The Annals of Statistics, 21(3):1166–1181.

Cheng, K. F. and Chu, C. K. (2004). Semiparametric density estimation under a two–sample density ratio model. Bernoulli, 10(4):583–604.

Chow, Y. S. and Teicher, H. (1997). Probability Theory – Independence, Interchangeability, Martingales. Springer, New York, USA, 3rd edition.

Deheuvels, P. (1977). Estimation non paramétrique de la densité par histogrammes généralisés. Revue de Statistique Appliquée, 25(3):5–42.

Efron, B. and Tibshirani, R. (1996). Using specially designed exponential families for density estimation. The Annals of Statistics, 24(6):2431–2461.

Farewell, V. (1979). Some results on the estimation of logistic models based on retrospective data. Biometrika, 66:27–32.

Fletcher, R. (1970). A new approach to variable metric algorithms. Computer Journal, 13(3):317–322.

Fokianos, K. (2004). Merging information for semiparametric density estimation. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 66(4):941–958.

Fokianos, K. (2007). Density ratio model selection. Journal of Statistical Computation and Simulation, 77(9):805–819.

Fokianos, K. and Kaimi, I. (2006). On the effect of misspecifying the density ratio model. Annals of the Institute of Statistical Mathematics, 58:475–497.

Fokianos, K., Kedem, B., Qin, J., and Short, D. A. (2001). A semiparametric approach to the one–way layout. Technometrics, 43(1):56–65.

Fokianos, K. and Troendle, J. F. (2007). Inference for the relative treatment effect with the density ratio model.
Statistical Modelling, 7(2):155–173.

Gilbert, P. B. (2000). Large sample theory of maximum likelihood estimates in semiparametric biased sampling models. The Annals of Statistics, 28(1):151–194.

Gilbert, P. B., Lele, S. R., and Vardi, Y. (1999). Maximum likelihood estimation in semiparametric selection bias models with application to AIDS vaccine trials. Biometrika, 86(1):27–43.

Gill, R. D., Vardi, Y., and Wellner, J. A. (1988). Large sample theory of empirical distributions in biased sampling models. The Annals of Statistics, 16(3):1069–1112.

Goldfarb, D. (1970). A family of variable metric updates derived by variational means. Mathematics of Computation, 24(109):23–26.

Guan, Z., Qin, J., and Zhang, B. (2012). Information borrowing methods for covariate-adjusted ROC curve. The Canadian Journal of Statistics, 40(2):1–19.

Harville, D. A. (2008). Matrix Algebra From a Statistician’s Perspective. Springer, New York, USA, 1st edition.

Huang, A. (2014). Joint estimation of the mean and error distribution in generalized linear models. Journal of the American Statistical Association, 109(505):186–196.

Huang, A. and Rathouz, P. J. (2012). Proportional likelihood ratio models for mean regression. Biometrika, 99(1):223–229.

Jiang, S. and Tu, D. (2012). Inference on the probability p(t1 < t2) as a measurement of treatment effect under a density ratio model and random censoring. Computational Statistics and Data Analysis, 56:1069–1078.

Johnson, N. L., Kotz, S., and Balakrishnan, N. (1995). Continuous Univariate Distributions, Volume 2. Wiley–Interscience, Hoboken, USA, 2nd edition.

Kaplan, E. L. and Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53(282):457–481.

Keziou, A. and Leoni-Aubin, S. (2008). On empirical likelihood for semiparametric two–sample density ratio models. Journal of Statistical Planning and Inference, 138:915–928.

Luo, X. and Tsai, W. Y. (2012).
A proportional likelihood ratio model. Biometrika, 99(1):211–222.

Mantel, N. (1973). Synthetic retrospective studies and related topics. Biometrics, 29:479–486.

Marshall, A. W. and Olkin, I. (2007). Life Distributions – Structure of Nonparametric, Semiparametric and Parametric Families. Springer, New York, USA.

Mathai, A. M. (1992). Quadratic Forms in Random Variables: Theory and Applications. Marcel Dekker, Inc., New York, USA.

Nocedal, J. and Wright, S. J. (2006). Numerical Optimization. Springer, New York, USA, 2nd edition.

Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75:237–249.

Owen, A. B. (2001). Empirical Likelihood. Chapman & Hall, New York, USA.

Prentice, R. L. and Pyke, R. (1979). Logistic disease incidence models and case–control studies. Biometrika, 66:403–422.

Qin, J. (1993). Empirical likelihood in biased sample problems. The Annals of Statistics, 21(3):1182–1196.

Qin, J. (1998). Inferences for case–control and semiparametric two–sample density ratio models. Biometrika, 85(3):619–630.

Qin, J. and Lawless, J. (1994). Empirical likelihood and general estimating equations. The Annals of Statistics, 22(1):300–325.

Qin, J. and Liang, K. Y. (2011). Hypothesis testing in a mixture case–control model. Biometrics, 67:182–193.

Qin, J. and Zhang, B. (1997). A goodness of fit test for the logistic regression model based on case–control data. Biometrika, 84:609–618.

Rathouz, P. J. and Gao, L. (2009). Generalized linear models with unspecified reference distribution. Biostatistics, 10(2):205–218.

Ren, J. (2008). Weighted empirical likelihood in some two–sample semiparametric models with various types of censored data. The Annals of Statistics, 36(1):147–166.

Scholz, F. W. and Stephens, M. A. (1987). K–sample Anderson–Darling tests. Journal of the American Statistical Association, 82(339):918–924.

Shanno, D. F. (1970). Conditioning of quasi–Newton methods for function minimization.
Mathematics of Computation, 24(111):647–656.

Shen, Y., Ning, J., and Qin, J. (2012). Likelihood approaches for the invariant density ratio model with biased-sampling data. Biometrika, 99(2):363–378.

Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman & Hall, Boca Raton, USA, 1st edition.

Tan, Z. (2009). A note on profile likelihood for exponential tilt mixture models. Biometrika, 96(1):229–236.

van der Vaart, A. W. (2000). Asymptotic Statistics. Cambridge University Press, Cambridge, UK.

Vardi, Y. (1982). Nonparametric estimation in the presence of length bias. The Annals of Statistics, 10(2):616–620.

Vardi, Y. (1985). Empirical distributions in selection bias models. The Annals of Statistics, 13(1):178–203.

Wan, S. and Zhang, B. (2008). Comparing correlated ROC curves for continuous diagnostic tests under density ratio models. Computational Statistics and Data Analysis, 53:233–245.

Wang, C., Tan, Z., and Louis, T. A. (2011). Exponential tilt models for two-group comparison with censored data. Journal of Statistical Planning and Inference, 141:1102–1117.

Wilcox, R. R. (1995). ANOVA: A paradigm for low power and misleading measures of effect size. Review of Educational Research, 65(1):51–77.

Wu, C. F. J. and Hamada, M. S. (2009). Experiments: Planning, Analysis, and Optimization. Wiley, Hoboken, USA, 2nd edition.

Zhan, X. (2002). Matrix Inequalities. Springer–Verlag, Berlin, Germany.

Zhang, B. (2000). Quantile estimation under a two–sample semi–parametric model. Bernoulli, 6(3):491–511.

Zhang, B. (2002). Assessing goodness-of-fit of generalized logit models based on case–control data. Journal of Multivariate Analysis, 82:17–38.

Zhang, B. (2006). A partial empirical likelihood based score test under a semiparametric finite mixture model. Annals of the Institute of Statistical Mathematics, 58:707–719.

Zhang, F., editor (2005). The Schur complement and its applications. Springer, New York, USA.

Zorich, V. A. (2004). Mathematical Analysis II.
Springer–Verlag, Berlin, Germany.

Zou, F. (2002). A note on a partial empirical likelihood. Biometrika, 89(4):958–961.

Zou, F., Fine, J. P., and Yandell, B. S. (2002). On empirical likelihood for a semiparametric mixture model. Biometrika, 89(1):61–75.
Item Metadata
Title | On dual empirical likelihood inference under semiparametric density ratio models in the presence of multiple samples with applications to long term monitoring of lumber quality |
Creator | Cai, Song |
Publisher | University of British Columbia |
Date Issued | 2014 |
Genre | Thesis/Dissertation |
Type | Text |
Language | eng |
Date Available | 2014-05-26 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivs 2.5 Canada |
DOI | 10.14288/1.0167228 |
URI | http://hdl.handle.net/2429/46812 |
Degree | Doctor of Philosophy - PhD |
Program | Statistics |
Affiliation | Science, Faculty of; Statistics, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2014-09 |
Campus | UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/2.5/ca/ |
AggregatedSourceRepository | DSpace |