LIKELIHOOD RATIOS IN ASYMPTOTIC STATISTICAL THEORY

By

BRIAN GILBERT LEROUX
B.Sc., Carleton University, 1982

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE
in
THE FACULTY OF GRADUATE STUDIES
Department of Statistics

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
April 1985
© Brian Gilbert Leroux, 1985

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of
The University of British Columbia
1956 Main Mall
Vancouver, Canada V6T 1Y3

Date: March

ABSTRACT

This thesis deals with two topics in asymptotic statistics. A concept of asymptotic optimality for sequential tests of statistical hypotheses is introduced. Sequential Probability Ratio Tests are shown to have asymptotic optimality properties corresponding to their usual optimality properties. Secondly, the asymptotic power of Pearson's chi-square test for goodness of fit is derived in a new way.

The main tool for evaluating asymptotic performance of tests is the likelihood ratio of two hypotheses. In situations examined here the likelihood ratio based on a sample of size $n$ has a limiting distribution as $n \to \infty$ and the limit is also a likelihood ratio. To calculate limiting values of various performance criteria of statistical tests the calculations can be made using the limiting likelihood ratio.
TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgement
INTRODUCTION
CHAPTER 1 - THE THEORY OF LIKELIHOOD RATIOS
1.1 Likelihood Ratios and Hypothesis Testing
1.2 Sequential Tests of Hypotheses
1.3 Weak Convergence of Likelihood Ratios
1.4 Functional Convergence of Likelihood Ratios
1.5 Contiguity and Convergence of Experiments
CHAPTER 2 - ASYMPTOTIC OPTIMALITY OF SEQUENTIAL TESTS
2.1 Wald's Criterion
2.2 Bayes Risk Criterion
CHAPTER 3 - POWER OF CHI-SQUARE TESTS
BIBLIOGRAPHY
APPENDIX
A.1 Uniform Integrability of a Sequence of Stopping Rules
A.2 The Likelihood Ratio of Singular Multivariate Normal Distributions
A.3 Two Lemmas on Weak Convergence

LIST OF TABLES

Table I. Asymptotic power $1 - \Phi(Z_\alpha - \ldots)$ of the test based on $Z_n$ for values of size $\alpha$, power $\beta$ and degrees of freedom $k - 1$ of the chi-square test

LIST OF FIGURES

Fig. 1. Graph of $h$ which determines stopping boundaries $A$, $B$ of optimal SPRT

ACKNOWLEDGEMENT

The author, being one who thrives on encouragement, wishes to thank Professors Cindy Greenwood and John Petkau for the constant supply they gave.

INTRODUCTION

The motivation behind some of this work lies in a problem concerning a sequential procedure for testing the mean of a normal distribution. The following discussion of this problem follows [3]. We observe independent identically distributed observations $X_1, X_2, \ldots$ assumed to be distributed as $N(\mu, \sigma^2)$ for a known $\sigma^2$. It is required to find a sequential procedure for testing whether $\mu$ is positive or negative (sequential procedures are discussed in Section 1.2). The criterion by which procedures are to be judged is the Bayes Risk. This is defined in terms of a cost function having two components, one due to reaching an incorrect conclusion and a second depending on the number of observations on which the conclusion is based.
The proposed costs are $K|\mu|$ for making an error ($K$ is a constant) and a cost of $c$ per observation. The average cost for a given procedure will depend on $\mu$. To avoid the problems involved with this it is assumed that $\mu$ is a random variable, also with a normal distribution. If its mean and variance are specified the average cost can be averaged further against this distribution for $\mu$. The result is the Bayes Risk.

In the development of a sequential procedure which minimizes the Bayes Risk the partial sums of the observations are replaced by a Brownian motion. This is a reasonable approximation if the number of observations can be expected to be large, and this can be expected when the cost $c$ is small. A procedure which is optimal (minimizes Bayes Risk) in the continuous time setting is derived and then applied (with a small adjustment) to the discrete time setting.

It is desired to have a result stating that this procedure is asymptotically optimal in some sense which can be made precise. Asymptotic here refers to $c$ approaching zero. Results along these lines can be found in [13] where the setting is the more complicated situation of sequential medical trials in which further components of cost are considered (see [4]). This author attempted to establish similar results using the theory of weak convergence of likelihood ratios which will be discussed in Chapter 1. Success was met only in simple hypothesis testing settings where there are only two possible states of nature. Chapter 2 presents discussions of asymptotically optimal sequential procedures which are based on the likelihood ratio. It is believed that the methods used there could be applied successfully in more complicated situations.

Another area for application of likelihood ratio theory lies in the calculation of asymptotic performance of other tests not necessarily based on the likelihood ratio. In Chapter 3 the asymptotic power of chi-square tests is studied via the theory of Chapter 1.
It is indicated there that the chi-square test is asymptotically inefficient compared to a test based on the likelihood ratio.

CHAPTER 1
THE THEORY OF LIKELIHOOD RATIOS

1.1 Likelihood Ratios and Hypothesis Testing

We describe the general hypothesis testing problem of distinguishing two probability measures. On a set $\Omega$ let there be probability measures $P_0$ and $P_1$. A random element $X$ of $\Omega$ is chosen and the question is asked: was $X$ chosen based on the distribution $P_0$ or the distribution $P_1$? A decision rule for answering the question is a subset $D$ of $\Omega$; if $X$ belongs to $D$ then it is decided that $P_1$ is the true distribution, otherwise $P_0$. In common language $D$ is a test of the simple hypothesis $H_0: P_0$ versus the simple hypothesis $H_1: P_1$. $D$ is also called the rejection region because the occurrence of the event $D$ leads to the rejection of the null hypothesis $H_0$ in favor of the alternative $H_1$.

Each decision rule has associated with it two error probabilities:

$\alpha(D) = P_0(D)$ = probability of rejecting $H_0$ when it is true, and

$\beta(D) = P_1(D^c)$ = probability of accepting $H_0$ when it is false,

called the type I error and type II error respectively. $\alpha(D)$ is also called the level and $1 - \beta(D)$ the power of the test $D$. Because it is generally impossible to minimize both types of error simultaneously, various criteria for comparing decision rules have been employed. In many cases the best rules are based on the likelihood ratio which we will now define.

Given two probability measures $P_0$ and $P_1$ such that $P_1$ is absolutely continuous with respect to $P_0$ ($P_1 \ll P_0$), their likelihood ratio is the Radon-Nikodym derivative $dP_1/dP_0$. This can be generalized by defining the likelihood ratio of any two probability measures $P_0$, $P_1$ on a measure space $(\Omega, F)$ by

$$Z = \frac{dP_1/d\mu}{dP_0/d\mu} \tag{1.1}$$

where $\mu$ is any measure on $(\Omega, F)$ such that $P_1 \ll \mu$ and $P_0 \ll \mu$ (such as $\mu = P_0 + P_1$). The conventions $1/0 = \infty$ and $0/0 = 0$ are used in (1.1).
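For a concrete feel for definition (1.1), the following minimal sketch evaluates $Z$ on a finite sample space, taking $\mu = P_0 + P_1$ and applying the conventions $1/0 = \infty$ and $0/0 = 0$. The function name, the dictionary representation of the two measures, and the three-point example are illustrative assumptions and do not come from the thesis.

```python
import math


def likelihood_ratio(p0, p1):
    """Evaluate Z = (dP1/dmu) / (dP0/dmu) pointwise on a finite set.

    p0, p1 are hypothetical dicts mapping outcomes to probabilities.
    The dominating measure is mu = P0 + P1, and the conventions
    x/0 = inf (for x > 0) and 0/0 = 0 are applied.
    """
    z = {}
    for w in set(p0) | set(p1):
        a, b = p0.get(w, 0.0), p1.get(w, 0.0)
        mu = a + b
        x0 = a / mu if mu > 0 else 0.0  # dP0/dmu at w
        x1 = b / mu if mu > 0 else 0.0  # dP1/dmu at w
        if x0 > 0:
            z[w] = x1 / x0
        else:
            z[w] = math.inf if x1 > 0 else 0.0
    return z


# P1 puts mass on "c", which P0 does not see, so P1 is not
# absolutely continuous with respect to P0.
P0 = {"a": 0.5, "b": 0.5, "c": 0.0}
P1 = {"a": 0.25, "b": 0.25, "c": 0.5}
Z = likelihood_ratio(P0, P1)
print(Z)
```

In this example $Z$ is finite exactly where $P_0$ has mass, and it equals $\infty$ on the point that only $P_1$ charges.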
In order to show that $Z$ does not depend on the particular choice of $\mu$ the following result is needed.

Lebesgue Decomposition. For any $A \in F$,

$$\int_A Z \, dP_0 = P_1(A \cap \{Z < \infty\}).$$

Proof: Let $X_0 = dP_0/d\mu$ and $X_1 = dP_1/d\mu$. Then for any $A \in F$

$$\int_A Z \, dP_0 = \int_{A \cap \{Z < \infty\}} Z \, dP_0 = \int_{A \cap \{Z < \infty\}} X_0 Z \, d\mu = \int_{A \cap \{Z < \infty\}} X_1 \, d\mu = P_1(A \cap \{Z < \infty\})$$

since $\{Z = \infty\} = \{X_0 = 0\}$ and $X_0 Z = X_1$ on $\{Z < \infty\}$.

Now because $P_0(X_0 = 0) = 0$, $Z$ is a finite random variable on the probability space $(\Omega, F, P_0)$. For any $A \subset \{Z < \infty\}$ the integral $\int_A Z \, dP_0 = P_1(A)$ is determined and so $Z$ is uniquely determined on $(\Omega, F, P_0)$. By symmetry $1/Z$ is uniquely determined on $(\Omega, F, P_1)$ and hence $Z$ is also uniquely determined on $(\Omega, F, P_1)$. The notation $dP_1/dP_0$ is used to denote this extension of the Radon-Nikodym derivative and from here on $dP_1/dP_0$ will denote $Z$ as defined in (1.1).

Example. When $P_0$ and $P_1$ are probability distributions on $R$ with densities $f_0$ and $f_1$ respectively, the ratio of densities $Z = f_1/f_0$ is the likelihood ratio of $P_0$ and $P_1$. It need not be assumed that the support of $f_1$ is contained in the support of $f_0$.

Since $Z$ is expected to be larger when $H_1$ is true, a reasonable test of $H_0$ vs. $H_1$ uses the decision rule $D^* = \{Z \ge C\}$ where $C$ is a constant which determines the level of the test. This rule has the following nice property.

Neyman-Pearson Lemma. If $D$ is any test of $H_0$ vs. $H_1$ satisfying $\alpha(D) \le \alpha(D^*)$ then $\beta(D) \ge \beta(D^*)$.

Proof: By the Lebesgue Decomposition

$$\beta(D) = P_1(D^c) = \int_{D^c} Z \, dP_0 + P_1(D^c \cap \{Z = \infty\}).$$

Since $D^{*c} \cap \{Z = \infty\} = \emptyset$, $P_1(D^c \cap \{Z = \infty\}) \ge P_1(D^{*c} \cap \{Z = \infty\})$. Also

$$\int_{D^c} Z \, dP_0 - \int_{D^{*c}} Z \, dP_0 = \int_{D^*} (I_{D^c} - I_{D^{*c}}) Z \, dP_0 + \int_{D^{*c}} (I_{D^c} - I_{D^{*c}}) Z \, dP_0 .$$

On $D^*$ we have $Z \ge C$ and $I_{D^c} - I_{D^{*c}} \ge 0$, while on $D^{*c}$ we have $Z < C$ and $I_{D^c} - I_{D^{*c}} \le 0$. Therefore

$$\int_{D^c} Z \, dP_0 - \int_{D^{*c}} Z \, dP_0 \ge C \int (I_{D^c} - I_{D^{*c}}) \, dP_0 = C[P_0(D^c) - P_0(D^{*c})] = C[P_0(D^*) - P_0(D)] \ge 0.$$

This result says that among all rules having type I error at most $P_0(D^*)$, $D^*$ has the smallest type II error.
Equivalently, among all rules with type II error at most $P_1(D^{*c})$, $D^*$ has the smallest type I error. A simple and symmetric formulation is: no rule can simultaneously have both a smaller type I error and a smaller type II error than $D^*$. If $P_0(D^*) = \alpha$ then $D^*$ is called an optimal level-$\alpha$ test.

For a given number $\alpha$ it may be impossible to find a number $C$ such that $P_0(Z \ge C) = \alpha$. There will always be a randomized decision rule which achieves this but these will not be considered here. See [15] for a discussion of randomized rules.

Decision rules having the form of $D^*$ are optimal also in the sense of minimal Bayes Risk. The Bayes Risk for a rule $D$ is

$$\pi \, \alpha(D) + (1 - \pi) \, \beta(D) \tag{1.2}$$

where $\pi$ is the prior probability of the distribution being $P_0$. This assumes the 0-1 cost (or loss) function whereby a cost of 1 is incurred when an error of either type is made. The expected cost is then the probability (under the appropriate hypothesis) of making an error, and when this is averaged over the two hypotheses according to the prior probability $\pi$, (1.2) results. For fixed $\pi$ the Bayes Risk is minimized by $D^*$ with $C$ having the value $\pi/(1 - \pi)$, i.e.

$$\inf_D [\pi \, \alpha(D) + (1 - \pi) \, \beta(D)] = \int [\pi \wedge (1 - \pi) Z] \, dP_0$$

and the infimum is achieved at $D_\pi = \{Z \ge \pi/(1 - \pi)\}$. This is proved easily using the Lebesgue Decomposition as follows. First,

$$\pi \, \alpha(D_\pi) + (1 - \pi) \, \beta(D_\pi) = \int [\pi \wedge (1 - \pi) Z] \, dP_0$$

and for any $D$

$$\pi \, \alpha(D) + (1 - \pi) \, \beta(D) = \int_D \pi \, dP_0 + \int_{D^c} (1 - \pi) Z \, dP_0 + (1 - \pi) P_1(D^c \cap \{Z = \infty\})$$

$$\ge \int_D \pi \, dP_0 + \int_{D^c} (1 - \pi) Z \, dP_0 \ge \int [\pi \wedge (1 - \pi) Z] \, dP_0 .$$

1.2 Sequential Tests of Hypotheses

Wald's Sequential Probability Ratio Test (SPRT) is a procedure for testing $H_0: P_0$ vs. $H_1: P_1$ based on a sequence $X_1, X_2, \ldots$ of independent identically distributed (i.i.d.) random variables having distribution either $P_0$ or $P_1$. If $X_1, \ldots, X_k$ are observed the SPRT uses the statistic

$$Z_k = \prod_{i=1}^{k} Z(X_i) \tag{1.3}$$

where $Z$ is the likelihood ratio $dP_1/dP_0$.
This is reasonable because $Z_k$ is the likelihood ratio of the distribution of $X_1, \ldots, X_k$ under $P_1$ with respect to the distribution under $P_0$. The SPRT proceeds as follows:

if $Z_k \ge A$ then $H_1$ is accepted;
if $Z_k \le B$ then $H_0$ is accepted;
if $B < Z_k < A$ then another observation is taken,

with $A$ and $B$ satisfying $0 < B \le 1 \le A < \infty$. This procedure can be expressed in terms of the stopping rule

$$T = \inf\{k : Z_k \ge A \text{ or } Z_k \le B\} \tag{1.4}$$

and the decision rule

$$D = \{Z_T \ge A\}, \tag{1.5}$$

where $Z_T$ denotes the value of $Z_k$ when $T = k$.

An immediate question arises: can it happen that $Z_k$ never crosses the boundaries determined by $A$ and $B$? To answer this question probability distributions, corresponding to $P_0$ and $P_1$, for infinite sequences $X_1, X_2, \ldots$ must be used. These are the infinite product measures denoted $Q_0$ and $Q_1$. The question above is answered in the negative by

$$Q_0(B < Z_k < A, \ k = 1, 2, \ldots) = 0, \qquad Q_1(B < Z_k < A, \ k = 1, 2, \ldots) = 0.$$

These statements are implied by the stronger results

$$Z_k \to 0 \ \text{a.s. under } Q_0, \qquad Z_k \to \infty \ \text{a.s. under } Q_1 .$$

A proof of these uses the Strong Law of Large Numbers applied to the sequence $\{-a \vee \log Z(X_i)\}_{i=1}^{\infty}$, where $X \vee Y$ is defined to be the larger of $X$ and $Y$. Note that the SPRT could equally well be defined with $\log Z_k$ in place of $Z_k$. The Strong Law of Large Numbers yields

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} (-a \vee \log Z(X_i)) = E_0(-a \vee \log Z(X_1)) \quad \text{a.s. } (Q_0)$$

where $E_0$ denotes expected value under $Q_0$. By Jensen's Inequality, provided $P_0$ and $P_1$ are distinct in the sense that $P_0(Z = 1) < 1$,

$$E_0(\log Z) = \int \log Z \, dP_0 < \log \int Z \, dP_0 \le 0.$$

For large enough $a$ then

$$E_0(-a \vee \log Z(X_1)) = E_0(-a \vee \log Z) < 0$$

and

$$\lim_{k \to \infty} \sum_{i=1}^{k} (-a \vee \log Z(X_i)) = -\infty \quad \text{a.s. } (Q_0),$$

$$\lim_{k \to \infty} \sum_{i=1}^{k} \log Z(X_i) = -\infty \quad \text{a.s. } (Q_0),$$

$$\lim_{k \to \infty} Z_k = 0 \quad \text{a.s. } (Q_0).$$

By symmetry, $\lim_{k \to \infty} 1/Z_k = 0$ a.s. $(Q_1)$ and so $\lim_{k \to \infty} Z_k = \infty$ a.s. $(Q_1)$.

We have just seen that the conditions for stopping in (1.4) will be met eventually, i.e., $T$ is a.s. finite. Now let us compare the SPRT to other sequential tests of $H_0$ vs. $H_1$.
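The stopping rule (1.4) and decision rule (1.5) can be sketched in a few lines. The code below is an illustrative sketch only: the function names, the choice of $N(0,1)$ versus $N(\theta_0, 1)$ with $\theta_0 = 1$, the boundaries $A = 19$, $B = 1/19$, the seed and the number of replications are all assumptions made for the example. It works with $\log Z_k$, which the text notes is equivalent.

```python
import math
import random


def sprt(observations, log_lr, A, B):
    """Run the SPRT on an iterable of observations.

    log_lr(x) returns log Z(x); A and B are the stopping boundaries.
    Returns (T, accept_H1) where T is the stopping time of (1.4) and
    accept_H1 is True when Z_T >= A, as in (1.5).
    """
    log_a, log_b = math.log(A), math.log(B)
    log_z = 0.0
    for k, x in enumerate(observations, start=1):
        log_z += log_lr(x)
        if log_z >= log_a:
            return k, True
        if log_z <= log_b:
            return k, False
    raise ValueError("observations exhausted before a boundary was crossed")


def normal_stream(mean, rng):
    """Endless stream of N(mean, 1) draws (hypothetical data source)."""
    while True:
        yield rng.gauss(mean, 1.0)


THETA0 = 1.0


def log_lr_normal(x):
    # log of the N(THETA0, 1) density over the N(0, 1) density at x
    return THETA0 * x - THETA0 ** 2 / 2


rng = random.Random(0)
runs = [sprt(normal_stream(0.0, rng), log_lr_normal, A=19.0, B=1 / 19)
        for _ in range(2000)]
type_one_rate = sum(accept for _, accept in runs) / len(runs)
mean_sample_size = sum(t for t, _ in runs) / len(runs)
print(type_one_rate, mean_sample_size)
```

Every simulated run terminates, in line with the almost sure finiteness of $T$, and the empirical type I error rate and average sample size under $H_0$ can be read off the printed values.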
A sequential test in general consists of a stopping rule and a decision rule. A stopping rule is a random variable $T$ taking values in the positive integers such that the set $\{T = k\}$ depends only on $X_1, \ldots, X_k$. The decision rule of a sequential test is a set $D$ which depends only on the observations $X_i$ up until the random time $T$, i.e., for each $k$, $D \cap \{T = k\}$ is determined by $X_1, \ldots, X_k$.

Criteria for comparing sequential tests include error probabilities and the Average Sample Number (ASN), which is the expected value of the stopping rule. Only tests with finite ASN will be considered worthwhile; this implies that the conditions for stopping will almost surely be met eventually. The error probabilities are defined exactly as for non-sequential tests, $\alpha(T, D) = Q_0(D)$ and $\beta(T, D) = Q_1(D^c)$. The SPRT has the following optimality property.

Optimality Property of the SPRT. Let $\alpha = Q_0(D)$ and $\beta = Q_1(D^c)$ be the error probabilities of the SPRT defined in (1.4) and (1.5). If $(T', D')$ is any other sequential test of $H_0: P_0$ vs. $H_1: P_1$ with smaller error probabilities,

$$Q_0(D') \le \alpha, \qquad Q_1(D'^c) \le \beta,$$

then the SPRT has smaller ASN under both hypotheses,

$$E_0(T') \ge E_0(T) \quad \text{and} \quad E_1(T') \ge E_1(T).$$

(As for $E_0$, $E_1$ denotes expectation under $Q_1$.)

There have been four strategies for proving this result. The original is due to Wald and Wolfowitz [23], another is due to Lehmann (see [15] or [11]) and two others ([3], [20]) first prove that SPRTs are Bayes procedures. This latter result is important enough to be stated as a separate result.

The Bayes Risk of a sequential test of $H_0: P_0$ vs. $H_1: P_1$ which uses stopping rule $T$ and decision rule $D$ is

$$\rho(T, D; \pi) = \pi (Q_0(D) + c E_0(T)) + (1 - \pi)(Q_1(D^c) + c E_1(T)) \tag{1.6}$$

where $\pi$ is the prior probability of the true distribution being $P_0$ and $c$ is the cost per observation. Just as for the Bayes Risk in (1.2) the 0-1 cost function is employed here.
Bayes Optimality of the SPRT: There exist constants $A$ and $B$ which depend on $\pi$ such that the Bayes Risk (1.6) is minimized by the SPRT which has stopping boundaries $A$ and $B$.

This property is proved in [20]; in Section 2.2 we will demonstrate how to adapt the argument given there to the continuous time setting.

1.3 Weak Convergence of Likelihood Ratios

For testing simple hypotheses, tests based on the likelihood ratio are good tests as measured by the optimality properties we have just seen. For testing a simple null hypothesis against a composite alternative a reasonable approach consists of choosing one element of the alternative, thus forming a new simple alternative. For example let $X_1, \ldots, X_n$ be a random sample from the $N(\theta, 1)$ distribution and consider testing $H_0: \theta = 0$ vs. $H_1: \theta > 0$. One test is based on the likelihood ratio for $H_0: \theta = 0$ vs. $H_1: \theta = \theta_0$ for some fixed $\theta_0 > 0$. In this case the likelihood ratio is

$$Z(x) = \frac{\frac{1}{\sqrt{2\pi}} e^{-(x - \theta_0)^2/2}}{\frac{1}{\sqrt{2\pi}} e^{-x^2/2}} = e^{\theta_0 x - \theta_0^2/2}$$

and the likelihood ratio based on $X_1, \ldots, X_k$ is

$$Z_k = \prod_{i=1}^{k} Z(X_i) = \exp\Big(\theta_0 \sum_{i=1}^{k} X_i - k \theta_0^2/2\Big). \tag{1.7}$$

Another example involves the parameter $\theta$ in the $\exp(1 + \theta)$ distribution. The likelihood ratio for $H_0: \theta = 0$ vs. $H_1: \theta = \theta_0$ is

$$Z(x) = \frac{(1 + \theta_0) e^{-(1 + \theta_0) x}}{e^{-x}} = (1 + \theta_0) e^{-\theta_0 x}, \tag{1.8}$$

and the likelihood ratio based on a random sample $X_1, \ldots, X_k$ is

$$Z_k = \prod_{i=1}^{k} Z(X_i) = (1 + \theta_0)^k \exp\Big(-\theta_0 \sum_{i=1}^{k} X_i\Big). \tag{1.9}$$

One way of comparing two tests of $H_0: \theta = 0$ vs. $H_1: \theta > 0$ is to look at their performance for testing $H_0$ vs. simple alternatives and let the alternatives approach $H_0$. A common choice for alternative is $H_{1,n}: \theta = c/\sqrt{n}$ where $n$ is the sample size. The reason for this choice is the desire for the test statistic to have a non-degenerate limiting distribution under the alternative. This enables one to calculate limiting (or asymptotic) power and it is by this criterion that tests will be compared. Tests which perform well according to this are considered sensitive to small departures from $\theta = 0$.
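The closed forms for $Z_k$ in the normal and exponential examples above can be checked numerically against a direct product of density ratios. The sketch below does this on the log scale; the helper names, the sample values and the choice $\theta_0 = 0.5$ are illustrative assumptions, and $\exp(1 + \theta)$ is read as the exponential distribution with rate $1 + \theta$.

```python
import math


def normal_pdf(x, theta):
    """Density of N(theta, 1) at x."""
    return math.exp(-(x - theta) ** 2 / 2) / math.sqrt(2 * math.pi)


def exp_pdf(x, rate):
    """Density of the exponential distribution with the given rate at x >= 0."""
    return rate * math.exp(-rate * x)


def log_lr_normal(xs, theta0):
    """log Z_k for N(0,1) vs N(theta0,1): theta0 * sum(x) - k * theta0^2 / 2."""
    return theta0 * sum(xs) - len(xs) * theta0 ** 2 / 2


def log_lr_exponential(xs, theta0):
    """log Z_k for rate 1 vs rate 1 + theta0: k * log(1 + theta0) - theta0 * sum(x)."""
    return len(xs) * math.log1p(theta0) - theta0 * sum(xs)


def log_lr_direct(xs, f1, f0):
    """Sum of log density ratios, computed without any closed form."""
    return sum(math.log(f1(x) / f0(x)) for x in xs)


xs = [0.3, 1.2, 0.7]  # hypothetical observations
theta0 = 0.5

normal_gap = abs(
    log_lr_normal(xs, theta0)
    - log_lr_direct(xs, lambda x: normal_pdf(x, theta0), lambda x: normal_pdf(x, 0.0))
)
exp_gap = abs(
    log_lr_exponential(xs, theta0)
    - log_lr_direct(xs, lambda x: exp_pdf(x, 1 + theta0), lambda x: exp_pdf(x, 1.0))
)
print(normal_gap, exp_gap)
```

Both gaps should be at the level of floating point rounding, since the closed forms are algebraic rearrangements of the product of density ratios.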
For making asymptotic power calculations the limiting distribution of the likelihood ratio is useful in two ways:

1. When measuring the performance of tests based on the likelihood ratio its limiting distribution is essential.
2. The limiting distribution, under the alternative, of other statistics can be found from the joint limiting distribution, under the null, of the statistic and the likelihood ratio.

These two uses are explored in Chapter 2 and Chapter 3. In Chapter 2 tests based on $Z$ are shown to have certain asymptotic optimality properties. In Chapter 3 weak convergence of $Z$ is used to find the limiting distribution of Pearson's chi-square statistic for goodness of fit tests.

Let us examine the asymptotic distribution of the likelihood ratio in the exponential example introduced above. For each $n$ there is a random sample $X_1^n, X_2^n, \ldots$ from the $\exp(1 + \theta)$ distribution. The hypothesis $H_{1,n}$ says that $\theta = \theta_0/\sqrt{n}$ for this sample. The likelihood ratio for $H_0: \theta = 0$ vs. $H_{1,n}$ is, from (1.8),

$$Z^n(x) = \Big(1 + \frac{\theta_0}{\sqrt{n}}\Big) e^{-(\theta_0/\sqrt{n}) x}$$

and the likelihood ratio statistic based on $X_1^n, \ldots, X_n^n$ is

$$Z_n^n = \prod_{i=1}^{n} Z^n(X_i^n) = \Big(1 + \frac{\theta_0}{\sqrt{n}}\Big)^n \exp\Big(-\frac{\theta_0}{\sqrt{n}} \sum_{i=1}^{n} X_i^n\Big).$$

By a Taylor expansion of $\log(1 + x)$, $\log Z_n^n$ can be written

$$\log Z_n^n = n \log\Big(1 + \frac{\theta_0}{\sqrt{n}}\Big) - \frac{\theta_0}{\sqrt{n}} \sum_{i=1}^{n} X_i^n = n\Big(\frac{\theta_0}{\sqrt{n}} - \frac{\theta_0^2}{2n} + O(n^{-3/2})\Big) - \frac{\theta_0}{\sqrt{n}} \sum_{i=1}^{n} X_i^n$$

$$= -\frac{\theta_0}{\sqrt{n}} \sum_{i=1}^{n} (X_i^n - 1) - \frac{\theta_0^2}{2} + O\Big(\frac{1}{\sqrt{n}}\Big)$$

(i.e., the remainder term $O(1/\sqrt{n})$ is deterministic and converges to 0 at rate $1/\sqrt{n}$). Now under $H_0$, $\{X_i^n - 1\}_{i=1}^{n}$ is a sequence of i.i.d. mean 0, variance 1 random variables and hence $(1/\sqrt{n}) \sum_{i=1}^{n} (X_i^n - 1) \xrightarrow{d} N(0, 1)$ by the Central Limit Theorem. Therefore

$$\log Z_n^n \xrightarrow{d} N\Big(-\frac{\theta_0^2}{2}, \theta_0^2\Big) \quad \text{under } H_0. \tag{1.10}$$

Similarly under $H_{1,n}$

$$\Big\{\Big(1 + \frac{\theta_0}{\sqrt{n}}\Big)\Big(X_i^n - \frac{1}{1 + \theta_0/\sqrt{n}}\Big)\Big\}_{i=1}^{n}$$

is a sequence of i.i.d. mean 0, variance 1 random variables and thus, using the same Taylor expansion,

$$\log Z_n^n = -\frac{\theta_0}{1 + \theta_0/\sqrt{n}} \cdot \frac{1 + \theta_0/\sqrt{n}}{\sqrt{n}} \sum_{i=1}^{n} \Big(X_i^n - \frac{1}{1 + \theta_0/\sqrt{n}}\Big) + \frac{\theta_0^2}{1 + \theta_0/\sqrt{n}} - \frac{\theta_0^2}{2} + O\Big(\frac{1}{\sqrt{n}}\Big)$$

$$\xrightarrow{d} -\theta_0 N(0, 1) + \theta_0^2 - \frac{\theta_0^2}{2}$$

where $N(0, 1)$ stands for a random variable having that distribution. Therefore

$$\log Z_n^n \xrightarrow{d} N\Big(\frac{\theta_0^2}{2}, \theta_0^2\Big) \quad \text{under } H_{1,n}.$$

There is a connection between the limiting distributions of $\log Z_n^n$ under the null and alternative hypotheses and another hypothesis testing problem which is thought of as a "limiting problem." Given $X \sim N(\theta, 1)$ the likelihood ratio statistic for testing $H_0: \theta = 0$ vs. $H_1: \theta = \theta_0 \ne 0$ is

$$Z(X) = \exp\Big(\theta_0 X - \frac{\theta_0^2}{2}\Big).$$

Thus

$$\log Z(X) \overset{d}{=} N\Big(-\frac{\theta_0^2}{2}, \theta_0^2\Big) \quad \text{under } H_0, \tag{1.11}$$

$$\log Z(X) \overset{d}{=} N\Big(\frac{\theta_0^2}{2}, \theta_0^2\Big) \quad \text{under } H_1, \tag{1.12}$$

and hence

$$\log Z_n^n \xrightarrow{d} \log Z(X) \quad \text{under } H_0 \text{ and under } H_{1,n}. \tag{1.13}$$

We will use this fact for evaluating the asymptotic properties of tests based on $Z_n^n$ as the sample size $n$ gets large. The broadest interpretation of (1.13) is that the parameter $\theta$ in the $\exp(1 + \theta/\sqrt{n})$ family of distributions plays the same role asymptotically as $\theta$ in the $N(\theta, 1)$ family. We pursue this idea in Section 1.5.

For evaluating the performance of tests of the form $\{Z_n^n \ge K_n\}$ or equivalently $\{\log Z_n^n \ge C_n\}$ let $P_1^n$ be the $\exp(1 + \theta_0/\sqrt{n})$ distribution and $P_0^n$ be the $\exp(1)$ distribution. In the notation introduced in Section 1.1, $Z^n = dP_1^n/dP_0^n$. Now if the test mentioned previously is to have asymptotic level $\alpha$, i.e.,

$$P_0^n(\log Z_n^n \ge C_n) \to \alpha \quad \text{as } n \to \infty,$$

then (1.11) and (1.13) imply that

$$\frac{C_n + \theta_0^2/2}{\theta_0} \to Z_\alpha, \quad \text{i.e.,} \quad C_n \to \theta_0 Z_\alpha - \frac{\theta_0^2}{2} \quad \text{as } n \to \infty,$$

where $Z_\alpha$ is the $100(1 - \alpha)$ percentile of the standard normal distribution. Note that from (1.11) and (1.13) it follows that

$$\frac{\log Z_n^n + \theta_0^2/2}{\theta_0} \xrightarrow{d} N(0, 1) \quad \text{under } H_0 \text{ (under } P_0^n\text{)}.$$
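As a numerical companion to (1.11) and (1.12), the sketch below computes the limiting critical value $\theta_0 Z_\alpha - \theta_0^2/2$ and evaluates the level and power of the corresponding test in the limiting normal problem. The function name and the values $\theta_0 = 1$, $\alpha = 0.05$ are illustrative assumptions; the code assumes $\theta_0 > 0$ and relies on `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist


def limiting_test(theta0, alpha):
    """Critical value, level and power of {log Z(X) >= c} for X ~ N(theta, 1).

    Assumes theta0 > 0. Under H0, log Z(X) ~ N(-theta0^2/2, theta0^2) as in
    (1.11); under H1, log Z(X) ~ N(theta0^2/2, theta0^2) as in (1.12).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # 100(1 - alpha) percentile
    c = theta0 * z_alpha - theta0 ** 2 / 2     # limiting critical value
    null = NormalDist(-theta0 ** 2 / 2, theta0)
    alt = NormalDist(theta0 ** 2 / 2, theta0)
    level = 1 - null.cdf(c)
    power = 1 - alt.cdf(c)
    return c, level, power


c, level, power = limiting_test(theta0=1.0, alpha=0.05)
print(c, level, power)
```

The computed level reproduces $\alpha$, and the power depends on $\theta_0$ and $\alpha$ only through $Z_\alpha - \theta_0$.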
Now by (1.12) and (1.13) the asymptotic power of this test is

$$\lim_{n \to \infty} P_1^n(\log Z_n^n \ge C_n) = \lim_{n \to \infty} P_1^n\Big(\frac{\log Z_n^n - \theta_0^2/2}{\theta_0} \ge \frac{C_n - \theta_0^2/2}{\theta_0}\Big) = 1 - \Phi(Z_\alpha - \theta_0) \tag{1.14}$$

where $\Phi$ is the distribution function of the standard normal distribution.

How does this compare to the asymptotic power of other tests of $H_0$ vs. $H_{1,n}$ which have asymptotic level $\alpha$? This can be answered by considering the power of level $\alpha$ tests of $H_0: \theta = 0$ vs. $H_1: \theta = \theta_0$ based on $X$ having distribution $N(\theta, 1)$. Now $1 - \Phi(Z_\alpha - \theta_0)$ is the power of the test

$$\Big\{\frac{\log Z(X) - \theta_0^2/2}{\theta_0} \ge Z_\alpha - \theta_0\Big\} \quad \text{or} \quad \{\log Z(X) \ge \theta_0 Z_\alpha - \theta_0^2/2\}$$

and by the Neyman-Pearson lemma this test has the greatest possible power. From this it can be shown (see [10]) that the test $\{\log Z_n^n \ge \theta_0 Z_\alpha - \theta_0^2/2\}$ has the greatest asymptotic power among all tests having asymptotic level $\alpha$. This is demonstrated in general in [10]; the particular form of the likelihood ratios is not important.

We will produce a similar result in the sequential testing situation in Chapter 2. For this purpose it is necessary to study the functional convergence of the likelihood ratio viewed as a stochastic process. We take up this topic next.

1.4 Functional Convergence of Likelihood Ratios

In Section 1.2 sequential procedures for testing simple hypotheses were examined and the SPRT was seen to be optimal in certain ways. The question of asymptotic power against alternatives tending to the null leads to the study of the limiting distribution of the likelihood ratio considered as a process with time measured by observations of the data points. The data $X_1, X_2, \ldots$ are i.i.d. observations from distribution either $P_0$ or $P_1$. The likelihood ratio process $\{Z_k : k = 1, 2, \ldots\}$ is defined in (1.3).

In the non-sequential case of the previous section a sequence of alternative hypotheses was indexed by the sample size and the alternative grew closer to the null as the sample size increased.
With a larger number of observations smaller departures from the null hypothesis can be detected with equal power. In the sequential situation, to detect nearby alternatives many observations are required on average and so it is reasonable to approximate the likelihood ratio process by a continuous time process.

Based on an i.i.d. sequence $X_1^n, X_2^n, \ldots$ with distribution either $P_0^n$ or $P_1^n$ one continuous time version of the likelihood ratio process is

$$Z^n(t) = \prod_{i=1}^{[nt]} Z^n(X_i^n) \tag{1.15}$$

where $Z^n = dP_1^n/dP_0^n$ as defined in (1.1) and $[nt]$ denotes the integer part of $nt$. Symbolically, the observations $X_i^n$ are associated with the time points $i/n$.

Example. If $X_1^n, X_2^n, \ldots$ are independent $N(\theta, 1)$ random variables and $H_0$ specifies $\theta = 0$ while $H_{1,n}$ specifies $\theta = \theta_0/\sqrt{n}$ then, by (1.7) with $\theta_0/\sqrt{n}$ in place of $\theta_0$,

$$\log Z^n(t) = \frac{\theta_0}{\sqrt{n}} \sum_{i=1}^{[nt]} X_i^n - \frac{[nt]}{n} \cdot \frac{\theta_0^2}{2}. \tag{1.16}$$

It is known that processes of this form converge weakly to a Brownian motion (e.g., see Corollary 6 of [16]); in this case we have

$$\log Z^n(t) \xrightarrow{w} \theta_0 B(t) - \frac{\theta_0^2}{2} t \quad \text{under } H_0 \tag{1.17}$$

and

$$\log Z^n(t) \xrightarrow{w} \theta_0 B(t) + \frac{\theta_0^2}{2} t \quad \text{under } H_{1,n} \tag{1.18}$$

where $\{B(t) : t \ge 0\}$ is a standard Brownian motion. The convergence $\xrightarrow{w}$ takes place in the space $D([0, \infty))$ of right continuous functions with left limits with the Skorohod metric (see [1]). However, because $B(t)$ has continuous sample paths we can use the alternative formulation (stated here under $H_0$)

$$f(\log Z^n(\cdot)) \xrightarrow{d} f\Big(\theta_0 B(\cdot) - \frac{\theta_0^2}{2}(\cdot)\Big)$$

for functionals $f$ continuous with respect to the sup-norm (uniform) metric ([1]).

As in the non-functional case the limiting distributions of the log-likelihood under the null and alternative hypotheses are the distributions of the log-likelihood process for a "limiting hypothesis testing problem." This fact will be used for computation of asymptotic power and the derivation of asymptotically optimal sequential procedures in Chapter 2.

Conditions which guarantee the weak convergence in (1.17) and (1.18) for general likelihood ratios are explored in [12].
One result states that

$$\log Z^n(t) \xrightarrow{w} B(t) - \frac{\lambda}{2} t \quad \text{under } H_0 \tag{1.19}$$

and

$$\log Z^n(t) \xrightarrow{w} B(t) + \frac{\lambda}{2} t \quad \text{under } H_1 \tag{1.20}$$

if and only if

$$n \int \Big(\sqrt{f_1^n(x)} - \sqrt{f_0^n(x)}\Big)^2 dx \to \frac{\lambda}{4} \tag{1.21}$$

and

$$n \int_{A_n(\varepsilon)} \Big(\sqrt{f_1^n(x)} - \sqrt{f_0^n(x)}\Big)^2 dx \to 0 \tag{1.22}$$

as $n \to \infty$, where $A_n(\varepsilon) = \{x : |\sqrt{f_1^n(x)/f_0^n(x)} - 1| > \varepsilon\}$ and $f_0^n$, $f_1^n$ denote the densities of $P_0^n$, $P_1^n$. Here $\{B(t) : t \ge 0\}$ is a Brownian motion with variance $\lambda$ per unit time (i.e., $\mathrm{Var}(B(t)) = \lambda t$).

More general processes can arise as the limit of a log-likelihood ratio process, including processes with independent normally distributed increments. If we have independent and identically distributed observations $X_1^n, X_2^n, \ldots$ such that (1.21) and (1.22) hold then the limiting process can only be a Brownian motion. The reason for this is clear; if $X_1^n, X_2^n, \ldots$ are i.i.d. then $\log Z^n(t)$ has stationary independent increments because it is formed from partial sums of i.i.d. random variables. If the limiting process, $\log Z(t)$ say, has stationary, independent, normally distributed increments it must be a Brownian motion. The limiting processes in (1.19) and (1.20) are Brownian motions both with variance $\lambda$ per unit time and with drifts $-\lambda/2$ and $\lambda/2$ per unit time respectively.

1.5 Contiguity and Convergence of Experiments

The concept of nearness of null and alternative hypotheses or of families of probability measures is made precise by the notions of contiguity and convergence of experiments. A sequence $\{P_1^n\}$ of probability measures is said to be contiguous to another sequence $\{P_0^n\}$ (written $P_1^n \triangleleft P_0^n$) if for any sequence of events $A_n$

$$\lim_{n \to \infty} P_0^n(A_n) = 0 \quad \text{implies} \quad \lim_{n \to \infty} P_1^n(A_n) = 0.$$

Discussion of contiguity and its uses can be found in [12] and [21]. Contiguity has a close relationship with weak convergence of the likelihood ratio $Z^n = dP_1^n/dP_0^n$. In the case that $Z^n$ has a limiting distribution under the null hypothesis contiguity is equivalent to the existence of a limiting distribution for $Z^n$ under the alternative hypothesis ([12]).
In order that asymptotic power be non-degenerate the sequence of alternatives must be contiguous to the sequence of null hypotheses. Typically in the absence of contiguity there will exist tests with arbitrarily small error probabilities for sufficiently large sample sizes. This was the case in Section 1.2 where the likelihood ratio had a degenerate limit because neither the null nor the alternative changed with the sample size.

When a composite hypothesis is specified the testing problem cannot be described in terms of contiguity or the likelihood ratio of two sequences of probability measures. A means of comparing more than two sequences of probabilities at one time is needed.

Convergence of experiments describes the nearness of families of probability distributions. An experiment refers to a family $E = \{P_\theta\}$ of probability distributions. A sequence of experiments $E^n = \{P_\theta^n\}$ is said to converge to $E$ (written $E^n \to E$) if for every finite set $\{\theta_1, \ldots, \theta_m\}$ the vector $(dP_{\theta_1}^n/d\mu^n, \ldots, dP_{\theta_m}^n/d\mu^n)$ converges in distribution, under $\mu^n$, to $(dP_{\theta_1}/d\mu, \ldots, dP_{\theta_m}/d\mu)$ under $\mu$, where $\mu^n = \sum_{i=1}^{m} P_{\theta_i}^n$ and $\mu = \sum_{i=1}^{m} P_{\theta_i}$.

In the case of binary experiments (those that contain two distributions) convergence of experiments coincides with weak convergence of the likelihood ratio. Convergence of experiments is the essential hypothesis of the Hajek-LeCam minimax theorem ([17]). This is one example of its application to composite hypothesis testing.

An example of convergence of experiments is given by the family $E^n = \{\exp(1 + \theta/\sqrt{n}) : \theta \in R\}$ which has limiting experiment $E = \{N(\theta, 1) : \theta \in R\}$. This fact is suggested (but not proven) by the one-dimensional weak convergence in (1.13).

An interesting way of thinking about convergence of experiments is as an extension of the likelihood principle. The likelihood principle (see [6]) says that all inference about the family $\{P_\theta\}$ should be based on the likelihood function

$$L(\theta) = \frac{dP_\theta}{d\mu}$$

when there is a measure $\mu$ such that $P_\theta \ll \mu$ for all $\theta$.
An extension of this might say that when $\{P_\theta^n\} \to \{P_\theta\}$ (in the sense defined above) all inference about $\{P_\theta^n\}$ should be based on the likelihood function for $\{P_\theta\}$ when $n$ is large.

CHAPTER 2
ASYMPTOTIC OPTIMALITY OF SEQUENTIAL PROCEDURES

2.1 Wald's Criterion

Let $X_1^n, X_2^n, \ldots$ be i.i.d. random variables with common distribution either $P_0^n$ or $P_1^n$ and let the likelihood ratio process $\{Z^n(t) : t \ge 0\}$ be given by

$$Z^n(t) = \prod_{i=1}^{[nt]} \frac{dP_1^n}{dP_0^n}(X_i^n)$$

with $Z^n = dP_1^n/dP_0^n$ defined by (1.1). Assume that the process $Z^n$ has the asymptotic behaviour discussed in Section 1.4, namely

$$\log Z^n(t) \xrightarrow{w} B(t) - \frac{\lambda}{2} t \quad \text{under } P_0^n, \tag{2.1}$$

and

$$\log Z^n(t) \xrightarrow{w} B(t) + \frac{\lambda}{2} t \quad \text{under } P_1^n, \tag{2.2}$$

where $\{B(t) : t \ge 0\}$ is a Brownian motion with variance $\lambda$ per unit time (i.e., $\mathrm{Var}\, B(t) = \lambda t$).

A test will be defined using the limiting process and this test will be shown to be asymptotically optimal when applied to testing $H_0: P_0^n$ vs. $H_{1,n}: P_1^n$. This is an extension to the sequential testing situation of the similar result discussed in Section 1.3.

As a first step we recognize the limiting processes in (2.1) and (2.2) as likelihood ratios. Let $P_0$ and $P_1$ be the distributions on $C([0, \infty))$ of the processes $\{B(t) : t \ge 0\}$ and $\{B(t) + \lambda t : t \ge 0\}$ respectively and let

$$Z(t) = \frac{dP_{1,t}}{dP_{0,t}}$$

where $P_{0,t}$ and $P_{1,t}$ are the restrictions of $P_0$ and $P_1$ to $C([0, t])$. It is shown in [10] that

$$\log Z(t) \overset{d}{=} B(t) - \frac{\lambda}{2} t \quad \text{under } P_0 \tag{2.3}$$

$$\log Z(t) \overset{d}{=} B(t) + \frac{\lambda}{2} t \quad \text{under } P_1. \tag{2.4}$$

If $H_0$ represents $P_0^n$ and $P_0$ and $H_1$ represents $P_1^n$ and $P_1$ then we have the weak convergence of the processes

$$Z^n \xrightarrow{w} Z \quad \text{under } H_0 \text{ and under } H_1. \tag{2.5}$$

The Sequential Probability Ratio Test (SPRT) for testing $H_0$ vs. $H_1$ uses the stopping rule

$$T^* = \inf\{t : Z(t) \le B \text{ or } Z(t) \ge A\} \tag{2.6}$$

and decision rule

$$D^* = \{Z(T^*) \ge A\}. \tag{2.7}$$

It can be shown ([8]) that $T^*$ is finite under both hypotheses; thus when the event $D^*$ does not occur $Z(T^*) \le B$ and $H_0$ is accepted. The SPRT has the same optimality property in continuous time as it does in discrete time.
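Because $Z(t)$ has continuous paths, the continuous time SPRT stops exactly on a boundary, and its error probabilities then have closed forms. The following is a sketch of this standard calculation, which is not taken from the thesis: assuming optional stopping applies, $E_0 Z(T^*) = 1$ and $E_1[1/Z(T^*)] = 1$ give $\alpha = (1 - B)/(A - B)$ and $\beta = B(A - 1)/(A - B)$ for $0 < B < 1 < A$. The function names and numerical values are hypothetical.

```python
def sprt_error_probabilities(A, B):
    """Error probabilities of a continuous-path SPRT with boundaries B < 1 < A.

    Assumes Z stops exactly at A or B and that optional stopping applies:
    E_0[Z(T)] = 1 gives alpha; E_1[1/Z(T)] = 1 gives beta.
    """
    if not 0 < B < 1 < A:
        raise ValueError("boundaries must satisfy 0 < B < 1 < A")
    alpha = (1 - B) / (A - B)      # P_0(Z(T) >= A), type I error
    beta = B * (A - 1) / (A - B)   # P_1(Z(T) <= B), type II error
    return alpha, beta


def sprt_boundaries(alpha, beta):
    """Invert the relations above; requires alpha, beta > 0 and alpha + beta < 1."""
    if not (alpha > 0 and beta > 0 and alpha + beta < 1):
        raise ValueError("need alpha, beta > 0 and alpha + beta < 1")
    return (1 - beta) / alpha, beta / (1 - alpha)


A, B = sprt_boundaries(alpha=0.05, beta=0.10)
print(A, B, sprt_error_probabilities(A, B))
```

Under these assumptions every pair of error probabilities with $\alpha + \beta < 1$ is attained by exactly one pair of boundaries, which is the continuous time analogue of choosing boundaries to match prescribed error probabilities.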
A sequential procedure for testing $H_0$ vs. $H_1$ consists in general of a stopping rule $T$ which takes values in $[0, \infty]$ such that the event $\{T \le t\}$ is determined by $\{B(s) : 0 \le s \le t\}$ and a decision rule $D$ which must be such that $D \cap \{T \le t\}$ is determined by $\{B(s) : 0 \le s \le t\}$, for each $t \in [0, \infty]$.

Optimality Property of Continuous Time SPRT ([8]): Assume that for each $n$, $Z^n$ has a continuous distribution under $P_0^n$. If a sequential test $(T, D)$ of $H_0: P_0$ vs. $H_1: P_1$ has smaller error probabilities than $(T^*, D^*)$ defined by (2.6) and (2.7), i.e.,

$$P_0(D) \le P_0(D^*) \quad \text{and} \quad P_1(D^c) \le P_1(D^{*c}),$$

then $(T, D)$ must have higher average sample numbers (ASN),

$$E_0(T) \ge E_0(T^*) \quad \text{and} \quad E_1(T) \ge E_1(T^*).$$

We will now prove a result (stated more precisely below) which says that this optimality property is preserved in the limit when the SPRT is applied to the discrete time setting. Consider the procedure $(T_n^*, D_n^*)$ given by

$$T_n^* = \inf\{t : Z^n(t) \ge A \text{ or } Z^n(t) \le B\}$$

and

$$D_n^* = \{Z^n(T_n^*) \ge A\}.$$

To study the asymptotic properties of $(T_n^*, D_n^*)$ the following results are used:

$$T_n^* \xrightarrow{d} T^* \quad \text{under } H_0 \text{ and under } H_{1,n} \tag{2.8}$$

$$Z^n(T_n^*) \xrightarrow{d} Z(T^*) \quad \text{under } H_0 \text{ and under } H_{1,n}. \tag{2.9}$$

These follow from the fact that $T^*$ and $Z(T^*)$ are continuous functionals of $\{Z(t) : t \ge 0\}$ relative to the sup-norm metric and the weak convergence (2.5) holds with respect to this metric (see Section 1.4). From (2.9) it follows immediately that the asymptotic error probabilities of $(T_n^*, D_n^*)$ are equal to the error probabilities of the SPRT $(T^*, D^*)$, i.e.

$$\lim_{n \to \infty} P_0^n(D_n^*) = P_0(D^*) \tag{2.10}$$

and

$$\lim_{n \to \infty} P_1^n(D_n^{*c}) = P_1(D^{*c}). \tag{2.11}$$

The same result for the average sample numbers requires the uniform integrability of $\{T_n^*\}$; this is demonstrated in Appendix A.1, thus

$$\lim_{n \to \infty} E_0^n(T_n^*) = E_0(T^*) \tag{2.12}$$

and

$$\lim_{n \to \infty} E_1^n(T_n^*) = E_1(T^*). \tag{2.13}$$

The asymptotic optimality result can now be stated.

Asymptotic Optimality Property of $(T_n^*, D_n^*)$: Assume that $P_1(D^*) > 0$. If $(T_n, D_n)$ is any sequential test of $H_0$ vs. $H_{1,n}$ satisfying

$$\limsup_{n \to \infty} P_0^n(D_n) \le P_0(D^*) \tag{2.14}$$

and

$$\limsup_{n \to \infty} P_1^n(D_n^c) \le P_1(D^{*c}) \tag{2.15}$$

then

$$\liminf_{n \to \infty} E_0^n(T_n) \ge E_0(T^*) \tag{2.16}$$

and

$$\liminf_{n \to \infty} E_1^n(T_n) \ge E_1(T^*). \tag{2.17}$$

A proof of this result will now be given. First we find a SPRT which has the same error probabilities as $(T_n, D_n)$. This is where the assumption of continuity of the distribution of $Z^n$ is needed; it implies the existence of the required SPRT. We state the needed result from [24]:

Lemma. Assume that $Z^n = dP_1^n/dP_0^n$ has a continuous distribution. If $\alpha_0$ and $\alpha_1$ are non-negative numbers with $\alpha_0 + \alpha_1 \le 1$ there exist $A_n$ and $B_n$ such that the SPRT with stopping boundaries $A_n$ and $B_n$ has error probabilities $\alpha_0$ and $\alpha_1$.

In order that the lemma applies the error probabilities of $(T_n, D_n)$ must satisfy the constraint $\alpha_0 + \alpha_1 \le 1$. Since we have assumed that $P_0(D^*) + P_1(D^{*c}) < 1$, (2.14) and (2.15) imply that for large $n$ we will have $P_0^n(D_n) + P_1^n(D_n^c) < 1$ as required. Since only the tail of the sequence affects (2.16) and (2.17) we can assume without loss of generality that this inequality holds for all $n$.

Now let $(T_n', D_n')$ be the SPRT determined by the Lemma, that is

$$T_n' = \inf\{t : Z^n(t) \ge A_n \text{ or } Z^n(t) \le B_n\}, \qquad D_n' = \{Z^n(T_n') \ge A_n\}.$$

By the optimality property of $(T_n', D_n')$ it must have lower ASN than $(T_n, D_n)$; thus it will suffice to show (2.16) and (2.17) with $(T_n', D_n')$ in place of $(T_n, D_n)$. Because of the inequalities (2.14) and (2.15) for $(T_n', D_n')$, the sequences $\{A_n\}$ and $\{B_n\}$ must be bounded. If, say, $\{A_n\}$ were not bounded above then $\liminf_{n \to \infty} P_1^n(D_n') = 0$ and this contradicts (2.15).

By considering a subsequence if necessary assume $\{A_n\}$ and $\{B_n\}$ converge, say $\lim A_n = A'$, $\lim B_n = B'$. Now if one of (2.16), (2.17) did not hold the SPRT $(T', D')$ with stopping boundaries $A'$ and $B'$ would be better than the optimal procedure $(T^*, D^*)$, i.e., $P_0(D') \le P_0(D^*)$, $P_1(D'^c) \le P_1(D^{*c})$, $E_0(T') \le E_0(T^*)$ and $E_1(T') \le E_1(T^*)$ with strict inequality in one of the last two inequalities.
This contradicts the optimality property of $(T^*, D^*)$.

2.2 Bayes Risk Criterion

In this section a different criterion for comparing sequential testing procedures is used, the Bayes risk. We begin with the set up described in the first paragraph of Section 2.1. For a sequential test of $H_{0,n}: P_0^n$ vs. $H_{1,n}: P_1^n$, say $(T,D)$, we define the Bayes Risk, just as in (1.6), by

$$\rho_n(T,D;\pi) = \pi\left(P_0^n(D) + c\,E_0^n(T)\right) + (1-\pi)\left(P_1^n(D^c) + c\,E_1^n(T)\right), \tag{2.22}$$

where $\pi$ is the prior probability of the distribution being $P_0^n$ and c is the cost per observation. For a sequential test $(T,D)$ of $H_0: P_0$ vs. $H_1: P_1$, where $P_0$ and $P_1$ are as in the previous section, the Bayes Risk is

$$\rho(T,D;\pi) = \pi\left(P_0(D) + c\,E_0(T)\right) + (1-\pi)\left(P_1(D^c) + c\,E_1(T)\right). \tag{2.23}$$

Here c represents the cost per unit time of observing the likelihood ratio process $Z(t)$. We will now solve the problem of minimizing $\rho$ over all continuous time sequential tests. Our derivation will mimic the strategy used in [20] for deriving the same result in discrete time; the appropriate theory for the continuous time case corresponding to Snell's envelope is given in [20] and also in [9]. The solution will be a particular SPRT.

Although the solution is derived here only for the special case that $P_0$ and $P_1$ are distributions of Brownian motions the same argument will work for more general situations. In particular it will work under the general conditions of [8] which are used there for obtaining the previous optimality property of continuous time SPRTs given in Section 2.1.

To begin it will be necessary to consider the equivalent problem of minimizing

$$\rho^{(r)}(T,D;\pi) = \pi\left(P_0(D) + c\,E_0(T)\right) + r(1-\pi)\left(P_1(D^c) + c\,E_1(T)\right),$$

allowing the new parameter r to vary. The first step consists of fixing a stopping rule and finding the best decision rule to go with it.

Lemma. If $T$ is fixed,

$$\min_D \left[\pi P_0(D) + r(1-\pi) P_1(D^c)\right] = E_0\left(\pi \wedge r(1-\pi) Z(T)\right)$$

and the minimum is achieved at $\hat D = \{Z(T) \ge \pi / (r(1-\pi))\}$.
Proof: First we note that $D \cap \{T \le t\}$ is determined by $\{B(s): 0 \le s \le t\}$, for all t, and $Z(T)$ equals $dP_1/dP_0$ on the $\sigma$-field of such events. By the Lebesgue Decomposition,

$$\pi P_0(D) + r(1-\pi) P_1(D^c) = \int_D \pi \, dP_0 + \int_{D^c} r(1-\pi) Z(T) \, dP_0 \ge \int \left[\pi \wedge r(1-\pi) Z(T)\right] dP_0.$$

It is straightforward to check that there is equality here for $D = \{Z(T) \ge \pi/(r(1-\pi))\}$.

The problem is now reduced to minimizing

$$\pi c\,E_0(T) + r(1-\pi) c\,E_1(T) + E_0\left(\pi \wedge r(1-\pi) Z(T)\right) = E_0\left(\pi c T + r(1-\pi) c\, T Z(T) + \pi \wedge r(1-\pi) Z(T)\right) = E_0\left(Y^{(r)}(T)\right)$$

where the process $\{Y^{(r)}(t): t \ge 0\}$ is defined by

$$Y^{(r)}(t) = \pi c t + r(1-\pi) c\, t Z(t) + \pi \wedge r(1-\pi) Z(t).$$

According to Theorem 7.3 in [20] or Theorem 4 in [9] this can be done by finding the largest positive sub-martingale, say $\{V^{(r)}(t): t \ge 0\}$, dominated by $\{Y^{(r)}(t): t \ge 0\}$ and then forming the stopping rule

$$T^* = \inf\{t: Y^{(r)}(t) = V^{(r)}(t)\}. \tag{2.24}$$

The process $V^{(r)}$ is given by

$$V^{(r)}(t) = \operatorname{ess\,inf} E\left(Y^{(r)}(T) \mid B(s): 0 \le s \le t\right) \tag{2.25}$$

where the ess inf is taken over all stopping rules $T$ which satisfy $T \ge t$. Since $Z(0) = 1$, the initial value $V^{(r)}(0)$ is deterministic and

$$V^{(r)}(0) = \inf_T E\left(Y^{(r)}(T)\right) = h(r). \tag{2.26}$$

Note that h is an increasing concave function because it is the infimum of such functions. This fact will be important for determining the nature of the solution. For any stopping rule $T$ satisfying $T \ge t$,

$$E\left(Y^{(r)}(T) \mid B(s): 0 \le s \le t\right) = E\left(\pi c (T - t) + r(1-\pi) Z(t)\, c (T - t) \frac{Z(T)}{Z(t)} + \pi \wedge r(1-\pi) Z(t) \frac{Z(T)}{Z(t)} \,\Big|\, B(s): 0 \le s \le t\right) + \pi c t + r(1-\pi) c\, t Z(t) \tag{2.27}$$

where we have used the fact that $E(Z(T) \mid B(s): 0 \le s \le t) = Z(t)$ (i.e., $Z(t) = e^{B(t) - \frac{\lambda}{2} t}$ is a martingale). Using the representation

$$\frac{Z(u)}{Z(t)} = e^{B(u) - B(t) - \frac{\lambda}{2}(u - t)}$$

and since $\{B(t)\}$ has stationary independent increments, the process $\{Z(u)/Z(t): u \ge t\}$ is independent of $\{B(s): 0 \le s \le t\}$ and has the same distribution as the process $\{Z(u): u \ge 0\}$.
Therefore the conditional expectation in (2.27) is minimized exactly as for the case t = 0 in (2.26) but with r replaced by $r Z(t)$, i.e.,

$$V^{(r)}(t) = \operatorname{ess\,inf} E\left(Y^{(r)}(T) \mid B(s): s \le t\right) = h(r Z(t)) + \pi c t + r(1-\pi) c\, t Z(t).$$

Now the stopping rule $T^*$ is given by

$$T^* = \inf\{t: Y^{(r)}(t) = V^{(r)}(t)\} = \inf\{t: h(r Z(t)) = \pi \wedge r(1-\pi) Z(t)\}.$$

In order for the Bayes Risk given in (2.23) to be minimized by $T^*$, r is now set to 1. Thus

$$T^* = \inf\{t: h(Z(t)) = \pi \wedge (1-\pi) Z(t)\}.$$

Since h is increasing and concave, $T^*$ has the form

$$T^* = \inf\{t: Z(t) \ge A \text{ or } Z(t) \le B\}$$

for constants A and B illustrated below.

[Figure: the curve $h(x)$ plotted together with $\pi \wedge (1-\pi)x$; the horizontal axis marks $B$, $\pi/(1-\pi)$ and $A$.]

Fig. 1. Graph of h which determines stopping boundaries A, B of optimal SPRT.

If $T^*$ is the stopping rule employed, the decision rules $\{Z(T^*) \ge A\}$ and $\{Z(T^*) \ge \pi/(1-\pi)\}$ (recall the Lemma earlier in this section) are equivalent due to the inequality $B \le \pi/(1-\pi) \le A$. Also, the cases $B \ge 1$ and $A \le 1$ correspond to $T^* = 0$ in which the initial decision based only on the prior probability is optimal, having Bayes Risk $\pi \wedge (1-\pi)$.

As in the previous section the optimal procedure for the continuous time problem will be applied to the discrete time setting; a procedure which minimizes the asymptotic Bayes Risk results. Define the stopping rule

$$T_n^* = \inf\{t: Z^n(t) \ge A \text{ or } Z^n(t) \le B\}$$

where A and B are the stopping boundaries of the SPRT which minimizes the Bayes Risk (2.23) and the decision rule

$$D_n^* = \{Z^n(T_n^*) \ge A\}.$$

Thus $(T_n^*, D_n^*)$ has the asymptotic optimality property given by (2.14)-(2.17). Here it will be shown to have the following property.

Asymptotic Bayes Optimality Property of $(T_n^*, D_n^*)$: The asymptotic Bayes Risk of $(T_n^*, D_n^*)$ is

$$\lim_{n\to\infty} \rho_n(T_n^*, D_n^*; \pi) = \rho(T^*, D^*; \pi). \tag{2.28}$$

If $(T_n, D_n)$ is any sequential test of $H_{0,n}$ vs. $H_{1,n}$ then

$$\lim_{n\to\infty} \inf_{(T,D)} \rho_n(T, D; \pi) \ge \rho(T^*, D^*; \pi). \tag{2.29}$$

This will say that $(T_n^*, D_n^*)$ has the smallest possible asymptotic Bayes Risk and the value is the Bayes Risk of the procedure $(T^*, D^*)$.
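The optimal boundaries A and B are characterized above only through the function h. As a rough numerical companion (not the method of the text), the sketch below evaluates the Bayes Risk (2.23) of a continuous time SPRT with boundaries $0 < B < 1 < A$ and searches a grid for the minimizing pair. It assumes the closed-form error probabilities $(1-B)/(A-B)$ and $B(A-1)/(A-B)$ obtained from optional stopping, and Wald's identity $E_j[\log Z(T^*)] = \pm\tfrac{\lambda}{2} E_j(T^*)$; the function names, grid size and parameter values are hypothetical.

```python
import math


def sprt_bayes_risk(A, B, prior, cost, lam):
    """Bayes risk (2.23) of the continuous-time SPRT with boundaries B < 1 < A."""
    alpha0 = (1.0 - B) / (A - B)          # P_0(decide H_1)
    alpha1 = B * (A - 1.0) / (A - B)      # P_1(decide H_0)
    # Wald's identity: E_j[log Z(T*)] = (drift under P_j) * E_j[T*],
    # with drift -lam/2 under P_0 and +lam/2 under P_1.
    e0 = -(2.0 / lam) * (alpha0 * math.log(A) + (1.0 - alpha0) * math.log(B))
    e1 = (2.0 / lam) * ((1.0 - alpha1) * math.log(A) + alpha1 * math.log(B))
    return prior * (alpha0 + cost * e0) + (1.0 - prior) * (alpha1 + cost * e1)


def best_boundaries(prior, cost, lam, max_log=6.0, steps=60):
    """Grid search over (log A, log B); returns (risk, A, B)."""
    best = None
    for i in range(1, steps + 1):
        for j in range(1, steps + 1):
            A = math.exp(max_log * i / steps)
            B = math.exp(-max_log * j / steps)
            risk = sprt_bayes_risk(A, B, prior, cost, lam)
            if best is None or risk < best[0]:
                best = (risk, A, B)
    return best


if __name__ == "__main__":
    print(best_boundaries(prior=0.5, cost=0.01, lam=1.0))
```

The grid minimum should be compared with $\pi \wedge (1-\pi)$, the risk of deciding immediately; when the cost c is large enough that no pair $(A, B)$ beats it, the case $T^* = 0$ described above applies.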
The proof of (2.28) is achieved by the application of (2.10), (2.11), (2.12) and (2.13) which state that all of the components of the Bayes Risk $\rho_n(T_n^*, D_n^*; \pi)$ converge to the corresponding components of $\rho(T^*, D^*; \pi)$.

The first step in proving (2.29) is to compute the minimum value of $\rho_n$. From the discussion preceding the derivation of $T^*$ it is known that $\rho_n$ is minimized by a SPRT with some stopping boundaries, say $A_n$ and $B_n$, that is

$$\inf_{T,D} \rho_n(T, D; \pi) = \rho_n(T_n, D_n; \pi)$$

where

$$T_n = \inf\{t: Z^n(t) \ge A_n \text{ or } Z^n(t) \le B_n\}$$

and

$$D_n = \{Z^n(T_n) \ge A_n\}.$$

Assume now that along a subsequence of the integers $\{n'\}$ the limit

$$\lim_{n'} \rho_{n'}(T_{n'}, D_{n'}; \pi)$$

exists and is less than $\rho(T^*, D^*; \pi)$. Within this subsequence there is a further subsequence (also called $\{n'\}$) such that the limits $\lim_{n'} A_{n'} = A'$ and $\lim_{n'} B_{n'} = B'$ exist, possibly infinite. Finally we can repeat the argument at the end of the previous section, to show that the continuous time SPRT with stopping boundaries $A'$ and $B'$ has lower Bayes Risk than the SPRT which uses A and B. This of course contradicts the fact that A and B were derived to minimize the Bayes Risk (2.23).

CHAPTER 3
POWER OF CHI-SQUARE TESTS

The focus of this section is the asymptotic power of Pearson's chi-square statistic for testing goodness of fit against a certain class of alternatives. These alternatives are contiguous to the null hypothesis in the sense defined in Section 1.4. We will reproduce a derivation of the limiting distribution of Pearson's chi-square statistic under the null hypothesis ([7], [19]). In [5] the limiting distribution under a class of alternatives is computed, whereby the asymptotic power can be computed. We will give a different development of this result which uses the weak convergence of the likelihood ratio. This highlights the usefulness of the likelihood ratio as a tool for studying hypothesis testing problems.
In [18] the limiting alternative distribution is found for situations where a parameter must be estimated. Also we will compare the asymptotic power of the chi-square test and of a test based on the likelihood ratio. We know from Section 1.3 that the test based on the likelihood ratio must win; the extent of the difference is of interest.

Let $N = (N_1, \ldots, N_k)$ be a multinomial random vector which records the numbers of data points which fall into each of k classifications. Let the total number of data points be n and the probability of any one falling into the ith category be $P_i$. The probability function of $N$ is

$$P(N_1 = n_1, \ldots, N_k = n_k) = \frac{n!}{n_1! \cdots n_k!} \prod_{i=1}^k P_i^{n_i} \qquad \left(n_i \in \mathbb{Z}_+, \ \sum_{i=1}^k n_i = n\right). \tag{3.1}$$

A common question asks whether the probability vector $P = (P_1, \ldots, P_k)$ belongs to a parametric family $\{P(\theta): \theta \in \Theta\}$. This question reflects on the distribution of the underlying data which is usually the source of interest. For example when testing whether a sample $X_1, \ldots, X_n$ came from the Normal distribution, categories $E_i = (a_{i-1}, a_i]$ $(i = 1, \ldots, k)$ could be formed and the numbers $N_i = \#\{X_j \in E_i\}$ of observations falling into these intervals recorded. Under the normal distribution the probability vector $P$ would be given by

$$P_i = \int_{a_{i-1}}^{a_i} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2/(2\sigma^2)} \, dx = \Phi\left(\frac{a_i - \mu}{\sigma}\right) - \Phi\left(\frac{a_{i-1} - \mu}{\sigma}\right).$$

The hypothesis of normality for $X_1, \ldots, X_n$ is also specified by the particular parametric form for $P_i$.

In general a test of the composite hypothesis $H_0: P = P(\theta)$ for some $\theta \in \Theta$ requires estimation of $\theta$. This situation is treated in [18]. We will consider only the simple hypothesis

$$H_0: P = P(\theta_0)$$

for some specified $\theta_0 \in \Theta$. This is also written as

$$H_0: P_i = P_i^0 \quad (i = 1, \ldots, k) \tag{3.2}$$

where $P(\theta_0) = (P_1^0, \ldots, P_k^0)$. Pearson's chi-square statistic for testing $H_0$ is

$$X^2(n) = \sum_{i=1}^k \frac{(N_i - nP_i^0)^2}{nP_i^0}. \tag{3.3}$$

It will be shown that $X^2(n)$ has a limiting $(n \to \infty)$ chi-square distribution with k-1 degrees of freedom. This fact is used for computing critical values of the test.
The proof is based on a simple multivariate Central Limit Theorem ([7]) applied to the sequence of random vectors $V_n$ given by

$$V_{n,i} = \frac{N_i - nP_i^0}{\sqrt{nP_i^0}}. \tag{3.4}$$

A simple computation produces the covariance matrix of $V_n$:

$$\operatorname{Cov}(V_n) = I_k - q q',$$

where $q = (\sqrt{P_1^0}, \ldots, \sqrt{P_k^0})'$. Because the $N_i$ are sums of independent identically distributed (Bernoulli) random variables the multivariate CLT can be applied to yield

$$V_n \xrightarrow{d} N_k(0, A) \quad \text{under } H_0 \tag{3.5}$$

as $n \to \infty$ where $A = I_k - q q'$. In view of the relation

$$X^2(n) = V_n' V_n$$

the following result is needed ([7], [19]).

Proposition 1. If $Y \sim N(0, A)$ and A is idempotent with rank r then

$$Y'Y \sim \chi^2_r.$$

An application of this to $V_n$ (recall (3.5)) gives

$$X^2(n) \xrightarrow{d} \chi^2_{k-1} \tag{3.6}$$

since the covariance matrix $A = I_k - q q'$ of $V_n$ is idempotent with rank k-1. Here we have used the continuity of the mapping $x \mapsto x'x$.

The statistic $X^2(n)$ is not designed with any specific alternatives to $H_0$ in mind. The asymptotic power of $X^2(n)$ against the sequence of alternatives

$$H_{1,n}: P_i = P_i^0 + \frac{C_i}{\sqrt{n}} \quad (i = 1, \ldots, k) \tag{3.7}$$

where $\sum_{i=1}^k C_i = 0$, can be calculated. The limiting distribution of $X^2(n)$ under the sequence of alternatives $H_{1,n}$ is non-central chi-square as stated in the following result. The notation $\chi^2_r(\Delta)$ represents the distribution of $(Z_1 + \sqrt{\Delta})^2 + Z_2^2 + \cdots + Z_r^2$ where $Z_1, \ldots, Z_r$ are i.i.d. standard normal random variables.

Theorem.

$$X^2(n) \xrightarrow{d} \chi^2_{k-1}\left(\sum_{i=1}^k \frac{C_i^2}{P_i^0}\right) \quad \text{under } H_{1,n} \text{ as } n \to \infty. \tag{3.8}$$

One possible proof (as in [18]) uses a multivariate CLT for triangular arrays which establishes

$$V_n \xrightarrow{d} N_k(\delta, A) \quad \text{under } H_{1,n} \text{ as } n \to \infty \tag{3.9}$$

with A as in (3.5) and

$$\delta = \left(\frac{C_1}{\sqrt{P_1^0}}, \ldots, \frac{C_k}{\sqrt{P_k^0}}\right)'.$$

This must be combined with the following fact about the multivariate normal distribution which generalizes Proposition 1.

Proposition 2. ([17]) If $Y \sim N_k(\delta, A)$ and A is idempotent with rank r and $\delta$ is in the range (column space) of A then

$$Y'Y \sim \chi^2_r(\delta'\delta).$$

Using this result it is immediate that the Theorem follows from (3.9).
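The algebraic facts used above can be checked numerically. The following sketch, in plain Python with hypothetical function names and made-up counts, computes $X^2(n)$ from (3.3), the vector $V_n$ from (3.4), and the matrix $A = I_k - qq'$, and then confirms that $X^2(n) = V_n'V_n$ and that A is idempotent with trace k-1.

```python
import math


def pearson_chi_square(counts, p0):
    """Pearson's statistic (3.3) for observed counts and null probabilities."""
    n = sum(counts)
    return sum((N - n * p) ** 2 / (n * p) for N, p in zip(counts, p0))


def standardized_vector(counts, p0):
    """The vector V_n of (3.4)."""
    n = sum(counts)
    return [(N - n * p) / math.sqrt(n * p) for N, p in zip(counts, p0)]


def covariance_matrix(p0):
    """A = I_k - q q' with q_i = sqrt(P_i^0)."""
    q = [math.sqrt(p) for p in p0]
    k = len(p0)
    return [[(1.0 if i == j else 0.0) - q[i] * q[j] for j in range(k)]
            for i in range(k)]


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


if __name__ == "__main__":
    counts, p0 = [18, 30, 52], [0.2, 0.3, 0.5]   # illustrative data only
    v = standardized_vector(counts, p0)
    print(pearson_chi_square(counts, p0), sum(t * t for t in v))
    A = covariance_matrix(p0)
    print(matmul(A, A) == A or "A*A equals A up to rounding")
```

For an idempotent matrix the trace equals the rank, so the trace k-1 is what gives the k-1 degrees of freedom in (3.6).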
A different proof of (3.9) will now be given; it will be based on the likelihood ratio for the simple hypothesis testing problem $H_0$ vs. $H_{1,n}$. This likelihood ratio is simply a ratio of multinomial probabilities defined by (3.1), namely

$$Z_n = \frac{\prod_{i=1}^k \left(P_i^0 + C_i/\sqrt{n}\right)^{N_i}}{\prod_{i=1}^k \left(P_i^0\right)^{N_i}}. \tag{3.10}$$

In order that $Z_n$ can be used to find the limiting distribution of $V_n$ there must be established a relationship between the two. This is done by taking logarithms and using a Taylor expansion as follows:

$$\begin{aligned}
\log Z_n &= \sum_{i=1}^k N_i \log\left(1 + \frac{C_i}{P_i^0 \sqrt{n}}\right) \\
&= \sum_{i=1}^k \frac{C_i}{P_i^0 \sqrt{n}} N_i - \sum_{i=1}^k \frac{C_i^2}{2 (P_i^0)^2 n} N_i + O_p\left(\frac{1}{\sqrt{n}}\right) \\
&= \sum_{i=1}^k \frac{C_i}{\sqrt{P_i^0}} V_{n,i} - \frac{1}{2} \sum_{i=1}^k \frac{C_i^2}{P_i^0} \cdot \frac{N_i}{n P_i^0} + O_p\left(\frac{1}{\sqrt{n}}\right) \\
&= \sum_{i=1}^k \frac{C_i}{\sqrt{P_i^0}} V_{n,i} - \frac{1}{2} \sum_{i=1}^k \frac{C_i^2}{P_i^0} + o_p(1)
\end{aligned} \tag{3.11}$$

where $O_p(1/\sqrt{n})$ is a term which converges to 0 in probability at rate $1/\sqrt{n}$, and $o_p(1)$ converges to 0 in probability. The third line uses $\sum_i C_i = 0$. The $o_p(1)$ term arises because $N_i/n - P_i^0 = o_p(1)$.

Now we use the following strategy. There is a "limiting" simple hypothesis testing problem which approximates the multinomial testing problem $H_0$ vs. $H_{1,n}$ in such a way that there is a quantity which assumes the role of $V_n$. The distribution of this quantity, under null and under alternative, is the limiting distribution of $V_n$ under $H_0$ and under $H_{1,n}$, respectively. The limiting problem is

$$H_0: N_k(0, A) \quad \text{vs.} \quad H_1: N_k(\delta, A).$$

The likelihood ratio based on a single observation X is (see Appendix A.2)

$$Z(X) = \exp\left(\delta' X - \tfrac{1}{2} \delta'\delta\right).$$

Comparing the equation

$$\log Z(X) = \delta' X - \tfrac{1}{2} \delta'\delta \tag{3.12}$$

with (3.11) it is seen that $(Z_n, V_n)$ are related by the same equation as $(Z(X), X)$, except for the error term $o_p(1)$. Also

$$\log Z_n \xrightarrow{d} \log Z \quad \text{under } H_0 \tag{3.13}$$

as $n \to \infty$. The limiting distribution of $\log Z_n$ is calculated from (3.11) and (3.5) as

$$\log Z_n \xrightarrow{d} \delta' N_k(0, A) - \tfrac{1}{2} \delta'\delta = N\left(-\tfrac{1}{2} \delta'\delta, \ \delta'\delta\right) \quad \text{under } H_0 \tag{3.14}$$

(since the variance of $\delta' N_k(0, A)$ is $\delta' A \delta = \delta'(I_k - q q')\delta = \delta'\delta$, because $q'\delta = \sum_i C_i = 0$). The distribution of $\log Z$ under $H_0$ is easily seen from (3.12) to be the same. Finally, we show that (3.9) follows from (3.5), (3.11) and (3.12).
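To see how small the error term in (3.11) is in practice, the sketch below compares the exact multinomial log likelihood ratio with the approximation $\delta' V_n - \tfrac{1}{2}\delta'\delta$. The counts, null probabilities and constants $C_i$ are invented for illustration, and the function names are hypothetical.

```python
import math


def exact_log_lr(counts, p0, C):
    """log Z_n from (3.10): sum of N_i * log(1 + C_i / (P_i^0 sqrt(n)))."""
    n = sum(counts)
    return sum(N * math.log1p(c / (p * math.sqrt(n)))
               for N, p, c in zip(counts, p0, C))


def approx_log_lr(counts, p0, C):
    """Leading terms of (3.11): delta' V_n - (1/2) delta' delta."""
    n = sum(counts)
    delta = [c / math.sqrt(p) for c, p in zip(C, p0)]
    v = [(N - n * p) / math.sqrt(n * p) for N, p in zip(counts, p0)]
    return (sum(d * x for d, x in zip(delta, v))
            - 0.5 * sum(d * d for d in delta))


if __name__ == "__main__":
    counts = [200300, 299600, 500100]   # n = 1,000,000, illustrative only
    p0 = [0.2, 0.3, 0.5]
    C = [0.1, -0.3, 0.2]                # must sum to zero, as in (3.7)
    print(exact_log_lr(counts, p0, C), approx_log_lr(counts, p0, C))
```

With n this large and counts within a few multiples of $\sqrt{n}$ of their null expectations, the two values should agree to several decimal places, which is the content of the $o_p(1)$ remainder.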
Two lemmas are required; their proofs are found in Appendix A.3.

Lemma 1. Let Z be a likelihood ratio and $\{Z_n\}$ a sequence of likelihood ratios. If a sequence of statistics $X_n$ satisfies

$$(X_n, Z_n) \xrightarrow{d} (X, Z)$$

under the null hypothesis then

$$X_n \xrightarrow{d} X$$

under the alternative.

Note that the distribution of X under the alternative is not the same as the distribution of X under the null hypothesis.

Lemma 2. Let $X_n$ and $Y_n$ be sequences of random quantities which converge in distribution, say

$$X_n \xrightarrow{d} X, \qquad Y_n \xrightarrow{d} Y.$$

If there is a continuous function H and random quantities $\varepsilon_n$ such that $Y = H(X)$, $Y_n = H(X_n) + \varepsilon_n$ and $\varepsilon_n \to 0$ in probability then

$$(X_n, Y_n) \xrightarrow{d} (X, Y).$$

Lemma 1 reduces the proof of (3.9) to showing joint convergence of $V_n$ and $Z_n$ or equivalently of $V_n$ and $\log Z_n$; this follows from Lemma 2 and (3.11). Since $V_n \xrightarrow{d} X$ where $X \sim N_k(0, A)$ under the null and $X \sim N_k(\delta, A)$ under the alternative it follows that the latter is the limiting alternative distribution of $V_n$.

Finally we turn to a comparison of the test based on $X^2(n)$ with the test based on $Z_n$. The comparison will be done via the limiting distributions of the statistics. Denoting $\delta'\delta$ by $\Delta$ the limiting distributions of $X^2(n)$ are $\chi^2_{k-1}$ under $H_0$ and $\chi^2_{k-1}(\Delta)$ under $H_{1,n}$, and the limiting distributions of $\log Z_n$ are $N(-\tfrac{1}{2}\Delta, \Delta)$ under $H_0$ and $N(\tfrac{1}{2}\Delta, \Delta)$ under $H_{1,n}$. The asymptotic power of the test based on $\log Z_n$ is given in (1.14); replacing $\theta^2$ there by $\Delta$ the asymptotic power is

$$1 - \Phi\left(z_\alpha - \sqrt{\Delta}\right) \tag{3.15}$$

where $z_\alpha$ is the $100(1-\alpha)$ percentile and $\Phi$ is the distribution function of the standard normal distribution. Now for levels $\alpha = .05$ and $\alpha = .01$ we find the values of $\Delta$ required for the asymptotic power of the $X^2(n)$ test to be .85, .90 and .95. These values of $\Delta$ solve the equations

$$P\left(\chi^2_{k-1}(\Delta) \ge \chi^2_{k-1,\alpha}\right) = .85, \ .90, \ .95$$

where $\chi^2_{k-1,\alpha}$ is the $100(1-\alpha)$ percentile of the $\chi^2_{k-1}$ distribution and they can be read from Table 25 of [2]. From $\Delta$ the power in (3.15) is computed and this is then compared to the relevant power for the chi-square test.
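The comparison just described can be reproduced without tables in the special case k-1 = 1, where $\chi^2_1(\Delta)$ is the distribution of $(Z_1 + \sqrt{\Delta})^2$ and its tail probability reduces to normal probabilities. The sketch below is limited to that case and uses only the Python standard library; the bisection routine and all function names are illustrative assumptions, not the procedure used in the thesis, which reads $\Delta$ from Table 25 of [2].

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def lr_test_power(alpha, delta):
    """Asymptotic power (3.15) of the likelihood ratio test."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - STD_NORMAL.cdf(z_alpha - math.sqrt(delta))


def chi_square_power_1df(alpha, delta):
    """P(chi^2_1(delta) >= critical value) = P(|Z + sqrt(delta)| >= z_{alpha/2})."""
    z = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    mu = math.sqrt(delta)
    return (1.0 - STD_NORMAL.cdf(z - mu)) + STD_NORMAL.cdf(-z - mu)


def noncentrality_for_power_1df(alpha, beta, hi=400.0):
    """Bisection for the delta at which the 1-df chi-square test has power beta."""
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi_square_power_1df(alpha, mid) < beta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for alpha in (0.05, 0.01):
        for beta in (0.85, 0.90, 0.95):
            delta = noncentrality_for_power_1df(alpha, beta)
            print(alpha, beta, round(delta, 3), round(lr_test_power(alpha, delta), 3))
```

For larger k-1 a non-central chi-square distribution function would be needed in place of `chi_square_power_1df`; the rest of the calculation is unchanged.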
In Table 1 below the results are displayed for various values of k-1, the degrees of freedom for the chi-square statistic.

Table 1. Asymptotic power $1 - \Phi(z_\alpha - \sqrt{\Delta})$ of the test based on $Z_n$ for values of size $\alpha$, power $\beta$ and degrees of freedom k-1 of the chi-square test.

| k-1 | α=.05, β=.85 | α=.05, β=.90 | α=.05, β=.95 | α=.01, β=.85 | α=.01, β=.90 | α=.01, β=.95 |
|---|---|---|---|---|---|---|
| 1 | .911 | .945 | .975 | .901 | .937 | .971 |
| 2 | .952 | .972 | .989 | .945 | .968 | .987 |
| 3 | .969 | .983 | .994 | .968 | .980 | .993 |
| 4 | .978 | .989 | .996 | .976 | .987 | .995 |
| 5 | .984 | .992 | .997 | .982 | .991 | .997 |
| 6 | .988 | .994 | .998 | .987 | .994 | .998 |
| 7 | .991 | .996 | .999 | .990 | .995 | .999 |
| 8 | .993 | .997 | .999 | .992 | .996 | .999 |
| 9 | .994 | .997 | .999 | .994 | .997 | .999 |
| 10 | .995 | .998 | 1.000 | .995 | .998 | .999 |
| 15 | .998 | .999 | 1.000 | .998 | .999 | 1.000 |

Note that the power seems to converge to 1 as k gets large; this is also expressed by the fact that the non-centrality parameter $\Delta$ must increase with the degrees of freedom in order that the chi-square test have constant power. For a large number of cells k the chi-square statistic $X^2(n)$ has greater difficulty in detecting a particular alternative because it attempts to detect alternatives in many directions. It should be mentioned that under alternatives other than that specified by $H_{1,n}$ (i.e., $P_i^0 + C_i/\sqrt{n}$), $Z_n$ may have smaller asymptotic power than $X^2(n)$. Thus a trade-off exists between increased power from using the likelihood ratio and the risk of using the wrong likelihood ratio.

BIBLIOGRAPHY

1. Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.
2. Biometrika Tables for Statisticians, Vol. II. (1972). Eds.: E.S. Pearson and H.O. Hartley. Cambridge University Press.
3. Chernoff, H. (1972). Sequential Analysis and Optimal Design. S.I.A.M., Philadelphia.
4. Chernoff, H. and Petkau, A.J. (1981). "Sequential medical trials involving paired data." Biometrika, 68, 1, 119-132.
5. Cochran, W.G. (1952). "The X2 test of goodness of fit." Ann. Math. Stat., 23, 315-345.
6. Cox, D.R. and Hinkley, D.V. (1974). Theoretical Statistics. Chapman and Hall, London.
7. Cramer, H. (1946). Mathematical Methods of Statistics. Princeton University Press.
8. Dvoretzky, A., Kiefer, J. and Wolfowitz, J. (1953). "Sequential decision problems for processes with continuous time parameter. Testing hypotheses." Ann. Math. Stat., 24, 254-264.
9. Fakeev, A.G. (1970). "Optimal stopping rules for stochastic processes with continuous parameter." Thy. Prob. Appl. Vol. 15, No. 1, 324-331.
10. Freedman, D. (1971). Brownian Motion and Diffusion. Holden-Day, San Francisco.
11. Ghosh, B.K. (1970). Sequential Tests of Statistical Hypotheses. Addison-Wesley, Reading, Mass.
12. Greenwood, P.E. and Shiryayev, A.N. (1985). Contiguity and the Statistical Invariance Principle. Gordon and Breach, New York.
13. Lai, T.L., Siegmund, D. and Robbins, H. (1983). "Sequential design of comparative clinical trials." In Recent Advances in Statistics, Eds.: M.H. Rizvi, J.S. Rustagi, D. Siegmund, 51-68. Academic Press, New York.
14. Lamperti, J. (1966). Probability. Benjamin/Cummings, Reading.
15. Lehmann, E.L. (1959). Testing Statistical Hypotheses. Wiley, New York.
16. Lipcer, R. and Shiryayev, A.N. (1980). "A functional central limit theorem for semimartingales." Thy. Prob. Appl. Vol. 25, No. 4, 667-688.
17. Millar, P.W. (1983). The Minimax Principle in Asymptotic Statistical Theory. Unpublished notes.
18. Mitra, S.K. (1958). "On the limiting power function of the frequency chi-square test." Ann. Math. Stat., 29, 1221-1233.
19. Moore, D.S. (1983). "Chi-square tests." Studies in Mathematics, Vol. 19: Studies in Statistics, 66-106, Ed.: R.V. Hogg, Mathematical Association of America.
20. Neveu, J. (1974). Discrete Parameter Martingales. Holden-Day, San Francisco.
21. Roussas, G. (1972). Contiguity of Probability Measures: Some Applications in Statistics. Cambridge University Press.
22. Thompson, M.E. (1971). "Continuous parameter optimal stopping problems." Z. Wahrscheinlichkeitstheorie, 19, 302-318.
23. Wald, A. and Wolfowitz, J. (1948). "Optimum character of the sequential probability ratio test." Ann. Math. Stat., 19, 326-339.
24. Wijsman, R.A. (1963). "Existence, uniqueness and monotonicity of sequential probability ratio tests." Ann. Math. Stat., 34, 1541-1548.

APPENDIX

A.1 Uniform Integrability of a Sequence of Stopping Times

The convergence of Average Sample Numbers ((2.12), (2.13)) which is used in Sections 2.1 and 2.2 requires the uniform integrability of the sequence of stopping times $\{T_n^*\}$ given by

$$T_n^* = \inf\{t: Z^n(t) \ge A \text{ or } Z^n(t) \le B\}.$$

This we will establish now using the set-up described in the first paragraph of Chapter 2. The uniform integrability must be established under both sequences of probabilities $\{P_0^n\}$ and $\{P_1^n\}$. In doing this for both sequences at once we will let $P^n$ denote either of the sequences. Let $F_n$ be the distribution function of $T_n^*$ under $P^n$,

$$F_n(t) = P^n(T_n^* \le t).$$

Let t be an integer, and write $Z^n_k$ for the likelihood ratio based on the first k observations, so that $Z^n(s) = Z^n_{[ns]}$. Then

$$\begin{aligned}
1 - F_n(t) &= P^n(T_n^* > t) \\
&= P^n(B < Z^n(s) < A \text{ for all } s \le t) \\
&= P^n(\log B < \log Z^n(s) < \log A \text{ for all } s \le t) \\
&= P^n(\log B < \log Z^n_k < \log A \text{ for all } k \le tn) \\
&\le P^n(\log B < \log Z^n_n < \log A, \ \log B < \log Z^n_{2n} < \log A, \ \ldots, \ \log B < \log Z^n_{tn} < \log A) \\
&\le P^n\left(|\log Z^n_n| < C, \ |\log Z^n_{2n} - \log Z^n_n| < C, \ \ldots, \ |\log Z^n_{tn} - \log Z^n_{(t-1)n}| < C\right)
\end{aligned}$$

where $C = |\log A| + |\log B|$. Since $\log Z^n_n, \ \log Z^n_{2n} - \log Z^n_n, \ \ldots, \ \log Z^n_{tn} - \log Z^n_{(t-1)n}$ are i.i.d.,

$$1 - F_n(t) \le [p(n)]^t$$

with $p(n) = P^n(|\log Z^n_n| < C)$. Now since $\log Z^n_n = \log Z^n(1) \xrightarrow{d} \log Z(1)$ we have

$$p(n) \to P(|\log Z(1)| < C) < 1$$

and thus we can assume without loss of generality that $p(n) \le \gamma < 1$ for every n. Therefore $1 - F_n(t) \le \gamma^t$ for integers t, so for any t

$$1 - F_n(t) \le 1 - F_n([t]) \le \gamma^{[t]} \le \gamma^{t-1} \tag{A1.1}$$

holds for every n. Now by integration by parts

$$2\int_0^\infty (1 - F_n(t))\, t \, dt = (1 - F_n(t))\, t^2 \Big|_0^\infty + \int_0^\infty t^2 \, dF_n(t) = \int_0^\infty t^2 \, dF_n(t),$$

using the inequality (A1.1). Therefore

$$E^n(T_n^*)^2 = 2\int_0^\infty (1 - F_n(t))\, t \, dt \le 2\int_0^\infty t\, \gamma^{t-1} \, dt < \infty.$$

Since this bound does not depend on n, it now follows that $\{T_n^*\}$ is uniformly integrable.
A.2 The Likelihood Ratio of Singular Multivariate Normal Distributions

It is required to find the likelihood ratio of the distributions $N_k(\delta, A)$ and $N_k(0, A)$. Consider the representation ([19])

$$A^{1/2} Z + \delta$$

for the $N_k(\delta, A)$ distribution, where Z is a vector of i.i.d. N(0,1) variables and $A^{1/2}$ is the square-root of A, a symmetric matrix satisfying $A^{1/2} A^{1/2} = A$. In our situation A is idempotent with rank r so that $A^{1/2} = A$; also

$$A = B \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} B' = B D B' \tag{A2.1}$$

with B an orthogonal matrix, a representation which will be used below. Now if $\delta$ is in the range (column space) of A the vector $A Z + \delta$ will remain in the range of A and the two distributions $N_k(\delta, A)$ and $N_k(0, A)$ will have the same support. This will make their likelihood ratio meaningful. In Chapter 3 we had $A = I_k - q q'$ and $\delta$ was orthogonal to q so that $A\delta = \delta$, thus ensuring that $\delta$ is in the range of A.

Now let $Q_0$, $Q_1$ be the probability measures on $\mathbb{R}^k$ corresponding to the $N_k(0, A)$, $N_k(\delta, A)$ distributions respectively. With D as in (A2.1),

$$X \sim N_k(0, D) \implies BX \sim N_k(0, A) \tag{A2.2}$$

and, with $\mu = B'\delta$,

$$X \sim N_k(\mu, D) \implies BX \sim N_k(\delta, A). \tag{A2.3}$$

Now let $P_0$, $P_1$ be the probability measures corresponding to $N_k(0, D)$, $N_k(\mu, D)$. From (A2.2) and (A2.3), $Q_0 = P_0 B^{-1}$ and $Q_1 = P_1 B^{-1}$ where the notation means

$$Q_0(S) = P_0(B^{-1}S), \qquad Q_1(S) = P_1(B^{-1}S)$$

for Borel sets S in $\mathbb{R}^k$. The likelihood ratio $dP_1/dP_0$ is simple to find and the following lemma shows how it relates to $dQ_1/dQ_0$, the desired likelihood ratio.

Lemma. Let $P_0$, $P_1$ be probability measures on a measure space $(X, F)$ and $f: (X,F) \to (Y,G)$ be measurable and 1-1 with measurable inverse $f^{-1}: (Y,G) \to (X,F)$. Define $Q_0$, $Q_1$ on $(Y,G)$ by

$$Q_0(S) = P_0(f^{-1}S), \qquad Q_1(S) = P_1(f^{-1}S) \qquad (S \in G).$$

Then if $P_1 \ll P_0$ and $Q_1 \ll Q_0$ then

$$\frac{dQ_1}{dQ_0}(y) = \frac{dP_1}{dP_0}\left(f^{-1}(y)\right) \qquad (y \in Y).$$

Proof: Let $S \in G$. Then

$$\int_S \frac{dP_1}{dP_0}\left(f^{-1}(y)\right) dQ_0(y) = \int_S \frac{dP_1}{dP_0}\left(f^{-1}(y)\right) d(P_0 f^{-1})(y) = \int_{f^{-1}S} \frac{dP_1}{dP_0}(x) \, dP_0(x)$$

(by the change of variables formula with y = f(x))

$$= P_1(f^{-1}(S)) = Q_1(S).$$

The use of this lemma requires $dP_1/dP_0$.
But $P_0$ is the distribution of a vector $(X_1, \ldots, X_r, 0, \ldots, 0)'$ of r i.i.d. N(0,1) variables and $P_1$ is the distribution of this vector with the added mean vector $\mu = (\mu_1, \ldots, \mu_r, 0, \ldots, 0)'$. Therefore for any $x = (x_1, \ldots, x_r, 0, \ldots, 0)'$

$$\frac{dP_1}{dP_0}(x) = \frac{\exp\{-\frac{1}{2} \sum_{i=1}^r (x_i - \mu_i)^2\}}{\exp\{-\frac{1}{2} \sum_{i=1}^r x_i^2\}} = \exp\left\{-\tfrac{1}{2}(x - \mu)'(x - \mu) + \tfrac{1}{2} x'x\right\} = \exp\left\{\sum_{i=1}^r \mu_i x_i - \tfrac{1}{2} \sum_{i=1}^r \mu_i^2\right\} = \exp\left\{\mu' x - \tfrac{1}{2} \mu'\mu\right\}.$$

Therefore, by the lemma, since the linear map B is 1-1 from the range of D to the range of A, for each y in the range of A

$$\frac{dQ_1}{dQ_0}(y) = \frac{dP_1}{dP_0}(B^{-1} y) = \frac{dP_1}{dP_0}(B' y) = \exp\left(\mu' B' y - \tfrac{1}{2} \mu'\mu\right) = \exp\left\{(B'\delta)' B' y - \tfrac{1}{2} (B'\delta)'(B'\delta)\right\} = \exp\left\{\delta' y - \tfrac{1}{2} \delta'\delta\right\},$$

and this was the formula used to obtain (3.12).

Note: A further use of this calculation is made for the application of the SPRT to the problem of testing the mean vector of a multivariate normal distribution. Note that only alternatives which specify a mean vector in the range of the covariance matrix can be tested.

A.3 Two Lemmas on Weak Convergence

In this section proofs of Lemma 1 and Lemma 2 are provided. Lemma 1 is first restated more precisely.

Lemma 1. Let $P_0$ and $P_1$ be probability measures with $P_1 \ll P_0$ and $Z = dP_1/dP_0$ be their likelihood ratio. For each n = 1, 2, ... let $P_0^n$ and $P_1^n$ be probability measures with $P_1^n \ll P_0^n$ and $Z_n = dP_1^n/dP_0^n$. If there are random elements X, $X_n$ such that

$$(X_n, Z_n) \xrightarrow{d} (X, Z) \quad \text{under } P_0^n, P_0$$

then

$$X_n \xrightarrow{d} X \quad \text{under } P_1^n, P_1.$$

Proof: If f is bounded and continuous on the space where $X_n$ and X lie,

$$\int f(X_n) \, dP_1^n = \int f(X_n) Z_n \, dP_0^n = \int h(X_n, Z_n) \, dP_0^n \to \int h(X, Z) \, dP_0$$

(since $h(x, z) = f(x) z$ is continuous on the product space)

$$= \int f(X) Z \, dP_0 = \int f(X) \, dP_1.$$

Lemma 2. Let $P_0^n$ and $P_0$ (n = 1, 2, ...) be probability measures and let X, Y, $X_n$ and $Y_n$ be random elements such that $X_n \xrightarrow{d} X$ under $P_0^n$, $P_0$ and $Y_n \xrightarrow{d} Y$ under $P_0^n$, $P_0$. If there is a continuous function H and random elements $\varepsilon_n$ such that

$$Y = H(X), \qquad Y_n = H(X_n) + \varepsilon_n, \qquad \varepsilon_n \to 0 \text{ in probability under } P_0^n,$$

then

$$(X_n, Y_n) \xrightarrow{d} (X, Y) \quad \text{under } P_0^n, P_0.$$
Proof: Since $\varepsilon_n \to 0$ in probability it suffices to prove that

$$(X_n, H(X_n)) \xrightarrow{d} (X, H(X))$$

(see [1]). For this let f be continuous and bounded on the product space where (X, Y) lives. Then

$$\int f(X_n, H(X_n)) \, dP_0^n = \int g(X_n) \, dP_0^n \to \int g(X) \, dP_0 = \int f(X, H(X)) \, dP_0$$

since $g(x) = f(x, H(x))$ is continuous.
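As a worked illustration of Lemma 1 (a sketch, under the assumption that the statistic is the log likelihood ratio itself with the normal limit found in Chapter 3), take $X = \log Z$ with $X \sim N(-\tfrac{1}{2}\Delta, \Delta)$ under $P_0$. The density of X under $P_1$ is the density under $P_0$ multiplied by $Z = e^x$, and completing the square shows that the mean shifts from $-\tfrac{1}{2}\Delta$ to $+\tfrac{1}{2}\Delta$ while the variance is unchanged:

```latex
\begin{aligned}
e^{x} \cdot \frac{1}{\sqrt{2\pi\Delta}}
  \exp\left\{-\frac{(x + \Delta/2)^2}{2\Delta}\right\}
&= \frac{1}{\sqrt{2\pi\Delta}}
  \exp\left\{-\frac{x^2 + \Delta x + \Delta^2/4 - 2\Delta x}{2\Delta}\right\} \\
&= \frac{1}{\sqrt{2\pi\Delta}}
  \exp\left\{-\frac{(x - \Delta/2)^2}{2\Delta}\right\}.
\end{aligned}
```

This agrees with the pair of limiting distributions $N(-\tfrac{1}{2}\Delta, \Delta)$ and $N(\tfrac{1}{2}\Delta, \Delta)$ quoted for $\log Z_n$ under the null and alternative hypotheses.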
Likelihood ratios in asymptotic statistical theory Leroux, Brian Gilbert 1985-12-31
Title | Likelihood ratios in asymptotic statistical theory |
Creator |
Leroux, Brian Gilbert |
Publisher | University of British Columbia |
Date | 1985 |
Date Issued | 2010-05-19T13:22:57Z |
Description | This thesis deals with two topics in asymptotic statistics. A concept of asymptotic optimality for sequential tests of statistical hypotheses is introduced. Sequential Probability Ratio Tests are shown to have asymptotic optimality properties corresponding to their usual optimality properties. Secondly, the asymptotic power of Pearson's chi-square test for goodness of fit is derived in a new way. The main tool for evaluating asymptotic performance of tests is the likelihood ratio of two hypotheses. In situations examined here the likelihood ratio based on a sample of size ⁿ has a limiting distribution as ⁿ → ∞ and the limit is also a likelihood ratio. To calculate limiting values of various performance criteria of statistical tests the calculations can be made using the limiting likelihood ratio. |
Subject |
Mathematical statistics - Asymptotic theory |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Collection |
Retrospective Theses and Dissertations, 1919-2007 |
Series | UBC Retrospective Theses Digitization Project |
Date Available | 2010-05-19 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0096109 |
URI | http://hdl.handle.net/2429/24843 |
Degree |
Master of Science - MSc |
Program |
Statistics |
Affiliation |
Science, Faculty of Statistics, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |