Likelihood ratios in asymptotic statistical theory 1985
Item Metadata
Title | Likelihood ratios in asymptotic statistical theory |
Creator |
Leroux, Brian Gilbert |
Publisher | University of British Columbia |
Date Created | 2010-05-19T13:22:57Z |
Date Issued | 2010-05-19T13:22:57Z |
Date | 1985 |
Description | This thesis deals with two topics in asymptotic statistics. A concept of asymptotic optimality for sequential tests of statistical hypotheses is introduced. Sequential Probability Ratio Tests are shown to have asymptotic optimality properties corresponding to their usual optimality properties. Secondly, the asymptotic power of Pearson's chi-square test for goodness of fit is derived in a new way. The main tool for evaluating asymptotic performance of tests is the likelihood ratio of two hypotheses. In situations examined here the likelihood ratio based on a sample of size n has a limiting distribution as n → ∞ and the limit is also a likelihood ratio. To calculate limiting values of various performance criteria of statistical tests the calculations can be made using the limiting likelihood ratio. |
Subject |
Mathematical Statistics - Asymptotic Theory |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | Eng |
Collection |
Retrospective Theses and Dissertations, 1919-2007 |
Series | UBC Retrospective Theses Digitization Project [http://www.library.ubc.ca/archives/retro_theses/] |
Date Available | 2010-05-19T13:22:57Z |
DOI | 10.14288/1.0096109 |
Degree |
Master of Science - MSc |
Program |
Statistics |
Affiliation |
Science, Faculty of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
URI | http://hdl.handle.net/2429/24843 |
Aggregated Source Repository | DSpace |
Digital Resource Original Record | https://open.library.ubc.ca/collections/831/items/1.0096109/source |
Full Text
LIKELIHOOD RATIOS IN ASYMPTOTIC STATISTICAL THEORY

By BRIAN GILBERT LEROUX
B.Sc., Carleton University, 1982

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES, Department of Statistics

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
April 1985
© Brian Gilbert Leroux, 1985

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of
The University of British Columbia
1956 Main Mall
Vancouver, Canada V6T 1Y3
Date: March

ABSTRACT

This thesis deals with two topics in asymptotic statistics. A concept of asymptotic optimality for sequential tests of statistical hypotheses is introduced. Sequential Probability Ratio Tests are shown to have asymptotic optimality properties corresponding to their usual optimality properties. Secondly, the asymptotic power of Pearson's chi-square test for goodness of fit is derived in a new way. The main tool for evaluating asymptotic performance of tests is the likelihood ratio of two hypotheses. In situations examined here the likelihood ratio based on a sample of size $n$ has a limiting distribution as $n \to \infty$ and the limit is also a likelihood ratio.
To calculate limiting values of various performance criteria of statistical tests the calculations can be made using the limiting likelihood ratio.

TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgement
INTRODUCTION
CHAPTER 1 - THE THEORY OF LIKELIHOOD RATIOS
1.1 Likelihood Ratios and Hypothesis Testing
1.2 Sequential Tests of Hypotheses
1.3 Weak Convergence of Likelihood Ratios
1.4 Functional Convergence of Likelihood Ratios
1.5 Contiguity and Convergence of Experiments
CHAPTER 2 - ASYMPTOTIC OPTIMALITY OF SEQUENTIAL TESTS
2.1 Wald's Criterion
2.2 Bayes Risk Criterion
CHAPTER 3 - POWER OF CHI-SQUARE TESTS
BIBLIOGRAPHY
APPENDIX
A.1 Uniform Integrability of a Sequence of Stopping Rules
A.2 The Likelihood Ratio of Singular Multivariate Normal Distributions
A.3 Two Lemmas on Weak Convergence

LIST OF TABLES

Table I. Asymptotic power $1 - \Phi(Z_\alpha - \ldots)$ of the test based on $Z_n$ for values of size $\alpha$, power $\beta$ and degrees of freedom $k - 1$ of the chi-square test

LIST OF FIGURES

Fig. 1. Graph of $h$ which determines stopping boundaries $A$, $B$ of optimal SPRT

ACKNOWLEDGEMENT

The author, being one who thrives on encouragement, wishes to thank Professors Cindy Greenwood and John Petkau for the constant supply they gave.

INTRODUCTION

The motivation behind some of this work lies in a problem concerning a sequential procedure for testing the mean of a normal distribution. The following discussion of this problem follows [3]. There are observed independent identically distributed observations $X_1, X_2, \ldots$ assumed to be distributed as $N(\mu, \sigma^2)$ for a known $\sigma^2$.
It is required to find a sequential procedure for testing whether $\mu$ is positive or negative (sequential procedures are discussed in Section 1.2). The criterion by which procedures are to be judged is the Bayes Risk. This is defined in terms of a cost function having two components, one due to reaching an incorrect conclusion and a second depending on the number of observations on which the conclusion is based. The proposed costs are $K|\mu|$ for making an error ($K$ is a constant) and a cost of $c$ per observation. The average cost for a given procedure will depend on $\mu$. To avoid the problems this causes, it is assumed that $\mu$ is a random variable, also with a normal distribution. If its mean and variance are specified the average cost can be averaged further against this distribution for $\mu$. The result is the Bayes Risk.

In the development of a sequential procedure which minimizes the Bayes Risk the partial sums of the observations are replaced by a Brownian motion. This is a reasonable approximation if the number of observations can be expected to be large, and this can be expected when the cost $c$ is small. A procedure which is optimal (minimizes Bayes Risk) in the continuous time setting is derived and then applied (with a small adjustment) to the discrete time setting. It is desired to have a result stating that this procedure is asymptotically optimal in some sense which can be made precise. Asymptotic here refers to $c$ approaching zero. Results along these lines can be found in [13] where the setting is the more complicated situation of sequential medical trials in which further components of cost are considered (see [4]).

This author attempted to establish similar results using the theory of weak convergence of likelihood ratios which will be discussed in Chapter 1.
Success was met only in simple hypothesis testing settings where there are only two possible states of nature. In Chapter 2 are presented discussions of asymptotically optimal sequential procedures which are based on the likelihood ratio. It is believed that the methods used there could be applied successfully in more complicated situations.

Another area for application of likelihood ratio theory lies in the calculation of asymptotic performance of other tests not necessarily based on the likelihood ratio. In Chapter 3 the asymptotic power of chi-square tests is studied via the theory of Chapter 1. It is indicated there that the chi-square test is asymptotically inefficient compared to a test based on the likelihood ratio.

CHAPTER 1
THE THEORY OF LIKELIHOOD RATIOS

1.1 Likelihood Ratios and Hypothesis Testing

We describe the general hypothesis testing problem of distinguishing two probability measures. On a set $\Omega$ let there be probability measures $P_0$ and $P_1$. A random element $X$ of $\Omega$ is chosen and the question is asked: was $X$ chosen based on the distribution $P_0$ or the distribution $P_1$? A decision rule for answering the question is a subset $D$ of $\Omega$; if $X$ belongs to $D$ then it is decided that $P_1$ is the true distribution, otherwise $P_0$. In common language $D$ is a test of the simple hypothesis $H_0: P_0$ versus the simple hypothesis $H_1: P_1$. $D$ is also called the rejection region because the occurrence of the event $D$ leads to the rejection of the null hypothesis $H_0$ in favor of the alternative.
Each decision rule has associated with it two error probabilities:

$\alpha(D) = P_0(D)$ = probability of rejecting $H_0$ when it is true, and
$\beta(D) = P_1(D^c)$ = probability of accepting $H_0$ when it is false,

called the type I error and type II error respectively. $\alpha(D)$ is also called the level and $1 - \beta(D)$ the power of the test $D$. Because it is generally impossible to minimize both types of error simultaneously, various criteria for comparing decision rules have been employed. In many cases the best rules are based on the likelihood ratio which we will now define.

Given two probability measures $P_0$ and $P_1$ such that $P_1$ is absolutely continuous with respect to $P_0$ ($P_1 \ll P_0$), their likelihood ratio is the Radon-Nikodym derivative $dP_1/dP_0$. This can be generalized by defining the likelihood ratio of any two probability measures $P_0$, $P_1$ on a measure space $(\Omega, F)$ by

\[ Z = \frac{dP_1}{dP_0} = \frac{dP_1/d\mu}{dP_0/d\mu} \tag{1.1} \]

where $\mu$ is any measure on $(\Omega, F)$ such that $P_1 \ll \mu$ and $P_0 \ll \mu$ (such as $\mu = P_0 + P_1$). The conventions $1/0 = \infty$ and $0/0 = 0$ are used in (1.1). In order to show that $Z$ does not depend on the particular choice of $\mu$ the following result is needed.

Lebesgue Decomposition. For any $A \in F$,

\[ \int_A Z \, dP_0 = P_1(A \cap \{Z < \infty\}). \]

Proof: Let $X_0 = dP_0/d\mu$ and $X_1 = dP_1/d\mu$. Then for any $A \in F$

\[ \int_A Z \, dP_0 = \int_{A \cap \{Z < \infty\}} Z \, dP_0 = \int_{A \cap \{Z < \infty\}} X_0 Z \, d\mu = \int_{A \cap \{Z < \infty\}} X_1 \, d\mu = P_1(A \cap \{Z < \infty\}) \]

since $\{Z = \infty\} = \{X_0 = 0\}$ and $X_0 Z = X_1$ on $\{Z < \infty\}$.

Now because $P_0(X_0 = 0) = 0$, $Z$ is a finite random variable on the probability space $(\Omega, F, P_0)$.
For any $A \subset \{Z < \infty\}$ the integral $\int_A Z \, dP_0 = P_1(A)$ is determined and so $Z$ is uniquely determined on $(\Omega, F, P_0)$. By symmetry $1/Z$ is uniquely determined on $(\Omega, F, P_1)$ and hence $Z$ is also uniquely determined on $(\Omega, F, P_1)$. The notation $dP_1/dP_0$ is used to denote this extension of the Radon-Nikodym derivative and from here on $dP_1/dP_0$ will denote $Z$ as defined in (1.1).

Example. When $P_0$ and $P_1$ are probability distributions on $R$ with densities $f_0$ and $f_1$ respectively the ratio of densities $Z = f_1/f_0$ is the likelihood ratio of $P_0$ and $P_1$. It need not be assumed that the support of $f_0$ is contained in the support of $f_1$.

Since $Z$ is expected to be larger when $H_1$ is true a reasonable test of $H_0$ vs. $H_1$ uses the decision rule $D^* = \{Z \ge C\}$ where $C$ is a constant which determines the level of the test. This rule has the following nice property.

Neyman-Pearson Lemma. If $D$ is any test of $H_0$ vs. $H_1$ satisfying $\alpha(D) \le \alpha(D^*)$ then $\beta(D) \ge \beta(D^*)$.

Proof: By the Lebesgue Decomposition

\[ \beta(D) = P_1(D^c) = \int_{D^c} Z \, dP_0 + P_1(D^c \cap \{Z = \infty\}). \]

Since $D^{*c} \cap \{Z = \infty\} = \emptyset$, $P_1(D^c \cap \{Z = \infty\}) \ge P_1(D^{*c} \cap \{Z = \infty\})$. Also

\[ \int_{D^c} Z \, dP_0 - \int_{D^{*c}} Z \, dP_0 = \int_{D^*} (I_{D^c} - I_{D^{*c}}) Z \, dP_0 + \int_{D^{*c}} (I_{D^c} - I_{D^{*c}}) Z \, dP_0 \ge C \int (I_{D^c} - I_{D^{*c}}) \, dP_0 \]

since $Z \ge C$ and $I_{D^c} - I_{D^{*c}} \ge 0$ on $D^*$, while $Z < C$ and $I_{D^c} - I_{D^{*c}} \le 0$ on $D^{*c}$. Therefore

\[ \int_{D^c} Z \, dP_0 - \int_{D^{*c}} Z \, dP_0 \ge C \int (I_{D^c} - I_{D^{*c}}) \, dP_0 = C[P_0(D^c) - P_0(D^{*c})] = C[P_0(D^*) - P_0(D)] \ge 0. \]

Combining the two inequalities gives $\beta(D) \ge \beta(D^*)$.

This result says that among all rules having type I error at most $P_0(D^*)$, $D^*$ has the smallest type II error. Equivalently, among all rules with type II error at most $P_1(D^{*c})$, $D^*$ has the smallest type I error.
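The lemma can be checked numerically in a simple case. The sketch below is an illustration only, not part of the thesis: it assumes a single observation with $P_0 = N(0,1)$ and $P_1 = N(\theta,1)$, $\theta > 0$, so that $D^* = \{Z \ge C\}$ is a one-sided region, and it compares $D^*$ with a two-sided rule of the same level. The function names and the choice of competitor are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)


def likelihood_ratio_test_errors(theta, alpha):
    """Errors of D* = {Z >= C} for H0: N(0,1) vs H1: N(theta,1), theta > 0.

    Z(x) = exp(theta*x - theta**2/2) is increasing in x, so {Z >= C}
    is the one-sided region {x >= cutoff}.
    """
    cutoff = STD_NORMAL.inv_cdf(1 - alpha)
    type_1 = 1 - STD_NORMAL.cdf(cutoff)      # P0(D*)
    type_2 = STD_NORMAL.cdf(cutoff - theta)  # P1(D*^c)
    return type_1, type_2


def two_sided_test_errors(theta, alpha):
    """Errors of the competing rule D = {|x| >= cutoff} at the same level."""
    cutoff = STD_NORMAL.inv_cdf(1 - alpha / 2)
    type_1 = 2 * (1 - STD_NORMAL.cdf(cutoff))  # P0(D)
    # P1(D^c) = P(-cutoff < X < cutoff) when X ~ N(theta, 1)
    type_2 = STD_NORMAL.cdf(cutoff - theta) - STD_NORMAL.cdf(-cutoff - theta)
    return type_1, type_2


if __name__ == "__main__":
    for theta in (0.5, 1.0, 2.0):
        _, beta_star = likelihood_ratio_test_errors(theta, 0.05)
        _, beta_other = two_sided_test_errors(theta, 0.05)
        print(f"theta={theta}: beta(D*)={beta_star:.4f}, beta(D)={beta_other:.4f}")
```

Both rules have type I error 0.05 by construction, so the lemma predicts that the type II error of the two-sided rule is at least that of $D^*$ for every $\theta > 0$.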
A simple and symmetric formulation is: no rule can simultaneously have both a smaller type I error and a smaller type II error than $D^*$. If $P_0(D^*) = \alpha$ then $D^*$ is called an optimal level-$\alpha$ test. For a given number $\alpha$ it may be impossible to find a number $C$ such that $P_0(Z \ge C) = \alpha$. There will always be a randomized decision rule which achieves this but these will not be considered here. See [15] for a discussion of randomized rules.

Decision rules having the form of $D^*$ are optimal also in the sense of minimal Bayes Risk. The Bayes Risk for a rule $D$ is

\[ \pi \, \alpha(D) + (1 - \pi) \, \beta(D) \tag{1.2} \]

where $\pi$ is the prior probability of the distribution being $P_0$. This assumes the 0-1 cost (or loss) function whereby a cost of 1 is incurred when an error of either type is made. Now the expected cost is the probability (under the appropriate hypothesis) of making an error and when this is averaged over the two hypotheses according to the prior probability $\pi$, (1.2) results. For fixed $\pi$ the Bayes Risk is minimized by $D^*$ with $C$ having the value $\pi/(1 - \pi)$, i.e.

\[ \inf_D \, [\pi \, \alpha(D) + (1 - \pi) \, \beta(D)] = \int [\pi \wedge (1 - \pi) Z] \, dP_0 \]

and the infimum is achieved at $D_\pi = \{Z \ge \pi/(1 - \pi)\}$. This is proved easily using the Lebesgue Decomposition as follows. First,

\[ \pi \, \alpha(D_\pi) + (1 - \pi) \, \beta(D_\pi) = \int [\pi \wedge (1 - \pi) Z] \, dP_0 \]

and for any $D$

\[ \pi \, \alpha(D) + (1 - \pi) \, \beta(D) = \int_D \pi \, dP_0 + \int_{D^c} (1 - \pi) Z \, dP_0 + (1 - \pi) P_1(D^c \cap \{Z = \infty\}) \ge \int_D \pi \, dP_0 + \int_{D^c} (1 - \pi) Z \, dP_0 \ge \int [\pi \wedge (1 - \pi) Z] \, dP_0. \]

1.2 Sequential Tests of Hypotheses

Wald's Sequential Probability Ratio Test (SPRT) is a procedure for testing $H_0: P_0$ vs. $H_1: P_1$ based on a sequence $X_1, X_2, \ldots$ of independent identically distributed (i.i.d.) random variables having distribution either $P_0$ or $P_1$. If $X_1, \ldots, X_k$ are observed the SPRT uses the statistic

\[ Z_k = \prod_{i=1}^{k} Z(X_i) \tag{1.3} \]

where $Z$ is the likelihood ratio $dP_1/dP_0$. This is reasonable because $Z_k$ is the likelihood ratio of the distribution of $X_1, \ldots, X_k$ under $P_1$ with respect to the distribution under $P_0$. The SPRT proceeds as follows:

if $Z_k \ge A$ then $H_1$ is accepted
if $Z_k \le B$ then $H_0$ is accepted
if $B < Z_k < A$ then another observation is taken,

with $A$ and $B$ satisfying $0 < B \le 1 \le A < \infty$. This procedure can be expressed in terms of the stopping rule

\[ T = \inf \{k : Z_k \ge A \text{ or } Z_k \le B\} \tag{1.4} \]

and the decision rule

\[ D = \{Z_T \ge A\}, \tag{1.5} \]

where $Z_T$ denotes the value of $Z_k$ when $T = k$.

An immediate question arises: can it happen that $Z_k$ never crosses the boundaries determined by $A$ and $B$? To answer this question probability distributions, corresponding to $P_0$ and $P_1$, for infinite sequences $X_1, X_2, \ldots$ must be used. These are the infinite product measures denoted $Q_0$ and $Q_1$. The question above is answered in the negative by

\[ Q_0(B < Z_k < A, \; k = 1, 2, \ldots) = 0, \qquad Q_1(B < Z_k < A, \; k = 1, 2, \ldots) = 0. \]

These statements are implied by the stronger results

\[ Z_k \to 0 \text{ a.s. under } Q_0, \qquad Z_k \to \infty \text{ a.s. under } Q_1. \]

A proof of these uses the Strong Law of Large Numbers applied to the sequence $\{-a \vee \log Z(X_i)\}_{i=1}^{\infty}$, where $X \vee Y$ is defined to be the larger of $X$ and $Y$. Note that the SPRT could equally well be defined with $\log Z_k$ in place of $Z_k$. The Strong Law of Large Numbers yields

\[ \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} (-a \vee \log Z(X_i)) = E_0(-a \vee \log Z(X_1)) \quad \text{a.s. } (Q_0) \]

where $E_0$ denotes expected value under $Q_0$. By Jensen's Inequality, provided $P_0$ and $P_1$ are distinct in the sense that $P_0(Z = 1) < 1$,

\[ E_0(\log Z) = \int \log Z \, dP_0 < \log \int Z \, dP_0 \le 0. \]

For large enough $a$ then

\[ E_0(-a \vee \log Z(X_1)) = E_0(-a \vee \log Z) < 0 \]

and

\[ \lim_{k \to \infty} \sum_{i=1}^{k} (-a \vee \log Z(X_i)) = -\infty \quad \text{a.s. } (Q_0), \]
\[ \lim_{k \to \infty} \sum_{i=1}^{k} \log Z(X_i) = -\infty \quad \text{a.s. } (Q_0), \]
\[ \lim_{k \to \infty} Z_k = 0 \quad \text{a.s. } (Q_0). \]

By symmetry, $\lim_{k \to \infty} 1/Z_k = 0$ a.s. $(Q_1)$ and so $\lim_{k \to \infty} Z_k = \infty$ a.s. $(Q_1)$.

We have just seen that the conditions for stopping in (1.4) will be met eventually, i.e., $T$ is a.s. finite. Now let us compare the SPRT to other sequential tests of $H_0$ vs. $H_1$. A sequential test in general consists of a stopping rule and a decision rule. A stopping rule is a random variable $T$ taking values in the positive integers such that the set $\{T = k\}$ depends only on $X_1, \ldots, X_k$. The decision rule of a sequential test is a set $D$ which depends only on the observations $X_i$ up until the random time $T$, i.e., for each $k$, $D \cap \{T = k\}$ is determined by $X_1, \ldots, X_k$.

Criteria for comparing sequential tests include error probabilities and the Average Sample Number (ASN) which is the expected value of the stopping rule. Only tests with finite ASN will be considered worthwhile; this implies that the conditions for stopping will almost surely be met eventually. The error probabilities are defined exactly as for non-sequential tests, $\alpha(T, D) = Q_0(D)$ and $\beta(T, D) = Q_1(D^c)$. The SPRT has the following optimality property.

Optimality Property of the SPRT. Let $\alpha = Q_0(D)$ and $\beta = Q_1(D^c)$ be the error probabilities of the SPRT defined in (1.4) and (1.5). If $(T', D')$ is any other sequential test of $H_0: P_0$ vs. $H_1: P_1$ with smaller error probabilities,

\[ Q_0(D') \le \alpha; \qquad Q_1(D'^c) \le \beta \]

then the SPRT has smaller ASN under both hypotheses,

\[ E_0(T') \ge E_0(T) \quad \text{and} \quad E_1(T') \ge E_1(T). \]
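The stopping rule (1.4) and decision rule (1.5) can be sketched in a few lines. The following is a minimal illustration, not part of the thesis: it assumes $P_0 = N(0,1)$ and $P_1 = N(\theta,1)$, for which $\log Z(x) = \theta x - \theta^2/2$, and works with $\log Z_k$ in place of $Z_k$ as the text notes is permissible. The function names, the boundaries $A = 19$, $B = 1/19$ and the Monte Carlo settings are arbitrary choices made for the example.

```python
import math
import random


def sprt_normal(theta, upper, lower, true_mean, rng):
    """Run one SPRT of H0: N(0,1) vs H1: N(theta,1).

    Returns (accept_h1, T), where T is the stopping rule (1.4) and
    accept_h1 is the indicator of the decision rule (1.5).
    """
    log_upper, log_lower = math.log(upper), math.log(lower)
    log_z = 0.0  # log Z_k, updated one observation at a time
    k = 0
    while True:
        k += 1
        x = rng.gauss(true_mean, 1.0)
        log_z += theta * x - theta ** 2 / 2  # log Z(x) for this pair of normals
        if log_z >= log_upper:
            return True, k
        if log_z <= log_lower:
            return False, k


def estimate_error_and_asn(theta, upper, lower, true_mean, runs, seed=0):
    """Monte Carlo estimate of P(accept H1) and of the ASN when the data have mean true_mean."""
    rng = random.Random(seed)
    results = [sprt_normal(theta, upper, lower, true_mean, rng) for _ in range(runs)]
    accept_rate = sum(accepted for accepted, _ in results) / runs
    asn = sum(t for _, t in results) / runs
    return accept_rate, asn


if __name__ == "__main__":
    for mean in (0.0, 1.0):
        rate, asn = estimate_error_and_asn(1.0, 19.0, 1 / 19.0, mean, runs=5000)
        print(f"true mean {mean}: P(accept H1) ~ {rate:.3f}, ASN ~ {asn:.2f}")
```

Under $Q_0$ the acceptance rate estimates the type I error $\alpha(T, D)$; under $Q_1$ one minus the acceptance rate estimates $\beta(T, D)$. The loop terminates with probability one by the argument given above.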
(As for $E_0$, $E_1$ denotes expectation under $Q_1$.) There have been four strategies for proving this result. The original is due to Wald and Wolfowitz [23], another is due to Lehmann (see [15] or [11]) and two others ([3], [20]) first prove that SPRTs are Bayes procedures. This latter result is important enough to be stated as a separate result.

The Bayes Risk of a sequential test of $H_0: P_0$ vs. $H_1: P_1$ which uses stopping rule $T$ and decision rule $D$ is

\[ \rho(T, D; \pi) = \pi \, (Q_0(D) + c \, E_0(T)) + (1 - \pi)(Q_1(D^c) + c \, E_1(T)) \tag{1.6} \]

where $\pi$ is the prior probability of the true distribution being $P_0$ and $c$ is the cost per observation. Just as for the Bayes Risk in (1.2) the 0-1 cost function is employed here.

Bayes Optimality of the SPRT. There exist constants $A$ and $B$ which depend on $\pi$ such that the Bayes Risk (1.6) is minimized by the SPRT which has stopping boundaries $A$ and $B$.

This property is proved in [20]; in Section 2.2 we will demonstrate how to adapt the argument given there to the continuous time setting.

1.3 Weak Convergence of Likelihood Ratios

For testing simple hypotheses, tests based on the likelihood ratio are good tests as measured by the optimality properties we have just seen. For testing a simple null hypothesis against a composite alternative a reasonable approach consists of choosing one element of the alternative, thus forming a new simple alternative. For example let $X_1, \ldots, X_n$ be a random sample from the $N(\theta, 1)$ distribution and consider testing $H_0: \theta = 0$ vs. $H_1: \theta > 0$. One test is based on the likelihood ratio for $H_0: \theta = 0$ vs. $H_1: \theta = \theta_0$ for some fixed $\theta_0 > 0$. In this case the likelihood ratio is

\[ Z(x) = \frac{\frac{1}{\sqrt{2\pi}} e^{-(x - \theta_0)^2/2}}{\frac{1}{\sqrt{2\pi}} e^{-x^2/2}} = e^{\theta_0 x - \theta_0^2/2} \]

and the likelihood ratio based on $X_1, \ldots, X_k$ is

\[ Z_k = \prod_{i=1}^{k} Z(X_i) = \exp\Big(\theta_0 \sum_{i=1}^{k} X_i - k \theta_0^2/2\Big). \tag{1.7} \]

Another example involves the parameter $\theta$ in the $\exp(1 + \theta)$ distribution. The likelihood ratio for $H_0: \theta = 0$ vs. $H_1: \theta = \theta_0$ is

\[ Z(x) = \frac{(1 + \theta_0) e^{-(1 + \theta_0) x}}{e^{-x}} = (1 + \theta_0) e^{-\theta_0 x}, \tag{1.8} \]

and the likelihood ratio based on a random sample $X_1, \ldots, X_k$ is

\[ Z_k = \prod_{i=1}^{k} Z(X_i) = (1 + \theta_0)^k \exp\Big(-\theta_0 \sum_{i=1}^{k} X_i\Big). \tag{1.9} \]

One way of comparing two tests of $H_0: \theta = 0$ vs. $H_1: \theta > 0$ is to look at their performance for testing $H_0$ vs. simple alternatives and let the alternatives approach $H_0$. A common choice for alternative is $H_{1,n}: \theta = c/\sqrt{n}$ where $n$ is the sample size. The reason for this choice is the desire for the test statistic to have a non-degenerate limiting distribution under the alternative. This enables one to calculate limiting (or asymptotic) power and it is by this criterion that tests will be compared. Tests which perform well according to this are considered sensitive to small departures from $\theta = 0$.

For making asymptotic power calculations the limiting distribution of the likelihood ratio is useful in two ways:

1. When measuring the performance of tests based on the likelihood ratio its limiting distribution is essential.
2. The limiting distribution, under the alternative, of other statistics can be found from the joint limiting distribution, under the null, of the statistic and the likelihood ratio.

These two uses are explored in Chapter 2 and Chapter 3. In Chapter 2 tests based on $Z$ are shown to have certain asymptotic optimality properties.
In Chapter 3 weak convergence of $Z$ is used to find the limiting distribution of Pearson's chi-square statistic for goodness of fit tests.

Let us examine the asymptotic distribution of the likelihood ratio in the exponential example introduced above. For each $n$ there is a random sample $X_1^n, X_2^n, \ldots$ from the $\exp(1 + \theta)$ distribution. The hypothesis $H_{1,n}$ says that $\theta = \theta_0/\sqrt{n}$ for this sample. The likelihood ratio for $H_0: \theta = 0$ vs. $H_{1,n}$ is, from (1.8),

\[ Z^n(x) = \Big(1 + \frac{\theta_0}{\sqrt{n}}\Big) e^{-(\theta_0/\sqrt{n}) x} \]

and the likelihood ratio statistic based on $X_1^n, \ldots, X_n^n$ is

\[ Z_n^n = \prod_{i=1}^{n} Z^n(X_i^n) = \Big(1 + \frac{\theta_0}{\sqrt{n}}\Big)^n \exp\Big(-\frac{\theta_0}{\sqrt{n}} \sum_{i=1}^{n} X_i^n\Big). \]

By a Taylor expansion of $\log(1 + x)$, $\log Z_n^n$ can be written

\[ \log Z_n^n = n \log\Big(1 + \frac{\theta_0}{\sqrt{n}}\Big) - \frac{\theta_0}{\sqrt{n}} \sum_{i=1}^{n} X_i^n = n\Big(\frac{\theta_0}{\sqrt{n}} - \frac{\theta_0^2}{2n} + O(n^{-3/2})\Big) - \frac{\theta_0}{\sqrt{n}} \sum_{i=1}^{n} X_i^n = -\frac{\theta_0}{\sqrt{n}} \sum_{i=1}^{n} (X_i^n - 1) - \frac{\theta_0^2}{2} + O\Big(\frac{1}{\sqrt{n}}\Big) \]

(i.e., the remainder term $O(1/\sqrt{n})$ is deterministic and converges to 0 at rate $1/\sqrt{n}$). Now under $H_0$, $\{X_i^n - 1\}_{i=1}^{n}$ is a sequence of i.i.d. mean 0, variance 1 random variables and hence $(1/\sqrt{n}) \sum_{i=1}^{n} (X_i^n - 1) \xrightarrow{d} N(0, 1)$ by the Central Limit Theorem. Therefore

\[ \log Z_n^n \xrightarrow{d} N\Big(-\frac{\theta_0^2}{2}, \theta_0^2\Big) \quad \text{under } H_0. \tag{1.10} \]

Similarly under $H_{1,n}$

\[ \Big\{\Big(1 + \frac{\theta_0}{\sqrt{n}}\Big)\Big(X_i^n - \frac{1}{1 + \theta_0/\sqrt{n}}\Big)\Big\}_{i=1}^{n} \]

is a sequence of i.i.d. mean 0, variance 1 random variables and thus, using the same Taylor expansion,

\[ \log Z_n^n = -\frac{\theta_0 (1 + \theta_0/\sqrt{n})}{(1 + \theta_0/\sqrt{n}) \sqrt{n}} \sum_{i=1}^{n} \Big(X_i^n - \frac{1}{1 + \theta_0/\sqrt{n}}\Big) - \frac{\theta_0 \sqrt{n}}{1 + \theta_0/\sqrt{n}} + \theta_0 \sqrt{n} - \frac{\theta_0^2}{2} + O\Big(\frac{1}{\sqrt{n}}\Big) \]
\[ = -\frac{\theta_0}{(1 + \theta_0/\sqrt{n}) \sqrt{n}} \sum_{i=1}^{n} \Big(1 + \frac{\theta_0}{\sqrt{n}}\Big)\Big(X_i^n - \frac{1}{1 + \theta_0/\sqrt{n}}\Big) + \frac{\theta_0^2}{1 + \theta_0/\sqrt{n}} - \frac{\theta_0^2}{2} + O\Big(\frac{1}{\sqrt{n}}\Big) \]
\[ \xrightarrow{d} -\theta_0 N(0, 1) + \theta_0^2 - \frac{\theta_0^2}{2} \]

where $N(0, 1)$ stands for a random variable having that distribution. Therefore

\[ \log Z_n^n \xrightarrow{d} N\Big(\frac{\theta_0^2}{2}, \theta_0^2\Big) \quad \text{under } H_{1,n}. \]

There is a connection between the limiting distributions of $\log Z_n^n$ under the null and alternative hypotheses and another hypothesis testing problem which is thought of as a "limiting problem." Given $X \sim N(\theta, 1)$ the likelihood ratio statistic for testing $H_0: \theta = 0$ vs. $H_1: \theta = \theta_0 \ne 0$ is

\[ Z(X) = \exp\Big(\theta_0 X - \frac{\theta_0^2}{2}\Big). \]

Thus

\[ \log Z(X) \sim N\Big(-\frac{\theta_0^2}{2}, \theta_0^2\Big) \quad \text{under } H_0, \tag{1.11} \]
\[ \log Z(X) \sim N\Big(\frac{\theta_0^2}{2}, \theta_0^2\Big) \quad \text{under } H_1 \tag{1.12} \]

and hence

\[ \log Z_n^n \xrightarrow{d} \log Z(X) \quad \text{under } H_0 \text{ and under } H_{1,n}. \tag{1.13} \]

We will use this fact for evaluating the asymptotic properties of tests based on $Z_n^n$ as the sample size $n$ gets large. The broadest interpretation of (1.13) is that the parameter $\theta$ in the $\exp(1 + \theta/\sqrt{n})$ family of distributions plays the same role asymptotically as $\theta$ in the $N(\theta, 1)$ family. We pursue this idea in Section 1.5.

For evaluating the performance of tests of the form $\{Z_n^n \ge K_n\}$ or equivalently $\{\log Z_n^n \ge C_n\}$ let $P_1^n$ be the $\exp(1 + \theta_0/\sqrt{n})$ distribution and $P_0^n$ be the $\exp(1)$ distribution. In the notation introduced in Section 1.1, $Z^n = dP_1^n/dP_0^n$. Now if the test mentioned previously is to have asymptotic level $\alpha$, i.e.,

\[ P_0^n(\log Z_n^n \ge C_n) \to \alpha \quad \text{as } n \to \infty \]

then (1.11) and (1.13) imply that

\[ \frac{C_n + \theta_0^2/2}{\theta_0} \to Z_\alpha, \quad \text{i.e.,} \quad C_n \to \theta_0 Z_\alpha - \frac{\theta_0^2}{2} \quad \text{as } n \to \infty \]

where $Z_\alpha$ is the $100(1 - \alpha)$ percentile of the standard normal distribution. Note that from (1.11) and (1.13) it follows that

\[ \frac{\log Z_n^n + \theta_0^2/2}{\theta_0} \xrightarrow{d} N(0, 1) \quad \text{under } H_0 \text{ (under } P_0^n). \]

Now by (1.12) and (1.13) the asymptotic power of this test is

\[ \lim_{n \to \infty} P_1^n(\log Z_n^n \ge C_n) = \lim_{n \to \infty} P_1^n\Big(\frac{\log Z_n^n - \theta_0^2/2}{\theta_0} \ge \frac{C_n - \theta_0^2/2}{\theta_0}\Big) = 1 - \Phi(Z_\alpha - \theta_0) \tag{1.14} \]

where $\Phi$ is the distribution function of the standard normal distribution.
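The limit in (1.14) can be compared with a finite-sample simulation. The sketch below is an illustration under stated assumptions rather than anything taken from the thesis: it reads $\exp(1 + \theta)$ as the exponential distribution with rate $1 + \theta$ (consistent with the density used in (1.8)), takes $\theta_0 > 0$, and uses the limiting cutoff $\theta_0 Z_\alpha - \theta_0^2/2$ in place of $C_n$. The function names and the simulation sizes are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def asymptotic_power(alpha, theta0):
    """Limiting power 1 - Phi(Z_alpha - theta0) from (1.14)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z_alpha - theta0)


def log_lr_exponential(sample, theta0):
    """log Z_n^n for H0: exp(1) vs H1n: exp(1 + theta0/sqrt(n))."""
    n = len(sample)
    delta = theta0 / math.sqrt(n)
    return n * math.log1p(delta) - delta * sum(sample)


def simulated_power(alpha, theta0, n, runs, seed=0):
    """Monte Carlo power of {log Z_n^n >= theta0*Z_alpha - theta0^2/2} under H1n."""
    rng = random.Random(seed)
    cutoff = theta0 * STD_NORMAL.inv_cdf(1 - alpha) - theta0 ** 2 / 2
    rate = 1 + theta0 / math.sqrt(n)  # rate of the exponential under H1n
    rejections = 0
    for _ in range(runs):
        sample = [rng.expovariate(rate) for _ in range(n)]
        if log_lr_exponential(sample, theta0) >= cutoff:
            rejections += 1
    return rejections / runs


if __name__ == "__main__":
    print("limit:", round(asymptotic_power(0.05, 1.0), 4))
    print("n=400:", simulated_power(0.05, 1.0, n=400, runs=2000))
```

For moderate $n$ the simulated power should be close to, but not exactly equal to, the limiting value, since the normal approximation to $\log Z_n^n$ is only asymptotic.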
How does this compare to the asymptotic power of other tests of $H_0$ vs. $H_{1,n}$ which have asymptotic level $\alpha$? This can be answered by considering the power of level $\alpha$ tests of $H_0: \theta = 0$ vs. $H_1: \theta = \theta_0$ based on $X$ having distribution $N(\theta, 1)$. Now $1 - \Phi(Z_\alpha - \theta_0)$ is the power of the test

\[ \Big\{\frac{\log Z(X) - \theta_0^2/2}{\theta_0} \ge Z_\alpha - \theta_0\Big\} \quad \text{or} \quad \Big\{\log Z(X) \ge \theta_0 Z_\alpha - \frac{\theta_0^2}{2}\Big\} \]

and by the Neyman-Pearson lemma this test has the greatest possible power. From this it can be shown (see [10]) that the test $\{\log Z_n^n \ge \theta_0 Z_\alpha - \theta_0^2/2\}$ has the greatest asymptotic power among all tests having asymptotic level $\alpha$. This is demonstrated in general in [10]; the particular form of the likelihood ratios is not important.

We will produce a similar result in the sequential testing situation in Chapter 2. For this purpose it is necessary to study the functional convergence of the likelihood ratio viewed as a stochastic process. We take up this topic next.

1.4 Functional Convergence of Likelihood Ratios

In Section 1.2 sequential procedures for testing simple hypotheses were examined and the SPRT was seen to be optimal in certain ways. The question of asymptotic power against alternatives tending to the null leads to the study of the limiting distribution of the likelihood ratio considered as a process with time measured by observations of the data points. The data $X_1, X_2, \ldots$ are i.i.d. observations from distribution either $P_0$ or $P_1$. The likelihood ratio process $\{Z_k : k = 1, 2, \ldots\}$ is defined in (1.3).

In the non-sequential case of the previous section a sequence of alternative hypotheses was indexed by the sample size and the alternative grew closer to the null as the sample size increased.
With a larger number of observations smaller departures from the null hypothesis can be detected with equal power. In the sequential situation, to detect nearby alternatives many observations are required on average and so it is reasonable to approximate the likelihood ratio process by a continuous time process. Based on an i.i.d. sequence $X_1^n, X_2^n, \ldots$ with distribution either $P_0^n$ or $P_1^n$ one continuous time version of the likelihood ratio process is

\[ Z^n(t) = \prod_{i=1}^{[nt]} Z^n(X_i^n) \tag{1.15} \]

where $Z^n = dP_1^n/dP_0^n$ as defined in (1.1) and $[nt]$ denotes the integer part of $nt$. Symbolically, the observations $X_i^n$ are associated with the time points $i/n$.

Example. If $X_1^n, X_2^n, \ldots$ are independent $N(\theta, 1)$ random variables and $H_0$ specifies $\theta = 0$ while $H_{1,n}$ specifies $\theta = \theta_0/\sqrt{n}$ then, by (1.7) with $\theta_0/\sqrt{n}$ in place of $\theta_0$ and $k = [nt]$,

\[ \log Z^n(t) = \frac{\theta_0}{\sqrt{n}} \sum_{i=1}^{[nt]} X_i^n - \frac{\theta_0^2 \, [nt]}{2n}. \tag{1.16} \]

It is known that processes of this form converge weakly to a Brownian motion (e.g., see Corollary 6 of [16]); in this case we have

\[ \log Z^n(t) \xrightarrow{w} \theta_0 B(t) - \frac{\theta_0^2}{2} t \quad \text{under } H_0 \tag{1.17} \]

and

\[ \log Z^n(t) \xrightarrow{w} \theta_0 B(t) + \frac{\theta_0^2}{2} t \quad \text{under } H_{1,n} \tag{1.18} \]

where $\{B(t) : t \ge 0\}$ is a standard Brownian motion. The convergence $\xrightarrow{w}$ takes place in the space $D([0, \infty))$ of right continuous functions with left limits with the Skorohod metric (see [1]). However, because $B(t)$ has continuous sample paths we can use the alternative formulation

\[ f(\log Z^n(\cdot)) \xrightarrow{d} f\Big(\theta_0 B(\cdot) - \frac{\theta_0^2}{2}(\cdot)\Big) \]

for functionals $f$ continuous with respect to the sup-norm (uniform) metric ([1]).

As in the non-functional case the limiting distributions of the log-likelihood under the null and alternative hypotheses are the distributions of the log-likelihood process for a "limiting hypothesis testing problem."
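The process in the normal example is easy to simulate, which gives a rough check of the drift and variance appearing in (1.17) and (1.18). The sketch below is illustrative only and not part of the thesis: it evaluates $\log Z^n(t)$ on the grid $t = 0, 1/n, 2/n, \ldots$ using the fact that each $N(\theta, 1)$ observation contributes $\delta x - \delta^2/2$ with $\delta = \theta_0/\sqrt{n}$. The function names and the simulation sizes are hypothetical.

```python
import math
import random


def log_lr_path(theta0, n, horizon, true_theta, rng):
    """Values of log Z^n(t) at t = 0, 1/n, 2/n, ..., horizon for the normal example.

    Observations are N(true_theta, 1); H0 is theta = 0 and H1n is
    theta = theta0 / sqrt(n), so each observation adds
    delta * x - delta**2 / 2 with delta = theta0 / sqrt(n).
    """
    delta = theta0 / math.sqrt(n)
    path = [0.0]
    for _ in range(int(n * horizon)):
        x = rng.gauss(true_theta, 1.0)
        path.append(path[-1] + delta * x - delta ** 2 / 2)
    return path


def endpoint_mean_and_variance(theta0, n, true_theta, runs, seed=0):
    """Sample mean and variance of log Z^n(1) over independent paths."""
    rng = random.Random(seed)
    ends = [log_lr_path(theta0, n, 1.0, true_theta, rng)[-1] for _ in range(runs)]
    mean = sum(ends) / runs
    variance = sum((e - mean) ** 2 for e in ends) / (runs - 1)
    return mean, variance


if __name__ == "__main__":
    theta0, n = 1.0, 100
    print("under H0: ", endpoint_mean_and_variance(theta0, n, 0.0, runs=2000))
    print("under H1n:", endpoint_mean_and_variance(theta0, n, theta0 / math.sqrt(n), runs=2000))
```

At $t = 1$ the limits in (1.17) and (1.18) have means $-\theta_0^2/2$ and $+\theta_0^2/2$ and common variance $\theta_0^2$, which is what the two printed pairs should approximate.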
This fact will be used for computation of asymptotic power and the derivation of asymptotically optimal sequential procedures in Chapter 2.

Conditions which guarantee the weak convergence in (1.17) and (1.18) for general likelihood ratios are explored in [12]. One result states that

\[ \log Z^n(t) \xrightarrow{w} B(t) - \tfrac{1}{2} \lambda t \quad \text{under } H_0 \tag{1.19} \]

and

\[ \log Z^n(t) \xrightarrow{w} B(t) + \tfrac{1}{2} \lambda t \quad \text{under } H_1 \tag{1.20} \]

if and only if

\[ n \int \Big(\sqrt{f_1^n(x)} - \sqrt{f_0^n(x)}\Big)^2 dx \to \frac{\lambda}{4} \tag{1.21} \]

and

\[ n \int_{A_n(\varepsilon)} \Big(\sqrt{f_1^n(x)} - \sqrt{f_0^n(x)}\Big)^2 dx \to 0 \tag{1.22} \]

as $n \to \infty$ where $A_n(\varepsilon) = \{x : |\sqrt{f_1^n(x)}/\sqrt{f_0^n(x)} - 1| > \varepsilon\}$. Here $\{B(t) : t \ge 0\}$ is a Brownian motion with variance $\lambda$ per unit time (i.e., $\mathrm{Var}(B(t)) = \lambda t$).

More general processes can arise as the limit of a log-likelihood ratio process, including processes with independent normally distributed increments. If we have independent and identically distributed observations $X_1^n, X_2^n, \ldots$ such that (1.21) and (1.22) hold then the limiting process can only be a Brownian motion. The reason for this is clear; if $X_1^n, X_2^n, \ldots$ are i.i.d. then $\log Z^n(t)$ has stationary independent increments because it is formed from partial sums of i.i.d. random variables. If the limiting process, $\log Z(t)$ say, has stationary, independent, normally distributed increments it must be a Brownian motion. The limiting processes in (1.19) and (1.20) are Brownian motions both with variance $\lambda$ per unit time and with drifts $-\lambda/2$ and $\lambda/2$ per unit time respectively.

1.5 Contiguity and Convergence of Experiments

The concept of nearness of null and alternative hypotheses or of families of probability measures is made precise by the notions of contiguity and convergence of experiments.
A sequence $\{P_1^n\}$ of probability measures is said to be contiguous to another sequence $\{P_0^n\}$ (written $P_1^n \triangleleft P_0^n$) if for any sequence of events $A_n$,

$$\lim_{n\to\infty} P_0^n(A_n) = 0 \quad \text{implies} \quad \lim_{n\to\infty} P_1^n(A_n) = 0.$$

Discussion of contiguity and its uses can be found in [12] and [21]. Contiguity has a close relationship with weak convergence of the likelihood ratio $Z^n = dP_1^n/dP_0^n$. In the case that $Z^n$ has a limiting distribution under the null hypothesis, contiguity is equivalent to the existence of a limiting distribution for $Z^n$ under the alternative hypothesis ([12]). In order that asymptotic power be non-degenerate the sequence of alternatives must be contiguous to the sequence of null hypotheses. Typically in the absence of contiguity there will exist tests with arbitrarily small error probabilities for sufficiently large sample sizes. This was the case in Section 1.2 where the likelihood ratio had a degenerate limit because neither the null nor the alternative changed with $n$.

When a composite hypothesis is specified the testing problem cannot be described in terms of contiguity or the likelihood ratio of two sequences of probability measures. A means of comparing more than two sequences of probabilities at one time is needed. Convergence of experiments describes the nearness of families of probability distributions. An experiment refers to a family $E = \{P_\theta\}$ of probability distributions. A sequence of experiments $E^n = \{P_\theta^n\}$ is said to converge to $E$ (written $E^n \to E$) if for every finite set $\{\theta_1, \dots, \theta_m\}$ the vector $(dP_{\theta_1}^n/d\mu^n, \dots, dP_{\theta_m}^n/d\mu^n)$ converges in distribution, under $\mu^n$, to $(dP_{\theta_1}/d\mu, \dots, dP_{\theta_m}/d\mu)$ under $\mu$, where $\mu^n = \sum_{i=1}^m P_{\theta_i}^n$ and $\mu = \sum_{i=1}^m P_{\theta_i}$.
In the case of binary experiments (those that contain two distributions) convergence of experiments coincides with weak convergence of the likelihood ratio. Convergence of experiments is the essential hypothesis of the Hajek-LeCam minimax theorem ([17]). This is one example of its application to composite hypothesis testing.

An example of convergence of experiments is given by the family $E^n = \{\exp(1 + \theta/\sqrt{n}): \theta \in R\}$ which has limiting experiment $E = \{N(\theta,1): \theta \in R\}$. This fact is suggested (but not proven) by the one-dimensional weak convergence in (1.13).

An interesting way of thinking about convergence of experiments is as an extension of the likelihood principle. The likelihood principle (see [6]) says that all inference about the family $\{P_\theta\}$ should be based on the likelihood function

$$L(\theta) = \frac{dP_\theta}{d\mu}$$

when there is a measure $\mu$ such that $P_\theta \ll \mu$ for all $\theta$. An extension of this might say that when $\{P_\theta^n\} \to \{P_\theta\}$ (in the sense defined above) all inference about $\{P_\theta^n\}$ should be based on the likelihood function for $\{P_\theta\}$ when $n$ is large.

CHAPTER 2 ASYMPTOTIC OPTIMALITY OF SEQUENTIAL PROCEDURES

2.1 Wald's Criterion

Let $X_1^n, X_2^n, \dots$ be i.i.d. random variables with common distribution either $P_0^n$ or $P_1^n$ and let the likelihood ratio process $\{Z^n(t): t \ge 0\}$ be given by

$$Z^n(t) = \prod_{i=1}^{[nt]} \frac{dP_1^n}{dP_0^n}(X_i^n)$$

with $Z^n = dP_1^n/dP_0^n$ defined by (1.1). Assume that the process $Z^n$ has the asymptotic behaviour discussed in Section 1.4, namely

$$\log Z^n(t) \xrightarrow{w} B(t) - \frac{\lambda}{2}\,t \quad \text{under } P_0^n, \tag{2.1}$$

and

$$\log Z^n(t) \xrightarrow{w} B(t) + \frac{\lambda}{2}\,t \quad \text{under } P_1^n, \tag{2.2}$$

where $\{B(t): t \ge 0\}$ is a Brownian motion with variance $\lambda$ per unit time (i.e., $\mathrm{Var}\,B(t) = \lambda t$).
A test will be defined using the limiting process and this test will be shown to be asymptotically optimal when applied to testing $H_0: P_0^n$ vs. $H_{1,n}: P_1^n$. This is an extension to the sequential testing situation of the similar result discussed in Section 1.3.

As a first step we recognize the limiting processes in (2.1) and (2.2) as likelihood ratios. Let $P_0$ and $P_1$ be the distributions on $C([0,\infty))$ of the processes $\{B(t): t \ge 0\}$ and $\{B(t) + \lambda t: t \ge 0\}$ respectively and let

$$Z(t) = \frac{dP_{1,t}}{dP_{0,t}}$$

where $P_{0,t}$ and $P_{1,t}$ are the restrictions of $P_0$ and $P_1$ to $C([0,t])$. It is shown in [10] that

$$\log Z(t) = B(t) - \frac{\lambda}{2}\,t \quad \text{under } P_0 \tag{2.3}$$

$$\log Z(t) = B(t) + \frac{\lambda}{2}\,t \quad \text{under } P_1. \tag{2.4}$$

If $H_0$ represents $P_0^n$ and $P_0$, and $H_1$ represents $P_1^n$ and $P_1$, then we have the weak convergence of the processes

$$Z^n \xrightarrow{w} Z \quad \text{under } H_0 \text{ and under } H_1. \tag{2.5}$$

The Sequential Probability Ratio Test (SPRT) for testing $H_0$ vs. $H_1$ uses the stopping rule

$$T^* = \inf\{t: Z(t) \le B \text{ or } Z(t) \ge A\} \tag{2.6}$$

and decision rule

$$D^* = \{Z(T^*) \ge A\}. \tag{2.7}$$

It can be shown ([8]) that $T^*$ is finite under both hypotheses; thus when the event $D^*$ does not occur $Z(T^*) \le B$ and $H_0$ is accepted. The SPRT has the same optimality property in continuous time as it does in discrete time. A sequential procedure for testing $H_0$ vs. $H_1$ consists in general of a stopping rule $T$ which takes values in $[0,\infty]$ such that the event $\{T \le t\}$ is determined by $\{B(s): 0 \le s \le t\}$, and a decision rule $D$ which must be such that $D \cap \{T \le t\}$ is determined by $\{B(s): 0 \le s \le t\}$, for each $t \in [0,\infty]$.

Optimality Property of Continuous Time SPRT ([8]): Assume that for each $n$, $Z^n$ has a continuous distribution under $P_0^n$. If a sequential test $(T,D)$ of $H_0: P_0$ vs. $H_1: P_1$ has smaller error probabilities than $(T^*,D^*)$ defined by (2.6) and (2.7), i.e.
, $P_0(D) \le P_0(D^*)$ and $P_1(D^c) \le P_1(D^{*c})$, then $(T,D)$ must have higher average sample numbers (ASN), $E_0(T) \ge E_0(T^*)$ and $E_1(T) \ge E_1(T^*)$.

We will now prove a result (stated more precisely below) which says that this optimality property is preserved in the limit when the SPRT is applied to the discrete time setting. Consider the procedure $(T_n^*, D_n^*)$ given by

$$T_n^* = \inf\{t: Z^n(t) \ge A \text{ or } Z^n(t) \le B\}$$

and

$$D_n^* = \{Z^n(T_n^*) \ge A\}.$$

To study the asymptotic properties of $(T_n^*, D_n^*)$ the following results are used:

$$T_n^* \xrightarrow{d} T^* \quad \text{under } H_0 \text{ and under } H_1 \tag{2.8}$$

$$Z^n(T_n^*) \xrightarrow{d} Z(T^*) \quad \text{under } H_0 \text{ and under } H_1. \tag{2.9}$$

These follow from the fact that $T^*$ and $Z(T^*)$ are continuous functionals of $\{Z(t): t \ge 0\}$ relative to the sup-norm metric and the weak convergence (2.5) holds with respect to this metric (see Section 1.4). From (2.9) it follows immediately that the asymptotic error probabilities of $(T_n^*, D_n^*)$ are equal to the error probabilities of the SPRT $(T^*, D^*)$, i.e.

$$\lim_{n\to\infty} P_0^n(D_n^*) = P_0(D^*) \tag{2.10}$$

and

$$\lim_{n\to\infty} P_1^n(D_n^{*c}) = P_1(D^{*c}). \tag{2.11}$$

The same result for the average sample numbers requires the uniform integrability of $\{T_n^*\}$; this is demonstrated in Appendix A.1, thus

$$\lim_{n\to\infty} E_0^n(T_n^*) = E_0(T^*) \tag{2.12}$$

and

$$\lim_{n\to\infty} E_1^n(T_n^*) = E_1(T^*). \tag{2.13}$$

The asymptotic optimality result can now be stated.

Asymptotic Optimality Property of $(T_n^*, D_n^*)$: Assume that $P_0(D^*) + P_1(D^{*c}) < 1$. If $(T_n, D_n)$ is any sequential test of $H_0$ vs. $H_{1,n}$ satisfying

$$\lim_{n\to\infty} P_0^n(D_n) \le P_0(D^*) \tag{2.14}$$

and

$$\lim_{n\to\infty} P_1^n(D_n^c) \le P_1(D^{*c}) \tag{2.15}$$

then

$$\lim_{n\to\infty} E_0^n(T_n) \ge E_0(T^*) \tag{2.16}$$

and

$$\lim_{n\to\infty} E_1^n(T_n) \ge E_1(T^*). \tag{2.17}$$

A proof of this result will now be given. First we find a SPRT which has the same error probabilities as $(T_n, D_n)$.
This is where the assumption of continuity of the distribution of $Z^n$ is needed; it implies the existence of the required SPRT. We state the needed result from [24]:

Lemma. Assume that $Z^n = dP_1^n/dP_0^n$ has a continuous distribution. If $\alpha_0$ and $\alpha_1$ are non-negative numbers with $\alpha_0 + \alpha_1 \le 1$ there exist $A_n$ and $B_n$ such that the SPRT with stopping boundaries $A_n$ and $B_n$ has error probabilities $\alpha_0$ and $\alpha_1$.

In order that the lemma applies, the error probabilities of $(T_n, D_n)$ must satisfy the constraint $\alpha_0 + \alpha_1 \le 1$. Since we have assumed that $P_0(D^*) + P_1(D^{*c}) < 1$, (2.14) and (2.15) imply that for large $n$ we will have $P_0^n(D_n) + P_1^n(D_n^c) < 1$ as required. Since only the tail of the sequence affects (2.16) and (2.17) we can assume without loss of generality that this inequality holds for all $n$. Now let $(T_n', D_n')$ be the SPRT determined by the Lemma, that is

$$T_n' = \inf\{t: Z^n(t) \ge A_n \text{ or } Z^n(t) \le B_n\}, \qquad D_n' = \{Z^n(T_n') \ge A_n\}.$$

By the optimality property of $(T_n', D_n')$ it must have lower ASN than $(T_n, D_n)$; thus it will suffice to show (2.16) and (2.17) with $(T_n', D_n')$ in place of $(T_n, D_n)$.

Because of the inequalities (2.14) and (2.15) for $(T_n', D_n')$, the sequences $\{A_n\}$ and $\{B_n\}$ must be bounded. If, say, $\{A_n\}$ was not bounded above then $\lim_{n\to\infty} P_1^n(D_n') = 0$ and this contradicts (2.15). By considering a subsequence if necessary assume $\{A_n\}$ and $\{B_n\}$ converge, say $\lim A_n = A'$, $\lim B_n = B'$. Now if one of (2.16), (2.17) did not hold, the SPRT $(T', D')$ with stopping boundaries $A'$ and $B'$ would be better than the optimal procedure $(T^*, D^*)$, i.e., $P_0(D') \le P_0(D^*)$, $P_1(D'^c) \le P_1(D^{*c})$, $E_0(T') \le E_0(T^*)$ and $E_1(T') \le E_1(T^*)$ with strict inequality in one of the last two inequalities. This contradicts the optimality property of $(T^*, D^*)$.
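The discrete time procedures of this section all have the same mechanical form: accumulate log-likelihood ratio increments and stop at the first crossing of one of two boundaries. The following is a minimal sketch of that stopping and decision rule, not code from the thesis; the function name `sprt`, its return convention and the example increments are assumptions made for illustration. In the time scale used above, stopping after $k$ observations corresponds to the stopping time $k/n$.

```python
import math

def sprt(log_lr_increments, A, B):
    """Sketch of a discrete time SPRT with boundaries 0 < B < 1 < A.

    log_lr_increments : iterable of log(dP1/dP0)(X_i), one value per observation
    Returns (k, reject_h0), where k is the number of observations used and
    reject_h0 is True when the likelihood ratio ended at or above A.
    Returns (None, None) if the data run out before either boundary is reached.
    """
    log_a, log_b = math.log(A), math.log(B)
    log_z = 0.0                         # log Z starts at log 1 = 0
    for k, inc in enumerate(log_lr_increments, start=1):
        log_z += inc
        if log_z >= log_a:              # Z >= A: stop and reject H0
            return k, True
        if log_z <= log_b:              # Z <= B: stop and accept H0
            return k, False
    return None, None

# Hypothetical increments, chosen only to show both kinds of exit.
print(sprt([0.5, 0.5, 0.5, 0.5], A=4.0, B=0.25))
print(sprt([-1.0, -1.0], A=4.0, B=0.25))
```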
2.2 Bayes Risk Criterion

In this section a different criterion for comparing sequential testing procedures is used, the Bayes risk. We begin with the set up described in the first paragraph of Section 2.1. For a sequential test of $H_0: P_0^n$ vs. $H_1: P_1^n$, say $(T,D)$, we define the Bayes Risk, just as in (1.6), by

$$\rho_n(T,D;\pi) = \pi\left(P_0^n(D) + cE_0^n(T)\right) + (1-\pi)\left(P_1^n(D^c) + cE_1^n(T)\right), \tag{2.22}$$

where $\pi$ is the prior probability of the distribution being $P_0^n$ and $c$ is the cost per observation. For a sequential test $(T,D)$ of $H_0: P_0$ vs. $H_1: P_1$, where $P_0$ and $P_1$ are as in the previous section, the Bayes Risk is

$$\rho(T,D;\pi) = \pi\left(P_0(D) + cE_0(T)\right) + (1-\pi)\left(P_1(D^c) + cE_1(T)\right). \tag{2.23}$$

Here $c$ represents the cost per unit time of observing the likelihood ratio process $Z(t)$.

We will now solve the problem of minimizing $\rho$ over all continuous time sequential tests. Our derivation will mimic the strategy used in [20] for deriving the same result in discrete time; the appropriate theory for the continuous time case corresponding to Snell's envelope is given in [20] and also in [9]. The solution will be a particular SPRT.

Although the solution is derived here only for the special case that $P_0$ and $P_1$ are distributions of Brownian motions, the same argument will work for more general situations. In particular it will work under the general conditions of [8] which are used there for obtaining the previous optimality property of continuous time SPRTs given in Section 2.1.

To begin it will be necessary to consider the equivalent problem of minimizing

$$\rho^{(r)}(T,D;\pi) = \pi\left(P_0(D) + cE_0(T)\right) + r(1-\pi)\left(P_1(D^c) + cE_1(T)\right),$$

allowing the new parameter $r$ to vary. The first step consists of fixing a stopping rule and finding the best decision rule to go with it.

Lemma.
If $T$ is fixed,

$$\min_D \left[\pi P_0(D) + r(1-\pi) P_1(D^c)\right] = E_0\left(\pi \wedge r(1-\pi) Z(T)\right)$$

and the minimum is achieved at $D^* = \{Z(T) \ge \pi/(r(1-\pi))\}$.

Proof: First we note that $D \cap \{T \le t\}$ is determined by $\{B(s): 0 \le s \le t\}$, for all $t$, and $Z(T)$ equals $dP_1/dP_0$ on the $\sigma$-field of such events. By the Lebesgue Decomposition,

$$\pi P_0(D) + r(1-\pi) P_1(D^c) = \int_D \pi\, dP_0 + \int_{D^c} r(1-\pi) Z(T)\, dP_0 \ge \int \left[\pi \wedge r(1-\pi) Z(T)\right] dP_0.$$

It is straightforward to check that there is equality here for $D = D^*$.

The problem is now reduced to minimizing

$$\pi c E_0(T) + r(1-\pi) c E_1(T) + E_0\left(\pi \wedge r(1-\pi) Z(T)\right) = E_0\left(\pi c T + r(1-\pi) c T Z(T) + \pi \wedge r(1-\pi) Z(T)\right) = E_0\left(Y^{(r)}(T)\right)$$

where the process $\{Y^{(r)}(t): t \ge 0\}$ is defined by

$$Y^{(r)}(t) = \pi c t + r(1-\pi) c t Z(t) + \pi \wedge r(1-\pi) Z(t).$$

According to Theorem 7.3 in [20] or Theorem 4 in [9] this can be done by finding the largest positive sub-martingale, say $\{V^{(r)}(t): t \ge 0\}$, dominated by $\{Y^{(r)}(t): t \ge 0\}$ and then forming the stopping rule

$$T^* = \inf\{t: Y^{(r)}(t) = V^{(r)}(t)\}. \tag{2.24}$$

The process $V^{(r)}$ is given by

$$V^{(r)}(t) = \operatorname{essinf}\, E_0\left(Y^{(r)}(T) \mid B(s): 0 \le s \le t\right) \tag{2.25}$$

where the essinf is taken over all stopping rules $T$ which satisfy $T \ge t$. Since $Z(0) = 1$, the initial value $V^{(r)}(0)$ is deterministic and

$$V^{(r)}(0) = \inf E_0\left(Y^{(r)}(T)\right) = h(r). \tag{2.26}$$

Note that $h$ is an increasing concave function because it is the infimum of such functions. This fact will be important for determining the nature of the solution. For any stopping rule $T$ satisfying $T \ge t$,

$$E_0\left(Y^{(r)}(T) \mid B(s): 0 \le s \le t\right) = E_0\left(\pi c (T-t) + r(1-\pi) Z(t)\, c (T-t) \frac{Z(T)}{Z(t)} + \pi \wedge r(1-\pi) Z(t) \frac{Z(T)}{Z(t)} \,\Big|\, B(s): 0 \le s \le t\right) + \pi c t + r(1-\pi) c t Z(t) \tag{2.27}$$

where we have used the fact that $E_0(Z(T) \mid B(s): 0 \le s \le t) = Z(t)$ (i.e., $Z(t) = e^{B(t) - \frac{\lambda}{2} t}$ is a martingale).
Using the representation

$$\frac{Z(u)}{Z(t)} = e^{B(u) - B(t) - \frac{\lambda}{2}(u - t)} \qquad (u \ge t)$$

and since $\{B(t)\}$ has stationary independent increments, the process $\{Z(u)/Z(t): u \ge t\}$ is independent of $\{B(s): 0 \le s \le t\}$ and has the same distribution as the process $\{Z(u): u \ge 0\}$. Therefore the conditional expectation in (2.27) is minimized exactly as for the case $t = 0$ in (2.26) but with $r$ replaced by $rZ(t)$, i.e.,

$$V^{(r)}(t) = \operatorname{essinf}\, E_0\left(Y^{(r)}(T) \mid B(s): s \le t\right) = h(rZ(t)) + \pi c t + r(1-\pi) c t Z(t).$$

Now the stopping rule $T^*$ is given by

$$T^* = \inf\{t: Y^{(r)}(t) = V^{(r)}(t)\} = \inf\{t: h(rZ(t)) = \pi \wedge r(1-\pi) Z(t)\}.$$

In order for the Bayes Risk given in (2.23) to be minimized by $T^*$, $r$ is now set to 1. Thus

$$T^* = \inf\{t: h(Z(t)) = \pi \wedge (1-\pi) Z(t)\}.$$

Since $h$ is increasing and concave, $T^*$ has the form

$$T^* = \inf\{t: Z(t) \ge A \text{ or } Z(t) \le B\}$$

for constants $A$ and $B$ illustrated below.

Fig. 1. Graph of $h(x)$ and of $\pi \wedge (1-\pi)x$ against $x$, which determines the stopping boundaries $A$, $B$ of the optimal SPRT; the points $B$, $\pi/(1-\pi)$ and $A$ are marked on the $x$-axis.

If $T^*$ is the stopping rule employed, the decision rules $\{Z(T^*) \ge A\}$ and $\{Z(T^*) \ge \pi/(1-\pi)\}$ (recall the lemma above) are equivalent due to the inequality $B \le \pi/(1-\pi) \le A$. Also, the cases $B \ge 1$ and $A \le 1$ correspond to $T^* = 0$, in which the initial decision based only on the prior probability is optimal, having Bayes Risk $\pi \wedge (1-\pi)$.

As in the previous section the optimal procedure for the continuous time problem will be applied to the discrete time setting; a procedure which minimizes the asymptotic Bayes Risk results. Define the stopping rule

$$T_n^* = \inf\{t: Z^n(t) \ge A \text{ or } Z^n(t) \le B\}$$

where $A$ and $B$ are the stopping boundaries of the SPRT which minimizes the Bayes Risk (2.23), and the decision rule

$$D_n^* = \{Z^n(T_n^*) \ge A\}.$$

Thus $(T_n^*, D_n^*)$ has the asymptotic optimality property given by (2.14) - (2.17).
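The lemma on the best decision rule for a fixed stopping rule can be checked by brute force in a simplified setting. The sketch below is an illustration under stated assumptions, not part of the thesis: it replaces $Z(T)$ by the likelihood ratio $p_1(x)/p_0(x)$ on a small finite sample space (a fixed-sample stand-in), and all function names and probabilities are hypothetical. It compares the minimum of $\pi P_0(D) + r(1-\pi)P_1(D^c)$ over all subsets $D$ with the closed form $E_0(\pi \wedge r(1-\pi)Z)$.

```python
from itertools import combinations

def risk(D, p0, p1, pi, r=1.0):
    """pi * P0(D) + r * (1 - pi) * P1(D^c) for a rejection region D (a set of indices)."""
    type1 = sum(p0[x] for x in D)
    type2 = sum(p1[x] for x in range(len(p0)) if x not in D)
    return pi * type1 + r * (1 - pi) * type2

def brute_force_min(p0, p1, pi, r=1.0):
    """Minimum of the risk over every subset D of the sample space."""
    points = range(len(p0))
    return min(risk(set(D), p0, p1, pi, r)
               for m in range(len(p0) + 1)
               for D in combinations(points, m))

def lemma_value(p0, p1, pi, r=1.0):
    """E0(pi ^ r(1-pi)Z) with Z = p1/p0, written as a sum over the sample space."""
    return sum(min(pi * a, r * (1 - pi) * b) for a, b in zip(p0, p1))

def lemma_rule(p0, p1, pi, r=1.0):
    """D* = {Z >= pi / (r(1-pi))}, expressed without dividing by p0."""
    return {x for x in range(len(p0)) if r * (1 - pi) * p1[x] >= pi * p0[x]}

# Hypothetical three-point distributions under H0 and H1.
p0 = [0.5, 0.3, 0.2]
p1 = [0.2, 0.3, 0.5]
print(brute_force_min(p0, p1, pi=0.5), lemma_value(p0, p1, pi=0.5))
print(sorted(lemma_rule(p0, p1, pi=0.5)))
```

The brute-force search is exponential in the size of the sample space, so it is only meant as a check on tiny examples.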
Here it will be shown to have the following property.

Asymptotic Bayes Optimality Property of $(T_n^*, D_n^*)$: The asymptotic Bayes Risk of $(T_n^*, D_n^*)$ is

$$\lim_{n\to\infty} \rho_n(T_n^*, D_n^*; \pi) = \rho(T^*, D^*; \pi). \tag{2.28}$$

If $(T_n, D_n)$ is any sequential test of $H_0$ vs. $H_1$ then

$$\lim_{n\to\infty} \inf_{(T,D)} \rho_n(T, D; \pi) \ge \rho(T^*, D^*; \pi). \tag{2.29}$$

This will say that $(T_n^*, D_n^*)$ has the smallest possible asymptotic Bayes Risk and the value is the Bayes Risk of the procedure $(T^*, D^*)$.

The proof of (2.28) is achieved by the application of (2.10), (2.11), (2.12) and (2.13) which state that all of the components of the Bayes Risk $\rho_n(T_n^*, D_n^*; \pi)$ converge to the corresponding components of $\rho(T^*, D^*; \pi)$.

The first step in proving (2.29) is to compute the minimum value of $\rho_n$. From the discussion preceding the derivation of $T^*$ it is known that $\rho_n$ is minimized by a SPRT with some stopping boundaries, say $A_n$ and $B_n$, that is

$$\inf_{T,D} \rho_n(T, D; \pi) = \rho_n(T_n, D_n; \pi)$$

where

$$T_n = \inf\{t: Z^n(t) \ge A_n \text{ or } Z^n(t) \le B_n\}$$

and

$$D_n = \{Z^n(T_n) \ge A_n\}.$$

Assume now that along a subsequence of the integers $\{n'\}$ the limit

$$\lim_{n'} \rho_{n'}(T_{n'}, D_{n'}; \pi)$$

exists and is less than $\rho(T^*, D^*; \pi)$. Within this subsequence there is a further subsequence (also called $\{n'\}$) such that the limits $\lim A_{n'} = A'$ and $\lim B_{n'} = B'$ exist, possibly infinite. Finally we can repeat the argument at the end of the previous section, to show that the continuous time SPRT with stopping boundaries $A'$ and $B'$ has lower Bayes Risk than the SPRT which uses $A$ and $B$. This of course contradicts the fact that $A$ and $B$ were derived to minimize the Bayes Risk (2.23).

CHAPTER 3 POWER OF CHI-SQUARE TESTS

The focus of this section is the asymptotic power of Pearson's chi-square statistic for testing goodness of fit against a certain class of alternatives.
These alternatives are contiguous to the null hypothesis in the sense defined in Section 1.5. We will reproduce a derivation of the limiting distribution of Pearson's chi-square statistic under the null hypothesis ([7], [19]). In [5] the limiting distribution under a class of alternatives is computed, whereby the asymptotic power can be computed. We will give a different development of this result which uses the weak convergence of the likelihood ratio. This highlights the usefulness of the likelihood ratio as a tool for studying hypothesis testing problems. In [18] the limiting alternative distribution is found for situations where a parameter must be estimated.

Also we will compare the asymptotic power of the chi-square test and of a test based on the likelihood ratio. We know from Section 1.3 that the test based on the likelihood ratio must win; the extent of the difference is of interest.

Let $N = (N_1, \dots, N_k)$ be a multinomial random vector which records the numbers of data points which fall into each of $k$ classifications. Let the total number of data points be $n$ and the probability of any one falling into the $i$-th category be $P_i$. The probability function of $N$ is

$$P(N_1 = n_1, \dots, N_k = n_k) = \frac{n!}{n_1! \cdots n_k!} \prod_{i=1}^k P_i^{n_i} \qquad \left(n_i \in Z^+,\ \sum_{i=1}^k n_i = n\right). \tag{3.1}$$

A common question asks whether the probability vector $P = (P_1, \dots, P_k)$ belongs to a parametric family $\{P(\theta): \theta \in \Theta\}$. This question reflects on the distribution of the underlying data which is usually the source of interest. For example when testing whether a sample $X_1, \dots, X_n$ came from the Normal distribution, categories $E_i = (a_{i-1}, a_i]$
$(i = 1, \dots, k)$ could be formed and the numbers $N_i = \#\{X_j \in E_i\}$ of observations falling into these intervals recorded. Under the normal distribution the probability vector $P$ would be given by

$$P_i = \int_{a_{i-1}}^{a_i} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/2\sigma^2}\, dx = \Phi\left(\frac{a_i - \mu}{\sigma}\right) - \Phi\left(\frac{a_{i-1} - \mu}{\sigma}\right).$$

The hypothesis of normality for $X_1, \dots, X_n$ is also specified by the particular parametric form for $P$.

In general a test of the composite hypothesis

$$H_0: P = P(\theta) \text{ for some } \theta \in \Theta$$

requires estimation of $\theta$. This situation is treated in [18]. We will consider only the simple hypothesis

$$H_0: P = P(\theta_0)$$

for some specified $\theta_0 \in \Theta$. This is also written as

$$H_0: P_i = P_i^0 \qquad (i = 1, \dots, k) \tag{3.2}$$

where $P(\theta_0) = (P_1^0, \dots, P_k^0)$. Pearson's chi-square statistic for testing $H_0$ is

$$X^2(n) = \sum_{i=1}^k \frac{(N_i - nP_i^0)^2}{nP_i^0}. \tag{3.3}$$

It will be shown that $X^2(n)$ has a limiting ($n \to \infty$) chi-square distribution with $k-1$ degrees of freedom. This fact is used for computing critical values of the test. The proof is based on a simple multivariate Central Limit Theorem ([7]) applied to the sequence of random vectors $V_n$ given by

$$V_{n,i} = \frac{N_i - nP_i^0}{\sqrt{nP_i^0}}. \tag{3.4}$$

A simple computation produces the covariance matrix of $V_n$:

$$\mathrm{Cov}(V_n) = I_k - q q'$$

where $q = (\sqrt{P_1^0}, \dots, \sqrt{P_k^0})'$. Because the $N_i$ are sums of independent identically distributed (Bernoulli) random variables the multivariate CLT can be applied to yield

$$V_n \xrightarrow{d} N_k(0, \Lambda) \quad \text{under } H_0 \tag{3.5}$$

as $n \to \infty$ where $\Lambda = I_k - q q'$. In view of the relation

$$X^2(n) = V_n' V_n$$

the following result is needed ([7], [19]).

Proposition 1. If $Y \sim N_k(0, \Lambda)$ and $\Lambda$ is idempotent with rank $r$ then $Y'Y \sim \chi^2_r$.

An application of this to $V_n$ (recall (3.5)) gives

$$X^2(n) \xrightarrow{d} \chi^2_{k-1} \quad \text{under } H_0 \tag{3.6}$$

since the covariance matrix $\Lambda = I_k - q q'$ of $V_n$ is idempotent with rank $k-1$. Here we have used the continuity of the mapping $x \to x'x$.

The statistic $X^2(n)$ is not designed with any specific alternatives to $H_0$ in mind. The asymptotic power of $X^2(n)$ against the sequence of alternatives

$$H_{1,n}: P_i = P_i^0 + \frac{c_i}{\sqrt{n}} \qquad (i = 1, \dots, k) \tag{3.7}$$

where $\sum_{i=1}^k c_i = 0$, can be calculated. The limiting distribution of $X^2(n)$ under the sequence of alternatives $H_{1,n}$ is non-central chi-square as stated in the following result. The notation $\chi'^2_r(\Delta)$ represents the distribution of $(Z_1 + \sqrt{\Delta})^2 + Z_2^2 + \cdots + Z_r^2$ where $Z_1, \dots, Z_r$ are i.i.d. standard normal random variables.

Theorem.

$$X^2(n) \xrightarrow{d} \chi'^2_{k-1}\left(\sum_{i=1}^k \frac{c_i^2}{P_i^0}\right) \quad \text{under } H_{1,n} \text{ as } n \to \infty. \tag{3.8}$$

One possible proof (as in [18]) uses a multivariate CLT for triangular arrays which establishes

$$V_n \xrightarrow{d} N_k(\delta, \Lambda) \quad \text{under } H_{1,n} \text{ as } n \to \infty \tag{3.9}$$

with $\Lambda$ as in (3.5) and

$$\delta = \left(\frac{c_1}{\sqrt{P_1^0}}, \dots, \frac{c_k}{\sqrt{P_k^0}}\right)'.$$

This must be combined with the following fact about the multivariate normal distribution which generalizes Proposition 1.

Proposition 2. ([17]) If $Y \sim N_k(\delta, \Lambda)$ and $\Lambda$ is idempotent with rank $r$ and $\delta$ is in the range (column space) of $\Lambda$ then $Y'Y \sim \chi'^2_r(\delta'\delta)$.

Using this result it is immediate that the Theorem follows from (3.9).

A different proof of (3.9) will now be given; it will be based on the likelihood ratio for the simple hypothesis testing problem $H_0$ vs. $H_{1,n}$. This likelihood ratio is simply a ratio of multinomial probabilities defined by (3.1), namely

$$Z_n = \frac{\prod_{i=1}^k \left(P_i^0 + c_i/\sqrt{n}\right)^{N_i}}{\prod_{i=1}^k \left(P_i^0\right)^{N_i}}. \tag{3.10}$$

In order that $Z_n$ can be used to find the limiting distribution of $V_n$ there must be established a relationship between the two. This is done by taking logarithms and using a Taylor expansion as follows:

$$\log Z_n = \sum_{i=1}^k N_i \log\left(1 + \frac{c_i}{\sqrt{n}\,P_i^0}\right) = \frac{1}{\sqrt{n}} \sum_{i=1}^k \frac{c_i}{P_i^0}\, N_i - \frac{1}{2n} \sum_{i=1}^k \frac{c_i^2}{(P_i^0)^2}\, N_i + O_p\left(\frac{1}{\sqrt{n}}\right)$$
$$= \frac{1}{\sqrt{n}} \sum_{i=1}^k \frac{c_i}{P_i^0}\left(\sqrt{nP_i^0}\, V_{n,i} + nP_i^0\right) - \frac{1}{2} \sum_{i=1}^k \frac{c_i^2}{P_i^0} \cdot \frac{N_i}{nP_i^0} + O_p\left(\frac{1}{\sqrt{n}}\right) = \sum_{i=1}^k \frac{c_i}{\sqrt{P_i^0}}\, V_{n,i} - \frac{1}{2} \sum_{i=1}^k \frac{c_i^2}{P_i^0} + o_p(1) \tag{3.11}$$

where $O_p(1/\sqrt{n})$ is a term which converges to 0 in probability at rate $1/\sqrt{n}$, and $o_p(1)$ converges to 0 in probability. The latter term arises because $N_i/n - P_i^0 = o_p(1)$; the term $\sqrt{n} \sum c_i$ vanishes because $\sum c_i = 0$.

Now we use the following strategy. There is a "limiting" simple hypothesis testing problem which approximates the multinomial testing problem $H_0$ vs. $H_{1,n}$ in such a way that there is a quantity which assumes the role of $V_n$. The distribution of this quantity, under null and under alternative, is the limiting distribution of $V_n$ under $H_0$ and under $H_{1,n}$, respectively.

The limiting problem is

$$H_0: N_k(0, \Lambda) \quad \text{vs.} \quad H_1: N_k(\delta, \Lambda).$$

The likelihood ratio based on a single observation $X$ is (see Appendix A.2)

$$Z(X) = \exp\left(\delta' X - \tfrac{1}{2}\, \delta'\delta\right).$$

Comparing the equation

$$\log Z(X) = \delta' X - \tfrac{1}{2}\, \delta'\delta \tag{3.12}$$

with (3.11) it is seen that $(Z_n, V_n)$ are related by the same equation as $(Z(X), X)$, except for the error term $o_p(1)$. Also

$$\log Z_n \xrightarrow{d} \log Z \quad \text{under } H_0 \tag{3.13}$$

as $n \to \infty$. The limiting distribution of $\log Z_n$ is calculated from (3.11) and (3.5) as

$$\log Z_n \xrightarrow{d} \delta'\, N_k(0, \Lambda) - \tfrac{1}{2}\, \delta'\delta \quad \text{under } H_0, \text{ i.e. } N\left(-\tfrac{1}{2}\, \delta'\delta,\ \delta'\delta\right) \tag{3.14}$$

(since the variance of $\delta' N_k(0, \Lambda)$ is $\delta' \Lambda \delta = \delta'(I_k - q q')\delta = \delta'\delta$, because $\delta' q = \sum c_i = 0$). The distribution of $\log Z$ under $H_0$ is easily seen from (3.12) to be the same.

Finally, we show that (3.9) follows from (3.5), (3.11) and (3.12). Two lemmas are required; their proofs are found in Appendix A.3.

Lemma 1. Let $Z$ be a likelihood ratio and $\{Z_n\}$ a sequence of likelihood ratios. If a sequence of statistics $X_n$ satisfies

$$(X_n, Z_n) \xrightarrow{d} (X, Z)$$

under the null hypothesis then

$$X_n \xrightarrow{d} X$$

under the alternative.
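The relation $X^2(n) = V_n' V_n$ and the expansion (3.11) can be checked numerically on a single set of counts. The sketch below is illustrative only; the function names and the particular counts, null probabilities $P_i^0$ and perturbations $c_i$ are assumptions chosen for the example (with $\sum c_i = 0$, as (3.7) requires), not values from the thesis.

```python
import math

def pearson_v(counts, p0):
    """Standardized counts V_{n,i} = (N_i - n P_i^0) / sqrt(n P_i^0), as in (3.4)."""
    n = sum(counts)
    return [(N - n * p) / math.sqrt(n * p) for N, p in zip(counts, p0)]

def log_lr_exact(counts, p0, c):
    """log Z_n from (3.10): sum of N_i * log(1 + c_i / (sqrt(n) P_i^0))."""
    n = sum(counts)
    return sum(N * math.log(1 + ci / (math.sqrt(n) * p))
               for N, p, ci in zip(counts, p0, c))

def log_lr_approx(counts, p0, c):
    """Leading terms of (3.11): delta'V_n - delta'delta / 2, dropping the o_p(1) term."""
    v = pearson_v(counts, p0)
    delta = [ci / math.sqrt(p) for ci, p in zip(c, p0)]
    return sum(d * vi for d, vi in zip(delta, v)) - 0.5 * sum(d * d for d in delta)

# Hypothetical data: n = 10000 observations in k = 3 cells.
counts = [2050, 2950, 5000]
p0 = [0.2, 0.3, 0.5]
c = [0.1, -0.3, 0.2]          # perturbation directions, summing to zero

v = pearson_v(counts, p0)
print("X^2(n) =", sum(vi * vi for vi in v))
print("exact  log Z_n =", log_lr_exact(counts, p0, c))
print("approx log Z_n =", log_lr_approx(counts, p0, c))
```

For counts of this size the two values of $\log Z_n$ differ only in the third decimal place, which is the sense in which the error term in (3.11) is negligible.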
Note that the distribution of $X$ under the alternative is not the same as the distribution of $X$ under the null hypothesis.

Lemma 2. Let $X_n$ and $Y_n$ be sequences of random quantities which converge in distribution, say

$$X_n \xrightarrow{d} X, \qquad Y_n \xrightarrow{d} Y.$$

If there is a continuous function $H$ and random quantities $\varepsilon_n$ such that $Y = H(X)$, $Y_n = H(X_n) + \varepsilon_n$ and $\varepsilon_n \to 0$ in probability then

$$(X_n, Y_n) \xrightarrow{d} (X, Y).$$

Lemma 1 reduces the proof of (3.9) to showing joint convergence of $V_n$ and $Z_n$, or equivalently of $V_n$ and $\log Z_n$; this follows from Lemma 2 and (3.11). Since $V_n \xrightarrow{d} X$ where $X \sim N_k(0, \Lambda)$ under the null and $X \sim N_k(\delta, \Lambda)$ under the alternative, it follows that the latter is the limiting alternative distribution of $V_n$.

Finally we turn to a comparison of the test based on $X^2(n)$ with the test based on $Z_n$. The comparison will be done via the limiting distributions of the statistics. Denoting $\delta'\delta$ by $\Delta$, the limiting distributions of $X^2(n)$ are

$$\chi^2_{k-1} \text{ under } H_0 \quad \text{and} \quad \chi'^2_{k-1}(\Delta) \text{ under } H_{1,n},$$

and the limiting distributions of $\log Z_n$ are

$$N\left(-\tfrac{1}{2}\Delta,\ \Delta\right) \text{ under } H_0 \quad \text{and} \quad N\left(\tfrac{1}{2}\Delta,\ \Delta\right) \text{ under } H_{1,n}.$$

The asymptotic power of the test based on $\log Z_n$ is given in (1.14); replacing $\theta^2$ there by $\Delta$ the asymptotic power is

$$1 - \Phi\left(z_\alpha - \sqrt{\Delta}\right) \tag{3.15}$$

where $z_\alpha$ is the $100(1-\alpha)$ percentile and $\Phi$ is the distribution function of the standard normal distribution. Now for levels $\alpha = .05$ and $\alpha = .01$ we find the values of $\Delta$ required for the asymptotic power of the $X^2(n)$ test to be .85, .90 and .95.
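The power formula (3.15) is easy to evaluate numerically. The following sketch is an illustration under stated assumptions rather than the method used in the thesis, which reads the values of $\Delta$ from published tables: it treats only the case of one degree of freedom, where the non-central chi-square distribution is that of $(Z_1 + \sqrt{\Delta})^2$ and its tail can be written with the normal distribution function alone, and it finds $\Delta$ by bisection. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()   # standard normal: cdf is Phi, inv_cdf gives percentiles

def lr_power(delta, alpha):
    """Asymptotic power 1 - Phi(z_alpha - sqrt(delta)) of the likelihood ratio test, as in (3.15)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z_alpha - delta ** 0.5)

def chi2_power_df1(delta, alpha):
    """P((Z + sqrt(delta))^2 > critical value) for the chi-square test with 1 degree of freedom."""
    crit = STD_NORMAL.inv_cdf(1 - alpha / 2)   # square root of the chi-square(1) critical value
    s = delta ** 0.5
    # |Z + s| > crit  means  Z > crit - s  or  Z < -crit - s
    return (1 - STD_NORMAL.cdf(crit - s)) + STD_NORMAL.cdf(-crit - s)

def delta_for_power_df1(beta, alpha, lo=0.0, hi=100.0):
    """Bisection for the non-centrality delta at which the 1-df chi-square test has power beta."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_power_df1(mid, alpha) < beta:
            lo = mid            # power too small: delta must be larger
        else:
            hi = mid
    return (lo + hi) / 2

for beta in (0.85, 0.90, 0.95):
    delta = delta_for_power_df1(beta, alpha=0.05)
    print(beta, round(delta, 3), round(lr_power(delta, alpha=0.05), 3))
```

For $k - 1 = 1$ the values obtained this way agree with the corresponding tabulated asymptotic powers to within about .001. Higher degrees of freedom would need a non-central chi-square distribution function, which the Python standard library does not provide.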
These values of $\Delta$ solve the equations

$$P\left(\chi'^2_{k-1}(\Delta) > \chi^2_{k-1,\alpha}\right) = .85,\ .90,\ .95$$

where $\chi^2_{k-1,\alpha}$ is the $100(1-\alpha)$ percentile of the $\chi^2_{k-1}$ distribution, and they can be read from Table 25 of [2]. From $\Delta$ the power in (3.15) is computed and this is then compared to the relevant power for the chi-square test. In Table 1 below the results are displayed for various values of $k-1$, the degrees of freedom for the chi-square statistic.

Table 1. Asymptotic power $1 - \Phi(z_\alpha - \sqrt{\Delta})$ of the test based on $Z_n$ for values of size $\alpha$, power $\beta$ and degrees of freedom $k-1$ of the chi-square test.

                 α = .05                  α = .01
k-1 \ β     .85     .90     .95      .85     .90     .95
   1       .911    .945    .975     .901    .937    .971
   2       .952    .972    .989     .945    .968    .987
   3       .969    .983    .994     .968    .980    .993
   4       .978    .989    .996     .976    .987    .995
   5       .984    .992    .997     .982    .991    .997
   6       .988    .994    .998     .987    .994    .998
   7       .991    .996    .999     .990    .995    .999
   8       .993    .997    .999     .992    .996    .999
   9       .994    .997    .999     .994    .997    .999
  10       .995    .998   1.000     .995    .998    .999
  15       .998    .999   1.000     .998    .999   1.000

Note that the power seems to converge to 1 as $k$ gets large; this is also expressed by the fact that the non-centrality parameter $\Delta$ must increase with the degrees of freedom in order that the chi-square test have constant power. For a large number of cells $k$ the chi-square statistic $X^2(n)$ has greater difficulty in detecting a particular alternative because it attempts to detect alternatives in many directions.

It should be mentioned that under alternatives other than that specified by $H_{1,n}$ (i.e., $P_i^0 + c_i/\sqrt{n}$), $Z_n$ may have smaller asymptotic power than $X^2(n)$. Thus a trade-off exists between increased power from using the likelihood ratio and the risk of using the wrong likelihood ratio.

BIBLIOGRAPHY

1. Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.
2. Biometrika Tables for Statisticians, Vol. II. (1972). Eds.: E.S. Pearson and H.O. Hartley. Cambridge University Press.
3. Chernoff, H. (1972). Sequential Analysis and Optimal Design. S.I.A.M., Philadelphia.
4. Chernoff, H. and Petkau, A.J. (1981). "Sequential medical trials involving paired data." Biometrika, 68, 1, 119-132.
5. Cochran, W.G. (1952). "The $\chi^2$ test of goodness of fit." Ann. Math. Stat., 23, 315-345.
6. Cox, D.R. and Hinkley, D.V. (1974). Theoretical Statistics. Chapman and Hall, London.
7. Cramer, H. (1946). Mathematical Methods of Statistics. Princeton University Press.
8. Dvoretzky, A., Kiefer, J. and Wolfowitz, J. (1953). "Sequential decision problems for processes with continuous time parameter. Testing hypotheses." Ann. Math. Stat., 24, 254-264.
9. Fakeev, A.G. (1970). "Optimal stopping rules for stochastic processes with continuous parameter." Thy. Prob. Appl., Vol. 15, No. 1, 324-331.
10. Freedman, D. (1971). Brownian Motion and Diffusion. Holden-Day, San Francisco.
11. Ghosh, B.K. (1970). Sequential Tests of Statistical Hypotheses. Addison-Wesley, Reading, Mass.
12. Greenwood, P.E. and Shiryayev, A.N. (1985). Contiguity and the Statistical Invariance Principle. Gordon and Breach, New York.
13. Lai, T.L., Siegmund, D. and Robbins, H. (1983). "Sequential design of comparative clinical trials." In Recent Advances in Statistics, Eds.: M.H. Rizvi, J.S. Rustagi, D. Siegmund, 51-68. Academic Press, New York.
14. Lamperti, J. (1966). Probability. Benjamin/Cummings, Reading.
15. Lehmann, E.L. (1959). Testing Statistical Hypotheses. Wiley, New York.
16. Lipcer, R. and Shiryayev, A.N. (1980). "A functional central limit theorem for semimartingales." Thy. Prob. Appl., Vol. 25, No. 4, 667-688.
17. Millar, P.W. (1983). The Minimax Principle in Asymptotic Statistical Theory. Unpublished notes.
18. Mitra, S.K. (1958). "On the limiting power function of the frequency chi-square test." Ann. Math. Stat., 29, 1221-1233.
19. Moore, D.S. (1983). "Chi-square tests." Studies in Mathematics, Vol. 19: Studies in Statistics, 66-106, Ed.: R.V. Hogg, Mathematical Association of America.
20. Neveu, J. (1974). Discrete Parameter Martingales. Holden-Day, San Francisco.
21. Roussas, G. (1972). Contiguity of Probability Measures: Some Applications in Statistics. Cambridge University Press.
22. Thompson, M.E. (1971). "Continuous parameter optimal stopping problems." Z. Wahrscheinlichkeitstheorie, 19, 302-318.
23. Wald, A. and Wolfowitz, J. (1948). "Optimum character of the sequential probability ratio test." Ann. Math. Stat., 19, 326-339.
24. Wijsman, R.A. (1963). "Existence, uniqueness and monotonicity of sequential probability ratio tests." Ann. Math. Stat., 34, 1541-1548.

APPENDIX

A.1 Uniform Integrability of a Sequence of Stopping Times

The convergence of Average Sample Numbers ((2.12), (2.13)) which is used in Sections 2.1 and 2.2 requires the uniform integrability of the sequence of stopping times $\{T_n^*\}$ given by

$$T_n^* = \inf\{t: Z^n(t) \ge A \text{ or } Z^n(t) \le B\}.$$

This we will establish now using the set-up described in the first paragraph of Chapter 2.

The uniform integrability must be established under both sequences of probabilities $\{P_0^n\}$ and $\{P_1^n\}$. In doing this for both sequences at once we will let $P^n$ denote either of the sequences. Let $F_n$ be the distribution function of $T_n^*$ under $P^n$,

$$F_n(t) = P^n(T_n^* \le t).$$

Let $t$ be an integer.
Then

$$\begin{aligned}
1 - F_n(t) &= P^n(T_n^* > t) \\
&= P^n(B < Z^n(s) < A \text{ for all } s \le t) \\
&= P^n(\log B < \log Z^n(s) < \log A \text{ for all } s \le t) \\
&= P^n(\log B < \log Z_k^n < \log A \text{ for all } k \le tn) \\
&\le P^n(\log B < \log Z_n^n < \log A,\ \log B < \log Z_{2n}^n < \log A,\ \ldots,\ \log B < \log Z_{tn}^n < \log A) \\
&\le P^n(|\log Z_n^n| < C,\ |\log Z_{2n}^n - \log Z_n^n| < C,\ \ldots,\ |\log Z_{tn}^n - \log Z_{(t-1)n}^n| < C)
\end{aligned}$$

where $C = |\log A| + |\log B|$. Since $\log Z_n^n,\ \log Z_{2n}^n - \log Z_n^n,\ \ldots,\ \log Z_{tn}^n - \log Z_{(t-1)n}^n$ are i.i.d.,

$$1 - F_n(t) \le [p(n)]^t$$

with $p(n) = P^n(|\log Z_n^n| < C)$. Now since $\log Z_n^n = \log Z^n(1) \xrightarrow{d} \log Z(1)$ we have

$$p(n) \to P(|\log Z(1)| < C) < 1$$

and thus we can assume without loss of generality that $p(n) \le \gamma < 1$ for every $n$. Therefore $1 - F_n(t) \le \gamma^t$ for integers $t$, so for any $t$

$$1 - F_n(t) \le 1 - F_n([t]) \le \gamma^{[t]} \le \gamma^{t-1} \tag{A1.1}$$

holds for every $n$. Now by integration by parts

$$2\int_0^\infty (1 - F_n(t))\, t\, dt = (1 - F_n(t))\, t^2 \Big|_0^\infty + \int_0^\infty t^2\, dF_n(t) = \int_0^\infty t^2\, dF_n(t),$$

using the inequality (A1.1). Therefore

$$E^n (T_n^*)^2 = 2\int_0^\infty (1 - F_n(t))\, t\, dt \le 2\int_0^\infty t\, \gamma^{t-1}\, dt < \infty.$$

It now follows that $\{T_n^*\}$ is uniformly integrable, since the second moments are bounded uniformly in $n$.

A.2 The Likelihood Ratio of Singular Multivariate Normal Distributions

It is required to find the likelihood ratio of the distributions $N_k(\delta, A)$ and $N_k(0, A)$. Consider the representation ([19])

$$A^{1/2} Z + \delta$$

for the $N_k(\delta, A)$ distribution, where $Z$ is a vector of i.i.d. $N(0,1)$ variables and $A^{1/2}$ is the square root of $A$, a symmetric matrix satisfying $A^{1/2} A^{1/2} = A$. In our situation $A$ is idempotent with rank $r$ so that $A^{1/2} = A$; also

$$A = B \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} B' = B D B' \tag{A2.1}$$

with $B$ an orthogonal matrix, a representation which will be used below.
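The representation (A2.1) can be checked numerically. The following is a minimal sketch, not part of the thesis: it assumes NumPy, the helper name `idempotent_decomposition` is hypothetical, and the matrix $A = I - qq'$ built from a unit vector $q$ is only an illustrative choice of an idempotent matrix.

```python
import numpy as np

def idempotent_decomposition(A):
    """Return (B, D, r) with A = B @ D @ B.T, B orthogonal and
    D = diag(1, ..., 1, 0, ..., 0) carrying r ones.
    Assumes A is symmetric and idempotent (eigenvalues 0 or 1)."""
    eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order
    order = np.argsort(-eigvals)           # reorder so the ones come first
    eigvals, B = eigvals[order], eigvecs[:, order]
    r = int(np.sum(eigvals > 0.5))         # rank = number of unit eigenvalues
    D = np.diag((np.arange(len(eigvals)) < r).astype(float))
    return B, D, r

# Illustrative (hypothetical) example: q is a unit vector, so A is a projection.
q = np.sqrt(np.array([0.1, 0.2, 0.3, 0.4]))
A = np.eye(4) - np.outer(q, q)
B, D, r = idempotent_decomposition(A)

print("rank r =", r)
print("A idempotent, so A^(1/2) = A:", np.allclose(A @ A, A))
print("A = B D B':", np.allclose(B @ D @ B.T, A))
```

Sorting the eigenvalues in descending order is only a convenience so that $D$ takes the block form shown in (A2.1).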
Now if $\delta$ is in the range (column space) of $A$ the vector $AZ + \delta$ will remain in the range of $A$ and the two distributions $N_k(\delta, A)$ and $N_k(0, A)$ will have the same support. This will make their likelihood ratio meaningful. In Chapter 3 we had $A = I_k - qq'$ and $\delta$ was orthogonal to $q$ so that $A\delta = \delta$, thus ensuring that $\delta$ is in the range of $A$.

Now let $Q_0$, $Q_1$ be the probability measures on $R^k$ corresponding to the $N_k(0, A)$, $N_k(\delta, A)$ distributions respectively. With $D$ as in (A2.1)

$$X \sim N_k(0, D) \;\Rightarrow\; BX \sim N_k(0, A) \tag{A2.2}$$

and, with $\mu = B'\delta$,

$$X \sim N_k(\mu, D) \;\Rightarrow\; BX \sim N_k(\delta, A). \tag{A2.3}$$

Now let $P_0$, $P_1$ be the probability measures corresponding to $N_k(0, D)$, $N_k(\mu, D)$. From (A2.2) and (A2.3),

$$Q_0 = P_0 B^{-1} \quad\text{and}\quad Q_1 = P_1 B^{-1}$$

where the notation means

$$Q_0(A) = P_0(B^{-1}A), \qquad Q_1(A) = P_1(B^{-1}A)$$

for Borel sets $A$ in $R^k$. The likelihood ratio $dP_1/dP_0$ is simple to find and the following lemma shows how it relates to $dQ_1/dQ_0$, the desired likelihood ratio.

Lemma. Let $P_0$, $P_1$ be probability measures on a measure space $(X, F)$ and let $f:(X,F) \to (Y,G)$ be measurable and 1-1 with measurable inverse $f^{-1}:(Y,G) \to (X,F)$. Define $Q_0$, $Q_1$ on $(Y,G)$ by

$$Q_0(A) = P_0(f^{-1}A), \qquad Q_1(A) = P_1(f^{-1}A) \qquad (A \in G).$$

Then if $P_1 \ll P_0$ and $Q_1 \ll Q_0$,

$$\frac{dQ_1}{dQ_0}(y) = \frac{dP_1}{dP_0}(f^{-1}(y)) \qquad (y \in Y).$$

Proof: Let $A \in G$. Then

$$\begin{aligned}
\int_A \frac{dP_1}{dP_0}(f^{-1}(y))\, dQ_0(y) &= \int_A \frac{dP_1}{dP_0}(f^{-1}(y))\, d(P_0 f^{-1})(y) \\
&= \int_{f^{-1}A} \frac{dP_1}{dP_0}(x)\, dP_0(x) \quad \text{(by the change of variables formula with } y = f(x)\text{)} \\
&= P_1(f^{-1}(A)) = Q_1(A).
\end{aligned}$$

The use of this lemma requires $dP_1/dP_0$. But $P_0$ is the distribution of a vector $(X_1, \ldots, X_r, 0, \ldots, 0)'$ of $r$ i.i.d. $N(0,1)$ variables and $P_1$ is the distribution of this vector with the added mean vector $\mu = (\mu_1, \ldots, \mu_r, 0, \ldots, 0)'$. Therefore for any $x = (x_1, \ldots, x_r, 0, \ldots, 0)'$

$$\begin{aligned}
\frac{dP_1}{dP_0}(x) &= \frac{\exp\{-\tfrac12 \sum_{i=1}^r (x_i - \mu_i)^2\}}{\exp\{-\tfrac12 \sum_{i=1}^r x_i^2\}} \\
&= \exp\Big\{\sum_{i=1}^r \mu_i x_i - \tfrac12 \sum_{i=1}^r \mu_i^2\Big\} \\
&= \exp\{-\tfrac12 (x - \mu)'(x - \mu) + \tfrac12 x'x\}.
\end{aligned}$$

Therefore, by the lemma, since the linear map $B$ is 1-1 from the range of $D$ to the range of $A$, for each $y$ in the range of $A$

$$\begin{aligned}
\frac{dQ_1}{dQ_0}(y) &= \frac{dP_1}{dP_0}(B^{-1}y) = \frac{dP_1}{dP_0}(B'y) \\
&= \exp\{\mu' B' y - \tfrac12 \mu'\mu\} \\
&= \exp\{(B'\delta)' B'y - \tfrac12 (B'\delta)'(B'\delta)\} \\
&= \exp\{\delta' y - \tfrac12 \delta'\delta\},
\end{aligned}$$

and this was the formula used to obtain (3.12).

Note: A further use of this calculation is made for the application of the SPRT to the problem of testing the mean vector of a multivariate normal distribution. Note that only alternatives which specify a mean vector in the range of the covariance matrix can be tested.

A.3 Two Lemmas on Weak Convergence

In this section proofs of Lemma 1 and Lemma 2 are provided. Lemma 1 is first restated more precisely.

Lemma 1. Let $P_0$ and $P_1$ be probability measures with $P_1 \ll P_0$ and let $Z = dP_1/dP_0$ be their likelihood ratio. For each $n = 1, 2, \ldots$ let $P_0^n$ and $P_1^n$ be probability measures with $P_1^n \ll P_0^n$ and $Z^n = dP_1^n/dP_0^n$. If there are random elements $X$, $X^n$ such that

$$(X^n, Z^n) \xrightarrow{d} (X, Z) \quad \text{under } P_0^n,\ P_0$$

then

$$X^n \xrightarrow{d} X \quad \text{under } P_1^n,\ P_1.$$

Proof: If $f$ is bounded and continuous on the space where $X^n$ and $X$ lie,

$$\begin{aligned}
\int f(X^n)\, dP_1^n &= \int f(X^n)\, Z^n\, dP_0^n = \int h(X^n, Z^n)\, dP_0^n \\
&\to \int h(X, Z)\, dP_0 \quad \text{(since } h(x, z) = f(x)\,z \text{ is continuous on the product space)} \\
&= \int f(X)\, Z\, dP_0 = \int f(X)\, dP_1.
\end{aligned}$$

Although $h$ is unbounded, the limit is justified: $Z^n \ge 0$ converges in distribution to $Z$ and $\int Z^n\, dP_0^n = 1 = \int Z\, dP_0$, so the $Z^n$ are uniformly integrable and, $f$ being bounded, so are the $h(X^n, Z^n)$.

Lemma 2. Let $P_0^n$ and $P_0$ ($n = 1, 2, \ldots$) be probability measures and let $X$, $Y$, $X^n$ and $Y^n$ be random elements such that $X^n \xrightarrow{d} X$ under $P_0^n$, $P_0$ and $Y^n \xrightarrow{d} Y$ under $P_0^n$, $P_0$. If there is a continuous function $H$ and random elements $\varepsilon^n$ such that

$$Y = H(X), \qquad Y^n = H(X^n) + \varepsilon^n, \qquad \varepsilon^n \to 0 \text{ in probability under } P_0^n,$$

then

$$(X^n, Y^n) \xrightarrow{d} (X, Y) \quad \text{under } P_0^n,\ P_0.$$

Proof: Since $\varepsilon^n \to 0$ in probability it suffices to prove that

$$(X^n, H(X^n)) \xrightarrow{d} (X, H(X))$$

(see [1]). For this let $f$ be continuous and bounded on the product space where $(X, Y)$ lives.
Then

$$\int f(X^n, H(X^n))\, dP_0^n = \int g(X^n)\, dP_0^n \to \int g(X)\, dP_0 = \int f(X, H(X))\, dP_0,$$

since $g(x) = f(x, H(x))$ is continuous and bounded.
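The change-of-measure identity $\int f(X)\, dP_1 = \int f(X)\, Z\, dP_0$ at the end of the proof of Lemma 1 can be illustrated by simulation, using the singular normal likelihood ratio $\exp\{\delta' y - \tfrac12 \delta'\delta\}$ of Section A.2 as $Z$. The sketch below is not from the thesis: it assumes NumPy, and the matrix `A`, the shift `delta`, the direction `c`, the test function `f` and the helper `reweighted_mean` are all hypothetical choices made for illustration.

```python
import numpy as np

def reweighted_mean(f, y0, delta):
    """Estimate the Q1-expectation of f(Y) from draws y0 ~ Q0 = N_k(0, A),
    weighting each draw by Z = exp(delta'y - delta'delta / 2)."""
    z = np.exp(y0 @ delta - 0.5 * delta @ delta)
    return float(np.mean(f(y0) * z))

rng = np.random.default_rng(0)

# Hypothetical singular covariance: A = I - q q' is idempotent with rank 3.
q = np.sqrt(np.array([0.1, 0.2, 0.3, 0.4]))
A = np.eye(4) - np.outer(q, q)

# Project an arbitrary vector onto the range of A so that A @ delta == delta.
delta = 0.5 * (A @ np.array([1.0, -1.0, 0.5, 0.0]))

# Bounded continuous test function f(y) = cos(c'y).
c = np.array([0.3, -0.2, 0.5, 0.1])
f = lambda y: np.cos(y @ c)

# Rows of y0 are A z with z standard normal (A is symmetric), i.e. N_k(0, A).
y0 = rng.standard_normal((200_000, 4)) @ A
estimate = reweighted_mean(f, y0, delta)

# Under N_k(delta, A), c'Y is N(c'delta, c'Ac), so E cos(c'Y) has a closed form.
exact = np.cos(c @ delta) * np.exp(-0.5 * c @ A @ c)
print(estimate, exact)
```

The two printed numbers should agree up to Monte Carlo error. The agreement relies on `delta` lying in the range of `A`; for a shift outside that range the two distributions have different supports and no such weight exists.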