UBC Theses and Dissertations

Non-parametric two sample tests of statistical hypotheses Hunt, Everett Edgar 1951

NON-PARAMETRIC TWO SAMPLE TESTS OF STATISTICAL HYPOTHESES

by Everett Edgar Hunt

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ARTS in the Department of MATHEMATICS

We accept this thesis as conforming to the standard required from candidates for the degree of MASTER OF ARTS.

Members of the Department of Mathematics

THE UNIVERSITY OF BRITISH COLUMBIA
April, 1951

Abstract

The testing of statistical hypotheses concerning two populations consists in determining the relationship between the cumulative distribution functions on the basis of random samples from each population. In the non-parametric case the only assumption made regarding the populations is that the two c.d.f.'s are continuous. Thus the distribution of any statistic proposed to test the two samples must be independent of the functional form of the c.d.f.'s. One method of approach is based on the order relations of the sample values. A survey is made of such tests recently proposed and a new test is suggested based on sampling without replacement from a population of the positive integers 1, 2, ...

Table of Contents

Introduction
Classification of non-parametric tests based on order relations of the sample values
The Wald-Wolfowitz Run Test
The Mathisen Test
The Pitman Randomization Test
A New Test: "The Integer Test"
Conclusion

Introduction

The numbers which characterize the distribution of a population or universe are called population parameters.
In most cases which arise in practice it is impossible to determine the values of these parameters. Thus they are predicted or estimated by statistics which are functions of the sample values drawn from the population. In the past fifty years a general theory of estimating these parameters and of testing hypotheses concerning their values has been developed [2].

One important problem which has received much attention is to test whether two random samples are drawn from the same population. Tests of this hypothesis are based on the classical Student's t distribution, which gives a criterion for testing whether the difference between two sample means is significant, and on the F distribution, which tests whether the difference between the variances is significant. Both these tests and most of the others in common use assume that the population distributions are normal. Since this hypothesis is very restrictive, much effort has been expended by statisticians in attempting to show that the commonly used distributions are at least asymptotically normal. However, not all distributions have this property and further, if the sample is small, the normality assumption will not hold even approximately.

An important statistical problem, then, is to derive methods which can be used to test hypotheses assuming nothing about the population distributions except that the cumulative distribution functions are continuous. Such tests are termed non-parametric or distribution free. These tests use qualitative rather than quantitative aspects of the sample values. For example, instead of setting up a criterion to test the difference between the means and variances of the two samples, a test criterion is established concerning the rank or order relations of the data.

It may be argued that the efficiency of a test is reduced by neglecting the quantitative relations since all of the available information has not been utilized. This loss in efficiency should be judged against the possibility of making an incorrect assumption concerning the normality of the population distribution. For this reason non-parametric tests have a place in the theory of testing hypotheses.

A good test should have a high probability of rejecting a false hypothesis. The power of a test is the probability of rejecting the null hypothesis when actually it is false and an alternative hypothesis is true [2]. Thus the power is a function of the parameters of the distribution involved in the true alternative hypothesis. Therefore, in non-parametric theory a difficulty arises. However, an alternative method of evaluating a test has been proposed. A test is called consistent if the probability of rejecting a false null hypothesis against certain alternatives approaches one as the size of the sample increases indefinitely [7]. Thus a test may be consistent with respect to one particular alternative hypothesis but not to others.
Many new non-parametric tests for comparing two samples have been proposed recently. The object of this paper is to present a survey of these tests and put forward another.

Classification of non-parametric tests based on order relations of the sample values

By order relations of the sample values is meant the ordered set of values in a random sample from least to greatest. Non-parametric two sample tests using this property can be considered as being one of three types:
i) those based on a comparison of the two population distributions along the whole real line,
ii) those based on a comparison at a finite number of fixed points such as the quantile points of the distributions,
iii) those based on the method of randomization.
In what follows representative tests of these three types are considered.

The Wald-Wolfowitz Run Test

A test of the first type is the Run test of A. Wald and J. Wolfowitz [7]. Let $O_m : X_1, X_2, \ldots, X_m$ be a sample of observations from a population with continuous cumulative distribution function, $F(X)$, and let $O_n : Y_1, Y_2, \ldots, Y_n$ be a sample from a population with continuous distribution function, $G(X)$. It is required to derive a test of the null hypothesis that $F(X) = G(X)$. Let $O_{m+n}$ denote the combined sample, the observations being ordered from the least to the greatest,

$$O_{m+n} : Z_1, Z_2, \ldots, Z_{m+n} \quad \text{where } Z_i < Z_{i+1}.$$

Wald and Wolfowitz proceed as follows: replace $Z_i$ in $O_{m+n}$ by zero or by one depending on whether $Z_i$ comes from the sample $O_m$ or from sample $O_n$. Define a run to be a sequence of zeros uninterrupted by ones or a sequence of ones uninterrupted by zeros and consider the number of runs in $O_{m+n}$. The statistic proposed in this test is $U$, the number of runs.

Naturally before any statistic can be used as a test criterion, its distribution function must be determined. Under the null hypothesis that $F(X) = G(X)$, the distribution of $U$ will be the probability of obtaining a particular number of runs under the assumption that all of the arrangements of the $m$ values of the $O_m$ sample, and all of the arrangements of the $n$ values of the $O_n$ sample have equal probabilities. This probability is the ratio of the number of the arrangements of the X's with the Y's with $m + n$, $m$, $n$ and $U$ held fixed to the total number of arrangements with $m + n$, $m$, and $n$ constant. The denominator of this ratio is $C(m + n, n)$ since this is the number of arrangements of $m + n$ elements, $m$ of which are alike and $n$ of which are alike.

To determine the numerator of the ratio, two cases must be considered according as $U$ is odd or even. First, let $U = 2k$. Then there will be $k$ runs of zeros and also $k$ runs of ones in any arrangement in which the exact number of runs equals $U$. Now the problem of determining the number of arrangements of $k$ runs with $m$ X's is the same as that of finding the number of ways of putting $m$ zeros into $k$ cells, none of which is empty. Consider the cells to be spaces between $k + 1$ bars. Then since each arrangement must start and end with a bar, there are $k - 1$ remaining bars to permute. Further, since the cells are non-empty there must be at most one bar between any two zeros. Thus there are $m - 1$ spaces between the zeros and $k - 1$ places to put the bars, hence there are $C(m - 1, k - 1)$ possible arrangements. Similarly, the number of ways of obtaining exactly $k$ runs with $n$ Y's is equal to $C(n - 1, k - 1)$. Now for every given arrangement of the X's, there are two arrangements possible with the Y's depending on whether the combination $O_{m+n}$ begins with zero or one. Then the probability that there are exactly $U$ runs $(U = 2k)$ equals

$$\frac{2\, C(m-1, k-1)\, C(n-1, k-1)}{C(m+n, n)}.$$

For the case $U = 2k + 1$, there are either $k + 1$ runs of the X's and $k$ runs of the Y's or $k$ runs of the X's and $k + 1$ runs of the Y's. Then the probability of $U = 2k + 1$ runs equals

$$\frac{C(m-1, k)\, C(n-1, k-1) + C(m-1, k-1)\, C(n-1, k)}{C(m+n, n)}.$$

The region of rejection for the null hypothesis consists of the values of $U$ such that $U \le U_\alpha$, where $U_\alpha$, the critical value of $U$, depends on the level of significance $\alpha$ that is desired by the experimenter. $\alpha$ is pre-determined and $U_\alpha$ is such that $\text{Prob}(U \le U_\alpha) = \alpha$. Thus small values of $U$ are judged significant, implying that when there are too few runs there is poor mixing of the data of the two samples. The worst case would occur when $U = 2$. This would mean that all the observations of the one sample are greater than those of the other.
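The two run-count probabilities derived above (the cases U = 2k and U = 2k + 1) can be tabulated directly for small samples. The following is a minimal sketch, not part of the original thesis; the function names `run_count_pmf` and `count_runs` are illustrative choices, and exact rational arithmetic is used so that the probabilities can be checked to sum to one.

```python
from fractions import Fraction
from math import comb


def run_count_pmf(m, n):
    """Null distribution of U, the number of runs, for samples of sizes m and n.

    Returns a dict mapping each attainable U to its exact probability.
    """
    total = comb(m + n, n)  # number of equally likely arrangements
    pmf = {}
    for u in range(2, m + n + 1):
        k = u // 2
        if u % 2 == 0:
            # U = 2k: k runs of each kind, starting with either symbol
            ways = 2 * comb(m - 1, k - 1) * comb(n - 1, k - 1)
        else:
            # U = 2k + 1: one kind has k + 1 runs, the other has k
            ways = (comb(m - 1, k) * comb(n - 1, k - 1)
                    + comb(m - 1, k - 1) * comb(n - 1, k))
        if ways:
            pmf[u] = Fraction(ways, total)
    return pmf


def count_runs(labels):
    """Number of runs in a non-empty sequence of 0/1 labels."""
    return 1 + sum(a != b for a, b in zip(labels, labels[1:]))
```

Summing the entries of `run_count_pmf(m, n)` from U = 2 upward gives the lower-tail probabilities from which a critical value can be read off for a chosen significance level.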
Tables giving values of $U_\alpha$ for $m, n \le 20$ at the .005, .01, .025 significance levels have been prepared by F. Swed and C. Eisenhart [6]. Values of $U_\alpha$ for $m, n > 20$ have not been computed. However, since the distribution of $U$ has been proved asymptotically normal with mean

$$\frac{2mn}{m+n} + 1$$

and variance

$$\frac{2mn\,(2mn - m - n)}{(m+n)^2\,(m+n-1)},$$

the critical values can be computed approximately for large samples [7].

The Run test has been shown to be consistent with respect to alternative hypotheses with minor restrictions [7]. Let $m$, $n$ increase without limit such that the ratio, $m/n = \lambda$, a constant. The expected value of $U$ is approximately $2m/(1 + \lambda)$ when the null hypothesis, $F(X) = G(X)$, is true. The statistic, $U/m$, converges stochastically to its expected value, $2/(1 + \lambda)$, under the null hypothesis. This means that the probability of $U/m$ differing from the expected value $2/(1 + \lambda)$ by less than any given amount approaches one as $m$ increases indefinitely. Then it is shown that under true alternative hypotheses, $U/m$ converges to its expected value which is less than $2/(1 + \lambda)$. Thus

$$\text{Prob}\left(U/m < 2/(1 + \lambda)\right) \to 1$$

if the null hypothesis is false.

The following example illustrates the use of the Run test. Given the two samples (5.8, 2.9, 7.2, 3.1, 2.5, 6.1) and (4.9, 3.3, 5.7, 4.1, 4.6, 5.6), test the hypothesis that these are random samples drawn from the same population about which nothing is assumed except that it is continuous. Combine the data and order the values from the least to the greatest.
Then assigning the values 0 and 1 to the observations according as they come from the first or second sample, we obtain 000111111000. The observed value of $U$ is 3. From tables [6] the critical value $U_{.05} = 3$ for $m = n = 6$. Thus $U = 3$ is significant and the null hypothesis is rejected on the basis of this particular example.

The Mathisen Test

The following test proposed by H.C. Mathisen [4] is an example of the second type of order relations tests. Two methods of comparing the samples are considered, one involving the median and the other, the quartiles.

Let $O_{2n+1}$ be a sample composed of $2n + 1$ elements drawn from a continuous population. The sample values $X_1, X_2, \ldots, X_{2n+1}$ are ordered so that $X_i < X_{i+1}$. The median of the $2n + 1$ observations is $X_{n+1}$. Let $O_{2m}$ be a sample consisting of elements $Y_1, Y_2, \ldots, Y_{2m}$ drawn from another continuous population. As before, it is required to test the hypothesis that these two samples came from the same population. Let $m_1$ equal the number of values of sample $O_{2m}$ which are less than the median of sample $O_{2n+1}$. Let $m_2 = 2m - m_1$ equal the number of observations greater than $X_{n+1}$. The statistic proposed by Mathisen for testing the null hypothesis is the value of $m_1$. In order to determine the critical values of $m_1$, its distribution is obtained. Let the probability that $X < X_{n+1}$ be

$$p = \int_{-\infty}^{X_{n+1}} f(X)\, dX.$$

Then $\text{Prob}(X_{n+1} < X) = 1 - p$.

Since $X_{n+1}$ is the median of $O_{2n+1}$ there will be $n$ values less than $X_{n+1}$ and $n$ values greater than it.
By the multinomial distribution the probability element for $X_{n+1}$ will be

$$\frac{(2n+1)!}{n!\,1!\,n!}\; p^{n} (1-p)^{n}\, dp.$$

Also using the multinomial distribution, the conditional probability of $m_1$ for a given $X_{n+1}$ will be

$$\frac{(2m)!}{m_1!\,(2m-m_1)!}\; p^{m_1} (1-p)^{2m-m_1}.$$

Then the probability of obtaining particular values for $m_1$ and $X_{n+1}$ is

$$\frac{(2n+1)!}{n!\,n!}\; \frac{(2m)!}{m_1!\,(2m-m_1)!}\; p^{n+m_1} (1-p)^{n+2m-m_1}\, dp.$$

To obtain the probability of a given value for $m_1$, integrate the above expression in the interval $0 < p < 1$. Then the distribution of $m_1$ is

$$\frac{(2n+1)!}{n!\,n!}\; \frac{(2m)!}{m_1!\,(2m-m_1)!}\; \frac{(n+m_1)!\,(n+2m-m_1)!}{(2n+2m+1)!}.$$

The test criterion is the value of $m_1$. Either large or small values of $m_1$ are judged significant. Critical values of the statistic can be computed from the distribution function for any desired significance level, $\alpha$. A small table of the .01, .05 critical values for a few pairs of values of $m$, $n$ has been included in the description of the test [4].

Mathisen has proposed an extension of the method just described. Instead of dividing the one sample into two parts it is suggested to make four divisions. This is done by considering the quartile points of the sample $O_{2n+1}$. For convenience let the second sample be $O_{4m}$ instead of $O_{2m}$. Let the number of values of $O_{4m}$ falling in each of the four intervals determined by the quartiles of $O_{2n+1}$ be $m_1, m_2, m_3, m_4$ respectively. Then

$$\sum_{i=1}^{4} m_i = 4m.$$

The statistic proposed for this test is

$$T_4 = \frac{\sum_{i=1}^{4} (m_i - m)^2}{9m^2}$$

where $9m^2$ is a normalizing factor to ensure that $0 < T_4 < 1$. It should be noted that there is an error in the expression since the maximum value of the numerator is $12m^2$.

Again, unusually large or small values of $T_4$ will indicate a poor comparison of the two samples. Thus such values are judged significant. The distribution function of the $T_4$ statistic is determined in much the same manner as was employed in the median method. Critical values of $T_4$ can be computed for various values of $\alpha$.

The computation of the critical values of the statistic for both the median and the quartile method becomes rather laborious for large $m$ and $n$. However, in both cases the distribution functions of the statistics can be approximated by other well known distributions for which tables are available. For the median method, the distribution of $m_1$ has been found to be asymptotically normal. Let $E[m_1]$ denote the mean of $m_1$ and $D^2[m_1]$ the variance of $m_1$. As $m, n \to \infty$ such that $m/n = \lambda$, a constant, the limiting form of the moment generating function for the ratio

$$\frac{m_1 - E[m_1]}{D[m_1]}$$

is shown to be identical with the moment generating function of the standard normal distribution with zero mean and unit variance. Also the distribution of the statistic $T_4$ used in the quartile method can be approximated by the distribution defined by a Pearson type I curve. It is conjectured that since $T_4$ is the sum of squares its distribution could be approximated by the chi-squared distribution.
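The exact null distribution of m_1 obtained earlier in this section for the median method can be evaluated directly when the samples are small, which is how critical values could be tabulated without the normal approximation. The sketch below is an illustration only, not Mathisen's or the author's procedure; the function name `mathisen_pmf` is an assumption, and the first sample is taken to have 2n + 1 elements and the second 2m, as in the derivation.

```python
from fractions import Fraction
from math import comb, factorial


def mathisen_pmf(m, n):
    """Exact null probabilities P(m1 = j), j = 0, ..., 2m.

    m1 counts the values of the second sample (size 2m) that fall below
    the median of the first sample (size 2n + 1).
    """
    # (2n+1)! / (n! n!), the constant from the density of the median
    lead = Fraction(factorial(2 * n + 1), factorial(n) ** 2)
    denom = factorial(2 * n + 2 * m + 1)
    return [
        # comb(2m, j) = (2m)! / (j! (2m - j)!)
        lead * comb(2 * m, j)
        * Fraction(factorial(n + j) * factorial(n + 2 * m - j), denom)
        for j in range(2 * m + 1)
    ]
```

Because either tail is judged significant, a two-sided critical region can be built by accumulating these probabilities from both ends of the list until the chosen level is reached.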
Another non-parametric test, proposed by W.J. Dixon [3], can be shown to be an extension of the method of using the median or quartile points as in the Mathisen test. Let $O_n$, $O_m$ be the two samples. Consider the $n + 1$ intervals on the real line created by the $n$ ordered observations of $O_n$,

$$-\infty < X_1 < X_2 < \cdots < X_n < \infty.$$

Let the number of values of $O_m$ in these intervals be $m_i$ where $i = 1, 2, \ldots, n + 1$. The test criterion suggested by Dixon is

$$D^2 = \sum_{i=1}^{n+1} \left( \frac{1}{n+1} - \frac{m_i}{m} \right)^2.$$

Extending the quartile method of Mathisen so that the $n$ quantile points of $O_n$ are considered, the statistic would be

$$T_{n+1} = \frac{\sum_{i=1}^{n+1} \left( m_i - \frac{m}{n+1} \right)^2}{\left( \frac{nm}{n+1} \right)^2}.$$

Essentially the two statistics are the same since

$$T_{n+1} = \left( \frac{n+1}{n} \right)^2 D^2.$$

The distribution of $nD^2$ has been shown by Dixon to be approximately the chi-squared distribution with $\nu$ degrees of freedom where

$$\nu = \frac{mn\,(n+m+1)(n+3)(n+4)}{2\,(m-1)(m+n+2)(n+1)};$$

thus $\left( n/(n+1) \right)^2 T_{n+1}$ will have the same distribution.

Under certain conditions the Dixon test criterion, $D^2$, and the run test statistic, $U$, have been shown to give the same information. In his paper [3], Dixon shows that the correlation between the two criteria approaches one for $m$ large compared to $n$. In this case the Dixon test can be considered as a test of type one since the two population distributions will be compared at an infinite number of points along the real line. Such should also be true of the extension of the Mathisen test using $T_{n+1}$.

A.H. Bowker [1] has shown that the median test suggested by Mathisen is not consistent for all alternative hypotheses regarding the two population distribution functions $F(X)$, $G(X)$. This implies that the probability of the false null hypothesis being rejected, when the size of the samples increases indefinitely, does not approach one. In particular, if the null hypothesis is tested against the alternative hypothesis that $F(X)$ and $G(X)$ are different except in the region of their medians, the test will not consistently reject the null hypothesis. As before let $O_{2n+1}$ and $O_{2m}$ be the two samples. The proof is based on the fact that the sequences $m_\alpha/2n$ and $m_\epsilon/2n$ each converge to one-half, where $m_\alpha$ and $m_\epsilon$ are the upper and lower critical values of $m_1$ such that under the null hypothesis,

$$\text{Prob}(m_\alpha < m_1) = \alpha \quad \text{and} \quad \text{Prob}(m_1 < m_\epsilon) = \epsilon < 1 - \alpha.$$

Then, even though the alternative hypothesis is true, the probability of rejecting the null hypothesis approaches $\alpha + \epsilon$ as $n$ increases indefinitely.

The following example illustrates the use of the Mathisen and Dixon tests. Given the two samples (.651, .662, .584, .601, .639, .572, .604, .625, .573, .536) and (.575, .605, .550, .579, .563, .552, .591, .576, .567, .533), test the hypothesis that these are random samples drawn from the same population. Since $n = m = 10$, the median of either sample must be estimated by averaging the two middle numbers. The median of the first sample is .6015. The observed value of $m_1 = 9$. Using tables [4] we find this value of $m_1$ is significant at the $\alpha = .05$ level. However, using the median of the second sample we obtain a different result.
The estimated median is .5755. The observed value of $m_1 = 2$ which is not significant at the $\alpha = .05$ level. Using the Dixon test the first sample divides the second sample into the following groups: 4, 0, 3, 0, 2, 0, 0, 1, 0, 0, 0. Then

$$D^2 = \left(\tfrac{1}{11} - \tfrac{4}{10}\right)^2 + \left(\tfrac{1}{11} - \tfrac{3}{10}\right)^2 + \left(\tfrac{1}{11} - \tfrac{2}{10}\right)^2 + \left(\tfrac{1}{11} - \tfrac{1}{10}\right)^2 + 7\left(\tfrac{1}{11} - \tfrac{0}{10}\right)^2 = .209.$$

Using the table [3] we find this result is not significant at the $\alpha = .05$ level.

The Pitman Randomization Test

A test based on the method of randomization has been proposed by E.J.G. Pitman [5]. As before, let $O_m$, $O_n$ be two samples with elements $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ respectively. Combine and order the data of the two samples so that $O_{m+n}$ consists of the values $Z_1, Z_2, \ldots, Z_{m+n}$ where $Z_i < Z_{i+1}$. Again it is required to test the null hypothesis $F(X) = G(X)$. Define a separation of $O_{m+n}$ to be a division of the $m + n$ observations into two parts, one containing $m$ values and the other, $n$ values. The total number of possible separations will be $C(m+n, m)$. One such separation will be that determined by the two samples $O_m$, $O_n$. Call this particular separation $R$. The spread of this separation is defined as $|\bar{X} - \bar{Y}|$ where $\bar{X}$, $\bar{Y}$ are the mean values of $O_m$, $O_n$ respectively. Let $M$ = the number of separations of $O_{m+n}$ with a spread equal to or greater than that of $R$. Let $M_\alpha$ be a fixed integer such that $M_\alpha < C(m+n, m)$. The value of $M_\alpha$ depends on the amount of probability, $\alpha$, desired in the rejection region under the null hypothesis. If $M < M_\alpha$ then the spread of $R$ is judged significant and the null hypothesis is rejected. Thus the test criterion is the number of separations of $O_{m+n}$ with spread greater or equal to that of $R$. If this number $M$ is comparatively small then $|\bar{X} - \bar{Y}|$ is considered too great for the null hypothesis to be true.

For values of $m$, $n$ as large as 10 there would be considerable computation to determine all the separations with a spread greater or equal to $|\bar{X} - \bar{Y}|$. For this reason a statistic is suggested by Pitman which is related to the previous one with the added property that its distribution function can be approximated by the beta distribution. Define

$$W = \frac{\frac{mn}{m+n}\,(\bar{X} - \bar{Y})^2}{S_1 + S_2 + \frac{mn}{m+n}\,(\bar{X} - \bar{Y})^2}$$

where

$$S_1 = \sum_{i=1}^{m} (X_i - \bar{X})^2 \quad \text{and} \quad S_2 = \sum_{i=1}^{n} (Y_i - \bar{Y})^2.$$

The first three moments of the distribution of $W$ are shown to be approximately equal to those of the beta distribution,

$$B\left(\tfrac{1}{2},\, \tfrac{m+n}{2} - 1\right).$$

Since large values of $W$ will be judged significant, the region of rejection for this test is $W > W_\alpha$ where $W_\alpha$ is the critical value of $W$ for a particular value of $\alpha$. $W_\alpha$ is determined by

$$\alpha = \frac{1}{B\left(\tfrac{1}{2},\, \tfrac{m+n}{2} - 1\right)} \int_{W_\alpha}^{1} x^{-\frac{1}{2}}\, (1 - x)^{\frac{m+n}{2} - 2}\, dx.$$

As an illustration of the Pitman test apply the statistic $|\bar{X} - \bar{Y}|$ to test the hypothesis that (0, 11, 12, 20) and (16, 19, 22, 24) are two random samples from the same population. There are $C(8, 4) = 70$ possible separations.
With $\alpha = .05$, $M_\alpha$ is about $.05 \times 70 = 3.5$. Since there are $M = 6$ separations with a spread equal to or greater than $|\bar{X} - \bar{Y}|$ the result is not significant and we conclude there is no evidence against the null hypothesis on the basis of these samples.

A New Test: "The Integer Test"

The following new test which will be called the Integer test is based on the principle of randomization, and thus is related to the Pitman test.

As before, suppose $O_m$ and $O_n$ are two samples drawn from populations with continuous distribution functions, $F(X)$ and $G(X)$. The null hypothesis is $F(X) = G(X)$. Let $O_{m+n}$ be the ordered combination of the two samples

$$O_{m+n} : Z_1, Z_2, \ldots, Z_{m+n} \quad \text{where } Z_i < Z_{i+1}.$$

Replace the sample values of $O_{m+n}$ by their corresponding subscript, $i$, where $i = 1, 2, \ldots, m+n$, so that to each element of the two samples $O_m$, $O_n$ there is assigned a positive integer which indicates the rank or order of the element in the combined sample $O_{m+n}$. If $Z_i = Z_{i+1} = Z_{i+2} = \cdots = Z_{i+r}$, replace each of these equal sample values by the number, $i + r/2$.

Now consider as a population the integers $1, 2, 3, \ldots, m + n = N$. Suppose samples of $n$ integers are drawn from this population so that none of the integers are selected more than once for each sample. These samples will be random in the sense that each has equal probability.
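The ranking step and the equally likely samples of integers described above can be sketched in a few lines. This is an illustrative sketch rather than the author's procedure; the names `midranks` and `all_divisions` are assumptions, and tied values receive the number i + r/2 as stated in the text.

```python
from itertools import combinations


def midranks(first, second):
    """Replace pooled sample values by their ranks 1..N.

    Values tied at positions i, ..., i + r (1-based) all receive i + r/2.
    Returns the ranks of `first` and of `second`, in their original order.
    """
    pooled = sorted(list(first) + list(second))
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1] == pooled[i]:
            j += 1
        # 0-based positions i..j hold equal values; average of ranks i+1..j+1
        rank_of[pooled[i]] = (i + 1 + j + 1) / 2
        i = j + 1
    return [rank_of[x] for x in first], [rank_of[y] for y in second]


def all_divisions(N, n):
    """Every sample of n distinct integers from 1..N, each equally likely."""
    return list(combinations(range(1, N + 1), n))
```

For example, `midranks([0, 11, 12, 20], [16, 19, 22, 24])` assigns the ranks 1, 2, 3, 6 to the first sample and 4, 5, 7, 8 to the second, and `all_divisions(8, 4)` lists the 70 possible samples of four integers.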
In practice, the observations of a sample are actually drawn without replacement from a population, but since the size of the population is often very much greater than the size of the sample, it can be assumed that the sample data are independent. However, in the Integer test the sample data must be considered as dependent, since $n$ and $N$ are of the same order. That is, the sampling is done without replacement from a finite population.

Now consider all possible divisions of the $N$ integers into two sets of $n$ and $m$ values respectively. The number of such combinations is $C(N, n)$. One of these divisions will represent the samples $O_n$, $O_m$. The test criteria will be the two means of the sets of $n$ and $m$ integers for the particular division determined by $O_n$, $O_m$. Since the two means are dependent, a study of one of them will be sufficient. For convenience, let the larger of the two, $U$, be the statistic proposed in this test. If $v$ denotes the other mean, note that

\[ nU + (N - n)v = \frac{N(N+1)}{2}, \]

where $N(N+1)/2$ is the sum of the integers $1, 2, 3, \ldots, N$. Values of $U$ sufficiently greater than $(N+1)/2$ are judged significant, and for a given level of significance $\alpha$ the region of rejection consists of those values of $U$ such that $U_\alpha < U$, where $U_\alpha$ is the critical value of $U$ for a given probability $\alpha$. As is suggested in the Pitman test, all the means of the $C(N, n)$ combinations greater than $(N+1)/2$ can be computed. Then $U_\alpha$ is a particular value of the mean such that a proportion, $\alpha$, of the means is greater than $U_\alpha$.

Unfortunately, while the computation is simpler for this test than for the Pitman test, this method of determining the critical values for $N$ greater than ten is not practical. It is advisable therefore to obtain the distribution function of $U$ and thus determine the critical values $U_\alpha$. For independent variables the means of samples are normally distributed, exactly if the population is normal and approximately if the samples are large. However, since $U$ is the mean of a sample of dependent integers, the well known central limit theorem cannot be applied in this case. Fortunately, A. Wald and J. Wolfowitz [8] have proved a general theorem for the limiting distribution of linear forms where the population consists of all divisions of the $m + n$ observations. Now the distribution of $U$ will be the same as the distribution of the linear form

\[ \frac{1}{n} \sum_{i=1}^{n} U_i . \]

The Wald-Wolfowitz theorem states that as $N \to \infty$ the

\[ \text{Prob.}\left( \sum U_i - E\left[\sum U_i\right] < t \cdot D\left[\sum U_i\right] \right) \]

is approximately

\[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} \exp(-x^2/2)\, dx , \]

where $t$ is a real number and $E[\sum U_i]$ and $D^2[\sum U_i]$ are the mean and variance of $\sum U_i$ respectively. Before this theorem may be applied a certain condition must be satisfied. Let $\mu_r$ be the $r$th moment about the mean of the integers $1, 2, 3, \ldots, N$; the condition is that

\[ \frac{\mu_r}{(\mu_2)^{r/2}} \]

must be of the order of one. Since $\mu_r$ is of the order of $N^r$ and $\mu_2$ is of the order of $N^2$ for a population of $N$ integers, the theorem holds for this case. Thus the limiting distribution of the statistic $U$ is normal.

The expected value of $U$, $E[U]$, equals $(N+1)/2$, where $(N+1)/2$ is the population mean of $N$ integers. The variance of $U$, $D^2[U]$, is

\[ \frac{1}{n^2} \left[ \sum_{i=1}^{n} \sigma_i^2 + 2 \sum_{i=1}^{n} \sum_{j=i+1}^{n} \rho_{ij}\, \sigma_i \sigma_j \right], \]

where $\sigma_i = \sigma_j = \sigma$ and $\rho_{ij}$ denotes the correlation between two integers drawn in succession. By definition, $\rho_{ij}\sigma^2$ equals

\[ \text{(A)} \qquad E\left[ \left(U_i - \frac{N+1}{2}\right) \left(U_j - \frac{N+1}{2}\right) \right]. \]

Now (A) is equal to

\[ \text{(B)} \qquad \frac{1}{C(N, 2)} \sum_{i=1}^{N} \sum_{j=i+1}^{N} \left(U_i - \frac{N+1}{2}\right) \left(U_j - \frac{N+1}{2}\right), \]

the sums being taken over all $N$ integers. Since

\[ 0 = \left[ \sum_{i=1}^{N} \left(U_i - \frac{N+1}{2}\right) \right]^2 = \sum_{i=1}^{N} \left(U_i - \frac{N+1}{2}\right)^2 + 2 \sum_{i=1}^{N} \sum_{j=i+1}^{N} \left(U_i - \frac{N+1}{2}\right) \left(U_j - \frac{N+1}{2}\right), \]

the expression (B) equals

\[ -\frac{\sigma^2}{N-1} . \]

Then

\[ D^2[U] = \frac{1}{n^2} \left[ n\sigma^2 - 2\,C(n, 2)\, \frac{\sigma^2}{N-1} \right] = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1} . \]

Thus the statistic $U$ is asymptotically normally distributed with mean $(N+1)/2$ and variance

\[ \sigma_U^2 = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}, \]

where $\sigma^2$, the population variance, equals $(N^2 - 1)/12$. In order to use the tables of the standard normal distribution the test criterion will be

\[ t = \frac{U - \dfrac{N+1}{2}}{\sqrt{\dfrac{\sigma^2}{n} \cdot \dfrac{N-n}{N-1}}} . \]

The region of rejection becomes $t_\alpha < t$, where $t_\alpha$ is the critical value of $t$ corresponding to the probability $\alpha$ of rejecting the null hypothesis when it is actually true.

If two samples are symmetric about the same mean, the statistic $U$ will be equal to $(N+1)/2$, since the integral representatives of the values of the samples will also be symmetric. Now suppose the alternative hypothesis is that the population distributions $F(X)$ and $G(X)$ have the same means but different variances. It would be possible that the Integer test would not detect the falsehood of the null hypothesis, as some pairs of samples would have means which differed by very little. For this reason, when the value of the observed $t$ is close to zero it is suggested that the sample variances of the two sets of integers be compared with the population variance of $N$ integers. Since

\[ \sum_{i=1}^{n} \left(U_i - \frac{N+1}{2}\right)^2 + \sum_{i=1}^{N-n} \left(V_i - \frac{N+1}{2}\right)^2 = \sum_{i=1}^{N} \left(i - \frac{N+1}{2}\right)^2 , \]

where $U_i$ and $V_i$ denote the integers of the two sets, the two sample variances are dependent and thus only one of them, say the larger, need be considered as the test criterion.

As before, the distribution of this statistic must be determined to obtain its critical values. It will be shown that the distribution of this sample variance $S^2$ can be approximated by the chi-squared distribution. To determine the particular chi-square distribution the first two moments of $S^2$ are obtained [2]. By definition the expected value of $S^2$ is

\[ E\left[ \frac{1}{n} \sum_{i=1}^{n} (U_i - U)^2 \right]. \]

Previously it was shown that

\[ E\left[ \left(U - \frac{N+1}{2}\right)^2 \right] = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}, \]

where $\sigma^2$ is the variance of the integers $1, 2, 3, \ldots, N$. By definition

\[ E\left[ \sum_{i=1}^{n} \left(U_i - \frac{N+1}{2}\right)^2 \right] = n\sigma^2 . \]

Then using the identity

\[ \frac{1}{n} \sum_{i=1}^{n} (U_i - U)^2 = \frac{1}{n} \sum_{i=1}^{n} \left(U_i - \frac{N+1}{2}\right)^2 - \left(U - \frac{N+1}{2}\right)^2 \]

we obtain

\[ E[S^2] = \sigma^2 - \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1} = \frac{n-1}{n} \cdot \frac{N}{N-1}\, \sigma^2 . \]

It can be shown that the variance of $S^2$ equals

\[ \frac{N(N-n)(n-1)\,\sigma^4 \left[ 2nN^2 - 6(n+1)(N-1) + (nN - N - n - 1)(N-1)\gamma_2 \right]}{n^3 (N-1)^2 (N-2)(N-3)}, \]

where $\gamma_2$ is the coefficient of excess defined as $\gamma_2 = \mu_4/\sigma^4 - 3$.
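The moment formulas quoted above can be checked numerically for small $N$. The following is a minimal sketch, not part of the original thesis: it enumerates every one of the $C(N, n)$ samples drawn without replacement from $1, \ldots, N$ and compares the exact mean and variance of $U$ and of $S^2$ with the closed forms. The function names `enumerate_moments` and `closed_forms` are illustrative assumptions, and exact rational arithmetic is used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction
from itertools import combinations


def enumerate_moments(N, n):
    """Exact mean and variance of U and S^2 over all C(N, n) samples of 1..N."""
    us, s2s = [], []
    for sample in combinations(range(1, N + 1), n):
        u = Fraction(sum(sample), n)                 # sample mean U
        s2 = sum((x - u) ** 2 for x in sample) / n   # S^2 with divisor n
        us.append(u)
        s2s.append(s2)

    def mean(values):
        return sum(values) / len(values)

    def var(values):
        m = mean(values)
        return sum((x - m) ** 2 for x in values) / len(values)

    return mean(us), var(us), mean(s2s), var(s2s)


def closed_forms(N, n):
    """Closed-form E[U], D^2[U], E[S^2], D^2[S^2] as quoted in the text (N > 3)."""
    centre = Fraction(N + 1, 2)
    sigma2 = Fraction(N * N - 1, 12)                 # variance of 1..N
    mu4 = sum((x - centre) ** 4 for x in range(1, N + 1)) / N
    gamma2 = mu4 / sigma2 ** 2 - 3                   # coefficient of excess
    var_u = sigma2 / n * Fraction(N - n, N - 1)
    mean_s2 = Fraction(n - 1, n) * Fraction(N, N - 1) * sigma2
    bracket = (2 * n * N ** 2 - 6 * (n + 1) * (N - 1)
               + (n * N - N - n - 1) * (N - 1) * gamma2)
    var_s2 = (N * (N - n) * (n - 1) * sigma2 ** 2 * bracket
              / (n ** 3 * (N - 1) ** 2 * (N - 2) * (N - 3)))
    return centre, var_u, mean_s2, var_s2


if __name__ == "__main__":
    for N, n in [(4, 2), (5, 2), (5, 3)]:
        print(N, n, enumerate_moments(N, n) == closed_forms(N, n))
```

Enumeration is only feasible for small $N$, which is the same practical limit the text notes for computing critical values directly.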
Let

\[ q^2 = \frac{Nn\, S^2}{N-n} . \]

Then

\[ E[q^2] = \frac{N(n-1)}{N-n} \cdot \frac{N}{N-1}\, \sigma^2 = \frac{N(n-1)}{N-n}\, \sigma^2 \left[ 1 + O\!\left(\frac{1}{N}\right) \right] \]

and

\[ D^2[q^2] = \frac{N^2 n^2}{(N-n)^2}\, D^2[S^2] = \frac{2N(n-1)}{N-n}\, \sigma^4\, \frac{N^2}{(N-1)^2} \left[ \frac{N^2}{(N-2)(N-3)} - \frac{3(n+1)(N-1)}{n(N-2)(N-3)} + \frac{(nN - N - n - 1)(N-1)}{2n(N-2)(N-3)}\, \gamma_2 \right], \]

where $(nN - N - n - 1)$ is approximately equal to $(n-1)(N-1)$. Then

\[ D^2[q^2] = \frac{2N(n-1)}{N-n}\, \sigma^4 \left[ 1 + \frac{n-1}{2n}\, \gamma_2 + O\!\left(\frac{1}{N}\right) \right]. \]

Thus for large $N$ and $\gamma_2$ equal to zero, the mean of $q^2/\sigma^2$ is $N(n-1)/(N-n)$ and the variance is $2N(n-1)/(N-n)$. Hence the distribution of $q^2/\sigma^2$ can be approximated by a chi-square distribution with $N(n-1)/(N-n)$ degrees of freedom.

The statistic proposed for a comparison of the variances in the Integer test is

\[ \chi^2 = \frac{Nn\, S^2}{(N-n)\,\sigma^2} . \]

The region of rejection will be the values of $\chi^2$ such that $\chi^2_\epsilon < \chi^2$, where $\chi^2_\epsilon$ is the critical value corresponding to the probability $\epsilon$.

As an illustration of the Integer test, consider the two samples used in the application of the Wald-Wolfowitz Run Test. On ordering the values of the two samples and assigning the appropriate integers, the samples become (1, 2, 3, 10, 11, 12) and (4, 5, 6, 7, 8, 9). Then $U = 6.5$, $v = 6.5$ and $(N+1)/2 = 6.5$. Also

\[ \sigma^2 = \frac{N^2 - 1}{12} = 11.9, \qquad \sigma_U^2 = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1} = 1.08 . \]

Then

\[ t = \frac{U - \dfrac{N+1}{2}}{\sigma_U} = \frac{6.5 - 6.5}{1.04} = 0 . \]

This value of $t$ is certainly not significant. Now if we are testing against the alternative hypothesis that $G(X)$ is a translation of $F(X)$, the statistic $t$ would be valid. However, if the alternative hypothesis is such that the two populations differ in other respects besides their means, the statistic $q^2/\sigma^2$ should also be used. For the above example,

\[ S^2 = \frac{7N^2 - 4}{48} = 20.9 . \]

Note that this formula for $S^2$ holds only if $n = N/2$ and $N$ is divisible by 4. Then

\[ \chi^2 = \frac{q^2}{\sigma^2} = \frac{Nn\, S^2}{(N-n)\,\sigma^2} = \frac{(12)(20.9)}{11.9} = 21.1 . \]

The number of degrees of freedom is

\[ \lambda = \frac{N(n-1)}{N-n} = \frac{(12)(5)}{6} = 10 . \]

From tables for the chi-square distribution,

\[ \text{Prob.}\,(\chi^2 > 21.161) = .02 . \]

Thus the observed value of $\chi^2 = 21.1$ is significant at the $\alpha = .05$ level and the null hypothesis is rejected.

We note that since the Integer test consists of two parts, the total probability in the rejection region will be $\alpha + (1 - \alpha)\epsilon$, where Prob. $(t_\alpha < t) = \alpha$ and Prob. $(\chi^2_\epsilon < \chi^2) = \epsilon$.

A good test should have a high probability of rejecting the null hypothesis when it is actually false. As stated previously, this probability, called the power of a test, cannot be determined for distribution-free tests. An alternative criterion for the non-parametric case is that a good test is consistent with respect to all couples of continuous $F(X)$, $G(X)$.

It is conjectured that the Integer test is consistent with respect to the alternative hypothesis that $G(X)$ is a translation of $F(X)$, $G(X) = F(X + d)$, where $d$ is a constant. To prove this it should be shown that the statistic $U/N$ converges stochastically to its expected value when either hypothesis is true. It can be shown that if the null hypothesis is true, $U/N$ converges stochastically to $(N+1)/2N$. Let $\epsilon$ be an arbitrarily small positive number. Using Tchebycheff's inequality,

\[ \text{Prob.}\left( \left| \frac{U}{N} - \frac{N+1}{2N} \right| > \epsilon \right) \le \frac{\sigma^2_{U/N}}{\epsilon^2} . \]

Thus for $N$ sufficiently large, $\sigma^2_{U/N}$ approaches zero and hence

\[ \left| \frac{U}{N} - \frac{N+1}{2N} \right| \]

converges in probability to zero.

A difficulty arises in connection with any attempt to show that $U/N$ converges to its expected value when the alternative hypothesis is true, since the distribution of the statistic is not known. Thus the expressions for the expected value and variance of $U/N$ cannot be stated explicitly, although it is surmised that the expected value depends on the constant $d$ and is greater than $(N+1)/2N$, and that the variance approaches zero for large $N$.

Similar difficulties arise in the consideration of the consistency of the Integer test with respect to other alternative hypotheses regarding $F(X)$ and $G(X)$.

Conclusion

In the example used to illustrate Mathisen's and Dixon's tests, conflicting results were obtained. Mathisen's median test rejected the null hypothesis, whereas Dixon's test indicated there was no evidence against it.

Applying the Wald-Wolfowitz Run test to the same example, the observed value of $U$ is 8. From the tables in [6] the probability $(U \le 8) = .1276$ for $n = m = 10$. There is no evidence against the null hypothesis on the basis of these two samples.

Now apply the Integer test to this example. The division of integers is (5, 6, 10, 11, 14, 15, 16, 18, 19, 20) and (1, 2, 3, 4, 7, 8, 9, 12, 13, 17). The observed $U = 13.4$, $(N+1)/2 = 10.5$ and $\sigma_U^2 = 1.75$. The observed $t$ is 2.20. From tables for the normal distribution the probability $(2.20 < t) = .0139$. Thus the null hypothesis is rejected for $\alpha = .05$.

In two of the non-parametric tests, the Mathisen and the Integer tests, a significant result is obtained, while in the other two, the Dixon and the Run tests, the observed value of the statistic is not significant.
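The figures quoted for the Integer test and the Run test in this example can be reproduced with a short script. This is a hedged sketch under stated assumptions rather than the author's procedure: the helper names `integer_test`, `upper_tail_normal` and `count_runs` are hypothetical, and the normal tail is computed with the complementary error function instead of being read from tables. Computed directly, $t$ comes out at about 2.19, which agrees with the reported 2.20 to within rounding.

```python
import math


def integer_test(sample, N):
    """Integer-test statistics for one set of integers drawn from 1..N.

    `sample` is the set of integers assigned to one of the two samples
    (the one with the larger mean for t, the larger variance for chi2).
    """
    n = len(sample)
    U = sum(sample) / n                              # mean of the set
    sigma2 = (N * N - 1) / 12                        # variance of 1..N
    var_U = sigma2 / n * (N - n) / (N - 1)           # finite-population variance of U
    t = (U - (N + 1) / 2) / math.sqrt(var_U)
    S2 = sum((x - U) ** 2 for x in sample) / n       # sample variance, divisor n
    chi2 = N * n * S2 / ((N - n) * sigma2)           # q^2 / sigma^2
    dof = N * (n - 1) / (N - n)                      # approximate degrees of freedom
    return {"U": U, "var_U": var_U, "t": t, "S2": S2, "chi2": chi2, "dof": dof}


def upper_tail_normal(t):
    """Prob(t < Z) for a standard normal variable Z."""
    return 0.5 * math.erfc(t / math.sqrt(2))


def count_runs(sample, N):
    """Number of runs when 1..N is labelled by membership in `sample`."""
    members = set(sample)
    labels = [x in members for x in range(1, N + 1)]
    return 1 + sum(a != b for a, b in zip(labels, labels[1:]))


if __name__ == "__main__":
    first = [5, 6, 10, 11, 14, 15, 16, 18, 19, 20]
    result = integer_test(first, 20)
    print(result["U"], result["var_U"], result["t"])
    print(upper_tail_normal(result["t"]))
    print(count_runs(first, 20))
```

Calling `integer_test([1, 2, 3, 10, 11, 12], 12)` applies the same function to the earlier illustration, where $t$ is zero and the variance comparison is the informative part.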
If the Mathisen and Integer tests are at fault, it means the probabilities in the rejection region for these tests are too small, and conversely for the case where the other two give incorrect results.

It is interesting to note what happens if we assume that the populations from which these samples were drawn are normally distributed. In this case we can apply the test statistics based on the Student's $t$ and $F$ distributions. The observed value of $F$ is 2.44 for $\nu_1 = \nu_2 = 9$ degrees of freedom. This value is not significant for $\alpha = .05$. Thus we may assume that the two normal populations have a common variance and thus can apply the Student's $t$ test. The observed value of $t$ is 2.36 with $\nu$ equal to 18. This value is almost significant for $\alpha = .01$, and we therefore reject the null hypothesis.

In defense of the Run and Dixon tests, which give opposite results to that of Student's $t$, it must be emphasized that we were considering just one particular example. On the other hand, examples can be found in which the Run test has smaller probabilities in the rejection region than the Student's $t$ test.

Suggested applications of these tests are as follows: If the population distributions are normal, or such that they may be approximated by normal distributions, then Student's $t$ test should be used. For other cases the choice of a test depends on the alternative hypotheses and the demands of the experimenter. If the experiment is such that a comparison of the measures of central tendency is desired, the Mathisen, Pitman and Integer tests can be used. If we wish to compare the first two moments of the distributions, the Integer test is applicable. For all other non-parametric cases the Run test should be used.

In evaluating non-parametric tests and comparing them with the classical tests, consideration should be made of the fact that the latter are limited in their application due to the restrictive assumption that the population distributions are normal. Thus while it is apparent that non-parametric tests do not use as much of the available information as the classical tests, they are good substitutes in the cases where the populations are unknown.

References

1. A. H. Bowker, "Note on consistency of a proposed test for the problem of two samples," Ann. Math. Stat., vol. 15 (1944), pp. 98 - 101.

2. H. Cramér, Mathematical Methods of Statistics. Princeton, 1946.

3. W. J. Dixon, "A criterion for testing the hypothesis that two samples are from the same population," Ann. Math. Stat., vol. 11 (1940), pp. 199 - 204.

4. H. C. Mathisen, "A method of testing the hypothesis that two samples are from the same population," Ann. Math. Stat., vol. 14 (1943), pp. 188 - 194.

5. E. J. G. Pitman, "Significance tests which may be applied to samples from any population," Journal Roy. Statist. Soc. Supplement, vol. 4 (1937), pp. 119 - 130.

6. Frieda S. Swed and C. Eisenhart, "Tables for testing randomness of grouping in a sequence of alternatives," Ann. Math. Stat., vol. 14 (1943), pp. 66 - 87.

7. A. Wald and J. Wolfowitz, "On a test whether two samples are from the same population," Ann. Math. Stat., vol. 11 (1940), pp. 147 - 162.

8. A. Wald and J. Wolfowitz, "Statistical tests based on permutations of the observations," Ann. Math. Stat., vol. 15 (1944), pp. 358 - 372.
