Multicollinearity, autocorrelation, and ridge regression. Hsu, Jackie Jen-Chy, 1979.

MULTICOLLINEARITY, AUTOCORRELATION, AND RIDGE REGRESSION

by

JACKIE JEN-CHY HSU
B.A. in Econ., The National Taiwan University, 1977

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE (BUSINESS ADMINISTRATION) IN THE FACULTY OF GRADUATE STUDIES, THE FACULTY OF COMMERCE AND BUSINESS ADMINISTRATION

We accept this thesis as conforming to the required standard.

THE UNIVERSITY OF BRITISH COLUMBIA
February 1980

© Jackie Jen-Chy Hsu, 1980

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Commerce & Business Admin.
The University of British Columbia, 2075 Wesbrook Place, Vancouver, Canada V6T 1W5
Feb. 8, 1980

ABSTRACT

The presence of multicollinearity can induce large variances in the ordinary least-squares estimates of the regression coefficients. It has been shown that ridge regression can reduce this adverse effect on estimation. The presence of serially correlated error terms can also cause serious estimation problems. Various two-stage methods have been proposed to obtain good estimates of the regression coefficients in this case.

Although the multicollinearity and autocorrelation problems have long been recognized in regression analysis, they are usually dealt with separately. This thesis explores the joint effects of these two conditions on the mean square error properties of the ordinary ridge estimator as well as the ordinary least-squares estimator. We show that ridge regression is doubly advantageous when multicollinearity is accompanied by autocorrelation in both the errors and the principal components. We then derive a new ridge-type estimator that is adjusted for autocorrelation.

Finally, using simulation experiments with different degrees of multicollinearity and autocorrelation, we compare the mean square error properties of various estimators.

TABLE OF CONTENTS

1. INTRODUCTION
2. NOTATION AND PRELIMINARIES
3. MULTICOLLINEARITY
   3.1 Sources
   3.2 Effects
   3.3 Detection
4. AUTOCORRELATION
   4.1 Sources
   4.2 Effects
   4.3 Detection
5. JOINT EFFECTS OF MULTICOLLINEARITY AND AUTOCORRELATION
   5.1 Mean Square Error of the OLS Estimates of β
   5.2 Mean Square Error of the Ridge Estimates of β
   5.3 When will ridge estimates be better than the OLS estimates?
   5.4 Use of the "Ridge Trace"
6. RIDGE REGRESSION: ESTIMATES, MEAN SQUARE ERROR AND PREDICTION
   6.1 Derivation of the Ridge Estimator for a CLR Model
   6.2 Derivation of the Ridge Estimator for an ALR Model
   6.3 Mean Square Error of the "Generalized Estimates"
   6.4 Estimation
   6.5 Prediction
7. THE MONTE CARLO STUDY
   7.1 Design of the Experiments
   7.2 Sampling Results
   7.2a. Results assuming ρ_ε is known
   7.2b. Results assuming ρ_ε is unknown
   7.2c. Forecasting
8. CONCLUSIONS
REFERENCES

1. INTRODUCTION

Multicollinearity and autocorrelation are two very common problems in regression analysis. As is well known, the presence of some degree of multicollinearity results in estimation instability and model misspecification, while the presence of serially correlated errors leads to underestimation of the variances of parameter estimates and to inefficient prediction. Because these two conditions have adverse effects on estimation and prediction, a wide range of tests has been developed to reduce their impact. Invariably, the multicollinearity and autocorrelation problems are dealt with separately in most if not all of the existing work.

In this thesis we address the question "What are the joint effects of multicollinearity and autocorrelation on estimation and prediction?" We then study analytically the possible changes in the effectiveness of various estimation methods in the joint presence of these two conditions. As a result of these new findings, a new ridge estimator adjusted for autocorrelation is proposed and its properties are investigated by conducting a simulation study.

We briefly outline this thesis. Section 2 provides the setting for our analysis. Sections 3 and 4 give a general discussion of the problems of multicollinearity and autocorrelation; in addition, we comment on the validity of various existing diagnostic tests. The analytical study of the joint effects of multicollinearity and autocorrelation is presented in Section 5. In Section 6, a new ridge estimator adjusted for autocorrelation is derived and its mean square error properties are analyzed; we also discuss how these new estimates can be obtained in practice. The methodology and the results of the sampling experiments appear in Section 7. The thesis concludes with the presentation of several two-stage methods that can be used with the new ridge rule and that will hopefully achieve better estimates and predictions.

2. NOTATION AND PRELIMINARIES

The Classical Linear Regression (CLR) model can be represented by the equation

(2.1)    Y = Xβ + e

where Y is an n×1 vector of observations on the dependent variable, X is an n×p matrix of observations on the explanatory variables, β is a p×1 vector of regression coefficients to be estimated, and e is an n×1 vector of true error terms. The standard assumptions of the linear regression model are:

(1) E(e) = 0, where 0 is the zero vector.
(2) E(ee^T) = σ²I, where I is the identity matrix.
(3) The explanatory variables are non-stochastic; hence they are independent of the error terms.
(4) Rank(X) = p < n.

The Ordinary Least-squares (OLS) estimator of β is given by

(2.2)    β̂_OLS = (X^T X)^{-1} X^T Y

with variance-covariance matrix

(2.3)    Var(β̂_OLS) = σ²(X^T X)^{-1}.

For simplicity, we will assume that (X^T X) is in correlation form. Let P be the p×p orthogonal matrix such that P^T X^T X P = Λ, where Λ is a diagonal matrix with the eigenvalues λ_1, ..., λ_p of (X^T X) displayed on its diagonal. We assume further that λ_1 ≥ λ_2 ≥ ... ≥ λ_p.

After applying the orthogonal rotation P, it follows from (2.1) that

(2.4)    E(Y) = XPP^T β = X*α

where X* = XP is the data matrix represented in the rotated coordinates, α = P^T β is the vector of regression coefficients of the principal components, and the columns of X* are linearly independent. The OLS estimator of α is given by

(2.5)    α̂_OLS = (X*^T X*)^{-1} X*^T Y.

We will consider ridge estimators for β of the form

(2.6)    β̂_R(k) = (X^T X + kI)^{-1} (X^T X) β̂_OLS,    0 < k < 1.

When k is independent of the observations, (2.6) is the ordinary ridge estimator. When k is a function of β̂_OLS, β̂_R(k) is said to be an "adaptive ridge estimator" [12]. If kI is replaced by a symmetric nonnegative definite matrix, the estimator is said to be a "generalized ridge estimator" [11:p.63].

Expressed in the rotated coordinates, the ridge estimator of α is

(2.7)    α̂_R(k) = (Λ + kI)^{-1} Λ α̂_OLS = Z α̂_OLS,    where Z = (Λ + kI)^{-1} Λ,

which follows by substituting X = X*P^T in (2.6), since X^T X = PΛP^T. It follows from (2.7) that

(2.8)    α̂_R(k) = P^T β̂_R(k).

For the CLR model, assumption (2), that the errors are uncorrelated, is often violated in practice. This leads to the formulation of the Autoregressive Linear Regression (ALR) model.
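Before turning to autocorrelated errors, the ridge family (2.6) is straightforward to compute directly. The following is an illustrative NumPy sketch (the function name and test data are ours, not part of the thesis):

```python
import numpy as np

def ridge_estimates(X, Y, k):
    """Ordinary ridge estimator (2.6): (X'X + kI)^{-1} X'X b_OLS = (X'X + kI)^{-1} X'Y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ Y)
```

With k = 0 this reduces to the OLS estimator (2.2); in the rotated coordinates, each component of α̂_OLS is shrunk by the factor λ_i/(λ_i + k), as in (2.7).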
Mathematically, the ALR model is given by replacing assumptions (2) and (3) by assumptions (2') and (3') below.

(2')    E(ee^T) = σ²Ω, where Ω is a nondiagonal positive definite matrix.

(3')    x_t = (x_{t1}, x_{t2}, ..., x_{tp}), the t-th observation on the explanatory variables, is independent of the contemporaneous and succeeding errors e_t, e_{t+1}, ....

We assume that the error term e_t follows a first-order autoregressive scheme, that is,

(2.9)    e_t = ρ_ε e_{t-1} + U_t

where ρ_ε is the autocorrelation coefficient and U_t satisfies the following for all t:

(2.10)   E(U_t) = 0,    E(U_t U_{t+s}) = σ_u² for s = 0,    E(U_t U_{t+s}) = 0 for s ≠ 0,

and

(2.11)   E(ee^T) = σ_u² V

where V is the n×n matrix with (j,l) entry ρ_ε^{|j-l|}/(1-ρ_ε²), i.e. V = (1-ρ_ε²)^{-1} times the matrix with rows (1, ρ_ε, ρ_ε², ..., ρ_ε^{n-1}), (ρ_ε, 1, ρ_ε, ..., ρ_ε^{n-2}), and so on. We require that |ρ_ε| < 1.

For an ALR model, "Generalized Least-squares" (GLS) will give the "Best Linear Unbiased Estimator" (BLUE) of β, denoted β̂_GLS. The matrix Ω can be written

Ω = QQ^T

where Q is nonsingular. Hence

Q^{-1} Ω (Q^{-1})^T = I_n

and β̂_GLS is obtained by making the following substitutions in the ALR model:

Ỹ = Q^{-1}Y,    X̃ = Q^{-1}X,    ẽ = Q^{-1}e.

Then it follows that

(2.12)   Ỹ = X̃β + ẽ.

Since (2.12) satisfies all the assumptions of a CLR model, OLS will give the BLUE of β. Hence it follows that

(2.13)   β̂_GLS = (X^T Ω^{-1} X)^{-1} X^T Ω^{-1} Y.

For prediction, formula (2.15) gives the "Best Linear Unbiased Predictor" (BLUP) in a first-order ALR model:

(2.15)   Ŷ_{t+1} = x_{t+1} β̂_GLS + ρ_ε e_t

where e_t is the t-th GLS residual.
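As a sketch of how (2.13) can be computed when ρ_ε is known, one can build V from (2.11) and solve the GLS normal equations directly (illustrative code with our own variable names):

```python
import numpy as np

def gls_ar1(X, Y, rho):
    """GLS estimator (2.13) under AR(1) errors: V[j, l] = rho^|j-l| / (1 - rho^2)."""
    n = X.shape[0]
    idx = np.arange(n)
    V = rho ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - rho ** 2)
    Vinv = np.linalg.inv(V)
    return np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ Y)
```

In practice one would use the equivalent quasi-differencing transformation Q^{-1} rather than forming V^{-1} explicitly; with ρ_ε = 0 the estimator reduces to OLS.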
3. MULTICOLLINEARITY

In applying multiple regression models, some degree of interdependence among explanatory variables can be expected. As this interdependence grows and the correlation matrix (X^T X) approaches singularity, multicollinearity constitutes a problem. Therefore it is preferable to think of multicollinearity in terms of its "severity" rather than its "existence" or "nonexistence".

3.1 Sources

In general, multicollinearity can be considered to be a symptom of poor experimental design. The sources of severe multicollinearity may be classified as follows [20:p.99-101].

(i) Not enough data or too many variables. In many cases large data sets only contain a few basic factors. As the number of variables extracted from the data increases, each variable tends to measure different nuances of the same basic factors, and each highly collinear variable adds only little information content. In this case, deleting some variables or collecting more data can usually solve the problem.

(ii) Physical or structural singularity. Sometimes highly collinear variables, due to mathematical or physical constraints, are inadvertently included in the model.

(iii) Sampling singularity. Due to expense, accident or mistake, sampling was conducted in only a small region of the design space.

3.2 Effects

The major effects of serious multicollinearity are the following.

(i) Estimation instability. As the correlation matrix (X^T X) becomes ill-conditioned, the elements of the inverse matrix (X^T X)^{-1} explode, and the inverse matrix may be numerically impossible to obtain. As (2.3) shows, the variances of the OLS estimates of β depend on the diagonal elements of (X^T X)^{-1}, so the OLS estimates of collinear variables might have large variances; in any case they are quite sensitive to small changes in the data set.

(ii) Structure misspecification. As the variable set of X increases, the information content of each explanatory variable decreases, thereby decreasing the significance of each variable's contribution to the explained variance of Y, even though Y really depends on a relatively large variable set. As asserted by many authors [6:p.94][13:p.160][15], in the process of model-building the data limitation rather than the theoretical limitation is responsible for the tendency to underspecify models. Therefore, erroneous deletion of variables may happen.

(iii) Forecast inaccuracy. If an important variable is omitted because it is highly collinear, but in the later prediction period its behavior changes and it moves independently of the other variables, then any forecasting under this oversimplified model will be very inaccurate.

(iv) Numerical problems. The correlation matrix (X^T X) is not invertible if the columns of X are linearly dependent. With the matrix (X^T X) being singular, the OLS estimates of β, represented by (2.2), are completely indeterminate. In the case of an almost singular set of variables, the numerical instability in calculating the inverse matrix (X^T X)^{-1} still remains.

3.3 Detection

Tests for the presence and location of serious multicollinearity are briefly outlined and followed by comments.

(i) Tests based on various correlation coefficients. Here, harmful multicollinearity is generally recognized by rules of thumb. For instance, an admitted rule of thumb requires simple pair-wise correlation coefficients of explanatory variables to be less than 0.8. Certainly, more extended and sophisticated rules of thumb with prudent use of various correlation coefficients will give more satisfactory results. The following rule of thumb is generally considered to be superior to other rules: a variable is said to be highly multicollinear if its coefficient of multiple correlation, R_i², with the remaining (p-1) explanatory variables is greater than the coefficient of multiple correlation, R², with all the explanatory variables [14:p.101]. The variance of the estimate of β_i can be expressed as follows [9]:

(3.1)    Var(β̂_i) = (1/(n-p-1)) · (σ_y²/σ_{X_i}²) · (1-R²)/(1-R_i²)

where σ_y² is the variance of the dependent variable Y and σ_{X_i}² is the variance of the explanatory variable X_i.
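This rule of thumb can be automated by running the p auxiliary regressions and computing each R_i². A minimal sketch (the function names are ours):

```python
import numpy as np

def r_squared(y, X):
    """R^2 from regressing y on X (intercept included)."""
    Xc = np.column_stack([np.ones(len(y)), X])
    resid = y - Xc @ np.linalg.lstsq(Xc, y, rcond=None)[0]
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - np.sum(resid ** 2) / tss

def auxiliary_r2(X):
    """R_i^2 of each explanatory variable on the remaining p-1 variables."""
    return [r_squared(X[:, i], np.delete(X, i, axis=1)) for i in range(X.shape[1])]
```

By (3.1), Var(β̂_i) blows up as R_i² approaches 1; the factor 1/(1 - R_i²) is the familiar variance inflation factor.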
From (3.1) it is obvious that multicollinearity constitutes a problem only when R_i² is relatively high compared to R². Unfortunately the geometric interpretation of this rule of thumb is apparent only when there are two explanatory variables [6:p.98].

(ii) Three-stage hierarchy test. This was proposed by Farrar and Glauber [6]. At the first stage, if the null hypothesis H_0: |X^T X| = 1 is rejected based on the Wilks-Bartlett test, we may assert that multicollinearity is severe and move toward the second stage. The F statistic is then computed for each R_i²:

F_i = (R_i²/(p-1)) / ((1-R_i²)/(n-p)),    i = 1, ..., p.

A statistically significant F_i implies X_i is collinear. At the third stage, inspection of the partial correlation coefficients between X_i and the remaining (p-1) variables and of the associated t-ratios can show the pattern of interdependency among the explanatory variables. Farrar and Glauber claimed that detecting, localizing and learning the pattern of interdependence among explanatory variables can be respectively achieved at the three different stages of their test.

(iii) Haitovsky Chi-square test. In 1969, Haitovsky [9] proposed a heuristic statistic to test the hypothesis of severe multicollinearity. This heuristic statistic is a function of the determinant of the correlation matrix (X^T X) and is approximately distributed as Chi-square. Applications to Farrar and Glauber's data show that this test gives more satisfactory results than the Wilks-Bartlett test that is adopted at the first stage of the Farrar and Glauber three-stage test. Therefore Haitovsky claimed the superiority of his test and suggested replacing the Wilks-Bartlett test by his test in the Farrar and Glauber three-stage procedure.

However, any test based on the determinant of the correlation matrix has some built-in deficiencies. As will be shown later, the mean square error properties depend only on the eigenvalues of the matrix (X^T X). Only when (X^T X) has a broad eigenvalue spectrum, that is to say, when the ratio λ_1/λ_p of the largest eigenvalue to the smallest one is large, may the performance of the OLS estimates deteriorate. Since the determinant of the correlation matrix is equal to the product of all the eigenvalues, such a test will treat matrices having a broad eigenvalue spectrum equivalently to those having relatively narrow eigenvalue spectra, so long as they have the same or nearly the same determinants. The relative magnitude of the eigenvalues is difficult if not impossible to infer from the results of any test that is based on the determinant of the correlation matrix. Nevertheless, the Haitovsky test gives a fairly good indication of the presence of severe multicollinearity in our simulation study.
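The point about determinant-based tests is easy to illustrate numerically: two Gram matrices can share a determinant while having very different eigenvalue ratios λ_1/λ_p. A contrived example of ours (not a correlation matrix, just two diagonal matrices chosen for the purpose):

```python
import numpy as np

# Same determinant (0.01), very different spectra: a determinant-based test
# cannot distinguish the ill-conditioned A from the well-conditioned B.
A = np.diag([1.0, 0.01])   # eigenvalue ratio 100: "broad" spectrum
B = np.diag([0.1, 0.1])    # eigenvalue ratio 1: "narrow" spectrum

def eig_ratio(M):
    """Ratio of largest to smallest eigenvalue of a symmetric matrix M."""
    w = np.linalg.eigvalsh(M)   # eigenvalues in ascending order
    return w[-1] / w[0]
```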
(iv) Examining the spectrum of the matrix (X^T X). If the matrix (X^T X) has a broad eigenvalue spectrum, that is, λ_1/λ_p is large, then the mean square error of the OLS estimates of β becomes very large. Since the trace of the correlation matrix (X^T X) is equal to the number of explanatory variables p, an arbitrary rule of thumb may consider λ_1/λ_p large if λ_1/λ_p > p. Besides, the minimax index MMI = Σ_i λ_i^{-2} / λ_p^{-2} is a useful indicator too. A small MMI, say less than two, implies the presence of multicollinearity [21:p.13-14].

Among all these tests and methods proposed, examining the eigenvalue spectrum of the matrix (X^T X) provides not only a sound theoretical basis but also the lightest computational burden.

4. AUTOCORRELATION

One of the basic assumptions of the CLR model is that the error terms are independent of each other. However, when regression analysis is applied to time series data, the residuals are often found to be serially correlated. Like multicollinearity, autocorrelation is another widespread problem in applying regression models. For simplicity, first-order autocorrelation is assumed in our study.

4.1 Sources

The sources are mainly the following:

(i) Omission of variables. The time-ordered effects of the omitted variables will be included in the error terms. This prevents the errors from displaying random behavior. In this case, finding the missing variables and identifying the correct relationship can solve the problem.
(ii)  J  S y s t e m a t i c measurement e r r o r i n t h e dependent v a r i a b l e A g a i n , t h e e r r o r terms absorb the s y s t e m a t i c measurement e r r o r i n t h e dependent v a r i a b l e and t h e n d i s p l a y non-random b e h a v i o r .  (iii)  E r r o r s t r u c t u r e i s time dependent The g r e a t impacts o f some random e v e n t s o r s h o c k s , such as war, s t r i k e s , f l o o d , e t c . , a r e spread over s e v e r a l p e r i o d s o f t i m e , c a u s i n g t h e e r r o r terms t o be s e r i a l l y c o r r e l a t e d . "true-autocorrelation".  This i s so-called  -14-  4.2  Effects When the OLS technique i s s t i l l used f o r e s t i m a t i o n , the major  effects are: (i)  Unbiased but i n e f f i c i e n t e s t i m a t o r of B GLS p r o v i d e s t h e BLUE o f B when the d i s p e r s i o n m a t r i x o f e, 2 oM}, i s n o n d i a g o n a l .  That i s t o say on the average  the sampling  v a r i a n c e s o f GLS e s t i m a t e s of B a r e l e s s than t h a t o f OLS e s t i m a t e s of  g, hence OLS i s i n e f f i c i e n t  compared w i t h GLS.  U n d e r e s t i m a t i o n o f the v a r i a n c e s o f the e s t i m a t e s o f B  (ii)  As an i l l u s t r a t i o n , c o n s i d e r t h e v e r y simple model y  V where u  t  = Bx  t  =  p  e e  t  + c  t - i  +  t  u  t  s a t i s f i e s assumptions ( 2 . 1 0 ) .  I t has been shown t h a t the  v a r i a n c e o f OLS e s t i m a t e o f B i s [ 1 3 : p . 2 4 7 ] n-1 a  (4.1)  Var(6  0 L S  )  i>i i l  ,  x  2  +  - -f— i-i  n-2  [ 1 + 2^  +.2p  1=1  1  i  •+ 2 p e n  x  x  -f^—*  t  i - i  X • 1  n  " n  I  i=l  The OLS formula ( 2 . 3 ) i g n o r e s  J, l i+2  1  n  2  xi 1  the term i n parentheses  i n ( 4 . 1 ) and  2 2 g i v e s the v a r i a n c e s o f the e s t i m a t e s o f B as a / £ x ^ I f b o t h e . 
i=l and x a r e p o s i t i v e l y a u t o c o r r e l a t e d the e x p r e s s i o n i n parentheses n  i s almost c e r t a i n l y greater than unity, therefore the OLS formula w i l l underestimate the true variance of 3 (iii)  n T C  .  I n e f f i c i e n t predictor of Y When autocorrelation i s present,, error made at one point i n time gives information about the error made at a subsequent point i n time.  The OLS predictor f a i l s to take this information into  account, hence i t i s not the BLUP of Y [13:p.265-266]. A.3  Detection The tests which are commonly used to recognize  the existence of  f i r s t - o r d e r autocorrelation are the following. (i)  Eye-ball tests The plot of OLS residuals e Any nonrandom bechaior autocorrelation. lagged value e  fc  of e  fc  against time t can be informative.  can be considered  as an i n d i c a t i o n of  We may also plot the OLS residual e ^.  I f the observations  against i t s  are hot evenly spread over  the four quadrants, we may conclude the f i r s t - o r d e r autocorrelation i s present.  These eye-ball tests are quite e f f e c t i v e , however they  are imprecise and do not lend themselves to c l a s s i c a l i n f e r e n t i a l methods. (ii)  yon-Neumann r a t i o In 1941, the r a t i o of the mean square successive difference to the variance was proposed by von-Neumann as a test s t a t i s t i c for the existence of f i r s t - o r d e r autocorrelation [22].  Though various  applications have proven the usefulness of the von-Neumann r a t i o , we emphasize that this test i s applicable only when e values are independently d i s t r i b u t e d and the sample size i s large.  In practice,  the OLS r e s i d u a l s used  to compute t h e von-Neumann r a t i o  are not i n d e p e n d e n t l y d i s t r i b u t e d  usually  even when the t r u e e r r o r  terms  are. 
(iii)  Durbin-Watson t e s t T h i s t e s t , named a f t e r i t s o r i g i n a t o r s D u r b i n and Watson, i s w i d e l y used  f o r s m a l l sample s i z e s  comings o f the Durbin-Watson t e s t . o f indeterminancy.  [4][5]..  There a r e some s h o r t -  First,  t h e r e e x i s t two r e g i o n s  Though an exact t e s t was suggested  i n 1966, i t s heavy c o m p u t a t i o n a l burden, p r e v e n t s a p p l i c a t i o n s ' . [10].  Secondly,  the .test" from wide  the Durbin-Watson t e s t i s d e r i v e d f o r  non-stochastic explanatory v a r i a b l e s only. if  by Henshaw  I t has been shown t h a t  the lagged dependent v a r i a b l e s a r e p r e s e n t e i t h e r i n s i n g l e  r e g r e s s i o n e q u a t i o n models or i n systems o f simultaneous r e g r e s s i o n e q u a t i o n s , t h e Durbin-Watson t e s t  i s b i a s e d towards the v a l u e f o r  a random e r r o r , t h a t i s , d i s b i a s e d towards 2 v e r y m i s l e a d i n g i n f o r m a t i o n [17].  }  thereby  giving  I t i s as n e c e s s a r y as important  to t e s t f o r s e r i a l c o r r e l a t i o n f o r models c o n t a i n i n g lagged dependent v a r i a b l e s s i n c e a u t o c o r r e l a t e d models a r e u s u a l l y r e p a i r e d by i n s e r t i n g lagged Y v a l u e s i n t o the r i g h t - h a n d s i d e of the regression equation.  To t h i s end, D u r b i n  on the h s t a t i s t i c i n 1970[3]. "h" i s d e f i n e d as the f o l l o w i n g , n  where b' i s the c o e f f i c i e n t o f Y  t-1*  developed  a test  based  -17-  This test  i s c o m p u t a t i o n a l cheap but o n l y a p p l i c a b l e f o r l a r g e  sample s i z e s .  The s m a l l sample p r o p e r t i e s of the "h" s t a t i s t i c  are s t i l l unknown.  -18-  5.  
5. JOINT EFFECTS OF MULTICOLLINEARITY AND AUTOCORRELATION

In statistical analysis, a point estimate is usually of little use unless accompanied by an estimate of its accuracy. In this connection, Mean Square Error (MSE) is widely used as a measure of accuracy. Since accurate parameter estimates constitute an effective model, MSE can be used to determine a model's effectiveness when the underlying objective is simply to obtain good parameter estimates.

In a 1970 paper, Hoerl and Kennard presented the MSE properties of the OLS and ridge estimates of β [11]. Thereafter the results of various studies have confirmed that ridge regression will improve the MSE of estimation and prediction in the presence of severe multicollinearity.

In this section we present expressions for the MSE of β̂_OLS and β̂_R(k) when the error terms follow the first-order autocorrelated pattern (2.9). These expressions will enable us to examine the effect of these two conditions on the ridge and the OLS estimates. Our analysis reduces to that of Hoerl and Kennard by setting ρ_ε = 0.

5.1 Mean Square Error of the OLS Estimates of β

We begin with the analysis of the OLS estimates for a first-order ALR model. Let

L_1 = distance from β̂_OLS to β,
L_1² = (β̂_OLS - β)^T (β̂_OLS - β).

We define the MSE of β̂_OLS to be E(L_1²).

Proposition 5.1

(5.1)    E(L_1²) = (σ_u²/(1-ρ_ε²)) Σ_{j=1}^n Σ_{l=1}^n D_jl ρ_ε^{|j-l|}

where D = X(X^T X)^{-2} X^T.

Proof: From (2.1) and (2.2),

(5.2)    β̂_OLS - β = (X^T X)^{-1} X^T Y - β = (X^T X)^{-1} X^T (Xβ + e) - β = (X^T X)^{-1} X^T e.

By definition and (5.2) it follows that

E(L_1²) = E[(β̂_OLS - β)^T (β̂_OLS - β)] = E[e^T X (X^T X)^{-2} X^T e].

Noting that E(e) = 0, it follows from Theorem 4.6.1 of Graybill [7:p.139] that

(5.3)    E(L_1²) = σ_u² tr[X (X^T X)^{-2} X^T V].

(For a matrix A we use the notation tr(A) to denote the trace of A.) From the definitions of V and D, (5.1) follows.

(5.1) does not give much insight into the effect of multicollinearity and autocorrelation on the MSE of β̂_OLS. By rotating axes (using principal components) the effect can be more clearly demonstrated.

Proposition 5.2

(5.4)    E(L_1²) = (σ_u²/(1-ρ_ε²)) [ Σ_{i=1}^p 1/λ_i + 2 Σ_{j>l} Σ_{i=1}^p (x*_ji x*_li / λ_i²) ρ_ε^{j-l} ]

where x*_ti is the t-th observation on the i-th principal component.

Proof: From (2.4) and (2.5),

(5.5)    α̂_OLS - α = (X*^T X*)^{-1} X*^T Y - α = (X*^T X*)^{-1} X*^T (X*α + e) - α = (X*^T X*)^{-1} X*^T e.

By definition and (5.5) it follows that

E(L_1²) = E[(β̂_OLS - β)^T PP^T (β̂_OLS - β)] = E[(α̂_OLS - α)^T (α̂_OLS - α)] = E[e^T X* (X*^T X*)^{-2} X*^T e].

Hence, by the same argument used in proving Proposition 5.1,

E(L_1²) = σ_u² tr[X* Λ^{-2} X*^T V].

Expanding this trace using the definitions of Λ and V, and noting that Σ_t x*_ti² = λ_i, (5.4) follows.

After the orthogonal rotation, the effect of multicollinearity and autocorrelation becomes apparent from (5.4). First, if ρ_ε is positive and most of the principal components are also positively autocorrelated, almost certainly the second term in (5.4) will be positive.
That is to say, the MSE of β̂_OLS will be larger than when these effects are not present; moreover, the difference will be in proportion to the magnitude of ρ_ε. Secondly, we obtain a cross term of the eigenvalues λ_i and the autocorrelation coefficient ρ_ε. If the matrix (X^T X) is ill-conditioned, that is, λ_p is close to zero, and there is a high degree of positive autocorrelation both in the p-th component and in the error terms, then the second term in (5.4) dominates and the MSE of β̂_OLS can be very large. It is then extremely dangerous to apply OLS to data with the above characteristics. However, the problem will not be that serious if ρ_ε is negative or the principal components, especially the weak components, are not autocorrelated. Finally, from (5.4) we are able to tell by how much the MSE of β̂_OLS changes because of the existence of first-order autocorrelated errors in general regression models containing p explanatory variables.

Note that when ρ_ε = 0, (5.4) reduces to

    E(L_1^2) = σ_u^2 tr(X* Λ^{-2} X*^T) = σ_u^2 Σ_{i=1}^{p} 1/λ_i.

5.2 Mean Square Error of the Ridge Estimates of β

In parallel with 5.1, we define

    L_2(k) = distance from β̂_R(k) to β.

The MSE of β̂_R(k) is given by

    E[L_2^2(k)] = E[(β̂_R(k) - β)^T (β̂_R(k) - β)].

Proposition 5.3

(5.6)    E[L_2^2(k)] = γ_1(k) + γ_2(k) + γ_3(k)

where

    γ_1(k) = σ_u^2 Σ_{i=1}^{p} λ_i/(λ_i + k)^2,

    γ_2(k) = k^2 Σ_{i=1}^{p} α_i^2/(λ_i + k)^2,

    γ_3(k) = 2σ_u^2 Σ_{j>ℓ} Σ_{i=1}^{p} (x*_{ji} x*_{ℓi}/(λ_i + k)^2) ρ_ε^{j-ℓ}.
Proof: From (2.7) and (2.8), the MSE of β̂_R(k) can be written as

(5.7)    E[L_2^2(k)] = E[(β̂_R(k) - β)^T P^T P (β̂_R(k) - β)]
                     = E[(Z α̂_OLS - α)^T (Z α̂_OLS - α)]
                     = E[(α̂_OLS - α)^T Z^T Z (α̂_OLS - α)] + (Zα - α)^T (Zα - α).

Since the first term in (5.7) is a scalar, from (2.7) and Proposition 5.2 it follows that

(5.8)    E[(α̂_OLS - α)^T Z^T Z (α̂_OLS - α)]
             = σ_u^2 tr[X*(Λ + kI)^{-2} X*^T V]
             = σ_u^2 Σ_{i=1}^{p} λ_i/(λ_i + k)^2 + 2σ_u^2 Σ_{j>ℓ} Σ_{i=1}^{p} (x*_{ji} x*_{ℓi}/(λ_i + k)^2) ρ_ε^{j-ℓ}
             = γ_1(k) + γ_3(k).

Since the matrix (Z - I) can be written as

(5.9)    Z - I = Z(I - Z^{-1}) = Z(-kΛ^{-1}) = -k(Λ + kI)^{-1},

the second term in (5.7) can be expressed as

    (Zα - α)^T (Zα - α) = α^T (Z - I)^T (Z - I) α = k^2 α^T (Λ + kI)^{-2} α = k^2 Σ_{i=1}^{p} α_i^2/(λ_i + k)^2 = γ_2(k),

completing the proof. ∎

The MSE of β̂_R(k) consists of three parts, γ_1(k), γ_2(k) and γ_3(k). γ_1(k) can be considered the total variance of the parameter estimates and is a monotonically decreasing function of k; γ_2(k) is the square of the bias introduced by the augmented matrix kI and is a monotonically increasing function of k; γ_3(k) is related to the autocorrelation in the error terms. Hoerl and Kennard claim that in the presence of severe multicollinearity it is possible to reduce the MSE substantially by accepting a little bias, that is, by choosing k > 0. This is because in the neighborhood of the origin γ_1(k) drops sharply while γ_2(k) increases only slightly as k increases [11:pp.60-61].
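The three components of (5.6) are easy to tabulate. The following sketch (Python with NumPy; the nearly collinear design, the rotated coefficients α, and the chosen value of k are illustrative assumptions, not taken from the thesis) evaluates γ_1(k), γ_2(k) and γ_3(k), and exhibits the sharp drop in γ_1 near the origin against the slight rise in γ_2.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, rho, sigma2_u = 30, 3, 0.6, 1.0

# Nearly collinear design to mimic an ill-conditioned X'X
base = rng.normal(size=n)
X = np.column_stack([base + 0.05 * rng.normal(size=n) for _ in range(p)])
alpha = np.array([1.0, 0.5, 0.2])   # assumed coefficients in the rotated axes

lam, P = np.linalg.eigh(X.T @ X)
Xstar = X @ P

def gammas(k):
    g1 = sigma2_u * np.sum(lam / (lam + k) ** 2)      # variance term gamma_1(k)
    g2 = k**2 * np.sum(alpha**2 / (lam + k) ** 2)     # squared-bias term gamma_2(k)
    g3 = 0.0                                          # autocorrelation term gamma_3(k)
    for j in range(n):
        for l in range(j):
            g3 += np.sum(Xstar[j] * Xstar[l] / (lam + k) ** 2) * rho ** (j - l)
    return g1, g2, 2.0 * sigma2_u * g3

g1_0, g2_0, g3_0 = gammas(0.0)
g1_k, g2_k, g3_k = gammas(0.05)
print(g1_0 - g1_k, g2_k - g2_0)   # gamma_1 falls far more than gamma_2 rises
```

With an ill-conditioned design the fall in γ_1 between k = 0 and a small positive k dwarfs the rise in γ_2, which is the Hoerl and Kennard argument in numerical form.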
After incorporating autocorrelation into the ridge regression analysis, their assertion remains true only if certain conditions are satisfied. From (5.6) we see that the effects of multicollinearity and autocorrelation are the following.

(i) If ρ_ε is positive and the principal components, especially the weak components, are also positively autocorrelated, then the ridge method will be even more desirable than OLS. This is because a substantial decrease in both γ_1(k) and γ_3(k) can be achieved by choosing k > 0, while the accompanying increase in γ_2(k) is relatively small.

(ii) If ρ_ε is negative or almost all of the principal components are not autocorrelated, then on the average γ_3(k) is close to zero; hence the ridge and the OLS estimates will perform relatively the same as in the uncorrelated case.

(iii) Since ridge regression is similar to shrinking the model by dropping the least important component [21:pp.24-28], (5.6) gives a theoretical justification for shrinking the model if both the last component and the error terms are positively autocorrelated.

From the point of view of estimation stability, the ridge method will be helpful when severe multicollinearity is accompanied by a high degree of positive autocorrelation both in the weakest component and in the error terms.

5.3 When will Ridge Estimates be Better than the OLS Estimates?
Taking the derivatives of γ_1(k) and γ_2(k), Hoerl and Kennard found a condition on k such that ridge regression gives better estimates than OLS in terms of MSE: when k is smaller than σ_u^2/α_max^2, where α_max is the largest regression coefficient in magnitude, the MSE of β̂_R(k) will be less than that of β̂_OLS [11]. When autocorrelation is present, the conditions on k such that ridge regression will perform better than OLS regression are described below. Consider the derivatives of γ_1(k), γ_2(k) and γ_3(k):

(5.10)   dγ_1/dk = -2σ_u^2 Σ_{i=1}^{p} λ_i/(λ_i + k)^3,

         dγ_2/dk = 2k Σ_{i=1}^{p} λ_i α_i^2/(λ_i + k)^3,

         dγ_3/dk = -4σ_u^2 Σ_{j>ℓ} Σ_{i=1}^{p} (x*_{ji} x*_{ℓi}/(λ_i + k)^3) ρ_ε^{j-ℓ}.

When (X^T X) approaches singularity, which implies that λ_p → 0, the values of the first two derivatives in the neighborhood of the origin are given by

(5.11)   lim_{λ_p→0} lim_{k→0+} (dγ_1/dk) = -∞

(5.12)   lim_{λ_p→0} lim_{k→0+} (dγ_2/dk) = 0.

As k increases, a huge drop in γ_1 with a slight increase in γ_2 may be expected. However, (5.10) shows that the behavior of γ_3 depends on the degree of autocorrelation both in the principal components and in the error terms. Therefore γ_3 may increase or decrease at various rates as k increases. The use of ridge regression is most favourable when there is a high degree of positive autocorrelation both in the components and in the error terms. We now formalize these arguments and present a condition on k such that ridge regression will be better than
OLS regression under the MSE criterion.

Let

    F(k) = E(L_1^2) - E[L_2^2(k)]
         = σ_u^2 Σ_{i=1}^{p} [1/λ_i - λ_i/(λ_i + k)^2]
           + 2σ_u^2 Σ_{j>ℓ} Σ_{i=1}^{p} x*_{ji} x*_{ℓi} [1/λ_i^2 - 1/(λ_i + k)^2] ρ_ε^{j-ℓ}
           - k^2 Σ_{i=1}^{p} α_i^2/(λ_i + k)^2.

Then

(5.13)   dF/dk = 2σ_u^2 Σ_{i=1}^{p} (1/(λ_i + k)^3) [λ_i + 2 Σ_{j>ℓ} x*_{ji} x*_{ℓi} ρ_ε^{j-ℓ}] - 2k Σ_{i=1}^{p} λ_i α_i^2/(λ_i + k)^3.

Assume that γ_3(k) is a non-increasing function of k in the neighborhood of the origin. From (5.11) and (5.12) we may expect F(k) to increase as we move towards k > 0, i.e. lim_{k→0+} (dF/dk) > 0. Hence there exists k > 0 such that the OLS estimates have higher MSE than the ridge estimates.

Theorem 5.1. If

(5.14)   k < (σ_u^2/α_max^2) [1 + (2/λ_i) Σ_{j=1}^{n-1} ρ_ε^j Σ_{ℓ=1}^{n-j} x*_{ℓ+j,i} x*_{ℓi}]   for i = 1, 2, ..., p,

then E(L_1^2) - E[L_2^2(k)] > 0; in other words, the OLS estimates have higher MSE than the ridge estimates.

Again, (5.14) reduces to Hoerl and Kennard's result if ρ_ε = 0. When positive autocorrelation exists in the error terms and in the principal components, the second term in (5.14) may well be positive; hence the range of k for which ridge estimates are better than OLS estimates under the MSE criterion will be larger than what Hoerl and Kennard asserted for the uncorrelated case. (5.14) shows that the extension in the range of k is positively related to the magnitude of ρ_ε. However, (5.14) is only a sufficient condition on k for E(L_1^2) to be greater than E[L_2^2(k)], since F(k) is increasing in k over the range shown by (5.14). It is possible that for some values of k, F(k) is decreasing in k while the function value is still positive, that is, E(L_1^2) is still greater than E[L_2^2(k)]. Therefore, we may consider (5.14) a stringent condition on k for ridge estimates to be better than OLS estimates under the MSE criterion.
If either ρ_ε is negative or the principal components, especially the weak ones, are not autocorrelated, the behavior of γ_3(k), and thereby of F(k), as k increases will be hard to predict. The effect of autocorrelation on the range of k then depends on the data set gathered. In practice the true parameters are unknown, so the range of k shown by (5.14) can be approximated by conducting a principal component analysis and substituting the estimates for the parameters.

5.4 Use of the "Ridge Trace"

In ridge regression the augmented matrix kI is used to give the system the general characteristics of an orthogonal system. Hoerl and Kennard claimed that at a certain value of k the system will stabilize [11:p.65]. They proposed the use of a "Ridge Trace" as a diagnostic tool for selecting a single value of k, and hence a unique ridge estimate of β, in practice. The "Ridge Trace" portrays the behavior of all the parameter estimates as k varies. Therefore, instead of suppressing dimensions either by deleting collinear variables or by dropping principal components of small importance, the Ridge Trace will show how singularity is causing instability, over/under-estimation and incorrect signs. In connection with autocorrelation, where ridge regression is even more desirable, the "Ridge Trace" will certainly be of great help in obtaining better point estimates and thereby better predictions.
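A ridge trace is simply the family of curves k ↦ β̂_R(k). A minimal sketch (Python with NumPy; the simulated collinear data are an assumption for illustration only) computes the trace on a grid of k values; in practice one plots the coefficient paths against k and picks the smallest k at which they stabilize.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)     # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 1.1 * x1 + 1.0 * x2 + rng.normal(size=n)

def ridge(X, y, k):
    """Ridge estimate (X'X + kI)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

ks = [0.0, 0.01, 0.05, 0.1, 0.5, 1.0]
trace = {k: ridge(X, y, k) for k in ks}        # the "ridge trace"
for k in ks:
    print(k, trace[k])                         # coefficient paths settle as k grows

# The squared length of the estimate shrinks monotonically in k
lengths = [trace[k] @ trace[k] for k in ks]
```

The monotone shrinkage of the squared length reflects the derivation of the ridge estimator as the minimum-length vector satisfying a residual constraint, which Section 6 develops formally.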
Even when ρ_ε is negative or the principal components are not autocorrelated, the merits and usefulness of the "Ridge Trace" and of ridge regression are still preserved in dealing with the problem of multicollinearity.

6. RIDGE REGRESSION: ESTIMATES, MEAN SQUARE ERROR AND PREDICTION

The MSE of the OLS estimates of β can be written as the difference in length between two vectors, β̂_OLS and β [11:p.56]:

(6.1)    E(L_1^2) = E(β̂_OLS^T β̂_OLS) - β^T β.

(6.1) shows that in the presence of severe multicollinearity, the MSE can be improved by shortening the OLS estimates of β. In this section we will show that this reasoning is compatible with the derivation of the ridge estimator of β. Hence ridge regression can be expected to be better in terms of MSE.

6.1 Derivation of the Ridge Estimator for a CLR Model

Let B be any estimate of β. Its residual sum of squares, φ, can be written as the minimum sum of squares, φ_min, plus the distance from B to β̂_OLS weighted through (X^T X):

(6.2)    φ = (Y - XB)^T (Y - XB)
           = (Y - Xβ̂_OLS)^T (Y - Xβ̂_OLS) + (B - β̂_OLS)^T X^T X (B - β̂_OLS)
           = φ_min + φ(B).

For a specific value of φ(B), say φ_0, the ridge estimator is found by choosing a B to

         Minimize  B^T B
(6.3)    Subject to  (B - β̂_OLS)^T X^T X (B - β̂_OLS) = φ_0.

This problem can be solved by use of Lagrange multiplier techniques, where (1/k) is the multiplier corresponding to the constraint (6.3). The problem is to minimize

(6.4)    F = B^T B + (1/k)[(B - β̂_OLS)^T X^T X (B - β̂_OLS) - φ_0].

A necessary condition for B to minimize (6.4) is that

    ∂F/∂B = 2B + (1/k)[2(X^T X)B - 2(X^T X)β̂_OLS] = 0.

Hence

    [I + (1/k)(X^T X)] B = (1/k)(X^T X) β̂_OLS
and

    B* = β̂_R(k) = (X^T X + kI)^{-1} X^T Y,

where k is chosen to satisfy constraint (6.3). In practice, we usually work the other way round, since it is easier to choose a k > 0 and then compute the additional residual sum of squares φ_0.

It is clear that for a fixed increment φ_0 there is a continuum of values of B that will satisfy the relationship φ = φ_min + φ_0, and the ridge estimate so derived is the one with minimum length. Therefore, we may well expect the ridge estimates to yield lower MSE in the presence of multicollinearity, since they are derived by minimizing the length of the regression vector. It is true to a certain extent that minimizing the length of the regression vector is equivalent to reducing the MSE of the parameter estimates. In addition, (6.2) shows that it is possible to move appreciably away from β̂_OLS without a large increase in the residual sum of squares as (X^T X) approaches singularity. That is to say, ridge regression may achieve a large reduction in MSE at virtually no cost in terms of the residual sum of squares if the conditioning of (X^T X) is poor enough.

In 1971, Newhouse and Oman [18] used MSE as the evaluation criterion in their Monte Carlo studies of ridge regression. Since then it has become the standard way to evaluate proposals for ridge estimators. From the above derivation it is clear that ridge estimates are designed to perform well under the MSE criterion.

Now we would like to study the implications of the constraint used in deriving the ridge estimator. Since orthogonalization eases
interpretation, we represent (6.2) in the rotated axes. Let A = PB. Then

    φ = φ_min + (A - α̂_OLS)^T Λ (A - α̂_OLS) = φ_min + Σ_{i=1}^{p} λ_i (A_i - α̂_OLS,i)^2,

where A_i is the estimate of the regression coefficient for the i-th component and α̂_OLS,i is the OLS estimate of the regression coefficient for the i-th component. The problem is

         Minimize  A^T A
(6.5)    Subject to  (A - α̂_OLS)^T Λ (A - α̂_OLS) = φ_0,

or equivalently

(6.6)    Subject to  Σ_{i=1}^{p} (A_i - α̂_OLS,i)^2 λ_i = φ_0.

(6.5) shows that the vector [A - α̂_OLS] is normed through Λ to have length equal to φ_0. Since the eigenvalue λ_i can be considered an indicator of the information content and explanatory power of the i-th principal component, we may well conclude that the derivation of the ridge estimator has already taken the relative information content and explanatory power of the explanatory variables into account. (6.6) shows that the constraint has incorporated the concept of a squared-error-loss function as well. It increases the length of A the most when the parameter estimates of the important components deviate from the OLS estimates, since it is formed by taking the squares of the deviations multiplied by their corresponding eigenvalues. This implies that it is best to shrink the estimate of β for those components that have small eigenvalues, i.e. the ones most subject to instability.

6.2 Derivation of the Ridge Estimator for an ALR Model

In the presence of autocorrelated error terms, the OLS estimator of β will no longer have the minimum-variance property; the GLS-type estimator will be the BLUE of β. Our derivation of a new ridge estimator adjusted for autocorrelation, β̂_GR(k), will parallel the derivation of β̂_R(k) in the previous section. Again let B be
any estimate of β. Its residual sum of squares, φ, can be written as the value of the minimum sum of squares, φ_min, plus the distance from B to β̂_GLS weighted through (X_*^T X_*) (see (2.10) for notation):

    φ = (Y_* - X_* B)^T (Y_* - X_* B)
      = (Y_* - X_* β̂_GLS)^T (Y_* - X_* β̂_GLS) + (B - β̂_GLS)^T X_*^T X_* (B - β̂_GLS)
      = φ_min + φ(B).

For a specific value of φ(B), say φ_0, the ridge estimator is derived by minimizing B^T B

(6.7)    subject to  (B - β̂_GLS)^T X_*^T X_* (B - β̂_GLS) = φ_0.

The Lagrangian is given by

    F = B^T B + (1/k)[(B - β̂_GLS)^T X_*^T X_* (B - β̂_GLS) - φ_0].

A necessary condition for a minimum is that

    ∂F/∂B = 2B + (1/k)[2(X_*^T X_*)B - 2(X_*^T X_*)β̂_GLS] = 0.

This reduces to

(6.8)    B* = β̂_GR(k) = (X_*^T X_* + kI)^{-1} X_*^T Y_* = (X^T Ω^{-1} X + kI)^{-1} X^T Ω^{-1} Y,

where k is chosen to satisfy (6.7). The characterization of the ridge trace of β̂_GR(k) will be essentially the same as that of β̂_R(k). For a specific increment φ_0, the β̂_GR(k) so derived is the regression vector with minimum length among the continuum of values of B that satisfy the relationship φ = φ_min + φ_0. However, multicollinearity may no longer be a substantial problem after transforming X into X_* in some rare cases. For instance, this may happen in time-series studies where multicollinearity is a result of the explanatory variables increasing together over time. It is then possible that the transformed variables are not close to being collinear with each other. If that is the case, the reduction in MSE cannot be obtained with only a slight increase in the residual sum of squares.
This is because of the low MSE of β̂_GLS already achieved and the non-singularity of (X^T Ω^{-1} X). In most cases, if not all, the matrix (X^T Ω^{-1} X) is very likely to have a broad eigenvalue spectrum if (X^T X) does. Then the previous discussion on the motivation of minimizing the length of the regression vector, and on the interpretation and implications of the constraint in the derivation of the ridge estimator of β for a CLR model, will be applicable to the derivation of β̂_GR.

6.3 Mean Square Error of the "Generalized Estimators"

The MSE of β̂_GLS and β̂_GR(k) are readily established. Since Y_* = X_* β + ε_* satisfies all the assumptions of a CLR model, (5.3) gives the MSE of β̂_GLS as follows. Let L_3 = distance from β̂_GLS to β. Then

(6.9)    E(L_3^2) = σ_u^2 tr[(X^T Ω^{-1} X)^{-1}].

Setting ρ_ε = 0 in (5.8) and letting L_4(k) = distance from β̂_GR(k) to β, (6.10) gives the MSE of β̂_GR(k):

(6.10)   E[L_4^2(k)] = σ_u^2 tr[(X^T Ω^{-1} X + kI)^{-2} (X^T Ω^{-1} X)] + k^2 β^T (X^T Ω^{-1} X + kI)^{-2} β.

The effect of autocorrelation is difficult to infer from (6.9) and (6.10), since Ω is not a diagonal matrix; however, normally we may expect E(L_3^2) and E[L_4^2(k)] to be less than E(L_1^2) and E[L_2^2(k)] respectively.

6.4 Estimation

Theoretically, GLS gives the BLUE of β for an ALR model. But usually in practice, neither the order of the autocorrelation structure nor the value of the parameter ρ_ε is known.
Hence the GLS or GR estimates cannot be computed directly. Many two-stage methods have been proposed to approximate the GLS estimates and have proven to be quite effective. These include the Cochrane-Orcutt iterative process [1] and Durbin's two-step method [2].

In the joint presence of multicollinearity and autocorrelation, it is actually quite straightforward to combine ridge regression with one of the two-stage regression methods in the hope of achieving better estimates of β. We illustrate how ridge regression can be incorporated in Durbin's two-step method for a simple model with only two collinear explanatory variables:

(6.11)   Y_t = β_0 + β_1 X_{t1} + β_2 X_{t2} + ε_t,     t = 1, 2, ..., n

         ε_t = ρ_ε ε_{t-1} + u_t,     |ρ_ε| < 1

         E(u_t) = 0 for all t

         E(u_t u_{t+s}) = σ_u^2  for s = 0
                        = 0      for s ≠ 0.

The transformed relation is given by

(6.12)   Y_t - ρ_ε Y_{t-1} = β_0(1 - ρ_ε) + β_1(X_{t1} - ρ_ε X_{t-1,1}) + β_2(X_{t2} - ρ_ε X_{t-1,2}) + u_t.
It might seem reasonable to apply ridge regression at the f i r s t step of Durbin's method since X,, and X^„ are c o l l i n e a r . t l t2 v  As stated  e a r l i e r i n Section 3, high pair-wise c o r r e l a t i o n c o e f f i c i e n t of explanatory  variables does not necessarily r e s u l t i n estimation' i n - .  stability.  Besides, the lagged values of X ^ and X ^ are inserted  into the explanatory  variable set.  I f X ^ and X ^ are not autocorrelated, T  the conditioning of the enlarged u  fc  (X X) may be s a t i s f a c t o r y . Moreover  i n (6.12) has a scalar dispersion matrix, therefore OLS gives  consistent estimates of regression c o e f f i c i e n t s . estimates,  Also among these  only the c o e f f i c i e n t estimate of Y' ^ w i l l be used to compute  the transformed variables.  Hence OLS technique i s recommended to be  used at the f i r s t step even when Xj.^ and X ^ are c o l l i n e a r . This combination of ridge regression and Durbin's two-step method can e a s i l y be extended to a p-variable model with higher order of autocorrelation.  -37-  6.5  Prediction Consider a f i r s t - o r d e r ALR model, (2.15) gives the minimum  variance predictor (BLUP).  In p r a c t i c e , both $  replaced by their estimated  values.  r T O  and p are  If ridge regression i s used i n conjunction with some other methods to cope with the j o i n t problem of m u l t i c o l l i n e a r i t y and autoc o r r e l a t i o n , the p r e d i c t i o n i s given by (6.14)  Y  f c + 1  = X ^ B ^ k )  +P e £  t  st where X ^ t +  i s a 1 x p vector of the (t+1)  explanatory v a r i a b l e s , B regression c o e f f i c i e n t s , P  observation on the  (k) i s a p x 1 vector of approximated £  i s an estimate of autocorrelation c o e f f i c i e n t  and e^ i s the ridge r e s i d u a l at time t .  
7. THE MONTE CARLO STUDY

Consider a first-order ALR model with two explanatory variables; the error terms in the transformed relation, as shown by (6.12), have a scalar dispersion. The residual sum of squares from (6.12) is given by

(7.1)    Σ_t [Y_t - ρ̂_ε Y_{t-1} - β̂_0(1 - ρ̂_ε) - β̂_1(X_{t1} - ρ̂_ε X_{t-1,1}) - β̂_2(X_{t2} - ρ̂_ε X_{t-1,2})]^2.

If Y_0 and the X_0's are given, the summation can run from 1 to n; otherwise it can only run from 2 to n. The direct minimization of (7.1) with respect to β̂_0, β̂_1, β̂_2 and ρ̂_ε leads to non-linear equations; therefore analytic expressions for β̂_0, β̂_1, β̂_2 and ρ̂_ε cannot be obtained. As mentioned before, many two-stage methods have been proposed to approximate these parameters. Usually in practice, not only the parameters and the true error terms but also the order of the autocorrelation structure is unknown. As indicated previously, the joint presence of autocorrelation and multicollinearity will further complicate the situation. Under these circumstances, the relative effectiveness of the two-stage methods can best be studied by Monte Carlo experiments [19].

7.1 Design of the Experiments

The main purpose of the experiments is to give empirical support to the inferences drawn from our analytic studies. The sampling experiments are conducted in the following manner. Basically, they comprise nine different experiments with different
In addition, the error terms are normally considered to be independent, moderately and highly autocorrelated when p = .05, £  .50 and .90 respectively.  As shown by Table 1; the experiments are  set up to have different characteristics. Through this design, we can study the effects of autocorrelation on estimation and prediction for a given degree of multicollinearity.  Moreover, we can observe how these  effects of autocorrelation change as the degree of multicollinearity varies. The data is generated as follows; f i r s t , values are assigned to 8^, 8 , B and the probability characteristics of error terms u 1  0  fc  in  -40(2.9).  Three s e r i e s of e  the v a l u e s o f e  o  i n (2.9) are s u b s e q u e n t l y generated, g i v e n  and d i f f e r e n t v a l u e of p . „ e  The  probability  c h a r a c t e r i s t i c s o f the j o i n t d i s t r i b u t i o n X, ., and X^„ a r e chosen to t l t.2 J  generate the s e r i e s of X ^ t h r e e experiments.  and X ^  We  and X ^  a r e generated f o r the r e m a i n i n g  have a l s o a s s u r e d t h a t t h e r e i s no  f i r s t - o r d e r a u t o c o r r e l a t i o n i n X ^ and X ^ is first-order.  Solving for Y  of f o r t y o b s e r v a t i o n s a r e generated.  so t h a t the e r r o r  structure sets  F o r each experiment, ten samples In each sample, t h i r t y  from t = 1 to t = 30 a r e employed  by a p p r o r p i a t e methods.  significant  based on the d a t a , n i n e d i f f e r e n t  can be generated f o r the experiments.  on the Y 's  first  By v a r y i n g the c o r r e l a t i o n c o e f f i c i e n t o f X ^ and  X 2» another two s e r i e s o f X ^ s i x experiments.  t h a t a r e s u i t a b l e f o r the  observations  to e s t i m a t e the e q u a t i o n  O b s e r v a t i o n s 31 to 40 a r e used to study the  p r e d i c t i o n p r o p e r t i e s of e s t i m a t o r s .  
The BLUP i s used i n the presence  of s i g n i f i c a n t a u t o c o r r e l a t i o n i n the e r r o r  terms.  S p e c i a l c a r e has to be e x e r c i s e d i n c o n t r o l l i n g the s e r i a l c o r r e l a t i o n p r o p e r t i e s of the e r r o r terms.  In t h i s c o n n e c t i o n , OLS  r e g r e s s i o n has t o be r u n on (7.2)  e  =pe 3  C  e  ,+u J  C  j = 1,2,...,10 t = 1,2 40  3  to determine whether the e s t i m a t e d r e g r e s s i o n c o e f f i c i e n t c o n s i s t e n t w i t h the p • which i s used to generate them. i s well-known,  the OLS  p  £  is  However, as  e s t i m a t e s o f parameters f o r s m a l l samples  may  be b a d l y b i a s e d i f some o f the r e g r e s s o r s are lagged dependent variables  [23].  T h i s i s because the e r r o r terms, u_.  fc  be independent of the r e g r e s s o r s , j _ ^ > E  t  e  jt''"'' j40" G  w i l l no *  n  longer  ^• )> 2  -41-  ( -i.> jt  E  u  E  ) jt+s  -4.j_  $  0 for s  4- 0  and a l l t , hence the OLS e s t i m a t e f o r the  c o e f f i c i e n t of £..-, i s b i a s e d . 3t-l  The u s u a l t t e s t on the e s t i m a t e of  r e g r e s s i o n c o e f f i c i e n t may be q u i t e m i s l e a d i n g , t h e r e f o r e we can o n l y a s c e r t a i n t h a t the d e s i r e d s e r i a l by a s s u r i n g t h a t u ^ first  t e s t whether  t  c o r r e l a t i o n p r o p e r t i e s are o b t a i n e d  are randomly d i s t r i b u t e d .  the s e r i e s o f u  For each sample,  i s c o n s i s t e n t w i t h the p r o b a b i l i t y  c h a r a c t e r i s t i c s chosen t o generate them, then we use run determine whether u  fc  we  i s randomly d i s t r i b u t e d .  t e s t to  Only those s e r i e s of u  passed a l l the t e s t s are adopted i n our s i m u l a t i o n study.  We are  now  ready to e s t i m a t e the r e g r e s s i o n e q u a t i o n . 
First, the  f o r each experiment, the OLS p r i n c i p l e i s a p p l i e d to e s t i m a t e  parameters.  The Durbin-Watson  statistic  t e s t the e x i s t e n c e of a u t o c o r r e l a t i o n . statistic  i s used as a f i l t e r  Whenever the  to  Durbin-Watson  computed from the f i t t e d model i s l e s s than the c o r r e s p o n d i n g  upper c r i t i c a l v a l u e d^ (a=0.05), a u t o c o r r e l a t i o n i s assumed to'be p r e s e n t i n the e r r o r terms, then D u r b i n ' s two-step method i s used i n con j u n c t i o n ' w i t h Ridge r e g r e s s i o n f o r e s t i m a t i o n as d e s c r i b e d i n S e c t i o n 6.4; o n l y OLS purposes.  o t h e r w i s e , a u t o c o r r e l a t i o n i s assumed to be absent, and  and Ridge r e g r e s s i o n s t e c h n i q u e s are employed  for- e s t i m a t i o n  In a d d i t i o n , whenever the e x i s t e n c e of a u t o c o r r e l a t i o n i s ~  r e c o g n i z e d , 3„  ~ Tc  and B  j  -'  (k) are computed f o r comparison purposes.  Since the t r u e v a l u e of the a u t o c o r r e l a t i o n c o e f f i c i e n t each experiment, c a l c u l a t i o n s o f 3 and 3™(k) - GLo ~ GR  i s known f o r  w i l l simply be the  s t r a i g h t f o r w a r d m u l t i p l i c a t i o n of m a t r i c e s as shown by  (2.13) and  (6.8).  The methods adopted f o r e s t i m a t i o n i n each experiment are r e c o r d e d i n T a b l e 2.  -42-  Table 2  Experiment  Method  1  OLS,  2  Durb., Durb.+RR, GLS,  GR  3  Durb., Durb.+RR, GLS,  GR  4  OLS,  •5  OLS:  RR  RR  Durb., Durb.+RR, GLS,  GR  6  Durb., Durb.+RR,. GLS,  GR  7  OLS,  8  Durb., Durb.+RR, GLS,  GR  9.  
Durb., Durb.+RR, GLS,  GR  RR  O r d i n a r y L e a s t --squares  Durb.:  Durbin's  Durb.+RR:  Regression  Two--step Method  Durbin's Two-step i n c o n j u n c t i o n w i t h Ridge R e g r e s s i o n  GLS: GR:  C e n t r a l i z e d Least-squares  Regression  Ridge R e g r e s s i o n a d j u s t e d f o r A u t o c o r r e l a t i o n  As i s expected, no c o r r e c t i o n f o r a u t o c o r r e l a t i o n i s n e c e s s a r y f o r experiments  1, 4 and 7.  Whenever t h e r i d g e method i s a p p l i e d seven o r  e i g h t v a l u e s o f k have been used i n our study.  I n o r d e r t o minimize  the e f f e c t s o f s u b j e c t i v i t y r e s u l t i n g from s e l e c t i n g the v a l u e o f k  A.  i n r i d g e r e g r e s s i o n s , we compute the mean 6 o f the samples f o r every ~R s p e c i f i c v a l u e o f k i n each experiment. based  on a "Mean Ridge T r a c e " .  That  Then the v a l u e o f k i s s e l e c t e d  i s to say, a unique  s e l e c t e d w i l l g e n e r a l l y be t h e b e s t f o r a l l then samples.  value of k Obviously,  the v a l u e o f k s o - s e l e c t e d may w e l l n o t t o be the b e s t f o r every i n d i v i d u a l sample.  T h e r e f o r e , t h e minimum o f t h e MSE o f r i d g e  e s t i m a t e s o f 3 a c h i e v e d f o r each experiment  i s s l i g h t l y upward b i a s e d .  C e r t a i n l y t h i s way o f s e l e c t i n g the v a l u e o f k cannot be used i n practice.  7.2  Sampling R e s u l t s For  each method f o r each experiment, the MSE o f the e s t i m a t e s 2  of  t h e a d j u s t e d R , t h e r e s i d u a l sum o f squares, the MSE f o r e c a s t  and the Durbin-Watson s t a t i s t i c a r e averaged over t e n samples. a d d i t i o n , t h e mean e s t i m a t e statistic  a r e a l s o computed,  In  o f p . 
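The Durbin-Watson filtering rule described above can be sketched as follows. This is an illustration under my own function names; the Durbin, GLS and GR branches are only labelled here, not implemented, and `d_upper` plays the role of d_U(α = 0.05).

```python
import numpy as np

def durbin_watson(resid):
    """d = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2; values near 2 suggest no
    first-order autocorrelation, small values suggest positive autocorrelation."""
    return float(np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2))

def choose_methods(X, y, d_upper=1.56):
    """Fit OLS, then compare d with the upper critical value to decide
    which estimators to run, as in Table 2."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    d = durbin_watson(y - X @ beta)
    if d < d_upper:
        return d, "Durb., Durb.+RR, GLS, GR"   # autocorrelation assumed present
    return d, "OLS, RR"                        # autocorrelation assumed absent

# A smooth drift in y around a constant regressor leaves trending
# residuals, so d is far below d_U and the corrected branch is chosen:
n = 40
d, methods = choose_methods(np.ones((n, 1)), np.arange(n, dtype=float))
```

With residuals that alternate in sign the statistic rises above 2 instead, and the filter keeps the uncorrected OLS/ridge branch.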
u_t is assumed to follow a normal distribution with mean zero and variance equal to 6, i.e. u_t ~ N(0, 6). The true mean of X_t1 is 10 and that of X_t2 is 8. The respective variances of X_t1 and X_t2 are 18 and 15. v is chosen to be 3 for each sample. The true value of β_0 is 5, β_1 is 1.1 and β_2 is 1.

7.2a  Results assuming ρ_e is Known

First we assume ρ_e is known. The results here will indicate whether the methods described in Section 6.4 can show promise in the best of situations. Table 3 contains the average MSE of β_GLS and β_GR for experiments 2, 3, 5, 6, 8 and 9.

Table 3

 Experiment     2           3           5           6           8           9
            (γ12=.05)   (γ12=.05)   (γ12=.50)   (γ12=.50)   (γ12=.95)   (γ12=.95)
    k       (ρ_e=.50)   (ρ_e=.90)   (ρ_e=.50)   (ρ_e=.90)   (ρ_e=.50)   (ρ_e=.90)

  0.0*        .4824      3.2913       .0834      2.3940       .1011      2.1681
  .025        .0591      1.8084       .0018      1.4991
  .05         .0363       .8073       .1215       .8337       .0561       .9405
  .075        .3603       .2274       .3033       .4263
  .1          .9849       .0156       .8871       .1092       .4929       .2901
  .2         5.7786      2.0100      4.0851       .6153      2.4426       .4242
  .3         2.9771      7.0896      9.3743      4.0521      5.4924      1.0113
  .5        30.1494     21.7887     21.1581     11.4165     13.7628      4.8285
  .7        47.5367     36.6241     35.2063     22.2489     23.7003     10.9524
  1.0       75.6732     63.4542     56.5869     39.9762     38.5830

* GLS regressions can be considered as a special case of Ridge regressions adjusted for autocorrelation, with k = 0.

In Section 5, it has been shown that the MSE of the OLS estimates of β will increase rapidly if significantly positive autocorrelation exists both in the disturbances and in the principal components. Correction for autocorrelation will then be necessary in estimating the regression equation.
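With ρ_e known, β_GLS and β_GR(k) are direct matrix computations, as noted above. The sketch below is my own implementation of the estimator (X'Ω⁻¹X + kI)⁻¹X'Ω⁻¹Y, using the standard closed form for the inverse AR(1) covariance matrix (up to the factor σ_u²); the function names are assumptions.

```python
import numpy as np

def ar1_omega_inv(rho, n):
    """Inverse of the AR(1) error covariance matrix Omega, up to the scalar
    sigma_u^2: tridiagonal, with 1 at the two corner entries, 1 + rho^2 on
    the interior diagonal, and -rho just off the diagonal."""
    M = (1 + rho**2) * np.eye(n)
    M[0, 0] = M[-1, -1] = 1.0
    i = np.arange(n - 1)
    M[i, i + 1] = M[i + 1, i] = -rho
    return M

def beta_gr(X, y, rho, k):
    """beta_GR(k) = (X' Omega^{-1} X + k I)^{-1} X' Omega^{-1} y.
    k = 0 reduces to GLS, as in the footnote to Table 3."""
    Oi = ar1_omega_inv(rho, len(y))
    return np.linalg.solve(X.T @ Oi @ X + k * np.eye(X.shape[1]), X.T @ Oi @ y)
```

For ρ = 0 and k = 0 this collapses to OLS, and for k > 0 the solution norm shrinks, which is the bias-for-variance trade behind the ridge adjustment.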
Though the GLS regression yields the BLUE of β, the behavior of the MSE of β_GLS is very difficult to infer from (6.10). From the MSE of experiments 3, 6 and 9 when k = 0, we observe that the MSE of β_GLS decreases as the degree of multicollinearity increases for a sufficiently high degree of autocorrelation. On the other hand, for a given degree of multicollinearity, the MSE of β_GLS will increase as the degree of autocorrelation increases. But the magnitude of the increase in the MSE of β_GLS decreases as the relation among the explanatory variables increases. For instance, the difference in the MSE of β_GLS between experiments 8 and 9 is less than that between experiments 5 and 6. Moreover, Table 3 shows that there exists at least one value of k for each experiment such that the MSE of β_GR is less than that of β_GLS. Note that when ρ_e = .9, k = .1 obtains the minimum MSE of the estimates of β in experiment 9. This also implies that the transformed matrix (X_T'X_T) is still ill-conditioned. In Section 5.3 we have shown that the range of k such that the MSE of β_R is less than that of β_OLS will be larger if multicollinearity is accompanied by a high degree of autocorrelation. Now, with parameter estimates fully adjusted for autocorrelation (since ρ_e is known), experiment 9 still has the largest admissible range of k. That is, the range of k such that the MSE of β_GR is less than that of β_GLS is still larger if multicollinearity is accompanied by a high degree of autocorrelation and autocorrelation has been fully adjusted. We also observe that as the degree of autocorrelation increases, a larger reduction in the MSE of the estimates of β can be obtained by replacing β_GLS with β_GR.
For instance, the difference in the MSE of β_GLS and β_GR(.05) in experiment 8 is less than that in experiment 9. However, the behavior of the MSE of β_GR is very difficult, if not impossible, to predict. (6.10) shows that the MSE of β_GR is comprised of two terms. How each term behaves will depend not only on the data matrix X and the degree of autocorrelation but also on the way the matrix X is linked with the matrix Ω⁻¹.

7.2b  Results assuming ρ_e is Unknown

In practice ρ_e is unknown. We assume that the true autocorrelation coefficient is unknown and we try to fit the equation using heuristic techniques akin to Durbin's two-step method, which has been shown by Griliches and Rao [8] to perform well when there is autocorrelation. We apply these techniques as described in Section 7.1. Tables 4, 5 and 6 report the mean adjusted R² and the mean Durbin-Watson statistic for each experiment. (d_U(α = 5%) = 1.57 for experiments 1, 4 and 7; d_U(α = 5%) = 1.56 for the remaining six experiments.)

Table 4  (γ12 = .05)

 Experiment       1                  2                  3
             (ρ_e = .05)        (ρ_e = .50)        (ρ_e = .90)
    k       R²a*     d**       R²a      d          R²a      d

  0.0      .8640   2.0791     .8979   1.8766     .9149   1.8380
  .025     .8640   2.0903     .8884   1.8713     .9144   1.8269
  .05      .8620   2.1001     .8868   1.8728     .9128   1.8286
  .075     .8579   2.1083     .8844   1.8805     .9104   1.8409
  .1       .8567   2.1154     .8813   1.8916     .9071   1.8613
  .2       .8401   2.1353     .8619   1.9619     .8888   1.9809
  .3       .8167   2.1463     .8399   2.0383     .8648   2.1030
  .5       .7651   2.1536     .7865   2.1563     .8102   2.2789
  1.0      .6394   2.1527     .6571   2.2960     .6780   2.4740

 * R²a: the mean adjusted R²
 ** d:  the mean Durbin-Watson statistic

Table 5  (γ12 = .50)

 Experiment       4                  5                  6
             (ρ_e = .05)        (ρ_e = .50)        (ρ_e = .90)
    k       R²a      d         R²a      d          R²a      d

  0.0      .8973   2.0984     .9178   1.8958     .9475   2.0754
  .025     .8970   2.1040     .9175   1.9054     .9472   2.0865
  .05      .8962   2.1095     .9168   1.9194     .9464   2.1059
  .075     .8950   2.1147     .9155   1.9371     .9451   2.1317
  .1       .8933   2.1198     .9138   1.9576     .9434   2.1622
  .2       .8832   2.1381     .9032   2.0540     .9336   2.3024
  .3       .8702   2.1592     .8895   2.1504     .9184   2.4532
  .5       .8351   2.1721     .8546   2.2988     .8834   2.6086
  .7       .7970   2.1826     .8187   2.4203     .8440   2.7084
  1.0      .7412   2.1900     .7572   2.4732     .7846   2.7887

Table 6  (γ12 = .95)

 Experiment       7                  8                  9
             (ρ_e = .05)        (ρ_e = .50)        (ρ_e = .90)
    k       R²a      d         R²a      d          R²a      d

  0.0      .9208   2.0511     .9391   1.8785     .9575   1.8707
  .05      .9201   2.0873     .9380   1.9058     .9565   1.8883
  .1       .9181   2.1128     .9359   1.9437     .9544   1.9335
  .2       .9116   2.1473     .9293   2.0280     .9677   2.0562
  .3       .9024   2.1659     .9199   2.1084     .9382   2.1660
  .5       .8786   2.1768     .8957   2.2312     .9136   2.8381
  .7       .8505   2.1725     .8673   2.3082     .8847   2.4417
  1.0      .8056   2.1594     .8217   2.3739     .8399   2.5259

From Tables 4-6, we observe that the adjusted R² increases as the degree of autocorrelation increases for a given value of k and a given degree of multicollinearity. This is intuitively plausible since autocorrelation can account for part of the variation in the errors, thereby decreasing the residual sum of squares and increasing the adjusted R². For each experiment, the adjusted R² decreases as k increases. The reason is obvious from the derivation of ridge estimators. Besides, the best R²a achieved for each experiment is quite high; that is, the estimated model can explain most of the variation in Y_t.
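The two-step procedure applied throughout this subsection can be sketched as follows. This is a minimal reconstruction with my own function names: step 1 takes the coefficient on the lagged dependent variable as ρ̂, step 2 applies OLS (k = 0) or ridge (k > 0) to the quasi-differenced data; the check below uses the design values quoted earlier (β = (5, 1.1), AR(1) errors).

```python
import numpy as np

rng = np.random.default_rng(3)

def durbin_two_step(X, y, k=0.0):
    """Durbin's two-step method, optionally combined with ridge (k > 0).
    Assumes the first column of X is the intercept."""
    # Step 1: regress y_t on y_{t-1}, x_t and x_{t-1}; the coefficient on
    # y_{t-1} serves as rho_hat (downward-biased in small samples, cf. Table 9).
    Z = np.column_stack([y[:-1], X[1:], X[:-1, 1:]])
    rho = float(np.linalg.lstsq(Z, y[1:], rcond=None)[0][0])
    # Step 2: quasi-difference the data and fit OLS or ridge.
    Xs, ys = X[1:] - rho * X[:-1], y[1:] - rho * y[:-1]
    beta = np.linalg.solve(Xs.T @ Xs + k * np.eye(Xs.shape[1]), Xs.T @ ys)
    return rho, beta

# Simulated check: one regressor with mean 10 and variance 18, AR(1)
# errors with rho_e = 0.9, true coefficients (5, 1.1).
n = 300
x1 = rng.normal(10.0, np.sqrt(18.0), n)
X = np.column_stack([np.ones(n), x1])
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.9 * e[t - 1] + rng.normal(0.0, 1.0)
y = X @ np.array([5.0, 1.1]) + e
rho_hat, beta_hat = durbin_two_step(X, y)
```

With a larger sample than the thesis uses (n = 300 rather than 40), ρ̂ lands near 0.9 and the slope near 1.1; at n = 40 the downward bias in ρ̂ reported in Table 9 becomes visible.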
The high R²a values also imply that the estimation methods adopted in our experiments are fairly efficient and powerful. The mean Durbin-Watson statistic computed for each method for each experiment is high enough to ascertain that the fitted model has successfully removed the problem of autocorrelation. Since the model is reasonably well fitted, simulation comparisons of the experimental results should be meaningful as well as informative.

The average MSE of the estimates of β is computed for each method for each experiment and recorded in Table 7.

Table 7

 Experiment    1        2        3        4        5        6        7        8        9
    k

  0.0        .1101    .4824    .9594    .0030    .0342    .3996    .0180    .0720    .6951
  .025       .0192    .0570    .2691    .1104    .0210    .0945      -        -        -
  .05        .2865    .0390    .0087    .4158    .1965    .0024    .1833    .0719    .0939
  .075       .8820    .3744    .1200    .8973    .5778    .1026      -        -        -
  .1        1.7430   1.0167    .5559   1.5366   1.1097    .3765    .8307    .5643    .0041
  .2        7.3539   5.9115   4.7694   5.3823   4.5690   2.7540   3.2001   1.2549    .5822
  .3       15.1383  13.2456  11.5962  10.8432   9.5949   6.8219   6.6531   5.7624   3.3012
  .5       33.5067  31.1559  28.8369  23.9694  22.3449  18.6255  15.6804  14.2377  10.0251
  .7          -        -        -     38.6334  36.9396  32.0214  26.3049  24.3789  18.5115
  1.0      79.3134  76.8708  73.8630  60.7620  50.3176  52.8276  43.3464  40.2351  32.3991

As in the known autocorrelation case, when k = 0 the MSE of the estimates of β increases as the degree of autocorrelation increases, given the degree of multicollinearity.
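The averaging behind Table 7, the squared estimation error of β̂ accumulated over repeated samples for each k on the grid, can be sketched as follows. This is an illustrative reconstruction using the design values quoted earlier; the collinearity construction and function names are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

def ridge_fit(X, y, k):
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)

def average_mse(beta_true, k_grid, gamma12=0.95, n_samples=10, n=40):
    """Average squared error of the ridge estimates over repeated samples,
    one value per k, mirroring the ten-samples-per-experiment design."""
    mse = np.zeros(len(k_grid))
    for _ in range(n_samples):
        x1 = rng.normal(10.0, np.sqrt(18.0), n)
        # X_t2 with mean 8, variance 15 and correlation gamma12 with X_t1:
        x2 = (8.0 + gamma12 * np.sqrt(15.0 / 18.0) * (x1 - 10.0)
              + rng.normal(0.0, np.sqrt(15.0 * (1 - gamma12**2)), n))
        X = np.column_stack([np.ones(n), x1, x2])
        y = X @ beta_true + rng.normal(0.0, np.sqrt(6.0), n)
        for i, k in enumerate(k_grid):
            mse[i] += np.sum((ridge_fit(X, y, k) - beta_true) ** 2)
    return mse / n_samples

ks = np.array([0.0, 0.025, 0.05, 0.1, 0.2, 0.5])
mse = average_mse(np.array([5.0, 1.1, 1.0]), ks)
best_k = ks[np.argmin(mse)]
```

Selecting the k that minimises this averaged curve is exactly the "mean ridge trace" choice described in Section 7.1, and, as the text notes, it is only feasible in a simulation where β is known.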
On the other hand, unlike the known autocorrelation case, the MSE of the estimates of β first decreases then increases as the degree of multicollinearity increases for k = 0 and a given degree of autocorrelation. Table 7 shows that, except for experiments 4 and 7, better estimates of β in the MSE criterion can be obtained if Durbin's two-step method is combined with ridge regression for estimation. Besides, amazingly, we have found that we are able to obtain better estimates of β in terms of MSE if the true autocorrelation coefficient ρ_e is unknown. For clarity, we shall compare only the minimum of the average MSE of the estimates of β achieved for each experiment in the ρ_e unknown case with that in the ρ_e known case. Table 8 reports the minima of the average MSE of the estimates of β achieved for each experiment in both the known and unknown cases. In addition, the estimation method and the characteristics of each experiment are also tabulated.

Table 8

                                           (ρ_e unknown)         (ρ_e known)
 Experiment  (γ12, ρ_e)   Estimation      k     Min. MSE        k     Min. MSE
                          Method                of β̂                 of β_GR

     1       (.05, .05)      RR          .025    .0192          -        -
     2       (.05, .50)    Durb.+RR      .05     .0390         .05     .0363
     3       (.05, .90)    Durb.+RR      .05     .0087         .1      .0156
     4       (.50, .05)      OLS         0.0     .0030          -        -
     5       (.50, .50)    Durb.+RR      .025    .0210         .025    .0018
     6       (.50, .90)    Durb.+RR      .05     .0024         .1      .1092
     7       (.95, .05)      OLS         0.0     .0180          -        -
     8       (.95, .50)    Durb.+RR      .05     .0719         .05     .0561
     9       (.95, .90)    Durb.+RR      .1      .0441         .1      .2901

A couple of interesting observations can be made from Table 8.
if  First  the degree o f m u l t i c o l l i n e a r i t y i s h e l d c o n s t a n t , the minimum o f the  average  MSE o f parameter e s t i m a t e s w i l l  first  the degree o f a u t o c o r r e l a t i o n i n c r e a s e s .  i n c r e a s e then decrease as  On the o t h e r hand, g i v e n the  degree o f a u t o c o r r e l a t i o n , t h e minimum o f the average estimates w i l l  first  decrease  c o l l i n e a r i t y increases.  MSE o f the parameter  then i n c r e a s e as the degree o f m u l t i -  These a r e i n t u i t i v e l y p l a u s i b l e s i n c e s u f f i c i e n t  h i g h degree o f a u t o c o r r e l a t i o n s h o u l d - l e a d to more s t a b l e parameter e s t i m a t e s w h i l e s u f f i c i e n t h i g h degree o f m u l t i c o l l i n e a r i t y  usually  r e s u l t s i n v e r y u n s t a b l e parameter e s t i m a t e s .  observe  t h a t the v a l u e o f r i d g e parameter k, used of  MSE  Secondly, we  t o a c h i e v e the minimum MSE  the e s t i m a t e s o f 8, i n c r e a s e s w i t h the degree o f m u l t i c o l l i n e a r i t y  and a u t o c o r r e l a t i o n .  T h i s i s c o n s i s t e n t w i t h our a n a l y t i c  findings  shown i n Section 5.3. Moreover, we have found that knowing  does not  give better estimates of 6 for s u f f i c i e n t high degree of autocorrelation. This may r e s u l t from sample sizes being small. Table 9 contains the mean estimates of p^ obtained i n the f i r s t step of Durbin's method and the mean Haitovsky h e u r i s t i c s t a t i s t i c for each experiment.  Table 9  Experiment  p  c  Bias i n p e H x^ df = 3 2  1  2  3  —  .3581  125.7  4  5  6  .7182  .3586  .7231  .1419  .1818  .1414  .1769  123.1  111.7  38.7  37.4  39.1  7  — 2.78  8  9  .3849  .7498  .1151 .1502 2.53  2.40  In a l l cases, Durbin's two-step method tends to underestimate the true autocorrelation c o e f f i c i e n t .  
This underestimation results from the presence of the lagged Y values among the explanatory variables [16]. If the degree of multicollinearity is held constant, the bias of the estimate of ρ_e increases as the degree of autocorrelation increases; while given the degree of autocorrelation, the bias decreases as the degree of multicollinearity increases. In our simulation study, the Haitovsky heuristic statistic can recognize the existence of severe multicollinearity in experiments 7, 8 and 9. However, it does not give any warning when there exists a fairly high degree of multicollinearity; i.e., based on the Haitovsky test, multicollinearity is insignificant in experiments 4, 5 and 6. Since the Haitovsky test is based on the determinant of the correlation matrix, it has some built-in deficiencies (see Section 3.3 for details). Our experiments have disclosed these deficiencies to a certain extent; hence we suggest that special care has to be exercised in applying this test.

7.2c  Forecasting

Tables 10, 11 and 12 report the average residual sums of squares and the mean square error of prediction from the given values for the forecast period of each experiment, under the assumption that ρ_e is unknown.

Table 10  (γ12 = .05, σ²_u = 6)

 Experiment       1                   2                   3
    k       σ̂²u*   MSE_F/C**    σ̂²u    MSE_F/C     σ̂²u    MSE_F/C

  0.0      5.9700   8.1132     5.7055   9.1966     5.6351   10.939
  .025     5.9924   8.0343     5.7332   9.6623     5.6721   10.952
  .05      6.0554   7.9961     5.8113   9.0620     5.7762   11.034
  .075     6.1536   7.9986     5.9328   9.1072     5.9382   11.173
  .1       6.2824   8.0343     6.0918   9.1913     6.1501   11.360
  .2       7.0250   8.4293     7.0074   9.8181     7.3716   12.470
  .3       8.0022   9.0877     8.2087   10.739     8.9754   13.932
  .5      10.245   10.755     10.957   12.944     12.6521   17.249
  1.0     15.733   15.075     17.656   18.419     21.649    25.164

 * σ̂²u:      the average of the residual sums of squares over ten samples.
 ** MSE_F/C: the average MSE of predictions from the given values for the forecast period.

Table 11  (γ12 = .50, σ²_u = 6)

 Experiment       4                   5                   6
    k       σ̂²u    MSE_F/C      σ̂²u    MSE_F/C     σ̂²u    MSE_F/C

  0.0      6.0169   8.2093     5.8625   9.3838     5.7757   10.733
  .025     6.0331   8.1743     5.8828   9.3691     5.8038   10.731
  .05      6.0797   8.1690     5.9409   9.3874     5.8838   10.672
  .075     6.1541   8.1905     6.0330   9.4351     6.0109   10.850
  .1       6.2518   8.2360     6.1559   9.5093     6.1803   10.961
  .2       6.8444   8.6142     6.8849   10.021     7.1976   11.679
  .3       7.2030   9.2476     7.9231   10.783     8.6992   12.482
  .5       9.7190  10.851     10.470   12.735     12.128    15.327
  .7      11.933   12.726     13.070   14.851     15.759    18.018
  1.0     15.429   15.620     17.566   18.301     21.929    22.680

Table 12  (γ12 = .95, σ²_u = 6)

 Experiment       7                   8                   9
    k       σ̂²u    MSE_F/C      σ̂²u    MSE_F/C     σ̂²u    MSE_F/C

  0.0      6.0443   8.3699     5.6725   9.8589     5.4033   11.335
  .05      6.1186   8.1165     5.7759   9.4141     5.5390   10.758
  .1       6.2753   8.1220     6.0473   9.3568     5.8110   10.754
  .2       6.7890   8.4120     6.6160   9.5945     6.7433   11.172
  .3       7.5180   8.9408     7.5161   10.116     7.9187   12.001
  .5       9.4101   9.8466    11.683   11.120     14.301    10.445
  .7      11.643   12.290     12.584   13.651     14.881    17.123
  1.0     15.198   15.343     16.973   16.918     20.713    21.591

Though the BLUP is adopted for forecast purposes, the MSE of prediction will still increase as the degree of autocorrelation increases.
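The one-step BLUP behind these forecasts adds the autocorrelation carry-over to the regression prediction. A minimal sketch (the function name is mine; ρ and β would come from the fitting stage):

```python
import numpy as np

def blup_forecast(X, y, x_next, rho, beta):
    """One-step-ahead BLUP under AR(1) errors:
    y_hat_{n+1} = x_{n+1}' beta + rho * (y_n - x_n' beta)."""
    e_last = y[-1] - X[-1] @ beta
    return float(x_next @ beta + rho * e_last)
```

With ρ = 0 this reduces to the plain regression forecast; a positive last residual combined with ρ > 0 pushes the forecast up by ρ times that residual.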
However, the main point is that the presence of multicollinearity will adversely affect the predictive performance if the disturbances are highly serially correlated. The commonly held belief that the predictive power of the model is not affected by the existence of multicollinearity is only true if the problem of autocorrelation is not serious. In the 9th experiment, the model fitted by Durbin's method gives satisfactory results on various diagnostic tests; still, the prediction of the BLUP leaves much to be desired. Fortunately, with Durbin's method combined with Ridge regression, we are able to determine the model which will yield less MSE of prediction and to perform well on various tests in the joint presence of multicollinearity and autocorrelation. We also observed that the value of k giving the best estimates of β in the MSE criterion may not yield less MSE of predictions. However, the value of k that yields the best estimate of the residual sum of squares will usually give the minimum MSE of prediction for each experiment. Hence, we may still conclude the validity of the MSE of prediction criterion in the evaluation of parameter estimates.

To avoid confusion, we have not reported the MSE of prediction based on β_GLS and the true ρ_e. However, we found that the MSE of prediction so based is less than that based on the parameter estimates obtained by Durbin's two-step method in conjunction with ridge regression. Though Durbin's two-step method combined with ridge regression gives better estimates of β, in general it underestimates ρ_e. Therefore, the BLUP based on β_GLS and the true ρ_e gives the minimal MSE of prediction in each of the experiments 2, 3, 5, 6, 8, and 9.

CONCLUSIONS

It has been shown that in the presence of multicollinearity with a sufficiently high degree of autocorrelation, the OLS estimates of regression coefficients can be highly inaccurate. Improving the estimation procedure is obviously necessary. Combining GLS and Ridge regression, we derived a new estimator

    β_GR(k) = (X'Ω⁻¹X + kI)⁻¹ X'Ω⁻¹Y,    where 0 < k < 1

and Ω is defined in (2'). β_GR(k), though biased, is expected to perform well in the joint presence of multicollinearity and autocorrelation. However, since Ω is unknown, parameter estimates based on the biased estimator β_GR(k) cannot be obtained in practice. Therefore, we combined Durbin's two-step method with ordinary Ridge regression to approximate those parameters. The effectiveness of our approximation can then best be examined by Monte Carlo simulation.

Our study has confirmed that, for a given degree of multicollinearity, the MSE of the GLS estimates of β is directly proportional to the degree of autocorrelation. This agrees with conventional
Ridge  T h i s agrees w i t h  proportioned  conventional  Unexpectedly, we found t h a t the MSE o f t h e GLS e s t i m a t e s o f  3 i s i n v e r s e l y p r o p o r t i o n a l to the degree of m u l t i c o l l i n e a r i t y f o r a s u f f i c i e n t l y high  degree o f a u t o c o r r e l a t i o n .  T h i s i m p l i e s t h a t i n the  a p p l i c a t i o n o f the GLS t e c h n i q u e , the symptom o f the e x i s t e n c e o f m u l t i c o l l i n e a r i t y may be d i s g u i s e d .  However, s i n c e i n p r a c t i c e n e i t h e r  the t r u e e r r o r terms nor GLS in  the a u t o c o r r e l a t i o n c o e f f i c i e n t  e s t i m a t e s can p o s s i b l y be o b t a i n e d .  We  were p l e a s e d  the j o i n t presence of m u l t i c o l l i n e a r i t y and  i s known, no to f i n d  autocorrelation;  whatever the degree i s , Durbin's two-step method i n c o n j u n c t i o n Ridge r e g r e s s i o n the GLS  with  (p^ unknown) y i e l d s even b e t t e r e s t i m a t e s of 8 than  technique ( p  £  known) does i n MSE  criterion.  Though the  o f k g i v i n g b e t t e r e s t i m a t e s of 3 tends to y i e l d l e s s MSE prediction, s t i l l the cases.  that  the GLS  gives  value  of  the minimal-MSE o f p r e d i c t i o n i n a l l  Besides, our e x p e r i m e n t a l r e s u l t s have shown t h a t  Durbin-Watson t e s t f o r d e t e c t i n g the e x i s t e n c e  the  of f i r s t - o r d e r  auto-  c o r r e l a t i o n remains p o w e r f u l i n the presence of m u l t i c o l l i n e a r i t y w h i l e the H a i t o v s k y h e u r i s t i c s t a t i s t i c g i v e s r e l a t i v e l y l i m i t e d about the e x i s t e n c e  of - m u l t i c o l l i n e a r i t y e i t h e r w i t h or without  presence of a u t o c o r r e l a t e d Our "optimal  autocorrelation.  to the s e a r c h  independent phenomena.  
Empirical research  f o r optimal  d e a l i n g w i t h m u l t i c o l l i n e a r i t y and and  find,an  package" t h a t d e a l s w i t h the j o i n t problem of  m u l t i c o l l i n e a r i t y and been c o n f i n e d  the  e r r o r terms.  r e s u l t s a l s o suggest t h a t i t might be p o s s i b l e to estimation  information  The  estimation  autocorrelated  ordinary  has  hitherto  techniques i n e r r o r s as  separate  ridge regression, i . e . , T  adding a constant  k on the d i a g o n a l  of c o r r e l a t i o n m a t r i x (X X)  Durbin's two-step method have been shown to be v e r y techniques i n h a n d l i n g  m u l t i c o l l i n e a r i t y and  Even though s a t i s f a c t o r y e s t i m a t i o n  and  the combination o f Durbin's method and t h e r e may  still  e x i s t some o t h e r  and  powerful  a u t o c o r r e l a t i o n problems.  p r e d i c t i o n are o b t a i n e d ordinary  ridge  by  regression,  even more e f f i c i e n t approaches to  the j o i n t problem o f m u l t i c o l l i n e a r i t y and  autocorrelation.  For  -58-  i n s t a n c e , the combination  of the Cochrane-Orcutt  procedure  G e n e r a l i z e d Ridge r e g r e s s i o n i s a more f l e x i b l e e s t i m a t i o n and  thereby  with technique  s h o u l d l e a d to b e t t e r e s t i m a t i o n and p r e d i c t i o n .  Allowing  f o r h i g h e r o r d e r and mixed o r d e r a u t o c o r r e l a t i o n w i l l be a good d i r e c t i o n to pursue as w e l l .  BIBLIOGRAPHY Cochrane, D. and Orcutt, G. H. (1949). Application of l e a s t squares regressions to relationships containing autocorrelated error terms. J . Am. S t a t i s t . Assoc., 44, 32-61. Durbin, J . (1960). Estimation of parameters i n time-series regression models. J . Royal S t a t i s t . S o c , Series B, 139-153. Durbin, J . (1970). 
Testing for serial correlation in least squares regression when some of the regressors are lagged dependent variables. Econometrica, 38, 410-421.

[4] Durbin, J. and Watson, G. S. (1950). Testing for serial correlation in least-squares regression (part 1). Biometrika, 37, 409-428.

[5] Durbin, J. and Watson, G. S. (1951). Testing for serial correlation in least-squares regression (part 2). Biometrika, 38, 159-178.

[6] Farrar, D. E. and Glauber, R. R. (1967). Multicollinearity in regression analysis: the problem revisited. Rev. Economics Statistics, 49, 92-107.

[7] Graybill, F. A. (1976). Theory and Application of the Linear Model. Duxbury Press, North Scituate, Mass.

[8] Griliches, Z. and Rao, P. (1969). Small-sample properties of several two-stage regression methods in the context of autocorrelated errors. JASA, 64, 253-272.

[9] Haitovsky, Y. (1969). Multicollinearity in regression analysis: comment. Rev. Economics Statistics, 486-489.

[10] Henshaw, R. C., Jr. (1966). Testing single-equation least squares regression models for autocorrelated disturbances. Econometrica, 34, 646-660.

[11] Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12, 55-67.

[12] Hoerl, A. E., Kennard, R. W., and Baldwin, K. F. (1975). Ridge regression: some simulations. Comm. Stat., 4, 105-123.

[13] Johnston, J. (1972). Econometric Methods. 2nd edn., McGraw-Hill.

[14] Klein, L. R. (1962). An Introduction to Econometrics. Prentice-Hall.

[15] Liu, T. C. (1960). Underidentification, structural estimation and forecasting. Econometrica, 28, 856.

[16] Marriott, F. H. C. and Pope, J. A. (1954). Bias in the estimation of autocorrelation. Biometrika, 41, 390-402.

[17] Nerlove, M. and Wallis, K. F. (1966). Use of the Durbin-Watson statistic in inappropriate situations.
Econometrica, 34, 235-238.

[18] Newhouse, J. P. and Oman, S. D. (1971). An evaluation of ridge estimators. Rand report No. R-716-PR.

[19] Smith, V. K. (1973). Monte Carlo Methods. Lexington, Mass.

[20] Thisted, R. A. (1976). Ridge regression, minimax estimation and empirical Bayes methods. Ph.D. thesis, Tech. Report 28, Biostatistics Dept., Stanford University.

[21] Thisted, R. A. (1978). Multicollinearity, information and ridge regression. Statistics Dept., University of Chicago.

[22] von Neumann, J. (1941). Distribution of the ratio of the mean square successive difference to the variance. Ann. Math. Stat., 12, 367-395.

[23] White, J. S. (1961). Asymptotic expansions for the mean and variance of the serial correlation coefficient. Biometrika, 48, 85-94.
