MULTICOLLINEARITY, AUTOCORRELATION, AND RIDGE REGRESSION

by

JACKIE JEN-CHY HSU

B.A. in Econ., The National Taiwan University, 1977

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE (BUSINESS ADMINISTRATION) IN THE FACULTY OF GRADUATE STUDIES, THE FACULTY OF COMMERCE AND BUSINESS ADMINISTRATION

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
February 1980
(c) Jackie Jen-Chy Hsu, 1980

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Commerce & Business Admin.
The University of British Columbia, 2075 Wesbrook Place, Vancouver, Canada V6T 1W5
Date: Feb. 8, 1980

ABSTRACT

The presence of multicollinearity can induce large variances in the ordinary least-squares estimates of regression coefficients. It has been shown that ridge regression can reduce this adverse effect on estimation. The presence of serially correlated error terms can also cause serious estimation problems, and various two-stage methods have been proposed to obtain good estimates of the regression coefficients in this case.

Although the multicollinearity and autocorrelation problems have long been recognized in regression analysis, they are usually dealt with separately. This thesis explores the joint effects of these two conditions on the mean square error properties of the ordinary ridge estimator as well as the ordinary least-squares estimator. We show that ridge regression is doubly advantageous when multicollinearity is accompanied by autocorrelation in both the errors and the principal components. We then derive a new ridge-type estimator that is adjusted for autocorrelation. Finally, using simulation experiments with different degrees of multicollinearity and autocorrelation, we compare the mean square error properties of various estimators.

TABLE OF CONTENTS

1. INTRODUCTION
2. NOTATION AND PRELIMINARIES
3. MULTICOLLINEARITY
   3.1 Sources
   3.2 Effects
   3.3 Detection
4. AUTOCORRELATION
   4.1 Sources
   4.2 Effects
   4.3 Detection
5. JOINT EFFECTS OF MULTICOLLINEARITY AND AUTOCORRELATION
   5.1 Mean Square Error of the OLS Estimates of β
   5.2 Mean Square Error of the Ridge Estimates of β
   5.3 When Will Ridge Estimates Be Better than the OLS Estimates?
   5.4 Use of the "Ridge Trace"
6. RIDGE REGRESSION: ESTIMATES, MEAN SQUARE ERROR AND PREDICTION
   6.1 Derivation of the Ridge Estimator for a CLR Model
   6.2 Derivation of the Ridge Estimator for an ALR Model
   6.3 Mean Square Error of the "Generalized Estimates"
   6.4 Estimation
   6.5 Prediction
7. THE MONTE CARLO STUDY
   7.1 Design of the Experiments
   7.2 Sampling Results
       7.2a Results assuming ρ_ε is known
       7.2b Results assuming ρ_ε is unknown
       7.2c Forecasting
8. CONCLUSIONS
REFERENCES

1. INTRODUCTION

Multicollinearity and autocorrelation are two very common problems in regression analysis. As is well known, the presence of some degree of multicollinearity results in estimation instability and model mis-specification, while the presence of serially correlated errors leads to underestimation of the variances of parameter estimates and to inefficient prediction. Because these two conditions have adverse effects on estimation and prediction, a wide range of tests and remedies has been developed to reduce their impact. Invariably, however, the multicollinearity and autocorrelation problems are dealt with separately in most, if not all, of this work.

In this thesis we address the question: what are the joint effects of multicollinearity and autocorrelation on estimation and prediction? We study analytically the possible changes in the effectiveness of various estimation methods in the joint presence of these two conditions. As a result of these findings, a new ridge estimator adjusted for autocorrelation is proposed, and its properties are investigated by conducting a simulation study.

We briefly outline this thesis. Section 2 provides the setting for our analysis. Sections 3 and 4 give a general discussion of the problems of multicollinearity and autocorrelation; in addition, we comment on the validity of various existing diagnostic tests.
The analytical study of the joint effects of multicollinearity and autocorrelation is presented in Section 5. In Section 6, a new ridge estimator adjusted for autocorrelation is derived and its mean square error properties are analyzed. Also, we discuss how these new estimates can be obtained in practice. The methodology and the results of the sampling experiments appear in Section 7. The thesis concludes with the presentation of several two-stage methods that can be used with the new ridge rule and that hopefully will achieve better estimates and predictions.

2. NOTATION AND PRELIMINARIES

The Classical Linear Regression (CLR) model can be represented by the equation

(2.1)   Y = Xβ + e

where Y is an n×1 vector of observations on the dependent variable, X is an n×p matrix of observations on the explanatory variables, β is a p×1 vector of regression coefficients to be estimated, and e is an n×1 vector of true error terms. The standard assumptions of the linear regression model are:

(1) E(e) = 0, where 0 is the zero vector.
(2) E(ee^T) = σ²I, where I is the identity matrix.
(3) The explanatory variables are non-stochastic; hence they are independent of the error terms.
(4) Rank(X) = p < n.

The Ordinary Least-squares (OLS) estimator of β is given by

(2.2)   β̂_OLS = (X^T X)^{-1} X^T Y

with variance-covariance matrix

(2.3)   Var(β̂_OLS) = σ² (X^T X)^{-1}.

For simplicity, we will assume that (X^T X) is in correlation form.
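As a concrete illustration of (2.1)-(2.3), the OLS estimator can be computed directly from the normal equations. The following sketch uses made-up data; the design, coefficients, and noise level are arbitrary choices for illustration, not values from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up CLR data: n = 50 observations on p = 3 explanatory variables.
n, p = 50, 3
X = rng.standard_normal((n, p))
beta = np.array([1.0, -2.0, 0.5])      # true regression coefficients
sigma = 0.3
Y = X @ beta + sigma * rng.standard_normal(n)

# OLS estimator (2.2): beta_hat = (X'X)^{-1} X'Y.
XtX_inv = np.linalg.inv(X.T @ X)
beta_ols = XtX_inv @ X.T @ Y

# Variance-covariance matrix (2.3): Var(beta_hat) = sigma^2 (X'X)^{-1}.
var_beta = sigma**2 * XtX_inv

print(beta_ols)           # close to the true beta
print(np.diag(var_beta))  # sampling variances of the three estimates
```

With a well-conditioned design like this one, the diagonal of (2.3) is small and the estimates sit close to the true coefficients; the rest of the thesis is about what happens when this stops being the case.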
Let P be the p×p orthogonal matrix such that P^T X^T X P = Λ, where Λ is a diagonal matrix with the eigenvalues of (X^T X), λ_1, ..., λ_p, displayed on the diagonal of Λ. We assume further that λ_1 ≥ λ_2 ≥ ... ≥ λ_p > 0.

After applying the orthogonal rotation P, it follows from (2.1) that

(2.4)   E(Y) = XPP^T β = X*α

where X* = XP is the data matrix represented in the rotated coordinates, and the columns of X* are linearly independent. The vector α = P^T β is the vector of regression coefficients of the principal components. The OLS estimator of α is given by

(2.5)   α̂_OLS = P^T β̂_OLS.

We will consider ridge estimators for β of the form

(2.6)   β̂_R(k) = (X^T X + kI)^{-1} (X^T X) β̂_OLS,   0 < k < 1,

where k is independent of the observations. When k is a function of β̂_OLS, β̂_R(k) is said to be an "adaptive ridge estimator" [12]. If kI is replaced by a symmetric nonnegative definite matrix, the resulting estimator is called a "generalized ridge estimator" [11:p.63].

Expressed in the rotated coordinates, the ridge estimator of α is given by

(2.7)   α̂_R(k) = Z α̂_OLS,  where Z = (Λ + kI)^{-1} Λ.

Substituting X^T X = PΛP^T in (2.6), so that (X^T X + kI)^{-1} = P(Λ + kI)^{-1}P^T, it follows from (2.7) that

(2.8)   α̂_R(k) = P^T β̂_R(k).

For the CLR model, assumption (2), that the errors are uncorrelated, is often violated in practice. This leads to the formulation of the Autoregressive Linear Regression (ALR) model.
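The equivalence of the ridge estimator in the original and rotated coordinates, (2.6)-(2.8), can be checked numerically. This sketch uses an arbitrary made-up collinear design; the key point it shows is that in the rotated coordinates ridge regression simply shrinks each component of α̂_OLS by the factor λ_i/(λ_i + k):

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up collinear design, scaled so that X'X is in correlation form.
n, p, k = 40, 3, 0.1
z = rng.standard_normal((n, 1))
X = np.hstack([z + 0.05 * rng.standard_normal((n, 1)) for _ in range(p)])
X = (X - X.mean(0)) / (X.std(0) * np.sqrt(n))   # columns have unit sum of squares
Y = X @ np.array([1.0, 1.0, 1.0]) + 0.1 * rng.standard_normal(n)

# Eigendecomposition (2.4): P' (X'X) P = Lambda.
lam, P = np.linalg.eigh(X.T @ X)

# Ridge estimator (2.6) in the original coordinates.
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
beta_ridge = np.linalg.solve(X.T @ X + k * np.eye(p), (X.T @ X) @ beta_ols)

# The same estimator in the rotated coordinates (2.7):
# each component of alpha_ols is shrunk by lambda_i / (lambda_i + k).
alpha_ols = P.T @ beta_ols
alpha_ridge = (lam / (lam + k)) * alpha_ols

# (2.8): alpha_R(k) = P' beta_R(k).
print(np.allclose(alpha_ridge, P.T @ beta_ridge))
```

For the weak components (small λ_i) the shrinkage factor λ_i/(λ_i + k) is far below one, which is why ridge regression stabilizes exactly the directions that multicollinearity makes unstable.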
Mathematically, the ALR model is given by replacing assumptions (2) and (3) by (2') and (3') below:

(2') E(ee^T) = σ_ε² Ω, where Ω is a nondiagonal positive definite matrix.
(3') x_t = (x_t1, x_t2, ..., x_tp), the t-th observation on the explanatory variables, is independent of the contemporaneous and succeeding errors e_t, e_{t+1}, ....

We assume that the error term e_t follows a first-order autoregressive scheme, that is,

(2.9)   e_t = ρ_ε e_{t-1} + U_t

where ρ_ε is the autocorrelation coefficient and U_t satisfies the following for all t:

(2.10)  E(U_t) = 0,   E(U_t U_{t+s}) = σ_u² if s = 0, and 0 if s ≠ 0.

Under (2.9) and (2.10),

(2.11)  E(ee^T) = σ_u² V

where

V = 1/(1-ρ_ε²) ×
    [ 1           ρ_ε         ρ_ε²        ...  ρ_ε^{n-1} ]
    [ ρ_ε         1           ρ_ε         ...  ρ_ε^{n-2} ]
    [ ...                                      ...       ]
    [ ρ_ε^{n-1}   ρ_ε^{n-2}   ...              1         ]

so that Ω = (1-ρ_ε²)V and σ_ε² = σ_u²/(1-ρ_ε²). We require that |ρ_ε| < 1.

For an ALR model, the "Generalized Least-squares" (GLS) estimator will give the "Best Linear Unbiased Estimator" (BLUE) of β, denoted β̂_GLS. The matrix Ω can be written Ω = QQ^T, where Q is nonsingular. Hence Q^{-1}Ω(Q^{-1})^T = I, and β̂_GLS is obtained by making the following substitutions in the ALR model: Ỹ = Q^{-1}Y, X̃ = Q^{-1}X, ẽ = Q^{-1}e. Then it follows that

(2.12)  Ỹ = X̃β + ẽ.

Since (2.12) satisfies all the assumptions of a CLR model, OLS will give the BLUE of β. Hence it follows that

(2.13)  β̂_GLS = (X^T Ω^{-1} X)^{-1} X^T Ω^{-1} Y.

For prediction, formula (2.15) gives the "Best Linear Unbiased Predictor" (BLUP) in a first-order ALR model:

(2.15)  Ŷ_{t+1} = X_{t+1} β̂_GLS + ρ_ε ê_t

where ê_t is the t-th GLS residual.

3. MULTICOLLINEARITY

In applying multiple regression models, some degree of interdependence among the explanatory variables can be expected. As this interdependence grows and the correlation matrix (X^T X) approaches singularity, multicollinearity constitutes a problem. Therefore it is preferable to think of multicollinearity in terms of its "severity" rather than its "existence" or "nonexistence".

3.1 Sources

In general, multicollinearity can be considered to be a symptom of poor experimental design. The sources of severe multicollinearity may be classified as follows [20:p.99-101].

(i) Not enough data or too many variables

In many cases large data sets only contain a few basic factors. As the number of variables extracted from the data increases, each variable tends to measure different nuances of the same basic factors, and each highly collinear variable has only a little information content of its own. In this case, deleting some variables or collecting more data can usually solve the problem.

(ii) Physical or structural singularity

Sometimes highly collinear variables are inadvertently included in the model due to mathematical or physical constraints.

(iii) Sampling singularity

Due to expense, accident, or mistake, sampling was conducted in only a small region of the design space.

3.2 Effects

The major effects are the following.

(i) Estimation instability

As the correlation matrix (X^T X) becomes ill-conditioned, the elements of the inverse matrix (X^T X)^{-1} explode. Equation (2.3) shows that the variances of the OLS estimates of β are determined by the diagonal elements of (X^T X)^{-1}, so as a result of serious multicollinearity the OLS estimates of the regression coefficients have quite large variances. Moreover, the elements of the inverse matrix are quite sensitive to small changes in the data set, and in a severely collinear case the inverse may be numerically impossible to obtain.

(ii) Structure misspecification

As the size of a collinear variable set X increases, the information content of each explanatory variable decreases, thereby decreasing each variable's contribution to the explained variance of Y and the apparent significance of each member of the set, even though Y really depends on a relatively large variable set. As asserted by many authors [6:p.94][13:p.160][15], this tendency to underspecify models is a limitation of the data rather than of the theoretical model-building process. Therefore, erroneous deletion of variables may happen.

(iii) Forecast inaccuracy

If an important variable is omitted because it is highly collinear with other variables, but in the later prediction period its behavior changes and it moves independently of the other variables, then any forecasting under this oversimplified model will be very inaccurate.

(iv) Numerical problems

The correlation matrix (X^T X) is not invertible if the columns of X are linearly dependent. With the matrix (X^T X) being singular, the OLS estimates of β, represented by (2.2), are completely indeterminate.
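The variance explosion described under (i) can be seen directly in the two-variable case: in correlation form the diagonal of (X^T X)^{-1} is 1/(1-r²), where r is the pairwise correlation. A small illustrative computation (not taken from the thesis):

```python
import numpy as np

# Two standardized explanatory variables with pairwise correlation r.
# In correlation form X'X = [[1, r], [r, 1]], and each diagonal element of
# (X'X)^{-1} equals 1 / (1 - r^2), so by (2.3) the coefficient variances
# explode as r approaches 1.
for r in (0.0, 0.9, 0.99, 0.999):
    XtX = np.array([[1.0, r], [r, 1.0]])
    inv = np.linalg.inv(XtX)
    print(r, inv[0, 0])   # coefficient variance in units of sigma^2
```

At r = 0.999 each coefficient variance is roughly 500 times its value under an orthogonal design, even though nothing about the dependent variable has changed.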
In the case of an almost singular set of variables, the numerical instability in calculating the inverse matrix (X^T X)^{-1} still remains.

3.3 Detection

Tests for the presence and location of serious multicollinearity are briefly outlined below, each followed by comments.

(i) Tests based on various correlation coefficients

Here, harmful multicollinearity is generally recognized by rules of thumb. For instance, one commonly admitted rule of thumb requires the simple pair-wise correlation coefficients of the explanatory variables to be less than 0.8. Certainly, more extended and sophisticated rules of thumb, with prudent use of the various correlation coefficients, will give more satisfactory results. The following rule of thumb is generally considered to be superior to the other rules: a variable is said to be highly multicollinear if its coefficient of multiple correlation with the remaining (p-1) variables, R_i², is greater than the coefficient of multiple correlation of Y with all the explanatory variables, R² [14:p.101]. The variance of the estimate of β_i can be expressed as follows [9]:

(3.1)   Var(β̂_i) = [(1 - R²) σ_y²] / [(n-p-1) σ_{X_i}² (1 - R_i²)]

where σ_y² is the variance of the dependent variable Y and σ_{X_i}² is the variance of the explanatory variable X_i. From (3.1) it is obvious that multicollinearity constitutes a problem only when R_i² is relatively high compared to R². Unfortunately, the geometric interpretation of this rule of thumb is apparent only when there are two explanatory variables [6:p.98].

(ii) Three-stage hierarchy test

This test was proposed by Farrar and Glauber [6]. At the first stage, if the null hypothesis H_0: |X^T X| = 1 is rejected based on the Wilks-Bartlett test, we may assert that multicollinearity is severe and move toward the second stage. There, the F statistic

F_i = [R_i²/(p-1)] / [(1-R_i²)/(n-p)],   i = 1, ..., p,

is computed for each R_i²; a statistically significant F_i implies that X_i is collinear with the remaining (p-1) variables. At the third stage, inspection of the partial correlation coefficients between X_i and the remaining variables, together with the associated t-ratios, can show the pattern of interdependency among the explanatory variables. Farrar and Glauber claimed that detecting, localizing, and learning the pattern of severe multicollinearity among the explanatory variables can be achieved at the three respective stages of their test.

(iii) Haitovsky chi-square test

In 1969, Haitovsky [9] proposed a heuristic statistic for testing the hypothesis of severe multicollinearity. This heuristic statistic is a function of the determinant of the correlation matrix (X^T X) and is approximately distributed as chi-square. Applications to Farrar and Glauber's data show that this test gives more satisfactory results than the Wilks-Bartlett test that is adopted at the first stage of the Farrar and Glauber three-stage test. Therefore Haitovsky claimed the superiority of his test and suggested replacing the Wilks-Bartlett test by his test in the Farrar and Glauber three-stage procedure.

However, any test based on the determinant of the correlation matrix has some built-in deficiencies. As will be shown later, the mean square error properties depend only on the eigenvalues of the matrix (X^T X). Only when (X^T X) has a broad eigenvalue spectrum, that is, when the ratio λ_1/λ_p of the largest eigenvalue to the smallest is large, may the performance of the OLS estimates deteriorate. Since the determinant of the correlation matrix is equal to the product of all the eigenvalues, a determinant-based test will treat a matrix having a broad eigenvalue spectrum equivalently to one having a relatively narrow eigenvalue spectrum, so long as they have the same or nearly the same determinants. The relative magnitudes of the eigenvalues are difficult if not impossible to infer from the results of any test that is based on the determinant of the correlation matrix. Nevertheless, the Haitovsky test gives a fairly good indication of the presence of severe multicollinearity in our simulation study.

(iv) Examining the eigenvalue spectrum of the matrix (X^T X)

If the matrix (X^T X) has a broad eigenvalue spectrum, that is, if λ_1/λ_p is large, then the mean square error of the OLS estimates of β becomes very large.
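The diagnostics discussed above can be gathered into a single routine. The sketch below is illustrative only: the function name and the simulated data are our own, and the identity R_i² = 1 - 1/(R^{-1})_{ii} is used to obtain each variable's squared multiple correlation with the remaining columns.

```python
import numpy as np

def collinearity_diagnostics(X):
    """Multicollinearity diagnostics for an n x p data matrix X."""
    p = X.shape[1]
    R = np.corrcoef(X, rowvar=False)          # correlation matrix (X'X in correlation form)
    Rinv = np.linalg.inv(R)
    R2 = 1.0 - 1.0 / np.diag(Rinv)            # R_i^2 of X_i on the other columns
    lam = np.linalg.eigvalsh(R)
    return {
        "max_pairwise_r": np.abs(R - np.eye(p)).max(),   # rule of thumb: < 0.8
        "R2_each_on_rest": R2,
        "determinant": np.linalg.det(R),                 # basis of Haitovsky-type tests
        "eigen_ratio": lam.max() / lam.min(),            # compare with p, per (iv)
    }

# Made-up data: three near-copies of one underlying factor.
rng = np.random.default_rng(2)
z = rng.standard_normal(100)
X = np.column_stack([z + 0.1 * rng.standard_normal(100) for _ in range(3)])
print(collinearity_diagnostics(X))
```

On this design every diagnostic fires at once; the criticism in the text is that two designs with equal determinants can still have very different eigenvalue ratios, which only the spectrum-based diagnostic distinguishes.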
Since the trace of the correlation matrix (X^T X) is equal to the number of explanatory variables p, an arbitrary rule of thumb may consider λ_1/λ_p large if λ_1/λ_p > p. Besides, the minimax index

MMI = (Σ_{i=1}^p λ_i^{-2}) / λ_p^{-2}

is a useful indicator too: a small MMI, say less than two, implies the presence of multicollinearity [21:p.13-14].

Among all the tests and methods proposed, examining the eigenvalue spectrum of the matrix (X^T X) provides not only a sound theoretical basis but also the lightest computational burden.

4. AUTOCORRELATION

One of the basic assumptions of the CLR model is that the error terms are independent of each other. However, when regression analysis is applied to time series data, the residuals are often found to be serially correlated. Like multicollinearity, autocorrelation is another widespread problem in applying regression models. For simplicity, first-order autocorrelation is assumed in our study.

4.1 Sources

The sources are mainly the following.

(i) Omission of variables

The time-ordered effects of the omitted variables will be included in the error terms. This prevents the errors from displaying random behavior. In this case, finding the missing variables and identifying the correct relationship can solve the problem.

(ii) Systematic measurement error in the dependent variable

Again, the error terms absorb the systematic measurement error in the dependent variable and then display non-random behavior.
(iii) E r r o r s t r u c t u r e i s time dependent The g r e a t impacts o f some random e v e n t s o r s h o c k s , such as war, s t r i k e s , f l o o d , e t c . , a r e spread over s e v e r a l p e r i o d s o f t i m e , c a u s i n g t h e e r r o r terms t o be s e r i a l l y c o r r e l a t e d . "true-autocorrelation". This i s so-called -14- 4.2 Effects When the OLS technique i s s t i l l used f o r e s t i m a t i o n , the major effects are: (i) Unbiased but i n e f f i c i e n t e s t i m a t o r of B GLS p r o v i d e s t h e BLUE o f B when the d i s p e r s i o n m a t r i x o f e, 2 oM}, i s n o n d i a g o n a l . That i s t o say on the average the sampling v a r i a n c e s o f GLS e s t i m a t e s of B a r e l e s s than t h a t o f OLS e s t i m a t e s of g, hence OLS i s i n e f f i c i e n t compared w i t h GLS. U n d e r e s t i m a t i o n o f the v a r i a n c e s o f the e s t i m a t e s o f B (ii) As an i l l u s t r a t i o n , c o n s i d e r t h e v e r y simple model y V where u t = Bx t = p e e t + c t - i + t u t s a t i s f i e s assumptions ( 2 . 1 0 ) . I t has been shown t h a t the v a r i a n c e o f OLS e s t i m a t e o f B i s [ 1 3 : p . 2 4 7 ] n-1 a (4.1) Var(6 0 L S ) i>i i l , x 2 + - -f— i-i n-2 [ 1 + 2^ +.2p 1=1 1 i •+ 2 p e n x x -f^—* t i - i X • 1 n " n I i=l The OLS formula ( 2 . 3 ) i g n o r e s J, l i+2 1 n 2 xi 1 the term i n parentheses i n ( 4 . 1 ) and 2 2 g i v e s the v a r i a n c e s o f the e s t i m a t e s o f B as a / £ x ^ I f b o t h e . i=l and x a r e p o s i t i v e l y a u t o c o r r e l a t e d the e x p r e s s i o n i n parentheses n i s almost c e r t a i n l y greater than unity, therefore the OLS formula w i l l underestimate the true variance of 3 (iii) n T C . I n e f f i c i e n t predictor of Y When autocorrelation i s present,, error made at one point i n time gives information about the error made at a subsequent point i n time. 
The OLS predictor fails to take this information into account; hence it is not the BLUP of Y [13:p.265-266].

4.3 Detection

The tests which are commonly used to recognize the existence of first-order autocorrelation are the following.

(i) Eye-ball tests

The plot of the OLS residuals e_t against time t can be informative; any nonrandom behavior can be considered an indication of autocorrelation. We may also plot the OLS residual e_t against its lagged value e_{t-1}; if the observations are not evenly spread over the four quadrants, we may conclude that first-order autocorrelation is present. These eye-ball tests are quite effective; however, they are imprecise and do not lend themselves to classical inferential methods.

(ii) von Neumann ratio

In 1941, the ratio of the mean square successive difference to the variance was proposed by von Neumann as a test statistic for the existence of first-order autocorrelation [22]. Though various applications have proven the usefulness of the von Neumann ratio, we emphasize that this test is applicable only when the e values are independently distributed and the sample size is large. In practice, the OLS residuals used to compute the von Neumann ratio are usually not independently distributed even when the true error terms are.

(iii) Durbin-Watson test

This test, named after its originators Durbin and Watson, is widely used for small sample sizes [4][5]. There are some shortcomings of the Durbin-Watson test. First, there exist two regions of indeterminacy; though an exact test was suggested by Henshaw in 1966, its heavy computational burden prevents the test from wide application [10]. Secondly, the Durbin-Watson test is derived for non-stochastic explanatory variables only. It has been shown that if lagged dependent variables are present, either in single regression equation models or in systems of simultaneous regression equations, the Durbin-Watson test is biased towards the value for a random error, that is, biased towards 2, thereby giving very misleading information [17]. It is nevertheless important to test for serial correlation in models containing lagged dependent variables, since autocorrelated models are usually repaired by inserting lagged Y values into the right-hand side of the regression equation. To this end, Durbin developed a test based on the h statistic in 1970 [3]. The statistic h is defined as

h = ρ̂_ε sqrt( n / (1 - n Var̂(b')) )

where b' is the estimated coefficient of Y_{t-1}, Var̂(b') is its estimated sampling variance, and ρ̂_ε is the first-order autocorrelation coefficient estimated from the OLS residuals. This test is computationally cheap but only applicable for large sample sizes; the small sample properties of the h statistic are still unknown.

5. JOINT EFFECTS OF MULTICOLLINEARITY AND AUTOCORRELATION

In statistical analysis, a point estimate is usually of little use unless accompanied by an estimate of its accuracy. In this connection, Mean Square Error (MSE) is widely used as a measure of accuracy. Since accurate parameter estimates constitute an effective model, MSE can be used to determine a model's effectiveness when the underlying objective is simply to obtain good parameter estimates.
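The loss used throughout this section, L_1² = (β̂ - β)^T(β̂ - β), can also be estimated by straightforward simulation, which is essentially what the Monte Carlo study of Section 7 does on a larger scale. The sketch below is ours (design, sample size, and replication count are arbitrary); it estimates E(L_1²) for OLS when the errors follow the first-order scheme (2.9):

```python
import numpy as np

rng = np.random.default_rng(3)

def mse_ols(rho, n=30, reps=2000):
    """Monte Carlo estimate of E(L1^2) = E||beta_ols - beta||^2 under AR(1) errors."""
    beta = np.array([1.0, 2.0])
    X = rng.standard_normal((n, 2))           # one fixed design per call
    total = 0.0
    for _ in range(reps):
        u = rng.standard_normal(n)            # innovations U_t of (2.10)
        e = np.empty(n)
        e[0] = u[0] / np.sqrt(1.0 - rho**2)   # stationary start, Var(e_1) = 1/(1-rho^2)
        for t in range(1, n):
            e[t] = rho * e[t - 1] + u[t]      # first-order scheme (2.9)
        b = np.linalg.lstsq(X, X @ beta + e, rcond=None)[0]
        total += np.sum((b - beta) ** 2)
    return total / reps

print(mse_ols(0.0), mse_ols(0.9))   # the second is several times larger
```

Even with a design whose columns are not themselves autocorrelated, the MSE at ρ_ε = 0.9 is inflated several-fold, simply because the stationary error variance σ_u²/(1-ρ_ε²) grows with ρ_ε.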
In their 1970 paper, Hoerl and Kennard presented the MSE properties for the OLS and ridge estimates of β [11]. Thereafter, the results of various studies have confirmed that ridge regression will improve the MSE of estimation and prediction in the presence of severe multicollinearity. In this section we will present expressions for the MSE of β̂_OLS and β̂_R(k) when the error terms follow the first-order autocorrelated pattern (2.9). These expressions will enable us to examine the effect of these two conditions on the ridge and the OLS estimates. Our analysis can be reduced to that of Hoerl and Kennard by setting ρ_ε = 0.

5.1 Mean Square Error of the OLS Estimates of β

We begin with the analysis for the OLS estimates for a first-order ALR model. Let

L_1 = distance from β̂_OLS to β,   L_1² = (β̂_OLS - β)^T (β̂_OLS - β).

We define the MSE of β̂_OLS to be E(L_1²).

Proposition 5.1

(5.1)   E(L_1²) = σ_u² Σ_{j=1}^n Σ_{ℓ=1}^n D_{jℓ} v_{jℓ}

where D = X(X^T X)^{-2} X^T and v_{jℓ} is the (j,ℓ) element of V.

Proof: From (2.1) and (2.2),

(5.2)   β̂_OLS - β = (X^T X)^{-1} X^T (Xβ + e) - β = (X^T X)^{-1} X^T e.

By definition and (5.2) it follows that

E(L_1²) = E[(β̂_OLS - β)^T (β̂_OLS - β)] = E[e^T X(X^T X)^{-2} X^T e].

Noting that E(e) = 0, it follows from Theorem 4.6.1 of Graybill [7:p.139] that

(5.3)   E(L_1²) = σ_u² tr[X(X^T X)^{-2} X^T V].

From the definitions of V and D, (5.1) follows.

Equation (5.1) does not give much insight into the effect of multicollinearity and autocorrelation on the MSE of β̂_OLS. By rotating axes (using principal components) the effect can be more clearly demonstrated.

Proposition 5.2

(5.4)   E(L_1²) = [σ_u²/(1-ρ_ε²)] [ Σ_{i=1}^p 1/λ_i + 2 Σ_{i=1}^p (1/λ_i²) Σ_{j>ℓ} ρ_ε^{j-ℓ} x*_{ji} x*_{ℓi} ]

where x*_{ji} is the j-th observation on the i-th principal component.¹

¹ For a matrix A we use the notation tr(A) to denote the trace of A.
l j > I observaton on the i t 1=1 h p r i n c i p a l component. ^ o r a matrix A we use the notation t ( A ) to denote the trace of A. r -20- Proof: (5.5) From (2.4) and (2.5) a - a = (X* X*) X* Y - a T Q L S -1 T = (X* X*) X* (X*a+e) - a T 1 T T -1 T = (X* X*) X* g By d e f i n i t i o n and (5.5) i t follows that E ^"ibLS-^VPCgoLS-^l < 1> L = E [ ( ?0LS^ ^0LS 2 _ ) = E[e X*(X* X*) T T 2 )] X* e] T Hence by the same argument used i n proving Proposition 5.1 E(L ) 2 a t [X*A X* V] u r ~ ~ ~ ~ 2 2 T xl i X !i !i X 2 1 AT l L X* X* . l i nx au2 tr (, V .) X* . x* . p X* . X* . nx l x Y nx 2x 2 ^ 2 AT 1 AT X X p x* y nx 2 1 A x 2 By d e f i n i t i o n of A and V, (5.4) follows. After orthogonal rotation, the effect of m u l t i c o l l i n e a r i t y and autocorrelation becomes apparent from (5.4). First, i f p £ i s positive and most of the p r i n c i p a l components are also p o s i t i v e l y autocorrelated, almost c e r t a i n l y the second term i n (5.4) w i l l be p o s i t i v e . That i s to -21- say t h a t the MSE o f g w i l l be l a r g e r than when these e f f e c t s a r e n o t p r e s e n t ; moreover the d i f f e r e n c e w i l l be i n p r o p o r t i o n to the magnitude of p . Secondly, we o b t a i n a c r o s s term o f e i g e n v a l u e s , A_ and a u t o T c o r r e l a t i o n c o e f f i c i e n t , p^. that i s , I f the matrix (X X) i s i l l - c o n d i t i o n e d , i s c l o s e to zero and t h e r e is' a h i g h degree o f p o s i t i v e c o r r e l a t i o n both i n the p*"* component and the e r r o r terms, then the 1 second term i n (5.4) dominates and the MSE o f g It auto- i s then extremely characteristics. can be v e r y l a r g e . dangerous t o a p p l y OLS t o data w i t h the above However, the problem w i l l n o t be t h a t s e r i o u s i f p £ i s n e g a t i v e o r t h e p r i n c i p a l components, e s p e c i a l l y those weak components, are n o t a u t o c o r r e l a t e d . 
Finally, from (5.4) we are able to tell by how much the MSE of β̂_OLS changes because of the existence of first-order autocorrelated errors in general regression models containing p explanatory variables. Note that when ρ_ε = 0, (5.4) reduces to

    E(L_1^2) = σ_u^2 tr(X* Λ^{-2} X*^T) = σ_u^2 Σ_{i=1}^p 1/λ_i.

5.2 Mean Square Error of the Ridge Estimates of β

In parallel with 5.1, we define

    L_2(k) = distance from β̂_R(k) to β.

The MSE of β̂_R(k) is given by

    E[L_2^2(k)] = E[(β̂_R(k) − β)^T (β̂_R(k) − β)].

Proposition 5.3

(5.6)    E[L_2^2(k)] = γ_1(k) + γ_2(k) + γ_3(k),

where

    γ_1(k) = σ_u^2 Σ_{i=1}^p λ_i/(λ_i + k)^2,
    γ_2(k) = k^2 Σ_{i=1}^p α_i^2/(λ_i + k)^2,
    γ_3(k) = 2σ_u^2 Σ_{j>ℓ} ρ_ε^{j−ℓ} Σ_{i=1}^p x*_{ji} x*_{ℓi}/(λ_i + k)^2.

Proof: From (2.7) and (2.8), the MSE of β̂_R(k) can be written as

    E[L_2^2(k)] = E[(β̂_R(k) − β)^T P^T P (β̂_R(k) − β)]
               = E[(Z α̂_OLS − α)^T (Z α̂_OLS − α)]
(5.7)          = E[(α̂_OLS − α)^T Z^T Z (α̂_OLS − α)] + (Zα − α)^T (Zα − α).

Since the first term in (5.7) is a scalar, from (2.7) and Proposition 5.2 it follows that

(5.8)    E[(α̂_OLS − α)^T Z^T Z (α̂_OLS − α)] = σ_u^2 tr[X*(Λ + kI)^{-2} X*^T V]
         = σ_u^2 Σ_{i=1}^p λ_i/(λ_i + k)^2 + 2σ_u^2 Σ_{j>ℓ} ρ_ε^{j−ℓ} Σ_{i=1}^p x*_{ji} x*_{ℓi}/(λ_i + k)^2.

Since the matrix (Z − I) can be written as

(5.9)    Z − I = Z(I − Z^{-1}) = Z(−kΛ^{-1}) = −k(Λ + kI)^{-1},

the second term in (5.7) can be expressed as

    (Zα − α)^T (Zα − α) = α^T (Z − I)^T (Z − I) α = k^2 α^T (Λ + kI)^{-2} α = k^2 Σ_{i=1}^p α_i^2/(λ_i + k)^2 = γ_2(k),

completing the proof.

The MSE of β̂_R(k) consists of three parts, γ_1(k), γ_2(k) and γ_3(k). γ_1(k) can be considered the total variance of the parameter estimates and is a monotonically decreasing function of k; γ_2(k) is the square of the bias brought in by the augmenting matrix kI and is a monotonically increasing function of k; while γ_3(k) is related to the autocorrelation in the error terms. Hoerl and Kennard claim that in the presence of severe multicollinearity it is possible to reduce the MSE substantially by accepting a little bias, that is, by choosing k > 0. This is because in the neighborhood of the origin γ_1(k) drops sharply while γ_2(k) increases only slightly as k increases [11: pp.60-61]. After incorporating autocorrelation in the context of ridge regression, their assertion remains true only if certain conditions are satisfied. From (5.6) we see that the effects of multicollinearity and autocorrelation are the following.

(i) If ρ_ε is positive and the principal components, especially the weak components, are also positively autocorrelated, then the ridge method will be even more desirable than OLS. This is because a substantial decrease in both γ_1(k) and γ_3(k) can be achieved by choosing k > 0, while the increase in γ_2(k) is relatively small as we move to k > 0.
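The monotonicity of γ_1(k) and γ_2(k) asserted above can be illustrated numerically. The eigenvalues and coefficients below are illustrative (the small eigenvalue mimics ill-conditioning):

```python
import numpy as np

lam = np.array([4.0, 1.0, 0.01])   # eigenvalues; the small one mimics ill-conditioning
alpha = np.array([1.0, 0.5, 2.0])  # rotated regression coefficients (illustrative)
sigma2_u = 1.0

def gamma1(k):  # total-variance term of (5.6), decreasing in k
    return sigma2_u * np.sum(lam / (lam + k) ** 2)

def gamma2(k):  # squared-bias term of (5.6), increasing in k
    return k ** 2 * np.sum(alpha ** 2 / (lam + k) ** 2)

ks = np.linspace(0.0, 1.0, 101)
g1 = np.array([gamma1(k) for k in ks])
g2 = np.array([gamma2(k) for k in ks])
print(g1[0], g1[-1], g2[0], g2[-1])
```

At k = 0, γ_1 equals σ_u² Σ 1/λ_i (dominated here by the near-zero eigenvalue), and it falls sharply as k leaves the origin while γ_2 grows from zero.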
(ii) If ρ_ε is negative, or almost all of the principal components are not autocorrelated, then on the average γ_3(k) is close to zero; hence the ridge and the OLS estimates will perform relatively the same as in the uncorrelated case.

(iii) Since ridge regression is similar to shrinking the model by dropping the least important component [21: pp.24-28], (5.6) gives a theoretical justification for shrinking the model if both the last component and the error terms are positively autocorrelated. From the point of view of estimation stability, the ridge method will be helpful when severe multicollinearity is accompanied by a high degree of positive autocorrelation both in the weakest component and in the error terms.

5.3 When will Ridge estimates be better than the OLS estimates?

Taking the derivatives of γ_1(k) and γ_2(k), Hoerl and Kennard found a condition on k such that ridge regression gives better estimates than OLS in terms of MSE: when k is smaller than σ_u^2/α_max^2, where α_max is the largest regression coefficient in magnitude, the MSE of β̂_R(k) will be less than that of β̂_OLS [11]. When autocorrelation is present, the condition on k such that ridge regression will perform better than OLS regression is described below. Consider the derivatives of γ_1(k), γ_2(k) and γ_3(k):

(5.10)   dγ_1/dk = −2σ_u^2 Σ_{i=1}^p λ_i/(λ_i + k)^3,
         dγ_2/dk = 2k Σ_{i=1}^p λ_i α_i^2/(λ_i + k)^3,
         dγ_3/dk = −4σ_u^2 Σ_{j>ℓ} ρ_ε^{j−ℓ} Σ_{i=1}^p x*_{ji} x*_{ℓi}/(λ_i + k)^3.

When (X^T X) approaches singularity, which implies that λ_p → 0, the values of the first two derivatives in the neighborhood of the origin are given by

(5.11)   lim_{λ_p→0} lim_{k→0+} (dγ_1/dk) = −∞,

(5.12)   lim_{λ_p→0} lim_{k→0+} (dγ_2/dk) = 0.

As k increases, a huge drop in γ_1 with only a slight increase in γ_2 may be expected. However, (5.10) shows that the behavior of γ_3 depends on the degree of autocorrelation both in the principal components and in the error terms. Therefore γ_3 may increase or decrease at various rates as k increases. The use of ridge regression is most favourable when there is a high degree of positive autocorrelation both in the components and in the error terms. We now formalize these arguments and present a condition on k such that ridge regression will be better than OLS regression by the MSE criterion.

Let

    F(k) = E(L_1^2) − E[L_2^2(k)]
         = σ_u^2 Σ_{i=1}^p [1/λ_i − λ_i/(λ_i + k)^2] + 2σ_u^2 Σ_{j>ℓ} ρ_ε^{j−ℓ} Σ_{i=1}^p x*_{ji} x*_{ℓi} [1/λ_i^2 − 1/(λ_i + k)^2] − k^2 Σ_{i=1}^p α_i^2/(λ_i + k)^2.

Then

(5.13)   dF/dk = 2σ_u^2 Σ_{i=1}^p (λ_i + k)^{-3} [λ_i + 2 Σ_{j>ℓ} x*_{ji} x*_{ℓi} ρ_ε^{j−ℓ}] − 2k Σ_{i=1}^p λ_i α_i^2/(λ_i + k)^3.

Assume that γ_3(k) is a non-increasing function of k in the neighborhood of the origin. From (5.11) and (5.12) we may then expect F(k) to increase as we move towards k > 0, i.e. lim_{k→0+} (dF/dk) > 0; there exists k > 0 such that the OLS estimates have higher MSE than the ridge estimates.

Theorem 5.1. If

(5.14)   k < (σ_u^2/α_max^2) min_{1≤i≤p} [1 + (2/λ_i) Σ_{j=1}^{n-1} ρ_ε^j Σ_{ℓ=1}^{n-j} x*_{ℓ+j,i} x*_{ℓi}],

then E(L_1^2) − E[L_2^2(k)] > 0. In other words, the OLS estimates have higher MSE than the ridge estimates. Again, (5.14) will reduce to Hoerl and Kennard's result if ρ_ε = 0.
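For the uncorrelated case ρ_ε = 0, the Hoerl and Kennard range can be checked directly: every k below σ_u²/α_max² should reduce the MSE relative to OLS. A minimal sketch with illustrative eigenvalues and coefficients:

```python
import numpy as np

lam = np.array([5.0, 1.0, 0.02])       # illustrative eigenvalues, one near zero
alpha = np.array([0.8, -0.4, 0.6])     # illustrative rotated coefficients
sigma2_u = 1.0
alpha_max2 = np.max(alpha ** 2)

def mse_ols():
    return sigma2_u * np.sum(1.0 / lam)

def mse_ridge(k):  # gamma1(k) + gamma2(k) with rho_eps = 0
    return (sigma2_u * np.sum(lam / (lam + k) ** 2)
            + k ** 2 * np.sum(alpha ** 2 / (lam + k) ** 2))

# every k below sigma_u^2 / alpha_max^2 should improve on OLS
ks = np.linspace(1e-4, sigma2_u / alpha_max2, 50, endpoint=False)
gains = np.array([mse_ols() - mse_ridge(k) for k in ks])
print(gains.min())
```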
When positive autocorrelation exists in the error terms and the principal components, the second term in (5.14) may well be positive; hence the range of k for which ridge estimates are better than OLS estimates by the MSE criterion will be larger than what Hoerl and Kennard asserted for the uncorrelated case. (5.14) shows that the extension of the range of k is positively related to the magnitude of ρ_ε. However, (5.14) is just a sufficient condition on k for E(L_1^2) to be greater than E[L_2^2(k)], since F(k) is increasing in k over the range shown by (5.14). It is possible that for some values of k, F(k) is decreasing in k while the function value is still positive, that is, E(L_1^2) is still greater than E[L_2^2(k)]. Therefore, we may consider (5.14) a stringent condition on k for ridge estimates to be better than OLS estimates by the MSE criterion.

If either ρ_ε is negative or the principal components, especially the weak ones, are not autocorrelated, the behavior of γ_3(k), and thereby of F(k), as k increases will be hard to predict. The effect of autocorrelation on the range of k then depends on the data set we gathered. In practice the true parameters are unknown; the range of k shown by (5.14) can be approximated by conducting a principal component analysis and substituting the estimates for the parameters.

5.4 Use of the "Ridge Trace"

In ridge regression the augmenting matrix (kI) is used to give the system the general characteristics of an orthogonal system. Hoerl and Kennard claimed that at a certain value of k the system will stabilize [11: p.65]. They proposed the use of a "Ridge Trace" as a diagnostic tool to select a single value of k and a unique estimate of β in practice. The "Ridge Trace" portrays the behavior of all the parameter estimates as k varies.
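A ridge trace is computed simply by evaluating β̂_R(k) = (X^T X + kI)^{-1} X^T Y over a grid of k. The sketch below uses a deliberately collinear pair of illustrative regressors:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)      # nearly collinear with x1
X = np.column_stack([x1, x2])
y = X @ np.array([1.1, 1.0]) + rng.normal(size=n)

def ridge(X, y, k):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

ks = [0.0, 0.025, 0.05, 0.1, 0.2, 0.5, 1.0]   # grid of k values (illustrative)
trace = np.array([ridge(X, y, k) for k in ks])
for k, b in zip(ks, trace):
    print(f"k={k:5.3f}  b1={b[0]: .4f}  b2={b[1]: .4f}")
```

Plotting the rows of `trace` against k gives the ridge trace; the length of the coefficient vector shrinks monotonically as k grows, and the wild k = 0 estimates of the collinear pair settle down quickly.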
Therefore, instead of suppressing dimensions either by deleting collinear variables or by dropping principal components of small importance, the Ridge Trace will show how singularity is causing instability, over/under-estimation, and incorrect signs of the ridge estimates. In connection with autocorrelation, where ridge regression is even more desirable, the "Ridge Trace" will certainly be of great help in obtaining better point estimates and thereby better predictions. Even when ρ_ε is negative or the principal components are not autocorrelated, the merits and usefulness of the "Ridge Trace" and of ridge regression are still preserved in dealing with the problem of multicollinearity.

6. RIDGE REGRESSION: ESTIMATES, MEAN SQUARE ERROR AND PREDICTION

The MSE of the OLS estimates of β can be written as the difference in squared length between two vectors, β̂_OLS and β [11: p.56]:

(6.1)    E(L_1^2) = E(β̂_OLS^T β̂_OLS) − β^T β.

(6.1) shows that in the presence of severe multicollinearity the MSE can be improved by shortening the OLS estimates of β. In this section we will show that this reasoning appears to be compatible with the derivation of the ridge estimator of β. Hence ridge regression can be expected to be better in terms of MSE.

6.1 Derivation of Ridge Estimator for a CLR Model

Let B be any estimate of β. Its residual sum of squares, φ, can be written as the minimum sum of squares, φ_min, plus the distance from B to β̂_OLS weighted through (X^T X):

(6.2)    φ = (Y − XB)^T (Y − XB)
           = (Y − Xβ̂_OLS)^T (Y − Xβ̂_OLS) + (B − β̂_OLS)^T X^T X (B − β̂_OLS)
           = φ_min + φ(B).

For a specific value of φ(B), say φ_0, the ridge estimator is found by choosing B to

(6.3)    minimize B^T B subject to (B − β̂_OLS)^T X^T X (B − β̂_OLS) = φ_0.

This problem can be solved by the use of Lagrange multiplier techniques, where (1/k) is the multiplier corresponding to the constraint (6.3). The problem is to minimize

(6.4)    F = B^T B + (1/k)[(B − β̂_OLS)^T X^T X (B − β̂_OLS) − φ_0].

A necessary condition for B to minimize (6.4) is that

    ∂F/∂B = 2B + (1/k)[2(X^T X)B − 2(X^T X)β̂_OLS] = 0.

Hence

    [I + (1/k)(X^T X)]B = (1/k)(X^T X)β̂_OLS,

and

    B* = β̂_R(k) = (X^T X + kI)^{-1} X^T Y,

where k is chosen to satisfy the constraint (6.3). In practice we usually work the other way round, since it is easier to choose a k > 0 and then compute the additional residual sum of squares, φ_0. It is clear that for a fixed increment φ_0 there is a continuum of values of B that satisfy the relationship φ = φ_min + φ_0, and the ridge estimate so derived is the one with the minimum length. Therefore, we may well expect the ridge estimates to yield smaller MSE in the presence of multicollinearity, since they are derived precisely by minimizing the length of the regression vector. It is true to a certain extent that minimizing the length of the regression vector is equivalent to reducing the MSE of the parameter estimates. In addition, (6.2) shows that it is possible to move further away from β̂_OLS without an appreciable increase in the residual sum of squares as (X^T X) approaches singularity. That is to say, ridge regression may achieve a large reduction in MSE at virtually no cost in terms of the residual sum of squares if the conditioning of (X^T X) is poor enough.

In 1971, Newhouse and Oman [18] used MSE as the evaluation criterion in their Monte Carlo studies of ridge regression. Since then it has become the standard way to evaluate proposals for ridge estimators. From the above derivation of the ridge estimator, we see that ridge estimates are designed to be better by the MSE criterion.

Now we would like to study the implications of the constraint used in deriving the ridge estimator. Since orthogonalization can ease interpretation, we represent (6.2) in the rotated axes. Let A = PB. Then

    φ = φ_min + (A − α̂_OLS)^T Λ (A − α̂_OLS) = φ_min + Σ_{i=1}^p λ_i (A_i − α̂_OLS,i)^2,

where A_i is the estimate of the regression coefficient for the ith component and α̂_OLS,i is the OLS estimate of the regression coefficient for the ith component. The problem is

(6.5)    minimize A^T A subject to (A − α̂_OLS)^T Λ (A − α̂_OLS) = φ_0,

or equivalently,

(6.6)    subject to Σ_{i=1}^p λ_i (A_i − α̂_OLS,i)^2 = φ_0.

(6.5) shows that the vector [A − α̂_OLS] is normed through Λ so that its weighted squared length equals φ_0. Since the eigenvalue λ_i can be considered an indicator of the information content and explanatory power of the ith principal component, we may well conclude that the derivation of the ridge estimator has already taken the relative information content and explanatory power of the explanatory variables into account. (6.6) shows that the constraint has incorporated the concept of a squared-error loss function as well.
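Both the residual decomposition (6.2) and the stationarity condition behind the ridge solution can be checked numerically. The sketch below (arbitrary illustrative data of our own) verifies (6.2) for an arbitrary candidate B and confirms that B* = (X^T X + kI)^{-1} X^T Y makes the gradient of the Lagrangian (6.4) vanish:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 25, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

b_ols = np.linalg.solve(X.T @ X, X.T @ y)
phi_min = np.sum((y - X @ b_ols) ** 2)

B = rng.normal(size=p)                    # an arbitrary candidate estimate
phi_direct = np.sum((y - X @ B) ** 2)
phi_decomp = phi_min + (B - b_ols) @ (X.T @ X) @ (B - b_ols)   # decomposition (6.2)
print(phi_direct, phi_decomp)

# the ridge estimator from the Lagrangian stationarity condition
k = 0.3
b_ridge = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
# stationarity: 2B + (1/k)[2(X'X)B - 2(X'X)b_ols] = 0 at B = b_ridge
grad = 2 * b_ridge + (1 / k) * (2 * (X.T @ X) @ b_ridge - 2 * (X.T @ X) @ b_ols)
print(np.abs(grad).max())
```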
The constraint increases the length of A the most when the parameter estimates of the important components deviate from the OLS estimates, since it is formed by taking the squares of the deviations multiplied by their corresponding eigenvalues. This implies that it is best to shrink the estimate of β for those components that have small eigenvalues, i.e. the ones most subject to instability.

6.2 Derivation of Ridge Estimator for an ALR Model

In the presence of autocorrelated error terms, the OLS estimator of β will no longer have the minimum-variance property; the GLS estimator will be the BLUE of β. Our derivation of a new ridge-type estimator adjusted for autocorrelation, β̂_GR(k), will parallel the derivation of β̂_R(k) in the previous section. Again let B be any estimate of β. Its residual sum of squares, φ, can be written as the minimum sum of squares, φ_min, plus the distance from B to β̂_GLS weighted through (X_Δ^T X_Δ). (See (2.10) for notation.)

    φ = (Y_Δ − X_Δ B)^T (Y_Δ − X_Δ B)
      = (Y_Δ − X_Δ β̂_GLS)^T (Y_Δ − X_Δ β̂_GLS) + (B − β̂_GLS)^T X_Δ^T X_Δ (B − β̂_GLS)
      = φ_min + φ(B).

For a specific value of φ(B), φ_0, the ridge estimator is derived by choosing B to

(6.7)    minimize B^T B subject to (B − β̂_GLS)^T X_Δ^T X_Δ (B − β̂_GLS) = φ_0.

The Lagrangian is given by

    F = B^T B + (1/k)[(B − β̂_GLS)^T X_Δ^T X_Δ (B − β̂_GLS) − φ_0].

A necessary condition for a minimum is that

    ∂F/∂B = 2B + (1/k)[2(X_Δ^T X_Δ)B − 2(X_Δ^T X_Δ)β̂_GLS] = 0.

This reduces to

(6.8)    B* = β̂_GR(k) = (X_Δ^T X_Δ + kI)^{-1} X_Δ^T Y_Δ = (X^T Ω^{-1} X + kI)^{-1} X^T Ω^{-1} Y,

where k is chosen to satisfy (6.7). The characterization of the ridge trace of β̂_GR(k) will be essentially the same as that of β̂_R(k). For a specific increment φ_0, the β̂_GR so derived is the regression vector with the minimum length among the continuum of values of B that satisfy the relationship φ = φ_min + φ_0.

However, in some rare cases multicollinearity may no longer be a substantial problem after transforming X into X_Δ. This may happen in time series studies where multicollinearity is a result of the explanatory variables increasing together over time. It is then possible that the transformed variables are not close to being collinear with each other. If that is the case, the reduction in MSE cannot be obtained with only a slight increase in the residual sum of squares. This is because of the low MSE of β̂_GLS already achieved and the non-singularity of (X_Δ^T X_Δ). In most cases, if not all, the matrix (X^T Ω^{-1} X) is very likely to have a broad eigenvalue spectrum if (X^T X) does. Then the previous discussion on the motivation of minimizing the length of the regression vector, and on the interpretation and implications of the constraint in the derivation of the ridge estimator of β for a CLR model, will be applicable to the derivation of β̂_GR.

6.3 Mean Square Error of the "Generalized Estimators"

Since Y_Δ = X_Δ β + ε_Δ satisfies all the assumptions of a CLR model, the MSEs of β̂_GLS and β̂_GR(k) are readily established. Let L_3 = distance from β̂_GLS to β. Then (5.3) gives the MSE of β̂_GLS as follows:

(6.9)    E(L_3^2) = σ_u^2 tr[(X^T Ω^{-1} X)^{-1}].

Setting ρ_ε = 0 in (5.8) and letting L_4(k) = distance from β̂_GR(k) to β, (6.10) gives the MSE of β̂_GR(k):

(6.10)   E[L_4^2(k)] = σ_u^2 tr[(X^T Ω^{-1} X + kI)^{-2} (X^T Ω^{-1} X)] + k^2 β^T (X^T Ω^{-1} X + kI)^{-2} β.

The effect of autocorrelation is difficult to infer from (6.9) and (6.10), since Ω is not a diagonal matrix; however, normally we may expect E(L_3^2) and E[L_4^2(k)] to be less than E(L_1^2) and E[L_2^2(k)] respectively.

6.4 Estimation

Theoretically, GLS gives the BLUE of β for an ALR model. But usually in practice neither the order of the autocorrelation structure nor the value of the parameter ρ_ε is known; hence the GLS and GR estimates cannot be computed directly. Many two-stage methods have been proposed to approximate the GLS estimates and have proven to be quite effective. These include the Cochrane-Orcutt iterative process [1] and Durbin's two-step method [2].

In the joint presence of multicollinearity and autocorrelation, it is actually quite straightforward to combine ridge regression with one of the two-stage regression methods in the hope of achieving better estimates of β. We illustrate how ridge regression can be incorporated in Durbin's two-step method for a simple model with only two collinear explanatory variables:

(6.11)   Y_t = β_0 + β_1 X_{t1} + β_2 X_{t2} + ε_t,   t = 1, 2, ..., n,
         ε_t = ρ_ε ε_{t-1} + u_t,   |ρ_ε| < 1,
         E(u_t) = 0 for all t,
         E(u_t u_{t+s}) = σ_u^2 for s = 0,
         E(u_t u_{t+s}) = 0 for s ≠ 0.

The transformed relation is given by

(6.12)   Y_t − ρ_ε Y_{t-1} = β_0(1 − ρ_ε) + β_1(X_{t1} − ρ_ε X_{t-1,1}) + β_2(X_{t2} − ρ_ε X_{t-1,2}) + u_t.

Combining (6.11) and (6.12) gives

(6.13)   Y_t = β_0(1 − ρ_ε) + β_1 X_{t1} − β_1 ρ_ε X_{t-1,1} + β_2 X_{t2} − β_2 ρ_ε X_{t-1,2} + ρ_ε Y_{t-1} + u_t.

The first step is to estimate the parameters of (6.13) using OLS.
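The combined procedure can be sketched end to end on simulated data (all parameter values, the grid value k = .05, and the data below are illustrative, not the thesis's). Step 1 runs OLS on the unrestricted relation (6.13) to obtain an estimate of ρ_ε; step 2 applies ridge regression to the transformed relation (6.12):

```python
import numpy as np

rng = np.random.default_rng(6)
n, rho, k = 60, 0.6, 0.05
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)          # two collinear regressors
u = rng.normal(size=n)
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = rho * eps[t - 1] + u[t]        # first-order autocorrelated errors
y = 5.0 + 1.1 * x1 + 1.0 * x2 + eps

# Step 1: OLS on (6.13); the coefficient of Y_{t-1} estimates rho_eps
W = np.column_stack([np.ones(n - 1), x1[1:], x1[:-1], x2[1:], x2[:-1], y[:-1]])
coef = np.linalg.lstsq(W, y[1:], rcond=None)[0]
rho_hat = coef[-1]

# Step 2: ridge regression on the transformed relation (6.12)
ys = y[1:] - rho_hat * y[:-1]
Xs = np.column_stack([np.ones(n - 1),
                      x1[1:] - rho_hat * x1[:-1],
                      x2[1:] - rho_hat * x2[:-1]])
b = np.linalg.solve(Xs.T @ Xs + k * np.eye(3), Xs.T @ ys)
beta0_hat = b[0] / (1.0 - rho_hat)          # intercept recovered as in the text
print(rho_hat, b, beta0_hat)
```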
Then the estimated coefficient of Y_{t-1}, ρ̂_ε, is used to compute the transformed variables (Y_t − ρ̂_ε Y_{t-1}), (X_{t1} − ρ̂_ε X_{t-1,1}) and (X_{t2} − ρ̂_ε X_{t-1,2}). At the second step, ridge regression is highly recommended in place of OLS, applied to relationship (6.12) containing those transformed variables. The coefficient estimate of (X_{ti} − ρ̂_ε X_{t-1,i}) is our approximation of β_i, and the intercept term divided by (1 − ρ̂_ε) is our approximation of β_0.

It might seem reasonable to apply ridge regression at the first step of Durbin's method, since X_{t1} and X_{t2} are collinear. As stated earlier in Section 3, a high pair-wise correlation coefficient of explanatory variables does not necessarily result in estimation instability. Besides, the lagged values of X_{t1} and X_{t2} are inserted into the explanatory variable set; if X_{t1} and X_{t2} are not autocorrelated, the conditioning of the enlarged (X^T X) may be satisfactory. Moreover, u_t in (6.12) has a scalar dispersion matrix, so OLS gives consistent estimates of the regression coefficients. Also, among these estimates only the coefficient estimate of Y_{t-1} will be used to compute the transformed variables. Hence the OLS technique is recommended at the first step even when X_{t1} and X_{t2} are collinear. This combination of ridge regression and Durbin's two-step method can easily be extended to a p-variable model with a higher order of autocorrelation.

6.5 Prediction

Consider a first-order ALR model; (2.15) gives the minimum variance predictor (BLUP). In practice, both β and ρ_ε are replaced by their estimated values. If ridge regression is used in conjunction with some other methods to cope with the joint problem of multicollinearity and autocorrelation, the prediction is given by

(6.14)   Ŷ_{t+1} = X_{t+1} β̂_GR(k) + ρ̂_ε ê_t,

where X_{t+1} is a 1 × p vector of the observations on the explanatory variables at time (t+1), β̂_GR(k) is a p × 1 vector of approximated regression coefficients, ρ̂_ε is an estimate of the autocorrelation coefficient, and ê_t is the ridge residual at time t.

7. THE MONTE CARLO STUDY

Consider a first-order ALR model with two explanatory variables; the error terms in the transformed relation, as shown by (6.12), have a scalar dispersion. The residual sum of squares from (6.12) is given by

(7.1)    Σ_t û_t^2 = Σ_t [Y_t − ρ_ε Y_{t-1} − β_0(1 − ρ_ε) − β_1(X_{t1} − ρ_ε X_{t-1,1}) − β_2(X_{t2} − ρ_ε X_{t-1,2})]^2.

If the initial values X_0 and Y_0 are given, the summation can run from 1 to n; otherwise it can only run from 2 to n. The direct minimization of (7.1) with respect to β̂_0, β̂_1, β̂_2 and ρ̂_ε leads to non-linear equations; therefore analytic expressions for β̂_0, β̂_1, β̂_2 and ρ̂_ε cannot be obtained. As mentioned before, many two-stage methods have been proposed to approximate these parameters. Usually in practice, not only the parameters and the true error terms but also the order of the autocorrelation structure is unknown. As indicated previously, the joint presence of autocorrelation and multicollinearity will further complicate the situation. Under these circumstances, the relative effectiveness of those two-stage methods can best be studied by Monte Carlo experiments [19].

7.1 Design of the Experiments

The main purpose of the experiments is to give empirical support to the inferences drawn from our analytic studies. The sampling experiments are conducted in the following manner.
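One cell of such a design can be generated as follows. This is a minimal sketch for a single experiment (γ_12 = .95, ρ_ε = .50); the regressor means and variances, the coefficient values and u ~ N(0, 6) are taken as they can be read from Section 7.2, and everything else is illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
n_obs, gamma12, rho = 40, 0.95, 0.5   # one cell of the 3 x 3 design

# bivariate normal regressors with correlation gamma12
# (means 10 and 8, variances 18 and 15, as read from Section 7.2)
cov = np.array([[18.0, gamma12 * np.sqrt(18.0 * 15.0)],
                [gamma12 * np.sqrt(18.0 * 15.0), 15.0]])
L = np.linalg.cholesky(cov)
Xpair = rng.normal(size=(n_obs, 2)) @ L.T + np.array([10.0, 8.0])

# first-order autocorrelated errors driven by white noise u ~ N(0, 6)
u = rng.normal(scale=np.sqrt(6.0), size=n_obs)
eps = np.zeros(n_obs)
for t in range(1, n_obs):
    eps[t] = rho * eps[t - 1] + u[t]

# response with beta0 = 5, beta1 = 1.1, beta2 = 1
y = 5.0 + 1.1 * Xpair[:, 0] + 1.0 * Xpair[:, 1] + eps
print(np.corrcoef(Xpair[:, 0], Xpair[:, 1])[0, 1])
```

Repeating this for each (γ_12, ρ_ε) pair, and drawing ten such samples of forty observations per cell, reproduces the shape of the design described below.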
The sampling B a s i c a l l y , the sampling experiments comprise nine d i f f e r e n t experiments with d i f f e r e n t -39- degree of multicollinearity and autocorrelation. They are summarized in Table 1. ' Table 1 Experiment 1 2 3 4 5 6 7 8 9 r 12 .05 .05 .05 .50 .50 .50 .95 .95 .95 In our experiments, y ^ and .05 .50 .90 .05 .50 .90 .05 .50 .90 are used to indicate the severity of multicollinearity and autocorrelation respectively.. Usually in practice, multicollinearity constitutes a problem only when Y-^ 1 S a s high as 0.8 or 0.9. In addition, the error terms are normally considered to be independent, moderately and highly autocorrelated when p = .05, £ .50 and .90 respectively. As shown by Table 1; the experiments are set up to have different characteristics. Through this design, we can study the effects of autocorrelation on estimation and prediction for a given degree of multicollinearity. Moreover, we can observe how these effects of autocorrelation change as the degree of multicollinearity varies. The data is generated as follows; f i r s t , values are assigned to 8^, 8 , B and the probability characteristics of error terms u 1 0 fc in -40(2.9). Three s e r i e s of e the v a l u e s o f e o i n (2.9) are s u b s e q u e n t l y generated, g i v e n and d i f f e r e n t v a l u e of p . „ e The probability c h a r a c t e r i s t i c s o f the j o i n t d i s t r i b u t i o n X, ., and X^„ a r e chosen to t l t.2 J generate the s e r i e s of X ^ t h r e e experiments. and X ^ We and X ^ a r e generated f o r the r e m a i n i n g have a l s o a s s u r e d t h a t t h e r e i s no f i r s t - o r d e r a u t o c o r r e l a t i o n i n X ^ and X ^ is first-order. Solving for Y of f o r t y o b s e r v a t i o n s a r e generated. so t h a t the e r r o r structure sets F o r each experiment, ten samples In each sample, t h i r t y from t = 1 to t = 30 a r e employed by a p p r o r p i a t e methods. 
significant based on the d a t a , n i n e d i f f e r e n t can be generated f o r the experiments. on the Y 's first By v a r y i n g the c o r r e l a t i o n c o e f f i c i e n t o f X ^ and X 2» another two s e r i e s o f X ^ s i x experiments. t h a t a r e s u i t a b l e f o r the observations to e s t i m a t e the e q u a t i o n O b s e r v a t i o n s 31 to 40 a r e used to study the p r e d i c t i o n p r o p e r t i e s of e s t i m a t o r s . The BLUP i s used i n the presence of s i g n i f i c a n t a u t o c o r r e l a t i o n i n the e r r o r terms. S p e c i a l c a r e has to be e x e r c i s e d i n c o n t r o l l i n g the s e r i a l c o r r e l a t i o n p r o p e r t i e s of the e r r o r terms. In t h i s c o n n e c t i o n , OLS r e g r e s s i o n has t o be r u n on (7.2) e =pe 3 C e ,+u J C j = 1,2,...,10 t = 1,2 40 3 to determine whether the e s t i m a t e d r e g r e s s i o n c o e f f i c i e n t c o n s i s t e n t w i t h the p • which i s used to generate them. i s well-known, the OLS p £ is However, as e s t i m a t e s o f parameters f o r s m a l l samples may be b a d l y b i a s e d i f some o f the r e g r e s s o r s are lagged dependent variables [23]. T h i s i s because the e r r o r terms, u_. fc be independent of the r e g r e s s o r s , j _ ^ > E t e jt''"'' j40" G w i l l no * n longer ^• )> 2 -41- ( -i.> jt E u E ) jt+s -4.j_ $ 0 for s 4- 0 and a l l t , hence the OLS e s t i m a t e f o r the c o e f f i c i e n t of £..-, i s b i a s e d . 3t-l The u s u a l t t e s t on the e s t i m a t e of r e g r e s s i o n c o e f f i c i e n t may be q u i t e m i s l e a d i n g , t h e r e f o r e we can o n l y a s c e r t a i n t h a t the d e s i r e d s e r i a l by a s s u r i n g t h a t u ^ first t e s t whether t c o r r e l a t i o n p r o p e r t i e s are o b t a i n e d are randomly d i s t r i b u t e d . 
the s e r i e s o f u For each sample, i s c o n s i s t e n t w i t h the p r o b a b i l i t y c h a r a c t e r i s t i c s chosen t o generate them, then we use run determine whether u fc we i s randomly d i s t r i b u t e d . t e s t to Only those s e r i e s of u passed a l l the t e s t s are adopted i n our s i m u l a t i o n study. We are now ready to e s t i m a t e the r e g r e s s i o n e q u a t i o n . First, the f o r each experiment, the OLS p r i n c i p l e i s a p p l i e d to e s t i m a t e parameters. The Durbin-Watson statistic t e s t the e x i s t e n c e of a u t o c o r r e l a t i o n . statistic i s used as a f i l t e r Whenever the to Durbin-Watson computed from the f i t t e d model i s l e s s than the c o r r e s p o n d i n g upper c r i t i c a l v a l u e d^ (a=0.05), a u t o c o r r e l a t i o n i s assumed to'be p r e s e n t i n the e r r o r terms, then D u r b i n ' s two-step method i s used i n con j u n c t i o n ' w i t h Ridge r e g r e s s i o n f o r e s t i m a t i o n as d e s c r i b e d i n S e c t i o n 6.4; o n l y OLS purposes. o t h e r w i s e , a u t o c o r r e l a t i o n i s assumed to be absent, and and Ridge r e g r e s s i o n s t e c h n i q u e s are employed for- e s t i m a t i o n In a d d i t i o n , whenever the e x i s t e n c e of a u t o c o r r e l a t i o n i s ~ r e c o g n i z e d , 3„ ~ Tc and B j -' (k) are computed f o r comparison purposes. Since the t r u e v a l u e of the a u t o c o r r e l a t i o n c o e f f i c i e n t each experiment, c a l c u l a t i o n s o f 3 and 3™(k) - GLo ~ GR i s known f o r w i l l simply be the s t r a i g h t f o r w a r d m u l t i p l i c a t i o n of m a t r i c e s as shown by (2.13) and (6.8). The methods adopted f o r e s t i m a t i o n i n each experiment are r e c o r d e d i n T a b l e 2. 
Table 2

Experiment   Methods
1            OLS, RR
2            Durb., Durb.+RR, GLS, GR
3            Durb., Durb.+RR, GLS, GR
4            OLS, RR
5            Durb., Durb.+RR, GLS, GR
6            Durb., Durb.+RR, GLS, GR
7            OLS, RR
8            Durb., Durb.+RR, GLS, GR
9            Durb., Durb.+RR, GLS, GR

OLS:       Ordinary Least-squares Regression
RR:        Ridge Regression
Durb.:     Durbin's Two-step Method
Durb.+RR:  Durbin's Two-step Method in conjunction with Ridge Regression
GLS:       Generalized Least-squares Regression
GR:        Ridge Regression adjusted for Autocorrelation

As is expected, no correction for autocorrelation is necessary for experiments 1, 4 and 7. Whenever the ridge method is applied, seven or eight values of k have been used in our study. In order to minimize the effects of subjectivity resulting from selecting the value of k in ridge regressions, we compute the mean β̂_R over the samples for every specific value of k in each experiment. The value of k is then selected on the basis of a "Mean Ridge Trace". That is to say, a uniquely selected value of k will generally be the best for all ten samples. Obviously, the value of k so selected may well not be the best for every individual sample. Therefore, the minimum of the MSE of the ridge estimates of β achieved for each experiment is slightly upward biased. Certainly this way of selecting the value of k cannot be used in practice.

7.2 Sampling Results

For each method in each experiment, the MSE of the estimates of β, the adjusted R², the residual sum of squares, the MSE of forecast and the Durbin-Watson statistic are averaged over ten samples.
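The "Mean Ridge Trace" can be sketched as follows: for each k on the grid, the ridge estimates are averaged over ten simulated samples, and the averaged coefficients are inspected as a function of k. The data below (two collinear regressors, no autocorrelation) are illustrative, not the thesis's:

```python
import numpy as np

rng = np.random.default_rng(7)
ks = [0.0, 0.025, 0.05, 0.075, 0.1, 0.2, 0.3, 0.5]   # grid like the one in the study
n_samples, n = 10, 30
beta = np.array([1.1, 1.0])

mean_trace = np.zeros((len(ks), 2))
for _ in range(n_samples):
    x1 = rng.normal(size=n)
    x2 = x1 + 0.05 * rng.normal(size=n)               # collinear pair
    X = np.column_stack([x1, x2])
    y = X @ beta + rng.normal(size=n)
    for j, k in enumerate(ks):
        mean_trace[j] += np.linalg.solve(X.T @ X + k * np.eye(2), X.T @ y)
mean_trace /= n_samples

for k, b in zip(ks, mean_trace):
    print(f"k={k:5.3f}  mean b1={b[0]: .4f}  mean b2={b[1]: .4f}")
```

A single k is then read off where the averaged trace stabilizes, which is why the resulting per-experiment minimum MSE is slightly upward biased relative to sample-by-sample tuning.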
In addition, the mean estimate of ρ_ε and the mean Haitovsky heuristic statistic are also computed. u_t is assumed to follow a normal distribution with mean zero and variance equal to 6, i.e. u_t ~ N(0, 6). The true mean of X_{t1} is 10 and that of X_{t2} is 8. The respective variances of X_{t1} and X_{t2} are 18 and 15. The true value of β_0 is 5, β_1 is 1.1 and β_2 is 1. v is chosen to be 3 for each sample.

7.2a Results assuming ρ_ε is Known

First we assume ρ_ε is known. The results here will indicate the best of situations, whether the methods described in Section 6.4 can show promise. Table 3 contains the average MSE of β̂_GLS and β̂_GR for experiments 2, 3, 5, 6, 8 and 9.

Table 3

        Experiment
          2          3          5          6          8          9
        (γ12=.05,  (γ12=.05,  (γ12=.50,  (γ12=.50,  (γ12=.95,  (γ12=.95,
k        ρε=.50)    ρε=.90)    ρε=.50)    ρε=.90)    ρε=.50)    ρε=.90)
0.0*      .4824     3.2913     .0834     2.3940     .1011     2.1681
.025      .0591     1.8084     .0018     1.4991     .0561      .9405
.05       .0363      .8073     .1215      .8337
.075      .3603      .2274     .3033      .4263
.1        .9849      .0156     .8871      .1092     .4929      .2901
.2       5.7786     2.0100    4.0851      .6153    2.4426      .4242
.3       2.9771     7.0896    9.3743     4.0521    5.4924     1.0113
.5      30.1494    21.7887   21.1581    11.4165   13.7628     4.8285
.7      47.5367    36.6241   35.2063    22.2489   23.7003    10.9524
1.0     75.6732    63.4542   56.5869    39.9762   38.5830

* GLS regression can be considered a special case of ridge regression adjusted for autocorrelation, with k = 0.

In Section 5 it has been shown that the MSE of β̂_OLS will increase rapidly if significantly positive autocorrelation exists both in the disturbances and in the principal components. Correction for autocorrelation will then be necessary in estimating the regression equation.
Though the GLS regression yields the BLUE of β, the behavior of the MSE of β̂_GLS is very difficult to infer from (6.10). From the MSE of experiments 3, 6 and 9 when k = 0, we observe that the MSE of β̂_GLS decreases as the degree of multicollinearity increases for a sufficiently high degree of autocorrelation. On the other hand, for a given degree of multicollinearity, the MSE of β̂_GLS will increase as the degree of autocorrelation increases. But the magnitude of the increase in the MSE of β̂_GLS decreases as the correlation among the explanatory variables increases. For instance, the difference in the MSE of β̂_GLS between experiments 8 and 9 is less than that between experiments 5 and 6.

Moreover, Table 3 shows that there exists at least one value of k for each experiment such that the MSE of β̂_GR is less than that of β̂_GLS. Note that when ρ_ε = .9, k = .1 attains the minimum MSE of the estimates of β in experiment 9. This also implies that the transformed matrix (X̃ᵀX̃) is still ill-conditioned. In Section 5.3 we have shown that the range of k such that the MSE of β̂_R is less than that of β̂_OLS will be larger if multicollinearity is accompanied by a high degree of autocorrelation. Now with parameter estimates fully adjusted for autocorrelation (since ρ_ε is known), experiment 9 still has the largest admissible range of k. That is, the range of k such that the MSE of β̂_GR is less than that of β̂_GLS is still larger if multicollinearity is accompanied by a high degree of autocorrelation and autocorrelation has been fully adjusted. We also observe that as the degree of autocorrelation increases, a larger reduction in the MSE of the estimates of β can be obtained by replacing β̂_GLS with β̂_GR. For instance, the difference in the MSE of β̂_GLS and β̂_GR(.05) in experiment 8 is less than that in experiment 9.
However, the behavior of the MSE of β̂_GR is very difficult, if not impossible, to predict. (6.10) shows that the MSE of β̂_GR is comprised of two terms. How each term behaves will depend not only on the data matrix X and the degree of autocorrelation but also on the way the matrix X is linked with the matrix Ω⁻¹.

7.2b Results assuming ρ_ε is Unknown

In practice ρ_ε is unknown. We assume that the true autocorrelation coefficient is unknown and we try to fit the equation using heuristic techniques akin to Durbin's two-step method, which has been shown by Griliches and Rao [8] to perform well when there is autocorrelation. We apply these techniques as described in Section 7.1.

Tables 4, 5 and 6 report the mean adjusted R² and the mean Durbin-Watson statistic for each experiment. (d_U(α = 5%) = 1.57 for experiments 1, 4 and 7; d_U(α = 5%) = 1.56 for the remaining six experiments.)

Table 4 (γ₁₂ = .05)

        Experiment:      1                  2                  3
                      ρ_ε = .05          ρ_ε = .50          ρ_ε = .90
  k             R²_a*      d**      R²_a      d        R²_a      d
 0.0            .8640    2.0791     .8979   1.8766     .9149   1.8380
 .025           .8640    2.0903     .8884   1.8713     .9144   1.8269
 .05            .8620    2.1001     .8868   1.8728     .9128   1.8286
 .075           .8579    2.1083     .8844   1.8805     .9104   1.8409
 .1             .8567    2.1154     .8813   1.8916     .9071   1.8613
 .2             .8401    2.1353     .8619   1.9619     .8888   1.9809
 .3             .8167    2.1463     .8399   2.0383     .8648   2.1030
 .5             .7651    2.1536     .7865   2.1563     .8102   2.2789
 1.0            .6394    2.1527     .6571   2.2960     .6780   2.4740

* R²_a: the mean adjusted R².
** d: the mean Durbin-Watson statistic.
Table 5 (γ₁₂ = .50)

        Experiment:      4                  5                  6
                      ρ_ε = .05          ρ_ε = .50          ρ_ε = .90
  k             R²_a       d        R²_a      d        R²_a      d
 0.0            .8973    2.0984     .9178   1.8958     .9475   2.0754
 .025           .8970    2.1040     .9175   1.9054     .9472   2.0865
 .05            .8962    2.1095     .9168   1.9194     .9464   2.1059
 .075           .8950    2.1147     .9155   1.9371     .9451   2.1317
 .1             .8933    2.1198     .9138   1.9576     .9434   2.1622
 .2             .8832    2.1381     .9032   2.0540     .9336   2.3024
 .3             .8702    2.1592     .8895   2.1504     .9184   2.4532
 .5             .8351    2.1721     .8546   2.2988     .8834   2.6086
 .7             .7970    2.1826     .8187   2.4203     .8440   2.7084
 1.0            .7412    2.1900     .7572   2.4732     .7846   2.7887

Table 6 (γ₁₂ = .95)

        Experiment:      7                  8                  9
                      ρ_ε = .05          ρ_ε = .50          ρ_ε = .90
  k             R²_a       d        R²_a      d        R²_a      d
 0.0            .9208    2.0511     .9391   1.8785     .9575   1.8707
 .05            .9201    2.0873     .9380   1.9058     .9565   1.8883
 .1             .9181    2.1128     .9359   1.9437     .9544   1.9335
 .2             .9116    2.1473     .9293   2.0280     .9677   2.0562
 .3             .9024    2.1659     .9199   2.1084     .9382   2.1660
 .5             .8786    2.1768     .8957   2.2312     .9136   2.8381
 .7             .8505    2.1725     .8673   2.3082     .8847   2.4417
 1.0            .8056    2.1594     .8217   2.3739     .8399   2.5259

From Tables 4-6, we observe that the adjusted R² increases as the degree of autocorrelation increases for a given value of k and a given degree of multicollinearity. This is intuitively plausible since autocorrelation can account for part of the variation in the errors, thereby decreasing the residual sum of squares and increasing the adjusted R². For each experiment, the adjusted R² decreases as k increases. The reason is obvious from the derivation of ridge estimators. Besides, the best R²_a achieved for each experiment is pretty high; that is, the estimated model can explain most of the variation in Y_t. This also implies that the estimation methods adopted in our experiments are fairly efficient and powerful.
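Durbin's two-step method used throughout this subsection can be sketched as follows. This is a generic textbook version for a static model with AR(1) errors (the thesis's model also contains a lagged dependent variable, which this sketch omits): step one regresses y_t on y_{t-1}, x_t and x_{t-1} and takes the coefficient on y_{t-1} as ρ̂; step two runs OLS on the ρ̂-differenced data.

```python
import numpy as np

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def durbin_two_step(X, y):
    """Durbin's two-step estimator for y_t = x_t'beta + eps_t,
    eps_t = rho*eps_{t-1} + u_t.  X must start with a column of ones.

    Step 1: regress y_t on y_{t-1}, x_t, x_{t-1}; the coefficient on
            y_{t-1} estimates rho.
    Step 2: OLS on the rho-differenced (quasi-differenced) data.
    """
    Z = np.column_stack([y[:-1], X[1:], X[:-1, 1:]])  # drop the duplicate constant
    rho_hat = ols(Z, y[1:])[0]
    Xs = X[1:] - rho_hat * X[:-1]
    Xs[:, 0] = 1.0 - rho_hat                          # transformed intercept column
    ys = y[1:] - rho_hat * y[:-1]
    return rho_hat, ols(Xs, ys)

# Check on simulated data with rho = 0.7 and beta = (1, 2).
rng = np.random.default_rng(0)
n = 200
x = rng.standard_normal(n)
u = rng.standard_normal(n)
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = 0.7 * eps[t - 1] + u[t]
y = 1.0 + 2.0 * x + eps
X = np.column_stack([np.ones(n), x])
rho_hat, beta_hat = durbin_two_step(X, y)
print(round(rho_hat, 2), np.round(beta_hat, 2))
```

The first-step estimate of ρ is typically biased toward zero, which is exactly the underestimation reported in Table 9 below.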
The mean Durbin-Watson statistic computed for each method for each experiment is high enough to ascertain that the fitted model has successfully removed the problem of autocorrelation. Since the model is reasonably well fitted, simulation comparisons of the experimental results should be meaningful as well as informative. The average MSE of the estimates of β is computed for each method for each experiment and recorded in Table 7.

Table 7

        Experiment
  k       1        2        3        4        5        6        7        8        9
 0.0    .1101    .4824    .9594    .0030    .0342    .3996    .0180    .0720    .6951
 .025   .0192    .0570    .2691    .1104    .0210    .0945      —        —        —
 .05    .2865    .0390    .0087    .4158    .1965    .0024    .1833    .0719    .0939
 .075   .8820    .3744    .1200    .8973    .5778    .1026      —        —        —
 .1    1.7430   1.0167    .5559   1.5366   1.1097    .3765    .8307    .5643    .0041
 .2    7.3539   5.9115   4.7694   5.3823   4.5690   2.7540   3.2001   1.2549    .5822
 .3   15.1383  13.2456  11.5962  10.8432   9.5949   6.8219   6.6531   5.7624   3.3012
 .5   33.5067  31.1559  28.8369  23.9694  22.3449  18.6255  15.6804  14.2377  10.0251
 .7      —        —        —     38.6334  36.9396  32.0214  26.3049  24.3789  18.5115
 1.0  79.3134  76.8708  73.8630  60.7620  50.3176  52.8276  43.3464  40.2351  32.3991

As in the known-autocorrelation case, when k = 0 the MSE of the estimates of β increases as the degree of autocorrelation increases, given the degree of multicollinearity. However, unlike the known-autocorrelation case, the MSE of the estimates of β first decreases then increases as the degree of multicollinearity increases, for k = 0 and a given degree of autocorrelation.
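Each entry of Table 7 is an average over the ten Monte Carlo samples of the squared estimation error; a minimal sketch (the illustrative estimates below are hypothetical, not the thesis's):

```python
import numpy as np

def average_mse(estimates, beta_true):
    """Average squared estimation error over Monte Carlo samples:
    mean over samples of ||beta_hat - beta_true||^2."""
    B = np.asarray(estimates, dtype=float)   # shape (n_samples, p)
    return float(np.mean(np.sum((B - beta_true) ** 2, axis=1)))

beta_true = np.array([5.0, 1.1, 1.0])        # true coefficients of the design
rng = np.random.default_rng(0)
ests = beta_true + 0.1 * rng.standard_normal((10, 3))  # hypothetical estimates
print(round(average_mse(ests, beta_true), 4))
```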
Table 7 shows that, except for experiments 4 and 7, better estimates of β in the MSE criterion can be obtained if Durbin's two-step method is combined with ridge regression for estimation. Besides, amazingly, we have found that we are able to obtain better estimates of β in terms of MSE if the true autocorrelation coefficient ρ_ε is unknown. For clarity, we shall compare only the minimum of the average MSE of the estimates of β achieved for each experiment in the ρ_ε unknown case with that in the ρ_ε known case. Table 8 reports the minima of the average MSE of the estimates of β achieved for each experiment in both the known and unknown cases. In addition, the estimation method and the characteristics of each experiment are also tabulated.

Table 8

                                                  (ρ_ε unknown)           (ρ_ε known)
Experiment  (γ₁₂, ρ_ε)   Estimation Method    k    Min. MSE of β̂      k    Min. MSE of β̂_GR
    1       (.05, .05)   RR                  .025     .0192           —        —
    2       (.05, .50)   Durb.+RR            .05      .0390          .05      .0363
    3       (.05, .90)   Durb.+RR            .05      .0087          .1       .0156
    4       (.50, .05)   OLS                 0.0      .0030           —        —
    5       (.50, .50)   Durb.+RR            .025     .0210          .025     .0018
    6       (.50, .90)   Durb.+RR            .05      .0024          .1       .1092
    7       (.95, .05)   OLS                 0.0      .0180           —        —
    8       (.95, .50)   Durb.+RR            .05      .0719          .05      .0561
    9       (.95, .90)   Durb.+RR            .1       .0441          .1       .2901

A couple of interesting observations can be made from Table 8. First, if the degree of multicollinearity is held constant, the minimum of the average MSE of the parameter estimates will first increase then decrease as the degree of autocorrelation increases. On the other hand, given the degree of autocorrelation, the minimum of the average MSE of the parameter estimates will first decrease then increase as the degree of multicollinearity increases.
MSE o f the parameter then i n c r e a s e as the degree o f m u l t i - These a r e i n t u i t i v e l y p l a u s i b l e s i n c e s u f f i c i e n t h i g h degree o f a u t o c o r r e l a t i o n s h o u l d - l e a d to more s t a b l e parameter e s t i m a t e s w h i l e s u f f i c i e n t h i g h degree o f m u l t i c o l l i n e a r i t y usually r e s u l t s i n v e r y u n s t a b l e parameter e s t i m a t e s . observe t h a t the v a l u e o f r i d g e parameter k, used of MSE Secondly, we t o a c h i e v e the minimum MSE the e s t i m a t e s o f 8, i n c r e a s e s w i t h the degree o f m u l t i c o l l i n e a r i t y and a u t o c o r r e l a t i o n . T h i s i s c o n s i s t e n t w i t h our a n a l y t i c findings shown i n Section 5.3. Moreover, we have found that knowing does not give better estimates of 6 for s u f f i c i e n t high degree of autocorrelation. This may r e s u l t from sample sizes being small. Table 9 contains the mean estimates of p^ obtained i n the f i r s t step of Durbin's method and the mean Haitovsky h e u r i s t i c s t a t i s t i c for each experiment. Table 9 Experiment p c Bias i n p e H x^ df = 3 2 1 2 3 — .3581 125.7 4 5 6 .7182 .3586 .7231 .1419 .1818 .1414 .1769 123.1 111.7 38.7 37.4 39.1 7 — 2.78 8 9 .3849 .7498 .1151 .1502 2.53 2.40 In a l l cases, Durbin's two-step method tends to underestimate the true autocorrelation c o e f f i c i e n t . This results from the presence of the lagged Y values among the explanatory variables [16]. I f the degree of multi- c o l l i n e a r i t y Is held constant, the bias of estimate of p^ increases as the degree of autocorrelation increases; while given the degree of autocorrelation, the bias decreases as the degree of m u l t i c o l l i n e a r i t y increases. In our simulation study, Haitovsky h e u r i s t i c s t a t i s t i c can recognize the existence of severe m u l t i c o l l i n e a r i t y i n experiments 7, 8 and 9. 
However, it does not give any warning when there exists a fairly high degree of multicollinearity; i.e., based on the Haitovsky test, multicollinearity is insignificant in experiments 4, 5 and 6. Since the Haitovsky test is based on the determinant of the correlation matrix, it has some built-in deficiencies (see Section 3.3 for details). Our experiments have disclosed these deficiencies to a certain extent; hence we suggest that special care has to be exercised in applying this test.

7.2c Forecasting

Tables 10, 11 and 12 report the average residual sums of squares and the mean square error of prediction from the given values for the forecast period of each experiment, under the assumption that ρ_ε is unknown.

Table 10 (γ₁₂ = .05, σ²_u = 6)

        Experiment:      1                   2                   3
  k            σ̂²_u*    MSE_F/C**    σ̂²_u    MSE_F/C      σ̂²_u    MSE_F/C
 0.0           5.9700    8.1132     5.7055    9.1966     5.6351   10.939
 .025          5.9924    8.0343     5.7332    9.6623     5.6721   10.952
 .05           6.0554    7.9961     5.8113    9.0620     5.7762   11.034
 .075          6.1536    7.9986     5.9328    9.1072     5.9382   11.173
 .1            6.2824    8.0343     6.0918    9.1913     6.1501   11.360
 .2            7.0250    8.4293     7.0074    9.8181     7.3716   12.470
 .3            8.0022    9.0877     8.2087   10.739      8.9754   13.932
 .5           10.245    10.755     10.957    12.944     12.6521   17.249
 1.0          15.733    15.075     17.656    18.419     21.649    25.164

* σ̂²_u: the average of the residual sums of squares over ten samples.
** MSE_F/C: the average MSE of predictions from the given values for the forecast period.
Table 11 (γ₁₂ = .50, σ²_u = 6)

        Experiment:      4                   5                   6
  k            σ̂²_u     MSE_F/C      σ̂²_u    MSE_F/C      σ̂²_u    MSE_F/C
 0.0           6.0169    8.2093     5.8625    9.3838     5.7757   10.733
 .025          6.0331    8.1743     5.8828    9.3691     5.8038   10.731
 .05           6.0797    8.1690     5.9409    9.3874     5.8838   10.672
 .075          6.1541    8.1905     6.0330    9.4351     6.0109   10.850
 .1            6.2518    8.2360     6.1559    9.5093     6.1803   10.961
 .2            6.8444    8.6142     6.8849   10.021      7.1976   11.679
 .3            7.2030    9.2476     7.9231   10.783      8.6992   12.482
 .5            9.7190   10.851     10.470    12.735     12.128    15.327
 .7           11.933    12.726     13.070    14.851     15.759    18.018
 1.0          15.429    15.620     17.566    18.301     21.929    22.680

Table 12 (γ₁₂ = .95, σ²_u = 6)

        Experiment:      7                   8                   9
  k            σ̂²_u     MSE_F/C      σ̂²_u    MSE_F/C      σ̂²_u    MSE_F/C
 0.0           6.0443    8.3699     5.6725    9.8589     5.4033   11.335
 .05           6.1186    8.1165     5.7759    9.4141     5.5390   10.758
 .1            6.2753    8.1220     6.0473    9.3568     5.8110   10.754
 .2            6.7890    8.4120     6.6160    9.5945     6.7433   11.172
 .3            7.5180    8.9408     7.5161   10.116      7.9187   12.001
 .5            9.4101    9.8466    10.445    11.120     11.683    14.301
 .7           11.643    12.290     12.584    13.651     14.881    17.123
 1.0          15.198    15.343     16.973    16.918     20.713    21.591

Though the BLUP is adopted for forecast purposes, the MSE of prediction will still increase as the degree of autocorrelation increases. However, the main point is that the presence of multicollinearity will adversely affect the predictive performance if the disturbances are highly serially correlated. The commonly held belief that the predictive power of the model is not affected by the existence of multicollinearity is only true if the problem of autocorrelation is not serious. In the 9th experiment, the model fitted by Durbin's method gives satisfactory results on various diagnostic tests; still, the prediction of the BLUP leaves much to be desired.
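For reference, the one-step-ahead BLUP under AR(1) errors augments the regression forecast with ρ times the last in-sample residual, ŷ_{T+1} = x_{T+1}ᵀβ̂ + ρ ε̂_T. A minimal generic sketch (not the thesis's exact forecasting code):

```python
import numpy as np

def blup_one_step(X, y, x_next, rho, beta_hat):
    """One-step-ahead BLUP under AR(1) errors:
    y_hat(T+1) = x_next' beta_hat + rho * (last in-sample residual)."""
    last_resid = y[-1] - X[-1] @ beta_hat
    return x_next @ beta_hat + rho * last_resid

# Tiny worked example: beta_hat = (1, 1), last residual = 3 - 2 = 1, rho = 0.5,
# so the forecast is 3 + 0.5 * 1 = 3.5.
X = np.array([[1.0, 0.0], [1.0, 1.0]])
y = np.array([1.0, 3.0])
beta_hat = np.array([1.0, 1.0])
x_next = np.array([1.0, 2.0])
print(blup_one_step(X, y, x_next, 0.5, beta_hat))  # -> 3.5
```

With ρ = 0 the correction vanishes and the BLUP reduces to the ordinary regression forecast.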
Fortunately, with Durbin's method combined with ridge regression, we are able to determine a model which performs well on various tests in the joint presence of multicollinearity and autocorrelation. We also observed that the value of k that yields the best estimates of β in the MSE criterion, and not the value of k giving the best estimate of the residual sum of squares, will usually give the minimum MSE of prediction for each experiment. Hence, we may conclude the validity of the MSE criterion in the evaluation of parameter estimates.

To avoid confusion, we have not reported the MSE of prediction based on β̂_GLS and the true ρ_ε. However, we found that the MSE of prediction so obtained is less than that based on the parameter estimates obtained by Durbin's two-step method in conjunction with ridge regression. Though Durbin's two-step method combined with ridge regression gives better estimates of β, in general it underestimates ρ_ε. Therefore, the BLUP based on β̂_GLS and the true ρ_ε gives the minimal MSE of prediction in each of the experiments 2, 3, 5, 6, 8 and 9.

CONCLUSIONS

It has been shown that in the presence of multicollinearity with sufficiently high degrees of autocorrelation, the OLS estimates of the regression coefficients can be highly inaccurate. Improving the estimation procedure is obviously necessary. Combining GLS and ridge regression, we derived a new estimator,

    β̂_GR(k) = (XᵀΩ⁻¹X + kI)⁻¹XᵀΩ⁻¹Y,    0 < k < 1,

where Ω is defined in (2'). β̂_GR(k), though biased, is expected to perform well in the joint presence of multicollinearity and autocorrelation.
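A sketch of this estimator, assuming Ω is the usual AR(1) error covariance, Ω_ij = ρ^|i−j| / (1 − ρ²); the thesis's (2') is assumed here to be of this form up to the σ²_u scale, which cancels out of the estimator:

```python
import numpy as np

def ar1_omega(n, rho):
    """AR(1) error covariance up to the sigma_u^2 scale:
    Omega[i, j] = rho**|i-j| / (1 - rho**2)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - rho ** 2)

def ridge_gr(X, y, k, rho):
    """Ridge estimator adjusted for autocorrelation:
    beta_GR(k) = (X' Omega^{-1} X + kI)^{-1} X' Omega^{-1} y.
    k = 0 gives the GLS estimator."""
    n, p = X.shape
    Oinv = np.linalg.inv(ar1_omega(n, rho))
    return np.linalg.solve(X.T @ Oinv @ X + k * np.eye(p), X.T @ Oinv @ y)

# With rho = 0, Omega = I and k = 0, this reduces to OLS and recovers
# the exact coefficients (1, 2) of this noise-free example.
X = np.column_stack([np.ones(6), np.arange(6.0)])
y = 1.0 + 2.0 * np.arange(6.0)
print(np.round(ridge_gr(X, y, 0.0, 0.0), 6))
```

Setting k = 0 gives GLS; increasing k shrinks the estimate, trading bias for variance when XᵀΩ⁻¹X is ill-conditioned.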
However, since Ω is unknown, parameter estimates based on the biased estimator β̂_GR(k) cannot be obtained in practice. Therefore, we combined Durbin's two-step method with ordinary ridge regression to approximate those parameters. The effectiveness of our approximation can then best be examined by the Monte Carlo simulation.

Our study has confirmed that, for a given degree of multicollinearity, the MSE of the GLS estimates of β is directly proportional to the degree of autocorrelation. This agrees with conventional wisdom. Unexpectedly, we found that the MSE of the GLS estimates of β is inversely proportional to the degree of multicollinearity for a sufficiently high degree of autocorrelation. This implies that in the application of the GLS technique, the symptom of the existence of multicollinearity may be disguised.

However, since in practice neither the true error terms nor the autocorrelation coefficient is known, no GLS estimates can possibly be obtained. We were pleased to find that in the joint presence of multicollinearity and autocorrelation, whatever the degree is, Durbin's two-step method in conjunction with ridge regression (ρ_ε unknown) yields even better estimates of β than the GLS technique (ρ_ε known) does in the MSE criterion. Though the value of k giving better estimates of β tends to yield less MSE of prediction, still the GLS gives the minimal MSE of prediction in all cases.
Besides, our experimental results have shown that the Durbin-Watson test for detecting the existence of first-order autocorrelation remains powerful in the presence of multicollinearity, while the Haitovsky heuristic statistic gives relatively limited information about the existence of multicollinearity, either with or without the presence of autocorrelated error terms.

Our results also suggest that it might be possible to find an "optimal estimation package" that deals with the joint problem of multicollinearity and autocorrelation. Empirical research has hitherto been confined to the search for optimal estimation techniques dealing with multicollinearity and autocorrelated errors as separate and independent phenomena. The combination of Durbin's two-step method and ordinary ridge regression, i.e., adding a constant k to the diagonal of the correlation matrix XᵀX, has been shown to be a very powerful technique in handling multicollinearity and autocorrelation problems. Even though satisfactory estimation and prediction are obtained by the combination of Durbin's method and ordinary ridge regression, there may still exist some other, even more efficient approaches to the joint problem of multicollinearity and autocorrelation.
For instance, the combination of the Cochrane-Orcutt procedure with Generalized Ridge regression is a more flexible estimation technique and thereby should lead to better estimation and prediction. Allowing for higher order and mixed order autocorrelation will be a good direction to pursue as well.

BIBLIOGRAPHY

[1] Cochrane, D. and Orcutt, G. H. (1949). Application of least squares regressions to relationships containing autocorrelated error terms. J. Am. Statist. Assoc., 44, 32-61.

[2] Durbin, J. (1960). Estimation of parameters in time-series regression models. J. Royal Statist. Soc., Series B, 22, 139-153.

[3] Durbin, J. (1970). Testing for serial correlation in least squares regression when some of the regressors are lagged dependent variables. Econometrica, 38, 410-421.

[4] Durbin, J. and Watson, G. S. (1950). Testing for serial correlation in least-squares regression (part 1). Biometrika, 37, 409-428.

[5] Durbin, J. and Watson, G. S. (1951). Testing for serial correlation in least-squares regression (part 2). Biometrika, 38, 159-178.

[6] Farrar, D. E. and Glauber, R. R. (1967). Multicollinearity in regression analysis: the problem revisited. Rev. Economics and Statistics, 49, 92-107.

[7] Graybill, F. A. (1976). Theory and Application of the Linear Model. Duxbury Press, North Scituate, Mass.

[8] Griliches, Z. and Rao, P. (1969). Small-sample properties of several two-stage regression methods in the context of autocorrelated errors. JASA, 64, 253-272.

[9] Haitovsky, Y. (1969). Multicollinearity in regression analysis: comment. Rev. Economics and Statistics, 486-489.

[10] Henshaw, R. C., Jr. (1966). Testing single-equation least squares regression models for autocorrelated disturbances. Econometrica, 34, 646-660.
[11] Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12, 55-67.

[12] Hoerl, A. E., Kennard, R. W., and Baldwin, K. F. (1975). Ridge regression: some simulations. Comm. Statist., 4, 105-123.

[13] Johnston, J. (1972). Econometric Methods, 2nd edn. McGraw-Hill.

[14] Klein, L. R. (1962). An Introduction to Econometrics. Prentice-Hall.

[15] Liu, T. C. (1960). Underidentification, structural estimation and forecasting. Econometrica, 28, 856.

[16] Marriott, F. H. C. and Pope, J. A. (1954). Bias in the estimation of autocorrelation. Biometrika, 41, 390-402.

[17] Nerlove, M. and Wallis, K. F. (1966). Use of the Durbin-Watson statistic in inappropriate situations. Econometrica, 34, 235-238.

[18] Newhouse, J. P. and Oman, S. D. (1971). An evaluation of ridge estimators. Rand Report No. R-716-PR.

[19] Smith, V. K. (1973). Monte Carlo Methods. Lexington, Mass.

[20] Thisted, R. A. (1976). Ridge regression, minimax estimation and empirical Bayes methods. Ph.D. thesis, Tech. Report 28, Biostatistics Dept., Stanford University.

[21] Thisted, R. A. (1978). Multicollinearity, information and ridge regression. Statistics Dept., University of Chicago.

[22] von Neumann, J. (1941). Distribution of the ratio of the mean square successive difference to the variance. Ann. Math. Statist., 12, 367-395.

[23] White, J. S. (1961). Asymptotic expansions for the mean and variance of the serial correlation coefficient. Biometrika, 48, 85-94.