COLLINEARITY IN GENERALIZED LINEAR MODELS by MURRAY J MACKINNON M.Sc. U n i v e r s i t y of O t a g o N.Z. A THESIS SUBMITTED IN PARTIAL FULFILMENT THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in FACULTY OF GRADUATE Commerce and B u s i n e s s We accept to this standard OF BRITISH COLUMBIA April © Administration t h e s i s as c o n f o r m i n g the r e q u i r e d THE UNIVERSITY STUDIES 1986 M u r r a y J M a c k i h n o n , 1986 OF In presenting requirements this that I agree that available permission scholarly for partial purposes or understood that gain by may his be or copying shall the reference f o r extensive Department financial in not Commerce a n d B u s i n e s s of the this granted by the her Head Administration i t agree thesis representatives. allowed make I further of or p u b l i c a t i o n be shall copying THE UNIVERSITY OF BRITISH COLUMBIA 2075 Wesbrook P l a c e V a n c o u v e r , Canada V6T 1W5 1986 Library and s t u d y . permission. Date: A p r i l fulfilment f o r an a d v a n c e d d e g r e e a t t h e THE UNIVERSITY OF BRITISH COLUMBIA, freely thesis of It for my is of t h i s t h e s i s f o r without my written Abstract The is concept introduced models. Two presented These pth A of collinearity are analysed bound binomial i s generalized illustrated t o t h e same proportion f o r model and p r i n c i p a l a Monte that gamma, methods components of models standard linear diagnostic degree i n terms linear collinearity and negative the Carlo f o r detecting f o rt h e Poisson, Estimation with to f o r t o lead derived linear model. likelihood compared approaches a n d shown order, linear and f o rgeneralized procedure. inverse Gaussian, binomial models. of collinearity that based of on i n a t h e standard ridge, a r e proposed, simulation are prior and b r i e f l y o f a gamma model. iii Table of Contents 0.0 Int.roduct.ion 1.0 Col l i n e a r i t y i n Standard L i n e a r Models Definition 1.2 Sources of C o l l i n e a r i t y 5 1.2.1 Large P a i r w i s e C o r r e l a t i o n s 5 1.2.2 Data C o l l e c t i o n 6 1.2.3 Model S p e c i f i c a t i o n 8 1.2.4 Overdefined 8 1.2.5 Outliers Effects of C o l l i n e a r i t y 3 1.1 1.3 2.Q 1 2.2 3 Model 9 of C o l l i n e a r i t y 10 1.3.1 111 c o n d i t i o n i n g of X 1.3.2 Estimate 1.3.3 Inference Effects 13 1.3.4 Predictor Effects 14 Definition Definition 11 L i n e a r Models of a G e n e r a l i z e d of 10 Effects C o l l i n e a r i t y l n Generalized 2.1 . . . 16 L i n e a r Model . . . . C o l l i n e a r i t y i n Generalized 16 Linear Models 22 2.2.1 Linearisation 2.2.2 Iteratively Approach of the Link F u n c t i o n . Reweighted Least ... 23 Squares 25 2.2.3 Choice of Approach 2.3 R e l a t i o n s h i p o f the 25 Standard L i n e a r Model and the G e n e r a l i z e d Model C o l l i n e a r i t y D e f i n i t i o n s 2.4 Sources o f Collinearity in . . . a Generalized Linear Model 2.5 29 Effects of Collinearity in a Generalized Linear Model 32 2.5.1 Estimation Effects 32 2.4.2 Inference 33 2.6 Appendix Generalized 2.7 and P r e d i c t o r E f f e c t s 2A Maximum Likelihood and the L i n e a r Model Appendix 2B 35 I t e r a t i v e l y Reweighted Least Squares Algorithm 3.0 26 Diagnostics 40 for Collinearity in Generalized Linear Models 3.1 42 Desirable Properties f o r Diagnostic Measures 42 4.0 3.2 Measures of C o l l i n e a r i t y 3.3 Model Dependency Estimation for 43 in Collinear Generalized Systems Linear Models Presence o f C o l l i n e a r i t y 4.1 Remedies f o r C o l l i n e a r 46 in the 50 Sources 50 4.1.1 Large P a i r w i s e C o r r e l a t i o n s i n X 51 4.1.2 Data C o l l e c t i o n 51 LEAF FEUILLET I V OMITTED I V NON IN PAGE UNCLUS NUMBERING. DANS L A P A G I N A T I O N . V 4.1.3 Model S p e c i f i c a t i o n 51 4.1.4 Overdefined 52 4.1.5 Outliers Model 52 4.2 Remedies f o r C o l l i n e a r E f f e c t s 52 4.3 E s t i m a t i o n Methods 53 4.3.1 53 Ridge E s t i m a t i o n 4.3.1.1 Standard L i n e a r Case 54 4.3.1.2 General 55 4.3.2 Bayesian Estimation 60 4.3.2.1 Standard L i n e a r Case 60 4.3.2.2 General 61 4.3.3 4.4 L i n e a r Case L i n e a r Case P r i n c i p a l Component E s t i m a t i o n 62 4.3.3.1 Standard L i n e a r Case 62 4.3.3.2 General 63 Appendix 4A L i n e a r Case Maximum L i k e l i h o o d Estimates o f the Posterior Distribution 5.0 65 I l l u s t r a t i v e Example 67 5.1 Introduction 67 5.2 Scope o f the S i m u l a t i o n 67 5.3 Generation 68 5.4 o f C o l l i n e a r Data 5.3.1 Standard L i n e a r Case 68 5.3.2 General 68 L i n e a r Case Simulation 71 5.4.1 S i m u l a t i o n Setup 71 5.4.2 S i m u l a t i o n 73 Implementation and D i f f i c u l t i e s vi 5.4.3 6.0 S i m u l a t i o n R e s u l t s and C o n c l u s i o n s . Summary and C o n c l u s i o n s Bibliography ... 74 81 83 0.0 Introduction Collinearity significant parameters with a with the is the has problems in a of collinearity its consequences The first linear model provide linear exception for an i n the standard generalized purpose long of and computation and linear model. model are thesis i n the to chapter recognised to However, propose introduction It definition model, to examine methods. collinearity and the effects [SCHAEFER,793. linear i t s sources of unexplored, reasonable estimation identifies reviews a causing the relatively seek generalized as estimation logistic regression this and been and comparison in the effects, for standard so the as to general setting. The second collinearity two link by i n the approaches. function the i s chosen. and least shown seeks the first of reasonable linear i s based second, squares that a generalized The instability reweighted it chapter on which matrices model a by used i s in Using for considering linearisation i s chosen, estimation. collinearity definition of the i s motivated the iteratively this definition, dependent on the model The the third general criterion chapter setting. and Desirable identification indeterminancy general the opens due criterion standard The by properties scheme, while of prior likelihood to fifth gamma chapter with model. The construction of simulation within advantages of collinearity. a proposed, To weak of the in and a approach quantify the bounds the the for criterion for of the by are following method Monte and such simulation its biased estimator in of ridge logistic on the the third solution. the Carlo ideas of simulation associated are setting the while objectives artificial The It and use i s based components problems a proposed. illustrates restricted sources routine [EDWARDS,691, principal methods. collinear second briefly a estimation methods The classical chapters terms for generalize, philosophy the in against Three CSCHAEFER,791. generalizes previous cautioning shown used. investigates remedies methods. is are extending dependency, developed diagnostics model. suggesting work The are collinearity is model chapter "mechanical" approach to linear fourth effects, the with i"BELSEY,KUH,WELSCH,80], of a deals the of with discussed. The demonstrates the the presence of 3 1.0 C o l l i n e a r i t y l n Standard L i n e a r Throughout the t h e o r e t i c a l development of t h i s t h e s i s a constant p a r a l l e l w i l l model and the be drawn between generalized linear l i n e a r model of a response y i n terms X be d e f i n e d as Models the standard l i n e a r model. Let the standard of p r e d i c t o r variables follows. D e f i n i t i o n 1.1 Standard L i n e a r Model y = Xfl + € where y nxl vector X nxp matrix of p r e d i c t o r variables fi p x l v e c t o r of c o e f f i c i e n t s € of nxl v e c t o r where 1.1 of responses errors € ~ N , (0, cr^I) v D e f i n i t i o n of C o l l i n e a r i t y Collinearity, regression, two as considered in classical linear i s s a i d to be present when : or more predictor c o e f f i c i e n t estimates variables are highly correlated are d i f f e r e n t i n magnitude and/or s i g n from those hypothesised c o e f f i c i e n t e s t i m a t e s are h i g h l y variable 4 the sampled matrices range used o f some i n the predictors computation may of be small estimates are i l l conditioned But these more basic the columns problem problem of i s and only foundation used here. and highly outliers, variable or causes dependencies variables. of With the this causes such and t o g i v e estimates symptoms linear predictor t i e together diffuse range possible of approximate the the [GUNST,831, to are approach as amongst underlying approach validity and This of the taken by i t i s possible deficient t o symptoms sensitive row sampling such as deletion statistics. The following definition of collinearity i s adapted from [GUNST,831. Definition y = Xfl + € For c 1.2 * 0 |Xc| It data i s p r e s e n t when some such Collinearity suitably i n the Standard Linear Model : chosen 6 i 0 there exists a pxl vector that L 6|c| i s t o be noted m a t r i x X a n d makes that no this definition mention of only involves t h e dependent the variable y, or the underlying the detection model. So f o r t h e s t a n d a r d of c o l l i n e a r i t y involves only linear the X model, matrix of of the lack of predictors. This definition guidance from i n the selection the require matrix others X ) . The directly explained 1.2 o f 6. selection CGUNST,831 notions CGUNST,831 subjective of shows how i n terms these of the no different (eg.CBELSEY,KUH,WELSCH,801 definition i t . The i t i s the condition of collinearity from becuase However i n the l i t e r a t u r e subjective intuitive more i s index i s preferred listed above following sources because the c a n be sections and of the deduced based effects on can be definition. Sourcaa o f C o l l i n a a r i t y Collinearity sources, can stem and a l l o f t h e s e Further, i t information i s of on from one c a n be c a p t u r e d considerable the any source by practical when of a number Definition interest trying to of 1.2. t o have alleviate collinearity. 1.2.1 Large Suppose correlation XK Pairwise X has been form Xj and i s r be the matrix <ie. J k with Correlations scaled the and centred correlation so of X X coefficient , where x j i s t h e j t h column t h e j t h column that T i sin between o f X ) . L e t X <j > X deleted a n d fi«j> t h e 6 corresponding theory i t i s known 6 <j > — Let Now = Xj where = Rj' S; from linear ( e g . [MONTGOMERY,PECK,823 regression p430) X (j )x j X )>ft<j> - t sum of squares principle ( 1 - R.., -)^ & i s the coefficient of multiple correlation. i f f o r some 6 i 0 <1 - R_, >^ i 6(1 + R we | X c j £ 6 | c: j s o a c o l l i n e a r i t y K then have Definition Xc = |Xc| xt = - r - |Xc| and (1 - l e a for r > ) r e i e & ft<e> = exists f i e and so c i n terms o f i s given by and > ^ > 0 we have i 6(1 + r i a 8 ) ^ then exists as per D e f i n i t i o n "large enough" a collinearity i s important t o note though dependency show A then = a collinearity It tend J 1 6|c| so words ( i f p 2 riftXfflj i f f o r some <1 R 1.2. Specifi cslly So Now as by t h e e x t r a |Xc| predictor. (X < j > X < j > ) c be d e f i n e d Xc So deleted i l s need towards when dependency, the tends be strong unity. for that a l l the Specifically singular to zero value, a l l only 1.2. In other exists. one correlations to CBELSEY,KUH,WELSCH,803 connected pairwise with just correlations tend one to unity. Hence, multiple 1.2.2 If high correlations cannot Data a subset of the predictor arise. collinearity can variable case scatterplot positively more i n the correlated collinearity variable (eg. as one jointly and can deficiency likely, factors to determine Collection only sampling used collinearities. sampled, of be due in For example, below, consequently be to an variable the space is the two with data is c o l l i n e a r . This caused by chance, lack highly source extrinsic factors observational variable is related influences two other of available study, to by another, variables). (eg. data) or intrinsic or one 1.2.3 Model Often which Specification in applied manifest However, models themselves they polynomial show that in for the i f the case exist the data not consider of a be implicit exact the f i t t i n g single t o any two powers, the differences improvement due to can predictor i n powers centring). be constraints (eg. mixture t h e c o r r e l a t i o n c o e f f i c i e n t between corresponding unity, in need [BRADLEY,SRIVASTAVA,793 there dependencies. of a low variable. the extremely i s even models). order They variables, close (even to allowing 1.2.4 Overdefined This when The i s a special there a r e more linear dependency follows. By Suppose singular X = case variables of than the collection, which observations <ie. n < p ) . collinearity of X decompostion and D diagonal UDV of data t h e rank value orthonormal But Model i s r . Now there such can be occurs seen r i. m i n ( n , p ) exist as = n. U orthogonal, V that T U, V a r e o r t h o g o n a l so rank(D) = rank(X) X where : = U[D 0"|V ± 1 T D l t = r. r * r [o oJ XlVt V ] a [Ut = U ] [ D n Cf| LO XV So such ra that any 6 i O |Xcj = 0 there exists a vector c, namely tVa-lj outliers <ie. 6|cj. Outliers observations space, a s i s shown linear x e fixed, columns X u are example = 0 be atypical can of are of unit i n the predictor argument predictor large X I B = k0 length. by be i l l u s t r a t e d p=3 atypically , induced i n the following dependency are e can which hypothetical and IV o r t h o n o r m a l ] i e . |Xc| £ Collinearity The oJ b y V] = O for 1.2.5 [postmultiply ffi and So in from CGUNST,83]. by c o n s i d e r i n g variables magnitude. X i s variable scaled where X i i Suppose so the k i s that the X! 1 Xoffii Xie 1 1 ^ iS: iFP: 1 Xei*/Xi 1 X X 1 X t", j 1 X = where c = Consider IXc! Now , j jr> x± i * variables X i r 0™ + T i — c" " = X , S : (0,1,-1). 1 = Xi,.* j. 5C v-» in-: "** ^ ^ i! are the unsealed predictor 0, Xt s : * 0 t o 0. such that |Xc| l 6|c|. Definition large, 68 + Z i •-• i Xit*® and ,S: So £ - x i e » / X ) is other & 1 0 words corresponds to a: there a a Xsa** * k 0' . H e n c e f o r any In <Xii*/Xi i s a the right value the o u t l i e r , a of 0 with 0 collinearity as i n 1.2. E f f e c t s of section intuitive + a e sufficiently = k^G e B : Now ( 0 / X i - k0/A )' tends Collinearity illustrates, i n t h e manner effects of collinearity 111 It e a 1 side 1.3.1 k0/X X a*/Xg and f o r large This /X K i=2(l)n hand 1.3 0/X* has intimately [SILVEY,691 conditioning been known related state using of CGUNST,831, Definition the 1.2. of X for a long time to the conditioning that collinearities that of collinearity X. i s [KENDALL,571, are associated with small eigenvalues small of singular precisely, values which S p e c i f i c a l l y , and which r of provides c o l l i n e a r i t y , centred X X, X. numerically This, an when alternative [BELSEY,KUH,WELSCH,80] suppose scaled is to again unit that length. to quantified more d e f i n i t i o n of advocate. the Then equivalent columns by of X singular are value decomposition X = UDV T where U = lu± . . . . . u ] ! nxp V = tvi , . . . , v ] : pxp D = d i a g (dt r:> p , ...,d F 3 ) orthogonal eigenvectors singular values So X Now = Z d , u , V j .1 — i consider Xc |Xc| So = djUj = d,|c| for according 1.3.2 The c any to = 6 i n i t i a l estimates of of v i form] f 0 , i f D e f i n i t i o n Estimate technique [bilinear T dj £ 6 then there [V orthogonal] [U,V orthogonal] is a c o l l i n e a r i t y 1.2. Effects reason ridge R for Hoerl regression were "too was and their long" Kennard developing observation in the the that the presence of collinearity. norm of the This observation ordinary least E[B flJ = E C ( B - B + f i ) T = B" B + r EItr(B is seen squares by estimate considering the B. ( f i - B + B ) ] T - B> (fl - B ) " ] r A So the = B fi + t r <Var(B)) = B fl + tr((X X) = B B x T + T magnitude a cr^tr (VD"- V ) a of B in a [singular value T collinear relative to magnitude i s inversely proportional X. orthogonal Similarly covariance the an *)o- T with matrix singular the is a values, data system matrix to variance direct decompostion] is overestimated X. the Further, singular estimates, function of the the values since of the r e c i p r o c a l s of ie. A = Var(B) In tr VD~ V particular components. the f f i c the Each one singular values Var(Bw> = a* Z T variance i s uniquely as i f two explained there an leads is to decomposition variance the of by the approximate the same following K jth singular be associated have linear proportion, B can decomposed into with one just p of follows coefficients variances B - 6 Hence of explained value. high related singular dependency definition T\J •.«-., by proportions the being the dependency their value, then them. This between of of the amount variance of the associated with Definition matrix Wju 1.2 Variance = | - k,j = l ( l ) p J V,, These algorithm 0 U J proportions form developed to to the general Inference Almost potentially For instance, H.r_ flj = 0 H„ fij * 0 test 0 the K T 0K, = basis of the i d e n t i f i c a t i o n [BELSEY,KUH,WELSCH,801 c o l l i n e a r i t i e s . This case in section for algorithm the i s shown 3.2. Effects a l l statistics can P» . W : by of multiple extend ( = -^r^-- detection the P r o p o r t i o n s o f the X:nxp where 1.3.3 Decomposition be "degraded" in testing statistic associated i s the with by t h e e f f e c t s hypothesis linear regression of collinearity. But recalling determination the relationship and t h e s i n g u l a r statistic can statistic i s often but c a n be c o m p e n s a t e d of this R. As X So = (1 - i t i s seen that the t In p r a c t i c e the t presence of collinearity, f o r by t h e " o v e r l a r g e " no explicit of the magnitude statements t statistic c a n be . However of the t s t a t i s t i c he i s R,»)^5J 2CT fixed parameter B, and involving of a, the stronger X j , the smaller the test statistic and the offending i s t h e non c e n t r a l i t y consequently the lower power o f t h e t e s t . As i s noted collinearity not effect the two i t a subset the effects of predictor of the orthogonal. remaining This the introduction variables of need coefficients, i f i s to be e x p e c t e d of orthogonal variates by in procedure. effects predictions are with Predictor The CBELSEY,KUH,WELSCH,80], the estimates stepwise 1.3.4 by involving subsets comparing the notes, the magnitudes collinearity a i n the t h e non c e n t r a l i t y p a r a m e t e r for the values, the c o e f f i c i e n t of by c o l l i n e a r i t y . smaller CGUNST,83] made a b o u t notes be e f f e c t e d between of collinearity using predictor Effects " i n sample" variables are often and l i e i n the "out widely different in o f sample" subspace of data. If the o r i g i n a l data, then r e l a t i v e l y p r e c i s e e s t i m a t i o n i s p o s s i b l e . However "out o f sample" predictions may be severely e f f e c t e d by collinearity. Consider for example the ordinary least squares p r e d i c t i o n s g i v e n by the e x p r e s s i o n y = x fi T Now Var(y) = tr^Ml/n + x * (X X> ~ x > T T 1 1 = o-'-Ml/n + X i T T A - ' T ^ ) 1 where Now if X i is T a T = (tj. t , l e i g e n v e c t o r s of X X A = (Xi X) T F pi linear eigenvalues of X X combination T of the e i g e n v e c t o r s a s s o c i a t e d with the " l a r g e " e i g e n v a l u e s , i e Xi T = a T r T where a,. = 1 Xt "large" 0 else then Var(y) = CT'«d/n + Z ( l / X i ) l E where the summation i s case r e l a t i v e l y over " l a r g e " e i g e n v a l u e s . So i n t h i s precise estimation i s possible. 16 2.0 2.1 Collinaarity Definition The o f a Generalized generalized linear CNELDER,WEDDERBURN,723 amongst probit for others, models for underlying as umbrella with developed that normal responses, features - the observations - the response - to, t h e response linear are encompasses errors, and log by l o g i t and linear models above, terms this distribution develops the quasi distribution be shown a theory o f the that as a function of a likelihood. the generalized for m e a n . He although no i t is sufficient results shows log that this the for a one is the weakest class one parameter for t o Fishers likelihood Consequently c a n be assumed t h e second t o the because; analagous likelihood, on variance [WEDDERBURN,743 t o be the directly expectation based assumed, and y i e l d s exponential : p r e d i c t o r v a r i a b l e s and e r r o r i s explicitly likelihood. as of i t s quasi-likelihood, classical are to, or additively the relation purposes same function o f the estimation parameter i s equal i s modelled [WEDDERBURN,743 models independent some combination feature o f these variance proportional can an models quantal Model model i s linear Linear L i n e a r Model* counts. The of i n Generalized sort o f models exponential family. based This on is 2.1 completely a an vector defined €. giving = exp([y9 n (iii) and a components. independently vector u = distributed E ( y ) and density of an y nxl error follows a density - b(8)]/a<0> + c(y,0)) i s called the canonical 0 i s called the dispersion nxl linear component, and an X being pxl parameter and parameter. an nxp matrix coefficient vector of B, predictor = Xfl link function, systematic c a n be parameters g(.) components n = g ( p ) It , three 8 variables, an y probabilty systematic predictor which i s L i n e a r Model the following exponential where a by value The f(y;9,0) definition, [McCULLAGH,NELDER,83]. component, n x l mean generalized (ii) of following + € random with the Generalized = g-'(XB) <i> to the notation Definition y leads or shown as p= that i n the linear : IR->-]R, c o n n e c t i n g the random follows g _ 1 (XB) • sufficient predictor statistics n = XB when exist XB f o r the i s equal to the canonical g(.) parameter i s called 8. the canonical Although i t purposes of this analytic simplification t h e s i s they members of the generalized to density gives the corresponding fourth 9 columns function respectively. and between (see the three as density 9 family. Note be n o t e d uniquely link 2.1 lists the while Table 2.2 a, The column \i a n d the variance the that b, c f o r t h e of t h e second Table gives the canonical fordetails). and distributions parameters first that the and Table members, vector 1A f o r selected variances = n, w h i l e t h e mean considerable details expressions. selected f o r the (see Appendix 2A). means, 2.3 g i v e function links, and t h e m u l t i n o m i a l mean It i s to p i n column be assumed, exponential Appendix of table of will link link of the distribution. distribution. the canonical the to use canonical their f o r the case distribution complicated relationship parameter a are exponential 2.3 g i v e s give because functions the general 2.3 hypergeometric excluded functions link c a n b e made 2.1 were this i s not necessary Tables non-central In The t h i r d and of y ( i e . v) a s canonical parameter the relationship c h a r a c t e r i z e s t h e member. of v 19 Table 2.1 D e n s i t i e s o f the g e n e r a l exponential family normal f(y;u,0) = 1 f(y;p) gamma f<y;u,0> inverse • order* f(y;u,0) f(y;u,0) binomial proportion f(y;u,n) negative binomial f(y;p,k) J ^ 20 y=o,i, L p J [2TT0y J ;a = E *'H' T(0) Gaussian pth ,-(y-M) , } L [21T0 poisson -[% r — Ti exp {— -~ exp ( H yiO y 2p y lB: \ L 1-p - —-—1 2-p J + h(y,0)) y=0/n,1/n, ^ y - - i L k+M ; J L k+P J 1. gamma with 0=1 i s the e x p o n e n t i a l 2 . binomial p r o p o r t i o n with n=l i s l o g i s t i c 3. # denotes the "p s e r i e s " 7 ' ' Table 2 . 2 Parameters o f the g e n e r a l exponential family a(0) normal 0 poisson 1 gamma 0 inverse Gaussian 0 1. KB™ -54[y' /0 +log(2ir0>] 5: log(y!) -log(-9) logl(0)+ (-28) 01og(0y) - 1 / ( 2 0 y ) 1 -1 0 binomial 1/n proportion negative binomial c<0,y) -(l/3)log(2TT0y-' ) # pth order b(8) 1 (p-2> — [ (1-p) 8D log(l+e") r- -ii h(y,0> U>--iaJ log(" > 3 -klog(l-e<») # denotes t h e "p s e r i e s " >">.V Table 2.3 D e n s i t i e s o f the g e n e r a l exponential g<H> P<6> normal u 8 poisson log(y> e° gamma # -1 inverse Gaussian family V(H> 1 \i -1 — — H — [--] H # - i r - i v(8) V <p~ > 1 1 e° 1 Ue] f lp^<p- -1 binomial proportion negative J? , , binomial 1. r H 1 log r — Lk + MJ 1 ke« r~« 1 - ke° # d e n o t e s t h e "p s e r i e s " M + c"/k r r keMl+eMl-k)) —— —-—(l-ke ) : 0 6 5 Further details applications of are to be these found distributions in and their [McCULLAGH,NELDER,831 and CMcCULLAGH,821. In the the ensuing maximum results likelihood and Appendix brief 2A. technique development, As i s i s estimation details shown and an a l g o r i t h m Appendix 2B. Definition Collinearity difficult linear to case. in this to a define appendix for this than Intuitively procedure - two o r more predictor variables - coefficient estimates are different from those coefficient - data - matrices estimates collection used are highly Linear are i l lconditioned. Models i s more i n the standard t o be p r e s e n t are highly i n magnitude or variable of the when: correlated i s deficient i n the computation least given i n hypothesised - estimates i s model i t s counterpart said are given i n reweighted linear i t i s again important Fisher's scoring i n Generalized generalized i s made t o The derivation iteratively of Collinearity in reference procedures. of their equivalent squares, 2.2 much coefficient sign There are several based on model. A second approach i s f o r generalized Linearisation in their linearising y to defining linear when directions uses i t . One approach linearisation suggested of the Link [BELSEY,KUH,WELSCH,80] case to [BELSEY,KUH,WELSCH,80] calculations 2.2.1 approaches models from the of the iterative (see Appendix 2A). Function commenting f o r future on t h e non research section, linear suggest t h e model = g- (Xfl) 1 + € the form y = Xfi + € Intuitively, plausible an of This with the standard collinearity approximate following Theorem analogy to define being X. by linear linearisation in the dependency i s expressed linear general amongst L i n e a r i s a t i o n of y = g-MXfl) y = g—*(Xfi) y = XG + € c a n be e x p r e s s e d + € where with X i = X VtXi/a(0) = [ x i 5C —* C x & V x ] F r V t * = diag(v ••• ; L ) f x vi D + € i n t h e form i ti s system the formally theorem. 2.1 case, as columns in the 24 This error can from estimate where Now = = non as linear follows. First procedure is the residual expanded due about to the giving + J ( B) (B •» 0 ( B ) - J(B)B J(B) i s the by letting 0(B) - - + gradient B) + 0 (fl e - B) 0 with J(B)B matrix of respect to B. second r e s u l t appeals to linear model J(B)B A X = -J(B) € = -€. first theory Xu By proved 0(B) ~ the a fi, 0(B) y be of = equation Appendix -J(B) definition follows. 2A as follows. 1 J of the generalised L f i n i J L0B.J ("jL The © H i ]v [©"ii this is just the Now using the simplification Appendix this 2A, simplifies = It definition to be admitted residual approach, i t i s point f o r defining 2.2.2 Itarativaly as i n the linear predictor n, does squares in the proof above, t h e an e f f i c i e n t e s t i m a t e W €. residuals, However >, suggesting as a suitable i n the general of the estimates (x W< = (X X>~*X z<*> +: a as first starting case. c a n be c a s t as i n Appendix in a weighted 2a i e . > X) - X W <* > z <* > 4 r be in gives collinearity _ can not useful formulation r that Rawaightad Laast Squaraa Approach computation fl This link, to due t o t h e w e i g h t e d least of canonical ± i s The the vj x a(0> unweighted that and t h e of T x viewed as ordinary least squares again be with the transf ormation X = WX z 2 = applied to the define collinearity approximate using data. linear W z 2 So i t in the dependency the definition generalized amongst of the weight v 1 3^(0) gives would reasonable linear t h e columns as i n Appendix model a s an o f W X. 2A, to i e . Now 26 -J V i a(0) 2.2.3 Choice The link squares X produce similar * ' aT0T= discussed the However collinearity regression is Definition 2.2 For c = g - 1 <XB) some * O |Xc| the easily approach. 2.5 same of from This on reasons in collinearity the iteratively be illustrated will sources and linear the approach. effects model. case Hence, of the of Also logistic following adopted. Collinearity + € i s said to chosen 6 in a exist 2 that £ 6|c| least namely notions generalized for similar suitably such more and reweighted = intuitive the selects *' seen 2.4 in definition x squares sections iteratively transformations, the are least CSCHAEFER,83] and a n d earlier reweighted y Approach linearisation respectively. in of where X = W^X 0 Generalized Linear Model when there exists a pxl vector 2.3 Relationship Generalized Now as V i a i s a lists the general model the members of Table (p=l), other X when Model and the inverse members standard linear a ft of (=XG) 2.2 Table from t h e standard model f o r p series. that the p series the family implies of the linear such binomial) t members o f |Xft| i s ( i e except (p=2) a n d p t h o r d e r ) . negative model. in t h e so c a l l e d considered model column t o be n o t e d collinearity Is fl Definition the selected It i s except Gaussian linear in to the generalized there and parameter The f i r s t of X f o r that this —*• X f o r a l l e x c e p t proportion generalized 2.4 of the family then f o r X. family. c a r r i e s over informally, binomial the of the canonical element exponential linear gamma function the i j t h column "small", Linear 2.3. So s u b s t i t u t i n g f o r second Since, Standard d i r e c t expressions 2.4 all the Model C o l l i n e a r i t y D e f i n i t i o n s i n Table gives of Hence f o r ( i e . Poisson, collinearity collinearity in in the 28 Table 2.4 L i n e a r i s e d P r e d i c t o r Elements general exponential f o r the family 1i m x i j e-o normal - x , , poisson e gamma # J i J 3 . X 0L2eJ X : ± J order . . negative binomial # ± x,. , X i j l r —^- il ll, ~ " ""* 1 -, 1 n binomial proportion 1. e X Gaussian pth / X 1 inverse # r 6 - denotes Cl+e*) l"k<e«fl the "p + J <l-k)e")]^ series' 2n [k(2-k)l (T-k> X l J However the i t w o u l d be general because but linear i t i s incorrect model case possible general consider X = Now linear a gamma IXi 1 f o r the with Xiel X = W^X where t p in X. T h i s i s t o be collinear, the collinearity i n series. F o r example 0=1 a n d Tfii1 _ w collinearity in collinearity b y W may r e m o v e case model define f o r t h e X system the premultiplication the as to x 6 4 So X j. s BiXei + BeXee AiXai Xcl ( C I X H + CeXie) Xcl fctXii * CffiXiel Now X may b e c o l l i n e a r Ct = 1, c » = - 1 ) w i t h lS small 2.4 IXcl many relating scalar + (C,.X . + e) [dXat •+ C X c a 1 a (eg. Xi± * X i | X c | < 6, CftXae)* a * ytt&± but f o r B :l , * B X a a e and sufficiently > 6. Sources In + E +' B^X^c of Collinearity practical applications t h e X and X systems multiple i n a Generalized will of the identity the Linear scaling be a p p r o x i m a t e l y matrix. Model matrix a W** constant In the particular rows a n d ft a r e following Theorem The 2.1 multiple of is does v , when the 4 x± W*^ identity are proved the as s i n c e they the linear equal, w the H = 1 + of follows. are between binomial <1 will rows as combinations i s shown v of in the - 4v)^ W"" approximate x*B are scalar approximately • equals to the But and i m p l i e s Q± a constant dispersion there 8 as is a then so parameter one is listed i s constant. proportions case = C Matrix X. up p an i f i s used. and Scaling be I f Wi equal constant ± Constant matrix canonical link Hence consider matrix the correspondence 2.3. when approximately Condition for a where This occurs theorem. scaling equal, this to in For one Table example where say But 1 H e^ + e® so 8 = Hence So also log(c/(l-c)) in this in these arise - large - data case = i f w± cases, i n the C the general pairwise collection i s constant then ao i s 8* same sources of linear case, namely correlations (=x±fl>. collinearity : will - model - overdefined - specification model outliers However in the general theoretically above i n terms of notions appropriate correlation Definition are 2.2. x± and case, the scaled captured substitution. of linear by For Xj g i v e s i t i s better data matrix Definition example rise the to 2.2 large to X. think So with the the pairwise collinearity as i n 2.5 Effects 2.5.1 In of Collinearity Estimation analogous estimates The result and t h e i r (see B = Omitting <*> to the standard variances of R Appendix fi a Generalized Linear Model Effects manner estimates in may case, both t h e a r e a f f e c t e d by c o l l i n e a r i t y . be expressed iteratively asi n 2A) (X W<* > X) - X"" (y r + linear 1 r - u<*M s u p e r s c r i p t s f o r c l a r i t y , X WX r c a n be decomposed as follows. (X^WX)" =' ( X ^ X ) " 1 1 where = <VDU" UDV >- = (VD" V )-' r at = VD = Z any collinearity T G T 1 ———I*,-,, iteration (ie. small directly affect the singular values. Hence model can procedure [singular value decomposition] 1 T V _ e F> So X = W^X develop and t h i s [V o r t h o n o r m a l ] ( t ) ,including singular parameter values throughout suggests that of estimates collinearity the X final > through i s seen t o the small i n the generalized the iterative one, linear estimation i t may b e w o r t h w h i l e f o r any 33 estimation procedure to adjust f o r collinearity a t each iteration. The by variance estimates the asymptotic result Var(B) 1 So =• <X WX)- oT again the singular hence In of has p o s i t i v e 2.4.2 directly variance values X affects estimates (since singular be approximated t h e e s t i m a t i o n , by according X VX i s positive T to the small definite and values). I n f e r e n c e and P r e d i c t o r E f f e c t s a similar manner to equivalent statistics the i s not so d i r e c t . t can fiS collinearity inflating a t convergence effect statistic D (y, H) t where 8 i standard are affected i s the deviance = 2Z w [ y ( 8 the by c o l l i n e a r i t y . F o r example difference - 8,.) - b ( 8 ) + J linear the analogue where case the However of the the deviance i s b(8 )] a = 8(£) f 8i = 8 ( y j. ) A is a complex form of B and W which are affected collinearity. Similarly i with v(H L) t h e g e n e r a l i z e d Pearson statistic by which i s used to calculate the dispersion parameter 35 2.6 Appendix Linear 2A Maximum Likelihood and t h e G e n e r a l i z e d Model The following maximum derivation of likelihood the standard iterative estimates results i s f o r the taken from [McCULLAGH,NELDER,83]. Since the the observations log likelihood canonical form parameter 0. applying 1±, contribution with Using the @11 Hence = t h e mean v e c t o r canonical E( by y i for J [~yi - b ( 8 i ) 1 ) using parameter = Hi = + _ details) maximum c ( y*' 0 Definition 2.1, and technique (see gives likelihood result the following estimates. t u c a n be r e p r e s e n t e d b O i ) 8j. a n d d i s p e r s i o n ) yj. - p 8 as the standard of initially i t h observation i n scoring f o r the iterative I—aT0i the consider parameter notation Fisher's = 11 of canonical [KENDALL,STUART,67.2] results a r e independent, i n terms of the Now Q l± 08i -b(8i) a(0) a = a ao the variance terms of the canonical Var(yt) by function using = v = ± v, c a n be s i m i l a r l y parameter represented in 8 as b(9 )aC0) 1 the standard result P] * [H]* •° • E The H E maximum and v likelihood i n the following Now the the log likelihood total individual s For the i s the f i r s t expressed - or t o any score linear i s just derivative So v t the score the sum the score of the log such vector likelihood, links [&] ! the following simplification L©9 j.J V i component V i simplifies a<0> to as of the as repeatedly. [SKiJ i n terms o f component c o n t r i b u t i o n s . Hence - [lib] • ? [ & ] ' canonical are obtained steps. contribution independent , which be equations i s used can Hence t h e maximum which f o r canonical ? L^^aTsKhor i n matrix _ r information the expected simplifies ° r= 1 < 1 > P m a t r i x , which value of i s derived 0 l ! where the derivative second of the was d e f i n e d second by F i s h e r derivative i n the following r g ii I L@fi .@fisJ ffi v term a simplification. _ Hence of as minus t h el o g steps. r®iii'| 1®"*J L 1 zero due since to i t the the elements a r e S t weight i s constant x j >,x v Define to 0 a E links form a r e 0 The - = e q u a t i o n s i n component form * < y - H> a<0) likelihood, likelihood matrix W @ f i r?Hii sJ L n,J @ V, r@(^i | L®n,J = diagCw* w,,,) a s contains canonical the link 38 a So ,S: (0> the negative f 0*11 lOB ..@Bs expected value expression simplifies to WiXi ,Xis t v So summing the to get the total information I v > or s = £ Wj X j » X t v i n matrix I So = i s S form Fisher's B = further because matrix scoring o f ft a r e g i v e n ft<t-Hi> < * > + I -• * i s details t h e same i s = H6 ft = link see simplification to the expressed and as or s A But <*-i> = A HB = ft<*> + < t ; > + likelihood s or (X WX)~ X (y T 1 r - p'* * ) the Now information so the above s c o r i n g Newton So H f i maximum [KENDALL,STUART,67.2]. as the Hessian, typically H--*s the by of this equivalent tBARD,741) scheme, s of the canonical scheme oft of r estimates (eg. element X WX using For matrix contribution, the rsth Rhapson method 39 (Hft) £ Hi.,B, ..j A Z Z w,,.x x |Bj kl Z WkXkifik Z Wk'^Xki k So Hfi ' Ldfiuj But this i s just t h e form of weighted least squares A (X W< X)fl< > T t > with pseudo Z This = t = as (X WX) T t ) + 1 X' Wz r (working) variable * converges, A ft ( dependent u generally represented T ri - k X z so the final estimate can be 2.7 Appendix 2B Iteratively Reweighted Least Squares Algorithm The following algorithm for calculating B i s taken from [McCULLAGH,NELDER,83]. Algorithm 2.1 Obtain n initial A = <o > Xfi A jj < O > * <o > ' x estimates of from < 0 > the = y = g (p <° > > least squares the linear predictor data A <o> | r LdnJ STEP reweighted 0 STEP A Iteratively LdfiJ A C O , 1 REPEAT t = 0(1) UNTIL Set up working Set up weight W<*> = convergence dependent variable W diag(Wi < t: > A w„ ( t ) < t > > 2 [ii] Regress z* < :> on x t x, f: with weight W to get 41 A < * -i- 1 > (X W< >X>- X" W< z r t A Xfi <*•*•'•> A m END REPEAT - [is] l r t> <t; > 42 3. Diagnostics for Collinearity In Generalized Linear Models 3.1 Desirable Properties A diagnostic try to sense measure collinearity hypothesised statistics. This sufficient illustrates directly signs because signs linked amongst based on of these the columns of row not that (eg. as deletion necessary [MULLET,761 diagnostic of the in are collinearity Rather, methods e f f e c t s such sensitivity definition the Measures i t s possible or for the be most case). to not from conditions the dependency is Diagnostic should incorrectly and for should approximate predictor be linear variables, ie. |Xc| This £ gives First, 6|c| two i t should combinations Second, of i t collinearity. the distribution X. properties be capable offending should However diagnostic variables desirable measure the able use must assumptions of predictor be be about of of a diagnostic identifying the linear variables. to quantify the the quantitative subjective, the measure. nature since of effects of aspect of there the are no predictor Now while collinearity haa consequences any subsequent standard linear collinearity into the 3.2 while of linear degradation due the singular a i n the For c*0 if and as the the 2.2 directly this enters tests natural and embedding case. exactly generalized is as there a one X following f o r the expresses linear are no model easy the idea setting, ways of of i t is finding i f |Xc| i to 1.2 by one and small correspondence the between instability singular values. of the This is theorem. of Standard suitably that only in Definition Equivalence some such X) estimates, manner, general of to Collinearity there 3.1 of the similar characterised Value values for X Theorem the of as in in measure collinearity proved and case, the i t combinations. However, system parameters, In i m p r a c t i c a l measure, the problem, prediction. Definition in the data or present Measures of a testing In collinearity an estimation expressions be Now the primarily hypothesis (say predictions. should for ia chosen 6|c| Collinearity Linear 6 1 0 Model there and y a = Small Singular XB + € exists a pxl vector di 1 6 where is proved values give rise 1.3.1. So This 1.2 (this \Xc\ eS: Hence i f a to show a i min u = d i T the to fact was that small that singular The collinearity suppose c X Xc that collinearity of X singular in section according value. Definition scaling). of small established singular c value to Without loss 1.2 unit has Consider T u" X" Xu r where r u u = T 1 [extrema of r a t i o quadratic forms e | X c | i. 6 not multivariate c± independent be to = smallest follows. is equivalent Although is as implies generality, norm i a the 4 i t remains Definition of d extended then 1 6. emphasized that the | in X So in problem, such to d. c is that there Now of 1 small. chapter i | < case d, of the multiple one, may collinearity be above k linearly theorem can collinearities as Collinearities and follows. Corollary Small Singular y = XB k if 3.1 + Equivalence Values X for the Multiple Standard Linear Model € collinearities and of of only i f are present in the standard linear model the The kth smallest "only while i f " argument the theorem the this since Definition is y having of estimation Collinearity by + from of the the 1.3.1 above variance definition i t i s possible the singular pseudo values. the singular linear general in a linear transformation definition Measure singular variables values i s reasonable properties to apply of a of measure Generalized Now this in as to = W'X Linear i t comes • near deduce variables be offending x.j c o r r e s p o n d i n g highly the general not algorithm the desirable to the o r i g i n a l i s i n general to earlier. identification i t would directly of X discussed the predictor values X j . However natural the € the "small" of to to a the following a l l the desirable variables section directly through CBELSEY,KUH,WELSCH,80] relate the Hence 1 combinations to in • • carries = g- <XB) above First the i t i s equivalent 3.1 given The established "small" components. variables. Model of X are follows the analogue with Further, setting values was " i f " argument by principal singular predictor possible setting to are the because pseudo ones x j . Nevertheless scaling matrix identity W matrix in i s and many practical an a p p r o x i m a t e has little situations scalar effect multiple on the the of the collinear structure. Second, measured this the by t h o s e Model 3.2 the col1inearities the corresponding index i s defined Collinearity singularities. as Index can be To do follows. f o r Generalized Linear l While i t experience, suggests a Finally (d i s such ,. . . , d ) there as r e f e r r e d index i s present than in section a large of X • body of i f that the corresponding 10.0. of collinearity inference Dependency i s values t o i n [MONTGOMERY,PECK,82], i s greater the effects discussed are the singular Fn subjective the estimation, Model t collinearity collinearity 3.3 of y = g" (Xfl) • € i s where was of a collinearity Definition in magnitudes are clearly and p r e d i c t i o n manifested calculations as 2.4. i n Collinear From the definition i s measure for a generalised i s Systems seen linear that model the depends collinearity specifically upon thedistribution proportion, It in chosen y (eg. binomial Poisson,e t c ) . i sdesirable t o be a b l e the collinearity. author f o r the response and proved to quantify The f o l l o w i n g by CM0YLS,85] theorem comes this indeterminancy conjectured part by t h e way i n a n s w e r i n g this. Theorem 3.1 a Wr^mx Collinearity £ a where i. W i v, m m A,T i s proved T Let as follows. the eigensystem M = X M = X = A M r M ( X r X ) t z = X t XM t M M t Letthe indices r o f X, X a n d and maximal diagonal matrix W • subscripts A,T i s the eigen o f X WX. ,,, system and stand M o f X X, and X Then M r M t M tT o r t h o g o n a l ] so M = z" z r Consider u X WXu. I t extrema of T quadratic forms, of subject T T Hence i thought u u = 1 and i s so maximised max(u X WXu) u r c a n be T restriction XM o f t h e weight a n d ,.,*»>«. S u p p o s e m±>*. (X X)t t a w,•,,„,« a r e t h e m i n i m a l i „ , elements for Relationships a, a a r e t h e c o l l i n e a r i t y w This Index z Wz T = t r M (X WX)t r M = as the ratio to by u = t X M > t x M of the M scaling . So (X WX)t the T M Now X = Z«iZi Zi i w = w X T Zzi Zi T r a ri1 M by the ratio M = t X wxt = z Wz r of the extrema of quadratic forma T M M where T = IWiZi Z, 1 w„Zzi Zi = w z = Xt M T T £ M tM w t M T T M X T Xt„ X Xt [extrema T M of ratio] WMXM ,= Thus £ XM £ wX Xm i. wX M Similarly W Xm £ r t l Dividing these covariance So and M appropriately weight matrices this theorem says is bounded by system factors being weights. This First, the that ratios is a given useful a the of of are the the result collinearity linear model, the models can calculated be that yields effects of the positive semi collinearity standard minimum f o r two doing since the definite. of the linear and general system maximum • with scaling reasons. analysis collinearity without result, with with another a other standard general e'igenanalysis. Second, scalar i t multiple structure of of with By of Table predictor predictor matrix. practical i t i s matrix W i s nearly a then the collinear matrix, i s approximately Since this applications, use i n t h e f o l l o w i n g 2.3 bounds chapter v(0) possible f o r t h e members i s this that often the fact i s of on e s t i m a t i o n . in the t o compare fourth column t h e magnitudes o f o f t h e f a m i l y . In p a r t i c u l a r f o r Poisson , min max while e e Xi __ XiB a min i s (Xifi)^ . . ,.,a (Xifl)" seen collinearity problem when ft ~ a £ £ f o r t h e gamma max — of identity considering the expression these the i£ t h e s c a l i n g of the many considerable that the transformed the original case It shows max e Xjfl — a , XiB min e t h e bounds a r e £ ~ a min £ immediately max from i n the standard a gamma t h e problems caused model (x R) ——-or a ( (xifl) * 1 the linear above model i s used. This by t h e s o c a l l e d p lower bound, that becomes more of a i s another series. example 4.0 of Estimation f o r Generalized Linear Models i n the Presence t h e methods of the previous i t s presence Collinearity Having detected chapter, the remains. The collinearity problem solution of by estimation to this in d e p e n d s on the causes of the collinearity. However a estimation cautionary with as an "We s h o u l d e s t i m a t i o n by was shown recognise the any method i s solutions in detail, a the sources From scaling by a and for Collinear matrix approach otherwise. general effects pragmatic treating precise combinations 1.3.4, of the [ALLEN,77] existence of situations warranted" t o examine Remedies linear While p96 example proceeding 4.1 necessary. in section Before to i s i s possible for certain coefficients shows word W overview discussed viewpoint, in as a possible i s given i n chapter no remedial with respect one. Sources i s negligible, t h e model taken the various where often s o some standard following the progress linear sections, effect one. except of the c a n be This where made i s the noted 4.1.1 Large While is Palrwiae this could unlikely prior Correlations be simple i t a l l e v i a t e d by sufficient d i s t r i b u t i o n or may be in X a Bayesian information likelihood. possible to exists If the remedy approach, to i t formulate pattern is a very this by variable due to extrinsic deletion. 4.1.2 Data C o l l e c t i o n If the dependency factors, then sampling plan often be previous In on infeasible or Just i f approach of be data according obvious solution. due to cost, lack of the may extra to available used dependency to However, inconsistency dependency be to a proper this may with the data. is intrinsic, incorporate the then a information mechanism. Model S p e c i f i c a t i o n The the the underlying 4.1.3 to is contrast, the suspected collection data, Bayesian is common solution eliminate the l e v e l s of With a predictor the case of redundant predictor categorical variable). approximate specified, to there variables. dependencies, is good However exact since variables the justification as dependencies, CGUNST,83] source for notes, (eg. is one is of clearly eliminating one must be careful to make collinearity 4.1.4 One Overdefined Model method to handle where the eliminate some exact their tGUNST,831 this i sthe sole predictor this variables. unlike other statistics p2237 i s eigenvalues distribution statistics, have that cause of the i n the data. regression, that sure theory variable principal of This exists components X X are has t h e advantage T f o r the selection used decision procedures, i n f l u e n c e d by t h e c o l l i n e a r i t y . notes there aresituations to which However where "There appears t o be no credible substitution f o r c o l l e c t i n g a d d i t i o n a l data o r l i m i t i n g t h e goals o f t h e study when m u l t i c o l l i n e a r i t y i s a t t r i b u t e d t o o v e r d e f i n e d m o d e l s . " 4.1.5 Outliers [PREGIBON,81] model the may that be d e t e c t e d outliers by e x a m i n i n g generalized projection M = I - So shows using case, generalized the diagonal linear elements o f matrix W" X(X WX>- X' W 1 T similar the ina 1 T criteria offending to data those in the c a n be d r o p p e d , standard linear so a l l e v i a t i n g the collinearity. 4.2 Remedies The major conditioning for Collinear observable of the Effects effect X of matrix, collinearity a n d i n moat i s the i l l practical cases there than In may complex this obvious a tradeoff ( i e .ridge Estimation 4.3.1 Ridge developed tHOERL.,62]. <X X + r proposed handle lying be variables. and reduced worthwhile. The stemmed the along from estimation instability the ridge simplest of a estimator of response canonical i s the biased shows in reduced the simple There against paper e x i s t s a k<, s u c h : i s estimator, mean developed less i e . an for choosing been t h e mean that Numerous of been the i s as a error ordinary results empirical methods variants of subsequently p75 That of bias criticism regression square k, a n d s e v e r a l much [SMITH,CAMPBELL,80] t h e use o f r i d g e that introduction error. have CHOERL,KENNARD,703. than square estimator has estimation. r estimator squares been J there least have kI)- X y i n the pioneering the ridge a may bias other estimator ftR = of the predictor increased analysis to surface paper estimation) ridge coefficients as between collinearity Estimation surface ridge f o r the Methods Historically methods reasons interrelationships in case variance 4.3 be no proposed. of spirited ridge attack "mechanical data p a r t i c u l a r phenomena coefficients" They argue that manipulation that being modeled and ridge pseudoinformation", approach. However while prior in the terms of essential and this estimators and the instead critics information a prior ridge imprecise are this based i s too regression a paper point to i t is be "adhoc as Bayesian out that, formulated nevertheless approximately Furthermore, on semi weak distribution; information. i n s e n s i t i v e to the information about advocate of held is to incorporates [GUNST,80] plOO notes "ridge regression has been successfully applied too f r e q u e n t l y , when little prior information is available, for i t s use t o be restricted only to data for which formal B a y e s i a n p r i o r s a r e known" 4.3.1.1 In Standard Linear their derivation CHOERL,KENNARD,70] Case reason of as a ridge estimator follows "the worse the c o n d i t i o n i n g of X X, the more R can be expected to be t o o l o n g . On the other hand, the worse the c o n d i t i o n i n g , t h e f u r t h e r one can move from R without an appreciable increase i n t h e r e s i d u a l sum o f s q u a r e s . In view of E[B 6] = B B + o t r ( X X ) - i t seems r e a s o n a b l e t h a t i f one moves away from the minimum sum of squares point, the m o v e m e n t s h o u l d be i n a direction which will shorten the length of the r e g r e s s i o n vector." T T So T the e ridge due to that the ordinary d* 6 1 estimator squares of r error is the i s constrained least squares = S S E ( f i ) - SSECrW.;.) = ( y - y > s h o r t e s t fi w h o s e to be within estimator B u.a 0 R A A T ( y - y ) - ( y A - ycii....ia > A 1 <y - y«.i !>> d'~ ie. sum of say of A (y = - XB«) (y - X1W T - ( B R - Ba,_a> X X<BH T 2(fl T Thus <B - T B the ridge minimise subject Now = which B R fi B to R gives = - B o i ,-,) X X ( B T equations] of - B - B „ > < d' r expression ridge extends R with (:; multiplier fia,.. ,) X X(fl« T s ra 1/k i s - fi „_„> T E r o f t h e form estimator. Linear to the definition the usual [normal fioLs) kI)-*Bot.a General following ) R has s o l u t i o n r This T i s the solution (fi„ (X X + simple r + l/k(B T R 4.3.1.2 of f l the Lagrangian L the T R D L B BOLB) ) " X X ( BR - trace B Xfl A - BOL R - T m S A = (y - X B o L . > ( y - B O L . ) «• T - BoL ) X (y B A Case general of a ridge iteratively setting as follows estimator rewighted Bp, d e f i n e d least squares with the i n terms solution A fli ni...ca • Definition model y 4.1 = g _ 1 This (XB) B m i n i m i s e S u b j e c t Ridge to has + € T R (GR B Estimation i s given b y f o r the generalized t h e R — s o l u t i o n GIRL.)=i) X W X ( B R r T - B s o l u t i o n linear 56 A A ft R The = (X WX + T proof of this kI)- X WXB 1 e T follows by I R L extending CSCHAEFER,79] tothe A general settiong reweighted X = W^X squares Brm.™ estimator, that i s with i tminimises the iteratively transformation t h e weighted sums o f due t o e r r o r WSSE the follows. , i n the sense squares Again least as = the (y - g sum o f usual ( Xft) ) W - ( y - 1 r squares estimator g'MXB)) 1 i s constrained to within say d ,;s of Hence ftxm_ia- A d' = a WSSE ( f l ) - WSSE (ft t u t * , ) R (y - g (y (y - (y ( X f W >W 1 1 1 (y - g - g- (Xft, „,,,) ) 'W W T - |i(S, - (y 1 R 1 TR1 - p (ftT m in ) ) )) tt- (y l T - p (B 1 vector p(B ) e = <B and A - T simplifies approximated R R -ft,:«,...=>) X - A 1 3 ) by first r linear (y - WXB model fii„ ) X WX(B T LB + B t m „> likelihood i tfurther to (B L A - fttm _ > T 1 the generalized x R ft) fi RLo) X WX(fl T R p (B - B) p ( f t ) + WX(ft simplifies to (ft« - as A p (fi so t h e above d be p ( f i ) + WX(fln = R series My T J: r Taylor )W T a n d p ( B<i_.!=>) c a n >Ji(Bn) Now A H(B - .„> + Tm A 2(p(B«) s - - H (ft:i R L B )'W"" ( p ( B R ) (p(B«) + - g - ( Xft _. > ) H(ft,»>) - T H L B (X6 )> My 1 p (fti=») ) 1 T R - A ftTBUB> equation order So following the tHOERL,KENNARD,701 minimisation result and an using analogous a Lagrange manner gives multiplier [HOERL,KENNARD,703 method in result an shows square ordinary general give a justification existence-construction there error least is a of non this squares setting as zero value estimator in of is estimator. i s shown theorem. less This the k, The such than result following for Theorem 4.2 Existence generalized of linear k : Q model MSE ( B y = g _ 1 their existence that the that extends of the to the theorem. A for the follows. Now mean result in A R (k ) ) (Xfl) + MSE(fiiR £. r a L f i l ) € If (i) Xij are (ii) (X VX) (iii) V(u) uniformly i s of T V = bounded order ^- 0(l/n ) a "v bounded dp (iv) Then 3rd and 4th central for n sufficiently large k.=, 2 0 : MSE(fi (k ,)) moments there A This the to case the (iii) of was logistic general case i s included € exist exists A R theorem of and c <L MSE(fi originally regression. above, the is I R L ..e) proved The by [SCHAEFER,791 extension straightforward analogue between the of the for proof i f condition 1st and 2nd moments in general setting the logistic i s recognised. multivariate As with while any the standard linear data. Further found for i s appealing that k k as y e t . Also upon disturbing to n So one i s of have the the side [GUNST,83] where [SCHAEFER,791 logistic choice the give produced So i t choosn since [MONTGOMERY,PECK,821 case, p340 a s no be n o t e d a to the of not give have been collinearity X, alternative i t i s large empirical earlier. bounds section for that construction in their k, ,. H o w e v e r :: shows, result i n the i s t o g e t any guidance s e t t i n g . However in of that for sufficiently construction several as from t h e bounds values condition a i nthe be o m i t t e d optimally the degree singular i s not possible linear will i t must counterexample case, of k i n the general standard be deduce a those tedious. case, referenced they regression achievable. and and t h e o r e t i c a l l y i t does l e d back [HOERL,KENNARD,701 theorem, can proof i t i s of limited value solely argument This i s long assurance present. estimation algebra t h e theorem depends regression candidates reviewing by not on t h e analogy with are suggested. various rules state that "no s i n g l e p r o c e d u r e emerges from these s t u d i e s as best overall... Our own preference i n practice i s f o r ordinary r i d g e r e g r e s s i o n w i t h k s e l e c t e d by inspection of the ridge trace ... I t i s also occasionally useful to find the " o p t i m u m " v a l u e o f k s u g g e s t e d by H o e r l , K e n n a r d and B a l d w i n [1975] and the iteratively e s t i m a t e d "optimum" o f H o e r l and K e n n a r d [1976] and compare t h e r e s u l t i n g models obtained v i a the ridge trace." Hence the following Definition for Rule : Rule linear k i s that stabilise, ie. Selection the general I to 4.IB model value y = g which against R t h e one definition o f k i n Ridge and i s found B (k) plotted II : extended with by — 1 Estimation (XB) causes + € the estimate inspection fi^OO of the ridge k. [HOERL,KENNARD,BALDWIN,75] k where the Rule I I I : 0 i s the dispersion generalized linear parameter estimate of model. [HOERL,KENNARD,76] Algorithm [ Rule REPEAT i=l(l) UNTIL convergence ft<k > ) f i ( k <* > ) 11 R(k < *- > ) = 1 END REPEAT T <X' WX r + k < 1+ 1 > I ) - X'Wy 1 II ] trace Th e r e i a including but a the legion richer the above three of other method methods methods to of generalized serve choose ridge from, estimation, to illustrate the general idea . 4.3.2 Bayesian 4.3.2.1 Estimation Standard Linear tLINDLEY,SMITH,72] has the normal multivariate posterior ft = [X X r + where So by more applied work, model method. posterior type in NCO, around this i t normal must points to the concept by a ft i f y has the a, > then of a e a the concept i s means a be of little CNELDER,72], out that of prior = prior Bayesian unpalatible come This which The exchangeable t h e ft m u s t used, prior i s obtained. strictly distribution. always However likelihood the concept since terms estimator are motivated multivariate equivalent and iR distribution ridge paper [LINDLEY,SMITH,721, is u I> P information pivoting standardised N (XB, case, o^/Ufa** Unfortunately standard the prior f o r ft a philosophy, for = of this priors. distribution linear kll-'X'^ k distribution i n the standard o f ft i s adding results show, normal mean Case from a implies a detracts from in reviewing the Bayesian formulation likelihood likelihood * [EDWARDS,69] likelihood This i s a likelihood 4.3.2.2 In framework General of as i s [EDWARDS,691 1, evidence prior for that normal. f yiXifl S the f i r s t exponential term i s the prior obtained from Appendix 4A. mode prior likelihood . :L + the partial Hence the using of the derivatives generalized 1.3, a n d t h e s e c o n d a normal likelihood following of ) • :L maximum philosophy _. , as i n d e f i n i t i o n The natural o f fi i s c<y ,0) log likelihood i s a e s t i m a t e s fi a r e the log likelihood J experiments, the sampling i s the log likelihood (O.dra^). with the models. other - b(x B)"| distribution parameters and prior with estimates are are presented i n definition of a posterior estimator. Definition linear that term linear using a70> where i n better from the So the posterior = ( F of fits Case known asymptotically and of generalized a normal since approach Linear the absence choice one simpler 4.2 model maximises = ( £ Posterior y = g _ 1 (Xfi) Mode + € i s the value the density I" y * « , B i-i [ - Estimator b<x i»1 + ± " C ( Y I J ^logo,;, * 1 S: fi -., f expression a(0) CJ«. f o r the 2 } , > ,• 0 generalized 4.3.3 Principal 4.3.3.1 Component Standard Principal variables, of y~ the standard = X-fl* A T + = where X" variables linear by linear reduced set of removed in the design the case, predictor are considering those matrix. canonical model, i e . = XT XX •= T" T r r T ' X XT 1 = ['Xi. , . . . , X ] eigenvalues = [ t i , . . . , t ;,l e i g e n v e c t o r s = T'JWra p components = A of X X T of r estimator X' X 1 i s defined as TBB~ B = dlag(bi substituting A a singular values b ,} P bi So using standard € the p r i n c i p a l A OPC the are easiest derived where Then by the to small results in estimates where corresponding form Case components, produces . biased The Linear Estimation f o r B" TBT B 0 \ "small" 1 e l se :l from above r ( TBT" " ( X X ) - X 1 1 TBT"' TA 1 1 XBo....m T X Xfl ua 1 T T n [*,/ [T TA-••• T" X" XB,: [A- 1 r r r r 1 tdiagonalisation T BA ~" T X X B o i... 1 X'X orthogonal 1 = BA — 1 (X^X)' ] 1 3 3 (X x>-x xrw. T r where ra (X X)"" 1=1—<« ^ T - - t i t i. "•• . 1 A j. = T i s an T t approximate inverse Now V a r ( B p c ) = V a r ( T B G •*) = TBVar(B")B T T [Var(AX) T TB ( X » * X * ) - B T CT^- [by c o n s t r u c t i o n TBA - 'B"' T "a™ [diagonal isation 1 r r 1 which i s least sqaures t o be Var(fl .a) Hence terms due values. 4.3.3.2 General a components i n the variances singular In with the variance to those Linear similar alleviates i t i s possible setting Definition 4.3 Components Principal linear A model A = (X WX)-X WXB T T 1 R L . where (X WX)'" T = by r e m o v i n g t h e to small Case manner the general the consequences of variables corresponding to the general f W 3 of the ordinary of the effects, definition for ] TA-^T^tr* principal collinearity large compared ] T estimator = m = AVar(X)A TA — T ' 1 r B to extend this v as follows. y = g Estimator _ 1 (Xfl) + € i s given by Hence variances are A = [\i T = t t i , . . . , t,,] B = d i ag (b , by d i r e c t of similarly analogy t h eestimates reduced. X r J eigenvalues o f X WX eigenvectors b F ) o f X WX r ) 0 Xt "small" 1 else t h ee f f e c t s i nthe r of collinearity generalized linear on t h e model 4.4 Appendix Posterior The 4A likelihood manner by However the problem i s values i s variance are of B [EDWARDS,69a] i s acceptable substituting of the when remove parameters. because t o use f o r the expected matrix. further nuisance t h e maximum However parameters formation He the usual standard the appropriate. out, using non nuisance the i n the to the information involved, to i t slightly the that more respect distribution obtaining shows like matrix shows of that i t parameter likelihood the by equations. the following calculations 1P = i.L £ I f VxXiB —-s- - the following partial Ql , ©ft,., - Hi>Xi _ t 01 @a K y i c <V- A 0 ) , > ) differentials _ _ are obtained fl ,_ r a, iz 3 B B 2 (ar,.) = fii m 01, 00 _ _p 2a , s e ra [" £ y x f l - b ( x f l ) ~ L 0 1 1 : L a 0fl .0B v + |loga*~ 1 8 r R J - b(Xifi)] —aTc5) ( So Estimates are obtained with then no a s s o c i a t e d in Hence equations differentiating [PATEFIELD,771 is Likelihood Distribution maximum there Maximum = s z iXiir, J~ XL ai s [Op f t iJH V v :: ll |L00 n J J A A - + + @c(y . ,0)1 00 J :l crB M [ 6 „ s Kronecker delta] 66 Z <y S l - e P Z(y S lp t pi > xi - Hi)x By : •B B a T ( Eliminating equations • - i C T r the a a):a ( CT , nuisance E A > parameter gives BJB P The score vector i s ©fi ©0 and the formation 0B0B f l p 0B00 with the usual matrix ©Jiie 0B00 0 1 , @0 S: F C scoring fi fi 0 0 i s + equation F-^s being a* f3 using t h e maximum 67 5.0 5.1 are developed obtain in verify set unknown. that square Because Scope Other methods of this, predictor such this o f fi ( a ) , thesis, illustrate adopted. and s o one l e v e l would be t o symptoms. then three approach to applied to the calculate coefficients h a s been The be u s e d has to directly the "true" a simulation as are choosen. [DEMPSTER,SCHATZOFF,WERMOUTH,771 and <P>, [SCHAEFER,82] - sample degree e t c .But t h i s methods simulation and a t o two of of studied <n), the chapter (d), resources that the number o f collinearity simulation, parameters levels. have size i s beyond a restricted some o f t h e In t h i s this since could chapter not possible parameters variables alignment collinear methods Simulation simulations of several of the One a p p r o a c h two However, [McDONALD,GALARNEAU,751 effects has chapter i t i s validating chapters. error, of the to which of estimators. mean 5.2 data and t h e disadvantage, the approaches measures this, obtain two the previous a large diagnostic to Example Introduction There , Illuat.rat.ive serves of to four, has been n, p, d a r e restricted 5.3 Generation 5.3.1 of Collinear Standard The Linear procedure (eg. that Case has been followed [McDONALD,GALARNEAU,751, [WICHERN,CHURCHILL,78), collinear design dependency. Two selected, and Data as matrix "true" from mean square respectively, [THISTED,76] p74 as have corresponding system. T i s these It choices. then i s known and been smallest that minimised [GIBBONS,81] f o r these a linear to the maximised justification to generate theoretical vectors of the X X error with a coefficient the eigenvectors simulations [HEMMERLE,BRANTLE,78], [GIBBONS,81]) has been largest eigenvalues the i n many cites choices " "extreme-case simulation" experiments a p p e a r t o be t h e most e c o n o m i c a l and i n f o r m a t i v e , especially f o r preliminary s t u d i e s o f new results" Finally, the systematic Now eg. to response part much a fair 5.3.2 General generalized been basis the resources problem has NOSTRAND,79] beyond The Linear of linear i s generated t o Xfl a n d a r a n d o m criticism [DRAPER,VAN construct due vector for of this normal levelled and much a s t h e sum error. against work comparison. of the needs this design t o be However, done this i s thesis. Case generating model a collinear i s not as simple as data s e t from in the a standard linear case, since for the p series the X X system the standard r must be used . A solution, in a case might ft a s the eigenvector of X X. T recover appear similar Then an it i s The as r not by [GIBBONS,811 of to the smallest = w'^(X,ft)X matrix. However i s itself with to no a non choose i s to start maintain by convenience dispersion system X X as c o l l i n e a r , could taking linear selecting eigenvalue be used ft to as the f u n c t i o n o f ft l e a d s trivial the solution. alignment to Hence of ft as tTHISTED,761. premultiplication For X which possible alternative to relation o f X X, to to generate corresponding original overconstrained recommended so the the eigenvectors t o be manner i s used collinearity. collinearity This X collinear in i s effectively the case parameter with of the ( i e . 0=1) to generate method i s X by and realising row scaling gamma density chosen. X with s e l e c t ft The of i s as f o l l o w s . X. with method a controllable that unit of degree Algorithm STEP 5.1 Modified [GIBBONS,81] Collinear Generator of X 1 Generate i STEP z .j as independent N<0,1> t = l ( l ) n paeudo random variables i=l(l)p+l 2 ct Select where a i s l S the correlation between any two variables Compute x j t STEP = (1 e X In the X matrix = W^X where term distribution. :L €, = l ( l ) n j= l ( l ) p as = model y component o f as having considered, T B mean come distribution l / X i the one So i t i s n o t m e a n i n g f u l with = r t has distribution ± l/x ' B linear that to the systematic thought H i t; y i s t h e sum o f t h e s y s t e m a t i c error term w ^ the generalized response be + o z i , ,,. j 3 Compute the - o( )Zij vector from = g (XB) + 1 € the component g- ( X f i ) a n d parameter exponential 1 t o add a s p e c i f i c t o generate y. Rather a one parameter p = g " (Xfi). t h e mean vector i s 1 error y can exponential So f o r t h e gamma 5.4 Simulation 5.4.1 Simulation The y model to = g (Xft) 1 Setup be considered + € i n the simulation where y : n x l gamma X : variate dispersion parameter n : sample size p : number predictor fi : p x l 2 levels = = 30 variables (including 4 »<•--> fi<«a> O.Ol -0.001 O.Ol fi< i-. > = 0.01 fi 0.01 < ™ •0.01 O.Ol l e v e l s were measured by predictor reason. as largest levels the degree of collinearity correlation coefficient d" choosen X generated as by eigenvectors eigenvalues are f o r the =0.80 were Given above the O.Ol d between as any variables ft<is>> <L> chosen a'"'-, d<*> fi , 0=1 nxp intercept) Two i s (t* -*, 1 =0.95 s > above Algorithm for 5.1 corresponding t < r a > ) of X the and to the with following d fi choosen <l smallest at the and two two = <u> 0.95 t<'- > -0.63 0.28 -0.75 0.05 -0.48 -0.49 -0.39 -0.34 -0.43 0.65 -0.36 -0.44 -0.50 -0.37 Now aa discussed choice B <'- > of approximately eigenvectors above of X(B> eigenvectors is = £ < la > t the 0.80 i t aligned X ( B of expected with < L > ) the B to align analytically, by t h e s u i t a b l e seen the for B -0.52 i t i s not possible i s in 0.78 < i s that but &<>•-> directions d " = 0.95. As alignment B and of < E with " are the respective little only improvement the 1 setting ( B <»> , d <•=> > i s u s e d . A pilot reasonable able 5 to r u n sample detect percent Because were simulation a level. of ten size. The 10 p e r c e n t This gave resource runs was criterion change made to estimate was that i n the estimates a run size of : iteratively B : I 1 just B,=.c: least [HOERL,KENNARD,BALDWIN,751 : principal components being (m) o f 2 0 . constraints, reweighted a B at the three estimators considered BIRLO R of squares ridge estimator estimator estimator To assess criterion o f the so c a l l e d ASE(fi) is the effectiveness = (fi - fi) (fi of the estimators, average square the usual error - fi) T used. Now i t variance i s of the [SCHAEFER,79] moments here, of possible but should 5.4.2 S i m u l a t i o n The stage ridge chapter the was d i v i d e d used SAS used generalized iteratively estimators While sets used PROC i t s matrix machine coding of ability leaves much deficieny OCl/n-"} ( s e e the third i n any f u t u r e and f o u r t h this i s omitted work. and D i f f i c u l t i e s into three PROC squares MATRIX and The first t o generate 5.1. The models package least stages. MATRIX Algorithm linear PROC of order f o r the second GLIM estimates. calculated the stage to get the The third the biased and s t a t i s t i c s . t h e SAS of using to o f y. However procedure reweighted again terms Implementation data because II) i n be c o n s i d e r e d the approximations estimators simulation the obtain distribution collinear stage to i s that MATRIX style the very to routine syntax matrix be large i s t o use and i s e f f e c t i v e routines; desired. matrices pleasant due t o i t s input/output The result must be of stored this (with numerous consequent physical is made o f interpreter to be During fault was precent model has the no the data used with with #****FAULT the and this and Table parameter estimator nested actual 5.1 flpc This stage, effected use computational that (as fitted a approximately appear collinearity when was values a five gamma here), and GLIM sensibly message data the type by in in fii, ratio as Conclusions displayed a each cell the f i , , fl , fi-+ f r o m s column error This and were s u b s t i t u t e d . in Tables 5.1 5.1. alignment 5.2 sets simulations are lists values parentheses. A second negative Results Figure squared reduction for i f efficient code. I t would severe extra of within Table and sets. problems) **»«»*****» results 5.2 GLIM. warning Simulation The the 59 overcome 5.4.3 of in adjustement aborts To running found of is the i/o 3 and fi<> the by similar and i t s table i n ASE opposed by degree the of m runs estimated with collinearity the d c > row. layout gives associated also using to average the standard displays the the biased usual values the of error the in percentage e s t i m a t o r s fi« . T estimator A fiiR .i=) L (ie. B / B i RI....E)* 1 0 0 ) . criteria the comparison and d < bars o f ft f o r ( f t , d that < l S : > comparisons level with within comparison i s displayed homogeneous the three with o f ft"-' t o make ft<«> the multiple setting at the five i n Table the simulation (ft <'--> , d < •> ) , 1 the scaling the graph size o fft<> precent 5.2 where groups. displays graphically ). Note test sample were each pictorially 5.1 settings the considered estimators pairwise represent Figure accordance of the three signficance the only . Tukey'a J In easier estimates (B< >,d has been L < e > ) modified t o comprehend. and to 76 Table 5.1 Estimators f o r t h e Gamma fi fi 1. fi " = <L>n B <S>T 2. d< 3. Each A > = in: 0 .0128 0. 0 1 0 7 0. 0 0 9 4 0 0 .00941 0. 0 0 9 5 8 0 .0 1 0 1 0 .0101 0. 0 0 9 1 4 0. 0 1 0 6 0 .0128 0. 0 1 0 5 0. 0 1 0 3 0 .0127 0. 0 1 1 0 0. 0 0 9 9 5 0 .00804 0. 0 1 0 7 0. 0 1 0 1 0 .00749 0. 0 0 7 3 5 0. 0 1 0 4 0 .0163 0. 0 1 2 3 0 .O l O l 0 .00763 0. 0 0 4 1 3 0. 0 0 1 7 5 o .0100 0. 0 0 6 7 2 0. 0 0 2 2 9 o .0101 0. 0 0 3 1 4 -0 . 0 0 2 7 9 0 .0105 0. 0 0 5 4 8 0 .0 0 2 6 2 < L: < s > (O.Ol, O.Ol, O.Ol, (-o.OOl, = 0.80 d O.Ol) O.Ol, -0.01, < f i : > = contains O.Ol) 0.95 A cell Ar~fi A A fii: Ft i~ra Model (fit, A A fi , s A fi , 3 B ^ ) T Table 5.2 Actual for Squared Error t h e Gamma A d <t > Estimators Model A A A I R i... m 0.00171 C0.00209 of B ] H I ft (.;-.: T 0.000450 [0.000984 ] 0.000167 [ 0 . 0 0 3 1 5 )3 26% 9% ft <L > 0.0139 [0.000177 d < ft 1. B < L s > > ft<S>r d = r = )] 19% 0.000281 [0.000454 3 <a> 0.0000942 [0.000142 3 3. Each c e l l 0.000197 [0.000193 1 0.000286 [O.0000465 3 d 100% O.Ol, O.Ol, O.Ol) (-o.OOl, > = 0.80 1% 55% (O.Ol, d % 0.00268 [0.00722 < a: > 2. < ] 0.01,-0.01, < c i > contains = O.Ol) 0.95 ASE a v e r a g e for m [standard error! reduction ratio runs % using a estimator 4. Bars r e p r e s e n t homogeneous groups w i t h Tukey's p a i r w i s e comparison a t t h e 5% l e v e l biased F i g u r e 5.1 Mean ± Standard D e v i a t i o n I n t e r v a l s o f E s t i m a t o r s (a) (b) ,d"') f o r the Gamma Model ,d > ) (fi <a o.eo o.o« 0.07 0.0* 0.00 0.04 COS u o o.ot + o.oo 99 - i 0 -.01 2 •o.ot -0.00 LEGEND CZ2 1 E 3 3 4 • -0.04 fii -o.oi -o.oa -o.or Rll IRLS -0.8* PC2 IRL8 ESTIMATES P C2 Rll ESTIMATES (c) (fi <s> ,d <,=:> ) 0 .0* o.oo o.or 0.01 0.01 0.04 O.Ol > 1 i• o 1 o.oa 0.01 0.00 1 -00.1 + •o.ot -0.00 B<u>x = (O.Ol, O.Ol, O.Ol, O.Ol) A < S > T d < 4 > = = ( - 0 . 0 0 1 , 0 . 8 0 d O.Ol, -O.Ol, O.Ol) -0.04 0 -.01 E 2 LEGEND CS3 t CD t 4 -0.00 •o.or tts:> = ( 0 . 9 5 0 -.00 IRLS Rll EST1MATE8 &l PCS 78 The are major as the alignment i s considerable biased estimator. proportional case to the principal estimator the five In fi ) can is be drawn favourable reduction Also, the degree of components (Duncan's from the results in variance amount range finds then 1 using reduction present. better test ft* -*), from of collinearity seems multiple ( i e . fi = than a In the a is this ridge difference at percent ' l e v e l ) . the case there <ra> that follows. When there conclusions estimation when appears , the to provided alignment i s unfavourable be some gain from the bias induced (ie. B ridge using could be is well = considered tolerable. The effect Figure 5.1(c). mean can square be seen. because It drawn major of the using In particular error The i s encouraging summary Table = estimators the variance IRLS variance from biased and PC2 reduction to note 5.2 conclusions are of in identity + biaa a estimators have the in PC2 is offest that the conclusions in shown reasonable [GIBBONS,81] by same i t s bias. that agreement pl37-8. ASE, can with be the 1. 2. A l l e s t i m a t o r s a r e b e t t e r t h a n t h e LS e s t i m a t o r s when t h e u n d e r l y i n g c o e f f i c i e n t v e c t o r i s f a v o r a b l e ; t h a t i s B = B,„. No e s t i m a t o r i a a l w a y s b e t t e r t h a n t h e L S when t h e underlying coefficent vector i s unfavorable; t h a t i sfi= B.,. n 3. So are Estimates HKB, where refers i n conclusion, definite estimator linear extensive carried gains when model collinear be HKB GHW a n d RIDGM to there to a structure i s some e v i d e n c e b e made testing, gamma as using out before well overall, [HOERL,KENNARD,BALDWIN,75] collinearity with performed was both by u s i n g i s to some f o r m present in distribution generated. simulation any broader suggest a of there biased generalized and an artificial However far more and r e a l data, must statements c a n be made. 81 6.0 This thesis collinearity has in identification as Summary an approximate predictor naturally of linear X = associated with shown collinearity that general series. system That the W X general for amongst is collinearity is is linear model. system the shown the and to the scaling weight matrix carries over general of extend this, the model columns Using except i n the linear with W models diagnostic standard where X for methods. setting i n the a l l in a [GUNST,83], general definitions model, estimation dependency X the transformation and collinearity matrix to suitable generalised linear procedures definition Conclusions investigated the The the and so i t was to the called setting is p model dependent. The CGUNST,831 mathematically transformed extend the were generalized linear model definition equivalent to predictor small matrix X. scheme. derived linear To for This model collinearity quantify the the «*. of shown to be values in the property was used diagnostic model collinearity i n terms index was singular [BELSEY,KUH,WELSCH,80] identification bounds extended that and dependency, index of to the of the standard Although estimation in trying bound ridge method Given previous area and problems the be estimation appealing alternative, liklehood of i s a approach, amount t o s e t up need of research. a Monte parameters the c r i t i c i s m there encountered [HOERL,KENNARD,70] for future in trying studies, simulation conclusions, exist much theoretically could area the easiest Carlo in the directed against f o r more work in too. collinearity However An were "equivalent i s an found the posterior model, and simulation k. an the linear extending the method for generalized While This the problems simulation aims was seemed difficulties parameter incorporates information". this i t by f o r the discussed, which intuitively to consider, t o implement result briefly regression and i t with c a n be more extremely d i d succeed estimation alleviated experience before advocated was some form for routine the presence t h e methods i s needed of biased use. i ni t s i n demonstrating in by restricted both that of described. practically estimation and methods Biblioqraphy [ALLEN,77] A l l e n , D M "Comment to Least Squares" Journal A s s o c i a t i o n 72 3 5 7 9 5 - 9 6 . [BARD, 7 4 ] Bard,Yonathon A c a d e m i c P r e s s New York. on of Simulation of Alternatives the American Statistical Nonlinear Parameter [BELSEY,KUH,WELSCH,80] B e l s e y , D A Kuh,E W e l s c h . R E Diagnostics : Identifying Influential Data and Col1inearity J o h n W i l e y & S o n s New York. Estimation Regression Sources of [BRADLEY,SVRIVASTAVA,79] Bradley,RA Svrivastava,SS "Correlation i n Polynomial Regression" American Statistician 33 1 1 - 1 4 . [DEMPSTER,SCHATZOFF,WERMUTH,77] Dempster,AP Schatzoff,M Wermuth,N "A S i m u l a t i o n Study of Alternatives to Ordinary Least Squares" Journal of the American Statistical A s s o c i a t i o n 72 7 7 - 1 0 6 . [DRAPER,VAN NOSTRAND,79] D r a p e r , N R Van Nostrand,RC "Ridge r e g r e s s i o n and James S t e i n E s t i m a t o r s : Review and Comments" T e c h n o m e t r i c s 21 4 5 1 - 4 6 6 . [EDWARDS,69] E d w a r d s , A W F "Statistical Inference" N a t u r e 222 1 2 3 3 - 1 2 3 7 . [EDWARDS,69a] Press London Edwards,AWF [GIBBONS,81] G i b b o n s , D G Estimators" Journal of 76 373 1 3 1 - 1 3 9 [GUNST,80] G u n s t , R F Regression Methods" Association 75 369 Likelihood "A the Methods in Cambridge Scientific University S i m u l a t i o n Study o f Some R i d g e American S t a t i s t i c a l Association "Comment o n Journal 98-100. A of Critique of the American Some R i d g e Statistical [GUNST,83] G u n a t , R F "Regression A n a l y s i s with Predictor Variables : Definition, Detection Communications in Statistics Theory and 2217-2260. Multicol1inear and Effects" Methods 12(19) CHEMMERLE,BRANTLE,78] Hemmerle,WJ Brantle,TF " E x p l i c i t and Constrained Generalized Ridge Regression" T e c h n o m e t r i c s 20 109-120. [H0ERL,62] Regression 54-59. Hoer1,AE Problems" "Application of Ridge Chemical Engineering Analysis Progress to 58 [HOERL,KENNARD,70] H o e r 1 , A E Kennard,RW "Ridge r e g r e s s i o n : Biased Estimation f o r Non-orthogonal Problems" Technometrics 12 5 6 - 6 7 . [H0ERL,KENNARD,76] H o e r 1 , A E Kennard,RW Iterative Estimation of the Communications i n S t a t i s t i c s A5 7 7 - 8 8 . "Ridge Biasing Regression : Parameter" [H0ERL,KENNARD,BALWIN,75] "Ridge Regression : Some S t a t i s t i c s 4 105-123. Hoer1,AE Kennard,RW Baldwin,KF Simulations" Communications in [KENDALL,57] Kendall,MG Griffin London. A [KENDALL,STUART,67.2] Theory of Statistics Griffin London. [LINDLEY,SMITH,72] the Linear Model" B 42 3 1 - 3 4 . [McCULLAGH,82] Lecture Notes Course Kendall,MG Volume 2 in Multivariate Analysis Staurt,A Inference and The Advanced Relationship Lindley,DV Smith,AFM "Bayes e s t i m a t e s f o r Journal of the Royal S t a t i s t i c a l Society McCullagh,P University of [McCULLAGH,NELDER,83] L i n e a r Models Chapman "Categorial Data British Columbia. McCullagh,P Nelder,JA and H a l l London. Analysis" Generalized [McDONALD,GALARNEAU,75] M c D o n a l d , G C Galarneau,DI Carlo Evaluation o f Some Ridge Type E s t i m a t o r s " t h e A m e r i c a n S t a t i s t i c a l A s s o c i a t i o n 70 4 0 7 - 4 1 6 . [MONTGOMERY,PECK,82] M o n t g o m e r y , D C P e c k , E A Linear Regression Analysis J Wiley & Sons [M0YLS,851 [MULLET,76] Wrong S i g n " Moyls,B Personal "A M o n t e Journal of Introduction New Y o r k to communication Mullet,GM "Why R e g r e s s i o n C o e f f i c i e n t s have t h e J o u r n a l o f Q u a l i t y T e c h n o l o g y 8 121-126. [NELDER,77] N e l d e r , J A " D i s c u s s i o n on Bayes E s t i m a t e s Linear Model" Journal of the Royal S t a t i s t i c a l S e r i e s B 34 1 8 - 2 0 . f o rthe Society [NELDER,WEDDERBURN,72] Nelder,JA Wedderburn,RWM "General Linear Models" Journal of the Royal S t a t i s t i c a l Society S e r i e s A 135 3 7 0 - 3 8 4 . [PATEFIELD,77] Patefield,WM "On the Function" S a n k h y a S e r i e s B 39 92-96. [PREGIBON,81] Pregibon,D The A n n a l s o f S t a t i s t i c s Maximized " L o g i s t i c Regression 9 4 705-724. Likelihood Diagnostics" [SCHAEFER,79] Schaefer,RL Multicollinearity in Logistic Regression PhD T h e s i s U n i v e r s i t y o f M i c h i g a n #792522D [SCHAEFER,82] Schaefer,RF "Alternative Estimators in L o g i s t i c Regression when t h e D a t a a r e C o l l i n e a r " Proceedings of the American Statistical Association Statistics and Computing S e c t i o n 159-164. [SILVEY,69] Silvey,SD "Multicollinearity Estimation" Journal of the Royal S t a t i s t i c a l B 31 5 3 9 - 5 5 2 . and Imprecise Society Series [SMITH,CAMPBELL,80] S m i t h , G C a m p b e l l , F "A C r i t i q u e o f Ridge Regression" Journal o f the American S t a t i s t i c a l Association 75 3 6 9 7 4 - 8 1 . CTHISTED,76] Thisted,RA "Ridge Regression, E s t i m a t i o n and E m p i r i c a l Bayes Methods" Technical 28 S t a n d f o r d U n i v e r s i t y D i v i s i o n o f B i o s t a t i s t i c s CWEDDERBURN,74] W e d d e r b u r n , R W M "Quasi Generalized Linear Models and the B i o m e t r i k a 61 3 4 3 9 - 4 4 7 . Hinimax R e p o r t No Likelihood Functions, Gauss-Newton Method"
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Collinearity in generalized linear models
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Collinearity in generalized linear models Mackinnon, Murray J. 1986
pdf
Page Metadata
Item Metadata
Title | Collinearity in generalized linear models |
Creator |
Mackinnon, Murray J. |
Publisher | University of British Columbia |
Date Issued | 1986 |
Description | The concept of collinearity for generalized linear models is introduced and compared to that for standard linear models. Two approaches for detecting collinearity are presented and shown to lead to the same diagnostic procedure. These are analysed for the Poisson, gamma, inverse Gaussian, pth order, binomial proportion and negative binomial models. A bound is derived for the degree of collinearity in a generalized linear model in terms of that of the standard linear model. Estimation methods based on ridge, prior likelihood and principal components are proposed, and briefly illustrated with a Monte Carlo simulation of a gamma model. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-06-14 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0096659 |
URI | http://hdl.handle.net/2429/25711 |
Degree |
Master of Science - MSc |
Program |
Business Administration |
Affiliation |
Business, Sauder School of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 831-UBC_1986_A4_6 M26_5.pdf [ 3.27MB ]
- Metadata
- JSON: 831-1.0096659.json
- JSON-LD: 831-1.0096659-ld.json
- RDF/XML (Pretty): 831-1.0096659-rdf.xml
- RDF/JSON: 831-1.0096659-rdf.json
- Turtle: 831-1.0096659-turtle.txt
- N-Triples: 831-1.0096659-rdf-ntriples.txt
- Original Record: 831-1.0096659-source.json
- Full Text
- 831-1.0096659-fulltext.txt
- Citation
- 831-1.0096659.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0096659/manifest