Collinearity in generalized linear models Mackinnon, Murray J. 1986

COLLINEARITY IN GENERALIZED LINEAR MODELS

by

MURRAY J. MACKINNON
M.Sc., University of Otago, N.Z.

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES (Commerce and Business Administration)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
April 1986
(c) Murray J. Mackinnon, 1986

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at THE UNIVERSITY OF BRITISH COLUMBIA, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Commerce and Business Administration
THE UNIVERSITY OF BRITISH COLUMBIA
2075 Wesbrook Place
Vancouver, Canada V6T 1W5

Date: April 1986

Abstract

The concept of collinearity in the standard linear model is introduced and extended to generalized linear models. Two approaches to defining collinearity for generalized linear models are presented and shown to lead to the same diagnostic procedure. The sources and effects of collinearity are analysed and illustrated in terms of the standard linear model. A bound for detecting collinearity to the same degree as in the standard linear model is derived for the Poisson, gamma, inverse Gaussian, binomial proportion, pth order and negative binomial models. Estimation methods for generalized linear models in the presence of collinearity, based on ridge, prior likelihood and principal components approaches, are proposed and briefly compared with maximum likelihood by a Monte Carlo simulation of a gamma model.

Table of Contents

0.0  Introduction
1.0  Collinearity in Standard Linear Models
     1.1  Definition of Collinearity
     1.2  Sources of Collinearity
          1.2.1  Large Pairwise Correlations
          1.2.2  Data Collection
          1.2.3  Model Specification
          1.2.4  Overdefined Model
          1.2.5  Outliers
     1.3  Effects of Collinearity
          1.3.1  Ill Conditioning of X
          1.3.2  Estimate Effects
          1.3.3  Inference Effects
          1.3.4  Predictor Effects
2.0  Collinearity in Generalized Linear Models
     2.1  Definition of a Generalized Linear Model
     2.2  Definition of Collinearity in Generalized Linear Models
          2.2.1  Linearisation of the Link Function
          2.2.2  Iteratively Reweighted Least Squares Approach
          2.2.3  Choice of Approach
     2.3  Relationship of the Standard Linear Model and the Generalized Model Collinearity Definitions
     2.4  Sources of Collinearity in
          a Generalized Linear Model
     2.5  Effects of Collinearity in a Generalized Linear Model
          2.5.1  Estimation Effects
          2.5.2  Inference and Predictor Effects
     2.6  Appendix 2A  Maximum Likelihood and the Generalized Linear Model
     2.7  Appendix 2B  Iteratively Reweighted Least Squares Algorithm
3.0  Diagnostics for Collinearity in Generalized Linear Models
     3.1  Desirable Properties for Diagnostic Measures
     3.2  Measures of Collinearity
     3.3  Model Dependency
4.0  Estimation for Generalized Linear Models in the Presence of Collinearity
     4.1  Remedies for Collinear Sources
          4.1.1  Large Pairwise Correlations in X
          4.1.2  Data Collection
          4.1.3  Model Specification
          4.1.4  Overdefined Model
          4.1.5  Outliers
     4.2  Remedies for Collinear Effects
     4.3  Estimation Methods
          4.3.1  Ridge Estimation (Standard Linear Case; General Linear Case)
          4.3.2  Bayesian Estimation (Standard Linear Case; General Linear Case)
          4.3.3  Principal Component Estimation (Standard Linear Case; General Linear Case)
     4.4  Appendix 4A  Maximum Likelihood Estimates of the Posterior Distribution
5.0  Illustrative Example
     5.1  Introduction
     5.2  Scope of the Simulation
     5.3  Generation of Collinear Data
          5.3.1  Standard Linear Case
          5.3.2  General Linear Case
     5.4  Simulation
          5.4.1  Simulation Setup
          5.4.2  Simulation Implementation and Difficulties
          5.4.3  Simulation Results and Conclusions
6.0
     Summary and Conclusions

Bibliography

0.0  Introduction

Collinearity has long been recognised as causing significant problems in the estimation and computation of parameters in the standard linear model. However, with the exception of logistic regression [SCHAEFER,79], collinearity in the generalized linear model is relatively unexplored. The purpose of this thesis is to propose a reasonable definition of collinearity for the generalized linear model, to examine its sources, effects and consequences, and to seek estimation methods for use in its presence.

The first chapter reviews collinearity in the standard linear model, and identifies its sources and effects, so as to provide an introduction to, and a comparison for, the general setting.

The second chapter seeks a reasonable definition of collinearity in the generalized linear model by considering two approaches. The first is based on a linearisation of the link function; the second, which is chosen, is motivated by the iteratively reweighted least squares procedure used for estimation. Using this definition, it is shown that collinearity in the generalized linear model is dependent on the model matrices used.

The third chapter deals with diagnostics for collinearity in the general setting. Desirable properties for diagnostic measures are proposed, extending the ideas of [BELSEY,KUH,WELSCH,80] while cautioning against a "mechanical" approach to the identification of collinearity. A criterion and identification scheme developed for the standard linear model are shown to generalize, while the model dependency of the general criterion opens problems of indeterminacy. To quantify this, weak bounds for the criterion are developed.

The fourth chapter investigates remedies for the sources and effects of collinearity. Three estimation methods for collinear generalized linear models are proposed, generalizing classical methods used in the standard linear case: a ridge method extending the biased estimator of [SCHAEFER,79] for the logistic model, a method based on the philosophy of prior likelihood [EDWARDS,69], and a principal components method based on the iteratively reweighted least squares solution.

The fifth chapter illustrates the preceding ideas with a restricted Monte Carlo simulation of a gamma model. The objectives of the simulation, the construction of artificial collinear data, and the difficulties associated with such a simulation are discussed. The simulation demonstrates the advantages of the proposed estimation methods in the presence of collinearity.

1.0  Collinearity in Standard Linear Models

Throughout the theoretical development of this thesis a constant parallel will be drawn between the standard linear model and the generalized linear model. Let the standard linear model of a response y in terms of predictor variables X be defined as follows.
Definition 1.1  Standard Linear Model

    y = Xβ + ε

where
    y : n×1 vector of responses
    X : n×p matrix of predictor variables
    β : p×1 vector of coefficients
    ε : n×1 vector of errors, with ε ~ N_n(0, σ²I)

1.1  Definition of Collinearity

Collinearity, as considered in classical linear regression, is said to be present when:

- two or more predictor variables are highly correlated
- coefficient estimates are different in magnitude and/or sign from those hypothesised
- coefficient estimates are highly variable
- the sampled range of some predictors may be small
- matrices used in the computation of estimates are ill conditioned

But these are only symptoms of the more basic underlying problem of approximate linear dependencies amongst the columns of the predictor variables. This is the approach taken by [GUNST,83], and the foundation used here. With this approach it is possible to tie together diffuse causes of collinearity, such as deficient sampling range and outliers, and to give validity to symptoms such as highly variable estimates and sensitive row deletion statistics.

The following definition of collinearity is adapted from [GUNST,83].

Definition 1.2  Collinearity in the Standard Linear Model

    y = Xβ + ε

For some suitably chosen δ ≥ 0, collinearity is present when there exists a p×1 vector c ≠ 0 such that

    |Xc| ≤ δ|c|

It is to be noted that this definition only involves the data matrix X and makes no mention of the dependent variable y, or the underlying model. So for the standard linear model, the detection of collinearity involves only the X matrix of predictors.
i f f o r some 6 i 0 <1  - R_, >^  i 6(1 + R  we  | X c j £ 6 | c: j s o a c o l l i n e a r i t y  K  then  have  Definition  Xc  =  |Xc|  xt  =  -  r  -  |Xc| and  (1 -  l e  a  for r  >  )  r  e i  e  &  ft<e>  =  exists  f i e and so c  i n terms o f  i s given  by  and  > ^  > 0 we  have  i 6(1 + r i a  8  ) ^  then  exists  as per D e f i n i t i o n  "large  enough"  a collinearity  i s important  t o note  though  dependency  show  A  then  =  a collinearity  It  tend  J  1 6|c|  so  words  (  i f p 2  riftXfflj  i f f o r some <1  R  1.2.  Specifi cslly  So  Now  as  by t h e e x t r a |Xc|  predictor.  (X < j > X < j > )  c be d e f i n e d Xc  So  deleted  i l s  need  towards when  dependency,  the tends  be  strong  unity.  for  that  a l l the  Specifically  singular to zero  value, a l l  only  1.2.  In  other  exists. one correlations  to  CBELSEY,KUH,WELSCH,803  connected  pairwise  with  just  correlations  tend  one to  unity.  Hence,  multiple  1.2.2 If  high  correlations  cannot  Data  a  subset  of  the  predictor  arise.  collinearity  can  variable  case  scatterplot  positively  more  i n the  correlated  collinearity  variable  (eg.  as one  jointly  and  can  deficiency  likely,  factors  to  determine  Collection  only  sampling  used  collinearities.  sampled,  of  be  due in  For  example,  below,  consequently  be to an  variable  the  space  is  the  two  with data  is  c o l l i n e a r . This  caused  by  chance,  lack  highly source  extrinsic factors  observational  variable  is  related  influences  two  other  of  available study,  to  by  another,  variables).  (eg.  
data)  or  intrinsic or  one  1.2.3  Model  Often which  Specification  in  applied  manifest  However,  models  themselves  they  polynomial show  that  in  for  the  i f  the case  exist  the data not  consider of a  be  implicit  exact  the f i t t i n g  single  t o any  two  powers,  the differences  improvement  due  to  can  predictor  i n powers centring).  be  constraints  (eg. mixture  t h e c o r r e l a t i o n c o e f f i c i e n t between  corresponding unity,  in  need  [BRADLEY,SRIVASTAVA,793  there  dependencies. of  a  low  variable. the  extremely i s even  models).  order They  variables, close  (even  to  allowing  1.2.4  Overdefined This  when The  i s a special  there  a r e more  linear  dependency  follows. By  Suppose  singular  X  =  case  variables of  than  the  collection,  which  observations  <ie. n < p ) .  collinearity  of X  decompostion  and D diagonal  UDV  of data  t h e rank  value  orthonormal  But  Model  i s r . Now there  such  can  be  occurs  seen  r i. m i n ( n , p )  exist  as  = n.  U orthogonal,  V  that  T  U, V a r e o r t h o g o n a l  so rank(D)  = rank(X)  X  where  :  = U[D  0"|V  ± 1  T  D  l t  = r.  r * r  [o oJ XlVt  V ] a  [Ut  =  U ] [ D n Cf| LO  XV So such  ra  that  any 6  i O  |Xcj = 0  there  exists  a vector  c, namely  tVa-lj  outliers  <ie.  6|cj.  Outliers  observations space,  a s i s shown  linear  x e  fixed, columns  X u  are  example  = 0  be  atypical  can  of  are of unit  i n the predictor argument  predictor  large  X I B = k0 length.  by  be i l l u s t r a t e d  p=3  atypically ,  induced  i n the following  dependency  are  e  can  which  hypothetical and  IV o r t h o n o r m a l ]  i e . |Xc| £  Collinearity  The  oJ  b y V]  = O  for  1.2.5  [postmultiply  ffi  and So  in  from  CGUNST,83].  by c o n s i d e r i n g  variables  magnitude.  X i s  variable  scaled  where X i i  Suppose so  the  k i s  that the  X! 
1 Xoffii  Xie  1  1  ^ iS: iFP:  1  Xei*/Xi  1  X  X  1  X t", j  1  X =  where  c  =  Consider IXc! Now  , j  jr>  x± i *  variables X i  r  0™  +  T i —  c" " =  X  , S :  (0,1,-1).  1  =  Xi,.*  j.  5C v-» in-: "** ^ ^ i!  are the unsealed  predictor  0,  Xt  s :  * 0  t o 0.  such  that  |Xc|  l 6|c|.  Definition  large,  68  + Z  i •-• i  Xit*®  and  ,S:  So  £  -  x  i  e  » / X )  is  other  &  1 0  words  corresponds  to  a:  there  a  a  Xsa** * k 0' . H e n c e  f o r any  In  <Xii*/Xi  i s a  the right value  the o u t l i e r , a  of  0  with  0  collinearity  as i n  1.2.  E f f e c t s of section  intuitive  +  a  e  sufficiently  = k^G  e B :  Now  ( 0 / X i - k0/A )'  tends  Collinearity illustrates,  i n t h e manner  effects of collinearity  111 It  e  a  1  side  1.3.1  k0/X  X a*/Xg  and  f o r large  This  /X  K  i=2(l)n  hand  1.3  0/X*  has  intimately [SILVEY,691  conditioning been  known  related state  using  of  CGUNST,831,  Definition  the  1.2.  of X for a  long  time  to the  conditioning  that  collinearities  that of  collinearity X.  i s  [KENDALL,571,  are associated  with  small  eigenvalues  small  of  singular  precisely,  values  which  S p e c i f i c a l l y , and  which  r  of  provides  c o l l i n e a r i t y ,  centred  X X,  X.  numerically  This,  an  when  alternative  [BELSEY,KUH,WELSCH,80]  suppose  scaled  is  to  again  unit  that  length.  to  quantified  more  d e f i n i t i o n  of  advocate.  the  Then  equivalent  columns by  of  X  singular  are  value  decomposition X  =  UDV  T  where  U  =  lu±  . . . . .  u  ]  !  nxp  V  =  tvi  , . . . 
,  v ]  :  pxp  D  =  d i a g (dt  r:>  p  , ...,d  F 3  )  orthogonal eigenvectors  singular  values  So X Now  =  Z d , u , V j .1 — i  consider Xc  |Xc|  So  =  djUj  =  d,|c|  for  according  1.3.2 The  c  any to  =  6  i n i t i a l  estimates  of of  v  i  form]  f  0  ,  i f  D e f i n i t i o n  Estimate  technique  [bilinear  T  dj  £  6  then  there  [V  orthogonal]  [U,V  orthogonal]  is  a  c o l l i n e a r i t y  1.2.  Effects reason  ridge R  for  Hoerl  regression were  "too  was  and their  long"  Kennard  developing  observation in  the  the  that  the  presence  of  collinearity. norm  of  the  This  observation  ordinary  least  E[B flJ = E C ( B - B + f i ) T  =  B" B  +  r  EItr(B  is  seen  squares  by  estimate  considering  the  B.  ( f i - B + B ) ]  T  -  B> (fl - B ) " ] r  A  So  the  =  B fi  +  t r <Var(B))  =  B fl  +  tr((X X)  =  B B  x  T  +  T  magnitude  a  cr^tr (VD"- V ) a  of  B  in  a  [singular value  T  collinear  relative  to  magnitude  i s inversely proportional  X.  orthogonal  Similarly  covariance the  an  *)o-  T  with  matrix  singular  the  is a  values,  data  system matrix to  variance  direct  decompostion]  is  overestimated  X.  the  Further,  singular  estimates,  function  of  the  the  values since  of the  r e c i p r o c a l s of  ie.  A  =  Var(B) In  tr  VD~ V  particular  components. the  f f i  c  the  Each  one  singular values Var(Bw> = a* Z  T  variance  i s uniquely as  i f  two  explained  there  an  leads  is to  decomposition variance the  of  by  the  approximate the  same  following  K  jth singular  be  associated  have  linear  proportion, B  can  decomposed  into  with  one  just  p of  follows  coefficients  variances  B  -  6  Hence  of  explained value.  high  related  singular  dependency  definition T\J •.«-., by  proportions  the  being  the  dependency  their  value,  then  them.  
This  between of  of  the amount  variance of  the  associated  with  Definition matrix  Wju  1.2  Variance  =  | -  k,j = l ( l ) p  J  V,,  These algorithm  0  U J  proportions  form  developed  to  to the general  Inference  Almost  potentially  For  instance, H.r_  flj  = 0  H„  fij  * 0  test  0  the  K  T 0K,  =  basis  of  the i d e n t i f i c a t i o n  [BELSEY,KUH,WELSCH,801  c o l l i n e a r i t i e s . This case  in section  for  algorithm  the  i s shown  3.2.  Effects  a l l statistics  can  P»  .  W :  by  of multiple  extend  (  = -^r^--  detection  the  P r o p o r t i o n s o f the  X:nxp  where  1.3.3  Decomposition  be  "degraded"  in testing  statistic  associated  i s  the  with  by t h e e f f e c t s hypothesis  linear  regression  of collinearity.  But  recalling  determination  the  relationship  and t h e s i n g u l a r  statistic  can  statistic  i s often  but  c a n be c o m p e n s a t e d  of  this R.  As  X  So  = (1 -  i t i s seen  that  the t  In p r a c t i c e  the t  presence  of  collinearity,  f o r by t h e " o v e r l a r g e " no  explicit  of the  magnitude  statements  t statistic  c a n be  . However  of the t s t a t i s t i c  he  i s  R,»)^5J 2CT  fixed  parameter  B,  and  involving  of  a,  the  stronger  X j , the smaller  the test  statistic  and  the  offending  i s t h e non c e n t r a l i t y consequently  the  lower  power o f t h e t e s t .  As  i s  noted  collinearity not  effect  the  two  i t  a subset  the effects  of predictor  of the  orthogonal.  remaining This  the introduction  variables  of  need  coefficients, i f  i s to  be e x p e c t e d  of orthogonal  variates  by in  procedure.  
effects  predictions  are  with  Predictor  The  CBELSEY,KUH,WELSCH,80],  the estimates  stepwise  1.3.4  by  involving  subsets  comparing  the  notes,  the magnitudes  collinearity  a  i n the  t h e non c e n t r a l i t y p a r a m e t e r  for  the  values,  the c o e f f i c i e n t of  by c o l l i n e a r i t y .  smaller  CGUNST,83]  made a b o u t notes  be e f f e c t e d  between  of collinearity  using  predictor  Effects  " i n sample"  variables  are often and  l i e i n the  "out  widely  different in  o f sample"  subspace  of  data.  If  the o r i g i n a l  data, then r e l a t i v e l y p r e c i s e e s t i m a t i o n i s p o s s i b l e . However "out o f  sample"  predictions  may  be  severely  e f f e c t e d by  collinearity. Consider  for  example  the  ordinary  least  squares  p r e d i c t i o n s g i v e n by the e x p r e s s i o n y = x fi T  Now Var(y)  = tr^Ml/n + x * (X X> ~ x > T  T  1  1  = o-'-Ml/n + X i T T A - ' T ^ ) 1  where  Now  if X i  is  T  a  T = (tj.  t , l e i g e n v e c t o r s of X X  A = (Xi  X)  T  F  pi  linear  eigenvalues of X X  combination  T  of  the e i g e n v e c t o r s  a s s o c i a t e d with the " l a r g e " e i g e n v a l u e s , i e Xi  T  = a T r  T  where a,. =  1  Xt  "large"  0  else  then Var(y) = CT'«d/n + Z ( l / X i ) l E  where the  summation i s  case r e l a t i v e l y  over  " l a r g e " e i g e n v a l u e s . So i n t h i s  precise estimation i s possible.  16 2.0  2.1  Collinaarity  Definition  The  o f a Generalized  generalized  linear  CNELDER,WEDDERBURN,723 amongst probit for  others, models  for  underlying  as  umbrella  with  developed that  normal  responses,  features  -  the  observations  -  the  response  -  to,  t h e response linear  are  encompasses  errors,  and  log  by  l o g i t and  linear  models  above,  terms  this  distribution  develops  the  quasi  distribution be  shown  a  theory o f the  that  as a function of a  likelihood.  
the  generalized  for  m e a n . He  although  no  i t is sufficient results shows log  that  this  the  for  a one is the  weakest  class  one parameter  for  t o Fishers  likelihood  Consequently  c a n be assumed  t h e second  t o the  because;  analagous  likelihood,  on  variance  [WEDDERBURN,743  t o be the  directly  expectation  based  assumed,  and y i e l d s  exponential  :  p r e d i c t o r v a r i a b l e s and e r r o r  i s explicitly  likelihood.  as  of i t s  quasi-likelihood,  classical  are  to, or  additively  the relation  purposes  same  function  o f the  estimation  parameter  i s equal  i s modelled  [WEDDERBURN,743  models  independent  some  combination  feature  o f these  variance  proportional  can  an  models  quantal  Model  model  i s  linear  Linear  L i n e a r Model*  counts. The  of  i n Generalized  sort  o f models  exponential  family. based  This  on  is  2.1  completely  a  an  vector  defined  €.  giving  = exp([y9  n (iii) and  a  components.  independently  vector  u  =  distributed  E ( y ) and  density  of  an y  nxl error follows  a  density  - b(8)]/a<0>  +  c(y,0))  i s called  the canonical  0  i s called  the dispersion  nxl linear  component, and  an  X  being  pxl  parameter  and  parameter. an  nxp  matrix  coefficient  vector  of B,  predictor  = Xfl link  function,  systematic  c a n be  parameters  g(.)  components  n = g ( p )  It  ,  three  8  variables,  an  y  probabilty  systematic  predictor  which i s  L i n e a r Model  the following  exponential  where  a  by  value  The  f(y;9,0)  definition,  [McCULLAGH,NELDER,83].  component,  n x l mean  generalized  (ii)  of  following  + €  random  with  the  Generalized  = g-'(XB)  <i>  to  the notation  Definition y  leads  or  shown  as  p=  that  i n the linear  : IR->-]R, c o n n e c t i n g  the  random  follows g  _ 1  (XB)  •  sufficient predictor  statistics n  = XB  when  exist XB  f o r the  i s equal  to  the  canonical  g(.)  
parameter  i s called  8.  the canonical  Although  i t  purposes  of this  analytic  simplification  t h e s i s they  members  of the generalized  to  density gives  the  corresponding  fourth  9  columns  function  respectively. and  between  (see  the  three  as  density  9  family.  Note  be n o t e d  uniquely  link  2.1  lists  the  while  Table  2.2  a,  The  column  \i a n d  the variance the that  b, c f o r t h e of  t h e second  Table gives  the canonical  fordetails).  and  distributions  parameters first  that the  and  Table  members,  vector  1A  f o r selected  variances  = n, w h i l e  t h e mean  considerable  details  expressions. selected  f o r the  (see Appendix 2A).  means,  2.3 g i v e  function  links,  and t h e m u l t i n o m i a l  mean  It i s to  p i n column  be assumed,  exponential  Appendix  of table of  will  link  link  of the distribution.  distribution.  the canonical  the  to use canonical  their  f o r the  case  distribution  complicated  relationship  parameter  a  are  exponential  2.3 g i v e s  give  because  functions  the  general  2.3  hypergeometric  excluded  functions  link  c a n b e made  2.1  were  this  i s not necessary  Tables  non-central  In  The t h i r d  and  of y  ( i e . v) a s  canonical  parameter  the  relationship  c h a r a c t e r i z e s t h e member.  of v  19  Table 2.1  D e n s i t i e s o f the g e n e r a l exponential family  normal  f(y;u,0)  =  1  f(y;p)  gamma  f<y;u,0>  inverse  •  order*  f(y;u,0)  f(y;u,0)  binomial proportion  f(y;u,n)  negative binomial  f(y;p,k)  J  ^  20  y=o,i,  L p J  [2TT0y J ;a  =  E  *'H'  T(0)  Gaussian  pth  ,-(y-M) , } L  [21T0  poisson  -[%  r — Ti exp {— -~  exp ( H  yiO  y  2p y lB:  \  L 1-p  - —-—1 2-p J  +  h(y,0))  y=0/n,1/n,  ^ y - - i L k+M ;  J L k+P J  1. gamma with 0=1 i s the e x p o n e n t i a l 2 . binomial p r o p o r t i o n with n=l i s l o g i s t i c 3. # denotes the "p s e r i e s "  7  '  '  Table 2 . 
2  Parameters o f the g e n e r a l exponential family  a(0)  normal  0  poisson  1  gamma  0  inverse Gaussian  0  1.  KB™  -54[y' /0 +log(2ir0>] 5:  log(y!)  -log(-9)  logl(0)+  (-28)  01og(0y)  - 1 / ( 2 0 y ) 1  -1  0  binomial 1/n proportion  negative binomial  c<0,y)  -(l/3)log(2TT0y-' )  #  pth order  b(8)  1  (p-2> — [ (1-p) 8D  log(l+e")  r- -ii  h(y,0>  U>--iaJ  log(" > 3  -klog(l-e<»)  # denotes t h e "p s e r i e s "  >">.V  Table  2.3  D e n s i t i e s o f the g e n e r a l exponential  g<H>  P<6>  normal  u  8  poisson  log(y>  e°  gamma  #  -1  inverse Gaussian  family  V(H>  1  \i  -1  —  —  H  —  [--]  H  #  - i  r - i  v(8)  V  <p~ > 1  1  e°  1  Ue]  f  lp^<p-  -1  binomial proportion  negative J? , , binomial  1.  r  H 1  log r — Lk + MJ 1  ke«  r~« 1 - ke°  # d e n o t e s t h e "p s e r i e s "  M + c"/k  r  r  keMl+eMl-k))  —— —-—(l-ke ) :  0  6 5  Further  details  applications  of  are to  be  these found  distributions  in  and  their  [McCULLAGH,NELDER,831  and  CMcCULLAGH,821.  In the  the  ensuing  maximum  results  likelihood  and  Appendix  brief  2A.  technique  development,  As  i s  i s  estimation  details shown  and an a l g o r i t h m  Appendix  2B.  Definition  Collinearity difficult linear  to  case.  in  this  to  a  define  appendix  for  this  than  Intuitively  procedure  -  two o r more  predictor  variables  -  coefficient  estimates  are different  from  those  coefficient  -  data  -  matrices  estimates  collection used  are highly  Linear  are i l lconditioned.  Models  i s  more  i n the standard  t o be p r e s e n t  are highly  i n magnitude  or  variable  of the  when:  correlated  i s deficient  i n the computation  least  given i n  hypothesised  -  estimates  i s  model  i t s counterpart said  are given i n  reweighted  linear  i t i s again  important  Fisher's scoring  i n Generalized  generalized  i s made t o  The  derivation  iteratively  of Collinearity in  reference  procedures.  
of their  equivalent  squares,  2.2  much  coefficient  sign  There  are several  based  on  model.  A second  approach  i s  f o r generalized  Linearisation  in their  linearising y to  defining  linear  when  directions  uses  i t . One  approach  linearisation  suggested  of the Link  [BELSEY,KUH,WELSCH,80] case  to  [BELSEY,KUH,WELSCH,80]  calculations  2.2.1  approaches  models  from  the  of the  iterative  (see Appendix  2A).  Function commenting  f o r future  on t h e non  research  section,  linear suggest  t h e model  = g- (Xfl) 1  + €  the form y  = Xfi + €  Intuitively, plausible an  of  This  with  the standard  collinearity  approximate  following  Theorem  analogy  to define  being X.  by  linear  linearisation  in  the  dependency  i s  expressed  linear general amongst  L i n e a r i s a t i o n of y = g-MXfl)  y  = g—*(Xfi)  y  = XG  + € c a n be e x p r e s s e d  + €  where with  X i = X  VtXi/a(0)  = [ x  i  5C —* C x & V  x ]  F  r  V t  *  = diag(v  ••• ; L  )  f  x vi D  + €  i n t h e form  i ti s  system the  formally  theorem.  2.1  case,  as  columns in  the  24 This error  can  from  estimate  where Now =  =  non  as  linear  follows.  First  procedure  is  the  residual  expanded  due  about  to the  giving +  J ( B) (B  •» 0 ( B )  -  J(B)B  J(B)  i s the  by  letting  0(B)  -  -  +  gradient  B)  +  0 (fl e  -  B)  0  with  J(B)B matrix  of  respect  to  B.  second  r e s u l t appeals  to  linear  model  J(B)B  A  X  =  -J(B)  €  =  -€.  first  theory Xu  By  proved  0(B)  ~  the  a  fi,  0(B)  y  be  of =  equation Appendix -J(B)  definition  follows. 2A  as  follows.  
1 J  of  the  generalised  L f i n i J L0B.J ("jL  The  © H i ]v  [©"ii  this  is  just  the  Now  using  the simplification  Appendix this  2A,  simplifies =  It  definition  to  be  admitted  residual  approach,  i t  i s  point  f o r defining  2.2.2  Itarativaly  as i n  the linear  predictor  n,  does  squares  in  the  proof  above, t h e  an e f f i c i e n t e s t i m a t e  W €.  residuals,  However  >,  suggesting  as  a suitable  i n the general  of the estimates  (x W<  =  (X X>~*X z<*>  +:  a  as  first  starting  case.  c a n be c a s t  as i n Appendix  in a  weighted  2a i e .  > X) - X W <* > z <* > 4  r  be  in  gives  collinearity  _  can  not  useful  formulation r  that  Rawaightad Laast Squaraa Approach  computation  fl  This  link,  to  due t o t h e w e i g h t e d  least  of  canonical  ±  i s  The  the  vj x a(0>  unweighted that  and t h e  of  T  x  viewed  as  ordinary  least  squares  again  be  with  the  transf ormation X =  WX  z  2  =  applied  to the  define  collinearity  approximate using  data.  linear  W z 2  So  i t  in  the  dependency  the definition  generalized amongst  of the weight  v 1 3^(0) gives  would  reasonable  linear  t h e columns  as i n Appendix  model  a s an  o f W X. 2A,  to  i e .  Now  26 -J  V i  a(0)  2.2.3  Choice  The  link  squares  X  produce  similar  * ' aT0T=  discussed  the  However  collinearity  regression  is  Definition  2.2  For c  = g  - 1  <XB)  some  * O  |Xc|  the  easily  approach. 2.5  same  of  from  This  on  reasons  in  collinearity the  iteratively  be  illustrated  will  sources  and  linear the  approach.  effects model.  case  Hence,  of the  of Also  logistic following  adopted.  
Definition 2.2 Collinearity in a Generalized Linear Model

Collinearity is said to exist in the generalized linear model y = g-1(Xβ) + ε when there exists a p×1 vector c ≠ 0 such that

  |X̃c| ≤ δ|c|

for some suitably chosen "small" δ, where X̃ = W½X.

2.3 Relationship to the Standard Linear Model Collinearity Definitions

Since x̃ᵢⱼ is a function of the canonical parameter θᵢ (= xᵢβ for the canonical link), substituting the direct expressions for the variance function of each member of the general exponential family gives Table 2.4, which lists the ijth element of X̃ for the members of the family.

It is to be noted that collinearity in X carries over informally to collinearity in the generalized model for all the members of the family except the so called p series members beyond the standard linear model (p=1), ie. except the binomial proportion (p=2) and the pth order. Hence for the other members (ie. Poisson, gamma, inverse Gaussian and negative binomial), if there is collinearity in X then |X̃c| is "small" for the same c, and collinearity exists in the generalized model.

Table 2.4 Linearised Predictor Elements x̃ᵢⱼ for the family

  general exponential family    b''(xᵢβ) xᵢⱼ
  normal                        xᵢⱼ
  Poisson                       exp(xᵢβ) xᵢⱼ
  gamma                         xᵢⱼ/(xᵢβ)²
  inverse Gaussian              xᵢⱼ/(-2xᵢβ)^(3/2)
  binomial proportion           [exp(xᵢβ)/(1 + exp(xᵢβ))²] xᵢⱼ
  negative binomial             [k exp(xᵢβ)/(1 - exp(xᵢβ))²] xᵢⱼ
  pth order                     b''(xᵢβ) xᵢⱼ, with b from the p series
However it would be incorrect to define the collinearity of the X̃ system as collinearity in X. This is because the premultiplication by W½ may remove the collinearity, in particular for the p series case. For example, with two predictors write X = [x₁ x₂] and X̃ = W½X = [x̃₁ x̃₂], so that for any c

  Xc = c₁x₁ + c₂x₂  and  X̃c = c₁x̃₁ + c₂x̃₂

Now X may be collinear (eg. x₁ ≈ x₂, so that with c₁ = 1, c₂ = -1, |Xc| ≤ δ for small δ), but for sufficiently varying weights |X̃c| > δ.

2.4 Sources of Collinearity in a Generalized Linear Model

In many practical applications the scaling matrix W½ will be approximately a constant scalar multiple of the identity matrix, so that the X and X̃ systems have approximately the same collinear structure. A condition for this is given in the following theorem.

Theorem 2.2 Condition for a Constant Scaling Matrix

When the canonical link is used and the dispersion parameter is constant, W½ approximately equals a constant multiple of the identity matrix if and only if the linear predictors xᵢβ are approximately equal.

This is proved as follows. Since the canonical link is used, each weight wᵢ depends on the row xᵢ only through the linear combination θᵢ = xᵢβ, via the correspondence between v and θ listed in Table 2.3. Hence if the xᵢβ are approximately equal, so are the wᵢ, and W½ is approximately a constant multiple of the identity matrix. For the converse, consider for example the binomial proportions case, where vᵢ = exp(θᵢ)/(1 + exp(θᵢ))², and suppose wᵢ = C.
Solving for the mean gives

  μ = [1 ± (1 - 4C)½]/2 = constant

and so

  θ = log(μ/(1 - μ)) = constant

Hence if wᵢ is constant then so is θᵢ (= xᵢβ).

So in these cases, as in the general linear case, the same sources of collinearity will arise in the data, namely:

  - large pairwise correlations
  - data collection
  - overdefined model
  - model specification
  - outliers

However in the general case these notions are theoretically captured in terms of the scaled data matrix X̃. For example, large pairwise correlations of the scaled predictors x̃ᵢ and x̃ⱼ give rise to collinearity as in Definition 2.2. So, with the appropriate substitution, it is better to think of the sources in terms of X̃.

2.5 Effects of Collinearity in a Generalized Linear Model

2.5.1 Estimation Effects

In an analogous manner to the standard linear case, both the estimates and their variances are affected by collinearity. The estimates may be expressed iteratively as (see Appendix 2A)

  β̂⁽ᵗ⁺¹⁾ = β̂⁽ᵗ⁾ + (XᵀW⁽ᵗ⁾X)⁻¹Xᵀ(y - μ⁽ᵗ⁾)

Omitting superscripts for clarity, XᵀWX can be decomposed as follows:

  (XᵀWX)⁻¹ = (X̃ᵀX̃)⁻¹        [X̃ = W½X]
           = (VDUᵀUDVᵀ)⁻¹    [singular value decomposition X̃ = UDVᵀ]
           = (VD²Vᵀ)⁻¹
           = VD⁻²Vᵀ          [V orthonormal]

So collinearity (ie. small singular values of X̃) at any iteration, including the final one, is seen to directly affect the parameter estimates through the small singular values. Hence collinearity in the generalized linear model can develop throughout the iterative estimation procedure, and this suggests that it may be worthwhile for any estimation procedure to adjust for collinearity at each iteration.
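The decomposition above can be checked numerically: a small singular value d enters (XᵀWX)⁻¹ as 1/d², so near-collinear scaled columns inflate the inverse directly. A minimal sketch, with an illustrative near-collinear X̃ (the weights taken as already absorbed):

```python
import numpy as np

# Verify (X~^T X~)^{-1} = V D^{-2} V^T, so a small singular value d_j
# contributes 1/d_j^2 to the inverse used in the estimation.
# Illustrative near-collinear scaled matrix: column 2 ~ column 1.
X_tilde = np.array([[1.0, 1.001],
                    [1.0, 0.999],
                    [1.0, 1.002]])

U, d, Vt = np.linalg.svd(X_tilde, full_matrices=False)
inv_direct = np.linalg.inv(X_tilde.T @ X_tilde)
inv_svd = Vt.T @ np.diag(d ** -2) @ Vt

assert np.allclose(inv_direct, inv_svd)
print(d)            # the second singular value is tiny
print(inv_direct)   # and the entries of the inverse are correspondingly huge
```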
The variance estimates at convergence can be approximated by the asymptotic result

  Var(β̂) = (XᵀWX)⁻¹ a(φ)

So again the collinearity directly affects the estimation, by inflating the variance estimates according to the small singular values of X̃ (since XᵀWX is positive definite and hence has positive singular values).

2.5.2 Inference and Predictor Effects

In a similar manner to the standard linear case, the equivalent test statistics are affected by collinearity, although the effect is not so direct. For example the analogue of the t statistic is the deviance difference, where the deviance is

  D(y; μ̂) = 2 Σᵢ wᵢ[yᵢ(θ̃ᵢ - θ̂ᵢ) - b(θ̃ᵢ) + b(θ̂ᵢ)]

with θ̂ᵢ = θ(μ̂ᵢ) and θ̃ᵢ = θ(yᵢ). This is a complex form of β̂ and W, both of which are affected by collinearity. Similarly with the generalized Pearson statistic

  X² = Σᵢ (yᵢ - μ̂ᵢ)²/v(μ̂ᵢ)

which is used to calculate the dispersion parameter.

2.6 Appendix 2A Maximum Likelihood and the Generalized Linear Model

The following derivation of the standard iterative maximum likelihood estimation results is taken from [McCULLAGH,NELDER,83].

Since the observations are independent, initially consider the contribution lᵢ of the ith observation to the log likelihood, in terms of the canonical form of the general exponential family with canonical parameter θᵢ and dispersion parameter φ:

  lᵢ = [yᵢθᵢ - b(θᵢ)]/a(φ) + c(yᵢ, φ)

Applying Fisher's scoring technique (see [KENDALL,STUART,67.2] for details) gives the following results for the iterative maximum likelihood estimates. Now

  ∂lᵢ/∂θᵢ = [yᵢ - b'(θᵢ)]/a(φ)

Hence the mean vector μ can be represented in terms of the canonical parameter θ as

  E(yᵢ) = μᵢ = b'(θᵢ)
by using the standard result E[∂l/∂θ] = 0.

The variance of yᵢ can be similarly represented in terms of the canonical parameter θ as

  Var(yᵢ) = vᵢ = b''(θᵢ) a(φ)

by using the standard result

  E[∂²l/∂θ²] + E[(∂l/∂θ)²] = 0

The maximum likelihood equations in terms of μ and v are obtained in the following steps. The total log likelihood l is just the sum of the individual independent contributions, so the score vector s, which is the first derivative of the total log likelihood, can be expressed component by component as

  sᵣ = ∂l/∂βᵣ = Σᵢ ∂lᵢ/∂βᵣ,  r = 1, ..., p

For the linear predictor ηᵢ = xᵢβ the chain rule

  ∂lᵢ/∂βᵣ = [∂lᵢ/∂θᵢ][∂θᵢ/∂ηᵢ][∂ηᵢ/∂βᵣ]

is used repeatedly. Since ηᵢ = θᵢ for canonical links, the score contribution simplifies to

  ∂lᵢ/∂βᵣ = [(yᵢ - μᵢ)/a(φ)] xᵢᵣ

Hence the maximum likelihood equations in component form are

  sᵣ = Σᵢ xᵢᵣ(yᵢ - μᵢ)/a(φ) = 0,  r = 1, ..., p

or in matrix form

  s = Xᵀ(y - μ)/a(φ) = 0

Now the expected information matrix, which was defined by Fisher as minus the expected value of the second derivative of the log likelihood, is derived in the following steps. The rsth term is

  -E[∂²l/∂βᵣ∂βₛ]
Expanding by the chain rule, the term involving the second derivative of μ has zero expectation since E(yᵢ - μᵢ) = 0, and for canonical links the remaining term is

  -E[∂²lᵢ/∂βᵣ∂βₛ] = wᵢ xᵢᵣ xᵢₛ/a(φ)

where the weight matrix is defined as

  W = diag(w₁, ..., wₙ),  wᵢ = vᵢ/a(φ) for canonical links

So summing the contributions to get the total information,

  Iᵣₛ = Σᵢ wᵢ xᵢᵣ xᵢₛ/a(φ),  or in matrix form  I = XᵀWX/a(φ)

Fisher's scoring scheme for the maximum likelihood estimates of β is then given by

  β̂⁽ᵗ⁺¹⁾ = β̂⁽ᵗ⁾ + I⁻¹s

But for the canonical link the expected information matrix is the same as the observed information (minus the Hessian H of the log likelihood), so the scoring scheme is equivalent to the Newton-Raphson method (eg. [BARD,74]), ie.

  β̂⁽ᵗ⁺¹⁾ = β̂⁽ᵗ⁾ - H⁻¹s = β̂⁽ᵗ⁾ + (XᵀWX)⁻¹Xᵀ(y - μ⁽ᵗ⁾)

A further simplification expresses this as a weighted least squares computation. In component form

  -(Hβ̂)ᵣ = Σₖ Σⱼ wₖ xₖᵣ xₖⱼ β̂ⱼ/a(φ) = Σₖ wₖ xₖᵣ η̂ₖ/a(φ)

so combining this with the score gives

  β̂⁽ᵗ⁺¹⁾ = (XᵀW⁽ᵗ⁾X)⁻¹XᵀW⁽ᵗ⁾z⁽ᵗ⁾

which is just the form of weighted least squares with pseudo (working) dependent variable

  zᵢ = ηᵢ + (yᵢ - μᵢ)/wᵢ

As the iteration converges, the final estimate can be represented generally as

  β̂ = (XᵀWX)⁻¹XᵀWz

2.7 Appendix 2B Iteratively Reweighted Least Squares Algorithm

The following algorithm for calculating β̂ is taken from [McCULLAGH,NELDER,83].
Algorithm 2.1 Iteratively Reweighted Least Squares

STEP 0  Obtain initial estimates from the data

  μ̂⁽⁰⁾ = y,  η̂⁽⁰⁾ = g(μ̂⁽⁰⁾),  z⁽⁰⁾ = η̂⁽⁰⁾

STEP 1  REPEAT t = 0(1) UNTIL convergence

  [i]  Set up the working dependent variable

         z⁽ᵗ⁾ = η̂⁽ᵗ⁾ + (y - μ̂⁽ᵗ⁾)[∂η/∂μ]⁽ᵗ⁾

       and the weight matrix

         W⁽ᵗ⁾ = diag(w₁⁽ᵗ⁾, ..., wₙ⁽ᵗ⁾)

  [ii] Regress z⁽ᵗ⁾ on x₁, ..., xₚ with weight W⁽ᵗ⁾ to get

         β̂⁽ᵗ⁺¹⁾ = (XᵀW⁽ᵗ⁾X)⁻¹XᵀW⁽ᵗ⁾z⁽ᵗ⁾
         η̂⁽ᵗ⁺¹⁾ = Xβ̂⁽ᵗ⁺¹⁾,  μ̂⁽ᵗ⁺¹⁾ = g⁻¹(η̂⁽ᵗ⁺¹⁾)

END REPEAT

3. Diagnostics for Collinearity in Generalized Linear Models

3.1 Desirable Properties of Diagnostic Measures

A diagnostic should not try to measure collinearity from its possible effects, such as incorrect hypothesised coefficient signs or sensitivity to row deletion, because these effects are not necessary conditions for collinearity; this is illustrated by [MULLET,76] for the sign case. Rather, diagnostic methods should be based directly on the definition of collinearity as an approximate linear dependency amongst the columns of the predictor variables, ie.

  |X̃c| ≤ δ|c|

This gives two desirable properties of a diagnostic. First, it should be capable of identifying the linear combinations of offending predictor variables. Second, it should be able to quantify the effects of the collinearity.
However the quantitative aspect of the diagnostic measure must to some extent be subjective, since there are no distributional assumptions about the predictor matrix X̃.

3.2 Measures of Collinearity

Now while collinearity has its consequences primarily in the degradation of estimation, and of any subsequent hypothesis testing or prediction, the measure itself should be based directly on the data, as in the standard linear case, so that the general setting embeds the standard one exactly through X̃. However there is no easy way of finding the combinations c ≠ 0 such that |X̃c| ≤ δ|c|, so Definition 2.2 as it stands is an impractical measure. In a manner similar to section 1.2, the problem can instead be characterised by the correspondence between collinearity and the small singular values, with their associated instability of the estimates. This is expressed in the following theorem.

Theorem 3.1 Equivalence of Collinearity and Small Singular Values for the Standard Linear Model

For the standard linear model y = Xβ + ε, collinearity as in Definition 1.2 (there exists a p×1 vector c ≠ 0 with |Xc| ≤ δ|c| for some suitably chosen δ) exists if and only if the smallest singular value satisfies dₘᵢₙ ≤ δ.

This is proved as follows. The fact that small singular values give rise to collinearity was established in section 1.3.1. So it remains to show that collinearity implies a small singular value. Suppose that collinearity according to Definition 1.2 is present and, without loss of generality (this is equivalent to a scaling), take c of unit norm, so that |Xc| ≤ δ. Consider

  d²ₘᵢₙ = min over uᵀu = 1 of uᵀXᵀXu   [extrema of ratio of quadratic forms]
        ≤ cᵀXᵀXc = |Xc|² ≤ δ²

Hence dₘᵢₙ ≤ δ.
Although established in the case of one collinearity, the above theorem can be extended to k multiple collinearities as follows.

Corollary 3.1 Equivalence of Multiple Collinearities and Small Singular Values for the Standard Linear Model

For y = Xβ + ε, k collinearities are present in the standard linear model if and only if the k smallest singular values are "small". The "if" argument follows from section 1.3.1, while the "only if" argument follows from the above theorem applied to the k linearly independent vectors c₁, ..., cₖ of the definition.

Since the definition of collinearity in the general setting y = g-1(Xβ) + ε is given directly by the linear transformation X̃ = W½X, the above theorem carries over to the general setting, ie. collinearity is equivalent to "small" singular values of X̃. Hence the analogue of the measure given by the singular values has the desirable properties discussed in the earlier section.

First, the identification of the combinations of offending predictor variables comes directly through the components of the singular vectors corresponding to the "small" singular values, as with the principal components approach of [BELSEY,KUH,WELSCH,80]. It would be natural to relate the offending variables directly to the original predictor variables xⱼ; however this is in general not possible, because the "small" singular values of X are not in general those of the pseudo variables x̃ⱼ.
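Theorem 3.1 and the identification property can be illustrated with a small sketch (hypothetical data, not from the thesis): the right singular vector belonging to the smallest singular value supplies the offending combination c.

```python
import numpy as np

# Hypothetical near-collinear design: x3 ~ x1 + x2, so the combination
# c ~ (1, 1, -1) should be flagged.
rng = np.random.default_rng(0)
x1 = rng.normal(size=20)
x2 = rng.normal(size=20)
x3 = x1 + x2 + rng.normal(scale=1e-3, size=20)
X = np.column_stack([x1, x2, x3])

U, d, Vt = np.linalg.svd(X, full_matrices=False)
c = Vt[-1]                      # right singular vector of d_min, |c| = 1

# |Xc| equals the smallest singular value, as in the proof of Theorem 3.1.
assert np.isclose(np.linalg.norm(X @ c), d[-1])
print(d)   # one singular value is much smaller than the others
print(c)   # c is approximately proportional to (1, 1, -1)
```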
Nevertheless in many practical situations the scaling matrix W is an approximate scalar multiple of the identity matrix and has little effect on the collinear structure.

Second, the magnitudes of the collinearities can be measured by the corresponding singular values. To do this a collinearity index is defined as follows.

Definition 3.2 Collinearity Index for the Generalized Linear Model

For y = g-1(Xβ) + ε the collinearity indices are

  κⱼ = dₘₐₓ/dⱼ,  j = 1, ..., p

where (d₁, ..., dₚ) are the singular values of X̃ = W½X.

While the index is subjective, there is a large body of experience, such as referred to in [MONTGOMERY,PECK,82], suggesting that a collinearity is present if the corresponding index is greater than 10.0.

Finally, the effects of the collinearity are clearly manifested in the estimation, inference and prediction calculations, as discussed in section 2.5.

3.3 Dependency Relationships in Collinear Systems

From the definition it is seen that the collinearity measure for a generalised linear model depends specifically upon the distribution chosen for the response y (eg. binomial proportion, Poisson, etc). It is desirable to be able to quantify this indeterminancy in the collinearity. The following theorem, conjectured by the author and proved in part by [MOYLS,85], comes part way in answering this.

Theorem 3.1a Collinearity Index Relationships

  (wₘᵢₙ/wₘₐₓ)½ κ ≤ κ̃ ≤ (wₘₐₓ/wₘᵢₙ)½ κ

where κ, κ̃ are the collinearity indices of X and X̃ = W½X, and wₘᵢₙ, wₘₐₓ are the minimal and maximal diagonal elements of the weight matrix W.

This is proved as follows. Let (λₘ, tₘ) and (λᴹ, tᴹ) be the minimal and maximal eigenpairs of XᵀX, let λ̃ₘ, λ̃ᴹ be the corresponding extreme eigenvalues of XᵀWX, and set

  z = Xtᴹ,  so that zᵀz = tᴹᵀXᵀXtᴹ = λᴹ   [tᴹ orthonormal]

Consider uᵀXᵀWXu.
It can be thought of as a ratio of quadratic forms subject to the restriction uᵀu = 1, so that

  λ̃ᴹ = max over uᵀu = 1 of uᵀXᵀWXu ≥ tᴹᵀXᵀWXtᴹ = zᵀWz   [extrema of ratio of quadratic forms]

Now

  zᵀWz = Σᵢ wᵢzᵢ² ≥ wₘᵢₙ Σᵢ zᵢ² = wₘᵢₙ λᴹ

and since Σᵢ wᵢzᵢ² ≤ wₘₐₓ Σᵢ zᵢ² for any z,

  λ̃ᴹ ≤ wₘₐₓ λᴹ

Thus

  wₘᵢₙ λᴹ ≤ λ̃ᴹ ≤ wₘₐₓ λᴹ

Similarly

  wₘᵢₙ λₘ ≤ λ̃ₘ ≤ wₘₐₓ λₘ

Dividing these bounds appropriately, and taking square roots (the singular values being the square roots of the eigenvalues of the covariance matrices, which are positive semi definite), yields the result.

So this theorem says that, given the collinearity analysis of a system with one scaling, the collinearity of the system with another scaling is bounded by ratios of the extreme weights. This is a useful result for two reasons.

First, given the collinearity analysis of the standard linear system, bounds for the collinearity of the general system can be calculated from the weights without doing another eigenanalysis.

Second, if the scaling matrix W is nearly a scalar multiple of the identity matrix, as is often the case in practical applications, then the collinear structure of the transformed predictor matrix X̃ is approximately that of the original predictor matrix X. Since this bears on estimation, the fact is of use in the following chapter.

By considering the expressions for v(θ) in the fourth column of Table 2.3, it is possible to compare the magnitudes of the bounds for the members of the family. In particular for the Poisson

  [minᵢ exp(xᵢβ) / maxᵢ exp(xᵢβ)]½ κ ≤ κ̃ ≤ [maxᵢ exp(xᵢβ) / minᵢ exp(xᵢβ)]½ κ

while for the gamma the bounds are
  [minᵢ (xᵢβ)⁻² / maxᵢ (xᵢβ)⁻²]½ κ ≤ κ̃ ≤ [maxᵢ (xᵢβ)⁻² / minᵢ (xᵢβ)⁻²]½ κ

It is seen immediately from the lower bound that a collinearity problem in the standard linear model becomes considerably more of a problem when a gamma model is used. This is another example of the problems caused by the so called p series.

4. Estimation for Generalized Linear Models in the Presence of Collinearity

Having detected collinearity by the methods of the previous chapter, the problem of estimation in its presence remains. The precise solution to this depends on the causes of the collinearity. However, as was shown in section 1.3.4, it is possible for certain linear combinations of the coefficients to be estimated well, so that no remedial treatment may be necessary. A cautionary word with respect to estimation comes from [ALLEN,77] p96, as an example:

  "We should recognise the existence of situations where estimation by any method is warranted"

Before proceeding to examine possible solutions in detail, an overview of the various sources and effects of collinearity is given from a pragmatic viewpoint, treating in turn the sources discussed in chapter one.

4.1 Remedies for Collinear Sources

In the following sections, where the effect of the scaling matrix W is negligible, progress can often be made by treating the model as a standard linear one. This is noted except where stated otherwise.

4.1.1 Large Pairwise Correlations in X̃

While this could be alleviated by simple variable deletion, it is unlikely that deletion alone is sufficient. If sufficient prior information
exists, it may be possible to remedy this by a Bayesian approach, ie. to formulate a prior distribution or likelihood. This is particularly so when the pattern of correlation is due to extrinsic factors.

4.1.2 Data Collection

If the dependency is suspected to be due to the data collection, then sampling according to a proper sampling plan is the obvious solution. However, this may often be infeasible due to cost, lack of available data, or inconsistency of the extra data with the previous data. In contrast, if the dependency is intrinsic to the underlying data mechanism, then a Bayesian approach may be used to incorporate the information on the dependency.

4.1.3 Model Specification

A common source of collinearity is the specification of redundant predictor variables, and the solution is clearly to eliminate the redundant variables. In the case of exact dependencies (eg. one predictor variable is a linear combination of the levels of a categorical variable) there is good justification for eliminating variables, since the specification is the source of the collinearity. However for approximate dependencies, as [GUNST,83] notes, one must be
"  4.1.5  Outliers  [PREGIBON,81] model the  may  that  be d e t e c t e d  outliers  by e x a m i n i n g  generalized projection M = I -  So  shows  using  case,  generalized  the diagonal  linear  elements o f  matrix  W" X(X WX>- X' W 1  T  similar  the  ina  1  T  criteria  offending  to  data  those  in  the  c a n be d r o p p e d ,  standard  linear  so a l l e v i a t i n g the  collinearity.  4.2  Remedies The  major  conditioning  for Collinear observable of  the  Effects  effect X  of  matrix,  collinearity  a n d i n moat  i s  the i l l  practical  cases  there than In  may  complex  this  obvious  a  tradeoff  ( i e .ridge  Estimation  4.3.1  Ridge  developed  tHOERL.,62].  <X X  +  r  proposed  handle  lying  be  variables. and  reduced  worthwhile.  The  stemmed  the  along  from  estimation  instability  the ridge  simplest  of a  estimator  of  response  canonical i s the  biased  shows  in  reduced  the  simple  There  against  paper  e x i s t s a k<, s u c h :  i s  estimator, mean  developed  less i e . an  for  choosing  been  t h e mean that  Numerous  of  been  the  i s as  a  error  ordinary results  empirical  methods  variants of  subsequently  p75  That  of bias  criticism  regression  square  k, a n d s e v e r a l  much  [SMITH,CAMPBELL,80]  t h e use o f r i d g e  that  introduction  error.  have  CHOERL,KENNARD,703.  than  square  estimator  has  estimation.  r  estimator  squares  been  J  there  least  have  kI)- X y  i n the pioneering  the ridge  a  may  bias  other  estimator  ftR =  of  the predictor  increased  analysis  to  surface  paper  estimation)  ridge  coefficients  as  between  collinearity  Estimation  surface  ridge  f o r the  Methods  Historically methods  reasons  interrelationships in  case  variance  4.3  be no  proposed.  of  spirited  ridge attack  "mechanical data p a r t i c u l a r phenomena coefficients" They  argue  that  manipulation that being modeled and  ridge  pseudoinformation", approach.  
However  while  prior  in  the  terms  of  essential  and  this  estimators  and the  instead  critics  information a  prior  ridge  imprecise  are  this  based  i s too  regression  a  paper  point to  i t  is  be  "adhoc  as  Bayesian out  that,  formulated nevertheless  approximately  Furthermore,  on  semi  weak  distribution;  information.  i n s e n s i t i v e to the information about  advocate  of  held  is to  incorporates  [GUNST,80]  plOO  notes "ridge regression has been successfully applied too f r e q u e n t l y , when little prior information is available, for i t s use t o be restricted only to data for which formal B a y e s i a n p r i o r s a r e known" In  Standard  Linear  their  derivation  CHOERL,KENNARD,70]  Case  reason  of  as  a  ridge  estimator  follows  "the worse the c o n d i t i o n i n g of X X, the more R can be expected to be t o o l o n g . On the other hand, the worse the c o n d i t i o n i n g , t h e f u r t h e r one can move from R without an appreciable increase i n t h e r e s i d u a l sum o f s q u a r e s . In view of E[B 6] = B B + o t r ( X X ) - i t seems r e a s o n a b l e t h a t i f one moves away from the minimum sum of squares point, the m o v e m e n t s h o u l d be i n a direction which will shorten the length of the r e g r e s s i o n vector." T  T  So  T  the  e  ridge  due  to  that  the  ordinary  d* 6  1  estimator  squares of  r  error  is  the  i s constrained least  squares  =  S S E ( f i ) - SSECrW.;.)  =  ( y - y >  s h o r t e s t fi w h o s e to  be  within  estimator  B u.a 0  R  A  A T  ( y - y ) - ( y  A  -  ycii....ia >  A 1  <y  -  y«.i  !>>  d'~ ie.  sum  of  say  of  A  (y =  - XB«) (y  - X1W  T  -  ( B R - Ba,_a> X X<BH T  2(fl  T  Thus  <B  -  T  B  the ridge minimise subject  Now  =  which B  R  fi B  to  R  gives =  - B o i ,-,) X X ( B T  equations]  of  -  B  - B „ > < d'  r  expression  ridge  extends  R  with  (:;  multiplier  fia,.. 
,) X X(fl« T  s  ra  1/k i s  - fi „_„>  T  E  r  o f t h e form  estimator.  Linear to the  definition  the usual  [normal  fioLs)  kI)-*Bot.a  General  following  )  R  has s o l u t i o n r  This  T  i s the solution  (fi„  (X X +  simple  r  + l/k(B  T  R  of  f l  the Lagrangian L  the  T R  D L B  BOLB)  ) " X X ( BR -  trace B  Xfl  A  - BOL  R  -  T  m  S  A  =  (y - X B o L . > ( y  - B O L . ) «•  T  - BoL ) X (y  B  A  Case general  of a ridge  iteratively  setting  as follows  estimator  rewighted  Bp, d e f i n e d  least  squares  with  the  i n terms solution  A fli ni...ca •  Definition model  y  4.1  = g  _ 1  This  (XB)  B  m i n i m i s e  S u b j e c t  Ridge  to  has  + €  T R  (GR  B  Estimation i s given  b y  f o r the generalized t h e  R  —  s o l u t i o n  GIRL.)=i) X W X ( B R r  T  -  B  s o l u t i o n  linear  56  A  A  ft R The  =  (X WX  +  T  proof  of this  kI)- X WXB 1  e  T  follows  by  I R L  extending  CSCHAEFER,79]  tothe  A  general  settiong  reweighted X  = W^X  squares  Brm.™  estimator,  that  i s  with  i tminimises  the  iteratively  transformation t h e weighted  sums o f  due t o e r r o r  WSSE  the  follows.  
in the sense that

  WSSE(β) = (y - g-1(Xβ))ᵀW⁻¹(y - g-1(Xβ))

Again the sum of squares due to error of the ridge estimator is constrained to be within say d² of that of β̂ᵢᵣₗₛ. Hence

  d² = WSSE(β̂ᵣ) - WSSE(β̂ᵢᵣₗₛ)
     = (y - μ(β̂ᵣ))ᵀW⁻¹(y - μ(β̂ᵣ)) - (y - μ(β̂ᵢᵣₗₛ))ᵀW⁻¹(y - μ(β̂ᵢᵣₗₛ))

where μ(β) = g-1(Xβ). Now the vector μ(β̂ᵣ) can be approximated by a first order Taylor series about β̂ᵢᵣₗₛ,

  μ(β̂ᵣ) ≈ μ(β̂ᵢᵣₗₛ) + WX(β̂ᵣ - β̂ᵢᵣₗₛ)

since for the canonical link the gradient matrix of μ is WX. Substituting this into the expression for d², and using the maximum likelihood equations Xᵀ(y - μ(β̂ᵢᵣₗₛ)) = 0 of the generalized linear model so that the cross terms vanish, it further simplifies to

  d² = (β̂ᵣ - β̂ᵢᵣₗₛ)ᵀXᵀWX(β̂ᵣ - β̂ᵢᵣₗₛ)

So following the [HOERL,KENNARD,70] minimisation above, using a Lagrange multiplier in an analogous manner gives the result.

The [HOERL,KENNARD,70] paper also gives a justification for the ordinary least squares case in an existence-construction theorem: it shows that there is a non zero value of k such that the mean square error of the ridge estimator is less than that of the ordinary least squares estimator. This result extends to the general setting, as is shown in the following theorem.

Theorem 4.2 Existence of k₀ for the Generalized Linear Model y = g-1(Xβ) + ε
If

  (i)   the xᵢⱼ are uniformly bounded
  (ii)  (XᵀVX)⁻¹ is of order O(1/n), with V = diag(v₁, ..., vₙ)
  (iii) v(μ) is bounded
  (iv)  the 3rd and 4th central moments of y are bounded

then for n sufficiently large there exists k₀ > 0 such that

  MSE(β̂ᵣ(k₀)) < MSE(β̂ᵢᵣₗₛ)

This theorem was originally proved by [SCHAEFER,79] for the case of logistic regression, where condition (iii) may be omitted since there v = μ(1 - μ) is automatically bounded. The extension of the proof to the general case above is straightforward, if tedious, once condition (iii) is included and the analogue between the 1st and 2nd moments in the logistic and general settings is recognised; the algebra is long and is not given here.

As with the standard linear case, the theorem is of limited value for the choice of k, since the construction argument is solely an existence one: no bounds for an optimally achievable k₀ have been found as yet (see also the discussion in [GUNST,83]). While it is appealing to deduce that k can depend upon the degree of collinearity present, ie. upon the small singular values of X̃, theoretically the theorem gives no guidance on the choice of k in the general setting. So one is led back to the empirical candidates for k produced for the standard linear case, several of which are suggested by analogy. [MONTGOMERY,PECK,82] p340, reviewing the various rules, state that
  Our own preference in practice is for ordinary ridge regression with k selected by inspection of the ridge trace ... It is also occasionally useful to find the "optimum" value of k suggested by Hoerl, Kennard and Baldwin [1975] and the iteratively estimated "optimum" of Hoerl and Kennard [1976] and compare the resulting models obtained via the ridge trace."

Hence the following selection rules for k in ridge estimation, extended in accordance with Definition 4.1B to the general linear model y = g⁻¹(Xβ) + ε.

Rule I (ridge trace) : k is that value which causes the estimates β_R(k) to stabilise, ie. the value found by inspection of the ridge trace of β_R(k) plotted against k.

Rule II [HOERL,KENNARD,BALDWIN,75] :

  k = p φ̂ / (β_IRLS' β_IRLS)

where φ̂ is the dispersion parameter estimate of the generalized linear model.

Rule III [HOERL,KENNARD,76] :

  Algorithm
    k(0) from Rule II
    REPEAT i = 0(1)...
      β_R(k(i)) = (X'WX + k(i) I)⁻¹ X'Wy
      k(i+1)    = p φ̂ / (β_R(k(i))' β_R(k(i)))
    UNTIL convergence
    END REPEAT

There is a legion of other methods to choose from, including the richer method of generalized ridge estimation, but the above three serve to illustrate the general idea.

4.3.2 Bayesian Estimation

Standard Linear Case

[LINDLEY,SMITH,72] show that if y has the multivariate normal distribution N(Xβ, σ²I) and β has the exchangeable multivariate normal prior N(0, σ_β² I), then the posterior distribution of β is multivariate normal with mean

  β = [X'X + kI]⁻¹ X'y    where k = σ²/σ_β²

So by pivoting an exchangeable normal prior around the origin a standardised ridge type estimator is obtained. Unfortunately, strictly applied, this result requires that β come from a prior distribution, a concept which, as [NELDER,77] points out in the discussion of the [LINDLEY,SMITH,72] paper, is unpalatable in much applied work.
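Rule II and the Rule III iteration above can be sketched concretely. The following is a hedged illustration, not the thesis code: it assumes a Poisson generalized linear model with canonical log link (so φ = 1), and the data, function names and convergence test are invented for the example.

```python
import numpy as np

def irls(X, y, n_iter=50):
    """Unpenalised iteratively reweighted least squares for a log-link Poisson.
    Returns beta_IRLS plus the weights w and working response z at the fit."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)              # inverse log link
        w = mu                        # GLM weights: Var(y_i) = mu_i, phi = 1
        z = eta + (y - mu) / mu       # working response
        beta = np.linalg.solve((X.T * w) @ X, (X.T * w) @ z)
    return beta, w, z

def rule_iii_ridge(X, y, phi=1.0, tol=1e-10, max_iter=100):
    """Rule III: iterate k(i+1) = p*phi / beta_R(k(i))'beta_R(k(i)),
    starting from the Rule II value k(0) = p*phi / beta_IRLS'beta_IRLS."""
    p = X.shape[1]
    beta, w, z = irls(X, y)
    XtWX = (X.T * w) @ X
    XtWz = (X.T * w) @ z
    k = p * phi / (beta @ beta)       # Rule II starting value
    for _ in range(max_iter):
        beta_r = np.linalg.solve(XtWX + k * np.eye(p), XtWz)
        k_new = p * phi / (beta_r @ beta_r)
        if abs(k_new - k) < tol:
            break
        k = k_new
    return beta_r, k

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(40), rng.standard_normal(40)])
y = rng.poisson(np.exp(X @ np.array([0.5, 0.3])))
beta_irls, _, _ = irls(X, y)
beta_ridge, k = rule_iii_ridge(X, y)  # ridge shrinks beta_irls toward zero
```

Note that the iteration holds X'WX and the working response fixed at the IRLS fit, mirroring the algorithm above; for k > 0 the resulting estimate is strictly shorter than the IRLS estimate.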
However the concept of prior likelihood, which [EDWARDS,69] shows to be asymptotically equivalent to the Bayesian formulation, fits better with the likelihood philosophy of this thesis: prior information is treated as evidence from other experiments, and enters by simply adding log likelihood terms, since

  posterior log likelihood = log likelihood + prior log likelihood

This detracts nothing from the motivation of the ridge type estimator, for in the standard linear case a multivariate normal prior likelihood leads to the same estimator.

In the general linear case, in the absence of evidence from other experiments, the natural choice of prior likelihood for β is the normal N(0, φσ_β²). The posterior log likelihood of β is then

  lp(β) = Σ_{i=1(1)n} [ (y_i x_i'β - b(x_i'β))/a(φ) + c(y_i,φ) ]
          - Σ_{j=1(1)p} β_j²/(2φσ_β²) - (p/2) log(φσ_β²)

where the first term is the log likelihood of the generalized linear model as in definition 1.3, and the remaining terms are the prior log likelihood obtained from the normal prior. The maximum posterior likelihood estimates β̂ are obtained using the partial derivatives presented in Appendix 4A. Hence the following definition of a posterior mode estimator.

Definition 4.2 Posterior Mode Estimator : for the generalized linear model y = g⁻¹(Xβ) + ε the posterior mode estimator is the value of β which maximises the posterior log likelihood lp(β) above.
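A minimal numerical sketch of the posterior mode estimator: this assumes a Poisson model with canonical log link, φ = 1, and σ_β² held fixed (whereas Appendix 4A profiles σ_β² out); the data and function name are invented for the illustration.

```python
import numpy as np

def posterior_mode(X, y, sigma_b2=1.0, n_iter=50):
    """Newton ascent on the posterior log likelihood for a log-link Poisson,
    lp(beta) = sum(y*eta - exp(eta)) - beta'beta / (2*sigma_b2)."""
    p = X.shape[1]
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu) - beta / sigma_b2        # d lp / d beta
        hess = -(X.T * mu) @ X - np.eye(p) / sigma_b2   # d2 lp / d beta d beta'
        beta = beta - np.linalg.solve(hess, score)      # Newton step
    return beta

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(60), rng.standard_normal(60)])
y = rng.poisson(np.exp(X @ np.array([0.4, 0.3])))
beta_pm = posterior_mode(X, y)   # shrunk toward 0 relative to the MLE
```

The normal prior contributes the -β/σ_β² and -I/σ_β² terms to the score and Hessian, which is what makes the mode a ridge-like shrinkage of the maximum likelihood estimate.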
4.3.3 Principal Components Estimation

Standard Linear Case

Principal components estimation produces biased estimates by removing those linear combinations of the predictor variables corresponding to small singular values of the design matrix. The results are easiest derived by considering the canonical form of the standard linear model,

  y = X*β* + ε    where X* = XT, β* = T'β

with

  Λ = diag(λ₁, ..., λ_p)  the eigenvalues of X'X
  T = [t₁, ..., t_p]      the eigenvectors of X'X
  T'X'XT = Λ

Then the principal components estimator is defined as

  β_PC = TBT' β_OLS

where

  B = diag(b₁, ..., b_p),  b_j = 1 if λ_j is not "small", 0 else

So, substituting β_OLS = (X'X)⁻¹X'y and using the orthogonal diagonalisation X'X = TΛT',

  β_PC = TBT'(X'X)⁻¹X'y = TBΛ⁻¹T'X'y = [ Σ_{j : b_j = 1} (1/λ_j) t_j t_j' ] X'y
       = (X'X)⁻ X'X β_OLS

where (X'X)⁻ = Σ_{j : b_j = 1} t_j t_j'/λ_j is an approximate inverse of X'X. Now

  Var(β_PC) = TB Var(β*) B'T' = σ² TBΛ⁻¹B'T' = σ² Σ_{j : b_j = 1} t_j t_j'/λ_j

which is to be compared with

  Var(β_OLS) = σ² TΛ⁻¹T' = σ² Σ_{j = 1(1)p} t_j t_j'/λ_j

Hence, at the cost of bias in some components of the effects, principal components estimation alleviates the large variance terms due to the small singular values.

General Linear Case

In a similar manner it is possible to extend this definition to the general setting as follows.

Definition 4.3 Principal Components Estimator : for the generalized linear model y = g⁻¹(Xβ) + ε the principal components estimator is given by

  β_PC = (X'WX)⁻ X'WX β_IRLS

where (X'WX)⁻ = TBΛ⁻¹B'T', with Λ and T now the eigenvalues and eigenvectors of X'WX and B defined as above.
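The two equivalent forms of the standard-case estimator derived above (β_PC = TBT'β_OLS and the approximate-inverse form using Σ_j b_j t_j t_j'/λ_j) can be checked numerically. The data and the "small eigenvalue" cutoff below are illustrative choices, not from the thesis.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 3
Z = rng.standard_normal((n, p))
X = np.column_stack([Z[:, 0], Z[:, 0] + 0.01 * Z[:, 1], Z[:, 2]])  # near-dependency
y = X @ np.array([1.0, 1.0, 1.0]) + rng.standard_normal(n)

lam, T = np.linalg.eigh(X.T @ X)            # eigenvalues in ascending order
b = (lam > 0.1 * lam.max()).astype(float)   # drop the "small" eigenvalue direction
B = np.diag(b)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_pc = T @ B @ T.T @ beta_ols

# approximate inverse (X'X)^- = sum over retained j of t_j t_j' / lambda_j
XtX_apx = sum(b[j] / lam[j] * np.outer(T[:, j], T[:, j]) for j in range(p))
beta_pc2 = XtX_apx @ (X.T @ y)              # agrees with beta_pc
```

Since TBT' is an orthogonal projection, β_PC is never longer than β_OLS, which is the variance-reduction effect described above.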
Hence by direct analogy with the standard case the variances of the estimates are similarly reduced, and the effects of collinearity on the generalized linear model are alleviated by removing the components of the effects corresponding to small singular values of the weighted design matrix.

4.4 Appendix 4A : Maximum Posterior Likelihood Estimates

The maximum posterior likelihood estimates are obtained in the usual manner, by differentiating the posterior log likelihood with respect to the parameters and solving the resulting equations. However there is the problem of the nuisance parameters φ and σ_β². [EDWARDS,69a] shows that it is acceptable to remove nuisance parameters by substituting their maximum likelihood values, and [PATEFIELD,77] shows that the maximised likelihood so obtained is slightly non standard: the information matrix formed from it is not the appropriate one to use, and the formation matrix involving the full set of parameters must be used instead. Using the posterior log likelihood

  lp = Σ_{i=1(1)n} [ (y_i x_i'β - b(x_i'β))/a(φ) + c(y_i,φ) ]
       - Σ_{j=1(1)p} β_j²/(2φσ_β²) - (p/2) log(φσ_β²)

the following partial derivatives are obtained.
  ∂lp/∂β_j = Σ_{i=1(1)n} (y_i - μ_i) x_ij / a(φ) - β_j/(φσ_β²) ,  j = 1(1)p

  ∂lp/∂σ_β² = -p/(2σ_β²) + Σ_{j=1(1)p} β_j²/(2φ(σ_β²)²)

  ∂lp/∂φ = Σ_{i=1(1)n} [ -(y_i x_i'β - b(x_i'β)) a'(φ)/a(φ)² + ∂c(y_i,φ)/∂φ ]
           + Σ_{j=1(1)p} β_j²/(2φ²σ_β²) - p/(2φ)

  ∂²lp/∂β_j∂β_k = -Σ_{i=1(1)n} x_ij x_ik (∂μ_i/∂η_i)/a(φ) - δ_jk/(φσ_β²)
                                                 [δ_jk the Kronecker delta]

Eliminating the nuisance parameter σ_β² by setting ∂lp/∂σ_β² to zero gives

  σ̂_β² = β'β/(pφ)

The score vector is s = ∂lp/∂(β,φ), the formation matrix is F = -∂²lp/∂(β,φ)∂(β,φ)', and the usual scoring equation

  (β,φ)_new = (β,φ) + F⁻¹ s

is applied, using the maximum likelihood estimates as starting values.

5.0 Illustrative Example

5.1 Introduction

There are two approaches to validating the diagnostic and estimation methods developed in the previous chapters. One approach would be to obtain a large set of data which could be verified to directly illustrate the collinearity symptoms; this has the disadvantage that the "true" coefficients are unknown. The other approach, the one adopted in this thesis, is to generate collinear data with known coefficients by simulation, apply the estimators, and then calculate mean square error measures of their performance.

5.2 Scope

Other simulation studies of this kind (eg. [DEMPSTER,SCHATZOFF,WERMUTH,77], [McDONALD,GALARNEAU,75], [SCHAEFER,82]) have studied the effects of several parameters, such as the sample size (n), the number of predictor variables (p), the degree of collinearity (d), the alignment of the coefficient vector, etc. A simulation of that scale is beyond the resources of this chapter, and so a restricted simulation has been chosen: the distribution is gamma, n and p are fixed, and d and the alignment are each varied over two levels.

5.3 Generation of Collinear Data

5.3.1 Standard Linear Case

The procedure that has been followed in many simulations (eg. [McDONALD,GALARNEAU,75], [WICHERN,CHURCHILL,78], [GIBBONS,81]) is to generate a collinear design matrix with a controllable degree of dependency.
Two "true" coefficient vectors are then selected, as the eigenvectors of the X'X system corresponding to its largest and smallest eigenvalues. It is known ([HEMMERLE,BRANTLE,78], [GIBBONS,81]) that the theoretical mean square error of a linear estimator is respectively minimised and maximised for these choices. [THISTED,76] p74 cites as justification for these choices that

  ""extreme-case simulation" experiments appear to be the most economical and informative, especially for preliminary studies of new results"

Finally, the response vector is generated as the sum of the systematic part Xβ and a random normal error.

Now much criticism (eg. [DRAPER,VAN NOSTRAND,79]) has been levelled against this design, and much work needs to be done to construct a fair basis for comparison. However, this is beyond the resources of this thesis.

5.3.2 General Linear Case

The problem of generating a collinear data set from a generalized linear model is not as simple as in the standard linear case, since it is the weighted system X* = W^(1/2)(X,β)X, rather than X itself, that must be collinear. A solution in a manner similar to [GIBBONS,81] might appear to be to generate X as collinear and to select β as the eigenvector corresponding to the smallest eigenvalue of X'X; but to recover the original collinearity in the weighted system, the weight matrix, which is itself a non trivial function of β, would have to be maintained, and this leads to an overconstrained system with no non trivial solution. Hence β is selected to be aligned with the eigenvectors of X'X in the manner recommended by [THISTED,76], and the collinearity of X itself is used to generate the collinearity of the weighted system.
The premultiplication of X by W^(1/2) is effectively a row scaling, so a collinear X remains collinear in the weighted system. For convenience the gamma density with unit dispersion parameter (ie. φ = 1) is chosen. The method of generating X with a controllable degree of collinearity is as follows.

Algorithm 5.1 Modified [GIBBONS,81] Collinear X Generator

STEP 1
  Generate z_ij as independent N(0,1) pseudo random variables,
  i = 1(1)n, j = 1(1)p+1

STEP 2
  Select α, where α² is the correlation between any two predictor variables

STEP 3
  Compute x_ij = (1 - α²)^(1/2) z_ij + α z_i,p+1 ,  i = 1(1)n, j = 1(1)p

In the generalized linear model y = g⁻¹(Xβ) + ε the response y is the sum of the systematic component g⁻¹(Xβ) and an error term ε. For the one parameter exponential distributions considered here, the error term can be thought of as coming from the distribution of the model itself, so it is not meaningful to add a specific error term to the systematic component to generate y. Rather, y is generated from a one parameter exponential distribution with mean vector μ = g⁻¹(Xβ). So for the gamma distribution, which has mean 1/λ, y is generated directly from a gamma with mean vector μ = g⁻¹(Xβ).

5.4 Simulation

5.4.1 Simulation Setup

The model to be considered in the simulation is

  y = g⁻¹(Xβ) + ε

where
  y : n×1 gamma variate
  φ : dispersion parameter = 1
  n : sample size = 30
  p : number of predictor variables (including intercept) = 4
  β : p×1

with the two settings of the coefficient vector

  β(L)' = ( 0.01,  0.01,  0.01, 0.01)
  β(S)' = (-0.001, 0.01, -0.01, 0.01)
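Algorithm 5.1 and the gamma response generation above can be sketched as follows. This is an illustration under assumptions of mine, not the thesis code: the function names and seed are invented, and the reciprocal link μ = 1/(x'β), with an intercept chosen to keep x'β positive, is used as a concrete example.

```python
import numpy as np

def collinear_X(n, p, alpha2, rng):
    """x_ij = sqrt(1 - alpha^2) z_ij + alpha z_{i,p+1}: any two of the p
    columns then have population correlation alpha^2."""
    z = rng.standard_normal((n, p + 1))
    return np.sqrt(1.0 - alpha2) * z[:, :p] + np.sqrt(alpha2) * z[:, [p]]

def gamma_response(X, beta, rng):
    """y_i ~ gamma with unit dispersion (phi = 1, ie. shape 1) and mean
    mu_i = 1/(x_i'beta); requires x_i'beta > 0."""
    mu = 1.0 / (X @ beta)
    return rng.gamma(shape=1.0, scale=mu)

rng = np.random.default_rng(3)
Xp = collinear_X(30, 3, alpha2=0.95, rng=rng)      # d = 0.95 level
X = np.column_stack([np.ones(30), Xp])             # p = 4 including intercept
beta = np.array([1.0, 0.01, 0.01, 0.01])           # intercept keeps x'beta > 0
y = gamma_response(X, beta, rng)
r = np.corrcoef(Xp[:, 0], Xp[:, 1])[0, 1]          # sample correlation near 0.95
```

Each generated column has unit variance and shares the common component z_{i,p+1}, which is what gives the controllable pairwise correlation α².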
Two levels of the degree of collinearity d, measured as the largest correlation coefficient between any two predictor variables, were chosen,

  d(1) = 0.80    d(2) = 0.95

with X generated by Algorithm 5.1 accordingly. Given the X generated above, the eigenvectors (t(L), t(S)) of X'X corresponding to the largest and smallest eigenvalues, for the two levels of d, were

               d(1) = 0.80        d(2) = 0.95
             t(L)     t(S)      t(L)     t(S)
            -0.63     0.28     -0.75     0.05
            -0.48    -0.49     -0.39    -0.34
            -0.43     0.65     -0.36    -0.44
            -0.50    -0.37      0.78    -0.52

Now as discussed above it is expected that β(L) be approximately aligned with the direction of t(L) and β(S) with that of t(S). By suitable scaling it is seen that β(L) and β(S) are in the directions of the respective eigenvectors for d(1) = 0.80, but that there is little improvement in the alignment for d(2) = 0.95, as it is not possible to align β analytically. So for the unfavourable alignment only the setting (β(S), d(2)) is used.

A pilot run was made to estimate a reasonable run size, with the criterion that the simulation be able to detect a 10 percent change in the estimates β̂ at the 5 percent significance level. This gave a run size (m) of 20. Because of resource constraints, three estimators were considered :

  β_IRLS : the iteratively reweighted least squares estimator
  β_R    : the [HOERL,KENNARD,BALDWIN,75] ridge estimator
  β_PC   : the principal components estimator

To assess the effectiveness of the estimators the usual criterion of the so called average square error,

  ASE(β̂) = (β̂ - β)'(β̂ - β)

is used.
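The ASE criterion, averaged over the m runs as in the simulation, is a one-line computation; a minimal helper with invented numbers for illustration:

```python
import numpy as np

def ase(beta_hat, beta):
    """ASE(beta_hat) = (beta_hat - beta)'(beta_hat - beta)."""
    d = np.asarray(beta_hat) - np.asarray(beta)
    return float(d @ d)

beta_true = np.array([0.01, 0.01, 0.01, 0.01])
runs = [np.array([0.012, 0.009, 0.011, 0.010]),
        np.array([0.008, 0.010, 0.009, 0.013])]
avg_ase = float(np.mean([ase(b, beta_true) for b in runs]))  # approx 1.0e-05
```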
Now it is possible to obtain approximations to the variance of the ridge estimators in terms of the moments of the distribution of y [SCHAEFER,79]. However, because the approximations require the third and fourth order moments, this is omitted here; it should be considered in any future work.

5.4.2 Simulation Implementation and Difficulties

The simulation was divided into three stages. The first stage used SAS PROC MATRIX to generate the collinear data sets using Algorithm 5.1. The second stage used the generalized linear models package GLIM to get the iteratively reweighted least squares estimates. The third stage again used PROC MATRIX to calculate the biased estimators and the statistics.

While PROC MATRIX is pleasant to use because of its matrix style syntax, and is effective because of the machine coding of its matrix routines, its input/output ability leaves much to be desired. The result of this deficiency is that very large matrices must be stored, with numerous consequent physical input/output problems.
These problems can be overcome if efficient use is made of the interpreter to produce compact code. During the running of the simulation a fault was also found: for approximately five percent of the data sets, when the gamma model was fitted in the presence of severe collinearity (as here), GLIM gave no sensible warning but aborted with a ****FAULT message. To overcome this, the affected data sets were discarded and extra data sets generated.

5.4.3 Simulation Results and Conclusions

The results of the simulations are displayed in Tables 5.1 and 5.2 and in Figure 5.1. Table 5.1 lists, for each setting, the values of the estimates β̂₁, β̂₂, β̂₃, β̂₄ for each estimator, averaged over the m runs. Table 5.2 gives the corresponding average squared error, with its standard error in brackets, and also displays the percentage error ratio using the biased estimators as opposed to the usual estimator (ie. ASE(β̂)/ASE(β̂_IRLS) × 100); bars represent homogeneous groups with Tukey's pairwise comparison at the 5% significance level. In accordance with the discussion above, only the three settings (β(L), d(1)), (β(L), d(2)) and (β(S), d(2)) were considered. Figure 5.1 displays the estimates pictorially as mean ± standard deviation intervals; note that the scaling of the graphs has been modified to make the comparisons easier to comprehend.

Table 5.1  Estimators for the Gamma Model

                  β̂_IRLS      β̂_R        β̂_PC
  (β(L), d(1))    0.0128      0.0107     0.00940
                  0.00941     0.00958    0.0101
                  0.0101      0.00914    0.0106
                  0.0128      0.0105     0.0103

  (β(L), d(2))    0.0127      0.0110     0.00995
                  0.00804     0.0107     0.0101
                  0.00749     0.00735    0.0104
                  0.0163      0.0123     0.0101

  (β(S), d(2))    0.00763     0.00413    0.00175
                  0.0100      0.00672    0.00229
                  0.0101      0.00314   -0.00279
                  0.0105      0.00548    0.00262

  1. β(L)' = (0.01, 0.01, 0.01, 0.01)
  2. β(S)' = (-0.001, 0.01, -0.01, 0.01)
  3. d(1) = 0.80, d(2) = 0.95
  4. Each cell contains (β̂₁, β̂₂, β̂₃, β̂₄)' averaged over the m runs.

Table 5.2  Average Squared Error of Estimators for the Gamma Model

                β̂_IRLS                β̂_R                        β̂_PC
  (β(L), d(1))  0.00171  [0.00209]    0.000450  [0.000984]  26%   0.000167  [0.00315]   9%
  (β(L), d(2))  0.0139   [0.000177]   0.00268   [0.00722]   19%   0.0000942 [0.000142]  1%
  (β(S), d(2))  0.000281 [0.000454]   0.000286  [0.0000465] 100%  0.000197  [0.000193]  55%

  1. β(L)' = (0.01, 0.01, 0.01, 0.01); β(S)' = (-0.001, 0.01, -0.01, 0.01)
  2. d(1) = 0.80; d(2) = 0.95
  3. Each cell contains the ASE averaged over the m runs, [its standard error], and the % error ratio using the biased estimator (ie. ASE(β̂)/ASE(β̂_IRLS) × 100).
  4. Bars represent homogeneous groups with Tukey's pairwise comparison at the 5% level.

Figure 5.1  Mean ± Standard Deviation Intervals of the Estimators for the Gamma Model: (a) (β(L), d(1)), (b) (β(L), d(2)), (c) (β(S), d(2)). [Plot panels not reproduced: each panel shows the intervals for the IRLS, RII and PC2 estimates of β₁ to β₄.]
proportional case  to  the  principal  estimator the  five  In fi )  can  is  be  drawn  favourable  reduction  Also,  the  degree  of  components  (Duncan's  from  the  results  in  variance  amount  range  finds  then  1  using  reduction present.  better  test  ft* -*),  from  of  collinearity seems  multiple  ( i e . fi =  than a  In  the  a is  this ridge  difference  at  percent ' l e v e l ) .  the  case  there  <ra>  that  follows.  When there  conclusions  estimation  when  appears  ,  the to  provided  alignment  i s unfavourable  be  some  gain  from  the  bias  induced  (ie. B  ridge  using  could  be  is  well  =  considered  tolerable.  The  effect  Figure  5.1(c).  mean can  square  be  seen.  because  It drawn major  of  the  using  In  particular  error The  i s encouraging  summary  Table  =  estimators the  variance  IRLS  variance  from  biased  and  PC2  reduction  to  note  5.2  conclusions  are of  in  identity  +  biaa  a  estimators  have  the  in  PC2  is offest  that  the  conclusions  in  shown  reasonable  [GIBBONS,81]  by  same  i t s bias.  that  agreement pl37-8.  ASE,  can  with  be the  1.  2.  A l l e s t i m a t o r s a r e b e t t e r t h a n t h e LS e s t i m a t o r s when t h e u n d e r l y i n g c o e f f i c i e n t v e c t o r i s f a v o r a b l e ; t h a t i s B = B,„. No e s t i m a t o r i a a l w a y s b e t t e r t h a n t h e L S when t h e underlying coefficent vector i s unfavorable; t h a t i sfi= B.,. n  3.  So are  Estimates  HKB,  where  refers  i n conclusion,  definite  estimator linear  extensive carried  gains  when  model  collinear  be  HKB  GHW  a n d RIDGM  to  there to  a  structure  i s some e v i d e n c e  b e made  testing,  gamma as  using  out before  well  overall,  [HOERL,KENNARD,BALDWIN,75]  collinearity with  performed  was both  by u s i n g i s  to  some f o r m  present  in  distribution generated. 
However, far more extensive testing, both by simulation and with real data, must be carried out before any broader statements can be made.

6.0 Summary and Conclusions

This thesis has investigated collinearity in the generalized linear model. The definition of collinearity as an approximate linear dependency amongst the columns of the predictor matrix X, and its identification, carry over naturally from the standard linear model to the generalized linear model, except that the system in which collinearity appears is the weighted predictor matrix W^(1/2)X, where W is the weight matrix associated with the estimation procedure. Using a suitable scaling transformation, the [GUNST,83] definitions and diagnostic methods extend to the general setting, and so it was shown that in the general setting collinearity is model dependent.

The [GUNST,83] definition of collinearity in terms of small singular values of the transformed predictor matrix was shown to be mathematically equivalent to the dependency definition, and this property was used to extend the [BELSEY,KUH,WELSCH,80] collinearity index identification scheme to the generalized linear model. To quantify the extension, bounds for the collinearity index of the generalized linear model were derived in terms of the index of the standard linear model.

Although ridge estimation is an appealing method, problems were encountered in trying to extend the [HOERL,KENNARD,70] existence result to a construction of, or bound for, the ridge parameter k. Given the amount of criticism directed against ridge regression in the standard setting, there is need for more work in this area too, and it is perhaps the easiest area for future research.
An alternative, briefly discussed, is the posterior likelihood approach, which incorporates the ridge parameter as "equivalent prior information". While this seemed intuitively the more appealing alternative to consider, it was found extremely difficult to implement for the generalized linear model.

While the simulation was restricted in its aims, and difficulties were encountered both in setting up the Monte Carlo simulation and with the methods described, it did succeed in demonstrating that gains can practically be made by some form of biased estimation in the presence of collinearity. However, more experience with the methods is needed before biased estimation can be advocated for routine use.

Bibliography

[ALLEN,77] Allen,DM "Comment on Simulation of Alternatives to Least Squares" Journal of the American Statistical Association 72 357 95-96.

[BARD,74] Bard,Y Nonlinear Parameter Estimation Academic Press, New York.

[BELSEY,KUH,WELSCH,80] Belsey,DA Kuh,E Welsch,RE Regression Diagnostics : Identifying Influential Data and Sources of Collinearity John Wiley & Sons, New York.

[BRADLEY,SVRIVASTAVA,79] Bradley,RA Srivastava,SS "Correlation in Polynomial Regression" American Statistician 33 11-14.

[DEMPSTER,SCHATZOFF,WERMUTH,77] Dempster,AP Schatzoff,M Wermuth,N "A Simulation Study of Alternatives to Ordinary Least Squares" Journal of the American Statistical Association 72 77-106.

[DRAPER,VAN NOSTRAND,79] Draper,NR Van Nostrand,RC "Ridge Regression and James-Stein Estimators : Review and Comments" Technometrics 21 451-466.

[EDWARDS,69] Edwards,AWF "Statistical Inference" Nature 222 1233-1237.
[EDWARDS,69a] Edwards,AWF Likelihood Cambridge University Press, London.

[GIBBONS,81] Gibbons,DG "A Simulation Study of Some Ridge Estimators" Journal of the American Statistical Association 76 373 131-139.

[GUNST,80] Gunst,RF "Comment on A Critique of Some Ridge Regression Methods" Journal of the American Statistical Association 75 369 98-100.

[GUNST,83] Gunst,RF "Regression Analysis with Multicollinear Predictor Variables : Definition, Detection and Effects" Communications in Statistics - Theory and Methods 12(19) 2217-2260.

[HEMMERLE,BRANTLE,78] Hemmerle,WJ Brantle,TF "Explicit and Constrained Generalized Ridge Regression" Technometrics 20 109-120.

[HOERL,62] Hoerl,AE "Application of Ridge Analysis to Regression Problems" Chemical Engineering Progress 58 54-59.

[HOERL,KENNARD,70] Hoerl,AE Kennard,RW "Ridge Regression : Biased Estimation for Non-orthogonal Problems" Technometrics 12 55-67.

[HOERL,KENNARD,76] Hoerl,AE Kennard,RW "Ridge Regression : Iterative Estimation of the Biasing Parameter" Communications in Statistics A5 77-88.

[HOERL,KENNARD,BALDWIN,75] Hoerl,AE Kennard,RW Baldwin,KF "Ridge Regression : Some Simulations" Communications in Statistics 4 105-123.

[KENDALL,57] Kendall,MG A Course in Multivariate Analysis Griffin, London.

[KENDALL,STUART,67.2] Kendall,MG Stuart,A The Advanced Theory of Statistics Volume 2 : Inference and Relationship Griffin, London.

[LINDLEY,SMITH,72] Lindley,DV Smith,AFM "Bayes Estimates for the Linear Model" Journal of the Royal Statistical Society Series B 34 1-41.

[McCULLAGH,82] McCullagh,P "Categorical Data Analysis" Lecture Notes, University of British Columbia.

[McCULLAGH,NELDER,83] McCullagh,P Nelder,JA Generalized Linear Models Chapman and Hall, London.
[McDONALD,GALARNEAU,75] McDonald,GC Galarneau,DI "A Monte Carlo Evaluation of Some Ridge Type Estimators" Journal of the American Statistical Association 70 407-416.

[MONTGOMERY,PECK,82] Montgomery,DC Peck,EA Introduction to Linear Regression Analysis J Wiley & Sons, New York.

[MOYLS,85] Moyls,B Personal communication.

[MULLET,76] Mullet,GM "Why Regression Coefficients Have the Wrong Sign" Journal of Quality Technology 8 121-126.

[NELDER,77] Nelder,JA "Discussion on Bayes Estimates for the Linear Model" Journal of the Royal Statistical Society Series B 34 18-20.

[NELDER,WEDDERBURN,72] Nelder,JA Wedderburn,RWM "Generalized Linear Models" Journal of the Royal Statistical Society Series A 135 370-384.

[PATEFIELD,77] Patefield,WM "On the Maximized Likelihood Function" Sankhya Series B 39 92-96.

[PREGIBON,81] Pregibon,D "Logistic Regression Diagnostics" The Annals of Statistics 9 4 705-724.

[SCHAEFER,79] Schaefer,RL Multicollinearity in Logistic Regression PhD Thesis, University of Michigan #792522D.

[SCHAEFER,82] Schaefer,RL "Alternative Estimators in Logistic Regression when the Data are Collinear" Proceedings of the American Statistical Association, Statistics and Computing Section 159-164.

[SILVEY,69] Silvey,SD "Multicollinearity and Imprecise Estimation" Journal of the Royal Statistical Society Series B 31 539-552.

[SMITH,CAMPBELL,80] Smith,G Campbell,F "A Critique of Some Ridge Regression Methods" Journal of the American Statistical Association 75 369 74-81.
[THISTED,76] Thisted,RA "Ridge Regression, Minimax Estimation and Empirical Bayes Methods" Technical Report No 28, Stanford University Division of Biostatistics.

[WEDDERBURN,74] Wedderburn,RWM "Quasi-Likelihood Functions, Generalized Linear Models and the Gauss-Newton Method" Biometrika 61 3 439-447.