UBC Theses and Dissertations

Monotone regression functions Zuo, Yanling 1990

MONOTONE REGRESSION FUNCTIONS

By YANLING ZUO

B.Sc., Beijing Institute of Technology, 1984
M.Sc., Beijing Institute of Technology, 1987

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES, DEPARTMENT OF STATISTICS

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
September 1990
© Yanling Zuo, 1990

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Statistics
The University of British Columbia
Vancouver, Canada

Abstract

In some applications, we require a monotone estimate of a regression function. In others, we want to test whether the regression function is monotone. For solving the first problem, Ramsay's, Kelly and Rice's, and point-wise monotone regression functions in a spline space are discussed and their properties developed. Three monotone estimates are defined: least-square regression splines, smoothing splines and binomial regression splines. The three estimates depend upon a "smoothing parameter": the number and location of knots in regression splines and the usual λ in smoothing splines. Two standard techniques for choosing the smoothing parameter, GCV and AIC, are modified for monotone estimation in the normal errors case.
For answering the second question, a test statistic is proposed and its null distribution conjectured. Simulations are carried out to check the conjecture. These techniques are applied to two data sets.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgement
Chapter 1. Introduction
Chapter 2. The Sobolev Space W_2^m[a,b]
Chapter 3. Splines
  1. Definition of S_m(t_1, ..., t_k)
  2. Truncated Power Basis
  3. B-splines and I-splines
  4. Monotone Splines
Chapter 4. Regression
  1. Quadratic Programming
  2. Least Square Regression Splines
  3. Smoothing Splines
    3.1. Unconstrained Smoothing Splines in the Normal Model
    3.2. Monotone Smoothing Splines
  4. Choice of the Smoothing Parameter
    4.1. GCV and Its Asymptotic Properties
    4.2. AIC and Its Asymptotic Property
    4.3. GCV Applied to Monotone Constrained Problems
      4.3.1. The Approximation Form of GCV in Regression Splines
      4.3.2. The Approximation Form of GCV for Monotone Constrained Problems in Smoothing Splines
    4.4. AIC in Monotone Constrained Problems in Regression Splines
Chapter 5. Data Analysis 1: the Relationship between Survival Probabilities and Birthweight of Male Infants
  1. Description of the Data Set
  2. Exploratory Data Analysis
  3. Analysis of the Data by Least Square Monotone Regression Splines
  4. Analysis of the Data by Monotone Smoothing Splines
  5. Some Discussions on Testing the Monotone Assumption of the Regression Function
    5.1. The Summary Statistics
    5.2. Simulation
    5.3. Conclusion
Chapter 6. Description of Song Sparrow Data and Exploratory Data Analysis
  1. Data Set Description
  2. Exploratory Data Analysis
    2.1. Sex Effect of Birds on the Values of Independent Variables
    2.2. Pairwise Relations of the Six Independent Variables and Sex Effect on the Relations
    2.3. Principal Component Analysis
    2.4. Some Rough Pictures of the Relations of Survival Probabilities and the Independent Variables
    2.5. Generalized Linear Models
Chapter 7. Binomial Regression Splines
  1. Unconstrained Binomial Regression Splines
  2. Monotone Binomial Regression Splines
  3. GCV in Binomial Regression Splines
    3.1. GCV in Unconstrained Binomial Regression Splines
    3.2. GCV in Constrained Binomial Regression Splines
  4. AIC in Binomial Regression Splines
  5. Data Analysis 2: the Relationship between Survival Probabilities and Tarsus, Beak Length and Beak Width of the First Year's Female Song Sparrows
    5.1. Survival Probabilities and Tarsus
    5.2. Survival Probabilities and Beak Length
    5.3. Survival Probabilities and Beak Width
Bibliography

List of Tables

Table 1. Male Births Data
Table 2. Mean, Median, Standard Deviation and Standard Error of the Means of the Six Independent Variables
Table 3. Logistic Models Fit to the Female Song Sparrow Data
Table 4. Logistic Models Including Six Independent Variables for Female Song Sparrow Data
Table 5. Logistic Models Including T and Its Second Interaction Term
Table 6. Logistic Models Including BL and Its Second Interaction Term
Table 7. Logistic Models Including BW and Its Second Interaction Term
Table 8. Logistic Models Fit to the Male Song Sparrow Data

List of Figures

Figure 1. Fraction Surviving Versus Birthweight
Figure 2. The Relationship between Birthweight and Survival Probabilities of the Male Infants by Least Square Regression Splines
Figure 3. The Relationship between Birthweight and Survival Probabilities of the Male Infants by Least Square Regression Splines
Figure 4. Grid Search on GCV
Figure 5. The Relationship between Birthweight and Survival Probabilities of the Male Infants by Smoothing Splines
Figure 6. Q-Q Plots of the Test Statistic T When the Simulation Function Equals 7 Sin(1.5x)
Figure 7. Q-Q Plots of the Test Statistic T When the Simulation Function Equals 7 Sin(1.5x)
Figure 8. Q-Q Plots of R_1/σ² When the Simulation Function Equals 7 Sin(1.5x)
Figure 9. Q-Q Plots of R_1/σ² When the Simulation Function Equals 7 Sin(1.5x)
Figure 10. Q-Q Plots of R_0/σ² When the Simulation Function Equals 7 Sin(1.5x)
Figure 11. Q-Q Plots of R_0/σ² When the Simulation Function Equals 7 Sin(1.5x)
Figure 12. Q-Q Plots of the Test Statistic T When the Simulation Function Equals 7x I{0 ≤ x < 0.3} + 7(0.3) I{0.3 ≤ x ≤ 1} + 7(x − 0.8) I{0.8 ≤ x ≤ 1}
Figure 13. Q-Q Plots of the Test Statistic T When the Simulation Function Equals 7x I{0 ≤ x < 0.3} + 7(0.3) I{0.3 ≤ x ≤ 1} + 7(x − 0.8) I{0.8 ≤ x ≤ 1}
Figure 14. Q-Q Plots of R_1/σ² When the Simulation Function Equals 7x I{0 ≤ x < 0.3} + 7(0.3) I{0.3 ≤ x ≤ 1} + 7(x − 0.8) I{0.8 ≤ x ≤ 1}
Figure 15. Q-Q Plots of R_1/σ² When the Simulation Function Equals 7x I{0 ≤ x < 0.3} + 7(0.3) I{0.3 ≤ x ≤ 1} + 7(x − 0.8) I{0.8 ≤ x ≤ 1}
Figure 16. Q-Q Plots of R_0/σ² When the Simulation Function Equals 7x I{0 ≤ x < 0.3} + 7(0.3) I{0.3 ≤ x ≤ 1} + 7(x − 0.8) I{0.8 ≤ x ≤ 1}
Figure 17. Q-Q Plots of R_0/σ² When the Simulation Function Equals 7x I{0 ≤ x < 0.3} + 7(0.3) I{0.3 ≤ x ≤ 1} + 7(x − 0.8) I{0.8 ≤ x ≤ 1}
Figure 18. Boxplots of Weight for Mixed (Left), Male (Middle), and Female (Right) Song Sparrow Data
Figure 19. Boxplots of Wing Length for Mixed (Left), Male (Middle), and Female (Right) Song Sparrow Data
Figure 20. Boxplots of Tarsus for Mixed (Left), Male (Middle), and Female (Right) Song Sparrow Data
Figure 21. Boxplots of Beak Length for Mixed (Left), Male (Middle), and Female (Right) Song Sparrow Data
Figure 22. Boxplots of Beak Depth for Mixed (Left), Male (Middle), and Female (Right) Song Sparrow Data
Figure 23. Boxplots of Beak Width for Mixed (Left), Male (Middle), and Female (Right) Song Sparrow Data
Figure 24. Pairwise Scatterplots of the Six Independent Variables for Song Sparrow Data
Figure 25. Pairwise Scatterplots of the Six Independent Variables for Female Song Sparrow Data
Figure 26. Pairwise Scatterplots of the Six Independent Variables for Male Song Sparrow Data
Figure 27. Scatterplot of Weight and Wing Length for Song Sparrow Data
Figure 28. Scatterplot of Weight and Tarsus for Song Sparrow Data
Figure 29. Scatterplot of Beak Length and Beak Width for Song Sparrow Data
Figure 30. Scatterplot of Weight and Beak Length for Song Sparrow Data
Figure 31. Scatterplot of Beak Length and Beak Depth for Song Sparrow Data
Figure 32. The Relationship between Weight and Survival Probabilities of the Female Song Sparrows
Figure 33. The Relationship between Wing Length and Survival Probabilities of the Female Song Sparrows
Figure 34. The Relationship between Tarsus and Survival Probabilities of the Female Song Sparrows
Figure 35. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows
Figure 36. The Relationship between Beak Depth and Survival Probabilities of the Female Song Sparrows
Figure 37. The Relationship between Beak Width and Survival Probabilities of the Female Song Sparrows
Figure 38. The Relationship between Weight and Survival Probabilities of the Male Song Sparrows
Figure 39. The Relationship between Wing Length and Survival Probabilities of the Male Song Sparrows
Figure 40. The Relationship between Tarsus and Survival Probabilities of the Male Song Sparrows
Figure 41. The Relationship between Beak Length and Survival Probabilities of the Male Song Sparrows
Figure 42. The Relationship between Beak Depth and Survival Probabilities of the Male Song Sparrows
Figure 43. The Relationship between Beak Width and Survival Probabilities of the Male Song Sparrows
Figure 44. The Relationship between Tarsus and Survival Probabilities of the Female Song Sparrows
Figure 45. The Relationship between Tarsus and Survival Probabilities of the Female Song Sparrows
Figure 46. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows by Kelly and Rice's Method
Figure 47. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows by Kelly and Rice's Method
Figure 48. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows by Pointwise Monotone Method
Figure 49. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows by Pointwise Monotone Method
Figure 50. The Relationship between Beak Width and Survival Probabilities of the Female Song Sparrows
Figure 51. The Relationship between Beak Width and Survival Probabilities of the Female Song Sparrows

Acknowledgement

I would like to express my deep gratitude to my advisor, Professor Nancy E. Heckman, for her constant encouragement and very helpful advice, which were crucial to this work and very much beyond, and for patiently reviewing the draft. Very special thanks are due to Professor Constance Van Eeden and Dr. Jean Meloche for reading the manuscript and making constructive comments. I am indebted to Dr. Dolph Schluter for his help in biology and Dr. Chong Gu for his help in computation. I also thank Dr. Dolph Schluter and Dr. James N. M. Smith for allowing me to use the song sparrow data. This research was partially supported by Natural Sciences and Engineering Research Council of Canada Grant Number 5-87969.

Chapter 1.
Introduction

In this paper, effective methods for fitting monotone functions are discussed: least square regression splines, binomial regression splines and smoothing splines. These estimates are then used to decide if the true regression function is monotone.

In some applications, we require the estimate of the regression function to be monotone. For instance, if f(x) is the response in a test of ability carried out by varying the degree of stress, x, it is reasonable to expect that f is a decreasing function of x. Monotone regression has been used in assessing the interaction between two drugs (Kelly and Rice, 1988). Asynergism is one commonly used definition of a lack of interaction. Asynergism is defined in terms of f(x,y), f_1(x) = f(x, 0) and f_2(y) = f(0, y), where f(x, y) is the expected response at dosage x of one drug and dosage y of the other. Testing the null hypothesis that the two drugs are asynergistic requires the estimation of f_1 and f_2 and inversion of the estimates. Since drug response is usually a monotone function of dose and since the estimates of f_1 and f_2 must be invertible, f_1 and f_2 are estimated by monotone functions.

Monotone regression is also used in the quantification of a plant's ability to absorb different ions (Meier, 1989). In the study of a corn seedling's ability to absorb nitrogen, the experimenter is interested in the relationship between the nitrate concentration, u, of a solution surrounding the roots of a corn seedling and the seedling roots' rate of uptake (absorption) of nitrate, Φ. The uptake rate is not measured directly. Instead, the nitrate concentration of the surrounding solution, C, is measured over time, and the rate of uptake is determined as minus the derivative of this curve. The relation between Φ and u can be expressed as

    Φ(u) = −(dC/dt)|_{t = C^{-1}(u)} = −C'(C^{-1}(u)).

So, we need estimates of C and C^{-1}.
It is necessary to assume that C is a monotone function to get a unique C^{-1} and thus Φ.

In other applications, we may want to test whether the regression function is monotone. For example, in the study of natural selection, biologists wish to study the relationship between survival and some physical trait. If the survival function is unimodal, biologists say that selection is stabilizing. If the survival function is monotone, selection is directional.

Some techniques have been presented for estimating monotone functions. Brunk (1955) proposed the Pool-Adjacent-Violators algorithm to solve the following problem. Given observations {y_1, y_2, ..., y_n}, with E(y_i) = u_i, find {û_1, û_2, ..., û_n} to minimize Σ_{i=1}^n (y_i − u_i)² subject to the restriction u_1 ≤ u_2 ≤ ... ≤ u_n. While Brunk's method gives the maximum likelihood estimate of E(y_i), assuming y_i is normal, the resulting estimate is unappealing, due to its roughness. To get a smooth and monotone estimate, Friedman and Tibshirani (1984) combine the running-mean smoothing technique with Brunk's isotonic regression method. Instead of estimating the entire regression function, both methods give estimates of evaluations of the regression function at the observed data points.

This paper focuses on fitting monotone functions by regression and smoothing splines. Ramsay (1988) and Kelly and Rice (1989) independently defined monotone regression functions in a spline space. Their definitions and their equivalence are discussed in Chapter 3. A point-wise approximation to monotonicity is also given. Using these monotonicity constraints, monotone least square regression, smoothing and binomial regression spline estimates are introduced and developed in Chapter 4 and Chapter 7. All three estimators depend upon a smoothing parameter: the number and location of knots in the regression estimate, and the usual λ in the smoothing spline estimate.
For choosing the smoothing parameter in each of the methods, generalized cross validation (GCV) and the Akaike information criterion (AIC) are used. Villalobos and Wahba (1987) introduced inequality-constrained multivariate smoothing splines and applied them to the estimation of posterior probabilities, where the estimated probabilities are constrained to lie between 0 and 1.

Techniques for testing the monotone assumption are not as well developed as those for estimating a monotone function. Barlow et al. (1972) proposed the generalized likelihood ratio test statistics, χ̄² (σ², the variance of y_i, known) and Ē² (σ² unknown), to test the hypothesis H_0: u(x_1) = u(x_2) = ... = u(x_k) vs H_1: u(x_1) ≤ u(x_2) ≤ ... ≤ u(x_k), where for each i in {1, 2, ..., k}, n_i values of y are observed when x has value x_i. The null hypothesis distributions of χ̄² and Ē² are complicated, especially when k is large, and the usual asymptotic results concerning likelihood ratio tests do not hold.

In Chapter 5, we discuss techniques to determine if the true regression function is monotone. Instead of giving a theoretical proof, we carry out simulations to study our techniques. Techniques are applied to two data sets in Chapter 5 and Chapter 7.

Chapter 2. The Sobolev Space W_2^m[a,b]

In the regression model, the unknown function f is assumed to be in some large function space. The Sobolev space W_2^m is commonly used.

A function, f, defined on [a,b] is said to belong to the Sobolev space W_2^m[a,b] if f has the following properties:
(i) f, f^(1), ..., f^(m-1) are absolutely continuous;
(ii) f^(m) is square integrable.

The following facts are well known (see, for instance, Kimeldorf and Wahba, 1971):

(i) W_2^m[a,b] is a Hilbert space with inner product

    <f, g> = Σ_{i=0}^{m-1} f^(i)(a) g^(i)(a) + ∫_a^b f^(m)(v) g^(m)(v) dv.   (1)

(ii) L_t, which has domain W_2^m[a,b] and range R, defined by L_t(f) = f(t), is a continuous linear functional with L_t(f) = <K_t, f>, where

    K_t(s) = Σ_{i=0}^{m-1} (t − a)^i (s − a)^i / (i!)² + ∫_a^b (t − u)_+^{m-1} (s − u)_+^{m-1} / [(m−1)!]² du   (2)

is called a reproducing kernel.

Two useful subspaces of W_2^m[a,b] in the theory of smoothing splines are H_1 and H_0. They are defined as

    H_1 = {f ∈ W_2^m[a,b] : f^(j)(a) = 0, j = 0, ..., m−1}

and

    H_0 = {f ∈ W_2^m[a,b] : f^(m)(t) = 0 a.e.},

and one can show that W_2^m = H_0 ⊕ H_1.

P_0 and P_1 denote two linear operators. P_0 is the projection of W_2^m onto H_0,

    P_0(f) = Σ_{i=1}^m <ψ_i, f> ψ_i,   (3)

where {ψ_1, ..., ψ_m} is a basis for polynomials of order m which spans H_0. P_1 is the projection of W_2^m onto H_1,

    P_1(f)(t) = f(t) − Σ_{i=1}^m <ψ_i, f> ψ_i(t)   (4)

and

    ||P_1(f)||² = ∫_a^b (f^(m)(t))² dt.   (5)

The concepts mentioned above will be used in Chapter 4, part 3, smoothing splines.

Chapter 3. Splines

In this chapter, we present the definition of the spline space S_m(t_1, ..., t_k). To describe the elements in S_m(t_1, ..., t_k), we introduce three bases for it: the truncated power basis, B-splines and I-splines. Finally, we discuss the definition of monotone splines. The concepts and results of this chapter will be applied to regression problems in Chapter 4 and Chapter 7.

1. Definition of S_m(t_1, ..., t_k)

S_m(t_1, ..., t_k) denotes the set of mth order splines. A spline of order m with knots t_0 = 0 < t_1 < ... < t_k < t_{k+1} = 1 is a polynomial of order m on each subinterval [t_i, t_{i+1}), and has m − 2 absolutely continuous derivatives on [0,1]. The knots t_1, ..., t_k are called interior knots. It is well known that S_m(t_1, ..., t_k) has dimension m + k (de Boor, 1978).

2. Truncated Power Basis

One of the bases for S_m(t_1, ..., t_k) is the truncated power basis, X_1, ..., X_{m+k}, where

    X_1(t) = 1,
    X_2(t) = t,
    ...
    X_m(t) = t^{m-1},
    X_{m+i}(t) = (t − t_i)_+^{m-1},  i = 1, ..., k,

and x_+ = x I{x > 0}, (x_+)^0 = I{x > 0}.
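The truncated power basis above is straightforward to evaluate pointwise. The following short Python sketch is my own, not from the thesis; the function and variable names are invented for illustration. It builds one row of the design matrix for a given point t, assuming m ≥ 2 (for m = 1 the convention (x_+)^0 = I{x > 0} would need special handling, which this sketch skips).

```python
def truncated_power_basis(t, knots, m):
    """Evaluate the m + k truncated power basis functions of order m
    (polynomial degree m - 1) with interior knots `knots` at the point t.
    Assumes m >= 2."""
    vals = [t ** j for j in range(m)]  # 1, t, ..., t^(m-1)
    # Truncated powers (t - t_i)_+^(m-1), one per interior knot.
    vals += [max(t - ti, 0.0) ** (m - 1) for ti in knots]
    return vals

# Example: cubic splines (m = 4) with interior knots 0.3 and 0.7.
row = truncated_power_basis(0.5, [0.3, 0.7], 4)
print(len(row))  # m + k = 6 basis functions
```

Stacking such rows for the data points x_1, ..., x_n gives the matrix X discussed next; the near-collinearity of high powers of t is what makes X'X ill-conditioned in practice.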
If

    f(t) = Σ_{i=1}^{m+k} β_i X_i(t),   (6)

then simple calculations reveal:

i) f is a piecewise polynomial of order m on any subinterval [t_i, t_{i+1}), since on [t_i, t_{i+1}),

    f(t) = Σ_{j=1}^m β_j t^{j-1} + Σ_{j=1}^i β_{m+j} (t − t_j)^{m-1},

a polynomial of order m.

ii) f has m − 2 absolutely continuous derivatives, since each X_i has m − 2 absolutely continuous derivatives.

Proof: For i = 1, ..., k,

    d^{m-2} X_{m+i}(t) / dt^{m-2} = (m−1)! (t − t_i)_+,   (7)

which is absolutely continuous.

iii) The X_i's are linearly independent.

Proof: For t ∈ [0, t_1), f(t) = Σ_{i=1}^m β_i t^{i-1} = 0 implies β_1 = ... = β_m = 0, and so for t ∈ [t_1, t_2), Σ_{i=1}^{m+k} β_i X_i(t) = β_{m+1} (t − t_1)^{m-1} = 0, which implies β_{m+1} = 0. Working forward in this manner, one easily sees that β_i = 0, i = 1, ..., m + k, if Σ_{i=1}^{m+k} β_i X_i(t) = 0 for t ∈ [0,1].

Although the simplicity of the truncated power basis makes it attractive for statistical work, the basis tends to be ill-conditioned, as is often manifested in a nearly singular matrix

    X = [ X_1(x_1)  ...  X_{m+k}(x_1) ]
        [ X_1(x_2)  ...  X_{m+k}(x_2) ]
        [    ...    ...       ...     ]
        [ X_1(x_n)  ...  X_{m+k}(x_n) ]

Here, x_1, ..., x_n represent data points. Among other things, this makes the inverse of X'X difficult to calculate. Thus, it is desirable to explore the use of other bases which are better conditioned.

3. B-splines and I-splines

Given knots which satisfy

    0 = t_{-(m-1)} = ... = t_0 < t_1 < ... < t_k < t_{k+1} = ... = t_{k+m} = 1,   (8)

the mth-order, (m + k)-dimensional B-spline basis {B_{-(m-1),m}, ..., B_{k,m}} (de Boor, 1978) is defined by

    B_{i,m}(t) = (t − t_i)/(t_{i+m-1} − t_i) B_{i,m-1}(t) + (t_{i+m} − t)/(t_{i+m} − t_{i+1}) B_{i+1,m-1}(t),

with the convention 0/0 = 0 and

    B_{i,1}(t) = I{t ∈ [t_i, t_{i+1})}, i = 0, ..., k−1;  B_{k,1}(t) = I{t ∈ [t_k, t_{k+1}]}.   (9)

The B-spline basis is more convenient computationally than the power series basis since each B_{i,m} has support [t_i, t_{i+m}) (de Boor, 1978), making X'X banded and easier to invert.
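As an illustration of recursion (9), here is a small self-contained Python sketch (my own, not from the thesis; names are invented). It handles the 0/0 = 0 convention explicitly, lays the knots out as in (8), and checks the partition-of-unity property Σ_i B_{i,m}(t) = 1 that the repeated boundary knots guarantee.

```python
def bspline(knots, i, m, t):
    """Evaluate B-spline i of order m at t by the Cox-de Boor recursion,
    with knots given as a 0-based list (0/0 terms are dropped)."""
    if m == 1:
        # Indicator of [t_i, t_{i+1}), right-closed at the final knot.
        if knots[i] <= t < knots[i + 1]:
            return 1.0
        if t == knots[-1] and knots[i] < t <= knots[i + 1]:
            return 1.0
        return 0.0
    v = 0.0
    d1 = knots[i + m - 1] - knots[i]
    if d1 > 0:  # skip 0/0 terms
        v += (t - knots[i]) / d1 * bspline(knots, i, m - 1, t)
    d2 = knots[i + m] - knots[i + 1]
    if d2 > 0:
        v += (knots[i + m] - t) / d2 * bspline(knots, i + 1, m - 1, t)
    return v

# Cubic (m = 4) B-splines on [0, 1] with interior knots 0.25, 0.5, 0.75.
m, interior = 4, [0.25, 0.5, 0.75]
knots = [0.0] * m + interior + [1.0] * m  # t_{-(m-1)}, ..., t_{k+m}, 0-based
n_basis = m + len(interior)               # m + k basis functions
total = sum(bspline(knots, i, m, 0.4) for i in range(n_basis))
print(round(total, 10))  # 1.0: B-splines form a partition of unity
```

The local support [t_i, t_{i+m}) shows up directly: evaluating any basis function outside its own knot span returns exactly zero, which is what makes X'X banded.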
Another basis, which is used to define monotone regression splines (Ramsay, 1988), is the I*-spline basis of order m (m ≥ 2), {I*_{-(m-2),m}, ..., I*_{k,m}}, with knots as in (8):

    I*_{i,m}(t) = ∫_0^t M_{i,m-1}(u) du,  i = −(m−2), ..., k,   (10)

where

    M_{i,m-1}(u) = (m−1) B_{i,m-1}(u) / (t_{i+m-1} − t_i).

Since each B_{i,m-1} is a piecewise polynomial of order m − 1 with B_{i,m-1}(t) ≥ 0, each I*_{i,m} is a non-decreasing piecewise polynomial of order m.

However, the above definition of the I*-spline basis is incomplete, since I*_{i,m}(0) = 0, i = −(m−2), ..., k, and there exist functions f in S_m(t_1, ..., t_k) with f(0) ≠ 0. Also, the basis for an (m + k)-dimensional space should have m + k elements, so an extra basis element, I*_{-(m-1),m}, should be included.

Theorem 1. Let I*_{-(m-1),m} ≡ 1. Then {I*_{-(m-1),m}, I*_{-(m-2),m}, ..., I*_{k,m}} is a basis for S_m(t_1, ..., t_k).

Proof: Since {B_{-(m-1),m}, ..., B_{k,m}} is a basis for S_m(t_1, ..., t_k), it suffices to show that any B_{l,m} (l = −(m−1), ..., k) can be written as a linear combination of I*_{-(m-1),m}, I*_{-(m-2),m}, ..., I*_{k,m}.

From de Boor (1978, page 138, equation 9),

    d/dx B_{l,m}(x) = M_{l,m-1}(x) − M_{l+1,m-1}(x),  l = −(m−2), ..., k−1,
                    = −M_{l+1,m-1}(x),               l = −(m−1),
                    = M_{k,m-1}(x),                  l = k,   (11)

where M_{l,m-1}(x) = (m−1) B_{l,m-1}(x) / (t_{l+m-1} − t_l). Integrating (11) from 0 to t and using B_{l,m}(t) = ∫_0^t (d/du) B_{l,m}(u) du + B_{l,m}(0) gives

    B_{l,m}(t) = I*_{l,m}(t) − I*_{l+1,m}(t) + c_l,  l = −(m−2), ..., k−1,
               = −I*_{l+1,m}(t) + c_l,               l = −(m−1),
               = I*_{k,m}(t) + c_k,                  l = k,   (12)

where

    c_l = B_{l,m}(0).   (13)

Since I*_{-(m-1),m} ≡ 1, each constant c_l may be written as c_l I*_{-(m-1),m}(t), so

    B_{l,m}(t) = I*_{l,m}(t) − I*_{l+1,m}(t) + c_l I*_{-(m-1),m}(t),  l = −(m−2), ..., k−1,
               = −I*_{l+1,m}(t) + c_l I*_{-(m-1),m}(t),               l = −(m−1),
               = I*_{k,m}(t) + c_k I*_{-(m-1),m}(t),                  l = k.   (14)

Therefore, for any B_{l,m}, there exist a_i's such that B_{l,m}(t) = Σ_{i=-(m-1)}^k a_i I*_{i,m}(t). Hence, {I*_{-(m-1),m}, I*_{-(m-2),m}, ..., I*_{k,m}} is a basis for S_m(t_1, ..., t_k).
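Definition (10) can be checked numerically. The sketch below is my own (not the thesis's code): it re-implements the B-spline recursion locally so the snippet stands alone, and substitutes a trapezoid rule for the exact piecewise-polynomial integral. It evaluates one quadratic I*-spline and confirms that it is non-decreasing, rising from 0 at t = 0 to 1 at t = 1 (the latter because ∫ M_{i,m-1} = 1 by construction).

```python
def bspl(knots, i, m, t):
    # Cox-de Boor recursion with the 0/0 = 0 convention.
    if m == 1:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    v = 0.0
    d1 = knots[i + m - 1] - knots[i]
    if d1 > 0:
        v += (t - knots[i]) / d1 * bspl(knots, i, m - 1, t)
    d2 = knots[i + m] - knots[i + 1]
    if d2 > 0:
        v += (knots[i + m] - t) / d2 * bspl(knots, i + 1, m - 1, t)
    return v

def istar(knots, i, m, t, steps=2000):
    # I*_{i,m}(t) = integral_0^t M_{i,m-1}(u) du with
    # M_{i,m-1}(u) = (m-1) B_{i,m-1}(u) / (t_{i+m-1} - t_i),
    # approximated by the trapezoid rule on `steps` subintervals.
    scale = (m - 1) / (knots[i + m - 1] - knots[i])
    h = t / steps
    f = [scale * bspl(knots, i, m - 1, j * h) for j in range(steps + 1)]
    return h * (sum(f) - 0.5 * (f[0] + f[-1]))

m, interior = 3, [0.5]                    # quadratic I-spline, one interior knot
knots = [0.0] * m + interior + [1.0] * m  # knot layout as in (8), 0-based
vals = [istar(knots, 2, m, t) for t in (0.25, 0.5, 0.75, 1.0)]
print(all(a <= b + 1e-9 for a, b in zip(vals, vals[1:])))  # True: non-decreasing
```

In the thesis's indexing, list position 2 here corresponds to I*_{0,m}; the shift by m − 1 translates the thesis's index range −(m−2), ..., k into 0-based positions.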
From the definitions of / l  J3/ .(0 =  l = k  b  a  s  i  s  f  o  r  2  S (h,. m  EL-( -i)«.^ (<)' m  m  .., t*).  In order to study the relation between I* and B-splines more easily, we defined an improved I-spline basis of order m as r* Ii, (t) = / Mi, -i(u)du m  J o  m  X + £ i,m(°)i i=i B  i = ~(  m  - 1), • • •,  M_( _i) _i(ti) =  where  m  Since 22-( -i) B^ (t) m  im  = 1, I-( -i),m(t)  m  (15)  0.  = 1. W i t h the inclusion of the constant terms,  m  E J U -S(, (0), the relationship between the I-spline and B-spline bases is easily stated. m  Theorem 2. Given a choice of knots satisfying (8), if Pj = E i  ( r o  Yl&jlj,m  = YLfiiBj,-™.  if and only  _i)  Proof: It suffices to show that I»,m(i) = E?=:  Bi (t). tTn  From (14), we have -It , (t) +1  BUt)  + IUt)  m  = { -Z _ (t) {m  + B.  2)tm  I = - ( m - 2 ) , . . . ,fc- 1  + B (0) l<m  ( n i  _ , (0) 1 )  l = -(m-  m  ^ r . (t)+B (0) k m  1)  (16)  l =k  ktm  Combining (16) and the definition of I-splines yields —ii+i,m(<) + Ej=/+i Bj (Q)  + Ii  tm  +B (0)  l =  lim  Bl, (t) m  - J2 =i k  tTn  Bj (0) im  -(m-2),...,k-l  = < - I - ( m - 2 ) , ( < ) + E * = - ( m - 2 ) B (0) m  +  jtm  h,m{t) - B (0) ktm  + B , (0)  (m  1)im  (0)  I= - ( m  - 1)  I= k  k m  -ii+l,m(*) + Il,m(t)  S_ _  1  = ~(  — 2), . . . ,k — I  m  — J-(m-2),m(*) + I-(m-l),m  I = —(  ijfc,m(*)  *= k  m  ~ 1)  Therefore Ii,m = E Bl l=i  tT  (17)  4. Monotone Splines In some applications, one may want to require that the regression function be monotone. For instance, if / ( £ ) is a response to a drug administered at dosage t, it 10  is reasonable to assume that / is a non-decreasing function of t. Three definitions of non-decreasing splines of order m are presented. (a) Improved Ramsay's monotone I-splines For  Ii m(t) t  as i n (15), a non-decreasing regression spline is defined as k  /(*)=  E  <*.•£>(*), w i t h a i > 0 ,  i=  -(m-2),...,k.  
i=-(m-l)  Since  B[ -i(t) tTn  is non-negative, by definition (10),  is a non-decreasing function  of t. Thus, / is non-decreasing. (b) Kelly and Rice's monotone B-splines (Kelly &: Rice, 1989) For B-splines defined as i n (9), Kelly and Rice define a non-decreasing regression spline as k /(*)=  E  With /?_(„,_!) < . . . < 0 k  t'=-(m-l)  (c) Non-decreasing splines k  /(*)  =  E  PiBi,m(t)  is non-decreasing subject to the condition that/'(t) > 0.  :=-(m-l)  (For the case m = 2, in which case /'(£;) is undefined, take f'(t) > 0 almost everywhere.) Theorem 3. W i t h the same choice of knots and order, the set (a) of monotone splines of Ramsay is equal to the set (b) of monotone splines of Kelly and Rice. This follows directly from Theorem 2. Theorem 4. Suppose that  m  = 1, 2 or 3. If f(x)  =  Xw=-( -i) PiBi,m(x), m  monotone if and only if /?_( _!) < /?_( _ ) < . . . < 0km  m  2  Proof: First consider the case m = 1. Then k f( ) x  k-l  = 120i iA ) B  1=0  x  = EM*«<*<t«+i} + 1=0  11  ^i{t < <t }k  x  k+1  then / is  For x G [ti, t,+i), i = 1 , . . . , k, and y G [tj, t ),  j = 1,..., k with y > x,  j+1  f(x)=/3i  and f(y) = p . j  Thus f is monotone, if and only if / ( y ) > / ( x ) , if and only if 0j > /?,-,  j > i, and  i = 0 , . . . , k. For cases m = 2 and m — 3, from (11) /'(*)=  £ /?i^ (x) ;=-( -i) m  m  =  ( ™ - i ) i -/3-(m-i)— ; K t\ — r_( _ ) Bk,m-l(x) -i(*) ) +0 m  =  (m-1)  £  f* ~ ^  l=-(m-2)  t  ' + m - l  \  V  =  2  m  +T H tl+ -l—t[/  —  2  m  (18)  m  tl  we have  X Pi — 01-1 rj  /  _  X Pi — 01-1  N  f = 0 '+! ' r  A l - : : i _i _ ) W+m W+l  B,, -i(x).  ~  For the case m = 2, for x 7^ „,  +  f=0  r  r  '+  x  r  —  T  '  For x G (ij,r - i), t  +  z  i+l ~ i z  Thus / is monotone, if and only i f / ' > 0 on each interval (t,-,i ), i f and only if !+1  A > 0i-i, i =  0,...,k.  For the case m = 3, since / ' is piecewise linear and continuous, / is monotone if and only if /'(*,-) > 0, i = 0 , . . . , k + 1. 
From the definition of B-splines,

    B_{l,2}(x) = (x − t_l)/(t_{l+1} − t_l) I{t_l ≤ x < t_{l+1}} + (t_{l+2} − x)/(t_{l+2} − t_{l+1}) I{t_{l+1} ≤ x < t_{l+2}}.   (19)

When l = k, I{x ∈ [t_{k+1}, t_{k+2})} is replaced by I{x ∈ [t_{k+1}, t_{k+2}]}. Thus, for l = −1, ..., k,

    B_{l,2}(t_i) = 1 if i = l + 1, and 0 otherwise.

Combining (18) and (19) yields

    f'(t_i) = 2 (β_{i-1} − β_{i-2}) / (t_{i+1} − t_{i-1}).   (20)

Therefore, f'(t_i) ≥ 0 if and only if β_{i-1} ≥ β_{i-2}. Theorem 4 is proved.

From our discussion, we conclude that the definitions of monotone regression splines in (a) and (b) are equivalent for all m and equal the definition in (c) when m ≤ 3.

Chapter 4. Regression

In this chapter, two nonparametric regression methods, least square regression splines and smoothing splines, are introduced and developed for analysing data sets under monotonicity constraints on the regression function. The numerical computation of both methods involves quadratic programming.

1. Quadratic Programming

The general form of a quadratic programming problem with linear inequality constraints is

    min_{x ∈ R^n} c'x + ½ x'Gx  subject to  A'x ≥ b   (21)

for constant matrices G and A and vectors c and b. (For further details, see Gill, Murray and Wright, 1981.) For solving this problem, we introduce the following concepts:

(i) Feasible solution. Any point x that satisfies all the constraints in (21) is called a feasible solution.

(ii) Active and inactive constraints. Let q_i denote the ith column of A, and b = (b_(1), ..., b_(k))'. At the feasible point x, the constraint q_i'x ≥ b_(i) is said to be active if q_i'x = b_(i), and inactive if q_i'x > b_(i).

If the matrix G is strictly positive definite, problem (21) has a unique minimum. If x* is the solution with active constraints A_1'x* = b_1, then x* minimizes c'x + ½ x'Gx subject to A_1'x = b_1. (The ith column of A_1 and the ith element of b_1 correspond to the ith active constraint.) Using the method of Lagrange multipliers,

    c + Gx* − A_1 λ = 0,
Using the method of Lagrange multipliers, c + Gx* - A i A = 0 14  and it can be shown that there exists A satisfying the above such that A, > 0. Conversely, if x* is a feasible point which minimizes c'x + | x G x subject to A\x — b and J  x  if c + Gx* — A\X = 0 for some A with A,- > 0, then x* is the solution to (21). Quadratic programming is an iterative technique. A sketch of the general procedure is: (i) . Find an initial feasible solution, x°. (ii) . A t the k-th step, separate the constraints into two parts, active and inactive, where A\x  k  = b represents the active constraints. x  (iii) . Minimize c'x + | x ' G x subject to A[x = b and obtain the solution x  (iv) . F i n d A satisfying c + Gx  k+1  x . k+1  — A X = 0. If A,- < 0, remove qf x = bn from the X  xi  active constraints, where g - is the ith. column of A and 6 - is the z'th element of b\. lt  x  lt  (v) . If the boundary of an inactive constraint is attained or violated by x  f c + 1  , add the  constraint to the active set. (vi) . Go back to (iii), using the new active constraint set as defined i n (iv) and (v), and follow the procedure until the active constraint set doesn't change. 2. L e a s t S q u a r e R e g r e s s i o n S p l i n e s Consider the regression problem where we have observations yi at design points and assume the observations of (x,-, y,),  i = 1,..., n are related by  the regression model yi = f(xi) + ei,  *= l,...,n  (22)  where the e,- are zero mean uncorrelated random variables with a constant variance cr , a < Xj < Xi < ... < x 2  n  < 6, and assume / is an unknown function in the Sobolev  space W™[a, b]. Given a basis { X _ ( _ ! ) , . . . , Xk} for S [ti,..., m  m  15  4 ] , we approximate /  by a function of the form «(*) =  £ j=-(m-l)  Set Q = ( / ? _ ( „ _ ! ) , . . . , f t ) , X = {X },  X^  4  i:j  = Xj(xi),  i = l,...,n,  j = -(m  -  1),. . . , k, y = ( j / i , . . . 
, y ) - If X is of full rank m + k < n, the least square spline f  n  estimator of u for the given basis is m+k u(t) = £ PjXj(t), i=i  where J3 minimizes ||y - Xg\\ and 2  is given by J3 =  [X X}~ X y. t  1  t  In some applications, a monotone function, u, is required. If monotonicity constraints are imposed, the fitted parameter Q is changed. There are two types of monotonicity constraints, as discussed in Chapter 3, Part 4. Below, these are formulated as quadratic programming problems, for numerical computation. (i) . Kelly and Rice's monotone splines Choosing B-splines as in (9) of Chapter 3 as the basis of S , m  Bj m( i), x  t  X = {X^},  X^ =  i = 1, • •. ,n, j = —(m — 1 ) , . . . , k, § and y are the same as before. Write the  constraints /?_( _x) < . . . < ft in matrix form, A Q > b, where the diagonal elements l  m  of A , a^i, equal —1, and a , 1  jl+1  equal 1, others are zero and b — 0. When the interior  knots ti, i = 1 , . . . , k, are unique, X is of full rank m + k < n, the least square spline estimator of / for the given basis is the unique solution of the following minimization problem, min^ subject to  -y XP x  +  A'g > b.  which can be solved by quadratic programming. (ii) . Point-wise monotone splines  16  ^tfX'Xg (23)  The expression of the non-decreasing constraint in Chapter 3, Part 4, (c) is u'(t) > 0 which can be approximated by the conditions ti(xa) < u(x ) < • • • < u(x ) 2  (24)  n  in numerical computation. As above, we can write the conditions as linear inequality constraints, A/3 > 6, where Aij = i?j (x, i) — JE?j (x,), i = 1,..., n — 1, j = —(m — )7n  +  im  1),..., k and 6 = 0. Then, finding the least square spline estimator of u under the constraints becomes the typical quadratic programming problem. 3. Smoothing Splines 3.1. Unconstrained Smoothing Splines Consider the regression model (22). To estimate / , we want an estimator which fits the data well but, at the same time, has some degree of smoothness. 
The natural measure of smoothness associated with a function f in W_2^m[a, b] is ∫_a^b (f^(m)(t))² dt, while a standard measure of goodness-of-fit to the data is n^{-1} Σ_{j=1}^{n} (y_j − f(x_j))². Thus, one method of combining these two criteria to obtain an overall performance measure is to find f which minimizes

n^{-1} Σ_{i=1}^{n} (y_i − f(x_i))² + λ ∫_a^b (f^(m)(t))² dt,

where λ ≥ 0 controls the relative importance of smoothness and fit. The minimizer, f_λ, is a natural spline of order k = 2m, with simple knots at x_1, ..., x_n (Schoenberg, 1964, Reinsch, 1967). From (5), using the notation of Chapter 2,

min_f  n^{-1} Σ_{i=1}^{n} (y_i − f(x_i))² + λ ∫_a^b (f^(m)(t))² dt  =  min_f  n^{-1} Σ_{i=1}^{n} (y_i − L_i(f))² + λ ||P_1 f||².  (25)

Now, let φ_j(t) = t^{j−1}, j = 1, ..., m, and let η_i, i = 1, ..., n, be the representors of L_i in W_2^m, with L_i(f) = <f, η_i>. Let ξ_i = P_1 η_i, i = 1, ..., n. Then the minimizer, f_λ, can be expressed as

f_λ = Σ_{j=1}^{m} d_j φ_j + Σ_{j=1}^{n} c_j ξ_j.  (26)

Let

(T)_{ij} = φ_j(x_i),  i = 1, ..., n,  j = 1, ..., m,
(Σ)_{ij} = <ξ_i, ξ_j> = ∫_a^b (x_i − u)_+^{m−1} (x_j − u)_+^{m−1} [(m−1)!]^{-2} du.

It can be shown that the coefficients c = (c_1, ..., c_n)' and d = (d_1, ..., d_m)' in (26) are obtained by solving the following problem,

min_{c,d}  ||y − Td − Σc||² + nλ c'Σc  subject to  T'c = 0.  (27)

From its definition, it is easily seen that Σ is positive definite. If T is of full column rank, the minimizer, f_λ, is unique (Kimeldorf & Wahba, 1971, Duchon, 1976, Wahba & Wendelberger, 1980). The equality constraint T'c = 0 can be nicely eliminated from (27) as follows: let Q = (Q_1 : Q_2), orthonormal, and R give the Q-R decomposition of T, that is,

T = (Q_1 : Q_2) (R ; 0),

where R is m×m upper triangular and the zero block is (n−m)×m, and let c = Q_2 e. Since T'c = 0, such a representation always exists. The problem (27) becomes

min_{d,e}  ||y − Td − ΣQ_2 e||² + nλ e'Q_2'ΣQ_2 e.  (28)

Then it is straightforward to find the minimizers, d and e, by differentiating (28).

3.2. Monotone Smoothing Splines

If the smooth function we want to estimate is monotone, then new constraints must be added to d and c. It can be shown that the minimizer of (28), f_λ, subject to f'(t) ≥ 0 for all t, is (i) a polynomial of degree 2m−1 in each open subset of [a, b] − S − {x_1, ..., x_n}, (ii) constant in every open subset of S, and (iii) has 2m−3 continuous derivatives in [a, b], where S = {t in [a, b] : f_λ'(t) = 0} (Utreras, 1985). The exact solution cannot be found explicitly and is difficult to calculate numerically. Therefore, as an approximation to the condition f'(t) ≥ 0, we use the conditions in (24), which can be rewritten in the following form:

f(x_2) − f(x_1) ≥ 0
f(x_3) − f(x_2) ≥ 0
...
f(x_n) − f(x_{n−1}) ≥ 0.  (29)

Then the minimizer of (25) subject to the constraints (29) will be of the form

Σ_{j=1}^{m} d_j φ_j + Σ_{j=1}^{n} c_j ξ_j  with  T_2 d + Σ_2 c ≥ 0,  (30)

where (T_2)_{ij} = (T)_{i+1,j} − (T)_{ij}, i = 1, ..., n−1, j = 1, ..., m, and (Σ_2)_{ij} = (Σ)_{i+1,j} − (Σ)_{ij}, i = 1, ..., n−1, j = 1, ..., n. By an argument similar to that in the derivation of (28), the coefficients c = Q_2 e and d in (30) are obtained by solving the following quadratic programming problem:

min_{d,e}  ||y − Td − ΣQ_2 e||² + nλ e'Q_2'ΣQ_2 e  subject to  T_2 d + Σ_2 Q_2 e ≥ 0,

which is equivalent to (writing (d; e) for the stacked coefficient vector)

min_{d,e}  −y'(T : ΣQ_2)(d; e) + (1/2)(d; e)'
    [ T'T        T'ΣQ_2                       ]
    [ Q_2'Σ'T    Q_2'Σ'ΣQ_2 + nλ Q_2'ΣQ_2    ] (d; e)
subject to  (T_2 : Σ_2 Q_2)(d; e) ≥ 0.  (31)

Since T is of full rank and Σ is positive definite, the Hessian,

    [ T'T        T'ΣQ_2                       ]
    [ Q_2'Σ'T    Q_2'Σ'ΣQ_2 + nλ Q_2'ΣQ_2    ],

is positive definite, and so the unique solution of (31) exists.

4. Choice of the Smoothing Parameter

In smoothing splines, the effect of the smoothing parameter λ, which describes the smoothness of the fitted function, is embodied in the minimizing criterion,

n^{-1} Σ_{i=1}^{n} (y_i − f(x_i))² + λ ∫_a^b (f^(m)(t))² dt.

The bigger the λ, the smoother the fitted curve. In regression splines, the interior knots play a role similar to λ in fitting functions. We assume the unknown function f is in the space S_m[t_1, ..., t_k]. Given the basis {X_{−(m−1)}, ..., X_k} of S_m[t_1, ..., t_k], we express f as Σ_{j=−(m−1)}^{k} β_j X_j(x) and find least square estimates. The number and positions of the interior knots {t_1, ..., t_k} determine the shapes of the basis splines, X_j, and thus in turn the shape of the final function. For choosing the interior knots, we can start with a few more equally spaced interior knots than we might need, then choose a subset of knots to get a smoother estimate of f. Thus, in regression splines, a value of λ corresponds to a subset of knots.

The choice of criterion for selecting the smoothing parameter is an important consideration in fitting a function. The choice should depend on a measure of the performance of the estimator, f_λ. Three criteria are widely accepted and used: the loss, the risk, and the prediction risk. The loss L(λ) is defined as

L(λ) = n^{-1} Σ_{j=1}^{n} (f(x_j) − f_λ(x_j))².

Given the data set, the fitted function value f_λ(x_j) is a function of the smoothing parameter λ; hence the loss L(λ) depends on λ. The risk, R(λ), equals the expectation of the loss L(λ). The prediction risk is

P(λ) = E(n^{-1} Σ_{j=1}^{n} (y_j* − f_λ(x_j))²),

where y_j* denotes a new observation with the same distribution as y_j, but independent of y_j. It can be proved that P(λ) = σ² + R(λ). A good choice of λ is one that minimizes either L or R (and hence P). Unfortunately, all three measures involve the unknown function f. The only way around this is to find well-behaved estimates of them. Generalized Cross-Validation (GCV) and the Akaike Information Criterion (AIC) are two such estimates.

4.1.
GCV and Its Asymptotic Properties

The definition of GCV is derived from Cross-Validation (CV). Suppose we have observations (x_i, y_i), i = 1, ..., n, which satisfy model (22). Let f_λ = (f_λ(x_1), ..., f_λ(x_n))', where f_λ is the estimate of f using the smoothing parameter λ. For each fixed λ, we estimate f(x_i) using the (n−1) observations (x_j, y_j), j = 1, ..., i−1, i+1, ..., n, and denote the estimator by f_λ^(i). CV(λ) is defined as

CV(λ) = (1/n) Σ_{i=1}^{n} (y_i − f_λ^(i)(x_i))².

If the estimator f_λ, which is obtained by using all n observations, equals H(λ)y, and

f_λ^(i) = H(λ)(y_1, ..., y_{i−1}, f_λ^(i)(x_i), y_{i+1}, ..., y_n)',

it can be proved that

CV(λ) = (1/n) Σ_{i=1}^{n} ((y_i − f_λ(x_i)) / (1 − H(λ)_{ii}))².

The GCV criterion is defined as

GCV(λ) = (1/n) Σ_{i=1}^{n} (y_i − f_λ(x_i))² / ((1/n) tr(I − H(λ)))².

To choose the best λ, that is, the λ that minimizes GCV, we use the following grid search: given λ_1, ..., λ_N, calculate GCV(λ_i), i = 1, ..., N, and choose the λ_i which gives the smallest value of GCV(λ_i).

Some Asymptotic Properties of GCV (see Li, 1985, and Speckman, 1985)

(i). When the sample size n is large, GCV(λ) is nearly an unbiased estimator of the prediction risk P(λ).
(ii). GCV(λ) represents an unbiased estimator of the prediction risk, not for f_λ itself, but for the estimator u_λ = H*(λ)y, where H*(λ) = (1 + a(λ))H(λ) − a(λ)I with a(λ) = tr[H(λ)]/tr[I − H(λ)].

4.2. AIC and Its Asymptotic Property

AIC, which is based on information theoretic concepts, was proposed by Akaike in 1971 (Sakamoto et al., 1987). It can be used for statistical model selection in a wide range of situations. When a model involving q independently adjusted parameters is fitted to data, the AIC is defined by

AIC(q) = −2 ln(maximized likelihood) + 2q.

In the regression model with normal errors,

AIC(q) = n ln(n^{-1} Σ_{j=1}^{n} (y_j − f(x_j))²) + 2q.

AIC is closely related to GCV. Under the assumption that q/n → 0 as n → ∞,

exp[AIC(λ)/n] ≈ GCV(λ).
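The grid search just described can be sketched numerically. In the sketch below a ridge-penalized polynomial smoother stands in for the smoothing spline; the basis, penalty, data, and seed are illustrative assumptions, not the thesis's computation, but the GCV formula is the one in the text.

```python
# A sketch of choosing lambda by a GCV grid search for a linear smoother
# f_lam = H(lam) y.  Basis, penalty and data are hypothetical stand-ins.
import numpy as np

rng = np.random.default_rng(1)
n = 80
x = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, n)

X = np.vander(2.0 * x - 1.0, 6, increasing=True)   # stand-in basis
Omega = np.diag([0.0, 0.0, 1.0, 1.0, 1.0, 1.0])    # penalize non-linear terms

def gcv(lam):
    # hat matrix of the penalized smoother, then
    # GCV(lam) = (1/n)||y - f_lam||^2 / ((1/n) tr(I - H(lam)))^2
    H = X @ np.linalg.solve(X.T @ X + n * lam * Omega, X.T)
    resid = y - H @ y
    return float(np.mean(resid ** 2) / np.mean(1.0 - np.diag(H)) ** 2)

grid = 10.0 ** np.arange(-8.0, 2.0)                # lambda_1, ..., lambda_N
scores = np.array([gcv(l) for l in grid])
best_lam = grid[np.argmin(scores)]                 # lambda minimizing GCV
```

The same loop could score AIC instead by replacing the GCV formula with n ln(RSS/n) + 2 tr H(λ).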
4.3. GCV Applied to Monotone Constrained Problems

A key to GCV in the unconstrained case is that f_λ = H(λ)y, where H(λ) does not depend on y. In the constrained case, f_λ = H(λ)y where H(λ) depends on y, but only through the set of constraints which are active at the minimizing β. Under constraints, in both regression splines and smoothing splines, the problem is

min_β  −y'Cβ + (1/2)β'Gβ  subject to  A'β ≥ 0,  (32)

with A not depending on the y's. In regression splines, f_λ = Xβ; in smoothing splines, β = (d' : e')' and f_λ = (T : ΣQ_2)β.

Let β be the minimizer of (32), and let A_1 consist of the columns of A corresponding to the constraints active at β, so that A_1'β = 0. If

A_1 = (Q_1 : Q_2)(R ; 0),

then β = Q_2 e, where e minimizes −y'CQ_2 e + (1/2)e'Q_2'GQ_2 e. Thus

β = Q_2(Q_2'GQ_2)^{-1}Q_2'C'y  (33)

and f_λ = H(λ)y. But A_1 depends on y and thus so does H(λ). Therefore the unconstrained GCV results do not hold here.

4.3.1. The Approximate Form of GCV in Regression Splines

Let A_1 correspond to the active constraints at the solution of the constrained minimization problem (23). Then

trace(H(λ)) = rank(X) − rank(A_1) = m + k − rank(A_1) = l.

The rank of A_1 can be obtained by doing the Q-R decomposition of A_1. The approximate form of GCV(λ) is

GCV(λ) = (1/n) Σ_{i=1}^{n} (f(x_i) − y_i)² / (1 − l/n)².

How do we use GCV(λ) to choose interior knots? Suppose the domain of the fitted function is [0, 1]. The two end knots are 0 and 1. Consider all the possible non-empty subsets of {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}. There are 511 members in this collection; with the empty set, we get 512 combinations. For each subset, λ_j, compute GCV(λ_j) using interior knots equal to the subset. Choose the set of interior knots which minimizes the GCV score.

4.3.2. The Approximate Form of GCV for Monotone Constrained Problems in Smoothing Splines

From (31) and (32), we have

C = (T : ΣQ_2),  A' = (T_2 : Σ_2 Q_2),

and G equal to the Hessian displayed in (31). Let T_{21} and Σ_{21} be composed of the rows of T_2 and Σ_2 corresponding to the constraints active at the solution point, so that A_1' = (T_{21} : Σ_{21}Q_2). Let the Q-R decomposition of A_1 be

A_1 = (U_1 : U_2)(V ; 0).

From (33),

f_λ = Cβ = CU_2(U_2'GU_2)^{-1}U_2'C'y,

so

H(λ) = CU_2(U_2'GU_2)^{-1}U_2'C',
tr H(λ) = tr((U_2'GU_2)^{-1}U_2'C'CU_2).

Therefore, the approximate form of GCV(λ) is

GCV(λ) = (1/n) Σ_{i=1}^{n} (f(x_i) − y_i)² / (1 − tr(H(λ))/n)².

4.4. AIC in Monotone Constrained Problems in Regression Splines

In monotone constrained problems, the value of q in the definition of AIC in 4.2 is the number of truly free parameters, namely, the difference between the number of parameters in the unconstrained case and the number of active constraints. Compute the AIC score for each sequence of knots mentioned in 4.3.1 and choose the set of knots which minimizes the AIC.

Chapter 5. Data Analysis 1: the Relationship between Survival Probabilities and Birthweight of Male Infants

In this chapter, we apply the monotone least square regression splines and smoothing splines described in Chapter 4 to the estimation of the relationship between survival probabilities and birthweights of male infants. We discuss techniques to study the assumption that the regression function is monotone and give some preliminary simulation results. A short description of the data set and exploratory data analysis are given in the first two sections.

1. Description of the Data Set

This is a survival data set from 7037 male births, taken from Table 2(i) in Karn and Penrose (1951). Data are birthweight (in pounds, rounded to the nearest half pound) and whether or not the infant survived. It is believed that infant survival is closely related to birthweight. Two possibilities for the relationship are believed to exist.
The first is that survival probability increases with birthweight. The second is that there is an optimal birthweight: smaller infants are weaker and less likely to survive, while larger infants are associated with more difficult births. The purpose of the analysis of the data set is to determine if survival probability increases with birthweight.

2. Exploratory Data Analysis

Table 1 contains the data set and summary statistics.

Table 1. Male Births Data

Birthweight   Number of infants   Fraction surviving (p_i)   Standard error of p_i
 1.0            13                0                          0
 1.5             8                0                          0
 2.0            20                0.05                       0.0487
 2.5            15                0.13333                    0.0878
 3.0            39                0.25641                    0.0699
 3.5            46                0.56522                    0.0731
 4.0            60                0.76667                    0.0546
 4.5            99                0.81818                    0.0388
 5.0           170                0.90588                    0.0224
 5.5           360                0.92222                    0.0141
 6.0           721                0.97781                    0.0055
 6.5          1002                0.97505                    0.0049
 7.0          1403                0.97078                    0.0045
 7.5          1141                0.97721                    0.0044
 8.0           942                0.98089                    0.0045
 8.5           459                0.96514                    0.0086
 9.0           301                0.98671                    0.0066
 9.5           132                0.96212                    0.0166
10.0            63                0.90476                    0.0370
10.5            25                0.92000                    0.0543
11.0            15                0.86667                    0.0878
11.5             2                1.0000                     0
13.0             1                0.0000                     0

Before doing regression analysis, we draw the scatter plot of the fraction of infants
A n a l y s i s o f t h e D a t a b y L e a s t S q u a r e M o n o t o n e  Regression Splines  For calculational convenience, we make a transformation of the data.  Let z- = t  sin ^/y,/n,-, i = 1,. . . , 19 (not including the first and last two data), where y, and _1  rii equal the number surviving and the number of observations at the ith birthweight respectively. Assuming that yi is binomial, the asymptotic variance of Z{ is consistently estimated by V{ — l / ( 4 n ) as rii approaches infinity. The simplicity of the asymptotic t  variance is the primary justification for considering the inverse sine transformation. The weighted regression analysis is equivalent to treating z^j\Jv~i as normal with unit variance with  u(a:0A/5 = E(-p=)= V » U  £  ^-(xO/v^,  j=-(m-l)  where X,- = 1 + .5 * (i — 1), u and Xj have the same meanings as i n Chapter 4, Part 2 with m , the order, equal to 4. The least square regression spline techniques with two types of monotonicity constraints (Kelly and Rice and point wise monotonicity) are then applied to the transformed data,  z,/y/vf.  In Kelly and Rice's method, G C V and A I C choose the same set of knots, {0,0.1, 0.4, 0.6,0.9,1}. In the point-wise monotone method, G C V and A I C choose {0,0.1,0.3,0.6, 29  0.8,0.9,1}. Figure 2 and Figure 3 display the two fitted curves where the vertical axis points are pi =(sin(S,)) . It is observed that the two fitted curves are almost 2  same.  The estimated survival probabilities of the infants increase almost linearly  for birthweights between 2 and 6 pounds. Between 6 and 11 pounds, the estimated survival probabilities are about 97%. The unconstrained estimates of the survival function are also shown in Figure 2 and Figure 3. Each unconstrained estimate was calculated using the same knots as in the corresponding constrained estimate. 4. 
Analysis of the Data by Monotone Smoothing Splines We apply fourth order monotone smoothing splines to the transformed data to fit the unknown function relating survival probability and birthweight. From the plot of the G C V score in Figure 4, we see that the best choice of the smoothing parameter, A, is 0.019. The fitted curve corresponding to A = 0.019 is shown in Figure 5. From this fit, we reach the same conclusion about the relationship between birthweight and survival probabilities as before. 5. Some Discussions on Testing the Monotone Assumption of the Regression Function In the previous data analysis, we assume that the unknown regression function, / , is monotone. Naturally, we ask: Is this assumption acceptable? Defining a suitable test statistic and determining its null distribution is hard. Instead, we do some preliminary analysis. We propose three summary statistics and carry out simulations to determine their distributions. 5.1. The Summary Statistics Someone who believes that / is not monotone wants to test H : / is monotone, vs Hi'. No restrictions on / . 0  30  We approximate / by a fourth order spline  j=-(m-l) and estimate u using the least square regression methods with generalized cross validation, as discussed in Chapter 4, Part 2 and Part 4.3.1. Under the null hypothesis, with Kelly and Rice's constraints on the /?/s, we obtain the estimate of / , / . The 0  unconstrained least squares estimate of / , using the knots chosen i n the constrained case, is f\. One statistic, similar to that i n the classical regression problem, is T  _  \\U-h\?l{dh-dh) h-LVIdh  where dfo represents the degrees of freedom of the model estimating / under H , dfo = Q  n—tr .ff(A), H(X) is defined in Chapter 4, Part 4.3.1, and dfi represents the degrees of freedom of the model estimating / under Hi, dfi = n—rank of X, X = {X,(x,)}. 
If the null hypothesis were not a constrained one, and if the choice of knots were not data based, then under the null hypothesis T would be F distributed with degrees of freedom df_0 − df_1 and df_1. In Section 5.2, we describe simulation studies to determine if T has an F distribution in the constrained problem. We also study the distributions of R_1/σ² = ||y − f_1||²/σ² and R_0/σ² = ||y − f_0||²/σ², to determine if they are approximately χ² with degrees of freedom df_1 and df_0, respectively.

5.2. Simulation

Let y_i = 7 sin(1.5 x_i) + e_i, i = 1, ..., 50, where x_i = i/51 and e_i ~ N(0, 1.5²). For each of 1000 simulations, use GCV to choose the knots, estimate f_0 and f_1 and calculate the test statistic, T. Since the number of knots and the number of active constraints may vary from simulation to simulation, the 1000 runs will produce different values of df_0 and df_1. Figure 6 and Figure 7 show the Q-Q plots of T with the four most common values of (df_0 − df_1, df_1), under the assumption that T is F distributed. Obviously T is not F distributed, since the points on the Q-Q plots do not lie on the line y = x.

Figures 8, 9, 10, and 11 give the Q-Q plots of R_1/σ² and R_0/σ², under the assumption that these variables are χ². The plots show that the distribution of R_1/σ² is very close to a χ², while R_0/σ² is close to, but a little smaller than, a χ².

We next simulate using the same x_i's and an irregular, monotone regression function, f(x) = 7x I_{0≤x≤0.3} + 7(0.3) I_{0.3<x≤1} + 7(x − 0.8) I_{0.8<x<1}. Figures 12, 13, 14, 15, 16, and 17 show the Q-Q plots of T, R_1/σ² and R_0/σ² from 950 simulations. We reach the same conclusions as in the previous simulations.

5.3. Conclusion

With real data, we seldom know σ², and so cannot calculate the statistics R_1/σ² and R_0/σ². However, if the degrees of freedom in the two models are close and the null hypothesis is correct, the values of R_1 and R_0 should be close. In the male infants data, the value of R_1 equals 4.4222 with df_1 = 11, and the value of R_0 equals 6.8954 with df_0 = 16. Therefore, there appears to be little difference between the constrained and unconstrained fits.

Statistical inference for monotonicity of a regression function is still an unsolved problem in the regression/smoothing spline context. The construction of a suitable test statistic and the calculation of its null distribution could be a topic of future research.

Chapter 6. Description of Song Sparrow Data and Exploratory Data Analysis

1. Data Set Description

The second data set which we will analyse concerns the survival of song sparrows in their first year of life on Mandarte Island in B.C. (Schluter and Smith, 1986). The purpose of our study is to relate survival probabilities of song sparrows in their first year of life to some of their morphological traits. From 1975 to 1978, each year's nestlings were tagged with identifying numbers about six days after hatching. The nestlings were later measured when captured as independent juveniles, usually in late summer or early fall of the same year. We use measurements from birds at least 8 weeks old. In total, 152 male and 145 female song sparrows were observed. On each bird, 6 traits were measured: weight, wing length, tarsus (lower leg) length, beak length, beak depth, and beak width, where weight is in grams and the others are in millimetres. Observations indicated that mortality and emigration occurred primarily during fall and winter. Annual spring censuses therefore determined the fate of individual birds through the previous fall and winter. The response variable, the status of survival, is categorized as 1 or 0, 1 meaning survived, 0 meaning disappeared.

2.
Sex Effect of Birds on the Values of Independent Variables Figures 18, 19, 20, 21, 22, and 23 show the boxplots for the six independent variables. In each picture, from left to right, the three boxplots correspond to the combined, the male and the female song sparrows. We observed that the male and female sparrows differ in all six traits, with the largest differences being in weight and  34  wing length. For quantifying the differences, we compute means, medians, standard deviations and standard errors of the means and list them i n Table 2. Table 2. Mean, Median, Standard Deviation and Standard Error of the Means of the Six Independent Variables Variable  Weight  W i n g Length  Tarsus  Beak Length  Beak Depth  Beak W i d t h  Type of Data  Mean  Median  SD  SE  Overall  24.0110  24.0000  1.7737  0.1029  Male  25.1445  25.0000  1.4184  0.1150  Female  22.8229  22.9231  1.2568  0.1044  Overall  66.4861  66.5000  2.2540  0.1308  Male  68.3416  68.1845  1.2553  0.1018  Female  64.5411  64.4837  1.1610  0.0964  Overall  19.8770  19.9000  0.5922  0.0344  Male  20.1257  20.1250  0.4919  0.0399  Female  19.6162  19.5833  0.5775  0.0480  Overall  8.6625  8.6500  0.3148  0.0183  Male  8.7054  8.6777  0.3138  0.0255  Female  8.6174  8.6010  0.3106  0.0258  Overall  5.8844  5.9000  0.2017  0.0117  Male  5.9346  5.9466  0.1995  0.0162  Female  5.8318  5.8146  0.1908  0.0158  Overall  6.7473  6.7500  0.2001  0.0116  Male  6.8125  6.800  0.1917  0.0156  Female  6.6790  6.6814  0.1860  0.0154  2.2. Pairwise Relations among the Six Independent Variables and Sex Effect on the Relations Figure 24 shows the pairwise relations of the six independent variables for the whole 35  data set where W T is weight, W L is wing length, T is tarsus, B L is beak length, B D is beak depth, and B W is beak width. It seems that W T and W L , W L and T are strongly linearly correlated and that B D and B W , B L and B D , W L and B W , B L and B W are linearly correlated. 
For observing sex effect, we divide the data set into two parts according to the sex of song sparrows and draw the same pictures. Figure 25 shows the data of the female song sparrows, in which all the pairwise linear relations existing in Figure 24 look weak. In Figure 26, which shows the data of male song sparrows, only the relations among B L , B D and B W are similar to those in Picture 24. Figures 27 through 31, enlarged images of some of the small scatter plots in Figure 24, describe the relations between the independent variables more clearly. In those plots, by adding notations f (female song sparrows) and m (male song sparrows), the dominance of sex of the birds on the relation of W T and W L is obvious. Since, for female and male song sparrows, the values of W T and W L are much different, although W T and W L have no strong linear relation, when we put the male's and female's data together, an elliptical shape appears. The phenomenon occurs in other pictures, but not as obviously as in that of W T and W L . 2.3. Principal Component Analysis There are six independent variables in the data set. W i t h principal component analysis, we want to see whether we can use a few principal components to reproduce most of variability of the system, and thus to reduce the data. For instance, if the first principal component summarizes most of the variability in the independent variables, we may want to regress survival probability against the first principal component. We do principal component analysis for the whole data set and for separated data sets of male and female sparrows in order to observe the difference of principal components  36  in the two cases, which somehow indicates the effect of sex. For the whole data set, ^ 0.2803024 ^ 0.2007063 ^ Proportion of total sample ^ variance due to kth  0.1460836 = (  Ai + ... 
+ A  0.1385502  6  ^  principal component  ^  0.1298394 0.1045181  The first three principal components, which represent 63% the variability, are PCI  =  0.46WT + 0.50WL + 0.29T + 0.35BL + 0.39BD + 0.42BW  PC2  =  - 0 . 2 5 W T - 0.30WL - 0.61T + 0.45BL + 0.43BD + 0.29BW  PC3  =  0.49WT - 0.08WL - 0.38T - 0.67BL + 0.39BD + 0.01BW.  For the female sparrow data set, ^ 0.2258542 ^ 0.1944510 ^ Proportion of total sample ^ variance due to kth  0.1654235  •Ai + ... + A  e  ^  principal component  j  0.1438113 0.1414802 0.1289797  In this case, about 58.6% of the total sample variance is attributed to the first three principal components. The first three principal components become PCI  =  0.19WT + 0.30WL - 0.13T + 0.56BL + 0.52BD + 0.52BW  PC2  =  - 0 . 5 6 W T - 0 . 4 0 W L - 0 . 7 1 T + 0.10BL + 0.00BD + 0.16BW  PC3  =  0.54WT - 0.73WL - 0.05T - 0.08BL + 0.39BD - 0.10BW. 37  For the male sparrow data set, 1  0.1999561  ^ Proportion of total sample ^ variance due to kth  0.2473383 ^  0.1599121 = (  •) = Ai + ... + A ,  principal component  0.1445753 0.1307587 0.1174594  Here, the first three principal components, which explain 60.7% variance of total samples are changed into PCI  =  0.29WT + 0.36WL + 0.06T + 0.54BL + 0.49BD + 0.50BW  PC2  =  - 0 . 3 3 W T + 0.53WL + 0.67T + 0 . 1 2 B L - 0 . 3 8 B D - 0 . 0 3 B W  PC3  =  0.87WT + 0.03WL + 0.35T - 0.20BL - 0.13BD - 0.23BW.  From the above analysis, we draw the following conclusions: (i) . The first three principal components are not enough to reproduce most of the variation. (ii) . Since for male and female sparrows the values of the six morphological traits are different and the coefficients in the first three principal components of the traits vary dramatically, analysing the data separately for male and female sparrows is necessary for us to recover the true relationship of response and the independent variables. 2.4. 
Some Rough Pictures of the Relations of Survival Probabilities and the Independent Variables We separate the whole data set into two parts according to the sex of the birds. Let x  x  < x  2  < ... < x  n  denote the independent variable values and y; denote the  corresponding 0 — 1 responses. Divide the range of x into a number of subintervals, 38  as follows. The first subinterval consists of x ,..., Xio+j where j is the biggest nonx  negative integer such that x j lQ+  = x .  The following subintervals are constructed  x0  with the same rule, that is, each subinterval includes 10 points plus the replications (if any) of the tenth point. Let p,- denote the survival probability in the ith subinterval, Pi = { E ^ g g u b i n t e r v a l K y =  1  I j)}/# j x  x  subinterval. The estimate of the survival  e  probability, p,-, i n each subinterval equals the mean of the response i n the subinterval. In logit scale, the estimate is log(p /(l — p,)) = logit(p ). The asymptotic variance 2  t  of logit(p ) is l / ( n ; p ; ( l — pi)), and thus an approximate 95% confidence interval for :  logit(p ) is [log(p,/(l - Pi)) - 2/y/mpiil-pi), t  log(pf/(l - pi)) +  2/y/n p (l-p )]. i  i  i  Figure 32 - Figure 43 are the scatter plots of (x"i, log(pi/(l — pi))) with horizontal bars displaying these confidence intervals, (x, denotes the average of the independent variable values i n the subinterval.) From these plots, for male data, it is hard to see a relationship between the response and any of the independent variables, while for female data, relationships with T , B L and B W may exist. For T , it seems there is a decreasing trend for survival probabilities. Also, i n the plots of B L and B W , some relations may exist. 2.5. Generalized Linear Models In this section, we use generalized linear models to study the relationships between independent variables and survival probabilities. 
Since the response, the status of survival, is binomial and the independent variables are continuous, we model the logit of the survival probabilities as linear functions of independent variables. Although the models chosen may have big lack of fit errors, by comparing the deviances of the models, we can roughly evaluate how much valuable information we can obtain through fitting the curve of each predictor and the response separately. Let W T , W L , T , B L , B D , and B W represent the same variables as before.  39  First look at female song sparrow data. Table 3 shows the deviances and degrees of freedom of the models with a constant, first degree, or second degree polynomial depending on W T , W L , T , B L , B D , or B W .  Model  Deviance  Degree of Freedom  log(p/(l -p))  = a  200.951  144  log(p/(l -p))  =a+ftWT  200.906  143  log(p/(l -p))  = a +  198.981  142  log(p/(l -p))  = a +  200.077  143  log(p/(l -p))  = a + /3 WL+/? WL  197.671  142  log(p/(l  = a + (3 T  184.944  143  log(p/(l -p))  = a +  f3 T+p T  184.767  142  log(p/(l -p))  = a +  ftBL  192.876  143  log(p/(l -p))  = a +  191.239  142  log(p/(l -p))  = a +  200.901  143  log(p/(l -p))  = a+  200.249  142  log(p/(l -p))  = a + fieBW  196.914  143  log(p/(l -p))  = a +  193.183  142  ftWT+ftxWT  2  WL 2  2  22  3  2  3  33  ftBL+^BL  2  ftBD ftBD+/3 BD  2  55  ftBW+fcBW  2  From Table 3, the model, log(p/(l — p)) = a + f3 T, is a significant improvement over 3  the elementary constant model. The linear parts in the relationships of B L and B W with the response can not be neglected, although this is not as obvious as in T . Adding the quadratic term to each model doesn't help us a lot for obtaining a good fit. For explaining the point more clearly, in Table 4, we use all six independent variables to fit models.  40  Table 4: Logistic Models Including Six Independent Variables for Female Song SparrowData Model log(p/(l  -P))  = a + /WT+/WL+/3 T+/? BL+/? BD+/? 
BW  log(p/(l  -P))  = a + AWT+^WL+^T+^BL+^BD+^BW  +P\i W T  3  2  + /3 WL  2  22  +  +  ^  4  4  B  L  2  +  5  p  s  s  B  B  2  6  +  feBW  2  Deviance  DF  174.943  138  167.964  132  From Table 3 and Table 4, we see that second degree polynomials are not good models for fitting the data. Some nonparametric methods, for example, splines, may help us to solve the problem. Next consider the first degree polynomial models of T , B L and B W in Table 3 by adding some of the second interaction terms. Table 5, Table 6, and Table 7 give the results corresponding to T , B L , and B W . Table 5: Logistic Models Including T and Its Second Interaction Term Model  Deviance  Degree of Freedom  184.944  143  log(p/(l -p))  = a + /3 T  log(p/(l -p))  = a + /3 T+/? WT*T  184.536  142  log(p/(l -p))  = a +  /3 T+/3 WL*T  182.562  142  log(p/(l -p))  = a + /? T+/? T*BL  184.597  142  log(p/(l -p))  = a +^ T+^ T*BD  180.358  142  log(p/(l -p))  = a + /? T+/3 T*BW  184.662  142  3  3  13  3  3  3  23  34  3 5  3  36  41  Table 6: Logistic Models Including B L and Its Second Interaction Term Model  Deviance  Degree of Freedom  192.876  143  log(p/(l -P)) = a + /3 BL 4  log(p/(l ~P))  = a + /? BL+/? WT*BL  192.806  142  log(p/(l ~P))  = a + /? BL+& WL*BL  192.871  142  log(p/(l -P)) = a + / ? B L + / ? T * B L  192.327  142  log(p/(l -P)) = a + / ? B L + / ? B L * B D  192.758  142  = a + /? BL+/? BL*BW  191.857  142  4  14  4  4  34  4  log(p/(l  4  45  4  46  Table 7: Logistic Models Including B W and Its Second-Interaction Term Model  Deviance  Degree of Freedom  196.914  143  log(p/(l -P)) = a + / ? B W + / ? W T * B W  196.906  142  log(p/(l -P)) = a + / ? B W + f t W L * B W  196.046  142  log(p/(l -P)) = a + i 3 B W + ^ T * B W  196.913  142  log(p/(l -P)) = a + / ? B W + / ? B L * B W  196.457  142  log(p/(l -P)) = a + / ? B W + / ? 
5 B D * B W  196.747  142  log(p/(l -P)) = a + j8 BW 6  6  16  6  6  6  6  6  46  6  The results in these tables suggest that, among all second interaction terms, only the term T * B D might have some useful information for predicting survival probabilities. The following table includes the same models as in Table 3, but for the male song sparrow data.  42  Table 8: Logistic Models Fit to the Male Song Sparrow Data Model  Deviance  =a  log(p/(l -p)) log(p/(l -p))  a + ft W T  =  log(p/(l -p))  a +  ftWT+ftxWT  a + ft W L  log(p/(l -p))  =a  log(p/(l -p))  + ft W L + f t W L  a + ftT  log(p/(l -p))  a +  ftT+ft T  2  3  log(p/(l -p))  =  a +  ftBL  log(p/(l -p))  =  a +  ftBL-f  log(p/(l -p))  =a  log(j>/(l -p))  -  a +  =a  ft BL  2  4  + ftBD  a +  log(p/(l -p))  2  2  log(p/(l -p))  log(p/(l  2  +  ftBD+ft BD  2  5  ftBW ftBW+ft BW  2  6  Degree of Freedom  196.575  '• 151  196.539  150  194.647  149  196.517  150  196.082  149  196.100  150  194.876  149  196.572  150  196.411  149  196.529  150  194.946  149  196.034  150  194.456  149  From the deviances of the models in the above table, we see that adding first and second degree terms of the independent variables doesn't give much improvement over the constant model. From here, at least we can roughly say, the logit scales of survival probabilities are not linearly related with any one of the independent variables. Conclusions: (i) Compared with the male song sparrow data, the female song sparrow data set shows relationships between independent variables and response. (ii) If we want to analyse the relation between each independent variable and survival probabilities further, including T , B L and B W in female song sparrow might be wise. (iii) Second degree polynomials are not good models for fitting the data. 43  In the following analyses of this data, we will focus on T, BL and BW in the female song sparrow.  44  Chapter 7. 
Binomial Regression Splines For analysing the song sparrow data, we now introduce unconstrained binomial regression splines to motivate the technique for the constrained case. B y modifying the unconstrained method, we propose a technique of estimating monotone binomial regression splines. We define G C V and A I C for knots selection in binomial regression models, and their approximate forms under constraints. Finally, we apply the methods to the female song sparrow data.  1. Unconstrained Binomial Regression Splines Consider independent observations (x,-, y,), i = 1 , . . . , n, where Binomial(l,p(ar,-))  i=  I,...,n.  Let f(t) =log(p(i)/(l — p(t))) and assume / is in the Sobolev space W [ a , b]. This m  2  transformation is made to remove the constraint p 6 (0,1), and to guarantee / is monotone in p. Given the B-splines basis, { I ? _ ( _ i ) , . . . , B_jt, }, of S [ti,..., m  m  irn  m  t ], k  we approximate / by k  j=-(m-l)  Set  § =  (/?_(„!_!),...,  flkY  and y = ( y . . . , y ) - To obtain the maximum likelihood 4  1 ;  n  estimate of the parameter 0, we need the likelihood function of y. The likelihood function of y is n L(8/V)  =  l[p(xi) (l-P(xi)) vi  «'=i  and its logarithm can be expressed as  n  = E{y.-/(*0-ln(l + exp(/(zO))} 45  n  k  f  J2\yi »=1 I  Let  £  k £  - l n ( l + exp(  PjBj, {xi) m  j=-(m-l)  j=-(m-l)  -Sfc, (a;i)  ( B I \  m  •S-(m-l),771(^2)  5 =  -^-(m-2),m  y B-( -\) (x ) m  tm  = Bk, (x )  B_( _2),m( n) x  n  PjBj^Xi)))  m  m  BI  j  n  and assume B is of full column rank. Thus, K6) = £ {yiB'Bi - l n ( l + exp(^ 5 t ))} . t=i 4  Differentiating  with respect to g yields s(§)  =  -  I'm  VV„n exp(^^,) ^ " l + exp(^B,-) 1=1 (  y  i  B  <  (34)  B0  Defining ^  exp(jg'Bi) ^ l+expf^'Bj)  exp(^ B ) i+exp(#<B„) t  n  \  /  We obtain the equation  The maximum likelihood estimate of g is the solution to S(g) — 0. This equation has no explicite solution, but can be solved by Newton-Raphson iterative techinique. 
This technique requires the calculation of the second derivative of I with respect to g. From (34), cPl(g) _ dS{6) dg  2  dp  _ _ ^  expjB'Bj)  t  £ t (1 + e x p ( f t £ ) ) t  46  2  1  *"•  Defining V = d i a g K -..,,„) = d i a g (  (  1  +  e  x  p  ^  (  )  )  , • - -,  2  ( 1  +  e  x  p  (  ^  )  )  2  )>  we obtain the equation = - £  d$  2  ViBiB\  =  -B*VB.  The implementation of Newton-Raphson iterative technique includes three calculations: (i) F i n d an initial estimate of 0, f3 . "k  -fc+1  (ii) Given the kth. step's estimate {5 of § , define §  -t+i  ~k  . =  dS(Jl  -k  w  §" + {B VB)-'B\ -uQk))  (35)  i  U  (iii) Establish a criterion to decide when iteration should be stopped. If, at each step, B VB  is non-singular matrix, then the Newton-Raphson method  l  converges. B being of full rank, B VB t  is necessarily non singular. The maximum  likelihood estimate of Q is the final result of the iteration. The Newton-Raphson method of finding the maximum likelihood estimate of Q is sometimes called iteratively reweighted least squares for the following reason: From (35), we have  8*  1  (B'VB)^  1  {B VB)- BXy-u{f) ))  = 8 + k  =  (B'VB)^  =  t  B $  k  + B\y  -  B*V(BJ3  + V~\u  -  + V-\y  - u),  k  If  g=  1  47  «(£*))•  then, 8  =  k+1  (B'VB^B'Vz  is the solution of the weighted least squares problem min^  \\vh(z-BS)\\\  (36)  where the elements of V and z are the functions of the former estimate of (3. 2. M o n o t o n e B i n o m i a l R e g r e s s i o n Splines Assume / is a monotone function. W i t h each of the two types of monotonicity constraints discussed in Chapter 3, Part 4, we want to estimate the parameter Q which maximizes  1(0)  subject  to  A § > 0. t  To solve the problem, we make a modification of step (ii) of the Newton-Raphson method, using equation (36).  
W i t h V and z as defined in Section 1, @  k + l  is the  solution of the following quadratic programming problem: min subject A t each iteration, if B VB t  \\V*(z - B £ ) | | to  2  A*@ > 0.  (37)  is strictly positive definite, a unique minimum is achieved.  3. G C V i n B i n o m i a l R e g r e s s i o n Splines 3.1. G C V in Unconstrained Binomial Regression Splines In the least squares regression problem, G C V is defined as  where the numerator represents the residual sum of squares. A n analog of the residual sum of squares in the generalized linear models context is the generalized Pearson \ 48  2  statistic. Replacing the numerator in (38) by Pearson \  2  leads to the G C V score  where V, z, §, and the hat matrix, H(A) = 5 ( B ' V B ) ~ B*V, are all computed at the l  final stage of the iteration. It is conjectured that, for smoothing spline fits, the A minimizing the G C V score will approximately minimize the weighted mean square error (O'Sullivan et al., 1986), i/»E»ife)-/fe)) 2  i=i  In binomial regression splines, there are two algorithms to calculate the G C V score. Algorithm 1. For each subset of possible knots, A, estimate the parameter g using the Newton-Raphson algorithm and compute the G C V at the last iteration using (39). Choose the A that minimizes G C V . Algorithm 2. Choose an initial estimate, j3°. Calculate the associated V and the pseudo data z. Fixing z and V, assume that there exists a knot subset, A, for which z is normal with mean B(\)g  and covariance V~ . x  Fit the pseudo data by the usual  method of generalized cross-validation in a weighted normal model, that is, for each value of A, find J(A) to minimize l l t ^ / ^ — B(\)/3)\\ 1  2  2  and then choose the value of  A that minimizes G C V in (39). The resulting § is then used in the next step of the iteration to create a new V and z. Iterate until A is stabilized and the iteration for this stabilized A converges. Algorithm 1 is proposed by O'Sullivan et al. 
(1986) for use with smoothing splines, where A is a real number. Algorithm 2 is proposed by G u (1990a) also in the smoothing spline context.  G u shows that, under the right conditions, the final A chosen by  algorithm 2 satisfies  D(X) 49  where A minimizes E(V (\)) W  = E(GCV(X)),  and in Binomial case, D{A) = E ( £ " = i K  rij represents the number of observations with a particular value  (rij,p(xj),px(xj)) j, S  of x = Xj, K(n, o,P\)  is the Kullback-Leibler distance between a Binomial (n,p ) and  P  0  Binomial (n,pi) and equals e(n,p ) — e(n, i) 0  / e(n,p) =  2 k=Q  \  n k  where  P  \  I  - p) ~ n  k  (k\og{-^—) + nlog(l - p) + log \ l-p  )  \  n " V  t fc  /  ) /  The difference between the two algorithms is explained in G u (1990b), who argues convincingly that algorithm 2 is more suitable. The data are only analysed using the first algorithm, as Gu's method has only recently become known. 3.2. G C V in Constrained Binomial Regression Splines The modification of G C V  in (39) for the constrained problem is similar to that  made in the normal regression model. Specifically, suppose Q is the solution of (37) and A\ is the active constraints set with A\j3 — 0. The minimization problem of (37) has the same solution as min  ||V5(z  Do Q-R decomposition of A  - B§)\\  subject  2  to  A[/3 = 0.  1 ;  = (Qi : Q )  QiR.  2  Let Q = Q2Q. Then, (37)  =  min \\V^z  -  BQ c)\\  =  min (z - BQ cfV(z  2  2  -  2  The estimate of c and J3 are c =  ((BQ ) V(BQ ))- Q B Vz t  2  1  2  50  t  2  t  BQ c). 2  (40)  (Q B VBQ )- Q B Vz t  t  1  2  t  2  t  2  Q2Q =  Q (Q B VBQ )- Q B Vz, t  2  t  1  2  t  2  t  2  and thus z = H(X)  BQ^QlB'VBQ^QlB'Vz  =  BQ (Q B VBQ )- Q B V, t  2  t  2  1  2  t  t  2  and tvH(X)  =  tvdQlB'VBQ^QlB'VBQ^  =  rank of  Q^VBQ^.  The approximate form of G C V in the constrained binomial regression splines is defined as G  C  V  _  \\vKi-BS)\f (n-tr  (H(X))f  This form can be used in either Algorithm 1 or 2. 4. 
A I C in Binomial Regression Splines In binomial regression splines, A I C is defined as AIC(A) = - 2 J2( S*Bi - l n ( l + e x p ^ S , ) ) ) + 2 * q yj  i=i  where 5 represents the number of independent parameters and @ is found by the Newton-Raphson method. Without constraints, q equals m + k. W i t h constraints, q equals m + k minus the rank of the active constraints matrix A\. 5.  Data Analysis: the Relationship Between Survival Probabilities and  Tarsus, Beak Length and Beak Width of the First Year's Female Song Sparrows 51  5.1. Survival Probabilities and Tarsus Exploratory data analysis indicates that as the size of the female song sparrows' tarsus increases, the survival probabilities of the birds decrease. We relate survival probabilities to tarsus using Kelly and Rice's and point-wise monotone binomial regression splines. In Kelly and Rice's method, we assume the parameters of the fitted curve form a decreasing sequence. B y computing G C V and A I C scores, as in Algorithm 1, we obtain the unique best set of knots,{0,0.3,0.4,0.5, 0.7,1}. In the point-wise monotone method, we assume that at the increasingly ordered data points, the values of the fitted function are decreasing. A grid search of G C V and A I C yields the same set of knots, {0,0.2, 0.3,0.6,1}. Figures 44 and 45 show the two fitted curves. In this data set, the two fitted curves are very close. To describe the relation of survival probabilities and tarsus of the female song sparrows more clearly, we divide the values of tarsus length into four subintervals. If the values of tarsus length are from 17.7mm to 18.25mm, the birds have about 100% chance of surviving, from 18.4mm to 19.75mm , the survival probabilities of the birds decrease sharply from over 95% to 37%, if we go further from 19.75mm to 20.3mm, the chance of survival of the birds is about 37%, from 20.3mm to 21.3mm, the chance of survival of the birds drops from 37% to 0. 5.2. 
Survival Probabilities and Beak Length We assume a non-decreasing function fits the relation between beak length and survival probabilities of the female song sparrows. Applying the two types of constraints of Section 5.1 to the data set, we have the following results. W i t h Kelly and Rice's method, G C V and A I C choose different sets of knots. G C V chooses {0,0.2,0.7,1}, A I C chooses {0,1}. For each set of knots, we have the fitted curve in Figure 46 and Figure 47.  52  W i t h pointwise monotone constraints, computation of G C V and A I C also gives different sets of knots. The best sets of knots from G C V and A I C are {0,0.2,0.6, 0.7,1} and {0,0.6,1} respectively. W i t h the two sets of knots, we obtain the fitted curves shown i n Figure 48 and Figure 49. Although the two types of constraints and the two ways of choosing knots result in different sets of knots, the fitted curves are very close. Roughly speaking as the beak lengths of the female birds increase from 7.8mm to 8.8mm, the survival chances of the birds increase from 5% to 62%. For beak lengths greater than 8.8mm, the survival chance is about 63%. 5.3. Survival Probabilities and Beak Width Here we still assume that the regression function relating survival probabilities and beak width of the female song sparrows is monotone. For each type of constraint, G C V and A I C give the same set of knots {0,0.8,1}. The fitted curves are shown in Figure 50 and Figure 51. Both methods give almost identical fits. The chance of survival increases from 0 at 5.9mm beak width to 55% at 6.8mm beak width. Birds with beak width bigger than 6.8mm have about a 55% chance of surviving.  53  Bibliography  Barlow, R . E . , Bartholomew, D . J . , Bremner, J . M . , and Brunk, H . D . (1972). Statistical Inference under Order Restrictions. New York: John Wiley & Sons.  Brunk, H . D . (1955). Maximum Likelihood Estimates of Monotone Parameters. A n n . Math. Statist., 26, 607-616.  de Boor, C. (1978). 
A Practical Guide to Splines. New York: Springer-Verlag.  Duchon, J . (1977).  Splines Minimizing Rotation-Invariant Semi-Norms in Sobolev  Spaces, in Constructive Theory of Functions of Several Variables, eds. W . Schempp and K . Zeller, Berlin: Springer-Verlag, 85 — 100.  Friedman, J . and Tibshirani, R. (1984). The Monotone Smoothing of Scatterplots. Technometrics, 26, 242-250.  G i l l , P . E . , Murray, W . , and Wright, M . H . (1981). Practical Optimization. New York: Academic Press.  G u , C. (1990a). Adaptive Spline Smoothing in Non Gaussian Regression Models. J . Amer. Statist. Assoc., 85, 801-807.  G u , C. (1990b). A Note on Cross-Validating Non Gaussian Data. Technical Report #96. Department of Statistics, University of British Columbia.  L i , K . (1985). From Stein's Unbiased Risk Estimates to the Method of Generalized  54  Cross-Validation. The Annals of Statistics, 13, 1352 — 1377.  Karn, M . N . and Penrose, L . S . (1951). B i r t h Weight and Gestation Time in Relation to Maternal Age, Parity and Infant Survival. A n n . Eugen. 16, 147-164.  Kelly, C. and Rice, J . (1988). Monotone Smoothing with Application to Dose Response Curves and the Assessment of Synergism. Unpublished Manuscript.  Kimeldorf, G . and Wahba, G . (1971). Some Results on Tchebycheffian Spline Functions. Journal of Mathematical Analysis and Applications, 33, 82-95.  Meier, K . (1989). Estimating Rate Equations Using Nonparametric Regression Methods. Institute of Statistics Mimeograph Series No. 1969, North Carolina State University.  O'Sullivan, F., Yandell, B.S. and Raynor, W . J . (1986).  Automatic Smoothing of  Regression Functions in Generalized Linear Models. J . Amer. Statist. Assoc., 81, 96-103.  Ramsay, J.O. (1988). Monotone Regression Splines in Action. Statistical Science, 3, 425-461.  Reinsch, C. (1967). Smoothing by Spline Functions. Numer. M a t h , 10, 177-183.  Schluter, D . and Smith, J . N . M . (1986). 
Natural Selection on Beak and Body Size in the Song Sparrow. Evolution, 40(2), 221-231.  55  Schoenberg, I.J. (1964). Spline Functions and the Problem of Graduation. Proc. Nat. Acad. Sci. U S A 52, 947-950.  Speckman, P. (1985). Spline Smoothing ans Optimal Rates of Convergence in Nonparametric Regression Models. Annals of Mathematical Statistics, 13, 970 — 983  Utreras, F.I. (1985). Smoothing Noisy Data under Monotonicity Constraints Existence, Characterization and Convergence Rates. Numer. Math., 47, 611-625.  Villalobos, M . and Wahba, G . (1987). Inequality-Constrained Multivariate Smoothing Splines W i t h Application to the Estimation of Posterior Probabilities. J . Amer. Statist. Assoc., 82, 239-248.  Sakamoto, Y . , Ishiguro, M . and Kitagawa, G . (1987). Akaike Information Criterion Statistics. Boston: D . Reidel.  Wahba, G . and Wendelberger, J . G . (1980). Some New Mathematical Methods for Variational Objective Analysis Using Splines and Cross Validation. Monthly Weather Review, 108, 1122-1143.  56  Fraction of Surviving  tf ro -  o o Ui c  cp  s  cn — CO  < < cs cm  cn C  00  -  CA  w cr  I.  cm cr  Figure 2. The Relationship between Birthweight and Survival Probabilities of the Male Infants by Least Square Regression Splines  Figure 3. The Relationship between Birthweight and Survival Probabilities of the Male Infants by Least Square Regression Splines  Figure 4. Grid Search on G C V  0.0  0.01  0.02  0.03 Lambda  60  0.04  0.05  0.06  Figure 5. The Relationship between Birthweight and Survival Probabilities of the Male Infants by Smoothing Splines  Figure 6. Q-Q Plots of the Test Statistic T When the Simulation Function Equals 7SIN(1.5x)  in  CM  o c\i  in d o d Quantiles of F-Distribution, df=(3,43)  62  Figure 7. Q-Q Plots of the Test Statistics T When the Simulation Function Equals 7SIN(1.5x)  Quantiles of F-Distribution, df=(4,42)  1  2  3  Quantiles of F-Distribution, df=(5,42)  63  Figure 8. 
Q-Q Plots of R^cr  2  When the  Simulation Function Equals 7SIN(1.5x)  Figure 9. Q-Q Plots of R^/a  2  When the  Simulation Function Equals 7SIN(1.5x)  Figure 10. Q-Q Plots of R /a 0  2  When the  Simulation Function Equals 7SIN(1.5x)  ra o ra o Q T3  CU  •c o CO  o cn o co o  Quantiles of Chisq-Distribution, df=43  o  o o  o CO  o co  Quantiles of Chisq-Distribution, df=44  66  Figure 11. Q-Q Plots of R /cr  2  Q  When the  Simulation Function Equals 7SIN(1.5x)  —,  ,  ,  ,  30  40  50  60  70  Quantiles of Chisq-Distribution, df=45  30  40  50  60  Quantiles of Chisq-Distribution, df=46  67  70  80  Figure 12. Q-Q Plots of the Test Statistic T When the Simulation Function Equals 7xl{ < . y 0  x<0 3  + 7 * 0.31{.3<x<i} + 7(x — 0.8)I{.8<x<i} 0  Quantiles of F-Distribution, df=(4,43)  in  CM  o  CM  in J  Quantiles of F-Distribution, df=(4,42)  68  0  Figure 13. Q-Q Plots of the Test Statistic T When the Simulation Function Equals  7xI{ <x<o.3} 0  + 7 * 0.37{o.3<a;<i} + 7(x — 0.8)I{.8<a;<i} 0  69  Figure 14. Q-Q Plots of i?i/cr When the Simulation 2  Function Equals  7a;J{ <x<o.3} + 7 * 0.3/{ .3<x<i} + 7(x - 0.8)/{ .8<x<i} 0  0  0  re re Q  n<u to  CO  30  40  50  Quantiles of Chisq-Distribution, df=40  70  60  Figure 15. Q-Q Plots of R /(j  2  x  Function Equals  When the Simulation  7xJ{ <x<o.3} + 7 * 0.31{.3<x<i} + 7(x - 0.8)I{ .8<x<i} 0  0  71  0  Figure 16. Q - Q Plots of R /a  2  Q  When the Simulation  Function Equals 7xI{ < o. y + 7 * 0.3^ .3<x<i} + 7(x - 0.8)7{ .8<*<i} Q  x<  3  0  Quantiles of Chisq-Distribution, df=46  72  0  Figure 17. Q-Q Plots of R /a  2  Q  When the Simulation  Function Equals 7xJT{<x<o.3} + 7 * Q-3I{a.z< <i} + 7(x - 0.8)I{ . <x<i} 0  x  0 8  Weight  20  22  24  26  28  CW  c  re M 00  a.  a  Figure 19. 
Boxplots of Wing Length for Mixed(Left), Male(Middle), and Female(Right) Song Sparrow Data  o  oo  CD  cs) c  CD _l D) .=  CD CD  CD  CM CD  75  Tarsus 00  s  CD  to  O  » 0  SL  OP  cr  Ul O  cn  C CA  S» 1  CS  rw P3  o  3  CD  CL  ft) 3?  CL CL  Figure 21. Boxplots of Beak Length for Mixed(Left), Male(Middle), and Female(Right) Song Sparrow Data  10  o  ai  c  cu  _J  ca  CD CO in oo  q cd  77  Figure 22. Boxplots of Beak Depth for Mixed(Left), Male(Middle), and Female (Right) Song Sparrow Data  CD  CM CO  o  CO  Q. CU  Q oo X. ro LO cu CO  CO LO  LO  CM LO  78  Beak Width  6.6  C » i ft to w  6.8  •  W  em  *J  £L ft  plo  nd  09  0  rtcn O >-*»  W ft  Pi  rt-  O £3 •1  l-i  o  o  re /-—V  ft  Sift*  )—-I  Figure 24. Pairwise Scatterplots of the Six Independent Variables for Song Sparrow Data  80  Figure 25. Pairwise Scatterplots of the Six Independent Variables for Female Song Sparrow Data  81  Figure 26. Pairwise Scatterplots of the Six Independent Variables for Male Song Sparrow Data  82  Figure 27. Scatterplot of Weight and Wing Length for Song Sparrow Data  m m m m m _ mm mm m mm m  m jjn  o m  _m m  oo  f  f  f tfff fff  co to  f t fff  t  f  f  CD  1  f  f  f  I pf%  m  m  T f  ft  fff f/ff» ' ffff ff f W f fff t f f  f  f  f ff  • 1  i  1  t fff t  f f f 'f  CM CD  m  m  m^iiimri ™ m m mmm ^nvx RLlwim m m m TIyrrf f^tmn m mrrn m mrcL mmm m f m3m m m m ••• m m m mm 1 It  m  co  mn rmmrrfrim rff  m  m  m m  ff f f  if  20  22  "T~ 24  26  WT  83  28  m  Figure 28. Scatterplot of Weight and Tarsus for Song Sparrow Data  20  1^ 22  24  26 WT  84  28  Figure 29. Scatterplot of Beak Length and Beak Width for Song Sparrow Data  85  F i g u r e 30.  Scatterplot  o f W e i g h t a n d B e a k L e n g t h for  Song Sparrow Data  m f m m m m m m m m f m f m, y% m  m  m  m  mm f ni m m f _' « f rflV , f, ,f f f tffm f f  f m  f ff f f  f  m  f  1  m  !  
}  m  n  1  f f f f  m mm  m  m m m  %  %  « fmf rrmp "m Ln*m.L m m  T  T  f ff f ff f rh  f  f  f j p ^ R l  ^  f f rfi rfi m f W f  f  f  m m  m  m  m  f  f  f  m  f 'f f  20  ff  m  f  f  22  f m  f f  m  m  T 26  24 WT  86  m  nm  28  Figure 31. Scatterplot of Beak length and Beak Depth for Song Sparrow Data  87  Figure 32. The Relationship between Weight and Survival Probabilities of the Female Song Sparrows  88  Figure 33. The Relationship between W i n g Length and Survival Probabilities of the Female Song Sparrows  WL  89  Figure 34. The Relationship between Tarsus and Survival Probabilities of the Female Song Sparrows  18.5  19.0  19.5  90  20.0  20.5  Figure 35. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows  -J  1 8.2  8.4  T~  8.6  8.8 BL  91  9.0  9.2  9.4  Figure 36. The Relationship between Beak Depth and Survival Probabilities of the Female Song Sparrows  co -  CM  -  O -  O)  T-  o  CM  _  92  Figure 37. The Relationship between Beak Width and Survival Probabilities of the Female Song Sparrows  o H  CM - J  ^ -4  BW  93  Figure 38. The Relationship between Weight and Survival Probabilities of the Male Song Sparrows  oo H  to H  CM  H  o H  CM  - J  94  Figure 39. The Relationship between Wing Length and Survival Probabilities of the Male Song Sparrows  •t -  co -  CM  -  O  1.o=>  O  o  "T~  67  68  69 WL  95  70  71  Figure 40. The Relationship between Tarsus and Survival Probabilities of the Male Song Sparrows  19.5  20.0  96  20.5  21.0  Figure 41. The Relationship between Beak Length and Survival Probabilities of the Male Song Sparrows  BL  97  Logit of the Estimates of Survival Probabilities  Figure 43. The Relationship between Beak Width and Survival Probabilities of the Male Song Sparrows  CO  -  CVJ  o -  99  Figure 44. The Relationship between Tarsus and Survival Probabilities of the Female Song Sparrows  Kelly and Rice's Method  18  19  20 Tarsus  100  21  F i g u r e 45. 
T h e R e l a t i o n s h i p b e t w e e n T a r s u s a n d S u r v i v a l P r o b a b i l i t i e s of the Female S o n g S p a r r o w s  00  d  CD  CD  J5  co  n o  CO >  d  CO  C\J  d  o d  Tarsus  101  Figure 46. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows by Kelly and Rice's Method  8.5 Beak Length  102  9.0  9.5  Figure 47. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows by Kelly and Rice's Method  . • ••  /  Knots Chosen By AIC  8.0  8.5  9.0  Beak Length  103  9.5  Figure 48. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows by Pointwise Monotone Method  CO  »  ci  / to  d  d  co d  CM  d  Knots Chosen By GCV  8.0  8.5  9.0  Beak Length  104  9.5  Figure 4 9 . The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows by Pointwise Monotone Method  8.0  8.5  9.0  Beak Length  105  Figure 50. The Relationship between Beak Width and Survival Probabilities of the Female Song Sparrows  Beak Width  106  Figure 51. The Relationship between Beak Width and Survival Probabilities of the Female Song Sparrows  in  d  d CD -O CO  S  CO  o  CO  >  CO  CM  d  o d  107  

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0098248/manifest

Comment

Related Items