MONOTONE REGRESSION FUNCTIONS

By YANLING ZUO
B.Sc., Beijing Institute of Technology, 1984
M.Sc., Beijing Institute of Technology, 1987

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE
in
THE FACULTY OF GRADUATE STUDIES
DEPARTMENT OF STATISTICS

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
September 1990
© Yanling Zuo, 1990

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Statistics
The University of British Columbia
Vancouver, Canada

Monotone Regression Functions

Abstract

In some applications, we require a monotone estimate of a regression function. In others, we want to test whether the regression function is monotone. For the first problem, Ramsay's, Kelly and Rice's, and point-wise monotone regression functions in a spline space are discussed and their properties developed. Three monotone estimates are defined: least squares regression splines, smoothing splines and binomial regression splines. All three estimates depend upon a "smoothing parameter": the number and location of knots in regression splines, and the usual $\lambda$ in smoothing splines. Two standard techniques for choosing the smoothing parameter, GCV and AIC, are modified for monotone estimation in the normal errors case.
For answering the second question, a test statistic is proposed and its null distribution conjectured. Simulations are carried out to check the conjecture. These techniques are applied to two data sets.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgement
Chapter 1. Introduction
Chapter 2. The Sobolev Space $W_2^m[a,b]$
Chapter 3. Splines
  1. Definition of $S_m(t_1, \ldots, t_k)$
  2. Truncated Power Basis
  3. B-splines and I-splines
  4. Monotone Splines
Chapter 4. Regression
  1. Quadratic Programming
  2. Least Squares Regression Splines
  3. Smoothing Splines
    3.1. Unconstrained Smoothing Splines in the Normal Model
    3.2. Monotone Smoothing Splines
  4. Choice of the Smoothing Parameter
    4.1. GCV and Its Asymptotic Properties
    4.2. AIC and Its Asymptotic Property
    4.3. GCV Applied to Monotone Constrained Problems
      4.3.1. The Approximate Form of GCV in Regression Splines
      4.3.2. The Approximate Form of GCV for Monotone Constrained Problems in Smoothing Splines
    4.4. AIC in Monotone Constrained Problems in Regression Splines
Chapter 5. Data Analysis 1: the Relationship between Survival Probabilities and Birthweight of Male Infants
  1. Description of the Data Set
  2. Exploratory Data Analysis
  3. Analysis of the Data by Least Squares Monotone Regression Splines
  4. Analysis of the Data by Monotone Smoothing Splines
  5. Some Discussion on Testing the Monotonicity Assumption of the Regression Function
    5.1. The Summary Statistics
    5.2. Simulation
    5.3. Conclusion
Chapter 6. Description of Song Sparrow Data and Exploratory Data Analysis
  1. Data Set Description
  2. Exploratory Data Analysis
    2.1. Sex Effect of Birds on the Values of Independent Variables
    2.2. Pairwise Relations of the Six Independent Variables and Sex Effect on the Relations
2.3. Principal Component Analysis
    2.4. Some Rough Pictures of the Relations of Survival Probabilities and the Independent Variables
    2.5. Generalized Linear Models
Chapter 7. Binomial Regression Splines
  1. Unconstrained Binomial Regression Splines
  2. Monotone Binomial Regression Splines
  3. GCV in Binomial Regression Splines
    3.1. GCV in Unconstrained Binomial Regression Splines
    3.2. GCV in Constrained Binomial Regression Splines
  4. AIC in Binomial Regression Splines
  5. Data Analysis 2: the Relationship between Survival Probabilities and Tarsus, Beak Length and Beak Width of the First Year's Female Song Sparrows
    5.1. Survival Probabilities and Tarsus
    5.2. Survival Probabilities and Beak Length
    5.3. Survival Probabilities and Beak Width
Bibliography

List of Tables

Table 1. Male Births Data
Table 2. Mean, Median, Standard Deviation and Standard Error of the Means of the Six Independent Variables
Table 3. Logistic Models Fit to the Female Song Sparrow Data
Table 4. Logistic Models Including Six Independent Variables for Female Song Sparrow Data
Table 5. Logistic Models Including T and Its Second-Order Interaction Term
Table 6. Logistic Models Including BL and Its Second-Order Interaction Term
Table 7. Logistic Models Including BW and Its Second-Order Interaction Term
Table 8. Logistic Models Fit to the Male Song Sparrow Data

List of Figures

Figure 1. Fraction Surviving Versus Birthweight
Figure 2. The Relationship between Birthweight and Survival Probabilities of the Male Infants by Least Squares Regression Splines
Figure 3. The Relationship between Birthweight and Survival Probabilities of the Male Infants by Least Squares Regression Splines
Figure 4. Grid Search on GCV
Figure 5. The Relationship between Birthweight and Survival Probabilities of the Male Infants by Smoothing Splines
Figure 6.
Q-Q Plots of the Test Statistic T When the Simulation Function Equals $7\sin(1.5x)$
Figure 7. Q-Q Plots of the Test Statistic T When the Simulation Function Equals $7\sin(1.5x)$
Figure 8. Q-Q Plots of $R_1/\sigma^2$ When the Simulation Function Equals $7\sin(1.5x)$
Figure 9. Q-Q Plots of $R_1/\sigma^2$ When the Simulation Function Equals $7\sin(1.5x)$
Figure 10. Q-Q Plots of $R_0/\sigma^2$ When the Simulation Function Equals $7\sin(1.5x)$
Figure 11. Q-Q Plots of $R_0/\sigma^2$ When the Simulation Function Equals $7\sin(1.5x)$
Figure 12. Q-Q Plots of the Test Statistic T When the Simulation Function Equals $7x I_{\{0 \le x < 0.3\}} + 7(0.3) I_{\{0.3 \le x < 1\}} + 7(x - 0.8) I_{\{0.8 \le x \le 1\}}$
Figure 13. Q-Q Plots of the Test Statistic T When the Simulation Function Equals $7x I_{\{0 \le x < 0.3\}} + 7(0.3) I_{\{0.3 \le x < 1\}} + 7(x - 0.8) I_{\{0.8 \le x \le 1\}}$
Figure 14. Q-Q Plots of $R_1/\sigma^2$ When the Simulation Function Equals $7x I_{\{0 \le x < 0.3\}} + 7(0.3) I_{\{0.3 \le x < 1\}} + 7(x - 0.8) I_{\{0.8 \le x \le 1\}}$
Figure 15. Q-Q Plots of $R_1/\sigma^2$ When the Simulation Function Equals $7x I_{\{0 \le x < 0.3\}} + 7(0.3) I_{\{0.3 \le x < 1\}} + 7(x - 0.8) I_{\{0.8 \le x \le 1\}}$
Figure 16. Q-Q Plots of $R_0/\sigma^2$ When the Simulation Function Equals $7x I_{\{0 \le x < 0.3\}} + 7(0.3) I_{\{0.3 \le x < 1\}} + 7(x - 0.8) I_{\{0.8 \le x \le 1\}}$
Figure 17. Q-Q Plots of $R_0/\sigma^2$ When the Simulation Function Equals $7x I_{\{0 \le x < 0.3\}} + 7(0.3) I_{\{0.3 \le x < 1\}} + 7(x - 0.8) I_{\{0.8 \le x \le 1\}}$
Figure 18. Boxplots of Weight for Mixed (Left), Male (Middle), and Female (Right) Song Sparrow Data
Figure 19. Boxplots of Wing Length for Mixed (Left), Male (Middle), and Female (Right) Song Sparrow Data
Figure 20. Boxplots of Tarsus for Mixed (Left), Male (Middle), and Female (Right) Song Sparrow Data
Figure 21. Boxplots of Beak Length for Mixed (Left), Male (Middle), and Female (Right) Song Sparrow Data
Figure 22. Boxplots of Beak Depth for Mixed (Left), Male (Middle), and Female (Right) Song Sparrow Data
Figure 23. Boxplots of Beak Width for Mixed (Left), Male (Middle), and Female (Right) Song Sparrow Data
Figure 24.
Pairwise Scatterplots of the Six Independent Variables for Song Sparrow Data
Figure 25. Pairwise Scatterplots of the Six Independent Variables for Female Song Sparrow Data
Figure 26. Pairwise Scatterplots of the Six Independent Variables for Male Song Sparrow Data
Figure 27. Scatterplot of Weight and Wing Length for Song Sparrow Data
Figure 28. Scatterplot of Weight and Tarsus for Song Sparrow Data
Figure 29. Scatterplot of Beak Length and Beak Width for Song Sparrow Data
Figure 30. Scatterplot of Weight and Beak Length for Song Sparrow Data
Figure 31. Scatterplot of Beak Length and Beak Depth for Song Sparrow Data
Figure 32. The Relationship between Weight and Survival Probabilities of the Female Song Sparrows
Figure 33. The Relationship between Wing Length and Survival Probabilities of the Female Song Sparrows
Figure 34. The Relationship between Tarsus and Survival Probabilities of the Female Song Sparrows
Figure 35. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows
Figure 36. The Relationship between Beak Depth and Survival Probabilities of the Female Song Sparrows
Figure 37. The Relationship between Beak Width and Survival Probabilities of the Female Song Sparrows
Figure 38. The Relationship between Weight and Survival Probabilities of the Male Song Sparrows
Figure 39. The Relationship between Wing Length and Survival Probabilities of the Male Song Sparrows
Figure 40. The Relationship between Tarsus and Survival Probabilities of the Male Song Sparrows
Figure 41. The Relationship between Beak Length and Survival Probabilities of the Male Song Sparrows
Figure 42. The Relationship between Beak Depth and Survival Probabilities of the Male Song Sparrows
Figure 43. The Relationship between Beak Width and Survival Probabilities of the Male Song Sparrows
Figure 44.
The Relationship between Tarsus and Survival Probabilities of the Female Song Sparrows
Figure 45. The Relationship between Tarsus and Survival Probabilities of the Female Song Sparrows
Figure 46. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows by Kelly and Rice's Method
Figure 47. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows by Kelly and Rice's Method
Figure 48. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows by the Pointwise Monotone Method
Figure 49. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows by the Pointwise Monotone Method
Figure 50. The Relationship between Beak Width and Survival Probabilities of the Female Song Sparrows
Figure 51. The Relationship between Beak Width and Survival Probabilities of the Female Song Sparrows

Acknowledgement

I would like to express my deep gratitude to my advisor, Professor Nancy E. Heckman, for her constant encouragement and very helpful advice, which were crucial to this work and very much beyond it, and for patiently reviewing the draft. Very special thanks are due to Professor Constance Van Eeden and Dr. Jean Meloche for reading the manuscript and making constructive comments. I am indebted to Dr. Dolph Schluter for his help in biology and to Dr. Chong Gu for his help in computation. I also thank Dr. Dolph Schluter and Dr. James N. M. Smith for allowing me to use the song sparrow data. This research was partially supported by the Natural Sciences and Engineering Research Council of Canada, Grant Number 5-87969.

Chapter 1. Introduction

In this paper, effective methods for fitting monotone functions are discussed: least squares regression splines, binomial regression splines and smoothing splines. These estimates are then used to decide if the true regression function is monotone.
In some applications, we require the estimate of the regression function to be monotone. For instance, if $f(x)$ is the response on a test of ability carried out under varying degrees of stress $x$, it is reasonable to expect that $f$ is a decreasing function of $x$. Monotone regression has been used in assessing the interaction between two drugs (Kelly and Rice, 1988). Asynergism is one commonly used definition of a lack of interaction. It is defined in terms of $f(x,y)$, $f_1(x) = f(x,0)$ and $f_2(y) = f(0,y)$, where $f(x,y)$ is the expected response at dosage $x$ of one drug and dosage $y$ of the other. Testing the null hypothesis that the two drugs are asynergistic requires the estimation of $f_1$ and $f_2$ and inversion of the estimates. Since drug response is usually a monotone function of dose and since the estimates of $f_1$ and $f_2$ must be invertible, $f_1$ and $f_2$ are estimated by monotone functions.

Monotone regression is also used in the quantification of a plant's ability to absorb different ions (Meier, 1989). In the study of a corn seedling's ability to absorb nitrogen, the experimenter is interested in the relationship between the nitrate concentration, $u$, of a solution surrounding the roots of a corn seedling and the rate of uptake (absorption) of nitrate by the seedling's roots, $\Phi(u)$. The uptake rate is not measured directly. Instead, the nitrate concentration of the surrounding solution, $C$, is measured over time, and the rate of uptake is determined as minus the derivative of this curve. The relation between $\Phi$ and $u$ can be expressed as
$$\Phi(u) = -\frac{dC}{dt}\Big|_{t = C^{-1}(u)} = -C'(C^{-1}(u)).$$
So we need estimates of $C$ and $C^{-1}$. It is necessary to assume that $C$ is a monotone function to get a unique $C^{-1}$, and thus a unique $\Phi$.

In other applications, we may want to test whether the regression function is monotone. For example, in the study of natural selection, biologists wish to study the relationship between survival and some physical trait.
If the survival function is unimodal, biologists say that selection is stabilizing. If the survival function is monotone, selection is directional.

Some techniques have been presented for estimating monotone functions. Brunk (1955) proposed the Pool-Adjacent-Violators algorithm to solve the following problem: given observations $y_1, y_2, \ldots, y_n$, with $E(y_i) = u_i$, find $\hat{u}_1, \hat{u}_2, \ldots, \hat{u}_n$ to minimize $\sum_i (y_i - \hat{u}_i)^2$ subject to the restriction $\hat{u}_1 \le \hat{u}_2 \le \ldots \le \hat{u}_n$. While Brunk's method gives the maximum likelihood estimate of the $E(y_i)$, assuming the $y_i$ are normal, the resulting estimate is unappealing, due to its roughness. To get a smooth and monotone estimate, Friedman and Tibshirani (1984) combine the running-mean smoothing technique with Brunk's isotonic regression method. Instead of estimating the entire regression function, both methods give estimates of the regression function only at the observed data points.

This paper focuses on fitting monotone functions by regression and smoothing splines. Ramsay (1988) and Kelly and Rice (1989) independently defined monotone regression functions in a spline space. Their definitions and their equivalence are discussed in Chapter 3. A point-wise approximation to monotonicity is also given. Using these monotonicity constraints, monotone least squares regression, smoothing and binomial regression spline estimates are introduced and developed in Chapter 4 and Chapter 7. All three estimators depend upon a smoothing parameter: the number and location of knots in the regression spline estimates, and the usual $\lambda$ in the smoothing spline estimate. For choosing the smoothing parameter in each of the methods, generalized cross validation (GCV) and the Akaike information criterion (AIC) are used. Villalobos and Wahba (1987) introduced inequality-constrained multivariate smoothing splines and applied them to the estimation of posterior probabilities, where the estimated probabilities are constrained to lie between 0 and 1.
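Brunk's Pool-Adjacent-Violators algorithm mentioned above is simple enough to sketch in full. The following Python version (a minimal illustration of my own, with equal weights, not the thesis's software) scans the data once, merging adjacent blocks whose means violate the required ordering:

```python
import numpy as np

def pava(y):
    """Pool-Adjacent-Violators: least-squares fit subject to
    u[0] <= u[1] <= ... <= u[n-1], with equal weights."""
    # Each block stores (mean, size); violating blocks are pooled.
    means, sizes = [], []
    for v in y:
        means.append(float(v))
        sizes.append(1)
        # Merge while the last two blocks violate monotonicity.
        while len(means) > 1 and means[-2] > means[-1]:
            m2, s2 = means.pop(), sizes.pop()
            m1, s1 = means.pop(), sizes.pop()
            s = s1 + s2
            means.append((m1 * s1 + m2 * s2) / s)
            sizes.append(s)
    # Expand the blocks back into a fitted vector.
    fit = []
    for m, s in zip(means, sizes):
        fit.extend([m] * s)
    return np.array(fit)
```

On the data (1, 3, 2, 4) the violating pair (3, 2) is pooled to its mean 2.5, giving the non-decreasing fit (1, 2.5, 2.5, 4); as the text notes, such a fit is piecewise constant and hence rough.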
Techniques for testing the monotonicity assumption are not as well developed as those for estimating a monotone function. Barlow et al. (1972) proposed the generalized likelihood ratio test statistics, $\bar{\chi}^2$ ($\sigma^2$, the variance of $y_i$, known) and $\bar{E}^2$ ($\sigma^2$ unknown), to test the hypothesis $H_0$: $u(x_1) = u(x_2) = \ldots = u(x_k)$ versus $H_1$: $u(x_1) \le u(x_2) \le \ldots \le u(x_k)$, where for each $i$ in $\{1, 2, \ldots, k\}$, $n_i$ values of $y$ are observed when $x$ takes the value $x_i$. The null distributions of $\bar{\chi}^2$ and $\bar{E}^2$ are complicated, especially when $k$ is large, and the usual asymptotic results concerning likelihood ratio tests do not hold. In Chapter 5, we discuss techniques to determine if the true regression function is monotone. Instead of giving a theoretical proof, we carry out simulations to study our techniques. The techniques are applied to two data sets in Chapter 5 and Chapter 7.

Chapter 2. The Sobolev Space $W_2^m[a,b]$

In the regression model, the unknown function $f$ is assumed to be in some large function space. The Sobolev space $W_2^m$ is commonly used. A function $f$ defined on $[a,b]$ is said to belong to the Sobolev space $W_2^m[a,b]$ if $f$ has the following properties:
(i) $f, f^{(1)}, \ldots, f^{(m-1)}$ are absolutely continuous;
(ii) $f^{(m)}$ is square integrable.

The following facts are well known (see, for instance, Kimeldorf and Wahba, 1971):
(i) $W_2^m[a,b]$ is a Hilbert space with inner product
$$\langle f, g \rangle = \sum_{i=0}^{m-1} f^{(i)}(a) g^{(i)}(a) + \int_a^b f^{(m)}(v) g^{(m)}(v) \, dv. \eqno(1)$$
(ii) $L_t$, which has domain $W_2^m[a,b]$ and range $R$, defined by $L_t(f) = f(t)$, is a continuous linear functional with $L_t(f) = \langle K_t, f \rangle$, where $K_t$ is called a reproducing kernel. (2)

Two useful subspaces of $W_2^m[a,b]$ in the theory of smoothing splines are $H_1$ and $H_0$. They are defined as
$$H_1 = \{ f \in W_2^m[a,b] : f^{(j)}(a) = 0, \; j = 0, \ldots, m-1 \}$$
and
$$H_0 = \{ f \in W_2^m[a,b] : f^{(m)}(t) = 0 \text{ a.e.} \},$$
and one can show that $W_2^m = H_0 \oplus H_1$. $P_0$ and $P_1$ denote two linear operators.
$P_0$ is the projection of $W_2^m$ onto $H_0$,
$$P_0(f) = \sum_{i=1}^{m} \langle \psi_i, f \rangle \psi_i, \eqno(3)$$
where $\{\psi_1, \ldots, \psi_m\}$ is a basis for the polynomials of order $m$, which span $H_0$. $P_1$ is the projection of $W_2^m$ onto $H_1$,
$$P_1(f)(t) = f(t) - \sum_{i=1}^{m} \langle \psi_i, f \rangle \psi_i(t), \eqno(4)$$
and
$$\|P_1(f)\|^2 = \int_a^b (f^{(m)}(t))^2 \, dt. \eqno(5)$$
The concepts mentioned above will be used in Chapter 4, Part 3, on smoothing splines.

Chapter 3. Splines

In this chapter, we present the definition of the spline space $S_m(t_1, \ldots, t_k)$. To describe the elements of $S_m(t_1, \ldots, t_k)$, we introduce three bases for it: the truncated power basis, B-splines and I-splines. Finally, we discuss the definition of monotone splines. The concepts and results of this chapter will be applied to regression problems in Chapter 4 and Chapter 7.

1. Definition of $S_m(t_1, \ldots, t_k)$

$S_m(t_1, \ldots, t_k)$ denotes the set of $m$th order splines. A spline of order $m$ with knots $t_0 = 0 < t_1 < \ldots < t_k < t_{k+1} = 1$ is a polynomial of order $m$ on each subinterval $[t_i, t_{i+1})$, and has $m-2$ absolutely continuous derivatives on $[0,1]$. The knots $t_1, \ldots, t_k$ are called interior knots. It is well known that $S_m(t_1, \ldots, t_k)$ has dimension $m + k$ (de Boor, 1978).

2. Truncated Power Basis

One of the bases for $S_m(t_1, \ldots, t_k)$ is the truncated power basis, $X_1, \ldots, X_{m+k}$, where
$$X_1(t) = 1, \quad \ldots, \quad X_m(t) = t^{m-1}, \quad X_{m+i}(t) = (t - t_i)_+^{m-1}, \; i = 1, \ldots, k,$$
with $x_+ = x I\{x \ge 0\}$ and $(x_+)^0 = I\{x \ge 0\}$. If
$$f(t) = \sum_{i=1}^{m+k} \beta_i X_i(t), \eqno(6)$$
then simple calculations reveal:

i) $f$ is a piecewise polynomial of order $m$, since on each subinterval $[t_i, t_{i+1})$ every $X_j$ is a polynomial of order $m$.

ii) $f$ has $m-2$ absolutely continuous derivatives, since each $X_i$ has $m-2$ absolutely continuous derivatives.

Proof: For $i = 1, \ldots, k$,
$$\frac{d^{m-2}}{dt^{m-2}} X_{m+i}(t) = (m-1)! \, (t - t_i)_+. \eqno(7)$$

iii) The $X_i$'s are linearly independent.

Proof: Suppose $f(t) = \sum_{i=1}^{m+k} \beta_i X_i(t) = 0$ for all $t \in [0,1]$. For $t \in [0, t_1)$, $f(t) = \sum_{i=1}^{m} \beta_i t^{i-1} = 0$, which implies $\beta_1 = \ldots = \beta_m = 0$; then for $t \in [t_1, t_2)$, $f(t) = \beta_{m+1} (t - t_1)^{m-1} = 0$, which implies $\beta_{m+1} = 0$.
Working forward in this manner, one easily sees that $\beta_i = 0$, $i = 1, \ldots, m+k$, whenever $\sum_{i=1}^{m+k} \beta_i X_i(t) = 0$ for all $t \in [0,1]$.

Although the simplicity of the truncated power basis makes it attractive for statistical work, the basis tends to be ill-conditioned, as is often manifested in a nearly singular matrix
$$X = \begin{pmatrix} X_1(x_1) & \ldots & X_{m+k}(x_1) \\ X_1(x_2) & \ldots & X_{m+k}(x_2) \\ \vdots & & \vdots \\ X_1(x_n) & \ldots & X_{m+k}(x_n) \end{pmatrix}.$$
Here, $x_1, \ldots, x_n$ represent data points. Among other things, this makes the inverse of $X^t X$ difficult to calculate. Thus, it is desirable to explore the use of other bases which are better conditioned.

3. B-splines and I-splines

Given knots which satisfy
$$0 = t_{-(m-1)} = \ldots = t_0 < t_1 < \ldots < t_k < t_{k+1} = \ldots = t_{k+m} = 1, \eqno(8)$$
the $m$th-order, $m+k$ dimensional B-spline basis $\{B_{-(m-1),m}, \ldots, B_{k,m}\}$ (de Boor, 1978) is defined by the recursion
$$B_{i,m}(t) = \frac{t - t_i}{t_{i+m-1} - t_i} B_{i,m-1}(t) + \frac{t_{i+m} - t}{t_{i+m} - t_{i+1}} B_{i+1,m-1}(t), \quad i = -(m-1), \ldots, k, \eqno(9)$$
with
$$B_{i,1}(t) = \begin{cases} 1 & t \in [t_i, t_{i+1}) \\ 0 & \text{otherwise} \end{cases}$$
and the convention $0/0 = 0$. The B-spline basis is more convenient computationally than the power series basis since each $B_{i,m}$ has support $[t_i, t_{i+m})$ (de Boor, 1978), making $X^t X$ banded and easier to invert.

Another basis, which is used to define monotone regression splines (Ramsay, 1988), is the I*-spline basis of order $m$ ($m \ge 2$), $\{I^*_{-(m-2),m}, \ldots, I^*_{k,m}\}$, where
$$I^*_{i,m}(t) = \int_0^t M_{i,m-1}(u) \, du, \quad i = -(m-2), \ldots, k, \eqno(10)$$
with $M_{i,m-1}(u) = (m-1) B_{i,m-1}(u) / (t_{i+m-1} - t_i)$ and knots as in (8).

Since each $B_{i,m-1}$ is a piecewise polynomial of order $m-1$ with $B_{i,m-1}(t) \ge 0$, each $I^*_{i,m}$ is a non-decreasing piecewise polynomial of order $m$.

However, the above definition of the I*-spline basis is incomplete, since $I^*_{i,m}(0) = 0$, $i = -(m-2), \ldots, k$, while there exist functions $f$ in $S_m(t_1, \ldots, t_k)$ with $f(0) \ne 0$. Also, the basis for an $m+k$ dimensional space should have $m+k$ elements, so an extra basis element, $I^*_{-(m-1),m}$, should be included.

Theorem 1. Let $I^*_{-(m-1),m} = 1$. Then $\{I^*_{-(m-1),m}, I^*_{-(m-2),m}, \ldots, I^*_{k,m}\}$ is a basis for $S_m(t_1, \ldots, t_k)$.
Proof: Since $\{B_{-(m-1),m}, \ldots, B_{k,m}\}$ is a basis for $S_m(t_1, \ldots, t_k)$, it suffices to show that each $B_{l,m}$ ($l = -(m-1), \ldots, k$) can be written as a linear combination of $I^*_{-(m-1),m}, I^*_{-(m-2),m}, \ldots, I^*_{k,m}$.

From de Boor (1978, page 138, equation 9),
$$\frac{d}{dt} B_{l,m}(t) = \begin{cases} (m-1) \left( \dfrac{B_{l,m-1}(t)}{t_{l+m-1} - t_l} - \dfrac{B_{l+1,m-1}(t)}{t_{l+m} - t_{l+1}} \right) & l = -(m-2), \ldots, k-1 \\[1ex] -\dfrac{(m-1) B_{l+1,m-1}(t)}{t_{l+m} - t_{l+1}} & l = -(m-1) \\[1ex] \dfrac{(m-1) B_{k,m-1}(t)}{t_{k+m-1} - t_k} & l = k \end{cases} \eqno(11)$$
and thus
$$B_{l,m}(t) = \begin{cases} \displaystyle\int_0^t \left( M_{l,m-1}(u) - M_{l+1,m-1}(u) \right) du + C_l & l = -(m-2), \ldots, k-1 \\[1ex] -\displaystyle\int_0^t M_{l+1,m-1}(u) \, du + C_{-(m-1)} & l = -(m-1) \\[1ex] \displaystyle\int_0^t M_{k,m-1}(u) \, du + C_k & l = k \end{cases} \eqno(12)$$
where
$$C_l = B_{l,m}(0). \eqno(13)$$
From the definitions of the $I^*_{l,m}$'s we obtain
$$B_{l,m}(t) = \begin{cases} -I^*_{l+1,m}(t) + I^*_{l,m}(t) + C_l & l = -(m-2), \ldots, k-1 \\ -I^*_{-(m-2),m}(t) + C_{-(m-1)} I^*_{-(m-1),m}(t) & l = -(m-1) \\ I^*_{k,m}(t) + C_k & l = k \end{cases} \eqno(14)$$
Therefore, for any $B_{l,m}$, there exist $a_i$'s such that $B_{l,m}(t) = \sum_{i=-(m-1)}^{k} a_i I^*_{i,m}(t)$. Hence $\{I^*_{-(m-1),m}, I^*_{-(m-2),m}, \ldots, I^*_{k,m}\}$ is a basis for $S_m(t_1, \ldots, t_k)$.

In order to study the relation between the I*- and B-splines more easily, we define an improved I-spline basis of order $m$ as
$$I_{i,m}(t) = \int_0^t M_{i,m-1}(u) \, du + \sum_{j=i}^{k} B_{j,m}(0), \quad i = -(m-1), \ldots, k, \eqno(15)$$
where $M_{-(m-1),m-1}(u) \equiv 0$. Since $\sum_{i=-(m-1)}^{k} B_{i,m}(t) = 1$, we have $I_{-(m-1),m}(t) = 1$. With the inclusion of the constant terms $\sum_{j=i}^{k} B_{j,m}(0)$, the relationship between the I-spline and B-spline bases is easily stated.

Theorem 2. Given a choice of knots satisfying (8), $\sum_j \beta_j I_{j,m} = \sum_j \tilde{\beta}_j B_{j,m}$ if and only if $\tilde{\beta}_j = \sum_{i=-(m-1)}^{j} \beta_i$.

Proof: It suffices to show that $I_{i,m}(t) = \sum_{j=i}^{k} B_{j,m}(t)$. From (14), we have
$$\begin{aligned} -I^*_{l+1,m}(t) + I^*_{l,m}(t) &= B_{l,m}(t) - B_{l,m}(0), & l &= -(m-2), \ldots, k-1, \\ -I^*_{-(m-2),m}(t) + B_{-(m-1),m}(0) I^*_{-(m-1),m}(t) &= B_{-(m-1),m}(t), & l &= -(m-1), \\ I^*_{k,m}(t) &= B_{k,m}(t) - B_{k,m}(0), & l &= k. \end{aligned} \eqno(16)$$
Combining (16) and the definition (15) of the I-splines yields
$$B_{l,m}(t) = \begin{cases} I_{l,m}(t) - I_{l+1,m}(t) & l = -(m-1), \ldots, k-1 \\ I_{k,m}(t) & l = k \end{cases}$$
Therefore
$$I_{i,m} = \sum_{l=i}^{k} B_{l,m}. \eqno(17)$$

4. Monotone Splines

In some applications, one may want to require that the regression function be monotone. For instance, if $f(t)$ is a response to a drug administered at dosage $t$, it is reasonable to assume that $f$ is a non-decreasing function of $t$. Three definitions of non-decreasing splines of order $m$ are presented.

(a) Improved Ramsay's monotone I-splines

For $I_{i,m}(t)$ as in (15), a non-decreasing regression spline is defined as
$$f(t) = \sum_{i=-(m-1)}^{k} \alpha_i I_{i,m}(t), \quad \text{with } \alpha_i \ge 0, \; i = -(m-2), \ldots, k.$$
Since each $B_{i,m-1}(t)$ is non-negative, each $I_{i,m}$ is, by definition (10), a non-decreasing function of $t$. Thus $f$ is non-decreasing.

(b) Kelly and Rice's monotone B-splines (Kelly & Rice, 1989)

For B-splines defined as in (9), Kelly and Rice define a non-decreasing regression spline as
$$f(t) = \sum_{i=-(m-1)}^{k} \beta_i B_{i,m}(t), \quad \text{with } \beta_{-(m-1)} \le \ldots \le \beta_k.$$

(c) Non-decreasing splines

$$f(t) = \sum_{i=-(m-1)}^{k} \beta_i B_{i,m}(t) \quad \text{subject to the condition that } f'(t) \ge 0.$$
(For the case $m = 2$, in which $f'(t_i)$ is undefined, take $f'(t) \ge 0$ almost everywhere.)

Theorem 3. With the same choice of knots and order, the set (a) of monotone splines of Ramsay is equal to the set (b) of monotone splines of Kelly and Rice.

This follows directly from Theorem 2.

Theorem 4. Suppose that $m = 1$, 2 or 3. If $f(x) = \sum_{i=-(m-1)}^{k} \beta_i B_{i,m}(x)$, then $f$ is monotone if and only if $\beta_{-(m-1)} \le \beta_{-(m-2)} \le \ldots \le \beta_k$.

Proof: First consider the case $m = 1$. Then
$$f(x) = \sum_{l=0}^{k} \beta_l B_{l,1}(x) = \sum_{l=0}^{k-1} \beta_l I\{t_l \le x < t_{l+1}\} + \beta_k I\{t_k \le x \le t_{k+1}\}.$$
For $x \in [t_i, t_{i+1})$, $i = 0, \ldots, k$, and $y \in [t_j, t_{j+1})$, $j = 0, \ldots, k$, with $y > x$, $f(x) = \beta_i$ and $f(y) = \beta_j$. Thus $f$ is monotone if and only if $f(y) \ge f(x)$ whenever $y > x$, if and only if $\beta_j \ge \beta_i$ for all $j > i$, $i = 0, \ldots, k$.

For the cases $m = 2$ and $m = 3$, from (11) we have
$$f'(x) = \sum_{l=-(m-1)}^{k} \beta_l B'_{l,m}(x) = (m-1) \sum_{l=-(m-2)}^{k} \frac{\beta_l - \beta_{l-1}}{t_{l+m-1} - t_l} B_{l,m-1}(x). \eqno(18)$$

For the case $m = 2$, each $B_{l,1}$ is an indicator function, so for $x \in (t_l, t_{l+1})$,
$$f'(x) = \frac{\beta_l - \beta_{l-1}}{t_{l+1} - t_l}.$$
Thus $f$ is monotone if and only if $f' \ge 0$ on each interval $(t_l, t_{l+1})$, if and only if $\beta_l \ge \beta_{l-1}$, $l = 0, \ldots, k$.

For the case $m = 3$, since $f'$ is piecewise linear and continuous, $f$ is monotone if and only if $f'(t_i) \ge 0$, $i = 0, \ldots, k+1$. From the definition of B-splines,
$$B_{l,2}(t_i) = \begin{cases} 1 & i = l+1 \\ 0 & \text{otherwise} \end{cases} \quad l = -1, \ldots, k. \eqno(19)$$
(When $l = k$, the half-open interval in the definition of $B_{l,2}$ is replaced by the closed interval $[t_k, t_{k+1}]$.) Combining (18) and (19) yields
$$f'(t_i) = 2 \, \frac{\beta_{i-1} - \beta_{i-2}}{t_{i+1} - t_{i-1}}. \eqno(20)$$
Therefore $f'(t_i) \ge 0$ if and only if $\beta_{i-1} \ge \beta_{i-2}$. Theorem 4 is proved.

From our discussion, we conclude that the definitions of monotone regression splines in (a) and (b) are equivalent for all $m$, and equal the definition in (c) when $m \le 3$.

Chapter 4. Regression

In this chapter, two nonparametric regression methods, least squares regression splines and smoothing splines, are introduced and developed for analysing data sets under monotonicity constraints on the regression function. The numerical computation for both methods involves quadratic programming.

1. Quadratic Programming

The general form of a quadratic programming problem with linear inequality constraints is
$$\min_{x \in R^n} \; c^t x + \frac{1}{2} x^t G x \quad \text{subject to} \quad A^t x \ge b \eqno(21)$$
for constant matrices $G$ and $A$ and vectors $c$ and $b$.
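For small instances of (21), a general-purpose constrained optimizer can stand in for a dedicated active-set QP routine. The sketch below is my own illustration: the matrices $G$, $c$, $A$ and $b$ are arbitrary, and scipy's SLSQP solver is used rather than the algorithm of Gill, Murray and Wright:

```python
import numpy as np
from scipy.optimize import minimize

# An arbitrary small instance of (21); G is strictly positive definite,
# so the minimum is unique.
G = np.array([[2.0, 0.0], [0.0, 2.0]])
c = np.array([-2.0, -5.0])
A = np.array([[1.0, 0.0], [0.0, 1.0]])   # columns are constraint normals
b = np.array([0.0, 3.0])                  # constraints: x1 >= 0, x2 >= 3

def objective(x):
    return c @ x + 0.5 * x @ G @ x

constraints = {"type": "ineq", "fun": lambda x: A.T @ x - b}
res = minimize(objective, x0=np.array([0.0, 3.0]), constraints=constraints)
# The unconstrained minimum is (1, 2.5); the constraint x2 >= 3 is
# active, so the constrained solution moves to (1, 3).
print(res.x)
```

At the solution, the multiplier of the active constraint $x_2 \ge 3$ is $\lambda = 1 \ge 0$, since $c + Gx^* = (0, 1)^t$, in agreement with the optimality condition discussed next.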
(For further details, see Gill, Murray and Wright, 1981.) For solving this problem, we introduce the following concepts.

(i) Feasible solution. Any point $x$ that satisfies all the constraints in (21) is called a feasible solution.

(ii) Active and inactive constraints. Let $q_i$ denote the $i$th column of $A$, and let $b = (b_{(1)}, \ldots, b_{(k)})^t$. At the feasible point $x$, the constraint $q_i^t x \ge b_{(i)}$ is said to be active if $q_i^t x = b_{(i)}$, and inactive if $q_i^t x > b_{(i)}$.

If the matrix $G$ is strictly positive definite, problem (21) has a unique minimum. If $x^*$ is the solution with active constraints $A_1^t x^* = b_1$, then $x^*$ minimizes $c^t x + \frac{1}{2} x^t G x$ subject to $A_1^t x = b_1$. (The $i$th column of $A_1$ and the $i$th element of $b_1$ correspond to the $i$th active constraint.) Using the method of Lagrange multipliers,
$$c + G x^* - A_1 \lambda = 0,$$
and it can be shown that there exists a $\lambda$ satisfying the above with $\lambda_i \ge 0$. Conversely, if $x^*$ is a feasible point which minimizes $c^t x + \frac{1}{2} x^t G x$ subject to $A_1^t x = b_1$, and if $c + G x^* - A_1 \lambda = 0$ for some $\lambda$ with $\lambda_i \ge 0$, then $x^*$ is the solution to (21).

Quadratic programming is an iterative technique. A sketch of the general procedure is:

(i) Find an initial feasible solution, $x^0$.
(ii) At the $k$th step, separate the constraints into two parts, active and inactive, where $A_1^t x^k = b_1$ represents the active constraints.
(iii) Minimize $c^t x + \frac{1}{2} x^t G x$ subject to $A_1^t x = b_1$ and obtain the solution $x^{k+1}$.
(iv) Find $\lambda$ satisfying $c + G x^{k+1} - A_1 \lambda = 0$. If $\lambda_i < 0$, remove $q_{1i}^t x = b_{1i}$ from the active constraints, where $q_{1i}$ is the $i$th column of $A_1$ and $b_{1i}$ is the $i$th element of $b_1$.
(v) If the boundary of an inactive constraint is attained or violated by $x^{k+1}$, add that constraint to the active set.
(vi) Go back to (iii), using the new active constraint set as defined in (iv) and (v), and repeat until the active constraint set no longer changes.
2. Least Squares Regression Splines

Consider the regression problem where we have observations $y_i$ at design points $x_i$, and assume the observations $(x_i, y_i)$, $i = 1, \ldots, n$, are related by the regression model
$$y_i = f(x_i) + e_i, \quad i = 1, \ldots, n, \eqno(22)$$
where the $e_i$ are zero-mean uncorrelated random variables with constant variance $\sigma^2$, $a \le x_1 \le x_2 \le \ldots \le x_n \le b$, and $f$ is an unknown function in the Sobolev space $W_2^m[a,b]$. Given a basis $\{X_{-(m-1)}, \ldots, X_k\}$ for $S_m(t_1, \ldots, t_k)$, we approximate $f$ by a function of the form
$$u(t) = \sum_{j=-(m-1)}^{k} \beta_j X_j(t).$$
Set $\underline{\beta} = (\beta_{-(m-1)}, \ldots, \beta_k)^t$, $X = \{X_{ij}\}$ with $X_{ij} = X_j(x_i)$, $i = 1, \ldots, n$, $j = -(m-1), \ldots, k$, and $y = (y_1, \ldots, y_n)^t$. If $X$ is of full rank $m + k \le n$, the least squares spline estimator of $u$ for the given basis is
$$\hat{u}(t) = \sum_{j=-(m-1)}^{k} \hat{\beta}_j X_j(t),$$
where $\hat{\underline{\beta}}$ minimizes $\|y - X \underline{\beta}\|^2$ and is given by $\hat{\underline{\beta}} = [X^t X]^{-1} X^t y$.

In some applications, a monotone function $u$ is required. If monotonicity constraints are imposed, the fitted parameter $\hat{\underline{\beta}}$ changes. There are two types of monotonicity constraints, as discussed in Chapter 3, Part 4. Below, these are formulated as quadratic programming problems for numerical computation.

(i) Kelly and Rice's monotone splines

Choosing B-splines as in (9) of Chapter 3 as the basis of $S_m$, set $X = \{X_{ij}\}$ with $X_{ij} = B_{j,m}(x_i)$, $i = 1, \ldots, n$, $j = -(m-1), \ldots, k$; $\underline{\beta}$ and $y$ are the same as before. Write the constraints $\beta_{-(m-1)} \le \ldots \le \beta_k$ in matrix form, $A^t \underline{\beta} \ge b$, where the diagonal elements $a_{ll}$ of $A$ equal $-1$, the elements $a_{l,l+1}$ equal 1, all others are zero, and $b = 0$. When the interior knots $t_i$, $i = 1, \ldots, k$, are distinct and $X$ is of full rank $m + k \le n$, the least squares spline estimator of $f$ for the given basis is the unique solution of the following minimization problem,
$$\min_{\underline{\beta}} \; -y^t X \underline{\beta} + \frac{1}{2} \underline{\beta}^t X^t X \underline{\beta} \quad \text{subject to} \quad A^t \underline{\beta} \ge b, \eqno(23)$$
which can be solved by quadratic programming.

(ii)
Point-wise monotone splines

The non-decreasing constraint of Chapter 3, Part 4, (c) is $u'(t) \ge 0$, which in numerical computation can be approximated by the conditions
$$u(x_1) \le u(x_2) \le \ldots \le u(x_n). \eqno(24)$$
As above, we can write these conditions as linear inequality constraints, $A \underline{\beta} \ge b$, where $A_{ij} = B_{j,m}(x_{i+1}) - B_{j,m}(x_i)$, $i = 1, \ldots, n-1$, $j = -(m-1), \ldots, k$, and $b = 0$. Then finding the least squares spline estimator of $u$ under the constraints becomes a typical quadratic programming problem.

3. Smoothing Splines

3.1. Unconstrained Smoothing Splines in the Normal Model

Consider the regression model (22). To estimate $f$, we want an estimator which fits the data well but, at the same time, has some degree of smoothness. The natural measure of smoothness associated with a function $f \in W_2^m[a,b]$ is $\int_a^b (f^{(m)}(t))^2 \, dt$, while a standard measure of goodness-of-fit to the data is $n^{-1} \sum_{j=1}^{n} (y_j - f(x_j))^2$. Thus, one method of combining these two criteria into an overall performance measure is to find $f$ which minimizes
$$n^{-1} \sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \int_a^b (f^{(m)}(t))^2 \, dt,$$
where $\lambda > 0$ controls the relative importance of smoothness and fit. The minimizer, $f_\lambda$, is a natural spline of order $2m$, with simple knots at $x_1, \ldots, x_n$ (Schoenberg, 1964; Reinsch, 1967). From (5), using the notation of Chapter 2,
$$\min_f \; n^{-1} \sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \int_a^b (f^{(m)}(t))^2 \, dt = \min_f \; n^{-1} \sum_{i=1}^{n} (y_i - L_{x_i}(f))^2 + \lambda \|P_1(f)\|^2. \eqno(25)$$
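A concrete version of problem (23) can be assembled from standard tools. The sketch below is my own illustration, not the software used in this thesis: it builds the B-spline design matrix for a cubic spline ($m = 4$) with three interior knots on simulated data from a monotone curve, and imposes Kelly and Rice's constraint $\beta_{-(m-1)} \le \ldots \le \beta_k$ through scipy's SLSQP solver in place of a true QP code:

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, m = 60, 4                                 # n data points, order m (cubic)
x = np.sort(rng.uniform(0, 1, n))
y = np.arctan(6 * (x - 0.5)) + rng.normal(0, 0.1, n)   # monotone truth

interior = np.array([0.25, 0.5, 0.75])                 # k = 3 interior knots
t = np.concatenate([[0.0] * m, interior, [1.0] * m])   # knots as in (8)
p = m + len(interior)                                  # dimension m + k

# Design matrix X_ij = B_{j,m}(x_i), built one basis element at a time.
X = np.column_stack([
    BSpline.basis_element(t[j:j + m + 1], extrapolate=False)(x)
    for j in range(p)
])
X = np.nan_to_num(X)   # points outside an element's support evaluate to 0

def rss(beta):
    return np.sum((y - X @ beta) ** 2)

# Kelly and Rice's monotonicity constraint: beta_j - beta_{j-1} >= 0.
cons = {"type": "ineq", "fun": lambda b: np.diff(b)}
beta0 = np.linalg.lstsq(X, y, rcond=None)[0]   # unconstrained start
fit = minimize(rss, beta0, constraints=cons)
beta = fit.x
print(np.all(np.diff(beta) >= -1e-6))   # coefficients are nondecreasing
```

By Theorem 4, the nondecreasing coefficient vector guarantees that the fitted cubic spline itself is nondecreasing.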
From its definition, it is easily seen that $\Sigma$ is positive definite. If $T$ is of full column rank, the minimizer $f_\lambda$ is unique (Kimeldorf & Wahba, 1971; Duchon, 1976; Wahba & Wendelberger, 1980). The equality constraint $T^t c = 0$ can be nicely eliminated from (27) as follows. Let $Q = (Q_1 : Q_2)$, orthonormal, and $R$ be the Q-R decomposition of $T$, that is,
$$T = (Q_1 : Q_2)\begin{pmatrix} R \\ 0_{(n-m)\times m}\end{pmatrix},$$
and let $c = Q_2 e$. Since $T^t c = 0$, such a representation always exists. Problem (27) becomes
$$\min_{d,e}\; \|y - Td - \Sigma Q_2 e\|^2 + n\lambda\, e^t Q_2^t \Sigma Q_2 e. \tag{28}$$
It is then straightforward to find the minimizers $\hat d$ and $\hat e$ by differentiating (28).

3.2. Monotone Smoothing Splines

If the smooth function we want to estimate is monotone, then new constraints must be added to $d$ and $c$. It can be shown that the minimizer of (25) subject to $f'(t) \ge 0$ for all $t$ is (i) a polynomial of degree $2m-1$ in each open subset of $[a,b] - S - \{x_1, \dots, x_n\}$, (ii) constant in every open subset of $S$, and (iii) has $2m-3$ continuous derivatives in $[a,b]$, where $S = \{t \in [a,b] : f_\lambda'(t) = 0\}$ (Utreras, 1985). The exact solution cannot be found explicitly and is difficult to calculate numerically. Therefore, as an approximation to the condition $f'(t) \ge 0$, we use the conditions in (24), which can be rewritten in the form
$$f(x_2) - f(x_1) \ge 0,\quad f(x_3) - f(x_2) \ge 0,\quad \dots,\quad f(x_n) - f(x_{n-1}) \ge 0. \tag{29}$$
The minimizer of (25) subject to the constraints (29) will be of the form
$$\sum_{j=1}^m d_j\varphi_j + \sum_{i=1}^n c_i\xi_i \quad \text{with} \quad T_2 d + \Sigma_2 c \ge 0, \tag{30}$$
where
$$(T_2)_{ij} = (T)_{i+1,j} - (T)_{ij},\quad i = 1,\dots,n-1,\; j = 1,\dots,m,$$
$$(\Sigma_2)_{ij} = (\Sigma)_{i+1,j} - (\Sigma)_{ij},\quad i = 1,\dots,n-1,\; j = 1,\dots,n.$$
By an argument similar to that in the derivation of (28), the coefficients $c = Q_2 e$ and $d$ in (30) are obtained by solving the following quadratic programming problem:
$$\min_{d,e}\; \|y - Td - \Sigma Q_2 e\|^2 + n\lambda\, e^t Q_2^t\Sigma Q_2 e \quad \text{subject to} \quad T_2 d + \Sigma_2 Q_2 e \ge 0,$$
which is equivalent to
$$\min_{d,e}\; -y^t(T : \Sigma Q_2)\begin{pmatrix} d \\ e\end{pmatrix} + \frac{1}{2}(d^t\; e^t)\begin{pmatrix} T^tT & T^t\Sigma Q_2 \\ Q_2^t\Sigma^tT & Q_2^t\Sigma^t\Sigma Q_2 + n\lambda Q_2^t\Sigma Q_2\end{pmatrix}\begin{pmatrix} d \\ e\end{pmatrix} \quad \text{subject to} \quad (T_2 : \Sigma_2 Q_2)\begin{pmatrix} d \\ e\end{pmatrix} \ge 0. \tag{31}$$
Since $T$ is of full rank and $\Sigma$ is positive definite, the Hessian
$$\begin{pmatrix} T^tT & T^t\Sigma Q_2 \\ Q_2^t\Sigma^tT & Q_2^t\Sigma^t\Sigma Q_2 + n\lambda Q_2^t\Sigma Q_2\end{pmatrix}$$
is positive definite, and so (31) has a unique solution.

4. Choice of the Smoothing Parameter

In smoothing splines, the effect of the smoothing parameter $\lambda$, which describes the smoothness of the fitted function, is embodied in the minimizing criterion
$$n^{-1}\sum_{i=1}^n (y_i - f(x_i))^2 + \lambda\int_a^b (f^{(m)}(t))^2\,dt.$$
The bigger $\lambda$ is, the smoother the fitted curve. In regression splines, the interior knots play a role similar to that of $\lambda$. We assume the unknown function $f$ is in the space $S^m[t_1, \dots, t_k]$. Given the basis $\{X_{-(m-1)}, \dots, X_k\}$ of $S^m[t_1, \dots, t_k]$, we express $f$ as $\sum_{j=-(m-1)}^k \beta_j X_j(x)$ and find least squares estimates. The number and positions of the interior knots $\{t_1, \dots, t_k\}$ determine the shapes of the basis splines $X_j$, and thus in turn the shape of the fitted function. To choose the interior knots, we can start with a few more equally spaced interior knots than we might need, then choose a subset of the knots to get a smoother estimate of $f$. Thus, in regression splines, a value of $\lambda$ corresponds to a subset of knots.

The choice of criterion for selecting the smoothing parameter is an important consideration in fitting a function. The choice should depend on a measure of the performance of the estimator $f_\lambda$. Three criteria are widely accepted and used: the loss, the risk and the prediction risk. The loss is defined as
$$L(\lambda) = n^{-1}\sum_{j=1}^n \big(f(x_j) - f_\lambda(x_j)\big)^2.$$
Given the data set, the fitted value $f_\lambda(x_j)$ is a function of the smoothing parameter $\lambda$; hence the loss $L(\lambda)$ depends on $\lambda$. The risk, $R(\lambda)$, equals the expectation of the loss $L(\lambda)$.
The prediction risk is
$$P(\lambda) = E\Big(n^{-1}\sum_{j=1}^n (y_j^* - f_\lambda(x_j))^2\Big),$$
where $y_j^*$ denotes a new observation with the same distribution as $y_j$ but independent of $y_j$. It can be proved that $P(\lambda) = \sigma^2 + R(\lambda)$. A good choice of $\lambda$ is one that minimizes either $L$ or $R$ (and hence $P$). Unfortunately, all three measures involve the unknown function $f$. The only way around this is to find well-behaved estimates of them. Generalized Cross-Validation (GCV) and the Akaike Information Criterion (AIC) provide two such estimates.

4.1. GCV and Its Asymptotic Properties

The definition of GCV is derived from Cross-Validation (CV). Suppose we have observations $(x_i, y_i)$, $i = 1, \dots, n$, which satisfy model (22). Let $f_\lambda = (f_\lambda(x_1), \dots, f_\lambda(x_n))^t$, where $f_\lambda$ is the estimate of $f$ using the smoothing parameter $\lambda$. For each fixed $\lambda$, we estimate $f(x_i)$ using the $n-1$ observations $(x_j, y_j)$, $j = 1, \dots, i-1, i+1, \dots, n$, and express the estimator as $f_\lambda^{(i)}$. CV is defined as
$$CV(\lambda) = \frac{1}{n}\sum_{i=1}^n \big(y_i - f_\lambda^{(i)}(x_i)\big)^2.$$
If the estimator $f_\lambda$ obtained by using all $n$ observations satisfies $f_\lambda = H(\lambda)y$, and $f_\lambda^{(i)} = H(\lambda)(y_1, \dots, y_{i-1}, f_\lambda^{(i)}(x_i), y_{i+1}, \dots, y_n)^t$, it can be proved that
$$CV(\lambda) = \frac{1}{n}\sum_{i=1}^n \Big(\frac{y_i - f_\lambda(x_i)}{1 - h_{ii}(\lambda)}\Big)^2.$$
The GCV criterion is defined as
$$GCV(\lambda) = \frac{\frac{1}{n}\sum_{i=1}^n (y_i - f_\lambda(x_i))^2}{\big(\frac{1}{n}\,\mathrm{tr}(I - H(\lambda))\big)^2}.$$
To choose the best $\lambda$, that is, the $\lambda$ that minimizes GCV, we use the following grid search: given $\lambda_1, \dots, \lambda_N$, calculate $GCV(\lambda_i)$, $i = 1, \dots, N$, and choose the $\lambda_i$ which gives the smallest value of $GCV(\lambda_i)$.

Some asymptotic properties of GCV (see Li, 1985, and Speckman, 1985):

(i) When the sample size $n$ is large, $GCV(\lambda)$ is nearly an unbiased estimator of the prediction risk $P(\lambda)$.

(ii) $GCV(\lambda)$ represents an unbiased estimator of the prediction risk for the estimator $\tilde u_\lambda = \tilde H(\lambda)y$, where $\tilde H(\lambda) = (1 + a(\lambda))H(\lambda) - a(\lambda)I$ and $a(\lambda) = \mathrm{tr}[H(\lambda)]/\mathrm{tr}[I - H(\lambda)]$.
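As a concrete illustration of the grid search, the sketch below computes the GCV score of a simple linear smoother over a grid of $\lambda$ values. Ridge regression on a polynomial basis is used as an arbitrary stand-in for the spline smoothers above; the names and settings are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
x = np.linspace(0, 1, n)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)

# A linear smoother f_lambda = H(lambda) y: ridge regression on a
# polynomial basis (an illustrative stand-in, not the thesis's smoother).
X = np.vander(x, 8, increasing=True)

def gcv(lam):
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)
    rss = np.mean((y - H @ y) ** 2)
    return rss / (np.trace(np.eye(n) - H) / n) ** 2

grid = 10.0 ** np.arange(-8.0, 2.0)        # lambda_1, ..., lambda_N
scores = np.array([gcv(lam) for lam in grid])
lam_best = grid[scores.argmin()]           # the lambda minimizing GCV
print(scores.min() > 0, lam_best in grid)
```

The same loop applies to any smoother that can be written as $f_\lambda = H(\lambda)y$; only the construction of $H(\lambda)$ changes.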
4.2. AIC and Its Asymptotic Property

AIC, which is based on information-theoretic concepts, was proposed by Akaike in 1971 (Sakamoto et al., 1987). It can be used for statistical model selection in a wide range of situations. When a model involving $q$ independently adjusted parameters is fitted to data, the AIC is defined by
$$AIC(q) = -2\ln(\text{maximized likelihood}) + 2q.$$
In the regression model with normal errors,
$$AIC(q) = n\ln\Big(n^{-1}\sum_{j=1}^n (y_j - \hat f(x_j))^2\Big) + 2q.$$
AIC is closely related to GCV: under the assumption that $q/n \to 0$ as $n \to \infty$,
$$\exp[AIC(\lambda)/n] \approx GCV(\lambda).$$

4.3. GCV Applied to Monotone Constrained Problems

A key to GCV in the unconstrained case is that $f_\lambda = H(\lambda)y$, where $H(\lambda)$ doesn't depend on $y$. In the constrained case, $f_\lambda = H(\lambda)y$ where $H(\lambda)$ depends on $y$, but only through the set of constraints which are active at the minimizing $\beta$. Under constraints, in both regression splines and smoothing splines, the problem is
$$\min_\beta\; -y^t C\beta + \tfrac{1}{2}\beta^t G\beta \quad\text{subject to}\quad A^t\beta \ge 0, \tag{32}$$
with $A$ not depending on $y$. In regression splines, $f_\lambda = X\hat\beta$; in smoothing splines, $\beta = (d^t : e^t)^t$ and $f_\lambda = (T : \Sigma Q_2)\beta$.

Let $\hat\beta$ be the minimizer of (32), with $A_1^t\hat\beta = 0$ the set of active constraints. If
$$A_1 = (Q_1 : Q_2)\begin{pmatrix} M \\ 0 \end{pmatrix},$$
then $\hat\beta = Q_2\hat e$, where $\hat e$ minimizes $-y^t CQ_2 e + \tfrac{1}{2}e^t Q_2^t GQ_2 e$. Thus
$$\hat\beta = Q_2(Q_2^t GQ_2)^{-1}Q_2^t C^t y \tag{33}$$
and $f_\lambda = H(\lambda)y$. But $A_1$ depends on $y$, and thus so does $H(\lambda)$. Therefore the unconstrained GCV results do not hold here.

4.3.1. The Approximation Form of GCV in Regression Splines

Let $A_1$ correspond to the active constraints at the solution of the constrained minimization problem (23). Then
$$\mathrm{trace}(H(\lambda)) = \mathrm{rank}(X) - \mathrm{rank}(A_1) = m + k - \mathrm{rank}(A_1) = l.$$
The rank of $A_1$ can be obtained by doing the Q-R decomposition of $A_1$. The approximate form of $GCV(\lambda)$ is
$$GCV(\lambda) = \frac{\frac{1}{n}\sum_{i=1}^n (f_\lambda(x_i) - y_i)^2}{(1 - l/n)^2}.$$
How do we use $GCV(\lambda)$ to choose interior knots? Suppose the domain of the fitted function is $[0,1]$. The two end knots are 0 and 1.
Consider all the possible non-empty subsets of $\{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9\}$. There are 511 members in this collection; with the empty set, we get 512 combinations. For each subset $\lambda_j$, compute $GCV(\lambda_j)$ using the subset as the interior knots, and choose the set of interior knots which minimizes the GCV score.

4.3.2. The Approximation Form of GCV for Monotone Constrained Problems in Smoothing Splines

From (31) and (32), we have
$$C = (T : \Sigma Q_2), \qquad A^t = (T_2 : \Sigma_2 Q_2), \qquad G = \begin{pmatrix} T^tT & T^t\Sigma Q_2 \\ Q_2^t\Sigma^tT & Q_2^t\Sigma^t\Sigma Q_2 + n\lambda Q_2^t\Sigma Q_2 \end{pmatrix}.$$
Let $T_{21}$ and $\Sigma_{21}$ be composed of the rows of $T_2$ and $\Sigma_2$ corresponding to the active constraints at the solution point. Then $A_1^t = (T_{21} : \Sigma_{21}Q_2)$. Let the Q-R decomposition of $A_1$ be
$$A_1 = (U_1 : U_2)\begin{pmatrix} V \\ 0\end{pmatrix}.$$
From (33),
$$f_\lambda = CU_2(U_2^t GU_2)^{-1}U_2^t C^t y = H(\lambda)y,$$
so
$$H(\lambda) = CU_2(U_2^t GU_2)^{-1}U_2^t C^t, \qquad \mathrm{tr}\,H(\lambda) = \mathrm{tr}\big((U_2^t GU_2)^{-1}U_2^t C^t CU_2\big).$$
Therefore, the approximate form of $GCV(\lambda)$ is
$$GCV(\lambda) = \frac{\frac{1}{n}\sum_{i=1}^n (f_\lambda(x_i) - y_i)^2}{\big(1 - \frac{1}{n}\mathrm{tr}(H(\lambda))\big)^2}.$$

4.4. AIC in Monotone Constrained Problems in Regression Splines

In monotone constrained problems, the value of $q$ in the definition of AIC in 4.2 is the number of truly free parameters, namely, the difference between the number of parameters in the unconstrained case and the number of active constraints. Compute the AIC score for each sequence of knots mentioned in 4.3.1, and choose the set of knots which minimizes the AIC.

Chapter 5. Data Analysis 1: The Relationship between Survival Probabilities and Birthweight of Male Infants

In this chapter, we apply the monotone least square regression splines and smoothing splines described in Chapter 4 to the estimation of the relationship between survival probabilities and birthweights of male infants. We discuss techniques for studying the assumption that the regression function is monotone and give some preliminary simulation results. A short description of the data set and exploratory data analysis are given in the first two sections.
1. Description of the Data Set

This is a survival data set from 7037 male births, taken from Table 2(i) in Karn and Penrose (1951). The data are birthweight (in pounds, rounded to the nearest half pound) and whether or not the infant survived. It is believed that infant survival is closely related to birthweight. Two possibilities for the relationship are believed to exist. The first is that survival probability increases with birthweight. The second is that there is an optimal birthweight: smaller infants are weaker and less likely to survive, while larger infants are associated with more difficult births. The purpose of the analysis of this data set is to determine whether survival probability increases with birthweight.

2. Exploratory Data Analysis

Table 1 contains the data set and summary statistics.

Table 1. Male Births Data

Birthweight | Number of infants | Fraction surviving (p_i) | Standard error of p_i
1.0  | 13   | 0       | 0
1.5  | 8    | 0       | 0
2.0  | 20   | 0.05    | 0.0487
2.5  | 15   | 0.13333 | 0.0878
3.0  | 39   | 0.25641 | 0.0699
3.5  | 46   | 0.56522 | 0.0731
4.0  | 60   | 0.76667 | 0.0546
4.5  | 99   | 0.81818 | 0.0388
5.0  | 170  | 0.90588 | 0.0224
5.5  | 360  | 0.92222 | 0.0141
6.0  | 721  | 0.97781 | 0.0055
6.5  | 1002 | 0.97505 | 0.0049
7.0  | 1403 | 0.97078 | 0.0045
7.5  | 1141 | 0.97721 | 0.0044
8.0  | 942  | 0.98089 | 0.0045
8.5  | 459  | 0.96514 | 0.0086
9.0  | 301  | 0.98671 | 0.0066
9.5  | 132  | 0.96212 | 0.0166
10.0 | 63   | 0.90476 | 0.0370
10.5 | 25   | 0.92000 | 0.0543
11.0 | 15   | 0.86667 | 0.0878
11.5 | 2    | 1.0000  | 0
13.0 | 1    | 0.0000  | 0

Before doing the regression analysis, we draw the scatter plot of the fraction of infants surviving, $p_i$, versus birthweight $x_i$, with horizontal bars displaying 95% confidence intervals for $p_i$ (not including the last two data points, as the numbers of infants there are too small). Figure 1 shows that the fraction of infants surviving increases with birthweight for weights between 1 and 6.5 pounds, is basically constant between 6.5 and 9.5 pounds, and decreases slightly from 96% surviving at 9.5 pounds to 89% surviving at 11 pounds.
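The standard errors in Table 1 are the usual binomial ones, $\sqrt{p_i(1-p_i)/n_i}$; a quick check against two rows of the table:

```python
import math

# Standard error of a binomial fraction: sqrt(p(1 - p)/n).
def binomial_se(p, n):
    return math.sqrt(p * (1 - p) / n)

print(round(binomial_se(0.05, 20), 4))      # 2.0 lb row -> 0.0487
print(round(binomial_se(0.25641, 39), 4))   # 3.0 lb row -> 0.0699
```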
Compared to the interval [1, 9.5], the length of the interval [9.5, 11] is small, as is the drop in the percent surviving, from 96% to 89%. Therefore, the regression function relating survival probability to birthweight may be well fit by a monotone function.

3. Analysis of the Data by Least Square Monotone Regression Splines

For computational convenience, we transform the data. Let $z_i = \sin^{-1}\sqrt{y_i/n_i}$, $i = 1, \dots, 19$ (not including the first and last two data points), where $y_i$ and $n_i$ equal the number surviving and the number of observations at the $i$th birthweight, respectively. Assuming that $y_i$ is binomial, the asymptotic variance of $z_i$ is consistently estimated by $v_i = 1/(4n_i)$ as $n_i$ approaches infinity. The simplicity of this asymptotic variance is the primary justification for considering the inverse sine transformation. The weighted regression analysis is equivalent to treating $z_i/\sqrt{v_i}$ as normal with unit variance and mean
$$\frac{u(x_i)}{\sqrt{v_i}} = E\Big(\frac{z_i}{\sqrt{v_i}}\Big) = \sum_{j=-(m-1)}^k \beta_j X_j(x_i)/\sqrt{v_i},$$
where $x_i = 1 + 0.5(i-1)$, and $u$ and $X_j$ have the same meanings as in Chapter 4, Part 2, with the order $m$ equal to 4.

The least square regression spline techniques with the two types of monotonicity constraints (Kelly and Rice's, and point-wise monotonicity) are then applied to the transformed data, $z_i/\sqrt{v_i}$. In Kelly and Rice's method, GCV and AIC choose the same set of knots, {0, 0.1, 0.4, 0.6, 0.9, 1}. In the point-wise monotone method, GCV and AIC choose {0, 0.1, 0.3, 0.6, 0.8, 0.9, 1}. Figures 2 and 3 display the two fitted curves, where the vertical axis points are $\hat p_i = (\sin(\hat z_i))^2$. The two fitted curves are almost the same. The estimated survival probabilities increase almost linearly for birthweights between 2 and 6 pounds; between 6 and 11 pounds, the estimated survival probabilities are about 97%. The unconstrained estimates of the survival function are also shown in Figures 2 and 3.
Each unconstrained estimate was calculated using the same knots as the corresponding constrained estimate.

4. Analysis of the Data by Monotone Smoothing Splines

We apply fourth order monotone smoothing splines to the transformed data to fit the unknown function relating survival probability and birthweight. From the plot of the GCV score in Figure 4, we see that the best choice of the smoothing parameter is $\lambda = 0.019$. The fitted curve corresponding to $\lambda = 0.019$ is shown in Figure 5. From this fit, we reach the same conclusion about the relationship between birthweight and survival probability as before.

5. Some Discussions on Testing the Monotone Assumption of the Regression Function

In the previous data analysis, we assumed that the unknown regression function, $f$, is monotone. Naturally, we ask: is this assumption acceptable? Defining a suitable test statistic and determining its null distribution is hard. Instead, we do some preliminary analysis: we propose three summary statistics and carry out simulations to study their distributions.

5.1. The Summary Statistics

Someone who believes that $f$ is not monotone wants to test
$$H_0: f \text{ is monotone} \quad \text{vs} \quad H_1: \text{no restrictions on } f.$$
We approximate $f$ by a fourth order spline $\sum_{j=-(m-1)}^k \beta_j X_j$ and estimate $u$ using the least squares regression methods with generalized cross-validation, as discussed in Chapter 4, Parts 2 and 4.3.1. Under the null hypothesis, with Kelly and Rice's constraints on the $\beta_j$'s, we obtain the estimate $\hat f_0$ of $f$. The unconstrained least squares estimate of $f$, using the knots chosen in the constrained case, is $\hat f_1$. One statistic, similar to that in the classical regression problem, is
$$T = \frac{\|\hat f_0 - \hat f_1\|^2/(df_0 - df_1)}{\|y - \hat f_1\|^2/df_1},$$
where $df_0$ represents the degrees of freedom of the model estimating $f$ under $H_0$, $df_0 = n - \mathrm{tr}\,H(\lambda)$, with $H(\lambda)$ defined in Chapter 4, Part 4.3.1, and $df_1$ represents the degrees of freedom of the model estimating $f$ under $H_1$, $df_1 = n - \mathrm{rank}(X)$, $X = \{X_j(x_i)\}$.
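The degrees-of-freedom bookkeeping for $df_1$ can be illustrated in the simplest setting: for an unconstrained least squares fit with a fixed (non-data-chosen) design, $\|y - \hat f_1\|^2/\sigma^2$ is exactly $\chi^2_{n-p}$. A simulation sketch, with an arbitrary polynomial design standing in for the spline basis:

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 50, 1.5
x = np.arange(1, n + 1) / (n + 1)
X = np.vander(x, 4, increasing=True)       # fixed design, p = 4 columns
f = X @ np.array([1.0, 2.0, -1.0, 0.5])    # true f lies in the model space
df1 = n - X.shape[1]

stats = []
for _ in range(2000):
    y = f + rng.normal(0, sigma, n)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    stats.append(np.sum((y - X @ beta) ** 2) / sigma**2)

print(abs(np.mean(stats) / df1 - 1) < 0.1)   # E[chi^2_df1] = df1
```

With data-chosen knots and active monotonicity constraints this exact distribution theory no longer applies, which is why the distributions of the summary statistics must be studied by simulation.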
If the null hypothesis were not a constrained one and if the choice of knots were not data-based, then under the null hypothesis $T$ would be $F$ distributed with degrees of freedom $df_0 - df_1$ and $df_1$. In Section 5.2, we describe simulation studies to determine whether $T$ has an $F$ distribution in the constrained problem. We also study the distributions of $R_1/\sigma^2 = \|y - \hat f_1\|^2/\sigma^2$ and $R_0/\sigma^2 = \|y - \hat f_0\|^2/\sigma^2$, to determine whether they are approximately $\chi^2$ with degrees of freedom $df_1$ and $df_0$, respectively.

5.2. Simulation

Let $y_i = 7\sin(1.5x_i) + \varepsilon_i$, $i = 1, \dots, 50$, where $x_i = i/51$ and $\varepsilon_i \sim N(0, 1.5^2)$. For each of 1000 simulations, we use GCV to choose the knots, estimate $\hat f_0$ and $\hat f_1$, and calculate the test statistic $T$. Since the number of knots and the number of active constraints may vary from simulation to simulation, the 1000 runs produce different values of $df_0$ and $df_1$. Figures 6 and 7 show the Q-Q plots of $T$ for the four most common values of $(df_0 - df_1, df_1)$, under the assumption that $T$ is $F$ distributed. Obviously $T$ is not $F$ distributed, since the points on the Q-Q plots do not lie on the line $y = x$. Figures 8, 9, 10, and 11 give the Q-Q plots of $R_1/\sigma^2$ and $R_0/\sigma^2$, under the assumption that these variables are $\chi^2$. The plots show that the distribution of $R_1/\sigma^2$ is very close to a $\chi^2$, while $R_0/\sigma^2$ is close to, but a little smaller than, a $\chi^2$.

We next simulate using the same $x_i$'s and an irregular, monotone regression function,
$$f(x) = 7x\,I_{\{0\le x<0.3\}} + 7(0.3)\,I_{\{0.3\le x\le 1\}} + 7(x-0.8)\,I_{\{0.8<x\le 1\}}.$$
Figures 12 through 17 show the Q-Q plots of $T$, $R_1/\sigma^2$ and $R_0/\sigma^2$ from 950 simulations. We reach the same conclusions as in the previous simulations.

5.3. Conclusion

With real data, we seldom know $\sigma^2$, and so cannot calculate the statistics $R_1/\sigma^2$ and $R_0/\sigma^2$. However, if the degrees of freedom of the two models are close and the null hypothesis is correct, the values of $R_1$ and $R_0$ should be close. In the male infants data, $R_1$ equals 4.4222 with $df_1 = 11$, and $R_0$ equals 6.8954 with $df_0 = 16$. Therefore, there appears to be little difference between the constrained and unconstrained fits.

Statistical inference for monotonicity of a regression function is still an unsolved problem in the regression/smoothing spline context. The construction of a suitable test statistic and the calculation of its null distribution could be a topic of future research.

Chapter 6. Description of Song Sparrow Data and Exploratory Data Analysis

1. Data Set Description

The second data set which we will analyse concerns the survival of song sparrows in their first year of life on Mandarte Island in B.C. (Schluter and Smith, 1986). The purpose of our study is to relate survival probabilities of song sparrows in their first year of life to some of their morphological traits. From 1975 to 1978, each year's nestlings were tagged with identifying numbers about six days after hatching. The nestlings were later measured when captured as independent juveniles, usually in late summer or early fall of the same year. We use measurements from birds at least 8 weeks old. In total, 152 male and 145 female song sparrows were observed. On each bird, six traits were measured: weight, wing length, tarsus (lower leg) length, beak length, beak depth, and beak width, where weight is in grams and the others are in millimetres. Observations indicated that mortality and emigration occurred primarily during fall and winter. Annual spring censuses therefore determined the fate of individual birds through the previous fall and winter. The response variable, survival status, is coded 1 or 0, 1 meaning survived and 0 meaning disappeared.
2. Exploratory Data Analysis

Before we analyse the relationship between the response variable and the six independent variables, we look at the six independent variables separately for the male song sparrows, the female song sparrows, and both combined. In Section 2.1, we draw boxplots and compute means, medians and standard deviations of the six variables in order to observe the effect of the birds' sex on their values. A large difference between the sexes would suggest a need to analyse survival probabilities for each sex separately.

One way of approaching the relationship between the response variable and the six predictors is a regression analysis. In regression, when the predictors are collinearly related, regression modelling can be very confusing: estimated effects can change magnitude or even sign, depending on the other predictors in the model. It is therefore important to understand the relationships between the predictors before proceeding with a regression analysis. In Section 2.2, scatter plots of the pairwise relations help us roughly check the pairwise collinearity of the predictors.

In many regression problems, we may seek a small set of predictors that carry nearly the same information as the full set, so that further analysis can concentrate on this subset. In Section 2.3, principal component analysis is done for this purpose. By separating the male and the female song sparrows and doing principal component analysis, we can also see the relationship between sex and the six independent variables. In Section 2.4, we do some preliminary regression to better understand the relationship between the response and each of the predictors. Finally, in Section 2.5, we model the survival probabilities on the logit scale as linear functions of the independent variables in order to compare the contributions of the six independent variables to the survival probabilities of song sparrows.
2.1. Sex Effect of Birds on the Values of Independent Variables

Figures 18 through 23 show the boxplots for the six independent variables. In each figure, from left to right, the three boxplots correspond to the combined, the male and the female song sparrows. We observe that the male and female sparrows differ in all six traits, with the largest differences being in weight and wing length. To quantify the differences, we compute means, medians, standard deviations and standard errors of the means, listed in Table 2.

Table 2. Mean, Median, Standard Deviation and Standard Error of the Means of the Six Independent Variables

Variable | Type of Data | Mean | Median | SD | SE
Weight | Overall | 24.0110 | 24.0000 | 1.7737 | 0.1029
Weight | Male | 25.1445 | 25.0000 | 1.4184 | 0.1150
Weight | Female | 22.8229 | 22.9231 | 1.2568 | 0.1044
Wing Length | Overall | 66.4861 | 66.5000 | 2.2540 | 0.1308
Wing Length | Male | 68.3416 | 68.1845 | 1.2553 | 0.1018
Wing Length | Female | 64.5411 | 64.4837 | 1.1610 | 0.0964
Tarsus | Overall | 19.8770 | 19.9000 | 0.5922 | 0.0344
Tarsus | Male | 20.1257 | 20.1250 | 0.4919 | 0.0399
Tarsus | Female | 19.6162 | 19.5833 | 0.5775 | 0.0480
Beak Length | Overall | 8.6625 | 8.6500 | 0.3148 | 0.0183
Beak Length | Male | 8.7054 | 8.6777 | 0.3138 | 0.0255
Beak Length | Female | 8.6174 | 8.6010 | 0.3106 | 0.0258
Beak Depth | Overall | 5.8844 | 5.9000 | 0.2017 | 0.0117
Beak Depth | Male | 5.9346 | 5.9466 | 0.1995 | 0.0162
Beak Depth | Female | 5.8318 | 5.8146 | 0.1908 | 0.0158
Beak Width | Overall | 6.7473 | 6.7500 | 0.2001 | 0.0116
Beak Width | Male | 6.8125 | 6.8000 | 0.1917 | 0.0156
Beak Width | Female | 6.6790 | 6.6814 | 0.1860 | 0.0154

2.2. Pairwise Relations among the Six Independent Variables and Sex Effect on the Relations

Figure 24 shows the pairwise relations of the six independent variables for the whole data set, where WT is weight, WL is wing length, T is tarsus, BL is beak length, BD is beak depth, and BW is beak width. It appears that WT and WL, and WL and T, are strongly linearly correlated, and that BD and BW, BL and BD, WL and BW, and BL and BW are linearly correlated. To observe the sex effect, we divide the data set into two parts according to the sex of the song sparrows and draw the same pictures.
Figure 25 shows the data for the female song sparrows, in which all the pairwise linear relations seen in Figure 24 look weak. In Figure 26, which shows the data for the male song sparrows, only the relations among BL, BD and BW are similar to those in Figure 24. Figures 27 through 31, enlarged images of some of the small scatter plots in Figure 24, describe the relations between the independent variables more clearly. In those plots, by labelling the points f (female song sparrows) and m (male song sparrows), the dominant effect of the birds' sex on the relation of WT and WL is obvious. Since the values of WT and WL differ considerably between female and male song sparrows, an elliptical shape appears when we put the male and female data together, even though WT and WL have no strong linear relation within either sex. The phenomenon occurs in other plots, but not as obviously as for WT and WL.

2.3. Principal Component Analysis

There are six independent variables in the data set. With principal component analysis, we want to see whether a few principal components reproduce most of the variability of the system, and thus allow us to reduce the data. For instance, if the first principal component summarizes most of the variability in the independent variables, we may want to regress survival probability against the first principal component alone. We do principal component analysis for the whole data set and for the separate male and female data sets in order to observe the difference between the principal components in the two cases, which in turn indicates the effect of sex.

For the whole data set, the proportions of total sample variance due to the six principal components, $\lambda_k/(\lambda_1 + \dots + \lambda_6)$, are

0.2803024, 0.2007063, 0.1460836, 0.1385502, 0.1298394, 0.1045181.

The first three principal components, which represent 63% of the variability, are

PC1 = 0.46WT + 0.50WL + 0.29T + 0.35BL + 0.39BD + 0.42BW
PC2 = -0.25WT - 0.30WL - 0.61T + 0.45BL + 0.43BD + 0.29BW
PC3 = 0.49WT - 0.08WL - 0.38T - 0.67BL + 0.39BD + 0.01BW.

For the female sparrow data set, the proportions are

0.2258542, 0.1944510, 0.1654235, 0.1438113, 0.1414802, 0.1289797.

In this case, about 58.6% of the total sample variance is attributed to the first three principal components, which become

PC1 = 0.19WT + 0.30WL - 0.13T + 0.56BL + 0.52BD + 0.52BW
PC2 = -0.56WT - 0.40WL - 0.71T + 0.10BL + 0.00BD + 0.16BW
PC3 = 0.54WT - 0.73WL - 0.05T - 0.08BL + 0.39BD - 0.10BW.

For the male sparrow data set, the proportions are

0.2473383, 0.1999561, 0.1599121, 0.1445753, 0.1307587, 0.1174594.

Here, the first three principal components, which explain 60.7% of the total sample variance, are

PC1 = 0.29WT + 0.36WL + 0.06T + 0.54BL + 0.49BD + 0.50BW
PC2 = -0.33WT + 0.53WL + 0.67T + 0.12BL - 0.38BD - 0.03BW
PC3 = 0.87WT + 0.03WL + 0.35T - 0.20BL - 0.13BD - 0.23BW.

From the above analysis, we draw the following conclusions:

(i) The first three principal components are not enough to reproduce most of the variation.

(ii) Since the values of the six morphological traits differ between male and female sparrows, and the coefficients of the traits in the first three principal components vary dramatically, analysing the data separately for male and female sparrows is necessary if we are to recover the true relationship between the response and the independent variables.

2.4. Some Rough Pictures of the Relations of Survival Probabilities and the Independent Variables

We separate the whole data set into two parts according to the sex of the birds. Let $x_1 \le x_2 \le \dots \le x_n$ denote the independent variable values and $y_i$ denote the corresponding 0-1 responses. Divide the range of $x$ into a number of subintervals, as follows.
The first subinterval consists of $x_1, \dots, x_{10+j}$, where $j$ is the largest non-negative integer such that $x_{10+j} = x_{10}$. The following subintervals are constructed by the same rule; that is, each subinterval includes 10 points plus the replications (if any) of the tenth point. Let $p_i$ denote the survival probability in the $i$th subinterval. Its estimate, $\hat p_i$, equals the mean of the responses in the subinterval,
$$\hat p_i = \Big\{\textstyle\sum_{x_j \in i\text{th subinterval}} I(y_j = 1)\Big\} \Big/ \#\{x_j \in i\text{th subinterval}\}.$$
On the logit scale, the estimate is $\log(\hat p_i/(1 - \hat p_i)) = \mathrm{logit}(\hat p_i)$. The asymptotic variance of $\mathrm{logit}(\hat p_i)$ is $1/(n_i p_i(1 - p_i))$, and thus an approximate 95% confidence interval for $\mathrm{logit}(p_i)$ is
$$\Big[\log\frac{\hat p_i}{1-\hat p_i} - \frac{2}{\sqrt{n_i\hat p_i(1-\hat p_i)}},\;\; \log\frac{\hat p_i}{1-\hat p_i} + \frac{2}{\sqrt{n_i\hat p_i(1-\hat p_i)}}\Big].$$
Figures 32 through 43 are the scatter plots of $(\bar x_i, \mathrm{logit}(\hat p_i))$ with horizontal bars displaying these confidence intervals, where $\bar x_i$ denotes the average of the independent variable values in the subinterval. From these plots, for the male data it is hard to see a relationship between the response and any of the independent variables, while for the female data relationships with T, BL and BW may exist. For T, there seems to be a decreasing trend in survival probability; in the plots for BL and BW, some relations may also exist.

2.5. Generalized Linear Models

In this section, we use generalized linear models to study the relationships between the independent variables and survival probabilities. Since the response, survival status, is binomial and the independent variables are continuous, we model the logit of the survival probabilities as linear functions of the independent variables. Although the chosen models may have large lack-of-fit errors, by comparing the deviances of the models we can roughly evaluate how much valuable information can be obtained by fitting each predictor against the response separately.
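The deviances compared below are $-2\times$ the maximized log-likelihood. For the constant model, the fitted probability is just the overall survival fraction; a sketch on toy 0-1 data (illustrative only, not the sparrow data):

```python
import numpy as np

y = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])   # toy 0-1 responses
p_hat = y.mean()                               # MLE under log(p/(1-p)) = alpha
deviance = -2 * np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
print(round(deviance, 2))                      # -> 13.46
```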
Let WT, WL, T, BL, BD, and BW represent the same variables as before. First look at the female song sparrow data. Table 3 shows the deviances and degrees of freedom of the models with a constant, first degree, or second degree polynomial in WT, WL, T, BL, BD, or BW.

Table 3: Logistic Models Fit to the Female Song Sparrow Data

Model | Deviance | Degrees of Freedom
log(p/(1-p)) = α | 200.951 | 144
log(p/(1-p)) = α + β₁WT | 200.906 | 143
log(p/(1-p)) = α + β₁WT + β₁₁WT² | 198.981 | 142
log(p/(1-p)) = α + β₂WL | 200.077 | 143
log(p/(1-p)) = α + β₂WL + β₂₂WL² | 197.671 | 142
log(p/(1-p)) = α + β₃T | 184.944 | 143
log(p/(1-p)) = α + β₃T + β₃₃T² | 184.767 | 142
log(p/(1-p)) = α + β₄BL | 192.876 | 143
log(p/(1-p)) = α + β₄BL + β₄₄BL² | 191.239 | 142
log(p/(1-p)) = α + β₅BD | 200.901 | 143
log(p/(1-p)) = α + β₅BD + β₅₅BD² | 200.249 | 142
log(p/(1-p)) = α + β₆BW | 196.914 | 143
log(p/(1-p)) = α + β₆BW + β₆₆BW² | 193.183 | 142

From Table 3, the model log(p/(1-p)) = α + β₃T is a significant improvement over the elementary constant model. The linear terms in the relationships of BL and BW with the response cannot be neglected, although this is not as obvious as for T. Adding a quadratic term to each model does not help much in obtaining a good fit. To make the point more clearly, in Table 4 we fit models using all six independent variables.

Table 4: Logistic Models Including Six Independent Variables for Female Song Sparrow Data

Model | Deviance | DF
log(p/(1-p)) = α + β₁WT + β₂WL + β₃T + β₄BL + β₅BD + β₆BW | 174.943 | 138
log(p/(1-p)) = α + β₁WT + β₂WL + β₃T + β₄BL + β₅BD + β₆BW + β₁₁WT² + β₂₂WL² + β₃₃T² + β₄₄BL² + β₅₅BD² + β₆₆BW² | 167.964 | 132

From Tables 3 and 4, we see that second degree polynomials are not good models for fitting these data. Some nonparametric method, for example splines, may help solve the problem. Next consider the first degree polynomial models for T, BL and BW in Table 3, adding second-order interaction terms one at a time. Table 5, Table 6, and Table 7 give the results corresponding to T, BL, and BW.
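For nested models, the drop in deviance is approximately $\chi^2$ with degrees of freedom equal to the number of added parameters. Applied to the tarsus row of Table 3:

```python
import math

# Adding T to the constant model (Table 3): deviance 200.951 -> 184.944,
# degrees of freedom 144 -> 143, i.e. one extra parameter.
drop = 200.951 - 184.944
p_value = math.erfc(math.sqrt(drop / 2))   # chi-square_1 upper tail
print(round(drop, 3), p_value < 1e-3)      # 16.007 on 1 df: significant
```

This quantifies the statement that the model with T is a significant improvement over the constant model.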
Table 5: Logistic Models Including T and Its Second-Order Interaction Terms

Model | Deviance | Degrees of Freedom
log(p/(1-p)) = α + β₃T | 184.944 | 143
log(p/(1-p)) = α + β₃T + β₁₃WT·T | 184.536 | 142
log(p/(1-p)) = α + β₃T + β₂₃WL·T | 182.562 | 142
log(p/(1-p)) = α + β₃T + β₃₄T·BL | 184.597 | 142
log(p/(1-p)) = α + β₃T + β₃₅T·BD | 180.358 | 142
log(p/(1-p)) = α + β₃T + β₃₆T·BW | 184.662 | 142

Table 6: Logistic Models Including BL and Its Second-Order Interaction Terms

Model | Deviance | Degrees of Freedom
log(p/(1-p)) = α + β₄BL | 192.876 | 143
log(p/(1-p)) = α + β₄BL + β₁₄WT·BL | 192.806 | 142
log(p/(1-p)) = α + β₄BL + β₂₄WL·BL | 192.871 | 142
log(p/(1-p)) = α + β₄BL + β₃₄T·BL | 192.327 | 142
log(p/(1-p)) = α + β₄BL + β₄₅BL·BD | 192.758 | 142
log(p/(1-p)) = α + β₄BL + β₄₆BL·BW | 191.857 | 142

Table 7: Logistic Models Including BW and Its Second-Order Interaction Terms

Model | Deviance | Degrees of Freedom
log(p/(1-p)) = α + β₆BW | 196.914 | 143
log(p/(1-p)) = α + β₆BW + β₁₆WT·BW | 196.906 | 142
log(p/(1-p)) = α + β₆BW + β₂₆WL·BW | 196.046 | 142
log(p/(1-p)) = α + β₆BW + β₃₆T·BW | 196.913 | 142
log(p/(1-p)) = α + β₆BW + β₄₆BL·BW | 196.457 | 142
log(p/(1-p)) = α + β₆BW + β₅₆BD·BW | 196.747 | 142

The results in these tables suggest that, among all the second-order interaction terms, only T·BD might carry some useful information for predicting survival probabilities. The following table includes the same models as in Table 3, but for the male song sparrow data.
Table 8: Logistic Models Fit to the Male Song Sparrow Data

Model                                         Deviance   Degrees of Freedom
log(p/(1-p)) = α                               196.575   151
log(p/(1-p)) = α + β₁WT                        196.539   150
log(p/(1-p)) = α + β₁WT + β₁₁WT²               194.647   149
log(p/(1-p)) = α + β₂WL                        196.517   150
log(p/(1-p)) = α + β₂WL + β₂₂WL²               196.082   149
log(p/(1-p)) = α + β₃T                         196.100   150
log(p/(1-p)) = α + β₃T + β₃₃T²                 194.876   149
log(p/(1-p)) = α + β₄BL                        196.572   150
log(p/(1-p)) = α + β₄BL + β₄₄BL²               196.411   149
log(p/(1-p)) = α + β₅BD                        196.529   150
log(p/(1-p)) = α + β₅BD + β₅₅BD²               194.946   149
log(p/(1-p)) = α + β₆BW                        196.034   150
log(p/(1-p)) = α + β₆BW + β₆₆BW²               194.456   149

From the deviances of the models in the above table, we see that adding first and second degree terms of the independent variables gives little improvement over the constant model. At least roughly, then, the logit of the survival probability is not linearly related to any one of the independent variables.

Conclusions: (i) In contrast to the male song sparrow data, the female song sparrow data set shows relationships between the independent variables and the response. (ii) If we want to analyse the relation between each independent variable and survival probabilities further, including T, BL and BW in the female song sparrow data would be wise. (iii) Second degree polynomials are not good models for fitting these data.

In the following analyses of this data set, we will focus on T, BL and BW in the female song sparrows.

Chapter 7. Binomial Regression Splines

For analysing the song sparrow data, we now introduce unconstrained binomial regression splines to motivate the technique for the constrained case. By modifying the unconstrained method, we propose a technique for estimating monotone binomial regression splines. We define GCV and AIC for knot selection in binomial regression models, and their approximate forms under constraints. Finally, we apply the methods to the female song sparrow data.

1.
Unconstrained Binomial Regression Splines

Consider independent observations (x_i, y_i), i = 1, …, n, where

    y_i ~ Binomial(1, p(x_i)),   i = 1, …, n.

Let f(t) = log(p(t)/(1 − p(t))) and assume f is in the Sobolev space W₂ᵐ[a, b]. This transformation is made to remove the constraint p ∈ (0, 1); since the logit transform is strictly increasing, f is monotone exactly when p is.

Given the B-spline basis {B₋₍ₘ₋₁₎,ₘ, …, B_{k,m}} of Sᵐ(t₁, …, t_k), we approximate f by

    f(x) ≈ Σ_{j=−(m−1)}^{k} β_j B_{j,m}(x).

Set β = (β₋₍ₘ₋₁₎, …, β_k)ᵗ and y = (y₁, …, y_n)ᵗ. To obtain the maximum likelihood estimate of the parameter β, we need the likelihood function of y,

    L(β | y) = Π_{i=1}^{n} p(x_i)^{y_i} (1 − p(x_i))^{1−y_i},

whose logarithm can be expressed as

    l(β) = Σ_{i=1}^{n} { y_i f(x_i) − ln(1 + exp(f(x_i))) }
         = Σ_{i=1}^{n} { y_i Σ_{j=−(m−1)}^{k} β_j B_{j,m}(x_i) − ln(1 + exp(Σ_{j=−(m−1)}^{k} β_j B_{j,m}(x_i))) }.

Let

    B_i = (B₋₍ₘ₋₁₎,ₘ(x_i), …, B_{k,m}(x_i))ᵗ,   i = 1, …, n,

let B be the n × (m + k) matrix whose ith row is B_iᵗ, and assume B is of full column rank. Thus,

    l(β) = Σ_{i=1}^{n} { y_i βᵗB_i − ln(1 + exp(βᵗB_i)) }.

Differentiating with respect to β yields

    S(β) = Σ_{i=1}^{n} ( y_i − exp(βᵗB_i)/(1 + exp(βᵗB_i)) ) B_i.   (34)

Defining

    u(β) = ( exp(βᵗB₁)/(1 + exp(βᵗB₁)), …, exp(βᵗB_n)/(1 + exp(βᵗB_n)) )ᵗ,

we obtain the equation

    S(β) = Bᵗ(y − u(β)).

The maximum likelihood estimate of β is the solution of S(β) = 0. This equation has no explicit solution, but can be solved by the Newton-Raphson iterative technique. This technique requires the calculation of the second derivative of l with respect to β. From (34),

    d²l(β)/dβ dβᵗ = dS(β)/dβᵗ = − Σ_{i=1}^{n} [ exp(βᵗB_i)/(1 + exp(βᵗB_i))² ] B_i B_iᵗ.

Defining

    V = diag(v₁, …, v_n) = diag( exp(βᵗB₁)/(1 + exp(βᵗB₁))², …, exp(βᵗB_n)/(1 + exp(βᵗB_n))² ),

we obtain the equation

    d²l(β)/dβ dβᵗ = − Σ_{i=1}^{n} v_i B_i B_iᵗ = −BᵗVB.

The implementation of the Newton-Raphson iterative technique involves three steps:

(i) Find an initial estimate β⁰ of β.

(ii) Given the kth step's estimate βᵏ of β, define

    βᵏ⁺¹ = βᵏ + (BᵗVB)⁻¹ S(βᵏ) = βᵏ + (BᵗVB)⁻¹ Bᵗ(y − u(βᵏ)).   (35)

(iii) Establish a criterion to decide when the iteration should be stopped.

If, at each step, BᵗVB is a non-singular matrix, then the Newton-Raphson method converges. Since B is of full column rank and each v_i is strictly positive, BᵗVB is necessarily non-singular. The maximum likelihood estimate of β is the final result of the iteration.

The Newton-Raphson method of finding the maximum likelihood estimate of β is sometimes called iteratively reweighted least squares, for the following reason. From (35), we have

    βᵏ⁺¹ = βᵏ + (BᵗVB)⁻¹ Bᵗ(y − u(βᵏ))
         = (BᵗVB)⁻¹ BᵗV ( Bβᵏ + V⁻¹(y − u(βᵏ)) ).

If

    z = Bβᵏ + V⁻¹(y − u(βᵏ)),

then

    βᵏ⁺¹ = (BᵗVB)⁻¹ BᵗV z

is the solution of the weighted least squares problem

    min_β ‖V^(1/2)(z − Bβ)‖²,   (36)

where the elements of V and z are functions of the previous estimate of β.

2. Monotone Binomial Regression Splines

Assume f is a monotone function. With each of the two types of monotonicity constraints discussed in Chapter 3, Part 4, we want to estimate the parameter β which maximizes l(β) subject to Aᵗβ ≥ 0.

To solve this problem, we modify step (ii) of the Newton-Raphson method, using equation (36). With V and z as defined in Section 1, βᵏ⁺¹ is the solution of the following quadratic programming problem:

    min_β ‖V^(1/2)(z − Bβ)‖²  subject to  Aᵗβ ≥ 0.   (37)

At each iteration, if BᵗVB is strictly positive definite, a unique minimum is achieved.

3. GCV in Binomial Regression Splines

3.1. GCV in Unconstrained Binomial Regression Splines

In the least squares regression problem, GCV is defined as

    GCV(λ) = ‖y − ŷ(λ)‖² / (n − tr H(λ))²,   (38)

where the numerator represents the residual sum of squares. An analog of the residual sum of squares in the generalized linear models context is the generalized Pearson χ² statistic.
Replacing the numerator in (38) by the Pearson χ² leads to the GCV score

    GCV(λ) = ‖V^(1/2)(z − Bβ̂)‖² / (n − tr H(λ))²,   (39)

where V, z, β̂, and the hat matrix, H(λ) = B(BᵗVB)⁻¹BᵗV, are all computed at the final stage of the iteration. It is conjectured that, for smoothing spline fits, the λ minimizing the GCV score will approximately minimize the weighted mean square error (O'Sullivan et al., 1986),

    (1/n) Σ_{i=1}^{n} v_i ( f_λ(x_i) − f(x_i) )².

In binomial regression splines, there are two algorithms to calculate the GCV score.

Algorithm 1. For each subset of possible knots, λ, estimate the parameter β using the Newton-Raphson algorithm and compute the GCV score at the last iteration using (39). Choose the λ that minimizes GCV.

Algorithm 2. Choose an initial estimate, β⁰. Calculate the associated V and the pseudo data z. Fixing z and V, assume that for each knot subset, λ, z is normal with mean B(λ)β and covariance V⁻¹. Fit the pseudo data by the usual method of generalized cross-validation in a weighted normal model; that is, for each value of λ, find β̂(λ) to minimize ‖V^(1/2)(z − B(λ)β)‖², and then choose the value of λ that minimizes the GCV score in (39). The resulting β̂ is then used in the next step of the iteration to create a new V and z. Iterate until λ is stabilized and the iteration for this stabilized λ converges.

Algorithm 1 is proposed by O'Sullivan et al. (1986) for use with smoothing splines, where λ is a real number. Algorithm 2 is proposed by Gu (1990a), also in the smoothing spline context.
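The inner computation of Algorithm 1 — fit by Newton-Raphson/IRLS as in Section 1, then score the fit with (39) — can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: for brevity the design matrix uses a cubic truncated power basis (Chapter 3) rather than B-splines, and the data are synthetic.

```python
import numpy as np

def design(x, knots):
    """Cubic truncated power basis: 1, x, x^2, x^3, (x - t)_+^3 for each knot t."""
    cols = [x**j for j in range(4)]
    cols += [np.clip(x - t, 0.0, None)**3 for t in knots]
    return np.column_stack(cols)

def fit_irls(B, y, tol=1e-8, max_iter=50):
    """Newton-Raphson / IRLS update (35) for the binomial log-likelihood."""
    beta = np.zeros(B.shape[1])
    for _ in range(max_iter):
        u = 1.0 / (1.0 + np.exp(-(B @ beta)))   # u(beta), fitted probabilities
        v = u * (1.0 - u)                       # diagonal of V
        # step = (B^t V B)^{-1} B^t (y - u(beta))
        step = np.linalg.solve(B.T @ (v[:, None] * B), B.T @ (y - u))
        beta = beta + step
        if np.max(np.abs(step)) < tol:          # stopping criterion, step (iii)
            break
    return beta

def gcv_score(B, y, beta):
    """GCV score (39) at the final iterate: Pearson chi-square / (n - tr H)^2."""
    n = len(y)
    u = 1.0 / (1.0 + np.exp(-(B @ beta)))
    v = u * (1.0 - u)
    # ||V^(1/2)(z - B beta)||^2 = sum (y_i - u_i)^2 / v_i, the Pearson chi-square
    pearson = np.sum((y - u)**2 / v)
    # hat matrix H(lambda) = B (B^t V B)^{-1} B^t V
    H = B @ np.linalg.solve(B.T @ (v[:, None] * B), B.T * v)
    return pearson / (n - np.trace(H))**2

# Synthetic data with a monotone logit, for illustration only.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 200))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(4 * x - 2)))).astype(float)

# Algorithm 1: fit each candidate knot set and compare GCV scores.
scores = {}
for knots in [(0.5,), (0.3, 0.6), (0.2, 0.5, 0.8)]:
    B = design(x, knots)
    scores[knots] = gcv_score(B, y, fit_irls(B, y))
best = min(scores, key=scores.get)
```

Note that since tr H(λ) here equals the number of basis functions, the denominator of (39) penalizes richer knot sets, while the Pearson numerator rewards fit, mirroring the bias-variance trade-off that GCV is meant to balance.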
Gu shows that, under the right conditions, the final λ chosen by Algorithm 2 behaves like the λ minimizing E(GCV(λ)), and hence approximately minimizes the loss D(λ), where, in the Binomial case,

    D(λ) = E( Σ_{j=1}^{s} K(n_j, p(x_j), p_λ(x_j)) ).

Here s is the number of distinct values x_j of x, n_j is the number of observations with x = x_j, and K(n, p₀, p₁) is the Kullback-Leibler distance between a Binomial(n, p₀) and a Binomial(n, p₁) distribution; it equals e(n, p₀) − e(n, p₁), where

    e(n, p) = 2 Σ_{k=0}^{n} C(n, k) pᵏ (1 − p)ⁿ⁻ᵏ [ k log(p/(1 − p)) + n log(1 − p) + log C(n, k) ],

with C(n, k) denoting the binomial coefficient.

The difference between the two algorithms is explained in Gu (1990b), who argues convincingly that Algorithm 2 is more suitable. The data here are analysed using only the first algorithm, as Gu's method has only recently become known.

3.2. GCV in Constrained Binomial Regression Splines

The modification of the GCV score (39) for the constrained problem is similar to that made in the normal regression model. Specifically, suppose β̂ is the solution of (37) and A₁ is the matrix of active constraints, with A₁ᵗβ̂ = 0. The minimization problem (37) then has the same solution as

    min_β ‖V^(1/2)(z − Bβ)‖²  subject to  A₁ᵗβ = 0.

Do a Q-R decomposition of A₁:

    A₁ = (Q₁ : Q₂) ( R )
                   ( 0 )  = Q₁R.

Let β = Q₂c. Then (37) becomes

    min_c ‖V^(1/2)(z − BQ₂c)‖² = min_c (z − BQ₂c)ᵗ V (z − BQ₂c).   (40)

The estimates of c and β are

    ĉ = (Q₂ᵗBᵗVBQ₂)⁻¹ Q₂ᵗBᵗVz,
    β̂ = Q₂ĉ = Q₂ (Q₂ᵗBᵗVBQ₂)⁻¹ Q₂ᵗBᵗVz,

and thus ẑ = H(λ)z, where

    H(λ) = BQ₂ (Q₂ᵗBᵗVBQ₂)⁻¹ Q₂ᵗBᵗV

and

    tr H(λ) = tr( (Q₂ᵗBᵗVBQ₂)⁻¹ Q₂ᵗBᵗVBQ₂ ) = rank of Q₂ᵗBᵗVBQ₂.

The approximate form of GCV in the constrained binomial regression splines is defined as

    GCV(λ) = ‖V^(1/2)(z − Bβ̂)‖² / (n − tr H(λ))².

This form can be used in either Algorithm 1 or Algorithm 2.

4. AIC in Binomial Regression Splines

In binomial regression splines, AIC is defined as

    AIC(λ) = −2 Σ_{i=1}^{n} ( y_i β̂ᵗB_i − ln(1 + exp(β̂ᵗB_i)) ) + 2q,

where q represents the number of independent parameters and β̂ is found by the Newton-Raphson method.
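The effective number of parameters under active constraints can be checked numerically: in the reparametrization β = Q₂c of Section 3.2, tr H(λ) equals the number of columns of Q₂, that is, the number of coefficients minus the rank of A₁. A small sketch with hypothetical dimensions, weights, and constraint matrix (B, v, and A₁ are placeholders, not the song sparrow fit):

```python
import numpy as np

rng = np.random.default_rng(1)
n, q = 60, 5                                    # q coefficients in total
x = np.linspace(0.0, 1.0, n)
B = np.column_stack([x**j for j in range(q)])   # placeholder design matrix
v = rng.uniform(0.1, 0.25, n)                   # placeholder IRLS weights, diag of V

# Hypothetical active constraint matrix A1 (columns are the active rows of A):
# here the active constraints are beta_1 = beta_2 and beta_3 = beta_4.
A1 = np.array([[0.0, 0.0],
               [1.0, 0.0],
               [-1.0, 0.0],
               [0.0, 1.0],
               [0.0, -1.0]])

# Q2: orthonormal basis of the null space of A1^t, from the full QR of A1.
Q, _ = np.linalg.qr(A1, mode="complete")
r = np.linalg.matrix_rank(A1)
Q2 = Q[:, r:]                # beta = Q2 c automatically satisfies A1^t beta = 0

# H(lambda) = B Q2 (Q2^t B^t V B Q2)^{-1} Q2^t B^t V, as in Section 3.2.
M = B @ Q2
H = M @ np.linalg.solve(M.T @ (v[:, None] * M), M.T * v)
df = np.trace(H)             # equals q - rank(A1) = 3
```

The trace drops from q to q minus the rank of A₁, which is exactly the adjustment made to q in the AIC formula above when constraints are active.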
Without constraints, q equals m + k. With constraints, q equals m + k minus the rank of the active constraint matrix A₁.

5. Data Analysis: the Relationship between Survival Probabilities and Tarsus, Beak Length and Beak Width of the First-Year Female Song Sparrows

5.1. Survival Probabilities and Tarsus

Exploratory data analysis indicates that as the size of the female song sparrows' tarsus increases, the survival probabilities of the birds decrease. We relate survival probabilities to tarsus using Kelly and Rice's and point-wise monotone binomial regression splines. In Kelly and Rice's method, we assume the parameters of the fitted curve form a decreasing sequence. By computing GCV and AIC scores, as in Algorithm 1, we obtain the unique best set of knots, {0, 0.3, 0.4, 0.5, 0.7, 1}. In the point-wise monotone method, we assume that at the increasingly ordered data points, the values of the fitted function are decreasing. A grid search of GCV and AIC yields the same set of knots, {0, 0.2, 0.3, 0.6, 1}. Figures 44 and 45 show the two fitted curves. In this data set, the two fitted curves are very close.

To describe the relation between survival probabilities and tarsus length more clearly, we divide the values of tarsus length into four subintervals. From 17.7mm to 18.25mm, the birds have about a 100% chance of surviving; from 18.4mm to 19.75mm, the survival probabilities decrease sharply from over 95% to 37%; from 19.75mm to 20.3mm, the chance of survival stays at about 37%; and from 20.3mm to 21.3mm, it drops from 37% to 0.

5.2. Survival Probabilities and Beak Length

We assume a non-decreasing function fits the relation between beak length and survival probabilities of the female song sparrows. Applying the two types of constraints of Section 5.1 to the data set, we have the following results.
With Kelly and Rice's method, GCV and AIC choose different sets of knots: GCV chooses {0, 0.2, 0.7, 1}, while AIC chooses {0, 1}. The fitted curves for the two sets of knots are shown in Figure 46 and Figure 47.

With pointwise monotone constraints, GCV and AIC also give different sets of knots: {0, 0.2, 0.6, 0.7, 1} and {0, 0.6, 1} respectively. With these two sets of knots, we obtain the fitted curves shown in Figure 48 and Figure 49.

Although the two types of constraints and the two ways of choosing knots result in different sets of knots, the fitted curves are very close. Roughly speaking, as the beak lengths of the female birds increase from 7.8mm to 8.8mm, their survival chances increase from 5% to 62%. For beak lengths greater than 8.8mm, the survival chance is about 63%.

5.3. Survival Probabilities and Beak Width

Here we again assume that the regression function relating survival probabilities and beak width of the female song sparrows is monotone. For each type of constraint, GCV and AIC give the same set of knots, {0, 0.8, 1}. The fitted curves, shown in Figure 50 and Figure 51, are almost identical. The chance of survival increases from 0 at 5.9mm beak width to 55% at 6.8mm beak width; birds with beak width greater than 6.8mm have about a 55% chance of surviving.

Bibliography

Barlow, R.E., Bartholomew, D.J., Bremner, J.M., and Brunk, H.D. (1972). Statistical Inference under Order Restrictions. New York: John Wiley & Sons.

Brunk, H.D. (1955). Maximum Likelihood Estimates of Monotone Parameters. Ann. Math. Statist., 26, 607-616.

de Boor, C. (1978). A Practical Guide to Splines. New York: Springer-Verlag.

Duchon, J. (1977). Splines Minimizing Rotation-Invariant Semi-Norms in Sobolev Spaces. In Constructive Theory of Functions of Several Variables, eds. W. Schempp and K. Zeller, Berlin: Springer-Verlag, 85-100.

Friedman, J.
and Tibshirani, R. (1984). The Monotone Smoothing of Scatterplots. Technometrics, 26, 242-250.

Gill, P.E., Murray, W., and Wright, M.H. (1981). Practical Optimization. New York: Academic Press.

Gu, C. (1990a). Adaptive Spline Smoothing in Non-Gaussian Regression Models. J. Amer. Statist. Assoc., 85, 801-807.

Gu, C. (1990b). A Note on Cross-Validating Non-Gaussian Data. Technical Report #96, Department of Statistics, University of British Columbia.

Li, K. (1985). From Stein's Unbiased Risk Estimates to the Method of Generalized Cross-Validation. The Annals of Statistics, 13, 1352-1377.

Karn, M.N. and Penrose, L.S. (1951). Birth Weight and Gestation Time in Relation to Maternal Age, Parity and Infant Survival. Ann. Eugen., 16, 147-164.

Kelly, C. and Rice, J. (1988). Monotone Smoothing with Application to Dose Response Curves and the Assessment of Synergism. Unpublished manuscript.

Kimeldorf, G. and Wahba, G. (1971). Some Results on Tchebycheffian Spline Functions. Journal of Mathematical Analysis and Applications, 33, 82-95.

Meier, K. (1989). Estimating Rate Equations Using Nonparametric Regression Methods. Institute of Statistics Mimeograph Series No. 1969, North Carolina State University.

O'Sullivan, F., Yandell, B.S. and Raynor, W.J. (1986). Automatic Smoothing of Regression Functions in Generalized Linear Models. J. Amer. Statist. Assoc., 81, 96-103.

Ramsay, J.O. (1988). Monotone Regression Splines in Action. Statistical Science, 3, 425-461.

Reinsch, C. (1967). Smoothing by Spline Functions. Numer. Math., 10, 177-183.

Schluter, D. and Smith, J.N.M. (1986). Natural Selection on Beak and Body Size in the Song Sparrow. Evolution, 40(2), 221-231.

Schoenberg, I.J. (1964). Spline Functions and the Problem of Graduation. Proc. Nat. Acad. Sci. USA, 52, 947-950.

Speckman, P. (1985). Spline Smoothing and Optimal Rates of Convergence in Nonparametric Regression Models.
The Annals of Statistics, 13, 970-983.

Utreras, F.I. (1985). Smoothing Noisy Data under Monotonicity Constraints: Existence, Characterization and Convergence Rates. Numer. Math., 47, 611-625.

Villalobos, M. and Wahba, G. (1987). Inequality-Constrained Multivariate Smoothing Splines with Application to the Estimation of Posterior Probabilities. J. Amer. Statist. Assoc., 82, 239-248.

Sakamoto, Y., Ishiguro, M. and Kitagawa, G. (1987). Akaike Information Criterion Statistics. Boston: D. Reidel.

Wahba, G. and Wendelberger, J.G. (1980). Some New Mathematical Methods for Variational Objective Analysis Using Splines and Cross Validation. Monthly Weather Review, 108, 1122-1143.

[The figure pages follow. The plots themselves are not reproduced; only the recoverable captions are listed. A first figure, with y-axis label "Fraction of Surviving", has no recoverable caption.]

Figure 2. The Relationship between Birthweight and Survival Probabilities of the Male Infants by Least Square Regression Splines
Figure 3. The Relationship between Birthweight and Survival Probabilities of the Male Infants by Least Square Regression Splines
Figure 4. Grid Search on GCV
Figure 5. The Relationship between Birthweight and Survival Probabilities of the Male Infants by Smoothing Splines
Figure 6. Q-Q Plots of the Test Statistic T When the Simulation Function Equals 7sin(1.5x)
Figure 7. Q-Q Plots of the Test Statistic T When the Simulation Function Equals 7sin(1.5x)
Figure 8. Q-Q Plots of R₁/σ² When the Simulation Function Equals 7sin(1.5x)
Figure 9. Q-Q Plots of R₁/σ² When the Simulation Function Equals 7sin(1.5x)
Figure 10. Q-Q Plots of R₀/σ² When the Simulation Function Equals 7sin(1.5x)
Figure 11.
Q-Q Plots of R₀/σ² When the Simulation Function Equals 7sin(1.5x)
Figure 12. Q-Q Plots of the Test Statistic T When the Simulation Function Equals 7x·I{0<x≤0.3} + 7(0.3)·I{0.3<x≤1} + 7(x − 0.8)·I{0.8<x≤1}
Figure 13. Q-Q Plots of the Test Statistic T When the Simulation Function Equals 7x·I{0<x≤0.3} + 7(0.3)·I{0.3<x≤1} + 7(x − 0.8)·I{0.8<x≤1}
Figure 14. Q-Q Plots of R₁/σ² When the Simulation Function Equals 7x·I{0<x≤0.3} + 7(0.3)·I{0.3<x≤1} + 7(x − 0.8)·I{0.8<x≤1}
Figure 15. Q-Q Plots of R₁/σ² When the Simulation Function Equals 7x·I{0<x≤0.3} + 7(0.3)·I{0.3<x≤1} + 7(x − 0.8)·I{0.8<x≤1}
Figure 16. Q-Q Plots of R₀/σ² When the Simulation Function Equals 7x·I{0<x≤0.3} + 7(0.3)·I{0.3<x≤1} + 7(x − 0.8)·I{0.8<x≤1}
Figure 17. Q-Q Plots of R₀/σ² When the Simulation Function Equals 7x·I{0<x≤0.3} + 7(0.3)·I{0.3<x≤1} + 7(x − 0.8)·I{0.8<x≤1}
[Captions for Figures 18, 20 and 23, boxplot-style figures whose axis labels read Weight, Tarsus and Beak Width, are not recoverable.]
Figure 19. Boxplots of Wing Length for Mixed (Left), Male (Middle), and Female (Right) Song Sparrow Data
Figure 21. Boxplots of Beak Length for Mixed (Left), Male (Middle), and Female (Right) Song Sparrow Data
Figure 22. Boxplots of Beak Depth for Mixed (Left), Male (Middle), and Female (Right) Song Sparrow Data
Figure 24.
Pairwise Scatterplots of the Six Independent Variables for Song Sparrow Data
Figure 25. Pairwise Scatterplots of the Six Independent Variables for Female Song Sparrow Data
Figure 26. Pairwise Scatterplots of the Six Independent Variables for Male Song Sparrow Data
Figure 27. Scatterplot of Weight and Wing Length for Song Sparrow Data
Figure 28. Scatterplot of Weight and Tarsus for Song Sparrow Data
Figure 29. Scatterplot of Beak Length and Beak Width for Song Sparrow Data
Figure 30. Scatterplot of Weight and Beak Length for Song Sparrow Data
Figure 31. Scatterplot of Beak Length and Beak Depth for Song Sparrow Data
Figure 32. The Relationship between Weight and Survival Probabilities of the Female Song Sparrows
Figure 33. The Relationship between Wing Length and Survival Probabilities of the Female Song Sparrows
Figure 34. The Relationship between Tarsus and Survival Probabilities of the Female Song Sparrows
Figure 35. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows
Figure 36. The Relationship between Beak Depth and Survival Probabilities of the Female Song Sparrows
Figure 37.
The Relationship between Beak Width and Survival Probabilities of the Female Song Sparrows
Figure 38. The Relationship between Weight and Survival Probabilities of the Male Song Sparrows
Figure 39. The Relationship between Wing Length and Survival Probabilities of the Male Song Sparrows
Figure 40. The Relationship between Tarsus and Survival Probabilities of the Male Song Sparrows
Figure 41. The Relationship between Beak Length and Survival Probabilities of the Male Song Sparrows
[Caption for Figure 42 not recoverable; its y-axis label reads "Logit of the Estimates of Survival Probabilities".]
Figure 43. The Relationship between Beak Width and Survival Probabilities of the Male Song Sparrows
Figure 44. The Relationship between Tarsus and Survival Probabilities of the Female Song Sparrows by Kelly and Rice's Method
Figure 45. The Relationship between Tarsus and Survival Probabilities of the Female Song Sparrows
Figure 46. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows by Kelly and Rice's Method
Figure 47. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows by Kelly and Rice's Method (Knots Chosen by AIC)
Figure 48. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows by Pointwise Monotone Method (Knots Chosen by GCV)
Figure 49. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows by Pointwise Monotone Method
Figure 50. The Relationship between Beak Width and Survival Probabilities of the Female Song Sparrows
Figure 51.
The Relationship between Beak Width and Survival Probabilities of the Female Song Sparrows
Item record (UBC Open Collections): Zuo, Yanling (1990). Monotone regression functions. MSc thesis, Department of Statistics, University of British Columbia. DOI: 10.14288/1.0098248. URI: http://hdl.handle.net/2429/29457. Rights: for non-commercial purposes only, such as research, private study and education; see https://open.library.ubc.ca/terms_of_use.