M O N O T O N E R E G R E S S I O N F U N C T I O N S By Y A N L I N G ZUO B . S c , Beijing Institute of Technology, 1984 M . S c , Beijing Institute of Technology, 1987 A THESIS S U B M I T T E D IN P A R T I A L F U L F I L L M E N T OF T H E R E Q U I R E M E N T S F O R T H E D E G R E E OF M A S T E R OF S C I E N C E in T H E F A C U L T Y OF G R A D U A T E STUDIES D E P A R T M E N T OF STATISTICS We accept this thesis as conforming to the required standard T H E U N I V E R S I T Y OF BRITISH C O L U M B I A September 1990 ©Yanling Zuo, 1990 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department of STATISTICS The University of British Columbia Vancouver, Canada Date R K P T . ? a r i g g n DE-6 (2/88) M o n o t o n e R e g r e s s i o n F u n c t i o n s A b s t r a c t In some applications, we require a monotone estimate of a regression function. In others, we want to test whether the regression function is monotone. For solving the first problem, Ramsay's, Kelly and Rice's, as well as point-wise monotone regres-sion functions in a spline space are discussed and their properties developed. Three monotone estimates are defined: least-square regression splines, smoothing splines and binomial regression splines. The three estimatesMepend upon a "smoothing pa-rameter": the number and location of knots in regression splines and the usual A in smoothing splines. Two standard techniques for choosing the smoothing parameter, G C V and A I C , are modified for monotone estimation, for the normal errors case. For answering the second question, a test statistic is proposed and its null distribution conjectured. Simulations are carried out to check the conjecture. These techniques are applied to two data sets. n Table of Contents Abstract ii Table of Contents iii List of Tables vi List of Figures vii Acknowledgement xiii Chapter 1. Introduction 1 Chapter 2. The Sobolev Space W2m[a,6] 4 Chapter 3. Splines 6 1. Definition of Sm (tu..., tk) 6 2. Truncated Power Basis 6 3. B-splines and I-splines 8 4. Monotone Splines 10 Chapter 4. Regression 14 1. Quadratic Programming 14 2. Least Square Regression Splines 15 3. Smoothing Splines 17 3.1. Unconstrained Smoothing Splines in the Normal Model 17 3.2. Monotone Smoothing Splines 19 4. Choice of the Smoothing Parameter 20 4.1. GCV and Its Asymptotic Properties 21 in 4.2. A I C and Its Asymptotic Property 22 4.3. G C V Applied to Monotone Constrained Problems 23 4.3.1. The Approximation Form of G C V in Regression Splines 24 4.3.2. The Approximation Form of G C V for Monotone Constrained Problems in Smoothing Splines 24 4.4. A I C in Monotone Constrained Problems in Regression Splines 25 C h a p t e r 5. D a t a A n a l y s i s 1: the Re la t ionsh ip between S u r v i v a l P robab i l i t i e s and B i r t hwe igh t of M a l e Infants 27 1. Description of the Data Set 27 2. Exploratory Data Analysis 27 3. Analysis of the Data by Least Square Monotone Regression Splines 29 4. Analysis of the Data by Monotone Smoothing Splines 30 5. Some Discussions on Testing the Monotone Assumption of the Regression Function 30 5.1. The Summary Statistics 30 5.2. Simulation 31 5.3. Conclusion 32 C h a p t e r 6. D e s c r i p t i o n of Song Spar row D a t a and E x p l o r a t o r y D a t a A n a l y s i s 33 1. Data Set Description 33 2. Exploratory Data Analysis 33 2.1. Sex Effect of Birds on the Values of Independent Variables 34 2.2. Pairwise Relations of the Six Independent and Sex Effect on the Relations . . 35 2.3. Principal Component Analysis 36 iv 2.4. Some Rough Pictures of the Relations of Survival Probabilities and the Independent Variables 38 2.5. Generalized Linear Models 39 Chapter 7. Binomial Regression Splines 45 1. Unconstrained Binomial Regression Splines 45 2. Monotone Binomial Regression Splines 48 3. GCV in Binomial Regression Splines 48 3.1. GCV in Unconstrained Binomial Regression Splines 48 3.2. GCV in Constrained Binomial Regression Splines 50 4. AIC in Binomial Regression Splines 41 5. Data Analysis 2: the Relationship between Survival Probabilities and Tarsus, Beak Length and Beak Width of the First Year's Female Song Sparrows 51 5.1. Survival Probabilities and Tarsus 51 5.2. Survival Probabilities and Beak Length 52 5.3. Survival Probabilities and Beak Width 53 Bibliography 54 v List of Tables Table 1. Male Births Data 28 Table 2. Mean, Median, Standard Deviation and Standard Error of the Means of the Six Independent Variables 35 Table 3. Logistic Models Fit to the Female Song Sparrow Data 40 Table 4. Logistic Models Including Six Independent Variables for Female Song Sparrow Data 41 Table 5. Logistic Models Including T and Its Second Interaction Term 41 Table 6. Logistic Models Including B L and Its Second Interaction Term 42 Table 7. Logistic Models Including B W and Its Second Interaction Term 42 Table 8. Logistic Models Fit to the Male Song Sparrow Data 43 vi List of Figures Figure 1. Fraction of Surviving Versus Birthweight 57 Figure 2. The Relationship between Birthweight and Survival Probabilities of the Male Infants by Least Square Regression Splines 58 Figure 3. The Relationship between Birthweight and Survival Probabilities of the Male Infants by Least Square Regression Splines 59 Figure 4. Grid Search on G C V 60 Figure 5. The Relationship between Birthweight and Survival Probabilities of the Male Infants by Smoothing Splines 61 Figure 6. Q-Q Plots of the Test Statistic T When the Simulation Function Equals 7Sin(l.5x) 62 Figure 7. Q-Q Plots of the Test Statistics T When the Simulation Function Equals 75m(1.5x) 63 Figure 8. Q-Q Plots of R\jc2 When the Simulation Function Equals 75m(1.5x) 64 Figure 9. Q-Q Plots of R\/u2 When the Simulation Function Equals 75m(1.5a;) 65 Figure 10. Q-Q Plots of R0/cr2 When the Simulation Function Equals 75m(1.5x) 66 vii Figure 11. Q-Q Plots of RQ/(T2 When the Simulation Function Equals 75m(1.5x) 67 Figure 12. Q-Q Plots of the Test Statistic T When the Simulation Function Equals 7xI{0<iB<o.3} + 7 * 0.3J{o.3<s<i} + 7 (x - 0.8)I{0.8<*<1} 6 8 Figure 13. Q-Q Plots of the Test Statistic T When the Simulation Function Equals 7xl{0<x<0.3y + 7 * 0.31{0.3<2;<i} + 7(x - 0.8)I{o.s<x<i} 69 Figure 14. Q-Q Plots oiRi/a2 When the Simulation Function Equals 7xl{0<x<0.3y + 7 * 0.31{0.3<z<i} + 7(x - 0.8)I{0.8<x<i} 70 Figure 15. Q-Q Plots of R\/o-2 When the Simulation Function Equals 7x!{0<x<o.3} + 7 * 0.37{0.3<r<i} + 7{x — 0.8)I{o.8<x<i} 71 Figure 16. Q-Q Plots of Ro/a2 When the Simulation Function Equals 7xI{Q<x<0.3} + 7 * 0.3/{0.3<x<i} + 7(x - 0.8)7{0.8<x<i} 72 Figure 17. Q-Q Plots of Ro/cr2 When the Simulation Function Equals 7x!{0<x<o.3} + 7 * 0.3/{0.3<x<i} + 7{x - 0.8)/{0.8<r<i} 73 Figure 18. Boxplots of Weight for Mixed(Left), Male(Middle), and Female(Right) Song Sparrow Data 74 Figure 19. Boxplots of Wing Length for Mixed(Left), Male(Middle), and Female(Right) Song Sparrow Data 75 V l l l Figure 20. Boxplots of Tarsus for Mixed(Left), Male(Middle), and Female(Right) Song Sparrow Data 76 Figure 21. Boxplots of Beak Length for Mixed(Left), Male(Middle), and Female(Right) Song Sparrow Data 77 Figure 22. Boxplots of Beak Depth for Mixed(Left), Male(Middle), and Female(Right) Song Sparrow Data 78 Figure 23. Boxplots of Beak Width for Mixed(Left), Male(Middle), and Female(Right) Song Sparrow Data 79 Figure 24. Pairwise Scatterplots of the Six Independent Variables for Song Sparrow Data 80 Figure 25. Pairwise Scatterplots of the Six Independent Variables for Female Song Sparrow Data 81 Figure 26. Pairwise Scatterplots of the Six Independent Variables for Male Song Sparrow Data 82 Figure 27. Scatterplot of Weight and Wing Length for Song Sparrow Data 83 Figure 28. Scatterplot of Weight and Tarsus for Song Sparrow Data 84 Figure 29. Scatterplot of Beak Length and Beak Width for Song Sparrow Data . 85 ix Figure 30. Scatterplot of Weight and Beak Length for Song Sparrow Data 86 Figure 31. Scatterplot of Beak length and Beak Depth for Song Sparrow Data .. 87 Figure 32. The Relationship between Weight and Survival Probabilities of the Female Song Sparrows 88 Figure 33. The Relationship between Wing Length and Survival Probabilities of the Female Song Sparrows 89 Figure 34. The Relationship between Tarsus and Survival Probabilities of the Female Song Sparrows 90 Figure 35. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows 91 Figure 36. The Relationship between Beak Depth and Survival Probabilities of the Female Song Sparrows 92 Figure 37. The Relationship between Beak Width and Survival Probabilities of the Female Song Sparrows 93 Figure 38. The Relationship between Weight and Survival Probabilities of the Male Song Sparrows 94 Figure 39. The Relationship between Wing Length and Survival Probabilities x of the Male Song Sparrows 95 Figure 40. The Relationship between Tarsus and Survival Probabilities of the Male Song Sparrows 96 Figure 41. The Relationship between Beak Length and Survival Probabilities of the Male Song Sparrows 97 Figure 42. The Relationship between Beak Depth and Survival Probabilities of the Male Song Sparrows 98 Figure 43. The Relationship between Beak Width and Survival Probabilities of the Male Song Sparrows 99 Figure 44. The Relationship between Tarsus and Survival Probabilities of the Female Song Sparrows 100 Figure 45. The Relationship between Tarsus and Survival Probabilities of the Female Song Sparrows 101 Figure 46. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows by Kelly and Rice's Method 102 Figure 47. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows by Kelly and Rice's Method 103 xi Figure 48. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows by Pointwise Monotone Method 104 Figure 49. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows by Pointwise Monotone Method 105 Figure 50. The Relationship between Beak Width and Survival Probabilities of the Female Song Sparrows 106 Figure 51. The Relationship between Beak Width and Survival Probabilities of the Female Song Sparrows 107 xii Acknowledgement I would like to express my deep gratitude to my advisor, Professor Nancy E. Heckman, for her constant encouragement and very helpful advice, which are crucial to this work and very much beyond, and for her patiently reviewing the draft. Very special thanks are due Professor Constance Van Eeden and Dr. Jean Meloche for reading the manuscripts and making constructive comments. I am indebted to Dr. Dolph Schluter for his help in biology and Dr. Chong Gu for his help in computation. Also, I thank Dr. Dolph Schluter and Dr. James N.M.Smith for allowing me to use the song sparrows data. This research was partially supported by the National Science and Engineering Research Council of Canada Grant Number 5-87969. xiii Chapter 1. Introduction In this paper, effective methods for fitting monotone functions are discussed: least square regression splines, binomial regression splines and smoothing splines. These estimates are then used to decide if the true regression function is monotone. In some applications, we require the estimate of the regression function to be monotone. For instance, if f(x) is a response of a test of ability carried out by varying the degree of stress, x, it is reasonable to expect that / is a decreasing function of x. Monotone regression has been used in assessing the interaction between two drugs (Kelly and Rice, 1988). Asynergism is one commonly used definition of a lack of interaction. Asynergism is defined in terms of f(x,y), f\(x) = / ( # , 0) and f2(y) = / (0 , y), where f(x, y) is the expected response at dosage x of one drug and dosage y of the other. Testing the null hypothesis that the two drugs are asynergistic requires the estimation of fx and f2 and inversion of the estimates. Since drug response is usually a monotone function of dose and since the estimates of fx and f2 must be invertible, fx and f2 are estimated by monotone functions. Monotone regression is also used in the quantification of a plant's ability to absorb different ions (Meier, 1989). In the study of a corn seedling's ability to absorb nitrogen, the experimenter is interested in the relationship between the nitrate concentration, u, of a solution surrounding the roots of a corn seedling and the seedling roots' rate of uptake (absorption) of nitrate, The uptake rate is not measured directly. Instead, the nitrate concentration of the surrounding solution, C , is measured over time, the rate of uptake determined by minus the derivative of this curve. The relation between $ and u can be expressed as *(U) = | ( = c _ 1 ( u ) = -C\C-\u)). 1 So, we need estimates of C and C - 1 . It is necessary to assume that C is a monotone function to get a unique C - 1 and thus $. In other applications, we may want to test whether the regression function is monotone. For example, in the study of natural selection, biologists wish to study the relationship between survival and some physical trait. If the survival function is unimodal, biologists say that selection is stablizing. If the survival function is monotone, selection is directional. Some techniques are presented for estimating monotone functions. Brunk (1955) proposed the Pool-Adjacent-Violators algorithm to solve the following problem. Given observations {yi, y2, • • •, Un}, with E(yi) = u,, find {u\, u2, • • •, ^n} to minimize YLiiUi — iii)2 subject to the restriction ux < u2 < ... < un- While Brunk's method gives the maximum likelihood estimate of E(yi), assuming y,- is normal, the resulting estimate is unappealing, due to its roughness. To get a smooth and monotone estimate, Friedman and Tibshirani (1984) combine the running-mean smoothing technique with Brunk's isotonic regression method. Instead of estimating the entire regression function, both methods give estimates of evaluations of the regression function at the observed data points. This paper focuses on fitting monotone functions by regression and smoothing splines. Ramsay (1988) and Kelly and Rice (1989) independently defined monotone regression functions in a spline space. Their definitions and their equivalence are discussed in Chapter 3 . Point-wise approximation to monotonicity is also given. Us-ing these monotonicity constraints, monotone least square regression, smoothing and binomial regression spline estimates are introduced and developed in Chapter 4 and Chapter 7 . . A l l three estimators depend upon a smoothing parameter: the number and location of knots in the regression estimate, and the usual A in the smoothing spline 2 estimate. For choosing the smoothing parameter in each of the methods, generalized cross validation (GCV) and Akaike information criterion (AIC) are used. Villalobos and Wahba (1987) introduced inequality-constrained multivariate smooth-ing splines and applied them to the estimation of posterior probabilities, where the estimated probabilities are constrained to lie between 0 and 1. Techniques for testing the monotone assumption are not as well developed as those for estimating a monotone function. Barlow et al. (1972) proposed the generalized likelihood ratio test statistics, x 2 (tf-2, the variance of y,-, known) and E2 (cr2 unknown), to test the hypothesis HQ: u(xx) = u(x2) = . . . = u(rcfc), vs Hi: u(xi) < u(x2) < • • • < u(xk) where for each i in { 1 , 2 , . . . , n; values of y are observed when x has values x;. The null hypothesis distributions of x 2 and E2 are complicated, especially when h is large, and the usual asymptotic results concerning likelihood ratio tests do not hold. In Chapter 5, we discuss techniques to determine if the true regression function is monotone. Instead of making a theoretical proof, we carry out simulations to study our techniques. Techniques are applied to two data sets in Chapter 5 and Chapter 7. 3 C h a p t e r 2. T h e S o b o l e v S p ac e W 2 m [a,6] In the regression model, the unknown function / is assumed to be in some large function space. The Sobolev space W 2 m is commonly used. A function, / , defined on [a, b] is said to belong to the Sobolev space W 2 m [a, b] if / has the following properties: (i) / , / W , . . . , fi™-1) are absolutely continuous. (ii) f(m) is square integrable. The following facts are well known (see, for instance, Kimeldorf and Wahba, 1 9 7 1 ) (i) W™[a, b] is a Hilbert space with inner product m-1 </,<?>= E r + / r{v)gm{v)dv. (i: i=o J a (ii) Lt, which has domain W 2 m [a, b] and range R, defined by Lt(f) = f(t), is a continuous linear functional with Lt(f) =< Kt, f > where Kt(s) = is called a reproducing kernel. Two useful subspaces of W 2 m [a, b] in the theory of smoothing splines are Hi and Ho- They are defined as Hi = {fe W?[a, b] : / « ( a ) = 0, j = 0 , . . . , m - 1} and H0 = {fe W?[a, b] : / ( m ( i ) = 0 a.e}, and one can show that W 2 m = Hq@HX. PQ and P x denote two linear operators. P0 is the projection of W™ onto H0 and TO Po(/) = £<tf;,/>lfri (3) where { ^ 1 , • • •, V'm} is a basis for polynomials of order m which spans Ho- P\ is the projection of W 2 m onto Hx, m Pi(/)(*) = / ( * ) - E < ^ / > ^ ( 0 ( 4 ) and \\Pi(fW = [\f{m\t)Ydt. (5) The concepts mentioned above will be used in Chapter 4, part 3, smoothing splines. 5 Chapter 3 . Splines In this chapter, we present the definition of spline space Sm(ti,..., tk). To describe the elements in Sm(ti,.. ., tk), we introduce three bases for it, truncated power basis, B-splines and I-splines. Finally, we will discuss the definition of monotone splines. The concepts and the results of this chapter will be applied to regression problems in Chapter 4 and Chapter 7. 1. Definition of Sm(ti, ...,tk) Sm(ti,..., tk) denotes the set of mth order splines. A spline of order m with knots t 0 = 0 < ti < ... < tk < tk+1 — 1 is a polynomial of order m on each subinterval, [i,-, and has m — 2 absolutely continuous derivatives on [0,1]. The knots, i l 5 . . . , tk, are called interior knots. It is well known that Sm(ti,..., tk) has dimension m + k (de Boor, 1978). 2. Truncated Power Basis One of the bases for Sm(ti,..., tk) is the truncated power basis, Xi,..., Xm+k where X1{t) = 1 = t xm(t) = t m—l m—l + m—l 6 and x+ = xl{x > 0}, (x+)° = I{x > 0}. If m+k /(*) = £ frXiit), i=l (6) then simple calculations reveal i) / is a piecewise polynomial of order m on any subinterval [t,-,r t + 1), since on m+k m [U,ti+1), /(*) = £ = s o m e i=i J'=I ii) / has m—2 absolutely continuous derivatives, since each Xi has m—2 absolutely continuous derivatives. Proof: For i — 1, . . . , k dm~2Xm+i(t) dtm~2 = (m-l)!(t-<«•)+• ( 7 ) iii)-X",'s are linearly independent. Proof: For t € [0,*i),/(i) = E i * * = E W ' _ 1 = 0 implies ft = .. . = 0m = 0, and so for i G [tut2), Z?+k PiX^t) = fim+1(t - = 0, which implies fim+\ — 0. Working forward in this manner, one easily sees that /?,• = 0, i = 1,. . . , m + k, if E r + f c = 0, for < e [0,1]. Although the simplicity of the truncated power basis makes it attractive for sta-tistical work, the basis tends to be ill-conditioned as is often manifested in a nearly singular matrix, X Xi(xi) Xx{x2) Xm+k(xi) Xm+k(x2) \ \ Xx(xn) ... Xm+k(xn) j Here, X\,...,xn represent data points. Among other things, this makes the inverse of X*X difficult to calculate. Thus, it is desirable to explore the use of other bases which are better conditioned. 3. B-spl ines and I-splines Given knots which satisfy 0 = i_( m _i) = . . . = t 0 < h < ... < tk < tk+i — • • • = tk+m = 1, (8) the mth-order m + k dimensional B-spline basis, {-B_( m _i) , m , . . . , Bh,m} (de Boor, 1978), is defined as Bi,m(t) = 3 7 ^ - m _ i ( i ) + t+m_ —B i + i i m -i(t) with Biti(t) = < 1 r € [*t,*,-+i) »' = 0 , . . . , - 1 1 * e i = k 0 otherwise and 0/0 = 0. (9) The B-spline basis is more convenient computationally than the power series basis since each B^m has support [£,-, i ! + m ) (de Boor, 1978), making X ' X banded and easier to invert. Another basis, which is used to define monotone regression splines (Ramsay, 1988), is the I*-spline basis of order m (m > 2), {ri( m_ 2), m, • • •, Ik,m}, with knots as in (8). I* m ( i ) = f* Mitm-i(u)du i = - ( m - 2 ) , . . . Jo k where MitTn_i(u) - — . (10) 't+m —1 Since each B^m_i is a piecewise polynomial of order m — 1 with i?; i T O _i(i) > 0, each I*m wil l be a non-decreasing piecewise polynomial of order m. However, the above definition of the I*-spline basis is incomplete, since i,*m(0) = 0, i = —(m — 2),. . . , k, and there exist functions, / , in S m (< i , . . . , tk) with /(0) ^ 0. Also, the basis for a n m + i dimensional space should have m + k elements, so an extra basis element, Il(m-i)im, should be included. Theorem 1. Let / ! ( m _ 1 ) i 7 7 1 = 1. Then { i ^ ( m _ 1 ) i i n > i " ! ( m - 2 ) , m , • • •, I*k,m] is a basis for Sm(*i,...,**). Proof: Since {-B_(m_i),m, • • •, -Bjt,m} is a basis for S m ( r i , . . . , tk), it suffices to show that any Bi,m(l — —(m — l ) , . . . , k) can be written as a linear combination oi I^m_^m, I*( m_ 2), m , r* From de Boor (1978, page 138 equation 9) ( m - l ) ( - f ' + 1 ' m - l ( 3 ; ) + f '""- l ( ?) 7 = - ( m - 2 ) , . . . , f c - l ^•Bt,m(g) - ( m - l ) g - / m 7 2 ) ' m ~ l ( t ) Z = - ( m - l ) (11) ( m - 1 ) •Bfc,m-l(i) tk+m-l—tk and thus - f * (rn-l)Bl+1 .(u) ^ ft ( m - l ) B ( , m _ i ( « ) + C , f = - ( m - 2 ) , . . . , A ; - l /o(wC)!r-^)du+c'* l = k where C\ = B; ) m (0) . (12) (13) k- 1 From the definitions of / l ( r o _ 1 ) i m , -fl(m-2),m> • • • > -^ ,m> w e obtain -^+i ,m (0 + J/tmC*) + ^ ( m _ 1 ) i m ( * ) / = - ( m - 2), J3/ l B.(0 = - i : ( m _ 2 ) , m ( i ) + c _ ( m _ D i : ( m _ 1 ) i m ( i ) / = - ( m - 1) (-14) ^ / f c*m(i) + c f c £ ( m _ 1 ) i m ( t ) i = * Therefore, for any 5 ; > m , there exist a/s , such that Btym(t) = EL-(m-i)«.^m(<)' Hence, { I * ( r o _ 1 ) i m , iH ( ro_2),m. • • •> 7fc,m} i s a b a s i s f o r Sm(h,. .., t*). In order to study the relation between I* and B-splines more easily, we defined an improved I-spline basis of order m as r* X Ii,m(t) = / Mi,m-i(u)du + £ Bi,m(°)i i = ~(m - 1), • • •, J o i=i where M_(m_i)im_i(ti) = 0. (15) Since 22-(m-i) B^m(t) = 1, I-(m-i),m(t) = 1. Wi th the inclusion of the constant terms, E J U -S(,m(0), the relationship between the I-spline and B-spline bases is easily stated. Theorem 2. Given a choice of knots satisfying (8), Yl&jlj,m = YLfiiBj,-™. if and only if Pj = E i ( r o _ i ) Proof: It suffices to show that I»,m(i) = E?=: BitTn(t). From (14), we have -It+1,m(t) + IUt) + Bl<m(0) I = - ( m - 2 ) , . . . , fc - 1 BUt) = { -Z{m_2)tm(t) + B . ( n i _ 1 ) , m ( 0 ) l = -(m- 1) (16) ^ rk.m(t)+Bktm(0) l = k Combining (16) and the definition of I-splines yields —ii+i,m(<) + Ej=/+i Bjtm(Q) + IitTn - J2k=i Bjim(0) +Blim(0) l = - ( m - 2 ) , . . . , k - l - I - (m -2) , m (<) + E*=-( m - 2 ) Bjtm(0) + S _ ( m _ 1 ) i m ( 0 ) I = - ( m - 1) h,m{t) - Bktm(0) + Bk,m(0) I = k -ii+l,m(*) + Il,m(t) 1 = ~(m — 2), . . . ,k — I — J-(m-2),m(*) + I-(m-l),m I = —(m ~ 1) ijfc,m(*) * = k Bl,m(t) = < Therefore 4. Monotone Splines Ii,m = E BltT l=i (17) In some applications, one may want to require that the regression function be monotone. For instance, if / (£) is a response to a drug administered at dosage t, it 10 is reasonable to assume that / is a non-decreasing function of t. Three definitions of non-decreasing splines of order m are presented. (a) Improved Ramsay's monotone I-splines For Iitm(t) as in (15), a non-decreasing regression spline is defined as k /(*)= E <*.•£>(*), w i t h a i > 0 , i = -(m-2),...,k. i=-(m-l) Since B[tTn-i(t) is non-negative, by definition (10), is a non-decreasing function of t. Thus, / is non-decreasing. (b) Kelly and Rice's monotone B-splines (Kelly &: Rice, 1989) For B-splines defined as in (9), Kelly and Rice define a non-decreasing regression spline as k /(*)= E With /?_(„,_!) < . . . < 0k-t'=-(m-l) (c) Non-decreasing splines k /(*) = E PiBi,m(t) is non-decreasing subject to the condition that/'(t) > 0. :=-(m-l) (For the case m = 2, in which case / '(£;) is undefined, take f'(t) > 0 almost every-where.) Theorem 3. Wi th the same choice of knots and order, the set (a) of monotone splines of Ramsay is equal to the set (b) of monotone splines of Kelly and Rice. This follows directly from Theorem 2. Theorem 4. Suppose that m = 1, 2 or 3. If f(x) = Xw=-(m-i) PiBi,m(x), then / is monotone if and only if /?_(m_!) < /?_(m_2) < . . . < 0k-Proof: First consider the case m = 1. Then k k-l f(x) = 120iBiAx) = E M * « < * < t « + i } + ^i{tk<x<tk+1}-1=0 1=0 11 For x G [ti, t,+i), i = 1, . . . , k, and y G [tj, tj+1), j = 1,.. . , k with y > x, f(x)=/3i and f(y) = pj. Thus f is monotone, if and only if / (y) > / (x ) , if and only if 0j > /?,-, j > i, and i = 0 , . . . , k. For cases m = 2 and m — 3, from (11) / ' ( * ) = £ / ? i ^ m ( x ) ;=-(m-i) = ( ™ - i ) i - / 3 - ( m - i ) — ; + A l - : : + T H K t\ — r_( m_ 2) i=_im_2) V W+m — W+l tl+m-l—t[/ +0 Bk,m-l(x) = ( m - 1 ) £ f* ~ ^ B,,m-i(x). (18) - i (* ) ) l=-(m-2) t ' + m - l ~ tl For the case m = 2, for x 7^ we have „, \ X Pi — 01-1 rj / N X Pi — 01-1 T f = 0 r'+! _ r' f = 0 r ' + x — r ' For x G (i j , r t - + i) , zi+l ~ zi Thus / is monotone, if and only if / ' > 0 on each interval ( t ,- , i ! + 1 ) , if and only if A > 0i-i, i = 0,...,k. For the case m = 3, since / ' is piecewise linear and continuous, / is monotone if and only if /'(*,-) > 0, i = 0 , . . . , k + 1. From the definition of B-splines, Bi,2(x) = — ^ r ^ { t I < x < t i + 1 } + / ' + 2 ,X J{« 1 +i<»<«i+ 3}- (I9) W+l — If W+2 — W+l When / = k, I{x G [tk,tk+1)} is replaced by I{x G [i t , r f c + i ]}. 12 Thus, for / = — 1,. . . , k, f 1 i = I + 1 I 0 otherwise Combining (18) and (19) yields f\U) = 2 f - 1 " f - 2 . (20) Therefore, f'(U) > 0 if and only if & _ i > ft_2. Theorem 4 is proved. From our discussion, we conclude that the definitions of monotone regression splines in (a) and (b) are equivalent for all m and equal the definition in (c) when m < 3. 13 Chap te r 4. Regress ion In this chapter, two nonparametric regression methods, least square regression splines and smoothing splines, are introduced and developed for analysing data sets under monotonicity constraints on the regression function. In the numerical compu-tation of the methods, quadratic programming is involved. 1. Quadra t i c P r o g r a m m i n g The general form of a quadratic programming problem with linear inequality con-straints is min xzRn c'x + - x ' G ? subject to A*x > b (21) for constant matrices G and A and vectors c and b. (For further details, see Gi l l , Murray and Wright 1981.) For solving this problem, we introduce the following concepts: (i) . Feasible Solution Any point, x, that satisfies all the constraints in (21) is called a feasible solution. (ii) . Active and Inactive Constraints Let qi denote the i th column of A, and b = (&(i), . . . , b(k)Y- At the feasible point x, the constraint q\x > &(,•) is said to be active if q\x — b^, and inactive if q\x > b^y If the matrix G is strictly positive definite, problem (21) has a unique minimum. If x* is the solution with active constraints A{x* = bi, then x* minimizes c'x + \xlGx subject to A\x = bx. (The ith. column of Ai and the ith. element of bx correspond to the i th active constraint.) Using the method of Lagrange multipliers, c + Gx* - A i A = 0 14 and it can be shown that there exists A satisfying the above such that A, > 0. Con-versely, if x* is a feasible point which minimizes c'x + | x J G x subject to A\x — bx and if c + Gx* — A\X = 0 for some A with A,- > 0, then x* is the solution to (21). Quadratic programming is an iterative technique. A sketch of the general procedure is: (i) . Find an initial feasible solution, x°. (ii) . A t the k-th step, separate the constraints into two parts, active and inactive, where A\xk = bx represents the active constraints. (iii) . Minimize c'x + | x ' G x subject to A[x = bx and obtain the solution xk+1. (iv) . F ind A satisfying c + Gxk+1 — AXX = 0. If A,- < 0, remove qfxix = bn from the active constraints, where g l t- is the ith. column of Ax and 6lt- is the z'th element of b\. (v) . If the boundary of an inactive constraint is attained or violated by x f c + 1 , add the constraint to the active set. (vi) . Go back to (iii), using the new active constraint set as defined in (iv) and (v), and follow the procedure until the active constraint set doesn't change. 2. Least Square Regress ion Splines Consider the regression problem where we have observations yi at design points and assume the observations of (x,-, y,), i = 1,.. . , n are related by the regression model yi = f(xi) + ei, * = l , . . . , n (22) where the e,- are zero mean uncorrelated random variables with a constant variance cr2, a < Xj < Xi < ... < xn < 6, and assume / is an unknown function in the Sobolev space W™[a, b]. Given a basis { X _ ( m _ ! ) , . . . , Xk} for Sm[ti,..., 4] , we approximate / 15 by a function of the form «(*) = £ j=-(m-l) Set Q = (/?_(„_!),...,ft) 4, X = {Xi:j}, X^ = Xj(xi), i = l,...,n, j = -(m -1),. . . , k, y = ( j / i , . . . , y n ) f - If X is of full rank m + k < n, the least square spline estimator of u for the given basis is m+k u(t) = £ PjXj(t), where J3 minimizes ||y - Xg\\2and is given by J3 = [XtX}~1Xty. i=i In some applications, a monotone function, u, is required. If monotonicity con-straints are imposed, the fitted parameter Q is changed. There are two types of mono-tonicity constraints, as discussed in Chapter 3, Part 4. Below, these are formulated as quadratic programming problems, for numerical computation. (i) . Kelly and Rice's monotone splines Choosing B-splines as in (9) of Chapter 3 as the basis of Sm, X = {X^}, X^ = Bjtm(xi), i = 1, • •. ,n, j = —(m — 1 ) , . . . , k, § and y are the same as before. Write the constraints /?_(m_x) < . . . < ft in matrix form, AlQ > b, where the diagonal elements of A1, a^i, equal —1, and a , j l + 1 equal 1, others are zero and b — 0. When the interior knots ti, i = 1, . . . , k, are unique, X is of full rank m + k < n, the least square spline estimator of / for the given basis is the unique solution of the following minimization problem, min^ -yxXP + ^tfX'Xg subject to A'g > b. (23) which can be solved by quadratic programming. (ii) . Point-wise monotone splines 16 The expression of the non-decreasing constraint in Chapter 3, Part 4, (c) is u'(t) > 0 which can be approximated by the conditions ti(xa) < u(x2) < • • • < u(xn) (24) in numerical computation. As above, we can write the conditions as linear inequality constraints, A/3 > 6, where Aij = i?j ) 7 n(x, +i) — JE?jim(x,), i = 1,... , n — 1, j = —(m — 1),..., k and 6 = 0. Then, finding the least square spline estimator of u under the constraints becomes the typical quadratic programming problem. 3. Smoothing Splines 3.1. Unconstrained Smoothing Splines Consider the regression model (22). To estimate / , we want an estimator which fits the data well but, at the same time, has some degree of smoothness. The natural measure of smoothness associated with a function / £ W 2 m [a, b] is / a 6 f(m\t)2dt, while a standard measure of goodness-of-fit to the data is n~x Y^j=\{yj ~ f(xj))2- Thus, one method of combining these two criteria to obtain an overall performance measure is to find / which minimizes where A > 0, controls the relative importance of smoothness and fit. The minimizer, / , is a natural spline of order k = 2m, with simple knots at xi,...,xn (Schoenberg, 1964, Reinsch, 1967). From (5), using the notation of Chapter 2, min n" 1 £(y,. - f(x,))2 + A [\fm(t))2dt i=i J a = min n-^yi-WW + XlWm*. (25) t=i 17 Now, let ipj(t) = V x , j = 1,.. . , m, and r/,-, i = 1,. . . , n be the representors of Li in W? with Li(f)=< f,r,i>, Let £ = P x ^ , ) = / 0 6 ( i - u)!p_1(x,- - u)+~7[(m - l)\]2du, i = 1,. . . , n. Then the minimizer, /A , can be expressed as 771 71 E ^ W + E ^ W - (26) j=l 3=1 Let m w , , f ( » . - - » ) r 1 ( » i - " ) i m " 1 ) , T / (S)o- = < 6, >= X [(m-!)!]» It can be shown that the coefficients c = ( c 1 ? . . . , c n ) ' and d = (e?i,..., dnY in (26) are obtained by solving the following problem, m i n c r f ||y - T c j - E c | | 2 + nAc'Ec Subject to T*c = 0. (27) From its definition, it is easily seen that S is positive definite. If T is of full column rank, the minimizer, fx , is unique (Kimeldorf &: Wahba, 1971, Duchon, 1976, Wahba & Wendelberger, 1980). The equality constraint T'c = 0 can be nicely eliminated from (27) as follows: let Q — [Qx : Q2], orthonormal, and R be the Q — R decomposition of T, that is T = (Ql: <?2) I R ^ ^ 0(n— m)*m j and let c = Q2e. Since T*c = 0, such a representation always exists. The problem of (27) becomes min^g | | y - T d - E Q 2 e | | 2 + nAe 'Q 2 EQ 2 e . (28) 18 Then, it is straightforward to find the minimizers, d and e by differentiating (28). 3.2. Monotone Smoothing Splines If the smooth function we want to estimate is monotone, then new constraints must be added to d and c. It can be shown that the minimizer of (28), fx, subject to /'(<) > 0 for all t, is (i) a polynomial of degree 2m — 1 in each open subset of [a, b] — 5 — {xi,..., xn}, (ii) constant in every open subset of S, and (iii) has 2m — 3 continuous derivatives in [a, b] where 5 = ( i 6 [a, b]\f'x(t) = 0} (Utreras, 1985). The exact solution cannot be found explicitly and is difficult to calculate numerically. Therefore, as an approximation to the condition f'(t) > 0, we intend to use the conditions in (24) which can be rewritten in the following form. f(X2)-f(Xl) > 0 f(x3)-f(x2) > 0 / ( * n ) - / ( * n - l ) > 0. (29) Then, the minimizer of (25) subject to the constraints (29) will be of the form E » ) + E c ; ^ ) ' (30) j j with T2d + E 2 c > 0, where ( T 2 ) t i = (T) t-+1)J—(T),-,-, i = 1,... , n-1, j = 1,. . . , m and ( E 2 ) l j = ( E ) ! + 1 J -(E)f/, i — l,...,n — 1, j = l , . . . , n . By an argument similar to that in the derivation of (28), the coefficients c = Q2e and d in (30) are obtained by solving the 19 following quadratic programming problem: m i n ^ g | | y - T d - E Q 2 e | | 2 + nAg*Q 2 £Q 2 g subject to T2d + E2<?2e > Q which is equivalent to min^g - y\T : S Q 2 ) (A \e J T*T T*SQ 2 W d ^ ^ Q 2 E*T Q | E * S Q 2 + n A Q 2 £ Q 2 , subject to(T 2 : E 2 Q 2 ) \ ? / > 0(31) V ? 7 Since T is of full rank and E is positive definite, the Hessian, ' TT T ' E Q 2 ^ ^ Q ' E ' T Q 2 E t E Q 2 + n A Q 2 E Q 2 J ' is positive definite, and so the unique solution for (31) exists. 4. Choice of the Smoothing Parameter In smoothing splines, the effect of the smoothing parameter, A, which describes the smoothness of the fitted function, is embodied in the minimizing criterion, 1 n rb E t=i The bigger the A, the smoother the fitted curve. In regression splines, the interior knots play a role similar to A in fitting functions. We assume the unknown function / is in the space Sm[t\,..., tk]. Given the basis { X _ ( m _ i ) , . . . , Xk} of 5 m [ r x , . . . , tk], we express / as E j L - ( m - i ) PjXj(x) and find least square estimates. The number and positions of the interior knots {tx,.. .,tk} determine the shapes of the basis splines, Xj, and thus in turn the shape of the final function. For choosing the interior knots, - E ( y . - - / ( x , - ) ) a + A A/(m)(0)2*. n . , Ja 20 we can start with a few more equally spaced interior knots than we might need, then choose a subset of knots to get a smoother estimate of / . Thus, in regression splines, a value of A corresponds to a subset of knots. The choice of criterion for selecting the smoothing parameter is an important consideration in fitting a function. The choice of the smoothing parameter should depend on a measurement of the performance of the estimator, f\. Three criteria are widely accepted and used. They are the loss, the risk and the prediction risk. The loss L(A) is defined as L(\) = n - 1 £(f(x3) - fx(x3))2. «=i Given the data set, the fitted function value f\(xj) is a function of the smoothing parameter A. Hence the loss L(A) depends on A. The risk, R(\), equals the expectation of the loss L(X). The prediction risk is P(A) = E(n-> ±(y- - fx(xj))2) where y*- denotes a new observation with the same distribution as yj, but independent of yj. It can be proved that -P(A) = cr2 + R(X). A good choice of A is one that minimizes either L or R (and hence P). Unfortunately, all three measurements involve the unknown function / . The only way to solve the problem is to find well-behaved estimates of them. Generalized Cross-Validation ( G C V ) and the Akaike Information Criterion (AIC) are two such estimates. 4.1. G C V and Its Asymptotic Properties The definition of G C V is derived from Cross-Validation (CV) . Suppose we have ob-servations (x,-, y,), i = 1,..., n, which satisfy model (22). Let f \ = (f\(xi), . . . , f\(xn)Y where f\ is the estimate of / using the smoothing parameter A. For each fixed A, we estimate / (x , ) using (n — 1) observations ( X J , yj), j = 1,..., i — 1, i -f- 1,..., n and 21 express the estimator as / A ( I ) - CV(A) is defined as C y ( A ) = l / n f : ( y t - / A ( , ) ) 2 . »=i If the estimator £ \ , which is obtained by using n observations, equals H(X)y and f\(i) = #(A)(yi, • • • ,y ,-_i , f\(i), y,-+i, • - . , y n ) , it can be proved that ( '"^( l-W)),,) 1' The G C V criterion is defined as G C V ( A ) - 1 E W w - A M ) 1 n ( ( l / n ) t r ( I - f f ( A ) ) ) ' To choose the best A, that is, the A that minimizes G C V , we use the following grid research. Given A i , . . . , Ajv, calculate GCV(A,) , i = 1,. . . , N, and choose the A,- which gives the smallest value of GCV(A, ) . Some Asymptotic Properties of G C V (See L i 1985 and Speckman 1985) (i) . When the sample size n is large, GCV(A) is nearly an unbiased estimator of the prediction risk P(X). (ii) . If tr[J5T(A)] = 0, GCV(A) represents an unbiased estimator of the prediction risk for an estimator u\, where u\ — H(X)y, and H(X) = (1 + a(X))H(X) - a(\)I with a(X) = tr[J7(A)]/tr[/ - ff(A)]. 4.2. A I C and Its Asymptotic Property 22 A I C , which is based on information theoretic concepts, was proposed by Akaike in 1971 (Sakamoto et al, 1987). It can be used for statistical model selection in a wide range of situations. When a model involving q independent adjusted parameters is fitted to data, the A I C is defined by, AIC(g) = —2ln(maximized likelihood) + 2q. In the regression model with normal errors, AIC(g) = n lnfn- 1 f^(yj - f(x}))2) + 2q. j=i A I C is closely related to G C V . Under the assumption that q/n —» 0 as n —> oo, exp[AIC(A)/n] = GCV(A) . 4.3. G C V Apphed to Monotone Constrained Problems A key to G C V in the unconstrained case is that fx = H(\)y, where H(\) doesn't depend on y. In the constrained case, fx = H(\)y where -ff(A) depends on y only through the set of constraints which are active at the minimizing g. Under constraints, in both regression splines and smoothing splines, the problem i s 1 min^ - y'Cg + -g'G§ subject to A*j3 > 0 (32) with A1 not depending on y's. In regression splines, fx = XQ; in smoothing splines, S = (cf : e*y and £x = [T : E Q 2 ] J . Let J3 be the minimizer of (32) with A\Q = 0. If M = (Qi • Q2) 1 \ 0 J 23 then @ = Q2e, where e minimizes —ytCQ2e + ^etQt2GQ2e. Thus I = Q2(Qt2GQ2)-1Qt2Cty (33) and 2 A = ff(A)y. But Ai depends on y and thus so does H(X). Therefore the unconstrained G C V results do not hold here. 4.3.1. The Approximation form of G C V in Regression Splines Let A\ correspond to the active constraints at the solution of the constrained minimization problem, (23). Then, t race (JJ (A)) = rank(X) — rank(Ai) = m + k — rank(Ai) = I. The rank of Ax can be obtained by doing the Q-R decomposition of Ax. The approx-imate form of GCV(A) is r r v m ( l / n ) E ? = 1 ( / A 0 r , ) - y , ) 2 G C V ( A ) = (1 - l/nf • How do we use GCV(A) to choose interior knots? Suppose the domain of the fitted function is in [0,1]. The two ends knots are 0 and 1. Consider all the possible non-empty subsets of {0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9}. There are 511 members in this collection. Wi th the empty set, we get 512 combinations. For each subset, Aj, compute GCV(Aj) using interior knots equal to the subset. Choose the set of interior knots which minimizes G C V score. 4.3.2. The Approximation Form of G C V for Monotone Constrained Problems in Smoothing Splines 24 From (31) and (32), we have C = ( T : E Q 2 ) , G TT TtXQ2 A* = (T 2 :E 2 <? 2 ) . Let T2i,E2i be composed of all rows of T 2 and E2 corresponding to the active con-straints at the solution point. Then A\ = (T 2 1 : E2iQ2). Let the Q-R decomposition of Ai be A1 = (U, : U2) \ ° / From (33) V s ) V2(V\GV2y^V\C\ Thus, fx = C = CU2(U2tGU2)-1U2tCty H(\) = CU2{UlGU2)-1Ut2Ct tr i7(A) = tii(U2tGU2)-1U^CtCU2). Therefore, the approximation form of GCV(A) is l /nE? = i ( / (a rO-y .0 2 GCV(A) = (1 - l/ntv(H(X))y 4.4. A I C in Monotone Constrained Problems in Regression Splines 25 In monotone constrained problems, the value of q in the definition of A I C in 4.2 is the number of true free parameters, namely, the difference between the number of parameters in the unconstrained case and the number of active constraints. Compute the A I C score for each sequence of knots which has been mentioned 4.3.1. Choose the set of knots which minimizes the A I C . 26 Chapter 5. Data Analysis 1: the Relationship between Survival Probabilities and Birthweight of Male Infants In this chapter, we apply the monotone least square regression splines and smoothing splines described in Chapter 4 to the estimation of the relationship between survival probabilities and birthweights of male infants. We discuss techniques to study the assumption that the regression function is monotone and give some preliminary sim-ulation results. A short description of the data set and exploratory data analysis are given in the first two sections. 1. Description of the Data Set This is a survival data set from 7037 male births, taken from Table 2(i) in Karn and Penrose (1951). Data are birthweight (in pounds, rounded to the nearest half pound) and whether or not the infant has survived. It is believed that infant survival is closely related to birthweight. Two possibilities for the relationship are believed to exist. The first is that survival probability increases with birthweight. The second is that there is an optimal birthweight: smaller infants are weaker and less likely to survive, while larger infants are associated with more difficult births. The purpose of the analysis of the data set is to determine if survival probability increases with birthweight. 2. Exploratory Data Analysis Table 1 contains the data set and summary statistics. 27 Table 1. Male Births Data Birthweight Number of infants Fraction surviving (p,-) standard error of p,-1.0 13 0 0 1.5 8 0 0 2.0 20 0.05 0.0487 2.5 15 0.13333 0.0878 3.0 39 0.25641 0.0699 3.5 46 0.56522 0.0731 4.0 60 0.76667 0.0546 4.5 99 0.81818 0.0388 5.0 170 0.90588 0.0224 5.5 360 0.92222 0.0141 6.0 721 0.97781 0.0055 6.5 1002 0.97505 0.0049 7.0 1403 0.97078 0.0045 7.5 1141 0.97721 0.0044 8.0 942 0.98089 0.0045 8.5 459 0.96514 0.0086 9.0 301 0.98671 0.0066 9.5 132 0.96212 0.0166 10.0 63 0.90476 0.0370 10.5 25 0.92000 0.0543 11.0 15 0.86667 0.0878 11.5 2 1.0000 0 13.0 1 0.0000 0 Before doing regression analysis, we draw the scatter plot of the fraction of infants 28 surviving, p,-, versus birthweight X ; with horizontal bars displaying 95% confidence intervals for p,- (not including the last two data, as the numbers of infants here are too small). Figure 1 shows that the fraction of infants surviving increases with birthweight for weights between 1 and 6.5 pounds, is basically constant between 6.5 and 9.5 pounds, and decreases slightly from 96% surviving at 9.5 pounds to 89% surviving at 11 pounds. Compared to the interval [1,9.5], the length of the interval [9.5,11] is small, as is the drop in the percent surviving, from 96% to 89%. Therefore, the regression function relating survival probabilities and birthweight of the infants may be well-fit by a monotone function. 3. A n a l y s i s o f the D a t a b y L e a s t S q u a r e M o n o t o n e R e g r e s s i o n Sp l ines For calculational convenience, we make a transformation of the data. Let zt- = sin _ 1 ^/y,/n,-, i = 1,. . . , 19 (not including the first and last two data), where y, and rii equal the number surviving and the number of observations at the ith birthweight respectively. Assuming that yi is binomial, the asymptotic variance of Z{ is consistently estimated by V{ — l / (4n t ) as rii approaches infinity. The simplicity of the asymptotic variance is the primary justification for considering the inverse sine transformation. The weighted regression analysis is equivalent to treating z^j\Jv~i as normal with unit variance with u(a:0A/5 = E(-p=)= £ ^ - ( x O / v ^ , V U» j = - ( m - l ) where X,- = 1 + .5 * (i — 1), u and Xj have the same meanings as in Chapter 4, Part 2 with m, the order, equal to 4. The least square regression spline techniques with two types of monotonicity constraints (Kelly and Rice and point wise monotonicity) are then applied to the transformed data, z,/y/vf. In Kelly and Rice's method, G C V and AIC choose the same set of knots, {0,0.1, 0.4, 0.6,0.9,1}. In the point-wise monotone method, G C V and AIC choose {0,0.1,0.3,0.6, 29 0.8,0.9,1}. Figure 2 and Figure 3 display the two fitted curves where the vertical axis points are pi =(sin(S,))2. It is observed that the two fitted curves are almost same. The estimated survival probabilities of the infants increase almost linearly for birthweights between 2 and 6 pounds. Between 6 and 11 pounds, the estimated survival probabilities are about 97%. The unconstrained estimates of the survival function are also shown in Figure 2 and Figure 3. Each unconstrained estimate was calculated using the same knots as in the corresponding constrained estimate. 4. Analysis of the Data by Monotone Smoothing Splines We apply fourth order monotone smoothing splines to the transformed data to fit the unknown function relating survival probability and birthweight. From the plot of the GCV score in Figure 4, we see that the best choice of the smoothing parameter, A, is 0.019. The fitted curve corresponding to A = 0.019 is shown in Figure 5. From this fit, we reach the same conclusion about the relationship between birthweight and survival probabilities as before. 5. Some Discussions on Testing the Monotone Assumption of the Regres-sion Function In the previous data analysis, we assume that the unknown regression function, / , is monotone. Naturally, we ask: Is this assumption acceptable? Defining a suitable test statistic and determining its null distribution is hard. Instead, we do some preliminary analysis. We propose three summary statistics and carry out simulations to determine their distributions. 5.1. The Summary Statistics Someone who believes that / is not monotone wants to test H0: / is monotone, vs Hi'. No restrictions on / . 30 We approximate / by a fourth order spline j=-(m-l) and estimate u using the least square regression methods with generalized cross vali-dation, as discussed in Chapter 4, Part 2 and Part 4.3.1. Under the null hypothesis, with Kelly and Rice's constraints on the /?/s, we obtain the estimate of / , / 0 . The unconstrained least squares estimate of / , using the knots chosen in the constrained case, is f\. One statistic, similar to that in the classical regression problem, is T _ \\U-h\?l{dh-dh) h-LVIdh where dfo represents the degrees of freedom of the model estimating / under HQ, dfo = n—tr .ff(A), H(X) is defined in Chapter 4, Part 4.3.1, and dfi represents the degrees of freedom of the model estimating / under Hi, dfi = n—rank of X, X = {X,(x,)}. If the null hypothesis were not a constrained one and if the choice of knots were not data based, then under the null hypothesis T would be F distributed with degrees of freedom df0 — dft and dfx. In Section 5.2, we describe simulation studies to determine if T has an F distribution in the constrained problem. We also study the distributions of Ri/cr2 = \\y — fx\\2/cr2 and R0/a2 = \\y — fo\\2/&2, to determine if they are approximately \ 2 with degrees of freedom dfi and dfo, respectively. 5.2. Simulation Let yi = 7sin( 1.5a:,-)-(-£;, i = 1,..., 50 where xt- = z'/51 and e,- ~ N(0,1.52). For each of 1000 simulations, use G C V to choose the knots, estimate fo and fx and calculate the test statistic, T. Since the number of knots and the number of active constraints may vary from simulation to simulation, the 1000 runs will produce different values of df0 and dfi. Figure 6 and Figure 7 show the Q-Q plots of T with the four most common 31 values of (dfQ — dfi,dfx), under the assumption that T is F distributed. Obviously T is not F distributed, since the points on the Q-Q plots do not lie on the line y = x. Figures 8, 9, 10, and 11 give the Q-Q plots oiR\/a2 and R0/a2, under the assump-tion that these variables are x 2- The plots show that the distribution of R\ja2 is very close to a x 2, while R0/a2 is close to, but a little smaller than, a x2-We next simulate using the same x^s and an irregular, monotone regression func-tion, f(x) = 7xI{0<x<Q.3} + 7 * 0.31{0.3<x<1} + 7(x - 0.8)I{0.8<*<1}- Figures 12, 13, 14, 15, 16, and 17 show the Q-Q plots of T, R\jo2 and RQ/a2 from 950 simulations. We have the same conclusions as the previous simulations. 5.3. Conclusion In real data, we seldom know cr2, and so cannot calculate the statistics R\/a2 and Ro/a2. However, if the degrees of freedom in both models are close and the null hypothesis is correct, the values of R\ and R0 should be close. In the male infants data, the value of i? x equals 4.4222 with dfx = 11 and the value of R0 6.8954 with df0 = 16. Therefore, there appears to be little difference between constrained and unconstrained fits. Statistical inference for monotonicity of a regression function is still an unsolved problem in the regression/smoothing spline context. The construction of a suitable test statistic and the calculation of its null distribution could be a research topic in our future study. 32 Chapter 6. Description Of Song Sparrow Data and Exploratory Data Analysis 1. Data Set Description The second data set which we will analyse is about the survival of song sparrows in their first year of life on Mandarte Island in B . C . (Schluter and Smith, 1986). The purpose of our study is to relate survival probabilities of song sparrows in their first year of life to some of their morphological traits. From 1975 to 1978, each year's nestlings were tagged with identifying numbers about six days after hatching. The nestlings were later measured when captured as independent juveniles usually in late summer or early fall of the same year. We use measurements from birds at least 8 weeks old. Totally, 152 male and 145 female song sparrows were observed. On each bird, 6 traits were measured. They were weight, wing length, tarsus (lower leg) length, beak length, beak depth, and beak width, where weight is in grams, others are in millimetres. Observations indicated that mortality and emigrations occured primarily during fall and winter. Annual spring censuses therefore determined the fate of individual birds through the previous fall and winter. The response variable, the status of survival, is categorized into 1 or 0, 1 meaning survived, 0 meaning disappeared. 2. Exploratory Data Analysis Before we analyse the relationship between the response variable and the'six inde-pendent variables, we would like to look at the six independent variables separately for the male song sparrows, the female song sparrows and both combined. In Section 2.1, we draw boxplots and compute means, medians and standard deviations of the six variables in order to observe the sex effect of the birds on the values of the six vari-33 ables. A large difference between the sexes would suggest a need to analyse survival probabilities for each sex separately. One way of approaching the relationship between the response variable and the six predictors is to do a regression analysis. In regression, when the predictors are collinearly related, regression modelling can be very confusing. Estimated effects can change magnitude or even sign, depending on the other predictors in the model. It is important to understand the relationships between the predictors before proceeding with a regression analysis. In Section 2.2, the scatter plots of pairwise relations will help us roughly check the pairwise collinearity of the predictor. In many regression problems, we may seek a small set of predictors that have nearly the same information as the full set. Further analysis can concentrate on this subset of predictors. In Section 2.3, principal component analysis is done for this purpose. By separating the male and the female song sparrows and doing principal compo-nent analysis, we can also see the relationship between sex and the six independent variables. In Section 2.4, we will do some preliminary regression to have a better understand-ing of the relationship between response and each of the predictors. Finally, in Section 2.5, we model the logit scale of survival probabilities as linear functions of the independent variables in order to compare the contributions of the 6 independent variables to survival probabilities of song sparrows. 2.1. Sex Effect of Birds on the Values of Independent Variables Figures 18, 19, 20, 21, 22, and 23 show the boxplots for the six independent variables. In each picture, from left to right, the three boxplots correspond to the combined, the male and the female song sparrows. We observed that the male and female sparrows differ in all six traits, with the largest differences being in weight and 34 wing length. For quantifying the differences, we compute means, medians, standard deviations and standard errors of the means and list them in Table 2. Table 2. Mean, Median, Standard Deviation and Standard Error of the Means of the Six Independent Variables Variable Type of Data Mean Median SD SE Overall 24.0110 24.0000 1.7737 0.1029 Weight Male 25.1445 25.0000 1.4184 0.1150 Female 22.8229 22.9231 1.2568 0.1044 Overall 66.4861 66.5000 2.2540 0.1308 Wing Length Male 68.3416 68.1845 1.2553 0.1018 Female 64.5411 64.4837 1.1610 0.0964 Overall 19.8770 19.9000 0.5922 0.0344 Tarsus Male 20.1257 20.1250 0.4919 0.0399 Female 19.6162 19.5833 0.5775 0.0480 Overall 8.6625 8.6500 0.3148 0.0183 Beak Length Male 8.7054 8.6777 0.3138 0.0255 Female 8.6174 8.6010 0.3106 0.0258 Overall 5.8844 5.9000 0.2017 0.0117 Beak Depth Male 5.9346 5.9466 0.1995 0.0162 Female 5.8318 5.8146 0.1908 0.0158 Overall 6.7473 6.7500 0.2001 0.0116 Beak Width Male 6.8125 6.800 0.1917 0.0156 Female 6.6790 6.6814 0.1860 0.0154 2.2. Pairwise Relations among the Six Independent Variables and Sex Effect on the Relations Figure 24 shows the pairwise relations of the six independent variables for the whole 35 data set where W T is weight, W L is wing length, T is tarsus, B L is beak length, BD is beak depth, and B W is beak width. It seems that W T and W L , W L and T are strongly linearly correlated and that B D and B W , B L and B D , W L and B W , B L and B W are linearly correlated. For observing sex effect, we divide the data set into two parts according to the sex of song sparrows and draw the same pictures. Figure 25 shows the data of the female song sparrows, in which all the pairwise linear relations existing in Figure 24 look weak. In Figure 26, which shows the data of male song sparrows, only the relations among B L , B D and B W are similar to those in Picture 24. Figures 27 through 31, enlarged images of some of the small scatter plots in Figure 24, describe the relations between the independent variables more clearly. In those plots, by adding notations f (female song sparrows) and m (male song sparrows), the dominance of sex of the birds on the relation of W T and W L is obvious. Since, for female and male song sparrows, the values of W T and W L are much different, although W T and W L have no strong linear relation, when we put the male's and female's data together, an elliptical shape appears. The phenomenon occurs in other pictures, but not as obviously as in that of W T and W L . 2.3. Principal Component Analysis There are six independent variables in the data set. Wi th principal component analysis, we want to see whether we can use a few principal components to reproduce most of variability of the system, and thus to reduce the data. For instance, if the first principal component summarizes most of the variability in the independent variables, we may want to regress survival probability against the first principal component. We do principal component analysis for the whole data set and for separated data sets of male and female sparrows in order to observe the difference of principal components 36 in the two cases, which somehow indicates the effect of sex. For the whole data set, ^ Proportion of total sample ^ variance due to kth ^ principal component ^ = ( Ai + ... + A6 ^ 0.2803024 ^ 0.2007063 0.1460836 0.1385502 0.1298394 0.1045181 The first three principal components, which represent 63% the variability, are P C I = 0.46WT + 0.50WL + 0.29T + 0.35BL + 0.39BD + 0.42BW P C 2 = -0 .25WT - 0.30WL - 0.61T + 0.45BL + 0.43BD + 0.29BW P C 3 = 0.49WT - 0.08WL - 0.38T - 0.67BL + 0.39BD + 0.01BW. For the female sparrow data set, ^ Proportion of total sample ^ variance due to kth ^ principal component j •Ai + ... + Ae ^ 0.2258542 ^ 0.1944510 0.1654235 0.1438113 0.1414802 0.1289797 In this case, about 58.6% of the total sample variance is attributed to the first three principal components. The first three principal components become P C I = 0.19WT + 0.30WL - 0.13T + 0.56BL + 0.52BD + 0.52BW P C 2 = - 0 . 5 6 W T - 0 . 4 0 W L - 0 . 7 1 T + 0.10BL + 0.00BD + 0.16BW P C 3 = 0.54WT - 0.73WL - 0.05T - 0.08BL + 0.39BD - 0.10BW. 37 For the male sparrow data set, ^ Proportion of total sample ^ variance due to kth principal component = ( Ai + ... + A, •) = 1 0.2473383 ^ 0.1999561 0.1599121 0.1445753 0.1307587 0.1174594 Here, the first three principal components, which explain 60.7% variance of total samples are changed into P C I = 0.29WT + 0.36WL + 0.06T + 0.54BL + 0.49BD + 0.50BW P C 2 = -0 .33WT + 0.53WL + 0.67T + 0 . 1 2 B L - 0 . 3 8 B D - 0 . 0 3 B W P C 3 = 0.87WT + 0.03WL + 0.35T - 0.20BL - 0.13BD - 0.23BW. From the above analysis, we draw the following conclusions: (i) . The first three principal components are not enough to reproduce most of the variation. (ii) . Since for male and female sparrows the values of the six morphological traits are different and the coefficients in the first three principal components of the traits vary dramatically, analysing the data separately for male and female sparrows is necessary for us to recover the true relationship of response and the independent variables. 2.4. Some Rough Pictures of the Relations of Survival Probabilities and the Indepen-dent Variables We separate the whole data set into two parts according to the sex of the birds. Let xx < x2 < ... < xn denote the independent variable values and y; denote the corresponding 0 — 1 responses. Divide the range of x into a number of subintervals, 38 as follows. The first subinterval consists of xx,..., Xio+j where j is the biggest non-negative integer such that xlQ+j = xx0. The following subintervals are constructed with the same rule, that is, each subinterval includes 10 points plus the replications (if any) of the tenth point. Let p,- denote the survival probability in the ith subinterval, Pi = {E^ggubintervalKy = 1 I xj)}/#xj e subinterval. The estimate of the survival probability, p,-, in each subinterval equals the mean of the response in the subinterval. In logit scale, the estimate is log(p 2/(l — p,)) = logit(p t). The asymptotic variance of logit(p :) is l / (n ;p ; ( l — pi)), and thus an approximate 95% confidence interval for logit(p t) is [log(p,/(l - Pi)) - 2/y/mpiil-pi), log(pf/(l - pi)) + 2/y/nipi(l-pi)]. Figure 32 - Figure 43 are the scatter plots of (x"i, log(pi/(l — pi))) with horizontal bars displaying these confidence intervals, (x, denotes the average of the independent variable values in the subinterval.) From these plots, for male data, it is hard to see a relationship between the response and any of the independent variables, while for female data, relationships with T, B L and B W may exist. For T, it seems there is a decreasing trend for survival probabilities. Also, in the plots of B L and B W , some relations may exist. 2.5. Generalized Linear Models In this section, we use generalized linear models to study the relationships between independent variables and survival probabilities. Since the response, the status of survival, is binomial and the independent variables are continuous, we model the logit of the survival probabilities as linear functions of independent variables. Although the models chosen may have big lack of fit errors, by comparing the deviances of the models, we can roughly evaluate how much valuable information we can obtain through fitting the curve of each predictor and the response separately. Let W T , W L , T, B L , B D , and B W represent the same variables as before. 39 First look at female song sparrow data. Table 3 shows the deviances and degrees of freedom of the models with a constant, first degree, or second degree polynomial depending on W T , W L , T, B L , B D , or B W . Model Deviance Degree of Freedom log(p/(l -p)) = a 200.951 144 log(p/(l -p)) =a+ftWT 200.906 143 log(p/(l -p)) = a + ftWT+ftxWT2 198.981 142 log(p/(l -p)) = a + W L 200.077 143 log(p/(l -p)) = a + / 3 2 WL+/? 2 2 WL2 197.671 142 log(p/(l = a + (33T 184.944 143 log(p/(l -p)) = a + f33T+p33T2 184.767 142 log(p/(l -p)) = a + ftBL 192.876 143 log(p/(l -p)) = a + ftBL+^BL2 191.239 142 log(p/(l -p)) = a + ftBD 200.901 143 log(p/(l -p)) = a + ftBD+/355BD2 200.249 142 log(p/(l -p)) = a + fieBW 196.914 143 log(p/(l -p)) = a + ftBW+fcBW2 193.183 142 From Table 3, the model, log(p/(l — p)) = a + f33T, is a significant improvement over the elementary constant model. The linear parts in the relationships of B L and B W with the response can not be neglected, although this is not as obvious as in T. Adding the quadratic term to each model doesn't help us a lot for obtaining a good fit. For explaining the point more clearly, in Table 4, we use all six independent variables to fit models. 40 Table 4: Logistic Models Including Six Independent Variables for Female Song Sparrow-Data Model Deviance D F log(p/(l -P)) = a + / W T + / W L + / 3 3 T + / ? 4 B L + / ? 5 B D + / ? 6 B W 174.943 138 log(p/(l -P)) = a + A W T + ^ W L + ^ T + ^ B L + ^ B D + ^ B W +P\i W T2 + / 3 2 2 W L 2 + + ^ 4 B L 2 + p s s B B 2 + feBW2 167.964 132 From Table 3 and Table 4, we see that second degree polynomials are not good models for fitting the data. Some nonparametric methods, for example, splines, may help us to solve the problem. Next consider the first degree polynomial models of T, B L and B W in Table 3 by adding some of the second interaction terms. Table 5, Table 6, and Table 7 give the results corresponding to T, B L , and B W . Table 5: Logistic Models Including T and Its Second Interaction Term Model Deviance Degree of Freedom log(p/(l -p)) = a + /33T 184.944 143 log(p/(l -p)) = a + /3 3 T+/? 1 3 WT*T 184.536 142 log(p/(l -p)) = a + /33T+/323WL*T 182.562 142 log(p/(l -p)) = a + /? 3 T+/? 3 4 T*BL 184.597 142 log(p/(l -p)) = a + ^ 3 T + ^ 3 5 T * B D 180.358 142 log(p/(l -p)) = a + /? 3 T+/3 3 6 T*BW 184.662 142 41 Table 6: Logistic Models Including B L and Its Second Interaction Term Model Deviance Degree of Freedom log(p/(l -P)) = a + /34BL 192.876 143 log(p/(l ~P)) = a + / ? 4 BL+/? 1 4 WT*BL 192.806 142 log(p/(l ~P)) = a + / ? 4 B L + & 4 W L * B L 192.871 142 log(p/(l -P)) = a + / ? 4 BL+/? 3 4 T*BL 192.327 142 log(p/(l -P)) = a + / ? 4 BL+/? 4 5 BL*BD 192.758 142 log(p/(l = a + / ? 4 B L + / ? 4 6 B L * B W 191.857 142 Table 7: Logistic Models Including B W and Its Second-Interaction Term Model Deviance Degree of Freedom log(p/(l -P)) = a + j8 6 BW 196.914 143 log(p/(l -P)) = a + / ? 6 BW+/? 1 6 WT*BW 196.906 142 log(p/(l -P)) = a + / ? 6 B W + f t 6 W L * B W 196.046 142 log(p/(l -P)) = a + i 3 6 B W + ^ T * B W 196.913 142 log(p/(l -P)) = a + / ? 6 B W + / ? 4 6 B L * B W 196.457 142 log(p/(l -P)) = a + / ? 6 B W + / ? 5 6 B D * B W 196.747 142 The results in these tables suggest that, among all second interaction terms, only the term T * B D might have some useful information for predicting survival probabilities. The following table includes the same models as in Table 3, but for the male song sparrow data. 42 Table 8: Logistic Models Fit to the Male Song Sparrow Data Model Deviance Degree of Freedom log(p/(l -p)) = a 196.575 '• 151 log(p/(l -p)) a + ft W T 196.539 150 log(p/(l -p)) = a + ftWT+ftxWT2 194.647 149 log(p/(l -p)) a + ft W L 196.517 150 log(p/(l -p)) = a + ft W L + f t 2 W L 2 196.082 149 log(p/(l -p)) a + ftT 196.100 150 log(p/(l -p)) a + ftT+ft3T2 194.876 149 log(p/(l -p)) = a + ftBL 196.572 150 log(p/(l -p)) = a + ftBL-f ft4BL2 196.411 149 log(p/(l -p)) = a + ftBD 196.529 150 log(j>/(l -p)) - a + ftBD+ft5BD2 194.946 149 log(p/(l -p)) a + ftBW 196.034 150 log(p/(l = a + ftBW+ft6BW2 194.456 149 From the deviances of the models in the above table, we see that adding first and second degree terms of the independent variables doesn't give much improvement over the constant model. From here, at least we can roughly say, the logit scales of survival probabilities are not linearly related with any one of the independent variables. Conclusions: (i) Compared with the male song sparrow data, the female song sparrow data set shows relationships between independent variables and response. (ii) If we want to analyse the relation between each independent variable and survival probabilities further, including T, B L and B W in female song sparrow might be wise. (iii) Second degree polynomials are not good models for fitting the data. 43 In the following analyses of this data, we will focus on T, BL and BW in the female song sparrow. 44 Chapter 7. Binomial Regression Splines For analysing the song sparrow data, we now introduce unconstrained binomial regression splines to motivate the technique for the constrained case. By modifying the unconstrained method, we propose a technique of estimating monotone binomial regression splines. We define G C V and AIC for knots selection in binomial regression models, and their approximate forms under constraints. Finally, we apply the methods to the female song sparrow data. 1. Unconstrained Binomial Regression Splines Consider independent observations (x,-, y,), i = 1,. . . , n, where Let f(t) =log(p(i)/(l — p(t))) and assume / is in the Sobolev space W 2 m [a , b]. This transformation is made to remove the constraint p 6 (0,1), and to guarantee / is monotone in p. Given the B-splines basis, { I?_( m _i ) i r n , . . . , B_jt,m}, of Sm[ti,..., tk], we approximate / by Set § = (/?_(„!_!),. . . , flkY and y = ( y 1 ; . . . , y n ) 4 - To obtain the maximum likelihood estimate of the parameter 0, we need the likelihood function of y. The likelihood function of y is Binomial(l,p(ar,-)) i = I,...,n. k j=-(m-l) n L(8/V) = l[p(xi)vi(l-P(xi)) «'=i and its logarithm can be expressed as n = E{y.-/(*0-ln(l + exp(/(zO))} 45 n f k k J2\yi £ PjBj,m{xi) - ln ( l + exp( £ PjBj^Xi))) »=1 I j=-(m-l) j=-(m-l) Let 5 = •S-(m-l),771(^2) -^-(m-2),m -Sfc,m(a;i) ( B I \ = BI Bk,m(xn) j y B-(m-\)tm(xn) B_(m_2),m(xn) and assume B is of full column rank. Thus, K6) = £ {yiB'Bi - ln ( l + exp(^ 45 t))} . t=i Differentiating with respect to g yields s(§) = I'm - V V „ n exp(^^ , ) - ^ ( y i B < " l + exp(^B,-) B 0-1=1 Defining ^ exp(jg'Bi) ^ l+exp f ^ 'B j ) e x p ( ^ t B n ) \ i+exp(#<B„) / We obtain the equation (34) The maximum likelihood estimate of g is the solution to S(g) — 0. This equation has no explicite solution, but can be solved by Newton-Raphson iterative techinique. This technique requires the calculation of the second derivative of I with respect to g. From (34), cPl(g) _ dS{6) _ _ ^ expjB'Bj) t £ t (1 + e x p ( f t £ t ) ) 2 1 *"• dg2 dp 46 Defining V = d i a g K - . . , , „ ) = d i a g ( ( 1 + e x p ( ^ ) ) 2 , • - -, ( 1 + e x p ( ^ ) ) 2 ) > we obtain the equation = - £ ViBiB\ = -B*VB. d$2 The implementation of Newton-Raphson iterative technique includes three calcula-tions: (i) Find an initial estimate of 0, f3 . "k -fc+1 (ii) Given the kth. step's estimate {5 of §, define § - t+i ~k dS(Jl w -k . = §" + {BiVB)-'B\U-uQk)) (35) (iii) Establish a criterion to decide when iteration should be stopped. If, at each step, BlVB is non-singular matrix, then the Newton-Raphson method converges. B being of full rank, BtVB is necessarily non singular. The maximum likelihood estimate of Q is the final result of the iteration. The Newton-Raphson method of finding the maximum likelihood estimate of Q is sometimes called iteratively reweighted least squares for the following reason: From (35), we have 8*1 = 8k + {BtVB)-1BXy-u{f)k)) (B'VB)^1 = (B'VB)^ + B\y -= B*V(BJ3k + V~\u - «(£*))• If g = B $ + V-\y - u), 47 then, 8k+1 = (B'VB^B'Vz is the solution of the weighted least squares problem min^ \\vh(z-BS)\\\ (36) where the elements of V and z are the functions of the former estimate of (3. 2. M o n o t o n e B i n o m i a l Regression Splines Assume / is a monotone function. Wi th each of the two types of monotonicity constraints discussed in Chapter 3, Part 4, we want to estimate the parameter Q which maximizes 1(0) subject to At§ > 0. To solve the problem, we make a modification of step (ii) of the Newton-Raphson method, using equation (36). Wi th V and z as defined in Section 1, @ k + l is the solution of the following quadratic programming problem: min \\V*(z - B £ ) | | 2 subject to A*@ > 0. (37) At each iteration, if BtVB is strictly positive definite, a unique minimum is achieved. 3. G C V i n B i n o m i a l Regress ion Splines 3.1. G C V in Unconstrained Binomial Regression Splines In the least squares regression problem, G C V is defined as where the numerator represents the residual sum of squares. A n analog of the residual sum of squares in the generalized linear models context is the generalized Pearson \ 2 48 statistic. Replacing the numerator in (38) by Pearson \ 2 leads to the G C V score where V, z, §, and the hat matrix, H(A) = 5 ( B ' V B ) ~ l B*V, are all computed at the final stage of the iteration. It is conjectured that, for smoothing spline fits, the A minimizing the G C V score wil l approximately minimize the weighted mean square error (O'Sullivan et al., 1986), i / » E » i f e ) - / f e ) ) 2 -i=i In binomial regression splines, there are two algorithms to calculate the G C V score. Algorithm 1. For each subset of possible knots, A, estimate the parameter g using the Newton-Raphson algorithm and compute the G C V at the last iteration using (39). Choose the A that minimizes G C V . Algorithm 2. Choose an initial estimate, j3°. Calculate the associated V and the pseudo data z. Fixing z and V, assume that there exists a knot subset, A, for which z is normal with mean B(\)g and covariance V~x. Fit the pseudo data by the usual method of generalized cross-validation in a weighted normal model, that is, for each value of A, find J(A) to minimize l l t ^ 1 / 2 ^ — B(\)/3)\\2 and then choose the value of A that minimizes G C V in (39). The resulting § is then used in the next step of the iteration to create a new V and z. Iterate until A is stabilized and the iteration for this stabilized A converges. Algorithm 1 is proposed by O'Sullivan et al. (1986) for use with smoothing splines, where A is a real number. Algorithm 2 is proposed by G u (1990a) also in the smoothing spline context. G u shows that, under the right conditions, the final A chosen by algorithm 2 satisfies D(X) 49 where A minimizes E(VW(\)) = E(GCV(X)), and in Binomial case, D{A) = E ( £"=i K (rij,p(xj),px(xj))Sj, rij represents the number of observations with a particular value of x = Xj, K(n,Po,P\) is the Kullback-Leibler distance between a Binomial (n,p0) and Binomial (n,pi) and equals e(n,p0) — e(n,Pi) where / \ e(n,p) = 2 k=Q n \ k ) I \ n - p)n~k (k\og{-^—) + nlog(l - p) + log " ) \ l - p t / V fc / The difference between the two algorithms is explained in Gu (1990b), who argues convincingly that algorithm 2 is more suitable. The data are only analysed using the first algorithm, as Gu's method has only recently become known. 3.2. G C V in Constrained Binomial Regression Splines The modification of G C V in (39) for the constrained problem is similar to that made in the normal regression model. Specifically, suppose Q is the solution of (37) and A\ is the active constraints set with A\j3 — 0. The minimization problem of (37) has the same solution as min | |V5 ( z - B§)\\2 subject to A[/3 = 0. (40) Do Q-R decomposition of A 1 ; = (Qi : Q2) Let Q = Q2Q. Then, QiR. (37) = min \\V^z - BQ2c)\\2 = min (z - BQ2cfV(z - BQ2c). The estimate of c and J3 are c = ((BQ2)tV(BQ2))-1Qt2BtVz 50 (Qt2BtVBQ2)-1Qt2BtVz Q2Q = Q2(Qt2BtVBQ2)-1Qt2BtVz, and thus z = BQ^QlB'VBQ^QlB'Vz H(X) = BQ2(Qt2BtVBQ2)-1Qt2BtV, and tvH(X) = tvdQlB'VBQ^QlB'VBQ^ = rank of Q^VBQ^. The approximate form of G C V in the constrained binomial regression splines is defined as G C V _ \\vKi-BS)\f (n-tr (H(X))f This form can be used in either Algorithm 1 or 2. 4. AIC in Binomial Regression Splines In binomial regression splines, AIC is defined as AIC(A) = - 2 J2(yjS*Bi - ln( l + e x p ^ S , ) ) ) + 2 * q i=i where 5 represents the number of independent parameters and @ is found by the Newton-Raphson method. Without constraints, q equals m + k. Wi th constraints, q equals m + k minus the rank of the active constraints matrix A\. 5. Data Analysis: the Relationship Between Survival Probabilities and Tarsus, Beak Length and Beak Width of the First Year's Female Song Sparrows 51 5.1. Survival Probabilities and Tarsus Exploratory data analysis indicates that as the size of the female song sparrows' tarsus increases, the survival probabilities of the birds decrease. We relate survival probabilities to tarsus using Kelly and Rice's and point-wise monotone binomial regres-sion splines. In Kelly and Rice's method, we assume the parameters of the fitted curve form a decreasing sequence. B y computing G C V and AIC scores, as in Algorithm 1, we obtain the unique best set of knots,{0,0.3,0.4,0.5, 0.7,1}. In the point-wise mono-tone method, we assume that at the increasingly ordered data points, the values of the fitted function are decreasing. A grid search of G C V and AIC yields the same set of knots, {0,0.2, 0.3,0.6,1}. Figures 44 and 45 show the two fitted curves. In this data set, the two fitted curves are very close. To describe the relation of survival probabilities and tarsus of the female song sparrows more clearly, we divide the values of tarsus length into four subintervals. If the values of tarsus length are from 17.7mm to 18.25mm, the birds have about 100% chance of surviving, from 18.4mm to 19.75mm , the survival probabilities of the birds decrease sharply from over 95% to 37%, if we go further from 19.75mm to 20.3mm, the chance of survival of the birds is about 37%, from 20.3mm to 21.3mm, the chance of survival of the birds drops from 37% to 0. 5.2. Survival Probabilities and Beak Length We assume a non-decreasing function fits the relation between beak length and sur-vival probabilities of the female song sparrows. Applying the two types of constraints of Section 5.1 to the data set, we have the following results. Wi th Kelly and Rice's method, G C V and AIC choose different sets of knots. G C V chooses {0,0.2,0.7,1}, AIC chooses {0,1}. For each set of knots, we have the fitted curve in Figure 46 and Figure 47. 52 With pointwise monotone constraints, computation of G C V and A I C also gives different sets of knots. The best sets of knots from G C V and A I C are {0,0.2,0.6, 0.7,1} and {0,0.6,1} respectively. Wi th the two sets of knots, we obtain the fitted curves shown in Figure 48 and Figure 49. Although the two types of constraints and the two ways of choosing knots result in different sets of knots, the fitted curves are very close. Roughly speaking as the beak lengths of the female birds increase from 7.8mm to 8.8mm, the survival chances of the birds increase from 5% to 62%. For beak lengths greater than 8.8mm, the survival chance is about 63%. 5.3. Survival Probabilities and Beak Width Here we still assume that the regression function relating survival probabilities and beak width of the female song sparrows is monotone. For each type of constraint, G C V and A I C give the same set of knots {0,0.8,1}. The fitted curves are shown in Figure 50 and Figure 51. Both methods give almost identical fits. The chance of survival increases from 0 at 5.9mm beak width to 55% at 6.8mm beak width. Birds with beak width bigger than 6.8mm have about a 55% chance of surviving. 53 Bibliography Barlow, R . E . , Bartholomew, D.J . , Bremner, J . M . , and Brunk, H.D. (1972). Statistical Inference under Order Restrictions. New York: John Wiley & Sons. Brunk, H .D. (1955). Maximum Likelihood Estimates of Monotone Parameters. Ann. Math. Statist., 26, 607-616. de Boor, C. (1978). A Practical Guide to Splines. New York: Springer-Verlag. Duchon, J . (1977). Splines Minimizing Rotation-Invariant Semi-Norms in Sobolev Spaces, in Constructive Theory of Functions of Several Variables, eds. W. Schempp and K . Zeller, Berlin: Springer-Verlag, 85 — 100. Friedman, J . and Tibshirani, R. (1984). The Monotone Smoothing of Scatterplots. Technometrics, 26, 242-250. Gi l l , P .E. , Murray, W., and Wright, M . H . (1981). Practical Optimization. New York: Academic Press. Gu, C. (1990a). Adaptive Spline Smoothing in Non Gaussian Regression Models. J . Amer. Statist. Assoc., 85, 801-807. Gu, C. (1990b). A Note on Cross-Validating Non Gaussian Data. Technical Report #96. Department of Statistics, University of British Columbia. L i , K . (1985). From Stein's Unbiased Risk Estimates to the Method of Generalized 54 Cross-Validation. The Annals of Statistics, 13, 1352 — 1377. Karn, M . N . and Penrose, L.S. (1951). Bir th Weight and Gestation Time in Relation to Maternal Age, Parity and Infant Survival. Ann. Eugen. 16, 147-164. Kelly, C. and Rice, J . (1988). Monotone Smoothing with Application to Dose Response Curves and the Assessment of Synergism. Unpublished Manuscript. Kimeldorf, G. and Wahba, G. (1971). Some Results on Tchebycheffian Spline Func-tions. Journal of Mathematical Analysis and Applications, 33, 82-95. Meier, K . (1989). Estimating Rate Equations Using Nonparametric Regression Meth-ods. Institute of Statistics Mimeograph Series No. 1969, North Carolina State Uni-versity. O'Sullivan, F. , Yandell, B.S. and Raynor, W. J . (1986). Automatic Smoothing of Regression Functions in Generalized Linear Models. J . Amer. Statist. Assoc., 81, 96-103. Ramsay, J .O. (1988). Monotone Regression Splines in Action. Statistical Science, 3, 425-461. Reinsch, C. (1967). Smoothing by Spline Functions. Numer. Math, 10, 177-183. Schluter, D. and Smith, J . N . M . (1986). Natural Selection on Beak and Body Size in the Song Sparrow. Evolution, 40(2), 221-231. 55 Schoenberg, I.J. (1964). Spline Functions and the Problem of Graduation. Proc. Nat. Acad. Sci. U S A 52, 947-950. Speckman, P. (1985). Spline Smoothing ans Optimal Rates of Convergence in Non-parametric Regression Models. Annals of Mathematical Statistics, 13, 970 — 983 Utreras, F.I. (1985). Smoothing Noisy Data under Monotonicity Constraints Exis-tence, Characterization and Convergence Rates. Numer. Math., 47, 611-625. Villalobos, M . and Wahba, G . (1987). Inequality-Constrained Multivariate Smooth-ing Splines Wi th Application to the Estimation of Posterior Probabilities. J . Amer. Statist. Assoc., 82, 239-248. Sakamoto, Y . , Ishiguro, M . and Kitagawa, G . (1987). Akaike Information Criterion Statistics. Boston: D. Reidel. Wahba, G. and Wendelberger, J .G. (1980). Some New Mathematical Methods for Variational Objective Analysis Using Splines and Cross Validation. Monthly Weather Review, 108, 1122-1143. 56 Fraction of Surviving ro -cp C O cn — 0 0 -tf o o Ui c s < < cs cm cn C CA w cr I. cm cr Figure 2. The Relationship between Birthweight and Survival Probabilities of the Male Infants by Least Square Regression Splines Figure 3. The Relationship between Birthweight and Survival Probabilities of the Male Infants by Least Square Regression Splines Figure 4. Grid Search on G C V 0.0 0.01 0.02 0.03 0.04 Lambda 0.05 0.06 60 Figure 5. The Relationship between Birthweight and Survival Probabilities of the Male Infants by Smoothing Splines Figure 6. Q-Q Plots of the Test Statistic T When the Simulation Function Equals 7SIN(1.5x) in CM o c\i in d o d Quantiles of F-Distribution, df=(3,43) 62 Figure 7. Q-Q Plots of the Test Statistics T When the Simulation Function Equals 7SIN(1.5x) Quantiles of F-Distribution, df=(4,42) 1 2 3 Quantiles of F-Distribution, df=(5,42) 63 Figure 8. Q-Q Plots of R^cr2 When the Simulation Function Equals 7SIN(1.5x) Figure 9. Q-Q Plots of R^/a2 When the Simulation Function Equals 7SIN(1.5x) Figure 10. Q-Q Plots of R0/a2 When the Simulation Function Equals 7SIN(1.5x) ra ra Q T 3 CU •c o CO o o o cn o co o Quantiles of Chisq-Distribution, df=43 o o o o CO o co Quantiles of Chisq-Distribution, df=44 66 Figure 11. Q-Q Plots of RQ/cr2 When the Simulation Function Equals 7SIN(1.5x) —, , , , 30 40 50 60 70 Quantiles of Chisq-Distribution, df=45 30 40 50 60 70 80 Quantiles of Chisq-Distribution, df=46 67 Figure 12. Q-Q Plots of the Test Statistic T When the Simulation Function Equals 7xl{0<x<0.3y + 7 * 0.31{0.3<x<i} + 7(x — 0.8)I{0.8<x<i} Quantiles of F-Distribution, df=(4,43) in CM o CM in J Quantiles of F-Distribution, df=(4,42) 68 Figure 13. Q-Q Plots of the Test Statistic T When the Simulation Function Equals 7xI{0<x<o.3} + 7 * 0.37{o.3<a;<i} + 7(x — 0.8)I{0.8<a;<i} 69 Figure 14. Q-Q Plots of i?i/cr2 When the Simulation Function Equals 7a;J{0<x<o.3} + 7 * 0.3/{0.3<x<i} + 7(x - 0.8)/{0.8<x<i} re re Q n <u t o CO 30 40 50 60 Quantiles of Chisq-Distribution, df=40 70 Figure 15. Q-Q Plots of Rx/(j2 When the Simulation Function Equals 7xJ{0<x<o.3} + 7 * 0.31{0.3<x<i} + 7(x - 0.8)I{0.8<x<i} 71 Figure 16. Q-Q Plots of RQ/a2 When the Simulation Function Equals 7xI{Q<x<o.3y + 7 * 0.3 0^.3<x<i} + 7(x - 0.8)7{0.8<*<i} Quantiles of Chisq-Distribution, df=46 72 Figure 17. Q-Q Plots of RQ/a2 When the Simulation Function Equals 7xJT{0<x<o.3} + 7 * Q-3I{a.z<x<i} + 7(x - 0.8)I{0.8<x<i} Weight 20 22 24 26 28 CW c re M 00 a. a Figure 19. Boxplots of Wing Length for Mixed(Left), Male(Middle), and Female(Right) Song Sparrow Data o oo CD cs) c CD _ l D) CD .= CD CD CM CD 75 Tarsus SL OP cr Ul O CS rw P3 o 3 00 s CD to O » 0 cn C CA S» 1 CD CL ft) 3? CL CL Figure 21. Boxplots of Beak Length for Mixed(Left), Male(Middle), and Female(Right) Song Sparrow Data 10 o ai c cu _ J ca CD CO in oo q cd 77 Figure 22. Boxplots of Beak Depth for Mixed(Left), Male(Middle), and Female (Right) Song Sparrow Data Q. CU Q X. ro cu CO CD CM CO o CO oo LO CO LO LO CM LO 78 Beak Width 6.6 6.8 C »i ft to w • W 0 09 nd plo *J rt-cn em O >-*» £L W ft ft Pi O £3 l-i o o rt-•1 re /-—V ft Si-ft* )—-I Figure 24. Pairwise Scatterplots of the Six Independent Variables for Song Sparrow Data 80 Figure 25. Pairwise Scatterplots of the Six Independent Variables for Female Song Sparrow Data 81 Figure 26. Pairwise Scatterplots of the Six Independent Variables for Male Song Sparrow Data 82 Figure 27. Scatterplot of Weight and Wing Length for Song Sparrow Data o oo co co to CD m m m m m _ m m m m mm jjn m mm m m m CM CD m f _m m mn rmmrrfrim m m m mm rff1 m m i^iimri ™ m m mmm ^nvx RLlwim m f mrrn m ITyrrf f^ tmn m m r c L mmm f m 3m m m m m ••• m m m mm f f f tfff fff 1 It f I T t t f f f f f f f p f % ft f f f f f / f f » ' ffff ff ffW f fff t f f f ff i • 1 1 f f f t fff t f f 'f f f if m m m 20 22 "T~ 24 26 28 WT 83 Figure 28. Scatterplot of Weight and Tarsus for Song Sparrow Data 20 1^ 22 24 26 28 WT 84 Figure 29. Scatterplot of Beak Length and Beak Width for Song Sparrow Data 8 5 F i g u r e 30. S c a t t e r p l o t o f W e i g h t a n d B e a k L e n g t h for S o n g S p a r r o w D a t a m m m f m f ff f m m m m m m m m m f m f m, y% m m m f _' « ni m m % m m mm m m f f m f 1 ff f rflV , f, ,f f f t f f m f f ! } m n % m m m m f f 1 rh f f f f T« fmf rrmpT"m Ln*m.L m m f ff ff ff f f f j p ^ R l ^ m f f f f rfi rfi m f ff f f W m ff f f f m m nm m m m 'f f m f m 20 22 24 T 26 28 WT 86 Figure 31. Scatterplot of Beak length and Beak Depth for Song Sparrow Data 87 Figure 32. The Relationship between Weight and Survival Probabilities of the Female Song Sparrows 8 8 Figure 33. The Relationship between Wing Length and Survival Probabilities of the Female Song Sparrows WL 89 Figure 34. The Relationship between Tarsus and Survival Probabilities of the Female Song Sparrows 18.5 19.0 19.5 20.0 20.5 90 Figure 35. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows -J 8.2 8.4 1 T ~ 8.6 8.8 B L 9.0 9.2 9.4 91 Figure 36. The Relationship between Beak Depth and Survival Probabilities of the Female Song Sparrows co -CM -O -O) T-o CM _ 92 Figure 37. The Relationship between Beak Width and Survival Probabilities of the Female Song Sparrows o H CM - J ^ -4 BW 93 Figure 38. The Relationship between Weight and Survival Probabilities of the Male Song Sparrows oo H to H C M H o H C M - J 9 4 Figure 39. The Relationship between Wing Length and Survival Probabilities of the Male Song Sparrows •t -co -CM -O .1= O o> o " T ~ 67 68 69 70 71 WL 95 Figure 40. The Relationship between Tarsus and Survival Probabilities of the Male Song Sparrows 19.5 20.0 20.5 21.0 96 Figure 41. The Relationship between Beak Length and Survival Probabilities of the Male Song Sparrows BL 97 Logit of the Estimates of Survival Probabilities Figure 43. The Relationship between Beak Width and Survival Probabilities of the Male Song Sparrows CO -CVJ o -99 Figure 44. The Relationship between Tarsus and Survival Probabilities of the Female Song Sparrows Kelly and Rice's Method 18 19 20 21 Tarsus 100 Figure 45. T h e Re la t ionsh ip between Tarsus and S u r v i v a l P robab i l i t i e s of the Female Song Sparrows 00 d CD CD J5 co n o CO > CO d C\J d o d Tarsus 101 Figure 46. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows by Kelly and Rice's Method 8.5 Beak Length 9.0 9.5 102 Figure 47. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows by Kelly and Rice's Method . • • • / Knots Chosen By AIC 8.0 8.5 9.0 9.5 Beak Length 103 Figure 48. The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows by Pointwise Monotone Method CO ci to d d co d CM d » / Knots Chosen By GCV 8.0 8.5 9.0 9.5 Beak Length 104 Figure 4 9 . The Relationship between Beak Length and Survival Probabilities of the Female Song Sparrows by Pointwise Monotone Method 8.0 8.5 9.0 Beak Length 105 Figure 50. The Relationship between Beak Width and Survival Probabilities of the Female Song Sparrows Beak Width 106 Figure 51. The Relationship between Beak Width and Survival Probabilities of the Female Song Sparrows in d d CD - O CO S o CO > CO CO CM d o d 107
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Monotone regression functions
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Monotone regression functions Zuo, Yanling 1990
pdf
Page Metadata
Item Metadata
Title | Monotone regression functions |
Creator |
Zuo, Yanling |
Publisher | University of British Columbia |
Date Issued | 1990 |
Description | In some applications, we require a monotone estimate of a regression function. In others, we want to test whether the regression function is monotone. For solving the first problem, Ramsay's, Kelly and Rice's, as well as point-wise monotone regression functions in a spline space are discussed and their properties developed. Three monotone estimates are defined: least-square regression splines, smoothing splines and binomial regression splines. The three estimates depend upon a "smoothing parameter": the number and location of knots in regression splines and the usual [formula omitted] in smoothing splines. Two standard techniques for choosing the smoothing parameter, GCV and AIC, are modified for monotone estimation, for the normal errors case. For answering the second question, a test statistic is proposed and its null distribution conjectured. Simulations are carried out to check the conjecture. These techniques are applied to two data sets. |
Subject |
Monotonic functions Regression analysis |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-10-22 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0098248 |
URI | http://hdl.handle.net/2429/29457 |
Degree |
Master of Science - MSc |
Program |
Statistics |
Affiliation |
Science, Faculty of Statistics, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-UBC_1990_A6_7 Z86.pdf [ 4.39MB ]
- Metadata
- JSON: 831-1.0098248.json
- JSON-LD: 831-1.0098248-ld.json
- RDF/XML (Pretty): 831-1.0098248-rdf.xml
- RDF/JSON: 831-1.0098248-rdf.json
- Turtle: 831-1.0098248-turtle.txt
- N-Triples: 831-1.0098248-rdf-ntriples.txt
- Original Record: 831-1.0098248-source.json
- Full Text
- 831-1.0098248-fulltext.txt
- Citation
- 831-1.0098248.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0098248/manifest