UBC Theses and Dissertations

Bump hunting in regression revisited. Harezlak, Jaroslaw, 1998.

Full Text
BUMP HUNTING IN REGRESSION REVISITED

by Jaroslaw Harezlak

B.Sc., Simon Fraser University, Burnaby, B.C., 1995

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES, Department of Statistics

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA, August, 1998

© Jaroslaw Harezlak, 1998

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department of Statistics, The University of British Columbia, Vancouver, Canada.

Abstract

Suppose bivariate data {(t_i, Y_i), i = 1, ..., n} are observed at times a < t_1 < t_2 < ... < t_n < b. Given a nonparametric regression model Y_i = m(t_i) + ε_i, with the ε_i i.i.d. with mean 0 and variance σ², for i = 1, 2, ..., n, we want to estimate the number of modes of the underlying regression function m(·) or of its derivative. We use the penalized least squares technique to get an estimate of m(·), i.e. the function minimizing

Σ_{i=1}^n (Y_i − m(t_i))² + λ ∫ [Lm(t)]² dt.

A new test of multimodality is introduced and its performance is studied. Our idea is motivated by the test proposed by Silverman (1981) concerning the number of modes in a density function. He used a "critical bandwidth" as a test statistic in a kernel smoothing context. He noted that if the data are strongly bimodal, we would need a large value of the bandwidth to obtain a unimodal density estimate. In our case we define the "critical smoothing parameter" λ_crit as the smallest λ giving an estimate with the specified number of modes. We use λ_crit as a test statistic in our new test, CriSV. We use bootstrap techniques to assess the performance of our test. We study the effects of the penalty L on the quality of our test via simulation using different regression functions, and we compare it with Bowman et al.'s monotonicity test (1998). CriSV is also applied to children's growth data in studying the number of bumps in the derivatives of the growth functions. In a sample of 43 boys and 50 girls, our test procedure gives an automatic classification rule for about 80% of the growth curves analyzed.

Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
2 Bumps: what are they?
2.1 Bumps in Density Functions
2.2 Bumps in Regression Functions
2.3 Testing in Density versus Regression
3 Methods of Estimation
3.1 Kernel Smoothing
3.2 Spline Estimation
3.2.1 Regression Splines
3.2.2 Smoothing Splines
4 Testing for Bumps
4.1 Density Function
4.2 Regression Function
4.3 New Test: CriSV
4.3.1 Counting Bumps
4.3.2 CriSV in Regression
4.3.3 CriSV for Derivatives
5 Results
5.1 Regression Function Simulations
5.1.1 Function m2(·|a = 0.48, b = 0.15)
5.1.2 Zero-One Bump Function
5.1.3 Multi-Bump Function
5.2 Derivative Simulations
5.3 Analysis of the Growth Data
6 Conclusions
Bibliography
A Male results
B Female results

List of Tables

5.1 Test results for m2(t|a = 0.48, b = 0.15) with 2 bumps
5.2 Test results for function m1(t|a = 0) with 0 bumps
5.3 Test results for function m1(t|a = 0.15) with 0 bumps
5.4 Test results for function m1(t|a = 0.45) with 1 bump
5.5 Test results for function m1(t|a = 0.25) with 1 bump
5.6 Test results for m2(t|a, b) with 0 bumps
5.7 Test results for m2(t|a, b) with 1 bump
5.8 Test results for m2(t|a, b) with 2 bumps
5.9 Test results for m3(t|a, b) with 0 bumps
5.10 Test results for m3(t|a, b) with 1 bump
5.11 Test results for m3(t|a, b) with 2 bumps
5.12 Bootstrap results for male data: 3-max
5.13 Bootstrap results for female data: 3-max
A.1 Test results for the male growth data
B.1 Test results for the female growth data

List of Figures

1.1 Estimates of the speed of growth of male #1
5.1 Function m1(t|a) = 1 + t + a·B(t|0.5)
5.2 Function m2(t|a, b) = 1 + exp(−4t) + a·B(t|0.25) + b·B(t|0.75)
5.3 Number of bumps as a function of a smoothing parameter λ
5.4 Function m1(t|a) = 1 + t + a·B(t|0.5) with 0 bumps and simulated data with noise level σ = 0.10
5.5 Function m1(t|a) = 1 + t + a·B(t|0.5) with 1 bump and simulated data with noise level σ = 0.10
5.6 Power of the CriSV test with k = 0 and size of the test with k = 1 for the function m1 = 1 + t + 0.25·B(t|0.5) with noise level σ = 0.10
5.7 Function m2(t|a, b) = 1 + exp(−4t) + a·B(t|0.25) + b·B(t|0.75) with 0 bumps and simulated data with noise level σ = 0.05
5.8 Function m2(t|a, b) = 1 + exp(−4t) + a·B(t|0.25) + b·B(t|0.75) with 1 bump and simulated data with noise level σ = 0.05
5.9 Function m2(t|a, b) = 1 + exp(−4t) + a·B(t|0.25) + b·B(t|0.75) with 2 bumps and simulated data with noise level σ = 0.05
5.10 Power and size estimates for the function m2(t|a = 0.64, b = 0). The vertical line is drawn at a = 0.15
5.11 Power and size estimates for the function m2(t|a = 0.48, b = 0.15). The vertical line is drawn at a = 0.15
5.12 Power and size estimates for the function m2(t|a = 0.48, b = 0.10). The vertical line is drawn at a = 0.15
5.13 Function m3(t|a, b) (top) and its derivative m3′ (bottom) with 0 bumps. Also included, data sets (top) with noise level σ = 0.005
5.14 Function m3(t|a, b) (top) and its derivative m3′ (bottom) with 1 bump. Also included, data sets (top) with noise level σ = 0.005
5.15 Function m3(t|a, b) (top) and its derivative m3′ (bottom) with 2 bumps. Also included, data sets (top) with noise level σ = 0.005
5.16 Power of the CriSV test when k = 0 and k = 1 for the function m3′(t|a = 0.48, b = 0.15) at the two noise levels. The vertical line is drawn at a = 0.15
5.17 Estimates of the speed of growth of males with one bump
5.18 Estimates of the speed of growth of males with two bumps
5.19 Estimates of the speed of growth of males with three bumps
5.20 Estimates of the speed of growth of males when a decision cannot be reached by a "strict" criterion
5.21 Estimates of the speed of growth of females with one bump
5.22 Estimates of the speed of growth of females with two bumps
5.23 Estimates of the speed of growth of females with three bumps
5.24 Estimates of the speed of growth of females when the decision could not be reached according to the "strict" criterion
A.1 Plot of male speed curves with 1 bump
A.2 Plot of male speed curves with 2 bumps
A.3 Plot of male speed curves with 2 bumps (top) and 3 bumps (bottom)
A.4 Plots of the cases with no decision reached by the "strict" criterion
B.1 Plots of female speed curves with 1 bump
B.2 Plots of female speed curves with 1 bump (continued)
B.3 Plots of female speed curves with 2 bumps
B.4 Plots of female speed curves with 2 bumps (continued)
B.5 Plots of female speed curves with 3 or more than 3 bumps
B.6 Plots of the cases with no decision reached by the "strict" criterion
B.7 Plots of the cases with no decision reached by the "strict" criterion (continued)

Acknowledgements

First and foremost, I would like to thank my supervisor, Nancy Heckman, for her support and guidance throughout the development of this thesis. Nancy was always willing to give her advice and encouragement, which were greatly appreciated. Secondly, I would like to thank Paul Gustafson for his comments and suggestions on improving this manuscript. My gratitude extends to all other faculty members, and particularly to Jean Meloche and Michael Schulzer, for expanding my understanding of statistics. In addition, I must acknowledge the work of Jim Ramsay, who provided us with the Splus programs Pspline and Lspline, and the Berkeley growth data. Thanks also to all fellow graduate students who were always ready to share their knowledge and expertise, and especially to Matias Salibian-Barrera for his amity, wit and patience. Finally, I would like to thank the Natural Sciences and Engineering Research Council of Canada for its financial support during my graduate studies.

Chapter 1

Introduction

Statisticians often want to find an underlying "true" regression function when given a set of noisy data. We assume, very often, that this function is of a parametric form. For instance, in the case of polynomial regression, we search for the "truth" in the class of polynomials. However, once we adopt a specific parametric form of a function, we cannot detect features of the data beyond those found in the specified model. If we adopt the exponential parametric model, the regression function is monotone, i.e. implicitly we restrict the regression relationship to be increasing or decreasing. However, on a number of occasions we want to allow more flexibility in modeling. We can achieve that by not assuming any specific model, i.e. by taking advantage of nonparametric regression. This approach permits the detection of fine features in the function, e.g. bumps and valleys. Let us consider the growth data collected on children in Berkeley, California. This data set consists of height measurements taken at least once per year from age one to age eighteen.
Our main goal is to discover interesting features in the speed and acceleration of growth. A few parametric models for the speed of growth have been proposed in the literature. Gasser et al. (1984) give a good review of the existing models. However, all of the models assume knowledge of the shape of the regression function. For example, we know that most children have pubertal spurts of growth, so we include this spurt in the model. However, studies in the 1930s and 1940s indicated that some children experience prepubertal growth acceleration. Such growth spurts cannot occur in the "pubertal growth spurt only" model. A new parametric model would be needed. Thus we approach the problem from the nonparametric point of view. We allow ourselves the flexibility of modeling the growth curve, but without prespecification of the number of spurts. Unfortunately, a problem arises in the nonparametric approach. We are forced to choose a so-called "smoothing parameter" controlling the balance between goodness of fit to the data and smoothness of the regression function. Largo et al. (1978) pioneered the use of smoothing spline functions, and Gasser et al. (1984) introduced kernel estimation in the analysis of growth data. Largo et al. selected the "optimal" smoothing parameter using a cross validation procedure. Gasser et al. described a complicated procedure for choosing the "optimal" smoothing parameter. Ramsay et al. (1995) in their analysis chose the smoothing parameter "in part by a data-driven process called generalized cross validation, and in part by experiments with simulated data". We can look at Figure 1.1 to see the importance of smoothing parameter selection. At low values of the smoothing parameter, we produce very wiggly estimates, and at large values we lose all the important features of the curve. Construction of these function estimates is discussed in Chapter 3. In our approach, we avoid this problem entirely. We actually use a specially defined smoothing parameter to test for the existence of a certain number of bumps in the regression function.

[Figure 1.1: Estimates of the speed of growth of male #1, showing the effect of the smoothing parameter on the function estimate.]

We use a testing method inspired by the work of Silverman (1981, 1983). His interest was in testing for the number of modes of an underlying density function. He used the kernel density estimation method. This local approach produces a smooth histogram, with the smoothing parameter (also called a bandwidth), h, playing the role of bin-width. Now, if the data are strongly bimodal, we would need a large value of h to obtain a unimodal estimate. Silverman used this idea to test the null hypothesis that the density f has k modes versus the alternative that f has more than k modes. Bowman et al. (1998) extended Silverman's idea to regression functions. They were interested in testing for the monotonicity of the underlying regression function. Here, the nonparametric regression function estimate used is found by a local linear kernel method. This procedure fits a local least squares line to the data. A bandwidth h controls how many points adjacent to the point of interest are used in the local fit. Again, to have a monotone regression function estimate when the "truth" is non-monotone, the bandwidth would have to be very large. Both Silverman's and Bowman's procedures are described in more detail in Chapter 4.
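To make the critical-bandwidth idea concrete, here is a minimal sketch (ours, not from the thesis) of computing Silverman's critical bandwidth by binary search, assuming a normal kernel and counting modes of the estimate on a finite grid; the function names, grid size and search bounds are illustrative.

```python
import numpy as np

def kde(x_grid, data, h):
    """Normal-kernel density estimate f_hat(x; h) evaluated on a grid."""
    u = (x_grid[None, :] - data[:, None]) / h
    return np.exp(-0.5 * u**2).sum(axis=0) / (len(data) * h * np.sqrt(2 * np.pi))

def n_modes(f):
    """Count strict local maxima of the sequence f."""
    return int(np.sum((f[1:-1] > f[:-2]) & (f[1:-1] > f[2:])))

def h_crit(data, k, lo=1e-3, hi=10.0, tol=1e-4):
    """Smallest h giving a density estimate with at most k modes.
    Relies on the mode count being decreasing in h (normal kernel)."""
    grid = np.linspace(data.min(), data.max(), 512)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if n_modes(kde(grid, data, mid)) > k:
            lo = mid   # still too many modes: increase h
        else:
            hi = mid   # at most k modes: try a smaller h
    return hi
```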
We extend Bowman's monotonicity test to CriSV (Critical Smoothing Parameter), a new test of

H_0: # of bumps in m(·) ≤ k vs. H_1: # of bumps in m(·) > k,

where m(·) is the "true" regression function. We use a spline smoothing procedure in the estimation step. Our testing method is based on the critical value of the smoothing parameter, defined as the smallest smoothing parameter yielding a regression curve with k bumps. The p-value of the test is calculated via bootstrapping. The general scenario that we follow in this thesis has three parts:

1. Defining a bump in m(·).
2. Estimating the regression function m(·).
3. Defining a test statistic and conducting a hypothesis test.

In Chapter 2, we discuss possible definitions of a bump, and we select the most practical working definition. Methods of estimation are presented in Chapter 3. We mainly use spline smoothing methods, but for comparison purposes kernel smoothing is also utilized. Chapter 4 gives an overview of existing testing procedures. There we also define our test CriSV, and we provide the way of calculating the p-value associated with it. Extensive simulation studies and the analysis of the Berkeley growth data are presented in Chapter 5.

Chapter 2

Bumps: what are they?

In this chapter we outline the framework of our problem. The next section presents definitions and facts about modes and bumps in densities, and section 2.2 extends the definitions to regression functions.

2.1 Bumps in Density Functions

Let us focus our attention for the moment on the distribution F(·) of a random variable X. Let f(·) be its density and f^(k)(·) the k-th derivative of the density f(·) when it exists. M is a mode of the distribution of X if f(·) has a local maximum at M. In practical applications and in the rest of the thesis, we assume f(·) to be continuous and at least twice continuously differentiable. Thus, we can define a mode as follows:

Definition 2.1 M is a mode if and only if f′(M) = 0 and there exists ε > 0 such that f′(t) > 0 for t ∈ (M − ε, M) and f′(t) < 0 for t ∈ (M, M + ε).

Usually, each mode has two associated antimodes t_L < M and t_U > M (or local minima), defined by

t_L = max{t < M : f′(t) = 0, f″(t) > 0},   (2.1)
t_U = min{t > M : f′(t) = 0, f″(t) > 0}.   (2.2)

We want to study properties of bumps, but we have not yet clarified what we mean when we say that a function has a "bump". The first difficulty arises in the definition of a bump. A few definitions have been proposed in the literature, and we discuss here the approaches taken to define a bump. Since t_L < M < t_U, we could give the following definition of a bump.

Definition 2.2 f(·) has one bump in [t_L, t_U] if and only if there exists a mode M ∈ (t_L, t_U) and t_L, t_U satisfy properties (2.1) and (2.2) respectively.

However, definition 2.2 of a bump does not cover all the possibilities, since not every mode has two antimodes (e.g. the Gaussian density has a mode, but no antimodes). Another way to define a bump in f(·) is:

Definition 2.3 f(·) has a bump in [t_L, t_U] if and only if

f″(t_L) = 0 = f″(t_U),   (2.3)
f″(t) < 0 for all t ∈ (t_L, t_U),   (2.4)

and [t_L, t_U] is the shortest interval with properties (2.3) and (2.4).

Definition 2.3 extends the class of functions with bumps. According to this definition, it is clear that every mode M with f″(M) ≠ 0 has a surrounding bump, but the converse is not true. We can encounter a bump in a density function where the second derivative changes sign, but the first derivative does not.
In this case, we say that there is a shoulder in f(·). Therefore, we have to adjust the definition of a bump to avoid situations where we have mere changes of concavity on the slopes of a "true" bump. Combining the features of definitions 2.2 and 2.3, we give the final definition of a bump.

Definition 2.4 f(·) has one bump in [t_L, t_U] if and only if

- there exists a mode M ∈ (t_L, t_U),
- f′(t) > 0 for all t ∈ (t_L, M), and
- f′(t) < 0 for all t ∈ (M, t_U).

The second problem is the difficulty of estimating the number of bumps in a density function. Let us consider a random variable X with a probability distribution F(·). When we take a sample from it, we get little information about the tails of the distribution. In the simple case of estimating the mean μ of X in a nonparametric way, Bahadur and Savage (1956) showed that there is no effective way to get a truly nonparametric point estimate of μ or a confidence interval for μ. Similar problems are encountered when estimating nonlinear functionals of a density function f(·) or its derivatives. Two integer-valued functionals that are commonly looked at are the mixture complexity K(F) (the number of mixture terms needed to represent a density) and M(F), the number of modes in f(·). In the applied setting, it is of interest to estimate such functionals. However, there is a logical difficulty with this, since all the functionals depend on the existence of a density with some amount of regularity, and we cannot verify empirically that a density is well-behaved. Donoho (1988) showed that we can still make inferences about the values of K(F) or M(F) with a nonparametric validity. We have a restriction, however, that all the statements are of a one-sided nature. In the M(F) example, we can make a statement: "I have 90% confidence that the number of modes of the underlying distribution is at least 2". The statement above does not depend on assumptions about the underlying distribution. It is totally empirical and it has at least the indicated coverage. Moreover, Donoho showed that using his procedure we get one-sided bounds that are consistent and converge at rapid rates. The main idea behind the non-existence of upper limits of some functionals is that, close to any distribution of interest, there are indistinguishable distributions for which the functional takes on arbitrarily large values. Given a fixed sample size, it is not possible to put an upper bound on the functional based only on the empirical evidence. We would have to make some a priori assumptions that are not testable by the data on hand. An ambiguity arises in the definition of the functional M(F) counting the number of modes. When a distribution is flat on some interval, we can argue that it has infinitely many or just one mode there. Donoho gave a definition which yields only one mode in such a situation. He used Silverman's idea (1981) of a convolution Φ_h * F of a normal distribution function Φ_h with variance h and an arbitrary distribution function F. G = Φ_h * F is a smooth distribution function with derivatives of all orders. Therefore, Donoho defined the number of modes of G, M(G), as the number of downcrossings in G″. Using Silverman's result (Theorem 4.1) about the monotonicity of M(G) with respect to h, M(F) was defined as

M(F) = lim_{h→0} M(Φ_h * F).   (2.5)

In the light of (2.5), we can make statements of the form: the correct number of modes (bumps) may be larger than a certain number, but I have high confidence that it is not smaller. In general, the data can invalidate simple models, e.g. a distribution with one mode.
However, data cannot usually rule out complex models. Therefore, inference about most measures of complexity pertains to lower bounds, but not upper ones.

2.2 Bumps in Regression Functions

Let us turn our attention to the regression problem. We define m(·) to be the true (smooth) regression function and we denote its k-th derivative by m^(k)(·). We assume m(·) is defined on an open interval (a, b), where a can be −∞ or b can be +∞. We define a mode in m^(k)(·) for k = 0, 1, 2, ... by

Definition 2.5 m^(k)(·) has a mode M if and only if M is a local maximum of m^(k)(·).

Following definition 2.4 for a bump in a density function, we define a bump in the regression function m(·) or in its k-th derivative m^(k)(·) by

Definition 2.6 m^(k)(·) has one bump in [t_L, t_U] if and only if

- there exists a mode M ∈ (t_L, t_U),
- m^(k+1)(t) > 0 for all t ∈ (t_L, M), and
- m^(k+1)(t) < 0 for all t ∈ (M, t_U).

In section 4.3, we discuss ways to estimate the number of bumps in regression functions and their derivatives.

2.3 Testing in Density versus Regression

In the last two sections, we defined modes and bumps and presented some results pertaining to them. So far, we have looked at the similarities between density and regression functions. However, the questions asked in the two contexts are usually different. In the density framework, the main questions arising are "Is the density function unimodal?" or "Should we make use of the bimodal mixture?". In the regression setting, we are more interested in monotonicity, or possibly in the existence of a few local maxima in the function. Therefore, we can test for 0 bumps in the regression context, but not in the density function setting. Besides, very often we are interested in the number of bumps in the derivatives of the regression function. For example, in the growth data analyzed in section 5.3 we inquire about possible spurts of growth, requiring estimation of, or testing for, the number of bumps in the speed of growth. We can be concerned as well with the change in the speed of growth, i.e. with acceleration. We will show in Chapter 4 how our testing procedure can be applied in tests for regression functions or their derivatives.

Chapter 3

Methods of Estimation

In this chapter, we give a brief summary of the nonparametric techniques used in univariate regression estimation. Some of the most popular methods are based on kernel functions, spline functions and wavelets. The first two of these methods have become commonly used in recent years. In section 3.1, we give a review of prevalent kernel smoothing methods with special emphasis on the local linear approach. Spline smoothing is reviewed in section 3.2, where we concentrate mostly on smoothing splines.

3.1 Kernel Smoothing

Before giving an overview of kernel-based nonparametric regression, we introduce some relevant terminology and notation. We focus on the fixed design context with equally spaced non-random numbers (t_1, ..., t_n) ∈ [a, b]. The response variable is assumed to satisfy

Y_i = m(t_i) + ε_i,  i = 1, ..., n,   (3.1)

where ε_1, ..., ε_n are independent random variables with E(ε_i) = 0 and Var(ε_i) = σ². We call m(·) the mean (or true) regression function. The fundamental idea behind kernel smoothing lies in the following premise: data points far away from t carry little information about the value of m(t). Thus, the intuitive estimator for the mean regression function is the running local average. An improved version of it is the locally weighted average.
Let K(·) be a function which will be used to assign weights. Usually K(·) is a probability density symmetric about 0 and is called a kernel function. Let h be a bandwidth (sometimes called a smoothing parameter), which is a nonnegative number controlling the size of a window around the point of interest t. With the notation K_h(·) = K(·/h)/h, the Nadaraya-Watson kernel regression estimator is given by

m̂_h(t) = Σ_{i=1}^n K_h(t_i − t) Y_i / Σ_{i=1}^n K_h(t_i − t).   (3.2)

The most commonly used kernel functions include the normal kernel K(t) = (√(2π))^{−1} exp(−t²/2) and the "symmetric Beta family"

K(t) = (1/Beta(1/2, γ+1)) (1 − t²)_+^γ,  γ = 0, 1, ...,   (3.3)

where the subscript + denotes the positive part, taken before the exponentiation. The choices γ = 0, 1, 2 and 3 in (3.3) lead respectively to the uniform, the Epanechnikov, the biweight and the triweight kernel function. The selection of the kernel function is not very important in general. However, we choose to work with the normal kernel, since it enables us to use convolution theorems true only in the normal case (see Silverman (1981) and Babaud et al. (1986)). On the other hand, the choice of the bandwidth h is crucial for the estimation. Small values of h give us an undersmoothed estimate, and large values of h give us an oversmoothed estimate.

Gasser and Müller (1979) proposed another way to introduce weights. Their estimator (also called the convolution weighted estimator) is given by

m̂_h(t) = Σ_{i=1}^n Y_i ∫_{s_{i−1}}^{s_i} K_h(t − u) du,   (3.4)

where s_0 = −∞, s_n = ∞, t_j < s_j < t_{j+1} for j = 1, ..., n − 1, and K(·) is a density function. Both kernel estimators mentioned so far use a local constant approximation. We can see that by applying local least squares regression when approximating m(·) by a constant θ. We obtain the estimate

θ̂ = argmin_θ Σ_{i=1}^n (Y_i − θ)² w_i = Σ_{i=1}^n w_i Y_i / Σ_{i=1}^n w_i,

where w_i = K_h(t_i − t) in the case of the Nadaraya-Watson estimator and w_i = ∫_{s_{i−1}}^{s_i} K_h(t − u) du in the case of the Gasser-Müller estimator. Local constant approximation gives us a large bias in the estimation of m(t) when t is close to either a or b. A local linear or a higher order local polynomial fit does not have this problem, and so is recommended. Higher order polynomials are preferable when we want to accurately fit finer features of the data, e.g. peaks and valleys. The main drawbacks of higher order local polynomial fitting are its computational complexity and higher degree of sample variability. Let p be the degree of the local polynomial being fit. We estimate m(t) by m̂_h(t; p), obtained by fitting the polynomial

β_0 + β_1(· − t) + ... + β_p(· − t)^p

to the (t_i, Y_i)'s using weighted least squares with kernel function K(·). The estimator m̂_h(t; p) is β̂_0, where β̂ = (β̂_0, ..., β̂_p)′ is the minimizer of

Σ_{i=1}^n {Y_i − β_0 − β_1(t_i − t) − ... − β_p(t_i − t)^p}² K_h(t_i − t).

The general solution to the minimization problem is given by

β̂ = (T_t′ W_t T_t)^{−1} T_t′ W_t Y,

where Y = (Y_1, ..., Y_n)′ is the response vector, T_t is the n × (p + 1) design matrix with i-th row (1, t_i − t, ..., (t_i − t)^p), and W_t = diag{K_h(t_1 − t), ..., K_h(t_n − t)} is an n × n diagonal matrix of weights. We can derive explicit formulae for the Nadaraya-Watson (p = 0) and the local linear (p = 1) estimators. In the latter case, we obtain

m̂_h(t; 1) = n^{−1} Σ_{i=1}^n {ŝ_2(t; h) − ŝ_1(t; h)(t_i − t)} K_h(t_i − t) Y_i / (ŝ_2(t; h) ŝ_0(t; h) − ŝ_1(t; h)²),   (3.5)

where

ŝ_r(t; h) = n^{−1} Σ_{i=1}^n (t_i − t)^r K_h(t_i − t).
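As a concrete illustration of the weighted least squares fit just described, the following minimal numpy sketch (ours, not the thesis's S-PLUS code) computes the local polynomial estimate at a point with a normal kernel; the bandwidth and the test function are illustrative.

```python
import numpy as np

def local_poly(t0, t, y, h, p=1):
    """Local polynomial fit at t0: returns (beta_0, ..., beta_p).
    beta_0 estimates m(t0); r! * beta_r estimates the r-th derivative."""
    d = t - t0
    T = np.vander(d, p + 1, increasing=True)                 # design matrix T_t
    w = np.exp(-0.5 * (d / h)**2) / (h * np.sqrt(2 * np.pi))  # normal kernel weights
    W = np.diag(w)
    return np.linalg.solve(T.T @ W @ T, T.T @ W @ y)

# local linear estimate m_hat(t; 1) over a grid, on simulated data
t = np.linspace(0, 1, 100)
y = 1 + np.exp(-4 * t) + np.random.default_rng(0).normal(0, 0.05, t.size)
m_hat = np.array([local_poly(t0, t, y, h=0.08, p=1)[0] for t0 in t])
```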
An important problem in estimation is the choice of the degree p of the polynomial to be locally fitted. For sufficiently smooth regression functions, m̂_h(t; p) has better asymptotic performance for larger values of p. In practice, however, we would need a very large sample size to see a substantial performance improvement. Since odd degree fits have good bias and boundary properties, it is recommended that p = 1 or p = 3 be used (see Wand and Jones (1995)). In our work, we use p = 1, i.e. a local linear kernel regression estimator.

Our interest extends beyond the estimation of the regression functions themselves. In some problems, the estimation of the derivatives of the regression function is more important than estimation of the function itself. For example, in analyzing human growth curves, the first two derivatives of height as a function of age have significant biological meaning. Estimation of the derivatives is also needed for plug-in bandwidth selection procedures (see Wand and Jones (1995)). We can easily use local polynomial fitting to estimate the r-th derivative of the regression function: we use the coefficient of the r-th derivative of the local polynomial being fitted at t to estimate m^(r)(t). For instance, m̂′_h(t; p) = β̂_1, and in general, the local p-th degree estimate of m^(r)(t) is m̂^(r)_h(t; p) = r! β̂_r. It should be noted that m̂^(r)_h(t; p) ≠ (m̂_h(t; p))^(r) in general. The preferred selection of the degree p of the local polynomial fit is such that (p − r) is odd. In the case of (p − r) even, we obtain a more complicated expression for the bias of m̂^(r)_h(t; p). More seriously, the bias for t close to the boundary of (a, b) is of higher order than the bias for t in the interior of (a, b).

As mentioned before, the value of the bandwidth h plays an important role in kernel regression estimation. In many situations a satisfactory choice can be made subjectively, by eye. This involves looking at several regression estimates over a range of bandwidths and choosing the "best" in some sense. However, it is also desirable to have an automatic procedure for bandwidth selection. Currently available methods can be divided into two categories: plug-in and cross-validation. The former technique has many attractive theoretical and practical properties (see Ruppert, Sheather and Wand (1995)), and can be easily implemented in S-PLUS. M.P. Wand developed a function locpoly using a plug-in procedure for the bandwidth choice. We use locpoly in our simulation study, discussed in detail in Chapter 5.

3.2 Spline Estimation

We turn our attention to another method of nonparametric estimation: splines. Intuitively, we can think about splines as piecewise polynomial functions. The name "spline" comes from the similarity of a spline fit to the curve obtained by a draftsman using a mechanical spline, a thin flexible rod with weights, used to fix the rod at points through which one draws a smooth interpolating curve. For simplicity, we assume our true regression function m(·) is defined on a closed interval [0, 1]. Of course, any closed interval can be chosen. We first define a general polynomial spline of degree D:

Definition 3.1 A general polynomial spline g is a function on [0, 1] with the following properties: given an integer D ≥ 1 and J points 0 < t_1 < t_2 < ... < t_J < 1, called "knots",

g ∈ π_D on [t_i, t_{i+1}], for i = 0, ..., J, where t_0 = 0 and t_{J+1} = 1, and
g ∈ C^{D−1} on [0, 1],

where π_D is the class of polynomials of degree at most D and C^{D−1} is the class of functions with D − 1 continuous derivatives. Thus, g is a D-th degree piecewise polynomial with the pieces joined at the knots in such a way that g has D − 1 continuous derivatives.
This general idea of polynomial splines has been of interest to numerical analysts for over 30 years. Many interpolation results have been proved about the convergence of general splines, as well as of their derivatives and integrals, as the knots become dense in [0, 1]. Because of these properties, as well as their good features in smoothing noisy data, splines are of interest to statisticians. The two principal classes of splines used in statistics are regression splines and smoothing splines. We present the former briefly in the next section, and the latter in more detail in section 3.2.2.

3.2.1 Regression Splines

We could define regression splines in general, but to be more concrete, we present the cubic regression spline. Let t_1, ..., t_J be a knot sequence such that 0 < t_1 < ... < t_J < 1. A cubic spline function g satisfies the two following properties:

- g is a cubic polynomial on each of the intervals [0, t_1], [t_1, t_2], ..., [t_{J−1}, t_J], [t_J, 1];
- g is twice continuously differentiable.

The collection of cubic spline functions is a (J + 4)-dimensional linear space. The two most popular bases used for this space are:

- the power basis: (t − t_j)_+³ (j = 1, ..., J), 1, t, t², t³;
- the B-spline basis. Let B_i^(D) denote the i-th B-spline basis function of degree D; the basis functions for a cubic spline are {B_{−3}^(3), ..., B_J^(3)}. We can define the B_i^(D)'s using a recursive formula. Let ... = t_{−1} = t_0 = 0 < t_1 < ... < t_J < 1 = t_{J+1} = ... We start with the degree 0 local polynomial, i.e. the local constant,

B_i^(0)(t) = I{t ∈ [t_i, t_{i+1})} for i = 0, ..., J − 1, and B_J^(0)(t) = I{t ∈ [t_J, 1]}.

Now, given the B_i^(D−1)'s, i = −(D − 1), ..., J, define

B_i^(D)(t) = ((t − t_i)/(t_{i+D} − t_i)) B_i^(D−1)(t) + ((t_{i+D+1} − t)/(t_{i+D+1} − t_{i+1})) B_{i+1}^(D−1)(t),

where we take 0/0 = 0 and undefined/0 = 0. We can observe that B_i^(D)(t) is equal to 0 for t ∉ [t_i, t_{i+D+1}]. Thus, design matrices based on B-splines are sparser than those using the power basis, and so we achieve better numerical stability with the B-spline basis. This fact plays an important role in the estimation procedure described below. Using the B-spline basis we can express a cubic spline function as

g(t) = Σ_{j=−3}^{J} θ_j B_j^(3)(t).

For given knots t_1, ..., t_J, the spline regression method finds the best spline approximation by the following least squares regression:

min_θ Σ_{i=1}^n {Y_i − Σ_{j=−3}^{J} θ_j B_j^(3)(t_i)}².

Using this approach, we obtain an estimate of m(·) (unique provided the design matrix has full column rank) of the form

m̂(t) = Σ_{j=−3}^{J} θ̂_j B_j^(3)(t).

The procedure just described is very sensitive to the choice of the number of knots as well as to their locations. Knots are usually placed in spots where the curvature of m(·) changes rapidly. One can find in the literature a few methods to deal with knot selection; most of them are based on the knot deletion idea (see Fan and Gijbels (1996)).
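The recursion and the least squares fit above can be sketched directly in code. The following is a minimal numpy illustration (not from the thesis); 0-based indexing over a padded knot vector replaces the text's indices running from −3, and the interior knots and test function are illustrative.

```python
import numpy as np

def bspline(i, D, t, knots):
    """B_i^(D)(t) by the recursion in the text; knots must include the
    repeated boundary knots (0 on the left, 1 on the right)."""
    if D == 0:
        return np.where((knots[i] <= t) & (t < knots[i + 1]), 1.0, 0.0)
    left = right = 0.0
    if knots[i + D] > knots[i]:              # the 0/0 = 0 convention
        left = (t - knots[i]) / (knots[i + D] - knots[i]) * bspline(i, D - 1, t, knots)
    if knots[i + D + 1] > knots[i + 1]:
        right = (knots[i + D + 1] - t) / (knots[i + D + 1] - knots[i + 1]) * bspline(i + 1, D - 1, t, knots)
    return left + right

# cubic regression spline fit via least squares on the B-spline basis
interior = np.array([0.25, 0.5, 0.75])                     # J = 3 knots
knots = np.concatenate([[0, 0, 0, 0], interior, [1, 1, 1, 1]])
t = np.linspace(0, 1 - 1e-9, 200)  # stay below 1: degree-0 pieces are half-open
y = np.sin(2 * np.pi * t) + np.random.default_rng(0).normal(0, 0.1, t.size)
B = np.column_stack([bspline(i, 3, t, knots) for i in range(len(interior) + 4)])
theta, *_ = np.linalg.lstsq(B, y, rcond=None)              # J + 4 coefficients
m_hat = B @ theta
```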
3.2.2 Smoothing Splines

Our goal is to estimate the regression function m(·) using a least squares fit, but with the restriction that m(·) is not too rough or wiggly. Smoothing splines provide one way to find such an estimate. We want to find a minimizer m̂_λ(·), over an appropriate space of functions, of

Σ_{i=1}^n (Y_i − m(t_i))² + λ ∫_0^1 [Lm(t)]² dt,   (3.6)

where λ ≥ 0 is called a smoothing parameter and L is a linear differential operator of order k ≥ 1, defined as

(Lm)(t) = m^(k)(t) + Σ_{j=0}^{k−1} w_j m^(j)(t),   (3.7)

where w_j ∈ ℝ for j = 0, ..., k − 1. The minimization is carried out over all m(·) in the Sobolev space

H^k[0, 1] = {m : [0, 1] → ℝ : m^(j), j = 0, ..., k − 1, are absolutely continuous and m^(k) ∈ L²[0, 1]}.

Going back to expression (3.6), we can see that its first part penalizes lack of fit, and its second part penalizes m when Lm is far from 0. We can see clearly from (3.6) that setting λ = 0 would give us an interpolation of the data. For large λ, one is essentially minimizing the second term, and as λ → +∞, our estimate converges to an m with Lm(t) = 0 for almost every t ∈ [0, 1]. We say that m_def is a default model if and only if Lm_def(t) = 0 for almost every t.

Before looking at the general solution to the minimization problem, let us look at a few examples of the linear differential operator L. The most popular one is Lm = m″, with the default model m_def(t) = a_0 + a_1 t (a_0, a_1 ∈ ℝ). The function m̂_λ(·) minimizing (3.6) is then known as a natural cubic spline function: a general cubic spline with knots t_1, ..., t_n, with m̂_λ linear on [0, t_1] ∪ [t_n, 1]. Using m″(t) in (3.6) corresponds to penalizing visual roughness. It is very difficult to see by eye the behaviour of third or higher derivatives, so penalizing by the second derivative is adequate to enable us to exclude rough and wiggly functions. However, the drawback of this approach is in assuming the default model is a line. In many situations, we can gain considerably in estimation if we use other linear differential operators. One of the classes of L, which includes Lm = m″, is Lm = m^(k) for k = 1, 2, 3, ..., with the default model m_def(t) = a_0 + a_1 t + ... + a_{k−1} t^{k−1} (a_0, a_1, ..., a_{k−1} ∈ ℝ). The functions minimizing (3.6) with Lm = m^(k) are known as natural splines (general splines of degree 2k − 1, with knots t_1, ..., t_n, but with zero k-th derivative on [0, t_1] ∪ [t_n, 1]). In another example, used in the growth data analysis (described in Chapter 5), the default model is commonly assumed to be of the form m_def(t) = a_0 + a_1 exp(−γt), with the corresponding Lm = m″ + γm′. In our work, we use Lm = m″ and Lm = m″ + γm′ as penalty terms in (3.6). For both L's, the default models are monotone. Therefore, when the smoothing parameter λ → +∞ we obtain estimated functions with 0 bumps.

The general solution to the minimization problem (3.6) involves the theory of reproducing kernel Hilbert spaces. The details and proofs of the main results can be found in Wahba (1990) or in Heckman (1997). We present here the most important points. The minimizer of (3.6) is of the form

m̂(t) = Σ_{i=1}^k α_i u_i(t) + Σ_{j=1}^n β_j f_j(t),   (3.8)

where the u_i's (i = 1, ..., k) form a basis for the default model space, and the f_j(t) can be calculated by integration of the Green's function G(·,·) associated with L. The Green's function satisfies

f(t) = ∫ G(t, u)(Lf)(u) du for all f ∈ H^k with f^(i)(0) = 0, i = 0, ..., k − 1.

G is easily computed, given u_1, ..., u_k. Then

f_j(t) = K(t_j, t),  where K(s, t) = ∫ G(s, u) G(t, u) du = K(t, s).

The only remaining task is to choose the smoothing parameter λ in such a way that we obtain a "good" fit. As mentioned above, our estimate goes from interpolating the data (when λ → 0) to taking the form of the default model (when λ → +∞). Therefore, we would like to find an optimal smoothing parameter that allows us flexibility in the modeling but does not let the default model dominate the estimation procedure.
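As a hedged illustration of how λ trades off fit against the penalty, the sketch below replaces the integral penalty in (3.6) with squared second differences on the design points, a discrete stand-in for Lm = m″; it is not the thesis's Pspline smoother, and the test function and λ values are illustrative.

```python
import numpy as np

def whittaker(y, lam):
    """Discrete analogue of (3.6) with Lm = m'': minimize
    ||y - m||^2 + lam * ||D2 m||^2, solved by (I + lam * D2'D2) m = y."""
    n = len(y)
    D2 = np.diff(np.eye(n), n=2, axis=0)   # (n-2) x n second-difference operator
    return np.linalg.solve(np.eye(n) + lam * D2.T @ D2, y)

t = np.linspace(0, 1, 100)
y = 1 + t + 0.25 * np.exp(-(t - 0.5)**2 / 0.005) \
    + np.random.default_rng(0).normal(0, 0.1, t.size)
fits = {lam: whittaker(y, lam) for lam in (1e-4, 1e0, 1e4)}
# lam -> 0 approaches interpolation; lam -> infinity approaches the
# default model of the discrete penalty (here, a straight line).
```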
A few methods for estimating the optimal smoothing parameter have been proposed in the smoothing literature, but there is no general agreement on one method being superior to the others. Two possibilities are cross validation (CV) and generalized cross validation (GCV). Both ideas are based on minimizing the prediction error (MSPE). Let m̂_λ(·) denote the regression function fitted to the data {Y_i}_1^n. We would like to choose λ in such a way as to minimize

MSPE_λ = Σ_{i=1}^n (Y_i′ − m̂_λ(t_i))²,   (3.9)

where the Y_i′'s are new observations at each t_i respectively. The difficulty with this approach is that the Y_i′'s are unknown, so we have to replace them with the observed values {Y_i}_1^n. However, minimizing (3.9) with direct substitution of Y_i for Y_i′ would result in the choice λ = 0, since we would fit the regression model to the "old" data, thus overfitting the data on hand. To adjust for that unpleasant feature, we divide our data set into a training sample (n − 1 observations) and a testing sample (1 observation) in every possible way, i.e. we estimate the regression function with case i removed and we compare the fit with the original data point. Therefore, we define the cross-validation function as

Definition 3.2

CV_λ = (1/n) Σ_{i=1}^n (Y_i − m̂_λ^(−i)(t_i))²,   (3.10)

where m̂_λ^(−i)(t_i) is the estimate of the regression function with case (t_i, Y_i) removed. We select λ according to the CV criterion by

λ_CV = argmin_{λ ∈ ℝ} CV_λ.   (3.11)

In practice, this procedure looks very time-consuming, since it seems that we would have to fit n separate curves. However, we can use a shortcut making use of only one fit for each value of λ. It can be shown that the solution to the minimization problem (3.6) can be written as a linear combination of the observed values Y_i:

m̂_λ = S(λ) Y,   (3.12)

where m̂_λ = (m̂_λ(t_1), ..., m̂_λ(t_n))′ and Y = (Y_1, ..., Y_n)′. We call S(λ) a hat matrix, because it maps the vector of observed values Y_i into the fitted values m̂_λ(t_i). The following result,

Y_i − m̂_λ^(−i)(t_i) = (Y_i − m̂_λ(t_i)) / (1 − S_{ii}(λ)),   (3.13)

and the theorem using it can be found in Green and Silverman (1994).

Theorem 3.1 The cross-validation function CV_λ satisfies

CV_λ = (1/n) Σ_{i=1}^n [(Y_i − m̂_λ(t_i)) / (1 − S_{ii}(λ))]²,   (3.14)

where m̂_λ(·) is calculated from the full data set using the smoothing parameter λ.

Using Theorem 3.1, we can write an efficient computer program to calculate CV_λ on a grid of λ values. We do not have a guarantee that the function CV_λ is convex in λ, so it can have many local minima. Therefore a grid search is the safest method of correctly calculating the smoothing parameter λ_CV. Another method used frequently in smoothing parameter choice is generalized cross validation, abbreviated GCV. It is a modified version of the cross validation described above. In GCV, the factor 1 − S_{ii}(λ) in the denominator of (3.14) is replaced by the average value of the 1 − S_{ii}(λ)'s, i.e. by 1 − n^{−1} tr S(λ). The generalized cross validation function is

GCV_λ = n^{−1} Σ_{i=1}^n (Y_i − m̂_λ(t_i))² / (1 − n^{−1} tr S(λ))².   (3.15)

We choose the smoothing parameter λ_GCV by minimizing (3.15) over values of λ. One of the reasons GCV was introduced was computational: we can find the trace of the hat matrix without finding its diagonal elements. Nowadays, however, there is little difference between the computing time for CV and GCV. A slight advantage of using GCV can be seen when we think, analogously to standard regression, of the diagonal elements S_{ii}(λ) of the matrix S(λ) as the leverage values. We know that at points with high leverage the predicted values are very sensitive to the observations made at these points. Using (3.13) we can rewrite the GCV function in the form

GCV_λ = (1/n) Σ_{i=1}^n {[(1 − S_{ii}(λ)) / (1 − n^{−1} tr S(λ))]² (Y_i − m̂_λ^(−i)(t_i))²}.   (3.16)

We observe from (3.16) that GCV, when compared to the CV criterion (Definition 3.2), downweighs the deleted residuals at points with large leverage values.
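Continuing the discrete sketch above, CV_λ and GCV_λ can be computed from a single fit via the hat matrix, as in Theorem 3.1 and (3.15); this is an illustration under the same stand-in smoother, not the thesis's implementation.

```python
import numpy as np

def cv_gcv(y, lam):
    """CV_lambda and GCV_lambda from one fit, using the hat matrix
    S(lam) of the discrete smoother (Theorem 3.1 and (3.15))."""
    n = len(y)
    D2 = np.diff(np.eye(n), n=2, axis=0)
    S = np.linalg.inv(np.eye(n) + lam * D2.T @ D2)   # hat matrix S(lam)
    resid = y - S @ y
    cv = np.mean((resid / (1 - np.diag(S)))**2)      # leave-one-out shortcut
    gcv = np.mean(resid**2) / (1 - np.trace(S) / n)**2
    return cv, gcv

# grid search, since CV/GCV need not be convex in lambda:
# lams = np.logspace(-4, 4, 41)
# lam_gcv = min(lams, key=lambda l: cv_gcv(y, l)[1])
```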
Another way of looking at GCV is possible when we invoke classical least squares regression and its idea of the number of model parameters. For a parametric model, the least squares vector of fitted values is Ŷ = HY, with H the so-called hat matrix. The number of parameters is the sum of the diagonal elements of the hat matrix, i.e. for a model with k parameters we have k = tr(H). We can introduce a similar concept in nonparametric regression. The degrees of freedom of the model can be defined, by analogy to standard regression, as tr(S(λ)) (see Hastie and Tibshirani (1990) or Green and Silverman (1994)). Therefore, we can define the equivalent degrees of freedom for noise as

EDF = tr(I − S(λ)).

The EDF goes from 0 when λ = 0 (interpolation of the data) to n − k when λ → +∞ (k is the dimension of the default model). The following expression for GCV_λ can be easily derived:

GCV_λ = n Σ_{i=1}^n (Y_i − m̂_λ(t_i))² / EDF².

In our simulation studies we use the GCV function to choose the λ value. We selected this method because it has more favorable properties than CV (see Wahba (1990)). However, the preference for one or the other criterion is not crucial in our work, since we deal with hypothesis testing and not with true regression function estimation. We use two S-PLUS modules, Lspline and Pspline, written by J.O. Ramsay. Lspline enables us to use penalties L that are linear combinations of the derivatives of m(·), e.g. in our study we use Lm = m″ + γm′. Pspline allows us to fit a smoothing spline associated with the penalty Lm = m^(k) for k ∈ ℕ.

Chapter 4

Testing for Bumps

As mentioned in section 2.3, we have to look separately at the density estimation and the regression estimation problems. The differences between them carry through to hypothesis testing. In the next section I describe the methods used in bump-hunting in the density context. A review of bump testing in regression follows in section 4.2, and our method is presented in section 4.3.

4.1 Density Function

Consider a data set {X_1, ..., X_n} coming from a population with a density function f(·). Firstly, we want to estimate the density f(·) using one of the available methods. Secondly, we wish to make inferences about the features of the true density. We are interested in the existence of modes and associated bumps. Different approaches to the bump-hunting problem have been proposed, starting from the early parametric approach in the 1960's through nonparametric methods developed mainly in the 1980's and 1990's.

Cox (1966) suggested constructing a histogram from the data, choosing a bin width broad enough to smooth over spurious bumps, but not so broad as to obscure the effects under study. Let f_i denote the number of observations falling in cell C_i, i.e. f_i = Σ_{j=1}^n I(X_j ∈ C_i). Therefore f_i ~ Binomial(n, p_i = ∫_{C_i} f(x) dx), and

f_i | f_{i−1} + f_i + f_{i+1} ~ Binomial(n′ = f_{i−1} + f_i + f_{i+1}, p)

with

p = P(X ∈ C_i | X ∈ C_{i−1} ∪ C_i ∪ C_{i+1}) = E(f_i) / (E(f_{i−1}) + E(f_i) + E(f_{i+1})).

We can define a test statistic

T_i = (2 f_i − f_{i−1} − f_{i+1}) / √(2 (f_{i−1} + f_i + f_{i+1})).

Using the normal approximation to the binomial distribution, it can be shown that T_i has an approximately normal distribution. Values of T_i significantly different from zero indicate either convexity (T_i < 0) or concavity (T_i > 0) of the true density function around cell i. When we examine the sequence of T_i's, we can get some evidence on the number of bumps in f.
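A small sketch of Cox's cell statistic follows. Note that the extraction lost the numerator of T_i; the version below reconstructs it as the negative second difference of the counts, which matches the stated denominator and sign convention (peaks give T_i > 0) but should be treated as our assumption.

```python
import numpy as np

def cox_T(f):
    """T_i for interior cells from histogram counts f. The numerator
    2 f_i - f_{i-1} - f_{i+1} is our reconstruction: it is positive
    where the counts are locally concave (a peak), negative where convex."""
    f = np.asarray(f, dtype=float)
    num = 2 * f[1:-1] - f[:-2] - f[2:]
    den = np.sqrt(2 * (f[:-2] + f[1:-1] + f[2:]))
    return num / den

counts, _ = np.histogram(np.random.default_rng(0).normal(size=500), bins=20)
T = cox_T(counts)   # values far from 0 flag convex/concave stretches
```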
Two approaches have been proposed to test the hypothesis H_0: # bumps = 1 versus H_a: # bumps = 2 using normal models. Wolfe (1970) used the likelihood ratio to test the normal null hypothesis against the alternative of a two-component normal mixture. The proposed test, however, is very sensitive to the normality assumption, and it may decide with high probability that a long-tailed unimodal distribution has two modes. Engelman and Hartigan (1969) proposed dividing the sample into the two subsets which maximize the likelihood ratio that the two subsets are sampled from normal distributions with different means, against the null hypothesis that the means are equal. In the 1980's, nonparametric methods in bump-hunting started gaining popularity due to the greater availability of computing resources.

Good and Gaskins (1980) compared density estimation and bump hunting. Their notion was that bump-hunting differed from density estimation, since the former involves significance testing, and the latter estimation. They used the maximum penalized-likelihood (MPL) method for estimating the probability density and for bump-hunting. Let f̂ be the estimated density using MPL. They then used iterative surgery to construct another estimate f̂* in which the bump is absent. The difference between the penalized log likelihoods of f̂ and f̂* gave the log-odds ratio in favour of the bump.

Silverman (1981) proposed a test statistic for hypotheses concerning the number of modes in the density. His statistic is calculated by application of a kernel density estimator, defined at a point x as

f̂(x; h) = (1/(nh)) Σ_{i=1}^n K((x − X_i)/h),   (4.2)

where K(·) is known as a kernel function. Using this definition, we can say the window width h controls the amount of data used to calculate the estimate of f(x). Now, if the data are strongly bimodal, we would need a large value of h to obtain a unimodal estimate. Using this idea, we can test the null hypothesis that the density f has at most k modes versus the alternative that f has more than k modes. Formally, we define the k-critical window width h_crit by

h_crit = inf{h : f̂(·; h) has at most k modes}.   (4.3)

Large values of h_crit will lead to rejection of the null hypothesis that f(·) has at most k modes. The existence of h_crit is guaranteed if the kernel used in estimation is normal. Silverman proved

Theorem 4.1 Given any fixed {X_1, ..., X_n}, define f̂(x; h) as above, using the normal kernel K(x) = (√(2π))^{−1} exp(−x²/2). For each integer m ≥ 0, the number of local maxima in ∂^m f̂(x; h)/∂x^m, as a function of x, is a right continuous decreasing function of h.

The following corollary can be used to simplify the computation of h_crit.

Corollary 4.1 With h_crit defined as above and f̂(·; h) calculated using the same data set, f̂(·; h) has more than k modes if and only if h < h_crit.

Thus we can use a simple binary search procedure to find h_crit: for any value h, we can tell if h < h_crit by counting the number of modes in f̂(·; h). However, some caution should be exercised, since the cited result does not apply to other kernels or to other smoothing techniques. Babaud, Witkin, Baudin and Duda (1986) examined different kernels used in kernel density estimation. They proved that, in the class of symmetric, infinitely differentiable kernels with tails vanishing faster than the inverse of any polynomial, the normal kernel is unique in guaranteeing the conclusion of Theorem 4.1.

Hartigan and Hartigan (1985) proposed a technique called the dip test. They defined a statistic D_H by

D_H = min_{F ∈ U} sup_x |F_n(x) − F(x)|,   (4.4)

where F(·) is a cumulative distribution function, F_n(·) is the empirical cumulative distribution function of the data and U is the class of all distribution functions with unimodal densities.
They define a unimodal distribution function F(·) with mode at x = m as F(·) being convex on (−∞, m] and concave on [m, ∞). Additionally, for comparison purposes, the depth test and the likelihood ratio test are specified. The depth test identifies three intervals of equal length such that the middle interval has low empirical probability relative to both of the outside intervals. The likelihood ratio test statistic is defined as

sup_{f ∈ U_{1C}} Σ_{i=1}^n log f(X_i) / sup_{f ∈ U_{2C}} Σ_{i=1}^n log f(X_i),   (4.5)

where U_{1C} is the class of unimodal densities and U_{2C} is the class of bimodal densities, constrained by max f(·) ≤ C. The dip test performed slightly better than the depth test, and much better than the likelihood ratio test. Hartigan (1987) generalized the dip test to higher dimensions. He proposed a SPAN statistic, which uses an analogue of the empirical distribution function on the minimum spanning tree. However, the SPAN statistic is computationally and conceptually complex.

Hartigan and Mohanty (1992) offered yet another test, called RUNT, which is based on the single linkage clusters of {X_1, ..., X_n} (the maximal connected subgraphs formed by a clustering method based on a distance matrix model). The test is defined as follows. We consider all the single linkage clusters. They form a hierarchical tree, with each nonsingleton cluster dividing into subclusters. Each cluster C has a smallest subcluster (or "runt"). Let n(C) be the number of points in this subcluster. We define the RUNT statistic as max_C n(C). Hartigan (1981) justified the RUNT test statistic by the asymptotics of the single linkage clusters. If there are at least two modes in the population density, then asymptotically just one of the single linkage clusters will split into two clusters of points corresponding to the different modes, with the smaller of the clusters being the runt. A large number of points in the smaller cluster would indicate bimodality. On the other hand, if there is a single mode, we expect each cluster to divide into two clusters, the smaller of which contains very few points. Therefore, a large value of the RUNT statistic should indicate multimodality.

Müller and Sawitzki (1991) proposed a method for analyzing the modality of a possibly multivariate distribution based on the excess mass functional. This functional measures excessive empirical mass, defined by comparing the empirical distribution to multiples of a uniform distribution. In one dimension the resulting test is equivalent to Hartigan's dip test.

Minnotte (1993, 1997) introduced a whole new class of tests. He used kernel density estimation with varying bandwidths h. Because a data-driven choice of the bandwidth h is difficult, he proposed a new graphical method, the mode tree, to assist in bandwidth selection. A mode tree is a graphical tool relating the locations of modes in density estimates to the bandwidths of those estimates. For each value of the bandwidth, a density estimate and the locations of its modes are calculated. The mode locations are then plotted against the logarithm of the bandwidth. The usual scale is not appropriate here, since large changes in the bandwidth when the bandwidth is large have less of an effect on the density estimate than smaller changes in h when h is small.
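The mode tree computation itself is straightforward; below is a minimal sketch (ours, not Minnotte's code) that records KDE mode locations over a bandwidth grid, to be plotted against log h.

```python
import numpy as np

def mode_tree(data, h_grid, n_grid=512):
    """Mode locations of a normal-kernel density estimate for each
    bandwidth; plotting locations against log(h) gives a basic mode tree."""
    x = np.linspace(data.min(), data.max(), n_grid)
    tree = []
    for h in h_grid:
        u = (x[None, :] - data[:, None]) / h
        f = np.exp(-0.5 * u**2).sum(axis=0) / (len(data) * h * np.sqrt(2 * np.pi))
        peaks = np.where((f[1:-1] > f[:-2]) & (f[1:-1] > f[2:]))[0] + 1
        tree.append((h, x[peaks]))
    return tree   # list of (bandwidth, array of mode locations)
```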
Additional functionality can be added to a simple mode tree. Firstly, we can draw lines to show the connections between the new modes and the old ones from which they split. Minnotte suggested a second enhancement, which helps us visualize the presence and the importance of the bumps by drawing the widths of the mode traces. It may be most appropriate to consider the mode tree as an exploratory device, useful in indicating modes worth further study. It is unclear how we can apply this approach to testing, since we make multiple comparisons at different bandwidths.

Chaudhuri and Marron (1997) offered a method based on scale space ideas developed first in the computer vision literature. Here "scale" means "level of resolution", or in the kernel density context, a bandwidth h. Their proposal, SiZer, assesses the significant zero crossings of derivatives of estimates of the density. The density estimation is carried out simultaneously at different values of the bandwidth h. Marron noted that only a "blurred signal" is available from a finite sample in the presence of noise. Therefore, the focus of the analysis is on the convolution f * K_h representing the "blurring" of the signal. At each level of h, a confidence limit for the derivative of the density, f′_h(x), is obtained, where f′_h(·) denotes the derivative of f * K_h. The behaviour of the estimator f̂′_h(·) at each x and h is presented via the SiZer colour map. Chaudhuri and Marron's method can be applied in the regression framework by looking at the first derivative of the regression function. The advantage of their technique is not only in detecting the number of significant bumps, but also in showing their locations. Chaudhuri and Marron mentioned, however, that their approach has less power than methods devised specifically for mode testing, since mode tests are not impeded by trying to be simultaneous at each x and h. As with the mode tree idea, SiZer should be used mostly as an exploratory tool, with follow-up analysis of important features of interest.

4.2 Regression Function

It is very surprising that tests for bumps are so much more prevalent in the density estimation context than in regression estimation. Procedures for bump-hunting in the regression context are few, and only some of them can be readily applied to the problem on hand. An early reference is Schlee (1982), who proposed a test based on the greatest discrepancy of an estimate of the derivative of the regression function from zero. He used a modified version of kernel estimators, and based his tests on the asymptotic distribution of the maximal deviation. Schlee's tests can be used to test for constancy, monotonicity, or convexity of the underlying regression function. The drawback of the methodology is the lack of discussion of practical implementation.
(1998) constructed a test for the monotonicity of the regression function that is analogous to "Silverman's test" (1981) of multimodality of a prob ability density function. Further discussion of "Bowman's test" is given in section 4.3. 4.3 New Test: CriSV As mentioned in section 4.2, results on bump-hunting in regression are few, when compared to the results in density estimation. These few results mainly concern testing for monotonicity. We would like to explore here bump-hunting in regression using spline methods of estimation. In the first place in section 4.3.1, we present the discussion on estimating the number of bumps in the regression function. Then in section 4.3.2, we propose a testing method inspired by Silverman's idea of defining the hcrit statistic. In section 4.3.3 we give a description of a testing procedure for the derivatives of the regression function. 33 4.3.1 Counting Bumps Following on the idea (4.6) of Heckman (1992) , we can decide that a bump in m^(-) (k=0, 1, 2, ...) has occurred if the estimates of m^(-) rise and then fall over a range of the independent variable. If we let = m(k\ti), i = 1, ...,n, be our estimates, then we say m^(-) has one bump in (tj-i,tj+i), if <y}-li< • • • < *f > >f/5> >y$} (4.7) where A;=0, 1, 2, 3,.... We call it an /-bump to show how many points we take into account when estimating the number of bumps. It is very appealing to use (4.7) in the testing procedure for the number of bumps. However, the restriction of the estimates to be strictly rising and then falling over a range of an independent variable does not seem to work well in practice. We do not have any guarantee that in the vicinity of a mode the estimates of the regression function will rise on one side of a mode and fall on the other side at the grid points. In addition, the monotonicity property proven by Silverman for modes (Theorem 4.1) does not hold for regression estimates. Empirical studies show that often the estimate of the number of bumps increases and then decreases as the the smoothing parameter increases (see section 5.1). Another problem arises in the definition of Acr;t, which is needed to calculate the estimate of the regression function rh\CTit for our bootstrap procedure (step 1). For some data sets, rh\ may have fewer than k /-bumps for all values of A. Thus Xcrit does not exist. Furthermore, time of computation required to estimate the number of bumps according to (4.7) is prohibitively large. Therefore in our testing procedure, we utilize (4-8) as a basis for saying that rh^(-) has one bump in (tj^i,tj+i). We call this an /-max. 34 In our simulation studies, we found that /-max did not have the same problems as /-bump. For 1=1, /-bump and /-max are of course equivalent. However, for all other values of /, if a bump occurs according to /-max then it will occur according to /-bump, but not necessarily vice versa. If the true underlying regression function is not con stant over an interval, the differences in the estimated number of bumps according to (4.7) and (4.8) happen mostly for small values of the smoothing parameter A. In section 5.1.1, we compare the performance of the CviSV using definitions (4.7) and (4.8) on one of the regression functions, and we discuss further the advantages and disadvantages of (4.8). 4.3.2 CT\SV in Regression Formally, the following tests are performed for a fixed non-negative integer k: Ho : # of bumps in m(-) < k vs. 
4.3.2 CriSV in Regression

Formally, the following tests are performed for a fixed non-negative integer k:

    H0: # of bumps in m(.) ≤ k   vs.   H1: # of bumps in m(.) > k.

Using the test statistic λ_crit = inf{λ : m̂_λ has k bumps}, we reject H0 if λ_crit is too big. The testing procedure would be reasonable if the number of bumps were monotone in λ, i.e., the greater the λ, the fewer the bumps of m̂_λ(.). As λ approaches infinity, the limiting value of the number of bumps is the number of bumps in the default model m_def(.). The p-value of the test is then

    p = P_{H0}{ λ_crit > λ_crit^obs },    (4.9)

where λ_crit^obs, viewed as non-random in the probability, is the observed value of λ_crit. In the definition of p, we actually calculate the probability as a supremum under the null hypothesis if H0 is composite. Unfortunately, the distribution of λ_crit is unknown, and the estimation of p is extremely computing-intensive. Therefore, we use the following shortcut. Define p* by

    p* = P_{H0}{ m̂_{λ_crit^obs} has more than k bumps }.    (4.10)

If the number of bumps in m̂_λ is monotone in λ, then p = p*. The set in (4.10) is fairly simple, since it depends on m̂_λ for the fixed value λ = λ_crit^obs.

The monotonicity of the number of modes as a function of the smoothing parameter or the bandwidth has been established in a few cases. Among kernel estimators, both the Gasser-Müller and the Nadaraya-Watson estimators have the monotonicity property if the kernel is normal. However, this property does not hold for the local linear regression estimator or for the spline regression estimator. Fortunately, simulation studies indicate that, when a bump is defined as in (4.8), data sets for which the number of bumps in m̂_λ is non-monotone are very unusual, so we use p* as our estimator of the p-value.

Theoretical derivation of an exact expression for p or p* would be hard or even impossible, so we use bootstrapping to estimate p*. To keep the description of the procedure comprehensible, let us fix the smoothing method we use, and abbreviate this generic smoothing method by SM. The bootstrap procedure involves the following steps:

1. Calculate λ_crit^obs and m̂_{λ_crit^obs} using SM.

2. Find m̂_unr, an unrestricted estimate of m(.), with the smoothing parameter for SM chosen by generalized cross-validation.

3. Find the residuals ê_i = Y_i - m̂_unr(t_i), i = 1, ..., n.

4. Create a bootstrap sample Y*_i = m̂_{λ_crit^obs}(t_i) + e*_i, i = 1, ..., n, where the e*_i's are drawn at random with replacement from the ê_j's, j = 1, 2, ..., n.

5. Find the new estimate m̂*_{λ_crit^obs} for the bootstrap sample {Y*_i} using SM.

6. Find the number of bumps in m̂*_{λ_crit^obs}.

7. Repeat steps 4-6 a large number of times, and estimate the p-value by counting the number of estimates m̂*_{λ_crit^obs} having more than k bumps, i.e.

    p̂* = (# of estimates m̂*_{λ_crit^obs} with more than k bumps) / (# of bootstrap samples).

The choice of SM depends on the default model we assume. For instance, if the default model is a line, we use penalized least squares with Lm = m'', and SM is then the cubic Pspline. The smoothing method used to calculate the residuals in step 3 need not be the same as the smoothing method in steps 1 and 5; however, steps 1 and 5 should use the same smoothing method. Therefore, we can experiment by using either spline smoothing or the local linear kernel technique in steps 1 and 5. When we use kernel smoothing, the definition of the critical smoothing parameter is analogous to λ_crit: we define h_crit = inf{h : m̂_h has k bumps}, and we use the same bootstrap steps 4-7 with m̂_{λ_crit^obs} replaced by m̂_{h_crit^obs}.
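Under the monotonicity assumption, λ_crit can be located by bisection on log10(λ), and p* estimated exactly as in steps 1-7 above. The sketch below is illustrative only: R's smooth.spline, a cubic smoothing spline, stands in for the generic SM (it corresponds to the Pspline case Lm = m''), and the helper names, search range and bootstrap size are our assumptions, not the thesis' code. It reuses count.lmax from Section 4.3.1.

    # Number of l-max bumps in the spline fit at smoothing parameter lambda.
    nbumps <- function(t, y, lambda, l = 3)
      count.lmax(predict(smooth.spline(t, y, lambda = lambda), t)$y, l)

    # lambda_crit: smallest lambda whose fit has at most k bumps, by bisection
    # on log10(lambda); assumes the bump count is decreasing in lambda.  If
    # even heavy smoothing leaves more than k bumps, the upper end is returned.
    lambda.crit <- function(t, y, k, lo = -7, hi = -1, tol = 0.01) {
      if (nbumps(t, y, 10^lo) <= k) return(10^lo)  # the "lambda_crit = 0" case
      while (hi - lo > tol) {
        mid <- (lo + hi) / 2
        if (nbumps(t, y, 10^mid) <= k) hi <- mid else lo <- mid
      }
      10^hi
    }

    # Steps 1-7: bootstrap estimate of p* for H0: # of bumps <= k.
    crisp.pvalue <- function(t, y, k, B = 500, l = 3) {
      lam   <- lambda.crit(t, y, k)                            # step 1
      m.res <- predict(smooth.spline(t, y, lambda = lam), t)$y
      m.unr <- predict(smooth.spline(t, y, cv = FALSE), t)$y   # step 2 (GCV)
      e.hat <- y - m.unr                                       # step 3
      more  <- 0
      for (b in 1:B) {
        y.star <- m.res + sample(e.hat, replace = TRUE)        # step 4
        if (nbumps(t, y.star, lam, l) > k) more <- more + 1    # steps 5 and 6
      }
      more / B                                                 # step 7
    }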
In step 3 we calculate the residuals using the unrestricted estimates Ŷ_i^unr = m̂_unr(t_i). Another possibility would be to use the restricted estimates Ŷ_i^crit = m̂_{λ_crit^obs}(t_i) in the residual computation. The two methods of generating the residuals may be equivalent when the null hypothesis is true, although this is not clear. The first method has the advantage of being appropriate under both the null and the alternative hypothesis, while the second method is proper only when the null hypothesis is true. The results of our simulation study (see Chapter 5) indicate that using the unrestricted estimates to obtain the residuals gives the right size of the test.

Determining the "correct" residuals for hypothesis testing using a bootstrap technique has not yet been done satisfactorily. There exist complex ways to adjust the bootstrap procedure in simpler testing problems, but no consensus has been reached on resolving the residual problem in more complicated testing procedures. Bowman et al. (1998) used a similar bootstrap procedure in their test of the null hypothesis that m(.) is monotone. They utilized the same residual-generating method as we do in CriSV. The problem of the residual choice was mentioned in their work, but no satisfactory way to settle the issue was provided. Bowman et al. used the local linear kernel regression estimator in all steps of their bootstrap procedure. When the bandwidth h → +∞, this estimator tends to a least squares line fit, which is of course monotone. In our new test, CriSV, the default model plays the role of the line. The default model need not be monotone; therefore, we can test for any number of bumps greater than or equal to the number of bumps in the default model.

4.3.3 CriSV for Derivatives

So far we have concentrated on the bump-hunting problem for the regression function m(.). However, we can extend our procedure to the derivatives of m(.). For example, we can test, for any fixed integer k ≥ 0,

    H0: # of bumps in m'(.) ≤ k   vs.   H1: # of bumps in m'(.) > k.

We use the same test as for the regression function, with m'(.) replacing m(.). Again, we use the bootstrap procedure to estimate the p-value of the test; now, instead of calculating the number of bumps in m̂(.), we compute the number of bumps in m̂'(.). Let SM again denote a generic smoothing method. The following steps are used in the bootstrap procedure testing for bumps in the derivatives:

1. Calculate λ_crit^obs for m' and m̂_{λ_crit^obs} using SM.

2. Find m̂_unr, an unrestricted estimate of m(.), with the smoothing parameter for SM chosen by either generalized cross-validation or the plug-in method.

3. Find the residuals ê_i = Y_i - m̂_unr(t_i), i = 1, ..., n.

4. Create a bootstrap sample Y*_i = m̂_{λ_crit^obs}(t_i) + e*_i, i = 1, ..., n, where the e*_i's are drawn at random with replacement from the ê_j's, j = 1, 2, ..., n.

5. Find the new estimate m̂*_{λ_crit^obs} for the bootstrap sample {Y*_i} using SM.

6. Find the number of bumps in m̂*'_{λ_crit^obs}.

7. Repeat steps 4-6 a large number of times, and estimate the p-value by counting the number of estimates m̂*'_{λ_crit^obs} having more than k bumps, i.e.

    p̂* = (# of estimates m̂*'_{λ_crit^obs} with more than k bumps) / (# of bootstrap samples).

Again, we use the same smoothing technique throughout the simulations, but we can experiment with a different smoothing method in steps 1, 2 and 6. In our simulation study, we employ the Lspline with the penalty Lm = m'' + γm' and the default model m_def(t) = β0 + β1 exp{-γt}. In the case of the Berkeley growth data, local linear kernel smoothing, using the Splus function locpoly, is used as SM in all steps. For comparison purposes, the data are also re-analyzed using locpoly in step 2 and the Lspline in steps 1 and 5.
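For the derivative version, only the bump counting changes: the l-max count is applied to the estimated first derivative. A one-line variant of the earlier sketch (again with smooth.spline standing in for SM; predict(..., deriv = 1) returns the fitted derivative):

    # Number of l-max bumps in the estimated derivative m-hat'(t_i).
    nbumps.deriv <- function(t, y, lambda, l = 3) {
      fit <- smooth.spline(t, y, lambda = lambda)
      count.lmax(predict(fit, t, deriv = 1)$y, l)
    }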
In Chapter 5 we present the results of the simulations for the regression functions and their derivatives, as well as the analysis of the speed of growth in the Berkeley growth data.

Chapter 5

Results

In order to examine the proposed testing procedure for the number of bumps, we conducted a simulation study, using several functions to assess the performance of the new test CriSV. In Section 5.1, we present the results of the simulations for two classes of regression functions. In Section 5.2, we show the performance of the test in the context of derivatives of regression functions, and in Section 5.3, CriSV is applied to the Berkeley growth data.

5.1 Regression Function Simulations

We generated the data

    Y_i = m(t_i) + σ ε_i    (5.1)

with t_i = i/101 and ε_i ~ N(0, 1) for i = 1, 2, ..., 101. Two levels of noise, σ = 0.05 and σ = 0.10, were used for one of the functions, and one level of noise, σ = 0.05, for the other. The functions m(.) used in the simulation study are described below. For each model, we produced 500 data sets, and to each we applied the bootstrap procedure (see pages 36 and 39) in Sections 5.1.1 - 5.1.3, with bootstrap sample size equal to 500.

We have chosen two classes of monotone functions to work with, one linear and the other exponential. To each of them, we added one or more bumps. In order to preserve the continuity of the regression functions, we have chosen a bump that decreases at an exponential rate to zero. We define the function generating a bump by

    B(t|c) = exp{ -(1/2) ((t - c)/0.1)^2 }.    (5.2)

As we can see, B(t|c) is the kernel of a normal density function. The parameter c controls the place at which the bump is inserted in the function. The choice of 0.1 in the denominator of the exponent enables us to separate bumps easily, in case we want to add more than one bump to the studied function.

Definition of the functions. We considered two classes of functions,

    m1(t|a) = 1 + t + a * B(t|0.5)    (5.3)

and

    m2(t|a, b) = 1 + exp(-4t) + a * B(t|0.25) + b * B(t|0.75).    (5.4)

The variable a in m1(.|a) allows us to introduce a bump into an otherwise monotone function. Plots of the functions m1(.|a) used are shown in Figure 5.1. The value a = 0 gives a straight line, and a = 0.15 produces a curve which is on the borderline of having a bump. The cases a = 0.25 and a = 0.45 produce curves with a small and a large bump, respectively. Bowman et al. (1998) used functions similar to (5.3) in their simulation studies of a test of monotonicity of a regression function. In the first two cases of the parameter, a = 0 and 0.15, the function m1(.) is monotone; in the other two cases, a = 0.25 and 0.45, it is non-monotone.

Figure 5.1: Function m1(t|a) = 1 + t + a * B(t|0.5), for a = 0, 0.15, 0.25, 0.45.

Using a combination of the variables a and b in the function m2(t|a, b), we can produce a function with 0, 1, or 2 bumps. When a = 0 or a = 0.32, the resulting function has no bump at t = 0.25. The values a = 0.48 and a = 0.64 yield a medium and a large bump, respectively. When b = 0, there is no bump in the regression function at t = 0.75, while the values b = 0.10, 0.15, 0.20 give rise to a small, medium and large bump, respectively. The interaction between the two bumps introduced by B(t|0.25) and B(t|0.75) is negligible, since the exponential functions are kernels of a normal density and the distance between their maxima is equal to 0.5, five times the standard deviation if thought of as a normal density function.
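For illustration, data from model (5.1) with the functions (5.2)-(5.4) can be generated as follows (a hedged sketch; the object names are ours):

    B.bump <- function(t, c) exp(-0.5 * ((t - c) / 0.1)^2)     # bump kernel (5.2)
    m1 <- function(t, a)    1 + t + a * B.bump(t, 0.5)         # (5.3)
    m2 <- function(t, a, b) 1 + exp(-4 * t) +
                            a * B.bump(t, 0.25) + b * B.bump(t, 0.75)  # (5.4)

    t <- (1:101) / 101                                  # design points t_i = i/101
    y <- m2(t, a = 0.48, b = 0.15) + 0.05 * rnorm(101)  # one data set from (5.1)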
We can see the functions with all combinations of a and b in Figure 5.2. In our simulation study, we used 9 of the 16 combinations (those marked "**" in the figure) to assess the performance of CriSV.

Figure 5.2: Function m2(t|a, b) = 1 + exp(-4t) + a * B(t|0.25) + b * B(t|0.75), for a = 0, 0.32, 0.48, 0.64 and b = 0, 0.10, 0.15, 0.20.

Counting bumps in estimates. We would like to look at a few aspects of our test in the simulation study. First, we want to investigate the behaviour of CriSV when we utilize two different methods for the detection of a bump. Let Ŷ_i = m̂(t_i), i.e., we take the estimates at the data points. As pointed out in Chapter 4, we can employ either an l-bump,

    Ŷ_{j-l} < Ŷ_{j-l+1} < ... < Ŷ_j > Ŷ_{j+1} > ... > Ŷ_{j+l},    (5.5)

or an l-max,

    Ŷ_j > max{ Ŷ_{j-l}, ..., Ŷ_{j-1}, Ŷ_{j+1}, ..., Ŷ_{j+l} },    (5.6)

with l = 1, 2, 3, ..., to define a bump at Ŷ_j in our testing procedure. We use the function m2(.|a = 0.48, b = 0.15) to investigate the behaviour of CriSV in detail. We employ the Lspline smoothing technique in all steps of the bootstrap procedure, using the penalty Lm = m'' + γm'. The values l = 2, 3, 4 and 5 are used in the l-max, as well as l = 2 in the l-bump for comparison purposes. We present a detailed discussion and results in Section 5.1.1. Based on that section, for all the other functions we utilized the 3-max as the basis for counting bumps in the CriSV testing procedure.

5.1.1 Function m2(.|a = 0.48, b = 0.15)

We test the hypotheses

    H0: m2(.) has ≤ k bumps   vs.   Ha: m2(.) has > k bumps    (5.7)

for k = 0, 1, 2, 3, where

    m2(t|a = 0.48, b = 0.15) = 1 + exp{-4t} + 0.48 * B(t|c = 0.25) + 0.15 * B(t|c = 0.75).    (5.8)

We use the l-bump with l = 2, and the l-max with l = 2, 3, 4 and 5, to count the number of bumps in the estimates of m2(.|0.48, 0.15). Results of the test (5.7) are displayed in Table 5.1, which shows the proportions of the simulations with p-values falling below 5%, 10% and 25%; thus the table entries give the proportion of times we reject the null hypothesis at levels 5%, 10% and 25%. The standard error of the estimates of the rejection probabilities is not more than 2.25% at all significance levels. As we can see, there is little difference among the definitions, but the 5-max seems slightly better, due to its high power (for k = 0, 1) and low rejection rates when k = 2, 3. Consequently, it would seem reasonable to use the 5-max in our testing procedure. However, for 2 out of the 500 data sets used in the simulations, the estimates m̂_λ, λ > 0, had at most 2 bumps; λ_crit then did not exist, and it was impossible to carry out the tests. Moreover, in 26 out of the 500 data sets, λ_crit = 0, since the estimates m̂_λ had at most 3 bumps for all λ > 0.

Table 5.1: Two Bumps: Proportion of times H0 is rejected at level α, based on 500 simulations of Y_i = m2(t_i|a = 0.48, b = 0.15) + σε_i, i = 1, ..., 101, ε_i ~ N(0, 1), σ = 0.05, using different definitions of a bump.
    a = 0.48, b = 0.15         H0: # of bumps ≤ k
    Level       Width    k = 0    k = 1    k = 2    k = 3
                2-bump   1.000    0.358    0.032    0.016
                2-max    1.000    0.362    0.036    0.016
    α = 0.05    3-max    1.000    0.368    0.034    0.020
                4-max    1.000    0.366    0.034    0.022
                5-max    0.998    0.382    0.024    0.008
                2-bump   1.000    0.590    0.080    0.058
                2-max    1.000    0.592    0.088    0.056
    α = 0.10    3-max    1.000    0.586    0.074    0.060
                4-max    1.000    0.586    0.070    0.042
                5-max    1.000    0.596    0.056    0.026
                2-bump   1.000    0.836    0.212    0.174
                2-max    1.000    0.838    0.222    0.184
    α = 0.25    3-max    1.000    0.832    0.206    0.172
                4-max    1.000    0.840    0.202    0.150
                5-max    1.000    0.838    0.188    0.092

Figure 5.3: Number of bumps as a function of the smoothing parameter λ (log10 λ from -7 to -1); the panels compare the l-bump with the l-max for l = 2, 3, 4, 5.

When we used CriSV with the 4-max, 1 out of the 500 data sets had λ_crit = 0. In all other situations, i.e., the 2-bump, 2-max and 3-max, we were able to obtain λ_crit > 0 for k ≤ 3. Due to the similar performance of the CriSV test in all the cases, and the difficulties encountered with the 5-max and 4-max, we had to decide between the values l = 2 and 3 in the l-max, or to use the l-bump, in the further studies. In the simulations, we found some problems with testing while utilizing the l-bump when l = 3 or 4. Our testing procedure is based on the existence of λ_crit and on the monotonicity of the number of bumps as a function of λ. For some data sets, using the l-bump with l ≥ 3 seriously violates the monotonicity property, as can be seen in Figure 5.3. When we use the 2-bump, the monotonicity is violated mostly at the smaller values of λ. The range of λ in the plot extends from 10^(-7) to 10^(-1). Values of λ < 10^(-7) produce a very wiggly estimate, since they correspond to using more than 60 (= tr S(λ)) parameters, and we do not want to model a data set with 101 observations by a regression function with so many parameters. Therefore, for our testing purposes we can choose either the 2-max or the 3-max. Since we are more interested in detecting "true" bumps, and not spurious ones, we decided to use the 3-max in all the other simulation studies.

5.1.2 Zero-One Bump Function

As introduced in Section 5.1, we used the regression functions m1(.|a), with a = 0, 0.15, 0.25 and 0.45, in the first simulation study. Sample data sets are given in Figures 5.4 and 5.5. In the testing procedure, we use the Pspline method with the penalty Lm = m'', giving the default model m_def(t) = α + βt. For comparison purposes, we substitute Local Linear Kernel Smoothing for the Pspline in steps 1 and 5 of the bootstrap procedure (see page 36). The results are exhibited in Tables 5.2 - 5.5. For a = 0 and a = 0.15, we carried out the tests with k = 0 and k = 1; for a = 0.25 and a = 0.45, we used k = 0, 1 and 2.

Figure 5.4: Function m1(t|a) = 1 + t + a * B(t|0.5) with 0 bumps (a = 0, 0.15) and simulated data with noise level σ = 0.10.

Figure 5.5: Function m1(t|a) = 1 + t + a * B(t|0.5) with 1 bump (a = 0.25, 0.45) and simulated data with noise level σ = 0.10.

Table 5.2: Zero Bumps: Proportion of times H0 is rejected at level α, based on 500 simulations of Y_i = m1(t_i|a = 0) + σε_i, i = 1, ..., 101, ε_i ~ N(0, 1), σ = 0.05, 0.10.

    a = 0                        H0: # of bumps ≤ k
    Noise       Level       Method     k = 0    k = 1
                α = 0.05    Pspline    0.002    0.002
                            Kernel     0.002    0.006
    σ = 0.05    α = 0.10    Pspline    0.004    0.010
                            Kernel     0.006    0.010
                α = 0.25    Pspline    0.030    0.024
                            Kernel     0.044    0.036
                α = 0.05    Pspline    0.006    0.008
                            Kernel     0.006    0.018
    σ = 0.10    α = 0.10    Pspline    0.010    0.020
                            Kernel     0.014    0.038
                α = 0.25    Pspline    0.080    0.088
                            Kernel     0.112    0.122
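The rejection proportions reported in Tables 5.2 - 5.5 can, in principle, be reproduced with a harness of the following form, reusing the hypothetical helpers m1 and crisp.pvalue sketched earlier (illustrative only, and computationally heavy, since each p-value itself requires 500 bootstrap fits):

    # Proportion of 500 simulated data sets whose p-value falls below alpha.
    reject.rate <- function(a, sigma, k, alpha, nsim = 500, n = 101) {
      t <- (1:n) / n
      mean(replicate(nsim, {
        y <- m1(t, a) + sigma * rnorm(n)
        crisp.pvalue(t, y, k) <= alpha
      }))
    }
    # e.g. reject.rate(a = 0.45, sigma = 0.05, k = 0, alpha = 0.05)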
Overall, the performance of CriSV using either the Pspline or Kernel Smoothing is very good at all combinations of a and σ. In most cases, the size of the test is well below the significance level, and the power is at acceptable levels.

Table 5.3: Zero Bumps: Proportion of times H0 is rejected at level α, based on 500 simulations of Y_i = m1(t_i|a = 0.15) + σε_i, i = 1, ..., 101, ε_i ~ N(0, 1), σ = 0.05, 0.10.

    a = 0.15                     H0: # of bumps ≤ k
    Noise       Level       Method     k = 0    k = 1
                α = 0.05    Pspline    0.032    0.010
                            Kernel     0.060    0.014
    σ = 0.05    α = 0.10    Pspline    0.082    0.026
                            Kernel     0.106    0.038
                α = 0.25    Pspline    0.240    0.114
                            Kernel     0.298    0.162
                α = 0.05    Pspline    0.024    0.026
                            Kernel     0.030    0.032
    σ = 0.10    α = 0.10    Pspline    0.066    0.050
                            Kernel     0.076    0.060
                α = 0.25    Pspline    0.208    0.156
                            Kernel     0.250    0.204

In the first case, a = 0, the rejection levels are far below the specified significance levels of 0.05, 0.10, and 0.25. As might be expected, when the noise level is higher, the null hypothesis is rejected more often; however, this difference is only significant at the level α = 0.25. Both methods (Pspline and Kernel Smoothing) give comparable rejection levels, with the first method being slightly superior.

For a = 0.15, our test performs well, given that m(.|a = 0.15) is on the borderline of having a bump. The proportions of rejection of H0 are larger than when a = 0, but in most cases are still below the specified significance levels. The Kernel Smoothing method rejects more often than the Pspline method, but the difference is significant only at α = 0.25. It seems surprising that we reject the true null hypothesis more often at the smaller noise level.

Table 5.4: One Bump: Proportion of times H0 is rejected at level α, based on 500 simulations of Y_i = m1(t_i|a = 0.45) + σε_i, i = 1, ..., 101, ε_i ~ N(0, 1), σ = 0.05, 0.10.

    a = 0.45                     H0: # of bumps ≤ k
    Noise       Level       Method     k = 0    k = 1    k = 2
                α = 0.05    Pspline    0.996    0.010    0.008
                            Kernel     1.000    0.006    0.018
    σ = 0.05    α = 0.10    Pspline    1.000    0.026    0.020
                            Kernel     1.000    0.022    0.036
                α = 0.25    Pspline    1.000    0.124    0.096
                            Kernel     1.000    0.154    0.132
                α = 0.05    Pspline    0.498    0.002    0.012
                            Kernel     0.916    0.006    0.010
    σ = 0.10    α = 0.10    Pspline    0.686    0.016    0.042
                            Kernel     0.974    0.024    0.046
                α = 0.25    Pspline    0.924    0.138    0.150
                            Kernel     0.998    0.136    0.154

Thus far, we have dealt with functions with zero bumps, for which the null hypothesis is true and rejection is not expected in any of the tests performed. When a = 0.25 or a = 0.45, the problem becomes more challenging, since we would like to reject the null hypothesis when k = 0, but not in the other instances, k = 1 and k = 2.

Table 5.5: One Bump: Proportion of times H0 is rejected at level α, based on 500 simulations of Y_i = m1(t_i|a = 0.25) + σε_i, i = 1, ..., 101, ε_i ~ N(0, 1), σ = 0.05, 0.10.

    a = 0.25                     H0: # of bumps ≤ k
    Noise       Level       Method     k = 0    k = 1    k = 2
                α = 0.05    Pspline    0.782    0.018    0.010
                            Kernel     0.822    0.040    0.016
    σ = 0.05    α = 0.10    Pspline    0.902    0.040    0.034
                            Kernel     0.866    0.066    0.038
                α = 0.25    Pspline    0.962    0.138    0.126
                            Kernel     0.936    0.170    0.158
                α = 0.05    Pspline    0.302    0.028    0.016
                            Kernel     0.298    0.054    0.030
    σ = 0.10    α = 0.10    Pspline    0.462    0.050    0.056
                            Kernel     0.446    0.110    0.072
                α = 0.25    Pspline    0.702    0.182    0.186
                            Kernel     0.676    0.248    0.196

In the easier situation of a = 0.45, both the Pspline and Kernel Smoothing give very good results. At the smaller noise level, σ = 0.05, the estimated power of the test of H0: k = 0 is 100% at all levels, except at α = 0.05, where the Pspline method rejects 99.6% of the time.
When the noise level is increased to σ = 0.10, the power decreases slightly for the Kernel method and considerably for the Pspline method. When we test with k = 1 or 2, at both noise levels the estimated sizes are far below the specified significance levels, and both methods (Pspline and Kernel) give very similar results. Thus, based on size and power, Kernel Smoothing seems superior to the Pspline method here. As expected, we falsely reject H0: k ≤ 2 more often at the higher noise level; however, the false rejection of H0: k ≤ 1 happens with virtually the same frequency at both levels of noise.

We encountered by far the most difficult task in testing with a = 0.25. When the noise is at the lower level, we reject H0: k = 0 a large proportion of the time; however, the power decreases dramatically when σ = 0.10. The rejection levels for the true null hypotheses fall well below the specified significance levels, with performance better at the lower noise level. The Pspline method seems superior to the Kernel method, but the differences are not significant. In Figure 5.6 we can observe that the power of the test rises slowly as a function of the significance level. Therefore, it would be more reasonable to choose a threshold value for α of more than 10%. For a threshold of α = 0.15 (indicated by the vertical line), the power is high (around 0.50) and the size is fairly small (around 0.10).

Figure 5.6: Power of the CriSV test with k = 0, and size of the test with k = 1, for the function m1 = 1 + t + 0.25 * B(t|0.5) with noise level σ = 0.10.

5.1.3 Multi-Bump Function

In the case of the first regression function, m1(.|a), we had either 0 bumps or 1 bump. However, we want to evaluate CriSV in a more challenging situation, so in our second study we consider regression functions with 0, 1 or 2 bumps. We use the function m2(.|a, b) introduced in Section 5.1,

    m2(t|a, b) = 1 + exp(-4t) + a * B(t|0.25) + b * B(t|0.75),

and analyze the behaviour of CriSV when the true function contains 0, 1 or 2 bumps. In order to accommodate all the situations, we have chosen the 9 combinations of the parameters a and b listed below:

• m2(.|a, b) with 0 bumps (see Figure 5.7):
  - m2(.|a = 0, b = 0)
  - m2(.|a = 0.32, b = 0)

• m2(.|a, b) with 1 bump (see Figure 5.8):
  - m2(.|a = 0, b = 0.15)
  - m2(.|a = 0.64, b = 0)
  - m2(.|a = 0.32, b = 0.20)
  - m2(.|a = 0.32, b = 0.10)

• m2(.|a, b) with 2 bumps (see Figure 5.9):
  - m2(.|a = 0.64, b = 0.20)
  - m2(.|a = 0.48, b = 0.15)
  - m2(.|a = 0.48, b = 0.10)

Figure 5.7: Function m2(t|a, b) = 1 + exp(-4t) + a * B(t|0.25) + b * B(t|0.75) with 0 bumps and simulated data with noise level σ = 0.05.

Figure 5.8: Function m2(t|a, b) = 1 + exp(-4t) + a * B(t|0.25) + b * B(t|0.75) with 1 bump and simulated data with noise level σ = 0.05.

Figure 5.9: Function m2(t|a, b) = 1 + exp(-4t) + a * B(t|0.25) + b * B(t|0.75) with 2 bumps and simulated data with noise level σ = 0.05.

Within each group, we present the functions in order of increasing difficulty for detecting a bump; e.g., for m2 with one bump, when a = 0 and b = 0.15 the bump is easily detected, while the combination a = 0.32 and b = 0.10 presents a hard testing problem. In all the simulations in this section the level of noise is fixed at σ = 0.05. The results are displayed in Tables 5.6 - 5.8.
The proportions of the simulations with p-values falling below 5%, 10% and 25% are exhibited. We use three smoothing methods, the Lspline, the Pspline and Local Linear Kernel Smoothing, in steps 1 and 5 of the bootstrap procedure (see page 36) to assess the performance of the CriSV testing procedure; the Lspline is always used in step 3. In order to use the Lspline, we need to estimate the parameter γ of the default model. We find γ for each data set by a nonlinear least squares fit of the parametric model m(t) = a0 + a1 exp{-γt} (see Heckman and Ramsay (1996) for a discussion of the estimation of unknown parameters in the default model).

ZERO BUMPS (Table 5.6). In the first case, the combination of parameters a = 0 and b = 0, the sizes of the tests H0: # bumps ≤ k, k = 0, 1, 2, fall well below the respective nominal levels of 5%, 10% and 25%. It is worth observing the unexpected fact that the size increases as k goes from 0 to 2. The performances of the Lspline and Pspline methods are very similar, and slightly better than the performance of Kernel Smoothing. The other case, with a = 0.32 and b = 0, gives similar but slightly worse results. The sizes of the test fall just below the nominal significance levels when testing with k = 0 and 1, except for α = 0.25 and k = 1. They are larger than α when k = 2, but in most cases only marginally. Again, the performance of the two spline methods is similar, and slightly superior to the Kernel method.

ONE BUMP (Table 5.7). For a = 0 and b = 0.15, the estimated power of the test with k = 0 is 100% in most cases, with the lowest value of 98.6% for the Lspline method at level 0.05. The size of the test is substantially lower than the prespecified significance level in all cases, and the results for all the methods are very similar. When a = 0.32 and b = 0.20, we achieve 100% power in all cases. The size of the test H0: # bumps ≤ 1 is very close to, but below, the nominal significance level in all instances; for k = 2, the proportions of rejection of the true null hypotheses are lower than the respective levels. All the methods exhibit similar performance. While keeping a at the same level of 0.32, we changed b to 0.10, thus trying to detect a bump with half the amplitude of the one discussed in the previous paragraph. The size of the test is very similar to that above on most occasions, except for the Kernel method at level 0.25. The power has dropped, but it is acceptable, ranging from around 65% to 90%; therefore, even with a bump of very small amplitude, CriSV performs very well. Once more, there is practically no difference between the spline methods, with the Kernel method performing slightly worse in terms of both size and power. The last situation we deal with is slightly different, since we put a = 0.64 and b = 0, thus introducing a large bump on a steep part of the exponential function. In these circumstances, CriSV's behaviour depends on the smoothing method used. The Pspline and Lspline have sizes close to, and often below, the respective significance levels; for the Kernel method, the size is marginally larger than the level α in most cases. We have the reverse situation when it comes to the power of the test: the Kernel method outperforms the spline methods, especially at significance levels less than 0.15 (see Figure 5.10).

TWO BUMPS (Table 5.8). We start with the easiest combination of parameters, a = 0.64 and b = 0.20. Both bumps are moderately big, i.e.
they should be easily detected, based on our experience in the cases discussed so far. The power of the test for both k = 0 and 1 is over 95% in all cases but one (the Kernel method for k = 0 and α = 0.05). The size of the test is well below the prespecified significance level. All the methods perform very well, with the only exception noted above for the Kernel method.

In the second function, we have two medium bumps, introduced by setting a = 0.48 and b = 0.15. In the case k = 0, the power of the CriSV test stays close to 100% for all the methods at all considered levels. However, for the test with k = 1, the power drops considerably for both spline methods and by some amount for the Kernel method. The reverse happens for the size: the spline methods have sizes below the nominal significance levels, but the Kernel method has sizes above them for the smaller values of α when testing with k = 2 (see Figure 5.11).

In the last case considered, we shrink the amplitude of the bump introduced by b to 0.10, and we leave a at 0.48. We observe that the power of the test for k = 0 drops substantially compared with the previous case; however, the results obtained for k = 1 indicate little or no change. The conclusions for the size are almost identical to those in the case with b = 0.15, with the Kernel method performing worse in the same circumstances (see Figure 5.12).

Table 5.6: Zero Bumps: Proportion of times H0 is rejected at level α, based on 500 simulations of Y_i = m2(t_i|a, b) + σε_i, i = 1, ..., 101, ε_i ~ N(0, 1), σ = 0.05.

    a = 0, b = 0               H0: # of bumps ≤ k
    Level       Method     k = 0    k = 1    k = 2
                Lspline    0.008    0.020    0.034
    α = 0.05    Pspline    0.006    0.022    0.032
                Kernel     0.024    0.028    0.038
                Lspline    0.036    0.050    0.076
    α = 0.10    Pspline    0.038    0.048    0.074
                Kernel     0.066    0.062    0.070
                Lspline    0.168    0.184    0.222
    α = 0.25    Pspline    0.164    0.184    0.226
                Kernel     0.224    0.222    0.228

    a = 0.32, b = 0            H0: # of bumps ≤ k
    Level       Method     k = 0    k = 1    k = 2
                Lspline    0.030    0.034    0.064
    α = 0.05    Pspline    0.030    0.034    0.062
                Kernel     0.028    0.042    0.084
                Lspline    0.062    0.098    0.134
    α = 0.10    Pspline    0.064    0.092    0.136
                Kernel     0.078    0.092    0.178
                Lspline    0.176    0.294    0.318
    α = 0.25    Pspline    0.174    0.292    0.322
                Kernel     0.254    0.310    0.380

Table 5.7: One Bump: Proportion of times H0 is rejected at level α, based on 500 simulations of Y_i = m2(t_i|a, b) + σε_i, i = 1, ..., 101, ε_i ~ N(0, 1), σ = 0.05.
    a = 0, b = 0.15            H0: # of bumps ≤ k
    Level       Method     k = 0    k = 1    k = 2
                Lspline    0.986    0.014    0.022
    α = 0.05    Pspline    0.988    0.012    0.020
                Kernel     1.000    0.012    0.020
                Lspline    0.998    0.036    0.052
    α = 0.10    Pspline    0.998    0.036    0.048
                Kernel     1.000    0.022    0.048
                Lspline    1.000    0.134    0.168
    α = 0.25    Pspline    1.000    0.136    0.166
                Kernel     1.000    0.126    0.166

    a = 0.32, b = 0.20         H0: # of bumps ≤ k
    Level       Method     k = 0    k = 1    k = 2
                Lspline    1.000    0.048    0.032
    α = 0.05    Pspline    1.000    0.050    0.034
                Kernel     1.000    0.038    0.042
                Lspline    1.000    0.088    0.056
    α = 0.10    Pspline    1.000    0.092    0.058
                Kernel     1.000    0.076    0.068
                Lspline    1.000    0.244    0.202
    α = 0.25    Pspline    1.000    0.250    0.202
                Kernel     1.000    0.236    0.212

Table 5.7: (continued)

    a = 0.32, b = 0.10         H0: # of bumps ≤ k
    Level       Method     k = 0    k = 1    k = 2
                Lspline    0.678    0.042    0.048
    α = 0.05    Pspline    0.652    0.042    0.046
                Kernel     0.698    0.056    0.048
                Lspline    0.802    0.088    0.078
    α = 0.10    Pspline    0.788    0.088    0.080
                Kernel     0.782    0.106    0.110
                Lspline    0.930    0.264    0.206
    α = 0.25    Pspline    0.922    0.258    0.212
                Kernel     0.878    0.274    0.310

    a = 0.64, b = 0            H0: # of bumps ≤ k
    Level       Method     k = 0    k = 1    k = 2
                Lspline    0.546    0.078    0.032
    α = 0.05    Pspline    0.554    0.078    0.034
                Kernel     0.838    0.066    0.044
                Lspline    0.812    0.088    0.078
    α = 0.10    Pspline    0.816    0.086    0.080
                Kernel     0.870    0.120    0.112
                Lspline    0.968    0.152    0.254
    α = 0.25    Pspline    0.972    0.154    0.258
                Kernel     0.992    0.292    0.302

Table 5.8: Two Bumps: Proportion of times H0 is rejected at level α, based on 500 simulations of Y_i = m2(t_i|a, b) + σε_i, i = 1, ..., 101, ε_i ~ N(0, 1), σ = 0.05.

    a = 0.64, b = 0.20         H0: # of bumps ≤ k
    Level       Method     k = 0    k = 1    k = 2    k = 3
                Lspline    1.000    0.958    0.020    0.024
    α = 0.05    Pspline    1.000    0.956    0.018    0.022
                Kernel     0.922    0.996    0.016    0.016
                Lspline    1.000    0.978    0.044    0.050
    α = 0.10    Pspline    1.000    0.976    0.042    0.050
                Kernel     0.972    1.000    0.032    0.050
                Lspline    1.000    0.998    0.158    0.172
    α = 0.25    Pspline    1.000    0.998    0.168    0.164
                Kernel     1.000    1.000    0.146    0.172

    a = 0.48, b = 0.15         H0: # of bumps ≤ k
    Level       Method     k = 0    k = 1    k = 2    k = 3
                Lspline    1.000    0.368    0.034    0.020
    α = 0.05    Pspline    0.998    0.328    0.036    0.020
                Kernel     0.976    0.708    0.078    0.022
                Lspline    1.000    0.586    0.074    0.060
    α = 0.10    Pspline    1.000    0.546    0.076    0.052
                Kernel     1.000    0.814    0.120    0.064
                Lspline    1.000    0.832    0.206    0.172
    α = 0.25    Pspline    1.000    0.810    0.208    0.160
                Kernel     1.000    0.918    0.242    0.226

Table 5.8: (continued)

    a = 0.48, b = 0.10         H0: # of bumps ≤ k
    Level       Method     k = 0    k = 1    k = 2    k = 3
                Lspline    0.650    0.420    0.036    0.036
    α = 0.05    Pspline    0.616    0.360    0.036    0.034
                Kernel     0.640    0.738    0.100    0.044
                Lspline    0.816    0.614    0.076    0.070
    α = 0.10    Pspline    0.800    0.544    0.078    0.066
                Kernel     0.748    0.828    0.162    0.092
                Lspline    0.934    0.864    0.224    0.234
    α = 0.25    Pspline    0.932    0.844    0.226    0.230
                Kernel     0.914    0.910    0.316    0.244

Figure 5.10: Power and size estimates for the function m2(t|a = 0.64, b = 0). The vertical line is drawn at α = 0.15.

Figure 5.11: Power and size estimates for the function m2(t|a = 0.48, b = 0.15). The vertical line is drawn at α = 0.15.

Figure 5.12: Power and size estimates for the function m2(t|a = 0.48, b = 0.10). The vertical line is drawn at α = 0.15.

5.2 Derivative Simulations

As mentioned in Chapter 4, CriSV can be used to test for bumps in the derivative of the regression function. As the example in Section 5.3 will show, in certain situations we are interested in the rate of change of the regression function, and not in the function itself. In the simulation study we carry out, we have chosen to work with antiderivatives of the function (m2(.) - 1).
Therefore, we define the derivative as

    m'3(t) = m2(t) - 1 = exp{-4t} + a * B(t|0.25) + b * B(t|0.75)    (5.9)

and the regression function itself as

    m3(t) = ∫_0^t m'3(s) ds
          = 0.25 - 0.25 exp{-4t}
            + √(2π) (0.1) [ a Φ((t - 0.25)/0.1) + b Φ((t - 0.75)/0.1) ],    (5.10)

where Φ(.) is the cumulative distribution function of a standard normal random variable. We generated the data

    Y_i = m3(t_i) + σ ε_i    (5.11)

with t_i = i/101 and ε_i ~ N(0, 1) for i = 1, 2, ..., 101. Two levels of noise were used, σ = 0.0025 and σ = 0.0050; however, if the results at the higher noise level indicated a good performance of the test, we limited the simulations at the lower noise level to one method (the Lspline) and one function in each bump category. For each model, we produced 500 data sets, and to each of them we applied the bootstrap procedure with a bootstrap sample size equal to 500.

We have chosen 6 of the 16 original combinations of the parameters a and b defining the functions m2(.), in such a way as to have 2 curves with each of 0, 1, and 2 bumps. The functions m3(.) are presented in Figures 5.13 - 5.15, together with their derivatives.

Figure 5.13: Functions m3(t|a, b) (top) and their derivatives m'3 (bottom) with 0 bumps ((a, b) = (0, 0) and (0.32, 0)). Also included, data sets (top) with noise level σ = 0.005.

Figure 5.14: Functions m3(t|a, b) (top) and their derivatives m'3 (bottom) with 1 bump. Also included, data sets (top) with noise level σ = 0.005.

Figure 5.15: Functions m3(t|a, b) (top) and their derivatives m'3 (bottom) with 2 bumps ((a, b) = (0.48, 0.15) and (0.64, 0.20)). Also included, data sets (top) with noise level σ = 0.005.

Throughout the study, the Lspline method of estimation was used with the penalty Lm = m'' + γm', yielding the default model m_def(t) = a0 + a1 exp{-γt}. The estimation of γ is necessary to utilize the Lspline smoothing method; we find γ for each data set via a nonlinear least squares fit of the parametric model m(t) = a0 + a1 exp{-γt}. For comparison purposes, we also use Local Quadratic Kernel Smoothing in steps 1 and 5 of the bootstrap procedure (see page 39).
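Since pnorm is the standard normal cdf Φ, the pair (5.9)-(5.10) can be coded directly (a sketch under the same naming assumptions as before; the small constants contributed by the lower integration limit, Φ(-2.5) and Φ(-7.5), are negligible and omitted, as in (5.10)):

    m3.deriv <- function(t, a, b)                               # (5.9)
      exp(-4 * t) + a * B.bump(t, 0.25) + b * B.bump(t, 0.75)

    m3 <- function(t, a, b)                                     # (5.10)
      0.25 - 0.25 * exp(-4 * t) +
      sqrt(2 * pi) * 0.1 * (a * pnorm((t - 0.25) / 0.1) +
                            b * pnorm((t - 0.75) / 0.1))

    t <- (1:101) / 101
    y <- m3(t, a = 0.48, b = 0.15) + 0.005 * rnorm(101)         # model (5.11)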
ZERO BUMPS in the DERIVATIVE (Table 5.9). In the first case, with a = 0 and b = 0, the results show that the sizes of the test of H0: # bumps ≤ k for k = 0, 1, 2 are in most cases below the nominal significance levels. Both methods, the Lspline and the Kernel, perform well, and there is not much difference when the level of noise is increased. In the second case, with a = 0.32 and b = 0, i.e., with a "shoulder" on the slope of the derivative, the results for k = 0 and 1 are very similar to the previous case, with the Kernel method holding an edge at the level α = 0.25. When k = 2, the Lspline method has consistently larger size than the prespecified level, while the Kernel method has size around the nominal levels.

ONE BUMP in the DERIVATIVE (Table 5.10). We start evaluating CriSV in the one-bump case by setting a = 0 and b = 0.20. The estimated power of the test is equal to 100% in all cases when testing with the Lspline, and is almost as high when utilizing the Kernel method. The size of the test is well below the nominal level in most cases, and slightly below it in a few situations. Size-wise, the Kernel method seems to outperform the Lspline method. However, we have to remember that the Lspline has the larger estimated power; that is, the Lspline tends to reject more often. To our surprise, the estimated size at the lower noise level is consistently higher than at the higher noise level, while the power is about the same at both noise levels.

We try another setup of the parameters a and b by introducing an "almost bump" on the steep part of m3(.|a, b), putting a = 0.32 and leaving b = 0.20. Again, the power of the test is very high, reaching 100% when we use the Lspline method and over 90% when the Kernel method is used. The sizes of the test are well below the nominal levels. In most cases the Kernel method holds an advantage where size is concerned, especially at the highest prespecified level, α = 0.25.

TWO BUMPS in the DERIVATIVE (Table 5.11). We have come to by far the most difficult of all the cases considered: evaluating the CriSV test when the data are generated using a regression function whose derivative has two bumps. We start by setting a = 0.64 and b = 0.20, i.e., introducing the two largest bumps considered in our study. The results are very satisfactory: the estimated power of the test is over 95% in all but one case, and the size of the test is well below the prespecified level. Once more, the Kernel method has an edge over the Lspline when we compare the sizes of the test at α = 0.25, but the differences are only marginally significant.

Next we set a = 0.48 and b = 0.15, thus introducing two bumps of only moderate size into m'3(.). At the lower noise level, σ = 0.0025, the size of our test is about right in all the cases considered. Looking at the estimated power, however, the Lspline method produces superior outcomes, with its lowest estimated power over 85%, attained at the level α = 0.05 with k = 1. The Kernel method achieves significantly lower power, especially at the lower significance levels or when we test for not more than one bump (see Figure 5.16). When we increase the noise level to σ = 0.005, the power of CriSV decreases slightly for the Lspline method when k = 0. For k = 1, the Kernel method has higher estimated power at the lower significance levels, but the Lspline method becomes advantageous from the level α = 0.15 on (see Figure 5.16). When k = 2, the size of the test is higher than the nominal level, but in some cases only marginally; in the case k = 3, the estimated size is at the right level in all but one case.

The simulation studies indicate that we should choose a rejection level of at least 10%; otherwise, the power of our test will tend to be low. On the other hand, we do not want to commit a Type I error too often, so the significance level should not be larger than 20%. In the next section we settle for α = 0.15; however, any choice between 0.10 and 0.20 would be reasonable.

Table 5.9: Zero Bumps: Proportion of times H0 is rejected at level α, based on 500 simulations of Y_i = m3(t_i|a, b) + σε_i, i = 1, ..., 101, ε_i ~ N(0, 1), σ = 0.0025, 0.0050.
    a = 0, b = 0                 H0: # of bumps ≤ k
    Noise        Level       Method     k = 0    k = 1    k = 2
                 α = 0.05    Lspline    0.018    0.010    0.030
    σ = 0.0025   α = 0.10    Lspline    0.054    0.046    0.082
                 α = 0.25    Lspline    0.160    0.228    0.254
                 α = 0.05    Lspline    0.034    0.014    0.046
                             Kernel     0.030    0.020    0.026
    σ = 0.005    α = 0.10    Lspline    0.072    0.050    0.104
                             Kernel     0.062    0.054    0.064
                 α = 0.25    Lspline    0.188    0.208    0.290
                             Kernel     0.192    0.202    0.230

    a = 0.32, b = 0              H0: # of bumps ≤ k
    Noise        Level       Method     k = 0    k = 1    k = 2
                 α = 0.05    Lspline    0.054    0.018    0.088
                             Kernel     0.044    0.020    0.040
    σ = 0.005    α = 0.10    Lspline    0.084    0.074    0.180
                             Kernel     0.062    0.052    0.090
                 α = 0.25    Lspline    0.204    0.258    0.400
                             Kernel     0.144    0.152    0.284

Table 5.10: One Bump: Proportion of times H0 is rejected at level α, based on 500 simulations of Y_i = m3(t_i|a, b) + σε_i, i = 1, ..., 101, ε_i ~ N(0, 1), σ = 0.0025, 0.0050.

    a = 0, b = 0.20              H0: # of bumps ≤ k
    Noise        Level       Method     k = 0    k = 1    k = 2
                 α = 0.05    Lspline    1.000    0.028    0.028
    σ = 0.0025   α = 0.10    Lspline    1.000    0.064    0.064
                 α = 0.25    Lspline    1.000    0.238    0.248
                 α = 0.05    Lspline    1.000    0.008    0.014
                             Kernel     0.944    0.000    0.004
    σ = 0.005    α = 0.10    Lspline    1.000    0.032    0.038
                             Kernel     0.998    0.002    0.022
                 α = 0.25    Lspline    1.000    0.140    0.204
                             Kernel     1.000    0.064    0.140

    a = 0.32, b = 0.20           H0: # of bumps ≤ k
    Noise        Level       Method     k = 0    k = 1    k = 2
                 α = 0.05    Lspline    1.000    0.020    0.042
                             Kernel     0.910    0.038    0.010
    σ = 0.005    α = 0.10    Lspline    1.000    0.054    0.076
                             Kernel     0.978    0.048    0.032
                 α = 0.25    Lspline    1.000    0.220    0.244
                             Kernel     1.000    0.076    0.148

Table 5.11: Two Bumps: Proportion of times H0 is rejected at level α, based on 500 simulations of Y_i = m3(t_i|a, b) + σε_i, i = 1, ..., 101, ε_i ~ N(0, 1), σ = 0.0025, 0.0050.

    a = 0.64, b = 0.20           H0: # of bumps ≤ k
    Noise        Level       Method     k = 0    k = 1    k = 2    k = 3
                 α = 0.05    Lspline    1.000    1.000    0.024    0.032
    σ = 0.0025   α = 0.10    Lspline    1.000    1.000    0.058    0.082
                 α = 0.25    Lspline    1.000    1.000    0.198    0.244
                 α = 0.05    Lspline    0.986    0.898    0.026    0.026
                             Kernel     0.956    0.960    0.044    0.026
    σ = 0.005    α = 0.10    Lspline    0.996    0.982    0.064    0.076
                             Kernel     0.964    0.968    0.070    0.060
                 α = 0.25    Lspline    0.998    0.998    0.186    0.242
                             Kernel     0.980    0.976    0.138    0.176

    a = 0.48, b = 0.15           H0: # of bumps ≤ k
    Noise        Level       Method     k = 0    k = 1    k = 2    k = 3
                 α = 0.05    Lspline    1.000    0.874    0.040    0.042
                             Kernel     0.460    0.240    0.046    0.036
    σ = 0.0025   α = 0.10    Lspline    1.000    0.938    0.090    0.122
                             Kernel     0.690    0.270    0.084    0.094
                 α = 0.25    Lspline    1.000    0.972    0.238    0.288
                             Kernel     0.970    0.410    0.258    0.238
                 α = 0.05    Lspline    0.832    0.090    0.104    0.044
                             Kernel     0.328    0.328    0.062    0.056
    σ = 0.005    α = 0.10    Lspline    0.904    0.290    0.164    0.116
                             Kernel     0.404    0.360    0.108    0.100
                 α = 0.25    Lspline    0.968    0.662    0.306    0.292
                             Kernel     0.680    0.484    0.266    0.256

Figure 5.16: Power of the CriSV test when k = 0 and k = 1 for the function m'3(t|a = 0.48, b = 0.15) at the two noise levels. The vertical line is drawn at α = 0.15.

5.3 Analysis of the Growth Data

We consider growth data coming from a longitudinal study conducted in Berkeley, California; these data are usually referred to in the literature as the Berkeley growth data. A complete description of the study can be found in Tuddenham and Snyder (1954). Here, we analyze the data coming from the records of 43 white males and 50 white females born in 1928 or 1929 in Berkeley, California. The data set includes, in most cases, measurements of recumbent length at three months of age and then at three-month intervals to 21 months.
Standing heights were measured annually from age two until eight, and thereafter semiannually until age eighteen. Most analyses of human growth use the data coming from the Zurich longitudinal growth study (see Gasser et al. (1984a, 1984b, 1985, 1990, 1991a, 1991b)), in which the children's heights were measured less frequently in infancy. In work making comparisons between the Berkeley and Zurich data sets, common ages were chosen to make the comparison fair (see Ramsay et al. (1995)). Therefore, in our analysis we have used only the measurements at ages 1, 1.5, 2 and annually thereafter until 9.5 for females and 10 for males, followed by semiannual measurements until age 18, for a total of 27 ages for the males and 28 for the females.

The growth curves were analyzed for each child. We used the bootstrap procedure for derivatives outlined on page 39, utilizing local quadratic kernel smoothing in step 2 of the bootstrap procedure to find the unrestricted estimate m̂_unr(.) of the regression function m(.); local quadratic smoothing produces a good estimate of m', the rate of growth. The calculations were performed using the locpoly function in Splus, which selects a bandwidth based on a plug-in procedure. We chose this method because the GCV method for smoothing parameter choice performs badly when the noise in the data is very small, as is the case here; the GCV choice of λ is usually too small, yielding residuals which are very close to zero. We employed Lspline smoothing in steps 1 and 5 of the bootstrap procedure, with the penalty Lm = m'' + γm' and the default model m_def(t) = a0 + a1 exp{-γt}; we estimated γ by a nonlinear least squares procedure. For each child, we generated 500 bootstrap samples from the original residuals.

Our data set has height measurements on either a semiannual or an annual basis. If we implemented our testing procedure at the observed times t_i, it would be easier to detect a bump in an area with more frequent measurements, and we do not want the frequency of observations to bias a testing procedure developed for equally spaced values of the independent variable. Therefore, for testing purposes, we evaluate the estimate of the speed of growth at 101 equally spaced values of time between the ages of 1 and 18.
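The unrestricted velocity estimate of step 2, evaluated on the 101-point grid, can be mirrored in R through the KernSmooth package (the thesis used the S-Plus locpoly; dpill's plug-in bandwidth and the variable names age and height are our assumptions):

    library(KernSmooth)
    # Step 2: unrestricted local quadratic estimate of the growth velocity m'.
    h <- dpill(age, height)              # direct plug-in bandwidth
    v <- locpoly(age, height, drv = 1, degree = 2, bandwidth = h,
                 gridsize = 101, range.x = c(1, 18))
    # v$x: 101 equally spaced ages in [1, 18]; v$y: estimated speed of growth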
We decide on the number of bumps in a curve using the p-values from the tests for the existence of 0, 1, 2, or 3 bumps in the speed of growth. Following the discussion in Section 5.2, we chose the significance level of our test to be α = 0.15, i.e., we reject H0 if a p-value ≤ 0.15. For each child, we perform 4 tests: H0: # bumps ≤ k for k = 0, 1, 2, and 3. We base our decision on either a "strict" or a "soft" rule, described below.

By the strict decision rule, we conclude that the speed of growth has exactly b bumps if, for the hypothesis tests H0: # bumps ≤ k, the p-values are less than or equal to 0.15 for all k < b, and greater than 0.15 for all b ≤ k ≤ 3. When we reject H0 for all k ≤ 3, we conclude that the number of bumps b is greater than three. In some cases, no classification can be made by the strict decision rule; for instance, we may reject the hypothesis H0: # bumps ≤ k for k = 0 and 2, but not reject it for k = 1 and 3, at the level α = 0.15.

Therefore, we introduce the soft decision rule, which always gives us a classification. For b = 0, 1, 2, 3, we say that the curve has b bumps if, for the hypothesis tests H0: # bumps ≤ k, the p-values are less than or equal to 0.15 for all k < b, and the p-value is greater than 0.15 for k = b. As with the strict decision rule, we conclude that a curve has more than 3 bumps if the p-values are less than or equal to 0.15 for all k ≤ 3.

If we cannot reach a conclusion according to the strict decision rule, we report not only b, the soft rule decision, but also a range of values for the number of bumps. Thus, we say that a curve has b_L, b_L + 1, ..., b_U bumps, where b_L = b according to the soft decision rule, and b_U = 1 + (the biggest k for which the p-value ≤ 0.15). For example, when the p-value ≤ 0.15 for k = 0 and 2, and the p-value > 0.15 for k = 1 and 3, we have a range of bumps from b_L = 1 to b_U = 1 + 2 = 3, since k = 2 is the largest value of k for which the p-value ≤ 0.15.
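The strict, soft and range rules can be applied mechanically to the four p-values per child; the following is a sketch (our own helper, not the thesis' code; ">3" is coded as 4):

    # p: p-values for H0: # of bumps <= k, k = 0, 1, 2, 3.
    classify.bumps <- function(p, alpha = 0.15) {
      rej <- p <= alpha
      if (all(rej)) return(list(strict = 4, soft = 4, range = 4))  # "> 3" bumps
      b  <- min(which(!rej)) - 1                  # soft rule: first k not rejected
      bU <- if (any(rej)) max(which(rej)) else 0  # 1 + largest rejected k
      if (bU <= b) list(strict = b, soft = b, range = b)      # monotone pattern
      else         list(strict = NA, soft = b, range = b:bU)  # no strict decision
    }
    classify.bumps(c(0.000, 0.310, 0.094, 0.884))  # male #5: soft 1, range 1,2,3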
We present here the results obtained using Lspline smoothing with the 3-max as the definition of a bump. The results are summarized in Tables 5.12 and 5.13 for males and females, respectively. When the strict rule gives us a classification, we have a single number in the "Conclusion" column; in the other cases, we present a classification based on the range rule, with the number of bumps as classified by the soft rule listed first. The full results of all the tests performed are presented in Appendices A and B, in Tables A.1 and B.1 for males and females respectively.

MALE DATA. Using the strict decision rule described above, we classify the males into groups with 0, 1, 2, 3 and more than 3 bumps (or spurts of growth). We can make a decision in 34 out of 43 cases: 10 males have one spurt of growth, 17 males two spurts, 6 males three spurts, and 1 male more than three spurts. In the remaining 9 cases, we do not reject the null hypothesis for some smaller values of k, but we do reject it for larger k's. For instance, for 5 boys, we do not reject H0 with k = 1 and 3, but we do reject H0 for k = 0 and 2. As mentioned before, in such a situation we give a range of values for b, the number of bumps. The soft decision rule gives the following classification: 17 males with one growth spurt, 19 males with two growth spurts, 6 males with three, and 1 male with more than three spurts of growth.

Figures 5.17 - 5.19 contain plots of curves classified by the strict definition as having 1, 2, and 3 bumps, respectively; none of the 43 curves was classified as having 0 bumps. We plot the estimates for b = 1, 2, and 3 bumps, and indicate the one we decided on by a solid curve. The top plot in each figure shows an individual for whom our decision was very clear: the p-values were much smaller than α = 0.15 for k less than our choice of b, and much larger than 0.15 for the other values of k. The p-values for the bottom plot are less separated. Figure 5.20 contains plots of two curves that could only be classified by the soft or range decision rule: the top plot shows the curves for male #18, with a range of 1, 2, 3 for the number of bumps, and the bottom plot shows male #23, with a range of 1, 2, 3, >3 bumps.

We cannot make a complete comparison of our test results with the work described in Ramsay et al. (1995). The procedure outlined in their work deals with estimating so-called "landmarks", ages at which an important feature of growth occurs. The features consist of events associated with the mid-spurt of growth (MS) and the pubertal spurt of growth (PS). In total, eight landmarks are described, but only two are related to bumps in the rate of growth: T3, the age of maximal velocity during the MS, and T8, the age of maximal velocity during the PS. Ramsay et al. then find in how many cases T3 and T8 occur. From the information provided, we can use Ramsay et al.'s estimation procedure to classify curves into two groups: in one group, curves have one bump (only T8 present), and in the other group, curves have two or more bumps (both T3 and T8 present, with the possibility of T3 being multiple). For the male data, they discovered the occurrence of T3 in 24 out of 43 cases, and of T8 in all 43 cases. In our study, we found 1 bump in 10 of the 34 cases classified by the strict decision rule, and in 17 out of 43 using the soft decision rule; the remaining curves were, of course, classified as having 2 or more bumps. Therefore, 71% of the cases classified by the strict rule give us 2 or more bumps in a curve, and in 60% of the cases the soft rule indicates 2 or more bumps in the rate-of-growth curves, while Ramsay et al. had 56% of the male growth curves with 2 or more bumps. We can conclude that our soft decision rule gives results comparable to those described in Ramsay et al.

Table 5.12: p-value estimates based on 500 bootstrap samples for the male data, using the 3-max as the definition of a bump in the testing procedure.

              H0: # of bumps ≤ k
    Male    k = 0    k = 1    k = 2    k = 3    Conclusion
      1     0.000    0.002    0.938    0.994    2
      2     0.000    0.908    0.680    0.316    1
      3     0.000    0.130    0.380    0.718    2
      4     0.000    0.068    0.274    0.996    2
      5     0.000    0.310    0.094    0.884    1,2,3
      6     0.000    0.000    0.126    0.560    3
      7     0.000    0.010    0.114    0.558    3
      8     0.000    0.590    0.442    0.998    1
      9     0.000    0.000    0.864    0.454    2
     10     0.000    0.862    0.478    0.476    1
     11     0.000    0.000    0.220    0.588    2
     12     0.000    0.030    0.460    0.310    2
     13     0.000    0.012    0.798    0.802    2
     14     0.000    0.012    0.162    0.126    2,3,>3
     15     0.000    0.000    0.984    0.992    2
     16     0.000    0.088    0.148    0.134    >3
     17     0.000    0.830    0.292    0.384    1
     18     0.000    0.380    0.010    0.224    1,2,3
     19     0.000    0.154    0.706    0.718    1
     20     0.000    0.184    0.204    0.036    1,2,3,>3
     21     0.000    0.000    0.532    0.178    2
     22     0.000    0.000    0.472    0.744    2
     23     0.000    0.750    0.296    0.086    1,2,3,>3
     24     0.000    0.000    0.150    0.484    3
     25     0.000    0.000    0.208    0.256    2
     26     0.000    0.092    0.162    0.066    2,3,>3
     27     0.000    0.748    0.798    0.416    1
     28     0.000    0.390    0.078    0.776    1,2,3
     29     0.000    0.000    0.172    0.566    2
     30     0.000    0.010    0.038    0.364    3
     31     0.000    0.050    0.686    0.998    2
     32     0.000    0.152    0.994    0.978    1
     33     0.000    0.534    0.302    0.484    1
     34     0.000    0.222    0.028    0.286    1,2,3
     35     0.000    0.006    0.856    0.938    2
     36     0.000    0.296    0.030    0.188    1,2,3
     37     0.000    0.002    0.034    0.516    3
     38     0.000    0.000    0.018    0.400    3
     39     0.000    0.100    0.538    0.994    2
     40     0.000    0.106    0.658    0.440    2
     41     0.000    0.824    0.408    0.260    1
     42     0.000    0.212    0.592    0.488    1
     43     0.000    0.028    0.388    0.576    2

Figures 5.17 - 5.19: Estimates of the speed of growth of males classified as having one, two and three bumps (males #2, #35 and #37 shown among the examples).

Figure 5.20: Estimates of the speed of growth of males (#18 and #23) for whom a decision cannot be reached by the strict criterion.

FEMALE DATA. When we apply the strict decision rule to the female growth curves, we can classify 39 out of 50 females: in 17 of these cases we decide that there is one spurt of growth, in 15 cases two spurts, in 5 cases three spurts, and in 2 cases more than three spurts of growth. The remaining 11 females cannot be classified into one of the mentioned groups.
Classification according to the soft rule gives: 1 case with zero spurts, 24 cases with one spurt of growth, 18 cases with two spurts, 5 cases with three, and 2 cases with more than three spurts of growth in the speed curve. We plot two curves for each number of bumps detected, b = 1, 2, 3, according to the strict rule (see Figures 5.21 - 5.23); Figure 5.24 contains cases for which the strict decision rule did not provide a value of b. In Appendix B, plots of the estimated growth functions for all 50 females are presented.

Ramsay et al. found T3 present in 9 growth curves and T8 in 49 growth curves, which would classify 49 of the curves as having 1 bump and 9 of the curves as having 2 or more bumps. (Recall that there were multiple occurrences of T3 in some estimates of the female speed of growth.) Our results indicate more females having more than one episode of maximal velocity: 22 of the 39 cases classified using the strict criterion, and 25 out of 50 using the soft criterion.

Comparison of the female and male growth curves suggests that multi-bump curves are more prevalent for boys than for girls (71% of the boys versus 56% of the girls when the strict criterion is used, and 60% of the boys versus 50% of the girls when the soft criterion is used). Also, boys and girls are equally easy to classify by the strict definition, with only about 20% of each group being unclassified.

Figure 5.21: Estimates of the speed of growth of females with one bump (females #1 and #12).

Figure 5.22: Estimates of the speed of growth of females with two bumps.

Figure 5.23: Estimates of the speed of growth of females with three bumps.

Figure 5.24: Estimates of the speed of growth of females (#2 and #39) for whom the decision could not be reached according to the strict criterion.

Table 5.13: p-value estimates based on 500 bootstrap samples for the female data, using the 3-max as the definition of a bump in the testing procedure.
              H0: # of bumps ≤ k
    Female  k = 0    k = 1    k = 2    k = 3    Conclusion
      1     0.000    0.872    0.998    0.994    1
      2     0.586    0.630    0.116    0.030    0,1,2,3,>3
      3     0.014    0.496    0.664    0.902    1
      4     0.000    0.138    0.676    0.672    2
      5     0.000    0.022    0.998    0.956    2
      6     0.000    0.152    0.002    0.682    1,2,3
      7     0.000    0.126    0.008    0.058    >3
      8     0.000    0.388    0.722    0.884    1
      9     0.000    0.426    0.420    0.606    1
     10     0.000    0.320    0.508    1.000    1
     11     0.000    0.214    0.168    0.102    1,2,3,>3
     12     0.000    0.510    0.250    0.476    1
     13     0.000    0.150    0.546    0.448    2
     14     0.000    0.166    0.184    0.606    1
     15     0.014    0.012    1.000    0.976    2
     16     0.000    0.376    0.348    0.530    1
     17     0.000    0.290    0.986    0.848    1
     18     0.000    0.006    0.982    0.998    2
     19     0.000    0.306    0.110    0.998    1,2,3
     20     0.000    0.716    0.428    0.380    1
     21     0.000    0.020    0.888    0.656    2
     22     0.010    0.126    0.194    0.546    2
     23     0.000    0.006    0.390    1.000    2
     24     0.000    0.008    0.772    0.324    2
     25     0.000    0.014    0.984    0.948    2
     26     0.000    0.144    0.000    0.470    3
     27     0.000    0.918    0.816    1.000    1
     28     0.000    0.080    0.094    0.162    3
     29     0.000    0.042    0.170    0.018    2,3,>3
     30     0.000    0.138    0.266    0.332    2
     31     0.002    0.000    0.060    0.826    3
     32     0.000    0.126    0.020    0.182    3
     33     0.000    0.686    0.932    0.926    1
     34     0.000    0.330    0.102    0.026    1,2,3,>3
     35     0.000    0.042    0.172    0.954    2
     36     0.000    0.088    0.148    0.026    >3
     37     0.000    0.078    0.994    0.942    2
     38     0.020    0.734    0.658    0.396    1
     39     0.000    0.230    0.294    0.034    1,2,3,>3
     40     0.000    0.008    0.210    0.112    2,3,>3
     41     0.000    0.452    0.974    0.944    1
     42     0.000    0.004    0.508    0.104    2,3,>3
     43     0.000    0.218    0.080    0.902    1,2,3
     44     0.000    0.148    0.984    0.970    2
     45     0.000    0.340    0.092    0.002    1,2,3,>3
     46     0.000    0.028    0.016    0.724    3
     47     0.000    0.634    0.548    0.898    1
     48     0.000    0.086    0.778    0.958    2
     49     0.010    0.728    0.242    0.796    1
     50     0.000    0.804    0.652    0.938    1

Chapter 6

Conclusions

In this thesis, a new smoothing-parameter-based test, CriSV, for the number of bumps in regression functions and their derivatives has been proposed, and bootstrap methods have been used to assess its performance. Simulation studies indicated that the CriSV test was successful in all the setups considered. The behaviour of the test was better when the bump was of a larger size and not situated on a steep slope of the regression function. As mentioned in Chapter 5, the significance level chosen for CriSV should be between 10% and 20%: at low significance levels (α ≤ 0.05) the power of the proposed test drops substantially, especially for the more challenging regression functions and their derivatives.

CriSV was applied to the Berkeley growth data to detect the number of spurts in children's growth. Most of the results obtained from the completely data-driven test agree with what one might decide "by eye" by looking at an appropriately smoothed derivative estimate. When we compared our test with the results of Ramsay et al. (1995), we arrived at very similar conclusions for the male data; for the female data, we detected spurts of growth on more occasions than Ramsay et al. did.

It should be noted that the procedure introduced in this thesis is complicated, and many issues regarding it are still unresolved. Throughout the simulation studies, we obtained a few surprising results. For instance, in the simulations involving the first derivative of the regression function, for some models we got better results with a higher level of noise added to the true regression function than with a lower level. Also, we were able to detect a small bump on a flat portion of the curve more often than a large bump on a steep slope.
In the introduced procedure, the residuals were calculated using the GCV criterion for the smoothing parameter choice in spline smoothing. This criterion chooses the "optimal" smoothing parameter for the estimation of the function m(·). However, when we test for the number of bumps in the derivative of m(·), we could probably improve the performance of CriSV by using a procedure tailored to choosing the "best" smoothing parameter for m'(·) rather than for m(·).

Another issue of interest is the definition of a bump. We defined it in terms of the values of the original function m(·); however, we could instead define it as a zero down-crossing of the derivative of m. When we define a bump as in (5.5), we encounter the unpleasant property that the number of bumps is a non-monotone function of the smoothing parameter, which causes some conceptual problems in the testing procedure. Monotonicity is not necessary for CriSV to work, but without it we cannot provide the intuitive justification behind our test.

A modification of CriSV to deal with correlated or heteroscedastic errors is worth pursuing. This adjustment would most likely involve a new form of the unrestricted estimate m̂_unr and a different method of bootstrapping the residuals. A version of CriSV for higher-dimensional data would also be useful.

Bibliography

[1] Babaud, J., Witkin, A.P., Baudin, M. and Duda, R.O. (1986), "Uniqueness of the Gaussian Kernel for Scale-Space Filtering", IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8, 26-33.

[2] Bahadur, R.R. and Savage, L.J. (1956), "The Nonexistence of Certain Statistical Procedures in Nonparametric Problems", Annals of Mathematical Statistics, 27, 1115-1122.

[3] Bowman, A.W., Jones, M.C. and Gijbels, I. (1998), "Testing Monotonicity of Regression", Journal of Computational and Graphical Statistics, to appear.

[4] Chaudhuri, P. and Marron, J.S. (1997), "SiZer for Exploration of Structures in Curves", unpublished manuscript.

[5] Cox, D.R. (1966), "Notes on the Analysis of Mixed Frequency Distributions", British Journal of Mathematical and Statistical Psychology, 19, 39-47.

[6] Donoho, D.L. (1988), "One-sided Inference about Functionals of a Density", The Annals of Statistics, 16, 1390-1420.

[7] Efron, B. and Tibshirani, R. (1993), An Introduction to the Bootstrap, Chapman and Hall.

[8] Fan, J. and Gijbels, I. (1995), "Data-driven Bandwidth Selection in Local Polynomial Fitting: Variable Bandwidth and Spatial Adaptation", Journal of the Royal Statistical Society, Series B, 57, 371-394.

[9] Fan, J. and Gijbels, I. (1996), Local Polynomial Modelling and Its Applications, Chapman and Hall.

[10] Fan, J. (1992), "Design-adaptive Nonparametric Regression", Journal of the American Statistical Association, 87, 998-1004.

[11] Fisher, N.I., Mammen, E. and Marron, J.S. (1994), "Testing for Multimodality", Computational Statistics & Data Analysis, 18, 499-512.

[12] Gasser, T., Kneip, A. and Kohler, W. (1991), "A Flexible and Fast Method for Automatic Smoothing", Journal of the American Statistical Association, 86, 643-652.

[13] Gasser, T. and Muller, H.-G. (1979), "Kernel Estimation of Regression Functions", in Smoothing Techniques for Curve Estimation (eds. T. Gasser and M. Rosenblatt), Springer-Verlag.

[14] Gasser, T., Muller, H.-G., Kohler, W., Molinari, L. and Prader, A. (1984a), "Nonparametric Regression Analysis of Growth Curves", The Annals of Statistics, 12, 210-229.

[15] Gasser, T., Kohler, W., Muller, H.-G., Kneip, A., Largo, R., Molinari, L. and Prader, A.
(1984b), "Velocity and Acceleration of Height Growth Using Kernel Estimation", The Annals of Human Biology, 11, 397-411.

[16] Gasser, T., Muller, H.-G., Kohler, W., Prader, A., Largo, R. and Molinari, L. (1985), "An Analysis of the Mid-growth and Adolescent Spurts of Height Based on Acceleration", The Annals of Human Biology, 12, 129-148.

[17] Gasser, T., Kneip, A., Ziegler, P., Largo, R. and Prader, A. (1990), "A Method for Determining the Dynamics and Intensity of Average Growth", The Annals of Human Biology, 17, 459-474.

[18] Gasser, T., Kneip, A., Binding, A., Prader, A. and Molinari, L. (1991a), "The Dynamics of Linear Growth in Distance, Velocity and Acceleration", The Annals of Human Biology, 18, 187-205.

[19] Gasser, T., Kneip, A., Ziegler, P., Largo, R., Molinari, L. and Prader, A. (1991b), "The Dynamics of Growth of Width in Distance, Velocity and Acceleration", The Annals of Human Biology, 18, 449-461.

[20] Good, I.J. and Gaskins, R.A. (1980), "Density Estimation and Bump-Hunting by the Penalized Likelihood Method Exemplified by Scattering and Meteorite Data", Journal of the American Statistical Association, 75, 42-56.

[21] Green, P.J. and Silverman, B.W. (1994), Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach, Chapman and Hall.

[22] Hall, P., Sheather, S.J., Jones, M.C. and Marron, J.S. (1991), "On Optimal Data-based Bandwidth Selection in Kernel Density Estimation", Biometrika, 78, 263-271.

[23] Hardle, W. and Marron, J.S. (1985), "Optimal Bandwidth Selection in Nonparametric Regression Function Estimation", The Annals of Statistics, 13, 1465-1481.

[24] Hardle, W. (1989), Applied Nonparametric Regression, Cambridge University Press.

[25] Hardle, W. (1991), Smoothing Techniques with Implementation in S, Springer-Verlag.

[26] Hartigan, J.A. and Hartigan, P.M. (1985), "The Dip Test of Unimodality", The Annals of Statistics, 13, 70-84.

[27] Hartigan, J.A. and Mohanty, S. (1992), "The RUNT Test for Multimodality", Journal of Classification, 9, 63-70.

[28] Hastie, T.J. and Loader, C. (1993), "Local Regression: Automatic Kernel Carpentry", Statistical Science, 8, 120-143.

[29] Hastie, T.J. and Tibshirani, R.J. (1990), Generalized Additive Models, Chapman and Hall.

[30] Heckman, N.E. (1992), "Bump Hunting in Regression Analysis", Statistics and Probability Letters, 14, 141-152.

[31] Heckman, N.E. (1997), "The Theory and Application of Penalized Least Squares Methods or Reproducing Kernel Hilbert Spaces Made Easy", unpublished manuscript.

[32] Heckman, N.E. and Ramsay, J.O. (1996), "Penalized Regression with Model-Based Penalties", unpublished manuscript.

[33] Izenman, A.J. and Sommer, C.J. (1988), "Philatelic Mixtures and Multimodal Densities", Journal of the American Statistical Association, 83, 941-953.

[34] Largo, R.H., Gasser, T., Prader, A., Stuetzle, W. and Huber, P.J. (1978), "Analysis of the Adolescent Growth Spurt using Smoothing Spline Functions", Annals of Human Biology, 5, 421-434.

[35] Mammen, E. (1991), "Estimating a Smooth Monotone Regression Function", The Annals of Statistics, 19, 724-740.

[36] Mammen, E. (1991), "Nonparametric Regression Under Qualitative Smoothness Assumptions", The Annals of Statistics, 10, 1217-1223.

[37] Mammen, E. and Marron, J.S. (1995), "Mass Recentered Kernel Smoothers", Biometrika, 84, 765-778.

[38] Mammen, E., Marron, J.S. and Fisher, N.I. (1992), "Some Asymptotics for Multimodality Tests based on Kernel Density Estimates", Probability Theory and Related Fields, 91, 115-132.

[39] Marron, J.S.
and Chu, C.K. (1991), "Choosing a Kernel Regression Estimator" (with discussion), Statistical Science, 6, 404-436.

[40] Minnotte, M.C. (1997), "Nonparametric Testing of the Existence of Modes", The Annals of Statistics, 25, 1646-1660.

[41] Minnotte, M.C. and Scott, D.W. (1993), "The Mode Tree: A Tool for Visualization of Nonparametric Density Features", Journal of Computational and Graphical Statistics, 2, 51-68.

[42] Mukerjee, H. (1988), "Monotone Nonparametric Regression", The Annals of Statistics, 16, 741-750.

[43] Muller, D.W. (1992), "The Excess Mass Approach in Statistics", Preprint Nr. 3, December 1992, Beitrage zur Statistik, Universitat Heidelberg.

[44] Muller, D.W. and Sawitzki, G. (1991), "Excess Mass Estimates and Tests for Multimodality", Journal of the American Statistical Association, 86, 738-746.

[45] Ramsay, J.O. (1996), "Estimating a Smooth Monotone Function", Journal of the Royal Statistical Society, Series B, 60, 365-375.

[46] Ramsay, J.O., Altman, N. and Bock, R.D. (1994), "Variation in Height Acceleration in the Fels Growth Data", Canadian Journal of Statistics, 22, 89-102.

[47] Ramsay, J.O., Bock, R.D. and Gasser, T. (1995), "Comparison of Height Acceleration Curves in the Fels, Zurich, and Berkeley Growth Data", Annals of Human Biology, 22, 413-426.

[48] Rice, J. (1984), "Bandwidth Choice for Nonparametric Regression", The Annals of Statistics, 12, 1215-1230.

[49] Ruppert, D., Sheather, S.J. and Wand, M.P. (1993), "An Effective Bandwidth Selector for Local Least Squares Regression", Journal of the American Statistical Association, 90, 522-534.

[50] Sain, S.R. (1994), "Adaptive Kernel Density Estimation", Ph.D. Thesis, Department of Statistics, Rice University.

[51] Schlee, W. (1982), "Nonparametric Tests of the Monotony and Convexity of Regression", in Nonparametric Statistical Inference, Vol. II (eds. B.V. Gnedenko, M.L. Puri and I. Vincze), Amsterdam: North-Holland, 823-836.

[52] Silverman, B.W. (1981), "Using Kernel Density Estimates to Investigate Multimodality", Journal of the Royal Statistical Society, Series B, 43, 97-99.

[53] Silverman, B.W. (1983), "Some Properties of a Test for Multimodality Based on Kernel Density Estimates", in Probability, Statistics and Analysis, London Mathematical Society Lecture Note Series #79 (eds. J.F.C. Kingman and G.E.H. Reuter), 248-259.

[54] Silverman, B.W. (1986), Density Estimation for Statistics and Data Analysis, Chapman and Hall.

[55] Tuddenham, R.D. and Snyder, M.M. (1954), "Physical Growth of California Boys and Girls from Birth to Eighteen Years", University of California Publications in Child Development, 1, 183-364.

[56] Wahba, G. (1990), Spline Models for Observational Data, SIAM.

[57] Wand, M.P. and Jones, M.C. (1995), Kernel Smoothing, Chapman and Hall.

[58] Wong, M.A. (1985), "A Bootstrap Testing Procedure for Investigating the Number of Subpopulations", Journal of Statistical Computation and Simulation, 22, 99-112.

[59] Woodroofe, M. and Sun, J. (1997), "Testing Uniformity versus a Monotone Density", submitted to The Annals of Statistics.

Appendix A

Male results

In Table A.1 the results of the bootstrap procedure are summarized. We show the p-values of all the tests performed with different definitions of a bump (2-, 3-, 4-, and 5-max) and with the two smoothing methods used in steps 1 and 5 of the bootstrap procedure on page 39: Lspline and Kernel smoothing. In step 2 of the bootstrap procedure, we always used Kernel smoothing. Also, we exhibit here the estimated growth curves for all 43 males.
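As an illustration of the bump-width definitions just mentioned, the sketch below counts local maxima of a fitted curve under one plausible reading of "w-max": a fitted value that strictly exceeds its w − 1 nearest neighbours on each side. This is a hypothetical helper, not the thesis's code; the actual definition of a bump is given by equation (5.5) of the thesis, which is not reproduced in this section.

```python
import numpy as np

def count_bumps(fit, width=3):
    """Count local maxima of a fitted curve, ignoring narrow wiggles.

    Assumed reading of the "width-max" definitions (2-max, 3-max, ...):
    index i is a bump peak when fit[i] is the unique maximum over the
    window of `width` - 1 fitted values on each side of i.  Larger
    widths therefore discount narrower, noisier peaks.
    """
    fit = np.asarray(fit, dtype=float)
    w = width - 1
    peaks = 0
    for i in range(w, len(fit) - w):
        window = fit[i - w : i + w + 1]
        if fit[i] == window.max() and np.sum(window == fit[i]) == 1:
            peaks += 1
    return peaks
```

Under this reading, moving from 2-max to 5-max can only merge or drop peaks, never create them, which is one way to see why the p-values in the tables below change only slowly with the width.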
Lspline smoothing is used to obtain the plotted estimates. When the "strict" decision rule gives us the classification, we plot only the estimate whose smoothing parameter equals the respective critical value λ_crit^b. In the 9 cases with no decision made according to the "strict" rule, we plot the estimates with the values of b arrived at using the range rule, and we indicate the "soft" classification by a solid line.

Table A.1: Summary of 500 bootstrap samples for the male growth data. H0: # of bumps ≤ k.

Male  Width   Lspline: k=0 k=1 k=2 k=3     Kernel: k=0 k=1 k=2 k=3
1     2-max   0.000 0.002 0.962 0.992      0.022 0.120 0.948 0.778
      3-max   0.000 0.002 0.938 0.994      0.014 0.096 0.934 0.720
      4-max   0.000 0.004 0.936 0.996      0.014 0.076 0.868 0.628
      5-max   0.000 0.000 0.908 0.944      0.012 0.056 0.992 0.946
2     2-max   0.000 0.930 0.632 0.372      0.006 0.874 0.610 0.266
      3-max   0.000 0.908 0.680 0.316      0.004 0.882 0.638 0.274
      4-max   0.000 0.914 0.738 0.382      0.004 0.862 0.628 0.264
      5-max   0.000 0.934 0.632 0.556      0.000 0.876 0.650 0.274
3     2-max   0.000 0.082 0.436 0.726      0.000 0.226 0.388 0.948
      3-max   0.000 0.130 0.380 0.718      0.000 0.222 0.328 0.944
      4-max   0.000 0.108 0.372 0.624      0.000 0.242 0.344 0.960
      5-max   0.000 0.148 0.432 0.614      0.000 0.218 0.398 0.836
4     2-max   0.000 0.068 0.300 0.998      0.008 0.052 0.802 0.418
      3-max   0.000 0.068 0.274 0.996      0.004 0.058 0.802 0.404
      4-max   0.000 0.072 0.242 0.992      0.000 0.056 0.836 0.484
      5-max   0.000 0.104 0.226 0.958      0.000 0.088 0.832 0.482
5     2-max   0.000 0.352 0.106 0.884      0.020 0.330 0.096 0.862
      3-max   0.000 0.310 0.094 0.884      0.014 0.322 0.100 0.866
      4-max   0.000 0.296 0.082 0.874      0.008 0.328 0.094 0.856
      5-max   0.000 0.264 0.106 0.848      0.008 0.330 0.096 0.912
6     2-max   0.000 0.000 0.138 0.632      0.000 0.000 0.182 0.550
      3-max   0.000 0.000 0.126 0.560      0.000 0.000 0.158 0.590
      4-max   0.000 0.000 0.130 0.924      0.000 0.000 0.162 0.588
      5-max   0.000 0.000 0.130 0.864      0.000 0.000 0.176 0.906
7     2-max   0.000 0.016 0.136 0.500      0.000 0.118 0.056 0.842
      3-max   0.000 0.010 0.114 0.558      0.000 0.098 0.038 0.832
      4-max   0.000 0.014 0.114 0.444      0.000 0.080 0.036 0.826
      5-max   0.000 0.014 0.100 0.524      0.000 0.046 0.024 0.918
8     2-max   0.000 0.630 0.492 0.996      0.030 0.510 0.348 0.994
      3-max   0.000 0.590 0.442 0.998      0.024 0.506 0.328 0.998
      4-max   0.000 0.554 0.396 0.998      0.010 0.518 0.318 1.000
      5-max   0.000 0.528 0.456 1.000      0.012 0.510 0.370 0.940
9     2-max   0.000 0.000 0.882 0.466      0.000 0.016 0.676 0.838
      3-max   0.000 0.000 0.864 0.454      0.000 0.008 0.830 0.842
      4-max   0.000 0.000 0.916 0.474      0.000 0.004 0.950 0.818
      5-max   0.000 0.000 0.918 0.948      0.000 0.004 0.970 0.714
10    2-max   0.000 0.884 0.522 0.520      0.030 0.910 0.698 0.416
      3-max   0.000 0.862 0.478 0.476      0.020 0.914 0.710 0.454
      4-max   0.000 0.854 0.536 0.420      0.014 0.894 0.626 0.472
      5-max   0.000 0.854 0.550 0.468      0.010 0.886 0.624 0.614
11    2-max   0.000 0.000 0.258 0.636      0.000 0.000 0.290 0.094
      3-max   0.000 0.000 0.220 0.588      0.000 0.000 0.290 0.262
      4-max   0.000 0.000 0.166 0.696      0.000 0.000 0.302 0.408
      5-max   0.000 0.000 0.178 0.630      0.000 0.000 0.278 0.824
12    2-max   0.000 0.042 0.428 0.152      0.004 0.018 0.350 0.064
      3-max   0.000 0.030 0.460 0.310      0.000 0.014 0.434 0.134
      4-max   0.000 0.026 0.518 0.270      0.000 0.014 0.440 0.562
      5-max   0.000 0.012 0.558 0.272      0.000 0.012 0.388 0.670
13    2-max   0.000 0.024 0.738 0.818      0.000 0.030 0.884 0.504
      3-max   0.000 0.012 0.798 0.802      0.000 0.028 0.764 0.438
      4-max   0.000 0.016 0.738 0.706      0.000 0.036 0.782 0.342
      5-max   0.000 0.024 0.762 0.628      0.000 0.038 0.782 0.406
14    2-max   0.000 0.034 0.184 0.092      0.000 0.054 0.002 0.062
      3-max   0.000 0.012 0.162 0.126      0.000 0.050 0.006 0.050
      4-max   0.000 0.014 0.140 0.080      0.000 0.046 0.044 0.036
      5-max   0.000 0.018 0.106 0.034      0.000 0.048 0.086 0.036
15    2-max   0.000 0.000 0.976 0.998      0.004 0.038 0.974 0.890
      3-max   0.000 0.000 0.984 0.992      0.004 0.022 0.980 0.886
      4-max   0.000 0.000 0.990 0.984      0.000 0.014 0.978 0.864
      5-max   0.000 0.000 0.984 0.918      0.000 0.008 0.920 0.688
16    2-max   0.000 0.104 0.150 0.118      0.034 0.150 0.014 0.072
      3-max   0.000 0.088 0.148 0.134      0.032 0.130 0.014 0.070
      4-max   0.000 0.066 0.130 0.150      0.028 0.096 0.034 0.066
      5-max   0.000 0.054 0.084 0.086      0.020 0.066 0.060 0.040
17    2-max   0.000 0.774 0.282 0.318      0.000 0.708 0.236 0.800
      3-max   0.000 0.830 0.292 0.384      0.000 0.726 0.238 0.654
      4-max   0.000 0.860 0.332 0.316      0.000 0.740 0.250 0.464
      5-max   0.000 0.806 0.348 0.540      0.000 0.706 0.198 0.938
18    2-max   0.000 0.382 0.010 0.242      0.002 0.480 0.054 0.140
      3-max   0.000 0.380 0.010 0.224      0.000 0.492 0.064 0.158
      4-max   0.000 0.440 0.036 0.238      0.000 0.446 0.096 0.136
      5-max   0.000 0.478 0.058 0.270      0.000 0.466 0.048 0.146
19    2-max   0.000 0.204 0.654 0.792      0.000 0.020 0.950 0.578
      3-max   0.000 0.154 0.706 0.718      0.000 0.018 0.944 0.616
      4-max   0.000 0.110 0.786 0.602      0.000 0.018 0.930 0.550
      5-max   0.000 0.076 0.812 0.402      0.000 0.018 0.938 0.456
20    2-max   0.000 0.218 0.214 0.032      0.000 0.160 0.012 0.148
      3-max   0.000 0.184 0.204 0.036      0.000 0.142 0.014 0.172
      4-max   0.000 0.184 0.192 0.026      0.000 0.138 0.018 0.148
      5-max   0.000 0.206 0.126 0.030      0.000 0.138 0.014 0.186
21    2-max   0.000 0.002 0.574 0.282      0.002 0.022 0.462 0.420
      3-max   0.000 0.000 0.532 0.178      0.002 0.018 0.478 0.360
      4-max   0.000 0.000 0.462 0.164      0.002 0.018 0.504 0.330
      5-max   0.000 0.000 0.444 0.118      0.000 0.016 0.434 0.364
22    2-max   0.000 0.000 0.548 0.756      0.022 0.002 0.140 0.808
      3-max   0.000 0.000 0.472 0.744      0.006 0.000 0.114 0.852
      4-max   0.000 0.000 0.446 0.766      0.004 0.000 0.100 0.754
      5-max   0.000 0.000 0.390 0.766      0.004 0.000 0.072 0.644
23    2-max   0.000 0.794 0.256 0.072      0.000 0.560 0.278 0.080
      3-max   0.000 0.750 0.296 0.086      0.000 0.548 0.248 0.082
      4-max   0.000 0.790 0.334 0.122      0.000 0.536 0.208 0.066
      5-max   0.000 0.754 0.308 0.048      0.000 0.546 0.294 0.104
24    2-max   0.000 0.000 0.164 0.382      0.000 0.008 0.222 0.712
      3-max   0.000 0.000 0.150 0.484      0.000 0.008 0.214 0.648
      4-max   0.000 0.000 0.118 0.360      0.000 0.010 0.206 0.548
      5-max   0.000 0.000 0.104 0.486      0.000 0.008 0.204 NA
25    2-max   0.000 0.000 0.232 0.302      0.052 0.030 0.200 0.404
      3-max   0.000 0.000 0.208 0.256      0.012 0.016 0.160 0.360
      4-max   0.000 0.000 0.292 0.256      0.000 0.012 0.188 0.360
      5-max   0.000 0.000 0.336 0.294      0.000 0.002 0.300 0.334
26    2-max   0.000 0.098 0.118 0.062      0.010 0.130 0.230 0.028
      3-max   0.000 0.092 0.162 0.066      0.002 0.148 0.246 0.022
      4-max   0.000 0.072 0.152 0.056      0.002 0.118 0.198 0.022
      5-max   0.000 0.062 0.274 0.042      0.000 0.136 0.208 0.036
27    2-max   0.000 0.788 0.758 0.454      0.016 0.918 0.548 0.320
      3-max   0.000 0.748 0.798 0.416      0.010 0.922 0.530 0.336
      4-max   0.000 0.728 0.798 0.426      0.006 0.920 0.530 0.388
      5-max   0.000 0.726 0.702 0.322      0.004 0.918 0.524 0.462
28    2-max   0.000 0.398 0.112 0.746      0.000 0.522 0.066 0.638
      3-max   0.000 0.390 0.078 0.776      0.000 0.516 0.058 0.656
      4-max   0.000 0.402 0.076 0.714      0.000 0.540 0.080 0.688
      5-max   0.000 0.456 0.068 0.766      0.000 0.548 0.092 0.700
29    2-max   0.000 0.000 0.210 0.660      0.000 0.018 0.134 0.106
      3-max   0.000 0.000 0.172 0.566      0.000 0.016 0.124 0.230
      4-max   0.000 0.000 0.160 0.610      0.000 0.016 0.152 0.478
      5-max   0.000 0.000 0.174 NA         0.000 0.016 0.140 0.816
30    2-max   0.000 0.006 0.036 0.318      0.008 0.104 0.104 0.446
      3-max   0.000 0.010 0.038 0.364      0.002 0.090 0.098 0.470
      4-max   0.000 0.012 0.062 0.446      0.002 0.078 0.108 0.444
      5-max   0.000 0.026 0.050 0.476      0.000 0.064 0.122 0.426
31    2-max   0.000 0.032 0.738 0.998      0.000 0.064 0.948 0.748
      3-max   0.000 0.050 0.686 0.998      0.000 0.064 0.882 0.818
      4-max   0.000 0.048 0.674 0.996      0.000 0.064 0.778 0.866
      5-max   0.000 0.048 0.640 0.918      0.000 0.072 0.834 0.712
32    2-max   0.000 0.114 0.996 0.982      0.000 0.208 0.964 0.960
      3-max   0.000 0.152 0.994 0.978      0.000 0.216 0.966 0.974
      4-max   0.000 0.136 0.994 0.984      0.000 0.192 0.962 0.972
      5-max   0.000 0.176 0.986 0.976      0.000 0.186 0.968 0.900
33    2-max   0.000 0.566 0.306 0.546      0.000 0.514 0.080 0.924
      3-max   0.000 0.534 0.302 0.484      0.000 0.556 0.082 0.904
      4-max   0.000 0.536 0.268 0.342      0.000 0.508 0.086 0.832
      5-max   0.000 0.522 0.266 0.402      0.000 0.542 0.096 0.806
34    2-max   0.000 0.160 0.024 0.264      0.000 0.370 0.040 0.070
      3-max   0.000 0.222 0.028 0.286      0.000 0.396 0.034 0.060
      4-max   0.000 0.358 0.016 0.234      0.000 0.360 0.040 0.092
      5-max   0.000 0.458 0.016 0.352      0.000 0.386 0.048 0.108
35    2-max   0.000 0.010 0.886 0.936      0.000 0.010 0.972 0.856
      3-max   0.000 0.006 0.856 0.938      0.000 0.010 0.964 0.876
      4-max   0.000 0.004 0.804 0.940      0.000 0.008 0.962 0.838
      5-max   0.000 0.000 0.788 0.874      0.000 0.010 0.962 0.906
36    2-max   0.000 0.268 0.024 0.176      0.068 0.464 0.002 0.082
      3-max   0.000 0.296 0.030 0.188      0.064 0.460 0.000 0.090
      4-max   0.000 0.328 0.022 0.158      0.060 0.464 0.000 0.054
      5-max   0.000 0.378 0.016 0.156      0.056 0.464 0.002 0.066
37    2-max   0.000 0.000 0.038 0.598      0.000 0.038 0.116 0.316
      3-max   0.000 0.002 0.034 0.516      0.000 0.038 0.112 0.312
      4-max   0.000 0.002 0.044 0.584      0.000 0.038 0.116 0.354
      5-max   0.000 0.002 0.040 0.458      0.000 0.040 0.104 0.234
38    2-max   0.000 0.000 0.014 0.390      0.000 0.000 0.012 0.360
      3-max   0.000 0.000 0.018 0.400      0.000 0.000 0.016 0.294
      4-max   0.000 0.000 0.014 0.478      0.000 0.000 0.012 0.326
      5-max   0.000 0.000 0.016 0.400      0.000 0.000 0.018 0.356
39    2-max   0.000 0.118 0.492 0.994      0.000 0.138 0.504 0.936
      3-max   0.000 0.100 0.538 0.994      0.000 0.170 0.432 0.942
      4-max   0.000 0.122 0.472 0.986      0.000 0.146 0.410 0.942
      5-max   0.000 0.122 0.588 0.884      0.000 0.148 0.476 0.604
40    2-max   0.000 0.106 0.694 0.416      0.000 0.076 0.540 0.202
      3-max   0.000 0.106 0.658 0.440      0.000 0.092 0.668 0.220
      4-max   0.000 0.108 0.644 0.560      0.000 0.080 0.610 0.534
      5-max   0.000 0.066 0.558 0.530      0.000 0.090 0.592 0.786
41    2-max   0.000 0.694 0.374 0.278      0.002 0.460 0.288 0.506
      3-max   0.000 0.824 0.408 0.260      0.002 0.652 0.310 0.432
      4-max   0.000 0.784 0.660 0.236      0.000 0.790 0.504 0.442
      5-max   0.000 0.820 0.614 0.600      0.000 0.736 0.888 0.682
42    2-max   0.000 0.184 0.656 0.572      0.000 0.284 0.114 0.106
      3-max   0.000 0.212 0.592 0.488      0.000 0.266 0.262 0.090
      4-max   0.000 0.224 0.564 0.504      0.000 0.248 0.372 0.182
      5-max   0.000 0.188 0.480 0.640      0.000 0.240 0.336 0.858
43    2-max   0.000 0.012 0.434 0.564      0.000 0.064 0.324 0.422
      3-max   0.000 0.028 0.388 0.576      0.000 0.068 0.348 0.410
      4-max   0.000 0.020 0.426 0.570      0.000 0.052 0.384 0.384
      5-max   0.000 0.032 0.520 0.820      0.000 0.068 0.314 NA

Figure A.1: Plot of male speed curves with 1 bump. (Panels: Males #2, #8, #10, #17, #19, #27, #32, #33, #41 and #42; horizontal axis: Age.)

Figure A.2: Plot of male speed curves with 2 bumps.

Figure A.3: Plot of male speed curves with 2 bumps (top) and 3 bumps (bottom). (Panels: Males #31, #35 and #39.)

Figure A.4: Plots of the cases with no decision reached by the "strict" criterion. (Panels: Males #5, #18 and #28.)

Appendix B

Female results

In Table B.1 the results of the bootstrap procedure are summarized.
We show the p-values of all the tests performed with different definitions of a bump (2-, 3-, 4-, and 5-max) and with the two smoothing methods used in steps 1 and 5 of the bootstrap procedure on page 39: Lspline and Kernel smoothing. In step 2 of the bootstrap procedure, we always used Kernel smoothing. Also, we exhibit here the estimated growth curves for all 50 females. Lspline smoothing is used to obtain these estimates. When the "strict" decision rule gives us the classification, we plot only the estimate whose smoothing parameter equals the respective critical value λ_crit^b. In the 11 cases in which the decision was not reached by the "strict" rule, we plot the estimates with the values of b attained using the range rule. "Soft" rule classification is indicated by a solid line.

Table B.1: Summary of 500 bootstrap samples for the female growth data. H0: # of bumps ≤ k.

Female  Width   Lspline: k=0 k=1 k=2 k=3     Kernel: k=0 k=1 k=2 k=3
1       2-max   0.000 0.908 1.000 0.996      0.000 0.748 0.998 0.998
        3-max   0.000 0.872 0.998 0.994      0.000 0.754 0.994 0.998
        4-max   0.000 0.870 0.996 0.986      0.000 0.762 0.988 0.998
        5-max   0.000 0.818 0.998 0.950      0.000 0.712 0.962 0.774
2       2-max   0.578 0.520 0.156 0.034      0.074 0.172 0.170 0.026
        3-max   0.586 0.630 0.116 0.030      0.066 0.186 0.174 0.020
        4-max   0.582 0.638 0.096 0.052      0.058 0.196 0.148 0.018
        5-max   0.600 0.698 0.090 0.118      0.068 0.198 0.160 0.018
3       2-max   0.022 0.544 0.718 0.924      0.116 0.352 0.884 0.804
        3-max   0.014 0.496 0.664 0.902      0.072 0.348 0.886 0.838
        4-max   0.010 0.462 0.614 0.826      0.038 0.316 0.924 0.856
        5-max   0.002 0.408 0.654 0.770      0.024 0.296 0.860 0.764
4       2-max   0.000 0.070 0.728 0.688      0.000 0.006 0.222 0.430
        3-max   0.000 0.138 0.676 0.672      0.000 0.004 0.462 0.398
        4-max   0.000 0.118 0.700 0.610      0.000 0.004 0.752 0.476
        5-max   0.000 0.084 0.722 0.966      0.000 0.004 0.882 NA
5       2-max   0.000 0.044 0.364 0.944      0.000 0.012 0.874 0.630
        3-max   0.000 0.022 0.998 0.956      0.000 0.008 0.886 0.550
        4-max   0.000 0.018 0.996 0.910      0.000 0.008 0.894 0.508
        5-max   0.000 0.012 0.982 0.804      0.000 0.012 0.914 0.656
6       2-max   0.000 0.196 0.008 0.756      0.002 0.062 0.012 0.304
        3-max   0.000 0.152 0.002 0.682      0.000 0.054 0.012 0.272
        4-max   0.000 0.100 0.002 0.616      0.000 0.052 0.000 0.242
        5-max   0.000 0.034 0.000 0.554      0.000 0.044 0.000 0.170
7       2-max   0.000 0.174 0.020 0.038      0.000 0.082 0.000 0.008
        3-max   0.000 0.126 0.008 0.058      0.000 0.084 0.000 0.012
        4-max   0.000 0.128 0.010 0.032      0.000 0.102 0.000 0.008
        5-max   0.000 0.132 0.008 0.034      0.000 0.104 0.000 0.012
8       2-max   0.000 0.448 0.672 0.920      0.000 0.390 0.754 0.774
        3-max   0.000 0.388 0.722 0.884      0.000 0.336 0.768 0.714
        4-max   0.000 0.378 0.608 0.904      0.000 0.300 0.820 0.710
        5-max   0.000 0.362 0.658 0.784      0.000 0.286 0.920 0.732
9       2-max   0.000 0.474 0.518 0.616      0.000 0.868 0.776 0.732
        3-max   0.000 0.426 0.420 0.606      0.000 0.864 0.888 0.806
        4-max   0.000 0.550 0.376 0.652      0.000 0.810 0.974 0.836
        5-max   0.000 0.744 0.968 0.838      0.000 0.790 0.860 0.784
10      2-max   0.000 0.382 0.522 0.998      0.000 0.562 0.624 0.946
        3-max   0.000 0.320 0.508 1.000      0.000 0.582 0.546 0.928
        4-max   0.000 0.310 0.476 1.000      0.000 0.646 0.468 0.884
        5-max   0.000 0.272 0.456 0.904      0.000 0.566 0.472 0.808
11      2-max   0.000 0.246 0.206 0.140      0.014 0.466 0.216 0.216
        3-max   0.000 0.214 0.168 0.102      0.010 0.448 0.194 0.192
        4-max   0.000 0.220 0.208 0.068      0.006 0.422 0.162 0.150
        5-max   0.000 0.258 0.154 0.062      0.006 0.450 0.156 0.090
12      2-max   0.000 0.468 0.276 0.316      0.000 0.006 0.148 0.322
        3-max   0.000 0.510 0.250 0.476      0.000 0.010 0.140 0.284
        4-max   0.000 0.550 0.200 0.374      0.000 0.024 0.092 0.294
        5-max   0.000 0.492 0.124 0.526      0.000 0.060 0.070 0.348
13      2-max   0.000 0.180 0.624 0.450      0.000 0.356 0.438 0.684
        3-max   0.000 0.150 0.546 0.448      0.000 0.318 0.406 0.672
        4-max   0.000 0.200 0.520 0.416      0.000 0.310 0.448 0.616
        5-max   0.000 0.258 0.496 0.404      0.000 0.352 0.388 0.620
14      2-max   0.000 0.154 0.250 0.700      0.000 0.174 0.062 0.174
        3-max   0.000 0.166 0.184 0.606      0.000 0.168 0.058 0.234
        4-max   0.000 0.186 0.234 0.690      0.000 0.182 0.064 0.192
        5-max   0.000 0.128 0.186 0.550      0.000 0.190 0.072 0.250
15      2-max   0.010 0.018 1.000 0.984      0.374 0.006 1.000 1.000
        3-max   0.014 0.012 1.000 0.976      0.356 0.004 1.000 0.988
        4-max   0.014 0.008 1.000 0.962      0.346 0.004 0.998 0.964
        5-max   0.014 0.008 0.990 0.930      0.332 0.004 0.996 NA
16      2-max   0.000 0.446 0.276 0.492      0.000 0.350 0.164 0.600
        3-max   0.000 0.376 0.348 0.530      0.000 0.346 0.180 0.534
        4-max   0.000 0.410 0.416 0.516      0.000 0.348 0.156 0.502
        5-max   0.000 0.412 0.302 0.602      0.000 0.274 0.122 0.674
17      2-max   0.030 0.348 0.988 0.882      0.000 0.148 0.964 0.908
        3-max   0.000 0.290 0.986 0.848      0.000 0.188 0.962 0.930
        4-max   0.000 0.248 0.998 0.810      0.000 0.160 0.968 0.948
        5-max   0.000 0.194 0.994 0.872      0.000 0.132 0.984 0.888
18      2-max   0.000 0.006 0.932 0.998      0.000 0.028 0.968 0.870
        3-max   0.000 0.006 0.982 0.998      0.000 0.032 0.972 0.828
        4-max   0.000 0.002 1.000 0.996      0.000 0.030 0.928 0.896
        5-max   0.000 0.002 0.998 NA         0.000 0.040 0.962 0.656
19      2-max   0.000 0.346 0.116 0.998      0.000 0.210 0.064 0.990
        3-max   0.000 0.306 0.110 0.998      0.000 0.196 0.070 0.988
        4-max   0.000 0.282 0.084 0.988      0.000 0.206 0.090 0.974
        5-max   0.000 0.226 0.096 0.958      0.000 0.226 0.068 0.912
20      2-max   0.000 0.678 0.522 0.404      0.002 0.758 0.462 0.164
        3-max   0.000 0.716 0.428 0.380      0.002 0.760 0.476 0.136
        4-max   0.000 0.698 0.312 0.354      0.000 0.764 0.456 0.176
        5-max   0.000 0.658 0.236 0.366      0.000 0.736 0.508 0.278
21      2-max   0.000 0.038 0.912 0.650      0.000 0.014 0.838 0.576
        3-max   0.000 0.020 0.888 0.656      0.000 0.020 0.818 0.570
        4-max   0.000 0.008 0.914 0.712      0.000 0.008 0.776 0.598
        5-max   0.000 0.012 0.838 0.646      0.000 0.014 0.938 0.724
22      2-max   0.014 0.166 0.156 0.548      0.124 0.004 0.412 0.198
        3-max   0.010 0.126 0.194 0.546      0.114 0.008 0.388 0.206
        4-max   0.010 0.092 0.212 0.556      0.094 0.004 0.386 0.156
        5-max   0.010 0.046 0.256 0.340      0.082 0.002 0.404 0.162
23      2-max   0.000 0.012 0.444 1.000      0.048 0.044 0.182 0.888
        3-max   0.000 0.006 0.390 1.000      0.010 0.032 0.188 0.880
        4-max   0.000 0.000 0.364 1.000      0.006 0.022 0.166 0.846
        5-max   0.000 0.000 0.312 1.000      0.006 0.016 0.150 0.870
24      2-max   0.000 0.012 0.830 0.372      0.004 0.000 0.594 0.330
        3-max   0.000 0.008 0.772 0.324      0.000 0.000 0.618 0.316
        4-max   0.000 0.008 0.776 0.290      0.000 0.000 0.628 0.342
        5-max   0.000 0.004 0.752 0.168      0.000 0.000 0.642 0.380
25      2-max   0.000 0.060 0.986 0.928      0.000 0.006 0.830 0.590
        3-max   0.000 0.014 0.984 0.948      0.000 0.006 0.912 0.780
        4-max   0.000 0.016 0.988 0.982      0.000 0.006 0.904 0.788
        5-max   0.000 0.008 0.988 NA         0.000 0.004 0.854 0.416
26      2-max   0.000 0.198 0.010 0.288      0.000 0.030 0.000 0.770
        3-max   0.000 0.144 0.000 0.470      0.000 0.034 0.000 0.686
        4-max   0.000 0.126 0.000 0.400      0.000 0.036 0.000 0.752
        5-max   0.000 0.084 0.000 0.614      0.000 0.026 0.000 0.786
27      2-max   0.000 0.952 0.892 1.000      0.000 0.804 0.594 0.874
        3-max   0.000 0.918 0.816 1.000      0.000 0.772 0.670 0.902
        4-max   0.000 0.892 0.846 0.994      0.000 0.728 0.668 0.734
        5-max   0.000 0.832 0.730 0.948      0.000 0.654 0.860 0.602
28      2-max   0.000 0.114 0.080 0.172      0.000 0.082 0.188 0.010
        3-max   0.000 0.080 0.094 0.162      0.000 0.072 0.204 0.028
        4-max   0.000 0.078 0.058 0.166      0.000 0.064 0.200 0.146
        5-max   0.000 0.056 0.060 0.764      0.000 0.056 0.176 0.780
29      2-max   0.000 0.036 0.098 0.014      0.000 0.038 0.060 0.006
        3-max   0.000 0.042 0.170 0.018      0.000 0.036 0.074 0.004
        4-max   0.000 0.056 0.166 0.012      0.000 0.038 0.108 0.004
        5-max   0.000 0.062 0.302 0.024      0.000 0.048 0.084 0.002
30      2-max   0.000 0.142 0.378 0.390      0.000 0.094 0.206 0.114
        3-max   0.000 0.138 0.266 0.332      0.000 0.084 0.222 0.156
        4-max   0.000 0.178 0.360 0.314      0.000 0.082 0.176 0.100
        5-max   0.000 0.118 0.356 0.236      0.000 0.054 0.184 0.250
31      2-max   0.008 0.000 0.092 0.862      0.000 0.000 0.014 0.474
        3-max   0.002 0.000 0.060 0.826      0.000 0.000 0.022 0.788
        4-max   0.000 0.000 0.034 0.824      0.000 0.000 0.022 0.746
        5-max   0.000 0.070 0.018 0.752      0.000 0.000 0.020 0.738
32      2-max   0.000 0.180 0.026 0.200      0.000 0.204 0.000 0.196
        3-max   0.000 0.126 0.020 0.182      0.000 0.230 0.004 0.178
        4-max   0.000 0.104 0.026 0.180      0.000 0.196 0.004 0.174
        5-max   0.000 0.106 0.036 0.164      0.000 0.190 0.004 0.232
33      2-max   0.000 0.738 0.902 0.904      0.008 0.440 0.886 0.564
        3-max   0.000 0.686 0.932 0.926      0.004 0.418 0.896 0.554
        4-max   0.000 0.646 0.914 0.784      0.000 0.416 0.892 0.598
        5-max   0.000 0.576 0.872 0.986      0.000 0.424 0.668 0.474
34      2-max   0.000 0.302 0.110 0.020      0.000 0.214 0.066 0.344
        3-max   0.000 0.330 0.102 0.026      0.000 0.224 0.110 0.340
        4-max   0.000 0.478 0.142 0.036      0.000 0.206 0.164 0.256
        5-max   0.000 0.414 0.190 0.352      0.000 0.206 0.386 0.140
35      2-max   0.000 0.078 0.214 0.942      0.000 0.088 0.172 0.776
        3-max   0.000 0.042 0.172 0.954      0.000 0.086 0.130 0.834
        4-max   0.000 0.034 0.116 0.980      0.000 0.090 0.152 0.832
        5-max   0.000 0.030 0.106 0.960      0.000 0.078 0.154 0.484
36      2-max   0.000 0.162 0.140 0.020      0.000 0.044 0.000 0.000
        3-max   0.000 0.088 0.148 0.026      0.000 0.042 0.000 0.000
        4-max   0.000 0.052 0.134 0.016      0.000 0.014 0.000 0.000
        5-max   0.000 0.016 0.130 0.002      0.000 0.008 0.000 0.000
37      2-max   0.000 0.086 0.996 0.928      0.000 0.050 0.922 0.760
        3-max   0.000 0.078 0.994 0.942      0.000 0.050 0.924 0.826
        4-max   0.000 0.054 0.994 0.934      0.000 0.046 0.896 0.806
        5-max   0.000 0.022 0.982 NA         0.000 0.052 0.762 0.442
38      2-max   0.026 0.768 0.554 0.436      0.100 0.894 0.456 0.104
        3-max   0.020 0.734 0.658 0.396      0.076 0.888 0.472 0.100
        4-max   0.012 0.704 0.586 0.340      0.048 0.894 0.448 0.098
        5-max   0.012 0.646 0.634 0.266      0.032 0.896 0.400 0.138
39      2-max   0.000 0.228 0.306 0.070      0.000 0.148 0.344 0.222
        3-max   0.000 0.230 0.294 0.034      0.000 0.136 0.356 0.248
        4-max   0.000 0.184 0.254 0.048      0.000 0.134 0.394 0.278
        5-max   0.000 0.178 0.266 0.092      0.000 0.140 0.278 0.180
40      2-max   0.000 0.034 0.198 0.144      0.000 0.014 0.024 0.362
        3-max   0.000 0.008 0.210 0.112      0.000 0.014 0.026 0.380
        4-max   0.000 0.016 0.178 0.064      0.000 0.008 0.024 0.474
        5-max   0.000 0.006 0.182 0.080      0.000 0.008 0.020 0.384
41      2-max   0.000 0.424 0.962 0.920      0.000 0.276 1.000 0.984
        3-max   0.000 0.452 0.974 0.944      0.000 0.260 1.000 0.998
        4-max   0.000 0.420 0.954 0.984      0.000 0.272 0.994 0.982
        5-max   0.000 0.338 0.924 0.860      0.000 0.216 NA    NA
42      2-max   0.000 0.010 0.510 0.100      0.030 0.000 0.306 0.048
        3-max   0.000 0.004 0.508 0.104      0.024 0.000 0.328 0.064
        4-max   0.000 0.004 0.494 0.142      0.012 0.000 0.334 0.236
        5-max   0.000 0.004 0.460 0.482      0.002 0.000 0.330 0.302
43      2-max   0.000 0.228 0.046 0.858      0.000 0.134 0.010 0.778
        3-max   0.000 0.218 0.080 0.902      0.000 0.130 0.012 0.728
        4-max   0.000 0.232 0.036 0.944      0.000 0.148 0.014 0.668
        5-max   0.000 0.164 0.074 0.870      0.000 0.188 0.024 0.730
44      2-max   0.000 0.172 0.958 0.992      0.000 0.106 0.980 0.918
        3-max   0.000 0.148 0.984 0.970      0.000 0.082 0.984 0.888
        4-max   0.000 0.122 0.962 0.942      0.000 0.086 0.966 0.724
        5-max   0.000 0.100 0.994 0.896      0.000 0.100 0.914 0.864
45      2-max   0.000 0.384 0.076 0.010      0.052 0.320 0.122 0.108
        3-max   0.000 0.340 0.092 0.002      0.036 0.320 0.116 0.116
        4-max   0.000 0.290 0.098 0.026      0.032 0.322 0.102 0.138
        5-max   0.000 0.290 0.102 0.066      0.032 0.298 0.096 0.160
46      2-max   0.000 0.088 0.044 0.774      0.026 0.004 0.002 0.192
        3-max   0.000 0.028 0.016 0.724      0.020 0.012 0.000 0.348
        4-max   0.000 0.010 0.012 0.718      0.016 0.024 0.000 0.626
        5-max   0.000 0.008 0.004 0.640      0.012 0.014 0.000 0.870
47      2-max   0.000 0.628 0.628 0.924      0.000 0.466 0.462 0.134
        3-max   0.000 0.634 0.548 0.898      0.000 0.468 0.436 0.342
        4-max   0.000 0.588 0.504 0.938      0.000 0.426 0.392 0.732
        5-max   0.000 0.538 0.438 0.868      0.000 0.432 0.412 0.860
48      2-max   0.000 0.102 0.804 0.968      0.000 0.098 0.652 0.682
        3-max   0.000 0.086 0.778 0.958      0.000 0.092 0.606 0.870
        4-max   0.000 0.080 0.722 0.974      0.000 0.080 0.622 0.876
        5-max   0.000 0.042 0.646 0.872      0.000 0.068 0.462 NA
49      2-max   0.030 0.748 0.252 0.808      0.070 0.612 0.490 0.294
        3-max   0.010 0.728 0.242 0.796      0.058 0.602 0.714 0.352
        4-max   0.000 0.710 0.240 0.776      0.058 0.582 0.660 0.762
        5-max   0.000 0.694 0.186 0.950      0.058 0.590 0.666 0.978
50      2-max   0.000 0.842 0.708 0.924      0.000 0.488 0.414 0.238
        3-max   0.000 0.804 0.652 0.938      0.000 0.440 0.536 0.358
        4-max   0.000 0.740 0.550 0.926      0.000 0.480 0.536 0.602
        5-max   0.000 0.690 0.372 0.860      0.000 0.398 0.428 0.894

Figure B.1: Plots of female speed curves with 1 bump. (Panels: Females #1, #3 and #8; horizontal axis: Age.)

Figure B.2: Plots of female speed curves with 1 bump (continued). (Panels: Females #20, #27 and #33.)

Figure B.3: Plots of female speed curves with 2 bumps. (Panels: Females #4, #5 and #13.)

[Further panels: Females #25, #30, #35 and Females #26, #28, #31.]

Figure B.7: Plots of the cases with no decision reached by the "strict" criterion (continued).
