The Not-So-Smoother

by

Jennifer Paige Eveson

B.Sc. University of Victoria, 1994

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

in

THE FACULTY OF GRADUATE STUDIES

Department of Statistics

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

October 1996

© J. Paige Eveson, 1996

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

The University of British Columbia
Vancouver, Canada

Abstract

In this thesis, a local smoothing method, termed the not-so-smoother, designed to estimate discontinuous regression functions is proposed. Local smoothing techniques estimate the regression function at a given point by finding the "best fit" through the observations within a fixed neighbourhood of the point. The "best fit" can be the best constant fit (which gives the moving average smoother), the best linear fit, the best k-degree polynomial fit, et cetera. The not-so-smoother finds the best local broken constant fit, a piecewise constant function with exactly one simple discontinuity. Unlike any of the traditional local smoothing methods, the not-so-smoother uses discontinuous local fits and, therefore, has the ability to preserve discontinuities in the function. Consistency of the not-so-smoother under general conditions is proven.
Performance of the smoother on simulated data, both continuous and discontinuous, is demonstrated, and an application to a real data set of electric current recordings through an ion channel in a cell membrane is also shown. Variations of the not-so-smoother which can lead to improved performance in certain situations are investigated.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
2 The Not-So-Smoother
3 Consistency
4 Performance of The Not-So-Smoother
  4.1 Local Performance
  4.2 Global Performance
5 Extensions
  5.1 The Somewhat-Smoother
  5.2 The Not-So-Smoother using Local Linear Fits
6 Conclusion and Discussion
Bibliography

List of Tables

4.1 Summary results of smoothing methods on simulated data (500 replications in each case)
4.2 Summary results of smoothing methods for data simulated according to the two-constant model (1000 replications in each case)

List of Figures

1.1 The best constant and broken constant fits and their corresponding estimates at the central point
2.1 Examples of local broken constant fits for the fictitious neighbourhood in Figure 1.1
2.2 The function H corresponding to the fictitious neighbourhood in Figure 1.1
4.1 Histograms of breakpoints when no discontinuity exists
4.2 Histograms of breakpoints for various $\delta$ and $\sigma$ values keeping the ratio $\delta/\sigma$ constant
4.3 Histograms of breakpoints when a discontinuity exists in the center of the neighbourhood
4.4 Optimal smooths of one cycle of the sine function with Normal(0, 0.3) errors
4.5 Optimal smooths of a split cube-root function with Normal(0, 0.2) errors
4.6 Measurements of current flowing through an ion channel in a cell membrane
4.7 Smooths of the current data using the not-so-smoother and the moving average smoother for various bandwidths
4.8 Not-so-smooth of the current data using a bandwidth of 50
5.1 Somewhat-smooths of the sine data using $R_n = 5$ and various levels of $\alpha$
5.2 Not-so-smooth and somewhat-smooth with $\alpha = 0.0001$ of the split cube-root data using $R_n = 9$
Optimal not-so-smooths of the sawtooth function with noise using local constant fits and local linear fits

Acknowledgements

Foremost I want to thank my supervisor, Jean Meloche, for approaching me with this project. Not only has the topic proved interesting and rewarding, but also I couldn't have chosen a better person to work with. Jean was always willing to give his advice, ideas, and encouragement, all of which were much needed. Secondly, I am grateful to Nancy Heckman for her very useful comments and without whose careful reading my thesis may well have contained erroneous proofs. Thanks also to Grace Chiu for her invaluable help with LaTeX, C programming, and various other problems along the way, but most of all for being such a terrific officemate and friend. Lastly, I would like to acknowledge NSERC for their financial support which has enabled me to enter the real world debt free.

Chapter 1

Introduction

A common problem in statistics is, given a set of noisy data, to estimate the underlying function, also called the signal or regression function. Oftentimes, a parametric form for the signal is assumed. For example, in the case of linear regression, the signal is assumed to belong to the class of linear functions. A more general problem arises when the regression function is not restricted to any specific form. If limited or no information is known about the underlying function then it is preferable not to place restrictions on the function's form.

Nonparametric regression techniques such as kernel smoothers and smoothing splines are typically used to estimate the signal in such situations (Green and Silverman, 1994; Eubank, 1988). These smoothing methods are based on the assumption that the regression function is continuous.
Applying the traditional smoothing methods when the function is not continuous tends to smooth away the discontinuities, or "jumps". In applications where discontinuities are present in the signal, it is important that the jumps be preserved. In fact, identification and preservation of the discontinuities can be the main objective in estimating the signal. For example, in Chapter 4 we consider a data set of electric current recordings through an ion channel in a cell membrane. The current is believed to have two, or perhaps more, conductance levels between which it switches randomly. The data are quite noisy and the goal is to restore the true signal. For data sets such as this, we seek a smoother with the ability to preserve discontinuities. In particular, we are interested in the case where no a priori information regarding the parametric form of the regression function, including the number and location of discontinuities, is known.

In 1986, McDonald and Owen investigated the estimation of a discontinuous regression function and introduced a smoothing algorithm called the split linear smoother which can produce a discontinuous estimate. The main idea of their approach is, for any given point, to obtain linear fits based on points to the left, to the right, and to both sides of the point in question. A smoothed estimate for the point is found by taking a weighted average of the left, right, and central fits, where the weights are chosen based on goodness-of-fit measures for the linear fits. The split linear smoother is quite complicated in concept and in practice, and its statistical properties were only briefly discussed.

Following the work of McDonald and Owen, Hall and Titterington (1992) developed an edge-preserving smoothing algorithm summarized as follows. (Note that an edge is simply a discontinuity in the function.)
For each design point, a left, right and central smooth is calculated by taking a weighted average of data to the left, right, and both sides, respectively, of the point. Weights are determined through a procedure which equates leading terms in the Taylor series expansions of the expected smooths. Discontinuities are then identified using various diagnostics to compare the three smooths. Having identified the jumps, a final estimate of the function at each design point is produced. The left (right) smooth is used for points to the left (right) of and sufficiently close to a discontinuity; otherwise, the central smooth is used. As acknowledged by Hall and Titterington, some arbitrariness is involved in the procedure, including the choice of weights, the number of neighbouring data points used in the smooths, and the diagnostics used to identify jumps. Furthermore, properties of their estimator are not known.

Several related problems have also been studied. The case where exactly one discontinuity exists was considered by Müller in 1992. Müller identifies the discontinuity by comparing right and left one-sided kernel smooths, and he also gives results on the rate of convergence of the estimated point of discontinuity to the true point.

In a 1991 paper, Lee presents a method for detecting and measuring the size of change-points, which are discontinuities occurring in the kth order derivative of the regression function. The use of smoothing splines to estimate the function once the change-points have been detected is briefly discussed.

Shiau (1987) proposes a partial spline model to estimate the underlying function of a noisy data set, assuming that the locations of the discontinuities are known. In a case where this information is unavailable, Shiau's approach could be used in conjunction with a procedure for identifying the discontinuities, such as Lee's.
Less recently, Feder (1975) investigated regression functions which have different parametric forms over different regions of the domain. Specifically, he studied the asymptotic distribution of least-squares estimators in this segmented regression problem. Because each segment's parametric form must be specified, this problem is not as general as the one we are interested in.

In this thesis, we propose a smoother which, like those of McDonald and Owen and Hall and Titterington, is capable of preserving discontinuities, but is simpler conceptually as well as in application. Moreover, consistency of the estimator can be shown. No assumptions about the parametric form of the underlying function or the existence of discontinuities are made.

This new smoothing technique is a local smoother. Local smoothers estimate the regression function at a given point by finding the "best fit" through a fixed number of neighbouring observations. The "best fit" depends on the form of the fits considered as well as the criterion used to determine best. In the case of the moving average smoother, only constant fits are considered and best is determined by minimizing the squared error between the observations and the constant. Local linear, quadratic, and k-degree polynomial fits are also commonly used. The smoothing method we propose finds the best "broken constant" fit, a piecewise constant function with exactly one simple discontinuity, where best is determined by a minimization of squared error criterion.

Figure 1.1: The best constant and broken constant fits and their corresponding estimates at the central point. (Plot title: "A fictitious neighbourhood of observations". Legend: best constant fit; best broken constant fit; estimate using constant fits; estimate using broken constant fits. Horizontal axis: observation index, 10 to 40.)
Regardless of the form of local fits used, after determining the best local fit, the estimate of the regression function at the point of interest is taken to be the fitted value at that point.

An illustration should clarify. Suppose we have a data set of 100 observations and we are estimating the regression function at the 25th point. We must first choose the number of neighbouring observations to use in our estimate. For this example, we will use 15 observations to the left and 15 observations to the right of the 25th point, meaning observations 10 through 40. Figure 1.1 shows fictitious data for this situation. The best constant fit, which is just the mean of the observations, and the best broken constant fit are shown in the figure. To find the best broken constant fit, we consider, for each design point in the neighbourhood, the broken constant fit that breaks at that point. Searching over all these fits, we find that the total squared error between the observed and fitted values is minimized when the breakpoint occurs at the 19th point, as pictured. The estimates at the 25th point corresponding to the best constant fit and best broken constant fit are also marked.

This illustration helps us to see that when there is a discontinuity in the neighbourhood of a point, it is reasonable to expect the best broken constant fit to break at, or near, the point of discontinuity. If so, the function estimate at points close to the discontinuity will only be influenced significantly by observations on the same side of it, thus preserving the edge.

Because the result of our smoothing method is an estimate of the function that need not always be smooth, we will refer to it as the not-so-smoother.

A detailed description of the not-so-smoother follows in Chapter 2. Consistency of the estimator under general conditions is proven in Chapter 3. In Chapter 4 we demonstrate the performance of the not-so-smoother on simulated as well as real data sets.
Chapter 5 contains extensions of the smoother to include the use of local linear, rather than constant, fits and the use of a test to allow the estimator to "break" only if the data provide sufficient evidence of a discontinuity. Finally, concluding remarks and suggestions for further work are found in Chapter 6.

Chapter 2

The Not-So-Smoother

Before introducing the not-so-smoother, we must first define the framework under which we are working. Let $f$ denote the regression function, or signal, being estimated. We assume that $f$ is a bounded, real-valued function on the interval $(a, b)$ with a finite number of unknown discontinuities. Let $(t_{nk}, X_{nk})$, $k = 0, 1, \ldots, n$, denote the data, where each $t_{nk}$ is a design point in the interval $(a, b)$ and each $X_{nk}$ is the observed regression function at the point $t_{nk}$. The model can be expressed as

\[ X_{nk} = f(t_{nk}) + \varepsilon_{nk}, \qquad k = 0, 1, \ldots, n, \]

where $a < t_{n0} < t_{n1} < \cdots < t_{nn} < b$ and the error terms, $\{\varepsilon_{nk}\}$, are independent and identically distributed with mean 0 and finite variance $\sigma^2$.

A smoothed estimate of the regression function, $f$, at each of the design points is calculated according to the following algorithm. First choose a non-zero bandwidth, $R_n$, such that in estimating $f$ at the point $t_{nk}$ the nearest $2R_n + 1$ observations, namely $\{X_{n,k-R_n}, \ldots, X_{nk}, \ldots, X_{n,k+R_n}\}$, are used. We will refer to the interval $[t_{n,k-R_n}, t_{n,k+R_n}]$ as the neighbourhood of $t_{nk}$.

For each design point $t_{nk}$, $k \in \{0, 1, \ldots, n\}$, define the local breakpoint, $I_{nk}$, to be the argument that minimizes the function $H_{nk}(J)$ over all $J \in \{-R_n, \ldots, 0, \ldots, R_n - 1\}$, where

\[ H_{nk}(J) = \frac{1}{R_n} \left( \sum_{j=-R_n}^{J} \left( X_{n,k+j} - \bar{X}_{-R_n:J} \right)^2 + \sum_{j=J+1}^{R_n} \left( X_{n,k+j} - \bar{X}_{J+1:R_n} \right)^2 \right) \qquad (2.1) \]

and $\bar{X}_{l:m}$ is defined as the mean of the observations at positions $l$ through $m$ of the neighbourhood,

\[ \bar{X}_{l:m} = \frac{1}{m - l + 1} \sum_{j=l}^{m} X_{n,k+j}. \]

Then define the estimate of $f(t_{nk})$ to be $\hat{f}(t_{nk}) = \hat{f}(t_{nk}, I_{nk})$ where

\[ \hat{f}(t_{nk}, I_{nk}) = \frac{1}{|S|} \sum_{j \in S} X_{n,k+j} \qquad (2.2) \]

and

\[ S = \begin{cases} \{-R_n, \ldots, I_{nk}\} & \text{if } I_{nk} \ge 0 \\ \{I_{nk} + 1, \ldots, R_n\} & \text{if } I_{nk} < 0. \end{cases} \qquad (2.3) \]

Note that $|S|$ denotes the number of elements in the set $S$.
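The definitions (2.1) to (2.3) can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used for the results in this thesis: the function names `local_breakpoint` and `not_so_smooth` are hypothetical, ties in the minimization are broken by taking the smallest $J$ (an assumption), and neighbourhoods that run past either end of the data are filled by wrapping indices modulo the number of observations (also an assumption of this sketch).

```python
import numpy as np


def local_breakpoint(window):
    """Return (I, H(I)) for one neighbourhood of 2R+1 values (positions j = -R..R).

    H follows (2.1): total squared error of the broken constant fit that
    breaks after position J, divided by R. Ties go to the smallest J
    (an assumption of this sketch).
    """
    window = np.asarray(window, dtype=float)
    R = (len(window) - 1) // 2
    best_J, best_H = None, np.inf
    for J in range(-R, R):                      # J in {-R, ..., R-1}
        left = window[: J + R + 1]              # positions -R..J
        right = window[J + R + 1:]              # positions J+1..R
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        H = sse / R
        if H < best_H:
            best_J, best_H = J, H
    return best_J, best_H


def not_so_smooth(x, R):
    """Estimate the signal at every index of x using bandwidth R, as in (2.2)-(2.3)."""
    x = np.asarray(x, dtype=float)
    offsets = np.arange(-R, R + 1)
    est = np.empty(len(x))
    for k in range(len(x)):
        # mode="wrap" reads indices modulo len(x) (hypothetical end handling).
        window = np.take(x, k + offsets, mode="wrap")
        I, _ = local_breakpoint(window)
        # S is the side of the break that contains the centre position j = 0.
        S = window[: I + R + 1] if I >= 0 else window[I + R + 1:]
        est[k] = S.mean()
    return est
```

For instance, `not_so_smooth(np.concatenate([np.zeros(20), np.ones(20)]), R=5)` reproduces a noiseless step exactly, since every neighbourhood that contains the jump has a broken constant fit with zero squared error.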
Although the notation may seem daunting, the idea is really quite simple. Suppose we are trying to get a smoothed estimate of $f$ at the point $t_{nk}$, then consider only the $2R_n + 1$ neighbouring observations, which we have denoted $\{X_{n,k-R_n}, \ldots, X_{nk}, \ldots, X_{n,k+R_n}\}$. We want to find the broken constant function that fits the data "best", where best is based on the minimization of squared errors.

Specifically, fix $J \in \{-R_n, \ldots, 0, \ldots, R_n - 1\}$ so that $J$ divides the data in the neighbourhood into two subsets, $S_1 = \{X_{n,k-R_n}, \ldots, X_{n,k+J}\}$ and $S_2 = \{X_{n,k+J+1}, \ldots, X_{n,k+R_n}\}$. The constant line through the observations in $S_1$ which minimizes the sum of squared errors is the mean of the observations; similarly for $S_2$. Let us denote the sum of squared errors from the observations in $S_1$ to their mean and $S_2$ to their mean as $SSE_{-R_n:J}$ and $SSE_{J+1:R_n}$ respectively. The value of $J$ which minimizes the total squared error, $SSE_{-R_n:J} + SSE_{J+1:R_n}$, or equivalently $H_{nk}(J) = (SSE_{-R_n:J} + SSE_{J+1:R_n})/R_n$, is what we have called the (local) breakpoint. After having identified the best broken constant fit, the estimate of $f$ at $t_{nk}$ is simply taken to be the fitted value at that point.

Figure 2.1: Examples of local broken constant fits for the fictitious neighbourhood in Figure 1.1. (Panels: $J = -10$, $J = -6$, $J = 0$, $J = 10$; reported values $H(-6) = 6.50$ and $H(0) = 21.93$; horizontal axis: observation index, 10 to 40.)

Figure 2.2: The function $H$ corresponding to the fictitious neighbourhood in Figure 1.1. (Horizontal axis: $J$, from $-15$ to $15$.)

To illustrate this procedure, again consider the fictitious neighbourhood shown in Figure 1.1. For this example, the bandwidth is 15. Thus, there are 30 possible broken constant fits to consider, one corresponding to each value of $J \in \{-15, \ldots, 0, \ldots, 14\}$, where $J$ divides the data in the neighbourhood into two subsets as described above. Broken constant fits for various values of $J$ are shown in Figure 2.1.
The function $H_{100,25}$ (recall $n = 100$ and we are estimating the 25th point), for which we will simply write $H$ in this example, is calculated for each value of $J$. A plot of $H$ is given in Figure 2.2. It is clear from this figure that $H$ is minimized at $J = -6$, meaning the breakpoint is $-6$ (corresponding to the 19th point). Therefore, the estimate of the regression function at the 25th point is the mean of observations 20 through 40, which is 2.07.

Comments

1. Some modifications must be made for estimating points near the ends of the interval $(a, b)$. If we are estimating $f$ at the point $t_{nk}$ where $k \in \{0, 1, \ldots, R_n - 1\}$, then there are not $R_n$ design points to the left of $t_{nk}$. Similarly, for estimating $f$ at $t_{nk}$ where $k \in \{n - R_n + 1, \ldots, n - 1, n\}$, there are not $R_n$ design points to the right of $t_{nk}$. In such cases, we simply replicate the data set to the right and left of itself, connecting the endpoints of the interval $(a, b)$. In doing so, for $t_{nk}$ near $a$, observation $X_{n,-j}$ becomes observation $X_{n,n-j+1}$, $j = 1, \ldots, R_n$, and for $t_{nk}$ near $b$, observation $X_{n,n+j}$ becomes observation $X_{n,j-1}$, $j = 1, \ldots, R_n$. More concisely, every $X_{nk}$ can be written as $X_{n,\,k \,\%\, (n+1)}$, where $\%$ denotes modulo arithmetic. Furthermore, $t_{n,-j}$ can be defined as $a - (b - t_{n,n+1-j})$, $j = 1, \ldots, R_n$, and $t_{n,n+j}$ as $b + (t_{n,j-1} - a)$, $j = 1, \ldots, R_n$. Although the location of the design points does not affect the estimated value of the function, it is important that all design points be well defined in order that the proofs in Chapter 3 hold near the ends of the interval $(a, b)$. Note that with a usual moving average smoother, replicating the data set could result in biased estimates since the observations near the beginning of the interval need not be consistent with those near the end of the interval.
However, using the not-so-smoother there is no such problem since we are just introducing another potential discontinuity which this method is designed to preserve.

2. We chose to use local piecewise constant fits as opposed to local piecewise linear fits, or any more complex piecewise function. This choice stems from the idea that if the width of the neighbourhood is small, then the regression function should not fluctuate too much within a neighbourhood and thus should be adequately approximated by constants. This will not always be true, as a later example will show, leading us to investigate the use of local piecewise linear fits in Chapter 5. The literature on local regression is unanimous that local linear and higher order fits perform better than local constant fits, for numerous reasons. The example in Chapter 5 suggests that this is also the case for the estimation of a discontinuous signal.

3. As mentioned previously, the not-so-smoother should detect discontinuities when they exist. If there is a jump within the neighbourhood of the point being estimated, then it is reasonable that the best broken constant fit will break at, or very near, the point of discontinuity. (In Chapter 4 we show this is true in practice and in the next chapter we prove that, under certain conditions, it is true asymptotically.) Therefore, if the point being estimated is to the left (right) of the discontinuity, then the estimate should only be influenced significantly by points to the left (right) of the discontinuity. As a result, the jump will not be smoothed away.

It will be convenient to rewrite the estimator given in equation (2.2) as follows. Since $X_{nk} = f(t_{nk}) + \varepsilon_{nk}$, we can write

\[ \hat{f}(t_{nk}, I_{nk}) = g_{nk}(t_{nk}, I_{nk}) + \gamma_{nk}(I_{nk}) \qquad (2.4) \]

where

\[ g_{nk}(t_{nk}, I_{nk}) = \frac{1}{|S|} \sum_{j \in S} f(t_{n,k+j}), \qquad (2.5) \]

\[ \gamma_{nk}(I_{nk}) = \frac{1}{|S|} \sum_{j \in S} \varepsilon_{n,k+j} \qquad (2.6) \]

and $S$ is defined in (2.3).
Thus $g_{nk}$ can be thought of as the part of the estimate attributable to the signal whereas $\gamma_{nk}$ is the part of the estimate attributable to noise in the observations.

Chapter 3

Consistency

This chapter is devoted to proving that the estimator, $\hat{f}$, as defined by (2.4) is consistent under general conditions. The conditions required are:

(R1) $\max_{i \in \{1, \ldots, n\}} (t_{ni} - t_{n,i-1}) = O(1/n)$, $\; t_{n0} - a = O(1/n)$ and $b - t_{nn} = O(1/n)$.

(R2) $R_n \to \infty$ and $R_n / n \to 0$ as $n \to \infty$.

(R3) For every discontinuity point $t_0$, there exist $\delta_1, \delta_2 > 0$ and $M_1, M_2 > 0$ such that
  (i) $|f(t) - f^-(t_0)| \le M_1 (t_0 - t)$ for all $0 < t_0 - t < \delta_1$, and
  (ii) $|f(t) - f^+(t_0)| \le M_2 (t - t_0)$ for all $0 < t - t_0 < \delta_2$,
where $f^-(t_0) = \lim_{t \uparrow t_0} f(t)$ and $f^+(t_0) = \lim_{t \downarrow t_0} f(t)$ are assumed to exist. Furthermore, either $f(t_0) = f^-(t_0)$ or $f(t_0) = f^+(t_0)$.

(R4) $E\varepsilon_i^4 < \infty$ for all $i = 1, \ldots, n$.

Condition (R1) states that the distance between any two consecutive design points tends to zero at speed $1/n$ as the number of design points tends to infinity. For example, if the design points are equally spaced over the interval $[0, 1]$, as is commonly the case, then $t_{ni} = i/n$, $i = 0, 1, \ldots, n$. Thus, the distance between any two consecutive points is $1/n$ which is clearly $O(1/n)$ as required. The additional requirements in (R1) that the left and right endpoints of the interval $(a, b)$ be close to the first and last design points respectively are needed when proving consistency near the ends of the interval. Collectively, the conditions in (R1) state that the distance between any two consecutive design points, including those defined by replicating the data set (see Comment 1 in Chapter 2), is $O(1/n)$.

Condition (R2) states that the number of points used in estimating $f$ at any given point must be large in absolute terms, but small relative to the total number of observations. For example, take $R_n = \sqrt{n}$.
An immediate consequence of conditions (R1) and (R2) is that the distance between a given design point and any other point within its neighbourhood tends to zero as $n$ tends to infinity. Formally this can be stated as:

(R2') Let $S_n = \{-R_n, \ldots, 0, \ldots, R_n\}$. Then $\max_{j \in S_n} |t_{n,k_n+j} - t_{nk_n}| \to 0$ as $n \to \infty$.

The proof is simple.

\[ \max_{j \in S_n} |t_{n,k_n+j} - t_{nk_n}| \le t_{n,k_n+R_n} - t_{n,k_n-R_n} \le \sum_{i=-R_n}^{R_n} \left( t_{n,k_n+i} - t_{n,k_n+(i-1)} \right) = \sum_{i=-R_n}^{R_n} O(1/n) \quad \text{by (R1)} \]

\[ = (2R_n + 1)\, O(1/n) \;\stackrel{n \to \infty}{\longrightarrow}\; 0 \quad \text{by (R2)}. \]

Condition (R3) is a Lipschitz-type condition for the discontinuity points which requires that the underlying function be smooth on either side of a discontinuity. It also states that $f$ is either right or left continuous to exclude the case where $f$ has a removable discontinuity, that is, $f^-(t_0) = f^+(t_0) \ne f(t_0)$. Although the results presented in this chapter are still true when there is a removable discontinuity, the proofs would need to be modified to include this case.

Finally, condition (R4) states that the distribution of the error terms must have finite fourth moments. This is true for many distributions, including the commonly assumed normal distribution. The last two requirements, (R3) and (R4), will be needed to prove consistency of $\hat{f}$ at the discontinuity points.

Comment

Note that in the proof of (R2'), $t_{nk_n}$ was written instead of $t_{nk}$; $t_{nk}$ represents a fixed point whereas $t_{nk_n}$ represents a sequence of points. We want to show consistency of $\hat{f}$ at every point $t_0$ in the interval $(a, b)$. Because $\hat{f}$ is only defined at the design points and $t_0$ may not correspond to a design point, we must consider a sequence of design points, $t_{nk_n}$, which converges to $t_0$. Thus, throughout this chapter, $k_n$ will be written wherever $k$ was formerly used.

The proof that the not-so-smoother is consistent will be presented through a series of lemmas. First it is shown that the part of the estimate attributable to noise, $\gamma_{nk_n}$ (refer to (2.6)), converges in probability to zero.
Next, the term attributable to the signal, $g_{nk_n}$ (refer to (2.5)), is considered and it is shown that the difference between $g_{nk_n}$ and the true function $f$ tends to zero at all points as $n$ tends to infinity. Continuity points and discontinuity points will be considered separately. Finally, since $\hat{f} = g_{nk_n} + \gamma_{nk_n}$, it follows that $\hat{f} - f$ converges in probability to zero, or equivalently that $\hat{f}$ is a consistent estimator of $f$.

Notation

Before proceeding, some simplifying notation which will be used throughout this chapter is introduced. When considering design point $t_{nk_n}$, $\varepsilon_j$ will be written in place of $\varepsilon_{n,k_n+j}$ and $X_j$ in place of $X_{n,k_n+j}$, for $j = -R_n, \ldots, R_n$. Thus it is important to keep in mind the dependence of these terms on $k_n$, the position of the current point of interest, and $n$, the total number of observations.

Lemma 1 Assume condition (R1) holds. Then

\[ \max_{I_{nk_n} \in \{-R_n, \ldots, R_n\}} |\gamma_{nk_n}(I_{nk_n})| \;\stackrel{P}{\longrightarrow}\; 0 \]

where $\stackrel{P}{\longrightarrow}$ denotes convergence in probability.

Proof

\[ |\gamma_{nk_n}(I_{nk_n})| \le \max\left\{ \frac{|\varepsilon_{-R_n} + \cdots + \varepsilon_0|}{R_n + 1}, \frac{|\varepsilon_{-R_n} + \cdots + \varepsilon_1|}{R_n + 2}, \ldots, \frac{|\varepsilon_{-R_n} + \cdots + \varepsilon_{R_n}|}{2R_n + 1}, \frac{|\varepsilon_{-R_n+1} + \cdots + \varepsilon_{R_n}|}{2R_n}, \ldots, \frac{|\varepsilon_{-1} + \cdots + \varepsilon_{R_n}|}{R_n + 2}, \frac{|\varepsilon_0 + \cdots + \varepsilon_{R_n}|}{R_n + 1} \right\} \]

\[ \le \frac{1}{R_n + 1} \max\left\{ |\varepsilon_{-R_n} + \cdots + \varepsilon_0|, \ldots, |\varepsilon_{-R_n} + \cdots + \varepsilon_{R_n}|, |\varepsilon_{-R_n+1} + \cdots + \varepsilon_{R_n}|, \ldots, |\varepsilon_0 + \cdots + \varepsilon_{R_n}| \right\} \]

\[ \le \frac{1}{R_n + 1} \max\left\{ |\varepsilon_{-R_n}|, |\varepsilon_{-R_n} + \varepsilon_{-R_n+1}|, \ldots, |\varepsilon_{-R_n} + \cdots + \varepsilon_{R_n}|, |\varepsilon_{-R_n+1} + \cdots + \varepsilon_{R_n}|, \ldots, |\varepsilon_{R_n-1} + \varepsilon_{R_n}|, |\varepsilon_{R_n}| \right\} \]

\[ \le \frac{1}{R_n + 1} \max\left\{ |\varepsilon_{-R_n}|, |\varepsilon_{-R_n} + \varepsilon_{-R_n+1}|, \ldots, |\varepsilon_{-R_n} + \cdots + \varepsilon_{R_n}| \right\} + \frac{1}{R_n + 1} \max\left\{ |\varepsilon_{R_n}|, |\varepsilon_{R_n-1} + \varepsilon_{R_n}|, \ldots, |\varepsilon_{-R_n} + \cdots + \varepsilon_{R_n}| \right\}. \qquad (3.1) \]

By Kolmogorov's Inequality (Chung, 1974), we know for all $\delta > 0$,

\[ P\left\{ \frac{1}{R_n + 1} \max\left\{ |\varepsilon_{-R_n}|, |\varepsilon_{-R_n} + \varepsilon_{-R_n+1}|, \ldots, |\varepsilon_{-R_n} + \cdots + \varepsilon_{R_n}| \right\} > \delta \right\} = P\left\{ \max\left\{ |\varepsilon_{-R_n}|, |\varepsilon_{-R_n} + \varepsilon_{-R_n+1}|, \ldots, |\varepsilon_{-R_n} + \cdots + \varepsilon_{R_n}| \right\} > (R_n + 1)\,\delta \right\} \]

\[ \le \frac{\mathrm{Var}(\varepsilon_{-R_n} + \cdots + \varepsilon_{R_n})}{(R_n + 1)^2 \delta^2} = \frac{(2R_n + 1)\,\sigma^2}{(R_n + 1)^2 \delta^2} \;\stackrel{n \to \infty}{\longrightarrow}\; 0 \]

since by (R2) $R_n \to \infty$. By definition, this means that

\[ \frac{1}{R_n + 1} \max\left\{ |\varepsilon_{-R_n}|, |\varepsilon_{-R_n} + \varepsilon_{-R_n+1}|, \ldots, |\varepsilon_{-R_n} + \cdots + \varepsilon_{R_n}| \right\} \;\stackrel{P}{\longrightarrow}\; 0. \qquad (3.2) \]

Also, by symmetry

\[ \frac{1}{R_n + 1} \max\left\{ |\varepsilon_{R_n}|, |\varepsilon_{R_n-1} + \varepsilon_{R_n}|, \ldots, |\varepsilon_{-R_n} + \cdots + \varepsilon_{R_n}| \right\} \;\stackrel{P}{\longrightarrow}\; 0. \qquad (3.3) \]

Now apply the results of (3.2) and (3.3) along with Slutsky's Theorem (Bickel and Doksum, 1977) to statement (3.1) to obtain

\[ |\gamma_{nk_n}(I_{nk_n})| \le \frac{1}{R_n + 1} \max\left\{ |\varepsilon_{-R_n}|, |\varepsilon_{-R_n} + \varepsilon_{-R_n+1}|, \ldots, |\varepsilon_{-R_n} + \cdots + \varepsilon_{R_n}| \right\} + \frac{1}{R_n + 1} \max\left\{ |\varepsilon_{R_n}|, |\varepsilon_{R_n-1} + \varepsilon_{R_n}|, \ldots, |\varepsilon_{-R_n} + \cdots + \varepsilon_{R_n}| \right\} \;\stackrel{P}{\longrightarrow}\; 0 + 0 = 0. \quad \Box \]

Consider now convergence of $g_{nk_n}$. By condition (R1), we know that for any point $t_0 \in (a, b)$ there exists a sequence $t_{nk_n}$ which converges to $t_0$. Thus, for each $t_0$, we must show that $|g_{nk_n}(t_{nk_n}, I_{nk_n}) - f(t_0)| \to 0$ as $n \to \infty$. When $f$ is continuous at $t_0$, the proof requires only conditions (R1) and (R2), whereas (R3) and (R4) are also needed when $f$ is discontinuous at $t_0$.

The continuity points are considered first. Suppose we are estimating $f$ at point $t_{nk_n}$, where $t_{nk_n} \to t_0$ as $n \to \infty$. The width of the neighbourhood around $t_{nk_n}$ shrinks to zero asymptotically so that it will not contain a discontinuity when $f$ is continuous at $t_0$. Thus $f(t_{n,k_n-R_n}), \ldots, f(t_{nk_n}), \ldots, f(t_{n,k_n+R_n})$ are essentially equal, all converging to $f(t_0)$. Because $g_{nk_n}(t_{nk_n}, I_{nk_n})$ is an average of a subset of these terms, it must also converge to $f(t_0)$. A rigorous proof is given in the following lemma.

Lemma 2 Assume (R1) and (R2) hold. If $t_{nk_n} \to t_0$ as $n \to \infty$ and if $f$ is continuous at $t_0$, then

\[ |g_{nk_n}(t_{nk_n}, I_{nk_n}) - f(t_0)| \;\stackrel{n \to \infty}{\longrightarrow}\; 0. \]

Proof Let $S$ be a subset of $S_n = \{-R_n, \ldots, 0, \ldots, R_n\}$ and let $|S|$ denote the number of elements in $S$. For any $k_n \in \{0, \ldots, n\}$,

\[ \left| \frac{1}{|S|} \sum_{j \in S} f(t_{n,k_n+j}) - f(t_0) \right| \le \frac{1}{|S|} \sum_{j \in S} |f(t_{n,k_n+j}) - f(t_0)| \le \max_{j \in S} |f(t_{n,k_n+j}) - f(t_0)| \le \max_{j \in S_n} |f(t_{n,k_n+j}) - f(t_0)|. \qquad (3.4) \]

Regardless of the estimate $I_{nk_n}$, $g_{nk_n}(t_{nk_n}, I_{nk_n})$ is of the form $\sum_{j \in S} f(t_{n,k_n+j}) / |S|$ where $S = \{-R_n, \ldots, I_{nk_n}\}$ or $S = \{I_{nk_n} + 1, \ldots, R_n\}$. Therefore the result of (3.4) can be applied to get

\[ |g_{nk_n}(t_{nk_n}, I_{nk_n}) - f(t_0)| \le \max_{j \in S_n} |f(t_{n,k_n+j}) - f(t_0)|. \qquad (3.5) \]

Now, because $t_0$ is a continuity point of $f$, we know that for all $t$ such that $|t - t_0| \to 0$, $|f(t) - f(t_0)| \to 0$. By (R2'), we know that $\max_{j \in S_n} |t_{n,k_n+j} - t_{nk_n}| \to 0$. Combine this with the fact that $|t_{nk_n} - t_0| \to 0$ to get

\[ \max_{j \in S_n} |t_{n,k_n+j} - t_0| = \max_{j \in S_n} |t_{n,k_n+j} - t_{nk_n} + t_{nk_n} - t_0| \le \max_{j \in S_n} |t_{n,k_n+j} - t_{nk_n}| + |t_{nk_n} - t_0| \;\stackrel{n \to \infty}{\longrightarrow}\; 0. \]

Therefore,

\[ \max_{j \in S_n} |f(t_{n,k_n+j}) - f(t_0)| \;\stackrel{n \to \infty}{\longrightarrow}\; 0. \]

So we can conclude from inequality (3.5) that

\[ |g_{nk_n}(t_{nk_n}, I_{nk_n}) - f(t_0)| \;\stackrel{n \to \infty}{\longrightarrow}\; 0. \qquad (3.6) \quad \Box \]

Note that $f(t_0)$ can be replaced by $f(t_{nk_n})$ in the above lemma for a slightly different statement. The reason this works is as follows:

\[ |g_{nk_n}(t_{nk_n}, I_{nk_n}) - f(t_{nk_n})| = |g_{nk_n}(t_{nk_n}, I_{nk_n}) - f(t_0) + f(t_0) - f(t_{nk_n})| \le |g_{nk_n}(t_{nk_n}, I_{nk_n}) - f(t_0)| + |f(t_0) - f(t_{nk_n})|. \]

Because $f$ is continuous at $t_0$ and $t_{nk_n} \to t_0$, we know that $|f(t_0) - f(t_{nk_n})| \to 0$. Using this fact and expression (3.6), we can conclude that

\[ |g_{nk_n}(t_{nk_n}, I_{nk_n}) - f(t_{nk_n})| \;\stackrel{n \to \infty}{\longrightarrow}\; 0. \]

To prove convergence at points where $f$ is discontinuous is somewhat more involved. Suppose $f$ is discontinuous at $t_0$. Ideally, we would like to show that $|g_{nk_n}(t_{nk_n}, I_{nk_n}) - f(t_0)| \to 0$. This is unrealistic as we do not usually know if $f$ is right or left continuous at $t_0$, and even if we did, the most we can expect is that $g_{nk_n}$ will converge to either $f^-(t_0)$ or $f^+(t_0)$. In Lemma 4, this last statement is shown to be true.

In proving Lemma 4, we will need to use a result to be presented in Lemma 3. We saw in the proof of Lemma 2 that the location of the estimated breakpoint is irrelevant asymptotically when $f$ is continuous at the point being estimated.
This is not the case for discontinuity points; rather we will need to show that the estimated breakpoint is sufficiently close to the true location of the jump. To do so requires conditions (R3) and (R4). This is the content of Lemma 3.

For simplicity, we assume in both Lemmas 3 and 4 that $t_0$ is a discontinuity point and that it corresponds to some design point; that is, $t_0 = t_{nk_n}$ for some $k_n \in \{0, 1, \ldots, n\}$. Of course, this need not be true; we may only have $t_{nk_n} \to t_0$ as $n \to \infty$. Lemma 4 will still be valid; however, the proof becomes much more involved since $t_0$ can be anywhere inside, or even outside, of the neighbourhood around $t_{nk_n}$. Numerous cases must then be considered and Lemma 3 needs to be slightly revised. The essence of the proofs for Lemmas 3 and 4 in the more general setting is similar to when we assume $t_0 = t_{nk_n}$. However, the proofs become lengthy and repetitive, and thus only the more restricted statements are proven here.

Lemma 3 Assume (R1) - (R4) hold. Consider estimating $f$ at design point $t_{nk_n}$ where $t_{nk_n} = t_0$ for some $k_n \in \{0, 1, \ldots, n\}$ and $f$ is discontinuous at $t_0$. Then

\[ \frac{I_{nk_n}}{R_n} \;\stackrel{P}{\longrightarrow}\; 0. \]

Proof We must show that

\[ P\left\{ \left| \frac{I_{nk_n}}{R_n} \right| > \epsilon \right\} \;\stackrel{n \to \infty}{\longrightarrow}\; 0 \qquad \forall\; 0 < \epsilon < 1. \]

To do so, we will bound the probability by an expression which is easier to work with.

\[ P\left\{ \left| \frac{I_{nk_n}}{R_n} \right| > \epsilon \right\} = P\left\{ |I_{nk_n}| > \epsilon R_n \right\} = P\left\{ \bigcup_{|J| > \epsilon R_n} \left\{ H_{nk_n}(J) \le H_{nk_n}(K) \;\; \forall\, K \ne J \right\} \right\} \]

\[ \le P\left\{ \bigcup_{|J| > \epsilon R_n} \left\{ H_{nk_n}(J) \le H_{nk_n}(0) \right\} \right\} \le \sum_{|J| > \epsilon R_n} P\left\{ H_{nk_n}(J) \le H_{nk_n}(0) \right\}. \qquad (3.7) \]

Case 1: $\epsilon R_n < J < R_n$.

We need to show that

\[ \sum_{J > \epsilon R_n}^{R_n - 1} P\left\{ H_{nk_n}(J) \le H_{nk_n}(0) \right\} \;\stackrel{n \to \infty}{\longrightarrow}\; 0. \]

The following notation will be convenient: let $V_i$, $i = 1, \ldots, n$, be any random variables. Then, for any $1 \le k \le l \le n$,

\[ \bar{V}_{k:l} = \frac{1}{l - k + 1} \sum_{i=k}^{l} V_i. \]

It can be shown that

\[ P\left\{ H_{nk_n}(J) \le H_{nk_n}(0) \right\} = P\left\{ R_n (R_n + 1) \left( \bar{X}_{-R_n:0} - \bar{X}_{1:J} \right)^2 \le (J + R_n + 1)(R_n - J) \left( \bar{X}_{1:J} - \bar{X}_{J+1:R_n} \right)^2 \right\} \qquad (3.8) \]

through the series of steps that follow.
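$$
\begin{aligned}
H_{nk_n}(J) &= \sum_{j=-R_n}^{J} \left( X_j - \bar X_{-R_n:J} \right)^2 + \sum_{j=J+1}^{R_n} \left( X_j - \bar X_{J+1:R_n} \right)^2 \\
&= \sum_{j=-R_n}^{0} \left( X_j - \bar X_{-R_n:J} \right)^2 + \sum_{j=1}^{J} \left( X_j - \bar X_{-R_n:J} \right)^2 + \sum_{j=J+1}^{R_n} \left( X_j - \bar X_{J+1:R_n} \right)^2 \\
&= \sum_{j=-R_n}^{0} \left( X_j - \bar X_{-R_n:0} + \bar X_{-R_n:0} - \bar X_{-R_n:J} \right)^2 + \sum_{j=1}^{J} \left( X_j - \bar X_{1:J} + \bar X_{1:J} - \bar X_{-R_n:J} \right)^2 + \sum_{j=J+1}^{R_n} \left( X_j - \bar X_{J+1:R_n} \right)^2 \\
&= \sum_{j=-R_n}^{0} \left( X_j - \bar X_{-R_n:0} \right)^2 + (R_n + 1) \left( \bar X_{-R_n:J} - \bar X_{-R_n:0} \right)^2 \\
&\quad + \sum_{j=1}^{J} \left( X_j - \bar X_{1:J} \right)^2 + J \left( \bar X_{-R_n:J} - \bar X_{1:J} \right)^2 + \sum_{j=J+1}^{R_n} \left( X_j - \bar X_{J+1:R_n} \right)^2
\end{aligned}
\tag{3.9}
$$

Now,

$$\sum_{j=1}^{J} \left( X_j - \bar X_{1:J} \right)^2 + \sum_{j=J+1}^{R_n} \left( X_j - \bar X_{J+1:R_n} \right)^2 = \sum_{j=1}^{R_n} \left( X_j - \bar X_{1:R_n} \right)^2 - J \left( \bar X_{1:J} - \bar X_{1:R_n} \right)^2 - (R_n - J) \left( \bar X_{J+1:R_n} - \bar X_{1:R_n} \right)^2,$$

where the equality follows by expanding the squares and gathering terms. Substituting this expression into equation (3.9) and using the fact that

$$H_{nk_n}(0) = \sum_{j=-R_n}^{0} \left( X_j - \bar X_{-R_n:0} \right)^2 + \sum_{j=1}^{R_n} \left( X_j - \bar X_{1:R_n} \right)^2$$

gives

$$
\begin{aligned}
H_{nk_n}(J) = H_{nk_n}(0) &+ (R_n + 1) \left( \bar X_{-R_n:J} - \bar X_{-R_n:0} \right)^2 + J \left( \bar X_{-R_n:J} - \bar X_{1:J} \right)^2 \\
&- J \left( \bar X_{1:J} - \bar X_{1:R_n} \right)^2 - (R_n - J) \left( \bar X_{J+1:R_n} - \bar X_{1:R_n} \right)^2.
\end{aligned}
\tag{3.10}
$$

Therefore

$$
\begin{aligned}
P\left\{ H_{nk_n}(J) < H_{nk_n}(0) \right\} = P\Big\{ &(R_n + 1) \left( \bar X_{-R_n:J} - \bar X_{-R_n:0} \right)^2 + J \left( \bar X_{-R_n:J} - \bar X_{1:J} \right)^2 \\
&< J \left( \bar X_{1:J} - \bar X_{1:R_n} \right)^2 + (R_n - J) \left( \bar X_{J+1:R_n} - \bar X_{1:R_n} \right)^2 \Big\}.
\end{aligned}
\tag{3.11}
$$

More manipulations are needed to get this in the form of equation (3.8). Consider each term in the probability statement on the right-hand side of (3.11) separately.

$$
\begin{aligned}
(R_n + 1) \left( \bar X_{-R_n:J} - \bar X_{-R_n:0} \right)^2 &= (R_n + 1) \left( \frac{\left( (R_n + 1) - (J + R_n + 1) \right)(X_{-R_n} + \ldots + X_0) + (R_n + 1)(X_1 + \ldots + X_J)}{(J + R_n + 1)(R_n + 1)} \right)^2 \\
&= \frac{(R_n + 1) J^2}{(J + R_n + 1)^2} \left( \bar X_{-R_n:0} - \bar X_{1:J} \right)^2
\end{aligned}
$$

Similarly,

$$
\begin{aligned}
J \left( \bar X_{-R_n:J} - \bar X_{1:J} \right)^2 &= J \left( \frac{R_n + 1}{J + R_n + 1} \right)^2 \left( \bar X_{-R_n:0} - \bar X_{1:J} \right)^2 \\
J \left( \bar X_{1:J} - \bar X_{1:R_n} \right)^2 &= \frac{J (R_n - J)^2}{R_n^2} \left( \bar X_{1:J} - \bar X_{J+1:R_n} \right)^2 \\
(R_n - J) \left( \bar X_{J+1:R_n} - \bar X_{1:R_n} \right)^2 &= \frac{(R_n - J) J^2}{R_n^2} \left( \bar X_{1:J} - \bar X_{J+1:R_n} \right)^2
\end{aligned}
$$

Substituting these equalities into equation (3.11) gives

$$
\begin{aligned}
P\left\{ H_{nk_n}(J) < H_{nk_n}(0) \right\} &= P\left\{ \frac{J (R_n + 1)}{J + R_n + 1} \left( \bar X_{-R_n:0} - \bar X_{1:J} \right)^2 < \frac{J (R_n - J)}{R_n} \left( \bar X_{1:J} - \bar X_{J+1:R_n} \right)^2 \right\} \\
&= P\left\{ R_n (R_n + 1) \left( \bar X_{-R_n:0} - \bar X_{1:J} \right)^2 < (J + R_n + 1)(R_n - J) \left( \bar X_{1:J} - \bar X_{J+1:R_n} \right)^2 \right\}
\end{aligned}
$$

which we recognize as equation (3.8).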
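The algebra leading from (3.9) to (3.8) can be checked numerically. The following is only a minimal sketch: the function names (`seg_mean`, `H`, `closed_form`, `event_38`), the seed and the choice $R_n = 10$ are illustrative assumptions, not part of the thesis. It compares $H_{nk_n}(J) - H_{nk_n}(0)$, computed directly from the definition, with the difference of the two simplified terms obtained by substituting the four equalities into (3.10).

```python
import numpy as np

def seg_mean(x, lo, hi, R):
    """Mean of X_lo, ..., X_hi, where offsets run from -R to R (array index = offset + R)."""
    return x[lo + R: hi + R + 1].mean()

def H(x, J, R):
    """Residual sum of squares of the broken constant fit that breaks after offset J."""
    left, right = x[: J + R + 1], x[J + R + 1:]
    return ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()

def closed_form(x, J, R):
    """H(J) - H(0) after the substitutions into (3.10), valid for 0 < J < R."""
    a = seg_mean(x, -R, 0, R) - seg_mean(x, 1, J, R)
    b = seg_mean(x, 1, J, R) - seg_mean(x, J + 1, R, R)
    return J * (R + 1) / (J + R + 1) * a ** 2 - J * (R - J) / R * b ** 2

def event_38(x, J, R):
    """The event inside the probability on the right-hand side of (3.8)."""
    a = seg_mean(x, -R, 0, R) - seg_mean(x, 1, J, R)
    b = seg_mean(x, 1, J, R) - seg_mean(x, J + 1, R, R)
    return R * (R + 1) * a ** 2 < (J + R + 1) * (R - J) * b ** 2

rng = np.random.default_rng(0)      # arbitrary seed
R = 10                              # illustrative bandwidth
x = rng.normal(size=2 * R + 1)      # any data will do: the identity is purely algebraic
gaps = [H(x, J, R) - H(x, 0, R) - closed_form(x, J, R) for J in range(1, R)]
print(max(abs(g) for g in gaps))    # zero up to floating-point rounding
```

Because the identity holds for every data vector, the check does not depend on how `x` is generated; normal noise is used here only for convenience.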
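Thus,

$$
\begin{aligned}
P\left\{ H_{nk_n}(J) < H_{nk_n}(0) \right\} &= P\left\{ \left( \bar X_{-R_n:0} - \bar X_{1:J} \right)^2 < \left( 1 + \frac{J}{R_n + 1} \right) \left( 1 - \frac{J}{R_n} \right) \left( \bar X_{1:J} - \bar X_{J+1:R_n} \right)^2 \right\} \\
&\le P\left\{ \left( \bar X_{-R_n:0} - \bar X_{1:J} \right)^2 < \left( 1 - \left( \frac{J}{R_n} \right)^2 \right) \left( \bar X_{1:J} - \bar X_{J+1:R_n} \right)^2 \right\} \\
&= P\left\{ \left| \bar X_{-R_n:0} - \bar X_{1:J} \right| < m \left| \bar X_{1:J} - \bar X_{J+1:R_n} \right| \right\} \quad \text{where } m = \sqrt{1 - (J/R_n)^2} \\
&\le P\Big\{ \left| E\bar X_{-R_n:0} - E\bar X_{1:J} \right| - \left| \bar X_{-R_n:0} - E\bar X_{-R_n:0} \right| - \left| \bar X_{1:J} - E\bar X_{1:J} \right| \\
&\qquad\quad < m \left( \left| \bar X_{J+1:R_n} - E\bar X_{J+1:R_n} \right| + \left| \bar X_{1:J} - E\bar X_{1:J} \right| + \left| E\bar X_{J+1:R_n} - E\bar X_{1:J} \right| \right) \Big\}
\end{aligned}
$$

using the fact that $|a| - |b| - |c| \le |a + b + c| \le |a| + |b| + |c|$

$$
\begin{aligned}
&= P\Big\{ (1 + m) \left| \bar X_{1:J} - E\bar X_{1:J} \right| + m \left| \bar X_{J+1:R_n} - E\bar X_{J+1:R_n} \right| + \left| \bar X_{-R_n:0} - E\bar X_{-R_n:0} \right| \\
&\qquad\quad > \left| E\bar X_{-R_n:0} - E\bar X_{1:J} \right| - m \left| E\bar X_{J+1:R_n} - E\bar X_{1:J} \right| \Big\} \\
&\le P\left\{ (1 + m) \left| \bar X_{1:J} - E\bar X_{1:J} \right| > \tfrac{1}{3} \left( \left| E\bar X_{-R_n:0} - E\bar X_{1:J} \right| - m \left| E\bar X_{J+1:R_n} - E\bar X_{1:J} \right| \right) \right\} \\
&\quad + P\left\{ m \left| \bar X_{J+1:R_n} - E\bar X_{J+1:R_n} \right| > \tfrac{1}{3} \left( \left| E\bar X_{-R_n:0} - E\bar X_{1:J} \right| - m \left| E\bar X_{J+1:R_n} - E\bar X_{1:J} \right| \right) \right\} \\
&\quad + P\left\{ \left| \bar X_{-R_n:0} - E\bar X_{-R_n:0} \right| > \tfrac{1}{3} \left( \left| E\bar X_{-R_n:0} - E\bar X_{1:J} \right| - m \left| E\bar X_{J+1:R_n} - E\bar X_{1:J} \right| \right) \right\}.
\end{aligned}
$$

To simplify the notation, define

$$
\begin{aligned}
\mu_1 &= E(\bar X_{-R_n:0}) = \frac{1}{R_n + 1} \sum_{j=-R_n}^{0} f(t_{n,k_n+j}) \\
\mu_2 &= E(\bar X_{1:J}) = \frac{1}{J} \sum_{j=1}^{J} f(t_{n,k_n+j}) \\
\mu_3 &= E(\bar X_{J+1:R_n}) = \frac{1}{R_n - J} \sum_{j=J+1}^{R_n} f(t_{n,k_n+j})
\end{aligned}
$$

Thus, we have

$$
\begin{aligned}
P\left\{ H_{nk_n}(J) < H_{nk_n}(0) \right\} \le\ & P\left\{ (1 + m) \left| \bar X_{1:J} - \mu_2 \right| > \tfrac{1}{3} \left( |\mu_1 - \mu_2| - m |\mu_3 - \mu_2| \right) \right\} \\
&+ P\left\{ m \left| \bar X_{J+1:R_n} - \mu_3 \right| > \tfrac{1}{3} \left( |\mu_1 - \mu_2| - m |\mu_3 - \mu_2| \right) \right\} \\
&+ P\left\{ \left| \bar X_{-R_n:0} - \mu_1 \right| > \tfrac{1}{3} \left( |\mu_1 - \mu_2| - m |\mu_3 - \mu_2| \right) \right\}.
\end{aligned}
\tag{3.12}
$$

Recall that we need to show that

$$\sum_{J > \epsilon R_n}^{R_n - 1} P\left\{ H_{nk_n}(J) < H_{nk_n}(0) \right\} \to 0.$$

To do so, we consider each probability statement on the right-hand side of (3.12) separately. We apply to each a generalization of Markov's inequality (Bickel and Doksum, 1977) and also use the facts that $m = \sqrt{1 - (J/R_n)^2} < \sqrt{1 - \epsilon^2}$, since $J > \epsilon R_n$, and

$$E\left( \bar X_{k:l} - E\bar X_{k:l} \right)^4 = E\left( \bar\epsilon_{k:l} \right)^4 = \frac{E\epsilon^4 - 2\sigma^4}{(l - k + 1)^3} + \frac{2\sigma^4}{(l - k + 1)^2}.$$

In deriving this expression, we used the fact that the $\epsilon_j$'s are independent and identically distributed with mean 0 and variance $\sigma^2$.

$$
\begin{aligned}
P\left\{ (1 + m) \left| \bar X_{1:J} - \mu_2 \right| > \tfrac{1}{3} \left( |\mu_1 - \mu_2| - m |\mu_3 - \mu_2| \right) \right\} &\le \frac{\left( 3(1 + m) \right)^4 E\left( \bar X_{1:J} - \mu_2 \right)^4}{\left( |\mu_1 - \mu_2| - m |\mu_3 - \mu_2| \right)^4} \\
&\le \frac{\left( 3(1 + \sqrt{1 - \epsilon^2}) \right)^4 E\left( \bar X_{1:J} - \mu_2 \right)^4}{\left( |\mu_1 - \mu_2| - \sqrt{1 - \epsilon^2}\, |\mu_3 - \mu_2| \right)^4} \\
&= \frac{\left( 3(1 + \sqrt{1 - \epsilon^2}) \right)^4}{\left( |\mu_1 - \mu_2| - \sqrt{1 - \epsilon^2}\, |\mu_3 - \mu_2| \right)^4} \left[ \frac{E\epsilon^4 - 2\sigma^4}{J^3} + \frac{2\sigma^4}{J^2} \right]
\end{aligned}
\tag{3.13}
$$

$$
\begin{aligned}
P\left\{ m \left| \bar X_{J+1:R_n} - \mu_3 \right| > \tfrac{1}{3} \left( |\mu_1 - \mu_2| - m |\mu_3 - \mu_2| \right) \right\} &\le \frac{(3m)^4 E\left( \bar X_{J+1:R_n} - \mu_3 \right)^4}{\left( |\mu_1 - \mu_2| - m |\mu_3 - \mu_2| \right)^4} \\
&\le \frac{(3m)^4}{\left( |\mu_1 - \mu_2| - \sqrt{1 - \epsilon^2}\, |\mu_3 - \mu_2| \right)^4} \left[ \frac{E\epsilon^4 - 2\sigma^4}{(R_n - J)^3} + \frac{2\sigma^4}{(R_n - J)^2} \right]
\end{aligned}
\tag{3.14}
$$

$$
\begin{aligned}
P\left\{ \left| \bar X_{-R_n:0} - \mu_1 \right| > \tfrac{1}{3} \left( |\mu_1 - \mu_2| - m |\mu_3 - \mu_2| \right) \right\} &\le \frac{3^4 E\left( \bar X_{-R_n:0} - \mu_1 \right)^4}{\left( |\mu_1 - \mu_2| - m |\mu_3 - \mu_2| \right)^4} \\
&\le \frac{3^4 E\left( \bar X_{-R_n:0} - \mu_1 \right)^4}{\left( |\mu_1 - \mu_2| - \sqrt{1 - \epsilon^2}\, |\mu_3 - \mu_2| \right)^4} \\
&= \frac{3^4}{\left( |\mu_1 - \mu_2| - \sqrt{1 - \epsilon^2}\, |\mu_3 - \mu_2| \right)^4} \left[ \frac{E\epsilon^4 - 2\sigma^4}{(R_n + 1)^3} + \frac{2\sigma^4}{(R_n + 1)^2} \right]
\end{aligned}
\tag{3.15}
$$

Assume without loss of generality that $f$ is left continuous.¹ Then, by (R3) (i), there exists $\delta_1 > 0$ and $M_1 > 0$ such that

$$\left| f(t_{n,k_n+j}) - f^-(t_0) \right| \le M_1 \left( t_0 - t_{n,k_n+j} \right) \tag{3.16}$$

whenever $0 < t_0 - t_{n,k_n+j} < \delta_1$.

¹ Note that the function need not be left continuous. If $f$ is right continuous then the proof will follow in the same manner except back at equation (3.8), we would have split into the terms $\bar X_{-R_n:-1}$, $\bar X_{0:J}$, and $\bar X_{J+1:R_n}$.

Recalling that $t_0 = t_{nk_n}$ and using (R2'), we know

$$0 \le \max_{j \in \{-R_n, \ldots, 0\}} \left( t_0 - t_{n,k_n+j} \right) \to 0.$$

By (3.16),

$$\max_{j \in \{-R_n, \ldots, 0\}} \left| f(t_{n,k_n+j}) - f^-(t_0) \right| \to 0. \tag{3.17}$$

Analogously, using (R3) (ii) and (R2'),

$$\max_{j \in \{1, \ldots, R_n\}} \left| f(t_{n,k_n+j}) - f^+(t_0) \right| \to 0. \tag{3.18}$$

Therefore,

$$
\begin{aligned}
\left| \mu_1 - f^-(t_0) \right| &= \left| \frac{1}{R_n + 1} \sum_{j=-R_n}^{0} f(t_{n,k_n+j}) - f^-(t_0) \right| \to 0 \\
\left| \mu_2 - f^+(t_0) \right| &= \left| \frac{1}{J} \sum_{j=1}^{J} f(t_{n,k_n+j}) - f^+(t_0) \right| \to 0 \quad \forall J \\
\left| \mu_3 - f^+(t_0) \right| &= \left| \frac{1}{R_n - J} \sum_{j=J+1}^{R_n} f(t_{n,k_n+j}) - f^+(t_0) \right| \to 0 \quad \forall J.
\end{aligned}
$$

Thus, for all $\epsilon > 0$, there exists $N_0$ such that if $n > N_0$ then $|\mu_1 - f^-(t_0)| < \epsilon$. Take $\epsilon = \delta/4$ where $\delta$ is defined as

$$\delta = \left| f^-(t_0) - f^+(t_0) \right|,$$

which is a positive constant. Then there exists $N_1$ such that if $n > N_1$ then $|\mu_1 - f^-(t_0)| < \delta/4$. Similarly, there exists $N_2$ such that if $n > N_2$ then $|\mu_2 - f^+(t_0)| < \delta/4$ for all $J$, and there exists $N_3$ such that if $n > N_3$ then $|\mu_3 - f^+(t_0)| < \delta/4$ for all $J$.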
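Let $N = \max\{N_1, N_2, N_3\}$ and we have, for all $n > N$,

$$\left| \mu_1 - f^-(t_0) \right| < \delta/4 \tag{3.19}$$

$$\left| \mu_2 - f^+(t_0) \right| < \delta/4 \quad \forall J \tag{3.20}$$

$$\left| \mu_3 - f^+(t_0) \right| < \delta/4 \quad \forall J. \tag{3.21}$$

So, for all $J$ and for all $n > N$,

$$
\begin{aligned}
|\mu_1 - \mu_2| &= \left| \left( f^-(t_0) + \mu_1 - f^-(t_0) \right) - \left( f^+(t_0) + \mu_2 - f^+(t_0) \right) \right| \\
&= \left| \left( f^-(t_0) - f^+(t_0) \right) + \left( \mu_1 - f^-(t_0) \right) + \left( f^+(t_0) - \mu_2 \right) \right| \\
&\ge \left| f^-(t_0) - f^+(t_0) \right| - \left| \mu_1 - f^-(t_0) \right| - \left| \mu_2 - f^+(t_0) \right| \\
&> \delta - \delta/4 - \delta/4 \quad \text{using (3.19) and (3.20)} \\
&= \delta/2
\end{aligned}
\tag{3.22}
$$

and

$$
\begin{aligned}
|\mu_3 - \mu_2| &= \left| \left( \mu_3 - f^+(t_0) \right) - \left( \mu_2 - f^+(t_0) \right) \right| \\
&\le \left| \mu_3 - f^+(t_0) \right| + \left| \mu_2 - f^+(t_0) \right| \\
&< \delta/4 + \delta/4 \quad \text{using (3.20) and (3.21)} \\
&= \delta/2.
\end{aligned}
\tag{3.23}
$$

Therefore, using (3.22) and (3.23) along with (3.13) gives, for all $n > N$,

$$
\begin{aligned}
\sum_{J > \epsilon R_n}^{R_n - 1} P\left\{ (1 + m) \left| \bar X_{1:J} - \mu_2 \right| > \tfrac{1}{3} \left( |\mu_1 - \mu_2| - m |\mu_3 - \mu_2| \right) \right\}
&\le \frac{\left( 3(1 + \sqrt{1 - \epsilon^2}) \right)^4}{\left( \frac{\delta}{2} (1 - \sqrt{1 - \epsilon^2}) \right)^4} \sum_{J > \epsilon R_n}^{R_n - 1} \left[ \frac{E\epsilon^4 - 2\sigma^4}{J^3} + \frac{2\sigma^4}{J^2} \right] \\
&\le \frac{\left( 3(1 + \sqrt{1 - \epsilon^2}) \right)^4}{\left( \frac{\delta}{2} (1 - \sqrt{1 - \epsilon^2}) \right)^4}\, R_n (1 - \epsilon) \left[ \frac{E\epsilon^4 - 2\sigma^4}{(\epsilon R_n)^3} + \frac{2\sigma^4}{(\epsilon R_n)^2} \right] \\
&= \frac{\left( 3(1 + \sqrt{1 - \epsilon^2}) \right)^4 (1 - \epsilon)}{\left( \frac{\delta}{2} (1 - \sqrt{1 - \epsilon^2}) \right)^4} \left[ \frac{E\epsilon^4 - 2\sigma^4}{\epsilon^3 R_n^2} + \frac{2\sigma^4}{\epsilon^2 R_n} \right] \\
&\to 0
\end{aligned}
\tag{3.24}
$$

since $R_n \to \infty$ by (R2) and $E\epsilon^4 < \infty$ by (R4).

Using (3.22) and (3.23) along with (3.14) gives, for all $n > N$,

$$
\begin{aligned}
\sum_{J > \epsilon R_n}^{R_n - 1} P\left\{ m \left| \bar X_{J+1:R_n} - \mu_3 \right| > \tfrac{1}{3} \left( |\mu_1 - \mu_2| - m |\mu_3 - \mu_2| \right) \right\}
&\le \frac{3^4}{\left( \frac{\delta}{2} (1 - \sqrt{1 - \epsilon^2}) \right)^4} \sum_{J > \epsilon R_n}^{R_n - 1} \left( 1 - \left( \frac{J}{R_n} \right)^2 \right)^2 \left[ \frac{E\epsilon^4 - 2\sigma^4}{(R_n - J)^3} + \frac{2\sigma^4}{(R_n - J)^2} \right] \\
&= \frac{3^4}{\left( \frac{\delta}{2} (1 - \sqrt{1 - \epsilon^2}) \right)^4} \sum_{J > \epsilon R_n}^{R_n - 1} \left( 1 + \frac{J}{R_n} \right)^2 \left[ \frac{E\epsilon^4 - 2\sigma^4}{R_n^2 (R_n - J)} + \frac{2\sigma^4}{R_n^2} \right] \\
&\le \frac{3^4}{\left( \frac{\delta}{2} (1 - \sqrt{1 - \epsilon^2}) \right)^4} \sum_{J > \epsilon R_n}^{R_n - 1} 2^2 \left[ \frac{E\epsilon^4 - 2\sigma^4}{R_n^2} + \frac{2\sigma^4}{R_n^2} \right] \\
&\le \frac{3^4}{\left( \frac{\delta}{2} (1 - \sqrt{1 - \epsilon^2}) \right)^4}\, 2^2 R_n (1 - \epsilon) \left[ \frac{E\epsilon^4 - 2\sigma^4}{R_n^2} + \frac{2\sigma^4}{R_n^2} \right] \\
&= \frac{3^4\, 2^2 (1 - \epsilon)}{\left( \frac{\delta}{2} (1 - \sqrt{1 - \epsilon^2}) \right)^4} \left[ \frac{E\epsilon^4 - 2\sigma^4}{R_n} + \frac{2\sigma^4}{R_n} \right] \\
&\to 0
\end{aligned}
\tag{3.25}
$$

since $R_n \to \infty$ by (R2) and $E\epsilon^4 < \infty$ by (R4).

Finally, using (3.22) and (3.23) along with (3.15) gives, for all $n > N$,

$$
\begin{aligned}
\sum_{J > \epsilon R_n}^{R_n - 1} P\left\{ \left| \bar X_{-R_n:0} - \mu_1 \right| > \tfrac{1}{3} \left( |\mu_1 - \mu_2| - m |\mu_3 - \mu_2| \right) \right\}
&\le \frac{3^4}{\left( \frac{\delta}{2} (1 - \sqrt{1 - \epsilon^2}) \right)^4} \sum_{J > \epsilon R_n}^{R_n - 1} \left[ \frac{E\epsilon^4 - 2\sigma^4}{(R_n + 1)^3} + \frac{2\sigma^4}{(R_n + 1)^2} \right] \\
&\le \frac{3^4}{\left( \frac{\delta}{2} (1 - \sqrt{1 - \epsilon^2}) \right)^4}\, R_n (1 - \epsilon) \left[ \frac{E\epsilon^4 - 2\sigma^4}{(R_n + 1)^3} + \frac{2\sigma^4}{(R_n + 1)^2} \right] \\
&= \frac{3^4 (1 - \epsilon)}{\left( \frac{\delta}{2} (1 - \sqrt{1 - \epsilon^2}) \right)^4} \left[ \frac{E\epsilon^4 - 2\sigma^4}{R_n^2 (1 + 1/R_n)^3} + \frac{2\sigma^4}{R_n (1 + 1/R_n)^2} \right] \\
&\to 0
\end{aligned}
\tag{3.26}
$$

since $R_n \to \infty$ by (R2) and $E\epsilon^4 < \infty$ by (R4).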
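Thus, by (3.12) and (3.24) - (3.26), we can conclude that

$$\sum_{J > \epsilon R_n}^{R_n - 1} P\left\{ H_{nk_n}(J) < H_{nk_n}(0) \right\} \to 0.$$

Case 2: $-R_n \le J < -\epsilon R_n$. We need to show that

$$\sum_{-R_n \le J < -\epsilon R_n} P\left\{ H_{nk_n}(J) < H_{nk_n}(0) \right\} \to 0.$$

The proof is analogous to case 1 and thus will be omitted.

Combining cases 1 and 2, we can conclude that

$$\sum_{|J| > \epsilon R_n} P\left\{ H_{nk_n}(J) < H_{nk_n}(0) \right\} \to 0$$

and therefore, using (3.7), that

$$P\left\{ \left| \frac{I_{nk_n}}{R_n} \right| > \epsilon \right\} \to 0.$$

∎

Lemma 4 Assume (R1) - (R4) hold. Let $t_0 = t_{nk_n}$ for some $k_n \in \{0, 1, \ldots, n\}$ where $f$ is discontinuous at $t_0$. Then

$$\min\left\{ \left| g_{nk_n}(t_{nk_n}, I_{nk_n}) - f^-(t_0) \right|,\ \left| g_{nk_n}(t_{nk_n}, I_{nk_n}) - f^+(t_0) \right| \right\} \xrightarrow{P} 0.$$

Proof

Case 1: Suppose $I_{nk_n} \ge 0$. That is, the estimated breakpoint is to the right of $t_0$ ($= t_{nk_n}$). Then

$$
\begin{aligned}
g_{nk_n}(t_{nk_n}, I_{nk_n}) &= \frac{1}{I_{nk_n} + R_n + 1} \sum_{j=-R_n}^{I_{nk_n}} f(t_{n,k_n+j}) \\
&= \frac{1}{I_{nk_n} + R_n + 1} \sum_{j=-R_n}^{-1} f(t_{n,k_n+j}) + \frac{1}{I_{nk_n} + R_n + 1} \sum_{j=0}^{I_{nk_n}} f(t_{n,k_n+j}).
\end{aligned}
$$

Therefore,

$$
\begin{aligned}
\left| g_{nk_n}(t_{nk_n}, I_{nk_n}) - f^-(t_0) \right| &= \left| \frac{1}{I_{nk_n} + R_n + 1} \sum_{j=-R_n}^{-1} \left( f(t_{n,k_n+j}) - f^-(t_0) \right) + \frac{1}{I_{nk_n} + R_n + 1} \sum_{j=0}^{I_{nk_n}} \left( f(t_{n,k_n+j}) - f^-(t_0) \right) \right| \\
&\le \frac{1}{I_{nk_n} + R_n + 1} \sum_{j=-R_n}^{-1} \left| f(t_{n,k_n+j}) - f^-(t_0) \right| + \frac{1}{I_{nk_n} + R_n + 1} \sum_{j=0}^{I_{nk_n}} \left| f(t_{n,k_n+j}) - f^-(t_0) \right| \\
&\le \frac{1}{R_n} \sum_{j=-R_n}^{-1} \left| f(t_{n,k_n+j}) - f^-(t_0) \right| + \frac{1}{R_n} \sum_{j=0}^{I_{nk_n}} \left| f(t_{n,k_n+j}) - f^-(t_0) \right| \\
&\le \max_{j \in \{-R_n, \ldots, -1\}} \left| f(t_{n,k_n+j}) - f^-(t_0) \right| + \frac{I_{nk_n} + 1}{R_n} \max_{j \in \{0, \ldots, I_{nk_n}\}} \left| f(t_{n,k_n+j}) - f^-(t_0) \right| \\
&\le \max_{j \in \{-R_n, \ldots, -1\}} \left| f(t_{n,k_n+j}) - f^-(t_0) \right| + \frac{I_{nk_n} + 1}{R_n} \max_{j \in \{0, \ldots, R_n\}} \left| f(t_{n,k_n+j}) - f^-(t_0) \right|.
\end{aligned}
\tag{3.27}
$$

Case 2: Suppose $I_{nk_n} < 0$. That is, the estimated breakpoint is to the left of $t_0$ ($= t_{nk_n}$). Then

$$
\begin{aligned}
g_{nk_n}(t_{nk_n}, I_{nk_n}) &= \frac{1}{R_n + |I_{nk_n}|} \sum_{j=I_{nk_n}+1}^{R_n} f(t_{n,k_n+j}) \\
&= \frac{1}{R_n + |I_{nk_n}|} \sum_{j=I_{nk_n}+1}^{0} f(t_{n,k_n+j}) + \frac{1}{R_n + |I_{nk_n}|} \sum_{j=1}^{R_n} f(t_{n,k_n+j}).
\end{aligned}
$$

Therefore,

$$
\begin{aligned}
\left| g_{nk_n}(t_{nk_n}, I_{nk_n}) - f^+(t_0) \right| &= \left| \frac{1}{R_n + |I_{nk_n}|} \sum_{j=I_{nk_n}+1}^{0} \left( f(t_{n,k_n+j}) - f^+(t_0) \right) + \frac{1}{R_n + |I_{nk_n}|} \sum_{j=1}^{R_n} \left( f(t_{n,k_n+j}) - f^+(t_0) \right) \right| \\
&\le \frac{1}{R_n + |I_{nk_n}|} \sum_{j=I_{nk_n}+1}^{0} \left| f(t_{n,k_n+j}) - f^+(t_0) \right| + \frac{1}{R_n + |I_{nk_n}|} \sum_{j=1}^{R_n} \left| f(t_{n,k_n+j}) - f^+(t_0) \right| \\
&\le \frac{1}{R_n} \sum_{j=I_{nk_n}+1}^{0} \left| f(t_{n,k_n+j}) - f^+(t_0) \right| + \frac{1}{R_n} \sum_{j=1}^{R_n} \left| f(t_{n,k_n+j}) - f^+(t_0) \right| \\
&\le \frac{|I_{nk_n}|}{R_n} \max_{j \in \{I_{nk_n}+1, \ldots, 0\}} \left| f(t_{n,k_n+j}) - f^+(t_0) \right| + \max_{j \in \{1, \ldots, R_n\}} \left| f(t_{n,k_n+j}) - f^+(t_0) \right| \\
&\le \frac{|I_{nk_n}|}{R_n} \max_{j \in \{-R_n, \ldots, 0\}} \left| f(t_{n,k_n+j}) - f^+(t_0) \right| + \max_{j \in \{1, \ldots, R_n\}} \left| f(t_{n,k_n+j}) - f^+(t_0) \right|.
\end{aligned}
\tag{3.28}
$$

Combining the two cases, we have for any $I_{nk_n}$,

$$
\begin{aligned}
\min\left\{ \left| g_{nk_n}(t_{nk_n}, I_{nk_n}) - f^-(t_0) \right|,\ \left| g_{nk_n}(t_{nk_n}, I_{nk_n}) - f^+(t_0) \right| \right\}
\le\ & \max_{j \in \{-R_n, \ldots, -1\}} \left| f(t_{n,k_n+j}) - f^-(t_0) \right| + \frac{|I_{nk_n}| + 1}{R_n} \max_{j \in \{0, \ldots, R_n\}} \left| f(t_{n,k_n+j}) - f^-(t_0) \right| \\
&+ \frac{|I_{nk_n}|}{R_n} \max_{j \in \{-R_n, \ldots, 0\}} \left| f(t_{n,k_n+j}) - f^+(t_0) \right| + \max_{j \in \{1, \ldots, R_n\}} \left| f(t_{n,k_n+j}) - f^+(t_0) \right|.
\end{aligned}
\tag{3.29}
$$

By Lemma 3, we have $|I_{nk_n}|/R_n \xrightarrow{P} 0$, so clearly $(|I_{nk_n}| + 1)/R_n \xrightarrow{P} 0$ as well. Also, $\max_{j \in \{0, \ldots, R_n\}} |f(t_{n,k_n+j}) - f^-(t_0)| < \infty$ and $\max_{j \in \{-R_n, \ldots, 0\}} |f(t_{n,k_n+j}) - f^+(t_0)| < \infty$ since $f$ is bounded. Therefore, in (3.29),

$$\frac{|I_{nk_n}| + 1}{R_n} \max_{j \in \{0, \ldots, R_n\}} \left| f(t_{n,k_n+j}) - f^-(t_0) \right| \xrightarrow{P} 0$$

and

$$\frac{|I_{nk_n}|}{R_n} \max_{j \in \{-R_n, \ldots, 0\}} \left| f(t_{n,k_n+j}) - f^+(t_0) \right| \xrightarrow{P} 0.$$

Now consider the remaining terms on the right-hand side of (3.29). In the proof of Lemma 3, statement (3.17), we showed that

$$\max_{j \in \{-R_n, \ldots, 0\}} \left| f(t_{n,k_n+j}) - f^-(t_0) \right| \to 0$$

and therefore that

$$\max_{j \in \{-R_n, \ldots, -1\}} \left| f(t_{n,k_n+j}) - f^-(t_0) \right| \to 0.$$

Also in the proof of Lemma 3, statement (3.18), we showed that

$$\max_{j \in \{1, \ldots, R_n\}} \left| f(t_{n,k_n+j}) - f^+(t_0) \right| \to 0.$$

Thus, using inequality (3.29), we can conclude that

$$\min\left\{ \left| g_{nk_n}(t_{nk_n}, I_{nk_n}) - f^-(t_0) \right|,\ \left| g_{nk_n}(t_{nk_n}, I_{nk_n}) - f^+(t_0) \right| \right\} \xrightarrow{P} 0$$

and Lemma 4 is proven. ∎

Consistency of the not-so-smoother, $\hat f$, at continuity points now follows directly from Lemmas 1 and 2, and at discontinuity points from Lemmas 1, 3 and 4. This is stated formally in the two theorems which follow.

Theorem 1 Assume (R1) and (R2) hold. If $t_{nk_n} \to t_0$ where $f$ is continuous at $t_0$, then

$$\left| \hat f(t_{nk_n}) - f(t_0) \right| \to 0.$$

Proof

$$
\begin{aligned}
\left| \hat f(t_{nk_n}) - f(t_0) \right| &= \left| g_{nk_n}(t_{nk_n}, I_{nk_n}) + \gamma_{nk_n}(I_{nk_n}) - f(t_0) \right| \\
&\le \left| g_{nk_n}(t_{nk_n}, I_{nk_n}) - f(t_0) \right| + \left| \gamma_{nk_n}(I_{nk_n}) \right|
\end{aligned}
$$

By Lemma 2 we have that $|g_{nk_n}(t_{nk_n}, I_{nk_n}) - f(t_0)| \to 0$, and by Lemma 1 we have that $|\gamma_{nk_n}(I_{nk_n})| \to 0$. Thus we can conclude that

$$\left| \hat f(t_{nk_n}) - f(t_0) \right| \to 0.$$

∎

By the remarks following Lemma 2, we can replace $f(t_0)$ by $f(t_{nk_n})$ in the above theorem to obtain the slightly different statement that $|\hat f(t_{nk_n}) - f(t_{nk_n})| \to 0$.

Theorem 2 Assume (R1) - (R4) hold.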
If $t_{nk_n} = t_0$ for some $k_n \in \{0, 1, \ldots, n\}$ where $f$ is discontinuous at $t_0$, then

$$\min\left\{ \left| \hat f(t_{nk_n}) - f^-(t_0) \right|,\ \left| \hat f(t_{nk_n}) - f^+(t_0) \right| \right\} \to 0.$$

Proof

$$
\begin{aligned}
&\min\left\{ \left| \hat f(t_{nk_n}) - f^-(t_0) \right|,\ \left| \hat f(t_{nk_n}) - f^+(t_0) \right| \right\} \\
&\quad = \min\left\{ \left| g_{nk_n}(t_{nk_n}, I_{nk_n}) + \gamma_{nk_n}(I_{nk_n}) - f^-(t_0) \right|,\ \left| g_{nk_n}(t_{nk_n}, I_{nk_n}) + \gamma_{nk_n}(I_{nk_n}) - f^+(t_0) \right| \right\} \\
&\quad \le \min\left\{ \left| g_{nk_n}(t_{nk_n}, I_{nk_n}) - f^-(t_0) \right|,\ \left| g_{nk_n}(t_{nk_n}, I_{nk_n}) - f^+(t_0) \right| \right\} + \left| \gamma_{nk_n}(I_{nk_n}) \right|
\end{aligned}
$$

By Lemma 4 we have that $\min\{ |g_{nk_n}(t_{nk_n}, I_{nk_n}) - f^-(t_0)|,\ |g_{nk_n}(t_{nk_n}, I_{nk_n}) - f^+(t_0)| \} \to 0$, and by Lemma 1 we have that $|\gamma_{nk_n}(I_{nk_n})| \to 0$. Thus we can conclude that

$$\min\left\{ \left| \hat f(t_{nk_n}) - f^-(t_0) \right|,\ \left| \hat f(t_{nk_n}) - f^+(t_0) \right| \right\} \to 0.$$

∎

It should be noted that Theorem 2 remains true if we replace $t_{nk_n} = t_0$ with the more general statement $t_{nk_n} \to t_0$. This is because, as stated preceding Lemma 3, Lemma 4 is still valid in this more general setting. However, the details of the proof will not be given in this thesis.

Comment Under the condition that the bandwidth goes to infinity as $n$ goes to infinity but goes to zero relative to $n$, the usual moving average smoother is consistent except at the discontinuity points. Briefly, this is because the neighbourhoods around all continuity points get smaller and smaller until they eventually do not contain a discontinuity. At a point of discontinuity, the moving average estimate converges to the midpoint of the jump, that is, $(f^-(t_0) + f^+(t_0))/2$ where $t_0$ denotes the discontinuity point. The not-so-smooth estimate, on the other hand, becomes arbitrarily close to either $f^-(t_0)$ or $f^+(t_0)$. The failure to converge at only one point (or a few points) may not seem significant asymptotically, but in practice, where $n$ is finite, the implications are important. In the vicinity of a discontinuity and for finite $n$, the moving average smoother is upset by the presence of a discontinuity while the not-so-smoother is not. This is illustrated in the next chapter.
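The contrast drawn in the Comment can be seen in a single neighbourhood centred on a jump. The sketch below is an illustration only: the function names, the seed, the bandwidth of 50, the noise level of 0.05 and the limits $f^-(t_0) = 0$, $f^+(t_0) = 1$ are all assumed values chosen for the example, and the breakpoint is found by a direct minimisation of the broken constant residual sum of squares rather than by any particular routine from the thesis.

```python
import numpy as np

def moving_average_at_centre(window):
    """Simple (unweighted) moving average estimate at the centre of the window."""
    return float(np.mean(window))

def not_so_smooth_at_centre(window):
    """Best broken constant fit: return the mean of the piece containing the centre."""
    x = np.asarray(window, dtype=float)
    m = x.size                      # m = 2R + 1
    R = (m - 1) // 2
    n_left = np.arange(1, m)        # possible sizes of the left piece
    csum, csq = np.cumsum(x), np.cumsum(x * x)
    rss_left = csq[:-1] - csum[:-1] ** 2 / n_left
    rss_right = (csq[-1] - csq[:-1]) - (csum[-1] - csum[:-1]) ** 2 / (m - n_left)
    k = int(np.argmin(rss_left + rss_right))   # left piece = array indices 0..k
    J = k - R                                   # breakpoint as an offset from the centre
    if J >= 0:                                  # centre lies in the left piece
        return float(x[: k + 1].mean())
    return float(x[k + 1:].mean())              # centre lies in the right piece

rng = np.random.default_rng(1)      # arbitrary seed
R = 50                              # illustrative bandwidth
offsets = np.arange(-R, R + 1)
f_minus, f_plus = 0.0, 1.0          # assumed left and right limits at the jump
signal = np.where(offsets <= 0, f_minus, f_plus)   # left-continuous jump at the centre
window = signal + rng.normal(scale=0.05, size=offsets.size)

ma = moving_average_at_centre(window)
nss = not_so_smooth_at_centre(window)
print(ma, nss)
```

With these settings the moving average lands near the midpoint of the jump, while the not-so-smooth estimate stays near one of the two one-sided limits.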
Chapter 4

Performance of The Not-So-Smoother

In this chapter, the performance of the not-so-smoother in application is demonstrated. Before considering the global performance, the local behaviour within a fixed neighbourhood is investigated.

4.1 Local Performance

Recall that in estimating the function, $f$, at a given design point, say $t_{ni}$, we find the best broken constant fit in a neighbourhood around the point. (The neighbourhood includes $2R_n + 1$ points: the $R_n$ points to both the right and left of $t_{ni}$ as well as $t_{ni}$ itself.) The point at which the best broken constant "splits" is called the breakpoint, and the function estimate is taken to be the average of all observations in the neighbourhood to the left or right of the breakpoint, whichever contains the central point, $t_{ni}$.

In order to demonstrate the characteristics of the breakpoint, consider the simple case where the local model is truly two constants.

$$
X_{n,i+j} = \begin{cases} c_1 + \epsilon_{n,i+j} & \text{for } j = -R_n, \ldots, I_{ni} \\ c_2 + \epsilon_{n,i+j} & \text{for } j = I_{ni} + 1, \ldots, R_n \end{cases}
\tag{4.1}
$$

When there is a discontinuity within the neighbourhood, $c_1 \ne c_2$ and $I_{ni}$ indexes the point at which the discontinuity occurs. When the function is continuous within the neighbourhood, $c_1 = c_2$ and $I_{ni}$ becomes irrelevant. Of course, in practice this local model will be violated. However, within a small enough neighbourhood, the function $f$ should not fluctuate too much and the local model should be an adequate approximation.

[Figure 4.1: Histograms of breakpoints when no discontinuity exists. Panels: $R_n = 10$, $R_n = 50$, $R_n = 100$; horizontal axis: breakpoint.]

Suppose we are in the case where no discontinuity occurs. Then we would prefer that the average used as the estimate of $f(t_{ni})$ includes as many observations as possible.
Regardless of the location of the estimated breakpoint, the estimate will have no bias since all observations have the same mean, but the more observations used in the average, the smaller the variance of the estimate and thus the greater the efficiency. Consequently, we would like the breakpoint to be at either extreme of the neighbourhood.

We do not know the theoretical distribution of the breakpoint but the empirical distribution can be studied. We considered the case where the error terms are independent, standard normal random variables (mean 0 and standard deviation $\sigma = 1$) and we took $c_1$ ($= c_2$) arbitrarily to be zero. We randomly generated $2R_n + 1$ standard normal observations and determined the breakpoint by finding the argument that minimizes equation (2.2). After repeating this 1000 times, we looked at the histogram of the breakpoints. This was done for varying neighbourhood sizes, namely $R_n$ = 10, 50, and 100. Histograms of the results are given in Figure 4.1.

We can see that the breakpoint does have a tendency to be located near the edges of the neighbourhood and this tendency becomes stronger as the number of points in the neighbourhood increases, suggesting that $\min\{ |I_{ni}/R_n - 1|,\ |I_{ni}/R_n + 1| \}$ tends to zero as the neighbourhood grows.

At this point we should note that in order for the locally constant model to be approximately true, the neighbourhood must be small, yet to get an efficient estimate, a large number of points in the neighbourhood is needed. This can be achieved if the number of observations in the data set is very large and the observations are close together, since this would allow us to define a neighbourhood size that is small relative to the total amount of data but still contains numerous observations. Note that the asymptotic equivalents of these conditions, stated in requirements (R1) and (R2) at the beginning of Chapter 3, were needed to prove consistency.
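The no-discontinuity simulation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for Figure 4.1: the function name `breakpoints`, its arguments and the seed are hypothetical, and the breakpoint is taken to be the offset $J$ minimising the broken constant residual sum of squares, with offsets $-R_n, \ldots, J$ on the left piece. The `delta` argument shifts the right-hand constant $c_2$ of local model (4.1) and is zero here.

```python
import numpy as np

def breakpoints(R, delta=0.0, sigma=1.0, reps=1000, seed=0):
    """Simulate `reps` neighbourhoods of 2R+1 points from local model (4.1) with
    c1 = 0, c2 = delta and the true break just after the centre; return the
    estimated breakpoint offsets (values in -R, ..., R-1)."""
    rng = np.random.default_rng(seed)
    m = 2 * R + 1
    mean = np.where(np.arange(-R, R + 1) <= 0, 0.0, delta)
    x = mean + rng.normal(scale=sigma, size=(reps, m))
    n_left = np.arange(1, m)                     # sizes of the left piece
    csum = np.cumsum(x, axis=1)
    csq = np.cumsum(x * x, axis=1)
    rss_left = csq[:, :-1] - csum[:, :-1] ** 2 / n_left
    rss_right = (csq[:, -1:] - csq[:, :-1]) - (csum[:, -1:] - csum[:, :-1]) ** 2 / (m - n_left)
    return np.argmin(rss_left + rss_right, axis=1) - R

for R in (10, 50, 100):
    J = breakpoints(R)                           # delta = 0: no discontinuity
    counts, _ = np.histogram(J, bins=10, range=(-R, R))
    print(R, counts)                             # ten equal-width bins across the neighbourhood
```

The printed bin counts play the role of the histograms: if the breakpoint favours the edges of the neighbourhood, the outer bins should be fuller than the central ones.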
Now, when we are in the case where the neighbourhood contains a discontinuity, we expect the breakpoint to be at, or at least very near, the location of the discontinuity. This is the fundamental idea behind the not-so-smoother. If the breakpoint is correctly located near the jump, then the estimate of the central point will be an average of points all or most of which lie on the same side of the jump. Intuitively, the larger the magnitude of the jump and the smaller the variability in the data, the more likely the breakpoint will be located at the jump. Denote the jump size by $\delta$, which in the local model is just $|c_1 - c_2|$, and the standard deviation of the data by $\sigma$ as before. Then if we consider the ratio $\delta/\sigma$, we expect to be more successful identifying the jumps as the ratio increases. Of course, the number of points in the neighbourhood also affects the results. Presumably, with more observations the jump is easier to identify.

[Figure 4.2: Histograms of breakpoints for various $\delta$ and $\sigma$ values keeping the ratio $\delta/\sigma$ constant. Panels: $\delta/\sigma$ = 0.5, 1 and 2, each with $\sigma$ = 1, 10 and 50; horizontal axis: breakpoint.]

We can investigate empirically the behaviour of the breakpoint in the presence of a discontinuity. The discontinuity can occur anywhere within the neighbourhood, but we chose the center for our simulations, meaning $I_{ni} = 0$ in the local model. We considered the case where the error terms are independent, normal random variables with mean 0 and standard deviation $\sigma = 1$.
Data were generated corresponding to a given $\delta/\sigma$ value and a given neighbourhood size, and the breakpoint was determined. Specifically we looked at the cases where $R_n$ equals 10, 50, and 100, and for each value of $R_n$ we looked at $\delta/\sigma$ equal to 0.5, 1, and 2. Intuition suggests that the distribution of the breakpoint depends only on the ratio $\delta/\sigma$ and not the individual values of $\delta$ and $\sigma$. Our simulations seem to confirm this. Refer to Figure 4.2 where histograms of the breakpoint for varying $\delta$ and $\sigma$ values keeping the ratio constant are presented. Note that the bandwidth is taken to be 10 in all results shown. The histograms look almost identical when the ratio $\delta/\sigma$ is the same. Thus, the actual values of $\delta$ and $\sigma$ used in our simulations appear to be irrelevant; however, for completeness, note that we consistently took $\sigma$ to be 1.

In each case, 1000 replications were done and the resulting breakpoints reported in histograms. These results are shown in Figure 4.3. As we expect, the breakpoint is correctly located at the center of the neighbourhood more frequently as $R_n$ increases and as $\delta/\sigma$ increases.

Recall that Lemma 3 in the previous chapter states that when the function is discontinuous at the point being estimated, meaning the discontinuity occurs in the center of the neighbourhood, the ratio of the breakpoint to the bandwidth size, $R_n$, converges to zero, the relative location of the discontinuity. The results of our simulations are consistent with this lemma. The rate of convergence, however, is highly dependent on the value of $\delta/\sigma$. We see that when $\delta/\sigma$ equals 0.5 the rate is considerably slower than when it equals 1 or 2.

[Figure 4.3: Histograms of breakpoints when a discontinuity exists in the center of the neighbourhood. Panels: $\delta/\sigma$ = 0.5, 1 and 2, each with $R_n$ = 10, 50 and 100; horizontal axis: breakpoint.]

4.2 Global Performance

Now that the local behaviour has been considered, we will look at the not-so-smoother's global performance on some data sets. As with other smoothing methods, a bandwidth must be chosen before applying the method, and, as with other smoothing methods, the best way to make this choice is not clear. In general, a large bandwidth leads to a small variance but a large bias in the estimate, whereas a small bandwidth leads to a small bias but a large variance. This is often referred to as the "bias-variance trade-off". We will not discuss the problem of bandwidth selection in depth in this thesis. Suffice it to say that cross-validation and plug-in methods, which are used in the case of continuous functions, can presumably be derived and applied here.

Simulated Data

To begin with, we will evaluate the not-so-smoother's performance on simulated data, in which case the true function is known. The bandwidth which minimizes the squared error between the estimated function and the true function at the design points can be determined; we will refer to this as the optimal bandwidth. The optimal performance of the not-so-smoother will be compared with the optimal performance of a simple (unweighted) moving average smoother.

A continuous function is considered first. We expect the optimal performance of the not-so-smoother to be worse than the optimal performance of a simple moving average smoother when the true function is continuous; how much worse is of interest. The function chosen was one cycle of the sine curve.
One hundred observations were generated at equal increments over the interval $[0, 2\pi)$ and the noise generated was normal with standard deviation 0.3. The model can be written as

$$X_i = \sin(t_i) + \epsilon_i, \qquad t_i = \frac{2\pi i}{100}, \quad i = 0, \ldots, 99,$$

where $\epsilon_i \sim \text{Normal}(0, \sigma = 0.3)$. Note that the function was chosen such that no discontinuities are introduced by replicating the data set when estimating near the ends.

Five hundred data sets were generated according to this model. For each simulated data set, the not-so-smooth estimate of the function was calculated using every bandwidth from 1 to 50 (half the total number of observations). For each bandwidth, the mean squared error between the estimate and the function at the design points was calculated. The optimal bandwidth is the one leading to the smallest mean squared error, which we will call the optimal mse. Specifically, for each bandwidth $R \in \{1, \ldots, 50\}$,

$$\mathrm{MSE}(R) = \frac{1}{n + 1} \sum_{i=0}^{n} \left( \hat f_R(t_i) - f(t_i) \right)^2$$

was calculated, where $n = 99$, $t_i = (2\pi i)/100$, and $\hat f_R$ denotes the not-so-smooth corresponding to bandwidth $R$. We call the minimum value of $\mathrm{MSE}(R)$ the optimal mse and the value of $R$ minimizing it the optimal bandwidth. In exactly the same way, the optimal bandwidth and optimal mse using a simple moving average smooth were determined for each data set, where the only difference is that $\hat f_R$ now refers to the moving average estimator.

Table 4.1: Summary results of smoothing methods on simulated data (500 replications in each case)

                                   Moving Average Smooth                  Not-So-Smooth
                                   mean optimal      mean optimal        mean optimal      mean optimal
function                           bandwidth (sd)    mse (sd)            bandwidth (sd)    mse (sd)
sine, N(0, .3) noise               8.458 (1.561)     0.006 (0.003)       4.390 (0.942)     0.027 (0.005)
split cube-root, N(0, .2) noise    1.650 (0.604)     0.020 (0.002)       7.930 (2.379)     0.008 (0.002)

The results of the 500 simulations confirm that the moving average smooth consistently achieves a smaller minimum mse than the not-so-smooth.
Also, the bandwidth corresponding to the minimum mse is consistently larger for the moving average smooth than the not-so-smooth. A summary of the average optimal mse and bandwidth for the two methods is presented in the first row of Table 4.1.

The optimal not-so-smooth and the optimal moving average smooth for one simulation are shown in Figure 4.4. The not-so-smoother appears to be suitably named. Although both smooths follow the curve of the data well, the not-so-smooth is much jumpier due to the fact that it "breaks" within every neighbourhood even when a discontinuity is not present. Methods to improve upon this will be presented in Chapter 5.

[Figure 4.4: Optimal smooths of one cycle of the sine function with Normal(0, 0.3) errors. Surviving panel title: Optimal Moving Average Smooth, $R_n = 7$ and mse = 0.0078; legend: observed, estimate, true.]

Next we consider a function with one discontinuity. We expect the not-so-smoother to perform better than a moving average smoother in such a case. The function chosen was the cube root function in the range -1 to 1, with the section from -1 to 0 shifted upwards by 2. One hundred observations were generated at equal increments over the interval $[-1, 1)$ and the noise generated was normal with standard deviation 0.2. The model can be written as

$$X_i = f(t_i) + \epsilon_i, \qquad f(t) = \begin{cases} t^{1/3} + 2 & \text{for } -1 \le t < 0 \\ t^{1/3} & \text{for } 0 \le t < 1, \end{cases} \qquad t_i = -1 + \frac{2i}{100}, \quad i = 0, \ldots, 99,$$

where $\epsilon_i \sim \text{Normal}(0, \sigma = 0.2)$. Again note that the function was chosen such that no discontinuities are introduced when estimating at design points near the ends of the data set.

Five hundred data sets were generated according to this model. As before, the optimal mse and the corresponding optimal bandwidth for both the not-so-smoother and the moving average smoother were found for each data set.
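The search for the optimal bandwidth can be sketched for a single simulated data set from the split cube-root model. This is a minimal sketch, not the thesis code: the function names, the seed and the vectorised breakpoint search are illustrative choices, and the treatment of the ends (wrapping the data around periodically) is an assumption about what "replicating the data set" means.

```python
import numpy as np

def windows(x, R):
    """All (2R+1)-point neighbourhoods of x, one row per design point,
    wrapping around periodically at the ends (an assumed boundary treatment)."""
    n = x.size
    idx = (np.arange(n)[:, None] + np.arange(-R, R + 1)[None, :]) % n
    return x[idx]

def moving_average(x, R):
    """Simple unweighted moving average with bandwidth R."""
    return windows(x, R).mean(axis=1)

def not_so_smooth(x, R):
    """For each neighbourhood, fit the best broken constant and return the mean
    of the piece that contains the central point."""
    w = windows(x, R)
    m = 2 * R + 1
    rows = np.arange(w.shape[0])
    n_left = np.arange(1, m)
    csum, csq = np.cumsum(w, axis=1), np.cumsum(w * w, axis=1)
    rss = (csq[:, :-1] - csum[:, :-1] ** 2 / n_left
           + (csq[:, -1:] - csq[:, :-1])
           - (csum[:, -1:] - csum[:, :-1]) ** 2 / (m - n_left))
    k = np.argmin(rss, axis=1)                       # left piece = columns 0..k
    left_mean = csum[rows, k] / (k + 1)
    right_mean = (csum[:, -1] - csum[rows, k]) / (m - k - 1)
    return np.where(k >= R, left_mean, right_mean)   # column R is the centre

def optimal(smoother, x, truth, bandwidths):
    """Return (optimal bandwidth, optimal mse) over the candidate bandwidths."""
    mse = [float(np.mean((smoother(x, R) - truth) ** 2)) for R in bandwidths]
    best = int(np.argmin(mse))
    return bandwidths[best], mse[best]

rng = np.random.default_rng(2)                       # arbitrary seed
t = -1 + 2 * np.arange(100) / 100
truth = np.cbrt(t) + np.where(t < 0, 2.0, 0.0)       # split cube-root function
x = truth + rng.normal(scale=0.2, size=t.size)
bandwidths = list(range(1, 51))

ma_R, ma_mse = optimal(moving_average, x, truth, bandwidths)
nss_R, nss_mse = optimal(not_so_smooth, x, truth, bandwidths)
print("moving average:", ma_R, ma_mse)
print("not-so-smooth: ", nss_R, nss_mse)
```

Repeating the last block over many seeds and averaging the four returned quantities would give a summary of the same kind as a row of Table 4.1, although exact agreement with the reported values should not be expected given the assumptions above.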
In each of the 500 simulations the not-so-smoother achieved a smaller optimal mse and also had a substantially larger optimal bandwidth than the moving average smoother. A summary of the mean results is given in the second row of Table 4.1. Note that, on average, the optimal bandwidth for the moving average smoother was just 1.65. If the bandwidth is large, then all points near the discontinuity will have highly biased estimates. Because the increase in bias which results from using a large bandwidth is greater than the decrease in variability, the mse is minimized at a very small bandwidth. This leads to a very rough smooth overall, as we illustrate next.

The optimal not-so-smooth and the optimal moving average smooth for one simulation are shown in Figure 4.5. The optimal moving average smooth uses a bandwidth of 1 and therefore does not smooth away the discontinuity too much, but it is indeed very rough. The not-so-smoother preserves the discontinuity even better and gives a much smoother estimate overall, as reflected by the smaller mse.

The ideal situation in which to apply the not-so-smoother is when the underlying function is two distinct constants. To give a complete evaluation of our method's performance, we should investigate such a situation. Data were generated according to the following model, which sets one of the constants equal to 0 for convenience:

$$X_i = \begin{cases} \epsilon_i & i = 0, \ldots, 24 \\ \delta + \epsilon_i & i = 25, \ldots, 49 \end{cases}$$

where $\epsilon_i \sim \text{Normal}(0, \sigma = 1)$. The magnitude of the jump, $\delta$, relative to the standard deviation in the noise, $\sigma$, which in this model is 1, was varied. Similar to when we investigated local performance of the not-so-smoother, we considered $\delta/\sigma$ equal to 0.5, 1 and 2, and in addition we considered $\delta/\sigma$ equal to 0 since this represents the continuous case. One thousand simulations were carried out for each $\delta/\sigma$ value.
For every data set, the optimal mse and optimal bandwidth using the not-so-smoother and the moving average smoother were found. Table 4.2 summarizes the results of the simulations.

[Figure 4.5: Optimal smooths of a split cube-root function with Normal(0, 0.2) errors. Surviving panel title: Optimal Moving Average Smooth, $R_n = 1$ and mse = 0.020; legend: observed, estimate, true.]

Table 4.2: Summary results of smoothing methods for data simulated according to the two-constant model (1000 replications in each case)

                   Moving Average Smooth                  Not-So-Smooth
                   mean optimal      mean optimal        mean optimal      mean optimal
$\delta/\sigma$    bandwidth (sd)    mse (sd)            bandwidth (sd)    mse (sd)
0                  25.000 (0.000)    0.019 (0.025)       23.381 (2.432)    0.041 (0.032)
0.5                13.734 (4.542)    0.209 (0.126)       20.846 (4.677)    0.272 (0.159)
1.0                8.170 (3.004)     0.106 (0.047)       17.118 (5.304)    0.109 (0.061)
2.0                4.288 (1.858)     0.213 (0.054)       14.291 (3.702)    0.120 (0.094)

As we expect, the moving average smoother performs better than the not-so-smoother in the continuous case (when $\delta/\sigma = 0$). It has a smaller average optimal mse, and more specifically, its optimal mse is smaller in 932 of the 1000 cases.

When $\delta/\sigma = 0.5$, the moving average smoother again performed better with respect to having a smaller average optimal mse. Also, in 685 of the 1000 replications, the optimal moving average mse was smaller than the optimal not-so-smooth mse. This is not too surprising because we saw in our investigation of the not-so-smoother's local behaviour that when $\delta/\sigma = 0.5$ and the bandwidth is small, the discontinuity is not located very accurately. Furthermore, the moving average smoother will not be too biased in the vicinity of the discontinuity when the jump size is relatively small.

When $\delta/\sigma = 1$, the two methods perform equally well in terms of mse.
We see from the table that the average optimal mse is almost identical for both methods (taking the standard deviations into account, the difference is insignificant). In addition, the optimal not-so-smooth mse was smaller in approximately half of the replications, 524 out of 1000 to be exact.

When $\delta/\sigma = 2$, the not-so-smoother performs much better, having a smaller optimal mse than the moving average smooth in 861 of the 1000 replications and also having a smaller optimal mse on average (refer to Table 4.2). It seems clear from these results that as the ratio $\delta/\sigma$ increases, the not-so-smoother becomes increasingly preferable to the moving average smoother.

Lastly, we compare the optimal bandwidth sizes. For both smoothing methods, the average optimal bandwidth decreased as the ratio $\delta/\sigma$ increased. Moreover, when $\delta/\sigma = 0$, the moving average smooth tended to have a slightly larger optimal bandwidth than the not-so-smooth. For each non-zero value of $\delta/\sigma$, the not-so-smooth tended to have a larger optimal bandwidth. This is consistent with the results in Table 4.1 in which the moving average smooth had a larger mean optimal bandwidth in the continuous case (sine curve data), and the not-so-smooth had a larger mean optimal bandwidth in the discontinuous case (split cube-root function).

Real Data

Using a real data set for which the true signal is unknown will give a more practical demonstration of the not-so-smoother's performance. We considered data for which each observation is the measurement of current flowing through an ion channel in a cell membrane (Fredkin and Rice, 1990). Theoretically, the current has two, and perhaps more, conductance levels between which it seems to switch randomly. Due to the method used to measure the current, the noise can be great compared to the current size. The data set has a total of 16000 current recordings from which we chose a subset of 500 observations to investigate here.
A plot of this subset is shown in Figure 4.6. As with the simulated data, we compared the performance of the not-so-smoother and the moving average smoother on the current recordings data. To avoid the problem of bandwidth selection, we applied both smoothing methods using a range of bandwidths, namely Rn = 5, 15 and 30. Quantitative methods to determine the best choice of bandwidth could be used, but we simply evaluated the smooths visually. Plots of all the smooths are given in Figure 4.7. Consistent with the results of our simulations on discontinuous data, a much smaller bandwidth appears best for the moving average smoother than for the not-so-smoother, since there are seemingly several discontinuities. With a bandwidth of 5, the moving average smooth identifies the jumps quite clearly but produces jagged output as the cost. As the bandwidth increases, the output becomes progressively smoother but the jumps become more blurred. The not-so-smoother, on the other hand, also produces progressively smoother output as the bandwidth increases, yet the jumps are still preserved and, in fact, become sharper. Although the true signal is unknown, the results strongly suggest that the current switched between a high and a low conductance level at four times, indexed approximately by 165, 222, 280 and 348. In addition, the not-so-smooth using a bandwidth of 5 provides some evidence that the current switched levels rapidly, perhaps to an intermediate level, a couple of times between indices 280 and 348.

[Figure 4.6: Measurements of current flowing through an ion channel in a cell membrane]

[Figure 4.7: Smooths of the current data using the not-so-smoother and the moving average smoother for various bandwidths]

This data set provides a good opportunity to see how the not-so-smoother behaves when the bandwidth is so large that a neighbourhood contains two or more discontinuities. In such a case, the local two-constant model is grossly violated and we can expect poor results. If the minimum number of observations between any two discontinuities is d, then taking the bandwidth to be greater than d/2 will violate the local model. For the ion current data, the minimum number of observations between discontinuities is about 60 (assuming that the discontinuity points are in fact indexed approximately by 165, 222, 280 and 348). Thus, using any bandwidth exceeding 30 will result in neighbourhoods containing more than one discontinuity. We calculated the not-so-smooth for Rn = 50 and the result is shown in Figure 4.8. Undesirable peaks in the estimate occur between the discontinuities because, in neighbourhoods containing two discontinuities, no matter where the best broken constant fit breaks, the estimate will still be an average of observations on both sides of one of the discontinuities.

[Figure 4.8: Not-so-smooth of the current data using a bandwidth of 50]

Chapter 5 Extensions

5.1 The Somewhat-Smoother

The examples in the previous chapter illustrated that the not-so-smoother performs well at preserving discontinuities, but often produces rough output. This is due to the nature of the smoothing method: the best local fit always breaks, even when no discontinuity exists, so only a subset of the neighbourhood data is averaged for each estimate. One way to reduce the roughness of the not-so-smoother is, for each neighbourhood, to compare the fit of the best broken constant with the fit of the best constant, which is just the mean of all observations in the neighbourhood. Unless the broken constant fit is substantially better, the mean should be used as the estimate.
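The two local fits being compared can be sketched in Python as follows. This is a hedged illustration rather than the thesis's own implementation: the function names are hypothetical, neighbourhoods are assumed to be truncated at the ends of the data, each side of a break is assumed to contain at least one observation, and ties between candidate breakpoints go to the earliest one.

```python
def best_constant_fit(window):
    """Return (mean, residual sum of squares) of the best constant fit."""
    m = sum(window) / len(window)
    return m, sum((v - m) ** 2 for v in window)


def best_broken_constant_fit(window):
    """Return (split, left_mean, right_mean, rss) of the best two-constant fit.

    The first `split` observations form the left segment. Every split that
    leaves at least one observation on each side is tried (an assumption).
    """
    best = None
    for split in range(1, len(window)):
        left_mean, left_rss = best_constant_fit(window[:split])
        right_mean, right_rss = best_constant_fit(window[split:])
        rss = left_rss + right_rss
        if best is None or rss < best[3]:
            best = (split, left_mean, right_mean, rss)
    return best


def not_so_smooth(x, r):
    """Not-so-smooth with bandwidth r: at each point, the estimate is the mean
    of the segment of the best broken constant fit that contains the point."""
    n = len(x)
    smooth = []
    for i in range(n):
        lo = max(0, i - r)
        window = x[lo:min(n, i + r + 1)]
        if len(window) < 2:
            # A single observation cannot be split; use it directly.
            smooth.append(float(window[0]))
            continue
        split, left_mean, right_mean, _ = best_broken_constant_fit(window)
        centre = i - lo  # position of the point of interest within the window
        smooth.append(left_mean if centre < split else right_mean)
    return smooth


# Noise-free step: the jump is reproduced without blurring.
print(not_so_smooth([0, 0, 0, 0, 5, 5, 5, 5], 2))
```

Comparing the `rss` returned by `best_broken_constant_fit` with the one returned by `best_constant_fit` on the same window gives the raw ingredients for deciding whether the broken fit is substantially better.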
In other words, the not-so-smooth estimate should be used if the broken constant fit is much better, since this is evidence of a break, but otherwise the moving average estimate should be used. A hypothesis test can be used to determine if the broken constant fit is substantially better.

Consider a fixed neighbourhood about a point $x_{n,i}$. Assume that the local model given by equation (4.1), which says that the true function within the neighbourhood is two constants, holds. For convenience, we repeat the model here:

$$X_{n,i+j} = \begin{cases} c_1 + \epsilon_{n,i+j} & \text{for } j = -R_n, \ldots, I_{n,i}, \\ c_2 + \epsilon_{n,i+j} & \text{for } j = I_{n,i}+1, \ldots, R_n. \end{cases}$$

Recall that the ε's are assumed to be independent and identically distributed with mean 0 and variance σ². We want to test the null hypothesis that the two constants are equal versus the alternative that they are not equal. If the null is rejected, then the not-so-smooth estimate is used. The natural estimators of c1 and c2 are the sample means of $X_{n,i-R_n}, \ldots, X_{n,i+I_{n,i}}$ and $X_{n,i+I_{n,i}+1}, \ldots, X_{n,i+R_n}$ respectively, which in keeping with the notation of Chapter 3 are denoted by $\bar{X}_{-R_n:I_{n,i}}$ and $\bar{X}_{I_{n,i}+1:R_n}$. Of course, $I_{n,i}$ is unknown and must be replaced by its estimate, $\hat{I}_{n,i}$. The variance, σ², is also unknown in most situations and therefore must be estimated. A two-sample t-test for the difference in means can be used:

$$H_0 : c_1 = c_2 \quad \text{vs.} \quad H_A : c_1 \neq c_2,$$

$$t_s = \frac{\bar{X}_{-R_n:\hat{I}_{n,i}} - \bar{X}_{\hat{I}_{n,i}+1:R_n}}{s_p \sqrt{\dfrac{1}{\hat{I}_{n,i} + R_n + 1} + \dfrac{1}{R_n - \hat{I}_{n,i}}}},$$

where

$$s_p^2 = \frac{1}{2R_n - 1} \left( \sum_{j=-R_n}^{\hat{I}_{n,i}} \left( X_{n,i+j} - \bar{X}_{-R_n:\hat{I}_{n,i}} \right)^2 + \sum_{j=\hat{I}_{n,i}+1}^{R_n} \left( X_{n,i+j} - \bar{X}_{\hat{I}_{n,i}+1:R_n} \right)^2 \right).$$

If we assume that the error terms are normally distributed (or alternatively, if the number of points in each average is large), then $t_s$ has (approximately) a t-distribution with 2Rn − 1 degrees of freedom. Therefore, we reject H0 at level α and conclude that there is significant evidence of a break in the neighbourhood if $|t_s| > t_{2R_n-1}(1 - \alpha)$, the (1 − α)th quantile of the $t_{2R_n-1}$ distribution.
When α = 0, we never reject the null hypothesis and the moving average smooth is the result. Conversely, when α = 1, the null is always rejected and the not-so-smooth is the result. Thus, by varying the significance level we can control the amount of smoothing done. Because the amount of smoothing ranges between that of the moving average smoother and the not-so-smoother, we will appropriately name this modified smoothing method the somewhat-smoother.

[Figure 5.1: Somewhat-smooths of the sine data using Rn = 5 and various levels of α]

In the previous chapter, we considered data generated from a sine curve with normally distributed noise. The not-so-smoother produced very rough output. The purpose of the somewhat-smoother is to improve upon this. To demonstrate, we use the same data shown in Figure 4.4 and apply the somewhat-smoother using α equal to 0, 0.01, and 1 and using a bandwidth of 5 in all cases. Graphs of the smooths are given in Figure 5.1. The somewhat-smoother with α = 0.01 provides a less jagged estimate than the not-so-smoother (α = 1). Of course, when the underlying function is continuous, as is the case with the sine data, the moving average smoother (α = 0) still performs best.

However, when the possibility of a discontinuity exists, the somewhat-smoother allows for the preservation of the jump while producing smoother results in the continuous stretches. To illustrate, we will again consider a data set used previously, namely the split cube-root data shown in Figure 4.5. This figure shows that the optimal not-so-smooth preserved the discontinuity perfectly but produced fairly rough output. Thus, we applied the somewhat-smoother to the data using the same bandwidth as the optimal not-so-smooth (Rn = 9) for the purpose of comparison. The level of significance used was α = 0.0001.
Although this level may seem very small, it was chosen because it produced quite smooth output yet the test still detected the jump. The somewhat-smooth is given in Figure 5.2, as is a replicate of the optimal not-so-smooth, and the results look much improved.

The drawback to using the somewhat-smoother is that there are now two parameters, Rn and α, to be selected. In our examples, α was chosen visually but, as with the selection of an optimal bandwidth, we would like to develop quantitative methods for choosing the optimal α value.

[Figure 5.2: Not-so-smooth and somewhat-smooth with α = 0.0001 of the split cube-root data using Rn = 9. Panel shown: Somewhat-Smooth with alpha = 0.0001, Rn = 9 and mse = 0.0040; legend: observed, estimate, true.]

5.2 The Not-So-Smoother using Local Linear Fits

There are situations when using local constant fits within each neighbourhood will not provide a good fit. As an example, we consider the "sawtooth" function used in the paper by McDonald and Owen (1986) as well as that by Hall and Titterington (1992). The function consists of two line segments rising from 0 to 1, one between 0 and 0.5 and the other between 0.5 and 1. There are 256 equally spaced data points with normal noise added. The standard deviation of the noise is taken to be half that of the function; that is,

$$\sigma = \frac{1}{2} \left\{ \int_0^{0.5} (x - 0.5)^2 \, 2x \, dx + \int_{0.5}^{1} (x - 0.5)^2 (2x - 1) \, dx \right\}^{1/2}.$$

Because the true function is nowhere approximately constant, applying the not-so-smoother gives a very jagged estimate. Figure 5.3 shows the optimal not-so-smooth according to the minimum mse criterion. Although it is less jagged than the optimal moving average smooth would be, we would still prefer a smoother estimate. Using the somewhat-smoother would only marginally improve the results because it still assumes that the local two-constant model holds.

This example leads us naturally to consider using local linear fits rather than local constant fits. Within each neighbourhood, the two lines which minimize the mean squared error are found. This involves minimizing the function $H^*_{R_n}(J, \alpha_1, \beta_1, \alpha_2, \beta_2)$, the mean squared error of the two-line fit within the neighbourhood, with respect to its five parameters. The value of J which minimizes the function is the estimated breakpoint, the values of α1 and β1 which minimize the function are estimates of the intercept and slope of the first line respectively, and the values of α2 and β2 which minimize the function are estimates of the intercept and slope of the second line respectively.

[Figure 5.3: Optimal not-so-smooths of the sawtooth function with noise using local constant fits and local linear fits. Panel shown: Optimal Not-So-Smooth using Local Broken Linear Fits, Rn = 66 and mse = 0.00039; legend: observed, estimate, true.]

The not-so-smoother using local linear fits was applied to the sawtooth data and the optimal smooth was found (see Figure 5.3). The benefit of using linear rather than constant fits is evident in this case. The estimated function is almost indistinguishable from the true function over most of the range. Because the sawtooth function consists of two line segments, it is not surprising that using linear fits leads to a good estimate. However, this method involves estimating a greater number of parameters. If the optimal neighbourhood size using linear fits is sufficiently larger than that using constant fits, then no loss is incurred from the additional parameter estimation. Whether this will in fact be the case depends on the data. When the data are such that the locally constant model is approximately true, the original not-so-smoother or somewhat-smoother may be preferable, but otherwise, using the not-so-smoother with local linear fits is likely to give better results.
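The local broken linear fit described in this section could be computed along the following lines. This is a sketch under assumptions that do not come from the thesis: the function names are hypothetical, each of the two lines is fitted by ordinary least squares, each segment is required to hold at least two observations so that a slope can be estimated, the criterion is the residual sum of squares (which has the same minimizer as the mean squared error for a fixed neighbourhood), and ties go to the earliest breakpoint.

```python
def line_fit(xs, ys):
    """Least squares line through (xs, ys): returns (intercept, slope, rss)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0  # flat line if x does not vary
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, rss


def best_broken_linear_fit(xs, ys, min_points=2):
    """Search all breakpoints; return (split, a1, b1, a2, b2, rss).

    The first `split` points are fitted by the line a1 + b1 * x and the
    remaining points by a2 + b2 * x.
    """
    best = None
    for split in range(min_points, len(xs) - min_points + 1):
        a1, b1, rss1 = line_fit(xs[:split], ys[:split])
        a2, b2, rss2 = line_fit(xs[split:], ys[split:])
        rss = rss1 + rss2
        if best is None or rss < best[5]:
            best = (split, a1, b1, a2, b2, rss)
    return best


def broken_linear_estimate(xs, ys, centre):
    """Estimate at position `centre`: the fitted value of whichever line
    covers that position in the best broken linear fit."""
    split, a1, b1, a2, b2, _ = best_broken_linear_fit(xs, ys)
    if centre < split:
        return a1 + b1 * xs[centre]
    return a2 + b2 * xs[centre]


# Noise-free sawtooth on an integer grid: two segments with slope 2.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 2, 4, 6, 0, 2, 4, 6]
print(best_broken_linear_fit(xs, ys))
```

Applying `broken_linear_estimate` to the neighbourhood of every data point, with `centre` set to that point's position in its neighbourhood, would give a smooth of the kind discussed above.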
Chapter 6 Conclusion and Discussion

In Chapter 2, we proposed a smoothing method which, when applied in situations where the regression function being estimated is discontinuous, is designed to preserve the discontinuities. This smoother, termed the not-so-smoother, was shown to be consistent under very general conditions.

The performance of the not-so-smoother, both locally (within a neighbourhood) and globally (on an entire data set), was thoroughly investigated. Using simulated data for which the underlying function was known, we could evaluate the accuracy with which the discontinuities were located and the function was estimated. Overall, the not-so-smoother was successful at estimating functions with discontinuities, with greater success as the number of points within the neighbourhood increased and as the ratio of the size of the discontinuity to the variability in the data increased.

The performance of a smoothing method is highly dependent on the choice of the neighbourhood size, or bandwidth. By using simulated data sets, we were able to determine the optimal bandwidth, meaning the one which minimized the mean squared error. Thus we could compare the optimal performance of the not-so-smoother with the optimal performance of a moving average smoother. When the regression function was continuous, the moving average smoother performed better, as the not-so-smoother produced a much rougher estimate. When the regression function was discontinuous, the relative performance of the two methods depended on the ratio of the discontinuity size to the standard deviation in the data. Our simulations suggest that when the ratio is less than 1, the moving average smoother performs better. As long as the ratio is greater than 1, the not-so-smoother is superior, becoming increasingly better as the ratio increases.
In this case, not only does the optimal not-so-smooth preserve the edges more sharply than the optimal moving average smooth, but it also produces smoother output, since the optimal moving average smooth requires such a small bandwidth that it is very jumpy. Note that we are evaluating the smooths based solely on the minimum mean squared error criterion; if the ability to preserve an edge were the criterion used, then the not-so-smoother would be preferable whenever a discontinuity is present.

The not-so-smoother is designed such that it assumes, essentially, that a discontinuity exists within each neighbourhood. This property can result in jagged output over continuous segments of the data, and a desire to reduce this jaggedness led us to consider a modified method called the somewhat-smoother. The somewhat-smoother, presented in Chapter 5, combines the not-so-smoother with the moving average smoother. A test is performed within each neighbourhood to determine if the data provide sufficient evidence that a discontinuity exists. If so, the not-so-smooth estimate of the point is used; otherwise, the moving average estimate is used. Examples illustrated that the somewhat-smoother can achieve the goal of producing smooth output while still preserving the discontinuities.

We considered another modification to the not-so-smoother in which the best piecewise linear, rather than constant, function with exactly one simple discontinuity is identified within each neighbourhood. When the regression function is not approximately flat, even within small segments, as with the sawtooth function, using constant fits gives a rough estimate. Using linear fits can improve the results greatly, but requires more parameter estimation.

In summary, if it is known that the underlying function is continuous, then there is no advantage to applying the not-so-smoother.
The not-so-smoother will not perform as well as, say, a moving average smoother, and computationally it takes much longer. However, if it is suspected that the function being estimated is discontinuous, or if no information about the function is available, then using the not-so-smoother or one of its modifications can be highly beneficial.

As always, there is further work to be done. Consistency of the not-so-smoother was proven, but more work is required to determine the asymptotic behaviour. We expect that the not-so-smoother will converge much faster than the traditional smoothing techniques when there is a discontinuity, and at the same speed when there is not.

Bandwidth selection was mentioned briefly in Chapter 4. How to choose the best bandwidth first requires a criterion on which to base evaluation. Typically mean squared error is used, but perhaps this does not tell the whole story in the case of discontinuous regression functions. When the size of the discontinuity is small, a moving average smooth may have a smaller mean squared error than a not-so-smooth even though the edge is defined more sharply with the latter. Depending on the situation, edge preservation may be more important. Next, an evaluation criterion involves knowing the true function, and since the function is unknown, methods of dealing with this problem must be developed. Traditionally, cross-validation and plug-in techniques are used in the selection of a bandwidth, and ways to apply these techniques in the discontinuous case can be derived.

Lastly, the not-so-smoother was introduced in a one-dimensional setting where one might argue that the discontinuities can be identified visually, at least for some data sets. When a higher dimensional function is being estimated, discontinuities would be very difficult to identify without some quantitative technique such as a smoother. The not-so-smoother can be extended quite naturally to include higher dimensional cases.
Consider a two-dimensional function. At each point in the sample, define its neighbourhood to be all points within a fixed area around the point. For simplicity, the area could be taken to be a square. Then find the line dividing the neighbourhood into the two subsets which minimize the distance between the mean and the data of the first subset plus the distance between the mean and the data of the second subset. To consider all possible dividing curves could be lengthy and not even desirable (since very curvy optimal boundaries would likely result), so a subset such as all vertical, horizontal and diagonal lines could be considered. After determining the "best" line, the estimate of the point under consideration would be the mean of the subset to which it belongs. Modifications analogous to the somewhat-smoother and the use of linear fits in the one-dimensional case can also be extended to higher dimensions.

Bibliography

[1] Bickel, P. J. and Doksum, K. A. (1977). Mathematical Statistics: Basic Ideas and Selected Topics. Prentice Hall, New Jersey.
[2] Chung, K. L. (1974). A Course in Probability Theory. 2nd edition. (Probability and Mathematical Statistics: A Series of Monographs and Textbooks.) Academic Press, New York.
[3] Eubank, R. L. (1988). Spline Smoothing and Nonparametric Regression. (Statistics, Textbooks and Monographs, 90.) M. Dekker, New York.
[4] Feder, P. I. (1975). On Asymptotic Distribution Theory in Segmented Regression Problems - Identified Case. The Annals of Statistics 3 49-83.
[5] Fredkin, D. R. and Rice, J. A. (1990). Bayesian Restoration of Single Channel Patch Clamp Recordings. University of California, San Diego.
[6] Green, P. J. and Silverman, B. W. (1994). Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. 1st edition. (Monographs on Statistics and Applied Probability, 58.) Chapman and Hall, New York.
[7] Hall, P. and Titterington, D. M. (1992). Edge-Preserving and Peak-Preserving Smoothing. Technometrics 34 429-440.
[8] Lee, D. (1991). Detection, Classification, and Measurement of Discontinuities. SIAM Journal of Scientific and Statistical Computing 12 311-341.
[9] McDonald, J. A. and Owen, A. B. (1986). Smoothing with Split Linear Fits. Technometrics 28 195-208.
[10] Müller, H. G. (1992). Change-Points in Nonparametric Regression Analysis. The Annals of Statistics 20 737-761.
[11] Shiau, J. H. (1987). A Note of MSE Coverage Intervals in a Partial Spline Model. Communications in Statistics - Theory and Methods 16 1851-1866.
The not-so-smoother Eveson, Jennifer Paige 1996
Item Metadata
Title | The not-so-smoother |
Creator |
Eveson, Jennifer Paige |
Date Issued | 1996 |
Description | In this thesis, a local smoothing method, termed the not-so-smoother, designed to estimate discontinuous regression functions is proposed. Local smoothing techniques estimate the regression function at a given point by finding the "best fit" through the observations within a fixed neighbourhood of the point. The "best fit" can be the best constant fit (which gives the moving average smoother), the best linear fit, the best k-degree polynomial fit, et cetera. The not-so-smoother finds the best local broken constant fit, a piecewise constant function with exactly one simple discontinuity. Unlike any of the traditional local smoothing methods, the not-so-smoother uses discontinuous local fits and, therefore, has the ability to preserve discontinuities in the function. Consistency of the not-so-smoother under general conditions is proven. Performance of the smoother on simulated data, both continuous and discontinuous, is demonstrated, and an application to a real data set of electric current recordings through an ion channel in a cell membrane is also shown. Variations of the not-so-smoother which can lead to improved performance in certain situations are investigated. |
Extent | 2824789 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
FileFormat | application/pdf |
Language | eng |
Date Available | 2009-02-17 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0087254 |
URI | http://hdl.handle.net/2429/4736 |
Degree |
Master of Science - MSc |
Program |
Statistics |
Affiliation |
Science, Faculty of Statistics, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 1996-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |