A new measure of quantitative robustness 1992
Item Metadata
Title | A new measure of quantitative robustness |
Creator |
Mazzi, Sonia V. T |
Date Created | 2009-01-06 |
Date Issued | 2009-01-06 |
Date | 1992 |
Description | The Gross-Error Sensitivity (GES) and the Breakdown Point (BP) are two measures of quantitative robustness which have played a key role in the development of the theory of robustness. Both can be derived from the maximum bias function $B(\epsilon)$ and constitute a two-number summary of this function. The GES is the derivative of $B(\epsilon)$ at the origin whereas the BP determines the asymptote of the curve $(\epsilon, B(\epsilon))$. Since $\mathrm{GES}\cdot\epsilon \approx B(\epsilon)$ for $\epsilon$ near zero, the GES summarizes the behavior of $B(\epsilon)$ near the origin. On the other hand, the BP does not provide an approximation for $B(\epsilon)$ for $\epsilon$ large and, consequently, estimates with strikingly different bias performance when $\epsilon$ is large may have the same BP. A new robustness quantifier, the breakdown rate (BR), that summarizes the behavior of $B(\epsilon)$ for $\epsilon$ near BP will be introduced. The BR for several families of robust estimates of regression will be presented and the increased usefulness of the three-number summary (GES, BP, BR) for comparing robust estimates will be illustrated by several examples. |
Extent | 1482597 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
File Format | application/pdf |
Language | Eng |
Collection |
Retrospective Theses and Dissertations, 1919-2007 |
Series | UBC Retrospective Theses Digitization Project [http://www.library.ubc.ca/archives/retro_theses/] |
Date Available | 2009-01-06 |
DOI | 10.14288/1.0086725 |
Degree |
Master of Science - MSc |
Program |
Statistics |
Affiliation |
Science, Faculty of |
Degree Grantor | University of British Columbia |
Graduation Date | 1992-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
URI | http://hdl.handle.net/2429/3379 |
Aggregated Source Repository | DSpace |
Digital Resource Original Record | https://open.library.ubc.ca/collections/831/items/1.0086725/source |
Full Text
A NEW MEASURE OF QUANTITATIVE ROBUSTNESS

by SONIA V.T. MAZZI

Lic., Universidad Nacional de Córdoba, Argentina, 1989

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES, DEPARTMENT OF STATISTICS

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

December 1991

© Sonia V.T. Mazzi, 1991

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

(Signature)

Department of Statistics
The University of British Columbia
Vancouver, Canada
Date: December 1991

Abstract

The Gross-Error Sensitivity (GES) and the Breakdown Point (BP) are two measures of quantitative robustness which have played a key role in the development of the theory of robustness. Both can be derived from the maximum bias function $B(\epsilon)$ and constitute a two-number summary of this function. The GES is the derivative of $B(\epsilon)$ at the origin whereas the BP determines the asymptote of the curve $(\epsilon, B(\epsilon))$. Since $\mathrm{GES}\cdot\epsilon \approx B(\epsilon)$ for $\epsilon$ near zero, the GES summarizes the behavior of $B(\epsilon)$ near the origin. On the other hand, the BP does not provide an approximation for $B(\epsilon)$ for $\epsilon$ large and, consequently, estimates with strikingly different bias performance when $\epsilon$ is large may have the same BP. A new robustness quantifier, the breakdown rate (BR), that summarizes the behavior of $B(\epsilon)$ for $\epsilon$ near BP will be introduced.
The BR for several families of robust estimates of regression will be presented and the increased usefulness of the three-number summary (GES, BP, BR) for comparing robust estimates will be illustrated by several examples.

Contents

Abstract
Table of Contents
List of Tables
List of Figures
1 Introduction
2 Quantitative Robustness
2.1 Estimates Defined by Functionals
2.2 $\epsilon$-Neighborhoods
2.3 Quantitative Robustness
2.3.1 Asymptotic Bias and Asymptotic Variance
2.3.2 The Influence Function and the Gross-Error-Sensitivity
3 Some Robust Estimates of Regression Coefficients
3.1 The Regression Model
3.2 S-Estimates
3.3 $\tau$-Estimates
3.4 MM-Estimates
4 The Relative Breakdown Rate
4.1 The Relative Breakdown Rate of S-Estimates Based on $\chi$ Functions Strictly Convex on a Neighborhood of Zero
4.2 The Relative Breakdown Rate of MM- and S-Estimates Based on $\chi$ Functions Strictly Convex on a Neighborhood of Zero
4.3 The Relative Breakdown Rate of $\tau$- and S-Estimates Based on $\chi$ Functions Strictly Convex on a Neighborhood of Zero
5 The Breakdown Rate
5.1 The Baseline Estimate
5.2 The Definition of the Breakdown Rate
5.3 Breakdown Rate of S-Estimates of Regression
5.4 Breakdown Rate of $\tau$-Estimates of Regression
5.5 Breakdown Rate of MM-Estimates of Regression
5.6 Conclusions
Bibliography

List of Tables

4.1 Comparison of two S-estimates with the same BP
4.2 Comparison of an MM- and a $\tau$-estimate with the same BP and efficiency
4.3 Comparison of an MM- and a $\tau$-estimate with the same BP and SENS
5.1 Comparison of two S-estimates with the same BP but markedly different bias performance

List of Figures

1.1 Maximum bias curve, BP and GES of the sample median
1.2 Maximum bias curves of $S_b$ for $b = 0.85$ and $b = 0.15$

Chapter 1
Introduction

To quantify the large sample properties of an estimate representable as a functional $T$, the study of its asymptotic behavior is usually performed on some neighborhood of the model. We will concentrate on the study of the asymptotic bias of $T$ and consider $\epsilon$-contamination neighborhoods of a central or ideal model $F_0$. Following this criterion, robust estimates (in their asymptotic version) should change as little as possible, uniformly over some neighborhood of the model.

An $\epsilon$-neighborhood of $F_0$ is a set of distribution functions

$$V_\epsilon(F_0) = \{F : F = (1-\epsilon)F_0 + \epsilon H;\ H \text{ is a cdf}\}.$$

If $F \in V_\epsilon(F_0)$ then $F = (1-\epsilon)F_0 + \epsilon H$ for some cdf $H$, which can be interpreted as some unspecified distribution function generating outliers, and $\epsilon$ can be viewed as the fraction of outliers.

The maximum asymptotic bias of an estimate $T$ over an $\epsilon$-neighborhood, $B_T(\epsilon)$, is an established concept and an important measure of the quantitative and global robustness of $T$ (see Section 2.3.1). $B_T(\epsilon)$ measures the maximum possible perturbation of the value of $T(F)$ when $F$ ranges over $V_\epsilon(F_0)$. Naturally, when the amount $\epsilon$ of contamination increases so does $B_T(\epsilon)$, and it eventually becomes infinite.

The smallest value of $\epsilon$ such that the maximum asymptotic bias is infinite is called the breakdown point of the estimate and indicates the amount of distortion in the model needed to make the estimate take on arbitrarily large aberrant values. The concept of breakdown point was first introduced by Hodges (1967) for one-dimensional estimates of location.
Hampel (1971) gave a much more general definition of an asymptotic nature and Donoho and Huber (1983) introduced a finite sample version of the breakdown point.

Hampel (1968, 1974a) introduced a robustness quantifier called the influence curve, which measures the speed of change of the value of an estimate when the central model is contaminated with a single observation (see Section 2.3.2). The maximum absolute value of the influence curve is called the gross-error sensitivity and this single number summarizes the behavior of the maximum bias curve in a neighborhood of $\epsilon = 0$. In many cases, as in the following example, the gross-error sensitivity is the derivative of the maximum bias curve at the origin.

The concepts of maximum bias curve, gross-error sensitivity and breakdown point are illustrated in Figure 1.1. In this case we consider the one-dimensional Gaussian location model and the sample median. It can be shown that the maximum bias of the median is

$$B_m(\epsilon) = \Phi^{-1}\left(\frac{1}{2(1-\epsilon)}\right)$$

and that its influence curve is $IC_m(x) = \mathrm{sgn}(x)/[2\varphi(0)]$. It easily follows, then, that the breakdown point of the sample median is $\epsilon^* = 0.5$ and that the gross-error sensitivity is $\gamma^* = 1/[2\varphi(0)] \approx 1.253$ (see for instance Huber, 1981).

The breakdown point and the gross-error sensitivity are two "one-number summaries" of the maximum bias curve and they carry important information about this function. These two quantities are now routinely computed and characterize the performance of an estimate.

The breakdown point has proved to be very helpful for understanding the robustness properties of estimates. For example, Hampel (1974b, 1976) analyzed data from a Monte Carlo study of rejection rules followed by the sample mean, concluding that the performance of the different statistics considered could be ranked in terms of their breakdown points.
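The closed-form expressions quoted above for the sample median can be checked numerically. The following is a minimal sketch, assuming the standard Gaussian location model; the function names `median_max_bias` and `median_ges` are illustrative and do not come from the thesis.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # central model F_0 = N(0, 1)


def median_max_bias(eps: float) -> float:
    """Maximum asymptotic bias of the median, B_m(eps) = Phi^{-1}(1 / (2 (1 - eps)))."""
    if not 0.0 <= eps < 0.5:
        # The argument of Phi^{-1} reaches 1 at eps = 0.5, where the bias is infinite.
        raise ValueError("the bias is finite only for 0 <= eps < 0.5")
    return STD_NORMAL.inv_cdf(1.0 / (2.0 * (1.0 - eps)))


def median_ges() -> float:
    """Gross-error sensitivity of the median, gamma* = 1 / (2 phi(0))."""
    return 1.0 / (2.0 * STD_NORMAL.pdf(0.0))


if __name__ == "__main__":
    print("GES:", median_ges())
    for eps in (0.001, 0.1, 0.3, 0.45, 0.499):
        bias = median_max_bias(eps)
        # Near eps = 0 the ratio bias / eps should be close to the GES.
        print(eps, bias, bias / eps)
```

For small `eps` the ratio `median_max_bias(eps) / eps` stays close to `median_ges()`, which is the linear approximation $B(\epsilon) \approx \mathrm{GES}\cdot\epsilon$, while the bias grows without bound as `eps` approaches the breakdown point 0.5.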
As another example, in the Princeton robustness study (Andrews et al., 1972, p. 253) two estimates of location with similar asymptotic properties for all symmetric distributions were studied, among others. These location estimates used auxiliary estimates of scale and the difference in their performance was explained in terms of the breakdown points of their corresponding scale estimates.

[Figure 1.1: Maximum bias curve, BP and GES of the sample median. Horizontal axis: eps, from 0.0 to $\epsilon^* = 0.5$.]

In the regression setup, the problem of constructing an estimate with non-null breakdown point, i.e. an estimate that can deal with a certain percentage of outliers and that is efficient for a model with Gaussian errors, was a serious concern for many statisticians. Until 1984, several efforts were made towards obtaining an affine equivariant estimate with maximal breakdown point of 50%.

In 1984, Rousseeuw and Yohai introduced the S-estimates, which are defined implicitly by minimizing a robust M-estimate of the scale of the residuals (see Section 3.2). S-estimates can attain a 50% breakdown point, they are affine equivariant and asymptotically normal at the usual rate of $\sqrt{n}$. But these estimates cannot combine the property of high breakdown point with high efficiency at the model with Gaussian errors.

Finally, the MM-estimates proposed by Yohai (1987) and the $\tau$-estimates proposed by Yohai and Zamar (1988) have the three desired properties: high breakdown point, affine equivariance and high efficiency at the Gaussian model (see Sections 3.3 and 3.4).

[Figure 1.2: Maximum bias curves of $S_b$ for $b = 0.85$ and $b = 0.15$. Horizontal axis: EPSILON.]

We see how the concept of breakdown point (combined with other classical asymptotic concepts) inspired a fruitful search for estimates which are robust in a very precise way and also possess other desirable properties.
However, the following example illustrates the fact that robust estimates with the same breakdown point can have strikingly different bias performances for large $\epsilon$.

EXAMPLE: Let $b_1, b_2$ be such that $b_1 < 0.5$, $b_1 = 1 - b_2$. Consider the S-estimates of regression $S_{b_1}$ and $S_{b_2}$ based on jump functions (see Section 5.1). Since $b_1 = 1 - b_2$, these two estimates have the same breakdown point (see Section 3.2). By graphing their maximum bias functions (see Figure 1.2) we notice that $B_{S_{b_2}}$ diverges much more rapidly than $B_{S_{b_1}}$, where $B_{S_{b_i}}$ denotes the maximum bias curve of $S_{b_i}$. This indicates that $S_{b_2}$ is prone to take on large aberrant values much more rapidly than $S_{b_1}$, and this fact can be formalized by computing the following limit:

$$\mathrm{RBR}(S_{b_1}, S_{b_2}) = \lim_{\epsilon \to BP} \frac{B_{S_{b_1}}^2(\epsilon)}{B_{S_{b_2}}^2(\epsilon)}.$$

An easy calculation shows that $\mathrm{RBR}(S_{b_1}, S_{b_2}) = 0$, providing a formal justification of what we inferred from Figure 1.2.

The reason why the breakdown point classification fails to distinguish between rather different estimates is that the breakdown point indicates only the location of the asymptote of the maximum bias curve but not how the curve actually behaves near this point. That is, the BP does not distinguish among estimates with maximum bias curves tending to infinity at different rates. Therefore the gross-error sensitivity should be considered a more complete single-number description since it tells us about the behavior of the maximum bias curve in a neighborhood of the origin.

In this thesis we introduce a new measure to quantify robustness in terms of the asymptotic bias, which is fairly easy to compute and to interpret and which, in conjunction with the GES and BP criteria, helps in classifying robust estimates. This quantity is called the Breakdown Rate (BR).

The breakdown rate is based on another newly introduced concept called the Relative Breakdown Rate (RBR). Given two estimates, say $T_1$ and $T_2$, with the same breakdown point $\epsilon^*$, we compute their relative breakdown rate as the limit of the ratio of the squares of their maximum bias curves, $B_1(\epsilon)$ and $B_2(\epsilon)$, as $\epsilon \to \epsilon^*$. If $0 < \mathrm{RBR}(T_1, T_2) < \infty$ then for $\epsilon$ near $\epsilon^*$, $B_1^2(\epsilon) \approx \mathrm{RBR}(T_1, T_2)\, B_2^2(\epsilon)$. If $\mathrm{RBR}(T_1, T_2) = 0$ then there is no doubt we would prefer $T_1$ to $T_2$, and if $\mathrm{RBR}(T_1, T_2) = \infty$ then $T_1$ would be inadmissible from a robust point of view with respect to $T_2$.

We work with the specific model of linear regression. The estimates considered are Rousseeuw-Yohai's S-estimates, Yohai's MM-estimates and Yohai-Zamar's $\tau$-estimates.

The breakdown rate of an estimate in the just mentioned families is defined as the relative breakdown rate with respect to a baseline estimate, namely the min-max bias S-estimate among all S-estimates with the same breakdown point.

The breakdown rate together with the breakdown point concept gives a more complete description of the robustness properties of an estimate, because it not only points to the asymptote of the bias curve but also characterizes the way in which the curve goes to infinity.

Observe that the gross-error sensitivity and the breakdown rate describe the maximum bias curve near the boundary of its domain, $(0, BP)$. We will show how the triplet (GES, BP, BR) allows a finer classification of robust estimates.

Chapter 2
Quantitative Robustness

2.1 Estimates Defined by Functionals

Hampel (1968) introduced a way to define an estimate which proved to be quite fruitful since it enabled formalization of a very important aspect of robustness (qualitative robustness). It also made the study of the asymptotic properties of estimates easier, linking theoretical results of functional analysis with those of statistics.
To present Hampel's idea we need the concept of empirical distribution, which gives a way of linking a set of observations $y_1, \ldots, y_n$ to a probability distribution on $\mathbb{R}^k$, $k \geq 1$.

Definition: Given a set $\{y_1, \ldots, y_n\}$, $y_i \in \mathbb{R}^k$, the empirical distribution of $y_1, \ldots, y_n$ is the probability measure on $\mathbb{R}^k$, $\mu[y_1, \ldots, y_n]$, defined by

$$\mu[y_1, \ldots, y_n](B) = \frac{1}{n}\sum_{i=1}^{n} I_B(y_i), \quad \forall B \in \mathcal{B}^k,$$

where $I_B$ is the indicator function of the set $B$ and $\mathcal{B}^k$ is the family of Borel sets in $\mathbb{R}^k$.

Let $Z(\mathbb{R}^k)$ denote the set of all probability measures on $\mathbb{R}^k$. For each $n$, consider the set $\{\mu[y_1, \ldots, y_n] : y_1, \ldots, y_n \in \mathbb{R}^k\}$ of all empirical distributions associated with samples of size $n$.

Definition: An estimate $T_n$ is given by a functional $T$ defined on $Z(\mathbb{R}^k)$ if there exists a function $T$ defined on a subset $D(T) \subset Z(\mathbb{R}^k)$ such that

$$T_n(y_1, \ldots, y_n) = T(\mu[y_1, \ldots, y_n]),$$

where $(y_1, \ldots, y_n)$ is in the domain set of $T_n$ and $\mu[y_1, \ldots, y_n] \in D(T)$.

We consider estimates which can be defined by functionals or that can be replaced by functionals. This means we assume that there exists a function $T : D(T) \to \mathbb{R}^k$ such that $T_n(y_1, \ldots, y_n) \to T(F)$ as $n \to \infty$ when the observations are i.i.d. according to the true distribution $F$. We say that $T(F)$ is the asymptotic value of $T_n$ at $F$.

To illustrate the definitions, an example of how an estimate can be defined by a functional follows.

EXAMPLE: Sample mean defined by a functional. Let $D(T) = \{F : F \text{ is a cdf on } \mathbb{R} \text{ and } \int |x|\,dF(x) < \infty\}$ and $T(F) = \int x\,dF(x) = E_F(X)$. Then

$$T_n(y_1, \ldots, y_n) = \frac{1}{n}\sum_{i=1}^{n} y_i = T(\mu[y_1, \ldots, y_n]).$$

2.2 $\epsilon$-Neighborhoods

Given a functional $T$, we are interested in quantifying its robustness with respect to small changes in $F$. We want to measure the changes in $T(F)$ caused by "small" changes in $F$, in a sense that we will define. We need the concept of an "ideal" distribution $F_0$ which obtains because of physical or other reasons and which is completely known. The real data we are able to obtain have a distribution $F$ distorted through gross errors, rounding errors or other factors beyond our control.
To make a quantitative assessment of the effects of such distortions we employ a measure of the distortions in the ideal distribution, which can be a measure defined in the space of probability distributions or, more generally, just a discrepancy in the same space. We will work with the Huber contamination discrepancy defined as

$$\delta_{Huber}(F; F_0) = \inf\{\zeta : F(x) \geq (1-\zeta)F_0(x),\ \forall x\}.$$

Note that this is not a distance. Let $V_\epsilon(F_0) = \{(1-\epsilon)F_0 + \epsilon H : H \text{ is a cdf}\}$; then $V_\epsilon(F_0) = \{F : \delta_{Huber}(F, F_0) \leq \epsilon\}$ is called the $\epsilon$-contamination neighborhood of $F_0$.

$\epsilon$-contamination neighborhoods were first introduced by Huber (1964) for the location model and they provide a simple way of modeling data contaminated by outliers. If $F \in V_\epsilon(F_0)$, then $F = (1-\epsilon)F_0 + \epsilon H$, where $H$ can be interpreted as some unspecified distribution function which generates the outliers and $\epsilon$ can be viewed as the fraction of contamination.

2.3 Quantitative Robustness

For various reasons it may be useful to describe quantitatively how greatly a small change in the underlying distribution, $F$, changes the distribution, $\mathcal{L}_F(T_n)$, of an estimate $T_n = T_n(x_1, \ldots, x_n)$, $x_i \in \mathbb{R}^k$. A description by means of a few numerical quantifiers might be more effective than a detailed characterization. For the sake of simplicity, we will assume that $k = 1$.

2.3.1 Asymptotic Bias and Asymptotic Variance

Assume that $T_n$ is defined through a functional $T$, so that $T_n = T(F_n)$. In most cases of interest, $T_n$ is strongly consistent, i.e.,

$$T_n \to T(F) \quad \text{a.s. } [F],$$

and asymptotically normal,

$$\mathcal{L}_F\{\sqrt{n}\,[T_n - T(F)]\} \to N(0, A(F, T)),$$

as the sample size, $n$, tends to infinity.

Quantitative large sample robustness is usually discussed in terms of the behavior of the asymptotic variance $A(F, T)$ and of the asymptotic bias, $T(F) - T(F_0)$, over some neighborhood $V_\epsilon(F_0)$ of the model distribution (e.g. $V_\epsilon(F_0)$ can be an $\epsilon$-contamination neighborhood).
In this sense, two important quantifiers are the maximum asymptotic bias

$$B_T(\epsilon) = \sup_{F \in V_\epsilon(F_0)} |T(F) - T(F_0)|$$

and the maximum asymptotic variance

$$V_T(\epsilon) = \sup_{F \in V_\epsilon(F_0)} A(F, T).$$

If we consider $\epsilon$-contamination neighborhoods of $F_0$, then $V_\epsilon(F_0) = \{F : F = (1-\epsilon)F_0 + \epsilon H, \text{ where } H \text{ is a cdf}\}$. Therefore, $V_1(F_0) = \{H : H \text{ is a cdf}\}$ is the set of all probability measures on the sample space, so that $V_\epsilon(F_0) \subset V_1(F_0)$ for all $0 < \epsilon < 1$, and so $B_T(\epsilon) \leq B_T(1)$. Usually $B_T(1) = \infty$. The asymptotic breakdown point of $T$ at $F_0$ is

$$BP(T) = \sup\{\epsilon : B_T(\epsilon) < B_T(1)\}.$$

2.3.2 The Influence Function and the Gross-Error-Sensitivity

Hampel (1968, 1974a) introduced a robustness quantifier called the influence curve (IC) or influence function, defined as

$$IC(x; F, T) = \lim_{s \to 0} \frac{T((1-s)F + s\delta_x) - T(F)}{s},$$

where $\delta_x$ denotes the point mass 1 at $x$, $x \in \mathbb{R}$, when the limit exists.

This quantity can be viewed as the limiting influence on the value of $T(F_n)$ of a single observation $x$ added to the sample of size $n$.

The maximum absolute value of the influence curve,

$$\gamma^* = \sup_x |IC(x; F, T)|,$$

is called the gross-error sensitivity. In most cases, when $\gamma^*$ and $B_T'(0)$ are finite, it can be seen that $\gamma^* \leq B_T'(0)$, and so the gross-error sensitivity gives us a linear approximation of the bias curve near 0. Indeed, it can be shown that under mild regularity conditions equality holds.

Chapter 3
Some Robust Estimates of Regression Coefficients

3.1 The Regression Model

Assume the target model is given by

$$y = x'\theta_0 + u, \qquad (3.1)$$

where $x = (x_1, \ldots, x_p)'$ is a random vector in $\mathbb{R}^p$, $\theta_0 = (\theta_{10}, \ldots, \theta_{p0})'$ is the vector of true regression coefficients and the error, $u$, is a random variable independent of $x$. Let $F_0$ be the nominal distribution function of $u$ and $G_0$ the nominal distribution function of $x$. Then the nominal distribution function, $H_0$, of $(y, x)$ is

$$H_0(y, x) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_p} F_0(y - \theta_0's)\,dG_0(s). \qquad (3.2)$$

Assume $G_0$ is elliptical about the origin with scatter matrix $A$.
Correspondingly, we work with a zero intercept, although it can be shown that there is no loss of generality in this assumption.

Let $T$ be an $\mathbb{R}^p$-valued functional defined on a ("large") subset of the space of distribution functions, $H$, on $\mathbb{R}^{p+1}$. This subset is assumed to include all empirical distribution functions, $H_n$, corresponding to a sample, $(y_1, x_1), \ldots, (y_n, x_n)$, of size $n$ from $H$. Then, $T_n = T(H_n)$ is an estimate of $\theta_0$.

It is further assumed that $T$ is regression invariant, i.e., if $\tilde{y} = y + x'b$ and $\tilde{x} = C^T x$ for some full rank $p \times p$ matrix $C$, then $T(\tilde{H}) = C^{-1}[T(H) + b]$, where $\tilde{H}$ is the distribution of $(\tilde{y}, \tilde{x})$. Correspondingly, the transformed model parameter is $\tilde{\theta}_0 = C^{-1}[\theta_0 + b]$.

The asymptotic bias $b_A = b_A(H)$ of $T$ at $H$ is defined by

$$b_A^2(H) = (T(H) - \theta_0)'A\,(T(H) - \theta_0). \qquad (3.3)$$

Therefore, we can assume without loss of generality that $G_0$ is spherical, i.e., $A$ is the identity matrix, and that $\theta_0 = 0$. Accordingly, the nominal model (3.2) becomes

$$H_0(y, x) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_p} F_0(y)\,dG_0(s) \qquad (3.4)$$

and, correspondingly, the asymptotic bias of $T$ at $H$ is given by the Euclidean norm of $T$,

$$b_T^2(H) = \|T(H)\|^2. \qquad (3.5)$$

From now on we will write $b_T(H) = b_I(H)$.

If the functional $T$ is continuous at $H$, then $T(H)$ is the asymptotic value of the estimate when the underlying distribution of the sample is $H$. It is assumed that $T$ is asymptotically unbiased at the nominal model, $H_0$, that is, $T(H_0) = 0$.

In this paper, we will assume that $(y, x) \sim N(0, I_{p+1})$, that is, $H_0$ is the $(p+1)$-dimensional multivariate standard normal distribution.

We will work with the $\epsilon$-contamination neighborhood of the fixed nominal distribution $H_0$, $V_\epsilon(H_0) = \{(1-\epsilon)H_0 + \epsilon H^* : H^* \text{ is any arbitrary distribution on } \mathbb{R}^{p+1}\}$. The maximum asymptotic bias of $T$ over $V_\epsilon(H_0)$ is defined as

$$B_T(\epsilon) = \sup\{\|T(H)\| : H \in V_\epsilon(H_0)\}. \qquad (3.6)$$

Finally, the asymptotic breakdown point of $T$ is defined as

$$BP(T) = \inf\{\epsilon : B_T(\epsilon) = \infty\}. \qquad (3.7)$$
The estimates of regression coefficients considered in this paper have the characteristic that their influence curves are unbounded and so their gross-error sensitivity is infinite. The derivative of their maximum asymptotic bias function at 0 is also infinite, but the derivative of the square of their maximum asymptotic bias function at 0 is finite. This fact and the need for a linear approximation of the maximum bias function near the origin lead us to use $B_T^2$ instead of $B_T$ as a measure of maximum possible departure from the central model. Note that the breakdown point remains unaffected.

We define the sensitivity of $T$ as

$$\mathrm{SENS}(T) = \left.\frac{d}{d\epsilon} B_T^2(\epsilon)\right|_{\epsilon = 0}. \qquad (3.8)$$

In this way we can approximate $B_T^2(\epsilon) \approx \epsilon\,\mathrm{SENS}(T)$ for $\epsilon \approx 0$.

Remark. Connected with the computation of the maximum asymptotic bias of the estimates considered in the next section, the following is a key result (Martin, Yohai and Zamar, 1989).

Let $\chi$ be a real-valued function on $\mathbb{R}$ satisfying the following assumptions:
• symmetric and non-decreasing on $[0, \infty)$, with $\chi(0) = 0$;
• bounded, with $\lim_{x \to \infty} \chi(x) = 1$;
• $\chi$ has only a finite number of discontinuities.

Assume now that the target model $H_0$ is given by (3.4) and that
• $F_0$ is absolutely continuous with density $f_0$ which is symmetric, continuous and strictly decreasing for $u > 0$ and
• $G_0$ is spherical and $P_{G_0}(x'\theta = 0) = 0$, $\forall\, \theta \in \mathbb{R}^p$ with $\theta \neq 0$.

Under the last assumption, it is easy to see that the distribution of $x'\theta$ depends only on $\|\theta\|$. Thus we set

$$h(s, \|\theta\|) = E_{H_0}\,\chi\left(\frac{y - x'\theta}{s}\right).$$

Martin, Yohai and Zamar (1989) show that under the assumptions stated above on $\chi$, $F_0$ and $G_0$, $h$ is continuous, strictly increasing with respect to $\|\theta\|$ and strictly decreasing in $s$ for $s > 0$. If $z = (y, x) \sim N(0, I_{p+1})$, then

$$h(s, \gamma) = g_\chi\left(\frac{(1 + \gamma^2)^{1/2}}{s}\right),$$

where $g_\chi(t) = E\{\chi(tZ)\}$ with $Z \sim N(0, 1)$.

3.2 S-Estimates

S-estimates of regression coefficients were introduced by Rousseeuw and Yohai (1984).
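Given numbers $u_1, \ldots, u_n$, the M-estimate of scale of these numbers, $s_n$, is defined as the solution of

$$\frac{1}{n}\sum_{i=1}^{n} \chi\left(\frac{u_i}{s_n}\right) = b,$$

where $\chi$ is bounded, even and non-decreasing on $[0, \infty)$ and $b$ is usually taken equal to $E\{\chi(Z)\}$ with $Z \sim N(0, 1)$ (see Huber, 1964). We can assume with no loss of generality that $\chi(\infty) = 1$ and $\chi(0) = 0$.

Let $(y_i, x_i)$ be as in (3.1) and let $u_i(\theta) = y_i - \theta'x_i$, $\theta \in \mathbb{R}^p$. The S-estimate of regression $\hat{\theta}_S$ is defined by the property of minimizing the M-estimate of scale $s_n(\theta)$ of $\{u_i(\theta)\}_{i=1}^{n}$, that is,

$$\hat{\theta}_S = \arg\min_\theta s_n(\theta).$$

The corresponding asymptotic version is

$$\theta_S(H) = \arg\min_\theta S_H(\theta), \qquad (3.9)$$

where $S_H(\theta)$ satisfies the equation

$$E_H\,\chi\left(\frac{y - \theta'x}{S_H(\theta)}\right) = b.$$

As proved in Martin, Yohai and Zamar (1989), the maximum bias of S-estimates of regression when $H_0$ is Gaussian is given by

$$B_S^2(\epsilon) = \left[\frac{g^{-1}\left(\frac{b}{1-\epsilon}\right)}{g^{-1}\left(\frac{b-\epsilon}{1-\epsilon}\right)}\right]^2 - 1 \qquad (3.10)$$

with $g(t) = E_\Phi\{\chi(tZ)\}$, where $\Phi$ denotes the standard normal distribution function. This formula can be derived in the following way. Let us consider two situations.

1. Residual M-Scale When the True Model $\theta = 0$ is Fitted.

Let $H_{(x,y)} = (1-\epsilon)H_0 + \epsilon\,\delta_{(x,y)} \in V_\epsilon(H_0)$. Suppose that $y$ is such that $\chi(y/\Delta_0) = 1$, where $\Delta_0$ is the residual scale M-estimate when we fit the true model (i.e. $\theta = 0$), so that

$$(1-\epsilon)\,E_{H_0}\,\chi\left(\frac{y}{\Delta_0}\right) + \epsilon = b,$$

or equivalently

$$(1-\epsilon)\,g\left(\frac{1}{\Delta_0}\right) + \epsilon = b, \qquad \Delta_0 = \frac{1}{g^{-1}\left(\frac{b-\epsilon}{1-\epsilon}\right)}.$$

2. Residual M-Scale When the Outlier $(y, x)$ is Fitted.

Let $\Delta(\|\theta\|)$ be defined by the equation

$$(1-\epsilon)\,E_{H_0}\,\chi\left(\frac{y - \theta'x}{\Delta(\|\theta\|)}\right) = b,$$

that is,

$$(1-\epsilon)\,g\left(\frac{\sqrt{1 + \|\theta\|^2}}{\Delta(\|\theta\|)}\right) = b, \qquad \Delta(\|\theta\|) = \frac{\sqrt{1 + \|\theta\|^2}}{g^{-1}\left(\frac{b}{1-\epsilon}\right)}.$$

The maximum bias $B_S(\epsilon)$ is determined by the condition

$$\Delta(B_S(\epsilon)) = \Delta_0. \qquad (3.11)$$

Observe that $S_H(\theta) \geq \Delta(\|\theta\|)$ and $S_H(0) \leq \Delta_0$ for all $\theta \in \mathbb{R}^p$ and all $H \in V_\epsilon(H_0)$. Therefore, if $\|\theta\|$ exceeds the solution of (3.11), then $S_H(\theta) > S_H(0)$ and $\theta \neq \arg\min S_H(\theta)$. Clearly then the norm of the S-estimate cannot exceed the solution of (3.11) for any $H \in V_\epsilon(H_0)$.

On the other hand, following along the lines of Martin, Yohai and Zamar (1989), one can prove that given $\theta^*$ with $\|\theta^*\|$ smaller than the solution of (3.11), there exists $H \in V_\epsilon(H_0)$ such that $\theta^* = \arg\min S_H(\theta)$.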
Hence, $B_S(\epsilon) \geq \|\theta^*\|$. Therefore,

$$B_S(\epsilon) = \sup\{\|\theta\| : \Delta(\|\theta\|) < \Delta_0\}$$

and so, by continuity of $\Delta(\cdot)$, $B_S(\epsilon)$ must satisfy the equation $\Delta(B_S(\epsilon)) = \Delta_0$, from which (3.10) directly follows.

BREAKDOWN POINT OF S-ESTIMATES

From (3.7) and (3.10) we see that the breakdown point of an S-estimate $S$ is

$$BP(S) = \min\{b, 1 - b\}. \qquad (3.12)$$

So, two distinct values of $b$ give rise to any specified breakdown point $\epsilon^* \in (0, 0.5)$, namely $b = \epsilon^*$ and $b = 1 - \epsilon^*$. It will be shown in Chapter 4 that the S-estimates $S_b$ for two such values of $b$ have a strikingly different bias performance.

SENSITIVITY OF S-ESTIMATES

From (3.8) and (3.10), and if $g(t)$ is continuously differentiable in some neighborhood of $t = 1$, the sensitivity of an S-estimate, $\mathrm{SENS}(S)$, is given by

$$\mathrm{SENS}(S) = \frac{2}{g'(1)}. \qquad (3.13)$$

More generally, suppose now that the estimate of regression coefficients, $\theta_J(H)$, is given by

$$\theta_J(H) = \arg\min_\theta J(F_{H,\theta}), \qquad (3.14)$$

where $J$ is a functional defined on a subset of $Z(\mathbb{R})$ and $F_{H,\theta}$ is the distribution function under $H$ of the residual $r(\theta) = y - x'\theta$. Notice that in the case of S-estimates we take $J(F_{H,\theta}) = S(F_{H,\theta})$, with $S(F_{H,\theta})$ defined by the equation

$$E_{F_{H,\theta}}\,\chi\left(\frac{r}{S(F_{H,\theta})}\right) = b. \qquad (3.15)$$

Under certain regularity conditions to be determined in future work, we conjecture that, following the lines of the argument given above, it can be shown that the maximum bias function for $\theta_J(H)$ satisfies the equation (3.11) with

$$\Delta_0 = J(F_{H,0}); \quad F_{H,0}(x) = (1-\epsilon)\Phi(x) + \epsilon\,\delta_\infty(x)$$

and

$$\Delta(\|\theta\|) = J(F_{H,\theta}); \quad F_{H,\theta}(x) = (1-\epsilon)\,\Phi\left(\frac{x}{\sqrt{1 + \|\theta\|^2}}\right) + \epsilon\,\delta_0(x),$$

where $\delta_y(\cdot)$ is a point mass distribution at $y$.

3.3 $\tau$-Estimates

A $\tau$-estimate is given by (3.14) with $J(F_{H,\theta}) = \tau(F_{H,\theta})$, where

$$\tau(F_{H,\theta}) = S^2(F_{H,\theta})\,E_{F_{H,\theta}}\,\chi_2\left(\frac{r}{S(F_{H,\theta})}\right)$$

and $S(F_{H,\theta})$ is based on a function $\chi_1$ (see Yohai and Zamar, 1988).

Let $g_i(t) = g_{\chi_i}(t)$, $i = 1, 2$, and $b = E_\Phi\chi_1(Z)$.
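Since in this case

$$\Delta_0 = \tau(F_{H,0}) = \frac{(1-\epsilon)\,g_2\left(g_1^{-1}\left(\frac{b-\epsilon}{1-\epsilon}\right)\right) + \epsilon}{\left[g_1^{-1}\left(\frac{b-\epsilon}{1-\epsilon}\right)\right]^2}$$

and

$$\Delta(\|\theta\|) = \frac{(1 + \|\theta\|^2)\,(1-\epsilon)\,g_2\left(g_1^{-1}\left(\frac{b}{1-\epsilon}\right)\right)}{\left[g_1^{-1}\left(\frac{b}{1-\epsilon}\right)\right]^2},$$

we have from (3.11) that

$$B_\tau^2(\epsilon) = \left[\frac{g_1^{-1}\left(\frac{b}{1-\epsilon}\right)}{g_1^{-1}\left(\frac{b-\epsilon}{1-\epsilon}\right)}\right]^2 \frac{g_2\left(g_1^{-1}\left(\frac{b-\epsilon}{1-\epsilon}\right)\right) + \frac{\epsilon}{1-\epsilon}}{g_2\left(g_1^{-1}\left(\frac{b}{1-\epsilon}\right)\right)} - 1. \qquad (3.16)$$

BREAKDOWN POINT OF $\tau$-ESTIMATES

According to (3.16) we see that the breakdown point of a $\tau$-estimate of regression, $\tau$, is

$$BP(\tau) = \min\{b, 1 - b\}. \qquad (3.17)$$

Again, as in the case of S-estimates, two distinct values of $b$ give rise to a specified breakdown point $\epsilon^* \in (0, 0.5)$. In Chapter 5 the pronounced difference between these estimates with the same breakdown point will be shown.

SENSITIVITY OF $\tau$-ESTIMATES

If $g_i(t)$ is continuously differentiable in a neighborhood of $t = 1$, $i = 1, 2$, the sensitivity of a $\tau$-estimate, $\mathrm{SENS}(\tau)$, is

$$\mathrm{SENS}(\tau) = \frac{2}{g_1'(1)} + \frac{1}{b_2}\left(1 - \frac{g_2'(1)}{g_1'(1)}\right), \qquad (3.18)$$

where $b_2 = E_\Phi\chi_2(Z)$.

3.4 MM-Estimates

Let $s_1 = s_1(H) = \min_\theta S_1(F_{H,\theta})$, where $S_1(F_{H,\theta})$ is as in (3.15) and is based on a function $\chi_1$. An MM-estimate is defined by (3.14), where the $J$-functional is in this case $M(F_{H,\theta}, s_1)$, with

$$M(F_{H,\theta}, s_1) = E_{F_{H,\theta}}\,\chi_2\left(\frac{r}{s_1}\right),$$

while $\chi_1$ and $\chi_2$ satisfy the conditions given in Yohai (1987), including the requirement that $\chi_2(x) \leq \chi_1(x)$ $\forall\, x \in \mathbb{R}$. In this case

$$\Delta_0 = M(F_{H,0}, S_1(F_{H,0})) \quad \text{and} \quad \Delta(\|\theta\|) = M(F_{H,\theta}, S_1(F_{H,0})).$$

Notice that

$$\sup_{H \in V_\epsilon(H_0)} s_1(H) = S_1(F_{H,0}).$$

Let $g_i(t) = g_{\chi_i}(t)$, $i = 1, 2$, and $b = E_\Phi\chi_1(Z)$. It can be easily derived that

$$\Delta_0 = (1-\epsilon)\,g_2\left(g_1^{-1}\left(\frac{b-\epsilon}{1-\epsilon}\right)\right) + \epsilon, \qquad \Delta(\|\theta\|) = (1-\epsilon)\,g_2\left(\sqrt{1 + \|\theta\|^2}\;g_1^{-1}\left(\frac{b-\epsilon}{1-\epsilon}\right)\right),$$

and therefore

$$B_M^2(\epsilon) = \left[\frac{g_2^{-1}\left(g_2\left(g_1^{-1}\left(\frac{b-\epsilon}{1-\epsilon}\right)\right) + \frac{\epsilon}{1-\epsilon}\right)}{g_1^{-1}\left(\frac{b-\epsilon}{1-\epsilon}\right)}\right]^2 - 1. \qquad (3.19)$$

BREAKDOWN POINT OF MM-ESTIMATES

According to (3.7) and (3.19) the breakdown point of an MM-estimate is

$$BP(M) = b. \qquad (3.20)$$

SENSITIVITY OF MM-ESTIMATES

If $g_i(t)$ is continuously differentiable in a neighborhood of $t = 1$, $i = 1, 2$, the sensitivity of an MM-estimate is

$$\mathrm{SENS}(M) = \frac{2}{g_2'(1)}. \qquad (3.21)$$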
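Returning to S-estimates, the maximum bias formula (3.10) can be evaluated in closed form for a jump function, which gives a quick numerical look at the Chapter 1 example with $b = 0.15$ and $b = 0.85$. The sketch below is illustrative only. It assumes the jump function has the form $\chi(x) = 1\{|x| > a\}$ with $a$ chosen so that $E_\Phi\chi(Z) = b$ (the thesis defines its jump functions in Section 5.1, which is not reproduced here), so that $g(t) = 2[1 - \Phi(a/t)]$; all function names are hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def jump_g_inverse(u: float, b: float) -> float:
    """Inverse of g(t) = P(|tZ| > a) = 2 (1 - Phi(a / t)), with a fixed by g(1) = b.

    Assumes the jump function chi(x) = 1{|x| > a}; valid for 0 < u < 1.
    """
    a = PHI.inv_cdf(1.0 - b / 2.0)
    return a / PHI.inv_cdf(1.0 - u / 2.0)


def breakdown_point(b: float) -> float:
    """Breakdown point of an S-estimate, formula (3.12)."""
    return min(b, 1.0 - b)


def s_max_bias_sq(eps: float, b: float) -> float:
    """Squared maximum bias of the S-estimate S_b, formula (3.10)."""
    if not 0.0 <= eps < breakdown_point(b):
        raise ValueError("eps must lie in [0, BP)")
    upper = jump_g_inverse(b / (1.0 - eps), b)
    lower = jump_g_inverse((b - eps) / (1.0 - eps), b)
    return (upper / lower) ** 2 - 1.0


def bias_ratio(eps: float, b1: float = 0.15, b2: float = 0.85) -> float:
    """Ratio B^2_{S_b1}(eps) / B^2_{S_b2}(eps); its limit at the BP is the RBR."""
    return s_max_bias_sq(eps, b1) / s_max_bias_sq(eps, b2)


if __name__ == "__main__":
    for eps in (0.05, 0.10, 0.14, 0.149, 0.1499):
        print(eps, s_max_bias_sq(eps, 0.15), s_max_bias_sq(eps, 0.85), bias_ratio(eps))
```

Under these assumptions both estimates break down at $\epsilon = 0.15$, but the printed ratio shrinks towards zero as $\epsilon$ approaches that value, which agrees with the statement that $\mathrm{RBR}(S_{b_1}, S_{b_2}) = 0$.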
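Chapter 4

The Relative Breakdown Rate

Given two estimates $T$ and $T'$ with the same breakdown point $b$, we define the relative breakdown rate of $T$ with respect to $T'$ as

$$\mathrm{RBR}(T,T') = \lim_{\epsilon \to b} \frac{B_T^2(\epsilon)}{B_{T'}^2(\epsilon)}. \tag{4.1}$$

The relative breakdown rate gives a more complete description of two estimates, because it not only points to the asymptote of the bias curves but also characterizes their relative speed of divergence to infinity. In Chapter 5 we will define the breakdown rate of certain types of S-, τ- and MM-estimates. It will be the relative breakdown rate with respect to a baseline estimate, namely the min-max bias S-estimate of regression among all S-estimates with the same breakdown point. We now illustrate how to compute and use the relative breakdown rate.

4.1 The Relative Breakdown Rate of S-Estimates Based on χ Functions Strictly Convex on a Neighborhood of Zero

Let $S_i$ be an S-estimate of regression based on $\chi_i$ such that $\chi_i$ is continuous, differentiable at all but a finite number of points with $0 < \int_0^\infty \chi_i'(y)\,y\,dy < \infty$, and three times differentiable in some neighborhood of zero with $\chi_i''(0) \neq 0$, $i = 1, 2$. Also suppose that $E_\Phi \chi_1(Z) = E_\Phi \chi_2(Z) = b$, $0 < b \le 0.5$.

According to (3.10) and (4.1), the relative breakdown rate of $S_1$ with respect to $S_2$ is

$$\mathrm{RBR}(S_1,S_2) = \lim_{\epsilon \to b} \left[ \frac{g_1^{-1}\!\left(\frac{b}{1-\epsilon}\right)\, g_2^{-1}\!\left(\frac{b-\epsilon}{1-\epsilon}\right)}{g_2^{-1}\!\left(\frac{b}{1-\epsilon}\right)\, g_1^{-1}\!\left(\frac{b-\epsilon}{1-\epsilon}\right)} \right]^2 \tag{4.2}$$

where $g_i = g_{\chi_i}$, $i = 1, 2$.

Note that as $\epsilon \to b$, $\frac{b-\epsilon}{1-\epsilon} \to 0$, $\frac{b}{1-\epsilon} \to \frac{b}{1-b}$ if $b < 0.5$ and $\frac{b}{1-\epsilon} \to 1$ if $b = 0.5$. Therefore, as $\epsilon \to b$, $g_i^{-1}\!\left(\frac{b-\epsilon}{1-\epsilon}\right) \to 0$, $g_i^{-1}\!\left(\frac{b}{1-\epsilon}\right) \to g_i^{-1}\!\left(\frac{b}{1-b}\right)$ if $b < 0.5$ and $g_i^{-1}\!\left(\frac{b}{1-\epsilon}\right) \to \infty$ if $b = 0.5$, $i = 1, 2$. We will compute

$$L_1 = \lim_{t \to 0} \frac{g_2^{-1}(t)}{g_1^{-1}(t)} \quad \text{and} \quad L_2 = \lim_{t \to 1} \frac{g_1^{-1}(t)}{g_2^{-1}(t)}$$

using L'Hôpital's rule.

Computation of $L_1$. It is easier to compute $L_1^2$. Let

$$L_1^* = \lim_{t \to 0} \frac{\frac{d}{dt}\left[g_2^{-1}(t)\right]^2}{\frac{d}{dt}\left[g_1^{-1}(t)\right]^2} = \lim_{t \to 0} \frac{g_2^{-1}(t)\, g_1'\!\left(g_1^{-1}(t)\right)}{g_1^{-1}(t)\, g_2'\!\left(g_2^{-1}(t)\right)}.$$

Then, if $L_1^*$ exists (allowing also for the possibility that $L_1^*$ is infinite), $L_1^2 = L_1^*$.

Now, for $i = 1, 2$,

$$g_i(t) = 2\int_0^\infty \chi_i(ty)\,\varphi(y)\,dy \quad \text{and} \quad g_i'(t) = 2\int_0^\infty \chi_i'(ty)\,y\,\varphi(y)\,dy,$$

where $\varphi = \Phi'$. Let $y > 0$; then a Taylor series expansion of order 1 around 0 of $\chi_i'$ gives $\chi_i'(ty) = \chi_i''(0)\,ty + o(ty)$ as $t \to 0$, so that

$$\frac{1}{t}\, g_i'(t) = \chi_i''(0) + 2\int_0^\infty \frac{o(ty)}{t}\, y\,\varphi(y)\,dy, \quad t > 0.$$

Hence,

$$L_1^* = \lim_{t \to 0} \frac{g_1'\!\left(g_1^{-1}(t)\right) / g_1^{-1}(t)}{g_2'\!\left(g_2^{-1}(t)\right) / g_2^{-1}(t)} \tag{4.3}$$

$$= \lim_{t \to 0} \frac{\chi_1''(0) + 2\int_0^\infty \frac{o\left(g_1^{-1}(t)\,y\right)}{g_1^{-1}(t)}\, y\,\varphi(y)\,dy}{\chi_2''(0) + 2\int_0^\infty \frac{o\left(g_2^{-1}(t)\,y\right)}{g_2^{-1}(t)}\, y\,\varphi(y)\,dy} \tag{4.4}$$

$$= \frac{\chi_1''(0)}{\chi_2''(0)}. \tag{4.5}$$

Computation of $L_2$. We have that

$$L_2 = \lim_{t \to 1} \frac{g_1^{-1}(t)}{g_2^{-1}(t)} = \lim_{t \to 1} \frac{1 / g_2^{-1}(t)}{1 / g_1^{-1}(t)}.$$

Let

$$L_2^* = \lim_{t \to 1} \frac{\frac{d}{dt}\left[1 / g_2^{-1}(t)\right]}{\frac{d}{dt}\left[1 / g_1^{-1}(t)\right]} = \lim_{t \to 1} \frac{\left[g_1^{-1}(t)\right]^2 g_1'\!\left(g_1^{-1}(t)\right)}{\left[g_2^{-1}(t)\right]^2 g_2'\!\left(g_2^{-1}(t)\right)}.$$

Then, if $L_2^*$ exists, $L_2 = L_2^*$. Now, for $i = 1, 2$ and $t > 0$,

$$t^2 g_i'(t) = 2\int_0^\infty \chi_i'(y)\,y\,\varphi\!\left(\frac{y}{t}\right) dy.$$

For each $y > 0$, a Taylor series expansion of order 1 around 0 of $\varphi$ gives $\varphi\!\left(\frac{y}{t}\right) = \varphi(0) + o\!\left(\frac{y}{t}\right)$ as $t \to \infty$, and

$$t^2 g_i'(t) = 2\varphi(0)\int_0^\infty \chi_i'(y)\,y\,dy + 2\int_0^\infty \chi_i'(y)\,y\;o\!\left(\frac{y}{t}\right) dy,$$

so that

$$L_2 = \frac{\int_0^\infty \chi_1'(y)\,y\,dy}{\int_0^\infty \chi_2'(y)\,y\,dy}. \tag{4.6}$$

Then, by (4.2), (4.5) and (4.6), if $0 < b < 0.5$,

$$\mathrm{RBR}(S_1,S_2) = \left[ \frac{g_1^{-1}\!\left(\frac{b}{1-b}\right)}{g_2^{-1}\!\left(\frac{b}{1-b}\right)} \right]^2 \frac{\chi_1''(0)}{\chi_2''(0)},$$

and if $b = 0.5$,

$$\mathrm{RBR}(S_1,S_2) = \left[ \frac{\int_0^\infty \chi_1'(y)\,y\,dy}{\int_0^\infty \chi_2'(y)\,y\,dy} \right]^2 \frac{\chi_1''(0)}{\chi_2''(0)}. \tag{4.7}$$

EXAMPLE 1: We will compute the RBR of two commonly used "smooth" initial S-estimates of regression with breakdown point equal to 0.5. The first one, $S_A$, is based on the function

$$\chi_A(x) = \begin{cases} x^2/A^2, & \text{if } 0 \le |x| \le A \\ 1, & \text{if } |x| > A \end{cases}, \quad A > 0, \tag{4.8}$$

which is a simple truncation of the classical square loss function. The choice $A = 1.041$ gives $\mathrm{BP}(S_A) = \int \chi_A(x)\,d\Phi(x) = 0.5$.

The second S-estimate, $S_B$, is based on the integrated Tukey's bisquare score function

$$\chi_B(x) = \begin{cases} \dfrac{3x^2}{B^2} - \dfrac{3x^4}{B^4} + \dfrac{x^6}{B^6}, & \text{if } 0 \le |x| \le B \\ 1, & \text{if } |x| > B \end{cases}, \quad B > 0, \tag{4.9}$$

which is three times continuously differentiable. The choice $B = 1.547$ gives $\mathrm{BP}(S_B) = \int \chi_B(x)\,d\Phi(x) = 0.5$.

We have that

$$\int_0^\infty \chi_A'(y)\,y\,dy = \frac{2}{3}A\,; \qquad \int_0^\infty \chi_B'(y)\,y\,dy = \frac{16}{35}B \tag{4.10}$$

and

$$\chi_A''(0) = \frac{2}{A^2}\,; \qquad \chi_B''(0) = \frac{6}{B^2}, \tag{4.11}$$

so that by (4.7)

$$\mathrm{RBR}(S_A, S_B) = 0.709.$$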
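The value RBR(S_A, S_B) = 0.709 can be checked numerically. The following is a minimal sketch, assuming the forms of (4.8) and (4.9) as read from the surrounding text, that is, χ_A(x) = x²/A² on [0, A] and χ_B(x) = 3x²/B² - 3x⁴/B⁴ + x⁶/B⁶ on [0, B]. The helper names `integrate`, `dchi_A` and `dchi_B` are hypothetical and introduced only for this illustration. It evaluates the two integrals in (4.10) by quadrature, compares them with the closed forms, and then applies (4.7).

```python
def integrate(f, lo, hi, n=20000):
    """Composite midpoint rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))


def dchi_A(y, A):
    """Derivative of the truncated square loss (4.8) for y >= 0 (assumed form)."""
    return 2.0 * y / A**2 if y <= A else 0.0


def dchi_B(y, B):
    """Derivative of the integrated Tukey bisquare (4.9) for y >= 0 (assumed form)."""
    if y > B:
        return 0.0
    u = y / B
    return (6.0 * u - 12.0 * u**3 + 6.0 * u**5) / B


A, B = 1.041, 1.547  # tuning constants giving breakdown point 0.5 in Example 1

# Integrals of chi'(y) * y over [0, infinity); both derivatives vanish beyond A or B.
I_A = integrate(lambda y: dchi_A(y, A) * y, 0.0, A)
I_B = integrate(lambda y: dchi_B(y, B) * y, 0.0, B)

# Second derivatives at zero, as in (4.11).
d2_A = 2.0 / A**2
d2_B = 6.0 / B**2

# Relative breakdown rate for b = 0.5, formula (4.7).
rbr = (I_A / I_B) ** 2 * d2_A / d2_B

print(f"integral for chi_A: {I_A:.6f}  (closed form 2A/3   = {2 * A / 3:.6f})")
print(f"integral for chi_B: {I_B:.6f}  (closed form 16B/35 = {16 * B / 35:.6f})")
print(f"RBR(S_A, S_B) = {rbr:.3f}")
```

Under these assumptions the quadrature agrees with the closed forms in (4.10), and the resulting ratio rounds to the value 0.709 reported in the example.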
Also, 4 (4.12) g'A(1) = --A-2 [4(A) – iLly,(A) – 0.5] and (4.13) g'B (1) = 12co(B) ( T32 – + 12[4)(B) – Bco(B) – 0.5] 11; For A = 1.041 and B = 1.547 we get g iA (1) = 0.404 ; g'B (1) = 0.389 and so from (3.13) SENS(SA) = 4.950 ; SENS(SB) = 5.141. 24 2 (E) RBR(M, S) = lim B m e--.b B3(c) 9v. (92 (g1 1[ = lim I(H) + &i) 2W1—e)e—+b —1 estimate A, B SENS BP efficiency RBR SA 1.041 4.950 0.5 0.219 0.709 SB 1.547 5.141 0.5 0.287 Table 4.1: Comparison of two S-estimates with the same BP The asymptotic efficiency at the model with Gaussian errors of SA is given by eA 2[4)(A) — Aco(A) — 0.5] and for A = 1.041 we have that eA = 0.219. The efficiency of SB for B = 1.547 is eB = 0.287 (see Rousseeuw and Yohai, 1984). All these computations are summarized in Table 4.1. Based on these figures, the S-estimate based on XA can be expected to perform approximately the same as that based on XB for Gaussian or approximately Gaussian data and can be expected to perform better in the presence of a large fraction of outliers. However, this should be confirmed by extensive Monte Carlo simulation. 4.2 The Relative Breakdown Rate of MM- and S-Estimates Based on x Functions Strictly Convex on a Neighborhood of Zero Let M be an MM-estimate of regression based on x i and X2 with x i and X2 three times contin- uously differentiable, x'/(0) 0 for i = 1, 2 and E4,xi (Z) = b, 0 < b < 0.5. Further, let S be an S-estimate based Xi. Then BP(S) = BP(M) and we want to compute the relative breakdown rate of M with respect to S. By (4.1) and (3.19) we have that 25 To compute the limit we can use L'IlOspital's rule. Following a similar reasoning as in Section 4.1 we see that RBR(M, S) = if 0 < b < 0.5I1.9T I (4_1).1 ' rf,:x(Y)YdY 2 ))) \ —1 1 2 if b = 0.5 L Jo Xi ( dY )?11(0 4.3 The Relative Breakdown Rate of T— and S-Estimates Based on x Functions Strictly Convex on a Neighborhood of Zero Let 7 be a 7-estimate of regression based on x i and X2 such that Eox i (Z) = b, 0 < b < 0.5. 
Further, let S be the S-estimate of regression based on xi. Then, 92 (9r1 1;:fe) RBR(r, 5) = lim + g2 (gr l ( 1 1...0 1 gr1 ( 1 i b e ) if 0 < b < 0.51—b 92 (9 17 1 ( 1, if b = 0.5. Note that given x i , if r and S are based on x i and b = 0.5, then RB R(r, S) = 1, no matter how we choose X2. EXAMPLE 2: Let M be an MM-estimate based on x i and X2 such that xi and X 2 satisfy the assumptions made in example 3. Now let T be based on x i and some other function X3. Then, noting that RBR(M, T) = RBR(T,S) RBR(M,S) where S is the S-estimate based on x i we have that (el 2 1—b (gr1 if 0 < b < 0.5[gV (4.14) RBR(M, T) = X(Y)YdY 2 x"(0)4s) 1 2 if b = 0.5[ 0 Xi (y)y dy XI 0) Let .7" be the family of functions XB, B > 0 where xB(x) is given by (4.9). .T is known as Tukey's family of x functions. If we take xi = XBi i = 1, 2,3 such that Eoxi(Z) = 0.5, by (4.14), (4.10) and (4.11) we have that (4.15) 2 ) B2 B?)-112RBR(M, T ) [71T 26 estimate B1 B2 BP efficiency SENS RBR M 1.56 4.68 0.5 0.95 9.639 2.58 T 1.56 6.08 0.5 0.95 13.480 Table 4.2: Comparison of an MM- and a 7-estimate with the same BP and efficiency estimate B1 B2 BP SENS RBR M 1.56 4.680 0.5 9.639 2.58 T 1.56 5.025 0.5 9.639 Table 4.3: Comparison of an MM- and a 7-estimate with the same BP and SENS The choice of B 1 = 1.56 gives us two estimates with BP = 0.5 and if we choose B2 = 4.68 and B3 = 6.08, both estimates have 95% asymptotic efficiency at the model with Gaussian errors (see Yohai, 1985 and Yohai and Zamar, 1988). With these values of B 1 and B2 (note that BRB(M, r) doesn't depend on B3 ) RBR(M, r) = 2.58. By (4.13) gl . (1) = 1.300, gY1) = 0.207 and g(1) = 0.138 and the value of b 3 such that b3 = E:Dx3(Z) is b3 = 0.075. Therefore, SENS(r) = 13.480 ; SENS(M) = 9.639. We summarize the calculated quantifiers in Table 4.2. 
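Since (4.15) does not depend on the choice of $B_3$, if we take $B_1 = 1.56$, $B_2 = 4.68$ and $B_3 = 5.025$, we have that $\mathrm{SENS}(M) = \mathrm{SENS}(\tau) = 9.639$ and the RBR remains the same as the one calculated before, i.e. $\mathrm{RBR}(M, \tau) = 2.58$.

We can conclude from Table 4.3 that τ-estimates can be expected to outperform comparable MM-estimates for a wide range of fractions of contamination. This should also be confirmed by extensive Monte Carlo studies.

Chapter 5

The Breakdown Rate

In this chapter we will define the breakdown rate for S-, τ- and MM-estimates of regression. The min-max asymptotic bias S-estimate (among all S-estimates with the same breakdown point) will be used as a baseline estimate. In Section 5.1 we justify the choice of this baseline estimate, in Section 5.2 we give the definition of the breakdown rate, and in the subsequent sections we compute the breakdown rate for certain types of S-, τ- and MM-estimates.

5.1 The Baseline Estimate

We will denote by $\chi_a$ the function

$$\chi_a(x) = \begin{cases} 1, & \text{if } |x| > a \\ 0, & \text{otherwise.} \end{cases}$$

We call $\chi_a$ a "jump function" with jump constant $a$. Let $C$ be the family of functions $\chi : \mathbb{R} \to \mathbb{R}$ such that:

• $\chi$ is even and nondecreasing in $[0, \infty)$;
• $\chi$ is either continuous or a jump function;
• $\chi$ is continuously differentiable at all but a finite number of points;
• $\chi(0) = 0$ and $\chi(x) \to 1$ as $x \to \infty$;
• $0 < E_\Phi\{\chi(X)\} < 1$.

For $\chi \in C$, let $g_\chi(t) = E_\Phi\{\chi(tX)\}$. The following lemma was stated and proved by Martin and Zamar (1989).¹

Lemma 1: Given $0 < b < 1$, let

$$C_b = \{\chi : \chi \in C \text{ and } E_\Phi\{\chi(X)\} = b\} \tag{5.1}$$

and let $a$ satisfy $2[1 - \Phi(a)] = b$. Then, for all $\chi \in C_b$,

$$g_{\chi_a}(t) \ge g_\chi(t), \ \forall\, t > 1; \qquad g_{\chi_a}(t) \le g_\chi(t), \ \forall\, t < 1.$$

Proof: Since $\chi, \chi_a \in C_b$ we have that

$$\int_0^a \chi(y)\,\varphi(y)\,dy = \int_a^\infty [1 - \chi(y)]\,\varphi(y)\,dy.$$

Now, note that $\varphi(y/t)/\varphi(y)$ is an increasing function of $y$ if $t > 1$ and is decreasing in $y$ if $0 < t < 1$. Then, for all $t > 1$,

$$\frac{1}{t}\int_0^a \chi(y)\,\varphi\!\left(\frac{y}{t}\right) dy = \frac{1}{t}\int_0^a \chi(y)\,\varphi(y)\,\frac{\varphi(y/t)}{\varphi(y)}\,dy \le \frac{1}{t}\,\frac{\varphi(a/t)}{\varphi(a)}\int_0^a \chi(y)\,\varphi(y)\,dy$$

$$= \frac{1}{t}\,\frac{\varphi(a/t)}{\varphi(a)}\int_a^\infty [1 - \chi(y)]\,\varphi(y)\,dy \le \frac{1}{t}\int_a^\infty [1 - \chi(y)]\,\varphi\!\left(\frac{y}{t}\right) dy.$$

Therefore,

$$g_\chi(t) = \frac{2}{t}\int_0^\infty \chi(y)\,\varphi\!\left(\frac{y}{t}\right) dy = \frac{2}{t}\left[\int_0^a \chi(y)\,\varphi\!\left(\frac{y}{t}\right) dy + \int_a^\infty \chi(y)\,\varphi\!\left(\frac{y}{t}\right) dy\right] \le \frac{2}{t}\int_a^\infty \varphi\!\left(\frac{y}{t}\right) dy = g_{\chi_a}(t).$$

For $t < 1$ the inequalities above are reversed and the result follows. □

¹ The result proved in the reference paper is more general than the one presented in Lemma 1. It is valid for any distribution function $F_0$ with a density $f_0$ symmetric about 0 and such that $f_0(tx)/f_0(x)$ is decreasing in $x$ for $t > 1$.

The following theorem, which follows directly from Lemma 1, shows that the S-estimate of regression $S_b$ based on $\chi_a$ with $2[1 - \Phi(a)] = b$ is min-max bias over $C_b$, where $C_b$ is as in (5.1).

Theorem 1: For all $0 < b < 1$,

$$B_S(\epsilon) \ge B_{S_b}(\epsilon), \quad 0 < \epsilon < b,$$

for all $S$ based on $\chi \in C_b$.

Proof: From (3.10), the maximum bias function of an S-estimate based on a function $\chi \in C_b$ is given by

$$B_S^2(\epsilon) = \left[ \frac{g_\chi^{-1}\!\left(b/(1-\epsilon)\right)}{g_\chi^{-1}\!\left((b-\epsilon)/(1-\epsilon)\right)} \right]^2 - 1, \quad 0 < \epsilon < b.$$

Since for all $\epsilon > 0$, $b/(1-\epsilon) > b$ and $(b-\epsilon)/(1-\epsilon) < b$, it follows that $g_\chi^{-1}(b/(1-\epsilon)) > 1$ and $g_\chi^{-1}((b-\epsilon)/(1-\epsilon)) < 1$. By the preceding lemma, we have that

$$g_{\chi_a}^{-1}\!\left(b/(1-\epsilon)\right) \le g_\chi^{-1}\!\left(b/(1-\epsilon)\right) \quad \text{and} \quad g_{\chi_a}^{-1}\!\left((b-\epsilon)/(1-\epsilon)\right) \ge g_\chi^{-1}\!\left((b-\epsilon)/(1-\epsilon)\right). \ \square$$

Proposition 1: If $S$ is an S-estimate of regression based on $\chi \in C_b$, $0 < b < 1$, then

$$\lim_{\epsilon \to b} \frac{B_S^2(\epsilon)}{B_{S_b}^2(\epsilon)} \ge 1.$$

Proof: The result follows from the last theorem, since $B_S(\epsilon) \ge B_{S_b}(\epsilon)$, $0 < \epsilon < b$. □

Proposition 2: Let $0 < b < 1$ and let $\tau$ be a τ-estimate of regression based on $\chi_1 \in C_b$ and $\chi_2 \in C$. Then, if either

• $b = 0.5$, or
• $0 < \min\{b, 1-b\} < 0.5$ and $g_2\!\left(g_1^{-1}\!\left(\frac{b}{1-b}\right)\right) \le \frac{b}{1-b}$,²

we have

$$\lim_{\epsilon \to b} \frac{B_\tau^2(\epsilon)}{B_{S_b}^2(\epsilon)} \ge 1.$$

Proof: Denote by $S^1$ the S-estimate of regression based on the function $\chi_1$ and let $g_i = g_{\chi_i}$, $i = 1, 2$. Then,

$$\lim_{\epsilon \to b} \frac{B_\tau^2(\epsilon)}{B_{S_b}^2(\epsilon)} = \lim_{\epsilon \to b} \left\{ \frac{B_\tau^2(\epsilon)}{B_{S^1}^2(\epsilon)} \cdot \frac{B_{S^1}^2(\epsilon)}{B_{S_b}^2(\epsilon)} \right\}.$$

Now,

$$\lim_{\epsilon \to b} \frac{B_\tau^2(\epsilon)}{B_{S^1}^2(\epsilon)} = \lim_{\epsilon \to b} \frac{g_2\!\left(g_1^{-1}\!\left(\frac{b-\epsilon}{1-\epsilon}\right)\right) + \frac{\epsilon}{1-\epsilon}}{g_2\!\left(g_1^{-1}\!\left(\frac{b}{1-\epsilon}\right)\right)} = \begin{cases} 1, & \text{if } b = 0.5 \\[4pt] \dfrac{b/(1-b)}{g_2\!\left(g_1^{-1}\!\left(\frac{b}{1-b}\right)\right)}, & \text{if } 0 < \min\{b, 1-b\} < 0.5. \end{cases}$$

The hypothesis $g_2\!\left(g_1^{-1}\!\left(\frac{b}{1-b}\right)\right) \le \frac{b}{1-b}$ implies that the last limit above is greater than or equal to one for all $0 < b < 1$, and so by Proposition 1 the result follows.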
q Proposition 3 : Let M be an MM-estimate of regression based on X 1 E Cb and X2 E C , 0 < b < 0.5. Then, if either • b = 0.5 Or 2 If xi(x) > x2(x)Vx, then gi (t) > g2(t)V t, and so g2(g1 1 (t)) < t, V t. Usually, xi and X 2 are taken in the same family of functions (e.g. Tukey's family, see section 4.1) . In this case, since x i is chosen to attain a high breakdown point and X 2 to attain high efficiency, the choice x i > X2 is the natural one to do. 32 • 0 < b < 0.5, g 2 r 1 (lb b )) < ib b 3 , C > 0 and d > c such that xi(x) = 0, V Ix' < c and xi is strictly increasing and two times continuously differentiable on (c, d) Btr(C) llin > 1. B b E) Proof: Suppose first that b = 0.5. For each 0 < c < 0.5, let fe (b) = Bk(c) and b(c) = arginine<b<r — EL(b). It was proved by Yohai and Zamar (1991) that if T is an estimate of regression depending only on the residuals, then for each 0 < c < 0.5, Bk o (c) < BRE). This fact implies that e_.1kno .5 :To 1.BST (E)) Now, note that since b(E) 0.5 as c T 0.5, B b( c) (€)ER1113.5 B s2 i (c = 1. and since lim B2 (E B M (E) .14 (c) ) m b(0 m ' = firn Bs2 4 (€) BL) (E) BS (e) the assertion follows. Now suppose that 0 < b < 0.5, then4 : B2 (C)ihn M (E 00 a2 (9;1 01(1 -0)) 2 c2 h -1 (b/(1-0) if c = 0 if c > 0 Suppose that c > 0 and denote by S 1 the S-estimate based on xi. Since g 171 ( i b b ) < g2-1 (16=6) B 2 (E) B21 (E) li111 M > 11111 S > 1.0 6.-+b B s2 b (c) B s2 b (E) 3 this condition will be automatically satisfied for the MM-estimates such as we have defined them in Section 3.4 since it is required that X2 < xi. 4 We delay the proof of this fact until Section 5.5. 33 B2 (f) urn BS (E) Bk(e) 9-1 ( ib e ) = lim E—rb [h—i ( i b e ) BR2(S) = ]h-1 0=0 2 9 -1 ( it€, ) 5.2 The Definition of the Breakdown Rate Define eb as the family of S-,T- and MM-estimates with breakdown point b. 
The results of the previous section motivate us to define the Breakdown Rate (BR) of an estimate Tb E gb as: B R2 (Tb) = lin 6(e)i., B2 E Under the assumptions stated in propositions 1, 2 and 3, the BR indicates the speed of divergence to infinity of the square of the maximum asymptotic bias function of an estimate Tb with respect to that of a baseline function, namely the maximum asymptotic bias of Sb, Bk(c). The BR is a measure of global robustness which summarizes information contained in the last portion of the maximum asymptotic bias function of Tb. It provides a simple way of comparing robust estimates with the same breakdown point. Note that we are comparing all estimates of Cb with the same estimate, namely Sb E Cb, the min-max asymptotic bias S-estimate of regression. 5.3 Breakdown Rate of S-Estimates of Regression Let 0 < b < 1, x E Cb and S be an S-estimate of regression based on x. In this section we calculate the breakdown rate of S, that is where, h(t) = 2 (1 — (I) (0 ) , with a such that b = 2 (1 — cD(a)) g(t) = I : x(tx)co(x) dx . and 34 The following two results show that we can restrict our attention to the case 0 < b < 0.5. Lemma 2 : Let 0.5 < b < 1 and xi E Cb, X2 E Cl—b. Further assume that either: • 0 < fo" Vi. (y)y dy < oo, or • Xi = Xa, i = 1, 2 where 2[1 — fb(ai)] b and 2[1 — (1.(a2)] = 1 — b. Denote by Sb the S-estimate of regression based on xi and Sl — b the S-estimate of regression based on X2. Then, RBR(S6 ,5 1—b ) = oo. Proof: Assume that 0 < fo Xi(Y)Ydy < co. Let g(t) = g xi (t) and f (t) = gx2 (t). (1 112 — RBR(S b , S i—b ` = I ET b 9-1 (tf_: ..ce ) f-1 04)) We can apply L'IlOspital's rule to compute the limit: f_1( 1:17 t) g- 1 (t)f- 1 (1 — t) = lim lig (t) gl( — 1 (g — 1 t )2 f f g (1 t ) ) t We can write, t2g'(t) = 2 [C0(0) foc° (x)x dx I: o(x It)Vi (x)x dx] f (t) = 21 '3° A(tx)xco(x) dx. 0 Thus, lim( n _1( 9'(9 -1(t)) _ Ern Co(o) r Xi(x)x dx 0(x1(9 1 (t)AC(x)x dx t.1" fi(f-1 (t) t.1 Lc° XVI-1 ( 1 — t)x)xco(x)dx = 00. 
Now suppose that xi is of the jump type with jump constant ai, i = 1, 2 where 2[1-4)(a1)] = b and 2[1 — (I)(a2)] = 1 — b. 35 [4, —1 (1 rtf..7 )- 1 (1 1—b73: 70 1 2RBR(Sb , S i—b ) c-+1-b 4)-1 (1 1-r-14.) 4, —1 (1 2(1b—c)) • Now, by applying L'HOspital's rule we get 21 lira 4._i 1 - 1 - b - f) _ i ( 1 b Ern. , - 4,-1 ( 1 ( 6) 6-4-19 2(1 - C) ) 2(1- c)) ,---,1-b ipp-i (1 12 -t, 0 4) = lirn 1-4,_i (i 1 _ b - c)1 2 (1.1 (41 1 ( 1 2 ( 1 6 ))) c—a—b i 2(1 - e) LI (70 (4,-1 (1 ..1L.y)) 1 =limo (P(0)o° t2W(t) = 0 And so, RBR(Sb , S 1 - b ) = 00.0 Let Sb be the family of S-estimates based on x functions such that x E Cb. Proposition 4 : If 0.5 < b <1 and S E Sb then BR(S) oo. Proof: Let S1-b be any S-estimate in S i_b such that it is based on a function of the same type as the function on which S is based (i.e. if S is based on a jump type function then S 1- b should be based on a jump type function as well, and similarly if S is based on a continuously differentiable in all but a finite number of points function). Then, since B R2 (S) RBR(S, S 1 - b )B R2 (S") > RBR(S, S1 -b) the result follows as a consequence of the previous lemma. q We will concentrate now in the case when 0 < b < 0.5. g-i_b_ Note that if L 1 = lim,_,b h 1-c/ ) and L2 = 1—c both exist, then BR2 (S) = -1 i (1-) (L1L 2 ) 2 • Lemma 3 : Suppose that x E Cb and 0 < fo x'(y)ydy < oo. Then 1 n1 g-1(t) 1 j.c° Li X'(Y)Yt i h-1 (t) a 36 Proof: Since h-1 (t) and g -1 (t) tend to infinity as t tends to one, we can apply L'HOspital's rule to compute L 1 . Let d 1 L t. m d h -1 (t) I = h t-o. 11_ 1 dt F=ITtT [g -1 (o] 2 9,/ (g -1 (0) [h -1 MP il l ( h-1 (0) . Then, if LI exists, L 1 = LI. 
Note that h'(t) = g'(t) = 2 a (a —co — t2 t ) ' T L x (ow (Y dy; and that a Taylor's series expansion of order 1 around 0 gives co ( z z7) = cp(0) + o (7) , as 1 --- 0 so that we can write, a h'(t) = 2T k(0) + o (7)] , as t —> oo; 1 CO 00 g'(t) = 2 — t2 {cc( 0)1 Xi (Y)Y dY + I X' (Y)Y 0 () dy] , as t -- oo.o o Hence, I = lim (g-1(t))2 g'(g-1(t)) 1 rx) L1 (h-1 (t)) 2 hqh-1 (t)) a JO ay and the result follows. q Lemma 4 : Let x E Cb and L2 = lint-40 9: 14 • Suppose that 3 c > 0 and d > c such that X(y) = 0 dy E [-c, c] and x is strictly increasing and two times continuously differentiable in (c, d). Then if c = 0, L2 = 00 and if c > 0, a L2 = —. C 37 Proof: Since h-1 (t), g -1 (t), h'(t), g'(t) —.). 0 as t —> 0, we can apply L'IlOspital's rule (two times) to compute L. Let d 1 L 2 = um dt N -1 (t)p .... [h-1 (t)]3 11 1 0-1 (t)) t--*C1 cclitlh-11(0p — [g -1 (t)] 3g'(g -1 (t)) and L2 = lim 11[11— 1 (OP h/ ( h-1 (0)} t-4) 1{[g-1(013g 1(9-1(0)} 3[h-1 (01 2 + [11 -1 (0]3 hi: : hhi: tt?? If L2 exists, then L2 = L2 and so L3 = L2. Now, a) ah'(t) = 2c7o (--t- T1 1 211"M = h'(t) t- 72- — 2 1 c° g'(t) 2— Xi(Y)FP (-) dyt2I y g"(t) = 2— XilY/Y3C0 (—) dy — 2 Tg'(t)t5 I so that g"(t) g'(t) h"(t) h'(t) 1 fr xi(y)y3ca (f) dy 2 _ 1 1 fr V(Y)Y3co (f) dy 2t2 1 = t3 Jr v(y)yv (f) dy t t3 t fr Xi (Y)A0 () dy T1 (a2 — 2t2 ) and so a2 + h -1 (t) I2 = 1114 f" x i (y)y3 (P(n dY n-1(t)locx) XIMY(P(ndY + .7 Under the stated hypothesis on X, fr x'(Y)Y3co (f) dy = t2 fc7t X/(tY)Y3(10(Y) dy fog Xi(Y)YV () dy rit xqtY)Ycio(y)dy = 3[g -1 (t)] 2 + [g -1 ( t )1 3 g i : .gi- 11 till 38 Let 2 f(e+h)it XVY)y3y,(y) dy f(t,h) = t co • t > 0 , h > 0. Ac+h)/t V(tY)Y50(y) dy Let t, h > 0 and y > (c + h) I t. Then, by performing a Taylor's series expansion of order 0 around c + h we can write X'(ty) = X i(c + h) + Ro(ty, c + h) where Ro(ty, c + h) = x"(e)[ty — (c + h)] for some E (c + h, ty). 
Then, _f (t, h) t2 X1(,C, a 1 + h L) , rfr+oeh)/t Y3CO(Y)clY + f(7:Fol t R(ty, c + h)y3co(y) dy X" lc + i(+h)lt MY) dy + fic:)+01t R(ty, c + h)p,o(y) dy • But, t2 f( :+h)/t y3co(y) dy — (LA) [(c + h) 2 + 2t2 ] = Acc°+hv t Y Co(Y) dy (to ( gin.) = (c + h) 2 + 2t2 ; t2 fr+hvt Ro (ty, c + h)y3 co(y) dy = f(+h)/t ycp(y) dy and fo co C0 (x + LttLI) dxRo(tx + c + h, c + h)(tx + c + h)3 t (L+A) `P t Ac ° Ro(tY , c + h*P(Y) clY-E0 it I °° +F= Ro(tx c + h, c + h)(tx + c + h) i co (x + -Li) dx frc+hvt Y40(Y) 4 o t co (L+A) where Ro(tx + c + h, c + h) = x" (e)tx , for some e E (c + h,tx -I- c + h) for each x > O. Therefore, f (t, h) —+ c2 for (t, h) --- (0, 0) -1- and a2L2 -=- - - 111 c Theorem 2 : Let 0 < b < 0.5 and x E Cb. Suppose that 3 c > 0 and d > c such that X(Y) = 0 Vy E [—c, c] and x is strictly increasing and two times continuously differentiable in (c,d). Let S be the S-estimate based on X. • If c , 0 and 0 < fr V(Y)Y dy < oo, then BR2 (S) = oo V b. 39 • If c > 0 and 0 < b < 0.5, then 1 BR2 (S) = [y l (1 2(1 b b) ) g-1 (1 b b )12 .— • If c > 0, 0 < fr X'(Y)Y dy < 00 and b = 0.5, then (5.2) BR(S)2 = [-1 f c° X'(0Y dd 2 • c Proof: It is a direct consequence of lemmas 3 and 4. q EXAMPLE 3: Let { XC,A = and let Sc,A be the S-estimate based on XC,A• that is if 0 < Ix' < C if C < Ix' < A if ixl > A Let 0 < b < 0.5 be such that E4)xc,A(Z) = b 0 x2 — C2 A2 —C2 1 A2 — 2 C2 [40(A)(1 — A2 ) — Aco(A) — 4, (C)(1 — C2 ) + Cco(C)J + 2 = b. The choice C = 0.202 and A = 1 gives BP(S") = 0.5. 
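The quantities reported for this example can be checked numerically. The sketch below is a hedged illustration, not the author's computation. It assumes that χ_{C,A} equals 0 for |x| ≤ C, (x² - C²)/(A² - C²) for C < |x| ≤ A and 1 for |x| > A, that the sensitivity of an S-estimate is 2/g'(1), and that (5.2) reads BR(S) = (1/c)∫χ'(y) y dy. The function names `Phi`, `phi` and `example3_summary` are hypothetical.

```python
import math


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def example3_summary(C, A):
    """Return (b, SENS, efficiency, BR) for the S-estimate based on chi_{C,A}.

    b is the expectation of chi_{C,A}(Z) under the standard normal, SENS is
    taken as 2 / g'(1), the efficiency as g'(1) * (A^2 - C^2) / 2, and BR as
    (1/C) * integral of chi'(y) * y over [C, A] (assumed reading of (5.2)).
    """
    width = A**2 - C**2
    # Expectation of chi_{C,A}(Z), Z standard normal.
    b = (2.0 / width) * (
        Phi(A) * (1.0 - A**2) - A * phi(A) - Phi(C) * (1.0 - C**2) + C * phi(C)
    ) + 2.0
    k = Phi(A) - A * phi(A) - Phi(C) + C * phi(C)
    sens = width / (2.0 * k)
    eff = 2.0 * k
    # For C = 0 the breakdown rate is infinite, so avoid dividing by zero.
    br = math.inf if C == 0 else (2.0 / 3.0) * (A**3 - C**3) / (width * C)
    return b, sens, eff, br


b, sens, eff, br = example3_summary(0.202, 1.0)
print(f"b = {b:.4f}, SENS = {sens:.3f}, efficiency = {eff:.3f}, BR = {br:.3f}")
```

With C = 0.202 and A = 1 these expressions give b close to 0.5, a sensitivity close to 4.879 and a breakdown rate close to 3.412. Setting C = 0 recovers the estimate S_A of Example 1, for which the breakdown rate is infinite.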
Since gc,A(1) = A2 _ 4 c,2 [4)(A) — Aco(A) — 4, (C) + Cco(C)] we have that A2 — C2SENS(Sc 'A ) = 2[4,0)_ Aco(A) — 4.(c) + cw(c) , and the efficiency of S c,A at the Gaussian model is e(S") = 2WA) — Aco(A) — 4)(C) + Cco(C)]• To compute the breakdown rate of Sc ,A note first that [co 2 A3 — C3A Vc,A(x)x dx — 3 A2 _ 6,2 and by(5.2) A, 2 g — C3 11 2 BR2 (SC" - ) = [3 A2 — C2 C 40 estimate C A BP SENS efficiency BR 5c'' ,A 0.202 1 0.5 4.879 0.196 3.412 SA 0 1.041 0.5 4.950 0.219 00 Table 5.1: Comparison of two S-estimates with the same BP but markedly different bias per- formance Note that if C = 0, the estimate reduces to the one introduced in Section 4.1, Example 1, based on XA given by (4.8). We summarize the quantifiers calculated above for the specific values of C and A in Table 5.1 including SA as well. Notice that these two estimates have very similar asymptotic properties such as the BP, SENS and efficiency. They can only be distinguished in terms of their BR. 5.4 Breakdown Rate of T-Estimates of Regression Let 0 < b < 1, Xi E Cb, X2 E C and r be a 7-estimate of regression based on x i and x2 . If b 0.5, the breakdown rate of r is BR2 (r) = lim B2(E)T e—.b (E) b gT1 (?) h -1 1— bk2 (g171 (T-bb ))1 lim [€—nb h-1 617) giT-1 0:1 2 (b—e) 1—f and if b = 0.5 limB R2 (7) = b g 1 ` 1 6E) 0 :fe )h T 1 2 h-1 0-2c ) g1 1—E where gi(t)= gx.(t), i = 1,2 and h is the same function defined in Section 5.3. Lemma 5 : Let 0.5 < b < 1, E C, i = 1,2, Xi E Cb and Xi E C1 —b. Let yi be the y-estimate based on Xi and Xa. Then RBR(r 1 ,7-2 ) = 00. 41 Proof: Let g i(t) = g x; (t) and fi(t) = gx./ (t) , j = 1,2. b ))] -1 RBR(r 1 ,r2 ) 1 = 3111b [12 (K 1 ( b 2 [g2 (gr 1 (lb Y))]-1 and by lemma 1, this limit is equal to infinity. q Let Tb be the family of 7-estimates based on functions 2 [ 917 1 ( i t%) fi-1 (1 _ 1b e ) g r 1 ( lb 1 fe ) K 1 ( 1 : be ) 1 E Cb and X2 E C. The following result shows that we can restrict our attention to the case 0 < b < 0.5. 
Proposition 5 : If 0.5 < b < 1, X1 E Cb, X2 E C and r is the 7-estimate of regression based on xi and x2, then BR(r) = Proof: Let 7 1— b be any 7-estimate in Ti_b. Then since BR2 (r) > RBR(r,r 1— b), the result follows as a consequence of Lemma 5. q Theorem 3 : Let 0 < b < 0.5, xi E Cb, X2 E C and r be the 7-estimate of regression based on Xi and X2. Suppose that ]c > 0 and d > c such that xi(y) = 0 Vy E [—c, c] and x i is strictly increasing and two times continuously differentiable in (c, d). Then, • if c 0 and 0 < fo vicoydY < oo, then BR(r) = oo; • if c > 0 and b < 0.5, BR2 (7) = b —1 (ib b)j 21 — bk2 (9'1 ))1 {Y 1 ( 1 2(1 b— b)) • if c > 0, b = 0.5 and 0 < Xi(Y)Y dY < 00, 2 (5.3) BR2(7) = Xi(Y)y dy] Proof: It is also a consequence of lemmas 3 and 4. q Corollary 1 : If b = 0.5, S E Sb is based on some function x i and r E 7 is based on x i and X2, then BR(S) = BR(r). If b < 0.5 then BR(S) < BR(r). Remark:In the case c > 0, b = 0.5, the BR of the 7-estimate does not depend on the choice of x 2 . 42 5.5 Breakdown Rate of MM-Estimates of Regression Let 0 < b < 0.5, Xi E Cb, X2 E C and M be an MM-estimate of regression based on x i and x2 . The breakdown rate of M is 2E)( BR2(M) = Ern B m., E) 92 1 [g 2 (g r1 0:0) J h-1 (( 1 —E/c) 21—€ = c-'b h-1 ( 14-4) Lemma 6 : Let 0 < b < 0.5 and xi E C, i = 1,2. Then, • if b < 0.5, 11111 = c—.b h-1 (1-9 g2 [9,2 (9,17 1 (1 1: cf )) 4_ I . e l 9 1 ( 1— b b ) J h-1 ( 1b1) ; • if b = 0.5 and 0 < foc° V2(Y)Y dy < 00, gv [g2 (gr i 0:0) + isL71 1 r -1 c—).1) j = j: V2 (y)y dy (2 - trim° g20)) •lim h-1 ( i+,) -0 Proof: Assume that b = 0.5. We can apply L'Illispitars rule to compute the limit of interest: if L = lim E-40.5 h-1 —c = lim -1c—n0.5 [g (g2 (g r 1 (0 itcf )) 1Sz7)] [11-1 (M)} 1 L' = lim €—).b d "cll fh-i ( 0.5 )1 -1 i-EJ 7:14i {g2-1 (g2 ( 91 -1 (0.5—e€) —1c 1c)1 92 (g2 (oitc,)) TEE )) [g2 12 (g2 (W)) +E 6)] 2 hil (11 1 (M)) [11 1 92 i- 1 (05—,e)) —1 . 
A (gi.An (0i%€)) g2 1 (g2 (g171 (01.57)) + l i e ) and Or = X {2 43 Then, if L' exists, L = L'. Now, since h' (t) = 2t2 [co(0) o ( c.±)] as t - 00 92(t)t) = 2t P 00 O) X'2(Y)Y dy + Joy a () d yi as t oo ; then, (9,2 (91-1 (015 t—cc)) 1 l c )] 2 A (gV (g2 r 1 ( Oitf e rim 1 = [h -1 (II)] 2 h' (M))--€ 1 fcc)j X2(Y)Y dy. qa g(g- ( t 0 )) ) iRemark: Unfortunately, we were not able to compute lim t, g0 , g 1. ( n general. However( we will compute it for one important special case. Let xi , X2 E C be such that x2(x) < (x), V x. Suppose that there exists c 1 > 0 such that X1(x) = 0 if x E [0, Then there exists c 2 > c i such that X 2 (x) = 0 if x E [0, ed. If we want to define an MM-estimate based on xi and X 2 , the choice of X 2 should be done to obtain efficiency at the Gaussian model and so X2 should be as closely as possible to x(x) = x 2 . For this reason we only consider the case c i = c2 = c. Now, g2(t) .17 x2(y)y4' dy 9i(t) Vi(y)yco (f) dy .17it V2(ty)w(y) dy Ljt Xi(tY)Y (P(Y) dy If we let f (t, h) — roofrc-f-hvt X2(tY)P,o(y) dy for t Aci-hvt X1(tY)A0(Y) dy > Oi and assume that there exists d > c such that Xi is strictly increasing and two times continuously differentiable in (c, d). By continuity of on (c, d), for each y > (c h) I t we have x2(ty) = x2(c h) + Po(ty, c + h), Vi (ty) xii(c h) Ro (ty, , c h)• 44 where Ro(ty, c h) and Ro(ty, c + h) converge to zero as ty tends to c + h. Then, tf(,h) = xi (c + h) fr+0,ty,(04 + f c' R, (1 x'2(c + h)co ( 2-11-i ) + r V2 (c + h) f- , I/ ( ) + rh) ( dy , +h)/t Ro(ty, c + h)Ao(y) dy j jj: : ++ hoviit.fio(ty , c + h)Ao(y) dy J(c-1-1)1t —0 ‘ --Y 1 C + h)yco(y) dy Xi (c + h)co (A) + r Ro(ty, c + h)yco(y) dy x(cc-i-+hh) +( ) 1- 3-(71:PL) 474-h)it Ro(ty, c + It)y,p((cti))/t) dy 1 + xl(c1+h) fr-I-Olt Ro(ty, c + h)Y w((-i-Yh)it) dy Then, it's easy to see that if xi is not differentiable at c, then f (t, h) x ( c+) )2 , as (t, h) (0,0); Xi( where xac+) denotes the right lateral derivative of Xi at c, i = 1,2. 
Theorem 4 : Let° < b < 0.5, xi E CI)) X2 E C, X2(x) < xi(x), V x and M be the MM-estimate of regression based on xi and X2. Suppose that 3 c > 0 and d > c such that xi(x) = OV lx1 < c and x i is strictly increasing and two times continuously differentiable on (c, d). Then, • if 0 < b < 0.5 and c > 0, 1 .1 b )1 2BR2 (M) 2(1— b)) — bil Further assume that 0 < fo x'2 (y)y dy < Do. Then, • if c = 0, BR(M) = oo; • if b = 0.5, c > 0 and lim t_,o g'212.0) exists, —1 2(5.4) BR2(M) / 1C foc° A(y)y dy [2 — A(gri(t))] .gl(gT 1 (t)) Proof: Let g2 1 [g2 (91-1 (cli.5_7 )) + 1 1 , 1 = • h -1 `)1—f 45 o.s—Then, since g2 (gr i ( EE)) > 0, g2 (t) < g 1 (t) and g2 is increasing,1 gV ( 1e ) g21( 0.5e) ( ( < f( C) < h-1 (;—c) h-1 (D —E ) . By Lemma 3 the right hand side of the above inequality converges to 1/a fr x 12 (y)y dy as E 0.5 and by a similar reasoning the left hand side tends to -1-,-, fir V2 (y)y dy as E —4 0.5. Since by hypothesis 0 < fr V2(Y)Y dy < oo, there exist A1, A2 such that 0 < A l < A2 < oo and A l < f (c) < A2. Hence, by Lemma 4, if c = 0, BR(M) = oo. If c > 0 and b = 0.5, the result is a consequence of Lemma 6. 46 5.6 Conclusions Following are some conclusions obtained from the results proved in this chapter. • The results of Chapter 5 can be used to choose the loss function, Xi, which determines the breakdown point of S-, 7- and MM-estimates so that they have good bias-robustness properties. In particular, the fact that Xi should be constant and equal to zero on a neighborhood of zero (among other regularity conditions) was first discovered here. • MM- and 7-estimates were developed for the purpose of achieving a high breakdown point and a high efficiency at the Gaussian model. The results in this chapter show that the breakdown rate of 7-estimates with breakdown point equal to 0.5 does not depend on the choice of the "efficiency determining" loss function X 2 . 
On the other hand, the condition that $\chi_2(x) \le \chi_1(x)$ for all $x$, required for MM-estimates, forces $\chi_2$ to be constant near the origin, with an ensuing loss of efficiency (see the remark to Lemma 6, Section 5.5).

• We think that the breakdown rate is a good criterion for defining optimality, as in the following problem: "maximize the efficiency of an estimate subject to a constraint on its breakdown rate". If we can find an estimate that solves such a problem, it will be an adaptive estimate in the sense that if the model is Gaussian or nearly Gaussian the estimate will perform well (because of its high efficiency), and if the fraction of contamination is high the estimate will perform well compared to other estimates with the same breakdown point.

• The breakdown rate of an estimate is a robustness quantifier of an asymptotic nature. It remains to be determined whether the good breakdown rate properties of an estimate carry over to finite sample situations. A next step in this work will be to perform an extensive Monte Carlo study.

Bibliography

Andrews, D.F., Bickel, P.J., Hampel, F.R., Huber, P.J., Rogers, W.H. and Tukey, J.W. (1972), Robust Estimates of Location: Survey and Advances, Princeton University Press, Princeton, N.J.

Donoho, D.L. and Huber, P.J. (1983), The notion of breakdown point, in A Festschrift for Erich Lehmann, edited by P. Bickel, K. Doksum and J.L. Hodges, Jr., Wadsworth, Belmont, CA.

Hampel, F.R. (1968), Contributions to the theory of robust estimation, Ph.D. Thesis, University of California, Berkeley.

Hampel, F.R. (1971), A general qualitative definition of robustness, Ann. Math. Stat., 42, 1887-1896.

Hampel, F.R. (1974a), The influence curve and its role in robust estimation, J. Am. Stat. Assoc., 69, 383-393.

Hampel, F.R. (1974b), Rejection rules and robust estimates of location: an analysis of some Monte Carlo results, Proc. European Meeting of Statisticians and 7th Prague Conference on Information Theory, Statistical Decision Functions and Random Processes, Prague, 1974.

Hampel, F.R. (1976), On the breakdown point of some rejection rules with mean, Res. Rep. No. 11, Fachgruppe für Statistik, Eidgen. Tech. Hochschule, Zurich.

Hampel, F.R. (1978), Optimally bounding the gross-error sensitivity and the influence of position in factor space, in Proceedings of the Statistical Computing Section of the American Statistical Association, ASA, Washington, D.C., 59-64.

Hodges, J.L. (1967), Efficiency in normal samples and tolerance of extreme values for some estimates of location, Proc. Fifth Berkeley Symp. Math. Stat. Probab., 1, 163-168.

Huber, P.J. (1964), Robust estimation of a location parameter, Ann. Math. Stat., 35, 73-101.

Huber, P.J. (1973), Robust regression: Asymptotics, conjectures and Monte Carlo, Ann. Stat., 1, 799-821.

Huber, P.J. (1981), Robust Statistics, John Wiley & Sons, New York.

Martin, R.D., Yohai, V.J. and Zamar, R.H. (1989), Min-max bias robust regression, Ann. Stat., 17, 1608-1630.

Martin, R.D. and Zamar, R.H. (1989), Asymptotically min-max bias robust M-estimates of scale for positive random variables, J. Am. Stat. Assoc., 84 (406), 494-501.

Rousseeuw, P.J. and Yohai, V.J. (1984), Robust regression by means of S-estimators, in Robust and Nonlinear Time Series Analysis, edited by J. Franke, W. Härdle and R.D. Martin, Lecture Notes in Statistics No. 26, Springer Verlag, New York, pp. 256-272.

Yohai, V.J. (1987), High breakdown point and high efficiency robust estimates for regression, Ann. Stat., 15, 642-656.

Yohai, V.J. and Zamar, R.H. (1988), High breakdown-point estimates of regression by means of the minimization of an efficient scale, J. Am. Stat. Assoc., 83, 406-413.

Yohai, V.J. and Zamar, R.H. (1991), unpublished manuscript.

BIOGRAPHICAL INFORMATION

NAME: Sonia V. T. Mazzi

PLACE AND DATE OF BIRTH: Córdoba, Argentina, September 1963

EDUCATION (Colleges and Universities attended, dates, and degrees): Universidad Nacional de Córdoba, Argentina, Licenciada en Matemática