- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Contributions to the theory of robust inference
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Contributions to the theory of robust inference 2000
pdf
Page Metadata
Item Metadata
Title | Contributions to the theory of robust inference |
Creator |
Salibian-Barrera, Matias |
Date Created | 2009-07-23 |
Date Issued | 2009-07-23 |
Date | 2000 |
Description | We study the problem of performing statistical inference based on robust estimates when the distribution of the data is only assumed to belong to a contamination neighbourhood of a known central distribution. We start by determining the asymptotic properties of some robust estimates when the data are not generated by the central distribution of the contamination neighbourhood. Under certain regularity conditions the considered estimates are consistent and asymptotically normal. For the location model and with additional regularity conditions we show that the convergence is uniform on the contamination neighbourhood. We determine that a class of robust estimates satisfies these requirements for certain proportions of contamination, and that there is a trade-off between the robustness of the estimates and the extent to which the uniformity of their asymptotic properties holds. When the distribution of the data is not the central distribution of the neighbourhood the asymptotic variance of these estimates is involved and difficult to estimate. This problem affects the performance of inference methods based on the empirical estimates of the asymptotic variance. We present a new re-sampling method based on Efron's bootstrap (Efron, 1979) to estimate the sampling distribution of MM-location and regression estimates. This method overcomes the main drawbacks of the use of bootstrap with robust estimates on large and potentially contaminated data sets. We show that our proposal is computationally simple and that it provides stable estimates when the data contain outliers. This new method extends naturally to the linear regression model. |
Extent | 8111641 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
File Format | application/pdf |
Language | Eng |
Collection |
Retrospective Theses and Dissertations, 1919-2007 |
Series | UBC Retrospective Theses Digitization Project [http://www.library.ubc.ca/archives/retro_theses/] |
Date Available | 2009-07-23 |
DOI | 10.14288/1.0089805 |
Degree |
Doctor of Philosophy - PhD |
Program |
Statistics |
Affiliation |
Science, Faculty of |
Degree Grantor | University of British Columbia |
Graduation Date | 2000-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
URI | http://hdl.handle.net/2429/11164 |
Aggregated Source Repository | DSpace |
Digital Resource Original Record | https://open.library.ubc.ca/collections/831/items/1.0089805/source |
Download
- Media
- ubc_2000-566137.pdf [ 7.74MB ]
- Metadata
- JSON: 1.0089805.json
- JSON-LD: 1.0089805+ld.json
- RDF/XML (Pretty): 1.0089805.xml
- RDF/JSON: 1.0089805+rdf.json
- Turtle: 1.0089805+rdf-turtle.txt
- N-Triples: 1.0089805+rdf-ntriples.txt
- Citation
- 1.0089805.ris
Full Text
Contributions to the Theory of Robust Inference by Matias Salibian-Barrera Licenciado en Matematicas, Universidad de Buenos Aires, Argentina, 1994 A T H E S I S S U B M I T T E D I N P A R T I A L F U L F I L L M E N T O F T H E R E Q U I R E M E N T S F O R T H E D E G R E E O F Doctor of Philosophy in T H E F A C U L T Y O F G R A D U A T E S T U D I E S (Department of Statistics) we accept this thesis as conforming to the required standard The University of British Columbia July 2000 © Matias Salibian-Barrera, 2000 In presenting t h i s thesis i n p a r t i a l f u l f i l m e n t of the requirements for an advanced degree at the U n i v e r s i t y of B r i t i s h Columbia, I agree that the Library s h a l l make i t f r e e l y a v a i l a b l e for reference and study. I further agree that permission for extensive copying of t h i s thesis for s c h o l a r l y purposes may be granted by the head of my department or by h i s or her representatives. It i s understood that copying or p u b l i c a t i o n of t h i s thesis for f i n a n c i a l gain s h a l l not be allowed without my written permission. Department of STA-TfS~T( The U n i v e r s i t y of B r i t i s h Columbia Vancouver, Canada doii 2o\t ZJDJOO. Date A b s t r a c t We study the problem of performing statistical inference based on robust esti- mates when the distribution of the data is only assumed to belong to a contamination neighbourhood of a known central distribution. We start by determining the asymp- totic properties of some robust estimates when the data are not generated by the central distribution of the contamination neighbourhood. Under certain regularity conditions the considered estimates are consistent and asymptotically normal. For the location model and with additional regularity conditions we show that the conver- gence is uniform on the contamination neighbourhood. We determine that a class of robust estimates satisfies these requirements for certain proportions of contamination, and that there is a trade-off between the robustness of the estimates and the extent to which the uniformity of their asymptotic properties holds. When the distribution of the data is not the central distribution of the neighbourhood the asymptotic vari- ance of these estimates is involved and difficult to estimate. This problem affects the performance of inference methods based on the empirical estimates of the asymptotic variance. We present a new re-sampling method based on Efron's bootstrap (Efron, 1979) to estimate the sampling distribution of MM-location and regression estimates. i i This method overcomes the main drawbacks of the use of bootstrap with robust esti- mates on large and potentially contaminated data sets. We show that our proposal is computationally simple and that it provides stable estimates when the data contain outliers. This new method extends naturally to the linear regression model. i i i C o n t e n t s Abstract ii Contents iv List of Tables vii List of Figures ix Acknowledgements xii Dedication xiv 1 Introduction 1 1.1 Robust estimates and data screening 3 1.2 The variability caused by cleaning the data 7 1.3 Inference based on robust estimates 12 1.4 Bootstrapping robust estimates 15 1.5 A new computer intensive method 17 1.6 Thesis outline 25 iv 2 Global asymptotic properties of robust estimates for the location- scale model 27 2.1 Definitions 29 2.2 Robustness properties 36 2.3 Asymptot ic properties 40 2.3.1 Uniform consistency of the S-scale estimate 41 2.3.2 Consistency of the S-location estimate 43 2.3.3 Uniform consistency of the S-location estimate 51 2.3.4 Consistency of the MM-loca t ion estimate 61 2.3.5 Uniform consistency of the MM-loca t ion estimate 64 2.3.6 Asymptot ic distribution of the MM-loca t ion estimate 68 2.3.7 Uniform asymptotic distribution for MM- loca t i on estimates . . 74 3 Robust bootstrap for the location-scale model 84 3.1 Definitions 91 3.2 Examples 95 3.2.1 One sample location-scale: B lood pressure 95 3.2.2 Two-sample location-scale: Seeded clouds 96 3.3 Asymptot ic properties 101 3.4 Robustness properties I l l 3.5 Studentizing the robust bootstrap 115 3.6 Inference 119 3.6.1 Asymptot ic variance estimation 121 3.6.2 Coverage and lengths of confidence intervals 125 4 Global asymptotic properties of robust estimates for the linear re- gression model 138 4.1 Definitions 139 4.2 Robustness properties 144 4.3 Asymptot ic properties 146 4.3.1 Consistency of the S-scale estimate 147 4.3.2 Consistency of the S- and MM-regression estimates 161 4.3.3 Asymptot ic distribution of the MM-regression estimate . . . . 164 5 Robust bootstrap for the linear regression model 166 5.1 Definitions 169 5.2 Examples 172 5.2.1 Body and Bra in Weights 172 5.2.2 Belgium International Phone Calls 174 5.3 Asymptot ic properties 180 5.4 Robustness properties 188 5.5 Inference 195 5.5.1 Empir ica l coverage levels of confidence intervals 195 6 Conclusion 208 7 Appendix - Auxiliary results 212 Bibliography 233 v i L i s t o f Tab les 1.1 Comparison of the estimated SDs of the linear regression estimates for the Stack Loss data 9 1.2 Comparison of actual and estimated standard deviations using the H R R method to "clean" the data 11 2.1 Numerical parameters for Tukey's family of functions Pd 60 2.2 Numerical evaluation of regularity conditions required for uniform con- sistency of S-location estimates with Tukey's family of functions pd • 60 3.1 Comparison of breakdown points of classical and robust bootstrap quantile estimates for MM-location estimators 115 3.2 Comparison of asymptotic variance estimates - quadratic measure . . 125 3.3 Comparison of asymptotic variance estimates - logarithmic measure . 126 3.4 Coverage and length of 95% confidence intervals for the location-scale model 129 3.5 Coverage and length of 99% confidence intervals for the location-scale model 129 vii 5.1 Belgium International Calls - Bootstrap and robust bootstrap 95% confidence intervals 177 5.2 Comparison of quantile upper breakdown points for linear regression . 194 5.3 Average coverage and length of 5,000 95% confidence intervals for the linear regression model with p = 2 198 5.4 Coverage and length of 5,000 99% confidence intervals for the linear regression model with p = 2 199 5.5 Coverage and length of 5,000 95% confidence intervals for the linear regression model with p = 5 200 5.6 Coverage and length of 5,000 99% confidence intervals for the linear regression model with p = 5 202 viii L i s t o f F i g u r e s 1.1 Least Squares residuals for the Stack Loss data 5 1.2 Robust Regression residuals for the Stack Loss data 6 1.3 QQ-plots of the re-sampled Intercept coefficient estimates obtained with both the classical and robust bootstrap for the Stack Loss data. 21 1.4 QQ-plots of the re-sampled Air Flow coefficient estimates obtained with both the classical and robust bootstrap for the Stack Loss data. 22 1.5 QQ-plots of the re-sampled Water Temperature coefficient estimates obtained with both the classical and robust bootstrap for the Stack Loss data 23 1.6 QQ-plots of the re-sampled Acid Concentration coefficient estimates obtained with both the classical and robust bootstrap for the Stack Loss data 24 2.1 Plots of 7 (Fe, t, cr (F€)) for e = 0.15 and 0.20 48 2.2 Plots of 7 (Fe, t, cr (Fe)) for e = 0.25 and 0.30 49 2.3 Plots of 7 (Fe, t, cr (F€)) for e = 0.40 and 0.45 50 2.4 Plots to determine regularity conditions on pd 59 ix 3.1 Boxplots of 3,000 re-calculated MM-location estimates with the classi- cal and robust bootstrap 88 3.2 Boxplots of 3,000 re-calculated MM-location estimates with the classi- cal and robust bootstrap on an artificial data set with 35% outliers. . 90 3.3 Comparison of the classical and robust bootstrap distribution estimates for the modified blood pressure data 97 3.4 Precipitation data 98 3.5 Comparison of asymptotic variance estimates - quadratic measure . . 130 3.6 Comparison of asymptotic variance estimates - logarithmic measure . 131 3.7 Location-scale model 95% confidence intervals for n = 20 132 3.8 Location-scale model 95% confidence intervals for n = 30 133 3.9 Location-scale model 95% confidence intervals for n — 50 134 3.10 Location-scale model 99% confidence intervals for n = 20 135 3.11 Location-scale model 99% confidence intervals for n = 30 136 3.12 Location-scale model 99% confidence intervals for n = 50 137 5.1 Least squares and robust regression fits to the Brain and Body Weight data 175 5.2 Classical and robust bootstrap distribution estimates for the Brain and Body Weight data 176 5.3 Least squares and robust regression fits to the Belgium International Phone Calls data 178 5.4 Comparison of classical and robust bootstrap distribution estimates for the Belgium International Phone Calls data 179 x 5.5 Average coverage of 95% confidence intervals for the linear regression model with p = 2 204 5.6 Average coverage of 99% confidence intervals for the linear regression model with p = 2 205 5.7 Average coverage of 95% confidence intervals for the linear regression model with p = 5 . . . . . . . . 206 5.8 Average coverage of 99% confidence intervals for the linear regression model with p = 5 207 xi A c k n o w l e d g e m e n t s Many people helped me during the years I spent at the Department. No line of this thesis could have been written without my wife Veronica D'Angelo by my side and our son Lucas constantly teaching me what is really important. Financial support was provided by the University of British Columbia through a University Graduate Fellowship; by the Department of Statistics via part-time appointments as Teaching Assistant; by Dr. Ruben Zamar with appointments as Research Assistant during many summers and by the Statistical Consultant and Re- search Laboratory with a part-time position. I owe much more than what I can express here to our families back home (Alfredo, Marisa and Florencia Salibian, Oscar and Noemi D'Angelo) who always believed in us and unconditionally supported us all these years; to our friend Sonia Mazzi, who made our settling in this remote land a smooth and pleasant adven- ture; to the Palejkos (George, Gi l l , Nancy, Ingrid, and Roger) who became our fam- ily in Canada; and to our good friends Daniel and Iris Brendle-Moczuk, Genevieve Gagne, Stephane Lemieux (and Eugenie!), Remi Gilbert, Caroline Grondin (Victor and Zacharias), the Hollmans (Jorge, Sandra and Rocfo), Jorge Adrover, Hector Pala- cio, Luis Suero, and Carlos Di Bella. xii Many people at the Department also helped in different ways: Jarek Harezlak, being a good friend and colleague; Rick White, who was always ready for a chat; Christine Graham (what would we do without her?); Dr. Nancy Heckman and Dr. John Petkau, who on many occasions spent some time with the students at the pub; and last but not least: everybody at "The L. Gang". Finally, I would like to thank Dr. Ruben Zamar for his encouragement, con- stant availability, patience, guidance and support; Dr. Nancy Heckman and Dr. John Petkau for their helpful comments, suggestions and advice; and Dr. Jim Zidek for his advice. M A T I A S S A L I B I A N - B A R R E R A The University of British Columbia July 2000 xii i For Lucas and Vero xiv C h a p t e r 1 I n t r o d u c t i o n In this chapter we introduce and illustrate the problems addressed in the rest of this thesis. In the first section we use a real-life example to show how an analysis based on a robust regression estimate compares with previous analyses of these data. In those analyses the data were carefully screened, suspicious observations were deleted, and least squares methods were used on the remaining data. Unfortunately there are no proposals in the literature to consistently estimate the variability of the estimates obtained after deleting potential outliers. The second section of this chapter explores this problem. An alternative method to perform statistical inference when the data are con- taminated is to use robust estimates. Most attention in the robust literature has been paid to the case of errors with symmetric distributions. Section 1.3 briefly re- 1 views some of the published studies for asymmetric distributions. In the same section we discuss our results regarding the asymptotic properties of some robust estimates for more general error distributions. In particular we study their consistency and asymptotic distribution. Empirical estimates of these asymptotic variances provide consistent estimates of the variability of these robust estimates. Unfortunately, sim- ulation experiments suggest that they can be numerically unstable and hence yield poor estimates. We also consider computer-intensive inference methods, in particular Efron's bootstrap (Efron, 1979). In Section 1.4 we discuss two drawbacks of the use of Efron's bootstrap with robust estimates. Both the presence of outliers in the data and the computational complexity of robust estimates are important challenges for this method. In Section 1.5 we introduce a new computer-intensive method that overcomes these limitations and hence can be used with large data sets that might contain outliers. In this section we present the basic idea for the simple location model with known scale. Details of the application of this method to the location-scale and linear regression models are presented in Chapters 3 and 5 respectively. Finally, Section 1.6 outlines the rest of this thesis. 2 1.1 Robust estimates and data screening Consider the Stack Loss data, first published by Brownlee (1965, page 454). These data have been extensively studied in the literature (see Daniel and Wood, 1980, Chapters 5 and 7; Atkinson, 1985, pp. 129-136, 267-8; and Venables and Ripley, 1997, pp 262-264). They consist of 21 daily observations measured in a plant for the oxidation of ammonia to nitric acid. The response variable is ten times the percentage of ammonia lost. This is an indirect measure of the efficiency of the plant. There are 3 explanatory variables: air flow, temperature of the cooling water and acid concentration. The linear model used in the literature is Ammonia Lost (%) = Po + Pi Ai r flow + p2 Water temperature + Ps Acid concentration + e, (1.1) where e are independent identically distributed normal errors. The residuals of the least squares fit of model (1.1) presented some features worth further consideration. After a very careful analysis of the listing of the data, Daniel and Wood (1971, Chapter 5, page 81) noticed a different behavior in the response variable every time the water temperature was above 60. They concluded that the plant seemed to have needed a period of one day to stabilize after the water temperature reached 60. Hence they decided that the observations that were obtained with water temperature above 60 (cases 1, 3, 4 and 21) require special attention, and were removed from the analysis. Figure 1.1 contains the plot of residuals versus fitted values for the least squares 3 fit. The dotted lines correspond to twice the estimated standard deviation of the er- rors in (1.1). Note that observation number 21 appears to have a residual considerably larger than the others. Three other cases are somewhat outlying, but within 2 es- timated standard deviations from zero. Classical outlier detection methods, such as the externally Studentized residuals test (Weisberg, 1985, page 115-6) only detect observation 21 as an outlier. We also estimated the coefficients of model (1.1) using an MM-regression es- timate with 50% breakdown point, 95% efficiency and Tukey's loss functions (see Sections 4.1 and 4.2 for the corresponding definitions). We worked with the complete data set. The plot of residuals versus fitted values is shown in Figure 1.2. The dotted lines correspond to twice the estimated standard deviation of the errors. With this robust estimate cases 1, 3, 4 and 21 are clearly identified as outliers. This example illustrates the potential of robust estimates. Daniel and Wood (1980) had to rely on an careful analysis of the listing of the data until some pattern seemed apparent. The additional complications and limitations of this method when the data have either more explanatory variables or more cases are obvious. In this example the analysis based on a robust regression estimate yields the same conclusion as Daniel and Wood (1980, Chapter 5), namely that observations 1, 3, 4 and 21 seem to follow a different model from the rest of the data. Note that we did not require a detailed case-by-case analysis as Daniel and Wood did (1980, Chapter 5). From the discussion above one might conclude that the main role of robust estimates is to help to identify outliers or suspicious observations. These cases could 4 < 9 20 FITTED VALUES Figure 1.1: Residuals of the least squares fit for the Stack Loss data. The dotted lines correspond to twice the estimated standard deviation of the errors 5 < a HI IT 20 FITTED VALUES Figure 1.2: Residuals of a robust fit for the Stack Loss data. The dotted lines corre- spond to twice the estimated standard deviation of the errors 6 then be discarded and classical methods applied to the "clean" data set. In the next section we discuss some drawbacks of this approach. 1.2 The variability caused by cleaning the data There are two classes of methods to detect outliers: "subjective" and "objective" procedures. In this section we will focus on outlier detection methods applied to linear regression analyses. Subjective methods rely on the judgment of data analysts. They normally use a classical fit followed by an analysis of the residuals. Using plots and other devices the researcher identifies outliers or suspicious observations. These observations are then removed and classical methods applied to the remaining data. A formal study of the variability introduced into the final least squares esti- mate by these data-cleaning methods seems impossible with the mathematical tools available today (but see Relies and Rogers (1977) for a Monte Carlo experiment on subjective outlier rejection rules). On the other hand, objective methods are based on a well defined rule, such as: "discard all observations with a residual larger than 2 standard deviations", or "reject all observations with associated Cox Distance larger than 1". Because it is expected that if there are outliers in the data the classical fit will be misleading, another set of objective rules are based on the residuals from a robust fit as follows: 7 1. Fit a robust estimate. 2. Calculate a robust estimate of the standard deviation of the residuals, a. 3. Fix a number c > 0 and drop any observation with a residual larger than ca (typically 2 < c < 3). 4. Apply classical methods to the remaining data. We will refer to this last family of methods as "hard rejection rules" (HRR). See Hampel et al. (1986, page 31) for a Monte Carlo study of objective rejection rules for the location model. If we apply steps 1-4 to the Stack Loss data with the same MM-regression estimate we used before, and set c = 2 in step 3, we find that observations 1, 3, 4 and 21 should be removed. The least squares fit of the remaining 17 data points yields regression estimates that are indistinguishable from the MM-regression fit. However, the estimates of the standard deviations of the regression estimates given by the least squares analysis are consistently smaller than those reported by the robust procedure (see Table 1.1). It is important to note that the standard errors of the estimates reported by the least squares analysis of the "cleaned" data do not take into account the variability introduced by the "cleaning" step. In other words, the column of estimated standard deviations in the computer output may not reflect the actual variability of the reported point estimates. 8 Estimated Standard Deviations LS on the Robust Coefficient "cleaned" data fit Intercept 4.732 5.003 Air flow 0.067 0.071 Water temp. 0.166 0.176 Acid cone. 0.062 0.065 Residuals 1.253 1.837 Table 1.1: Comparison of the estimated standard deviations of the linear regression estimates for the Stack Loss data To illustrate this problem we performed a small Monte Carlo experiment (also see Dupuis and Hamilton (2000) for a theoretical assessment of this inference proce- dure). The objective of the experiment is to show that the estimates of the standard deviations of the regression estimates calculated by the H R R method consistently underestimate the actual standard deviations of those regression estimates. In order to do so, we first estimated the actual variability of the point estimates obtained by using a HRR. We considered a linear model of the form Vi = A> + Pi xu H r(3pXpi + 6 i , i = l , . . . , n , (1.2) where e» are independent standard normal random variables. Note that in the above model there are no outliers in the data. We used all the combinations with n = 20, n — 50, p = 1 and p = 3. The robust estimate used in the H R R procedure was a 95% efficient MM-regression estimate with 50% breakdown point and scale calculated with Tukey's loss function (see Sections 4.1 and 4.2 for the corresponding definitions). For each combination of sample size and number of predictors we generated 100,000 9 samples following model (1.2). With each sample we followed steps 1 to 4 above. In step 3 we used c = 2.5 and the robust estimate of the standard deviation of the errors associated with the MM-regression estimate. Our estimate of the actual variability of these estimates is the Monte Carlo standard deviation of these 100,000 coefficient estimates. In Table 1.2, the column labeled "Monte Carlo estimate of the standard deviation" contains this estimated standard deviation for each coefficient in the different models. The next step in the experiment is to show that the estimates of those standard deviations as reported by the H R R analysis are consistently smaller that the estimates obtained in the first part of our study. With the same design matrices we generated 100,000 new samples following model (1.2). We applied steps 1 to 4 as before to each of these new samples, and recorded the estimates of the standard errors of each coefficient as reported by the least squares analysis in step 4. Column " H R R estimates of the standard deviation" in Table 1.2 contains the mean and standard error of these 100,000 estimated standard deviations. From Table 1.2 it is clear that the estimates of the standard deviations reported by the least squares fit after cleaning the data consistently underestimate the actual variability of this estimation procedure. Hence we might obtain optimistic confidence intervals and smaller p-values than their actual value. The researcher should be concerned that this difference can affect the validity of his or her conclusions. An alternative method of performing inference that can deal with outliers in the data is to use robust estimates. These methods naturally incorporate the variability of 10 H R R estimates of the Monte Carlo estimate of n P standard deviation the standard deviation p = 2 20 Po 0.205 (0.046) 0.256 P i 0.227 (0.051) 0.283 50 Po 0.133 (0.017) 0.152 P i 0.138 (0.017) 0.156 p = 4 20 Po 0.182 (0.057) 0.322 P i 0.164 (0.051) 0.410 P2 0.173 (0.054) 0.478 Ps 0.177 (0.056) 0.295 50 Po 0.135 (0.018) 0.159 P i 0.144 (0.019) 0.170 P2 0.145 (0.019) 0.171 p3 0.132 (0.018) 0.157 Table 1.2: Comparison of actual and estimated standard deviations using the H R R method to "clean" the data. The first column contains the Monte Carlo mean of the H R R estimates of the standard deviations and the corresponding Monte Carlo standard deviation within parentheses. The second column contains the estimate of the standard deviations obtained from a separate simulation experiment. These last values are the "actual" standard deviations. 11 the down-weighting step into the estimated standard deviations. In the next section we discuss some limitations of the existing asymptotic theory for robust estimates. 1.3 Inference based on robust estimates The finite sample distribution of robust estimates is unknown and hence inference must be based on their asymptotic distribution (see Hampel et ai, 1986, Chapter 3; Ronchetti, 1982; Markatou and Hettmansperger, 1990; among others). The asymptotic distribution of robust regression estimates is well known when the distribution of the errors is symmetric (Huber, 1967; Maronna and Yohai, 1981; Davies, 1993). In this case the estimates of the regression coefficients and of the scale of the errors are asymptotically independent. Because outliers need not be balanced on both sides of the regression line, many data sets with outliers do not satisfy this symmetry assumption. If one relaxes this condition, the calculation of the asymptotic variance of robust location and regression estimates becomes very involved. The main difficulty seems to be that the scale estimate is no longer asymptotically independent of the estimate of the location or regression parameters. This problem has received little attention in the literature. Carroll (1979), Huber (1981) and Rocke and Downs (1981) are among the few who studied it. Carroll (1979) compared several variance estimates of both location and simple linear regression robust estimates. He showed that the asymptotic variance derived under the symmetry assumption underestimates the true variance. In the 12 location case, this effect can be ameliorated by jackknifing. However, this technique does not seem to work for the intercept of the simple linear regression model. Huber (1981, page 140) gave a formula to compute the influence functions of location and scale estimates when they are calculated simultaneously. Rocke and Downs (1981) also studied variance estimation for robust location estimates when the distribution of the data is asymmetric. Their simulation study concluded that estimating the variance of robust location estimates in this situation is very difficult. In particular, for symmetric distributions the empirical estimate of asymptotic variance estimate worked better than the bootstrap, but for asymmetric distributions the performances reversed. Their numerical results do not show a variance estimation method that yields good estimates for both symmetric and asymmetric distributions. In Sections 2.3 and 4.3 we study the consistency and asymptotic distribution of the S-scale, S- and MM-location and regression estimators (see Sections 2.1 and 4.1 for the corresponding definitions). We assume that the distribution of the errors belongs to a contamination neighbourhood of a symmetric central distribution and show that these estimates are consistent for any distribution in this neighbourhood. For the location-scale model, with further regularity conditions we show that these results hold uniformly on the neighbourhood. That is, the speed of the convergence does not depend on the particular distribution F in the contamination neighborhood rle (see Section 2.2). Formally, the uniformity result we obtain is as follows. Let fin be the robust location estimate calculated on a sample of size n generated by a distribution F 6 rie. Let a- be the almost sure asymptotic value of jxn when n —> oo. 13 Let 5 > 0 be arbitrary, then lim sup PF < sup \pn — p\ > 8 > = 0. m - » o o F e K e yn>m ) We also find that under certain regularity conditions the MM-location estimates are asymptotically normal and we derive an explicit formula for their asymptotic variance. For the location model it has the form V(LI,O,F) = o2a2EF{[U - b x W]2} , (1.3) where U and W are certain random variables (see equation 2.70 on page 71), the constants a and b are given by a=l/EF{ib'{(X-ii)/<j)}, and EpWUX-ri/tT) (X-p)/a} EF {p> ((X - fl) I a) (X - fa) /a} ' where Ji is the almost sure asymptotic value of the S-location estimate p,n associ- ated with the MM-location estimate fin. The functions ip and p are bounded, and continuously differentiable (see Definition 2.9). Another result we derive for the location-scale model is that the asymptotic normality of these estimates holds uniformly on the distribution generating the data. That is, we have lim sup sup PF\y/n (pn - LI) < x \/V \ - $ (x) n->oo p p , , . j e t > = 0. 1 Fenc xi where $ denotes the standard normal cumulative distribution function and V V (pi, a, F) is the asymptotic variance given by (1.3). 14 In general, consistent estimates for the asymptotic standard deviations of ro- bust estimates can be obtained from the corresponding empirical asymptotic vari- ances. For example, to estimate V above we can use V = V (p,n, an, Fn) where Fn is the empirical distribution function of the sample x\,...,xn. However, for the case of asymmetric error distributions, we found some numerical problems that seem to arise from the involved form of V in (1.3). In particular, the denominators in a and b can become small for asymmetric distributions F. In Section 3.6.1 we describe a Monte Carlo experiment that illustrates the extent of this instability. 1.4 Bootstrapping robust estimates Another approach to estimate the variability of estimates is given by the bootstrap (Efron, 1979). This method has been extensively studied for diverse models. In particular, the theory for bootstrap distribution of robust estimates has been studied by Shorack (1982), Parr (1985) and Yang (1985) among others. Two problems of practical relevance arise when bootstrapping robust regres- sion estimates. First, the proportion of outliers in the bootstrap samples may be higher than that in the original data set causing the bootstrap quantiles to be very inaccurate. Intuitively the reasoning is as follows. Both outlying and non-outlying observations have the same chance of being in any bootstrap sample. With a certain positive probability, the proportion of outliers in a bootstrap sample will be larger than the fraction of contamination the robust estimate tolerates. In other words, a 15 certain proportion of the re-calculated values of the robust estimate will be heavily influenced by the outliers in the data. Thus, the tails of the bootstrap distribution can be heavily influenced by the outliers. This "lack of robustness" of the classical bootstrap was noted by Ghosh et al. (1984), and Shao (1990, 1992) in the context of estimating asymptotic variances, and by Singh (1998) for quantile estimates. Ghosh et al. (1984) showed that a condition is needed on the tails of the distribution of the data for the bootstrap variance estimate of the median to converge. Note that no matter how robust the estimate being boot- strapped, a tail condition is still needed. Shao (1990) proved that if one truncates the tails of the bootstrap distribution (with the truncation limit going to infinity as the sample size increases, so that asymptotically there are no discarded bootstrapped estimates) then the bootstrap variance converges to the asymptotic variance of the estimate of interest. Unfortunately it is not clear how to implement this method in a finite sample setting. Singh (1998) quantified this robustness problem for the estimates of the quantiles of the asymptotic distribution of location estimates. He defined the breakdown point for bootstrap quantiles and showed that it is disap- pointingly low even for highly robust location estimates. He proposed to Winsorize the observations using the robust location and scale estimates and then to re-sample from these Winsorized observations. He showed that the quantiles obtained from this method have a much higher breakdown point and that they converge to the quantiles of the asymptotic distribution of the estimate. The second difficulty is caused by the heavy computational requirements of 16 the bootstrap which are compounded with robust estimates. Robust regression esti- mates are generally determined by the solution of a non-linear optimization problem in several dimensions. In the particular case of MM-estimates (Yohai, 1987) for each sample we have to solve two such problems. Moreover, one of them is only implicitly defined as the solution of a non-linear equation. We see that bootstrapping M M - estimates involves repeatedly solving two non-linear optimization problems in several dimensions. We have also found additional computational issues that needed spe- cial attention. For example, a bootstrap sample may not be in general position (see Definition 5.1 in Section 5.4) and this has consequences in determining the scale of the residuals. This large number of non-linear optimization problems may render the method unfeasible for high dimensional problems. As an example of the computa- tional times that can be expected, the evaluation of 5,000 bootstrap re-calculations of an MM-regression estimate on a simulated data set with 200 observations and 10 explanatory variables took 9120 C P U seconds («2.5 hours) on a Sun Sparc Ultra workstation. The same number of re-calculations performed with the robust bootstrap we introduce in the next section took 416 C P U seconds (approximately 7 minutes) under the same conditions. 1.5 A new computer intensive method The basic ideas are best presented using the simple location model. Let x±,..., xn be a random sample satisfying Xi = n-rei, i = l,...,n, (1.4) 17 where are independent and identically distributed random variables with known variance. Let xj) : R —> R be odd, bounded, and non-decreasing. The associated M-location estimate for LL is defined as the solution fin of n ^2^(Xi-iln) = 0. (1.5) We are interested in estimating the standard deviation of fin. For this purpose we present the following computer intensive method to generate a large number of re- calculated estimates //*. We will use the variability observed in these re-calculated estimates to assess the variance of fin. It is easy to see that jxn can also be expressed as a weighted average of the observations: E n iP(xj-iin) n where uji = ip (XJ — fin)/ (%i — An)- This representation of fin cannot be used directly to calculate fin because the weights on the right hand side depend on the estimate. Note that commonly used functions ip (such as Huber's family ipc, see equation 2.5) yield weights to (u) = ip (u)/u that are decreasing functions of In this case, outlying observations that typically have a large residual \xi — p,n\ will have a small weight in (1.6). Let x\,..., x*n be a bootstrap sample of the data (i.e. a random sample taken from Xi,... ,xn with replacement). Recalculate fin using equation (1.6): K = (1.7) 18 with UJ* — ip{x* — ftn)/ {x* — fln). We have seen above that observations that are far from the bulk of the data will typically come into the bootstrap samples associated with small weights. Hence the influence of outliers in the bootstrapped estimate is bounded. Also note that we are not fully recalculating the estimate from each bootstrap sample. The re-calculated /x*'s in (1.7) may not reflect the actual variability of fin. Intuitively this happens because the weights are not re-computed with each boot- strap sample. Instead, we are using the weights obtained with the estimate fin as calculated with the original data. To remedy this loss of variability in the /}*'s we use an estimable correction factor. One way to derive this correction is to think of (1.6) as a fixed-point equation of the form fin = / (fin). The first-order Taylor expansion of / around the limit u. of fin suggests that we should multiply the re-weighted / i n ' s by [1 — / ' (A*)] - 1 - With this notation, the correction factor we use is [1 — / ' ( /z„)] _ 1 . Theorem 3.1 in Section 3.3 shows that the corrected /i*'s have the same asymptotic distribution as the estimates fin. Our method yields quantile estimates with a high breakdown point as defined by Singh (1998) (see Sections 3.4 and 5.4). This property means that a high propor- tion of outliers is needed to push the robust bootstrap quantile estimates above any bound. Classical bootstrap quantile estimates have a disappointingly low breakdown point, in spite of the robustness of the estimate being re-calculated (Singh, 1998). In Section 3 we study the robust bootstrap for the location model with unknown scale. This new bootstrap method, which we call the robust bootstrap, is also compu- 19 tationally simple. In the linear regression context studied in Chapter 5, this property is very desirable. As opposed to the classical bootstrap that would need to solve a full multivariate optimization problem with each re-calculation, robust bootstrap evaluations only require solving a linear system of equations. To compare the performance of our method with the classical bootstrap we ran 5,000 robust bootstrap iterations on the same artificial data set we used to illustrate the computational demands of the classical bootstrap (see page 17). Our method took 416 C P U seconds (approximately 7 minutes) to finish, while the classical bootstrap used 2.5 C P U hours. Both programs were written in C and called within Splus 3.4 for Unix. To illustrate the stability of the distribution estimates obtained with the ro- bust bootstrap, we applied our method to the MM-regression estimate for the Stack Loss data (see Chapter 4 for the definitions). We used both re-sampling methods to estimate the distribution of the 4-dimensional vector QQ-plots of the estimates of the distribution of the projections (/% — /%), i = 1,..., 4, obtained with each method. Note that in all cases the distribution estimates yielded by our method are closer to the limiting normal distribution and have lighter tails than the re-calculated estimates using the classical bootstrap. where 0O,Pi, fa,h) is the MM-regression estimate. Figures 1.3 to 1.6 display the 20 o 1 1 1 1 I -4 -2 0 2 4 Quantiles of Standard Normal (a) Classical bootstrap i , , , -4 -2 0 2 4 Quantiles of Standard Normal (b) Robust bootstrap Figure 1.3: QQ-plots of the re-sampled Intercept coefficient estimates obtained wi th both the classical and robust bootstrap for the Stack Loss data. 21 Quantiles of Standard Normal (a) Classical bootstrap Quantiles of Standard Normal (b) Robust bootstrap Figure 1.4: QQ-plots of the re-sampled A i r Flow coefficient estimates obtained with both the classical and robust bootstrap for the Stack Loss data. 22 M , r- -4 -2 0 Quantiles of Standard Normal (a) Classical bootstrap 1 0 Quantiles of Standard Normal (b) Robust bootstrap Figure 1.5: QQ-plots of the re-sampled Water Temperature coefficient estimates ob- tained with both the classical and robust bootstrap for the Stack Loss data. 23 0 Quantiles of Standard Normal (a) Classical bootstrap (b) Robust bootstrap Figure 1.6: QQ-plots of the re-sampled A c i d Concentration coefficient estimates ob- tained wi th both the classical and robust bootstrap for the Stack Loss data. 24 1.6 Thesis outline The rest of this thesis is organized as follows. Chapter 2 studies the asymptotic properties (consistency and asymptotic distribution) of some robust scale and location estimates. We introduce the location-scale model and the classes of S-scale, S-location and MM-location estimates. We study the asymptotic behaviour of these estimates when the distribution of the errors belongs to a contamination neighbourhood of the standard normal. We present consistency and asymptotic normality results that, under additional regularity conditions, hold uniformly on the distribution of the errors (see Davies, 1998). As a side result we derive a technique to determine the maximum asymptotic bias of M-location estimates with re-descending score functions. In Chapter 3 we present a new computer intensive inference method for the location-scale model and we study its asymptotic properties. In particular we show that the resulting bootstrap distribution converges to the asymptotic distribution of the estimates of interest and that the derived quantile estimates have satisfac- tory robustness properties. Finally, we report the results of two Monte Carlo studies that compare the performance of this new method with other proposals in the lit- erature. The first study compares several asymptotic variance estimates while the second compares the mean length and empirical coverage of confidence intervals for the parameters of interest in the model. In Chapter 4 we extend the results of Chapter 2 to the linear regression model. Section 4.1 presents the model and defines the MM-regression estimates. Section 4.2 25 presents the contamination neighbourhood and the robustness properties of the M M - regression estimates. Section 4.3 studies the asymptotic properties of these estimates. Chapter 5 extends the inference method presented in Chapter 3 to the linear regression model. We illustrate its use with two examples and we study the consis- tency of the distribution estimate and the robustness of the corresponding quantile estimates. Section 5.5 contains the results of a simulation study that investigates the finite sample size behaviour of the confidence intervals based on new method introduced here. Chapter 6 contains a brief list of the results obtained in this thesis, the chal- lenges that remain to be solved and the directions we forsee for future work. The appendix in Chapter 7 contains most of the auxiliary results needed in the proofs. Proofs are presented for those results that could not be found in the literature. 26 C h a p t e r 2 G l o b a l a s y m p t o t i c p r o p e r t i e s o f r obus t es t imates for the loca t ion - sca le m o d e l In this chapter we study the asymptotic properties (consistency and asymptotic distri- bution) of some robust estimates of a location parameter when the observations may have an asymmetric distribution. First we define the classes of M-location estimates with general scale, S-location, S-scale estimates, and MM-location estimates. Most attention in the robustness literature has been paid to the asymptotic properties of robust estimates (in particular to their consistency and asymptotic distribution) when the data follow the non-contaminated model. In this chapter we study the properties of S- and MM-estimates in the full contamination neighbourhood. We show that the 27 S- and MM-estimates are consistent and asymptotically normal for any distribution in the gross-error neighbourhood of a symmetric distribution. We also discuss condi- tions that ensure these results hold uniformly over the contamination neighbourhood. As discussed by Davies (1998), uniformity is a reasonable property to expect in this context. Robust estimates have been proposed to deal with uncertainty in the model that generates the data, hence we expect their properties not to depend on a specific distribution in the neighbourhood. For example, the speed of convergence can depend on the distribution that generated the data. Our results guarantee that this is not the case with the estimates we consider in this chapter. This chapter is organized as follows. Section 2.1 defines the classes of M - , S- and MM-location and scale estimates. Section 2.2 introduces the contamination neighbourhood 7^e and briefly discusses the robustness properties of these estimates. Section 2.3 studies the asymptotic properties of S-location, S-scale and MM-location estimates when the distribution of the data belongs to We provide conditions to obtain consistency and asymptotic normality of these estimates. We also obtain reg- ularity conditions that ensure that the consistency and asymptotic normality results hold uniformly on the contamination neighbourhood. We show that a certain family of robust estimates satisfies these conditions. 28 2.1 Definitions In this chapter we consider the following location-scale model. Let x\,...,xn be n observations on the real line satisfying Xi = LI + a Ci i = l , . . . n , (2.1) where ê , i = 1,.. .n are independent and identically distributed (i.i.d.) observations with variance equal to 1. The interest is in estimating pi. The scale o is a nuisance parameter. Huber (1964) introduced the class of M-estimates. Suppose that xi,...,xn are i.i.d. observations with density function / (x,9), 9 G 0 , with 0 some parameter space. The M-estimate of 9 is n An = An (xi, ...,xn)= a r g m i n ^ p ( x i , ( 9 ) , (2.2) i=l where p is a loss function. When p(x,d) = — l o g / (x,9), fin is the maximum like- lihood estimate of 9. Under regularity conditions on p and 0 the estimate An also satisfies n J 3 V ( z i , £n ) = 0 , (2-3) i=l where ib = dp/89. If the data follow model (2.1) and 8 = p, in (2.2) it is natural to choose the loss function p to be a function of the residuals, p (x, 9) = p(x — 9). Then (2.3) becomes n X ^ ( z i - £ n ) = 0. (2.4) 1=1 29 In the following we will assume that the function ip : R —v R satisfies: P . l ip (—u) — —ip (u), u>0, and bounded; P.2 ip is non-decreasing and lim u _nx, ip (u) > 0; P.3 ip is continuous. Without loss of generality, if ip satisfies P . l we can assume that \ip (u)\ < 1, u € R. Definition 2.1 - M-location estimates (with known scale): Let x\,...,xn be a random sample following model (2.1). Let ip : R —)• R satisfy P.l to P.3 above. The solution fin of (2.4) is called an M-location estimate. (2.5) A widely used family of functions ipc was proposed by Huber (1964). Its mem- bers are given by ( sgn (u) if |«| > c u/c if | M | < c, where c G R+ is a user-chosen constant and sgn (u) is the sign function. The constant c determines the asymptotic ; properties of the sequence fin (see Definition 2.13 on page 39). One corresponding function pc is given by u — c/2 if u > c Pc (u) = { u2/2c if |u| < c -u — c/2 if u < —c. Under certain regularity conditions the corresponding M-location estimates are asymp- totically normal (Huber, 1967). The choice c = 1.345 yields an asymptotic efficiency 30 of 95% when t{ ~ N (0,1). For some asymptotic results we will need the function yjc to be twice continuously differentiable. We can easily construct functions that satisfy the regularity conditions of Definition 2.1 and that are twice continuously dif- ferentiable. For example, for a given c > 0 we can find constants a, b, d and e such that { sgn (u) if Iwl > c (2.6) a u7 + bub + du3 + e u if |w| < c is twice continuously differentiable with fc (±c) = ± 1 , f'c (±c) = 0, f'c (0) = 1 and f'J ( ± c ) = 0. Beaton and Tukey (1974) proposed another family of functions ipd, { 0 if Id > d [u/d) (1 - (u/df) if |u| < d. The constant d determines the asymptotic properties of these estimates. The associ- ated family of functions pd is given by { 3(u/d) 2 - 3 ( u / df + (ul df i f | « | < d (2.8) 1 if \u\ > d, This family of functions ipd differs from Huber's in that its members vanish for large values of x. In terms of the estimate this feature means that outlying points will be ignored instead of down-weighted. In the robustness literature these functions are called re-descending. A property that is natural to expect from an estimate for the location param- eter in (2.1) is that it be equivariant under shifts in the center of the data. 31 Definition 2.2 - Translation equivariance: We will say that an estimate fin = (in (xi,... i „ ) is translation equivariant if for any sample x\,..., xn and real number a we have fin (x i + a,..., xn + a) = fin (xi,..., xn) + a. It is easy to verify that estimates fin that satisfy (2.4) are translation equivari- ant. Equivariance with respect to change of scale is also of interest. Definition 2.3 - Scale equivariance: We will say that an estimate fin = fin ( x i , . . . x. is scale equivariant if for any sample Xi,...,xn and real number a we have fin(axi,...,axn) = ajln(xl,...,xn). (2.9) The estimates jxn defined by equation (2.4) are not generally scale equivariant. To obtain equivariant estimates we introduce scale estimates. Definition 2.4 - Scale estimate: Let xi,...,xn be a sample of n real numbers. An estimate dn = on (x\,..., xn) such that on ( o i l + 6, • • •, axn + b) = \a\ bn ( x \ , x n ) V a, b € M, (2.10) will be called a scale estimate. Equation (2.4) can incorporate the scale estimate on as follows. Definition 2.5 - M-location estimates with general scale: Let yj : M. -> M. satisfy P.l to P.3. Let X\,... ,xn be a random sample of real numbers and let on be 32 a scale estimate. The M-location estimate with general scale is the solution fin of 1 n -J2ib((xi- Lin)/on) = 0. (2.11) Let be a real function such that p'^ = ib, then fin can also be defined by 1 " fin = a r g m i n - J^p^ {( x* ~ *)/^n) • (2.12) i=l If ip is not continuous in the above definition, then the solution of (2.11) may not exist. We can still define the M-location estimate in this situation as n £„ = i n f { t G l : ^ ^ ( ( x i - i ) / f f n ) < 0 }, i = l where inf A denotes the infimum of the set A (see Huber, 1981, page 46). It is easy to verify that the M-location estimates with general scale as defined in Definition 2.5 are translation and scale equivariant. Different scale estimates on generate different classes of M-location estimates. Definitions 2.6 and 2.7 consider two particular classes: the M-scale and S-scale esti- mates respectively. In the following we will assume that the real function p : R —>• R+, satisfies p (0) = 0 and R . l p{-u) = p(it), u > 0, and sup u e K p(u) = 1; R.2 p (u) is non-decreasing in u > 0; R.3 p is continuous. 33 Note that without loss of generality, any symmetric and bounded function p that is not constantly equal to 0 can be adjusted to satisfy R . l above. For an arbitrary measurable function / and a random variable X with distri- bution function F, let Epf (X) denote the expected value of the random variable / (X) when X has distribution F, if this expectation exists. Def ini t ion 2.6 - M-scale estimates (Huber, 1964): Let p : K -> E satisfy R.l to R.3 above. Let b € (0,1/2]. Let xi,..., xn be a random sample and let p,n be a scale- and translation-equivariant estimate. Define the residuals r\ = X\ — p,n,...,rn = xn — fan- The M-scale on is implicitly defined by The choices of the function p and the constant b in (2.13) determine the proper- ties of the resulting scale estimate. For example, to ensure consistency of on when the residuals r;'s in (2.13) are standard normal random variables we choose b — E^p(Z), where Z ~ N (0,1). The constant b will also characterize the robustness properties of the sequence dn (see Section 2.2). A widely used family of p functions is given by where A; is a user-chosen constant. For a given b G (0,1/2] we can choose k to obtain E$pk (Z) -b. k = 1.04086 satisfies E^pk (Z) = 1/2. (2.13) i=l (2.14) 34 Another family of scale estimates is that of the S-scales (Rousseeuw and Yohai , 1984). Definition 2.7 - S-scale estimates: Let p : R —> R+ and b £ R as in Definition 2.6. Let x\,..., xn be a random sample. For every t E R consider the residuals x\ — t,..., xn — t and their M-scale sn (t) satisfying 1 " -J2p((xi-t)/sn(t)) = b. (2.15) The S-scale on is defined by n • 1 on(x1,...,xn) = Msn(t). (2.16) Natural ly associated with this family are the S-location estimates. Definition 2.8 - S-location estimates: Let x\,...,xn be a random sample, and for each t e R let sn (t) be as in (2.15). The S-location estimate pn is fin (xi,...,xn)= a rg inf sn (t). (2.17) It is easy to see that i f the function p is continuously differentiable, the pair (pLn,dn) in (2.16) and (2.17) satisfies the following system of equations 1 " -^2p({xi-}in)/dn) = b (2.18) i=l 1 " -^2p'{{xi-jin)/on) = 0, (2.19) i=l where p' denotes the derivative of p. In analogy wi th Yohai (1987) we wi l l refer to the M-locat ion estimates calcu- lated with an S-scale as MM-loca t ion estimates. 35 Definition 2.9 - MM-location estimates: Let x \ , . . . , x n be a random sample fol- lowing model (2.1). Let ip : R —> R satisfy P.l to P.3. Let on be an S-scale estimate as in (2.16). The solution fin of n ^2^((Xi- Ml On) = 0 , (2.20) t=l will be called the MM-location estimate of X i , . . . , x n . Definition 2.10 - Simultaneous M-location and scale estimates (Huber, 1964): Let ip : R -> R satisfy P.l to P.3, and let p : R ->• R+ satisfy R.l to R.3. Let x i , . . . , x n be a random sample and let b — E^p(Z). The simultaneous M-location and scale estimates pn and on are given by the solution of the following system of equations 1 " _ ^ (& - Ml (Jn) = 0, n i=l 1 n -^p((Xi- pn)/dn) =b, (2.21) 2.2 Robustness properties The asymptotic properties of the M-locat ion estimates given by (2.12) are well-known when the distribution of the errors is symmetric (Huber, 1964, 1967, 1981; Boos and Serfling, 1980; Clarke, 1983, 1984). We wi l l assume that the distribution of the errors belongs to the following gross error neighbourhood rle = {FeV : F (x) = (1 - e) F0((x - p)/o) + eH (x)} , (2.22) 36 where V denotes the set of all distribution functions, F0 is a fixed symmetric distri- bution, e G (0,1/2), and H is an arbitrary distribution function. Intuitively e is the fraction of outliers that are expected to be present in the sample. We should mention here that (2.22) does not constitute a topological neighbourhood. See Huber (1981, page 10) for a discussion of different neighbourhoods. We need to introduce some notation. Let £ be a subset of the set of all dis- tribution functions. We will assume that all possible empirical distribution functions belong to £. Let p and b be as in the definition of M-scale estimates, Definition 2.6. Define the functional cr : £ —> R as follows. Let F G £. For every t G R, let a (F,t) satisfy EF[p{(X-t)/o{F,t))) = b. (2.23) For F G S the value of the functional cr (F) is tr(F) = Ma{F,t). (2.24) Define the S-location functional as /2(F) =arg inf a (F,t). (2.25) Clearly, we have cr (F) = a (F, fx (F)). Similarly define the functional p : £ R by the equation EF[iP((X-p(F))/o-(F))}=0, (2.26) or, equivalently p (F) = argmin EF [ ( (X — p, (F)) / cr (F) ) ], 37 where p'^ = ip. It is easy to see that if Fn denotes the empirical distribution function of the sample, then /z (Fn) — pn and cr (Fn) = an where pn and an are given by (2.12) and (2.16) respectively. In what follows we will assume that rit C 5. One measure of robustness of an estimate is given by its asymptotic bias. If F is not symmetric we will typically have /z (F) =fi p, where fx (F) is defined in (2.26) and p is the actual location parameter in (2.1). The supremum of the absolute value of this difference as F ranges over the neighbourhood rit (see 2.22) measures the worst asymptotic deviation we can have. This quantity is called the maximum asymptotic bias. Definition 2.11 - Maximum asymptotic bias: Let n be a statistic defined by a functional as in (2.26). The maximum asymptotic bias of pb over %e is given by B ( e ) = sup | / * ( F ) - / i ( F o ) | / < r ( F 0 ) , F€Ut where F0 is the central distribution of the neighborhood rit. Another measure of robustness for estimates is the breakdown point. This concept was defined by Hampel (1971). Intuitively the breakdown point is the smallest proportion of contamination e* such that the maximum asymptotic bias is unbounded. Definition 2.12 - Asymptotic breakdown point: Let pb be a statistic as before and let B (e) be its maximum asymptotic bias function given in Definition (2.11). The 38 asymptotic breakdown point of fx is defined by e* = e* (F0) = sup {e G (0,1) : B (e) < 00} . In many cases e* does not depend on F0 (Huber, 1981, page 13). Donoho and Huber (1983) introduced the following modified version for finite samples. Let fx (Fn) be an estimate of the parameters of interest, where Fn denotes the empirical distri- bution of the sample. Let b ( m , fx, Fn) be the maximum disturbance you can cause to the estimate for this data set if you arbitrarily change m observations. Formally, let where Fn is an empirical distribution that differs from Fn in that m data points have been replaced by arbitrary values. The finite sample breakdown point (BP) is the minimum m that yields an unbounded b (m, fx, Fn). Definition 2.13 - Finite sample breakdown point: Let Fn be the empirical dis- tribution function of a sample of size n, and let fx be a statistic defined by a functional as above. The breakdown point of p, at the sample Fn is There is an asymptotically equivalent version of this definition where b (m, fx, Fn) is calculated by adding m points to the sample instead of replacing them (see Donoho and Huber, 1983). When the scale o is known, M-location estimates can be defined as in (2.4). b (m, fx, Fn) = sup \fi (Fn) - fx (Fn) |, m G N, such that b (m, fx, Fn) < 00 39 In this case we have (Huber, 1981, page 53) that with n = min {— ip (—oo)/ip (oo), — ip (po)/ip (—oo)} and ip (oo) = lim x^oo ip (x). Hence, e* = 0.50 when ip (oo) = — ip (—oo). Note that if ip satisfies P . l and P.2 then e* = 0.50. When a is unknown and is estimated simultaneously as in (2.21) the breakdown point e* of pn decreases. To avoid this effect, we can use an estimator an with e* = 0.50 and that does not use fin as its centering statistic (such as the median of the absolute deviations from the median, M A D ) . In this way, if ip satisfies P . l we can achieve e* = 1/2 for pn when a is unknown (see Huber, 1981, page 144). On the other hand, M-scale estimates (2.13) have e* = min (b, 1 — b) (Huber, 1981). To obtain a consistent scale estimate an with e* = 0.50 when we use a function Pk in (2.14) we set k = 1.04086. When pd belongs to Tukey's family (2.8) we need d = 1.54764. More generally, to construct a consistent M-scale estimate with breakdown point 8 G (0,1/2), the tuning constant d — d(S) can be determined by solving E^pd (Z) — 5 for d. S-estimates of scale and location as defined in (2.16) and (2.17) respectively, also have e* = 0.50 when b = 0.50 (Rousseeuw and Yohai, 1984). 2.3 Asymptotic properties The objective of this section is to determine the conditions under which the M M - location estimates (2.20) are consistent and asymptotically normal when the distri- bution of the data F G rie is not necessarily symmetric. We are also interested in 40 obtaining uniform properties over the contamination neighbourhood % £ . We will see that if F is asymmetric the asymptotic distribution of MM-location estimates depends heavily on that of the S-scale (2.16) and associated S-location (2.17) estimates. Hence we begin by studying their asymptotic behaviour. The rest of this section is organized as follows. Section 2.3.1 considers the consistency of the S-scale estimate. Sections 2.3.2 and 2.3.3 deal with the S-location estimates. Finally, Sections 2.3.4 to 2.3.7 study the consistency and asymptotic dis- tribution of MM-location estimates. 2.3.1 Uniform consistency of the S-scale estimate The following theorem shows that the S-scale estimate is consistent for the asymptotic value defined in (2.24), and that this convergence holds uniformly for F G 7i€. Let gp (s, t) = EFop ((X - t)l s) and hp (s, t) = (d/ ds) gp (s, t). Theorem 2.1 - (Martin and Zamar, 1993) - U n i f o r m consistency of the S-scale: Let p and b satisfy the conditions in the definition of M-scale estimates, Definition 2.6. Let dn be the S-scale as in Definition 2.7 and cr (F) its asymptotic value as in (2.24)- Let hp be as above. Assume that hp is continuous and that hp(s,t) < 0 for all s G R+ and i e l . Then, for any 5 > 0 lim sup Pp sup |<7„ - CT (F) | > <5 n>m = 0 41 The following lemma shows that if we use a function pd in Tukey's family to obtain an S-scale estimate, and the contamination neighbourhood rle is centered around a distribution function with strictly positive density on R (such as the standard normal) then the conditions of Theorem 2.1 are met. Lemma 2.1 Let pd belong to Tukey's class (2.8) and let rie be a contamination neigh- bourhood as in (2.22) around a distribution F0 with density function f0 that is strictly positive on R. Let hPd be defined as above. Then, pd and hPd are continuous, and hPd (s,t) < o, Vs e R+, yt e R. Proof: The continuity of pd follows from its definition. Note that for pd in Tukey's family we have The continuity of hPd is a consequence of the Dominated Convergence Theorem. Note that p'd (u) u > 0 for all u e R. Finally, that hPd (s,t) < 0, Vs e R+, Vt e R follows by noting that for any pair (s, t) e R+ x R there exists a set / C M G R such that s,t • By hypothesis we have F0 (/CS]() > 0. Then EF0Pd X-t s fo(X)>0. 42 2.3.2 Consistency of the S-location estimate Our next theorem shows that under certain regularity conditions the S-location esti- mate is consistent for distributions F G 7ie- Conditions under which this convergence holds uniformly on % e are discussed in the next section. We need to introduce the following notation. Let p(x,t,s) = p((x — t)/s). Denote the set of positive real numbers (0, oo) by R+. For each t G R and s G R+ let where Fn denotes the empirical distribution function of the random sample x\,..., xn. As in Martin and Zamar (1993), for each e G [0,1/2) let s~ = s~ (e) and s+ = s+ (e) be such that 0 < s" < inf cr (F) < sup a (F) < s+ < oo, (2.29) F€UE F E H E where a (F) is given by (2.24). Theorem 2.2 - Consistency of the S-location estimate: Let p satisfy R.l to R.3, and assume that p is not constant. Let pn be the associated S-location estimate as in Definition 2.8. Let F G 7ic and let ft (F) be as in (2.25). Assume the following: 1. 7 (F, t, s) = EFp((X — t)/S) is continuous in s uniformly in t; 2. g(t) = 7 (F, t, cr (F)) has a unique minimum fi; l(F,t,s) = EFp(X,t,s), (2.27) (2.28) i=i 43 3. for any neighbourhood B (t0) we have inf p(X,t',cr) .t'eB(to) . as B (t0) shrinks to {t0}; B(t0)\{to] p(X,t0,cr) 4- for any bounded neighbourhood B (to) we have E inf p(xi,t',on) EF ,o . t'eB(to) n->oo inf p(X,t',cr) t'eB(t0) Then pn ——>• jx. n—>oo Proof: First note that by the previous result we know that on a almost surely. Hence, with high probability, on G JC, a fixed compact set, for n sufficiently large. Note that g (t) — 7 (F, t, cr) is a continuous, bounded function, and by hypoth- esis it has a unique minimum. Denote the value of t where this minimum is attained by t0, i.e. g (t0) < g (t) for all t ^ t0. We will show that there exist sets I\ C h with to G h such that sup 7 (F, t, cr) < inf 7 (F, t, a). (2.30) teii Because p is not constant, we have limit^oo g (t) = 1 > g (to). Let ei = 1 — g (t0) > 0. Choose ai such that | i | > a\ implies 1 — g (t) < ei/2. Then we have in f | 4 | > a i g (t) > 1 — ei/2 = g (t0) + ei/2. Hence, A = {\x\ < a\) satisfies i n f ^ g (t) — g (tQ) > 0. Also note that necessarily |t 0| < «i- By continuity of g there exists a neighbourhood of t0, 44 B (to) such that g (t) — g (t0) < ei/4 for all t G B (to). It follows that supB( t o) g (t) < 9 (to) + ei/4. We will now show that B (to) C {\t\ < o i} . Let t G B(t0). Then, g(t) < g (to) + ei/4. If \t\ > a\ then, g (t) > g (to) + e i /2, which is a contradiction. Hence, the above inclusion holds. Next note that in f | t | > a i g (t) > 9 (to) + ei /2 > g (t0) + ei/4 > sup B ( i o ) ^ (t). Hence IX = B (t0) C {|i| < o j = 72 satisfy (2.30). We now show that, with high probability, jin eventually lies in the compact set I2. It is enough to prove that with high probability, there exists n i such that for n > rti we have sup 7 „ (*, <rn) < inf 7 n (t, an). (2.31) Let a = sup 7 (F, t, cr), and 77 = inf 7 (F, t, cr) . teh Let 0 < e' < (rj - a) /2. Note that for any 8 > 0, t G K and s G R+, Chebychev's inequality yields P F ( | 7 n (t, 5) - 7 (F, i , 5) I > 8) < - 1 V F . It follows that there exists an integer n0 (e') such that, with high probability, |7„ (t, s) - 7 (F, t, s ) | < e'/2, V n > n 0 (e'), V t, V s, VF. (2.32) Note that because of the uniformity in (2.32) we have for each fixed t and n > n0 hn(t,crn)-i(F,t,an) \ < e'/2. Let ni (e1) be such that with high probability an and a are close enough so that by the continuity of 7 we have | 7 (F, t, on) - 7 (F, t, cr) I < e'/2 for n > m . 45 Note that n\ does not depend on t. It follows that jn (t, dn) < 7 (F, t, cr) + e' and s u p t e / l 7 n (t, an) < a + e' for n > rt\. Similar ly we have n — e' < inf t ^/ 2 7 n (£, an) for n > ni and (2.31) holds. It follows that, with high probability, there exists an integer n such that n > fi jj,n (E h- Having proved that /2 n is ultimately in a compact set 7 2 , and that the unique minimum of the asymptotic equation belongs to I2, we now adapt a standard argu- ment (Huber, 1967) to show that pn converges almost surely to ft. Let B (p.) be an arbitrary neighbourhood of jx. We wi l l restrict our attention to the compact set I2. We have inf 7 (F, t, cr) > 7 (F, p,,cr)+4e, for some e > 0. B y hypothesis, for each t ^ B (fi) there exists a neighbourhood of t such that E inf p(X,t',cr) t ' £ B ( t ) > 7 (F, ix, CT) + 3 e. (2.33) The collection of open sets B (t) covers the compact set I2H B (fi)c. Pick a finite number of them such that k \jB(ti)Dl2nB{p,)c. (2.34) i=l For each of them we have that if n is large enough 1 " 1 n inf ~ y ^ p ( x j , i , CT„) > -Y^ inf p(xi,t,an) t — 1 %—1 > 7 (F, fi, cr) + 2 e. (2.35) 46 Also (2.36) It follows that for n large enough 1 -^2p(xi,t,an) + e, and hence fin & B (/2). A p p l i c a t i o n of Theorem 2.2 to Tukey's family of functions pd Lemmas 7.7 to 7.11 in the Appendix show that if pd belongs to Tukey's family (2.8) then assumptions 1, 3 and 4 of Theorem 2.2 are met. Assumption 2 is particularly difficult to verify in general. Under certain regularity conditions and for e < 0.10 we can show that it holds uniformly on F G 7^£ (see next Section). We conjecture that it holds for any F G He. The following plots illustrate the behaviour of the family of functions fe(t) = 7 (F e , t, cr (Ft)) for Fe in the contamination neighbourhood of the standard normal distribution and pd in Tukey's family (2.8). We considered Fe (x) = (1 - c) $ (x) + e$ ((x - XQ)/0.1) for e = 0.15, 0.20, 0.25, 0.30, 0.40, 0.45 and x0 = 1, 2, and 5. We see that in all cases 7 (Fe, t, cr (Fe)) has a unique global minimum. Note that as x0 increases a second local minimum appears. Only for e > 0.40 this local minimum approaches the global minimum of the function, but these functions seem to always have a unique global minimum for t < 0.50 for the contaminations considered here. 47 •10 -5 0 5 10 (a) e = 0.15, x0 = 1 -10 -5 0 5 10 (b) e = 0.15, x0 = 2 -10 -5 0 5 10 -10 -5 0 5 10 (c) e = 0.15, x0 = 5 (d) e = 0.20, z 0 = 1 (e) e = 0.20, x0 = 2 (f) e = 0.20, z 0 = 5 Figure 2.1: Plots of / (t) = 7 (F e, t, cr (Fe)) wi th Fe(x) = (1 - e) $ (rr) + e $ ((1 — in) / 0.1), for different values of e and x0. 48 - 1 0 -5 0 5 1 0 (a) e = 0.25, x0 = 1 - 1 0 -5 0 5 1 0 (b) e = 0.25, x0 = 2 - 1 0 -5 0 5 10 - 1 0 -5 0 5 10 (c) e = 0.25, xo = 5 (d) e = 0.30, z 0 = 1 (e) c = 0.30, zn = 2 (f) e = 0.30, x0 = 5 Figure 2.2: Plots of / (t) = 7 ( F £ , t, <r (Fe)) wi th F e (x) = (1 - t) $ (x) + e $ ((a; — x0)/ 0.1), for different values of e and x 0 . 49 -10 -5 (a) e = 0.40, x0 = 1 (b) e = 0.40, x0 = 2 (c) e = 0.40, x0 = 5 (d) e = 0.45, x 0 = 1 * (e) e = 0.45, x0 = 2* (f) e = 0.45, zo = 5 Figure 2.3: Plots of / (t) = 7 (Fe, t, a (Fe)) wi th Fe(x) = (1 - e) $ (x) + e $ ((x - x 0 ) / 0 . 1 ) , for different values of e and x0. *Note the different scale on the x-axis for e = 0.45, xQ = 1 and 2. 50 2.3.3 Uniform consistency of the S-location estimate In this section we show that under stronger conditions on p than those of Theorem 2.2 we obtain uniform consistency of pn. These new regularity assumptions are basically the uniform counterparts of those of Theorem 2.2. We first state the theorem and its proof. We then obtain sufficient conditions to meet the required uniform assumptions and we verify them for Tukey's family of functions pa in a range of values of the proportion of contamination e. Theorem 2.3 - Uniform consistency of the S-location estimate: Let p satisfy R.l to R.3, and assume that p is not constant. Let b G (0,1/2], pn as in Definition 2.8 andfi{F) as in (2.25). If U.1 j(F,t,s) is continuous in s uniformly in t G R and F G rie, that is, for any e > 0 there exists a 5 = S (e) > 0 such that if | s i — s2\ < 5 then j(F,t,Sl)-j(F,t,s2) <i, V i G R , V F G K ; (2.37) U.2 for each F G rie, fF (t) = y{F,t, cr (F)) has a unique minimum fi (F); U.3 there exists sets I± C I2 with I2 compact that do not depend on F G rle, and such that s u p 7 ( F , £ , < T ( F ) ) < infy(F,t,tr{F)), V F G H £ ; (2.38) teh ltl2 U.4 the convergence in assumption 3 of Theorem 2.2 holds uniformly in F G H€, 51 i.e. for every I > 0 and t 0 e l there exists 5 = 5 (e, t0) such that inf p{X,t',cr(F)) E i p(X,t0,tr(F)) <e, VFeHe, (2.39) where the ball B$ (t0) has diameter 5; U.5 the convergence in 4 of Theorem 2.2 holds uniformly in F G H£, that is, if Yi= inf p(Xi,t',on) and Y (F) = E F t'eBs(to) then for any S > 0 lim sup PF( sup\Y n - Y (F)\ > 5 ) = 0 ; m->oo Feue V n>m ' U. 6 for every 5 > 0, e (S, F) defined by the property where ft(F) is the global minimum oj'^{F,t,cr (F)), satisfies i(6) = inf e(S,F) > 0; (2.40) WMF)) 7 ( F ' i ' ° ' ( F ) ) ^ y(F,fi{F),*r(F)) + i{5,F) , (2.41) (2.42) then lim sup PF( sup \fln - fj, (F)\ > 5 ) = 0 . m->oo F e n \ n > m ) (2.43) Proof: Fix an arbitrary neighbourhood B{JL (F)) of jx (F) and let e > 0 be given by (2.42). Let i i and7 2 be as in U.3. Note that by assumption U.4 the finite coverage of I2 in (2.33) and (2.34) associated with e (6) in U.6 does not depend on F. Let Yi and Y be as in U.5. Consider the events Am (F) = Lup\Yn - Y (F)\ < i] , m € N . (n>m J 52 Choose an arbitrary 8 > 0. By U.5 we have that there exists m 0 (8) such that PF(Am(F)} > 1-8, Vm>m 0(<5) MF Now note that Am(F)c\-J2 in / P (xu t, an) > 7 ( F , fx (F), cr (F)) + 21, Vn > m 1 — cv, Let A » ( F ) = j ^ p ( ^ , £ ( F ) , < r ( F ) ) < y(F,jJL(F),tr(F))+i, Vn > m j We also have that there exists mi = mi (8) such that for m > mi we have PF(Dm{F))>l-5 VFeHe. Take rri2 = max (mo, mi). We have Cm(F)nDm (F) > 1-28 V m > m 2 , \/F Erie We also have Cm(F)nDm(F) C /2m G / i ( /2 (F ) ) ,m > m 2 Hence, for each 8 > 0 there exists m 2 (5) such that PF prneB(ii(F)),Mm>m2 > 1 - 28, VF Erie, that is, for each neighbourhood B(p, (F)) we have lim inf PF m->oo FeHe jln G B(p,(F)), V n > m or equivalently, if i i B > 0 is the diameter of B^fi (F)), = 1 lim sup PF sup - /2 (F) | > d B m->-oo F e H e in>m 0. 53 Application of Theorem 2.3 for Tukey's family of functions pd Here we discuss sufficient conditions for U.1 to U.6 to hold for S-location estimates obtained with functions pd in Tukey's family (2.8). Assumption U.1 holds by Lemmas 7.8 and 7.9. To see that assumption U.4 holds use Lemmas 7.7 and 7.10. We now show that U.5 holds. Let Y{ and Y (F) be as in U.5 and Vi(F)= inf p(Xi,t',<r (F)), i = l,...,n. t GB(to) Then Y (F) — EF [Vi]. We have to show that for any 5 > 0 and e > 0 there exists m0 such that < e V m > m 0 . sup\Yn-Y(F)\ > r j n>m We cannot use Lemma 7.2 (Bernstein's inequality) on Y (F) because these random variables do not have mean zero nor are they independent. We have sup\Yn-Y(F)\ >S n>m < PF sup\Vn(F)-Y(F)\>5/2 n>m + + PF sup \Yn-Vn(F)\ > 5/2 n>m (2.44) for some e' (5) > 0 that depends on <5. We have s u p | F n - y „ ( F ) | >5/2 n>m < PF sup |<7„ -tr(F)\ > e' n>m (2.45) for some e' = e'(<5). To prove inequality (2.45) note that Yn = l/n £)"=1 g (xt, an) and Vn = l/n Y J " = 1 g (x{, cr). In the proof of Lemma 7.11 we see that g(x,s) is 54 continuous in s uniformly on x. Hence, for a given 5/2 there exists a positive e' such that \an — cr| < e' implies \Yn — Vn\ < 8/2. Hence, for each n we have {\Yn-Vn\>8/2} C{\an-cr\>e'}, and then note that for any sequence of random variables {-X„} n e N if a is a real number, we have { s u p n > m X n > a} = \Jn>m{Xn > a}. Together with (2.45) this bounds the second term in (2.44). To control the first term, note that the sequence of random variables Wi = V{ - E (Vi) = V* - Y (F) satisfies the assumptions of Bernstein's Lemma (Lemma 7.2 in the Appendix) with c = 2sup u p(u) and sn = na^, where o2w denotes the variance of Hence for any 8 > 0 we have D F(\Vn-E(V)\>5) =PF(\Wn\ ><j) -n82 < 2exp = 2 2(al + cS) exp^—a (8)^ < 2 exp -n 82 2 (k2 + c8) (2.46) where a2w < k2 < oo for all F e rl£ and a (8) > 0. Note that they do not depend on F. Use Theorem 2.1 to find m 0 large enough such that sup PF sup|<rn -tr(F)\ > e' n>m < c / 2 , (2.47) and use (2.46) together with the Borel-Cantelli Lemma (Lemma 7.3) and a standard argument to find mi large enough such that sup Pp s u p \ V n - Y ( F ) \ > 5/2 n>m < 1/2. (2.48) Equations (2.44), (2.47) and (2.48) show U.5. 55 Assumptions U.2, U.3 and U.6 are closely related and we consider them to- gether. Let s+ and s~ be as in (2.29) and suppose that there exists f e l such that inf s~<s<s+ „ , X - t \ „ (X > -—-—, V | i | > r * . (2.49) The above condition, together with the hypothesis of existence of a unique minimum for each F G He suffices to show U.3 (i.e., that the set 72 in (2.30) does not depend on F G "H e ). Equation (2.49) is hard to verify analytically. Numerical evaluation shows that (2.49) holds for e < 0.25 for estimates calculated with pd in Tukey's family (2.8) and d = 1.54764 (i.e., estimates with breakdown point 50%). Assumption U.2 seems hard to prove. Together with (2.49) a condition to ensure that for all F G He the function 7 (F, •, a) has a unique minimum is inf E*p" f ̂ —H > — — sup p" (x)~ , (2.50) -**<*<** \ s / 1 — e x s~<s<s+ where t* is given in (2.49). The above condition suffices to show that the functions 7 (F, •, cr (F)), with F G Hc, are uniformly convex on (—t*, t*). We have the following implication: (2.49) and (2.50) U.2, U.3 and U.6. It is easy to see that (2.49) implies that 7 (F, 0, cr (F)) < 7 (F, t, cr (F)) for all \t\ > t* and thus U.3 holds. Equation (2.50) implies mf 7" (F, t, s) > rj > 0, V F G He, s~ <s<s+ where n does not depend on F . Equation (2.50) ensures that the functions are strictly convex on that interval, and hence have a unique global minimum in this fixed interval 56 (assumption U.2). To verify U.6 note that for any t P>s(p, (F)) we have 7 ( F , t, cr (F)) - 7 (f, £ (F), cr (F)) = ± 7" (F, f, <r (F)) (t - ft (F)) 2 > 17 (* - A ( ^ ) ) 2 >77<52 V F G 7 i e , where i ^ B^{ji (F)) and 77 does not depend on F . Condition (2.50) is quite strong. The S-estimates obtained with p functions in Tukey's family having breakdown point 50% do not satisfy (2.50) for e = 0.10. That is, if there is 10% contamination we cannot guarantee uniform convergence over the neighbourhood. The main problem seems to be that the threshold t* found in (2.49) is unnecessarily large. We can adjust the choice of this constant as follows. For each t eR define the set A(t) = J sH (t) : EHp ( ^ y ) = b, H G Ue j , let s~ (t) = inf A (t) and s+ (t) = sup A (t). If we choose t* as the solution of supp'(x) , (2.51) X then (2.50) holds for a larger range of values of e. We will now show that (2.51) and (2.50) are sufficient conditions to ensure that 7 (F, •, cr) has its unique global minimum in the interval (—t*,t*) for any F G rie, where t* is given by (2.51). The reasoning is as follows. If t is a minimum of 7 (F, -, cr) then it solves the equation ( 1 - £ ) ^ ( f ( i r ) + ^ ( T ( i r ) = 0 ' (2-52) 5 7 inf * " ( * * ) < * < » + ( t * ) -E*p' X-t* where s (t) = cr. Hence, t solves E*p'd ( X-t S(t) (2.53) For each e G (0,1/2] the largest solution t of (2.53) is determined by solving that is, equation (2.51). In Figure 2.4 we plot the function ge (t) for estimates with breakdown point 50% and 40% and different values of e. We include the threshold t* obtained in (2.49). We see that the largest solution of (2.51) (or 2.54) is larger than the mentioned threshold, and hence this solution corresponds to a local minimum of e 7 (F,-,cr). The smallest solution t* of (2.51) is then the largest possible value of t satisfying (2.52) that corresponds to a global minimum. Equation (2.50) guarantees that every function 7 (F, •, cr) is strictly convex in (—£*,£*). It follows that there only exists one global minimum, and that it belongs to this interval. We evaluated conditions (2.50) and (2.51) for S-location estimates obtained with Tukey's pd functions. We considered estimates with breakdown point 50% and 40%. Details are presented in Tables 2.1 and 2.2. For estimates with 50% breakdown point equation (2.50) holds for e < 0.10. When we lower the breakdown to 40%, (2.50) holds for e < 0.15. We see that there is a trade-off between high breakdown point and the uniform convexity condition in (2.50). 1 - sup (a) , (2.54) 58 (a) BP = 50%, e = 0.05 (b) BP = 50%, e = 0.10 r t f t (c) BP = 40%, e = 0.10 (d) BP = 40%, c = 0.15 Figure 2.4: Plots of ge (t) = inf s -(t)< s < s +( t ) [—E$p'd (^f^)] for estimates wi th break- down point 50 and 40%. The threshold t* is given by (2.49). The horizontal line is at e/(I-e)supxp'd(x). 59 B P d e s s+ sup p'd sup [pd] 0.50 1.54764 0.05 0.933919 1.069247 1.1096251 2.0040167 0.10 0.863700 1.150487 0.11 0.849103 1.168490 0.12 0.834307 1.187162 0.13 0.819307 1.206545 0.15 0.788662 1.247639 0.20 0.707933 1.366741 0.40 1.987967 0.05 0.949208 1.081607 0.86384744 1.214571 0.10 0.895659 1.181595 0.15 0.838933 1.308399 0.16 0.827163 1.338136 0.17 0.815240 1.369688 0.20 0.778506 1.477396 Table 2.1: Numerical parameters for Tukey's family of functions pd. B P = Breakdown Point, d = tunning constant. s~ and s+ are defined in (2.29). sup p'd ~ supxeR p'd (x). supp2 = sup i e R p2(a;) . B P e t* as in t* as in satisfies (2.49) (2.51) (2.50) 0.50 0.10 0.835 0.37205 Yes 0.11 0.891 0.41940 No 0.15 1.229 0.62620 No 0.40 0.10 0.786 0.26313 Yes 0.15 1.038 0.43912 Yes 0.16 1.092 0.47839 No 0.20 1.203 0.65011 No Table 2.2: Numerical evaluation of regularity conditions required for uniform consis- tency of S-location estimates wi th Tukey's family of functions pd. 60 2.3.4 Consistency of the MM-location estimate The next theorem shows that under certain regularity conditions, if an is a consistent S-scale estimate, then the MM-location estimates that satisfy n X ^ ( ( X i - £ n ) / * n ) =0, (2.55) t=l are also consistent. In Section 2.3.5 we study regularity conditions that suffice for these estimates to be uniformly consistent over the contamination neighbourhood ri€. Theorem 2.4 - Consistency of the M M - l o c a t i o n estimate - Let x\,... ,xn be a random sample of i.i.d. random variables with distribution function F G rie. Let &n = vn(xi,..., xn) be an S-scale estimate. Let ip : R —> R satisfy P.l to P.3 and let fbn be a sequence of MM-estimates that solve (2.55) above. Then p p i) if &n —> o (F) then jxn > ti (F) ; n—too ii) if on o (F) then fin a'S' > /x (F). n—>oo Proof: Fraiman, Yohai and Zamar (2000) show that if tp satisfies P . l to P.3 and the central distribution of rie has a density function / 0 (u) that satisfies f'Q (u) < 0 for all u > 0, then there exists a unique solution /x (F) of ' X - A x ( F ) ' = 0 . They also show that under the same conditions ry(F,t,cr) = Ep[^p({X — t)/o)] is strictly monotone as a function of t G R. Also note that P . l to P.3 imply that there exists 0 < L < oo such that lim|x|._KX) ip (x) = L. The proof of Lemma 7.13 can be 61 easily modified to show that these conditions imply that ip is uniformly continuous on R. We now adapt a classical argument (Huber, 1981, page 46) to obtain the desired results. Let A n satisfy (2.55). To simplify the notation, in what follows let p denote To prove (i) we will show that for any e > 0 P(fan < A* ~ e ) —> 0 a n d P[pn > P + e) — • 0. By Lemma 7.1 we know that for each i e K, EFip i f p „ , fx-t n \ ON J n->oo In particular, if £ < p then the monotonicity of EF [ip ((X — t)/a)] as a function of t € R implies n ^ \ ON J n->oo \ a J and hence ^ * ' < 0 1 —-> 0 , V t < n. (2.56) Similarly we can show that P U E ^ f V ^ J < 0 I — V (2-57) Note the following inclusion [t: fin <*} C j<: ^ E ^ ( ( x ' - < °J ^ A n < * } We have { A n < / i ~ e } C J A n < / x - e / 2 } C ^ V - (/i - e/2))/<j n) < o| 62 Now (2.56) yields P (pn < fi ~ e) < P f^ip ((Xi - (fi - e/2))/an) < 0^ — ^ 0, (2.58) so that lirrin^oo P (p,n < p - e) = 0 , and hence l i m ^ o o P (£L„ < fi - e) = 0. For the second result, the inclusion together with (2.57) yields p ( / i „ < / J + e) > p ^ I ^ ^ ( ( X i - ( A ( + c ) ) / a n ) < o j — > 1 . Thus l im n _ ) , 0 0 P (fin < fi + e) = 1 and l im^oo P (fin > \x + e) = 0. The result follows by noting that for any e > 0 Plfi-e < iln < fi + e) • 1. This proves part (i) of the Theorem. The proof of part (ii) follows the same lines. Now Lemma 7.1 yields n^—f \ crn J n-foo \ a J Hence, for each e > 0 there exists a null set J\fe such that if u £ J\fe there exists no = no ( w ) with n *7-! V a, i=i for all n>nQ. Consider the null set N = (JfceN-^iA- F o r a n y e > 0 a n d a n y u N there exists nx = nx (u,e) such that (2.59) holds for n > nx. This null set does not 63 depend on e. If M denotes the corresponding null set where (2.60) holds for large enough n, then for any to MIJ M the same reasoning as in the previous proof yields that there exists n 2 = n 2 (to, e) such that for any n > n2 , \pn - fi\ < e. • 2.3.5 Uniform consistency of the MM-location estimate In this section we show that if the scale estimates an are uniformly consistent over the distributions in He, and we impose more regularity conditions on the function ip, then the MM-location estimates (2.20) are also uniformly consistent. We need the following additional regularity condition: P.4 ip is continuously differentiable. Theorem 2.5 - Uniform consistency of the M-location estimate with gen- eral scale: Let X\,...,xn be i.i.d. observations following the location model (2.1). Let on be a scale estimate that satisfies (2.29) and the conclusion of Theorem 2.1. Let ip satisfy P.l to P.4- Let fin be the solution of (2.20) and let LI (F) be the asymptotic value of fin as defined in (2.26). Then for any 8 > 0 lim sup PF sup| / i n - n(F)\ > 5 n>m 64 Proof: For any t G R and F G Ue let ^(t,F) = EFxb(?-^ Let e > 0 be arbitrary. We first show that and o(c )= inf / v ( / i ( F ) - « / 2 , F ) > 0 , (2.61) J* fc He b ^ = H ( A* (^) + e/2, F ) > 0 . (2.62) r t r t f Equations (2.61) and (2.62) can be expressed as: the family of functions fx^ (t, F) has "uniform minimum slope" at fx (F). Bounding d\i$j dt\^ uniformly over F G ~H£ will be enough for these conditions to hold. Let XF (e) be w , „ , fX-fx(F) + e\ then a (e) = i n f F e ^ £ XF (e). Note that \ F (0) = 0; hence A F (e) = e\'F (eF) , where lF G (0, e). By (2.29) we have 0 < s~ < cr (F) < s+ < oo. Then where e-u€ is the proportion of contamination in He. It is easy to see that the last term in the above equation is a decreasing function of iF. Hence eF < e implies XF (e) = e \'F (iF) > ^ ( l - e n J EFoip' (* " j f f i + *) • 65 The Dominated Convergence Theorem shows that the above expression is continuous as a function of fx and cr. It is also positive and hence a sufficient condition to obtain a positive lower bound is that \x (F) and cr (F) be bounded for any F e 7ie. A similar argument can be applied to show that equation (2.62) holds. Let cr = c r (F) and ix = LI(F). To simplify the notation let ip(X,t,s) = ip ((X - t)/s). For each t it is easy to see that Yt (t) = ip (Xi, t, an) and Y (F, t) = EFip (X,t,tr) have the same properties as those in U.5 , hence the proof on page 54 holds. Let ijn (t) = i J2"=i Yi (*) and m (t, F) = EF (iP (X, t, cr)). For each T > 0 and t G K we have l im sup PF( sup \ipn (t) - m (t, F)\ >T) = 0, For each m G N , t G M, F G Ue and r > 0 let (2.63) iPn (*.^) > r An (F, t, T) = I S U p (n>m then (2.63) can be written as l i m sup PF( Am(F,t,T)) = 0 . (2.64) Now note that LI^ (/X (F) ,F) = 0 and that fx^ (t, F) is a non-increasing function in t. We also have 1 An < /* - e> C S ~ E ^ fa" A* ~ e / 2 ' ^ ) ^ 0 C c V>n (A* - e/2) -^(fx- e/2, F) iPn (fx - e/2) - ̂ (fx-e/2, F) > Aty (/x - e/2, F ) > a ( c U = A n (F, e) , 66 where a (e) is given by (2.61). Similarly /}„ > A* + e j C j^XJ^ (x^ fi + e/2, an) > 0 j C c ybn (/x + e/2) - A ^ ( / X + e/2, F ) V>n (/x - e/2) - Aty (M - e/2, F ) > - / t y (/x + e/2, F ) > 6 ( e U = P „ ( F , e ) It follows that {|/xn - > e} C An (F, e) (J P n (F, e). Hence, oo oo oo |J { | / i n - A * | > e } C |J A , (F , e )U [J -BN (F, e) . n=m n=m n=m Immediately Mm (F,e) = < sup|fi n - A * | > e > l_n>m J C { sup | ^ n (AX - e/2) - (/x - e/2, F) | > ^ (AX - e/2, F ) ) |J j sup (AX + e/2) - AV (/* + e/2, F) | > (AX + e/2, F) C A m (F, AX - e/2, a (e)) |J A n (F, AX + e/2, b (e)) . We have PF [Mm (F, e)] < PF [Am (F, /x - e/2, a (e))] + P F [ A n (F, AX + e/2, 6 (e))] , and then sup PF [Mm (F, e)] < sup PF [Am (F, AX - e/2, a (e))] + sup P F [ A n ( F , A * + e/2,6(e))] , Fe-He 67 so that lim sup PF [Mm (F, e)] < lim sup PF [Am (F, fx - e/2, a (e))l + lim sup PF [Am (F, ix + e/2, b (e))] = 0 , m-+oo FeHc and the proof is complete. 2.3.6 Asymptotic distribution of the MM-location estimate Having shown the consistency of the estimates for any distribution F in the contam- ination neighbourhood, we turn our attention to their asymptotic distribution. The following argument will show the basic idea behind the proof of Theorem 2.6 and will also illustrate why we concentrate on location estimates calculated with an S-scale. Let fJLn be an M-location estimate and a n a general scale estimate. To simplify the notation denote their asymptotic values fx (F) and cr (F) by // and a respectively. We will consider M-location estimates with general scale, i.e. jin satisfies 1 n -^2tp({Xi-ixn)/an) = 0. (2.65) Under certain regularity conditions a Taylor expansion of the above equation around (/j, a) yields i=l i=l i=l ~ E W < (Xi -^1^ & -»)/°} + Rn, (2-66) i=l 68 where Rn = Op (1/ y/n) is the residual term. From here we obtain y/n (/xn - n) = y/nAn (fx, a) + y/n (an - a) Bn (fx, a) + Rn, (2.67) where n I n An (fi, a) = a ̂ 2 ip ((x{ - fx)/a) / ^ip'((xt - fx)/a), i=i / i=i and n I n Bn (fi, o) - W (& ~ A*)/ o-) (xt - fx)/a] I ^ ip' ((Xi - fx)/a). i=l / i=l Assume that F is symmetric and ip (u) is odd (and hence ip' (u) u is also odd). Let U — (X — fx)/a. It is easy to see that EF [ip' (U) U\ = 0 and that in this case Bn (P,CT) converges almost surely to zero. If in addition y/n(an — a) = Op (1) (see Definition 7.1) we immediately obtain y/^(fxn-fx)^N(0,V), n—>oo where denotes weak convergence and 2 Ep[iPH(X-fx)/a)} {Ep[iP'((X-fx)/a)}}2- In the case of asymmetric F we typically have EF [ip' (U) U] ̂ 0. Hence, the term involving y/n (an — a) in (2.67) will not vanish. Thus we need the corresponding Taylor expansion for an. Assume that an is a M-scale, that is, it is given by an equation of the form n 5^p((x i-T n)/c> n) = 6', (2.68) i=l 69 for some function p not necessarily related with ip in (2.65). As in Definition 2.6, Tn in (2.68) is some arbitrary location estimate and b' G (0,1/2]. A Taylor expansion of equation (2.68) yields (an - a) = 4= C - ( T ' a) + ~D- ( T ' °) (Tn ~T) + R'n, (2.69) where R'n is the remaining term, T is the asymptotic value of Tn, and Cn and Dn are sums of independent random variables. To be able to obtain a Taylor expansion of (2.68) that can be used in (2.67) we need an estimate Tn which is at the same time linearizable (i.e., it accepts a Taylor expansion) and does not depend on another scale estimate. If we use the same location estimate in the scale equation, that is, if we set Tn = pn, we are solving a system of two simultaneous equations as in (2.21). Estimates obtained in this way do not have satisfactory robustness properties (Martin and Zamar, 1993). As far as we know there is no robust estimate that simultaneously satisfies: (i) is location and scale equivariant; (ii) admits a Taylor expansion of first order; and (iii) does not depend on an scale estimate. Our next result shows that this problem can be avoided if we use an S-scale an in (2.65). The basic idea is that in this case the expansion (2.69) asymptotically does not depend on the distribution of y/n (Tn — T). We need the following additional regularity conditions: R.4 p is twice continuously differentiable; 70 P.5 ip is twice continuously differentiable; Theorem 2.6 - Asymptotic normality - Assume the regularity conditions of The- orems 2.1 and 2.4- Assume that ip satisfies P.l to P.3 and P.5. Assume that p satisfies R.l to R-4- Let pn be the M-estimate of location given by (2.12), pn be the S-estimate of location and an the corresponding S-estimate of scale, as defined in (2.17) and (2.16). Denote the almost sure finite limits of the sequences pn, pn and on by p (F); p (F) and o (F) respectively. Assume that ip and p also satisfy the following regularity conditions for any F Erie: Al: EFip' (u) > 0 and finite; A2: EP [ip' (u) u] is finite; A3: EF [p' (u) u}^0 and finite; A4: EF [ip" (u)} is finite; A5: EF [p" (u)} is finite; A6: EF [ip" (u) v? + 2 ip' (u) u] is finite; A 7: EF [p" (u) u2 + 2p' (u) u] is finite; where u= (x — p (F))/o(F) and u = (x - p (F))/o (F). Then Vn(pn- P (F)) —• N (0 , V (p, o, F)) (2.70) 71 where V(fi,o,F) = cr(F)2H(F)2Ef ~ A 4 (F) (F) x I P -J(F) X-p(F) a(F) H (F ) = l/EF W {{X - p (F)) /a (F))} , J ( F ) = EF W ((X - ix (F)) fa (F)) (X-LI (F)) /a (F)} { } EF y ((X - fl (F)) J a (F)) (X-fx (F)) /a (F )} Proof: To simplify the notation let LI = p ( F ) , a = a ( F ) , fx = fx ( F ) and ui = (xt - p)l a. and (2.71) A second order Taylor expansion of (2.12) around the l imit values (p, a) yields (2.72) 1 1 1 n + o~2 (r*n ~ -y\^'{ui)+ (2.73) 1 1 1 ^ + 2 ^ * " ' ) 2 n ^ k'(fii) ( 2 7 4 ) i=i + 2<// (Si) u, (2.75) 1 1 + ^"^2 E r " + ^ ' ~ CT) (A*n - A*) i=i (2.76) 72 where Ui = (xi — /})/a and (//, a) lies between (p,n,an) and (/_*, a). Let 11 1 " = r - (An - /i) - E "̂ (̂ ) ' i=l 11 1 " Cn = ~ - (ff„ - O) - V (Ui) U 2 + 2ip' (Ui) Ui] , (2.77) (2.78) i=i and 1 1 D n = — 2 E W ("0 + $ (<7n - o) • Th (J (2.79) i=i By hypothesis fin — fj, = op (1) and, from assumption A4, = O P (1). i=l Then (7.3) implies that Bn = Op (1). Similarly we can show that Cn = oP (1) and Dn = oP (1). From (2.72)-(2.76) we have - y/n (p,n - fi) l-ih'ip' (v-i) - Bn - Dn J ° \nl=1 ) 1 n 1 / l n \ = -rYjy>{ui)--s/n~{on-o) \-'y,ip'{ui)ui-Cn\. (2.80) From equation (2.18) we get 1 l~l " -Vn~(on-o) -Y/P'(vi)vl-B'n-D,n i=l = H=itp M - b - ^ - ( - E ^ («0 - C n 1 , (2-81) where, as before, B'n = oP (1), C'n = Op (1) and D'n = oP (1). Note that ^E*/(«i) = op(i) . i=l 73 and hence - y y («o - c;=op a). (2.82) From (2.19) we have a \ n t l J ° \ n l = i J = OP (1) - -y/E {an -a) + 0P (1). (2.83) a From (2.81), (2.83) and Lemma 7.12 we have 1 fill* x _ ~ b , . m r 9 o 4 ^ - \ M (̂ Vi - cr) = ——-.—=j \-Op(l) . (2.84) c V n E i = l P («•) w* The theorem now follows from (2.80), (2.84) and Lemma 7.12. • 2.3.7 Uniform asymptotic distribution for MM-location es- timates In this section we wi l l show that the MM-loca t ion estimates /x n converge weakly to a normal distribution uniformly over F 6 ?{ f . Our main result is the following Theorem. Theorem 2.7 Suppose that all the assumptions of Theorems 2.1, 2.3, 2.5 and 2.6 hold. Then sup sup FeHe xgR < x > - $ (x) = o(l), where V = V (F) is given by (2.71). 74 To prove Theorem 2.7 we need to define uniform versions of op (an), Op (an) and asymptotic normality, and that these quantities have analogous convergence prop- erties to the usual (non-uniform) ones. Definition 2.14 - Uniform big O in probability: Let an, n > 1, be a sequence of real numbers and let Xn, n > 1, be a sequence of random variables. We say that Xn = UOp (a„) over the set of distribution functions %e if l im sup l im Pp Definition 2.15 - Uniform small o in probability: Let an, n>\, be a sequence of real numbers and let Xn, n > 1, be a sequence of random variables. We say that Xn = Uop (an) over the set of distribution functions %e if Vr5 > 0 > k = 0 l im sup Pp Xn > 5 = 0. Definition 2.16 - Uniformly asymptotically normal: We say that a sequence Xn, n 6 N is uniformly asymptotically normal (UAN) over the set of distribution functions rie if sup sup PF (Xn <x)-$(x) =o (1) . (2.85) FeHe x e R Lemma 2.2 Let Xn, n G N , be sequence of random variables that are uniformly asymptotically normal as in Definition 2.16. Then Xn = UOp (1). Proof: For any K > 0 we have PF{\Xn\ >K)= PF{Xn >K)+ PF(Xn < -K) = l-PF(Xn<K)+ PF(Xn < -K) . 75 Fix I > 0. By (2.85) there exists n0 = nQ (e) such that for all n>n0 PF{Xn<x)-<f>(x) <e, V x , VF£He. Hence, PF(\Xn\ > K) < 1 - $ (K) + $ ( - i f ) + 2c, for all n > n0 and for all F G ft£. Similarly we obtain P p ( | X n | > i f ) > 1 - $ (if) + $ (-iv") - 2e. Hence, given e > 0 we find n0 (e) such that for all n > n0 PF(\Xn\ >K)-[l-$(K) + $ {-K)] < 2e, or, equivalently, l im^oo PF(\Xn\ > K) = 1 - $ (K) + $ (-K). It follows then that l i n i K - K x , s u p F e W e l i m n ^ o o P F ( | ^ I » | > K) = 0. • L e m m a 2.3 Lei a n , n € N, 6e a sequence of random variables such that an = t/Op (1), and /ei n G N, 6e another sequence such that bn — UoP(l), then anxbn = UoP (1). Proof: The Lemma follows easily from the following inequality, valid for any 5 > 0 and K > 0. PF(\anbn\ >6) <PF(\bn\ > S/\an\ , \an\ < K) + PF (\an\ > K) <PF{\bn\> 5/K) + PF(\an\>K) . L e m m a 2.4 Let an, n G N , be a sequence of random variables such that an = UOp(l), and let bn, n G N, be another sequence such that there exists b ^ 0 with 76 bn — b = Uop (1), then ^ = ^ + UoP(l) Proof: We have We now show that ^71 b \b„ - l = UoP (1) . (2.86) (2.87) For simplicity assume that b > 0. The same argument, wi th appropriate modifications can be applied to the case b < 0. F i x 5 > 0. For any 0<e<b<Kv?e have >8j =PF(\b-bn\>\bn\5) < PF (\b -bn\> \bn\ 5,\bn\<K) + PF (\bn\ > K) < PF (|6„| < e) + PF (\b - bn\ > \bn\ 5,e<\bn\<K) + PF(\bn\>K) <PF(\bn\<£) + PF(\b-bn\>e5)+Pp(\bn\>K) . Now note that \b — bn\ > b — \bn\ and \b — bn\ > \bn\ — b imply PF(\bn\<~e)<PF(\b-bn\>b-e) , and PF (\bn\ > K) < PF (\b - bn\ > K - b). Choose an arbitrary r > 0. For fixed 5, i and K choose n 0 (r) such that PF(\b-bn\ >b-e) < r / 3 , PF{\b-bn\ >K-b)<r/3, 77 and PF(\b-bn\ > 15) < r / 3 . for a l l F £H€ and n > n0. It follows that for n > n 0 we have bn J Hence (2.87) holds. The result now follows from (2.86) and Lemma 2.3. Lemma 2.5 Let an, n E M, be a sequence of random variables such that an = UOp (1), and let bn, n E N , be another sequence such that there exists b with bn — b = Uop (1), then anbn = anb + Uop (1). Proof: This follows immediately by noting that an bn—an b = an (bn — b) = UOp (1) x Uop (1) and applying Lemma 2.3. • Lemma 2.6 Let D\,...,Dn be n independent and identically distributed random variables and let Dn = F>{. Assume that EF [Df] < c < oo, for all F E He- Then Dn = UOp (1) and Dn - EF ( A ) = UoP (1). Proof: Note that the assumption on the second moment of D j implies that EF \Di\ < 1 + c for al l F E%t. To simplify the notation, let d = 1 + c. Then we have PF [\Dn\ >2d]<PF [\Dn - EFDi\ >d]< ~ V a r F ( A ) < ^\EFD] . 78 Hence, l i m n P F [ | / J n | > 2d] = 0 for al l F G He, where d does not depend on F. It follows that l im sup l im PF \\Dn\ > k] = 0 , fe—>oo FeT-Lc n ~ > o c that is, D n = UOp (1). A similar argument shows that Dn — EF (Di) = UoP (1). • Lemma 2.7 7 / a „ = Uop (1) and X n zs UAN (see Definition 2.16) then Xn + a„ is C/^AT. That is: sup sup | P F {Xn + an < x) - $ (x) | = o (1) . (2.88) Proof: First note that for any x G R, 5 > 0 and F e He PF [Xn + an < x] < PF [Xn + an < x , | a n | < 8] + P F [|a n | > <5] < P F [ X „ < x - ( 5 ] + P F [ | a n | > ( 5 ] . (2.89) Similar ly we have P F [X„ + an > x] < PF [\an\ > 6] + PF [Xn + an > x , \an\ < 5] <PF[Xn >x-6] + PF[\an\ >S] , which yields P F [Xn + an < x] > PF [Xn < x - 5] - PF [ K I > 5] . (2.90) Equations (2.89) and (2.90) together yield -PF [\an\ >5]<PF [Xn + an < x] - PF [Xn < x - 5] < PF [\an\ > 8] . (2.91) 79 To simplify the notation, let un (8, F) = PF [\an\ > 8]. We have - Un (8, F) < PF [Xn + an < x] - $ (a;) + $ (x - 8) - PF [Xn < x - 8} + $ (a;) - $ (x - 8) < un (8, F) . (2.92) Let e > 0 be arbitrary. Choose 8 = 8 (e) > 0 such that sup |$ (x - <5) - $ (a;) | < e/3 . (2.93) xeR For this 8 choose n0 = n0 (8) such that sup F e ^ £ \un (5, F)\ < e/3. Choose ni = n\ (e) such that sup sup |$ (x - 8) — PF [Xn < x - 8]\ = sup sup |<2> (x) - PF [Xn < x]\ < e/3. Fenc x e R Fenc X G R (2.94) Let bn (x, F) = $ (x - 8) - PF [Xn < x - 8} and c (x) = $ (x - 8) - $ (x). Then, equation (2.92) can be written as -e/3 < PF [Xn + an < x] - $ (x) + bn (x, F) + c (x) < e/3 . (2.95) We know that for n > max(n0,ni), sup F sup s \bn (x, F)\ < e/3, sup x | c (x) | < e/3. Let dn {x, F) = PF [Xn + an < x] — $ (x). Equation (2.95) implies -e /3 - bn (x, F)-c (x) < dn (x, F) < e/3 - bn (x, F) — c (x) , (2.96) and we immediately obtain that for n sufficiently large (not depending on x or F), -e < dn (x, F) < e. • Remark 2.1 Let / be a real function such that EFo [f (X, t, s)] = Jf (X, t, s) dF0 (X) > 0 80 for any f e l and s > 0, where F0 denotes the central distribution of the contam- ination neighbourhood rlt. It is easy to see that if EFo [f (X,t,s)] is a continuous function of (t, s) and JCt and K,s are compact sets in the real line such that Ks C (0, oo) then we have In particular, if a (F) denotes a scale estimate that satisfies (2.29) and p (F) is an M-location with general scale calculated with a function tpc in Huber's family (2.5) then inf Var F [VJC ((X - p (F))/a (F))] > 0 . (2.97) P r o o f of Theorem 2.7: To simplify the notation, in what follows let \x = fJ.(F), /} = p. (F) and a = a (F). The idea of the proof is to show that y/n (p,n — fi) can be represented as a linear term plus a uniformly small remainder. We use the Berry Esseen Theorem to show that the linear part is U A N (see Definition 2.16) and Lemma 2.7 to show that the sum of these terms is also U A N . We now show that where ^^7=4 = + UoP (1) . (2.98) Wi= i^{{xi - y)la) - d {p{{Xi - p)la) - b ) ) / ( 2 . 9 9 ) EF {>' ((X - p) /a) (X - p) /a} EF {ff ((X - p) /a) (X - p) /a) e = EF{iP'((X-p)/o)} . 81 Theorems 2.1, 2.3 and 2.5 show that that an — a — UoP (1), A n — p, = [/op (1) and fin - p = UoP(l) respectively. From (2.72) to (2.76) and (2.77) to (2.79), using Lemmas 2.3 and 2.6 we have ^ Vn ( A n - lA (^J2^'(ui) + UoP(l)Sj 1 " 1 ( \ n \ i=l \ i=l / From (2.81) and (2.82) we have 1 -\/n(on - a) ^ p ' ( < K + t / 0 p ( l ) i=l = -7=E'0(Ui) ~ 6 ~ - \ / ^ ( A n - A ) x ^ O p (1) . V " i=l CT Similarly, from equation (2.83) we have 1 / l " \ 1 - V ^ ( A n - A ) - J2 P" M = ^ ( J ) - - V " - CT) + ^ °" V n7=? / ° From the last two equations we obtain -VH (o-n -a)\a + UoP (1)1 = V p((«0 - 6) + f / o P (1) , (2.101) where a = £ p {p' ( ( X - A ) / a ) ( X - A ) M - From (2.100) and (2.101) we have ^ \/n ( A n - A*) [c + f io F ( l )J = 4= E ^ -\--7=i2(PM ~b)+ U°r \d + U°r > where c = EF {ip' ((X - p) / a )} , and d = £ F {ip' {(X - p) / a ) (X - p) / a } . Hence, 1 r i 1 n d 1 n - V ^ ( A n - A * ) C+f /0 P ( l ) = 7 : V ^ B i ) j= V] (p (^) - b) + UoP (l) . v t=l v i=l 82 From the last equation and Lemma 2.4 we obtain (2.98). Note that \Wi\ are bounded (see (2.99)), and hence their moments are bounded uniformly for F G He- Using (2.97) we see that their variance is bounded away from zero uniformly on F G ri€. The Berry Esseen Theorem (Chow and Teicher, 1988, page 305) yields Validity of the required regularity conditions to obtain uniform asymptotic normality The required conditions to obtain uniform consistency and asymptotic normality are satisfied by estimates wi th breakdown point 50% obtained wi th functions pd in Tukey's family (2.8) when the proportion of contamination e is up to 10%. If we consider estimates in the same family with breakdown point 40% the conditions hold for e < 0.15. There seems to be a trade-off between the breakdown point of the estimate and the extent to which the uniform consistency holds. Uniform results for contamination neighbourhoods with e < 0.10 are nevertheless of practical interest. There are reports in the literature suggesting that most data sets have fractions of contamination ranging between 0 and 10%. (Hampel, et a/., 1986). Hence we have 83 C h a p t e r 3 R o b u s t b o o t s t r a p for t he loca t ion-sca le m o d e l In this chapter we introduce a new computer intensive method of inference based on Efron's bootstrap (Efron, 1979; Efron, 1982; Hall, 1992; Efron and Tibshirani, 1993). To distinguish Efron's bootstrap from the method presented here we will refer to the former as "classical bootstrap". Efron's method applies to the problem of estimating the sampling distribution of a complex statistic. It is based on the following principle. Suppose we are interested in the sampling distribution of the statistic /}„ when the data have distribution function F. If we knew F we could obtain either the exact distribution of fin or an approximation to it via Monte Carlo simulations. The idea behind Efron's bootstrap is to use an estimate of F in order to estimate the distribution of jln. If F is assumed to belong to a parametric family, F = Fg, 84 then, given an estimate 9 for 9, we can use F = F§ to estimate Fg. This method is called parametric bootstrap. If we do not assume an underlying parametric family for F, then a natural non-parametric estimate is the sample distribution function F = Fn. In this case we call the method non-parametric bootstrap. Throughout this thesis we only assume that the distribution F that generated the data belongs to a contamination neighborhood He of a certain central distribution (see Section 2.2). Hence, we will focus on non-parametric bootstrap methods. For both types of bootstrap (parametric and non-parametric) we need to cal- culate the distribution of p,n assuming data generated by F. However, in most cases it is very difficult to obtain an explicit expression for this distribution of fin. In those cases we can use computer simulations to get an estimate of this law. One generates several thousand samples from F and re-calculates p,n for each of these samples. The empirical distribution of those replicated values of fin gives an estimate of the desired sampling distribution. In what follows we will also refer to this empirical distribution as the "bootstrap distribution estimate". When each evaluation of jxn is computationally demanding, re-calculating the statistic many times can make the method too slow to be of practical interest. Robust estimates are not easy to calculate. For example, S-scales (2.16) solve an optimization problem where the function being minimized is implicitly defined. The MM-location estimate is calculated by solving the non-linear equation in (2.12). The numerical complexity of robust estimates for the linear model is significantly greater (see Chap- ter 4). 85 Another potential problem when using the classical bootstrap on data that might contain outliers arises with the tails of the distribution estimate. Intuitively, the problem is that the outliers may appear in the bootstrap samples in larger pro- portions than in the original sample. For example, if there is one outlier among 10 data points, we expect that over 3% of the bootstrap samples of size 10 will contain the outlier 5 or more times. This means that over 3% of the re-sampled statistics may be severely affected by the single outlier. In other words, the corresponding tail of the empirical distribution of the re-calculated estimates might be unreliable. A sym- metric 95% confidence interval might then be affected because it uses the estimated 2.5% and 97.5% quantiles, and at least one of them can be influenced by the outlier. This problem has been quantified by Singh (1998). He defined the breakdown point of bootstrap quantile estimates and showed that even robust estimates do not yield bootstrap quantile estimates with satisfactory breakdown points. To remedy this problem he proposes to re-sample from the Winsorized data (see Section 3.6.1). Un- fortunately this variant of the bootstrap method does not reduce the computational requirements of the re-calculation scheme. The large amount of data that businesses and government agencies can collect and store with the available information technology makes large-scale applications a reality. Hence, it is of practical interest to study the feasibility of estimation methods when applied to moderate and large data sets. Compared to the classical bootstrap, our method, which we call "robust bootstrap", is significantly faster, more stable when the data are contaminated and produces comparable results when applied to data that do not contain outliers. 86 To illustrate the gain in speed of calculation of our proposal consider the simple case of constructing a 95% confidence interval for the location parameter ti when the observations Xi satisfy x^ = p + e;. We assume that the errors are independent observations with unknown variance. We generated an artificial data set of 1,000 independent standard normal observations (i.e. we set // = 0 in the model above) and built a 95% confidence interval for the population mean based on an MM-loca t ion estimate (see Definition 2.9). We used the function ^1.345 i n Huber's family. The S- scale an was calculated wi th the function P1.04086 in (2-14). The basic percentile confidence interval based on the classical bootstrap with 3,000 bootstrap samples was (—0.05341,0.07601) (for the definition of these bootstrap confidence intervals see Davison and Hinkley, 1997, page 194). It took 1545 C P U seconds (that is around 25 C P U minutes) to finish on a d u a l - C P U Sun Sparc U l t r a 4 (each C P U a 296 Megahertz S U N W U l t r a S P A R C - I I ) with 1.1 Gigabytes of R A M memory running SunOS 5.7. O n the other hand, the basic percentile confidence interval using the robust bootstrap based on the same number of bootstrap samples took less than 3 C P U seconds. The 95% confidence interval was (—0.05309,0.07574). Figure 3.1 displays a comparison of the 3,000 re-calculated /i*'s with each method. Note that the boxplots look very similar. This example illustrates that when the data do not contain outliers both methods are comparable, and the robust bootstrap is significantly faster to compute. The improved stability of the robust bootstrap can be illustrated wi th the following simple example. Consider a random sample of size 100 containing 65 ob- servations that follow a standard normal distribution and 35 observations with dis- 87 Figure 3.1: Boxplots of 3,000 re-calculated MM-loca t ion estimates with the classical and robust bootstrap. The artificial data set contains 1,000 independent standard normal observations without contamination. 88 tribution function G (u) = $ ((« — 10)/ \/G\5), where $ denotes the standard normal cumulative distribution function. In other words, this sample contains 35% of outliers centered around 10. We used the same MM-location estimate as above. Figure 3.2 contains the boxplots of the 3,000 re-calculated MM-location estimates for each boot- strap method, and a simulated data set from the actual asymptotic distribution of fin. Note that the tails of the robust bootstrap re-calculated MM-location estimates are more stable than those corresponding to the classical bootstrap method. The new method presented here also gives quantile estimates with higher breakdown points than those obtained with the classical bootstrap (see Section 3.4). The intuitive rea- son is that we use weights based on the robust estimate to re-calculate the statistic. Hence outlying points will typically be associated with small weights and have small impact on the bootstrapped estimate. The rest of this chapter is organized as follows. Section 3.1 presents the re- sampling method for the location-scale model. Section 3.2 contains a one-sample and a two-sample example of statistical inference performed with the robust bootstrap. Section 3.3 studies the asymptotic behaviour of the robust bootstrap. Section 3.4 discusses the breakdown point of the quantiles estimates obtained with the robust bootstrap. Section 3.5 proposes a way to studentize the robust bootstrap in order to improve on its order of convergence. Finally, Section 3.6 contains two Monte Carlo comparison studies on the performance of the robust bootstrap to estimate asymptotic variances and to build confidence intervals. 89 CLASSICAL ROBUST ASYMPTOTIC BOOTSTRAP BOOTSTRAP DISTRIBUTION Figure 3.2: Boxplots of 3,000 re-calculated MM-loca t ion estimates with the classical and robust bootstrap. The artificial data set contains 100 independent observations; 65 of them follow a standard normal distribution, while the remaining 35 have distri- bution function G (u) = $ ((u - 10)/ where $ denotes the standard normal cumulative distribution function. The boxplot labeled "Asymptotic Distr ibution" contains a sample of 3,000 observations simulated from the actual asymptotic distri- bution of fin. 90 3.1 Definitions The intuitive idea behind our method was briefly discussed in Section 1.5 for the simple case when the scale parameter a is known. Here we present the method for the case of unknown a. We use MM-location estimates (see Definition 2.9). Let x\,...,xn be i.i.d. observations following model (2.1). Assume that the x,'s have distribution function F belonging to the contamination neighborhood Ht defined in (2.22). Let fin be an MM-location calculated with an S-scale an, and let //„ be the associated S-location estimate. As in Section 1.5, we are interested in making statistical inferences about the location parameter p. We consider two methods to achieve this goal. The first alter- native is to use the result of Theorem 2.6 on the asymptotic normality of the sequence y/n(£in — (J,). To use this method we only have to estimate the variance of fin. The second option is to directly estimate the distribution function of y/n (fin — p). We can then use this distribution estimate to approximate the quantiles needed to construct confidence intervals (see for example, Efron and Tibshirani, 1993, and Davison and Hinkley, 1997). We propose to use the following computer intensive method to generate a large number of re-calculated /t*'s. These re-computed statistics can be used to estimate both the variance and the distribution function of the sequence fin. For the first objective we can use the empirical variance of the fi„'s. A natural estimate of the distribution function Fn of jln is given by the empirical distribution function of the 91 re-computed statistics. Recall that in the discussion in Section 2.3.6 we noted that for data generated by an arbitrary distribution F in the contamination neighborhood rl€, the location and scale estimates are not necessarily asymptotically independent. Hence, to esti- mate the distribution of fin we have to take into account the behaviour of the scale estimate an. Intuitively this is the reason why in what follows we re-calculate both estimates to incorporate the information obtained from the re-computed cr*'s into the final re-calculated /i*'s. For each 1 < i < n define the residuals = Xi — fin and fi = Xi — f~Ln associated with the M M - and the S-location estimates respectively. We first write (xn and an as weighted averages. Define the weights Ui and Vi as 1 n b Simple computations yield Vi = -Kpinl'<?„)/'fif l<i<n. (3.1) n I n A n = E Ui Xi / Yl Ui ' (3-2) i=l I i=l n i=l Clearly this representation does not help in calculating the estimates because the right-hand side depends on fin and an, but it motivates the robust bootstrap proce- dure. Let x\, i — 1,... ,n be a non-parametric bootstrap sample from the obser- 92 vations. That is: x* are i.i.d. random variables with distribution function assigning probability l/n to each of the points in the original sample. Define the random variables fi*n and <r* by n / n i=l / i=l n where UJ* = ip {r*/an)/r*, v* = p(r*/an)/ {n b f*), r* = x* - pn, and f* = x* - pn for 1 < i < n. Note that the estimates fin, fin and an involved in u>* and v* are based on the original data, not the bootstrap samples. The re-calculated jx*n and <r* obtained in (3.4) and (3.5) may not reflect the actual variability of the random vector ( / i n , an)' due to the fact that the estimates fin, an and pn used in the weights uii and Vi are those evaluated from the original sample. We now give an intuitive argument on how to derive a correction factor to remedy this unwanted phenomenon. Think of (3.2) and (3.3) as a fixed-point equation of the form (fin, on)' = f (fin,crn) for certain f : M 2 —>• R 2 . Let p and a be the almost sure limits of fin and an respectively. A first-order Taylor expansion of f around LI and a suggests that we should multiply the re-calculated pairs (fin,an)' by the matrix [I — Vf (p, <T)] _ 1, where V f denotes the matrix of first derivatives of f. We estimate this factor with [I — Vf ( / t n , a n ) ] _ 1 . Hence, the re-calculated fi^* — fin with the robust bootstrap is a linear combination of fi*n and cr* obtained in (3.4) and (3.5). Recall that our interest lies in estimating the sampling distribution of y/n (fin — p). 93 The robust bootstrap "replicates" of this expression, An — fin, are given by An* ~ An = an (A* - An) + K (d*n - On) , (3.6) where An and on are the location and scale estimates obtained with the original sample, ^n — 0~n i=l n i=l i=l n ^2^P'(ri/an) , i=i (3.7) i=l and ip' and p' denote the derivatives of ip and p respectively. Remark 3.1 - Computational Ease: To estimate the distribution of An — fJ> with the classical bootstrap we have to re-calculate on as well as An- Wi th each bootstrap sample, to determine bn we have to minimize the function sn (t), t G R, implicitly defined as the solution of (see (2.15)) 1 7 1 i=l Once the minimum d* of s n (t) is found we have to solve for (x*n n ^ ( ( * i - A n ) / * n ) = 0 . i=l These optimization problems have to be solved several thousand times. On the other hand, to re-calculate An* — An with the robust bootstrap we use the weighted averages (3.4) and (3.5). The correction factors an, bn and c„ are computed only once with the full sample. In the location-scale examples we considered, the robust bootstrap 94 required less than 0.2% of the C P U time needed to compute the same number of classical bootstrap re-calculations. Remark 3.2 - Robustness: Note that both Huber's and Tukey's families of functions •0's yield weights u>i(u) = ip(u)/u that are decreasing functions of Out ly ing points wi l l typically have large residuals and hence be associated wi th small weights i n equations (3.4) and (3.5). Note that when using a function ipd from Tukey's family, extreme outliers (those wi th a residual |rj | > don) receive a zero weight, and hence have no effect on the re-calculated estimate. For pd in Tukey's family, the resulting weights Vi used in re-calculating the scale are also decreasing in the absolute value of the residuals. This makes outlying points less influential in the re-calculated a*. See Section 3.4 for a formal discussion of the robustness properties of this method. 3.2 Examples 3.2.1 One sample location-scale: Blood pressure Consider the following data set obtained from a hypertension screening program (Ros- ner, 1977). Ten monthly observations were obtained for each patient. The data for a particular individual are: 40, 75, 80, 83, 86, 88, 90, 92, 93 and 95. We wish to estimate the individual 's mean blood-pressure LLQ. Let fin denote the MM-loca t ion estimate for p 0 calculated wi th the function •01.345 in Huber's family and S-scale on obtained with pi.04086 in (2.14). W i t h the above 95 data set we obtain pn = 86.03. In order make inferences about LIQ we would like to have an estimation of the distribution of pn — fi0 for this patient. To this end we use both the classical bootstrap and the robust bootstrap and compare their performance. We generated 50,000 bootstrap samples from the data and calculated p%* — pn, where p%* denotes the classical bootstrap re-calculated pn. We also computed 50,000 robust bootstrap p^* — pn- Figure 3.3 contains boxplots for both — pn and p^* — pn. Note that the robust bootstrap empirical distribution is less skewed than that of the classical bootstrap. We see that the lower ta i l of the classical bootstrap distribution estimate is heavier than that of the robust bootstrap. This happens because in the bootstrap samples the outlier is appearing in larger proportions than in the original sample. The difference in the re-sampled distributions is reflected in the correspond- ing 99% basic percentile confidence intervals for /u0- W i t h the classical bootstrap we get (79.537,101.121) and wi th the robust bootstrap yields (78.505,94.869). The corresponding 95% confidence intervals are also different, but the disagreement is less severe. The classical bootstrap yields (80.349,93.79) and the robust bootstrap (79.717,91.994). 3.2.2 Two-sample location-scale: Seeded clouds We consider data from an experiment conducted in the state of F lor ida (USA) between 1968 to 1972. The data we use is a subset corresponding to the period 1968 to 96 Robust Bootstrap Classical Bootstrap Figure 3.3: Comparison of the classical and robust bootstrap distribution estimates of fin — HQ for the blood pressure data. 97 Clouds Figure 3.4: Precipitation data 1970. These data were originally analyzed by Simpson et al. (1975) using Bayesian techniques. The question of interest is whether "dynamic seeding" (massive silver iodide seeding of clouds) produces, under certain conditions, increased precipitation. The experimental unit is a single cumulus cloud and the outcome is the rain volume falling from the cloud. There are two groups of 26 clouds each. One group contains the "seeded clouds" and the other the "unseeded clouds". We applied a log transformation to the data to get similar dispersions in both groups. Boxplots of the transformed data are provided in Figure 3.4. 98 Using a two-sample t-test for populations with equal variance, the p-value for the null hypothesis of equal means is p = 0.0141. The corresponding 99% confidence interval for the difference of the means yields (—0.06,2.35). Thus, at the 1% level, we would conclude that there is not enough evidence to support the alternative hy- pothesis of different mean rainfall. Figure 3.4 suggests that there is a difference in the location of the two boxplots, but that this shift is probably concealed by the two smallest observations in the seeded group. Indeed, after removing these two clouds the two-sample t-test assuming equal variances for the hypothesis of equal means now yields a p-value of 0.0014 (roughly ten times smaller than above). The corresponding confidence interval with nominal level of 99% becomes (0.30,2.56). Of course, the actual level of this confidence interval is unknown since its construction involved a subjective rejection rule. If we bootstrap the two-sample t-test for populations with equal variances using the classical bootstrap, we obtain the following 95% and 99% basic percentile confi- dence intervals for the difference of the means: (—4.561, —0.202) and (—5.119,0.599) respectively. These results also yield a p-value p that satisfies 0.01 < p < 0.05. The inference based on bootstrapping the classical two-sample Welch test for populations with different variances leads to basic percentile confidence intervals that are equal to the ones shown above. We now apply our robust bootstrap to construct a 99% confidence interval for the difference in the location parameters. We first describe a more general method to compare several location estimates. Given independent samples from k potentially 99 different populations we want to build a confidence interval for c'/x, where c' = (c i , . . . , Ck) are fixed constants with Ci ^ 0 and LI = (/ii , . . . , is the vector of the population parameters. Assume, for simplicity of the argument, that all the groups have the same number of observations, n. Equation (2.12) for the i-th population becomes ^E^((v«-/£)M0)=°. (3-8> where yij, 1 < j < n is the data for the i-th population, and is the corresponding S-scale estimate. We base our inference on the distribution of V ^ c ' ( A „ - M ) , (3.9) where fi'n = (/*„,... /}„) and fln, 1 < % < k is the robust location estimate for the i-th population given by (3.8). The distribution function of (3.9) can be obtained as follows. / O O / " O O • • • / FV1 ( u / C l + c'g) dFV2 (g2) • • • dFVk (gk), • 0 0 J —oo (3.10) where v{ = V " (#, - Mi), ^ («) = P (vt < u) for i = 1,... ,h, g = (g2, ...,gk) and c = (—c 2 /ci , . . . , —Cfc/ci). To simplify the notation we do not explicitly indicate that F and F„. above depend on the sample size n. An estimate for (3.10) based on N bootstrap samples for each population (k N independent bootstrap samples overall) can be constructed as follows. For i = 1,..., k and j = 1,..., N let v*1 — y/n (jl^*1 - Lij) be the re-calculated uj's for the z-th 100 population. Let FVl be the empirical distribution function of the v*1, j = 1 , . . . , N. The estimate of (3.10) is given by 1 N 1 N 12 = 1 »fc = l where Q denotes the /-th coordinate of the vector c. W i t h this method we can estimate the required quantiles of (3.9) by solving F (u) = a. We apply this method wi th k — 2 and c' = ( 1 , - 1 ) using the same MM-loca t ion estimate as in the previous example (see page 95). The 99% confidence interval for c'/x = pi — A*2 is (0.22,2.22). We thus conclude that the p-value satisfies p < 0.01 and reject the null hypothesis of equal means at the 1% level. Our conclusion is in agreement wi th the one reached by Simpson et al. (1975). Splus functions for one- and two-sample confidence intervals based on this method are available from the author. 3.3 Asymptotic properties The following theorem shows that the robust bootstrap re-calculated quantities yfn (/if* — / i n ) have the same asymptotic distribution as the sequence yfn (fin — /i) when n —t oo. This result justifies the use of our method in order to perform infer- ence on the parameter \x. Note that this result holds for any distribution F in the contamination neighbourhood rit (see page 36). Theorem 3.1 - Convergence of the robust bootstrap distribution - Let ip : R —> R be odd, bounded, and non-decreasing in (0,oo]. Let p : R —> R+ be even, 101 bounded, and non-decreasing in (0, oo]. Let b G (0,1/2]. Let pn be the MM-location estimate based on ip, let on be the S-scale calculated with the function p and let pn be the associated S-location estimate (see Definitions 2.7 and 2.9). Let p, o and p be the almost sure limits of the sequences pn, bn and pn respectively. Assume that the following conditions hold. 1. The following expected values exist and are finite for all F G rie- EF [ip (X)}, EF [iP (X)/X], Ep [iP' (X)} and EF [iP' (X) X] . 2. For all F G He, EF [p' (X) X] ± 0 and finite. 3. The following functions are continuous: ip, p', ip(u)/u, p'(u)/u, ip', p", ip", p'", (iP (u) - ip' (u) u)/u2, and (p' (u) - p" (u) u)/u2. Let V2 be the asymptotic variance of the sequence y/n(pn — t1) (see Theorem 2.6), and let p^* — pn be the robust bootstrap re-calculated estimates. Then along almost all sample sequences, conditional on the first n observations, v ^ ( A f - A n ) ^ iv(o,v 2). Remark 3.3 Note that if ipc is a Huber-type function then (ipc (u) - ip'c (u) u)/u2 - 0 for | w | < c . When pd is a re-descending function in Tukey's family we have ( p ' d W - P ^ W « ) / ^ 2 = | ( ^ - Q ) 3 ) ^ |u |<d. 102 Hence assumption 3 is also satisfied if the distribution F generating the data does not have positive mass at the points where ip is not differentiable. A p p l y Lemma 7.13 to verify that conditions 1 and 2 also hold for ipc and pd in Huber's and Tukey's families respectively. Remark 3.4 We can change condition 3 above so that it does not depend on the particular F E H c that generated the data. In this case we require that 3 hold everywhere. Clearly, now Huber's functions ibc do not satisfy the assumption. We have to use a "smoothed" version ipc that coincides with ipc except on an arbitrary small neighbourhood around c and —c where ipc is continuously differentiable. Proof of Theorem 3.1: For i = 1 , . . . , n let rj = xi — jj,n and fi = X{ — jj,n. Note that the estimates pn, &n and pn satisfy They can also be written as a weighted average of the observations as follows: j=i (3.11) n 103 where Wi (An, C>„) = Ip ( (Zj - An) / £ „ ) / (Xj - An) , Vi (An, C>„) = p ( (Xi - / } „ ) / O n ) / ( n b (Xi - / } „ ) ) , and W,' (An, #n) = p' ( (iTj - An) / <?„)/ (Xj - / / „ ) . The idea is to show that the vector (ftn,an, An) G M 3 is the fixed point of a smooth function of means. Let f : R 3 - » K 3 be defined by where ( dn(k,s)/gn(k,s) ^ shn (k, n (k, s ) j vn (k, s ) J u. dn(k,s) = (fc's) Xi> i=i n gn (k,s) = ^,s) ' t=i n ^n (fc,s) = ^2vi(k,s) f i , i=l n u„ ( fc ,s ) = J^W- (fc,s)Xi , i=l and n i=l 104 The re-weighted representation of the estimates can be written as f (An, <5"„, An) = (An, O n , An)' • Now, perform a Taylor expansion o f f around the limiting values (/J, a, A)'- We have \ Pn J = f (p, o-, A) + V f (/x, a, A) ^ An - P ^ <7n — a \ An - A / + -Rn, (3.12) where G M 3 is the remainder term and V f (•) G R 3 x 3 is the matrix of partial derivatives of f. Each component of P„ is of the form •x!nHn-x.n, where | | x n | | = Op(\/y/n) and Hn is a Hessian matrix. We have \x.'nHnxn\ < \\Hnxn\\ | | x n | | < \\Hn\\ | | x n | | 2 . Each entry in Hn is a second derivative of a component of f. For example, if fi denotes the first coordinate of f, we have „ _ d% _ [ j E ^ ' f o ) 1] [ E ^ f o ) / r J + [ i EIU V (U)] x (ET=i^(rO/r0 2 \ [ E L ^ (*)] [EIU ̂ 1 - [EIU (rO] [EHI ^ By Lemma 7.1 and assumption 3 we have | | / / n | | | |x n | | = o P ( l ) . Then, | x ^ i y n x n | = oP(\jyfn). Hence, | | P n | | = oP(l/y/n). To simplify the notation let 0n = (An, °n, /}„)' , and 0 = (//,cr,/2)'. Equation (3.12) becomes ( d B - 6>) = [I - V f (0) ] - 1 Vn" (f (^) -9) +oP (1). (3.13) The rest of the proof consists of showing that the bootstrap distribution of the right- hand side of (3.13) converges to the asymptotic distribution of y/n(0n - 0). Note 105 that the correction matrix in (3.13) has to be estimated. We wi l l first show that a consistent estimate of [I - Vf (0)]~\ namely [I - Vf (On)}'1, yields the coefficients an, bn and cn in (3.7). First note that ^ au a i 2 0 ^ I - V f ( 0 n ) = 0 a 2 2 0 ^ 0 a 3 2 a 3 3 J The only entry that needs justification is a 2 3 = 0. Remember that an minimizes sn (t), t 6 M. It follows that for each i e K w e have d_ dt 1 " -Y,P((Xi-t)/sn (t)) i=l hence l _ f , (Xj - t\ f-Sn (t) - (Xj - t) Sn (t) nttP \sn(t)J { Sn(tf o, (3-14) where s'n (t) denotes the derivative of sn (t). Because fln minimizes sn (t), we have sn (Pn) = 0, which together wi th (3.14) implies 1 n -^2p'((xi -Pn)/o-n) = 0. i=l It is easy to verify that lJ2p((xi-~k)/s) d a 2 3 = M i=l 1 1 (k=jlN,S=Vn,k=jln) o ft i=l so that a 2 3 = 0. Hence, [ I - V f ( 0 „ ) ] - x = ^ 1/an -au/ana22 0 ^ 0 1/ a 2 2 0 ^ 0 - 0 3 2 / a 2 2 a 3 3 1/ 033 y 106 Because we are interested in the asymptotic behaviour of the first coordinate of 9n we only need the first row of [I — V f ( 0 n ) ] _ \ a n d hence we only need to calculate an , a22 and a\2. Simple calculations yield au = Ol2 ^22 E ^ ' ( ^ ) ( ^ ) C=i^(2ie)/(£ie) 1 v , ( Xi L~ln ^ f Xi L~Ln i=i It is easy to verify that (1 / an , — 0 1 2 / a u c ^ ) = (a„,6 n ) where a n and 6 n are the correction factors in (3.7). We now show that the bootstrap distribution of the right-hand side of (3.13) converges to the same distribution as the sequence 9n. First we show that f (9) — 9 in (3.13) is a smooth function of means. It will then follow that we can bootstrap it to obtain an estimate of its distribution. Define the random vector Y (9) e R 5 by Y ( 0 ) = ( V ( ( X - A i ) / < 7 ) , tb((X-vL)/a)/(X-ri , p((X-p)/a) , p'((X-fl)/a) , p' {{X-p)la)/ (X -(,))' We have M Y W = £ F Y ( 0 ) = (O, # , & , ( ) , # ) , 107 where # stands for something different from zero, but otherwise irrelevant in the analysis that follows. Let g : R 5 ->• R 3 g (xi,x2, x3, Xi,xb)= h (j, , -x3 , 1- A* J - \X2 b x5 J Then g (/LtY(e)) — {Pi^iP) = Let Yj (0) be the corresponding vectors obtained with the observations Xj , z = 1 , . . . , n, and let Y n (0) be their sample mean. We have g(Y„(0))=f(0) . Hence [f (9) - 0} = [g (Y B (0)) - g (/xY(fl))] . B y Bickel and Freedman (1981), if g is smooth, we can bootstrap the last expression to obtain a consistent estimate of its asymptotic distribution. For any vector 0 let Y* (0) be the sample mean of the vectors Yn (0) obtained with a bootstrap sample x\,...,x*n. Because 0 is unknown and we want to esti- mate it wi th 0n, we st i l l have to show that y/n [Y* (0n) — Y n (0n)] is asymptotically equivalent to ^fn [Y„ (0) — /xY(e)] • Consider the metric d2 (Fi,F2) for distribution functions defined by d 2 ( F 1 , F 2 ) = i n f F [ ( X - Y ) 2 ] (3.15) where the infimum is taken over al l the possible joint distributions for the random vector (X, Y) such that its marginal laws are Fx and F2 respectively. This metric was introduced in Mallows (1972) and Tanaka (1973). For a detailed discussion see Bickel and Freedman (Section 8, 1981). d2 metrizes weak convergence in the following sense: d2 (Fa, F) ->• 0 iff F a ^ F and \imEFaX2 = EFX2 a 108 where —? denotes weak convergence. Let Z\,..., Zn be i.i.d. random variables with common distribution function F. Let F^ denote the distribution function of ^=Y{(Zi-E[Zl}). For any pair of distribution functions i* \ and F2, d2 satisfies d2 {F^, F^1) < d2 (F\, F2) (Bickel and Freedman, 1981). Let Gn be the empirical distribution function of the vectors yi{9n), i = 1,..., n. Conditional on the first n observations, the distribution of v^[y; { 9 n ) - y n (dn)] is G^ • Let Z represent the asymptotic distribution of y/n[yn (0) — ii (0)]. In what follows we will show that d2 (G£\ Z) > 0 almost surely. n->oo Let Fn denote the distribution function of y/n [yn (0) — /u (0)] and F i 7 1 ^ the empirical distribution of \/n[yn (0) — yn (0)}. Following the notation in Bickel and Freedman (1981) we have that d2 (<#>, Z) < d2 ( G W , F „ n ) ) + d2 {Fin\ Z) < d2 (Gn, Fn) + d2 (F£\Z) . Bickel and Freedman show in their Theorem 2.1 that d2 (Fnn\z) —>• 0 almost surely as n —> oo. Lemma 7.18 shows that d2 (Gn, Fn) —> 0 almost surely. We have shown that conditionally on the first n observations, as n goes to infinity, V^[rn(9n)-yn (9n)] 109 converges weakly to the same limit as y/n[yn (00) — n (Oo)]- To complete the proof note that the function g satisfies the regularity conditions required in Lemma 8.1 of Bickel and Freedman (1981), hence the above conclusion applies to g (y* (0n))- ' Remark 3.5 - If assumption 3 in the previous theorem does not hold, then the re- mainder term Rn in (3.12) does not necessarily satisfy ||-Rn|| = 0p (1/y/n)- The theorem is still valid, but the proof follows a different approach. Let f* (0) be the evaluation of f (0) with a bootstrap sample of the xi,...,xn. We can show that y/n[f* (0n) — 0n] in (3.13) is asymptotically normal. Let Sf be the asymptotic co- variance matrix of the robust bootstrapped estimates. Let of y/n (£in ~ p)- The following theorem shows that the bootstrap variance of the robust boot- strap estimates converges to the asymptotic variance. Note that this is not a conse- quence of the previous theorem. See Ghosh et al. (1984) for a counterexample. Theorem 3.2 - Convergence of the robust bootstrap variances - Assume the same regularity conditions as in Theorem 3.1. Then along almost all sample se- quences, where o2 is the asymptotic variance of the sequence y/n(pn ~ p), and var* denotes then it is easy to show that the element [ A ] (1,1) converges to the asymptotic variance n var {Pn ) ->• a the bootstrap variance. 110 Proof: To fix ideas we first consider the simple case of known scale. In this case (3.6) and (3.7) become (see also (1.6)) rc* - \ E i L i ^ f a * ~ An) Vn (pn - pn) = an === -j-i W / . r - r , E i = l V> fa* - Pn) [X* - pn) where Then we can write E L l ^ f a i - A n V f o i - A n ) ^ ~ YsUViXi-Pn) • ( 3 - 1 6 ) Y* _ Vn ( / £ - An) = a n = an g (Fn*, Z„) , Z n where # (x, y) = x/ y, y* = ip (x* - pn) and z* = ip (x* - An) / (x* - pn). B y Bickel and Doksum (1977, page 52), conditionally on the first n observations, n var, (ft) = al ( \ ^ % + o ( 1 / n). (n E i = l Zi) The result now follows by replacing an wi th its value in (3.16) and taking the l imi t as n —y oo. The general proof follows the same lines, and it is based on the matrix representation used in the proof of Theorem 3.1 • 3.4 Robustness properties In this section we study the robustness properties of the quantile estimates of the robust bootstrap. Let t G (0,1), and let qt be the t-th upper quantile of a statistic pn, that is, qt satisfies P [ An > qt ] = t • I l l Following Singh (1998) we define the upper breakdown point of a quantile estimate qt as the minimum proportion of asymmetric contamination that can drive it over any finite bound. Equivalently, it is the smallest proportion of arbitrarily large outliers in the original data set such that we expect the re-calculated estimate to be unbounded in at least t x 100 % of the bootstrap samples. It is easy to see that i f the breakdown point of jln is e*, the corresponding upper breakdown point of qt is the smallest 5 G [0,1] such that P ( Binomia l (n, 5) > [e*n] )>t, (3.17) where [x] denotes the smallest integer larger or equal to x. Lemma 7.17 shows that the function / (5) = P ( Binomia l (n, 5) > [e*n] ) is continuous and non-decreasing for 8 G [0,1], and hence we can always find the upper breakdown point of qt as defined above. In the same paper Singh proves that i f we re-sample from the Winsorized data points, the resulting quantile estimates are asymptotically equivalent to those obtained by the classical bootstrap but have the highest possible breakdown point, namely, the minimum between 50% and the breakdown point of the robust estimate. There are two closely related scenarios in which the quantile estimates based on the robust bootstrap can break down. The first unfavourable situation is when the proportion of outliers in the original data is larger than the breakdown point of the estimate. In this case the estimate is already unreliable, and so are the inferences we derive from it. The second case is related to the number of outliers appearing in the bootstrap samples. Let r* be the expected proportion of bootstrap samples that 112 contain more outliers than the breakdown point of the estimate. In other words, we expect T* x 100% of the re-calculated fi*n's to be unreliable. The estimate qt may be severely affected by the outliers when r* > t. The following theorem summarizes this discussion. Theorem 3.3 - Breakdown point of the robust bootstrap quantiles for the location-scale model - Let xi,...xn be i.i.d. observations following model (2.1). Let 0 < t < 1/2 and let p,n be an MM-location estimate with breakdown point e*. The breakdown point of the t-th robust bootstrap quantile estimate qt is min (t 1/", e*). Proof: Not ing that a* in (3.5) satisfies 1 „ (Vi ~ An P we see that <7* remains bounded for any bootstrap sample y*,... yn. Let x\",..., xn be a bootstrap sample. To simplify the notation, and without loss of generality, assume that x\,..., x*g., wi th 0 < g < n are points in the bootstrap sample that are not outliers. The robust bootstrap evaluation of An satisfies An * ~Pn = _ £ f [ ^ ( ^ ) / ( * * - A n ) ] < + E ; . + 1 [P ( ^ ) / (X? - An) £ f ^ ( ^ ) / (*? " An) + £ ; . + l * {^t)/ (X* ~ An) ' If we now take the l imit when the outliers x*.+1,... ,x*n approach infinity, we have , „ « v — • 0 , g* + l<i<n. [X* - fJLn) x*->oo and . • 1, g* + 1 < i < n. 113 As a result An * ~ An remains bounded as long as g* > 1. It is easy to see that the probability of obtaining a bootstrap sample with g* — 0, that is, where al l the points are outliers, is e n , where e is the proportion of outliers i n the original sample. Hence, to drive the i - th quantile out of bounds we should have e n > i , that is e > tlln. The proof is complete. • The previous theorem shows that the breakdown point of the i - th robust boot- strap quantile increases wi th the sample size and wi th the value of i . For example, i f the MM-est imate has a breakdown point of 50%, for any sample size n > 10 and any t > 0.001 the breakdown point of the i - th robust bootstrap quantile is 50%. Table 3.1 shows some classical and robust bootstrap quantile breakdown points for an M-locat ion estimate with breakdown point of 50%. The breakdown points of the robust bootstrap quantile estimates are calculated wi th the formula proved i n Theorem 3.3 above. The corresponding breakdown points for the classical bootstrap quantile estimates are calculated with formula (3.17). For example, the entry 0.22 for n = 10 and t = 0.01 means that i f there are at least 22% outliers (more than 2 outliers) in a sample of size 10, then the classical bootstrap estimate of q0.0i might be severely affected by the value of those outliers. Note that the breakdown point decreases as we move further out into the tails of the distribution of An, and it increases wi th the sample size. A s an example, for the same quantile <70.oi as before, i f the sample size is n — 20 the classical bootstrap quantile wi l l breakdown only when the proportion of outliers reaches 28%. It is intuitively clear that the lower breakdown points wi l l 114 Classical bootstrap Robust bootstrap n 90.05 9o.oi 90.005 90.001 90.05 90.01 90.005 9o.ooi 5 0.19 0.11 0.08 0.05 0.50 0.40 0.35 0.25 10 0.30 0.22 0.19 0.14 0.50 0.50 0.50 0.50 20 0.35 0.28 0.26 0.21 0.50 0.50 0.50 0.50 50 0.40 0.35 0.33 0.30 0.50 0.50 0.50 0.50 Table 3.1: Comparison of breakdown points of classical and robust bootstrap quantile estimates for MM-loca t ion estimators be found further out into the tails of the distribution, as these quantiles are typically more difficult to estimate. Note that for n > 10 the breakdown of the robust bootstrap quantile estimates considered here have the highest attainable value, namely 50%. In this sense our method compares favourably with Singh's Winsorized bootstrap, which yields quantile estimates qt wi th breakdown point 50% for any n and t. 3.5 Studentizing the robust bootstrap The basic idea behind the studentized bootstrap (both classical and robust) is as follows. Let Tn be a statistic such that y/n (Tn — p) —>• N (0, U2) and let U2 be a consistent estimate of U2. Under certain regularity conditions (see, for example, Ha l l , 1992) we have p ( v ^ ( T n - p ) / L > n < x ) - P ( y ^ ( T n * - A n ) / t > ; < x\ x) = Op ( r r 1 ) , (3.18) where U* denotes the re-calculated Un wi th the bootstrap samples, and P(-\X) de- notes the bootstrap distribution conditional on the sample X. Note that y/n (Tn — p ) / Un 115 is an asymptotically pivotal statistic. Under certain regularity conditions, the prop- erty of the bootstrap distribution stated in (3.18) holds when the statistic being bootstrapped is asymptotically pivotal (see Ha l l , 1992). On the other hand, if $ (•) denotes the standard normal cumulative distribution function we have P ( v ^ ( T n - fi)/Un <x)-$(x)=0 ( n - 1 / 2 ) . (3.19) When (3.18) holds we say that the bootstrap estimate of the distribution function of jln has a higher order of convergence than the one given by the normal asymptotic distribution. Because the robust bootstrap re-computes a non-pivotal statistic the resulting estimate of the distribution function of fin may not be more accurate than the one derived from its asymptotic normal distribution. To improve the accuracy of our distribution estimates we consider a studentized version of the robust bootstrap, in the same spirit as the bootstrap-^ confidence intervals (see Efron, 1979; Ha l l , 1992; DiCicc io and Efron, 1996). We wi l l refer to them as robust bootstrap-^ confidence intervals. Let / in, bn and /}„ be the MM-loca t ion , S-scale and S-location estimates, respec- tively. Under certain regularity conditions the vector ( / i n , an, jj,n)' has an asymptotic normal distribution. Let E = E (/i, cr, /2, F) denote the corresponding asymptotic co- variance matrix in K 3 * 3 . Let E n be the empirical estimate of this matrix, that is E n = E (//„, & n , / i n , Fn) where Fn is the empirical distribution function of the sample. Following the same approach as in the proof of Theorem 3.1 it is easy but tedious to 116 see that along almost al l sample sequences we have \fn ( a* ^ Mn N (o, E) , Pn \p*n J J for a certain matrix £ (p, cr, /2, F) G R 3 x 3 , where (pn,a*,pn) are the bootstrap re- calculations obtained from the re-weighted expressions in (3.11). A s before, let £ n = E (An, <5n, An, F„) be the empirical estimate of E . Let An — I — V f (An, on, An) be the estimated correction matrices used in the proof of Theorem 3.1 (see page 105 for the definitions). We can show that under certain regularity conditions, A n E n ^ n ~^ ^ almost surely. For a matrix C that is symmetric and definite positive, let C 1 / 2 be its unique square root. That is, we have C = C 1 / 2 [ C 1 / 2 ] ' . Let C " 1 / 2 be the inverse of C 1 / 2 . If we use classical studentized bootstrap on the vector (An, on, pn) we use En , the inverse of the square root of the estimate E n , as follows: / , c * \ ( h \ Pn Pn n \Pn J where £ „ 1 / 2 c * denotes the evaluation of E n wi th the bootstrap samples, and (pcn*, anc, jx0*)' are the classical bootstrap re-calculated statistics. From Theorem 3.1 we know that ( [jc* \ ( r, \ Pn Pn V An / V An / ( A„ ^ ( Pn ^ at \ Pn j 117 where (fin, <r*, /}*)' are the robust bootstrap re-calculated statistics, and Xn ~ Yn means that both sequences have the same asymptotic distribution. Simple algebra yields E n = AnT,nA'n. Hence, E n 1 / 2 = E n ^ A " 1 . Based on these observations we propose to studentize our bootstrap as follows. Let E n 1 / 2 * and A~l* be bootstrap evaluations of E n 1 / 2 and A~l respectively. Let Pn \ P n ) 1 u x On \ Pn ) (3.20) be the studentized robust bootstrap re-calculations of (fin, an, fj,n)'• If we transform e* in (3.20) with the covariance matrix estimate E ^ 2 calculated with the original data, we get a bootstrap sample of the joint distribution of (pn,&n, Pn)'• That is, if h* = E y 2 e * , we can use the first coordinate of these h* vectors to get our estimate for the asymptotic distribution of fin. Unfortunately, we were unable to prove that the proposed studentized robust bootstrap method achieves (3.18). The reason seems to be that the correction factor we apply to jx^* — pn is of order Op (1/ yjn). Numerical experiments show that the studentized robust bootstrap performs slightly better that the non-studentized robust bootstrap, but the difference does not seem to be of the order of magnitude suggested by (3.18) and (3.19). See Chapter 6 for a proposal on how this could be improved. 118 3.6 Inference We consider two related approaches to performing statistical inference about the location parameter p. The first approach is to approximate the distribution of \fn (fin — /i) wi th its normal asymptotic distribution (see Theorem 2.6). If we proceed in this fashion, it is important to have a good estimate of the asymptotic variance V2 given in that Theorem. We wi l l then bui ld an asymptotic 1 — a confidence interval for /x of the form (/x„ — za/2 Vn, i±n + za/2 Vn), where Vn is an estimate of V, and za is the quantile that leaves area equal to a to its right under a standard normal curve. We can also wish to estimate V in order to assess the precision of the point estimate fin. In either case, it is of interest to have a reliable estimate of V. In Section 3.6.1 we compare the accuracy of four estimates Vn: the robust bootstrap ( R B ) , the classical bootstrap ( C B ) , Singh's Winsorized bootstrap ( W B ) (Singh, 1998) and the empirical estimate (AV) based on the formula given in Theorem 2.6. The first three methods use the empirical variance of the re-calculated fx*'s to estimate V2 (see for example Davison and Hinkley, 1997, page 16). The difference among them lies in the way in which the re-computed fin's are obtained. See Section 3.6.1 for a description of the W B method. The second approach is based on estimating the distribution of \fn (fin — /x) without using Theorem 2.6. In this case we focus in constructing confidence intervals for fj, of the form ( Z i _ Q / 2 , Za/2), where Z„ satisfies l i m ^ o o P [ y/n (fin — (j,) > Z„] = rj, for r\ G [0,1]. We can use bootstrap methods to obtain such estimates. The basic 119 idea (see for example Davison and Hinkley, 1997, page 18) is to use the empirical distribution of the re-calculated p*n 's to obtain estimated quantiles Zn. Note that with this method we do not use the symmetry assumption that underlies the asymptotic normal approximation in the previous approach. To estimate these quantiles we considered the studentized classical bootstrap (see Davison and Hinkley, 1997, page 29), the studentized robust bootstrap and the Winsorized bootstrap (Singh, 1998). We have also studied the non-studentized classical and robust bootstrap, but the studentized methods performed better. Details and the results of our experiment are discussed in Section 3.6.2. In al l the simulation experiments we used an MM-loca t ion estimate wi th •01.345 in Huber's family. The S-scale was calculated wi th the function pi.04086 in (2-14). This election yields an estimate pn that has 50% breakdown point and that is 95% efficient when the data is normally distributed. Remark 3.6 - Note that the correction coefficient bn in (3.7) EIU ^ {Ul *„)] E=l ? (fi/ &n) h i On] ' could potentially be unstable due to small values in the denominator. It is clear that, almost surely, ,. , , , „ x , EF[ip'(u)u] h m & n = bO0(F) = b F [ ; 1, ' n->°o Ep [ip' (u)] Ep [pf (u) u] Lemma 7.20 gives a bound for b^ (F) for any distribution Fe in an e neighbourhood of the standard normal distribution, when ip = ipc belongs to Huber's family and p — pk given by (2.14). 120 In our simulation studies we used c = 1.345, b = 1/2 and k = 1.041, so that the previous lemma yields for e G (0,0.3). Hence we restricted bn G (—1.5,1.5). 3.6.1 Asymptotic variance estimation In this section we compare the following estimates of the asymptotic variance V2 of the sequence fin: the classical bootstrap ( C B ) , the robust bootstrap ( R B ) , Singh's Winsorized bootstrap ( W B ) (Singh, 1998) and the empirical estimate based on the asymptotic formula ( A V ) . A s mentioned before, the three bootstrap-based estimates of V2 are the empiri- cal variance of the re-calculated /t*'s. They differ in the way in which the re-computed statistics are obtained. The last estimate AV of V2 is given by V2 = V (fin, on, Fn)2 where V (/x, a, F) is given in (2.71). We now briefly describe Singh's Winsorized bootstrap (Singh, 1998). Let X\,... ,xn be the original sample. Let an be a robust scale estimate and jln a ro- bust location estimate calculated as in (2.12) with a function tpc from Huber's family (2.5). For a fixed constant h and a bootstrap sample x\,...,x*n define the Winsorized 121 bootstrap sample as follows n if |(a;J - fin)/on\ < h if (xl - Lin)/dn < -h fin + h a, n if (x\ - fln)/On > h. for i = 1 , . . . , n. Singh proposes to use h = 1.5 c where c is the constant used in ipc. The Winsorized bootstrap evaluation of / i j f * is the solution of We then estimate the asymptotic standard error V of jj,n by calculating the empirical standard deviation of the bootstrapped fi^^s. In this study we considered distributions in the family where $ denotes the standard normal cumulative distribution function. That is, 100 x e per cent of the observations are outliers centered around 7. Other values for the center of the contamination yielded similar results. We needed to simulate observations of a random variable X wi th distribution function Fe given by (3.21). A realization of such a random variable can be eas- i ly generated in the following way. For each i = 1 , . . . , n let P>i ~ Binomia l (1, e), independent from each other. Then n £^((tf-£r)/*») = ° 1=1 Fe(x) = (l-e) $ ( x ) + e $ ( ( x - 7 ) / 0 . 1 ) , (3.21) (3.22) 122 where Zf ~ N (0,1) and Zf ~ N (7,0.1), both independent of 5*. When drawn in this fashion, every sample xx,..., xn w i l l contain a different number of outliers. This random proportion of outliers has expected value equal to e. We used samples of size 20, 30 and 50. The proportions e of contamination considered were 0.0, 0.1, 0.2 and 0.3. For each Fe we computed the correct asymptotic variance V2 (p, a, Fe) of the sequence pn (see Theorem 2.6). We then simulated 3,000 samples from Fe and obtained the asymptotic variance estimates Vi, i = 1 , . . . , 3000 wi th each of the four methods. We report the averages of the following two "loss functions", where V is the actual asymptotic standard deviation of pn when X ~ Ft: The quadratic measure (3.23) is more sensitive to over-estimation of V, but does not penalize under-estimation with the same intensity (intuitively: the worst because dt (V, V)/dq (V, V) -> 0 when V - 4 + 0 0 . Tables 3.2 and 3.3 provide detailed results of this Monte Carlo study. In Figures 3.5 and 3.6 we summarize these results. We see that in al l cases there is a value e* 123 (3.23) and (3.24) under-estimation for V is V = 0 and hence there is an upper bound for dq (V, V) when V < V, while dq is unbounded for V > V). The logarithmic loss dt (3.24) penalizes under-estimation wi th more intensity because now di (V*, V) —7 +00 when Vi —r 0. W i t h this loss function over-estimation receives less weight than wi th dq such that for e < e* al l methods behave similarly, but when the proportion of outliers exceeds e* the robust bootstrap shows a clear advantage in performance. For example, with dq and n = 30, for e < 0.10 there is not much difference among the different methods. But when e > 0.20 there is a clear change in the pattern: the robust bootstrap remains relatively stable, the Winsorized bootstrap and the empirical variance grow notably faster and the classical bootstrap is completely unreliable. For n = 50 and dq the classical bootstrap breaks-down for e > 0.1 while the other three methods remain close to each other. When e > 0.20 we see that the robust bootstrap remains stable while the other methods break-down (the Winsorized bootstrap resulted the second best method in this study). W i t h d\ the differences in performance are smaller but the pattern is the same as wi th dq. Note that the robust bootstrap has a comparable average loss for values of e where most of the methods considered here remain close to each other. When the proportion of contamination is large and there is a clear differentiation in perfor- mances, the robust bootstrap is consistently the best method. This study illustrates the numerical stability of our method when the interest lies in estimating V2, the asymptotic variance of the sequence / t n . These results show that inference based on our method is more stable than that based on the other three proposals for high proportions of contamination, and at the same time, remains comparable for small proportions of outliers. The robust bootstrap is also much faster 124 E(yn/v-i) n e Robust Bootstrap Classical Bootstrap Winsorized Bootstrap Empi r ica l A V 20 0.00 0.10 0.20 0.30 0.268 (0.993) 3.258 (41.46) 17.99 (88.02) 10.17 (26.84) 0.219 (0.475) 33.27 (411.9) 189.7 (632.3) 89.81 (134.0) 0.218 (0.464) 2.908 (51.35) 19.95 (115.9) 11.83 (31.14) 0.271 (1.120) 3.256 (50.89) 29.14 (189.6) 18.92 (66.11) 30 0.00 0.10 0.20 0.30 0.135 (0.268) 1.304 (29.35) 13.60 (114.1) 17.94 (53.36) 0.123 (0.230) 9.798 (329.8) 116.0 (595.7) 124.1 (241.3) 0.122 (0.221) 1.505 (54.10) 20.36 (191.2) 38.63 (116.4) 0.132 (0.261) 1.418 (40.43) 34.46 (456.0) 76.22 (294.3) 50 0.00 0.10 0.20 0.30 0.067 (0.112) 0.233 (0.759) 3.417 (38.15) 12.34 (63.11) 0.061 (0.094) 0.361 (1.786) 31.16 (366.3) 95.15 (301.1) 0.060 (0.093) 0.203 (0.609) 6.124 (161.3) 40.62 (224.2) 0.065 (0.114) 0.215 (0.685) 6.307 (161.7) 122.3 (1012) Table 3.2: Comparison of asymptotic variance estimates - quadratic measure to compute than these alternatives. 3.6.2 Coverage and lengths of confidence intervals In this study we compare the coverage and mean lengths of confidence intervals based on the following methods: studentized classical bootstrap (B-t), studentized robust bootstrap (RB-t ) , and Singh's Winsorized bootstrap ( W B ) . Let fin be an MM-loca t ion estimate and let on be the associated S-scale. The classical bootstrap-^ (B-t) method was implemented as described above wi th Tn = fin and Un = V (fin,an, Fn), where V (p, a, F) is given in Theorem 2.6. The robust 125 E(\og(vn/v)) n e Robust Bootstrap Classical Bootstrap Winsorized Bootstrap Empi r ica l A V 20 0.00 0.10 0.20 0.30 0.198 (0.322) 0.436 (0.863) 1.130 (1.871) 1.642 (1.730) 0.177 (0.264) 0.875 "(1.982) 2.896 (4.091) 3.696 (3.211) 0.195 (0.301) 0.390 (0.791) 1.081 (1.902) 1.809 (1.863) 0.198 (0.327) 0.415 (0.841) 1.166 (2.075) 1.907 (2.075) 30 0.00 0.10 0.20 0.30 0.120 (0.187) 0.279 (0.552) 0.813 (1.570) 1.559 (1.974) 0.110 (0.171) 0.418 (1.048) 1.903 (3.314) 3.556 (3.667) 0.114 (0.182) 0.245 (0.503) 0.764 (1.693) 1.922 (2.579) 0.119 (0.186) 0.269 (0.540) 0.844 (1.858) 2.096 (3.127) 50 0.00 0.10 0.20 0.30 0.065 (0.094) 0.143 (0.226) 0.480 (0.900) 1.049 (1.596) 0.059 (0.085) 0.166 (0.299) 0.913 (1.921) 2.570 (3.332) 0.060 (0.087) 0.134 (0.206) 0.405 (0.897) 1.257 (2.350) 0.064 (0.093) 0.139 (0.216) 0.476 (0.967) 1.364 (2.797) Table 3.3: Comparison of asymptotic variance estimates - logarithmic measure 126 bootstrap-^ was performed as discussed in Section 3.5. For a description of the W i n - sorized bootstrap ( W B ) see Section 3.6.1. To construct 1 — a confidence intervals for LI we estimate the quantiles qt that satisfy l im P (qi-a/2 < \/n(ixn- LI)/Un < qa/2) = I - a. Let F* be the empirical distribution function of the re-computed y/n (A„ — Lin)/U* with each bootstrap-t method. A n estimate of qa is given by the solution q*a of F* (qa) = a. The confidence interval is ^An — Qa/2 Un i An — Ql-a/2 ^ • Confidence intervals based on Singh's Winsorized bootstrap used the empirical distribution function of the recomputed estimates as an estimate of the cumulative distribution function of y/n (An — A 4 ) - If F^ denotes that estimate, then, for a G [0,1] the cn-th quantile estimate based on the Winsorized bootstrap q^Va solves F^ (qwa) ~ a. The interval is ^An — Qw a/2 , An — Qw l-a/2 ^ • The tails of the distribution estimates obtained by the classical bootstrap are potentially unstable (see the discussion on page 86). We expect the robust bootstrap to show more clearly its advantage over the other methods when we estimate extreme quantiles (for example the quantile Qo.oos needed to bui ld a 99% confidence interval). We considered data generated by distribution functions Fe in (3.21) wi th e equal to 0.00, 0.10, 0.20 and 0.30 and n = 20, 30 and 50. For each combination 127 of n and e we generated 3,000 samples and constructed the confidence intervals as described above. Tables 3.4 and 3.5 show the results. We display the same results in Figures 3.7 to 3.12. For each sample size (20, 30 and 50) we plot the empirical coverage level on the x-axis and the the mean length on the y-axis. The labels 0 , 1 , 2 and 3 correspond to 0%, 10%, 20% and 30% of outliers. The results corresponding to each method are joined by a line. The ideal trajectory wi l l stay near the 1 — a line without moving upward. A s expected we see that for larger proportions of contamination we obtain larger mean lengths. For n = 20 or 30 the studentized bootstrap confidence inter- vals are noticeably longer that the ones based on the other methods. We also note that the studentized bootstrap confidence intervals have smaller coverage levels than nominal. There is no important difference in the performance of the robust bootstrap compared with the Winsorized bootstrap, but the computational demands are signifi- cantly less for the robust bootstrap. We conclude that the robust bootstrap performs equivalently to other robust proposals and is faster to compute, so we recommend its use. 128 Studentized Studentized Winsorized n e robust bootstrap bootstrap bootstrap 20 0.00 0.942 (1.01) 0.953 (1.09) 0.938 (0.91) 0.10 0.950 (1.25) 0.931 (1.26) 0.937 (1.11) 0.20 0.962 (1.74) 0.912 (1.87) 0.958 (1.49) 0.30 0.970 (2.56) 0.940 (4.01) 0.972 (2.11) 30 0.00 0.949 (0.79) 0.954 (0.81) 0.948 (0.74) 0.10 0.958 (0.97) 0.933 (0.95) 0.950 (0.90) 0.20 0.964 (1.28) 0.910 (1.20) 0.963 (1.18) 0.30 0.971 (2.01) 0.919 (2.10) 0.971 (1.73) 50 0.00 0.946 (0.59) 0.948 (0.59) 0.942 (0.57) 0.10 0.958 (0.72) 0.948 (0.71) 0.958 (0.70) 0.20 0.967 (0.94) 0.924 (0.88) 0.966 (0.91) 0.30 0.980 (1.43) 0.908 (1.29) 0.982 (1.32) Table 3.4: Coverage and length of 95% confidence intervals for the location-scale model Studentized Studentized Winsorized n e robust bootstrap bootstrap bootstrap 20 0.00 0.988 (1.43) 0.988 (1.60) 0.968 (1.12) 0.10 0.991 (1.81) 0.986 (2.04) 0.983 (1.42) 0.20 0.994 (2.52) 0.982 (4.55) 0.985 (1.90) 0.30 0.998 (3.80) 0.990 (11.5) 0.995 (2.63) 30 0.00 0.988 (1.08) 0.989 (1.12) 0.979 (0.93) 0.10 0.993 (1.33) 0.976 (1.31) 0.981 (1.14) 0.20 0.997 (1.77) 0.967 (1.67) 0.988 (1.50) 0.30 0.999 (2.68) 0.978 (3.22) 0.995 (2.14) 50 0.00 0.989 (0.80) 0.988 (0.80) 0.979 (0.73) 0.10 0.992 (0.97) 0.988 (0.94) 0.991 (0.89) 0.20 0.999 (1.27) 0.974 (1.16) 0.993 (1.16) 0.30 0.999 (1.97) 0.978 (1.86) 0.997 (1.70) Table 3.5: Coverage and length of 99% confidence intervals for the location-scale model Sample size: 20 Sample size: 30 as 5 0.0 0.1 0.2 0.3 epsilon o CM , — o o •— CO co _g o 00 u ra adr o CD cr c o CO •<»• CD o O J o 0.0 0.1 0.2 0.3 epsilon Sample size: 50 0.0 0.1 0.2 0.3 epsilon Robust bootstrap Classical bootstrap Winsorized bootstrap Empirical variance Figure 3.5: Comparison of asymptotic variance estimates - quadratic measure 130 Sample size: 20 Sample size: 30 0.0 0.1 0.2 0.3 epsilon 0) 2 0.0 0.1 0.2 0.3 epsilon in cvi Sample size: 50 0.0 0.1 0.2 0.3 epsilon Robust bootstrap Classical bootstrap Winsorized bootstrap Empirical variance Figure 3.6: Comparison of asymptotic variance estimates - logarithmic measure 131 Sample size: 20 o LO CO Robust bootstrap-t Bootstrap-t Winsorized bootstrap o CO Figure 3.7: Location-scale model 95% confidence intervals for n = 20 - Labels 0, 1, 2 and 3 correspond to e = 0.0, 0.1, 0.2 and 0.3 respectively. 132 Sample size: 30 Figure 3.8: Location-scale model 95% confidence intervals for n = 30 - Labels 0, 1, 2 and 3 correspond to e = 0.0, 0.1, 0.2 and 0.3 respectively. 133 Sample size: 50 Figure 3.9: Location-scale model 95% confidence intervals for n = 50 - Labels 0, 1, 2 and 3 correspond to e = 0.0, 0.1, 0.2 and 0.3 respectively. 134 Sample size: 20 Robust bootstrap-t Bootstrap-t Winsorized bootstrap oo H Figure 3.10: Location-scale model 99% confidence intervals for n = 20 - Labels 0, 1, 2 and 3 correspond to e = 0.0, 0.1, 0.2 and 0.3 respectively. 135 Sample size: 30 o LO CO o CO O) LO c o C\i LO o 0.95 0.96 0.97 0.98 0.99 1.00 Coverage Figure 3.11: Location-scale model 99% confidence intervals for n = 30 - Labels 0, 1, 2 and 3 correspond to e = 0.0, 0.1, 0.2 and 0.3 respectively. 136 Sample size: 50 LO c\i Robust bootstrap-t Bootstrap-t Winsorized bootstrap o 05 c a) c CO CO 2, T T 0.95 0.96 0.97 0.98 Coverage 0.99 1.00 Figure 3.12: Location-scale model 99% confidence intervals for n = 50 - Labels 2 and 3 correspond to e = 0.0, 0.1, 0.2 and 0.3 respectively. 137 C h a p t e r 4 G l o b a l a s y m p t o t i c p r o p e r t i e s o f r obus t es t imates for the l i nea r regress ion m o d e l In this chapter we extend the results of Chapter 2 to the linear regression model. We first describe the class of MM-regression estimates (Yohai, 1987) and discuss its robustness properties when the data are generated by a distribution belonging to the e-contamination neighbourhood of a central distribution H0. This gross-error neighbourhood allows for contamination both in the errors and in the predictor vari- ables (see (4.10)). We show that under certain regularity conditions the S-scale, S- regression and the MM-regression estimates are consistent for any distr ibution i n this neighbourhood. We also show that wi th some additional regularity conditions, the S- 138 and MM-regression estimates are asymptotically normal for arbitrary distributions in the neighbourhood. 4.1 Definitions Consider the following linear regression model. Let yi,..., yn be n independent ob- servations satisfying yi = /3 0 Xj + CJ, i = l,...,n, (4.1) where /3 0 E W is the parameter of interest, Xj are n p-dimensional covariates, and the errors 6j are i . i .d . wi th mean zero and constant variance <72. We wi l l consider random covariates in (4.1). Asymptot ic theory for robust regression estimates wi th fixed explanatory variables has yet to be studied in detail. To our knowledge, only results for M-regression estimates (Yohai and Maronna, 1979), S-regression estimates (Davies, 1993) and GM-estimates (Wiens, 1996) have been published. The errors e, are assumed to be independent of the explanatory variables. We consider <7n, the dispersion parameter of the errors, a nuisance parameter. If the model (4.1) includes an intercept write Xj = ( l , z^ ) ' where Zj are the explanatory variables. Otherwise we have Xj = Zj . Robust regression estimates were first introduced by Huber (1973, 1981). Let ip : R —> R be odd, non-decreasing and bounded. The M-regression estimate is the solution /3n of the equation n ^ V ( ( y i - x | i 9 B ) / a n ) x i = 0, (4.2) i=i 139 where an is a robust estimate of the scale of the residuals. If p is such that ip — p' then /3„ can be defined as n hn = arg min J ] p ((Vi - x\ 9)/an) . (4.3) i=l Note that because ip is non-decreasing the corresponding p is unbounded. These estimates have breakdown point 0 (see Definition 4.4) because the above equation does not take into account the leverage of the observations (see Weisberg, 1985, page 111) to down-weight them. A generalization of the above class of estimates is given by the generalized M-estimates (GM-estimates). Let n : RP x R - » R satisfy: • for al l x G R p , n (x, •) is continuous except on a finite set C (x); • for each x G R p , n (x, •) is odd; • V (x, r) > 0 for x G R p and r > 0. The GM-regression estimate 0n is defined by n E 7 ? ^ ' ~ X i ^ n ) / ^ n ) ^ = 0. (see for instance H i l l , 1977; Krasker, 1980; Krasker and Welsch, 1982 and Hampel et al, 1986). A l l proposals for rj can be written as 77 (x, r) = co (x) , 0(rv(x)), for different choices of the weight functions u : W R+, ip : R -> R and v : Rp -> R+. For example, setting w (x) = u (x) = 1 we obtain Huber's estimates (4.2). If v (x) = 1 we obtain Mallow's family 77 (x, r) = u (x) ^ (r). Maronna, Bustos and Yohai (1979) 140 showed that these estimates have breakdown point at most 1/ (p + 1) where p is the number of covariates i n the model, including the intercept i f present. Rousseeuw (1984) introduced the Least Median of Squares (LMS) estimate and the Least Tr immed Squares (LTS) estimates. These estimates minimize the median and the tr immed mean of the squared residuals respectively. The L M S has the highest achievable breakdown point, namely 50%, independently of the number of explanatory variables. Unfortunately, the L M S does not have a v/n-asymptotic distribution (Davies, 1990) and the L T S is computationally very demanding. Rousseeuw and Yohai (1984) introduced the class of S-regression estimates. They are defined as the set of coefficients /3n that minimizes an M-scale of the corre- sponding residuals (compare with Definition 2.6). Consider a loss function p : K —>• E_)_ that satisfies the following set of regularity conditions: R . l p (-u) = p (u) for al l u G R, and p (0) = 0; R.2 p is continuously differentiable; R.3 sup x p(a;) = 1; R.4 if p (u) < 1 and 0 < v < u then p(v) < p (u). Definition 4.1 - S-regression estimates Let p : M —>• satisfy conditions R.l to R.4 above. Let b € (0,1]. The S-regression estimate f3n solves Pn = a r S min on (/3) , 141 where an (8) satisfies lYjp((yi-x!i8)/dn{P)) = b. (4.4) The corresponding S-scale estimate on is an= in{pdn(8) = on(f3n) • (4-5) S-regression estimates have been shown to be asymptotically normal when the errors distribution is symmetric (Rousseeuw and Yohai, 1984). Unfortunately, for these estimates there is a trade-off between high breakdown point and high efficiency when the errors follow a standard normal distribution. The function p in the above definition can be chosen so that the resulting S-regression estimate is highly efficient, but this choice of p yields a poor breakdown point. If, on the other hand, we choose p to obtain a high breakdown point, the asymptotic efficiency of f3n decreases notably. Note that S-regression estimates are a special type of M-regression estimates with a bounded loss function p and a special scale estimate. Because of the mono- tonicity of p and the inequality on < bn (8) for any 8 G W it is easy to see that and hence /3n minimizes where p is bounded (compare with (4.3)). In order to obtain simultaneously high breakdown point and high efficiency at the standard normal model, Yohai (1987) introduced the class of MM-regression estimates. 142 Definition 4.2 - MM-regression estimates Let p 0 : R ->• R+ and px : R ->• R+ 6e two functions satisfying conditions R.l to R.4 above and such that p\ (u) < po (̂ ) /or all u 6 R and s u p u e R p i (u) = s u p u e R p 0 (u). The MM-regression estimate is defined in the following three steps: • let f3n be a high-breakdown point estimate for (3; • let on be the M-scale estimate of the residuals based on $n. That is, on satisfies 1 n -^2po((Vi -A~Pn)/on) = b; n t=l • the MM-estimate f3n is defined as any solution of n Yl Pl ( ̂ ~ X ' i & » ) / an) Xj = 0 i=l withS(f3n) < S(j3n), where n S{B) = Y,Pl{{Vi-<P)l&n)- i=l In particular, we will consider MM-estimates obtained with the steps described above when /3n is a S-regression estimate, and dn is the corresponding S-scale. Note that if po and p\ are continuously differentiable then the estimates (3n, (3n and an satisfy the following equations: n Y,p'l((yi-x'iPn)/Vn)Xi = 0, (4.7) i=l 1 " - £ > ( f o - x ; = 6, (4.8) 143 and Y^Po({yi-x!iPn)/Vn)Xi = 0. (4.9) i=l This class of regression estimates simultaneously achieves high breakdown point and high asymptotic efficiency (see Yohai , 1987). Yohai (1987) also proves that, if the data follow model (4.1) and the sequence J3n is consistent to the true parameter 30, then an —>• a0 and / 3 n is also strongly consistent to 30. 4.2 Robustness properties Let F0 be the distribution function of the errors e and let Go be the distribution function of the explanatory variables z in x (see model (4.1)). Let V be a set of distribution functions in W where p is the number of random components in the vector of covariates x. Let H0 the distribution of the pair (y,z). We model the presence of outliers in the data in such a way that both the re- sponse variable and the covariates can be affected. In other words, the contamination might upset both F0 and Go. Consider the following e-contamination neighbourhood of Ho ri€ = S^H EV : H = (l-e) H0 + eH*Y (4.10) where H* is arbitrary. Let po and pi be real functions satisfying the regularity con- ditions of Definition 4.2. Let b G (0,1/2]. For each 0 G W define the functional 144 a (H, 0) : V —> by the following equation Y-0'X EH Let cr : V ->• R. be Po = b. (4.11) <r(fT) = i£<r{H,0) . (4.12) The associated S-regression functional B : V —> W is 0(#) = argmm<r(tf,0) . (4.13) Finally, let the functional of the MM-regression estimate 8 : V —» R p be defined by /3 (#) = arg inf EH P i <T{H) (4.14) Most asymptotic bias results for robust regression estimates have been estab- lished for the linear model without intercept, that is when x* = in (4.1). When we have a linear regression model through the origin, the definition of asymptotic bias for the parameters 8 is as follows. Definition 4.3 - Maximum asymptotic bias for models without intercept - The maximum asymptotic bias of LI over Tit is given by B( C )= sup \\(B (H) - B (H0))' A (G0) (B (H) - B (H0))\\, (4.15) where Go is the distribution o /x in (4-1), A ( G o ) is an equivariant dispersion estimate, and H0 is the central distribution of the contamination neighbourhood %t. 145 For MM-regression estimates we have the following lower bound for B (e) (Berrendero Diaz, 1996; and Berrendero Diaz et al, 1998): B ( e ) > A ^ M B t M ) ^ / ^ ^ - u ( 4 ' 1 6 ) where and /•oo M s ) = / P(y/s) dF0(y) , Jo v The lower bound in (4.16) is an equality for small values of e (see Berrendero Diaz, 1996). Definition 4 .4 - Asymptotic breakdown point Let B (e) be as in (4-15). The asymptotic breakdown point of 3 is e* = i n f | e : B (e) = oo J . (4.17) Yohai (1987) shows that i f 8n has breakdown point equal to 1/2, then the MM-regression estimate (3n also has e* = 1/2. Rousseeuw and Yohai (1984) show that the S-regression estimates have t* = 1/2 and hence the MM-regression estimates obtained with an ini t ia l S-regression estimate inherit this property. 4.3 Asymptotic properties Under the central model H0 the sequence 3n is consistent to the true 30 in (4.1) (see Yohai , 1987). We also have that y/n~0n -3) is asymptotically normal wi th 146 covariance matrix S ( F 0 , G 0 , / 3 0 , c r 0 ) EFop[2(U/a0)/ [EpJKU/ao))' where U = Y - 8'0X and a0 = cr (H0) (see Yohai , 1987). i ? G 0 X X ' , (4-18) The asymptotic properties of these estimates when H € He and H ^ H0 are very difficult to study. In the next sections we obtain asymptotic results that hold for arbitrary distributions H in He. In particular, we show that with some additional regularity conditions the sequences f3n, dn and (3n are consistent, and that (3n is asymptotically normal. 4.3.1 Consistency of the S-scale estimate In this section we wi l l show that the S-scale estimates (4.5) for the linear regression model are strongly consistent to their asymptotic value (4.12). We wi l l need the following regularity conditions on the function p and the explanatory variables R.5 s u p j p ' ( w ) | < oo; X . l P ( 0'X = 0 ) = 0 for al l 0 e W. Remark 4.1 It is not difficult to see that i f p belongs to Tukey's family (2.8) then it satisfies R.1-R.5 . 147 To simplify the notation, for each 0 EW and s > 0 let g(0,s) = Ep(^—p^j . (4.19) Recall that the S-scale estimate on satisfies an = infg on (0), where an (0) solves (4.4). Denote by Q, the underlying probability space, and let UJ E be an arbitrary event. In the statement of Theorem 4.1 below we add the argument u> to on (0) to explicit ly indicate that it is a random variable. The following theorem is the main result in this section. Let H E Tit be an arbitrary but fixed distribution. To simplify the notation, drop the argument H from CT{0,H) (see (4.11)). Theorem 4.1 Let p be a real function satisfying conditions R.1-R.5 and let X be a random vector in W that satisfies X. 1 above. Then i) for any e > 0 there exists K2 > 0 such that cr (0) — e < on (0, LO) < a (0) + e, a.s. uniformly in {0 : \\0\\ < K2}. That is, there exists a null set Ai such that for any e > 0 and to ^ AA, there exists no = no (e,u>) such that for any n > n 0 «r(0) - e < on(0,u)) < <r(0) + e V 0 E {0 : ||0|| < K2} , where no does not depend on 0; ii) on —> cr almost surely. The following lemmas are needed for the proof of Theorem 4.1. 148 Unless explicit ly stated otherwise, we wi l l assume that the function p and the vector X satisfy conditions R.1-R.5 and X . l above. For a function / : RK —> K let / (u) dP (u) , denote the integral of / over A c t f wi th respect to the measure P. Lemma 4.1 The function g(9,s) defined in (4-19) is continuous in 9 uniformly on s € (77, 00) for any rj > 0. Proof: F i x I > 0. Choose a bounded set K C W+1 such that PH [ (F, X ) e K] < c/4. Let a,i — s u p x £ £ | |x|| < 00 and let a2 — s u p u g R \p' {u)\ < 0 0 . Choose 5 = 8 (e) such that 0 < 8 < (en)/ (2ax a2), and \\0i - 92\\ < 8. Then we have g{9i,s) - g(92,s) < IA L Y - 9[X Y " 0 > 2 X ' dH (Y, X ) K P' \ {Gl " 02)'X dH ( y ' X ) < ||<9! - 9 2 \ \ - a i a 2 + e/2 < e, + e/2 + i/2 for any s > rj. The next result shows that the scale functional cr (H, 0) defined in (4.11) wi th H G % e and 9 e W remains bounded away from zero and infinity i f 9 belongs to an arbitrary neighbourhood of 0 G W. 149 Lemma 4.2 Let a (H,0) be as in (4-11). For each arbitrary K > 0 and H G rie, there exist two constants S\ = Si (H, K) and S2 = S2 (H, K) such that 0 < Sr < cr (H, 6) < S2 < 00 V ||0|| < K, (4.20) where \\-\\ is the Euclidean norm in W. P r o o f : F i x the distribution function H and the constant K. Consider the function / (s) : E+ -r R+ defined by ( Y — 0 ' X \ ) = max q (0, s) . where g (0, s) is defined in (4.19). Note that for any 9 and S\ < s2 we have « E«p ( ^ ) ^ E»p ( ^ ) ; (ii) l i m s _ 0 0 £ : i ? p ( Z ^ ) = 0 . We wi l l now show that the above properties hold for / (s) as well. That / is non- increasing follows immediately from the first inequality above. To simplify the nota- tion let K, = { 0 G W : \\0\\ < K}. Let e > 0 be arbitrary and fix a sufficiently small 77 > 0. For each 0 G /C, let s (0) < 00 be such that \g(0, s)\ < e/4, V s > max(r7,s(0)) . Also for each 0 G K, define the set Ae = {0elC: \g(0,s)\ < e/4, V s > max(77,s(0)) }. . 150 We wi l l show that Ae is open. Note that 0 € Ae and hence is not empty. Let 0 £ AQ. B y the continuity of g, there exist a S = S (e, 0, rj) such that \\9-0\\<5 => \g{0,s) -g(0,s)\ < e/8 V s > r / . Hence, 0 G A whenever \\0 — 0\\ < 5 (e, 0, n), and e is fixed throughout the argument. Hence A$ is open. B y a standard compactness argument it follows that there exists a finite collection 0\, ..., 0k such that Take so > max (n, s (0X),..., s {0k))- Let s > s0. It is easy to see that for any 0 € K, we have \g(0,s)\ < e/4. Then, / (s) = max g (0, s) < e/4 < e i f s > s 0 . We can find S2 = S2 {H, K) such that f (s) < b for s > S2. Hence, g (0, s) < b for al l 0 € /C. Hence <r ( if , 0) < S2 for al l 0 e /C. The argument for the other inequality is simple. Note that for any fixed 0 E W we have lim s_> 0 g (0,s) = 1. Hence, l i m 3 ^ , 0 maxo^ g{0,s) > 1. The result follows by noting that lim s_> 0 maxegjt g (0, s) < 1 because p (u) < 1. • The following lemma shows that the infimum in the definition (4.12) for a can be taken inside a certain neighbourhood around 0 € W. Lemma 4.3 There exists Kx > 0 such that cr(H,0) <(T(H,0) V ||0|| > K U 151 and hence cr (H) = inf cr (H, 0) = inf a (H, 0) Proof: It is enough to show that for large ||0|| we have cr (H, 0) < cr (H, 0). Note that by the Dominated Convergence Theorem g (0, a (0)) —> 1 when ||0|| —r oo. Hence, given 0 < rj < 1 — b there exists K\ such that g (0, cr (0)) > b + n for al l 0 wi th ||0|| > Kx. Hence, <r (0) > cr (0) for ||0|| > Kx. • Remark 4.2 Note that we can consider any other neighbourhood around 0 e W larger than the one given by the previous lemma. Specifically, let Kx be as in Lemma 4.3 and let K2>K1. Then cr < inf a (0) < inf cr (0) = cr , _ | |9 | |< i i f2 ~ ||fl||<ffi and hence we have cr = inf a (0) for any Ko > K\. We now show that there exists a compact set where both a and an are attained almost surely. Recall that for each 0 £W, an (0) is defined in (4.4) by and that an = m{ge^P bn (0). Lemma 4.4 Let Kx be as in Lemma 4-3. Then there exists K2 > K\ such that if then P(An i.o.) = 0. 152 P r o o f : We wi l l first show that l im E inf p \W\>K = 1. (4.21) It is easy to see that for any y G R and s G R+ we have -\y\-M" l im p M—>oo 7 ( | y | < M ) = l . (4.22) B y hypothesis and Lemma 7.4, for any e > 0 there exist a > 0 and C i , . . . , C S such that U-=i ^ D { 0 : ||0|| = 1} , and P [ inf f l e C i | 0 'X | > a] > 1 - e, for 1 < i < s. We have i n f p (?LZ**\ > i n f p f I L Z * ^ 7 [|0<X| > M ] \\e\\>L \ s J ~ \\e\\>Lr V * / — ̂ Lp(^^i[\~e'x\ > M/\\e\\]i[\Y\ < K] , where 0 = 0 / ||0||. Set L = M/a to obtain i n f inf , ( M ^ W x | > a ] / [ | y | < , K ] ||»||>£ V * / ll«ll=l \ « / J II I - . ^ ? . i ? l ' , ( M T ^ ) / [ | , ' ' x | S a l / | | y | ^ 1 > (1 - cT) min inf 7 [|0'x| > a] , by (4.22) for M large enough. Hence E y _ /3'"V inf p ( }I(\Y\<K) \\0\\>L V S * vi i — y > (1 - 6) E = (1-6)P >(1-S) ( 1 - 5 6 ) , ™™*1\**\>«]II\Y\<IC\ inf |0'x| > a, V l < j < s sec; ' 1 - which can be made as close to 1 as desired. Hence (4.21) holds. 153 We now show that for an arbitrary 0 € W, 5 > 0 we have £ff(|c>„(0) > S) < 2 e x p ( - n 7 ) for some 7 = 7 (<5) > 0 . (4.23) F i x 8 > 0 and 0 E R p . We know that there exists <50 = <50 (0, t5) > 0 such that (y _ \ and ^<x(0) -<5 B y Bernstein's Inequality (Lemma 7.6) we have , Y — 0 'X . Ep r <6-<$o. P(<7„ (0) - CT (0) < = p ( d n (0) < Cr (0) + (?) > P ^ £ p ( f ( 0 ) ^ ) < ^ > P 1 7 1 n — ' P U ( 0 ) + < V P U W + V i=i < S0/2 > 1 - e x p ( - r a 7 i ) , for some 71 > 0. Similar ly we obtain P(an (0) - cr (0) > -5) = P(an (0) > o- (0) - <j) Vi - 0 'XJ \ > P n (0) + Sj - Ep y - 0 ' x (0) + 6j < 50/2 > 1 - exp ( - n 7 2 ) , where 72 > 0. F ina l ly P(j(7N (0) - Cr (0)| > 5) < p(<7„ (0) - Cr (0) > <j) + P («7N (0) - <T (0) < - j ) < 2 e x p ( - n min (71,72)) . 154 Hence (4.23) holds wi th 7 = min (71,72). We wi l l show that if A"=UnA/"w-^(o)}' then P (An i.o.) = 0. Let 5X > 0 be arbitrary. Choose K2 > Kx such that E ( . j n f p ( ^ ^ ) ) > b + S l . e\\>K2r \tr(0) + 5 (4.24) We have K».!inV"w>M0)) >P( inf l £ J » - * * >i-p(1-± i „ f > b , a n (0) - cr (0) <b \ - P > nj^\\e\\>K2r \ c r (0) + 5 l - 2 e x p ( - n 7 ) - P ( - V inf p ( V i ~ ^ \nf^\\e\\>K2 H \ < 8 On (0) - Cr (0) <b) , > S kcr(0) + «Jy where 7 > 0 is given by (4.23) (set 6 = 0). Now note that by (4.24) and Bernstein's Inequality (Lemma 7.6) we have p(-izinf p(- \n^\\0\\>K2 \, (0) + 5 < b < P f^\\e\\>K2^ \(r(0) + 6 < exp (—n 7 ' ) , ><5i where 7' > 0. Hence P (An) < 3 exp (-n min (7,7')) and the result follows from the Borel-Cantel l i Lemma (Lemma 7.3). • 155 The following Lemma shows that cr (H, 0) in (4.11) is continuous as a function of 0. Lemma 4.5 Assume that the conditions oj Lemma 4-2 hold and that d ds g (0, s) < 0 V 0 e W V s > 0 . Then cr (0) is continuous in 0 and hence uniformly continuous if 0 e K,Q, where KQ is an arbitrary compact set in W. P r o o f : We adapt an argument in Mar t i n and Zamar (1993). F i x 5 > 0. Let S\ and S2 as in Lemma 4.2 and let K\ as Lemma 4.3. Let KS = [Si, S2] and K , & = {\\0\\ < K2}. Let B — KS x KQ. Let <5o be given by 8 So = 5 min (0,s)<=B ds 9 {0,8) > 0. B y Lemma 4.1 there exists 7 > 0 such that 6>i -0 2 | | < 7 \g(e1,a(62)-5)-g(e2,cr(e2)-5)\ < 50/4. Using the Mean Value Theorem we have g {01, cr (02) -S)-b>g(92,cr (02) - 5) - b So > S min (0,s)eB d_ ds 9(0,s) Hence cr (Oi) > cr (02) — S. Similar ly we have g{Ou<r (62) + 5)-b<g (02, cr (02) + 5) - b + So < —S min 156 so that cr ( 0 x ) < cr ( 0 2 ) 4- 5. The following Lemma states that if s ^ cr (0) then Ep((Y — O'X)/ s) remains uniformly away from b when 0 belongs to an arbitrary compact set. Lemma 4.6 Let K, C W be an arbitrary compact set, let 8 > 0 be arbitrary and let b be as in the definition of the S-scale estimate. Then, there exists a positive constant e = e (/C, 5) such that ( Y — 0 ' X \ M . r 1 <b~* V 0 G / C , and cr(0) + 5 „ fY-0'X >b + e V 0 G / C . P r o o f : Follows easily from Lemma 4.2 and a Taylor expansion of first order, after noting that and A x = inf E A 2 = inf E , (Y-0'X\ (Y -O'X v{0)+5) \cr(0) + 6 ,(Y -e'x\ (Y -O'X cr(0)-8j \cr(0)-5 > 0 , > 0 . Lemma 4.7 shows that the estimating equation is uniformly continuous in 0 and s bounded away from zero, for n sufficiently large, almost surely. When needed in the proof, we wi l l explicit ly indicate that bn or on (0) are random variables by including the argument oo G fi, where fi denotes the underlying probability space. 157 Lemma 4.7 For each 0 e W, s > 0 and u e fl, let -fa \ 1 f (Vi (u) - e>yii (w) \ Pn(0,U),S) = - ^ p l g (4.25) and n > 0 be arbitrary. Then there exists a null set N such that for any e > 0 and UJ N there exist 6 = 5 ( M , e, n) > 0 and nQ = n 0 ( M , u) such that if | | 0 i — 02\\ < 5 then \pn(0i,u,s) - pn(02,oj,s)\ < e Vn>n0(M,oj) V s > n. P r o o f : F i x e > 0 . Let K. C W+1 be a compact set such that PH [ (Y, X ) £ K] < e/8. Take away a null set such that for every remaining u the strong law of large numbers assures the existence of no (u) such that 1 " - J ] / ( x i ( a ; ) ^ / C ) < e / 4 V n > n0 (OJ) n i=l Let ai = sup t p ' ( i ) < oo and let a2 = s u p x e A : | |x | | < oo. Using the Mean Value Theorem and R.3 we Have Pn (01, W) - Pn (02, W) i=l 1 " Vi(u) - 0 i x { (u)\ _ (yt (to) - 0 2 X j (w) / ( X i e / C ) + - V / ( x ^ / C ) i=l i=l „# / Vi M -0Xj{u))\ 1 , p | _ x . ( w ) ( ^ _ 0 2 ) 5 IS I ( ^ (w) G /C) + e/4 < | | 0 i - 0 2 | | o 2 - a i + e74. Hence, for almost al l a; Pn(0l,0j) - pn(02,Uj) < e, 158 if — 02\\ is sufficiently small, and n> n0 (e,ui). • We now show that pn(0,to,s) defined in (4.25) and Ep ((Y — 0'X.)/s) are uniformly close i f 0 belongs to an arbitrary compact set, for large n, almost surely. Lemma 4.8 Let K C W be an arbitrary compact set, let p, pn be as in Lemma 4-7, and let rj > 0 be arbitrary. There exists a null set M such that for any to £ M and e > 0 there exists no = n 0 (CJ , e) such that sup eeic pn (0,LJ,S) - Ep(0,s) <e V n > n 0 ( c j , e ) V s > n. P r o o f : Consider the same null M as in the proof of Lemma 4.7. B y Lemma 4.1 the function 9(0,s) = E p { ^ ^ ^ is continuous in 0, uniformly on s > rj. B y Lemma 4.7 there exists 5 > 0 such that i f | | 0 i - 02\\ < S then > \pn(0l,U,s) - pn(02,LU,s)\ < 6 V U > H 0 (w, e) V s > f ) . Construct a finite collection of open balls B (0j,5) of radius 8 such that they cover K, and such that < e/3 V s > ? 7 , 159 for 0 G B (0j, 8). Such a collection exists by Lemma 4.1 and a standard compactness argument. Let 0 G fC be arbitrary. Then 0 G B (9j, 8) for a particular bal l . We have pn (9,LO,S) - Ep Y-9'X ) s < pn (9,U,S) - pn (0j,U,S )\+ pn (9j,U,s) - Ep + Ep ) < e , (4.26) s if n is large enough, not depending on the particular 9 G W. We now prove the main result of this section. Proof of Theorem 4.1. (i): Let S\ as in Lemma 4.2. F i x 8 > 0 such that Si — 8 > 0. B y the previous lemmas there exist e > 0 such that Ep ((Y — 0 ' X ) / (cr (9) + 8)) < b - e for 9 G K. Also \pn (9,s) - Ep((Y - 0 ' X ) / s ) | < e/2 for al l s > S i - 8 and 0 G /C, for n sufficiently large, almost surely. Note that for any 0 G K, we have o- (0) + <5 > Si - 8 and then so that on (0) < <T(9) + 8 for n large enough, almost surely. Similar ly we obtain on (0) > cr (0) — 8 for n large enough, almost surely. pn (0, cr (0) + 5) < Ep ((y - 0 ' X ) / (cr (0) + 8)) + e/2 < 6 - e / 2 , (ii): This follows from (i) and Lemma 7.16. 160 4.3.2 Consistency of the S- and MM-regression estimates In this section we show that under certain regularity conditions the S- and M M - regression estimates are consistent. Note that because both estimators minimize a loss measure, the following Theorem applies to both. T h e o r e m 4.2 Assume that p : R —>• R+ and X € R p satisfy conditions R.l to R.5 and X.l above. Let g (6, s) be as in (4-19). Assume that an —¥ cr almost surely, and let /3n be defined by Let ~Q be If g (6, CT) has a unique minimum as a function of 0 6 R p , then /3n —>• /3 a.s. P r o o f : We wi l l first show that there exists L > 0 such that IhrT \\f3J < L a.s. 7 1 — > 0 0 Because on —> a almost surely, it is enough to show that for any o > 0 there exists L and r? = n(L) > 0 such that l im inf - V p (Vi ~ ° * l \ > b + n a.s. (4.27) n^oo \\e\\>L n ' V °~ J ~ 1=1 ' The Dominated Convergence Theorem shows that for any o > 0, ,. „ (\Y\-M\ hm Ep = 1. 4.28 M^oo \ a J 161 B y Lemma 7.4 and because P (0 'X = 0) = 0 for al l 0 G W by hypothesis, there exist a > 0, 7 > 0 and a finite collection of compact sets C\,... ,CS in RP such that s \JCj D {OeW : ||0|| = 1} , i=i and P ( inf |0 'X| > a ) > 6 + 7 . (4.29) B y (4.28) we can find M and 77 > 0 such that (6 + 7) Ep(^Y\~M^j P^\Y\ <M^j >fe + 77. (4.30) Now we have inf - E ^ O ^ ^ U i n f 1E^f^-^) / ( l ^ l > M ) 7 ( | y i | < M ) l | 0 | | > L n f e V o- 7 \\e\\>Lnj^ \ a J > inf - V p f 1 ^ 1 " 1 ^ 1 ) n\0%\>M)I(\Vi\<M) , l | 0 | l > L n i = t \ CT / because \yi — 0'XJ| > |yj| — |0'XJ|. Hence inf l £ p ( « L Z « V ) > i n f I ^ p f M ^ ) / ( | 0 ' X i | > M ) / ( | y l | < M ) where 0= 0/\\0\\. Also note that if L > M/ a then {|0'x;| > M/ | |0| |} D {|0Xj| > a}, so that inf l±J*z**) > i n f l £>(M-» [ ) / ( | ^ | > o ) \\0\\>Lnj^H\ a J \m\=m^H\ a J > min inf — > p\ —• I (\0 X.A > a) ~ i<j<s0eCj n j ^ \ o y v l ' - ' > min - Y inf p ~ M > | /( |0' X l | > a) . 162 Equation (4.27) now follows from (4.29), (4.30) and the Strong Law of Large Numbers. We now show that f3n-> J3 almost surely. Consider an arbitrary open neighbourhood B (ft) of f3 and the compact set /C = { 0 : | | 0 | | < L } nB0)c. B y Lemma 7.5 a.s.. So, it is enough to show that there exists 7 > 0 and a\> CT such that The assumption on uniqueness of the minimum of the function g(0,cr), the Domi - nated Convergence Theorem and a standard compactness argument yield CT\ > CT, 7 > 0 and a finite family of sets C\,..., Cs such that s f]Ci D K = J0 : ||0|| < i } n B (f3)c , i=l and r p / r -0 'x \ i / /y - /3 'x \ \ E [ M . " \ - ^ - ) \ > - E { " { - ^ - ) ) + - 1 1 £ ! £ S - Hence l im inf — p ( — — ) > min l im — inf p( — — j , n"Z^o e?e/C n*—?\ <Ti / l<j<s n->oo n BtCj V °~\ J t=l ' 1=1 and the result follows from the Strong Law of Large Numbers. • 163 4.3.3 Asymptotic distribution of the MM-regression estimate The asymptotic distribution of these estimates for distributions belonging to (4.10) can be derived along the same lines we used in Section 2.3.6. In general the scale estimate on is not asymptotically independent of /3„. Nevertheless, provided the sequences Bn, Bn and an are consistent, if an is a S-scale, we can find the asymptotic distribution of y/n (f3n — 8). The proof of the following theorem is based on the same techniques used in the proof of Theorem 2.6 and is omitted here. Theorem 4.3 - Asymptotic normality - Assume that f3n A 8 (H), /3n A /3 (H), p and on —\ cr (H). Let F G He and let p0 and p\ be as in Definition 4-2. Assume that Po and p\ have third derivatives that are continuous and the following expected values exist: EH EH EH , (y-8'X\ (y-B'X Po ti(y-^\(y-0'X)X EH EH , EH Pi a y-B'x I X H 2 X and Then E H p 7 [ y — ^ ) ( y - B ' x ) x x ' The asymptotic variance-covariance matrix is given by the following. Let A = E H Pi n -1 X X ' (4.32) 164 and E H b = ACT X E H S = A £ p , ^ ^ x ) x x , 2AEH Po I I - b bb' p ; ( ? ^ ) „ ( £ z £ E , x b' (4.33) Note that the form of S is very involved, and hence the empirical estimate obtained by replacing H wi th Hn w i l l be numerically unstable. This w i l l have a negative impact on the quality of statistical inferences which are based on it. 165 C h a p t e r 5 R o b u s t b o o t s t r a p for the l i nea r regress ion m o d e l In this chapter we extend the results of Chapter 3 to the linear regression model. We consider the problem of statistical inference based on robust regression estimates and describe the implementation of the robust bootstrap for this model. We illustrate its robustness properties with two real data sets. We show that under regularity conditions the robust bootstrap distribution estimate is consistent for the asymptotic distribution of the statistic of interest. We also study the breakdown point of the resulting quantile estimates and show that they improve upon the classical bootstrap quantile estimates. Final ly, we compare the finite sample coverage and mean length of confidence intervals built wi th the empirical asymptotic variance estimate and the robust bootstrap. 166 There are several results in the literature concerning the application of the bootstrap principle to this model (Efron, 1979; Freedman, 1981; W u , 1986; Efron and Tibshirani , 1993). Moreover, the bootstrapping of robust regression estimates has also been studied by Shorack (1982). As in the location-scale model, two difficulties arise when bootstrapping robust regression estimates. The first is related to the computational complexity of robust regression estimates. The second is a consequence of the presence of outliers in the data. Consider MM-regression estimates /3 n calculated wi th an in i t ia l S-estimate 8n and S-scale an (see Definitions 4.2 and 4.1 on pages 143 and 141 respectively). These estimates have desirable robustness properties but are not easy to calculate. When we bootstrap these estimates, for each bootstrap sample we have to solve a non-convex minimizat ion problem in W to determine 8n and the scale estimate an. Then.we have to find a local extreme of another non-convex function in R p to determine 3n. The number of bootstrap samples needed to obtain reliable distribution estimates naturally grows with the dimension of the statistic and hence makes the problem computationally even more expensive to solve. In the context of data-mining and other applications with extremely large data sets (both in the number of cases and the number of covariates) straightforward re-calculation of such robust estimates is rarely a feasible option. The presence of outliers i n the data can have an unduly large effect on the final distribution estimate. As described in Chapter 3 the reason lies in the re-sampling 167 scheme: in many bootstrap samples the proportion of outliers can be significantly higher than in the original data set. This may in turn produce extreme re-calculated estimates and affect the bootstrap distribution estimate. We wi l l show that wi th our method, each re-calculation only involves solving a linear system of equations. Hence there is a very important gain in speed, and consequently in feasibility. We also obtain more stable and robust quantile estimates than the classical bootstrap method. To quantify this property we extend Singh's concept of quantile breakdown point (Singh, 1998) to the linear regression model. To illustrate the magnitude of the gain in speed obtained wi th our method consider this simple example. We generated an artificial data set following model (4.1) with 3 = 0 wi th n = 50 and p = 5. We applied both the classical and robust bootstrap to an MM-regression estimate, with 10,000 bootstrap samples. The classical bootstrap took 2735 C P U seconds while our method used less than 7 C P U seconds. The computations were done wi th C code designed by the author and run on a d u a l - C P U Sun Sparc U l t r a 4 (each C P U a 296 Megahertz S U N W U l t r a S P A R C - I I ) with 1.1 Gigabytes of R A M memory and using SunOS 5.7. The rest of this chapter is organized as follows. Section 5.1 presents the method and the notation. Section 5.2 contains two examples that illustrate the use of our method. Sections 5.3 and 5.4 discuss its asymptotic and robustness properties respec- tively. Finally, Section 5.5 contains the results of a Monte Carlo study on the coverage and average mean of confidence intervals obtained wi th MM-regression estimates and different methods of estimating their distribution. 168 5.1 Definitions Let 0 / i , X i ) , . . . , (Vni x n ) be a random sample following model (4.1). We consider M M - regression estimates wi th in i t ia l S-regression estimates and S-scales (see Definitions 4.2 and 4.1). To simplify the notation let 3n be the in i t ia l S-regression estimate, an the associated S-scale estimate and let (3n be the final M M - r e g r e s s i o n estimate. We wi l l consider the case of random explanatory variables in detail, but briefly discuss the fixed design case in Remark 5.3 below. As discussed in Chapter 3, we are interested in making statistical inferences about the regression parameter 3. We can use the result of Theorem 4.3 on the asymptotic normality of the sequence -y/n(/3n — 3). To use this method we only have to estimate the asymptotic variance of J3n. The second option is to directly estimate the distribution function of \ /n ( /3 n — 3). We can then use this distribution estimate to approximate the quantiles needed to construct confidence intervals. We propose to use the following computer intensive method to generate a large number of re-calculated /3 n 's. These re-computed statistics can be used to estimate both the asymptotic covariance matrix and the distribution function of the statistics J3n. For the first objective we can use the empirical covariance matrix of the re- calculated p „ ' s . To estimate the distribution function Fn of 3n we use the empirical distribution function of the re-computed statistics. In what follows we wi l l focus on this last problem. Theorem 4.3 shows that the asymptotic behaviour of the sequence J3n depends 169 on that of & n . Hence, to obtain an estimate of the distribution of Qn we take into account the behaviour of the scale estimate an. For each pair (y^ Xj) in the sample define the residuals associated wi th J3n and ~8n: rt = yi — / 3 n Xj and f; = yt — / 3 n X j . First note that J3n and an can be represented as a weighted least squares fit. Similar ly to Chapter 3, define the weights U{ and as = p\ {n/an)/Ti 1 < i < n, Vi = p0(fi/an)/fi l < i < n . n o (5.1) Simple computations yield the following weighted average representation of equations (4.7) and (4.8): Pn E Lt=l n OJi Xi Xi i=l (5.2) (5.3) i=l Let (y*,x*), i = l , . . . , n b e a bootstrap sample from the observations. Define the random variables Bn and cr* by Pn n -1 . i= l n ^ *i Vi , (5.4) (5.5) i=i where OJ* = p\ {r*/an)/r*, v* = an p0 (f*/an)/ (n b f*), r* = y* - p'nx*, and f\ = y* - ~3nx\ for 1 < i < n. Note that J3n, an and @ n are not re-calculated from each bootstrap sample (y*, x | ) , i = 1 , . . . , n. 170 We now apply a linear correction to the estimates obtained in (5.4) and (5.5) and combine them. Intuitively the correction is needed to account for the loss in variability due to the fixed weights. Let XIp'l (ri/an,Xi)xi x- i=l n X!uji Xi x*' i=l ,i=l i=l 1 1 ° " = ^ " n 6 51 ^'o (*V On) fi/On] . (5.6) (5.7) (5,8) t=i R* The robust bootstrap 3n — 3n is given by R* 3n -3n= Mn (Pn ~ Pn) + <*n (K ~ On) , R e m a r k 5.1 - Computational Ease: Note that to recalculate 3n we do not solve (4.8) and (4.7). For each bootstrap sample we only solve the linear system of equations (5.4) and calculate the weighted average (5.5). The correction factors M n , d n and an arise from two linear systems and a weighted average respectively and are computed only once with the full sample. R e m a r k 5.2 - Robustness: For MM-regression estimates (3n wi th a re-descending score function p[ (i.e., p\ (r) = 0 for \r\ > c > 0), the weights uii give the method stability in the presence of outliers. Out ly ing points wi l l be associated with small weights in equations (5.2) and (5.3). Note that extreme outliers (those with an associated residual \rt\ > con) w i l l receive a zero weight, and hence wi l l have no effect at all on the recalculated coefficients. Note that the weights v* used in recalculating the scale are also decreasing in the absolute value of the residuals. This also makes the outlying points less influential in the recalculated a*. 171 R e m a r k 5.3 - Fixed design: In the case of a linear regression model with fixed design we propose to adapt our method as follows. The main difference lies in the re-sampling procedure, so that it best resembles the randomness of the model (Freedman, 1981). Let ej = y.j — i3nx.j, 1 < j < n be the residuals of the MM-est imate. The bootstrapped y*'s are where ej, 1 < i < n is a random sample from the residuals. Now (3n and <7* are defined by x< x< . 8 = 1 71 ^ X i V i . 1=1 = < (2/i* ~ Pn*i) (5.9) (5.10) i=l where u* = p[ (r*/an)/r*, v* = an p0 (f*/an)/[n b f*), r* = y* - (3nXi, and f* = y* — 3 n Xj for 1 < i < n. The correction factors M n , d n and an are defined as before, and so is f3n — (3n. See Chapter 6 for a discussion on future work regarding this case. 5.2 Examples 5.2.1 Body and Brain Weights Consider the brain and body weight of 28 animals published in Rousseeuw and Leroy (1987, page 57). The model considered in the literature is log ( B r a i n weight (g)) = a0 + /30 log ( B o d y weight (g)) + e; 172 where OJO £ R and /3Q G R are the parameters of interest and e are independent and identically distributed errors wi th mean zero and constant variance a2. We used an MM-regression estimate obtained with ip = p 4 6 8 5 in Tukey's family. The S-scale was obtained wi th pi.54764 also in Tukey's family. This choice yields an estimate with simultaneous 50% breakdown point and 95% efficiency if the data are normally distributed. Figure 5.1 contains a scatter plot of the transformed data with the least squares and MM-regression fits. The question of interest is whether larger brains are required to govern heavier bodies. In particular, the magnitude of the slope is relevant: a slope larger than 1 would indicate that the required brain weight increases faster than the body weight. O n the other hand, a slope smaller than 1 would indicate the opposite. We are interested on a confidence interval for the slope of this model. We obtained 10,000 bootstrap samples to re-calculate the estimates (&n,J3n). We used these re-computed estimates to approximate the distribution of the vector of parameters Vn ({&n,Pn)' ~ (aoiA))') , (5-11) in order to perform statistical inference on the parameters of interest (ao, A))- We used both the classical and robust bootstrap and concentrated on inferences for the slope /3o. Note that given the empirical distribution in R 2 of the re-computed (&*, /3*) we can use its projection on the /5-axis to obtain an estimate of the distribution of the (3 projection of (5.11). W i t h the classical bootstrap, the 99% confidence interval for /30 is (0.66,1.49). The p-value for the null hypothesis /30 < 1 is between 0.01 and 0.05. The robust 173 bootstrap yields the following 99% confidence interval for f30: (0.67,0.84). The p- value for the same null hypothesis is p < 0.0001. The reason for this difference lies wi th the tails of the bootstrap distribution of the regression parameters. Three observations in this data set correspond to d i - nosaurs and do not follow the same pattern as the other observations. A certain proportion of the bootstrap samples may contain enough outlying observations to breakdown the estimate. These samples can yield extreme values for the estimate that produce unduly large quantiles. The robust point estimate and the robust boot- strap re-calculated estimates down-weight these three observations and hence are less sensitive to them. In Figure 5.2 we show scatter plots of (/3* — J3N) for 1 < u < 10, 000, where /3* denotes either the classical or the robust bootstrap estimate for the u-th bootstrap sample. We see that the tails of the classical bootstrap estimate are highly influenced by the outliers present in the data. This causes the estimates for the 99.5% and 0.5% quantiles to be highly inaccurate. The quantile estimates based on the robust bootstrap do not have this problem. 5.2.2 Belgium International Phone Calls Consider the Belgium International Calls data set (see Rousseeuw and Yohai , 1984). These data consist of the number of international phone calls (in tens of millions) originating in Belgium between 1950 and 1973. From 1964 to 1969 the observations 174 0 i i r 0 5 10 Body Weight (g) - Log scale Figure 5.1: Least squares and robust regression fits to the Bra in and Body Weight data 175 I >• «• * • • • • ' • » , . . . T 0 2 4 6 Intercept (a) Classical bootstrap -M k, 0 2 4 6 Intercept (b) Robust bootstrap Figure 5.2: Classical and robust bootstrap distribution estimates for the Bra in and Body Weight data - 10,000 bootstrap samples 176 Parameter Method 95% Confidence Interval c*o Robust Bootstrap Classical Bootstrap ( -10 .32 , -3 .20) (-17.74,0.35) Po Robust Bootstrap Classical Bootstrap (0.08,0.20) (0.00,0.28) Table 5.1: Belgium International Calls - Bootstrap and robust bootstrap 95% confi- dence intervals were mistakenly recorded. Instead of the number of calls, their total duration in minutes was registered. The figure for 1970 is partly contaminated; some calls were recorded wi th their duration, others were registered according to the old convention. The linear regression model considered in the literature is # Calls (in tens of millions) = a0 + Po Year + e , (5.12) where a0 and p0 are the parameters of interest, and the errors e are assumed to be independent and identically distributed with mean zero and unknown but constant variance cr2. The MM-regression estimate wi th an S-scale gives a0 = —5.23 and /30 = 0.11. Figure 5.3 displays the data with the robust and least squares fits. To obtain confidence intervals for the regression parameters 0 we use the classical and robust bootstrap. We performed 10,000 bootstrap re-calculations. Scatter plots of fiu — f3n for the robust bootstrap and of f3u — (3n for the classical bootstrap are presented in Figure 5.4. We clearly see that the robust bootstrap estimates are more stable. This is reflected in the length of the confidence intervals. Table 5.1 contains the lower and upper l imits of 95% confidence intervals for the slope and intercept calculated wi th both the classical and robust bootstrap. 177 V) c o E i n O Ui c Least Squares estimate MM-regression estimate 50 55 60 65 70 Year Figure 5.3: Least squares and robust regression fits to the Belgium International Phone Calls data Note that when we estimate the variability of the robust estimates with the robust bootstrap we conclude that the estimates of the regression coefficients are significantly different from zero at the 5% level. The artificial variabili ty introduced by the outliers in the classical bootstrap re-calculated BJs inflates the standard de- viation estimates. A s a consequence, if we use these standard deviation estimates we conclude that, at the 5% level, there is no significant linear relationship between the response and the predictor variable. The conclusion obtained wi th the robust bootstrap analysis is intuitively i n agreement wi th the linear trend observed in the scatter plot of the data (see Figure 5.3). 178 0 0 0 0 0 o 8 0̂ s 8° 0 0 0 o o 0 Intercept (a) Classical bootstrap Intercept (b) Robust bootstrap Figure 5.4: Comparison of classical and robust bootstrap distribution estimates for the Belgium International Phone Calls data - 10,000 bootstrap samples 179 5.3 Asymptotic properties The following theorem shows that the asymptotic distribution of the robust bootstrap is the same as that of the MM-regression estimator. Theorem 5.1 - Convergence of the robust bootstrap distribution - Let p0 and pi be real functions as in Definition 4.2. Assume that they have continuous third derivatives. Let f3n be the MM-regression estimator, dn the S-scale and 0n the p associated S-regression estimator. Assume that they are consistent, that is: 0n —f 0, P p on —> o and 0n —> 0. If the following conditions hold: 1. the following matrices exist and are finite: E[p[{r)/rXX']-\ E[p'Q(r)/rXX!}-\ E [p[ (r) X X ' ] , E [p[ (r) r X X ' ] , E [pi (r) X X ' ] , E [p'l (r) X X ' ] " 1 , E [pi (r) rX] , E [p'[ (r) r X ] ; 2. E [p'0 (r) r] ^ 0 and finite, 3. p'0 (it)/u, p\ (u)/u, (p'0 (u) — p'o (it) u)l i t 2 and (p[ (u) — p'[ (it) u)f u2 are con- tinuous; then along almost all sample sequences, conditional on the first n pairs, y/n (/3n — /3n) converges weakly, as n goes to infinity, to the same limit distribution as y/n (fin — 0). Remark 5.4 We refer to Remark 3.3 on page 102 to verify that assumption 3 above is satisfied for functions pd in Tukey's family (2.8). 180 P r o o f : First note that the estimates 0n, on and 0n satisfy the following equations i=i X » IT. i=i x, = 0 = b n (K) x, = 0. i=l Simple calculations yield the following re-weighted version of the estimates where J3N = An (Jjn, <rn) v„ (pn, d>„) On = On Un (p\,,<5n) ~0n = Bn (]}n, <rn) w n (&n, <7„) , A„ (p\,cr Vn O n a M n (62, O W „ (/32,cr Ui (0x,a Vi (02,a n 1 " 7 1 — ' i=l 1 " -^2uji(01,a) ytXi, i=i n $3 ^ (02,a)fi, i=i 1 " i=l 1 " - J ^ W i (02io)yiXi, i=i P i (ri/a)/ri, Po(fi/o)/ (nbfi), (5.13) 181 and Equations (5.13) can be expressed as the fixed point of a conveniently chosen function. Consider f : R 2 P + 1 ->• R 2 P + 1 defined for 81 G RP, a G R and 82 G R P by f Anid^o)-1 v n ( / 3 l 5 a ) ^ f{81,o,82) = oun(82,o) y Bn(82o)~l W „ ( / 3 2 ,CT) . j To simplify the notation we do not explicit ly indicate the dependence of f on n. We have Using the differentiability of po and p\ we can calculate a Taylor expansion of f about the l imi t ing values of the estimates (8, a, /3), ( /9,<7,0)+Vf(S,<7,0) On — O + Rn, (5.14) where is the remainder term and Vf (•) G R ( 2 P + I ) X O + I ) J s t n e m a t r i x of partial derivatives, P 1 a[A^vB]/a/3 a ^ v ^ / a a d[A-^ vnyd~8 d[oun]/d8 d[oun]/do d[oun}/d~8 d[B-l^n]/d8 d\B~lwn]/da d[B-^n]/d~8 182 Tedious but straightforward calculations show that each entry in Rn is a linear combination of quadratic forms x ' n Hn x n where x n = 0n — 0 or x „ = an — o or x n = 0n — 0. Note that | |x„ | | = Op ( 1 / \/n). The regularity conditions on p 0 and px and Lemma 7.1 show that \\Hn\\ = Op (1). We have | x ^ i J n x n | = oP ( 1 / V n ) - Hence 1 1 ^ 1 1 = 0 , ( 1 / 0 1 ) in (5.14). To simplify the notation let r n = (J3n,on,J3n)' and r = (3,o-,J3)'. Equation (5.14) becomes v / ^ ( T „ - r ) = [ I - V f ( r ) ] - 1 v ^ [ f ( r ) - r ] + 0 p ( l ) . (5.15) We wi l l now show that the correction factors M „ , and d n in (5.6) and (5.7) are the corresponding first p rows of the estimate [I — V f ( r n ) ] _ 1 of the matr ix [I — V f ( r ) ] _ 1 in (5.15). It is easy to see that I — V f ( r n ) has the following form I - V f ( r n ) = A V 0 ••• 0 0 ••• 0 0 ••• 0 a 0 ••• 0 0 ••• 0 0 ••• 0 w B where d d Q W = f B " _ 1 w " ] and B = 1- -^L- [B"1 w j . 183 That d - ~ - K ] = (o,...,o) dd follows from the fact that bn attains the minimum of the S-scale. Now note that the estimate of the correction factor in (5.15) has the following form: [I — V f (r n )] - l A-1 -A~lv/ a 0 ••• 0 0 ••• 0 0 ••• 0 1/a 0 ••• 0 0 ••• 0 0 ••• 0 -B~lw/a B-1 (5.16) Note that in (5.15) we are only interested in the first p + 1 coordinates of r n (the remaining p correspond to the S-regression estimate). From [I - V f ( r n ) ] _ 1 in (5.16) we see that the last p coordinates o f f are not involved in determining the first p + 1 coordinates of rn — r . Hence, when we apply this method in practice we do not need to bootstrap & n . It also follows from (5.16) that we only need to calculate A, v and a. We need to find and d_ dd d_ do [ A n ( / 3 , c r r vn(d,o)] ~An{a,o)-1 vn(0,o)] 184 One way to calculate them is to differentiate the vector an defined impl ic i t ly by A n (0, a) ocn (0, a) = v„ (0, a) . Drop the arguments (0, a) and the subscripts to simplify the notation. We differen- tiate on both sides of the equation 9 r A i d c ^ [ A a ] = c ^ V - Note that 0 0 dp[Aa] = Adpa + 8 I V ( 4 A ) « ; ; feA)« AaT* + A' say. So 8_ 00 a = A 8_ 80 v - A Simple calculations show that 8 80 and P'i (rj/vn) ~ Pi' {ril on) n/ an , r? 8 00 8 i=l i=l = QpV ~ A + P" (Til °n*iA , i=l which yields 8_ 00 a = A - l i=i 185 It follows that A = l - i k a 1 " Gn — Then we have and A 1 = an I J ] Pi' to/ ff„) x i X; I A , (5.17) _A-1 y / a = b n °n E"=l Al ( Til On) X j X J ] 1 £ ? = 1 P i fal ^n) U/ dnXj EiLlPo to/On) ?V<Tn It is easy to see that M n in (5.6) is equal to (5.17) above, and that d n in (5.7) is - A ' 1 v/a in (5.18). We wi l l now show that the bootstrap distribution of y/n [f * (rn) — r„] con- verges to the same l imi t ing distribution as that of the sequence \/n[i* (r) — r]. First , note that la* a \ I A * - 1 , ,* a \ [r fan) - Tn] = Pn'Pn an ~ °n A * - 1 V* - 8 On Un - On \Pn~Pn J where * denotes the bootstrap version of these quantities. It is easy to see that n =5>ito7*n)x; + A ; ( j 9 B l a B ) pn i=l and n W*n(f3n,dn)=Y^Pofa/tn)x* + B n (/3 N ,<7 N ) ~Qn . 186 Then O* — On n \Pn-Pn ) On Un Or, (5.19) V B n E r = i P ' o ( ^ 7 / ^ ) x : j This last expression can be expressed as a function of means. Consider the function g : Wxp x W x W x Wxp x l M l p x l " x I P , g ( A , v , u, B , w ) = ( A \,u,B X w ) . Then (5.19) can be written as g ( A * , z * , u * , B * , w*) where A * , u*n and B * are as before, Zj = p\ (r*/&n) x*, and = p 0 ( f * / < 7 n ) x * for 1 < i < n. This function is differentiable (this can be seen by thinking it as a composition of differentiable functions). We have that the statistic we are bootstrapping is of the form g ( y n (T„)) - g (fi ( r n ) ) , where for 1 < i < n is a vector of the bootstrapped dimension and rn is a consistent estimate of the vector of parameters r. As in the proof of Theorem 3.1 we have to show that the asymptotic distribution of Vn(yn ( r „ ) - /X(T„)) (5.20) is the same as that of Vn(yn(r)-fi(r)) . (5.21) The proof of this last statement uses the same idea as that used in Theorem 3.1 to prove the corresponding statement for the location-scale model. It is based on bounding the distance d2 (see Bickel and Freedman, 1981) between the distribution 187 functions of (5.20) and (5.21), using the fact that r n —>• r almost surely. Lemma 8.1 of Bickel and Freedman (1981) and the regularity conditions of g show that the bootstrap distribution of g (yn (r„)) - g (/z (rn)) converges to the same l imi t as that of the sequence g(y„(r))-g(/i(r)). • 5.4 Robustness properties We are also interested in the robustness properties of the quantile estimates of our robust bootstrap. Let t G (0,1), and let qt be the t-th upper quantile of a statistic 0n, that is, qt satisfies P [9n > qt] = t. A s in Section 3.4 we define the upper breakdown point (UB) of a bootstrap estimate qt as the smallest proportion of arbitrarily large outliers such that we expect qt to be driven above any bound in at least t x 100 % of the bootstrap samples. For the linear regression model we need an extra assumption. To fix ideas consider a linear regression model wi th a single explanatory variable (XJ G R) . We wi l l require that no two x's are equal. If the design is random then this event has probability one. If the data contain two observations at the same x the breakdown point of the robust bootstrap quantile estimates w i l l decrease. Intuitively this is due to the fact that in this case we would not need to introduce outliers in the sample to get an unbounded recalculated slope: a bootstrap sample consisting only of these vertically aligned points wi l l result in an arbitrarily large set of coefficients. The following definition can also be found in Rousseeuw and Leroy (1987). 188 Definition 5.1 - General position - We say that k points in W are in general position if no subset of sizep+1 of them determines an affine subspace of dimension p. In other words, for every subset x^,..., x* + 1 ; 1 < ij < k, ij ^ %i if j ^ I, there is no vector v 0 G W \ {0} and scalar a 6 l such that The main result of this section is the following theorem that establishes the breakdown point of the quantile estimates based on the robust bootstrap. Theorem 5.2 - Breakdown point of the robust bootstrap quantiles for the regression model - Let 0 / i , X i ) , . . . , (yn,^-n) E R p + 1 be a random sample following the linear model (4-1)• Assume that the explanatory variables x 1 ; x„ in W are in general position (see Definition 5.1). Let f3n be an MM-regression estimate and let t* be its breakdown point. Then the breakdown point of the t-th robust bootstrap quantile estimate of the regression parameters (3j, j = 1, . . . ,p is given by min (e*, e), where e is the smallest solution in 5 of the equation x i ; v o = « for j = l,...,p+l. P [ Binomial (n, 1 6)<p]>t. The following lemma is needed for the proof of Theorem 5.2. Lemma 5.1 Let (ylt x x ) , . . . , (yn, x, •n ) be n > p points in W such that if (5.22) x; n 189 then X„X„ has full rank. For a given (yn+1,xn+1) let / 3 n + 1 be the least squares regression coefficients determined by the n + 1 points. There exists a finite constant K such that $n+i < K for anV (2/n+i)Xn+i) with \yn+\\ < c. (The constant K only depends on the first n points and on the constant c) Proof: We wi l l show that for any set of n > p points, the regression parameters f3n+1 obtained when adding a new point ( y n + i , x n + 1 ) are bounded for any x n + 1 if yn+\ is bounded. Let X n e E n x p be the design matrix in (5.22). Note that X n has rank p by hypothesis. A s a consequence both (X'n X n ) and its inverse are positive definite. Let A be a non-singular matrix in Wxp and let x g f f . Use the following formula (see for example Seber (1984), page 519) (A + xx')"1 = A" 1 - A ^ x x ' A " 1 ( l + x ' A ^ x ) - 1 to obtain 0n+l = V x n + 1 x n + 1 1 + x n + 1 V x n + 1 V - V x n + 1 x „ + 1 V 1 + x n + l V X n + 1 x n + l Vn+li where V = (X„X n ) 1 is positive definite and ( y n + i , x n + i ) is the new point to be added to the regression. To simplify the notation let u = x n + 1 , A = I Vuu' 1 + u'Vu' The last equation can then be written as and B = V - Vuu'V 1 + u'Vu Pn+i = Aj9B + B u y n + 1 . First we wi l l show that every entry in A is bounded for ||u|| -> oo. The element 190 is given by A ( y ) - O y 1 + Z k i : i V k l U k U l Si,j - It is easy to see (for example by dividing both the numerator and denominator by ||u||2) that the denominator has the same order as the numerator, so that the fraction wi l l remain bounded as ||u|| —»• oo. Note that the denominator is bounded away from zero, so that the whole expression is bounded for any u. We now show that the r t h element of B u goes to zero as ||u|| —» oo. Note that V u B u 1 + u ' V u ' The r t h element is then 1 + T/iJviJuiuj' Divide both numerator and denominator by | | u | | 2 and use that Hull < 1 for 1 < j < p, (5.23) to conclude that the denominator is bounded, and that the numerator goes to zero because (5.23) implies -V 0 for 1 < j < p. P r o o f o f T h e o r e m 5.2: Let (yu x :),. . . , (yn, xn) G W+1 be n observations fol- lowing model (4.1). We assume that xi , . . . ,x n G MP are in general position (see 191 Definition 5.1 above). This assumption guarantees that any subset of size p of them wi l l determine a bounded least squares estimate. We assume that there is a certain proportion of observations that do not nec- essarily follow the linear regression model (4.1). We wi l l show that any bootstrap sample that contains at least p points that are not outliers yields a bounded Qn . It follows that the only samples that can produce unbounded robust bootstrap coeffi- cients are those that contain at most p — 1 points that are not outliers. The robust ~ R* bootstrap pn is given by P " f = M n (fa-^y + dn (0*n-&n) Note that the matrix M n and the vector d n are not re-calculated with each bootstrap sample, and as long as the robust regression estimate Qn does not breakdown, they remain bounded. It is also easy to see that <r* also remains bounded for any bootstrap sample. Hence, the problem becomes determining under which circumstances Bn can be driven beyond any finite bound. Recall that /3„ = * x*' 1=1 - 1 E * * * x i Vi , i=i where the weights to* = p[ (r*/an)/r* are bounded. The above expression can be re-written as 1 - 1 i=i XI Vi* ' t=l where x V = y ^ x * and y* = y/ufy*. We consider the case of having at least p data points that are not outliers. It is enough to have a bound on the effect of one outlier and that that bound does not depend on the outlier. In what follows we show 192 how to obtain such a bound. To simplify the notation we use the same symbols x» and yi for the weighted points X ; and y;. Let (y i , X i ) , . . . , (y„, x „ ) , be a bootstrap sample of n > p good data points, and let ( y n + i , x n + i ) be an arbitrary outlier included in this sample. Let /3n be the M i s - estimate based on the full data. Without loss of generality assume that 8n = 0 G W. The data can always be transformed to satisfy this assumption. In particular if yi = yi-p'nXi i = l , . . . , n , then the points (yi , x i ) , . . . , (y n , x n ) have a zero regression estimate. We now show that the outlier ( y n + 1 , x n + 1 ) wi l l only have an effect on /3 n + 1 for a bounded range of yn+i- Let c > 0 be the constant of the function ipc used for the MM-est imate in (4.7), and let o£ = sup crn be the largest possible value of an for a sample of size n. A n y point ( y n + i , x n + 1 ) such that | y n +i | > a+c wi l l not affect /3n and wi l l receive a null weight in the robust bootstrap re-calculations. Hence it is not possible to upset 3n+1 wi th this type of contamination. In what follows we consider the case | y n + i | < o+c. Lemma 5.1 gives a bound for the effect of ( y n + 1 , x n + i ) on 3n+1. This bound only depends on the first n pairs. Given a bootstrap sample of size n , assume that the first k observations are "good" and the remaining n — k are arbitrary outliers. App ly ing Lemma 5.1 n — k times we see that the new 3n+i can only be modified by a finite amount. This amount depends on the k first observations of this bootstrap sample, but it does not depend on the values of the n—k outliers. Considering al l the possible bootstrap samples that contain at least p points that are not outliers we find a bound that only depends on 193 p n Robust Bootstrap 90.005 90.025 90.05 Classical Bootstrap 90.005 90.025 90.05 10 1 20 30 0.500 0.500 0.500 0.500 0.500 0.500 0.500 0.500 0.500 0.191 0.262 0.304 0.257 0.315 0.347 0.293 0.343 0.370 10 2 20 30 0.456 0.500 0.500 0.500 0.500 0.500 0.500 0.500 0.500 0.128 0.187 0.222 0.217 0.272 0.302 0.265 0.313 0.339 10 5 20 30 0.191 0.262 0.304 0.500 0.500 0.500 0.500 0.500 0.500 0.011 0.025 0.036 0.114 0.154 0.177 0.185 0.226 0.249 20 10 50 100 0.257 0.315 0.347 0.500 0.500 0.500 0.500 0.500 0.500 0.005 0.012 0.018 0.180 0.212 0.230 0.294 0.322 0.336 Table 5.2: Comparison of quantile upper breakdown points for MM-regression esti- mates wi th 50% breakdown point. the original data set. To drive the t-th. robust bootstrap quantile estimate above any bound we need to have at least t% of the bootstrap samples containing less than p "good" points. The proportion e of outliers in the original sample should then satisfy P [ B inomia l (n, 1 - e) < p ] > t. • The following table compares the breakdown point of the robust bootstrap quantile estimates with the equivalent classical bootstrap estimates. We considered an MM-regression estimate with 50% breakdown point and 95% efficiency when the data are normally distributed. We compared the quantiles needed to construct 90%, 95% and 99% confidence intervals for different sample sizes (n) and number of explanatory variables (p). Note that the only cases where the upper breakdown point for the 194 robust bootstrap quantiles is significantly smaller than the breakdown point of the regression estimate (50%) is for n = 10, p = 5, and for n = 20, p — 10. These cases are not of interest from a practical point of view due to the extremely large dimension of the model relative to the number of observations available. Also note that our upper breakdown points are significantly larger than those of the classical bootstrap quantiles estimate. 5.5 Inference 5.5.1 Empirical coverage levels of confidence intervals In this section we report the results of a Monte Carlo study on the finite sam- ple properties of confidence intervals for the parameters 3 in the linear regression model (4.1). We considered sample sizes n = 30, 50 and 100 wi th 2 and 5 ex- planatory variables. These independent variables included an intercept: x\ = 1, and Xi ~ Af(0,1) for i = 2,...,p. Finally, the errors followed the gross-error contamination model wi th distributions F€ = (1 — e) $ (x) 4- eV (x) where V (x) = 0 .5$ ( (x - x0)/Q.l) + 0 .5$ ( (x + xo)/0.l) and $ (x) denotes the standard normal cumulative distribution function. We used e = 0.00, 0.10 and 0.20. The contamina- tion point XQ was set at 3, 4 and 10. We report the results obtained for xQ = 4, the others being very similar. We generated 5,000 data sets from the above distributions with e = 0.00, 195 0.10 and 0.20. We build 95% and 99% confidence intervals for the parameters of the model. We used MM-regression estimates obtained wi th ip = p 4 6 8 5 in Tukey's family. The S-scale was obtained wi th pi.54764 also in Tukey's family. This choice yields estimates with simultaneous 50% breakdown point and 95% efficiency when the data are normally distributed. We considered two methods to obtain confidence intervals for the regression coefficients. The first was the robust bootstrap as discussed above. We generated ~ R* many re-calculated 0n — 0n and used the empirical distribution of each projection to obtain estimates of the distribution of ~ f ° r e a c n coordinate j = 1, . . . ,p. W i t h these distribution estimates we obtained the quantiles needed to bui ld the con- fidence intervals of interest. The second approach used the normal approximation (4.18) where we esti- mated the asymptotic variance with its empirical version £ = £ (Fn, Gn, f3n, cr n), where Fn is the empirical distribution of the observed errors and Gn is the empirical distribution of the observed design. Note that this estimate of the asymptotic variance is simpler (and hence nu- merically more stable) than the one given in Theorem 4.3 for an arbitrary distribution H. Because we know that the distribution generating the data is symmetric, we are confident that formula (4.18) is correct. Because of its numerical simplicity it is the best competitor to the robust bootstrap among (4.33) and (4.18). In this context the classical bootstrap demands so much computer time that 196 it becomes almost unfeasible; hence we did not include it in our study. Tables 5.3 and 5.4 tabulate the results for p = 2. Tables 5.5 and 5.6 contain the corresponding findings for p = 5. Figures 5.5, 5.6, 5.7 and 5.8 display part of the results in a graphical form. These pictures show at a glance that the levels obtained with the robust boot- strap are better than the ones yielded by the empirical asymptotic variance estimate. The difference in performance is more important for p = 5. B o t h methods are very close only for the case of n = 100 and e < 0.10 or for n = 50 and e = 0.00. The observed behaviour for the first scenario was expected because both methods are asymptotically equivalent, and hence behave similarly for large sample sizes. It is also reasonable not to observe large differences for e = 0.00 and moderate to large sample sizes, as the empirical asymptotic variance estimate is asymptotically correct when the errors follow the central model. Our method is more stable for smaller sample sizes, and yields more reasonable coverage levels for positive values of e. 197 n e Parameter Robust bootstrap Empir ica l A V 30 0.00 Po 0.927 (0.748) 0.918 (0.720) P i 0.925 (0.796) 0.921 (0.739) 0.10 Po 0.930 (0.953) 0.915 (0.720) P i 0.933 (1.057) 0.908 (0.892) 0.20 Po 0.945 (1.408) 0.921 (1.215) P i 0.943 (1.550) 0.914 (1.247) 50 0.00 Po 0.932 (0.572) 0.928 (0.562) Pi 0.928 (0.591) 0.935 (0.571) 0.10 Po 0.938 (0.716) 0.930 (0.684) P i 0.939 (0.755) 0.926 (0.697) 0.20 Po 0.953 (1.037) 0.938 (0.962) P i 0.950 (1.119) 0.925 (0.977) 100 0.00 Po 0.943 (0.404) 0.942 (0.400) Pi 0.936 (0.410) 0.939 (0.404) 0.10 Po 0.951 (0.501) 0.946 (0.490) P i 0.942 (0.514) 0.941 (0.495) 0.20 Po 0.954 (0.713) 0.948 (0.690) P i 0.949 (0.742) 0.938 (0.695) Table 5.3: Average coverage and length of 5,000 95% confidence intervals for the linear regression model with p = 2 198 n e Parameter Robust bootstrap Empir ica l A V 30 0.00 Po 0.975 (0.984) 0.972 (0.947) P i 0.974 (1.047) 0.972 (0.973) 0.10 Po 0.979 (1.255) 0.973 (1.143) P i 0.977 (1.391) 0.970 (1.174) 0.20 Po 0.984 (1.854) 0.977 (1.599) P i 0.980 (2.040) 0.966 (1.641) 50 0.00 Po 0.980 (0.752) 0.979 (0.740) P i 0.976 (0.777) 0.981 (0.751) 0.10 Po 0.984 (0.942) 0.982 (0.901) Pi 0.983 (0.994) 0.978 (0.917) 0.20 Po 0.989 (1.365) 0.986 (1.266) Pi 0.989 (1.473) 0.977 (1.286) 100 0.00 Po 0.986 (0.531) 0.986 (0.527) P i 0.984 (0.540) 0.987 (0.532) 0.10 Po 0.989 (0.659) 0.987 (0.645) P i 0.984 (0.677) 0.985 (0.651) 0.20 Po 0.991 (0.938) 0.989 (0.908) P i 0.990 (0.977) 0.983 (0.915) Table 5.4: Coverage and length of 5,000 99% confidence intervals for the linear gression model wi th p = 2 199 Table 5.5: Coverage and length of 5,000 95% confidence intervals for the linear gression model wi th p = 5 n e Parameter Robust bootstrap Empir ica l A V 30 0.00 Po 0.924 (1.018) 0.829 (0.684) Pi 0.920 (1.070) 0.833 (0.702) P2 0.921 (1.071) 0.835 (0.702) Pz 0.917 (1.077) 0.831 (0.701) PA 0.915 (1.060) 0.828 (0.702) 30 0.10 Po 0.940 (1.248) 0.837 (0.780) Pi 0.936 (1.332) 0.835 (0.797) P2 0.932 (1.330) 0.829 (0.801) Pz 0.935 (1.328) 0.828 (0.798) PA 0.932 (1.337) 0.822 (0.800) 30 0.20 Po 0.950 (1.917) 0.825 (1.058) Pi 0.930 (2.006) 0.809 (1.085) Pi 0.929 (1.983) 0.806 (1.082) Pz 0.935 (2.042) 0.812 (1.084) PA 0.936 (2.057) 0.808 (1.084) 50 0.00 Po 0.946 (0.644) 0.912 (0.559) Pi 0.945 (0.669) 0.918 (0.567) P2 0.938 (0.671) 0.911 (0.567) Pz 0.938 (0.669) 0.906 (0.568) PA 0.940 (0.672) 0.911 (0.566) 50 0.10 Po 0.957 (0.814) 0.912 (0.653) Pi 0.950 (0.851) 0.901 (0.664) Pi 0.952 (0.864) 0.895 (0.664) Pz 0.950 (0.858) 0.900 (0.663) PA 0.949 (0.848) 0.900 (0.664) 50 0.20 Po 0.960 (1.306) 0.904 (0.930) Pi 0.951 (1.387) 0.877 (0.947) P2 0.952 (1.384) 0.880 (0.942) Pz 0.949 (1.366) 0.888 (0.944) continued on next page 200 continued from previous page n e Parameter Robust bootstrap Empi r ica l A V PA 0.955 (1.375) 0.890 (0.941) 100 0.00 Po 0.944 (0.421) 0.935 (0.400) Pi 0.945 (0.428) 0.935 (0.402) P2 0.939 (0.427) 0.935 (0.403) Ps 0.948 (0.428) 0.939 (0.403) PA 0.941 (0.427) 0.938 (0.403) 100 0.10 Po 0.954 (0.531) 0.932 (0.482) Pi 0.950 (0.544) 0.927 (0.486) P2 0.948 (0.546) 0.925 (0.485) Ps 0.950 (0.547) 0.927 (0.485) PA 0.950 (0.545) 0.926 (0.485) 100 0.20 Po 0.960 (0.797) 0.935 (0.686) Pi 0.968 (0.828) 0.936 (0.692) P2 0.956 (0.832) 0.919 (0.691) Ps 0.960 (0.828) 0.916 (0.691) PA 0.957 (0.830) 0.919 (0.691) 201 Table 5.6: Coverage and length of 5,000 99% confidence intervals for the linear gression model wi th p = 5 n e Parameter Robust bootstrap Empir ica l A V 30 0.00 Po 0.967 (1.340) 0.911 (0.901) Pi 0.963 (1.408) 0.912 (0.924) P2 0.963 (1.410) 0.913 (0.923) Ps 0.963 (1.417) 0.907 (0.923) PA 0.963 (1.395) 0.908 (0.924) 30 0.10 Po 0.979 (1.643) 0.923 (1.027) Pi 0.974 (1.753) 0.917 (1.049) P2 0.973 (1.751) 0.913 (1.054) Ps 0.973 (1.748) 0.910 (1.050) PA 0.971 (1.760) 0.908 (1.052) 30 0.20 Po 0.983 (2.523) 0.917 (1.393) Pi • 0.973 (2.641) 0.895 (1.429) P2 0.973 (2.610) 0.901 (1.424) Ps 0.978 (2.688) 0.901 (1.427) PA 0.974 (2.707) 0.898 (1.427) 50 0.00 Po 0.985 (0.847) 0.973 (0.735) Pi 0.984 (0.881) 0.971 (0.747) P2 0.983 (0.883) 0.970 (0.747) Ps 0.981 (0.880) 0.970 (0.748) PA 0.985 (0.885) 0.974 (0.745) 50 0.10 Po 0.988 (1.072) 0.970 (0.860) Pi 0.986 (1.121) 0.965 (0.874) P2 0.987 (1.138) 0.963 (0.874) Ps 0.989 (1.130) 0.971 (0.873) PA 0.986 (1.116) 0.964 (0.874) 50 0.20 Po 0.992 (1.719) 0.968 (1.224) Pi 0.990 (1.826) 0.954 (1.246) P2 0.987 (1.821) 0.949 (1.239) Ps 0.987 (1.798) 0.955 (1.242) continued on next page 202 continued from previous page n e Parameter Robust bootstrap Empir ica l A V PA 0.988 (1.810) 0.957 (1.238) 100 0.00 Po 0.988 (0.555) 0.983 (0.526) Pi 0.986 (0.563) 0.986 (0.530) P2 0.984 (0.562) 0.982 (0.530) Ps 0.987 (0.564) 0.985 (0.530) PA 0.988 (0.562) 0.985 (0.531) 100 0.10 Po 0.990 (0.699) 0.983 (0.635) Pi 0.989 (0.717) 0.981 (0.640) P2 0.988 (0.719) 0.981 (0.639) Ps 0.990 (0.720) 0.980 (0.639) PA 0.990 (0.718) 0.980 (0.638) 100 0.20 Po 0.994 (1.050) 0.984 (0.903) Pi 0.993 (1.090) 0.982 (0.911) P2 0.990 (1.095) 0.978 (0.910) Ps 0.992 (1.090) 0.974 (0.910) PA 0.992 (1.093) 0.978 (0.910) 203 (a) n = 30 (b) n = 50 (c) n = 100 Figure 5.5: Average coverage of 95% confidence intervals for the linear regression model with p = 2. Solid triangles are levels of the confidence intervals for the intercept and the coefficient of X l calculated with the robust bootstrap; circles represent the corresponding levels for the confidence intervals obtained with the empirical asymp- totic variance estimate. Across the horizontal axis, the three groups correspond to e = 0.0, 0.1 and 0.2 respectively. The horizontal line indicates the nominal level. 204 (a) n = 30 (b) n = 50 (c) n = 100 Figure 5.6: Average coverage of 99% confidence intervals for the linear regression model with p = 2. Solid triangles are levels of the confidence intervals for the intercept and the coefficient of xx calculated with the robust bootstrap; circles represent the corresponding levels for the confidence intervals obtained with the empirical asymp- totic variance estimate. Across the horizontal axis, the three groups correspond to e = 0.0, 0.1 and 0.2 respectively. The horizontal line indicates the nominal level. 205 (a) n = 30 (b) n = 50 A A A A A * A A * 0 0 0 A A A A O ° o ° o oo ° o ° 0.0 0.1 EPSILON 0.2 (c) n = 100 Figure 5.7: Average coverage of 95% confidence intervals for the linear regression model wi th p = 5. Solid triangles are levels of the confidence intervals for the inter- cept and the coefficients of x 1 ; . . . ,x 4 calculated with the robust bootstrap; circles represent the corresponding levels for the confidence intervals obtained wi th the em- pirical asymptotic variance estimate. Across the horizontal axis, the three groups correspond to e = 0.0, 0.1 and 0.2 respectively. The horizontal line indicates the nominal level. 206 o o ° o ° 0.0 0.1 0.2 0.0 0.1 0.2 EPSILON (a) n = 30 (b) n = 50 (c) n = 100 Figure 5.8: Average coverage of 99% confidence intervals for the linear regression model with p = 5. Solid triangles are levels of the confidence intervals for the inter- cept and the coefficients of x x,... ,x4 calculated wi th the robust bootstrap; circles represent the corresponding levels for the confidence intervals obtained wi th the em- pirical asymptotic variance estimate. Across the horizontal axis, the three groups correspond to e = 0.0, 0.1 and 0.2 respectively. The horizontal line indicates the nominal level. 207 Chapter 6 Conclusion This chapter contains an outline of the results obtained in the thesis, the problems that we encountered, and the directions we foresee for future work. • Global asymptotic properties of robust estimates: • We established the consistency and asymptotic normality of robust loca- tion and regression estimates for an arbitrary distribution function in the contamination neighbourhood. Under additional regularity conditions, we showed that the consistency and asymptotic normality of the robust esti- mates for the location model hold uniformly on the contamination neigh- borhood for certain proportions of contamination. There seems to be a trade-off between the breakdown point of the estimate and the size of the neighbourhood where it is uniformly consistent. 208 • It is desirable to find regularity conditions on the loss function p that wi l l ensure a unique minimum of the functional that defines the estimate for any distribution in the contamination neighbourhood. • It remains to study whether the asymptotic properties of the robust re- gression estimates hold uniformly on the contamination neighborhood. We conjecture that this is the case and we anticipate some technical difficulties due to the multivariate nature of the problem. We also expect that the required regularity conditions on the function p w i l l be more strict than those found for the location model. Maximum asymptotic bias calculation for location estimates: • As a byproduct of our computations regarding the uniform consistency of the S-location estimate (see Section 2.3.3), we derived a method to determine the maximum asymptotic bias for location estimates calculated with a re-descending function ijj. To our knowledge there are no results in the literature regarding the maximum asymptotic bias of this type of estimates. A stable and feasible computer intensive inference method: • We introduced a new computer intensive method to perform statistical inference based on robust estimates. Th i s method, which we call the robust bootstrap, can in principle be used on any statistical model where residuals are well defined. We studied its theoretical and practical properties when it is applied it to estimate the variability and the sampling distribution 209 of robust estimates for the location and regression models. We found that the robust bootstrap is computationally simpler than the classical bootstrap and that it is more stable when the data are contaminated. In particular, it yields estimates of the quantiles of the distribution of the location and regression estimators that have a higher breakdown point than those obtained from the classical bootstrap. • Robust regression estimates for the linear model wi th fixed design remain to be studied in detail. We expect the robust bootstrap to apply as de- scribed in Section 3.5. Note that i f the design is fixed then the contami- nation model does not contemplate outliers in the covariates. Hence, we can use a M-regression estimate with a monotone function ip. This change may modify the robustness properties of the quantile estimates. • We have not been able to show that the Studentized robust bootstrap converges faster than Op(l/ y/n). Our proof seems to fail because the correction factor is of that order. A possible solution is to perform a second order Taylor expansion when deriving the correction factor for the robust bootstrap. This deserves further study. • We are interested in extending the robust bootstrap to estimators that are defined by estimating equations of the form n ( y t , X i , d n > i > n ) = 0, i=l where yi are the response variables, X j are vectors of covariates, 0n is the estimator of interest and vn is an estimator of nuisance parameters. 210 We expect the robust bootstrap to be computationally simpler than the classical bootstrap. Its robustness properties (stability and breakdown point of the quantile estimates) wi l l depend on the form of the functions 9i- • It is also of interest to determine whether the convergence of the distribu- tion of the robust bootstrap holds uniformly on the contamination neigh- bourhood. If this is the case we wi l l have an estimation method for the sampling distribution of the estimate of interest that is uniformly close to its target. This would be a very interesting result to obtain. 211 Chapter 7 Appendix - Auxiliary results This chapter contains auxiliary results used throughout this thesis. Proofs are pre- sented for those that could not be found in the literature. Auxiliary results found in the literature Definition 7.1 - Big O in probability : Let an, n = 1, . . . be a sequence of real numbers and let Xn,n = 1, . . . be a sequence of random variables. We say that Xn = 0P (an) if > k = 0 That is, the sequence \ Xn/an\ is bounded in probability. Definition 7.2 - Small o in probability: Let an,n = 1 , . . . be a sequence of real 212 numbers and let Xn,n = 1, . . . be a sequence of random variables. We say that Xn = oP(an) t/V<5 > 0 l im P n—>oo X„ > 6 = 0. That is, the sequence \Xn/an\ converges to zero in probability. It is easy to see that the following three implications hold Xn = 0P(l), Yn = oP(l) Xn + Yn = 0P(l), Xn = 0P(l), Yn = Op(l) => XnxYn = 0P(l), and (7.1) (7.2) Xn = 0 P ( l ) , Yn = oP(l) X n x Y n = o P ( l ) . (7.3) Remark 7.1 - The above definition of Op (an) is equivalent to the following def- init ion, also found in the literature (see Davison and Hinkley, 1997, page 39): A sequence of random variables Xn is said to be Op (an) if, for each e > 0 we have l i m ^ o o P (\Xn/an\ > e) — p, a constant. It is clear that our first definition implies the latter. To see the other implication first note that p = p(e) above is a non- increasing function of e > 0. This is a consequence of the following inequality that holds for each n G N Xn > 6 2 < P Xn > ei ei < e 2 • We wi l l now show that l im e _ > 0 0 p(e ) = 0. Because for any e we have p(e) > 0 it is enough to show that we cannot have p (e) > p > 0 for some fixed p. If such a p existed 213 p > e > p - 6 V e > 0 . we would have that for any 8 > 0 there exists a n0 6 N such that for any n > n 0 , 1 Xn Because for each fixed n 6 N the left hand side converges to zero as e increases, this is a contradiction. We conclude that p (e) -> 0 as e goes to infinity. Hence both definitions of Op (an) are equivalent. Definition 7.3 - Infinitely often - Let An be a sequence of subsets of a probability space ft. The event {An infinitely often} is defined by oo oo {An infinitely often} = An. m=l n=m We will also write {An i.o.} Lemma 7.1 - Serfling - (Serfling, 1980, page 253) - Let X\,..., Xn be a sequence of independent identically distributed random variables and let g (x,t) : R x R —>• R be continuous in t uniformly on x G R. Let 6n be a sequence of random variables such P that 9n y 9, a constant, then n—too - J 2 9 (Xu 9n) E[g (X, 9)}. (7.4) 71. 1 n — V r * " i n • 1 a s If 6n — & then (7.4) holds a.s. as well. n->oo Lemma 7.2 - Bernstein - (Chow and Teicher, 1988, page 111) Let Sn = YA=IXI where Xi are independent random variables with E (Xi) = 0, E (X?) = of. Let sn = X ) " = 1 of > 0. Assume that E \Xi\k < (k\/2) of ck~2 for k > 2 and 0 < c < oo, then P l s » > i | S e x p G«rT^)) ' *>°- <7-5> 214 The condition E \Xi\k < (k\/2) of ck 2 holds if, for example, P [\Xi\ < c] = 1. Theorem 7.1 - Berry-Esseen for i.i.d. random variables - (Chow and Teicher, 1988, page 305) If {Xn,n > 1} are i.i.d. random variables with EXi = 0, EX2 = o2, E\Xi\2+5 = 7 2 + < 5 < oo for some S £ (0,1], Sn = Y!i=iXi and $ is the standard normal distribution function, then there exists a universal constant c$ such that s u p \ P { S n < xon1'2} - $ (a)I < • xeR n 1 Lemma 7.3 - Borel-Cantelli (see for example Chung, 1974, page 73) Let {An}nen be a sequence of events. Then n ^2P(Ai) < oo P(Ani.o.) = 0. i=l Lemma 7.4 - Lemma 3.4 in Yohai (1985) - If P (|0'X| > 0) > A for all 0 E W, then there exist 4> > 0, 5 > 0, 7 > 0 and a finite collection of compact sets Ci,..., Cs such that | J d D {0 e W : \\0\\ = 1} i=i and P ( inf |0'X| > cA ) > A + 7 . Lemma 7.5 - Lemma 4.2 in Yohai (1985) - Let g : E f c xRh -^R be continuous and let Q be a probability measure on Rk such that for some 6 > 0 we have Er sup M z , 7 ) | ! l7-7ol l<* < OO . 215 Let 7 N be a sequence of estimates in Rk such that 7 N —>• 7 0 almost surely. Then if Z i , . . . ,zn are independent identically distributed random variables in R f c with distri- bution Q, we have Auxiliary results not found in the literature The following lemma is an immediate consequence of Lemma 7.2. For completeness we state and prove it here. L e m m a 7.6 Assume that Xi for i = 1 , . . . , n are independent random variables with zero mean such that there exists a constant c with P (\Xi\ < c) = 1. Let Xn = £ Ei=i Xi- Then, for any 5 > 0 we have P r o o f : Let of = V (Xi), s2n = £ " = 1 of, and Sn = £?=1 Xt. B y Lemma 7.2 we have P(\Xn\>5)=P(Xn>5)+P(Xn<-8) = P(Sn>n6) + P(Sn < -nS) a.s. P(\Xn > 5) < 2 exp(—717) for some 7 = 7 (c, 5) > 0 . 216 and because si < nc2 we have ~ 2 e X p t -n2(J+c5) ) = 2 e x P ( ~ " 7 ) where 7 = 7 (c, 5). L e m m a 7.7 Ze£ p : K —>• R + be a continuous real function such that there exists a finite constant c with p (u) = 1 for \u\ > c. Let t G T and s G <5, where T and S are bounded real intervals, with inf {s G <S} > 0. Then the function f (u, t,s) = p (^-p-^j > ueR, t e T s e S is continuous in s and t uniformly in u. In other words, for any e > 0, there exist 6t > 0 and Ss > 0 such that \s1-s2\<Ss, \h-t2\<6t => | / (u, t\, S\) — f (u, t2, s2)\ < e, Vw G M . P r o o f : The idea is to show that there exists a closed and bounded interval U such that for any ti, t2 G T , S i , S2 G <S we have "{^r)="{^r}' (7'6) whereas for u G U we wi l l use a standard e-5 argument. B y hypothesis we have t < t < t and s < s < s. Consider u > i + s c. It is clear that, for any t < i and s < s we have (u — t)/s > c. Hence (7.6) holds. Similarly, for u < t — sc we have that for any t > t and s > s we have (u — t)/s < —c and (7.6) holds as well. So, 217 U = [t — sc , t + s c]. In U p is uniformly continuous and hence it is enough to bound \u-ti u — t2\ Sl s2 I i |S2 - S i | | t 2 Si - * i s2\ , , \s2 - Si \u\ h J = \u\ Si s 2 Sl s 2 Sl s 2 . , , i | S 2 - S i | |t 2 - <l| , „ \S2 ~ S i | | s 2 - S i | |*2 - *11 ^ , + l*2| 1- s 2 < Ku V t h J 5 — < 5, Sl s 2 Sl s 2 if |si — s2| and |*2 — * i | are sufficiently small [Ku = sup : u E L e m m a 7.8 / / <K, V t e R , V s G / Q , VFeHe, then 7 (F, *, s) is continuous in s uniformly in * and F , i.e., /o r any e > 0, there exists 6 > 0 independent of t and F such that | s i - s 2 | <cT | 7 ( F , * , s i ) - 7 ( F , * , s 2 ) | < e, V* e R, VF E %e. P r o o f : A Taylor expansion yields fx-t\ fx-t where Si < s < s2. Hence, si J \ s 2 = — Ep s < F i X - * \ / X - * PI : -P Sl S 2 . . K |si - s2| |Si - S 2 < • < 6 if |si — s2| < s e/K. Note that by hypothesis K does not depend on * or F. L e m m a 7.9 If p belongs to Tukey's family andrle is a contamination neighbourhood around the standard normal distribution, then Lemma 7.8 holds. 218 P r o o f : Note that pd (u) u > 0. Hence we have to find a uniform bound for It is enough to bound / a 1 2 -u2 ( l — u 2 ) <f> (t + s u) du, -d " and ^ V (1 - « 2 ) 2 dH(t + su). J-d " The first integral is bounded because </> is- Also note that f \u2 ( l . - u 2 ) 2 d H ( t + < K\ f dH(t + su) J-d a v ' d J _ d = ^K dH{x)<-K. d Jt-sd d Lemma 7.10 If p(x,t,a) is continuous in t, then, for fixed x and a we have inf p (x, t, o) teB(t0) B(t0)\{t0} > p(x,t0,o), where B (t0) is an open ball around t0. If, in addition, p(x,t,o) is bounded, we inf p (X, t, o) tSB(to) B(to)\{t0} > EFp{X,t0,o) P r o o f : F i x e > 0. B y continuity there exists 8 = 5 (x, o, t0) > 0 such that t-t0\<5 =>• \p(x,t,o) - p(x,t0,o-)\ < e. 219 Hence, p (x, t,o) < p (x, to, o) + e for al l t in a sufficiently small neighbourhood B (t0) of to- Immediately we obtain Similar ly we have inf p (x, t, a) < p (x, t0, a) + e. t£B(to) inf p (x, t,a)>p (x, t0,o)- e, and the proof of the first claim is complete. The second claim is a consequence of the Dominated Convergence Theorem. • Lemma 7.11 Let x x , . . . , xn be i.i.d. random variables, Xi ~ F. If p satisfies the con- ditions of Lemma 7.7 and on —>• cr a.s. [F], then for any to and bounded neighbourhood B (t0) we have 1 n - V ] inf p (xu t', on) EF n ^—' t'eB(t0) n^oo inf p(X,t',cr) t 'eB(io) (7.7) P r o o f : B y Lemma 7.1 it is enough to show that the function f(x,o) = inf p(x,t',o) t ££>(to) is continuous in o uniformly in x. F i x e > 0. The proof of Lemma 7.7 shows that there exists 8 > 0 such that i f Isi — s 2 | < 8 then x — t x — t < e, if \ox - CT2| < 8, yt e B (to), Vx e R, where A denotes the completion of A. Note that 8 does not depend on t (although it does depend on B (to)). We have ' x — t , x — f <P[ I + e- S2 220 It follows then that in f t 6 B ( t 0 ) P ( T T ) - m^teB(t0) P ( 1 7 ) + e - The same argument can be applied to the other inequality to obtain inf p ( -—- t€B(to) \ Si inf p teB(to) \ S 2 x — t for al l x. Hence (7.7) holds. L e m m a 7.12 Let Xn and Yn, n = 1,... be two sequences of random variables such that Xn = 0P (1) and Yn = 0P (1). Then X, Yn + oP (1) Yn - + oP(l) P r o o f : The result follows immediately from (7.1)-(7.3) and Xn Xn -Xn O p (1) Yn + 0 P (1) Yn (Yn + O p (1)) Yn The following lemma is an elemental result. We state and prove it here for completeness of the presentation. L e m m a 7.13 Let f : R —> K be a continuous function. If l im| . r |_ ) . 0 0 / (x) = 0, then f is uniformly continuous in R. P r o o f : Let e > 0. Choose K (e) such that i f |x | > K, \ f (x)\ < e/4. For the compact set < K} choose 5 = 5 (K) such that i f \u — v\ < 5, \u\ < K, \v\ < K then \f(u)-f(v)\ < e/4. Note that \f (±K)\ < e/4. To see that \f (K)\ < e/4, take 221 a sequence xn \ K, wi th |x„ | > K and use the continuity of / to conclude that |/ (K)\ = lim„ | / (xn)\ < e/4 because \ f (xn)\ < e/4 for al l n G N . Now, take any two real numbers x, y such that \x - y\ < 5. If |x | < K and \y\ < K then | / (x) - f (y)\ < e / 4 < e . If both | z | > K and \y\ > K, then |/(x) - f (y)\ < \f (x)\ + \f (y)\ < e/2 < e. If M < K and \y\ > K, then \f (x) - f (y)\ < \f (x) - f (K)\ + \f (K) - f (y)\ < The following lemma is an elemental result. We state and prove it here for completeness of the presentation. L e m m a 7.14 Assume that the sequence of random variables Xn, n G N converges in probability to the random variable X. Then Xn = Op (1). P r o o f : Let 5 > 0 be arbitrary. e/4 + e/2 < e. 222 L e m m a 7.15 - U n i f o r m S l u t z k y - Assume that Xn (9) is a sequence of random variables indexed by a parameter 9 G 0 . Assume that sup sup ' ( x n (0 ) < x ) - p ( * ( 0 ) < x ) + 0, and that P(X{9)<X + 77) — j p(x (9) < x ) (7.8) uniformly on x e E and 9 € 6 . Let an (9) be a sequence of real random variables indexed by the same set 0 , such that for any 8 > 0 Then, sup sup supP(\an {9) - a(9)\ > 8) >0. ()(zQ V / n—too P(an (9) Xn (9) < x ) - P[a (9) X (9) < x ) -> 0. P r o o f : To simplify the notation we wi l l write Xn, an, X and a instead of Xn (9), an (9), X (9) and a (9) respectively. Let 8 > 0 be arbitrary. Note that P (Xn an < x) < P (\an — a\ > 8) + P (Xn an < x, \an - a\ < 8) = = P (\an - a\ > 8) + P (Xn an < x , \an - a\ < 8) < < P (\an -a\>8) + P(Xn< xj {a - 8)) . Let and uN(9,8) = P(\an -a\ > 8) , tn (9) = sup p(xn{e)<x)-p(x(e)<x) 223 Note that if c > 0 then for any b G R we have b < c+\b — c\. Hence we have P (Xn an<x)< un (9,5) + tn (9) + P (X < xf (a - 8)) Similar ly we have P(Xn > x/an) <P(\an-a\ > 5) + P(Xn > x/(a + 5)) . It follows that P(Xn < x/ On) > P (Xn < x/(a + 5))-P(\an-a\>5) . Now the inequality b > c — \b — c\ implies P(Xn< x/ an) > P (X < x/ (a + 6)) - tn(9) - un(9,6) . Let e > 0, choose n 0 = n 0 (e) large enough such that for any 9, \tn (9)\ < e, for al l n > n 0 . Choose 8 > 0 such that and P (X < xj (a + 5)) - P (X < x/ a) P(X < x/(a-8))-P(X < x/a) < e, for al l 9 e 6 and x G R. For this 5 = 8(e) choose ni = nx (8) = ni (e) such that for all n > ni we have s u p 0 6 0 \un (9,8)\ < e. It follows that i f n > m a x ( n 0 (e) , n i (e)) then P(Xn < x/an)-P(X < x/a) < 3e , for al l 9 G 0 and.x G 224 C o r o l l a r y 7.1 Assume that Xn (9) is a sequence of random variables indexed by a parameter 9 € 6 . Assume that sup sup 0e@ x £ R P(xn{9) < x ) - p ( x < x ) + 0, and that X is a continuous random variable with bounded density function. Let an {9) be a sequence of real random variables indexed by the same set Q, such that for any 8 > 0 supP(\an(9) - a(9)\ > s) • 0 . geQ V / n->oo Then, sup sup 8e@ xeR >(an (9) Xn (9) < x ) - P(a{9) X < x ) -> 0 . P r o o f : The result follows immediately from Lemma 7.15 by noting that (7.8) is satisfied. • L e m m a 7.16 Assume that fn : E —>• M is a sequence of real functions that converges uniformly to g : R —t R on a set K. Let an be the sequence of infimum of fn on K, i.e. an = inf / „ (x). and let b = i n f l 6 £ g (x). Then an > b. n—¥oo P r o o f : Assume b > —oo (the case b = —oo can be treated along the same lines). F i x e > 0. Let n0 (e) be such that for al l n > n0 (e) we have \fn(x)-g(x)\<e, V x G / C . 225 We have an < In (x) < g (x) + e, V x G JC. (7.9) It follows that an < b + e. To prove it, assume that an > b + e, i.e. an — e > b. B y the definition of an, there must exist xg such that g (xg) < an — e, which contradicts (7.9). In the same way, we can show that b - e < fn(x) for a l l x e JC, and hence L e m m a 7.17 Let n > 1 and 0 < k < n be integers. Then the function g(S) = P( B inomia l (n, S) > k ) for 5 G [0,1] satisfies: g (0) = 0, g (1) = 1; g is continuous and non-decreasing. P r o o f : That g (6) is continuous, g (0) = 0, and g (1) = 1 is immediate. We now prove its monotonicity. Firs t assume that k > 1. We wi l l show that h (6) = 1 — g (5) is non-increasing. We have b — e < an. Final ly, we have that for n > no (e), |a. b\ < e. Then k-l h' (6) = £ i t ? - 1 (1 - 5) n—i {n - i) 6{ (1 - 6) n—i—l = a — b. % where 226 and b = E ( n ) (n - o s i (! - < 5 ) n _ i _ 1 = E ( n ) i 5 i ~ l (! - > i=o ^ i ' i ' where the last equality follows from Then, n \ / n ( n - t ) = + 71 Ji'(cJ) = a - 6 = - A ; ( ) J * " 1 (1 - J ) " - * < 0 V(5G[0 ,1 ] k and the proof is complete for k > 1. If = 1 h(5)=P( B inomia l (n, eJ) = 0 ) = (1 - 5 ) n , which is clearly decreasing for 5 € [0,1]. F ina l ly i f k = 0 h(5) = P( B inomia l (n , o~) < 0 ) = 0 V 5 e [0,1]. Consider the metric d2 (F, G) for distribution functions defined by d\ (F, G) = inf E [(X - Y)2] (7.10) where the infimum is taken over al l the possible distributions of the random vector (X, Y) such that its marginal laws are F and G respectively. This metric was intro- duced in Mallows (1972) and Tanaka (1973). For a detailed discussion see Bickel and 227 Freedman, Section 8 (1981). d2 metrizes weak convergence in the following sense: d2 (Fa, F) -)• 0 iff Fa —> F and \ixaEFaX2 = EFX2 a W where denotes weak convergence. Lemma 7.18 Let . . . , X n be independent and identically distributed random vec- tors on W. Let 6n = 0n ( X i , . . . X n ) be a statistic in W such that 0n ^ ôo> to some vector 6^ G W almost surely. Let h (x, t) : W x R ^ W be a continuous function such that ||h(X,0)|| 2 < g(X) Wee, with E [g (X)] < oo. Let the random variables Y j and Zj be Z{ = h (Xi, floo) = h (Xu bn) \<i<n. If Gn is the empirical cumulative distribution function of the Yi, and Fn the corre- sponding empirical cumulative distribution of the Z{, then we have l im d2 (Gn,Fn) = 0 a.s. P r o o f : Because d2 in (7.10) is the infimum over al l the joint distribution functions with the bootstrapped marginals, an upper bound is given by any joint distribution such that its marginals coincide with the distributions of X and Y. In particular consider the distribution in R p x W that assigns mass l / n to each of the "pairs" 228 (h(Xj,0) ,h(Xj,0oo)) . Choose an arbitrary e. Let X be a random vector wi th the same distribution of the XjS. Let I A (X) be the indicator function of the set A, i.e. I A ( X ) = 1 4=> X G A and 1^ ( X ) = 0 otherwise. We know that there exists a compact set JC = K. (e) C W such that 2E[IKc ( X ) g ( X ) ] < c / 2 . For this set /C (e) there exists a positive number 5 = S (e, /C) such that | | h ( X ^ 1 ) - h ( X , - 9 2 ) | | 2 < 6 / 2 if X G /C and \6\ — 92\ < S (e, K). F i x a; G fl (the probability space) such that 9n {OJ) converges to 9^. Almost al l OJ satisfy this. There exists a ni = ni (to, 5) such that V n > ni dn(u)-9oo\ <5/2 . O n the other hand, for a fixed set /C there exists an integer n 2 = n 2 (e, OJ, K) such that for n > n 2 IJ^lKc ( X , ) g ( X O < e/2 . Take n > max (n i , n 2 ) . We have d\{Fn,Gn) < h ( x i , 0 ) - h ( X i , 0 o o ) i=l < ^ E e / 2 + e/2 < e . 229 L e m m a 7.19 Let o(F,t) be as in (2.23). For any Kx > 0, there exists K2 = K2 (Kx) > 0 such that o (F, t) > Kx, for all \t\ > K2, for all F G He. P r o o f : Let Ft = (1 — e) $ + St where St is a point mass distribution function at t. Then o (Ft, t) satisfies (1 - e) E^p((X -t)/a (Ft, t)) = b. Hence, for any F <E Ut we have EFp ((X - t)l o (Ft, t)) = (1 - e) E*p ((X - t)/o (Ft, t)) + eEHp((X-t)/o(Ft,t)) = b + eEHp((X-t)/o(Ft,t))>b. It follows that o (F, t) > a (Ft, t). Also, l im (1 - c) E^p ((X - t)/Kx) = (1 - e) > b, |t|->oo (because p ( ± o o ) = 1). Hence, there exists K2 such that for al l \t\ > K2, (l-e)E9p((X-t)/K1)>b. Hence o (Ft, t) > Kx for \t\ > K2. It follows that o (F, t)>o (Ft, t) > Kx for \t\ > K2, V F G U€. L e m m a 7.20 Let e G (0,1/2) be a fixed number, and let F€ be a distribution function of the form Fe = (1 — e) $ + e H, where $ denotes the standard normal distribution function and H is an arbitrary distribution function. Let ipc be a function from 230 Huber's family (see 2.5), pk be as in (2.14) and let b = E& (pk). Then EFM{U) U] • £ F . [ # ( « ) ] EFAP'M U] where Z ~ $. < (1 - ef 2 [2$ (c) - 1] [b-P(\Z\>k)Y P r o o f : Firs t note that for Fe and any real function h EFe [h (u)} = (1 - e) E* [h (u)] + eEH[h (u)]. Also note that ipc satisfies ib'c (u) u = u/ c i f |u | < c 0 i f Id > c Hence | £ F . [ # ( « ) « ] ! = l rc (1 - e) - / u(f)(u) du + eEH [ip'c (u) u] c J—c = e | E H [ # ( « ) « ] I because f° u</) (u) du = 0 and | ^ (it) tt| < 1. (7.11) To control the denominator we wi l l find upper bounds for EFe [ip'c (u)] and Epc [p'k (u) u]. First note that ib'c (u) > 0, VM EFC [€ (u)] > (1 - 6) (u)] = ( l - e ) ( 2 < D ( c ) - l ) . (7.12) 231 Let pk be a function of the family described in (2.14). Then { 2{u/kf i f | u | < f c 0 if |u | > k and then EFMM u]>(l-e)E*[ffk(u) u] = 2 ( 1 - 6 ) (b-P(\Z\>k)), (7.13) where Z denotes a random variable with a standard normal distribution. The latter equality is due to the fact that by hypothesis b satisfies / k p—k poo (u/kf (j){u) + / (f){u)du+l (p(u)du •k J—oo Jk = E*[p'k(u) u]/2 + P(\Z\>k). The result now follows from (7.11), (7.12) and (7.13). • 232 Bibliography 1. Atkinson, A . C . (1985) Plots, Transformations and Regression. Oxford: Oxford University Press. 2. Berrendero Diaz, J .R . (1996). Contribuciones a la teoria de robustez respecto al sesgo. Tesis Doctoral , Universidad Carlos III de Madr id , Departamento de Estadistica y Econometria, Madr id . 3. Berrendero, J .R. , Mazz i , S., Romo, J . and Zamar, R . (1998). O n the explosion rate of maximum-bias functions. The Canadian Journal of Statistics, 26, 333- 351. 4. Beaton, A . E . and Tukey, J . W . (1974). The fitting of power series, meaning polynomials, illustrated on band-spectroscopic data. Technometrics, 16, 147- 185. 5. Bickel , P . J . and Doksum, K . A . (1977). Mathematical statistics: basic ideas and selected topics. Oakland: Holden-Day. 6. Bickel , P . J . and Freedman, D . A . (1981). Some asymptotic theory for the boot- 233 strap. The Annals of Statistics, 9, 1196-1217. 7. Boos, D . D . and Serfling, R . J . (1980). A note on differentials and the C L T and L I L for statistical functions, with applications to M-estimates. The Annals of Statistics, 8, 618-624. 8. Brownlee, K . A . (1965). Statistical Theory and Methodology in Science and En- gineering. Second Edi t ion . New York: Wiley. 9. Carrol l , R . J . (1979). O n estimating variances of robust estimators when the errors are asymmetric. Journal of the American Statistical Association, 74, 674-679. 10. Chow, Y . S . and Teicher, H . (1988). Probability Theory. Independence, Inter- changeability, Martingales. Springer Texts in Statistics. New York: Springer- Verlag. 11. Chung, K . L . (1974). A Course in Probability Theory. New York: Academic Press. 12. Clarke, B . R . (1983). Uniqueness and Frechet differentiability of functional so- lutions to maximum likelihood type equations. The Annals of Statistics, 11, 1196-1205. 13. Clarke, B . R . (1984). Nonsmooth analysis and Frechet differentiability of M - functionals. Research Report. Murdoch University, Murdoch, Western Aus- tralia. 234 14. Daniel , C . and Wood, F .S . (1980). Fitting Equations to Data. Computer Anal- ysis of Multifactor Data. Second Edi t ion . 'New York: John Wi ley &; Sons. 15. Davies, P . L . (1990). The asymptotics of S-estimators in the linear regression model. The Annals of Statistics, 18, 1651-1675. 16. Davies, P . L . (1993). Aspects of robust linear regression. The Annals of Statis- tics, 21 , 1843-1899. 17. Davies, P . L . (1998). O n locally uniformly linearizable high breakdown location and scale functionals. The Annals of Statistics, 26, 1103-1125. 18. Davison, A . C . and Hinkley, D . V . (1997). Bootstrap Methods and their Appli- cation. Cambridge Series in Statistical and Probabilist ic Mathematics. C a m - bridge: Cambridge University Press. 19. DiCicc io , T . J . and Efron, B . (1996). Bootstrap Confidence Intervals. Statistical Science, 11, 189-228. 20. Donoho, D . L . and Huber, P . J . (1983). The notion of breakdown-point. In A Festschrift for Erich L. Lehmann (P. J . Bickel , K . A . Doksum and J . L Hodges, Jr., eds.) 157-184. Wadsworth, Belmont, California. 21. Dupuis, D . J . and Hamil ton, D . C . (2000). Regression residuals and test statis- tics: Assessing naive outlier deletion. To appear in The Canadian Journal of Statistics. 22. Efron, B . (1979). Bootstrap methods: another look at the jackknife. The Annals of Statistics 7, 1-26. 235 23. Efron, B . (1982). The Jackknife, the Bootstrap and Other Resampling Plans. Philadelphia: Society for Industrial and Appl ied Mathematics. 24. Efron, B . and Tibshirani , R . J . (1993). An Introduction to the Bootstrap. Mono- graphs on Statistics and Appl ied Probability, 57. New York: Chapman & Hal l . 25. Fraiman, R. , Yohai , V . J . , and Zamar, R . (2000). Opt imal robust M-estimates of location. Unpublished manuscript. 26. Freedman, D . A . (1981). Bootstrapping regression models. The Annals of Statistics, 9, 1218-1228. 27. Ghosh, M . , Parr, W . C , Singh, K . and Babu, G . J . (1984). A note on boot- strapping the sample median. The Annals of Statistics, 12, 1130-1135. 28. Ha l l , P. (1986). O n the bootstrap and confidence intervals. The Annals of Statistics, 14, 1431-1452 29. Ha l l , P. (1988). Theoretical comparison of bootstrap confidence intervals. The Annals of Statistics, 16, 927-953. 30. Ha l l , P. (1992). The bootstrap and Edgeworth expansion. New York : Springer- Verlag. 31. Hampel, F . R . (1971). A General Qualitative Definition of Robustness. The Annals of Mathematical Statistics, 42, 1887-1896. 32. Hampel , F . R . , Ronchetti, E . Z . , Rousseeuw, P . J . and Stahel, W . A . (1986). Ro- bust Statistics. The approach based on influence functions. New York: Wiley. 236 33. He, X . (1991). A local breakdown property of robust tests in linear regression. Journal of Multivariate Analysis, 38, 294-305. 34. He, X . , Simpson, D . G . and Portnoy, S.L. (1990). Breakdown robustness of tests. Journal of the American Statistical Association, 85, 446-452. 35. H i l l , R . W . (1977). Robust regression when there are outliers in the carriers. P h . D . thesis. Harvard University, Cambridge. 36. Huber, P . J . (1964). Robust estimation of a location parameter. The Annals of Mathematical Statistics, 35, 73-101. 37. Huber, P . J . (1967). The behavior of maximum likelihood estimates under non- standard conditions. In Proceedings of the Fifth Berkeley Symposium on Math- ematical Statistics and Probability. Berkeley, California. 38. Huber, P . J . (1973). Robust regression: asymptotics, conjectures and Monte Carlo. The Annals of Statistics, 1, 799-821. 39. Huber, P . J . (1981). Robust Statistics. New York: Wiley. 40. Krasker, W . S . (1980). Est imation in linear regression models wi th disparate data points. Econometrica, 48, 1333-1346. 41. Krasker, W . S . and Welsch, R . E . (1982). Efficient bounded influence regression estimation. Journal of the American Statistical Association, 77, 595-604. 42. Mallows, C L . (1972). A note on asymptotic joint normality. The Annals of Mathematical Statistics, 43, 508-515 237 43. Markatou, M . and Hettmansperger, T . P . (1990). Robust bounded-influence tests in linear models. Journal of the American Statistical Association, 85, 187-190. 44. Markatou, M . , Stahel, W . A . , and Ronchetti, E . (1991). Robust M-type testing procedures for linear models. In Directions in Robust Statistics and Diagnostics, Part I, Stahel, W . , Weisberg, S., eds. Springer-Verlag. pp. 201-220. 45. Maronna, R . A . , Bustos, O . H . and Yohai , V . J . (1979). Bias- and efficiency- robustness of general M-estimators for regression wi th random carriers. In Smoothing Techniques for Curve Estimation, T . Gasser and M . Rosenblatt (eds.). Lecture Notes in Mathematics 757, 91-116. Ber l in ; New York : Springer- Verlag. 46. Maronna, R . A . and Yohai , V . J . (1981). Asymptot ic behavior of general M- estimates for regression and scale wi th random carriers. Zeitschrift fur Wahrschein- lichkeitstheorie und Verwandte Gebiete, 58, 7-20. 47. Mar t in , R . D . and Zamar, R . (1993). Bias robust estimation of scale. The Annals of Statistics, 21 , 991-1017. 48. Mar t in , R . D . , Yohai , V . J . and Zamar, R. (1989). Min-max bias robust regres- sion. The Annals of Statistics, 17, 1608-1630. 49. Parr, W . C . (1985). The bootstrap: some large sample theory and connections with robustness. Statistics and Probability Letters, 3, 97-100. 238 50. Relies, D . A . A n d Rogers, W . H . (1977). Statisticians are fairly robust estimators of location. Journal of the American Statistical Association, 72, 107-111. 51. Rocke, D . M . and Downs, G . W . (1981). Est imating the variances of robust esti- mators of location: influence curve, jackknife and bootstrap. Communications in Statistics, Part B - Simulation and Computation, 10, 221-248. 52. Ronchetti , E . (1982). Robust Testing in Linear Models: The Infinitesimal Ap- proach. Unpublished P h . D . thesis, Swiss Federal Institute of Technology, Zurich. 53. Rosner, B . (1977). Percentage points for the R S T many outlier procedure. Technometrics, 19, 307-312. 54. Rousseeuw, P . J . (1984). Least median of squares regression. Journal of the American Statistical Association, 79, 871-880. 55. Rousseeuw, P . J . and Leroy, A . M . (1987). Robust Regression and Outlier Detec- tion. New York: Wiley. 56. Rousseeuw, P . J . and Yohai , V . J . (1984). Robust regression by means of S- estimators. In Robust and Nonlinear Time Series. (J. Franke, W . Hardle and D . Mar t in , eds.). Lecture Notes in Statist, 26 256-272. Ber l in: Springer-Verlag. 57. Seber, G . A . F . (1984). Multivariate Observations. New York: John Wi ley and Sons. 58. Serfling, R . J . (1980). Approximation Theorems of Mathematical Statistics. New York: Wiley. 239 59. Shao, J . (1990). Bootstrap estimation of the Asymptot ic variances of statistical functionals. Annals of the Institute of Statistical Mathematics, 42, 737-752. 60. Shao, J . (1992). Bootstrap variance estimators wi th truncation. Statistics and Probability Letters, 15, 95-101. 61. Shorack, G . R . (1982). Bootstrapping robust regression. Communications in Statistics, Part A - Theory and Methods, 11, 961-972. 62. Simpson, A . , and Eden (1975). A Bayesian analysis of a multiplicative treat- ment effect in weather modification. Technometrics, 17, 161-166. 63. Singh, K . (1998). Breakdown theory for bootstrap quantiles. The Annals of Statistics, 26, 1719-1732. 64. Tanaka, H . (1973). A n inequality for a functional of probability distribution and its application to Kac 's one-dimensional model of a Maxwell ian gas. Zeitschrift fur Wahrscheinlichkeitstheorie und Verwandte Gebiete, 27, 47-52. 65. Venables, W . N . and Ripley, B . D . (1997). Modern applied statistics with S- PLUS. Second Edi t ion . New York: Springer. 66. Weins, D . R (1996). Asymptotics of generalized M-estimation of regression and scale wi th fixed carriers, in an approximately linear model. Statistics and Prob- ability Letters, 30, 271-285. 67. Weisberg, S. (1985). Applied Linear Regression. New York: Wiley. 68. W u , C . F . J . (1986). Jackknife, bootstrap and other re-sampling methods in regression analysis. The Annals of Statistics, 14, 1261-1295. 240 69. Yang, S-S (1985). O n bootstrapping a class of differentiable statistical func- t i o n a l with applications to L - and M-estimates. Statistica Neerlandica, 39, 375-385. 70. Yohai , V . J . (1985). High breakdown-point and high efficiency robust estimates for regression. Technical Report No. 66, Deptartment of Statistics, University of Washington, Seattle. 71. Yohai , V . J . (1987). High breakdown-point and high efficiency robust estimates for regression. The Annals of Statistics, 15, 642-656. 72. Yohai , V . J . and Maronna R . A . (1979). Asymptot ic behavior of M-estimators for the linear model. The Annals of Statistics, 7, 258-268. 73. Yohai , V . J . and Zamar, R. (1988). High breakdown point estimates of regression by means of the minimizat ion of an efficient scale. Journal of the American Statistical Association, 83, 406-413. 241
Cite
Citation Scheme:
Usage Statistics
Country | Views | Downloads |
---|---|---|
China | 3 | 0 |
City | Views | Downloads |
---|---|---|
Beijing | 3 | 0 |
{[{ mDataHeader[type] }]} | {[{ month[type] }]} | {[{ tData[type] }]} |
Share
Share to: