A COMPARISON BETWEEN SEVERAL ONE-STEP M-ESTIMATORS OF LOCATION AND DISPERSION IN THE PRESENCE OF A NUISANCE PARAMETER

by

EVE RAINVILLE

B.Sc. University of Ottawa, 1994

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

in

THE FACULTY OF GRADUATE STUDIES

Department of Statistics

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

August 1996

©Eve Rainville, 1996

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of
The University of British Columbia
Vancouver, Canada

Abstract

The idea of one-step estimators has long been used: Le Cam (1956), Neyman (1949) and Fisher (1922) proposed it in the context of maximum likelihood estimation. More recently, Bickel (1975) adapted this idea to robustness theory when he introduced one-step Huber M-estimators for simple linear models. Huber (1981) and Hampel et al (1986) further investigated the advantages of such one-step M-estimators: while retaining the robustness properties of their initial estimates, one-step M-estimators show increased efficiency, and thus represent a good compromise between robust and parametric estimation. Different versions of one-step M-estimators, some more numerically stable than others, have been proposed throughout the years. To our knowledge, no thorough comparison of the available one-step M-estimators has been done using modern techniques, as in Rousseeuw and Croux (1993a).
In this thesis, two versions of one-step M-estimators of location, obtained with the Newton-Raphson method, are studied in the context of unknown dispersion. Their asymptotic efficiencies at Gaussian and non-Gaussian models, as well as their maximum asymptotic biases, are compared. We also introduce two new one-step M-estimators of dispersion with unknown location, and challenge the traditional fixed-point one-step M-estimator of dispersion, originating from Huber (1981) and used by Rousseeuw and Croux (1993a). We identify the situations in which each of those three one-step M-estimators of dispersion is best used, based on their relative asymptotic efficiency at different models and on their explosion and implosion maximum asymptotic bias curves.

Table of Contents

Abstract
List of Tables
List of Figures
Acknowledgements
1 Introduction
2 M-Estimators: Maximum Likelihood Type of Estimators
  2.1 Univariate Problem
    2.1.1 Qualitative Robustness
    2.1.2 Infinitesimal Aspects
    2.1.3 Quantitative Robustness
  2.2 Nuisance Parameter in the Location-Dispersion Problem
3 One-Step M-Estimators
  3.1 General Idea of One-Step M-Estimators
  3.2 Estimation of Location With Unknown Dispersion
  3.3 Estimation of Dispersion With Unknown Location
  3.4 The First Step Is a Big Step
4 Location Estimation with Unknown Dispersion
  4.1 Introduction
  4.2 Asymptotic Efficiency of the MOSME of Location
    4.2.1 Our Choice of Underlying Distributions and of Score Functions
    4.2.2 The Asymptotic Variance of the Standard One-Step and the Fully-Iterated M-Estimators of Location
    4.2.3 Asymptotic Variance of the MOSME of Location
    4.2.4 The Asymptotic Variance of the MLE
    4.2.5 Asymptotic Efficiency of the MOSME Compared to That of the Standard One-Step (or Fully-Iterated) M-Estimator of Location
  4.3 Maximum Bias of the MOSME of Location
    4.3.1 Monotone Non-Decreasing Score Functions
    4.3.2 Redescending Score Functions
    4.3.3 Any Type of Score Function
    4.3.4 Further Work to Be Done
  4.4 An Example: Hummingbirds
  4.5 Conclusions
5 Dispersion Estimation with Unknown Location
  5.1 Introduction
  5.2 Relative Asymptotic Efficiency of the MOSME of Dispersion
    5.2.1 Our Choice of Underlying Distributions and of Score Functions
    5.2.2 Asymptotic Value of the MOSME, the One-Step M-Estimator and the τ-Estimator of Dispersion
    5.2.3 Asymptotic Variance of the MOSME of Dispersion
    5.2.4 Asymptotic Variance of the Standard One-Step M-Estimator of Dispersion
    5.2.5 Asymptotic Variance of the τ-Estimator of Dispersion
    5.2.6 Relative Asymptotic Efficiency of the MOSME Compared to That of the One-Step M-Estimator and the τ-Estimator of Dispersion
  5.3 Maximum Bias of the MOSME of Dispersion
    5.3.1 Further Work to Be Done
  5.4 Continuation of The Hummingbird Example
  5.5 Conclusions
Bibliography
A Normalization of Densities for Calculations of Asymptotic Efficiencies
B Derivation of the Influence Function of the MOSME of Location
C Derivation of the Influence Function of the MOSME of Dispersion
D Derivation of the Influence Function of the One-Step M-Estimator of Dispersion
E Derivation of the Influence Function of the τ-Estimator of Dispersion

List of Tables

1.1 Results of the Implementation of the Loop in Turbo Pascal
4.2 The Value of the Constant Denominator, $E_\Phi\,\psi'(z)$, in the Ratio Defining the Three MOSME's of Location Under Study
4.3 Asymptotic Variance V(MLE, F) of the MLE for Different Underlying Distributions F
4.4 Asymptotic Efficiency of the MOSME and the Standard One-Step M-Estimator of Location (Equivalent to That of the Fully-Iterated M-Estimator), Derived from Different Score Functions $\psi$ and for Different Underlying Distributions F.
The Asymptotic Efficiency of the Median, the Initial Estimator of Location, Is Provided for Comparison Purposes. The Normalized MAD Is Used As an Initial Estimator of Dispersion
4.5 Measures of Location of Flying Times of Four Types of Hummingbirds: Adult Females (AF), Adult Males (AM), Junior Females (JF) and Junior Males (JM)
5.6 The Value of the Constant Denominator, $E_\Phi\{\chi'(z)\,z\}$, in the Ratio Defining the Three MOSME's of Dispersion Under Study
5.7 Asymptotic Value of the MLE, the MOSME, the Standard One-Step M-Estimator and the τ-Estimator of Dispersion, As Well As the Normalized MAD and the Standard Deviation (SD), for Different Underlying Distributions F. The Initial Estimators of Dispersion and Location Used to Calculate the One-Step M-Estimators Are Respectively the Normalized MAD and the Median
5.8 Relative Asymptotic Variance RV(MLE, F) of the MLE for Different Underlying Distributions F
5.9 Relative Asymptotic Efficiency of the MOSME, the One-Step M-Estimator and the τ-Estimator of Dispersion Derived from Different Score Functions $\chi$, for Different Underlying Distributions F. The Relative Asymptotic Efficiency of the Initial Estimator of Dispersion, the Normalized MAD, Is Provided for Comparison Purposes. The Median Is Used as the Initial Estimator of Location
5.10 Measures of Dispersion of Flying Times of Four Types of Hummingbirds: Adult Females (AF), Adult Males (AM), Junior Females (JF) and Junior Males (JM). The MAD Is the Median Absolute Deviation Multiplied by the Inverse of $\Phi^{-1}(3/4)$
A.11 Normalizing Factor $d_0$ Needed to Standardize the Interquartile Range of Each Distribution F to That of the Standard Normal Distribution

List of Figures

4.1 Maximum Bias Function of the MOSME, and a Lower Bound for the Maximum Bias Function of the One-Step M-Estimator and the Fully-Iterated M-Estimator of Location, Derived from the $H_{1.345}$ Score Function
4.2
Maximum Bias Function of the MOSME, and a Lower Bound for the Maximum Bias Function of the One-Step M-Estimator and the Fully-Iterated M-Estimator of Location, Derived from the NCDF Score Function
4.3 Lower Bound on the Maximum Bias Function of the MOSME, the Standard One-Step and the Fully-Iterated M-Estimators of Location, Derived from the $T_{4.7}$ Score Function
4.4 Maximum Bias Function of the MOSME's of Location Derived from the $H_{1.345}$ and the NCDF Score Functions, and a Lower Bound on the Maximum Bias Function of the MOSME of Location Derived from the $T_{4.7}$ Score Function
4.5 Bar Plots of Flying Times of Four Types of Hummingbirds: Adult Females, Adult Males, Junior Females and Junior Males
5.6 Explosion Bias Curve of the MOSME, and a Lower Bound for the Explosion Bias Curve of the One-Step M-Estimator of Dispersion, Derived from the $H_{0.975}$ Score Function
5.7 Explosion Bias Curve of the MOSME, and a Lower Bound for the Explosion Bias Curve of the One-Step M-Estimator of Dispersion, Derived from the $H_{2.376}$ Score Function
5.8 Explosion Bias Curve of the MOSME, and a Lower Bound for the Explosion Bias Curve of the One-Step M-Estimator of Dispersion, Derived from the $T_{3.86}$ Score Function
5.9 Explosion Bias Curves of M-Estimators of Dispersion That Are 95% Efficient at the Normal Model, for Small Contamination by Outliers
5.10 Implosion Curve of M-Estimators of Dispersion. The MOSME and the One-Step M-Estimator Are Defined Through the $H_{2.376}$ Score Function, and the Tau-Estimator Through the $H_{2.516}$ Score Function. Those M-Estimators Are 95% Efficient at the Normal Model. The Normalized MAD Is Provided for Comparison Purposes.
5.11 Implosion Curve of M-Estimators of Dispersion. The MOSME and the One-Step M-Estimator Are Defined Through the $T_{3.86}$ Score Function, and the Tau-Estimator Through the $T_{5.3}$ Score Function. Those M-Estimators Are 95% Efficient at the Normal Model.
The Normalized MAD Is Provided for Comparison Purposes
5.12 Implosion Curve of MOSME's and τ-Estimators That Are 95% Efficient at the Normal Model

Acknowledgements

I am forever grateful to many individuals for helping me successfully complete this thesis. Thanks to my supervisor, Dr. Ruben Zamar, who introduced me to robustness theory and guided my thoughts and my research. Although I had no previous experience in robustness, he willingly offered me his financial support and provided me with as much time as I needed to feel comfortable with my thesis topic. I also wish to thank Dr. Paul Gustafson, who found time to revise my thesis in spite of his heavy professional and personal responsibilities.

I must express my gratitude towards Janet Moore, who happily provided me with data from her own research about hummingbirds.

Many friends have made my stay in the Statistics department and in Vancouver very enjoyable. Thank you Yulia, Xiaochun, Paige, Grace, Hubert, Shideh, Nancy, Karine, Mike, Didier, Cathy, Terry. I will always remember you.

But most of all, through his constant encouragement and help, my husband Marc Theberge really made a difference. Merci Marc for your availability and contagious curiosity, your great meals, your expertise which you were always eager to share. I couldn't have done it without you.

Chapter 1

Introduction

The importance of point estimation in statistics has long been established. Its usefulness in all the disciplines requiring statistical analysis is greater than ever. However, the classical techniques, mostly based on maximum likelihood, still encounter difficulties with less than perfect data, which can sometimes lead to disastrous conclusions.
In an attempt to solve the major problem of contamination of the data by outliers, and to adjust to a variety of possible underlying processes generating the data (the true one being impossible to determine), the statistical community developed what is now known as robustness theory.

Modest attempts at robustness go back at least two centuries. Simple and intuitive robust methods, such as rejection of outliers, were discussed by Bernoulli (1777) and Bessel and Baeyer (1838). In the 19th century and the beginning of the 20th century, other authors considered ways to partly downweight extreme observations, much in the spirit of modern robustness. Tukey (1960) summarized the statistical work of the 1940s and 1950s, demonstrated the nonrobustness of the mean and investigated some robust alternatives. His paper established robust estimation as a general area of research, and broke the isolation of the early pioneers. See the historical notes by Hampel (1986) (pp. 34-36) for a more complete review of early work in robustness.

Robustness theory was officially launched, however, in 1964, when Peter J. Huber made the first attempt at a reasonably manageable, realistic and comprehensive theory in his famous paper Robust Estimation of a Location Parameter (see [18]). In this paper, Huber introduced M-estimators of location as a generalization of maximum-likelihood type estimators, which include the mean and the median, among many others. More specifically, an M-estimator of location is the value $\theta$ which satisfies

$$\sum_{i=1}^{n} \psi(X_i - \theta) = 0, \qquad (1.1)$$

where $X_1, \ldots, X_n$ is a sample from the population with distribution $F(\theta)$, and $\psi$ is a score function defining the estimator.

Huber (1964) found the M-estimator of location which minimizes the maximum asymptotic variance among all location estimators in the symmetric family of $\varepsilon$-contaminated distributions

$$\mathcal{V}_\varepsilon(F_0) = \{F : F(x) = (1 - \varepsilon)F_0(x - \theta) + \varepsilon H(x)\},$$

where $0 < \varepsilon < 1/2$ is fixed and $H$ is symmetric.
This minimax estimator has been called, since then, the Huber M-estimator of location. Considering more generally the asymmetric family $\mathcal{V}_\varepsilon(F_0)$, where $H$ is allowed to be asymmetric, Huber (1964) also showed that among all translation equivariant estimators of location, the median minimizes the maximum asymptotic bias. After the publication of this paper, the mean, as an estimator of location, undoubtedly lost its momentum.

These two results illustrate two important concerns of robustness theory: the asymptotic efficiency of an estimator versus its asymptotic bias, or more generally, its robustness properties.

Following Huber's (1964) paper, a variety of robust estimators for dispersion¹, regression, general linear models, and more recently for their multivariate extensions, hypothesis testing, and other more complex statistical models such as time series, were proposed. Through standard techniques, as well as others developed specifically for robustness purposes, it was shown that these estimators offer competitive alternatives to maximum likelihood estimation, especially in the presence of (possibly) corrupted data.

Nevertheless, robustness never appeared to acquire the popularity that would make it a standard technique of estimation. The following two criticisms of robustness theory may explain why:

• There is generally a trade-off between robustness and efficiency - the more robust an estimator is, the less efficient it is, which necessarily affects the precision of the estimation; and

• Robust estimators often represent a computational challenge, which may require too much computing work and time to overcome.

For example, popular robust estimators such as the median, the MAD (Median Absolute Deviation) and the LMS (Least Median of Squares) are highly resistant to outliers.
However, the Gaussian efficiencies of these estimators are very low: the median and the MAD have respectively 63.7% and 36.7% efficiency (see Hampel et al (1986)), and the LMS regression estimator converges only at the $n^{1/3}$ rate (see Rousseeuw (1984)). Moreover, computing these robust estimators requires more time and memory than the computation of their maximum-likelihood counterparts.

¹ Note that the word dispersion in this thesis means what is usually referred to as scale. The dispersion corresponds to the spread of a distribution, whereas the scale is a measure of distance between the center of a distribution and 0, and therefore varies with location. The distinction between the two concepts is just now starting to be made.

In answer to those two criticisms of robustness, Bickel (1975) adapted an old idea of the statistical literature to robust M-estimation. Le Cam (1956), Neyman (1949) (see [22]) and Fisher (1922), in the early years of modern statistics, had observed that in the univariate estimation of location setup, "if F is known, and $\psi = (-f'/f)$, the estimate obtained by starting with a $\sqrt{n}$ consistent estimate of $\theta$ and performing one Gauss-Newton iteration of (1.1) is asymptotically efficient even when the MLE is not and is equivalent to it when it is" (see [5], p. 428). In times when computers were still a dream, less effort to compute an estimator, with no apparent loss in its asymptotic properties, represented a major advantage.

Inspired by the observation of Fisher, Neyman and Le Cam, Bickel (1975) proposed a one-step Huber M-estimator to be used in the estimation of simple linear models, such as location and regression through the origin. The author showed that the estimator is asymptotically normal under mild conditions.

In his book, Huber (1981) gave explicit expressions for one-step M-estimators of location with unknown dispersion (p. 140 and p.
146), as the first step of Newton's method, starting with preliminary robust estimates of location and dispersion. The author further showed that if the initial estimate of location is consistent for $\theta$, $F$ is symmetric and the score function defining the estimator is odd, then this one-step M-estimator of location is asymptotically equivalent to the fully-iterated M-estimator of location with preliminary dispersion. In the special case of dispersion estimation with unknown location, Huber (1981) (p. 147) also suggested a fixed-point iterative method for computing the estimate, the first step of which may serve as a one-step M-estimator of dispersion. Hampel et al (1986) (p. 106) further stressed the importance of selecting robust preliminary estimates when computing one-step M-estimators, as otherwise the resulting estimators may not be robust (which was also observed by Andrews et al in [1]).

In all cases, the enthusiasm created by one-step estimators was contagious: those estimators were easy to compute, but most importantly, they represented a good balance between robustness and efficiency. Being one step away from robust initial estimators, they retained the robustness properties of their initial estimators. At the same time, being one step closer to their fully-iterated version, the one-step estimators were almost as efficient as that version.

In the univariate location setup, Andrews et al (1972) (Bickel (1975) had submitted his paper in 1971 and was one of the authors of [1]) showed at some length that for certain score functions, one-step M-estimators were very well behaved. Jureckova and Portnoy (1987) further investigated the use of one-step regression M-estimators based on the LMS to obtain high efficiency.

While one-step M-estimators solve the possible lack of uniqueness of the solution of (1.1) and reduce the computational effort, they can sometimes be numerically unstable.
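To make the Newton-step construction above concrete, the following is a minimal sketch of a one-step Huber M-estimator of location with unknown dispersion, in two flavours: one with a data-dependent denominator and one whose denominator is replaced by its expectation under the standard normal model. It is an illustration under stated assumptions, not the thesis's own implementation: the function names, the tuning constant k = 1.345, the choice of the median and the normalized MAD as initial estimates, and the toy data are all hypothetical choices made for this sketch.

```python
from statistics import NormalDist, median


def huber_psi(r, k=1.345):
    """Huber score function: the identity on [-k, k], clipped outside."""
    return max(-k, min(k, r))


def normalized_mad(xs):
    """MAD divided by Phi^{-1}(3/4), so that it estimates sigma at the normal model."""
    m = median(xs)
    abs_dev = [abs(x - m) for x in xs]
    return median(abs_dev) / NormalDist().inv_cdf(0.75)


def one_step_location(xs, k=1.345, constant_denominator=False):
    """One Newton step from (median, normalized MAD) towards the Huber M-estimate.

    Assumes the normalized MAD is strictly positive (fewer than half of the
    observations are tied at the median).
    """
    t0 = median(xs)
    s0 = normalized_mad(xs)
    residuals = [(x - t0) / s0 for x in xs]
    numerator = sum(huber_psi(r, k) for r in residuals)
    if constant_denominator:
        # n * E_Phi[psi'(Z)] = n * P(|Z| <= k) under the standard normal model.
        denominator = len(xs) * (2.0 * NormalDist().cdf(k) - 1.0)
    else:
        # Sum of psi'(r_i): for the Huber score, the count of residuals in [-k, k].
        denominator = sum(1.0 for r in residuals if abs(r) <= k)
    return t0 + s0 * numerator / denominator


data = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 50.0]
print(one_step_location(data))
print(one_step_location(data, constant_denominator=True))
```

In the data-dependent version the denominator is simply the number of standardized residuals falling inside [-k, k], so it can be small (or even zero) for unlucky samples, whereas the constant version never is.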
For instance, the standard one-step M-estimator of location with unknown dispersion contains a ratio whose denominator can become very small for certain samples and score functions. To address this problem, Hampel et al (1986) suggested different versions of one-step M-estimators of location (pp. 152-153), one of which replaces the denominator by a constant. Similarly, the following example (reproduced from [7], p. 10) illustrates that the fixed-point iterative method can sometimes lead to disastrous results, which raises the question of whether the one-step M-estimator of dispersion with unknown location, as introduced by Huber (1981), is stable enough to be trusted.

Example. The value $x = 1/n$ is a fixed point of the function given by

$$f(x) = (n + 1)x - 1,$$

since

$$f(1/n) = \frac{n + 1}{n} - 1 = \frac{1}{n}.$$

It follows that iterating this function for any particular value of $n$ using the following loop should, if the floating-point arithmetic of the computer were exact, simply result in $x = 1/n$:

    x := 1/n
    for i = 1 to 30
        x := (n + 1) * x - 1

We use here the symbol := for computer assignment, so 'x := 1/n' means 'Assign the value 1/n to the stored variable x'. Table 1.1 shows the results of implementing such a loop in Turbo Pascal for different values of $n$. We see that for the various powers of 2 the arithmetic is indeed exact. By contrast, for other values of $n$ the error grows steadily from approximately $-3.5 \times 10^{5}$ for $n = 3$ through about $-2.3 \times 10^{18}$ for $n = 10$. These errors are the direct result of the propagation of the rounding error made in the binary representation of $1/3$ and $1/10$, respectively.

Bickel (1975) had most likely foreseen the numerical instability problem. He actually proposed two types of one-step Huber M-estimators of regression. His Type II estimator was a smooth version of his Type I, in which one term was replaced by its asymptotic expectation. As much as robustness is concerned with outliers, it must never ignore important issues such as numerical stability.
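The loop in the example above is easy to try in any language with binary floating-point numbers. The short sketch below does so in Python; the function name is hypothetical, and because Python uses IEEE double precision rather than Turbo Pascal's arithmetic, the individual numbers should not be expected to match those reported for Turbo Pascal. Only the qualitative pattern (exact for powers of 2, rapidly growing error otherwise) is what the sketch is meant to show.

```python
def iterate_fixed_point(n, steps=30):
    """Apply x <- (n + 1) * x - 1 `steps` times, starting from x = 1/n.

    In exact arithmetic the result is 1/n, since 1/n is a fixed point.
    """
    x = 1.0 / n
    for _ in range(steps):
        x = (n + 1) * x - 1
    return x


for n in range(1, 17):
    final_x = iterate_fixed_point(n)
    print(f"n = {n:2d}   final x = {final_x: .14e}   exact 1/n = {1.0 / n:.14e}")
```

Each pass multiplies whatever error is already in x by n + 1, so after 30 passes the initial representation error of 1/n has been amplified by roughly (n + 1)^30; this is why the starting value matters so much for a fixed-point scheme.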
Table 1.1: Results of the Implementation of the Loop in Turbo Pascal

     n    Final x
     1     1.000 000 000 000 00 E+0000
     2     5.000 000 000 000 00 E-0001
     3     1.747 630 000 000 00 E+0005
     4     2.500 000 000 000 00 E-0001
     5    -4.021 311 173 693 75 E+0010
     6    -1.952 324 816 734 00 E+0012
     7    -4.021 071 095 865 60 E+0013
     8     1.250 000 000 000 00 E-0001
     9    -1.616 879 469 807 53 E+0017
    10    -2.308 383 841 816 94 E+0018
    11    -1.962 659 088 425 60 E+0019
    12    -2.443 971 442 609 19 E+0020
    13    -1.935 039 516 698 14 E+0021
    14    -1.328 735 789 941 45 E+0022
    15     7.052 067 281 085 79 E+0022
    16     6.250 000 000 000 00 E-0002

With the development of modern techniques, such as influence functions and maximum bias curves, which allow a more complete study of one-step M-estimators, these estimators can be better understood and appreciated. Maximum bias curves occurred briefly in Hampel et al (1986) (pp. 176-177), but were used to their full advantage by Martin and Zamar (1989), Martin et al (1989) and He and Simpson (1993). The influence function as a tool was itself developed by Hampel starting in 1974 and carefully discussed in [15].

Using those techniques, Rousseeuw and Croux (1993a) studied the bias properties of k-step Huber M-estimators (the kth step of the iterative algorithm) in the univariate location and dispersion setup. They showed that while the efficiency increases with the number of steps, the bias also increases. This led the authors to recommend the use of one-step or two-step M-estimators, but not more, especially in multiparameter problems where the dispersion is typically unknown.

Rousseeuw and Croux (1993a) preferred the one-step M-estimator of location suggested by Hampel et al (1986) (p. 153), which is derived from the standard one-step M-estimator, but has a constant denominator in its ratio. To estimate univariate dispersion, the authors used the fixed-point one-step M-estimator suggested by Huber (1981) (p.
147). However, to our knowledge, no formal comparison of all the available one-step M-estimators has been made using modern techniques, one which would allow the choice of one estimator over another depending on the statistical situation at hand.

The goal of this thesis is to offer such a comparison, in two contexts: estimation of location with unknown dispersion, and estimation of dispersion with unknown location. The standard one-step M-estimator of location, as suggested by Huber (1981) (p. 146), will be compared to the one-step M-estimator used by Rousseeuw and Croux (1993a) in the general setup of unknown dispersion. To estimate dispersion with unknown location, two estimators are derived using the Newton-Raphson method, and compared to the standard, so far unchallenged, fixed-point one-step M-estimator of dispersion, used for example by Rousseeuw and Croux (1993a).

Chapter 2 presents the general theory of M-estimators as initiated by Huber (1964). Chapter 3 formally presents the one-step M-estimators that will be compared in the following two chapters. In Chapter 4, we study the asymptotic properties of the two one-step M-estimators of location that are of interest; that is, we study their efficiency under Gaussian and non-Gaussian models (following the approach of Rousseeuw and Croux (1993b)), as well as their maximum asymptotic bias. Using the same robustness techniques, the three one-step M-estimators of dispersion with unknown location that are of interest are compared in Chapter 5 in terms of their asymptotic behaviour.

Chapter 2

M-Estimators: Maximum Likelihood Type of Estimators

2.1 Univariate Problem

Let $X_1, \ldots, X_n$ be a sample from the population with distribution function $F(x; \theta)$. Any estimate $T_n$ which minimizes an expression of the form

$$\sum_{i=1}^{n} \rho(x_i; T_n), \qquad (2.1)$$

or which is defined by the implicit equation

$$\sum_{i=1}^{n} \psi(x_i; T_n) = 0, \qquad (2.2)$$

where $\rho$ is an appropriate loss function and $\psi(x; \theta) = (\partial/\partial\theta)\rho(x; \theta)$ is a score function, is called an M-estimator of $\theta$.
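As a small numerical illustration of the implicit equation (2.2), the sketch below solves it by bisection in the simplest case. Everything specific here is an assumption of the sketch rather than something taken from the thesis: the location form $\psi(x; \theta) = \psi(x - \theta)$, the Huber score with k = 1.345, a dispersion fixed at 1, the function names and the toy sample. Bisection works because the sum of scores is monotone non-increasing in the candidate value.

```python
def huber_psi(r, k=1.345):
    """Huber score function: the identity on [-k, k], clipped outside."""
    return max(-k, min(k, r))


def score_sum(xs, t, k=1.345):
    """Left-hand side of the estimating equation, sum of psi(x_i - t)."""
    return sum(huber_psi(x - t, k) for x in xs)


def m_estimate_location(xs, k=1.345, tol=1e-12):
    """Solve sum psi(x_i - t) = 0 for t by bisection on [min(xs), max(xs)].

    The score sum is >= 0 at the sample minimum and <= 0 at the sample
    maximum, and it is non-increasing in t, so a root is bracketed.
    """
    lo, hi = min(xs), max(xs)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score_sum(xs, mid, k) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


sample = [2.1, 2.9, 3.4, 3.8, 4.4, 5.0, 19.0]
t_hat = m_estimate_location(sample)
print(t_hat, score_sum(sample, t_hat))
```

On this sample the estimate lands between the median (3.8) and the mean (5.8), which reflects the bounded influence that the clipped score gives to the observation at 19.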
This estimator is an extension of the usual maximum likelihood estimator (MLE) of $\theta$: the choice $\rho(x; \theta) = -\log f(x; \theta)$ corresponds exactly to the MLE minimization problem.

For example, when the parameter $\theta$ to be estimated is a location parameter, one sets $\psi(x; T_n) = \psi(x - T_n)$. When the parameter $\sigma$ to be estimated is a dispersion parameter, the function $\psi$ used is $\psi(x; \sigma) = \psi(x/\sigma)$. Note that the M-estimator is not modified when $\psi$ is multiplied by any constant $r > 0$.

2.1.1 Qualitative Robustness

It is easy to study the asymptotic properties of M-estimators when $\psi(x; \theta)$ is monotone in $\theta$. Assume that

• $\psi(x; \theta)$ is non-increasing in $\theta$;
• $\psi(x; \theta)$ is measurable in $x$;
• $\psi(x; -\infty) > 0$ and $\psi(x; \infty) < 0$;

and define

• $\lambda_F(t) = E_F\{\psi(x; t)\}$;
• $T_n^{*} = \sup\{t : \sum_{i=1}^{n} \psi(x_i; t) > 0\}$;
• $T_n^{**} = \inf\{t : \sum_{i=1}^{n} \psi(x_i; t) < 0\}$.

Consistency

M-estimators are consistent under some conditions. Huber (1981) has shown that if there is a $t_0(F)$ such that

$$\lambda_F(t) > 0 \;\; \forall\, t < t_0(F), \qquad \lambda_F(t) < 0 \;\; \forall\, t > t_0(F),$$

then $T_n^{*} \to t_0(F)$ and $T_n^{**} \to t_0(F)$ almost surely [F] and in probability [F]. Any value $T_n$ satisfying $T_n^{*} \le T_n \le T_n^{**}$ can serve as an M-estimator, which will be consistent since asymptotically $T_n^{*} = T_n^{**}$.

Asymptotic Distribution

Assume furthermore that

• there exists at least one $t_0(F)$ such that $\lambda_F(t_0) = 0$;
• $\lambda_F(t)$ is continuously differentiable in a neighbourhood of $t_0(F)$ and $\lambda_F'(t_0) < 0$; and
• $\sigma_F^2(t) = E_F\,\psi^2(X; t) - \lambda_F^2(t)$ is finite, nonzero and continuous near $t_0(F)$.

Under those assumptions, M-estimators $T_n$ are asymptotically normally distributed, that is,

$$\sqrt{n}\,(T_n - t_0) \to N(0, V(\psi, F)), \qquad \text{where } V(\psi, F) = \frac{\sigma_F^2(t_0)}{[\lambda_F'(t_0)]^2}.$$

See Huber (1981) (pp. 49-50) for the complete details of this proof.
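The asymptotic variance formula above can be evaluated in closed form in simple cases. The sketch below does this for a Huber location score at the standard normal model; the function names and the tuning constant 1.345 are assumptions made for illustration, and the closed-form moments used in the code follow from integrating by parts against the normal density. In the location case $\lambda_F'(t_0) = -E_F\,\psi'$, so the squared denominator is $(E_F\,\psi')^2$.

```python
from math import pi
from statistics import NormalDist

STD_NORMAL = NormalDist()  # the central model, N(0, 1)


def huber_asymptotic_variance(k):
    """V(psi, Phi) = E[psi(Z)^2] / (E[psi'(Z)])^2 for the Huber score with constant k."""
    inside = 2.0 * STD_NORMAL.cdf(k) - 1.0   # P(|Z| <= k) = E[psi'(Z)]
    tail = 1.0 - STD_NORMAL.cdf(k)           # P(Z > k)
    # E[psi^2] = E[Z^2; |Z| <= k] + k^2 * P(|Z| > k)
    second_moment = inside - 2.0 * k * STD_NORMAL.pdf(k) + 2.0 * k * k * tail
    return second_moment / inside ** 2


def median_asymptotic_variance():
    """Asymptotic variance of the sample median at N(0, 1): 1 / (4 * phi(0)^2) = pi / 2."""
    return 1.0 / (4.0 * STD_NORMAL.pdf(0.0) ** 2)


# Efficiency relative to the MLE (the mean), whose asymptotic variance is 1 at N(0, 1).
print("Huber, k = 1.345:", 1.0 / huber_asymptotic_variance(1.345))
print("median          :", 1.0 / median_asymptotic_variance(), "(= 2/pi)", 2.0 / pi)
```

The median's value of 2/pi, about 63.7%, agrees with the Gaussian efficiency quoted in Chapter 1, and letting k grow recovers the variance of the mean.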
The fact that M-estimators are asymptotically consistent and normally distributed has certainly contributed to their success, since it becomes so easy to make inference under these conditions.

In most cases, $T_n$ is a function of the empirical distribution $F_n$ and derives from a functional $T$, that is, $T_n = T(F_n)$. If $T_n$ is consistent, then $T_n \to T(F)$ in probability [F]. The discussion that follows will adopt the functional notation. An M-estimator $T_n$ will therefore be referred to as $T(F)$.

2.1.2 Infinitesimal Aspects

The Influence Function

Beyond qualitative properties, it is informative to study the behaviour of M-estimators under infinitesimal changes. In 1968, Hampel introduced the influence function, which "describes the effect of an infinitesimal contamination at the point x on the estimate, standardized by the mass of the contamination. One could say it gives a picture of the infinitesimal behavior of the asymptotic value, so it measures the asymptotic bias caused by contamination in the observations." (see [15], p. 84) When Hampel first introduced this concept (1968, 1974), he referred to it as the influence curve; however, the term influence function is now widely preferred in view of the generalizations to higher dimensions.

More precisely, the influence function of the functional $T(F)$ is

$$IF(x; T, F) = \lim_{t \to 0} \frac{T(F_{t,x}) - T(F)}{t},$$

where $F_{t,x} = (1 - t)F + t\delta_x$. In other words, it is defined as the derivative with respect to $t$ of the functional $T(F_{t,x})$, evaluated at $t = 0$.

Under the regularity assumptions listed in the previous section, it is easy to show that the influence function of any univariate M-estimator, defined by the score function $\psi$, has the following form:

$$IF(x; T, F) = \frac{\psi(x; T(F))}{-E_F\{(\partial/\partial\theta)\,\psi(x; \theta)\big|_{\theta = T(F)}\}}. \qquad (2.3)$$

See Huber (1981) (p. 45) for more details on this derivation. Notice that the influence function of an M-estimator is proportional to $\psi$.
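For readers who want the intermediate step, here is a sketch of the usual argument behind (2.3). It assumes that differentiation and integration can be interchanged, writes $F_{t,x} = (1 - t)F + t\delta_x$ for the contaminated distribution, and uses the defining property $\int \psi(y; T(F))\,dF(y) = 0$ of the functional; it is an outline of the standard derivation, not a reproduction of the cited proof.

```latex
\begin{align*}
0 &= \int \psi\bigl(y; T(F_{t,x})\bigr)\, dF_{t,x}(y)
   = (1 - t)\int \psi\bigl(y; T(F_{t,x})\bigr)\, dF(y)
     + t\,\psi\bigl(x; T(F_{t,x})\bigr), \\[4pt]
0 &= \frac{d}{dt}\Bigl[\,\cdot\,\Bigr]_{t=0}
   = -\underbrace{\int \psi\bigl(y; T(F)\bigr)\, dF(y)}_{=\,0}
     + E_F\Bigl\{\tfrac{\partial}{\partial\theta}\psi(y;\theta)\Big|_{\theta = T(F)}\Bigr\}\,
       IF(x; T, F)
     + \psi\bigl(x; T(F)\bigr), \\[4pt]
IF(x; T, F) &= \frac{\psi\bigl(x; T(F)\bigr)}
   {-E_F\Bigl\{\tfrac{\partial}{\partial\theta}\psi(y;\theta)\Big|_{\theta = T(F)}\Bigr\}} .
\end{align*}
```

The last line makes the proportionality to $\psi$ explicit: the denominator does not depend on the contamination point $x$.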
In the special case of a univariate location problem, the score function defining the M-estimator is $\psi(x; \theta) = \psi(x - \theta)$, and we obtain

$$IF(x; T, F) = \frac{\psi(x - T(F))}{E_F\,\psi'(x - T(F))}.$$

Similarly, the influence function of a univariate dispersion M-estimator $S(F)$, for which the score function has the form $\chi(x; \sigma) = \chi(x/\sigma)$, will be

$$IF(x; S, F) = \frac{\chi(x/S(F))\,S(F)}{E_F\{\chi'(x/S(F))\,(x/S(F))\}}.$$

Hampel et al (1986) have shown that under regularity conditions, the knowledge of the influence function of an M-estimator is equivalent to the knowledge of its asymptotic distribution. Indeed, using a Taylor expansion of $T_n(F_n)$ around $T(F)$, the authors showed that the asymptotic variance of an estimator is completely defined by its influence function, since

$$V(T, F) = E_F\{IF(x; T, F)^2\}. \qquad (2.4)$$

The influence function is a priori a heuristic tool. It is easy to calculate, and therefore makes it easy to obtain an expression for the asymptotic variance of an estimator. The regularity assumptions for (2.4) are cumbersome to verify, so one usually tries to prove normality using another method. Nevertheless, in all practical cases, the relation (2.4) holds. The influence function approach will be used in the next chapters when deriving an expression for the asymptotic variance of the estimators of interest.

The Gross-Error Sensitivity

The influence function of an estimator can be summarized in many ways other than its expected square. The most important is probably the supremum of its absolute value. Hampel (1968, 1974) introduced this notion as the gross-error sensitivity of an estimator. More precisely, one defines the gross-error sensitivity of $T$ at $F$ by

$$\gamma^{*}(T, F) = \sup_{x} |IF(x; T, F)|,$$

for values of $x$ where the influence function exists. The gross-error sensitivity measures the worst influence which a small amount of contamination of fixed size can have on the value of the estimator.
It could therefore be regarded as an upper bound on the standardized asymptotic bias of the estimator. If $\gamma^{*}(T, F)$ is finite, we say that $T$ is B-robust at $F$, where the B comes from bias. In view of (2.3), an M-estimator is B-robust at $F$ if and only if $\psi(\cdot\,; T(F))$ is bounded.

Putting a bound on $\gamma^{*}(T, F)$ is often the first step in robustifying an estimator. However, in many cases, this will conflict with the goal of asymptotic efficiency. The introduction of one-step M-estimators brings a partial solution to this problem, as will be seen in the next chapter.

2.1.3 Quantitative Robustness

The influence function represents an excellent tool for assessing the local asymptotic behaviour of an estimator. However, it must be complemented by a measure of global reliability, which describes up to what distance from the model distribution the estimator still gives some relevant information.

Consider the $\varepsilon$-contaminated neighbourhood of $F_0$:

$$\mathcal{V}_\varepsilon(F_0) = \{F : F = (1 - \varepsilon)F_0 + \varepsilon H\},$$

where $H$ is an arbitrary distribution and $0 < \varepsilon < 1/2$, so that it is possible to distinguish between the central model $F_0$ and the contaminating distribution $H$. In what follows, we will present two measures of distance from the central model $F_0$ that should stay as small as possible, for $\varepsilon$ as large as possible, for an estimator to be considered robust.

The Breakdown Point

Hampel (1971) introduced the notion of breakdown point of an estimator, generalizing a definition by Hodges (1967). Roughly speaking, the breakdown point of an estimator is the maximum $\varepsilon$-contamination that an estimator can endure before its value goes to infinity. It gives the limiting fraction of bad outliers the estimator can cope with.

For any estimator, the maximum breakdown point is 50%. Robust M-estimators such as the median or the median absolute deviation (MAD) have 50% breakdown point. On
the other hand, the mean, as an estimator of location, has 0% breakdown point: it is completely intolerant to outliers. When $F$ is symmetric, one usually chooses odd score functions $\psi$ to estimate location. Hampel (1971) and Huber (1981) (section 3.2) have shown that if $\psi$ is moreover strictly monotone and bounded, the M-estimator defined by $\psi$ is B-robust and has 50% breakdown point. On the other hand, M-estimators are not B-robust and have 0% breakdown point when $\psi$ is strictly monotone but unbounded.

There exists an asymptotic version of the breakdown point, as well as a finite-sample version. In what follows, to accompany the functional (limiting) notation, we will use the asymptotic version, as in Hampel (1971) and Huber (1981). See [9], [10], [17] and [20] for a detailed presentation of the finite-sample version of the breakdown point.

The Maximum Bias Function

A high breakdown point is a necessary condition for a good estimating method, but not a sufficient condition (see [29], p. 877). Many argue that the breakdown point is not as general a measure as is claimed. To deal with the matter, Hampel et al. (1986) (pp. 176-177) briefly introduced maximum bias functions, but this measure of robustness was fully exploited by Martin and Zamar (1989), Martin et al. (1989) and He and Simpson (1993). The maximum bias of an estimator (as a function of $\epsilon$) describes how much an estimator $T(F)$ can change in $V_\epsilon(F_0)$ due to a given fraction $\epsilon$ of contamination.

In the univariate location setup, the maximum bias function of an estimator $T(F)$ is formally defined as

$$B_T(\epsilon) = \sup_{F \in V_\epsilon(F_0)} |T(F) - T(F_0)|.$$

However, in the univariate dispersion setup, one needs to generalize the concept of maximum bias function. Martin and Zamar (1989) have observed that the presence of
outliers can cause the estimate of dispersion to be very large, but also that too many inliers may cause the estimate to be dangerously close to 0. They defined the generalized maximum bias as the maximum between the bias due to outliers and the bias due to inliers. Moreover, they allowed the (monotone) penalizations for inliers and outliers to be chosen independently, because in some setups one of the two may cause more trouble than the other. Formally, the generalized maximum bias can be defined by

$$B_S(\epsilon) = \max_{F \in V_\epsilon(F_0)} B[S(F)],$$

where

$$B[S(F)] = \begin{cases} L_1[S(F)/S(F_0)] & \text{if } 0 < S(F) \le S(F_0), \\ L_2[S(F)/S(F_0)] & \text{if } S(F_0) < S(F) < \infty, \end{cases}$$

and where $L_1$ and $L_2$ are continuous, nonnegative and monotone (penalization) loss functions, with $L_1(1) = L_2(1) = 0$ and $\lim_{t \to 0} L_1(t) = \lim_{t \to \infty} L_2(t) = \infty$. A popular choice for a loss function is the logarithmic function. From the monotonicity of $L_1$ and $L_2$, it follows that

$$B_S(\epsilon) = \max\{L_1[S^-/S(F_0)],\; L_2[S^+/S(F_0)]\},$$

where $S^-$ and $S^+$ denote the infimum and the supremum of the functional $S(F)$ as $F$ ranges over $V_\epsilon(F_0)$. Therefore, it is enough to concentrate on $S^-$ and $S^+$ when studying the generalized maximum bias of an estimator. We shall call $S^+/S(F_0)$ the explosion maximum bias of an estimator, and $S^-/S(F_0)$ the implosion maximum bias of an estimator, both being functions of $\epsilon$.

Finally, note that the dependence of the explosion and implosion maximum bias functions of an estimator on the ratio $S(F)/S(F_0)$ is justified in terms of dispersion invariance (see [24], p. 134).

The maximum bias function of an estimator includes its gross-error sensitivity (the slope of the curve at $\epsilon = 0$), and also the breakdown point, where the curve goes to infinity. The maximum bias curve is therefore an additional, and more complete, tool that can help to choose between competing estimators.
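As an illustration of how a maximum bias curve carries both the gross-error sensitivity and the breakdown point, the following minimal sketch evaluates the curve of the median at the standard normal model. It relies on the classical closed form $B(\epsilon) = \Phi^{-1}\left(1/(2(1-\epsilon))\right)$ for the median (see Huber (1981)); the function and constant names are illustrative choices, not notation from this thesis.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def median_max_bias(eps: float) -> float:
    """Maximum asymptotic bias of the median over the eps-contaminated
    neighbourhood of the standard normal: Phi^{-1}(1 / (2 (1 - eps)))."""
    if not 0.0 <= eps < 0.5:
        raise ValueError("eps must lie in [0, 1/2)")
    return STD_NORMAL.inv_cdf(1.0 / (2.0 * (1.0 - eps)))

# Gross-error sensitivity of the median at the normal model:
# 1 / (2 * phi(0)) = sqrt(pi / 2), the slope of the bias curve at eps = 0.
GES_MEDIAN = math.sqrt(math.pi / 2.0)

if __name__ == "__main__":
    for eps in (0.01, 0.05, 0.10, 0.25, 0.40, 0.49):
        exact = median_max_bias(eps)
        linear = GES_MEDIAN * eps  # linear approximation through the origin
        print(f"eps={eps:.2f}  max bias={exact:.4f}  linear approx={linear:.4f}")
```

For small $\epsilon$ the two printed columns nearly agree, while the exact curve grows without bound as $\epsilon$ approaches 1/2, the breakdown point of the median.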
For a small range of $\epsilon$, some use the gross-error sensitivity of an estimator as a linear approximation to the curve (see Hampel et al. (1986) for a rule of thumb on the values of $\epsilon$ for which the approximation holds).

2.2 Nuisance Parameter in the Location-Dispersion Problem

In many cases, the underlying distribution of the population from which a sample is taken has more than one parameter. For example, the location-dispersion families typically have distributions of the form $F\left(\frac{x - \theta}{\sigma}\right)$, where $-\infty < \theta < \infty$ and $\sigma > 0$. If the interest is still to estimate only one parameter, then the other becomes a nuisance parameter.

It must also be observed that typical M-estimators of location are in practice location invariant, but not dispersion invariant. Therefore, when estimating location, one needs to provide an M-estimator of dispersion as well. The unknown dispersion is, per se, a nuisance parameter. The same can be said about dispersion M-estimators: they are not location invariant, and thus need an ancillary location estimator. If $\psi$ and $\chi$ are the respective score functions of the M-estimators of location and dispersion, then the simultaneous equations defining them implicitly are

$$\frac{1}{n}\sum_{i=1}^{n} \psi\left(\frac{x_i - T_n}{S_n}\right) = 0 \tag{2.5}$$

and

$$\frac{1}{n}\sum_{i=1}^{n} \chi\left(\frac{x_i - T_n}{S_n}\right) = 0. \tag{2.6}$$

From a finite-sample point of view, when estimating location with unknown dispersion, one would compute some M-estimate of dispersion, $S_n$, and then find the location estimate $T_n$ using (2.5). If, on the other hand, one is interested in estimating dispersion when the location is unknown, then one would solve (2.6) for $S_n$, with some preliminary estimate of location $T_n$. In functional terms, the M-estimator of location $T(F)$ with unknown dispersion, and the M-estimator of dispersion $S(F)$ with unknown location, can be expressed implicitly by

$$E_F\,\psi\left(\frac{x - T(F)}{S_0(F)}\right) = 0$$

and

$$E_F\,\chi\left(\frac{x - T_0(F)}{S(F)}\right) = 0,$$

where $S_0(F)$ and $T_0(F)$ are the asymptotic values of the initial estimator used for the nuisance parameter.
Fortunately, if the underlying distribution $F$ is symmetric, $\psi$ is odd and $\chi$ is even, the location functional $T(F)$ and the dispersion functional $S(F)$ are independent: the asymptotic variance of $T$ depends on $S$ only through its asymptotic value $S(F)$, and that of $S(F)$ does not depend on $T(F)$. More precisely, the influence functions of $T$ and $S$, respectively $IF(x; T, F)$ and $IF(x; S, F)$, are:

$$IF(x; T, F) = \frac{S(F)\,\psi(x/S(F))}{E_F\,\psi'(x/S(F))}$$

and

$$IF(x; S, F) = \frac{S(F)\,\chi(x/S(F))}{E_F\left[\chi'(x/S(F))\,(x/S(F))\right]}.$$

(Remember that $T(F) = 0$ when $F$ is symmetric.) Under regularity assumptions similar to those listed in the preceding section, the M-estimators of location and dispersion defined by (2.5) and (2.6) have asymptotic normal distributions with variances equal to the expected squares of their influence functions. Therefore, when estimating location with unknown dispersion, one can choose the estimator of dispersion on criteria other than low variability, which considerably enlarges the set of possible candidates. Similarly, the asymptotic variability of an M-estimator of dispersion will not be affected by the estimate of the unknown location parameter, no matter which estimator is chosen.

The maximum bias of an M-estimator of location with unknown dispersion is defined exactly as in the univariate setup. However, one must remember that $T$ is defined implicitly as a function of $S$, which necessarily causes the bias of $T$ to be affected by the bias of $S$ in an $\epsilon$-contaminated neighbourhood such as $V_\epsilon(F_0)$. Clearly, the maximum bias of $T$ will be higher in the presence of a nuisance parameter than with no nuisance parameter.

Similarly, the maximum explosion bias of an M-estimator of dispersion with unknown location, defined as in the univariate setup, will be implicitly affected by the estimator of location, and increased accordingly. On the other hand, the maximum implosion bias is not affected by the estimate of the unknown location, compared to the implosion bias of the univariate estimator of dispersion.
This is due to the fact that when $S \to 0$, the location estimate tends to the fixed value $T(F_0)$.

Solving the non-linear equations (2.5) and (2.6) requires a numerical method, which may not be numerically feasible in some situations. One-step M-estimators have been proposed to solve this problem. The next chapter presents these estimators in the context of location and dispersion estimation with a nuisance parameter.

Chapter 3

One-Step M-Estimators

3.1 General Idea of One-Step M-Estimators

The computation of an M-estimator requires an iterative algorithm, since the estimator is defined implicitly by a non-linear equation. Moreover, the presence of a nuisance parameter adds an extra dimension to the problem, being itself the implicit solution of another non-linear equation. Indeed, in multiparameter problems, you can never be sure that a root exists until you find it: there is no bracketing-the-root-by-fancy-algorithms possible.

The computation involved often proves to be difficult, as any iterative method can be subject to many problems. For example, the Newton-Raphson algorithm, often used in practice, will diverge if it encounters a local extremum in its search for a root (see [28], p. 362, for a graphical explanation of this problem).

Bickel (1975) suggested, as an alternative to the fully-iterated M-estimator, what he referred to as the one-step M-estimator. This estimator corresponds to the first iteration solution of the algorithm used to solve the non-linear equation defining the estimator, as Huber (1981) (section 6.7) further investigated.

In the case where the objective is to estimate the parameter $\theta$ of a distribution in the presence of a nuisance parameter, one would first choose preliminary estimates for both parameters. The nuisance parameter would then be considered fixed and equal to its preliminary estimate.
To get the one-step M-estimator of $\theta$, one would simply perform one iteration in the algorithm used to solve for the fully iterated M-estimator, starting from the preliminary estimate of $\theta$ and considering the nuisance parameter as fixed.

3.2 Estimation of Location With Unknown Dispersion

Huber (1981) suggested using the well-known Newton-Raphson method to solve the non-linear equation defining an M-estimator. The one-step M-estimator is therefore obtained by performing one step in the Newton-Raphson method.

Thus, in order to estimate the location parameter of the location-dispersion distribution $F$, with unknown dispersion, one would choose initial estimates of location and dispersion, with respective asymptotic values $T_0(F)$ and $S_0(F)$. Since the Taylor expansion of

$$E_F\,\psi\left(\frac{x - T(F)}{S_0(F)}\right)$$

with respect to $T(F)$, around $T_0(F)$, begins with

$$E_F\,\psi\left(\frac{x - T_0(F)}{S_0(F)}\right) - \frac{T(F) - T_0(F)}{S_0(F)}\,E_F\,\psi'\left(\frac{x - T_0(F)}{S_0(F)}\right) + \ldots,$$

the one-step M-estimator of location is defined, in functional terms, by

$$T(F) = T_0(F) + S_0(F)\,\frac{E_F\,\psi\left(\frac{x - T_0(F)}{S_0(F)}\right)}{E_F\,\psi'\left(\frac{x - T_0(F)}{S_0(F)}\right)}. \tag{3.8}$$

This estimator will be hereafter called the Standard One-Step M-Estimator of Location.

The functional notation is used here for simplicity. But in practice, we always deal with finite sample sizes. The finite-sample standard one-step M-estimator of location is given by

$$T_n = T_0^n + S_0^n\,\frac{\sum_{i=1}^{n} \psi\left(\frac{x_i - T_0^n}{S_0^n}\right)}{\sum_{i=1}^{n} \psi'\left(\frac{x_i - T_0^n}{S_0^n}\right)}, \tag{3.9}$$

where $T_0^n$ and $S_0^n$ are the initial finite-sample estimates of location and dispersion, and $n$ is the sample size.

Huber (1981) (p. 146) indicates that (3.9) converges to the exact value of the fully iterated M-estimator in a finite number of steps, provided the underlying distribution $F$ is symmetric and $\psi$ is skew symmetric and piecewise linear. Of course, in practice, we deal with empirical cumulative distribution functions $F_n$, which are never symmetric. But we assume here that those empirical distribution functions converge to symmetric distribution functions.
The value of the denominator in the ratio defining the estimator (3.8) is not critical, and can be replaced by any constant greater than 1/2 as long as $0 < \psi' < 1$. However, as Hampel et al. (1986) (p. 153) pointed out, for certain score functions $\psi$ (especially the redescending types) and certain samples, it may happen that the denominator $\frac{1}{n}\sum_{i=1}^{n} \psi'\left(\frac{x_i - T_0^n}{S_0^n}\right)$ becomes extremely small (approximately equal to 0), which would destabilize the value of the estimator. To avoid this problem, Hampel et al. (1986) have suggested replacing the denominator in the ratio by the constant $E_\Phi\,\psi'(z)$, where $\Phi$ is the standard normal distribution. This constant is easy to calculate and never becomes 0. It can also be seen as the smooth version of $\frac{1}{n}\sum_{i=1}^{n} \psi'\left(\frac{x_i - T_0^n}{S_0^n}\right)$ when the underlying distribution $F$ is normal. The normality assumption is often used in practice, but nothing prevents one from assuming that $F$ is otherwise, and calculating the expectation with respect to that different underlying distribution.

The modified version of the one-step M-estimator (3.8) is therefore given by:

$$T^*(F) = T_0(F) + S_0(F)\,\frac{E_F\,\psi\left(\frac{x - T_0(F)}{S_0(F)}\right)}{E_\Phi\,\psi'(z)}.$$

This estimator will hereafter be called the MOSME of Location, for Modified One-Step M-Estimator of Location. Its finite-sample version can be expressed as

$$T_n^* = T_0^n + S_0^n\,\frac{\frac{1}{n}\sum_{i=1}^{n} \psi\left(\frac{x_i - T_0^n}{S_0^n}\right)}{E_\Phi\,\psi'(z)},$$

where $T_0^n$ and $S_0^n$ are the initial finite-sample estimates of location and dispersion, and $n$ is the sample size.

Hampel et al. (1986) (p. 153) suggest the use of another estimator, the one-step W-estimator, to avoid the problem of a small denominator. This thesis will however focus on the MOSME and the standard one-step M-estimator of location, as they are more commonly used in practice.
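The two finite-sample estimators of this section can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the score function is taken to be a Huber function with tuning constant 1.345, scaled to have maximum 1, and the initial estimates are the median and the normalized MAD. All function names are hypothetical.

```python
from statistics import NormalDist, median

K = 1.345                            # Huber tuning constant (assumed choice)
PHI = NormalDist()                   # standard normal distribution
MAD_NORMALIZER = PHI.inv_cdf(0.75)   # makes the MAD consistent at the normal

def psi(y: float) -> float:
    """Huber score function scaled to have maximum 1."""
    return max(-1.0, min(1.0, y / K))

def psi_prime(y: float) -> float:
    """Derivative of psi; zero on the flat parts of the score function."""
    return 1.0 / K if abs(y) < K else 0.0

# Constant denominator of the MOSME: E_Phi psi'(z) = P(|Z| < K) / K.
E_PHI_PSI_PRIME = (2.0 * PHI.cdf(K) - 1.0) / K

def initial_estimates(x):
    """Median and normalized MAD (assumes the MAD of the sample is not 0)."""
    t0 = median(x)
    s0 = median(abs(xi - t0) for xi in x) / MAD_NORMALIZER
    return t0, s0

def standard_one_step(x):
    """One Newton-Raphson step from (t0, s0); the denominator is data-driven."""
    t0, s0 = initial_estimates(x)
    y = [(xi - t0) / s0 for xi in x]
    numerator = sum(psi(v) for v in y)
    denominator = sum(psi_prime(v) for v in y)  # can be near 0 for some psi
    return t0 + s0 * numerator / denominator

def mosme(x):
    """Modified one-step: same numerator, constant denominator E_Phi psi'(z)."""
    t0, s0 = initial_estimates(x)
    mean_psi = sum(psi((xi - t0) / s0) for xi in x) / len(x)
    return t0 + s0 * mean_psi / E_PHI_PSI_PRIME
```

On a right-skewed sample such as `[0.1, 0.5, 1.0, 1.2, 1.9, 2.4, 8.0]`, both functions move the median towards the bulk of the larger observations while staying well below the sample mean, which is pulled up by the value 8.0.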
3.3 Estimation of Dispersion With Unknown Location

The non-linear equation defining the fully iterated M-estimator of dispersion $S(F)$ with unknown location,

$$E_F\,\chi\left(\frac{x - T_0(F)}{S(F)}\right) = \beta, \tag{3.10}$$

slightly differs from that of the M-estimator of location with unknown dispersion, in that the right-hand side constant, $\beta = E_\Phi\,\chi(x)$, is greater than 0. It would be undesirable to have $\beta = 0$ since this would force $S(F)$ to be equal to $\infty$ (assuming $\chi(0) = 0$).

Taking advantage of the form of the expression (3.10), Huber (1981) (p. 147) suggested the use of the fixed-point iterative method to solve it for $S(F)$. Actually, Huber used a particular case of (3.10), the one for which $\chi = \psi^2$, where $\psi$ was the score function defining the fully iterated M-estimator of the unknown location. That is, Huber wanted to make the location estimator defined by $\psi$ dispersion invariant. The fixed-point method can however be straightforwardly generalized for any $\chi$.

The one-step M-estimator of dispersion, $S_1(F)$, derived from the fixed-point computational algorithm can be expressed as

$$[S_1(F)]^2 = \frac{[S_0(F)]^2}{\beta}\,E_F\,\chi\left(\frac{x - T_0(F)}{S_0(F)}\right), \tag{3.11}$$

where $T_0(F)$ and $S_0(F)$ are the limiting values of the preliminary estimators of location and dispersion.

Note that the standard fixed-point method solves an equation of the form $g(S(F)) = S(F)$, for a certain function $g$. The reason for solving instead $S(F)\,g(S(F)) = S(F)^2$ is that when $x \to T_0(F)$, the $\chi$ function generally behaves like a second-order polynomial, and therefore

$$S(F)\,g(S(F)) = \frac{S(F)^2}{\beta}\,E_F\,\chi\left(\frac{x - T_0(F)}{S(F)}\right) \approx \frac{1}{\beta}\,E_F\,(x - T_0(F))^2.$$

The latter expression refers directly to the expression for the standard deviation as an estimator of dispersion.

The finite-sample version of (3.11) can be expressed as

$$[S_1^n]^2 = \frac{[S_0^n]^2}{n\beta}\sum_{i=1}^{n} \chi\left(\frac{x_i - T_0^n}{S_0^n}\right),$$

where $T_0^n$ and $S_0^n$ are the initial finite-sample estimates of location and dispersion, and $n$ is the sample size.

Yohai and Zamar in [34] (p. 407) have obtained $\tau$-estimators of dispersion by letting $S_0(F)$ in (3.11) be an M-estimator of dispersion determined by a smooth function $\rho_1$.
The estimator $S_1(F)$, defined by (3.11), will therefore be hereafter called the One-Step $\tau$-Estimator of Dispersion.

The fixed-point method for solving (3.10) is not the only one possible, of course. The Newton-Raphson method, as used in the setup of location estimation with unknown dispersion, can also be used. In fact, the fixed-point method is known to converge linearly, whereas the Newton-Raphson method, as a special case with an additional constraint, converges quadratically (see [7], p. 62). To my knowledge, the Newton-Raphson method in the estimation of dispersion has never been fully studied, but it presents clear advantages over the fixed-point method in terms of the asymptotic behaviour of the estimator (refer to chapter 5 for more details). We therefore believe it is worthy of some consideration.

Thus, in order to estimate the dispersion parameter of the location-dispersion distribution $F$, with unknown location, one would choose initial estimates of location and dispersion, respectively referred to as $T_0(F)$ and $S_0(F)$. Since the Taylor expansion of the left-hand side of (3.10), with respect to $S(F)$, around $S_0(F)$, begins with

$$E_F\,\chi\left(\frac{x - T_0(F)}{S_0(F)}\right) - \frac{S(F) - S_0(F)}{S_0(F)}\,E_F\left[\chi'\left(\frac{x - T_0(F)}{S_0(F)}\right)\left(\frac{x - T_0(F)}{S_0(F)}\right)\right] + \ldots,$$

the Standard One-Step M-Estimator of Dispersion can be defined in functional terms by

$$S(F) = S_0(F) + S_0(F)\,\frac{E_F\,\chi\left(\frac{x - T_0(F)}{S_0(F)}\right) - \beta}{E_F\left[\chi'\left(\frac{x - T_0(F)}{S_0(F)}\right)\left(\frac{x - T_0(F)}{S_0(F)}\right)\right]}. \tag{3.12}$$

The finite-sample version of the standard one-step M-estimator of dispersion is

$$S_n = S_0^n + S_0^n\,\frac{\frac{1}{n}\sum_{i=1}^{n} \chi\left(\frac{x_i - T_0^n}{S_0^n}\right) - \beta}{\frac{1}{n}\sum_{i=1}^{n} \chi'\left(\frac{x_i - T_0^n}{S_0^n}\right)\left(\frac{x_i - T_0^n}{S_0^n}\right)}, \tag{3.13}$$

where $T_0^n$ and $S_0^n$ are the initial finite-sample estimates of location and dispersion, and $n$ is the sample size.

For the same reasons explained in the location setup, it may happen, for certain $\chi$ functions and certain samples, that the denominator in the ratio of (3.13) gets dangerously close to 0 and strongly affects the estimate.
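The fixed-point and Newton-Raphson one-step estimators of dispersion can be compared on a sample with the minimal sketch below. Several ingredients are assumptions made for illustration only: $\chi(y) = \min(y^2, k^2)$ with $k = 1.345$ (a Huber-type choice with $\chi(0) = 0$), $\beta = E_\Phi\,\chi(z)$, the median and normalized MAD as initial estimates, and all function names.

```python
import math
from statistics import NormalDist, median

K = 1.345                            # tuning constant (assumed choice)
PHI = NormalDist()
MAD_NORMALIZER = PHI.inv_cdf(0.75)

def chi(y: float) -> float:
    """Even, bounded score function with chi(0) = 0 (Huber-type, assumed)."""
    return min(y * y, K * K)

def chi_prime(y: float) -> float:
    """Derivative of chi; zero where chi is constant."""
    return 2.0 * y if abs(y) < K else 0.0

# beta = E_Phi chi(z) = int_{-K}^{K} z^2 phi(z) dz + K^2 P(|Z| > K)
BETA = ((2.0 * PHI.cdf(K) - 1.0) - 2.0 * K * PHI.pdf(K)
        + 2.0 * K * K * (1.0 - PHI.cdf(K)))

def initial_estimates(x):
    """Median and normalized MAD (assumes the MAD of the sample is not 0)."""
    t0 = median(x)
    s0 = median(abs(xi - t0) for xi in x) / MAD_NORMALIZER
    return t0, s0

def fixed_point_one_step(x):
    """One fixed-point step: S1^2 = S0^2 * mean(chi) / beta."""
    t0, s0 = initial_estimates(x)
    mean_chi = sum(chi((xi - t0) / s0) for xi in x) / len(x)
    return s0 * math.sqrt(mean_chi / BETA)

def newton_one_step(x):
    """One Newton-Raphson step: S0 + S0 (mean(chi) - beta) / mean(chi'(y) y)."""
    t0, s0 = initial_estimates(x)
    y = [(xi - t0) / s0 for xi in x]
    mean_chi = sum(chi(v) for v in y) / len(y)
    slope = sum(chi_prime(v) * v for v in y) / len(y)  # can be near 0
    return s0 + s0 * (mean_chi - BETA) / slope
```

Both functions are scale equivariant because the initial estimates are: multiplying every observation by a constant multiplies each estimate by the same constant.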
In this case, the MOSME of Dispersion may represent a better choice of estimator, where the MOSME of dispersion is asymptotically defined as

$$S^*(F) = S_0(F) + S_0(F)\,\frac{E_F\,\chi\left(\frac{x - T_0(F)}{S_0(F)}\right) - \beta}{E_\Phi\left[\chi'(z)\,z\right]}. \tag{3.14}$$

Its finite-sample definition is

$$S_n^* = S_0^n + S_0^n\,\frac{\frac{1}{n}\sum_{i=1}^{n} \chi\left(\frac{x_i - T_0^n}{S_0^n}\right) - \beta}{E_\Phi\left[\chi'(z)\,z\right]},$$

where $T_0^n$ and $S_0^n$ are the initial finite-sample estimates of location and dispersion, and $n$ is the sample size.

3.4 The First Step Is a Big Step

The obvious advantage of one-step M-estimators is their ease of computation, compared to their fully iterated versions. Moreover, the asymptotic properties of one-step M-estimators make them very attractive. For example, it is known that the standard one-step M-estimator of location is asymptotically equivalent to the fully iterated M-estimator, provided the underlying distribution $F$ is symmetric, $\psi$ is odd and, most of all, $T_0$ is consistent, translation invariant and odd. $T_0$ is said to be translation invariant and odd if

$$T_0(F_{X+c}) = T_0(F_X) + c \quad \text{and} \quad T_0(F_{-X}) = -T_0(F_X).$$

In this thesis, we will focus on the following robustness properties of one-step M-estimators: their breakdown point (the fraction of contamination causing complete disaster), their influence function (which describes the effect of an infinitesimal contamination) and their worst-case bias (which describes how much the estimators can change with a fraction $\epsilon$ of contamination).

It can be shown (see Rousseeuw and Croux in [31]) that the breakdown point of one-step M-estimators is equal to that of the initial M-estimators used in their computation. Selecting very robust initial M-estimators is thus strongly recommended. For example, the median and the median absolute deviation (MAD), as estimates of location and dispersion respectively, have 50% breakdown point, the highest possible. This makes them excellent candidates for preliminary estimates of location and dispersion of a distribution.
On the other hand, many estimators with a high breakdown point have very low asymptotic efficiency (63.7% for the median in the univariate location problem, and 36.7% for the MAD in the univariate scale model). However, one-step M-estimators, while inheriting the breakdown point of their initial estimators, will generally show an improvement over the asymptotic efficiencies of their initial estimators, because they approach the fully iterated M-estimators, which are usually by definition more efficient.

Unfortunately, as Rousseeuw and Croux (1993a) have shown, increasing efficiency does not come without compromise. The authors treat the problems of univariate location estimation and univariate dispersion estimation, that is, estimation of location and dispersion when the nuisance parameter is known. In the former, they use the univariate equivalent of our MOSME of location, while they use the univariate equivalent of our $\tau$-estimator of dispersion in the latter. Rousseeuw and Croux (1993a) show that the maximum bias of these one-step M-estimators is higher than that of their initial estimators. The increase in bias is especially strong in the dispersion setup. However, because only one iteration in the computational algorithm is performed when deriving one-step M-estimators, the increase in worst-case bias is perceived as a good compromise for the increase in efficiency.

Rousseeuw and Croux (1993a) treat in general $k$-step M-estimators, for $k > 1$. They show that it is possible to obtain an arbitrarily high efficiency, while maintaining a 50% breakdown point, through a $k$-step M-estimator, for a fixed finite $k$ (which corresponds to performing $k$ steps in the algorithm used to solve for the fully-iterated M-estimator, starting from a preliminary estimator with 50% breakdown point).
Notice that under the Gaussian model, the univariate one-step M-estimator of location already has the same asymptotic efficiency as the fully-iterated M-estimator. Therefore, taking further steps will not increase the efficiency of the $k$-step M-estimator: Rousseeuw and Croux (1993a) show that it is equal to the efficiency of the fully-iterated M-estimator of location, for any $k$. The authors nevertheless consider $k$-step M-estimators of location for $k > 1$ in view of possible improvements on their quantitative robustness properties.

However, Rousseeuw and Croux (1993a) show that as the efficiency of the estimator goes up (as $k$ increases), its maximum bias will also go up. In the univariate location case, the bias increases only slightly with $k$. Unfortunately, in the univariate dispersion problem, the maximum bias explodes rapidly with $k$, which does not justify, in the authors' opinion, the increase in efficiency. Based on these findings, they believe small values of $k$ ($k = 1$ or $k = 2$) are preferable, especially for multiparameter problems (which typically contain a dispersion component). This has driven the choice to focus this thesis on one-step M-estimators.

The purpose of this thesis is to extend the work by Rousseeuw and Croux (1993a). It will consider the more realistic situations with nuisance parameters: estimation of location with unknown dispersion and estimation of dispersion with unknown location. The emphasis will be put on the MOSMEs presented earlier in this chapter for the above models, because they have never been studied to the extent they deserve. To follow in the steps of Rousseeuw and Croux, we will derive the maximum bias of the MOSME for the two models of interest, and compare them to that of the standard one-step M-estimators and the $\tau$-estimators when applicable.
To further complete the study, we believe the asymptotic efficiency of an estimator also provides important information about its behaviour, even though Rousseeuw and Croux mentioned it only briefly, and only for the normal distribution as the underlying distribution. The asymptotic efficiency of the MOSME will be derived from its influence function, under some regularity conditions, and for different types (heavy-tailed, normal, light-tailed) of distributions. The asymptotic efficiency of the MOSME will then be compared to that of the other one-step M-estimators.

To draw a more complete parallel with Rousseeuw and Croux (1993a), the same preliminary estimators of location and dispersion, the median and the (normalized) MAD, will be used. We believe, in any case, that they represent an excellent choice of preliminary estimators.

The next two chapters treat independently the two problems of interest: estimating location with unknown dispersion, and estimating dispersion with unknown location. Each chapter will first look at the asymptotic efficiency of one-step M-estimators, then at their worst-case bias.

Chapter 4

Location Estimation with Unknown Dispersion

4.1 Introduction

Let $X_1, \ldots, X_n$ be a sample from a population with distribution $F$ in the location-dispersion family $\{F_{\theta,\sigma}(x) : F_{\theta,\sigma}(x) = F\left(\frac{x - \theta}{\sigma}\right)\}$. The objective is to estimate the location $\theta$, when the dispersion parameter $\sigma$ is unknown. Given a score function $\psi$, an M-estimator of location is the solution $T_n$ of the equation

$$\frac{1}{n}\sum_{i=1}^{n} \psi\left(\frac{x_i - T_n}{S_n}\right) = 0, \tag{4.15}$$

where $S_n$ is a robust estimate of the dispersion parameter $\sigma$. It can be shown that, under mild regularity conditions, $T_n$ converges a.s. $[F]$ to $T(F)$, the functional implicitly defined as the solution of

$$E_F\,\psi\left(\frac{x - T(F)}{S(F)}\right) = 0, \tag{4.16}$$

where $S(F)$ is the asymptotic value of $S_n$. We will therefore adopt the functional notation in the discussion below.
Computing an M-estimate of location requires the use of an iterative method, such as the Newton-Raphson algorithm, as we must solve the non-linear equation (4.15). Huber (1981) suggested, as an alternative, using the estimate found by performing only one iteration in the algorithm, starting with initial estimates of location and dispersion. With the underlying distribution $F$, the Standard One-Step M-Estimator of Location, derived from the score function $\psi$, can be formally defined by the functional

$$T(F) = T_0(F) + S_0(F)\,\frac{E_F\,\psi\left(\frac{x - T_0(F)}{S_0(F)}\right)}{E_F\,\psi'\left(\frac{x - T_0(F)}{S_0(F)}\right)},$$

where $T_0(F)$ and $S_0(F)$ are the asymptotic values of the initial estimators of location and dispersion. By replacing the distribution $F$ by the empirical distribution $F_n$, it is therefore possible to get a finite-sample estimator for the location parameter $\theta$.

However, for certain score functions $\psi$ and certain samples, it may happen that the finite-sample version of the denominator $E_F\,\psi'\left(\frac{x - T_0}{S_0}\right)$ becomes 0 or dangerously close to 0. To avoid this problem, Hampel et al. (1986) suggested the following modified version of the standard one-step M-estimator of location:

$$T^*(F) = T_0(F) + S_0(F)\,\frac{E_F\,\psi\left(\frac{x - T_0(F)}{S_0(F)}\right)}{E_\Phi\,\psi'(z)},$$

which will be denoted hereafter the MOSME of Location (Modified One-Step M-Estimator of Location).

The standard one-step M-estimator of location has essentially the same asymptotic behaviour as its fully-iterated version, at least when $F$ is symmetric, $\psi$ is odd and $T_0$ is consistent, translation invariant and odd. Our main interest is to study the asymptotic behaviour of the MOSME as a means of comparison with the standard one-step M-estimator of location. More specifically, the asymptotic efficiency at different models and the maximum asymptotic bias of both estimators will be compared.
4.2 Asymptotic Efficiency of the MOSME of Location

4.2.1 Our Choice of Underlying Distributions and of Score Functions

The study developed in section 4.2 will include a set of eleven underlying distributions $F$ and three different score functions $\psi$.

The discussion in section 4.2 assumes that the underlying distribution $F$ is symmetric. Interesting cases for $F$ include heavy-tailed distributions, which attempt to model samples with outliers. To further make a parallel between the theory related to the asymptotic efficiency of the MOSME and its maximum bias, one case of the contaminated normal distribution will be considered. Finally, two light-tailed distributions will also be used, to illustrate the adaptive property of the MOSME to various situations.

More specifically, the distributions of interest are: the normal distribution, the Student's t distribution with 1, 2, 5, 8, 10 and 20 degrees of freedom, the double exponential distribution, a contaminated standard normal distribution, with 5% of outliers with distribution $N(6, 0.01)$ and another 5% with distribution $N(-6, 0.01)$, and two distributions with lighter tails, that is, a symmetrized Beta distribution ($\alpha = \beta = 10$) and the distribution with density $f(x) = 0.5516313254\,\exp(-x^4)$.

In order to compare the asymptotic variances of our estimators at these distributions, it was decided to normalize the distributions so that their interquartile ranges would all be equal to the standard normal interquartile range, 1.349. This normalization implied scaling each distribution with a factor $d_0$, so that the density associated with the central distribution $F$ is actually $\frac{1}{d_0}\,f_{\theta=0,\sigma=1}\left(\frac{x}{d_0}\right)$, and not $f_{\theta=0,\sigma=1}(x)$. See Appendix A for the scale factors $d_0$ necessary to make each of the above distributions have an interquartile range equal to 1.349.
We are however aware that the normalization is not needed if we compare the efficiencies of the estimators instead of their variances, which we actually do. Indeed, an efficiency is by definition the ratio of asymptotic variances, and the asymptotic variance under $F\left(\frac{x}{d_0}\right)$ is equal to $d_0^2$ times the asymptotic variance under $F(x)$. Therefore, the efficiency calculations are independent of the normalization factors. However, the normalization greatly simplifies the calculations when the initial estimator of dispersion in the one-step calculation is chosen to be the MAD, as will later be shown. This is the reason why we have adopted this procedure. If one nevertheless prefers to compare the asymptotic variances of the estimators, instead of their efficiencies, the material provided in this thesis will enable that person to do so.

To further illustrate the behaviour of the MOSME, the following three score functions $\psi$ will be used:

$$\psi_{H_{1.345}}(x) = \begin{cases} x/1.345 & |x| < 1.345 \\ \mathrm{sign}(x) & |x| \ge 1.345, \end{cases} \tag{4.17}$$

$$\psi_{NCDF}(x) = 2\Phi(x) - 1, \tag{4.18}$$

and

$$\psi_{T_{4.7}}(x) = \begin{cases} x\,(4.7^2 - x^2)^2 & |x| \le 4.7 \\ 0 & |x| > 4.7. \end{cases} \tag{4.19}$$

They will respectively hereafter be called $H_{1.345}$, NCDF and $T_{4.7}$. The first two score functions are monotone non-decreasing, whereas the third is redescending. The score function $H_{1.345}$ was initially proposed by Huber (1964), who showed that it is asymptotically minimax for $F = \Phi$, within the class of location estimators with general dispersion.

Score Function $\psi$    $E_\Phi\,\psi'(z)$
$H_{1.345}$              0.6106876
NCDF                     0.5641896
$T_{4.7}$                370.4275608

Table 4.2: The Value of the Constant Denominator, $E_\Phi\,\psi'(z)$, in the Ratio Defining the Three MOSMEs of Location Under Study

The choice of the constant 1.345 makes the estimator $H_{1.345}$ 95% efficient under the normal model. As will be seen later, NCDF is also 95% efficient when the underlying distribution is normal. The score function $T_{4.7}$ is an example of Tukey's biweight function.
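The three constants of Table 4.2 can be checked with a short computation. The sketch below is only a numerical verification under the score functions (4.17) to (4.19): the first two constants have closed forms, and the third is obtained by composite Simpson integration. The function names and the number of integration intervals are illustrative choices.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution

def huber_constant(k: float = 1.345) -> float:
    """E_Phi psi'(z) for psi(x) = x/k on |x| < k: equals P(|Z| < k) / k."""
    return (2.0 * PHI.cdf(k) - 1.0) / k

def ncdf_constant() -> float:
    """E_Phi psi'(z) for psi(x) = 2 Phi(x) - 1: psi' = 2 phi, and
    E_Phi[2 phi(Z)] = 2 * integral of phi^2 = 1 / sqrt(pi)."""
    return 1.0 / math.sqrt(math.pi)

def tukey_constant(c: float = 4.7, n: int = 4000) -> float:
    """E_Phi psi'(z) for psi(x) = x (c^2 - x^2)^2 on |x| <= c, where
    psi'(x) = (c^2 - x^2)(c^2 - 5 x^2); composite Simpson rule on [-c, c]."""
    def integrand(x: float) -> float:
        return (c * c - x * x) * (c * c - 5.0 * x * x) * PHI.pdf(x)

    h = 2.0 * c / n  # n must be even for Simpson's rule
    total = integrand(-c) + integrand(c)
    for i in range(1, n):
        weight = 4.0 if i % 2 else 2.0
        total += weight * integrand(-c + i * h)
    return total * h / 3.0

if __name__ == "__main__":
    print(f"H1.345: {huber_constant():.7f}")
    print(f"NCDF  : {ncdf_constant():.7f}")
    print(f"T4.7  : {tukey_constant():.7f}")
```

The large value obtained for $T_{4.7}$ simply reflects the fact that (4.19) is not scaled to have maximum 1.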
The choice of the constant 4.7 makes the estimator $T_{4.7}$ 95% efficient under the normal model.

Note that the constant denominator, $E_\Phi\,\psi'(z)$, in the ratio defining the MOSME can be easily calculated when the score functions $\psi$ are determined. Table 4.2 provides those constants. Notice that the score functions (4.17) and (4.18) are normalized to have a maximum of 1. The score function in (4.19) is not. This explains the big difference in the values of the constants in Table 4.2.

4.2.2 The Asymptotic Variance of the Standard One-Step and the Fully-Iterated M-Estimators of Location

In order to study the behaviour of the MOSME of location, it is of interest to compare its asymptotic variance to the asymptotic variance of the standard one-step M-estimator. But the latter is equal to the asymptotic variance of the fully iterated M-estimator of location, provided that the initial estimator of location, $T_0$, is consistent, translation invariant and odd, and that the underlying distribution $F$ is symmetric and the score function $\psi$ is odd. These last assumptions shall be used throughout this chapter, in view of the simplification they bring to the problem.

Huber (1981) (pp. 140-141) shows that the influence function of the standard one-step M-estimator and of the fully iterated M-estimator of location is, under our assumptions,

$$IF(x; T, F) = \frac{S_0(F)\,\psi\left(\frac{x}{S_0(F)}\right)}{E_F\,\psi'\left(\frac{x}{S_0(F)}\right)}, \tag{4.20}$$

where $S_0(F)$ is the asymptotic value of the initial estimator of dispersion. Remember that $T_0(F) = 0$ when $F$ is symmetric. Note that the above influence function (4.20) is directly proportional to the score function $\psi$ defining the estimators.

If we take the initial estimator of dispersion to be the normalized MAD, with asymptotic value

$$S_0(F) = \frac{\mathrm{Med}(|X|)}{\Phi^{-1}(3/4)},$$

then $S_0(F) = 1$ for all the distributions of interest presented in section 4.2.1.
This is due to the normalization factor $d_0$ (see Appendix A), which makes the interquartile range of each distribution equal to 1.349. With this choice of initial estimator of dispersion, the asymptotic variance of the standard one-step M-estimator of location, and so of the fully iterated M-estimator of location, simplifies to

\[ V(T, F) = \frac{E_F\psi^2(x)}{\{E_F\psi'(x)\}^2} \]

under mild regularity conditions.

4.2.3 Asymptotic Variance of the MOSME of Location

Appendix B provides the complete derivation of the influence function of the MOSME of location. In the general non-symmetric case, it is equal to

\[ IF(x; T^*, F) = IF(x; T_0, F)\left\{1 - \frac{E_F\psi'(y)}{E_\Phi\psi'(z)}\right\} + IF(x; S_0, F)\left\{\frac{E_F\psi(y)}{E_\Phi\psi'(z)} - \frac{E_F\{y\,\psi'(y)\}}{E_\Phi\psi'(z)}\right\} + IF(x; T, F)\left\{\frac{E_F\psi'(y)}{E_\Phi\psi'(z)}\right\} - S_0\,\frac{E_F\psi(y)}{E_\Phi\psi'(z)}, \]

where $T_0 = T_0(F)$ and $S_0 = S_0(F)$ are the asymptotic values of the initial estimators of location and dispersion, $y = (x - T_0)/S_0$, $IF(x; T_0, F)$ and $IF(x; S_0, F)$ are the influence functions of the initial estimators of location and dispersion, and $IF(x; T, F)$ is the influence function of the standard one-step (or the fully iterated) M-estimator of location. However, it is possible to simplify the above expression under appropriate conditions.

Proposition 1. Assume $T_0$ is consistent, translation invariant and odd. Assume that $\psi$ is odd, bounded, differentiable except at a finite number of points at most, and equal to 0 at 0. If $F$ is symmetric, then

\[ IF(x; T^*, F) = (1 - a)\,IF(x; T_0, F) + a\,IF(x; T, F), \]

where $a = E_F\psi'(x/S_0)\,/\,E_\Phi\psi'(z)$.

The conditions of Proposition 1 are needed for the standard one-step M-estimator of location to have the same influence function as the fully iterated M-estimator. The most important condition is the symmetry of $F$, which greatly simplifies our problem since it makes the influence function of the MOSME of location independent of the influence function of the initial estimator of dispersion. The conditions on the score function $\psi$ are minimal regularity conditions that most score functions used in practice will satisfy.
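The constants of Table 4.2 and the 95% normal efficiencies quoted for the three score functions can be checked numerically. The following is a minimal sketch, assuming the score functions are exactly (4.17), (4.18) and (4.19) and that the asymptotic variance at the normal model is $E_\Phi\psi^2/(E_\Phi\psi')^2$. The function names and the Simpson integration grid are illustrative choices, not part of the thesis.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # Phi, the standard normal distribution


def psi_huber(x, c=1.345):
    """Huber score (4.17), scaled so that its maximum is 1."""
    return max(-1.0, min(1.0, x / c))


def dpsi_huber(x, c=1.345):
    return 1.0 / c if abs(x) < c else 0.0


def psi_ncdf(x):
    """Normal CDF score (4.18)."""
    return 2.0 * STD_NORMAL.cdf(x) - 1.0


def dpsi_ncdf(x):
    return 2.0 * STD_NORMAL.pdf(x)


def psi_tukey(x, c=4.7):
    """Tukey biweight score (4.19), not scaled."""
    return x * (c * c - x * x) ** 2 if abs(x) < c else 0.0


def dpsi_tukey(x, c=4.7):
    if abs(x) >= c:
        return 0.0
    return (c * c - x * x) * (c * c - 5.0 * x * x)


def normal_expectation(g, lo=-10.0, hi=10.0, n=20000):
    """Composite Simpson approximation of E_Phi[g(Z)]; n must be even."""
    h = (hi - lo) / n
    total = g(lo) * STD_NORMAL.pdf(lo) + g(hi) * STD_NORMAL.pdf(hi)
    for i in range(1, n):
        x = lo + i * h
        total += (4 if i % 2 else 2) * g(x) * STD_NORMAL.pdf(x)
    return total * h / 3.0


def normal_efficiency(psi, dpsi):
    """Efficiency at the normal model, where the MLE has variance 1."""
    variance = normal_expectation(lambda x: psi(x) ** 2) / normal_expectation(dpsi) ** 2
    return 1.0 / variance


SCORES = {
    "H1.345": (psi_huber, dpsi_huber),
    "NCDF": (psi_ncdf, dpsi_ncdf),
    "T4.7": (psi_tukey, dpsi_tukey),
}

for name, (psi, dpsi) in SCORES.items():
    print(name, normal_expectation(dpsi), normal_efficiency(psi, dpsi))
```

The first printed number for each score function is the constant denominator $E_\Phi\psi'(z)$, and the second is the efficiency at the normal model, which is invariant to the scaling of $\psi$.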
Note that the influence function of the MOSME illustrates its adaptive behaviour. When the underlying distribution $F$ is approximately normal, the constant $a$ becomes close to 1, and the MOSME behaves like the more efficient standard one-step (fully iterated) M-estimator. On the other hand, the further away $F$ is from normal, the further away $a$ is from 1, and the more impact the initial (robust) estimator of location has on the behaviour of the MOSME.

One can use a Taylor series expansion, under mild regularity conditions, or the heuristic formula $V(T^*, F) = E_F\{IF(x; T^*, F)\}^2$ to obtain the asymptotic variance of the MOSME, $V(T^*, F)$. In either case, we find that

\[ V(T^*, F) = E_F\{(1 - a)\,IF(x; T_0, F) + a\,IF(x; T, F)\}^2. \]

Note that the asymptotic value of the initial estimator of location, $T_0(F)$, is 0, since all the distributions are symmetric. It is however important to choose a robust initial estimator of location when using the MOSME or the standard one-step M-estimator. The median is strongly recommended in the literature. Its influence function is

\[ IF(x; \mathrm{Med}, F) = \frac{\mathrm{sign}(x)}{2f(0)} \]

when it is assumed that its asymptotic value is 0. Hampel et al (1986) have moreover shown that the influence function of the median has the sharpest bound for any location estimator, and thus the smallest gross-error sensitivity. The asymptotic variance of the median is $1/\{4f(0)^2\}$.

As in the preceding section on the standard one-step M-estimator of location, we recommend the use of the normalized MAD as the initial estimator of dispersion. It is asymptotically equal to 1 for all distributions $F$ under study, by the choice of the normalization factors $d_0$.

4.2.4 The Asymptotic Variance of the MLE

In order to calculate the efficiency of the MOSME and the one-step M-estimator of location, it is necessary to find the asymptotic variance of the maximum likelihood estimator
of location, for each underlying distribution $F$.

Distribution F            V(MLE, F)
double exponential        0.9455161
contaminated normal       0.0713725
t(1)                      0.909873
t(2)                      1.1356899
t(5)                      1.1470014
t(8)                      1.1127178
t(10)                     1.0962449
t(20)                     1.0543198
normal                    1.0000000
symmetrized beta          0.9233088
0.55exp(-x^4)             0.5367305

Table 4.3: Asymptotic Variance V(MLE, F) of the MLE for Different Underlying Distributions F

The MLE of location is defined through the score function $\psi_{MLE}(x) = -f'(x)/f(x)$. The MLE, consistent for the location $\theta$, possesses the smallest asymptotic variance possible, namely the inverse of the Fisher information. That is, the asymptotic variance of the MLE is

\[ V(\mathrm{MLE}, F) = \left[E_F\left\{\frac{f'(x)}{f(x)}\right\}^2\right]^{-1}. \]

Table 4.3 gives the asymptotic variance of the MLE, $V(\mathrm{MLE}, F)$, for the different distributions under study. Note the strikingly small asymptotic variance of the MLE for the contaminated normal distribution, which is equivalent to a very large value of the Fisher information. When integrating $\{f'(x)/f(x)\}^2$ to obtain this Fisher information, the two contaminations centered on $x = \pm 6$ cause the integral to increase significantly over values of $x$ greater than 5 or smaller than -5. The contaminated normal is the only distribution in Table 4.3 which is not unimodal.

                          MOSME                     Standard One-Step
Distribution F   Med      H1.345  NCDF    T4.7      H1.345  NCDF    T4.7
dble exp         1.000    0.735   0.742   0.747     0.698   0.718   0.695
cont normal      0.047    0.060   0.058   0.080     0.060   0.058   0.080
t(1)             0.811    0.620   0.609   0.781     0.569   0.571   0.716
t(2)             0.833    0.876   0.870   0.930     0.857   0.856   0.904
t(5)             0.769    0.992   0.993   0.987     0.990   0.992   0.984
t(8)             0.731    0.996   0.999   0.987     0.996   0.999   0.987
t(10)            0.716    0.993   0.996   0.984     0.993   0.997   0.985
t(20)            0.680    0.978   0.983   0.942     0.979   0.983   0.946
normal           0.637    0.950   0.950   0.950     0.950   0.950   0.950
sym beta         0.581    0.902   0.906   0.910     0.901   0.905   0.908
0.55exp(-x^4)    0.300    0.669   0.631   0.666     0.644   0.616   0.643

Table 4.4: Asymptotic Efficiency of the MOSME and the Standard One-Step M-Estimator of Location (Equivalent to That of the Fully-Iterated M-Estimator), Derived from Different Score Functions $\psi$ and for Different Underlying Distributions F. The Asymptotic Efficiency of the Median, the Initial Estimator of Location, is Provided for Comparison Purposes. The Normalized MAD is Used As an Initial Estimator of Dispersion.

4.2.5 Asymptotic Efficiency of the MOSME Compared to That of the Standard One-Step (or Fully-Iterated) M-Estimator of Location

Table 4.4 presents the asymptotic efficiency of the MOSME and the one-step M-estimator of location (equivalent to that of the fully iterated M-estimator), derived from the three score functions (4.17), (4.18) and (4.19), for the different distributions $F$ under study. The asymptotic efficiency of the initial estimator, the median Med, is also given in Table 4.4, as a means of comparison.

In view of Table 4.4, it can be concluded that the MOSME of location does not have the same asymptotic properties as the fully iterated M-estimator, unlike the standard one-step M-estimator of location under our assumptions.
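The efficiency of the median reported in Table 4.4 can be reproduced in closed form for a few of the distributions, since an efficiency does not depend on the normalization factor $d_0$. The sketch below assumes unit-scale versions of the normal, double exponential and t(1) distributions, with their well-known densities at 0 and Fisher informations for location. The dictionary and function names are illustrative only.

```python
import math

# (density at 0, Fisher information for location) for unit-scale models.
# Efficiencies are scale invariant, so the normalization d_0 plays no role here.
MODELS = {
    "normal": (1.0 / math.sqrt(2.0 * math.pi), 1.0),
    "double exponential": (0.5, 1.0),
    "t(1)": (1.0 / math.pi, 0.5),
}


def median_efficiency(f0, fisher_information):
    """V(MLE, F) / V(Med, F), with V(Med, F) = 1 / {4 f(0)^2}."""
    v_median = 1.0 / (4.0 * f0 ** 2)
    v_mle = 1.0 / fisher_information
    return v_mle / v_median


for name, (f0, info) in MODELS.items():
    print(name, round(median_efficiency(f0, info), 3))
```

The three values agree with the Med entries of Table 4.4 for the normal (0.637), the double exponential (1.000) and the t(1) (0.811) distributions.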
In fact, for the three score functions $H_{1.345}$, $N_{CDF}$ and $T_{4.7}$, it appears that the efficiency of the MOSME is always greater than, or comparable to, that of the standard one-step M-estimator, and hence to that of the fully iterated M-estimator, at least for the underlying distributions $F$ selected. This improvement is especially present in the case of very heavy-tailed distributions, which puts the MOSME at an advantage over the other estimators when the sample contains outliers. The MOSME, the standard one-step and the fully iterated M-estimators of location seem approximately equivalent, in terms of asymptotic efficiency, for underlying distributions $F$ approaching the normal distribution.

Table 4.4 also suggests that an M-estimator derived from the $T_{4.7}$ score function shows a greater efficiency than the ones derived from the $H_{1.345}$ and the $N_{CDF}$ score functions, in the presence of very heavy-tailed distributions (t(1), t(2) and contaminated normal). The $N_{CDF}$ score function behaves slightly better than the $H_{1.345}$ score function, which itself behaves slightly better than $T_{4.7}$, for distributions approaching the normal distribution (t(5), t(8), t(10) and t(20)). For the very light-tailed symmetrized beta distribution, the $T_{4.7}$ score function gives the highest efficiency, followed by $N_{CDF}$ and then $H_{1.345}$. In the case of the light-tailed $0.55\exp(-x^4)$, $H_{1.345}$ and $T_{4.7}$ have comparable efficiencies, slightly higher than that of $N_{CDF}$. For the double exponential underlying distribution, things are not so clear. The $N_{CDF}$ score function is superior to $H_{1.345}$ and $T_{4.7}$ when using the fully iterated M-estimator. However, the three score functions are roughly equivalent when the MOSME is used.

Therefore, because the presence of outliers is the main problem when estimating location, I would recommend using the MOSME derived from the $T_{4.7}$ score function to estimate the location of a distribution with unknown dispersion.

4.3 Maximum Bias of the MOSME of Location

Let $X_1, \ldots, X_n$ be a sample from a population with distribution $F$ in the contamination neighbourhood

\[ V_\epsilon(F_0^{\theta,\sigma}) = \{F : F = (1 - \epsilon)F_0^{\theta,\sigma} + \epsilon H,\ H \text{ arbitrary distribution}\}, \qquad 0 < \epsilon < 1/2, \]

where the central distribution $F_0^{\theta,\sigma}$ belongs to a location-dispersion family, that is,

\[ F_0^{\theta,\sigma}(x) = F_0\left(\frac{x - \theta}{\sigma}\right). \]

The arbitrary distribution $H$ generates the outliers that can be present in the sample, and it will invariably affect the estimation of the location parameter. Moreover, we set $\epsilon$ less than 1/2 because it would otherwise be impossible to distinguish between the central distribution, $F_0^{\theta,\sigma}$, and the arbitrary distribution, $H$.

The maximum bias function

\[ B_{T^*}(\epsilon) = \sup_{F \in V_\epsilon} |T^*(F) - \theta|, \]

as briefly introduced by Hampel et al (1986) (pp. 176-177), can be used to measure the asymptotic robustness of the MOSME of location as a function of the fraction $\epsilon$ of contamination. Note that we can take $\theta = 0$ without loss of generality because the estimate $T^*(F)$ is translation invariant.

The following compares the maximum bias of the MOSME, the standard one-step and the fully iterated M-estimators of location in two distinct situations: when the score function $\psi$ is monotone non-decreasing (as $H_{1.345}$ and $N_{CDF}$ are), and when $\psi$ is redescending (as $T_{4.7}$ is).

4.3.1 Monotone Non-Decreasing Score Functions

Let the distribution $F_\infty$ be a point mass contamination at infinity, obtained when $H = \delta_\infty$. We shall hereafter concentrate on the normal central distribution $F_0^{\theta,\sigma} = \Phi$ in the neighbourhood $V_\epsilon$.

Proposition 2. Assume $\psi$ is monotone non-decreasing, bounded and odd. Let $S^-(\epsilon) = \inf_{F \in V_\epsilon} S_0(F)$, $S^+(\epsilon) = \sup_{F \in V_\epsilon} S_0(F)$, and $B(\epsilon) = \sup_{F \in V_\epsilon} |T_0(F)|$. Assume we can interchange integration and derivation, that is, $\frac{d}{dt}\{E_\Phi\psi(\frac{x-t}{s})\} = -\frac{1}{s}E_\Phi\psi'(\frac{x-t}{s})$, and

\[ E_\Phi\psi'\left(\frac{x-t}{s}\right) < \frac{E_\Phi\psi'(x)}{1 - \epsilon}, \qquad (4.21) \]

for all $t$ in $[-B(\epsilon), B(\epsilon)]$, for fixed $s$ in $[S^-(\epsilon), S^+(\epsilon)]$, and

\[ E_\Phi\psi\left(\frac{x-t}{s}\right) - E_\Phi\left\{\psi'\left(\frac{x-t}{s}\right)\left(\frac{x-t}{s}\right)\right\} > -\frac{\epsilon}{1 - \epsilon}, \qquad (4.22) \]

for all $s$ in $[S^-(\epsilon), S^+(\epsilon)]$, for fixed $t$ in $[-B(\epsilon), B(\epsilon)]$. Then

\[ \sup_{F \in V_\epsilon} T^*(F) = T^*(F_\infty), \]

where $F_\infty = (1 - \epsilon)\Phi + \epsilon\delta_\infty$.

Proof: We clearly always have that $\sup_{F \in V_\epsilon} T^*(F) \ge T^*(F_\infty)$. Moreover, for all $\epsilon < 1/2$,

\[ \begin{aligned} \sup_{F \in V_\epsilon} T^*(F) &= \sup_{F \in V_\epsilon}\left\{T_0(F) + S_0(F)\,\frac{E_F\psi\left(\frac{x - T_0(F)}{S_0(F)}\right)}{E_\Phi\psi'(z)}\right\} \\ &= \sup_{F \in \{(1-\epsilon)\Phi + \epsilon H\}}\left\{T_0(F) + S_0(F)\,\frac{(1-\epsilon)E_\Phi\psi\left(\frac{x - T_0(F)}{S_0(F)}\right) + \epsilon E_H\psi\left(\frac{x - T_0(F)}{S_0(F)}\right)}{E_\Phi\psi'(z)}\right\} \\ &\le \sup_{F \in \{(1-\epsilon)\Phi + \epsilon H\}}\left\{T_0(F) + S_0(F)\,\frac{(1-\epsilon)E_\Phi\psi\left(\frac{x - T_0(F)}{S_0(F)}\right) + \epsilon}{E_\Phi\psi'(z)}\right\} \\ &\le \sup_{\substack{-B(\epsilon) \le t \le B(\epsilon) \\ S^-(\epsilon) \le s \le S^+(\epsilon)}}\left\{t + s\,\frac{(1-\epsilon)E_\Phi\psi\left(\frac{x - t}{s}\right) + \epsilon}{E_\Phi\psi'(z)}\right\} \\ &= B(\epsilon) + S^+(\epsilon)\,\frac{(1-\epsilon)E_\Phi\psi\left(\frac{x - B(\epsilon)}{S^+(\epsilon)}\right) + \epsilon}{E_\Phi\psi'(z)} \\ &= T^*(F_\infty). \end{aligned} \]

The first equality is simply the definition of $T^*(F)$. By definition of $V_\epsilon$, it is possible to write $E_F\psi(\frac{x - T_0(F)}{S_0(F)})$ as the sum of two terms, as stated in the second equality. Since $\psi$ is bounded, we can assume without loss of generality that $\sup_x \psi(x) = 1$. Thus $E_H\psi$ is always less than or equal to 1, which gives the third line.

The function which is to be maximized on the third line no longer depends on the distribution $H$. In fact, it can be regarded as a function of two arguments, $T_0(F)$ (or $t$) and $S_0(F)$ (or $s$). Following the work of Martin and Zamar in [26], if $\epsilon < 1/2$, then $T_0(F)$ and $S_0(F)$ are bounded as in the fourth line. That is,

\[ S^-(\epsilon) \le S_0(F) = s \le S^+(\epsilon), \]

and

\[ -B(\epsilon) \le T_0(F) = t \le B(\epsilon). \]

Assuming the conditions (4.21) and (4.22) hold, the function to be maximized in the fourth line is increasing in $T_0(F) = t$, for all fixed $S^-(\epsilon) \le S_0(F) = s \le S^+(\epsilon)$, when $-B(\epsilon) \le T_0(F) = t \le B(\epsilon)$, and it is increasing in $S_0(F) = s$, for all fixed $-B(\epsilon) \le T_0(F) = t \le B(\epsilon)$, when $S^-(\epsilon) \le S_0(F) = s \le S^+(\epsilon)$. Therefore, we directly get the fifth line, which is by definition $T^*(F_\infty)$.

Hence, we have shown that $\sup_{F \in V_\epsilon} T^*(F) = T^*(F_\infty)$, when the central distribution $F_0^{\theta,\sigma}$ is normal.
$\Box$

Analytical derivations, combined with numerical calculations, have shown that the $H_{1.345}$ and the $N_{CDF}$ score functions satisfy the above conditions (4.21) and (4.22), when the median and the (normalized) MAD are used as preliminary estimates of location and dispersion. The conditions were rewritten for the two specific cases of the $H_{1.345}$ and $N_{CDF}$ score functions, and evaluated over a finite and equally-spaced 21x21 grid, covering the range of possible $t$ and $s$ values. For a fixed $\epsilon$, the maximum value of the (normalized) MAD, $S^+(\epsilon)$, is produced by a point mass contamination at infinity, and such contamination also produces the maximum value $B(\epsilon)$ of the location estimator. The minimum value of the (normalized) MAD, $S^-(\epsilon)$, is produced by a point mass contamination at 0, and such contamination also produces the minimum absolute value of the location estimator, 0 (see Martin and Zamar (1989)).

More specifically, the value for $B(\epsilon)$ can be explicitly written as $\Phi^{-1}\left(\frac{1}{2(1-\epsilon)}\right)$. It is the value $T(F)$ which satisfies (4.16) for the median score function $\psi_{Med}(x) = \mathrm{sgn}(x)$ and $F = F_\infty$. The bounds $S^+(\epsilon)$ and $S^-(\epsilon)$ are the implicit solutions of

\[ (1 - \epsilon)\left[\Phi\{B(\epsilon) - S^+(\epsilon)\Phi^{-1}(3/4)\} + 1 - \Phi\{B(\epsilon) + S^+(\epsilon)\Phi^{-1}(3/4)\}\right] + \epsilon = 1/2, \]

and

\[ (1 - \epsilon)\left[\Phi\{-S^-(\epsilon)\Phi^{-1}(3/4)\} + 1 - \Phi\{S^-(\epsilon)\Phi^{-1}(3/4)\}\right] = 1/2. \]

That is, $S^+(\epsilon)$ satisfies $E_{F_\infty}\chi_{MAD}\left(\frac{x - B(\epsilon)}{S^+(\epsilon)}\right) = 1/2$ and $S^-(\epsilon)$ satisfies $E_{F_0}\chi_{MAD}\left(\frac{x}{S^-(\epsilon)}\right) = 1/2$, where the MAD score function is $\chi_{MAD}(x) = \frac{1}{2}\{\mathrm{sgn}(|x| - \Phi^{-1}(3/4)) + 1\}$ (see Chapter 5 for more details about score functions of dispersion estimators) and $F_0 = (1 - \epsilon)\Phi + \epsilon\delta_0$.

For all $\epsilon < 1/2$ used, the conditions (4.21) and (4.22) were always met. So, even if those conditions seem somewhat restrictive, it is believed that many commonly used monotone non-decreasing, bounded and odd score functions $\psi$ satisfy them.
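The bounds $B(\epsilon)$, $S^+(\epsilon)$ and $S^-(\epsilon)$ described above can be obtained numerically. The following is a minimal sketch that solves the two implicit equations by bisection, for the median and the normalized MAD at a normal central distribution. As an assumption on my part, it also evaluates the MOSME at $F_\infty$ as $B(\epsilon) + S^+(\epsilon)\{(1-\epsilon)E_\Phi\psi(\frac{x - B(\epsilon)}{S^+(\epsilon)}) + \epsilon\}/E_\Phi\psi'(z)$ for the scaled Huber score function. All function names, the bisection bracket and the integration grid are illustrative.

```python
from statistics import NormalDist

PHI = NormalDist()          # standard normal central distribution
Q = PHI.inv_cdf(0.75)       # Phi^{-1}(3/4), the MAD normalizing constant
C = 1.345                   # Huber tuning constant
E_PHI_DPSI = (2.0 * PHI.cdf(C) - 1.0) / C   # E_Phi psi'(z) for the scaled Huber score


def psi(y):
    """Huber score scaled to have maximum 1."""
    return max(-1.0, min(1.0, y / C))


def bisect(f, lo, hi, tol=1e-12):
    """Root of a continuous f whose values at lo and hi have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def normal_expectation(g, lo=-10.0, hi=10.0, n=4000):
    """Composite Simpson approximation of E_Phi[g(Z)]; n must be even."""
    h = (hi - lo) / n
    total = g(lo) * PHI.pdf(lo) + g(hi) * PHI.pdf(hi)
    for i in range(1, n):
        x = lo + i * h
        total += (4 if i % 2 else 2) * g(x) * PHI.pdf(x)
    return total * h / 3.0


def median_bound(eps):
    """B(eps): median under a point mass contamination at infinity."""
    return PHI.inv_cdf(1.0 / (2.0 * (1.0 - eps)))


def mad_upper(eps):
    """S+(eps): normalized MAD under a point mass contamination at infinity."""
    b = median_bound(eps)
    return bisect(
        lambda s: (1.0 - eps) * (PHI.cdf(b - s * Q) + 1.0 - PHI.cdf(b + s * Q)) + eps - 0.5,
        1e-9, 100.0)


def mad_lower(eps):
    """S-(eps): normalized MAD under a point mass contamination at 0."""
    return bisect(
        lambda s: (1.0 - eps) * (PHI.cdf(-s * Q) + 1.0 - PHI.cdf(s * Q)) - 0.5,
        1e-9, 100.0)


def mosme_bias_at_infinity(eps):
    """Assumed form of T*(F_infinity) for the scaled Huber score."""
    b, s = median_bound(eps), mad_upper(eps)
    e_psi = normal_expectation(lambda x: psi((x - b) / s))
    return b + s * ((1.0 - eps) * e_psi + eps) / E_PHI_DPSI


for eps in (0.05, 0.10, 0.20):
    print(eps, median_bound(eps), mad_lower(eps), mad_upper(eps), mosme_bias_at_infinity(eps))
```

Without contamination, the sketch returns $B(0) = 0$ and $S^-(0) = S^+(0) = 1$, as expected for the median and the normalized MAD at the normal distribution.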
A lower bound for the maximum bias of the fully iterated location M-estimator is $T(F_\infty)$, i.e. the positive solution $t$ of the non-linear equation

\[ -E_\Phi\psi\left(\frac{x - t}{S^+(\epsilon)}\right) = \frac{\epsilon}{1 - \epsilon}, \]

for a fixed $\epsilon$. Moreover, if we assume that the score function $\psi$ is monotone non-decreasing, bounded and odd, then we have a lower bound for the bias of the standard one-step location M-estimator, $T_1(F)$, given by $T_1(F_\infty)$, where

\[ T_1(F_\infty) = B(\epsilon) + S^+(\epsilon)\,\frac{(1 - \epsilon)E_\Phi\psi\left(\frac{x - B(\epsilon)}{S^+(\epsilon)}\right) + \epsilon}{(1 - \epsilon)E_\Phi\psi'\left(\frac{x - B(\epsilon)}{S^+(\epsilon)}\right)}. \]

It is therefore possible to compare, in terms of maximum asymptotic bias, the MOSME with the standard one-step M-estimator, as well as with the fully iterated M-estimator of location.

Figures 4.1 and 4.2 show the maxbias curve of the MOSME and a lower bound for the maxbias curves of the one-step M-estimator and the fully iterated M-estimator of location, derived from the $H_{1.345}$ and the $N_{CDF}$ score functions. In both cases, the MOSME shows a smaller maximum asymptotic bias than the one-step M-estimator and the fully iterated M-estimator, for all $\epsilon < 1/2$. Note that the improvement by the MOSME over the one-step and the fully iterated M-estimators of location is especially striking with the $H_{1.345}$ score function. The maximum bias for the three types of M-estimator was also calculated using Huber's score function with different values of the index, ranging from 0.5 to 1.75. Results similar to those with the $H_{1.345}$ score function were obtained; the bigger the value of the index, the greater the improvement by the MOSME, compared to the one-step and the fully iterated M-estimators of location.

Hence, the MOSME clearly shows improved asymptotic robustness over the standard one-step M-estimator, as well as the fully iterated M-estimator of location.

4.3.2 Redescending Score Functions

As in the previous section, let us concentrate on the normal central distribution $F_0^{\theta,\sigma} = \Phi$ in the neighbourhood $V_\epsilon$.
It is possible to obtain a lower bound on the maximum bias of the MOSME, the standard one-step and the fully iterated M-estimators of location when the score function used in their definition is redescending, as the $T_{4.7}$ score function is, for example. Indeed, let $x^* = 2.1$ be the value at which $T_{4.7}$ is maximized. Let $F_*$ be the point mass contamination at $x^*$, obtained in the neighbourhood $V_\epsilon$ when $H = \delta_{x^*}$. Then, $T^*(F_*)$, $T_1(F_*)$ and $T(F_*)$ are lower bounds on the maximum bias of the MOSME, the standard one-step and the fully iterated M-estimator of location, respectively.

To obtain $T^*(F_*)$, $T_1(F_*)$ and $T(F_*)$, we must first determine what the initial estimators of location and dispersion, the median and the normalized MAD, become as a function of $\epsilon$ at $F_*$. For any $\epsilon < 1/2$, the median at $F_*$ is the value $B(\epsilon)$ which satisfies

\[ E_{F_*}\psi_{Med}\left(\frac{x - B(\epsilon)}{S^+(\epsilon)}\right) = 0, \]

where $S^+(\epsilon)$ is the normalized MAD at $F_*$ and $\psi_{Med}$ is the score function defining the estimator Med. It turns out that we can explicitly write $B(\epsilon) = \Phi^{-1}\left(\frac{1}{2(1-\epsilon)}\right)$ when $\epsilon$ is approximately equal to or less than 0.49.

The normalized MAD at $F_*$ is the value $S^+(\epsilon)$ which satisfies

\[ E_{F_*}\chi_{MAD}\left(\frac{x - B(\epsilon)}{S^+(\epsilon)}\right) = 1/2, \]

where $B(\epsilon)$ is the median at $F_*$ and $\chi_{MAD}$ is the score function defining the normalized MAD. It turns out that for $\epsilon$ approximately equal to or less than 0.30, $S^+(\epsilon)$ satisfies more precisely

\[ (1 - \epsilon)\left[\Phi\{B(\epsilon) - S^+(\epsilon)\Phi^{-1}(3/4)\} + 1 - \Phi\{B(\epsilon) + S^+(\epsilon)\Phi^{-1}(3/4)\}\right] + \epsilon = 1/2. \]
Therefore, for $\epsilon$ approximately equal to or less than 0.30, a lower bound for the maximum bias of the MOSME defined with the score function $T_{4.7}$ is

\[ T^*(F_*) = B(\epsilon) + S^+(\epsilon)\,\frac{(1 - \epsilon)E_\Phi\psi_T\left(\frac{x - B(\epsilon)}{S^+(\epsilon)}\right) + \epsilon\,\psi_T\left(\frac{x^* - B(\epsilon)}{S^+(\epsilon)}\right)}{E_\Phi\psi_T'(z)}. \]

Similarly, we have a lower bound on the maximum bias of the standard one-step M-estimator of location:

\[ T_1(F_*) = B(\epsilon) + S^+(\epsilon)\,\frac{(1 - \epsilon)E_\Phi\psi_T\left(\frac{x - B(\epsilon)}{S^+(\epsilon)}\right) + \epsilon\,\psi_T\left(\frac{x^* - B(\epsilon)}{S^+(\epsilon)}\right)}{(1 - \epsilon)E_\Phi\psi_T'\left(\frac{x - B(\epsilon)}{S^+(\epsilon)}\right) + \epsilon\,\psi_T'\left(\frac{x^* - B(\epsilon)}{S^+(\epsilon)}\right)}. \]

Finally, a lower bound on the maximum bias of the fully iterated M-estimator of location is the positive solution $t$ of

\[ E_{F_*}\psi_T\left(\frac{x - t}{S^+(\epsilon)}\right) = 0, \]

which can be more precisely written as

\[ (1 - \epsilon)E_\Phi\psi_T\left(\frac{x - t}{S^+(\epsilon)}\right) + \epsilon\,\psi_T\left(\frac{x^* - t}{S^+(\epsilon)}\right) = 0. \]

Figure 4.3 shows the lower bounds on the maximum bias of the MOSME, the standard one-step and the fully iterated M-estimators of location, defined through the redescending score function $T_{4.7}$. It appears that for a small fraction of contamination $\epsilon$ ($\epsilon < 0.12$), the three estimators are equivalent in terms of maximum bias. For medium $\epsilon$ ($0.12 < \epsilon < 0.30$), the standard one-step and the fully iterated estimators are indistinguishable and appear to have a slightly lower maximum bias than the MOSME. However, this cannot be established with certainty, as the figure shows only a lower bound on the maximum bias.

Nevertheless, the redescending nature of the Tukey score function, which discards the contribution of very large values of $x$, may make it superfluous to change the denominator in $T_1(F)$ to get the MOSME of location. This would explain a possibly higher bias of the MOSME compared to the standard one-step M-estimator of location.

4.3.3 Any Type of Score Function

It was shown in section 4.3.1 that the maximum bias of the MOSME is uniformly lower than that of the standard one-step or the fully iterated M-estimator of location defined through a monotone non-decreasing score function.
When a redescending score function is used, the MOSME is equivalent to the other two estimators for a small fraction of contamination $\epsilon$, as section 4.3.2 presented. Therefore, it is now possible to focus our attention on MOSME's only, especially when $\epsilon$ is small.

When using the MOSME of location, one needs to decide which score function to select in its definition, so as to minimize its maximum bias. Figure 4.4 shows the maximum bias of the MOSME of location defined through the $H_{1.345}$ and the $N_{CDF}$ score functions, and a lower bound for the maximum bias of the $T_{4.7}$ MOSME of location. It is impossible to conclude anything when $\epsilon > 0.12$, but for a small fraction of contamination, Figure 4.4 shows that the Huber score function has the lowest maximum bias, followed by the Tukey score function, and then $N_{CDF}$. For larger $\epsilon$, the Huber score function should be preferred to the $N_{CDF}$ score function.

4.3.4 Further Work to Be Done

The work on asymptotic robustness was developed for one specific case of central distribution $F_0^{\theta,\sigma}$ in the neighbourhood $V_\epsilon$: the normal distribution. More calculations need to be done for different central distributions $F_0^{\theta,\sigma}$, for example the ones used in the asymptotic efficiency section (4.2) of this text.

Finally, it remains to be seen whether the asymptotic properties of the MOSME and the standard one-step M-estimators studied in this section are reflected in their finite-sample performance for small and medium sample sizes. This could be assessed with Monte-Carlo simulations, for example.

4.4 An Example: Hummingbirds

In asymptotic terms, the MOSME of location performs better than the standard one-step M-estimator and the fully iterated M-estimator of location. However, how well do they handle finite data sets?
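For a finite sample, the two one-step estimators compared in this chapter can be sketched as follows. This is a minimal illustration, assuming the median and the normalized MAD as initial estimates, the scaled Huber score function $H_{1.345}$, and the MOSME obtained from the standard one-step estimator by replacing its denominator with the constant $E_\Phi\psi'(z)$ of Table 4.2. The function names and the small sample are hypothetical; the sample is not taken from the hummingbird data.

```python
from statistics import NormalDist, median

Q = NormalDist().inv_cdf(0.75)                       # Phi^{-1}(3/4)
C = 1.345                                            # Huber tuning constant
E_PHI_DPSI = (2.0 * NormalDist().cdf(C) - 1.0) / C   # about 0.6107, as in Table 4.2


def psi(y):
    """Huber score scaled to have maximum 1."""
    return max(-1.0, min(1.0, y / C))


def dpsi(y):
    """Derivative of the scaled Huber score."""
    return 1.0 / C if abs(y) < C else 0.0


def initial_estimates(sample):
    """Median and normalized MAD of the sample."""
    t0 = median(sample)
    s0 = median([abs(x - t0) for x in sample]) / Q
    return t0, s0


def standard_one_step(sample):
    """One Newton-Raphson step from (t0, s0), with a data-dependent denominator."""
    t0, s0 = initial_estimates(sample)
    y = [(x - t0) / s0 for x in sample]
    return t0 + s0 * sum(psi(v) for v in y) / sum(dpsi(v) for v in y)


def mosme(sample):
    """Same step, with the denominator replaced by the constant E_Phi psi'(z)."""
    t0, s0 = initial_estimates(sample)
    y = [(x - t0) / s0 for x in sample]
    return t0 + s0 * (sum(psi(v) for v in y) / len(y)) / E_PHI_DPSI


# Hypothetical flying times (in seconds) with one extreme value.
times = [0.58, 0.60, 0.61, 0.62, 0.63, 0.64, 0.66, 0.70, 2.50]
print(median(times), mosme(times), standard_one_step(times))
```

On this hypothetical sample, both one-step estimates lie slightly above the median, and the MOSME stays closer to the median than the standard one-step estimate because fewer observations than expected under normality fall in the central part of the score function.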
Figure 4.5 presents the bar plots of flying times (in seconds) of four types of hummingbirds: adult females (AF), adult males (AM), junior females (JF) and junior males (JM). For 15 minutes, each bird was put in a cage containing two perches 0.5 meter apart. A red light above the perches flashed alternately so as to indicate to the bird to fly towards it (the birds were previously trained to react to flashing red lights in that manner). The time for the bird to fly from one perch to the other (in seconds) was recorded. It is believed to be a measure of the agility of the birds. Hummingbirds can fly for long periods of time without rest. In 15 minutes, a bird typically flew 200 times from one perch to the other.

A close look at Figure 4.5 shows that some birds wandered around before reaching the flashing perch during some of their flights, introducing extreme outliers in their flying times. On the other hand, for other birds, it is difficult to determine whether a high flying time was due to wandering or whether it actually represents the bird's performance. For that reason, the researcher who provided the data set was interested in describing the distribution of flying times for each bird with a measure of location that would be resistant enough to possible outliers, while allowing for flexibility if long flying times are normal for a bird. She hesitated between using the median and the mode. Our MOSME of location provides an interesting alternative.

Table 4.5 shows the estimated value of different location measures for the distribution of flying times of all 16 hummingbirds, that is, their means, medians, MOSME's and standard one-step M-estimates of location.
The standard one-step M-estimates are shown; note however that the estimates obtained with the fully iterated M-estimator of location are roughly equal, up to the third decimal place, except for adult female 2, which has an extreme outlier.

Many observations can be drawn from Table 4.5. In general, the MOSME's and the standard one-step M-estimates of location are roughly equal, except in the cases of the AF 2, AF 4, JF 1, JF 2 and JM 3 hummingbirds, depending on the score function used. These particular birds have extreme outliers or a very heavy tail. Their MOSME's are closer to the median than the standard one-step M-estimates, which illustrates the special adaptive nature of the MOSME of location.

Notice as well that the Tukey score function appears to be conservative, compared to $H_{1.345}$ or $N_{CDF}$, in the robustness sense of the term. The MOSME's or the standard one-step estimates defined with the $T_{4.7}$ score function are always closer to the median than the estimates defined through the other two score functions. The Tukey score function sometimes even gives smaller estimates than the median. This complements the results of section 4.2, which make the Tukey score function a better choice in the presence of heavy-tailed distributions, or outliers.

                        MOSME of Location         Standard One-Step
Bird   Mean    Median   H1.345  NCDF    T4.7      H1.345  NCDF    T4.7
AF 1   0.707   0.669    0.675   0.676   0.666     0.675   0.676   0.666
AF 2   0.930   0.580    0.607   0.607   0.591     0.610   0.609   0.594
AF 3   0.642   0.609    0.615   0.615   0.608     0.615   0.616   0.608
AF 4   0.751   0.630    0.653   0.655   0.630     0.657   0.659   0.630
AF 5   0.796   0.761    0.770   0.772   0.760     0.771   0.773   0.760
AM 1   0.844   0.812    0.821   0.822   0.814     0.822   0.823   0.815
AM 2   0.886   0.840    0.850   0.851   0.841     0.851   0.852   0.841
AM 3   0.663   0.640    0.644   0.645   0.640     0.644   0.645   0.641
AM 4   0.658   0.620    0.627   0.627   0.621     0.627   0.628   0.621
AM 5   0.673   0.652    0.659   0.660   0.656     0.660   0.660   0.656
JF 1   0.740   0.620    0.653   0.654   0.630     0.657   0.659   0.632
JF 2   0.686   0.561    0.582   0.583   0.564     0.586   0.586   0.565
JF 3   0.657   0.618    0.619   0.620   0.614     0.619   0.620   0.614
JM 1   0.666   0.560    0.571   0.572   0.556     0.573   0.573   0.555
JM 2   0.647   0.570    0.576   0.576   0.568     0.576   0.577   0.568
JM 3   0.733   0.661    0.693   0.693   0.678     0.694   0.694   0.681

Table 4.5: Measures of Location of Flying Times of Four Types of Hummingbirds: Adult Females (AF), Adult Males (AM), Junior Females (JF) and Junior Males (JM)

Among all bar plots of flying times in Figure 4.5, the one for AM 5 appears to be best approximated by a normal curve. The one-step M-estimates of the location of the flying times of this bird are midway between the median and the mean, which is the closest the estimates get to the mean among all the birds. The one-step estimates confirm that for AM 5, the MLE may not be a bad estimator of location after all, though some caution in its use is necessary.

It appears that adult males have the worst agility, and that junior males are the most agile birds.
Adult and junior female hummingbirds a priori do not show a difference in agility. In order to estimate the mean flying times of one of the four types of birds, and conclude statistically that one type was more agile than the others, one could use a robust analysis which is beyond the scope of this thesis.

4.5 Conclusions

The MOSME of location presented in this chapter is uniformly better than the standard one-step location M-estimator, in the sense that it is easier to compute, it has a comparable and sometimes better asymptotic efficiency under many important symmetric distributions, and it has a lower or comparable asymptotic maximum bias when the central distribution is normal. When using a monotone non-decreasing score function, the MOSME has a lower asymptotic bias than the standard one-step M-estimator for any fraction of contamination. With a redescending score function, the maximum bias of the MOSME is comparable to that of the standard one-step M-estimator for small, but realistic, fractions of contamination.

Under our assumptions, the standard one-step M-estimator of location is asymptotically equivalent to the fully iterated M-estimator. This again makes the MOSME better than the fully iterated M-estimator, in terms of asymptotic behaviour. And since finding the solution of a non-linear equation, as is required when computing the fully iterated location M-estimator, can sometimes be problematic, the MOSME is still preferable to the fully iterated M-estimator.

The superiority of the MOSME in terms of asymptotic efficiency is especially strong for very heavy-tailed distributions, which describe the situations of samples with outliers.

Martin and Zamar (1989) have shown through finite sample-size simulations that in practice, the squared bias is at least as large as the variance of M-estimators for rather modest sample sizes.
Therefore, the comparison between the MOSME and the standard one-step (fully iterated) M-estimator should give more weight to the maximum bias. And indeed, the MOSME uniformly and clearly beats the other two estimators in terms of maximum asymptotic bias when it is derived from a monotone non-decreasing score function; and it is comparable to the other two estimators for small fractions of contamination, when it is derived from a redescending score function.

It is impossible to single out the best score function to define the MOSME, as all score functions perform optimally in specific, and different, situations. However, when striving for an estimator as accurate (as in low bias) and as precise (as in high efficiency) as possible, and if it is known that the contamination by outliers is small, the Huber score function should be used. If one wants to use a monotone non-decreasing score function, one should prefer the Huber score function to the normal one. When the contamination by outliers is large, the Tukey score function may represent a better choice than the Huber score function, mainly due to the fact that the efficiency of the Tukey estimator is higher for heavy-tailed distributions than that of the Huber estimator. However, at this point, a formal comparison in terms of maximum bias between the Tukey and the Huber score functions is not possible for a large fraction of contamination.

With finite data sets, the example in section 4.4 shows that the MOSME is more robust than the one-step (or the fully iterated) M-estimators of location when it needs to be. Indeed, in the presence of extreme outliers or very heavy tails, the values of the MOSME are closer to the median than those of the other two estimators. The MOSME, by its adaptive nature, becomes more robust and conservative when the data indicate that caution is necessary.
On the other hand, the one-step (or the fully iterated) M-estimators do not handle so well data that are far from normally distributed.

We therefore believe that the use of the MOSME should be preferred to that of the standard one-step location M-estimator, or the fully iterated location M-estimator, when estimating the location parameter $\theta$ of a distribution.

Furthermore, most of the results presented in this chapter are of an asymptotic nature. It remains to be seen whether the asymptotic superiority of the MOSME is reflected in its finite-sample performance for small and medium sample sizes. It would therefore be necessary to run simulations in order to establish for what minimum sample size the results presented in this chapter become approximately valid. One could also assess the finite-sample behaviour of one-step M-estimators of location with sensitivity curves and empirical bias curves. If these finite sample-size results agree with the asymptotic results presented in this chapter, even for small sample sizes, the acceptance of the MOSME as a tool would be greatly facilitated.

[Figure 4.1: plot titled "Maximum Bias Function of M-estimators Using H_1.345 Score Function"; horizontal axis: Epsilon]

Figure 4.1: Maximum Bias Function of the MOSME, and a Lower Bound for the Maximum Bias Function of the One-Step M-Estimator and the Fully Iterated M-Estimator of Location, Derived from the $H_{1.345}$ Score Function

[Figure 4.2: plot titled "Maximum Bias Function of M-estimators Using N_CDF Score Function"; horizontal axis: Epsilon]

Figure 4.2: Maximum Bias Function of the MOSME, and a Lower Bound for the Maximum Bias Function of the One-Step M-Estimator and the Fully Iterated M-Estimator of Location, Derived from the $N_{CDF}$ Score Function
Figure 4.3: Lower Bound on the Maximum Bias Function of the MOSME, the Standard One-Step and the Fully-Iterated M-Estimators of Location, Derived from the $T_{4.7}$ Score Function (horizontal axis: epsilon)

Figure 4.4: Maximum Bias Function of the MOSMEs of Location Derived from the $H_{1.345}$ and the $N_{CDF}$ Score Functions, and a Lower Bound on the Maximum Bias Function of the MOSME of Location Derived from the $T_{4.7}$ Score Function (horizontal axis: epsilon)

Figure 4.5: Bar Plots of Flying Times of Four Types of Hummingbirds: Adult Females, Adult Males, Junior Females and Junior Males (one panel per bird; the legible panel titles are Adult Female 1 to 5, Adult Male 2 to 5, Junior Female 1 to 3 and Junior Male 1 to 3; horizontal axes: flying times in seconds)

Chapter 5

Dispersion Estimation with Unknown Location

5.1 Introduction

Let $X_1, \ldots, X_n$ be a sample from a population with distribution $F$ in the location-dispersion family $\{F_{\theta,\sigma}(x) : F_{\theta,\sigma}(x) = F(\frac{x-\theta}{\sigma})\}$. The objective is to estimate the dispersion $\sigma$ when the location parameter $\theta$ is unknown. Given a score function $\chi$, an M-estimator of dispersion is the solution $S_n$ of the equation

$$\frac{1}{n}\sum_{i=1}^{n}\chi\left(\frac{x_i - T_n}{S_n}\right) = 0, \qquad (5.23)$$

where $T_n$ is a robust estimate of the location parameter $\theta$.
It can be shown that, under mild regularity assumptions, $S_n$ converges a.s. $[F]$ to $S(F)$, the functional implicitly defined as the solution of

$$E_F\,\chi\left(\frac{x - T(F)}{S(F)}\right) = 0,$$

where $T(F)$ is the asymptotic value of $T_n$. We will therefore adopt the functional notation in the discussion below.

Computing an M-estimator of dispersion requires an iterative method, as we must solve the nonlinear equation (5.23). Because the score function $\chi$ is of the form $\chi(x) = g(x) - \beta$, where $g$ is an even function with $g(0) = 0$, Huber (1981) suggested, as an alternative, using the estimate found by performing one iteration of the fixed-point algorithm (starting with some initial estimates of location and dispersion). With the underlying distribution $F$, the $\tau$-Estimator of Dispersion, $S_p^1(F)$, derived from the score function $\chi$, can be formally defined by the functional

$$\{S_p^1(F)\}^2 = \frac{S_0(F)^2}{\beta}\,E_F\,g\left(\frac{x - T_0(F)}{S_0(F)}\right),$$

where $T_0(F)$ and $S_0(F)$ are the asymptotic values of the initial estimators of location and dispersion. By replacing the distribution $F$ by the empirical distribution $F_n$, it is therefore possible to get a finite-sample estimator of the dispersion parameter $\sigma$. Note that the subscript "p" in $S_p^1$ stands for "point", as in fixed-point.

Another iterative algorithm widely used to estimate location with an M-estimator, though much less so dispersion, is the Newton-Raphson method. The Standard One-Step M-Estimator of Dispersion, $S_1(F)$, can be defined as the first iteration of this algorithm, starting from initial estimates of location and dispersion. More precisely,

$$S_1(F) = S_0(F) + \frac{S_0(F)\,E_F\,\chi\left(\frac{x - T_0(F)}{S_0(F)}\right)}{E_F\left[\chi'\left(\frac{x - T_0(F)}{S_0(F)}\right)\frac{x - T_0(F)}{S_0(F)}\right]},$$

where $T_0(F)$ and $S_0(F)$ are the asymptotic values of the initial estimators of location and dispersion. However, for certain score functions $\chi$ and certain samples, it may happen that the denominator is dangerously close to 0. To avoid this problem, we suggest replacing the denominator by $E_\Phi\,\chi'(z)z$, following Hampel et al (1986)'s idea on p.
153 in the context of location estimation with unknown dispersion. This defines the MOSME (Modified One-Step M-Estimator) of Dispersion:

$$S^*(F) = S_0(F) + \frac{S_0(F)\,E_F\,\chi\left(\frac{x - T_0(F)}{S_0(F)}\right)}{E_\Phi\,\chi'(z)z}.$$

Our main interest is to evaluate the asymptotic behaviour of the MOSME of dispersion. It is therefore necessary to compare it with the estimators it aims to improve: the standard one-step M-estimator and the $\tau$-estimator of dispersion. We believe it has adaptive properties which the $\tau$-estimator and the standard one-step M-estimator lack.

The M-estimators of dispersion, other than the MLEs, are not consistent for distributions other than the standard normal, $\Phi$, which makes their asymptotic values differ, as well as their asymptotic properties. For this reason, the usual notion of asymptotic efficiency of an estimator does not apply directly. Instead, the relative asymptotic efficiency will be used to assess the asymptotic performance of the MOSME of dispersion as compared with the standard one-step M-estimator and the $\tau$-estimator. The worst-case asymptotic bias of the three estimators will also be evaluated and commented on in the following sections.

5.2 Relative Asymptotic Efficiency of the MOSME of Dispersion

5.2.1 Our Choice of Underlying Distributions and of Score Functions

To assess the asymptotic efficiency of the one-step estimators of dispersion, the eleven underlying distributions $F$ used in chapter 4 (see section 4.2.1) will again be considered. To further illustrate the behaviour of the MOSME, the following three score functions $\chi$ will be used:

$$\chi_{H_{0.975}}(x) = \begin{cases} x^2 - 0.5 & |x| < 0.975 \\ 0.451 & |x| > 0.975 \end{cases} \qquad (5.24)$$

$$\chi_{H_{2.376}}(x) = \begin{cases} x^2 - 0.9686 & |x| < 2.376 \\ 4.6768 & |x| > 2.376 \end{cases} \qquad (5.25)$$
and

$$\chi_{T_{3.86}}(x) = \begin{cases} \dfrac{x^6}{3.86^6} - \dfrac{3x^4}{3.86^4} + \dfrac{3x^2}{3.86^2} - 0.165 & |x| \le 3.86 \\ 0.835 & |x| > 3.86 \end{cases} \qquad (5.26)$$

The first two score functions, (5.24) and (5.25), are two cases of the general score function

$$\chi_c(x) = \begin{cases} x^2 & |x| < c \\ c^2 & |x| > c \end{cases} \qquad (5.27)$$

suggested by Huber (1981) (p. 109), which satisfies $E_\Phi\,\chi_c(x) = \beta(c)$. Notice that it is the square of Huber's score function $\psi_{H_c}$, which defines a fully iterated M-estimator that is variance minimax in the location setup. For $c$ not too small, the fully iterated M-estimator of dispersion defined through (5.27) is also optimally B-robust (see [15], p. 122).

The MOSME and the standard one-step M-estimator derived from (5.24) will be hereafter denoted by $H_{0.975}$. The score function $\chi_{H_{0.975}}$ was chosen by tradition for the value $\beta(c) = \beta(0.975) = 1/2$. The MOSME and the standard one-step M-estimator derived from (5.25) will be hereafter denoted by $H_{2.376}$. The value of the constant 2.376 was chosen because it makes the MOSME and the standard one-step M-estimator of dispersion 95% efficient under the normal model. Finally, the score function (5.26) is derived from Tukey's redescending score function in the location setup. It will define a MOSME and a standard one-step M-estimator denoted by $T_{3.86}$. Both the MOSME and the standard one-step M-estimator of dispersion $T_{3.86}$ are approximately 95% efficient under the normal model.

The $\tau$-estimators derived from the score functions (5.25) and (5.26) are less than 95% efficient at the normal model. To make a fair comparison between the $\tau$-estimator and the MOSME or the standard one-step M-estimator of dispersion, we must ensure that all estimators are 95% efficient at the normal model.
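The constants quoted for these score functions can be checked numerically. The following is a minimal sketch rather than the thesis's own computation: the function names (`beta_huber`, `chi_huber`, `rho_tukey`, `normal_mean`, `beta_tukey`) are illustrative, and the Tukey-type function is assumed to be the biweight polynomial $1 - (1 - (x/c)^2)^3$ inside $[-c, c]$ and 1 outside.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def beta_huber(c):
    """beta(c) = E_Phi[min(x^2, c^2)], in closed form."""
    tail = 1.0 - PHI.cdf(c)
    return (1.0 - 2.0 * tail) - 2.0 * c * PHI.pdf(c) + 2.0 * c * c * tail


def chi_huber(x, c):
    """Centered Huber-type dispersion score, in the form of (5.24) and (5.25)."""
    return min(x * x, c * c) - beta_huber(c)


def rho_tukey(x, c):
    """Assumed uncentered Tukey-type score: 1 - (1 - (x/c)^2)^3 on [-c, c], 1 outside."""
    if abs(x) > c:
        return 1.0
    u = (x / c) ** 2
    return u ** 3 - 3.0 * u ** 2 + 3.0 * u


def normal_mean(g, lo=-10.0, hi=10.0, n=20000):
    """E_Phi[g(x)] by the composite Simpson rule (n must be even)."""
    h = (hi - lo) / n
    total = g(lo) * PHI.pdf(lo) + g(hi) * PHI.pdf(hi)
    for k in range(1, n):
        x = lo + k * h
        total += (4.0 if k % 2 else 2.0) * g(x) * PHI.pdf(x)
    return total * h / 3.0


def beta_tukey(c):
    """beta(c) = E_Phi[rho_tukey(x, c)], by numerical integration."""
    return normal_mean(lambda x: rho_tukey(x, c))


# Compare with the constants quoted in the text (0.5, 0.9686 and 0.165).
print(round(beta_huber(0.975), 4), round(beta_huber(2.376), 4), round(beta_tukey(3.86), 4))
```

The closed form for the Huber-type score avoids integrating across the kink at $|x| = c$; the Tukey-type score is smooth enough for a plain Simpson rule.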
The two following score functions:

$$\chi_{H_{2.516}}(x) = \begin{cases} x^2 & |x| < 2.516 \\ 6.33 & |x| > 2.516 \end{cases} \qquad (5.28)$$

with $\beta(2.516) = E_\Phi\,\chi_{H_{2.516}}(x) = 0.9785$, and

$$\chi_{T_{5.3}}(x) = \begin{cases} \dfrac{x^6}{5.3^6} - \dfrac{3x^4}{5.3^4} + \dfrac{3x^2}{5.3^2} & |x| \le 5.3 \\ 1 & |x| > 5.3 \end{cases}$$

with $\beta(5.3) = E_\Phi\,\chi_{T_{5.3}}(x) = 0.096$, define two $\tau$-estimators that are 95% efficient at the normal model. They will respectively be denoted by $H_{2.516}$ and $T_{5.3}$. Note that the value of the non-differentiable point 5.3 in the Tukey score function, which makes the $\tau$-estimator 95% efficient at the normal model, is about 40% higher than that for the MOSME. This will inevitably affect the worst-case bias of the $\tau$-estimator of dispersion. The increase in the non-differentiable point $c$ in the Huber score function is not as pronounced.

The constant denominator, $E_\Phi\,\chi'(z)z$, in the ratio defining the MOSME can be easily calculated once the score functions $\chi$ are determined. Table 5.6 provides those constants. The higher constant for $H_{2.376}$ is due to the fact that the three score functions are not all normalized to have a maximum of 1. The score function $H_{2.376}$ in fact has a much larger maximum than the other two.

Score Function $\chi$    $E_\Phi\,\chi'(z)z$
$H_{0.975}$              0.3736064
$H_{2.376}$              1.7396048
$T_{3.86}$               0.2677105

Table 5.6: The Value of the Constant Denominator, $E_\Phi\,\chi'(z)z$, in the Ratio Defining the Three MOSMEs of Dispersion Under Study

5.2.2 Asymptotic Value of the MOSME, the One-Step M-Estimator and the $\tau$-Estimator of Dispersion

To draw a parallel between the estimation of location discussed in the previous chapter and the estimation of dispersion, a first attempt consists in providing the efficiency of the estimators of dispersion. However, it is in a sense meaningless to compare the maximum likelihood estimator of dispersion to one-step M-estimators, because they do not estimate the same quantity. The maximum likelihood estimator of dispersion is consistent for $\sigma$. Without loss of generality, we can assume that $\sigma = 1$.
Therefore, the MLE takes asymptotically the value 1. Indeed, for a given distribution $F$ with density $f$, the score function defining the MLE is

$$\chi_{MLE}(x) = -\frac{f'(x)}{f(x)}\,x - 1.$$

Assuming $F$ is symmetric, we must solve the equation $E_F\,\chi_{MLE}(\frac{x}{\sigma}) = 0$ when calculating the asymptotic value of the MLE of dispersion, $\sigma$. But $E_F\{-\frac{f'(x)}{f(x)}\,x\} = 1$, assuming we can interchange derivation and integration. Therefore, $E_F\{-\frac{f'(x)}{f(x)}\,x\} - 1 = 0$, and the MLE takes asymptotically the value 1.

On the other hand, one-step M-estimators estimate a function of $\sigma$ that depends on the type of M-estimator (standard one-step M-estimator, MOSME or $\tau$-estimator), the score function $\chi$ used in its definition and the initial estimates of location and dispersion. Table 5.7 provides the asymptotic value of the MLE, the MOSME, the standard one-step M-estimator and the $\tau$-estimator of dispersion, as well as the normalized MAD (median absolute deviation from the median, multiplied by the inverse of $\Phi^{-1}(3/4)$) and the standard deviation, for the different underlying distributions $F$ of interest. The initial estimators of dispersion and location used to calculate the one-step
M-estimators are respectively the normalized MAD and the median.

                    MOSME                     Standard One-Step         τ-Estimator
F           MLE     H0.975  H2.376  T3.86     H0.975  H2.376  T3.86     H2.516  T5.3     MAD     SD
dble exp    1.00    1.00    1.21    1.21      1.01    1.21    1.21      1.19    1.23     1.00    1.38
cont norm   1.00    1.00    1.17    1.21      1.01    1.22    1.27      1.16    1.33     1.00    2.07
t(1)        1.00    1.01    1.39    1.43      1.02    1.52    1.53      1.34    1.50     1.00    ∞
t(2)        1.00    1.01    1.22    1.23      1.01    1.25    1.25      1.20    1.28     1.00    ∞
t(5)        1.00    1.00    1.09    1.09      1.00    1.09    1.09      1.09    1.11     1.00    1.20
t(8)        1.00    1.00    1.06    1.06      1.00    1.06    1.06      1.06    1.06     1.00    1.10
t(10)       1.00    1.00    1.04    1.05      1.00    1.05    1.05      1.04    1.05     1.00    1.08
t(20)       1.00    1.00    1.02    1.02      1.00    1.02    1.02      1.02    1.02     1.00    1.03
normal      1.00    1.00    1.00    1.00      1.00    1.00    1.00      1.00    1.00     1.00    1.00
sym beta    1.00    1.00    0.98    0.98      1.00    0.98    0.98      0.98    0.98     1.00    0.97
exp(-x^4)   1.00    0.99    0.87    0.88      0.99    0.84    0.86      0.87    0.88     1.00    0.86

Table 5.7: Asymptotic Value of the MLE, the MOSME, the Standard One-Step M-Estimator and the $\tau$-Estimator of Dispersion, As Well As the Normalized MAD and the Standard Deviation (SD), for Different Underlying Distributions $F$. The Initial Estimators of Dispersion and Location Used to Calculate the One-Step M-Estimators Are Respectively the Normalized MAD and the Median.

Table 5.7 demonstrates that the term dispersion of a distribution $F$ is not in itself clearly defined or even meaningful. Indeed, the standard deviation is commonly used as a measure of dispersion. When the underlying distribution is normal, the standard deviation can easily be interpreted. For example, if the standard deviation of a normal sample is 5, then the dispersion of the sampled normal population is estimated to be 5 times that of the standard normal distribution $\Phi$. Approximately 68% of the sampled values fall within $\pm 5$ of the sample mean. Notice also that the MLE of dispersion in the normal case is exactly the standard deviation.
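To make the three one-step estimators concrete, here is a hedged finite-sample sketch in which the expectations $E_F$ are replaced by sample averages. It assumes the Huber-type score $\min(x^2, c^2) - \beta(c)$ with $c = 2.376$, the median and the normalized MAD as starting values, and the usual fixed-point form $S_0^2 \cdot \mathrm{ave}\{g\}/\beta$ for the $\tau$-type step. The names `huber_constants` and `one_step_dispersion` are illustrative and do not come from the thesis.

```python
import math
import random
from statistics import NormalDist, median

PHI = NormalDist()
Q75 = PHI.inv_cdf(0.75)  # Phi^{-1}(3/4), about 0.6745


def huber_constants(c):
    """Return (beta(c), E_Phi[chi'(z) z]) for the score min(x^2, c^2) - beta(c)."""
    tail = 1.0 - PHI.cdf(c)
    inner = (1.0 - 2.0 * tail) - 2.0 * c * PHI.pdf(c)  # E_Phi[x^2; |x| < c]
    return inner + 2.0 * c * c * tail, 2.0 * inner


def one_step_dispersion(xs, c=2.376):
    """One-step dispersion estimates started from the median and normalized MAD.

    Assumes the sample has a non-zero MAD (fewer than half of the values tied).
    """
    xs = list(xs)
    t0 = median(xs)
    s0 = median([abs(x - t0) for x in xs]) / Q75
    beta, denom_normal = huber_constants(c)
    ys = [(x - t0) / s0 for x in xs]
    n = len(ys)
    mean_g = sum(min(y * y, c * c) for y in ys) / n              # ave g(y)
    mean_chi = mean_g - beta                                     # ave chi(y)
    mean_dchi_y = sum(2.0 * y * y for y in ys if abs(y) < c) / n  # ave chi'(y) y
    return {
        "mad": s0,
        "standard": s0 + s0 * mean_chi / mean_dchi_y,  # Newton-Raphson step
        "mosme": s0 + s0 * mean_chi / denom_normal,    # constant denominator
        "tau": s0 * math.sqrt(mean_g / beta),          # one fixed-point step
    }


random.seed(1)
sample = [random.gauss(5.0, 2.0) for _ in range(2000)]
print(one_step_dispersion(sample))
```

The only difference between the `standard` and `mosme` entries is the denominator: a sample average in the first case, the constant of Table 5.6 in the second, so the second cannot be destabilized by a sample in which the average of $\chi'(y)y$ is close to 0.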
On the other hand, what can be said about the dispersion of a Cauchy or a t(2) family for which the sample standard deviation is, for example, 5? No matter how spread out or dispersed the Cauchy or the t(2) distributions are, their standard deviations are always infinite. The sample standard deviation therefore cannot be a meaningful measure of dispersion for these families.

As an alternative, one could use the MLE to estimate the parameter $\sigma$ and refer to it as the dispersion of a distribution. However, calculating the MLE assumes that the distribution $F$ generating the data is known, which is rarely the case in practice. Besides, even though the MLE possesses the analytical interpretation of $\sigma$, its geometric interpretation remains obscure. Indeed, remember that the distributions presented in Table 5.7 have been normalized so that their interquartile range equals 1.349. The normalization factor $d_0$ is now part of the definition of their density ($f(x; \theta, \sigma) = f_{d_0}(x; \theta, \sigma)$) and dispersion estimation is concerned with the problem of identifying the distribution $F$ in the family $\{F : F = F_{d_0}(\frac{x-\theta}{\sigma})\}$ from which the sample is drawn. Therefore, assume for example that the MLE of dispersion from a sample is found to be equal to 5. Does this mean that $d_0 = 5$ and $\sigma = 1$, or rather that $d_0 = 0.5$ and $\sigma = 10$, or some other combination? The first possibility would imply that the distribution of interest is light-tailed, whereas the second possibility indicates that $F$ is heavy-tailed. Somehow, it is hard to grasp what kind of geometric dispersion the MLE actually estimates.

For a clear geometric interpretation of dispersion, the MAD offers a good deal. It is easy to picture the interquartile range of a sample: it is the range, centered around the middle of the sample, which contains half of the data points.
If, for example, a sample has a normalized MAD of 5, then the dispersion of the sampled population can be estimated as 5 times larger than the dispersion of the same distribution with interquartile range equal to 1.349. However, maybe because of its ease of interpretation, the MAD does not give the complete picture. Indeed, all the distributions presented in Table 5.7 have the same MAD, even though some $F$ are heavy-tailed, others are light-tailed and one is normal. Clearly, the central half of the mass of these distributions is contained in exactly the same range. But we are given no indication regarding the shape of their tails. Are they elongated and dispersed, or rather compact and short?

With these questions in mind, one can easily conclude from Table 5.7 that the one-step M-estimators of dispersion do indeed estimate dispersion in its most meaningful way. For any distribution in the table, the estimates of dispersion are finite, and roughly close to 1. The closer the estimates are to 1, the closer the distributions are to normal. All estimates of dispersion over 1 come from heavy-tailed distributions, whereas all estimates under 1 refer to light-tailed distributions. In fact, by their intrinsic definitions, the one-step M-estimators of dispersion are a linear function of the MAD; this function differs more or less from the identity as the underlying distribution differs from the standard normal distribution. The one-step M-estimators therefore provide information about dispersion which is more complete than the MAD, while staying geometrically interpretable and meaningful. A look at the one-step estimates in Table 5.7 tells us that all distributions have roughly the same interquartile range, but that their natures differ greatly, since the estimates are all close to 1, but still different from each other.
Finally, note that computing a one-step M-estimator of dispersion does not require knowledge of the distribution generating the data, since the empirical distribution is used: this is a clear advantage over the MLE, which is defined through the joint density of the data points.

But which of the MOSME, the standard one-step M-estimator or the $\tau$-estimator should be used to estimate dispersion? It appears that the MOSME or the $\tau$-estimator represents the best choice, depending on the situation. Both estimators are relatively close to 1, and always more so than the standard one-step M-estimator of dispersion, especially in the presence of a heavy-tailed distribution. When using the Huber score function, the $\tau$-estimator is slightly closer to 1 than the MOSME for heavy-tailed distributions. In situations where a smoother score function is used, such as the Tukey score function, the MOSME is closer to 1 than the $\tau$-estimator with heavy-tailed distributions. Any choice of estimator and score function is good for light-tailed distributions. Finally, the choice of the Huber score function which is 95% efficient at the normal model, $H_{2.376}$, seems more appropriate than the $H_{0.975}$ score function. $H_{2.376}$ better reflects the nature of the tails of the underlying distribution, by diverging more from 1.

5.2.3 Asymptotic Variance of the MOSME of Dispersion

Appendix C provides the complete derivation of the influence function of the MOSME of dispersion. In the general non-symmetric case, it is equal to

$$IF(x; S^*, F) = IF(x; S_0, F)\left\{1 + \frac{E_F\chi(y) - E_F\chi'(y)y}{E_\Phi\chi'(z)z}\right\} - IF(x; T_0, F)\,\frac{E_F\chi'(y)}{E_\Phi\chi'(z)z} + \frac{S_0}{E_\Phi\chi'(z)z}\left\{\chi\left(\frac{x - T_0}{S_0}\right) - E_F\chi(y)\right\},$$

where $T_0 = T_0(F)$, $S_0 = S_0(F)$, $y = \frac{x - T_0}{S_0}$, and $IF(x; S_0, F)$ and $IF(x; T_0, F)$ are the respective influence functions of the initial estimates of dispersion and location. However, it is possible to simplify the above expression with appropriate conditions.

Proposition 3 Assume the score function $\chi$ is even, bounded and twice differentiable everywhere except in at most a finite number of points.
If $F$ is symmetric, then

$$IF(x; S^*, F) = IF(x; S_0, F)\left\{1 + \frac{E_F\chi(y) - E_F\chi'(y)y}{E_\Phi\chi'(z)z}\right\} + \frac{S_0(F)}{E_\Phi\chi'(z)z}\left\{\chi\left(\frac{x}{S_0(F)}\right) - E_F\chi(y)\right\},$$

where $y = \frac{x}{S_0(F)}$.

Note that $T_0(F) = 0$ when $F$ is symmetric. The most important condition in Proposition 3 is the symmetry of $F$, because it makes the MOSME of dispersion independent of the initial estimator of location. The conditions on the score function $\chi$ are minimal regularity conditions that most score functions used in practice will satisfy.

Under mild regularity assumptions, a Taylor series expansion gives an expression for the asymptotic variance of the MOSME of dispersion:

$$V(S^*, F) = E_F\{IF(x; S^*, F)\}^2.$$

In view of Proposition 3, it is clear that the choice of the initial estimator of dispersion, $S_0(F)$, will greatly affect the efficiency of the MOSME of dispersion. The literature strongly recommends the use of the robust normalized MAD,

$$\mathrm{MAD}_n = \frac{\mathrm{Med}\,|x - \mathrm{Med}(x)|}{\Phi^{-1}(3/4)},$$

as an initial estimate of dispersion, which has the following influence function

$$IF(x; S_0, F) = \frac{\mathrm{sign}(|x| - \Phi^{-1}(3/4))}{4 f(\Phi^{-1}(3/4))\,\Phi^{-1}(3/4)}$$

when $F$ is symmetric and properly scaled so that $S_0(F) = 1$. The MAD was first promoted by Hampel (1974). Other possibilities for starting estimators of dispersion, which do not assume a symmetric underlying distribution, are presented by Rousseeuw and Croux (1991). Indeed, one must note that the MAD is aimed at symmetric distributions and has low Gaussian efficiency, but its influence function has the sharpest bound among all possible dispersion estimators for symmetric $F$, and therefore the MAD has the lowest possible gross-error sensitivity for symmetric $F$ (see [15], p. 142). Huber (1981) concluded that "the MAD has emerged as the single most useful ancillary estimate of scale" (p. 107).
The SHORTH (the shortest half of the data), equivalent to the MAD for symmetric distributions, is also becoming more and more popular because it was found to be approximately minimax bias robust within the class of M-estimators of dispersion with general location (see Martin and Zamar (1993)).

On the other hand, because only symmetric distributions are studied, the initial estimator of location will asymptotically have the value 0 ($T_0(F) = 0$) and therefore does not directly affect the asymptotic variance of the MOSME of dispersion. However, in practice, because any sample size is finite, one should carefully choose the initial estimate of location. The median is recommended in the literature because of its robustness properties.

5.2.4 Asymptotic Variance of the Standard One-Step M-Estimator of Dispersion

Appendix D provides the complete derivation of the influence function of the standard one-step M-estimator of dispersion. In the general non-symmetric case, it is equal to

$$IF(x; S_1, F) = IF(x; S_0, F)\left\{2\,\frac{E_F\chi(y)}{E_F\chi'(y)y} + \frac{E_F\chi(y)\,E_F\chi''(y)y^2}{[E_F\chi'(y)y]^2}\right\} - IF(x; T_0, F)\left\{\frac{E_F\chi'(y)}{E_F\chi'(y)y} - \frac{E_F\chi(y)\,[E_F\chi'(y) + E_F\chi''(y)y]}{[E_F\chi'(y)y]^2}\right\} + S_0\left\{\frac{\chi(y)}{E_F\chi'(y)y} - \frac{E_F\chi(y)\,\chi'(y)y}{[E_F\chi'(y)y]^2}\right\},$$

where $T_0 = T_0(F)$, $S_0 = S_0(F)$, $y = \frac{x - T_0}{S_0}$, and $IF(x; S_0, F)$ and $IF(x; T_0, F)$ are the respective influence functions of the initial estimators of dispersion and location. However, it is possible to simplify the above expression with appropriate conditions.

Proposition 4 Assume the score function $\chi$ is even, bounded and twice differentiable everywhere except in at most a finite number of points. If $F$ is symmetric, then

$$IF(x; S_1, F) = IF(x; S_0, F)\left\{2\,\frac{E_F\chi(y)}{E_F\chi'(y)y} + \frac{E_F\chi(y)\,E_F\chi''(y)y^2}{[E_F\chi'(y)y]^2}\right\} + S_0(F)\left\{\frac{\chi(y)}{E_F\chi'(y)y} - \frac{E_F\chi(y)\,\chi'(y)y}{[E_F\chi'(y)y]^2}\right\},$$

where $y = \frac{x}{S_0(F)}$.

Notice that $T_0(F) = 0$ when $F$ is symmetric.
The influence function of the standard one-step M-estimator is the sum of two terms, similarly to the influence function of the MOSME: one is a multiple of $IF(x; S_0, F)$ and the other is a multiple of $S_0(F)$. It is obvious that the initial estimator of dispersion holds great importance in the definition of both the MOSME and the standard one-step M-estimator. The influence function of the one-step M-estimator contains $\chi''(y)$ and many ratios with denominator $E_F\chi'(y)y$, which may cause computational problems. On the other hand, the influence function of the MOSME contains no $\chi''(y)$ and only includes ratios with denominator $E_\Phi\chi'(z)z$, which are known to be more stable.

Under mild regularity assumptions, it is possible to express the asymptotic variance of the standard one-step M-estimator as

$$V(S_1, F) = E_F\{IF(x; S_1, F)\}^2.$$

5.2.5 Asymptotic Variance of the $\tau$-Estimator of Dispersion

Appendix E provides the complete derivation of the influence function of the $\tau$-estimator of dispersion, $S_p^1(F)$. In the general non-symmetric case, it is equal to

$$IF(x; S_p^1, F) = \frac{1}{S_p^1}\left\{IF(x; S_0, F)\left[\frac{(S_p^1)^2}{S_0} - \frac{S_0}{2\beta}\,E_F\chi'(y)y\right] - IF(x; T_0, F)\,\frac{S_0}{2\beta}\,E_F\chi'(y) - \frac{(S_p^1)^2}{2} + \frac{S_0^2}{2\beta}\,g(y)\right\},$$

where $T_0 = T_0(F)$, $S_0 = S_0(F)$, $S_p^1 = S_p^1(F)$, $y = \frac{x - T_0}{S_0}$, $g = \chi + \beta$, and $IF(x; T_0, F)$ and $IF(x; S_0, F)$ are the influence functions of the initial estimators of location and dispersion. However, it is possible to simplify the above expression with appropriate conditions.

Proposition 5 Assume the score function $\chi$ is even, bounded and twice differentiable everywhere except in at most a finite number of points. If $F$ is symmetric, then

$$IF(x; S_p^1, F) = IF(x; S_0, F)\left\{\frac{S_p^1}{S_0} - \frac{S_0}{2\beta S_p^1}\,E_F\chi'(y)y\right\} + \frac{S_0^2}{2\beta S_p^1}\,g(y) - \frac{S_p^1}{2},$$

where $y = \frac{x}{S_0(F)}$.

Notice that $T_0(F) = 0$ when $F$ is symmetric. The value of the $\tau$-estimator appears in its influence function, contrary to the MOSME or the standard one-step M-estimator of dispersion.
The assumption of symmetry for $F$ makes the influence function of the $\tau$-estimator of dispersion with unknown location equal to the influence function of the $\tau$-estimator of dispersion with no location parameter (see [30]). The problem of calculating the influence functions for symmetric $F$ is therefore greatly simplified.

Under mild regularity assumptions, it is thus possible to express the asymptotic variance of the $\tau$-estimator of dispersion as

$$V(S_p^1, F) = E_F\{IF(x; S_p^1, F)\}^2.$$

5.2.6 Relative Asymptotic Efficiency of the MOSME Compared to That of the One-Step M-Estimator and the $\tau$-Estimator of Dispersion

The asymptotic efficiency of an M-estimator of dispersion is not a good measure of its asymptotic variability because, as shown in section 5.2.2, the asymptotic value of an M-estimator depends strongly on its score function as well as on its type. Huber (1981) (p. 108) suggests instead the use of the relative asymptotic variance, $RV(S, F)$, which is defined as the asymptotic variance of $\sqrt{n}\,\log(S(F_n)/S(F))$, that is,

$$RV(S, F) = V(\log S, F) = \frac{V(S, F)}{S^2},$$

where $S = S(F)$. This suggestion is in accordance with Bickel and Lehmann (1976), who observed that the relative variance, rather than the variance, is needed to obtain a natural measure of accuracy for dispersion estimators.

Let the relative asymptotic efficiency of an M-estimator of dispersion be defined as the ratio of the relative asymptotic variance of the MLE to its own relative asymptotic variance. The relative asymptotic efficiency of the MOSME will be compared to that of the one-step M-estimator and the $\tau$-estimator of dispersion to assess the asymptotic performance of the MOSME in comparison with that of the other two estimators.
The Fisher information, $I(\sigma)$, for the location-dispersion family of distributions $\{F_{\theta,\sigma} : F_{\theta,\sigma} = F(\frac{x-\theta}{\sigma})\}$, is equal to

$$I(\sigma) = \frac{1}{\sigma^2}\,E_F\left\{x\,\frac{f'(x)}{f(x)} + 1\right\}^2.$$

Because the MLE of dispersion is consistent for $\sigma$, its asymptotic variance is $V(\mathrm{MLE}, F) = \frac{1}{I(\sigma)}$. Its asymptotic value being 1 for any distribution $F$, as shown in section 5.2.2, the relative asymptotic variance of the MLE of dispersion is thus $RV(\mathrm{MLE}, F) = \frac{1}{I(\sigma)}$. Table 5.8 gives the relative asymptotic variance of the MLE, $RV(\mathrm{MLE}, F)$, for the different distributions under study.

Finally, Table 5.9 shows the relative asymptotic efficiency of the MOSME, the one-step M-estimator and the $\tau$-estimator of dispersion, for the score functions $H_{0.975}$, $H_{2.376}$, $T_{3.86}$, $H_{2.516}$ and $T_{5.3}$, and different underlying distributions $F$. The relative asymptotic efficiency of the normalized MAD is also provided, as a means of comparison, since it is used as the initial dispersion estimator. The asymptotic variance of the normalized MAD is equal to

$$V(\mathrm{MAD}, F) = \frac{1}{16 f^2(\Phi^{-1}(3/4))\,\{\Phi^{-1}(3/4)\}^2}.$$
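The two formulas above lend themselves to a quick numerical check. The sketch below is illustrative only: it evaluates the scale information $E_F\{x f'(x)/f(x) + 1\}^2$ at $\sigma = 1$ for the normal and Student $t$ distributions by a midpoint rule after the substitution $x = \tan\theta$ (a choice made here to cope with heavy tails), and then the relative efficiency of the normalized MAD at the normal model. All function names are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist()


def scale_information(density, score, n=20000):
    """E_F[(x f'(x)/f(x) + 1)^2], where score(x) = x f'(x)/f(x).

    Midpoint rule in theta after the substitution x = tan(theta).
    """
    h = math.pi / n
    total = 0.0
    for k in range(n):
        theta = -math.pi / 2.0 + (k + 0.5) * h
        x = math.tan(theta)
        total += (score(x) + 1.0) ** 2 * density(x) / math.cos(theta) ** 2
    return total * h


def rv_mle_normal():
    """Relative asymptotic variance of the MLE of dispersion at the normal."""
    return 1.0 / scale_information(PHI.pdf, lambda x: -x * x)


def rv_mle_t(nu):
    """Relative asymptotic variance of the MLE of dispersion at Student t(nu)."""
    const = math.gamma((nu + 1.0) / 2.0) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2.0))

    def density(x):
        return const * (1.0 + x * x / nu) ** (-(nu + 1.0) / 2.0)

    def score(x):
        return -(nu + 1.0) * x * x / (nu + x * x)

    return 1.0 / scale_information(density, score)


def mad_variance_normal():
    """V(MAD, Phi) = 1 / (16 f(q)^2 q^2) with q = Phi^{-1}(3/4)."""
    q = PHI.inv_cdf(0.75)
    return 1.0 / (16.0 * PHI.pdf(q) ** 2 * q ** 2)


print(rv_mle_normal(), rv_mle_t(5), rv_mle_normal() / mad_variance_normal())
```

Because $RV$ is a relative variance, it does not change when the distribution is rescaled, so the $t$ densities can be used without first normalizing their interquartile range to 1.349.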
Distribution F             RV(MLE, F)
double exponential         1
contaminated normal        0.5
t(1)                       2
t(2)                       1.25
t(5)                       0.8
t(8)                       0.6875
t(10)                      0.65
t(20)                      0.5750084
normal                     0.5
symmetrized beta           0.4210503
0.55 exp(-x^4)             0.25

Table 5.8: Relative Asymptotic Variance $RV(\mathrm{MLE}, F)$ of the MLE for Different Underlying Distributions $F$

                         MOSME                     Standard One-Step         τ-Estimator
Distribution F   MAD     H0.975  H2.376  T3.86     H0.975  H2.376  T3.86     H2.516  T5.3
dble exp         0.481   0.575   0.873   0.910     0.573   0.534   0.918     0.844   0.935
cont normal      0.336   0.409   0.352   0.311     0.405   0.219   0.211     0.346   0.246
t(1)             0.811   0.947   0.916   0.913     0.884   0.371   0.788     0.902   0.880
t(2)             0.703   0.837   0.959   0.963     0.812   0.585   0.922     0.955   0.929
t(5)             0.534   0.660   0.977   0.993     0.653   0.813   0.987     0.970   0.974
t(8)             0.476   0.597   0.976   0.992     0.593   0.879   0.989     0.970   0.985
t(10)            0.456   0.574   0.977   0.989     0.571   0.899   0.986     0.970   0.987
t(20)            0.413   0.524   0.966   0.975     0.524   0.932   0.973     0.964   0.980
normal           0.368   0.470   0.950   0.947     0.470   0.950   0.946     0.950   0.953
sym beta         0.317   0.410   0.919   0.891     0.410   0.942   0.898     0.920   0.892
exp(-x^4)        0.233   0.328   0.825   0.780     0.335   0.735   0.833     0.841   0.769

Table 5.9: Relative Asymptotic Efficiency of the MOSME, the One-Step M-Estimator and the $\tau$-Estimator of Dispersion Derived from Different Score Functions $\chi$, for Different Underlying Distributions $F$. The Relative Asymptotic Efficiency of the Initial Estimator of Dispersion, the Normalized MAD, Is Provided for Comparison Purposes. The Median Is Used as the Initial Estimator of Location.

Many observations can be made from Table 5.9. The relative asymptotic efficiency of the MOSME, the one-step M-estimator and the $\tau$-estimator of dispersion is improved over that of the MAD, their initial estimator of dispersion, for almost all distributions.
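The "normal" row of Table 5.9 can be approximated with a few lines of code. This is a sketch under a stated assumption: at $F = \Phi$ with $S_0(\Phi) = 1$ we have $E_\Phi\chi = 0$, so the one-step definitions of section 5.1 suggest that the influence function of both the MOSME and the standard one-step M-estimator reduces to $\chi(x)/E_\Phi\chi'(z)z$, and the relative efficiency becomes $0.5\,\{E_\Phi\chi'(z)z\}^2/E_\Phi\chi^2$. The helper names are illustrative, and the Tukey-type score is assumed to be the biweight form $1 - (1 - (x/c)^2)^3 - \beta$.

```python
from statistics import NormalDist

PHI = NormalDist()


def normal_mean(g, lo=-10.0, hi=10.0, n=40000):
    """E_Phi[g(x)] by the composite Simpson rule (n must be even)."""
    h = (hi - lo) / n
    total = g(lo) * PHI.pdf(lo) + g(hi) * PHI.pdf(hi)
    for k in range(1, n):
        x = lo + k * h
        total += (4.0 if k % 2 else 2.0) * g(x) * PHI.pdf(x)
    return total * h / 3.0


def huber_score(c):
    """Return (chi, x -> chi'(x) x) for the score min(x^2, c^2) - beta(c)."""
    beta = normal_mean(lambda x: min(x * x, c * c))
    return (lambda x: min(x * x, c * c) - beta,
            lambda x: 2.0 * x * x if abs(x) < c else 0.0)


def tukey_score(c):
    """Return (chi, x -> chi'(x) x) for the assumed biweight-type score."""
    def rho(x):
        u = (x / c) ** 2
        return 1.0 if abs(x) > c else u ** 3 - 3.0 * u ** 2 + 3.0 * u

    def dchi_x(x):
        u = (x / c) ** 2
        return 0.0 if abs(x) > c else 6.0 * u * (1.0 - u) ** 2

    beta = normal_mean(rho)
    return (lambda x: rho(x) - beta), dchi_x


def efficiency_at_normal(chi, dchi_x):
    """0.5 / V, with V = E_Phi[chi^2] / (E_Phi[chi'(z) z])^2 and RV(MLE, Phi) = 0.5."""
    denom = normal_mean(dchi_x)
    variance = normal_mean(lambda x: chi(x) ** 2) / denom ** 2
    return 0.5 / variance


for name, score in (("H0.975", huber_score(0.975)),
                    ("H2.376", huber_score(2.376)),
                    ("T3.86", tukey_score(3.86))):
    print(name, round(efficiency_at_normal(*score), 3))
```

Under these assumptions the three values should land close to the 0.470, 0.950 and 0.947 reported for the MOSME at the normal model; the other rows of the table would require the full influence functions at non-normal $F$.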
The very heavy-tailed distributions like t(1) and t(2) seem to cause some problems for the one-step M-estimator with the $H_{2.376}$ score function, and the contaminated normal distribution is in general not handled well by estimators derived from the $H_{2.376}$, $H_{2.516}$, $T_{3.86}$ and $T_{5.3}$ score functions.

When compared to the score function $H_{2.376}$, the relative efficiency of the MOSMEs derived from $H_{0.975}$ does not in general show a significant improvement over the relative efficiency of the MAD. The $H_{0.975}$ MOSMEs are probably not worth considering, at least in a relative asymptotic efficiency context. However, for very heavy-tailed distributions (t(1), t(2), contaminated normal and double exponential), the one-step M-estimator derived from the $H_{2.376}$ score function actually performs worse than the one derived from $H_{0.975}$: a sign that sacrificing robustness for efficiency with the standard one-step M-estimator can sometimes be a bad strategy.

The $T_{3.86}$ score function, used either with the MOSME or the one-step M-estimator, appears to be a better or a comparable choice for estimating dispersion with heavy-tailed distributions. The conclusion is however reversed for the $\tau$-estimator: $H_{2.516}$ generally performs better than $T_{5.3}$ for heavy-tailed distributions (with the exception of the double exponential distribution).

However, the MLE beats every estimator presented in Table 5.9 for all distributions $F$, since all relative efficiencies presented are below 1.000. But using the MLE for estimating dispersion requires that the entire sample be drawn from $F$, with no outliers allowed: this is too restrictive a situation in many cases. Besides, the distribution $F$ must be specified for the MLE to be defined.

The relative asymptotic efficiency of the MOSME is higher than that of the one-step M-estimator or the $\tau$-estimator of dispersion for heavy-tailed distributions (t, double
exponential and contaminated normal), for all score functions considered, except for the Tukey score function used with the double exponential distribution. The improvement appears more significant in very heavy-tailed distributions (Cauchy, t(2), double exponential and contaminated normal) used with efficient score functions ($H_{2.376}$, $H_{2.516}$, $T_{3.86}$ and $T_{5.3}$). The improvement by the MOSME over the standard one-step M-estimator is also generally greater than over the $\tau$-estimator.

For the light-tailed distributions, symmetrized beta and $\exp(-x^4)$, however, the one-step M-estimator of dispersion generally performs better than the MOSME or the $\tau$-estimator, but not significantly better.

5.3 Maximum Bias of the MOSME of Dispersion

Let $X_1, \ldots, X_n$ be a sample from a population with distribution $F$ in the contamination neighbourhood

$$V_\epsilon(F_0^{\theta,\sigma}) = \{F : F = (1 - \epsilon)F_0^{\theta,\sigma} + \epsilon H,\ H \text{ arbitrary distribution}\},$$

where $0 < \epsilon < 1/2$. Refer back to section 4.3 for a more detailed description of the nature of this contamination neighbourhood.

The worst-case bias of an estimator gives a measure of its asymptotic robustness as a function of the fraction $\epsilon$ of contamination. Since outliers, as well as inliers, can possibly affect the estimator, we need to consider both cases separately. The explosion bias curve of the MOSME of dispersion is defined by

$$B^+(\epsilon) = \sup_{F \in V_\epsilon} \frac{S^*(F)}{\sigma}$$

and describes the behaviour of the estimator in the presence of outliers. On the other hand, the implosion bias curve of the MOSME of dispersion is

$$B^-(\epsilon) = \inf_{F \in V_\epsilon} \frac{S^*(F)}{\sigma}$$

and describes the MOSME in the presence of inliers.

Let the distribution $F_\infty$ be obtained when $H = \delta_\infty$, the distribution which puts its total mass at infinity. Let the distribution $F_0$ be obtained when $H = \delta_0$, the point mass contamination at 0.
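We shall hereafter concentrate on the normal central distribution $F_0^{\theta,\sigma} = \Phi$ in the neighborhood $V_\varepsilon$, as in section 4.3.

Proposition 6 Assume the score function $\chi$ is even and bounded. Let
\[
S^-(\varepsilon) = \inf_{F \in V_\varepsilon} S_0(F), \qquad
S^+(\varepsilon) = \sup_{F \in V_\varepsilon} S_0(F), \qquad
B(\varepsilon) = \sup_{F \in V_\varepsilon} |T_0(F)|.
\]
Assume $E_\Phi \chi'(Z)Z > 0$ and assume that we can interchange derivation and integration, that is
\[
\frac{d}{dt}\Big\{E_\Phi \chi\Big(\frac{X-t}{s}\Big)\Big\} = -\frac{1}{s}\, E_\Phi \chi'\Big(\frac{X-t}{s}\Big)
\tag{5.30}
\]
for all $t$ in $[0, B(\varepsilon)]$, for fixed $s$ in $[S^-(\varepsilon), S^+(\varepsilon)]$, and
\[
\frac{d}{ds}\Big\{E_\Phi \chi\Big(\frac{X-t}{s}\Big)\Big\} = -\frac{1}{s}\, E_\Phi \chi'\Big(\frac{X-t}{s}\Big)\Big(\frac{X-t}{s}\Big)
\tag{5.31}
\]
for all $s$ in $[S^-(\varepsilon), S^+(\varepsilon)]$, for fixed $t$ in $[0, B(\varepsilon)]$. Then
\[
\sup_{F \in V_\varepsilon} S^*(F) = S^*(F_\infty), \quad \text{where } F_\infty = (1-\varepsilon)\Phi + \varepsilon\delta_\infty,
\]
and
\[
\inf_{F \in V_\varepsilon} S^*(F) = S^*(F_0), \quad \text{where } F_0 = (1-\varepsilon)\Phi + \varepsilon\delta_0.
\]

Proof: We will prove the explosion bias result only. The proof for the implosion bias of the MOSME follows exactly the same idea.

We clearly always have that $\sup_{F \in V_\varepsilon} S^*(F) \ge S^*(F_\infty)$. Moreover, for all $\varepsilon < 1/2$,
\[
\begin{aligned}
\sup_{F \in V_\varepsilon} S^*(F)
&= \sup_{F \in V_\varepsilon} \Big\{ S_0(F) + S_0(F)\, \frac{E_F \chi\big(\frac{X - T_0(F)}{S_0(F)}\big)}{E_\Phi \chi'(Z)Z} \Big\} \\
&= \sup_{F = (1-\varepsilon)\Phi + \varepsilon H} \Big\{ S_0(F) + S_0(F)\, \frac{(1-\varepsilon) E_\Phi \chi\big(\frac{X - T_0(F)}{S_0(F)}\big) + \varepsilon E_H \chi\big(\frac{X - T_0(F)}{S_0(F)}\big)}{E_\Phi \chi'(Z)Z} \Big\} \\
&\le \sup_{F = (1-\varepsilon)\Phi + \varepsilon H} \Big\{ S_0(F) + S_0(F)\, \frac{(1-\varepsilon) E_\Phi \chi\big(\frac{X - T_0(F)}{S_0(F)}\big) + \varepsilon}{E_\Phi \chi'(Z)Z} \Big\} \\
&\le \sup_{\substack{S^-(\varepsilon) \le s \le S^+(\varepsilon) \\ -B(\varepsilon) \le t \le B(\varepsilon)}} \Big\{ s + s\, \frac{(1-\varepsilon) E_\Phi \chi\big(\frac{X - t}{s}\big) + \varepsilon}{E_\Phi \chi'(Z)Z} \Big\} \\
&= S^+(\varepsilon) + S^+(\varepsilon)\, \frac{(1-\varepsilon) E_\Phi \chi\big(\frac{X - B(\varepsilon)}{S^+(\varepsilon)}\big) + \varepsilon}{E_\Phi \chi'(Z)Z} \\
&= S^*(F_\infty).
\end{aligned}
\]
The first equality is simply the definition of $S^*(F)$. By definition of $V_\varepsilon$, it is possible to write $E_F \chi\big(\frac{X - T_0(F)}{S_0(F)}\big)$ as the sum of two terms, as stated in the second equality. Since $\chi$ is bounded, we can assume without loss of generality that $\sup_x \chi(x) = 1$. Thus $E_H \chi$ is always less than or equal to 1, which gives the third line.

The function to be maximized on the third line no longer depends on the distribution $H$. In fact, it can be regarded as a function of two arguments, $T_0(F)$ (or $t$) and $S_0(F)$ (or $s$). Following the work of Martin and Zamar in [26], if $\varepsilon < 1/2$, then $T_0(F)$ and $S_0(F)$ are bounded as in the fourth line. That is, $S^-(\varepsilon) \le S_0(F) = s \le S^+(\varepsilon)$, and $-B(\varepsilon) \le T_0(F) = t \le B(\varepsilon)$. Notice that since $\chi'$ is odd, $E_\Phi \chi'\big(\frac{X-t}{s}\big) = -E_\Phi \chi'\big(\frac{X+t}{s}\big)$.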
We can therefore assume without loss of generality that $t \ge 0$.

Assuming the conditions (5.30) and (5.31) hold, the function to be maximized in the fourth line is increasing in $T_0(F) = t$, for all fixed $S^-(\varepsilon) \le S_0(F) = s \le S^+(\varepsilon)$, when $0 \le T_0(F) = t \le B(\varepsilon)$, and is increasing in $S_0(F) = s$, for all fixed $0 \le T_0(F) = t \le B(\varepsilon)$, when $S^-(\varepsilon) \le S_0(F) = s \le S^+(\varepsilon)$. Therefore, we directly get the fifth line, which is by definition $S^*(F_\infty)$.

Hence, we have shown that $\sup_{F \in V_\varepsilon} S^*(F) = S^*(F_\infty)$ when the central distribution $F_0^{\theta,\sigma}$ is normal. □

Analytical derivations, combined with numerical calculations, have shown that the MOSMEs with the $H_{0.975}$, $H_{2.376}$ and $T_{3.86}$ score functions satisfy the above conditions (5.30) and (5.31) when the median and the normalized MAD are used as preliminary estimates of location and dispersion. The conditions were rewritten for the three specific cases of the $H_{0.975}$, $H_{2.376}$ and $T_{3.86}$ score functions, and evaluated over a finite and equally spaced 21 × 21 grid covering the range of possible $t$ and $s$ values.

For a fixed $\varepsilon$, the maximum value of the (normalized) MAD, $S^+(\varepsilon)$, is produced by a point mass contamination at infinity, and such contamination also produces the maximum value $B(\varepsilon)$ of the location estimator. The minimum value of the (normalized) MAD, $S^-(\varepsilon)$, is produced by a point mass contamination at 0, and such contamination also produces the minimum absolute value of the location estimator, 0 (see Martin and Zamar (1993)). More specifically, the value of $B(\varepsilon)$ can be written explicitly as
\[
B(\varepsilon) = \Phi^{-1}\Big(\frac{1}{2(1-\varepsilon)}\Big).
\]
It is the value $T(F)$ which satisfies (4.16) for the median score function $\psi_{Med}(x) = \mathrm{sgn}(x)$ and $F = F_\infty$. The bounds $S^+(\varepsilon)$ and $S^-(\varepsilon)$ are the implicit solutions of
\[
(1-\varepsilon)\big[\Phi\{B(\varepsilon) - S^+(\varepsilon)\Phi^{-1}(3/4)\} + 1 - \Phi\{B(\varepsilon) + S^+(\varepsilon)\Phi^{-1}(3/4)\}\big] + \varepsilon = 1/2,
\]
and
\[
(1-\varepsilon)\big[\Phi\{-S^-(\varepsilon)\Phi^{-1}(3/4)\} + 1 - \Phi\{S^-(\varepsilon)\Phi^{-1}(3/4)\}\big] = 1/2.
\]
That is, $S^+(\varepsilon)$ satisfies
\[
E_{F_\infty}\, \chi_{MAD}\Big(\frac{X - B(\varepsilon)}{S^+(\varepsilon)}\Big) = 1/2,
\]
where the MAD score function is $\chi_{MAD}(x) = \tfrac{1}{2}\{\mathrm{sgn}(|x| - \Phi^{-1}(3/4)) + 1\}$ and $F_\infty = (1-\varepsilon)\Phi + \varepsilon\delta_\infty$, and $S^-(\varepsilon)$ satisfies
\[
E_{F_0}\, \chi_{MAD}\Big(\frac{X}{S^-(\varepsilon)}\Big) = 1/2,
\]
where $F_0 = (1-\varepsilon)\Phi + \varepsilon\delta_0$.

For all $\varepsilon < 1/2$ used, the conditions (5.30) and (5.31) were always met. So, even if those conditions seem somewhat restrictive, it is believed that many commonly used bounded and even score functions $\chi$ satisfy them.

Assuming that the score function $\chi$ is bounded, we have a lower bound for the explosion bias curve of the standard one-step M-estimator of dispersion, given by $S_1(F_\infty)$, where
\[
S_1(F_\infty) = S^+(\varepsilon) + S^+(\varepsilon)\,
\frac{(1-\varepsilon) E_\Phi \chi\big(\frac{X - B(\varepsilon)}{S^+(\varepsilon)}\big) + \varepsilon\chi(\infty)}
{(1-\varepsilon) E_\Phi \chi'\big(\frac{X - B(\varepsilon)}{S^+(\varepsilon)}\big)\big(\frac{X - B(\varepsilon)}{S^+(\varepsilon)}\big)}.
\]
Similarly, it is possible to get an upper bound on the implosion curve of the standard one-step M-estimator of dispersion. It is given by
\[
S_1(F_0) = S^-(\varepsilon) + S^-(\varepsilon)\,
\frac{(1-\varepsilon) E_\Phi \chi\big(\frac{X}{S^-(\varepsilon)}\big) + \varepsilon\chi(0)}
{(1-\varepsilon) E_\Phi \chi'\big(\frac{X}{S^-(\varepsilon)}\big)\big(\frac{X}{S^-(\varepsilon)}\big)}.
\]

It is also possible to get exact expressions for the explosion and implosion bias curves of the $\tau$-estimator. Rousseeuw and Croux (1993a) give them in the context of estimation of dispersion with known location. The following proposition improves their results.

Proposition 7 If $s \mapsto s^2 E_\Phi \chi\big(\frac{X-t}{s}\big)$ is increasing in the range $[S^-(\varepsilon), S^+(\varepsilon)]$ for fixed $t$ in $[-B(\varepsilon), B(\varepsilon)]$, and if $t \mapsto E_\Phi \chi\big(\frac{X-t}{s}\big)$ is increasing in the range $[0, B(\varepsilon)]$ for fixed $s$ in $[S^-(\varepsilon), S^+(\varepsilon)]$, then
\[
\sup_{F \in V_\varepsilon} S_1^\tau(F) = S_1^\tau(F_\infty)
\quad \text{and} \quad
\inf_{F \in V_\varepsilon} S_1^\tau(F) = S_1^\tau(F_0).
\]

Proof: The proof follows exactly the same steps as in Proposition 6. □

Analytical derivations, combined with numerical calculations similar to those performed for the MOSME of dispersion, have shown that the $\tau$-estimators with the $H_{2.516}$ and $T_{5.3}$ score functions satisfy the conditions of Proposition 7 when the median and the normalized MAD are used as preliminary estimates of location and dispersion.
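The bounds $B(\varepsilon)$, $S^+(\varepsilon)$ and $S^-(\varepsilon)$ that enter Propositions 6 and 7 can be evaluated numerically from the equations given for the median and the normalized MAD under a normal central model. The following is a minimal sketch, not the thesis's own code: the function names `location_bound`, `mad_lower` and `mad_upper` are hypothetical, $S^-(\varepsilon)$ is obtained by rearranging its defining equation into closed form, and $S^+(\varepsilon)$ is found by plain bisection.

```python
from statistics import NormalDist

PHI = NormalDist()          # standard normal distribution
C = PHI.inv_cdf(0.75)       # Phi^{-1}(3/4)


def _check(eps):
    if not 0.0 <= eps < 0.5:
        raise ValueError("eps must lie in [0, 1/2)")


def location_bound(eps):
    """B(eps): median of (1 - eps) * Phi + eps * (point mass at infinity)."""
    _check(eps)
    return PHI.inv_cdf(1.0 / (2.0 * (1.0 - eps)))


def mad_lower(eps):
    """S^-(eps): solves (1 - eps) * [Phi(-s c) + 1 - Phi(s c)] = 1/2 for s."""
    _check(eps)
    return PHI.inv_cdf(1.0 - 1.0 / (4.0 * (1.0 - eps))) / C


def mad_upper(eps, iterations=200):
    """S^+(eps): solves the implicit equation for the MAD under F_infinity."""
    _check(eps)
    b = location_bound(eps)

    def g(s):
        a = s * C
        return (1.0 - eps) * (PHI.cdf(b - a) + 1.0 - PHI.cdf(b + a)) + eps - 0.5

    lo, hi = 1e-9, 1.0
    while g(hi) > 0.0:      # g is decreasing in s, so widen until the sign changes
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for eps in (0.0, 0.05, 0.1, 0.2, 0.3, 0.4):
    print(eps, round(location_bound(eps), 4), round(mad_lower(eps), 4), round(mad_upper(eps), 4))
```

At $\varepsilon = 0$ all three reduce to the uncontaminated values (0, 1 and 1), which gives a quick sanity check of the implementation.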
Note that the condition requiring $s^2 E_\Phi \chi\big(\frac{X-t}{s}\big)$ to be increasing in $s$ can be written explicitly as $\Phi(t - cs) > \Phi(t + cs) - 1$ for the $H_c$ score function, which is clearly always true for any $s$, $t$ and $c$.

It is therefore possible to compare, in terms of maximum asymptotic explosion and implosion bias, the MOSME with the standard one-step M-estimator and the $\tau$-estimator of dispersion.

Figures 5.6, 5.7 and 5.8 show the explosion bias curve of the MOSME and a lower bound for the explosion bias curve of the one-step M-estimator of dispersion, derived from the $H_{0.975}$, $H_{2.376}$ and $T_{3.86}$ score functions. In all three cases, the MOSME shows a smaller maximum asymptotic bias than the one-step M-estimator for all $\varepsilon < 1/2$. The improvement is however greater in the case of the more efficient score functions $H_{2.376}$ and $T_{3.86}$.

Figure 5.9 shows the explosion curves of the MOSME and the $\tau$-estimator of dispersion for the Huber and Tukey score functions, for small but realistic contamination by outliers ($\varepsilon < 0.30$). We notice that for small values of $\varepsilon$, the MOSME has a lower maximum bias than the $\tau$-estimator. However, the situation is reversed when $\varepsilon$ gets bigger. As the figure shows, this happens with the Huber score function when $\varepsilon > 0.075$. (It also happens for the Tukey score function with $\varepsilon > 0.47$.) Nevertheless, notice that situations with a small fraction of contamination are representative of many real data sets with outliers.

When defining the MOSME or the $\tau$-estimator of dispersion, one needs to choose between different score functions $\chi$. As Figure 5.9 shows, the Huber score function is preferable to the Tukey score function with the MOSME, as well as with the $\tau$-estimator.

Finally, Figure 5.10 shows the implosion curves of the MOSME, the standard one-step M-estimator and the $\tau$-estimator derived from the Huber score function.
Figure 5.11 shows the implosion curves of the MOSME, the standard one-step M-estimator and the $\tau$-estimator derived from the Tukey score function. In both figures, the implosion curve of the normalized MAD is also given for comparison purposes. We notice that for smaller $\varepsilon$ ($\varepsilon < 0.27$ with the Huber score function and $\varepsilon < 0.33$ with the Tukey score function), the MOSME outperforms the other estimators. For larger $\varepsilon$, the MOSME uniformly beats the $\tau$-estimator, but is outperformed by the standard one-step M-estimator. The latter does not implode at $\varepsilon = 0.5$. Depending on the situation, this may or may not be desirable: when half the data coincide with a single point, the standard one-step M-estimator does not become 0 (whereas the MOSME, the $\tau$-estimator and the MAD do).

Rousseeuw and Croux (1993a) mention that the fully iterated M-estimator derived from the $H_{2.376}$ score function explodes at $\varepsilon = 0.9686/5.645 = 0.17$ and implodes at $\varepsilon = 1 - 0.17 = 0.83$. Similarly, the fully iterated M-estimator derived from the $T_{3.86}$ score function explodes at $\varepsilon = 0.165$ and implodes at $\varepsilon = 1 - 0.165 = 0.835$. At the normal model, the standard one-step M-estimator is consistent for $\sigma$ and so has the same asymptotic properties as the fully iterated M-estimator of dispersion.

When using a one-step M-estimator of dispersion in a situation with inliers, one needs to decide which score function to select in its definition, so as to minimize its implosion bias. Figure 5.12 shows that the MOSMEs, the standard one-step M-estimators and the $\tau$-estimators derived from the Huber and Tukey score functions are roughly equivalent in terms of implosion bias for small but realistic contamination ($\varepsilon < 0.3$). The $H_{2.376}$ MOSME appears however to slightly beat the other estimators.
If one expects a large fraction of contamination by inliers ($\varepsilon > 0.3$), then the standard one-step M-estimator of dispersion with the $H_{2.376}$ score function would probably be the best choice. As discussed, though, this choice may lead to catastrophic results in the presence of special samples. In that case, the MOSME with the $T_{3.86}$ score function may be a wiser choice.

5.3.1 Further Work to Be Done

The work on asymptotic robustness was developed for one specific central distribution $F_0^{\theta,\sigma}$ in the neighborhood $V_\varepsilon$: the normal distribution. More calculations need to be done for different central distributions $F_0^{\theta,\sigma}$, for example the ones used in the asymptotic efficiency section (5.2) of this text.

One could also define the MOSME, the standard one-step M-estimator and the $\tau$-estimator of dispersion using other types of score functions, and compare the different estimators in the hope that one proves better than all the others in most situations.

Finally, it remains to be seen whether the asymptotic properties of the three one-step M-estimators studied in this section are reflected in their finite-sample performance for small and medium sample sizes. This could be assessed with Monte Carlo simulations, for example.

5.4 Continuation of the Hummingbird Example

To illustrate the use of the MOSME as an alternative to the standard one-step M-estimator or the $\tau$-estimator of dispersion, we shall estimate the dispersion of the distributions of the flying times of the sixteen hummingbirds presented in the previous chapter, section 4.4. Remember that there are four types of hummingbirds: adult females (AF), adult males (AM), junior females (JF) and junior males (JM). Refer to Figure 4.5 for the bar plots of the flying times of all 16 birds.
The researcher who provided the data was interested not only in finding a measure of location within each type of bird, but also in describing more generally the features of the distribution of their flying times, such as their dispersion. One-step M-estimators of dispersion are good alternatives to the popular measure of dispersion, the standard deviation.

Table 5.10 presents the values of the MOSME, the standard one-step M-estimator and the $\tau$-estimator of dispersion, as well as the standard deviation and the normalized MAD, for the sixteen birds, assuming an underlying normal distribution. We notice that, contrary to the location setup, the Tukey score function is less conservative in terms of robustness than the Huber score function, since the Tukey estimates are further from the MAD than the Huber estimates.

As expected, the $H_{0.975}$ estimates stay very close to the MAD, and do not shed much more light than the MAD: the score function is almost as robust as the MAD. On the other hand, the MOSME and the $\tau$-estimates of dispersion are roughly comparable when using a Huber score function which is 95% efficient at the normal model. When using a Tukey score function with that 95% Gaussian efficiency, the $\tau$-estimator compares favourably with the standard one-step M-estimator of dispersion.

                        MOSME                  Standard One-Step       tau-Estimator
Bird   SD     MAD     H0.975 H2.376 T3.86    H0.975 H2.376 T3.86    H2.516 T5.3
AF 1   0.151  0.075   0.079  0.091  0.093    0.079  0.095  0.097    0.090  0.099
AF 2   2.430  0.097   0.096  0.131  0.133    0.095  0.147  0.140    0.127  0.140
AF 3   0.143  0.073   0.073  0.080  0.082    0.073  0.082  0.085    0.080  0.089
AF 4   0.275  0.074   0.076  0.111  0.117    0.077  0.151  0.145    0.106  0.124
AF 5   0.176  0.120   0.124  0.139  0.143    0.125  0.141  0.144    0.138  0.149
AM 1   0.132  0.076   0.083  0.100  0.100    0.083  0.097  0.099    0.097  0.102
AM 2   0.160  0.074   0.078  0.093  0.096    0.078  0.096  0.098    0.092  0.100
AM 3   0.103  0.044   0.048  0.052  0.054    0.047  0.054  0.054    0.052  0.056
AM 4   0.167  0.073   0.072  0.078  0.080    0.072  0.080  0.082    0.079  0.086
AM 5   0.105  0.064   0.066  0.075  0.077    0.066  0.078  0.078    0.075  0.080
JF 1   0.314  0.135   0.144  0.186  0.191    0.143  0.200  0.201    0.180  0.199
JF 2   0.387  0.090   0.094  0.123  0.128    0.095  0.133  0.137    0.119  0.135
JF 3   0.179  0.042   0.042  0.049  0.050    0.042  0.050  0.052    0.049  0.054
JM 1   0.237  0.044   0.048  0.063  0.067    0.048  0.077  0.083    0.061  0.073
JM 2   0.449  0.044   0.043  0.053  0.055    0.043  0.058  0.059    0.053  0.061
JM 3   0.182  0.104   0.101  1.124  0.129    0.100  0.138  0.137    0.124  0.138

Table 5.10: Measures of Dispersion of Flying Times of Four Types of Hummingbirds: Adult Females (AF), Adult Males (AM), Junior Females (JF) and Junior Males (JM). The MAD is the Median Absolute Deviation Multiplied by the Inverse of $\Phi^{-1}(3/4)$.

Interpreting those measures of dispersion is an important issue in dispersion estimation. For example, the flying times of junior female bird 1 have a high standard deviation, compared with their other measures of dispersion, which is due to a very heavy tail (see Figure 4.5 for a visual assessment of the outlier). The normalized MAD of the data is equal to 0.135; in other words, the interquartile range of this sample is 13.5% that of the normal distribution (whose MAD equals 1).
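The comparison made for junior female bird 1 can be spelled out as a short calculation. The sketch below takes the JF 1 entries of Table 5.10 and expresses each estimate as a multiple of the normalized MAD; the dictionary layout and the name `ratio_to_mad` are illustrative choices only, and the reading of ratios well above 1 as a sign of heavy tails is the informal one used in the text, not a formal test.

```python
# Row "JF 1" of Table 5.10 (flying times of junior female bird 1)
mad = 0.135
estimates = {
    "SD": 0.314,
    "MOSME H_0.975": 0.144,
    "MOSME H_2.376": 0.186,
    "MOSME T_3.86": 0.191,
    "one-step H_0.975": 0.143,
    "one-step H_2.376": 0.200,
    "one-step T_3.86": 0.201,
    "tau H_2.516": 0.180,
    "tau T_5.3": 0.199,
}

# Each dispersion estimate divided by the normalized MAD
ratio_to_mad = {name: value / mad for name, value in estimates.items()}

for name, ratio in ratio_to_mad.items():
    print(f"{name:18s} {ratio:5.2f} x MAD")
```

Every ratio exceeds 1, with the 95%-efficient estimators sitting between the MAD and the standard deviation.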
Because the values of the different one-step M-estimators for this specific bird are significantly higher than the MAD, we know that the sample is very heavy-tailed, without even looking at Figure 4.5.

The researcher was interested in determining a general measure of dispersion in flying times within the four types of hummingbirds. A robust analysis could be used, but it is beyond the scope of this thesis. However, a simple look at the estimates of dispersion suggests that no type of hummingbird has a dispersion in flying times clearly different from that of the other types.

5.5 Conclusions

The MOSME of dispersion presented in this chapter is in general better than the standard one-step M-estimator of dispersion: it is easier to compute, it has an asymptotic value closer to that of the MLE, it has a higher relative asymptotic efficiency for many heavy-tailed distributions, and it has a lower asymptotic explosion bias and a lower implosion bias for small $\varepsilon$-contaminations by inliers with a central normal distribution.

However, if one is concerned mainly about bias, and the data possibly contain a large contamination of inliers but not many repetitions, then the standard one-step M-estimator may be a better estimator of dispersion. In that situation, the choice of the Huber score function would be advisable.

For light-tailed underlying distributions, the one-step M-estimator of dispersion performs slightly better in terms of asymptotic relative efficiency than the MOSME (and the $\tau$-estimator). However, heavy-tailed distributions, an attempt to model samples with outliers, are of more interest and importance to robust theory.
On the other hand, the $\tau$-estimator offers some competition to the MOSME, in the sense that it is also easy to calculate, it estimates the dispersion of heavy-tailed distributions more accurately when it is derived from the Huber score function, and it presents a lower explosion bias for large contamination with a central normal distribution. The MOSME derived from the Tukey score function nevertheless has a higher relative efficiency than the $\tau$-estimator for heavy-tailed distributions. Moreover, the MOSME outperforms the $\tau$-estimator in terms of explosion bias for small (and more realistic) percentages of contamination, and beats the $\tau$-estimator in the presence of any percentage of inliers.

As the hummingbird example studied in this chapter indicates, there is similarly no obvious winner among the dispersion estimators with finite sample sizes. The behaviour of each of the MOSME, the standard one-step M-estimator and the $\tau$-estimator of dispersion is comparable to that of the others, depending on the score function used. The estimates however provide information about the tails of the distribution generating the data, which the MAD or the standard deviation cannot do.

If one is mainly interested in the accuracy (as in "small bias") of the estimate of dispersion of normal data with outliers present, and does not expect a large $\varepsilon$-contamination, then we would recommend the use of the MOSME of dispersion derived from the Huber score function. A large proportion of outliers would however call for the $\tau$-estimator with the Huber score function. However, if one is mainly concerned with the precision (as in "high relative efficiency") of the estimates of dispersion, then the MOSME defined by the Tukey score function would provide the best estimator of dispersion.

Figure 5.6: Explosion Bias Curve of the MOSME, and a Lower Bound for the Explosion Bias Curve of the One-Step M-Estimator of Dispersion, Derived from the $H_{0.975}$ Score Function. (Plot of maximum bias against epsilon, 0.0 to 0.5.)

Figure 5.7: Explosion Bias Curve of the MOSME, and a Lower Bound for the Explosion Bias Curve of the One-Step M-Estimator of Dispersion, Derived from the $H_{2.376}$ Score Function. (Plot of maximum bias against epsilon.)

Figure 5.8: Explosion Bias Curve of the MOSME, and a Lower Bound for the Explosion Bias Curve of the One-Step M-Estimator of Dispersion, Derived from the $T_{3.86}$ Score Function. (Plot of maximum bias against epsilon.)

Figure 5.9: Explosion Bias Curves of M-Estimators of Dispersion That Are 95% Efficient at the Normal Model, for Small Contamination by Outliers. (Plot of maximum bias against epsilon, 0.0 to 0.30; curves: tau M-estimator with $T_{5.3}$, MOSME with $T_{3.86}$, MOSME with $H_{2.376}$, tau M-estimator with $H_{2.51}$.)

Figure 5.10: Implosion Curve of M-Estimators of Dispersion. The MOSME and the One-Step M-Estimator Are Defined Through the $H_{2.376}$ Score Function, and the Tau-Estimator Has the $H_{2.516}$ Score Function. Those M-Estimators Are 95% Efficient at the Normal Model. The Normalized MAD is Provided for Comparison Purposes. (Plot against epsilon, 0.0 to 0.5.)

Figure 5.11: Implosion Curve of M-Estimators of Dispersion. The MOSME and the One-Step M-Estimator Are Defined Through the $T_{3.86}$ Score Function, and the Tau-Estimator, Through the $T_{5.3}$ Score Function. Those M-Estimators Are 95% Efficient at the Normal Model. The Normalized MAD is Provided for Comparison Purposes. (Plot against epsilon, 0.0 to 0.5.)

Figure 5.12: Implosion Curve of MOSMEs and $\tau$-Estimators That Are 95% Efficient at the Normal Model. (Plot against epsilon, 0.0 to 0.5.)

Bibliography

[1] Andrews, D.F., P.J. Bickel, F.R. Hampel, P.J. Huber, J.W. Tukey and W.H. Rogers (1972). Robust Estimates of Location: Survey and Advances, Princeton, NJ: Princeton University Press.
[2] Bernoulli, D. (1777). Dijudicatio maxime probabilis plurium observationum discrepantium atque verisimillima inde formand, Acta Acad. Sci. Petropolit. 1, 3-33. (English translation by C.G. Allen (1961). Biometrika 48, 3-13.)
[3] Berrendero, J.R. (1995). A Note on One Step M-Estimates of Location, Spain: Universidad Carlos III de Madrid (Working Paper).
[4] Bessel, F.W. and J.J. Baeyer (1838). Gradmessung in Ostpreussen und ihre Verbindung mit Preussischen und Russischen Dreiecksketten, Druckerei der Koniglichen Akademie der Wissenschaften Berlin. (Reprinted in part in Abhandlungen von F.W. Bessel, R. Engelmann (ed.), W. Engelmann, Leipzig, 1876, Vol. 3, pp. 62-138.)
[5] Bickel, P.J. (1975). One-Step Huber Estimates in the Linear Model, J. Amer. Statist. Assoc. 70, 428-434.
[6] Bickel, P.J. and E.L. Lehmann (1976). Descriptive Statistics for Non-Parametric Models III: Dispersion, Ann. Statist. 4, 1139-1158.
[7] Buchanan, J.L. and P.R. Turner (1992).
Numerical Methods and Analysis, New York: McGraw-Hill, 751 p.
[8] Casella, G. and R.L. Berger (1990). Statistical Inference, Belmont, CA: Wadsworth & Brooks/Cole Advanced Books and Software, 650 p.
[9] Donoho, D.L. (1982). Breakdown Properties of Multivariate Location Estimators, unpublished manuscript, Harvard University, Dept. of Statistics.
[10] Donoho, D.L. and P.J. Huber (1983). The Notion of Breakdown Point, in Festschrift fur Erich L. Lehmann, eds. P.J. Bickel, K. Doksum and J.L. Hodges, Jr., Belmont, CA: Wadsworth, 157-184.
[11] Fisher, R.A. (1922). On the Mathematical Foundations of Theoretical Statistics, reprinted in Contributions to Mathematical Statistics (1950) by J. Wiley, New York.
[12] Hampel, F.R. (1968). Contributions to the Theory of Robust Estimation, Ph.D. Thesis, University of California, Berkeley.
[13] Hampel, F.R. (1971). A General Definition of Robustness, Ann. Math. Statist. 42, 1887-1896.
[14] Hampel, F.R. (1974). The Influence Curve and Its Role in Robust Estimation, J. Amer. Statist. Assoc. 62, 1179-1186.
[15] Hampel, F.R., E.M. Ronchetti, P.J. Rousseeuw and W.A. Stahel (1986). Robust Statistics - The Approach Based on Influence Functions, New York: John Wiley & Sons.
[16] He, X. and D.G. Simpson (1993). Lower Bounds for Contamination Bias: Globally Minimax Versus Locally Linear Estimation, Ann. Statist. 21, 314-327.
[17] Hodges, J.L. Jr. (1967). Efficiency in Normal Samples and Tolerance to Extreme Values for Some Estimates of Location, Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability (Vol. 1), 163-168.
[18] Huber, P.J. (1964). Robust Estimation of a Location Parameter, Ann. Math. Statist. 35, 73-101.
[19] Huber, P.J. (1981). Robust Statistics, New York: John Wiley & Sons.
[20] Huber, P.J. (1984). Finite Sample Breakdown of M- and P-Estimators, Ann. Statist. 12, 119-126.
[21] Jureckova, J. and S. Portnoy (1987).
Asymptotics for One-Step M-Estimators in Regression With Application to Combining Efficiency and High Breakdown Point, Comm. Statist. Theory Methods 16, 2187-2200.
[22] Kiefer, J.C. (1987). Introduction to Statistical Inference, New York: Springer-Verlag.
[23] Le Cam, L. (1956). On the Asymptotic Theory of Estimation and Testing Hypotheses, Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, Berkeley: University of California Press, 129-56.
[24] Lehmann, E.L. (1983). The Theory of Point Estimation, New York: John Wiley.
[25] Martin, R.D. and R.H. Zamar (1989). Asymptotically Min-Max Bias Robust M-Estimates of Scale for Positive Random Variables, Ann. Statist. 84, 494-501.
[26] Martin, R.D. and R.H. Zamar (1993). Bias Robust Estimation of Scale, Ann. Statist. 21, 991-1017.
[27] Martin, R.D., V.J. Yohai and R.H. Zamar (1989). Min-Max Bias Robust Regression, Ann. Statist. 17, 1608-1630.
[28] Press, W.H., S.A. Teukolsky, W.T. Vetterling and B.P. Flannery (1992). Numerical Recipes in C, 2nd edition, New York: Cambridge University Press, 994 p.
[29] Rousseeuw, P.J. (1984). Least Median of Squares Regression, J. Amer. Statist. Assoc. 79, 871-880.
[30] Rousseeuw, P.J. and C. Croux (1991). Alternatives to the Median Absolute Deviation, J. Amer. Statist. Assoc. 88, 1273-1283.
[31] Rousseeuw, P.J. and C. Croux (1993a). The Bias of k-Step M-Estimators, Belgium: University of Antwerp, UIA, Department of Mathematics & Computer Science (Report no. 93-06).
[32] Rousseeuw, P.J. and C. Croux (1993b). Alternatives to the Median Absolute Deviation, J. Amer. Statist. Assoc. 88, 1273-1283.
[33] Tukey (1960). A Survey of Sampling from Contaminated Distributions, Contributions to Probability and Statistics, I. Olkin (ed.), Stanford, CA: Stanford University Press, 448-485.
[34] Yohai, V.J. and R.H. Zamar (1988).
High Breakdown-Point Estimates of Regression by Means of the Minimization of an Efficient Scale, J. Amer. Statist. Assoc. 83, 406-413.

Appendix A

Normalization of Densities for Calculations of Asymptotic Efficiencies

Many distributions $F$ are used as examples in the study of asymptotic efficiency of estimators of location and of relative asymptotic efficiency of estimators of dispersion. They are defined as the central distribution in their respective distribution family $\{F(x) : F(x) = F(\frac{x-\theta}{\sigma}),\ -\infty < x < \infty,\ -\infty < \theta < \infty,\ \sigma > 0\}$. The following lists the expression for the density associated with any member of their distribution family:

double exponential:
\[
f_{d_0}(x;\theta,\sigma) = \frac{1}{2 d_0 \sigma} \exp\Big(-\frac{|x-\theta|}{d_0\sigma}\Big)
\]
contaminated normal: a mixture with weight 0.9 on the normal density with scale $d_0\sigma$,
\[
0.9\, \frac{1}{\sqrt{2\pi}\, d_0\sigma} \exp\Big(-\frac{(x-\theta)^2}{2 d_0^2\sigma^2}\Big),
\]
and weight 0.05 on each of two normal components with variance $0.01\, d_0^2\sigma^2$ and normalizing constant $1/(\sqrt{2(0.01)\pi}\, d_0\sigma)$

t($\nu$), for $\nu$ = 1, 2, 5, 8, 10, 20:
\[
f_{d_0}(x;\theta,\sigma) = \frac{\Gamma\big(\frac{\nu+1}{2}\big)}{\Gamma\big(\frac{\nu}{2}\big)\, d_0\sigma\sqrt{\nu\pi}}
\Big[1 + \frac{1}{\nu}\Big(\frac{x-\theta}{d_0\sigma}\Big)^2\Big]^{-(\nu+1)/2}
\]
normal:
\[
f_{d_0}(x;\theta,\sigma) = \frac{1}{\sqrt{2\pi}\, d_0\sigma} \exp\Big(-\frac{1}{2}\Big(\frac{x-\theta}{d_0\sigma}\Big)^2\Big)
\]
symmetrized beta:
\[
f_{d_0}(x;\theta,\sigma) = \frac{6}{d_0\sigma}\Big(-\Big(\frac{x-\theta}{d_0\sigma}\Big)^2 + \frac{1}{4}\Big),
\qquad \theta - \frac{d_0\sigma}{2} < x < \theta + \frac{d_0\sigma}{2}
\]

Notice that the densities are functions of a certain factor $d_0$. Indeed, in order to make the different central distributions comparable in terms of spread, it was decided to normalize the associated density in their families by a factor $d_0$. The specific choice of $d_0$ makes the interquartile range of each central distribution equal to the standard normal interquartile range, 1.349.
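The normalization just described can be sketched numerically. Because each central distribution is symmetric about 0, matching the interquartile range to 1.349 amounts to matching the upper quartile to $\Phi^{-1}(3/4) \approx 0.6745$, so $d_0$ is that value divided by the upper quartile of the unscaled density. The code below is a minimal sketch under that reading: `simpson`, `upper_quartile` and `normalizing_factor` are hypothetical helper names, the integration and bisection settings are arbitrary, and values obtained this way need not agree with Table A.11 to all printed digits.

```python
import math
from statistics import NormalDist

R = NormalDist().inv_cdf(0.75)  # upper quartile of the standard normal, about 0.6745


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def upper_quartile(density, hi=50.0, iterations=60):
    """Upper quartile q of a density symmetric about 0: integral of f over [0, q] is 1/4."""
    lo = 0.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if simpson(density, 0.0, mid) < 0.25:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def normalizing_factor(density):
    """d_0 such that the rescaled density f(x / d_0) / d_0 has upper quartile R."""
    return R / upper_quartile(density)


def normal(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cauchy(x):  # t(1)
    return 1.0 / (math.pi * (1.0 + x * x))


def double_exponential(x):
    return 0.5 * math.exp(-abs(x))


for name, f in [("normal", normal), ("t(1)", cauchy), ("double exponential", double_exponential)]:
    print(name, round(normalizing_factor(f), 6))
```

The normal density should return a factor of 1 and the Cauchy density a factor equal to $\Phi^{-1}(3/4)$ itself, since the upper quartile of the standard Cauchy distribution is 1.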
Because these central distributions are symmetric around 0, their interquartile range is equal to $2r$, where $r$ satisfies
\[
F_{d_0}(r;\ \theta = 0,\ \sigma = 1) = 3/4.
\tag{A.32}
\]
By fixing the value of $r$ to 0.6744897502 (which can be obtained by solving the above equation (A.32) with $F_{d_0} = \Phi$, where $\Phi$ is the standard normal distribution function), it is therefore possible to find the normalizing factor $d_0$ that satisfies (A.32). Table A.11 shows the factor $d_0$ necessary to make each of the distributions of interest have an interquartile range equal to 1.349.

In summary, the distributions $F$ used in the calculations of asymptotic efficiency of estimators of location, and of relative asymptotic efficiency of estimators of dispersion, are the central distributions in their family, and so have densities $f_{d_0}(x;\ \theta = 0,\ \sigma = 1)$.

Distribution F            Normalization Factor d_0
double exponential        0.9723764576
contaminated normal       0.8820206848
t(1)                      0.6744897502
t(2)                      0.8254780431
t(5)                      0.9274971822
t(8)                      0.9541517180
t(10)                     0.9631157239
t(20)                     0.9811421330
normal                    1.0000000000
symmetrized beta          8.8849776417
0.55 exp(-x^4)            1.475435073

Table A.11: Normalizing Factor $d_0$ Needed to Standardize the Interquartile Range of Each Distribution $F$ to That of the Standard Normal Distribution

Appendix B

Derivation of the Influence Function of the MOSME of Location

The influence function of the functional $T(F)$, as defined by Hampel et al. in [15] (first introduced by Hampel in 1968), is
\[
IF(x; T, F) = \lim_{t \to 0} \frac{T(F_{t,x}) - T(F)}{t},
\]
where $F_{t,x} = (1-t)F + t\delta_x$. In other words, it is defined as the derivative with respect to $t$ of the functional $T(F_{t,x})$, evaluated at $t = 0$.

Therefore, to derive the influence function of the MOSME of location $T^*(F)$, we first need to write the expression for $T^*(F_{t,x})$, and then evaluate its derivative with respect to $t$ at $t = 0$.
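The expression for $T^*(F_{t,x})$ is
\[
T^*(F_{t,x}) = T_0(F_{t,x}) + S_0(F_{t,x})\,
\frac{(1-t)\, E_F \psi\big(\frac{X - T_0(F_{t,x})}{S_0(F_{t,x})}\big) + t\, \psi\big(\frac{x - T_0(F_{t,x})}{S_0(F_{t,x})}\big)}{E_\Phi \psi'(Z)}.
\]
Differentiating this expression with respect to $t$, and evaluating it at $t = 0$, gives
\[
\begin{aligned}
IF(x; T^*, F)
&= IF(x; T_0, F) + IF(x; S_0, F)\, \frac{E_F \psi(y)}{E_\Phi \psi'(Z)} \\
&\quad + \frac{S_0}{E_\Phi \psi'(Z)} \Big\{ -E_F \psi(y) + \psi\Big(\frac{x - T_0}{S_0}\Big)
 + E_F \psi'(y)\Big( -\frac{IF(x; T_0, F)}{S_0} - y\, \frac{IF(x; S_0, F)}{S_0} \Big) \Big\} \\
&= IF(x; T_0, F)\Big\{ 1 - \frac{E_F \psi'(y)}{E_\Phi \psi'(Z)} \Big\}
 + IF(x; S_0, F)\, \frac{E_F \psi(y) - E_F \psi'(y)\, y}{E_\Phi \psi'(Z)} \\
&\quad + \frac{S_0}{E_\Phi \psi'(Z)} \Big\{ \psi\Big(\frac{x - T_0}{S_0}\Big) - E_F \psi(y) \Big\},
\end{aligned}
\tag{B.33}
\]
where $T_0 = T_0(F)$, $S_0 = S_0(F)$, $y = \frac{X - T_0}{S_0}$, and $IF(x; S_0, F)$ and $IF(x; T_0, F)$ are the respective influence functions of the initial estimates of dispersion and of location.

But if $T_0$ is consistent, then $T_0(F) = \theta = T(F)$, where $T(F)$ is the fully-iterated M-estimator of location $\theta$. Moreover, in that case, $E_F \psi\big(\frac{X - T_0(F)}{S_0(F)}\big) = 0$, since by definition $E_F \psi\big(\frac{X - T(F)}{S_0(F)}\big) = 0$ and $T_0(F) = T(F)$. If we also assume that $\psi$ is odd and $F$ is symmetric, then $E_F \psi'\big(\frac{X - T_0}{S_0}\big)\big(\frac{X - T_0}{S_0}\big) = 0$, since $T_0(F) = T(F) = 0$. Therefore, provided that $T_0$ is consistent, $\psi$ is odd and $F$ is symmetric, equation (B.33) reduces to
\[
IF(x; T^*, F) = IF(x; T_0, F)\Big\{ 1 - \frac{E_F \psi'(y)}{E_\Phi \psi'(Z)} \Big\}
 + \frac{S_0(F)\, \psi\big(\frac{x}{S_0(F)}\big)}{E_\Phi \psi'(Z)}.
\]
Notice that the second term on the right-hand side of the above expression can be rewritten as
\[
\frac{S_0(F)\, \psi\big(\frac{x}{S_0(F)}\big)}{E_\Phi \psi'(Z)}
= \frac{E_F \psi'(y)}{E_\Phi \psi'(Z)} \cdot \frac{S_0(F)\, \psi\big(\frac{x}{S_0(F)}\big)}{E_F \psi'(y)}
= \frac{E_F \psi'(y)}{E_\Phi \psi'(Z)}\, IF(x; T, F),
\]
where $IF(x; T, F)$ is the influence function of the one-step location M-estimator, which is equivalent to the influence function of the fully-iterated location M-estimator, provided that $T_0$ is consistent, odd and translation invariant (see [19] for more details). If these last conditions for $T_0$ hold, and if $\psi$ is odd and $F$ is symmetric, the expression for $IF(x; T^*, F)$ finally simplifies to
\[
IF(x; T^*, F) = (1 - a)\, IF(x; T_0, F) + a\, IF(x; T, F),
\quad \text{where } a = \frac{E_F \psi'\big(\frac{X}{S_0(F)}\big)}{E_\Phi \psi'(Z)}.
\]

Appendix C

Derivation of the Influence Function of the MOSME of Dispersion

As explained in Appendix B, to derive the influence function of the MOSME of dispersion $S^*(F)$, we first need to write the expression for $S^*(F_{t,x})$, and then evaluate its derivative with respect to $t$ at $t = 0$.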
The expression for S*(F ) is ttX S*(F ) = S (F ) + tiX 0 S (F , ){(l 0 + *x(^gy)}. - t)E (^f^) t x FX ttX E* '(z)z X Derivating this expression with respect to t, and evaluating it at t = 0, gives IF(x- S*, F) = IF(x; S ,F) + ^-^{IFix-, S , F)E (y) 0 ^(x^){= / F ( -^ 7F(x;S ,F){l + ^ 0 m*\T ,F){^} 0 where T = T {F), S = S (F), 0 F ) 5 0 0 0 0 °-g^ - ) / F ( M + $ + FX S [-E (y)+ 0 FX ^ ' } ) + X(^)]} " F ) } + ^M t)-EFx{v)h y = ^ ^ p , S: IF(x;S ,F) and IF(x; T ,F). are the 0 0 respective influence functions of the initial estimates of dispersion and of location. But since E '( 7j°^) y FX = 0 when x is even and F is symmetric, and since T (F) = 0 0 when F is symmetric, equation (G.34) reduces to IF ; ;F) (X S = / f „ , f) (i + izMnl. ( S o So 105 ° " ^ p > } + Appendix D D e r i v a t i o n of the Influence F u n c t i o n of the One-Step M - E s t i m a t o r of Dispersion As explained in Appendix B , to derive the influence function of the one-step M-estimator of dispersion, S\(F), we first need to write the expression for Si(F ), and then evaluate tiX its derivative with respect to t at t = 0. The expression for Si(F , ) is t x Derivating this expression with respect to t (using the derivation formula for a ratio a two functions), and evaluating it at t = 0, gives IF(x;S ,F) 1 = IF(x-S ,F) + 0 So[-E (y) ^ {(IF(x-So,F)E (y)+ w w FX + E x'(y){ ~ ( ' > ) °~W~ °) ( < ' } IF FX x To F s T IF x }_|- Sn F F x(^)]E '(y)y-SoE (y)(-E x (y)y+ f FX FX F ^[(x (y)y + x ^ ) ) ( // J F ( )c(*t)(*t))} = IF{x- So, F){2^L T F(r1 | + F\f ^ EpX l 0 *FX\y)\ , r ^ " T o I + 2 J + ' )]+ F ) (D-35) }+ S FX 106 r o ) / i ? ( 3 ; ; S Q *^«)J W E FX 1f- F ) 5 Q ' F\"(y)y [E '(y)y] [E '(y)y]* ' E x(y) E '(y) [E x'(y)y? P FX F E '(y)y FX ' E '(y) E '( ) i FX FX y y -i , + Appendix D. 
Derivation of the Influence Function of the One-Step M-Estimator of Dispersionl07 where T 0 = T {F), 0 S = 0 S (F), 0 ^ ^ f , y = IF{x;S ,F) 0 and IF(x;T ,F) 0 are the respective influence functions of the initial estimates of dispersion and of location. But since Erx'i'-^ ) 1 = 0 and E y\*fffi){*gffi) = 0 when F is symmetric, and since TQ(F) = 0 when IF(x- ,F) SL = IF{x- S, 0 F) F is symmetric, equation K ^ ^ l L ) + E F X l x (D.35) is even and reduces to ^( ^^^'}+ E F Appendix E Derivation of the Influence Function of the r-Estimator of Dispersion As explained in Appendix B , to derive the influence function of the T-estimator of dispersion, Si(F), we first need to write the expression for Sl(F ), and then evaluate its ttX derivative with respect to t at t = 0. The expression for Si(F ) tjX is where /3 = E$x( )x Derivating this expression with respect to t, and evaluating it at t = 0 gives IF(x;SlF) = i { 2S iF ,;S ,F) 0 { 0 E F x { y ) + 2yJ £E (y) T •• FX f[-E (y) + ^X^)(- FX = 5 o f F ( ^p{IF(x; S ,F)[~^(Sl) 2 0 " - F ) -|- r o ) / F ( 3 : ; S o ' F ) ) + X(^f)]} - ^-E x'(y)y]- IF(x;T ,F)fE x'(y)-(Siy 0 r o • F F + f ( -^)}, x X (E.36) where T = T (F), S = S {F), SI = S?{F), y = 0 0 0 0 IF(x;T ,F) 0 and IF{x,S ,F) 0 are the respective influence functions of the initial estimators of location and dispersion. But since E x'(y) = 0 when x is even and F is symmetric, and since To(F) = 0 when F F is symmetric, y = IF( ;SIF) X = and equation (E.36) reduces to iF(x-,So,F){^-^E \y)y} + FX 108 x-T (F)} F 1 ^ x { ^ f 0 F ) _ Sf(F) 2 '
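The final identity of Appendix B, $IF(x; T^*, F) = (1 - a)\, IF(x; T_0, F) + a\, IF(x; T, F)$, can be checked numerically. The sketch below is only an illustration under assumptions that are not taken from the thesis: $F = \Phi$, a Huber $\psi$ with tuning constant 1.345, the median as $T_0$, the unnormalized MAD as $S_0$ (chosen so that $a \neq 1$ at the Gaussian model), and the contaminated distribution $F_{t,x} = (1-t)F + t\delta_x$. All function names are hypothetical. It compares a finite-difference derivative of $T^*(F_{t,x})$ at $t = 0$ with the closed-form expression.

```python
from statistics import NormalDist

N = NormalDist()   # standard Gaussian model Phi (assumed underlying F)
C = 1.345          # Huber tuning constant (illustrative assumption)

def psi(u):
    """Huber score function: odd, clipped at +/- C."""
    return max(-C, min(C, u))

def mean_psi(m, s):
    """Closed form of E_Phi psi((X - m)/s) for the Huber psi."""
    a1, a2 = m - C * s, m + C * s
    tails = C * (1.0 - N.cdf(a2) - N.cdf(a1))
    centre = (N.pdf(a1) - N.pdf(a2) - m * (N.cdf(a2) - N.cdf(a1))) / s
    return tails + centre

def mean_dpsi(s):
    """E_Phi psi'(X/s) = P(|X| < C s) for the Huber psi."""
    return 2.0 * N.cdf(C * s) - 1.0

def root(g, lo, hi):
    """Bisection for a nondecreasing g with g(lo) < 0 <= g(hi)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return hi

def initial_estimates(t, x):
    """Median T_0 and unnormalized MAD S_0 of F_{t,x} = (1-t) Phi + t delta_x."""
    cdf = lambda u: (1 - t) * N.cdf(u) + (t if u >= x else 0.0)
    m = root(lambda u: cdf(u) - 0.5, -10.0, 10.0)
    mass = lambda s: ((1 - t) * (N.cdf(m + s) - N.cdf(m - s))
                      + (t if abs(x - m) <= s else 0.0))
    s = root(lambda v: mass(v) - 0.5, 0.0, 10.0)
    return m, s

def mosme_location(t, x):
    """T*(F_{t,x}): one step from T_0 with the fixed denominator E_Phi psi'(z)."""
    m, s = initial_estimates(t, x)
    num = (1 - t) * mean_psi(m, s) + t * psi((x - m) / s)
    return m + s * num / mean_dpsi(1.0)

def if_finite_difference(x, t=1e-6):
    """Numerical derivative of T*(F_{t,x}) with respect to t at t = 0."""
    return (mosme_location(t, x) - mosme_location(0.0, x)) / t

def if_formula(x):
    """(1 - a) IF(x; T_0, F) + a IF(x; T, F), valid for x != 0."""
    _, s0 = initial_estimates(0.0, x)
    a = mean_dpsi(s0) / mean_dpsi(1.0)
    if_median = (1.0 if x > 0 else -1.0) / (2.0 * N.pdf(0.0))
    if_full = s0 * psi(x / s0) / mean_dpsi(s0)
    return (1 - a) * if_median + a * if_full

if __name__ == "__main__":
    for x in (-2.0, -0.3, 0.5, 1.5):
        print(x, if_finite_difference(x), if_formula(x))
```

The two columns printed by the script should agree up to the $O(t)$ error of the finite difference. The check avoids $x = 0$ and $|x|$ equal to the MAD, where the influence functions of the median and of the MAD are discontinuous.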
Item Metadata
Title | A comparison between several one-step M-estimators of location and dispersion in the presence of a nuisance parameter |
Creator | Rainville, Eve |
Date Issued | 1996 |
Extent | 4874959 bytes |
Genre | Thesis/Dissertation |
Type | Text |
File Format | application/pdf |
Language | eng |
Date Available | 2009-02-14 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0087209 |
URI | http://hdl.handle.net/2429/4557 |
Degree | Master of Science - MSc |
Program | Statistics |
Affiliation | Science, Faculty of; Statistics, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 1996-11 |
Campus | UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |