UBC Theses and Dissertations

A comparison between several one-step M-estimators of location and dispersion in the presence of a nuisance… Rainville, Eve 1996


A COMPARISON BETWEEN SEVERAL ONE-STEP M-ESTIMATORS OF LOCATION AND DISPERSION IN THE PRESENCE OF A NUISANCE PARAMETER

by

EVE RAINVILLE

B.Sc., University of Ottawa, 1994

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

in

THE FACULTY OF GRADUATE STUDIES

Department of Statistics

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

August 1996

© Eve Rainville, 1996

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Statistics
The University of British Columbia
Vancouver, Canada

Abstract

The idea of one-step estimators has long been used: Le Cam (1956), Neyman (1949) and Fisher (1922) proposed it in the context of maximum likelihood estimation. More recently, Bickel (1975) adapted this idea to robustness theory when he introduced one-step Huber M-estimators for simple linear models. Huber (1981) and Hampel et al (1986) further investigated the advantages of such one-step M-estimators: while retaining the robustness properties of their initial estimates, one-step M-estimators show increased efficiency, and thus represent a good compromise between robust and parametric estimation. Different versions of one-step M-estimators, some more numerically stable than others, have been proposed over the years. To our knowledge, no thorough comparison of the available one-step M-estimators has been done using modern techniques, such as those of Rousseeuw and Croux (1993a).
In this thesis, two versions of one-step M-estimators of location, obtained with the Newton-Raphson method, are studied in the context of unknown dispersion. Their asymptotic efficiencies at Gaussian and non-Gaussian models, as well as their maximum asymptotic biases, are compared. We also introduce two new one-step M-estimators of dispersion with unknown location, and challenge the traditional fixed-point one-step M-estimator of dispersion, which originates from Huber (1981) and was used by Rousseeuw and Croux (1993a). Using their relative asymptotic efficiencies at different models and their explosion and implosion maximum asymptotic bias curves, we identify the situations in which each of these three one-step M-estimators of dispersion is best used.

Table of Contents

Abstract
List of Tables
List of Figures
Acknowledgements
1 Introduction
2 M-Estimators: Maximum Likelihood Type of Estimators
   2.1 Univariate Problem
      2.1.1 Qualitative Robustness
      2.1.2 Infinitesimal Aspects
      2.1.3 Quantitative Robustness
   2.2 Nuisance Parameter in the Location-Dispersion Problem
3 One-Step M-Estimators
   3.1 General Idea of One-Step M-Estimators
   3.2 Estimation of Location With Unknown Dispersion
   3.3 Estimation of Dispersion With Unknown Location
   3.4 The First Step Is a Big Step
4 Location Estimation with Unknown Dispersion
   4.1 Introduction
   4.2 Asymptotic Efficiency of the MOSME of Location
      4.2.1 Our Choice of Underlying Distributions and of Score Functions
      4.2.2 The Asymptotic Variance of the Standard One-Step and the Fully-Iterated M-Estimators of Location
      4.2.3 Asymptotic Variance of the MOSME of Location
      4.2.4 The Asymptotic Variance of the MLE
      4.2.5 Asymptotic Efficiency of the MOSME Compared to That of the Standard One-Step (or Fully-Iterated) M-Estimator of Location
   4.3 Maximum Bias of the MOSME of Location
      4.3.1 Monotone Non-Decreasing Score Functions
      4.3.2 Redescending Score Functions
      4.3.3 Any Type of Score Function
      4.3.4 Further Work to Be Done
   4.4 An Example: Hummingbirds
   4.5 Conclusions
5 Dispersion Estimation with Unknown Location
   5.1 Introduction
   5.2 Relative Asymptotic Efficiency of the MOSME of Dispersion
      5.2.1 Our Choice of Underlying Distributions and of Score Functions
      5.2.2 Asymptotic Value of the MOSME, the One-Step M-Estimator and the τ-Estimator of Dispersion
      5.2.3 Asymptotic Variance of the MOSME of Dispersion
      5.2.4 Asymptotic Variance of the Standard One-Step M-Estimator of Dispersion
      5.2.5 Asymptotic Variance of the τ-Estimator of Dispersion
      5.2.6 Relative Asymptotic Efficiency of the MOSME Compared to That of the One-Step M-Estimator and the τ-Estimator of Dispersion
   5.3 Maximum Bias of the MOSME of Dispersion
      5.3.1 Further Work to Be Done
   5.4 Continuation of the Hummingbird Example
   5.5 Conclusions
Bibliography
A Normalization of Densities for Calculations of Asymptotic Efficiencies
B Derivation of the Influence Function of the MOSME of Location
C Derivation of the Influence Function of the MOSME of Dispersion
D Derivation of the Influence Function of the One-Step M-Estimator of Dispersion
E Derivation of the Influence Function of the τ-Estimator of Dispersion

List of Tables

1.1 Results of the Implementation of the Loop in Turbo Pascal
4.2 The Value of the Constant Denominator, $E_\Phi \psi'(Z)$, in the Ratio Defining the Three MOSME's of Location Under Study
4.3 Asymptotic Variance V(MLE, F) of the MLE for Different Underlying Distributions F
4.4 Asymptotic Efficiency of the MOSME and the Standard One-Step M-Estimator of Location (Equivalent to That of the Fully-Iterated M-Estimator), Derived from Different Score Functions ψ and for Different Underlying Distributions F. The Asymptotic Efficiency of the Median, the Initial Estimator of Location, Is Provided for Comparison Purposes. The Normalized MAD Is Used As an Initial Estimator of Dispersion
4.5 Measures of Location of Flying Times of Four Types of Hummingbirds: Adult Females (AF), Adult Males (AM), Junior Females (JF) and Junior Males (JM)
5.6 The Value of the Constant Denominator, $E_\Phi \chi'(Z)Z$, in the Ratio Defining the Three MOSME's of Dispersion Under Study
5.7 Asymptotic Value of the MLE, the MOSME, the Standard One-Step M-Estimator and the τ-Estimator of Dispersion, As Well As the Normalized MAD and the Standard Deviation (SD), for Different Underlying Distributions F. The Initial Estimators of Dispersion and Location Used to Calculate the One-Step M-Estimators Are Respectively the Normalized MAD and the Median
5.8 Relative Asymptotic Variance RV(MLE, F) of the MLE for Different Underlying Distributions F
5.9 Relative Asymptotic Efficiency of the MOSME, the One-Step M-Estimator and the τ-Estimator of Dispersion Derived from Different Score Functions χ, for Different Underlying Distributions F. The Relative Asymptotic Efficiency of the Initial Estimator of Dispersion, the Normalized MAD, Is Provided for Comparison Purposes. The Median Is Used as the Initial Estimator of Location
5.10 Measures of Dispersion of Flying Times of Four Types of Hummingbirds: Adult Females (AF), Adult Males (AM), Junior Females (JF) and Junior Males (JM). The MAD Is the Median Absolute Deviation Multiplied by the Inverse of $\Phi^{-1}(3/4)$
A.11 Normalizing Factor $d_0$ Needed to Standardize the Interquartile Range of Each Distribution F to That of the Standard Normal Distribution

List of Figures

4.1 Maximum Bias Function of the MOSME, and a Lower Bound for the Maximum Bias Function of the One-Step M-Estimator and the Fully Iterated M-Estimator of Location, Derived from the $H_{1.345}$ Score Function
4.2 Maximum Bias Function of the MOSME, and a Lower Bound for the Maximum Bias Function of the One-Step M-Estimator and the Fully Iterated M-Estimator of Location, Derived From the NCDF Score Function
4.3 Lower Bound on the Maximum Bias Function of the MOSME, the Standard One-Step and the Fully-Iterated M-Estimators of Location, Derived from the $T_{4.7}$ Score Function
4.4 Maximum Bias Function of the MOSME's of Location Derived from the $H_{1.345}$ and the NCDF Score Functions, and a Lower Bound on the Maximum Bias Function of the MOSME of Location Derived from the $T_{4.7}$ Score Function
4.5 Bar Plots of Flying Times of Four Types of Hummingbirds: Adult Females, Adult Males, Junior Females and Junior Males
5.6 Explosion Bias Curve of the MOSME, and a Lower Bound for the Explosion Bias Curve of the One-Step M-Estimator of Dispersion, Derived from the $H_{0.975}$ Score Function
5.7 Explosion Bias Curve of the MOSME, and a Lower Bound for the Explosion Bias Curve of the One-Step M-Estimator of Dispersion, Derived from the $H_{2.376}$ Score Function
5.8 Explosion Bias Curve of the MOSME, and a Lower Bound for the Explosion Bias Curve of the One-Step M-Estimator of Dispersion, Derived from the $T_{3.86}$ Score Function
5.9 Explosion Bias Curves of M-Estimators of Dispersion That Are 95% Efficient at the Normal Model, for Small Contamination by Outliers
5.10 Implosion Curve of M-Estimators of Dispersion. The MOSME and the One-Step M-Estimator Are Defined Through the $H_{2.376}$ Score Function, and the τ-Estimator Has the $H_{2.516}$ Score Function. Those M-Estimators Are 95% Efficient at the Normal Model. The Normalized MAD Is Provided for Comparison Purposes
5.11 Implosion Curve of M-Estimators of Dispersion. The MOSME and the One-Step M-Estimator Are Defined Through the $T_{3.86}$ Score Function, and the τ-Estimator Through the $T_{5.3}$ Score Function. Those M-Estimators Are 95% Efficient at the Normal Model. The Normalized MAD Is Provided for Comparison Purposes
5.12 Implosion Curve of MOSME's and τ-Estimators That Are 95% Efficient at the Normal Model

Acknowledgements

I am forever grateful to many individuals for helping me successfully complete this thesis. Thanks to my supervisor, Dr. Ruben Zamar, who introduced me to robustness theory and guided my thoughts and my research. Although I had no previous experience in robustness, he willingly offered me his financial support and gave me as much time as I needed to feel comfortable with my thesis topic. I also wish to thank Dr. Paul Gustafson, who found time to revise my thesis in spite of his heavy professional and personal responsibilities. I must express my gratitude to Janet Moore, who happily provided me with data from her own research on hummingbirds. Many friends made my stay in the Statistics department and in Vancouver very enjoyable. Thank you, Yulia, Xiaochun, Paige, Grace, Hubert, Shideh, Nancy, Karine, Mike, Didier, Cathy and Terry. I will always remember you. But most of all, through his constant encouragement and help, my husband Marc Theberge really made a difference. Thank you, Marc, for your availability and contagious curiosity, your great meals, and your expertise, which you were always eager to share. I couldn't have done it without you.

Chapter 1

Introduction

The importance of point estimation in statistics has long been established, and its usefulness in all disciplines requiring statistical analysis is greater than ever.
However, classical techniques, mostly based on maximum likelihood, still run into difficulties with less than perfect data, which can lead to disastrous conclusions. In an attempt to solve the major problem of contamination of the data by outliers, and to adjust to a variety of possible underlying processes generating the data (the true one being impossible to determine), the statistical community developed what is now known as robustness theory.

Modest attempts at robustness go back at least two centuries. Simple and intuitive robust methods, such as the rejection of outliers, were discussed by Bernoulli (1777) and by Bessel and Baeyer (1838). In the 19th and early 20th centuries, other authors considered ways to partly downweight extreme observations, much in the spirit of modern robustness. Tukey (1960) summarized the statistical work of the 1940s and 1950s, demonstrated the nonrobustness of the mean and investigated some robust alternatives. His paper shaped robust estimation into a general area of research and broke the isolation of the early pioneers. See the historical notes by Hampel (1986) (pp. 34-36) for a more complete review of early work in robustness. Robustness theory was officially launched in 1964, with the first attempt at a reasonably manageable, realistic and comprehensive theory: Peter J. Huber's famous paper Robust Estimation of a Location Parameter (see [18]).

In this paper, Huber introduced M-estimators of location as a generalization of maximum likelihood estimators; this class includes the mean and the median, among many others. More specifically, an M-estimator of location is the value θ which satisfies

$$\sum_{i=1}^{n} \psi(X_i - \theta) = 0, \qquad (1.1)$$

where $X_1, \ldots, X_n$ is a sample from the population with distribution $F(x;\theta)$, and ψ is a score function defining the estimator.
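As a minimal sketch of how (1.1) can be solved, assume the dispersion is known and equal to 1, and take Huber's score $\psi_c(u) = \max(-c, \min(c, u))$ with the illustrative tuning constant c = 1.345. Because $\psi_c$ is non-decreasing, the left-hand side of (1.1) is non-increasing in θ, so bisection between the smallest and the largest observation finds a root. The function names `huber_psi` and `m_location` are hypothetical.

```python
def huber_psi(u, c=1.345):
    """Huber score: identity on [-c, c], clipped to -c or +c outside."""
    return max(-c, min(c, u))


def m_location(xs, c=1.345, iters=200):
    """Solve sum_i psi_c(x_i - theta) = 0 for theta by bisection (dispersion fixed at 1)."""
    def total(theta):
        return sum(huber_psi(x - theta, c) for x in xs)

    lo, hi = min(xs), max(xs)  # total(lo) >= 0 >= total(hi) since psi_c is non-decreasing
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) > 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # The outlier 100 enters the equation only through psi_c(100 - theta) = c.
    print(m_location([1.0, 2.0, 3.0, 4.0, 100.0]))
    # With a huge c, psi_c(u) = u on the data range and the root is the sample mean.
    print(m_location([1.0, 2.0, 3.0, 10.0], c=1e6))
```

Taking c very large recovers the mean, while letting c shrink towards 0 moves the estimate towards the median, which is why the mean and the median both belong to this family.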
Huber (1964) found the M-estimator of location which minimizes the maximum asymptotic variance among all location estimators in the symmetric family of ε-contaminated distributions

$$\mathcal{V}_\varepsilon(F_0) = \{F : F(x) = (1-\varepsilon)F_0(x-\theta) + \varepsilon H(x)\},$$

where $0 < \varepsilon < 1/2$ is fixed and H is symmetric. This minimax estimator has since been called the Huber M-estimator of location. Considering more generally the asymmetric family $\mathcal{V}_\varepsilon(F_0)$, where H is allowed to be asymmetric, Huber (1964) also showed that among all translation equivariant estimators of location, the median minimizes the maximum asymptotic bias. After the publication of this paper, the mean undoubtedly lost much of its appeal as an estimator of location. These two results illustrate two important concerns of robustness theory: the asymptotic efficiency of an estimator versus its asymptotic bias or, more generally, its robustness properties.

Following Huber's (1964) paper, a variety of robust estimators were proposed for dispersion¹, regression and general linear models (and, more recently, their multivariate extensions), as well as for hypothesis testing and more complex statistical models such as time series. Through standard techniques, as well as others developed specifically for robustness purposes, it was shown that these estimators offer competitive alternatives to maximum likelihood estimation, especially in the presence of (possibly) corrupted data. Nevertheless, robustness never acquired the popularity that would have made it a standard technique of estimation. The following two criticisms of robustness theory may explain why:
• There is generally a trade-off between robustness and efficiency: the more robust an estimator is, the less efficient it is, which necessarily affects the precision of the estimation; and
• Robust estimators often represent a computational challenge, which may require too much computing work and time to overcome.
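The ε-contaminated model described above can be simulated to see why the mean lost its appeal. This sketch assumes $F_0 = N(0, 1)$, θ = 0, ε = 0.1 and a hypothetical asymmetric contaminating distribution H = N(10, 1); the helper `contaminated_sample` is illustrative only. With these choices the sample mean is pulled towards ε × 10 = 1, while the sample median stays close to 0.

```python
import random
import statistics


def contaminated_sample(n, eps, shift, seed=0):
    """Draw n points from (1 - eps) N(0, 1) + eps N(shift, 1) with a fixed seed."""
    rng = random.Random(seed)
    # With probability eps the point comes from the contaminating component H = N(shift, 1).
    return [rng.gauss(shift if rng.random() < eps else 0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    xs = contaminated_sample(n=2000, eps=0.1, shift=10.0)
    print("mean:  ", statistics.fmean(xs))
    print("median:", statistics.median(xs))
```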
For example, popular robust estimators such as the median, the MAD (Median Absolute Deviation) and the LMS (Least Median of Squares) are highly resistant to outliers. However, the Gaussian efficiencies of these estimators are very low: the median and the MAD have 63.7% and 36.7% efficiency, respectively (see Hampel et al (1986)), and the LMS regression estimator converges only at the rate $n^{-1/3}$, instead of the usual $n^{-1/2}$ (see Rousseeuw (1984)). Moreover, computing these robust estimators requires more time and memory than computing their maximum likelihood counterparts.

¹ Note that the word dispersion in this thesis means what is usually referred to as scale. The dispersion corresponds to the spread of a distribution, whereas the scale is a measure of distance between the center of a distribution and 0, and therefore varies with location. The distinction between the two concepts is only now starting to be made.

In answer to these two criticisms, Bickel (1975) adapted an old idea from the statistical literature to robust M-estimation. Le Cam (1956), Neyman (1949) (see [22]) and Fisher (1922), in the early years of modern statistics, had observed that in the univariate location estimation setup, "if F is known, and ψ = (−f′/f), the estimate obtained by starting with a √n consistent estimate of θ and performing one Gauss-Newton iteration of (1.1) is asymptotically efficient even when the MLE is not and is equivalent to it when it is" (see [5], p. 428). At a time when computers were still a dream, reducing the effort needed to compute an estimator, with no apparent loss in its asymptotic properties, represented a major advantage. Inspired by the observation of Fisher, Neyman and Le Cam, Bickel (1975) proposed a one-step Huber M-estimator for simple linear models, such as location and regression through the origin.
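The Gaussian efficiencies quoted above for the median and the MAD can be checked with the relation between asymptotic variance and influence function, $V(T, F) = E_F\{IF(X; T, F)^2\}$, which is equation (2.4) of Chapter 2. This sketch assumes the standard closed-form influence functions at the standard normal Φ: $\mathrm{sign}(x)/(2\varphi(0))$ for the median, and $\mathrm{sign}(|x| - q)/(4q\varphi(q))$ with $q = \Phi^{-1}(3/4)$ for the normalized MAD. These are quoted, not derived here. Efficiencies are taken relative to the mean (V = 1) and the standard deviation (V = 1/2); the function names are illustrative.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal model N(0, 1)


def median_efficiency():
    # IF(x) = sign(x) / (2 phi(0)), so V = E[IF^2] = 1 / (4 phi(0)^2) = pi / 2
    v_median = 1.0 / (4.0 * PHI.pdf(0.0) ** 2)
    return 1.0 / v_median  # the sample mean has V = 1 at the normal


def mad_efficiency():
    # Normalized MAD = MAD / q with q = Phi^{-1}(3/4), so that it is consistent for sigma
    q = PHI.inv_cdf(0.75)
    # IF(x) = sign(|x| - q) / (4 q phi(q)), so V = 1 / (16 q^2 phi(q)^2)
    v_mad = 1.0 / (16.0 * q ** 2 * PHI.pdf(q) ** 2)
    return 0.5 / v_mad  # the standard deviation has V = 1/2 at the normal


if __name__ == "__main__":
    print("median:", median_efficiency())  # equals 2 / pi
    print("MAD:   ", mad_efficiency())
```

Both values agree with the quoted percentages up to rounding.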
Bickel (1975) also showed that this estimator is asymptotically normal under mild conditions.

In his book, Huber (1981) gave explicit expressions for one-step M-estimators of location with unknown dispersion (p. 140 and p. 146), defined as the first step of Newton's method starting from preliminary robust estimates of location and dispersion. He further showed that if the initial estimate of location is consistent for θ, F is symmetric and the score function defining the estimator is odd, then this one-step M-estimator of location is asymptotically equivalent to the fully-iterated M-estimator of location with the dispersion fixed at its preliminary estimate. In the special case of dispersion estimation with unknown location, Huber (1981) (p. 147) also suggested a fixed-point iterative method for computing the estimate, the first step of which may serve as a one-step M-estimator of dispersion. Hampel et al (1986) (p. 106) further stressed the importance of selecting robust preliminary estimates when computing one-step M-estimators, since otherwise the resulting estimators may not be robust (as was also observed by Andrews et al in [1]).

In all cases, the enthusiasm created by one-step estimators was contagious: these estimators were easy to compute and, most importantly, they offered a good balance between robustness and efficiency. Being only one step away from robust initial estimators, they retained the robustness properties of those initial estimators. At the same time, being one step closer to their fully-iterated versions, they were almost as efficient as those versions. In the univariate location setup, Andrews et al (1972) showed at some length that, for certain score functions, one-step M-estimators were very well behaved (Bickel (1975) had submitted his paper in 1971 and was one of the authors of [1]).
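A minimal sketch of the standard one-step M-estimator of location with unknown dispersion follows. It assumes Huber's score with c = 1.345, the median as initial location $\theta_0$, and the MAD divided by $\Phi^{-1}(3/4)$ as preliminary dispersion $s_0$. One Newton-Raphson iteration for $\sum_i \psi((x_i - \theta)/s_0) = 0$ gives $\theta_1 = \theta_0 + s_0 \sum_i \psi(r_i) / \sum_i \psi'(r_i)$ with $r_i = (x_i - \theta_0)/s_0$. The function names are illustrative and are not taken from Huber (1981).

```python
import statistics
from statistics import NormalDist

Q = NormalDist().inv_cdf(0.75)  # Phi^{-1}(3/4), used to normalize the MAD


def huber_psi(u, c):
    return max(-c, min(c, u))


def huber_psi_prime(u, c):
    # Derivative of Huber's score (taken as 1 at the kinks u = +/-c)
    return 1.0 if abs(u) <= c else 0.0


def one_step_location(xs, c=1.345):
    """One Newton-Raphson step from the median, with the normalized MAD as dispersion."""
    t0 = statistics.median(xs)
    s0 = statistics.median(abs(x - t0) for x in xs) / Q  # assumed > 0
    r = [(x - t0) / s0 for x in xs]
    num = sum(huber_psi(u, c) for u in r)
    den = sum(huber_psi_prime(u, c) for u in r)  # data-dependent denominator
    return t0 + s0 * num / den


if __name__ == "__main__":
    print(one_step_location([1.0, 2.0, 3.0, 4.0, 5.0, 100.0]))
```

The denominator depends on the sample: with Huber's monotone score it counts the residuals inside [−c, c], but with other score functions it can become small, which is the instability discussed next.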
Jurečková and Portnoy (1987) further investigated the use of one-step regression M-estimators based on the LMS to obtain high efficiency.

While one-step M-estimators solve the possible lack of uniqueness of the solution of (1.1) and reduce the computational effort, they can sometimes be numerically unstable. For instance, the standard one-step M-estimator of location with unknown dispersion contains a ratio whose denominator can become very small for certain samples and score functions. To address this problem, Hampel et al (1986) suggested different versions of one-step M-estimators of location (pp. 152-153), one of which replaces the denominator by a constant. Similarly, the following example (reproduced from [7], p. 10) illustrates that the fixed-point iterative method can sometimes lead to disastrous results, which raises the question of whether the one-step M-estimator of dispersion with unknown location, as introduced by Huber (1981), is stable enough to be trusted.

Example. The value x = 1/n is a fixed point of the function f(x) = (n + 1)x − 1, since

$$f(1/n) = \frac{n+1}{n} - 1 = \frac{1}{n}.$$

It follows that iterating this function for any particular value of n with the following loop should, if the floating-point arithmetic of the computer were exact, simply result in x = 1/n:

    x := 1/n
    for i = 1 to 30
        x := (n + 1) * x - 1

We use here the symbol := for computer assignment, so 'x := 1/n' means 'Assign the value 1/n to the stored variable x'. Table 1.1 shows the results of implementing such a loop in Turbo Pascal for different values of n. We see that for the various powers of 2 the arithmetic is indeed exact. By contrast, for other values of n the error grows steadily from approximately $-3.5 \times 10^{5}$ for n = 3 through about $-2.3 \times 10^{18}$ for n = 10. These errors are the direct result of the propagation of the rounding error made in the binary representation of 1/3 and 1/10, respectively. □

Bickel (1975) had most likely foreseen the numerical instability problem.
He actually proposed two types of one-step Huber M-estimators of regression: his Type II estimator is a smoothed version of his Type I estimator, in which one term is replaced by its asymptotic expectation. However much robustness is concerned with outliers, it must never ignore important issues such as numerical stability.

Table 1.1: Results of the Implementation of the Loop in Turbo Pascal

     n    Final x
     1     1.000 000 000 000 00 E+0000
     2     5.000 000 000 000 00 E-0001
     3     1.747 630 000 000 00 E+0005
     4     2.500 000 000 000 00 E-0001
     5    -4.021 311 173 693 75 E+0010
     6    -1.952 324 816 734 00 E+0012
     7    -4.021 071 095 865 60 E+0013
     8     1.250 000 000 000 00 E-0001
     9    -1.616 879 469 807 53 E+0017
    10    -2.308 383 841 816 94 E+0018
    11    -1.962 659 088 425 60 E+0019
    12    -2.443 971 442 609 19 E+0020
    13    -1.935 039 516 698 14 E+0021
    14    -1.328 735 789 941 45 E+0022
    15     7.052 067 281 085 79 E+0022
    16     6.250 000 000 000 00 E-0002

With the development of modern techniques, such as influence functions and maximum bias curves, which allow a more complete study of one-step M-estimators, these estimators can be better understood and appreciated. Maximum bias curves appeared briefly in Hampel et al (1986) (pp. 176-177), but were used to their full advantage by Martin and Zamar (1989), Martin et al (1989) and He and Simpson (1993). The influence function as a tool was itself developed by Hampel starting in 1974 and carefully discussed in [15]. Using those techniques, Rousseeuw and Croux (1993a) studied the bias properties of k-step Huber M-estimators (the kth step of the iterative algorithm) in the univariate location and dispersion setup. They showed that while the efficiency increases with the number of steps, so does the bias. This led the authors to recommend the use of one- or two-step M-estimators, but not more, especially in multiparameter problems where the dispersion is typically unknown.
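The loop of the Example can be replayed in Python as a quick sketch. Python floats are IEEE 754 double precision, not the Turbo Pascal real type used for Table 1.1, so the final values for n that are not powers of 2 will differ from the table. The qualitative behaviour is the same: when n is a power of 2, 1/n is stored exactly and every step is exact; otherwise the initial rounding error of 1/n is multiplied by roughly n + 1 at every step. The function name is illustrative.

```python
def iterate_fixed_point(n, steps=30):
    """Apply x := (n + 1) * x - 1 repeatedly, starting from x = 1/n.

    In exact arithmetic x stays equal to 1/n; in floating point it only does
    so when 1/n is exactly representable in binary (n a power of 2).
    """
    x = 1.0 / n
    for _ in range(steps):
        x = (n + 1) * x - 1
    return x


if __name__ == "__main__":
    for n in range(1, 17):
        print(f"{n:2d}  {iterate_fixed_point(n): .15e}")
```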
Rousseeuw and Croux (1993a) preferred the one-step M-estimator of location suggested by Hampel et al (1986) (p. 153), which is derived from the standard one-step M-estimator but has a constant denominator in its ratio. To estimate univariate dispersion, the authors used the fixed-point one-step M-estimator suggested by Huber (1981) (p. 147). However, to our knowledge, no formal comparison of all the available one-step M-estimators has been done using modern techniques, a comparison that would allow one to choose one estimator over another depending on the statistical situation at hand. The goal of this thesis is to offer such a comparison, in two contexts: estimation of location with unknown dispersion, and estimation of dispersion with unknown location. The standard one-step M-estimator of location, as suggested by Huber (1981) (p. 146), will be compared to the one-step M-estimator used by Rousseeuw and Croux (1993a) in the general setup of unknown dispersion. To estimate dispersion with unknown location, two estimators are derived using the Newton-Raphson method and compared to the standard, so far unchallenged, fixed-point one-step M-estimator of dispersion, used for example by Rousseeuw and Croux (1993a).

Chapter 2 presents the general theory of M-estimators as initiated by Huber (1964). Chapter 3 formally presents the one-step M-estimators that will be compared in the following two chapters. In Chapter 4, we study the asymptotic properties of the two one-step M-estimators of location of interest; that is, we study their efficiency under Gaussian and non-Gaussian models (following the approach of Rousseeuw and Croux (1993b)), as well as their maximum asymptotic bias. Using the same robustness techniques, the three one-step M-estimators of dispersion with unknown location are compared in Chapter 5 in terms of their asymptotic behaviour.
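The constant-denominator variant preferred by Rousseeuw and Croux (1993a) can be sketched in the same setting (Huber's score, median start, MAD divided by $\Phi^{-1}(3/4)$). This sketch assumes that the constant is $n E_\Phi \psi'(Z)$, the expected derivative of the score under the standard normal model scaled by the sample size, as the title of Table 4.2 suggests; the exact form given by Hampel et al (1986, p. 153) should be checked against that source. For Huber's score, $E_\Phi \psi_c'(Z) = 2\Phi(c) - 1$, so the denominator no longer depends on the sample and cannot become small. The function name is illustrative.

```python
import statistics
from statistics import NormalDist

PHI = NormalDist()
Q = PHI.inv_cdf(0.75)  # Phi^{-1}(3/4), used to normalize the MAD


def huber_psi(u, c):
    return max(-c, min(c, u))


def modified_one_step_location(xs, c=1.345):
    """One-step location estimate whose denominator is the constant n * E_Phi[psi'(Z)]."""
    n = len(xs)
    t0 = statistics.median(xs)
    s0 = statistics.median(abs(x - t0) for x in xs) / Q  # normalized MAD, assumed > 0
    num = sum(huber_psi((x - t0) / s0, c) for x in xs)
    # For Huber's score psi'(z) = 1 on [-c, c] and 0 outside, so E_Phi[psi'(Z)] = 2 Phi(c) - 1
    den = n * (2.0 * PHI.cdf(c) - 1.0)
    return t0 + s0 * num / den


if __name__ == "__main__":
    print(modified_one_step_location([1.0, 2.0, 3.0, 4.0, 5.0, 100.0]))
```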
Chapter 2

M-Estimators: Maximum Likelihood Type of Estimators

2.1 Univariate Problem

Let $X_1, \ldots, X_n$ be a sample from the population with distribution function $F(x;\theta)$. Any estimate $T_n$ which minimizes an objective function of the form

$$\sum_{i=1}^{n} \rho(X_i; T_n), \qquad (2.1)$$

or which is defined by the implicit equation

$$\sum_{i=1}^{n} \psi(X_i; T_n) = 0, \qquad (2.2)$$

where ρ is an appropriate loss function and $\psi(x;\theta) = (\partial/\partial\theta)\rho(x;\theta)$ is a score function, is called an M-estimator of θ. This estimator is an extension of the usual maximum likelihood estimator (MLE) of θ: the choice $\rho(x;\theta) = -\log f(x;\theta)$ corresponds exactly to the MLE minimization problem. For example, when the parameter θ to be estimated is a location parameter, one sets $\psi(x;T_n) = \psi(x - T_n)$. When the parameter σ to be estimated is a dispersion parameter, the function used is $\psi(x;\sigma) = \psi(x/\sigma)$. Note that the M-estimator is not modified when ψ is multiplied by any constant r > 0.

2.1.1 Qualitative Robustness

It is easy to study the asymptotic properties of M-estimators when $\psi(x;\theta)$ is monotone in θ. More precisely, assume that
• $\psi(x;\theta)$ is non-increasing in θ;
• $\psi(x;\theta)$ is measurable in x;
• $\psi(x;-\infty) > 0$ and $\psi(x;\infty) < 0$;
and define
• $\lambda_F(t) = E_F\{\psi(X;t)\}$;
• $T_n^* = \sup\{t : \sum_{i=1}^{n} \psi(x_i;t) > 0\}$;
• $T_n^{**} = \inf\{t : \sum_{i=1}^{n} \psi(x_i;t) < 0\}$.

Consistency

M-estimators are consistent under some conditions. Huber (1981) has shown that if there is a $t_0(F)$ such that

$$\lambda_F(t) > 0 \ \text{ for all } t < t_0(F), \qquad \lambda_F(t) < 0 \ \text{ for all } t > t_0(F),$$

then $T_n^* \to t_0(F)$ and $T_n^{**} \to t_0(F)$ almost surely [F] and in probability [F]. Any value $T_n$ satisfying $T_n^* \le T_n \le T_n^{**}$ can serve as an M-estimator, and it is consistent since $T_n^*$ and $T_n^{**}$ share the same limit $t_0(F)$.
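To see why both $T_n^*$ and $T_n^{**}$ are needed, take the sign score $\psi(x;t) = \mathrm{sign}(x - t)$, which is non-increasing in t and defines the median. For a sample of even size, the score sum is zero on a whole interval between the two middle order statistics, so (2.2) has no unique root, and $T_n^*$, $T_n^{**}$ are the endpoints of that interval. The sketch below computes them exactly; the helper names are illustrative.

```python
import bisect


def score_sum(xs, t):
    """Sum of sign(x_i - t) over the sample; non-increasing in t."""
    return sum((x > t) - (x < t) for x in xs)


def t_star_bounds(xs):
    """Return (T*, T**) = (sup{t: sum > 0}, inf{t: sum < 0}) for the sign score.

    The sum is constant between consecutive data points, and just to the right
    of a data point x it equals n - 2 * #{x_i <= x}, so both bounds are data points.
    """
    xs = sorted(xs)
    n = len(xs)

    def right_value(x):
        return n - 2 * bisect.bisect_right(xs, x)

    t_star = next(x for x in xs if right_value(x) <= 0)
    t_star2 = next(x for x in xs if right_value(x) < 0)
    return t_star, t_star2


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0]
    print(t_star_bounds(sample))     # endpoints of the interval of roots
    print(score_sum(sample, 2.5))    # the score sum vanishes inside that interval
```

For a sample of odd size the two bounds coincide with the usual median.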
Asymptotic Distribution

Assume furthermore that
• there exists at least one $t_0(F)$ such that $\lambda_F(t_0) = 0$;
• $\lambda_F(t)$ is continuously differentiable in a neighbourhood of $t_0(F)$ and $\lambda_F'(t_0) < 0$; and
• $\sigma_F^2(t) = E_F\psi^2(X;t) - \lambda_F^2(t)$ is finite, nonzero and continuous near $t_0(F)$.

Under these assumptions, M-estimators $T_n$ are asymptotically normally distributed, that is,

$$\sqrt{n}\,(T_n - t_0) \xrightarrow{d} N(0, V(\psi, F)), \qquad V(\psi, F) = \frac{\sigma_F^2(t_0)}{[\lambda_F'(t_0)]^2}.$$

See Huber (1981) (pp. 49-50) for the complete details of this proof. The fact that M-estimators are consistent and asymptotically normal has certainly contributed to their success, since inference becomes easy under these conditions.

In most cases, $T_n$ is a function of the empirical distribution $F_n$ and derives from a functional T, that is, $T_n = T(F_n)$. If T is consistent, then $T_n \to T(F)$ in probability [F]. The discussion that follows adopts the functional notation: an M-estimator $T_n$ will therefore be referred to as T(F).

2.1.2 Infinitesimal Aspects

The Influence Function

Beyond qualitative properties, it is informative to study the behaviour of M-estimators under infinitesimal changes. In 1968, Hampel introduced the influence function, which "describes the effect of an infinitesimal contamination at the point x on the estimate, standardized by the mass of the contamination. One could say it gives a picture of the infinitesimal behavior of the asymptotic value, so it measures the asymptotic bias caused by contamination in the observations." (see [15], p. 84). When Hampel first introduced this concept (1968, 1974), he referred to it as the influence curve; however, the term influence function is now widely preferred in view of the generalizations to higher dimensions.
More precisely, the influence function of the functional T(F) is

$$IF(x; T, F) = \lim_{t \to 0} \frac{T(F_{t,x}) - T(F)}{t},$$

where $F_{t,x} = (1 - t)F + t\delta_x$. In other words, it is the derivative with respect to t of the functional $T(F_{t,x})$, evaluated at t = 0.

Under the regularity assumptions listed in the previous section, it is easy to show that the influence function of any univariate M-estimator defined by the score function ψ has the form

$$IF(x; \psi, F) = \frac{\psi(x; T(F))}{-E_F\left\{ \frac{\partial}{\partial\theta}\psi(X;\theta)\big|_{\theta = T(F)} \right\}}. \qquad (2.3)$$

See Huber (1981) (p. 45) for more details on this derivation. Notice that the influence function of an M-estimator is proportional to ψ. In the special case of a univariate location problem, the score function defining the M-estimator is $\psi(x;\theta) = \psi(x - \theta)$, and we obtain

$$IF(x; \psi, F) = \frac{\psi(x - T(F))}{E_F\,\psi'(X - T(F))}.$$

Similarly, the influence function of a univariate dispersion M-estimator S(F), whose score function has the form $\chi(x;\sigma) = \chi(x/\sigma)$, is

$$IF(x; S, F) = \frac{\chi(x/S(F))\,S(F)}{E_F\{\chi'(X/S(F))\,(X/S(F))\}}.$$

Hampel et al (1986) have shown that, under regularity conditions, knowledge of the influence function of an M-estimator is equivalent to knowledge of its asymptotic distribution. Indeed, using a Taylor expansion of $T_n(F_n)$ around T(F), the authors showed that the asymptotic variance of an estimator is completely determined by its influence function, since

$$V(T, F) = E_F\{IF(X; T, F)^2\}. \qquad (2.4)$$

The influence function is a priori a heuristic tool. It is easy to calculate, and therefore provides an easy way to obtain an expression for the asymptotic variance of an estimator. However, the regularity assumptions needed for (2.4) are cumbersome to verify, so one usually tries to prove normality using another method. Nevertheless, in all practical cases, the relation (2.4) holds.
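The definition of the influence function and formula (2.3) can be compared numerically. This sketch assumes F = Φ and the location M-estimator defined by Huber's score $\psi_c$ with c = 1.345, for which T(Φ) = 0 by symmetry. It computes $T(F_{s,x})$ for a small contamination mass s by bisection on $(1 - s)E_\Phi\psi_c(Z - t) + s\,\psi_c(x - t) = 0$, using a closed form for $E_\Phi\psi_c(Z - t)$ obtained by splitting the integral at $t \pm c$. It then compares $T(F_{s,x})/s$ with the location form of (2.3), and finally evaluates (2.4) through the closed form of $E_\Phi \psi_c(Z)^2$. All names are illustrative.

```python
from statistics import NormalDist

PHI = NormalDist()  # model distribution F = Phi


def huber_psi(u, c):
    return max(-c, min(c, u))


def expected_psi(t, c):
    """E[psi_c(Z - t)] for Z ~ N(0, 1), splitting the integral at z = t - c and z = t + c."""
    a, b = t - c, t + c
    return (c * (1.0 - PHI.cdf(b)) - c * PHI.cdf(a)
            + (PHI.pdf(a) - PHI.pdf(b)) - t * (PHI.cdf(b) - PHI.cdf(a)))


def contaminated_location(x, s, c, iters=200):
    """T(F_{s,x}) with F_{s,x} = (1 - s) Phi + s delta_x, by bisection on the M-equation."""
    lo, hi = -10.0, 10.0  # for small s the root is close to 0, well inside this bracket
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g = (1.0 - s) * expected_psi(mid, c) + s * huber_psi(x - mid, c)
        if g > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def finite_difference_if(x, c=1.345, s=1e-5):
    """[T(F_{s,x}) - T(Phi)] / s, with T(Phi) = 0 by symmetry."""
    return contaminated_location(x, s, c) / s


def formula_if(x, c=1.345):
    """Location form of (2.3): psi_c(x) / E[psi_c'(Z)], where E[psi_c'(Z)] = 2 Phi(c) - 1."""
    return huber_psi(x, c) / (2.0 * PHI.cdf(c) - 1.0)


def efficiency_from_if(c=1.345):
    """1 / V with V = E[IF(Z)^2] from (2.4); the mean has V = 1 at the normal."""
    p = 2.0 * PHI.cdf(c) - 1.0
    e_psi2 = p - 2.0 * c * PHI.pdf(c) + 2.0 * c * c * (1.0 - PHI.cdf(c))  # E[psi_c(Z)^2]
    return p ** 2 / e_psi2


if __name__ == "__main__":
    for x in (-3.0, 0.5, 3.0):
        print(x, finite_difference_if(x), formula_if(x))
    print("efficiency at the normal:", efficiency_from_if())
```

The two influence function columns should agree to several decimals, and the efficiency for c = 1.345 should come out close to 95%, the value this tuning constant is usually chosen for.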
The influence function approach will be used in the next chapters to derive expressions for the asymptotic variances of the estimators of interest.

The Gross-Error Sensitivity

The influence function of an estimator can be summarized in many ways other than its expected square. The most important is probably the supremum of its absolute value. Hampel (1968, 1974) introduced this notion as the gross-error sensitivity of an estimator. More precisely, one defines the gross-error sensitivity of T at F by

$$\gamma^*(T, F) = \sup_x |IF(x; T, F)|,$$

the supremum being taken over the values of x where the influence function exists. The gross-error sensitivity measures the worst influence that a small amount of contamination of fixed size can have on the value of the estimator. It can therefore be regarded as an upper bound on the standardized asymptotic bias of the estimator. If $\gamma^*(T, F)$ is finite, we say that T is B-robust at F, where the B comes from bias. In view of (2.3), an M-estimator is B-robust at F if and only if $\psi(\cdot, T(F))$ is bounded.

Putting a bound on $\gamma^*(T, F)$ is often the first step in robustifying an estimator. However, in many cases, this will conflict with the goal of asymptotic efficiency. The introduction of one-step M-estimators brings a partial solution to this problem, as will be seen in the next chapter.

2.1.3 Quantitative Robustness

The influence function is an excellent tool for assessing the local asymptotic behaviour of an estimator. However, it must be complemented by a measure of global reliability, which describes up to what distance from the model distribution the estimator still gives relevant information. Consider the ε-contaminated neighbourhood of $F_0$:

$$\mathcal{V}_\varepsilon(F_0) = \{F : F = (1 - \varepsilon)F_0 + \varepsilon H\},$$

where H is an arbitrary distribution and $0 < \varepsilon < 1/2$, so that it is possible to distinguish between the central model $F_0$ and the contaminating distribution H.
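For location M-estimators at F = Φ, the location form of (2.3) makes $\gamma^*$ easy to compute: it is $\sup_x |\psi(x)|$ divided by $E_\Phi\psi'(Z)$. The sketch below covers three familiar cases under that assumption. For the mean, ψ(u) = u and IF(x) = x is unbounded, so the mean is not B-robust. For the median, ψ = sign and the denominator is read as $2\varphi(0)$, the derivative of $-E_\Phi\,\mathrm{sign}(Z - t)$ at t = 0. For Huber's $\psi_c$ it is $c/(2\Phi(c) - 1)$. The function names are illustrative.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # model distribution F = Phi


def gross_error_sensitivity_huber(c):
    """sup|psi_c| / E[psi_c'(Z)] = c / (2 Phi(c) - 1)."""
    return c / (2.0 * PHI.cdf(c) - 1.0)


def gross_error_sensitivity_median():
    """IF(x) = sign(x) / (2 phi(0)), so gamma* = 1 / (2 phi(0)) = sqrt(pi / 2)."""
    return 1.0 / (2.0 * PHI.pdf(0.0))


# The mean has IF(x) = x at F = Phi, which is unbounded in x.
GROSS_ERROR_SENSITIVITY_MEAN = math.inf

if __name__ == "__main__":
    print("median:", gross_error_sensitivity_median())
    print("Huber c = 1.345:", gross_error_sensitivity_huber(1.345))
    print("mean:", GROSS_ERROR_SENSITIVITY_MEAN)
```

As c decreases to 0, Huber's value decreases to that of the median, which has the smallest gross-error sensitivity among these three estimators. This illustrates the conflict with efficiency mentioned above.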
In what follows, we present two measures of distance from the central model $F_0$ that should stay as small as possible, for $\epsilon$ as large as possible, for an estimator to be considered robust.

The Breakdown Point

Hampel (1971) introduced the notion of the breakdown point of an estimator, generalizing a definition by Hodges (1967). Roughly speaking, the breakdown point of an estimator is the maximum fraction $\epsilon$ of contamination that the estimator can endure before its value can be driven to infinity; it gives the limiting fraction of bad outliers the estimator can cope with. For equivariant estimators, the maximum breakdown point is 50%. Robust M-estimators such as the median and the median absolute deviation (MAD) have a 50% breakdown point. The mean, on the other hand, has a 0% breakdown point as an estimator of location: it is completely intolerant to outliers. In fact, Hampel (1971) and Huber (1981) (section 3.2) have shown that when $F$ is symmetric, one usually chooses odd score functions $\psi$ to estimate location. If, moreover, $\psi$ is strictly monotone and bounded, the M-estimator defined by $\psi$ is B-robust and has a 50% breakdown point. On the other hand, M-estimators are not B-robust and have a 0% breakdown point when $\psi$ is strictly monotone but unbounded.

There exist both an asymptotic and a finite-sample version of the breakdown point. In what follows, to accompany the functional (limiting) notation, we will use the asymptotic version, as in Hampel (1971) and Huber (1981). See [9], [10], [17] and [20] for a detailed presentation of the finite-sample version of the breakdown point.

The Maximum Bias Function

A high breakdown point is a necessary condition for a good estimating method, but not a sufficient one (see [29], p. 877). Many argue that the breakdown point is not as general as it claims to be. To deal with the matter, Hampel et al (1986) (pp.
176-177) briefly introduced maximum bias functions, but this measure of robustness was fully exploited by Martin and Zamar (1989), Martin et al (1989) and He and Simpson (1993). The maximum bias of an estimator (as a function of $\epsilon$) describes how much an estimator $T(F)$ can change in $\mathcal{V}_\epsilon(F_0)$ due to a given fraction $\epsilon$ of contamination.

In the univariate location setup, the maximum bias function of an estimator $T(F)$ is formally defined as

$$B_T(\epsilon) = \sup_{F \in \mathcal{V}_\epsilon(F_0)} |T(F) - T(F_0)|.$$

In the univariate dispersion setup, however, one needs to generalize the concept of maximum bias function. Martin and Zamar (1989) observed that outliers can cause the estimate of dispersion to be very large, but also that too many inliers may cause the estimate to be dangerously close to 0. They defined the generalized maximum bias as the maximum of the bias due to outliers and the bias due to inliers. Moreover, they allowed the (monotone) penalizations for inliers and outliers to be chosen independently, because in some setups one of the two may cause more trouble than the other. Formally, the generalized maximum bias can be defined by

$$B_S(\epsilon) = \sup_{F \in \mathcal{V}_\epsilon(F_0)} B[S(F)],$$

where

$$B[S(F)] = \begin{cases} L_1[S(F)/S(F_0)] & \text{if } 0 < S(F) \le S(F_0), \\ L_2[S(F)/S(F_0)] & \text{if } S(F_0) < S(F) < \infty, \end{cases}$$

and where $L_1$ and $L_2$ are continuous, nonnegative and monotone (penalization) loss functions, with $L_1(1) = L_2(1) = 0$ and $\lim_{t \to 0} L_1(t) = \lim_{t \to \infty} L_2(t) = \infty$. A popular choice of loss function is the logarithmic function. From the monotonicity of $L_1$ and $L_2$, it follows that

$$B_S(\epsilon) = \max\{L_1[S^-/S(F_0)],\; L_2[S^+/S(F_0)]\},$$

where $S^-$ and $S^+$ denote the infimum and the supremum of the functional $S(F)$ as $F$ ranges over $\mathcal{V}_\epsilon(F_0)$. Therefore, it is enough to concentrate on $S^-$ and $S^+$ when studying the generalized maximum bias of an estimator.
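For the normalized MAD with known location at $F_0 = \Phi$, $S^+$ and $S^-$ can be written in closed form, assuming (as a sketch) that the worst contamination is a point mass sent to infinity for explosion and a point mass placed at 0 for implosion. The code uses the logarithmic losses $L_1(t) = -\log t$ and $L_2(t) = \log t$, and also evaluates the location maximum bias of the median, $B_T(\epsilon) = \Phi^{-1}\{1/(2(1-\epsilon))\}$, for comparison. All function names are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist()
Q3 = PHI.inv_cdf(0.75)  # normalizing constant, so that S(F0) = 1 at F0 = Phi


def mad_explosion(eps: float) -> float:
    """S+/S(F0): mass eps sent to +infinity, so (1 - eps) P(|Z| <= s) = 1/2."""
    p = 0.5 / (1 - eps)
    return PHI.inv_cdf((1 + p) / 2) / Q3


def mad_implosion(eps: float) -> float:
    """S-/S(F0): mass eps placed at 0, so (1 - eps) P(|Z| <= s) + eps = 1/2."""
    p = (0.5 - eps) / (1 - eps)
    return PHI.inv_cdf((1 + p) / 2) / Q3


def generalized_bias(eps: float) -> float:
    """B_S(eps) with logarithmic losses for inliers (L1) and outliers (L2)."""
    return max(-math.log(mad_implosion(eps)), math.log(mad_explosion(eps)))


def median_bias(eps: float) -> float:
    """Location maximum bias of the median at Phi."""
    return PHI.inv_cdf(0.5 / (1 - eps))


for eps in (0.05, 0.1, 0.2, 0.3):
    print(eps, round(mad_implosion(eps), 4), round(mad_explosion(eps), 4),
          round(generalized_bias(eps), 4), round(median_bias(eps), 4))
```

The functions are only valid for $0 \le \epsilon < 1/2$; as $\epsilon \to 1/2$ both the explosion ratio and the median bias diverge and the implosion ratio tends to 0, in line with the 50% breakdown point of both estimators.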
We shall call $S^+/S(F_0)$ the explosion maximum bias of an estimator, and $S^-/S(F_0)$ the implosion maximum bias, both being functions of $\epsilon$.

Finally, note that the dependence of the explosion and implosion maximum bias functions of an estimator on the ratio $S(F)/S(F_0)$ is justified in terms of dispersion invariance (see [24], p. 134).

The maximum bias function of an estimator includes its gross-error sensitivity (the slope of the curve at $\epsilon = 0$) as well as its breakdown point (where the curve goes to infinity). The maximum bias curve is therefore an additional, and more complete, tool that can help in choosing between competing estimators. For a small range of $\epsilon$, some use the gross-error sensitivity of an estimator as a linear approximation to the curve (see Hampel et al (1986) for a rule of thumb on the values of $\epsilon$ for which the approximation holds).

2.2 Nuisance Parameter in the Location-Dispersion Problem

In many cases, the underlying distribution of the population from which a sample is taken has more than one parameter. For example, location-dispersion families typically have distributions of the form $F((x - \theta)/\sigma)$, where $-\infty < \theta < \infty$ and $\sigma > 0$. If the interest is still to estimate only one parameter, then the other becomes a nuisance parameter.

It must also be observed that typical M-estimators of location are, in practice, location invariant but not dispersion invariant. Therefore, when estimating location, one also needs to provide an M-estimator of dispersion: the unknown dispersion is, per se, a nuisance parameter. The same can be said about dispersion M-estimators: they are not location invariant, and thus need an ancillary location estimator. If $\psi$ and $\chi$ are the respective score functions of the M-estimators of location and dispersion, then the simultaneous equations defining them implicitly are

$$\frac{1}{n}\sum_{i=1}^{n} \psi\left(\frac{x_i - T_n}{S_n}\right) = 0 \qquad (2.5)$$

and
$$\frac{1}{n}\sum_{i=1}^{n} \chi\left(\frac{x_i - T_n}{S_n}\right) = 0. \qquad (2.6)$$

From a finite-sample point of view, when estimating location with unknown dispersion, one would compute some M-estimate of dispersion, $S_n$, and then find the location estimate $T_n$ using (2.5). If, on the other hand, one is interested in estimating dispersion when the location is unknown, then one would solve (2.6) for $S_n$, with some preliminary estimate of location $T_n$. In functional terms, the M-estimator of location $T(F)$ with unknown dispersion, and the M-estimator of dispersion $S(F)$ with unknown location, can be expressed implicitly by

$$E_F\,\psi\left(\frac{X - T(F)}{S_0(F)}\right) = 0 \quad\text{and}\quad E_F\,\chi\left(\frac{X - T_0(F)}{S(F)}\right) = 0,$$

where $S_0(F)$ and $T_0(F)$ are the asymptotic values of the initial estimators used for the nuisance parameters.

Fortunately, if the underlying distribution $F$ is symmetric, $\psi$ is odd and $\chi$ is even, the location functional $T(F)$ and the dispersion functional $S(F)$ are asymptotically independent: the asymptotic variance of $T$ depends on the dispersion estimator only through its asymptotic value $S_0(F)$, and that of $S$ does not depend on $T_0(F)$. More precisely, the influence functions of $T$ and $S$ are, respectively,

$$IF(x; T, F) = \frac{S_0(F)\,\psi\left(x/S_0(F)\right)}{E_F\,\psi'\left(X/S_0(F)\right)} \quad\text{and}\quad IF(x; S, F) = \frac{S(F)\,\chi\left(x/S(F)\right)}{E_F\left\{\chi'\left(X/S(F)\right)X/S(F)\right\}}$$

(remember that $T(F) = 0$ when $F$ is symmetric). Under regularity assumptions similar to those listed in the preceding section, the M-estimators of location and dispersion defined by (2.5) and (2.6) are asymptotically normal, with variances equal to the expected squares of their influence functions. Therefore, when estimating location with unknown dispersion, one can choose the estimator of dispersion on criteria other than low variability, which considerably enlarges the set of possible candidates. Similarly, the asymptotic variability of an M-estimator of dispersion is not affected by the estimate of the unknown location parameter, no matter which estimator is chosen.
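A minimal sketch of solving (2.5) for $T_n$ when $S_n$ is held fixed at the normalized MAD, using Newton-Raphson iterations started at the median. The Huber score with $K = 1.345$, the tolerance and the iteration cap are illustrative assumptions; the routine raises an error when the Newton denominator vanishes, which is one of the numerical difficulties of fully iterated M-estimators.

```python
import statistics
from statistics import NormalDist

K = 1.345  # Huber clipping constant (illustrative)
Q3 = NormalDist().inv_cdf(0.75)  # makes the MAD consistent for sigma at the normal


def psi(u: float) -> float:
    return max(-K, min(K, u))


def dpsi(u: float) -> float:
    return 1.0 if abs(u) < K else 0.0


def normalized_mad(x):
    med = statistics.median(x)
    return statistics.median(abs(v - med) for v in x) / Q3


def m_location(x, tol: float = 1e-10, max_iter: int = 100) -> float:
    """Solve sum psi((x_i - T)/S_n) = 0 for T, with S_n fixed at the normalized MAD."""
    s = normalized_mad(x)
    t = statistics.median(x)  # starting value
    for _ in range(max_iter):
        u = [(v - t) / s for v in x]
        den = sum(dpsi(ui) for ui in u)
        if den == 0:
            raise ArithmeticError("zero derivative: Newton step undefined")
        step = s * sum(psi(ui) for ui in u) / den
        t += step
        if abs(step) < tol:
            return t
    raise RuntimeError("Newton-Raphson did not converge")


data = [0, 1, 2, 3, 4, 5, 6, 7, 30, 40]
print(m_location(data), statistics.mean(data))
```

The two large observations pull the sample mean up to 9.8, whereas the M-estimate stays near the bulk of the data (about 4.75).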
The maximum bias of an M-estimator of location with unknown dispersion is defined exactly as in the univariate setup. However, one must remember that $T$ is defined implicitly as a function of $S$, so the bias of $T$ is necessarily affected by the bias of $S$ in an $\epsilon$-contaminated neighbourhood such as $\mathcal{V}_\epsilon(F_0)$. Clearly, the maximum bias of $T$ will be higher in the presence of a nuisance parameter than without one. Similarly, the maximum explosion bias of an M-estimator of dispersion with unknown location, defined as in the univariate setup, is implicitly affected by the estimator of location and increased accordingly. On the other hand, the maximum implosion bias is not affected by the estimate of the unknown location, compared to the implosion bias of the univariate estimator of dispersion. This is due to the fact that when $S \to 0$, the location estimate tends to the fixed value $T(F_0)$.

Solving the non-linear equations (2.5) and (2.6) requires a numerical method, which may not be feasible in some situations. One-step M-estimators have been proposed to solve this problem. The next chapter presents these estimators in the context of location and dispersion estimation with a nuisance parameter.

Chapter 3 One-Step M-Estimators

3.1 General Idea of One-Step M-Estimators

The computation of an M-estimator requires an iterative algorithm, since the estimator is defined implicitly by a non-linear equation. Moreover, the presence of a nuisance parameter adds an extra dimension to the problem, the nuisance parameter itself being the implicit solution of another non-linear equation. Indeed, in multiparameter problems, one can never be sure that a root exists until it is found: unlike in one dimension, there is no way of bracketing the root. The computation involved often proves difficult, as any iterative method can be subject to many problems.
For example, the Newton-Raphson algorithm, often used in practice, can diverge if it encounters a local extremum in its search for a root (see [28], p. 362, for a graphical explanation of this problem).

Bickel (1975) suggested, as an alternative to the fully iterated M-estimator, what he referred to as the one-step M-estimator. This estimator corresponds to the solution after the first iteration of the algorithm used to solve the non-linear equation defining the estimator, as further investigated by Huber (1981) (section 6.7). When the objective is to estimate the parameter $\theta$ of a distribution in the presence of a nuisance parameter, one first chooses preliminary estimates for both parameters. The nuisance parameter is then considered fixed and equal to its preliminary estimate. To get the one-step M-estimator of $\theta$, one simply performs one iteration of the algorithm used to solve for the fully iterated M-estimator, starting from the preliminary estimate of $\theta$ and holding the nuisance parameter fixed.

3.2 Estimation of Location With Unknown Dispersion

Huber (1981) suggested using the well-known Newton-Raphson method to solve the non-linear equation defining an M-estimator. The one-step M-estimator is therefore obtained by performing one step of the Newton-Raphson method. Thus, to estimate the location parameter of the location-dispersion distribution $F$ with unknown dispersion, one chooses initial estimates of location and dispersion, with respective asymptotic values $T_0(F)$ and $S_0(F)$. Since the Taylor expansion of

$$E_F\,\psi\left(\frac{X - T(F)}{S_0(F)}\right)$$

with respect to $T(F)$, around $T_0(F)$, begins with

$$E_F\,\psi\left(\frac{X - T_0(F)}{S_0(F)}\right) - \frac{T(F) - T_0(F)}{S_0(F)}\,E_F\,\psi'\left(\frac{X - T_0(F)}{S_0(F)}\right) + \cdots,$$

the one-step M-estimator of location is defined, in functional terms, by

$$T(F) = T_0(F) + S_0(F)\,\frac{E_F\,\psi\left(\frac{X - T_0(F)}{S_0(F)}\right)}{E_F\,\psi'\left(\frac{X - T_0(F)}{S_0(F)}\right)}. \qquad (3.8)$$

This estimator will hereafter be called the Standard One-Step M-Estimator of Location. The functional notation is used here for simplicity.
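The finite-sample analogue of (3.8), obtained by replacing $E_F$ with sample averages, can be sketched as follows. The Huber score with $K = 1.345$ is an illustrative assumption; the median and normalized MAD serve as $T_0^n$ and $S_0^n$, the preliminary estimators this thesis adopts later on.

```python
import statistics
from statistics import NormalDist

K = 1.345  # Huber clipping constant (illustrative)
Q3 = NormalDist().inv_cdf(0.75)


def psi(u: float) -> float:
    return max(-K, min(K, u))


def dpsi(u: float) -> float:
    return 1.0 if abs(u) < K else 0.0


def median_mad(x):
    """Preliminary estimates T0 (median) and S0 (normalized MAD)."""
    t0 = statistics.median(x)
    s0 = statistics.median(abs(v - t0) for v in x) / Q3
    return t0, s0


def one_step_location(x) -> float:
    """Standard one-step M-estimate: a single Newton-Raphson step from (T0, S0)."""
    t0, s0 = median_mad(x)
    u = [(v - t0) / s0 for v in x]
    num = sum(psi(ui) for ui in u) / len(x)   # sample average of psi
    den = sum(dpsi(ui) for ui in u) / len(x)  # sample average of psi'
    return t0 + s0 * num / den


print(one_step_location([0, 1, 2, 3, 4, 5, 6, 7, 30, 40]))
```

Only sums over the data are needed, so no iteration or convergence check is involved; the price is that the data-dependent denominator `den` can become small for some score functions and samples.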
In practice, however, we always deal with finite samples. The finite-sample standard one-step M-estimator of location is given by

$$T^n = T_0^n + S_0^n\,\frac{\frac{1}{n}\sum_{i=1}^{n} \psi\left(\frac{x_i - T_0^n}{S_0^n}\right)}{\frac{1}{n}\sum_{i=1}^{n} \psi'\left(\frac{x_i - T_0^n}{S_0^n}\right)}, \qquad (3.9)$$

where $T_0^n$ and $S_0^n$ are the initial finite-sample estimates of location and dispersion, and $n$ is the sample size. Huber (1981) (p. 146) indicates that iterating (3.9) reaches the exact value of the fully iterated M-estimator in a finite number of steps, provided the underlying distribution $F$ is symmetric and $\psi$ is skew symmetric and piecewise linear. Of course, in practice, we deal with empirical distribution functions $F_n$, which are never exactly symmetric; we assume here that they converge to symmetric distribution functions.

The value of the denominator in the ratio defining the estimator (3.8) is not critical, and can be replaced by any constant greater than 1/2 as long as $0 < \psi' < 1$. However, as Hampel et al (1986) (p. 153) pointed out, for certain score functions $\psi$ (especially the redescending types) and certain samples, the denominator $\frac{1}{n}\sum_{i=1}^{n} \psi'\left(\frac{x_i - T_0^n}{S_0^n}\right)$ may become extremely small (approximately 0), which would destabilize the value of the estimator. To avoid this problem, Hampel et al (1986) suggested replacing the denominator of the ratio by the constant $E_\Phi\,\psi'(Z)$, where $\Phi$ is the standard normal distribution. This constant is easy to calculate and never becomes 0. It can also be seen as the smooth version of $\frac{1}{n}\sum_{i=1}^{n} \psi'\left(\frac{x_i - T_0^n}{S_0^n}\right)$ when the underlying distribution $F$ is normal. The normality assumption is often used in practice, but nothing prevents one from assuming another distribution for $F$ and calculating the expectation with respect to it. The modified version of the one-step M-estimator (3.8) is therefore given by

$$T^*(F) = T_0(F) + S_0(F)\,\frac{E_F\,\psi\left(\frac{X - T_0(F)}{S_0(F)}\right)}{E_\Phi\,\psi'(Z)}.$$

This estimator will hereafter be called the MOSME of Location, for Modified One-Step M-Estimator of Location.
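The MOSME differs from the standard one-step estimator only in its denominator, which is the constant $E_\Phi\psi'(Z)$ instead of the sample average of $\psi'$. The sketch below computes both from the same median and normalized MAD starting values, assuming a Huber score with $K = 1.345$, for which $E_\Phi\psi'(Z) = 2\Phi(K) - 1$; the function name is hypothetical.

```python
import statistics
from statistics import NormalDist

K = 1.345  # Huber clipping constant (illustrative)
PHI = NormalDist()
Q3 = PHI.inv_cdf(0.75)
E_PHI_DPSI = 2 * PHI.cdf(K) - 1  # E_Phi psi'(Z) for the Huber psi


def psi(u: float) -> float:
    return max(-K, min(K, u))


def dpsi(u: float) -> float:
    return 1.0 if abs(u) < K else 0.0


def one_step_pair(x):
    """Return (standard one-step estimate, MOSME) from median/MAD starting values."""
    t0 = statistics.median(x)
    s0 = statistics.median(abs(v - t0) for v in x) / Q3
    u = [(v - t0) / s0 for v in x]
    mean_psi = sum(psi(ui) for ui in u) / len(x)
    mean_dpsi = sum(dpsi(ui) for ui in u) / len(x)  # data-dependent denominator
    standard = t0 + s0 * mean_psi / mean_dpsi
    mosme = t0 + s0 * mean_psi / E_PHI_DPSI  # constant denominator
    return standard, mosme


print(one_step_pair([0, 1, 2, 3, 4, 5, 6, 7, 30, 40]))
```

Because the MOSME denominator does not depend on the sample, it cannot collapse towards 0 for awkward samples, which is the purpose of the modification.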
Its finite-sample version can be expressed as

$$T^{*n} = T_0^n + S_0^n\,\frac{\frac{1}{n}\sum_{i=1}^{n} \psi\left(\frac{x_i - T_0^n}{S_0^n}\right)}{E_\Phi\,\psi'(Z)},$$

where $T_0^n$ and $S_0^n$ are the initial finite-sample estimates of location and dispersion, and $n$ is the sample size. Hampel et al (1986) (p. 153) suggest another estimator, the one-step W-estimator, to avoid the problem of a small denominator. This thesis will, however, focus on the MOSME and the standard one-step M-estimator of location, as they are more commonly used in practice.

3.3 Estimation of Dispersion With Unknown Location

The non-linear equation defining the fully iterated M-estimator of dispersion $S(F)$ with unknown location,

$$E_F\,\chi\left(\frac{X - T_0(F)}{S(F)}\right) = \beta, \qquad (3.10)$$

differs slightly from that of the M-estimator of location with unknown dispersion, in that the right-hand side constant, $\beta = E_\Phi\,\chi(Z)$, is greater than 0. It would be undesirable to have $\beta = 0$, since this would force $S(F)$ to be equal to $\infty$ (assuming $\chi(0) = 0$). Taking advantage of the form of expression (3.10), Huber (1981) (p. 147) suggested using the fixed-point iterative method to solve it for $S(F)$. Actually, Huber used a particular case of (3.10), the one for which $\chi = \psi^2$, where $\psi$ was the score function defining the fully iterated M-estimator of the unknown location. That is, Huber wanted to make the location estimator defined by $\psi$ dispersion invariant. The fixed-point method can, however, be straightforwardly generalized to any $\chi$. The one-step M-estimator of dispersion, $S_1(F)$, derived from the fixed-point computational algorithm can be expressed as

$$[S_1(F)]^2 = \frac{[S_0(F)]^2}{\beta}\,E_F\,\chi\left(\frac{X - T_0(F)}{S_0(F)}\right), \qquad (3.11)$$

where $T_0(F)$ and $S_0(F)$ are the limiting values of the preliminary estimators of location and dispersion. Note that the standard fixed-point method solves an equation of the form $g(S(F)) = S(F)$, for a certain function $g$.
The reason for solving $S(F)\,g(S(F)) = S(F)^2$ instead is that the $\chi$ function generally behaves like a second-order polynomial, so that

$$S(F)\,g(S(F)) = \frac{S(F)^2}{\beta}\,E_F\,\chi\left(\frac{X - T_0(F)}{S(F)}\right) \approx c\,E_F\,(X - T_0(F))^2$$

for some constant $c > 0$. The latter expression refers directly to the expression for the standard deviation as an estimator of dispersion. The finite-sample version of (3.11) can be expressed as

$$(S_1^n)^2 = \frac{(S_0^n)^2}{n\beta}\sum_{i=1}^{n} \chi\left(\frac{x_i - T_0^n}{S_0^n}\right),$$

where $T_0^n$ and $S_0^n$ are the initial finite-sample estimates of location and dispersion, and $n$ is the sample size. Yohai and Zamar in [34] (p. 407) obtained τ-estimators of dispersion by letting $S_0(F)$ in (3.11) be an M-estimator of dispersion determined by a smooth function $\rho_1$. The estimator $S_1(F)$ defined by (3.11) will therefore hereafter be called the One-Step τ-Estimator of Dispersion.

The fixed-point method for solving (3.10) is not the only one possible, of course. The Newton-Raphson method, as used in the setup of location estimation with unknown dispersion, can also be used. In fact, the fixed-point method is known to converge linearly, whereas the Newton-Raphson method, as a special case of fixed-point iteration with an additional constraint, converges quadratically (see [7], p. 62). To my knowledge, the Newton-Raphson method has never been fully studied in the estimation of dispersion, but it presents clear advantages over the fixed-point method in terms of the asymptotic behaviour of the estimator (refer to chapter 5 for more details). We therefore believe it is worthy of some consideration. Thus, to estimate the dispersion parameter of the location-dispersion distribution $F$ with unknown location, one chooses initial estimates of location and dispersion, respectively referred to as $T_0(F)$ and $S_0(F)$.
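The finite-sample version of (3.11) amounts to a single fixed-point step. The $\chi$ function is not specified in this section, so the sketch below assumes, purely for illustration, Tukey's bisquare $\rho$ scaled to a maximum of 1, $\chi(u) = 1 - (1 - (u/c)^2)^3$ for $|u| < c$ and $1$ otherwise, with the hypothetical choice $c = 1.548$; $\beta = E_\Phi\chi(Z)$ is then computed numerically, and the median and normalized MAD are used as starting values.

```python
import math
import statistics
from statistics import NormalDist

C = 1.548  # bisquare tuning constant: hypothetical choice of chi, not from the thesis
Q3 = NormalDist().inv_cdf(0.75)


def chi(u: float) -> float:
    """Bounded, even chi with chi(0) = 0 (bisquare rho scaled to a maximum of 1)."""
    if abs(u) >= C:
        return 1.0
    r = (u / C) ** 2
    return 1.0 - (1.0 - r) ** 3


def normal_expect(g, lim: float = 10.0, n: int = 40000) -> float:
    """Midpoint-rule approximation of E_Phi[g(Z)]."""
    h = 2 * lim / n
    total = 0.0
    for i in range(n):
        z = -lim + (i + 0.5) * h
        total += g(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2 * math.pi)


BETA = normal_expect(chi)  # beta = E_Phi chi(Z)


def one_step_tau_scale(x) -> float:
    """One fixed-point step: S1^2 = S0^2 * mean(chi((x - T0)/S0)) / beta."""
    t0 = statistics.median(x)
    s0 = statistics.median(abs(v - t0) for v in x) / Q3
    mean_chi = sum(chi((v - t0) / s0) for v in x) / len(x)
    return s0 * math.sqrt(mean_chi / BETA)
```

With this particular $\chi$, $\beta$ is close to 0.5; for a sample that closely follows $\Phi$ the one-step estimate should be close to 1, and rescaling the data by a factor rescales the estimate by the same factor.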
Since the Taylor expansion of the expectation in (3.10), with respect to $S(F)$, around $S_0(F)$, begins with

$$E_F\,\chi\left(\frac{X - T_0(F)}{S_0(F)}\right) - \frac{S(F) - S_0(F)}{S_0(F)}\,E_F\left\{\chi'\left(\frac{X - T_0(F)}{S_0(F)}\right)\frac{X - T_0(F)}{S_0(F)}\right\} + \cdots,$$

the Standard One-Step M-Estimator of Dispersion can be defined in functional terms by

$$S_1(F) = S_0(F) + S_0(F)\,\frac{E_F\,\chi\left(\frac{X - T_0(F)}{S_0(F)}\right) - \beta}{E_F\left\{\chi'\left(\frac{X - T_0(F)}{S_0(F)}\right)\frac{X - T_0(F)}{S_0(F)}\right\}}. \qquad (3.12)$$

The finite-sample version of the standard one-step M-estimator of dispersion is

$$S_1^n = S_0^n + S_0^n\,\frac{\frac{1}{n}\sum_{i=1}^{n} \chi(u_i) - \beta}{\frac{1}{n}\sum_{i=1}^{n} \chi'(u_i)\,u_i}, \qquad u_i = \frac{x_i - T_0^n}{S_0^n}, \qquad (3.13)$$

where $T_0^n$ and $S_0^n$ are the initial finite-sample estimates of location and dispersion, and $n$ is the sample size. For the same reasons explained in the location setup, it may happen, for certain $\chi$ functions and certain samples, that the denominator in the ratio of (3.13) gets dangerously close to 0 and strongly affects the estimate. In this case, the MOSME of Dispersion may be a better choice of estimator, where the MOSME of dispersion is asymptotically defined as

$$S^*(F) = S_0(F) + S_0(F)\,\frac{E_F\,\chi\left(\frac{X - T_0(F)}{S_0(F)}\right) - \beta}{E_\Phi\{\chi'(Z)\,Z\}}. \qquad (3.14)$$

Its finite-sample definition is

$$S^{*n} = S_0^n + S_0^n\,\frac{\frac{1}{n}\sum_{i=1}^{n} \chi(u_i) - \beta}{E_\Phi\{\chi'(Z)\,Z\}},$$

where $T_0^n$ and $S_0^n$ are the initial finite-sample estimates of location and dispersion, and $n$ is the sample size.

3.4 The First Step Is a Big Step

The obvious advantage of one-step M-estimators is their ease of computation compared to their fully iterated versions. Moreover, their asymptotic properties make them very attractive. For example, it is known that the standard one-step M-estimator of location is asymptotically equivalent to the fully iterated M-estimator, provided the underlying distribution $F$ is symmetric, $\psi$ is odd and, most of all, $T_0$ is consistent, translation invariant and odd. $T_0$ is said to be translation invariant and odd if

$$T_0(F_{X+c}) = T_0(F_X) + c \quad\text{and}\quad T_0(F_{-X}) = -T_0(F_X).$$
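The finite-sample standard one-step estimator (3.13) and the MOSME of dispersion share the same numerator and differ only in the denominator, as the following sketch shows. It assumes, for illustration only, the bisquare-type $\chi(u) = 1 - (1 - (u/c)^2)^3$ for $|u| < c$ (and 1 otherwise) with $c = 1.548$, median and normalized MAD starting values, and numerically computed constants $\beta = E_\Phi\chi(Z)$ and $E_\Phi\{\chi'(Z)Z\}$.

```python
import math
import statistics
from statistics import NormalDist

C = 1.548  # hypothetical bisquare constant for chi
Q3 = NormalDist().inv_cdf(0.75)


def chi(u: float) -> float:
    if abs(u) >= C:
        return 1.0
    return 1.0 - (1.0 - (u / C) ** 2) ** 3


def dchi(u: float) -> float:
    """Derivative of chi: 6 u (1 - (u/C)^2)^2 / C^2 inside [-C, C], 0 outside."""
    if abs(u) >= C:
        return 0.0
    return 6.0 * u * (1.0 - (u / C) ** 2) ** 2 / C ** 2


def normal_expect(g, lim: float = 10.0, n: int = 40000) -> float:
    h = 2 * lim / n
    total = 0.0
    for i in range(n):
        z = -lim + (i + 0.5) * h
        total += g(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2 * math.pi)


BETA = normal_expect(chi)                       # E_Phi chi(Z)
D_PHI = normal_expect(lambda z: dchi(z) * z)    # E_Phi[chi'(Z) Z], MOSME denominator


def one_step_scales(x):
    """Return (standard one-step estimate (3.13), MOSME of dispersion)."""
    t0 = statistics.median(x)
    s0 = statistics.median(abs(v - t0) for v in x) / Q3
    u = [(v - t0) / s0 for v in x]
    excess = sum(chi(ui) for ui in u) / len(x) - BETA
    den_emp = sum(dchi(ui) * ui for ui in u) / len(x)  # data-dependent denominator
    return s0 + s0 * excess / den_emp, s0 + s0 * excess / D_PHI
```

Both estimates start from the same preliminary values; only the MOSME replaces the sample average of $\chi'(u_i)u_i$ by a constant that cannot approach 0 for a particular sample.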
In this thesis, we will focus on the following robustness properties of one-step M-estimators: their breakdown point (the fraction of contamination causing complete disaster), their influence function (which describes the effect of an infinitesimal contamination) and their worst-case bias (which describes how much the estimators can change with a fraction $\epsilon$ of contamination).

It can be shown (see Rousseeuw and Croux in [31]) that the breakdown point of one-step M-estimators is equal to that of the initial M-estimators used in their computation. Selecting very robust initial M-estimators is thus strongly recommended. For example, the median and the median absolute deviation (MAD), as estimates of location and dispersion respectively, have a 50% breakdown point, the highest possible. This makes them excellent candidates for preliminary estimates of location and dispersion. On the other hand, many estimators with a high breakdown point have very low asymptotic efficiency (63.7% for the median in the univariate location problem, and 36.7% for the MAD in the univariate scale model). However, one-step M-estimators, while inheriting the breakdown point of their initial estimators, generally improve on the asymptotic efficiencies of their initial estimators, because they approach the fully iterated M-estimators, which are usually more efficient by design. Unfortunately, as Rousseeuw and Croux (1993a) have shown, increased efficiency does not come without compromise. The authors treat the problems of univariate location estimation and univariate dispersion estimation, that is, estimation of location and dispersion when the nuisance parameter is known. In the former, they use the univariate equivalent of our MOSME of location, while in the latter they use the univariate equivalent of our τ-estimator of dispersion.
Rousseeuw and Croux (1993a) show that the maximum bias of these one-step M-estimators is higher than that of their initial estimators. The increase in bias is especially strong in the dispersion setup. However, because only one iteration of the computational algorithm is performed when deriving one-step M-estimators, the increase in worst-case bias is perceived as a good compromise for the increase in efficiency.

Rousseeuw and Croux (1993a) treat $k$-step M-estimators in general, for $k > 1$. They show that it is possible to obtain an arbitrarily high efficiency, while maintaining a 50% breakdown point, through a $k$-step M-estimator for a fixed finite $k$ (which corresponds to performing $k$ steps of the algorithm used to solve for the fully iterated M-estimator, starting from a preliminary estimator with a 50% breakdown point). Notice that under the Gaussian model, the univariate one-step M-estimator of location already has the same asymptotic efficiency as the fully iterated M-estimator. Therefore, taking further steps will not increase the efficiency of the $k$-step M-estimator: Rousseeuw and Croux (1993a) show that it equals the efficiency of the fully iterated M-estimator of location, for any $k$. The authors nevertheless consider $k$-step M-estimators of location for $k > 1$ in view of possible improvements in their quantitative robustness properties. However, Rousseeuw and Croux (1993a) show that as the efficiency of the estimator goes up (as $k$ increases), its maximum bias also goes up. In the univariate location case, the bias increases only slightly with $k$. Unfortunately, in the univariate dispersion problem, the maximum bias explodes rapidly with $k$, which, in the authors' opinion, does not justify the increase in efficiency. Based on these findings, they believe small values of $k$ ($k = 1$ or $k = 2$) are preferable, especially for multiparameter problems (which typically contain a dispersion component).
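Iterating the one-step update $k$ times, with the dispersion held at its preliminary value, gives a $k$-step M-estimator of location in the sense used by Rousseeuw and Croux (1993a). The sketch below returns the whole path $T_0, T_1, \ldots, T_k$; the Huber score with $K = 1.345$ and the median and normalized MAD starting values are illustrative choices, and the function name is hypothetical.

```python
import statistics
from statistics import NormalDist

K = 1.345  # Huber clipping constant (illustrative)
Q3 = NormalDist().inv_cdf(0.75)


def psi(u: float) -> float:
    return max(-K, min(K, u))


def dpsi(u: float) -> float:
    return 1.0 if abs(u) < K else 0.0


def k_step_location(x, k: int):
    """Return [T0, T1, ..., Tk]: k Newton-Raphson steps with S fixed at the normalized MAD."""
    t = statistics.median(x)
    s = statistics.median(abs(v - t) for v in x) / Q3
    path = [t]
    for _ in range(k):
        u = [(v - t) / s for v in x]
        den = sum(dpsi(ui) for ui in u)
        if den == 0:
            raise ArithmeticError("zero derivative: Newton step undefined")
        t = t + s * sum(psi(ui) for ui in u) / den
        path.append(t)
    return path


print(k_step_location([0, 1, 2, 3, 4, 5, 6, 7, 30, 40], 3))
```

With a piecewise-linear score such as Huber's, the path can become constant after very few steps (here after the first one), in line with Huber's remark on finite convergence cited in section 3.2; the robustness trade-offs discussed above concern what happens to the maximum bias along this path.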
This has motivated the focus of this thesis on one-step M-estimators. The purpose of this thesis is to extend the work of Rousseeuw and Croux (1993a). It considers the more realistic situations with nuisance parameters: estimation of location with unknown dispersion and estimation of dispersion with unknown location. The emphasis will be on the MOSMEs presented earlier in this chapter for the above models, because they have never been studied to the extent they deserve. Following Rousseeuw and Croux, we will derive the maximum bias of the MOSME for the two models of interest, and compare it to that of the standard one-step M-estimators and the τ-estimators when applicable. To complete the study, we believe the asymptotic efficiency of an estimator also provides important information about its behaviour, even though Rousseeuw and Croux only mentioned it briefly, and only with the normal distribution as the underlying distribution. The asymptotic efficiency of the MOSME will be derived from its influence function, under some regularity conditions, and for different types (heavy-tailed, normal, light-tailed) of distributions. The asymptotic efficiency of the MOSME will then be compared to that of the other one-step M-estimators. To draw a more complete parallel with Rousseeuw and Croux (1993a), the same preliminary estimators of location and dispersion, the median and the (normalized) MAD, will be used. We believe, in any case, that they represent an excellent choice of preliminary estimators.

The next two chapters treat separately the two problems of interest: estimating location with unknown dispersion, and estimating dispersion with unknown location. Each chapter will first look at the asymptotic efficiency of one-step M-estimators, then at their worst-case bias.
Chapter 4 Location Estimation with Unknown Dispersion

4.1 Introduction

Let $X_1, \ldots, X_n$ be a sample from a population with distribution $F$ in the location-dispersion family $\{F(x) : F(x) = F((x - \theta)/\sigma)\}$. The objective is to estimate the location $\theta$ when the dispersion parameter $\sigma$ is unknown. Given a score function $\psi$, an M-estimator of location is the solution $T_n$ of the equation

$$\sum_{i=1}^{n} \psi\left(\frac{x_i - T_n}{S_n}\right) = 0, \qquad (4.15)$$

where $S_n$ is a robust estimate of the dispersion parameter $\sigma$. It can be shown that, under mild regularity conditions, $T_n$ converges a.s. [F] to $T(F)$, the functional implicitly defined as the solution of

$$E_F\,\psi\left(\frac{X - T(F)}{S(F)}\right) = 0, \qquad (4.16)$$

where $S(F)$ is the asymptotic value of $S_n$. We will therefore adopt the functional notation in the discussion below.

Computing an M-estimate of location requires an iterative method, such as the Newton-Raphson algorithm, since we must solve the non-linear equation (4.15). Huber (1981) suggested, as an alternative, using the estimate found by performing only one iteration of the algorithm, starting from initial estimates of location and dispersion. With the underlying distribution $F$, the Standard One-Step M-Estimator of Location derived from the score function $\psi$ can be formally defined by the functional

$$T(F) = T_0(F) + S_0(F)\,\frac{E_F\,\psi\left(\frac{X - T_0(F)}{S_0(F)}\right)}{E_F\,\psi'\left(\frac{X - T_0(F)}{S_0(F)}\right)},$$

where $T_0(F)$ and $S_0(F)$ are the asymptotic values of the initial estimators of location and dispersion. By replacing the distribution $F$ by the empirical distribution $F_n$, one obtains a finite-sample estimator of the location parameter $\theta$. However, for certain score functions $\psi$ and certain samples, the finite-sample version of the denominator, $E_{F_n}\psi'\left(\frac{X - T_0(F_n)}{S_0(F_n)}\right)$, may become 0 or dangerously close to 0.
To avoid this problem, Hampel et al (1986) suggested the following modified version of the standard one-step M-estimator of location:

$$T^*(F) = T_0(F) + S_0(F)\,\frac{E_F\,\psi\left(\frac{X - T_0(F)}{S_0(F)}\right)}{E_\Phi\,\psi'(Z)},$$

which will hereafter be denoted the MOSME of Location (Modified One-Step M-Estimator of Location). The standard one-step M-estimator of location has essentially the same asymptotic behaviour as its fully iterated version, at least when $F$ is symmetric, $\psi$ is odd and $T_0$ is consistent, translation invariant and odd. Our main interest is to study the asymptotic behaviour of the MOSME in comparison with the standard one-step M-estimator of location. More specifically, the asymptotic efficiencies at different models and the maximum asymptotic biases of both estimators will be compared.

4.2 Asymptotic Efficiency of the MOSME of Location

4.2.1 Our Choice of Underlying Distributions and of Score Functions

The study developed in section 4.2 includes a set of eleven underlying distributions $F$ and three different score functions $\psi$. The discussion in section 4.2 assumes that the underlying distribution $F$ is symmetric. Interesting cases for $F$ include heavy-tailed distributions, which attempt to model samples with outliers. To further draw a parallel between the theory related to the asymptotic efficiency of the MOSME and its maximum bias, one case of the contaminated normal distribution will be considered. Finally, two light-tailed distributions will also be used, to illustrate the adaptive property of the MOSME in various situations.
More specifically, the distributions of interest are: the normal distribution; Student's $t$ distribution with 1, 2, 5, 8, 10 and 20 degrees of freedom; the double exponential distribution; a contaminated standard normal distribution, with 5% of outliers from $N(6, 0.01)$ and another 5% from $N(-6, 0.01)$; and two distributions with lighter tails, namely a symmetrized Beta distribution ($\alpha = \beta = 10$) and the distribution with density $f(x) = 0.5516313254\,\exp(-x^4)$.

In order to compare the asymptotic variances of our estimators at these distributions, we decided to normalize the distributions so that their interquartile ranges would all equal the standard normal interquartile range, 1.349. This normalization implied scaling each distribution by a factor $d_0$, so that the density associated with the central distribution $F$ is actually $\frac{1}{d_0} f_{\theta=0,\sigma=1}\left(\frac{x}{d_0}\right)$, and not $f_{\theta=0,\sigma=1}(x)$. See Appendix A for the scale factors $d_0$ necessary to give each of the above distributions an interquartile range equal to 1.349.

We are, however, aware that the normalization is not needed if we compare the efficiencies of the estimators instead of their variances, which we actually do. Indeed, an efficiency is by definition a ratio of asymptotic variances, and the asymptotic variance under $F(x/d_0)$ is equal to $d_0^2$ times the asymptotic variance under $F(x)$. Therefore, the efficiency calculations are independent of the normalization factors. However, the normalization greatly simplifies the calculations when the initial estimator of dispersion in the one-step calculation is chosen to be the MAD, as will be shown later. This is why we adopted this procedure. Readers who prefer to compare the asymptotic variances of the estimators, rather than their efficiencies, will find the necessary material in this thesis.
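For distributions whose quartiles have closed forms, the scale factor follows directly from $d_0 = 1.349/\mathrm{IQR}$ of the unscaled distribution, since scaling by $d_0$ multiplies the interquartile range by $d_0$. The sketch below computes it for three of the distributions listed above (the standard normal, the unit-scale double exponential and the Cauchy, i.e. Student's $t_1$). It is only an illustration; the factors actually used in this thesis are those reported in Appendix A.

```python
import math
from statistics import NormalDist

TARGET_IQR = 1.349  # standard normal interquartile range, as used in the text

# Upper quartiles q of the unscaled symmetric distributions, so that IQR = 2q.
upper_quartile = {
    "normal": NormalDist().inv_cdf(0.75),
    "double exponential": math.log(2.0),  # Laplace(0, 1): P(X > q) = exp(-q)/2 = 1/4
    "cauchy (t_1)": 1.0,                  # tan(pi/4)
}

d0 = {name: TARGET_IQR / (2 * q) for name, q in upper_quartile.items()}
for name, factor in d0.items():
    print(f"{name:20s} d0 = {factor:.5f}")
```

Because 1.349 is a rounded value of $2\Phi^{-1}(3/4)$, the factor for the normal distribution is 1 only to about four decimal places.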
To further illustrate the behaviour of the MOSME, the following three score functions $\psi$ will be used:

$$\psi_{H_{1.345}}(x) = \begin{cases} x/1.345 & |x| \le 1.345, \\ \operatorname{sign}(x) & |x| > 1.345, \end{cases} \qquad (4.17)$$

$$\psi_{NCDF}(x) = 2\Phi(x) - 1, \qquad (4.18)$$

and

$$\psi_{T_{4.7}}(x) = \begin{cases} x\,(4.7^2 - x^2)^2 & |x| \le 4.7, \\ 0 & |x| > 4.7. \end{cases} \qquad (4.19)$$

They will hereafter be called $H_{1.345}$, NCDF and $T_{4.7}$, respectively. The first two score functions are monotone non-decreasing, whereas the third is redescending. The score function $H_{1.345}$ was initially proposed by Huber (1964), who showed that it is asymptotically minimax for $F = \Phi$ within the class of location estimators with general dispersion. The choice of the constant 1.345 makes the estimator $H_{1.345}$ 95% efficient under the normal model. As will be seen later, NCDF is also 95% efficient when the underlying distribution is normal. The score function $T_{4.7}$ is an example of Tukey's biweight function; the choice of the constant 4.7 makes the estimator $T_{4.7}$ 95% efficient under the normal model.

Note that the constant denominator $E_\Phi\psi'(Z)$ in the ratio defining the MOSME can be easily calculated once the score function $\psi$ is determined. Table 4.2 provides these constants.

Table 4.2: The Value of the Constant Denominator, $E_\Phi\psi'(Z)$, in the Ratio Defining the Three MOSMEs of Location Under Study

    Score function ψ     E_Φ ψ'(Z)
    H_1.345              0.6106876
    NCDF                 0.5641896
    T_4.7                370.4275608

Notice that the score functions (4.17) and (4.18) are normalized to have a maximum of 1, whereas the score function (4.19) is not. This explains the big difference in the values of the constants in Table 4.2.

4.2.2 The Asymptotic Variance of the Standard One-Step and the Fully Iterated M-Estimators of Location

In order to study the behaviour of the MOSME of location, it is of interest to compare its asymptotic variance to the asymptotic variance of the standard one-step M-estimator.
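The constants in Table 4.2 can be checked by integrating the derivatives of (4.17)-(4.19) against the standard normal density. The sketch below uses a midpoint rule whose integration range and grid size are arbitrary choices; the derivative of (4.19) is $(4.7^2 - x^2)(4.7^2 - 5x^2)$ on $|x| \le 4.7$.

```python
import math


def normal_expect(g, lim: float = 10.0, n: int = 40000) -> float:
    """Midpoint-rule approximation of E_Phi[g(Z)] on [-lim, lim]."""
    h = 2 * lim / n
    total = 0.0
    for i in range(n):
        z = -lim + (i + 0.5) * h
        total += g(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2 * math.pi)


def d_huber(u: float) -> float:
    """Derivative of (4.17): 1/1.345 inside [-1.345, 1.345], 0 outside."""
    return 1.0 / 1.345 if abs(u) < 1.345 else 0.0


def d_ncdf(u: float) -> float:
    """Derivative of (4.18): 2 * phi(u)."""
    return 2.0 * math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def d_biweight(u: float, c: float = 4.7) -> float:
    """Derivative of (4.19): (c^2 - u^2)(c^2 - 5u^2) inside [-c, c], 0 outside."""
    return (c * c - u * u) * (c * c - 5 * u * u) if abs(u) <= c else 0.0


constants = {
    "H1.345": normal_expect(d_huber),
    "NCDF": normal_expect(d_ncdf),
    "T4.7": normal_expect(d_biweight),
}
for name, value in constants.items():
    print(f"{name:7s} {value:.7f}")
```

For NCDF the value is $1/\sqrt{\pi}$ exactly, and for $H_{1.345}$ it is $(2\Phi(1.345) - 1)/1.345$, which gives independent checks on the integrator.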
But the latter is equivalent to the asymptotic variance of the fully iterated M-estimator of location, provided that the initial estimator of location, $T_0$, is consistent, translation invariant and odd, that the underlying distribution $F$ is symmetric, and that the score function $\psi$ is odd. These assumptions will be used throughout this chapter, in view of the simplification they bring to the problem.

Huber (1981, pp. 140–141) shows that the influence function of the standard one-step M-estimator and of the fully iterated M-estimator of location is, under our assumptions,

$$IF(x; T, F) = \frac{S_0(F)\,\psi\{x/S_0(F)\}}{E_F\,\psi'\{X/S_0(F)\}}, \qquad (4.20)$$

where $S_0(F)$ is the asymptotic value of the initial estimator of dispersion. Remember that $T_0(F) = 0$ when $F$ is symmetric. Note that the influence function (4.20) is directly proportional to the score function $\psi$ defining the estimators. If we take the initial estimator of dispersion to be the normalized MAD, with asymptotic value

$$S_0(F) = \frac{\operatorname{Med}(|X|)}{\Phi^{-1}(3/4)},$$

then $S_0(F) = 1$ for all the distributions of interest presented in Section 4.2.1. This is due to the normalization factor $d_0$ (see Appendix A), which makes the interquartile range of each distribution equal to 1.349. With this choice of initial estimator of dispersion, the asymptotic variance of the standard one-step M-estimator of location, and hence of the fully iterated M-estimator of location, simplifies to

$$V(T, F) = \frac{E_F\,\psi^2(X)}{\{E_F\,\psi'(X)\}^2}$$

under mild regularity conditions.

4.2.3 Asymptotic Variance of the MOSME of Location

Appendix B provides the complete derivation of the influence function of the MOSME of location. In the general non-symmetric case, it is equal to
$$IF(x; T^*, F) = IF(x; T_0, F)\left\{1 - \frac{E_F\psi'(y)}{E_\Phi\psi'(z)}\right\} + IF(x; S_0, F)\left\{\frac{E_F\psi(y)}{E_\Phi\psi'(z)} - \frac{E_F\{\psi'(y)\,y\}}{E_\Phi\psi'(z)}\right\} + IF(x; T, F)\,\frac{E_F\psi'(y)}{E_\Phi\psi'(z)} - S_0\,\frac{E_F\psi(y)}{E_\Phi\psi'(z)},$$

where $T_0 = T_0(F)$ and $S_0 = S_0(F)$ are the asymptotic values of the initial estimators of location and dispersion, $y = (X - T_0)/S_0$ inside the expectations, $IF(x; T_0, F)$ and $IF(x; S_0, F)$ are the influence functions of the initial estimators of location and dispersion, and $IF(x; T, F)$ is the influence function of the standard (or the fully iterated) one-step M-estimator of location. It is, however, possible to simplify the above expression under appropriate conditions.

Proposition 1. Assume $T_0$ is consistent, translation invariant and odd. Assume that $\psi$ is odd, bounded, differentiable except at most at a finite number of points, and equal to 0 at 0. If $F$ is symmetric, then

$$IF(x; T^*, F) = (1 - a)\,IF(x; T_0, F) + a\,IF(x; T, F), \qquad \text{where } a = \frac{E_F\psi'(y)}{E_\Phi\psi'(z)}.$$

The conditions of Proposition 1 are needed for the standard one-step M-estimator of location to have the same influence function as the fully iterated M-estimator. The most important condition is the symmetry of $F$, which greatly simplifies the problem since it makes the influence function of the MOSME of location independent of the initial estimator of dispersion (both $E_F\psi(y)$ and $E_F\{\psi'(y)\,y\}$ then vanish). The conditions on the score function $\psi$ are minimal regularity conditions that most score functions used in practice satisfy. Note that the influence function of the MOSME illustrates its adaptive behaviour. When the underlying distribution $F$ is approximately normal, the constant $a$ is close to 1, and the MOSME behaves like the more efficient standard one-step (fully iterated) M-estimator. On the other hand, the further $F$ is from the normal, the further $a$ is from 1, and the greater the impact of the initial (robust) estimator of location on the behaviour of the MOSME.
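The constants of Table 4.2 and the weight $a$ of Proposition 1 are easy to check numerically. The following Python sketch, which relies only on the standard library, integrates $\psi'$ against the normal density piecewise (splitting where $\psi'$ jumps) with a midpoint rule; the function names, the integration range $[-12, 12]$ and the grid size are illustrative choices. It also evaluates $a$ for $H_{1.345}$ at the Cauchy ($t(1)$) distribution rescaled to have interquartile range 1.349, where $S_0(F) = 1$ and $E_F\psi'(X) = P(|X| < 1.345)/1.345$ has a closed form.

```python
import math
from statistics import NormalDist

N = NormalDist()
K, C = 1.345, 4.7          # tuning constants of (4.17) and (4.19)


def dpsi_huber(x):          # derivative of (4.17): 1/K inside (-K, K), 0 outside
    return 1.0 / K if abs(x) < K else 0.0


def dpsi_ncdf(x):           # derivative of (4.18): 2 * phi(x)
    return 2.0 * N.pdf(x)


def dpsi_tukey(x):          # derivative of (4.19): (C^2 - x^2)(C^2 - 5 x^2) inside (-C, C)
    return (C * C - x * x) * (C * C - 5.0 * x * x) if abs(x) < C else 0.0


def midpoint(f, a, b, n=4000):
    """Composite midpoint rule; never evaluates f exactly at a or b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def expect_normal(g, kinks=(), lim=12.0):
    """E_Phi g(Z), integrated piecewise between the points where g jumps."""
    pts = [-lim] + sorted(kinks) + [lim]
    return sum(midpoint(lambda z: g(z) * N.pdf(z), a, b) for a, b in zip(pts, pts[1:]))


DENOM = {
    "H1.345": expect_normal(dpsi_huber, (-K, K)),
    "NCDF": expect_normal(dpsi_ncdf),
    "T4.7": expect_normal(dpsi_tukey, (-C, C)),
}

# Weight a of Proposition 1 for H1.345 at t(1) rescaled to IQR 1.349 (Cauchy scale s),
# with S_0(F) = 1: E_F psi'(X) = P(|X| < K) / K = (2 / pi) * atan(K / s) / K.
s = N.inv_cdf(0.75)
a_cauchy = (2.0 / math.pi) * math.atan(K / s) / K / DENOM["H1.345"]

for name, value in DENOM.items():
    print(f"E_Phi psi'(z) for {name}: {value:.7f}")
print(f"a for H1.345 at the rescaled t(1): {a_cauchy:.4f}   (a = 1 at the normal)")
```

The three constants should agree with Table 4.2 to the printed precision, and $a \approx 0.857$ for the rescaled $t(1)$, noticeably below the value $a = 1$ attained at the normal.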
One can use a Taylor series expansion, under mild regularity conditions, or the heuristic formula $V(T^*, F) = E_F\{IF(X; T^*, F)\}^2$ to obtain the asymptotic variance of the MOSME, $V(T^*, F)$. In either case, we find that

$$V(T^*, F) = E_F\{(1 - a)\,IF(X; T_0, F) + a\,IF(X; T, F)\}^2.$$

Note that the asymptotic value of the initial estimator of location, $T_0(F)$, is 0, since all the distributions are symmetric. It is, however, important to choose a robust initial estimator of location when using the MOSME or the standard one-step M-estimator. The median is strongly recommended in the literature. Its influence function is

$$IF(x; \mathrm{Med}, F) = \frac{\operatorname{sign}(x)}{2f(0)}$$

when its asymptotic value is assumed to be 0. Hampel et al. (1986) have moreover shown that the influence function of the median has the sharpest bound of any location estimator, and hence the smallest gross-error sensitivity. The asymptotic variance of the median is $1/\{4f(0)^2\}$.

As in the preceding section on the standard one-step M-estimator of location, we recommend the normalized MAD as the initial estimator of dispersion. It is asymptotically equal to 1 for all distributions $F$ under study, by the choice of the normalization factors $d_0$.

4.2.4 The Asymptotic Variance of the MLE

In order to calculate the efficiency of the MOSME and of the one-step M-estimator of location, it is necessary to find the asymptotic variance of the maximum likelihood estimator of location for each underlying distribution $F$.

Table 4.3: Asymptotic Variance V(MLE, F) of the MLE for Different Underlying Distributions F

Distribution F          V(MLE, F)
double exponential      0.9455161
contaminated normal     0.0713725
t(1)                    0.909873
t(2)                    1.1356899
t(5)                    1.1470014
t(8)                    1.1127178
t(10)                   1.0962449
t(20)                   1.0543198
normal                  1.0000000
symmetrized beta        0.9233088
0.55 exp(−x^4)          0.5367305
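To see how the variance formula for the MOSME, the influence function of the median and the MLE variance combine into the efficiencies reported in Table 4.4 below, the sketch evaluates them for $H_{1.345}$ at the normal and at the rescaled $t(1)$ distribution, for which $V(\mathrm{MLE}, F) = 2s^2$ with Cauchy scale $s = \Phi^{-1}(3/4)$ (this reproduces the $t(1)$ entry 0.909873 of Table 4.3). It is a minimal sketch under the chapter's assumptions (symmetric $F$, median and normalized MAD as initial estimates, $S_0(F) = 1$); the Cauchy expectations use the substitution $X = s\tan U$ with $U$ uniform on $(-\pi/2, \pi/2)$, and all names are illustrative. With these choices, the median and standard one-step efficiencies at $t(1)$ should come out near 0.811 and 0.569, as in Table 4.4; the MOSME value should be close to, but may not exactly reproduce, the tabulated 0.620.

```python
import math
from statistics import NormalDist

N = NormalDist()
K = 1.345
S = N.inv_cdf(0.75)   # Cauchy scale giving t(1) an interquartile range of 1.349


def psi(x):            # Huber score (4.17)
    return x / K if abs(x) <= K else math.copysign(1.0, x)


def dpsi(x):
    return 1.0 / K if abs(x) < K else 0.0


def midpoint(f, a, b, n=4000):
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def piecewise(f, pts):
    return sum(midpoint(f, a, b) for a, b in zip(pts, pts[1:]))


def e_normal(g):
    """E g(X), X ~ N(0, 1); split at 0 and +-K where the integrands jump or kink."""
    return piecewise(lambda x: g(x) * N.pdf(x), [-12.0, -K, 0.0, K, 12.0])


def e_cauchy(g):
    """E g(X), X = S tan(U) with U uniform on (-pi/2, pi/2), i.e. Cauchy with scale S."""
    b = math.atan(K / S)
    pts = [-math.pi / 2, -b, 0.0, b, math.pi / 2]
    return piecewise(lambda u: g(S * math.tan(u)), pts) / math.pi


C_PHI = e_normal(dpsi)  # E_Phi psi'(z), about 0.6107


def efficiencies(expect, f0, v_mle):
    """Efficiencies V(MLE, F) / V(., F) with S_0(F) = 1 and T_0 = median."""
    e_dpsi = expect(dpsi)
    a = e_dpsi / C_PHI                                  # weight of Proposition 1

    def if_med(x):                                      # sign(x) / (2 f(0))
        return math.copysign(1.0, x) / (2.0 * f0)

    def if_one_step(x):                                 # (4.20) with S_0 = 1
        return psi(x) / e_dpsi

    def if_mosme(x):                                    # (1 - a) IF(T_0) + a IF(T)
        return (1.0 - a) * if_med(x) + a * if_one_step(x)

    out = {}
    for name, h in (("Med", if_med), ("one-step", if_one_step), ("MOSME", if_mosme)):
        out[name] = v_mle / expect(lambda x: h(x) ** 2)
    return out


normal = efficiencies(e_normal, N.pdf(0.0), 1.0)
cauchy = efficiencies(e_cauchy, 1.0 / (math.pi * S), 2.0 * S * S)   # V(MLE) = 2 S^2
for name in ("Med", "one-step", "MOSME"):
    print(f"{name:9s} normal: {normal[name]:.3f}   t(1): {cauchy[name]:.3f}")
```

At the normal, $a = 1$ and the MOSME and standard one-step coincide; at $t(1)$ the weight $1 - a$ on the median's influence function is what lifts the MOSME above the standard one-step.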
The MLE of location is defined through the score function $\psi_{MLE}(x) = -f'(x)/f(x)$. The MLE, consistent for the location $\theta$, has the smallest possible asymptotic variance, namely the inverse of the Fisher information. That is, the asymptotic variance of the MLE is

$$V(\mathrm{MLE}, F) = \frac{1}{E_F\{f'(X)/f(X)\}^2}.$$

Table 4.3 gives the asymptotic variance of the MLE, $V(\mathrm{MLE}, F)$, for the different distributions under study. Note the strikingly small asymptotic variance of the MLE for the contaminated normal distribution, which corresponds to a very large Fisher information. When $\{f'(x)/f(x)\}^2$ is integrated to obtain this Fisher information, the two contaminations centered at $x = \pm 6$ make the integral increase significantly over values of $x$ greater than 5 or smaller than −5. The contaminated normal is the only distribution in Table 4.3 that is not unimodal.

Table 4.4: Asymptotic Efficiency of the MOSME and the Standard One-Step M-Estimator of Location (Equivalent to That of the Fully-Iterated M-Estimator), Derived from Different Score Functions ψ and for Different Underlying Distributions F

                          MOSME                      Standard One-Step
Distribution F   Med      H1.345   NCDF    T4.7      H1.345   NCDF    T4.7
dble exp         1.000    0.735    0.742   0.747     0.698    0.718   0.695
cont normal      0.047    0.060    0.058   0.080     0.060    0.058   0.080
t(1)             0.811    0.620    0.609   0.781     0.569    0.571   0.716
t(2)             0.833    0.876    0.870   0.930     0.857    0.856   0.904
t(5)             0.769    0.992    0.993   0.987     0.990    0.992   0.984
t(8)             0.731    0.996    0.999   0.987     0.996    0.999   0.987
t(10)            0.716    0.993    0.996   0.984     0.993    0.997   0.985
t(20)            0.680    0.978    0.983   0.942     0.979    0.983   0.946
normal           0.637    0.950    0.950   0.950     0.950    0.950   0.950
sym beta         0.581    0.902    0.906   0.910     0.901    0.905   0.908
exp(−x^4)        0.300    0.669    0.631   0.666     0.644    0.616   0.643

The Asymptotic Efficiency of the Median, the Initial Estimator of Location, is Provided for Comparison Purposes.
The Normalized MAD is Used As an Initial Estimator of Dispersion.

4.2.5 Asymptotic Efficiency of the MOSME Compared to That of the Standard One-Step (or Fully-Iterated) M-Estimator of Location

Table 4.4 presents the asymptotic efficiency of the MOSME and of the one-step M-estimator of location (equivalent to that of the fully iterated M-estimator), derived from the three score functions (4.17), (4.18) and (4.19), for the different distributions $F$ under study. The asymptotic efficiency of the initial estimator, the median Med, is also given in Table 4.4 for comparison. In view of Table 4.4, it can be concluded that the MOSME of location does not have the same asymptotic properties as the fully iterated M-estimator, contrary to the one-step M-estimator of location under our assumptions.

In fact, for the three score functions $H_{1.345}$, NCDF and $T_{4.7}$, the efficiency of the MOSME appears to be always greater than, or comparable to, that of the standard one-step M-estimator, and hence that of the fully iterated M-estimator, at least for the underlying distributions $F$ selected. This improvement is especially marked for very heavy-tailed distributions, which gives the MOSME an advantage over the other estimators when the sample contains outliers. The MOSME, the standard one-step and the fully iterated M-estimators of location seem approximately equivalent, in terms of asymptotic efficiency, for underlying distributions $F$ approaching the normal distribution.

Table 4.4 also suggests that an M-estimator derived from the $T_{4.7}$ score function is more efficient than those derived from the $H_{1.345}$ and NCDF score functions in the presence of very heavy-tailed distributions (t(1), t(2) and the contaminated normal). The NCDF score function behaves slightly better than $H_{1.345}$, which itself behaves slightly better than $T_{4.7}$, for distributions approaching the normal (t(5), t(8), t(10) and t(20)). For the very light-tailed symmetrized Beta distribution, the $T_{4.7}$ score function gives the highest efficiency, followed by NCDF and then $H_{1.345}$. In the case of the light-tailed $0.55\exp(-x^4)$, $H_{1.345}$ and $T_{4.7}$ have comparable efficiencies, slightly higher than that of NCDF. For the double exponential underlying distribution, things are less clear: the NCDF score function is superior to $H_{1.345}$ and $T_{4.7}$ when the fully iterated M-estimator is used, whereas the three score functions are roughly equivalent when the MOSME is used. Therefore, because the presence of outliers is the main problem when estimating location, I would recommend using the MOSME derived from the $T_{4.7}$ score function to estimate the location of a distribution with unknown dispersion.

4.3 Maximum Bias of the MOSME of Location

Let $X_1, \ldots, X_n$ be a sample from a population with distribution $F$ in the contamination neighborhood

$$\mathcal{V}_\epsilon(F_0^{\theta,\sigma}) = \{F : F = (1 - \epsilon)F_0^{\theta,\sigma} + \epsilon H,\ H \text{ an arbitrary distribution}\}, \qquad 0 < \epsilon < 1/2,$$

where the central distribution $F_0^{\theta,\sigma}$ belongs to a location-dispersion family, that is, $F_0^{\theta,\sigma}(x) = F_0\{(x - \theta)/\sigma\}$. The arbitrary distribution $H$ generates the outliers that can be present in the sample, and it will invariably affect the estimation of the location parameter. Moreover, we take $\epsilon$ less than 1/2 because otherwise it would be impossible to distinguish between the central distribution $F_0^{\theta,\sigma}$ and the arbitrary distribution $H$.

The maximum bias function

$$B_{T^*}(\epsilon) = \sup_{F \in \mathcal{V}_\epsilon} |T^*(F) - \theta|,$$

briefly introduced by Hampel et al. (1986, pp. 176–177), can be used to measure the asymptotic robustness of the MOSME of location as a function of the fraction $\epsilon$ of contamination. Note that we can take $\theta = 0$ without loss of generality because the estimate $T^*(F)$ is translation invariant.
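The supremum in $B_{T^*}(\epsilon)$ runs over all of $\mathcal{V}_\epsilon$, but restricting it to point-mass contaminations $H = \delta_x$ already gives a useful numerical picture. The sketch below, written for the normal central distribution and the $H_{1.345}$ score, evaluates the MOSME functional $T^*(F) = T_0(F) + S_0(F)\,E_F\psi\{(X - T_0(F))/S_0(F)\}/E_\Phi\psi'(z)$ (as written in the first line of the proof of Proposition 2) at $F = (1 - \epsilon)\Phi + \epsilon\delta_x$, with the median and normalized MAD of the mixture obtained by bisection on its distribution functions. The grid of contamination points, the use of $x = 10^6$ as a stand-in for a point mass at infinity, and all function names are illustrative assumptions. Because only point masses are searched, the result is a lower bound on the maximum bias, not the maximum bias itself.

```python
import math
from statistics import NormalDist

N = NormalDist()
K = 1.345
Q = N.inv_cdf(0.75)
C_PHI = (2.0 * N.cdf(K) - 1.0) / K      # E_Phi psi'(z) for H1.345


def psi(x):                              # Huber score (4.17)
    return x / K if abs(x) <= K else math.copysign(1.0, x)


def smallest_root(G, lo, hi, level=0.5, iters=200):
    """Smallest r in [lo, hi] with G(r) >= level, for non-decreasing G (bisection)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if G(mid) >= level:
            hi = mid
        else:
            lo = mid
    return hi


def initial_estimates(eps, x):
    """Median and normalized MAD of F = (1 - eps) Phi + eps delta_x."""
    m = smallest_root(lambda y: (1 - eps) * N.cdf(y) + eps * (y >= x), -20.0, 20.0)
    r = smallest_root(lambda r: (1 - eps) * (N.cdf(m + r) - N.cdf(m - r))
                      + eps * (abs(x - m) <= r), 0.0, 40.0)
    return m, r / Q


def e_phi(g, kinks, n=2000):
    """E_Phi g(Z) by the midpoint rule, split at the kinks of g."""
    pts = [-12.0] + sorted(kinks) + [12.0]
    total = 0.0
    for a, b in zip(pts, pts[1:]):
        h = (b - a) / n
        total += h * sum(g(a + (i + 0.5) * h) * N.pdf(a + (i + 0.5) * h) for i in range(n))
    return total


def mosme_functional(eps, x):
    """T*(F) at F = (1 - eps) Phi + eps delta_x, with median and normalized MAD."""
    m, s = initial_estimates(eps, x)
    centre = (1 - eps) * e_phi(lambda z: psi((z - m) / s), (m - K * s, m + K * s))
    return m + s * (centre + eps * psi((x - m) / s)) / C_PHI


eps = 0.10
t_inf = mosme_functional(eps, 1e6)                              # stands in for F_infinity
grid = [mosme_functional(eps, 0.25 * i) for i in range(33)]     # x = 0, 0.25, ..., 8
print(f"T*(F_inf) = {t_inf:.4f}, max over point masses on the grid = {max(grid):.4f}")
```

For $\epsilon = 0.1$ the largest grid value should coincide with the value at the distant point mass, consistent with the monotone case treated next in Section 4.3.1.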
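The following compares the maximum bias of the MOSME, the standard one-step and the fully iterated M-estimators of location in two distinct situations: when the score function $\psi$ is monotone non-decreasing (as $H_{1.345}$ and NCDF are), and when $\psi$ is redescending (as $T_{4.7}$ is).

4.3.1 Monotone Non-Decreasing Score Functions

Let the distribution $F_\infty$ be a point mass contamination at infinity, obtained when $H = \delta_\infty$. We shall hereafter concentrate on the normal central distribution $F_0^{\theta,\sigma} = \Phi$ in the neighborhood $\mathcal{V}_\epsilon$.

Proposition 2. Assume $\psi$ is monotone non-decreasing, bounded and odd. Let $S^-(\epsilon) = \inf_{F \in \mathcal{V}_\epsilon} S_0(F)$, $S^+(\epsilon) = \sup_{F \in \mathcal{V}_\epsilon} S_0(F)$, and $B(\epsilon) = \sup_{F \in \mathcal{V}_\epsilon} |T_0(F)|$. Assume we can interchange integration and differentiation, that is,

$$\frac{d}{dt} E_\Phi\psi\left(\frac{x - t}{s}\right) = -\frac{1}{s}\,E_\Phi\psi'\left(\frac{x - t}{s}\right),$$

and that

$$E_\Phi\psi'\left(\frac{x - t}{s}\right) \le \frac{E_\Phi\psi'(x)}{1 - \epsilon} \qquad (4.21)$$

for all $t$ in $[-B(\epsilon), B(\epsilon)]$ and fixed $s$ in $[S^-(\epsilon), S^+(\epsilon)]$, and

$$E_\Phi\psi\left(\frac{x - t}{s}\right) - E_\Phi\left\{\psi'\left(\frac{x - t}{s}\right)\frac{x - t}{s}\right\} \ge -\frac{\epsilon}{1 - \epsilon} \qquad (4.22)$$

for all $s$ in $[S^-(\epsilon), S^+(\epsilon)]$ and fixed $t$ in $[-B(\epsilon), B(\epsilon)]$. Then

$$\sup_{F \in \mathcal{V}_\epsilon} T^*(F) = T^*(F_\infty), \qquad \text{where } F_\infty = (1 - \epsilon)\Phi + \epsilon\delta_\infty.$$

Proof: We clearly always have $\sup_{F \in \mathcal{V}_\epsilon} T^*(F) \ge T^*(F_\infty)$. Moreover, for all $\epsilon < 1/2$,

$$\begin{aligned}
\sup_{F \in \mathcal{V}_\epsilon} T^*(F) &= \sup_{F \in \mathcal{V}_\epsilon} \left[ T_0(F) + S_0(F)\,\frac{E_F\psi\{(x - T_0(F))/S_0(F)\}}{E_\Phi\psi'(z)} \right] \\
&= \sup_{F = (1-\epsilon)\Phi + \epsilon H} \left[ T_0(F) + S_0(F)\,\frac{(1-\epsilon)E_\Phi\psi\{(x - T_0(F))/S_0(F)\} + \epsilon E_H\psi\{(x - T_0(F))/S_0(F)\}}{E_\Phi\psi'(z)} \right] \\
&\le \sup_{F = (1-\epsilon)\Phi + \epsilon H} \left[ T_0(F) + S_0(F)\,\frac{(1-\epsilon)E_\Phi\psi\{(x - T_0(F))/S_0(F)\} + \epsilon}{E_\Phi\psi'(z)} \right] \\
&\le \sup_{-B(\epsilon) \le t \le B(\epsilon),\ S^-(\epsilon) \le s \le S^+(\epsilon)} \left[ t + s\,\frac{(1-\epsilon)E_\Phi\psi\{(x - t)/s\} + \epsilon}{E_\Phi\psi'(z)} \right] \\
&= T^*(F_\infty).
\end{aligned}$$

The first equality is simply the definition of $T^*(F)$. By definition of $\mathcal{V}_\epsilon$, it is possible to write $E_F\psi\{(x - T_0(F))/S_0(F)\}$ as the sum of two terms, as stated in the second equality. Since $\psi$ is bounded, we can assume without loss of generality that $\sup_x \psi(x) = 1$. Thus $E_H\psi$ is always less than or equal to 1, which gives the third line. The function to be maximized on the third line no longer depends on the distribution $H$. In fact, it can be regarded as a function of two arguments, $T_0(F)$ (or $t$) and $S_0(F)$ (or $s$).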
Following the work of Martin and Zamar [26], if $\epsilon < 1/2$, then $T_0(F)$ and $S_0(F)$ are bounded as in the fourth line; that is, $S^-(\epsilon) \le S_0(F) = s \le S^+(\epsilon)$ and $-B(\epsilon) \le T_0(F) = t \le B(\epsilon)$. Assuming conditions (4.21) and (4.22) hold, the function to be maximized in the fourth line is increasing in $T_0(F) = t$, for every fixed $S^-(\epsilon) \le S_0(F) = s \le S^+(\epsilon)$, when $-B(\epsilon) \le T_0(F) = t \le B(\epsilon)$; and it is increasing in $S_0(F) = s$, for every fixed $-B(\epsilon) \le T_0(F) = t \le B(\epsilon)$, when $S^-(\epsilon) \le S_0(F) = s \le S^+(\epsilon)$. Therefore, we directly get the fifth line, which is by definition $T^*(F_\infty)$. Hence, we have shown that $\sup_{F \in \mathcal{V}_\epsilon} T^*(F) = T^*(F_\infty)$ when the central distribution $F_0^{\theta,\sigma}$ is normal. □

Analytical derivations, combined with numerical calculations, have shown that the $H_{1.345}$ and NCDF score functions satisfy conditions (4.21) and (4.22) when the median and the (normalized) MAD are used as preliminary estimates of location and dispersion. The conditions were rewritten for the two specific cases of the $H_{1.345}$ and NCDF score functions, and evaluated over a finite, equally spaced 21 × 21 grid covering the range of possible $t$ and $s$ values. For a fixed $\epsilon$, the maximum value of the (normalized) MAD, $S^+(\epsilon)$, is produced by a point mass contamination at infinity, and such contamination also produces the maximum value $B(\epsilon)$ of the location estimator. The minimum value of the (normalized) MAD, $S^-(\epsilon)$, is produced by a point mass contamination at 0, and such contamination also produces the minimum absolute value of the location estimator, 0 (see Martin and Zamar (1989)). More specifically, $B(\epsilon)$ can be written explicitly as $\Phi^{-1}\{1/(2(1 - \epsilon))\}$. It is the value $T(F)$ that satisfies (4.16) for the median score function $\chi_{Med}(x) = \operatorname{sgn}(x)$ and $F = F_\infty$.
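The grid verification described above can be sketched for $H_{1.345}$, where both conditions have closed forms: with $u = (x - t)/s$, $x \sim \Phi$ and $k = 1.345$, $E_\Phi\psi'(u) = \{\Phi(t + ks) - \Phi(t - ks)\}/k$ and $E_\Phi\{\psi(u) - \psi'(u)\,u\} = 1 - \Phi(t + ks) - \Phi(t - ks)$. The bounds use $B(\epsilon)$ from the text, the closed form $S^-(\epsilon) = \Phi^{-1}\{1 - 1/(4(1 - \epsilon))\}/\Phi^{-1}(3/4)$ that follows from the equation for $S^-(\epsilon)$ stated next, and a bisection for $S^+(\epsilon)$. This is our reconstruction of a 21 × 21 grid check, not the author's code; the list of $\epsilon$ values tried is arbitrary.

```python
from statistics import NormalDist

N = NormalDist()
K = 1.345
Q = N.inv_cdf(0.75)
C_PHI = (2 * N.cdf(K) - 1) / K          # E_Phi psi'(z) for the Huber score (4.17)


def bounds(eps):
    """B(eps), S^-(eps), S^+(eps) for the median and normalized MAD at a normal centre."""
    b = N.inv_cdf(1 / (2 * (1 - eps)))
    s_minus = N.inv_cdf(1 - 1 / (4 * (1 - eps))) / Q
    # S^+ solves (1-eps)[Phi(b - sQ) + 1 - Phi(b + sQ)] + eps = 1/2; the left side decreases in s.
    lo, hi = 0.0, 50.0
    for _ in range(200):
        s = 0.5 * (lo + hi)
        lhs = (1 - eps) * (N.cdf(b - s * Q) + 1 - N.cdf(b + s * Q)) + eps
        lo, hi = (s, hi) if lhs > 0.5 else (lo, s)
    return b, s_minus, hi


def conditions_hold(eps, n=21):
    """Check (4.21) and (4.22) for H1.345 on an n x n grid over [-B, B] x [S^-, S^+]."""
    b, s_lo, s_hi = bounds(eps)
    for i in range(n):
        t = -b + 2 * b * i / (n - 1)
        for j in range(n):
            s = s_lo + (s_hi - s_lo) * j / (n - 1)
            inside = N.cdf(t + K * s) - N.cdf(t - K * s)        # P(|x - t| < K s)
            c421 = inside / K <= C_PHI / (1 - eps)
            c422 = 1 - N.cdf(t + K * s) - N.cdf(t - K * s) >= -eps / (1 - eps)
            if not (c421 and c422):
                return False
    return True


for eps in (0.05, 0.1, 0.2, 0.3, 0.4, 0.45):
    b, s_lo, s_hi = bounds(eps)
    print(f"eps = {eps:.2f}: B = {b:.4f}, S- = {s_lo:.4f}, S+ = {s_hi:.4f}, "
          f"conditions hold: {conditions_hold(eps)}")
```

For this score the left side of (4.22) is smallest at $t = B(\epsilon)$, $s = S^-(\epsilon)$, which is a corner of the grid, so the grid check is not missing an interior minimum.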
The bounds $S^+(\epsilon)$ and $S^-(\epsilon)$ are the implicit solutions of

$$(1 - \epsilon)\left[\Phi\{B(\epsilon) - S^+(\epsilon)\Phi^{-1}(3/4)\} + 1 - \Phi\{B(\epsilon) + S^+(\epsilon)\Phi^{-1}(3/4)\}\right] + \epsilon = 1/2$$

and

$$(1 - \epsilon)\left[\Phi\{-S^-(\epsilon)\Phi^{-1}(3/4)\} + 1 - \Phi\{S^-(\epsilon)\Phi^{-1}(3/4)\}\right] = 1/2.$$

That is, $S^+(\epsilon)$ satisfies $E_{F_\infty}\chi_{MAD}\{(x - B(\epsilon))/S^+(\epsilon)\} = 1/2$ and $S^-(\epsilon)$ satisfies $E_{F_0}\chi_{MAD}\{x/S^-(\epsilon)\} = 1/2$, where the MAD score function is $\chi_{MAD}(x) = \frac{1}{2}\{\operatorname{sgn}(|x| - \Phi^{-1}(3/4)) + 1\}$ (see Chapter 5 for more details about score functions of dispersion estimators) and $F_0 = (1 - \epsilon)\Phi + \epsilon\delta_0$.

For all $\epsilon < 1/2$ used, conditions (4.21) and (4.22) were always met. So, even if these conditions seem somewhat restrictive, it is believed that many commonly used monotone non-decreasing, bounded and odd score functions $\psi$ satisfy them.

A lower bound for the maximum bias of the fully iterated location M-estimator is $T(F_\infty)$, i.e. the positive solution $t$ of the non-linear equation

$$-E_\Phi\psi\left(\frac{x - t}{S^+(\epsilon)}\right) = \frac{\epsilon}{1 - \epsilon},$$

for a fixed $\epsilon$. Moreover, if we assume that the score function $\psi$ is monotone non-decreasing, bounded and odd, then we have a lower bound for the bias of the standard one-step location M-estimator, $T_1(F)$, given by $T_1(F_\infty)$, where

$$T_1(F_\infty) = B(\epsilon) + S^+(\epsilon)\,\frac{(1 - \epsilon)E_\Phi\psi\{(x - B(\epsilon))/S^+(\epsilon)\} + \epsilon}{(1 - \epsilon)E_\Phi\psi'\{(x - B(\epsilon))/S^+(\epsilon)\}}.$$

It is therefore possible to compare, in terms of maximum asymptotic bias, the MOSME with the standard one-step M-estimator, as well as with the fully iterated M-estimator of location. Figures 4.1 and 4.2 show the maxbias curve of the MOSME and a lower bound for the maxbias curves of the one-step M-estimator and the fully iterated M-estimator of location, derived from the $H_{1.345}$ and NCDF score functions. In both cases, the MOSME shows a smaller maximum asymptotic bias than the one-step M-estimator and the fully iterated M-estimator, for all $\epsilon < 1/2$. Note that the improvement by the MOSME over the one-step and the fully iterated M-estimators of location is especially striking with the $H_{1.345}$ score function. The maximum bias for the three types of M-estimator was also calculated using Huber's score function with different values of the index, ranging from 0.5 to 1.75. Results similar to those with the $H_{1.345}$ score function were obtained: the larger the index, the greater the improvement of the MOSME compared to the one-step and the fully iterated M-estimators of location. Hence, the MOSME clearly shows improved asymptotic robustness over the standard one-step M-estimator, as well as over the fully iterated M-estimator of location.

4.3.2 Redescending Score Functions

As in the previous section, we concentrate on the normal central distribution $F_0^{\theta,\sigma} = \Phi$ in the neighborhood $\mathcal{V}_\epsilon$.

It is possible to obtain a lower bound on the maximum bias of the MOSME, the standard one-step and the fully iterated M-estimators of location when the score function used in their definition is redescending, as for example the $T_{4.7}$ score function is. Indeed, let $x^* = 2.1$ be the value at which $T_{4.7}$ is maximized. Let $F_\epsilon^*$ be the point mass contamination at $x^*$, obtained in the neighborhood $\mathcal{V}_\epsilon$ when $H = \delta_{x^*}$. Then $T^*(F_\epsilon^*)$, $T_1(F_\epsilon^*)$ and $T(F_\epsilon^*)$ are lower bounds on the maximum bias of the MOSME, the standard one-step and the fully iterated M-estimator of location, respectively. To obtain $T^*(F_\epsilon^*)$, $T_1(F_\epsilon^*)$ and $T(F_\epsilon^*)$, we must first determine what the initial estimators of location and dispersion, the median and the normalized MAD, become as functions of $\epsilon$ at $F_\epsilon^*$. For any $\epsilon < 1/2$, the median at $F_\epsilon^*$ is the value $B(\epsilon)$ that satisfies

$$E_{F_\epsilon^*}\psi_{Med}\left(\frac{x - B(\epsilon)}{S^+(\epsilon)}\right) = 0,$$

where $S^+(\epsilon)$ is the normalized MAD at $F_\epsilon^*$ and $\psi_{Med}$ is the score function defining the estimator Med.
It turns out that we can explicitly write $B(\epsilon) = \Phi^{-1}\{1/(2(1 - \epsilon))\}$ when $\epsilon$ is approximately 0.49 or less. The normalized MAD at $F_\epsilon^*$ is the value $S^+(\epsilon)$ that satisfies

$$E_{F_\epsilon^*}\psi_{MAD}\left(\frac{x - B(\epsilon)}{S^+(\epsilon)}\right) = 1/2,$$

where $B(\epsilon)$ is the median at $F_\epsilon^*$ and $\psi_{MAD}$ is the score function defining the normalized MAD. It turns out that, for $\epsilon$ approximately 0.30 or less, $S^+(\epsilon)$ satisfies more precisely

$$(1 - \epsilon)\left[\Phi\{B(\epsilon) - S^+(\epsilon)\Phi^{-1}(3/4)\} + 1 - \Phi\{B(\epsilon) + S^+(\epsilon)\Phi^{-1}(3/4)\}\right] + \epsilon = 1/2.$$

Therefore, for $\epsilon$ approximately 0.30 or less, a lower bound for the maximum bias of the MOSME defined with the score function $T_{4.7}$ is

$$T^*(F_\epsilon^*) = B(\epsilon) + S^+(\epsilon)\,\frac{(1 - \epsilon)E_\Phi\psi_{T_{4.7}}\{(x - B(\epsilon))/S^+(\epsilon)\} + \epsilon\,\psi_{T_{4.7}}\{(x^* - B(\epsilon))/S^+(\epsilon)\}}{E_\Phi\psi_{T_{4.7}}'(z)}.$$

Similarly, we have a lower bound on the maximum bias of the standard one-step M-estimator of location:

$$T_1(F_\epsilon^*) = B(\epsilon) + S^+(\epsilon)\,\frac{(1 - \epsilon)E_\Phi\psi_{T_{4.7}}\{(x - B(\epsilon))/S^+(\epsilon)\} + \epsilon\,\psi_{T_{4.7}}\{(x^* - B(\epsilon))/S^+(\epsilon)\}}{(1 - \epsilon)E_\Phi\psi_{T_{4.7}}'\{(x - B(\epsilon))/S^+(\epsilon)\} + \epsilon\,\psi_{T_{4.7}}'\{(x^* - B(\epsilon))/S^+(\epsilon)\}}.$$

Finally, a lower bound on the maximum bias of the fully iterated M-estimator of location is the positive solution $t$ of

$$E_{F_\epsilon^*}\psi_{T_{4.7}}\left(\frac{x - t}{S^+(\epsilon)}\right) = 0,$$

which can be written more precisely as

$$(1 - \epsilon)E_\Phi\psi_{T_{4.7}}\left(\frac{x - t}{S^+(\epsilon)}\right) + \epsilon\,\psi_{T_{4.7}}\left(\frac{x^* - t}{S^+(\epsilon)}\right) = 0.$$

Figure 4.3 shows the lower bounds on the maximum bias of the MOSME, the standard one-step and the fully iterated M-estimators of location, defined through the redescending score function $T_{4.7}$. It appears that for a small fraction of contamination $\epsilon$ ($\epsilon < 0.12$), the three estimators are equivalent in terms of maximum bias. For medium $\epsilon$ ($0.12 < \epsilon < 0.30$), the standard one-step and the fully iterated estimators are indistinguishable and appear to have a slightly lower maximum bias than the MOSME. However, this cannot be established with certainty, as the figure shows only a lower bound on the maximum bias.
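The constants used for $T_{4.7}$ can be checked directly: $\psi_{T_{4.7}}'(x) = (4.7^2 - x^2)(4.7^2 - 5x^2)$ vanishes inside $(0, 4.7)$ at $x^* = 4.7/\sqrt{5} \approx 2.10$, and the closed form $B(\epsilon) = \Phi^{-1}\{1/(2(1-\epsilon))\}$ holds as long as $B(\epsilon) < x^*$, that is, for $\epsilon$ below $1 - 1/\{2\Phi(x^*)\} \approx 0.49$. The sketch below computes both, and evaluates the lower bounds $T^*(F_\epsilon^*)$ and $T_1(F_\epsilon^*)$ from the formulas above for a few small $\epsilon$. The $\epsilon$ values and the integration grid are illustrative, and the formulas are only used in the range $\epsilon \lesssim 0.30$ where the text says the MAD equation applies.

```python
import math
from statistics import NormalDist

N = NormalDist()
C = 4.7
Q = N.inv_cdf(0.75)


def psi(x):                      # Tukey biweight score (4.19)
    return x * (C * C - x * x) ** 2 if abs(x) <= C else 0.0


def dpsi(x):
    return (C * C - x * x) * (C * C - 5.0 * x * x) if abs(x) <= C else 0.0


def e_phi(g, n=8000, lim=12.0):
    """E_Phi g(Z) by the midpoint rule (psi and psi' are continuous here)."""
    h = 2 * lim / n
    return h * sum(g(-lim + (i + 0.5) * h) * N.pdf(-lim + (i + 0.5) * h) for i in range(n))


x_star = C / math.sqrt(5.0)                      # psi'(x) = 0 inside (0, C): about 2.10
eps_max = 1.0 - 1.0 / (2.0 * N.cdf(x_star))      # B(eps) < x* iff eps < eps_max: about 0.49


def lower_bounds(eps):
    """T*(F*_eps) and T_1(F*_eps), valid for eps small enough (about 0.30 or less)."""
    b = N.inv_cdf(1.0 / (2.0 * (1.0 - eps)))     # median at F*_eps
    lo, hi = 0.0, 50.0                           # normalized MAD at F*_eps, by bisection
    for _ in range(200):
        s = 0.5 * (lo + hi)
        lhs = (1 - eps) * (N.cdf(b - s * Q) + 1 - N.cdf(b + s * Q)) + eps
        lo, hi = (s, hi) if lhs > 0.5 else (lo, s)
    s = hi
    u = (x_star - b) / s
    num = (1 - eps) * e_phi(lambda z: psi((z - b) / s)) + eps * psi(u)
    den_one_step = (1 - eps) * e_phi(lambda z: dpsi((z - b) / s)) + eps * dpsi(u)
    return b + s * num / e_phi(dpsi), b + s * num / den_one_step


print(f"x* = {x_star:.4f}, closed form for B(eps) valid up to eps = {eps_max:.3f}")
for eps in (0.02, 0.05, 0.10):
    t_mosme, t_one = lower_bounds(eps)
    print(f"eps = {eps:.2f}: T*(F*) = {t_mosme:.4f}, T_1(F*) = {t_one:.4f}")
```

For these small fractions the two bounds should be very close, in line with the near-equivalence visible in Figure 4.3 for $\epsilon < 0.12$.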
However, the redescending nature of the Tukey score function, which discards the contribution of very large values of $x$, may make it superfluous to change the denominator in $S_1(F)$ to get the MOSME of location. This would explain a possibly higher bias of the MOSME compared to the standard one-step M-estimator of location.

4.3.3 Any Type of Score Function

It was shown in Section 4.3.1 that the maximum bias of the MOSME is uniformly lower than that of the standard one-step or the fully iterated M-estimator of location defined through a monotone non-decreasing score function. When a redescending score function is used, the MOSME is equivalent to the other two estimators for small fractions of contamination $\epsilon$, as Section 4.3.2 showed. We can therefore now focus our attention on MOSME's only, especially when $\epsilon$ is small. When using the MOSME of location, one needs to decide which score function to select in its definition so as to minimize its maximum bias. Figure 4.4 shows the maximum bias of the MOSME of location defined through the $H_{1.345}$ and NCDF score functions, and a lower bound for the maximum bias of the $T_{4.7}$ MOSME of location. It is impossible to conclude anything when $\epsilon > 0.12$, but for small fractions of contamination, Figure 4.4 shows that the Huber score function has the lowest maximum bias, followed by the Tukey score function, and then NCDF. For larger $\epsilon$, the Huber score function should be preferred to the NCDF score function.

4.3.4 Further Work to Be Done

The work on asymptotic robustness was developed for one specific case of central distribution $F_0^{\theta,\sigma}$ in the neighborhood $\mathcal{V}_\epsilon$: the normal distribution. More calculations need to be done for different central distributions $F_0^{\theta,\sigma}$, for example the ones used in the asymptotic efficiency section (4.2) of this text.
Finally, it remains to be seen whether the asymptotic properties of the MOSME and the standard one-step M-estimators studied in this section carry over to their finite-sample performance for small and medium sample sizes. This could be assessed with Monte Carlo simulations, for example.

4.4 An Example: Hummingbirds

In asymptotic terms, the MOSME of location performs better than the standard one-step M-estimator and the fully iterated M-estimator of location. But how well do these estimators handle finite data sets? Figure 4.5 presents the bar plots of flying times (in seconds) of four types of hummingbirds: adult females (AF), adult males (AM), junior females (JF) and junior males (JM). For 15 minutes, each bird was put in a cage containing two perches 0.5 meter apart. A red light above the perches flashed alternately to indicate to the bird to fly towards it (the birds had previously been trained to react to flashing red lights in that manner). The time for the bird to fly from one perch to the other (in seconds) was recorded; it is believed to be a measure of the agility of the birds. Hummingbirds can fly for long periods of time without rest: in 15 minutes, a bird typically flew 200 times from one perch to the other. A close look at Figure 4.5 shows that some birds wandered around before reaching the flashing perch during some of their flights, introducing extreme outliers in their flying times. On the other hand, for other birds, it is difficult to determine whether a high flying time was due to wandering or genuinely represents the bird's performance. For that reason, the researcher who provided the data set was interested in a measure of location to describe the distribution of flying times for each bird that would be resistant enough to possible outliers, while allowing for flexibility if long flying times are normal for a bird.
She hesitated between using the median and the mode. Our MOSME of location provides an interesting alternative. Table 4.5 shows the estimated value of different location measures for the distribution of flying times of all 16 hummingbirds: their means, medians, MOSME's and standard one-step M-estimates of location. The standard one-step M-estimates are shown; note, however, that the estimates obtained with the fully iterated M-estimator of location are roughly equal, up to the third decimal place, except for adult female 2, which has an extreme outlier.

Many observations can be drawn from Table 4.5. In general, the MOSME's and the standard one-step M-estimates of location are roughly equal, except in the cases of the AF 2, AF 4, JF 1, JF 2 and JM 3 hummingbirds, depending on the score function used. These particular birds have extreme outliers or a very heavy tail. Their MOSME's are closer to the median than the standard one-step M-estimates, which illustrates the adaptive nature of the MOSME of location. Notice as well that the Tukey score function appears to be conservative, compared to $H_{1.345}$ or NCDF, in the robustness sense of the term. The MOSME's or the standard one-step estimates defined with the $T_{4.7}$ score function are always closer to the median than the estimates defined through the other two score functions. The Tukey score function sometimes even gives smaller estimates than the median.

Table 4.5: Measures of Location of Flying Times of Four Types of Hummingbirds: Adult Females (AF), Adult Males (AM), Junior Females (JF) and Junior Males (JM)

                       MOSME of Location         Standard One-Step
Bird   Mean   Median   H1.345   NCDF    T4.7     H1.345   NCDF    T4.7
AF 1   0.707  0.669    0.675    0.676   0.666    0.675    0.676   0.666
AF 2   0.930  0.580    0.607    0.607   0.591    0.610    0.609   0.594
AF 3   0.642  0.609    0.615    0.615   0.608    0.615    0.616   0.608
AF 4   0.751  0.630    0.653    0.655   0.630    0.657    0.659   0.630
AF 5   0.796  0.761    0.770    0.772   0.760    0.771    0.773   0.760
AM 1   0.844  0.812    0.821    0.822   0.814    0.822    0.823   0.815
AM 2   0.886  0.840    0.850    0.851   0.841    0.851    0.852   0.841
AM 3   0.644  0.645    0.640    0.663   0.640    0.644    0.645   0.641
AM 4   0.658  0.620    0.627    0.627   0.621    0.627    0.628   0.621
AM 5   0.673  0.652    0.659    0.660   0.656    0.660    0.660   0.656
JF 1   0.740  0.620    0.653    0.654   0.630    0.657    0.659   0.632
JF 2   0.582  0.583    0.564    0.686   0.561    0.586    0.586   0.565
JF 3   0.657  0.618    0.619    0.620   0.614    0.619    0.620   0.614
JM 1   0.666  0.560    0.571    0.572   0.556    0.573    0.573   0.555
JM 2   0.647  0.570    0.576    0.576   0.568    0.576    0.577   0.568
JM 3   0.733  0.661    0.693    0.693   0.678    0.694    0.694   0.681

This complements our results of Section 4.2, which make the Tukey score function a better choice in the presence of heavy-tailed distributions or outliers. Among all bar plots of flying times in Figure 4.5, the one for AM 5 appears to be best approximated by a normal curve. The one-step M-estimates of the location of the flying times of this bird are midway between the median and the mean, which is the closest the estimates get to the mean among all the birds. The one-step estimates confirm that, for AM 5, the MLE (the sample mean, under normality) may not be a bad estimator of location after all, though some caution in its use is necessary. It appears that adult males have the worst agility, and that junior males are the most agile birds.
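To make the finite-sample estimators concrete, the sketch below computes the mean, the median, the MOSME and the standard one-step M-estimate with the $H_{1.345}$ score on a small sample. It assumes our reading of the two one-step formulas (the defining equations appear earlier in the thesis): both start from the median $T_0$ and normalized MAD $S_0$ and add $S_0$ times the average of $\psi\{(x_i - T_0)/S_0\}$, divided by the constant $E_\Phi\psi'(z)$ for the MOSME and by the sample average of $\psi'\{(x_i - T_0)/S_0\}$ for the standard one-step. The flying times are invented for illustration and are not the data behind Table 4.5.

```python
import math
from statistics import NormalDist, mean, median

K = 1.345
Q = NormalDist().inv_cdf(0.75)
C_PHI = (2.0 * NormalDist().cdf(K) - 1.0) / K    # E_Phi psi'(z) for H1.345 (Table 4.2)


def psi(x):                                      # Huber score (4.17)
    return x / K if abs(x) <= K else math.copysign(1.0, x)


def dpsi(x):
    return 1.0 / K if abs(x) < K else 0.0


def initial(xs):
    """Median and normalized MAD, the initial estimates used throughout the chapter."""
    t0 = median(xs)
    return t0, median([abs(x - t0) for x in xs]) / Q


def mosme(xs):
    """One step from the median with the fixed denominator E_Phi psi'(z)."""
    t0, s0 = initial(xs)
    return t0 + s0 * mean([psi((x - t0) / s0) for x in xs]) / C_PHI


def one_step(xs):
    """Standard one-step: the denominator is the sample average of psi'."""
    t0, s0 = initial(xs)
    r = [(x - t0) / s0 for x in xs]
    return t0 + s0 * sum(map(psi, r)) / sum(map(dpsi, r))


# Hypothetical flying times (seconds) with a few long, "wandering" flights;
# these are NOT the measurements behind Table 4.5.
times = [0.55, 0.60, 0.62, 0.64, 0.65, 0.66, 0.68, 0.70, 0.95, 1.40, 2.10, 3.80]
for name, value in [("mean", mean(times)), ("median", median(times)),
                    ("MOSME", mosme(times)), ("one-step", one_step(times))]:
    print(f"{name:9s} {value:.3f}")
```

Here only 7 of the 12 standardized residuals fall inside $(-1.345, 1.345)$, so the sample average of $\psi'$ (about 0.43) is well below $E_\Phi\psi'(z) \approx 0.61$. The MOSME therefore takes a shorter step from the median than the standard one-step, and the printed values should satisfy median < MOSME < one-step < mean, the same qualitative pattern reported above for the heavy-tailed birds.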
Adult and junior female hummingbirds do not a priori show a difference in agility. In order to estimate the mean flying times of each of the four types of birds, and to conclude statistically that one type is more agile than the others, one could use a robust analysis, which is beyond the scope of this thesis.

4.5 Conclusions

The MOSME of location presented in this chapter is uniformly better than the standard one-step location M-estimator, in the sense that it is easier to compute, it has a comparable and sometimes better asymptotic efficiency under many important symmetric distributions, and it has a lower or comparable asymptotic maximum bias when the central distribution is normal. When a monotone non-decreasing score function is used, the MOSME has a lower asymptotic bias than the standard one-step for any fraction of contamination. With a redescending score function, the maximum bias of the MOSME is comparable to that of the standard one-step for small, but realistic, fractions of contamination. Under our assumptions, the standard one-step M-estimator of location is asymptotically equivalent to the fully iterated M-estimator. This again makes the MOSME better than the fully iterated M-estimator in terms of asymptotic behaviour. And since finding the solution of a non-linear equation, as is required when computing the fully iterated location M-estimator, can sometimes be problematic, the MOSME is preferable to the fully iterated M-estimator. The superiority of the MOSME in terms of asymptotic efficiency is especially strong for very heavy-tailed distributions, which describe samples with outliers. Martin and Zamar (1989) have shown through finite-sample simulations that, in practice, the squared bias is at least as large as the variance of M-estimators for rather modest sample sizes.
Therefore, the comparison between the MOSME and the standard one-step (fully iterated) M-estimator should give more weight to the maximum bias. And indeed, the MOSME uniformly and clearly beats the other two estimators in terms of maximum asymptotic bias when it is derived from a monotone non-decreasing score function, and it is comparable to the other two estimators for small fractions of contamination when it is derived from a redescending score function. It is impossible to single out the best score function to define the MOSME, as each score function performs best in specific, and different, situations. However, when striving for an estimator as accurate (low bias) and as precise (high efficiency) as possible, and if the contamination by outliers is known to be small, the Huber score function should be used. If one wants to use a monotone non-decreasing score function, one should prefer the Huber score function to the normal (NCDF) one. When the contamination by outliers is large, the Tukey score function may be a better choice than the Huber score function, mainly because the efficiency of the Tukey estimator is higher than that of the Huber estimator for heavy-tailed distributions. However, at this point, a formal comparison in terms of maximum bias between the Tukey and the Huber score functions is not possible for large fractions of contamination. With finite data sets, the example in Section 4.4 shows that the MOSME is more robust than the one-step (or the fully iterated) M-estimators of location when it needs to be. Indeed, in the presence of extreme outliers or very heavy tails, the values of the MOSME are closer to the median than those of the other two estimators. The MOSME, by its adaptive nature, becomes more robust and conservative when the data indicate that caution is necessary.
On the other hand, the one-step (or the fully iterated) M-estimators do not handle data that are far from normally distributed as well. We therefore believe that the MOSME should be preferred to the standard one-step location M-estimator, or to the fully iterated location M-estimator, when estimating the location parameter $\theta$ of a distribution. Most of the results presented in this chapter are, however, asymptotic. It remains to be seen whether the asymptotic superiority of the MOSME carries over to its finite-sample performance for small and medium sample sizes. Simulations would therefore be needed to establish the minimum sample size for which the results presented here become approximately valid. One could also assess the finite-sample behaviour of one-step M-estimators of location with sensitivity curves and empirical bias curves. If these finite-sample results agree with the asymptotic results presented in this chapter, even for small sample sizes, the acceptance of the MOSME as a practical tool would be greatly facilitated.

[Plot: "Maximum Bias Function of M-estimators Using H_1.345 Score Function"; horizontal axis: epsilon]
Figure 4.1: Maximum Bias Function of the MOSME, and a Lower Bound for the Maximum Bias Function of the One-Step M-Estimator and the Fully Iterated M-Estimator of Location, Derived from the $H_{1.345}$ Score Function

[Plot: "Maximum Bias Function of M-estimators Using N_CDF Score Function"; horizontal axis: epsilon]
Figure 4.2: Maximum Bias Function of the MOSME, and a Lower Bound for the Maximum Bias Function of the One-Step M-Estimator and the Fully Iterated M-Estimator of Location, Derived from the NCDF Score Function
[Plot: "Maximum Bias Function of M-Estimators Using T_4.7 Score Function"; horizontal axis: epsilon]
Figure 4.3: Lower Bound on the Maximum Bias Function of the MOSME, the Standard One-Step and the Fully Iterated M-Estimators of Location, Derived from the $T_{4.7}$ Score Function

[Plot: "Maximum Bias Function of MOSME's of Location"; horizontal axis: epsilon]
Figure 4.4: Maximum Bias Function of the MOSMEs of Location Derived from the $H_{1.345}$ and the NCDF Score Functions, and a Lower Bound on the Maximum Bias Function of the MOSME of Location Derived from the $T_{4.7}$ Score Function

[Plot: one bar-plot panel per bird (adult females, adult males, junior females, junior males); horizontal axis: flying times (sec)]
Figure 4.5: Bar Plots of Flying Times of Four Types of Hummingbirds: Adult Females, Adult Males, Junior Females and Junior Males

Chapter 5  Dispersion Estimation with Unknown Location

5.1  Introduction

Let $X_1, \dots, X_n$ be a sample from a population with distribution $F$ in the location-dispersion family $\{F_{\theta,\sigma} : F_{\theta,\sigma}(x) = F((x - \theta)/\sigma)\}$. The objective is to estimate the dispersion $\sigma$ when the location parameter $\theta$ is unknown.
Given a score function $\chi$, an M-estimator of dispersion is the solution $S_n$ of the equation

$$\frac{1}{n}\sum_{i=1}^{n} \chi\left(\frac{X_i - T_n}{S_n}\right) = 0, \qquad (5.23)$$

where $T_n$ is a robust estimate of the location parameter $\theta$. It can be shown that, under mild regularity assumptions, $S_n$ converges a.s. $[F]$ to $S(F)$, the functional implicitly defined as the solution of

$$E_F\,\chi\left(\frac{X - T(F)}{S(F)}\right) = 0,$$

where $T(F)$ is the asymptotic value of $T_n$. We will therefore adopt the functional notation in the discussion below.

Computing an M-estimator of dispersion requires an iterative method, since the nonlinear equation (5.23) must be solved. Because the score function $\chi$ is of the form $\chi(x) = g(x) - \beta$, where $g$ is an even function with $g(0) = 0$ and $\beta$ is a positive constant, Huber (1981) suggested, as an alternative, using the estimate obtained by performing one iteration of the fixed-point algorithm (starting from some initial estimates of location and dispersion). For the underlying distribution $F$, the τ-estimator of dispersion, $S_1^p(F)$, derived from the score function $\chi$, can be formally defined by the functional

$$\{S_1^p(F)\}^2 = \frac{S_0^2(F)}{\beta}\,E_F\,g\left(\frac{X - T_0(F)}{S_0(F)}\right) = S_0^2(F)\left\{1 + \frac{1}{\beta}\,E_F\,\chi\left(\frac{X - T_0(F)}{S_0(F)}\right)\right\},$$

where $T_0(F)$ and $S_0(F)$ are the asymptotic values of the initial estimators of location and dispersion. Replacing $F$ by the empirical distribution $F_n$ gives a finite-sample estimator of the dispersion parameter $\sigma$. Note that the "p" in $S_1^p$ stands for "p"oint, as in fixed-point.

Another iterative algorithm widely used to compute M-estimators of location, but less often of dispersion, is the Newton-Raphson method. The Standard One-Step M-Estimator of Dispersion, $S_1(F)$, can be defined as the first iteration of this algorithm, starting from initial estimates of location and dispersion. More precisely,

$$S_1(F) = S_0(F) + S_0(F)\,\frac{E_F\,\chi(y)}{E_F\,\chi'(y)\,y}, \qquad y = \frac{X - T_0(F)}{S_0(F)},$$

where $T_0(F)$ and $S_0(F)$ are the asymptotic values of the initial estimators of location and dispersion.
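To make the two one-step definitions above concrete, here is a minimal Python sketch that computes the fixed-point one-step (τ) estimate and the Newton-Raphson (standard) one-step estimate from a sample. It assumes Huber's score $\chi_c$ of (5.27) centred by $\beta(c)$, as in (5.24) and (5.25), with the median and the normalized MAD as initial estimates. The function names (`huber_g`, `beta_huber`, `tau_one_step`, `newton_one_step`) are hypothetical illustrations, not code from this thesis, and $\beta(c)$ is computed from closed-form truncated normal moments.

```python
from statistics import NormalDist, median

PHI = NormalDist()        # standard normal distribution
Q75 = PHI.inv_cdf(0.75)   # Phi^{-1}(3/4), about 0.6745


def huber_g(x, c):
    """Un-centred Huber dispersion score g(x) = min(x^2, c^2)."""
    return min(x * x, c * c)


def beta_huber(c):
    """beta(c) = E_Phi g(Z), from truncated normal moments."""
    inner = (2 * PHI.cdf(c) - 1) - 2 * c * PHI.pdf(c)   # E[Z^2; |Z| < c]
    return inner + 2 * c * c * (1 - PHI.cdf(c))          # + c^2 P(|Z| > c)


def initial_estimates(xs):
    """Median and normalized MAD, used here as T_0 and S_0."""
    t0 = median(xs)
    s0 = median([abs(x - t0) for x in xs]) / Q75
    return t0, s0


def tau_one_step(xs, c):
    """One fixed-point step: S_1^p = S_0 * sqrt(mean g(y) / beta)."""
    t0, s0 = initial_estimates(xs)
    mean_g = sum(huber_g((x - t0) / s0, c) for x in xs) / len(xs)
    return s0 * (mean_g / beta_huber(c)) ** 0.5


def newton_one_step(xs, c):
    """One Newton-Raphson step: S_1 = S_0 + S_0 * mean chi(y) / mean chi'(y) y."""
    t0, s0 = initial_estimates(xs)
    b = beta_huber(c)
    ys = [(x - t0) / s0 for x in xs]
    mean_chi = sum(huber_g(y, c) - b for y in ys) / len(ys)
    # chi'(y) y = 2 y^2 inside (-c, c) and 0 outside for the Huber score
    mean_dchi_y = sum(2 * y * y for y in ys if abs(y) < c) / len(ys)
    return s0 + s0 * mean_chi / mean_dchi_y


# Idealized "normal" sample: standard normal quantiles at (i - 0.5) / n.
grid = [PHI.inv_cdf((i - 0.5) / 2001) for i in range(1, 2002)]
print(round(tau_one_step(grid, 2.376), 3), round(newton_one_step(grid, 2.376), 3))
```

On such an idealized normal sample both estimates should be close to 1, since $\beta(c)$ makes the score Fisher-consistent at $\Phi$.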
However, for certain score functions $\chi$ and certain samples, the denominator $E_F\,\chi'(y)\,y$ may be dangerously close to 0. To avoid this problem, we suggest replacing it by the constant $E_\Phi\,\chi'(Z)\,Z$, where $Z$ has the standard normal distribution $\Phi$, following the idea of Hampel et al. (1986, p. 153) in the context of location estimation with unknown dispersion. This defines the MOSME (Modified One-Step M-Estimator) of dispersion:

$$S^*(F) = S_0(F) + S_0(F)\,\frac{E_F\,\chi(y)}{E_\Phi\,\chi'(Z)\,Z}, \qquad y = \frac{X - T_0(F)}{S_0(F)}.$$

Our main interest is to evaluate the asymptotic behaviour of the MOSME of dispersion. It is therefore necessary to compare it with the estimators it aims to improve on: the standard one-step M-estimator and the τ-estimator of dispersion. We believe it has adaptive properties that the τ-estimator and the standard one-step M-estimator lack. M-estimators of dispersion other than the MLEs are not consistent for distributions other than the standard normal $\Phi$; their asymptotic values therefore differ, and so do their asymptotic properties. For this reason, the usual notion of asymptotic efficiency does not apply directly. Instead, the relative asymptotic efficiency will be used to assess the asymptotic performance of the MOSME of dispersion compared with the standard one-step M-estimator and the τ-estimator. The worst-case asymptotic bias of the three estimators will also be evaluated and discussed in the following sections.

5.2  Relative Asymptotic Efficiency of the MOSME of Dispersion

5.2.1  Our Choice of Underlying Distributions and of Score Functions

To assess the asymptotic efficiency of the one-step estimators of dispersion, the eleven underlying distributions $F$ used in chapter 4 (see section 4.2.1) will again be considered.
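The only change the MOSME makes to the Newton-Raphson step is its denominator, which no longer depends on the data. As a sketch, assuming the centred Huber score $\chi(x) = \min(x^2, c^2) - \beta(c)$ and median / normalized MAD starting values (the function names are hypothetical), the constant $E_\Phi\,\chi'(Z)\,Z$ has the closed form $2E_\Phi[Z^2; |Z| < c]$; for $c = 0.975$ and $c = 2.376$ it should reproduce the values 0.3736064 and 1.7396048 listed in Table 5.6.

```python
from statistics import NormalDist, median

PHI = NormalDist()
Q75 = PHI.inv_cdf(0.75)


def trunc_m2(c):
    """E[Z^2; |Z| < c] for a standard normal Z."""
    return (2 * PHI.cdf(c) - 1) - 2 * c * PHI.pdf(c)


def beta_huber(c):
    """beta(c) = E_Phi min(Z^2, c^2)."""
    return trunc_m2(c) + 2 * c * c * (1 - PHI.cdf(c))


def mosme_constant(c):
    """E_Phi chi'(Z) Z for chi(x) = min(x^2, c^2) - beta(c), i.e. 2 E[Z^2; |Z| < c]."""
    return 2 * trunc_m2(c)


def mosme(xs, c):
    """S* = S_0 + S_0 * mean chi(y) / E_Phi chi'(Z) Z, with median / MAD starts."""
    t0 = median(xs)
    s0 = median([abs(x - t0) for x in xs]) / Q75
    b = beta_huber(c)
    mean_chi = sum(min(((x - t0) / s0) ** 2, c * c) - b for x in xs) / len(xs)
    return s0 + s0 * mean_chi / mosme_constant(c)  # denominator is data-free


print(round(mosme_constant(0.975), 4), round(mosme_constant(2.376), 4))
```

Because the denominator is computed once from $\Phi$, no sample can drive it towards 0, which is the numerical motivation given above.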
To further illustrate the behaviour of the MOSME, the following three score functions $\chi$ will be used:

$$\chi_{H_{0.975}}(x) = \begin{cases} x^2 - 0.5, & |x| \le 0.975 \\ 0.451, & |x| > 0.975 \end{cases} \qquad (5.24)$$

$$\chi_{H_{2.376}}(x) = \begin{cases} x^2 - 0.9686, & |x| \le 2.376 \\ 4.6768, & |x| > 2.376 \end{cases} \qquad (5.25)$$

and

$$\chi_{T_{3.86}}(x) = \begin{cases} 3\left(\frac{x}{3.86}\right)^2 - 3\left(\frac{x}{3.86}\right)^4 + \left(\frac{x}{3.86}\right)^6 - 0.165, & |x| \le 3.86 \\ 0.835, & |x| > 3.86 \end{cases} \qquad (5.26)$$

The first two score functions, (5.24) and (5.25), are centred versions, $\chi_c - \beta(c)$, of the general score function

$$\chi_c(x) = \begin{cases} x^2, & |x| \le c \\ c^2, & |x| > c \end{cases} \qquad (5.27)$$

suggested by Huber (1981, p. 109), which satisfies $E_\Phi\,\chi_c(Z) = \beta(c)$. Notice that (5.27) is the square of Huber's score function $\psi_{H_c}$, which defines a fully iterated M-estimator that is variance minimax in the location setup. For $c$ not too small, the fully iterated M-estimator of dispersion defined through (5.27) is also optimally B-robust (see [15], p. 122). The MOSME and the standard one-step M-estimator derived from (5.24) will hereafter be denoted by $H_{0.975}$. The score function $\chi_{H_{0.975}}$ was chosen, following tradition, so that $\beta(c) = \beta(0.975) = 1/2$. The MOSME and the standard one-step M-estimator derived from (5.25) will hereafter be denoted by $H_{2.376}$. The constant 2.376 was chosen because it makes the MOSME and the standard one-step M-estimator of dispersion 95% efficient under the normal model. Finally, the score function (5.26) is derived from Tukey's redescending score function in the location setup. It defines a MOSME and a standard one-step M-estimator denoted by $T_{3.86}$. Both the MOSME and the standard one-step M-estimator of dispersion $T_{3.86}$ are approximately 95% efficient under the normal model. The τ-estimators derived from the score functions (5.25) and (5.26) are less than 95% efficient at the normal model. To make a fair comparison between the τ-estimator and
the MOSME or the standard one-step M-estimator of dispersion, we must ensure that all estimators are 95% efficient at the normal model. The two following score functions,

$$\chi_{H_{2.516}}(x) = \begin{cases} x^2, & |x| \le 2.516 \\ 6.33, & |x| > 2.516 \end{cases} \qquad (5.28)$$

with $\beta(2.516) = E_\Phi\,\chi_{H_{2.516}}(Z) = 0.9785$, and

$$\chi_{T_{5.3}}(x) = \begin{cases} 3\left(\frac{x}{5.3}\right)^2 - 3\left(\frac{x}{5.3}\right)^4 + \left(\frac{x}{5.3}\right)^6, & |x| \le 5.3 \\ 1, & |x| > 5.3 \end{cases} \qquad (5.29)$$

with $\beta(5.3) = E_\Phi\,\chi_{T_{5.3}}(Z) = 0.096$, define two τ-estimators that are 95% efficient at the normal model. They will be denoted by $H_{2.516}$ and $T_{5.3}$ respectively. Note that the truncation point 5.3 of the Tukey score function, which makes the τ-estimator 95% efficient at the normal model, is about 40% higher than the value 3.86 used for the MOSME. This will inevitably affect the worst-case bias of the τ-estimator of dispersion. The increase in the truncation point $c$ of the Huber score function is not as pronounced. The constant denominator $E_\Phi\,\chi'(Z)\,Z$ in the ratio defining the MOSME can easily be calculated once the score function $\chi$ is determined. Table 5.6 provides these constants. The higher constant for $H_{2.376}$ is due to the fact that none of the three score functions is normalized to have a maximum of 1: the score function $\chi_{H_{2.376}}$ in fact has a much larger maximum than the other two.

5.2.2  Asymptotic Value of the MOSME, the One-Step M-Estimator and the τ-Estimator of Dispersion

To draw a parallel between the estimation of location discussed in the previous chapter and the estimation of dispersion, a first attempt would consist in providing the asymptotic efficiency of the estimators of dispersion. However, it is in a sense meaningless to compare the maximum
likelihood estimator of dispersion to one-step M-estimators, because they do not estimate the same quantity.

Score function χ    E_Φ χ'(Z) Z
H_0.975             0.3736064
H_2.376             1.7396048
T_3.86              0.2677105

Table 5.6: The Value of the Constant Denominator, $E_\Phi\,\chi'(Z)\,Z$, in the Ratio Defining the Three MOSMEs of Dispersion Under Study

The maximum likelihood estimator of dispersion is consistent for $\sigma$. Without loss of generality, we can assume that $\sigma = 1$, so that the MLE takes asymptotically the value 1. Indeed, for a given distribution $F$ with density $f$, the score function defining the MLE is

$$\chi_{MLE}(x) = -x\,\frac{f'(x)}{f(x)} - 1.$$

Assuming $F$ is symmetric, we must solve the equation $E_F\,\chi_{MLE}(X/\sigma) = 0$ when calculating the asymptotic value of the MLE of dispersion, $\sigma$. But $E_F\{-X f'(X)/f(X)\} = 1$, assuming we can interchange differentiation and integration. Therefore $E_F\{-X f'(X)/f(X) - 1\} = 0$, and the MLE takes asymptotically the value 1. On the other hand, one-step M-estimators estimate a function of $\sigma$ that depends on the type of M-estimator (standard one-step M-estimator, MOSME or τ-estimator), on the score function $\chi$ used in its definition, and on the initial estimates of location and dispersion. Table 5.7 provides the asymptotic values of the MLE, the MOSME, the standard one-step M-estimator and the τ-estimator of dispersion, as well as those of the normalized MAD (the median absolute deviation from the median, divided by $\Phi^{-1}(3/4)$) and of the standard deviation, for the different underlying distributions $F$ of interest. The initial estimators of dispersion and location used to calculate the one-step
M-estimators are respectively the normalized MAD and the median.

                    MOSME                   Standard One-Step       τ-Estimator
F           MLE   H0.975 H2.376 T3.86   H0.975 H2.376 T3.86   H2.516 T5.3   MAD   SD
dble exp    1.00  1.00   1.21   1.21    1.01   1.21   1.21    1.19   1.23   1.00  1.38
cont norm   1.00  1.00   1.17   1.21    1.01   1.22   1.27    1.16   1.33   1.00  2.07
t(1)        1.00  1.01   1.39   1.43    1.02   1.52   1.53    1.34   1.50   1.00  ∞
t(2)        1.00  1.01   1.22   1.23    1.01   1.25   1.25    1.20   1.28   1.00  ∞
t(5)        1.00  1.00   1.09   1.09    1.00   1.09   1.09    1.09   1.11   1.00  1.20
t(8)        1.00  1.00   1.06   1.06    1.00   1.06   1.06    1.06   1.06   1.00  1.10
t(10)       1.00  1.00   1.04   1.05    1.00   1.05   1.05    1.04   1.05   1.00  1.08
t(20)       1.00  1.00   1.02   1.02    1.00   1.02   1.02    1.02   1.02   1.00  1.03
normal      1.00  1.00   1.00   1.00    1.00   1.00   1.00    1.00   1.00   1.00  1.00
sym beta    1.00  1.00   0.98   0.98    1.00   0.98   0.98    0.98   0.98   1.00  0.97
exp(−x^4)   1.00  0.99   0.87   0.88    0.99   0.84   0.86    0.87   0.88   1.00  0.86

Table 5.7: Asymptotic Value of the MLE, the MOSME, the Standard One-Step M-Estimator and the τ-Estimator of Dispersion, As Well As the Normalized MAD and the Standard Deviation (SD), for Different Underlying Distributions F. The Initial Estimators of Dispersion and Location Used to Calculate the One-Step M-estimators Are Respectively the Normalized MAD and the Median.

Table 5.7 demonstrates that the term "dispersion of a distribution F" is not, in itself, clearly defined or even meaningful. The standard deviation is commonly used as a measure of dispersion. When the underlying distribution is normal, the standard deviation is easily interpreted. For example, if the standard deviation of a normal sample is 5, then the dispersion of the sampled normal population is estimated to be 5 times that of the standard normal distribution $\Phi$, and approximately 68% of the sampled values fall within ±5 of the sample mean. Notice also that the MLE of dispersion in the normal case is exactly the standard deviation.
On the other hand, what can be said about the dispersion of a Cauchy or a t(2) family for which the sample standard deviation is, for example, 5? No matter how spread out the Cauchy or the t(2) distributions are, their standard deviations are always infinite. The sample standard deviation therefore cannot be a meaningful measure of dispersion for these families. As an alternative, one could use the MLE to estimate the parameter $\sigma$ and refer to it as the dispersion of a distribution. However, calculating the MLE assumes that the distribution $F$ generating the data is known, which is rarely the case in practice. Besides, even though the MLE has the analytical interpretation of $\sigma$, its geometric interpretation remains obscure. Indeed, recall that the distributions presented in Table 5.7 have been normalized so that their interquartile range equals 1.349. The normalization factor $d_0$ is now part of the definition of their density ($f(x; \theta, \sigma) = f_{d_0}(x; \theta, \sigma)$), and dispersion estimation is concerned with identifying the distribution $F$ in the family $\{F : F = F_{d_0}((x - \theta)/\sigma)\}$ from which the sample is drawn. Suppose, for example, that the MLE of dispersion from a sample is found to be 5. Does this mean that $d_0 = 5$ and $\sigma = 1$, or rather that $d_0 = 0.5$ and $\sigma = 10$, or some other combination? The first possibility would imply that the distribution of interest is light-tailed, whereas the second indicates that $F$ is heavy-tailed. It is hard to grasp what kind of geometric dispersion the MLE actually estimates. For a clear geometric interpretation of dispersion, the MAD offers a good deal. It is easy to picture the interquartile range of a sample: it is the range, centred around the middle of the sample, that contains half of the data points.
If, for example, a sample has a normalized MAD of 5, then the dispersion of the sampled population can be estimated as 5 times the dispersion of the same distribution with interquartile range equal to 1.349. However, perhaps because of its very ease of interpretation, the MAD does not give the complete picture. Indeed, all the distributions presented in Table 5.7 have the same MAD, even though some are heavy-tailed, others are light-tailed and one is normal. Clearly, the central half of the mass of these distributions is contained in exactly the same range, but this gives no indication of the shape of their tails: are they elongated and dispersed, or rather compact and short? With these questions in mind, one can conclude from Table 5.7 that the one-step M-estimators of dispersion do indeed estimate dispersion in its most meaningful sense. For every distribution in the table, the estimates of dispersion are finite and roughly close to 1. The closer the estimates are to 1, the closer the distributions are to the normal. All estimates of dispersion above 1 come from heavy-tailed distributions, whereas all estimates below 1 correspond to light-tailed distributions. In fact, by their definitions, the one-step M-estimators of dispersion are a linear function of the MAD; this function differs more or less from the identity according to how much the underlying distribution differs from the standard normal distribution. The one-step M-estimators therefore provide more complete information about dispersion than the MAD, while remaining geometrically interpretable and meaningful. A look at the one-step estimates in Table 5.7 tells us that all distributions have roughly the same interquartile range, but that their natures differ greatly, since the estimates are all close to 1 but still different from each other.
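One row of Table 5.7 can be checked in closed form. The sketch below assumes that the double exponential is the Laplace distribution with scale $b = \Phi^{-1}(3/4)/\log 2$ (so that its interquartile range is 1.349, hence $T_0(F) = 0$ and $S_0(F) = 1$ for the median and normalized MAD), and uses the Huber scores $H_{2.376}$ for the one-step estimators and $H_{2.516}$ for the τ-estimator. The helper names are hypothetical; the Laplace moments use $E[X^2; |X| < c] = b^2\{2 - e^{-a}(a^2 + 2a + 2)\}$ with $a = c/b$.

```python
import math
from statistics import NormalDist

PHI = NormalDist()
Q75 = PHI.inv_cdf(0.75)
B = Q75 / math.log(2)  # Laplace scale with quartiles at +/- Q75 (IQR 1.349)


def normal_m2(c):
    """E[Z^2; |Z| < c] for a standard normal Z."""
    return (2 * PHI.cdf(c) - 1) - 2 * c * PHI.pdf(c)


def beta_huber(c):
    """beta(c) = E_Phi min(Z^2, c^2)."""
    return normal_m2(c) + 2 * c * c * (1 - PHI.cdf(c))


def laplace_m2(c, b=B):
    """E[X^2; |X| < c] for X ~ Laplace(0, b)."""
    a = c / b
    return b * b * (2 - math.exp(-a) * (a * a + 2 * a + 2))


def laplace_eg(c, b=B):
    """E min(X^2, c^2) for X ~ Laplace(0, b)."""
    return laplace_m2(c, b) + c * c * math.exp(-c / b)


def double_exponential_row(c=2.376, c_tau=2.516):
    # Symmetric F with IQR 1.349, so T_0(F) = 0 and S_0(F) = 1.
    e_chi = laplace_eg(c) - beta_huber(c)          # E_F chi(X)
    mosme = 1 + e_chi / (2 * normal_m2(c))         # constant denominator E_Phi chi'(Z)Z
    one_step = 1 + e_chi / (2 * laplace_m2(c))     # E_F chi'(X)X = 2 E[X^2; |X| < c]
    tau = math.sqrt(laplace_eg(c_tau) / beta_huber(c_tau))
    sd = math.sqrt(2) * B
    return mosme, one_step, tau, sd


print([round(v, 2) for v in double_exponential_row()])
```

The four values should agree, to about two decimals, with the MOSME $H_{2.376}$, standard one-step $H_{2.376}$, τ-estimator $H_{2.516}$ and SD entries of the double-exponential row of Table 5.7.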
Finally, note that computing a one-step M-estimator of dispersion does not require knowledge of the distribution generating the data, since the empirical distribution is used; this is a clear advantage over the MLE, which is defined through the joint density of the data points. But which of the MOSME, the standard one-step M-estimator or the τ-estimator should be used to estimate dispersion? It appears that the MOSME or the τ-estimator is the best choice, depending on the situation. Both estimators are relatively close to 1, and always closer to 1 than the standard one-step M-estimator of dispersion, especially for heavy-tailed distributions. When the Huber score function is used, the τ-estimator is slightly closer to 1 than the MOSME for heavy-tailed distributions. When a smoother score function such as the Tukey score function is used, the MOSME is closer to 1 than the τ-estimator for heavy-tailed distributions. Any choice of estimator and score function is good for light-tailed distributions. Finally, the Huber score function $H_{2.376}$, which is 95% efficient at the normal model, seems more appropriate than $H_{0.975}$: $H_{2.376}$ better reflects the nature of the tails of the underlying distribution by diverging more from 1.

5.2.3  Asymptotic Variance of the MOSME of Dispersion

Appendix C provides the complete derivation of the influence function of the MOSME of dispersion. In the general non-symmetric case, it is equal to

$$IF(x; S^*, F) = IF(x; S_0, F)\left\{1 + \frac{E_F\,\chi(y) - E_F\,\chi'(y)\,y}{E_\Phi\,\chi'(Z)\,Z}\right\} - IF(x; T_0, F)\,\frac{E_F\,\chi'(y)}{E_\Phi\,\chi'(Z)\,Z} + S_0\,\frac{\chi\left(\frac{x - T_0}{S_0}\right) - E_F\,\chi(y)}{E_\Phi\,\chi'(Z)\,Z},$$

where $T_0 = T_0(F)$, $S_0 = S_0(F)$, $y = (X - T_0)/S_0$, and $IF(x; S_0, F)$ and $IF(x; T_0, F)$ are the respective influence functions of the initial estimates of dispersion and location. However, it is possible to simplify the above expression under appropriate conditions.
Proposition 3  Assume the score function $\chi$ is even, bounded and twice differentiable everywhere except at most at a finite number of points. If $F$ is symmetric, then

$$IF(x; S^*, F) = IF(x; S_0, F)\left\{1 + \frac{E_F\,\chi(y) - E_F\,\chi'(y)\,y}{E_\Phi\,\chi'(Z)\,Z}\right\} + S_0(F)\,\frac{\chi\left(\frac{x}{S_0(F)}\right) - E_F\,\chi(y)}{E_\Phi\,\chi'(Z)\,Z},$$

where $y = X/S_0(F)$. Note that $T_0(F) = 0$ when $F$ is symmetric. The most important condition in Proposition 3 is the symmetry of $F$, because it makes the MOSME of dispersion independent of the initial estimator of location. The conditions on the score function $\chi$ are minimal regularity conditions that most score functions used in practice satisfy. Under mild regularity assumptions, a Taylor series expansion gives an expression for the asymptotic variance of the MOSME of dispersion:

$$V(S^*, F) = E_F\{IF(X; S^*, F)\}^2.$$

In view of Proposition 3, it is clear that the choice of the initial estimator of dispersion, $S_0(F)$, will greatly affect the efficiency of the MOSME of dispersion. The literature strongly recommends the robust normalized MAD,

$$S_0 = \frac{\mathrm{Med}_i\,|x_i - \mathrm{Med}_j(x_j)|}{\Phi^{-1}(3/4)},$$

as an initial estimate of dispersion. When $F$ is symmetric and properly scaled so that $S_0(F) = 1$, it has the influence function

$$IF(x; S_0, F) = \frac{\mathrm{sign}\left(|x| - \Phi^{-1}(3/4)\right)}{4\,f\left(\Phi^{-1}(3/4)\right)\,\Phi^{-1}(3/4)}.$$

The MAD was first promoted by Hampel (1974). Other possible starting estimators of dispersion, which do not assume a symmetric underlying distribution, are presented by Rousseeuw and Croux (1991). Indeed, the MAD is aimed at symmetric distributions and has low Gaussian efficiency, but its influence function has the sharpest bound among all possible dispersion estimators for symmetric $F$; therefore the MAD has the lowest possible gross-error sensitivity for symmetric $F$ (see [15], p. 142). Huber (1981) concluded that "the MAD has emerged as the single most useful ancillary estimate of scale" (p. 107).
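As a worked special case (our own reading of Proposition 3, assuming $\chi$ is centred so that $E_\Phi\,\chi(Z) = 0$, as (5.24) to (5.26) are, and that $S_0$ is consistent at the normal model, as the normalized MAD is), take $F = \Phi$. Then $S_0(\Phi) = 1$ and $y = Z$, so the coefficient of $IF(x; S_0, \Phi)$ vanishes:

$$
\begin{aligned}
IF(x; S^*, \Phi) &= IF(x; S_0, \Phi)\left\{1 + \frac{E_\Phi\,\chi(Z) - E_\Phi\,\chi'(Z)\,Z}{E_\Phi\,\chi'(Z)\,Z}\right\} + \frac{\chi(x) - E_\Phi\,\chi(Z)}{E_\Phi\,\chi'(Z)\,Z} = \frac{\chi(x)}{E_\Phi\,\chi'(Z)\,Z}, \\
V(S^*, \Phi) &= \frac{E_\Phi\,\chi^2(Z)}{\{E_\Phi\,\chi'(Z)\,Z\}^2}.
\end{aligned}
$$

At the normal model the efficiency of the MOSME therefore does not depend on the initial estimator of dispersion; that dependence appears only away from $\Phi$, where the coefficient of $IF(x; S_0, F)$ is generally non-zero.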
The SHORTH (the shortest half of the data), equivalent to the MAD for symmetric distributions, is also becoming more and more popular because it was found to be approximately minimax bias robust within the class of M-estimators of dispersion with general location (see Martin and Zamar (1993)).

On the other hand, because only symmetric distributions are studied, the initial estimator of location asymptotically takes the value 0 ($T_0(F) = 0$) and therefore does not directly affect the asymptotic variance of the MOSME of dispersion. In practice, however, because any sample size is finite, the initial estimate of location should be chosen carefully. The median is recommended in the literature because of its robustness properties.

5.2.4  Asymptotic Variance of the Standard One-Step M-Estimator of Dispersion

Appendix D provides the complete derivation of the influence function of the standard one-step M-estimator of dispersion. In the general non-symmetric case, it is equal to

$$
\begin{aligned}
IF(x; S_1, F) = {} & IF(x; S_0, F)\left\{\frac{2E_F\,\chi(y)}{E_F\,\chi'(y)\,y} + \frac{E_F\,\chi(y)\;E_F\,\chi''(y)\,y^2}{[E_F\,\chi'(y)\,y]^2}\right\} \\
& + IF(x; T_0, F)\left\{\frac{E_F\,\chi(y)\;E_F[\chi''(y)\,y + \chi'(y)]}{[E_F\,\chi'(y)\,y]^2} - \frac{E_F\,\chi'(y)}{E_F\,\chi'(y)\,y}\right\} \\
& + S_0\left\{\frac{\chi(u)}{E_F\,\chi'(y)\,y} - \frac{E_F\,\chi(y)\;\chi'(u)\,u}{[E_F\,\chi'(y)\,y]^2}\right\},
\end{aligned}
$$

where $T_0 = T_0(F)$, $S_0 = S_0(F)$, $y = (X - T_0)/S_0$, $u = (x - T_0)/S_0$, and $IF(x; S_0, F)$ and $IF(x; T_0, F)$ are the respective influence functions of the initial estimators of dispersion and location. However, it is possible to simplify the above expression under appropriate conditions.

Proposition 4  Assume the score function $\chi$ is even, bounded and twice differentiable everywhere except at most at a finite number of points. If $F$ is symmetric, then

$$IF(x; S_1, F) = IF(x; S_0, F)\left\{\frac{2E_F\,\chi(y)}{E_F\,\chi'(y)\,y} + \frac{E_F\,\chi(y)\;E_F\,\chi''(y)\,y^2}{[E_F\,\chi'(y)\,y]^2}\right\} + S_0(F)\left\{\frac{\chi(u)}{E_F\,\chi'(y)\,y} - \frac{E_F\,\chi(y)\;\chi'(u)\,u}{[E_F\,\chi'(y)\,y]^2}\right\},$$

where $y = X/S_0(F)$ and $u = x/S_0(F)$. Notice that $T_0(F) = 0$ when $F$ is symmetric.
The influence function of the standard one-step M-estimator is, like that of the MOSME, the sum of two terms: one is a multiple of $IF(x; S_0, F)$ and the other is a multiple of $S_0(F)$. Clearly, the initial estimator of dispersion plays an important role in the definitions of both the MOSME and the standard one-step M-estimator. The influence function of the standard one-step M-estimator contains a $\chi''(y)$ term and several ratios with denominator $E_F\,\chi'(y)\,y$, which may cause computational problems. On the other hand, the influence function of the MOSME contains no $\chi''(y)$ and only includes ratios with denominator $E_\Phi\,\chi'(Z)\,Z$, which are known to be more stable.

Under mild regularity assumptions, the asymptotic variance of the standard one-step M-estimator can be expressed as

$$V(S_1, F) = E_F\{IF(X; S_1, F)\}^2.$$

5.2.5  Asymptotic Variance of the τ-Estimator of Dispersion

Appendix E provides the complete derivation of the influence function of the τ-estimator of dispersion, $S_1^p(F)$. In the general non-symmetric case, it is equal to

$$IF(x; S_1^p, F) = \frac{1}{2S_1^p}\left\{IF(x; S_0, F)\left[\frac{2}{S_0}(S_1^p)^2 - \frac{S_0}{\beta}\,E_F\,\chi'(y)\,y\right] - IF(x; T_0, F)\,\frac{S_0}{\beta}\,E_F\,\chi'(y) - (S_1^p)^2 + S_0^2 + \frac{S_0^2}{\beta}\,\chi\left(\frac{x - T_0}{S_0}\right)\right\},$$

where $T_0 = T_0(F)$, $S_0 = S_0(F)$, $S_1^p = S_1^p(F)$, $y = (X - T_0)/S_0$, and $IF(x; T_0, F)$ and $IF(x; S_0, F)$ are the influence functions of the initial estimators of location and dispersion. However, it is possible to simplify the above expression under appropriate conditions.

Proposition 5  Assume the score function $\chi$ is even, bounded and twice differentiable everywhere except at most at a finite number of points. If $F$ is symmetric, then

$$IF(x; S_1^p, F) = IF(x; S_0, F)\left\{\frac{S_1^p}{S_0} - \frac{S_0}{2\beta S_1^p}\,E_F\,\chi'(y)\,y\right\} + \frac{S_0^2 - (S_1^p)^2 + \frac{S_0^2}{\beta}\,\chi\left(\frac{x}{S_0}\right)}{2S_1^p},$$

where $y = X/S_0$. Notice that $T_0(F) = 0$ when $F$ is symmetric. The value of the τ-estimator appears in its own influence function, contrary to the MOSME and the standard one-step M-estimator of dispersion.
The assumption of symmetry for $F$ makes the influence function of the τ-estimator of dispersion with unknown location equal to the influence function of the τ-estimator of dispersion with no location parameter (see [30]). The problem of calculating the influence functions for symmetric $F$ is therefore greatly simplified. Under mild regularity assumptions, the asymptotic variance of the τ-estimator of dispersion can thus be expressed as

$$V(S_1^p, F) = E_F\{IF(X; S_1^p, F)\}^2.$$

5.2.6  Relative Asymptotic Efficiency of the MOSME Compared to That of the One-Step M-Estimator and the τ-Estimator of Dispersion

The asymptotic efficiency of an M-estimator of dispersion is not a good measure of its asymptotic variability because, as shown in section 5.2.2, the asymptotic value of an M-estimator depends strongly on its score function as well as on its type. Huber (1981, p. 108) suggests instead the use of the relative asymptotic variance, $RV(S, F)$, defined as the asymptotic variance of $\sqrt{n}\,\log\{S(F_n)/S(F)\}$:

$$RV(S, F) = V(\log S, F) = \frac{V(S, F)}{S^2}, \quad \text{that is,} \quad RV(S, F) = \frac{E_F\{IF(X; S, F)\}^2}{S^2},$$

where $S = S(F)$. This suggestion agrees with Bickel and Lehmann (1976), who observed that the relative variance, rather than the variance, of dispersion estimators is needed to obtain a natural measure of accuracy. Let the relative asymptotic efficiency of an M-estimator of dispersion be defined as the ratio of the relative asymptotic variance of the MLE to its own relative asymptotic variance. The relative asymptotic efficiency of the MOSME will be compared to that of the one-step M-estimator and the τ-estimator of dispersion to assess the asymptotic performance of the MOSME relative to the other two estimators.
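At the normal model the relative asymptotic efficiencies can be evaluated in closed form, which shows why the τ-estimator needs a larger Huber constant than the MOSME. The sketch below is an illustration under stated assumptions, not the thesis's computation: it assumes the centred Huber score $\chi(x) = \min(x^2, c^2) - \beta(c)$, the normalized MAD as $S_0$, and $RV(MLE, \Phi) = 0.5$. At $\Phi$ the MAD term drops out of the MOSME's influence function (our reading of Proposition 3), so $V = E_\Phi\chi^2 / (E_\Phi\chi'(Z)Z)^2$, while Proposition 5 keeps a multiple $a = 1 - E_\Phi\chi'(Z)Z/(2\beta)$ of the MAD influence function. Function names are hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist()
Q = PHI.inv_cdf(0.75)   # Phi^{-1}(3/4)
RV_MLE_NORMAL = 0.5     # RV(MLE, Phi) = 1 / I(1) at the normal


def trunc_moments(c):
    """E[Z^2; |Z| < c] and E[Z^4; |Z| < c] for a standard normal Z."""
    p, d = 2 * PHI.cdf(c) - 1, PHI.pdf(c)
    return p - 2 * c * d, 3 * p - 2 * d * (c ** 3 + 3 * c)


def huber_pieces(c):
    """beta, E chi'(Z)Z and E chi(Z)^2 for chi(x) = min(x^2, c^2) - beta at the normal."""
    m2, m4 = trunc_moments(c)
    tail = 2 * (1 - PHI.cdf(c))
    beta = m2 + c ** 2 * tail
    return beta, 2 * m2, m4 + c ** 4 * tail - beta ** 2


def rae_mosme_normal(c):
    """At Phi, IF(x; S*, Phi) = chi(x) / E chi'(Z)Z, so V = E chi^2 / (E chi'(Z)Z)^2."""
    _, d, e_chi2 = huber_pieces(c)
    return RV_MLE_NORMAL / (e_chi2 / d ** 2)


def rae_tau_normal(c):
    """Proposition 5 at Phi (S_0 = S_1^p = 1): IF = a * IF_MAD(x) + chi(x) / (2 beta)."""
    beta, d, e_chi2 = huber_pieces(c)
    a = 1 - d / (2 * beta)
    k = 4 * PHI.pdf(Q) * Q              # IF_MAD(x) = sign(|x| - Q) / k
    m2_q, _ = trunc_moments(Q)
    e_mad_chi = (beta - 2 * m2_q) / k   # E[IF_MAD(Z) chi(Z)], valid when c > Q
    v = a * a / k ** 2 + a * e_mad_chi / beta + e_chi2 / (4 * beta ** 2)
    return RV_MLE_NORMAL / v


for c in (0.975, 2.376, 2.516):
    print(c, round(rae_mosme_normal(c), 3), round(rae_tau_normal(c), 3))
```

With $c = 2.376$ the MOSME should come out at about 95% while the τ-estimator falls short of it, and the τ-estimator reaches about 95% only at $c = 2.516$, in line with the choices of (5.25) and (5.28).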
The Fisher information, $I(\sigma)$, for the location-dispersion family of distributions $\{F_{\theta,\sigma} : F_{\theta,\sigma}(x) = F((x - \theta)/\sigma)\}$ is equal to

$$I(\sigma) = \frac{1}{\sigma^2}\,E_F\left\{Z\,\frac{f'(Z)}{f(Z)} + 1\right\}^2,$$

where $Z = (X - \theta)/\sigma$ has distribution $F$ with density $f$. Because the MLE of dispersion is consistent for $\sigma$, its asymptotic variance is $V(MLE, F) = 1/I(\sigma)$. Since its asymptotic value is 1 for any distribution $F$, as shown in section 5.2.2, the relative asymptotic variance of the MLE of dispersion is $RV(MLE, F) = 1/I(1)$. Table 5.8 gives the relative asymptotic variance of the MLE, $RV(MLE, F)$, for the different distributions under study. Finally, Table 5.9 shows the relative asymptotic efficiency of the MOSME, the one-step M-estimator and the τ-estimator of dispersion, for the score functions $H_{0.975}$, $H_{2.376}$, $T_{3.86}$, $H_{2.516}$ and $T_{5.3}$, and different underlying distributions $F$. The relative asymptotic efficiency of the normalized MAD is also provided as a means of comparison, since it is used as the initial dispersion estimator. The asymptotic variance of the normalized MAD is equal to

$$V(MAD, F) = \frac{1}{16\,f^2\left(\Phi^{-1}(3/4)\right)\left\{\Phi^{-1}(3/4)\right\}^2}.$$
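The MAD formula above only needs the density at the common quartile $q = \Phi^{-1}(3/4)$, so the MAD column of Table 5.9 can be checked directly for three of the distributions. This sketch assumes the double exponential is a Laplace distribution and $t(1)$ is a Cauchy distribution, each scaled so that $F(q) = 3/4$, and takes $RV(MLE, F)$ from Table 5.8 (0.5, 1 and 2); the variable names are hypothetical.

```python
import math
from statistics import NormalDist

Q = NormalDist().inv_cdf(0.75)  # common quartile: every F is scaled so F(Q) = 3/4


def v_mad(f_q):
    """V(MAD, F) = 1 / (16 f(Q)^2 Q^2) for a symmetric F with S_0(F) = 1."""
    return 1 / (16 * f_q ** 2 * Q ** 2)


b = Q / math.log(2)  # Laplace scale giving quartile Q
densities_at_q = {
    "normal": NormalDist().pdf(Q),
    "double exponential": math.exp(-Q / b) / (2 * b),
    "t(1)": 1 / (2 * math.pi * Q),  # Cauchy with scale Q, evaluated at x = Q
}
rv_mle = {"normal": 0.5, "double exponential": 1.0, "t(1)": 2.0}  # Table 5.8

rae_mad = {name: rv_mle[name] / v_mad(f) for name, f in densities_at_q.items()}
print({name: round(r, 3) for name, r in rae_mad.items()})
```

For the Laplace and Cauchy cases the ratio simplifies to $(\log 2)^2$ and $8/\pi^2$ respectively, which is a convenient cross-check on the numerical output.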
Distribution F          RV(MLE, F)
double exponential      1
contaminated normal     0.5
t(1)                    2
t(2)                    1.25
t(5)                    0.8
t(8)                    0.6875
t(10)                   0.65
t(20)                   0.5750084
normal                  0.5
symmetrized beta        0.4210503
0.55 exp(−x^4)          0.25

Table 5.8: Relative Asymptotic Variance RV(MLE, F) of the MLE for Different Underlying Distributions F

                         MOSME                  Standard One-Step      τ-Estimator
Distribution F   MAD     H0.975 H2.376 T3.86    H0.975 H2.376 T3.86    H2.516 T5.3
dble exp         0.481   0.575  0.873  0.910    0.573  0.534  0.918    0.844  0.935
cont normal      0.336   0.409  0.352  0.311    0.405  0.219  0.211    0.346  0.246
t(1)             0.811   0.947  0.916  0.913    0.884  0.371  0.788    0.902  0.880
t(2)             0.703   0.837  0.959  0.963    0.812  0.585  0.922    0.955  0.929
t(5)             0.534   0.660  0.977  0.993    0.653  0.813  0.987    0.970  0.974
t(8)             0.476   0.597  0.976  0.992    0.593  0.879  0.989    0.970  0.985
t(10)            0.456   0.574  0.977  0.989    0.571  0.899  0.986    0.970  0.987
t(20)            0.413   0.524  0.966  0.975    0.524  0.932  0.973    0.964  0.980
normal           0.368   0.470  0.950  0.947    0.470  0.950  0.946    0.950  0.953
sym beta         0.317   0.410  0.919  0.891    0.410  0.942  0.898    0.920  0.892
exp(−x^4)        0.233   0.328  0.825  0.780    0.335  0.735  0.833    0.841  0.769

Table 5.9: Relative Asymptotic Efficiency of the MOSME, the One-Step M-Estimator and the τ-Estimator of Dispersion Derived from Different Score Functions χ, for Different Underlying Distributions F. The Relative Asymptotic Efficiency of the Initial Estimator of Dispersion, the Normalized MAD, Is Provided for Comparison Purposes. The Median Is Used as the Initial Estimator of Location

Many observations can be made from Table 5.9. The relative asymptotic efficiency of the MOSME, the one-step M-estimator and the τ-estimator of dispersion improves on that of the MAD, their initial estimator of dispersion, for almost all distributions.
Very heavy-tailed distributions such as t(1) and t(2) seem to cause some problems for the one-step M-estimator with the $H_{2.376}$ score function, and the contaminated normal distribution is in general not handled well by estimators derived from the $H_{2.376}$, $H_{2.516}$, $T_{3.86}$ and $T_{5.3}$ score functions. Unlike the MOSME derived from $H_{2.376}$, the MOSMEs derived from $H_{0.975}$ do not in general show a significant improvement in relative efficiency over the MAD. The $H_{0.975}$ MOSMEs are probably not worth considering, at least in a relative asymptotic efficiency context. However, for very heavy-tailed distributions (t(1), t(2), contaminated normal and double exponential), the one-step M-estimator derived from the $H_{2.376}$ score function actually performs worse than the one derived from $H_{0.975}$: a sign that sacrificing robustness for efficiency with the standard one-step M-estimator can sometimes be a bad strategy. The $T_{3.86}$ score function, used with either the MOSME or the one-step M-estimator, appears to be a better or comparable choice for estimating dispersion with heavy-tailed distributions. The conclusion is, however, reversed for the τ-estimator: $H_{2.516}$ generally performs better than $T_{5.3}$ for heavy-tailed distributions (with the exception of the double exponential distribution). Nevertheless, the MLE beats every estimator presented in Table 5.9, for all distributions $F$, since all relative efficiencies presented are below 1.000. But using the MLE for estimating dispersion requires that the entire sample be drawn from $F$, with no outliers allowed, which is too restrictive in many cases. Besides, the distribution $F$ must be specified for the MLE to be defined. The relative asymptotic efficiency of the MOSME is higher than that of the one-step M-estimator or the τ-estimator of dispersion for heavy-tailed distributions (t, double
Dispersion Estimation with Unknown Location  78  exponential and contaminated normal), and that, for all score functions considered, except for the Tukey score function used with the double exponential distribution. The improvement appears more significant in very heavy-tailed distributions (Cauchy, t(2), double exponential and contaminated normal) used with efficient score functions and  #2.516, 23.86  T5.3).  (#2.376,  The improvement by the M O S M E over the standard one-step  M-estimator is also generally greater than over the T-estimator. For the light-tailed distributions symmetrized beta and exp(—x ), however, the one4  step M-estimator of dispersion performs generally better than the M O S M E or the T estimator, but not significantly better.  5.3  M a x i m u m Bias of the M O S M E of Dispersion  Let X\,...,  X  be a sample from a population with distribution F in the contamination  n  neighboorhood V (F '°) 6  e  0  = {F : F = (1 - e)F '" + e#, # arbitrary distribution}, 6  0  where 0 < e < 1/2. Refer back to section 4.3 for a more detailed description of the nature of this contamination neighboorhood. The worst-case bias of an estimator gives a measure of its asymptotic robustness as a function of the fraction e of the contamination. Since outliers, as well as inliers, can possibly affect the estimator, we need to consider both cases separately. The explosion bias curve of the M O S M E of dispersion is defined by  ^1  #+ (e) = sup FeV  e  cr  and describes the behaviour of the estimator in presence of outliers. On the other hand, the implosion bias curve of the M O S M E of dispersion is  79  Chapter 5. Dispersion Estimation with Unknown Location  S*(F)  a  and describes the M O S M E in presence of inliers. Let the distribution i ^ be obtained when H = 6^, the distribution which puts its 7  total mass at infinity. Let the distribution F be obtained when H = 6 , the point mass 0  0  contamination at 0. 
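The MOSME of dispersion and the standard one-step M-estimator share the update $S_0 + S_0\,E\chi(\cdot)/D$ and differ only in the denominator D: the MOSME fixes it at its normal-model value $E_\Phi\,\chi'(Z)Z$ (see the expression for $S^*(F)$ in the proof of Proposition 6 below, and Appendix C), while the standard estimator estimates it from the data. The Python sketch below illustrates this on an idealized normal sample and on the same sample with gross outliers. It assumes the Huber scale score $\chi_c(u) = \min(u^2, c^2) - \beta_c$ with $\beta_c = E_\Phi\min(Z^2, c^2)$, a form consistent with the 0.9686/5.645 ratio quoted later in this section; `dispersion_one_steps` is a hypothetical helper, not code from the thesis.

```python
from statistics import NormalDist, fmean, median

PHI = NormalDist()
Q3 = PHI.inv_cdf(0.75)  # Phi^{-1}(3/4), used to normalize the MAD


def huber_constants(c):
    """beta_c = E_Phi min(Z^2, c^2) and E_Phi chi'(Z) Z for chi_c(u) = min(u^2, c^2) - beta_c."""
    trunc2 = 2.0 * PHI.cdf(c) - 1.0 - 2.0 * c * PHI.pdf(c)  # E[Z^2; |Z| <= c]
    beta = trunc2 + 2.0 * c * c * (1.0 - PHI.cdf(c))
    return beta, 2.0 * trunc2


def dispersion_one_steps(x, c=2.376):
    """Return (MOSME, standard one-step M-estimate) of dispersion, starting from
    the median and the normalized MAD (hypothetical helper, Huber chi assumed)."""
    beta, denom_phi = huber_constants(c)
    t0 = median(x)
    s0 = median(abs(xi - t0) for xi in x) / Q3
    u = [(xi - t0) / s0 for xi in x]
    num = fmean(min(ui * ui, c * c) - beta for ui in u)                 # mean of chi(u)
    denom_n = fmean(2.0 * ui * ui if abs(ui) < c else 0.0 for ui in u)  # mean of chi'(u) u
    mosme = s0 + s0 * num / denom_phi   # denominator fixed at the normal model
    standard = s0 + s0 * num / denom_n  # denominator estimated from the sample
    return mosme, standard


n = 2001
clean = [PHI.inv_cdf((i + 0.5) / n) for i in range(n)]  # idealized N(0, 1) sample
contaminated = clean + [1e6] * 200                       # about 9% gross outliers
print(dispersion_one_steps(clean))
print(dispersion_one_steps(contaminated))
```

On the contaminated sample the outliers contribute nothing to the sample denominator, so the standard update is pushed further up than the MOSME update; this is one way to see why the standard estimator has the larger explosion bias.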
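We shall hereafter concentrate on the normal central distribution $F_0^{\theta,\sigma} = \Phi$ in the neighbourhood $\mathcal{V}_\epsilon$, as in Section 4.3.

Proposition 6. Assume that the score function χ is even and bounded. Let $S^-(\epsilon) = \inf_{F\in\mathcal{V}_\epsilon} S_0(F)$, $S^+(\epsilon) = \sup_{F\in\mathcal{V}_\epsilon} S_0(F)$ and $B(\epsilon) = \sup_{F\in\mathcal{V}_\epsilon} |T_0(F)|$. Assume that $E_\Phi\,\chi'(Z)Z > 0$, and assume that we can interchange derivation and integration, that is,
$$\frac{\partial}{\partial t}E_\Phi\,\chi\Big(\frac{X-t}{s}\Big) = -\frac{1}{s}\,E_\Phi\,\chi'\Big(\frac{X-t}{s}\Big) \quad \forall t \in [0, B(\epsilon)], \text{ for fixed } s \in [S^-(\epsilon), S^+(\epsilon)], \qquad (5.30)$$
and
$$\frac{\partial}{\partial s}E_\Phi\,\chi\Big(\frac{X-t}{s}\Big) = -\frac{1}{s}\,E_\Phi\Big[\chi'\Big(\frac{X-t}{s}\Big)\frac{X-t}{s}\Big] \quad \forall s \in [S^-(\epsilon), S^+(\epsilon)], \text{ for fixed } t \in [0, B(\epsilon)]. \qquad (5.31)$$
Then
$$\sup_{F\in\mathcal{V}_\epsilon} S^*(F) = S^*(F_\infty), \quad \text{where } F_\infty = (1-\epsilon)\Phi + \epsilon\delta_\infty,$$
and
$$\inf_{F\in\mathcal{V}_\epsilon} S^*(F) = S^*(F_0), \quad \text{where } F_0 = (1-\epsilon)\Phi + \epsilon\delta_0.$$

Proof: We prove the explosion bias result only; the proof for the implosion bias of the MOSME follows exactly the same idea. Clearly, $\sup_{F\in\mathcal{V}_\epsilon} S^*(F) \ge S^*(F_\infty)$. Moreover, for all ε < 1/2,
$$
\begin{aligned}
\sup_{F\in\mathcal{V}_\epsilon} S^*(F)
&= \sup_{F\in\mathcal{V}_\epsilon}\left\{S_0(F) + S_0(F)\,\frac{E_F\,\chi\big(\frac{X-T_0(F)}{S_0(F)}\big)}{E_\Phi\,\chi'(Z)Z}\right\}\\
&= \sup_{F\in\{(1-\epsilon)\Phi+\epsilon H\}}\left\{S_0(F) + S_0(F)\,\frac{(1-\epsilon)E_\Phi\,\chi\big(\frac{X-T_0(F)}{S_0(F)}\big) + \epsilon E_H\,\chi\big(\frac{X-T_0(F)}{S_0(F)}\big)}{E_\Phi\,\chi'(Z)Z}\right\}\\
&\le \sup_{F\in\{(1-\epsilon)\Phi+\epsilon H\}}\left\{S_0(F) + S_0(F)\,\frac{(1-\epsilon)E_\Phi\,\chi\big(\frac{X-T_0(F)}{S_0(F)}\big) + \epsilon}{E_\Phi\,\chi'(Z)Z}\right\}\\
&\le \sup_{\substack{-B(\epsilon)\le t\le B(\epsilon)\\ S^-(\epsilon)\le s\le S^+(\epsilon)}}\left\{s + s\,\frac{(1-\epsilon)E_\Phi\,\chi\big(\frac{X-t}{s}\big) + \epsilon}{E_\Phi\,\chi'(Z)Z}\right\}\\
&= S^+(\epsilon) + S^+(\epsilon)\,\frac{(1-\epsilon)E_\Phi\,\chi\big(\frac{X-B(\epsilon)}{S^+(\epsilon)}\big) + \epsilon}{E_\Phi\,\chi'(Z)Z} = S^*(F_\infty).
\end{aligned}
$$
The first equality is simply the definition of $S^*(F)$. By definition of $\mathcal{V}_\epsilon$, $E_F\,\chi\big(\frac{X-T_0(F)}{S_0(F)}\big)$ can be written as the sum of two terms, as the second equality states. Since χ is bounded, we can assume without loss of generality that $\sup_x \chi(x) = 1$ (rescaling χ by a positive constant leaves $S^*(F)$ unchanged). Thus $E_H\,\chi(\cdot)$ is always less than or equal to 1, which gives the third line.

The function to be maximized on the third line no longer depends on the distribution H. In fact, it can be regarded as a function of two arguments, $T_0(F)$ (or t) and $S_0(F)$ (or s). Following the work of Martin and Zamar [26], if ε < 1/2, then $T_0(F)$ and $S_0(F)$ are bounded as in the fourth line.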
That is, $S^-(\epsilon) \le S_0(F) = s \le S^+(\epsilon)$ and $-B(\epsilon) \le T_0(F) = t \le B(\epsilon)$. Since χ is even and Φ is symmetric, $E_\Phi\,\chi\big(\frac{X-t}{s}\big) = E_\Phi\,\chi\big(\frac{X+t}{s}\big)$, so we can assume without loss of generality that $t \ge 0$. Assuming that conditions (5.30) and (5.31) hold, the function to be maximized in the fourth line is increasing in $T_0(F) = t$ for every fixed $S^-(\epsilon) \le S_0(F) = s \le S^+(\epsilon)$ when $0 \le t \le B(\epsilon)$, and increasing in $S_0(F) = s$ for every fixed $0 \le T_0(F) = t \le B(\epsilon)$ when $S^-(\epsilon) \le s \le S^+(\epsilon)$. The supremum is therefore attained at $t = B(\epsilon)$ and $s = S^+(\epsilon)$, which directly gives the fifth line, equal by definition to $S^*(F_\infty)$. Hence we have shown that $\sup_{F\in\mathcal{V}_\epsilon} S^*(F) = S^*(F_\infty)$ when the central distribution $F_0^{\theta,\sigma}$ is normal. ■

Analytical derivations, combined with numerical calculations, have shown that the MOSMEs with the $H_{0.975}$, $H_{2.376}$ and $T_{3.86}$ score functions satisfy conditions (5.30) and (5.31) when the median and the normalized MAD are used as preliminary estimates of location and dispersion. The conditions were rewritten for each of these three score functions and evaluated over a finite, equally spaced 21 × 21 grid covering the range of possible t and s values. For a fixed ε, the maximum value $S^+(\epsilon)$ of the normalized MAD is produced by a point-mass contamination at infinity, and such contamination also produces the maximum value $B(\epsilon)$ of the location estimator. The minimum value $S^-(\epsilon)$ of the normalized MAD is produced by a point-mass contamination at 0, and such contamination also produces the minimum absolute value of the location estimator, 0 (see Martin and Zamar (1993)). More specifically, $B(\epsilon)$ can be written explicitly as
$$B(\epsilon) = \Phi^{-1}\left(\frac{1}{2(1-\epsilon)}\right),$$
which is the value T(F) satisfying (4.16) for the median score function $\psi_{Med}(x) = \mathrm{sgn}(x)$ and $F = F_\infty$.
The bounds $S^+(\epsilon)$ and $S^-(\epsilon)$ are the implicit solutions of
$$(1-\epsilon)\left[\Phi\{B(\epsilon) - S^+(\epsilon)\Phi^{-1}(3/4)\} + 1 - \Phi\{B(\epsilon) + S^+(\epsilon)\Phi^{-1}(3/4)\}\right] + \epsilon = 1/2$$
and
$$(1-\epsilon)\left[\Phi\{-S^-(\epsilon)\Phi^{-1}(3/4)\} + 1 - \Phi\{S^-(\epsilon)\Phi^{-1}(3/4)\}\right] = 1/2.$$
That is, $S^+(\epsilon)$ satisfies $E_{F_\infty}\chi_{MAD}\big(\frac{X-B(\epsilon)}{S^+(\epsilon)}\big) = 1/2$, where the MAD score function is $\chi_{MAD}(x) = \frac{1}{2}\{\mathrm{sgn}(|x| - \Phi^{-1}(3/4)) + 1\}$, and $S^-(\epsilon)$ satisfies $E_{F_0}\chi_{MAD}\big(\frac{X}{S^-(\epsilon)}\big) = 1/2$, with $F_0 = (1-\epsilon)\Phi + \epsilon\delta_0$.

For all values ε < 1/2 used, conditions (5.30) and (5.31) were always met. So, even though these conditions seem somewhat restrictive, it is believed that many commonly used bounded and even score functions χ satisfy them.

Assuming that the score function χ is bounded, a lower bound for the explosion bias curve of the standard one-step M-estimator of dispersion is given by $S_1(F_\infty)$, where
$$S_1(F_\infty) = S^+(\epsilon) + S^+(\epsilon)\,\frac{(1-\epsilon)E_\Phi\,\chi\big(\frac{X-B(\epsilon)}{S^+(\epsilon)}\big) + \epsilon\,\chi(\infty)}{(1-\epsilon)E_\Phi\big[\chi'\big(\frac{X-B(\epsilon)}{S^+(\epsilon)}\big)\frac{X-B(\epsilon)}{S^+(\epsilon)}\big]}.$$
Similarly, an upper bound for the implosion curve of the standard one-step M-estimator of dispersion is given by
$$S_1(F_0) = S^-(\epsilon) + S^-(\epsilon)\,\frac{(1-\epsilon)E_\Phi\,\chi\big(\frac{X}{S^-(\epsilon)}\big) + \epsilon\,\chi(0)}{(1-\epsilon)E_\Phi\big[\chi'\big(\frac{X}{S^-(\epsilon)}\big)\frac{X}{S^-(\epsilon)}\big]}.$$

It is also possible to obtain exact expressions for the explosion and implosion bias curves of the τ-estimator. Rousseeuw and Croux (1993a) give them in the context of estimation of dispersion with known location. The following proposition extends their results.

Proposition 7. If $s \mapsto s^2 E_\Phi\,\chi\big(\frac{X-t}{s}\big)$ is increasing on $[S^-(\epsilon), S^+(\epsilon)]$ for fixed t in $[-B(\epsilon), B(\epsilon)]$, and if $t \mapsto E_\Phi\,\chi\big(\frac{X-t}{s}\big)$ is increasing on $[0, B(\epsilon)]$ for fixed s in $[S^-(\epsilon), S^+(\epsilon)]$, then
$$\sup_{F\in\mathcal{V}_\epsilon} S_\tau(F) = S_\tau(F_\infty) \quad\text{and}\quad \inf_{F\in\mathcal{V}_\epsilon} S_\tau(F) = S_\tau(F_0).$$

Proof: The proof follows exactly the same steps as that of Proposition 6.
■

Analytical derivations, combined with numerical calculations similar to those performed for the MOSME of dispersion, have shown that the τ-estimators with the $H_{2.516}$ and $T_{5.3}$ score functions satisfy the conditions of Proposition 7 when the median and the normalized MAD are used as preliminary estimates of location and dispersion. Note that the condition requiring $s^2 E_\Phi\,\chi\big(\frac{X-t}{s}\big)$ to be increasing in s can be written explicitly as $\Phi(t - cs) > \Phi(t + cs) - 1$ for the $H_c$ score function, which is clearly always true for any s, t and c.

It is therefore possible to compare, in terms of maximum asymptotic explosion and implosion bias, the MOSME with the standard one-step M-estimator and the τ-estimator of dispersion. Figures 5.6, 5.7 and 5.8 show the explosion bias curve of the MOSME and a lower bound for the explosion bias curve of the one-step M-estimator of dispersion, derived from the $H_{0.975}$, $H_{2.376}$ and $T_{3.86}$ score functions. In all three cases, the MOSME has a smaller maximum asymptotic bias than the one-step M-estimator for all ε < 1/2. The improvement is, however, larger for the more efficient score functions $H_{2.376}$ and $T_{3.86}$.

Figure 5.9 shows the explosion curves of the MOSME and the τ-estimator of dispersion for the Huber and Tukey score functions, for small but realistic contamination by outliers (ε < 0.30). For small values of ε, the MOSME has a lower maximum bias than the τ-estimator; however, the situation is reversed when ε gets larger. As the figure shows, this happens with the Huber score function when ε > 0.075. (It also happens for the Tukey score function when ε > 0.47.) Nevertheless, situations with a small fraction of contamination are representative of many real data sets with outliers.
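The two curves compared in Figures 5.6 to 5.8 follow directly from the formula for $S^*(F_\infty)$ in Proposition 6 and from the lower bound $S_1(F_\infty)$ given above. The Python sketch below evaluates both for the $H_{2.376}$ score function. It assumes the Huber form $\chi_c(u) = \min(u^2, c^2) - \beta_c$ with $\beta_c = E_\Phi\min(Z^2, c^2)$, so that $\chi(\infty) = c^2 - \beta_c$ takes the place of the normalization $\sup\chi = 1$ (rescaling χ by a positive constant changes neither estimator). Expectations under Φ use Simpson's rule on [−10, 10] and $S^+(\epsilon)$ is found by bisection; this is an illustration, not the code behind the figures.

```python
from statistics import NormalDist

PHI = NormalDist()
Q3 = PHI.inv_cdf(0.75)
C = 2.376                                         # Huber constant of H_2.376 (assumed form)
TRUNC2 = 2 * PHI.cdf(C) - 1 - 2 * C * PHI.pdf(C)  # E[Z^2; |Z| <= C]
BETA = TRUNC2 + 2 * C * C * (1 - PHI.cdf(C))      # E_Phi min(Z^2, C^2)
D_PHI = 2 * TRUNC2                                # E_Phi chi'(Z) Z
CHI_INF = C * C - BETA                            # chi(infinity)


def chi(u):
    return min(u * u, C * C) - BETA


def chi_prime_u(u):
    return 2 * u * u if abs(u) < C else 0.0


def expect_normal(g, lo=-10.0, hi=10.0, n=4000):
    """Composite Simpson approximation of E[g(Z)], Z ~ N(0, 1); n must be even."""
    h = (hi - lo) / n
    total = g(lo) * PHI.pdf(lo) + g(hi) * PHI.pdf(hi)
    for i in range(1, n):
        z = lo + i * h
        total += (4 if i % 2 else 2) * g(z) * PHI.pdf(z)
    return total * h / 3


def b_max(eps):
    """B(eps): maximum bias of the median under F_inf."""
    return PHI.inv_cdf(1 / (2 * (1 - eps)))


def s_plus(eps, iters=100):
    """S+(eps): bisection on the implicit equation for the normalized MAD under F_inf."""
    b = b_max(eps)

    def excess(s):  # P(|X - B| > s * Q3) - 1/2 under F_inf; decreasing in s
        return (1 - eps) * (PHI.cdf(b - s * Q3) + 1 - PHI.cdf(b + s * Q3)) + eps - 0.5

    lo, hi = 0.0, 1.0
    while excess(hi) > 0:
        hi *= 2
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if excess(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)


def explosion_curves(eps):
    """Return (S*(F_inf), S_1(F_inf)) for the H_2.376 score function."""
    b, s = b_max(eps), s_plus(eps)
    num = (1 - eps) * expect_normal(lambda z: chi((z - b) / s)) + eps * CHI_INF
    den = (1 - eps) * expect_normal(lambda z: chi_prime_u((z - b) / s))
    return s + s * num / D_PHI, s + s * num / den


for eps in (0.0, 0.1, 0.2, 0.3, 0.4):
    mosme, one_step = explosion_curves(eps)
    print(f"eps={eps:.1f}  MOSME={mosme:.3f}  one-step lower bound={one_step:.3f}")
```

Replacing `chi` and `chi_prime_u` by another bounded even score would give the analogous curves for that score function.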
When defining the MOSME or the τ-estimator of dispersion, one needs to choose among different score functions χ. As Figure 5.9 shows, the Huber score function is preferable to the Tukey score function with the MOSME as well as with the τ-estimator.

Finally, Figure 5.10 shows the implosion curves of the MOSME, the standard one-step M-estimator and the τ-estimator derived from the Huber score function, and Figure 5.11 shows the same curves for the Tukey score function. In both figures, the implosion curve of the normalized MAD is also given for comparison purposes. For smaller ε (ε < 0.27 with the Huber score function and ε < 0.33 with the Tukey score function), the MOSME outperforms the other estimators. For larger ε, the MOSME uniformly beats the τ-estimator but is outperformed by the standard one-step M-estimator, which does not implode at ε = 0.5. Depending on the situation, this may or may not be desirable: when half the data coincide with a single point, the standard one-step M-estimator does not become 0 (whereas the MOSME, the τ-estimator and the MAD do).

Rousseeuw and Croux (1993a) mention that the fully iterated M-estimator derived from the $H_{2.376}$ score function explodes at ε = 0.9686/5.645 = 0.17 and implodes at ε = 1 − 0.17 = 0.83. Similarly, the fully iterated M-estimator derived from the $T_{3.86}$ score function explodes at ε = 0.165 and implodes at ε = 1 − 0.165 = 0.835. At the normal model, the standard one-step M-estimator is consistent for σ and so has the same asymptotic properties as the fully iterated M-estimator of dispersion. When using a one-step M-estimator of dispersion in the presence of inliers, one needs to decide which score function to use in its definition so as to minimize its implosion bias.
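Under assumed forms of the score functions, the breakdown values quoted from Rousseeuw and Croux (1993a) can be reproduced. Reading $H_c$ as $\rho(u) = \min(u^2, c^2)$ and $T_c$ as Tukey's biweight $\rho(u) = 1 - (1 - (u/c)^2)^3$ for |u| < c (1 otherwise), a fully iterated M-estimator solving $E_F\,\rho(X/S) = \beta$ with $\beta = E_\Phi\,\rho(Z)$ explodes at $\beta/\sup\rho$ and implodes at $1 - \beta/\sup\rho$. The Python sketch below computes β for $H_{2.376}$ and $T_{3.86}$, and also traces the implosion curve $S^*(F_0)$ of the $H_{2.376}$ MOSME from Proposition 6, using the closed form of $S^-(\epsilon)$ implied by its defining equation. The score forms are inferred from the quoted numbers rather than stated in this section.

```python
from statistics import NormalDist

PHI = NormalDist()
Q3 = PHI.inv_cdf(0.75)


def expect_normal(g, lo=-10.0, hi=10.0, n=4000):
    """Composite Simpson approximation of E[g(Z)], Z ~ N(0, 1); n must be even."""
    h = (hi - lo) / n
    total = g(lo) * PHI.pdf(lo) + g(hi) * PHI.pdf(hi)
    for i in range(1, n):
        z = lo + i * h
        total += (4 if i % 2 else 2) * g(z) * PHI.pdf(z)
    return total * h / 3


def huber_rho(u, c):
    return min(u * u, c * c)                          # sup = c^2


def tukey_rho(u, c):
    v = (u / c) ** 2
    return 1.0 - (1.0 - v) ** 3 if v < 1.0 else 1.0   # sup = 1


# Fully iterated M-estimators: explosion at beta / sup(rho), implosion at 1 - beta / sup(rho)
beta_h = expect_normal(lambda z: huber_rho(z, 2.376))
beta_t = expect_normal(lambda z: tukey_rho(z, 3.86))
print(f"H_2.376: beta={beta_h:.4f} explodes at {beta_h / 2.376 ** 2:.3f}")
print(f"T_3.86:  beta={beta_t:.4f} explodes at {beta_t:.3f}")


def s_minus(eps):
    """S-(eps): normalized MAD under F_0 = (1 - eps) Phi + eps delta_0 (closed form)."""
    return PHI.inv_cdf(1 - 1 / (4 * (1 - eps))) / Q3


def mosme_implosion(eps, c=2.376):
    """S*(F_0) for the Huber MOSME, with chi(u) = min(u^2, c^2) - beta and chi(0) = -beta."""
    beta = expect_normal(lambda z: huber_rho(z, c))
    d_phi = 2 * (2 * PHI.cdf(c) - 1 - 2 * c * PHI.pdf(c))   # E_Phi chi'(Z) Z
    s = s_minus(eps)
    num = (1 - eps) * expect_normal(lambda z: huber_rho(z / s, c)) - beta  # E_{F_0} chi(X / s)
    return s + s * num / d_phi


for eps in (0.0, 0.1, 0.2, 0.3, 0.4):
    print(f"eps={eps:.1f}  S*(F_0)={mosme_implosion(eps):.3f}")
```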
Figure 5.12 shows that the MOSMEs, the standard one-step M-estimators and the τ-estimators derived from the Huber and Tukey score functions are roughly equivalent in terms of implosion bias for small but realistic contamination (ε < 0.3). The $H_{2.376}$ MOSME appears, however, to slightly beat the other estimators. If one expects a large fraction of contamination by inliers (ε > 0.3), then the standard one-step M-estimator of dispersion with the $H_{2.376}$ score function would probably be the best choice. As discussed, though, this choice may lead to catastrophic results in the presence of special samples. In that case, the MOSME with the $T_{3.86}$ score function may be a wiser choice.

5.3.1 Further Work to Be Done

The work on asymptotic robustness was developed for one specific central distribution $F_0^{\theta,\sigma}$ in the neighbourhood $\mathcal{V}_\epsilon$: the normal distribution. More calculations need to be done for other central distributions $F_0^{\theta,\sigma}$, for example those used in the asymptotic efficiency section (5.2) of this text.

One could also define the MOSME, the standard one-step M-estimator and the τ-estimator of dispersion using other types of score functions, and compare the resulting estimators in the hope that one proves better than all the others in most situations.

Finally, it remains to be seen whether the asymptotic properties of the three one-step M-estimators studied in this section carry over to their finite-sample performance for small and medium sample sizes. This could be assessed with Monte Carlo simulations, for example.

5.4 Continuation of the Hummingbird Example

To illustrate the use of the MOSME as an alternative to the standard one-step M-estimator or the τ-estimator of dispersion, we shall estimate the dispersion of the distributions of the flying times of the sixteen hummingbirds presented in the previous chapter, Section 4.4.
Remember that there are four types of hummingbirds: adult females (AF), adult males (AM), junior females (JF) and junior males (JM). Refer to Figure 4.5 for the bar plots of the flying times of all 16 birds. The researcher who provided the data was interested not only in finding a measure of location within each type of bird, but also in describing more generally the features of the distribution of their flying times, such as their dispersion. One-step M-estimators of dispersion are good alternatives to the most popular measure of dispersion, the standard deviation.

Table 5.10 presents the values of the MOSME, the standard one-step M-estimator and the τ-estimator of dispersion, as well as the standard deviation and the normalized MAD, for the sixteen birds, assuming an underlying normal distribution. Contrary to the location setup, the Tukey score function is less conservative in terms of robustness than the Huber score function, since the Tukey estimates are further from the MAD than the Huber estimates. As expected, the $H_{0.975}$ estimates stay very close to the MAD and shed little more light than the MAD itself: this score function is almost as robust as the MAD. On the other hand, the MOSME and the τ-estimates of dispersion are roughly comparable when using a Huber score function that is 95% efficient at the normal model. When using a Tukey score function with the same 95% Gaussian efficiency, the τ-estimator compares favourably with the standard one-step M-estimator of dispersion.

                    MOSME of Dispersion       Standard One-Step         τ-Estimator
Bird  SD     MAD    H_0.975 H_2.376 T_3.86    H_0.975 H_2.376 T_3.86    H_2.516 T_5.3
AF 1  0.151  0.075  0.079   0.091   0.093     0.079   0.095   0.097     0.090   0.099
AF 2  2.430  0.097  0.096   0.131   0.133     0.095   0.147   0.140     0.127   0.140
AF 3  0.143  0.073  0.073   0.080   0.082     0.073   0.082   0.085     0.080   0.089
AF 4  0.275  0.074  0.076   0.111   0.117     0.077   0.151   0.145     0.106   0.124
AF 5  0.176  0.120  0.124   0.139   0.143     0.125   0.141   0.144     0.138   0.149
AM 1  0.132  0.076  0.083   0.100   0.100     0.083   0.097   0.099     0.097   0.102
AM 2  0.160  0.074  0.078   0.093   0.096     0.078   0.096   0.098     0.092   0.100
AM 3  0.103  0.044  0.048   0.052   0.054     0.047   0.054   0.054     0.052   0.056
AM 4  0.167  0.073  0.072   0.078   0.080     0.072   0.080   0.082     0.079   0.086
AM 5  0.105  0.064  0.066   0.075   0.077     0.066   0.078   0.078     0.075   0.080
JF 1  0.314  0.135  0.144   0.186   0.191     0.143   0.200   0.201     0.180   0.199
JF 2  0.387  0.090  0.094   0.123   0.128     0.095   0.133   0.137     0.119   0.135
JF 3  0.179  0.042  0.042   0.049   0.050     0.042   0.050   0.052     0.049   0.054
JM 1  0.237  0.044  0.048   0.063   0.067     0.048   0.077   0.083     0.061   0.073
JM 2  0.449  0.044  0.043   0.053   0.055     0.043   0.058   0.059     0.053   0.061
JM 3  0.182  0.104  0.101   1.124   0.129     0.100   0.138   0.137     0.124   0.138

Table 5.10: Measures of Dispersion of Flying Times of Four Types of Hummingbirds: Adult Females (AF), Adult Males (AM), Junior Females (JF) and Junior Males (JM). The MAD Is the Median Absolute Deviation Divided by $\Phi^{-1}(3/4)$.

Interpreting these measures of dispersion is an important part of dispersion estimation. For example, the flying times of junior female bird 1 have a high standard deviation compared with their other measures of dispersion, which is due to a very heavy tail (see Figure 4.5 for a visual assessment of the outlier). The normalized MAD of these data is equal to 0.135; in other words, for a roughly symmetric sample, the interquartile range of this sample is about 13.5% of that of the standard normal distribution (whose normalized MAD equals 1).
Because the values of the different one-step M-estimators for this bird are noticeably higher than the MAD, we can infer that the sample is very heavy-tailed without even looking at Figure 4.5.

The researcher was interested in determining a general measure of dispersion of flying times within each of the four types of hummingbirds. A formal robust analysis could be used, but it is beyond the scope of this thesis. However, a simple look at the estimates of dispersion suggests that no type of hummingbird has a dispersion of flying times clearly different from that of the other types.

5.5 Conclusions

The MOSME of dispersion presented in this chapter is in general better than the standard one-step M-estimator of dispersion: it is easier to compute, its asymptotic value is closer to that of the MLE, it has a higher relative asymptotic efficiency for many heavy-tailed distributions, and, with a central normal distribution, it has a lower asymptotic explosion bias and a lower implosion bias for small ε-contaminations by inliers. However, if one is mainly concerned about bias, and the data possibly contain a large contamination by inliers but not many repetitions, then the standard one-step M-estimator may be a better estimator of dispersion. In that situation, the Huber score function would be advisable.

For light-tailed underlying distributions, the one-step M-estimator of dispersion performs slightly better in terms of relative asymptotic efficiency than the MOSME (and the τ-estimator). However, heavy-tailed distributions, which serve as models for samples with outliers, are of greater interest and importance in robustness theory.
On the other hand, the τ-estimator offers some competition to the MOSME, in the sense that it is also easy to calculate, it estimates the dispersion of heavy-tailed distributions more accurately when it is derived from the Huber score function, and it has a lower explosion bias for large contamination with a central normal distribution. The MOSME derived from the Tukey score function nevertheless has a higher relative efficiency than the τ-estimator for heavy-tailed distributions. Moreover, the MOSME outperforms the τ-estimator in terms of explosion bias for small (and more realistic) fractions of contamination, and beats the τ-estimator in the presence of any fraction of inliers.

As the hummingbird example studied in this chapter indicates, there is similarly no obvious winner among estimators of dispersion for finite sample sizes: the MOSME, the standard one-step M-estimator and the τ-estimator of dispersion behave comparably, and which one comes out ahead depends on the score function used. The estimates, however, provide information about the tails of the distribution generating the data, which the MAD or the standard deviation cannot do.

If one is mainly interested in the accuracy (as in "small bias") of the estimate of dispersion of normal data with outliers present, and does not expect a large ε-contamination, then we recommend the MOSME of dispersion derived from the Huber score function. A large fraction of outliers would, however, call for the τ-estimator with the Huber score function. If, on the other hand, one is mainly concerned with the precision (as in "high relative efficiency") of the estimates of dispersion, then the MOSME defined by the Tukey score function provides the best estimator of dispersion.
[Figure: Maximum Bias Function of M-Estimators Using H_0.975 Score Function; horizontal axis: Epsilon (0.0 to 0.5)]
Figure 5.6: Explosion Bias Curve of the MOSME, and a Lower Bound for the Explosion Bias Curve of the One-Step M-Estimator of Dispersion, Derived from the $H_{0.975}$ Score Function

[Figure: Maximum Bias Function of M-Estimators Using Huber_2.376 Score Function; horizontal axis: Epsilon]
Figure 5.7: Explosion Bias Curve of the MOSME, and a Lower Bound for the Explosion Bias Curve of the One-Step M-Estimator of Dispersion, Derived from the $H_{2.376}$ Score Function

[Figure: Maximum Bias Function of M-Estimators Using T_3.86 Score Function; horizontal axis: Epsilon]
Figure 5.8: Explosion Bias Curve of the MOSME, and a Lower Bound for the Explosion Bias Curve of the One-Step M-Estimator of Dispersion, Derived from the $T_{3.86}$ Score Function

[Figure: Maximum Bias Function of M-Estimators of Dispersion That Are 95% Efficient at the Normal Model; curves: Tau M-Estimator With T_5.3 Score Function, MOSME With T_3.86 Score Function, MOSME With H_2.376 Score Function, Tau M-Estimator With H_2.516 Score Function; horizontal axis: Epsilon (0.0 to 0.30)]
Figure 5.9: Explosion Bias Curves of M-Estimators of Dispersion That Are 95% Efficient at the Normal Model, for Small Contamination by Outliers

[Figure: Implosion Curve of M-Estimators of Dispersion That Are 95% Efficient at the Normal Model (Huber Score Function); horizontal axis: Epsilon (0.0 to 0.5)]
Figure 5.10: Implosion Curve of M-Estimators of Dispersion. The MOSME and the One-Step M-Estimator Are Defined Through the $H_{2.376}$ Score Function, and the Tau-Estimator Through the $H_{2.516}$ Score Function. Those M-Estimators Are 95% Efficient at the Normal Model. The Normalized MAD Is Provided for Comparison Purposes.

[Figure: Implosion Curve for M-Estimators of Dispersion That Are 95% Efficient at the Normal Model (Tukey Score Function); horizontal axis: Epsilon (0.0 to 0.5)]
Figure 5.11: Implosion Curve of M-Estimators of Dispersion. The MOSME and the One-Step M-Estimator Are Defined Through the $T_{3.86}$ Score Function, and the Tau-Estimator Through the $T_{5.3}$ Score Function. Those M-Estimators Are 95% Efficient at the Normal Model. The Normalized MAD Is Provided for Comparison Purposes.

[Figure: Implosion Curve for M-Estimators of Dispersion That Are 95% Efficient at the Normal Model; horizontal axis: Epsilon (0.0 to 0.5)]
Figure 5.12: Implosion Curve of MOSMEs and τ-Estimators That Are 95% Efficient at the Normal Model

Bibliography

[1] Andrews, D.F., P.J. Bickel, F.R. Hampel, P.J. Huber, J.W. Tukey and W.H. Rogers (1972). Robust Estimates of Location: Survey and Advances, Princeton, NJ: Princeton University Press.
[2] Bernoulli, D. (1777). Dijudicatio maxime probabilis plurium observationum discrepantium atque verisimillima inde formanda, Acta Acad. Sci. Petropolit. 1, 3-33. (English translation by C.G. Allen (1961), Biometrika 48, 3-13.)
[3] Berrendero, J.R. (1995). A Note on One-Step M-Estimates of Location, Spain: Universidad Carlos III de Madrid (Working Paper).
[4] Bessel, F.W. and J.J. Baeyer (1838). Gradmessung in Ostpreussen und ihre Verbindung mit Preussischen und Russischen Dreiecksketten, Druckerei der Königlichen Akademie der Wissenschaften, Berlin. (Reprinted in part in Abhandlungen von F.W. Bessel, R. Engelmann (ed.), W. Engelmann, Leipzig, 1876, Vol. 3, pp. 62-138.)
[5] Bickel, P.J. (1975). One-Step Huber Estimates in the Linear Model, J. Amer. Statist. Assoc. 70, 428-434.
[6] Bickel, P.J. and E.L. Lehmann (1976).
Descriptive Statistics for Non-Parametric Models III: Dispersion, Ann. Statist. 4, 1139-1158.
[7] Buchanan, J.L. and P.R. Turner (1992). Numerical Methods and Analysis, New York: McGraw-Hill, 751 p.
[8] Casella, G. and R.L. Berger (1990). Statistical Inference, Belmont, CA: Wadsworth & Brooks/Cole Advanced Books and Software, 650 p.
[9] Donoho, D.L. (1982). Breakdown Properties of Multivariate Location Estimators, unpublished manuscript, Harvard University, Dept. of Statistics.
[10] Donoho, D.L. and P.J. Huber (1983). The Notion of Breakdown Point, in Festschrift fur Erich L. Lehmann, eds. P.J. Bickel, K. Doksum and J.L. Hodges, Jr., Belmont, CA: Wadsworth, 157-184.
[11] Fisher, R.A. (1922). On the Mathematical Foundations of Theoretical Statistics, reprinted in Contributions to Mathematical Statistics (1950), New York: Wiley.
[12] Hampel, F.R. (1968). Contributions to the Theory of Robust Estimation, Ph.D. Thesis, University of California, Berkeley.
[13] Hampel, F.R. (1971). A General Definition of Robustness, Ann. Math. Statist. 42, 1887-1896.
[14] Hampel, F.R. (1974). The Influence Curve and Its Role in Robust Estimation, J. Amer. Statist. Assoc. 62, 1179-1186.
[15] Hampel, F.R., E.M. Ronchetti, P.J. Rousseeuw and W.A. Stahel (1986). Robust Statistics: The Approach Based on Influence Functions, New York: John Wiley & Sons.
[16] He, X. and D.G. Simpson (1993). Lower Bounds for Contamination Bias: Globally Minimax Versus Locally Linear Estimation, Ann. Statist. 21, 314-327.
[17] Hodges, J.L. Jr. (1967). Efficiency in Normal Samples and Tolerance to Extreme Values for Some Estimates of Location, Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability (Vol. 1), 163-168.
[18] Huber, P.J. (1964). Robust Estimation of a Location Parameter, Ann. Math. Statist. 35, 73-101.
[19] Huber, P.J. (1981). Robust Statistics, New York: John Wiley & Sons.
[20] Huber, P.J. (1984).
Finite Sample Breakdown of M- and P-Estimators, Ann. Statist. 12, 119-126.
[21] Jureckova, J. and S. Portnoy (1987). Asymptotics for One-Step M-Estimators in Regression with Application to Combining Efficiency and High Breakdown Point, Comm. Statist. Theory Methods 16, 2187-2200.
[22] Kiefer, J.C. (1987). Introduction to Statistical Inference, New York: Springer-Verlag.
[23] Le Cam, L. (1956). On the Asymptotic Theory of Estimation and Testing Hypotheses, Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, Berkeley: University of California Press, 129-156.
[24] Lehmann, E.L. (1983). The Theory of Point Estimation, New York: John Wiley.
[25] Martin, R.D. and R.H. Zamar (1989). Asymptotically Min-Max Bias Robust M-Estimates of Scale for Positive Random Variables, Ann. Statist. 84, 494-501.
[26] Martin, R.D. and R.H. Zamar (1993). Bias Robust Estimation of Scale, Ann. Math. Statist. 21, 991-1017.
[27] Martin, R.D., V.J. Yohai and R.H. Zamar (1989). Min-Max Bias Robust Regression, Ann. Statist. 17, 1608-1630.
[28] Press, W.H., S.A. Teukolsky, W.T. Vetterling and B.P. Flannery (1992). Numerical Recipes in C, 2nd edition, New York: Cambridge University Press, 994 p.
[29] Rousseeuw, P.J. (1984). Least Median of Squares Regression, J. Amer. Statist. Assoc. 79, 871-880.
[30] Rousseeuw, P.J. and C. Croux (1991). Alternatives to the Median Absolute Deviation, J. Amer. Statist. Assoc. 88, 1273-1283.
[31] Rousseeuw, P.J. and C. Croux (1993a). The Bias of k-Step M-Estimators, Belgium: University of Antwerp, UIA, Department of Mathematics & Computer Science (Report no. 93-06).
[32] Rousseeuw, P.J. and C. Croux (1993b). Alternatives to the Median Absolute Deviation, J. Amer. Statist. Assoc. 88, 1273-1283.
[33] Tukey, J.W. (1960). A Survey of Sampling from Contaminated Distributions, in Contributions to Probability and Statistics, I. Olkin (ed.),
Stanford, CA: Stanford University Press, 448-485.
[34] Yohai, V.J. and R.H. Zamar (1988). High Breakdown-Point Estimates of Regression by Means of the Minimization of an Efficient Scale, J. Amer. Statist. Assoc. 83, 406-413.

Appendix A
Normalization of Densities for Calculations of Asymptotic Efficiencies

Many distributions F are used as examples in the study of the asymptotic efficiency of estimators of location and of the relative asymptotic efficiency of estimators of dispersion. They are defined as the central distribution in their respective distribution family $\{F(x) : F(x) = F(\frac{x-\theta}{\sigma}),\ -\infty < x < \infty,\ -\infty < \theta < \infty,\ \sigma > 0\}$. The following lists the density associated with any member of each distribution family:

double exponential:
$$f_{d_0}(x;\theta,\sigma) = \frac{1}{2d_0\sigma}\exp\left(-\frac{|x-\theta|}{d_0\sigma}\right)$$
contaminated normal:
$$f_{d_0}(x;\theta,\sigma) = 0.9\,\frac{1}{\sqrt{2\pi}\,d_0\sigma}\exp\left(-\frac{(x-\theta)^2}{2d_0^2\sigma^2}\right) + 0.05\,\frac{1}{\sqrt{2(0.01)\pi}\,d_0\sigma}\exp\left(-\frac{(x-\ldots)^2}{2(0.01)d_0^2\sigma^2}\right) + 0.05\,\frac{1}{\sqrt{2(0.01)\pi}\,d_0\sigma}\exp\left(-\frac{(x-\ldots)^2}{2(0.01)d_0^2\sigma^2}\right)$$
t(ν), for ν = 1, 2, 5, 8, 10 and 20:
$$f_{d_0}(x;\theta,\sigma) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)d_0\sigma\sqrt{\nu\pi}}\left(1 + \frac{1}{\nu}\left(\frac{x-\theta}{d_0\sigma}\right)^2\right)^{-(\nu+1)/2}$$
normal:
$$f_{d_0}(x;\theta,\sigma) = \frac{1}{\sqrt{2\pi}\,d_0\sigma}\exp\left(-\frac{1}{2}\left(\frac{x-\theta}{d_0\sigma}\right)^2\right)$$
symmetrized beta:
$$f_{d_0}(x;\theta,\sigma) = \ldots\left(-\left(\ldots\right)^2 + \frac{1}{4}\right), \qquad \sigma(\theta - 1/2) < x < \sigma(\theta + 1/2)$$

Notice that the densities are functions of a factor $d_0$. Indeed, in order to make the different central distributions comparable in terms of spread, it was decided to normalize the associated densities in their families by a factor $d_0$.
The specific choice of $d_0$ makes the interquartile range of each central distribution equal to the standard normal interquartile range, 1.349. Because these central distributions are symmetric around 0, their interquartile range is equal to 2r, where r satisfies
$$\int_{-\infty}^{r} f_{d_0}(x;\theta=0,\sigma=1)\,dx = \frac{3}{4}. \qquad (A.32)$$
By fixing the value of r at 0.6744897502 (which can be obtained by solving equation (A.32) with $F_{d_0}(x;\theta,d_0) = \Phi(x)$, where Φ is the standard normal distribution function), it is therefore possible to find the normalizing factor $d_0$ that satisfies (A.32). Table A.11 shows the factor $d_0$ needed for each distribution of interest to have an interquartile range equal to 1.349.

In summary, the distributions F used in the calculations of asymptotic efficiency of estimators of location, and of relative asymptotic efficiency of estimators of dispersion, are the central distributions of their families, and so have densities $f_{d_0}(x;\theta=0,\sigma=1)$.

Distribution F          Normalization Factor d0
double exponential      0.9723764576
contaminated normal     0.8820206848
t(1)                    0.6744897502
t(2)                    0.8254780431
t(5)                    0.9274971822
t(8)                    0.9541517180
t(10)                   0.9631157239
t(20)                   0.9811421330
normal                  1.0000000000
symmetrized beta        8.8849776417
0.55exp(-x^4)           1.475435073

Table A.11: Normalizing Factor $d_0$ Needed to Standardize the Interquartile Range of Each Distribution F to That of the Standard Normal Distribution

Appendix B
Derivation of the Influence Function of the MOSME of Location

The influence function of the functional T(F), as defined by Hampel et al. in [15] (first introduced by Hampel in 1968), is
$$IF(x;T,F) = \lim_{t\to 0}\frac{T(F_{t,x}) - T(F)}{t},$$
where $F_{t,x} = (1-t)F + t\delta_x$. In other words, it is the derivative with respect to t of the functional $T(F_{t,x})$, evaluated at t = 0.
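As a concrete check of this definition, one can take T to be the median, the initial location estimate $T_0$ used throughout, and F = Φ. The Python sketch below (hypothetical helper names) writes the median of $F_{t,x} = (1-t)\Phi + t\delta_x$ in closed form and forms the difference quotient for a small t; it should approach the familiar value $\mathrm{sgn}(x)/(2\phi(0)) \approx \pm 1.2533$.

```python
import math
from statistics import NormalDist

PHI = NormalDist()


def median_of_mixture(t, x):
    """Median of F_{t,x} = (1 - t) Phi + t delta_x, for 0 <= t < 1/2."""
    m0 = PHI.inv_cdf(0.5 / (1.0 - t))   # solves (1 - t) Phi(m) = 1/2
    if x > 0:
        return min(m0, x)
    if x < 0:
        return max(-m0, x)
    return 0.0


def if_median(x, t=1e-6):
    """Difference quotient (T(F_{t,x}) - T(Phi)) / t approximating IF(x; median, Phi)."""
    return (median_of_mixture(t, x) - median_of_mixture(0.0, x)) / t


for x in (-3.0, -0.5, 0.5, 3.0):
    exact = math.copysign(1.0, x) / (2.0 * PHI.pdf(0.0))   # sgn(x) / (2 phi(0))
    print(x, round(if_median(x), 4), round(exact, 4))
```

The influence function of the median is bounded but jumps at 0, which is why the initial location estimate contributes a step-shaped term to the influence function of the MOSME.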
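Therefore, to derive the influence function of the MOSME of location $T^*(F)$, we first need to write the expression for $T^*(F_{t,x})$ and then evaluate its derivative with respect to $t$ at $t = 0$. The expression for $T^*(F_{t,x})$ is

$$T^*(F_{t,x}) = T_0(F_{t,x}) + S_0(F_{t,x})\,\frac{(1-t)\,E_F\,\psi\left(\frac{X - T_0(F_{t,x})}{S_0(F_{t,x})}\right) + t\,\psi\left(\frac{x - T_0(F_{t,x})}{S_0(F_{t,x})}\right)}{E_\Phi\,\psi'(z)}.$$

Differentiating this expression with respect to $t$, and evaluating it at $t = 0$, gives

$$\begin{aligned} IF(x;T^*,F) &= IF(x;T_0,F) + \frac{IF(x;S_0,F)\,E_F\psi(y)}{E_\Phi\psi'(z)} + \frac{S_0}{E_\Phi\psi'(z)}\left\{-E_F\psi(y) - \frac{IF(x;T_0,F)\,E_F\psi'(y) + IF(x;S_0,F)\,E_F\psi'(y)y}{S_0} + \psi\left(\frac{x-T_0}{S_0}\right)\right\} \\ &= IF(x;T_0,F)\left\{1 - \frac{E_F\psi'(y)}{E_\Phi\psi'(z)}\right\} + IF(x;S_0,F)\,\frac{E_F\psi(y) - E_F\psi'(y)y}{E_\Phi\psi'(z)} + \frac{S_0}{E_\Phi\psi'(z)}\left\{\psi\left(\frac{x-T_0}{S_0}\right) - E_F\psi(y)\right\} \end{aligned} \tag{B.33}$$

The notation in (B.33) is as follows:

- $T_0 = T_0(F)$ and $S_0 = S_0(F)$.
- $y = (X - T_0(F))/S_0(F)$ with $X \sim F$.
- $E_\Phi$ denotes expectation with respect to a standard normal variable $z$.
- $IF(x;S_0,F)$ and $IF(x;T_0,F)$ are the influence functions of the initial estimates of dispersion and of location, respectively.

If $T_0$ is consistent, then $T_0(F) = \theta = T(F)$, where $T(F)$ is the fully-iterated M-estimator of location $\theta$. Moreover, in that case $E_F\psi(y) = 0$, since by definition $E_F\psi\left((X - T(F))/S_0(F)\right) = 0$ and $T_0(F) = T(F)$. If we also assume that $\psi$ is odd and $F$ is symmetric, then $E_F\psi'(y)y = 0$. This holds because $T_0(F) = T(F) = 0$ makes $y$ symmetric about 0, while $\psi'(y)y$ is odd. Therefore, provided that $T_0$ is consistent, $\psi$ is odd and $F$ is symmetric, equation (B.33) reduces to

$$IF(x;T^*,F) = IF(x;T_0,F)\left\{1 - \frac{E_F\psi'(y)}{E_\Phi\psi'(z)}\right\} + \frac{S_0(F)\,\psi\left(\frac{x - T_0(F)}{S_0(F)}\right)}{E_\Phi\psi'(z)}.$$

Notice that the second term on the right-hand side of the above expression can be rewritten as

$$\frac{S_0(F)\,\psi\left(\frac{x - T_0(F)}{S_0(F)}\right)}{E_F\psi'(y)}\cdot\frac{E_F\psi'(y)}{E_\Phi\psi'(z)} = IF(x;T,F)\,\frac{E_F\psi'(y)}{E_\Phi\psi'(z)}.$$

Here $IF(x;T,F)$ is the influence function of the one-step location M-estimator. It is equivalent to the influence function of the fully-iterated location M-estimator, provided that $T_0$ is consistent, odd and translation invariant (see [19] for more details).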
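The mixing weight $a = E_F\psi'(y)/E_\Phi\psi'(z)$ can be evaluated in closed form for particular choices of ingredients. This sketch assumes, purely for illustration, that $T_0$ is the median, with influence function $\operatorname{sign}(x)/(2f(0))$ at a symmetric $F$. It also takes $S_0$ to be the MAD divided by $\Phi^{-1}(3/4)$ and $\psi$ to be Huber's with $k = 1.345$. These are not necessarily the estimators studied in the thesis, and the function names are hypothetical. At $F = \Phi$ the weight is exactly 1. At the standard Cauchy ($t(1)$) it falls below 1, so part of the median's influence function is retained.

```python
import math
from statistics import NormalDist

PHI = NormalDist()
K = 1.345               # Huber tuning constant (illustrative)
Q = PHI.inv_cdf(0.75)   # 0.6744897502..., used to normalize the MAD


def psi(u, k=K):
    """Huber's psi: the identity clipped to [-k, k]."""
    return max(-k, min(k, u))


def cauchy_cdf(x):
    return 0.5 + math.atan(x) / math.pi


def mosme_location_if(x, cdf, pdf0, s0, k=K):
    """Return (IF(x; T*, F), a) for a distribution F symmetric about 0.

    Assumptions of this sketch: T0 = median, so IF(x; T0, F) = sign(x) / (2 f(0));
    S0(F) = s0 is given; psi is Huber's, so E_F psi'(X / s0) = P_F(|X| <= k s0).
    """
    e_f = 2.0 * cdf(k * s0) - 1.0      # E_F psi'(y), y = X / s0
    e_phi = 2.0 * PHI.cdf(k) - 1.0     # E_Phi psi'(z), z ~ N(0, 1)
    a = e_f / e_phi
    if_t0 = 0.0 if x == 0 else math.copysign(1.0, x) / (2.0 * pdf0)
    if_t = s0 * psi(x / s0, k) / e_f   # IF of the one-step (= fully iterated) M-estimator
    return (1.0 - a) * if_t0 + a * if_t, a


# Cauchy: median 0 and MAD 1, so the normalized MAD is S0 = 1 / Q.
for x in (0.5, 2.0, 10.0):
    value, a = mosme_location_if(x, cauchy_cdf, 1.0 / math.pi, 1.0 / Q)
    print(f"x = {x:4.1f}: a = {a:.4f}, IF(x; T*, Cauchy) = {value:.4f}")
```

Because the median's influence function jumps at 0, the MOSME influence function inherits a jump of relative size $1 - a$ whenever $a < 1$.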
If these last conditions on $T_0$ hold, and if $\psi$ is odd and $F$ is symmetric, the expression for $IF(x;T^*,F)$ finally simplifies to

$$IF(x;T^*,F) = (1-a)\,IF(x;T_0,F) + a\,IF(x;T,F), \qquad \text{where } a = \frac{E_F\psi'(y)}{E_\Phi\psi'(z)}.$$

Appendix C

Derivation of the Influence Function of the MOSME of Dispersion

As explained in Appendix B, to derive the influence function of the MOSME of dispersion $S^*(F)$, we first need to write the expression for $S^*(F_{t,x})$ and then evaluate its derivative with respect to $t$ at $t = 0$. The expression for $S^*(F_{t,x})$ is

$$S^*(F_{t,x}) = S_0(F_{t,x}) + S_0(F_{t,x})\,\frac{(1-t)\,E_F\,\chi\left(\frac{X - T_0(F_{t,x})}{S_0(F_{t,x})}\right) + t\,\chi\left(\frac{x - T_0(F_{t,x})}{S_0(F_{t,x})}\right)}{E_\Phi\,\chi'(z)z}.$$

Differentiating this expression with respect to $t$, and evaluating it at $t = 0$, gives

$$\begin{aligned} IF(x;S^*,F) &= IF(x;S_0,F) + \frac{1}{E_\Phi\chi'(z)z}\left\{IF(x;S_0,F)\,E_F\chi(y) + S_0\left[-E_F\chi(y) - \frac{IF(x;T_0,F)\,E_F\chi'(y) + IF(x;S_0,F)\,E_F\chi'(y)y}{S_0} + \chi\left(\frac{x-T_0}{S_0}\right)\right]\right\} \\ &= IF(x;S_0,F)\left\{1 + \frac{E_F\chi(y) - E_F\chi'(y)y}{E_\Phi\chi'(z)z}\right\} - IF(x;T_0,F)\,\frac{E_F\chi'(y)}{E_\Phi\chi'(z)z} + \frac{S_0}{E_\Phi\chi'(z)z}\left\{\chi\left(\frac{x-T_0}{S_0}\right) - E_F\chi(y)\right\} \end{aligned} \tag{C.34}$$

where $T_0 = T_0(F)$, $S_0 = S_0(F)$ and $y = (X - T_0(F))/S_0(F)$, and $IF(x;S_0,F)$ and $IF(x;T_0,F)$ are the influence functions of the initial estimates of dispersion and of location, respectively. Since $T_0(F) = 0$ when $F$ is symmetric, $y$ is then symmetric about 0, so $E_F\chi'(y) = 0$ when $\chi$ is even. Equation (C.34) therefore reduces to

$$IF(x;S^*,F) = IF(x;S_0,F)\left\{1 + \frac{E_F\chi(y) - E_F\chi'(y)y}{E_\Phi\chi'(z)z}\right\} + \frac{S_0}{E_\Phi\chi'(z)z}\left\{\chi\left(\frac{x}{S_0}\right) - E_F\chi(y)\right\}.$$

Appendix D

Derivation of the Influence Function of the One-Step M-Estimator of Dispersion

As explained in Appendix B, to derive the influence function of the one-step M-estimator of dispersion $S_1(F)$, we first need to write the expression for $S_1(F_{t,x})$ and then evaluate its derivative with respect to $t$ at $t = 0$.
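The expression for $S_1(F_{t,x})$ is

$$S_1(F_{t,x}) = S_0(F_{t,x}) + S_0(F_{t,x})\,\frac{E_{F_{t,x}}\chi(y_t)}{E_{F_{t,x}}\chi'(y_t)\,y_t},$$

where $y_t = (X - T_0(F_{t,x}))/S_0(F_{t,x})$. For any function $g$, $E_{F_{t,x}}g(y_t) = (1-t)\,E_F\,g(y_t) + t\,g\left((x - T_0(F_{t,x}))/S_0(F_{t,x})\right)$. Differentiating this expression with respect to $t$ (using the rule for the derivative of a ratio of two functions), and evaluating it at $t = 0$, gives

$$\begin{aligned} IF(x;S_1,F) &= IF(x;S_0,F) + \frac{1}{[E_F\chi'(y)y]^2}\Bigg\{\Bigg(IF(x;S_0,F)\,E_F\chi(y) + S_0\left[-E_F\chi(y) - \frac{IF(x;T_0,F)\,E_F\chi'(y) + IF(x;S_0,F)\,E_F\chi'(y)y}{S_0} + \chi\left(\frac{x-T_0}{S_0}\right)\right]\Bigg)E_F\chi'(y)y \\ &\qquad - S_0\,E_F\chi(y)\left(-E_F\chi'(y)y - \frac{IF(x;T_0,F)\,E_F[\chi''(y)y + \chi'(y)] + IF(x;S_0,F)\,E_F[\chi''(y)y^2 + \chi'(y)y]}{S_0} + \chi'\left(\frac{x-T_0}{S_0}\right)\frac{x-T_0}{S_0}\right)\Bigg\} \\ &= IF(x;S_0,F)\left\{\frac{2E_F\chi(y)}{E_F\chi'(y)y} + \frac{E_F\chi(y)\,E_F\chi''(y)y^2}{[E_F\chi'(y)y]^2}\right\} + IF(x;T_0,F)\left\{\frac{E_F\chi(y)\,\{E_F\chi''(y)y + E_F\chi'(y)\}}{[E_F\chi'(y)y]^2} - \frac{E_F\chi'(y)}{E_F\chi'(y)y}\right\} \\ &\qquad + \frac{S_0\,\chi\left(\frac{x-T_0}{S_0}\right)}{E_F\chi'(y)y} - \frac{S_0\,E_F\chi(y)\,\chi'\left(\frac{x-T_0}{S_0}\right)\frac{x-T_0}{S_0}}{[E_F\chi'(y)y]^2} \end{aligned} \tag{D.35}$$

where $T_0 = T_0(F)$, $S_0 = S_0(F)$ and $y = (X - T_0(F))/S_0(F)$, and $IF(x;S_0,F)$ and $IF(x;T_0,F)$ are the influence functions of the initial estimates of dispersion and of location, respectively. Suppose $\chi$ is even and $F$ is symmetric. Then $T_0(F) = 0$, so $y$ is symmetric about 0, and both $E_F\chi'(y) = 0$ and $E_F\chi''(y)y = 0$. Equation (D.35) therefore reduces to

$$IF(x;S_1,F) = IF(x;S_0,F)\left\{\frac{2E_F\chi(y)}{E_F\chi'(y)y} + \frac{E_F\chi(y)\,E_F\chi''(y)y^2}{[E_F\chi'(y)y]^2}\right\} + \frac{S_0\,\chi\left(\frac{x}{S_0}\right)}{E_F\chi'(y)y} - \frac{S_0\,E_F\chi(y)\,\chi'\left(\frac{x}{S_0}\right)\frac{x}{S_0}}{[E_F\chi'(y)y]^2}.$$

Appendix E

Derivation of the Influence Function of the τ-Estimator of Dispersion

As explained in Appendix B, to derive the influence function of the τ-estimator of dispersion $S_\tau(F)$, we first need to write the expression for $S_\tau(F_{t,x})$ and then evaluate its derivative with respect to $t$ at $t = 0$.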
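The expression for $S_\tau(F_{t,x})$ is

$$S_\tau^2(F_{t,x}) = \frac{S_0^2(F_{t,x})}{\beta}\left\{(1-t)\,E_F\,\chi\left(\frac{X - T_0(F_{t,x})}{S_0(F_{t,x})}\right) + t\,\chi\left(\frac{x - T_0(F_{t,x})}{S_0(F_{t,x})}\right)\right\},$$

where $\beta = E_\Phi\chi(z)$. Differentiating this expression with respect to $t$, and evaluating it at $t = 0$, gives the first line below. The second line follows from $E_F\chi(y) = \beta S_\tau^2/S_0^2$:

$$\begin{aligned} IF(x;S_\tau,F) &= \frac{1}{2S_\tau}\left\{\frac{2S_0\,IF(x;S_0,F)\,E_F\chi(y)}{\beta} + \frac{S_0^2}{\beta}\left[-E_F\chi(y) - \frac{IF(x;T_0,F)\,E_F\chi'(y) + IF(x;S_0,F)\,E_F\chi'(y)y}{S_0} + \chi\left(\frac{x-T_0}{S_0}\right)\right]\right\} \\ &= IF(x;S_0,F)\left\{\frac{S_\tau}{S_0} - \frac{S_0\,E_F\chi'(y)y}{2S_\tau\beta}\right\} - IF(x;T_0,F)\,\frac{S_0\,E_F\chi'(y)}{2S_\tau\beta} + \frac{S_0^2}{2S_\tau\beta}\,\chi\left(\frac{x-T_0}{S_0}\right) - \frac{S_\tau}{2} \end{aligned} \tag{E.36}$$

The notation in (E.36) is as follows:

- $T_0 = T_0(F)$, $S_0 = S_0(F)$ and $S_\tau = S_\tau(F)$.
- $y = (X - T_0(F))/S_0(F)$.
- $IF(x;T_0,F)$ and $IF(x;S_0,F)$ are the influence functions of the initial estimators of location and dispersion, respectively.

When $F$ is symmetric, $T_0(F) = 0$ and $y = X/S_0(F)$; if $\chi$ is also even, then $E_F\chi'(y) = 0$. Equation (E.36) therefore reduces to

$$IF(x;S_\tau,F) = IF(x;S_0,F)\left\{\frac{S_\tau(F)}{S_0(F)} - \frac{S_0(F)}{2\beta S_\tau(F)}\,E_F\chi'(y)y\right\} + \frac{S_0^2(F)}{2\beta S_\tau(F)}\,\chi\left(\frac{x}{S_0(F)}\right) - \frac{S_\tau(F)}{2}.$$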
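The reduced expression for $IF(x;S_\tau,F)$ can be evaluated numerically at $F = \Phi$. This sketch makes the following assumptions:

- $T_0(\Phi) = 0$.
- $S_0$ is the normalized MAD, with its standard influence function at the normal, $\operatorname{sign}(|x| - q)/(4q\phi(q))$ where $q = \Phi^{-1}(3/4)$.
- Expectations are computed with Simpson's rule.
- $\chi$ is Tukey's bisquare with $c = 3$, chosen only for illustration.

All names are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist()
Q = PHI.inv_cdf(0.75)


def normal_expect(g, n=4000, lim=12.0):
    """E g(Z), Z ~ N(0, 1), by composite Simpson's rule on [-lim, lim] (n even)."""
    h = 2.0 * lim / n
    total = 0.0
    for i in range(n + 1):
        z = -lim + i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += w * g(z) * PHI.pdf(z)
    return total * h / 3.0


def mad_if(x):
    """IF of the normalized MAD, MAD / Phi^{-1}(3/4), at the standard normal."""
    return math.copysign(1.0, abs(x) - Q) / (4.0 * Q * PHI.pdf(Q))


def tau_scale_if(x, chi, dchi, s0=1.0, if_s0=mad_if):
    """Reduced IF(x; S_tau, Phi) from Appendix E, taking T0(Phi) = 0.

    chi is even with derivative dchi; beta = E_Phi chi(z); s0 = S0(Phi).
    """
    beta = normal_expect(chi)
    s_tau = s0 * math.sqrt(normal_expect(lambda z: chi(z / s0)) / beta)
    e_dchi_y = normal_expect(lambda z: dchi(z / s0) * z / s0)  # E_F chi'(y) y
    return (if_s0(x) * (s_tau / s0 - s0 * e_dchi_y / (2.0 * s_tau * beta))
            + s0 ** 2 * chi(x / s0) / (2.0 * s_tau * beta)
            - s_tau / 2.0)


def bisquare(u, c=3.0):
    """Tukey's bisquare rho scaled to [0, 1]; c = 3.0 is only illustrative."""
    return 1.0 - (1.0 - (u / c) ** 2) ** 3 if abs(u) <= c else 1.0


def bisquare_deriv(u, c=3.0):
    """Derivative of bisquare with respect to u."""
    return 6.0 * u / c ** 2 * (1.0 - (u / c) ** 2) ** 2 if abs(u) <= c else 0.0


for x in (0.0, 1.0, 3.0, 10.0):
    print(f"x = {x:4.1f}: IF = {tau_scale_if(x, bisquare, bisquare_deriv):.4f}")
```

As a sanity check, take $\chi(u) = u^2$. The τ-functional then reduces to the standard deviation, and the formula returns $(x^2 - 1)/2$, the influence function of the standard deviation at $\Phi$. This holds whatever the value of $S_0(\Phi)$, because the coefficient of $IF(x;S_0,F)$ vanishes. With the bounded bisquare $\chi$, the influence function is constant for large $|x|$.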
