SPATIAL ESTIMATION: T H E GEOSTATISTICAL POINT O F VIEW by SIMON M C F A D Z E A N - F E R G U S O N B.Sc. Bristol University, United K i n g d o m , 1991 A THESIS S U B M I T T E D IN P A R T I A L F U L F I L L M E N T O F THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in T H E F A C U L T Y O F G R A D U A T E STUDIES (Institute of A p p l i e d Mathematics) (Department of Mathematics) We accept this thesis as conforming to the required standard T H E UNIVERSITY OF BRITISH C O L U M B I A M a r c h 1995 © S i m o n McFadzean-Ferguson In presenting this degree at the thesis in partial University of fulfilment of the requirements for an advanced British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for department or by his or scholarly purposes may be granted by the head of her representatives. It is understood that copying my or publication of this thesis for financial gain shall not be allowed without my written permission. Department of The University of British Columbia Vancouver, Canada -Ik Date DE-6 (2/88) \ 2. 'vSfrML ' S£ Abstract Geostatistics involves the statistical estimation of erratic surfaces, similar to those found in geology, using sample data. It has been my experience that there are few texts i n geostatistics written for people who are new to the subject, and who have not been immersed in it since its inception i n the 1960's. To prevent other people becoming confused by the changing notation, and unspoken assumptions, I provide an overview of this subject, w i t h the a i m of providing a clearer understanding of the concepts involved, the assumptions made, and the motivation behind each type of estimator. I then concentrate on the more general form of estimation assuming a nonhomogeneous trend, called Universal Kriging. I explain i n detail how this estimator can be found i n an accurate and computationally efficient way. Using the information gained from robustness studies of this estimator, I then attempt to apply it to real surfaces, for different methods of covariance estimation and trend orders. ii Table of Contents Abstract ii Table of Contents iii List of Figures vi List of Tables vii Acknowledgements viii General Notation ix 1 Introduction 1 2 T h e E v o l u t i o n of Geostatistics 7 3 2.1 Derivation of Matheron's Ordinary K r i g i n g 2.2 T h e Histogram 13 2.3 (Ordinary) Lognormal K r i g i n g (LK) 16 2.4 Nonlinear estimators 18 (OK) estimator, ZOK(^) • • 2.4.1 Disjunctive K r i g i n g (DK) 19 2.4.2 Indicator K r i g i n g (IK) 25 K r i g i n g w i t h a nonhomogeneous trend 3.1 9 27 O r i g i n a l derivation of Matheron's Universal K r i g i n g estimator, Z {x) UK 3.1.1 Discussion . 30 34 iii 3.2 3.3 4 R i p l e y ' s form of the Universal K r i g i n g estimator 36 3.2.1 Cholesky factorisation 36 3.2.2 Q R Decomposition 36 3.2.3 Singular Value Decomposition ( S V D ) 37 3.2.4 Step 1: E s t i m a t i n g the trend 37 3.2.5 Step 2: Modelling the residuals, R(x) 44 T h e equivalency of the two forms of estimation 46 3.3.1 F i n d i n g $ explicitly 46 3.3.2 F i n d i n g R K(*) 46 explicitly 3.4 C a l c u l a t i n g o"g(x), the error variance of ZUK(X) 47 3.5 E s t i m a t i o n of the covariance function 48 3.5.1 Intrinsic R a n d o m Functions of Order p (IRF-p) 51 3.5.2 Other forms of unbiased covariance estimation 57 59 Robustness of the Universal Kriging estimator, Z (x.) UK 4.1 Step 1: T h e trend component functions 4.2 Step 2: Nonparametric estimation of the experimental (co)variogram 4.3 Step 3: F i t t i n g an appropriate parametric model, C ( x , y ; 6) 4.4 Step 4: F i n a l estimation using C ( x , y ; 0 ) and the sample data z 4.4.1 5 0 60 . . 63 64 67 n Robust K r i g i n g 68 A Numerical Application 71 5.1 Reasons for choosing the biased form of (co)variogram estimation 5.2 Detailed description of (co)variogram estimation 74 5.3 Results 79 5.4 Detecting the optimal trend order 87 iv . . . . 72 5.5 6 Conclusions 90 94 Conclusion 97 Bibliography v List of Figures 1.1 Idealised (co)variogram under second order stationarity assumption 2.2 Illustration of how distribution of Z(x) is transformed to N(0,1) . . . 5 using CDF's 23 5.3 A n experimental semivariogram 75 5.4 Comparison of semivariogram models 76 5.5 A comparison between L S and Cressie's weighted L S method 79 5.6 H o w weighted L S alters the sill of the spherical fit 80 5.7 Plots of residuals versus fitted values for the A D and S D data sets . . . . 82 5.8 Histograms and normal q-q plots for the A D and S D ^ residuals 83 5.9 Dimension-.methodinteraction plots for both the A D and S D residuals . 85 5 5.10 Density and normal q-q plots for L S - W L S and L S - L I N differences for stationary trend surface 87 5.11 M e a n and median factor plots for the A D data set 92 5.12 N o r m a l q-q plots for the cross validation data 93 vi List of Tables 5.1 A N O V A tables for A D and S D ^ data sets 5.2 M e a n and standard deviations of A D data for the different m e t h o d / t r e n d combinations 84 85 vii Acknowledgements First and foremost I would like to express my gratitude to D r . R u b e n Zamar for taking a particular interest i n my work i n his sampling theory course, and for offering to supervise me for the duration of my Masters thesis. I thank h i m for his continuous support and guidance during the course of this research. I am also extremely grateful to Professor D o n L u d w i g for serving on my advisory committee, for carefully reading drafts of this thesis, and for his many helpful comments during various stages of this work. I also thank h i m for his generous support as my course advisor. viii General Notation Roughly i n order of appearance: n the number of sample values x a general (usually unsampled) point i n !!ft Xj the sample points, i = 1 . . . n Z(x) the theoretical regionalised variable for the ore grade at the point x z(x) the ore grade measured at the point x , seen as a realisation of the variable Z ( x ) Z„ the (n x 1) vector of ore grade random variables at the sample points, x i . . . x „ z the (n x 1) vector of ore grades measured at the sample points, x i . . . x 2 n Z(x) an estimator of Z{x) £(x) the actual estimate of z{x) LI(X.) the expected value of Z{x) at point x H the expected value of Z(x) when it is assumed to be homogeneous e the (n x 1) vector of residual variables at the sample points, X i . . . x n C ( x , y) shortened notation for C o v ( Z ( x ) , C(h) n Z(y)) notation for C o v ( x , y) when the covariance function is assumed to be homogeneous, where h = x — y C(h) notation for C o v ( x , y) when the covariance function is assumed to be homogeneous and isotropic, where h = \x — y\ 7(x, y) the semivariogram for Z(x) with same notation changes as for C ( x , y) cr^(x) the variance of Z(x), which depends on x <7g(x) the variance of the error, Z ( x ) — Z(x), which depends on x ix n A,-(x) the kriging estimator weights, i = 1 . . . n , which depend on x A(x) the (n x 1) vector of kriging weights A (x) <p a Lagrange multiplier <p the (n x 1) vector of Lagrange multipliers, fa,..., (f> //(x) the trend component functions, I = 0 ... L f(x) the ((L + 1) x 1) vector composed of the trend component functions F the (n x (L + 1)) m a t r i x of trend component functions evaluated at the sample n t n points p the degree of the polynomial trend (3 the ((L + 1) x 1) vector of trend constants, /?/, such that: u(x) = / 3 f (x) K the (n x n) m a t r i x of covariances between the ore grade variables, Z(x;), at the T sample points KQ an i n i t i a l estimate of K k(x) the (n x 1) vector of covariances between the ore grade variable Z(x) at x and those at the sample points, Z ( x i ) . . . Z ( x ) n A T m(x) the generalised least squares estimator of the trend: m(x) = (3 f(x) M the Cholesky decomposition lower triangular m a t r i x of K such that: K — Y the uncorrelated sample data transformed from Z G the ( n x ( I + 1)) transformed m a t r i x of F such that: G = k^x) the ( n x l ) transformed vector of k(x) such that: k^x) = M k ( x ) £ the uncorrelated sample residuals transformed from e such that: £ = A f e „ n n such that: Y n = M _ 1 MM T Z n M~ F l _ 1 _ 1 n n i? R the Q R decomposition of G where Q is (n x n), i? is (n x (X + 1)) and G = an ((1/ + 1) x (L + 1)) upper triangular m a t r i x : R = V i?(x) n the residual estimator : i?(x) = e(x) = Z(x) — p(x) x 0 ) the actual residual estimate at the sample point x*, seen as a realisation of the variable i?(x,) such that: r(x,) = Z(XJ) — m ( x j ) the (n x 1) vector of estimated residuals r ( x j ) at the sample points x i . . . x, the covariance parameters for a chosen covariance model C ( x , y ; 6) xi Chapter 1 Introduction Geostatistics was initially created to provide better estimates of ore reserves. To understand the concepts of geostatistics, it is necessary to look more closely at the types of surfaces of the ore deposits involved. T h e quality of ore, or ore grade, can be distributed i n a variety of ways depending upon the type of ore deposit. Bedded sedimentary ore deposits can have quite regular spatial ore grade distributions. For these deposits, the ore reserve can be adequately estimated using conventional techniques such as the fitting of a deterministic function by least squares. Alternatively, deposits of valuable minerals and precious metals such as the gold mines i n South Africa can have extremely erratic spatial ore grade distributions, even w i t h i n a very small area. A l t h o u g h these ore grade distributions are continuous surfaces, they are so erratic i n their spatial behaviour that points on the surface that are relatively far apart appear to be statistically independent. T h i s idea of ore grade being somewhere between a deterministic function and a random variable led to the concept of the regionalised variable. A s the types of surfaces that are found i n geology could not be realistically fitted by any form of deterministic function without having to solve for a huge number of parameters, geologists were forced to look at statistical models for these surfaces. A s for traditional statistical models, geostatistics considers surface values .z(x) and 1 z(y), Chapter 1. Introduction 2 from different locations x and and Z(y). y, to be realisations of the surface random variables, Z(x) A basic statistical model assumes that, if chosen randomly, Z(x) and Z(y) are independent and identically distributed random variables with: E(Z(x)) = p, V x (1.1) Var(Z(x)) = o\ (1.2) Var(Z(x) - Z(y)) = Vx V a r ( Z ( x ) ) + Var(Z(y)) = 2a . (1.3) 2 z Equations (1.1) and (1.2) state that the surface variables at a l l points have the same expected value and variance, and are commonly referred to as the stationarity under translation or homogeneity assumptions i n both the mean and the variance. These assumptions allow the estimation of statistical properties which can be assumed representative of the whole surface or population over which stationarity is assumed, rather than of a particular location. Under these assumptions we can view each sample value as an outcome of the same variable, allowing us to find its mean and variance. Therefore, under this basic model, the geological surface z ( x ) is modelled as a realisation of an infinite set of independent (normal) random variables, Z(x), x. one for each point W h e n talking about this set of random variables, Z(x) is referred to as a random function, as it assigns a random variable to each point x . Even though the sample values are used to estimate the common mean, \x, and variance, a\, the positional information of the sampled points is not used i n estimating the surface value at a specifically placed unsampled point. Bearing i n m i n d that the geological surface z ( x ) that we are t r y i n g to model as a random function is actually non-random and continuous, then for any points x and y close together, it is unrealistic to treat variables Z ( x ) and Z(y) independent. as entirely Instead, the closer x and y are to each other, the more likely Z(x) and Chapter 1. Introduction Z(y) 3 w i l l be similar i n value. Therefore, a more realistic statistical model is to assume that however randomly the points x and y are chosen, they are not independent, but spatially correlated. This additional information is translated into the geostatistical theory i n the following two ways: E(Z(x)) = M Vx V a r ( Z ( x ) - Z{y)) intrinsic stationarity (IS) (1-4) = 2 (h) J E ( Z ( x ) ) = /* V x 7 > second — order stationarity (SOS) (1-5) Cov(Z(x),Z(y)) = C(h) J where h = x — y . The function, 27(h), is called the variogram. For a particular direction, this is usually an increasing function of | h | . The idea behind this is that the closer the points x and y are, the more alike the variables will be, and so the smaller w i l l be the variance of their difference. Thus, under this geostatistical theory, the geological surface, z(x), realisation of an infinite set of random variables, Z(x), is modelled as a which are independent over long distances, and are correlated over short distances, dependent on their variogram 27(h) or covariance function C ( h ) , otherwise known as the is referred to as a covariogram. Under this model, Z(x) regionalised variable. C o m p a r i n g assumptions (1.4) and (1.5), S O S is the stronger of the two as it i m p l i c i t l y implies stationarity i n the variance, whereas IS does not. It can be shown that the class of IS models strictly contains those satisfying S O S . In practise, I have found that this distinction is lost because one must assume stationarity in the variance i n order to Chapter 1. Introduction 4 estimate either the variogram or the covariance function. Under this unifying assumption, the variogram and covariance function can be directly related i n the following way: 2 (h) 7 = £ ( Z ( x ) - Z{y)) = V a r ( Z ( x ) - Z ( y ) ) = Var(Z(x)) + = 2a -2C(h). 2 Var(Z(y))-2Cov(Z(x),Z(y)) 2 z (1.6) F r o m now on, I use the abbreviation (co)variogram to refer to both the variogram and covariance function. In geostatistics, people refer to the semivariance or semivariogram, 7(h), rather than the variogram where 7 (h) = a 2 2 -C(h) = C(0)-C(h), (1.7) A s the covariance function is so central to spatial estimation i n geostatistics, estimation of this function is key to any estimator's performance. Unless there is information to suggest that the covariance function should indicate more correlation i n one direction than another, i t may also be assumed to be stationary under rotation, or isotropic, to make estimation of the covariance function easier: C ( h ) -> C{h) where h = | h | Having outlined the philosophy and assumptions behind basic geostatistical theory, I now concentrate on more specific developments within this subject of spatial estimation. In Chapter 2, I describe the evolution of ideas i n mainstream geostatistics. Throughout this chapter, the random function Z(x) is assumed to have a homogeneous trend. Firstly, Chapter 1. Introduction 5 h a Figure 1.1: A n idealised (co)variogram under the second order stationarity assumption. C(h) is the covariogram and 'y(h) is the variogram. Sample values whose inter-location distance is greater than a, the range of influence, are considered to be uncorrelated and thus independent. I derive Matheron's original linear estimator, the Ordinary K r i g i n g estimator B y considering the optimality of ZOK(X), ZOK{*)- we become interested i n the underlying dis- tribution of the sample variables which can be estimated by forming a histogram of the data. Methods of forming this histogram from correlated data are discussed. Under the assumption that the data is jointly gaussian, the optimal estimator can be shown to be linear. U p o n the failure of this assumption, I then outline the alternative methods used, leading to the onset of nonlinear estimators. I summarise a selection of nonlinear estimators, outlining their differing motivations and assumptions, and pointing out their practical failings. In Chapter 3, I then outline the attempts that have been made i n geostatistics to estimate random functions Z(x) w i t h a nonhomogeneous trend. I describe i n detail one of the early geostatistical estimators, the Universal Kriging estimator ZUK^X)- Pointing out the weaknesses i n applying the original form of the estimator, I describe i n detail an Chapter 1. Introduction 6 alternative derivation by Ripley (1981) that is practically superior, and computationally more efficient. T h e two forms of estimation are shown to be equivalent. I discuss the problem of estimating the covariance function unbiasedly from nonhomogeneous sample variables, and how this led to the use of tion of the generalised increments and the unbiased estima- generalised covariance function. Finally, I outline other proposed methods of unbiased estimation of the covariance function. In Chapter 4, I outline the studies that have been done on the robustness of Universal K r i g i n g to outliers i n the data, and to errors i n the estimation process, such as misspecification of the trend order, inappropriate choice of (co)variogram model and misspecification of the model parameters. I also describe various methods of detecting and editing these outliers. In Chapter 5, I attempt to apply the improved form of the Universal K r i g i n g estimator derived i n Chapter 3 to real surface data of a mountain range, assumed to be representative of an ore surface, using the practical information gained from Chapter 4. I discuss the various methods by which I estimate the (co)variogram of the surface from the data. I then divide the surface area into smaller areas and estimate a selection of these smaller surfaces for different combinations of (co)variogram estimation method and trend order. I analyse the results i n detail, and come to some general conclusions. Lastly I discuss whether the optimal trend order could have been identified for a particular sample configuration and surface. Chapter 2 T h e E v o l u t i o n of Geostatistics Even though the theory of regionalised variables and the original kriging estimators were developed i n the early to late 60's, much of this work was written i n French and it was not until the publications i n English of Michel David's and A n d r e JournePs and C . Huijbregts' Geostatistical Ore Reserve Estimation Mining Geostatistics i n the late 70's that use of these geostatistical ideas became widespread. Since this time, numerous papers and books have been published on a l l aspects of geostatistics. It is impossible to summarise adequately this wealth of ideas, recent estimators, improved versions of old estimators, and differences of opinion, i n a single chapter. Therefore I outline the 'backbone' of modern geostatistics, highlighting the motivations behind each type of estimator. I begin by deriving Matheron's linear Ordinary K r i g i n g estimator Z o ^ ( x ) which was originally thought to be distribution-free. However, when considering the optimality of an estimator, the underlying distribution of the sample variables becomes vitally important, and can be estimated by forming a histogram of the data. T h e problem of using correlated data to form a histogram is discussed. Depending on whether this distribution can be assumed to be jointly gaussian or jointly lognormal, or whether it can be transformed to an isofactorial model, a variety of estimators have been developed, and applied w i t h varying success. I describe the theory behind these estimators i n enough detail to provide the reader w i t h a clear understanding of the concepts and assumptions involved. For a l l the estimators discussed i n this chapter, the random function 7 Z(x), Chapter 2. The Evolution of Geostatistics 8 from which the surface originated, is assumed to have a homogeneous trend. K r i g i n g is a statistical form of estimation developed by Georges M a t h e r o n i n the 1960's for use i n the mining industry as a method for estimating ore reserves. It is named after D . G . Krige, a South African mining engineer, who was a forerunner i n using statistical techniques i n ore estimation. It is this same G . Matheron that developed the Theory of Regionalised Variables, introduced i n the previous chapter, which together w i t h kriging, led to an explosion of work i n this area. T h i s work, and other related subjects i n applied statistics, has become collectively known as Geostatistics. T h e original kriging estimators, Z(x), are estimators of the regionalised variable Z(x) at an unsampled point x , and are chosen as a weighted sum of the regionalised variables Z ( x i ) , . . . , Z(x. ) at the sample points, x i , . . . , x . These sampled points do not have to n n be of any specific configuration. The weights are chosen so that Z(x) is unbiased, and the variance of the difference between Z(x) and -2T(x) is minimised. The kriging estimator, Z ( x ) , is therefore a B L U E estimator (Best Linear Unbiased Estimator) i n that no other linear combination of the sample variables w i t h the same model assumptions can form an estimator w i t h a smaller variance about the true surface value at a general point x . A l t h o u g h i n the beginning the term kriging specifically referred to this particular method of unbiased m i n i m u m error variance estimation, it now refers more to the collection of geostatistical estimators, both linear and non-linear, that have developed from this idea. Before going any further, I shall outline the basic assumptions made by a l l the kriging estimators discussed i n this chapter. Chapter 2. The Evolution of Geostatistics 9 Basic kriging assumptions 1. The unknown ore grade, z(x), and the n sample values, z ( x i ) , . . . , . z ( x ) , are realin sations of the respective regionalised variables, Z ( x ) , and Z ( x i ) , . . . , Z ( x „ ) . T h e y contain no measurement or positional errors. 2. The covariance C o v ( Z ( x ) , Z(y)) of regionalised variables Z(x) and Z ( y ) , is known for any two points x and y i n the area over which we want to estimate z(x). 3. T h e non-negative definite matrix K of covariances between the ore grade variables at the sample points, is positive definite, where ( C(x!,Xi) .... C(x!,X ) n \ K (2.8) \ C(x ,xi) n and C ( x , y ) = C o v ( Z ( x ) , Z(y)) 4. The mean of Z(x) .... C(x ,x ) J n n for ease of notation. is the same for all points x i n the area over which we want to estimate -z(x). In geostatistical terms, the trend is homogeneous. T h i s assumption w i l l be relaxed i n Chapter 3. 2.1 D e r i v a t i o n o f M a t h e r o n ' s O r d i n a r y K r i g i n g (OK) e s t i m a t o r , Matheron considers the original Ordinary K r i g i n g (OK) estimator ZOK(*) Z K(^) 0 of the surface variable Z(x.) at point x as a linear function of the surface variables Z ( i ) > • • • > Z(yin) at x the sample points: Z (x) OK = £>(x)Z(xO = A ( x ) Z , r n (2.9) Chapter 2. The Evolution of Geostatistics 10 where ( Z( X L ( Ax(x) ) ^ A(x) = Z — n ^ A (x) 2 (2.10) and the weights, A;(x), are fixed but unknown. Z(xi),..., For T h e mean of each variable Z ( x ) , Z ( x ) is unknown and denoted by /i. n ZOK(X) to be unbiased, we need the expected value of the error, the difference between the ore grade z ( x ) that would be measured at x and its estimate Z K{*), 0 to be zero: E(Z(x) - Z {*)) OK = £(Z(x)) - £(Z *(x)) G 1=1 = M (i - E M * ) ) = o. (2.11) Therefore, for equation (2.11) to be true, independent of the value of the homogeneous mean LL, we require: ] P A j ( x ) = 1, otherwise written as A ( x ) l = 1. T i=i T h e error variance, cr (x), is defined as: 2 <r (x) 2 e = Var(Z(x) - Z ^ ( x ) ) = Var(Z(x)) + V a r ( Z 0 O K ( x ) ) - 2 C o v ( Z ( x ) , Z0AT(X)) (2.12) Chapter 2. The Evolution of Geostatistics 11 Var(Z(x)) + V a r ( A ( x ) Z ) - 2Cov(Z(x), A ( x ) Z ) r T n n o (x) + A ( x ) i T A ( x ) - 2 A ( x f k ( x ) , 2 (2.13) T z where C(x, ( X l ) ^ (2.14) k(x) V G(x,x) j n We now minimise the error variance crf(x) subject to the unbiasedness constraint i n equation (2.12) by introducing a Lagrange multiplier 0. Therefore we minimise: H = <r (x) + 2 ( l - A ( x ) l ) 0 = <x (x) + X(x) KX(x) 2 r 2 T - 2 A ( x f k ( x ) + 2(1 - A(x) l)</>, r (2.15) w i t h respect to A ( x ) and 4>. The factor of two i n the Lagrange component of (2.15) is only for convenience i n the working, and has no effect on the final solution. A s H is quadratic i n A ( x ) and K is non-negative definite, when we differentiate with respect to A ( x ) , the stationary point solution w i l l necessarily be at a m i n i m u m : <9A(x) - -- 2 K A ( x ) - 2k(x) - 201 = 0 = 2 (1-A(xfl) = 0 which imply respectively that KX(x) = k ( x ) + 01 (2-16) A(x) l = 1. (2.17) r Chapter 2. The Evolution of Geostatistics 12 These are referred to as the kriging equations. B y first solving for c/>, and using the assumption that K is positive definite, it can be shown that a unique solution for A(x) exists. T h i s is described i n full detail i n Section 3.1 for Universal K r i g i n g , of which O r d i n a r y K r i g i n g is a special case. Therefore, by substituting the solution for A(x) into equations (2.9) and (2.13), we have the original OK estimator and error variance as derived by Matheron. If the homogeneous mean ji is somehow known, then a slightly different kriging estimator is used: n z {y) = v- + 5 > ( x ) ( Z ( x i ) ~ A*), (2-18) OK i=l which is already unbiased. Therefore an unbiasedness constraint is not required, and the kriging equation for A(x) is simply: (2.19) tfA(x) =k(x). A s mentioned, ZOK(X) and ZOK(X) are B L U E estimators and were originally consid- ered to be distribution-free, but as soon as one considers the optimality of an estimator, the assumed distribution of the data is all important. Z(x) A s s u m i n g E(Z(x) ) 2 < oo, (for to have a finite variance), the optimal estimator of Z(x) by any measurable func- tion g(Z(xi),..., Z ( x ) ) , whether linear or nonlinear, and irrespective of the underlying n distribution of Z{x), is the conditional expectation (CE) estimator: Z (x) CE = E(Z(x)|Z( = / J z(x) X l ),...,Z(x )) f e ) , Z ( x ) . - , Z ( x ) {Z{*)X*1), 1 (2.20) n B •• • »*W) /z(xi),...,Z(x )(2(Xi),...,2(x )) n n d z ( x ) . Chapter 2. CE The Evolution of Geostatistics 13 is optimal i n that it minimises the error variance over all estimators which can be computed from the sample values. In order to calculate this estimator, the joint distribution of ( Z ( x ) , Z ( x x ) , . . . , Z ( x ) ) would be required, which is rarely known i n n practise. For this reason, we are forced to restrict our choice of estimators to those that require information that we know, or can at least estimate from the known data values z(Xi), . . . , 2 ( x ) . n If ( Z ( x ) , Z ( x i ) , . . . , Z(x )) n has a joint gaussian distribution, then it can be easily shown that CE is a linear function of the sample variables Z(x), Z ( x i ) , . . . , Z ( x ) , and more n importantly, that it is identical to ZOK^) (Rendu 1979). Therefore, under this distribu- tional assumption, only linear functions of the sample variables need be considered. O f course, if pi is unknown, then CE cannot be found exactly. In this case, Zo {x) K be used, which is the best (linear) unbiased approximation to should CE. The first problem is how to verify this gaussian assumption using the correlated sample values. U p to now this has been done using the simplest of tools: the histogram. 2.2 The Histogram It is an impossible task to verify that (Z(x), Z(xi),..., Z(~x )) has a joint gaussian disn tribution from only the sample values ( z ( ) , . . . , ;z(x )), seen as a single realisation X l of ( Z ( x i ) , . . . , Z(x )). n F r o m only one realisation of a distribution, there is no form n of inference possible. Therefore, we can only verify that the marginal distributions of ( Z ( x ) , Z ( x i ) , . . . , Z(x )) n are gaussian, assuming them to be identically distributed. T h i s can be done by forming a histogram of the sample values. A l t h o u g h a histogram is easy to produce, finding the correct histogram becomes very Chapter 2. The Evolution of Geostatistics 14 difficult due to the correlation i n the sample values. In order for it to represent true distribution of Z(x), the the sample values must be seen as realisations of independent random variables Z(x.i), i = l,...,n identically distributed to Z(x). T h e problem is that, i n reality, samples are far from independent. Miners drill where they t h i n k the ore grade is high, and then cluster their drilling about this high grade region. So treating these samples as independent realisations of Z(x) being over-represented, leading to a would result i n the high ore grades biased histogram and an overestimated average ore grade. A very simple technique called declustering involves dividing the area into a regular grid and choosing one sample at random from each grid square. Using this reduced set of sample values to form the histogram has proved to much reduce this bias, and is far more practical than the other methods devised to resolve this problem (David, 1988). One drawback to this technique is that, dependent on the grid size, different histograms may be found. T h i s leads to confusion over which histogram to use, demanding a subjective benefit function over which to optimise. M y objection to this technique is that it attempts to remove the bias by choosing the sample points randomly within each square. This does not agree w i t h the theory. A s described i n Chapter 1, geostatistics is based on the premise that however randomly you choose your samples, they will not be independent but correlated as dictated by the covariance function. For this reason we must ensure independence by m a k i n g sure that the samples used i n the histogram are further apart than the range of influence, a, of the covariance function. Using the declustering technique, we could still end up w i t h samples close together despite being i n separate squares, which would then bias the histogram. A method more i n line w i t h the theory would be to assume a gaussian distribution, Chapter 2. The Evolution of Geostatistics 15 estimate the (co)variogram from the data, and from this, estimate a. T h e n by o m i t t i n g the samples that are within distance a of each other, we could form a histogram and see if our original assumption is valid. The (co)variogram estimators, discussed later i n Chapter 4, are again only optimal under the gaussian assumption. A n o t h e r source of bias i n the histogram is the occurrence of outliers. A g a i n , numerous solutions have been put forward, but an even more basic problem still lies i n defining an outlier. A common method to correct this bias is to decide on a particular model for the distribution from the histogram and then correct the sample values accordingly to fit the model. T h e problem w i t h this approach is that i n practise, relatively small samples are taken, and so the sample values could very well be accurate but just not numerous enough to provide a good representation of the distribution. In this situation it would be very unwise to alter the sample values, and thus risk biasing the results, so as to fit an estimated histogram. Instead, estimation techniques that are robust to outliers should be used. It is a well known fact that linear estimators are extremely sensitive to outliers. For this reason, histogram estimation, and outlier detection i n particular, is an essential and skilled job and takes great experience to be done well. If the corrected histogram is skewed from normal, it could indicate either the existence of a nonhomogeneous trend, or a different distribution. The first case w i l l be discussed in Chapter 3. In the second case, if the distribution is lognormal, then a transformed version of the OK estimator, called Lognormal Kriging, can be used due to its close connection to the gaussian distribution. Otherwise, CE estimators. is not necessarily linear or loglinear and we must consider nonlinear Chapter 2.3 2. The Evolution of Geostatistics 16 (Ordinary) Lognormal Kriging (LK) If Z ( x ) , Z ( x i ) , . . . , Z ( x ) have a joint lognormal distribution then \n(CE) is a linear n function of the transformed sample variables l n ( Z ( x ) ) , m ( Z ( x i ) ) , . . . , l n ( Z ( x ) ) . A g a i n , n this can easily be shown using multivariate gaussian theory (Rendu 1979). T h i s estimator was developed by A n d r e Journel, and a complete derivation can be found in (Journel 1980). W e first perform the transformation: F ( x ) = ln(Z(x)), (2.21) such that the new variables F ( x ) , F ( x i ) , . . . , Y(x ) n use the O r d i n a r y K r i g i n g estimator YOK(^) are normally distributed. W e then of ^ ( x ) , using the transformed data set l n ( z ( x i ) ) , . . . , l n ( 2 ; ( x ) , assuming the homogeneous mean \n(p) of Y(x) to be known. O f n course this must be transformed back i n order to find the estimator for Z(x): Z (x) LK = De °"&\ (2.22) f where D is a constant derived i n (Journel 1980) to ensure that the resulting estimator is unbiased. In deriving D, Journel uses the fact that YOK(X) is normally distributed and so e ° Y K ^ is lognormally distributed. A g a i n it can be easily shown that under these distributional assumptions, ZLK(X) equal to the CE estimator and is thus optimal. then ZLJ<-(X), the LK estimator using Y K(*) 0 is A g a i n , i f the mean p is unknown, instead of Y K(^-), 0 is the best (log-linear) unbiased approximation to C E . T h e use of LK has been somewhat controversial, due to its lack of robustness compared to n o r m a l kriging. OK is only dependent on the relative values of the covariance function Chapter 2. The Evolution of Geostatistics 17 C(h) at different h, i.e. its shape, whereas LK is very sensitive to local variations i n the value of C(h) of the transformed data. This can be seen by looking at the OK kriging equations (2.16) and (2.17) which can be written as EAj(x)C(xj,Xi) = C(x,xO i=i + i = l,...,n, £>i(x) = l. (2.23) (2-24) B y substituting the covariance function C ( x , y ) for a C ( x , y) + P i n equation (2.23) where a and j3 are constants, and then expanding, we get n n a ^ j ( ) ( j . i ) + / ? E i ( ) = a C ( x , X i ) + /? + <> / x C x x A x z= l,...,n. (2.25) Therefore, by using equation (2.24), we can cancel P from both sides of the equation. T h e n by dividing throughout by a, equation (2.25) becomes d> n E A ( x ) C ( x , x ) = C(x,Xi) + j i i j=i i = l,...,n. 01 The constant 0 i n equation (2.23) only has the effect of equating C ( x , j ) } for each i. X of Aj(x). (2.26) {YIj=\ A J ( X ) C ( X J , x,) — Therefore, dividing <f> by a has no effect on the final solutions O f course, a can not be zero, as there would be no solution to the kriging equations. Therefore, the covariance function can not be horizontal. So, it is clear that any constant nonzero factor or constant term i n the covariance function is filtered out by the kriging equations, and thus has no effect on the kriging parameters A ( x ) and the final estimator ZOK{*)- However, for LK, the unbiasedness constant D is dependent upon the exact covariance function, and thus the estimator loses this filtering property. This makes LK far more Chapter 2. The Evolution of Geostatistics 18 dependent on the estimation of the covariance function, which is already a very difficult task. It is demonstrated very clearly i n (David 1988) using a couple of examples, that by slightly altering the covariance function you can get a great variety of estimates. For this reason, LK is a very risky estimator to use unless the practitioner is very experienced and some form of cross-validation technique is used, such as David's method ( K i m and Knudsen, 1978). This method involves the simple technique of removing one sample value at a time and using the other sample values and the proposed covariance function to estimate it. B y repeating this for every sample value, we get a collection of estimation errors which we can compare w i t h the model predictions to see if the proposed covariance is appropriate. 2.4 Nonlinear estimators T h e advantage of using linear estimators as compared w i t h nonlinear estimators, is that they require fewer (although stronger) assumptions about Z(x), faster and easier to use, and usually provide unique solutions. are computationally For these reasons the original linear kriging estimators such as ZOK(X) are still i n great use i n the mining industry, despite the recent development of nonlinear estimators. For the same reasons, it is also worthwhile to use linear estimators, even when the variable Z ( x ) does not depend on the data i n a linear way. In this case, it is advisable to divide the area into subsections, where the kriging assumptions of a homogeneous trend and a jointly gaussian or lognormal sample distribution are more likely to be satisfied, and use a different estimator for each subsection. A s linear estimators are extremely sensitive to outliers, dividing the estimator i n this way would also limit the effect of the outliers to the sections i n which they lie. O f course, the extent of this division would be limited by the number of samples, considering that a covariance function and trend function must Chapter 2. The Evolution of Geostatistics 19 be estimated for each section. W h e n Z ( x ) , Z ( x i ) , . . . , Z ( x „ ) are neither jointly gaussian nor jointly lognormal, then CE is neither linear nor log-linear. Therefore, i n this case, when the number of samples does not allow ample division of the area as described above, we must look to nonlinear estimators of Z(~x). T h e first nonlinear estimator to be developed was Disjunctive Kriging. Devised by M a t h eron (1976), it is still one of the most popular non-linear estimators, at least i n research if not i n practise, and for this reason I shall describe it i n more detail. 2.4.1 Disjunctive Kriging (DK) Matheron's motivation behind Disjunctive K r i g i n g is to increase the space of possible estimators of Z(x), rather than restricting them to linear ones, but i n such a way that the computations involved do not require information that could not be estimated from the data. T h e a i m is to find an estimator somewhere between OK and CE, again assuming a parametric distribution for Z(x). Therefore, instead of choosing a linear function of the sample variables as for the O r d i n a r y K r i g i n g estimator, Matheron estimates the variable Z{x) by a sum of n Immeasurable functions each a function of one sample variable Z ( x ; ) : n (2.27) i=i which satisfy the following unbiasedness constraints: E(Z (x) DK | Z(x,-)) = E(Z(x) | Z(x.j)) j = 1,... ,n. (2.28) Chapter 2. The Evolution of Geostatistics 20 These unbiasedness constraints of ZDK{~X) are chosen to be as close to the conditionally unbiased property, E(Z K(X)\Z(X-I) D • • • Z ( x ) ) = CE, without destroying the n disjunctive nature of the estimator. The reason for this will become clear later on i n this section. T h e functions fi must be Immeasurable for E[fi (Z(x.i)] < oo for each x ; , and thus for 2 ZDK{*) to have a finite variance. A s the system of equations defined i n (2.28) cannot be solved i n general, more assumptions must be made to simplify the problem. F r o m now on, for ease of notation, x shall be denoted by Xo. If we can assume that Zfa), Z ( x i ) = to(Y(xi)), i = 0 , . . . , n, are transformations from a process such that w i t h to assumed known and Immeasurable, and the process F ( x ; ) , denoted by Yi i = 0 , . . . , n, forms an isofactorial model such that: 1. Yi, i = 0 , . . . , n, have the same distribution function F 2. T h e joint distribution functions Fij for all pairs of variables (Yi, Yj) can be factorised such that: = (f^TSWWtyi)^)) Fi^ViAVj) \fc=0 F(d )F(d ) yi yj Vi, j (2.29) / where v , k — 0,1,..., are orthogonal functions i n that Vz, E[ri (Yi)ri (Yi)] k kl k2 =0 for k ^ k , and T (i, j) = E [rik( i)Vk(Yj)], Y x 2 k then the system of equations i n (2.28) can be greatly simplified, m a k i n g it possible to estimate the functions fi, i = 0 , . . . , n, or more precisely hi, i = 0 , . . . , n, where hi(-) = fM-)). (2.30) For a general isofactorial model, understanding of the following theory can become unnecessarily confused by notation. Therefore, as only one isofactorial model seems to Chapter 2. The Evolution of Geostatistics be used i n most practical applications of DK, 21 I shall derive the DK estimator for this specific model. The following steps are very similar to those i n the general case: If the sample variables Zfc) are transformations from the random variables Yi whose distribution function F is iV(0,1), and whose joint distribution functions Fij are bivariate normal w i t h marginal means 0, marginal variances 1, and correlation coefficient pij = E(YiYj) (which i n this case is also the covariance), then the Yi can be shown to form an isofactorial model w i t h the Tjk(Yi) being the normalised Hermite polynomials: V k ( Y i ) =k\ k = > >~> 0 1 w h e r e = e ^ ( e ;' ( 2 ' 3 1 ) These polynomials form an orthogonal basis of Immeasurable functions of a single JV(0,1) distributed random variable. Therefore if we rewrite (2.27) i n terms oi hi, i — 0 , . . . , n: Z (x) = DK J2h(Z(xi)) i=l = j^hiiYi), = JZh^iXi)) i=l (2.32) i=l then as Yi, % = 0 , . . . , n, are distributed N(0,1), and the functions hi, i = 0 , . . . , n, are Immeasurable, each element in the sum can be written as an expansion of Hermite polynomials i n the following way: oo h (Yi) = £ W Y ) fc=0 t Similarly Z{x ) 0 oo = £ fc=o TJ (V\ A.fc^^. - (2.33) lc can be expanded in this way: Z(xo)=u,(Yo) = f > ^ T ^ ( - ) 2 3 4 Under the assumption that we know the transformation cu, we also also know the coefficients bk, k = 0 , 1 , . . Now, i n order to solve for hi, i = 0 , . . . , n, or more specifically, for Chapter 2. The Evolution of Geostatistics 22 the coefficients A ^ , i = 0 , . . . ,n, k = 0 , 1 , . . , we must go back to (2.28) which can be rewritten as: I it ( i(Yi) E h i) = E(Z(x ) Y 0 I Yj) j = 1 , . . . , n. (2.35) i=l B y inserting (2.33) a n d (2.34) into (2.35), we get: oo oo n \ Y.Y.^E{H {Y )\Y ) k fc=0 i=l K l = Y.T-^(H {Y )\Y ) ] k - k=0 which, using the fact that oo E L Q ] j = l,...,n, (2.36) - K E(H (Yi) \ Yj) = pij H (Yj) for all triples (i,j,k), becomes: k k k oo n X > = fc=0 i=l E hp^VkiYj) j = 1 , . . . , n. (2.37) k=0 T h i s is where the advantage of expressing the hi, i — 0 , . . . , n, as expansions of Hermite polynomials is realised. B y multiplying both sides of (2.37) by rjk(Yj) and t a k i n g expectations, we can use the orthogonal properties of the polynomials to divide each equation for j i n (2.37) into infinitely many smaller equations, one for each k: it *Pn x k = *A>;* j = l,...,n, 6 A; = 0 , 1 , . . (2.38) i=i Thus, for each k, we now have n equations and n unknowns allowing us to solve for each set of Ajfc, i = 1 , . . . , n, separately - thus the title Disjunctive K r i g i n g . In practise, the Hermite polynomial expansions must be truncated to form a finite sum, such that our DK estimator becomes: n ZDKM K = E E n oo n ^VkiYi) ~ E E W ( ^ ) = E W ) , i=l fc=0 i=l fc=0 (2-39) i=l where the coefficients Xi are found by solving K sets of n equations as defined i n (2.38). k Chapter 2. The Evolution of Geostatistics 23 A t this point it must also be noted that for F ( x ) to have an invariant distribution w i t h respect to x , as required for an isofactorial model, so must zT(x), under the common transformation u>. A s a consequence, Z(x) is required to be stationary i n both its mean and variance. Another major drawback is that this model has no allowance for an anisotropic covariance function. DK has been highly criticised for these very strict assumptions innate to the model, which i n many cases render DK an inapplicable form of estimation. DK also poses some difficult problems when put to practical use: the most important being the estimation of to. The only documented method of estimation seems to be in (David, 1988). T h e C D F is estimated from the sample values, and the C D F of a N(0,1) random variable is transposed onto the same axes. Each sample value 2(x;) is transformed to the corresponding value y{x.i) on the N(0,1) curve, chosen as the value below which the same proportion of the distribution lies. T h i s is illustrated i n Figure 2.2. F(z) _ • ' F(y) F(yO=F(Zi) -0.6 -0.4 -0.2 yi 0 0.2 0.4 Zi 0.6 0.8 1 Ore Grade (z) Figure 2.2: This is an illustration of how the sample values . z ( x i ) , . . . , z(x ) are transformed to corresponding values y ( x i ) , . . . , y ( x ) from a iV(0,1) distribution. F(z) is the C D F of the sample values, and F(y) is the C D F of a N(0,1) random variable. n n Chapter 2. The Evolution of Geostatistics 24 B y then plotting z(x{) against t/(x;), we can fit a function to of the form: Z(x) = u(Y(x)) = £ b (Y(x)), (2.40) kVk fc=0 estimating the coefficients b by least squares or some other similar method. k T h i s method only attempts to estimate to that transforms Z(x) variable Y(x), to a N(0,1) random w i t h no consideration of the bivariate distributions between Y^s at different locations. Due to there being only one realisation of each random variable Yi, no inference can be made about these bivariate distributions, so an assumption that the estimated to also provides normal bivariate distributions Fij, V i, j must be made. Also, accurate estimation of the C D F from correlated data is directly related to histogram estimation. A s discussed earlier, this is a difficult task.. Disjunctive K r i g i n g is also computationally heavy i n comparison w i t h the previous linear estimators, and as Hawkins and Cressie (1984) point out, the procedure for estimating the functions fi is highly nonrobust. E s t i m a t i o n of the correlation (covariance) function p and the coefficients b are completely k dependent upon the transformed data y(xi), i = 1 , . . . , n. A bias i n the estimation of to, from misspecification of the C D F say, could lead to a compounded bias i n the resulting estimator. { Most other non-linear estimators have been developed especially to estimate recoverable reserves rather than total reserves, where instead of estimating the reserves directly from the sample variables Z{xi), indicator variables of the following type are defined: I(x;z) 1 if Z(x) < z w.p. p(z) 0 otherwise w.p. 1 — p(z) (2.41) one for each sample variable, where z is a cut-off grade below which the ore is not recoverable. Stationarity i n the mean and variance of Z{x) assumed throughout. and thus I(x; z) are b o t h Chapter 2. The Evolution of Geostatistics One such estimator is 2.4.2 25 Indicator Kriging, as derived i n (Journel 1983). Indicator K r i g i n g Indicator K r i g i n g , or IK, (IK) is principally designed as a non-parametric method of esti- mation arguing that i n nature, variables are rarely gaussian, especially the uncontrolled spatial distributions found i n the earth sciences. Therefore, Journel proposes to use the data to model uncertainty i n the value at an unknown point x i n the form of a conditional distribution function F(x;z\n) of Z(x). This is as opposed to assuming the underlying distribution and using the data to find an optimal estimator. T h i s conditional distribution function is modelled as a weighted linear function of the indicator functions at the data points: n F(x; z\n) = P(Z(x) < z\Z(x )... Z(x )) = £ ^ ( x ; z)I(x ; z). 1 The unknown coefficients n (2.42) 3 aj(x;z) are estimated using OK w i t h a non-parametric dis- tance related covariance function, e.g.^^p, just as i n distance related moving average estimators, where D is a constant. T h e n by repeating this for z , k = 1 , . . . , K, a discrete k approximation of F ( x ; z\n) can be found. This is of course w i t h the condition that the resulting estimate is a valid distribution function i.e. F G [0,1] and F(z) z > z'. In order to provide an estimate of Z(x) a loss function L. B y using F ( x ; , z | n ) , ZJK, > F(z') when given the sample values, Journel defines the estimate at x , is chosen as that which minimises: E[L(z IK - Z(x))\Z(x )... Z(x )} = / f ^ L(z x - £f=i L(z n IK IK - z)dF(x; z\n) - 4 ) [ F ( x ; z \n) - F ( x ; z \n)) k+1 k (2.43) Chapter 2. The Evolution of Geostatistics where z' = k 26 ^+ , Zfc+ 1 Other non-linear estimators include: which is very similar to IK Probability Kriging as derived i n (Sullivan 1984) but uses instead a linear K r i g i n g estimator involving b o t h the sample variables and the indicator variables for F(x;zfc|n); as derived i n (Verly, 1984); and Multigaussian Kriging Bigaussian Kriging as derived i n (Marcotte and D a v i d , 1985), b o t h of which go back to various gaussian assumptions about Z(x) distributions of the sample variables. and the joint Chapter 3 Kriging with a nonhomogeneous trend In practise, most ore surfaces are not homogeneous i n their mean, but instead the mean follows a systematic trend. This can sometimes be diagnosed by looking at the estimated covariance function. If the correct covariance function is fitted, and is significantly nonzero for large values of h, this implies that surface variables that are far apart are still significantly correlated. This correlation over large distances can be described by a deterministic function, the trend surface /i(x), relating the two variables. T h e way i n which geostatistical theory deals with this nonhomogeneity i n the mean is to model the surface values relative to this trend, rather than the surface values themselves. T h i s leads to the model: Z ( x ) = ( x ) + e(x), (3.44) M where ^i(x) = trend surface, assumed non-random, e(x) = small scale regionalised variable incorporating the randomness i n Z ( x ) , and £ ( Z ( x ) ) = n(x), Vx => £ ( e ( x ) ) = 0, V x . (3.45) So now, as long as the trend surface is known or can be well-estimated, the new regionalised variable e(x) is stationary, and so can be modelled using regular geostatistical 27 Chapter 3. Kriging with a nonhomogeneous trend 28 theory. T h e covariance function for e(x) is of course the same as for Z(x), as the trend surface is assumed to be non-random. The problem w i t h the above model, which will be discussed i n greater detail later, is i n its practical application. The surface value, z(x), at each location x is viewed as one realisation of the surface variable Z(x). W h e n the mean is stationary, we can use every sample value to estimate the mean ft, whereas now we have only one value w i t h which to estimate £t(x), for each point x. Even if we were to know z(x) for the whole surface, which would then make estimation unnecessary, we still could not estimate yu(x) exactly. So really, this trend surface only exists as a theoretical tool to make the regionalised variables stationary, but i n practise it is impossible to find. Despite these practical drawbacks, Universal Kriging, as devised by M a t h e r o n (1969), is still the only form of estimation i n geostatistical theory that incorporates a nonhomogeneous trend. For this reason, much has been written on this estimator, and i n particular on unbiased estimation of the covariance function using sample values from nonhomogeneous sample variables. In this chapter, I first outline the original derivation of the Universal K r i g i n g (UK) estimator by M a t h e r o n . A s UK is also a B L U E estimator, its derivation is very similar to that for OK i n Chapter 2. Pointing out the weaknesses i n applying the explicit form of the estimator, I then describe a second form of the UK estimator proposed by Ripley (1981). I verify that the two forms of estimation are equivalent. I show how Ripley's form of estimation avoids the various practical drawbacks of using the explicit estimator, and I describe exactly how the final estimator and error variance can be found accurately and efficiently. T h i s is very important when multiple estimates have to be found as i n Chapter 3. Kriging with a nonhomogeneous trend 29 Chapter 5. Finally, I discuss the problem of estimating the covariance function unbiasedly from nonhomogeneous sample variables. I describe how this led to the theory of intrinsic ran- dom functions and a proposed form of unbiased estimation of the generalised covariance function. I also describe other forms of unbiased covariance estimation that have been developed since then. Before deriving the Universal K r i g i n g estimator Z (x), I shall list the assumptions UK required, i n addition to those outlined i n Chapter 2. U K m o d e l assumptions 1. T h e basic model for each regionalised variable Z ( x ) at a point x is as defined i n equations (3.44) and (3.45). 2. T h e trend surface p(x) is unknown, yet we know its shape, i n that it is of the form: /x(x) = £ / ? / / * ( x ) = / 3 f ( x ) , r (3.46) where / 0 A and f(x) = /o(x) X /i(x) V hW J T h e trend component functions, / ; ( x ) , defining the shape of the trend, are known, but the parameters B are unknown. t Chapter 3. Kriging with a nonhomogeneous trend 30 3. T h e matrix F, defined as /o(xi) /L(XI) F = (3.47) ^ /o(x ) n .... / ( x „ ) J L is of rank L + 1, where x i , . . . , x are the sample points. T h i s means that the n L + l vectors ( / ( x i ) , . . . , / ( x ) ) , . - . . , (/L(X ), . . . , /L(X„)), must be linearly inde0 0 n x pendent. It is worth noting at this point that although F is full rank, F F T becomes \ illconditioned for relatively small L and will cause roundoff errors to occur when calculated. Numerical solutions to this problem will be discussed later. 3.1 Original derivation of Matheron's Universal K r i g i n g estimator, A s for O r d i n a r y K r i g i n g , Matheron considers the Universal K r i g i n g estimator ZUK^) ZJJK^) to be a linear function of the surface variables Z ( x i ) , . . . , Z ( x ) at the sample points n X i , . . . , x „ , such that: n Z (x) = UK 5Xx)Z( ) i=l Xl (3.48) = Z^A(x), where Z and A ( x ) are as defined i n equation (2.10), and the weights Aj(x) are unknown. n For ZIJK(*) to be unbiased, we again need the expected value of the error to be zero, such that: E(Z(x) - Z (x)) = UK E(Z(x)) n E(Z (x)) UK = ^( ) - E *( M i) X A x x i=l (3.49) Chapter 3. Kriging with a nonhomogeneous trend 31 Therefore, for (3.49) to be true, independent of the values of Pi, we require: f(x) = f > ( x ) f ( ) , l = 0,...,L, X i i=i written simply as f(x) r = A(x) F. (3.50) T We now want to find the optimal weights which minimise the error variance, <r (x), 2 under the L+ 1 unbiasedness constraints defined i n (3.50). So again, by introducing the Lagrange multipliers <b = (<f>o,..., 4>L) for the L + 1 unbiasedness constraints, we need T to minimise: H \{x) F)cf> T = a (x) + 2 ( f ( x ) - = a (x) + A ( x ) K A ( x ) - 2A(x) k(x) + 2(f(x) - A ( x ) F ) 0 , 2 T 2 T T r T (3.51) w i t h respect to A ( x ) and <p. Therefore, d -- 2K\(x) - 2k(x) - 2F<p = 0 H d\{.x H = 2(f ( x ) - A ( x ) F ) = 0, r r which i m p l y respectively that KX(x) f(x) = k ( x ) + Feb T (3.52) = A(x) F. (3.53) r F i r s t we solve for (p. A s K is the covariance matrix, it is symmetric and non-negative definite by construction. We further assume that it is positive definite, and thus exists. B y multiplying (3.52) by F K~ T l and transposing (3.53), we get: K~ l Chapter 3. Kriging with a nonhomogeneous trend F A(x) = F /C k(x) + F A(x) = f(x). T T r 32 F K~ F(f) _ 1 T (3.54) 1 (3.55) So, by equating (3.54) and (3.55), we get: F ^ ^ x ) + F K~ Fcb T = f(x). 1 (3.56) A s K is assumed positive definite and F is assumed to be of rank L + 1, it follows that A = (F K~ F) T 1 is of full rank L + 1 and thus its inverse exists. Therefore: <t> = A~H(x) - A- F K~ \i{x). 1 T (3.57) 1 B y substituting this solution for <f> back into (3.52), and multiplying by K , -1 we have the following solution for A(x): A(x) = K - k ( x ) + i T - F v l - ( f ( x ) - F K - k ( x ) ) . X 1 1 T (3.58) 1 Thus, by substituting (3.58) back into (3.48), the Universal Kriging estimator, of Z(x), ZUK(X), as defined by Matheron, can be written explicitly as: Z (x) VK = Z^A(x) = T7 (K~ \{X) + K " F ^ - ( f ( x ) - F K-^{x))) X n 1 1 T . (3.59) In order to calculate the kriging estimate, we need to define the trend component functions, fi(x), i n equation (3.46). A s discussed earlier, this trend surface p(x) is only constructed i n theory so as to make the residuals of the random function, Z(x), homo- geneous. Therefore we have little idea of what the trend component functions should be. In most cases, the trend surface is assumed to be a polynomial of some general order p, where Chapter 3. Kriging with a nonhomogeneous trend 33 ' /o(x) ( A /i(x) f(x) = ( x\ = 1 if p = 0, X L = Hp = 2, if p = 1, V /L(X) ; and 1 ^ xy n=i If p = 0, the C/K kriging equations (3.52) and (3.53) become identical to the OK kriging equations, as one would expect. Also, as Z (x.) is a (linear) unbiased estimator that minimises the variance of the error, UK it must necessarily estimate the exact values at the the sample points w i t h zero error variance. Therefore, assuming there to be no measurement error i n the sample values, UK is an interpolator. This can be very difficult to see by looking at the UK estimator. We must go back to the original UK kriging equations defined i n (3.52) and (3.53). B y letting x be a general sampled point X j , and noticing that (3.52) and (3.53) can be written as: [k(x )...k(x )...k(x )]A(x ) = k(xi) + F 0 1 i n i ( f(x,) = r f( X l (3.60) f ^ A(x^ (3.61) f(x ) n 3 Chapter 3. Kriging with a nonhomogeneous trend it is easy to see that a solution to (3.60) and (3.61) is 34 AJ(XJ) = 1, AJ(XJ) = 0 V7 ^ i, and (p = 0. A s we know that this system of equations has a unique solution then this must be i t . Therefore zuxi^-i) = zfai) VXJ. B y substituting this solution for A(XJ) into equation (2.13) for the error variance, it is easy to see that the terms cancel to leave a zero error variance as expected. ZUK(X) is always an interpolator, to the extent that if C{h) is discontinuous at zero, then the predicted surface becomes discontinuous at the data points. A s an example, if we assume that the trend is homogeneous and the surface variables at a l l points on the surface are completely uncorrelated, such that p = 0, and: o if / i = 0 0 otherwise 2 C(h) = (3.62) then it is easy to show that ZUK{X) predicts a horizontal surface of height ^ J2?=i z ( x the mean of the sample values, which becomes discontinuous at the sample points. 3.1.1 Discussion It is a well-known numerical fact that though products and transposes of large matrices can be calculated fairly easily and accurately, inverses are a completely different proposition. A s K is an n x n matrix and A an (.L + l ) x (L + l) matrix, calculating K" 1 and A" 1 w i l l become extremely heavy i n computation i f either n or L become large. Additionally, in order to estimate the covariance function, a relatively large sample would need to be taken, naturally forcing n to be large, unless a covariance function is assumed known. For this reason, it is very difficult to find a computer package that can cope w i t h the magnitude of the calculations. B o t h M a t h e m a t i c a and Maple are symbolically-oriented packages and so are very inefficient i n dealing w i t h numerical problems. I suggest that i)> Chapter 3. Kriging with a nonhomogeneous trend 35 for any extensive data analysis that one uses one of the more numerically-based packages such as M a t l a b . I wrote a program i n M a t l a b to estimate a surface from a set of samples using the above explicit form of the UK estimator. A l t h o u g h it was no problem to calculate accurately, A~ l became ill-defined once the trend p(x) K~ l was assumed to be of order 3 or above. T h e reason for this lies behind the construction of A. A s K is a covariance matrix, it can be very close to the identity matrix depending on the range of influence of the covariance, and therefore so can its inverse. So ultimately A is composed of the inner products of the columns of F, which is where our problem arises. Under our assumption of a polynomial trend, the first column of F is composed of ones, the second column of the x coordinates of the n samples, the t h i r d column of the y coordinates, and so on, w i t h increasing order of exponent. Therefore as p, the dimension of the trend, increases, we have a great difference i n the order of magnitude of the columns i n F. As A has the square of this order of magnitude difference i n its components, this w i l l result i n it having a near-zero determinant and thus an inaccurate calculation of its inverse. A n o t h e r form of the Universal K r i g i n g estimator can be found i n Ripley (1981). A l t h o u g h it seems very different i n its derivation, it will be shown to be identical to Matheron's estimator i n equation (3.59), and it avoids finding the inverse of b o t h K and A. It also provides, I feel, a clearer understanding of how Universal K r i g i n g estimates the trend ^i(x) and how the kriging weights Aj(x) are related to the covariance function. For situations where multiple estimates must be found, such as i n Chapter 5, this form of estimation also proves to be more computationally efficient. Chapter 3. Kriging with a nonhomogeneous trend 3.2 36 Ripley's form of the Universal Kriging estimator Instead of estimating z ( x ) directly from the sample values -Z(XJ), Ripley approaches the problem from a point of view analogous to that i n time series analysis. He estimates the trend fj,(x) directly and then models the residuals, i?(x) = Z(x) — / i ( x ) , just as we d i d for Z ( x ) before, but now assuming they are stationary w i t h mean zero. T h e same basic model assumptions and notation as defined for Matheron's estimator still hold. Before proceeding w i t h the derivation, I define some well known numerical decompositions which are used i n the analysis: 3.2.1 Cholesky factorisation Cholesky factorisation states that if an (JV x N) matrix X is non-negative definite, then it can be uniquely factorised into the product of a lower triangular m a t r i x and its transpose. X = MM , T (3.63) where M is also (N x N). 3.2.2 Q R Decomposition Q R decomposition states that any (N x (L + 1)) matrix X can be decomposed into the product of an (N x N) orthogonal matrix Q, and an (JV x ( L + 1)) m a t r i x R, such that: X = QR, (3.64) where QQ = Q Q = 1, T and T R=^ J, R is an ((L + 1) x ( L + 1)) upper triangular matrix. (3.65) Chapter 3. Kriging with a nonhomogeneous trend 3.2.3 37 Singular Value Decomposition (SVD) Singular Value Decomposition states that any real (N x (L + 1)) m a t r i x X can be decomposed into the product of an (N x (L + 1)) m a t r i x U, a ((L + 1) x (L + 1)) diagonal m a t r i x E = diag(v\,..., ox+i), and an ((L + 1) x (L + 1)) m a t r i x V such that: X = C/£V where U U T (3.66) T = VV = VV T T = I. T h e columns of U are the L + 1 orthonormalised eigenvectors associated w i t h the L + 1 largest eigenvalues of X l ' , the columns of V are the L +1 orthonormalised eigenvectors 1 associated w i t h the Z- + 1 eigenvalues of X X, and (01,... ,CTL+I) are the non-negative T square roots of the eigenvalues of X X, T called the singular values. T h e proof of this can be found, for instance, i n Moler (1967). The advantages of using S V D are discussed i n Section 3.2.4. 3.2.4 Step 1: Estimating the trend A s we assume the trend to be of the form: (3.67) 0(x) = /3 f(x), T w i t h the components of f (x) assumed known, then under the basic model, we arrive at the following vector equation for the ore grade variables at the sample points: 1 Z n = Ff3 + e , n *(x0 X e(x ) 2 where e (3.68) n ^ *(Xn) j Chapter 3. Kriging with a nonhomogeneous trend 38 where Z„, F, and f (x) are as defined i n the last section, and the residuals, e , are assumed n to be correlated random variables with mean zero and covariance matrix K. We then find the generalised least squares ( G L S ) estimator, (3, which is the solution of: min (Z - Fb) K- (Z - Fh) l T n n b = min £^K~ e l n b n n = ,b m V(*i)> i n (3-69) i=lj = l = ( e ( x i ) , . . . , e ( x ) ) is the vector of errors. T h e idea behind this is that i f the where n ore grade variables Z(XJ) and Z(x.j) at the sampled points j and X j are highly correlated, X this reduces the worth of their combined information. Therefore, the multiple of their associated errors, £(XJ) and £( j)> will have a reduced weighting by being multiplied x by a factor inversely proportional to their covariance COV(Z(XJ), Z(x.j)) = Kij. T h i s w i l l reduce the effect that a cluster of points has on the polynomial fit, thus providing a better fit to the more isolated points. A s K is a covariance matrix, it is non-negative definite by construction. Therefore we can use Cholesky factorisation as defined earlier to write K~ i n terms of M l K' 1 = (MM )~ T l = M~ M-\ - 1 , (3.70) T where M is a lower triangular matrix, and ( M _ 1 ) = (M )~ T T 1 = M~ for ease of notation. T Therefore equation (3.69) can be written as min (Z - Fb) M- M~ (Z T T - Fh) l n n b = min ( M - Z 1 b - M- Fb) (M- Z„ 1 n T 1 M~ Fb). l (3.71) Chapter 3. Kriging with.a nonhomogeneous trend 39 * B y transforming the problem i n this way, we have effectively uncorrelated the -Z'(XJ), and thus the residuals e(xj). T h e new residuals of the transformed problem, £ ( X J ) , are independent random variables with variances equal to one: Cov(0 = Cov(Y ) = C o v ( M = M~ KM~ n X _ 1 Z ) = M n = M~ MM T x T M" T _ 1 Cov(Z )M~ T n = I. (3.72) It then becomes clear that the generalised least squares estimator J3 is the standard least squares estimator of the transformed problem: M~ Z l = M~ Ff3 + M l n _ 1 e , (3.73) n which can be written as Y „ = G/3 + £ , (3.74) n where Y = M~ Z , 1 n n G = M~ F If we assume the residuals e 1 n and £ = M~ e . (3.75) l n n to be jointly gaussian, then the new residuals £ n w i l l be jointly gaussian, and the G L S estimator 0 w i l l be the maximum likelihood estimator of BA l t h o u g h the transformed variables F ( x ^ ) are uncorrelated among themselves, they rem a i n correlated w i t h the unknown ore grade variable Z(x) that we are t r y i n g to estimate, w i t h new covariance Chapter 3. Kriging with a nonhomogeneous trend k (x) y 40 = Cov(Z(x),Y ) = Cov(Z(x),M- Z ) = M- Cov(Z(x),Z ) = M~ k(x). 1 n n 1 (3.76) 1 n Notice that since M is a lower triangular matrix, the transformations as denned i n (3.75) can be carried out using a sequential substitution method. For example, Y n = M _ 1 Z n can be written as M Y = Z . n (3.77) n which expanded, becomes MnY, MY = Z( ) X l l + M Y M Y + M r MY + M Y 2l Z1 nl X x 22 3 2 n2 = Z(x ) 2 2 + M y 2 33 + M Y 2 n3 3 = Z(x ) 3 3 + ... + M Y nn n = Z(x ) n B y solving the first equation for Y\ and substituting it into the second equation, it becomes obvious how Z n can be easily transformed to Y „ without having to find M - 1 . Similarly we can find G, the transformed matrix of F, by letting MG = F and applying the same method to each column of G, as if it were Y n i n the previous example. A s stated earlier, we can now use standard least squares methods to estimate j3 from the transformed data. Chapter 3. Kriging with a nonhomogeneous trend 41 S t a n d a r d Least Squares A n a l y s i s of the transformed problem Following from equation (3.71), we now want to find the standard least squares (LS) estimator of f3 that satisfies min H = min ( Y - G b ) ( Y - Gb). T n b n (3.78) b To find the usual form of this estimator, we expand H to get H = YlY - 2b G Y T n A s G G = F K~ F T T + b G Gh. T T n T (3.79) = A , as defined earlier, G G is symmetric and positive definite. l T Therefore, as H is quadratic i n b , when we differentiate H w i t h respect to b , the unique stationary point solution must necessarily be at a minimum: ^- — —2G Y ah < T As F F T + 2G Gh = 0 T n => J3 = (G G)~ G Y . T 1 (3.80) T n is an ill-conditioned matrix for relatively small values of L, then G G T likely to be ill-conditioned, especially if K is close to the identity matrix. is also Therefore in calculating /3, as defined above, we run into exactly the same problem as before, of finding the inverse of an ill-conditioned matrix. In this method of L S has been developed to avoid this problem where the columns of G are orthogonalised using the QR decomposition. A s G is an (N x ( L + 1)) matrix, it can be decomposed into an (N x TV) orthogonal m a t r i x Q, and an (N x ( L + l ) ) matrix R , such that G = QR, where R and R is an ( ( L + 1) x (L + 1)) upper triangular matrix. equation (3.78), (3.81) Therefore, following from Chapter 3. Kriging with a nonhomogeneous trend m i n ( Y - Gb) (Y - Gb) T n n = 42 m i n ( Y - Gb) QQ (Y T - Gb) T n n b = min ( Q ( Y - Gb)) (Q (Y r T n T n - Gb)) b = Now if we define the vector Q Y T n m i n (Q Y T - Rb) (Q Y T n T n - Rb). (3.82) as two sub vectors i n the following way: QY (3.83) T n \ J2 where J i is a ((L + 1) x 1) vector, and J is thus an ((n — L + 1) x 1) vector, equation 2 (3.82) becomes -Rb V Ji T ( J - Rb \ mm b x m m ((Jx - Pib) (3 - Rb) + j £ j ) . T 1 \ J 2 2 (3.84) J F r o m looking at equation (3.84), it finally becomes obvious that i t is minimised for b satisfying J x = R/3. (3.85) Since R is an upper triangular matrix, b can be found by applying the sequential substitution method as before. We can summarise the above i n the following steps: 1) F i n d K, either by assuming a covariance function, or by some form of estimation. 2) Perform Cholesky decomposition on K to find M. 3) F o r m F by inserting the sample coordinates into the polynomial trend component Chapter 3. Kriging with a nonhomogeneous trend 43 functions. Transform the sample data Z 5) A p p l y similar substitution method to find G, where MG = F 6) Perform Q R decomposition on G to find R 7) Calculate Q Y 8) Use reversed substitution method to find 0 where Rf3 = J i T n to find J n to Y n by solving MY = Z 4) n n for Y(xi). x ~ T Thus we have found the G L S estimator (3 and have an estimate /2(x) = j3 f (x) of the trend. A t this point it is worth noting that the above method of avoiding roundoff error for the illdefined L S problem is only one of many. A n alternative method called singular value decomposition (SVD), as defined i n Section 3.2.3, is probably the most widely used, as it is not only very stable, but it also decomposes G i n such a way that it reveals a great deal about its structure and stability. B y allowing the practitioner to keep an eye on the condition number of G G, S V D provides a solution that can be easily altered to avoid roundoff error when G G is unconditioned. None of the other methods, including T T Ripley's, allow this versatility. T h e algorithm for this decomposition is based on the original procedure by G o l u b and Reinsch (1970), using two finite sequences of Householder transformations to reduce G to a bidiagonal matrix, and then using a QR algorithm to find the singular values. Other references are G o l u b and V a n L o a n (1983) and Press et al. (1989). After decomposing G i n this way, we look at the singular values. A l l the singular values whose ratio to the largest singular value (the reciprocal of the condition number) is less than a certain multiple of the machine precision must be set to zero. B y doing this, we throw away the linear combinations of the set of equations associated w i t h these singular values. T h e y Chapter 3. Kriging with a nonhomogeneous trend 44 are so corrupted by roundoff error, that they have the effect of pulling (3 away from the L S solution. T h e final S V D solution of (3 for the linear system G(3 = Y n (3* = where E _ 1 is: VE~ U nY, U * is E _ 1 with T (3.86) 1/aj replaced by zero liuj = 0 for j — 1 , . . . , L +1. T h i s solution is also extremely versatile in its application. If G is square and non-singular, then (3* is the exact solution. If G is singular, whether it is square or not, then if Y „ is i n the range of G , there are an infinite number of solutions, and (3* is the one w i t h the smallest length (3 f3, for increased stability. Lastly, and most importantly, if Y T n is not i n the range of G, then \3* is the L S solution that minimises {G(3 — Y ) (G(3 — Y ) . T h i s is proved i n T n n Press et al. (1989). Despite the versatility and stability of this method, Ripley's method is more than adequate for the L S problems that I encounter i n Chapter 5. 3.2.5 S t e p 2: M o d e l l i n g t h e r e s i d u a l s , i ? ( x ) Once the trend has been estimated, we then find the residuals, r ( x ; ) , such that r(xi) = z(xi) - p,(xi) = z(xi) - J3 f ( x ^ , Vx; (3.87) and r(xi) r(x ) 2 \ (3.88) T h e residual r ( x ; ) for each sample is seen as a realisation of the residual variable -R(x,) = Z(XJ) — /X(XJ). Under the assumption that /i(x) is the actual trend, i ? ( x ) is equal to the Chapter 3. Kriging with a nonhomogeneous trend 45 theoretical residual random function e(x) defined earlier. Therefore, we assume that R(x) is a homogeneous random function such that E(R(x)) = 0, V x , Var(i2(x)) = Var(Z(x)) = o (x), V x , 2 z Cov(fl(x),.R(y)) = C o v ( Z ( x ) , Z ( y ) ) = C(x,y), Vx,y. Under this assumption, we can estimate R(x) using the O r d i n a r y K r i g i n g estimator ROK(X) derived i n Chapter 2 for u — 0: n ^OAT(X) = 5Xx)i2(xi) = R£TJ(X), (3.89) where Kr){x) = k ( x ) . (3.90) A g a i n , by rewriting equation (3.90) as M(M r](x.)) = k(x) and letting v ( x ) = M r / ( x ) , T r we can solve Mv{x) = k(x) for u ( x ) , and then solve M rj(x) = i>(x) for rj(x), using the T substitution method described earlier. We can summarise the above i n the following steps: -T 1) F o r m the vector of sampling residuals r 2) F o r m the covariance vector k(x) of the n sample points Xj w i t h the estimation n where r(xj) = zfc) — (3 f (XJ) point x , using the estimated or assumed covariance function, C ( x , y) 3) Solve Mv(x) = k(x) for v(x) using the substitution method 4) Solve M 7](x) = v(x) for 77(x) using the reversed substitution method T Therefore, Ripley's form of the Universal K r i g i n g estimator is: Z (x) UK = m(x) + Ro {y) = p f T K (x) + R ^ ( x ) (3.91) Chapter 3. Kriging with a nonhomogeneous trend 3.3 46 The equivalency of the two forms of estimation It is far from obvious that Ripley's estimator, as defined i n equation (3.91), is equivalent to Matheron's estimator, as defined i n equation (3.59). Therefore I now show that the two forms oi UK estimation are equivalent. 3.3.1 Finding ft explicitly RR = R R = R Q QR T T T = GG = A T R T l = A~ R l (3.92) T Therefore, using (3.92), (3.85), and (3.83), we find that (3 = R J = (A- R )J 1 1 l = T 1 A- R X ' T V J = A-'R {Q Y ) T T n 2 A- R (Q M- Z ). 1 T T (3.93) l n B y noting that R = Q G = Q M~ F, T T l and substituting for R i n equation (3.93), we T can find an explicit expression for 0: f3 = A~ (F M~ l 3.3.2 T Finding T Q)Q ROK(^) T M~ Z l = A~ F M~ x n T T M~ 7i l = A~ F K~ Z . 1 n T 1 n (3.94) explicitly A s ROK^) is defined as i?(x) = Rl(K- k(x)), (3.95) l we need to find an explicit function for R . B y first expressing the components of R „ i n n terms of the trend: R(*i) = Z(xi) - ji{xi) = Z(xi) - Z K- FA-H(x), T l n (3.96) Chapter 3. Kriging with a nonhomogeneous trend and noticing that F 47 = (f(xi), f ( x ) , . . . , f(x )), we can write R T 2 n explicitly as n Kn = Z - (Z£K- FA- F ) . 1 1 (3.97) T T n Finally, by substituting J3 from equation (3.94) and R n from equation (3.97) into equa- tion (3.91), we get an explicit expression for Ripley's UK estimator that is identical to Matheron's estimator: Z {x) UK 3.4 = m ( x ) + i?(x) = ZlK^FA'H(x) = Z + Z {I l n 1 n ( i f - ^ x ) + K- FA~ T K- FA- F )K- k(x) T (f(x) - x 1 T 1 F K~ k(x))) T 1 C a l c u l a t i n g cr (x), t h e e r r o r v a r i a n c e o f Z (-x) 2 UK The advantage of having a statistical rather than deterministic estimator of the ore grade such as Z (-x), is that one can estimate its error variance, cr (x) = V a r ( Z ( x ) — Z (x)), 2 UK UK and thus have some idea of the magnitude of the estimation error. T h e UK error variance is as derived i n equation (2.13): (T (x) = a ( x ) + A ( x ) K A ( x ) - 2 A ( x ) k ( x ) 2 2 r (3.98) T e where A ( x ) is as defined i n equation (3.58). B y substituting A ( x ) into (3.98), and simplifying the expanded equation using the fact that F K~ F T 1 = A, we arrive at the following equation for cr (x): 2 a (x) 2 = <r (x) 2 kfaf/T^x) + (f (x) - F K- k(pc)) T 1 T A' 1 (f(x) - F ^ - k ( x ) ) T 1 (3.99) Chapter 3. Kriging with a nonhomogeneous trend 48 For ease of calculation, we can write a (x) in terms of G, k y ( x ) and R which have already 2 been calculated when finding <T(X) = 2 ZTJK{X)'- a (x) + k ( x ) M - M - k ( x ) 2 r : r 1 z + ( f ( x ) - F M- M- k(x)) T = 1 R T l R T (f(x) - F M- M- k(x)) T r x ^ W+ k^xfk^x) (f (X) - GTky(x))) + (lf T T = T R T (f (x) - G k ( x ) ) a (x)+ k (x) k (x)+ h h, T y y (3.100) (f(x) - G k ( x ) ) . (3.101) 2 T y T where h = R T r y Using the fact that R is a lower triangular matrix, we can again write (3.101) as R h = ( f ( x ) — G k ( x ) ^ and solve for h using the substitution method described earlier. T 2 / In practise, I am somewhat dubious of the effectiveness of this error variance, considering the assumptions made about the surface and the random function i t apparently originated from. I would not expect the estimated error variance from the sample data to be accurate enough to be of any measure of the magnitude of the error. 3.5 E s t i m a t i o n o f the covariance function A l t h o u g h UK may have a theoretical superiority over the other estimators, involving less restrictive assumptions about the surface to be estimated, i n practise this has not proved to be true. The essential reason for C / K ' s demise as an effective estimator lies i n the estimation of its covariance function. A l t h o u g h the covariance function is assumed known for a l l geostatistical estimators, i n general i t must be estimated from the sample Chapter 3. Kriging with a nonhomogeneous trend 49 data. For this reason, the UK estimator stated earlier is somewhat misleading. Before going any further, I shall define C ( x , y ; 0) as a parametric class of models assumed to contain the true covariance function, and I shall restate Ripley's version of the estimator in equation (3.91) so as to include this information: Z (x) = P(0) f (x) + ( Z - Ff3(6)) K(0)- k(x; T UK T 1 n where J3(0) = (F K(ey F)- F K(6)- Z T 1 1 T 0), (3.102) is the optimal unbiased G L S estimator of 1 n the trend coefficients, and 6 is an estimator for 0. W i t h a homogeneous trend, the experimental variogram can be unbiasedly estimated directly from the sample data using, for example, Matheron's estimator (Matheron 1969): 2 W where N(h) = jivM ^ [z(x ° " z(Xj)]2 ' ' (3 103) = {(XJ,XJ) : X j — X j = h} and |A~(h)| is the number of elements i n N(h). There are other more robust estimators which w i l l be discussed i n Chapter 5, but the important fact is that the estimation is unbiased. Under the second-order stationary assumptions, and using the filtering property of the kriging equations, the covariance function is equivalent to the negative of the variogram for the purposes of finding the optimal kriging weights. Therefore, assuming an appropriate parametric model is fitted to the experimental variogram, estimation with a homogeneous trend is unbiased. O n the other hand, when the trend is non-homogeneous, as for UK, estimation of the covariance function poses a larger problem. The process of estimating the trend, calculating the residual estimates at the sample points, and from these estimating C(h), introduces a substantial bias (Starks and Fang 1982a). Even if we were to know the correct covariance m a t r i x K, the process of estimating the trend by the o p t i m a l unbiased Chapter 3. Kriging with a nonhomogeneous trend 50 G L S estimator /t(x), and estimating the covariance function from the resulting residuals R „ is biased. T h i s is demonstrated by the following: £(x) = f(x) 3 = R = Z „ - FJ3 = (J - FA~ F K~ )Z n f(x) A- F K- Z T T f 1 T 1 nt 1 T — BZ . 1 n n Therefore, E(R ) = F(3 - F(3 = 0 = E(e ) = BE(Z ) = (/ - FA- F K~ )Ff3 1 n T 1 (3.104) n n but, Cov(R ) n = BCov(Z )B = (I- T n FA- F K~ )K{I l — K - {KK~ FA~ F 1 l + - T T - 1 K~ FA~ F ) 1 1 T FA~ F K~ K 1 T 1 FA~ F K~ KK~ FA~ F ) l = K-FA^F # K = Cov{Z ) T 1 1 1 T 7 (3.105) n Therefore, by using an estimator of the trend, we have introduced a bias, such that the variance of the new residual process i?(x) is different from that of the original data Z. n We cannot calculate this bias either as it depends on K. A back-substitution method has been proposed as a way of resolving this problem, (Ripley 1983); (Neuman and Jacobsen, 1984). B y starting w i t h an ordinary L S estimator of the trend using covariance matrix K = la , 2 0 the sample residuals are estimated, a covariance model fitted, a new covariance matrix K 0 trend found using the new K . 0 calculated, and then a G L S estimate of the This process is to be continued until the covariance function converges. It has been demonstrated (Cressie, 1987) that this back-substitution Chapter 3. Kriging with a nonhomogeneous trend 51 w i l l not remove the bias i n the covariance function, concluding that the best any iteration method can do is converge to a biased estimator. T h i s problem w i t h UK was recognised soon after its development i n 1969, and by 1973 M a t h e r o n had already come up w i t h an apparent solution to this problem i n the form of Intrinsic Random Functions of order p (Matheron 1973), where p is the order of the trend as defined on page 32. 3.5.1 Intrinsic Random Functions of Order p (IRF-p) Even today, UK and IRF-p are referred to as two different methods of estimation, but in fact, as best linear unbiased estimators, they are identical (Christensen 1990). It is best to think of IRF-p as just a different way of looking at the UK problem, resulting i n the unbiased estimation of the generalised covariance function (GCF) K (h), rather than p the usual covariance function C{h), to be used i n the UK system of equations. A l t h o u g h this new method provides a better understanding of the UK problem and the effect of the unbiasedness constraints, there is still much debate as to whether the GCF is easier to estimate and interpret than the original covariance function. A s much of the theory of IRF-p is buried i n the previous working for UK, i t can be very difficult to see what new insight it brings. Being a subtle concept, many texts have failed to provide a clear description of IRF-p and the GCF, and of their relevance to the previously discussed problem. T h e most popular detailed account of the theory is (Delfmer 1976), whereas a brief summary can be found i n (Journel 1989). T h e idea behind this method is analogous to A R I M A models i n time series, where linear combinations or increments of the sample variables are found such that the resulting Chapter 3. Kriging with a nonhomogeneous trend 52 series or surface is stationary with respect to the mean and variance. For a general configuration of points xi,..., x , m a generalised increment (GI) of order p is any linear combination of the corresponding random variables: m m J2"iZ{xi) =v Z 5></'(xi) such that T m i=l where = 0 l = 0,...,L, (3.106) i=l /i(x), I = 0 , . . . , L are the monomial trend component functions defined earlier, and p is the order of the trend. Some Vi, but not a l l , can be zero valued. These G T s have the property of filtering out any linear combination of the trend component functions, m J2t=o dififc), / added or taken away from the random function Z(x): L \ ^2v (z(x )±j2dji(x )) i=l i \ i m = l 1=0 / /m L \ E ^ W ± E ^ 5>/<(xi) i=l I \i=l 1=0 m = A n intrinsic tion Z(x) random function for which zZTLi Y,ViZ{xi). of order p (IRF-p) ViZ{*-i (3.107) is then defined as any random func- + x) is stationary (i.e. independent of x) i n its mean and variance, for any configuration of sample points x . . . , x , and any GI vector v, 1 ; m irrespective of its covariance structure. Therefore, i f our surface Z(x) is thought to have a trend of the form zZi=o A / / ( ) P x stationary residual e(x), then: E ^ Z ( * i + x) i=i = r L Yl i E/ '/'(xt + x) + e(x + x) u i=i ? i Vl=0 i u s a Chapter 3. Kriging with a nonhomogeneous trend E '»( )/a(Xi) T i=l (=0 X Ls=0 E E f e ( x ) E^/*( * x 1=0s=0 m .1=1 53 +E ^ ( » + ) e x x i=i m + E Vit^Xi + x) 1=1 E^e(xi + x), (3.108) i=l which is thus stationary i n both its mean and variance, for any configuration of sample points X i , . . . , x m and any GI vector u. T h i s is using the fact that the monomial trend component functions fi are closed under translation, so that //(x^ + x ) can be expanded as above. Thus Z(x) is an IRF-p, or more exactly a member of an IRF-p, the class or group of random functions with a trend of the form zZiLo A / t ( ) x M a t h e r o n proves, using the theory of Generalised Functions (Gel'fand and V i l e n k i n 1964), that an IRF-p is characterised by a generalised covariance function for any GI of the (GCF) K , such that, p IRF-p: V a r ( E ViZifr) ] = V a r ( E ^e(x*) \i=l \i=l = E E WiKpifr / - x (3.109) j)- i=l j= l In a recent paper by K i t a n i d i s (1993), the relationship between the normal covariance function C and the G C F K p is finally clarified. He states that an IRF-p is a class of equivalent functions whose covariance functions could be used by UK to give the same results. He points out that i n forming the IRF-p, the unbiasedness conditions essentially divide the covariance function C ( x , y ) into an effective part, the GCF A T ( x , y ) , and a p redundant part i ? ( x , y ) : C ( x , y ) = K ( x , y ) + i?(x,y), p where K (x, p (3.110) y ) is the same for a l l the functions i n the IRF-p, and i ? ( x , y ) changes w i t h each function. B y filtering out the polynomial trend, the GPs essentially filter out this Chapter 3. Kriging with a nonhomogeneous trend 54 redundant variance 7?(x, y) associated w i t h this trend, leaving the characteristic G C F of the IRF-p. K i t a n i d i s shows that the higher the order of the trend, the more variability i n the surface that can be described by the trend, and thus the more complex i2(x, y) becomes. Therefore, contrary to popular belief, the higher the order of the trend that can be justified, the more freedom one has to estimate C ( x , y). If we now consider the configuration of sample points x 1 ; ..., x n for UK, w i t h x 0 being any point of estimation, then the UK error can be written as: n Z(xo) - Z {xo) UK = n Z(x ) - £A (xo)Z( ) = £ 0 z UiZ(xi), X i i=l (3.111) i=0 where ' -Ai(xo) ^: = < 1 if i± 0. (3.112) ifi= 0 n and the unbiasedness conditions, E i ( o ) / / ( i ) = / ; ( o ) A x x x I = 0 , . . . , L , can be written as: n 2>/i(xi) = 0 Z= 0,...,L. (3.113) i=0 T h i s implies that the UK error is a GI of order p, and as Z ( x ) is an IRF-p, the error variance <Jg(x) is characterised by the G C F as i n (3.109). This can be minimised w i t h respect to the unbiasedness constraints, as for UK, producing the same set of equations for the kriging weights A ; ( x ) and the Lagrangian parameters <pi, but w i t h the covariance 0 function replaced by the GCF. Thus, for K r i g i n g purposes, instead of finding the specific covariance function C of the random function Z(x)\ characterising the class of random functions IRF-p. we only have to find the GCF K p Chapter 3. Kriging with a nonhomogeneous trend Theoretically, we can estimate K 55 unbiasedly without estimating the trend by finding a p function which satisfies (3.109) for any GI satisfying (3.111) and (3.112). A s s u m i n g N such G T s , h,.. .,IN can be found, where Ii = £ " = 1 K (h) is estimated by a VijZfc), p L S regression, minimising the function R: R = E [i - E(I ) 2 2 i=i N VijZ(%) E E - Vi=i i=i E "ijVikKpdxj E - (3.114) x |) fc j=i fc=i / T h i s uses the fact that the GTs now have zero mean. K is assumed t o be stationary p and isotropic i.e. K (h). K is also chosen as a polynomial which depends linearly on p p its parameters so that the regression simplifies to the unique solution of a set of linear equations, otherwise the regression would be very difficult to solve. A n o t h e r restriction is that i t must be positive definite for it to be a valid covariance function. It has been proven (Matheron 1973) that a valid model for a positive definite polynomial GCF is: a — 0 K (h) = p a 0 for p — 0 ai\h\ — \h\ + a | / i | a - osi\h\ + a \h\ - a \h\ 3 0 5 2 where a , a i , a 3 > 0 and a 0 2 > (3.115) for p = 1 3 2 3 for p = 2 - j^/aitt3. A more recent improvement on the form of these functions can be found i n (Delfiner et al., 1978) where the Green's function | / i | l o g | / i | is added to K (h) defined above for b o t h 2 p p = 1 and p = 2, and | / i | l o g | / i | for p = 2. Also a is replaced w i t h a 8(\h\), where 6(\h\) 4 0 0 is the D i r a c delta function, allowing for a discontinuity at h = 0, the nugget effect. This can arise from measurement error, incurring variance i n the value measured at a point. Chapter 3. Kriging with a nonhomogeneous trend 56 A s for finding the GFs of the random variables, Z ( x , ) , i = 0 , . . . , n , various methods have been proposed, the most practical probably being that proposed i n (David 1988). Essentially it uses the above fact that the UK error is a generalised increment, and thus by removing one sample value Z(xi) i n turn, and estimating it by UK using the other n — 1 sample values, n such G / ' s can be found. T h i s is of course assuming an K (h), e.g. —\h\. B y using these GFs i n (3.114), a better estimate of i n i t i a l estimate of p Kp(h) can be found, and the whole process is repeated, similarly to the iterative process described earlier for C(h). O n l y this time, it supposedly tends to an unbiased estimate of K (h). p A s stated earlier, it is still debated as to whether the GCF to estimate than the original covariance function C(h). K (h) p is i n practise easier One drawback to the GCF is its highly restricted choice of models. Another problem lies i n the form of estimation of its parameters. T h e success of fitting covariance functions has proven to depend much upon the skill of the practitioner at cleaning the data of outliers. T h e fact that L S estimation allows very little interaction by the practitioner can be a major disadvantage, as it is highly sensitive to outliers. Journel (1989) also states that the L S estimation often results i n a , the nugget effect, being dominant, resulting i n little or no spatial 0 correlation. So effectively, the surface is estimated by a trend surface w i t h white noise. T h i s is little gain for so much work. F i n a l l y Cressie (1986) states that the estimation of K p i n conclusion, it is still not clear whether K p equations. still appears to be biased. So or C should be used i n the UK system of Chapter 3. Kriging with a nonhomogeneous trend 3.5.2 57 Other forms of unbiased covariance estimation Since 1973, when the above form of unbiased covariance estimation was first proposed, other methods have been developed. One example is the maximum likelihood estimator 0 ( M a r d i a and M a r s h a l l 1984) which, assuming ( Z ( x i ) , . . . , Z ( x ) ) to be jointly gaussian, maximises the likelihood: n , . 1 exp{-\{z-Ft3{0)) K{0y\z-Fp{0))\ (3.116) T over a l l viable 0, for a chosen parametric class of covariance models C ( x , y ; 0). A n o t h e r example is minimum variance quadratic unbiased estimation ( M I V Q U ) developed by K i t a n i d i s (1985). The covariance matrix K is assumed to be a linear combination of m known symmetric nxn matrices K{ , and the covariance parameters, 8j, are assumed to be quadratic functions of the sample values: K = Y,Ki6i 9j = Z BZ T n 3 n (3.117) j = l,...,m. T h e matrices Bj are n x n (3.118) matrices chosen so that the estimators, 9j, are invariant to the addition of any linear combination of the trend component functions to the surface, and are unbiased. T h e optimal matrices Bj are found by minimising the error variance of each 9j under these constraints, just as for UK. Despite the apparent completeness of the theory, many problems arise when t r y i n g to apply it i n practise. Firstly, as the error variance involves the fourth moments of the sample values, a distributional assumption for ( Z ( x i ) , . . . , Z ( x ) ) is required. Secondly, n Chapter 3. Kriging with a nonhomogeneous trend an initial estimate K 0 58 of K is required. A l t h o u g h unbiasedness and invariance to the addition of a trend is withheld by using Ko, the resulting estimators 9j may not necessarily be m i n i m u m variance. Also if this method is used iteratively to find 0 by substituting the estimate K for Ko, it may become biased, as the matrices Bj become functions of the sample values. Similarly, if constraints for K to be a positive definite m a t r i x are included, then the matrices Bj will be functions of the sample values and may again interfere w i t h the unbiasedness of 0. Other examples of this form of unbiased covariance estimation are restricted maximum likelihood ( R E M L ) by K i t a n i d i s (1983), and minimum norm quadratic unbiased ( M I N Q U ) by M a r d i a and M a r s h a l l (1985), and are along the same lines as those described above. Chapter 4 Robustness of the Universal Kriging estimator, ZTJK{X) A l t h o u g h much effort and thought has been put into the development of the various forms of kriging estimator, relatively little has been put into analysing the robustness of these estimators to changes i n the assumptions, and providing robust estimators to cope w i t h these changes. W i t h respect to the UK estimator, ZUK(X), these changes could take the form of outliers i n the data; misspecification of the trend component functions; misrepresentation of the sampling configuration, and misspecification of the (co)variogram model and associated parameters. A s extremely little has been written on the robustness of the supposedly unbiased forms of covariance estimation discussed i n section 3.5.2, I shall concentrate on the other forms of covariance estimation where the trend function is estimated to find estimates of the residuals. T h e estimation process, for a specific sample and sample configuration, can be divided into four distinct steps: 1. choosing the trend component functions f(x) appropriate to the surface of estimation, and finding initial estimates of the trend coefficients (3. 2. calculating the estimated residuals R , and estimating the experimental (co)varion gram nonparametrically. 3. choosing an appropriate parametric (co)variogram model to be fitted to the experimental (co)variogram, and estimating the model parameters 0. 59 Chapter 4. Robustness of the Universal Kriging estimator, ZUK{X) 4. final estimation using the estimate of 6 and the sample data z 60 n E a c h step i n the estimation process could be affected by the above changes. A s each step is also dependent on the previous steps, the UK estimator could potentially be extremely unstable to these changes. In this chapter I describe how each step of the estimation process is affected by some of these changes, and how this may have an effect on the final estimator. In some cases, robust estimators have been devised to lessen these effects. 4.1 S t e p 1: T h e t r e n d c o m p o n e n t f u n c t i o n s The problem of estimating the trend coefficients j3 unbiasedly, given that the appropriate trend component functions are known, has been discussed at length, yet very little is known about how to choose these functions, or more specifically, the order of the trend, assuming it to be polynomial. Starks and Fang (1982b) suggest first seeing if a trend is evident i n the geology of the surrounding area, or i n the data. T h e n after fitting a trend model, and estimating the (co)variogram, they suggest using the cross-validation technique discussed earlier to obtain errors and predicted error variances. O n looking at the normal quantile-quantile plot of the standardised errors, an alternative model should be fitted if it is far from a N(0,1) distribution. The idea is that if the correct trend and regionalised variable models are fitted then the errors will be independent N(0,1) random variables. T h e problem w i t h this approach is that, assuming first that the kriging model is appropriate for the surface, it is extremely unlikely that the correct one will be chosen. Therefore, these standardised errors w i l l be correlated and lead to a biased estimation of the distribution. A second Chapter 4. Robustness of the Universal Kriging estimator, Z (x) UK 61 problem is that if the points are evenly spaced then if a point is taken out and estimated from the rest, the nearest sample point w i l l be about twice the normal distance from the point of estimation. T h i s will lead to more inaccurate estimation than usual of the surface values and error variances, and could again bias the error distribution. Also, due to the large number of steps i n the estimation process, it is very difficult to pinpoint misspecification of the trend as being the sole cause. Despite this, the application of different models to the data and the comparison of their errors and predicted error variances at the samples points is very sensible, though computationally expensive. Another possible way of choosing the order of the polynomial trend model from the sample data is to use stepwise regression and other commonly used methods i n linear regression for model choice. A good reference for these techniques is Weisberg (1985). Forward selection, one particular form of stepwise regression, begins w i t h a simple regression model and at each step, one variable (or group of variables) is added to the model. T h i s variable must provide the largest reduction i n the sum of squares of all the possible variables not i n the model, when the new regression model is fitted to the sample data. Equivalently, the variable w i t h the largest F-value is chosen. T h i s adding of variables is continued until a stopping rule is met. Possible stopping rules are that the number of variables i n the model have reached a chosen limit; the F-values for the rest of the possible variables to be added are below a chosen minimum; or the addition of the next variable would make the matrix F F T unconditioned and thus may cause roundoff errors in the L S fit of the model. In this particular case, we must add all the monomials of each order all at once, i n order for the trend model to retain its invariance to rigid motions. A s the coordinate system is only chosen for convenience, the trend model should not depend on it. For this reason, this stepwise method reduces to adding groups of monomials of Chapter 4. Robustness of the Universal Kriging estimator, Z (x) UK 62 successively higher order until either the m a x i m u m number of terms is reached or the F value is below the m i n i m u m . A s only one group of monomials is considered at each step, it is only sensible to choose the m i n i m u m for the F-value to be the point of significance for the specific F statistic. A s long as the order of the trend does not get too high, the previously discussed methods i n Section 3.2 should solve the problem of roundoff error. One drawback to this method is that the sample variables must be uncorrelated such that the residuals can be assumed to be independent N(0, o ) random variables. Under 2 geostatistical theory, the sample values are correlated, so they must first be uncorrelated by using a technique such as declustering. Alternatively, various studies have been done on the effects of choosing the wrong order of the trend, or misspecifing the trend. D i a m o n d and A r m s t r o n g (1984) suggest, using a form of perturbation analysis, that owerspecification of the trend would increase the potential instability of the UK estimator by increasing the bound on the relative error i n the kriging weights. Cressie and Zimmerman (1992) correctly state that overspecification of the trend would not cause any bias i n the final estimator, but due to the sample values being used to estimate the extra trend coefficients, the predicted error variance w i l l be an overestimate of the actual error variance. Theoretically, underepecification of the trend poses a more serious problem by introducing a bias i n the final estimator, but as Cressie and Z i m m e r m a n point out, estimation of the (co)variogram acts to some extent as a self-correcting mechanism i n the estimation process. A s long as the UK model sensibly divides the regionalised variable Z(x) into the large-scale variation of the trend, and the small-scale variation of the residuals, then if the trend is underspecified, the resulting residuals should contain the extra variance unaccounted for by the trend. Therefore the variogram of the residuals should naturally be overfit and thus correct the bias to some degree. Chapter 4. Robustness of the Universal Kriging estimator, Z (-x.) 63 UK 4.2 Step 2: Nonparametric estimation of the experimental (co)variogram Depending on whether the stationarity assumption of the residuals i?(x) is intrinsic or second-order, two estimators devised by Matheron (1969) can be used to estimate the experimental variogram and covariance function: 2 W C(h) = JFHhJl ] ^ { R { X i ) ~ R ( x ' ) ) 2 ( 4 ' U 9 ) = where N(h) = {(x;,Xj) : Xj — x,- =.h} and |AT(h)| is the number of elements i n N(h). For second-order stationarity, both the variogram estimator 27(h), and the covariogram estimator C(h) can be used to estimate the variogram and covariance function respectively, but for intrinsic stationarity, only the variogram estimator can be used, due to nonstationarity i n the variance. B o t h of these estimators are m i n i m u m variance a n d unbiased under the gaussian assumption of -R(x), but Cressie and G r o n d o n a (1992) show that i n terms of the bias, the variogram estimator is more stable to misspecification of the trend than the covariogram estimator. Therefore I advise using 27(h) rather than C(h) when assuming second-order stationarity. If the residual data R n is contaminated with outliers then i t is advised to use a more robust variogram estimator. One proposed by A r m s t r o n g and Delfiner (1980) uses a scale estimator devised by Huber (1964) rather than the sample variance. Cressie and Hawkins (1980) have devised a similar estimator to Matheron's but instead take the fourth power of the mean rooted difference: Chapter 4. Robustness of the Universal Kriging estimator, = h jm) Dl/ 64 ZUK(X) 7+ < 4121) T h e justification for this robust estimator is that the rooted differences |i?(xj) — i?(xj)|2 are very close to normal even when the differences i?(xj) — Rfaj) follow a symmetric distribution about zero with heavy tails contaminated by outliers. Another robust estimator, 270(h), is proposed by Cressie and Hawkins, replacing the mean of the rooted differences D i n equation (4.121) by the median D, which is a more robust statistic to outliers than the mean. Hawkins and Cressie (1984) found that for data containing outliers, all of the above estimators had a positive bias, but that 275(h) had a smaller bias than 275(h) which again had a smaller bias than Matheron's i n equation (4.119). W i t h respect to their variances, the order of 275(h) and 275(h) was reversed, but Matheron's estimator proved to have the smallest variance as the outlier effect became negligible, confirming its m i n i m u m variance property under gaussian conditions. Therefore one must use either 275(h) or 275(h) as an estimator of the experimental variogram when the data is thought to contain outliers. 4.3 S t e p 3: F i t t i n g a n a p p r o p r i a t e p a r a m e t r i c m o d e l , C ( x , y ; 0) T h i s step has been given the most attention with respect to robustness as it is considered to be the most sensitive part of the estimation process. M a n y studies have been conducted on robustness of variogram models to misspecification, or more exactly, to inappropriate choice of parametric model or wrong estimation of the parameters. A l t h o u g h kriging predictions for linear unbiased estimators have m i n i m u m error variance, Chapter 4. Robustness of the Universal Kriging estimator, Z (x.) 65 UK given that the underlying assumptions hold, if the (co)variogram is misspecified, the error variance can become very large. D i a m o n d and A r m s t r o n g (1984) have investigated the effect of small perturbations of the variogram on the kriging predictions for a fixed set of observations. B y defining a neighbourhood 6 of variograms close to the true variogram, D i a m o n d and A r m s t r o n g provide a bound on the relative error i n the kriging weights w i t h i n this neighbourhood. They find that this bound ultimately depends on the condition number K(T) of the true variogram matrix V, which is equal to the largest absolute eigenvalue of T divided by the smallest absolute eigenvalue. The larger K(T), the larger the bound and thus the larger the potential instability. Therefore they propose K(T) to be used as a measure of how stable a particular variogram model is to misspedification for a fixed set of observations. B y applying this to b o t h the gaussian and spherical variogram models, they found that the gaussian model gave very unstable behaviour for small perturbations i n the variogram whereas the spherical model was far more stable i n comparison. Bardossy (1988) proposes a different measure to D i a m o n d and Armstrong's which is also sensitive to the location of the estimation point x , a factor which Bardossy shows is a major influence on the kriging weights. Using this new measure, Bardossy also found the gaussian model to give more unstable behaviour to small changes i n the parameters than either the spherical or exponential models. For this reason, Stein (1989) recommends that if a model must be fitted that is quadratic at the origin, then the gaussian should be replaced by a more robust model such as C ( h ; 0) = c ? - 0 2 h i e (l + 6 h). 2 A s the number of observations increases, Stein and Handcock (1989) show that as long as the variogram model used is compatible w i t h the true variogram, then the differences i n their final prediction and their kriging variances become negligible. The term compatible is strictly defined by Stein and Handcock, but it can be loosely defined as having the same behaviour near the origin. A l t h o u g h this is an encouraging fact, I think that it is Chapter 4. Robustness of the Universal Kriging estimator, 66 ZTJK( ) X not necessarily relevant, as i n most applications observations must be kept to a m i n i m u m in order to cut costs. Z i m m e r m a n and Harville (1989) prove that, assuming the correct covariance model C ( x , y ; 0) and trend component functions have been chosen, the estimate of the trend coefficients (3(0) is unbiased under the weak conditions that z be symmetric about its mean, and that 0 be an even and translation invariant estimator. Using this fact, Z i m merman and Cressie (1991) show that under the same conditions the UK estimator is unbiased. Therefore, contrary to previous belief, the estimator of the covariance function parameters 0 need not be unbiased for the resulting estimate of the trend and the final kriging estimator to be unbiased. Finally, i n fitting a (co)variogram model to the experimental (co)variogram, w i t h Stein and Handcock's definition of compatibility i n m i n d , maybe more emphasis should be put on providing a better fit to the smaller values of h i n preference to the larger values. T h i s is achieved using the weighted LS method (Cressie 1985), described i n detail i n Chapter 5. T h e smaller h is, the closer an observation is to the unobserved point and thus the larger its weight i n predicting the value at the unobserved point. Therefore more care should be put into estimating these weights correctly at the expense of poorly estimating the weights for observations further away. The only problem w i t h this argument is that the larger h is, the larger the probable number of observations at that distance. Therefore the total of these negligible weights could sum to a weight that has a significant effect on the final prediction. Alternatively, if the process is truly stochastic, then only the closer observations should be included i n the estimator anyway, (i.e we should see the observation weights for the further points as exceeding a confidence interval i n some way and therefore being ignored.) Chapter 4. Robustness of the Universal Kriging estimator, 4.4 67 ZUK{X) Step 4: F i n a l estimation using C(x,y;0) and the sample data z n A l t h o u g h much has been written about the effects of misspecification of the trend and the (co)variogram on the final UK estimator ZUK{x), very little has been written about how it is affected by errors i n the sample configuration. D i a m o n d and A r m s t r o n g (1984) tried to analyse this by perturbing the coordinates of the sample slightly while keeping the variogram unchanged. They again derived a bound on the relative error i n the kriging weights, and found that the condition number K(T) of the variogram m a t r i x T for the original sample configuration was again the most influential factor. Therefore, K(T) could be used to give some idea of how robust different kriging systems are to a slight change i n the sample configuration, and thus to measurement errors i n the sample coordinates. O f course, calculating K(T) assumes knowledge of the true variogram and so this could not be used as a measure of instability when applied to real situations where the variogram is needed to be estimated. The same applies to the previous step when K(T) was used as a measure of instability i n the variogram. In conclusion, the UK estimator has an inherent stability i n its estimation process due to the model's decomposition of Z(x) into the large scale variation of the trend, and the small scale variation of the residuals. A s long as estimates of the trend coefficients /3 and the covariance parameters 0 are relatively good then, as discussed earlier, an error i n one is naturally corrected by an opposite error i n the other. Therefore, as long as each step in the estimation process is carried out carefully, and an appropriate covariance model is chosen that is compatible and relatively stable to misspecification of its parameters, then the UK estimator should be fairly robust to errors i n the estimation process. A l t h o u g h every effort has been made to make each step as robust as possible to outliers, Chapter 4. Robustness of the Universal Kriging estimator, Z (x) UK this does not guarantee that ZUK{X) 68 w i l l be robust to outliers, as it is a linear function of the data. Therefore i n some way these outliers must be isolated a n d edited so as not to overinfluence the predictions made be Z (x). UK way of doing this called Robust Kriging. Hawkins and Cressie (1984) propose a The method is based on the idea that outliers can only be identified by comparison w i t h the values of the immediately surrounding observations, rather than w i t h the sample mean w i t h which i t may agree. 4.4.1 Robust Kriging 1. estimate the variogram using robust methods as discussed earlier. 2. use this robust variogram i n the cross validation technique to estimate the kriging weights Oiij and the error variance s} for each sample point x» based on the rest of the data: 5(x ) = £ c ^ ( t X j ) (4.122) 3. use these weights ctij and the neighbouring sample values to Xj to get a robust prediction 5(x;) of .z(xj) for each sample point x, using a weighted median method: • if gridded data, then choose the nearest eight points, requiring only two different kriging weights to be used throughout. • solve iZj^i ocijSgn(z(xi) - z(xi)) = 0 for z(xi). • if there are multiple roots and/or intervals of roots to the equation then there w i l l always be an odd number of them, so choose z(xi) to be the middle one, and if i t is an interval, choose -Z(XJ) to be the middle of this interval. 4. Winsorise z(xi) by replacing i t w i t h z(x{) + cs, if z(xi) > z(x{) + CSJ, or w i t h z(xi) — csi if -z(xj) < z(xi) — csi, but otherwise keep it unchanged. Chapter 4. Robustness of the Universal Kriging estimator, 5. the final Robust K r i g i n g estimate -ZRK-(XJ) ZUK(^) 69 of z ( x ) at an unobserved point x is found by using the robust estimate of the variogram to calculate the kriging weights A; as usual, but by replacing the sample values z z {x) RK = £^w(xi) J n w i t h the Winsorised sample values (4.123) i T h i s method of smoothing the data is very similar to Huber's smoothing process used i n time series (Huber 1977). The crux of the method is i n the fourth step when the data is edited by Winsorizing. The larger the estimated error variance at X;, the more erratic the surface is assumed to be, and therefore the larger the bound about the weighted median of its surrounding values within which zfa) remains unedited. Obviously, the constant c also controls the w i d t h of this bound, and is thus an important parameter i n the estimation process. A relatively high value of 2-2.5 is usually chosen so as not to alter the data unless a sample value is extremely different from the weighted median of its surrounding sample values. Therefore, the essential difference between this form of robust kriging and the previous methods, is that the edited sample values are used i n the final estimate instead of the original sample values. Considering that the error variances sj are not estimated robustly, and that they may be extremely inaccurate as discussed i n section 4.1, this method may not be an effective way of identifying outliers. Instead, the editing of the data may only succeed i n biasing the final estimate. A n alternative method is to measure i n some way the influence that a particular sample value has on the final estimates, and then to decide whether the highly influential sample Chapter 4. Robustness of the Universal Kriging estimator, 70 ZJJK{^) values are outliers or important points that should be left untouched. Christensen, Johnson, and Pearson (1992) do this by using case deletion diagnostics. B y using a version of the UK estimator which allows for measurement errors at the sample points, the ac- tual values at the sample points are estimated ( z ) using the n measured sample values, n y . T h e n , w i t h one sample value removed, they are estimated again ( z _ i ) . A measure n similar to n Cook's distance (Cook 1977), quadratic i n the error ( z n — z _i), n is used to measure the relative influence a sample has on the resulting estimates. In this way, the most influential sample values can be found and edited if thought to be outliers. Chapter 5 A Numerical Application In this final chapter, I attempt to apply Ordinary K r i g i n g and Universal K r i g i n g w i t h a first order and second order trend to real data, using Ripley's form of the estimator and the practical information gained from the robustness studies discussed i n the previous chapter. T h e data consists of elevations of a mountain range i n northern B r i t i s h C o l u m b i a , near St. M a r y ' s Lake, measured by satellite, and was obtained courtesy of Dave M o u l t o n (1993). It is i n the form of a (359 x 504) grid of data points, w i t h a grid spacing of 60 metres. I assume the surface to be somewhat representative of an ore surface, to which these techniques are commonly applied. F r o m now on, to avoid any confusion, I shall refer to Ordinary K r i g i n g as Universal K r i g i n g w i t h a homogeneous trend. F i r s t l y I state my reasons for choosing the form of (co)variogram estimation associated w i t h estimating the trend and forming an experimental semivariogram, as opposed to the other supposedly unbiased methods described i n section 3.5.2. I then describe i n detail the method by which I estimate the (co)variogram from the data. Secondly, I divide the above surface into 84 smaller surfaces of (51 x 42) grid points, and choose 20 of them at random. I assume all of these surfaces to be isotropic and stationary in their variance. I predict each of these 20 surfaces for the same sample configuration of 50 randomly chosen points for different trend assumptions and 3 different methods of 71 Chapter 5. A Numerical Application 72 fit to the experimental semivariogram: linear, spherical w i t h L S fit, and spherical w i t h weighted L S fit. I repeat this for 20 different sample configurations. A s a measure of prediction performance, I calculate both the squared difference and the absolute difference between the predicted value and the actual value at each grid point. M y two measures of prediction performance or goodness-of-fit are then the mean squared difference, and the mean absolute difference over the grid of points. I investigate how they differ i n ranking the various method combinations. Using these performance measures, I come to some conclusions as to which trend assumption and method of variogram fit to use when applying UK to surfaces of this type. Finally, I outline the few methods which have been proposed to detect the order of the trend that would give the best estimator of the surface. I provide an example of how these methods can be very misleading. 5.1 Reasons for choosing the biased form of (co)variogram estimation A s previously mentioned i n Chapter 4, there are two general types of method by which the (co)variogram can be estimated from the data. T h e first requires i n i t i a l estimation of the trend. F r o m the residuals, the experimental (co)variogram is estimated, and a positive definite function is fitted, usually of linear, spherical, exponential or gaussian form. T h e second type of method supposedly avoids going through this biased process of estimating the trend by estimating the parameters of an assumed (co)variogram model straight from the data. A l t h o u g h , i n theory, these (co)variogram estimators of the second type may be unbiased, in practise their performance ultimately depends on the appropriateness of the chosen Chapter 5. A Numerical Application 73 (co)variogram model and the validity of the many assumptions made about the data and their distribution. A s pointed out i n section 3.5.2, if these assumptions are not satisfied, it could lead to biased estimators just as for the first method. Therefore, it would be unwise to rely on these apparent unbiasedness properties. Also, as stated i n section 4.3, a biased (co)variogram estimator does not necessarily lead to a biased kriging estimator. T h e methods of the second type also require a highly restricted choice of (co)variogram model, usually linear i n the parameters. These parameters, 9, are very difficult to interpret, and as a result, it is difficult to evaluate whether the estimates, 9, are reasonable for the considered data set. T h i s property, and the very nature of the estimation process, allows little interaction by the practitioner. Also, the final parameter estimates can often lead to an invalid (co)variogram. O n the other hand, the (co)variogram models for the first type of method are very versatile and naturally produce positive definite (co)variograms. T h e individual parameters are also very easy to interpret as, say, the variance of Z(x), or the gradient of the (co)variogram at h — 0. T h i s allows the practitioner to use his or her skill and experience i n choosing the (co)variogram model and the method of fitting it to the experimental (co)variogram. Lastly, and most importantly, there seems to be absolutely nothing documented on the robustness of these 'unbiased' estimators. W i t h respect to Delfiner's L S method, discussed i n section 3.5.1, it is well known that L S estimation is highly sensitive to outliers. A l t e r natively, much has been written about the sensitivity of estimators of the first method to outliers and misspecification of the trend and (co)variogram, as previously summarised i n Chapter 4. For this major reason, and the others discussed above, I choose to use Chapter 5. A Numerical Application 74 estimators of the first type to estimate the (co)variogram. 5.2 Detailed description of (co)variogram estimation For a stationary trend, the experimental (co)variogram can be directly estimated from the data, whereas for a first and second order trend, we face the problem of estimating the trend from correlated data without previous knowledge of the (co)variogram. A s there is still no generally accepted way of doing this, I simply decluster the sample points, and then assuming them to be independent, I fit a trend surface of first or second order by L S . T h i s method should be fairly stable, as the sample data is exact with no measurement or positional errors. I use the declustering technique discussed i n section 2.2, using a grid spacing of 4 metres. This generally reduces the sample from 50 to 43-44 values. I then calculate the residuals between the sample values and the fitted trend surface. Assuming these residuals to have originated from normally distributed random variables w i t h zero mean, I use them to estimate the experimental (co)variogram by the exact same methods as for a homogeneous trend, but using the estimated residuals -R(XJ), i = 1 , . . . , n, i n place of the original data. A s the data is free of measurement errors, I use Matheron's estimator for the variogram, as defined i n equation (4.119), instead of Cressie and Hawkins' robust estimators 2^p(h) and 2jf)(h), to make use of its m i n i m u m variance property for gaussian data. I also choose Matheron's variogram estimator over the covariogram estimator for its stability to misspecification of the trend. A s \N(h)\ is extremely small or zero-valued for most values of h, it is standard practise to divide the h scale into intervals or discrete lags, providing only one semivariogram estimate, ^(hj), per lag, where N(hj) Xi — Xj = hj, and hj is i n j th = {(XJ,X.,) : interval} (Ripley 1981). There is no standard interval Chapter 5. A Numerical Application 75 length but it seems common to divide the h scale into the order of 10-20 intervals. I use 15 intervals of equal length, w i t h the estimators evaluated at the midpoints. A n example of an experimental semivariogram of this form is shown i n Figure 5.3. Figure 5.3: A n experimental semivariogram after dividing the h scale into 15 discrete lags, providing only one estimator per lag. T h e next step is to fit a valid semivariogram model to the experimental semivariogram. T h e common models considered are linear, exponential, spherical, and gaussian, although there are many other variations of these which could be applied. A comparison of these models can be found i n Figure 5.4. A s mentioned i n Section 3.1.1, the [/If equations are constructed i n such a way that any constant term or factor i n the covariance function is filtered out. Therefore, any linear model of the semivariogram would be equivalent under UK, so there would be no point fitting one to the experimental semivariogram under these methods. O f the nonlinear models i n Figure 5.4, the most appropriate one to fit to the experimental semivariogram seems to be the spherical model, for every pairing of surface and sample Chapter 5. A Numerical Application 76 Linear S = Spherical E = Exponential Gaussian h Figure 5.4: A comparison of semivariogram models with the same sill or variance (except for the linear model), and the same gradient at h = 0 (except for the gaussian model which always has a zero gradient at h — 0). configuration that I choose. Also, as mentioned i n section 4.3, the spherical model is relatively stable to misspecification of 0. So, i n order to automate the estimation process, I assume that the spherical model is the most appropriate nonlinear model to fit i n all cases. A s most of the experimental semivariograms indicate negligible discontinuity at h = 0, which is expected as there is no measurement error i n the data, I also decide not to allow for a nugget effect i n the model. Therefore the final form of the spherical model that I fit to the experimental semivariograms is as follows: h< R V (5.124) h> R T h e final step of fitting the semivariogram model to the experimental semivariogram is where the practitioner's skill and experience comes into effect. The common methods of fitting the curve are by eye, by various techniques of estimating the parameters specific to the model fitted, or by some form of L S . A s the curve is nonlinear, explicit equations Chapter 5. A Numerical Application 77 for the parameters cannot be found. Instead some form of Newton-Rhaphson iteration technique must be applied to the L S statistic to find a local minimum. A s there may be many local minima, initial values for the parameters must be stated. For the spherical model, the parameters are S and R. S is the horizontal asymptote or sill of the model, and i n theory should be equal to the variance of Z(x), assuming the variance to be stationary. Therefore, the sill S can be estimated by an estimate of cr , such as the sample variance a , 2 n 1 2 i=i O f course, if a trend is fitted to the surface, then the estimated residuals i2(xj) should be used instead of the data i n the above formula. Cressie (1985) shows that <r w i l l 2 always have a negative bias due to the estimation of LL by Z, just as i n Section 3.1 when estimating the trend for higher dimensions. Cressie and Glonek (1984) propose that by using the median of the sample values, rather than the sample mean Z, as an estimate of p, this bias can be much reduced. Also <r is based on the assumption that the data 2 are uncorrelated, so I apply it to the declustered data. The gradient of the spherical semivariogram model at h = 0 is f | . B y estimating | | by the gradient of the line of best fit to the first three lags of the experimental semivariogram, and using the variance estimator for S, we can provide initial values of S and R for the L S iteration method. A s the experimental semivariogram is estimated for a different number of pairs, \N(hj)\, for each discrete lag hj, minimisation of the L S statistic may not be the most appropriate way of fitting the semivariogram model. Cressie (1985) proposes a weighted L S statistic to be minimised, based on \N(hj)\, which gives weight to early lags, and down weights Chapter 5. A Numerical Application 78 lags w i t h a small number of pairs. He proposes finding the model parameters 0 that minimise L rather than where L is the number of lags, j(hj) is the experimental semivariogram estimator, and j(hj \ 0) is the semivariogram model. Cressie states that this is a vast improvement over L S . Cressie also proposes that this estimator of 6 could be used as the starting value of an iterative generalised L S approach. W h e n comparing Cressie's weighted L S method w i t h normal L S for fitting the spherical model, I find that there is practically no difference between them when the model is a good fit to the experimental semivariogram. Otherwise, I find that Cressie's method does fit the early lags far better, which are the most important lags, as discussed i n section 4.3. A n example of this is shown i n Figure 5.5. A l s o Cressie's method gives more weight to the middle lags, which have the highest values of \N(hj) |. T h i s has the effect of bending the normal L S fit towards these weighted values. In Figure 5.5 this effect is shown to t u r n an almost linear L S fit into a more appropriate spherical fit. In general though, this weighting of the middle lags has the effect of changing the sill S and the range R without altering the general L S fit too much. A n example of this is shown i n Figure 5.6. Lastly, as the values of \N(hj)\ for lags hj = 13,14, and 15 are generally far below those of the other lags, I do not include them i n the L S and weighted L S fits. A similar 'rule' has been devised by Journel and Huijbregts (1978) for fitting semivariograms. To summarise, I apply UK w i t h a homogeneous, first order and second order trend Chapter 5. A Numerical Application 79 Figure 5.5: A comparison between LS and Cressie's weighted L S method for fitting a spherical model to the first 12 lags of the experimental semivariogram in Figure 5.3. surface, using both a linear semivariogram, and a spherical semivariogram fitted to the experimental semivariogram using L S and weighted L S . Finally, I use the relation C{h) = cr — j(h) 2 to estimate the covariance function C(h). A s UK filters out any constant terms or factors i n the covariance function, o and S could 2 be any constants that make C(h) a valid positive definite function, i.e. any numbers large enough to ensure that C(h) is non-negative i n the required range of h. However, it is best to estimate these parameters as accurately as possible so that C(h) can be used to estimate the error variance. 5.3 Results 20 randomly chosen geological surfaces are estimated by UKiox 20 different sample configurations of 50 randomly chosen points; 3 different methods of (co)variogram estimation Chapter 5. A Numerical Application 80 WLS LS 0 10 20 30 40 50 60 h Figure 5.6: A n example of how Cressie's weighted L S method alters the sill of the L S fit without altering the general L S fit too much. - linear ( L I N ) , spherical with a least squares fit ( L S ) , spherical w i t h a weighted least squares fit ( W L S ) ; and 3 different orders of the trend surface - stationary (O), first order (I), second order (II). For the response variable, I use both the mean squared difference (SD) and the mean absolute difference ( A D ) between the real and estimated surface over the (51 x 42) grid of points. W i t h o u t analysing the data statistically, it is very difficult to come to any real conclusions about which method of (co)variogram estimation is better, and for which order of the trend surface. It is also very unwise to compare the different methods and trend orders without first investigating whether there is any interaction between the 3 factors of the experiment: method, trend, and sample. Interactions between these factors can lead to very misleading conclusions if they are not dealt w i t h properly. In order to investigate these effects, and their comparative magnitudes, I attempt to model the response variable Chapter 5. A Numerical Application Vijkn a s a y ijkn n 81 overall mean p plus the sum of these effects and their interactions: = + M + T + S + (MT)ij + (TS) f l i j k jk + (MS) ik + (MTS) ijk +e ljkn (5.127) where i = 1,2,3, j = 0,1, 2, k, n = 1 , . . . , 20, and M ; is the effect due t o (co)variogram method i, Tj is the effect due to trend order j, and S k is the effect due to sample configuration k. n is the surface coefficient. The rest of the terms i n equation (5.127) are the interaction effects between the different factors. tij kn using the is the residual variable when {ijkn} combination of factors and surface. B y modelling the response variable i n this way, I can carry out an analysis of variance ( A N O V A ) study on the data to compare the relative effects. G o o d references for A N O V A tests and related diagnostic techniques are Montgomery(1991) and Hicks (1993). A s the sample configurations are randomly chosen, a mixed A N O V A model must be used. T h e method and trend effects, M and T, and their interactions, MT, are called fixed effects. The sample effects, S, and their associated interactions, TS, MS, and MTS, are called random effects. For each random class, S,TS,MS, and MTS, the effects are assumed to be normally distributed w i t h zero mean and constant variance for a l l i,j, and k. For each fixed class, M,T, and MT, the null hypothesis is that a l l the effects are zero, and the alternative hypothesis is that there is least one non-zero effect. For each random class, the null hypothesis is that the common variance is zero, and the alternative hypothesis is that i t is non-zero. If we assume that the model is appropriate, and the residuals are independent normally distributed random variables w i t h constant variance a , then the 2 F-values for each class follow an F distribution w i t h their respective degrees of freedom. In each case, the alternative hypothesis only replaces the null hypothesis if the F-value is significantly large. Therefore the upper t a i l is the critical region, requiring a one-sided test. Chapter 5. A Numerical Application 82 Before analysing the A N O V A tables for the A D and S D data sets closely, the residuals of their respective fits to the above model should be analysed to ensure that they satisfy the above assumptions. Plots of the residuals versus the fitted values for b o t h the A D and S D data sets are shown i n Figure 5.7. AD Data 35 40 Fitted Value SD Data 45 50 3000 4000 5000 6000 7000 8000 Fined Value Figure 5.7: Plots of residuals versus fitted values for the mean absolute difference (AD) and the mean squared difference (SD) data sets. A l t h o u g h the variance of the residuals for A D shows only a slight increase w i t h i n creasing magnitude of the fitted values, the variance of the residuals for S D shows a marked increase indicating the need for a transformation of the data. L o g ( S D ) pro- vides the most stable residual variance of a l l the transformations i n the natural set { S D , S D ^ , log(SD), S D ^ } but at the expense of an asymmetric histogram of the resid- 1 - uals, whereas S D ^ provides a slightly increasing residual variance similar to that for A D , but a near perfect histogram and normal quantile-quantile plot. T h e quantile-quantile plot for A D is also relatively straight except for a slight kink i n the negative t a i l between Chapter 5. A Numerical Application 83 -10 and -30. T h i s is reflected i n the histogram where there is a slight dip around -20. B o t h the histograms and normal quantile-quantile plots for A D and S D ^ can be found in Figure 5.8. Figure 5.8: Histograms and normal quantile-quantile plots for both the A D and SD2 residuals. A straight normal quantile-quantile plot indicates that the residuals follow a normal distribution. These results, and further plots of residuals versus method, trend, and sample, support my conclusion that neither the A D or S D ^ model fits are i n significant violation of the above assumptions as to seriously affect any conclusions made from the respective A N O V A tables. B y comparing the A N O V A tables for A D and SD5 i n Table 5.1, it is clear that they provide exactly the same information. For both response variables, the two fixed factor effects, the variance of the sample effect and the trend:method interaction effect have Chapter 5. A Numerical Application AD 84 Effect trend method sample trend:method trend:sample trend:sample trend:method:sample Residuals Df 2 2 19 4 38 38 76 3420 Sum of Sq 3629 1225 42887 996 1319 433 312 534749 M e a n Sq Effect Df 2 2 19 4 38 38 76 3420 Sum of Sq 18286 4575 157588 3359 7838 1663 1235 1080426 M e a n Sq 9143.20 2287.66 8294.08 839.73 206.27 43.76 16.24 315.91 SD* trend method sample trend: method trend:sample method:sample trend:method: sample Residuals 1814.49 612.56 2257.24 249.12 34.72 11.40 4.11 156.36 F Value 52.261 53.733 14.436 60.613 0.222 0.073 0.026 p Value 0.000 0.000 0.000 0.000 1.000 1.000 1.000 F Value 44.326 52.277 26.255 51.708 0.653 0.139 0.051 p Value 0.000 0.000 0.000 0.000 0.951 1.000 1.000 Table 5.1: A N O V A tables for the A D and SD2 data sets extremely high probability of being non-zero, whereas the other interaction effects are almost definitely zero. T h e trend:method interaction plots for A D and SD2 i n Figure 5.9 are also very similar. T h e y b o t h indicate that by increasing the order of the trend, the differences i n the methods are only magnified, without changing the actual ordering i n any way. Only when the trend is stationary does this order seem to change. Paired t-tests and paired W i l c o x o n signed rank nonparametric tests are performed to test the significance of these differences between the A D means for the various method/trend combinations. In Table 5.3, these method/trend combinations are ordered w i t h increasing Chapter 5. A Numerical Application Figure 5.9: 85 Dimension:method interaction plots for both the A D and SD2 residuals mean A D . mean LS WLS lin 0 36.00 36.08 36.11 I 37.72 37.06 36.35 II 39.85 38.81 36.86 s.d. LS WLS lin 0 11.84 11.84 11.82 I 12.80 12.92 12.00 II 14.50 13.82 12.51 Table 5.2: Mean and standard deviations of the A D data for the different method/trend combinations estimated over all the combinations of surface and sample configuration. Firstly, I test the differences between the A D means of the different (co)variogram methods, keeping the trend order fixed. For both the first and second order trend surfaces (I and II), b o t h the paired t-test and paired W i l c o x o n signed rank test give extremely small Chapter 5. A Numerical Application 86 p-values for each difference. Therefore, despite the fact that these differences seem i n significant, they are actually highly significant. This implies that for estimating a general surface of this type from 50 randomly chosen sample values, L I N is better than W L S , which is itself better than L S , when using a polynomial trend surface of order one or two. For a stationary trend (O), the two tests agree that the difference between W L S and L I N is insignificant, but for the other two differences, W L S - L S and L I N - L S , there is some disagreement. The respective p-values for the paired t-test are 0.263 and 0.314, suggesting that the two differences are insignificant, whereas the values for the W i l c o x o n test are 0.016 and 0.000, suggesting completely the opposite conclusions. O n looking at the density plots and normal quantile-quantile plots of the two differences i n Figure 5.10,1 feel that despite the heavy tails the distibutions are nearly normal and suggest an insignificant difference from zero for the respective means. Therefore I conclude that the two differences are insignificant, i n agreement w i t h the paired t-tests. A s these differences are so small, this conclusion is rather less important than for the first and second order trends. Secondly, I test the differences between the A D means for the different trend orders, keeping the (co)variogram method fixed. For each (co)variogram method, the two tests give extremely small p-values for every difference. T h i s implies that for any (co)variogram method, when estimating a general surface of this type from 50 randomly chosen sample values, the lower the trend order the better. T h e exact same conclusions are found by analysing the S D ^ data i n this way. These conclusions are also reflected i n the mean and median factor plots for A D i n Figure 5.11. These factor plots also point out that however significant the differences may be between the (co)variogram methods and trend orders, the differences i n the means for the various Chapter 5. A Numerical Application 87 Figure 5.10: Density and normal quantile-quantile plots for LS - W L S and LS - L I N differences for a stationary trend surface over all combinations of surface and sample configuration. The density plots are smoothed histograms. sample configurations are far greater. T h i s strongly indicates that the choice of sample configuration is very important i n the estimation of a surface, and suggests that more effort should be put into choosing the best sampling scheme. T h e problem w i t h choosing the best sampling scheme is that it is entirely dependent on the surface, of which we know extremely little at this stage i n the estimation process. Based on this little knowledge, various sampling schemes such as clustered, stratified, systematic or random sampling can be applied (Ripley 1981). 5.4 Detecting the optimal trend order In this final section, the problem of choosing the order of the trend surface is considered. Chapter 5. A Numerical Application 88 T h e results of my numerical example indicate that if the practitioner has no idea of which trend order to choose then he or she should choose the trend to be stationary as in general, for randomly chosen samples and surfaces, a stationary trend surface seems to perform the best. However, if there was any way that the o p t i m a l trend order could be detected from the sample data, this would obviously be better. A s discussed i n Section 4.1, there has been very little written on this problem of choosing the appropriate trend component functions, or i n this case, the order of the trend. A s mentioned, Starks and Fang (1982) suggest using the cross validation method to obtain kriging errors and predicted kriging error variances at the sample points. B y dividing the errors by their predicted variances, a set of standardised errors can be found. T h e y suggest choosing the model whose distribution of the standardised errors compares best w i t h a N(0,1) distribution. The problems w i t h this method have been discussed i n section 4.1. For these reasons, I feel that goodness-of-fit of the standardised errors to a N(0,1) distribution is very unlikely to be related to the optimality of a particular trend model. A simpler measure could be the mean squared difference or mean absolute difference of these kriging errors. A t h i r d method, also discussed i n Section 4.1, could be to fit all three trend models to the declustered data by L S and calculate the residual sum of squares at the sample points, as i n stepwise regression. F-tests can be performed to test whether the reduction i n the sum of squares, when moving from a stationary to a first order model, and from a first order to second order model, are significant. T h e results of these methods when applied to surface 13 for the first sample configuration are as follows: Chapter 5. A Numerical Application 89 T h e cross validation kriging errors and associated error variances are estimated for a homogeneous, first order, and second order trend surface, using a linear covariance function. Note that the parameters of the linear model are now required to be estimated i n order to estimate the kriging error variances. The normal quantile-quantile plots of the standardised errors for each trend model are i n Figure 5.12. It is clear that the best fit to a N(0,1) distribution is the second order trend model, w i t h the stationary model being the worst. A s expected, none of them are very good fits. Next, I calculate the mean squared difference and mean absolute difference of the kriging errors. I find the ordering of the trend models to be the same for b o t h measures, w i t h the second order model having the lowest value, and the first order model the highest. Lastly, the three trend models are fitted to the 43 declustered sample values by L S . The F statistic for moving from the stationary to the first order trend model is 29.71 which is highly significant for an F(2,40) distribution. The F statistic for moving from the first order trend to the second order trend model is 35.10 which is also highly significant for an F(3,37) distribution. Therefore again it is concluded that the second order trend model is the most appropriate, then first order, and lastly the stationary model. In fact, the trend model that provides the best fit to surface 13 for the first sample configuration is the first order model, w i t h the second order model coming i n last place. Therefore, these methods can clearly be misleading and result i n choosing the worst rather than the best trend model for a particular surface and sample configuration. Chapter 5. A Numerical Application 5.5 90 Conclusions T h e following conclusions are obviously limited to applications of UK to surfaces of the type studied i n this example. B o t h the surfaces, and the low sample size were chosen so as to direct the resulting conclusions as much towards ore mining applications as possible. A s prediction performance measures, A D and S D give slightly different orderings of the method/trend combinations for specific surfaces and sampling configurations. W i t h respect to the general analysis, A D and S D ^ give almost identical results. Therefore, I conclude that there is little difference between A D and S D . I conclude that, i n general, UK w i t h a homogeneous trend performs better than UK w i t h a first order trend, which performs better than UK w i t h a second order trend, whatever the form of variogram estimation. I conclude that, i n general, a linear variogram model performs better than a spherical variogram model fitted by weighted L S , which performs better than a spherical variogram model fitted by L S , for UK w i t h a first and second order trend. For UK w i t h a homo- geneous trend, there is no significant difference between the three forms of variogram estimation. Finally, I conclude that the methods for detecting the optimal trend order for a specific sample configuration and surface can be seriously misleading. Therefore, for a sample configuration of 50 randomly chosen sample points and a general surface of the type estimated i n this example, it is best to choose the simplest trend surface and the simplest method of variogram estimation. I suggest that this is as a result of my inexperience i n finding an initial estimate for the trend, and i n fitting a variogram Chapter 5. A Numerical Application model. 91 For these reasons, I conclude that unless the practitioner has experience i n applying this type of estimator and i n using the above methods for detecting the o p t i m a l trend order, UK should always be applied w i t h a homogeneous trend i.e. K r i g i n g , for which the choice of variogram estimation is unimportant. Ordinary Chapter 5. A Numerical Application 5 92 20' 20 n 8- 8- 5 97- 79- 6- 6- 1611 • Q < Q < "l 15- lOO ll • 16- II- LSWLS 102- LS WLST LINT '' 1 LIN4- s 15O- 0- 18- I 17ISM- 1751 s * 19- 8 S' 14 19- 12- 18- sample 18' trend Factors method sample trend method Factors Figure 5.11: Mean and median factor plots for the A D data set. The horizontal lines in the mean and median plots mark the overall mean and median respectively. Chapter 5. A Numerical Application 93 Homogeneous trend Quantiles of Standard Normal First order trend Quantiles of Standard Normal Second order trend Quantiles of Standard Normal Figure 5.12: Normal quantile-quantile plots for the cross validation standardised residuals. The standardised residuals for the second order trend provide the best fit to a N(0,1) distribution. Chapter 6 Conclusion Since the original linear estimators of Ordinary and Universal K r i g i n g were first developed i n the 1960's, a multitude of alternative estimators have been put forward, based on far more complex mathematical and statistical theory, yet few, if any, have proved to be practically superior. T h i s is supported by the fact that O r d i n a r y K r i g i n g is still being practiced i n many ore mines around the world, including the gold mines i n South Africa, where statistical techniques for ore estimation were first applied. So, as is often the case, the simplest model proves to be the best. A l t h o u g h Universal K r i g i n g (UK) is the most general of the linear estimators, it fails i n its attempt to model the surface as having a nonhomogeneous trend. T h e problem lies i n estimating the (co)variogram unbiasedly from sample variables w i t h unknown but different means. M u c h research has gone into solving this problem, resulting i n the development of supposedly unbiased forms of (co)variogram estimation which avoid prior estimation of the trend. Also, much research has gone into making UK more robust to errors at each stage i n its estimation process. Despite all this work, UK w i t h a nonhomogeneous trend is still considered to be too unreliable to be of any real practical use. Another problem is that, being a linear estimator, it is also very sensitive to outliers in the data. T h i s problem is difficult to avoid. Therefore, UK is very dependent upon the prior detection and editing of these outliers, which is itself a very difficult and unreliable procedure. In situations where outliers are known to exist, UK should be replaced by 94 Chapter 6. Conclusion 95 estimators that are inherently robust to outliers. After attempting to apply UK, using the (biased) techniques associated w i t h prior estimation of the trend, and estimation of the experimental semivariogram, I come to the conclusion that, for an inexperienced practitioner such as I, a homogeneous trend model should always be chosen when applying UK to a general unknown surface. M y reasoning for why the homogeneous model performs better than the higher order trend models i n my numerical example is that the estimation performance for the homogeneous model seems to be far less dependent upon the choice of (co)variogram model and method of fit, and therefore relies less upon the skill and experience of the practitioner. For nonhomogeneous trends, the estimation of the (co)variogram is also dependent upon prior estimation of the trend from correlated sample values, and this increases the potential instability of the estimation process. For both first and second order trend models, I find that the linear (co)variogram model, which demands no interaction at a l l by the practitioner, performs better than the spherical model fitted by weighted L S , which itself performs better than the spherical model fitted by L S . T h i s again indicates that the less interaction by an inexperienced practitioner the better, as it is unlikely that a linear model is more appropriate than a spherical model for a (co)variogram when b o t h fitted correctly. T h i s observation leads me to some suggestions for future research. Firstly, as the other unbiased forms of (co)variogram estimation demand little to no interaction by the practitioner, it would be of interest to see how these (co)variogram methods compare when applied to the same surfaces and sample configurations as i n my numerical example. Secondly, it would be of interest to compare UK with other non-statistical estimators Chapter 6. Conclusion 96 for different surfaces of varying irregularity and different numbers of sample values, and try to find some definition of the type of surface and number of sample values that UK is best suited for. It is known that the interpolating spline is equivalent to UK w i t h a first order trend and a fixed covariance function equal to the Green's function, |/i| Zo(?|/i| 2 (Dubrule 1983). The theoretical advantage that UK has over spline interpolation is that it is more versatile, allowing the trend order and the covariance function to change to suit the surface being fitted. It would be of interest to see whether this translates to an advantage i n practise. Thirdly, I think less attention should be put towards finding unbiased forms of estimation and more towards finding estimators that are robust to outliers. A s these estimators are generally applied to ore deposits, where there are likely to be errors i n measurement, I feel that the effort i n this field of research has been somewhat misplaced. Lastly, my numerical example strongly indicates that the choice of sample configuration is extremely important i n the estimation of a surface, far outweighing the effects of the different trend orders and forms of (co)variogram estimation. Therefore, I feel that more effort should be put towards finding methods of choosing an optimal sampling scheme. Bibliography A R M S T R O N G , M . AND D E L F I N E R , P . , 1980, Towards a more robust variogram: A case study on coal, Internal note N-671: Centre de Geostatistique, Fontainebleau, France. B A R D O S S Y , A . , 1988, Notes on the robustness of the kriging system: M a t h . Geol., 20, 189-203. CHRISTENSEN, R . , 1990, The equivalence of predictions from univeral kriging and intrinsic random function kriging: M a t h . Geol., 22, 655-664. C H R I S T E N S E N , R . , W E L S E Y , J . , AND P E A R S O N , L . M . , 1992, P r e d i c t i o n diagnos- tics for spatial linear models: Biometrika, 79, 583-591. C O O K , R . D . , 1977, Detection of influential observations i n linear regression: Technometrics, 19, 15-18. CRESSIE, N . , 1985, F i t t i n g variogram models by weighted least squares: J . Int. Assoc. M a t h . G e o l , 17, 563-586. CRESSIE, N . 1986, K r i g i n g nonstationary data, J . A m e r . Stat. Assoc. 81, 625-634 CRESSIE, N . 1987, A nonparametric view of generalised covariances for kriging: M a t h . G e o l , 19, 425-449. CRESSIE, N . , AND G L O N E K , G . , 1984, M e d i a n based covariogram estimators reduce bias: Stat. P r o b . Lett., 2, 299-304. CRESSIE, N . , A N D GRONDONA, M . O . , 1992, A comparison of variogram estimation w i t h covariogram estimation, in M a r d i a , K . V . (Ed.), The Art of Statistical Science: Wiley, New York. CRESSIE, N . , AND HAWKINS, D . M . , 1980, Robust estimation of the variogram, I: J . Int. Assoc. M a t h . G e o l , 12, 115-125. CRESSIE, N . , A N D ZIMMERMAN, D . L.,1992, O n the stability of the geostatistical method: M a t h . G e o l , 24, 45-59. DAVID, M . , 1977, Geostatistical Ore Reserve Estimation: 97 Elsevier, New Y o r k , 364p. Bibliography D A V I D , M . , 1988, 98 Handbook of Applied Advanced Geostatistical Ore Reserve Esti- mation: Elsevier, Amsterdam. D E L F I N E R , P . , 1976, Linear estimation of nonstationary spatial phenomena, in Guarascio, M . , et al. (Eds.), Advanced Geostatistics in the Mining Industry: Reidel, Dordrecht, 49-68. D E L F I N E R , P . , R E N A R D , D . , A N D CHILES, J . P . , 1978, BLUEPAK-3D Manual: Centre de Geostatistique, Fontainebleau, France. D I A M O N D , P . , AND A R M S T R O N G , M . , 1984, Robustness of variograms and conditioning of kriging matrices: M a t h . Geol., 16, 809-822. D U B R U L E , O . , 1983, T w o methods w i t h different objectives: splines and kriging: M a t h . G e o l , 15, 245-257. G E L ' F A N D , I. M . , A N D V l L E N K l N , N . Y . , 1964, Generalised Functions: Applica- tions of Harmonic Analysis, vol. 4: Academic Press, New York, 384p. G O L U B , G . H . , A N D REINSCH, C , 1970, Singular value decomposition and least squares solutions: Numer. M a t h . , 14, 403-420. G O L U B , G . H . , A N D V A N L O A N , C . F . , 1983, Matrix Computations: Johns Hopkins U n i v . Press, Baltimore. HAWKINS, D . M . , AND CRESSIE, N . , 1984, Robust kriging - a proposal: G e o l , 16, 3-18. Math. HICKS, C . R . , 1993, Fundamental concepts i n the design of experiments (4th ed.): Saunders College P u b . , New York, 509p. H U B E R , P . J . , 1964, Robust estimation of a location parameter: A n n . M a t h . Stat., 35, 73-101. H U B E R , P . , 1979, Robust smoothing, in Launer R . L . , and W i l k i n s o n , G . N . (Eds.), Robustness in Statistics: Academic Press, New York, 33-47. J O U R N E L , A . G . , 1980, The lognormal approach to predicting local distributions of selective mining unit grades: M a t h . Geol., 12, 285-303. J O U R N E L , A . G . , 1983, Nonparametric estimation of spatial distributions: M a t h . Geol. 15, 445-468. A . G . , 1989, Fundamentals of Geostatistics in Five Lessons: Short Course i n Geology, 8: American Geophysical Union, Washington, D . C . , 40p. JOURNEL, Bibliography 99 J O U R N E L , A . G . , A N D HUIJBREGTS, C . , 1978, Mining Academic Press, 600p. Geostatistics: London, KITANIDIS, P . K . , 1983, Statistical estimation of polynomial generalised covariance functions and hydrologic applications: Water Resour. Res., 19, 909-921. KITANIDIS, P . K . , 1985, Minimum-variance unbiased quadratic estimation of covariances of regionalised variables: M a t h . Geol., 17, 195-208. KITANIDIS, P . K . , 1991, Orthonormal residuals i n geostatistics: and parameter estimation: M a t h . Geol. 23, 741-758. M o d e l criticism KITANIDIS, P . K . , 1993, Generalised covariance functions i n estimation: G e o l , 25, 525-540. M A R C O T T E , D . , A N D DAVID, M . , 1985, The Bi-Gaussian approach: method for recovery estimation: M a t h . Geol., 17, 625-644. Math. A simple M A R D I A , K . V . , AND M A R S H A L L , R . J . , 1984, M a x i m u m likelihood estimation of models for residual covariance i n spatial regression: Biometrika, 7 1 , 135-146. M A R D I A , K . V . , AND M A R S H A L L , R . J . , 1985, M i n i m u m norm quadratic estimation of components of spatial covariances: M a t h . Geol., 17, 517-525. M A T H E R O N , G . , 1969, Le Krigeage Universel: Les Cahiers d u Centre de M o r p h o l o gie Mathematique de Fontainebleau, Fascicule 1. M A T H E R O N , G . 1971, L a theorie des variables regionalisees et ses applications: Les Cahiers du Centre de Morphologie Mathematique de Fontainebleau, Fascicule 5, 211p. M A T H E R O N , G . , 1973, The intrinsic random function: A d v . A p p l . P r o b . , 5, 438-468. M A T H E R O N , G . , 1976, A simple substitute to conditional expectation: T h e Disjunctive K r i g i n g , in Guarascio, M . , et al. (Eds.), Advanced Geostatistics in the Mining Industry: Reidel, Dordrecht, 221-236. M O L E R , C . B . , 1967, Computer Solutions H a l l , New Jersey. of Linear Algebraic Systems: Prentice- M O N T G O M E R Y , D . C , 1991, Design and analysis of experiments: Wiley, New York, 649p. Bibliography 100 Nonconforming Finite Element Solution for the Plate Bending problem with application to Visual Surface Reconstruction: M O U L T O N , J . D . , A N D SPITERI, R . , 1993, F i n a l project, C S 542a, U n i v . of B . C . N E U M A N , S. P . , A N D J A C O B S E N , E . A . , 1984, Analysis of nonintrinsic spatial variability by residual kriging w i t h application to regional groundwater levels: M a t h . G e o l , 16, 499-521. PRESS, W . H . , FLANNERY, B . P . , TEUKOLSKY, S . A . , AND V E T T E R L I N G 1989, 702p. W.T., Numerical Recipes: The Art of Scientific Computing: C a m b . U n i v . Press, R E N D U , J - M . M . , 1979, N o r m a l and lognormal estimation: M a t h . Geol., 1 1 , 407422. R I P L E Y , B . D . , 1981, Spatial Statistics: Wiley, New York. STARKS, T . , A N D F A N G , J . , 1982a, The effect of drift on the experimental semivariogram: J . Int. Assoc. M a t h . Geol., 14, 309-319. STARKS, T . , AND F A N G , J . , 1982b, O n the estimation of the generalised covariance function: J . Int. Assoc. M a t h . Geol., 14, 57-64. STEIN, M . L . , 1989, The loss of efficiency i n kriging prediction caused by misspecification of the covariance structure, in Armstrong, M . et al. (Eds.), Geostatistics vol. 1: Kluwer, 273-282. STEIN, M . L . , A N D H A N D C O C K , M . S., 1989, Some asymptotic properties of kriging when the covariance function is misspecified: M a t h . Geol., 2 1 , 171-190. SULLIVAN, J . , 1984, Conditional recovery estimation through P r o b a b i l i t y K r i g i n g , in Verly, G . et al. (Eds.), Geostatistics for Natural Resources Characterisation: Reidel, Dordrecht, 365-384. V E R L Y , G . , 1983, T h e Multigaussian approach and its application to the estimation of local reserves: M a t h . G e o l , 15, 259-286. W E I S B E R G , S., 1985, Applied Linear Regression: Wiley, New York, 324p. Z I M M E R M A N , D . L . , AND CRESSIE, N . , 1991, M e a n squared prediction error i n the spatial linear model w i t h estimated covariance parameters: A n n . Inst. Stat. M a t h . , 43 Bibliography 101 Z I M M E R M A N , D . L . , A N D H A R V I L L E , D . A . , 1989, O n the unbiasedness of the Papadakis estimator and other nonlinear estimators of treatment contrasts i n fieldplot experiments: Biometrika, 76, 253-259.
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Spatial estimation: the geostatistical point of view
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Spatial estimation: the geostatistical point of view McFadzean-Ferguson, Simon 1995
pdf
Page Metadata
Item Metadata
Title | Spatial estimation: the geostatistical point of view |
Creator |
McFadzean-Ferguson, Simon |
Date Issued | 1995 |
Description | Geostatistics involves the statistical estimation of erratic surfaces, similar to those found in geology, using sample data. It has been my experience that there are few texts in geostatistics written for people who are new to the subject, and who have not been immersed in it since its inception in the 1960's. To prevent other people becoming confused by the changing notation, and unspoken assumptions, I provide an overview of this subject, with the aim of providing a clearer understanding of the concepts involved, the assumptions made, and the motivation behind each type of estimator. I then concentrate on the more general form of estimation assuming a nonhomogeneous trend, called Universal Kriging. I explain in detail how this estimator can be found in an accurate and computationally efficient way. Using the information gained from robustness studies of this estimator, I then attempt to apply it to real surfaces, for different methods of covariance estimation and trend orders. |
Extent | 4585667 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
File Format | application/pdf |
Language | eng |
Date Available | 2009-01-20 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0079902 |
URI | http://hdl.handle.net/2429/3817 |
Degree |
Master of Science - MSc |
Program |
Mathematics |
Affiliation |
Science, Faculty of Mathematics, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 1995-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 831-ubc_1995-0381.pdf [ 4.37MB ]
- Metadata
- JSON: 831-1.0079902.json
- JSON-LD: 831-1.0079902-ld.json
- RDF/XML (Pretty): 831-1.0079902-rdf.xml
- RDF/JSON: 831-1.0079902-rdf.json
- Turtle: 831-1.0079902-turtle.txt
- N-Triples: 831-1.0079902-rdf-ntriples.txt
- Original Record: 831-1.0079902-source.json
- Full Text
- 831-1.0079902-fulltext.txt
- Citation
- 831-1.0079902.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0079902/manifest