BAYESIAN MULTIVARIATE INTERPOLATION WITH MISSING DATA AND ITS APPLICATIONS

by

WEIMIN SUN

B.Sc., University of Electronic Science and Technology of China, 1983
M.Sc., University of Georgia, 1990

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES, Department of Statistics

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

December 1994

© Weimin Sun, 1994

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Abstract

This thesis develops Bayesian multivariate interpolation theories for settings with: (i) data missing-by-design; (ii) randomly missing data; (iii) monotone missing data patterns.

Case (i) is fully discussed both theoretically and empirically. A predictive distribution yields a Bayesian interpolator with an associated standard deviation, a simultaneous interpolation region, and a hyperparameter estimation algorithm. These results are described in detail. The method is applied to interpolating Southern Ontario pollution data. An optimal redesign of a current network is proposed. A cross-validation study is conducted to judge the performance of our method. The method is compared with a Co-kriging approach, to which it is meant to be an alternative.

Case (ii) is briefly discussed. An approximation of a matrix T-distribution by a normal distribution is explored for obtaining an approximate predictive distribution.
Based on the approximate distribution, an approximate Bayesian interpolator and an approach for estimating hyperparameters by the EM algorithm are described.

Case (iii) is only touched on. Only an iterative predictive distribution is derived. Further study is needed to find ways of estimating hyperparameters.

Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
2 Bayesian Multivariate Interpolation with Missing Data
  2.1 Introduction
  2.2 Bayesian Interpolation
  2.3 Matrix T-Distribution
  2.4 Interpolation with Data Missing-by-design
    2.4.1 Predictive Distributions and Interpolation
    2.4.2 Estimation of Hyperparameters
    2.4.3 EM Algorithm
  2.5 Interpolation with Randomly Missing Data
  2.6 Interpolation with Monotone Missing Data Patterns
  2.7 Proofs of Theorems, Corollaries and Lemmas
3 Application to Air Pollution Data
  3.1 Application to Monthly Pollution Data
  3.2 Application to Daily Pollution Data
4 Application to Environmental Monitoring Network Redesign
  4.1 Theory of Network Redesign with Entropy
  4.2 Network Redesign with Data Missing-by-design
5 Cross-validation Study
  5.1 Simultaneous Interpolation Versus Univariate Interpolation
  5.2 Trends in Adjusted Mean Squared Prediction Error (AMSPE)
  5.3 Comparison with CoKriging
6 Concluding Remarks and Future Studies
Bibliography
Appendix A
Appendix B
Appendix C

List of Tables

3.1 Pollutants Measured at Each Gauged Site, Where a, b, d and e Represent NO2, SO4, O3 and SO2 Respectively
3.2 The Estimated Between-pollutants Hypercovariance Matrix of the Log-transformed, Summer Monthly Data
3.3 Correlations Between Residuals of Log-transformed, Observed and Estimated Pollution Levels at Gauged Sites
3.4 The Estimated Between-pollutants Hypercovariance Matrix of the Log-transformed Daily Summer Pollution Levels in Southern Ontario
3.5 Correlations Between the Residuals of Log-transformed, Summer Daily Observed and
Estimated Pollutants at Gauged Sites
5.1 Latitudes and Longitudes of the Gauged Sites
5.2 MSPEs by Both Methods

List of Figures

3.1 Locations of gauged sites in Southern Ontario plotted with Census Subdivision boundaries, where monthly pollution levels are observed and Sites 3, 29 (outliers) are not plotted
3.2 Locations of selected sites in Southern Ontario plotted with Census Subdivision boundaries, where monthly interpolated pollution levels are needed
3.3 Plots for monthly observed and fitted, log-transformed levels of O3 in ppb, SO2, NO2 and SO4 in µg/m3, at Gauged Site 5
3.4 Normal quantile-quantile plots for original and log-transformed monthly levels of SO4 in µg/m3 at Gauged Site 4
3.5 Plots for autocorrelation and partial autocorrelation of monthly, log-transformed levels of SO4 in µg/m3 at Gauged Site 4
3.6 A rough checkerboard obtained in the SG step
3.7 A smoother checkerboard obtained in the SG step
3.8 Scatter plot of observed covariances vs predicted covariances obtained by the SG approach
3.9 Means of monthly levels of O3 in ppb, in summers of 1983–1988, at gauged sites in Southern Ontario plotted with CSD boundaries
3.10 Means of monthly levels of O3 in ppb, in summers of 1983–1988, at selected sites in Southern Ontario plotted with CSD boundaries
3.11 Scatter plots for residuals of monthly observed pollutant levels against residuals of interpolated levels at the log-scale in winter and summer respectively, where levels of O3 are in ppb, SO2, NO2 and SO4 in µg/m3
3.12 Pollutant-wise scatter plots for residuals of monthly observed pollutant levels against residuals of interpolated levels at the log-scale in winter and summer respectively, where levels of O3 are in ppb; SO2, NO2 and SO4 in µg/m3
3.13 Boxplots for predicted, observed and residual levels of log-transformed, monthly concentrations of O3 in ppb, SO2, NO2 and SO4 in µg/m3, respectively
3.14 Locations of gauged sites in Southern Ontario plotted with Census Subdivision boundaries, where daily pollution levels are observed and Sites 3, 26 (outliers) are not plotted
3.15 Normal quantile-quantile plots for original and log-transformed daily levels of SO4 in µg/m3 at Gauged Site 1
3.16 Plots for autocorrelation and partial autocorrelation of daily, log-transformed levels of O3 in ppb at Gauged Site 6, before an AR(1) transformation is taken
3.17 Plots for autocorrelation and partial autocorrelation of daily, log-transformed levels of O3 in ppb at Gauged Site 6, after an AR(1) transformation is taken
5.1 Plot for trends in AMSPE
5.2 MSPEs obtained by Haas's interpolator and the Bayesian interpolator with original acid rain data
5.3 MSPEs obtained by Haas's interpolator and the Bayesian interpolator with log-transformed acid rain data
5.4 Boxplots of observed nitrate levels with/out log-transformation at 35 sites in the US

Acknowledgements

I would like to thank Prof. James Zidek for his excellent guidance, for his fundamental influence on my "statistical thinking" and for providing me with an opportunity to work on a practical problem. I would like to thank Dr. Nhu Le for the many fruitful discussions I had with him.

I am indebted to Dr. Rick Burnett for his comments and for providing me with the data used in my investigation. I thank Dr. Tim Haas; only with his generous help was the comparison of the method in this thesis with his own method feasible. I thank Mr. Hongbin Zhang, our systems analyst, for his support. I thank Mr. Rick White for providing part of the C code I needed for my analysis. I thank the members of my Supervisory Committee for their help.

Very special thanks go to Ms.
Chun Zhang, the McConnells and the Veecks for their substantial help.

Finally, I would like to thank the University and its Department of Statistics, along with Health Canada, for their financial support during my graduate studies.

Chapter 1

Introduction

A spatial data set consists of a collection of measurements or observations on one or more attributes taken at specified locations. At a fixed time, those data are observed over a restricted geographical or other region. Many models and theories have been established for spatial data analysis. The following paragraphs give a brief overview of this newly established field.

Cressie (1991a) defined a general spatial model. Let s ∈ R^d be a generic sampling site in d-dimensional Euclidean space and X(s) the response vector at s. As s varies over the index set D ⊂ R^d, {X(s) : s ∈ D} represents a multivariate random field (or, alternatively, process). According to the form of D, spatial data analyses can be classified into four types. When D is a fixed, non-degenerate convex subset of R^d and X(s) is a random vector at site s ∈ D, the analysis is called geostatistical data analysis. When D is a fixed collection of countably many points of R^d and X(s) is a random vector at site s ∈ D, it is called lattice data analysis. When D is a random point process and X(s) is a random vector at site s ∈ D, it is called point pattern data analysis. When D is a random point process and X(s) is a random set, it is called object data analysis. Cressie (1991a) gives a comprehensive survey of spatial data analyses.

An early form of spatial data occurred as a data map centuries ago (Halley 1686). Recently spatial data have been seen in many applications. For example, the grade of an ore deposit is measured at various sites spread over a constrained geographical area.
In terms of the general spatial model, D is the constrained geographical area, d is two, s is a two-dimensional vector of the longitude and latitude of a site, and X(s) is the measured ore grade at location s. Another example is the remotely sensed yield of wheat on earth at a fixed time. In the ore grade example, the ore grade at a site is likely similar to the grade at a nearby site. However, it will be less similar to that at a faraway site, provided the two sites are not on the same ore deposition ridge. This means an underlying relationship among the ore grades does exist. This underlying relationship is called "spatial correlation". Spatial correlation plays an important role in spatial inference.

While the ore grade does not change over time, observations like humidity, temperature and wind within a region differ from time to time. So they have to be measured not only across sites, but also over time. Such data are called spatial-temporal data. Another example of spatial-temporal data comes from air pollution monitoring. There, air pollution levels, e.g. sulphate or nitrate levels, are observed at different sites and regularly, say hourly, for a segment of time.

Among the four types of spatial analyses, geostatistics is most relevant to the topic of this dissertation. Geostatistics was established in the early 1980s as a hybrid discipline of mining engineering, geology, mathematics and statistics. Its study began with Matheron's early 1960s papers on Kriging, a name given by Matheron after D. G. Krige, a South African mining engineer who developed empirical methods to determine the true ore grade distribution based on sampled ore grades. In Kriging, the model of a random process X(s) generally consists of two terms: a trend and an error. The trend term captures large-scale variation of X(s) and is deterministic. The error term reflects small-scale variation of X(s) and is a random process.
The Kriging approach gives a "best linear unbiased estimator" (BLUE) of the unknown ore grade at sites in the prediction region using ore-grade samples from neighboring sites, by exploiting correlations among ore grades at different sites. Later, in other applications, many different forms of Kriging were developed. Examples are ordinary Kriging, simple Kriging, universal Kriging, Bayesian Kriging, robust Kriging and Co-Kriging. These Kriging methods also give best linear predictors. Thus Kriging has become synonymous with optimal spatial linear prediction.

Before exploring the family of Kriging methods further, we define some basic concepts. In Kriging, the spatial correlation is expressed in terms of the variogram, defined as Var(X(s1) − X(s2)), s1, s2 ∈ D. When E(X(s1) − X(s2)) = 0,

  E(X(s1) − X(s2))^2 = Var(X(s1) − X(s2)).

That is, if the mean function of a random field is constant, the variogram and the expected mean squared difference are the same. Sometimes the variogram has other names; for example, it is called the dispersion in Sampson and Guttorp (1992) (SG hereafter). Another important concept in Kriging is intrinsic stationarity. An intrinsically stationary process X(s) has: (i) a constant mean for all s ∈ D; (ii) Var(X(s1) − X(s2)) = 2γ(s1 − s2), where γ(·) is a real, non-negative function on R^d. In Cressie (1991a), 2γ(·) is called the variogram and γ(·) the semi-variogram. The concept of a variogram implicitly assumes intrinsic stationarity. If (ii) is replaced by Cov(X(s1), X(s2)) = C(s1 − s2), X(s) is a second-order stationary process. The function C(·) is called a covariogram. If, further, 2γ(s1 − s2) = 2γ(||s1 − s2||) (respectively C(s1 − s2) = C(||s1 − s2||)), then 2γ(·) (respectively C(·)) is isotropic. The cross-variogram and cross-covariogram between two random fields X_i(s) and X_j(s) have similar definitions.

Intrinsic stationarity is a strictly weaker assumption than second-order stationarity. From 2γ(h) = 2(C(0) − C(h)), we can prove that second-order stationarity implies intrinsic stationarity.
However, the opposite is not true. For example, Brownian motion is an intrinsically stationary process but not a second-order stationary process.

In many applications, the variogram (covariogram) is unknown and has to be estimated. There are both parametric and nonparametric approaches for estimating a variogram within a constrained region. Usually a parametric approach involves two steps. First, lags h_1, ..., h_m are chosen and sample variograms (covariograms) at these lags are estimated using the observed data. Second, a proper, often isotropic, variogram (covariogram) model is chosen and the model parameters are determined by fitting it to the sample variograms (covariograms). Matheron (1962) proposes a natural way of computing a sample variogram (covariogram) by the method of moments. His estimator is unbiased but not robust. Cressie and Hawkins (1980) propose two robust sample variogram estimators. In choosing a proper variogram (covariogram) model, precautions are needed. For example, a variogram model should satisfy the conditional negative-definiteness condition: Σ_i Σ_j a_i a_j 2γ(s_i − s_j) ≤ 0 for any real vector a ∈ R^k with Σ_i a_i = 0, any s_1, ..., s_k ∈ D and any integer k. Similarly, a covariogram model must satisfy the positive-definiteness condition: Σ_i Σ_j a_i a_j C(s_i − s_j) ≥ 0 for any real vector a ∈ R^k, any s_1, ..., s_k ∈ D and any integer k. There are other considerations too. For example, a variogram model should be able to reflect the sill and the nugget effect that observed data may present, the nugget being defined as a variogram's non-zero limit at lag zero. Many isotropic variogram models have been proposed. Examples are the rational quadratic model, γ(h) = a^2 ||h||^2 / (1 + ||h||^2), h ∈ R^d (Schoenberg 1938); the Gaussian model, γ(h) = a^2 {1 − exp(−||h||^2)}, h ∈ R^d; the linear model; the exponential model; and the spherical model (Journel and Huijbregts 1978). A covariogram model is chosen similarly.
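The method-of-moments and robust sample variogram estimators mentioned above are easy to compute. The following is a minimal pure-Python sketch on a synthetic 1-D transect; the function names and the data are invented for illustration, and the Cressie–Hawkins form used is the fourth power of averaged root absolute differences with the bias correction 0.457 + 0.494/N.

```python
import math
import random

def sample_variogram(z, sites, lag, tol=0.5):
    """Matheron's method-of-moments estimator of the semi-variogram at a lag:
    half the average squared difference over all site pairs whose separation
    falls within lag +/- tol."""
    diffs = [(z[i] - z[j]) ** 2
             for i in range(len(sites)) for j in range(i + 1, len(sites))
             if abs(abs(sites[i] - sites[j]) - lag) <= tol]
    return 0.5 * sum(diffs) / len(diffs)

def robust_variogram(z, sites, lag, tol=0.5):
    """One of Cressie and Hawkins' (1980) robust estimators, built from
    square-root absolute differences, which damps the effect of outliers."""
    d = [abs(z[i] - z[j]) ** 0.5
         for i in range(len(sites)) for j in range(i + 1, len(sites))
         if abs(abs(sites[i] - sites[j]) - lag) <= tol]
    n = len(d)
    m = sum(d) / n
    return 0.5 * m ** 4 / (0.457 + 0.494 / n)

# Synthetic 1-D transect: a smooth signal plus noise, so the sample
# semi-variogram should grow with the lag.
random.seed(1)
sites = [0.5 * i for i in range(80)]
z = [math.sin(0.3 * s) + random.gauss(0.0, 0.1) for s in sites]
g1 = sample_variogram(z, sites, 1.0)
g5 = sample_variogram(z, sites, 5.0)
```

On data with smooth spatial structure, the sample semi-variogram should increase with the lag, which the comparison of g1 and g5 illustrates.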
A natural model fitting criterion is "least squares". Other possible criteria include maximum likelihood, restricted maximum likelihood and minimum norm quadratic estimation. When observed data do not conform to a stationarity or isotropy condition, measures are taken to make the data stationary or isotropic. For example, Haas (1990a, 1990b, 1992, 1993) adopts a moving window approach so that all the observations inside a circular window are approximately isotropic.

SG describe a nonparametric approach to estimating a spatial dispersion matrix when the observed data are not stationary. Here a dispersion matrix has the same meaning as a variogram matrix except that no stationarity assumption is implicitly implied. Their method takes two steps. First, with the "nonmetric multidimensional scaling" (MDS) algorithm (see Mardia, Kent and Bibby 1979), a two-dimensional representation of the sampling sites is sought. In this two-dimensional Euclidean space, called the D-plane, a monotonic function g of the distance between any two points approximates the spatial dispersion between the two points. As a counterpart of the D-plane, the geographical coordinate plane of the sampling sites is called the G-plane. Second, thin-plate splines f are found to provide smooth mappings of the G-plane representation into the D-plane representation. Then the composition of f and g yields a nonparametric estimator of var(Z(x_a) − Z(x_b)) at any two geographic locations x_a and x_b. Other nonparametric estimation methods for a spatial covariance matrix are discussed, for example, in Loader and Switzer (1991), Leonard and Hsu (1992), Le and Zidek (1992) and Pilz (1991). While both adopt Bayesian approaches, Le and Zidek (1992) take a conjugate prior, the inverted Wishart, on the spatial covariance matrix, whereas Leonard and Hsu (1992) propose a class of prior distributions other than the inverted Wishart.
With the above preliminaries, we can now summarize ordinary Kriging as follows. Assume the model X(s) = μ + δ(s), s ∈ D, where D is a convex subset of R^d, δ(·) is a stochastic process and μ is an unknown, constant scalar. Kriging searches for an optimal linear predictor of any unobserved value X(s0) within the family of linear functions of X(s_i), i = 1, ..., n. The predictor therefore takes the form p(X; s0) = Σ_{i=1}^n λ_i X(s_i) under the restriction Σ_{i=1}^n λ_i = 1. The restriction makes p(X; s0) unbiased, since E(p(X; s0)) = Σ_{i=1}^n λ_i μ = μ = E(X(s0)).

Suppose X(s) is an intrinsically stationary process. By minimizing

  E(X(s0) − Σ_{i=1}^n λ_i X(s_i))^2 + 2m(1 − Σ_{i=1}^n λ_i),   (1.1)

over λ and the Lagrange multiplier m, the optimal choices of λ = (λ_1, ..., λ_n)^t and m have the forms

  λ = Γ^{-1}(γ + 1 (1 − 1^t Γ^{-1} γ)/(1^t Γ^{-1} 1))

and

  m = −(1 − 1^t Γ^{-1} γ)/(1^t Γ^{-1} 1),

where γ = (γ(s0 − s_1), ..., γ(s0 − s_n))^t is an n × 1 vector and Γ is an n × n matrix whose (i,j)th element is γ(s_i − s_j).

When X(s) is second-order stationary, the above optimal solution can be expressed in terms of covariograms. The solutions for λ and m are obtained by replacing γ with c, where c = (C(s0 − s_1), ..., C(s0 − s_n))^t, Γ with Σ_C = (C(s_i − s_j)), and m with −m. If the variogram or covariogram function is known, ordinary Kriging stops here. If unknown, it is estimated with either a parametric or a nonparametric method.

When μ has a more complicated form, other Kriging approaches are developed. In simple Kriging (Matheron 1971), the model is X(s) = μ(s) + δ(s), where μ(·) is known. The best linear predictor is sought within the family p(X; s0) = Σ_{i=1}^n λ_i X(s_i) + k subject to an unbiasedness restriction. By minimizing E(X(s0) − p(X; s0))^2 over λ = (λ_1, ..., λ_n)^t and k, the optimal solution is k = μ(s0) − λ^t μ(s) and λ = Σ_C^{-1} c, where c = (C(s0, s_1), ..., C(s0, s_n))^t and Σ_C = (C(s_i, s_j)).

Universal Kriging is introduced when a more general form of μ is assumed. That general form is X(s) = Σ_{j=1}^{p+1} f_{j−1}(s) β_{j−1} + δ(s), where β = (β_0, ..., β_p)^t ∈ R^{p+1} is an unknown vector of parameters, the f_{j−1}(·), j = 1, ..., p+1, are known functions, and δ(·) is a zero-mean intrinsically stationary process with variogram 2γ(·).
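In covariogram form, the ordinary Kriging weights above amount to a bordered linear system in which the unbiasedness constraint is appended via the Lagrange multiplier: the matrix (C(s_i − s_j)) augmented with a row and column of ones, with right-hand side (c, 1). A small pure-Python sketch follows; the exponential covariogram and the 1-D site layout are assumed for illustration.

```python
import math

def solve(A, b):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ordinary_kriging_weights(sites, s0, cov):
    """Weights and Lagrange multiplier from the covariogram form of the
    ordinary Kriging equations: [C 1; 1^t 0][lambda; m] = [c; 1]."""
    n = len(sites)
    A = [[cov(abs(si - sj)) for sj in sites] + [1.0] for si in sites]
    A.append([1.0] * n + [0.0])
    b = [cov(abs(si - s0)) for si in sites] + [1.0]
    sol = solve(A, b)
    return sol[:n], sol[n]

# Assumed exponential covariogram C(h) = exp(-h) on a 1-D transect.
cov = lambda h: math.exp(-h)
sites = [0.0, 1.0, 2.5, 4.0]
lam, m = ordinary_kriging_weights(sites, 1.7, cov)
```

Two sanity checks fall out of the theory: the weights sum to one by construction, and because Kriging is an exact interpolator (with no nugget), predicting at an observed site puts weight one on that site.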
In universal Kriging, the best linear unbiased predictor takes the form p(X; s0) = Σ_{i=1}^n λ_i X(s_i), subject to λ^t X = x^t, λ = (λ_1, ..., λ_n)^t, where X here denotes the n × (p+1) matrix whose (i,j)th element is f_{j−1}(s_i) and x = (f_0(s0), ..., f_p(s0))^t. If the variogram is known, the solutions λ_i, i = 1, ..., n, are easily derived. If unknown, it needs to be estimated. One problem occurs when one estimates the variogram. Since in universal Kriging X(s) is not intrinsically stationary, the sample variograms computed from the observed data X(s_i), i = 1, ..., n, by the formulas of Matheron (1962) and Cressie and Hawkins (1980) are biased. To obtain unbiased variogram estimators, the residuals of X(s) must be known; but they are unknown, since β is generally unknown. To bypass this dilemma, Neuman and Jacobson (1984; see also Haas 1993) propose an iterative method starting with ordinary least squares (o.l.s.). Note that the model for the vector X^t = (X(s_1), ..., X(s_n)) can be rewritten in matrix form, X = Zβ + δ, where Z is an n × (p+1) matrix whose (i,j)th element, i = 1, ..., n, j = 0, 1, ..., p, is f_j(s_i); β is the vector of regression coefficients and δ is a stochastic error vector. The iterative procedure is as follows. First, estimate β by β̂_ols = (Z^t Z)^{-1} Z^t X and obtain Σ̂ based on the residuals X − Z β̂_ols. Second, update β by β̂_gls = (Z^t Σ̂^{-1} Z)^{-1} Z^t Σ̂^{-1} X and update Σ̂ with the new residuals X − Z β̂_gls. Repeat the above procedure until it converges.

In the previous Kriging approaches, an estimate of X(s0) is computed using information on the same process, X(s_i), i = 1, ..., n. In some applications, additional information is available; in these cases, realizations of other correlated random processes are observed. CoKriging was developed to bring this additional information into the BLUE predictor. More specifically, if the observed data set is X = (X(s_1), ..., X(s_n))^t, an n × k matrix with (i,j)th element X_j(s_i), i = 1, ..., n, j = 1, ..., k, one needs to predict X_1(s0) using X. Suppose E(X_j(s)) = μ_j, j = 1, . . .
, k, s ∈ D, and cov(X(s), X(u)) = C(s, u), s, u ∈ D, where μ = (μ_1, ..., μ_k)^t and C(s, u) is a k × k matrix. The best linear CoKriging predictor of X_1(s0) takes the form p_1(X; s0) = Σ_{i=1}^n Σ_{j=1}^k λ_{ij} X_j(s_i), with Σ_{i=1}^n λ_{i1} = 1 and Σ_{i=1}^n λ_{ij} = 0 for j = 2, ..., k. The remaining steps are the same as those for the other Kriging methods.

Applications of the CoKriging method in environmetrics, soil science and hydrology can be found in Haas (1993), Yates and Warrick (1987), as well as Ahmed and de Marsily (1987). Haas (1993) proposes a Moving Window Regression Residual CoKriging (MWRRCK) method for predicting pollution carried in rainfall. There, the observed information includes: (i) wet deposition of pollutants monitored at 200 sampling sites in the US; (ii) observations of precipitation at over 6000 sites of the National Weather Service (NWS) network. We outline the MWRRCK method below.

First, a moving circular window centered at a prediction site is selected to achieve local isotropy. The radius of the circular window is chosen so that the total number of monitoring sites inside the window is not less than a predetermined value; this number is set to make sample variograms reasonably accurate. Second, the spatial trend surface in the window is removed, and the spatial covariance matrix estimated, through an iterative o.l.s.-g.l.s. procedure. Third, a regular CoKriging method is applied to the residual process to estimate the residual at the prediction site. The final prediction is the sum of the predicted trend and the predicted residual.

In Haas's theory, care has been taken to make the covariance matrix within a window positive definite. However, the covariance matrix between windows need not be positive definite. The problem arises when two prediction sites are geographically close and one of the two sites falls inside the circular window of the other.
Because different fitted semivariogram and cross-semivariogram models are obtained for each window, it can happen that the estimated covariance matrix between these two sites is not positive definite.

Although Kriging is an appealingly simple method, it tends to understate the uncertainties of predicted values, since model uncertainty is not included. A simple example given in Handcock and Stein (1993) shows how large an effect the model uncertainty has on the confidence interval (CI) of a predicted value. There, those authors showed that if the uncertainty of the unknown spatial covariance is not included, the 95% CI of a predicted elevation based on measured elevations is (699, 707); when it is included by using a Bayesian approach, the 95% CI becomes (694, 713). This example clearly supports the use of a Bayesian approach to Kriging. Such Kriging methods are described as Bayesian.

Bayesian Kriging is relatively new, and not much work has been done in geostatistics. For a general model X(s) = μ(s) + δ(s), s ∈ D, a Bayesian approach can be adopted by assuming μ(·) to be a random process that is independent of δ(·). More specifically, μ(s) = Σ_{j=1}^{p+1} β_{j−1} f_{j−1}(s), s ∈ D, where the β_j, j = 0, ..., p, are random variables (see Näther 1985, Kitanidis 1986, and Omre and Halvorsen 1989). For example, Omre and Halvorsen (1989) describe a version of Bayesian Kriging that puts a prior on the mean function only. Their method appears to be a direct extension of the traditional Bayesian linear model to the case of spatial data. An empirical Bayesian predictor is obtained if the parameters of the prior are estimated from the data and substituted into the Bayesian predictor of X(s0). Similarly, one can assume the covariogram varies in the space of all positive-definite functions PD = {C(s, u) : s, u ∈ D} and put a (prior) distribution on PD. Or one can assume a structural model on C(·), say C(s, u; θ), and put a prior on θ.
For the latter case, the predictive density is

  f(X(s0) | X) = ∫_Θ f(X(s0) | X, θ) f(θ | X) dθ.

One example of such an approach is proposed by Handcock and Stein (1993). In Handcock and Wallis (1994), the method is applied to model meteorological fields in an attempt to assess the global warming problem from a statistical point of view.

One competitor to Kriging is the smoothing spline (Wahba 1990a, 1990b). The spline method can be briefly summarized as follows. For observed data X_i, i = 1, ..., n, the assumed model is X_i = f(s_i) + ε_i, where the s_i, i = 1, ..., n, are the locations of the measurements, f(·) is a smooth function and the ε_i, i = 1, ..., n, are independently, identically distributed (i.i.d.) errors. For each value of λ, the smoothing function f_λ(s) is estimated by minimizing

  (1/n) Σ_{i=1}^n {X_i − f(s_i)}^2 + λ ∫ [f''(x)]^2 dx

over all f with continuous first derivatives and square-integrable second derivatives. The smoothing parameter λ is chosen by "generalized cross-validation" (Wahba 1990b). The interpolated value at any location s0 is taken as f_λ(s0). Oehlert (1993) shows one example which combines a smoothing spline technique with a Bayesian approach. There, Oehlert proposes a multiple time series model for data that have both temporal and spatial correlations. The smoothing spline is used to extend the mean and trend from the rectangles containing the monitoring sites to rectangles with no monitors. Many empirical comparisons have suggested that the interpolation performances of spline methods and Kriging methods are similar (see Laslett (1994) for references to these comparisons). Laslett (1994) demonstrates that in certain cases Kriging surpasses splines by a big margin.
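For equally spaced data, a discrete analogue of the smoothing-spline criterion replaces the integrated squared second derivative by a sum of squared second differences (Whittaker-style graduation rather than Wahba's exact spline): minimizing Σ(x_i − f_i)^2 + λ Σ(Δ²f_i)^2 leads to the linear system (I + λ D^t D) f = x, where D is the second-difference matrix. A pure-Python sketch, with invented function name and data:

```python
def whittaker_smooth(x, lam):
    """Discrete analogue of the smoothing spline: minimize
    sum (x_i - f_i)^2 + lam * sum (second differences of f)^2,
    i.e. solve (I + lam * D^t D) f = x for equally spaced data."""
    n = len(x)
    # Build A = I + lam * D^t D, D the (n-2) x n second-difference matrix.
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 1.0
    for r in range(n - 2):
        d = [0.0] * n
        d[r], d[r + 1], d[r + 2] = 1.0, -2.0, 1.0
        for i in range(n):
            if d[i]:
                for j in range(n):
                    if d[j]:
                        A[i][j] += lam * d[i] * d[j]
    # Solve A f = x by Gaussian elimination with partial pivoting.
    M = [row[:] + [x[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            fac = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= fac * M[c][k]
    f = [0.0] * n
    for r in range(n - 1, -1, -1):
        f[r] = (M[r][n] - sum(M[r][k] * f[k] for k in range(r + 1, n))) / M[r][r]
    return f

x = [1.0, 2.0, 1.5, 3.0, 2.5, 4.0]
f = whittaker_smooth(x, 10.0)
```

As λ → 0 the fit reproduces the data exactly; as λ grows, the second differences are driven to zero, so the fit approaches a straight line. This mirrors the role of the smoothing parameter in the spline criterion above.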
LZ assume independent responses over time, a common unknown spatialcovariance structure at each fixed time point and conjugate priors on both trend parameters and the spatial covariance. As an alternative to Kriging, LZ’s method has many10advantages. For example, it incorporates the model uncertainty into its interpolator.It uses incoming information to dynamically update estimation and gives a predictivedistribution that enables construction of a simultaneous band for interpolated values.In its original form, the LZ method does not allow such data. In this thesis, we extendthe LZ method to include missing data.We develop and explain the extension in the context of interpolating air pollution. Let’sfirst describe this context. Assume there are s sites scattering over a constrained region.At each site, k response values are measured. Examples of such measurements are airpollution levels, like sulphate or nitrate levels. Among the s sites, s,, sites are ungaugedsites, where there are no observations but air pollution levels are needed. The other 8gsites are gauged; there are observations. In the case of missing data, some values aremissing at the gauged sites. The missing data patterns discussed in the next chapterare called randomly missing, missing-by-design and monotone missing. Here, we use theterm “randomly missing” to mean its probability of being missing is not related to itsvalue (see Little and Rubin 1987). The term “monotone missing” is also from Littleand Rubin (1987). The meanings of the three missing data patterns are explained inthe next chapter.Let X denote the complete, random response vector for all sites (both gauged andungauged) at time t, where the first k elements represent pollution levels for k pollutantsat site one, the second k elements for site two and so on. Thus, X, is an sk-dimensionalvector. 
The inferential objective treated in this thesis is to interpolate the unobserved pollution levels at ungauged sites using the incomplete observations at gauged sites. As in LZ, we assume a linear, multivariate Gaussian model and conjugate priors on its parameters. Interpolation with missing data consists of two parts. First, fixing the hyperparameters, we find a predictive distribution and its posterior mean along with a standard error. Second, we develop an estimation procedure for the hyperparameters. Further, in two steps we estimate the hypercovariance matrix of X_t. In step one, we adopt an EM algorithm to estimate the hypercovariance matrix at gauged sites. In step two, we apply the SG approach to extend this hypercovariance matrix to all sites.

This thesis consists of six chapters. In Chapter 2, we describe three different interpolation theories depending on the missing data patterns. The patterns are: (i) missing-by-design; (ii) randomly missing; (iii) monotone. There, we fully develop the theory of interpolation with data missing-by-design; we briefly discuss the theory of interpolation with randomly missing data and only touch on the theory of interpolation with monotone missing data. In Chapter 3, we apply the theory of interpolation with data missing-by-design to Southern Ontario pollution data; we implement it with S and C programs and carry out a residual analysis. In Chapter 4, by combining the theory developed in Chapter 3 with the theory developed in Caselton, Kan and Zidek (1992), we show how to apply our results to an optimal network redesign problem. In Chapter 5, we compare the theory of "interpolation with data missing-by-design" with the general theory of LZ and also with Haas's CoKriging method. In Chapter 6, we draw conclusions and list some future research topics. All figures referred to in Chapter 3 and Chapter 5 are collected in Appendix C.
We attach examples of S and C programs in Appendix A and B.

Chapter 2

Bayesian Multivariate Interpolation with Missing Data

2.1 Introduction

The problem of interpolating a spatial field has received a lot of attention. Kriging offers a well-known solution, but it has deficiencies. In particular, it overstates the precision of its interpolator because it fails to reflect the model uncertainty (LZ; Brown, Le and Zidek 1994a, hereafter BLZ). To avoid these deficiencies, LZ propose a hierarchical Bayesian alternative to Kriging. The LZ method takes a Bayesian time series approach with a Gaussian linear model and conjugate priors for the model parameters. Such a Bayesian approach incorporates model uncertainty and yields a heavier-than-normal-tailed posterior distribution, the multivariate t. LZ's Bayesian alternative also has the advantage of dynamically updating the predictor as more observations come in. LZ developed their theory in a univariate setting where, at each of s sites, only one air pollutant is monitored.

BLZ extend the LZ Bayesian interpolation theory to the multivariate case, where at each site k air pollutants are monitored. BLZ adapt the original LZ Bayesian theory by stacking rows of the s × k response matrix into an sk × 1 vector, where each row represents k measurements at a gauged site. To reduce the total number of hyperparameters in the prior distributions, BLZ assume a Kronecker product structure on the hypercovariance matrix. BLZ then describe an algorithm for hyperparameter estimation. But the BLZ theory has an important restriction: it does not permit missing values at gauged sites.

In this chapter, theories of Bayesian multivariate interpolation with different patterns of missing data are discussed. These patterns are: (i) missing-by-design; (ii) randomly missing; and (iii) monotone. In all three cases, we follow the hierarchical Bayesian approach of LZ. In the case of data missing-by-design, the proposed Bayesian interpolation method is an alternative to Co-Kriging.
In a Bayesian analysis, the predictive distribution plays a key role, and we need to derive the predictive distribution of the unknown pollution levels at the ungauged sites given the observations at the gauged sites.

Section 2 gives a brief review of the LZ theory. Section 3 serves as a technical preparation for the following sections. There, we define the matrix T-distribution, which plays a pivotal role in our inference, and explore its normal approximation. Section 4 spells out the theory of Bayesian multivariate interpolation with data missing-by-design. Section 5 gives a brief discussion of interpolation theory with randomly missing data. There, we describe an approximate predictive distribution and the estimation of hyperparameters. Section 6 concerns interpolation theory with monotone missing data patterns; there, only a recursive predictive distribution is described. The proofs of most theorems, lemmas and corollaries appearing in this chapter are collected in the last section.

2.2 Bayesian Interpolation

In this section, the LZ Bayesian interpolation theory is briefly summarized. Let X_t be an s-dimensional random vector at time t, where the first s_u elements, denoted by X_t^u, correspond to the unobserved responses at the s_u ungauged sites. The remaining s_g elements, denoted by X_t^g, correspond to the observed responses at the s_g gauged sites. Assume:

X_t | Z_t, B, Σ ~ (independent) N_s(B Z_t, Σ),   (2.1)

where Z_t is an h-dimensional vector of known covariates at time t and B is an s × h matrix of regression coefficients. The priors of the unknown parameters B, Σ are taken as conjugates of the normal model,

B | B°, Σ, F ~ N_{s×h}(B°, Σ ⊗ F^{-1}),   (2.2)

Σ | Φ, δ* ~ W_s^{-1}(Φ, δ*).   (2.3)

Let A^t denote the transpose of a matrix A. Since X_t is partitioned into X_t^u and X_t^g, Σ and B are partitioned correspondingly as

Σ = ( Σ_{11}  Σ_{12} ; Σ_{21}  Σ_{22} )  and  B = ( B_1 ; B_2 ).

Define

S = (X^g − B̂_2 Z)(X^g − B̂_2 Z)^t,  B̂_2 = C A^{-1},  C = X^g Z^t,  A = Z Z^t.

Note that D denotes the set of all observed data. With a straightforward calculation, the posterior distributions of B and Σ are easily found.
Lemma 2.1 gives their forms.

Lemma 2.1 With Σ reparameterized as {Σ_{22}, τ_{12}, Σ_{1|2}}, where

τ_{12} = Σ_{12} Σ_{22}^{-1},  Σ_{1|2} = Σ_{11} − Σ_{12} Σ_{22}^{-1} Σ_{21},

the posterior distributions of B and Σ are:

B | D, B°, F ~ a matrix t-distribution centred at B*,
Σ_{22} | D, Φ_{22}, δ* ~ W^{-1}(Φ̂_{22}, δ* + n),
Σ_{1|2} | D, Φ_{1|2}, δ* ~ W^{-1}(Φ̂_{1|2}, δ*),
τ_{12} | Σ_{1|2}, D ~ N_{s_u × s_g}(τ̂_{12}, Σ_{1|2} ⊗ Φ̂_{22}^{-1}),

where

τ̂_{12} = Φ_{12} Φ_{22}^{-1},  Φ̂_{1|2} = Φ_{11} − Φ_{12} Φ_{22}^{-1} Φ_{21},
B* = B° + ( τ̂_{12} ; I ) (B̂_2 − B°_2) E^t,
E = F^{-1}(A^{-1} + F^{-1})^{-1},
Φ̂_{22} = Φ_{22} + S + (B̂_2 − B°_2)^t (A^{-1} + F^{-1})^{-1} (B̂_2 − B°_2).

LZ show that the predictive distribution of a future unknown realization X_f consists of two multivariate t-distributions. These distributions are given in Theorem 2.1. For completeness, the definition of a multivariate t-distribution is repeated here. A multivariate t-distribution, denoted t_r(μ, H, δ), is defined to have density function

f(x) = k(r, δ) |H|^{1/2} [1 + (x − μ)^t H (x − μ)]^{−(δ+r)/2},   (2.4)

where k(r, δ) = Γ((δ + r)/2) / [Γ(δ/2) π^{r/2}].

Theorem 2.1 Let the posterior distributions of B and Σ be defined as in Lemma 2.1. The predictive distribution of X_f = ((X_f^u)^t, (X_f^g)^t)^t, given the covariate vector Z_f and the hyperparameters B°, Φ, δ*, F, consists of

X_f^g | D ~ t_{s_g}( a_{(1)} + b, [(c − d) Φ̂_{22} / q̂]^{-1}, q̂ )

and

X_f^u | X_f^g = x, D ~ t_{s_u}( a_0 + τ̂_{12}(x − a_{(1)}), {[c + (a_{(1)} − x)^t Φ̂_{22}^{-1} (a_{(1)} − x)] Φ̂_{1|2} / q}^{-1}, q ),

where

δ̂ = δ* + n,  q̂ = δ̂ − s_g + 1,  q = δ̂ − s + 1,
a = ( a_0 ; a_{(1)} ) = B* Z_f,
b = (B̂_2 − B°_2) E Z_f,
c = 1 + Z_f^t F^{-1} Z_f,
d = Z_f^t E F^{-1} Z_f.

With these predictive distributions, the Bayesian predictor is simply taken as the posterior mean.

Corollary 2.1 Given the predictive distributions of Theorem 2.1 and Z_f, the predictive means of X_f^u and X_f^g are

X̂_f^u = E(X_f^u | D) = a_0 + τ̂_{12} b,   X̂_f^g = E(X_f^g | D) = a_{(1)} + b.

2.3 Matrix T-Distribution

A representation of the normal distribution, given as a lemma in Lindley (1972), is summarized here. For any given constant matrices A, B, assume

X | Y ~ N(AY, Σ_1)  and  Y ~ N(B, Σ_2).

Then, marginally, Lindley's result asserts

X ~ AB + N(0, Σ_1 + A Σ_2 A^t).

This fact will be used repeatedly in the sequel without explicit mention.

Besides the normal distribution, the matrix T-distribution plays a pivotal role in the theories developed in this chapter.
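The density (2.4) is easy to evaluate once its normalizing constant is written out. The following is a minimal numerical sketch (the function name and the test values are ours, not part of the LZ development); note that in this parametrization H is a precision-type matrix, not a covariance.

```python
import math
import numpy as np

def mvt_logpdf(x, mu, H, delta):
    # Log-density of t_r(mu, H, delta) in the parametrization of (2.4):
    #   f(x) = k(r, delta) |H|^(1/2) [1 + (x - mu)^t H (x - mu)]^(-(delta + r)/2),
    # with k(r, delta) = Gamma((delta + r)/2) / (Gamma(delta/2) * pi^(r/2)).
    x, mu, H = np.asarray(x, float), np.asarray(mu, float), np.asarray(H, float)
    r = mu.size
    d = x - mu
    quad = float(d @ H @ d)
    sign, logdet = np.linalg.slogdet(H)
    assert sign > 0, "H must be positive definite"
    logk = (math.lgamma((delta + r) / 2.0) - math.lgamma(delta / 2.0)
            - 0.5 * r * math.log(math.pi))
    return logk + 0.5 * logdet - 0.5 * (delta + r) * math.log1p(quad)
```

With r = 1 and H = 1/δ this reduces to the ordinary Student t_δ density, which gives a quick check of the constant k(r, δ).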
The definition and some properties of the matrix T-distribution are discussed next.

Definition 2.1 A random matrix T : p × q is said to follow the matrix T-distribution if its density is expressible as

f(t) = k[m, p, q]^{-1} |P|^{-(m−q)/2} |Q|^{-p/2} |P^{-1} + t Q^{-1} t^t|^{-m/2},   (2.5)

where P_{p×p} > 0, Q_{q×q} > 0, m > p + q − 1,

k[m, p, q] = π^{pq/2} Γ_q((m − p)/2) / Γ_q(m/2),

and Γ_q(x) = π^{q(q−1)/4} Γ(x) Γ(x − 1/2) · · · Γ(x − (q − 1)/2).

In the above definition, the notation "P > 0" means that P is positive-definite. An alternative form of f(t) is

f(t) = k[m, p, q]^{-1} |Q|^{(m−p)/2} |P|^{q/2} |Q + t^t P t|^{-m/2}.   (2.6)

Using the notation of Dickey (1967) and Press (1982), we express the matrix T-distribution by T ~ T(P, Q, 0, m) or, more generally, T + C ~ T(P, Q, C, m), where C is a constant matrix.

Lemma 2.2 When q = 1 and Q is a scalar, T(P, Q, 0, m) is equivalent to a multivariate t_p(0, (Q P^{-1})^{-1}, m − p).

Proof: By Equation (2.6),

f(t) ∝ (Q + t^t P t)^{-m/2} ∝ [1 + t^t (Q P^{-1})^{-1} t]^{-[(m−p)+p]/2},

which is the t_p(0, (Q P^{-1})^{-1}, m − p) density. ∎

By (2.5) and (2.6), it is easy to see that

T^t ~ T(Q^{-1}, P^{-1}, 0, m).   (2.7)

In Dickey (1967), a representation of a matrix T is given. The result is copied here.

Lemma 2.3 Suppose that U_{p×p} | P, m ~ W(P, m − q), X_{p×q} | Q ~ N(0, I ⊗ Q), P > 0, Q > 0, m > p + q − 1, and that X, U are independent. Let T be a random p × q matrix with T = (U^{-1/2})^t X. Then T has the distribution given in (2.5).

In the lemma, ⊗ denotes the Kronecker product. A direct application of the above representation yields the mean of a matrix T.

Corollary 2.2 If T ~ T(P, Q, 0, m), the mean of T + C is C, where C is any constant matrix.

Proof: By Lemma 2.3, E(T) = E[(U^{-1/2})^t] E(X) = 0. ∎

Below, some properties of the matrix T-distribution given in Press (1982) are listed without proof. Partition T as

T = ( T_1 ; T_2 ),

T_i being a p_i × q matrix, i = 1, 2, and conformably

P = ( P_{11}  P_{12} ; P_{21}  P_{22} ),

for a matrix P_{11} of dimensions p_1 × p_1. Let P_{2|1} = P_{22} − P_{21} P_{11}^{-1} P_{12} > 0 be positive definite.

Lemma 2.4 Suppose T ~ T(P, Q, 0, m). Then:

1. conditionally, T_1 | T_2 = t_2 ~ T(P_{11}, Q + t_2^t P_{2|1} t_2, −P_{11}^{-1} P_{12} t_2, m);
2. marginally, T_2 ~ T(P_{2|1}, Q, 0, m − p_1);
3. Θ ~ T(P, C_1^t Q C_1, 0, m), where Θ_{p×r} = T C_1 and C_1 is any q × r matrix.

Dawid (1981) replaces the notation T(P, Q, 0, m) with T(δ; P, Q).
His notation differs from that of Dickey in the choice of the "degrees of freedom" parameter; that is, m − p = δ + q − 1.

Dawid (1981) gives another representation of a matrix T-distribution: T ~ T(I_p, I_q, 0, m) can be defined as the marginal distribution of T_{p×q} with T | Σ ~ N(0, I_p ⊗ Σ) and Σ ~ W^{-1}(I_q, m − p). In general form, if T_{p×q} | Σ ~ N(0, P^{-1} ⊗ Σ) and Σ | Q ~ W^{-1}(Q, m − p), then

T ~ T(P, Q, 0, m),   (2.8)

where P, Q are invertible symmetric square matrices.

Proof:

f(t | P, Q) = ∫ f(t | P, Σ) f(Σ | Q) dΣ
∝ ∫ |Σ|^{-(m+q+1)/2} exp{ −(1/2) tr[Σ^{-1}(Q + t^t P t)] } dΣ
∝ |Q + t^t P t|^{-m/2}.

The last step is true by integrating with respect to a partial inverted-Wishart density function. ∎

Definition 2.2 Let A be a matrix with column vectors a_1, ..., a_n; define

vec(A) = ( a_1 ; ... ; a_n ).

Let tr(A) denote the trace of the matrix A, which is the sum of the diagonal elements of A. Here are some useful facts:

Lemma 2.5 When the dimensions are proper:

1. vec(A + B) = vec(A) + vec(B);
2. (B^t ⊗ A) vec(Z) = vec(AZB);
3. tr(AB) = tr(BA);
4. tr(AB) = (vec(A^t))^t vec(B);
5. (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD);
6. (A ⊗ B)^t = A^t ⊗ B^t;
7. (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1};
8. if X is a random matrix, E(vec(X)) = vec(E(X)).

The proof is straightforward and omitted.

Definition 2.3 Assume Y = (Y_{ij}) is an m × q random matrix whose elements are independently, identically distributed as N(0, 1); M : n × p, A : p × q, B : n × m are constant matrices; and

X = M + B Y A^t.

Then X follows a matrix normal distribution, i.e.
X ~ N_{n×p}(M, (BB^t) ⊗ (AA^t)).

The density function of a matrix normal X is

f(x) = (2π)^{-np/2} |W|^{-p/2} |V|^{-n/2} exp{ −(1/2) tr[W^{-1}(x − M) V^{-1} (x − M)^t] },   (2.9)

where W = BB^t > 0, V = AA^t > 0.

By Lemma 2.5,

[vec((X − M)^t)]^t (W ⊗ V)^{-1} vec((X − M)^t)
= [vec((X − M)^t)]^t vec[V^{-1}(X − M)^t W^{-1}]
= tr[(X − M) V^{-1} (X − M)^t W^{-1}]
= tr[W^{-1}(X − M) V^{-1} (X − M)^t].

Thus, the following fact is proved.

Lemma 2.6 When the dimensions are proper and W, V are invertible, symmetric matrices, the following is true:

[vec((X − M)^t)]^t (W ⊗ V)^{-1} vec((X − M)^t) = tr[W^{-1}(X − M) V^{-1}(X − M)^t].

By a direct application of Lemma 2.6 to Equation (2.9), an equivalent relation between multivariate normality and matrix normality is established.

Lemma 2.7 Let X be an n × p random matrix. Then X ~ N(M, W ⊗ V) if and only if vec(X^t) ~ N(vec(M^t), W ⊗ V).

As one can see, the covariance matrix of a matrix normal distribution is a Kronecker product of two matrices. Sometimes, for notational simplicity, the notation X_{p×q} ~ 𝒩(W, V) + M, due to Dawid (1981), replaces X_{p×q} ~ N(M_{p×q}, W_{p×p} ⊗ V_{q×q}). With this new notation, some facts about the matrix normal distribution are given without proof:

Lemma 2.8 If X_{p×q} ~ 𝒩(W, V) + M, then:

1. X^t ~ 𝒩(V, W) + M^t;
2. CXD ~ 𝒩(CWC^t, D^t V D) + CMD, where C, D are two nonrandom matrices with proper dimensions.

Next, a normal approximation of a matrix T is derived. Suppose T follows a matrix T-distribution, T_{p×q} ~ T(P, Q, 0, δ). Let ϑ = δ − q and define the scaled matrix T* = ϑ^{1/2} T. By Lemma 2.3,

T* =_d ϑ^{1/2} (U^{-1/2})^t X = [(U/ϑ)^{-1/2}]^t X,   (2.10)

where =_d means equal in distribution and U ~ W(P, ϑ). T* is a matrix analogue of the univariate t_δ. By the definition of a Wishart distribution, there are p-dimensional random vectors Y_i ~ i.i.d. N(0, P) such that

U =_d Σ_{i=1}^{ϑ} Y_i Y_i^t.   (2.11)

By the multivariate strong law of large numbers (SLLN),

vec(U^t)/ϑ = (1/ϑ) Σ_{i=1}^{ϑ} vec(Y_i Y_i^t) → E(vec(Y_1 Y_1^t)) a.s. as ϑ → ∞,

where a.s. represents convergence almost surely with respect to f_{Y_1}(·). Note that E(vec(Y_1 Y_1^t)) = vec(E(Y_1 Y_1^t)) = vec(P). Hence,

U^t/ϑ → P a.s. as ϑ → ∞ or δ → ∞,

since, when q is fixed, ϑ → ∞ is equivalent to δ → ∞.
Applying Slutsky's theorem, we have

T* = [(U/ϑ)^{-1/2}]^t X → (P^{-1/2})^t X ~ 𝒩(P^{-1}, Q) in distribution, as ϑ → ∞.

Thus the following theorem is proved.

Theorem 2.2 When p, q are fixed, as ϑ → ∞,

T* → 𝒩(P^{-1}, Q) in distribution.

Theorem 2.2 is an extension of a similar asymptotic result in the univariate case, namely that t_δ → N(0, 1) as δ → ∞. The fact has been noted in Dickey (1967).

Lemma 2.9 Let U ~ W(P, ϑ). Then

ϑ^{1/2}(U/ϑ − P) →_d 𝒩(2P, P) as ϑ → ∞,

where →_d denotes convergence in distribution.

The above lemma is applied in the proof of the following theorem, which gives a higher-order asymptotic expansion of the matrix T-distribution.

Theorem 2.3 Let T* be defined by (2.10), and let P > 0, Q > 0. Then

ϑ^{1/2}(T* − V) →_d −(1/2) U V, as ϑ → ∞,

where U ~ 𝒩(P, P^{-1}), V ~ 𝒩(P^{-1}, Q) and, given the hyperparameters P, Q, U is independent of V.

Corollary 2.3 When ϑ is large,

T* ≈_d V − (1/2) ϑ^{-1/2} U V,

where, given P, Q, U and V are independent, U ~ 𝒩(P, P^{-1}) and V ~ 𝒩(P^{-1}, Q).

2.4 Interpolation with Data Missing-by-design

In this section, the theory of Bayesian spatial interpolation with data missing-by-design is established for the multivariate case, in which there are s_u ungauged sites, s_g gauged sites and k monitored pollutants at each site.

Suppose now that the observations at the gauged sites are pooled from different air pollution monitoring networks. Since by design the gauged sites do not all monitor the same subset of pollutants, some pollutants of interest to us may not be monitored at a given site; these pollutants are missing because of the design. We therefore call such "missing" values missing-by-design. More specifically, let X_t = ((X_t^0)^t, (X_t^{(1)})^t)^t, where (X_t^0)_{s_u k × 1} represents the response vector at the ungauged sites at time t and (X_t^{(1)})_{s_g k × 1} represents the response vector at the gauged sites at time t. After a proper rearrangement of its elements, X_t^{(1)} is further partitioned into (X_t^1)_{l × 1} and (X_t^2)_{(s_g k − l) × 1}, which correspond, respectively, to the vector of missing-by-design pollutants and to the vector of observed pollutants at the gauged sites.
Since the same l pollutants are missing during the whole monitoring period and they comprise l columns of the matrix (X^{(1)})^t, they are sometimes referred to as missing columns in the sequel.

Like LZ, a normal model for the conditional sampling distribution is adopted. In terms of a matrix normal distribution, the model can be written as

X | Z, B, Σ ~ N_{sk×n}(BZ, Σ ⊗ I_n),   (2.12)

where X = (X_1, ..., X_n)_{sk×n} is the response matrix; Z = (Z_1, ..., Z_n)_{h×n} is the matrix of covariates; B = (β_{ij})_{sk×h} is the coefficient matrix; Σ is the unknown spatial covariance matrix of the X_t; and I_n is the n × n identity matrix. The conjugate priors of B, Σ are

B | B°, Σ, F ~ N_{sk×h}(B°, Σ ⊗ F^{-1})   (2.13)

and

Σ | Φ, δ* ~ W^{-1}(Φ, δ*).   (2.14)

The random field interpolation theory developed in this section has two steps. In the first step, the hyperparameters B°, F, Φ and δ* are assumed fixed and the predictive distributions are derived. In the second step, an estimation procedure for the hyperparameters is discussed.

2.4.1 Predictive Distributions and Interpolation

In this subsection, all the hyperparameters B°, F, Φ and δ* are fixed and are suppressed in the derivations. The estimation of these hyperparameters is left to the next subsection.

If the indices of the missing columns in (X^{(1)})^t are i_1, ..., i_l and the indices of the observed columns are i_{l+1}, ..., i_{s_g k}, let R_1 = (r_{i_1}, ..., r_{i_l}) and R_2 = (r_{i_{l+1}}, ..., r_{i_{s_g k}}), where r_j, j = 1, ..., s_g k, is an s_g k-dimensional vector whose j-th element is one and whose remaining elements are zero. Thus R_1 and R_2 "mark" the positions of the missing and observed columns. Now let R = (R_1, R_2). Observe that (X^1)^t = (X^{(1)})^t R_1 consists of the missing columns; similarly, (X^2)^t = (X^{(1)})^t R_2 consists of the observed columns. Because the vector X_t is partitioned into three parts, B, Σ, B° and Φ are partitioned accordingly.
For example, is partitioned as( oo Eo(1)E(i)o E()where oo and (11) are .sJc x .sk, sgk x .s9k matrices respectively; Rt(ll)R is furtherpartitioned asRt R—E12” (R(ll)Rl R(ll)R2’\(11)— 21 E22) = R(ll)Rl RE(ll)R)’26where 11 and En are 1 X 1, (sgk — 1) X (sgk — 1) matrices respectively and in general ifEt = (E,E,E)t, E(1) means (E,E)t. The partitions of B, B° have a meaninganalogous to that of E11). Further, denote/T T \ /rt D DtT Dt ( v11 W \ ( tL1’1’(ll)ILl 1L’&’(11)1U2(11) = (11)= k\l12l ‘P22) = kR(ll)Rl R(ll)R2with ‘I’rn being 1 x 1 and JJ22 being (s9k— 1) x (sgk — 1) . Let “P112— “121’22121and ‘IJ12’J! for use in the sequel. Let,/ t o\ / oritr,o_1 (1) \ _1 1— RtB° — B°\ 2 (i)J \ 2When a mean squared loss function is taken, the Bayesian interpolator is simply E(X°X2). To find the posterior mean, one needs the predictive distribution function, f(X° IX2). That predictive distribution function plays a pivotal role in our Bayesian interpolation theory. As an indirect approach, since E(X° X2) = E(E(X° X’,X2) X1),one can instead find the predictive distributions of f(X° X(’)) and f(X’ X2). Thosetwo predictive distributions are given in the following two theorems.Theorem 2.4x° = ‘-‘ T(1), + ZtF_1+ (x(1) — Bl)Z)t — B°l)Z),B0Z + O(l)(1j)(X(’) — B(°l)Z), 6* + n).From Corollary 2.2 we obtain the next corollary.Corollary 2.4E(X° = x(1)) = BZ + — Brl)Z).27Theorem 2.5X’ X2 = x2 T(’,I + ZtF_1+ (x2 —RB1)Z)t2l(x—RB(°)Z + ‘y(x2 — RB°l)Z), 6*— sk + n).Corollary 2.5E (x’ I = 2) = RBrl)Z + 7(x2 — RB°1)Z).With the above two corollaries, the Bayesian interpolator is found.Corollary 2.6E(X° X2 = x2) = B0Z + o(l)R2I1’(X — RB°l)Z).The above corollary shows that the interpolated value E(X° X2) depends only on theobserved vector, X, provided the hyperparameters are fixed. 
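Corollary 2.6's interpolator is a short linear-algebra computation once the hyperparameters are in hand. Below is a minimal sketch with invented toy dimensions and an illustrative hypercovariance (none of it from the Southern Ontario application); it only demonstrates the mechanics of the formula E(X⁰ | X²) = B°₀Z + Φ₀₍₁₎R₂Ψ₂₂⁻¹(X² − R₂ᵗB°₍₁₎Z).

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy dimensions: s_u*k = 2 response coordinates at ungauged sites,
# s_g*k = 3 at gauged sites, of which l = 1 is missing-by-design.
n, h, su_k, sg_k, l = 5, 2, 2, 3, 1

Z = rng.standard_normal((h, n))                 # covariate matrix
B0 = rng.standard_normal((su_k + sg_k, h))      # hypermean B°
B00, B0_1 = B0[:su_k], B0[su_k:]                # B°_0 and B°_(1)

# Illustrative positive-definite hypercovariance, partitioned as in the text.
M = rng.standard_normal((su_k + sg_k, su_k + sg_k))
Phi = M @ M.T + (su_k + sg_k) * np.eye(su_k + sg_k)
Phi0_1 = Phi[:su_k, su_k:]                      # Φ_{0(1)}
Psi = Phi[su_k:, su_k:]                         # Ψ_{(11)}

# R2 marks the observed columns (here coordinate 0 is missing; 1, 2 observed).
R2 = np.eye(sg_k)[:, [1, 2]]
Psi22 = R2.T @ Psi @ R2

X2 = rng.standard_normal((sg_k - l, n))         # observed gauged responses

# Corollary 2.6: the Bayesian interpolator of X° given X².
Xhat0 = B00 @ Z + Phi0_1 @ R2 @ np.linalg.solve(Psi22, X2 - R2.T @ B0_1 @ Z)
```

A quick sanity check of the formula: when the observed data agree exactly with their prior regression surface, the interpolator reduces to the prior mean B°₀Z.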
Later it will be seen that our estimator of the hyperparameters depends on all the observed values X²; the interpolated values E(X⁰ | X²) therefore do depend on all the observed values.

Although it is difficult to obtain Var(X⁰ | X²) analytically by the indirect route above, it is possible to find a closed form for it. The following theorem gives that closed form. Before the variance formula is presented, a useful fact, needed in the derivation of the variance formula, is given.

Lemma 2.10 Let A be an invertible matrix,

A = ( a  A_{12} ; A_{12}^t  A_{22} ),

where a is a scalar and A_{22} is an (n − 1) × (n − 1) matrix. Denote

A^{-1} = ( b  B_{12} ; B_{21}  B_{22} ).

Then (b − B_{12} B_{22}^{-1} B_{21})^{-1} = a.

Theorem 2.6 For t = 1, ..., n,

Var(X_t^0 | X²) = c_t/(m − 2) [ A_1 Ψ_{1|2} A_1^t + Φ_{0|(1)} ],

where

(A_1, A_2) = Φ_{0(1)} Φ_{(11)}^{-1} R = (Φ_{0(1)} Φ_{(11)}^{-1} R_1, Φ_{0(1)} Φ_{(11)}^{-1} R_2);
c_t = 1 + Z_t^t F^{-1} Z_t + (X_t^2 − B°_2 Z_t)^t Ψ_{22}^{-1} (X_t^2 − B°_2 Z_t);
m = δ* − s_u k − l + 1.

Obviously, when there are no missing columns the variance is Φ_{0|(1)} c_t/(m − 2); when there are missing columns, the variance is roughly increased by the term A_1 Ψ_{1|2} A_1^t c_t/(m − 2).

As indicated earlier, if the predictive distribution f(X⁰ | X²) is found, the Bayesian interpolator can easily be obtained. That approach is pursued here as a way of checking the formulas for the interpolator and its variance. Note that, if the hyperparameters are given, then by Model (2.12)–(2.14) and Lemma 2.20 of Section 2.6 the pair (X⁰, X²) itself follows a model of the same form: the coefficient matrix is ( B_0 ; R_2^t B_{(1)} ), the sampling covariance is obtained from Σ by deleting the l missing coordinates, and the conjugate priors have hypermean ( B°_0 ; R_2^t B°_{(1)} ), hypercovariance

( Φ_{00}  Φ_{0(1)} R_2 ; R_2^t Φ_{(1)0}  R_2^t Φ_{(11)} R_2 )

and degrees of freedom δ* − l. By following the same route as in the proof of Theorem 2.4, the predictive distribution f(X⁰ | X²) is easily obtained.

Theorem 2.7

X⁰ | X² ~ T( Φ_{0|2}, c_t, B°_0 Z_t + Φ_{0(1)} R_2 Ψ_{22}^{-1}(X_t^2 − B°_2 Z_t), m + s_u k ),

where c_t, m are defined in Theorem 2.6 and Φ_{0|2} = Φ_{00} − Φ_{0(1)} R_2 Ψ_{22}^{-1} R_2^t Φ_{(1)0}.

From the above predictive distribution, the same Bayesian interpolator as that of Corollary 2.6 is obtained.
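The two variance routes must agree: the decomposition of Theorem 2.6 and the Φ_{0|2} of Theorem 2.7 are the same matrix. This can be checked numerically; with the permutation R taken as the identity and an invented positive-definite Φ, the two expressions coincide to machine precision.

```python
import numpy as np

rng = np.random.default_rng(7)

su_k, sg_k, l = 2, 4, 1          # toy sizes; the first l gauged coordinates are missing
d = su_k + sg_k
M = rng.standard_normal((d, d))
Phi = M @ M.T + d * np.eye(d)    # illustrative positive-definite hypercovariance

Phi00 = Phi[:su_k, :su_k]
Phi01 = Phi[:su_k, su_k:]        # Φ_{0(1)}
Phi11 = Phi[su_k:, su_k:]        # Φ_{(11)}

E = np.eye(sg_k)
R1, R2 = E[:, :l], E[:, l:]      # R marks missing / observed columns (R = I here)

Psi = Phi11
Psi11, Psi12, Psi22 = Psi[:l, :l], Psi[:l, l:], Psi[l:, l:]
Psi1_2 = Psi11 - Psi12 @ np.linalg.solve(Psi22, Psi12.T)          # Ψ_{1|2}

A1 = (Phi01 @ np.linalg.inv(Phi11)) @ R1                          # A_1 of Theorem 2.6

Phi0_g1 = Phi00 - Phi01 @ np.linalg.solve(Phi11, Phi01.T)         # Φ_{0|(1)}
Phi0_2 = Phi00 - (Phi01 @ R2) @ np.linalg.solve(R2.T @ Psi @ R2, R2.T @ Phi01.T)

# Theorem 2.6 vs. Theorem 2.7: Φ_{0|2} = A_1 Ψ_{1|2} A_1^t + Φ_{0|(1)}
assert np.allclose(Phi0_2, A1 @ Psi1_2 @ A1.T + Phi0_g1)
```

The assertion is exactly the matrix identity underlying the equivalence of the two variance formulas.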
By Lemma 2.2 and Press (1982), it is easy to see that the variance is

Var(X_t^0 | X²) = c_t/(m − 2) Φ_{0|2}.   (2.15)

It is not difficult to prove that Equation (2.15) is equivalent to the variance formula of Theorem 2.6.

Lemma 2.11 If A_1 is defined as in Theorem 2.6 and Φ_{0|2} as in Theorem 2.7, the following equation is true:

Φ_{0|2} = A_1 Ψ_{1|2} A_1^t + Φ_{0|(1)}.

That verifies the variance formula of Theorem 2.6.

With the variance, only pairwise confidence intervals can be derived; with the posterior distribution of X⁰ | X², we can derive a simultaneous region. The next lemma is Theorem 3.2.12 of Muirhead (1982) and is copied here.

Lemma 2.12 If A ~ W_p(Σ, δ*), where δ* is a positive integer, δ* ≥ p, and Y is any p × 1 random vector distributed independently of A with P(Y = 0) = 0, then Y^t Σ^{-1} Y / Y^t A^{-1} Y ~ χ²_{δ*−p+1}, and it is independent of Y.

By that lemma, the distribution of the quadratic form of a matrix T is easily found.

Theorem 2.8 If T_1 ~ T(I, 1, 0, δ*) is p × 1, then

((δ* − p)/p) T_1^t T_1 ~ F_{p, δ*−p}.

Let X̂⁰ = B°_0 Z + Φ_{0(1)} R_2 Ψ_{22}^{-1}(X² − B°_2 Z), the Bayesian interpolator of X⁰ when there are missing-by-design data. Then a simultaneous region for X⁰ is given in Theorem 2.9.

Theorem 2.9 For 0 < α < 1, the 1 − α simultaneous posterior region of the Bayesian interpolator is a hyper-ellipsoid, defined by the set

{ x : (x − X̂⁰)^t Φ_{0|2}^{-1} (x − X̂⁰) < b | X² },

where

b = s_u k · c_t · F_{1−α; s_u k, δ*−l−s_u k+1} / (δ* − l − s_u k + 1)

and c_t is defined in Theorem 2.6.

2.4.2 Estimation of Hyperparameters

In the previous subsection, the hyperparameters Φ, δ*, F, B° are assumed to be known. Often they are unknown, and to complete the interpolation procedure they must be specified in some way. In a fully Bayesian hierarchical approach, another level of priors would be laid on these hyperparameters. An alternative approach, suitable when the parameters are not sensitive to their prior specification, entails the use of the empirical Bayesian method.
In the empirical Bayesian method, the hyperparameters are estimated from the observed data.

In BLZ, Φ and δ* are estimated by maximum likelihood (MLE) and B°, F^{-1} are assumed to be zero. While it is difficult to estimate all of the hyperparameters Φ, δ*, B°, F^{-1} by MLE, in this subsection we propose two unbiased estimators for B° and F^{-1} and apply the MLE method to estimate Φ, δ*. For the moment, the MLE of Φ, δ* is discussed; the two unbiased estimators of B°, F^{-1} are described at the end of this subsection.

Even with B°, F^{-1} fixed, the direct maximization of the marginal distribution f(X² | Φ, δ*) is difficult. With the help of the EM algorithm proposed by Dempster, Laird and Rubin (1977), one can instead maximize the distribution f(X², Σ_{22}, B_2 | Φ, δ*) (Chen 1979). Let the complete data set be x and assume only part of x is observed; denote that observed part by y. If the sampling density f(x | φ) = b(x) exp(φ^t t(x))/a(φ) is from a regular exponential family, the p-th iteration of the EM algorithm consists of:

(E-step) estimating the complete-data sufficient statistics t(x) by finding t^{(p)} = E(t(x) | y, φ^{(p)}); and

(M-step) determining φ^{(p+1)} as the solution of the equation E(t(x) | φ) = t^{(p)}.

It is proved in Wu (1983) that the limit point does maximize the marginal likelihood function l(φ | y).

To apply the above EM algorithm, we need to find the sufficient statistics of Φ and δ*. Following the approach of LZ, let

B̂_2 = (X²)^t Z^t (Z Z^t)^{-1}

and

S_2 = (X²)^t (I − Z^t(Z Z^t)^{-1} Z) X².

Anderson (1984, p. 291) shows that, given B_{(1)} and Σ_{(11)},

B̂_2 | B_{(1)} ~ N(R_2^t B_{(1)}, Σ_{22} ⊗ (Z Z^t)^{-1}),   (2.16)

S_2 | Σ_{(11)} ~ W_{s_g k−l}(Σ_{22}, n − h),   (2.17)

and that B̂_2, S_2 are independent.

Since

f(X², B_{(1)}, Σ_{(11)} | B°_{(1)}, Φ_{(11)}, F^{-1}, δ*)
= f(X² | B_{(1)}, Σ_{(11)}) f(Σ_{(11)} | Φ_{(11)}, δ*) f(B_{(1)} | B°_{(1)}, Σ_{(11)}, F^{-1})
∝ f(B̂_2 | B_{(1)}, Σ_{(11)}) f(S_2 | Σ_{(11)}) f(Σ_{(11)} | Φ_{(11)}, δ*) f(B_{(1)} | B°_{(1)}, Σ_{(11)}, F^{-1}),

where the last step is true because of Equations (2.16) and (2.17).

When B° and F^{-1} are assumed to be known, the above likelihood function of Φ_{(11)} and δ* is proportional to f(Σ_{(11)} | Φ_{(11)}, δ*).
Because, by Lemma 2.20 of Section 2.6,6*— sk).Therefore, the likelihood function is proportional to,f(X2,B(,), i) (ii), 6*) oc f(Eii) (H), 6*)(—sk) 6+sgk+1—sukCC CJ1 2 2exp [_tr((l,)E))] (2.18)where=2sgk(6*suk)/2(8gk1)sgk/4 — sk — i + 1)33With the likelihood Equation (2.18), the pair log ) is readily identified asthe sufficient statistics of 6 (Chen 1979).To reduce number of parameters that need to be estimated, BLZ adopt a Kroneckerstructure, = A®, where A the between-sites-hypercovariance matrix and betweenpollutants-hypercovariance matrix. Since (11) = Ag 0 , where Ag is the hypercovariance matrix between the gauged sites, by the likelihood function (2.18), only Ag andcould be estimated. Hence, the estimation of c1 takes two steps. In Step one, Ag, and8* are estimated by MLE through an EM algorithm. In Step two, the SG method isapplied to extend Ag to A.Expectation StepAt E-step, the posterior expectations of and log I are of interest. Thenext lemma, due to Chen (1979, Theorem 2.1, p237), is used in the derivation of theexpectations of these sufficient statistics. The lemma is copied without proof.Lemma 2.13 Given hyperparameters , 6, B° and F,x(1)) = ( 8*1) _6*)\\_6?7t dwhere,,T71 ‘4’O(1)P(fl),A=ZZt,0(1) = x(l)Zt,B(1) =S(i) = x’ (I — Z(ZZt)1Z) ()((1) ),(11) = (11) + S(i) + (E — Brl))(A’ + F’) (E(i) — Btl))t,34d = (6* + n — sk — h)’j) + 6?74?7 +Note that ‘J/ = R(ll)R.Lemma 2.14 If W W(A’,n), then given A, n,E(log W ) = n * A1andwhere ‘P(.) represents a digamma function. 
A digamma function is defined to be thederivative of a log-gamma function.With the above lemma, the following lemma is true.The expectation of is,Theorem 2.10E E’ x2— B ( (6* — sk)’IJ (6* — sk)I,* “ Rt(11)— (8 — suk)(*)t d1 )where= I — Zt(ZZt)_1,G2 = MZt(Z)’— B,*_ T T—l— ‘“1222S = MG12+G2((ZZt)’+F1)’G,= ‘22 + 5,d1 = (6*— sk + fl — 1 — h)’ + (6* — suk)(,l*)tI17 + lT!.35The expectation of log I is,Lemma 2.15E(iog (11) I X2, ,) = —sgkiog(2) — — — + 1_Sgkl— k—i—h— .+12) + log 1I2 +log 22 I.Maximization StepWhen the expectations of sufficient statistics are found, at M-step, the current valuesof the hyperparameters are updated with the values that maximize the “likelihood”function. Here, the “likelihood” function is obtained by plugging the expectations of thesufficient statistics into Equation (2.18). However maximizing the “likelihood” functionover = Ag 0 is not easy. When there is more than one parameter involved in themaximization step, the following lemma leads to an iterated procedure. An advantageof the iterated procedure is that at each iterated step, only the maximization of afunction over a single-parameter is required. That is generally easier than maximizingover several parameters simultaneously.Lemma 2.16 If a function g(x, y) is upper bounded, the iteration procedure describedbelow leads to the pair (xo, yo) that maximizes g(,) locally. At the pt iteration, updatethe current value with a value that maximizes the function g(x, y(”) y(P)) and denotethe updated x value as xl); update the current value (°) with a value that maximizesthe function g(x(’),y x(’)) and denote the updated value y as (‘). (xo,yo) is thelimit of (x(’), y(P)).Proof: g(x,y) is bounded up andg(x,y) g(x’’,y)36Besides the above lemma, another lemma is used in the maximizing procedure. Thelemma is Lemma 3.2.2 of Anderson (1984 p62). 
It is copied for convenience.Lemma 2.17 If > 0, the maximum off(G) — —flog G —tr(G1D),with respect to positive definite matrices G exists, occurs at G = (1/n)D, and has thevaluef[(1/n)D] —_p*n*log(n)—n*log D —p*n.With Lemma 2.16, an iterated procedure is adopted to find F(1i) = A9 0 Q, 6* thatmaximize Equation (2.18). Since Rt(ll)R = I(ii) and R is orthogonal, maximizingEquation (2.18) over is equivalent to maximizing the same equation overWhen 5 is fixed, the log-likelihood of Equation (2.18) is proportional toL(Ag, c) = (S — sk)log —tr ((11)))= (6*— sk)klog A’ _(6* — sk)s9 —tr [(A9 0Again, an iterated procedure is applied to maximize L(Ag, ft). When Q is fixed, rewritetr (411)) as tr(A9Q). By Lemma 2.17, Ag = k(6* — sk)Q’ maximizes L(A9j).Similarly, when Ag is fixed, = s(6* — sk)G, where tr((11) = tr(QG),maximizes L(Ag, 1).When 4() is fixed, 6* is updated by maximizing Equation (2.18) over 6*. By takingthe first derivative of Equation (2.18) with respect to 6*, it becomessgk 6* k 1—s9klog(2) — log E(11) I +log I — 2 = 0. (2.19)37When A9, are fixed, finding t that maximizes Equation (2.18) is equivalent to findinga solution in Equation (2.19). For showing the existence of a solution of Equation (2.19),replace log j with E(log Ei) II M,(ll),6*) in Lemma 2.15 to Equation (2.19)and use the relationship log J) log R’P(ll)Rt log ‘P112 +log ‘P22 ,Equation (2.19) becomes,___________________6—sk—z+1i=1 1=1+1—log ‘P221 +log I ‘P22=0or equivalently—sk— h—1)‘P(6 — sk— i + 1)]i=l+1=log22— og’P.Since the gamma function is convex, the digamma function is monotonically increasing.Thus, the left side of the above equation is always positive whenever n — h > 0 andit goes to zero when t increases to infinity. The right side of the equation is positive.Therefore, a unique solution does exist.2.4.3 EM AlgorithmBy summarizing the above discussion, our EM algorithm becomes:E-STEP. 
Given the current values of ‘P(11), 6*,E ‘ Mt— R ( (6* — k)’P (6* — sk)’P,* Rt(11) 2) —_(6*— sk)(*)t’P d1 )whered1 = (6*— sk + fl — 1— h) + (6* — sUk)(7l*)tW,7+ lP38andE(log II M(,,),6*) = —s9klog(2) — 6 — sk— i +6 +n—sk—l—h _i+ ) +log 1I2 +log I I;M-STEP: Given the current values of and log , find the MLE of = A9 0 l,6* by repeating the following steps until it converges.Step 1:Given the current A) and Q(’), represent tr[(A ® as tr(1(P)G) and set= s(6*— sk)G;Step 2.Given current and P+1), represent tr[(AO’1))j)] as ir(A’)Q) and set= k(6*— sk)Q1.Step 3.Given the current estimate 6* by solving the following equation:{(6*+n_suk_h_i+1)(6*_suk_i+1)]j=1+1= log I 22 —log 22 I. (2.20)Estimation of B° and F1.The estimation of the hyperparameters B° and F are based on two unbiased estimators. Suppose that the k air pollutants are labeled from 1 to k. The measurementsof the same air pollutant at different sites are usually similar but the measurementsamong different pollutants are not similar. Based on this fact, we assume that B° hasan exchangeability structure, that is, the hypermean coefficients of th air pollutant atall gauged sites are the same and equal to u, i = 1,. . . , k.39Since,X2 I BZ + N(O, a Ia),then the following lemma is true.Lemma 2.18 Given Be,) and E(), B2 and S2 are independent.Proof:f(E2,S2 Br,), (ii))— J f(E, 52 I B(1), Br,), (n))f(B(,) Br,), E(,,)) dB(,)= J f(E B(i), (,,))f(B(,) Br,), E(,l))f(S2 I 11) dB(,)= f(E2 B(°,), (,,f(S2 (11)). IThe next theorem gives unbiased estimators of B0 and F1. Let12=isgk—Lfs9k—lwherej E {1,.. . ,k}, v 1,.. .,sgkl. Obviously (/3jt = (ZZt)_lZtX andj marksthe pollutant type of X.Let %i be the means of all J3 with v = 1,... , k. Therefore, under the exchangeability assumption, is the estimator of uz. Let,= n — h— 2 S9k1 (v — %iv)t($iv —— (ZZt)’— v=1 2awhere /3 — = a E2.Theorem 2.11 Given X2 and the exchangeability of B°, , i = 1,.. . 
, k and F’ areunbiased estimators of u, i = 1,..., k and F—’ respectively.2.5 Interpolation with Randomly Missing DataWe refer the term randomly missing to the same meaning as missing at random definedmathematically by Rubin (1976), Little and Rubin (1987). More precisely, let Y denote40the data that would occur in the absence of missing values. Further, we assume Y =(Yobs, Ymis), where Y0 represents the observed values and Ymis, missing values. Assumef(Y 0) denotes the joint probability density of Yobs, Ymis. We define for each componentof Y a missing data indicator, taking value 1 if the component is observed and 0 ifmissing. For example, if Y (Y1), an (n x K) matrix of n observations measured forK variables, define the response indicator R = (R), such that— f 1 if Yij observed,1Itj —0 if Yij missing.When we treat R as a random variable and specify the joint distribution of R and Y,the conditional distribution R given Y indexed by an unknown parameter i/ is calledthe distribution for the missing data mechanism.Rubin (1976) defines the missing data to be missing at random when the distributionof the missing data mechanism does not depend on the missing value Ymis, that is, iff(R I Yobs,Ymis,i/)) = f(R Y0b3,i,b).Little and Rubin (1987) point out that when data are missing at random, likelihoodbased inferences can ignore the missing data mechanism. In other words, likelihoodbased inferences can only base on the marginal distribution, f(Yob5 0).Let M1 represent the randomly missing subset of data from x’ and M2 the observedsubset of data from Let 1* denote the total number of elements in M1. Again, thepredictive distribution f(X° M2) leads to a Bayesian interpolator. Given the hyperparameters, when there are data missing-by-design or no missing data, the predictivedistribution follows a matrix T-distribution or a multivariate t-distribution. With randomly missing data, will the predictive distribution f(X° M2) still follow some formof t-distribution? 
The following simple example gives a negative answer.41Consider a simple case, where there are two gauged sites and n = 2. Let T denote therandom matrix at the gauged sites. That is,(X1 y’\T22 Y1,Y2 = I I\Y2 X21where the upper case characters represent the missing values and lower case charactersrepresent the observed values. Assume,T-‘.A[(I2x,), E W (12x2,6+2).Then,II+TtT=X+y+1 y1X+y2y1+y2 X+y+1= (X+ 1)X+ (—2yiy21)X+ + yy+1+ X= (aix2+)a4;2where a1 = (X? + 1), a2 = —y1y2X,a3 = y + y + y?y + 1 + X? and a4 = a3 —Note,a4=y+y+y+X+i-Thus-±II+TtT_=[(aiX2+)+4]2sincej°° f(Xj,x2,yi,y)=1 dx2J—oo f(y1,y2)Lf(xi,x2,yi,y)dxL f(T)dx2.42Use (2.5) to substitute for f(T), and obtainf(X1 IY’ =yl,Yy)LlI+TtT2 dx2-±/ 2±i fX’ (aix2 + 2)2 a1xa4 I 1+ dx2a4-±oo t2 2cxa4 2 a’J 1+— dta4-±X a4 2 a12 [i+x] 2The last step is completed by integrating with respect to a partial t density.Consider the special case, Ui = Y2 = 1. Thenf(X1= ‘,Y2 = 1) (X +2)(1 +X),f(X1 I 1, 1) is clearly not a t-distribution. Therefore f(X1,X2 y, y2) does not follow amultivariate t-distribution. If it did, its marginal f(X1 I y’, y2) would follow a univariatet-distribution.Since the search for an exact predictive distribution proves difficult, an alternative approach is to derive an approximate predictive distribution. Note that given hyperparameters, f(X°,M1,M2)follows a matrix T-distribution. By Theorem 2.2, f(X°,X(’)) isapproximately normal and then by Lemma 2.7, f(vec(X°), vec(Mi), vec(M2))is approximately multivariate normal. General normal theory implies that f(vec(X°) vec(M2)) isapproximately normal. 
Theorem 2.3 shows that the order of approximation is O((6*)_).The theorem below states an approximate predictive distribution of f(vec(X°) I vec(M2)).LetR* — (Iskxs0k 0- 0where R = (R112)and Rvec((X(l))t) = vec(Mi), Rvec((X(l))t) = vec(M2).43Theorem 2.12 f(vec((X°)t) vec(M2)) is approximately normal with meanvec((B0°Z)t)-f- ExoM2(M) (vec(M2) — Rvec((B(°l)Z)t))and covariance— X0M2(EM)x,whereExx = [(ZtF’ + I) 0 ]/(* + n);XOX0 = (‘sujcxs 0) Zxx o)(0(39k_1*)x(S_1j’\EXOM2 = I 0) Exx jJM2 = (0 R)Exx(2)If the loss function is “mean squared error”, the approximately Bayesian interpolator issimply the mean of the above approximate predictive distribution.Following the same approach of the previous section, an empirical Bayesian method isobtained. In other words, all the hyperparameters are estimated using the observations.Since no unbiased estimators of B°, F—’ has yet been found, the estimation of thesehyperparameters will not be discussed here. In applications, they will be specified asplausible values. Hence, in the following discussion, these two hyperparameters areassumed to be known. Again, the estimation of hyperparameters requires two steps. Instep one, an EM algorithm is applied to estimate 6, = A9 0 ft In step two, theSG method is applied to extend A9 so as to obtain A.In the EM algorithm, the likelihood function is f(X(’) 6). By the same argumentas that of the proof of Theorem 2.12,f(vec((X)t) I ,(ii)) + vec((B,)Z)t).44So by Equation (2.9), f(X(’) 11), 6) is approximately proportional toAg _bk1 ç (6*— sk +whereA a*(X — B(°l)Z)(ZtF’ + I)’(X1— B1)Z)tand a* = 6*— .sk + n.For the E-step, E(M1 I M2) is given by the following lemma.Lemma 2.19 Given hyperparameters, f(vec(Mi) vec(M2), 6) is approximatelynormal with meanf4vec((Brl)Z)t)+ M12M2(vec(M2)— Rvec((B(°l)Z)t)and covarianceEM1M—EM1M2,where= [(ii) a (ZtF’Z + I)]/(6* — sk + n);= REx(l)x(l)Rj,i,j = 1,2.The M-step is similar to that in the previous section. 
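Both the approximate predictive distribution of Theorem 2.12 and the approximate E-step of Lemma 2.19 rest on the usual conditioning formula for a multivariate normal vector. A small self-contained sketch (the function and variable names are ours, not the thesis's):

```python
import numpy as np

def conditional_normal(mu, Sigma, obs_idx, y_obs):
    # For a N(mu, Sigma) vector, return the indices of the unobserved
    # coordinates together with their conditional mean and covariance
    # given the observed coordinates:
    #   mean = mu_m + S_mo S_oo^{-1} (y_obs - mu_o)
    #   cov  = S_mm - S_mo S_oo^{-1} S_om
    mu = np.asarray(mu, float)
    d = len(mu)
    mis_idx = [i for i in range(d) if i not in set(obs_idx)]
    S_mm = Sigma[np.ix_(mis_idx, mis_idx)]
    S_mo = Sigma[np.ix_(mis_idx, obs_idx)]
    S_oo = Sigma[np.ix_(obs_idx, obs_idx)]
    K = np.linalg.solve(S_oo, S_mo.T).T          # S_mo @ S_oo^{-1}
    cond_mean = mu[mis_idx] + K @ (np.asarray(y_obs, float) - mu[obs_idx])
    cond_cov = S_mm - K @ S_mo.T
    return mis_idx, cond_mean, cond_cov
```

In the notation of Lemma 2.19, vec(M₁) plays the role of the unobserved coordinates and vec(M₂) the observed ones; imputing E(M₁ | M₂) is one call to this function with the approximating covariance Σ_{X⁽¹⁾X⁽¹⁾}.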
Its discussion will not be repeated here.

The EM algorithm is summarized below.

E-STEP: Given the current values of δ*, Ψ(11),

E(vec(M1) | vec(M2)) = R1^t vec((B0(1) Z)^t) + Σ_{M1M2} Σ_{M2M2}^{-1} (vec(M2) − R2^t vec((B0(1) Z)^t)).

M-STEP: Given the current value of vec(M1), update Ψ(11) = Λ_g ⊗ Ω and δ* by repeating the following steps until convergence.

Step 1: Given the current (Λ_g^{-1})^(p) and (Ω^{-1})^(p), represent tr[((Λ_g^{-1})^(p) ⊗ (Ω^{-1})^(p)) A] as tr((Ω^{-1})^(p) G) and set (Ω^{-1})^(p+1) = n s_g G^{-1}.

Step 2: Given the current (Λ_g^{-1})^(p) and (Ω^{-1})^(p+1), represent tr[((Λ_g^{-1})^(p) ⊗ (Ω^{-1})^(p+1)) A] as tr((Λ_g^{-1})^(p) Q) and set (Λ_g^{-1})^(p+1) = n k Q^{-1}.

Step 3: Given the current Ψ(11) = Λ_g ⊗ Ω, update a* by

a* = s_g k n / tr[Ψ(11)^{-1} (X^(1) − B0(1) Z)(Z^t F^{-1} Z + I)^{-1} (X^(1) − B0(1) Z)^t].

The estimated δ* is then max{sk, a* + sk − n}.

2.6 Interpolation with Monotone Missing Data Patterns

Let X be generally partitioned as ((X_0)^t, (X_1)^t, ..., (X_r)^t)^t and (X_i)^t = ((X_{i1})^t, ..., (X_{i l(i)})^t), i = 0, ..., r. Σ is partitioned correspondingly as

Σ_{[i][i]} = ( Σ_{ii}       Σ_{i(i+1)}        )
             ( Σ_{(i+1)i}   Σ_{([i+1][i+1])}  ),   i = 0, 1, ..., r − 1.

In the above, Σ_{ii} and Σ_{([i+1][i+1])} are the covariance matrices of X_i and (X_{i+1}, ..., X_r), i = 0, ..., r, respectively. Note that Σ_{[0][0]} = Σ. A reparameterization is recursively taken as

{P_i, Σ_{([i+1][i+1])}},   i = 0, ..., r − 1,

where P_i = {Σ_{i|(i+1)}, τ_{i(i+1)}} and

τ_{i(i+1)} = Σ_{i(i+1)} Σ_{([i+1][i+1])}^{-1},   Σ_{i|(i+1)} = Σ_{ii} − τ_{i(i+1)} Σ_{([i+1][i+1])} τ_{i(i+1)}^t,   i = 0, ..., r − 1.

B is partitioned as B^t = (B_0^t, B_1^t, ..., B_r^t) accordingly.

The lemma below gives the intuition behind the reparameterization. In the lemma, Ψ is partitioned in the same fashion as Σ.

Lemma 2.20 If Σ has the prior distribution of (2.14), the following are true:

1. P_i is independent of P_j, j = 0, 1, ..., i − 1.

2. Σ_{([i+1][i+1])} ~ W^{-1}(Ψ_{([i+1][i+1])}, δ* − sk + l(i+1)), where l(i+1) is the dimension of (X_{i+1}, ..., X_r).

3. Σ_{i|(i+1)} ~ W^{-1}(Ψ_{i|(i+1)}, δ* − sk + l(i)), where l(i) is the dimension of X_i and Ψ_{i|(i+1)} = Ψ_{ii} − Ψ_{i(i+1)} Ψ_{([i+1][i+1])}^{-1} Ψ_{i(i+1)}^t.

4. τ_{i(i+1)} | Σ_{i|(i+1)} ~ N(Ψ_{i(i+1)} Ψ_{([i+1][i+1])}^{-1}, Σ_{i|(i+1)} ⊗ Ψ_{([i+1][i+1])}^{-1}).

This result is called the Generalized Inverted Wishart distribution by Brown, Le and Zidek (1994b) (see Caselton, Kan and Zidek (1990) and LZ for its earlier forms).

Suppose (X_i^(1))^t = (U_i^t, G_i^t), i = 1, ..., n, where U_i^t : 1 × d_i is missing, G_i^t : 1 × (s_g k − d_i) is observed and X^(1) is the matrix of response vectors at the gauged sites. When d_1 ≤ d_2 ≤ ... ≤
d_n, the pattern of the missing data {U_i}, i = 1, ..., n, resembles an upside-down staircase. An example of such missing data occurs when at gauged Site 1 the pollutants are measured up to time T1, at gauged Site 2 up to T2, ..., at gauged Site n up to T_n, and when T1 < T2 < ... < T_n; the missing data of [X^(1)]^t then have the shape of a staircase. Such a missing pattern is called "monotone" in Little and Rubin (1987).

The following theorem gives a recursive predictive distribution.

Theorem 2.13 Suppose X_t, t = 1, ..., n, follow the model of (2.12) and B, Σ follow the priors of (2.13) and (2.14). Then, given all the hyperparameters,

U_j | {G_i, i = j, ..., n}, D_{j−1} ~ T(Ψ̂_{1|(2)}^j, a_j, ã_j, δ* − sk + j),   j = 1, ..., n,

where:

D_{j−1} = {X_i^(1)}, i = 1, ..., j − 1;

ã_j = (B^(1))* Z_j − τ̂1(2) (B^(2))* Z_j + τ̂1(2) G_j;

a_j = 1 + (G_j − (B^(2))* Z_j)^t (Ψ̂_(22)^j)^{-1} (G_j − (B^(2))* Z_j) + Z_j^t F* Z_j;

(B(1))* = B0(1) + (B̂(1) − B0(1)) E^t, partitioned as (B^(1))* : d_j × h and (B^(2))* : (s_g k − d_j) × h;

τ̂1(2) = Ψ̂_{1(2)}^j (Ψ̂_(22)^j)^{-1};

B̂(1) = C A^{-1};   E = F^{-1}(A^{-1} + F^{-1})^{-1};   F* = (I − E) F^{-1};

Ψ̂ = Ψ(11) + S + (B̂(1) − B0(1))^t (A^{-1} + F^{-1}) (B̂(1) − B0(1)),

partitioned with Ψ̂_(11)^j : d_j × d_j and Ψ̂_(22)^j : (s_g k − d_j) × (s_g k − d_j).

For the case of a monotone missing data pattern, finding an EM algorithm estimation procedure for the hyperparameters is difficult. As a makeshift measure, we can treat it as a randomly missing data problem; the EM algorithm proposed in the previous section may then be applied.

2.7 Proofs of Theorems, Corollaries and Lemmas

Proof of Lemma 2.9: By (2.11), U = Σ_{i=1}^{δ*} Y_i Y_i^t with Y_i i.i.d. N(0, P). Proposition 8.3 (ii) (cf. Eaton 1983, p. 305) implies Cov(vec(YY^t)) = Cov(YY^t) = 2P ⊗ P.
With the multivariatecentral limit theorem,(vec(U)vec(P))(2P, F).Since vec(Ut) and Ut are only notationally different and a matrix normal distributionis distributionally equivalent to a multivariate normal distribution, by the definition ofconvergence in distribution (Billingsley 1968), it is easy to see that- P)--A1(2P, F).IProof of Theorem 2.3Since- FX) = (T* - F x) + (() _i) p-xand1v1(T*_P ()x)(x=-(ç)-) (ç) ()= (-(1+ ()-) ‘ () ()By Lemma 2.9:as49By LLN and Slutsky Theorem, as —* o1 —1(1+()r4)—* a.s.;-—* P 2 a.s.;—+1.So by Slusky theorem(T*- F- ()x) Ar(2P,P)F’F4Ar(I,Q)or(T*- () - x) F’)A1(F1,Q).Given F, Q, the two normal distributions are independent, since U and X are independent.Further, since6 )[()r+i]-—+ 0 as —* co.The above is true, since = 6— q and q is fixed.Reapply Slusky theorem. The conclusion is followed. IProof of Theorem 2.4Given B°, F, , 6*, (2.12), (2.13) implyandBI > B°+N(0,®F’).50In turn,X EB0Z+N(O,®P)where P = ZtF’Z + I. SoXt Zt(B0)+N(O,P®).Equation (2.14) and (2.8) imply,Xt — Zt(B’) T(P1,, 0, + n).By Bartlett’s decomposition, = where= (IA= (oI(1)0\0 I) ‘ \ 0 (11)and = o(1)i). Now apply Lemma 2.4 part 3 to get(Xt —Zt(B0))(_l ‘-‘ T(P_l,A, 0, 6* + n).Then by (2.7)( )x=(t1 t2)T((rj1) )(BO)where X° = t1 and X’ = t2. By Lemma 2.4tl — — B0°Z +?7B(°1)Z I t2 -‘ T(1),P + (t2 — Bl)Z)j)(t2— B1)Z),0,6* +n).The theorem is proved by reapplying the same lemma. IProof of Theorem 2.5By (2.12)-(2.14) and Lemma 2.20(X(’))t I Zt(B1))+ N(0, P ® W (ii), 6* — k)51where P = ZtF’Z + Ij)(fl. Thus(X1)t— (Bl)Z)t ‘-- T(P’, (11), 0, — sk + n).Apply Lemma 2.4 part 3:(X’)tR— (B(01)Z)tR T(P’,Rt(ll)R, 0,— .sk + n).The remaining is the same as in the proof of Theorem 2.4. 
IProof of Corollary 2.6E(X° X2) = E(E(X° X2)= BZ + O(l)1)(E XU I X2) — B’l)Z)= B0Z + o(1)(i) [R1E(x1 I X2) + R2X — Brl)Z].The last equation is true since I = RRt R1+R2,X’ = RX(1)and X2 =By Corollary 2.5E(X’ X2) = B0Z + o(l)jj){Rl (RB(°l)Z + y(X2— RBrl)Z))+ R2X — Bl)Z}= BZ + — R1’yR — I)Bl)Z+ (R2 +R1-y)X2}= BZ +—R2)Brl)Z+ (R2 +R1)X2}= BZ + + R2)(X — RBl)Z)= BZ + O(l)R222(X— RBl)Z).The third last step is true because RRt = R1 + R2 = I and the last step is truesinceRt(ll)R= 1’(11)52impliesRt(ll)R2= 1’22)So/‘12= RI122)= R1112 + R2’I’2.That isR — + R2) =1)(Rl7+ R2).I2 22 —Proof of Lemma 2.10By using Bartlett decomposition, it is easy to see thatb = (a —A12_lAt ‘—1.22 12?B12 = —A12Ab;B22 = A’ + A’A2b A’12 22For notational simplicity, let d = = a —A12A2which is a scalar. Thenb — B blBt = d’ —A12’d’(A’ +12 22 12= d2[d— A12(A2 +A2d12)’AJ= d’[l—A12(A2d+A212)’At1 (2.21)12iFor showing the lemma, note thatA12A2is a scalar andA12 A12—aAi(AiA)+a’Ai2(AA)= A12a’[a —A122]+aA12( iA’At“22 12)= A12a’d + a’(AiA)Ai= a’AiA(A22+A212).Equivalently,A12(A2d+A22)1= a’A12A.53Then,1 —A12(A2d+A212)’A = 1 — a’A12A’2= a’d.Combined with (2.21), the lemma is proved. IProof of Theorem 2.6Given all the hyperparameters,Var(X° X2) = Var(E(Xt° X(1)) X2)+E[Var(Xt° x(1)) X2].If t = n, by (2.7), Lemma 2.4, Theorem 2.4 and Lemma 2.10, it is obvious thatx° I x(’) T(’I (l),gt,BO0Zt+ — B(Ol)Z),6* + 1), (2.22)wheregi = 1 + ZF’Z + (X’ — Brl)Zt)t j)(Xt — B(°l)Zt).For t€{1,. . . ,n — 1}, let C be the orthogonal matrix such thatBy Lemma 2.4 (III) and Theorem 2.4,x°c I x(’) T(1),I. + CtZtF_1ZC— Bl)ZC)tj)(X(l)C— Bl)ZC),B0ZC + o(l)’)(X(’)C — B(°l)ZC), + n).Note that the last column of ZC is Z, the last diagonal element of CtZtF_1ZC isZF’Z and the last column of X’C is So it shows that (2.22) is true for any t.By Lemma 2.2 and Press (1982 p128),E(X I X(’)) = BZ + (I)o(l)(ll)(X — Bl)Zt)54and= Var(o(l) )(Xt — Bl)Zt) X2)= AiVar(X’ X2)AA iT At— J1lCtVlI2f1im—2of Lemma 2.2 and Press (1982 p128). 
To findwithE — Brl)Zt)tJj)(Xl)— Bl)Zt) x]= E [(RtX1) — — RtBl)Zt) I x]-E0- X-BZ I 0((Xe’ —B’Zt)—r(X? — BZ) 2X—BZ )= E{[(X— BZ) — r(X— BZ)]t[(X— B’Z)—r(X— BZ)] I X} + (X — BZ)t(X— BZ).55Var(X I = g oI(1).m+l—2Further,— B(°l)Zt) =0(1)(J!j)RRt(XU— B(°l)Zt)IX’—B’Z= (A1,2)— BZ)= A1(X’ — BZ) + A2(X — BZ).By Theorem 2.5 and a similar argument as the case of X°X IX —BZt),m+l).So(2.23)Var(E(X° I X2)The last equation is true, becauseE[Var(X° X}, let us startRt)R= (-(I -r( 0 (I -r‘\0 I) \ 0 ,1\oSoLet Y = i1’[(X’ — B10Z) — r(X — B20Z)]. By (2.23) and Lemma 2.4,Y X T(I,ct,O,m+ 1).Lemma 2.2 and Equation 2.4 imply___-f(Y=y I X) = C?(Ct + yt) 27r2F()ThenE(YtY)=E(YtY+ct)_ct— (m —2 + l)ct p (m_2+1) m—2-H dy— Ct— m — 2 o p (!!iza)C(Ct + ,, y) 2m-2+l= Ct—Ct.m—2That implies— Bl)Zt)tj)(Xl)— Bl)Zt)m—2+l= Ct — Ct + (X — — B°Z).m—2And1 m—2+lE[Var(X I X’) XJ= m + 1—2[Ct +m — 2 Ct Ct]O1(l)Ct= oI(1).m—2Finally,Var(Xt X) = A1c’I’12 +m—2 m—2Ct= [A1T,12 + oI(1)].Im—2Proof of Lemma 2.11By Rt(ll)R=‘IJ, the following are true:(11)R2= R (12 = R112 + R222, (2.24)122)= RI’11 +R2’I’1 +R1’J!2 +R212. (2.25)56By (2.24) and (2.25)2 22 2 (11) (R1’T’2 +R22)I’1(R’T1+R2’12)t(11)R ‘P’Rt= 112’I’I’21+ —R1’P1=—Hence,RI!112= (11) — (ll)R2PR(ll)which impliest-1 —(ll)RlIhlI2Rl (11) — (I)(fl) —R221.So= O(1)(i1j o(1) —orA1I’12= —[o—+ —= oI(1) + OI2•It proves thatA1W12 + oI(1) = OI2’Proof of Theorem 2.8:By Lemma 2.3, there exists U r’.I W(I 5*— 1) and N(O, I> 0 1) such thatpxp,T = (U_)tY. By Lemma 2.12 ‘‘ is independent of YtY x• Therefore,, YtU-1Y— p)TtT = Yt/pF ,..IYty IytU_1y/(*_p)Proof of Theorem 2.9:By Equation (2.7), Lemma 2.4 and Theorem 2.7,I2(XO —012 / T(I,1,O,6*_l+1).Ct257Then by Theorem 2.8,(6*— 1— sk + 1)(X — 0t—l(xO — O)t/ 012F5k, 6—l—sk+1.sk * ctSo,1 — a = ——< b).I012\ tProof of Lemma 2.13By Bartlett’s decomposition, = TAT, where(°I(1) 0, T=Ii \0 E(11)j o )and r= 0(1)(1’1). 
Thus=/ I 0\ 1>:0:1) 0 /1 —r\=i) )— / (1)—E(11) +Tt1)T)Given , 6”, Lemma 2.1 implies1 6*)I x(’) WSk((l),I x(’) WS9k(!j), 6* + n — sk — h)and(1)’ N(3k)89(,E01(1)0So= 6oI(1)X) = E(1)E(r 0I(1)) x(’))= I x(’))58and+Tt,)T x(’)) =(6* + n — sk — + E(E(Tt (1)T OI(1)) x(’)).Note that,TEl)T = (E)T)tE r. Let Y )T_1. Obviously by definitionof a Wishart distribution,Y E01,x1 N(sk)(s9k)(O, ‘skxsk 0Then,ytyoI(1), x’ - WS9k(j), .sk).ThusE(E(ll) +TtE,)T xc’)) = (6* + n — sk —+ sUk’j) x(’))= (6* + n — sk — h)j) + 6q(1)77 + sUkF(ll).IProof of Theorem 2.10Note that since,RtX’ Z, B, E RtB(l)Z + N(O,RtZ(,i)R 0 Ia),RtB(,) Br,), , F RtBrl) + N(o, Rt(,,)R ®RtE(ii)R (11),6* W(R(ll)R,6*— sk)andx2, 6*) = RE((RtE(,,)R)’ X2, 6*)1?t.The theorem follows from Lemma 2.13. IProof of Lemma 2.1559SinceE(log II X2, 6) = E(log I RtE(ll)R I X2, ,6)= E(log( 1I2 E22 I) X2, 6*)Note thatE112 I X2, (11), 6* W1’(’P12,6* — sk)and thatX2, (11), 6* r, W_1(226* + n — sk — 1— h).Now apply Lemma 2.14 to finish the proof. IProof of Theorem 2.11In the following proof, we assume F1 fixed. By (2.16),E(3 I1))f, v1,...,sgkl.SoE(/IBl))= i=1,...,k.To prove the other half of the Theorem, by (2.17) and Theorem 3.2.8 of Muirhead 1982(p93),at S2aJv Jv 2L “JVta3 22aimplyingE(a3S2j =x2,2)’ = 1 (2.26)aE22 n — h —2and—jt(— i) X2 = x2,B1), 22] = Var(Ba X2 = x2,B°1), 22)•By (2.16) and the fact that aE22j is a scalar,= x2,B, 22 “ N([B1)]tR2,((ZZt)’ +F1)a22a).60Hence,Var(Baj I = x2 ,B(°,), E22) =(a22a)(F’ + (ZZt)’). (2.27)When X2 x2, B(,), 22 are fixed, S2 and B2 are independent. If we replace B(,) withB(°,), they are still independent (Lemma 2.18). Therefore, given X2 = x2, B(°,) and Z22,by Lemma 2.18,s9k—1 a S2aE(fr_1) = — h —2 Esgk — 1=,(aE22aj)}’ [($ — v)t(u — v)j— (ZZt)11 s9k—lsgk — 1(F-’ + (ZZt)’) — (ZZt)’=Proof of Theorem 2.12By (2.7), (2.8) and (2.12)-(2.14), given the hyperparameters , 6’, B°, F,X T(’,ZtF’Z + B°Z, 6* + ii).By Theorem 2.2,1 1 approx, _1X — B°Z = a(a’(X — B°Z)) ‘-‘-‘ a ZtF’Z + I),where a = 6* + n. 
Then by Lemma 2.7,approx.—vec(X) a4Jf(ZtF1Z + I, ) + vec((B°Z)t).So,vec[(X°)t}\(R*)tvec(X)= ( vec(M,) J approx.v ec(M2)/ vec[(BZ)t]Rvec[(Brl)Z)t] + (R*)ta_Ar(ZtF_1Z + I, ).Rvec[(B,)Z)t])61By the general normal theory, the theorem is proved. IProof of Lemma 2.19By the same argument of the proof of Theorem 2.12, one hasvec(Xl))aocalAf(ZF_lZ+ I, +where a1 = 8*— sk + n. So,(vec(Mi))aiRtJ\1(ZtF_1Z + I, +Rtvec[(B(°1)Z)].vec(M2)By the general normal theory, the lemma is proved. IProof of Theorem 2.13For notational simplicity, the superscript j is suppressed in the following derivation.Let X, j = 1,... ,n, be partitioned into ((X5)1X3k , (UJ)lxd3, ((G)1x(sgk_dj) ). B =(B, B, B) and are partitioned confirmablely. is reparameterizedas {oI(1), To(i), f*}where P= {1I(2), Ti(2),E(22)}. B1) represents {B, B}. B(1) is also reparameterizedas= B1 — Tl(2)B, b(2) = B(2).Note,f (U {G},D)Jf(U,{G} Dj,bl,b(2),f*)f(bl,b(,F* Dj)dbldb(2)dF*Jf(U,G Dj,bl,b(),F*)f({G}+l.f(b, b(2) F D3)f(f* D) db1 db(2) dF* (2.28)To find the distribution of f(b1,b(2) D, f*), note by lemma (2.1)B(i) D, f B + N(O, (ii))62where B1) = B1) + (B(i) — B(01))Et, E=En 0 F, F* = (I — E)F-’. By Lemma2.7,vec(Bl)) D f vec((Bl))t)+ N(O, 0 F).—By a general normal theory,vec(B) Dj,vec(B2)),I*vec( (B)t) + vec(B )vec(B2)vec(B))vec(B (vec(B2)) — v ec((B2))t2))+ .1V(O,2vec(B)Ivec(Bt) )vec((B)t)+ (ri(2) 0 I)(vec(B2)— vec((B2))t+ N(O, 1I(2) 0 F*)vec((B + Tl(2)(B( — B2))t) + N(O, 1I(2) 0 F*).In the above derivation, it is assumed that (F*)_l exists and the last step is true becauseof Lemma 2.5. Again by Lemma 2.7,_B*B1 I B(2) D f B + Tl(2)(B( (2)) + N(O, 1I(2) 0 F*),, 3’it implies— (B1 —Tl(2)B() I3’ —d b + N(O, 1I(2) 0 F*),where b* — —1 — ‘-‘l rl(2)B). Since the distribution of b1 b(2) does not depend on b(2), b1and b(2) are independent. Sof(b1,b(2) D, f*) = f(b1 I D, F*)f(b(2) D, f*)= f(bi, 1I(2), rl())f(b( I D3). 
(2.29)

Further, since

f(U_j, G_j | b1, b(2), F*) ∝ f(U_j | G_j, b1, b(2), F*) f(G_j | b1, b(2), F*)

and

U_j | G_j, b1, b(2), F* ~ [B(1) Z_j + τ1(2)(G_j − B(2) Z_j)]^t + N(0, Σ1|(2)) = [b1 Z_j + τ1(2) G_j]^t + N(0, Σ1|(2)),

we have

f(U_j, G_j | b1, b(2), F*) ∝ f(U_j | G_j, b1, τ1(2), Σ1|(2)) f(G_j | b(2), Σ(22)).  (2.30)

Also, by Model (2.12)-(2.14),

f({G_t, t = j + 1, ..., n} | D_j, b1, b(2), F*) = f({G_t, t = j + 1, ..., n} | Σ(22), b(2)).  (2.31)

By (2.28)-(2.31) and Lemma 2.20,

f(U_j | {G_i, i = j, ..., n}, D_{j−1})
  ∝ ∫ f(U_j | G_j, b1, τ1(2), Σ1|(2)) f(b1 | D_j, τ1(2), Σ1|(2)) f(τ1(2), Σ1|(2) | D_j) db1 dτ1(2) dΣ1|(2)
    × ∫ f({G_t} | b(2), Σ(22)) f(b(2) | Σ(22), D_j) f(Σ(22) | D_j) db(2) dΣ(22)
  ∝ ∫ f(U_j | G_j, b1, τ1(2), Σ1|(2)) f(b1 | D_j, τ1(2), Σ1|(2)) f(τ1(2), Σ1|(2) | D_j) db1 dτ1(2) dΣ1|(2).

In summary, we have

U_j | G_j, b1, F* ~ b1 Z_j + τ1(2) G_j + N(0, Σ1|(2));  (2.32)

b1 | D_j, τ1(2), Σ1|(2) ~ b1* + N(0, Σ1|(2) ⊗ F*),  (2.33)

where

b1* = B̃(1) − τ1(2) B̃(2) = B0(1) + (B̂(1) − B0(1)) E^t − τ1(2) [B0(2) + (B̂(2) − B0(2)) E^t].

Since, by Lemma 2.1, Σ(11) | D_j ~ W^{-1}(Ψ̂(11), δ* + (j − 1) − sk), Lemma 2.20 gives

τ1(2) | D_j, Σ1|(2) ~ N(τ̂1(2), Σ1|(2) ⊗ Ψ̂(22)^{-1});  (2.34)

Σ1|(2) | D_j ~ W^{-1}(Ψ̂1|(2), δ* − sk + (j − 1)).  (2.35)

By (2.32)-(2.34),

U_j | G_j, Σ1|(2) ~ a1 + N(0, Σ1|(2) a2),

where

a1 = b1* Z_j + τ̂1(2) G_j = B̃(1) Z_j − τ̂1(2) B̃(2) Z_j + τ̂1(2) G_j

and

a2 = 1 + Z_j^t F* Z_j + (G_j − B̃(2) Z_j)^t Ψ̂(22)^{-1} (G_j − B̃(2) Z_j).

Combined with Equations (2.8) and (2.35),

U_j | G_j ~ T(Ψ̂1|(2), a2, a1, δ* − sk + j). □

Chapter 3

Application to Air Pollution Data

In this chapter, the theory of interpolation for data missing-by-design developed in Chapter 2 is applied to obtain interpolated data.

The daily maximum hourly levels of nitrogen dioxide (NO2), ozone (O3) and sulphur dioxide (SO2), and the daily mean levels of nitrate (NO3) and sulfate (SO4), were recorded from January 1, 1983 to December 31, 1988 in Ontario and its surrounding areas. These data come from several monitoring networks in the province, including the Ontario Ministry of the Environment air quality monitoring network (OME), the Air Pollution in Ontario Study (APIOS) and the Canadian Acid and Precipitation Monitoring Network (CAPMoN) (see Burnett et al. (1992) for more description of the data set).
Also available are some climatic data: daily maximum, minimum and mean temperature; daily maximum, minimum and mean humidity; and mean pressure and mean wind speed, measured at other locations. In total, there are 37 different monitoring locations (sites), but not all sites monitor all of the five air pollutants. In the application below, we assume that the variation introduced into the observations at the 37 sites by differences among the networks is negligible. Therefore we can pool the observed pollution levels without worrying about that variation.

In general, there are two kinds of air pollutants: (i) a primary pollutant, which is
In both cases, the relevance of the assumptions in Chapter 2 is investigated andthe interpolated residuals are checked.In Section 3.1, our interpolation theory is applied to monthly data and in Section 3.2,to daily data.673.1 Application to Monthly Pollution DataA monthly pollution level is simply computed as the mean of the observed daily levels inthat month. Hence, a time series of observed monthly mean levels consists of 72 values.Since some of these time series contain excessively many randomly missing values, tocontrol the quality of data all time series are screened and those with more than one-third of its values missing are deleted. Thus, the number of gauged sites is reducedfrom 37 to 31. Here, we assume that the probability of a time series having more thanone-third of its values missing is not related to those values. Therefore, such a strategyfor dropping time series from our study does not cause bias.The locations of the remaining 31 sites, except two outlying sites, are plotted in Figure3.1. The whole of the Province of Ontario divides into thirty-seven PHUs or districts(Duddek et al 1994). A PHU resembles a Census Division, the difference being marginaldisagreements in boundaries. Some PHUs, for example, are aggregates of two CensusDivisions. Figure 3.2 displays the locations of some approximate centroids of thesePHU’s in Southern Ontario. Our ungauged sites consist of the 37 approximate centroids.Heilce, the total number of gauged sites, 5g, is 31 and that of ungauged sites, .s, is 37.For the monthly interpolation, four air pollutants are included. They are NO2, SO4,03 and SO2. Table 3.1 lists the observed pollutants at each of gauged sites, where a, b,d and e represent NO2, SO4, 03 and SO2 respectively. From the table, we can see thatat all gauged sites, there are 64 observed time series aild 60 missing time series. In thesequel, the term “time series” is replaced with “column”.Among the 64 observed columils, about two percent are missing. 
The missing data include both randomly missing data and data censored from below. Here, censored below means that the actual air pollution level is below the detection limit of a monitor.

Table 3.1: Pollutants measured at each gauged site, where a, b, d and e represent NO2, SO4, O3 and SO2 respectively.

Sites       1   2   3   4   5     6   7   8    9   10  11   12
Pollutants  b   b   b   b   abde  d   be  ade  d   d   ade  de

Sites       13   14   15   16  17    18  19   20   21   22  23  24
Pollutants  ade  ade  ade  b   abde  b   ade  ade  ade  d   e   ade

Sites       25  26   27   28  29  30  31
Pollutants  de  ade  ade  de  b   e   de

In statistics, the procedure for filling in missing data is called imputation. There are many existing imputation methods (cf. Särndal 1992; Little and Rubin 1987). Examples of these methods are: overall mean imputation; class mean imputation; hot-deck and cold-deck imputation; random overall imputation; random imputation within classes; sequential hot-deck imputation; distance function matching; regression imputation; EM-algorithm-based imputation and multiple imputation.

The imputation methods mentioned above produce a single imputed value for each missing value, except for the multiple imputation approach. Many authors have used these methods. Afifi et al. (1961) suggested filling in the missing observations for each variable by that variable's mean. Buck (1960) and Stein et al. (1991) discussed imputing the missing values by regression models. Komungoma (1992) adopted a modified regression strategy. Johnson et al. (1994) combined a stepwise linear regression and a time series model to fill in missing values. Miller (1994) employed a nearest-neighbor hot-deck imputation procedure to fill in missing data. While multiple imputation is appealing, it has the disadvantage of requiring more work for data handling and computation of estimates. Moreover, the multiple imputation method was designed for survey sampling; it is not well suited to a spatial application.

We may apply Theorem 2.12 to impute our missing data. However, that method has not been well tested yet.
Furthermore, Theorem 2.12 is based on the normal approximation to a matrix T. As shown by the "empirical coverage percentage" at the end of this section, in the current scenario the matrix T cannot be replaced by its normal approximation.

For our monthly data, with such a small proportion of missing data, the choice of a procedure for filling in missing values is not critical. We filled in a value missing from the jth column in the ith month with the mean of the observed values of the same column in that month of the other years. In case all six measurements of the same month of 1983 to 1988 were missing, the grand mean of the observed values of the column would be used; however, with our data set, no such case obtained. This ad hoc fill-in method allows us to preserve the periodic behaviour of the columns shown in Figure 3.3, where ozone is seen to have a strong periodic pattern. Again, because only a small percentage of the data are missing, our ad hoc fill-in method will cause negligible bias.

Our theory of Bayesian multivariate interpolation with data missing-by-design is developed under two important assumptions. One assumption is that the detrended residuals should spatially follow a multivariate normal distribution. The other is that each residual time series is a white noise process; in other words, the residuals are temporally independent. Checking multivariate normality is not easy, since we lose hundreds of degrees of freedom to parameter estimation. In this chapter, only univariate marginal normality is checked. Temporal independence is checked with autocorrelation and partial autocorrelation plots.

The normal quantile-quantile plots of the detrended residuals of the original observed data do not show straight lines, suggesting the observed data be transformed. With a logarithmic transformation of the observed data, the residuals appear to be marginally normal. A typical example of a normal q-q plot of the original and log-transformed data is shown in Figure 3.4.
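The ad hoc monthly fill-in rule described above can be sketched as follows, assuming the 72 monthly values are arranged year by year in a 6 × 12 layout with NaN marking missing values (the helper name is ours):

```python
import numpy as np

def fill_monthly(series):
    """Fill a 72-value monthly series (6 years x 12 months, NaN = missing):
    a missing value in calendar month m is replaced by the mean of the
    observed values of that month in the other years; if all six values
    for month m are missing, the grand mean of the column is used."""
    x = np.array(series, dtype=float).reshape(6, 12)  # years x months
    grand = np.nanmean(x)
    for m in range(12):
        col = x[:, m]
        fill = np.nanmean(col) if np.any(~np.isnan(col)) else grand
        col[np.isnan(col)] = fill
    return x.ravel()
```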
Autocorrelation and partial autocorrelation plots of the detrended residuals of the log-transformed data are shown in Figure 3.5. The correlation plots do not show evidence of temporal correlation. By repeating the above initial data analysis for the observations at all gauged sites, we are led to conclude that the log-transformed data do meet the assumptions of our interpolation theory. In the sequel, unless specified otherwise, observations are always referred to on their log scale.

For determining the linear and seasonal trends of our Gaussian model, time series plots are made. Based on the plots, Z_t is taken to be (1, t, cos(ωt), sin(ωt))^t, where t = 1, ..., 72 and ω is the annual frequency. Here t = 1 represents January of 1983, t = 2 represents February of 1983 and so on, until t = 72, which represents December of 1988. The coefficients of the linear and seasonal trends are estimated with ordinary least squares. Figure 3.3 displays the time series plots and their least squares fitted curves for the four observed pollutants at Site 5. The fitted curve of log(O3) is far better than those of the other three because of its periodicity. The strong yearly pattern of ozone is partially explained by the fact that the creation of ozone is highly related to solar radiation.

For the environmental health study referred to earlier, the one which required interpolation of monthly means, the air pollution level in summer is of special interest. Therefore the pollution data of winter and summer are interpolated separately. In the following, only the interpolation for the summer data is described; the procedure for the winter data is similar. Here, the summer of a year is defined to be from May 1 to August 31 and "winter" the remainder of the year. Thus there are 24 values (24 months) in each summer column.
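The trend-removal step can be sketched as below. We assume ω = 2π/12 for the annual cycle of monthly data, which is our reading of the harmonic terms in Z; the function name is ours:

```python
import numpy as np

def detrend_monthly(y, period=12):
    """OLS fit of the monthly trend design Z_t = (1, t, cos(wt), sin(wt)),
    with w = 2*pi/period (annual cycle assumed); returns fitted values
    and detrended residuals."""
    t = np.arange(1, len(y) + 1)
    w = 2 * np.pi / period
    Z = np.column_stack([np.ones(len(y)), t, np.cos(w * t), np.sin(w * t)])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)  # least squares coefficients
    fitted = Z @ beta
    return fitted, y - fitted
```

Any series that truly is a linear trend plus an annual harmonic is fitted exactly, and the residuals are what the normality and autocorrelation checks above are applied to.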
The purpose of the analysis is to interpolate monthly NO2, SO4, O3 and SO2 levels in the summers of 1983 to 1988 at the 37 ungauged sites.

Table 3.2: The estimated between-pollutants hypercorrelation matrix of the log-transformed, summer monthly data.

        NO2          SO4         O3           SO2
NO2     1.00000000  -0.2854130   0.03476434   0.1364513
SO4    -0.28541295   1.0000000   0.79402064  -0.3379370
O3      0.03476434   0.7940206   1.00000000  -0.1457342
SO2     0.13645127  -0.3379370  -0.14573424   1.0000000

The interpolation procedure is as follows: first, the unbiased estimators of F^{-1} and B0(1) are computed; second, the EM algorithm for the estimation of δ* and Λ_g, Ω is invoked; third, the SG method is applied to extend Λ_g to Λ; then, with the exchangeability assumption on B0, B0(1) is extended to B0; finally, with all the hyperparameters estimated, the interpolated values are computed by the Bayesian interpolator.

We use S macros to carry out the interpolation and call a C program to perform the EM algorithm. An example of the S macros and C program is attached in the Appendices. By running the S macros and the C program, we obtain an estimated prior degrees of freedom of 610; the estimated between-pollutants hypercorrelation matrix listed in Table 3.2; and estimated hypervariances of NO2, SO4, O3 and SO2 of 0.6582724, 1.6263357, 0.2166192 and 1.8503741, respectively. The biggest positive correlation among the four pollutants occurs between O3 and SO4. Since both O3 and SO4 are regional air pollutants and both are related to sunlight, a higher correlation between O3 and SO4 is expected. Meanwhile, SO2 is a primary pollutant; it has a lower correlation with the other pollutants.

The result of the SG step is summarized in Figures 3.6 to 3.8. The right-hand plot of Figure 3.6 is a twisted 30 by 30 checkerboard in the D-plane. The original 30 by 30 checkerboard is in the G-plane. The coordinates of its lower left corner are the minimum latitude and longitude of the gauged sites.
The coordinates of its upper right corner are the maximum latitude and longitude of the gauged sites. The left-hand plot of Figure 3.6 shows an exponential fit between dispersions and the D-plane distances (refer to Chapter 1 for a brief summary of the SG method). The smoothness of the twisted checkerboard is controlled by the parameter λ. A smoother checkerboard in the D-plane is achieved by sacrificing the fit between the dispersions and D-plane distances. Figure 3.7 shows a smoother checkerboard in the D-plane but a rougher fit between the dispersions and D-plane distances when the smoothing parameter value is increased from 0 to 2500. A linear pattern in Figure 3.8 shows that the estimated covariances and the observed covariances are conformable.

By applying Corollary 2.6 and using the estimated hyperparameter values above, after the SG step it is straightforward to compute the interpolated summer monthly air pollution levels at the ungauged sites over the six years. As a way of checking the interpolated values, the overall average ozone levels in the summers of 1983 to 1988 at the gauged sites are plotted in Figure 3.9. Those of the interpolated ozone levels at the ungauged sites are plotted in Figure 3.10. The two plots affirm our interpolation procedure: when a high mean O3 level is observed at a gauged site, our interpolator gives high O3 values at nearby ungauged sites, and corresponding results obtain for lower observed O3 levels.

Another way of checking the interpolation procedure is to look at the correlation between the observed and estimated data by cross-validation (CV hereafter). CV is a procedure which deletes observed data one at a time and estimates each deleted datum from the remaining data as if that datum were never observed. It is a popular diagnostic tool. In our CV study, we deleted one gauged site at a time and interpolated pollutant levels at that same site using the observed levels at the other sites.
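The leave-one-site-out loop just described can be sketched as follows. Here `interpolate` stands in for the Bayesian interpolator; for illustration it may be any function that predicts a deleted column from the remaining ones (the names are ours):

```python
import numpy as np

def leave_one_site_out(data, interpolate):
    """Cross-validation over gauged sites: delete one site's column at a
    time, predict it from the remaining sites via interpolate(train, j),
    and collect the predictions (rows are times, columns are sites)."""
    n_t, n_s = data.shape
    preds = np.empty_like(data, dtype=float)
    for j in range(n_s):
        train = np.delete(data, j, axis=1)  # all sites except site j
        preds[:, j] = interpolate(train, j)
    return preds
```

With the predictions in hand, the detrended predicted and observed columns can be correlated site by site, as in Table 3.3.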
To avoid spuriously high computed correlations between the estimated and observed levels of pollutants, we first removed the trends from both the estimated and observed columns, and then calculated the correlations among the residuals.

The correlations between the detrended, estimated and observed levels of each pollutant, aggregating across sites and over time, are:

      Summer  Winter
NO2   0.243   0.242
SO4   0.494   0.438
O3    0.534   0.429
SO2   0.238   0.200

The correlations between the estimated and observed levels at each gauged site and for each observed pollutant are given in Table 3.3.

In Table 3.3, the correlations for SO4 and O3 in both summer and winter are generally higher than those for SO2. In other words, the predictions of SO4 and O3 are more accurate than those of SO2. Figure 3.11 displays the plots of the residuals of log-transformed, monthly observed and estimated pollutant levels in both summer and winter. Figure 3.12 shows the scatterplots of log-transformed observed pollutant levels against estimated pollutant levels for each pollutant in summer and winter, respectively. In the plots a linear pattern means accurate interpolation. The plots confirm the conclusions suggested by the tables; these results are consistent with the fact that O3 and SO4 are regional pollutants: it is easy to predict them with the observed data from other sites. In contrast, SO2 is a local pollutant and so more difficult to predict.

Can a simpler-to-use normal distribution be substituted for the multivariate T predictive distribution? That might naively seem possible since the univariate normal approximates its longer tailed relative very well. However, our results suggest this
However, our results suggest this74Table 3.3: Correlations Between Residuals of Log-Transformed, Observed and EstimatedPollution Levels at Gauged Sites.Sites summer winterNO2 S04 03 502 N02 S04 03 SO21 0.96 0.812 0.96 0.853 0.87 0.744 0.93 0.815 0.39 0.82 0.75 0.57 0.09 0.67 0.70 0.406 0.92 0.747 0.90 0.67 0.71 0.148 0.42 0.81 0.76 0.56 0.74 0.589 0.87 0.7810 0.85 0.4111 0.57 0.97 0.66 0.61 0.75 0.3212 0.87 0.65 0.59 0.3013 0.44 0.75 0.54 0.11 0.57 0.0414 0.11 0.88 0.53 0.13 0.69 0.3315 0.66 0.93 0.69 0.52 0.79 0.2416 0.90 0.8717 0.66 0.91 0.80 0.78 0.25 0.68 0.55 0.1818 0.85 0.7519 0.63 0.77 0.80 0.34 0.59 0.3820 0.36 0.72 0.61 0.57 0.72 0.4021 0.67 0.80 0.63 0.47 0.63 0.4022 0.86 0.4923 0.78 0.2524 0.48 0.74 0.68 0.56 0.65 0.5625 0.80 0.52 0.44 0.2626 0.71 0.82 0.81 0.24 0.47 0.2327 0.55 0.77 0.66 0.44 0.59 0.5028 0.87 0.83 0.40 0.4929 0.97 0.4930 0.78 0.1031 0.82 0.62 0.23 0.5275substitution cannot be recommended without additional study. Our initial impressioncomes from an evaluation we did of the empirical coverage percentage of three-standard-deviation confidence intervals (CI). If the predictive distribution were normal, all thethree-standard-deviation CIs would include the true values about 100 percent of thetime. As the percentages by pollutants presented below indicate, this high coverageprobability is not achieved here. The heavier tailed predictive matrix T distributionseems to be required.By pollutants the percentages are,Summer WinterN02 1000!. 94.27.S04 1007. 99.27.03 98.67. 98.87.S02 94.57. 1007.The unbiasedness of residuals is also checked. In the top plot of Figure 3.13 are theboxplots of the prediction errors for four pollutants, these being defined as the differencebetween the predicted and observed values. Except for SO2, the mean prediction errorsfor the other three pollutants are almost zero. 
In other words, the predictor is unbiased. In the same figure, the other two plots indicate that the observed values and the boxplots of the predicted values have similar patterns, except that the predicted values have bigger variances.

3.2 Application to Daily Pollution Data

For one of the environmental health studies mentioned above, we needed to interpolate daily air pollution levels down to the centroids of CSDs. In Southern Ontario, 733 such
That is,

E(X | Y = y) = E(X) + Σ_XY Σ_YY^{-1} (y − E(Y)),

where X and Y are jointly normal, Σ_UV is the covariance matrix between the random variables U and V, and E(U) is the mean of U. By letting Y represent the set of all observed data at time t, replacing X with X_ijt, and taking both E(X) and E(Y) to be zero, we apply the above formula to calculate ε̂_ijt. However, the formula is not directly applicable, since the joint covariance matrix of X is unknown. A method of moments is used to estimate the covariance matrix in a pairwise manner. In other words, the covariance of X_lmt and X_hkt is estimated by (1/f) Σ_t (X_lmt − X̄_lm)(X_hkt − X̄_hk), where f is the total number of observed pairs of X_lmt and X_hkt. The estimated covariance matrix has to be checked for positive definiteness. If it is not positive definite, it cannot be used to obtain ε̂_ijt; in that case, other imputation methods mentioned in the previous section would need to be investigated. In this application, however, the estimated matrix is indeed positive definite. Compared with the method used in Section 1, the advantage of the new method is that it introduces less autocorrelation into each time series when the missing values are filled in. We assume that such a fill-in procedure will not cause serious bias, because of the small percentage of missing data.

Figure 3.15 shows that a logarithmic transformation of the daily data is also necessary. Since the time series plots of the daily summer data do not have obviously periodic patterns, the linear trend of Model (2.12) is instead taken to comprise the grand mean effect; a time t effect; a weekday effect (Monday to Thursday); a monthly effect (May, June, July); a yearly effect (1983 to 1987); and the mean daily temperature. The mean daily temperature is the mean of the observed daily mean temperatures in Southern Ontario. This arrangement reflects the fact that Model (2.12) only allows covariates whose measurements are spatially constant. Other variables, for instance the daily mean humidity and mean pressure, are investigated too.
They are not included in the final model because their ordinary-least-squares coefficients are not significantly different from zero.

We checked the autocorrelation and partial autocorrelation plots of the daily, detrended summer observations. Figure 3.16 shows an example of such a plot. We find that the residuals are lag-1 correlated. To remove the lag-1 correlation, an AR(1) transformation of the observed values is applied; that is, ε̃_ijt = ε_ijt − φ̂_ij ε_ij,t−1 for t = 2, ..., 738, where φ̂_ij is estimated from the observed values of the ith pollutant at the jth gauged site. This transformation removes the lag-1 correlation among the residuals. For comparison, the autocorrelation and partial autocorrelation plots of the AR(1)-transformed residuals of O3 at Gauged Site 6 are shown in Figure 3.17.

Table 3.4: The Estimated Between-pollutants Hypercorrelation Matrix of the Log-transformed Daily Summer Pollution Levels in Southern Ontario.

        NO2        SO4        NO3         O3          SO2
NO2     1.0000000  0.1224039  0.24357919  0.11314233  0.20852698
SO4     0.1224039  1.0000000  0.41186592  0.26680085  0.12408473
NO3     0.2435792  0.4118659  1.00000000  0.24745489  0.05080721
O3      0.1131423  0.2668008  0.24745489  1.00000000  0.09569043
SO2     0.2085270  0.1240847  0.05080721  0.09569043  1.00000000

When the normality and independence assumptions are satisfied, the EM algorithm can be applied to the residuals to estimate the hyperparameters. The between-pollutants hypercorrelation matrix is given in Table 3.4, and the hypervariances of NO2, SO4, NO3, O3 and SO2 are 0.8093218, 1.8325046, 1.3923836, 0.4061250 and 1.9200310, respectively. The estimated number of degrees of freedom is 4365, to the nearest integer. Such a large number reflects a great deal of prior information about the covariance matrix Σ. It stems from the fact that there are only 27 gauged sites with 55 observed columns, while 733 ungauged sites with 3665 columns missing-by-design need to be interpolated.

The SG and interpolation steps are similar to those of the monthly data application above.
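The fill-in step used at the start of this section, replacing a missing entry by its fitted trend plus the conditional mean of its residual given all residuals observed at the same time, with the joint covariance estimated pairwise by the method of moments, can be sketched as follows. The function names are illustrative, not taken from the thesis.

```python
import numpy as np

def pairwise_cov(X):
    """Method-of-moments covariance, estimated pairwise over the
    time points where both series are observed (X may contain NaN)."""
    p = X.shape[1]
    C = np.empty((p, p))
    means = np.nanmean(X, axis=0)
    for a in range(p):
        for b in range(p):
            ok = ~np.isnan(X[:, a]) & ~np.isnan(X[:, b])
            C[a, b] = np.mean((X[ok, a] - means[a]) * (X[ok, b] - means[b]))
    return C

def conditional_fill(x, C):
    """Fill the NaNs in one (mean-zero) time slice x by the conditional
    normal mean E(X_mis | X_obs) = C_mo C_oo^{-1} x_obs."""
    mis = np.isnan(x)
    obs = ~mis
    filled = x.copy()
    coef = C[np.ix_(mis, obs)] @ np.linalg.inv(C[np.ix_(obs, obs)])
    filled[mis] = coef @ x[obs]
    return filled
```

In practice the estimated matrix returned by `pairwise_cov` must still be checked for positive definiteness before `conditional_fill` is applied, as noted above.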
One remaining problem needs to be solved. Since the interpolation is based on the AR(1)-transformed data, the interpolated values are in the form of AR(1)-transformed residuals. To obtain the interpolated residuals on the original scale, the following fact must be used.

Lemma 3.1 If U_k − φU_{k−1} = V_k and |φ| < 1, then U_n ≈ Σ_{k=2}^{n} φ^{n−k} V_k, provided n is big enough.

Proof: Since

U_2 − φU_1 = V_2,
U_3 − φU_2 = V_3,
...
U_n − φU_{n−1} = V_n,

multiply both sides of the first equation by φ^{n−2}, the next equation by φ^{n−3}, and so on, and then add all the equations. This gives

U_n − φ^{n−1} U_1 = Σ_{k=2}^{n} φ^{n−k} V_k.

When n is big enough, φ^{n−1} is almost zero. ∎

To apply the above fact, the φ's at the ungauged sites are needed. These values are not available. Let us assume that the φ's are the same for the same pollutant at all sites. Then in all, only 5 different coefficients are needed. The five coefficients can be estimated by ordinary linear regression subject to the above assumption. A check of the observed data suggests that the assumption is reasonable. With this new assumption, the interpolation procedure is repeated.

Table 3.5: Correlations Between the Residuals of Log-transformed, Summer Daily Observed and Estimated Pollutants at Gauged Sites.

Sites   Summer: NO2 SO4 NO3 O3 SO2   AR(1) Summer: NO2 SO4 NO3 O3 SO2
1    0.68 0.56   0.63 0.52
2    0.79 0.65   0.76 0.63
3    0.56 0.55   0.48 0.50
4    0.72 0.64   0.70 0.63
5    0.44 0.62 0.45 0.67 0.12   0.44 0.60 0.48 0.65 0.12
6    0.69   0.69
7    0.65 0.49 0.10   0.64 0.50 0.10
8    0.49 0.87   0.51 0.88
9    0.90   0.90
10   0.55 0.82   0.55 0.83
11   0.82   0.83
12   0.18 0.78   0.17 0.78
13   0.72   0.73
14   0.41 0.75   0.38 0.73
15   0.74 0.68   0.71 0.67
16   0.69 0.50 0.58 0.86   0.68 0.50 0.59 0.85
17   0.70 0.53   0.68 0.52
18   0.63 0.78   0.63 0.79
19   0.63 0.84   0.62 0.86
20   0.71 0.88 0.02   0.69 0.88 0.03
21   0.82   0.82
22   0.54 0.82   0.56 0.83
23   0.79   0.78
24   0.45 0.68   0.44 0.64
25   0.34 0.65 0.04   0.32 0.61 0.05
26   0.60 0.58   0.52 0.54
27   0.15   0.14

As before, a CV study of the residuals for both the observed and predicted values is carried out.
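The AR(1) steps used in this section, estimating φ̂ from a residual series, prewhitening by ε̃_t = ε_t − φ̂ε_{t−1}, and recovering the original-scale residuals via the recursion of Lemma 3.1, can be sketched as follows. This is a minimal illustration; the thesis does not spell out its estimator for φ, so ordinary least squares of ε_t on ε_{t−1} is assumed here.

```python
import numpy as np

def ar1_prewhiten(e):
    """Estimate phi by least squares of e[t] on e[t-1] (no intercept),
    then return phi and the prewhitened series e[t] - phi*e[t-1]."""
    x, y = e[:-1], e[1:]
    phi = float(x @ y) / float(x @ x)
    return phi, y - phi * x

def ar1_invert(v, phi):
    """Lemma 3.1: recover U_2..U_n from V_k = U_k - phi*U_{k-1},
    taking U_1 = 0; then U_n = sum_{k=2}^n phi^(n-k) V_k, since the
    phi^(n-1) U_1 term is negligible for |phi| < 1 and large n."""
    u, prev = [], 0.0
    for vk in v:
        prev = phi * prev + vk     # U_k = phi*U_{k-1} + V_k
        u.append(prev)
    return np.array(u)
```

Prewhitening followed by inversion with the same φ̂ recovers the series exactly, which is the identity the back-transformation of the interpolated residuals relies on.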
The pollutant-wise correlations between the residuals of the observed and predicted summer daily pollution levels at the gauged sites are:

        Summer      AR(1) Summer
NO2     0.49128973  0.48457402
SO4     0.66564685  0.63188994
NO3     0.58376676  0.56927091
O3      0.78495848  0.78477716
SO2     0.08807141  0.08978648

In the above table, the predicted values used for computing the correlations in Column one are obtained from the original observed values, and those in Column two from the AR(1)-transformed data. It can be seen that the correlations for the same pollutant in the two columns are close. We interpret this result positively, as saying that our interpolation procedure is robust to the assumption of temporal independence. In other words, in terms of the above correlations, taking an AR(1) transformation has not improved the prediction of the observed values. From the same table, the correlation for SO2 is much lower than those of the other pollutants. This is explained as follows. First, from Table 3.4, the hypercorrelations of SO2 with the other pollutants are very low; including the other pollutants therefore does not improve the predictions of SO2 levels by much. Second, from Table 3.5 it is seen that SO2 was observed at only 5 gauged sites. So in the CV analysis, the prediction of the SO2 levels at one of its gauged sites is based only on the observations at the other four sites gauged for SO2. For the monthly data, these problems associated with SO2 do not exist; consequently, the computed correlations between the predicted and observed SO2 values are higher there.

The correlations of the observed and estimated residuals at each gauged site for each pollutant are given in Table 3.5.
From Table 3.5, it can be seen that O3 and SO4 are easier to predict, because they are regional pollutants, and SO2 is more difficult to predict, because it is a local pollutant.

Chapter 4

Application to Environmental Monitoring Network Redesign

Another application of the "Bayesian interpolation with missing-by-design data" theory developed in Chapter 2 is the problem of redesigning an environmental monitoring network. The term "redesigning" means adding sites to, or deleting sites from, a currently existing monitoring network. Guttorp, Le, Sampson and Zidek (1992), Caselton, Kan and Zidek (1992) and Wu and Zidek (1992) have discussed this redesign problem. These authors derived their optimal redesign strategies from the following reasoning. Maintaining and collecting data from an environmental monitoring network is quite expensive; a network is therefore typically set up and maintained by a nationwide institution, and the collected data may be used by different users for different purposes. This fact implies that an optimal redesign of an environmental monitoring network should be based on certain common and fundamental purposes of the users of monitoring networks. They chose "reducing uncertainty about some aspect of the world, regardless of how that need may be expressed in any particular situation" (Guttorp, Le, Sampson and Zidek 1992) as the redesign goal, and developed a general network redesign theory by combining the entropy optimality criterion with the Bayesian paradigm.

In Section 1, the general theory of network redesign in the univariate case, as discussed in Guttorp, Le, Sampson and Zidek (1992), is summarized. In Section 2, it is demonstrated how our theory can be applied to the redesign problem.

4.1 Theory of Network Redesign with Entropy

In this section, the theory of network redesign with entropy is summarized in the univariate case, meaning that there is one pollutant at each site.
Although the theory deals only with the augmentation of a network, the reduction of a network is handled in a similar fashion and the conclusion is similar.

Suppose that currently there are g gauged sites in an environmental monitoring network and it is planned to add u_1 more sites to the network in the future. These u_1 sites are chosen among u candidate sites. Call all u candidate sites "ungauged sites". Let X^f represent a future realization at the gauged and ungauged sites. Decompose X^f into ((X_g^f)^t, (X_u^f)^t)^t, where X_g^f is the realization at the gauged sites and X_u^f at the ungauged sites. By properly rearranging the coordinates, (X_u^f)^t is further decomposed into ((X_u^add)^t, (X_u^rem)^t), where X_u^add is the response vector at the ungauged sites that will be added to the network and X_u^rem is the vector at the sites that are not chosen. We assume the same Gaussian linear model (2.12)–(2.14).

As a measurement of uncertainty, one natural choice is the entropy, defined as

H(X) = E[−log(f(X)/h(X))],

where h(X) is a reference density; dividing by h(X) makes H(X) invariant to scale transformations of X. From the definition, it is easy to see that

H(X, Y) = H(Y) + H(X | Y), provided h(X, Y) = h_1(X) h_2(Y).

Let D be the set of all data observed at the gauged sites in the past and θ = (Σ, B) the set of parameters in our Gaussian model. Then, given D, the total uncertainty about a future realization X^f and the unknown parameter θ is H(X^f, θ | D). Now,

H(X^f, θ | D) = H(U | G, θ, D) + H(θ | G, D) + H(G | D),   (4.1)

where G = ((X_g^f)^t, (X_u^add)^t)^t, U = X_u^rem, and

H(U | G, θ, D) = E[−log(f(U | G, θ, D)/h_11(U)) | D],
H(θ | G, D) = E[−log(f(θ | G, D)/h_2(θ)) | D],
H(G | D) = E[−log(f(G | D)/h_12(G)) | D].

In the above, it is assumed that h(X, θ) = h_1(X) h_2(θ) and h_1(X) = h_11(U) h_12(G). Note that adding any site to, or deleting any site from, the current network will not change the total uncertainty H(X^f, θ | D). When, in the future, the response vector at the gauged and added sites is observed, and if the measurement errors are negligible, the uncertainty represented by H(G | D) becomes known.
By Equation (4.1), one can see that the fixed total present uncertainty is decomposed into two parts: one part will become completely known by taking observations in the future, and the other part remains unknown. Therefore, minimizing the future uncertainty by adding sites is equivalent to minimizing the unknown part of the future uncertainty, which is in turn equivalent to maximizing the uncertainty about what will become known in the future. So the problem is equivalent to adding the sites that maximize H(G | D).

To find an expression for H(G | D), the entropy of a multivariate t-distribution needs to be computed, since f(G | D) consists of two multivariate t-distributions.

Now assume, for a random vector Y,

Y | Σ ~ N(0, Σ),    Σ ~ W^{-1}(Ψ, δ*),

so that, by (2.8),

Y ~ t(0, Ψ/(δ* − g + 1), δ* − g + 1).

Note that

H(Y, Σ | Ψ, δ*) = H(Σ | Y, Ψ, δ*) + H(Y | Ψ, δ*) = H(Y | Σ, Ψ, δ*) + H(Σ | Ψ, δ*).

To find the multivariate t-distribution's entropy, H(Y | Ψ, δ*), we therefore only need to compute H(Y | Σ, Ψ, δ*), H(Σ | Ψ, δ*) and H(Σ | Y, Ψ, δ*) respectively. Since Y | Σ, Ψ, δ* is multivariate normal, while Σ | Ψ, δ* and Σ | Y, Ψ, δ* are inverse Wishart, the entropies of the multivariate normal and the inverse Wishart given in Caselton, Kan and Zidek (1992) can be used. After a straightforward calculation, the entropy of a multivariate t-distribution is

H(Y | Ψ, δ*) = (1/2) log |Ψ| + c(g, δ*),   (4.2)

where c(g, δ*) is a function of g and δ* only.

Since, by Theorem 2.1, the distributions of X_g^f | D and X_u^add | X_g^f, D are multivariate t,

H(G | D) = H(X_g^f | D) + H(X_u^add | X_g^f, D).

Applying (4.2), it is easy to see that

H(G | D) = (1/2) log |Ψ_add|g| + c,

where Ψ_add|g, the residual covariance matrix of X_u^add conditional on X_g^f, is computed from the matrix Ψ_22 defined in Lemma 2.1, and the constant c does not depend on the choice of the added sites. Because only the term log |Ψ_add|g| is related to the choice of the newly added sites, maximizing H(G | D) is the same as maximizing |Ψ_add|g|.

4.2 Network Redesign with Data Missing-by-design

The result presented in the previous section can easily be extended to the multivariate case by assuming k pollutants at each site. The value of Ψ_add|g then comes from the Ψ estimated with the method described in BLZ.
Similarly, the result can be extended to the case with data missing-by-design. To justify this extension, one needs to note that, given the hyperparameters,

(R*)^t Σ R* ~ W^{-1}((R*)^t Ψ R*, δ* − 1),
B R* | Σ ~ N(B° R*, (R*)^t Σ R* ⊗ F^{-1}),

where

R* = ( I_{s_g k}  0
       0          R_2 )

and R_2 is defined as in Section 2.4.1 of Chapter 2. Applying the above model and an argument similar to that of the previous section, an optimal criterion for the redesign of a current network is the maximization of log |Ψ_add|2|, where Ψ_add|2 is the conditional hypercovariance matrix of the future realization at the added pollutants and sites, given the observed pollutants at the gauged sites. The matrix Ψ_add|2 can be obtained from the matrix Ψ estimated in Chapter 2. In an application, the added sites need not monitor all air pollutants. This relaxation may be useful when an optimal redesign is required jointly for multiple networks that were set up to monitor different pollutants. As a hypothetical example, suppose there are three networks, labeled 1, 2 and 3. Network 1 measures ozone only, network 2 nitrate only, and network 3 both pollutants. The optimal redesign discussed above enables us, say, to add one site to network 1 and another site to network 2 so as to maximally reduce the uncertainty about a future realization in these three networks.

As an example of implementing the above discussion, the monthly air pollution data of Chapter 3 are used. There are 31 gauged sites, with latitudes ranging from 42.246 to 49.800 and longitudes from 74.736 to 94.400. Suppose that in the future two sites will be added to these monitoring networks in Southern Ontario. One added site will monitor ozone only and the other nitrate only. Further suppose that the possible locations of the two sites to be added are constrained to the grid points of a 10 by 10 checkerboard whose four corners have latitudes and longitudes (42.246, 74.736), (42.246, 94.400), (49.800, 74.736) and (49.800, 94.400).
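Operationally, the criterion |Ψ_add|2| is the determinant of a Schur complement of the estimated hypercovariance matrix, maximized over candidate additions. The search can be sketched as follows; the index sets and exhaustive search are illustrative only, not the computation actually used in the thesis.

```python
import numpy as np
from itertools import combinations

def add_logdet(Psi, gauged, candidates, n_add):
    """Choose n_add candidate columns of Psi maximizing
    log|Psi_add|g|, the log-determinant of the conditional covariance
    of the added columns given the gauged ones (a Schur complement)."""
    Pgg_inv = np.linalg.inv(Psi[np.ix_(gauged, gauged)])
    best, best_val = None, -np.inf
    for add in combinations(candidates, n_add):
        a = list(add)
        Paa = Psi[np.ix_(a, a)]
        Pag = Psi[np.ix_(a, gauged)]
        cond = Paa - Pag @ Pgg_inv @ Pag.T          # Schur complement
        val = np.linalg.slogdet(cond)[1]
        if val > best_val:
            best, best_val = add, val
    return best, best_val
```

A candidate column that is nearly predictable from the gauged columns has a small conditional variance and is passed over in favor of one about which the network says little, which is exactly the entropy rationale above. With many candidate grid points, a greedy search would replace the exhaustive one.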
By taking the Ψ estimated in Section 3.1 and applying the above optimal redesign criterion, the following answer is reached: we should add the ozone-only site at latitude 43.92467 and longitude 85.66044, and the nitrate-only site at latitude 44.76400 and longitude 87.84533. However, a further simulation study of the sensitivity of this answer to the locations of the added sites still needs to be done.

Chapter 5

Cross-validation Study

In this chapter, we describe three CV studies designed to judge the performance of the newly developed theory of interpolation with data missing-by-design. The first study justifies the necessity of developing a new theory in a situation where the LZ method could be applied to solve the same problem. In the second study, an artificial example is constructed for studying the trend of the adjusted mean squared prediction error (AMSPE): starting from a complete data set, columns are deleted one by one and the AMSPE is computed. The third study is a comparison of our theory against Hass's CoKriging method, since both methods can handle data missing-by-design.

5.1 Simultaneous Interpolation Versus Univariate Interpolation

By interpolating one pollutant at a time, one can apply the LZ theory to the Southern Ontario pollution interpolation problem. Why then is a new theory needed when an old one is available? The answer lies in the difference between the two methods. With the LZ method, only part of the data is used for the interpolation, while the new method includes all available data in the procedure. Take the monthly pollution data in Southern Ontario, for example. When ozone levels are interpolated at ungauged sites by the LZ theory, only the observed O3 levels at gauged sites are included in the analysis. The new method uses all the observed values of NO2, SO4, O3 and SO2.
To distinguish between these two methods, that of LZ will be called univariate interpolation and the new method, simultaneous interpolation.

One way of showing the superiority of simultaneous interpolation over univariate interpolation is to prove that its interpolator has a smaller mean squared error. That fact is shown below.

Theorem 5.1 Let X_0, Y_0 be any two random vectors and X a random variable. Then

E(X − E(X | X_0, Y_0))^2 ≤ E(X − E(X | X_0))^2.   (5.1)

Proof: Observe that

E(X − E(X | X_0))^2 = E(X^2 − 2X E(X | X_0) + E^2(X | X_0))
= E(X^2) − 2E(X E(X | X_0)) + E(E^2(X | X_0))
= E(X^2) − 2E(E(X E(X | X_0) | X_0)) + E(E^2(X | X_0))
= E(X^2) − E(E^2(X | X_0)).

Similarly,

E(X − E(X | X_0, Y_0))^2 = E(X^2) − E(E^2(X | X_0, Y_0)).

Further, by Jensen's inequality, for any random variable Z, E(Z^2) ≥ (E(Z))^2. Applying this conditionally,

E(E^2(X | X_0, Y_0)) = E(E(E^2(X | X_0, Y_0) | X_0)) ≥ E([E(E(X | X_0, Y_0) | X_0)]^2) = E(E^2(X | X_0)).

So, equivalently,

E(X − E(X | X_0, Y_0))^2 ≤ E(X − E(X | X_0))^2.  ∎

Returning to the ozone example, take X_0 to be the observed levels of O3 at the gauged sites, Y_0 the observed levels of the other pollutants, and X the unobserved pollution level at an ungauged site. Then the univariate Bayesian interpolator is E(X | X_0) and the simultaneous interpolator is E(X | X_0, Y_0). When the model is correctly specified and the hyperparameters are known, Theorem 5.1 implies that the simultaneous interpolator does no worse than the univariate interpolator.

The following CV study supports this claim empirically. Again, the monthly air pollution data in Southern Ontario are used. At each gauged site, the observed pollutants are deleted as if they had not been observed. Then both the univariate and simultaneous Bayesian interpolators are applied to obtain predicted values of the "deleted" values based on the data at the other gauged sites. Once the predicted values from both methods are computed for all 31 gauged sites, the mean squared prediction error (MSPE) is calculated for the univariate and the simultaneous interpolators respectively.
The results for the monthly summer data and monthly winter data are listed below.

        Simultaneous             Univariate
        summer       winter      summer      winter
NO2     0.18543849   0.14342632  0.2829322   0.1292677
SO4     0.13848447   0.21311221  1.274841    0.7310782
O3      0.04369236   0.05382523  0.12536     0.2367643
SO2     0.62173407   0.28098323  0.758635    0.4344104

The values confirm our theoretical result, except in the case of NO2 in winter, where the MSPE of the univariate interpolator is smaller than that of the simultaneous interpolator. One interesting point is worth mentioning here. The numbers above show that the relative reduction in MSPE achieved by simultaneous interpolation over univariate interpolation is much greater for SO4 and O3 than for SO2: for SO4 and O3 the univariate MSPE is roughly three to nine times the simultaneous MSPE, while for SO2 the reduction is under 50%. This is because SO4 and O3 are regional pollutants and SO2 is a local pollutant. Intuitively, a regional air pollutant has higher correlations with the other pollutants than a local pollutant does, as indicated by the estimated between-pollutants hypercorrelations in Section 3.1. By including the other, correlated pollutants in the analysis, we would expect to enhance the interpolation. For a local pollutant, which has little or no correlation with the other pollutants, their inclusion will not improve the interpolator as much. Therefore, we can conclude that the interpolator with data missing-by-design does better than that of LZ on regional pollutants; it does not do much better than LZ on local pollutants.
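Theorem 5.1 can also be checked numerically. For a jointly normal (X, X0, Y0), both conditional means are linear with closed-form coefficients, so the two mean squared errors can be compared directly by simulation. The covariance matrix below is arbitrary, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
# Covariance of (X, X0, Y0); Y0 carries information about X beyond X0.
S = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.2],
              [0.5, 0.2, 1.0]])
Z = rng.multivariate_normal(np.zeros(3), S, size=n)
X, X0, Y0 = Z[:, 0], Z[:, 1], Z[:, 2]

# Univariate predictor: E(X | X0) = (S[0,1]/S[1,1]) * X0
mse_uni = np.mean((X - S[0, 1] / S[1, 1] * X0) ** 2)

# Simultaneous predictor: E(X | X0, Y0) with coefficients
# S_{x,(0)} S_{(0),(0)}^{-1} applied to (X0, Y0).
coef = S[0, 1:] @ np.linalg.inv(S[1:, 1:])
mse_sim = np.mean((X - (coef[0] * X0 + coef[1] * Y0)) ** 2)
```

With this covariance the theoretical values are 1 − 0.36 = 0.64 for the univariate predictor and about 0.49 for the simultaneous one, so the inequality of Theorem 5.1 holds with room to spare; as the off-diagonal correlations with Y0 shrink to zero, the two errors coincide.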
This conclusion has theoretical support: if X and Y_0 are independent, equality obtains in Equation (5.1).

5.2 Trends in Adjusted Mean Squared Prediction Error (AMSPE)

We now check the performance of the Bayesian interpolator (2.6) in terms of the AMSPE. Here the adjusted prediction error is defined to be (X_pred − X_obs)/std(X_pred), that is, the difference between the predicted and observed values divided by the standard deviation of the predicted value. The AMSPE is the mean of all squared adjusted prediction errors. We prefer the AMSPE over the mean squared error (MSE) because different pollutants have different units of measurement, hence different variability; mean prediction errors of different pollutants must therefore be normalized to make them comparable.

The data set for this study comes from the monthly Southern Ontario pollutant data used earlier. Among the 31 gauged sites, the 13 sites where NO2, O3 and SO2 are all observed are chosen. So the original data set consists of thirty-nine columns (three observed pollutants at each of thirteen sites). In our study, a randomly chosen column at a randomly chosen site among the 13 sites is deleted, a CV study is performed on the remaining data set, which has data missing-by-design, and an AMSPE is computed. Next, in a similar way, a randomly chosen column is deleted from the remaining data and the CV process is repeated on the new data set, which has two columns fewer than the complete data set, yielding a new AMSPE. By repeating this and plotting the AMSPEs, a trend in the AMSPE is obtained.

One example of the AMSPE trends is shown in Figure 5.1. There are twenty-four AMSPEs in the plot. The first AMSPE is computed after the column of observed NO2 at Site 2 is deleted from the original data set; the second, after the previously chosen column and the column of observed SO2 at Site 3 are deleted.
By repeating the same procedure, the 24th AMSPE is computed after 24 randomly chosen columns have been deleted. The order of the deleted columns is shown below.

Order:     1    2    3    4   5   6    7   8   9    10   11   12
Pollutant: NO2  SO2  NO2  O3  O3  SO2  O3  O3  NO2  SO2  NO2  SO2
Site:      2    3    4    1   7   4    10  6   12   7    10   12

Order:     13   14  15  16   17   18   19   20   21   22   23  24
Pollutant: NO2  O3  O3  SO2  SO2  SO2  NO2  SO2  NO2  SO2  O3  SO2
Site:      3    2   11  9    6    1    5    5    9    11   8   8

The AMSPE trend plot in Figure 5.1 shows a generally increasing pattern with some bumps along the way. The incremental changes in the AMSPE are not very dramatic. This perhaps indicates good performance of the simultaneous interpolator. However, the result must be interpreted cautiously, since the AMSPE is not a robust index. When the standard deviation of a predicted value is relatively small, it can inflate the corresponding adjusted prediction error so much that an AMSPE value is dominated by a single term. If a common term plays a dominant role among all the AMSPE values, the trend will be a flat line; such a plot would not imply superiority of the simultaneous interpolator.

5.3 Comparison with CoKriging

Both Bayesian multivariate interpolation with data missing-by-design (referred to below as the new method) and Hass's CoKriging method can handle data missing-by-design. Here are some direct contrasts between the two methods. The new method gives a predictive distribution, and therefore a simultaneous interpolation region, while Hass's method gives only the interpolated value and its standard deviation. The new method easily handles any number of variables (pollutants), setting aside limitations of computer capacity, while Hass's method allows only two variables in order to retain mathematical tractability. The new method includes all available data in the interpolation process, while Hass's method uses only part of the data.
However, when sk is big, Hass's method has a substantial computing advantage over the new method, since the convergence of the EM algorithm used in the new method is slow.

A detailed comparison follows. Generally speaking, our simulation study shows that Hass's method enjoys computational advantages over the new method when there are many sites. When there are not enough sites, Hass's method encounters difficulty, since in each moving window it requires a minimum number of sites for the purpose of variogram estimation. When the gauged sites are widely spread out, or there are simply not enough of them, that requirement cannot be met, which affects the accuracy of the estimated variogram. On the other hand, since the new method involves many matrix operations, when there are too many sites the dimensions of the matrices become so big that computation is excessively slow. With a small number of gauged sites, the new method works fine. Another difference between the two methods lies in their models. While Hass's method models only a "snapshot" of a spatial trend, the new method incorporates the temporal trend for each pollutant at each site. Therefore, the new method can handle spatial-temporal data but may have problems with purely spatial data: with one datum for each observed pollutant at each gauged site, it is hard to model a temporal trend.

Because of the complexity of the cross-variogram formulas, Hass's method cannot take more than two variables (e.g., two air pollutants) in practice, and it is hard to expand beyond that. The new method can handle any number of variables (pollutants). As shown in Chapter 2, a simultaneous prediction region is derivable when a predictive distribution is available; Hass's method gives only a pairwise confidence interval for the interpolated value.

A fundamental distinction between the two methods lies in their Bayesian against non-Bayesian natures.
We do not take up that issue here.

For an empirical comparison of the two methods, we use a real data set. The data come from selected sites in the National Acidic Deposition (NADP) Network and the National Trends Network (NTN) in the United States; see Wu and Zidek (1992) and Watson and Olsen (1984) for more details. Forty-five gauged sites, whose latitudes and longitudes are given in Table 5.1, are randomly chosen. Since both sulfate and nitrate are regional pollutants, they should be highly correlated. That conjecture is confirmed by an analysis of the data, where the estimated hypercorrelation between sulfate and nitrate is higher than 0.70. The high correlation enables both methods to demonstrate their strengths and weaknesses fully, making the comparison more reliable. At the forty-five sites, both sulfate and nitrate were observed monthly from 1983 to 1986. For Hass's method, one pollutant is treated as the response variable and the other as a covariate, the covariate being included in the analysis to improve the interpolation precision. We arbitrarily chose nitrate as the response variable and sulfate as the covariate. To agree with the form of a CV study provided by Dr. Tim Hass, the observed levels of both nitrate and sulfate at the first 35 sites of Table 5.1 are used in our study, while only the monthly sulfate observations are used at the remaining 10 sites. The number 35 is chosen because Hass's method requires a minimum of 30 gauged sites in each moving window.

For each month of the four years, a CV study of Hass's method is done and the MSPE is computed for the original and the log-transformed data respectively. The same study is done for the new method. The MSPEs for both methods in the two cases are listed in Table 5.2. In terms of the MSPE on the original data, the new method beats Hass's method in 32 out of 48 months, and in terms of the grand mean of the MSPE, the new method also wins, by 0.4437143 against 1.25524.
Figure 5.2 displays the monthly MSPEs of both methods on the original data. In the figure, "Bayesian" means the new method and "CoKriging" means Hass's method. With the log-transformed data, the ranking is reversed: the new method loses to Hass's method in 39 out of 48 months, and by 0.3813128 against 0.2347863 in the mean MSPE. See Figure 5.3 for a graphical comparison of the MSPEs for each month on the log-transformed data.

By checking the MSPEs of the new method on the log-transformed data at each site, we found that among the 35 MSPEs, the value at Site 35 (i.e., Site 075a) is much higher than those at the other sites. At Site 35, the MSPE is 7.48, while at the other sites all the MSPE values are below 0.493 (except at Site 21). Observations at Site 35 are unnaturally compressed into a small interval near zero, making their logarithms extraordinarily large in magnitude. Thus, in Table 5.2, we observe MSPEs for the transformed data that are larger than those for the original data because of this outlying site. As well, the poor performance at Site 35 makes the monthly MSPE values of the new method applied to the log-transformed data systematically higher than those of Hass's method. Figure 5.4 shows that the boxplot of the log-transformed nitrate levels at Site 35 lies well below the other boxplots, as noted above.

Recall that in Chapter 2 we assumed the hyperparameter B° has an exchangeable structure. From our simulation study, we know that the proposed new method will not do well if that assumption is violated. For example, if B° were taken to be identical over sites for each pollutant while the actual data show that it is not, the performance of our interpolator would be poor. So one explanation of the poor performance of the new method on the log-transformed data is that the exchangeability assumption on B° is violated because of Site 35; that is, the actual hyperparameter B° at that site is not the same as that of the same pollutant at the other sites.
To check this conjecture, Sites 21, 35, 36, 44 and 46, where either the observed sulfate or the observed nitrate levels are unusual, are removed from the data set and five new sites are added (with codes 076a, 077a, 078a, 160a, 161a). CV studies of both Hass's method and the new method are then carried out on the new log-transformed data. In terms of the MSPE, the new method beats Hass's method in 35 out of 48 months, and in terms of the mean MSPE, the new method gains an edge, by 0.2087363 against 0.3676921. These results support our conjecture.

However, the diagnosis of the exchangeability assumption is difficult. The following theory offers a way to make a rough check.

Note that, by Model (2.12)–(2.14) and Equation (2.16),

B̂_2 | B_2 ~ N(B_2, ((ZZ^t)^{-1} + F^{-1}) ⊗ Ψ_22).

Equation (2.8) implies

B̂_2 ~ T(Ψ_22, (ZZ^t)^{-1} + F^{-1}, B_2°, δ* − sk − 1 + h).

By Lemma 2.4, the marginal distribution of β̂_r (r = 1, ..., s_g k − l), the rth row of B̂_2, is

β̂_r ~ T(d, (ZZ^t)^{-1} + F^{-1}, β_r°, δ* − sk + h + 1),

where β_r° is the rth row of B_2°. To define d, let

D = ( D_11  D_12
      D_21  d_22 ) = C_r^t Ψ_22 C_r,

where C_r is an orthogonal matrix that exchanges the rth column of Ψ_22 with its last column, D_11 is (s_g k − l − 1) by (s_g k − l − 1) and d_22 is 1 by 1. Then d = d_22 − D_21 D_11^{-1} D_12.

By applying Theorem 2.8, the following theorem is proved.

Theorem 5.2

F = ((δ* − sk + 1)/(h d)) (β̂_r − β_r°) [(ZZ^t)^{-1} + F^{-1}]^{-1} (β̂_r − β_r°)^t ~ F_{h, δ* − sk + 1}.   (5.2)

By the above theorem, the p-value for β̂_35 at Site 35 can be computed as

1 − [1 − P_F(F > F_obs)]^35,

because β̂_35 is the extreme value among the 35 rows β̂_r, r = 1, ..., 35. F_obs can be computed by plugging the estimated hyperparameters into Equation (5.2). With the above p-value formula, unless P_F(F > F_obs) is extremely small, it is very unlikely that we would reject the exchangeability assumption at Site 35; the power of the above test is therefore low. An application of the test to our data shows that we fail to reject.
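The adjustment above is the standard extreme-order correction: if the 35 site statistics were independent and each exceeded F_obs with probability p1 = P_F(F > F_obs), the probability that the largest of them exceeds F_obs is 1 − (1 − p1)^35. A small sketch makes the low power concrete:

```python
def max_order_pvalue(tail_prob, m):
    """p-value for the largest of m independent statistics, when each
    one exceeds the observed extreme with probability tail_prob."""
    return 1.0 - (1.0 - tail_prob) ** m

# Even a per-site tail probability of 1% inflates to roughly 30%
# once the selection of the worst of 35 sites is accounted for.
p = max_order_pvalue(0.01, 35)
```

So the per-site tail probability must be extremely small before the selected-site p-value becomes significant, which is the low-power point made above.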
However, the boxplots of Figure 5.4 make us believe that the observed levels of nitrate at Site 35 are abnormal.

Theoretically, Hass's method is optimal only when the processes of the pollutants are multivariate normal with a flat prior on the parameters. That explains why the MSPE values of Hass's method drop substantially when it is applied to the log-transformed data (see Table 5.2). A log-transformation makes the observed levels of the pollutants follow a multivariate normal distribution more closely, although it is not necessarily the most appropriate transformation. The new method is also sensitive to the normality assumption: from Table 5.2, it can be seen that the MSPE values were reduced in 31 out of 48 months when the new method was applied to the log-transformed data.

Table 5.1: Latitudes and Longitudes of the Gauged Sites.

Site    Lat.      Long.
004a    36.1006   94.1733
010a    40.3644   105.56
011a    39.1011   105.092
012a    40.8064   104.754
017a    33.1778   84.4061
020a    40.0533   88.3719
021a    41.7011   87.9953
022a    37.71     89.2689
023a    37.4356   88.6719
024a    41.8414   88.8511
025a    41.6325   87.0878
028a    35.6644   83.5903
029a    37.1989   108.491
030a    45.4897   69.6644
031a    45.5611   84.6783
032a    42.4103   85.3928
033a    44.2244   85.8186
034a    47.5311   93.4686
035a    44.2372   95.3006
036a    32.3344   88.745
037a    48.5103   113.996
038a    41.1531   96.4928
039a    43.9431   71.7033
040a    42.7339   76.6597
041a    42.2994   79.3964
046a    43.5261   75.9472
047a    42.1061   77.5356
049a    36.1278   77.175
051a    35.6967   80.6228
052a    35.0239   78.2792
053a    35.7286   78.6811
055a    40.3553   83.0661
056a    39.7928   81.5311
058a    40.78     81.9253
059a    44.3869   123.623
061a    44.2231   122.242
063a    41.5978   78.7678
064a    40.6589   77.9361
065b    40.7883   77.9464
068a    36.0717   112.153
070a    29.3019   103.177
071a    28.8453   96.92
073a    37.335    80.5578
074a    47.86     123.933
075a    39.0897   79.6622

Table 5.2: MSPEs by Both Methods.

                original data            log-transformed data
month        CoKriging  new method    CoKriging  new method
1983.1        3.28392   0.52072       0.341134   0.323611
2             5.652717  0.366484      0.549418   0.357877
3             0.950273  0.321812      0.165156   0.303732
4             0.617414  0.396189      0.115926   0.274415
5             0.232695  0.509551      0.083971   0.320196
6             0.914407  0.881811      0.183823   0.407721
7             0.668226  0.786001      0.290934   0.405173
8             1.347384  0.592036      0.263396   0.183026
9             0.300603  0.543329      0.092434   0.332301
10            0.521691  0.375799      0.166289   0.226903
11            0.238963  0.25169       0.079822   0.394561
12            0.320864  0.332461      0.095796   0.389354
1984.1        3.209043  0.402137      0.339717   0.392407
2             0.817165  0.539348      0.234923   0.453388
3             0.469469  0.18821       0.079713   0.394719
4             0.517674  0.288131      0.28535    0.781333
5             0.143388  0.334989      0.047676   0.253829
6             1.500481  0.787056      0.23976    0.400371
7             0.987726  0.775919      0.34463    0.340348
8             0.457944  0.60891       0.208633   0.606851
9             0.210432  0.354206      0.082369   0.290468
10            0.499223  0.502362      0.163754   0.427758
11            0.808873  0.234341      0.146061   0.405709
12            0.254086  0.356278      0.089412   0.501983
1985.1        1.876528  0.238328      0.51466    0.265733
2             1.821478  0.36958       0.181693   0.427747
3             0.398158  0.244937      0.13868    0.267305
4             2.397513  0.430129      0.326478   0.326975
5             0.570281  0.307239      0.13937    0.26641
6             1.89153   0.37827       0.405738   0.461921
7             0.509808  0.51783       0.153515   0.297702
8             0.758525  0.733556      0.215295   0.545632
9             0.302321  0.470564      0.200713   0.58109
10            1.559379  0.484305      0.314374   0.291196
11            0.24711   0.165849      0.188169   0.265927
12            9.080794  0.585204      1.434542   0.395802
1986.1        6.937748  0.553377      0.685439   0.346247
2             1.717417  0.32262       0.29135    0.261732
3             1.376672  0.194057      0.217498   0.315148
4             0.410942  0.32283       0.116953   0.5226
5             0.240855  0.568083      0.069861   0.380813
6             0.270805  0.755552      0.09048    0.36987
7             0.377201  0.637667      0.08781    0.598339
8             0.664169  0.411857      0.226513   0.28267
9             0.492622  0.570382      0.14715    0.43719
10            0.363747  0.310728      0.118633   0.417164
11            0.199882  0.194151      0.109205   0.307664
12            0.861362  0.281423      0.205528   0.502112
Mean          1.25524   0.4437143     0.234786   0.381312

Chapter 6

Concluding Remarks and Future Studies

In this thesis, the theories of Bayesian multivariate interpolation with missing data are discussed.
The proposed interpolation theories have two main characteristics: they are hierarchical Bayesian methods and they are multivariate methods.

A Bayesian method has some obvious advantages. With a Bayesian paradigm, prior information is easily incorporated into an interpolation procedure, if such prior information is indeed available. With a Bayesian approach, one can include model uncertainty in the confidence interval of interpolated values, while traditional CoKriging fails to do so; thus, the confidence intervals computed with traditional CoKriging methods are narrower than they should be. With a hierarchical Bayesian approach, the specification of the spatial covariance structure can be pushed up to a higher level in the hierarchy. That is, the covariance structure can be specified at Stage 2 of the modeling. Under this setup, if there is any mistake in the specification of the covariance structure, future observations can modify the wrongly specified structure; eventually, if enough data are observed, the mistake will be corrected. Therefore, Bayesian modeling provides robustness. Another benefit of the Bayesian approach is that it provides a predictive distribution, while CoKriging only provides an interpolated value with its standard deviation. Knowing the predictive distribution, one can do more, e.g., construct a simultaneous confidence region, or create random numbers for a simulation study, and so on.

Another characteristic of the theories in this thesis is their multivariate nature. The general theory of BLZ also yields a multivariate interpolator. However, when there are missing data, as when data are missing-by-design, interpolation with the BLZ theory can only be done pollutant-by-pollutant. That reduces the method to a univariate method. The advantage of a multivariate approach is that it allows the interpolation to be carried out by including all the available information.
Theorem 5.1 concludes that, in terms of mean squared error, when the hyperparameters are known, a Bayesian interpolator based on all available information performs at least as well as a Bayesian interpolator based on partial information. With that theorem, the extension of the general BLZ theory to the new theory is intuitively justified. As an empirical check, the CV study in Chapter 5 shows that the quantitative gains of the new method over the LZ method are significant.

The theories developed in this thesis are by no means complete; there is much room for further study. For example, a general estimation method for the unknown hyperparameters is needed for the theory of interpolation with monotone missing data. Other future research topics are listed below.

In Chapter 5, where the theory of interpolation with data missing-by-design is compared with Hass's CoKriging theory, we pointed out that if the observed data form a purely spatial data set, the theory in this thesis is not directly applicable. That is because the model of our Bayesian interpolation contains a temporal trend, which cannot be estimated from a single datum. Therefore, hierarchical Bayesian CoKriging is of interest. Here, hierarchical Bayesian CoKriging means putting a prior on the unknown coefficients of the trend and on the spatial covariance matrix. One possible approach is as follows. When many sites are involved, the dimensions of the spatial covariance matrix can be very big. To reduce the number of random variables in the spatial covariance matrix, one may divide all related sites into m clusters, with n_i sites in the ith cluster. When the means are assumed to be zero, a model for the random process at the ith cluster is

X_i = R_i 1 + L_i,

where the random variable R_i reflects the correlations of Cluster i with the other clusters, 1 is a 1 x n_i vector of 1's, and L_i reflects the local spatial correlation at Cluster i. Let R = (R_1, ..., R_m)^t represent the m x 1 random vector that takes care of the between-clusters spatial correlation.
Conjugate inverted Wishart priors are assumed on Var(R) and Var(L_i), i = 1, ..., m. The details of the interpolation theory remain to be filled in.

A Gaussian model is assumed for all the interpolation theories discussed in this thesis. In some applications, the observed data may not be Gaussian and cannot be transformed to be Gaussian, so an interpolation theory for non-Gaussian data is needed. With a Gaussian model, the interpolator with data missing-by-design has an analytical form. With a general distribution model, an interpolator may not have a closed form; in that case, Bayesian computing tools, like Gibbs sampling or importance sampling, may need to be brought in for a numerical answer.

The prior distribution of the unknown spatial covariance matrix, \Sigma, in this thesis is taken to be an inverted Wishart, a conjugate distribution for the Gaussian model. Since many authors criticize the Wishart distribution (cf. Press 1982; Brown, Le and Zidek 1993b), an extension of the inverted Wishart may be needed. One possible choice, which keeps the conjugate property of the priors, and therefore retains mathematical tractability, while at the same time providing more parameters to offset one deficiency of the Wishart, is proposed in Brown, Le and Zidek (1993b). The proposed prior is called a Generalized Inverted Wishart distribution. However, it is worth mentioning that, based on our application and simulation studies in Chapters 3 and 5, our method works well with the inverted Wishart prior.

In Chapter 3, we saw that the daily air pollution levels in Southern Ontario show significant lag-1 correlation. There, an AR(1) transformation is adopted to remove the correlation. A better method would be to extend the interpolation theory of LZ to include a lag-1 correlated time series or, more generally, to remove the assumption of temporal independence. That extension would have immediate applications.

From the same Southern Ontario study, we see that among the observed data, some are censored below.
In Chapter 3, the censored data are simply treated as missing and filled in with an ad hoc method. Therefore, a new theory that can handle censored data will be needed for applications.

A last remark concerns the optimal redesign of a current monitoring network. In Chapter 4, an example demonstrates how to add two sites to a current monitoring network. An interesting question may be raised: instead of adding a new monitoring site that measures ozone only, an alternative approach would put an ozone monitor at one of the currently gauged sites where ozone is not monitored. Which is optimal? While this can be addressed with the same entropy approach given in Chapter 4, there are some problems. Normally, maintaining a separate monitoring site costs more than putting a meter at a currently operating site, and the entropy approach does not take that into consideration. Therefore, to solve such a problem, we have to bring additional factors, like cost, into the objective function.

Bibliography

Afifi, A. A. and Elashoff, R. M. (1961). "Missing Observations in Multivariate Statistics." J. Amer. Statist. Assoc., 61, 595-604.

Ahmed, S. and de Marsily, G. (1987). "Comparison of Geostatistical Methods for Estimating Transmissivity Using Data on Transmissivity and Specific Capacity." Water Resources Research, 23, 1717-1737.

Anderson, T. W. (1984). "An Introduction to Multivariate Statistical Analysis." New York: Wiley.

Berndtsson, R. (1988). "Temporal Variability in Spatial Correlation of Daily Rainfall." Water Resources Research, 24, 1511-1517.

Billingsley, P. (1968). "Convergence of Probability Measures." New York: Wiley.

Brown, P. J., Le, N. D. and Zidek, J. V. (1994a). "Multivariate Spatial Interpolation and Exposure to Air Pollutants." Canadian Journal of Statistics. To appear.

Brown, P. J., Le, N. D. and Zidek, J. V. (1994b). "Inference for a Covariance Matrix." Aspects of Uncertainty: A Tribute to D. V. Lindley. Ed. A. F. M. Smith and P. R. Freeman.
John Wiley & Sons.

Buck, S. F. (1960). "A Method of Estimation of Missing Values in Multivariate Data Suitable for Use with an Electronic Computer." J. Roy. Statist. Soc., Ser. B, 22, 302-307.

Burnett, R. T., Dales, R. E., Rainenne, M. D. and Krewski, D. (1992). "The Relationship Between Hospital Admissions and Ambient Air Pollution in Ontario, Canada: A Preliminary Report." Unpublished report.

Caselton, W. F., Kan, L. and Zidek, J. V. (1992). "Quality Data Networks that Minimize Entropy." Statistics in the Environmental and Earth Sciences. Eds. P. Guttorp and A. T. Walden. Griffin, London.

Chen, C. F. (1979). "Bayesian Inference for a Normal Dispersion Matrix and its Application to Stochastic Multiple Regression Analysis." J. Roy. Statist. Soc., B, 41, 235-248.

Cressie, N. and Hawkins, D. M. (1980). "Robust Estimation of the Variogram, I." J. of the International Association for Mathematical Geology, 12, 115-125.

Cressie, N. (1986). "Kriging Nonstationary Data." JASA, 81, 625-634.

Cressie, N. (1989). "Geostatistics." The American Statistician, 43, 197-202.

Cressie, N. (1991a). "Statistics for Spatial Data." New York: Wiley.

Cressie, N. (1991b). "Modeling Growth with Random Sets." In Spatial Statistics and Imaging (Proceedings of the AMS-IMS-SIAM Joint Summer Research Conference), A. Possolo, ed. Institute of Mathematical Statistics, Hayward, CA.

Dawid, A. P. (1978). "Extendibility of Spherical Matrix Distributions." J. Mult. Anal., 8, 559-566.

Dawid, A. P. (1981). "Some Matrix-variate Distribution Theory." Biometrika, 68, 265-274.

Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). "Maximum Likelihood from Incomplete Data via the EM Algorithm (with discussion)." J. Roy. Statist. Soc., B, 39, 1-38.

Delhomme, J. P. (1978). "Kriging in the Hydrosciences." Advances in Water Resources, 1, 251-266.

Dickey, J. M. (1967). "Matricvariate Generalizations of the Multivariate t Distribution and the Inverted Multivariate t-distribution." Ann. Math. Statist., 38, 511-518.

Dickey, J. M., Lindley, D. V.
and Press, S. J. (1985). "Bayesian Estimation of the Dispersion Matrix of a Multivariate Normal Distribution." Communications in Statistics A, 14, 1019-1034.

Duddek, C., Le, N. D., Sun, W., White, R., Wong, H. and Zidek, J. V. (PI) (1994). "Assessing the Impact of Ambient Air Pollution on Hospital Admissions Using Interpolated Exposure Estimates in Both Space and Time: Final Report to Health Canada under DSS Contract h4078-3-C059/01-SS." Unpublished report.

Dunnett, C. W. and Sobel (1954). "A Bivariate Generalization of Student's t-distribution with Tables for Certain Special Cases." JASA, 50, 1096-1121.

Eaton, M. L. (1983). "Multivariate Statistics: A Vector Space Approach." Wiley, New York.

Guttorp, P., Le, N. D., Sampson, P. D. and Zidek, J. V. (1993). "Using Entropy in the Redesign of an Environmental Monitoring Network." Multivariate Environmental Statistics, edited by G. P. Patil and C. R. Rao. North Holland/Elsevier Science, NY.

Harrison, P. J. and Stevens, C. F. (1976). "Bayesian Forecasting" (with discussion). J. Roy. Statist. Soc., Ser. B, 38, 205-247.

Hass, T. C. (1993). "Cokriging Variables That Are First and Second Order Nonstationary." Paper presented at the annual meeting of the Statistical Society of Canada, Wolfville, Nova Scotia, June, 1993.

Hass, T. C. (1992). "Redesigning Continental-Scale Monitoring Networks." Atmospheric Environment, 26A(18), 3323-3333.

Hass, T. C. (1990a). "Lognormal and Moving Window Methods of Estimating Acid Deposition." J. Amer. Statist. Assoc., 85(412), 950-963.

Hass, T. C. (1990b). "Kriging and Automated Variogram Modeling within a Moving Window." Atmospheric Environment, 24A(7), 1759-1769.

Halley, E. (1686). "An Historical Account of the Trade Winds, and Monsoons Observable in the Seas between and near the Tropics: With an Attempt to Assign the Physical Cause of Said Winds." Philosophical Transactions, 183, 153-168.

Handcock, M. S. and Stein, M.
L. (1993). "A Bayesian Analysis of Kriging." Technometrics, 35(4), 403-410.

Handcock, M. S. and Wallis, J. R. (1994). "An Approach to Statistical Spatial-Temporal Modeling of Meteorological Fields." JASA, 89(426), 368-390.

Huijbregts, C. J. and Matheron, G. (1971). "Universal Kriging (An Optimal Method for Estimating and Contouring in Trend Surface Analysis)." In Proceedings of the Ninth International Symposium on Techniques for Decision-Making in Mining and Metallurgy, Special Volume, 12, 159-169.

Johnson, T., Capel, J., McCoy, M. and Warnasch, J. (1994). "Estimation of Carbon Monoxide Exposures and Associated Carboxyhemoglobin Levels Experienced by Residents of Toronto, Ontario Using a Probabilistic Version of NEM." Unpublished report.

Journel, A. G. (1980). "The Lognormal Approach to Predicting Local Distributions of Selective Mining Unit Grades." J. of the International Association for Mathematical Geology, 12, 285-303.

Kannan, D. (1979). "An Introduction to Stochastic Processes." New York: North Holland.

Kitanidis, P. K. (1986). "Parameter Uncertainty in Estimation of Spatial Functions: Bayesian Analysis." Water Resources Research, 22, 499-507.

Komungoma, S. (1992). "Assessment of the Quality of the NADP/NTN Data Based on Their Predictability." M.Sc. Thesis, Department of Statistics, University of British Columbia, Vancouver, Canada.

Laslett, G. M. (1994). "Kriging and Splines: An Empirical Comparison of Their Predictive Performance in Some Applications." JASA, 89(426), 391-409.

Le, N. D. and Zidek, J. V. (1992). "Interpolation with Uncertain Spatial Covariance: A Bayesian Alternative to Kriging." J. Mult. Anal., 43, 351-374.

Leonard, T. and Hsu, S. J. H. (1992). "Bayesian Inference for a Covariance Matrix." Annals of Statistics, 20, 1669-1696.

Lindley, D. V. and Smith, A. F. M. (1972). "Bayes Estimates for the Linear Model." J. Roy. Statist. Soc., B, 34, 1-32.

Little, R. J. A. and Rubin, D. B. (1987). "Statistical Analysis with Missing Data." John Wiley & Sons.

Loader, C. and Switzer, P.
(1992). "Spatial Covariance Estimation for Monitoring Data." In Statistics in the Environmental and Earth Sciences, eds. A. T. Walden and P. Guttorp, London: Edward Arnold, pp. 52-70.

Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979). "Multivariate Analysis." Academic Press, New York.

Mardia, K. V. and Marshall, R. J. (1984). "Maximum Likelihood Estimation of Models for Residual Covariance in Spatial Regression." Biometrika, 71, 135-146.

Matheron, G. (1962). "Traite de Geostatistique, Tome I." Memoires du Bureau de Recherches Geologiques et Minieres, No. 14. Editions Technip, Paris.

Matheron, G. (1969). "Le Krigeage Universel." Cahiers du Centre de Morphologie Mathematique, No. 1. Fontainebleau, France.

Matheron, G. (1971). "The Theory of Regionalized Variables and Its Applications." Cahiers du Centre de Morphologie Mathematique, No. 5. Fontainebleau, France.

Miller, A. A. and Sager, T. W. (1994). "Site Redundancy in Urban Ozone Monitoring." J. Air & Waste Manage. Assoc., 44, 1097-1102.

Muirhead, R. J. (1982). "Aspects of Multivariate Statistical Theory." John Wiley, New York.

Nather, W. (1985). "Effective Observations of Random Fields." Teubner-Texte zur Mathematik, Band 72. Teubner, Leipzig.

Neuman, S. P. and Jacobson, E. A. (1984). "Analysis of Nonintrinsic Spatial Variability by Residual Kriging with Application to Regional Groundwater Levels." J. of the International Association for Mathematical Geology, 16, 499-521.

Olea, R. A. (1984). "Sampling Design Optimization for Spatial Functions." Journal of Geophysical Research, 79, 695-702.

Oehlert, G. W. (1993). "Regional Trends in Sulfate Wet Deposition." JASA, 88, 390-399.

Omre, H. (1987). "Bayesian Kriging - Merging Observations and Qualified Guesses in Kriging." Mathematical Geology, 19, 25-39.

Omre, H. and Halvorsen, K. B. (1989a). "The Bayesian Bridge between Simple and Universal Kriging." Mathematical Geology, 21, 767-786.

Omre, H. and Halvorsen, K. B. (1989b). "A Bayesian Approach to Kriging."
In Proceedings of the Third International Geostatistics Congress I, ed. M. Armstrong, Dordrecht: Academic Publishers, pp. 49-68.

Pilz, J. (1991). "Bayesian Estimation and Experimental Design in Linear Regression Models." John Wiley, New York.

Press, S. J. (1982). "Applied Multivariate Analysis: Using Bayesian and Frequentist Methods of Inference." Holt, Rinehart & Winston, New York.

Rubin, D. B. (1976). "Comparing Regressions When Some Predictor Variables Are Missing." Technometrics, 18, 201-206.

Sampson, P. and Guttorp, P. (1992). "Nonparametric Estimation of Nonstationary Spatial Covariance Structure." J. Amer. Statist. Assoc., 87(417), 108-119.

Sarndal, C. E., Swensson, B. and Wretman, J. (1992). "Model Assisted Survey Sampling." Springer-Verlag, New York, Inc.

Stein, A. and Corsten, L. C. A. (1991). "Universal Kriging and Cokriging as a Regression Procedure." Biometrics, 47, 575-587.

Stein, M. L., Shen, X. and Styer, P. E. (1991). "Applications of a Simple Regression Model to Acid Rain Data." Technical Report No. 276, Department of Statistics, University of Chicago, Chicago, IL, USA.

Taylor, A. E. and Lay, D. C. (1980). "Introduction to Functional Analysis." Wiley, New York.

Tanner, M. A. and Wong, W. H. (1987). "The Calculation of Posterior Distributions by Data Augmentation." JASA, 82, 528-550.

Wahba, G. (1990a). "Comment on Cressie." The American Statistician, 44, 255-256.

Wahba, G. (1990b). "Spline Models for Observational Data." Philadelphia: Society for Industrial and Applied Mathematics.

West, M. and Harrison, P. J. (1986). "Monitoring and Adaptation in Bayesian Forecasting Models." JASA, 81, 741-750.

Woodbury, A. D. (1989). "Bayesian Updating Revisited." Mathematical Geology, 21, 285-308.

Wu, C. F. J. (1983). "On the Convergence Properties of the EM Algorithm." Ann. Statist., 11, 95-103.

Wu, S. and Zidek, J. V. (1992). "An Entropy Based Review of Selected NADP/NTN Network Sites for 1983-1986." Atmospheric Environment.

Yates, S. and Warrick, A. W.
(1987). "Estimating Soil Water Content Using Cokriging." Soil Science Society of America Journal, 51, 23-30.

Zellner, A. (1971). "An Introduction to Bayesian Inference in Econometrics." New York: John Wiley.

Watson, C. R. and Olsen, A. R. (1984). "Acid Deposition System (ADS) for Statistical Reporting: System Design and User's Code Manual." U.S. Environmental Protection Agency.

Appendix A:

In this Appendix, the S function iwm(m, cn, p, flag=1, sl, M2, misindx, Tint) and its S help file are listed. The iwm S function is designed to perform spatial interpolation and estimation of the hypercovariance matrices and prior degrees of freedom with the EM algorithm described at the end of Chapter 2. iwm calls a C program, the source file of which is listed in Appendix B.

iwm S function help file:

Returns a list of estimated hypercovariance matrices and prior degrees of freedom.

USAGE: iwm(m, cn, p, flag = 1, sl, M2, misindx, Tint)

ARGUMENTS:

m: number of measured or non-measured air pollutants at a site.

cn: when flag=0, cn is the known degrees of freedom in the prior distribution, an inverted Wishart, of the unknown spatial covariance matrix. When flag=1, cn is unknown and the input is an initial value. That value must be greater than m times the total number of sites, p+sl.

p: number of gauged sites.

flag: a value indicating whether cn is known or not.

sl: number of ungauged sites.

M2: a response matrix. The rows are the observations at gauged sites over time. The columns are "snapshot" air pollutant levels observed at gauged sites.

misindx: an index vector of missing columns and observed columns.

Tint: a p-column matrix with each column being a mean of all columns at a gauged site. In iwm, Tint is used to get an initial estimate of the between-sites hypercovariance matrix T1.

VALUE:
If fiag=1, cn is the estimated degreesof freedom.C OBJECT FILE:The C objective file “Phill.o” is supposed to be under your S working directory thatnormally is at position 1 of your S search list. Use S function “searchO” to check it.If that is not true, you may need to customize “iwm” function to your own version byreplacing the following S commands in your copy of iwm function,filename_search() [1]filename_paste(c(filename,”/Phill.o”) ,collapse=”)dyn. load(filename)withdyn.load(”pathname/Phill .0”)where “pathname” is the complete pathname under which “Phill.o” locates.DETAILS:IWM, standing for “Interpolation With Data Missing-by-design”, implements the theoryof interpolation with data missing-by-design. The theory is developed under a normaldistribution model when there are data missing-by-design. The other hyperparametersB°, F—’ are estimated by unbiased estimators. The output Ti of iwm is used lateras an input for the Sampson and Guttorp nonparametric approach for extending Tito including all sites. The structure of iwm is that: first, iwm does some initial datamanipulation and second, it calls “Phiil.o”. It is in the C program where the EMalgorithm of the interpolation theory is implemented.As a warning, iwm adopts an EM algorithm, which is notoriously slow. Sometime whenthe dimensions of the observed data matrix are big, it could take hours, even days tofinish. So first of all, be patient! Secondly, either submit it as a batch job or lower theprecision value at line 70 of file “Phiii.c”. When you change the source code file, be117sure that you recompile the file with unix command “cc -c Phill.c” and copy the newobject file to your S working directory. The current value is set to be 0.0005. You mayset it to a different number. Also you can try various transformations of the data matrixto make it close to normal. By doing those, the convergence is hopefully faster.EXAMPLES:Consider an artificial example for an explanation of how to use “iwm”. 
Suppose thereare four Sites 1, 2, 3 and 4. Site 1 is an ungauged site (sl=1), Sites 2, 3 and 4 are gaugedsites (p=3). Three air pollutants: 504, 03, NO3 are measured at all sites. Only SO4is measured at Site 2, 03 at Site 3, and, SO4, 03 and NO3 at Site 4. Now label 1 to 3for SO4, 03, NO3 at Site 2; 4 to 6 for SO4, O3 NO3 at Site 3 and 7 to 9 for 504, 03,NO3 at Site 4. Since the columns of air pollutants labeled with 2, 3, 4, 6 are missed,and the columns labeled with 1, 5, 7, 8, 9 are observed, misindx=(2,3,4,6,1,5,7,8,9). Iffurther, the observed data matrix is:Site #2 Site #3 Site #4SO4 03 5O4,03NOt=1 1.2 3.4 1.6 4.0 2.7t=2 1.7 3.1 1.1 3.9 2.5thenM2= (1.2 3.4 1.6 4.0 2.7\1.7 3.1 1.1 3.9 2.5and/1,2 3.4 2.667Tint = I\ 1.7 3.1 2.5where the third column of Tint is the mean of the columns 4,5,6 of M2.Macros of IWM:function(m,cn,p,flagl,sl,M2,misindx,Tint){f_dim(M2) [1]1181_p*m—dim(M2) [2]tmlO_c(1 :f)Z_c(rep(1,f),tmlO,cos(2*pi*tmlO/12),sin(2*pi*tmlO/12))k_length(Z) IfZ_matrix(Z,k,f ,byrowT)Bhat_t (M2)7.*%t (Z)%*°hsolve(Z°h*%t (Z))MuO_apply(Bhat ,2 ,mean)S_t (M2) 7*7M2-Bhat707Z%*72tmpi_as integer(p*m-1)F1_matrix(O,k,k)for (i in 1:tmpi) {a.rep(-1/tmpi ,tnipi)a[i] _1+a[i]a_matrix(a, tmpi,1)F1_F1+t (Bhat) 7.*7.a%*7.t (a) °h*’/.Bhat/c (t (a) *7,S7.*7.a)}F1_ (n+f-k-2)*F1/tmpiB02_matrix(1 ,tmpi ,1)70*%matrix(MuO , 1 ,k)tmp_Bhat-B02Ssim_S+tmp%*Y.solve(F1) %*Y.t (tmp)B1_diag(rep(1 ,m))Bhat2_t (TintYh*’ht (Z)Y.*Y0so1ve(Zh*Yt(Z))T1_t (Tint)7*Y0Tint-Bhat2%*Y.Z%*7.TintnOfilename_search() [1]filename_paste(c(filename,”IPhill.o),collapse””)dyn. 
load (filename)rvalues_ .CQ’Phill”, as.double(T1), as.double((B1)),as.integer(misindx), as.double((Ssim)),as.integer(flag),as.integer(n), as.integer(f), as.integer(p), as.integer(m),as.integer(k), as.integer(l), as.double(cn), as.integer(sl))119list (Tlrvalues[[1]] ,Blrvalues[[2]] ,degree=rvalues[[12]])}120Appendix B:Source file of C program Phill.o:#include <math.h>#include <stdio .h>void Phill(T1, Bi, misindxl, Ssiml ,flagl,nl,fl ,pl ,ml ,kl ,ll ,cnl, sli)mt *flagl, *nl,*fl,*pl,*ml,*kl,*ll, *misindxl, *sll;double *T1, *B1, *Ssiml, *cnl;{mt flag, n,f,p,m,k,l,sl;double **T, **B, **Ssim,cn;double **,**matrixQ;double ma,tO,tl,t2,detQ,snQ,t3,t4;mt i,j,*ivectorQ, *misindx, tmpi2;void maxlQ, 1Q,f_matQ, detlogO;/* flag = 0 degrees of freedom is known(ie. m is constant in the paper)= 1 estimate the df as welln = number of observations in the pastf = number of observations in the future1 = number of missing columnsp = number of (gauged) stationss1 number of (ungauged) stationsm = number of ionsk = number of coefficients in the linear model(i.e. 
   4 in the paper); cn = initial estimate of the degrees of freedom;
   that is, if flag = 0 then df = cn, else if flag = 1 then use the
   algorithm in the paper with cn as the initial value;
   Ssim = the residual cross-product matrix S;
   misindx = the index vector for the missing columns and the
   non-missing columns. */
/* T is Lambda, B is Omega */
	flag = *flag1; n = *n1; f = *f1; p = *p1; m = *m1;
	k = *k1; l = *l1; cn = *cn1; s1 = *s11;
	T = matrix(p,p);
	B = matrix(m,m);
	misindx = ivector(m*p);
	sigma = matrix(m*p,m*p);
	Ssim = matrix(p*m-l,p*m-l);
	/*******************/
	for (j=0;j<m*p;++j)
		misindx[j] = *(misindx1+j);
	for (i=0;i<p*m-l;++i)
		for (j=0;j<p*m-l;++j)
			Ssim[i][j] = *(Ssim1+i*(p*m-l)+j);
	for (i=0;i<p;++i)
		for (j=0;j<p;++j)
			T[i][j] = *(T1+i*p+j);
	for (i=0;i<m;++i)
		for (j=0;j<m;++j)
			B[i][j] = *(B1+i*m+j);
	printf("The program will finish when ma < 0.0005\n");
	while (1) {
		/* printf("step 1\n"); */
		/* E-step: fill sigma with E(Sigma_(11) | ...) */
		esigma(sigma,misindx,T,B,cn,p,f,l,m,n,k,Ssim,s1);
		/* printf("step 2\n"); */
		while (1) {
			t1 = det(T,p);
			t2 = det(B,m);
			if (fabs(t1) < 1e-50) t1 = 1.0;
			if (fabs(t2) < 1e-50) t2 = 1.0;
			/* printf("step 3\n"); */
			maxl(cn,T,B,sigma,p,m,s1);
			/* printf("step 4\n"); */
			t3 = det(T,p);
			t4 = det(B,m);
			if (fabs(t3) < 1e-50) t3 = 1.0;
			if (fabs(t4) < 1e-50) t4 = 1.0;
			if ((fabs(log(t1)-log(t3))<1e-4) &&
			    (fabs(log(t2)-log(t4))<1e-4)) break;
		}
		ma = cn;
		tmpi2 = p*m;
		/* printf("step 5\n"); */
		detlog(T,B,&t2,&t3,p,m,l,misindx,Ssim);
		/* printf("step 6\n"); */
		if (flag>0) cn = sn(cn,n,p,m,s1,f,k,l,t2,t3);
		/* printf("step 7\n"); */
		ma = fabs(ma-cn)/cn;
		printf("ma is %lf\n",ma);
		if (ma<0.0005) break;
	}
	for (i=0;i<p;++i)
		for (j=0;j<p;++j)
			*(T1+i*p+j) = T[i][j];
	for (i=0;i<m;++i)
		for (j=0;j<m;++j)
			*(B1+i*m+j) = B[i][j];
	*cn1 = cn;
}
/**********************************************/
void maxl(c,T,B,L,p,m,s1)
double **T,**B,**L,c;
int p,m,s1;
{
	int i,j,k,l;
	double x0,x1,det();
	void invert_matrix();

	for (k=0;k<p;++k)
		for (l=0;l<p;++l)
			for (i=0,T[k][l]=0.0;i<m;++i)
				for (j=0;j<m;++j)
					T[k][l] += L[m*k+i][m*l+j]*B[i][j]/
						(c-(double)(s1*m))/(double)m;
	invert_matrix(T,p);
	for (k=0;k<m;++k)
		for (l=0;l<m;++l)
			for (i=0,B[k][l]=0.0;i<p;++i)
				for (j=0;j<p;++j)
					B[k][l] += L[m*i+k][m*j+l]*T[i][j]/
						(c-(double)(s1*m))/(double)p;
	invert_matrix(B,m);
}
/**********************************************/
double sn(cl,n,p,m,s1,f,k,l,t2,t3)
double cl,t2,t3;
int n,p,m,s1,f,k,l;
{
	double k1,k2,c2,func1(),trigamma();
	int i,tmpi;

	while (1) {
		tmpi = p*m-l;
		k1 = func1(cl,n,p,m,s1,f,k,l,t2,t3);
		c2 = cl;
		for (i=1,k2=0.0;i<p*m;++i)
			k2 = k2+trigamma((cl+(double)(n+f-i-k-s1*m))/2.0)
			       -trigamma((cl-(double)(i+s1*m))/2.0);
		k2 = k2/2.0;
		cl = c2-k1/k2;	/* Newton step */
		/* printf("k1,k2,c1,c2 is %lf %lf %lf %lf\n",k1,k2,cl,c2); */
		if (fabs(k1)<1.e-08) break;
		if (cl<(double)(m*(p+s1))) {
			if (c2/2>(double)(m*(p+s1)+1)) cl = c2/2;
			else cl = (double)(m*(p+s1)+1);
		}
	}
	return cl;
}
/**********************************************/
double func1(c,n,p,m,s1,f,k,l,t2,t3)
double c,t2,t3;
int p,m,n,k,s1,f,l;
{
	double t1,digamma(),**matrix(),det();
	int i;
	void f_mat();

	for (i=1,t1=0.0;i<p*m;++i)
		t1 = t1+digamma((c+(double)(n+f-i-k-s1*m))/2.0)
		       -digamma((c-(double)(i+s1*m))/2.0);
	/* printf("t1,t2,t3 %lf %lf %lf\n",t1,t2,t3); */
	return t1-t2+t3;
}
/**********************************************/
void detlog(T,B,t2,t3,p,m,l,misindx,ssim)
double **T,**B,*t2,*t3,**ssim;
int p,m,l,*misindx;
{
	double **Psi22,**tm1,**tm2,**R,**matrix(),det();
	double **Phi11,**Psi;
	int tmpi2,tmpi,i,j,i1,i2,i3,tpi;

	tmpi = p*m-l;
	tmpi2 = p*m;
	R = matrix(p*m,p*m);
	Phi11 = matrix(tmpi2,tmpi2);
	for (i1=0;i1<tmpi2;++i1)
		for (i2=0;i2<tmpi2;++i2)
			R[i1][i2] = 0.0;
	for (i=0;i<p*m;i++) {
		tpi = misindx[i]-1;
		R[tpi][i] = 1.0;
	}
	for (i=0;i<tmpi2;++i)
		for (j=0;j<tmpi2;++j)
			Phi11[i][j] = T[i/m][j/m]*B[i%m][j%m];
	tm1 = matrix(tmpi2,tmpi2);
	tm2 = matrix(tmpi2,tmpi2);
	for (i1=0;i1<tmpi2;++i1)
		for (i2=0;i2<tmpi2;++i2)
			tm1[i1][i2] = R[i2][i1];
	for (i1=0;i1<tmpi2;++i1)
		for (i2=0;i2<tmpi2;++i2)
			for (i3=0,tm2[i1][i2]=0.0;i3<tmpi2;++i3)
				tm2[i1][i2] = tm2[i1][i2]+tm1[i1][i3]*Phi11[i3][i2];
	f_mat(Phi11,tmpi2);
	f_mat(tm1,tmpi2);
	Psi = matrix(tmpi2,tmpi2);
	for (i1=0;i1<tmpi2;++i1)
		for (i2=0;i2<tmpi2;++i2)
			for (i3=0,Psi[i1][i2]=0.0;i3<tmpi2;++i3)
				Psi[i1][i2] = Psi[i1][i2]+tm2[i1][i3]*R[i3][i2];
	f_mat(tm2,tmpi2);
	f_mat(R,tmpi2);
	Psi22 = matrix(tmpi,tmpi);
	for (i1=0;i1<tmpi;i1++)
		for (i2=0;i2<tmpi;i2++)
			Psi22[i1][i2] = Psi[i1+l][i2+l]+ssim[i1][i2];
	*t2 = log(det(Psi22,tmpi));
	for (i1=0;i1<tmpi;i1++)
		for (i2=0;i2<tmpi;i2++)
			Psi22[i1][i2] = Psi[i1+l][i2+l];
	*t3 = log(det(Psi22,tmpi));
	f_mat(Psi,tmpi2);
	f_mat(Psi22,tmpi);
}
/************************************************************/
#include <stdio.h>
#include <math.h>
/*****************************************************************/
/* This module gives E(\Sigma_{(11)} | M_{2}, \Phi, \delta*)
   and E(log|\Sigma_{(11)}| | M_{2}, \Phi, \delta*) for the
   systematic pattern with whole columns missing.
   sigma is \Sigma_{(11)};
   log is log|\Sigma_{(11)}|;
   phi is the hyperparameter \Phi_{(11)} = \Lambda_{g} \otimes \Omega;
   delta is the degrees of freedom, another hyperparameter;
   s2 is the number of gauged stations, s1 is the number of ungauged
   stations;
   l is the number of missing columns, misindx is the index of the
   missing columns;
   k is the number of responses at each station;
   n is the number of time points in the past, f is that in the future;
   h is the number of covariates. */
void esigma(sigma,misindx,Lambda,Omega,delta,s2,f,l,k,n,h,Ssim,s1)
double **sigma,**Lambda,**Omega,**Ssim,delta;
int l,k,h,n,s2,f,*misindx,s1;
{
	double **Psi11star,**Psi11,**Psi12,**Psi22,**matrix(),**Psi;
	double **R,**Rt,**tm1,**tm2,**tm3,**tm4,**tm5,**sigma1;
	double **Psi22in,**Phi22,**Psi11starin,**Phi11,cont,cont1;
	double **eta12,**eta12star;
	int i1,i2,i3,tmpi,tmpi2,tpi;
	void mat_inverse(),f_mat();
	FILE *out;

	tmpi = s2*k-l;
	R = matrix(s2*k,s2*k);
	for (i1=0;i1<s2*k;++i1)
		for (i2=0;i2<s2*k;++i2)
			R[i1][i2] = 0.0;
	for (i1=0;i1<s2*k;i1++) {
		tpi = misindx[i1]-1;
		R[tpi][i1] = 1.0;
	}
	tmpi2 = s2*k;
	Phi11 = matrix(tmpi2,tmpi2);
	Psi = matrix(tmpi2,tmpi2);
	for (i1=0;i1<tmpi2;++i1)
		for (i2=0;i2<tmpi2;++i2)
			Phi11[i1][i2] = Lambda[i1/k][i2/k]*Omega[i1%k][i2%k];
	tm1 = matrix(tmpi2,tmpi2);
	tm2 = matrix(tmpi2,tmpi2);
	for (i1=0;i1<tmpi2;++i1)
		for (i2=0;i2<tmpi2;++i2)
			tm1[i1][i2] = R[i2][i1];
	for (i1=0;i1<tmpi2;++i1)
		for (i2=0;i2<tmpi2;++i2)
			for (i3=0,tm2[i1][i2]=0.0;i3<tmpi2;++i3)
				tm2[i1][i2] = tm2[i1][i2]+tm1[i1][i3]*Phi11[i3][i2];
	for (i1=0;i1<tmpi2;++i1)
		for (i2=0;i2<tmpi2;++i2)
			for (i3=0,Psi[i1][i2]=0.0;i3<tmpi2;++i3)
				Psi[i1][i2] = Psi[i1][i2]+tm2[i1][i3]*R[i3][i2];
	f_mat(tm1,tmpi2);
	f_mat(tm2,tmpi2);
	Psi11 = matrix(l,l);
	for (i1=0;i1<l;i1++)
		for (i2=0;i2<l;i2++)
			Psi11[i1][i2] = Psi[i1][i2];
	Psi12 = matrix(l,tmpi);
	for (i1=0;i1<l;i1++)
		for (i2=0;i2<tmpi;i2++)
			Psi12[i1][i2] = Psi[i1][i2+l];
	Psi22 = matrix(tmpi,tmpi);
	for (i1=0;i1<tmpi;i1++)
		for (i2=0;i2<tmpi;i2++)
			Psi22[i1][i2] = Psi[i1+l][i2+l];
	eta12 = matrix(l,tmpi);
	Psi22in = matrix(tmpi,tmpi);
	mat_inverse(Psi22,tmpi,Psi22in);
	f_mat(Psi22,tmpi);
	Psi22 = matrix(tmpi,tmpi);
	for (i1=0;i1<tmpi;i1++)
		for (i2=0;i2<tmpi;i2++)
			Psi22[i1][i2] = Psi[i1+l][i2+l];
	f_mat(Psi,tmpi2);
	for (i1=0;i1<l;++i1)
		for (i2=0;i2<tmpi;++i2)
			for (i3=0,eta12[i1][i2]=0.0;i3<tmpi;++i3)
				eta12[i1][i2] = eta12[i1][i2]+Psi12[i1][i3]*Psi22in[i3][i2];
	tm1 = matrix(tmpi,l);
	for (i1=0;i1<tmpi;++i1)
		for (i2=0;i2<l;++i2)
			tm1[i1][i2] = Psi12[i2][i1];
	tm2 = matrix(l,l);
	for (i1=0;i1<l;++i1)
		for (i2=0;i2<l;++i2)
			for (i3=0,tm2[i1][i2]=0.0;i3<tmpi;++i3)
				tm2[i1][i2] = tm2[i1][i2]+eta12[i1][i3]*tm1[i3][i2];
	Psi11star = matrix(l,l);
	for (i1=0;i1<l;i1++)
		for (i2=0;i2<l;i2++)
			Psi11star[i1][i2] = Psi11[i1][i2]-tm2[i1][i2];
	Psi11starin = matrix(l,l);
	/* out = fopen("jmik","w"); */
	/* for (i1=0;i1<l;i1++) */
	/*	for (i2=0;i2<l;++i2) { */
	/*		fprintf(out,"%5.2lf ",Psi11star[i1][i2]); */
	/*		if (i1%10==9) fprintf(out,"\n"); } */
	/* fclose(out); */
	/* exit(0); */
	if (l != 0) mat_inverse(Psi11star,l,Psi11starin);
	if (l != 0) {
		f_mat(Psi11star,l);
		f_mat(Psi11,l);
		f_mat(tm2,l);
		f_mat(Psi12,l);
	}
	f_mat(tm1,tmpi);
	cont = delta-(double)(s1*k);
	sigma1 = matrix(tmpi2,tmpi2);
	for (i1=0;i1<l;i1++)
		for (i2=0;i2<l;i2++)
			sigma1[i1][i2] = cont*Psi11starin[i1][i2];
	tm1 = matrix(l,tmpi);
	for (i1=0;i1<l;++i1)
		for (i2=0;i2<tmpi;++i2)
			for (i3=0,tm1[i1][i2]=0.0;i3<l;++i3)
				tm1[i1][i2] = tm1[i1][i2]+Psi11starin[i1][i3]*eta12[i3][i2];
	for (i1=0;i1<l;i1++)
		for (i2=0;i2<tmpi;i2++) {
			sigma1[i1][i2+l] = -1.0*cont*tm1[i1][i2];
			sigma1[i2+l][i1] = sigma1[i1][i2+l];
		}
	cont1 = delta-(double)(s1*k-n-f+l+h);
	tm2 = matrix(tmpi,l);
	for (i1=0;i1<tmpi;++i1)
		for (i2=0;i2<l;++i2)
			tm2[i1][i2] = eta12[i2][i1];
	tm3 = matrix(tmpi,tmpi);
	for (i1=0;i1<tmpi;++i1)
		for (i2=0;i2<tmpi;++i2)
			for (i3=0,tm3[i1][i2]=0.0;i3<l;++i3)
				tm3[i1][i2] = tm3[i1][i2]+tm2[i1][i3]*tm1[i3][i2];
	if (l != 0) f_mat(tm1,l);
	f_mat(tm2,tmpi);
	tm4 = matrix(tmpi,tmpi);
	for (i1=0;i1<tmpi;++i1)
		for (i2=0;i2<tmpi;++i2)
			tm4[i1][i2] = Psi22[i1][i2]+Ssim[i1][i2];
	tm5 = matrix(tmpi,tmpi);
	mat_inverse(tm4,tmpi,tm5);
	tm1 = matrix(tmpi,tmpi);
	for (i1=0;i1<tmpi;i1++)
		for (i2=0;i2<tmpi;i2++)
			sigma1[i1+l][i2+l] = cont1*tm5[i1][i2]+cont*tm3[i1][i2]
				+(double)l*Psi22in[i1][i2];
	f_mat(tm1,tmpi);
	f_mat(tm4,tmpi);
	f_mat(Phi11,tmpi2);
	f_mat(tm5,tmpi);
	f_mat(Psi22,tmpi);
	f_mat(tm3,tmpi);
	f_mat(Psi22in,tmpi);
	if (l != 0) {
		f_mat(eta12,l);
		f_mat(Psi11starin,l);
	}
	tm1 = matrix(tmpi2,tmpi2);
	Rt = matrix(tmpi2,tmpi2);
	for (i1=0;i1<tmpi2;++i1)
		for (i2=0;i2<tmpi2;++i2)
			Rt[i1][i2] = R[i2][i1];
	for (i1=0;i1<s2*k;++i1)
		for (i2=0;i2<s2*k;++i2) {
			tm1[i1][i2] = 0.0;
			for (i3=0;i3<s2*k;++i3)
				tm1[i1][i2] = tm1[i1][i2]+sigma1[i1][i3]*Rt[i3][i2];
		}
	for (i1=0;i1<s2*k;++i1)
		for (i2=0;i2<s2*k;++i2) {
			sigma[i1][i2] = 0.0;
			for (i3=0;i3<s2*k;++i3)
				sigma[i1][i2] = sigma[i1][i2]+R[i1][i3]*tm1[i3][i2];
		}
	f_mat(tm1,tmpi2);
	f_mat(Rt,tmpi2);
	f_mat(R,tmpi2);
	f_mat(sigma1,tmpi2);
}
/**********************************************************/
#include <math.h>
#define SIGN(a,b) ((b)<0.0 ? -fabs(a) : fabs(a))
void ord_mat(nvr,eig,mat)
int nvr;
double *eig,**mat;
{
	int i=0,j=0,k=0;
	double tmp=0.0;

	/* sort eigenvalues, carrying the eigenvector columns along */
	for (i=0;i<nvr;i++)
		for (j=i;j<nvr;j++)
			if (eig[j]<eig[i]) {
				tmp = eig[j];
				eig[j] = eig[i];
				eig[i] = tmp;
				for (k=0;k<nvr;k++) {
					tmp = mat[k][j];
					mat[k][j] = mat[k][i];
					mat[k][i] = tmp;
				}
			}
}
/**********************************************************/
/* Householder reduction of a real symmetric matrix to
   tridiagonal form (zero-indexed Numerical Recipes tred2) */
void tred2(a,n,d,e)
double **a,d[],e[];
int n;
{
	int l=0,k=0,j=0,i=0;
	double scale=0.0,hh=0.0,h=0.0,g=0.0,f=0.0;

	for (i=n-1;i>=1;i--) {
		l = i-1;
		h = scale = 0.0;
		if (l > 0) {
			for (k=0;k<=l;k++)
				scale += fabs(a[i][k]);
			if (scale == 0.0)
				e[i] = a[i][l];
			else {
				for (k=0;k<=l;k++) {
					a[i][k] /= scale;
					h += a[i][k]*a[i][k];
				}
				f = a[i][l];
				g = f>0.0 ? -sqrt(h) : sqrt(h);
				e[i] = scale*g;
				h -= f*g;
				a[i][l] = f-g;
				f = 0.0;
				for (j=0;j<=l;j++) {
					a[j][i] = a[i][j]/h;
					g = 0.0;
					for (k=0;k<=j;k++)
						g += a[j][k]*a[i][k];
					for (k=j+1;k<=l;k++)
						g += a[k][j]*a[i][k];
					e[j] = g/h;
					f += e[j]*a[i][j];
				}
				hh = f/(h+h);
				for (j=0;j<=l;j++) {
					f = a[i][j];
					e[j] = g = e[j]-hh*f;
					for (k=0;k<=j;k++)
						a[j][k] -= (f*e[k]+g*a[i][k]);
				}
			}
		} else
			e[i] = a[i][l];
		d[i] = h;
	}
	d[0] = 0.0;
	e[0] = 0.0;
	for (i=0;i<n;i++) {
		l = i-1;
		if (d[i]) {
			for (j=0;j<=l;j++) {
				g = 0.0;
				for (k=0;k<=l;k++)
					g += a[i][k]*a[k][j];
				for (k=0;k<=l;k++)
					a[k][j] -= g*a[k][i];
			}
		}
		d[i] = a[i][i];
		a[i][i] = 1.0;
		for (j=0;j<=l;j++)
			a[i][j] = a[j][i] = 0.0;
	}
}
/*********************************************************/
/* QL algorithm with implicit shifts for a tridiagonal
   matrix (zero-indexed Numerical Recipes tqli) */
void tqli(d,e,n,z)
double d[],e[],**z;
int n;
{
	int m=0,l=0,iter=0,i=0,k=0;
	double s=0.0,r=0.0,p=0.0,g=0.0,f=0.0,dd=0.0,c=0.0,b=0.0;
	void nrerror();

	for (i=1;i<n;i++)
		e[i-1] = e[i];
	e[n-1] = 0.0;
	for (l=iter=0;l<n;l++)
		do {
			for (m=l;m<n-1;m++) {
				dd = fabs(d[m])+fabs(d[m+1]);
				if (fabs(e[m])+dd == dd) break;
			}
			if (m != l) {
				if ((iter++) == 10000)
					nrerror("Too many iterations in TQLI");
				g = (d[l+1]-d[l])/(2.0*e[l]);
				r = sqrt((g*g)+1.0);
				g = d[m]-d[l]+e[l]/(g+SIGN(r,g));
				s = c = 1.0;
				p = 0.0;
				for (i=m-1;i>=l;i--) {
					f = s*e[i];
					b = c*e[i];
					if (fabs(f) >= fabs(g)) {
						c = g/f;
						r = sqrt((c*c)+1.0);
						e[i+1] = f*r;
						c *= (s=1.0/r);
					} else {
						s = f/g;
						r = sqrt((s*s)+1.0);
						e[i+1] = g*r;
						s *= (c=1.0/r);
					}
					g = d[i+1]-p;
					r = (d[i]-g)*s+2.0*c*b;
					p = s*r;
					d[i+1] = g+p;
					g = c*r-b;
					for (k=0;k<n;k++) {
						f = z[k][i+1];
						z[k][i+1] = s*z[k][i]+c*f;
						z[k][i] = c*z[k][i]-s*f;
					}
				}
				d[l] = d[l]-p;
				e[l] = g;
				e[m] = 0.0;
			}
		} while (m != l);
	/* printf("Number of iterations in tqli = %d\n",iter); */
}
/*******************************************************/
#include <malloc.h>
#include <stdio.h>
/*******************************************************/
void nrerror(error_text)
char error_text[];
{
	fprintf(stderr,"Numerical Recipes run-time error...\n");
	fprintf(stderr,"%s\n",error_text);
	fprintf(stderr,"...now exiting to system...\n");
	exit(1);
}
/*******************************************************/
double *vector(nh)
int nh;
{
	double *v;

	v = (double *)calloc(nh,sizeof(double));
	if (!v) nrerror("allocation failure in dvector()");
	return v;
}
/*******************************************************/
int *ivector(nh)
int nh;
{
	int *v;

	v = (int *)calloc(nh,sizeof(int));
	if (!v) nrerror("allocation failure in ivector()");
	return v;
}
/*******************************************************/
double **matrix(nrh,nch)
int nrh,nch;
{
	int i,j;
	double **m;

	m = (double **)calloc(nrh,sizeof(double *));
	if (!m) nrerror("allocation failure 1 in matrix()");
	for (i=0;i<nrh;i++) {
		m[i] = (double *)calloc(nch,sizeof(double));
		if (!m[i]) nrerror("allocation failure 2 in matrix()");
	}
	return m;
}
/******************************************************/
void f_mat(m,nrh)
double **m;
int nrh;
{
	int i=0;

	for (i=nrh-1;i>=0;i--) free((char *)(m[i]));
	free((char *)(m));
}
/******************************************************/
double **dmatrix(nrh,nch)
int nrh,nch;
{
	int i;
	double **m;

	m = (double **)calloc(nch,sizeof(double *));
	if (!m) nrerror("allocation failure 1 in dmatrix()");
	for (i=0;i<nch;i++) {
		m[i] = (double *)calloc(nrh,sizeof(double));
		if (!m[i]) nrerror("allocation failure 2 in dmatrix()");
	}
	return m;
}
/**********************************************************/
#include <math.h>
double det(h,m)
double **h;
int m;
{
	double *dv,*er,tmp,**temp;
	double *vector(),**matrix();
	void tqli(),tred2(),f_mat();
	int i,j;

	temp = matrix(m,m);
	for (i=0;i<m;++i)
		for (j=0;j<m;++j)
			temp[i][j] = h[i][j];
	er = vector(m);
	dv = vector(m);
	tred2(temp,m,dv,er);
	tqli(dv,er,m,temp);
	/* determinant = product of the eigenvalues */
	for (i=0,tmp=1.0;i<m;++i)
		tmp *= dv[i];
	free((char *)(dv));
	free((char *)(er));
	f_mat(temp,m);
	return tmp;
}
/**********************************************************/
void mat_inverse(h,m,invh)
double **h,**invh;
int m;
{
	double **hs,*dv,*er,**lam;
	double *vector(),**matrix();
	void tqli(),tred2(),f_mat(),m_mat(),nrerror();
	int i,j;

	er = vector(m);
	lam = matrix(m,m);
	hs = matrix(m,m);
	dv = vector(m);
	tred2(h,m,dv,er);
	tqli(dv,er,m,h);
	for (i=0;i<m;++i)
		if (dv[i] != 0.0)
			lam[i][i] = 1.0/dv[i];
		else nrerror("trying to invert a singular matrix");
	m_mat(hs,h,lam,m,m,m);
	for (i=0;i<m;++i)
		for (j=0;j<m;++j)
			lam[i][j] = h[j][i];
	m_mat(invh,hs,lam,m,m,m);
	f_mat(hs,m);
	f_mat(lam,m);
	free((char *)(dv));
	free((char *)(er));
}
/**********************************************************/
void invert_matrix(h,m)
double **h;
int m;
{
	double **hs,*dv,*er,**lam;
	double *vector(),**matrix();
	void tqli(),tred2(),f_mat(),m_mat(),nrerror();
	int i,j;

	er = vector(m);
	lam = matrix(m,m);
	hs = matrix(m,m);
	dv = vector(m);
	tred2(h,m,dv,er);
	tqli(dv,er,m,h);
	for (i=0;i<m;++i)
		if (dv[i] != 0.0)
			lam[i][i] = 1.0/dv[i];
		else nrerror("trying to invert a singular matrix");
	m_mat(hs,h,lam,m,m,m);
	for (i=0;i<m;++i)
		for (j=0;j<m;++j)
			lam[i][j] = h[j][i];
	m_mat(h,hs,lam,m,m,m);
	f_mat(hs,m);
	f_mat(lam,m);
	free((char *)(dv));
	free((char *)(er));
}
/***********************************************************/
void m_mat(m0,m1,m2,n,p,m)
double **m0,**m1,**m2;
int n,p,m;
{
	int i,j,k;

	for (i=0;i<n;++i)
		for (j=0;j<m;++j)
			for (k=0,m0[i][j]=0.0;k<p;++k)
				m0[i][j] += m1[i][k]*m2[k][j];
}
/**********************************************************/
void s_mat(m0,m1,m2,n,m,k)
double **m0,**m1,**m2;
int n,m,k;
{
	int i,j;

	if (k==0)
		for (i=0;i<n;++i)
			for (j=0;j<m;++j)
				m0[i][j] = m1[i][j]-m2[i][j];
	else
		for (i=0;i<n;++i)
			for (j=0;j<m;++j)
				m0[i][j] = m1[i][j]+m2[i][j];
}
/**********************************************************/
#include <math.h>
double trigamma(z)
double z;
{
	double retval,rl,d,w,x,ww;
	int i1,i2,j;

	i2 = (int)(z);
	if (i2>20) {
		rl = z;
		ww = rl*rl;
		x = 1-(0.2-1/(ww*7))/ww;
		d = 1/z+(x/(z*3)+1)/(ww*2);
	}
	else {
		/* shift the argument up, then recur back down:
		   trigamma(z) = trigamma(z+n) + sum 1/(z+j)^2 */
		i2 = 20-i2;
		w = z+i2;
		rl = w;
		ww = rl*rl;
		x = 1-(0.2-1/(ww*7))/ww;
		d = 1/w+(x/(w*3)+1)/(ww*2);
		i1 = i2;
		for (j=0;j<i1;++j) {
			w = w-1.0;
			rl = w;
			d += 1/(rl*rl);
		}
	}
	retval = d;
	return retval;
}
/**********************************************************/
double digamma(x)
double x;
{
	double y,retval,s3,s4,s5,d1,r,gamma();

	s3 = 1.0/12;
	s4 = 1.0/120;
	s5 = 0.003968253968;
	d1 = -0.5772156649;
	retval = 0.0;
	y = x;
	if (y<=0.0) return retval;
	if (y<1e-5) return (d1-1.0/y);
	while (y<8.5) {
		retval = retval-1.0/y;
		y = y+1.0;
	}
	r = 1.0/y;
	retval = retval+log(y)-r/2;
	r = r*r;
	retval = retval-r*(s3-r*(s4-r*s5));
	return retval;
}
/**********************************************************/
double gamma(x)
double x;
{
	double gcs[24],pi,sq2pil,xmin,xmax,dxrel,y,sinpiy,gamm;
	double r9lgmc(),csevl();
	int ngcs,i,n,inits();

	gcs[1] = 0.008571195590989331e0;
	gcs[2] = 0.004415381324841007e0;
	gcs[3] = 0.05685043681599363e0;
	gcs[4] = -0.004219835396418561e0;
	gcs[5] = 0.001326808181212460e0;
	gcs[6] = -0.0001893024529798880e0;
	gcs[7] = 0.0000360692532744124e0;
	gcs[8] = -0.0000060567619044608e0;
	gcs[9] = 0.0000010558295463022e0;
	gcs[10] = -0.0000001811967365542e0;
	gcs[11] = 0.0000000311772496471e0;
	gcs[12] = -0.0000000053542196390e0;
	gcs[13] = 0.0000000009193275519e0;
	gcs[14] = -0.0000000001577941280e0;
	gcs[15] = 0.0000000000270798062e0;
	gcs[16] = -0.0000000000046468186e0;
	gcs[17] = 0.0000000000007973350e0;
	gcs[18] = -0.0000000000001368078e0;
	gcs[19] = 0.0000000000000234731e0;
	gcs[20] = -0.0000000000000040274e0;
	gcs[21] = 0.0000000000000006910e0;
	gcs[22] = -0.0000000000000001185e0;
	gcs[23] = 0.0000000000000000203e0;
	pi = 3.14159265358979324e0;
	sq2pil = 0.91893853320467274e0;
	ngcs = inits(gcs,23,5.9604645e-8);
	gamlim(&xmin,&xmax);
	dxrel = sqrt(1.1920929e-6);
	y = fabs(x);
	if (y>10.0) {
		gamm = exp((y-0.5)*log(y)-y+sq2pil+r9lgmc(y));
		if (x<0.) {
			sinpiy = sin(pi*y);
			gamm = -pi/(y*sinpiy*gamm);
		}
	}
	else {
		n = x;
		if (x<0.) n = n-1;
		y = x-n;
		n = n-1;
		gamm = 0.9375+csevl(2.*y-1.,gcs,ngcs);
		if (n>0)
			for (i=1;i<=n;i++)
				gamm = (y+i)*gamm;
		else {
			n = -n;
			for (i=1;i<=n;i++)
				gamm = gamm/(x+(i-1));
		}
	}
	return (gamm);
}
/**********************************************************/
gamlim(xmin,xmax)
double *xmin,*xmax;
{
	double alnsml,alnbig,xln,xold;
	int i;

	alnsml = log(2.9387359e-37);
	*xmin = -alnsml;
	for (i=1;i<10;i++) {
		xold = *xmin;
		xln = log(*xmin);
		*xmin = *xmin- *xmin*((*xmin+0.5)*xln- *xmin
			-0.2258+alnsml)/(*xmin*xln+0.5);
		if (fabs(*xmin-xold)<0.005) break;
	}
	*xmin = -*xmin+0.01;
	alnbig = log(1.7014117e38);
	*xmax = alnbig;
	for (i=1;i<10;i++) {
		xold = *xmax;
		xln = log(*xmax);
		*xmax = *xmax- *xmax*((*xmax-0.5)*xln- *xmax
			+0.9189-alnbig)/(*xmax*xln-0.5);
		if (fabs(*xmax-xold)<0.005) break;
	}
	*xmax = *xmax-0.01;
	*xmin = (*xmin>1.0- *xmax)? *xmin : 1.0- *xmax;
}
/**********************************************************/
double r9lgmc(x)
double x;
{
	double algmcs[7],xbig,xmax,y,z,csevl();
	int nalgm,inits();

	algmcs[1] = 0.166638948045186;
	algmcs[2] = -0.0000138494817606;
	algmcs[3] = 0.0000000098108256;
	algmcs[4] = -0.0000000000180912;
	algmcs[5] = 0.0000000000000622;
	algmcs[6] = -0.0000000000000003;
	nalgm = inits(algmcs,6,5.9604645e-7);
	xbig = 1.0/sqrt(5.9604645e-7);
	y = log(1.7014117e38/12.0);
	z = -log(12.0*2.9387359e-37);
	xmax = (y<z)? exp(y) : exp(z);
	if (x<xbig)
		return (csevl(2.0*(10.0/x)*(10.0/x)-1.0,algmcs,nalgm)/x);
	else
		return (1.0/(12.0*x));
}
/**********************************************************/
double csevl(x,cs,n)
int n;
double x,cs[201];
{
	double b0,b1,b2,twox;
	int i,ni;

	b1 = 0.0;
	b0 = 0.0;
	twox = 2.0*x;
	for (i=1;i<=n;i++) {
		b2 = b1;
		b1 = b0;
		ni = n+1-i;
		b0 = twox*b1-b2+cs[ni];
	}
	return (0.5*(b0-b2));
}
/**********************************************************/
int inits(os,nos,eta)
int nos;
double os[201],eta;
{
	int i,ii;
	double err;

	err = 0.0;
	ii = 1;
	while (ii<=nos && err<eta) {
		i = nos+1-ii;
		err = err+fabs(os[i]);
		ii = ii+1;
	}
	return (i);
}

Appendix C: Figures

Figure 3.1: Locations of gauged sites in Southern Ontario plotted with Census Subdivision boundaries, where monthly pollution levels are observed and Sites 3, 29 (outliers) are not plotted.

Figure 3.2: Locations of selected sites in Southern Ontario plotted with Census Subdivision boundaries, where monthly interpolated pollution levels are needed.

Figure 3.3: Plots for monthly observed and fitted, log-transformed levels of O3 in ppb, SO2, NO2 and SO4 in ug/m3, at Gauged Site 5.

Figure 3.4: Normal quantile-quantile plots for original and log-transformed monthly levels of SO4 in ug/m3 at Gauged Site 4.

Figure 3.5: Plots for autocorrelation and partial autocorrelation of monthly, log-transformed levels of SO4 in ug/m3 at Gauged Site 4.

Figures 3.6 and 3.7: Fitted exponential variograms of the log-transformed data in D-plane coordinates (captions rotated in the scanned source and not fully recoverable).

Figure 3.8: Scatter plot of observed covariances vs predicted covariances obtained by the GS approach.

Figure 3.9: Means of monthly levels of O3 in ppb, in summers of 1983-1988 at gauged sites in Southern Ontario plotted with CSD boundaries.

Figure 3.10: Means of monthly levels of O3 in ppb, in summers of 1983-1988 at selected sites in Southern Ontario plotted with CSD boundaries.

Figure 3.11: Scatter plots for residuals of monthly observed pollutant levels against residuals of interpolated levels at the log-scale in winter and summer respectively, where levels of O3 are in ppb; SO2, NO2 and SO4 in ug/m3.

Figure 3.12: Pollutant-wise scatter plots for residuals of monthly observed pollutant levels against residuals of interpolated levels at the log-scale in winter and summer respectively, where levels of O3 are in ppb; SO2, NO2 and SO4 in ug/m3.

Figure 3.13: Boxplots for predicted, observed and residual levels of log-transformed, monthly concentrations of O3 in ppb, SO2, NO2 and SO4 in ug/m3, respectively.

Figure 3.14: Locations of gauged sites in Southern Ontario plotted with Census Subdivision boundaries, where daily pollution levels are observed and Sites 3, 26 (outliers) are not plotted.

Figure 3.15: Normal quantile-quantile plots for original and log-transformed daily levels of SO4 in ug/m3 at Gauged Site 1.

Figure 3.16: Plots for autocorrelation and partial autocorrelation of daily, log-transformed levels of O3 in ppb at Gauged Site 6, before an AR(1) transformation is taken.

Figure 3.17: Plots for autocorrelation and partial autocorrelation of daily, log-transformed levels of O3 in ppb at Gauged Site 6, after an AR(1) transformation is taken.

Figure 5.1: Plot for trends in AMSPE.

Figure 5.2: MSPEs obtained by Hass's interpolator and the Bayesian interpolator with original acid rain data.

Figure 5.3: MSPEs obtained by Hass's interpolator and the Bayesian interpolator with log-transformed acid rain data.

Figure 5.4: Boxplots of observed nitrate levels with/out log-transformation at 35 sites in the US.
Item Metadata
Title | Bayesian multivariate interpolation with missing data and its applications |
Creator | Sun, Weimin |
Date Issued | 1994 |
Description | This thesis develops Bayesian multivariate interpolation theories when there are: (i) data missing-by-design; (ii) randomly missing data; (iii) monotone missing data patterns. Case (i) is fully discussed both theoretically and empirically. A predictive distribution yields a Bayesian interpolator with associated standard deviation, a simultaneous interpolation region, and a hyperparameter estimation algorithm. These results are described in detail. The method is applied to interpolating pollution data from Southern Ontario. An optimal redesign of a current network is proposed. A cross-validation study is conducted to judge the performance of our method. The method is compared with a Co-kriging approach to which the method is meant to be an alternative. Case (ii) is briefly discussed. An approximation of a matrix T-distribution by a normal distribution is explored for obtaining an approximate predictive distribution. Based on the approximate distribution, an approximate Bayesian interpolator and an approach for estimating hyperparameters by the EM algorithm are described. Case (iii) is only touched on. Only an iterative predictive distribution is derived. Further study is needed for finding ways of estimating hyperparameters. |
Extent | 2465492 bytes |
Genre | Thesis/Dissertation |
Type | Text |
FileFormat | application/pdf |
Language | eng |
Date Available | 2009-06-11 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
IsShownAt | 10.14288/1.0088917 |
URI | http://hdl.handle.net/2429/8976 |
Degree | Doctor of Philosophy - PhD |
Program | Statistics |
Affiliation | Science, Faculty of; Statistics, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 1995-05 |
Campus | UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |