Combing Measurements With Deterministic Model Outputs: Predicting Ground-Level Ozone by Zhong Liu B.Sc., The University of Science and Technology of China, 2001 M.Sc., The National University of Singapore, 2003 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy in The Faculty of Graduate Studies (Statistics) The University Of British Columbia Dec, 2007 Â© Zhong Liu 2007 Abstract The main topic of this thesis is how to combine model outputs from deterministic models with measurements from monitoring stations for air pollutants or other meteorological variables. We consider two different approaches to address this particular problem. The first approach is by using the Bayesian Melding (BM) model proposed by Fuentes and Raftery (2005). We successfully implement this model and conduct several simulation studies to examine the performance of this model in different scenarios. We also apply the melding model to the ozone data to show the importance of using the Bayesian melding model to calibrate the model outputs. That is, to adjust the model outputs for the prediction of measurements. Due to the Bayesian framework of the melding model, we can extend it to incorporate other components such as ensemble models, reversible jump MCMC for variable selection. However, the BM model is purely a spatial model and we generally have to deal with space-time dataset in practice. The deficiency of the BM approach leads us to a second approach, an alternative to the BM model, which is a linear mixed model (different from most linear mixed models, the random ii effects being spatially correlated) with temporally and spatially correlated residuals. We assume the spatial and temporal correlation are separable and use an AR process to model the temporal correlation. We also develop a multivariate version of this model. Both the melding model and linear mixed model are Bayesian hierarchical models, which can better estimate the uncertainties of the estimates and predictions. iii Table of Contents Abstract ^ Table of Contents ^ iv List of Tables ^ vii List of Figures ^ xiii Notations, Abbreviations and Conventions ^ xix Acknowledgments ^ xxii Dedication ^ xxiv 1 Introduction ^ 1 2 Spatial Covariance and Melding Model ^ 7 2.1 Definition and Estimation of The Variogram ^ 7 2.2 A Class of Non-stationary Spatial Models ^ 11 2.3 Kriging ^ 17 2.4 Bayesian Melding Model ^ 18 iv 3 4 5 6 Implementation of the Bayesian Melding Model ^ 23 3.1 Bayesian Melding in Stationary Case ^ 23 3.2 Bayesian Melding in Non-stationary Case 30 3.3 MCMC Algorithm ^ 31 3.4 Spatial Prediction 33 3.5 Extension to Ensembles 3.6 Reversible Jump MCMC 3.7 Bayesian Melding Model Software ^ ^ 34 ^ 37 ^ ^ 40 Testing the Bayesian Melding Model ^ 47 4.1 A Single Deterministic Model ^ 47 4.2 Ensemble Studies ^ 58 4.3 Reversible jumps: the stationary case 4.4 Bayesian Melding in Non-stationarity Case ^ A New Univariate Spatial-Temporal Model 62 ^ 64 ^ 74 5.1 Two-Step Linear Regression ^ 74 5.2 Two-step linear regression with Kriging 79 5.3 A Bayesian Hierarchical Spatial-Temporal Model Multivariate Spatial-Temporal Model ^ ^ ^ 79 93 6.1 Multivariate Spatial-Temporal Model ^ 95 6.2 MCMC Algorithm ^ 98 7 Forecasting and Spatial Prediction ^ 102 7.1 Bayesian Melding Model Prediction ^ 102 7.2 Summary and Conclusions ^ 122 7.3 Spatial-temporal Model Prediction and Forecasting ^ 126 7.4 Multivariate Spatial-temporal Model Prediction and Forecast 136 8 Calibration of Deterministic models ^ 145 9 Application of the Bayesian Ensemble Melding Model . . . 153 9.1 Data Description ^ 153 9.2 Data Analysis ^ 155 10 Dynamic Bayesian Ensemble Melding Model ^ 165 10.1 DeGroot's problem ^ 165 10.2 Dynamic Bayesian Ensemble Melding forecast ^ 167 11 Discussion and Future Work ^ 11.1 Bayesian Melding Model ^ 172 172 11.2 Bayesian Spatial-Temporal Model ^ 175 Bibliography ^ vi 178 List of Tables 4.1 Estimates of a and b with different numbers of model outputs. 52 4.2 Estimates of 0 with different numbers of model outputs. . . . 53 4.3 Estimates of 9 = (Q, p) with different numbers of model outputs. 54 4.4 SSPE (Sum of squared prediction errors) of Kriging and the BM model with different number of grid cells. Since the data is simulated, there is no unit associated to the numbers in this table. ^ 66 4.5 The SSPE when the true values of the parameters (a, b, 13, 0) are used. That is, we use the conditional mean of the unavailable measurements based on the available measurements to predict at the 100 unavailable stations. Since the data is simulated, there is no unit associated to the numbers in this table. ^ 67 4.6 Coverage probability of the credible interval for the simulation study in Section 4.1, with 20 monitored sites and up to 50 grid cells to predict 100 un-monitored sites. The first column is the nominal coverage probability of the credible interval. ^ 68 vii 4.7 Estimates are based on 20 monitored sites and 50 sampling points. There are 25 sampling points in close proximity to one another. The averages are computed over 50 dataset. True values are a = 1.50 and p = 5.00. Since this table is for the estimates of parameters a and p, there is no unit associated to the numbers in this table 68 4.8 Parameter estimates in the simulation study of ensembles of deterministic models in Section 4.2. Columns 3-4 give the averages of posterior means and standard deviations for (x i , b i , i = 1, 2, the calibration parameters of the model output for two deterministic models. as, 1 and ao , 2 are the variances of two model output error processes. The averages are computed over 15 dataset. Since this table is for the estimates of parameters, there is no unit associated to the numbers in this table. Each of these parameters is a scalar. 69 4.9 SSPE of the BM model and Kriging in the simulation of multideterministic models. We have 50 monitored sites and 100 unmonitored ones. There are 20 grid cells each of them having two sampling points inside. Since the data is simulated, there is no unit associated to the numbers in this table. viii 70 4.10 Estimates in the stationary case using reversible jump BM model. Both the "estimates" and "standard error" are averages obtained from 15 independent dataset. Since this table is for the estimates of parameters, there is no unit associated to the numbers in this table. Each of these parameters is a scalar. 71 4.11 SSPEs with different choices of h in non-stationary BM model. Columns 2-6 give the SSPEs for different ratios of the true bandwidth generating the data over the chosen bandwidth in the non-stationary BM model. Since the data is simulated, there is no unit associated to the numbers in this table 72 4.12 Estimates of a and b for varying values of h in non-stationary BM model. Columns 2-5 give the averages of posterior means and standard deviations. The averages are computed over the 15 datasets generated in the simulation study. The true values are a = 5.00 and b = 2.50. Since this table is for the estimates of parameters, there is no unit associated to the numbers in this table. Each of these parameters is a scalar. 73 7.1 Average RMSPE of hourly ozone predictions at 30 stations. Column 1: RMSPE for BM model. Column 2: RMSPE for Kriging using measurements. Column 3: RMSPE for Kriging using model outputs. The unit of the numbers in this table is ppbv. ix 119 7.2 RMSPE (root mean square prediction error) for predicting daily average of ozone levels at 30 stations. Column 1: days. Column 2: RMSPE for BM model. Column 3: RMSPE for Kriging with measurements. Column 4: RMSPE for Kriging with model outputs. The number with * is the smallest number in each row. The unit of the numbers from column 2 to 4 in this table is ppbv. 121 7.3 RMSPE (root mean square prediction error) for predicting weekly average of ozone levels at 30 stations. Column 1: days. Column 2: RMSPE for BM model. Column 3: RMSPE for Kriging with measurements. Column 4: RMSPE for Kriging with model outputs. The number with * is the smallest number in each row. The unit of the numbers from column 2 to 4 in this table is ppbv. 123 7.4 Posterior means and standard deviations of parameters al, â€¢ a5, â€¢ â€¢ â€¢, c 5 , AE, Aa7 aa2 , ac2 , 'a and p. Since this table is for the estimates of parameters, there is no unit associated to the numbers in this table. Each of these parameters is a scalar.140 7.5 The first column gives the indeces of 15 stations where the first 240 hours' measurements are available. The second, third and forth columns are the RMSFE (square root of the mean square forecast error) of the next 240 hours' measurements fore-casted by the Bayesian spatial-temporal model, ad-hoc approach and model outputs. The number followed by * indicates the approach that has the smallest RMSFE in that row. The unit of the numbers from column 2 to 4 in this table is ppbv 141 7.6 RMSPE at ungauged sites 1-35. Column 2: Bayesian spatialtemporal model; column 3: ad-hoc approach; column 4: model outputs; column 5: Kriging with measurements. The number followed by * indicates the approach that has the smallest RMSPE in each row. The unit of the numbers from column 2 to 5 in this table is ppbv. 142 7.7 RMSPE at ungauged sites 36-63. Column 2: Bayesian spatialtemporal model; column 3: ad-hoc approach; column 4: model outputs; column 5: Kriging with measurements. The number followed by * indicates the approach that has the smallest RMSPE in each row. The unit of the numbers from column 2 to 5 in this table is ppbv. xi 143 7.8 Posterior mean and standard deviations of the parameters /10, p and E.i-to i i and Âµa 2 , 1 are the means of intercepts; ,u, 0 , 1 , 2 and po, 1 , 3 , the mean of coefficients of day- and nighttime model outputs of the daytime measurements; P)3,2,2 and A0,2,3, the mean of coefficients of day- and nighttime model outputs on the nighttime measurements. Since this table is for the estimates of parameters, there is no unit associated to the numbers in this table. Each of these parameters is a scalar. 144 7.9 Average of RMSPE and RMSFE for Bayesian multivariate spatial-temporal model (6.1) (columns 2 and 4) and univariate spatial-temporal model (5.5) (columns 3 and 5). The average of RMSPE is over 63 sites and the average of RMSFE is over 15 stations. The unit of the numbers in this table is ppbv. . . 144 8.1 RMSPE of uncalibrated and calibrated model outputs for the prediction of measurements at the 30 stations. column 1: day 1-30; column 2: RMSPE of uncalibrated model output; column 3: RMSPE of BM calibrated model output; column 4: RMSPE of Bayesian spatial-temporal model calibrated model output; The number with a * indicates the "winner" in that row. The unit of the numbers from column 2 to 4 in this table is ppbv. xii 149 List of Figures 4.1 Locations of 120 (20 monitored, 100 to be predicted) sites and points sampled from up to 50 grid cells. Monitored sites (A), sites with responses to be predicted: +, 1: the first 2 points from grid cells, 2: 3-10 points, 3: 11-20 points, 4: 21-30 points, 5: 31-50 points. 50 4.2 Spatial predictions for 100 un-monitored sites by using BM model in the case of 50 grid cells. ^ 55 4.3 MCMC Gibbs samplers of the additive and multiplicative calibration parameters, a and b, respectively, in the case of 50 grid cells. 56 4.4 MCMC Gibbs samplers of coefficients 0 in the case of 50 grid cells ^ 57 4.5 MCMC Gibbs samplers of covariance parameters 0 = (c 7, p) in the case of 50 grid cells. ^ 58 4.6 MCMC Gibbs samplers of error variance 01 and (7,2 in the case of 50 grid cells. ^ 59 4.7 Locations of 150 sites (50 monitored, 100 to be predicted) and 20 grid cells. Stations: A; sites with sites to be predicted: +; grid cells: rectangles. Each grid cell has two sampling points in it. The x-axis is the longitude in degrees, y-axis, the latitude in degrees 60 4.8 The upper plot: the dimension k of the coefficient 3, as a function of the number of MCMC iterations. The lower plot: histogram of the posterior MCMC samples of k 63 5.1 Histogram of Pearson's correlation coefficients between hourly measurements and model output for the 78 stations. ^ 75 7.1 Scatter plots and Pearson correlations between measurements of station #290470003 and model outputs for cell #1847. The y-axis stands for the model outputs of grid cell #1847 and the x-axis stands for the measurements at station #290470003. The station #290470003 is inside grid cell #1847. Observe that the correlation between measurements and model outputs is 0.7. 105 7.2 The upper panel is the 24 side-by-side hourly box plots of measurements at station #10731005. The lower panel is the 24 side-by-side hourly box plots of model outputs (simulated ozone levels) in grid cell #3529. The station #10731005 lies inside grid cell #3529. xiv 108 7.3 Histograms of correlation over space at each hour. The upper histogram is for all the 24 hours. The lower histogram is for the 8-hour day time. Notice the markedly larger model-tomeasurement correlations during the hours 1000-1700. 109 7.4 Locations of 48 available stations, 78 grid cells and 30 unavailable stations. A: available stations. Rectangle: grid cells. +: unavailable stations to be predicted. The x-axis is the longitude in degrees, y-axis, the latitude in degrees 111 7.5 Cumulative plots of the relative errors in percentages between the distance using formula 7.1 and formula 7.2. 112 7.6 Hourly RMSPE (root mean square prediction error) difference between Kriging using measurements and BM versus the correlation over space between measurements and model outputs. Points above the horizontal line in the plot represent victories for the BM model over Kriging. 114 7.7 RMSPE difference between Kriging prediction with measurements and the BM prediction versus the correlation over space between measurements and model outputs. Points above the plotted horizontal line represented victory for the BM model. Notice the supremacy of the BM model when correlation exceeds 0.6. xv 118 7.8 Daily RMSPE difference between Kriging with measurements and the BM model versus the variance of b (defined by formula (7.4) ) across space. Points above the plotted horizontal line represented victory for the BM model. Notice the supremacy of the BM model when the variance of b over space is small. . 119 7.9 Daily RMSPE difference between Kriging with only measurement and the BM model versus the absolute mean difference of b (defined by formula (7.4)) between the measured and unmeasured stations. Points above the plotted horizontal line represented victory for the BM model . Notice the supremacy of the BM model when the absolute mean difference of b over space is small 120 7.10 Plots of partial ACF and ACF for the estimated residual "e, at one station. 128 7.11 Plots of the parameter values in the Gibbs sampling iterations. Upper left panel: a l ; upper right panel: rho; lower left panel: cre2 ; lower right panel: A, 130 7.12 Plot of measurement and the forecast and with 90% credible interval using the Bayesian spatial-temporal model at one of the 15 stations. 131 7.13 Plot of measurement and the spatial prediction with 90% credible interval from the Bayesian spatial-temporal model at one of the 63 ungauged sites xvi 133 7.14 Plot of distance versus correlation for the normalized residuals from model 5 134 5 8.1 Measurements versus the uncalibrated and calibrated model output of 30 grid cells on day 2. The solid line shows the measurements inside the grid cells. The dotted and dashed lines are the uncalibrated and calibrated model outputs respectively. 148 8.2 Scatter plots of uncalibrated model outputs versus BM calibrated model outputs. The solid lines have intercept 0 and slope 1. The plots are for days 2, 7, 8 and 26 from the upper left to the lower right corner in left-to-right sequence. . . . . 150 8.3 Scatter plots of uncalibrated model outputs versus Bayesian spatial-temporal model calibrated model outputs. The solid lines have intercept 0 and slope 1. The plots are for days 2, 7, 8 and 26 from the upper left to the lower right in left-to-right sequence. 151 8.4 Histograms of posterior means of a o , a l , a 2 and b during the 30 days on which the current study focuses on. 152 9.1 Histograms of 102 hours' Pearson's correlation coefficients between measurement and model output over space. Panels 1-5 are the histograms for model output from five different MM5 initial conditions xvii 156 9.2 Histograms of posterior mean of a. Panels 1-5 are the histograms of a for model output from five different MM5 initial conditions 161 9.3 Histograms of posterior mean of b. Panels 1-5 are the histograms of b for model output from five different MM5 initial conditions 162 9.4 RMSPE histograms of predictions by BEM, model averaging with equal weight and Kriging with measurements only 163 9.5 The x-axis represents the mean correlation coefficients between measurement and model output, i.e. the mean of the five correlation coefficients between the measurement and five model outputs. x v iii 164 Notations, Abbreviations and Conventions We follow the widely used conventions throughout the thesis. Latin uppercase letters, often X, Y, Z, sometimes with subscripts such as s, t, are used for random variables. Bold Latin upper-case letters, often X, Y, Z, are used for random vectors. Bold Greek lower-case letters stand for parameter vectors, e.g., /3, 0. For a vector or matrix, the transpose is indicated with a superscript of T. List of notation and abbreviations: (â€”cc, co) distributed as a proportional to approximate to 0 Kronecker product I identity matrix multivariate normal density function xix 7r(â€¢) prior distribution X1Y random variable X conditional on random variable Y a Estimate of a N o-2 ) univariate normal distribution with mean it and variance cr 2 MVN(p, E) multivariate normal distribution with mean pc and varinace matrix E probability function of the standard normal distribution 7 ( 3) prior distribution of parameter i.i.d independently and identically distributed AR(p) autoregressive of order p E(X) expectation of random variable X Var (X) variance of random variable X Cov(X, Y covariance of random variables X and Y {Z} a stochastic process a location in a geographic region B a grid cells of the deterministic model outputs in a geographic region x also indicate a location in a geographic region in Section 2.2 also indicate a location in a geographic region in Section 2.2 a distance vector between two locations in a geographic region the covariance function the variogram function variance of a stochastic process also known as the sill parameter range parameter of a covariance function C(.) also represent auto-regressive parameter in Chapters 5, 6 and Sections 7.3 and 7.4 xx t^indicate time in hours in Chapter 5 and Section 7.3, in days in Chapter 6 and Section 7.4 nugget parameter of a covariance function C(.) K(.)^the kernel function ANOVA^analysis of variance 1 2 1^measurement process over the space {2}^deterministic model outputs process over the space "distributed as" or "has its distribution in" "approximate to" f (x)^density function of random variable X = x MCMC^Markov chain Monte Carlo sd^standard deviation ppbv^part par billion by volume BM^Bayesian melding BEM^Bayesian ensemble melding DLM^dynamic linear model RMSPE^root mean square prediction error RMSFE^root mean square forecast error SSPE^sum square prediction error DBEM^dynamic Bayesian ensemble melding model output deterministic model outputs station^gauged site where measurements are available xxi Acknowledgments There are so many people to thank. First of all, I thank my two supervisors, Jim Zidek and Nhu Le, for their academic and financial support. Without their patient coaching and continual encouragement, there is no way I can finish my thesis. Both of them are great mentors and I feel truly honored to work with them. I want to thank my committee member Paul Gustafsonâ€ž for his great feedback on my work. Thank you to Jiahua Chen and Douw Steyn who kindly agreed to be University Examiners at my defense. I also thank to the current and previous office staff, Christine Graham, Rhoda Morgan, Elaine Salameh, Peggy Ng and Viena Tran, for all their help with administrative matters. I am also indebted to my fellow graduate students (Lawrence McCandless, Jean-Franois Plante, Kenneth Lo, etc) for many helpful talks on my research. Special thanks to Mike Danilov, for his patience to answer my many questions about computing issues. Outside the department, I am grateful to Fahimah Al-Awadhi who helped me to understand how reversible jump MCMC works. Thank you also to Prasad Kasibhatla who provide ozone air pollution data and MAQSIP modeling outputs. Finally, a note of thanks to Tilmann Gneiting for very generously pointing the way to the MURI ensemble daatset. On a personal note, I want to say thank you to my family and friends for their continual love and support. I am so lucky to have my mom who always encouraged me to pursue what I truly wanted in my life even when it made life difficult for her. Finally, I thank my girl friend, Jiaoyang Jia, for her love and support. She has always had a faith in me even when I lost faith in myself. Dedication To my father, Renxi Liu (1949-2001), whose love is with me always. Chapter 1 Introduction Deterministic models have been widely used in scientific research because they have a number of advantages. First and foremost they, unlike statistical models, do incorporate prior knowledge about the underlying processes. The deterministic models usually are made of a series of differential equations with boundary conditions. Since the boundary conditions and inputs of the deterministic models are adjustable, they enable scenario analysis under hypothetical changes to an existing regime, say as a result of abatement strategies. They enable computer experiments to be run when it is not feasible to make the phenomenon they represent happen in real life due to ethical or other constraints. Such experiments can enable the study of input - output relationships and possibly suggest optimal control settings for use in the real phenomenon they represent. One of the most important features of the deterministic models is that repeated runs yield the same outputs with fixed input. That is why they are called "deterministic". Unlike statistical models, they do attempt to capture the fundamental dynamic processes that govern the phenomenon of interest. The complexity of the equations that describe such dynamics can render analytic solution impossible and force the 1 use of numerical models, typically with lots of complicated computer code and slow running times. In this thesis, we are interested in the deterministic models used in climate and environmental studies, one example being the RCM (Regional Climate Model) described in Caya et al. (1995). Another example, the MAQSIP (Multiscale Air Quality Simulation Platform) model is described in Odman and Ingram (1996). The chemical transport model (CTM) called GEOS-CHEM (Goddard Earth Observing System-Chemistry), which models hourly ozone fields, is suggested in the current review of NAAQS (National Ambient Air Quality Standards) as a method for estimating ozone's policy relevant background (PRB) level (Garner et al. (2005)). More details about the various versions of GEOS-CHEM models can be found at http://www-as.harvard.edu/chemistry/trop/geos/ . The PRB level, a baseline for the NAAQS, is the ozone concentration that would obtain if there were no anthropogenic sources of ozone in North America. The PRB cannot be measured but has to be modeled because even remote areas in North America is polluted by ozone which drifts from other places. For example, the monthly maximal hourly ozone concentration level in the Yellowstone national park (WY) is over 50 ppbv (parts per billion by volume) in most months from 1998 to 2001 (Figure 3-25a in Garner et al. (2005)). This ozone concentration level is not low even compared with urban areas. In the EPA report Garner et al. (2005), the current deterministic models estimate the PRB level is 15-35 ppbv. Moreover the deterministic models can generate 2 useful byproducts such as estimates of the fraction of the ozone field from the vertical transport from the stratospheric ozone to the troposphere. From now on, we also refer the deterministic model outputs as model outputs for simplicity. Deterministic models simulate the ozone air pollution on grid cells with certain resolutions. However, the measurements are obtained at each monitoring station over the space. So we cannot compare the model outputs with the measurements directly. Nevertheless, we can examine whether there is correlation between model outputs and the local measurements. If there is such correlation, we can use statistical models to combine the measurements with model outputs to achieve better prediction. In addition, we can also calibrate deterministic model outputs for predicting the ozone concentration levels Edwards (1967) and Speed (1983) have started studying the deterministic model outputs by using statistical model. Sacks et al. (1989) approximate the experiment outputs with stochastic processes and choose input values to optimize some functions of the outputs. Poole and Raftery (2000) use a Bayesian model to combine different priors for the input to better evaluate the uncertainties of the model outputs. Ramsay et al. (2007) use non-parametric methods to estimate the parameters of differential equations in a deterministic models. However, the complexity of the deterministic models such as MAQSIP sometimes may stop us working on those differential equations directly. So we simply treat the deterministic models as black boxes. We 3 only focus on the model outputs without learning details of the differential equations in those deterministic models. Kasibhatla and Chameides (2000); Hogrefe et al. (2001b,a) study deterministic model outputs in meteorological and environmental science contexts by using traditional scatter plots and least squares analysis of measurements and modeling outputs; moreover they examining the correlation between measurements from monitoring stations and deterministic modeling output. Guttorp and Walden (1987) address these issues within a statistical framework. Of particular interest is the work of Fuentes et al. (2003) and Fuentes and Raftery (2002, 2005), in which the Bayesian melding (BM) model is used to combine measurements and modeling output while respecting their intrinsic differences. The combination highlights the discrepancies and similarities between the values. The BM model can be used to predict the phenomenon of interest and calibrate the deterministic models by using both measurements and modeling output. Note that the terminology "Bayesian melding" was first proposed by Poole and Raftery (2000). To avoid confusion, from now on in this thesis, the Bayesian melding (BM) model specifically refers to the model proposed by Fuentes and Raftery (2005). Another Bayesian model is proposed by Sanso and Guenni (2002) who like Fuentes and Raftery (2005), postulate a "true underlying process" but it is specifically concerned with rainfall and uses a truncated model for the relationship between the true underlying process and the data. Berliner (2003); Wikle and Berliner (2005) also talk about how to use Bayesian models to combine information 4 from different sources. The BM model is built on a Bayesian framework, which enables us to combine measurements and modeling outputs from more than one deterministic models. This extension of the BM model to include model outputs from more than one deterministic models is of interest because usually an ensemble of deterministic models is used in weather forecasting. Raftery et al. (2005) uses a Bayesian model averaging approach to combine model outputs from deterministic models in different initial conditions to give a more accurate weather forecast. The Bayesian ensemble melding (BEM) model can provide an alternative to that approach. Although the BM model can be used to combine measurements with model outputs, it was designed as a spatial model which cannot handle the temporal correlation appearing in most environmental or climatic data. The ozone data used by Kasibhatla and Chameides (2000) have the hourly measurements and model outputs of the ozone concentration level in eastern and central USA, which has very strong temporal auto-correlation. It is desirable to find an alternative which are spatial-temporal models. Wikle and Cressie (1999); Wikle and Royle (1999); Wikle et al. (2001); Wikle (2001) has proposed spatial-temporal models based on various approaches such as dynamic kalman filter, kernel-based spectral analysis or Bayesian hierarchical model. Guillas et al. (2006) provides a simple but powerful regression approach which enables us to analyze the data with temporal correlation. We first extend this regression approach to a more mathematically rigorous Bayesian spatial-temporal model. Then we extend this Bayesian spatial5 temporal model further to a multivariate model. The rest of the thesis is organized as follows. Chapter 2 gives the BM model specifications. Then Chapter 3 presents the MCMC algorithm to fit the BM model. As an byproduct, we develop the R code needed to implement the BM model under different scenarios. We also include a description of the code in Chapter 3. We test the BM model with a comprehensive simulation study in Chapter 4. In the simulation study, we investigate the prediction and calibration performance of the BM model in various scenarios. As an alternative to the BM model, Chapter 5 and Chapter 6 present the Bayesian spatial-temporal models in univariate and multivariate cases. Chapter 7 summarizes all the forecast and spatial prediction results for the BM model and univariate/multivariate Bayesian spatial-temporal models. Chapter 8 specifically discusses how we calibrate the deterministic models by using the BM and Bayesian spatial-temporal models. The data source for the analyses in Chapter 7 and 8 are the ozone data used by Kasibhatla and Chameides (2000). Chapter 9 presents an application for the BEM model of the temperature data used by Raftery et al. (2005). Chapter 10 presents a dynamic BEM model which extends the BEM to a spatial-temporal model. This thesis concludes with Chapter 11 in which we discuss some future research topics in assessing or calibrating deterministic models used in climatic or environmental studies. 6 Chapter 2 Spatial Covariance and Melding Model This chapter first presents some background knowledge of the stationary and non-stationary spatial covariance structure. Then it presents the specifications of the BM model proposed by Fuentes and Raftery (2005); Fuentes et al. (2003); Fuentes and Raftery (2002). 2.1 Definition and Estimation of The Variogram Suppose we have a real-valued random process {Z(s) : s E D}, which is observed at locations {s i : i = 1, ..., n} over a geographic region D c R d , d being a positive integer. The random process {Z} is defined as second order 7 stationary if it satisfies for all s E(Z (s + h) â€” Z(s)) = 0, Var(Z(s + h)) = Var(Z(s)), Cov(Z(s + h), Z(s)) = C(h), where C(â€¢) is a covariance function. In other words, the correlation between responses at two locations depends only on their degree of separation. In addition, the first and second order moments of the random process Z(s) are the same for all s. Furthermore, if C(h) = C (Ilk), where IA I is the length of h, the covariance function C(.) is called isotropic. Another very important quantity used in spatial statistics, the variogram, is related to the spatial covariance between two locations. We have Var(Z(s + h) â€” Z(s)) = -y(h), 7(h) being known as the variogram. The classical estimator of the variogram proposed by Matheron (1962) is obtained by using the method of moments. The estimator is 7(h)^ 1 I (z(si) â€” z(s i ))2, (2.1) N(h) where the sum is over N(h) = { (i, i) h â€” 5 < ps i â€” s j i < h} and IN(h)1 8 is the number of distinct elements in N(h ). Although unbiased, this classical estimator is highly affected by outliers due to the squared term in the summation. Hawkins and Cressie (1984) presented a more robust estimator ;11(h) , ^Z(8,011/214 I N(h)1E N (h)1 Z (5 0.457 + 0.494 (2.2) From our definition of the variogram, it is clear that 7(0) = 0. However, more generally 7(h) T > 0 maybe allowed as h ---+ 0, in which case r is the called nugget effect by Matheron (1962). One of the sources of the nugget effect is measurement error and commonly measurements are assumed to be the ground truth Z(s) plus some measurement error. The book by Cressie (1993) gives more details about the nugget effect. The covariance function only needs to ensure the covariance matrix it generates is positive definite and symmetric. In the book Stein (1999), Bochner's theorem specifies that the positive definite covariance functions are those which are Fourier transforms of non-negative Borel measures. One commonly used covariance function proposed by Matern (1960) has the following form: Co (d) = 2 , _ lar ( v ) ( 2 1/ 1/ 2 1(11/PY IG( 2v 1/ 2 4 1 / P), d being the distance between any two locations. For convenience, we let 9 = (a, p, v). Here a, the sill parameter, represents the variance of random process Z(s) while p, the range parameter, determines how fast the correlation decreases when distance d increases and v, the smoothing parameter 9 controls the smoothness of the covariance function. is the modified Bessel function of type III as described by Abramowitz and Stegun (1972). Matern covariance functions have the advantage of flexibility. Certain choices of the smoothing parameter v reduce it to simple well known covariance functions. For example, v s = 1/2 gives the so-called exponential covariance function, C 0 (d) = 1 a exp(-1d1/p) if 1 di > 0; (2.3) if Id! = O. The nugget parameter is T. It is obvious that the covariance functions defined above are only suitable for stationary random fields because we assume the covariance C0 (d) is only a function of the distance d. However, in many cases, it is not reasonable to assume second order stationarity. In recent years, the non-stationarity problem has received a substantial amount of attention. One of the earliest and most important papers is due to Sampson and Guttorp (1992), who obtain a semi- parametric estimate of the non-stationary spatial covariance function by transforming the original geographical map into another, the deformed plane. However that deformation method relies on repeated measurements to give an empirical estimate of the variogram, a serious limitation in the context of geostatistics, for example, where typically just one realization of a space - time process is available. Haas (1995, 1998) circumvents that problem with a moving windows method as does Higdon et al. (1999) with a convolution approach although these methods may require data from a large number of monitoring sites to be effective. Other 10 non-stationary papers include Paciorek and Schervish (2004); Nychka et al. (2002); Paciorek and Schervish (2006). The paper of Gelfand et al. (2004) considers the non-stationarity problem for a multivariate case. Fuentes and Smith (2001) propose a class of non-stationary spatial models used in Fuentes et al. (2003), Fuentes and Raftery (2002), Fuentes and Raftery (2005). This approach's appeal derives not only from its circumvention of the need for replicate measurements but as well, from its simple intuitive, easy to understand formulation as well as the ease with which it can be implemented. More details about the convolution approach are presented in the next section. 2.2 A Class of Non-stationary Spatial Models Calder and Cressie (2006) gives an excellent summary of the convolutionbased spatial models which become more popular recently because of its flexibility. Yaglom (1987) stated that a stationary spatial process {Z(s)} can be written as Z(s) = f K (s , u)W (du),^(2.4) Rd where K(s, u) is a kernel function satisfying f K 2 (s, u) < M < Do and W(.) is a Brownian motion. The covariance function of the process {Z(s)} is C (Z (s 1 ) , Z (2 2 )) = f K (s Rd 11 1 , u)K (s 2 , u)du. To represent a non-stationary spatial process, we can modify 2.4 by either assuming the kernel function K(s, u) or the Brownian motion process W(.) depending on location s. Higdon et al. (1999) use the convolution approach with a spatially dependent kernel function to represent non-stationary process. That is, Z (s) = f K 8 (2 , u)W (du), (2.5) Rd whose covariance function is C(Z(si), Z (s2)) = fR d K 81(si, u)K 8 2 (.92, u)du. Higdon et al. (1999) use Gaussian kernel function so that the resulting nonstationary process has a Gaussian covariance function. Departing from Higdon et al. (1999), Fuentes and Smith (2001) assume the kernel function is constant and convolves stationary process as opposed to Brownian motion process. This section shows how we can use the approach of Fuentes and Smith (2001) to model the non-stationary process in detail. This approach assumes the existence at a number of locations of unobservable, independent, stationary random processes that need not have any physical meaning. Their existence only helps us to construct the observable non-stationary process so they do not need to have any physical meaning. The observed non-stationary process is represented as a weighted average or convolution of these latent processes. Each latent random process has its own covariance parameters which vary from one latent process (location) to 12 another. The weight attached to each of the stationary processes in the representation of the observed non-stationary process depends on its location. So the covariance of the process between any two locations depends not only on their distance d but also on their locations. The rest of this section gives a detailed account of the non-stationary model of Fuentes and Smith (2001). We include it to enable us to introduce the detailed MCMC algorithm used to fit this model in Section 3.2. Suppose the latent stationary processes are Z, i = 1, â€¢ â€¢ â€¢, K, with Cov(Z(s), Z3 (s)) = 0 for i^j. Then the observed process Z(s) can be expressed as Z (S) = Ez ( s),(s), w2 =11 i i=, i=1 in which w z is the weight attached to stationary process Z. The above equation can be extended to an integral by replacing the weight with a kernel function as follows: Z(x) = f K(x â€” s)Z9( 8 )(x)ds, where K(X â€” s) is a kernel function. So, the weight of latent process Ze ( s ) depends on the location difference vector between x and s. The spatial covariance parameter vector of the latent stationary process {4 8 )} is 9(s) depending on its center location s. That is why these latent stationary processes are called "locally stationary". 13 Since the covariance between Ze(8) (x l ) and Zo ( 8) (x 2 ) is Cov (Ze( 8 )(x l ), Zo( 8 )(x2)) = Ce( 8 )(x l â€” x2), the covariance between Z(x l ) and Z(x 2 ) in the non-stationary process Z can be expressed as a convolution of the local covariance Ce(,)(xi â€” x2): C( Z (x i ), Z(x2)) = f K(xi â€” s)K(x 2 â€” s)Co(8)(xi â€” x2)ds.^(2.6) The process {Z(x)} is non-stationary because the covariance between Z(x i ) and Z(x 2 ) depends on the locations x 1 and x 2 . The kernel function can be any of the form K(u) = +1 2 K0 (3 ',), K0 being - 1 - any non-negative function with integral 1. Fuentes and Raftery (2005) use the following kernel function 3 Ko(u) = 4(1 â€” u 21)+71 ( 1 â€” uP+, with u = (u 1 , u 2 ) and in general, a + = max{0, a} for any scalar valued quantity a. This kernel function is chosen because it is easy to compute and it has a compact support. However, other kernel functions can be used as well. The choice of kernel function is not as important as the bandwidth parameter h, which can be any positive scalar valued quantity subject to certain restrictions explained later in this section. Also in Section 4.4, a simulation is carried out to examine the effect of different choices of the 14 bandwidth h on prediction and parameter estimation. Our choice of the kernel function implies that for a given pair of locations x 1 and x 2 , only the local stationary process whose center location s is within circles with origins x 1 , x 2 and radius h will have an effect on their covariance. In the paper by Fuentes and Smith (2001), the integral (2.6) is approximated by an average. First, the center locations s m , m = 1,â€¢ â€¢ â€¢, M of those latent stationary processes are chosen as points on a regular grid over the map. The kernel integral (2.6) is then replaced by M CM(x1, x2 9) = m-i ; E K(x, - .577,)K-(x2 - 8,72)c, 8,, (xi - x2). ( ) (2.7) m=.1 In the non-stationary model, the parameter 9 of the latent stationary processes is a function of its center location s. This function can be smooth but instead Fuentes and Smith (2001) assume an additive "ANOVA" (analysis of variance) type model for 9 in terms of center locations s. The stationary points are points of a regular grid over the map. For example, for the exponential covariance function, with just two parameters, the sill a and the 15 range p, the "ANOVA" forms would be: o i , = a + r i^ci^Ei,3 ^ - 1, â€¢ â€¢ = a + ri + + E;,9 =^â€¢ â€¢, n2 ^ (2.8) â€” (a, rl, â€¢ â€¢ ,^â€¢ â€¢, cri,) Ap =^â€¢â€¢,rn'i,ci,â€¢ â€¢ 7 c712) rsi NIVN(0, E(70-, n o-)) MVN(0, E(rp , 71p )). In the above model, n 1 and n 2 are the numbers of points in the horizontal and vertical directions respectively. As well, 7- 2 and c3 are main effects of center location's longitude and latitude on a. The main effects of the center location's longitude and latitude on p are 7.: and c'3 . In statistics,a design matrix is a matrix that is used in certain statistical models such as ANOVA model. It contains indicator variables (ones and zeros) that indicates the group membership. The design matrix for sill a has a dimension of (n 1 x n 2 ) x (1 + n i + n2). The elements in the first column of the matrix are all 1. For columns k E [2, n i + 1], only the elements from rows 1 E [(k â€” 2) x ni + 1, (k â€” 1) x n i l are 1. For columns k E + 2, n i + n 2 ], only the elements from rows 1 E [(k â€” n 1 â€” 1) x n 1 + 1, (k â€” n i ) x n i ] are 1. All the other elements in the design matrix are 0. In this hierarchical model, a and p might well have normal prior distributions. Then the hyper-parameters would be r i ri c3 , c; and y0., 77,, rp , 77,. , 16 , This "ANOVA" model is flexible because it does not assume a parametric function of a and p over the space. However, the normality assumption of a and p could be a problem because these two parameters should always be positive. An alternative is to model them with log-normal distributions which ensure cr's and p's are positive. Section 3.2 explores an algorithm for fitting this non-stationary model. 2.3 Kriging Kriging is a celebrated geostatistical approach to spatial interpolation in the point-referenced data setting, a name given to it by Matheron (1963) in honor of D.G. Krige, a South African mining engineer. Kriging, the best linear unbiased predictor (BLUP), weights the available observations in accordance with the distance between the locations where they are made and that of the response to be interpolated. These BLUP weights are obtained by minimizing the variance of the interpolation error assuming the spatial covariance is stationary and known. Given measurements of a random field Y = (Y(s i ),..., Y(s,,)) T at locations (s i , â€¢ â€¢ â€¢, s n ), the question is how to predict the random variable Y at a new site s o . The simplest Kriging approach, ordinary Kriging, assumes E(Y, o ) = = i = 1, â€¢, n. The interpolated value at s o is Y* (s o) =^wiY i=r 1 17 ( s , ,) tv, being the weight of Y(s,) subject to the constraint En 1 w = i i 1. Obvi- ously, Y*(s o ) is an unbiased estimator of Y(8 0 ). So, to minimize the MSE (mean square error) of Y*(2 0 ), we only need to minimize its variance, 2 aE = E [(Y* (s o ) â€” Y (so)) 2 ] n = â€” 7( 8 0 â€” so) â€” E^wituf-y(siâ€”si)-1- 2 Ewe-y(8,â€”s.). Besides ordinary Kriging, universal Kriging assumes the mean of the process depends on other variables such as the coordinates. Co-Kriging is a multivariate extension of Kriging. In practice , the spatial covariance function is . estimated empirically by (2.1) or (2.2). Its simplicity and ease of computation has made Kriging very flexible and hence a valuable tool in situations where it is applicable. The Kriging method is available in the R package "geoR" developed by Ribeiro and Diggle (2001), that we use to implement the method in our simulation studies and data analyses later in this thesis. 2.4 Bayesian Melding Model Although Kriging is easy to implement, Kriging relies on a known variogram to compute its weights, those estimated parameters as fixed. So the true uncertainty in interpolation is underestimated. The simulation study in Section 4.1 demonstrates that disadvantage. More importantly, grid cell data from a deterministic model will be on 18 coarser scales of resolution than the micro scale on which measurements are made. For example, the deterministic AQM studied in this thesis output its hourly ozone concentration data at a resolution 6 x 6 km 2 . The mismatch of scales leads to a need to re-calibrate the simulated data when combining it with the data for interpolation. While this could be done in an ad hoc fashion with Kriging, it is not designed to deal with that issue in a fundamental way. In contrast to Kriging, the BM model, developed and studied by Fuentes and Raftery (2005), Fuentes et al. (2003), as well as Fuentes and Raftery (2002), is designed to do that. It combines measurements and deterministic model outputs in a Bayesian framework that enables calibration of deterministic model outputs and spatial prediction including interpolation. BM model links processes with responses on mismatched scales through an underlying true process {Z(s) : s E R D } called the "truth", D being the dimension of the domain. If the location s only has longitude and latitude then D = 2, while if the location also has altitude then D = 3. The underlying true (latent) process, being unobservable, must itself be estimated. Denote the measurement process by {2(s) : s E and the determin- istic model output process by {2(B)}, B being the grid cell. To match Z(s), we also hypothesize the existence of deterministic model output process {Z(s) : s E V} based on locations s. Of course, the purely conceptual process {2(s)} does not actually exist. Its purpose: to link the model output with the truth at the micro- scale, thereby enabling its representation as an integral of {Z(s)}. The truth thus serves as the common basis for both 19 processes {Z(s)} and {2(s)}. The Bayesian model has the following mathematical form: , 2(s) = Z(s) + e(s) Z(s) = ,u(s) + â‚¬(s) Z(B) = T3 -1- fB Z(s)ds 2(s) = a(s) + b(s)Z(s) + 8(s) (B) =^fB a(s)ds + if13-1 - Bf ^113 b(s)Z(s)ds + 1 fB 8(s)ds it(s) = X(s)0.^ (2.9) In the above model, the measurements error and model output error are independent of each other. The measurement errors, e(s), are independent and identically distributed, having a normal distribution N(0, cr e2 ). The model output errors, 8(s), are independent and identically distributed with a normal distribution N(0, ag). The spatially correlated residuals, â‚¬(s), have zero mean and covariance matrix E(0), where 0 is the covariance parameter vector. The number of locations is rt. Z(B) and (B) are integrals of Z(s) and 2(s) over grid cell B. We only observe realizations of process 2(.9) and Z(B) at measured stations and grid cells for model outputs. The mean of the true underlying process is ,u(s) = X(s)/3, where X(s) is a function of the coordinates at location s and is the corresponding coefficient vector. We use polynomial function although other functions can also be used. The polynomial function can have degree such as 1, 2 up to 3 as 20 necessary. The covariance matrix E (0) is constructed with some covariance functions having a parameters vector 0. (See the previous section for a detailed discussion of covariance functions in stationary and non-stationary cases.) Throughout this thesis, we use the exponential covariance function (2.3). However, the software we have developed and provided can use either the Matern, Gaussian or exponential covariance function. In BM model (2.9), measurement 2(s) is modeled as the truth Z(s) plus measurement error e(s) and model output Z(s), as Z(s) times a multiplicative calibration parameter b(s) plus additive calibration parameter a(s) and random error 6(s). In general, we can assume that the calibration parameters are functions of the location of s, namely b(s) and a(s) in the model specification, to take into account their variability with respect to locations. For simplicity, we assume a and b are constants throughout the rest of this chapter for derivation and in the simulation study. But later in the data analysis, we assume a is a function of the coordinates of the location s and b still remains constant. The reason to keep b constant is that a varies over space much more than b. Fuentes and Raftery (2005) suggest and we use a Monte Carlo method to approximate the integrals in model (2.9). For example, we could sample L points 81,B, â€¢ â€¢ â€¢ , sL,B within grid cell B. From now on, we call these points "sampling points" to distinguish them from stations. Then, Z(B) can be approximated by the average of the values of process evaluated at the sampling 21 points within B: ( 22 2.10) Chapter 3 Implementation of the Bayesian Melding Model This chapter presents the details of how to fit the BM model (2.9) by using the Gibbs sampling algorithm proposed by Gelfand and Smith (1990). First, we consider the BM model with stationary spatial covariance structure, then we show how to fit the model with non-stationary spatial covariance structure. We also include an extension to the model (2.9) to combine measurements with model outputs from an ensemble of deterministic models. This chapter also presents how to incorporate reversible jump MCMC into the BM model. Finally, we give a description of the R package we have developed to implement the BM model in different scenarios. 3.1 Bayesian Melding in Stationary Case The Bayesian paradigm primarily seeks the posterior distribution of all the unknowns given the data as a description of their uncertainty. These unknowns include the random error variances (r', tea, coefficient vector /3, the 23 true underlying process Z, calibration parameters a, b and covariance parameter vector 0. We use vector Z to stand for realizations of the true underlying process at the stations and sampling points within grid cells. If we have n stations and m grid cells, the dimension of Z is n + m x L. The dimensions of the measurements Z and model outputs Z are n and m respectively. Let H = {X, Z Z} represent all the data, where X is the covariate matrix in , model (2.9). By using the above notation, the joint distribution of all the unknowns and available data can be decomposed as follows: p(2 , Z , Z , 13, 0, a, b, = p(k I Z, oDp(Z I Z, a, b, ol)p(Z l e 3 , 0)p(c4, al, Q, 0) (1/ E1^AOZ)43E2^â€” a â€” bA l Z)cl) E 3 (Z â€” X 13 )P(anP( 01)P(13 )A(201)) Note that given Z, Z and Z are independent. In (3.1), 1. E (Âµ) stands for the multivariate normal density with mean vector ea and covariance matrix E. We take the components of (0, 0, a e2 , (7,2 ) to have independent prior distributions, that is, p(o- , ate, 0 ) = P(a e2 )P(a 2c )P(0)P( 0 )^The matrices A o and Al are Ao 1^... 0 0^... 0 0^ 1 0^ = . . .^ 24 . . .^ 0 / nx(mL+n) and / 0^. ..^0^1- 1 L Al = 0 . .^0^0^...^0 â€¢ 1^1 L^â€¢ â€¢ L / mx(mL+n) In matrix A o the first n columns form an identity matrix and all other elements are all zero. In matrix A 1 , the elements in row i are -If from column n + 1 + (i â€” 1) x L to n + 1 + i x L and all other elements are all zero. We get (3.1) by using the approximation (2.10). E 1 = o-e2 / is the covariance matrix of measurement error vector e = [e(s i ), â€¢ â€¢ â€¢,e(s,i )] T , E2 = ov , the - covariance matrix of 6 = [S(B i ), â€¢ â€¢ â€¢ , 6. (B,,)1 T and E3 = E(6), the covariance matrix of Z while I is the identity matrix. The density of the joint conditional distribution for (Z, Z, z 0, 0, a, b, a 2 , al) is p(k, Z, Z1 a, 0, a, b, ae2 , al) a exp( H [ k â€” iloZ) T ET 1 (k â€” ADZ) + (k â€” a â€” bA1Z)T E21 ^â€” a â€” bA i Z) + (Z â€” it) T EV (Z â€” ea)] 1 = exp {--1 [â€”Z T (A r-oc ET i k + bATE2 1 (Z â€” a) + E 1 /./,) (Z T E -1-1 A0 + b(Z â€” a)EVA i + it T EV) Z + Z T (ATEi-1 A 0 + b 2 ATEVA 1 + E 25 l) Z] 1 + C,^(3.2) where it = X/3 is the mean vector of Z and C has the other terms without Z. As is well known, the normal prior distribution is conjugate to the normal sampling distribution. The full conditional distribution of Z conditional on all the other unknown parameters and available data is also a normal with mean vector ft and variance matrix E. So, the full conditional distribution of Z must be p (Z113, 9, a 2 , 0-1, a, b, But what are cx exp --1 2 rzTt 1Z - .^(3.3) - and E ? By matching the terms containing Z and Z T in (3.3) and (3.2), we identify the required ft and E for the full conditional distribution of Z as the following: Z 1(13, 0, a , , a, b, H)^MVN (f t); ---1 b 2 A rriE2-1 A1 E v E^(A0TETiA0 ); = E (A'cr,E'k + bATE2 1 (2 â€” a) + EVA) â€¢ ^(3.4) If the prior for 0 were p(/3) MVN(0 0 , F), then we could claim the full conditional distribution of /3 would be 01(0, Z, other parameters) ti NVN(Bb, B),^(3.5) -- and b = X T E3 1 Z + F -1 /3 0 . The proof of where .13-^x r 26 that claim now follows. Given E3 1 and Z, 13 is independent of other parameters. So the full conditional distribution of /3 is PAZ, E3) a P(ZIO, E3 1 )P( 3 ) 1 a exp{â€” [(z â€” XO) TE3 1 (Z â€” X0) + ( 3â€” 00) 1T -1 (3 â€” 00)]}- We can find the mean and variance of p(01Z , EA) by an approach of Lindley and Smith (1972); Smith (1973). For the exponential covariance function, the two components a and p of 0 are independent with inverse gamma and gamma prior distributions respectively. The inverse gamma distribution has the density function, p(x; a, 7) = 1^1 ^ e-vo,x) r( a ),-y a xa-Fi^, a > 0 and 7 > 0 being the shape and scale parameters, respectively. The full conditional posterior density function of 0, given the other variables and the data is 1 p(010, Z) a p(0)1E 3 0 exp [- 2( z â€” X 0)TE 3--1 (Z â€” X13)] ,^(3.6) where p(0) is the prior density for 0. In general, we have E3 = a g(p, C), where C is the Euclidean distance matrix between the stations and sampling points and g(â€¢) is the spatial correlation function. Then the full conditional 27 distribution of 0 can be written as p( 0 113, Z) P( 6 )P(P)1 0- 9(p,C)1 -1 exp 1 2 (Z â€” X13)T(ag(p,C)) 1 (Z â€” X 0)1. (3.7) From (3.7) the conjugacy of the inverse gamma as a's prior becomes apparent. With an exponential correlation structure, the spatial correlation matrix is g(p, C) = exp(â€”C/p) and so (3.7) can be written as ^p(010 , Z) a p(p)a - ' 1 ' 12 1e -ci^exp^+^â€” X fi)T(e -c ) -1 (Z â€” X 13))] . ^( Thus the full conditional distribution of a also has an inverse gamma with parameter = + n/2 ry Â± 2 (Z c xf3) , (e _, ^ P) 1 (Z X M) 1 . - However p does not have a standard full conditional distribution, forcing us to use Metropolis-Hasting algorithm proposed by Hastings (1970) to sample p from (3.7) with a fixed. If a e2 and a have inverse gamma priors with parameters (shape=a l ,scale=1/A 1 ) and (shape=a 2 ,scale=1/A 2 ) respectively, the full conditional distributions of 28 ^ a and 01 become ol I (a, b, Z Z) N IG (shape = a l + m/2, scale = (A 1 + -1 A) -1 ) 2 , with A = (Z - a - bA 1 Z) T (Z - a -^Z) and cr l(Z Z 2 ) ti IG (shape = a 2 + n/2, scale = (A 2 + -1 7) 1 ) with -y = (Z - Ao2) T (2 - A 0 Z). (3.8) Letting the prior for a, b be ti MVN(0, F) 7a) yields the full conditional distribution of the calibration parameters a, b as a , A 1 Z , E2 ti MVN(BC, B), where B = (A i Z)T (E2)' ( 1, A i Z) + F - 1 and C = (E2)' ^)T ^ ( 1, ) +F 1 0, We define 1 to be an column vector having the same dimension as Z. 29 3.2 Bayesian Melding in Non-stationary Case The previous section presents the full conditional distributions of all the parameters for a stationary true underlying process distribution. The full conditional distributions are the same for the non-stationary case except for the spatial covariance parameters (a and p). In model (2.8), the priors for the "main-effects" (a, r z , c3 and a', r:, c31 ) are independent multivariate normal distributions with means tt o. and tt p . Fuentes and Smith (2001) propose the non-stationary model (2.8) without giving much detail about fitting the model. In this section, we derive the full conditional distribution of a, p a , To- and 77, . For that purpose we let Y be the design matrix in model (2.8). The parameters associated with the sill include a, p c., To and r7,, while those associated with the range include p, Pp, Tp and n. Because the full conditional distribution of the parameters associated with the sill are analogous to those associated with the ranges, we only give the derivation of the full conditional distributions of parameters associated with the former. In that derivation, H represents all the other parameters as well as the available data. The full conditional distribution of a is the following: P (0 111) a I E^IE (7, ,^exp^[(Z â€” X 0) T E(cr, p) -1 (Z )(0)] } - exp{â€” 1 ^ 1-ta)TE(70-,71ff)-1(cT^120-)1}, 30 where E (cr, p), the spatial covariance matrix of Z and E(7-,, 77,) is the spatial covariance of the vector a for the latent stationary processes. In the nonstationary case, the full conditional distribution of o is no longer a standard distribution, necessitating use of the Metropolis-Hasting algorithm to update o as a block. Since p& has a normal prior with mean 0 0. and variance matrix F u , its full conditional distribution is MVN(Bb, B) B -1 = Y TE (ra , 77,) -1 Y + F; 1 b = Y T E(r, , The full conditional distribution of Ta ff)-1v + F; 1 a .. is p(7,1Z , 77,) oc^715)1E(T, , ?MI -12 exp [â€” 2 (cr â€” Yi.c,)TE (To-, 7/0-)' (cr â€” 17 /2 5 )] . Our Gibbs sampling algorithm must include Metropolis-Hasting steps to sample from the full conditional distributions whenever they cannot be specified in a closed form. 3.3 MCMC Algorithm To sample from the posterior distribution of the parameters in the BM model (2.9), we use the Gibbs sampling algorithm proposed by Gelfand and Smith 31 (1990) as implemented by Fuentes and Raftery (2005). First, we choose some arbitrary initial values for 0, 0 as e ( ' ) , 13 (1) . Then, the Gibbs sampling is implemented in the following three stages. â€¢ Stage 1: Given all the other parameters and the available data, realizations of the true underlying process {Z(s)} are updated at n stations and L sampling points within each of the m grid cells. In this stage, a random sample of Z is generated by using (3.4). â€¢ Stage 2: First 0 = (a, p) is updated given /3 and Z obtained in Stage 1. Second, given 0 and Z is updated by (3.5). Updating /3 and a is easy because their full conditional distributions are multivariate normal and inverse gamma respectively. However, the full conditional distribution of p does not have a closed form, so we have to use Metropolis-Hasting algorithm to update it. â€¢ Stage 3: In this stage, cr 2 , .(7,1 and a, b are easily updated given all the other parameters since their full conditional distributions are either normal or inverse gamma. So, it is easy to update them. The above Gibbs sampling algorithm is nearly identical for the stationary and non-stationary cases, the only difference being in the updating of 0 in Stage 2. 32 3.4 Spatial Prediction This section indicates how the BM model can be used to predict realizations of the underlying process at unmonitored sites using the available data. Denote by Z u , realizations of the true underlying process at the unmonitored sites of interest. Finding the interpolation procedure entails finding the posterior distribution of Z u lZ g , Z 9 being realizations of the true underlying process at monitoring stations and sampling points within grid cells. Observe that E(Z 9 ) =Âµ 9 and E(Z u ) = p,â€ž. Thus the conditional mean and variance of Z u I Z g , /3, 0 are E(Z u l Zi 9 , 0,0) = m u + E ug E 9 1 (Z 9 â€” m g ), and Var(Z u Z g , 0) = E u â€” E ug E 9 1 E 9u . The covariance matrix of Z u is Eâ€ž and the covariance matrix between Z u and Z 9 is E. Let E gu denote the transpose of E". In the BM model (2.9), we have Z(s) = Z(s) + e(s). So, the conditional mean and variance of k u lZ 9 ,0, 9 are E(k u jZ g ,^= tz u Eu g E l (Z g^0, and Var(Z u lZ g , 113,^= E u â€” E u9 E 9 1 E 9u in which the dimension of the identity matrix I is the number of stations to be predicted. 33 The marginal distribution of k u lk, Z is P(kulk ,^= f P(zulcb)P(014 2)4, ^(3.9) in which di stands for all the other parameters such as Z 9 , and 0 that can be approximated by p(k,, I Z g ) p(kuiz9,o2), where c/) i (i = 1, â€¢ â€¢, n) is the i-th MCMC sample of all the parameters. 3.5 Extension to Ensembles Previous sections consider the problem of combining measurements from monitoring sites made with a single instrument with output from just one deterministic model. In fact, the BM model can be extended to combine data from ensembles of measuring instruments and of deterministic models. For simplicity we consider the case of just a single measuring instrument and multiple deterministic models although the extension to incorporate multiple measuring instruments will be obvious. To extend model (2.9), suppose k â€ž, â€¢ â€¢ â€¢, kp are output from an ensemble of p deterministic models. To ensure a non-singular spatial covariance matrix, suppose no overlap between the monitoring sites and sampling points within each grid cell for all the different deterministic models. For simplicity, we 34 assume the number of sampling points within each grid cell is the same, L, for all the deterministic models although extension to differing numbers is straightforward. We also assume each deterministic model has model outputs on m grid cells. Then the dimension of Z is n+p x m x L. As in (3.1), the joint posterior distribution can be decomposed as follows: P( k l 2 11 â€¢ â€¢ â€¢1 = p(21 Z, (7,) 2 p7Z,1, 0 7^ â€¢ â€¢. 7 bp, Cre2 011 â€¢ â€¢ . 1 Â°1,7) H p(2 i 1Z, ai bi o-L)p(Z 10 , 0)p(o-e2 , o-L, â€¢ â€¢ â€¢ , aa , i=i = CI3 E 0 (k â€” Z0) H DE ( i= 1 i , ,p, 0, 9) ( â€” a i â€” b i A i Z)p(aL)(1) E (e)(Z â€” X 0)P(ae)P( 0 )P( 3 )â€¢ Deterministic model i has calibration parameters a i and b i , model output error variance parameter as i. The covariance matrix of measurement error e is E 0 = o-e2 I and the covariance matrix of the model output error vector S i of the i-th deterministic model is E i = al i /. The dimension of matrix A o is n x (n +p x m x L) and that of matrix A i (i = 1, â€¢ â€¢ â€¢,p) is m x (nÂ±pxmx L). The first n columns of matrix A o form an identity matrix and the remaining elements are all 0. Row j of matrix A i (i = 1, â€¢ â€¢ â€¢ ,p) has elements Â± from columnn+1+(i-1)xmxL+(j-1)xL ton+1+(i-1)xmxL+jxL and all other elements are zero. The density of the conditional distribution of (Z, Z 1 , â€¢ â€¢ â€¢, Z p , Z) given 35 (Q7 07 ai, â€¢ ap,^â€¢ â€¢ bp, Cr e2 , 41 , â€¢ â€¢ .1 qp) iS P( k , 2 1, â€¢ â€¢ â€¢ ^Z)10, 0, al, â€¢ ", ap , 14, â€¢^bp, cre2 , (IL, â€¢ â€¢ â€¢, Â°1,p) exp - -1 [(Z - A 0 Z) ET 1 (Z - A 0 Z) 2 + E (2 i â€” a i - b i A j Z)Ei T 1 (Z i a, - b i A i Z) â€¢ (Z -^E-1 (Z - /1)] biATE;- 1 (2, â€” a i ) + E -1 p ) = exp^[-ZT (A rd'EVZ i=i (Z TEVA 0 +^b i (2 i â€” a i )ET 1 Ai + itTE -1 Z i=-1 + P Z T ATET (^ 1 iii Â±^14Al'ET 1 A i + E -1 Z + C, (3.10) i= 1 p, and E being the mean and variance of Z, C, the other terms without Z Like (3.4), the full conditional distribution of Z is the following. Z I other parameters and all data ^E) E 1 = (24 10'E cT 1 A 0 + ^ i=1 bi2 iq E T 1 A i + E -1 ) =^E biATE T 1( â€” a i ) + E -1 /.1,) . i=i 36 The prior distribution of (a i , b i ), i^1, â€¢ â€¢ â€¢,p, is ai ti MVN (0 â€ž F). Their full conditional distribution is ai ( ^lki, z, Ei ,-, MVN (BC , B), bi where ( B =^(Ei)-1 ( 1, (A j Z)T +F 1 and C 1T (A i ) (E1) -1^, Zi + FT 1 0 i . Z)T 3.6 Reversible Jump MCMC Previous sections assume a known degree in the polynomial representing the mean function ,u,(s) X(s)O. In other words, the coefficient vector has known dimension. Green (1995) proposes the reversible jump MCMC to allow the dimension of to be an unknown parameter. For the BM model, we let k be /3's dimension and ri k = (Oâ€¢, 0). The reversible jump MCMC allows k to vary, while the dimension of all other parameters are fixed. The 37 objective: to sample from the joint posterior distribution of k and Ti k , that is, p(k,n k idata). To achieve that, we first find the full joint conditional distribution of k and ri k . Given Z, that full conditional distribution is P(k, 97kIZ) oc p(Z, 71k, k) = P(Z1k, 71k)P( 71k1k)P(k), p(k) being the prior distribution of the dimension k and p(77k (k), the prior distribution of nk given k. Suppose initially the dimension of /3 is p > 0). Then the next iteration has two possible outcomes for k: increase the dimension by one (a "birth") or decrease it by one (a "death"). The probability of a "birth" or "death" is 1/2. The jump scheme can be quite arbitrarily but we make the above choice because we favor "birth" or "death" equally. After choosing either "birth" or "death", we accept/reject the jump with probability a/(1 â€” a), the specification of a, being explained below. If we reject the jump, then we stay with the current dimension p; otherwise, the dimension k will be either p +1 or p â€” 1. In the same iteration, the next step updates the parameters ri given k as in the fixed dimensional case. By using the formulas provided in Green (1995), the acceptance probabilities of "birth" and "death" are the following. â€¢ "Birth". To jump from the previous dimension k = p to a new dimension k* = p +1, we need to propose one extra coefficient 137E ,,, for one extra covariate. That extra coefficient/3 7Le w is proposed by a proposal distribution with density q(â€¢). Because Aien, is proposed independently 38 of other coefficients the Jacobian of transforming from )3 to 13* is 1. So, the new coefficient vector is 13* = (13, and the acceptance probability of k* = p + 1 is p(klp(O*Ik*)p(ZIk* , )3*) a = minfl, puopolop(zik, o)q(Nne.) 1- â€¢ "Death". To jump from the previous dimension k = p to a new dimension k* = p â€” 1, we need to delete one coefficient from the coefficient vector. We choose to delete the last coefficient of the vector /3. The reason is that we arrange the covariates from lower order of the coordinates to higher order and usually the higher order of the coordinates are more likely to be insignificant than lower ones. Again, the Jacobin is 1 and the acceptance probability of k* = p â€” 1 is a = min{ 1, P(e)P(0 * le)P(Z k * , 0*)q(4) }. P(k)P(31k)P(ZIk, )3 ) After choosing dimension k, we update all other parameters by using Gibbs sampling as in the fixed dimension case. In Section 4.3, we conduct a simulation to see how well the reversible jump MCMC works when incorporated into the BM model. The discussion above shows how to use the reversible jump MCMC algorithm to choose the dimension of /3. That algorithm can also be used to choose the dimension of coefficients of the additive calibration a(s). If we 39 let a(s) = Y(s)p a , Y (s) being the polynomial function of the coordinate at location s, then we can use reversible jump MCMC to choose the dimension of coefficients O a . The detailed algorithm is very similar to the above when reversible jump MCMC is applied to choose the dimension of 0. The current software does not include the reversible jump MCMC to choose the dimension of coefficients O a , but we are planning to incorporate that into the software in the near future. 3.7 Bayesian Melding Model Software We wrote a BM model program in R, developed by the R Development Core Team (2006). The R program is online at http://enviro.stat.ubc.ca/melding/meldingcode.zip . The code for the stationary BM model is in the directory "mcmc" and the code for stationary BM model incorporating the reversible jump MCMC is in the directory "rjmcmc". The code for the non-stationary BM model will be available online soon. The subdirectory "debug" in directory "mcmc" is used to debug the R programs. That directory includes the file "simudata.s" simulating the measurements and model output, "mcmc.s", the main function to implement the MCMC algorithm, "update.s", various functions to sample parameters from their full conditional distributions. The description of the main function "melding" in file "mcmc.s" is the following. 40 Dependence: R>2.3.0 and packages "MASS". Usage: melding(ml,m2,nm,sam.sloc,sam.slocl,ton,burnin,zhat,Zbtilde,degree,cov.model) Arguments: â€¢ nm: number of sampling points in each grid cell of the model output. â€¢ sam.sloc: coordinates of the sampling points and the monitoring stations. It should be a (m2 x nm + ml) x 2 matrix. The first m2 x nm rows are the coordinates of the sampling points in grid cells. The last ml rows are the coordinates of the monitoring stations. â€¢ sam.slocl: coordinates of the unmonitored stations where the measurements are to be predicted. â€¢ ton: number of MCMC iterations in the Gibbs sampling. â€¢ burnin: the "burn-in" period of the Gibbs sampling. â€¢ zhat: the measurements vector. â€¢ Zbtilde: the model output vector. â€¢ degree: degree of the polynomial function f(-) for the mean p(s) = f(s)fi and the following options are allowed: 0 the mean is assumed constant across space. 1 the mean is assumed to be a first order polynomial on the coordinates: p(s) = Qo + /30 1 + 13282. 41 2 the mean is assumed to be a second order polynomial on the coordinates: A(s) = Oo + 91s1 + 02.52 + 03s2 Q4s2 + beta 5 s 1 s 2 . â€¢ cov.model: a string with the name of the correlation function. The options are one of the following three functions: "exponential": exp(â€”d1 p), d is the distance and rho is the range parameter. "Gaussian": exp(â€”d 2 /p), d is the distance and rho is the range parameter. "Matern": [(2" â€” 1)r(v)] -1 (dIp)"IG(dlp). v is the smoothing parameter and Ku (.) denotes the modified Bessel function of the third kind of order v. The function "melding" will return a list of the following objects: â€¢ "beta.est": posterior mean of the coefficient vector Q. â€¢ "beta.est.sd ": posterior standard deviation of the coefficient vector 0. â€¢ "theta.est": posterior mean of the spatial correlation vector 9. â€¢ "theta.est.sd": posterior standard deviation of the spatial correlation vector O. â€¢ "prediction": posterior mean of the spatial prediction. â€¢ "pred.q1": 5% quantile of the posterior distribution of the spatial prediction. 42 â€¢ "pred.q2": 95% quantile of the posterior distribution of the spatial prediction. â€¢ "ab.est": posterior mean of the additive and multiplicative parameters a and b. â€¢ "ab.est.sd": posterior standard deviation of the additive and multiplicative parameters a and b. â€¢ "sigmae.est": posterior mean of the measurement error variance parameter ol. â€¢ "sigmae.est.sd ": posterior standard deviation of the measurement error variance parameter ol. â€¢ "sigmad.est": posterior mean of the model output error variance parameter 01. â€¢ "sigmad.est.sd ": posterior standard deviation of the model output error variance parameter cd . The description of various functions in file "update.s" is the following. â€¢ "updatebeta(...)" : generate MCMC sample from the full conditional distribution of 0. The arguments of this function include y: the realizations of the true underlying process {Z(s)} at monitored stations and sampling points within grid cells. 43 x: the covariate matrix at monitored stations and sampling points within grid cells. prior.mean: prior mean of the coefficient vector f3. prior.var.solve: inverse of the prior variance matrix of the coefficient vector 0. sigma.solve: inverse of the spatial covariance matrix of true underlying process {Z}. â€¢ "updatetheta(...)" : generate MCMC sample from the full conditional distribution of 0.The arguments of this function include diff: the residuals of the true underlying process. theta: the values of the covariance parameters theta from the previous MCMC iteration. Distance: the Euclidean distance matrix between all the locations including monitored stations and sampling points within grid cells. n: number of monitored stations and sampling points within grid cells. cov.model: one of the three possible choices for the covariance function: "exponential", "Gaussian" and "matern". â€¢ "updatesigma(...)": generate MCMC sample from the full conditional distribution of o and er'. The arguments of this function include zhat: the measurements vector. 44 Zbtilde: the model output vector. nm: number of sampling points in each grid cell of the model output; n: number of monitored stations and sampling points within grid cells; m2: number of grid cells. a: additive bias parameter a; b: multiplicative bias parameter b. A2: matrix A l as described in Section 3.1. â€¢ "updateab(...)":generate MCMC sample from the full conditional distribution of a and b. The arguments of this function include Zbtilde: same as in the function "updatesigma(...)". ,y,abO,fb.solve,nm,m2,A2. sigmad: the output error variance parameter 4. y: same as in the function "updatesigma(...)". n,m2,A2: same as in the function updatesigma(...). â€¢ "updaters(...)": generate MCMC sample from the full conditional distribution of Z. The arguments of this function include X: the covariate matrix at monitored stations and sampling points within grid cells. zhat, Zbtilde, nm,m2: same as in the function "updatesigma(...)". sigmad, sigmae: the output error variance parameter 0-1 and measurement error variance parameter o -,2 . 45 sigma3.solve: inverse of the spatial covariance matrix of true underlying process {Z}. Reversible Jump MCMC is used to do the variable selection in a Bayesian framework. It is functioning similar to the step-wise regression in classical framework. All the functions in "update.s" are the same as in the BM model described above. In the function "rjmelding(...)", we do not specify the "degree", which is estimated by the reversible jump MCMC. The function "rjmeding(...)" only returns the dimension of the coefficient in the mean of underlying true process. More details of the software can be found in Liu (2007). 46 Chapter 4 Testing the Bayesian Melding Model This chapter presents an extensive simulation study of the BM model under a variety of scenarios that assess the method while checking the R code developed. We begin with the stationary BM model, one deterministic model and one measuring instrument. Then we move on to the case with an ensemble of deterministic models. We include a simulation study that incorporates a reversible jump MCMC into the BM model. In addition, studies are devoted to several other issues such as how the estimation of covariance parameters can be improved by better monitored site layouts and the effect of choosing bandwidth h in non-stationary BM model. 4.1 A Single Deterministic Model This simulation, which assumes a stationary true underlying process, has two purposes. First, it validates the MCMC algorithm for implementing the BM model. Second, it compares the predictions of Kriging with those of the BM 47 model. Simulation Settings In the simulation, we have 20 monitored sites and 100 un-monitored sites whose responses are to be predicted. The number of available grid cells increases from 0 to 50 through the sequence: 0, 2, 10, 20, 30, 50. To reduce our computational burden, we only have one sampling point within each grid cell. The coordinates of the available/unavailable stations and sampling points within grid cells are generated uniformly on [-5, 5] x [-5, 5]. Figure 4.1 depicts the locations of sites (monitored and un-monitored) and sampling points. For each combination of sites and sampling points, we generate 50 independent dataset of the measurements and model outputs according to the BM model (2.9). The measurements generated at the 100 unmeasured stations will not be used in the MCMC prediction but left for validation. As the number of grid cells increases, we investigate changes in prediction accuracy and coverage probability for different prediction intervals at various credibility levels (95%, 90%, 80%, 70%, 60%, 40%). In BM 2.9, we let the mean of the true underlying process be E [Z(s)] = P(s) = A) + r(31s1 + /3282, ^(4.1) s 1 , s 2 being the coordinates of location s. We assume {Z(s)} has an expo- 48 nential covariance structure. At the same time, we give 13 a prior normal distribution with mean 0 0 = (1, 1, 1) and diagonal covariance matrix with diagonal elements (100, 100, 100). The covariance parameters a and p have independent prior distributions; a has an inverse gamma prior distribution with shape = 2, scale = 0.2, while p has a gamma prior distribution with shape = 3.5, scale = 2. The calibration parameters a and b have independent normal prior distributions with means kt a = 0, /..t b = 1 and variances aa2 = 100, 0-1 100. The random variables 01 and a have identical and independent inverse gamma distributions with shape = 3 and scale = 0.1. Using the MCMC algorithm to sample from the joint posterior distribution of all parameters, we take 1000 Gibbs sampling iterations (the first 100 being for "burn-in"), which is enough to ensure convergence as shown in Figure 4.3 and Figure 4.4. That is fortunate since computational times are long and necessity for a longer series of iterations would have greatly limited the scope of our simulation study. The software used in our study is available online (http://enviro.stat.ubc.ca/melding/meldingcode.zip) to enable independent verification of our findings. Simulation Results Tables 4.1, 4.2 and 4.3 present estimation results for the parameters. Table 4.4 presents the SSPE (sum of squared prediction errors) for the 100 unmeasured stations. Our results lead to the following over observations. 49 -4 ^ -2 ^ 0 ^ 2 ^ 4 Xâ€”coordinate Figure 4.1: Locations of 120 (20 monitored, 100 to be predicted) sites and points sampled from up to 50 grid cells. Monitored sites (s), sites with responses to be predicted: +, 1: the first 2 points from grid cells, 2: 3-10 points, 3: 11-20 points, 4: 21-30 points, 5: 31-50 points. â€¢ The estimation of a, b and /3 are very close to their true values. â€¢ Based on Figure 4.2, which shows the true underlying process values versus the BM prediction in the case of 50 grid cells, we can see the BM predictions are quite close to the true values. â€¢ The estimates of the covariance parameters 9 = (a, p) are reasonably good. However, the range parameter p proves much more difficult to estimate than the sill parameter a. The posterior distribution in Figure 50 4.5 shows p to be widely dispersed. â€¢ Table 4.4 reveals a significant variation in the SSPE from one simulated dataset to another. â€¢ Table 4.4 shows that adding more grid cells decreases the average SSPE for the BM model. The average is computed over 50 independent dataset. In each, the SSPE does not necessarily decrease in a monotonic fashion as the number of cells increases. However, variation from the non-decreasing pattern in the SSPE may just be due to the sampling variation between the generated dataset. Table 4.5 shows the SSPE obtained by using the true values of the parameters in the predictor. In spite of that advantage, the SSPE for even this predictor does not always decrease monotonically as the number of grid cells increases. (See dataset 4, and 6, for example.) â€¢ Table 4.6 shows that the empirical coverage probability for the BM predictor comes close to the nominal level when we have a reasonably large number of cells (at least 10). However that is not the case when we have just 0 or 2 cells, not surprising given the paucity of data in those situations. Kriging's coverage probability turns out to be much smaller than that for the BM predictor even when no simulated data are available from the deterministic model (so that the two methods compete "head-on"). That result would be anticipated since, as is well known, Kriging underestimates the uncertainty in its spatial predictions. 51 â€¢ Figures 4.3, 4.4 and 4.5 show the MCMC samples from posterior distributions of a, b, 0, a and p in the case of 50 grid cells. From these three figures, we can see the Markov chain converges after a few iterations. Table 4.1: Estimates of a and b with different numbers of model outputs available. The true values are a = 5.0 and b = 2.5. Columns 2-5 are the averages of the posterior means and standard deviations. The averages are computed over the 50 dataset generated in the simulation study. Since this table is for the estimates of parameters a and b, there is no unit associated to the numbers in this table. Each of these parameters is a scalar. Number of cells a sd b sd 2 5.61 4.14 2.49 0.22 10 5.11 1.40 2.50 0.09 20 5.24 1.02 2.50 0.08 30 5.12 0.82 2.50 0.07 50 5.15 0.63 2.50 0.06 Improving Covariance Parameter Estimates Table 4.3 shows that although the posterior mean of p gets closer to the true value as the number of grid cells increases from 0 to 50, the posterior standard deviation remains relatively large even with 50 grid cells. That may well be due to insufficient sampling points in close proximity to one another. In other words, we do not have enough information about the small scale process variability needed to accurately estimate the variogram. To explore 52 Table 4.2: Estimates of the coefficients /3 = (/3 0 , /31 , /32 ) with different numbers of model outputs available. The true values are 0 0 = 2.50, 0 1 = 2.90 and /32 = 3.20. Columns 2-7 are the averages of posterior means and standard deviations. The averages are computed over the 50 dataset generated in the simulation study. Since this table is for the estimates of parameters /30 , â€¢ â€¢ /32 , there is no unit associated to the numbers in this table. Each of these parameters is a scalar. Number of cells ,87.0 sd (31 sd 0 2.35 0.46 2.89 0.12 3.12 0.12 2 2.13 0.75 2.88 0.13 3.11 0.14 10 2.11 0.77 2.88 0.13 3.11 0.14 20 2.09 0.80 2.89 0.13 3.12 0.14 30 2.03 0.82 2.88 0.13 3.12 0.14 50 2.03 0.83 2.88 0.13 3.12 0.14 /42 sd that conjecture, we carry out a small simulation study with 20 monitoring sites and 50 sampling points in grid cells as in the previous simulation. However, in contrast to the previous case, we concentrate 25 sampling points in a very small region given by [-0.05, 0.05] x [-0.05, 0.05]. Except for that variation, we generate these data in precisely the same way as in the previous simulation. Table 4.7 shows the results for the estimators of the covariance parameters. From that table, we can see that the standard error of estimator for p is reduced significantly compared with the result in Table 4.3, thereby adding support to our conjecture. 53 Table 4.3: Estimates of the covariance parameters in 0 = (a, p) with different numbers of model outputs available. The true values are a = 1.50, p = 5.00. Columns 2-5 are the averages of posterior means and standard deviations. The averages are computed over the 50 dataset generated in the simulation study. Since this table is for the estimates of parameters a and p, there is no unit associated to the numbers in this table. Each of these parameters is a scalar. Number of cells Q sd p sd 0 1.42 0.26 1.36 1.46 2 1.54 0.31 2.41 2.22 10 1.52 0.35 2.98 2.01 20 1.53 0.33 3.61 2.03 30 1.54 0.33 4.41 2.17 50 1.52 0.27 5.34 2.32 Conclusions Our simulation results point to strengths and weaknesses in the BM model. Strengths: â€¢ The BM model can estimate the calibration parameters of the model output very well, given a reasonable number of monitoring sites and grid cells. â€¢ In general, increasing the number of grid cells improves spatial prediction accuracy at un-monitored sites. â€¢ Estimates of the coefficients of the process mean are very good. 54 0 â€” true values - - - interpolation by melding 0 0 I^ 0 20^ I 40^ 60^ 80^ 100 unâ€”monitored sites Figure 4.2: Spatial predictions for 100 un-monitored sites by using BM model in the case of 50 grid cells. â€¢ The BM predictor gives much more realistic estimates of the prediction uncertainty than does the classical Kriging approach. Weaknesses: â€¢ The BM model does not estimate the covariance parameters (o and p) well in terms of their standard errors unless an appreciable number of sampling points are very close to one another. (As seen seen from the previous section, forcing a number of such points into close proximity does decrease their standard errors substantially). 55 ro Gibbs sampling iteration 0 ^ 200 ^ 400 ^ 600 ^ 800 ^ 1000 Gibbs sampling iteration Figure 4.3: MCMC Gibbs samplers of the additive and multiplicative calibration parameters, a and b, respectively, in the case of 50 grid cells. â€¢ The current BM model (2.9) does not include temporal information. For the ozone pollution data studied by Kasibhatla and Chameides (2000), both the real measurements and model output are hourly data, which are temporally correlated. Extending the BM model to embrace random space - time fields would be desirable, as it would enable potentially great strength to be borrowed across time as well as space. One of such extensions is the dynamic BEM model we present later in Chapter 10. 56 Gibbs sampling iteration Gibbs sampling iteration Gibbs sampling iteration Figure 4.4: MCMC Gibbs samplers of coefficients 0 in the case of 50 grid cells. â€¢ The computational burden imposed by the BM model limits its practicality. That burden stems from the need to invert a large dimensional spatial covariance matrices three times in each Gibbs sampling iteration. For example, 20 monitoring sites and 100 grid cells with 4 sampling points in each leads to a spatial covariance matrix with 20 + 100 x 4 = 420 rows and columns. MCMC does not provide a "free Bayesian lunch" as might naively be suggested by its very elegant theory. 57 Gibbs sampling iteratioin a 0) c CO 0 ^ 200 ^ 1^ 400 ^ 600 ^ i^ 800 ^ i 1000 Gibbs sampling iteratioin Figure 4.5: MCMC Gibbs samplers of covariance parameters 0 = (a, p) in the case of 50 grid cells. 4.2 Ensemble Studies Simulation settings This section presents a simulation study illustrating the use of BEM model to combine measurements with model outputs from more than one deterministic models. To make this simulation study closer to the real life case, we used locations from a dataset studied by Kasibhatla and Chameides (2000); Hogrefe et al. (2001b,a). There, measurements come from monitoring sites ("stations") and modeling output from deterministic models. In fact, we use 58 cis 0 - IT cn 0 a) - CO 0) a_ Fn o - I^ 0 i^ r^ I^ 1 200^ 400^ 600^ 800^ 1000 Gibbs sampling iteration 0 200^ 400^ 600^ 800 ^ 1000 Gibbs sampling iteration Figure 4.6: MCMC Gibbs samplers of error variance 01 and o-e2 in the case of 50 grid cells. only a subset of these stations and grid cells in our simulation, with 50 monitored sites treated as stations and 100 un-monitored sites. For simplicity we assume only two deterministic models in our simulation study yielding output in 20 grid cells and each of them has two sampling points. Figure 4.7 shows the locations of the sites (monitored/unmonitored) and the sampling points within grid cells. The mean of the true underlying random process Z is a polynomial func- 59 1 ^ -100 ^ i -90 ^ -80 ^ -70 Figure 4.7: Locations of 150 sites (50 monitored, 100 to be predicted) and 20 grid cells. Stations: A; sites with sites to be predicted: +; grid cells: rectangles. Each grid cell has two sampling points in it. The x-axis is the longitude in degrees, y-axis, the latitude in degrees. tion of the coordinates: E [Z(s)] = /-t(s) = N +131s1 + 02,52 + /334 + 044 + 058182, s 1 and s 2 being the coordinates at generated location s. This second degree polynomial mean function of the coordinates is one degree higher then the mean function (4.1) used in the previous simulation. This more complicated choice permits us to see how well the BEM model works no matter how 60 complicated the mean function. In this simulation, we generate 15 independent datasets. Initially we tried 1000 iterations of the MCMC algorithm but found it failed to converge. Much better results obtained after we extended the number of iterations to 10,000 with a "burn-in" period of 1000. In the case of ensemble deterministic models, there are more parameters in the BEM model. The complexity of BEM model requires a much longer Markov chain to reach convergence. Simulation Results Table 4.8 gives the estimated values for the parameters and Table 4.9, the SSPE (sum of squared prediction errors) for the 100 unmonitored stations. These results, suggest the following conclusions. â€¢ Estimates of the additive and multiplicative biases for both deterministic models are quite accurate (Table 4.8). â€¢ The estimate of the sill parameter a is very close to the true value while that of the range parameter p is not. However, the true value does lie within its 90% credibility interval. â€¢ Estimates of o 5,2 is reasonably accurate. However, the estimate of al, i - exceeds the true value by quite a margin. â€¢ The BEM model gives better predictions than Kriging, measured by SSPE, because the calibration parameters are estimated very well, 61 meaning in effect, that the model output helps to achieve better prediction. In summary, the BEM model seems to work quite well when we have more than one deterministic model although the estimate of ag i is not very close to the true value. 4.3 Reversible jumps: the stationary case This simulation involves 20 stations and 50 grid cells. Each grid cell has only one sampling point to reduce the computation burden. All stations (monitoring sites) as well as sampling points are generated uniformly on [-5, 5] x [-5, 5]. The mean function of the underlying true process is E [Z(s)] = P(s) = /3o + /31s1 + 32s2 +,33s1 * s2 with true parameters = (1.3, 1.2, 0.5, 0.3). The true parameters of the exponential covariance function are a = 1.5 and p = 5.0. The calibration parameters are a = 5.0 and b = 2.5. The variances of the measurement error and model output error are cre2 = 0.25 and 01 0.25 respectively. We simulate 15 independent dataset in total. Figure 4.8 shows the MCMC plot and histogram of the dimension k. Table 4.10 shows the estimation results. They lead us to make the following observations. 62 â€¢ The Markov chain for k converges to the true dimension k = 4 (Figure 4.8). â€¢ The estimates of )3 are averaged over all the MCMC iterations in which )(3 has dimension of 4 and that the estimates are close to the true value of 13. â€¢ The true values of various parameters are within the Bayesian credible intervals except for those of ol and 4. Gibbs sampling iterations 4 ^ 5^ 6 7 dimension of beta Figure 4.8: The upper plot: the dimension k of the coefficient )3, as a function of the number of MCMC iterations. The lower plot: histogram of the posterior MCMC samples of k. 63 4.4 Bayesian Melding in Non-stationarity Case In Equation (2.6) the bandwidth h has to be chosen to represent the nonstationary process by convolution of latent stationary processes. A minimal requirement for h: ensure the positive definiteness of the covariance matrix. Yet if h is small then two locations will have no point, around which stationary processes center, between them, the covariance between these two locations will be zero. So, if h is too small then too many zeros will appear in the covariance matrix of the non-stationary process, resulting in numerical problems when this covariance matrix is inverted to evaluate the likelihood of the non-stationary process. On the other hand, if h is too big, the covariance between two locations could also be too small because the kernel function is too flat and M in (2.7) is not big enough. In practice M cannot be chosen too big because of the computational burden. This simulation studies the effect of varying h. Primary interests focuses on spatial prediction error as well as on the accuracy of the a and b estimates. Table 4.11 and 4.12 present the prediction and estimation results. They show that as long as h is not too small, both the SSPE and estimates of a and b are not much different. However, when the chosen h is too big, say, more than twice the true bandwidth used to simulate the data, the SSPE is much bigger. Also, the standard error of estimators of a and b are 64 much bigger with small h (40% of the true h ) than the estimators with a bigger h. If h is chosen too small, say, less than half of the true bandwidth used to simulated the data, even the MCMC algorithm will "crash" because the covariance matrix becomes singular. So as a conservative strategy, we recommend choosing a large h. Large in this can mean half of the distance between two points located at the far ends of the diagonal line across the region. That choice of h is less likely to produce a row or column full of zeros in the spatial covariance matrix, leaving it more numerically stable. 65 Table 4.4: SSPE (Sum of squared prediction errors) of Kriging and the BM model with different number of grid cells. Since the data is simulated, there is no unit associated to the numbers in this table. Dataset 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 mean Kriing 116.03 104.50 223.32 119.76 182.26 135.59 150.89 127.38 115.09 121.31 107.19 172.63 140.90 194.51 181.34 257.84 141.69 170.47 113.81 99.03 249.37 138.46 138.66 110.81 110.84 162.32 203.55 138.53 117.12 152.81 157.05 140.36 142.60 102.68 153.52 171.43 105.98 144.45 257.85 158.86 145.30 128.23 193.23 130.01 112.00 168.65 213.19 100.05 136.49 113.47 149.47 Ocells 2 cells 74.11 74.21 42.16 42.73 104.80 115.60 78.73 76.37 80.53 90.61 59.63 64.79 68.86 68.71 92.14 90.36 59.11 65.81 120.22 117.87 56.38 67.78 100.56 96.47 71.37 70.93 126.80 134.38 92.89 89.98 151.52 141.73 81.64 80.08 90.68 89.63 73.24 74.97 83.48 85.73 163.09 147.83 85.70 95.09 82.25 73.79 76.74 75.58 92.90 98.19 112.48 114.98 112.85 113.07 72.54 74.38 82.15 82.46 135.98 133.59 117.43 117.13 101.53 100.57 103.28 101.85 93.99 86.41 80.53 78.76 75.52 73.94 85.41 85.17 62.95 71.70 173.02 170.12 123.88 119.54 93.46 100.74 72.57 73.41 80.03 83.14 72.40 72.02 79.34 80.79 121.84 95.57 137.96 139.81 84.13 74.76 70.64 74.48 110.61 92.53 93.46 92.66 10 cells 75.83 40.57 107.57 73.50 72.47 52.65 70.23 94.48 62.19 106.56 57.15 96.12 69.01 134.74 83.47 129.46 77.18 91.66 74.34 82.64 156.51 81.80 81.62 79.02 88.23 110.75 116.73 73.53 101.03 122.34 107.43 93.45 104.96 88.18 75.80 70.59 79.46 63.52 163.96 109.30 101.24 73.23 82.81 74.59 72.00 91.42 122.81 76.80 66.42 89.05 89.41 66 20 cells 61.75 40.60 94.54 73.48 73.20 52.60 68.27 86.12 57.87 103.36 54.68 93.62 63.82 120.24 84.37 118.79 73.68 84.61 69.21 80.42 141.12 79.31 69.68 81.83 84.67 102.47 123.22 69.50 75.22 117.14 106.66 88.29 106.54 74.78 74.19 65.64 72.99 68.56 179.29 95.47 94.68 69.72 82.26 74.97 76.52 86.39 122.18 68.59 62.05 85.96 85.10 30 cells 56.67 41.17 88.72 69.00 69.82 53.14 67.88 89.74 60.31 91.41 55.63 89.37 62.55 123.25 83.54 105.07 72.90 79.97 67.39 79.52 131.03 72.60 68.94 81.40 82.00 91.22 112.19 68.60 74.61 93.48 105.21 93.38 111.43 77.84 76.07 63.30 78.09 66.12 154.24 91.86 84.76 67.10 80.17 70.93 67.87 91.86 112.25 65.79 66.89 79.55 81.76 50 cells 71.86 39.46 78.28 63.57 62.21 51.06 67.92 79.00 56.87 82.43 51.33 86.77 56.94 104.89 78.83 95.62 57.69 73.82 66.16 73.01 105.89 64.44 61.82 69.30 76.09 85.60 105.84 66.67 69.97 72.49 92.00 88.65 107.54 67.23 70.82 52.16 66.47 60.56 132.83 104.50 65.31 65.88 75.48 67.97 81.12 76.49 93.99 63.60 54.86 73.53 74.74 Table 4.5: The SSPE when the true values of the parameters (a, b, Q 0) are used. That is, we use the conditional mean of the unavailable measurements based on the available measurements to predict at the 100 unavailable stations. Since the data is simulated, there is no unit associated to the numbers in this table. , Dataset 2 cells 1 71.45 2 40.87 3 91.50 4 66.72 5 82.61 6 53.37 80.49 7 8 75.61 9 52.14 10 74.02 11 59.88 12 95.59 13 63.23 14 91.40 15 87.79 57.62 16 17 66.08 18 85.65 19 70.43 20 77.18 21 105.28 22 72.46 23 53.52 24 72.67 25 77.12 26 89.49 27 112.37 28 69.11 29 63.89 30 69.43 31 94.46 32 80.29 33 94.49 34 73.97 35 77.03 36 49.02 37 57.07 38 54.79 39 107.22 40 79.55 93.32 41 42 60.07 43 71.83 44 71.14 45 69.84 46 69.84 47 78.45 48 65.29 49 60.39 50 80.41 mean 74.35 10 cells 20 cells 67.06 58.71 44.22 40.07 89.48 83.86 62.83 62.69 61.58 61.24 52.38 49.85 78.99 72.17 78.63 66.16 51.78 50.65 71.02 58.54 55.76 48.13 94.42 75.94 61.81 57.52 86.96 85.65 90.01 85.94 56.83 54.90 67.89 58.25 81.95 61.91 72.75 65.97 66.47 61.21 99.17 103.48 75.67 59.15 52.24 46.91 68.45 64.94 68.43 68.92 84.16 70.41 103.01 72.94 58.72 60.26 65.79 54.27 62.33 59.45 89.88 83.92 56.97 57.68 84.18 86.67 62.18 54.61 68.71 62.90 48.34 45.57 56.01 52.06 53.17 55.68 84.85 67.01 75.74 65.22 93.24 52.51 64.15 59.39 74.13 73.27 71.26 71.38 57.97 54.07 61.69 67.31 71.75 70.10 56.04 45.04 51.19 55.84 75.18 64.94 69.84 63.21 67 30 cells 43.00 41.19 78.88 55.20 61.48 49.16 65.16 67.84 54.46 50.27 46.95 67.41 52.86 69.78 81.63 50.70 60.47 61.83 57.36 56.51 71.71 62.64 48.51 67.81 59.51 68.56 72.33 60.95 52.92 45.74 80.52 47.16 53.36 53.44 65.48 40.47 51.31 58.89 71.63 66.18 54.55 50.81 68.45 64.28 52.08 75.64 56.13 43.28 43.97 58.50 58.78 50 cells 43.47 38.79 59.20 56.90 45.55 51.02 62.48 64.87 56.31 44.31 48.71 59.28 55.63 54.15 62.36 44.47 45.96 50.13 48.66 51.70 54.18 53.83 47.59 45.60 60.16 50.87 54.01 56.24 55.52 43.89 74.34 47.38 48.66 45.64 55.35 39.14 54.31 43.27 72.91 57.71 44.87 45.13 51.47 61.59 48.51 70.61 56.87 38.92 42.54 47.76 52.26 Table 4.6: Coverage probability of the credible interval for the simulation study in Section 4.1, with 20 monitored sites and up to 50 grid cells to predict 100 un-monitored sites. The first column is the nominal coverage probability of the credible interval. 95% 90% 80% 70% 60% 40% kriging 37.60% 32.70% 26.32% 21.86% 18.64% 12.22% 0 cell 68.50% 59.68% 49.84% 40.64% 33.50% 22.92% 2 cells 91.98% 84.78% 73.04% 62.04% 52.70% 34.86% 10 cells 94.40% 89.42% 79.58% 70.88% 57.78% 39.00% 20 cells 94.48% 88.22% 78.88% 68.26% 56.24% 38.96% 30 cells 97.23% 86.00% 76.48% 66.28% 55.48% 36.78% 50 cells 95.38% 91.90% 80.76% 71.92% 59.66% 40.72% Table 4.7: Estimates are based on 20 monitored sites and 50 sampling points. There are 25 sampling points in close proximity to one another. The averages are computed over 50 dataset. True values are a = 1.50 and p = 5.00. Since this table is for the estimates of parameters a and p, there is no unit associated to the numbers in this table. averaged posterior mean averaged posterior sd a^0.97^ 0.25 P^4.69^ 0.76 68 Table 4.8: Parameter estimates in the simulation study of ensembles of deterministic models in Section 4.2. Columns 3-4 give the averages of posterior means and standard deviations for a z , bâ€ž i = 1, 2, the calibration parameters of the model output for two deterministic models. ao, i and a6,2 are the variances of two model output error processes. The averages are computed over 15 dataset. Since this table is for the estimates of parameters, there is no unit associated to the numbers in this table. Each of these parameters is a scalar. Parameters al bi a2 b2 Q P ,80 /3 1 ,32 03 04 05 ae 0-(5,1 Q (5,2 True value 5.00 2.50 4.00 3.40 1.50 5.00 2.50 2.90 3.20 0.80 1.10 1.30 0.25 0.25 0.25 69 mean 4.94 2.50 4.00 3.39 1.51 6.73 1.98 2.92 3.20 0.80 1.11 0.31 0.33 0.24 4.51 sd 0.48 0.02 0.68 0.03 0.28 0.91 0.59 0.17 0.16 0.03 0.03 0.03 0.13 0.46 0.91 Table 4.9: SSPE of the BM model and Kriging in the simulation of multideterministic models. We have 50 monitored sites and 100 unmonitored ones. There are 20 grid cells each of them having two sampling points inside. Since the data is simulated, there is no unit associated to the numbers in this table. Dataset 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 mean Kriging 42.57 74.07 85.39 56.63 81.75 81.91 56.64 55.31 85.39 72.93 60.41 65.04 97.69 62.23 60.79 69.25 70 BM 33.13 70.99 59.90 48.59 45.36 73.19 55.89 48.74 56.79 57.10 49.30 59.70 50.43 60.55 41.45 54.07 Table 4.10: Estimates in the stationary case using reversible jump BM model. Both the "estimates" and "standard error" are averages obtained from 15 independent dataset. Since this table is for the estimates of parameters, there is no unit associated to the numbers in this table. Each of these parameters is a scalar. Parameters k al b1 a P i30 A. 02 03 2 U p d(5 True value 4 5.00 2.50 1.50 5.00 1.30 1.20 0.50 0.30 0.25 0.25 Estimate 4.079 5.32 2.45 1.10 4.94 0.96 1.06 0.51 0.32 0.39 4.52 71 Standard Error 0.41 0.52 0.07 0.27 0.67 0.74 0.49 0.10 0.04 0.06 0.83 Table 4.11: SSPEs with different choices of h in non-stationary BM model. Columns 2-6 give the SSPEs for different ratios of the true bandwidth generating the data over the chosen bandwidth in the non-stationary BM model. Since the data is simulated, there is no unit associated to the numbers in this table. dataset 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 mean 1.67 7.99 3.39 3.50 4.30 4.11 2.04 3.87 6.15 2.90 4.63 2.49 3.29 4.13 3.27 7.45 4.23 1.33 7.74 3.01 3.78 3.60 4.79 4.01 3.87 6.13 3.04 5.94 3.15 2.40 4.40 3.49 7.26 4.44 72 1.00 8.09 2.92 7.40 3.26 4.44 2.18 5.12 7.70 2.49 4.96 3.27 2.56 5.46 3.81 7.87 4.77 0.50 8.89 3.26 2.59 3.28 5.18 3.61 4.19 6.40 5.09 7.03 5.11 5.09 5.07 2.90 7.04 4.98 0.40 10.73 5.71 4.63 4.24 17.33 2.71 5.58 8.36 18.33 6.68 3.08 7.56 3.50 7.70 6.26 7.49 Table 4.12: Estimates of a and b for varying values of h in non-stationary BM model. Columns 2-5 give the averages of posterior means and standard deviations. The averages are computed over the 15 datasets generated in the simulation study. The true values are a = 5.00 and b = 2.50. Since this table is for the estimates of parameters, there is no unit associated to the numbers in this table. Each of these parameters is a scalar. (true h)/(chosen h) 1.67 1.33 1.00 0.50 0.40 a 5.04 5.03 5.01 5.00 5.51 73 sd b sd 0.33 0.32 0.31 0.30 1.23 2.49 2.49 2.49 2.50 2.72 0.05 0.05 0.05 0.05 0.58 Chapter 5 A New Univariate Spatial-Temporal Model The BM model cannot deal with temporal correlation in the data. To provide an alternative to the BM model, we consider a regression model. The rest of this chapter is organized as follows. Section 5.1 reviews a two-step linear regression model, Section 5.2 presents an ad-hoc extension of the two-step linear regression model. Based on the favorable results seen in Section 5.2, Section 5.3 develops more rigorously a spatial-temporal model in a Bayesian framework. The data analysis results and conclusions are in Section 7.3. 5.1 Two-Step Linear Regression This section reviews the two-step linear regression model proposed by Guillas et al. (2006). The strong linear correlation between the hourly measurements and model output at each station makes such a model seem natural. Figure 5.1 shows those correlations to be bigger than 0.5 at most stations, pointing to a linear relationship between the measurements and model outputs. 74 The first model in the two-step linear regression procedure proposed by Guillas et al. (2006) relates the measurements, {0(t)}, to the model outputs {M(t)} by = c+aM(t)+ Nt, t= 1,2, â€¢,T^(5.1) with autocorrelated residuals Nt = PNt--1 + et. ^ (5. 2) g â€˜9, O O 0.2 0.3^04^05^06^07 08 Pearson's correlation coefficients between measurements and model output Figure 5.1: Histogram of Pearson's correlation coefficients between hourly measurements and model output for the 78 stations. The residuals for the autocorrelation model in turn satisfy the following 75 linear regression model: 11 et = ^ aimz (t)^Et, ^ (5.3) i=0 m i (t) { 1 if t mod 12 = i; 0 iftmod12 ^ i. i = 0, â€¢ â€¢^11. The models (5.1), (5.2) and (5.3) are the same for all stations, while the coefficients are station specific. Both the measurements and model outputs are at the hourly level of temporal resolution and thus they have a strong auto-regressive structure. {Xt } is an AR(p) process if for every t Xt = C where e t ti ^ PiXt-i + Et, N(0, (7,2 ) independently and identically. The autoregssive pa- rameters are p i , â€¢ â€¢, pp . Many authors omit constant c for simplicity. Some constraints are necessary on the values of the parameters of this model in order that the model remains stationary. For example, processes in the AR(1) model with 'p i ' > 1 are not stationary. Assuming the {N t } have an AR(1) autoregressive structure provides computational simplicity at the possible expense of realism. We also have tried an AR(2) structure for the {Nt } but the forecast results do not improve, leading us to use AR(1) throughout 76 this thesis. Following Guillas et al. (2006), we first fit model (5.1) by using generalized least squares as proposed by Box et al. (1994), N(t) being an AR(1) process. If the model output captures the temporal structure of the measurements very well then model (5.1) would be enough. However, most of the time, the measurements will still have some temporal structure that the model output fails to capture. That is to say, the residuals {e t } will have non-zero means. So, the second step is to fit model (5.3). The covariates in this model are indicator functions of the 12 hours. We also tried indicator functions of the 24 hours, but the forecast results do not improve. So, in favor of fewer parameters, we choose indicator functions of the 12 hours. For the residuals of the AR(1) process {Nt}, Et r N(0, (7) independently and identically. Models (5.1-5.3) are fitted sequentially and independently. These two models can then be used to forecast the measurements. Suppose at each station, the training data are measurements and model output for times t =1 to T. For times t = T +1, â€¢ â€¢ â€¢, T +T', only the model output is available and the goal is to forecast the measurement during that future time period. To do so, models (5.1-5.3) are fitted to obtain estimators 6, a, The forecast of O t is then the following O t = a + etMt + 77 1), = 1, â€¢ â€¢ â€¢ , 12. where 11 it = p it-i +^d-rn i ( t )^ (5.4) t =T + 1, â€¢ â€¢,T +T' Although the two-step regression above is easy to implement, it has two disadvantages. First, it does not include any spatial correlation between stations. We fit models (5.1-5.3) at each station independently. Thus we can only forecast the measurement for stations that have measurements in the past, after estimating the requisite coefficients. In other words, it cannot borrow strength across space to forecast future values of stations without past data by exploiting data from neighbors with a past. Second, the twostep regression procedure only gives point forecasts without any indication of their uncertainties. To address the first disadvantage of the two-step regression, we first consider a simple but ad-hoc approach which Kriges parameter estimates (Ct, and d i ) across space to get values for them at stations without measurements. After affirming the value of our ad-hoc approach, we address the second disadvantage by extending it to a Bayesian spatial-temporal model. The latter has better Bayesian credentials than the former but that comes at the price of greater complexity. The credible predictive intervals of the forecast and spatial prediction can then be obtained from their MCMC posterior samples at considerable computation cost. 78 5.2 Two-step linear regression with Kriging This section presents an ad-hoc approach which is able to not only forecast the measurements but also spatially predict the measurements at those locations without historic measurements. First, we fit models (5.1) and (5.3) at the stations with measurements. Then the estimates of parameters a, c and a z , i = 0, â€¢ â€¢ â€¢, 11, are interpolated at locations without measurements by Kriging as described in Section (2.3). We uses the exponential function (2.3) to model the spatial covariance. More complex choices of the spatial covariance function will be considered in future work. At the locations without measurements but with model output, we have the Kriging interpolated values of the parameters a, c and a z , i = 0, â€¢ â€¢ â€¢,11. Then we can plug these parameters into Formula (5.4) to obtain spatial predictions of the measurements. 5.3 A Bayesian Hierarchical Spatial-Temporal Model The ad-hoc approach in the previous section is "self-contradictory" because it first fits models (5.1-5.3) to get estimates for {a i l ignoring their spatial correlation. But then it uses Kriging to interpolate the estimated parameter values at the locations without measurements assuming parameters at different stations are spatially correlated. 79 Another disadvantage of the ad-hoc approach is that it fails to provide measures of the uncertainties associated with the forecasts or predictions. Kassteele et al. (2006) use a similar two-step regression model to interpolate PM10 concentrations over western Europe. In this paper, the authors first fit a linear regression model with PM10 as the response, modeling output and other explanatory variables as covariates. Then the residuals are interpolated using Kriging. The approach in Kassteele et al. (2006) has the same weakness as the ad-hoc approach presented in the previous section. That is, it assumes the residuals are spatially independent to fit the model by using ordinary least square method and then uses Kriging to interpolate the residuals at other locations. Moreover, the model used in Kassteele et al. (2006) does not include temporal correlation because the data are yearly averages of PM10 concentrations that do not have much temporal correlation. Hence, strictly speaking, both the ad-hoc approach in the previous section and that of Kassteele et al. (2006) do not yield spatial-temporal models. This section presents a Bayesian hierarchical spatial-temporal model that assumes parameters at different stations in models (5.1) and (5.3) are spatially correlated Gaussian processes. For the Bayesian spatial-temporal model has the following mathematical 80 form: -= a8 C8M8,t N8,t N8,t = PN8,t-1^2 1,8Z1,t^72,8Z2,t^' â€¢ â€¢ + 712,8Z12,t^C8,t ( a = (a i , â€¢ â€¢ ct a ) T ti MVN(p a , E a ) c = (c1, â€¢ â€¢ â€¢, cm)" N MVN(1 1 , Ec) = (71,i, â€¢ â€¢ â€¢,71,n) T '712 = (712,1, â€¢^712,n) T^MVN(/.6712 , E 712 ) (â‚¬14, â€¢ â€¢ â€¢,â‚¬,,,,t) T V MVN(0, E E ) independently and identically times t = 1, â€¢ â€¢ â€¢, T, and sites s = si, â€¢ â€¢ â€¢,sn = (Pa, â€¢ â€¢ â€¢, Pa)T 1-tc = (te, â€¢ â€¢ ',1-tc) r E a = aa2 exp(â€”D/A a ) E, = o-c2 exp(â€”D/A,) Eli = (77 2 exp(â€”D/A ), i = 1, â€¢ â€¢ â€¢, 12 = (P-y, â€¢ â€¢ â€¢,/- 1 72) T, i = 1, â€¢ â€¢^12 E E = cr e2 exp(â€”D/A,), ^ (5.5) n being the number of monitoring sites, T, the number of hours and D, the 81 Euclidean distance matrix between stations. Vectors p a , p c and E 7, have dimension n. In model (5.5), s denotes location and t, time. The timerelated covariates at time t are Zi , t = 0 except for t mod 12 = i where Zi , t = 1, i = 1, â€¢ .12. These time-related covariates are the same as the indicator functions m, (t) in (5.3). At each time t, the residuals at different locations are also spatially correlated. The spatial correlation is isotropic and specified by an exponential correlation function. At each location s, the residuals f 8 , t form an AR(1) time series with mean 0. We make the following assumptions for this Bayesian spatial-temporal model. â€¢ The measurements and model output are linearly related. â€¢ The autoregressive parameter p is constant across stations. Section 11.2 has more discussion about this assumption. â€¢ The temporal-spatial correlation is separable. For a process {Z(s, its temporal-spatial is separable if the correlation can be written as product of spatial and temporal correlation functions. That is, the temporal correlation does not change over space and the spatial correlation remains the same over time. Section 11.2 discuss future work to release this assumption. â€¢ The spatial correlation between measurements at different stations can be explained by the model output and the spatial correlation between 82 the coefficients and residuals. Section 7.3 justifies this assumption in our data analysis. We use the Gibbs sampling algorithm proposed by Gelfand and Smith (1990) to fit the Bayesian spatial-temporal model (5.5). The key to make Gibbs sampling work efficiently is to write model (5.5) in matrix form, that is 0= Aa+Mc+N RN = 7 1 Z1+ â€¢ + -y i2 Z + E a c ti -y i E IT-1 MVN(/-t a , Ea) N MVN(p c , E c ) ti ti MVN(iz,yi ,^i = 1,â€¢12 MVN(0, IT-1 0 Ee),^ (5.6) being an identity matrix with dimension (T â€” 1) x (T â€” 1) and E f is the spatial covariance matrix of residuals f,, t across space at each time t. The measurement vector is denoted as 0 = (01,1, â€¢ â€¢ â€¢,01,n, â€¢ â€¢ â€¢, OT,1, â€¢ â€¢ , OT, TO T and the vector N, as N = (N1,1,â€¢ â€¢^â€¢ â€¢, NT,1, â€¢ â€¢ â€¢ NT,Th) T . The measurements are arranged first in the order of locations and then in the order of time. So, the measurements from different locations at the same time are next to each other. That is how we get the Kronecker product structure for the variance matrix of residuals 83 E in model (5.6). The matrix form of the model output is 0 M 0 â€¢ Mt, 1 â€¢ â€¢â€¢ = Ml,n 0 0 0 Mt,n MT,1 0 0 0 â€¢ â€¢ - MT,n nTxn In the matrix M, the rows from (t â€”1) x n + 1 to t x n form a diagonal matrix of dimension n x n for t =1,â€¢ â€¢ â€¢,T. The elements in each diagonal matrix are the measurements from different locations at time t. The matrix 84 A has the same structure as M. We have 0^1 A ^1 = ^0 0 ^\ 0^l j nTxn In the matrix A, the rows from (t â€” 1) x n + 1 to t x n form an identity matrix of dimension n x n for t = 1, â€¢ â€¢ â€¢, T. The matrix R has dimensions n(T â€” 1) x nT. At each row i, i = 1, â€¢ â€¢, n(T â€” 1), the element at column i is â€”p and the element at column i + n is 1. All the other elements in the matrix R are 0. 85 The matrix Z 1 is 1^... 0 â€¢ â€¢. â– o 0^... 1 1 .Z1 0 = 0 1 0 0^... 0 â€¢ . \0^ â€¢ â€¢ â€¢ 0 n(T-1)xn This matrix has dimensions n(T â€” 1) x n. The rows from (t â€” 1) x n + 1 to t x n (t =1,- â€¢ -,T) form an identity matrix of dimension n x n only for t mod 12 = 1. All the other elements in the matrix R are 0. The matrices Z (i = 2 â€¢ â€¢ 12) have the same dimensions as matrix Z 1 . For matrix Z, (i = 2 12), the rows from (t â€” 1) x n + 1 to t x n (t = 1,â€¢ â€¢, T) form an identity matrix of dimension 77, x n only for t mod 12 = i. All the other elements in the matrix R are 0. Let N = 0 â€” Aa â€” Mc and F = 2_, 17i Z 2 . Then the density of 86 c = RN is the following: p(Ola, c, R, eXP 1 2 EE) a IEE E T21 [R(0 - Aa - Mc) -^(IT_i 0 E r ) -1- [R(0 - Aa - Mc) - 1]) . (5.7) The inferences about the parameters, forecasts and predictions are based on their Markov Chain Monte Carlo (MCMC) samples generated by the Gibbs sampling algorithm. That algorithm as implemented is iterative. First, we choose some arbitrary initial values for all the parameters. Then, in each iteration of the algorithm, the parameters are updated by generating a random sample from their conditional distributions given the data and other parameter values from the previous iteration. For computational efficiency, each of the parameter vectors a, c and -y l , â€¢ â€¢ -y i is updated as a block. For completeness, we list the conditional distributions for each of the parameters from (5.8) to (5.16) . The prior and full conditional distributions of a are P(aitta, Ea) ti MVN (Pa, Ea) P(ailia, Ea, M, A, R, E E ) N MVN(t.t a' , Ea) Ea =^+ A TRT (/ T _ i E f ) -i RA] 1 a = Ea [A T RT ('. T_i 0 E E ) - .17(0 - Mc) -A TRT (IT_ i Â® EE) -1 F + Ea l Pa] â€¢^(5.8) 87 The prior and full conditional distributions of c are p(c! p, e , E c ) ti MVN(p,,, E c ) Ec, M, M, R, E E ) ti MVN^E c ) E ic = [E;- ' + M TRVT_i Â® E E ) -1 RM] 1 p c = Ec [M TRVT_i Â® E,) -1 R(0 - Aa) -M TRVT-i 0 + . (5.9) The prior and full conditional distributions of^i = 1, â€¢ â€¢ â€¢, 12, are P(7i1I-t 7o EN) "' MVN(FI N , E-yi ) p(-y i^, E N , M, A, R, f ) Eli ti MVN Gu 7i , i =^ZT(iT-1 EE)-1Z2ri E 7.[ Z iVT - 1 -F + = EE)-1 (R(0 - Aa - Mc) (5.10) The prior and full conditional distributions of p a are both univariate 88 normal distributions with the following forms: gttalao, fo)^N(ao, fo) P(/ualao, fo, Ea, a) ^N(ao, to) fo = [fcT i + 21 TE 1 1] -1 ao = fo [a TE: 1 1 + 1TEI, l ot] ,^(5.11) 1 being a vector of dimension n with all the elements are 1, f o and f,'3 , the prior and conditional variance of p a . The full conditional distributions of p c , (i = 1, â€¢ â€¢ â€¢, 12) are almost same as (5.11) only replacing a, E a with c, E, and -y a , KT, correspondingly. The prior and full conditional distributions of a, are both inverse gamma distributions with the following forms: p ( o.a2 1 a , 0 )^(aa2 ) ,_1 exp ( ^1 ) ,(7?, P(cja2 ia, ,(3, a, Aa, [la)^(cja2 ) -ai^exp a=a+ Cr?4 n 2 = [1/Q + -21j- (a â€” a ) T (exp(â€”D/A a )) -1 (a â€” p a )]^(5.12) We cannot find a conjugate prior for the parameter A a , so its conditional 89 distribution does not have a closed form. However, we have p(A a la, p a , a) cx lexp(â€”D/A a )1 -1 expf- -1 (a â€” ti a ) T (exp(â€”D/) a )) 1 (a â€” it a )}.^(5.13) To obtain a random sample from the above distribution of A,â€ž we have to use the Metropolis-Hasting algorithm proposed by Metropolis et al. (1953) and Hastings (1970). The conditional distributions of a c2 , A, and o- 72 1 ,A.-yi (i = 1, â€¢ â€¢ â€¢, 12) are very similar to cr a2 and ' a . We only need to replace a, Pa with c, pt, and y 2 Ec ya correspondingly in (5.12) and (5.13). , The prior and conditional distributions of al are both inverse gamma distributions with the following forms: P(CF a, /3 ) CX (C4) -a-1 eXp 1 i30") 2 I a, , C, AE, i-tc)^(cr,2 )^exP^ P( Gr^/3^ n(T â€” 1) a=a+ 2 (3 = [11 + /31G1) (etT(exP(â€”D/AE))-1et) 1 1 (5.14) =2 Et being the residuals vector at different locations at time t. As with ) a , we use Metropolis-Hastings algorithm to obtain a random sample of A f . We 90 have P(Aflo",^icx lexP( â€” DIAE)I -1 1 t=1 (et T (exP( DIAE) â€” r et)}. i (5.15) To update p, its ordinary least square (OLS) estimate given other parameters is used instead of a random sample from its conditional distribution. We do this in part for computational simplicity. Finding a conjugate prior for p is very difficult while finding its ordinary least square estimation is very easy. The main reason is that the variance of the conditional distribution of p is small enough to be treated as zero because of the large sample size n x (T â€” 1). The OLS estimate of p is vâ– T^vâ– 2 =^ Lâ€”dt= 2 v8 5t^Laj=i i,t7i,S)NS,t-1 ^2 (5.16) Et=2 where N is the matrix form of N in model (5.6) and 234 = 0 except for t mod 12 = j where 2i , t = 1. At each iteration in Gibbs sampling, the forecast or prediction is obtained in an iterative way. The forecast or prediction is O t = a, + c 8 M t + Nt 12 N t = pN t _ i + ^ 72,8Zi,t + Et i=1 Et is a random sample generated from MVN(0, E E ),^(5.17) 91 M t being the vector of measurements from all the stations at time t, bdc t , the residuals at time t, O t the forecast or prediction at time t. At those stations where the historic measurements are available up to time T, at , t = T + I,. â€¢ -,T + T' are the forecasts. At those stations where there is no historic measurement, a t , t = 1,â€¢ â€¢ â€¢, T are the predictions. The parameters at those stations without measurements can be spatially predicted by the parameter values at stations with measurement. Finally, the credible interval of the forecast or prediction is obtained by taking the quantiles of all their MCMC samples. 92 Chapter 6 Multivariate Spatial-Temporal Model Both the BM model in Chapter 2 and the Bayesian spatial-temporal model in Chapter 5 are univariate models. That is, the responses in these models are univariate variables. However, in practice, the monitoring stations can record the concentration levels of more than one pollutant. It is also possible for the deterministic models to generate multivariate outputs. For example, Grimit and Mass (2002) describe the University of Washington mesoscale short-range ensemble forecasting system over the Pacific Northwest. This system can generate temperature and sea surface level pressure at the same time. The multivariate regression problem has been studied extensively by Brown and Zidek (1982); van der Merwe and Zidek (1980); Brown and Zidek (1980); Cripps et al. (2003), etc. In geo-statistics, a well known tool for interpolating a multivariate spatial process is the so-called co-Kriging approach described by Wackernagel (1998). Co-Kriging is a multivariate extension of the Kriging approach. However, like Kriging, co-Kriging is purely a spatial model which 93 cannot handle the temporal correlation in a fundamental way. The more recent literature of multivariate pollutant prediction includes Mardia and Goodall (1993); Brown et al. (1994); Le and Zidek (1997); Sun et al. (1998); Zidek et al. (2000); Schmidt and Gelfand (2003); Gelfand et al. (2004). The related chapters in the books by Le and Zidek (2006); Banerjee et al. (2003) provide excellent summaries of various multivariate models used in the spatial statistics. However, many of multivariate models proposed before assume the responses are temporally independent. In some cases, this assumption is valid because the data has been monthly or yearly averaged so there is not much temporal correlation left. In other cases, some preliminary analysis has been done to remove the temporal correlation before using the multivariate models. This chapter presents a multivariate model which is an extension of the univariate spatial-temporal model proposed in Chapter 5. It incorporates both spatial and temporal correlation. This multivariate model is built in a Bayesian framework in order to evaluate the uncertainty of the forecasts and spatial predictions. The rest of this chapter is organized as follows. Section 6.1 presents the details of the multivariate spatial-temporal model. Section 6.2 presents the MCMC algorithm to fit this multivariate spatial-temporal model. 94 6.1 Multivariate Spatial-Temporal Model This section presents the model specifications and full conditional distributions of all the parameters needed to implement the MCMC algorithm. At time t and station s, the model has the following form: 0 84 = /3 8 M 8,t + N8,t 1â€¢7 8 , t = pN8,t-i+'7 8 Z8,t+ C 8 ,t t =1,â€¢ â€¢ â€¢T 8 =1,â€¢ â€¢ .n E8,t ti MVN(0, E,) independently and identically,^(6.1) 0 8 , t : q x 1 being the measurements vector of q pollutants and M a , t : (p + 1) x 1, the vector of intercept term and p model outputs for q pollutants. It is not necessary that p = q. The covariate Z,, t is a vector of dimension m, which can include m other variables helping to forecast or predict the measurements. For example, Z can include temperature which is helpful to forecast or predict the ozone concentration levels. The matrix E, is the covariance matrix of q pollutants, the so-called co-regionalization matrix in geo-statistics. The auto-regression coefficient matrix p is of dimensions q x q. The coefficients O s and ye at each location s are matrices of dimensions q x (p + 1) and q x m respectively. Each element of the matrix /3 and -y, 95 are spatially correlated across stations. That is, we have ( 13i,j,1 1 .. = = 17 0i, .7 )T ti MVN(A0,i,j, Es,i,i) aQ, i ,7 exp( â€” D/A0,i,9) â€¢ â€¢ â€¢ 7 ql j = Â°I â€¢ â€¢ ("Y i,j,17^i,j,n)T r". â€¢7P MVN(A-y ,i,j7 =^exp(â€”D/),i,j) i = 1, â€¢ â€¢ â€¢,q, j = 1, â€¢ â€¢ â€¢,m. ^ (6.2) Spatial processes {0, 0 , 8 } (i = 1, â€¢ â€¢ â€¢, q, j = 0, â€¢ â€¢ â€¢,p) and {-y2 , 3 , 8 } (i = 1, â€¢ â€¢ â€¢, q, j =1,- â€¢ â€¢ , m) are assumed to have exponential spatial correlation functions. However, these spatial processes, 0, 0 , 8 1 and 0, 0 , 8 1), have independent prior distributions. We use the Gibbs sampling algorithm to fit models (6.1) and (6.2). We write model (6.1) in a matrix form to make the MCMC algorithm more efficient. The matrix form is the following: 0 = M f3 + N RN=-yZ+c c ti MVN(O, I E E ), (6.3) the dimension of identity matrix I being n(T â€” 1) x n(T â€” 1) , the measurements vector 0, formed by stacking vectors O t (t = 1, â€¢ â€¢ -,T). In other words, Ot = (01,1,/7 â€¢ â€¢ â€¢, q,l,t1 â€¢ â€¢ 7 Â°1,01 â€¢ â€¢ â€¢ 96 0 q,k,t1 0 1,n,t7 â€¢ â€¢ 0q,n,t)T, which includes the measurements of q pollutants for n stations at time t. In model (6.3), we have the coefficients vectors =^â€¢ â€¢^01,0,n,^ 13i,3,11 â€¢ â€¢ '7 1^' â€¢ i3q,p,11 ' ' ' 13q,p,n) and 1 T = (71,1,i, â€¢ â€¢ â€¢ 71,1,n, â€¢ â€¢ â€¢ ,^â€¢ â€¢ â€¢ , 7i, 7 , n , â€¢ â€¢ â€¢ N m 1 â€¢ â€¢ â€¢ , 'yq,m,n ) T. The coefficient , = , 1, q, j = 1, â€¢ â€¢ â€¢ , p, k = 1, â€¢ â€¢ -,n ) is the effect of j-th model output on the i-th measurement at station k and f3i , o , k is the intercept of the i-th measurement at station k. In principle, we can arrange the elements of vectors Q and y in arbitrary orders. The reason to have the specific ordering above is that we can easily write the covariance matrix of the coefficients and 7 in a block diagonal form shown later in the next section. In model (6.3), the model output M is matrix of dimension nqT x nq(p + 1). The covariate matrix Z is of dimensions nqT x nqm. The matrix of measurement M is formed by stacking the matrices M t , t = 1, â€¢ â€¢ â€¢ ,T by rows. Each matrix M t is of dimensions nq x nq(p + 1) and it is formed by stacking the matrices m i , i = 1, â€¢ â€¢, n by rows, m i being a diagonal block matrix of dimensions q x nq(p + 1). The row vectors on the diagonal of matrix m 1 is (Mo,i,t, 0 â€¢^0, ^0 â€¢ â€¢ 0, â€¢ to the intercept, M1,1,tâ€¢ â€¢ â€¢ â€¢â€¢ p i 1 , , â€¢ , Mp, 1,t , 0 â€¢ â€¢ 0) M0,1,t = 1 corresponding p covariates at station 1 at time t. The matrix Z has the same structure as M except that its dimensions are of nq (T â€” 1) x nqm. 97 The density function of 0 is the following: p(010 ,-y, R, EE) oc II 0 Ed -1 ( 1 exp -- [R(0 â€” M/3) â€” ZIT (I E -1 ) [R(0 â€” M 1 3) â€” Z-y])(6.4) 2 6.2 MCMC Algorithm To use the Gibbs sampling algorithm, we first need to derive the full conditional distributions of all the parameters. The likelihood function of the parameters 13, y E and p is (6.4). The prior and full conditional distribu, tions of are the following: p(13 ÂµQ, Ep) MVN(p o , Ep) p(Oliz o , Ep, R, 0, Z, E,, 7) MVN(i4, E t3' ) [ Efi i m TRT (/ E i) R,A4-] 1 p si = Ep [E iVp is + M TR T (I 0^(RO â€” Z-y)] . The prior mean p o is formed by stacking vectors = (up,,, j , â€¢ â€¢ â€¢â€žup,, o )T, i = 1, â€¢ â€¢,q and j = 0, â€¢ â€¢ â€¢ , p. The n elements in vector Âµ o, , ,3 are identical because we assume the prior mean of A o is the same at each station. The prior variance of [3 is Ep = diag(E 04 , 0 , â€¢ â€¢ â€¢, E 0 ,,, j , â€¢ â€¢ â€¢, Ej, q ,p ). Each matrix Eo, i0 is the spatial covariance matrix of vector 63, 04 , â€¢ â€¢ â€¢, 0 20 7,), that is, the coefficients ,32 , 3 at different stations. , 98 Because the spatial processes of coefficients )32 , 3 are independent of each other so E 0 is a diagonal block matrix of p x (p + 1) blocks. The prior and full conditional distributions of -y are the following: P(7111 y , E-y)^MVN(u-r , E-y) p(-y1/1 7 , E 7 , R, O, Z , EE, -y) MVN(pt7, E7) Ely = [E; 1 + zT(I Ei)z] -1 Âµy =E7 [E.:y-141 7 + Z T (/ E l )R(0 â€” M /3)] . The prior mean and variance of^have the same structure as those of 0 0 . The only difference is that the dimension of the coefficient matrix -y k at each station k is of q x m but the dimensions of 13 k is q x (p + 1). A q x q positive definite matrix W is said to have a Wishart distribution with v degrees of freedom if its density function is of the form f (W) = [2v 0 F q (v /2)] 1 Litirv/ 2 1Wr - q -1 exp [â€”tr(A -1 W)/2] F q (t)^rq(q-1)/41171F[t â€” (i â€” 1)/2] r(x) = Z oo tx -l e -t dt.^ (6.5) for any positive q x q matrix A, tr(A -1 W) being the trace of matrix A -1 W. Let W = E -1 stands for the precision matrix in the density function 99 (6.4). The prior and full conditional distributions of W are p(W) - Wishart(A, v) , , p(We) --, Wishart(A', v') A' = (A -1 + "Ã© TÃ«) -1 7; = v + n(T â€” 1).^ (6.6) The n(T â€”1) x q matrix 6- is formed by stacking row vectors e B , t , t = 2, â€¢ â€¢ â€¢, T and s = 1, â€¢ â€¢ â€¢, n. Each residuals vector e 8 , t is of dimension q. To update the auto-regression coefficient matrix p, we use ordinary least squares estimation instead of sampling from its full conditional distribution. In Gibbs sampling algorithm, iteratively update each parameter block by its full conditional distributions derived as above in this section. At each iteration in Gibbs sampling, the forecast or prediction is obtained in an iterative way. The forecast or prediction is O,â€ž t = 0 8 m ,,t + N s,t 1 V 8,t = PN 8 t-i + 3 68,t c a , t is a random sample generated from MVN(0, E E ).^(6.7) At those stations where the historic measurements are available up to time T, O 8 , t , t = T + 1, â€¢ â€¢ -,T + T' are the forecasts. At those stations where there is no historic measurements, O 8 , t , t = 1, â€¢ â€¢ â€¢, T are the predictions. 100 The parameters at those stations without measurements can be spatially predicted by the parameter values at stations with measurements. Finally, the credible interval of the forecast or prediction is obtained by taking the quantiles of all their MCMC samples. Section 7.4 presents an application of the Bayesian multivariate spatial-temporal model and demonstrates its use. 101 Chapter 7 Forecasting and Spatial Prediction One of the important purposes of deterministic models is to predict the phenomena they represent. This chapter first presents spatial prediction results for BM described in Chapter 2. Then it continues to explore the making of such predictions and forecasts by the spatial-temporal model proposed in Chapter 5. In the rest of this chapter, we use the term "prediction" to refer to spatial prediction, while "forecast" refers to future responses. 7.1 Bayesian Melding Model Prediction The data comes from two sources: regional surface ozone concentration measurements and model outputs from a deterministic model AQM (air quality model), a non-hydrostatic version of the MAQSIP (Multiscale Air Quality Simulation Platform) model. This AQM system has been described in detail by Wheeler and Houyoux (1998). The AQM model output is based on grid cells with resolution 6 x 6 km 2 . The measurements are from the Air Quality 102 System (AQS) monitoring network. Both the measurements and model outputs are hourly concentrations starting from May 15 to September 11, 1995 over a 120-day period. The dataset represents 375 monitoring stations in the AQS network and 307 grid cells in the AQM output. The measurements and model outputs are based on different supports, that is, measurements are based on point values while model outputs are grid cell averages over space. In addition, the model outputs are based on the UTC (Coordinated Universal Time) time standard, the measurements on local time. Ignoring this time difference would result in poor correlation between measurements and model outputs. The next section shows the necessity to adjust this discrepancy. This dataset has been analyzed by Kasibhatla and Chameides (2000), Hogrefe et al. (2001b) as well as Hogrefe et al. (2001a). The first paper compares the Pearson correlation coefficients between quantiles (10th, 25th, 50th, 75th and 90th) of the measurements and AQM model outputs for daily average data. The latter two papers also analyze the correlation between measurements and model outputs, but after decomposing the hourly time series into sub-series on different time scales. They reach similar conclusions, that the AQM model output represents the measurement better at longer time scales. However in estimating the correlation between measurements and model output, the above analysis ignores the temporal and spatial correlation in the data, thus leaving some uncertainty about their validity. Moreover correlation analysis does not help in the assessment of the model calibration 103 (additive and multiplicative) since the correlation is by definition invariant under the transformation of measurement scales. Finally our interest lies not merely in the degree of linear association between measurements and model outputs measured by the correlation, but rather in predicting the ozone level at unmonitored locations. These factors motivate a new data analysis using the BM. Preliminary data analysis The previous section points out that the measurements are based on local time while the model outputs are based on GMT (Greenwich Mean Time). So we adjusted the model outputs to the local time. The dataset has 119 days after the adjustment. Figure 7.1 shows the linear relationship between the measurements and model outputs. The measurement series, unlike that for the model outputs, have missing values. For example, all the measurements from station 550730005 (in Wisconsin state) are missing. To deal with the missing values, we first choose those stations that have no more than 100 hours of missing measurements. Second, we use the 24 hour mean to fill in the missing values. For example, if the missing values occur at 1000, then we use the average of the available values at 1000 every day to fill in these missing values. After adjusting for different time standards and ignoring the stations with more than 100 missing measurements, we have measurements at 81 stations and model outputs 104 N. ^ 0 50 ^ 100 ^ 150 station 290470003 Figure 7.1: Scatter plots and Pearson correlations between measurements of station #290470003 and model outputs for cell #1847. The y-axis stands for the model outputs of grid cell #1847 and the x-axis stands for the measurements at station #290470003. The station #290470003 is inside grid cell #1847. Observe that the correlation between measurements and model outputs is 0.7. 105 on 375 grid cells of 2856 hours (119 days). In these 375 grid cells, there are 78 grid cells which contain one and only one station. To enable us understand better the role of model-to-measurement correlation in spatial prediction, from now on the data always will focus on the 78 grid cells with 78 stations inside the grid cells. Although the measurements and model outputs are available during the 119 day period, we only focus on the 30 days of July, when the ozone concentration is at a high level due to high temperatures. The ozone concentration level at night and early in the morning is much lower than during the day. Figure 7.2 shows the 24 side-by-side hourly box plots of measured ozone levels at station #10731005 and simulated ozone levels at grid cell #3529. The station #10731005 is inside grid cell #3529. Both the observed and simulated ozone level are at a peak during the 8 hours from 1000 to 1700. Figure 7.3 shows the histograms of the model-to-measurement correlation at all 24 hours and at the 8 hours. Based on Figures 7.2 and 7.3, we focus on the analysis of 8-hour measurements and model outputs, since otherwise the two data sources are quite dissimilar. Moreover, little interest obtains in the hours outside that period. From now on, the 8-hour average measurements or model outputs is also referred to as the "daily average". We apply BM to analyze the hourly, daily and weekly average ozone concentration levels for the selected 8-hour time period. Kasibhatla and Chameides (2000); Fiore et al. (2003, 2004) focus their analysis on similar daily average data. In particular, Fiore et al. (2003) use the empirical orthogonal function (EOF) method, a principal component analysis (PCA), to compare the first 106 two principal components of the measurement and model outputs. The simulated ozone levels at the 78 grid cells and measurements at 48 stations are used to fit the BM model (2.9). The measurements at the remaining 30 stations are used as validation data. These 30 stations are called "un-monitoring" sites from now on. Figure 7.4 maps the grid cells and monitoring/un-monitoring sites. For locations s i and 8 2 , we use the Euclidean distance of their longitude and latitude to define the distance between them as the following: di st( s 1 , s 2 ) = V ( 8 1,1 â€” S2,1) 2 + (S1,2 â€” s2,2) 2 , â€˜ (7.1) s 1 , 1 and s 2 , 1 being the longitudes of location s 1 , s 1 , 2 and s 2 , 2 , the latitudes of location 8 2 . In fact, the more accurate formula to calculate the distance in km between two locations is the following: dist(s i , s 2 ) = 111.21 x V(s 1 ,1 â€” 824) 2 + cos 2 (s1,2/57.3)(81,2 â€” s2,2) 2 . (7.2) The distances calculated from Equations (7.1) and (7.2) are not proportional because of the cosine term in Equation (7.2). However, the relative errors of the distances between the 78 stations we have is small. Figure 7.5 shows that about 90% of the relative errors are within 10%. Another reason for us to use the Equation (7.1) to define the distance between stations is for the usage of Kriging in R package "geoR" by Ribeiro and Diggle (2001). That package define the distance by using Equations (7.1). To compare the results 107 CL CL â€” cr) a) 0 0 -T- -1- . C o 0 0 a) N 0 LO a) 0 80 -cL 0 -9- 00 -L. o - _o0 o 2 ^ b=4 1111111111111111 (0 0^1^2^3^4 5^6^7^8^9^10 11 12 13 14 15 16 17 18 19 20 21 22 23 hours n. CL 0 (i) TI) 0^6 o,- o 6 8 -r-^ ^eEER H -1- 0 a) 0 8 0 0 0 - - N 0 o 0 0 0 0 0 0 7 ^-7- 0 0 1^I^ 11 I FLH ^ 11 1 -L- 0^1^2^3^4^5^6^7^8^9^10 11 12 13 14 15 16 17 18 19 20 21 22 23 hours Figure 7.2: The upper panel is the 24 side-by-side hourly box plots of measurements at station #10731005. The lower panel is the 24 side-by-side hourly box plots of model outputs (simulated ozone levels) in grid cell #3529. The station #10731005 lies inside grid cell #3529. 108 O o co C a) 0 C) LL 0 0 O O -0.4^-0.2^00^02^04 ^ 06 ^ 08 correlation over space for all 24 hours 0 0 -N - C C.1) 0 LL o - -0.2 0.0 02 04 06 correlation over space for hours 1000-1700 Figure 7.3: Histograms of correlation over space at each hour. The upper histogram is for all the 24 hours. The lower histogram is for the 8-hour day time. Notice the markedly larger model-to-measurement correlations during the hours 1000-1700. 109 08 of Kriging and other approaches, we stick to that definition of the distance from now on. We compare the results of spatial prediction between two competing approaches: BM and Kriging. In fact, we have two different versions of Kriging: Kriging using measurements and Kriging using model outputs. The RMSPE (root mean square prediction error) measures the predictive performance. At time t, which could be hour, day or week, we define the RMSPE by RMSPE =^(0i â€” O i ) 2 , (7.3) n being the number of un-monitoring sites to be predicted, 0â€ž the measurement at station i and O i the prediction. , Analysis of Hourly Data For the 30 days from July 1 to July 30, 1995, we have 240 hourly measurements and model outputs at the daily 8-hour time period (1000-1700) selected for analysis. We apply the BM model and Kriging to each hour separately. For brevity, we list the average RMSPE for the BM model and Kriging in Table 7.1. The averages are computed over the 240 hours. Figure 7.6 shows the plot of the RMSPE difference between the BM model and Kriging using only measurements versus the correlation between measurements and model outputs over space. We offer the following observations about the analysis 110 i^1 -100^-95 -90 i i 1 i -85 -80 -75 -70 Figure 7.4: Locations of 48 available stations, 78 grid cells and 30 unavailable stations. A: available stations. Rectangle: grid cells. +: unavailable stations to be predicted. The x-axis is the longitude in degrees, y-axis, the latitude in degrees. 111 0 2 4 8 10 12 relative errors in percentages Figure 7.5: Cumulative plots of the relative errors in percentages between the distance using formula 7.1 and formula 7.2. 112 of hourly data. â€¢ Table 7.1 shows that on average, the BM predictor has the smallest RMSPE among all the competitive prediction approaches. â€¢ The BM predictor seems marginally better than Kriging using only the measurements. The RMSPE for the BM predictor is 15.82 and for Kriging using measurements, 16.36. As shown in the next section, the BM predictor does much better than that version of Kriging in the analysis of daily average data. â€¢ Kriging using model outputs has the biggest RMSPE, pointing to the desirability of calibrating the model outputs. â€¢ Figure 7.6 shows that in general, for spatial prediction, BM more likely outperforms Kriging using measurements when the correlation between measurements and model outputs is big. Note there is one hour at which the BM prediction is much worse than Kriging. The reason is that the estimation of variance by BM is huge at that hour. This also reveals some instability of BM with the hourly data, which maybe one of the reasons that Fuentes and Raftery (2005) use the weekly average data in their analysis. â€¢ The coverage probability of BM's 90% predictive interval is 86.72%, which is fairly good. 113 ^ 0 o0 o 0 0^ 0 0 0 o_ o_ 0 0 0 0 0. 0 ^co c 0 coo 0 00 0 0 UJ 0 o 0 1 .7r ^ocspo 30 0 0 oP^6 eau cti" (9 o Â°^voo 000 00 0 >, Â°0 of 000 , 1 oe%^80. e 0 0 00 00 Â°^ 0 0 ^,(:^ )? 0(1,8 0 cf) 0 cc^'7 CS) 2 w a- 0 I (/) 2 EC C â€¢ 0 0 0.0 ^ I^ 0.2 ^ I^ 0.4 ^ 0.6 ^ spatial correlation between observation and output Figure 7.6: Hourly RMSPE (root mean square prediction error) difference between Kriging using measurements and BM versus the correlation over space between measurements and model outputs. Points above the horizontal line in the plot represent victories for the BM model over Kriging. 114 I 0.8 Analysis of 8-hour Daily Average Data This section presents our analysis of daily 8-hour (1000-1700) measurements and model outputs averages. We would make the following observations about our analysis of daily average data. â€¢ Table 7.2 shows that for daily averages, the BM model has the smallest RMSPE among all competitive prediction approaches. The RMSPE of the BM model is smaller than Kriging using measurements on 18 out of the 30 days. The average RMSPE for Kriging using model outputs is the biggest, showing the need to calibrate the model outputs. â€¢ Figure 7.7 confirms the intuitively plausible result that in general the prediction performance of the BM model improves as the correlation between measurements and model outputs increases. When the correlation exceeds 0.6, the BM model performs substantially better than Kriging on most days. â€¢ In implementing BM, we assume the additive calibration parameter a(s) is a polynomial function of the coordinates at location s, that is a(s) = f (8)0 a . We use the reversible jump MCMC to choose the degree of the polynomial function. The reversible jump MCMC can sample the dimension of O a from its posterior distribution. Section 3.6 describes this reversible jump MCMC in detail. For all 30 days, the posterior distribution of O a ranges between 1 to 3. So, we can 115 assume a(s) is a linear function of the coordinates at location s, that is, a(s) = a o +a i s i +a 2 s 2 , s 1 and s 2 being the longitude and latitude in degrees at location s. By assuming a is a linear function coordinates at location s, we get a smaller RMSPE than by assuming a is a constant across the space. â€¢ In BM (2.9), we assume that the multiplicative calibration parameter b is constant across stations. The more realistic this assumption, the better will be the BM predictor's performance against Kriging. In the dataset, each station is located in a grid cell, so the model outputs for that grid cell can be treated as the model output at the station inside the grid cell. Thus, at each grid cell B, we can plausibly estimate b by -(B) â€” et(B) b(B) Z (B) â€ž (7.4) where 2(B) = fB 2(s)ds is the integral of {2(s)} over the grid cell B. Because we only have one sampling point in each grid cell, this integral is just 2(s), that is, the measurement at the station inside grid cell B. After obtaining b(B) from the above formula, we can compute the sampling variance of {b(B)} and the absolute mean difference of {b(B)} between grid cells having measured or unmeasured stations. Figure 7.8 and Figure 7.9 show plots of (Kriging RMSPE - BM RMSPE) versus Var(b) and the absolute mean difference for b, respectively, for the 30 days in July. From these plots we can see that the BM prediction is 116 better than Kriging when Var(b) and the absolute mean difference of b are small. This finding is expected because the BM prediction is better when the model assumptions are more justified. â€¢ The coverage probability of the 90% credible for the BM prediction is 87.33%, which is fairly good. Analysis of weekly Average Data Fuentes and Raftery (2005) analyze weekly average SO 2 measurements and model outputs in USA. The available/unavailable stations and grid cells used in the analysis of weekly average data is the same as in the analysis of daily average data. However, with averages over longer time scales, the predictions of both the BM model and Kriging are expected to improve because of better normality of the data distribution and smaller variation of the average data over the longer time scale. The improvement of the BM prediction also lies in the better performance of deterministic model for the longer time scale as noted by Kasibhatla and Chameides (2000); Hogrefe et al. (2001b,a). Table 7.3 presents that the RMSPE of the BM model , Kriging with only measurements, Kriging with measurement plus model outputs without calibration, Kriging with only model outputs without calibration. We have the following findings from the analysis of weekly average data. â€¢ Table 7.3 shows that the BM model achieves the smallest RMSPE in 117 0 0 0 00 O 0 0 0 O O Â° 0 O^0 O 0 0 0^ 0 0 O O O 0 0.1^0.2^0.3 ^ 0.4 ^ 0.5 ^ 0.6 ^ spatial correlation between measurements and model outputs Figure 7.7: RMSPE difference between Kriging prediction with measurements and the BM prediction versus the correlation over space between measurements and model outputs. Points above the plotted horizontal line represented victory for the BM model. Notice the supremacy of the BM model when correlation exceeds 0.6. 118 0.7 Table 7.1: Average RMSPE of hourly ozone predictions at 30 stations. Column 1: RMSPE for BM model. Column 2: RMSPE for Kriging using measurements. Column 3: RMSPE for Kriging using model outputs. The unit of the numbers in this table is ppbv. BM Kriging 1 Kriging 2 15.82 16.36^16.91 0 O O O 0 0 0 0 0 O 0 O O 0^ 0^ 0 0 0 O O O 0.05 ^ 0.10 ^ 0.15 ^ 0.20 variance of b over space Figure 7.8: Daily RMSPE difference between Kriging with measurements and the BM model versus the variance of b (defined by formula (7.4) ) across space. Points above the plotted horizontal line represented victory for the BM model. Notice the supremacy of the BM model when the variance of b over space is small. 119 (.0 O 0 0 0 0^c, 0 O 0. O O 0^ 0 O O 0 0 0 0 0 0.0 ^ 0.1^ 0.2^ 0.3 ^ 0.4 absolute mean difference of b between measured and unmeasured stations Figure 7.9: Daily RMSPE difference between Kriging with only measurement and the BM model versus the absolute mean difference of b (defined by formula (7.4)) between the measured and unmeasured stations. Points above the plotted horizontal line represented victory for the BM model . Notice the supremacy of the BM model when the absolute mean difference of b over space is small. 120 Table 7.2: RMSPE (root mean square prediction error) for predicting daily average of ozone levels at 30 stations. Column 1: days. Column 2: RMSPE for BM model. Column 3: RMSPE for Kriging with measurements. Column 4: RMSPE for Kriging with model outputs. The number with * is the smallest number in each row. The unit of the numbers from column 2 to 4 in this table is ppbv. day 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 mean BM 14.50 8.99 11.43 13.53 13.65 14.43 13.59 11.73* 17.86 15.14 11.59 15.30* 14.14* 15.61* 18.55 17.95 11.37* 13.67 6.98 * 12.84* 17.44 9.48* 10.04* 9.60* 11.76* 17.55* 12.26* 14.31* 13.49 12.41* 13.37* Kriging 1 13.98 8.58* 8.71* 12.19* 10.63* 11.69* 13.04* 11.82 14.11 9.43 15.60 16.24 19.61 22.67 21.06 19.36 14.14 16.84 11.54 14.03 16.61* 11.88 11.65 9.37 12.90 18.11 15.94 18.33 12.63* 14.66 14.24 121 Kriging 2 13.91* 12.86 11.84 15.51 15.59 16.54 13.54 12.89 15.99 12.76 11.32 15.30 15.16 17.99 18.45* 17.60* 11.70 11.65* 10.51 15.45 17.33 12.54 8.65 11.99 21.11 19.03 14.95 18.07 19.10 14.26 14.79 week 2 and week 4. On average over all 4 weeks, the BM model has smallest RMSPE. â€¢ As expected, the RMSPE for all the predictors are smaller than RMSPE in the analysis of daily average data. â€¢ the BM's 90% predictive interval has a reasonably good coverage probability of 91.67%. â€¢ The RMSPE for BM in Tables 7.1, 7.2 and 7.3 tells us that BM's predictor improves as the averaging time scale increases. This agrees with the findings in Kasibhatla and Chameides (2000); Hogrefe et al. (2001a), that is, the model outputs represents the measurements better on larger time scales. 7.2 Summary and Conclusions Previous sections present the details of the BM model and the MCMC algorithm used to fit the model. We have also conducted a comprehensive simulation and applied BM to the ozone data in different time scales. We see that the BM model has its strengths and weakness. Its strengths are the following. â€¢ The novel method connects the measurement process with the model outputs process by assuming the existence of an underlying process. 122 Table 7.3: RMSPE (root mean square prediction error) for predicting weekly average of ozone levels at 30 stations. Column 1: days. Column 2: RMSPE for BM model. Column 3: RMSPE for Kriging with measurements. Column 4: RMSPE for Kriging with model outputs. The number with * is the smallest number in each row. The unit of the numbers from column 2 to 4 in this table is ppbv. week BM Kriging 1 Kriging 2 1 9.43 7.98* 13.77 2 9.72* 11.22 10.51 3 10.32 10.21 9.83* 4 8.08* 11.76 12.71 mean 9.38* 10.29 11.71 123 This approach addresses the difference-in-support problem in a fundamental way. â€¢ the BM model can do a variety of things such as predict unmeasured responses, assessing deterministic model outputs, detecting spatial trends and estimating spatial correlation. â€¢ The model better estimates prediction uncertainty than the classical approach, Kriging. In our data analysis, the coverage probability of the BM's predictive interval is reasonably close to the nominal level. â€¢ Its Bayesian framework makes the BM model relatively easy to extend to incorporate other things like ensembles, the reversible jump MCMC and non-stationary spatial correlation. However, the BM model also has some weakness as following. â€¢ The computational price is high. By sampling points within grid cells, the dimension of the spatial correlation matrix can be very big even with a modest number of grid cells. Inverting the spatial covariance matrix three times in each MCMC iteration takes a lot of computation time. â€¢ the BM model does not yet cover space-time processes and hence it cannot "borrow strength" over time. Ozone data are recorded hourly, which obviously has strong periodicity and strong auto-correlation. The main challenge to create a "space-time" BM model lies in the 124 computation burden. With temporal correlation, the space-time correlation matrix will be huge and hence more likely to be ill-conditioned. Inverting such a matrix will be both difficult and computationally expensive. However, the dynamic BEM model we present later in Chpater 10 provides a simple extension to a 'space-time" BM model. â€¢ the BM model's normality assumption poses a problem. For the measurements, a transformation maybe used to validate that assumption. However, with the model outputs data, non-linear transformations cannot be used since the model outputs is represented by such an integral of the true process. â€¢ The locations of sampling points within grid cells change because of the random sampling scheme. Because the mean of the underlying process depends on its geographical coordinates in general (like universal Kriging), the prediction result have some variation every time the BM model is used for the same data. Finally, we offer the following observations. â€¢ Where some of the stations are within grid cells, it would seem better to use them instead of sampling points within the grid cells to reduce the dimension of the spatial correlation matrix. â€¢ The meaning of the concept of a "true underlying process" seems unclear since it is merely a conceptual construct rather than a physically meaningful process. Thus its may will be criticized. 125 â€¢ We assume the underlying process mean to be a polynomial function of the geographical coordinates, which may not be enough if other variable such as temperature also affects the ozone level. We can sidestep the assumption of a true underlying process by regressing the measurements on the model outputs directly. 7.3 Spatial-temporal Model Prediction and Forecasting In previous sections of this chapter, we have used the BM to predict hourly ozone concentrations and found that its predictions are only marginally better than Kriging. Moreover, the hourly data has strong auto-regression structure which is totally ignored in BM. This section presents a new analysis of the hourly data by using the Bayesian spatial-temporal model proposed in Chapter 5. We still use the hourly measurements and model outputs from the same 78 stations and grid cells as in the previous section. By our selection, each grid cell has one and only one station inside. We use the first 240 hours of measurements and model outputs at 15 stations as training data. Then we forecast the ozone concentration levels in the next 240 hours for the 15 stations and also predict the first 240 hours' ozone concentration levels at the remaining 63 stations. To forecast and spatially predict the ozone concentration levels, we use both the ad-hoc approach and the Bayesian spatialtemporal model proposed in Section 5.2 and Section 5.3. The comparison 126 between the ad-hoc approach and the Bayesian spatial-temporal model will be summarized in this section. As a preliminary step, we use the two-step regression procedure. Figure 7.10 shows the auto-correlation function (ACF) and partial auto-correlation function (PACF) of the residuals Et at only one station. It is clear that the residuals are temporally independent. The ACF and PACF plots of the residuals at other stations show similar results. So the AR(1) temporal structure seems appropriate to model the process 1\18 , t in model 5.5. To fit the Bayesian spatial-temporal model (5.6), we use 500 iterations of Gibbs sampling algorithm and the first 50 iterations for the "burn-in" period. Figure 7.11 shows the time series plots of the posterior MCMC samples for some parameters as examples. From that plot, we can see the Markov Chains converge after the "burn-in" period. We use the root mean square forecast error (RMSFE) and root mean square prediction error (RMSPE) to evaluate accuracy of the forecasts or predictions, the smaller RMSFE or RMSPE, the better the forecast or prediction. We define RMSPE in (7.3). At each station location s, RMSFE is defined as (7.5) 0 8 , t being the real measurement at time t and O B , t , the forecast or prediction. Table 7.4 presents estimation results for some of the parameters in model (5.5). Most of the estimates differ significantly from 0 except a 2 , the spatial 127 5 15 10 20 Lag in hours U0 0 0_ '^ I^ I 0 ^ 5 ^ Is ^, ^ ^ 15 20 Lag in hours Figure 7.10: Plots of partial ACF and ACF for the estimated residual E t at one station. 128 variance of the process a. Table 7.4 shows the posterior standard deviations of A, to be much smaller than those of A u and A,. As stated in Section 4.1, the range parameter of the spatial process is usually difficult to estimate. That is why the posterior standard deviations are large for A a and A. However, for A E , we have independent replicates of the residual process {6 3 } over 239 hours in this data analysis, and thereby see a reduction in the posterior standard deviations of A,. The RMSFE of the forecasts by three approaches, Bayesian spatial-temporal model (5.5), the ad-hoc approach in Section (5.2) and the model outputs are in Table 7.5. As an example, Figure 7.12 shows the plot of measurements versus the forecast by the Bayesian spatial-temporal model for one station. We also tried to use AR(2) for Ar a , t in model (5.5). The estimation of the second auto-regression coefficient is as small as 0.025 and the standard error of that estimation is 0.013. So the second auto-regression coefficient is not significant. This shows that AR(1) is enough for NB , t , which in agreement with our preliminary analysis. In additional, the predictions and forecasts become a little worse if we change from AR(1) to AR(2). So for this data analysis, AR(1) is more appropriate than AR(2) for N8 , t . The spatial predictor predicts the ozone concentration levels during the first 240 hours at the 63 sites that have only model outputs but no historic measurements. The four predictions include the Bayesian spatial-temporal model, the ad-hoc approach, Kriging, and the model outputs. By Kriging, we mean using the Kriging approach to interpolate the measurements at the 63 sites repeatedly for every hour. So the Kriging approach does not use 129 ci rho 100^200^300^400^500 sigmaA2 100^200^300 100^200^300^400^500 lambda ^ 400^500 MCMC iterations 100^200^300 400^500 MCMC iterations Figure 7.11: Plots of the parameter values in the Gibbs sampling iterations. Upper left panel: a l ; upper right panel: rho; lower left panel: o -2 ; lower right panel: A,. 130 â€” measurements --- spatialâ€“temporal model fc model outputs time in hours Figure 7.12: Plot of measurement and the forecast and with 90% credible interval using the Bayesian spatial-temporal model at one of the 15 stations. 131 the model outputs. Tables 7.6 and 7.7 present the RMSPE of these four approaches. As an example, Figure 7.13 shows the plot of measurements versus the spatial predictions using the Bayesian spatial-temporal model at one site. The Bayesian spatial-temporal model yields credible predictive intervals for the forecasts and predictions through the quantiles of the MCMC samples of the forecasts and predictions. The coverage probability of the credible interval is defined as the proportion of the true measurements that are forecast or predicted, falling in the respective credible intervals. The coverage probabilities of the 90% credible intervals for the forecasts and predictions are 90.00% and 94.69% respectively. Based on the above forecast and spatial prediction results, we have the following conclusions. â€¢ The two-step linear regression procedure can only forecast the measurements. The Kriging approach can only spatially predict the measurements. Both the ad-hoc approach and the Bayesian spatial-temporal model can both forecast and spatially predict the measurements at the same time. From Figure 7.12 and Figure 7.13, we can see that the forecast and prediction from the Bayesian spatial-temporal model track the real measurements much better than the model outputs. â€¢ Tables 7.5, 7.6 and 7.7 show that in averaging over the stations, the Bayesian spatial-temporal model gives the smallest RMSFE and RM- 132 â€” measurements --- spatialâ€“temporal model pre dictions model outputs S _o _o 0. C C 0 8 C a) C O a) `'t C O N 0 O _ 0 50 100 150 200 time in hours Figure 7.13: Plot of measurement and the spatial prediction with 90% credible interval from the Bayesian spatial-temporal model at one of the 63 ungauged sites. 133 0 O 0 0 O^0 0 0 7,$ 0U 0 Cs O0 O o Â°Â° 0 Â° ooÂ° oÂ° o^o o^0 O o o 00 0 O 0 0 ^ Â° o O 00 .0 0. ^ rn o 0 0 0 Â° 0 0o cÂ° o 0 O o0 oo 0 0^ 0 0 10 ^ 15 distance in 111.21 km Figure 7.14: Plot of distance versus correlation for the normalized residuals from model 5.5. 134 SPE among all the competitive approaches for forecasting and spatially prediction. For forecasting, the mean RMSFE of the Bayesian spatial-temporal model is about 16% smaller than the forecasts from the model outputs. The RMSFE of the ad-hoc approach is slightly bigger than the Bayesian spatial-temporal model. In prediction, not surprisingly, Kriging has the biggest RMSPE because we have interpolated 63 sites using measurements from only 15 stations. The RMSPE of the Bayesian spatial-temporal model is about 12% smaller than the ad-hoc approach and the model output. So, using model (5.5) to calibrate the model outputs seems worthwhile for achieving better forecasts and predictions. The coverage probabilities of the intervals for the Bayesian spatial-temporal model are fairly well calibrated, meaning the uncertainties in the forecasts and predictions are characterized for quite well. â€¢ Although in averaging over the sites, the Bayesian spatial-temporal model achieves the smallest RMSFE and RMSPE, it is also marginally better than the ad-hoc approach in forecasting albeit at significant cost in computational time and parsimony judging from the number of parameters. Table 7.5 shows that the Bayesian spatial-temporal model has smaller RMSFE than ad-hoc approach at 9 out 15 stations. Table 7.6 and 7.7 show that the Bayesian spatial-temporal model has smaller RMSFE than ad-hoc approach at 46 out 63 stations. So, recognizing the cost stated above, the major benefit of the Bayesian spatial135 temporal model lies in the higher quality of spatial predictions and in its well calibrated credible predictive intervals. â€¢ Table 7.4 shows the posterior mean of the auto-regressive parameter p to be as big as 0.91, which suggests strong auto-regression in the residuals {Nt }. â€¢ At time t, the residuals vector over space is e t and the normalized residuals vector is e t , = E E 2 e t . Figure 7.14 shows the empirical spatial correlation for the normalized residuals from the Bayesian spatial-temporal model 5.5 in relationship to inter-site distance. We can see that except for few pairs of stations, the normalized residuals do not have much spatial correlation. There are only two pairs of stations whose normalized residuals still have spatial correlation higher than 0.5. The reason could be the similar temporal pattern of the normalized residuals which are not captured by the spatial-temporal model. 7.4 Multivariate Spatial-temporal Model Prediction and Forecast This section presents the results of applying the multivariate spatial-temporal model (6.1) and (6.2) to the ozone pollution data described in Section 7.1. We use the measurements and model outputs from the same 78 stations and grid cells as in Section 7.3. We have 119 days of hourly measurements 136 and model outputs. On each day, we take the 8-hour day time average of measurements and model outputs from 1000 to 1700 and the night time average from the rest hours. On each day, we have the day- and nighttime averages of measurements as bivariate responses. The day- and nighttime averages of model outputs are the covariates M in model (6.1). We use the data for the first 80 days at 15 stations to fit the the multivariate spatialtemporal model. Then we forecast the measurements in the next 39 days at the 15 stations and spatially predict the measurements at the remaining 63 sites for the first 80 days. In model (6.1), we assume the residuals {N.,â€ž t } at each site form an AR(1) process which has a non-zero mean with covariates {Z 8 , t }. In the application of univariate model (5.5), the covariates {Z s , t } are indicator functions to include the 12-hour periodicity of the measurements. However, the day- and nighttime average measurements are on a daily scale and they do not have such strong periodicity as the hourly measurements. We have used indicator functions as in Section 5.1 to include the 7-day periodicity of the measurements. However, the forecasts or predictions are not improved. So we just assume -y 8 = 0 in model (6.1) when we do the data analysis. We still include the covariate matrix Z in the multivariate model (6.1) for the completeness and flexibility of the model. We evaluate the forecast and prediction results by RMSPE and RMSFE defined by (7.3) and 7.5. Table 7.8 presents the estimation results for the parameters !t o , p and E. An alternative to using the multivariate model (6.1 is to apply the univariate model (5.5) to the day- and nighttime average data separately. To illustrate 137 the benefit of the multivariate model, we compare the RMSFE and RMSPE of these two approaches. Table 7.9 presents the comparison results. Table 7.8 and Table 7.9 lead us to the following conclusions. â€¢ The posterior mean of /2 04 , 2 is significantly different from 0. This shows the model outputs are helpful in predicting and forecasting daytime measurements. â€¢ The posterior means of /1,3, 2 , 2 and itt3, 2 , 3 are not significantly different from 0 because the ozone levels at night are much lower than the daytime. This shows the nighttime measurements are much more difficult to predict or forecast from the model outputs. â€¢ The posterior mean of E 1 , 1 is much larger than E2,2â€¢ In other words, the variation of daytime measurements are much larger than the nighttime measurements. This is not surprising to us because the ozone concentration levels are much higher at the daytime. â€¢ The coefficients of the auto-regressive matrix p are significantly different from 0. However, the auto-regression is not as strong as in the univariate case. The estimation of the auto-regressive parameter in univariate case is 0.91 while in multivariate case the parameters are no larger than 0.45. The reason is that the measurements are on an hourly scale in the univariate analysis but are daily averages in multivariate analysis. 138 â€¢ Table 7.9 shows that the multivariate model forecasts and predictions have smaller RMSPE and RMSFE than the univariate models for both day- and nighttime measurements. So it is worth using multivariate spatial-temporal model (6.1) to achieve better forecasts and predictions of the day- and nighttime measurements for daily measurement aggregates. â€¢ The coverage probabilities for 90% prediction and forecast credible intervals are 84.73% and 83.93% respectively. This shows the uncertainties of the predictions and forecasts are taken into account reasonably well by the model. 139 Table 7.4: Posterior means and standard deviations of parameters a l , â€¢ â€¢ â€¢, a 5 , C17 â€¢ â€¢ â€¢ 7 c5, QE, Ae) Aal Cra2 7 017 Act and p. Since this table is for the estimates of parameters, there is no unit associated to the numbers in this table. Each of these parameters is a scalar. al ci o A, Aa a2 cfc P mean 16.57 0.38 48.09 0.35 4.70 33.00 0.33 0.91 sd 2.81 0.07 1.15 0.03 2.41 27.4 0.13 0.004 140 a2 c2 a3 c3 a4 c4 a5 c5 mean 16.67 0.38 20.62 0.35 19.73 0.37 17.48 0.62 sd 2.83 0.06 3.89 0.06 3.40 0.07 2.92 0.08 Table 7.5: The first column gives the indeces of 15 stations where the first 240 hours' measurements are available. The second, third and forth columns are the RMSFE (square root of the mean square forecast error) of the next 240 hours' measurements fore-casted by the Bayesian spatial-temporal model, ad-hoc approach and model outputs. The number followed by * indicates the approach that has the smallest RMSFE in that row. The unit of the numbers from column 2 to 4 in this table is ppbv. station spatial-temporal ad-hoc model outputs 1 21.51* 21.51* 26.30 2 19.70* 20.99 31.13 3 13.73 12.02* 16.29 4 18.16 16.69* 20.06 5 15.61* 17.21 14.36 6 14.54 10.51* 16.51 7 19.33* 22.12 26.90 8 19.91 19.30* 18.98 9 14.08 13.67* 16.74 10 11.28* 12.57 11.58 11 12.04* 12.70 16.64 12 15.37 15.09* 15.89 13 10.95* 12.20 15.04 14 12.57* 14.32 14.66 15 14.71* 15.63 17.51 mean 15.57* 15.77 18.57 141 Table 7.6: RMSPE at ungauged sites 1-35. Column 2: Bayesian spatialtemporal model; column 3: ad-hoc approach; column 4: model outputs; column 5: Kriging with measurements. The number followed by * indicates the approach that has the smallest RMSPE in each row. The unit of the numbers from column 2 to 5 in this table is ppbv. site 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 spatial-temporal ad - hoc model 16.16 13.08* 16.32 11.68* 12.07 13.48 13.19 * 13.91 13.38 16.17* 17.08 22.65 14.63* 29.98 19.59 14.17 18.15 11.85* 12.59 * 12.98 14.55 12.90* 13.62 17.35 16.91* 17.28 21.85 14.74 16.26 13.31* 15.31 16.07 11.49* 15.82 17.94 15.02* 15.33 20.35 11.77* 12.75* 15.37 16.46 12.62 11.49* 15.05 14.71 12.91* 14.24 19.49 18.77 18.35* 11.35* 15.76 16.15 16.44 14.29* 14.76 15.83 14.15* 15.38 12.85 13.06 12.65* 11.66 10.26 * 11.21 15.56 16.25 16.11 9.44* 10.75 14.17 15.08* 15.93 15.47 14.08 13.40 13.33* 11.41* 19.10 15.94 12.42* 12.94 14.97 16.00 * 19.17 18.02 12.45* 18.70 13.94 19.24 17.84* 21.82 14.22* 21.61 16.28 14.04* 22.79 16.47 14.91* 15.29 15.39 12.40 14.76 10.86* 142 Kriging 16.51 17.05 24.15 19.17 43.40 40.89 22.80 23.10 27.81 31.90 35.59 29.32 36.71 20.84 16.70 17.11 19.14 23.07 18.33 17.70 18.79 14.83 10.38* 19.64 24.78 25.74 27.33 26.94 32.11 32.67 25.94 34.19 32.95 27.05 24.43 Table 7.7: RMSPE at ungauged sites 36-63. Column 2: Bayesian spatialtemporal model; column 3: ad-hoc approach; column 4: model outputs; column 5: Kriging with measurements. The number followed by * indicates the approach that has the smallest RMSPE in each row. The unit of the numbers from column 2 to 5 in this table is ppbv. site 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 mean spatial-temporal ad-hoc 13.65* 22.56 14.29* 23.24 12.71 12.52* 18.22* 21.30 13.75* 14.86 13.80* 14.49 14.72 13.93* 19.59 14.66* 19.39 14.28* 15.12 15.07 13.54* 14.35 13.40* 13.80 16.18 15.54* 15.02* 20.88 12.39* 15.35 11.24* 11.48 11.78* 12.60 17.99 18.07 13.04* 12.77 15.08* 15.47 11.98* 17.61 13.98* 19.41 14.70* 19.02 11.01* 17.40 10.00* 13.34 14.23* 14.32 22.87 18.49 17.97 15.48* 14.43* 16.13 143 model 16.01 18.30 13.91 24.27 18.99 20.00 18.46 19.99 17.69 17.79 13.75 15.21 18.91 19.65 13.42 12.86 16.20 19.14 18.63 17.42 18.32 20.09 17.01 18.97 18.36 16.15 24.94 15.92 16.50 Kriging 35.43 33.09 26.54 20.20 16.91 16.57 18.77 26.53 23.84 14.22* 31.88 28.34 19.62 25.71 13.38 14.31 13.79 16.47* 13.77 15.31 21.02 25.79 25.05 23.48 18.03 19.83 14.62* 19.99 23.36 Table 7.8: Posterior mean and standard deviations of the parameters pm p and E. tio,i,i and /10,2,1 are the means of intercepts; /10,1,2 and N3,1,3, the mean of coefficients of day- and nighttime model outputs of the daytime measurements; ,us, 2 , 2 and i.to, 2 , 3 , the mean of coefficients of day- and nighttime model outputs on the nighttime measurements. Since this table is for the estimates of parameters, there is no unit associated to the numbers in this table. Each of these parameters is a scalar. it0,14 Af3,1,2 11 ,8,1,3 /1 13,2,1 P0,2,2 /2 0,2,3 E2,1 mean 17.82 0.41 0.20 17.54 0.05 0.11 10.33 sd 3.72 0.09 0.09 1.27 0.10 0.10 pi,' P1,2 P2,1 P2,2 E1,1 E1,2 E2,2 mean sd 0.45 0.01 0.22 0.01 0.17 0.02 0.25 0.01 131.89 10.33 50.37 - Table 7.9: Average of RMSPE and RMSFE for Bayesian multivariate spatialtemporal model (6.1) (columns 2 and 4) and univariate spatial-temporal model (5.5) (columns 3 and 5). The average of RMSPE is over 63 sites and the average of RMSFE is over 15 stations. The unit of the numbers in this table is ppbv. daytime nighttime RMSPE multivariate univariate 18.10 16.78 12.51 14.19 144 RMSFE multivariate univariate 14.44 15.91 9.71 9.69 Chapter 8 Calibration of Deterministic models Chapter 1 describes the ozone's policy relevant background (PRB) level and why it cannot be measured. That difficulty forces use of deterministic model output to infer the PRB level. However, that output needs to be calibrated to ensure the deterministic model output represents the PRB level. Calibration needs to be based on the measurements. This chapter explains how we can use the BM model (2.9) and Bayesian spatial-temporal model (5.5) to calibrate deterministic models and it presents an example involving model outputs of ozone concentrations. As in Chapter 7, we focus on the 8-hour daytime average measurements and model outputs for 78 grid cells and stations during the 30 days from July 1 to July 30, 1995. The calibrated model output at grid cell B by using BM model is (2(B) 1B La(s)ds) /b.^(8.1) The multiplicative calibration parameter b is assumed constant across space 145 and a(s) = a c, + aisi + a2s 2 while s 1 and s 2 are the longitude and latitude of location s. Fuentes and Raftery (2005) plug-in the posterior mean of a(s) and b into (8.1) to obtain the calibrated model output. Departing the plug-in approach used by Fuentes and Raftery (2005), we calibrate the model output using the BM model directly. In each iteration of the Gibbs sampling, we use formula (8.1) to calculate the calibrated model output, so that we get the empirical distribution needed for calibration. The calibrated model output by using the Bayesian spatial-temporal model is a + c2(B). (8.2) To illustrate the benefit of calibrating the model output, we compare in Table 8.1 the RMSPE for the calibrated and uncalibrated model outputs for the 30 days on which our study focuses. As the daily data analysis in Section 7.1. The data available is the measurements at 48 stations and model outputs at all the 78 grid cells. The calibrated model output at the remaining 30 stations are used to predict the measurements there. As an example, Figure 8.1 shows the plot of the measurements versus the uncalibrated and calibrated model output for day 1. Both the BM and Bayesian spatialtemporal models calibrated model outputs are included in that plot. Figure 8.2 shows the scatter plots of BM model calibrated model outputs versus uncalibrated model outputs and Figure 8.3 shows scatter plots of Bayesian spatial-temporal model calibrated model outputs versus uncalibrated model 146 outputs. From the calibration results for the model output, we have the following conclusions. â€¢ Figure 8.2 and 8.3 reveal marked differences between calibrated and uncalibrated model outputs because many points deviate from the solid line with intercept 0 and slope 1. â€¢ Table 8.1 shows that on average the Bayesian spatial-temporal model calibrated model outputs have the smallest RMSPE than the uncalibrated ones and BM calibrated model outputs also have smaller RMSPE than uncalibrated ones. After calibration, the mean RMSPE has been reduced by 16.07% for BM model, 21.48% for Bayesian spatialtemporal model. Out of all the 30 days, both BM and Bayesian spatialtemporal model calibrated model outputs have smaller RMSPEs than uncalibrated ones for 25 days. So it is beneficial to calibrate the model outputs by using either BM or Bayesian spatial-temporal model. â€¢ Figure 8.1 shows that both BM and Bayesian spatial-temporal calibrated model outputs are closer to the measurements than the uncalibrated ones at most stations. â€¢ Figure 8.4 shows the histograms of posterior means of calibration parameters a o , al, a2 and b in the 30 days. The posterior means of b range from 0.9 to 2.0 in most of the days. 147 â€¢ measurements model output BM calibrated output spatial-temporal calibrated output _CD â€¢ 7f) a) 0 0 O C a) 0 co C O O N C O N 0 â€” tO O O Co C) CD al 0 - 5 0 _C c0 0 Cu 0 ^ ^ ^ ^ ^ ^ 15 20 5 10 25 30 30 sites Figure 8.1: Measurements versus the uncalibrated and calibrated model output of 30 grid cells on day 2. The solid line shows the measurements inside the grid cells. The dotted and dashed lines are the uncalibrated and calibrated model outputs respectively. 148 Table 8.1: RMSPE of uncalibrated and calibrated model outputs for the prediction of measurements at the 30 stations. column 1: day 1-30; column 2: RMSPE of uncalibrated model output; column 3: RMSPE of BM calibrated model output; column 4: RMSPE of Bayesian spatial-temporal model calibrated model output; The number with a * indicates the "winner" in that row. The unit of the numbers from column 2 to 4 in this table is ppbv. Day 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 mean model output BM 20.43 16.54 19.88 10.89* 10.41 7.60* 14.82 13.79 22.72 16.05 23.67 15.09 16.14 13.75 15.36 14.42 14.66 14.61* 12.23 11.19* 12.96 10.01 17.41 16.19 15.07 18.45 20.83 15.77* 23.19 20.14* 16.49 17.28 15.68 11.47 15.89 15.35 10.30 10.44 14.20 11.59* 19.78 26.13 17.96 11.07 12.29 11.37 15.87 10.02 22.67 12.35* 18.41 14.06 * 16.59 9.99* 19.70 18.75 13.44 11.33* 15.06 17.41 16.80 14.10 149 spatial-temporal 13.39* 12.17 8.78 12.00* 15.94* 14.02* 13.12* 11.30* 15.06 12.42 9.78* 15.00* 15.25* 22.28 22.79 16.22* 11.09* 13.83* 7.29* 13.14 14.37* 8.98* 8.29* 9.53* 13.92 14.12 10.80 13.15* 15.92 11.69* 13.19* 30^40^50^60^70^80 30^40^50^60^70^80 30^40^50^60^70^80 ^ 90 uncalibrated model outputs in ppbv 30 ^ 40^50 ^ 60^70^80^90 uncalibrated model outputs in ppbv Figure 8.2: Scatter plots of uncalibrated model outputs versus BM calibrated model outputs. The solid lines have intercept 0 and slope 1. The plots are for days 2, 7, 8 and 26 from the upper left to the lower right corner in left-to-right sequence. 150 - 30^40^50^60^70 30^40 ^ 50^60^70^80 80 30^40^50 ^ 60 70 80 90^30^40^50^60^70^80^90 ^uncalibrated model outputs in ppbv^ uncalibrated model outputs in ppbv Figure 8.3: Scatter plots of uncalibrated model outputs versus Bayesian spatial-temporal model calibrated model outputs. The solid lines have intercept 0 and slope 1. The plots are for days 2, 7, 8 and 26 from the upper left to the lower right in left-to-right sequence. 151 5 10 15 20 25 a0 -4 -3 -2 -1.0 -0.5^0 0 05 10 al 2 1 0^1.5^2.0 a2 b Figure 8.4: Histograms of posterior means of a 0 , a l , a 2 and b during the 30 days on which the current study focuses on. 152 Chapter 9 Application of the Bayesian Ensemble Melding Model This chapter presents the results of applying the BEM model to combine measurements and model outputs from an ensemble of deterministic models. Section 3.5 gives a technical introduction to the BEM model. Later in this chapter, we present the additional detailed modifications needed for a particular application we address. Section 9.1 introduces the intended application and Section 9.2 presents the results and conclusions. 9.1 Data Description Recently, people have paid attention not only to the weather forecast but also to the uncertainties of the forecast. Uncertainty is already a well established concept in statistical community. For example, by using a linear regression model, statisticians not only predict the response based on the available data but they also give a predictive interval for the predicted value. Raftery et al. (2005) propose an idea to probabilistic forecasting and they have already 153 set up a website (http://probcast.com/) to give such probabilistic forecast for the Pacific Northwest area. In Bayesian models, it is straightforward to use the MCMC samples from the posterior predictive distribution to give a probabilistic forecast. In this section, we give an example of using the BEM model to combine measurements with model outputs from more than one deterministic models to give spatial predictions and their credible intervals. For this purpose, we downloaded data from the MURI project website (http://www.stat.washington.edu/MURI/) . This dataset includes measurements and model output of the sea level temperature in the Pacific Northwest area. The time of the measurements and model outputs are available at 102 non-consecutive hours during January-June 2000. The model outputs are generated by the University of Washington mesoscale short-range ensemble system for the Pacific Northwest, which has been described by Grimit and Mass (2002). This system is a five-member multi-analysis ensemble consisting of different runs of the fifth generation Pennsylvania State University National Center for Atmospheric Research Mesoscale model (MM5). Each run of the MM5 model is made with different initial conditions. The raw deterministic model output are available on grid cells. However, preliminary analysis has been done to interpolate the model outputs to the locations of monitoring stations. So for this dataset, each station has one measurement and five "aligned" model outputs for the temperature. Raftery et al. (2005) modeled this dataset by using Bayesian model averaging of the five model outputs to make the 48-hour ahead forecast of the temperature. Here we 154 model the same dataset by using ensemble BM for future comparison. The number of monitoring stations varies from 311 to 650 at different hours. For each hour, we first calculate the Pearson correlation coefficients between measurement and each of the five model outputs over space. Thus, we have 102 correlation coefficients for each of the five initial conditions. Figure 9.1 shows the histograms of these correlation coefficients. Clearly the correlation coefficients are larger than 0.7 at most hours. The strong correlation makes the application of the BEM model a reasonable choice to combine the measurements and model outputs. 9.2 Data Analysis In our study, the number of stations varies from hour-to-hour and we choose a subset of 150 stations which have measurements and model outputs available at each hour. We use the BEM model to combine measurements from 30 stations and model outputs for all 150 stations to predict the temperature at the remaining 120 stations. We apply the BEM model to the measurements and model output at each hour repeatedly. Unlike the ozone data analyzed in Section 7.1, the model outputs of temperature are available at locations after interpolation and the data file we have does not have the coordinates of the grid cells' vertexes. So in this analysis, we no longer have the concept of grid cells and the conditional distribution (3.10) therefore needs some small modifications. Assume that we have m stations in total and measurements 155 I^ -7 ^I 06^06^07^08 model 1 0.3^04^05^06^07^08^09^1.0 05^0.6^07^08 model 2 model 3 (.) C CL) 7 Cr 0 J- T 05^07^08^09^ 0.6^05^06^07^08^09 model 4 model 5 Figure 9.1: Histograms of 102 hours' Pearson's correlation coefficients between measurement and model output over space. Panels 1-5 are the histograms for model output from five different MM5 initial conditions. are available at n stations, a subset of the m stations. However, the model outputs are available at all m stations. Vector Z is the realization of the true underlying process at the m stations, vectors k i (i = 1, â€¢ â€¢ 5), the realizations of the model output process from i-th deterministic model at the m stations, and vector Z, the realization of the measurement process at the n stations. Similar to but different from (3.10), the conditional distribution of (Z, Z 1 , â€¢ â€¢, Zp, Z) given (0, 0, a l , â€¢ â€¢ â€¢, aP, b 1 , â€¢ â€¢ â€¢, bp , cre2 ag, 1 , following. 156 , â€¢ â€¢ â€¢ , ag,p) is the p( 2 ,^Z)1/3, 0, al, â€¢ â€¢ â€¢, ap , b1, â€¢ â€¢7 bp, Cre2 1 41, â€¢ â€¢ .7 4,p) 01 exp^{--1 [(Z â€” A 0 Z) E iT 1^â€” A 0 Z) 2 P^T + E (2 i _ , _ b i z) E T i (k i â€”a i â€” b i Z) i=i + (Z â€” ii.) T E -1 (Z â€” p,)] = exp â€” 1 â€”Z T 4, E 0 1 2 +^liTET l k i + E -l ett i=i (k TEVA 0 +^b i kiET 1 + it TE -1 Z i=i P + 1 ZT E 1 +^14E771 + E -1 Z (9.1) i-i where p = 5 and A o is defined as 1^...^0^0^...^0 A0 = \0^. . .^1^0^...^0 The first n columns of A o form an identity matrix and the rest elements of A o 's are 0. 157 The prior distribution of (a i , b i ), i = 1, â€¢ â€¢ /a ti â€¢,p, is MVN(0 F). b The full conditional distribution of (a i , b i ) is ^IZi,Z, E i ti MVN(BC, B), bi ai) where B = 1T ZT and = ( 1T 1, 2, + Fi 1 1 i , ZT where E z is a diagonal matrix with dimensions m x m. The diagonal elements are ,5, the variance of model output error of deterministic model i. We first assume each additive calibration parameter a is a linear function of the coordinates of location s, that is, we have a 5 = a() + a 1 s 1 + a2s2, s 1 and s 2 being the longitude and latitude of location s. However, after fitting the BEM model, we discovered that the coefficients a l and a 2 are 158 not significant. So we assume both the calibration parameters a and b are constants across space. However, the calibration parameters still differ for the different deterministic models. That is, we have five sets of parameters a and b. Figure 9.2 and Figure 9.3 show the histograms of the posterior mean of a and b for different models. Each histogram shows the posterior mean of a and b for 102 hours. To predict the temperature at the 120 stations, we have three competitive purely spatial approaches: BEM, Kriging with measurements and averaging of the five model outputs. Figure 9.4 shows the histogram of the RMSPE of these three approaches over the 102 hours. The above analysis leads us to the following findings. â€¢ The mean RMSPEs of three prediction approaches over 102 hours are 2.98 (BEM), 3.00 (averaging the five model output with equal weights) and 4.10 (Kriging with measurement). The BEM model has the smallest mean RMSPE while the Kriging has the largest. Over the 102 hours, the BEM model achieves smaller RMSPE than Kriging at each of 77 hours. The coverage probability of the BEM's 90% credible intervals for the prediction is 82%, which although reasonable, suggests the BEM slightly underestimate the uncertainties of predictions. â€¢ In Kriging, we use the measurements at only 30 stations to predict 120 stations. So, it is not surprising that Kriging does not predict the measurements very well given the sparsity of the available data. Although the mean RMSPEs between BEM and model averaging outputs is very 159 close, Figure 9.4 shows that the distribution of BEM RMSPEs is more left skewed than of RMSPEs from simply averaging model outputs. Also BEM can estimate the calibration parameters for each of the five models. â€¢ Figure 9.3 and 9.2 show that the posterior means of bi are in the interval [0.9, 1.05] and [0, 20] respectively for most hours. The range of the measurements is from 250 to 313 ppbv. So overall, even without calibration, the model outputs represent the measurement very well. This coincides with the strong correlation between measurement and model outputs shown in Figure 9.1. â€¢ Figure 9.5 presents a plot of the RMSPE difference between Kriging and the BEM model versus the mean correlation coefficient between measurement and model output. The mean correlation coefficient is defined as the mean of five correlation coefficients between measurement and five model output. The plot shows that the stronger the correlation between measurement and model outputs, the better melding's predictions compared with Kriging. To summarize, the BEM model gives the best spatial prediction measured by RMSPE. The BEM model also reflects prediction uncertainty reasonably well. So, with a small number of measurements but a large number of model outputs, we can combine the measurements and model outputs to achieve better spatial prediction. The model outputs help in prediction because of 160 the strong correlations between measurement and model output. N C o cr a CT - Â° -10^0^10^20^30^ -20^- 0^0^10^2^30^40^50 a of model 1^ a of model 2 -20 - 0^0^10^2 a of model 3 C R7 a CT -10 ^a of ^ model 10^20^30 4 ^ - -20^- 0^0^10^20^30^40^50 ^ a of model 5 Figure 9.2: Histograms of posterior mean of a. Panels 1-5 are the histograms of a for model output from five different MM5 initial conditions. 161 1 0.90^0.95^1.00^1.05^ 0.80^0.85^0.90^095^1.00^1.05^ b of model 1^ b of model 2^ 090^095^1.00^105 b of model 3 LA; R- 0.90^0.95^1.00^1.05 b of model 4 0.50^0.85^0.90^0.95^1.00^1.05^1.10 b of model 5 Figure 9.3: Histograms of posterior mean of b. Panels 1-5 are the histograms of b for model output from five different MM5 initial conditions. 162 c ^ 5 ^ 6 BEM RMSPE in ppbv 3^ 6^ model average RMSPE in ppbv 6 10 Kriging RMSPE in ppbv Figure 9.4: RMSPE histograms of predictions by BEM, model averaging with equal weight and Kriging with measurements only. 163 ^^ 0 0 0 0 0 0 0 0 0^ 0 0 0 0 Â° oÂ°^ 0 o 0 c9 0 0 0 0^0^ 0^ 0^ , q ^0 0 0 0^ Â° 0 0 0^0 0^00^Â®0 0 0 Â°8 0 cb 0 o Â° 0 0 0 0 Â° o^Â° 0 8O^0 0 0 ' B 0 0 0.5^ 0.6^ 0.7^ 0.8 ^ 0.9 average correlation between model outputs and measurements Figure 9.5: The x-axis represents the mean correlation coefficients between measurement and model output, i.e. the mean of the five correlation coefficients between the measurement and five model outputs. 164 Chapter 10 Dynamic Bayesian Ensemble Melding Model Both the BM model in Section 2.4 and its extension, the BEM model in Section 3.5 are purely spatial models. So we need to extend them to spatialtemporal models. In some cases, a simple extension is available without the complexities involved in characterizing the temporal structure such as in AR or dynamical linear models. Section 10.1 presents a decision-making problem and its solution. Section 10.2 present a DBEM (dynamic Bayesian ensemble melding ) model which can forecast the future with historic data by using the solution of that decision-making problem. 10.1 DeGroot's problem Bayarri and DeGroot (1989) propose a decision making model to combine opinions from different experts. The idea for this model originally derives from a problem in the book by DeGroot (1970). We present our proof of DeGroot's problem and later show how to construct a dynamic BEM model 165 by using its conclusion. Theorem 10.1.1 (DeGroot's theorem) Each of the K experts has his/her own prior distribution for a certain parameter W, and let 7ri be the prior distribution which expert i assigns to W . The executive forms his/her opinion about W from the opinion of the K experts and that his/her prior for W is an weighted average of the 7ri . So, the executive's prior for W is r(w) =^Gi7Ti (w) i= 1 ^ ^ (x i = 1, cE i > 0. i=1 Furthermore, the K experts and the executive observe together the value of a random variable X whose conditional distribution when W = w is f (.1w). The posterior distribution of the executive will again be a weighted average of the posterior for the K experts. Proof The posterior for expert i is ri*(wlx) = 7ri(w)f(x1w) Pi (x) where pi (x) is the marginal distribution for expert i, that is, pi (x) = f f (x1w)R-i(w)dw 166 . The posterior for the executive is therefore 7* (w1x) = ir (w)f(x(w) p(x) ai 7ri(w)f (xlw) p(x) aipi(X) it (ID) f (x1 w) p(x)^pi(x) (w x)â€¢^ (10.1) i=1 Here pi (x) is the marginal distribution for expert i, that is, pi (x) = f f (x (w)7r, (w)dw The marginal distribution for the executive is p(x) = E ziÂ±( 1 f a, f (x1w) (w)dw So, the posterior for the executive is again a weighted average of posteriors (wjx), i = I, . â€¢ â€¢ K) for the K experts. The updated weight for expert i after observing X = x is = ce,p,(x) I p(x). The updated weight for each expert depends on his/her marginal distribution of X. 10.2 Dynamic Bayesian Ensemble Melding forecast Forecasting ozone concentration levels in urban areas has become more and more important for public health reasons such as the safety of people like asthmatics. The forecasts can be on various time scales such as hourly or daily. The rest of this section presents how to construct a DBEM model for 167 hourly forecast. However, the same mode can also be used to forecast the average daily ozone concentration levels. For the BEM model, we have hourly measurements Z and model outputs â€¢ â€¢, 2 from p deterministic models, the true underlying process Z and parameters a l , â€¢ â€¢ â€¢, ap b 1 , â€¢ â€¢, bp , /3, 9, , fie,^â€¢, aL. As in DeGroot's problem, the parameter is W^(a l , â€¢ â€¢ â€¢ , ap,b1, â€¢ â€¢ â€¢ , bp, 0, 0 , a e2 1 4,11 â€¢ â€¢ â€¢ 7 ag,p) . The measurements Z across space at each hour is the X in the DeGroot's problem. We use the first K hours' measurements and model outputs to fit the BEM model, from which we obtain the posterior distributions of the parameter W for each of the K hours. These K posterior distributions will be prior distributions for W for the next hour. So we have 7; . Â°) (w), â€¢ â€¢ â€¢ornw) as in DeGroot's problem. We take the weights a, (i = 1, â€¢ â€¢ â€¢ , K) of the prior distributions to be equal to 1/K. Suppose we have K + 1 hourly measurements and K + 2 hourly model outputs. The goal is to forecast the measurement at hour K + 2. After observing the measurements at K + 1 hour, we obtain the updated distributions 71 -1) (w) (i = 1, K) and their i corresponding weights a l) . So both the K distributions for W and their weights change dynamically after the measurements become available every hour. The predictive distribution of measurements at hour K + 2 is the following: P( k K+21 D ) = f a i 1) f K +21W, ( 168 ZK +2, â€¢ â€¢ â€¢ ,--p,17 (1) (W)dW , (10.2) where D stands for all the measurements and model outputs available. The likelihood function for W is f (Z1W, Z 1 , â€¢ â€¢ â€¢, kp). To use the con- clusion from DeGroot's problem, we have to derive this likelihood function. We know from that Z and Z 1 â€¢ â€¢ kp have independent normal distributions i given W and Z. Those normal distributions are the following: kl(z, w) ti mvN(A 0 z, 0- 2 i) 2 i 1(z,^MVN(ai1 b i A j Z,o LI) - ^ (10.3) i = 1,- â€¢ â€¢,p. The matrix A 0 and A 1 , â€¢ â€¢ â€¢, A p are the same as in the BEM model. To derive the conditional distribution f(k1W, Z 1 â€¢ â€¢ â€¢, Zr ), we need to obtain , the joint distribution of f(Z, Z 1 , â€¢ â€¢ â€¢, k p (w), which is a multivariate normal distribution. For simplicity, we suppress the conditional notation (.1W) in the following equations. 169 ^ E(2) = E(E(21Z)) = Var(Z)^Var(ELIZffl E(Var( 2 1Z))) Var(A 0 Z) +o 9 - - A 0 Eifor + cr./ E(2 i ) = E(E(2 i 1Z)) = ail + b i A i p Var(2 i )^Var(E(2iI2))) E(Var( 2 i1Z))) Var(a i l + b i .A i Z) +4/ = 14A j EAT+ (4 ,i / Cov(Z, 2 i )^E(E(k2ThilZ)) â€” E(k)E(2D b i A 0 EAT cov(2 i ,^= E(E(iTi";1z)) â€” E(2 i )E(2D = 1 < i<j<p. (10.4) Then the joint distribution of (Z, Z 1 â€¢ â€¢ â€¢, 2p ) given W is the following: i / Z \ 21 MVN(/./., ZP 170 where / Ao p /1 1 a 1 1 + biAitz \ tip Jap i bp A p ti and z-,01 El 0 E11 We have t oo = Var(k), E 11 = Var(2) and t oi = Cov(k, Z). Vector Z is formed by stacking 2 1 , â€¢ â€¢ Z, column by column. Then the conditional distribution of (klk i ,^kp,vv) is multivariate normal with mean f-to + tooti 11 - and variance E 00 â€” EolEil Eioâ€¢ Unfortunately, we do not have closed forms for either l3, or 7r 2 1) (w) so we have to approximate them by a Monte Carlo method, so further work will be needed to implement this DBEM model. However, the Monte Carlo approximation and the implementation of this model will present some challenges. 171 Chapter 11 Discussion and Future Work This chapter presents some discussion and potential future work. Section 11.1 discusses the BM model (2.9) and Section 11.2 presents some future work extending the Bayesian spatial-temporal model (5.5). 11.1 Bayesian Melding Model The melding model proposed by Fuentes and Raftery (2005) as well as Fuentes et al. (2003) assumes the existence of a true underlying process which serves as a basis on which to "meld" the measurement and model output processes. This melding approach is a novel idea and in some sense it can deal with the fact that measurements and model output processes have different supports. However this true underlying process also presents some challenges. First, it makes the model complicated and computationally expensive. When we first consider to extend the BM model into a spatialtemporal one, we think of a approach similar to the DLM (dynamic linear model) used by West and Harrison (1997) and Huerta et al. (2004). DLM assumes the coefficients in a linear model dynamically evolving over time. 172 For example, we can assume at time t, the coefficient O t IO t _ i N( t3 t _ 1 , E). Because the coefficients are correlated over time, so the response is also correlated over time. However, the assumption of this true underlying process presents an obstacle to the extension of the BM model into a dynamic BM which incorporates the temporal correlation. Second, it is difficult to extend the univariate melding model into a multivariate one. For example, we cannot use the approach of Schmidt and Gelfand (2003) who propose a cornputationally efficient co-regionalization approach. It seems that the biggest benefit of this true underlying process is to address the differential supports issue presented by measurements and model outputs. But in fact, even meteorological and environmental work often uses some preliminary analysis to interpolate the model output onto monitoring sites and then treat the interpolates as point based data. Examples include Kasibhatla and Chameides (2000); Hogrefe et al. (2001b,a); Fiore et al. (2003, 2004). If we really can ignore the differential supports issue, then the regression approach seems more appealing and easier to implement. In that approach, the responses are the measurements from the monitoring sites and the covariate is the model output interpolated to those sites or to other locations. Of course we have unequal numbers of model outputs and measurements, leading to a standard missing value problem. We either have missing responses or covariates. In the melding model (2.9), we assume the mean of the true underlying process is a function of only the coordinates. However, we know that other 173 variables have effects on the pollutant although we do not have their information available in our application. These variables include temperature, humidity, wind speed and so on. Without the information provided by these variables, the variance of the spatial residuals c(s) in model (2.9) can be large and predictions based on the true underlying process could be in question. However, when we regress measurements on the model outputs, at least part of that information is captured by in the model outputs. So the resulting residuals reflect only the part that model output fails to capture. The regression model discussed above has the following form: (s) = a(s) + bZ(s) + â‚¬(s). (11.1) In the above model, the residuals e(s) are spatially correlated with covariance matrix E constructed by correlation functions such as Matern, Gaussian or exponential. This linear model (11.1) is much easier to fit with fewer parameters and without the true underlying process. Moreover, it is not difficult to extend the model (11.1) to incorporate multivariate response and temporal correlation. For multivariate model, we can borrow the idea from Schmidt and Gelfand (2003). They decompose the multivariate model into several independent univariate models. This decomposition enables the computation to be sequential and less time consuming. In order to extend the model (11.1) to incorporate temporal correlation, we have two potential options. The first 174 choice is to use DLM. The other choice is to assume the residuals c(s) are correlated over time. We can assume AR or even ARMA (Auto-regressive moving average) model for the residuals over time t. If we use the dynamic linear model, the resulting spatial-temporal is easy to fit because most of the theory and results for the dynamic linear model can be applied here directly. By using the second option, the covariance matrix of residuals â‚¬(8, t) is a Kronecker product of spatial covariance matrix and the temporal covariance matrix. This complicated Kronecker product of two matrices may present some computational challenges. The above two options are both based on the assumption that the spatial and temporal correlation are separable. That is, at each time t, the spatial correlation remains the same and at each station s, the temporal correlation remains the same or at least have the same structure. 11.2 Bayesian Spatial-Temporal Model The Bayesian spatial-temporal model 5.5 has a number of limitations. â€¢ The coefficients a, c, 7 vary over space but remain constant over time. Raftery et al. (2005) also make this observation about their Bayesian averaging model and they investigate how long the historic period should be for fitting their model. For the sake of forecasting, we should use the most recent historic data but we also need a reasonably long period to have enough data to fit the model. As discussed in Section 11.1, 175 one possible solution is to use the DLM by allowing the coefficients to evolve in time. â€¢ The temporal and spatial correlation may or may not be separable. Gneiting (2002) discusses a class of non-separable but stationary covariance functions for space-time data. It is possible to incorporate this class of non-separable functions into the model 5.5. Another way to model the non-separable spatial-temporal correlation is to let the autoregressive parameter p in model 5.5 vary over space. Similar to the coefficients a, c, y, we can also assume p at different stations are spatially correlated. The autoregressive coefficients are always in the interval (-1, 1), so we can use truncated normal distributions to model them. That is, for autoregressive coefficient at n locations, we have (Pi, â€¢ â€¢ â€¢, â€” cNIVNitp, Ep) -1 [ - - 1< pi, â€¢ â€¢ â€¢,Pri < 1 ], where I stands for the indicator function. That is, this indicator is 1 if all the p's at n locations all in the interval (-1, 1), otherwise it is 0. The normalize constant c makes sure the integral of the multivariate normal density function to be 1. â€¢ Model 5.5 assumes the spatially correlated residuals are stationary. This assumption of stationarity could be relaxed by fitting a nonstationary model, say using the celebrated Sampson-Guttorp (SAG) method of Sampson and Guttorp (1992). The main idea is that we 176 can first fit the two-step linear regression model (5.1), (5.2) and (5.3). Then we apply the Sampson-Guttorp method to the spatial covariance for the fitted residuals to transform the coordinates from the original geographical plane (G-plane) to the deformed plane (D-plane). After the transformation, we can use the new coordinates of the D-plane and fit the Bayesian spatial-temporal model 5.5 over again with the help of the estimated SG covariance. We can also incorporate other non-stationary process modeling approaches into the spatial-temporal model. We have already listed some reference for such approaches at the end of Section 2.1 and beginning of Section 2.2. 177 Bibliography Abramowitz, M. and Stegun, I. A. (1972). Handbook of Mathematical Functions, Dover. Banerjee, S., Carlin, B. and Gelfand, A. (2003). Hierarchical Modeling and Analysis for Spatial Data, Chapman and Hall/CRC. Bayarri, M. and DeGroot, M. (1989). Optimal reporting of prediction, Journal of the American Statistical Association 84: 214-222. Berliner, L. (2003). Physical-statistical modeling in geophysics, Journal of Geophysical Research 108: 80-91. Box, G., Jenkins, G. and Reinsel, G. (1994). Time Series Analysis: Forecasting and Control; 3rd Edition, Holden-Day. Brown, P. J. and Zidek, J. V. (1980). Adaptive multivariate ridge regression, The Annals of Statistics 8: 64-74. 178 Brown, P. J. and Zidek, J. V. (1982). Multivariate regression shrinkage estimation with unknown covariance matrix, Scandinavian Journal of Statistics 9: 209-215. Brown, P. J., Le, N. D. and Zidek, J. V. (1994). Multivariate spatial interpolation and exposure to air pollutants, The Canadian Journal of Statistics 22: 489-509. Calder, C. and Cressie, N. (2006). some topics in convolution-based spatial modeling, Proceedings of the 56th Session of the International Statistical Institute, Lisbon, Portugal, forthcoming. Caya, D., Laprise, R., Gigure, M., Bergeron, G., Blanchet, J., Stocks, B., Boer, G. and McFarlane, N. (1995). Description of the canadian regional climate model, Water, Air and Soil Pollution 82: 477-482. Cressie, N. A. (1993). Statistics for Spatial Data, Wiley-Interscience. Cripps, E., C., C. and Kohn, R. (2003). Variable selection and covariance selection in multivariate regression models, Technical report, SAMSI. DeGroot, M. (1970). Optimal statistical decisions, McGraw-Hill. Edwards, A. (1967). A biometric case history: Studies on the distribution of sexes in families, 1889-1966. abstract 1286, Biometrics. 179 Fiore, A. M., Jacob, D., Liu, H., Yantosca, R. M., Fairlie, T. D. and Li, Q. (2004). Variability in surface ozone background over the united states: Implications for air quality policy, Journal of Geophysical Research. Fiore, A. M., Jacob, D., Mathur, R. and Martin, R. V. (2003). Application of empirical orthogonal functions to evaluate ozone simulation with regional and global models, Journal of Geophysical Research. Fuentes, M. and Raftery, A. (2002). Model validation and spatial interpolation by combining observations with outputs from numerical models via bayesian melding, Technical Report 403, Department of Statistics, University of Washington. Fuentes, M. and Raftery, A. (2005). Model evaluation and spatial interpolation by bayesian combination of observations with outputs from numerical models, Biometrics 61: 36-45. Fuentes, M. and Smith, R. (2001). A new class of nonstationary spatial models, Journal of the American Statistical Association, being reviewed. Fuentes, M., Guttorp, P. and Challenor, P. (2003). Statistical assessment of numerical models, Technical Report 076, NRCSE. Garner, J., Lewis, T., Hogsett, W. and Andersen, C. (2005). Air quality criteria for ozone and related photochemical oxidants (second external review draft) volume iii, Technical report, U.S. Environmental Protection Agency. 180 Gelfand, A. and Smith, A. (1990). Sampling-based approaches to calculating marginal densities, Journal of American Statistical Association 85: 398-409. Gelfand, A., Schmidt, A., Banerjee, S. and Sirmans, C. (2004). Nonstationary multivariate process modeling through spatially varying coregionalization (with discussion), Test 13: 1-50. Gneiting, T. (2002). Nonseparable, stationary covariance functions for space-time data, Journal of the American Statistical Association 97: 590600. Green, P. (1995). Reversible jump markov chain monte carlo computation and bayesian model determination, Biometrika 82: 711-732. Grimit, E. P. and Mass, C. (2002). Initial results of a mesoscale short-range ensemble forecasting system over the pacific northwest, Weather Forecasting 17: 192-205. Guillas, S., Tiao, G., Wuebbles, D. and Zubrow, A. (2006). Statistical diagnostic and correction of a chemistry-transport model for the prediction of total column ozone, Atmospheric Chemistry and Physics 6: 527-537. Guttorp, P. and Walden, A. (1987). On the evaluation of geophysical models, Geophysical Journal of the Royal Astronomical Society 91: 201-210. Haas, T. (1995). Local prediction of a spatio-temporal process with an application to wet sulfate deposition, Journal of the American Statistical Association 90: 1189-1199. 181 Haas, T. (1998). Statistical assessment of spatio-temporal pollutant trends and meteorological transport models, Atmospheric Environment 32: 18651879. Hastings, W. (1970). Monte carlo sampling methods using markov chains and their application., Biometrika 57: 97-109. Hawkins, D. and Cressie, N. (1984). Robust kriging-a proposal, Journal of the International Association for Mathematical Geology 16: 3-18. Higdon, D., Swall, J. and Kerv, J. (1999). Non-stationary spatial modelling, Bayesian Statistics pp. 761-768. Hogrefe, C., Rao, S., Kasibhatla, P., Hao, W., Sistla, G., R., M. and McHenry, J. (2001a). Evaluating the performance of regional-scale photochemical modeling systems: Part ii-ozone predictions, Atmos. Environ. 35: 4175-4188. Hogrefe, C., Rao, S., Kasibhatla, P., Kallos, G., Tremback, C., Hao, W., Olerud, D., Xiu, A., McHenry, J. and Alapaty, K. (2001b). Evaluating the performance of regional-scale photochemical modeling systems: Part i-meteorological predictions, Atmos. Environ. 35: 4159-4174. Huerta, G., Sanso, B. and Stroud, J. (2004). A spatio-temporal model for mexico city ozone levels, Journal of the Royal Statistical society 53: 231-248. 182 Kasibhatla, P. and Chameides, W. (2000). Seasonal modeling of regional ozone pollution in the eastern united states, Geophys. Res. Letter 27: 1415 1418. Kassteele, J. V. d., Koelemeijer, R. B. A., Dekkers, A. L. M., Schaap, M., Homan, C. and Stein, A. (2006). Statistical mapping of pm10 concentrations over western europe using secondary information from dispersion modeling and modis satellite observations, Stochastic Environmental Research and Risk Assessment (SERRA) 21: 183-194. Le, N. and Zidek, J. (1997). Interpolation with uncertain spatial covariance, Journal of Royal Statistics Society, B 59: 501-510. Le, N. D. and Zidek, J. V. (2006). Statistical Analysis of Environmental Space-Time Process, Springer. Lindley, D. and Smith, A. (1972). Bayes estimate for the linear model, Journal of the Royal Statistical Society. Series B (Methodological). Liu, Z. (2007). Bayesian melding model software in r, in preparation, Technical report, Department of Statistics, The University of British Columbia. Mardia, K. V. and Goodall, C. R. (1993). Multivariate Environmental Statistics: Spatial-temporal analysis of multivariate environmental monitoring data, Elsevier Science Publishers. Matern, B. (1960). Spatial variation, Technical Report 49, Meddelanden fran Statens Skogsforskningsinstitut. 183 Matheron, G. (1962). Trait de geostatistique appliquee, tome i, Memoires du Bureau de Recherches Geologiques et Minieres. Matheron, G. (1963). Traite de geostatistique appliquee, Technical report, Memoires du Bureau de Recherches Geologiques et Minieres. Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A. and Teller, E. (1953). Equation of state calculations by fast computing machines., Journal of Chemical Physics 21: 1087-1092. Nychka, D., Wikle, C. and Royle, J. (2002). Multiresolution models for nonstationary spatial covariance functions, Statistical Modelling: An Inter- national Journal 2: 315-331. Odman, M. and Ingram, C. (1996). Multiscale air quality simulation platform (maqsip) source code documentation and validation, Technical report, Environmental Programs, MCNC-North Carolina Supercomputing Center. Paciorek, C. and Schervish, M. (2004). Nonstationary covariance functions for gaussian process regression, in S. Thrun, L. Saul and B. SchOlkopf (eds), Advances in Neural Information Processing Systems 16, MIT Press, Cambridge, MA. Paciorek, C. and Schervish, M. (2006). Spatial modelling using a new class of nonstationary covariance functions, Environmetrics 17: 483-506. 184 Poole, D. and Raftery, A. (2000). Inference for deterministic simulation models: The bayesian melding approach, Journal of American Statistical Association 95: 1244-1255. R Development Core Team (2006). R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. Raftery, A. E., Gneiting, T., Balabdaoui, F. and Polakowski, M. (2005). Using bayesian model averaging to calibrate forecast ensembles, Monthly Weather Review 133: 1155-1174. Ramsay, J., Hooker, G., Campbell, D. and Cao, J. (2007). Parameter estimation for differential equations: a generalized smoothing approach, Journal of the Royal Statistical Society: Series B 69: 741C796. Ribeiro, P. J. and Diggle, P. (2001). geor: a package for geostatistical analysis, R-NEWS 1(2): 14-18. ISSN 1609-3631. Sacks, J., Welch, W., Mitchell, T. and Wynn, H. (1989). Design and analysis of computer experiments, Statistical Science 4: 409-435. Sampson, P. and Guttorp, P. (1992). Nonparametric estimation of nonstationary spatial covariance structure, Journal of the American Statistical Association 87: 108-119. Sanso, B. and Guenni, L. (2002). Combining observed rainfall and deterministic prediction using a bayesian approach, Technical report, CESMA. 185 Schmidt, A. M. and Gelfand, A. E. (2003). A bayesian coregionalization approach for multivariate pollutant data, ournal of Geophysical Research- Atmospheres. Smith, A. (1973). A general bayesian linear model, Journal of the Royal Statistical Society. Series B (Methodological). Speed, T. (1983). Modelling, CSIRO DMS Newsletter 96: 1-2. Stein, M. (1999). Interpolation of spatial data : some theory for kriging, New York : Springer. Sun, W., Le, N. and Zidek, J. (1998). Assessment of bayesian multivariate interpolation approach for health impact studies, Environmetrics 9: 565586. van der Merwe, A. and Zidek, J. V. (1980). Multivariate regression analysis and canonical variates, The Canadian Journal of Statistics 8: 27-39. Wackernagel, H. (1998). Multivariate Geostatistics - An Introduction with Applications, 2nd Edition, Springer-Verlag. West, M. and Harrison, J. (1997). Bayesian Forecasting and Dynamic Mod- els; 2 edition, Springer. Wheeler, N. and Houyoux, M. (1998). Development and implementation of a seasonal model for regional air quality, Proc. Air and Waste Manage Association. 186 Wikle, C. (2001). A kernel-based spectral approach for spatio-temporal dynamic models, Proceedings of the 1st Spanish Workshop on Spatio-Temporal Modelling of Environmental Processes (METMA) pp. 167-180. Wikle, C. and Berliner, L. (2005). Combining information across spatial scales, Technometrics 47: 80-91. Wikle, C. and Cressie, N. (1999). A dimension reduced approach to spacetime kalman filtering, Biometrika 86: 815-829. Wikle, C. and Royle, J. (1999). Space-time models and dynamic design of environmental monitoring networks, Journal of Agricultural, Biological, and Environmental Statistics 4: 489-507. Wikle, C., Milliff, R., Nychka, D. and Berliner, L. (2001). Spatio-temporal hierarchical bayesian modeling: Tropical ocean surface winds, Journal of American Statistical Association 96: 382-397. Yaglom, A. (1987). Correlation Theory of Stationary and Related Random Functions, New York: Springer-Verlag. Zidek, J., Sun, W. and Le, N. (2000). Designing and integrating composite networks for monitored multivariate gaussian pollution fields, Journal of the Royal Statistical society 49: 63-79. 187
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Combining measurements with deterministic model outputs:...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Combining measurements with deterministic model outputs: predicting ground-level ozone Liu, Zhong 2007-12-31
pdf
Page Metadata
Item Metadata
Title | Combining measurements with deterministic model outputs: predicting ground-level ozone |
Creator |
Liu, Zhong |
Publisher | University of British Columbia |
Date | 2007 |
Date Issued | 2008-02-15T21:14:47Z |
Description | The main topic of this thesis is how to combine model outputs from deterministic models with measurements from monitoring stations for air pollutants or other meteorological variables. We consider two different approaches to address this particular problem. The first approach is by using the Bayesian Melding (BM) model proposed by Fuentes and Raftery (2005). We successfully implement this model and conduct several simulation studies to examine the performance of this model in different scenarios. We also apply the melding model to the ozone data to show the importance of using the Bayesian melding model to calibrate the model outputs. That is, to adjust the model outputs for the prediction of measurements. Due to the Bayesian framework of the melding model, we can extend it to incorporate other components such as ensemble models, reversible jump MCMC for variable selection. However, the BM model is purely a spatial model and we generally have to deal with space-time dataset in practice. The deficiency of the BM approach leads us to a second approach, an alternative to the BM model, which is a linear mixed model (different from most linear mixed models, the random effects being spatially correlated) with temporally and spatially correlated residuals. We assume the spatial and temporal correlation are separable and use an AR process to model the temporal correlation. We also develop a multivariate version of this model. Both the melding model and linear mixed model are Bayesian hierarchical models, which can better estimate the uncertainties of the estimates and predictions. |
Extent | 6411321 bytes |
Subject |
spatial-temoral models Bayesian Melding deterministic models |
Genre |
Thesis/Dissertation |
Type |
Text |
File Format | application/pdf |
Language | eng |
Collection |
Electronic Theses and Dissertations (ETDs) 2008+ |
Date Available | 2008-02-15 |
Provider | Vancouver : University of British Columbia Library |
DOI | 10.14288/1.0066258 |
URI | http://hdl.handle.net/2429/362 |
Degree |
Doctor of Philosophy - PhD |
Program |
Statistics |
Affiliation |
Science, Faculty of Statistics, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2008-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- ubc_2008_spring_liu_zhong.pdf [ 6.11MB ]
- Metadata
- JSON: 1.0066258.json
- JSON-LD: 1.0066258+ld.json
- RDF/XML (Pretty): 1.0066258.xml
- RDF/JSON: 1.0066258+rdf.json
- Turtle: 1.0066258+rdf-turtle.txt
- N-Triples: 1.0066258+rdf-ntriples.txt
- Original Record: 1.0066258 +original-record.json
- Full Text
- 1.0066258.txt
- Citation
- 1.0066258.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Country | Views | Downloads |
---|---|---|
China | 36 | 13 |
United States | 21 | 0 |
Ukraine | 17 | 0 |
Russia | 17 | 0 |
Canada | 12 | 3 |
Malaysia | 10 | 0 |
Brazil | 8 | 0 |
India | 7 | 0 |
Germany | 6 | 11 |
Bulgaria | 3 | 0 |
France | 2 | 0 |
Australia | 1 | 0 |
Romania | 1 | 0 |
City | Views | Downloads |
---|---|---|
Unknown | 42 | 11 |
Shenzhen | 29 | 13 |
Saint Petersburg | 17 | 0 |
Ashburn | 10 | 0 |
Vancouver | 10 | 1 |
Kuala Selangor | 10 | 0 |
Beijing | 7 | 0 |
Mountain View | 4 | 0 |
University Park | 3 | 0 |
Kingston | 2 | 0 |
Mariupol | 2 | 0 |
Buffalo | 1 | 0 |
Fuquay-Varina | 1 | 0 |
{[{ mDataHeader[type] }]} | {[{ month[type] }]} | {[{ tData[type] }]} |
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0066258/manifest