DYNAMIC BAYESIAN MODELS FOR MODELLING ENVIRONMENTAL SPACE–TIME FIELDS

by

Yiping Dou

B.Sc., The XuZhou Normal University, 1997
M.Sc., The University of New Brunswick, 2003
Ph.D., The University of British Columbia, 2008

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF
Doctor of Philosophy
in
The Faculty of Graduate Studies
(Statistics)

The University Of British Columbia
(Vancouver)

March, 2008

© Yiping Dou 2008

Abstract

This thesis addresses spatial interpolation and temporal prediction for air pollution data using several space–time modelling approaches. Firstly, we implement the dynamic linear modelling (DLM) approach for spatial interpolation and find various potential problems with that approach. We develop software to implement our approach. Secondly, we implement a Bayesian spatial prediction (BSP) approach to model spatio–temporal ground–level ozone fields and compare the accuracy of that approach with that of the DLM. Thirdly, we develop a Bayesian version of the empirical orthogonal function (EOF) method to incorporate the uncertainties due to a temporally varying spatial process and to spatial variation at broad and fine scales. Finally, we extend the BSP into the DLM framework to develop a unified Bayesian spatio–temporal model for univariate and multivariate responses. The result generalizes a number of current approaches in this field.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication
1 Introduction
    1.1 AQS Database and Ground–level Ozone
    1.2 Literature Review
    1.3 Introduction to the Thesis
2 Dynamic Linear Modelling
    2.1 Introduction
    2.2 Space–time Process
    2.3 Spatio–temporal DLM
    2.4 Kalman Filter and Smoother
    2.5 An Illustrative Example: Cluster 2 AQS Database (1995)
    2.6 Algorithms for Estimating the Model Parameters
        2.6.1 Metropolis–within–Gibbs algorithm
        2.6.2 Sampling from $p(\lambda, \sigma^2, x_{1:T} \mid a_1, a_2, y_{1:T})$
        2.6.3 Sampling from $p(y^m_{1:T} \mid \lambda, \sigma^2, x_{1:T}, y^o_{1:T})$
        2.6.4 Sampling from $p(a_1, a_2 \mid x_{1:T}, \lambda, \sigma^2, y_{1:T})$
        2.6.5 Summary
    2.7 Algorithms for Interpolation and Prediction on Ungauged Sites
        2.7.1 Sampling the unobserved state parameters
        2.7.2 Spatial interpolation at ungauged sites
3 Dynamic Linear Modelling and Its Spatial Interpolation
    3.1 Cluster 2 AQS Dataset (1995) Revisited
    3.2 Markov Chain Simulation Study
    3.3 Spatial Interpolation
    3.4 Problems in the DLM
    3.5 Summary and Conclusion
4 Multivariate Bayesian Spatial Prediction and Its Spatial Interpolation
    4.1 Introduction
    4.2 AQS Ozone Database (2000) for the Chicago Area
    4.3 Methodology
    4.4 Spatial Interpolation
    4.5 Spatial Leakage in the DLM
    4.6 Conclusion and Discussion
5 Multivariate Bayesian Spatial Prediction and Its Temporal Prediction
    5.1 The Multivariate BSP Approach
    5.2 The DLM Approach
    5.3 NAIVE∗ Approach
    5.4 Results and Comparisons
    5.5 Conclusion and Discussion
6 Bayesian Empirical Orthogonal Function Method
    6.1 Introduction
    6.2 Classical EOFs
    6.3 Corrected EOFs
    6.4 Bayesian EOFs
    6.5 Extension to the Bayesian EOFs
    6.6 Simulation Study 2
    6.7 Conclusions
7 An Extension of the BSP: Bayesian Spatio–Temporal Models
    7.1 Introduction
    7.2 Univariate Bayesian Spatio–temporal Model
        7.2.1 Type I covariates
        7.2.2 Type II covariates
        7.2.3 Possible choices for $\Phi_K(s)$
        7.2.4 Predictive posterior distributions
    7.3 The Univariate Bayesian Spatio–temporal Model and Relationships with Other Approaches
        7.3.1 Relationship with the DLM in Huerta et al. (2004)
        7.3.2 Relationship with the SSM in Wikle & Cressie (1999)
        7.3.3 Relationship with the univariate SSM in Gelfand et al. (2005)
        7.3.4 Relationship with the BSP model in Le and Zidek (1992)
    7.4 A Multivariate Bayesian Spatio–Temporal Model
    7.5 MCMC Algorithm on the Bayesian Spatio–temporal Models
    7.6 Results and Conclusions
8 Future Work
    8.1 Thesis Summary
    8.2 Future Research Plan
Bibliography

Appendices

A Additional Results for Chapter 2
    A.1 Additional Results for Section 2.6.1
    A.2 Additional Results for Section 2.6.2
    A.3 Additional Results for Section 2.6.4
    A.4 Additional Results for Sections 2.7.1 and 2.7.2
B Software for Chapter 3
C Additional Results for Chapter 6
    C.1 Additional Results for Section 6.4

List of Tables

3.1 Posterior summaries for $\lambda$, $\sigma^2$, $a_1$, and $a_2$.
3.2 Comparisons between the nominal levels and actual predictive credibility interval coverage at the ungauged sites A, ..., F.
3.3 Summary of pairs of “friends” for ungauged and gauged sites.
3.4 Fixed $\lambda^*$ in Study C.
4.1 The relationship between the discount factor (δ) and the signal–to–noise ratio (r).
4.2 The great–circle distance (GCD) between the pairs of ungauged sites and their closest gauged site(s) in Chicago’s hourly O$_3$ field.
4.3 The posterior ellipsoid coverage probabilities at various nominal levels.
4.4 The mean square predictive error (MSPE) at ungauged sites of the multivariate BSP, DLM, and NAIVE approaches.
5.1 The mean square predictive error (MSPE) of the one–day ahead prediction at the 14 gauged sites by the multivariate BSP, DLM, and NAIVE∗ approaches. The BSP dominates in all but 3 cases where it essentially ties with one or another of its competitors.
6.1 Percentage of spatial variation (%) for the first 10 EOFs by the true, classical, and corrected methods (ρ = 0.9).
6.2 Matrix discrepancies for the classical and corrected EOFs against the true EOFs.
6.3 Percentage of spatial variation (%) for the first 10 EOFs by the true, classical, and corrected methods (ρ = 0.1).
6.4 Matrix discrepancies for the classical and corrected EOFs against the true EOFs (ρ = 0.1).
6.5 Percentage of spatial variation (%) for the first 10 EOFs by the true, classical, and corrected methods (ρ = 0.9).
6.6 Matrix discrepancies for the classical and corrected EOFs against the true EOFs (ρ = 0.9).

List of Figures

2.1 Geographic locations for the 1995 AQS database on a US map, where the latitude and longitude are measured in degrees.
(Diamond = Cluster 1 sites; Upper–triangle = Cluster 2 sites; Down–triangle = Cluster 3 sites.)
2.2 Bayesian periodogram for the square–root of hourly ozone concentrations at Cluster 2 sites in the AQS database from May 15 to September 11 (1995).
3.1 Geographical locations for the ten gauged sites in Cluster 2 and the randomly selected six ungauged sites. (Number = Cluster 2 sites and letter = ungauged sites.)
3.2 Traces of model parameters with the number of iterations of the Markov chains. The model parameters are: (a) – $\lambda$, the range parameter; (b) – $\sigma^2$, the variance parameter; (c) – $a_1$, the phase parameter with respect to the 24–hour periodicity; and (d) – $a_2$, the phase parameter with respect to the 12–hour periodicity.
3.3 Interpolation at Ungauged Site 4 from the 1st week to the 4th week.
3.4 Interpolation at Ungauged Site 4 from the 5th week to the 8th week.
3.5 Interpolation at Ungauged Site 4 from the 9th week to the 12th week.
3.6 Interpolation at Ungauged Site 4 from the 13th week to the 16th week.
3.7 Interpolation at Ungauged Site 4 from the 17th week to the 120th day.
3.8 Scatterplot for the square–root of hourly ozone concentrations at Ungauged Site D and its nearby neighbour, Gauged Site 1.
3.9 Traces of model parameters with number of iterations of the two Markov chains. The model parameters are: (a) – $\lambda$, the range parameter; (b) – $\sigma^2$, the variance parameter; (c) – $a_1$, the phase parameter with respect to the 24–hour periodicity; and (d) – $a_2$, the phase parameter with respect to the 12–hour periodicity.
3.10 Histogram (left panel), ACF (middle panel) and PACF (right panel) of model parameters of the Markov chains after a burn–in period of 1,000 iterations.
The model parameters are: (i) first row: – $\lambda$, the range parameter; (ii) second row: – $\sigma^2$, the variance parameter; (iii) third row: – $a_1$, the phase parameter with respect to the 24–hour periodicity; and (iv) last row: – $a_2$, the phase parameter with respect to the 12–hour periodicity.
3.11 Scatterplots for model parameters’ pairs: (a) $\lambda$ vs. $\sigma^2$; (b) $\lambda$ vs. $a_1$; (c) $\lambda$ vs. $a_2$; (d) $\sigma^2$ vs. $a_1$; (e) $\sigma^2$ vs. $a_2$; and (f) $a_1$ vs. $a_2$.
3.12 Scatterplot for $\lambda$ against $\sigma^2$ given one–week–data only, constructed from MCMC samples starting from the same initial values.
3.13 Coverage probabilities vs. 95% nominal level for ungauged sites: (a) – site A; (b) – site B; (c) – site C; (d) – site D; (e) – site E; and (f) – site F. These coverage probabilities are computed according to Study A: weekly data (dot with solid line); Study B: $W_{1:17}$ (square with dotted line); and Study C: $W_{1:17}$ but with fixed $\lambda^*$ (star with dashed line).
3.14 Coverage probability versus 90% nominal level for ungauged sites: (a) – site A; (b) – site B; (c) – site C; (d) – site D; (e) – site E; and (f) – site F. These coverage probabilities are computed according to Study A: weekly data (dot with solid line); Study B: $W_{1:17}$ (square with dotted line); and Study C: $W_{1:17}$ but with fixed $\lambda^*$ (star with dashed line).
3.15 Coverage probability versus 80% nominal level for ungauged sites: (a) – site A; (b) – site B; (c) – site C; (d) – site D; (e) – site E; and (f) – site F. These coverage probabilities are computed according to Study A: weekly data (dot with solid line); Study B: $W_{1:17}$ (square with dotted line); and Study C: $W_{1:17}$ but with fixed $\lambda^*$ (star with dashed line).
3.16 Coverage probability versus 70% nominal level for ungauged sites: (a) – site A; (b) – site B; (c) – site C; (d) – site D; (e) – site E; and (f) – site F. These coverage probabilities are computed according to Study A: weekly data (dot with solid line); Study B: $W_{1:17}$ (square with dotted line); and Study C: $W_{1:17}$ but with fixed $\lambda^*$ (star with dashed line).
3.17 Coverage probability versus 60% nominal level for ungauged sites: (a) – site A; (b) – site B; (c) – site C; (d) – site D; (e) – site E; and (f) – site F. These coverage probabilities are computed according to Study A: weekly data (dot with solid line); Study B: $W_{1:17}$ (square with dotted line); and Study C: $W_{1:17}$ but with fixed $\lambda^*$ (star with dashed line).
3.18 Coverage probability versus 50% nominal level for ungauged sites: (a) – site A; (b) – site B; (c) – site C; (d) – site D; (e) – site E; and (f) – site F. These coverage probabilities are computed according to Study A: weekly data (dot with solid line); Study B: $W_{1:17}$ (square with dotted line); and Study C: $W_{1:17}$ but with fixed $\lambda^*$ (star with dashed line).
3.19 Coverage probability versus 40% nominal level for ungauged sites: (a) – site A; (b) – site B; (c) – site C; (d) – site D; (e) – site E; and (f) – site F. These coverage probabilities are computed according to Study A: weekly data (dot with solid line); Study B: $W_{1:17}$ (square with dotted line); and Study C: $W_{1:17}$ but with fixed $\lambda^*$ (star with dashed line).
4.1 Geographical locations for the Chicago AQS database (2000), where the latitude and longitude are measured in degrees. (◦ = G = gauged sites and × = UG = ungauged sites.)
4.2 Boxplots for the rates of: (a) missing measurements; and (b) zero measurements, at 24 monitoring stations in the Chicago AQS database. (G = gauged sites and UG = ungauged sites.)
4.3 Boxplots for the square–root of hourly ozone concentrations ($\sqrt{\text{ppb}}$) at 24 monitoring stations in the Chicago AQS database. (G1 = Gauged Site 1; UG1 = Ungauged Site 1; and so on.)
4.4 The weekday effect of the square–root of hourly ozone concentrations ($\sqrt{\text{ppb}}$) at the 14 gauged sites in the Chicago AQS database.
4.5 The hourly effect of the square–root of hourly ozone concentrations ($\sqrt{\text{ppb}}$) at the 14 gauged sites in the Chicago AQS database.
4.6 The estimated spatial correlations of: (a) – detrended residuals, and (b) – deAR’d residuals; between gauged sites.
4.7 The PACF plots for the square–root of hourly ozone concentrations ($\sqrt{\text{ppb}}$) at the 14 gauged sites in the Chicago area AQS database.
4.8 Boxplots for the spatial correlations of the detrended residuals (Detrended), and the estimated spatial correlations using the square–root of hourly ozone concentrations during the hours of: 9 A.M. to 10 A.M. (10:11), 8 A.M. to 10 A.M. (9:11), 7 A.M. to 10 A.M. (8:11), 6 A.M. to 10 A.M. (7:11), and 5 A.M. to 10 A.M. (6:11), respectively.
4.9 Interpolation at Ungauged Site 7 from the 1st week to the 2nd week. The square–root of hourly ozone concentrations are plotted on the vertical axes and hours, on the horizontal axes. [Solid (dotdashed) lines = interpolation and 95% pointwise predictive intervals by the BSP; dash (dot) lines = interpolation and 95% predictive intervals by the DLM; + = interpolation by NAIVE; and ◦ = observations at Ungauged Site 7.]
4.10 Interpolation at Ungauged Site 7 from the 3rd week to the 4th week. The square–root of hourly ozone concentrations are plotted on the vertical axes and hours, on the horizontal axes. [Solid (dotdashed) lines = interpolation and 95% pointwise predictive intervals by the BSP; dash (dot) lines = interpolation and 95% predictive intervals by the DLM; + = interpolation by NAIVE; and ◦ = observations at Ungauged Site 7.]
4.11 Interpolation at Ungauged Site 7 from the 5th week to the 6th week. The square–root of hourly ozone concentrations are plotted on the vertical axes and hours, on the horizontal axes. [Solid (dotdashed) lines = interpolation and 95% pointwise predictive intervals by the BSP; dash (dot) lines = interpolation and 95% predictive intervals by the DLM; + = interpolation by NAIVE; and ◦ = observations at Ungauged Site 7.]
4.12 Interpolation at Ungauged Site 7 from the 7th week to the 8th week. The square–root of hourly ozone concentrations are plotted on the vertical axes and hours, on the horizontal axes. [Solid (dotdashed) lines = interpolation and 95% pointwise predictive intervals by the BSP; dash (dot) lines = interpolation and 95% predictive intervals by the DLM; + = interpolation by NAIVE; and ◦ = observations at Ungauged Site 7.]
4.13 Interpolation at Ungauged Site 7 from the 9th week to the 10th week. The square–root of hourly ozone concentrations are plotted on the vertical axes and hours, on the horizontal axes. [Solid (dotdashed) lines = interpolation and 95% pointwise predictive intervals by the BSP; dash (dot) lines = interpolation and 95% predictive intervals by the DLM; + = interpolation by NAIVE; and ◦ = observations at Ungauged Site 7.]
4.14 Interpolation at Ungauged Site 7 from the 11th week to the 12th week. The square–root of hourly ozone concentrations are plotted on the vertical axes and hours, on the horizontal axes. [Solid (dotdashed) lines = interpolation and 95% pointwise predictive intervals by the BSP; dash (dot) lines = interpolation and 95% predictive intervals by the DLM; + = interpolation by NAIVE; and ◦ = observations at Ungauged Site 7.]
4.15 Interpolation at Ungauged Site 7 from the 13th week to the 14th week. The square–root of hourly ozone concentrations are plotted on the vertical axes and hours, on the horizontal axes. [Solid (dotdashed) lines = interpolation and 95% pointwise predictive intervals by the BSP; dash (dot) lines = interpolation and 95% predictive intervals by the DLM; + = interpolation by NAIVE; and ◦ = observations at Ungauged Site 7.]
4.16 Interpolation at Ungauged Site 7 from the 15th week to the 16th week. The square–root of hourly ozone concentrations are plotted on the vertical axes and hours, on the horizontal axes. [Solid (dotdashed) lines = interpolation and 95% pointwise predictive intervals by the BSP; dash (dot) lines = interpolation and 95% predictive intervals by the DLM; + = interpolation by NAIVE; and ◦ = observations at Ungauged Site 7.]
4.17 The observed square–root of ozone concentrations ($\sqrt{\text{ppb}}$) during the 1st week, the interpolation using the multivariate BSP, DLM and NAIVE approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Ungauged Site 10.
4.18 The observed square–root of ozone concentrations ($\sqrt{\text{ppb}}$) during the 10th week, the interpolation using the multivariate BSP, DLM and NAIVE approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Ungauged Site 10.
4.19 The observed square–root of ozone concentrations ($\sqrt{\text{ppb}}$) during the 1st week, the interpolation using the multivariate BSP, DLM and NAIVE approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Ungauged Site 9.
4.20 The observed square–root of ozone concentrations ($\sqrt{\text{ppb}}$) during the 1st week, the interpolation using the multivariate BSP, DLM and NAIVE approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Ungauged Site 8.
4.21 The observed square–root of ozone concentrations ($\sqrt{\text{ppb}}$) during the 1st week, the interpolation using the multivariate BSP, DLM and NAIVE approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Ungauged Site 2.
4.22 Boxplot of the simultaneous posterior ellipsoid credibility regions at various nominal levels.
4.23 The ratio of MSPE of the interpolation by NAIVE to that of the multivariate BSP for each of: (a) the 17 weeks; and (b) the 10 ungauged sites.
4.24 The ratio of MSPE of the interpolation by the DLM to that of the multivariate BSP for each of: (a) the 17 weeks; and (b) the 10 ungauged sites.
4.25 Side–by–side boxplots of the coverage probabilities of the multivariate BSP and DLM approaches plotted against the 10 ungauged sites, respectively.
4.26 Side–by–side boxplots of the coverage probabilities of the multivariate BSP and DLM approaches plotted against the time span of 17 weeks, respectively.
5.1 The observed square–root of ozone concentrations ($\sqrt{\text{ppb}}$) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 1.
5.2 The observed square–root of ozone concentrations ($\sqrt{\text{ppb}}$) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 2.
5.3 The observed square–root of ozone concentrations ($\sqrt{\text{ppb}}$) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 3.
5.4 The observed square–root of ozone concentrations ($\sqrt{\text{ppb}}$) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 4.
5.5 The observed square–root of ozone concentrations ($\sqrt{\text{ppb}}$) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 5.
5.6 The observed square–root of ozone concentrations ($\sqrt{\text{ppb}}$) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 6.
5.7 The observed square–root of ozone concentrations ($\sqrt{\text{ppb}}$) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 7.
5.8 The observed square–root of ozone concentrations ($\sqrt{\text{ppb}}$) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 8.
5.9 The observed square–root of ozone concentrations ($\sqrt{\text{ppb}}$) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 9.
5.10 The observed square–root of ozone concentrations ($\sqrt{\text{ppb}}$) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 10.
5.11 The observed square–root of ozone concentrations ($\sqrt{\text{ppb}}$) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 11.
5.12 The observed square–root of ozone concentrations ($\sqrt{\text{ppb}}$) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 12.
5.13 The observed square–root of ozone concentrations ($\sqrt{\text{ppb}}$) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 13.
5.14 The observed square–root of ozone concentrations ($\sqrt{\text{ppb}}$) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 14.
5.15 The width of the 95% pointwise predictive intervals of the one–day ahead prediction at the 14 gauged sites using the multivariate BSP approach.
5.16 The width of the 95% pointwise predictive intervals of the one–day ahead prediction at the 14 gauged sites using the DLM approach.
5.17 Boxplots of the coverage probabilities using the DLM and multivariate BSP approaches at the 95% nominal level.
6.1 Contour plot for the simulated data at day t = 5 in the 18 × 18 grid locations. The AR coefficient in the simulated data is set to be φ = 0.9. (White = -4.0; Black = 4.0.)
6.2 Contour plot for the simulated data at day t = 28 in the 18 × 18 grid locations. The AR coefficient in the simulated data is set to be φ = 0.9. (White = -4.0; Black = 4.0.)
6.3 Histogram (first row), ACFs (second row) and PACFs (third row) for the simulated data at four randomly selected sites in the region. The AR coefficient in the simulated data is set to be φ = 0.9.
6.4 Contour plots for the first two true EOF vectors: (a) – 1st EOF; and (b) – 2nd EOF. (White = -1.7; Black = 1.4.)
6.5 Contour plots for the first two classical EOF vectors: (a) – 1st EOF; and (b) – 2nd EOF. (White = -1.7; Black = 1.4.)
6.6 Contour plots for the first two corrected EOF vectors: (a) – 1st EOF; and (b) – 2nd EOF. (White = -1.7; Black = 1.4.)
6.7 Contour plots for the first 6 true EOFs (ρ = 0.1): (a) – 1st EOF; (b) – 2nd EOF; (c) – 3rd EOF; (d) – 4th EOF; (e) – 5th EOF; and (f) – 6th EOF. (White = -0.6; Black = 0.9.)
6.8 Contour plots for the first 6 classical EOFs (ρ = 0.1): (a) – 1st EOF; (b) – 2nd EOF; (c) – 3rd EOF; (d) – 4th EOF; (e) – 5th EOF; and (f) – 6th EOF. (White = -0.6; Black = 0.9.)
6.9 Contour plots for the first 6 corrected EOFs (ρ = 0.1): (a) – 1st EOF; (b) – 2nd EOF; (c) – 3rd EOF; (d) – 4th EOF; (e) – 5th EOF; and (f) – 6th EOF. (White = -0.6; Black = 0.9.)
6.10 Contour plots for the first 6 classical EOFs (ρ = 0.9): (a) – 1st EOF; (b) – 2nd EOF; (c) – 3rd EOF; (d) – 4th EOF; (e) – 5th EOF; and (f) – 6th EOF. (White = -1.6; Black = 2.2.)
6.11 Contour plots for the first 6 corrected EOFs (ρ = 0.9): (a) – 1st EOF; (b) – 2nd EOF; (c) – 3rd EOF; (d) – 4th EOF; (e) – 5th EOF; and (f) – 6th EOF. (White = -1.6; Black = 2.2.)
Acknowledgements

I am most grateful to my co–supervisors, Constance van Eeden, Nhu Le and Jim Zidek, for the financial support and the flexible working environment they provided. I greatly appreciated the encouragement provided by all three of these individuals.

I benefited from my numerous discussions with Jim, who provided inspiration and encouraged my growth through his excellent guidance throughout my PhD studies. Jim’s patience, valuable suggestions and comments contributed greatly to making this thesis feasible. I am honoured to have had the chance to work with him.

Discussions with Nhu Le proved invaluable and I benefited greatly from his comments and his inspired suggestions. In particular, I profited from his very perceptive insights on how my work related to the realities I set out to describe.

I cannot thank Constance enough for her wise counsel, especially her advice on technical writing and teaching. She constantly encouraged me both academically and personally.

I would like to thank the other member of my advisory committee, Paul Gustafson, for his insightful comments and suggestions. I would like to extend my thanks to all members of my thesis committee for their prompt responses and suggestions on my thesis during the holiday.

I am indebted to Harry Joe and Weiliang Qiu for their help in providing some of the C code used in the software that has been developed in this thesis. I would like to thank Nancy Heckman for her invaluable suggestions when I was trying to solve a problem in my research, and Rick White for his advice on using the C++ language.

I would like to thank Prasad Kasibhatla of the Nicholas School of Environment of Duke University for providing the database used for an implementation in the thesis.

Let me say “thank you” to the Department of Statistics for providing a nice environment for my growth.
I appreciate the chance to study at this institution and all the help I received from the Department’s faculty. I would also like to thank Christine Graham, Rhoda Morgan, Elaine Salameh, Peggy Ng, and Viena Tran for all their help with administrative matters. Many thanks to my fellow students in the department for our valuable discussions.

I would like to thank my parents and my grandparents for their understanding and support throughout my PhD program. I would also like to thank my brother, Yiwen Dou, for his understanding and his efforts to take care of family issues during the years I studied in Vancouver.

YIPING DOU
The University of British Columbia
March 2008

To Xiulan, Benju, Huazhu, and Yuanlie

Chapter 1
Introduction

This thesis addresses one topic in four themes. More specifically, we develop a unified fully Bayesian hierarchical modelling approach to the interpolation and prediction of univariate (respectively multivariate) response variables (respectively vectors) in spatial–temporal fields, or space–time fields. The prediction of certain responses turns out to be vital for human health. Moreover, practitioners seek faster, cheaper and better methods of prediction. These considerations motivate the studies in this thesis.

Many researchers in these areas face responses with spatial structure that changes dynamically over time, in the sense that the underlying process varies in space and time (Wikle and Royle, 2004). The associated random field is called a “space–time field”. Measurements taken on that field yield so–called spatio–temporal data because of stochastic dependence relationships that are both spatial and temporal in nature. Cressie (1993) defines a space–time field as a set of stochastic processes over space that differ over time. Space–time modelling requires that we deal with space–time data for a variety of purposes, using various methods.
The ubiquity of such processes has led to a rich research literature on space–time data problems spread over diverse fields such as environmental health, climatology, epidemiology and ecology. For example, we may wish to investigate the relationship between air pollution and health outcomes such as asthma or chronic obstructive pulmonary disease. We may record the measurements at a number of monitoring sites within the study region. Each of those sites may: measure different sets of pollutants; contain missing data; and have a startup time that differs from those of other monitoring sites in the network.

To model such data, we use a stochastic space–time model to capture the dependence between pollutants, spatially and temporally. That dependence may derive from processes other than those associated with the pollutants themselves. For example, the dependence between pollutants could be due to wind direction and speed, since some pollutants, such as ground–level ozone, may spread with the wind. Temporal dependence may be due to the atmospheric processes that generate the pollutants. Spatial dependence can yield a high correlation between the responses for each pollutant at different monitoring sites. Responses at monitors in geographical proximity tend to be highly correlated, unlike those at monitors that are far apart. These correlations among the monitoring sites in our application seem relatively constant over time.

Section 1.1 introduces the AQS database used in this thesis and some features of ground–level ozone concentrations. Section 1.2 describes some background literature on the topic this thesis addresses. Section 1.3 presents the plan of this thesis.

1.1 AQS Database and Ground–level Ozone

The Air Quality System (AQS) database contains measurements of air pollutant concentrations in the United States, for both criteria air pollutants and hazardous air pollutants.
The former are of more concern as they are regulated under the US Clean Air Act of 1970 to protect human health and welfare. The US Environmental Protection Agency (EPA) sets air quality standards for six criteria air pollutants: Carbon Monoxide (CO), Nitrogen Dioxide (NO2), Sulfur Dioxide (SO2), Ozone (O3), Particulate Matter (PM10 and PM2.5), and Lead (Pb).

In terms of the environment and health, ozone can be good or bad depending on its location in the atmosphere. “Good” ozone occurs at high altitude, about 6 to 30 miles from the surface of the Earth, and is also called stratospheric ozone. Stratospheric ozone reduces the harmful ultraviolet (UV) rays reaching the surface and so protects life on Earth. “Bad” ozone, one of the six principal pollutants regulated by the EPA, occurs close to the surface of the Earth, often within 6 miles, and is also called ground–level ozone or tropospheric ozone. Ground–level ozone is “created by chemical reactions between oxides of nitrogen (NOx) and volatile organic compounds (VOC) in the presence of sunlight”¹. Ground–level ozone concentrations are often high annually in summer and daily in late morning and early afternoon. Ozone absorbs ultraviolet light at a wavelength of approximately 250 nanometers (nm). Ozone is measured by comparing the UV light absorbed by sample air in a flow cell with that absorbed by ozone–free air². Ozone can also be measured by differential optical absorption spectroscopy (DOAS) instrumentation.

The US EPA sets both primary and secondary standards for ground–level ozone at 0.08 parts per million (ppm) by volume. We report ozone levels in parts per billion (ppb) instead of parts per million in order to be consistent with other studies of ozone (while noting that these units are not universally accepted since Europe defines “billion” differently than North America, for example).
Only one–hour measured ground–level ozone concentrations are considered in this thesis, and so the primary and secondary standards³ are both 80 ppb.

Ground–level ozone concentrations are measurements of a space–time process in space–time fields where the “true” ozone levels change over time and monitoring locations. Uncertainty arises from the gap between the measurements and the quantity being measured, that is, the “true” ozone levels. We consider all these features in a Bayesian framework, a framework chosen for its great flexibility. More specifically, in this thesis, we consider fully hierarchical Bayesian models for the prediction of ground–level ozone concentrations, using the AQS database or a simulated database as our application; this application helps us assess our models while being of great substantive importance.

¹ See the following link at US EPA: http://www.epa.gov/air/ozonepollution/basic.html.
² See the link at: http://www.epa.qld.gov.au/environmental management/air/ air quality monitoring/air pollutants/ozone/.
³ According to EPA at http://www.epa.gov/air/ozonepollution/standards.html:
• Primary standards are the limits set “to protect public health, including the health of ‘sensitive’ populations such as asthmatics, children, and the elderly.”
• Secondary standards are the limits set “to protect public welfare, including protection against visibility impairment, damage to animals, crops, vegetation, and buildings.”

1.2 Literature Review

Multivariate models for vectors of pollutants over networks of monitoring sites prove to be much more powerful than their univariate counterparts. (See Gelfand et al. (2005) for one recent approach to the development of such models using a linear coregionalization method within a dynamic linear modeling (DLM) framework to predict the univariate and multivariate responses in space–time domains.)
By combining all the information from different pollutants at multiple locations, we borrow strength and gain a better understanding of the levels of pollution at these sites. Furthermore, we obtain more accurate predictors of the pollutants at ungauged (unmonitored) sites. Moreover, such models enable us to accommodate other site–specific responses in our multivariate framework. For instance, we know that temperature affects ground–level ozone concentrations; ozone levels tend to be higher in summer than in winter. Thus temperature can be incorporated directly in the model even though it would usually be regarded merely as a covariate.

Another feature of space–time fields receiving increased attention in recent literature is nonstationarity. (See Definition 2.2.2 in Chapter 2.) Nonstationarity can be due to the correlation (or covariance) varying with different site features or to heterogeneity in pollutant levels. Higdon (1998) proposes a process convolution approach to define a nonstationary process using basis function expansions. Following that, a dynamic process convolution method is proposed by Calder (2004). Calder and Cressie (2007) review various types of convolution–based models for spatial data, in which they cite the work of Fuentes (2002). Fuentes (2002) models the nonstationary process through the convolution method by defining that process to be a mixture of stationary processes on small subregions. Sampson and Guttorp (1992) propose a deformation approach for capturing process nonstationarity, while Damian et al. (2002) offer a Bayesian version of it. These covariance models fit in well with the Bayesian hierarchical models proposed by Brown et al. (1994), who define and implement the generalized inverted Wishart distribution, particularly to construct covariance models for patterned data such as those that exhibit a monotone (staircase) pattern.
We use a DLM approach to deal with the problem of nonstationarity, amongst other things. In particular, we propose a class of fully hierarchical Bayesian models that accounts for parameter uncertainty and addresses the curse of dimensionality in spatio–temporal modelling. In fact, one of the greatest difficulties in implementing the above approaches for spatial–temporal fields is the computational burden and inefficiency, especially in high–dimensional systems with irregularly located monitoring stations and sparse data. Approaches to tackling this problem include the Bayesian kriging approach of Wikle and Cressie (1999), the process convolution method of Higdon (1998), and the spatial dynamic factor approach of Lopes et al. (2007). As an extension of Wikle and Cressie, we investigate a fully Bayesian approach to constructing the local principal spatial patterns through an EOF approach.

1.3 Introduction to the Thesis

My thesis contains four main themes. The first implements a version of the DLM proposed by Huerta et al. (2004) to model an AQS database of ozone concentrations at one cluster of 10 monitoring stations over the entire summer of 1995. This implementation, along with the background knowledge for the thesis, is introduced in Chapters 2 and 3. In Chapter 2, we first review the definition and properties of spatio–temporal processes and of Kalman filtering and smoothing methods in the Gaussian DLM framework. We then introduce the AQS database (1995) and an exploratory data analysis (EDA) based on that database. Finally, we demonstrate the use of MCMC algorithms for spatial interpolation and temporal prediction at ungauged sites. Following the theory presented in Chapter 2, we implement the DLM in Chapter 3 to model the hourly ozone concentrations in that database. We use the software developed for this thesis, GDLM.1.0, to complete this implementation. We also present potential problems in applying this method to ground–level ozone concentrations in that space–time domain. Further discussion of this implementation of the DLM can be found in Dou et al. (2007).

Computational inefficiency proves a critical problem in using the AQS database (1995) for ozone studies: the DLM is just not scalable to large space–time domains, as our work will show. Thus we explore the use of an alternative, the BSP (Bayesian spatial prediction after prefiltering) or Le–Zidek approach, which we implement for another AQS database (2000) to spatially interpolate and temporally predict the ground–level ozone concentrations. In Chapter 4, we first introduce the related literature and the AQS database for the Chicago area. We then demonstrate the methodology of the BSP approach. We show the existence of the spatial leakage problem (Le and Zidek, 1999) in the DLM framework, a new result of this thesis. We summarize the results on spatial interpolation and compare them with the results using the DLM at the end of this chapter.

Le & Zidek (2006, p.131–183) suggest that a different modelling approach would be needed for the temporal prediction of univariate and multivariate responses in spatio–temporal fields. However, in Chapter 5, we show how to do this with a further modelling step so that the BSP can in fact be made to yield one–day–ahead temporal forecasts for ground–level ozone concentrations. The temporal prediction results using the BSP approach are then compared with those of the DLM and of another alternative called “NAIVE”. We summarize these comparisons at the end of this chapter, and conclude with the advantages and disadvantages of the BSP and DLM approaches.

Because of the enormous computational time savings of the BSP over the DLM approach, we chose to extend and refine the BSP, in particular to incorporate both broad– and fine–scale spatial variations, while incorporating autocorrelation in the time series at each of the sites of interest.
In Chapter 6, we first show, in a simulation study, that the way EOFs (empirical orthogonal functions) are traditionally computed may be misleading when the time series data are autocorrelated. The EOF method is used here, and extended into a fully Bayesian framework, because of its wide and intensive use in the scientific community. Assuming a known temporal covariance function, we show that the corrected EOF method captures the main spatial patterns better than the classical one. Since this covariance will always be unknown and so uncertain in practice, we propose a Bayesian EOF method to represent that uncertainty. In the second simulation study, we compare both the classical and corrected EOFs with the true EOFs, assuming a known and separable spatial–temporal covariance function. However, we do not implement the Bayesian EOF method here. That implementation can use the MCMC algorithms already developed in this chapter.

Finally, the flexible, general structure of the DLM allows us to integrate the BSP approach into the DLM framework. Chapter 7 proposes a unified Bayesian spatio–temporal model for univariate and multivariate responses. Using this new model, we can decompose data variations into three components: long–term spatio–temporal; short–term principal spatial; and short–term spatio–temporal components. The short–term spatio–temporal components can be modelled as a BSP term. This very general, flexible model accounts for temporal correlation in the data and so allows us to update the information on those parameters as new data come in. To maintain computational speed, the Bayesian EOF has been implemented in this model to capture the local–term principal spatial patterns in the detrended spatio–temporal residual fields. We show the model’s generality and flexibility by showing that some well–known related models turn out to be special cases of ours.
Those related models include the DLM of Huerta et al. (2004), and the models of Wikle and Cressie (1999) and Gelfand et al. (2005), and, of course, the BSP approach itself from Le and Zidek (1992). Our model also incorporates certain computational efficiencies from theoretical results we obtain in this thesis. In particular, we develop the MCMC algorithm to draw samples from the joint posterior distribution of model parameters. We can implement our model using this algorithm. However, that implementation will be left to future work.

Finally, in Chapter 8, we discuss problems for future work flowing from this thesis as well as possible directions for their solutions.

Chapter 2
Dynamic Linear Modelling

2.1 Introduction

We are particularly interested in the study of ozone concentrations, due to their importance for the environment. Without the shielding layer of stratospheric ozone, ultraviolet (UV) radiation would harm life on Earth. The increasing incidence of skin cancers, for example, may be linked to a thinning of Earth’s ozone layer. On the other hand, exceedingly high tropospheric ozone levels may cause other health problems, for instance, eye irritation and cardiovascular diseases. We wish to study ozone levels in order to understand them more completely today and to predict them better in the future.

Ozone concentrations form a spatio–temporal field, that is, the response variable is observed over some time period across monitoring stations, which can be fixed or can vary over time. There are many approaches to modelling spatio–temporal data. We have a particular interest in dynamic linear modelling because modelling the time series process is based on “classes of dynamic models,” which are often defined as “sequences of sets of models.” The term “dynamic” refers to changes in the process “due to the passage of time as a fundamental motive force”; when linearity and normality are assumed, such models are the DLMs (dynamic linear models).
This chapter starts with an introduction to some basic notions in space–time theory, some elementary notions and results on the DLM, and a review of the Kalman filter and smoother in Sections 2.2–2.4. The theme of this chapter centers on a DLM approach to modelling the ozone concentrations in the Air Quality System (AQS)⁴ database. The DLM structure (proposed by Huerta et al. (2004)) is specified through an illustrative example in Section 2.5, based on some exploratory data analysis (EDA), and is used throughout the following sections of this chapter. Theoretical results and algorithms for the DLM are presented in Sections 2.6 and 2.7. The MCMC sampling scheme is outlined in Section 2.6.1. The forward–filtering–backward–sampling (FFBS) method is demonstrated in Section 2.6.2 to estimate the state parameters in the DLM. Moreover, we outline the MCMC sampling scheme to obtain samples of the other model parameters from their posterior conditional distributions with a Metropolis–Hastings step, and the theoretical results for prediction and interpolation at ungauged sites from their predictive posterior distributions in Section 2.7.

⁴ AQS was originally called AIRS, the terminology we use hereafter.

2.2 Space–time Process

Let $D$ be the region of monitoring stations under study. For simplicity, we can fix $D$ as a finite domain. Suppose $Y(s_i, t)$ denotes the observation at time $t \in \mathbb{R}$ and site $s_i \in D$, and $Z(s_i, t)$ denotes the space–time process at time $t \in \mathbb{R}$ and site $s_i \in D$, with $t = 1, \ldots, T$ and $i = 1, \ldots, n$. There may be additional covariates available, $x(s_i, t)$. The modelling structure is given by
\[ Y(s_i, t) = Z(s_i, t) + \epsilon(s_i, t), \quad i = 1, \ldots, n, \; t = 1, \ldots, T, \tag{2.1} \]
where $\epsilon(s_i, t)$ is a white noise process. The space–time process $Z(s_i, t)$ can be expressed as
\[ Z(s_i, t) = \mu(s_i, t) + w(s_i, t), \tag{2.2} \]
where $\mu(s_i, t)$ is the mean process obtained from the observed covariate $x(s_i, t)$ and $w(s_i, t)$ is a mean–zero spatio–temporal process.
Definition 2.2.1 The space–time covariance function is defined as
\[ C(s_1, s_2; t_1, t_2) = \mathrm{Cov}[w(s_1, t_1), w(s_2, t_2)], \tag{2.3} \]
where $s_i$ is the $i$th location and $t_i$ is a time point, for $i = 1, 2$.

Definition 2.2.2 The zero mean spatio–temporal process $w(s, t)$ is covariance stationary if
\[ C(s_1, s_2; t_1, t_2) = C(s_1 - s_2; t_1 - t_2) = C(h; \eta), \tag{2.4} \]
where $h = s_1 - s_2$ and $\eta = t_1 - t_2$.

Note that $h$ in (2.4) denotes the vector distance between the two sites. If $w(s, t)$ does not satisfy (2.4), it is called a nonstationary spatio–temporal process.

Definition 2.2.3 The zero mean spatio–temporal process $w(s, t)$ is isotropic if
\[ C(h; \eta) = C(\|h\|; |\eta|), \tag{2.5} \]
which means that the covariance function depends on the spatial separation vector and the time lag only through their lengths, $\|h\|$ and $|\eta|$.

If the spatio–temporal process is not isotropic, then it is called anisotropic.

The covariance structure of the spatio–temporal process can be simplified by assuming separability (see Definition 2.2.4).

Definition 2.2.4 The zero mean spatio–temporal isotropic process $w(s, t)$ is separable if
\[ C(\|h\|; |\eta|) = C_s(\|h\|)\, C_t(|\eta|), \tag{2.6} \]
that is, the covariance function of the spatio–temporal process can be decomposed as the product of an isotropic spatial and an isotropic temporal covariance function.

If the covariance of the spatio–temporal process is not separable, it is called non–separable and the process is then nonseparable.

As Gelfand et al. (2004) mention, for a nonstationary spatial process the general approach is to construct a valid separable covariance function. Brown et al. (1994b) and Le et al. (1998) investigate a separable covariance structure to deal with nonstationary multivariate spatial models in a hierarchical Bayesian framework. The advantages of this approach are: (i) it reduces the number of parameters to be estimated in the model; (ii) it provides a positive definite covariance matrix.
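To make Definition 2.2.4 concrete, the following is a minimal sketch of a separable covariance matrix on a small space–time grid. It assumes exponential forms for both $C_s$ and $C_t$, which is an illustrative choice and not one made in the text; the site coordinates, time points, and the names `exp_cov`, `separable_cov`, `range_s` and `range_t` are all hypothetical.

```python
import numpy as np

def exp_cov(dist, scale):
    """Exponential covariance exp(-d / scale); an assumed illustrative form."""
    return np.exp(-np.asarray(dist, dtype=float) / scale)

def separable_cov(sites, times, range_s, range_t):
    """Covariance of w(s, t) on a sites x times grid under (2.6).

    Rows and columns are ordered with the site index varying fastest,
    so entry (t1 * n + i, t2 * n + j) equals Cs[i, j] * Ct[t1, t2].
    """
    sites = np.asarray(sites, dtype=float)
    times = np.asarray(times, dtype=float)
    h = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)  # ||h||
    eta = np.abs(times[:, None] - times[None, :])                       # |eta|
    Cs = exp_cov(h, range_s)
    Ct = exp_cov(eta, range_t)
    return np.kron(Ct, Cs), Cs, Ct

# Hypothetical layout: 3 sites in the plane, 4 time points.
sites = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
times = np.arange(4)
C, Cs, Ct = separable_cov(sites, times, range_s=1.5, range_t=2.0)
```

The Kronecker product inherits symmetry and positive definiteness from its two factors, which is the second advantage listed above; the first shows up in the fact that only the two small matrices `Cs` and `Ct` need to be specified rather than the full 12 × 12 matrix.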
Because of the dynamic features of spatio–temporal data, we are particularly interested in modelling the process dynamically using a dynamic modelling approach. In the following section, we introduce some basic notation, along with results, from dynamic modelling.

2.3 Spatio–temporal DLM

Suppose $\mathbf{y}_t : n \times 1$ is a vector observation, for $t = 1, 2, \ldots$. The DLM contains two equations: the observation equation and the evolution equation. The observation equation is given by
\[ \mathbf{y}_t = \mathbf{F}_t' \mathbf{x}_t + \boldsymbol{\nu}_t, \quad \boldsymbol{\nu}_t \sim N[\mathbf{0}, \mathbf{V}_t], \tag{2.7} \]
and the evolution equation is given by
\[ \mathbf{x}_t = \mathbf{G}_t \mathbf{x}_{t-1} + \boldsymbol{\omega}_t, \quad \boldsymbol{\omega}_t \sim N[\mathbf{0}, \mathbf{W}_t], \tag{2.8} \]
where $\mathbf{F}_t : p \times n$, $\mathbf{G}_t : p \times p$, $\mathbf{V}_t : n \times n$, and $\mathbf{W}_t : p \times p$ are known matrices.

In Equation (2.7), $\mathbf{F}_t$ is called the design matrix, $\mathbf{x}_t$ is the state, or system, vector, and $\boldsymbol{\nu}_t$ is the observational error. Equation (2.8) is also called the state or system equation. $\mathbf{G}_t$ is the system or state matrix and $\boldsymbol{\omega}_t$ is the system, or evolution, error with evolution variance matrix $\mathbf{W}_t$.

The dynamic linear model is completed with the initial information for the state parameter given by
\[ (\mathbf{x}_0 \mid \mathbf{y}_0) \sim N[\mathbf{m}_0, \mathbf{C}_0]. \tag{2.9} \]

2.4 Kalman Filter and Smoother

Kalman recursion, or Kalman filtering as it is sometimes called, is used to update and forecast the state parameters. An analogous method for updating and forecasting state parameters in the DLM framework is derived by West and Harrison (1997, Chapter 4). This result can also be derived by means of Bayesian inference. Carter and Kohn (1994) label this approach forward–filtering–backward–sampling. It is particularly powerful and will be used in the following chapter. It can, moreover, be summarized in the following theorem, which will be applied to the filtering and smoothing processes.

Theorem 2.4.1 (West and Harrison, 1997) Let $\mathbf{y}_{1:t} = (\mathbf{y}_1, \ldots, \mathbf{y}_t)$, $t \geq 1$, denote all the responses observed until time $t$. Under models (2.7)–(2.8), together with the initial state information in (2.9), we have for $t = 2, \ldots, T$:

(i)
\[ (\mathbf{x}_{t-1} \mid \mathbf{y}_{1:t-1}, \theta) \sim N[\mathbf{m}_{t-1}, \mathbf{C}_{t-1}] \]
\[ (\mathbf{x}_t \mid \mathbf{y}_{1:t-1}, \theta) \sim N[\mathbf{a}_t, \mathbf{R}_t] \]
\[ (\mathbf{y}_t \mid \mathbf{y}_{1:t-1}, \theta) \sim N[\mathbf{f}_t, \mathbf{Q}_t] \]
\[ (\mathbf{x}_t \mid \mathbf{y}_{1:t}, \theta) \sim N[\mathbf{m}_t, \mathbf{C}_t], \]
where
\[ \mathbf{a}_t = \mathbf{G}_t \mathbf{m}_{t-1}, \qquad \mathbf{R}_t = \mathbf{G}_t \mathbf{C}_{t-1} \mathbf{G}_t' + \mathbf{W}_t, \]
\[ \mathbf{f}_t = \mathbf{F}_t' \mathbf{a}_t, \qquad \mathbf{Q}_t = \mathbf{F}_t' \mathbf{R}_t \mathbf{F}_t + \mathbf{V}_t, \]
\[ \mathbf{e}_t = \mathbf{y}_t - \mathbf{f}_t, \qquad \mathbf{A}_t = \mathbf{R}_t \mathbf{F}_t \mathbf{Q}_t^{-1}, \]
\[ \mathbf{m}_t = \mathbf{a}_t + \mathbf{A}_t \mathbf{e}_t, \qquad \mathbf{C}_t = \mathbf{R}_t - \mathbf{A}_t \mathbf{Q}_t \mathbf{A}_t'. \]

(ii) Let $\mathbf{B}_t = \mathbf{C}_t \mathbf{G}_{t+1}' \mathbf{R}_{t+1}^{-1}$. For $0 \leq k \leq T - 1$,
\[ (\mathbf{x}_{T-k} \mid \mathbf{y}_{1:T}, \theta) \sim N[\mathbf{a}_T(-k), \mathbf{R}_T(-k)], \tag{2.10} \]
where
\[ \mathbf{a}_T(-k) = \mathbf{m}_{T-k} + \mathbf{B}_{T-k}[\mathbf{a}_T(-k+1) - \mathbf{a}_{T-k+1}], \]
\[ \mathbf{R}_T(-k) = \mathbf{C}_{T-k} + \mathbf{B}_{T-k}[\mathbf{R}_T(-k+1) - \mathbf{R}_{T-k+1}]\mathbf{B}_{T-k}', \]
with $\mathbf{a}_T(0) = \mathbf{m}_T$, $\mathbf{R}_T(0) = \mathbf{C}_T$, $\mathbf{a}_{T-k}(1) = \mathbf{a}_{T-k+1}$, and $\mathbf{R}_{T-k}(1) = \mathbf{R}_{T-k+1}$.

Note that the distributions for the smoothing and filtering processes are obtained conditionally on the parameter vector $\theta$. In practice, it is often exceedingly difficult to obtain the posterior distribution of $\theta$ because it may not have a closed form. In this case, the MCMC method is often used to obtain samples of $\theta$ from its posterior conditional distribution. After obtaining samples of $\theta$, we can use Theorem 2.4.1 to get samples of the state parameters from the corresponding distributions. Once we have all the samples of $\theta$ and of the state parameters, we can carry out prediction and interpolation, the general goals of the kriging method.

Next we set up the DLM for the Cluster 2 AQS database. This DLM will be implemented in Chapter 3.

2.5 An Illustrative Example: Cluster 2 AQS Database (1995)

We consider hourly ozone concentrations in ppb measured during 1995 at 375 different monitoring stations irregularly located in the USA. Among the 375 stations, we choose three clusters of sites in close proximity with 10 monitoring stations each, for a total of 30 monitoring stations. The geographical locations of these stations are given by their latitudinal and longitudinal coordinates (see Figure 2.1).

Our goal is to construct a suitable DLM for the Cluster 2 sites based on the exploratory data analysis (EDA), as Huerta et al. (2004) suggest. The missing data are filled in initially by the spatial regression method (SRM).
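Looking back at Theorem 2.4.1(i) before the model is specialized to the Cluster 2 data, one step of the filtering recursion can be sketched as follows. This is only an illustration under stated assumptions and is not the GDLM.1.0 code: the function names `kalman_step` and `kalman_filter`, the time–constant matrices, and the simulated two–site data are all hypothetical.

```python
import numpy as np

def kalman_step(m_prev, C_prev, y, F, G, V, W):
    """One forward-filtering update of Theorem 2.4.1(i).

    Shapes follow the text: F is p x n, G is p x p, V is n x n, W is p x p.
    """
    a = G @ m_prev                      # prior mean of x_t
    R = G @ C_prev @ G.T + W            # prior covariance of x_t
    f = F.T @ a                         # one-step forecast mean of y_t
    Q = F.T @ R @ F + V                 # one-step forecast covariance
    e = y - f                           # forecast error
    A = np.linalg.solve(Q, F.T @ R).T   # A_t = R_t F_t Q_t^{-1} (Q_t, R_t symmetric)
    m = a + A @ e                       # posterior mean of x_t
    C = R - A @ Q @ A.T                 # posterior covariance of x_t
    return m, C, a, R

def kalman_filter(ys, m0, C0, F, G, V, W):
    """Apply the update to each row of ys, keeping (m_t, C_t, a_t, R_t)."""
    m, C, out = m0, C0, []
    for y in ys:
        m, C, a, R = kalman_step(m, C, y, F, G, V, W)
        out.append((m, C, a, R))
    return out

# Scalar check with hand-computable values: m0 = 0, C0 = 1, V = W = 1, y = 2.
one = np.array([[1.0]])
m1, C1, a1, R1 = kalman_step(np.array([0.0]), one, np.array([2.0]),
                             one, one, one, one)

# Two sites sharing one random-walk state (p = 1, n = 2); simulated data.
rng = np.random.default_rng(0)
F2 = np.array([[1.0, 1.0]])
x = np.cumsum(rng.normal(size=50))
ys = x[:, None] * F2 + rng.normal(scale=0.5, size=(50, 2))
path = kalman_filter(ys, np.zeros(1), one, F2, one, 0.25 * np.eye(2), one)
```

In the scalar check, $R_1 = 2$, $Q_1 = 3$ and $A_1 = 2/3$, so $m_1 = 4/3$ and $C_1 = 2/3$. The stored $(\mathbf{m}_t, \mathbf{C}_t, \mathbf{a}_t, \mathbf{R}_t)$ are exactly the quantities that the backward recursion in part (ii) consumes.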
The empirical distribution of the ozone concentrations has an asymmetric shape, so the square roots of the ozone concentrations are used as the responses, in view of the normality assumption of the DLM. For our study of the periodicity of the ozone dataset, we plot the Bayesian periodograms (Bretthorst, 1988) for the square–root of ozone levels for the summer of 1995 at Cluster 2 sites in Figure 2.2. We find a high peak between 1 pm and 3 pm each day over the 120 days, and Figure 2.2 shows a significant 24–hour cycle at all these gauged sites. We also find a slightly significant 12–hour cycle by plotting the periodicities which contribute more variation according to the spectrum for each station in Cluster 2 sites. We do not find any obvious weekly cycles or any nightly peaks for this database.

Figure 2.1: Geographic locations for the 1995 AQS database on a US map, where the latitude and longitude are measured in degrees. (Diamond = Cluster 1 sites; Upper–triangle = Cluster 2 sites; Down–triangle = Cluster 3 sites.)

The DLM used in this chapter is a variation of the one proposed by Huerta et al. (2004). The state vector equation accounts for the trend and periodicity across the sites. The validity of this model for this example is assessed in Chapter 3. Given information on other covariates, such as temperature or wind speed, better prediction of responses might be obtained using the DLM. However, lacking such covariates, we use this variation of the DLM of Huerta et al. (2004).

Figure 2.2: Bayesian periodogram for the square–root of hourly ozone concentrations at Cluster 2 sites in the AQS database from May 15 to September 11 (1995). (Horizontal axis: wavelength; vertical axis: log likelihood.)

Let $y_{it}$ denote the observed square–root of the ozone concentration at site $s_i$ and time $t$, with $i = 1, \ldots, n$ and $t = 1, \ldots, T$, where $n$ represents the total number of gauged sites (that is, sites with observations) in our study and $T$ the total number of time points. A variant of the state–space model for such a database is given by (Huerta et al., 2004):
\[ \mathbf{y}_t = \mathbf{1}_n' \beta_t + S_{1t}(a_1)\boldsymbol{\alpha}_{1t} + S_{2t}(a_2)\boldsymbol{\alpha}_{2t} + \boldsymbol{\nu}_t \tag{2.11} \]
\[ \beta_t = \beta_{t-1} + w_t \tag{2.12} \]
\[ \boldsymbol{\alpha}_{jt} = \boldsymbol{\alpha}_{j,t-1} + \boldsymbol{\omega}^{\alpha_j}_t, \tag{2.13} \]
where $\boldsymbol{\nu}_t \sim N[\mathbf{0}, \sigma^2 \mathbf{V}_\lambda]$, $w_t \sim N[0, \sigma^2 \tau_y^2]$ and $\boldsymbol{\omega}^{\alpha_j}_t \sim N[\mathbf{0}, \sigma^2 \tau_j^2 \mathbf{V}_{\lambda_j}]$, with $\mathbf{V}_\lambda = \exp(-\mathbf{V}/\lambda)$ and $\mathbf{V}_{\lambda_j} = \exp(-\mathbf{V}/\lambda_j)$, for $j = 1, 2$. Let $\mathbf{y}_t = (y_{1t}, \ldots, y_{nt})' : n \times 1$ and $\boldsymbol{\alpha}_{jt} = (\alpha_{j1t}, \ldots, \alpha_{jnt})' : n \times 1$, $j = 1, 2$. Here $\beta_t$ denotes a canonical spatial trend and $\alpha_{jit}$ a coefficient for site $s_i$ at time $t$ corresponding to a periodicity component $S_{jt}(a_j)$, where $S_{jt}(a_j) = \cos(\pi t j/12) + a_j \sin(\pi t j/12)$, for $j = 1, 2$. Note that $\mathbf{V} = (v_{ij}) : n \times n$ represents the distance matrix for the gauged sites $s_1, \ldots, s_n$, that is, $v_{ij} = \|s_i - s_j\|$ for $i, j = 1, \ldots, n$, where $\|s_i - s_j\|$ denotes the Euclidean distance between sites $s_i$ and $s_j$ in kilometers. The prior for the initial state $\mathbf{x}_0 = (\beta_0, \boldsymbol{\alpha}_{10}', \boldsymbol{\alpha}_{20}')'$ is as given in (2.9).

Models (2.11)–(2.13) can also be written as in (2.7) and (2.8) by letting $\mathbf{G}_t = \mathbf{I}$, $\mathbf{V}_t = \sigma^2 \mathbf{V}_\lambda$ and $\mathbf{W}_t = \sigma^2 \mathbf{W}$, where $\mathbf{W}$ is a block diagonal matrix with diagonal entries $\tau_y^2$, $\tau_1^2 \exp(-\mathbf{V}/\lambda_1)$, and $\tau_2^2 \exp(-\mathbf{V}/\lambda_2)$. In other words, the observation and the state equations can also be written as
\[ \mathbf{y}_t = \mathbf{F}_t' \mathbf{x}_t + \boldsymbol{\nu}_t \tag{2.14} \]
\[ \mathbf{x}_t = \mathbf{x}_{t-1} + \boldsymbol{\omega}_t, \tag{2.15} \]
where $\mathbf{x}_t' = (\beta_t, \boldsymbol{\alpha}_{1t}', \boldsymbol{\alpha}_{2t}')$, and $\mathbf{F}_t'$ is given by
\[ \mathbf{F}_t' = \begin{pmatrix} 1 & S_{1t}(a_1) & 0 & \cdots & 0 & S_{2t}(a_2) & 0 & \cdots & 0 \\ 1 & 0 & S_{1t}(a_1) & \cdots & 0 & 0 & S_{2t}(a_2) & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & S_{1t}(a_1) & 0 & 0 & \cdots & S_{2t}(a_2) \end{pmatrix}. \]

Let $\mathbf{y}_{1:T} = (\mathbf{y}^m_{1:T}, \mathbf{y}^o_{1:T})'$, where $\mathbf{y}^m_{1:T} = (\mathbf{y}^m_1, \ldots, \mathbf{y}^m_T)$ represents all the missing values and $\mathbf{y}^o_{1:T}$ all the observed values in Cluster 2 sites for $t = 1, \ldots, T$. The model parameters are $(\lambda, \sigma^2, \mathbf{x}_{1:T}, \mathbf{y}^m_{1:T}, a_1, a_2)$, in which $\mathbf{x}_{1:T} = (\mathbf{x}_1, \ldots, \mathbf{x}_T)$ are the state parameters until time $T$, $\lambda$ is the range parameter, $\sigma^2$ the variance parameter and $\mathbf{a} = (a_1, a_2)$ the phase parameters. Here we assume constant, i.e., site–invariant, phase parameters over all gauged sites, because an empirical study at each gauged site confirms this feature of the spatio–temporal data in this field. Define $\gamma = (\tau_y^2, \tau_1^2, \lambda_1, \tau_2^2, \lambda_2)$ to be the vector of parameters fixed in the DLM.

The DLM is then completed with the following hyperpriors for some of the model parameters:
\[ \lambda \sim IG(\alpha_\lambda, \beta_\lambda) \]
\[ \sigma^2 \sim IG(\alpha_{\sigma^2}, \beta_{\sigma^2}) \]
\[ \mathbf{a} \sim N(\boldsymbol{\mu}^o_a, \boldsymbol{\Sigma}^o_a). \]
The choice of these hyperpriors is addressed in Section 3.1.

We express the state–space model in two different ways because of our dual objectives of inference for parameters and interpolation. For simplicity, we use models (2.14)–(2.15) for inference on the range, variance and state parameters and on the missing values in Sections 2.6.2–2.6.3, and use models (2.11)–(2.13) for inference about the phase parameters in Section 2.6.4 and for spatial interpolation in Section 2.7.

We can see that the state–space models in (2.14)–(2.15) capture some important features of the AIRS database. They reflect the time–dependent structure of the data and capture the diurnal patterns of ozone concentrations across all the sites. The implementation of the DLM for this database is revisited in Chapter 3.

2.6 Algorithms for Estimating the Model Parameters

For the purpose of interpolation and prediction, one has to estimate all the unknown model parameters at each gauged site and each time point. Our goal in this section is to give the details needed to estimate the model parameters by the MCMC method and the forward–filtering–backward–sampling algorithm developed by Carter and Kohn (1994).

In Section 2.6.1, we introduce the Metropolis–within–Gibbs method to sample from the target distribution for the model parameters given all the observations.
Sections 2.6.2, 2.6.3 and 2.6.4 give details about implementing this method under the DLM. The algorithm for estimating the model parameters is then summarized in Section 2.6.5.

2.6.1 Metropolis–within–Gibbs algorithm

Consider the state–space model (2.14)–(2.15). Let $\mathbf{y}^o_{1:T} = (\mathbf{y}^o_1, \ldots, \mathbf{y}^o_T) : n \times T$ be the observation matrix at the $n$ gauged sites until time $T$. Let $\mathbf{x}_{1:T} = (\mathbf{x}_1, \ldots, \mathbf{x}_T) : (2n+1) \times T$ be the state parameters at the $n$ gauged sites until time $T$. For simplicity, the coordinates of $\gamma$ are fixed; the problem of setting them is addressed later in Section 3.5. The target distribution of interest is given by
\[ p(\lambda, \sigma^2, \mathbf{x}_{1:T}, \mathbf{y}^m_{1:T}, a_1, a_2 \mid \mathbf{y}^o_{1:T}). \]
Since the target density does not have a closed form, direct sampling methods cannot draw samples from it. The MCMC method is a popular way to sample the parameters sequentially from their posterior distributions, each draw depending on the last value drawn, iterating until the chain converges. As we can see from Huerta et al. (2004), the MCMC method can be used to obtain the posterior, predictive and interpolation results for the DLM.

A blocking MCMC scheme is used to sample each component iteratively from the target distribution. Three blocks are chosen: $(\lambda, \sigma^2, \mathbf{x}_{1:T})$, $\mathbf{y}^m_{1:T}$ and $(a_1, a_2)$. There are two reasons for choosing these blocks. Firstly, it is natural to select blocks within which the parameters are highly correlated but which are relatively conditionally independent of one another. The phase parameters are assumed not to depend on time or location and so are less correlated with the other model parameters. The second reason is that the full conditional posterior distribution of the phase parameters can be obtained by assuming a bivariate normal hyperprior. Details about inference are presented in Appendix A.2.
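To make the state–space form (2.14)–(2.15) used throughout this section concrete, the sketch below assembles $\mathbf{F}_t'$, $\mathbf{V}_t = \sigma^2 \exp(-\mathbf{V}/\lambda)$ and $\mathbf{W}_t = \sigma^2 \mathbf{W}$ for a given time point. It is an illustration under assumptions, not the thesis software: the function names, the three–site distance matrix `D` and all parameter values are hypothetical.

```python
import numpy as np

def periodic_term(t, j, a_j):
    """S_jt(a_j) = cos(pi t j / 12) + a_j sin(pi t j / 12)."""
    return np.cos(np.pi * t * j / 12.0) + a_j * np.sin(np.pi * t * j / 12.0)

def design_matrix_T(t, n, a1, a2):
    """F_t' of (2.14): the n x (2n + 1) matrix [1_n, S_1t I_n, S_2t I_n]."""
    eye = np.eye(n)
    return np.hstack([np.ones((n, 1)),
                      periodic_term(t, 1, a1) * eye,
                      periodic_term(t, 2, a2) * eye])

def dlm_covariances(D, lam, sigma2, tau2_y, tau2_1, lam1, tau2_2, lam2):
    """Return V_t = sigma2 * exp(-D / lam) and W_t = sigma2 * W (block diagonal)."""
    n = D.shape[0]
    V_t = sigma2 * np.exp(-D / lam)
    W = np.zeros((2 * n + 1, 2 * n + 1))
    W[0, 0] = tau2_y                                   # trend beta_t
    W[1:n + 1, 1:n + 1] = tau2_1 * np.exp(-D / lam1)   # 24-hour coefficients
    W[n + 1:, n + 1:] = tau2_2 * np.exp(-D / lam2)     # 12-hour coefficients
    return V_t, sigma2 * W

# Hypothetical inter-site distances in kilometers for n = 3 sites.
D = np.array([[0.0, 10.0, 25.0],
              [10.0, 0.0, 18.0],
              [25.0, 18.0, 0.0]])
Ft_T = design_matrix_T(t=6, n=3, a1=0.5, a2=-0.2)
V_t, W_t = dlm_covariances(D, lam=20.0, sigma2=1.0, tau2_y=0.1,
                           tau2_1=0.2, lam1=30.0, tau2_2=0.2, lam2=30.0)
```

With $j = 1$ the periodic term has a 24–hour period and with $j = 2$ a 12–hour period, matching the two cycles found in Section 2.5. The matrices returned here have the shapes that the filtering recursion of Theorem 2.4.1 expects, with $p = 2n + 1$.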
Since there is no direct way to sample from the target distribution, Gibbs sampling is used to sample these three blocks iteratively from their full conditional posterior distributions. In other words, we can iteratively:

(i) sample from p(x1:T, λ, σ2 | a1, a2, y1:T),
(ii) sample from p(ym1:T | λ, σ2, a1, a2, yo1:T), and
(iii) sample from p(a1, a2 | x1:T, λ, σ2, y1:T).

The problem is that we do not have a closed form for p(λ, σ2, x1:T | a1, a2, y1:T) either. However, the full conditional posterior distribution of x1:T can be obtained explicitly by Kalman filtering and smoothing (Section 2.4), i.e., the FFBS algorithm. Assuming an inverse Gamma hyperprior for σ2, the conditional posterior distribution of σ2 given the range and phase parameters is also inverse Gamma, with new shape and scale parameters. Note that

\[ p(\lambda, \sigma^2, x_{1:T} \mid a_1, a_2, y_{1:T}) = p(\lambda \mid a_1, a_2, y_{1:T})\, p(\sigma^2 \mid \lambda, a_1, a_2, y_{1:T})\, p(x_{1:T} \mid \lambda, \sigma^2, a_1, a_2, y_{1:T}), \tag{2.16} \]

which shows that we can sample in turn from the three conditional posterior distributions on the right–hand side of (2.16) to obtain samples from p(λ, σ2, x1:T | a1, a2, y1:T). However, there is no closed form for p(λ | a1, a2, y1:T), so we must sample λ from it by a Metropolis–Hastings step within a Gibbs sampling cycle. This algorithm is often called Metropolis–within–Gibbs. Details about sampling from the joint target distribution are given in the next three subsections.

2.6.2 Sampling from p(λ, σ2,x1:T |a1, a2,y1:T )

We use the block MCMC scheme to sample (λ, σ2, x1:T) from p(λ, σ2, x1:T | a1, a2, y1:T). Because of (2.16), ideally we could iteratively sample λ from p(λ | a1, a2, y1:T), σ2 from p(σ2 | λ, a1, a2, y1:T), and x1:T from p(x1:T | λ, σ2, a1, a2, y1:T).
However, because we do not have a closed form for the posterior density p(λ | a1, a2, y1:T), we instead use the Metropolis–Hastings algorithm to sample λ, given all the observations, from the following expression, which is proportional to its posterior density:

\[ p(\lambda \mid a_1, a_2, y_{1:T}) \propto p(\lambda) \prod_{t=1}^{T} |Q_t|^{-1/2} \left[ \beta + \frac{1}{2} \sum_{t=1}^{T} e_t' Q_t^{-1} e_t \right]^{-(nT/2 + \alpha)}. \tag{2.17} \]

Details are included in Appendix A.1.

Since we cannot compute the normalization constant for p(λ | a1, a2, y1:T), it is impossible to sample λ directly from its posterior distribution, and the Metropolis–Hastings algorithm must be used here. The Metropolis–Hastings algorithm constructs a reversible Markov chain, that is, one whose transition probabilities and stationary density satisfy the detailed balance equations, so that the target is the equilibrium distribution of the chain; computation and simulation are easier for such chains. In the Metropolis–Hastings algorithm, the transition kernel for the new state of the chain is built from two components: q(., .), the proposal density, and α(., .), the acceptance probability.

We choose the proposal density, q(., .), to be a lognormal distribution because the parameter space is bounded below by 0, which makes a Gaussian proposal density inappropriate. As Moller (2002) notes, this alternative to the random walk Metropolis considers the proposal move to be a random multiple of the current state. From the current state λ(j−1) (j > 1), the proposed move is λ∗ = λ(j−1)eZ, where Z is drawn from a symmetric density, such as the normal. In other words, at iteration j, we sample a new λ∗ from this proposal distribution, centered at the previously sampled λ(j−1), with a tuning parameter τ2 as the variance of the distribution of Z. Gammerman (2006) suggests that the acceptance rate, that is, the ratio of the number of accepted λ∗ to the total number of iterations, should be around 50%. We tune τ2 to attain that rate. If the acceptance rate is too high, for example, 70% to 100%, we then increase τ2.
Similarly, if the acceptance rate is too low, for example, 0 to 20%, we decrease τ2 to narrow down the search area for λ∗.

The Metropolis–Hastings algorithm proceeds as follows. Given λ(j−1), where j > 1,

• Draw λ∗ from LN(log λ(j−1), τ2), that is, λ∗ = λ(j−1)eZ with Z ∼ N(0, τ2) (see footnote 5).

• Compute the acceptance probability

\[ \alpha(\lambda^{(j-1)}, \lambda^*) = \min\left\{ 1, \frac{p(\lambda^* \mid y_{1:T}) / q(\lambda^{(j-1)}, \lambda^*)}{p(\lambda^{(j-1)} \mid y_{1:T}) / q(\lambda^*, \lambda^{(j-1)})} \right\}. \]

• Accept λ∗ with probability α(λ(j−1), λ∗). In other words, sample u ∼ U[0, 1] and let λ(j) = λ∗ if u < α(λ(j−1), λ∗) and λ(j) = λ(j−1) otherwise.

Footnote 5: X ∼ LN(a, b) means that X follows a lognormal distribution. In other words, X = exp(Y) where Y ∼ N(a, b).

We run this algorithm iteratively until convergence is reached.

Next we sample σ2, given the accepted λ, a1, a2 and the observations. The prior for σ2 is chosen to be an inverse gamma distribution with shape parameter α and scale parameter β. The posterior distribution for σ2 is also an inverse gamma distribution, but with shape parameter α + nT/2 and scale parameter β + (1/2) ∑_{t=1}^{T} et′ Qt^{−1} et.

We now sample x1:T given the accepted λ, σ2, a1, a2 and y1:T, using the forward–filtering–backward–sampling (FFBS) method as described in Section 2.4. West and Harrison (1997) propose a general theorem for inference on the parameters in the DLM framework. For time series data, the usual method for updating and predicting is the Kalman filter. We present the following theorem as the FFBS algorithm (similar to the Kalman filter algorithm) to resample the state parameters conditional on all the other parameters and observations. FFBS is used as part of an MCMC method to sample x1:T = (x1, . . . , xT) from the smoothing distribution p(x1:T | λ, σ2, a1, a2, y1:T). It is called FFBS because the state parameters xt are first updated (filtered) recursively forward in time, for t from 1 to T, as each new observation arrives, and each xt is then sampled recursively backward in time, for t from T to 1, using all the information.
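The two updates described above for λ and σ2 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the GDLM implementation: the function `log_target` is a hypothetical stand-in (a toy inverse gamma kernel) for the logarithm of the unnormalized density (2.17), and the names `mh_step`, `run_chain`, `sigma2_posterior` and `draw_sigma2` are introduced here for illustration. For the multiplicative proposal λ∗ = λ(j−1)eZ, the proposal ratio reduces to λ∗/λ(j−1), which is the correction term used in the sketch.

```python
import math
import random


def log_target(lam, alpha=5.0, beta=10.0):
    """Hypothetical stand-in for log of (2.17): an unnormalised IG(alpha, beta) kernel."""
    return -(alpha + 1.0) * math.log(lam) - beta / lam


def mh_step(lam_prev, tau2, rng):
    """One Metropolis-Hastings update with proposal lam* = lam_prev * exp(Z), Z ~ N(0, tau2)."""
    lam_star = lam_prev * math.exp(rng.gauss(0.0, math.sqrt(tau2)))
    # Lognormal proposal ratio: q(lam*, lam_prev) / q(lam_prev, lam*) = lam* / lam_prev.
    log_ratio = (log_target(lam_star) + math.log(lam_star)
                 - log_target(lam_prev) - math.log(lam_prev))
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is always defined
    if math.log(u) < min(0.0, log_ratio):
        return lam_star, True
    return lam_prev, False


def run_chain(lam0, tau2, n_iter, seed=0):
    """Run the chain, keeping every iterate (accepted or not); return draws and acceptance rate."""
    rng = random.Random(seed)
    lam, accepted, samples = lam0, 0, []
    for _ in range(n_iter):
        lam, ok = mh_step(lam, tau2, rng)
        accepted += ok
        samples.append(lam)
    return samples, accepted / n_iter


def sigma2_posterior(alpha, beta, n, T, quad_sum):
    """Inverse gamma posterior parameters; quad_sum is the sum over t of e_t' Q_t^{-1} e_t."""
    return alpha + n * T / 2.0, beta + 0.5 * quad_sum


def draw_sigma2(alpha, beta, n, T, quad_sum, rng):
    """Draw sigma^2 ~ IG(shape, scale) as scale / G with G ~ Gamma(shape, 1)."""
    shape, scale = sigma2_posterior(alpha, beta, n, T, quad_sum)
    return scale / rng.gammavariate(shape, 1.0)
```

In use, one would call `run_chain` with a trial value of τ2, inspect the returned acceptance rate, and increase or decrease τ2 as described in the text before a production run.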
Theoretical results for models (2.14) and (2.15) are presented to give an idea of how to draw samples from the posterior distribution for x1:T. One can also obtain them from Theorem 2.4.1 in Section 2.4 by letting Gt = I, Vt = σ2Vλ and Wt = σ2W. The initial state parameter is given by

\[ (x_0 \mid y_0, \theta) \sim N[m_0, \sigma^2 C_0], \tag{2.18} \]

where y0 denotes the initial information, and m0 and C0 are known values. Later, in Section 3.1, we consider how to specify them for the Cluster 2 AQS database (1995). Let θ = (λ, σ2, a1, a2, γ). Assume all the prior information has been given and that the coordinates of θ are mutually independent. The details of this algorithm are included in Appendix A.2.

2.6.3 Sampling from p(ym1:T |λ, σ2,x1:T ,yo1:T )

The MCMC method has the important advantage that it enables us to fill in the missing values at each iteration, that is, to treat missing values like model “parameters”. In this way, it avoids the use of ad hoc methods for filling in missing values, such as period means or the spatial regression method.

At any fixed time point t, after appropriately defining a permutation matrix Rt, we can rewrite the observation vector yt as follows:

\[ R_t y_t = \begin{pmatrix} y_t^m \\ y_t^o \end{pmatrix}, \]

where ymt : nt × 1 denotes the missing response(s) at time t and yot : (n − nt) × 1 the observed response(s) at t. Here “o” stands for “observed” and “m” for “missing”. Let

\[ R_t = (e_{n_1}, \ldots, e_{n_{n_t}}, e_{k_1}, \ldots, e_{k_{n-n_t}})', \]

where {snj : j = 1, . . . , nt} are the gauged sites with missing values at time t and {skj : j = 1, . . . , n − nt} the gauged sites with observed values at time t, for all t = 1, . . . , T; and ej is a 1 × n vector such that ejj = 1 and ejk = 0 if k ≠ j, for j = 1, . . . , n.

We already know that

\[ (y_t \mid \lambda, \sigma^2, x_t, a) \sim N[F_t' x_t, \sigma^2 \exp(-V/\lambda)], \]

and so Rtyt also has a multivariate normal distribution,

\[ (R_t y_t \mid \lambda, \sigma^2, x_t, a) = ((y_t^m, y_t^o)' \mid \lambda, \sigma^2, x_t, a) \sim N[\tilde{\mu}_t, \tilde{\Sigma}_t], \]

where

\[ \tilde{\mu}_t = R_t F_t' x_t, \qquad \tilde{\Sigma}_t = \sigma^2 R_t \exp(-V/\lambda) R_t'. \]
We can also partition µ̃t as follows:

\[ \tilde{\mu}_t = \begin{pmatrix} \tilde{\mu}_t^m \\ \tilde{\mu}_t^o \end{pmatrix}, \]

where µ̃mt : nt × 1 and µ̃ot : (n − nt) × 1. Similarly, we have

\[ \tilde{\Sigma}_t = \begin{pmatrix} \tilde{\Sigma}_t^{mm} & \tilde{\Sigma}_t^{mo} \\ \tilde{\Sigma}_t^{om} & \tilde{\Sigma}_t^{oo} \end{pmatrix}, \]

where Σ̃mmt : nt × nt, Σ̃mot : nt × (n − nt) and Σ̃oot : (n − nt) × (n − nt).

By a basic property of the multivariate normal distribution, we have

\[ (y_t^m \mid \lambda, \sigma^2, x_t, a, y_t^o) \sim N[\mu_t^{**}, \Sigma_t^{**}], \tag{2.19} \]

where

\[ \mu_t^{**} = \tilde{\mu}_t^m + \tilde{\Sigma}_t^{mo} (\tilde{\Sigma}_t^{oo})^{-1} (y_t^o - \tilde{\mu}_t^o), \tag{2.20} \]

and

\[ \Sigma_t^{**} = \tilde{\Sigma}_t^{mm} - \tilde{\Sigma}_t^{mo} (\tilde{\Sigma}_t^{oo})^{-1} \tilde{\Sigma}_t^{om}, \tag{2.21} \]

for t = 1, . . . , T.

At each iteration, we draw the {ymt} from the corresponding distribution (2.19) at each time point t and then write the response variables as y1:T = (ym1:T, yo1:T) without loss of generality, where ym1:T = (ym1, . . . , ymT) and yo1:T = (yo1, . . . , yoT).

2.6.4 Sampling from p(a1, a2|x1:T , λ, σ2,y1:T )

We now present our method for sampling the phase parameters a = (a1, a2)′ from their full conditional posterior distribution p(a | λ, σ2, x1:T, y1:T), using the samples for λ, σ2 and x1:T obtained in Sections 2.6.2–2.6.3. For simplicity, we use the notation of models (2.11)–(2.15) in this section.

We sample the constant phase parameters conditional on all the other parameters and observations. Suppose a = (a1, a2)′ has a conjugate bivariate normal prior with mean vector µo = (µ1o, µ2o)′ and covariance matrix Σo. Then the posterior conditional distribution for a is normal with mean vector µ∗ and covariance matrix Σ∗, where µ∗ and Σ∗ can be obtained from equations (A.13) and (A.14), respectively. This result is shown in Appendix A.3.

We do not use a non–informative prior for a such as p(a) ∝ 1, because that would be problematic here. The reason is straightforward: we want to avoid cases of non–identified posterior means or posterior variances. To be more specific, assume p(a) ∝ 1.
Using the same inferential approach as above, we find that the posterior conditional distribution for a is normal with mean vector µ = (µ1, µ2)′ and covariance matrix Σ from equations (A.2) and (A.3), respectively. The elements of Σ are also given in Appendix A.3, where Σ can be singular for any t = 12k, where k is an integer. Hence we obtain extreme values at times 12, 24, . . . , 2880, which invalidates the assumption of constant phase parameters across all the time scales when we sample from their full conditional posterior distribution.

For fixed values of λ, σ2 and x1:T, we sample a from N(µ∗, Σ∗) and then take the median as the estimator of a at each iteration, exploiting the assumption that a1 and a2 are constant phase parameters in the models (2.14)–(2.15).

2.6.5 Summary

The MCMC methods we use here are very similar to those of Huerta et al. (2004), except that we use all the samples after the burn–in period, not just the part of the chain corresponding to the accepted samples, as they did in their paper. Using only the accepted samples biases the draws, because it violates the detailed balance equation of the Metropolis–Hastings algorithm.

The algorithm we use for the Cluster 2 AQS database is summarized as follows:

------------------------------------------------------
Algorithm: The Metropolis-within-Gibbs method
------------------------------------------------------
1. Initialization: sample
   λ(1) ∼ IG(αλ, βλ),
   σ2(1) ∼ IG(ασ2, βσ2),
   x(1)1:T ∼ N(m0, σ2(1)C0).
2. Given the (j − 1)th values, λ(j−1), σ2(j−1), ym1:T(j−1), a(j−1)1, a(j−1)2 and the observations yo1:T:
   (1) Sample (λ(j), σ2(j), x(j)1:T) from p(λ, σ2, x1:T | a(j−1)1, a(j−1)2, y(j−1)1:T), where y(j−1)1:T = (ym1:T(j−1), yo1:T).
      (i) • Generate a candidate value λ∗ from a lognormal proposal distribution q(λ(j−1), ·), that is, LN(log λ(j−1), τ2) for some suitable tuning parameter τ2.
          • Compute the acceptance ratio α(λ(j−1), λ∗), where

\[ \alpha(\lambda^{(j-1)}, \lambda^*) = \min\left\{ 1, \frac{p(\lambda^* \mid a_1^{(j-1)}, a_2^{(j-1)}, y_{1:T}^{(j-1)})\, \lambda^*}{p(\lambda^{(j-1)} \mid a_1^{(j-1)}, a_2^{(j-1)}, y_{1:T}^{(j-1)})\, \lambda^{(j-1)}} \right\}. \]

          • With probability α(λ(j−1), λ∗) accept the candidate value and set λ(j) = λ∗; otherwise reject and set λ(j) = λ(j−1).
      (ii) Sample σ2(j) from p(σ2 | λ(j), a(j−1)1, a(j−1)2, y(j−1)1:T).
      (iii) Sample x(j)1:T from p(x1:T | λ(j), σ2(j), a(j−1)1, a(j−1)2, y(j−1)1:T).
   (2) Sample ym1:T(j) from p(ym1:T | λ(j), σ2(j), x(j)1:T, a(j−1)1, a(j−1)2, yo1:T).
   (3) Sample (a(j)1, a(j)2) from p((a1, a2) | λ(j), σ2(j), x(j)1:T, y(j)1:T), where y(j)1:T = (ym1:T(j), yo1:T).
3. Repeat until convergence.
------------------------------------------------------

2.7 Algorithms for Interpolation and Prediction on Ungauged Sites

Our goal in this section is to interpolate the ozone concentrations at ungauged sites using the DLM and the simulated Markov chains of the model parameters (see Section 2.6). In other words, suppose s1, . . . , su are u ungauged sites of interest within the region of Cluster 2 sites (excluding the possibility of extrapolation); the objective is to draw samples from

p(ys1:T | λ, σ2, x1:T, a1, a2, y1:T),

where ys1:T = (ys1, . . . , ysT) : 1 × T and yst denotes the unobserved square–root of ozone concentrations at the ungauged site s and time t, for t = 1, . . . , T and s ∈ {s1, . . . , su}.

Let (αs1t, αs2t) denote the unobserved state parameters at site s and time t. The DLM is given by

\[ y_t^{new} = \mathbf{1}_{n+1}' \beta_t + S_{1t}(a_1)\, \alpha_{1t}^{new} + S_{2t}(a_2)\, \alpha_{2t}^{new} + \nu_t^{new}, \tag{2.22} \]

where ytnew = (yst, yt′)′, αtnew = (αs1t, α1t′, αs2t, α2t′)′, and νtnew ∼ N(0, σ2 exp(−Vnew/λ)).

In Section 2.7.1, we illustrate how to sample the unobserved state parameters {(αs1t, αs2t) : t = 1, . . . , T} from the corresponding conditional posterior distribution. Spatial interpolation at the ungauged site s is demonstrated in Section 2.7.2.
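To make the role of the exponential covariance σ2 exp(−V/λ) in interpolation concrete, the following Python sketch conditions a single ungauged site on the gauged sites using the standard conditional-normal formulas of the same form as (2.20)–(2.21). It is a simplified illustration under loud assumptions, not the thesis's algorithm: the sites lie on a line with hypothetical coordinates, the mean is a constant `mu` rather than the DLM mean built from the state parameters, and the helper names (`exp_cov`, `solve`, `interpolate`) are introduced here only for the example.

```python
import math


def exp_cov(dist, sigma2, lam):
    """Exponential covariance sigma2 * exp(-d / lam) applied entrywise to a distance matrix."""
    return [[sigma2 * math.exp(-d / lam) for d in row] for row in dist]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def interpolate(site, gauged, y_obs, mu, sigma2, lam):
    """Conditional mean and variance at `site` given observations at the gauged sites.

    Assumes a constant mean `mu` and 1-D coordinates (both simplifications).
    """
    dist_oo = [[abs(a - b) for b in gauged] for a in gauged]
    cov_oo = exp_cov(dist_oo, sigma2, lam)
    cov_so = [sigma2 * math.exp(-abs(site - g) / lam) for g in gauged]
    # Weights w = Sigma_oo^{-1} Sigma_os (Sigma_oo is symmetric).
    w = solve(cov_oo, cov_so)
    mean = mu + sum(wi * (yi - mu) for wi, yi in zip(w, y_obs))
    var = sigma2 - sum(wi * ci for wi, ci in zip(w, cov_so))
    return mean, var
```

For instance, `interpolate(5.0, [0.0, 10.0, 20.0], [2.1, 2.4, 1.9], mu=2.0, sigma2=1.2, lam=70.0)` returns a predictive mean and variance for a site between the first two gauged sites. The variance shrinks toward zero as the ungauged site approaches a gauged one, which is the behaviour the "friends" discussion in Chapter 3 relies on.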
2.7.1 Sampling the unobserved state parameters

We first sample αsjt given αsj,t−1, αjt and αj,t−1, j = 1, 2. Applying the state equation (2.15) to αjtnew, we know that the joint density of αsjt and αjt is normal, with covariance matrix σ2τ2j exp(−Vnew/λj), where Vnew denotes the distance matrix for the unobserved station and the monitoring stations. The conditional posterior distribution,

p(αsjt | αsj,t−1, λ, σ2, βt, α1t, α2t, a1, a2, y1:T),

is derived in Appendix A.4.

2.7.2 Spatial interpolation at ungauged sites

We interpolate the square–root of ozone concentration at ungauged sites by conditioning on all the other parameters and observations at gauged sites. As above, yst and yt are jointly normally distributed by the observation equation. The predictive conditional distribution for yst, that is,

p(yst | αs1t, αs2t, λ, σ2, βt, α1t, α2t, a1, a2, y1:T),

is given in Appendix A.4.

Chapter 3
Dynamic Linear Modelling and Its Spatial Interpolation

In Chapter 2, we illustrated the DLM settings for the Cluster 2 AQS database. Furthermore, we gave the explicit MCMC method to estimate the model parameters and the predictive posterior distribution for the responses at “ungauged sites”. Here we randomly select some monitoring stations in the database and treat them as “ungauged” (i.e., unmonitored) sites for the purpose of model assessment. The theme of this chapter is spatial interpolation using the DLM approach on the Cluster 2 AQS database. Section 3.1 revisits the Cluster 2 AIRS database. Section 3.2 shows the results of MCMC. Section 3.3 demonstrates the spatial interpolation results on the ozone study. Section 3.4 discusses the problems underlying the DLM process. Section 3.5 provides the summary and conclusions about this application of the univariate DLM.
We also developed software, written in C and R (see Appendix B), that partly relieves the computational burden of the MCMC algorithm in the DLM approach. This software, GDLM.1.0, was successfully tested at the PIMS summer school (2007). It can be downloaded freely from http://enviro.stat.ubc.ca/dlm, and the written DEMO can be used directly to illustrate the DLM in Chapter 2.

3.1 Cluster 2 AQS Dataset (1995) Revisited

Because of their importance and spatio–temporal features, ozone concentrations are of particular interest for this study. In this section, we revisit the ozone levels in the Cluster 2 AQS database and study them using a variant of the implementation of the DLM approach proposed by Huerta et al. (2004).

Within the range of Cluster 2 sites, six ungauged sites are randomly selected from the available sites, that is, a non–null subset of the sites within the range of, but excluding, the ten gauged ones in Cluster 2. The geographical locations of these six ungauged sites, labelled by the letters A, . . . , F, are shown in Figure 3.1.

Figure 3.1: Geographical locations for the ten gauged sites in Cluster 2 and the six randomly selected ungauged sites. (Number = Cluster 2 sites and letter = ungauged sites.) [Plot title: Cluster 2 AQS Database (1995).]

In Section 2.5, we illustrated the features of ozone concentrations tabled in the AIRS database by means of an exploratory data analysis and constructed the DLM based on these features. In Sections 2.6.1–2.6.4, we demonstrated the use of FFBS and MCMC methods in the model used in our study. Statistical inferences and results are presented in the above subsections, where we illustrate the MCMC sampling scheme to obtain the samples of model parameters.
For the initial values of the state parameters, hyperpriors and fixed model parameters, we use the settings of Huerta et al. (2004), after confirming their suitability by preliminary investigation. We do a Markov chain simulation study and discuss the results in Section 3.2. We then interpolate the square–root of ozone concentrations at ungauged sites in Section 3.3. Issues faced in using the DLM process are discussed in Section 3.4. These issues include monitoring the convergence of two Markov chains, highly autocorrelated chains for λ and σ2, and the time–varying effect of λ–σ2 in the DLM. These problems and possible solutions are summarized in Section 3.5.

3.2 Markov Chain Simulation Study

We do a Markov chain simulation study to draw samples of the DLM's model parameters from their posterior distributions and base our inference on them.

Initial settings

As proposed by Huerta et al. (2004), we use the following initial settings for the starting values, hyperpriors and fixed model parameters in the DLM:

• The hyperprior for λ is IG(1, 5) and that for σ2 is IG(2, 0.01). The expected value of IG(1, 5) is ∞, as are the variances of both p(λ) and p(σ2). These vague priors for λ and σ2 are selected since we do not have any prior knowledge about their distributions.

• Initially, the state parameter x0 is assumed to be normally distributed with mean vector m0 = (2.85, −0.75 1′n, −0.08 1′n)′ and covariance matrix σ21C0, where σ21 ∼ IG(2, 0.01) and C0 is a block diagonal matrix with diagonal entries 1, 0.01 1′n and 0.01 1′n.

• The hyperprior for a is a bivariate normal distribution with mean vector µo = (2.5, 9.8)′ and a diagonal covariance matrix Σo with diagonal entries 0.5 and 0.5.

• The model parameters in the DLM are fixed as follows: τ2y = 0.02, τ21 = 0.0002, τ22 = 0.0004, λ1 = 25, and λ2 = 25.

We found our results to be fairly insensitive to changes in the values for µo, Σo, λ1, and λ2.
However, this is not true for τ2y, τ21, and τ22. Further discussion on settings for these values can be found in Section 4.4 below Equation (4.24).

Monitoring the convergence of the Markov chains

Figure 3.2: Traces of model parameters with the number of iterations of the Markov chains. The model parameters are: (a) λ, the range parameter; (b) σ2, the variance parameter; (c) a1, the phase parameter with respect to the 24–hour periodicity; and (d) a2, the phase parameter with respect to the 12–hour periodicity. [Each panel plots the parameter against Iterations.]

Figure 3.2 shows the trace plots of model parameters λ, σ2, a1 and a2 against the number of iterations of the simulated Markov chains, where the total number of iterations is 4,268. The burn–in period is chosen to be 2,269 and all the remaining Markov samples are collected for posterior inference. The acceptance rate is approximately 62%. We observe that the Markov chain converges after a run of fewer than five hundred iterations.

Table 3.1 shows the median and the 2.5% and 97.5% quantiles from the simulated Markov chains for the model parameters λ, σ2, a1 and a2.

Quantile    λ       σ2     a1     a2
2.5%        69.29   1.19   2.42   9.77
Median      71.83   1.21   2.45   9.80
97.5%       75.37   1.24   2.48   9.84

Table 3.1: Posterior summaries for λ, σ2, a1, and a2.

3.3 Spatial Interpolation

Six ungauged sites are randomly selected from the available subset of sites within the range of Cluster 2 sites. Their geographical locations are shown in Figure 3.1. Our goal in this section is to assess the model's performance by comparing the interpolated values with the observations at these ungauged sites, that is, A, . . . , F.
All missing values at ungauged sites are initially filled in by the spatial regression method. The MCMC method allows us to obtain the posterior samples for the missing values from their posterior distributions. We use the observed data at ungauged sites to assess the performance of the interpolation results by the DLM.

Table 3.2 compares the coverage probabilities at ungauged sites with the corresponding nominal credible levels. These coverage probabilities are calculated by counting the number of observed responses falling into the predictive intervals constructed by the DLM. The table shows different levels of the predictive credibility intervals and the corresponding actual coverage probability at each of the ungauged sites, based on the spatial interpolation using the DLM approach. In general, the coverage probabilities at ungauged sites are larger than the corresponding credible probability, which indicates that the error bands of the interpolation are too wide. Among these six ungauged sites, site D has the highest coverage probability at these nominal levels. This may be because Ungauged Site D is geographically very close to Gauged Site 1 in Cluster 2. This supports our assumption that the spatial correlation is large if the pairs of sites are close together but small if they are far apart. However, these unsatisfactory coverage probabilities point to a deficiency of the DLM, which we address in the following sections.
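The coverage calculation described above can be sketched as follows. This is a minimal Python illustration, assuming that for each time point we hold a list of posterior predictive draws at the ungauged site and that missing observations are coded as `None`; the function names `quantile` and `coverage` and the linear-interpolation quantile rule are choices made for this example, not taken from the thesis software.

```python
import math


def quantile(sorted_vals, p):
    """Quantile of an already sorted list, using linear interpolation between order statistics."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def coverage(pred_draws, observed, level):
    """Fraction of observed values inside the equal-tailed predictive interval.

    pred_draws[t] is a list of predictive draws at time t; observed[t] is the
    held-out observation at time t, or None if it is missing.
    """
    tail = (1.0 - level) / 2.0
    hits, used = 0, 0
    for draws, y in zip(pred_draws, observed):
        if y is None:
            continue  # skip time points with no observation to compare against
        s = sorted(draws)
        lower, upper = quantile(s, tail), quantile(s, 1.0 - tail)
        used += 1
        hits += lower <= y <= upper
    return hits / used
```

Calling `coverage(draws, obs, 0.95)` for each ungauged site and each nominal level would fill one cell of a table such as Table 3.2. A returned fraction well above the nominal level signals predictive intervals that are too wide.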
Nominal       Observed coverage fraction (%)
levels (%)    A      B      C      D      E      F
95.0          94.9   96.9   96.5   99.7   96.1   98.1
90.0          91.9   93.7   93.5   99.4   93.6   96.8
80.0          84.8   88.5   88.2   97.7   89.6   94.3
70.0          78.7   83.5   83.3   94.0   85.8   90.6
60.0          73.0   78.5   77.1   89.7   81.6   86.6
50.0          65.2   71.5   70.4   85.6   76.1   81.4
40.0          55.2   61.4   61.0   79.2   67.9   74.7
30.0          42.2   47.6   47.5   69.6   54.9   64.4

Table 3.2: Comparisons between the nominal levels and actual predictive credibility interval coverage at the ungauged sites A, . . . , F.

Figures 3.3–3.7 show the interpolation results at Ungauged Site D from May 14 to September 11 in 1995, where the solid lines represent the predicted median of the responses, the dashed lines represent the 95% predictive intervals for the predicted square–root of ozone concentrations and the solid dots represent the observations at this “ungauged” site.

Figure 3.3: Interpolation at Ungauged Site 4 from the 1st week to the 4th week. [Four weekly panels; vertical axis: O3; horizontal axis: Hour.]

Figure 3.4: Interpolation at Ungauged Site 4 from the 5th week to the 8th week. [Four weekly panels; vertical axis: O3; horizontal axis: Hour.]

Figure 3.5: Interpolation at Ungauged Site 4 from the 9th week to the 12th week. [Four weekly panels; vertical axis: O3; horizontal axis: Hour.]

Figure 3.6: Interpolation at Ungauged Site 4 from the 13th week to the 16th week. [Four weekly panels; vertical axis: O3; horizontal axis: Hour.]

Figure 3.7: Interpolation at Ungauged Site 4 from the 17th week to the 120th day. [One panel; vertical axis: O3; horizontal axis: Hour.]
Table 3.2 gives the coverage probabilities at each ungauged site using the data for the selected summer. That table shows Ungauged Site D having the largest coverage probability compared with the other ungauged sites. An intuitively plausible explanation is that these ungauged sites are close to some gauged sites in Cluster 2. To explore this possibility, let us consider the “friends” of an ungauged site, that is, any gauged sites in Cluster 2 within 100 kilometers of it.

Table 3.3 shows the Global Circle Distance (GCD) and Pearson's correlations between these pairs of “friends”. As an example, the relationship between Ungauged Site D and its “friend”, Gauged Site 1, is demonstrated in Figure 3.8. From this figure, Ungauged Site D has a strong linear association with Gauged Site 1. This explains why the coverage probability at Ungauged Site D is always higher than at the other sites.

Ungauged Site   Friend(s)   GCD (km)        Pearson's r
A               2           66.6            0.73
B               2           62.5            0.74
C               2           35.5            0.84
D               1           11.0            0.95
E               2           38.0            0.70
F               (7, 8)      (18.6, 44.9)    (0.84, 0.82)

Table 3.3: Summary of pairs of “friends” for ungauged and gauged sites.

Overall, the DLM does not predict the responses at ungauged sites very accurately. That points to problems hidden in this method and process model, to which we turn in the next section.

3.4 Problems in the DLM

There are a number of problems with the current DLM. We summarize several critical issues in this section and give some suggestions about their
resolution in the next section.

Figure 3.8: Scatterplot for the square–root of hourly ozone concentrations at Ungauged Site D and its nearby neighbour, Gauged Site 1. [Horizontal axis: Gauged Site 1; vertical axis: Ungauged Site D.]

Monitoring two Markov chains’ convergence

Figure 3.9 shows the trace plots of model parameters λ, σ2, a1, and a2 for two chains started from the initial settings in Section 3.2. These two chains seem to mix well after several hundred iterations.
This suggests that the Markov chains have converged.

Figure 3.9: Traces of model parameters with number of iterations of the two Markov chains. The model parameters are: (a) λ, the range parameter; (b) σ2, the variance parameter; (c) a1, the phase parameter with respect to the 24–hour periodicity; and (d) a2, the phase parameter with respect to the 12–hour periodicity. [Each panel plots the parameter against Iterations.]

Autocorrelation and partial autocorrelation of the Markov chains

However, the autocorrelation function (ACF) is very important when considering the length of the chain needed to ensure that the estimates have the required accuracy. In other words, a highly autocorrelated chain has to run for a long time to obtain sufficiently accurate estimates. The partial autocorrelation function (PACF) is important in assessing the Markov chain, since a large value of the PACF at lag h indicates that the next value in the chain depends not only on the immediate past but also on the distant past.

Figure 3.10 shows the histogram as well as the ACF and PACF for the Markov chains used in Section 3.3, after a burn–in period of 1,000. Its ACF plots show the λs to be highly autocorrelated. This indicates that the chain for λ is not mixing very well, which leads to the biased estimates in Section 3.3.

A possible way to reduce the autocorrelation between these λs is to thin the Markov chain. That is, we could use every kth (k > 1, k ∈ Z+) λ
However, thinning the Markov chain would incur an unduly large computational cost, so we are forced to use the whole chain for estimation and interpolation.

Figure 3.10: Histogram (left panel), ACF (middle panel) and PACF (right panel) of model parameters of the Markov chains after a burn–in period of 1,000 iterations. The model parameters are: (i) first row: $\lambda$, the range parameter; (ii) second row: $\sigma^2$, the variance parameter; (iii) third row: $a_1$, the phase parameter with respect to the 24–hour periodicity; and (iv) last row: $a_2$, the phase parameter with respect to the 12–hour periodicity.

Relationship between the pairs of $\lambda$, $\sigma^2$, $a_1$ and $a_2$

The DLM assumes that the priors of the model parameters $\lambda$, $\sigma^2$, $a_1$, and $a_2$ are mutually uncorrelated. Figure 3.11 shows the relationship between the pairs of these parameters and, in particular, a weak linear association between $\lambda$ and $\sigma^2$. This indicates that $\lambda$ and $\sigma^2$ are actually dependent, given the observed responses.

Figure 3.11: Scatterplots for pairs of model parameters: (a) $\lambda$ vs. $\sigma^2$; (b) $\lambda$ vs. $a_1$; (c) $\lambda$ vs. $a_2$; (d) $\sigma^2$ vs. $a_1$; (e) $\sigma^2$ vs. $a_2$; and (f) $a_1$ vs. $a_2$.

Time varying effect of $\lambda$–$\sigma^2$: coverage probabilities versus credible probabilities

It is natural to ask whether the $\lambda$s and $\sigma^2$s generated by the MCMC method are constant over all the time points, an assumption in Huerta et al.'s DLM. In other words, we want to answer questions such as: (1) Which data and time points used in the DLM might produce different estimation and interpolation results? (2) Do $\lambda$ and $\sigma^2$ vary over time? To help answer these questions, we design the following three simulation studies:

(i) Study A: Implement the DLM at ungauged sites using weekly data ($W_k$ : $k = 1, \ldots, 17$). Obtain the Markov chains of $\lambda$, $\sigma^2$, $a_1$ and $a_2$. Also obtain the coverage probability at each ungauged site and each week for fixed nominal levels.

(ii) Study B: Implement the DLM at ungauged sites using all the data from week 1 to week 17 ($W_{1:17} = \{W_1, \ldots, W_{17}\}$). Estimate the model parameters and interpolate responses at ungauged sites. Furthermore, obtain the coverage probabilities at each ungauged site and each week for fixed nominal levels, using each week's data.

(iii) Study C: Fix $\lambda^*_k$ at week $k$ ($k = 1, \ldots, 17$) using the Markov chains obtained in Study A. Then use these $\lambda^* = \{\lambda^*_1, \ldots, \lambda^*_{17}\}$ in the DLM. In other words, we go through all the steps in the algorithm of Section 2.6.5 except that we use the fixed $\lambda^*$ instead of generating it by a Metropolis–Hastings step. Note that only Gibbs sampling and the MCMC blocking scheme are needed for this study. We then compute the corresponding coverage probabilities using $W_{1:17}$ at each ungauged site and each week for fixed nominal levels.
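The coverage probabilities used in Studies A to C can be sketched as follows. This is a minimal illustration rather than the code used in the thesis: it assumes equal–tailed credible intervals computed from posterior predictive draws, and the function name `coverage_probability` and the simulated inputs are hypothetical.

```python
import numpy as np

def coverage_probability(samples, observed, level):
    """Fraction of observations falling inside equal-tailed credible intervals.

    samples  : array (n_draws, n_times) of posterior predictive draws
    observed : array (n_times,) of held-out observations at an ungauged site
    level    : nominal credible level, e.g. 0.95
    """
    alpha = 1.0 - level
    lower = np.quantile(samples, alpha / 2.0, axis=0)
    upper = np.quantile(samples, 1.0 - alpha / 2.0, axis=0)
    inside = (observed >= lower) & (observed <= upper)
    return inside.mean()

# Hypothetical well-calibrated case: draws and observations share one distribution,
# so the empirical coverage should be close to the nominal level.
rng = np.random.default_rng(1)
samples = rng.normal(size=(4000, 500))
observed = rng.normal(size=500)
cov95 = coverage_probability(samples, observed, 0.95)
cov50 = coverage_probability(samples, observed, 0.50)
```

A coverage probability well above the nominal level indicates intervals that are too wide, and one well below it indicates intervals that are too narrow.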
The objective of Studies A and B is to demonstrate the effect of data and time propagation on the interpolation results. Study C aims to tell us whether there is a significant difference between the interpolation results obtained using the fixed $\lambda^*$ and those obtained using the Markov samples of $\lambda$. Table 3.4 shows the fixed $\lambda^*$s used in Study C.

Week          1      2      3      4      5      6      7      8      9
$\lambda^*$   54.2   178.5  83.7   405.4  86.6   59.7   199.3  144.1  322.7

Week          10     11     12     13     14     15     16     17
$\lambda^*$   142.2  172.7  187.9  315.8  419.0  99.8   260.3  284.8

Table 3.4: Fixed $\lambda^*$ in Study C.

Figure 3.12 illustrates the MCMC estimation results for Study A. It plots the Markov chains for $\lambda$ and $\sigma^2$ using weekly data. Clearly, $\lambda$ and $\sigma^2$ vary from week to week, implying that the constant $\lambda$–$\sigma^2$ model is not tenable over a whole summer for this database.

Figure 3.12: Scatterplot of $\lambda$ against $\sigma^2$ given one week of data only (one point cloud per week, weeks 1 to 17), constructed from MCMC samples starting from the same initial values.

Figures 3.13–3.19 show the coverage probabilities of the interpolation results from these three studies. The solid line with dots represents the results in Study A, the dotted line with solid diamonds Study B, and the dashed line with stars Study C. The interpolation results for Studies B and C are very similar in these figures. In other words, using the fixed $\lambda^*$s in Table 3.4 gives interpolation results similar to those obtained by treating $\lambda$ as a model parameter in the standard DLM setting, pointing to a drawback of the current DLM. In fact, in some cases Study C even produces better interpolation results than Study B. The results from Study A show that, as time increases, the interpolation results become, anomalously, more uncertain, with the coverage probabilities getting larger and larger.
We can interpret this as the model trying to accommodate a constant $\lambda$ and $\sigma^2$ over all the time points while they actually vary with time. Comparing these studies, the DLM sometimes gives better interpolation results when using only one week's worth of data. In any case, the current model assumption of constant $\lambda$ and $\sigma^2$ is not valid in practice. Further development of the DLM is required to incorporate time–varying model parameters.

Figure 3.13: Coverage probabilities versus 95% nominal level, plotted against weeks, for ungauged sites: (a) site A; (b) site B; (c) site C; (d) site D; (e) site E; and (f) site F. These coverage probabilities are computed according to Study A: weekly data (dots with solid line); Study B: $W_{1:17}$ (squares with dotted line); and Study C: $W_{1:17}$ but with fixed $\lambda^*$ (stars with dashed line).

3.5 Summary and Conclusion

We have implemented the DLM on Cluster 2 sites (AQS, 1995). Furthermore, we have applied a variant of Huerta et al.'s (2004) DLM and MCMC method to this database for a whole summer (whereas they considered a single week). We find deficiencies in their MCMC method, which actually uses a biased estimate of $\lambda$. In practice, their model assumption of a constant $\lambda$–$\sigma^2$ seems inappropriate.
Moreover, preliminary studies show that the results are sensitive to the choice of values for $\tau_y^2$, $\tau_1^2$, and $\tau_2^2$, indicating that it is inappropriate to set these parameters to be "constant" in their model. Finally, the computational cost is of concern, that cost deriving from the FFBS method used in this chapter. The software for implementing the DLM, GDLM.1.0, is summarized in Appendix B.

Figure 3.14: Coverage probability versus 90% nominal level, plotted against weeks, for ungauged sites: (a) site A; (b) site B; (c) site C; (d) site D; (e) site E; and (f) site F. These coverage probabilities are computed according to Study A: weekly data (dots with solid line); Study B: $W_{1:17}$ (squares with dotted line); and Study C: $W_{1:17}$ but with fixed $\lambda^*$ (stars with dashed line).

One way to tackle the interesting problem of setting $\tau_y^2$, $\tau_1^2$, and $\tau_2^2$ is by setting appropriate discount factors in the discount DLM. However, we do not recommend using the composite Metropolis–Hastings algorithm to obtain samples for $\gamma = (\lambda, \tau_y^2, \tau_1^2, \lambda_1, \tau_2^2, \lambda_2)$ from their joint posterior distribution.
The reason is that the computational cost is huge and it is very difficult for the Markov chains to converge within a reasonable number of iterations. We find the relationship between the discount factors and the model parameters $\tau_y^2$, $\tau_1^2$, and $\tau_2^2$ by a first–order polynomial dynamic model in Section 4.4. To deal with the time–varying $\lambda$ and $\sigma^2$ in the current version of the DLM, we might be able to use the discount DLM where the discount factor varies over time. However, this would further amplify the computational burden and so is not recommended here. Instead of the DLM, we now propose the Le–Zidek style modelling approach (also called the BSP approach) in the next two chapters to deal with spatial interpolation and temporal prediction in spatio–temporal fields (Le & Zidek, 1992–2006).

Figure 3.15: Coverage probability versus 80% nominal level, plotted against weeks, for ungauged sites: (a) site A; (b) site B; (c) site C; (d) site D; (e) site E; and (f) site F. These coverage probabilities are computed according to Study A: weekly data (dots with solid line); Study B: $W_{1:17}$ (squares with dotted line); and Study C: $W_{1:17}$ but with fixed $\lambda^*$ (stars with dashed line).
Figure 3.16: Coverage probability versus 70% nominal level, plotted against weeks, for ungauged sites: (a) site A; (b) site B; (c) site C; (d) site D; (e) site E; and (f) site F. These coverage probabilities are computed according to Study A: weekly data (dots with solid line); Study B: $W_{1:17}$ (squares with dotted line); and Study C: $W_{1:17}$ but with fixed $\lambda^*$ (stars with dashed line).

Figure 3.17: Coverage probability versus 60% nominal level, plotted against weeks, for ungauged sites: (a) site A; (b) site B; (c) site C; (d) site D; (e) site E; and (f) site F. These coverage probabilities are computed according to Study A: weekly data (dots with solid line); Study B: $W_{1:17}$ (squares with dotted line); and Study C: $W_{1:17}$ but with fixed $\lambda^*$ (stars with dashed line).

Figure 3.18: Coverage probability versus 50% nominal level, plotted against weeks, for ungauged sites: (a) site A; (b) site B; (c) site C; (d) site D; (e) site E; and (f) site F. These coverage probabilities are computed according to Study A: weekly data (dots with solid line); Study B: $W_{1:17}$ (squares with dotted line); and Study C: $W_{1:17}$ but with fixed $\lambda^*$ (stars with dashed line).

Figure 3.19: Coverage probability versus 40% nominal level, plotted against weeks, for ungauged sites: (a) site A; (b) site B; (c) site C; (d) site D; (e) site E; and (f) site F. These coverage probabilities are computed according to Study A: weekly data (dots with solid line); Study B: $W_{1:17}$ (squares with dotted line); and Study C: $W_{1:17}$ but with fixed $\lambda^*$ (stars with dashed line).

Chapter 4
Multivariate Bayesian Spatial Prediction and Its Spatial Interpolation

Many approaches other than the univariate DLM (Chapter 2) have been developed to model space–time fields. Although the DLM provides a very flexible approach, it can have poor predictive performance. Moreover, this approach is computationally expensive and becomes impractical for large geographical subregions, for instance, for the 274 sites in the US EPA AQS database (1995). To overcome these difficulties, an alternative Bayesian hierarchical modelling approach, multivariate Bayesian spatial prediction (BSP), is presented and its performance compared with that of the DLM. This BSP approach appears in a series of papers, originating with Le and Zidek (1992).

Section 4.1 presents that approach, its motivation, and some recent applications. Section 4.2 describes the Chicago area's hourly ozone (O3) AQS database (2000) used in this study. Section 4.3 introduces related theoretical results for the multivariate BSP approach (Le & Zidek, 2006). The multivariate BSP approach is then applied to spatial interpolation in the Chicago area's hourly ozone AQS database in Section 4.4.
In Section 4.4, two other approaches, the DLM and NAIVE, are implemented for the same purpose to give a comparative assessment of the performance of the various predictors. Section 4.6 summarizes our conclusions about them.

4.1 Introduction

Alternatives to the DLM have been investigated in recent years for finding the characteristics of the mean surface of spatio–temporal processes or interpolating them at unmonitored (ungauged) sites within specified geographical subregions. Like the DLM, they can deal with nonstationary processes as described in Chapter 2. In particular, the multivariate Bayesian spatial prediction (BSP) approach presented in this chapter handles nonstationary processes using the deformation approach (Sampson & Guttorp, 1992). The DLM treats the processes as spatially correlated time series, that is, parallel time series for a spatial process. Other approaches such as the linear model of coregionalization (LMC) method also treat data as spatially correlated time series (Gelfand et al., 2005). In contrast, the multivariate BSP treats the data as a collection of temporally correlated spatial processes (Kyriakidis & Journel, 1999). Related work on the multivariate BSP can be found in Le and Zidek (1992), Brown et al. (1994a, 1994b), Sun et al. (1998), Li et al. (1999), Le et al. (1997, 2001), Zidek et al. (2002), and Le and Zidek (2006).

This chapter addresses spatial interpolation in space–time fields of hourly ozone concentrations. The multivariate BSP is implemented in the Chicago area throughout the whole summer of 2000. Within this geographical region, hourly ozone concentrations are measured at 24 monitoring sites through that time span. For comparison, the DLM method (see Chapter 2) is also applied to this database.

The large geographical scale of the interpolation problem across the USA can be handled fairly well by the multivariate BSP.
Unlike what was done in Chapter 2 for the DLM, the BSP approach used in this chapter has a multivariate rather than univariate framework, although the response variable is the square–root of hourly ozone concentrations, the square–root being taken to validate the normality assumption stated in Chapter 2. We have two reasons for choosing the multivariate setting: (1) to increase precision in spatial interpolation and temporal prediction; and (2) to provide a way to reduce the spatial correlation leakage problem arising in the univariate case. In fact, more accurate results can be obtained even for marginal inference based on the joint distribution model for all of the responses rather than using only a marginal distribution model, due to the greater uncertainty involved in the latter. In other words, the multivariate approach allows interpolators and predictors to "borrow strength" not only from close–by monitoring stations but also from chemical correlates (Le & Zidek, 2006, p.161).

At the same time, a potential spatial correlation leakage problem, which occurs for whitened residuals (see Section 4.3) that have a nonnegligible lag–1 spatial correlation between two sites, can be side–stepped. That leakage problem seems to have been first observed by Zidek et al. (2002) while interpolating hourly PM10 concentrations in the Vancouver area. Though not precisely defined, the leakage happens if the cross–covariance of space and time is not negligible in the whitened residuals, due for example to a failure to correctly model autocorrelation at fine temporal scales (Zidek et al., 2002). In fact, it provides a criterion for selecting the appropriate response vector dimension in our approach. Section 4.4 discusses appropriate initial settings for the multivariate BSP to interpolate hourly ozone concentrations.
The advantages of using the multivariate approach have been described by Le and Zidek (2006), and will be reviewed in Section 4.6.

The temporal prediction problem does not at first glance seem to be embraced by the current multivariate BSP approach. The challenge arises because the multivariate response variable used in this case has to be a 24–dimensional response vector, in order to estimate the hyperscale covariance matrix among these 24 "pollutants", in reality hours, with a separability assumption on the covariance structure. Using the whole database is not appropriate because it invalidates the independence assumption needed on the response vectors. However, using a part of the database leads to a choice for the covariates, although further work on the temporal predictor has to be done. Resolutions of some of the above issues and theoretical results for temporal predictive distributions are presented in Section 5.1.

To assess the multivariate BSP model's performance, two alternative approaches, the DLM and NAIVE (NAIVE∗), are proposed for spatial interpolation (temporal prediction) of the Chicago area's hourly ozone concentration field. The performances of the BSP, DLM and NAIVE (NAIVE∗) are then compared in Section 4.4 (Sections 5.2–5.3).

Computational efficiency is one major advantage of the multivariate BSP approach. The software, EnviRo.stat, can be freely downloaded from http://enviro.stat.ubc.ca/. Sections 4.6 and 5.5 summarize our conclusions about the implementation of this approach. The next section describes characteristics of the Chicago area's hourly ozone field.

4.2 AQS Ozone Database (2000) for the Chicago Area

The database used in this chapter originally comes from the AQS ozone database (2000) by EPA.
The hourly ground–level ozone concentrations (in ppb) for the whole summer in the Chicago area are extracted from that database. The extracted database contains 24 monitoring stations at irregularly spaced geographical locations in this area, hourly ozone concentrations being measured at each of them. The joint spatial and temporal dependence of the hourly ozone levels is then modelled as a spatio–temporal process over the Chicago area.

To facilitate the assessment of the model's performance for interpolation and prediction, 14 of the 24 monitoring stations are selected as "gauged" sites, the remaining 10 being taken to be ungauged sites. Figure 4.1 shows the geographical locations of these 14 gauged and 10 ungauged sites. Each has a few missing values, but the gauged sites have many fewer zero measurements during the overall time span than most of the ungauged sites, thus providing much more information for this spatio–temporal field (see Figure 4.2).

Figure 4.1: Geographical locations for the Chicago AQS database (2000), where the latitude and longitude are measured in degrees. (◦ = G = gauged sites and × = UG = ungauged sites.)

Figure 4.3 shows side–by–side boxplots of the square–root of hourly ozone concentrations at each of the 24 monitoring stations across all the time points. It shows that Gauged Site 3 behaves differently because of its deviation from the median for all sites and times. Figure 4.1 shows that gauged site to be near Lake Michigan. However, it is unclear whether the difference between the observed responses at Gauged Site 3 and the rest is due to the influence of the lake, since other sites, for example Gauged Sites 1 and 10, are also close to it. One might expect that any model not taking account of this difference could lead to a poor model fit.
This chapter will examine this issue later when comparing interpolation (see Section 4.4) and prediction (see Chapter 5) of the ozone concentration field using three different approaches: the multivariate BSP, DLM and NAIVE (NAIVE∗).

Figure 4.2: Boxplots for the rates (%) of: (a) missing measurements; and (b) zero measurements, at 24 monitoring stations in the Chicago AQS database. (G = gauged sites and UG = ungauged sites.)

To explore this database further, weekday and hourly effects are examined in Figures 4.4 and 4.5, respectively, using a simple regression method. The latter are approximately constant over all gauged sites; in particular, the variability of the hourly effects from midnight to 10 A.M. is slightly larger than that of the remaining hours after 10 A.M., indicating relatively strong constant hourly effects from 10 A.M. to 11 P.M. The weekday effects in Figure 4.4 are also approximately constant across all gauged sites. The above exploratory data analysis (EDA) suggests modelling constant weekday and hourly effects across all gauged sites. Constant weekday and hourly effects point to constant effects for appropriate covariates in the multivariate BSP approach.

Figure 4.3: Boxplots for the square–root of hourly ozone concentrations ($\sqrt{\text{ppb}}$) at 24 monitoring stations in the Chicago AQS database. (G1 = Gauged Site 1; UG1 = Ungauged Site 1; and so on.)

Next, the corresponding model settings and methodology for the multivariate BSP are discussed in the context of the Chicago area's hourly ozone concentration field.
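The "simple regression method" behind the weekday and hourly effects is not spelled out in this section. One possible sketch, assuming an additive least–squares fit with indicator variables for hour of day and day of week at a single site, is given below; the function name `hourly_weekday_effects` and the simulated series are hypothetical, and missing values are assumed to have been removed beforehand.

```python
import numpy as np

def hourly_weekday_effects(y, hour, weekday):
    """Least-squares hourly and weekday effects, each centred to sum to zero.

    y       : responses (e.g. square-root ozone) at one site
    hour    : hour of day for each response, coded 0..23
    weekday : day of week for each response, coded 0..6
    """
    y = np.asarray(y, dtype=float)
    hour = np.asarray(hour)
    weekday = np.asarray(weekday)
    n = y.size
    # Indicator columns; hour 0 and weekday 0 serve as baselines.
    H = (hour[:, None] == np.arange(1, 24)).astype(float)
    W = (weekday[:, None] == np.arange(1, 7)).astype(float)
    X = np.hstack([np.ones((n, 1)), H, W])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    hour_eff = np.concatenate([[0.0], coef[1:24]])
    week_eff = np.concatenate([[0.0], coef[24:30]])
    return hour_eff - hour_eff.mean(), week_eff - week_eff.mean()

# Hypothetical series: 8 weeks of hourly data with a known diurnal cycle
# and no weekday effect (not the Chicago data).
t = np.arange(24 * 7 * 8)
hour, weekday = t % 24, (t // 24) % 7
true_hour = np.sin(2 * np.pi * np.arange(24) / 24)
rng = np.random.default_rng(2)
y = 5.0 + true_hour[hour] + 0.1 * rng.normal(size=t.size)
hour_eff, week_eff = hourly_weekday_effects(y, hour, weekday)
```

Repeating such a fit site by site and overlaying the estimated effects gives plots in the spirit of Figures 4.4 and 4.5, from which constancy across sites can be judged.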
4.3 Methodology

The multivariate BSP approach puts no restriction, at level one of its underlying hierarchical model structure, on the covariance structure in the spatio–temporal field. Instead, the covariance function is modelled by the generalized inverted Wishart (GIW) distribution (Le & Zidek, 1992; 2006). That approach takes account of uncertainties about the K–step staircase pattern data by allowing different degrees of freedom for different "steps" in the data array, and allows a nonstationary covariance structure in the field. In this approach, spatio–temporal multivariate responses are treated as a collection of temporally correlated spatial fields at a finite number of time points, instead of a collection of spatially correlated time series at a finite number of monitoring stations as in the DLM approach (Kyriakidis & Journel, 1999).

Figure 4.4: The weekday effect of the square–root of hourly ozone concentrations ($\sqrt{\text{ppb}}$) at the 14 gauged sites (G1 to G14) in the Chicago AQS database.

The multivariate BSP models spatio–temporal responses at two levels: at the first, the spatio–temporal random function in this field is supposed to follow a Gaussian process with the mean function depending on the covariates as well as the corresponding coefficient matrix $\beta$, and covariance function $\Sigma$; at the second, the coefficient matrix $\beta$ is modelled as a random matrix, following a Gaussian process with covariance function $\Sigma$, which in turn has a GIW distribution. The residuals after the first level of modelling are called detrended residuals, and those after the second level are called deAR'd residuals (Zidek et al., 2002).

Figure 4.5: The hourly effect of the square–root of hourly ozone concentrations ($\sqrt{\text{ppb}}$) at the 14 gauged sites (G1 to G14) in the Chicago AQS database.

The multivariate BSP approach uses an autoregressive temporal structure to incorporate short–term autocorrelations and a nonstationary spatial covariance structure to deal with the nonstationary spatio–temporal processes. In particular, 2–hour–block response vectors are selected in Chicago's hourly ozone field (Section 4.2) to reduce spatial correlation leakage between the sites and to allow prediction at a given hour to borrow strength from its neighbour.

Le and Zidek (2006) and Le et al. (1997) present theoretical results on the multivariate BSP model, but only the results related to the work done in this section are summarized here. Specifically, the data patterns of the multivariate BSP, that is, the systematic missing data patterns, are introduced. The fully hierarchical Bayesian framework of the multivariate BSP model is also set out, together with the conditional posterior distribution of the unobserved response variables at ungauged sites and the procedure for estimating the hyperparameters. Finally, this section presents the posterior predictive credibility ellipsoids of the unobserved responses at ungauged sites to assess the BSP model's performance (Sun et al., 1998; Le & Zidek, 2006, p.181–183). The package, EnviRo.stat, is used in this section (http://enviro.stat.ubc.ca). The results can be reproduced using that software since it is freely available.
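The layout of the 2–hour–block response vectors is not detailed in this passage. As one possible reading, the sketch below assumes that the time index is the day and that the two components at each site are two adjacent hours of that day, ordered site by site; it does not reflect how EnviRo.stat organizes its input, and the function name `daily_hour_block` is hypothetical.

```python
import numpy as np

def daily_hour_block(hourly, hours):
    """Build one multivariate response row per day from selected hours.

    hourly : array (n_hours, n_sites); n_hours must be a multiple of 24
    hours  : hours of the day forming the block, e.g. (10, 11)
    returns: array (n_days, n_sites * len(hours)), ordered site by site
    """
    hourly = np.asarray(hourly)
    n_hours, n_sites = hourly.shape
    if n_hours % 24 != 0:
        raise ValueError("n_hours must be a multiple of 24")
    by_day = hourly.reshape(n_hours // 24, 24, n_sites)  # (day, hour, site)
    block = by_day[:, list(hours), :]                    # (day, p, site)
    return block.transpose(0, 2, 1).reshape(block.shape[0], -1)

# Toy input: 2 days of hourly values at 3 sites, entry [h, s] = 3h + s.
hourly = np.arange(48 * 3).reshape(48, 3)
Y = daily_hour_block(hourly, hours=(10, 11))
```

Each row of `Y` then holds, for one day, the pair of hourly values at the first site, followed by the pair at the second site, and so on.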
Model specification

The multivariate BSP approach addresses two types of missing data patterns: monotone missing (i.e., staircase pattern) and systematically missing (Le et al., 1997); only the former is relevant here. The monotone missing pattern occurs when the multivariate responses are measured at different sites that start at different times. In this case, the sites can be rearranged in such a way that the data array exhibits a monotone increasing pattern over time (say a K–step staircase pattern, for K = 1, 2, . . .). Within each block of the monotone missing or staircase pattern, the multivariate random vectors of responses are measured from the same starting time. Le and Zidek (2006) provide a theory that handles such data. In our application, K = 1 for Chicago's hourly ozone field because the response variables are measured from the same starting time across all gauged sites.

[Note: at each gauged site, the small number of missing measurements is imputed by the spatial regression method before implementing the multivariate BSP to interpolate the hourly ozone in the field. Unlike the DLM, we can obtain posterior samples for the missing values by treating them as additional parameters. Imputing the missing values by MCMC samples might take more computational time, however.]

In this spatio–temporal field, let $n$ denote the total number of time points, $p$ the total number of pollutants or species measured at each station, $g$ the total number of gauged sites and $u$ the total number of ungauged sites. Denote the response variable at time $t$ and gauged site $g_j$ by

$$\mathbf{Y}_t^{[g_j]} = (Y_{t,1}^{[g_j]}, \ldots, Y_{t,p}^{[g_j]}) : 1 \times p,$$

and the response variable at time $t$ and ungauged site $u_i$ by

$$\mathbf{Y}_t^{[u_i]} = (Y_{t,1}^{[u_i]}, \ldots, Y_{t,p}^{[u_i]}) : 1 \times p,$$

for $t = 1, \ldots, n$, $j = 1, \ldots, g$, and $i = 1, \ldots, u$.
At time $t$, the responses at gauged sites are coordinates of the random response vector $\mathbf{Y}^{[g]}_t = (\mathbf{Y}^{[g_1]}_t, \dots, \mathbf{Y}^{[g_g]}_t) : 1 \times gp$, and at ungauged sites, $\mathbf{Y}^{[u]}_t = (\mathbf{Y}^{[u_1]}_t, \dots, \mathbf{Y}^{[u_u]}_t) : 1 \times up$. The combined random response vector at time $t$ can be written as $\mathbf{Y}_t = (\mathbf{Y}^{[u]}_t, \mathbf{Y}^{[g]}_t) : 1 \times (u+g)p$. Consequently, the matrix–variate response $\mathbf{Y}$ is given by $(\mathbf{Y}_1', \dots, \mathbf{Y}_n')' : n \times (u+g)p$. Notice that $\mathbf{Y}$ can also be written as $(\mathbf{Y}^{[u]}, \mathbf{Y}^{[g]})$, where $\mathbf{Y}^{[u]} = (\mathbf{Y}^{[u]\prime}_1, \dots, \mathbf{Y}^{[u]\prime}_n)'$ contains the unobserved response variables at ungauged sites, and $\mathbf{Y}^{[g]} = (\mathbf{Y}^{[g]\prime}_1, \dots, \mathbf{Y}^{[g]\prime}_n)'$ the observed response variables at gauged sites.

Let $\mathbf{Z} : n \times h$ be the covariate matrix, where $h$ is the total number of covariates. Assume that the covariates are the same across all the sites at any fixed time point. Suppose $\beta : h \times (u+g)p$ is the coefficient matrix of $\mathbf{Z}$. Assume common covariate effects across all the sites in the multivariate BSP. The multivariate BSP model is given by
\[ \mathbf{Y} \mid \beta, \Sigma \sim N(\mathbf{Z}\beta, \mathbf{I}_n \otimes \Sigma) \tag{4.1} \]
\[ \beta \mid \Sigma, \beta_0 \sim N(\beta_0, \mathbf{F}^{-1} \otimes \Sigma) \tag{4.2} \]
\[ \Sigma \sim GIW(\Theta, \delta), \tag{4.3} \]
where $\otimes$ denotes the Kronecker product (footnote 2) and the covariance matrix $\Sigma : (u+g)p \times (u+g)p$ is positive definite and follows the generalized inverted Wishart (GIW) distribution (Le & Zidek, 2006, p.158; Brown et al., 1992; Le et al., 1997; Sun et al., 1997; Le & Zidek, 1992). The GIW distribution is defined recursively for a covariance matrix $\Sigma$ having the $K$–block structure (Brown et al., 1994; Le & Zidek, 2006, p.300). The GIW can be reparameterized by the Bartlett transformation (Le & Zidek, 2006, p.302). In particular, for the case of $K = 1$, the GIW–distributed $\Sigma$ in (4.3) is equivalent to $(\Gamma^{[u]}, \tau^{[u]}, \Gamma_1)$, such that
\[ \tau^{[u]} \mid \Gamma^{[u]} \sim N(\tau^{[u]}_0, \mathbf{H}^{[u]} \otimes \Gamma^{[u]}) \tag{4.4} \]
\[ \Gamma^{[u]} \sim IW_{up}(\delta^{[u]}, \Lambda^{[u]} \otimes \Omega) \tag{4.5} \]
\[ \Gamma_1 \sim IW_{gp}(\delta_1, \Lambda_1 \otimes \Omega), \tag{4.6} \]
where IW represents the inverted Wishart distribution.
Equations (4.4)–(4.6) imply that $\tau^{[u]}_0 = \Phi^{-1}_{gg}\Phi_{gu}$, $\mathbf{H}^{[u]} = \Phi^{-1}_{gg}$, $\delta^{[u]} = \delta$, $\Lambda^{[u]} \otimes \Omega = \Phi_{u|g}$, $\Gamma_1 = \Sigma_{gg}$, $\Lambda_1 \otimes \Omega = \Phi_{gg}$, and $\delta_1 = \delta - up$ by the properties and definitions of the IW and GIW (Le & Zidek, 2006, p.299–301). Let $\mathcal{H} = \{\Theta, \delta, \mathbf{F}, \beta_0\}$, $\Theta = \{\tau^{[u]}_0, \mathbf{H}^{[u]}, \Lambda^{[u]}, \Omega, \Lambda_1\}$, and $\delta = \{\delta^{[u]}, \delta_1\}$. Hence, $\delta^{[u]} = \delta_1 + up$.

Note that sine or cosine functions (used earlier in the DLM to represent the periodicities in spatio–temporal data) can be included in $\mathbf{Z}$, common over sites. With the BSP we can in fact incorporate a more general and more flexible structure than sine or cosine terms alone, while keeping the number of parameters manageable. Site–specific covariates can be handled in two ways: (1) by dealing with them in the pre–filtering stage; or (2) by treating them as a random process and then conditioning on that response.

Given the multivariate BSP model in (4.1)–(4.3), the predictive posterior distribution of the unobserved responses at ungauged sites and the hyperparameter estimates at the gauged and ungauged sites are given briefly in the next subsection.

Footnote 2: $\otimes$ represents the Kronecker product between two matrices, such that
\[ A_{p \times q} \otimes B_{m \times n} = \begin{pmatrix} a_{11}B & \cdots & a_{1q}B \\ \vdots & & \vdots \\ a_{p1}B & \cdots & a_{pq}B \end{pmatrix}_{pm \times qn}. \]
The matrix with blocks $b_{ij}A$ is $B \otimes A$, which has the same dimensions and equals $A \otimes B$ up to a permutation of rows and columns.

Predictive distributions and hyperparameters estimation

This subsection briefly describes the method used to estimate the hyperparameters in the multivariate BSP approach and the predictive posterior distribution of the multivariate response variables at ungauged sites given those estimated hyperparameters. The hyperparameters in the BSP comprise those at gauged sites, $\mathcal{H}_g = (\Omega, \Lambda_1, \delta_1, \mathbf{F}, \beta_0)$, and those at ungauged sites, $\mathcal{H}_u = (\Lambda^{[u]}, \mathbf{H}^{[u]}, \tau^{[u]}_0, \delta^{[u]})$. Firstly, the EM algorithm is used to estimate the hyperparameters at the gauged sites, $\mathcal{H}_g$, in the multivariate BSP.
Given those estimators, the Sampson–Guttorp method is used to estimate the covariance function of the ungauged sites and the cross–covariance function between the gauged and ungauged sites. Consequently, the estimators of $\Gamma^{[u]}$ and $\tau^{[u]}_0$ can be obtained from the above estimators. Finally, the predictive posterior distribution of the response variables at the ungauged sites is obtained conditional on those estimators of $\mathcal{H} = (\mathcal{H}_g, \mathcal{H}_u)$.

Note that the hyperparameters are estimated only once in the BSP approach, unlike in the DLM, where estimates of the model parameters have to be obtained at each iteration of the MCMC runs. This one–time estimation greatly reduces computational time.

Suppose $\mathbf{Y} = (\mathbf{Y}^{[u]}, \mathbf{Y}^{[g]}) : (n \times up, n \times gp)$, $\beta = (\beta^{[u]}, \beta^{[g]}) : (h \times up, h \times gp)$, and $\beta_0 = (\beta^{[u]}_0, \beta^{[g]}_0) : (h \times up, h \times gp)$. Given $\mathbf{Y}^{[g]}$, the predictive distribution of $\mathbf{Y}^{[u]}$ is
\[ \mathbf{Y}^{[u]} \mid \mathbf{Y}^{[g]}, \mathcal{H} \sim t_{n \times up}(\mu^{[u|g]}, \Phi^{[u|g]} \otimes \Psi^{[u|g]}, \delta - up + 1), \tag{4.7} \]
where
\[ \mu^{[u|g]} = \mathbf{Z}\beta^{[u]}_0 + (\mathbf{Y}^{[g]} - \mathbf{Z}\beta^{[g]}_0)\tau^{[u]}_0 \tag{4.8} \]
\[ \Phi^{[u|g]} = \mathbf{I}_n + \mathbf{Z}\mathbf{F}^{-1}\mathbf{Z}^T + (\mathbf{Y}^{[g]} - \mathbf{Z}\beta^{[g]}_0)\mathbf{H}^{[u]}(\mathbf{Y}^{[g]} - \mathbf{Z}\beta^{[g]}_0)^T \tag{4.9} \]
\[ \Psi^{[u|g]} = \frac{1}{\delta - up + 1}(\Lambda^{[u]} \otimes \Omega) \tag{4.10} \]
(Le & Zidek, 2006, p.160–161).

At the gauged sites, the multivariate BSP can deal with the case of missing pollutants in each block. In other words, the observed responses in matrix–variate form may include missing columns for some pollutants at the gauged sites. Let $l$ be the total number of unobserved responses at gauged sites, so that $l \in \{0, \dots, gp\}$. The matrix of observable responses at gauged sites, $\mathbf{Y}^{[g]}$, can be partitioned into $\mathbf{Y}^{(1)} : n \times l$ and $\mathbf{Y}^{(2)} : n \times (gp - l)$, the missing and observed responses at gauged sites, respectively. Let $\mathbf{r}_j = (r_{j,1}, \dots, r_{j,gp})' : gp \times 1$ be a vector such that $r_{j,j} = 1$ and $r_{j,k} = 0$ for $k \neq j$, with $k, j = 1, \dots, gp$. Suppose $\mathbf{R}_1 = (\mathbf{r}_{i_1}, \dots, \mathbf{r}_{i_l})$ and $\mathbf{R}_2 = (\mathbf{r}_{i_{l+1}}, \dots, \mathbf{r}_{i_{gp}})$; thus $\mathbf{R} = (\mathbf{R}_1, \mathbf{R}_2) : gp \times gp$ forms an orthogonal matrix.
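Hence, $\mathbf{Y}^{(1)} = \mathbf{Y}^{[g]}\mathbf{R}_1$ and $\mathbf{Y}^{(2)} = \mathbf{Y}^{[g]}\mathbf{R}_2$. Let
\[ \mathbf{R}^T\Sigma_{gg}\mathbf{R} = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} = \begin{pmatrix} \mathbf{R}_1^T\Sigma_{gg}\mathbf{R}_1 & \mathbf{R}_1^T\Sigma_{gg}\mathbf{R}_2 \\ \mathbf{R}_2^T\Sigma_{gg}\mathbf{R}_1 & \mathbf{R}_2^T\Sigma_{gg}\mathbf{R}_2 \end{pmatrix} \]
and
\[ \Psi_{gg} = \mathbf{R}^T\Phi_{gg}\mathbf{R} = \begin{pmatrix} \Psi_{11} & \Psi_{12} \\ \Psi_{21} & \Psi_{22} \end{pmatrix} = \begin{pmatrix} \mathbf{R}_1^T\Phi_{gg}\mathbf{R}_1 & \mathbf{R}_1^T\Phi_{gg}\mathbf{R}_2 \\ \mathbf{R}_2^T\Phi_{gg}\mathbf{R}_1 & \mathbf{R}_2^T\Phi_{gg}\mathbf{R}_2 \end{pmatrix}, \]
with $\Sigma_{11}, \Psi_{11} : l \times l$ and $\Sigma_{22}, \Psi_{22} : (gp - l) \times (gp - l)$, respectively. Similarly, $\beta^{[g]}_0\mathbf{R}$ can be partitioned as $\beta^{[g]}_0\mathbf{R} = (\beta^{[g]}_0\mathbf{R}_1, \beta^{[g]}_0\mathbf{R}_2) = (\beta^{[g]}_{(1)}, \beta^{[g]}_{(2)})$.

By the properties of the multivariate t–distribution, the predictive posterior distribution of $\mathbf{Y}^{(1)}$ is given by
\[ \mathbf{Y}^{(1)} \mid \mathbf{Y}^{(2)}, \mathcal{H} \sim t_{n \times l}\Big(\mathbf{Z}\beta^{[g]}_{(1)} + (\mathbf{Y}^{(2)} - \mathbf{Z}\beta^{[g]}_{(2)})\Psi^{-1}_{22}\Psi_{21},\ \frac{1}{\delta - up - l + 1}\mathbf{P}_{1|2} \otimes \Psi_{1|2},\ \delta - up - l + 1\Big), \tag{4.11} \]
where
\[ \mathbf{P}_{1|2} = \mathbf{I}_n + \mathbf{Z}\mathbf{F}^{-1}\mathbf{Z}^T + (\mathbf{Y}^{(2)} - \mathbf{Z}\beta^{[g]}_{(2)})\Psi^{-1}_{22}(\mathbf{Y}^{(2)} - \mathbf{Z}\beta^{[g]}_{(2)})^T \tag{4.12} \]
and
\[ \Psi_{1|2} = \Psi_{11} - \Psi_{12}\Psi^{-1}_{22}\Psi_{21}. \tag{4.13} \]
Furthermore, the predictive posterior distribution of $\mathbf{Y}^{[u]}$ is given as follows:
\[ \mathbf{Y}^{[u]} \mid \mathbf{Y}^{(2)}, \mathcal{H} \sim t_{n \times up}\Big(\mathbf{Z}\beta^{[u]}_0 + (\mathbf{Y}^{(2)} - \mathbf{Z}\beta^{[g]}_{(2)})\Psi^{-1}_{22}\mathbf{R}_2^T\Phi_{gu},\ \frac{\mathbf{P}_{u|2} \otimes \Phi_{u|g}}{\delta - up - l + 1},\ \delta - up - l + 1\Big), \tag{4.14} \]
where
\[ \mathbf{P}_{u|2} = \mathbf{I}_n + \mathbf{Z}\mathbf{F}^{-1}\mathbf{Z}^T + (\mathbf{Y}^{(2)} - \mathbf{Z}\beta^{[g]}_{(2)})\Psi^{-1}_{22}(\mathbf{Y}^{(2)} - \mathbf{Z}\beta^{[g]}_{(2)})^T \tag{4.15} \]
(Le et al., 1997).

The next subsection explores the predictive performance of the multivariate BSP approach.

Predictive performance

To assess the multivariate BSP model's performance, pointwise predictive intervals and predictive posterior credibility ellipsoids are constructed from the predictive posterior distributions in Section 4.3 (Le et al., 1997; Le & Zidek, 2006). Le and Zidek (2006) point out that the pointwise predictive distribution of the last (or $p$th) pollutant at each of the ungauged sites and any fixed time point $t$ can be obtained from (4.14) by letting $up = 1$. Moreover, the pointwise predictive variance is given by
\[ \mathrm{Var}(Y^{[u]}_t \mid \mathbf{Y}^{(2)} = \mathbf{y}^{(2)}, \mathcal{H}) = (\delta^* - 2)^{-1}P_{t|2}\Phi_{u|2}, \tag{4.16} \]
where $\delta^* = \delta - up - l + 1$. Hence, the pointwise predictive intervals of the last pollutant at ungauged site $u_i$ and the fixed time $t$ are given by
\[ E(Y^{[u]}_t \mid \mathbf{Y}^{(2)} = \mathbf{y}^{(2)}, \mathcal{H}) \pm t_{\delta^*}(0.025)\,\big(\mathrm{Var}(Y^{[u]}_t \mid \mathbf{Y}^{(2)} = \mathbf{y}^{(2)}, \mathcal{H})\big)^{1/2}. \]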
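As a minimal sketch of the pointwise interval above, the following Python snippet forms the predictive mean plus or minus a quantile times the predictive standard deviation, on the square–root scale used in this chapter. The function name and the choice to let the caller supply the Student–t quantile for $\delta^*$ degrees of freedom are illustrative assumptions and are not part of EnviRo.stat; when no quantile is supplied, a normal quantile is substituted, which is only a reasonable approximation when $\delta^*$ is large.

```python
import math
from statistics import NormalDist


def pointwise_interval(mean, variance, t_quantile=None, level=0.95):
    """Return (lower, upper) = mean -/+ q * sqrt(variance).

    mean, variance : predictive mean and variance at one site and time,
                     on the square-root scale.
    t_quantile     : upper-tail Student-t quantile for delta* degrees of
                     freedom, supplied by the caller (hypothetical argument).
    """
    if variance < 0:
        raise ValueError("variance must be non-negative")
    if t_quantile is None:
        # Assumption: normal approximation to the t quantile (large delta*).
        t_quantile = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = t_quantile * math.sqrt(variance)
    return mean - half_width, mean + half_width


# Illustrative numbers only, not values from the Chicago data.
lower, upper = pointwise_interval(5.0, 1.0, t_quantile=2.0)
approx_lower, approx_upper = pointwise_interval(5.0, 1.0)
```

The interval is symmetric about the predictive mean, so any miscalibration shows up only through the variance and the degrees of freedom.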
However, one might expect that the pointwise predictive intervals would not have good calibration properties because of the accumulated uncertainty arising from simultaneously interpolating the response variables in the spatio–temporal field. Le et al. (1997) develop ellipsoidal credible regions for simultaneous interpolation at ungauged sites, which improve the calibration. Those credible regions are given by the following theorem, which we include here for completeness.

Theorem 4.3.1 The $(1 - \alpha)$–level ($0 < \alpha < 1$) simultaneous posterior credibility region is given by
\[ \Big\{\mathbf{Y}^{[u]}_t : (\mathbf{Y}^{[u]}_t - \hat{\mathbf{y}}^{[u]}_t)\Phi^{-1}_{u|g}(\mathbf{Y}^{[u]}_t - \hat{\mathbf{y}}^{[u]}_t)^T < \frac{up}{\delta - up - l + 1}P_{t|2}F_{up,\,\delta - up - l + 1}(1 - \alpha)\Big\}, \]
where $\hat{\mathbf{y}}^{[u]}_t = \mathbf{Z}_t\beta^{[u]}_0 + (\mathbf{Y}^{[g]}_t - \mathbf{Z}_t\beta^{[g]}_0)\tau^{[u]}_0$, $P_{t|2} = 1 + \mathbf{Z}_t\mathbf{F}^{-1}\mathbf{Z}^T_t + (\mathbf{Y}^{[g]}_t - \mathbf{Z}_t\beta^{[g]}_0)\mathbf{H}^{[u]}(\mathbf{Y}^{[g]}_t - \mathbf{Z}_t\beta^{[g]}_0)^T$, and $\Phi^{-1}_{u|g} = (\Lambda^{[u]} \otimes \Omega)^{-1}$.

Theorem 4.3.1 is true because
\[ \mathbf{Y}^{[u]}_t \mid \mathbf{Y}^{(2)}_t, \mathcal{H} \sim t_{1 \times up}\Big(\hat{\mathbf{y}}^{[u]}_t,\ \frac{1}{\delta - up - l + 1}P_{t|2}\Phi_{u|g},\ \delta - up - l + 1\Big), \tag{4.17} \]
where
\[ \hat{\mathbf{y}}^{[u]}_t = \mathbf{Z}_t\beta^{[u]}_0 + (\mathbf{Y}^{(2)}_t - \mathbf{Z}_t\beta^{[g]}_{(2)})\Psi^{-1}_{22}\mathbf{R}_2^T\Phi_{gu}, \tag{4.18} \]
and
\[ P_{t|2} = 1 + \mathbf{Z}_t\mathbf{F}^{-1}\mathbf{Z}^T_t + (\mathbf{Y}^{(2)}_t - \mathbf{Z}_t\beta^{[g]}_{(2)})\Psi^{-1}_{22}(\mathbf{Y}^{(2)}_t - \mathbf{Z}_t\beta^{[g]}_{(2)})^T. \tag{4.19} \]
Further work by Sun et al. (1998) shows these credibility ellipsoids to be well calibrated. In the next section, these credible regions are also constructed and compared with the pointwise predictive intervals when interpolating in Chicago's hourly ozone field.

4.4 Spatial Interpolation

This section concerns the interpolation of ozone concentrations at the ten ungauged sites using the multivariate BSP approach for Chicago's hourly ozone field. Two other approaches, the DLM and NAIVE, are also applied to the same interpolation problem to assess the model's performance. The interpolation results using these three approaches are then compared in this section.
The multivariate BSP approach

For the multivariate BSP model given by (4.1)–(4.3), the field of hourly ground–level ozone concentrations is modelled as a trend plus detrended spatio–temporal noise (i.e., detrended residuals) at the first level of a hierarchical model; at the second level, the temporal dependence in the detrended spatio–temporal noise is modelled through an autoregressive process, leaving the deAR'd residuals. These deAR'd residuals are then interpolated at the ungauged sites in the spatio–temporal field. Finally, the interpolated deAR'd residuals are added back to the trend model to obtain the interpolated values at the ungauged sites. Because the data were square–root transformed to meet the normality assumption of this model, the interpolated responses are then squared to return to the original scale.

The multivariate BSP model has been used to interpolate Vancouver's hourly PM10 field (Li et al., 1999; Zidek et al., 2002). A multivariate data model is created to reduce the loss of spatial correlation in the deAR'd residuals compared with their detrended counterparts, and to allow the predictions to be made on any response by borrowing strength from the close–by responses through their spatial correlations (Le et al., 1999; Zidek et al., 2002; Le & Zidek, 2006, p.277). This loss of spatial correlation has been called the spatial correlation leakage problem (see also Section 4.5) by Li et al. (1999), Zidek et al. (2002), and Le and Zidek (2006). The spatial leakage problem occurs if nonnegligible lag–1, lag–2, etc., spatial correlations exist in the deAR'd residuals. If those problematic deAR'd residuals are used to interpolate at ungauged sites, one may anticipate that the interpolator at any given ungauged site cannot borrow strength from its gauged neighbours because the spatial correlation between sites is weak (Zidek et al., 2002; Le & Zidek, 2006).
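The leakage diagnostic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it fits a separate least–squares AR(2) to each site's detrended series, whereas the multivariate strategy of this chapter is designed precisely to avoid such site–by–site AR modelling. The function names are hypothetical. Comparing `correlation(x, y)` for the detrended series with the same quantity for their AR(2) residuals shows the loss of lag–0 spatial correlation, and `correlation(ex, ey, lag=1)` on the residuals checks for lagged cross–correlation.

```python
import math


def fit_ar2(x):
    """Least-squares (a, b) for x_t = a*x_{t-1} + b*x_{t-2} + e_t."""
    s11 = s12 = s22 = s1y = s2y = 0.0
    for t in range(2, len(x)):
        x1, x2, y = x[t - 1], x[t - 2], x[t]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        s1y += x1 * y
        s2y += x2 * y
    det = s11 * s22 - s12 * s12
    if det == 0.0:
        raise ValueError("lagged regressors are collinear")
    # Solve the 2x2 normal equations by Cramer's rule.
    a = (s1y * s22 - s2y * s12) / det
    b = (s2y * s11 - s1y * s12) / det
    return a, b


def ar2_residuals(x, a, b):
    """deAR'd residuals e_t = x_t - a*x_{t-1} - b*x_{t-2}, for t >= 2."""
    return [x[t] - a * x[t - 1] - b * x[t - 2] for t in range(2, len(x))]


def correlation(x, y, lag=0):
    """Sample correlation between x_t and y_{t-lag}; x, y of equal length."""
    xs, ys = x[lag:], y[:len(y) - lag]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    sxx = sum((u - mx) ** 2 for u in xs)
    syy = sum((v - my) ** 2 for v in ys)
    return sxy / math.sqrt(sxx * syy)


# Synthetic, noise-free AR(2) series used only to check the fit.
series = [1.0, 2.0]
for _ in range(18):
    series.append(0.5 * series[-1] - 0.3 * series[-2])
a_hat, b_hat = fit_ar2(series)
residuals = ar2_residuals(series, a_hat, b_hat)
```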
Figure 4.6 plots the spatial correlations between sites of the detrended and deAR'd residuals, respectively. The substantial loss of spatial correlation in the deAR'd residuals suggests the use of the multivariate strategy adopted here. In particular, that strategy enables one to bypass the need for difficult fine–scale autocorrelation modelling that will inevitably be inexact and hence induce the lagged spatial cross–correlations that can cause that leakage.

[Figure 4.6: The estimated spatial correlations of (a) detrended residuals and (b) deAR'd residuals between gauged sites. Horizontal axes: intra–distance; vertical axes: spatial correlation.]

But what kind of multivariate data setting should one select? Le et al. (1999) and Zidek et al. (2002) argue for the use of daily response vectors with hourly response coordinates instead of the hourly responses themselves. One of their arguments, that the spatial leakage problem is negligible for the daily responses but not for the hourly ones, is supported by theoretical results (Zidek et al., 2002). A related argument is that the daily deAR'd residuals are approximately independent of each other given the AR structure of the detrended residuals, approximately satisfying the independence assumptions for the responses in (4.1)–(4.3). These arguments also apply to Chicago's ground–level ozone field, as demonstrated by Figure 4.7, which depicts the partial autocorrelation functions (PACFs) of the detrended residuals and indicates an hourly AR(2) process. That implies possible choices of dimension, 2, ..., or even 6, for the daily response vectors, to achieve adequate temporal separation between them to ensure independence while at the same time allowing pairwise temporal correlations between hours to be estimated. The choice of the appropriate dimensionality of the response vectors is based on how severe the resulting spatial correlation leakage is. In other words, we use a multivariate AR process for the detrended residuals, avoiding a different AR process for the responses at each monitoring location.

For simplicity, let $(i\!:\!j)$–hour represent the sub–data matrix for the hourly responses from hour $i$ to hour $j$ across the gauged sites, for $i, j \in \{1, \dots, p\}$. Suppose one were interested in interpolating, say, hour 11's ozone level at any fixed ungauged site given all the observed responses at the gauged sites. As discussed above, the possible response vectors could be the (10:11)–hours, ..., or the (6:11)–hours. For each choice of these response vectors, the spatial correlations between all the gauged sites are estimated using the multivariate BSP approach. Figure 4.8 shows these estimated spatial correlations for the (10:11)–hours, ..., (6:11)–hours response vectors and those of the detrended residuals. Notice how the spatial correlation declines as the dimension of the response vector increases (leakage). The boxplots show that the (10:11)–hours response vectors have the smallest loss of spatial correlation between the gauged sites, and so are the right choice for a multivariate data model. This is true for other hours as well, strongly supporting the use of the 2–hour block as the response vector. Hence, 2–hour–block data are extracted from the Chicago AQS ozone database (2000) to serve as the multivariate responses in a multivariate BSP model designed to interpolate the hourly ozone concentrations in the field.

[Figure 4.7: The PACF plots for the square–root of hourly ozone concentrations ($\sqrt{\text{ppb}}$) at the 14 gauged sites (panels G1–G14) in the Chicago area AQS database. Horizontal axes: lag; vertical axes: PACF.]

Prior to implementing the multivariate BSP approach, a small number of missing measurements are filled in by the spatial regression method. For the model in (4.1)–(4.3), $p = 2$, $n = 123$, $u = 10$, and $g = 14$. The multivariate BSP approach is repeated 24 times by successively cycling the first hour in the two–hour block through the day to predict the hourly ozone levels at the 10 ungauged sites, together with the corresponding 95% pointwise predictive intervals. For example, suppose the hour of "interest" was hour 11. The response variable at any fixed day, $t$, and gauged site, $j$, could be written as $\mathbf{Y}^{[g_j]}_t = (Y^{[g_j]}_{t,10}, Y^{[g_j]}_{t,11}) : 1 \times p$.

[Figure 4.8: Boxplots for the spatial correlations of the detrended residuals (Detrended), and the estimated spatial correlations using the square–root of hourly ozone concentrations during the hours of: 9 A.M. to 10 A.M. (10:11), 8 A.M. to 10 A.M. (9:11), 7 A.M. to 10 A.M. (8:11), 6 A.M. to 10 A.M. (7:11), and 5 A.M. to 10 A.M. (6:11), respectively. Vertical axis: spatial correlations.]

Given the BSP model in (4.1)–(4.3) and the priors of the parameters, the predictive posterior distribution of $\mathbf{Y}^{[u]}$ is given by (4.7). Notice that $l = 0$ because the response vector contains no missing values. Two covariates, month with four levels and weekday with seven levels, are considered in this approach on the basis of the exploratory data analysis described above for this field, which gives $h = 11$. Let $\mathbf{e}_1 = (0, 1)'$ and $\mathbf{E}_1 = (\mathbf{e}_1', \dots, \mathbf{e}_1')' : up \times 1$. By a basic property of the multivariate t–distribution and the predictive posterior distribution of $\mathbf{Y}^{[u]}_t$ in (4.17), the posterior distribution of the pollutant of interest at ungauged sites is given by
\[ \mathbf{Y}^{[u]}_t\mathbf{E}_1 \mid \mathbf{Y}^{(2)}_t, \mathcal{H} \sim t_{1 \times u}\Big(\hat{\mathbf{y}}^{[u]}_t\mathbf{E}_1,\ \frac{1}{\delta - up - l + 1}P_{t|2}\mathbf{E}_1'\Phi_{u|g}\mathbf{E}_1,\ \delta - up - l + 1\Big). \tag{4.20} \]
Let $\mathbf{e}_{0,j}$ be the $u$–dimensional vector whose $j$th entry is 1 and whose other entries are 0, for $j = 1, \dots, u$. Then the unobserved $p$th pollutant at the ungauged site $u_j$, $\mathbf{Y}^{[u]}_t\mathbf{E}_1\mathbf{e}_{0,j}$, has the following conditional posterior distribution:
\[ \mathbf{Y}^{[u]}_t\mathbf{E}_1\mathbf{e}_{0,j} \mid \mathbf{Y}^{(2)}_t, \mathcal{H} \sim t_{\delta - up - l + 1}\Big(\hat{\mathbf{y}}^{[u]}_t\mathbf{E}_1\mathbf{e}_{0,j},\ \frac{1}{\delta - up - l + 1}P_{t|2}\,\mathbf{e}_{0,j}'\mathbf{E}_1'\Phi_{u|g}\mathbf{E}_1\mathbf{e}_{0,j}\Big). \tag{4.21} \]
Hence, the predictive posterior mean and variance of the unobserved $p$th pollutant are given by
\[ E(\mathbf{Y}^{[u]}_t\mathbf{E}_1\mathbf{e}_{0,j} \mid \mathbf{Y}^{(2)}_t, \mathcal{H}) = \hat{\mathbf{y}}^{[u]}_t\mathbf{E}_1\mathbf{e}_{0,j} \]
and
\[ \mathrm{Var}(\mathbf{Y}^{[u]}_t\mathbf{E}_1\mathbf{e}_{0,j} \mid \mathbf{Y}^{(2)}_t, \mathcal{H}) = \frac{1}{\delta - up - l + 1}P_{t|2}\,\mathbf{e}_{0,j}'\mathbf{E}_1'\Phi_{u|g}\mathbf{E}_1\mathbf{e}_{0,j}, \]
respectively. Consequently, the pointwise 95% predictive intervals of the unobserved $p$th pollutant at the ungauged site $u_j$ and fixed day $t$ are given by
\[ E(\mathbf{Y}^{[u]}_t\mathbf{E}_1\mathbf{e}_{0,j} \mid \mathbf{Y}^{(2)}_t, \mathcal{H}) \pm t_{\delta - up - l + 1}(0.025)\,\big(\mathrm{Var}(\mathbf{Y}^{[u]}_t\mathbf{E}_1\mathbf{e}_{0,j} \mid \mathbf{Y}^{(2)}_t, \mathcal{H})\big)^{1/2}. \tag{4.22} \]
The above procedure is then repeated 24 times to interpolate the hourly ozone levels in this field. The software used for this multivariate BSP approach is EnvioStat.1.0, found at http://enviro.stat.ubc.ca.
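To make the two–hour–block layout concrete, the following Python sketch assembles the $n \times gp$ gauged–site response matrix for $p = 2$ from hourly data, ordering the columns site by site as in the definition of $\mathbf{Y}^{[g]}_t$. The nested–list input layout and the function name are hypothetical and are not taken from the EnviRo.stat software. The section does not say which hour is paired with the first hour of the day when the block is cycled, so the sketch only accepts hours 2 to 24.

```python
def two_hour_block(hourly, hour):
    """Build the n x (g*p) gauged-site response matrix for p = 2.

    hourly : hourly[t][j][h - 1] is the (transformed) response on day t at
             gauged site j in hour h, h = 1..24 (hypothetical layout).
    hour   : hour of interest, restricted here to 2..24.
    """
    if not 2 <= hour <= 24:
        raise ValueError("hour must be in 2..24")
    rows = []
    for day in hourly:
        row = []
        for site in day:
            # (previous hour, hour of interest) for this site and day
            row.extend([site[hour - 2], site[hour - 1]])
        rows.append(row)
    return rows


# Synthetic values 1000*day + 100*site + hour, so each entry is traceable.
hourly = [[[1000 * t + 100 * j + h for h in range(1, 25)]
           for j in range(2)] for t in range(3)]
block = two_hour_block(hourly, 11)
```

With hour 11 as the hour of interest, each row holds the hour–10 and hour–11 values of the first site followed by those of the second site, matching $(Y^{[g_j]}_{t,10}, Y^{[g_j]}_{t,11})$.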
We next compare our approach with two others, the DLM and NAIVE.

The DLM approach

The dynamic linear model (DLM) is one alternative approach to spatial interpolation in the Chicago area's hourly ozone field. The DLM and its implementation have been extensively explored in Chapter 2. In Chicago's AQS ozone database, the total number of time points (i.e., hours), $T$, is $123 \times 24 = 2952$. To compare its interpolation results with those of the multivariate BSP, the same gauged and ungauged sites are selected as in Section 4.4. Hence the total number of gauged sites, $n$, is 14. The initial settings for the DLM are given next, following an investigation of the discount factor for first–order polynomial dynamic models, to optimize this approach.

Our investigation indicates that the appropriate prior for the phase parameters $\mathbf{a} = (a_1, a_2)'$ is $N(\mu_0, \Sigma_0)$, where $\mu_0 = (1.5, 4.5)'$ and
\[ \Sigma_0 = \begin{pmatrix} 0.0625 & 0 \\ 0 & 0.5625 \end{pmatrix}. \]
The best initial specification for the state parameters turns out to be $\mathbf{m}_0 = (5.15, -0.75\,\mathbf{1}_n', 0.05\,\mathbf{1}_n')'$ for location, while $\mathbf{C}_0$ is a block diagonal matrix with diagonal entries 0.1304, $0.0158\,\mathbf{I}_n$, and $0.0003\,\mathbf{I}_n$.

The first–order polynomial dynamic model, for $t \geq 1$, is given by
\[ y_t = \beta_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2) \tag{4.23} \]
\[ \beta_t = \beta_{t-1} + \omega_t, \qquad \omega_t \sim N(0, \sigma^2\sigma^2_\beta), \tag{4.24} \]
with the initial information $\beta_0 \sim N(0, \sigma^2_0)$. The adaptive coefficient $A_t$, which measures the rate of adaptation to new data, converges to a constant $A$ as $t \to \infty$, where $A$ is a function of $r = \sigma^2_\beta$, the ratio of the state variance to the observational variance ($r \in [0, 1]$) (Harvey, 1984; West & Harrison, 1997). The ratio $r$ is also called the signal–to–noise ratio; large values of $r$ imply a clear signal in the system.
The adaptive coefficient $A$ reflects how much weight is put on the new data in updating the predictive mean for this simplest DLM: a larger value of $A$ puts more weight on the new data, and a smaller value puts less. The discount factor, $\delta$, is defined as $1 - A$. Hence, the larger the discount factor is, the less weight is attached to the information provided by the new data in updating the predictive mean at the current time; the smaller $\delta$ is, the more weight is put on the information in the new data. If $\delta = 1$ (so that $A = 0$ and $r = 0$), the dynamic model puts no weight on the new data, resulting in a very poor model fit; in other words, there is no signal in the system but only the noise from the measurements. If $\delta$ is close to 0 (so that $A$ is close to 1), almost all the weight is on the new data, and the signal cannot be distinguished from the noise. West and Harrison (1997) recommend setting the value of $\delta$ between 0.8 and 1. The quantities $A$, $r$, and $\delta$ are related by the equation $r = A^2(1 - A)^{-1} = (1 - \delta)^2\delta^{-1}$ (West & Harrison, 1997).

  δ    0.80    0.85    0.90    0.95    0.99
  r    0.05    0.0265  0.0111  0.0026  0.0001

Table 4.1: The relationship between the discount factor ($\delta$) and the signal–to–noise ratio ($r$).

Table 4.1 shows some values of $\delta$ and the corresponding $r$, that is, $\sigma^2_\beta$ in (4.23)–(4.24). This table provides one way to set the initial values of $\sigma^2_\beta$. For the constant DLM in Chapter 2, given the common variance $\sigma^2$, the canonical trend has a Gaussian distribution with variance $\sigma^2\tau^2_y$, the periodic trend with period 24 a Gaussian distribution with variance $\sigma^2\tau^2_1$, and the periodic trend with period 12 a Gaussian distribution with variance $\sigma^2\tau^2_2$.
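The relation $r = (1 - \delta)^2/\delta$ and the convergence of the adaptive coefficient can be checked numerically. The Python sketch below recomputes the entries of Table 4.1 and runs the standard variance recursion of the first–order polynomial model in (4.23)–(4.24) with all variances scaled by $\sigma^2$ (observation variance 1, state variance $r$). The function names and the starting value `c0` are illustrative assumptions.

```python
import math


def signal_to_noise(delta):
    """r = (1 - delta)^2 / delta, the relation quoted in the text."""
    return (1.0 - delta) ** 2 / delta


def limiting_adaptive_coefficient(r):
    """Positive root A of A^2 + r*A - r = 0, i.e. r = A^2 / (1 - A)."""
    return (math.sqrt(r * r + 4.0 * r) - r) / 2.0


def adaptive_coefficients(r, c0, steps):
    """A_1..A_steps for observation variance 1 and state variance r.

    c0 is the scaled prior variance of beta_0 (an assumed starting value).
    """
    c, out = c0, []
    for _ in range(steps):
        prior_var = c + r                    # R_t = C_{t-1} + r
        a_t = prior_var / (prior_var + 1.0)  # A_t = R_t / (R_t + 1)
        c = a_t                              # C_t = A_t * 1
        out.append(a_t)
    return out


table = {d: round(signal_to_noise(d), 4)
         for d in (0.80, 0.85, 0.90, 0.95, 0.99)}
r = signal_to_noise(0.90)
a_limit = limiting_adaptive_coefficient(r)
a_path = adaptive_coefficients(r, c0=1.0, steps=200)
```

For $\delta = 0.90$ the limiting coefficient is $A = 1 - \delta = 0.1$, and the recursion settles at that value regardless of the starting variance, which is why a single discount factor is enough to fix the long–run weight given to new data.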
The variability of the canonical trend is assumed to be larger than that of the periodic trends, while the variability of the periodic trend with period 24 is assumed to be larger than that of the periodic trend with period 12. In other words, one assumes that the canonical trend puts more weight on the information provided by the new data than the periodic trends do, and that the period–24 trend puts more weight on it than the period–12 trend. For example, if the discount factors of the canonical, period–24, and period–12 trends were set to 0.85, 0.90, and 0.95, respectively, the corresponding values of $\tau^2_y$, $\tau^2_1$, and $\tau^2_2$ would be 0.0265, 0.0111, and 0.0026, respectively. Those values, divided by 17, are then selected as the fixed values of the model parameters $\tau^2_y$, $\tau^2_1$, and $\tau^2_2$. [The reason for dividing them by 17, to take account of the 17 weeks' time effect, was discussed in Section 3.4.] Both $\lambda_1$ and $\lambda_2$ are set at 25. The hyperpriors for $\sigma^2$ and $\lambda$ are IG(2, 2) and IG(1, 5), respectively.

Those initial settings of the DLM in Section 3.2 improve the DLM's fit in interpolating the responses at ungauged sites. However, the model then underestimates the variability of the system, leading to a trade–off between the precision and the variability of the spatial interpolator. The interpolation results for Chicago's hourly ozone field are shown in Section 4.4. The software, GDLM.1.0, can be freely downloaded from http://enviro.stat.ubc.ca/dlm, to allow our results to be reproduced or for application in other contexts.

NAIVE approach

One might well suspect that the responses at the nearby gauged sites affect interpolation at ungauged sites. It may seem plausible that for a rough spatio–temporal field, the spatial correlations between the sites are so small that the responses at different sites can be viewed as approximately independent of each other.
This leads to the NAIVE approach, in which the "interpolated" values at any ungauged site are the observed responses at the gauged site closest to it.

  Ungauged Site   Gauged Site   GCD (km)
  1               1             18.19
  2               1             5.74
  3               1             6.09
  4               4             8.70
  5               7             12.99
  6               7             19.46
  7               7             13.50
  8               1             13.97
  9               11            15.86
  10              10            9.33

Table 4.2: The great circle distance (GCD) between the ungauged sites and their closest gauged site(s) in Chicago's hourly O3 field.

Table 4.2 lists the GCD (in km) between the ungauged sites and their closest gauged sites in Chicago's hourly ozone field. For example, Gauged Site 11 is the closest one to Ungauged Site 9 in terms of the GCD in Table 4.2, so in this approach the interpolated values at Ungauged Site 9 are the observed responses at Gauged Site 11. The NAIVE approach ignores the local characteristics of the field: it takes no account of the effects of the covariates, or of other factors such as wind and wind direction among the stations. If the NAIVE approach proved comparable to or better than the multivariate BSP and DLM, one would prefer it owing to its great simplicity. Section 4.4 compares the NAIVE approach with the multivariate BSP and DLM.

Comparisons and results

This subsection implements the multivariate BSP, DLM, and NAIVE approaches to interpolate the hourly ozone concentrations at ungauged sites in the Chicago area. The interpolation results are compared with each other to assess the performance of the three approaches. Figures 4.9–4.16 plot the interpolation results of the three approaches at one of the ungauged sites, Ungauged Site 7, from the first week to the sixteenth week.
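The NAIVE rule and the distances in Table 4.2 can be sketched as follows. The thesis does not state how the great circle distance was computed; the haversine formula and the mean Earth radius used here are assumptions, as are the function names and the coordinates in the example, which are not the actual site locations.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def nearest_gauged(ungauged, gauged):
    """Index of the gauged site closest to `ungauged`, and its distance."""
    dists = [great_circle_km(ungauged[0], ungauged[1], g[0], g[1])
             for g in gauged]
    j = min(range(len(gauged)), key=dists.__getitem__)
    return j, dists[j]


def naive_interpolate(ungauged, gauged, observations):
    """NAIVE rule: copy the observed series of the closest gauged site."""
    j, _ = nearest_gauged(ungauged, gauged)
    return observations[j]


# Hypothetical coordinates and series, for illustration only.
gauged = [(41.0, -87.0), (42.0, -88.0)]
observations = [[5.1, 5.3, 4.9], [6.0, 6.2, 6.1]]
index, distance = nearest_gauged((41.1, -87.1), gauged)
interpolated = naive_interpolate((41.1, -87.1), gauged, observations)
```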
In these figures, the dots represent the observed values at this ungauged site, the solid and dotdashed lines represent the predictive mean and 95% pointwise predictive intervals for the multivariate BSP approach, the dashed and dotted lines represent the predictive mean and 95% empirical predictive intervals for the DLM approach, and the plus signs represent the values interpolated by the NAIVE approach, i.e., the observed values at Gauged Site 7 in this case (refer to Table 4.2). The empirical predictive intervals of the DLM are wiggly and increase monotonically with time, a feature observed in Section 3.4. On the other hand, the 95% pointwise predictive intervals of the multivariate BSP do not show such wiggly behaviour and capture both the long– and short–term trends at Ungauged Site 7. The interpolation of the NAIVE approach is not as good as those of the DLM and multivariate BSP at most time points, but it does occasionally do well when the interpolation results of the other two approaches deviate from the observed responses.

[Figure 4.9: Interpolation at Ungauged Site 7 from the 1st week to the 2nd week. The square–root of hourly ozone concentrations is plotted on the vertical axes and hours on the horizontal axes. Solid (dotdashed) lines = interpolation and 95% pointwise predictive intervals by the BSP; dashed (dotted) lines = interpolation and 95% predictive intervals by the DLM; + = interpolation by NAIVE; and ◦ = observations at Ungauged Site 7.]

The closer the ungauged sites are to the gauged ones, the better the expected interpolation performance; that is the assumption upon which these three approaches rest. Figures 4.17–4.18 plot the interpolation results for
the three approaches at Ungauged Site 10 during the 1st and 10th week, respectively.

[Figure 4.10: Interpolation at Ungauged Site 7 from the 3rd week to the 4th week. The square–root of hourly ozone concentrations is plotted on the vertical axes and hours on the horizontal axes. Solid (dotdashed) lines = interpolation and 95% pointwise predictive intervals by the BSP; dashed (dotted) lines = interpolation and 95% predictive intervals by the DLM; + = interpolation by NAIVE; and ◦ = observations at Ungauged Site 7.]

[Figure 4.11: Interpolation at Ungauged Site 7 from the 5th week to the 6th week. The square–root of hourly ozone concentrations is plotted on the vertical axes and hours on the horizontal axes. Solid (dotdashed) lines = interpolation and 95% pointwise predictive intervals by the BSP; dashed (dotted) lines = interpolation and 95% predictive intervals by the DLM; + = interpolation by NAIVE; and ◦ = observations at Ungauged Site 7.]

The overestimated predictive variances of the DLM have a monotone increasing trend in the 10th week, compared with those in the 1st week. The NAIVE approach performs quite differently from one time to the next. For example, Figure 4.17 shows that the NAIVE approach produces good predictive values that are quite close to the observed responses at Ungauged Site 10 at most times in the 1st week. However, Figure 4.18 shows that the predicted values of the NAIVE approach tend to underestimate the observed responses. One therefore suspects that other factors contribute to the spatial correlation between these sites, for instance, covariate effects, such as wind
and its direction. Two close–by sites may be correlated because they lie in the same wind direction; on the other hand, two close–by sites may look independent of one another if their locations lie on a line orthogonal to the wind direction. Other interpolation results, shown in Figures 4.19–4.21, are also examined to check for the association between the response variables at the ungauged site and its closest gauged neighbour.

[Figure 4.12: Interpolation at Ungauged Site 7 from the 7th week to the 8th week. The square–root of hourly ozone concentrations is plotted on the vertical axes and hours on the horizontal axes. Solid (dotdashed) lines = interpolation and 95% pointwise predictive intervals by the BSP; dashed (dotted) lines = interpolation and 95% predictive intervals by the DLM; + = interpolation by NAIVE; and ◦ = observations at Ungauged Site 7.]

[Figure 4.13: Interpolation at Ungauged Site 7 from the 9th week to the 10th week. The square–root of hourly ozone concentrations is plotted on the vertical axes and hours on the horizontal axes. Solid (dotdashed) lines = interpolation and 95% pointwise predictive intervals by the BSP; dashed (dotted) lines = interpolation and 95% predictive intervals by the DLM; + = interpolation by NAIVE; and ◦ = observations at Ungauged Site 7.]

The marginal posterior distribution of the $p$th pollutant tends to overestimate the predictive variance of the process, as observed in Figures 4.17–4.21. The posterior ellipsoid credible regions, constructed from the joint posterior distributions of the $p$ pollutants, provide well calibrated predictive results
using the multivariate BSP approach. Figure 4.22 plots the empirical ellipsoid coverage probabilities across the ungauged sites at various nominal levels. Their averages across the ungauged sites are listed in Table 4.3, showing somewhat underestimated predictive variances at the higher nominal levels.

Figure 4.14: Interpolation at Ungauged Site 7 from the 11th week to the 12th week. The square–root of hourly ozone concentration (O3) is plotted on the vertical axes and the hour on the horizontal axes. [Solid (dot–dashed) lines = interpolation and 95% pointwise predictive intervals by the BSP; dashed (dotted) lines = interpolation and 95% predictive intervals by the DLM; + = interpolation by NAIVE; and ◦ = observations at Ungauged Site 7.]

Figure 4.15: Interpolation at Ungauged Site 7 from the 13th week to the 14th week. The square–root of hourly ozone concentration (O3) is plotted on the vertical axes and the hour on the horizontal axes. [Solid (dot–dashed) lines = interpolation and 95% pointwise predictive intervals by the BSP; dashed (dotted) lines = interpolation and 95% predictive intervals by the DLM; + = interpolation by NAIVE; and ◦ = observations at Ungauged Site 7.]

Table 4.4 shows the mean square predictive error (MSPE) at each ungauged site obtained by the three approaches. At every ungauged site except Sites 3, 9 and 10, where the DLM is worst, the NAIVE approach has the largest MSPE, and overall it has the poorest model performance of the three. Except for Ungauged Sites 2 and 8, the multivariate BSP has the smallest MSPE among the three approaches, and so has the best overall model performance. At Ungauged Sites
2 and 8, the MSPE of the DLM is slightly smaller than that of the multivariate BSP, implying possible local variations in the characteristics of the ungauged sites. In other words, one might well add a nugget effect in the multivariate BSP approach to account for the variations among the response variables.

Figure 4.16: Interpolation at Ungauged Site 7 from the 15th week to the 16th week. The square–root of hourly ozone concentration (O3) is plotted on the vertical axes and the hour on the horizontal axes. [Solid (dot–dashed) lines = interpolation and 95% pointwise predictive intervals by the BSP; dashed (dotted) lines = interpolation and 95% predictive intervals by the DLM; + = interpolation by NAIVE; and ◦ = observations at Ungauged Site 7.]

Figure 4.17: The observed square–root of ozone concentrations (√ppb) during the 1st week, the interpolation using the multivariate BSP, DLM and NAIVE approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Ungauged Site 10. [Legend: BSP; 95% PIs (BSP); DLM; 95% PIs (DLM); NAIVE; OBS. Horizontal axis: hours; vertical axis: O3.]

Figures 4.23 and 4.24 plot the ratios of the MSPEs of NAIVE and the DLM to that of the multivariate BSP, respectively. Figure 4.23 shows that the interpolation by the multivariate BSP has a smaller MSPE than that by the NAIVE approach over all seventeen weeks and all ten ungauged sites. Figure 4.24 plots the ratio of the MSPE of the DLM interpolator to that of the multivariate BSP, showing better model performance of the latter over all 17 weeks and most ungauged sites.
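The MSPE ratios discussed above can be sketched as follows. This is a minimal illustration rather than the thesis code: the array layout (weeks × sites × hours), the function names `mspe` and `mspe_ratio`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def mspe(pred, obs, axis):
    """Mean square predictive error: squared errors averaged over `axis`."""
    pred, obs = np.asarray(pred, dtype=float), np.asarray(obs, dtype=float)
    return np.mean((pred - obs) ** 2, axis=axis)

def mspe_ratio(pred_a, pred_b, obs, axis):
    """MSPE(a) / MSPE(b); values above 1 favour method b."""
    return mspe(pred_a, obs, axis) / mspe(pred_b, obs, axis)

# Synthetic stand-in data (hypothetical): 17 weeks, 10 sites, 168 hours per week.
rng = np.random.default_rng(0)
obs = rng.normal(6.0, 1.5, size=(17, 10, 168))
pred_bsp = obs + rng.normal(0.0, 1.0, size=obs.shape)    # smaller errors
pred_naive = obs + rng.normal(0.0, 1.5, size=obs.shape)  # larger errors

# Panel (a): one ratio per week, pooling sites and hours.
ratio_by_week = mspe_ratio(pred_naive, pred_bsp, obs, axis=(1, 2))
# Panel (b): one ratio per site, pooling weeks and hours.
ratio_by_site = mspe_ratio(pred_naive, pred_bsp, obs, axis=(0, 2))
```

Pooling over the remaining two axes is one possible reading of "for each of the 17 weeks" and "for each of the 10 ungauged sites"; other aggregations (for example, averaging per-site ratios) would give different numbers.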
Figure 4.18: The observed square–root of ozone concentrations (√ppb) during the 10th week, the interpolation using the multivariate BSP, DLM and NAIVE approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Ungauged Site 10. [Legend: BSP; 95% PIs (BSP); DLM; 95% PIs (DLM); NAIVE; OBS. Horizontal axis: hours; vertical axis: O3.]

The coverage probabilities of the multivariate BSP and DLM are computed at the 95% nominal level as a further assessment of model performance. Figures 4.25 and 4.26 show side–by–side boxplots of these coverage probabilities over the 10 ungauged sites and the 17 weeks, respectively. The coverage probabilities of the DLM are much higher than the 95% nominal level for most weeks and ungauged sites. In fact, the pointwise coverage probabilities of the multivariate BSP tend to be closer to that nominal level than those of the DLM.

Figure 4.19: The observed square–root of ozone concentrations (√ppb) during the 1st week, the interpolation using the multivariate BSP, DLM and NAIVE approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Ungauged Site 9. [Legend: BSP; 95% PIs (BSP); DLM; 95% PIs (DLM); NAIVE; OBS. Horizontal axis: hours; vertical axis: O3.]

Comparing both the MSPEs and the coverage probabilities from the three approaches, the NAIVE approach has the worst model performance; the DLM approach does better than NAIVE, but not as well as the multivariate BSP. Overall, the multivariate BSP has the best model performance among the three.

Temporal prediction is addressed in the next chapter by using the multivariate BSP approach in the Chicago area’s hourly ozone field. Moreover, that model is assessed by comparing it with two other approaches: the DLM and NAIVE∗ (see Section 5.3).

Figure 4.20: The observed square–root of ozone concentrations (√ppb) during the 1st week, the interpolation using the multivariate BSP, DLM and NAIVE approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Ungauged Site 8. [Legend: BSP; 95% PIs (BSP); DLM; 95% PIs (DLM); NAIVE; OBS. Horizontal axis: hours; vertical axis: O3.]

Figure 4.21: The observed square–root of ozone concentrations (√ppb) during the 1st week, the interpolation using the multivariate BSP, DLM and NAIVE approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Ungauged Site 2. [Legend: BSP; 95% PIs (BSP); DLM; 95% PIs (DLM); NAIVE; OBS. Horizontal axis: hours; vertical axis: O3.]

Figure 4.22: Boxplot of the simultaneous posterior ellipsoid credibility regions at various nominal levels. [Horizontal axis: nominal level (0.99, 0.95, 0.90, 0.80, 0.70, 0.60, 0.50); vertical axis: posterior ellipsoid coverage.]

4.5 Spatial Leakage in the DLM

Zidek et al. (2002) address the crucial problem of spatial leakage, or the space–time interaction problem, in spatially interpolating hourly PM10 in the Vancouver area using a univariate approach. Spatial leakage occurs when a substantial amount of between–site cross–correlation persists up to a certain time lag. The whitened residuals used in their model come from two steps: (i) first, obtaining the detrended residuals by taking off
the covariate effects from the response variables, for a fixed site and varying time points; and (ii) then obtaining the deAR’d (or whitened) residuals by accounting for the temporal dependence in the detrended residuals.

Table 4.3: The posterior ellipsoid coverage probabilities at various nominal levels.

Nominal Coverage (%)               99   95   90   80   70   60   50
Ellipsoid Credible Coverage (%)    82   75   71   65   62   57   53

Table 4.4: The mean square predictive error (MSPE) at ungauged sites of the multivariate BSP, DLM, and NAIVE approaches.

Ungauged Site   MSPE(NAIVE)   MSPE(DLM)   MSPE(BSP)
 1              2.96          1.79        1.16
 2              2.17          1.55        1.61
 3              1.62          1.80        1.30
 4              1.65          1.46        0.94
 5              2.52          1.92        1.03
 6              3.08          1.85        0.99
 7              2.13          1.60        0.97
 8              3.74          2.62        2.67
 9              1.42          1.63        1.01
10              0.46          0.87        0.38

To be more specific, suppose $Y(s,t)$ represents the response variable at site $s$ and time $t$, for $s \in \{s_1, \dots, s_p\}$ and $t = 1, \dots, n$. Let $z_t : 1 \times l$ be an $l$-dimensional covariate vector at time $t$, common to all sites, and $\beta : l \times 1$ the regression coefficients corresponding to the $l$ covariates. The detrended residuals, $E(s,t)$, are given by
\[ E(s,t) = Y(s,t) - z_t\beta. \tag{4.25} \]
Assuming an AR($q$) process for the detrended residuals, the deAR’d residuals are denoted by $e(s,t)$ and given by
\[ E(s,t) = \sum_{i=1}^{q} \alpha_i E(s,t-i) + e(s,t). \tag{4.26} \]
Theoretical results in Zidek et al. (2002) show the existence of potential spatial correlation leakage for an AR(1) (detrended) process. The connections between the Le–Zidek approach and the DLM allow one to use the former within the DLM framework. Moreover, as a specific case of the state–space model, the DLM framework can be treated as a bridge that unifies the Le–Zidek approach and general state–space modelling. Our objective is to investigate potential spatial correlation leakage in general state–space modelling for this approach. The following addresses the AR($q$) process in (4.26).

Figure 4.23: The ratio of MSPE of the interpolation by NAIVE to that of the multivariate BSP for each of: (a) the 17 weeks; and (b) the 10 ungauged sites.

Figure 4.24: The ratio of MSPE of the interpolation by the DLM to that of the multivariate BSP for each of: (a) the 17 weeks; and (b) the 10 ungauged sites.

To reformulate the Le–Zidek approach as a DLM, we can assume that $z_t$ itself follows a Gaussian random walk whose innovations have mean 0 and a very small covariance matrix, $\sigma_z^2 I_l$. Of course, we can also assume that $z_t$ is fixed at each time point $t$, exactly as in Zidek et al. (2002). In other words, we consider two cases here:

Case (i) $z_t$ itself follows a stochastic process, for $t = 1, \dots, n$:
\[ z_t = z_{t-1} + w_{z_t}, \qquad w_{z_t} \sim N(0, \sigma_z^2 I_l). \tag{4.27} \]

Case (ii) $z_t$ is fixed at each time point $t$, for $t = 1, \dots, n$.

For Case (i), models (4.25)–(4.27) can be formalized as a DLM as follows:
\[ Y(s,t) = F'\theta(s,t) \tag{4.28} \]
\[ \theta(s,t) = G\theta(s,t-1) + w(s,t), \tag{4.29} \]
where $F' : 1 \times (q+1) = (1, 1, 0, \dots, 0)$, $\theta(s,t)' : 1 \times (q+1) = (z_t\beta, E(s,t), E(s,t-1), \dots, E(s,t-q+1))$, $w(s,t)' : 1 \times (q+1) = (w_{z_t}\beta, e(s,t), 0, \dots, 0)$ and
\[ G : (q+1) \times (q+1) = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & \alpha_1 & \alpha_2 & \cdots & \alpha_{q-1} & \alpha_q \\ 0 & 1 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \end{pmatrix}. \]

Figure 4.25: Side–by–side boxplots of the coverage probabilities of the multivariate BSP and DLM approaches plotted against the 10 ungauged sites.

Figure 4.26: Side–by–side boxplots of the coverage probabilities of the multivariate BSP and DLM approaches plotted against the time span of 17 weeks.

For Case (ii), we can formalize models (4.25)–(4.27) in the same way, except that the first entry in $F'$ becomes $z_t((z_t^*)'z_t^*)^{-1}(z_t^*)'$ and the (1,1) entry in $G$ becomes $z_t^*((z_{t-1}^*)'z_{t-1}^*)^{-1}z_{t-1}^*$, where $z_t^* = z_t - \sum_{i=1}^{q} \alpha_i z_{t-i}$.

For both cases, let $c_i' : 1 \times (q+1)$ have its $i$th entry equal to 1 and all others 0, for $i = 1, \dots, q+1$. Then we have
\[ E(s,t-i) = c_{i+2}'\theta(s,t) \tag{4.30} \]
\[ e(s,t) = c_2' w(s,t), \tag{4.31} \]
for $i = 0, \dots, q-1$.

Zidek et al. (2002) show that spatial correlation leakage occurs for the AR(1) process if $\mathrm{Cor}(E(s,t-1), e(s',t)) \neq 0$. The lag–1 between–site cross–covariance for any two sites $s$ and $s'$ can be written as
\[ \begin{aligned} \mathrm{Cov}(E(s,t-1), e(s',t)) &= \mathrm{Cov}\{c_3'\theta(s,t), c_2'w(s',t)\} \\ &= c_3'\,\mathrm{Cov}\{\theta(s,t), w(s',t)\}\,c_2 \\ &= c_3'\,\mathrm{Cov}\{G\theta(s,t-1) + w(s,t), w(s',t)\}\,c_2 \\ &= c_3'\,\mathrm{Cov}(w(s,t), w(s',t))\,c_2 \\ &= c_3' W_t(s,s') c_2, \end{aligned} \tag{4.32} \]
where $W_t : p(q+1) \times p(q+1)$ is the covariance matrix of the vector $w_t : p(q+1) \times 1 = (w(s_1,t)', \dots, w(s_p,t)')'$ at the fixed time point $t$, and $W_t(s,s') : (q+1) \times (q+1)$ is the covariance matrix between $w(s,t)$ and $w(s',t)$ for fixed $t$. Therefore, the lag–1 between–site cross–correlation is not zero if $(W_t(s,s'))_{32} \neq 0$.

In general, for $i = 1, \dots, q-1$, the covariance between $E(s,t-i)$ and $e(s',t)$ is
\[ \begin{aligned} \mathrm{Cov}(E(s,t-i), e(s',t)) &= \mathrm{Cov}(c_{i+2}'\theta(s,t), c_2'w(s',t)) \\ &= c_{i+2}'\,\mathrm{Cov}(\theta(s,t), w(s',t))\,c_2 \\ &= c_{i+2}' W_t(s,s') c_2. \end{aligned} \tag{4.33} \]
Hence, the lag–$i$ between–site cross–correlation is not zero if $(W_t(s,s'))_{i+2,2} \neq 0$. This result implies that spatial correlation leakage occurs if, for some $i = 1, \dots, q-1$, $(W_t(s,s'))_{i+2,2} \neq 0$ for any two sites $s$ and $s'$. It means that the DLM framework cannot avoid the spatial correlation leakage problem for the AR($q$) process if such conditions are satisfied.

4.6 Conclusion and Discussion

The multivariate BSP approach can be used to overcome the spatial correlation leakage problem for the filtered residuals between the sites, a problem that otherwise tends to incorporate significant temporal structure into those residuals.

Unlike the DLM, the multivariate BSP approach does not assume stationarity and isotropy for the underlying process. The multivariate BSP can incorporate the uncertainty about hyperparameters through fully hierarchical Bayesian modelling. On the other hand, the DLM approach needs some hyperparameters to be fixed to achieve a similar objective. The multivariate BSP can provide ellipsoid credible regions for multiple pollutants and simultaneous prediction; the DLM provides predictive intervals for a single pollutant only. The multivariate BSP is much more computationally efficient than the DLM. Unlike the latter, it can interpolate hourly ground–level ozone over a large number of monitoring stations, an impossible task for the DLM. In other words, it is computationally scalable.

Chapter 5

Multivariate Bayesian Spatial Prediction and Its Temporal Prediction

Not only are people interested in interpolating responses at ungauged sites given all the information at gauged sites, but also in temporally predicting them at these gauged sites. In fact, 24–hour ahead ozone forecasting is now commonly done in many urban areas. The modeller may be asked: “What will the ozone concentration level be at 2 p.m. tomorrow given all data until 10 a.m.
today?”; or, “What will the ozone levels be tomorrow if I have all the measurements up to today?”

The multivariate BSP approach can be adapted to answer such questions by taking a 24–dimensional multivariate response variable formed from the 24 hourly univariate response variables of each day, treated like 24 “species” or “pollutants”; each entry represents the measurement for one of the 24 successive hours. An adjusted multivariate model is needed because the day–to–day dependence structure of the 24–dimensional multivariate response variable invalidates the row independence assumption for the deAR’d residuals in the multivariate BSP approach.

To apply the BSP, one must obtain estimates of the hyperscale spatial covariance of the 24 pollutants given all the observed response variables. However, the sequence of 24–dimensional vectors of hourly responses does not meet the BSP’s assumption of independence. To overcome this difficulty, subsequences of vectors are systematically extracted so as to contain responses separated by 24 hours. This subset of data is then used to predict each one of the 24 response variables on the future day.

More specifically, two sub–data matrices are formed by selecting these 24–dimensional vectors, one sequence for odd days and a second for even days. Model assumptions hold for each sequence, allowing hyperparameters at gauged sites to be estimated by the EM–algorithm via the EnviRo.stat software cited earlier. Each “pair” of estimated hyperparameter sets can then be averaged to form approximate “estimates” of the hyperparameters given all the data. Finally, the one–step ahead prediction is implemented at gauged sites given those estimates of the hyperparameters and the observed responses. However, to get the 24–hour ahead forecast, obstacles remain that we now turn to.
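The odd–day and even–day construction described above can be sketched as follows. This is only an illustration under stated assumptions: the array layout `y[day, hour, site]`, the helper names `daily_vectors`, `split_odd_even` and `average_estimates`, and the use of column means as a stand-in for the EM hyperparameter estimates are all hypothetical; the EnviRo.stat estimation itself is not reproduced.

```python
import numpy as np

def daily_vectors(y):
    """Flatten hourly data y[day, hour, site] into one 1 x gp row per day.

    Each row is ordered site by site, with hours 1..p inside each site.
    """
    T, p, g = y.shape
    return y.transpose(0, 2, 1).reshape(T, g * p)

def split_odd_even(y):
    """Return (odd-day rows, even-day rows); days are numbered from 1."""
    rows = daily_vectors(y)
    return rows[0::2], rows[1::2]

def average_estimates(est_u, est_v):
    """Average a pair of estimates obtained from the two sub-data matrices."""
    return 0.5 * (np.asarray(est_u) + np.asarray(est_v))

# Synthetic stand-in data with the dimensions quoted in the text.
rng = np.random.default_rng(1)
T, p, g = 120, 24, 14
y = rng.normal(size=(T, p, g))

U, V = split_odd_even(y)            # each is 60 x (g * p)
# Stand-in "estimate": column means (NOT the EM hyperparameter estimates).
approx = average_estimates(U.mean(axis=0), V.mean(axis=0))
```

Consecutive rows within `U` (or within `V`) are two days apart, which is the separation the text relies on when treating rows as independent.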
In fact, Section 5.1 demonstrates how to construct the above sequences and the corresponding subdata matrices, along with the multivariate settings of the BSP model, to predict each one of the 24 responses on the next day. Moreover, their predictive posterior distributions are developed and the corresponding pointwise predictive intervals at each gauged site are constructed. Section 5.2 illustrates the h–step ahead prediction by the DLM approach. Section 5.3 presents the one–day ahead temporal prediction by the NAIVE∗ approach. Section 5.4 shows the results and comparisons of the one–day ahead prediction by the three approaches at gauged sites.

5.1 The Multivariate BSP Approach

Suppose $Y^{[g_j^m]}_{t,i}$ represents the unobserved $i$th response variable at day $t$ and gauged site $j$, and $Y^{[g_j^o]}_{t,i}$ the corresponding observed response variable, for $t = 1, \dots, T$, $i = 1, \dots, p$, and $j = 1, \dots, g$. Specifically, in Chicago’s AQS database, $T = 120$, $p = 24$, and $g = 14$. Two cases are considered here, to predict the response variable at (i) 11 P.M., or (ii) any single hour during the 0 A.M. to 10 P.M. period, on day 121, the last day in the observation sequence, whose data we set aside to be used for assessment.

• Case 1: Predict the response variable at the last hour (i.e., 11 P.M.) of the 121st day.

Call the corresponding multivariate BSP model “Model–1”. One of the subdata matrices can be formed from $\{U^{(1)}_t\} : 1 \times gp$, where $U^{(1)}_t = (Y^{[g_1^o]}_{2t-1,1}, \dots, Y^{[g_g^o]}_{2t-1,p})$; the other from $\{V^{(1)}_t\} : 1 \times gp$, where $V^{(1)}_t = (Y^{[g_1^o]}_{2t,1}, \dots, Y^{[g_g^o]}_{2t,p})$, for $t = 1, \dots, 59$. In other words, the observed response variables from the 1st day to the 119th day are used as the “data”. After that, each of the subdata matrices is used with Model–1 to find estimates of the hyperparameters.
Moreover, “approximate” estimates of the hyperparameters given all the observed response variables at gauged sites are obtained by averaging each pair of hyperparameters estimated by the BSP from these two matrices.

• Case 2: Predict the response variable at the (k − 1)th hour of the 121st day, for $k = 2, \dots, p$.

Call the corresponding multivariate BSP model “Model–k”. One of the subdata matrices can be formed from $\{U^{(k)}_t\} : 1 \times gp$, where
\[ U^{(k)}_t = (Y^{[g_1^o]}_{2t-1,k}, \dots, Y^{[g_1^o]}_{2t-1,p}, Y^{[g_1^o]}_{2t,1}, \dots, Y^{[g_1^o]}_{2t,k-1}, \ \dots, \ Y^{[g_g^o]}_{2t-1,k}, \dots, Y^{[g_g^o]}_{2t-1,p}, Y^{[g_g^o]}_{2t,1}, \dots, Y^{[g_g^o]}_{2t,k-1}); \]
the other from $\{V^{(k)}_t\}$, where
\[ V^{(k)}_t = (Y^{[g_1^o]}_{2t,k}, \dots, Y^{[g_1^o]}_{2t,p}, Y^{[g_1^o]}_{2t+1,1}, \dots, Y^{[g_1^o]}_{2t+1,k-1}, \ \dots, \ Y^{[g_g^o]}_{2t,k}, \dots, Y^{[g_g^o]}_{2t,p}, Y^{[g_g^o]}_{2t+1,1}, \dots, Y^{[g_g^o]}_{2t+1,k-1}). \]
One can obtain estimates of the hyperparameters of Model–k in the same way as in Case 1.

The covariates, i.e., the weekdays, are constructed by starting from “Monday∗” for the $\{U^{(k)}_t\}$ and “Tuesday∗” for the $\{V^{(k)}_t\}$, $k = 1, \dots, p$, where the “∗” has been added to signify that the beginning of the day has been shifted successively by 0, 1, . . . , 23 hours, according to which hour of day 121 is to be predicted. This weekday effect is removed from the $\{U^{(k)}_t\}$s and $\{V^{(k)}_t\}$s, respectively, and the software, EnviRo.stat, is used to implement these 24 models.

Let $Y = (Y^{[u]}, Y^{[g]})$, where $Y^{[g]} = (Y^{[g^m]\prime}, Y^{[g^o]\prime})'$ and $Y^{[u]} : n \times up$, with $Y^{[g^m]} : m \times gp$ and $Y^{[g^o]} : (n-m) \times gp$. The temporal prediction problem needs the predictive posterior distribution of $(Y^{[g^m]} \mid Y^{[g^o]}, \mathcal{H})$, $m$ being the temporal unit to be predicted. Specifically, $m = 1$ in the one–day–ahead prediction at gauged sites.

Let
\[ Z\beta_0^{[g]} = \begin{pmatrix} \mu_{(1)} \\ \mu_{(2)} \end{pmatrix} : \begin{pmatrix} m \times gp \\ (n-m) \times gp \end{pmatrix} \]
and
\[ I_n + ZF^{-1}Z' = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} : \begin{pmatrix} m \times m & m \times (n-m) \\ (n-m) \times m & (n-m) \times (n-m) \end{pmatrix}. \]
Given all the estimated hyperparameters $\mathcal{H} = \{F, \beta_0, \Omega, \Lambda_1, \delta_1, \Lambda_0, \tau_{00}, H_0, \delta_0\}$, the marginal posterior distribution is given by
\[ Y^{[g^m]} \mid Y^{[g^o]}, \mathcal{H} \sim t_{m \times gp}(\mu_{(u|g)}, \Phi_{(u|g)} \otimes \Psi_{(u|g)}, \delta_{(u|g)}), \tag{5.1} \]
where
\[ \mu_{(u|g)} = \mu_{(1)} + A_{12}A_{22}^{-1}(Y^{[g^o]} - \mu_{(2)}) \tag{5.2} \]
\[ \Phi_{(u|g)} = \frac{\delta_1 - gp + 1}{\delta_1 - gp + n - m + 1}(A_{11} - A_{12}A_{22}^{-1}A_{21}) \tag{5.3} \]
\[ \Psi_{(u|g)} = \frac{1}{\delta_1 - gp + 1}\{\Lambda_1 \otimes \Omega + (Y^{[g^o]} - \mu_{(2)})'A_{22}^{-1}(Y^{[g^o]} - \mu_{(2)})\} \tag{5.4} \]
\[ \delta_{(u|g)} = \delta_1 - gp + n - m + 1 \tag{5.5} \]
(Le & Zidek, 2006, p.160–161).

To obtain the one–step–ahead temporal prediction at the gauged sites, one needs the predictive posterior distribution of the unobserved response variable of interest, that is, the last “species” or “pollutant” in the multivariate response vector, whose role is now being played by an hourly ozone concentration. Two different predictive posterior distributions of the last pollutant (i.e., the pth pollutant) are considered, for Model–1 and for Model–k, $k \in \{2, \dots, p\}$. These two cases follow:

• For Model–1, $Y^{[g^o]}$ has the observed responses from day 1 to day 119, and $Y^{[g^m]}$ can be written as
\[ Y^{[g^m]} = ((Y^{[g_1^m]}_{121,1}, \dots, Y^{[g_g^m]}_{121,p})', (Y^{[g_1^o]}_{120,1}, \dots, Y^{[g_g^o]}_{120,p})')', \]
with $Y^{[g_{1:g}^m]}_{121,1:p} : 1 \times gp$ the unobserved response vector of day 121 and $Y^{[g_{1:g}^o]}_{120,1:p} : 1 \times gp$ the observed response vector of day 120. Hence $m = 2$ and $n = 121$. The predictive posterior distribution of $Y^{[g^m]}$ can be obtained by using (5.2)–(5.5). To obtain the predictive distribution of $Y^{[g_{1:g}^m]}_{121,1:p}$ given $Y^{[g_{1:g}^o]}_{1:119,1:p}$ and $Y^{[g_{1:g}^o]}_{120,1:p}$, one can decompose $\mu_{(u|g)}$ and $\delta_{(u|g)}\Phi_{(u|g)}$ as follows:
\[ \mu_{(u|g)} = \begin{pmatrix} \mu_{1r} \\ \mu_{2r} \end{pmatrix} : \begin{pmatrix} 1 \times gp \\ 1 \times gp \end{pmatrix} \]
and
\[ \delta_{(u|g)}\Phi_{(u|g)} = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} : \begin{pmatrix} 1 \times 1 & 1 \times 1 \\ 1 \times 1 & 1 \times 1 \end{pmatrix}. \]
Hence, the predictive posterior distribution of $Y^{[g_{1:g}^m]}_{121,1:p}$ is given by
\[ Y^{[g_{1:g}^m]}_{121,1:p} \mid Y^{[g_{1:g}^o]}_{120,1:p}, Y^{[g_{1:g}^o]}_{1:119,1:p}, \mathcal{H} \sim t_{1 \times gp}\Big(\mu_{1r} + B_{12}B_{22}^{-1}(Y^{[g_{1:g}^o]}_{120,1:p} - \mu_{2r}),\ \frac{B_{11} - B_{12}B_{22}^{-1}B_{21}}{\delta_{(u|g)} + 1} \otimes \Psi_{(u|g)}\big(I_{gp} + \Psi_{(u|g)}^{-1}(Y^{[g_{1:g}^o]}_{120,1:p} - \mu_{2r})'B_{22}^{-1}(Y^{[g_{1:g}^o]}_{120,1:p} - \mu_{2r})\big)\Big). \tag{5.6} \]
Let $E_1 : gp \times g$ be a block diagonal matrix with the block $e_p : p \times 1$, whose pth element is 1 and all others 0, and let $e_j$ denote the $g \times 1$ vector whose jth element is 1 and all others 0. At Gauged Site $j \in \{1, \dots, g\}$, the predictive distribution of the pth unobserved response $Y^{[g_j^m]}_{121,p}$, that is, $Y^{[g_{1:g}^m]}_{121,1:p}E_1e_j$, also has a t–distribution:
\[ Y^{[g_j^m]}_{121,p} \mid Y^{[g_{1:g}^o]}_{120,1:p}, Y^{[g_{1:g}^o]}_{1:119,1:p}, \mathcal{H} \sim t_{\delta_{(u|g)}+1}(\mu^*E_1e_j,\ \phi^*e_j'E_1'\Psi^*E_1e_j), \tag{5.7} \]
where $\mu^* = \mu_{1r} + B_{12}B_{22}^{-1}(Y^{[g_{1:g}^o]}_{120,1:p} - \mu_{2r})$, $\phi^* = (B_{11} - B_{12}B_{22}^{-1}B_{21})/(\delta_{(u|g)} + 1)$ and $\Psi^* = \Psi_{(u|g)}(I_{gp} + \Psi_{(u|g)}^{-1}(Y^{[g_{1:g}^o]}_{120,1:p} - \mu_{2r})'B_{22}^{-1}(Y^{[g_{1:g}^o]}_{120,1:p} - \mu_{2r}))$.

• For Model–k, $k = 2, \dots, p$, $Y^{[g^o]}$ has the observed responses from day 1 to day 119, while $Y^{[g^m]}$ consists of $k - 1$ unobserved responses and $p - k + 1$ observed ones at each gauged site. To predict the responses one–day–ahead at gauged sites in the field, one has $m = 1$ and $n = 120$ in (5.2)–(5.5). Let $E_{2j} : gp \times p$ be the matrix obtained by stacking $g$ ($p \times p$) matrices, in which the jth matrix is $I_p$ and the others are 0. At Gauged Site $j \in \{1, \dots, g\}$, we have
\[ Y^{[g^m]}E_{2j} \mid Y^{[g^o]}, \mathcal{H} \sim t_{1 \times p}(\mu_{(u|g)}E_{2j},\ \Phi_{(u|g)} \otimes E_{2j}'\Psi_{(u|g)}E_{2j},\ \delta_{(u|g)}). \tag{5.8} \]
Notice that $Y^{[g^m]}E_{2j}$ is $(Y^{[g_j^o]}_{n-1,k}, \dots, Y^{[g_j^o]}_{n-1,p}, Y^{[g_j^m]}_{n,1}, \dots, Y^{[g_j^m]}_{n,k-1})$. Let $E_3 : p \times p$ be the matrix whose entry in the ith row and (p − i + 1)th column is 1, for $i = 1, \dots, p$, with all other entries 0.
Multiplying $Y^{[g^m]}E_{2j}$ by $E_3$ reverses the order of the pollutants, so that the response of the last hour is located at the first position of the new response vector, the response of the second–last hour at the second position, and so on. In other words, we obtain the following new response vector:
\[ (Y^{[g_j^m]}_{n,k-1}, \dots, Y^{[g_j^m]}_{n,1}, Y^{[g_j^o]}_{n-1,p}, \dots, Y^{[g_j^o]}_{n-1,k}). \]
That new response has the following multivariate t–distribution:
\[ Y^{[g^m]}E_{2j}E_3 \mid Y^{[g^o]}, \mathcal{H} \sim t_{1 \times p}(\mu_j,\ \Phi_{(u|g)} \otimes \Psi_j,\ \delta_{(u|g)}), \tag{5.9} \]
where $\mu_j = \mu_{(u|g)}E_{2j}E_3$ and $\Psi_j = E_3'E_{2j}'\Psi_{(u|g)}E_{2j}E_3$. Decompose $Y^{[g^m]}E_{2j}E_3$, $\mu_j$ and $\Psi_j$ as follows:
\[ Y^{[g^m]}E_{2j}E_3 = (T_{1c}, T_{2c}) : (1 \times (k-1),\ 1 \times (p-k+1)), \]
\[ \mu_j = (\mu_{1c}, \mu_{2c}) : (1 \times (k-1),\ 1 \times (p-k+1)), \]
and
\[ \Psi_j = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix} : \begin{pmatrix} (k-1) \times (k-1) & (k-1) \times (p-k+1) \\ (p-k+1) \times (k-1) & (p-k+1) \times (p-k+1) \end{pmatrix}. \]
Hence the unobserved response variable $T_{1c}$ is also t–distributed:
\[ T_{1c} \mid T_{2c}, Y^{[g^o]}, \mathcal{H} \sim t_{1 \times (k-1)}\Big(\mu_{1c} + (T_{2c} - \mu_{2c})C_{22}^{-1}C_{21},\ \frac{\delta_{(u|g)}}{\delta_{(u|g)} + p - k + 1}\,\Phi_{(u|g)}\{1 + (\delta_{(u|g)}\Phi_{(u|g)})^{-1}(T_{2c} - \mu_{2c})C_{22}^{-1}(T_{2c} - \mu_{2c})'\} \otimes (C_{11} - C_{12}C_{22}^{-1}C_{21}),\ \delta_{(u|g)} + p - k + 1\Big). \tag{5.10} \]
The last “pollutant”, that is, the first entry of $T_{1c}$, can be extracted by multiplying $T_{1c}$ by a $(k-1)$–dimensional vector $E_4$ whose first entry is 1 and all others 0. Consequently, the predictive posterior distribution of $Y^{[g_j^m]}_{n,k-1}$ is given as follows:
\[ Y^{[g_j^m]}_{n,k-1} \mid Y^{[g^o]}, Y^{[g_j^o]}_{n-1,k:p}, \mathcal{H} \sim t_{\delta_{(u|g)}+p-k+1}\Big((\mu_{1c} + (T_{2c} - \mu_{2c})C_{22}^{-1}C_{21})E_4,\ \frac{\delta_{(u|g)}}{\delta_{(u|g)} + p - k + 1}\,\Phi_{(u|g)}\{1 + (\delta_{(u|g)}\Phi_{(u|g)})^{-1}(T_{2c} - \mu_{2c})C_{22}^{-1}(T_{2c} - \mu_{2c})'\}\,E_4'(C_{11} - C_{12}C_{22}^{-1}C_{21})E_4\Big). \tag{5.11} \]
The corresponding predictive variance at the (k − 1)th hour of the 121st day at Gauged Site j is given by:
\[ \mathrm{Var}(Y^{[g_j^m]}_{n,k-1} \mid Y^{[g^o]}, Y^{[g_j^o]}_{n-1,k:p}, \mathcal{H}) = \frac{\delta_{(u|g)}}{\delta_{(u|g)} + p - k - 1}\,\Phi_{(u|g)}\{1 + (\delta_{(u|g)}\Phi_{(u|g)})^{-1}(T_{2c} - \mu_{2c})C_{22}^{-1}(T_{2c} - \mu_{2c})'\}\,E_4'(C_{11} - C_{12}C_{22}^{-1}C_{21})E_4. \]
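The selection matrices used above, and the location parameter in (5.10), can be checked numerically with a small sketch. It is an illustration only: the function names `make_E2`, `make_E3`, `make_E4` and `conditional_location` are hypothetical, the dimensions are toy values, and `psi_j` is assumed symmetric positive definite.

```python
import numpy as np

def make_E2(g, p, j):
    """gp x p matrix of g stacked p x p blocks: I_p in block j (1-based), 0 elsewhere."""
    E = np.zeros((g * p, p))
    E[(j - 1) * p : j * p, :] = np.eye(p)
    return E

def make_E3(p):
    """p x p matrix with a 1 in row i, column p - i + 1 (reverses the order)."""
    return np.fliplr(np.eye(p))

def make_E4(k):
    """(k-1)-vector with first entry 1 and all others 0."""
    e = np.zeros(k - 1)
    e[0] = 1.0
    return e

def conditional_location(mu_j, psi_j, t2c, k):
    """mu_1c + (T_2c - mu_2c) C22^{-1} C21, with blocks split after k-1 entries."""
    m = k - 1
    mu1, mu2 = mu_j[:m], mu_j[m:]
    c21, c22 = psi_j[m:, :m], psi_j[m:, m:]
    # psi_j symmetric, so (T_2c - mu_2c) C22^{-1} equals solve(C22, T_2c - mu_2c).
    return mu1 + np.linalg.solve(c22, t2c - mu2) @ c21

# Toy check: g = 3 sites, p = 4 "pollutants", site j = 2, target position k = 3.
g, p, j, k = 3, 4, 2, 3
row = np.arange(g * p, dtype=float)          # stands in for a 1 x gp response row
site_block = row @ make_E2(g, p, j)          # picks the p entries of site j
reversed_block = site_block @ make_E3(p)     # last entry moves to the front
target = reversed_block[: k - 1] @ make_E4(k)
```

Here `target` is the entry that originally sat last in the site block, mirroring how the hour to be predicted is moved to the first position before conditioning on the remaining entries.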
It is straightforward to construct the 95% pointwise predictive intervals at the (k − 1)th hour of the 121st day at each gauged site from (5.11).

5.2 The DLM Approach

An alternative approach, the DLM, can also be used for temporal prediction and indeed would seem the obvious choice, it being an amalgamation of state–space time series models. For the measurement and state equations of the DLM
\[ Y_t = F_t'x_t + \nu_t, \qquad \nu_t \sim N(0, \sigma^2\exp(-V/\lambda)) \]
\[ x_t = x_{t-1} + \omega_t, \qquad \omega_t \sim N(0, \sigma^2 W) \]
with initial information $x_0 \mid D_0 \sim N(m_0, \sigma_0^2 C_0)$, one can obtain the posterior distribution of the state parameters at the last known time point, $T$, that is, $x_T \mid y_{1:T}, \theta \sim N(m_T, \sigma^2 C_T)$, using the Kalman filter, a smoothing method and the Metropolis–within–Gibbs sampling algorithm (details in Chapter 2).

Given the distribution of the state parameters at the last time point, $T$, the observed responses until time $T$, $y_{1:T}$, and the model parameters, $\theta = \{\lambda, \sigma^2, a_1, a_2\}$, the h–step ahead prediction is given by
\[ y_{T+h} \mid y_{1:T}, \theta \sim N(F_{T+h}'m_T,\ \sigma^2\{F_{T+h}'(C_T + hW)F_{T+h} + \exp(-V/\lambda)\}), \tag{5.12} \]
for $h \in \mathbb{N}$. Here, $T = 2880$ and $h = 1, \dots, 24$ for the one–day ahead prediction.

For any fixed $h$, the predictive response, $y_{T+h}$, can also be obtained by the MCMC method. More specifically, at iteration $j$, suppose we have updated the model parameters $\lambda^{(j)}$, $\sigma^{2(j)}$, $a_1^{(j)}$ and $a_2^{(j)}$ using the FFBS algorithm. That is, one has $x_T \mid y_{1:T}, \theta^{(j)} \sim N(m_T^{(j)}, \sigma^{2(j)}C_T^{(j)})$. Then, the predictive response at iteration $j$, $y_{T+h}^{(j)}$, can be drawn from (5.12), that is,
\[ y_{T+h} \mid y_{1:T}, \theta^{(j)} \sim N\big(F_{T+h}^{(j)\prime}m_T^{(j)},\ \sigma^{2(j)}\{F_{T+h}^{(j)\prime}(C_T^{(j)} + hW^{(j)})F_{T+h}^{(j)} + \exp(-V/\lambda^{(j)})\}\big). \]
Consequently, the predictive responses are obtained as the sample means of $\{y_{T+h}^{(j)} : j = 1, \dots, J\}$ ($J = 500$; $h = 1, \dots, 24$). The empirical predictive intervals at the 95% nominal level are obtained from the quantiles of these samples.
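The Monte Carlo step just described can be sketched as below. This is a hedged illustration, not the thesis software: the function names `predictive_moments` and `summarize` are hypothetical, the per-iteration quantities (`m_T`, `C_T`, `W`, `sigma2`, `lam`, `F`) are drawn from arbitrary stand-in distributions rather than from an actual FFBS run, and `V` is assumed to be a matrix of between–site distances so that `exp(-V / lam)` is taken elementwise.

```python
import numpy as np

def predictive_moments(F, m_T, C_T, W, V, sigma2, lam, h):
    """Mean and covariance of the h-step-ahead predictive normal in (5.12).

    F is d x n_sites (so F' maps states to sites); V is an n_sites x n_sites
    distance matrix and exp(-V / lam) is applied elementwise.
    """
    mean = F.T @ m_T
    cov = sigma2 * (F.T @ (C_T + h * W) @ F + np.exp(-V / lam))
    return mean, cov

def summarize(draws, level=0.95):
    """Sample mean and equal-tailed empirical interval across draws (axis 0)."""
    a = (1.0 - level) / 2.0
    lower, upper = np.quantile(draws, [a, 1.0 - a], axis=0)
    return draws.mean(axis=0), lower, upper

rng = np.random.default_rng(2)
n_sites, d, J, h = 3, 2, 500, 5
coords = np.array([0.0, 1.0, 2.5])                 # hypothetical site positions
V = np.abs(coords[:, None] - coords[None, :])      # pairwise distances
F = rng.normal(size=(d, n_sites))                  # stand-in for F_{T+h}

draws = np.empty((J, n_sites))
for j in range(J):
    # Stand-ins for the j-th posterior draw of the parameters and filter output.
    sigma2 = rng.uniform(0.8, 1.2)
    lam = rng.uniform(1.0, 2.0)
    m_T = np.array([5.0, 1.0]) + 0.1 * rng.normal(size=d)
    C_T, W = 0.1 * np.eye(d), 0.01 * np.eye(d)
    mean, cov = predictive_moments(F, m_T, C_T, W, V, sigma2, lam, h)
    draws[j] = rng.multivariate_normal(mean, cov)

pred, lower, upper = summarize(draws)
```

Because each draw uses a different parameter value, the resulting interval reflects parameter uncertainty as well as the predictive variance in (5.12).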
5.3 NAIVE∗ Approach

The other alternative approach, NAIVE∗, helps us assess the one–day ahead prediction performance of the multivariate BSP and DLM approaches. That approach models the response variable by the grand mean, a day effect and an hour effect.

To be more specific, the response variable used in this approach is the vectorized square–root ozone levels at each of the gauged sites. Using the same notation as above, at each gauged site $j \in \{1, \dots, g\}$, the response variable is $Y^j_{1:n} = (Y^{[g_j^o]}_{1,1}, \dots, Y^{[g_j^o]}_{1,p}, \dots, Y^{[g_j^o]}_{n,1}, \dots, Y^{[g_j^o]}_{n,p})' : np \times 1$, for $n = 120$ and $p = 24$.

The design matrix $X$ contains three columns: the first consists of 1s, for the grand mean; the second indexes the days, capturing the day effect; and the last indexes the hours, a substitute for the day effect that gives more reasonable results. In other words, its “design matrix” can be written as
\[ X = \begin{pmatrix} 1 & 1 & 1 \\ \vdots & \vdots & \vdots \\ 1 & 1 & p \\ \vdots & \vdots & \vdots \\ 1 & n & 1 \\ \vdots & \vdots & \vdots \\ 1 & n & p \end{pmatrix} : (np) \times 3. \]
Then our model is given by $Y = X\beta + \varepsilon$, where $\varepsilon$ is a mean–zero Gaussian error process, chosen to keep the model simple. The coefficient vector at Gauged Site $j$, $\beta_j$, is thus estimated by the least squares estimator, $\hat{\beta}_j = (X'X)^{-1}X'Y^j_{1:n}$. The one–day ahead prediction at Gauged Site $j$ is thus given by $\hat{Y}^j_{n+1} = X_{n+1}\hat{\beta}_j$, where $X_{n+1} : p \times 3$ is as follows:
\[ X_{n+1} = \begin{pmatrix} 1 & n+1 & 1 \\ \vdots & \vdots & \vdots \\ 1 & n+1 & p \end{pmatrix}. \]
Next the multivariate BSP approach is compared to the DLM and NAIVE∗ for one–day–ahead prediction at gauged sites in the Chicago area.
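The NAIVE∗ fit and forecast can be sketched in a few lines. The sketch assumes the three–column design shown above (intercept, day index, hour index); the function names `design_matrix` and `naive_star_forecast` are hypothetical, and `numpy.linalg.lstsq` is used in place of the explicit $(X'X)^{-1}X'Y$ formula, which gives the same estimate when $X$ has full column rank.

```python
import numpy as np

def design_matrix(days, p):
    """Rows (1, day, hour) for every day in `days` and hour 1..p, day-major order."""
    days = np.asarray(days)
    day_col = np.repeat(days, p)
    hour_col = np.tile(np.arange(1, p + 1), days.size)
    return np.column_stack([np.ones(day_col.size), day_col, hour_col])

def naive_star_forecast(y, n, p):
    """Least squares fit on days 1..n, then the p hourly predictions for day n + 1."""
    X = design_matrix(np.arange(1, n + 1), p)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    X_next = design_matrix([n + 1], p)
    return X_next @ beta_hat, beta_hat

# Noise-free toy response so the fit can be verified exactly.
n, p = 120, 24
X = design_matrix(np.arange(1, n + 1), p)
true_beta = np.array([1.0, 0.1, 0.05])
y = X @ true_beta

y_next, beta_hat = naive_star_forecast(y, n, p)
```

With real data the residuals will of course not vanish, and the linear hour term cannot follow the diurnal ozone cycle, which is consistent with the approach serving only as a simple benchmark.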
5.4 Results and Comparisons

Figures 5.1–5.14 plot the temporal predictions of the square–root of ozone levels on the 121st day by the multivariate BSP, univariate DLM and NAIVE∗ approaches, the 95% pointwise predictive intervals for that day by the multivariate BSP and DLM approaches, and the observations from the 114th to the 121st day, at each of the 14 gauged sites. The multivariate BSP is much more accurate than either the DLM or the NAIVE∗ approach. In fact, its predictive performance is rather good at most gauged sites.

Table 5.1 presents the mean square predictive error (MSPE) of the predictive responses on the 121st day at each of the 14 gauged sites using the three approaches. At Gauged Site $j$, the MSPE over the 24 hours of that day is computed by:
\[ \mathrm{MSPE}_j = \sum_{h=1}^{24} (\mathrm{PRED}_{jh} - \mathrm{OBS}_{jh})^2, \]
where $\mathrm{PRED}_{jh}$ is the predictive response at hour $h$ of the 121st day and $\mathrm{OBS}_{jh}$ the observed response at the same hour of the 121st day, at Gauged Site $j$.

Table 5.1: The mean square predictive error (MSPE) of the one–day ahead prediction at the 14 gauged sites by the multivariate BSP, DLM, and NAIVE∗ approaches. The BSP dominates in all but 3 cases where it essentially ties with one or another of its competitors.

Gauged Site   MSPE(NAIVE∗)   MSPE(DLM)   MSPE(BSP)
 1            0.52           9.38        0.50
 2            0.96           7.38        0.40
 3            1.93           4.66        0.40
 4            1.59           4.24        0.49
 5            2.81           2.60        3.00
 6            0.68           2.31        0.74
 7            0.51           4.19        0.22
 8            1.44           7.48        1.01
 9            1.60           7.01        0.59
10            0.44           5.50        0.50
11            0.83           9.25        0.49
12            0.75           3.41        0.45
13            1.71           12.27       0.73
14            2.44           5.61        1.09

The DLM has the largest MSPE at every gauged site except Site 5, where it has the smallest of the three. The NAIVE∗ approach is slightly better than the BSP at Gauged Sites 5, 6, and 10. The BSP has the smallest MSPE among the three at most gauged sites.

Figure 5.15 plots the length of the 95% pointwise predictive intervals by the BSP at the 24 hours of the 121st day.
Starting from the middle hours of that day, that is, 9 A.M., the predictive intervals tend to widen until the last hour, 11 P.M., reflecting the increasing uncertainty due to the fact that fewer responses are observed as time increases. Figure 5.16 plots the length of the empirical 95% predictive intervals by the DLM at the 24 hours of the 121st day. These lengths are close to each other but show the wiggly periodic behaviour across all gauged sites, a characteristic previously observed in Chapter 2. Although these lengths are very close to each other, the DLM actually underestimates the predictive variability at the gauged sites, as seen in Figure 5.17, which plots the coverage probabilities of the DLM and BSP approaches at the 95% nominal level. The same figure also shows a slightly overestimated predictive variance for the BSP.

Moreover, the results in Section 5.1 can be straightforwardly generalized to an arbitrary time point and are not limited to the case of 121 days of response vectors considered in this chapter.

Therefore, we conclude that the multivariate BSP approach is more accurate for one–day–ahead prediction at the gauged sites in the Chicago area AQS 2000 database than both the NAIVE∗ and DLM approaches.

Figure 5.1: The observed square–root of ozone concentrations (√ppb) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 1.

5.5 Conclusion and Discussion

The temporal prediction of the ground–level ozone concentrations in Chicago's hourly ozone field shows the success of the adjusted multivariate BSP approach, compared with two others: the DLM and NAIVE∗.
This adjusted approach can be generalized to any time point other than 121 in the above analysis. That extension is straightforward and will be given in future work. Moreover, forecasting more hours will be feasible when more data are available. This approach could provide a whole new approach to univariate time series.

The potential problem with this adjusted approach is the loss of information when only a subset of the whole database is used. Further extensions of the correlated response vector in the multivariate BSP modelling need to be explored. One possible solution is to develop the dynamic version of the multivariate BSP, the topic of the next chapters.

Figure 5.2: The observed square–root of ozone concentrations (√ppb) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 2.

Figure 5.3: The observed square–root of ozone concentrations (√ppb) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 3.
Figure 5.4: The observed square–root of ozone concentrations (√ppb) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 4.

Figure 5.5: The observed square–root of ozone concentrations (√ppb) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 5.

Figure 5.6: The observed square–root of ozone concentrations (√ppb) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 6.

Figure 5.7: The observed square–root of ozone concentrations (√ppb) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 7.
Figure 5.8: The observed square–root of ozone concentrations (√ppb) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 8.

Figure 5.9: The observed square–root of ozone concentrations (√ppb) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 9.

Figure 5.10: The observed square–root of ozone concentrations (√ppb) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 10.

Figure 5.11: The observed square–root of ozone concentrations (√ppb) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 11.
Figure 5.12: The observed square–root of ozone concentrations (√ppb) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 12.

Figure 5.13: The observed square–root of ozone concentrations (√ppb) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 13.

Figure 5.14: The observed square–root of ozone concentrations (√ppb) from day 114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVE∗ approaches, and the 95% pointwise predictive intervals using the multivariate BSP and DLM approaches at Gauged Site 14.

Figure 5.15: The width of the 95% pointwise predictive intervals of the one–day–ahead prediction at the 14 gauged sites (G1–G14), by hour, using the multivariate BSP approach.
Figure 5.16: The width of the 95% pointwise predictive intervals of the one–day–ahead prediction at the 14 gauged sites (G1–G14), by hour, using the DLM approach.

Figure 5.17: Boxplots of the coverage probabilities using the DLM and multivariate BSP approaches at the 95% nominal level.

Chapter 6 Bayesian Empirical Orthogonal Function Method

This chapter contains some theoretical results relevant to the extension of the BSP approach in the next chapter. The idea is straightforward: we partition the spatial variation into long–term variation and short–term variation, the latter including measurement errors. We obtain what we call "Bayesian empirical orthogonal functions (EOFs)" in this chapter and extend the results to incorporate the very flexible GIW prior used to model the staircase–patterned data in Chapters 4–5.

Section 6.1 introduces empirical orthogonal functions (EOFs). Section 6.2 describes the classical EOFs obtained from spatio–temporal data and discusses potential difficulties with this method. Section 6.3 illustrates an alternative, corrected EOFs, for a known temporal covariance in the separable space–time covariance structure. Section 6.4 proposes a Bayesian EOF method for a general Bayesian hierarchical model. Section 6.5 extends those results to incorporate the GIW prior. Simulation Study 1 is presented in Sections 6.2–6.3. Section 6.6 compares the three types of EOFs in Simulation Study 2. Section 6.7 summarizes the results and states conclusions for this chapter.

6.1 Introduction

Over the years, empirical orthogonal functions (EOFs) have been used extensively for identifying spatial patterns in environmental processes.
EOFs implicitly assume that the samples consist of independent replicates of the spatial fields. They have been widely used in astronomy, physics, climatology, and oceanography (see Hannachi et al. (2007) for a review of this subject in atmospheric science).

EOFs deal with data anomalies, that is, deviations of the observed values from their temporal mean, instead of the responses themselves. These anomalies are obtained by subtracting, at each site, the time average of the responses from the response at each time point. More precisely, let $Z_i(t)$ represent the response at site $s_i$ and at time t, for i = 1, . . . , p and t = 1, . . . , n. Then the anomaly at site $s_i$ and time t is defined as

$$Y_i(t) = Z_i(t) - \frac{1}{n}\sum_{t'=1}^{n} Z_i(t'), \qquad (6.1)$$

for i = 1, . . . , p and t = 1, . . . , n.

Wikle (2002) describes EOFs as a commonly used spatio–temporal method in climatology. The method is widely used in spatial process analysis to detect spatial patterns in a given field. For a continuous spatial process observed at discrete time points, the EOFs represent components of a Karhunen–Loève (KL) expansion, while for a discrete process, they are the components of a Principal Component Analysis (PCA).

EOF and PCA differ from each other. EOF can be applied to any irregularly located monitoring stations to find both the time series and the spatial patterns. PCA can also be applied to any sites but obtains only the time series or the spatial patterns. The spatial patterns represent the modes of variability. The time series patterns, also called the time series amplitudes, characterize how the spatial patterns oscillate over time. The time series amplitudes are often referred to as the expansion coefficients or principal components. In this thesis, we call the time series amplitudes expansion coefficients, in order to distinguish them from the principal components, the term often used in PCA.
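As a small illustration of (6.1) and of the two outputs of an EOF analysis (spatial patterns and expansion coefficients), the following NumPy sketch may help. It is an assumption–laden toy example, not the thesis code: the function names are hypothetical, the data are random numbers, and the EOFs are taken to be the left singular vectors of the anomaly matrix, which coincide with the eigenvectors of $YY'/n$.

```python
import numpy as np

def anomalies(Z):
    """Anomaly matrix of (6.1): subtract each site's time average (Z is p x n)."""
    Z = np.asarray(Z, dtype=float)
    return Z - Z.mean(axis=1, keepdims=True)

def eof_decomposition(Y):
    """Spatial patterns U (one EOF per column) and expansion coefficients A = U'Y."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    A = U.T @ Y          # row j is the time series amplitude of the j-th EOF
    return U, A

# Toy data: p = 4 sites with different mean levels, n = 50 time points.
rng = np.random.default_rng(1)
Z = rng.standard_normal((4, 50)) + np.array([[10.0], [20.0], [30.0], [40.0]])
Y = anomalies(Z)
U, A = eof_decomposition(Y)
coef_cov = A @ A.T / Y.shape[1]   # sample covariance of the expansion coefficients
```

With this eigenvector basis, `coef_cov` is diagonal up to rounding error, so the expansion coefficients of different EOFs are uncorrelated in the sample, and `U @ A` reproduces the anomaly matrix.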
Although both EOF and PCA can be applied to any set of sites, EOF might be inaccurate if it is constructed from unevenly located sites.

In the climatology literature, some authors define EOF and PCA differently while others do not. We can use infinitely many orthonormal basis vectors to construct the EOF. However, the eigenvector basis is the only one that gives the expansion coefficients the PCA property, that is, the expansion coefficients of the EOFs are uncorrelated (Björnsson and Venegas, 1997).

The KL expansion represents a stochastic process by a linear combination of an infinite number of orthogonal functions. In the KL expansion, the coefficients of the orthogonal functions are random variables. The orthogonal functions are the eigenfunctions of the covariance function of this process.

For the anomaly data, suppose $Y_t = (Y_t(s_1), \ldots, Y_t(s_p))' : p \times 1$ represents the anomaly vector at time t across all the spatial sites in the region. Let $Y = (Y_1, \ldots, Y_n) : p \times n$ be the anomaly matrix.

Definition 6.1.1 (KL expansion) Consider the spatio–temporal process $\{Z(s, t) : s \in D, t = 1, \ldots, n\}$, where s represents a spatial location in the domain of interest $D = \{s_1, \ldots, s_p\}$ and t represents a time point. Assume $E(Z(s, t)) = 0$ and $\mathrm{Cov}(Z(s_i, t), Z(s_j, t)) = C_0(s_i, s_j)$ for all t. The KL expansion represents the covariance function as an infinite linear combination of orthogonal functions, that is,

$$C_0(u, v) = \sum_{j=1}^{\infty} \lambda_j \phi_j(u)\phi_j(v), \qquad (6.2)$$

where $\{\lambda_j : j = 1, \ldots, \infty\}$ are the eigenvalues and $\{\phi_j(\cdot) : j = 1, \ldots, \infty\}$ are the orthogonal eigenfunctions. For the complete set of orthonormal basis functions $\{\phi_j(\cdot) : j = 1, \ldots, \infty\}$, the response can be represented as follows:

$$Z(s, t) = \sum_{j=1}^{\infty} a_j(t)\phi_j(s), \qquad (6.3)$$

where $\mathrm{Var}(a_j(t)) > \mathrm{Var}(a_{j+1}(t))$ for j = 1, 2, . . . , and $\mathrm{Cov}(a_i(t), a_j(t)) = 0$ for $i \neq j$. In the discrete case, the KL expansion is simply obtained through the PCA.
Then $\{\phi_j(s) : s \in D\}$ is called the jth EOF and $\{a_j(t) : t = 1, \ldots, n\}$, the expansion coefficients corresponding to the jth EOF.

In other words, the KL expansion allows one to represent the process by an infinite set of separable orthonormal basis functions, and these basis functions are optimal in the sense of minimizing the mean square error of a truncated expansion. We next introduce the classical EOFs.

6.2 Classical EOFs

To find the principal spatial patterns in a spatial–temporal process, one can use the classical EOFs. [Footnote 6: "Classical" EOFs are the usual EOFs used in the literature, which ignore temporal components.] The spatial covariance matrix can be estimated using the samples of observed anomalies.

In the geographical region of interest, $z_t(s)$ represents the univariate response variable at site s and time t, for $s \in \{s_1, \ldots, s_p\}$ and t = 1, . . . , n. Let $Z_t = (z_t(s_1), \ldots, z_t(s_p))' : p \times 1$ be the response vector at time t. Assume that the matrix–variate response variable $Z = (Z_1, \ldots, Z_n) : p \times n$ follows a matrix–normal distribution with a separable covariance structure in space and time, that is,

$$Z \sim N_{p \times n}(0, \Sigma_S \otimes \Sigma_T),$$

where $\Sigma_S : p \times p$ represents the spatial covariance matrix and $\Sigma_T : n \times n$, the temporal covariance matrix. This separable covariance structure implies no space–time interaction in the spatial–temporal process.

Based on these assumptions, the spatial covariance matrix can be estimated by $\hat{\Sigma}_S = \frac{1}{n} YY'$, where Y is the anomaly matrix defined in (6.1). The spectral decomposition theorem implies the existence of a unique decomposition of $\hat{\Sigma}_S$ such that $\hat{\Sigma}_S = \hat{\Psi}\hat{\Lambda}^2\hat{\Psi}'$, where $\hat{\Lambda}^2 = \mathrm{diag}\{\hat{\lambda}_1^2, \ldots, \hat{\lambda}_p^2\}$ with $\hat{\lambda}_1 > \ldots > \hat{\lambda}_p > 0$, the $\hat{\lambda}_j^2$ being the eigenvalues of $\hat{\Sigma}_S$; each column of $\hat{\Psi}$ is the eigenvector corresponding to the associated eigenvalue. Hence, we represent $\Sigma_S$ as

$$\Sigma_S = \Psi\Lambda^2\Psi' = (\Psi\Lambda)(\Psi\Lambda)' = \Phi\Phi', \qquad (6.4)$$

a form of the KL expansion. We then obtain the classical EOFs from Φ̂ =
Ψ̂Λ̂. However, the classical EOFs do not efficiently estimate their population–level counterparts in $\Sigma_S$ without temporal independence, an unrealistic assumption in most cases. Moreover, this unrealistic assumption might yield misleading information about the spatial patterns obtained from the classical EOFs. In the following section, we demonstrate this deficiency of the classical EOFs through some simulation studies. A further question is how one can avoid the negative effects caused by correlated samples, that is, temporally correlated sequences.

Next, in Simulation Study 1, we show severe problems that the classical EOFs can cause in a separable space–time field with a known EOF matrix. To do this, we need to construct the orthogonal matrix O in the spectral decomposition theorem. The Gram–Schmidt process is a procedure for obtaining an orthogonal basis. The known EOFs can be constructed using the Gram–Schmidt process to have a specified diagonal matrix Λ and an orthonormal basis Φ. This construction starts with any set of given orthogonal vectors, say, $O_1, \ldots, O_k$, for $1 \le k \le p - 1$ and $k \in \mathbb{Z}$. Lemma 6.2.1 gives the details.

Lemma 6.2.1 Given the orthogonal vectors $G_1, \ldots, G_k$, with $G_j : p \times 1$ for j = 1, . . . , k, we obtain an orthogonal matrix $G = (G_1, \ldots, G_p) : p \times p$ by repeating steps (i)–(iii) for j = k + 1, . . . , p:

(i) Generate a realization $y_j$ from $N_p(0, I_p)$.

(ii) Fit the linear regression model
$$y_j = A_0 + \sum_{i=1}^{j-1} A_i G_i + \varepsilon_j,$$
and obtain the estimated coefficients $\{\hat{A}_i : i = 0, \ldots, j - 1\}$.

(iii) Let $G_j$ be the fitted residuals $y_j - \hat{A}_0 - \sum_{i=1}^{j-1} \hat{A}_i G_i$, so that $G_j \perp \{G_1, \ldots, G_{j-1}\}$.

This lemma gives us the orthonormal basis after normalizing the generated orthogonal vectors $G_1, \ldots, G_p$. We hereafter use O to represent the orthonormal basis matrix obtained from this Gram–Schmidt type expansion.
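The construction in Lemma 6.2.1 can be sketched as follows. This is a minimal sketch under stated assumptions rather than the thesis implementation: the function name `complete_orthonormal_basis` is hypothetical, the starting vectors are small toy patterns (p = 8), and the regression is fitted without the intercept term $A_0$, so that each new vector is simply the residual of a projection onto the vectors already obtained.

```python
import numpy as np

def complete_orthonormal_basis(G_init, rng):
    """Extend the orthogonal columns of G_init (p x k) to a p x p orthonormal matrix."""
    G_init = np.asarray(G_init, dtype=float)
    p, k = G_init.shape
    cols = [g / np.linalg.norm(g) for g in G_init.T]   # normalize the given vectors
    for _ in range(k, p):
        y = rng.standard_normal(p)                     # step (i): draw from N_p(0, I_p)
        B = np.column_stack(cols)
        coef, *_ = np.linalg.lstsq(B, y, rcond=None)   # step (ii): regression, no intercept (assumption)
        resid = y - B @ coef                           # step (iii): residual is orthogonal to cols
        cols.append(resid / np.linalg.norm(resid))
    return np.column_stack(cols)

# Toy starting patterns: a half/half split and an alternating pattern.
p = 8
g1 = np.r_[np.ones(p // 2), -np.ones(p // 2)]
g2 = np.tile([1.0, -1.0], p // 2)
rng = np.random.default_rng(0)
O = complete_orthonormal_basis(np.column_stack([g1, g2]), rng)
```

The first two columns of `O` are the normalized starting patterns, and the remaining columns are random directions orthogonal to everything before them.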
Since $\Sigma_S = O\Lambda^2O'$, we then obtain the spatial covariance matrix $\Sigma_S$ using the EOFs constructed above.

Simulation study 1

The objective of this study is to compare three different types of EOFs for a separable space–time process with known EOFs and a known temporal covariance matrix.

Simulated data

In this study, the spatial region consists of 18 × 18 grid locations, with grid cell edges taken to be 6 km. The regular lattice therefore has 324 grid locations. The spatial–temporal process over this lattice is assumed to be a mean 0 process with a separable spatio–temporal covariance structure. After constructing the EOF matrix using the Gram–Schmidt method with a known diagonal matrix, the spatial covariance matrix is formed by their product as given by the spectral decomposition theorem. The temporal covariance function is simply that of a causal and invertible AR(1) process with variance parameter $\sigma_v^2$ and AR coefficient φ. The autocovariance function for the AR(1) process is given by

$$\gamma(h) = \begin{cases} \dfrac{\sigma_v^2}{1-\phi^2}, & h = 0, \\[2ex] \dfrac{\sigma_v^2}{1-\phi^2}\,\phi^{|h|}, & |h| \ge 1. \end{cases}$$

In this example, we choose two orthogonal vectors $G^{(1)}$ and $G^{(2)}$ to construct the EOFs using the Gram–Schmidt method. More specifically, for p = 4q, q = 1, 2, . . . , $G^{(1)}$ is a p–dimensional vector whose jth entry, $G^{(1)}_j$, is 1 for $j \in [1, p/2]$ and −1 for $j \in [p/2 + 1, p]$. To motivate the development of our simulation model, we think of surface temperature as our response of interest. This response would be strongly determined by latitude, high in summer in the northern hemisphere while low in the southern. We choose $G^{(1)}$ to represent that spatial pattern. We then choose $G^{(2)}$ to represent a High–Low elevation spatial pattern, such that $G^{(2)}$ is a p–dimensional vector obtained by repeating $(\mathbf{1}_{\sqrt{p}/2}', -\mathbf{1}_{\sqrt{p}/2}')$ $\sqrt{p}$ times. [Footnote 7: Here $\mathbf{1}_{\sqrt{p}/2}$ represents the $\sqrt{p}/2$–dimensional vector of 1s, and similarly for $-\mathbf{1}_{\sqrt{p}/2}$.] The orthogonal matrix constructed from these two orthogonal vectors, denoted by G, can be obtained by applying the Gram–Schmidt method in Lemma 6.2.1. The first four diagonal entries of the specified diagonal matrix $\Lambda^2$ are assumed to be 40, 20, 15, and 10. The remaining diagonal entries form a decreasing sequence whose sum is 15 and whose minimum, the last entry, is around 0.023.

Figures 6.1 and 6.2 plot the contours of the simulated data at t = 5 and t = 28, respectively, in the region we study. These graphs suggest that there may be North–South and High–Low spatial patterns in this field. Moreover, this spatio–temporal field varies both spatially and temporally.

To check the temporal variation in this simulated database, we randomly select four grid locations and plot their histograms, ACFs and PACFs in Figure 6.3. These graphs show a very strong autocorrelation in the time series at each of the four selected sites, as expected from the strong AR(1) temporal process in the data.

Results and comparisons

We first compute the classical EOFs and compare them with the true EOFs. Figure 6.4 plots the contours of the true EOFs. Clearly the first spatial pattern is the North–South pattern and the second is the High–Low elevation pattern, the principal determinants of surface temperature. Figure 6.5 plots the contours of the classical EOFs. Although the classical EOFs also seem to capture the first two types of spatial patterns, the estimated contours are clearly far from the truth. For example, the values of the first true EOF in the northern region are close to 0.35, while those of the first classical EOF are close to 0.6 in the northeastern region and 1.0 in the northwestern region. The same sort of thing happens for the second true
and classical EOFs. We conclude that the classical EOFs fail to capture the true spatial patterns in this field because of the field's high autocorrelation.

The next section provides an alternative for the case where the temporal covariance matrix, $\Sigma_T$, is known, a method leading to what we call "corrected" EOFs.

Figure 6.1: Contour plot for the simulated data at day t = 5 in the 18 × 18 grid locations. The AR coefficient in the simulated data is set to be φ = 0.9. (White = −4.0; Black = 4.0.)

Figure 6.2: Contour plot for the simulated data at day t = 28 in the 18 × 18 grid locations. The AR coefficient in the simulated data is set to be φ = 0.9. (White = −4.0; Black = 4.0.)

6.3 Corrected EOFs

This section presents an alternative to the classical EOFs of Section 6.2, given a known temporal covariance matrix. We call the EOFs so obtained corrected EOFs, meaning that the temporal dependence structure has been incorporated. Although assuming the temporal covariance $\Sigma_T$ to be known is unrealistic, the analysis exposes the potentially serious problem that the classical EOFs can cause.

Given the matrix–variate normal distribution $Y \sim N_{p \times n}(0, \Sigma_S \otimes \Sigma_T)$, we have $Y^* = Y\Sigma_T^{-1/2} \sim N_{p \times n}(0, \Sigma_S \otimes I_n)$, by a standard property of the matrix–variate normal distribution. It is then straightforward to estimate $\Sigma_S$ using $\frac{1}{n}\sum_{t=1}^{n} Y^*_t (Y^*_t)' = \frac{1}{n} Y^*(Y^*)'$. In other words,

$$\hat{\Sigma}_S = \frac{1}{n} Y \Sigma_T^{-1} Y', \qquad (6.5)$$

given the non–singular covariance matrix $\Sigma_T$. The corrected EOFs are then constructed using the spectral decomposition theorem, in the same way as the classical EOFs. To obtain unique EOFs, we restrict the eigenvectors to form the orthonormal matrix that has positive elements in its first row.

We now revisit the example in Section 6.2 but obtain the corrected EOFs
and compare them with the classical and true EOFs to see whether the corrected EOFs improve our knowledge about the principal spatial patterns in this simulated database.

Figure 6.3: Histograms (first row), ACFs (second row) and PACFs (third row) for the simulated data at four randomly selected sites in the region (Sites 98, 207, 316 and 323). The AR coefficient in the simulated data is set to be φ = 0.9.

Simulation study 1 (revisited)

Figures 6.4–6.6 show the first two true, classical and corrected EOFs, respectively. The first two true EOFs have an obvious spatial pattern, which is seen in the corrected EOFs but not in the classical EOFs. The results demonstrate the potential danger in using the classical EOFs when the spatio–temporal data are also highly temporally correlated.

Table 6.1 presents the percentage of spatial variation for the first 10 EOFs by the true, classical, and corrected methods. The corrected
EOFs are much closer to the true values than the classical ones. Table 6.2 presents the matrix discrepancies of the classical and corrected EOFs against the true EOFs. [Footnote 8: The matrix distance between two matrices, A and B, is defined through the inner product $\langle A, B \rangle = \mathrm{tr}(AB')$, that is, $\|A - B\| = \mathrm{tr}[(A - B)(A - B)']$. The matrix discrepancy between A and B is defined as $\sqrt{\|A - B\| / \|B\|}$.] Assuming the temporal covariance function is known, the corrected EOFs are closer to the true EOFs than the classical ones: the matrix discrepancy between the true and corrected EOFs is much smaller than that between the true and classical EOFs. Moreover, the corrected EOFs characterize the first two spatial patterns in this field better than the classical EOFs do.

Figure 6.4: Contour plots for the first two true EOF vectors: (a) 1st EOF; and (b) 2nd EOF. (White = −1.7; Black = 1.4.)

Figure 6.5: Contour plots for the first two classical EOF vectors: (a) 1st EOF; and (b) 2nd EOF. (White = −1.7; Black = 1.4.)

More generally, for an unknown temporal covariance structure, it is impossible to use the corrected EOFs to identify the principal spatial patterns in spatio–temporal fields. One may ask whether an accurate estimate of the spatial covariance matrix can be obtained from the observations, so that the EOFs better represent the principal spatial patterns. Moreover, how can one take account of the uncertainties in the estimates? To help answer those questions, we next propose a new Bayesian version of EOFs by adding a prior distribution for the spatial covariance matrix.
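The contrast between the classical estimate $\frac{1}{n}YY'$ and the corrected estimate (6.5) can be reproduced on a small scale. The sketch below is not the simulation of this chapter: it assumes a toy field with p = 5 sites, n = 400 time points, a hypothetical spatial covariance with eigenvalues 40, 20, 15, 10 and 5, an AR(1) temporal covariance with $\sigma_v^2 = 1$ and φ = 0.9, and a relative Frobenius error chosen here as the error measure.

```python
import numpy as np

def ar1_cov(n, phi, sigma2_v=1.0):
    """AR(1) covariance matrix with entries sigma2_v / (1 - phi^2) * phi^|h|."""
    idx = np.arange(n)
    lags = np.abs(idx[:, None] - idx[None, :])
    return sigma2_v / (1.0 - phi**2) * phi**lags

def classical_cov(Y):
    """Classical estimate (1/n) Y Y', which ignores temporal dependence."""
    return Y @ Y.T / Y.shape[1]

def corrected_cov(Y, Sigma_T):
    """Corrected estimate (1/n) Y Sigma_T^{-1} Y' of (6.5), for known Sigma_T."""
    return Y @ np.linalg.solve(Sigma_T, Y.T) / Y.shape[1]

def rel_error(A, B):
    """Relative Frobenius error of A against the reference matrix B."""
    return np.linalg.norm(A - B) / np.linalg.norm(B)

rng = np.random.default_rng(0)
p, n, phi = 5, 400, 0.9
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))          # toy orthonormal EOF basis
Sigma_S = Q @ np.diag([40.0, 20.0, 15.0, 10.0, 5.0]) @ Q.T
Sigma_T = ar1_cov(n, phi)

# Separable field: rows have covariance Sigma_S, columns have covariance Sigma_T.
L_S = np.linalg.cholesky(Sigma_S)
L_T = np.linalg.cholesky(Sigma_T)
Y = L_S @ rng.standard_normal((p, n)) @ L_T.T

err_classical = rel_error(classical_cov(Y), Sigma_S)
err_corrected = rel_error(corrected_cov(Y, Sigma_T), Sigma_S)
```

Under these assumptions the classical estimate has expectation $\gamma(0)\Sigma_S$ with $\gamma(0) = 1/(1 - \phi^2)$, so it is inflated by the temporal variance, whereas the corrected estimate is unbiased for $\Sigma_S$; `err_corrected` should therefore come out much smaller than `err_classical`.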
Figure 6.6: Contour plots for the first two corrected EOF vectors: (a) 1st EOF; and (b) 2nd EOF. (White = −1.7; Black = 1.4.)

Index of EOFs   True     Classical   Corrected
 1              40.000   40.833      39.507
 2              20.000   20.230      20.178
 3              15.000   15.890      15.072
 4              10.000    9.165      10.267
 5               0.023    0.544       0.189
 6               0.024    0.519       0.183
 7               0.024    0.481       0.178
 8               0.024    0.434       0.178
 9               0.024    0.431       0.173
10               0.024    0.405       0.171

Table 6.1: Percentage of spatial variation (%) for the first 10 EOFs by the true, classical, and corrected methods (ρ = 0.9).

Comparison            Matrix discrepancy
Classical vs. True    4.347
Corrected vs. True    0.137

Table 6.2: Matrix discrepancies for the classical and corrected EOFs against the true EOFs.

6.4 Bayesian EOFs

In practice, the corrected EOFs cannot be computed because the temporal dependence structure is unknown for the spatial–temporal matrix–variate response $Z : p \times n$, where p is the total number of spatial locations in the domain of interest and n is the total number of time points. Hence, Bayesian EOFs are proposed in this section as an alternative to the classical EOFs.

Bayesian EOFs: the underlying model

The Bayesian EOF model we adopt here is given by

$$Z = \mu \otimes 1_n' + \Phi X + \varepsilon, \qquad \varepsilon \sim N_{p \times n}(0, \Sigma_S \otimes \Sigma_\varepsilon) \qquad (6.6)$$

and

$$X \mid \theta \sim N_{p \times n}(0, I_p \otimes \Sigma_T), \qquad (6.7)$$

where we assume that both the spatial and temporal covariance matrices, $\Sigma_S$ and $\Sigma_T$, have full rank, that is, they are nonsingular. Moreover, we deal with two cases for the temporal covariance matrix $\Sigma_T$:

(i) It is random with an inverted Wishart distribution; and

(ii) It has a semi–parametric form, that is, $\Sigma_T = \sigma^2\rho(\cdot, \theta)$, with a known form for the temporal correlation $\rho(\cdot, \cdot)$ but unknown parameters $\sigma^2$ and θ.
We assume that the temporal correlation, $\rho(\cdot, \theta)$, decreases as the difference between two time points increases, and that $\rho(t_i - t_j, \theta) \to 1$ as $t_i - t_j \to 0$. In practice, we assume $\sigma^2 = 1$ because the scale is not identifiable in the Kronecker product. The semi–parametric form for $\rho(\cdot, \theta)$ is then estimable using the MCMC method. For simplicity, we assume no small–scale spatial variation and no measurement errors, that is, no ε term in (6.6), deferring such refinements to future work.

In (6.6), Φ can be treated as a constant matrix for a known $\Sigma_S$ by the Karhunen–Loève (KL) expansion. Given the positive definite spatial covariance matrix $\Sigma_S : p \times p$, we have the orthogonal matrix $O : p \times p$, with its first row being positive, and the diagonal matrix $\Lambda = \mathrm{diag}\{\lambda_1, \ldots, \lambda_p\} : p \times p$, with the $\lambda_j^2$ being the eigenvalues of $\Sigma_S$ and $\lambda_1 > \ldots > \lambda_p > 0$, such that $\Sigma_S^{-1} = O\Lambda^{-2}O'$. Hence, we have the KL expansion as follows:

$$\Sigma_S^{-1} = (\lambda_1^{-1}O^1, \ldots, \lambda_p^{-1}O^p)(\lambda_1^{-1}O^1, \ldots, \lambda_p^{-1}O^p)' = (\Phi_1, \ldots, \Phi_p)(\Phi_1, \ldots, \Phi_p)' = \sum_{j=1}^{p} \Phi_j\Phi_j' = \Phi\Phi',$$

where $\Phi_j = \lambda_j^{-1}O^j$, for j = 1, . . . , p, and $\Phi = (\Phi_1, \ldots, \Phi_p) : p \times p$.

However, when $\Sigma_S$ is a random matrix, the EOFs represented by the columns of Φ are also random. Moreover, the orthogonal matrix can be treated as either constant or random in the KL expansion. The distribution of the random orthogonal matrix was obtained by James (1954a) as an invariant uniform distribution on the Stiefel manifold (see Definition 6.4.2). He also obtained the independent distribution of the diagonal entries of $\Lambda^2$ in the KL expansion.

A Bayesian version of EOFs can then be obtained either by the MCMC method or through the empirical Bayes approach. The first level of a hierarchical model places no restriction on the form of our Bayesian EOFs, and so it is a nonparametric approach.
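The KL factorization $\Sigma_S^{-1} = \Phi\Phi'$ with $\Phi_j = \lambda_j^{-1}O^j$ can be checked numerically, and a random $\Sigma_S^{-1}$ then yields random EOFs. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the Wishart draw uses an integer degrees of freedom and the usual (degrees of freedom, scale) parameterization, which may differ from the convention intended in the thesis, and the sign rule fixes the first row of O to be non–negative.

```python
import numpy as np

def kl_factor(precision):
    """Return Phi with Phi @ Phi.T equal to `precision`, columns Phi_j = O_j / lambda_j."""
    mu, O = np.linalg.eigh(precision)         # ascending mu_j = lambda_j^(-2), so lambda_j descends
    O = O * np.where(O[0] < 0, -1.0, 1.0)     # sign convention: first row of O non-negative
    return O * np.sqrt(mu)                    # scale column j by 1 / lambda_j

def draw_wishart(df, scale, rng):
    """Wishart(df, scale) draw for integer df >= dimension, as a sum of outer products."""
    X = rng.multivariate_normal(np.zeros(scale.shape[0]), scale, size=df)
    return X.T @ X

# One random draw of the spatial precision matrix (toy hyperparameters).
rng = np.random.default_rng(2)
p = 4
precision = draw_wishart(df=p + 5, scale=np.eye(p) / (p + 5), rng=rng)
Phi = kl_factor(precision)
```

Repeating the last two lines over many draws would give a sample of random Φ matrices, which is the sense in which the EOFs inherit the randomness of $\Sigma_S$.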
When the prior for the purely spatial covariance matrix, $\Sigma_S$, has been determined, $\Phi$ can be obtained using the KL expansion and Lemma 6.4.1 below.

Therefore, the Bayesian EOF model (6.6)–(6.7) is completed by specifying prior distributions for the model parameters: $p(\mu)$, $p(\Sigma_T)$ for Case (i) [or $p(\theta)$ for Case (ii)] and $p(\Sigma_S)$. Lacking specific prior information, we assume $p(\mu) \propto 1$ and $\Sigma_S^{-1} \sim W_p(\delta_S, \Xi_S)$, with $\delta_S$ and $\Xi_S$ being hyperparameters.

For Case (i), we assume $\Sigma_T^{-1} \sim W_n(\delta_T, \Xi_T)$. The collection of hyperparameters can be denoted by $H_1 = \{\delta_S, \delta_T, \Xi_S, \Xi_T\}$. The joint posterior distribution we are interested in is given by $p(\mu, \Sigma_T, \Sigma_S \mid Y)$.

For Case (ii), we assume $\theta \sim N_k(\theta_0, \Sigma_0)$. The collection of hyperparameters here can be denoted by $H_2 = \{\delta_S, \Xi_S, \theta_0, \Sigma_0\}$. The joint posterior distribution we need for inference is then given by $p(\mu, \theta, \Sigma_S \mid Y)$.

In summary, we consider the following two Bayesian models to obtain the Bayesian EOFs, one for each of the two cases under consideration, in both of which we assume no measurement error ($\varepsilon$):

(i) The Bayesian model is given by

$$Z = \mu \otimes 1_n' + \Phi X \tag{6.8}$$
$$X \sim N_{p \times n}(0, I_p \otimes \Sigma_T) \tag{6.9}$$
$$\Phi = O \Lambda^{-1}, \tag{6.10}$$

where $\Sigma_S^{-1} = O \Lambda^{-2} O'$ by the above KL expansion. The priors for the model parameters are assumed mutually independent and are given as follows:

$$p(\mu) \propto 1 \tag{6.11}$$
$$\Sigma_S^{-1} \sim W_p(\delta_S, \Xi_S) \tag{6.12}$$
$$\Sigma_T^{-1} \sim W_n(\delta_T, \Xi_T). \tag{6.13}$$

(ii) The Bayesian model is given as in (6.8)–(6.10), but $\Sigma_T$ can be written as $\rho(\cdot, \theta)$, where $\rho(\cdot, \cdot)$ is the known temporal correlation matrix, decreasing as the difference between any two time points increases, and $\theta$ is an unknown parameter. The priors for $\mu$ and $\Sigma_S^{-1}$ are given in (6.11)–(6.12), respectively, and they are mutually independent. Model specification is completed with the prior for the parameter of the temporal covariance function, given by

$$\theta \sim N_k(\theta_0, \Sigma_0). \tag{6.14}$$

Notice that we assume both $\Sigma_S$ and $\Sigma_T$ are valid covariance matrices and nonsingular, that is, the rank of $\Sigma_S$ is $p$ and that of $\Sigma_T$ is $n$.

Decomposition of $Y \sim N_{p \times n}(0, \Sigma_S \otimes I_n)$

We first describe results from James (1954a), who discovers that the distribution of the mean 0, temporally independent random matrix $Y$ [that is, $Y \sim N_{p \times n}(0, \Sigma_S \otimes I_n)$] can be uniquely decomposed into three independent parts: one part being Wishart distributed, one part being uniformly distributed on a Grassmann manifold and the last part being uniformly distributed on a Stiefel manifold, that is, an orthogonal group in this setting. He constructs invariant measures on the orthogonal group (that is, the Haar measure) and on the Grassmann and Stiefel manifolds. James also finds the distribution of a non–central Wishart distribution using the Haar measure (1954b) and the distribution of the latent roots of a covariance matrix (1960).

We now introduce some basic notation from James (1954a) and Chikuse (2004).

Definition 6.4.1 The orthogonal group, $O(n)$, is the set of all $n \times n$ orthogonal matrices with the operation of matrix multiplication.

Definition 6.4.2 The Stiefel manifold, $V_{k,n} = \{V : n \times k;\ V'V = I_k\}$, is the set of all $n \times k$ matrices ($k \le n$) whose $k$ columns are orthonormal vectors in $\mathbb{R}^n$.

Definition 6.4.3 The Grassmann manifold, $G_{k,n-k}$, is the set of all $k$–dimensional hyperplanes in $\mathbb{R}^n$ that pass through the origin.

James states that “. . . the Grassmann and Stiefel manifolds may be regarded as coset spaces of the orthogonal group” (1954, p.63), an important property of their relationship. The main result from James is summarized in the following lemma.

Lemma 6.4.1 (James, 1954a) Suppose $Y = (y_1, \ldots, y_n) \sim N_{p \times n}(0, \Sigma_S \otimes I_n)$. Then we have

$$Y = OLP, \tag{6.15}$$

where $O : p \times p$ represents an orthogonal matrix that is uniformly distributed over the Grassmann manifold; $P : p \times n$, a semi–orthogonal matrix (i.e., $PP' = I_p$ in this case) that is uniformly distributed over the Stiefel manifold; and $L : p \times p$, a diagonal matrix with diagonal entries $\{l_1, \ldots, l_p\}$ such that $l_1^2, \ldots, l_p^2$ are the eigenvalues of $YY'$ with $l_1 > \ldots > l_p > 0$.

Mardia and Khatri (1977) develop the exact and asymptotic distributions for a random matrix uniformly distributed on a Stiefel manifold. They also discuss the matrix form of the von Mises–Fisher distribution on a Stiefel manifold. We will not discuss this application of James's results in this chapter but leave the construction of the posterior distributions on Stiefel and Grassmann manifolds for future research.

Theoretical results

We present the related inference for Bayesian EOFs in the following lemmas and theorems. All proofs in this section are given in Appendix C.1.

Lemma 6.4.1 and the KL expansion above provide the basis for the theoretical results needed to obtain Bayesian EOFs. Together they lead to the prior distribution in the special case $\Xi_S = I_p$, as shown in the following lemma.

Lemma 6.4.2 If $\Sigma_S^{-1} \sim W_p(n, I_p)$ for some $n \in \mathbb{Z}^+$, then by the KL expansion and Lemma 6.4.1, the $\lambda_j^{-2}$'s are mutually independently distributed with $\lambda_j^{-2} \sim \chi_n^2$, for $j = 1, \ldots, p$, and $O : p \times p$ is uniformly distributed on the Stiefel manifold.

Lemma 6.4.2 provides one way to sample the random matrix $\Sigma_S^{-1}$ from its prior distribution $W_p(\delta_S, I_p)$.

Theorem 6.4.1 Consider the data matrix $Y : p \times n \sim N_{p \times n}(0, \Sigma_S \otimes \Sigma_T)$. Given the nonsingular spatial and temporal covariance matrices $\Sigma_S$ and $\Sigma_T$, let $Y^* = Y \Sigma_T^{-1/2}$. Then $Y^* \sim N_{p \times n}(0, \Sigma_S \otimes I_n)$. Consequently, $Y^* = \Sigma_S^{1/2} OLP$, where $O$, $L$, and $P$ are given in Lemma 6.4.1. The Bayesian EOFs are then obtained as $W = \frac{1}{n} \Sigma_S^{1/2} OL$.

Theorem 6.4.2 Consider the data matrix $Y : p \times n \sim N_{p \times n}(0, \Sigma_S \otimes \Sigma_T)$. Suppose the temporal covariance matrix $\Sigma_T$ is nonsingular and known. Then we have $Y^* = Y \Sigma_T^{-1/2} \sim N_{p \times n}(0, \Sigma_S \otimes I_n)$.
Assume $\Sigma_S^{-1} \sim W_p(\delta_S, \Xi_S)$. The posterior distribution for the spatial precision matrix $\Sigma_S^{-1}$ is given as follows:

$$\Sigma_S^{-1} \mid Y \sim W_p(\delta_o, \Xi_o), \tag{6.16}$$

where $\delta_o = \delta_S + n$, and

$$\Xi_o = \Xi_S - \Xi_S Y (Y' \Xi_S Y + \Sigma_T)^{-1} Y' \Xi_S. \tag{6.17}$$

The Bayesian EOFs can be obtained when $\Sigma_S$ is estimated or sampled from its posterior distribution.

Consequently, from Theorem 6.4.2, we can either obtain the estimates for $\Sigma_S$ using an empirical method such as that of Sampson–Guttorp (SG) or sample it from its posterior distribution in (6.16). In other words, we can use either empirical Bayes or hierarchical Bayesian methods to obtain the estimates for the model parameters.

If the spatial covariance matrix were known, valid and nonsingular, we would have a similar result for the posterior distribution of the temporal covariance matrix, $\Sigma_T$. The next theorem gives its posterior distribution when the prior for $\Sigma_T^{-1}$ is assumed to be Wishart, that is, Case (i).

Theorem 6.4.3 Consider the data matrix $Y : p \times n \sim N_{p \times n}(0, \Sigma_S \otimes \Sigma_T)$. Suppose the spatial covariance matrix $\Sigma_S$ is known, valid and nonsingular. Assume the prior for $\Sigma_T^{-1}$ is $W_n(\delta_T, \Xi_T)$. Then the posterior distribution for $\Sigma_T^{-1}$ is given by

$$\Sigma_T^{-1} \mid Y \sim W_n(\delta^*, \Xi^*), \tag{6.18}$$

where $\delta^* = \delta_T + p$ and

$$\Xi^* = \Xi_T - \Xi_T Y' (Y \Xi_T Y' + \Sigma_S)^{-1} Y \Xi_T. \tag{6.19}$$

In Case (ii), where the temporal covariance matrix is assumed to have a known parametric form but unknown parameters, we can obtain the posterior distribution for these parameters given a valid and known spatial covariance matrix. This posterior distribution is given in the following theorem.

Theorem 6.4.4 Given the same conditions as in Theorem 6.4.3 and the fully Bayesian model in Case (ii) with the prior for $\theta$ in (6.14), the conditional posterior distribution for $\theta$ is given as follows:

$$p(\theta \mid Y) \propto \exp\left\{ -\frac{1}{2} \left[ \frac{1}{\sigma^2} \operatorname{tr}\left(V'V \rho(\cdot, \theta)^{-1}\right) + (\theta - \theta_0)' \Sigma_0^{-1} (\theta - \theta_0) \right] \right\}, \tag{6.20}$$

where $V = \Sigma_S^{-1/2} Y$.
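The scale matrices in (6.17) and (6.19) have the form produced by the Woodbury matrix identity, so they can be read as $(\Xi_S^{-1} + Y \Sigma_T^{-1} Y')^{-1}$ and $(\Xi_T^{-1} + Y' \Sigma_S^{-1} Y)^{-1}$, respectively. The sketch below checks this equivalence numerically. The dimensions, the random test matrices and the function names are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 4, 6                                   # illustrative dimensions (assumed)


def random_spd(dim, rng):
    """Build an arbitrary symmetric positive definite test matrix."""
    a = rng.standard_normal((dim, dim))
    return a @ a.T + dim * np.eye(dim)


def scale_as_stated(xi, y, sigma_other):
    """Xi - Xi y (y' Xi y + Sigma)^{-1} y' Xi, the form used in (6.17) and (6.19)."""
    inner = y.T @ xi @ y + sigma_other
    return xi - xi @ y @ np.linalg.solve(inner, y.T @ xi)


def scale_direct(xi, y, sigma_other):
    """(Xi^{-1} + y Sigma^{-1} y')^{-1}, the equivalent Woodbury form."""
    return np.linalg.inv(np.linalg.inv(xi) + y @ np.linalg.solve(sigma_other, y.T))


xi_s, sigma_s = random_spd(p, rng), random_spd(p, rng)
xi_t, sigma_t = random_spd(n, rng), random_spd(n, rng)
Y = rng.standard_normal((p, n))

# (6.17): spatial scale matrix, with Sigma_T treated as known.
xi_o = scale_as_stated(xi_s, Y, sigma_t)
xi_o_direct = scale_direct(xi_s, Y, sigma_t)

# (6.19): temporal scale matrix, with Sigma_S treated as known (pass Y transposed).
xi_star = scale_as_stated(xi_t, Y.T, sigma_s)
xi_star_direct = scale_direct(xi_t, Y.T, sigma_s)
```

The subtractive form as stated inverts an $n \times n$ matrix for (6.17) and a $p \times p$ matrix for (6.19), whereas the direct form inverts matrices of the other dimension. Which is cheaper therefore depends on whether there are more sites or more time points.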
In practice, both the temporal and spatial covariance matrices are unknown, and so the conditions in Theorems 6.4.1 and 6.4.4 do not hold. Although the nonsingularity condition for these two covariance matrices might be difficult to verify because of challenging numerical problems, we need the condition to obtain the posterior samples for both matrices in this chapter. Future research will be devoted to improving these results.

To obtain posterior samples for both parameters in the MCMC framework, we can use either a mixture of MCMC and empirical Bayes methods or pure MCMC runs. We illustrate the algorithm for both cases next. The idea is to obtain the conditional posterior samples for $\mu$, $\Sigma_S$ and $\Sigma_T$ in turn. The empirical Bayes method will be used to obtain the estimate for $\Sigma_S$ given the data matrix, $\mu$, and $\Sigma_T$. We use the Gelman and Rubin $R$ statistic to check the convergence of the Markov chains (Gelman et al., 2004, p. 296–297). The estimates for the model parameters are then obtained as the means of the posterior samples after the burn–in period.

We next develop the posterior conditional distributions for the model parameters for Cases (i) and (ii). The Bayesian EOFs can be obtained by Theorem 6.4.1, in which the mean field and the spatial and temporal covariance matrices are estimated from their posterior distributions or by the empirical Bayes method.

Posterior conditional distributions

The posterior conditional distributions for $\mu$, $\Sigma_S^{-1}$, and $\Sigma_T^{-1}$ can be obtained for Cases (i) and (ii) in the fully Bayesian framework. The proofs of the theorems in this section are presented in Appendix C.1. We first present the conditional posterior distributions for the model parameters for Case (i) in the following theorem.
Theorem 6.4.5 Given the Bayesian hierarchical model in (6.8)–(6.13), the posterior conditional distributions for the model parameters are given as follows:

(i) The conditional posterior distribution for $\mu$ is given by

$$\mu \mid Z, \Sigma_S, \Sigma_T \sim N_p(M, \Sigma^* \Sigma_S), \tag{6.21}$$

where $\Sigma^* = \{\operatorname{tr}(1_n 1_n' \Sigma_T^{-1})\}^{-1}$ and $M = Z \Sigma_T^{-1} 1_n \Sigma^*$.

(ii) The conditional posterior distribution for $\Sigma_S^{-1}$ is given by

$$\Sigma_S^{-1} \mid Z, \mu, \Sigma_T \sim W_p(\delta_1, \Xi_1), \tag{6.22}$$

where $\delta_1 = \delta_S + n$,

$$\Xi_1 = \Xi_S - \Xi_S Y (Y' \Xi_S Y + \Sigma_T)^{-1} Y' \Xi_S, \tag{6.23}$$

and $Y = Z - \mu \otimes 1_n'$.

(iii) The conditional posterior distribution for $\Sigma_T^{-1}$ is given by

$$\Sigma_T^{-1} \mid Z, \mu, \Sigma_S \sim W_n(\delta_2, \Xi_2), \tag{6.24}$$

where $\delta_2 = \delta_T + p$,

$$\Xi_2 = \Xi_T - \Xi_T Y' (Y \Xi_T Y' + \Sigma_S)^{-1} Y \Xi_T, \tag{6.25}$$

and $Y = Z - \mu \otimes 1_n'$.

In the same way, the posterior conditional distributions for the model parameters for Case (ii) are obtained in the next theorem.

Theorem 6.4.6 Given the Bayesian hierarchical model in (6.8)–(6.12) and (6.14), the posterior conditional distributions for $\mu$ and $\Sigma_S^{-1}$ are given in Theorem 6.4.5. Let $V = \Sigma_S^{-1/2}(Z - \mu \otimes 1_n')$. Then the posterior conditional distribution for $\theta$ is given by

$$p(\theta \mid Z, \mu, \Sigma_S) \propto \exp\left\{ -\frac{1}{2} \left[ \operatorname{tr}\left(V'V \rho(\cdot, \theta)^{-1}\right) + (\theta - \theta_0)' \Sigma_0^{-1} (\theta - \theta_0) \right] \right\}. \tag{6.26}$$

After obtaining the estimates for the spatial and temporal covariance matrices, as well as the other model parameters for both cases, the Bayesian EOFs can then be obtained by Theorem 6.4.1, Lemma 6.4.1 and the KL expansion. Next we illustrate how the MCMC algorithm can be used to obtain posterior samples from the joint posterior distributions $p(\mu, \Sigma_S^{-1}, \Sigma_T^{-1} \mid Z)$ and $p(\mu, \Sigma_S^{-1}, \theta \mid Z)$ for Cases (i) and (ii), respectively.

MCMC algorithms

To obtain posterior samples from the joint posterior distribution of the model parameters $\mu$, $\Sigma_S$ and $\Sigma_T$ (or $\theta$), an MCMC algorithm is used. For Case (i), we use the Gibbs sampling method based on the full conditional distributions obtained above.
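To make the Gibbs scheme for Case (i) concrete, here is a minimal sketch of repeated sweeps through the three full conditionals of Theorem 6.4.5. It is an illustration under stated assumptions rather than the thesis implementation. It assumes that $W(\delta, \Xi)$ is the Wishart distribution with integer degrees of freedom $\delta$ and scale matrix $\Xi$, reads $\Sigma^*$ in (6.21) as the scalar $(1_n' \Sigma_T^{-1} 1_n)^{-1}$, and writes the scale matrices of (6.23) and (6.25) in the algebraically equivalent inverse form. The names `sample_wishart`, `gibbs_sweep` and `hyper`, and all numerical values, are hypothetical.

```python
import numpy as np


def sample_wishart(df, scale, rng):
    """Draw from Wishart(df, scale) for integer df as a sum of df outer products."""
    chol = np.linalg.cholesky(scale)
    x = chol @ rng.standard_normal((scale.shape[0], df))
    return x @ x.T


def symmetrize(a):
    """Remove rounding asymmetry so that Cholesky factorization stays stable."""
    return 0.5 * (a + a.T)


def gibbs_sweep(Z, sigma_s, sigma_t, hyper, rng):
    """One pass through the conditionals for mu, Sigma_S^{-1} and Sigma_T^{-1}."""
    p, n = Z.shape
    ones = np.ones(n)

    # (1) mu | Z, Sigma_S, Sigma_T ~ N_p(M, Sigma* Sigma_S), as in (6.21).
    w = np.linalg.solve(sigma_t, ones)            # Sigma_T^{-1} 1_n
    s_star = 1.0 / (ones @ w)                     # scalar Sigma* (assumed reading)
    m = (Z @ w) * s_star
    mu = m + np.linalg.cholesky(s_star * sigma_s) @ rng.standard_normal(p)
    Y = Z - mu[:, None]                           # anomalies Z - mu (x) 1_n'

    # (2) Sigma_S^{-1} | Z, mu, Sigma_T ~ W_p(delta_S + n, Xi_1), as in (6.22).
    xi1 = np.linalg.inv(np.linalg.inv(hyper["xi_s"]) + Y @ np.linalg.solve(sigma_t, Y.T))
    prec_s = sample_wishart(hyper["delta_s"] + n, symmetrize(xi1), rng)
    sigma_s = symmetrize(np.linalg.inv(prec_s))

    # (3) Sigma_T^{-1} | Z, mu, Sigma_S ~ W_n(delta_T + p, Xi_2), as in (6.24).
    xi2 = np.linalg.inv(np.linalg.inv(hyper["xi_t"]) + Y.T @ np.linalg.solve(sigma_s, Y))
    prec_t = sample_wishart(hyper["delta_t"] + p, symmetrize(xi2), rng)
    sigma_t = symmetrize(np.linalg.inv(prec_t))
    return mu, sigma_s, sigma_t


# Toy run with made-up data and hyperparameters (all values are assumptions).
rng = np.random.default_rng(42)
p, n = 3, 5
Z = rng.standard_normal((p, n)) + np.array([[1.0], [2.0], [3.0]])
hyper = {"delta_s": p + 2, "xi_s": np.eye(p), "delta_t": n + 2, "xi_t": np.eye(n)}

# Initialize the two covariance matrices from their priors.
sigma_s = symmetrize(np.linalg.inv(sample_wishart(hyper["delta_s"], hyper["xi_s"], rng)))
sigma_t = symmetrize(np.linalg.inv(sample_wishart(hyper["delta_t"], hyper["xi_t"], rng)))

draws = []
for _ in range(20):
    mu, sigma_s, sigma_t = gibbs_sweep(Z, sigma_s, sigma_t, hyper, rng)
    draws.append((mu, sigma_s, sigma_t))
```

A real run would use many more sweeps, discard a burn–in period and monitor convergence. The toy loop only shows the order of the updates and the shapes involved.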
For Case (ii), a Metropolis–within–Gibbs algorithm is used because the posterior conditional distribution for $\theta$ does not have a closed form.

Algorithm 6.4.1 For Case (i), Gibbs sampling can be used to draw the posterior samples from $p(\mu, \Sigma_S, \Sigma_T \mid Z)$:

1. Initialization: set $\mu^{(1)} = Z_{\mathrm{row}}$, sample $\Sigma_S^{-1\,(1)} \sim W_p(\delta_S, \Xi_S)$, and $\Sigma_T^{-1\,(1)} \sim W_n(\delta_T, \Xi_T)$.

2. Given the $(j-1)$th values, $\mu^{(j-1)}$, $\Sigma_S^{-1\,(j-1)}$, $\Sigma_T^{-1\,(j-1)}$, and $Z$:

(1) Sample $\mu^{(j)}$ from $p(\mu \mid Z, \Sigma_S^{(j-1)}, \Sigma_T^{(j-1)})$ using (6.21).

(2) Sample $\Sigma_S^{-1\,(j)}$ from $p(\Sigma_S^{-1} \mid Z, \mu^{(j)}, \Sigma_T^{(j-1)})$ using (6.22).

(3) Sample $\Sigma_T^{-1\,(j)}$ from $p(\Sigma_T^{-1} \mid Z, \mu^{(j)}, \Sigma_S^{(j)})$ using (6.24).

3. Repeat until convergence.

The Metropolis–within–Gibbs algorithm is omitted here because we present a similar result in Chapter 2. In this section, we describe only the very special case in which the temporal process is assumed to be an AR(1) process. Hence we have $\phi$ as the parameter that characterizes the AR(1) process. Assume that $\phi \sim N(\phi_0, \sigma^2_{\phi_0})$. Then the collection of hyperparameters can be denoted by $H = \{\phi_0, \sigma^2_{\phi_0}, \delta_S, \Xi_S\}$. Since there is no closed form for the posterior conditional distribution in (6.26), the Metropolis–within–Gibbs algorithm can then be used to draw the posterior samples of interest.

The next section includes a straightforward extension of the Bayesian EOF results in which the spatial covariance is assumed to have a GIW distribution instead of an IW distribution (see Le and Zidek (2006), for example).

6.5 Extension to the Bayesian EOFs

We can extend the above results about Bayesian EOFs to incorporate the GIW prior for the spatial covariance structure in such a way that the GIW prior reflects some characteristics of the data matrix. Le & Zidek (1992–2006) develop theoretical results for modelling spatio–temporal processes, i.e., the BSP approach in Chapter 4.

In such a Gaussian GIW framework, the spatial covariance matrix $\Sigma_S$ can be estimated by the SG–method or the Damian SG–method (Damian et al., 2002). Therefore, the estimates for $\Sigma_S$ can be updated at each iteration of the MCMC sampling. This will be carried out in future work.

The next section includes two simulation examples that help assess the performance of the Bayesian, classical and corrected EOFs.

6.6 Simulation Study 2

The Bayesian EOF models we consider above have a very general structure. Note that $\varepsilon$ in (6.6) represents small scale spatial variation or measurement error. If $\varepsilon$ is close to 0, or equivalently $\Sigma_\varepsilon$ is very small, we then have approximately $Z = \mu \otimes 1_n' + \Phi X$, where the columns of $\Phi$ are the EOFs for $\Sigma_S$ and $X \mid \theta \sim N_{p \times n}(0, I_p \otimes \Sigma_T(\theta))$. If $\Sigma_T(\theta) = I_n$, then we have the classical EOFs in Section 6.2. If $\Sigma_T(\theta) \neq I_n$ but is known, we then have the corrected EOFs in Section 6.3. If $\Sigma_T(\theta)$ is unknown, we can use the Bayesian EOFs obtained in Section 6.4.

The objective in this example is to compare the three different types of EOFs for a separable state–space process with known spatial and temporal covariance matrices. To do that, we first simulate the matrix–variate data set. We then compute these three EOFs and compare them with the true EOFs by contour plots and the matrix discrepancies.

To briefly review these three types of EOFs, suppose $Y : p \times n$ represents the anomaly matrix for $p$ sites and $n$ time points, and follows a multivariate normal distribution $N_{p \times n}(0, \Sigma_S \otimes \Sigma_T)$. Recall that the classical EOFs estimate the sample spatial covariance matrix by $\frac{1}{n} YY'$. Given the temporal dependence structure, that is, the temporal covariance matrix, the corrected EOFs estimate the sample spatial covariance matrix by $Y \Sigma_T^{-1} Y'$.
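The two estimators just recalled can be compared on synthetic data with a few lines of NumPy. The sketch below is illustrative only. The dimensions, the random $\Sigma_S$ and the AR(1)–type correlation $0.9^{|t-s|}$ are assumptions, the matrix normal draw uses a Cholesky factor of $\Sigma_T$ for convenience, and the corrected estimator is divided by $n$ so that both estimates are on the same scale (a rescaling that leaves the EOFs and the percentages unchanged).

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 5, 400                                     # illustrative sizes (assumed)

# Assumed "true" covariances: arbitrary SPD Sigma_S and AR(1)-type Sigma_T.
a = rng.standard_normal((p, p))
sigma_s = a @ a.T + p * np.eye(p)
t = np.arange(n)
sigma_t = 0.9 ** np.abs(t[:, None] - t[None, :])

# Simulate Y = Phi X with Phi Phi' = Sigma_S and X ~ N(0, I_p (x) Sigma_T).
eigvals, O = np.linalg.eigh(sigma_s)
Phi = O * np.sqrt(eigvals)                        # column j scaled by sqrt(eigenvalue j)
X = rng.standard_normal((p, n)) @ np.linalg.cholesky(sigma_t).T
Y = Phi @ X


def eofs(cov):
    """Return EOFs (eigenvectors) and % of variation, largest eigenvalue first."""
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    return vecs, 100.0 * vals / vals.sum()


classical_cov = Y @ Y.T / n                       # ignores temporal dependence
corrected_cov = Y @ np.linalg.solve(sigma_t, Y.T) / n   # uses the known Sigma_T

true_eofs, true_pct = eofs(sigma_s)
classical_eofs, classical_pct = eofs(classical_cov)
corrected_eofs, corrected_pct = eofs(corrected_cov)

# Relative Frobenius error of the corrected covariance estimate.
rel_err = np.linalg.norm(corrected_cov - sigma_s) / np.linalg.norm(sigma_s)
```

The percentage vectors can then be tabulated side by side in the same way as the tables of this chapter. Eigenvectors are defined only up to sign, so a sign convention should be fixed before the EOFs themselves are compared.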
Given the priors for the spatial and temporal covariance matrices, the Bayesian EOFs estimate the sample spatial covariance matrix in the hierarchical model by means of the corresponding posterior mode.

We consider two cases in this section to assess the performance of the EOFs for two different temporal dependence structures. For both cases, we assume a separable space–time covariance structure, that is, an exponential spatial covariance function and an AR(1) temporal covariance function. In particular, the spatial covariance function is given by

$$(\Sigma_S)_{ij} = \exp(-V_{ij}/\lambda), \tag{6.27}$$

where $V_{ij}$ is the Euclidean distance between $s_i$ and $s_j$, for $i, j = 1, \ldots, p$, and $\lambda$ is a scale parameter. The temporal covariance function between $t_k$ and $t_l$ is given by $\sigma_v^2 \phi^{2\|t_k - t_l\|}$, for $t_k, t_l \in \{1, \ldots, T\}$. Note that $|\phi| < 1$ indicates a causal AR(1) process. If $\phi \simeq 0$, then the $y_t$ are approximately independent; if $\phi \simeq 1$, then $\{y_t : t = 1, \ldots, n\}$ is a highly autocorrelated AR(1) process. We consider Case (i), $\phi = 0.1$, and Case (ii), $\phi = 0.9$.

The geographical region in this simulation study is set to be $[0.1, 1.0] \times [0.1, 1.0]$. We select 100 grid points in this region to be the locations of interest, i.e., $p = 100$. We then choose $n = 120$ time points at each of these 100 sites. The initial settings for the separable space–time covariance functions are as follows: $\lambda = 0.4$, $\sigma_v^2 = 1.0$, and $\phi = 0.1$ for Case (i) and $0.9$ for Case (ii).

We define $Y(s, t) = Z(s, t) - \hat{\mu}(s)$, the anomaly at site $s$ and time $t$, where $\hat{\mu}(s) = \frac{1}{n} \sum_{t=1}^{n} Z(s, t)$. We obtain the classical, corrected, and Bayesian EOFs using $Y(s, t)$.

We now compare the corrected, classical and “true” EOFs in these two cases: (i) $\phi = 0.1$, and (ii) $\phi = 0.9$, respectively.

Simulated data

Suppose $Y : p \times n \sim N_{p \times n}(0, \Sigma_S \otimes \Sigma_T)$. One way to generate the simulated data is by first simulating $Y^* = Y \Sigma_T^{-1/2} = (y_1^*, \ldots, y_n^*)$, which is consistent with Theorem 6.4.1. Thus, $y_t^* \sim N_p(0, \Sigma_S)$, independently for $t = 1, \ldots, n$. We then generate $Y$ by $Y^* \Sigma_T^{1/2}$.

An alternative way to obtain the simulated data uses James's result and the KL expansion. Given both $\Sigma_S$ and $\Sigma_T$, we first illustrate how to generate the simulated data in any given region. Given the spatial covariance matrix $\Sigma_S$, the Karhunen–Loeve expansion gives the unique orthogonal matrix $O : p \times p$ with its first row being positive and the unique diagonal matrix $\Lambda^2 : p \times p$ with its decreasing diagonal entries being the eigenvalues of $\Sigma_S$, such that

$$\Sigma_S = O \Lambda^2 O' = (\Lambda O')' \Lambda O' = (\lambda_1 O^{(1)}, \ldots, \lambda_p O^{(p)})(\lambda_1 O^{(1)}, \ldots, \lambda_p O^{(p)})' = (\Phi^{(1)}, \ldots, \Phi^{(p)})(\Phi^{(1)}, \ldots, \Phi^{(p)})' = \sum_{j=1}^{p} \Phi^{(j)} (\Phi^{(j)})' = \Phi \Phi', \tag{6.28}$$

where $\Phi = (\Phi^{(1)}, \ldots, \Phi^{(p)}) : p \times p$ and $\Phi^{(j)} = \lambda_j O^{(j)}$ for $j = 1, \ldots, p$.

Given $\Sigma_T : n \times n$, the corresponding Karhunen–Loeve expansion is given by

$$\Sigma_T = P L^2 P' = \sum_{j=1}^{n} \Psi^{(j)} (\Psi^{(j)})' = \Psi \Psi', \tag{6.29}$$

where $P : n \times n$ represents the orthogonal matrix with its first row positive; $L^2$ is a diagonal matrix with decreasing but positive diagonal entries $l_1^2, \ldots, l_n^2$; $\Psi = (\Psi^{(1)}, \ldots, \Psi^{(n)})$, and $\Psi^{(i)} = l_i P^{(i)}$ for $i = 1, \ldots, n$.

Consequently, if $Y : p \times n \sim N_{p \times n}(0, \Sigma_S \otimes \Sigma_T)$ and $\Xi \sim N_{p \times n}(0, I_p \otimes I_n)$, then $Y = \Phi \Xi \Psi'$, where $\Phi$ and $\Psi$ are given by (6.28) and (6.29), respectively. Let $\Xi^* = \Xi \Psi'$. Then $\Xi^* \sim N_{p \times n}(0, I_p \otimes \Sigma_T)$, and $Y = \Phi \Xi^*$.

In short, the simulated data can be generated, for known spatial and temporal covariance matrices, as follows:

(i) Uniquely obtain $\Phi$ and $\Psi$ as in (6.28) and (6.29) for known $\Sigma_S$ and $\Sigma_T$, respectively.

(ii) Generate $\Xi \sim N_{p \times n}(0, I_p \otimes I_n)$ as $n$ independent samples from the multivariate normal distribution $N_p(0, I_p)$.

(iii) Obtain the simulated data matrix $Y : p \times n$ by $\Phi \Xi \Psi'$.

Results for Case (i): ρ = 0.1

Table 6.3 gives, for each spatial pattern found by the EOFs, its spatial variance as a percentage of the total spatial variance.
It shows that these percentages for the true, classical and corrected EOFs are quite close to each other. Table 6.4 presents the matrix discrepancies of the corrected and classical EOFs relative to the true EOFs. These two matrix discrepancies are close to each other, but the classical one is slightly better than the corrected one.

Index of EOFs   True     Classical   Corrected
1               33.584   31.913      31.883
2               11.122   13.476      13.457
3               11.122    8.963       8.996
4                5.108    6.193       6.201
5                3.853    4.732       4.747
6                3.638    3.467       3.467
7                2.274    2.733       2.735
8                2.274    2.333       2.329
9                1.537    1.946       1.945
10               1.537    1.719       1.724

Table 6.3: Percentage of spatial variation (%) for the first 10 EOFs by the true, classical, and corrected methods (ρ = 0.1).

Comparison           Matrix discrepancy
Classical vs. True   0.230
Corrected vs. True   0.233

Table 6.4: Matrix discrepancies for the classical and corrected EOFs against the true EOFs (ρ = 0.1).

Figures 6.7–6.9 plot the contours for the three types of EOFs. In Figure 6.7, the first EOF in (a) shows an ellipsoidal spatial pattern in this field, with its mode at the center of this region. The second EOF in (b) shows a north–east to south–west shifting spatial pattern with the negative mode at the north–east corner and the positive one at the south–west corner. The third EOF in (c) shows an approximately opposite spatial pattern to that in (b). The remaining three EOFs in (d)–(f) also represent certain spatial patterns in this field. The figures show the classical EOFs to be quite similar to the corrected ones because of the very small value of φ. Moreover, the ratio of matrix discrepancy between the classical and true EOFs matrix is 0.25, while that between the corrected and true ones is also 0.25. This verifies our expectation that both EOFs work quite well since the “true” data are approximately independent samples across the overall locations. Classical EOFs are supposed to capture the main types of spatial patterns in this case.

[Figure 6.7: Contour plots for the first 6 true EOFs (ρ = 0.1): (a) – 1st EOF; (b) – 2nd EOF; (c) – 3rd EOF; (d) – 4th EOF; (e) – 5th EOF; and (f) – 6th EOF. Axes: Longitude, Latitude. (White=-0.6; Black=0.9.)]

[Figure 6.8: Contour plots for the first 6 classical EOFs (ρ = 0.1): (a) – 1st EOF; (b) – 2nd EOF; (c) – 3rd EOF; (d) – 4th EOF; (e) – 5th EOF; and (f) – 6th EOF. Axes: Longitude, Latitude. (White=-0.6; Black=0.9.)]

Results for Case (ii): ρ = 0.9

Table 6.5 lists the percentages of the spatial variation for each of the three types of EOFs: true, classical, and corrected. This table shows that the corrected EOFs give more accurate estimates of the majority of the diagonal elements of Λ in the KL expansion. Table 6.6 presents the matrix discrepancies between the corrected and classical EOFs and the true EOFs.
It shows that the matrix discrepancy of the corrected EOFs against the true ones is much smaller than that of the classical EOFs against the true ones, which implies that the corrected EOFs are more accurate than the classical ones.

Index of EOFs   True     Classical   Corrected
1               33.584   33.572      37.702
2               11.122   20.789      11.868
3               11.122    9.660      11.005
4                5.108    7.835       6.204
5                3.853    4.341       4.344
6                3.638    3.302       3.405
7                2.274    2.870       2.038
8                2.274    2.287       1.910
9                1.537    1.746       1.521
10               1.537    1.583       1.460

Table 6.5: Percentage of spatial variation (%) for the first 10 EOFs by the true, classical, and corrected methods (ρ = 0.9).

Comparison           Matrix discrepancy
Classical vs. True   4.978
Corrected vs. True   0.332

Table 6.6: Matrix discrepancies for the classical and corrected EOFs against the true EOFs (ρ = 0.9).

[Figure 6.9: Contour plots for the first 6 corrected EOFs (ρ = 0.1): (a) – 1st EOF; (b) – 2nd EOF; (c) – 3rd EOF; (d) – 4th EOF; (e) – 5th EOF; and (f) – 6th EOF. Axes: Longitude, Latitude. (White=-0.6; Black=0.9.)]

Figures 6.10–6.11 present the first six classical and corrected EOFs for this case. Comparing them with Figure 6.7, it is obvious that the corrected EOFs estimate the main types of spatial patterns better than the classical ones. Moreover, the ratio of matrix discrepancy between the classical and true EOFs matrix is 5.79, while that between the corrected and true ones is 0.21. This shows that the classical EOFs are far from the “truth” for the highly autocorrelated database.

[Figure 6.10: Contour plots for the first 6 classical EOFs (ρ = 0.9): (a) – 1st EOF; (b) – 2nd EOF; (c) – 3rd EOF; (d) – 4th EOF; (e) – 5th EOF; and (f) – 6th EOF. Axes: Longitude, Latitude. (White=-1.6; Black=2.2.)]

[Figure 6.11: Contour plots for the first 6 corrected EOFs (ρ = 0.9): (a) – 1st EOF; (b) – 2nd EOF; (c) – 3rd EOF; (d) – 4th EOF; (e) – 5th EOF; and (f) – 6th EOF. Axes: Longitude, Latitude. (White=-1.6; Black=2.2.)]

6.7 Conclusions

We have proposed the Bayesian EOF (BEOF) approach in this chapter and have presented the corresponding theoretical results as well as the MCMC algorithm for obtaining posterior samples of the model parameters. We have shown that the corrected EOF method can be used to obtain a better representation of the principal spatial patterns than the classical EOF method in two simulation studies. From these, we conclude that the classical EOF may lead to severe problems for a highly temporally correlated space–time process. The corrected EOF greatly improves on the performance of the classical EOF and captures the principal spatial patterns better than the classical one. The implementation of the BEOF method is left for future work, as are the comparisons among the BEOFs, corrected and classical EOFs.

Notice that the BEOF differs from the Bayesian factor analysis proposed by Aguilar and West (2000). In the Bayesian factor analysis they define, PCA is used for the time–varying $\Sigma_S$, and an MCMC algorithm has to be used to draw posterior samples for the Bayesian factors, which is computationally costly. We therefore build the BEOF into an extension of the BSP for modelling univariate or multivariate responses in spatio–temporal fields, which we call the generalized Bayesian spatial prediction (GBSP) method.

Chapter 7

An Extension of the BSP: Bayesian Spatio–Temporal Models

This chapter proposes an extension of the BSP approach for modelling air pollution data in large spatial–temporal domains, such as the AQS database we discussed in Chapters 2 and 3. The motivation for this extension has been addressed in previous studies. We extend the BSP approach because of its computational efficiency and because it performs better than the DLM in spatial interpolation and temporal prediction. We integrate the BSP into the DLM framework because the DLM has a fairly flexible structure for temporal prediction at ungauged and gauged locations and can handle missing observations at gauged sites.

Our proposed model can deal with two types of covariates in this field: time–varying but site–invariant covariates, and site–specific and time–varying covariates.
We decompose the underlying space–time processes into three parts: a long–term spatial–temporal term, a short–term main spatial pattern and a short–term spatial–temporal term. We also incorporate Bayesian EOFs into the new model so we can model gridded data, such as the MAQSIP (Multiscale Air Quality Simulation Platform) simulated data. We also extend the univariate Bayesian spatio–temporal model to the multivariate case. Moreover, we summarize the MCMC method that allows us to draw MC samples from the joint posterior distribution of the model parameters. Hence, we are able to interpolate or predict the univariate (or multivariate) pollutant(s) in large spatio–temporal domains.

Section 7.1 introduces related research in this field. Section 7.2 proposes a univariate Bayesian spatio–temporal model, an extension of the BSP approach, to model spatio–temporal processes over large domains. Section 7.3 discusses the relationships between our model and some others, such as the DLM proposed by Huerta et al. (2004), the state–space model (SSM) by Wikle and Cressie (1999), Gelfand et al. (2005), and Le–Zidek's approach (1992). Section 7.4 presents the multivariate Bayesian spatio–temporal model. Section 7.5 demonstrates the use of the MCMC algorithm to draw samples from the joint posterior distribution of the model parameters. Section 7.6 summarizes the advantages of the Bayesian spatio–temporal models we propose in this chapter and the remaining future work for our models.

7.1 Introduction

In large spatial–temporal domains, computational efficiency often emerges as a major difficulty in modelling high–dimensional data. This computational problem has been addressed by many approaches. Mardia et al. (1998) propose the kriged Kalman filter (KKF) approach, in which the mean structure in the observation equation is formed by a finite number of common fields.
However, in the discussion following Mardia et al.'s paper, Cressie and Wikle (1998) criticize the KKF approach for its oversmoothing predictor and its lack of a specific prior structure for some of the model parameters, in comparison with the SSM proposed by Wikle and Cressie (1999), a fully hierarchical Bayesian approach. They use EOFs based on orthogonal basis functions to capture the main spatial patterns in spatial–temporal fields and so tackle the curse of dimensionality. However, Wikle and Cressie's SSM can only incorporate an AR(1) process, and so would be inapplicable to more general structures of autocorrelated residuals after removing the long–term spatial–temporal component from the observation model.

Gelfand et al. (2005) develop a dynamic spatio–temporal model using the linear model of coregionalization (LMC) to incorporate different correlation structures for multivariate responses. However, their model is not applicable to autocorrelated detrended residuals. A similar approach can also be seen in Lee and Ghosh (2005). Johannesson et al. (2007) propose dynamic multi–resolution spatial models (MRSM), which extend the MRSM to incorporate dynamic temporal features. Moreover, the spatial domain considered in their approach is recursively partitioned with a “tree–structured parent–children relationship between cells (pixels) at adjacent resolution”. Lopes et al. (2007) propose a spatial dynamic factor analysis that models the common factors as a latent process with unknown finite dimension, much smaller than the total number of sites. The forward–filtering–backward–sampling method is then applied to the latent process to obtain the posterior samples for the state parameters in the latent process.

For other work related to geostatistics, see Wackernagel (1998). A fully Bayesian approach to kriging has been investigated by Handcock and Stein (1993).
They state that ordinary kriging can be viewed as Bayesian kriging under a non–informative prior for the mean. In their approach, they incorporate the unknown covariance structure in a Bayesian framework. The Bayesian kriging approach takes into account more of the uncertainty about the unknown covariance structure and so is capable of quantifying the performance of the estimated kriging predictor.

Le and Zidek developed their approach in accord with a specific set of desiderata, one of which was computational feasibility. Moreover, their applications were commonly made to relatively compact regions such as urban areas. These two factors led them to use conventional EDA and time series approaches to initially remove regional temporal components by fitting identical model parameters to all sites, making the standard errors of the resulting estimates negligible. In one case where their method was employed to cover a very large spatial domain (Fu et al., 2003), a spatial thin plate spline was fitted over sites so that, in effect, a different mean level µ(s) was subtracted from each site's series to center it. Thus in most cases Le and Zidek focus on modelling small scale variation. At the same time, even though the large scale components were handled in a fairly ad hoc way, the approach itself was very general and very flexible. For example, it subsumes the approach of Wikle and Cressie (1999), albeit without the elegant modelling foundations the latter provide and the substantive rationale they give for it.

Our approach attempts to meet their desideratum of computational simplicity while putting their prefiltering approach into a more formal hierarchical framework. That retains the advantages of their methodology. At the same time, it extends the approach of Wikle and Cressie (1999) and includes a number of other approaches for modelling space–time processes.
Thus it provides a unifying Bayesian framework wherein computational strategies can be developed once and for all. That in turn gives modellers much flexibility.

Next we propose the extension of the BSP approach, that is, the univariate and multivariate Bayesian spatio–temporal models.

7.2 Univariate Bayesian Spatio–temporal Model

We consider the case in which the observations are measured at irregular locations in a large spatial domain. We propose the univariate Bayesian spatio–temporal model to spatially interpolate and temporally predict the response variables at those irregularly located gauged or ungauged sites. One future implementation will be to predict ground–level ozone concentrations over the eastern USA using the hourly ozone AQS database in Section 2.5.

In contrast with Wikle and Cressie's approach, our model decomposes the spatial–temporal process into the following components: a long–term dynamic spatio–temporal term, a short–term spatial pattern and a short–term spatial–temporal component. Note that Wikle and Cressie (1999) assume that the long–term spatial–temporal components are the averages over all locations, and so they take the response to be the detrended residuals.

Denote by $Z(s,t)$ the observation at site $s$ and time $t$. We model the spatio–temporal field by first removing all the linear and seasonal trends across all regions in the domain of interest, that is, $F_1'(s,t)M(s,t)$. The remaining detrended residuals are represented by $Y(s,t)$. We then decompose $Y(s,t)$ into two parts: $\Phi^K(s)a^K(t)$, the remaining long–term spatial variation, and $V(s,t)$, the short–term spatial–temporal variation. In our approach, this short–term spatial–temporal term can be modelled as a BSP term in Le–Zidek's approach, incorporated into a DLM framework.
We propose the following model in this section:

\begin{align}
Z(s,t) &= F_1'(s,t)\,M(s,t) + Y(s,t) \tag{7.1}\\
Y(s,t) &= \Phi^K(s)\,a^K(t) + V(s,t) \tag{7.2}\\
V(s,t) &= F_2'(s,t)\,\theta(s,t) + \nu(s,t) \tag{7.3}
\end{align}

and

\begin{align}
M(s,t) &= G_1(s,t)\,M(s,t-1) + \eta(s,t) \tag{7.4}\\
a^K(t) &= H_{2t}\,a^K(t-1) + \xi_t \tag{7.5}\\
\theta(s,t) &= G_2(s,t)\,\theta(s,t-1) + \omega(s,t). \tag{7.6}
\end{align}

The above models can incorporate a fairly large class of cases, such as systematic temporal components, regional level covariate effects, and local level or site–specific covariate effects. In the coming subsections, we discuss in particular how to deal with different kinds of covariates in our model and the possible choices for the term that describes the covariance structure at the coarse level, that is, $\Phi^K(s)$. Specifically, we consider two types of covariates: Type I covariates are the regional level covariates, common over all sites at any fixed time point, with a site–specific coefficient matrix; Type II covariates are the local level covariates, site–specific and time–varying, with a common coefficient vector across all sites at any fixed time point.

The vector form of the proposed univariate Bayesian spatio–temporal model can be written as follows:

\begin{align}
Z_t &= F_{1t}'\,M_t + Y_t \tag{7.7}\\
Y_t &= \Phi^K a^K(t) + V_t \tag{7.8}\\
V_t &= F_{2t}'\,\theta_t + \nu_t \tag{7.9}\\
M_t &= G_{1t}\,M_{t-1} + \eta_t \tag{7.10}\\
a^K(t) &= H_{2t}\,a^K(t-1) + \xi_t \tag{7.11}\\
\theta_t &= G_{2t}\,\theta_{t-1} + \omega_t, \tag{7.12}
\end{align}

where $Z_t = (Z(s_1,t),\dots,Z(s_p,t))' : p\times 1$, $F_{1t}' = \mathrm{diag}\{F_1(s_1,t),\dots,F_1(s_p,t)\}$, $M_t = (M(s_1,t),\dots,M(s_p,t))'$, $Y_t = (Y(s_1,t),\dots,Y(s_p,t))'$, $\Phi^K = (\Phi^K(s_1),\dots,\Phi^K(s_p))'$, $V_t = (V(s_1,t),\dots,V(s_p,t))'$, $F_{2t}' = \mathrm{diag}\{F_2(s_1,t),\dots,F_2(s_p,t)\}$, $\theta_t = (\theta(s_1,t),\dots,\theta(s_p,t))'$, $\nu_t = (\nu(s_1,t),\dots,\nu(s_p,t))'$, $\eta_t = (\eta(s_1,t),\dots,\eta(s_p,t))'$, $\xi_t = (\xi(s_1,t),\dots,\xi(s_p,t))'$, $\omega_t = (\omega(s_1,t),\dots,\omega(s_p,t))'$ and $G_{jt} = \mathrm{diag}\{G_j(s_1,t),\dots,G_j(s_p,t)\}$ for $j = 1, 2$.
Our proposed univariate Bayesian spatio–temporal model is completed by the following assumptions on the priors of the model parameters:

\begin{align}
\Sigma_C &= O_C \Lambda_C^{-2} O_C' \tag{7.13}\\
\Phi &= O_C \Lambda_C^{-1} \tag{7.14}\\
\Phi^K &= \Phi E_p^K \tag{7.15}\\
\nu_t &\sim N_p(0, \Sigma_F) \tag{7.16}\\
\eta_t &\sim N_{l_1}(0, W_1) \tag{7.17}\\
\xi_t &\sim N_K(0, W_K) \tag{7.18}\\
\omega_t &\sim N_{l_2}(0, W_2) \tag{7.19}\\
\Sigma_C^{-1} &\sim W_p(\Xi_C, \delta_C) \tag{7.20}\\
\Sigma_F^{-1} &\sim W_p(\Xi_F, \delta_F), \tag{7.21}
\end{align}

where $\Sigma_C$ represents the covariance matrix for $Y(s,t)$ at the coarse level and $\Sigma_C^{-1}$ the corresponding precision matrix; similarly, $\Sigma_F$ represents the covariance matrix at the fine level and $\Sigma_F^{-1}$ the corresponding precision matrix. Let $E_p^K : p\times K = (e_{p,1},\dots,e_{p,K})$, where $e_{p,j} : p\times 1$ is the $p$–dimensional vector whose $j$th entry is 1 and whose other entries are 0, for $j = 1,\dots,K$.

The initial information can be described as follows:

\begin{align}
M_0 &\sim N(m_0^M, C_0^M) \tag{7.22}\\
a^K(0) &\sim N(m_0^a, C_0^a) \tag{7.23}\\
\theta_0 &\sim N(m_0^\theta, C_0^\theta). \tag{7.24}
\end{align}

In the coming subsections, we first discuss possible choices and models for the different types of covariates, along with the main spatial term contributed by $\Phi^K$, and then obtain theoretical results for our proposed model.

7.2.1 Type I covariates

We can incorporate Type I covariates in our model, that is, covariates that are common over sites at each fixed time point, for example, the month effect, weekday effect, and hourly effect we considered in Chapter 4. For this type of covariate, we take the coefficient matrix (unknown parameter matrix) to be site–specific. In this subsection, we show how we deal with this type of covariate as the first step in (7.1) and (7.4).

Suppose the database has $l_1$ Type I covariates in total. Let $\breve{Z}_t = (\breve{Z}_1(t),\dots,\breve{Z}_{l_1}(t)) : 1\times l_1$ be the $l_1$–dimensional covariate vector at time $t$. The corresponding coefficient matrix $\breve{B} : l_1\times p$ can also be written as $\breve{B} = (\breve\beta(s_1),\dots,\breve\beta(s_p))'$, where $\breve\beta(s_i) = (\breve\beta_1(s_i),\dots,\breve\beta_{l_1}(s_i))$ is the $l_1$–dimensional coefficient column vector at site $s_i$ for $i = 1,\dots,p$.
Let $F_1'(s_i,t) = \breve{Z}_t$ and $M(s_i,t) = \breve\beta(s_i)$. Then (7.1) is equivalent to $Z(s_i,t) = \breve{Z}_t\breve\beta(s_i) + Y(s_i,t)$. This shows that our model can incorporate Type I covariates.

7.2.2 Type II covariates

We now show how the model proposed above can deal with Type II covariates, that is, site–specific covariates varying with time, for example, hourly temperatures, wind speeds and wind directions at each location. The corresponding coefficients of Type II covariates are common over all sites at any fixed time point. This type of covariate can be dealt with at steps (7.3) and (7.6), along with an AR(1) autocorrelation structure for the detrended residuals $V(s_i,t)$. Other systematic temporal components can be dealt with using a similar approach.

Suppose $l_2$ Type II covariates are of interest. Let $\tilde{Z}_t(s_i) = (\tilde{Z}_{1,t}(s_i),\dots,\tilde{Z}_{l_2,t}(s_i))$ be the $l_2$–dimensional covariate vector at time $t$ and location $s_i$, for $i = 1,\dots,p$ and $t = 1,\dots,n$. Its corresponding coefficients are constant across all sites at each fixed time point $t$, that is, $\tilde\beta_t' = (\tilde\beta_{1,t},\dots,\tilde\beta_{l_2,t}) : 1\times l_2$. Let $F_1'(s_i,t) = \tilde{Z}_t(s_i)$ and $M(s_i,t) = \tilde\beta_t$. Then (7.1) is equivalent to $Z(s_i,t) = \tilde{Z}_t(s_i)\tilde\beta_t + Y(s_i,t)$, demonstrating our model's flexibility.

7.2.3 Possible choices for $\Phi^K(s)$

$\Phi^K(s)$ can be chosen in various ways. For example, it could be formed from polynomials. It can also be formed from the empirical orthogonal functions discussed in Chapter 6. We illustrate some possible choices for $\Phi^K(s)$ in this subsection.

The simplest case would be polynomials. For example, $\Phi^K(s)$ can come from a polynomial in latitude and longitude, representing a linear mean surface of the spatial–temporal field of $Y(s,t)$ in the geographical coordinates (Stroud et al., 2001).
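As an illustration of the polynomial choice, the following Python sketch builds a matrix whose $i$th row could serve as $\Phi^K(s_i)$ for a polynomial of a given total degree in latitude and longitude. The function name `polynomial_basis`, the centring of the coordinates and the site coordinates are assumptions made for this example only; they are not taken from Stroud et al. (2001) or from any implementation in this thesis.

```python
import numpy as np

def polynomial_basis(lat, lon, degree=1):
    """Return a p x K matrix whose i-th row plays the role of Phi^K(s_i).

    K = (degree + 1)(degree + 2) / 2 monomials lat^a * lon^b with a + b <= degree.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    # Centring is an illustrative choice that reduces collinearity between columns.
    u = lat - lat.mean()
    v = lon - lon.mean()
    columns = []
    for total in range(degree + 1):
        for i in range(total + 1):
            columns.append(u ** (total - i) * v ** i)
    return np.column_stack(columns)

# Hypothetical coordinates of p = 4 monitoring sites.
lat = [41.6, 41.9, 42.1, 41.7]
lon = [-87.9, -87.6, -87.8, -88.1]
phi_linear = polynomial_basis(lat, lon, degree=1)    # columns: 1, lat, lon (K = 3)
phi_quartic = polynomial_basis(lat, lon, degree=4)   # fourth-degree surface (K = 15)
```

With `degree=1` the three columns give a linear (planar) mean surface in the coordinates; raising the degree adds columns but leaves the rest of the model unchanged.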
Of course, $\Phi^K(s)$ can be constructed using other polynomial basis functions, such as the quartic polynomials, that is, fourth degree polynomials, used by Fuentes and Raftery (2005).

$\Phi^K(s)$ could also be built from orthogonal basis functions, for example, orthogonal polynomials, orthonormal basis functions for cubic splines, bicubic splines, wavelets, or empirical orthogonal functions. For the reasons stated in Chapter 6, we emphasize EOFs. Wikle and Cressie (1999) estimate the EOFs in their SSM, which we could also do here. For unknown EOFs, we extend the Bayesian EOFs of Chapter 6 to the univariate Bayesian spatio–temporal model. We discuss this in the next subsection.

7.2.4 Predictive posterior distributions

We present theoretical results for the predictive posterior distributions of the model parameters used in our model. Moreover, we briefly summarize the idea of implementing the BEOF in our model and the related theoretical results.

Suppose $\Sigma_C$ denotes the coarse–level covariance matrix for $Y(s,t)$. Assume $\Sigma_F$ is the fine–level covariance matrix for the deAR'd residuals, included in the covariance matrix of $\nu(s,t)$.

The univariate Bayesian spatio–temporal model we proposed in (7.7)–(7.12) can also be written in the following form:

\begin{align*}
Z_t &= F_t'\,\theta_t^* + \nu_t, & \nu_t &\sim N_p(0, \Sigma_F)\\
\theta_t^* &= G_t\,\theta_{t-1}^* + \omega_t^*, & \omega_t^* &\sim N_{l_1+K+l_2}(0, W),
\end{align*}

where $F_t' = (F_{1t}', \Phi^K, F_{2t}')$, $\theta_t^* = (M_t', a^K(t)', \theta_t')'$, $G_t = \text{Block–diag}\{G_{1t}, H_{2t}, G_{2t}\}$, $\omega_t^* = (\eta_t', \xi_t', \omega_t')'$, and $W = \text{Block–diag}\{W_1, W_K, W_2\}$.

Using standard results for the DLM and referring to Theorem A.2.1 in Appendix A.2, we obtain the corresponding posterior distributions for the state parameters $\theta_t^*$, given the observations up to time $t$ and the coarse– and fine–level covariance matrices $\Sigma_C$ and $\Sigma_F$, respectively, in the following theorem.
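Theorem 7.2.1 Given the coarse– and fine–level covariance matrices, $\Sigma_C$ and $\Sigma_F$, respectively, we obtain the following posterior distributions:

\begin{align*}
\theta_{t-1}^* \mid Z_{1:t-1}, \Sigma_F, \Sigma_C &\sim N_{l_1+K+l_2}[m_{t-1}, C_{t-1}]\\
\theta_t^* \mid Z_{1:t-1}, \Sigma_F, \Sigma_C &\sim N_{l_1+K+l_2}[a_t, R_t]\\
Z_t \mid Z_{1:t-1}, \Sigma_F, \Sigma_C &\sim N_p[f_t, Q_t]\\
\theta_t^* \mid Z_{1:t}, \Sigma_F, \Sigma_C &\sim N_{l_1+K+l_2}[m_t, C_t],
\end{align*}

where

\begin{align*}
a_t &= G_t m_{t-1}, & R_t &= G_t C_{t-1} G_t' + W,\\
f_t &= F_t' a_t, & Q_t &= F_t' R_t F_t + \Sigma_F,\\
e_t &= Z_t - f_t, & A_t &= R_t F_t Q_t^{-1},\\
m_t &= a_t + A_t e_t, & C_t &= R_t - A_t Q_t A_t',
\end{align*}

for $t = 1,\dots,n$.

Interest lies in the joint posterior distribution $p(\theta_{1:n}^*, \Sigma_C, \Sigma_F \mid Z_{1:n})$. We can write this joint posterior density as the product of $p(\theta_{1:n}^* \mid \Sigma_C, \Sigma_F, Z_{1:n})$ and $p(\Sigma_C, \Sigma_F \mid Z_{1:n})$. By Theorem 7.2.1, the former posterior distribution can be obtained as follows:

\begin{align*}
p(\theta_{1:n}^* \mid \Sigma_C, \Sigma_F, Z_{1:n}) &= \prod_{t=1}^{n} p(\theta_t^* \mid \Sigma_C, \Sigma_F, Z_{1:t})\; p(\theta_0^* \mid \Sigma_C, \Sigma_F)\\
&\propto \prod_{t=1}^{n} |Q_t|^{-1/2} \exp\left\{ -\frac{1}{2}\sum_{t=1}^{n} (\theta_t^* - m_t)' C_t^{-1} (\theta_t^* - m_t) \right\}.
\end{align*}

The latter is given by

\begin{align*}
p(\Sigma_C, \Sigma_F \mid Z_{1:n}) &\propto p(Z_{1:n} \mid \Sigma_C, \Sigma_F)\, p(\Sigma_C)\, p(\Sigma_F)\\
&= \prod_{t=1}^{n} p(Z_t \mid \Sigma_C, \Sigma_F, Z_{1:t-1})\, p(\Sigma_C)\, p(\Sigma_F)\\
&\propto \left( \prod_{t=1}^{n} |Q_t| \right)^{-1/2} \exp\left\{ -\frac{1}{2}\sum_{t=1}^{n} e_t' Q_t^{-1} e_t \right\} p(\Sigma_C)\, p(\Sigma_F),
\end{align*}

where $e_t$ and $Q_t$ are given in Theorem 7.2.1. Note that $F_t'$ can be viewed as a function of $\Sigma_C$, that is, $F_t'(\Sigma_C)$, and so can both $e_t$ and $Q_t$.

Moreover, we obtain the following non–analytic form for the full conditional posterior distributions of $\Sigma_C$ and $\Sigma_F$:

\begin{align}
p(\Sigma_F \mid \Sigma_C, Z_{1:n}) &\propto p(\Sigma_C, \Sigma_F \mid Z_{1:n}) \nonumber\\
&\propto p(\Sigma_F) \left( \prod_{t=1}^{n} |Q_t| \right)^{-1/2} \exp\left\{ -\frac{1}{2}\sum_{t=1}^{n} e_t' Q_t^{-1} e_t \right\} \nonumber\\
&\propto |\Sigma_F|^{-\frac{\delta_F + p}{2}} \prod_{t=1}^{n} |F_t' R_t F_t + \Sigma_F|^{-\frac{1}{2}} \exp\left\{ -\frac{1}{2}\left[ \mathrm{tr}(\Xi_F \Sigma_F^{-1}) + \sum_{t=1}^{n} e_t' (F_t' R_t F_t + \Sigma_F)^{-1} e_t \right] \right\}, \tag{7.25}
\end{align}

and similarly,

\begin{align}
p(\Sigma_C \mid \Sigma_F, Z_{1:n}) &\propto |\Sigma_C|^{-\frac{\delta_C + p}{2}} \prod_{t=1}^{n} |F_t'(\Sigma_C) R_t F_t(\Sigma_C) + \Sigma_F|^{-\frac{1}{2}} \nonumber\\
&\quad\times \exp\left\{ -\frac{1}{2}\left[ \mathrm{tr}(\Xi_C \Sigma_C^{-1}) + \sum_{t=1}^{n} e_t'(\Sigma_C) \left( F_t'(\Sigma_C) R_t F_t(\Sigma_C) + \Sigma_F \right)^{-1} e_t(\Sigma_C) \right] \right\}. \tag{7.26}
\end{align}

These results give us a theoretical basis for making inferences based on the joint posterior distribution of the model parameters using the MCMC method. How to use the MCMC method is addressed in Section 7.5.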
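The recursions of Theorem 7.2.1 and the sum $\sum_t \log p(Z_t \mid Z_{1:t-1}, \Sigma_C, \Sigma_F)$ that enters (7.25) and (7.26) can be sketched numerically as follows. This is a minimal illustration in Python, not the software developed for this thesis: the helper names `block_diag` and `kalman_filter`, the simplification to time–invariant $F'$, $G$ and $W$, and all dimensions and numerical values in the demonstration are assumptions made for the example.

```python
import numpy as np

def block_diag(*blocks):
    """Place the given 2-D arrays along the diagonal of a zero matrix."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r, c = r + b.shape[0], c + b.shape[1]
    return out

def kalman_filter(z, f_prime, g, w, sigma_f, m0, c0):
    """Forward recursions of Theorem 7.2.1 with time-invariant F', G and W.

    z: (n, p) observations; f_prime: (p, d) matrix F'; g, w, c0: (d, d);
    sigma_f: (p, p); m0: (d,). Returns the filtered means m_t, covariances C_t
    and the Gaussian log marginal likelihood sum_t log p(Z_t | Z_{1:t-1}).
    """
    n, p = z.shape
    m, c = m0, c0
    means, covs, loglik = [], [], 0.0
    for t in range(n):
        a = g @ m                                  # a_t = G m_{t-1}
        r = g @ c @ g.T + w                        # R_t = G C_{t-1} G' + W
        f = f_prime @ a                            # f_t = F' a_t
        q = f_prime @ r @ f_prime.T + sigma_f      # Q_t = F' R_t F + Sigma_F
        e = z[t] - f                               # e_t = Z_t - f_t
        gain = np.linalg.solve(q, f_prime @ r).T   # A_t = R_t F Q_t^{-1}
        m = a + gain @ e                           # m_t = a_t + A_t e_t
        c = r - gain @ q @ gain.T                  # C_t = R_t - A_t Q_t A_t'
        _, logdet = np.linalg.slogdet(q)
        loglik += -0.5 * (p * np.log(2 * np.pi) + logdet + e @ np.linalg.solve(q, e))
        means.append(m)
        covs.append(c)
    return np.array(means), np.array(covs), loglik

# Hypothetical dimensions: p = 3 sites, state blocks of size 1 (M), K = 2 (a^K), 1 (theta).
rng = np.random.default_rng(0)
p, n = 3, 5
f_prime = np.hstack([np.ones((p, 1)), rng.normal(size=(p, 2)), rng.normal(size=(p, 1))])
g = block_diag(np.eye(1), 0.9 * np.eye(2), np.eye(1))
w = block_diag(0.1 * np.eye(1), 0.2 * np.eye(2), 0.1 * np.eye(1))
z = rng.normal(size=(n, p))
means, covs, loglik = kalman_filter(z, f_prime, g, w, np.eye(p), np.zeros(4), np.eye(4))
```

Adding the log prior density of $\Sigma_F$ (or $\Sigma_C$) to `loglik` would give, up to a constant, the logarithm of the unnormalized densities (7.25) and (7.26), which is the quantity a Metropolis–Hastings step needs to evaluate.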
7.3 The Univariate Bayesian Spatio–temporal Model and Relationships with Other Approaches

We discuss the flexibility of our univariate Bayesian spatio–temporal model and its relationship with some other approaches.

7.3.1 Relationship with the DLM in Huerta et al. (2004)

The long–term spatial–temporal term, $F_1'(s,t)M(s,t)$, is very general. It contains the case of linear trends and seasonal trends. As a very simple example, linear trends can be incorporated into our model if we assume $F_1'(s,t) = (1, t)$ and $M(s,t)' = (\beta_{0t}, \beta_{1t})$. Another example is the DLM in Huerta et al. (2004), a special case of our model. Assume $F_1'(t) = (1, S_{1t}(a_1), S_{2t}(a_2))$ and $M(s,t)' = (\beta_t, \alpha_{1,s,t}, \alpha_{2,s,t})$. Then we obtain $Z(s,t) = \beta_t + S_{1t}(a_1)\alpha_{1,s,t} + S_{2t}(a_2)\alpha_{2,s,t} + Y(s,t)$, the DLM of Huerta et al. (2004).

7.3.2 Relationship with the SSM in Wikle & Cressie (1999)

Our model contains the model proposed in Wikle and Cressie (1999) as a special case, and so can be treated as an extension of their model. They remove the linear and seasonal trends before fitting their model, a process incorporated by $F_1'(s,t)M(s,t)$ in our model. Assuming $V(s,t) = \nu(s,t) + \epsilon(s,t)$, and removing (7.3), (7.4) and (7.6), we obtain the Wikle–Cressie model.

Our model can incorporate more general structures for systematic temporal components than Wikle and Cressie's AR(1). The AR(1) restriction makes Wikle and Cressie's model unsuitable for our purpose of modelling fields of hourly ozone concentrations, although it might work for daily ozone concentrations. To illustrate the generality of our model, we now show that an AR(2) process can be dealt with.

Suppose we consider $l_1$ Type I covariates in (7.3) and an AR(2) structure for the detrended residuals. In other words, we consider $X(s_i,t) = V(s_i,t) - z_t\beta(s_i)$ and $W(s_i,t) = X(s_i,t) - \phi_1(s_i)X(s_i,t-1) - \phi_2(s_i)X(s_i,t-2)$.
In (7.3), let $F_2'(s_i,t) = (z_t - \phi_1(s_i)z_{t-1} - \phi_2(s_i)z_{t-2},\ \phi_1(s_i),\ \phi_2(s_i)) : 1\times(l_1+2)$, and $\theta(s_i,t)' = (\beta(s_i)', V(s_i,t-1), V(s_i,t-2)) : 1\times(l_1+2)$. We then obtain $V(s_i,t) = F_2'(s_i,t)\theta(s_i,t) + W(s_i,t)$. In (7.6), we further let

\[
G_2(s_i,t) = \begin{pmatrix} I_{l_1} & 0_{l_1} & 0_{l_1}\\ z^*(s_i,t) & \phi_1(s_i) & \phi_2(s_i)\\ 0_{l_1}' & 1 & 0 \end{pmatrix},
\]

where $z^*(s_i,t) = z_{t-1} - \phi_1(s_i)z_{t-2} - \phi_2(s_i)z_{t-3}$. We thus obtain (7.6) with $\omega(s_i,t)' = (0_{l_1}', W(s_i,t-1), 0)$. This shows that our model can deal with an AR(2) autocorrelation structure for the detrended residuals. More generally, the dynamic linear modelling approach can accommodate far more general time series structures, such as ARMA(p, q) processes. Wikle and Cressie, however, only consider the AR(1) process. From that point of view, our approach can be viewed as an extension of their well known model.

7.3.3 Relationship with the univariate SSM in Gelfand et al. (2005)

We now show the relationship between our model and the univariate SSM proposed by Gelfand et al. (2005). They deal with both types of covariates in their model. Suppose $F_1'(s,t) = F_2'(s,t)$, $M(s,t) = \beta_t$, and $\theta(s,t) = \beta(s,t)$ in our model. We then obtain the same mean function as Gelfand et al. The main difference between their model and ours lies in the short–term or small–scale spatial–temporal term. In their approach, the small–scale spatial–temporal term is composed of two parts: a coregionalization spatial–temporal term and a measurement error component. However, Gelfand et al.'s model cannot deal with autocorrelated detrended residuals, and so is not applicable in our applications. Our model has a more general structure than theirs: we decompose the short–term spatial–temporal term into two parts, the principal spatial pattern and a local spatial–temporal pattern.
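The AR(2) construction of Section 7.3.2 rests on the lower–right block of $G_2(s_i,t)$, which is the usual companion matrix of an AR(2) process. The following Python sketch checks numerically, for a single site and without covariates, that iterating this block reproduces the direct AR(2) recursion. The function names and parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_ar2_direct(phi1, phi2, shocks, x_lag1=0.0, x_lag2=0.0):
    """Simulate X_t = phi1 X_{t-1} + phi2 X_{t-2} + W_t directly."""
    path = []
    for w in shocks:
        x = phi1 * x_lag1 + phi2 * x_lag2 + w
        path.append(x)
        x_lag1, x_lag2 = x, x_lag1
    return np.array(path)

def simulate_ar2_state_space(phi1, phi2, shocks, x_lag1=0.0, x_lag2=0.0):
    """Same process written as a first-order evolution of the state (X_{t-1}, X_{t-2})."""
    g = np.array([[phi1, phi2],
                  [1.0, 0.0]])           # lower-right block of G_2 in Section 7.3.2
    state = np.array([x_lag1, x_lag2])
    path = []
    for w in shocks:
        # The evolution error enters the first component only, as in omega(s_i, t).
        state = g @ state + np.array([w, 0.0])
        path.append(state[0])
    return np.array(path)

rng = np.random.default_rng(42)
shocks = rng.normal(size=50)
direct = simulate_ar2_direct(0.6, 0.3, shocks)
state_space = simulate_ar2_state_space(0.6, 0.3, shocks)
```

The two paths agree to numerical precision, which is the sense in which a higher–order autoregression fits into the first–order evolution equation (7.6); an ARMA(p, q) structure would enlarge the state vector in the same way.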
7.3.4 Relationship with the BSP model in Le and Zidek (1992)

Our model also contains the Le–Zidek model as a special case. Le and Zidek prefilter the linear and seasonal trends, as well as the highly autocorrelated detrended residuals, before fitting their model. This prefiltering step can be incorporated through the term $F_1'(s,t)M(s,t)$ and part of $F_2'(s,t)$ in our model. Removing (7.2), (7.4) and (7.5), we obtain the Le–Zidek BSP model. Following (7.2), we can model the small spatial–temporal variation term as Le and Zidek do in their book.

Let $W(s,t) = V(s,t) - \tilde{Z}(s,t)\beta(s,t)$ and $X(s,t) = W(s,t) - \phi_i W(s,t-1)$. For simplicity, assume $\beta(s,t) = \beta_t$ and let $\tilde{Z}(s,t)$ be the $l_2$–dimensional vector of site–specific covariates, for example hourly temperatures or hourly wind speeds when the response variate is the hourly ozone concentration, as in our study. We can write $W_t = (W(s_1,t),\dots,W(s_p,t))'$ and $\tilde{Z}_t = (\tilde{Z}(s_1,t)',\dots,\tilde{Z}(s_p,t)')' : p\times l_2$. We then have $W_t = V_t - \tilde{Z}_t\beta_t$ and $X_t = W_t - \mathrm{diag}(\phi_1,\dots,\phi_p)W_{t-1}$. Le and Zidek then model $X_t'$ in a hierarchical Bayesian framework such that $X_t' \sim N_p(\hat{z}_t B, \Sigma_F)$, where $\hat{z}_t$ represents a $q$–dimensional covariate vector and $B : q\times p$ the coefficient matrix. At the second level, Le and Zidek model $B \sim N_{p\times q}(B_0, F^{-1}\otimes\Sigma_F)$. Furthermore, they assume that $\Sigma_F \sim W_p^{-1}(\Psi, \delta)$.

In our model, we also assume that $\beta_t = H^\beta\beta_{t-1} + \omega_t^\beta$, where $\omega_t^\beta \sim N(0, W_2)$. We now have that

\begin{align}
V(s,t) &= \left(\tilde{Z}(s,t)H^\beta - \phi_i\tilde{Z}(s,t-1),\ \phi_i\right)\left(\beta_{t-1}',\ V(s,t-1)\right)' + \tilde{Z}(s,t)\omega_t^\beta + X(s,t) \nonumber\\
&= F_2'(s,t)\theta(s,t) + \tilde{Z}(s,t)\omega_t^\beta + X(s,t), \tag{7.27}
\end{align}

where $F_2'(s,t) = (\tilde{Z}(s,t)H^\beta - \phi_i\tilde{Z}(s,t-1),\ \phi_i) : 1\times(l_2+1)$ and $\theta(s,t) = (\beta_{t-1}', V(s,t-1))' : (1+l_2)\times 1$. By Le–Zidek's modelling approach, we then have $V_t = F_{2t}'\theta_t + \tilde{Z}_t\omega_t^\beta + B\hat{z}_t + \tau_t$, where $\tau_t \sim N(0, \Sigma_F)$. Hence we obtain $\nu_t = \tilde{Z}_t\omega_t^\beta + \tau_t \sim N(0, \tilde{Z}_t W_2\tilde{Z}_t' + \Sigma_F)$. Note that this is a special case of our model.
7.4 A Multivariate Bayesian Spatio–Temporal Model

We now extend our model to the multivariate case. Suppose $m$ different pollutants or species are measured in a spatio–temporal field. Let $Z_j(s,t)$ be the $j$th pollutant at site $s$ and time $t$, for $j = 1,\dots,m$, $s\in\{s_1,\dots,s_p\}$, and $t = 1,\dots,n$. Let $\mathbf{Z}(s,t) = (Z_1(s,t),\dots,Z_m(s,t))'$ be the vector of observations at site $s$ and time $t$. Assume $l_{1j}$ Type I covariates and $l_{2j}$ Type II covariates are considered, for $j = 1,\dots,m$. Those covariates are represented by $F_{1,j}(s,t)' : 1\times l_{1j}$ and $F_{2,j}(s,t)' : 1\times l_{2j}$, respectively. Write $F_i(s,t)' : m\times\sum_{j=1}^m l_{ij}$ as a block diagonal matrix with diagonal entries $\{F_{i,1}(s,t)',\dots,F_{i,m}(s,t)'\}$, for $i = 1, 2$. Let $M(s,t) = (M_1(s,t)',\dots,M_m(s,t)')' : \sum_{j=1}^m l_{1j}\times 1$ and $\theta(s,t) = (\theta_1(s,t)',\dots,\theta_m(s,t)')' : \sum_{j=1}^m l_{2j}\times 1$. Assume we obtain the $\Phi_j^{K_j} : 1\times K_j$, for $j = 1,\dots,m$, and the corresponding expansion coefficients $a_j^{K_j}(t) : K_j\times 1$, using multivariate EOF analysis. Let $\Phi^*(s) = \text{Block–diag}\{\Phi_1^{K_1}(s),\dots,\Phi_m^{K_m}(s)\} : m\times\sum_{j=1}^m K_j$, and $a^*(t) = (a_1^{K_1}(t),\dots,a_m^{K_m}(t))' : \sum_{j=1}^m K_j\times 1$. We obtain the multivariate Bayesian spatio–temporal model as follows:

\begin{align}
\mathbf{Z}(s,t) &= F_1'(s,t)M(s,t) + \mathbf{Y}(s,t) \tag{7.28}\\
\mathbf{Y}(s,t) &= \Phi^*(s)a^*(t) + \mathbf{V}(s,t) \tag{7.29}\\
\mathbf{V}(s,t) &= F_2'(s,t)\theta(s,t) + \nu(s,t) \tag{7.30}
\end{align}

and

\begin{align}
M(s,t) &= G_1(s,t)M(s,t-1) + \eta(s,t) \tag{7.31}\\
a^*(t) &= H_{2t}\,a^*(t-1) + \xi_t \tag{7.32}\\
\theta(s,t) &= G_2(s,t)\theta(s,t-1) + \omega(s,t), \tag{7.33}
\end{align}

where $\mathbf{Y}(s,t) = (Y_1(s,t),\dots,Y_m(s,t))' : m\times 1$, $\mathbf{V}(s,t) = (V_1(s,t),\dots,V_m(s,t))' : m\times 1$, $\nu(s,t) = (\nu_1(s,t),\dots,\nu_m(s,t))' : m\times 1$, and $\omega(s,t) = (\omega_1(s,t),\dots,\omega_m(s,t))' : m\times 1$. We assume a separable space–time covariance structure for the matrix–variate $\nu_t = (\nu(s_1,t),\dots,\nu(s_p,t)) : m\times p$, that is, $\nu_t \sim N_{m\times p}(0, \Omega\otimes\Sigma_F)$, where $\Omega$ represents the correlation matrix between the pollutants and $\Sigma_F$ the covariance matrix between the spatial locations at the local– or fine–scale level. We assume that the EOFs in $\Phi^*$ estimate the covariance matrix $\Omega\otimes\Sigma_C$. We also assume inverted Wishart priors for $\Sigma_F$ and $\Sigma_C$, respectively. We leave that extension to future work.

7.5 MCMC Algorithm on the Bayesian Spatio–temporal Models

In this section, we discuss MCMC algorithms for sampling from the joint posterior distributions of the model parameters for both the univariate and multivariate spatial–temporal models. We do not present a simulation study or application, pending completion of future work on implementation. Instead we stop with a fairly clear statement of an approach that makes success seem plausible in applications like the one presented earlier for the AQS data, with hourly ozone concentrations over the entire USA and the whole of one summer.

To make computation feasible, we can estimate $K$ using an ad hoc method. Or we can implement the Bayesian EOF idea of Chapter 6 in the newly proposed model, using block MCMC methods to draw samples from the full conditional distributions of the model parameters.

As we can see from the predictive posterior distribution for the univariate Bayesian spatio–temporal model, the Metropolis–Hastings method is required to sample from the posterior distributions of the fine– and coarse–level covariance matrices. That allows the Generalized inverted Wishart (GIW) prior to work at this stage. We leave this extension to future work.

We now summarize what we have obtained in this chapter. From the predictive posterior distribution obtained in Section 7.2, we can use the block MCMC method to draw samples from the joint posterior distribution.
In other words, we iteratively sample from $p(\Sigma_F \mid \Sigma_C, Z_{1:n})$, $p(\Sigma_C \mid \Sigma_F, Z_{1:n})$, and $p(\theta_{1:n}^* \mid \Sigma_F, \Sigma_C, Z_{1:n})$ according to (7.25), (7.26), and Theorem 7.2.1, respectively.

Algorithm 7.5.1 (Metropolis–within–Gibbs algorithm)

1. Initialization: sample
   $(\Sigma_F^{-1})^{(1)} \sim W(\Xi_F, \delta_F)$,
   $(\Sigma_C^{-1})^{(1)} \sim W(\Xi_C, \delta_C)$,
   $\theta_{1:n}^{*(1)} \sim N(m_0, C_0)$.

2. Given the $(j-1)$th values $(\Sigma_F^{-1})^{(j-1)}$, $(\Sigma_C^{-1})^{(j-1)}$, $\theta_{1:n}^{*(j-1)}$ and the observations $Z_{1:n}$:
   (1) Use a Metropolis–Hastings step to sample $(\Sigma_F^{-1})^{(j)}$ from $p(\Sigma_F^{-1} \mid \Sigma_C^{-1}, \theta_{1:n}^*, Z_{1:n})$.
   (2) Use a Metropolis–Hastings step to sample $(\Sigma_C^{-1})^{(j)}$ from $p(\Sigma_C^{-1} \mid \Sigma_F^{-1}, \theta_{1:n}^*, Z_{1:n})$.
   (3) Sample $\theta_{1:n}^{*(j)}$ from $p(\theta_{1:n}^* \mid \Sigma_C^{-1}, \Sigma_F^{-1}, Z_{1:n})$.

3. Repeat until convergence.

The implementation of the MCMC method to sample the parameters of our model is left to future work. Once the MC samples of the model parameters have been obtained after the burn–in period, it is straightforward to obtain the temporal prediction of the responses using the DLM approach. The spatial interpolation problem can then be viewed as a missing data problem in the DLM, which can be handled by adding an MCMC block for the missing data to the above algorithm. We leave all implementation of this model to future work.

7.6 Results and Conclusions

Our Bayesian spatio–temporal model is very flexible and powerful. It has the following advantages:

1. The general structure of the unified Bayesian spatio–temporal model allows us to remove the long–term systematic temporal variation, for example, the linear trend, seasonal trends, and covariate effects of both types. For Type II covariates, that is, the site–specific covariates, we regress on site–invariant but time–varying unknown coefficients, an extension of Le–Zidek's approach. For Type I covariates, that is, time–varying covariates common over the entire domain, we regress on a site–specific and time–varying unknown coefficient matrix, one removed at the process model in Le–Zidek's approach.

2. We then decompose $Y(s,t)$ into two parts: the principal local spatial patterns and the remaining local spatial–temporal variation terms. In this step, $\Phi^K(s)$ can also be constructed using other basis functions, such as orthogonal polynomials, thin plate splines, and bicubic splines. However, the eigenvalue basis functions, that is, the basis functions for the EOF, do provide a unique solution by the spectral decomposition theorem. In other words, we can retain the first $K$ EOFs to represent the main spatial patterns in the spatial–temporal fields of $Y(s,t)$, that is, the “detrended” residuals.

3. The remaining local spatial–temporal variation term is then modelled as a BSP term because of the need for computational feasibility. The general structure of the systematic temporal components can then be incorporated. From this point of view, we obtain an extension of the SSM proposed by Wikle and Cressie (1999).

4. Our model allows us to incorporate covariance matrices at two different scales, that is, coarse– and fine–scale spatial covariance matrices. At the coarse level, we truncate the EOFs using the Bayesian EOF method of Chapter 6 to represent the coarse scale long–term spatial patterns. At the fine level, the BSP term is used to represent the local scale short–term spatial–temporal patterns.

5. Spatial interpolation and temporal prediction turn out to be straightforward using our model. The Gaussian framework allows us to spatially interpolate the responses at ungauged sites. The time evolution of the dynamic models allows us to temporally predict the responses at the gauged and ungauged sites.

Future work includes applying the model to a real database or in a simulation study, and an extension involving the GIW prior for the coarse– and fine–scale covariance matrices.
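The truncation of the EOFs at the coarse level, as specified by (7.13)–(7.15), can be illustrated with a short Python sketch. It treats $\Sigma_C$ as known, whereas in the model it carries a prior and would be sampled; the function name `truncated_eof_basis`, the test matrix and the reported "explained" fraction are assumptions made for this example. The eigenvalues returned by the decomposition play the role of the diagonal of $\Lambda_C^{-2}$, so their square roots give $\Lambda_C^{-1}$.

```python
import numpy as np

def truncated_eof_basis(sigma_c, k):
    """Return (Phi, Phi^K, explained) with Phi Phi' = Sigma_C and Phi^K = Phi E_p^K."""
    sigma_c = np.asarray(sigma_c, dtype=float)
    p = sigma_c.shape[0]
    # Spectral decomposition of the symmetric coarse-level covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(sigma_c)
    # eigh returns ascending eigenvalues; reorder so the leading EOFs come first.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    phi = eigvecs * np.sqrt(eigvals)     # O_C Lambda_C^{-1}: column j scaled by sqrt(eigvals[j])
    e_kp = np.eye(p)[:, :k]              # E_p^K: first K unit vectors of length p
    phi_k = phi @ e_kp                   # keeps the first K columns of Phi
    explained = eigvals[:k].sum() / eigvals.sum()
    return phi, phi_k, explained

# Hypothetical positive definite covariance matrix for p = 5 sites.
rng = np.random.default_rng(1)
a = rng.normal(size=(5, 5))
sigma_c = a @ a.T + 5 * np.eye(5)
phi, phi_k, explained = truncated_eof_basis(sigma_c, k=2)
```

The value `explained` is one possible ad hoc guide for choosing $K$, in the spirit of the remark in Section 7.5; it is not a criterion proposed in this thesis.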
Chapter 8

Future Work

This last chapter presents a summary of this thesis (Section 8.1) and proposals for future work based on it (Section 8.2), as well as approaches to be taken to complete that work.

8.1 Thesis Summary

We have implemented the Gaussian DLM proposed by Huerta et al. (2004) to spatially interpolate ground–level ozone concentrations at one cluster of monitoring stations in an AQS database (1995). A complete and tested software package has been developed for this implementation. Theoretical results have been developed regarding the predictive variance using the first–order polynomial model. Those results explain the monotone behavior of the coverage probabilities found from the results of spatial interpolation. Moreover, we demonstrate how to use discount factors in the DLM to improve the predictive results and their accuracy. However, we find the approach so computationally intensive that it is not scalable to large space–time domains.

We therefore explore a computationally simpler alternative, the BSP approach, for spatially interpolating univariate and multivariate responses in space–time domains. That alternative has been implemented for an AQS database (2000). Moreover, we have found an extension of this approach for forecasting ground–level ozone concentrations one day ahead. After comparing it with the DLM, we find that the latter cannot compete with the BSP for either spatial interpolation or temporal prediction in terms of mean squared prediction error (MSPE).

The BSP approach uses empirical Bayes steps to estimate some model parameters within the Bayesian framework. Although these steps simplify computation, the ad hoc approach would be seen as objectionable from a purely Bayesian perspective, and the model is not as flexible as the DLM.
To refine the BSP, therefore, we put it into a DLM framework so that the model is more flexible in updating or predicting the responses as new data come on stream, one of the attractive features of dynamic models. Yet we preserve its computational simplicity.

A Bayesian version of the EOF method, that is, the BEOF method, has been proposed in this thesis. The BEOF method allows us to represent the principal spatial patterns of spatial–temporal fields in a fully Bayesian framework. We have demonstrated potentially severe problems with the classical EOF method that seem to have been largely ignored. We have compared the classical and corrected EOFs with true EOFs using the simulated database. Moreover, we have developed an MCMC algorithm for sampling from the joint posterior distribution of the model parameters. However, implementation of the BEOF approach on our existing database is left to future work.

Finally, we have proposed a unified approach to univariate and multivariate spatio–temporal modelling within a fully hierarchical Bayesian framework so that we can incorporate some interesting features of the BSP, DLM and BEOF approaches. We have provided theoretical results on the joint posterior distribution of the model parameters and the corresponding MCMC algorithm. Implementation of this model on real and simulated data will be completed in future work.

8.2 Future Research Plan

We finally propose, in the list below, future work based on this thesis as well as possible directions for completing that work.

1. We can extend the DLM to the discount DLM and implement it on a real database. This implementation would help us set reasonable hyperpriors for some of the model parameters instead of fixing them in an ad hoc way. We can also develop corresponding software for this purpose.

2. We can extend the DLM to incorporate the dependence of model parameters.
To do this, a further extension of the discount DLM is needed so that the time–varying discount factors can be free of the computational burden.

3. We will extend the one–day–ahead temporal prediction of ground–level ozone concentrations using the BSP approach to arbitrary time points.

4. One interesting problem in the BSP approach is to deal with monotone data (double staircase) patterns in two directions. This is a really difficult problem and a route to the solution remains to be found.

5. We can implement the BSP in multivariate cases where a dependent but unknown structure between the multivariate responses is considered.

6. Another future project for the BSP approach involves misaligned data. This problem will be partially addressed when we integrate the BSP into the DLM framework, so the model we considered in Chapter 7 will be one possible choice. The DLM framework helps in the prediction of missing data provided the rate of missingness is reasonable.

7. We will explore the choice of an optimal starting hour for making one–day–ahead predictions.

8. The Bayesian EOF method needs to be applied to real as opposed to simulated data so that we can find whether the Bayesian and corrected EOFs can give seriously discrepant answers, a result that would raise concerns about the appropriateness of classical EOFs. This implementation is straightforward following the MCMC algorithm we developed.

9. The extended BSP, i.e., the Bayesian spatio–temporal model, can be implemented with either model output or measurements. By combining the two types of data over the entire network of US EPA monitoring sites, more accurate spatial predictions should be achievable for ground–level ozone concentrations over a large spatio–temporal domain. This straightforward implementation of our theory follows from the theoretical results and MCMC algorithm we have already developed.
For example, we can combine physical model outputs with statistical ones, in which the MAQSIP (gridded) data can be considered at the regional or coarse level and the observations or measurements at each monitoring station at the local or fine level.

10. We can extend Bayesian spatio–temporal models to incorporate different prior beliefs for the model parameters, that is, the covariance matrices for different levels and blocks. This extension includes developing the corresponding theoretical results on predictive and posterior distributions using the GIW prior on model parameters.

11. We can use an MCMC sampling algorithm to sample a random covariance matrix that follows a GIW prior distribution. This algorithm will help us to spatially interpolate and temporally predict the responses using the Bayesian spatio–temporal models in which the priors of the covariance matrices at both coarse and fine levels are GIWs.
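Appendix A Additional Results for Chapter 2

A.1 Additional Results for Section 2.6.1

The joint posterior distribution for $x_{1:T}$, $\lambda$ and $\sigma^2$ is given by
\[
\begin{aligned}
p(x_{1:T}, \lambda, \sigma^2 \mid y_{1:T})
&\propto p(\lambda, \sigma^2)\, p(x_T \mid \lambda, \sigma^2, y_{1:T}) \prod_{t=1}^{T} p(x_{T-t} \mid x_{T-t+1}, \lambda, \sigma^2, y_{1:T}) \times \prod_{t=1}^{T} p(y_t \mid \lambda, \sigma^2, y_{1:t-1}) \\
&\propto p(x_{1:T} \mid \lambda, \sigma^2, y_{1:T})\, p(\sigma^2 \mid \lambda, y_{1:T})\, p(\lambda \mid y_{1:T}).
\end{aligned}
\]
Suppose $p(\lambda, \sigma^2) = p(\lambda)p(\sigma^2)$, that is, the priors for $\lambda$ and $\sigma^2$ are independent of each other. The joint posterior distribution for $\lambda$ and $\sigma^2$ can be written as follows:
\[
p(\lambda, \sigma^2 \mid y_{1:T}) \propto p(\lambda)\, p(\sigma^2)\, (\sigma^2)^{-nT/2} \prod_{t=1}^{T} |Q_t|^{-1/2} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{t=1}^{T} e_t' Q_t^{-1} e_t \right\}.
\]
If the prior for $\sigma^2$ is an inverse gamma distribution with shape parameter $\alpha$ and scale parameter $\beta$, then the posterior distribution for $\sigma^2$ is also an inverse gamma distribution, with shape parameter $\alpha + \frac{nT}{2}$ and scale parameter $\beta + \frac{1}{2}\sum_{t=1}^{T} e_t' Q_t^{-1} e_t$. Hence, the posterior density for $\lambda$ can be written as follows:
\[
p(\lambda \mid y_{1:T}) = \frac{p(\lambda, \sigma^2 \mid y_{1:T})}{p(\sigma^2 \mid \lambda, y_{1:T})}
\propto p(\lambda) \prod_{t=1}^{T} |Q_t|^{-1/2} \left[ \beta + \frac{1}{2} \sum_{t=1}^{T} e_t' Q_t^{-1} e_t \right]^{-(\alpha + nT/2)}.
\]
Therefore, the posterior density for $x_{1:T}$ is given by
\[
p(x_{1:T} \mid \lambda, \sigma^2, y_{1:T}) = p(x_T \mid \lambda, \sigma^2, y_{1:T}) \prod_{t=1}^{T} p(x_{T-t} \mid x_{T-t+1}, \lambda, \sigma^2, y_{1:T}).
\]

A.2 Additional Results for Section 2.6.2

Theorem A.2.1 Under Models (2.14)–(2.15) and initial prior (2.18), for any $1 \le t \le T$, conditional on $\theta$, we have

(i)
\[
\begin{aligned}
(x_{t-1} \mid y_{1:t-1}, \theta) &\sim N[m_{t-1}, \sigma^2 C_{t-1}] \\
(x_t \mid y_{1:t-1}, \theta) &\sim N[a_t, \sigma^2 R_t] \\
(y_t \mid y_{1:t-1}, \theta) &\sim N[f_t, \sigma^2 Q_t] \\
(x_t \mid y_{1:t}, \theta) &\sim N[m_t, \sigma^2 C_t],
\end{aligned}
\]
where
\[
\begin{aligned}
a_t &= m_{t-1}, & R_t &= C_{t-1} + W, \\
f_t &= F_t' a_t, & Q_t &= F_t' R_t F_t + V_\lambda, \\
e_t &= y_t - f_t, & A_t &= R_t F_t Q_t^{-1}, \\
m_t &= a_t + A_t e_t, & C_t &= R_t - A_t Q_t A_t'.
\end{aligned}
\]

(ii) Define $B_t = C_t R_{t+1}^{-1}$. For $0 \le k \le T - 1$,
\[
(x_{T-k} \mid y_{1:T}, \theta) \sim N[a_T(-k), \sigma^2 R_T(-k)], \tag{A.1}
\]
where
\[
\begin{aligned}
a_T(-k) &= m_{T-k} + B_{T-k}\,[a_T(-k+1) - a_{T-k+1}], \\
R_T(-k) &= C_{T-k} + B_{T-k}\,[R_T(-k+1) - R_{T-k+1}]\,B_{T-k}',
\end{aligned}
\]
with $a_T(0) = m_T$, $R_T(0) = C_T$, $a_{T-k}(1) = a_{T-k+1}$, and $R_{T-k}(1) = R_{T-k+1}$.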
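The recursions of Theorem A.2.1 and the marginal posterior kernel for λ in A.1 can be illustrated with a small sketch. It is written in Python rather than the R and C of the thesis software, and it makes simplifying assumptions that are not in the original: the state is a scalar, F_t = 1, and W and V are fixed constants. All function names are hypothetical, and all variances are the scale-free quantities (the common factor σ² is left out, as in the theorem).

```python
import math

def kalman_filter(y, m0, C0, W, V, F=1.0):
    """Forward recursions of Theorem A.2.1(i) for a scalar state (illustrative only)."""
    m, C = m0, C0
    out = {"a": [], "R": [], "m": [], "C": [], "e": [], "Q": []}
    for yt in y:
        a = m                  # a_t = m_{t-1}
        R = C + W              # R_t = C_{t-1} + W
        f = F * a              # f_t = F_t' a_t
        Q = F * R * F + V      # Q_t = F_t' R_t F_t + V
        e = yt - f             # one-step forecast error e_t
        A = R * F / Q          # A_t = R_t F_t Q_t^{-1}
        m = a + A * e          # m_t = a_t + A_t e_t
        C = R - A * Q * A      # C_t = R_t - A_t Q_t A_t'
        for key, val in zip(("a", "R", "m", "C", "e", "Q"), (a, R, m, C, e, Q)):
            out[key].append(val)
    return out

def kalman_smoother(filt):
    """Backward recursions of Theorem A.2.1(ii), with B_t = C_t / R_{t+1}."""
    T = len(filt["m"])
    mean, var = [0.0] * T, [0.0] * T
    mean[-1], var[-1] = filt["m"][-1], filt["C"][-1]   # a_T(0) = m_T, R_T(0) = C_T
    for t in range(T - 2, -1, -1):
        B = filt["C"][t] / filt["R"][t + 1]
        mean[t] = filt["m"][t] + B * (mean[t + 1] - filt["a"][t + 1])
        var[t] = filt["C"][t] + B * (var[t + 1] - filt["R"][t + 1]) * B
    return mean, var

def likelihood_terms(filt):
    """Return sum_t e_t^2 / Q_t and sum_t log Q_t (scalar versions of the sums in A.1)."""
    quad = sum(e * e / Q for e, Q in zip(filt["e"], filt["Q"]))
    ldet = sum(math.log(Q) for Q in filt["Q"])
    return quad, ldet

def log_lambda_kernel(quad, ldet, alpha, beta, n_total):
    """Log kernel of p(lambda | y_{1:T}) from A.1 without log p(lambda); n_total stands for nT."""
    return -0.5 * ldet - (alpha + n_total / 2.0) * math.log(beta + 0.5 * quad)

# Toy data, chosen only for illustration.
y = [2.0, 1.0, 3.0]
filt = kalman_filter(y, m0=0.0, C0=1.0, W=1.0, V=1.0)
smooth_mean, smooth_var = kalman_smoother(filt)
quad, ldet = likelihood_terms(filt)
log_kernel = log_lambda_kernel(quad, ldet, alpha=2.0, beta=1.0, n_total=len(y))
```

In a Metropolis step for λ, a quantity such as `log_kernel` would be evaluated at the current and proposed values of λ, each of which changes V and hence every Q_t; the sketch keeps V fixed only to stay short.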
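A.3 Additional Results for Section 2.6.4

The observation equation is given by
\[
y_t = \mathbf{1}_n' \beta_t + S_{1t}(a_1)\alpha_{1t} + S_{2t}(a_2)\alpha_{2t} + \nu_t, \qquad \nu_t \sim N[0, \sigma^2 \Sigma(\lambda)],
\]
where $\Sigma(\lambda) = \exp(-V/\lambda)$ and $V$ denotes the distance matrix for the monitoring sites $s_1, \ldots, s_n$. Given $\lambda$, $\sigma^2$, $x_t$ (that is, $\beta_t$, $\alpha_{1t}$ and $\alpha_{2t}$) and $y_t$, the posterior conditional distribution for the constant phase parameters, $a_1$ and $a_2$, is given by
\[
\begin{aligned}
p(a_1, a_2 \mid \lambda, \sigma^2, x_t, y_t)
&\propto p(y_t \mid \lambda, \sigma^2, x_t, a_1, a_2)\, p(a_1, a_2) \\
&\propto p(M_t \mid \lambda, \sigma^2, \beta_t, \alpha_{1t}, \alpha_{2t}, a_1, a_2)\, p(a_1, a_2),
\end{aligned}
\]
where $M_t = y_t - \mathbf{1}_n' \beta_t - \cos(\pi t/12)\,\alpha_{1t} - \cos(\pi t/6)\,\alpha_{2t}$, for $t = 1, \ldots, T$.

We consider the following two cases for the prior of $a = (a_1, a_2)'$:

• Case (i) a standard reference prior: $p(a) \propto 1$;

• Case (ii) a bivariate normal prior: $a \sim N(\mu^0, \Sigma^0)$, where $\mu^0 = (\mu_1^0, \mu_2^0)'$ and $\Sigma^0$ is a 2 by 2 covariance matrix.

Under Case (i), for fixed $t = 1, \ldots, T$, let $l_1 = \sin(\pi t/12)\,\alpha_{1t}$, $l_2 = \sin(\pi t/6)\,\alpha_{2t}$, $m = M_t$ and $S = \sigma^2 \Sigma(\lambda)$. The posterior conditional distribution for the phase parameter vector $a$ is now given by
\[
\begin{aligned}
p(a_1, a_2 \mid \lambda, \sigma^2, x_t, y_t)
&\propto p(M_t \mid \lambda, \sigma^2, x_t, a_1, a_2)\, p(a_1, a_2) \\
&\propto \exp\Big\{ -\tfrac{1}{2} \big[ M_t - a_1 \sin(\pi t/12)\alpha_{1t} - a_2 \sin(\pi t/6)\alpha_{2t} \big]' (\sigma^2 \Sigma(\lambda))^{-1} \\
&\qquad\qquad \times \big[ M_t - a_1 \sin(\pi t/12)\alpha_{1t} - a_2 \sin(\pi t/6)\alpha_{2t} \big] \Big\} \\
&= \exp\Big\{ -\tfrac{1}{2} (m - a_1 l_1 - a_2 l_2)' S^{-1} (m - a_1 l_1 - a_2 l_2) \Big\} \\
&\propto \exp\Big\{ -\tfrac{1}{2} \big( a' \Sigma^{-1} a - 2 a' \Sigma^{-1} \mu \big) \Big\} \\
&\propto \exp\Big\{ -\tfrac{1}{2} \big[ a' (l_1, l_2)' S^{-1} (l_1, l_2) a - a' (l_1, l_2)' S^{-1} m - m' S^{-1} (l_1, l_2) a \big] \Big\} \\
&\propto \exp\Big\{ -\tfrac{1}{2} (a - \mu)' \Sigma^{-1} (a - \mu) \Big\},
\end{aligned}
\]
where
\[
\Sigma^{-1} = (l_1, l_2)' S^{-1} (l_1, l_2), \tag{A.2}
\]
\[
\mu = \Sigma (l_1, l_2)' S^{-1} m. \tag{A.3}
\]
Note that equation (A.3) is equivalent to
\[
\Sigma^{-1} \mu = (l_1, l_2)' S^{-1} m. \tag{A.4}
\]
More specifically, we obtain the following elements in the mean vector and covariance matrix for the posterior conditional distribution of the phase parameter vector $a$.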
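\begin{align}
\Sigma &= \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}, \tag{A.5} \\
\Delta^{-1} &= (\sigma_{11}\sigma_{22} - \sigma_{12}^2)^{-1} \tag{A.6} \\
&= (l_1' S^{-1} l_1)(l_2' S^{-1} l_2) - (l_1' S^{-1} l_2)^2 \tag{A.7} \\
\sigma_{11} &= \Delta\,(l_2' S^{-1} l_2) \tag{A.8} \\
\sigma_{12} &= -\Delta\,(l_1' S^{-1} l_2) \tag{A.9} \\
\sigma_{22} &= \Delta\,(l_1' S^{-1} l_1) \tag{A.10} \\
\mu_1 &= \sigma_{11}(l_1' S^{-1} m) + \sigma_{12}(l_2' S^{-1} m) \tag{A.11} \\
\mu_2 &= \sigma_{12}(l_1' S^{-1} m) + \sigma_{22}(l_2' S^{-1} m) \tag{A.12}
\end{align}
Therefore, we have the following conclusions for the conditional posterior distribution of the phase parameter vector $a$.

(i) If the prior for $a$ is the standard reference prior, that is, $p(a_1, a_2) \propto 1$, we have
\[
\left( \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \,\Big|\, x_t, y_t, \lambda, \sigma^2 \right) \sim N\left[ \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \Sigma \right],
\]
where $\mu_1$, $\mu_2$ and $\Sigma$ are given by equations (A.5)–(A.12) or, equivalently, by equations (A.2)–(A.3).

(ii) If the prior for $a$ is the bivariate normal distribution with mean vector $\mu^0 = (\mu_1^0, \mu_2^0)'$ and covariance matrix
\[
\Sigma^0 = \begin{pmatrix} \sigma_{11}^0 & \sigma_{12}^0 \\ \sigma_{12}^0 & \sigma_{22}^0 \end{pmatrix},
\]
we then have
\[
\begin{aligned}
p(a_1, a_2 \mid \lambda, \sigma^2, x_t, y_t)
&\propto \exp\Big\{ -\tfrac{1}{2} (a - \mu)' \Sigma^{-1} (a - \mu) \Big\} \exp\Big\{ -\tfrac{1}{2} (a - \mu^0)' (\Sigma^0)^{-1} (a - \mu^0) \Big\} \\
&\propto \exp\Big\{ -\tfrac{1}{2} (a - \mu^*)' (\Sigma^*)^{-1} (a - \mu^*) \Big\},
\end{aligned}
\]
where $a = (a_1, a_2)'$, $\mu = (\mu_1, \mu_2)'$, $\mu^* = (\mu_1^*, \mu_2^*)'$ and
\[
\Sigma^* = \big( \Sigma^{-1} + (\Sigma^0)^{-1} \big)^{-1}, \tag{A.13}
\]
\[
\mu^* = \Sigma^* \big( \Sigma^{-1} \mu + (\Sigma^0)^{-1} \mu^0 \big). \tag{A.14}
\]
From (A.13) and (A.14), we have
\[
\Sigma^* = \Sigma - \Sigma (\Sigma + \Sigma^0)^{-1} \Sigma = \Sigma^0 (\Sigma + \Sigma^0)^{-1} \Sigma, \tag{A.15}
\]
and
\[
\mu^* = \Sigma^0 (\Sigma + \Sigma^0)^{-1} \mu + \Sigma (\Sigma + \Sigma^0)^{-1} \mu^0. \tag{A.16}
\]
Hence, the posterior conditional distribution for the phase parameters is given by
\[
(a \mid \lambda, \sigma^2, x_t, y_t) \sim N(\mu^*, \Sigma^*),
\]
where $\mu^*$ and $\Sigma^*$ are given by equations (A.13)–(A.14) or, equivalently, (A.15)–(A.16).

A.4 Additional Results for Sections 2.7.1 and 2.7.2

Given the values of the phase parameters, the range and variance parameters, and the observations until time $t$, the joint distribution of $\alpha_{1t}^s$ and $\alpha_{1t}$ is
\[
\begin{pmatrix} \alpha_{1t}^s \\ \alpha_{1t} \end{pmatrix} \sim N\left[ \begin{pmatrix} \alpha_{1,t-1}^s \\ \alpha_{1,t-1} \end{pmatrix}, \sigma^2 \tau_1^2 \Sigma^*(\lambda_1) \right],
\]
where
\[
\Sigma^*(\theta) = \exp\{-V^*/\theta\} = \begin{bmatrix} \Sigma_{11}^*(\theta) & \Sigma_{12}^*(\theta) \\ \Sigma_{21}^*(\theta) & \Sigma_{22}^*(\theta) \end{bmatrix},
\]
with $\Sigma_{11}^*(\theta)$ a scalar, $\Sigma_{12}^*(\theta)$ a 1 by $n$ vector, and $\Sigma_{22}^*(\theta)$ an $n$ by $n$ matrix. We use $V^*$ to denote the new distance matrix for the unknown site $s$ and the monitoring stations $s_1, \ldots, s_n$.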
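As a rough sketch of how Σ*(θ) = exp{−V*/θ} can be assembled and partitioned, and of the two quantities built from its blocks that drive the conditional distributions of this section (the weights Σ*12 Σ*22⁻¹ and the variance factor Σ*11 − Σ*12 Σ*22⁻¹ Σ*21), consider the following Python fragment. The distances are hypothetical (one ungauged site and three stations placed on a line), the function names are invented for illustration, and a plain Gaussian elimination stands in for whatever linear algebra routine the actual software uses.

```python
import math

def exp_correlation(dist, theta):
    """Elementwise exp(-d / theta) applied to a distance matrix (list of lists)."""
    return [[math.exp(-d / theta) for d in row] for row in dist]

def partition(sigma_star):
    """Split Sigma* into its four blocks; the ungauged site s is row/column 0."""
    s11 = sigma_star[0][0]
    s12 = list(sigma_star[0][1:])
    s21 = [row[0] for row in sigma_star[1:]]
    s22 = [list(row[1:]) for row in sigma_star[1:]]
    return s11, s12, s21, s22

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def interpolation_weights(sigma_star):
    """Return (w, v) with w = Sigma12 Sigma22^{-1} and v = Sigma11 - Sigma12 Sigma22^{-1} Sigma21."""
    s11, s12, s21, s22 = partition(sigma_star)
    # Sigma22 is symmetric, so Sigma22^{-1} Sigma21 is the transpose of Sigma12 Sigma22^{-1}.
    w = solve(s22, s21)
    v = s11 - sum(wi * si for wi, si in zip(w, s21))
    return w, v

# Hypothetical layout: site s at position 0, stations at positions 1, 2 and 3.5 on a line.
positions = [0.0, 1.0, 2.0, 3.5]
V_star = [[abs(p - q) for q in positions] for p in positions]
weights, var_factor = interpolation_weights(exp_correlation(V_star, theta=2.0))
```

With this particular layout the exponential correlation makes the process Markov along the line, so the weight falls entirely on the nearest station; real station networks in two dimensions will generally spread the weight over several stations.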
We then have the conditional posterior distribution of $\alpha_{1t}^s$ as follows:
\[
\begin{aligned}
(\alpha_{1t}^s \mid \alpha_{1,t-1}^s, \alpha_{1t}, \alpha_{1,t-1}, y_t, \lambda, \sigma^2) \sim N\big[ & \alpha_{1,t-1}^s + \Sigma_{12}^*(\lambda_1) \Sigma_{22}^*(\lambda_1)^{-1} (\alpha_{1t} - \alpha_{1,t-1}), \\
& \sigma^2 \tau_1^2 \big( \Sigma_{11}^*(\lambda_1) - \Sigma_{12}^*(\lambda_1) \Sigma_{22}^*(\lambda_1)^{-1} \Sigma_{21}^*(\lambda_1) \big) \big].
\end{aligned} \tag{A.17}
\]
Similarly, the conditional posterior distribution for $\alpha_{2t}^s$ is
\[
\begin{aligned}
(\alpha_{2t}^s \mid \alpha_{2,t-1}^s, \alpha_{2t}, \alpha_{2,t-1}, y_t, \lambda, \sigma^2) \sim N\big[ & \alpha_{2,t-1}^s + \Sigma_{12}^*(\lambda_2) \Sigma_{22}^*(\lambda_2)^{-1} (\alpha_{2t} - \alpha_{2,t-1}), \\
& \sigma^2 \tau_2^2 \big( \Sigma_{11}^*(\lambda_2) - \Sigma_{12}^*(\lambda_2) \Sigma_{22}^*(\lambda_2)^{-1} \Sigma_{21}^*(\lambda_2) \big) \big].
\end{aligned} \tag{A.18}
\]
Using the observation equation as in Model (2.11), we have the conditional predictive distribution for $y_t^s$ as follows:
\[
\begin{aligned}
(y_t^s \mid y_t, \alpha_{1t}^s, \alpha_{2t}^s, \alpha_{1t}, \alpha_{2t}, \beta_t, \lambda, \sigma^2) \sim N\big[ & \beta_t + S_{1t}(a_1)\alpha_{1t}^s + S_{2t}(a_2)\alpha_{2t}^s \\
& + \Sigma_{12}^*(\lambda) \Sigma_{22}^*(\lambda)^{-1} \big( y_t - \mathbf{1}_n \beta_t - S_{1t}(a_1)\alpha_{1t} - S_{2t}(a_2)\alpha_{2t} \big), \\
& \sigma^2 \big( \Sigma_{11}^*(\lambda) - \Sigma_{12}^*(\lambda) \Sigma_{22}^*(\lambda)^{-1} \Sigma_{21}^*(\lambda) \big) \big].
\end{aligned} \tag{A.19}
\]

Appendix B Software for Chapter 3

We wrote the DLM software, GDLM.1.0, using the R and C interface. This software can be freely downloaded from http://enviro.stat.ubc.ca/dlm/GDLM.1.0.zip or http://enviro.stat.ubc.ca/dlm/GDLM.1.0.tar.gz for Windows or Unix/Linux systems, respectively. The instructions for installing this package on Windows and Linux/Unix are in the folder “INSTALL”. This software requires R (version ≥ 2.2.0) and a C compiler, that is, MinGW-3.2.0-rc-3.exe. The “FFBS” folder contains the C code for the forward–filtering–backward–sampling method used in the MCMC algorithm; its header files are included under the folder “include”. The “MAIN” folder contains all the R functions for the models used in this chapter. A real database example is given in the folder “DEMO”, with its database located in the folder “DATA”. The “OUTPUT” folder allows one to store the computational output from the MCMC algorithm for making graphical comparisons. The basic information for this software is summarized in the “Readme” file. Two R packages, MASS and stats, are required by this software to fit the DLM within the Gaussian framework.
We now describe several important functions in “MAIN” used in this software.

(1) Function “forfun.c” is an R interface function to C. It generates the two main components needed to compute the acceptance rate in the Metropolis–within–Gibbs sampling. It has the following arguments: lat, long, mcmc.data.matrix, lambda, a1, a2, gamma.vec, m.init, C.init and BH. The details of each of these arguments at the $j$th iteration are as follows:

• “lat” is the vector of latitudes at the gauged sites.

• “long” is the vector of longitudes at the gauged sites.

• “mcmc.data.matrix” is the MCMC complete data matrix used at each iteration, that is, $Y^{(j-1)}$. By default, its columns represent the responses at each of the time points and its rows the sites.

• “lambda” is the range parameter, $\lambda$, in the DLM.

• “a1” is the phase parameter corresponding to the 24 hour periodicity.

• “a2” is the phase parameter corresponding to the 12 hour periodicity.

• “gamma.vec” contains the fixed hyperparameters, that is, $\gamma = (\tau_y^2, \tau_1^2, \lambda_1, \tau_2^2, \lambda_2)$.

• “m.init” and “C.init” are the initial mean vector and covariance matrix for the state parameters, that is, $m_0$ and $C_0$, respectively.

• “BH” is the total number of hours included in each day. By default, it is set to 24.

This function has the following outputs:

• “quad” is the sum of the quadratic quantities in the log–likelihood of the joint posterior density for all the model parameters, that is, $\sum_{t=1}^{T} e_t' Q_t^{-1} e_t$.

• “ldet” is the sum of the logarithms of the determinant quantities in the log–likelihood of the joint posterior density for $\lambda$, that is, $\sum_{t=1}^{T} \log |Q_t|$.

• “quad.vec” is the vector of the quadratic quantities calculated at each of the time points, that is, $\{e_t' Q_t^{-1} e_t : t = 1, \ldots, T\}$.

• “ldet.vec” is the vector of the logarithms of the determinant quantities at each of the time points, that is, $\{\log |Q_t| : t = 1, \ldots, T\}$.

(2) Function “ffbs.c” is another R interface function to C. It calls a C function, “DLMFFBS”, from R to generate the state parameters, $x_t$, at each MCMC iteration. This function has the following arguments:

• “lat” is the vector of latitudes at the gauged sites.

• “long” is the vector of longitudes at the gauged sites.

• “mcmc.data.matrix” is the MCMC complete data matrix used at each iteration, that is, $Y^{(j-1)}$. By default, its columns represent the responses at each of the time points and its rows the sites.

• “lambda” is the range parameter, $\lambda$, in the DLM.

• “sigma” is the variance parameter, $\sigma^2$, in the DLM.

• “a1” is the phase parameter corresponding to the 24 hour periodicity.

• “a2” is the phase parameter corresponding to the 12 hour periodicity.

• “gamma.vec” contains the fixed hyperparameters, that is, $\gamma = (\tau_y^2, \tau_1^2, \lambda_1, \tau_2^2, \lambda_2)$.

• “m.init” and “C.init” are the initial mean vector and covariance matrix for the state parameters, that is, $m_0$ and $C_0$, respectively.

• “BH” is the total number of hours included in each day. By default, it is set to 24.

• “ctr” is the current index of the iterations in the MCMC algorithm, that is, $j$ in this case.

This function has the following outputs:

• “quad” is the sum of the quadratic quantities in the log–likelihood of the joint posterior density for all the model parameters, that is, $\sum_{t=1}^{T} e_t' Q_t^{-1} e_t$.

• “ldet” is the sum of the logarithms of the determinant quantities in the log–likelihood of the joint posterior density for the model parameters, that is, $\sum_{t=1}^{T} \log |Q_t|$.

(3) Function “GDLM” is an R function that implements the MCMC algorithm for the DLM given by (2.14)–(2.15) in Chapter 2.
This function has the following arguments:

• “srm.data” is the initial data matrix for the MCMC algorithm, in which the missing values are filled in by the spatial regression method.

• “origin.data” is the raw data matrix, including the missing values. By default, its rows represent the stations and its columns the time points.

• “job” is a binary variable. “job=0” means this function will only do the MCMC sampling, while “job=1” means this function will do both the MCMC sampling and the spatial interpolation at ungauged sites.

• “glat” is the vector of latitudes (in degrees) at the gauged sites.

• “glong” is the vector of longitudes (in degrees) at the gauged sites.

• “uglat” is the vector of latitudes (in degrees) at the ungauged sites.

• “uglong” is the vector of longitudes (in degrees) at the ungauged sites.

• “gamma.vec” contains the fixed hyperparameters, that is, $\gamma = (\tau_y^2, \tau_1^2, \lambda_1, \tau_2^2, \lambda_2)$.

• “m.init” and “C.init” are the initial mean vector and covariance matrix for the state parameters, that is, $m_0$ and $C_0$, respectively.

• “BH” is the total number of hours included in each day. By default, it is set to 24.

• “ITER” is the total number of iterations used in the MCMC sampling.

• “tuning.para” is the fixed tuning parameter, $\tau^2$, for the Metropolis–Hastings step.

• “alpha.init” and “beta.init” are the hyperparameters of the prior for the variance parameter, $\sigma^2$, that is, $\alpha_\sigma$ and $\beta_\sigma$, respectively.

• “eta.init” and “delta.init” are the hyperparameters of the prior for the range parameter, $\lambda$.

• “mu.phase.init” and “Sigma.phase.init” are the hyperparameters of the prior for the phase parameters, $(a_1, a_2)$, that is, $\mu^0$ and $\Sigma^0$, respectively.

• “output.direc.name” is the name of the directory in which the user wants to store the output.

This function has the following outputs:

• “lambda” contains the MC samples for the range parameter, $\lambda$.

• “sigma” contains the MC samples for the variance parameter, $\sigma^2$.
• “phase.a1” and “phase.a2” contain the MC samples for the phase parameters, $a_1$ and $a_2$, respectively.

• “accept.ratio” is the acceptance ratio for the range parameter, $\lambda$, in the Metropolis–Hastings step.

• “accept.index” is the index of the accepted iterations in the Metropolis–Hastings step.

• “quad” is the quadratic form at each of the iterations (with only $\lambda = \lambda^{(j)}$ updated and all the other parameters at their $(j-1)$th iteration values).

• “log.det” is the logarithm of the determinant quantity at each of the iterations (with only $\lambda = \lambda^{(j)}$ updated and all the other parameters at their $(j-1)$th iteration values).

• “theta.mat” is the sampling matrix of the state parameters, $x_{1:\mathrm{ITER}}$, from the MCMC algorithm.

(4) Function “GDLM.INIT” is an R function with almost the same settings as “GDLM”. The difference is that “GDLM.INIT” allows the MCMC sampling to start from any given values of the model parameters $\lambda$, $\sigma^2$, $a_1$ and $a_2$. It shares most of its arguments, and all of its outputs, with “GDLM”; it differs in the following four input arguments:

• “lambda.init.value” is the starting value for $\lambda$.

• “sigma.init.value” is the starting value for $\sigma^2$.

• “phase.a1.init.value” is the starting value for $a_1$.

• “phase.a2.init.value” is the starting value for $a_2$.

(5) Function “interpolate.fun” is an R function used to interpolate the response variable at ungauged sites. It has the following arguments:

• “srm.data” is the initial data matrix for the MCMC algorithm, in which the missing values are filled in by the spatial regression method.

• “origin.data” is the raw data matrix, including the missing values. By default, its rows represent the stations and its columns the time points.

• “job” is a binary variable. “job=0” means this function will only do the MCMC sampling, while “job=1” means this function will do both the MCMC sampling and the spatial interpolation at ungauged sites.
• “glat” is the vector of latitudes (in degrees) at the gauged sites.

• “glong” is the vector of longitudes (in degrees) at the gauged sites.

• “uglat” is the vector of latitudes (in degrees) at the ungauged sites.

• “uglong” is the vector of longitudes (in degrees) at the ungauged sites.

• “lambda.mcmc” contains the MCMC samples for the range parameter, $\lambda$.

• “sigma.mcmc” contains the MCMC samples for the variance parameter, $\sigma^2$.

• “phase.a1.mcmc” contains the MCMC samples for the phase parameter, $a_1$.

• “phase.a2.mcmc” contains the MCMC samples for the phase parameter, $a_2$.

• “BURN.IN” is the burn–in period for the MCMC samples of the model parameters.

• “m.init” and “C.init” are the initial mean vector and covariance matrix for the state parameters, that is, $m_0$ and $C_0$, respectively.

• “gamma.vec” contains the fixed hyperparameters, that is, $\gamma = (\tau_y^2, \tau_1^2, \lambda_1, \tau_2^2, \lambda_2)$.

• “BH” is the total number of hours included in each day. By default, it is set to 24.

The output of this function is the set of interpolated values of the response variable at the ungauged sites.

Appendix C Additional Results for Chapter 6

C.1 Additional Results for Section 6.4

Definition C.1.1 Suppose the random matrix response $X : r \times q$ has a matrix normal distribution, denoted by $X \sim N_{r \times q}(M, C, \Sigma)$, where $C : r \times r > 0$ and $\Sigma : q \times q > 0$. Then the probability density function of $X$ is given by
\[
p(X) = (2\pi)^{-rq/2} |C|^{-q/2} |\Sigma|^{-r/2} \exp\Big\{ -\tfrac{1}{2} \mathrm{tr}\big[ (X - M)' C^{-1} (X - M) \Sigma^{-1} \big] \Big\}. \tag{C.1}
\]

Definition C.1.2 Suppose the random matrix $X : q \times q$ is symmetric, positive definite and follows an inverted Wishart distribution with degrees of freedom $\delta$ and scale matrix $S$. Then the probability density function of $X$ is given by
\[
p(X) = k\, |X|^{-(\frac{\delta}{2} + q)} \exp\Big\{ -\tfrac{1}{2} \mathrm{tr}\big[ S X^{-1} \big] \Big\}, \tag{C.2}
\]
where $S$ is positive definite and
\[
k^{-1} = 2^{qv/2}\, \pi^{q(q-1)/4} \prod_{j=1}^{q} \Gamma\Big( \frac{v - j - 1}{2} \Big) |S|^{-v/2},
\]
with $v = \delta + q - 1$.

Proof C.1.1 (Lemma 6.4.2) By the KL expansion and Lemma 6.4.1, we have $\Lambda^{-2} = O \Sigma_S^{-1} O'$ and, moreover, $\Lambda^{-2} \sim W_p(n, I_p)$. Hence, the $\{\lambda_j^{-2} : j = 1, \ldots, p\}$ are mutually independent and $\lambda_j^{-2} \sim W_1(n, 1)$, that is, $\chi_n^2$, for $j = 1, \ldots, p$. □

Proof C.1.2 (Theorem 6.4.1) Given $Y : p \times n \sim N_{p \times n}(0, \Sigma_S \otimes \Sigma_T)$, let $Y^* = Y \Sigma_T^{-1/2}$. Consequently, $Y^* \sim N_{p \times n}(0, \Sigma_S \otimes I_n)$. Similarly, we have $\Sigma_S^{-1/2} Y^* \sim N_{p \times n}(0, I_p \otimes I_n)$. By Lemma 6.4.1, $\Sigma_S^{-1/2} Y^* = OLP$, where $O$ is an orthogonal matrix that is uniformly distributed over the Grassmann manifold, $P$ is an orthogonal frame that is uniformly distributed over the Stiefel manifold, and $L$ is a diagonal matrix with entries $\{l_1, \ldots, l_p\}$ such that $l_1^2, \ldots, l_p^2$ are the eigenvalues of $(\Sigma_S^{-1/2} Y^*)(\Sigma_S^{-1/2} Y^*)'$. Hence, $Y^* = \Sigma_S^{1/2} OLP$. Moreover, since
\[
E[(Y^*)(Y^*)'] = \Sigma_S^{1/2}\, E[O L^2 O']\, \Sigma_S^{1/2} = n \Sigma_S,
\]
the Bayesian EOFs can then be given by $W = \frac{1}{n} \Sigma_S^{1/2} O L$. □

Proof C.1.3 (Theorem 6.4.2) By Definition C.1.2, we have
\[
p(\Sigma_S^{-1}) \propto |\Sigma_S|^{-(\frac{\delta_S}{2} + p)} \exp\Big\{ -\tfrac{1}{2} \mathrm{tr}\big( \Xi_S^{-1} \Sigma_S^{-1} \big) \Big\}.
\]
Given $\Sigma_T$, $Y^* = Y \Sigma_T^{-1/2} \sim N_{p \times n}(0, \Sigma_S \otimes I_n)$. By Definition C.1.1, we have
\[
p(Y^* \mid \Sigma_S) \propto |\Sigma_S|^{-n/2} \exp\Big\{ -\tfrac{1}{2} \mathrm{tr}\big[ Y^* (Y^*)' \Sigma_S^{-1} \big] \Big\}.
\]
Then the posterior distribution for $\Sigma_S^{-1}$ given $Y^*$, that is, given $Y$ for known $\Sigma_T$, is as follows:
\[
\begin{aligned}
p(\Sigma_S^{-1} \mid Y) &\propto p(Y^* \mid \Sigma_S)\, p(\Sigma_S^{-1}) \\
&\propto |\Sigma_S|^{-(\frac{\delta_S + n}{2} + p)} \exp\Big\{ -\tfrac{1}{2} \mathrm{tr}\big[ (\Xi_S^{-1} + Y^* (Y^*)') \Sigma_S^{-1} \big] \Big\}.
\end{aligned}
\]
In other words, $\Sigma_S^{-1} \mid Y \sim W_p(\delta_o, \Xi_o)$, where $\delta_o = \delta_S + n$ and
\[
\Xi_o = \{ \Xi_S^{-1} + Y^* (Y^*)' \}^{-1} = (\Xi_S^{-1} + Y \Sigma_T^{-1} Y')^{-1} = \Xi_S - \Xi_S Y (Y' \Xi_S Y + \Sigma_T)^{-1} Y' \Xi_S.
\]
□

Proof C.1.4 (Theorem 6.4.3) This theorem can be proved in the same way as in Proof C.1.3, so the proof is omitted here. □

Proof C.1.5 (Theorem 6.4.4) Let $V = \Sigma_S^{-1/2} Y$. Then $V \sim N_{p \times n}(0, I_p \otimes \Sigma_T)$. Hence, we have
\[
p(Y \mid \Sigma_S, \mu, \theta) \propto \exp\Big\{ -\tfrac{1}{2} \mathrm{tr}\big[ V' V \rho(\cdot, \theta)^{-1} \big] \Big\}.
\]
Given the prior for $\theta$, $N_k(\theta_0, \Sigma_0)$, the posterior conditional density for $\theta$ can be represented by
\[
\begin{aligned}
p(\theta \mid Y, \Sigma_S, \mu) &\propto p(Y \mid \Sigma_S, \mu, \theta)\, p(\theta) \\
&\propto \exp\Big\{ -\tfrac{1}{2} \big[ \mathrm{tr}\big( V' V \rho(\cdot, \theta)^{-1} \big) + (\theta - \theta_0)' \Sigma_0^{-1} (\theta - \theta_0) \big] \Big\}.
\end{aligned}
\]
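□

Proof C.1.6 (Theorem 6.4.5) (i) Since $Z \sim N_{p \times n}(\mu \otimes \mathbf{1}_n', \Sigma_S \otimes \Sigma_T)$ and $p(\mu) \propto 1$, we have the posterior conditional distribution for $\mu$ as follows:
\[
\begin{aligned}
p(\mu \mid Z, \Sigma_S, \Sigma_T) &\propto p(Z \mid \mu, \Sigma_S, \Sigma_T)\, p(\mu) \\
&\propto \exp\Big\{ -\tfrac{1}{2} \mathrm{tr}\big[ (\mu \otimes \mathbf{1}_n' - Z) \Sigma_T^{-1} (\mu \otimes \mathbf{1}_n' - Z)' \Sigma_S^{-1} \big] \Big\} \\
&\propto \exp\Big\{ -\tfrac{1}{2} \mathrm{tr}\big[ (\mu \otimes \mathbf{1}_n') \Sigma_T^{-1} (\mu \otimes \mathbf{1}_n')' \Sigma_S^{-1} - (\mu \otimes \mathbf{1}_n') \Sigma_T^{-1} Z' \Sigma_S^{-1} - Z \Sigma_T^{-1} (\mu \otimes \mathbf{1}_n')' \Sigma_S^{-1} \big] \Big\} \\
&= \exp\Big\{ -\tfrac{1}{2} \mathrm{tr}\big[ \big( \mu \mu'\, \mathrm{tr}(\mathbf{1}_n' \mathbf{1}_n \Sigma_T^{-1}) - \mu \mathbf{1}_n \Sigma_T^{-1} Z' - Z \Sigma_T^{-1} \mathbf{1}_n' \mu' \big) \Sigma_S^{-1} \big] \Big\} \\
&\propto \exp\Big\{ -\tfrac{1}{2} \mathrm{tr}\big[ (\mu - M) (\Sigma^*)^{-1} (\mu - M)' \Sigma_S^{-1} \big] \Big\},
\end{aligned}
\]
where $\Sigma^* = \{ \mathrm{tr}(\mathbf{1}_n' \mathbf{1}_n \Sigma_T^{-1}) \}^{-1}$ and $M = Z \Sigma_T^{-1} \mathbf{1}_n' \Sigma^*$. Therefore, we have
\[
\mu \mid Z, \Sigma_S, \Sigma_T \sim N_{1 \times p}(M, \Sigma^* \otimes \Sigma_S),
\]
that is, $N_p(M, \Sigma^* \Sigma_S)$, since $\Sigma^*$ is a scalar.

(ii) Let $Y = Z - \mu \otimes \mathbf{1}_n'$. Consequently, we have $Y \Sigma_T^{-1/2} \sim N_{p \times n}(0, \Sigma_S \otimes I_n)$. By Theorem 6.4.2, the posterior distribution for $\Sigma_S^{-1} \mid Z, \Sigma_T$ is $W_p(\delta_o, \Xi_o)$, given by (6.16), where $Y = Z - \mu \otimes \mathbf{1}_n'$.

(iii) Similarly to (ii), by Theorem 6.4.3 the posterior conditional distribution for $\Sigma_T^{-1} \mid Z, \Sigma_S$ is given by (6.18), where $Y = Z - \mu \otimes \mathbf{1}_n'$. □

Proof C.1.7 (Theorem 6.4.6) The proof of this theorem follows that of Theorem 6.4.4, but with $V = \Sigma_S^{-1/2} (Z - \mu \otimes \mathbf{1}_n')$. □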
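
As a quick sanity check on part (i) of Proof C.1.6, it may help to work through the special case $\Sigma_T = I_n$. This case is an assumption made here purely for illustration (the thesis treats a general $\Sigma_T$), and $\mathbf{1}_n$ is read as the vector of $n$ ones, as in that proof. Writing $Z_{ij}$ for the entries of $Z$ and $\bar{Z}_i$ for the mean of its $i$th row, the quantities $\Sigma^*$ and $M$ reduce to
\[
\Sigma^* = \{ \mathrm{tr}(\mathbf{1}_n' \mathbf{1}_n) \}^{-1} = \frac{1}{n},
\qquad
M = \frac{1}{n} Z \mathbf{1}_n' = (\bar{Z}_1, \ldots, \bar{Z}_p)',
\qquad
\bar{Z}_i = \frac{1}{n} \sum_{j=1}^{n} Z_{ij},
\]
so that
\[
\mu \mid Z, \Sigma_S \sim N_p\Big( (\bar{Z}_1, \ldots, \bar{Z}_p)',\ \tfrac{1}{n} \Sigma_S \Big).
\]
This is the familiar flat-prior result for the mean of $n$ independent $N_p(\mu, \Sigma_S)$ columns, which suggests that the general expressions for $M$ and $\Sigma^*$ behave as expected when the temporal correlation is switched off.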