DYNAMIC BAYESIAN MODELSFOR MODELLINGENVIRONMENTAL SPACE{TIMEFIELDSbyYiping DouB.Sc., The XuZhou Normal University, 1997M.Sc., The University of New Brunswick, 2003Ph.D., The University of British Columbia, 2008A THESIS SUBMITTED IN PARTIAL FULFILMENT OFTHE REQUIREMENTS FOR THE DEGREE OFDoctor of PhilosophyinThe Faculty of Graduate Studies(Statistics)The University Of British Columbia(Vancouver)March, 2008c Yiping Dou 2008AbstractThis thesis addresses spatial interpolation and temporal prediction usingair pollution data by several space{time modelling approaches. Firstly, weimplement the dynamic linear modelling (DLM) approach in spatial interpo-lation and ?nd various potential problems with that approach. We developsoftware to implement our approach. Secondly, we implement a Bayesianspatial prediction (BSP) approach to model spatio{temporal ground{levelozone ?elds and compare the accuracy of that approach with that of theDLM. Thirdly, we develop a Bayesian version empirical orthogonal function(EOF) method to incorporate the uncertainties due to temporally varyingspatial process, and the spatial variations at broad{ and ?ne{ scale. Finally,we extend the BSP into the DLM framework to develop a uni?ed Bayesianspatio{temporal model for univariate and multivariate responses. The resultgeneralizes a number of current approaches in this ?eld.iiTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ixAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . xxDedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxii1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 AQS Database and Ground{level Ozone . . . . . . . . . . . . 21.2 Literature Review . . . . . . . . . . . . . . . . . . . . . . . . 41.3 Introduction to the Thesis . . . . . . . . . . . . . . . . . . . 52 Dynamic Linear Modelling . . . . . . . . . . . . . . . . . . . . 82.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 82.2 Space{time Process . . . . . . . . . . . . . . . . . . . . . . . 92.3 Spatio{temporal DLM . . . . . . . . . . . . . . . . . . . . . . 112.4 Kalman Filter and Smoother . . . . . . . . . . . . . . . . . . 112.5 An Illustrative Example: Cluster 2 AQS Database (1995) . . 132.6 Algorithms for Estimating the Model Parameters . . . . . . 172.6.1 Metropolis{within{Gibbs algorithm . . . . . . . . . . 182.6.2 Sampling from p(?;?2;x1:T ja1;a2;y1:T ) . . . . . . . . 192.6.3 Sampling from p(ym1:T j?;?2;x1:T ;yo1:T ) . . . . . . . . . 22iiiTable of Contents2.6.4 Sampling from p(a1;a2jx1:T ;?;?2;y1:T ) . . . . . . . . 232.6.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 242.7 Algorithms for Interpolation and Prediction on UngaugedSites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262.7.1 Sampling the unobserved state parameters . . . . . . 262.7.2 Spatial interpolation at ungauged sites . . . . . . . . 273 Dynamic Linear Modelling and Its Spatial Interpolation . 283.1 Cluster 2 AQS Dataset (1995) Revisited . . . . . . . . . . . . 283.2 Markov Chain Simulation Study . . . . . . . . . . . . . . . . 303.3 Spatial Interpolation . . . . . . . . . . . . . . . . . . . . . . . 323.4 Problems in the DLM . . . . . . . . . . . . . . . . . . . . . . 373.5 Summary and Conclusion . . . . . . . . . . . . . . . . . . . . 444 Multivariate Bayesian Spatial Prediction and Its Spatial In-terpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 524.2 AQS Ozone Database (2000) for the Chicago Area . . . . . . 544.3 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . 574.4 Spatial Interpolation . . . . . . . . . . . . . . . . . . . . . . . 674.5 Spatial Leakage in the DLM . . . . . . . . . . . . . . . . . . 894.6 Conclusion and Discussion . . . . . . . . . . . . . . . . . . . 965 Multivariate Bayesian Spatial Prediction and Its TemporalPrediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 985.1 The Multivariate BSP Approach . . . . . . . . . . . . . . . . 995.2 The DLM Approach . . . . . . . . . . . . . . . . . . . . . . . 1055.3 NAIVEcurrency1 Approach . . . . . . . . . . . . . . . . . . . . . . . . 1065.4 Results and Comparisons . . . . . . . . . . . . . . . . . . . . 1075.5 Conclusion and Discussion . . . . . . . . . . . . . . . . . . . 1096 Bayesian Empirical Orthogonal Function Method . . . . . 1266.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 1266.2 Classical EOFs . . . . . . . . . . . . . . . . . . . . . . . . . . 129ivTable of Contents6.3 Corrected EOFs . . . . . . . . . . . . . . . . . . . . . . . . . 1336.4 Bayesian EOFs . . . . . . . . . . . . . . . . . . . . . . . . . . 1376.5 Extension to the Bayesian EOFs . . . . . . . . . . . . . . . . 1486.6 Simulation Study 2 . . . . . . . . . . . . . . . . . . . . . . . 1496.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1557 An Extension of the BSP: Bayesian Spatio{Temporal Mod-els . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1597.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 1607.2 Univariate Bayesian Spatio{temporal Model . . . . . . . . . 1627.2.1 Type I covariates . . . . . . . . . . . . . . . . . . . . 1657.2.2 Type II covariates . . . . . . . . . . . . . . . . . . . . 1657.2.3 Possible choices for ?K(s) . . . . . . . . . . . . . . . 1667.2.4 Predictive posterior distributions . . . . . . . . . . . 1667.3 The Univariate Bayesian Spatio{temporal Model and Rela-tionships with Others Approaches . . . . . . . . . . . . . . . 1697.3.1 Relationship with the DLM in Huerta et al. (2004) . 1697.3.2 Relationship with the SSM in Wikle & Cressie (1999) 1697.3.3 Relationship with the univariate SSM in Gelfand etal. (2005) . . . . . . . . . . . . . . . . . . . . . . . . . 1707.3.4 Relationship with the BSP model in Le and Zidek(1992) . . . . . . . . . . . . . . . . . . . . . . . . . . 1717.4 A Multivariate Bayesian Spatio{Temporal Model . . . . . . . 1727.5 MCMC Algorithm on the Bayesian Spatio{temporal Models 1737.6 Results and Conclusions . . . . . . . . . . . . . . . . . . . . 1758 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1778.1 Thesis Summary . . . . . . . . . . . . . . . . . . . . . . . . . 1778.2 Future Research Plan . . . . . . . . . . . . . . . . . . . . . . 178Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181vTable of ContentsAppendicesA Additional Results for Chapter 2 . . . . . . . . . . . . . . . . 186A.1 Additional Results for Section 2.6.1 . . . . . . . . . . . . . . 186A.2 Additional Results for Section 2.6.2 . . . . . . . . . . . . . . 187A.3 Additional Results for Section 2.6.4 . . . . . . . . . . . . . . 188A.4 Additional Results for Sections 2.7.1 and 2.7.2 . . . . . . . . 191B Software for Chapter 3 . . . . . . . . . . . . . . . . . . . . . . 193C Additional Results for Chapter 6 . . . . . . . . . . . . . . . . 200C.1 Additional Results for Section 6.4 . . . . . . . . . . . . . . . 200viList of Tables3.1 Posterior summaries for ?; ?2; a1; and a2: . . . . . . . . . . . . . 323.2 Comparisons between the nominal levels and actual predictive cred-ibility interval coverage at the ungauged sites A;:::;F: . . . . . . 333.3 Summary of pairs of \friends" for ungauged and gauged sites. . . . 373.4 Fixed ?currency1 in Study C: . . . . . . . . . . . . . . . . . . . . . . . . 424.1 The relationship between the discount factor (?) and the signal{to{noise ratio (r). . . . . . . . . . . . . . . . . . . . . . . . . . . . 744.2 The greatest circle distance (GCD) between the pairs of ungaugedsites and their closest gauged site(s) in the Chicago's hourly O3?eld. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 754.3 The posterior ellipsoid coverage probabilities at various nominallevels. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 914.4 The mean square predictive error (MSPE) at ungauged sites of themultivariate BSP, DLM, and NAIVE approaches. . . . . . . . . . 915.1 The mean square predictive error (MSPE) of the one{day aheadprediction at the 14 gauged sites by the multivariate BSP, DLM,and NAIVEcurrency1 approaches. The BSP dominates in all but 3 caseswhere it essentially ties with one or another of its competitors. . . 1086.1 Percentage of spatial variation (%) for the ?rst 10 EOFs by thetrue, classical, and corrected methods (? = 0:9). . . . . . . . . . 1396.2 Matrix discrepancies for the classical and corrected EOFs againstthe true EOFs. . . . . . . . . . . . . . . . . . . . . . . . . . . 139viiList of Tables6.3 Percentage of spatial variation (%) for the ?rst 10 EOFs by thetrue, classical, and corrected methods (? = 0:1). . . . . . . . . . 1526.4 Matrix discrepancies for the classical and corrected EOFs againstthe true EOFs (? = 0:1). . . . . . . . . . . . . . . . . . . . . . 1526.5 Percentage of spatial variation (%) for the ?rst 10 EOFs by thetrue, classical, and corrected methods (? = 0:9). . . . . . . . . . 1566.6 Matrix discrepancies for the classical and corrected EOFs againstthe true EOFs (? = 0:9). . . . . . . . . . . . . . . . . . . . . . 156viiiList of Figures2.1 Geographic locations for the 1995 AQS database in US map, wherethe latitude and longitude are measured by degrees. (Diamond =Cluster 1 sites; Upper{triangle = Cluster 2 sites; Down{triangle =Cluster 3 sites.) . . . . . . . . . . . . . . . . . . . . . . . . . . . 142.2 Bayesian periodogram for the square{root of hourly ozone concen-trations at Cluster 2 sites in the AQS database from May 15 toSeptember 11 (1995). . . . . . . . . . . . . . . . . . . . . . . . 153.1 Geographical locations for the ten gauged sites in Cluster 2 and therandomly selected six ungauged sites. (Number = Cluster 2 sitesand letter = ungauged sites.) . . . . . . . . . . . . . . . . . . . . 293.2 Traces of model parameters with the number of iterations of theMarkov chains. The model parameters are: (a) { ?; the rangeparameter; (b) { ?2; the variance parameter; (c) { a1; the phaseparameter with respect to the 24{hour periodicity; and (d) { a2;the phase parameter with respect to the 12{hour periodicity. . . . 313.3 Interpolation at Ungauged Site 4 from the 1st week to the 4th week. 343.4 Interpolation at Ungauged Site 4 from the 5th week to the 8th week. 343.5 Interpolation at Ungauged Site 4 from the 9th week to the 12th week. 353.6 Interpolation at Ungauged Site 4 from the 13th week to the 16th week. 353.7 Interpolation at Ungauged Site 4 from the 17th week to the 120thday. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363.8 Scatterplot for the square{root of hourly ozone concentrations atUngauged Site D and its nearly neighbour, Gauged Site 1: . . . . 38ixList of Figures3.9 Traces of model parameters with number of iterations of the twoMarkov chains. The model parameters are: (a) ??; the range pa-rameter; (b) ??2; the variance parameter; (c) ?a1; the phase pa-rameter with respect to the 24?hour periodicity; and (d) ?a2; thephase parameter with respect to the 12?hour periodicity. . . . . . 393.10 Histogram (left panel), ACF (middle panel) and PACF (right panel)of model parameters of the Markov chains after a burn{in periodof 1;000 iterations. The model parameters are: (i) ?rst row: { ?;the range parameter; (ii) second row: { ?2; the variance parameter;(iii) third row: { a1; the phase parameter with respect to the 24{hour periodicity; and (iv) last row: { a2; the phase parameter withrespect to the 12{hour periodicity. . . . . . . . . . . . . . . . . . 403.11 Scatterplots for model parameters' pairs: (a) ? v.s. ?2; (b) ? v.s.a1; (c) ? v.s. a2; (d) ?2 v.s. a1; (e) ?2 v.s. a2; and (f) a1 v.s. a2: . 413.12 Scatterplot for ? against ?2 given one{week{data only, constructedfrom MCMC samples starting from same initial values. . . . . . . 433.13 Coverage probabilities v.s. 95% nominal level for ungauged sites:(a) { site A; (b) { site B; (c) { site C; (d) { site D; (e) { site E; and(f) { site F. These coverage probabilities are computed according toStudy A: weekly data (dot with solid line); Study B: W1:17 (squarewith dot line); and Study C: W1:17 but with ?xed ?currency1 (star withdashed line). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443.14 Coverage probability versus 90% nominal level for ungauged sites:(a) { site A; (b) { site B; (c) { site C; (d) { site D; (e) { site E; and(f) { site F. These coverage probabilities are computed according toStudy A: weekly data (dot with solid line); Study B: W1:17 (squarewith dot line); and Study C: W1:17 but with ?xed ?currency1 (star withdashed line). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45xList of Figures3.15 Coverage probability versus 80% nominal level for ungauged sites:(a) { site A; (b) { site B; (c) { site C; (d) { site D; (e) { site E; and(f) { site F. These coverage probabilities are computed according toStudy A: weekly data (dot with solid line); Study B: W1:17 (squarewith dot line); and Study C: W1:17 but with ?xed ?currency1 (star withdashed line). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463.16 Coverage probability versus 70% nominal level for ungauged sites:(a) { site A; (b) { site B; (c) { site C; (d) { site D; (e) { site E; and(f) { site F. These coverage probabilities are computed according toStudy A: weekly data (dot with solid line); Study B: W1:17 (squarewith dot line); and Study C: W1:17 but with ?xed ?currency1 (star withdashed line). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473.17 Coverage probability versus 60% nominal level for ungauged sites:(a) { site A; (b) { site B; (c) { site C; (d) { site D; (e) { site E; and(f) { site F. These coverage probabilities are computed accordingto Study A: weekly data (dot with solid line); Study B: W1:17 (square with dot line); and Study C: W1:17 but with ?xed ?currency1 (starwith dashed line). . . . . . . . . . . . . . . . . . . . . . . . . . 483.18 Coverage probability versus 50% nominal level for ungauged sites:(a) { site A; (b) { site B; (c) { site C; (d) { site D; (e) { site E; and(f) { site F. These coverage probabilities are computed according toStudy A: weekly data (dot with solid line); Study B: W1:17 (squarewith dot line); and Study C: W1:17 but with ?xed ?currency1 (star withdashed line). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493.19 Coverage probability versus 40% nominal level for ungauged sites:(a) { site A; (b) { site B; (c) { site C; (d) { site D; (e) { site E; and(f) { site F. These coverage probabilities are computed according toStudy A: weekly data (dot with solid line); Study B: W1:17 (squarewith dot line); and Study C: W1:17 but with ?xed ?currency1 (star withdashed line). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50xiList of Figures4.1 Geographical locations for the Chicago AQS database (2000), wherethe latitude and longitude are measured in degrees. (? = G =gauged sites and ? = UG = ungauged sites.) . . . . . . . . . . . 554.2 Boxplots for the rates of: (a) missing measurements; and (b) zeromeasurements, at 24 monitoring stations in the Chicago AQS database.(G = gauged sites and UG = ungauged sites.) . . . . . . . . . . 564.3 Boxplots for the square{root of hourly ozone concentrations (pppb)at 24 monitoring stations in the Chicago AQS database. (G1 =Gauged Site 1; UG1 = Ungauged Site 1; and so on.) . . . . . . . 574.4 The weekday e?ect of the square{root of hourly ozone concentra-tions (pppb) at the 14 gauged sites in the Chicago AQS database.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 584.5 The hourly e?ect of the square{root of hourly ozone concentrations(pppb) at the 14 gauged sites in the Chicago AQS database. . . . 594.6 The estimated spatial correlations of: (a){detrended residuals, and(b){deAR'd residuals; between gauged sites. . . . . . . . . . . . 684.7 The PACF plots for the square{root of hourly ozone concentrations(pppb) at the 14 gauged sites in the Chicago's area AQS database. 704.8 Boxplots for the spatial correlations of the detrended residuals (De-trended), and the estimated spatial correlations using the square{root of hourly ozone concentrations during the hours of: 9 A.M.to 10 A.M. (10:11), 8 A.M. to 10 A.M. (9:11), 7 A.M. to 10 A.M.(8:11), 6 A.M. to 10 A.M. (7:11), and 5 A.M. to 10 A.M. (6:11),respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 714.9 Interpolation at Ungauged Site 7 from the 1st week to the 2nd week.The square{root of hourly ozone concentrations are plotted on thevertical axes and hours, on the horizontal axes. [Solid (dotdashed)lines = interpolation and 95% pointwise predictive intervals by theBSP; dash (dot) lines = interpolation and 95% predictive intervalsby the DLM; + = interpolation by NAIVE; and ? = observationsat Ungauged Site 7.] . . . . . . . . . . . . . . . . . . . . . . . . 77xiiList of Figures4.10 Interpolation at Ungauged Site 7 from the 3rd week to the 4th week.The square{root of hourly ozone concentrations are plotted on thevertical axes and hours, on the horizontal axes. [Solid (dotdashed)lines = interpolation and 95% pointwise predictive intervals by theBSP; dash (dot) lines = interpolation and 95% predictive intervalsby the DLM; + = interpolation by NAIVE; and ? = observationsat Ungauged Site 7.] . . . . . . . . . . . . . . . . . . . . . . . . 784.11 Interpolation at Ungauged Site 7 from the 5th week to the 6th week.The square{root of hourly ozone concentrations are plotted on thevertical axes and hours, on the horizontal axes. [Solid (dotdashed)lines = interpolation and 95% pointwise predictive intervals by theBSP; dash (dot) lines = interpolation and 95% predictive intervalsby the DLM; + = interpolation by NAIVE; and ? = observationsat Ungauged Site 7.] . . . . . . . . . . . . . . . . . . . . . . . . 794.12 Interpolation at Ungauged Site 7 from the 7th week to the 8th week.The square{root of hourly ozone concentrations are plotted on thevertical axes and hours, on the horizontal axes. [Solid (dotdashed)lines = interpolation and 95% pointwise predictive intervals by theBSP; dash (dot) lines = interpolation and 95% predictive intervalsby the DLM; + = interpolation by NAIVE; and ? = observationsat Ungauged Site 7.] . . . . . . . . . . . . . . . . . . . . . . . . 804.13 Interpolation at Ungauged Site 7 from the 9th week to the 10th week.The square{root of hourly ozone concentrations are plotted on thevertical axes and hours, on the horizontal axes. [Solid (dotdashed)lines = interpolation and 95% pointwise predictive intervals by theBSP; dash (dot) lines = interpolation and 95% predictive intervalsby the DLM; + = interpolation by NAIVE; and ? = observationsat Ungauged Site 7.] . . . . . . . . . . . . . . . . . . . . . . . . 81xiiiList of Figures4.14 Interpolation at Ungauged Site 7 from the 11th week to the 12thweek. The square{root of hourly ozone concentrations are plottedon the vertical axes and hours, on the horizontal axes. [Solid (dot-dashed) lines = interpolation and 95% pointwise predictive intervalsby the BSP; dash (dot) lines = interpolation and 95% predictiveintervals by the DLM; + = interpolation by NAIVE; and ? = ob-servations at Ungauged Site 7.] . . . . . . . . . . . . . . . . . . 824.15 Interpolation at Ungauged Site 7 from the 13th week to the 14thweek. The square{root of hourly ozone concentrations are plottedon the vertical axes and hours, on the horizontal axes. [Solid (dot-dashed) lines = interpolation and 95% pointwise predictive intervalsby the BSP; dash (dot) lines = interpolation and 95% predictiveintervals by the DLM; + = interpolation by NAIVE; and ? = ob-servations at Ungauged Site 7.] . . . . . . . . . . . . . . . . . . 834.16 Interpolation at Ungauged Site 7 from the 15th week to the 16thweek. The square{root of hourly ozone concentrations are plottedon the vertical axes and hours, on the horizontal axes. [Solid (dot-dashed) lines = interpolation and 95% pointwise predictive intervalsby the BSP; dash (dot) lines = interpolation and 95% predictiveintervals by the DLM; + = interpolation by NAIVE; and ? = ob-servations at Ungauged Site 7.] . . . . . . . . . . . . . . . . . . 844.17 The observed square{root of ozone concentrations (pppb) duringthe 1st week, the interpolation using the multivariate BSP, DLMand NAIVE approaches, and the 95% pointwise predictive intervalsusing the multivariate BSP and DLM approaches at Ungauged Site10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 854.18 The observed square{root of ozone concentrations (pppb) duringthe 10th week, the interpolation using the multivariate BSP, DLMand NAIVE approaches, and the 95% pointwise predictive intervalsusing the multivariate BSP and DLM approaches at Ungauged Site10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86xivList of Figures4.19 The observed square{root of ozone concentrations (pppb) duringthe 1st week, the interpolation using the multivariate BSP, DLMand NAIVE approaches, and the 95% pointwise predictive intervalsusing the multivariate BSP and DLM approaches at Ungauged Site9. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 874.20 The observed square{root of ozone concentrations (pppb) duringthe 1st week, the interpolation using the multivariate BSP, DLMand NAIVE approaches, and the 95% pointwise predictive intervalsusing the multivariate BSP and DLM approaches at Ungauged Site8. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 884.21 The observed square{root of ozone concentrations (pppb) duringthe 1st week, the interpolation using the multivariate BSP, DLMand NAIVE approaches, and the 95% pointwise predictive intervalsusing the multivariate BSP and DLM approaches at Ungauged Site2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 894.22 Boxplot of the simultaneous posterior ellipsoid credibility regionsat various nominal levels. . . . . . . . . . . . . . . . . . . . . . . 904.23 The ratio of MSPE of the interpolation by NAIVE to that of themultivariate BSP for each of: (a) the 17 weeks; and (b) the 10ungauged sites. . . . . . . . . . . . . . . . . . . . . . . . . . . 924.24 The ratio of MSPE of the interpolation by the DLM to that ofthe multivariate BSP for each of: (a) the 17 weeks; and (b) the 10ungauged sites. . . . . . . . . . . . . . . . . . . . . . . . . . . 934.25 Side{by{side boxplots of the coverage probabilities of the multi-variate BSP and DLM approaches plotted against the 10 ungaugedsites, respectively. . . . . . . . . . . . . . . . . . . . . . . . . . 944.26 Side{by{side boxplots of the coverage probabilities of the multivari-ate BSP and DLM approaches plotted against the time span of 17weeks, respectively. . . . . . . . . . . . . . . . . . . . . . . . . 95xvList of Figures5.1 The observed square{root of ozone concentrations (pppb) fromday 114 to day 121, the predicted values using the multivariateBSP, DLM and NAIVEcurrency1 approaches, and the 95% pointwise pre-dictive intervals using the multivariate BSP and DLM approachesat Gauged Site 1. . . . . . . . . . . . . . . . . . . . . . . . . . 1095.2 The observed square{root of ozone concentrations (pppb) fromday 114 to day 121, the predicted values using the multivariateBSP, DLM and NAIVEcurrency1 approaches, and the 95% pointwise pre-dictive intervals using the multivariate BSP and DLM approachesat Gauged Site 2. . . . . . . . . . . . . . . . . . . . . . . . . . 1105.3 The observed square{root of ozone concentrations (pppb) fromday 114 to day 121, the predicted values using the multivariateBSP, DLM and NAIVEcurrency1 approaches, and the 95% pointwise pre-dictive intervals using the multivariate BSP and DLM approachesat Gauged Site 3. . . . . . . . . . . . . . . . . . . . . . . . . . 1115.4 The observed square{root of ozone concentrations (pppb) fromday 114 to day 121, the predicted values using the multivariateBSP, DLM and NAIVEcurrency1 approaches, and the 95% pointwise pre-dictive intervals using the multivariate BSP and DLM approachesat Gauged Site 4. . . . . . . . . . . . . . . . . . . . . . . . . . 1125.5 The observed square{root of ozone concentrations (pppb) fromday 114 to day 121, the predicted values using the multivariateBSP, DLM and NAIVEcurrency1 approaches, and the 95% pointwise pre-dictive intervals using the multivariate BSP and DLM approachesat Gauged Site 5. . . . . . . . . . . . . . . . . . . . . . . . . . 1135.6 The observed square{root of ozone concentrations (pppb) fromday 114 to day 121, the predicted values using the multivariateBSP, DLM and NAIVEcurrency1 approaches, and the 95% pointwise pre-dictive intervals using the multivariate BSP and DLM approachesat Gauged Site 6. . . . . . . . . . . . . . . . . . . . . . . . . . 114xviList of Figures5.7 The observed square{root of ozone concentrations (pppb) fromday 114 to day 121, the predicted values using the multivariateBSP, DLM and NAIVEcurrency1 approaches, and the 95% pointwise pre-dictive intervals using the multivariate BSP and DLM approachesat Gauged Site 7. . . . . . . . . . . . . . . . . . . . . . . . . . 1155.8 The observed square{root of ozone concentrations (pppb) fromday 114 to day 121, the predicted values using the multivariateBSP, DLM and NAIVEcurrency1 approaches, and the 95% pointwise pre-dictive intervals using the multivariate BSP and DLM approachesat Gauged Site 8. . . . . . . . . . . . . . . . . . . . . . . . . . 1165.9 The observed square{root of ozone concentrations (pppb) fromday 114 to day 121, the predicted values using the multivariateBSP, DLM and NAIVEcurrency1 approaches, and the 95% pointwise pre-dictive intervals using the multivariate BSP and DLM approachesat Gauged Site 9. . . . . . . . . . . . . . . . . . . . . . . . . . 1175.10 The observed square{root of ozone concentrations (pppb) fromday 114 to day 121, the predicted values using the multivariateBSP, DLM and NAIVEcurrency1 approaches, and the 95% pointwise pre-dictive intervals using the multivariate BSP and DLM approachesat Gauged Site 10. . . . . . . . . . . . . . . . . . . . . . . . . 1185.11 The observed square{root of ozone concentrations (pppb) fromday 114 to day 121, the predicted values using the multivariateBSP, DLM and NAIVEcurrency1 approaches, and the 95% pointwise pre-dictive intervals using the multivariate BSP and DLM approachesat Gauged Site 11. . . . . . . . . . . . . . . . . . . . . . . . . 1195.12 The observed square{root of ozone concentrations (pppb) fromday 114 to day 121, the predicted values using the multivariateBSP, DLM and NAIVEcurrency1 approaches, and the 95% pointwise pre-dictive intervals using the multivariate BSP and DLM approachesat Gauged Site 12. . . . . . . . . . . . . . . . . . . . . . . . . 120xviiList of Figures5.13 The observed square{root of ozone concentrations (pppb) fromday 114 to day 121, the predicted values using the multivariateBSP, DLM and NAIVEcurrency1 approaches, and the 95% pointwise pre-dictive intervals using the multivariate BSP and DLM approachesat Gauged Site 13. . . . . . . . . . . . . . . . . . . . . . . . . 1215.14 The observed square{root of ozone concentrations (pppb) fromday 114 to day 121, the predicted values using the multivariateBSP, DLM and NAIVEcurrency1 approaches, and the 95% pointwise pre-dictive intervals using the multivariate BSP and DLM approachesat Gauged Site 14. . . . . . . . . . . . . . . . . . . . . . . . . 1225.15 The width of the 95% pointwise predictive intervals of the one{dayahead prediction at the 14 gauged sites using the multivariate BSPapproach. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1235.16 The width of the 95% pointwise predictive intervals of the one{dayahead prediction at the 14 gauged sites using the DLM approach. 1245.17 Boxplots of the coverage probabilities using the DLM and multi-variate BSP approaches at the 95% nominal level. . . . . . . . . 1256.1 Contour plot for the simulated data at day t = 5 in the 18 ? 18grid locations. The AR coe?cient in the simulated data is set tobe ? = 0:9: (White=-4.0; Black=4.0.) . . . . . . . . . . . . . . . 1336.2 Contour plot for the simulated data at day t = 28 in the 18 ? 18grid locations. The AR coe?cient in the simulated data is set tobe ? = 0:9: (White=-4.0; Black=4.0.) . . . . . . . . . . . . . . . 1346.3 Histogram (?rst row), ACFs (second row) and PACFs (third row)for the simulated data at four randomly selected sites in the region.The AR coe?cient in the simulated data is set to be ? = 0:9: . . . 1356.4 Contour plots for the ?rst two true EOF vectors: (a) { 1st EOF;and (b) { 2nd EOF. (White=-1.7; Black=1.4.) . . . . . . . . . . 1366.5 Contour plots for the ?rst two classical EOF vectors: (a) { 1st EOF;and (b) { 2nd EOF. (White=-1.7; Black=1.4.) . . . . . . . . . . 1376.6 Contour plots for the ?rst two corrected EOF vectors: (a) { 1stEOF; and (b) { 2nd EOF. (White=-1.7; Black=1.4.) . . . . . . . 138xviiiList of Figures6.7 Contour plots for the ?rst 6 true EOFs (? = 0:1): (a) { 1st EOF;(b) { 2nd EOF; (c) { 3rd EOF; (d) { 4th EOF; (e) { 5th EOF; and(f) { 6th EOF. (White=-0.6; Black=0.9.) . . . . . . . . . . . . . 1536.8 Contour plots for the ?rst 6 classical EOFs (? = 0:1): (a) { 1stEOF; (b) { 2nd EOF; (c) { 3rd EOF; (d) { 4th EOF; (e) { 5th EOF;and (f) { 6th EOF. (White=-0.6; Black=0.9.) . . . . . . . . . . . 1546.9 Contour plots for the ?rst 6 corrected EOFs (? = 0:1): (a) { 1stEOF; (b) { 2nd EOF; (c) { 3rd EOF; (d) { 4th EOF; (e) { 5th EOF;and (f) { 6th EOF. (White=-0.6; Black=0.9.) . . . . . . . . . . . 1556.10 Contour plots for the ?rst 6 classical EOFs (? = 0:9): (a) { 1stEOF; (b) { 2nd EOF; (c) { 3rd EOF; (d) { 4th EOF; (e) { 5th EOF;and (f) { 6th EOF. (White=-1.6; Black=2.2.) . . . . . . . . . . 1576.11 Contour plots for the ?rst 6 corrected EOFs (? = 0:9): (a) { 1stEOF; (b) { 2nd EOF; (c) { 3rd EOF; (d) { 4th EOF; (e) { 5th EOF;and (f) { 6th EOF. (White=-1.6; Black=2.2.) . . . . . . . . . . . 158xixAcknowledgementsI am most grateful to my co{supervisors, Constance van Eeden, Nhu Le andJim Zidek, the ?nancial support they provided as well as the ?exible work-ing environment they provided. I greatly appreciated the encouragementprovided by all three of these individuals.I bene?ted from my numerous discussions with Jim, who provided inspi-ration and encouraged my growth through his excellence guidance through-out my PhD studies. Jim's patience, valuable suggestions and commentscontributed greatly to making this thesis feasible. I am honoured to havethe chance to work with him.Discussions with Nhu Le proved invaluable and I bene?ted greatly fromhis comments and his inspired suggestions. In particular, I pro?ted from hisvery perceptive insights on how my work related to the realities I set out todescribe.I cannot thank Constance enough for her wise counsel, especially on heradvice on technical writing and teaching. She constantly encouraged meboth academically and personally.I would like to thank the other member of my advisory committee, PaulGustafson, for his insightful comments and suggestions. I would like toextend my thanks to all my thesis committee for their prompt responsesand suggestions on my thesis during the holiday.I am indebted for Harry Joe and Weiliang Qiu for their help on providingsome of the C codes used in the software that has been developed in thisthesis. I would like to thank Nancy Heckman for her invaluable suggestionswhen I was trying to solve a problem in my research, and Rick White forhis advice on using the C++ language.I would like to thank Prasad Kasibhatla of Nicholas School of Environ-xxAcknowledgementsment of Duke University for providing the database used for an implemen-tation in the thesis.Let me say \thank you" to the Department of Statistics for providing anice environment for my growth. I appreciate the chance to study in thisinstitute and all the help I got from the Department's faculty.I would also like to thank Christine Graham, Rhoda Morgan, ElaineSalameh, Peggy Ng, and Viena Tran for all their help with administrativematters.Many thanks to my department fellows for valuable discussions withthem.I would like to thank my parents and my grandparents for their under-standing and support throughout my PhD program. I would also like tothank my brother, Yiwen Dou, for his understanding and e?orts to takecare of family issues during the years I studied in Vancouver.YIPING DOUThe University of British ColumbiaMarch 2008xxiTo Xiulan, Benju, Huazhu, and YuanliexxiiChapter 1IntroductionThis thesis addresses one topic in four themes. More speci?cally, we developa uni?ed fully Bayesian hierarchical modelling approach to the interpolationand prediction of univariate (respectively multivariate) response variables(respectively vectors) in spatial{temporal ?elds, or space{time ?elds. Theimportance of the prediction of certain responses turn out to be vital forhuman health. Moreover, people tend to ?nd faster, cheaper and bettermethod for prediction. These all motivates studies in this thesis.Many researchers in these areas face responses with spatial structurethat changes over time in a dynamic fashion in the sense that the underlyingprocess varies in space and time (Wikle and Royle, 2004). The associatedrandom ?eld is called a \space{time ?eld". Measurements taken on that?eld yield so{called spatio{temporal data because of stochastic dependencerelationships that are both spatial and temporal in nature. Cressie (1993)de?nes a space{time ?eld as a set of stochastic processes over space thatdi?er over time. Space{time modelling requires that we deal with space{time data for a variety of purposes, using various methods.The ubiquity of such processes has led to a rich research literature onspace{time data problems spread over diverse ?elds such as environmentalhealth, climatology, epidemiology and ecology. For example, we may wishto investigate the relationship between air pollution and health outcomes,such as, asthma or chronic obstructive pulmonary disease. We may recordthe measurements at a number of monitoring sites within the study region.Each of those sites may: measure di?erent sets of pollutants; contain missingdata; have a startup time that di?ers from those of other monitoring sitesin the network.To model such data, we use a stochastic space{time model to capture the1Chapter 1. Introductiondependence between pollutants, spatially and temporally. That dependencemay derive from other processes than just those associated with the pollu-tants themselves. For example, the dependence between pollutants could bedue to wind direction and speed since some pollutants such as the ground{level ozone concentrations may spread with the wind. Temporal dependencemay be due to the atmospheric processes that generate the pollutants. Spa-tial dependence can yield a high correlation between the responses for eachpollutant at di?erent monitoring sites. Monitors in geographical proximitytend to be highly correlated unlike those that are far apart. These correla-tions among the monitoring sites in our application seem relatively constantover time.Section 1.1 introduces the AQS database used in this thesis and somefeatures of ground{level ozone concentrations. Section 1.2 describes somebackground literature on the topic this thesis addresses. Section 1.3 presentsthe plan of this thesis.1.1 AQS Database and Ground{level OzoneThe Air Quality System (AQS) database contains measurements of air pol-lutant concentrations in the United States, for both criteria air pollutantsand hazardous air pollutants. The former are of more concern as they areregulated under the US Clean Air Act of 1970 to protect human health andwelfare. The US Environmental Protection Agency (EPA) set air qualitystandards for six criteria air pollutants: Carbon Monoxide (CO), Nitro-gen Dioxide (NO2), Sulfur Dioxide (SO2), Ozone (O3), Particulate Matter(PM10 and PM2.5), and Lead (Pb).Ozone is good up bad down in terms of the environment and health, thatis, ozone could be good or bad depending on its location in the atmosphere.\Good" ozone occurs in the high altitude, about 6 to 30 miles from thesurface of the Earth, also called stratosphere ozone. Stratosphere ozone helpsreduce the harmful ultraviolet (UV) rays to protect life on Earth. \Bad"ozone, one of the six principal pollutants set by EPA that occurs closestto the surface of the Earth, often within 6 miles, also called ground{level2Chapter 1. Introductionozone or troposphere ozone. Ground{level ozone is \created by chemicalreactions between oxides of nitrogen (Nox) and volatile organic compounds(VOC) in the presence of sunlight"1. Ground{level ozone concentrationsare often high, annually in summer and daily in late morning and earlyafternoon. Ground{level ozone's absorption in the ultraviolet spectrum isapproximately 250 nanometers (nm). Ozone is measured by comparing thedegree of UV light absorption through a ?ow cell with ozone{free air.2.Ozone can also be measured by di?erential optical absorption spectroscopy(DOAS) instrumentation.The US EPA sets both primary and secondary standards for ground{level ozone at 0.08 parts per million (ppm) by volume. We use parts perbillion, the unit of ozone levels, instead of parts per million in order to beconsistent with other studies about ozone (while noting that these units arenot universally acceptable since Europe de?nes \billion" di?erently thanNorth America, for example). Only one{hour measured ground{level ozoneconcentrations are considered in this thesis and so we have the primary andsecondary standards3 of 80 ppb.Ground{level ozone concentrations are measurements of a space{timeprocess in space{time ?elds where the \true" ozone levels change over timeand monitoring locations. Uncertainty occurs between the measurementsand the things to be measured, that is, the \true" ozone levels. We considerall these features in a Bayesian framework, a framework chosen for its great?exibility. More speci?cally, in this thesis, we consider fully hierarchicalBayesian models for the prediction of the ground{level ozone concentrationsusing AQS database or a simulated database as our application that helps1See the following link at US EPA: http://www.epa.gov/air/ozonepollution/basic.html.2See the link at: http://www.epa.qld.gov.au/environmental management/air/air quality monitoring/air pollutants/ozone/.3According to EPA at http://www.epa.gov/air/ozonepollution/standards.html:? Primary standards are the limits set \to protect public health, including the healthof `sensitive' populations such as asthmatics, children, and the elderly."? Secondary standards are the limits set \to protect public welfare, including pro-tection against visibility impairment, damage to animals, crops, vegetation, andbuildings."3Chapter 1. Introductionus assess our models while being of great substantive importance.1.2 Literature ReviewMultivariate models for vectors of pollutants over networks of monitoringsites prove to be much more powerful than their univariate counterparts.(See Gelfand et al. (2005) for one recent approach to the development ofsuch models using a linear coregionalization method within a dynamic lin-ear modeling (DLM) framework to predict the univariate and multivariateresponses in space{time domains.) By combining all the information fromdi?erent pollutants at multiple locations, we borrow strength and gain a bet-ter understanding of the levels of pollution at these sites. Furthermore, weobtain more accurate predictors of the pollutants at ungauged sites (unmon-itored sites). Moreover, such models enable us to accommodate other sitespeci?c responses in our multivariate framework. For instance, we know thattemperature a?ects ground{level ozone concentrations; ozone levels tend tobe higher in summer than winter. Thus temperature can be incorporateddirectly in the model even though it would usually be regarded merely as acovariate.Another feature of space{time ?elds receiving increased attention in re-cent literature is nonstationarity. (See De?nition 2.2.2 in Chapter 2.) Non-stationarity can be due to the correlation (or covariance) varying with dif-ferent site{features or heterogeneity in pollutant levels. Higdon (1998) pro-poses a process convolution approach to de?ne a nonstationary process usingbasis function expansions. Following that, a dynamic process convolutionmethod is proposed by Calder (2004). Calder and Cressie (2007) review var-ious types of convolution{based models for spatial data, in which they citeFuentes's work (Fuentes, 2002). Fuentes (2002) models the nonstationaryprocess through the convolution method by de?ning that process to be amixture of stationary processes at small subregion. Sampson and Guttorp(1992) propose a deformation approach for capturing process non-stationarywhile Damian et al. (2002) o?er a Bayesian version of it. These covariancemodels ?t in well with the Bayesian hierarchical models proposed by Brown4Chapter 1. Introductionet al. (1994) who de?ne and implement the generalized inverted Wishartdistribution particularly to construct covariance models for patterned datasuch as those that exhibit a monotone (staircase) pattern.We use a DLM approach to deal with the problem of nonstationarity,amongst other things. In particular, we propose a class of fully hierarchi-cal Bayesian models that accounts for parameter uncertainty and addressesthe curse of dimensionality in spatio{temporal modelling. In fact, one ofthe greatest di?culties arising in implementing the above approaches forspatial{temporal ?elds is the computational burden and ine?ciency, espe-cially in high{dimensional systems with irregularly located monitoring sta-tions as well as sparse data. Approaches to tackling this problem includeBayesian kriging approach by Wikle and Cressie (1999), the process convo-lution method by Hidgon (1998), and the spatial dynamic factor approachby Lopes et al. (2007). As an extension of Wikle and Cressie, we investigatea fully Bayesian approach to construct the local principal spatial patternsthrough an EOF approach.1.3 Introduction to the ThesisMy thesis contains four main themes. The ?rst implements a version of theDLM model proposed by Huerta et al. (2004) to model an AQS database forthe ozone concentrations at one cluster of 10 monitoring stations over theentire summer of 1995. This implementation, along with the backgroundknowledge for the thesis is introduced in Chapters 2 and 3. In Chapter 2,we ?rst review the de?nition and properties of spatio{temporal processesand of Kalman ?ltering and smoothing methods in Gaussian DLM frame-work. We then introduce the AQS database (1995) and an implementationof an exploratory data analysis (EDA) based on that database. Finally, wedemonstrate use of MCMC algorithms for spatial interpolation and tempo-ral prediction at ungauged sites. Following the theory presented in Chapter2, we implement the DLM to model the hourly ozone concentrations basedon the database in Chapter 3. We use the software developed for this the-sis, GDLM.1.0, to complete this implementation. We also present potential5Chapter 1. Introductionproblems in applying this method to ground{level ozone concentrations atthat space{time domain. More discussions on this implementation of theDLM can be referred to Dou et al. (2007).Computational ine?ciency proves a critical problem in using the AQSdatabase (1995) for ozone studies { the DLM is just not scalable to largespace{time domains as our work will show. Thus we explore use of analternative, the BSP (Bayesian spatial prediction after pre?ltering) or Le{Zidek approach that we implement for another AQS database (2000) tospatially interpolate and temporally predict the ground{level ozone concen-trations. In Chapter 4, we ?rst introduce related literature work and theAQS database for the Chicago area. We then demonstrate the methodologyof the BSP approach. We show the existence of the spatial leakage problem(Le and Zidek, 1999) in the DLM framework, a newly result of this thesis.We summarize the results on spatial interpolation and compare them withthe results using the DLM at the end of this chapter.Le & Zidek (2006, p.131{183) suggest that a di?erent modelling approachwould be needed for the temporal prediction of univariate and multivariateresponses in spatio{temporal ?elds. However in Chapter 5, we show how todo this with a further modelling step so that the BSP can in fact be madeto yield one{day{ahead temporal forecasts for ground{level ozone concen-trations. The temporal prediction results using the BSP approach are thencompared with that of the DLM and another alternative called \NAIVE".We summarize these comparisons at the end of this chapter, and concludeon the advantages and disadvantages of the BSP and DLM approaches.Because of enormous computational time savings of the BSP over theDLM approach, we chose to extend and re?ne the BSP, in particular, toincorporate both broad and ?ne scale spatial variations, while incorporatingautocorrelation in the time series at each of the sites that of interest. InChapter 6, we ?rst show that the way EOFs ([Nancy] empirical orthogonalfunctions) are traditionally computed may be misleading when the time se-ries data autocorrelated in a simulation study. [Nancy: The EOF methodis used here for purpose of an extension into a fully Bayesian framework be-cause of its widely and intensively usage in scienti?c community.] Assuming6Chapter 1. Introductiona known temporal covariance function, we show the corrected EOF methodbetter capturing main spatial patterns than the classical one. Since this co-variance will always be unknown and so uncertain in practice, we propose aBayesian EOF method to represent that uncertainty. In the second simula-tion study, we compare both the classical and corrected EOFs with the trueEOFs, assuming a known and separable spatial{temporal covariance func-tion. However we leave the implementation of the Bayesian EOF method.That implementation can use the MCMC algorithms already developed inthis chapter.Finally, the ?exible, general structure of the DLM allows us to inte-grate the BSP approach into the DLM framework. Chapter 7 proposes auni?ed Bayesian spatio{temporal model for univariate and multivariate re-sponses. Using this new model, we can decompose data variations into threecomponents: long{term spatio{temporal; short{term principal spatial; andshort{term spatio{temporal components. The short{term spatio{temporalcomponents can be modelled as an BSP term. This very general, ?exiblemodel accounts for temporal correlation in the data and so allows us toupdate the information on those parameters as new data come in. To main-tain computational speed, the Bayesian EOF has been implemented in thismodel to capture the local{term principal spatial patterns in the detrendedspatio{temporal residual ?elds. We show the model's generality and ?exi-bility by investigating that some well{known related models turn out to bespecial cases of ours. Those related models include the DLM by Huerta etal. (2004), Wikle and Cressie (1999), Gelfand et al. (2005), and of course,the BSP approach itself from Le and Zidek (1992). Our model also incorpo-rates certain computational e?ciencies from theoretical results we obtain inthis thesis. In particular, we develop the MCMC algorithm to draw samplesfrom the joint posterior distribution of model parameters. We can imple-ment our model using this algorithm. However that implementation will beleft to future work.Finally in Chapter 8, we discuss future work ?owing from the work ofthis thesis as well as possible directions for their solutions.7Chapter 2Dynamic Linear Modelling2.1 IntroductionWe are particularly interested in the study for ozone concentrations, due toits importance for the environment. Without the shielding layer of strato-sphere ozone, ultraviolet (UV) radiation would harm life on Earth. The in-creasing incidences of skin cancers, for example, may be linked to a thinningof Earth's ozone level. On the other hand, exceedingly high troposphericozone levels may cause some other diseases, for instance, eye irritation andcardiovascular diseases. We wish to study the ozone levels to more com-pletely understand ozone today to better predict them in the future.Ozone concentrations are a spatio{temporal ?eld, that is, the responsevariable is observed across monitoring stations, which can be ?xed or variedas time changing, over some time periods. There are many approachesto modelling the spatio{temporal data. We have a particular interest indynamic linear modelling because modelling the time series process is basedon \classes of dynamic models," which is often de?ned as \sequences of setsof models." The term, dynamic, is de?ned as the changes in the process\due to the passage of time as a fundamental motive force" and the DLMs(dynamic linear models) when the normality is assumed.This chapter starts with an introduction to some basic notions in space{time theory, some elementary notions and results on the DLM and a reviewon Kalman ?lter and smoother processes in Sections 2.2{2.4.The theme of this chapter centers on a DLM approach to modelling theozone concentrations in the Air Quality System (AQS)4 database. The DLMstructure (proposed by Huerta et al. (2004)) is speci?ed by an illustrative4AQS originally called AIRS, the terminology we use hereafter.8Chapter 2. Dynamic Linear Modellingexample in Section 2.4 through some exploratory data analysis (EDA) andused throughout the following sections in this chapter. Theoretical resultsand algorithms on the DLM are represented in Sections 2.6 and 2.7. TheMCMC sampling scheme is outlined in Section 2.6.1. The forward{?ltering{backward{sampling (FFBS) method is demonstrated in Section 2.6.2 to es-timate the state parameters in the DLM. Moreover, we outline the MCMCsampling scheme to obtain samples for other model parameters from theirposterior conditional distributions with a Metropolis{Hasting step, and thetheoretical results for prediction and interpolation at ungauged sites fromtheir predictive posterior distributions in Section 2.7.2.2 Space{time ProcessLet D be the region of monitoring stations for study. For simplicity, we can?x D as a ?nite domain to study. Suppose Y (si;t) denotes the observationat time t 2 R and site si 2 D; and Z(si;t) denotes the space{time processat time t 2 R and site si 2 D; with t = 1;:::;T; and i = 1;:::;n: Theremay be additional covariates available, x(si;t):The modeling structure is given byY (si;t) = Z(si;t) + ?(si;t); i = 1;:::;n; t = 1;:::;T; (2.1)where ?(si;t) is a white noise process.The space{time process Z(si;t) can be expressed asZ(si;t) = ?(si;t) + w(si;t); (2.2)where ?(si;t) is the mean process obtained from the observed covariatex(si;t) and w(si;t) is a mean 0 spatio{temporal process.De?nition 2.2.1 The space{time covariance function is de?ned asC(s1;s2; t1;t2) = Cov[w(s1;t1);w(s2;t2)]; (2.3)where si is the ith location, and ti is a time point, for i = 1;2:9Chapter 2. Dynamic Linear ModellingDe?nition 2.2.2 The zero mean spatio{temporal process w(s;t) is covari-ance stationary ifC(s1;s2; t1;t2) = C(s1 ? s2; t1 ? t2) = C(h; ?); (2.4)where h = s1 ? s2 and ? = t1 ? t2:Note that h in (2.4) denotes the vector distance between the two sites.If w(s;t) does not satisfy (2.4), it is called a nonstationary spatio{temporalprocess.De?nition 2.2.3 The zero mean spatio{temporal process w(s;t) is isotropicifC(h; ?) = C(khk; j?j); (2.5)which means that the covariance function depends on the separation vectorsonly by their length of di?erence, khk and j?j : If the spatio{temporal processis not isotropic, then it is called anisotropic.The covariance structure for the spatio{temporal process can be simpli-?ed by assuming the separability (see De?nition 2.2.4).De?nition 2.2.4 The zero mean spatio{temporal isotropic process w(s;t)is separable ifC(khk; j?j) = Cs(khk)Ct(j?j); (2.6)that is, the covariance function for the spatio{temporal process can be de-composed as the product of an isotropic spatial and an isotropic temporalcovariance function.If the covariance for spatio{temporal process is not separable, it is callednon{separable and the process is then nonseparable.As Gelfand et al. (2004) mentioned, for non{stationary spatial process,the general approach is to construct a valid separable covariance function.Brown et al. (1994b) and Le et al. (1998) investigate a separable covari-ance structure to deal with nonstationary multivariate spatial models in a10Chapter 2. Dynamic Linear Modellinghierarchical Bayesian framework. The advantages for this approach are: (i)it reduces the number of parameters to be estimated in the model; (ii) itprovides a positive de?nite covariance matrix.Because of the dynamic features of spatio{temporal data, we are par-ticularly interested in modelling the process dynamically using a dynamicmodelling approach. In the following chapter, we introduce some basic no-tations, along with results, from the dynamic modelling.2.3 Spatio{temporal DLMSuppose yt : n?1 is a vector observation, for t = 1;2;:::: The DLM containstwo equations: the observation equation and the evolution equation. Theobservation equation is given byyt = F0txt + ?t; ?t ? N[0;Vt]; (2.7)and the evolution equation is given byxt = Gtxt?1 + !t; !t ? N[0;Wt]; (2.8)where Ft : p? n, Gt : p? p, Vt : n? n, and Wt : p? p are known matrices.In Equation (2.7), Ft is called the design matrix, xt is the state, or system,vector, and ?t is the observational error. Equation (2.8) is also called theevolution, state or system equation. Gt is the system or state matrix and!t is the system, or evolution, error with evolution matrix Wt:The dynamic linear model is completed with the initial information forthe state parameter given by(x0jy0) ? N[m0;C0]: (2.9)2.4 Kalman Filter and SmootherKalman recursion, or Kalman ?ltering as it is sometimes referred to, is usedto update and forecast the state parameters. An analogous method is de-11Chapter 2. Dynamic Linear Modellingrived by West and Harrison (1997, Chapter 4) on updating and forecastingstate parameters in the DLM framework. This result can also be derived bymeans of Bayesian inference. Carter and Kohn (1994) label this approachforward{?ltering{backward{sampling. It is particularly powerful and willbe used in the following chapter. It can, moreover, be summarized in thefollowing theorem which will be applied to the ?ltering and smoothing pro-cesses.Theorem 2.4.1 (West and Harrison, 1997) Let y1:t = (y1;:::;yt); t ? 1;denote all the responses observed until time t: Under models (2.7) - (2.8),together with the initial state information in (2.9), we have for t = 2;:::;T(i)(xt?1jy1:t?1;?) ? N[mt?1;Ct?1](xtjy1:t?1;?) ? N[at;Rt](ytjy1:t?1;?) ? N[ft;Qt](xtjy1:t;?) ? N[mt;Ct];whereat = Gtmt?1 Rt = GtCt?1G0t + Wtft = F0tat Qt = F0tRtFt + Vtet = yt ? ft At = RtFtQ?1tmt = at + Atet Ct = Rt ? AtQtA0t:(ii) Let Bt = CtG0t+1R?1t+1: For 0 ? k ? T ? 1;(xT?kjy1:T ;?) ? N[aT (?k);RT (?k)]; (2.10)whereaT (?k) = mT?k + BT?k[aT (?k + 1) ? aT?k+1]RT (?k) = CT?k + BT?k[RT (?k + 1) ? RT?k+1]B0T?k12Chapter 2. Dynamic Linear Modellingwith aT (0) = mT ; RT (0) = CT ; aT?k(1) = aT?k+1; and RT?k(1) =RT?k+1:Note that the distributions for the smoothing and ?ltering processes areobtained conditionally on the parameter vector ?: In practice, it is oftenexceedingly di?cult to obtain the posterior distribution of ? because it maynot have any closed form. In this case, the MCMC method is often usedto obtain samples of ? from its posterior conditional distribution. Afterobtaining ?s, we can use Theorem 2.4.1 to get samples of state parametersfrom the corresponding distributions. If we can obtain all the samples of ?sand state parameters, we can then do the prediction and interpolation, thegeneral goals of the kriging method.Next we set the DLM modelling for the Cluster 2 AQS database. ThisDLM modelling will be implemented in Chapter 3.2.5 An Illustrative Example: Cluster 2 AQSDatabase (1995)We consider hourly ozone concentrations in ppb measured during 1995 at375 di?erent monitoring stations irregularly located in the USA. Amongthe 375 stations, we choose three clusters of sites in close proximity with10 monitoring stations each, for a total of 30 monitoring stations. Thegeographical locations of these stations are given by the latitudinal andlongitudinal coordinates (see Figure 2.1).Our goal is to construct a suitable DLM for the Cluster 2 sites based onthe explanatory data analysis (EDA) as Huerta et al. (2004) suggest. Themissing data are ?lled initially by the spatial regression method (SRM). Theempirical distribution of the ozone concentrations has an asymmetric shape.So the square{root of the ozone concentrations are used as the responses dueto the normality assumption of the DLM. For our study of the periodicityof the ozone dataset, we plot the Bayesian periodograms (Bretthorst, 1988)for the square{root of ozone levels for this summer of 1995 at Cluster 2 sitesin Figure 2.2. We ?nd a high peak during 1 pm to 3 pm each day for 12013Chapter 2. Dynamic Linear ModellingFigure 2.1: Geographic locations for the 1995 AQS database in US map, wherethe latitude and longitude are measured by degrees. (Diamond = Cluster 1 sites;Upper{triangle = Cluster 2 sites; Down{triangle = Cluster 3 sites.)days, a signi?cant 24{hour cycle for all these gauged sites in Figure 2.2.We also ?nd a slightly signi?cant 12{hour cycle by plotting the periodicitieswhich contribute more variation according to the spectrum for each stationin Cluster 2 sites. We do not ?nd any obvious weekly cycles or any nightlypeaks for this database.The DLM used in this chapter is a variation of the one proposed byHuerta et al. (2004). The state vector equation accounts for the trend andperiodicity across the sites. The validity of this model for this example isassessed in Chapter 3. Given the information of other covariates, such astemperature or wind speed, better prediction of responses might be obtainedusing the DLM. However, due to the lack of such knowledge, we use such avariation of the DLM in Huerta et al. (2004).14Chapter 2. Dynamic Linear ModellingFigure 2.2: Bayesian periodogram for the square{root of hourly ozone concentra-tions at Cluster 2 sites in the AQS database from May 15 to September 11 (1995).Let yit denote the observed square{root of the ozone concentration, atsite si and time t; with i = 1;:::;n and t = 1;:::;T; where n presents thetotal number of gauged sites (that is, sites with observations) in our studyand T; the total number of time points.A variant of the state{space model for such a database is given by(Huerta et al., 2004):yt = 10n?t + S1t(a1)?1t + S2t(a2)?2t + ?t (2.11)?t = ?t?1 + wt (2.12)?jt = ?j;t?1 + !?jt ; (2.13)where ?t ? N[0;?2V?]; wt ? N[0;?2?2y ] and !?jt ? N[0;?2?2j V?j]; withV? = exp(?V=?) and V?j = exp(?V=?j); for j = 1;2: Let yt = (y1t;:::;ynt)0 :15Chapter 2. Dynamic Linear Modellingn?1 and ?jt = (?j1t;:::;?jnt)0 : n?1;j = 1;2: Here ?t denotes a canonicalspatial trend and ?jit; a coe?cient for site si at time t corresponding to aperiodicity component Sjt(aj); where Sjt(aj) = cos(?tj=12)+aj sin(?tj=12);for j = 1;2: Note that V = (vij) : n ? n represents the distance matrix forthe gauged sites s1;:::;sn; that is, vij = jjsi ? sjjj for i;j = 1;:::;n: Herejjsi?sjjj denotes the Euclidean distance between sites si and sj in kilometers.Here the initial state x0 = (?0;?010;?020)0 has been given in (2.9).Models (2.11){(2.13) can also be written as in (2.7) and (2.8) by lettingGt = I; Vt = ?2V? and Wt = ?2W; where W is a block diagonal ma-trix with diagonal entries ?2y ; ?21 exp(?V=?1); and ?22 exp(?V=?2): In otherwords, the observation and the state equations can also be written asyt = F0txt + ?t (2.14)xt = xt?1 + !t (2.15)where x0t = (?t;?1t0;?2t0); and F0t is given by26666641 S1t(a1) 0 ::: 0 S2t(a2) 0 ::: 01 0 S1t(a1) ::: 0 0 S2t(a2) ::: 0... ... ... ... ... ... ...1 0 0 ::: S1t(a1) 0 0 ::: S2t(a2)3777775:Let y1:T = (ym1:T ;yo1:T )0; where ym1:T = (ym1 ;:::;ymT ) represents all themissing values and yo1:T ; all the observed values in Cluster 2 sites for t =1;:::;T: The model parameters are (?;?2;x1:T ;ym1:T ;a1;a2); in which x1:T =(x1;:::;xT ) are the state parameters until time T; ? the range parameter,?2 the variance parameter and a = (a1;a2) the phase parameters. Here weassume constant, i.e., site{invariant, phase parameters over all gauged sitesdue to an empirical study at each gauged site that con?rms such features ofspatio{temporal data in this ?eld. De?ne ? = (?2y ;?21 ;?1;?22 ;?2) to be thevector of parameters ?xed in the DLM.The DLM is then completed with the following hyperpriors for some of16Chapter 2. Dynamic Linear Modellingthe model parameters:? ? IG(??;??)?2 ? IG(??2;??2)a ? N(?oa;?oa):The choice for such hyperpriors is addressed in Section 3.1.We express the state{space model in two di?erent ways because of ourdual objectives of inference for parameters and interpolation. For simplic-ity, we use models (2.14){(2.15) for the purpose of inference on the range,variance, state parameters and missingness in Sections 2.6.2{2.6.3, and usemodels (2.11){(2.13) for inference about the phase parameters in Section2.6.4 and spatial interpolation in Section 2.7.We can see that the state{space models in (2.14){(2.15) capture someimportant features of the AIRS database. It re?ects the time{dependentstructure of the data and captures the diurnal patterns of ozone concen-trations across all the sites. Further implementation of the DLM in thisdatabase will be revisited in Chapter 3.2.6 Algorithms for Estimating the ModelParametersFor the purpose of interpolation and prediction, one has to estimate all theunknown model parameters at each gauged site and each time point. Ourgoal in this section is to give the details needed to estimate the model param-eters by the MCMC method and the forward{?ltering{backward{samplingalgorithm, developed by Carter and Kohn (1994).In Section 2.6.1, we introduce the Metropolis{within{Gibbs method tosample from the target distribution for the model parameters given all theobservations. Sections 2.6.2, 2.6.3 and 2.6.4 give details about implement-ing this method under the DLM. The algorithm for estimating the modelparameters is then summarized in Section 2.6.5.17Chapter 2. Dynamic Linear Modelling2.6.1 Metropolis{within{Gibbs algorithmConsider the state space model (2.14){(2.15). Let yo1:T = (yo1;:::;yoT) :n ? T be the observation matrix at the n gauged sites until time T: Letx1:T = (x1;:::;xT ) : (2n + 1) ? T be the state parameters at the n gaugedsites until time T: For simplicity, the coordinates of ? are ?xed but theproblem on setting them will be addressed later in Section 3.5.The target distribution of interest is given by p(?;?2;x1:T ;ym1:T ;a1;a2jyo1:T ):Since the target density does not have a closed form, direct sampling meth-ods cannot draw samples from it. The MCMC method is a popular wayto sequentially sample the parameters from their posterior distributions de-pending on the last value drawn and iteratively until the convergence of thechain is reached. As we can see from Huerta et al. (2004), the MCMCmethod can be used to obtain the posterior, predictive and interpolationresults for the DLM.A blocking MCMC scheme is used to sample each component iterativelyfrom the target distribution. Three blocks are chosen as (?;?2;x1:T ); ym1:Tand (a1;a2): There are two reasons for such blocks in applying the MCMCmethod. Firstly, it is natural to select blocks in which the parameters arehighly correlated but relatively conditional independent between the blocks.The phase parameters are assumed independent in time and location andso less correlated with the other model parameters. Another reason derivesfrom the fact that the full conditional posterior distribution of the phaseparameters can be obtained by assuming a bivariate normal hyperprior.Details about inference are presented in Appendix A.2.Since there is no direct way to sample from the target distribution, Gibbssampling is used to sample these three blocks iteratively from their fullconditional posterior distributions. In other words, we can iteratively:(i) sample from p(x1:T ;?;?2ja1;a2;y1:T );(ii) sample from p(ym1:T j?;?2;a1;a2;yo1:T ); and(ii) sample from p(a1;a2jx1:T ;?;?2;y1:T ):18Chapter 2. Dynamic Linear ModellingThe problem is that we do not have a closed form for p(?;?2;x1:T ja1;a2;y1:T ) either. However, the full conditional posterior distribution of x1:T canbe obtained explicitly by Kalman ?ltering and smoothing (Section 2.4), i.e.,FFBS algorithm. Assuming an inverse Gamma hyperprior for ?2; the condi-tional posterior distribution of ?2 given the range and phase parameters alsohas an inverse Gamma distribution with new shape and scale parameters.Note thatp(?;?2;x1:T ja1;a2;y1:T ) = p(?ja1;a2;y1:T )p(?2j?;a1;a2;y1:T )?p(x1:T j?;?2;a1;a2;y1:T ); (2.16)which shows we can sample from the three conditional posterior distri-butions on the right{hand{side of (2.16) iteratively to obtain the sam-ples from p(?;?2;x1:T ja1;a2;y1:T ): However, there is no closed form forp(?ja1;a2;y1:T ): So we must sample ? from it by a Metropolis{Hasting stepwithin a Gibbs sampling cycle. This algorithm is often called Metropolis{within{Gibbs. Details about sampling from the joint target distribution aregiven in the next three subsections.2.6.2 Sampling from p(?;?2;x1:T ja1;a2;y1:T )We use the block MCMC scheme to sample (?;?2;x1:T ) from p(?;?2;x1:T ja1;a2;y1:T ): Because of (2.16), ideally we can could iteratively sample ? fromp(?ja1;a2;y1:T ); ?2 from p(?2j?;a1;a2;y1:T ); and x1:T from p(x1:T j?;?2;a1;a2;y1:T ): However, because we do not have a closed form for that posteriordensity of p(?ja1;a2;y1:T ), we instead use the Metropolis{Hasting algorithmto sample ?; given all the observations from the following term which isproportional to its posterior density, that is,p(?ja1;a2;y1:T ) / p(?)TYt=1jQtj?12"? + 12TXt=1et0Q?1t et#?(nT=2+?):Details are included in Appendix A.1.Since we cannot compute the normalization constant for p(?ja1;a2;y1:T );19Chapter 2. Dynamic Linear Modellingthe Metropolis{Hasting algorithm must be used here; it is impossible tosample ? directly from its posterior distribution. The Metropolis{Hastingalgorithm yields an equilibrium distribution for the Markov chain, since com-putation and simulation is easier for reversible chains where the transitionprobabilities and stationary density of the chain satisfy the detailed balanceequations. In the Metropolis{Hasting algorithm, the transition kernel is amixed distribution for the new state of the chain: q(:;:), the proposal densityand ?(:;:), the acceptance probability.We choose the proposal density, q(:;:); to be lognormal distribution, be-cause the parameter space is bounded below by 0; the density of the Gaussiandistribution making in appropriate. As Moller (2002) notes, this alternativeto the random walk Metropolis considers the proposal move to be a randommultiple of the current state. From the current state ?(j?1)(j > 1); the pro-posed move is ?currency1 = ?(j?1)eZ; where Z is drawn from a symmetric density,such as normal. In other words, at iteration j; we sample a new ?currency1 fromthis proposal distribution, centered at the previously sampled ?(j?1); with atuning parameter ?2 as the variance of the distribution of Z. Gammerman(2006) suggests the acceptance rate, that is, the ratio of accepted ?currency1 to thetotal number of iterations, should be around 50%: We tune ?2 to attain thatrate. If the acceptance rate is too high, for example, 70% to 100%; we thenincrease ?2: Similarly, if the acceptance rate is too low, for example, 0 to20%; we decrease ?2 to narrow down the search area for ?currency1:The Metropolis{Hasting algorithm proceeds as follows. Given ?(j?1);where j > 1;? Draw ?currency1 from LN(?(j?1);?2):5? Compute the acceptance probability?(?(j?1);?currency1) = min(1; p(?currency1jy1:T )=q(?(j);?currency1)p(?(j?1)jy1:T )=q(?currency1;?(j?1))):5X ? LN(a;b) means X follows a lognormal distribution. In other words, X = exp(Y )where Y ? N(a;b):20Chapter 2. Dynamic Linear Modelling? Accept ?currency1 with probability ?(?(j?1);?currency1): In other words, sample u ?U[0;1] and let ?(j) = ?currency1 if ?currency1 < u and ?(j) = ?(j?1) otherwise.We run this algorithm iteratively until convergence is reached.Next we sample ?2; given the accepted ?, a1, a2 and the observations.The prior for ?2 is chosen to be an inverse gamma distribution with shapeparameter ? and scale parameter ?: The posterior distribution for ?2 is alsoan inverse gamma distribution, but with a shape parameter ? + nT2 and ascale parameter ? + 12 PTt=1 et0Q?1t et:We now sample x1:T given the accepted ?, ?2, a1, a2 and y1:T ; using theforward{?ltering{backward{sampling (FFBS) method as described in Sec-tion 2.4. West and Harrison (1997) propose a general theorem for inferenceon the parameters in the DLM framework. For time series data, the usualmethod for updating and predicting is Kalman ?lter. We present the follow-ing theorem as the FFBS algorithm (similar to the Kalman ?lter algorithm)to resample the state parameters conditional on all the other parametersand observations. FFBS is used as part of a MCMC method to samplex1:T = (x1;:::;xT ) from the smoothing distribution p(xtj?;?2;a1;a2;y1:T ):It is called FFBS because recent data are used to update the state parame-ters, xt's, recursively for t from 1 to T; as well as to sample each element ofthe xt's using all the information recursively for t from T to 1:Theoretical results for models (2.14) and (2.15) are presented to give anidea of how to draw samples from the posterior distribution for x1:T : Onecan also obtain it from Theorem 2.4.1 in Section 2.4, by letting Gt = I;Vt = ?2V? and Wt = ?2W:The initial state parameter is given by(x0jy0;?) ? N[m0;?2C0]; (2.18)where y0 denotes the initial information, and m0 and C0 are known values.Later in Section 3.1, we consider how to specify them in Cluster 2 AQSdatabase (1995). Let ? = (?;?2;a1;a2;?): Assume all the prior informationhas been given and ?'s coordinates are mutually independent. The detailson this algorithm are included in Appendix A.2.21Chapter 2. Dynamic Linear Modelling2.6.3 Sampling from p(ym1:T j?;?2;x1:T ;yo1:T )The MCMC method has an important advantage that it enables us to ?ll inthe missing values at each iteration, that is, to treat missing values like themodel \parameters". In this way, it avoids to use ad hoc methods for ?ttedmissing values, say by period means or the spatial regression method.At any ?xed time point t, after appropriately de?ning a scale matrix Rt;we can rewrite the observation vector yt as follows:Rtyt =? ymtyot!;where ymt : nt?1 denotes the missing response(s) at time t and yot : (n?nt)?1 the observed response(s) at t: Notice that \o" represents for \observed"and \m" for \missing".Let Rt = (en1;:::;ent;ek1;:::;ekn?nt )0; where fsnj : j = 1;:::;tgpresents the gauged sites containing missing values at time point t; fskj :j = 1;:::;n ? ntg the gauged sites containing observed values at time t; forall t = 1;:::;T; and ej is an 1? n vector such that ejj = 1; ejk = 0 if k 6= j;for j 2 Z+:We already know that(ytj?;?2;xt;a) ? N[F0txt;?2 exp(?V=?)];and so Rtyt also has a multivariate normal distribution(Rtytj?;?2;xt;a) = ((ymt ;yot)0j?;?2;xt;a) ? N[~t; ~t];where~t = RtF0txt~?t = ?2Rt exp(?V=?)R0t:22Chapter 2. Dynamic Linear ModellingWe can also partition ~t as follows:~t =? ~?mt~ot;!where ~mt : nt ? 1 and ~ot : (n ? nt) ? 1:Similarly, we have~?t =? ~?mmt ~?mo~?omt ~?oot ;!where ~mmt : nt ? nt; ~mot : nt ? (n ? nt) and ~oot : (n ? nt) ? (n ? nt):By a basic property of the multivariate normal distribution, we have(ymt j?;?2;xt;a;yot) ? N[?currency1currency1t ;?currency1currency1t ]; (2.19)where?currency1currency1 = ~mt + ~mot (~oot )?1(yot ? ~ot); (2.20)and?currency1currency1t = ~mmt ? ~mot (~oot )?1 ~omt ; (2.21)for t = 1;:::;T:At each iteration, we draw the fymt g from the corresponding distribution(2.19) at each time point t and then write the response variables as y1:T =(ym1:T ;yo1:T ) without loss of generality, where ym1:T = (ym1 ;:::;ymT ) and yo1:T =(yo1;:::;yoT ):2.6.4 Sampling from p(a1;a2jx1:T ;?;?2;y1:T )We now present our method for sampling the phase parameters a = (a1;a2)0from its full conditional posterior distribution p(aj?;?2;x1:T ;y1:T ); using thesamples for ?, ?2 and x1:T obtained in Sections 2.6.2{2.6.3. For simplicity,we use the notation of models (2.11){(2.15) in this section.We then sample the constant phase parameters conditional on all theother parameters and observations. Suppose a = (a1;a2)0 has a conjugate23Chapter 2. Dynamic Linear Modellingbivariate normal prior with mean vector ?o = (?1o;?2o)0 and covariancematrix ?0: Then the posterior conditional distribution for a is normal withmean vector ?currency1 and covariance matrix ?currency1; where ?currency1 and ?currency1 can be obtainedfrom equations (A.13) and (A.14), respectively. This result is shown in theAppendix A.3.We will not use a non{informative prior for a such as p(a) / 1; becausethat would be problematic here. The reason is straightforward: we wantto avoid cases of non{identi?ed posterior means or posterior variances. Tobe more speci?c, assume p(a) / 1: Using the same inferential approach asabove, we ?nd that the posterior conditional distribution for a is normalwith mean vector ? = (?1;?2)0 and covariance matrix ? from equations(A.2) and (A.3), respectively. The elements of ? are also given in AppendixA.3, where ? can be singular for any t = 12k; where k is an integer. Hencewe obtain extreme values at times 12;24;:::;2880; which invalidates theassumption of constant phase parameters across all the time scales when wesample from their full conditional posterior distribution.For ?xed values of ?; ?2 and x1:T ; we sample a from N(?currency1;?currency1) andthen obtain the median as the estimator for a for each ?xed iteration byexploiting the assumption that a1 and a2 are constant phase parameters inthe models (2.14){(2.15).2.6.5 SummaryThe MCMC methods we use here are very similar to those of Huerta et al.(2004) except that we use all the samples after the burn{in period, not justthe chain corresponding to the accepted samples, as they did in their paper.That is because using only accepted Markov chains actually leads to thebiases on the samples, which indeed changes the detailed balance equationof Metropolis{Hasting algorithm.The algorithm we use in Cluster 2 AQS database is summarized as fol-lowing:------------------------------------------------------24Chapter 2. Dynamic Linear ModellingAlgorithm The Metropolis-within-Gibbs method------------------------------------------------------1. Initialization: sample?(1) ? IG(??;??)?2(1) ? IG(??2;??2)x(1)1:T ? N(m0;?2(1)C0):2. Given the (j ? 1)th values, ?(j?1); ?2(j?1); ym1:T (j?1); a(j?1)1 ; a(j?1)2 andthe observations yo1:T :(1) Sample (?(j);?2(j);x(j)1:T ) from p(?;?2;x1:T ja(j?1)1 ;a(j?1)2 ;y(j?1)1:T );wherey(j?1)1:T = (ym1:T (j?1);yo1:T ):(i) ? Generate a candidate value ?currency1 from a logarithm proposaldistribution q(?(j?1);?); that is, LN(?(j?1);?2) for somesuitable tuning parameter ?2:? Compute the acceptance ratio ?(?(j?1);?currency1) where?(?(j?1);?currency1) = min(1; p(?currency1ja(j?1)1 ;a(j?1)2 ;y(j?1)1:T )?currency1p(?(j?1)ja(j?1)1 ;a(j?1)2 ;y(j?1)1:T )?(j?1)):? With probability ?(?(j?1);?currency1) accept the candidate valueand set ?(j) = ?currency1; otherwise reject and set ?(j) = ?(j?1):(ii) Sample ?2(j) from p(?2j?(j);a(j?1)1 ;a(j?1)2 ;y(j?1)1:T ):(iii) Sample x(j)1:T from p(x1:T j?(j);?2(j);a(j?1)1 ;a(j?1)2 ;y(j?1)1:T ):(2) Sample ym1:T (j) from p(ym1:T j?(j);?2(j);x(j)1:T ;a(j?1)1 ;a(j?1)2 ;yo1:T ):(3) Sample (a(j)1 ;a(j)2 ) from p((a1;a2)j?(j);?2(j);x(j)1:T ;y(j)1:T ); where y(j)1:T =(ym1:T (j);yo1:T ):3. Repeat until convergence.------------------------------------------------------25Chapter 2. Dynamic Linear Modelling2.7 Algorithms for Interpolation and Predictionon Ungauged SitesOur goal in this section is to interpolate the ozone concentrations at un-gauged sites using the DLM and the simulated Markov chains of the modelparameters (see Section 2.6). In other words, suppose s1;:::;su are u un-gauged sites of interest within the region of Cluster 2 sites (excluding thepossibility of extrapolation), the objective is to draw samples fromp(ys1:T j?;?2;x1:T ;a1;a2;y1:T );where ys1:T = (ys1;:::;ysT ) : 1?T and yst denotes the unobserved square{rootof ozone concentrations at the ungauged site s and time t; for t = 1;:::;Tand s 2 fs1;:::;sug: Let (?s1t;?s2t) denote the unobserved state parametersat site s and time t. The DLM is given byytnew = 1n+10?t + S1t(a1)?1tnew + S2t(a2)?2tnew + ?tnew; (2.22)where ytnew = (yst ;yt0)0; ?tnew = (?s1t;?1t0;?s2t;?2t0)0; and ?tnew ? N(0;?2 exp(?Vnew=?)):In Section 2.7.1, we illustrate how to sample the unobserved state param-eters f(?s1t;?s2t) : t = 1;:::;Tg from the corresponding conditional posteriordistribution. Spatial interpolation at the ungauged site s is demonstratedin Section 2.7.2.2.7.1 Sampling the unobserved state parametersWe ?rst sample ?sjt given ?sj;t?1; ?jt and ?j;t?1;j = 1;2: From the stateequation (2.15) to ?jtnew; we know that the joint density of ?sjt and ?jtfollows a normal distribution, with covariance matrix ?2?2j exp(?Vnew=?j);where Vnew denotes the distance matrix for the unobserved station and themonitoring stations. The conditional posterior distribution,p(?sjtj?sj;t?1;?;?2;?t;?1t;?2t;a1;a2;y1:T );26Chapter 2. Dynamic Linear Modellinghas been derived in Appendix A.4.2.7.2 Spatial interpolation at ungauged sitesWe interpolate the square{root of ozone concentration at ungauged sitesby conditioning on all the other parameters and observations at gaugedsites. Similarly as above, yst and yt are jointly normally distributed fromthe observation equation. The predictive conditional distribution for yst ; thatis, p(yst j?s1t;?s2t;?;?2;?t;?1t;?2t;a1;a2;y1:T ); is given in Appendix A.4.27Chapter 3Dynamic Linear Modellingand Its Spatial InterpolationIn Chapter 2, we illustrate the DLM settings for the Cluster 2 AQS database.Furthermore, we show the explicit MCMC method to estimate the modelparameters and the predictive posterior distribution for the responses at\ungauged sites". Here we randomly select some monitoring stations inthe database and treat them as \ungauged" (i.e., unmonitoring) sites forthe purpose of model assessment. The theme of this chapter is the spa-tial interpolation using the DLM approach at the Cluster 2 AQS database.Section 3.1 revisits the Cluster 2 AIRS database. Section 3.2 shows theresults of MCMC. Section 3.3 demonstrates the spatial interpolation resultson the ozone study. Section 3.4 discusses the problems underlying the DLMprocess. Section 3.5 provides the summary and conclusions about this ap-plication of the univariate DLM.We also develop a software, written in C and R (see Appendix B), partlysolving the computational burden due to the use of MCMC algorithm in theDLM approach. This software, GDLM.1.0, had been successfully testedin PIMS summer school (2007). It can also be downloaded freely fromhttp://enviro.stat.ubc.ca/dlm and the written DEMO can be directly usedto illustrate the DLM in Chapter 2.3.1 Cluster 2 AQS Dataset (1995) RevisitedBecause of the importance and spatio{temporal features, ozone concentra-tions are of particular interest for this study. In this section, we revisit theozone levels in Cluster 2 AQS database and study it using a variant of the28Chapter 3. Dynamic Linear Modelling and Its Spatial Interpolationimplementation of the DLM approach proposed by Huerta et al. (2004).Within the range of Cluster 2 sites, six ungauged sites are randomly se-lected from the available sites, that is, a non{null subset of the sites withinthe range of but excluding the ten gauged ones in Cluster 2. The geograph-ical locations of these six ungauged sites, represented by the alphabeticletters, A;:::;F; are shown in Figure 3.1.Figure 3.1: Geographical locations for the ten gauged sites in Cluster 2 and therandomly selected six ungauged sites. (Number = Cluster 2 sites and letter =ungauged sites.)In Section 2.5, we have illustrated the features of ozone concentrationstabled in the AIRS database by means of an explanatory data analysis andconstructed the DLM based on these features. In Sections 2.6.1{2.6.4, wehave demonstrated the use of FFBS and MCMC methods in the model usedin our study. Statistical inferences and results are presented in the abovesubsections, where we illustrate the MCMC sampling scheme to obtain the29Chapter 3. Dynamic Linear Modelling and Its Spatial Interpolationsamples of model parameters.For the initial values of the state parameters, hyperpriors and ?xed modelparameters, we use settings by Huerta et al. (2004), after con?rming theirsuitability by preliminary investigation. We do a Markov chain simulationstudy and discuss the results in Section 3.2. We then interpolate the square{root of ozone concentrations at ungauged sites in Section 3.3. Issues facedin using the DLM process are discussed in Section 3.4. These issues includemonitoring two Markov chain's convergence, highly autocorrelated chainsfor ? and ?2; and time{varying e?ect of ?{?2 in the DLM. These problemsand possible solutions are summarized in Section 3.5.3.2 Markov Chain Simulation StudyWe do a Markov chain simulation study to draw samples of the DLM's modelparameters from their posterior distributions to make inference based onthem.Initial settingsAs proposed by Huerta et al. (2004), we use the following initial settings forthe starting values, hyperpriors and ?xed model parameters in the DLM:? The hyperprior for ? is IG(1;5) and IG(2;0:01) for ?2: The expectedvalue of IG(1;5) is 1 as are both of the variances of p(?) and p(?2):These vague priors for ? and ?2 are selected since we do not have anyprior knowledge about their distributions.? Initially the state parameters x0; is assumed to be normally distributedwith mean vector m0 = (2:85;?0:7510n;?0:0810n)0 and covariance ma-trix ?21C0; where ?21 ? IG(2;0:01) and C0 is a block diagonal matrixwith diagonal entries 1; 0:0110n and 0:0110n:? The hyperprior for a is a bivariate normal distribution with meanvector ?o = (2:5;9:8)0 and a diagonal matrix ?o with diagonal entries0:5 and 0:5:30Chapter 3. Dynamic Linear Modelling and Its Spatial Interpolation? The model parameters in the DLM are ?xed as follows: ?2y = 0:02;?21 = 0:0002; ?22 = 0:0004; ?1 = 25; and ?2 = 25:We found our results to be fairly insensitive to changes in the valuesfor ?o; ?o; ?1; and ?2: However, it is not true for ?2y ; ?21 ; and ?22 : Furtherdiscussion on settings for these values can be found in Section 4.4 belowEquation (4.24).Monitoring the convergence of the Markov chainsFigure 3.2: Traces of model parameters with the number of iterations of theMarkov chains. The model parameters are: (a) { ?; the range parameter; (b) {?2; the variance parameter; (c) { a1; the phase parameter with respect to the 24{hour periodicity; and (d) { a2; the phase parameter with respect to the 12{hourperiodicity.Figure 3.2 shows the trace plots of model parameters ?; ?2; a1 and a2with the number of iterations of the simulated Markov chains where the totalnumber of iterations is 4;268: The burn{in period is chosen to be 2;269 and31Chapter 3. Dynamic Linear Modelling and Its Spatial Interpolationall the remaining Markov samples are collected for posterior inference. Theacceptance rate is approximately 62%: We observe that the Markov chainconverges after a run of less than ?ve hundreds iterations.Table 3.1 demonstrates the median and 95% quantile from the simulatedMarkov chains for the model parameters ?; ?2; a1 and a2:Quantile ? ?2 a1 a22:5% 69.29 1.19 2.42 9.77Median 71.83 1.21 2.45 9.8097:5% 75.37 1.24 2.48 9.84Table 3.1: Posterior summaries for ?; ?2; a1; and a2:3.3 Spatial InterpolationSix ungauged sites are randomly selected from the available subset of siteswithin the range of Cluster 2 sites. Their geographical locations are shownin Figure 3.1. Our goal in this subsection is to assess the model's perfor-mance by comparing the interpolation values with the observations at theseungauged sites, that is, A;:::;F. All missing values at ungauged sites areinitially ?lled in by the spatial regression method. The MCMC methodallows us to obtain the posterior samples for the missing values from theirposterior distributions. We use the observed data at ungauged sites to assessthe performance of the interpolation results by the DLM.Table 3.2 demonstrates the coverage probabilities at ungauged sites whencomparing them to the corresponding credible probabilities. These coverageprobabilities are calculated by counting the number of observed responsesfalling into the predictive intervals constructed by the DLM. It shows di?er-ent levels of the predictive credibility intervals and the corresponding actualcoverage probability at each of the ungauged sites based on the spatial inter-polation using the DLM approach. In general, the coverage probabilities atungauged sites are larger than the corresponding credible probability, whichindicates that the error bands of the interpolation are too wide. Amongthese six ungauged sites, site D has the highest coverage probability at32Chapter 3. Dynamic Linear Modelling and Its Spatial Interpolationthese nominal levels. This may be due to the fact that Ungauged Site D isvery close geographically to Gauged Site 1 in Cluster 2 sites. It validatesour assumption that the spatial correlation is large if the pairs of sites areclose together but small if they are far apart. However, these unsatisfac-tory coverage probabilities imply the de?ciency of the DLM, which we willaddress in the following sections.Nominal levels (%) Observed coverage fraction (%)A B C D E F95:0 94:9 96:9 96:5 99:7 96:1 98:190:0 91:9 93:7 93:5 99:4 93:6 96:880:0 84:8 88:5 88:2 97:7 89:6 94:370:0 78:7 83:5 83:3 94:0 85:8 90:660:0 73:0 78:5 77:1 89:7 81:6 86:650:0 65:2 71:5 70:4 85:6 76:1 81:440:0 55:2 61:4 61:0 79:2 67:9 74:730:0 42:2 47:6 47:5 69:6 54:9 64:4Table 3.2: Comparisons between the nominal levels and actual predictive credibil-ity interval coverage at the ungauged sites A;:::;F:Figures 3.3{3.7 show the interpolation results at Ungauged Site D fromMay 14 to September 11 in 1995, where the solid lines represent the pre-dicted median of the responses, the dashed lines represent the 95% predictiveintervals for the predicted square{root of ozone concentrations and the soliddots represent the observations at this \ungauged" site.33Chapter 3. Dynamic Linear Modelling and Its Spatial InterpolationFigure 3.3: Interpolation at Ungauged Site 4 from the 1st week to the 4th week.Figure 3.4: Interpolation at Ungauged Site 4 from the 5th week to the 8th week.34Chapter 3. Dynamic Linear Modelling and Its Spatial InterpolationFigure 3.5: Interpolation at Ungauged Site 4 from the 9th week to the 12th week.Figure 3.6: Interpolation at Ungauged Site 4 from the 13th week to the 16th week.35Chapter 3. Dynamic Linear Modelling and Its Spatial InterpolationFigure 3.7: Interpolation at Ungauged Site 4 from the 17th week to the 120th day.36Chapter 3. Dynamic Linear Modelling and Its Spatial InterpolationTable 3.2 demonstrates the coverage probabilities at each ungauged siteusing the data for the selected summer. That table shows Ungauged Site Dhaving the largest coverage probability comparing to other ungauged sites.An intuitively plausible explanation: these ungauged sites are close to somegauged sites in Cluster 2 sites. To explore this possibility let us consider\friends" of ungauged sites, any gauged sites in Cluster 2 within 100 kilo-meters.Table 3.3 shows the Global Circle Distance (GCD) and Pearson's cor-relations between these pairs of \friends". As an example, the relationshipbetween Ungauged Site D and its \friend", Gauged Site 1; is demonstratedin Figure 3.8. From this ?gure, Ungauged Site D has a strong linear as-sociation with Gauged Site 1: It explains that the coverage probability atUngauged Site D is always higher than the other sites.Ungauged Site Friend(s) GCD (km) Pearson's rA 2 66:6 0:73B 2 62:5 0:74C 2 35:5 0:84D 1 11:0 0:95E 2 38:0 0:70F (7;8) (18:6;44:9) (0:84;0:82)Table 3.3: Summary of pairs of \friends" for ungauged and gauged sites.Overall, the DLM does not predict the responses at ungauged sites veryaccurately. That points to problems hidden in this method and processmodel to which we turn in the next section.3.4 Problems in the DLMThere are a number of problems with the current DLM. We summarizeseveral critical issues in this section and give some suggestions about their37Chapter 3. Dynamic Linear Modelling and Its Spatial InterpolationFigure 3.8: Scatterplot for the square{root of hourly ozone concentrations atUngauged Site D and its nearly neighbour, Gauged Site 1:resolution in the next section.Monitoring two Markov chains' convergenceFigure 3.9 represents the trace plots of model parameters ?; ?2; a1; anda2 of two chains from the initial settings in Section 3.2. These two chainsseem to mix well after several hundred iterations. It suggests the Markovchains have converged.Autocorrelation and partial autocorrelation of the Markov chainsHowever, we know that the autocorrelation function (ACF) is very im-portant when considering the length of the chain needed to ensure the esti-38Chapter 3. Dynamic Linear Modelling and Its Spatial InterpolationFigure 3.9: Traces of model parameters with number of iterations of the twoMarkov chains. The model parameters are: (a) ??; the range parameter; (b)??2; the variance parameter; (c) ?a1; the phase parameter with respect to the24?hour periodicity; and (d) ?a2; the phase parameter with respect to the 12?hourperiodicity.mates having the required accuracy. In other words, a highly autocorrelatedchain has to run for a long time to obtain su?ciently accurate estimates.The partial autocorrelation function (PACF) is important in assessing theMarkov chain since a large value of the PACF at lag h indicates that thenext value in the chain is dependent not only on the immediate past butalso on the distant past.Figure 3.10 shows the histogram as well as the ACF and PACF for theMarkov chains used in Section 3.3, after a burn{in period of 1;000: Its ACFplots show the ?s to be highly autocorrelated. It indicates that the chainfor ? is not mixing very well, which leads to the biased estimates in Section3.3. A possible way to reduce the autocorrelation between these ?s is tothin the Markov chain. That is, we could use every kth (k > 1;k 2 Z+) ?39Chapter 3. Dynamic Linear Modelling and Its Spatial Interpolationgenerated by the chain to give the estimates. However, due to the undulylarge computational cost for thinning the Markov chain, we are forced touse the whole chain for estimation and interpolation.Figure 3.10: Histogram (left panel), ACF (middle panel) and PACF (right panel)of model parameters of the Markov chains after a burn{in period of 1;000 iterations.The model parameters are: (i) ?rst row: { ?; the range parameter; (ii) second row:{ ?2; the variance parameter; (iii) third row: { a1; the phase parameter with respectto the 24{hour periodicity; and (iv) last row: { a2; the phase parameter with respectto the 12{hour periodicity.Relationship between the pairs of ?; ?2; a1 and a2The DLM assumes that priors of model parameters ?; ?2; a1; and a2 aremutually uncorrelated. Figure 3.11 shows the relationship between the pairsof these parameters and specially, a weak linear association between ? and?2: It indicates that ? and ?2 are actually dependent, given the observedresponses.40Chapter 3. Dynamic Linear Modelling and Its Spatial InterpolationFigure 3.11: Scatterplots for model parameters' pairs: (a) ? v.s. ?2; (b) ? v.s. a1;(c) ? v.s. a2; (d) ?2 v.s. a1; (e) ?2 v.s. a2; and (f) a1 v.s. a2:Time varying e?ect of ?{?2: coverage probabilities versus cred-ible probabilitiesIt's natural to ask whether these ?s and ?2s generated from the MCMCmethod are constant over all the time points, an assumption in Huerta etal.'s DLM. In other words, we want to answer questions such as: (1) Whichdata and time points used in the DLM might produce di?erent estimationand interpolation result? (2) Are ? and ?2 varying from time{to{time?To help answer these questions, we design the following three simulationstudies:(i) Study A : Implement the DLM at ungauged sites using weekly data(Wk : k = 1;:::;17). Obtain the Markov chains of ?; ?2; a1 and a2:41Chapter 3. Dynamic Linear Modelling and Its Spatial InterpolationAlso obtain the coverage probability at each ungauged site and eachweek for ?xed nominal levels.(ii) Study B : Implement the DLM at ungauged sites using all the datafrom week 1 to week 17 (W1:17 = fW1;:::;W17g). Estimate the modelparameters and interpolate responses at ungauged sites. Furthermore,obtain the coverage probabilities at each ungauged site and each weekfor ?xed nominal levels, using each week's data.(iii) Study C : Fix ?currency1k at week k (k = 1;:::;17) using the Markov chainsobtained in Study A. Then use these ?currency1 = f?currency11;:::;?currency117g in the DLM.In other words, we go through all the steps in the algorithm of Section2.6.5 except that we use the ?xed ?currency1 instead of generating it by aMetropolis{Hasting step. Note that we only need Gibbs samplingand MCMC blocking scheme for this study. We then compute thecorresponding coverage probabilities using W1:17 at each ungauged siteand each week for ?xed nominal levels.The objective of Studies A and B is to demonstrate the e?ect of dataand time propagation on the interpolation results. Study C aims to tell usthere is a signi?cant di?erence between the interpolation results obtainedusing the ?xed ?currency1 and those from using the Markov samples of ?s. Table3.4 shows these ?xed ?currency1s in Study C:Week 1 2 3 4 5 6 7 8 9?currency1 54.2 178.5 83.7 405.4 86.6 59.7 199.3 144.1 322.7Week 10 11 12 13 14 15 16 17?currency1 142.2 172.7 187.9 315.8 419.0 99.8 260.3 284.8Table 3.4: Fixed ?currency1 in Study C:Figure 3.12 illustrates the MCMC estimation results for Study A: It plotsthe Markov chains for ? and ?2 using weekly data. Obviously, ? and ?2 varyfrom week to week, implying that the constant ? ? ?2 model is not tenableover a whole summer for this database.Figures 3.13{3.19 represent the coverage probabilities of the interpola-tion results from these three studies. The solid line with dots represents42Chapter 3. Dynamic Linear Modelling and Its Spatial InterpolationFigure 3.12: Scatterplot for ? against ?2 given one{week{data only, constructedfrom MCMC samples starting from same initial values.the results in Study A; the dot line with solid diamond for Study B; andthe dashed line with stars, Study C: The interpolation results for StudyB and C are very similar from these ?gures. In other words, using these?xed ?currency1s in Table 3.4 gives us similar interpolation results as treating it tobe model parameter in the standard DLM setting, pointing to a drawbackwith the current DLM. In fact, in some cases Study C even produces betterinterpolation results than Study B:The results from Study A show that, as time increases, the interpolationresults become, anomalously, more uncertain with the coverage probabilitiesgetting larger and larger. We can interpret this as the model trying toincorporate the constant ? and ?2 over all the time points while they actuallyvary with time.Comparing these studies, sometimes the DLM gives better interpolationresults when using only one week's worth of data. In any case the current43Chapter 3. Dynamic Linear Modelling and Its Spatial Interpolationmodel assumption of constant ? and ?2 is not valid in practice. Furtherdevelopment of the DLM is required to incorporate time{varying modelparameters.Figure 3.13: Coverage probabilities v.s. 95% nominal level for ungauged sites: (a){ site A; (b) { site B; (c) { site C; (d) { site D; (e) { site E; and (f) { site F. Thesecoverage probabilities are computed according to Study A: weekly data (dot withsolid line); Study B: W1:17 (square with dot line); and Study C: W1:17 but with ?xed?currency1 (star with dashed line).3.5 Summary and ConclusionWe have implemented the DLM on Cluster 2 sites (AQS, 1995). Further-more, we have applied a variant of Huerta et al.'s (2004) DLM and MCMCmethod on this database for a whole summer (whereas they considered asingle week). We ?nd de?ciencies in their MCMC method which actuallyuses a biased estimate of ?: In practice, their model assumption of a con-stant ?{?2 seems inappropriate. Moreover, preliminary studies tell us the44Chapter 3. Dynamic Linear Modelling and Its Spatial InterpolationFigure 3.14: Coverage probability versus 90% nominal level for ungauged sites:(a) { site A; (b) { site B; (c) { site C; (d) { site D; (e) { site E; and (f) { site F.These coverage probabilities are computed according to Study A: weekly data (dotwith solid line); Study B: W1:17 (square with dot line); and Study C: W1:17 but with?xed ?currency1 (star with dashed line).sensitive choice for the values of ?2y ; ?21 ; and ?22 ; indicating the inappropri-ate setting for these parameters to be \constant" in their model. Finallythe computational cost is of concern: that cost deriving from the use ofthe FFBS method used in this chapter. The software for implementing theDLM, GDLM.1.0, has been summarized in Appendix B.One way to tackle the interesting problem of setting ?2y ; ?21 ; and ?22 isby setting appropriate discount factors in the discount DLM. However, weare not recommending using the composite Metropolis{Hasting algorithmto obtain the samples for ? = (?;?2y ;?21 ;?1;?22 ;?2) from their joint posteriordistribution. The reason is obvious that the computational cost is hugeand it is very di?cult for the Markov chains to reach its convergence after45Chapter 3. Dynamic Linear Modelling and Its Spatial InterpolationFigure 3.15: Coverage probability versus 80% nominal level for ungauged sites:(a) { site A; (b) { site B; (c) { site C; (d) { site D; (e) { site E; and (f) { site F.These coverage probabilities are computed according to Study A: weekly data (dotwith solid line); Study B: W1:17 (square with dot line); and Study C: W1:17 but with?xed ?currency1 (star with dashed line).certain iterations. We ?nd the relationship between the discount factors andthe model parameters: ?2y ; ?21 ; and ?22 by a ?rst{order polynomial dynamicmodel in Section 4.4.To deal with the time{varying ? and ?2 in the current version of theDLM, we might be able to use the discount DLM where the discount factorvaries from time to time. However, this even exemplify the computationalburden and so will not be recommended here. Instead of the DLM, wenow propose the Le{Zidek style modelling approach (also called as BSPapproach) in the next two chapters to deal with the spatial interpolation andtemporal prediction in spatial{temporal ?elds (Le & Zidek, 1992{2006).46Chapter 3. Dynamic Linear Modelling and Its Spatial InterpolationFigure 3.16: Coverage probability versus 70% nominal level for ungauged sites:(a) { site A; (b) { site B; (c) { site C; (d) { site D; (e) { site E; and (f) { site F.These coverage probabilities are computed according to Study A: weekly data (dotwith solid line); Study B: W1:17 (square with dot line); and Study C: W1:17 but with?xed ?currency1 (star with dashed line).47Chapter 3. Dynamic Linear Modelling and Its Spatial InterpolationFigure 3.17: Coverage probability versus 60% nominal level for ungauged sites:(a) { site A; (b) { site B; (c) { site C; (d) { site D; (e) { site E; and (f) { site F.These coverage probabilities are computed according to Study A: weekly data (dotwith solid line); Study B: W1:17 ( square with dot line); and Study C: W1:17 butwith ?xed ?currency1 (star with dashed line).48Chapter 3. Dynamic Linear Modelling and Its Spatial InterpolationFigure 3.18: Coverage probability versus 50% nominal level for ungauged sites:(a) { site A; (b) { site B; (c) { site C; (d) { site D; (e) { site E; and (f) { site F.These coverage probabilities are computed according to Study A: weekly data (dotwith solid line); Study B: W1:17 (square with dot line); and Study C: W1:17 but with?xed ?currency1 (star with dashed line).49Chapter 3. Dynamic Linear Modelling and Its Spatial InterpolationFigure 3.19: Coverage probability versus 40% nominal level for ungauged sites:(a) { site A; (b) { site B; (c) { site C; (d) { site D; (e) { site E; and (f) { site F.These coverage probabilities are computed according to Study A: weekly data (dotwith solid line); Study B: W1:17 (square with dot line); and Study C: W1:17 but with?xed ?currency1 (star with dashed line).50Chapter 4Multivariate BayesianSpatial Prediction and ItsSpatial InterpolationMany approaches other than the univariate DLM (Chapter 2) have beendeveloped to model space{time ?elds. Although the DLM provides a very?exible approach, it can have poor predictive performance. Moreover, thisapproach costs a lot of computation time and becomes impractical for largegeographical subregions, for instance, for the 274 sites in the US EPA AQSdatabase (1995). To overcome these di?culties, an alternative Bayesianhierarchical modelling approach, multivariate Bayesian spatial prediction(BSP), will be presented and its performance compared with that of theDLM.This BSP approach appears in a series of papers, originating with Leand Zidek (1992). Section 4.1 presents that approach, its motivation, andsome recent applications. Section 4.2 describes the Chicago area's hourlyozone (O3) AQS database (2000) in this study. Section 4.3 introduces relatedtheoretical results for the multivariate BSP approach (Le & Zidek, 2006).The multivariate BSP approach is then applied for spatial interpolation inthe Chicago area's hourly ozone AQS database in Section 4.4. In Section4.4, two other approaches, the DLM and NAIVE, are implemented for thesame purposes to give comparative assessment of the performance of thevarious predictors. Sections 4.6 summarizes our conclusions about them.51Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial Interpolation4.1 IntroductionAlternatives to the DLM have been investigated in recent years for ?ndingthe characteristics of the mean surface of spatio{temporal processes or inter-polating them at unmonitored (ungauged) sites within speci?ed geograph-ical subregions. Like the DLM they can deal with nonstationary processesas described in Chapter 2. In particular, the multivariate Bayesian spatialprediction (BSP) approach presented in this chapter handles nonstationaryprocesses, using the deformation approach (Sampson & Guttorp, 1992).The DLM treats the processes as spatially correlated time series, thatis, parallel time series for a spatial process. Other approaches such as thelinear model of coregionalization (LMC) method also treat data as spatiallycorrelated time series (Gelfand et al., 2005). In contrast, the multivariateBSP treats the data as a collection of the temporally correlated spatialprocesses (Kyriakidis & Journel, 1999). Related work about the multivariateBSP can be seen in Le and Zidek (1992), Brown et al. (1994a, 1994b), Sunet al. (1998), Li et al. (1999), Le et al. (1997, 2001), Zidek et al. (2002),and Le and Zidek (2006).This chapter addresses spatial interpolation in space{time ?elds of hourlyozone concentrations. The multivariate BSP is implemented in the Chicagoarea throughout the whole summer of 2000. Within this geographical region,hourly ozone concentrations are measured at 24 monitoring sites throughthat time span. For comparison, the DLM method (see Chapter 2) is alsoapplied to this database.The large geographical scale of the interpolation problem across the USA,can be handled fairly well by the multivariate BSP. Unlike what was done inChapter 2 for the DLM, the BSP approach used in this chapter has a multi-variate rather than univariate framework, although the response variable isthe square{root of hourly ozone concentrations, the square{root being takento validate the normality assumption stated in Chapter 2. We have two rea-sons for choosing the multivariate setting: (1) to increase precision in spatialinterpolation and temporal prediction; and (2) to provide a way to reducethe spatial correlation leakage problem arising in the univariate case. In fact,52Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial Interpolationmore accurate results can be obtained even for marginal inference based onthe joint distribution model for all of the responses rather than using onlya marginal distribution model, due to the greater uncertainty involved inthe latter. In other words, the multivariate approach allows interpolatorsand predictors to \borrow strength" not only from close{by monitoring sta-tions but also from chemical correlates (Le & Zidek, 2006, p.161). At thesame time, a potential spatial correlation leakage problem, which occurs forwhitened residuals (see Section 4.3) that have a nonnegligible lag 1 spatialcorrelation between two sites, can be side{stepped. That leakage problemseems to have been ?rst observed by Zidek et al. (2002) while interpolatinghourly PM10 concentrations in the Vancouver area. Though not preciselyde?ned, the leakage happens if the cross{covariance of space and time is notnegligible in the whitened residuals due for example to a failure to correctlymodel autocorrelation at ?ne temporal scales (Zidek et al., 2002). In fact,it actually provides a criterion for selecting the appropriate response vectordimension in our approach. Section 4.4 discusses appropriate initial settingsfor the multivariate BSP to interpolate hourly ozone concentrations. Theadvantages of using the multivariate approach have been described by Leand Zidek (2006), and will be reviewed in Section 4.6.The temporal prediction problem does not at ?rst glance seem to beembraced by the current multivariate BSP approach. The challenge arisesbecause the multivariate response variable used in this case has to be a 24{dimensional response vector, in order to estimate the hyperscale covariancematrix among these 24 \pollutants", in reality hours, with a separabilityassumption on the covariance structure. Using the whole database is notappropriate because it invalidates the independence assumption on the re-sponse vectors needed. However, using a part of the database leads to achoice for the covariates, although further work on the temporal predictorhas to be done. Resolutions of some above issues and theoretical results fortemporal predictive distributions are presented in Section 5.1.To assess the multivariate BSP model's performances, two other alterna-tive approaches, the DLM and NAIVE (NAIVEcurrency1), are proposed for spatialinterpolation (temporal prediction) of the Chicago area's hourly ozone con-53Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial Interpolationcentration's ?eld. These three models' performances are then compared inSection 4.4 (Sections 5.2{5.3) for the BSP, DLM and NAIVE (NAIVEcurrency1),respectively.Computational e?ciency is one major advantage of the multivariateBSP approach. The software, EnviRo.stat, can be freely downloaded fromhttp://enviro.stat.ubc.ca/. Sections 4.6 and 5.5 summarize our conclusionsabout the implementation of this approach.The next section describes characteristics of the Chicago area's hourlyozone ?eld.4.2 AQS Ozone Database (2000) for the ChicagoAreaThe database used in this chapter originally comes from the AQS ozonedatabase (2000) by EPA. The hourly ground{level ozone concentrations (inppb) for the whole summer in the Chicago area are extracted from thatdatabase. The extracted database contains 24 monitoring stations at irreg-ularly geographical locations in this area, hourly ozone concentrations beingmeasured at each of them. The joint spatial and temporal dependence ofthe hourly ozone levels are then modelled as a spatio{temporal process inthe spatio{temporal ?eld over the Chicago area.To facilitate the assessment of the model's performance for interpolationand prediction, 14 sites are selected as \gauged" sites from 24 monitoringstations, the remaining 10 being taken to be ungauged sites. Figure 4.1represents the geographical locations of these 14 gauged and 10 ungaugedsites. Each has a few missing values but the gauged sites have many fewerzero measurements during the overall time span than most of the ungaugedsites, thus providing much more information for this spatio{temporal ?eld(see Figure 4.2).Figure 4.3 shows a side{by{side boxplot of the square{root of hourlyozone concentrations at each one of the 24 monitoring stations across allthe time points. It shows that Gauged Site 3 behaves di?erently because54Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationFigure 4.1: Geographical locations for the Chicago AQS database (2000), wherethe latitude and longitude are measured in degrees. (? = G = gauged sites and ?= UG = ungauged sites.)of its deviation from the median for all sites and times. Figure 4.1 showsthat gauged site to be near the Michigan River. However, it is unknown ifthe di?erence of the observed responses at Gauged Site 3 from the rest aredue to the in?uence of that river or because other sites are also close to it,for example, Gauged Sites 1 and 10. One might expect that any model nottaking account of this di?erence could lead to a poor model ?t. This chapter55Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationFigure 4.2: Boxplots for the rates of: (a) missing measurements; and (b) zeromeasurements, at 24 monitoring stations in the Chicago AQS database. (G =gauged sites and UG = ungauged sites.)will examine this issue later when comparing interpolation (see Section 4.4)and prediction (see Section 5) of ozone concentrations' ?eld using threedi?erent approaches: the multivariate BSP, DLM and NAIVE (NAIVEcurrency1).To explore this database further, weekday and hourly e?ects are exam-ined in Figures 4.4 and 4.5, respectively, using a simple regression method.The latter are approximately constant over all gauged sites; in particular,the variability of the hourly e?ects from 0 A.M. to 10 A.M. is slightly largerthan that of the remaining hours after 10 A.M., indicating the relativelystrong constant hourly e?ects from 10 A.M. to 11 P.M. The weekday ef-fects in Figure 4.4 also indicate constant weekday e?ects across all gaugedsites. The above exploratory data analysis (EDA) suggests modelling con-stant weekday and hourly e?ects across all gauged sites. Constant weekday56Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationFigure 4.3: Boxplots for the square{root of hourly ozone concentrations (pppb) at24 monitoring stations in the Chicago AQS database. (G1 = Gauged Site 1; UG1= Ungauged Site 1; and so on.)and hourly e?ects point to constant e?ects for appropriate covariates in themultivariate BSP approach. Next, the corresponding model settings andmethodology for the multivariate BSP is discussed in the context of theChicago area's hourly ozone concentrations' ?eld.4.3 MethodologyThe multivariate BSP approach puts no restriction at level one of its un-derlying hierarchical model structure on the covariance structure in thespatio{temporal ?eld. Instead, the covariance function is modelled by thegeneralized{inverted Wishart (GIW) (Le & Zidek 1992; 2006) distribution.That approach takes account of uncertainties about the K{step staircase57Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationFigure 4.4: The weekday e?ect of the square{root of hourly ozone concentrations(pppb) at the 14 gauged sites in the Chicago AQS database.pattern data by allowing di?erent degrees of freedom for di?erent \steps"in the data array, and allows a nonstationary covariance structure in the?eld. In this approach, spatio{temporal multivariate responses are treatedas a collection of temporally correlated spatial ?elds at a ?nite number oftime points, instead of a collection of spatially correlated time series at a?nite number of monitoring stations in the DLM approach (Kyriakidis &Journel, 1999). The multivariate BSP models spatio{temporal responses attwo levels: at the ?rst, the spatio{temporal random function in this ?eld issupposed to follow a Gaussian process with the mean function dependingon the covariates as well as the corresponding coe?cient matrix ?; and co-variance function ?; at the second, the coe?cient matrix ? is modelled asa random matrix, following a Gaussian process with covariance function ?;having a GIW distribution. The residuals after the ?rst level of modelling58Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationFigure 4.5: The hourly e?ect of the square{root of hourly ozone concentrations(pppb) at the 14 gauged sites in the Chicago AQS database.are called as detrended residuals, and those after the second level are calledas deAR'd residuals (Zidek et al., 2002).The multivariate BSP approach uses an autoregressive temporal struc-ture to incorporate short{term autocorrelations and a nonstationary spatialcovariance structure to deal with the nonstationary temporal{spatio pro-cesses. In particular, 2{hour{block response vectors are selected in Chicago'shourly ozone ?eld (Section 4.2) to reduce the loss of spatial correlation leak-age between the sites and allow prediction at the given hour borrowingstrength from its neighbour.Le and Zidek (2006) and Le et al. (1997) present theoretical results onthe multivariate BSP model, but only the results related to the work donein this section are summarized here. Speci?cally, the data patterns of themultivariate BSP, that is, the systematic missing data patterns, are intro-59Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial Interpolationduced. Also it is demonstrated in the fully hierarchical Bayesian frameworkabout the multivariate BSP model, as well as the conditional posterior dis-tribution of the unobserved response variables at ungauged sites and theprocedure for the hyperparameters estimation. Finally, this section presentsthe posterior predictive credibility ellipsoids of the unobserved responses atungauged sites to assess the BSP's model performance (Sun et al., 1998; Le& Zidek, 2006, p.181{183). The package, EnviRo.stat, is used in this sec-tion (http://enviro.stat.ubc.ca). The results can be reproduced using thatsoftware since it is freely available.Model speci?cationThe multivariate BSP approach addresses two types of missing data pat-terns: monotone missing (i.e., staircase pattern) and systematically missing(Le et al., 1997); only the former is relevant here. The monotone missingpattern occurs when the multivariate responses are measured at di?erentsites that start at di?erent times. In this case, the sites can be rearrangedin such a way that the data array exhibits a monotone increasing patternover time (say a K{step staircase pattern, for K = 1;2;:::). Within eachblock of the monotone missing or staircase pattern, the multivariate randomvector of responses are measured from the same starting time.Le and Zidek (2006) provide a theory that handles such data. In ourapplication, K = 1 in the Chicago's hourly ozone ?eld because the responsevariables are measured from the same starting time across all gauged sites.[Note: at each gauged site, the small number of missing measurements areimputed by the spatial regression method before implementing the multi-variate BSP to interpolate the hourly ozone in the ?eld. Unlike the DLM,we can obtain posterior samples for the missingness by treating them asadditional parameters. Imputing the missingness by the MCMC samplesmight take more computational time, however.]In this spatio{temporal ?eld, let n denote the total number of timepoints, p; the total number of pollutants or species measured at each station,g; the total number of gauged sites and u; the total number of ungauged60Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial Interpolationsites. Denote the response variable at time t and gauged site gj byY[gj]t = (Y [gj]t;1 ;:::;Y [gj]t;p ) : 1 ? p;and the response variable at time t and ungauged site ui byY[ui]t = (Y [ui]t;1 ;:::;Y [ui]t;p ) : 1 ? p;for t = 1;:::;n; j = 1;:::;g; and i = 1;:::;u: At time t; the responses atgauged sites are coordinates of the random response vector Y[g]t = (Y[g1]t ;:::;Y[gg]t ) : 1 ? gp and at ungauged sites, Y[u]t = (Y[u1]t ;:::;Y[uu]t ) : 1 ? up:The combined random response vector at time t can be written as Yt =(Y[u]t ;Y[g]t ) : 1 ? (u + g)p: Consequently, the matrix variate response Y isgiven by (Y10;:::;Yn0)0 : n? (u+g)p: Notice that Y can also be written as(Y[u];Y[g]); where Y[u] = (Y[u]1 0;:::;Y[u]n 0)0; the unobserved response vari-ables at ungauged sites, and Y[g] = (Y[g]1 0;:::;Y[g]n 0)0; the observed responsevariables at gauged sites. Let Z : n ? h be the covariates matrix, where thetotal number of covariates is h: Assume that the covariates are the sameacross all the sites at any ?xed time point. Suppose ? : h ? (u + g)p isthe coe?cient matrix of Z: Assume common covariates e?ects across all thesites in the multivariate BSP.The multivariate BSP model is given byYj?;? ? N(Z?;In ?)2 (4.1)?j?;?0 ? N(?0;F?1 ?) (4.2)? ? GIW(?;?); (4.3)where the covariance structure ? : (u+g)p?(u+g)p is positive de?nite andfollows the generalized Inverted Wishart (GIW) distribution (Le & Zidek,2006, p.158; Brown et al., 1992; Le et al., 1997; Sun et al., 1997; Le & Zidek,61Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial Interpolation1992).The GIW distribution is de?ned recursively for the covariance function?; having the K{block structure (Brown et al., 1994; Le & Zidek, 2006,p.300). The GIW can be reparameterized by the Bartlett transformation(Le & Zidek, 2006, p.302). In particular, for the case of K = 1; the GIWdistributed ? in (4.3) is equivalent to (?[u];?[u];?1), such that?[u]j?[u] ? N(?[u]0 ;H[u] ?[u]) (4.4)?[u] ? IWup(?[u];currency1[u] ) (4.5)?1 ? IWgp(?1;currency11 ); (4.6)where IW represents the inverted Wishart distribution. Equations (4.4){(4.6) imply that ?[u]0 = ??1gg ?gu; H[u] = ??1gg ; ?[u] = ?; currency1[u] = ?ujg;?1 = ?gg; currency11 = ?gg; and ?1 = ? ? up by the properties and de?nitionsof the IW and GIW (Le & Zidek, 2006, p.299{301). Let H = f?;?;F;?0g;? = f?[u]0 ;H[u];currency1[u]; ;currency11g; and ? = f?[u];?1g: Hence, ?[u] = ?1 + up:Note that sine or cose functions (show in the DLM model before rep-resenting the periodicities in spatial{temporal data) can be included in Z,common over sites. While working with BSP, we can actually incorpo-rate more general structure with more ?exible than only using sine or cose.Moreover, we will have a?ordable number of parameters. To deal with thesite{speci?c covariates, we can do it in two ways: (1) dealing it in the pre{?ltering stage; or (2) treating it as a random process, and then conditioningon that response.Given the multivariate BSP model in (4.1){(4.3), the predictive posteriordistribution of the unobserved responses at ungauged sites and the hyper-parameter estimates at the gauged and ungauged sites are given brie?y in2 represents the Kronecker product between two matrices such thatAp?q Bm?n =0B@a11B ::: a1qB... ...ap1B ::: apqB1CApm?qn=0B@b11A ::: b1nA... ...bm1A ::: bmnA1CApm?qn:62Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial Interpolationthe next section.Predictive distributions and hyperparameters estimationThis subsection brie?y demonstrates the method used in estimating the hy-perparameters in the multivariate BSP approach and the predictive posteriordistribution of the multivariate response variables at ungauged sites giventhose estimated hyperparameters. The hyperparameters in the BSP containthe hyperparameters at gauged sites, Hg = ( ;currency11;?1;F;?0), and those atungauged sites, Hu = (currency1[u];H[u];?[u]0 ;?[u]): Firstly, the EM algorithm is usedto estimate the hyperparameters at the gauged sites, Hg, in the multivariateBSP. Given those estimators, the Sampson{Guttorp method is used to esti-mate the covariance function of the ungauged sites and the cross{covariancefunction between the gauged and ungauged sites. Consequently, the estima-tors of ?[u] and ?[u]0 can be obtained from the above estimators. Finally, thepredictive posterior distribution of the response variables at the ungaugedsites is obtained conditional on those estimators of H = (Hg;Hu):Note here we only estimate those hyperparameters once in the BSP ap-proach. Not like the DLM, estimates for the model parameters have to beobtained at each iteration of the MCMC runs. This one{time{estimationgreatly saves our computational time.Suppose Y = (Y[u];Y[g]) : (n ? up;n ? gp); ? = (?[u];?[g]) : (h ?up;h? gp); and ?0 = (?[u]0 ;?[g]0 ) : (h? up;h? gp): Given Y[g]; the predictivedistribution of Y[u] isY[u]jY[g];H ? tn?(up)(?[ujg];?[ujg] ?[ujg];? ? up + 1); (4.7)where?[ujg] = Z?[u]0 + (Y[g] ? Z?[g]0 )?[u]0 (4.8)?[ujg] = In + ZF?1ZT + (Y[g] ? Z?[g]0 )H[u](X[g] ? Z?[g]0 )T (4.9)?[ujg] = 1? ? up + 1(currency1[u] ) (4.10)63Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial Interpolation(Le & Zidek, 2006, p.160{161).At the gauged sites, the multivariate BSP can deal with the case ofmissing pollutants in each block. In other words, the observed responses inthe matrix{variate form may include the missing columns of some pollutantsat the gauged sites. Let l be the total number of unobserved responses atgauged sites, and so l 2 f0;:::;gpg. The vector of observable responsesat gauged sites Y[g] can be partitioned into Y(1) : n ? l and Y(2) : n ?(gp ? l); the missing and observed responses at gauged sites, respectively.Let rj : (gp ? 1) = (rj;1;:::;rj;gp)0 be a vector such that rj;j = 1 andrj;k = 0 for k 6= j; and k;j = 1;:::;gp: Suppose R1 = (ri1;:::;ril); andR2 = (ril+1;:::;rigp); thus R = (R1;R2) : (gp ? gp); forms an orthogonalmatrix. Hence, Y(1) = Y[g]R1 and Y(2) = Y[g]R2: LetRT ?ggR =? ?11 ?12?21 ?22!=? RT1 ?ggR1 RT ?ggR2RT2 ?ggR1 RT2 ?ggR2!and?gg = RT ?ggR =? ?11 ?12?21 ?22!=? RT1 ?ggR1 RT ?ggR2RT2 ?ggR1 RT2 ?ggR2!;with ?11;?11 : l?l and ?22;?22 : (gp?l)?(gp?l); respectively. Similarly,?[g]0 R can be partitioned as ?[g]0 = (?[g]0 R1;?[g]0 R2) = (?[g](1);?[g](2)):By the properties of the multivariate t{distribution, the predictive pos-terior distribution of Y(1) is given byY(1)jY(2);H ? tn?h(Z?[g](1) + (Y(2) ? Z?[g](2))??122 ?21; 1? ? up ? l + 1P1j2 ?1j2;? ? up ? l + 1); (4.11)whereP1j2 = In + ZF?1ZT + (Y(2) ? Z?[g](2))??122 (Y(2) ? Z?[g](2))T (4.12)64Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial Interpolationand?1j2 = ?11 ? ?12??122 ?21: (4.13)Furthermore, the predictive posterior distribution of Y[u] is given as followsY[u]jY(2);H ? tn?up(Z?[u]0 + (Y(2) ? Z?[g](2))??122 RT2 ?gu; Puj2 ?ujg? ? up ? l + 1;? ? up ? l + 1); (4.14)wherePuj2 = In + ZF?1ZT + (Y(2) ? Z?[g](2))??122 (Y(2) ? Z?[g](2))T (4.15)(Le et al., 1997).The next subsection explores the predictive performance of the multi-variate BSP approach.Predictive performanceTo assess the multivariate BSP model's performance, pointwise predictiveintervals and predictive posterior credibility ellipsoids are constructed fromthe predictive posterior distributions in Section 4.3 (Le et al., 1997; Le &Zidek, 2006).Le and Zidek (2006) point out that the pointwise predictive distributionof the last (or pth) pollutant at each of the ungauged sites and any ?xedtime point t can be obtained from (4.14) by letting up = 1: Moreover, thepointwise predictive variance is given byVar(Y[u]t jY(2) = y(2);H) = (?currency1 ? 2)?1Ptj2?uj2; (4.16)where ?currency1 = ? ? up ? l + 1: Hence, the pointwise predictive intervals of thelast pollutant at ungauged site ui and the ?xed time t is given byE(Y[u]t jY(2) = y(2);H) ? t?currency1(0:025)(Var(Y[u]t jY(2) = y(2);H))1=2:65Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationHowever, one might expect that the pointwise predictive intervals wouldnot have good calibration properties due to the accumulated uncertaintyarising from simultaneously interpolating the response variables in the spatio{temporal ?eld. Le et al. (1997) develop the ellipsoid credible regions forsimultaneously interpolating at ungauged sites, improving its calibration.Those ellipsoid credible regions are given by the following theorem that weinclude here for completeness.Theorem 4.3.1 The (1 ? ?)?level (0 < ? < 1) simultaneously posteriorcredibility region is given byfY[u]t : (Y[u]t ?^[u]t )??1ujg(Y[u]t ?^[u]t )T < up? ? up ? l + 1Ptj2Fup;??up?l+1(1??)g;where^[u]t = Zt?[u]0 + (Y[g]t ? Zt?[g]0 )?[u]0 ;Ptj2 = 1 + ZtF?1ZTt + (Y[g]t ? Zt?[g]0 )H[u](Y[g]t ? Zt?[g]0 )T ;and ??1ujg = (currency1[u] )?1:Theorem 4.3.1 is true becauseY[u]t jY(2)t ;H ? t1?up(^[u]t ; 1? ? up ? l + 1Ptj2?ujg;? ? up ? l + 1);(4.17)where^[u]t = Zt?[u]0 + (Y(2)t ? Zt?[g](2))??122 RT2 ?gu; (4.18)andPtj2 = 1 + ZtF?1ZTt + (Y(2)t ? Zt?[g](2))??122 (Y(2)t ? Zt?[g](2))T : (4.19)Further work by Sun et al. (1998) shows these credibility ellipsoids to bewell calibrated. These credible regions are also constructed and comparedwith the pointwise predictive intervals when interpolating in the Chicago's66Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial Interpolationhourly ozone ?eld, in the next section.4.4 Spatial InterpolationThis section's main theme concerns the interpolation of ozone concentrationsat those ten ungauged sites using the multivariate BSP approach for theChicago's hourly ozone ?eld. Moreover, two other approaches, the DLMand NAIVE, are also used in the same interpolation problem to assess themodel's performance. The interpolation results using these three approachesare then compared in this section.The multivariate BSP approachFor the multivariate BSP model given by (4.1){(4.3), hourly ground{levelozone concentrations's ?eld is modelled as a trend plus a detrended spatio{temporal noise (i.e., detrended residuals) at the ?rst level of a hierarchi-cal model; at the second level, the temporal dependence in the detrendedspatio{temporal noise is modelled through an autoregressive process withthe deAR'd residuals. These deAR'd residuals are then interpolated atthe ungauged sites in the spatio{temporal ?eld. Finally, these interpolateddeAR'd residuals are imputed by taking them back to the trend model toget the interpolated values at the ungauged sites. To do this, we square theinterpolated responses due to the square{root transformation we made dueto the normality assumption assumed for this model.The multivariate BSP model has been used to interpolate Vancouver'shourly PM10 ?eld (Li et al., 1999; Zidek et al., 2002). A multivariate datamodel is created to reduce the loss of spatial correlation in the deAR'd resid-uals compared with their detrended counterparts, and to allow the predic-tions to be made on any response by borrowing strength from the close{byresponses through their spatial correlations (Le et al., 1999; Zidek et al.,2002; Le & Zidek, 2006, p.277). This loss of spatial correlation has beencalled the spatial correlation leakage problem (also seen in Section 4.5) byLi et al. (1999), Zidek et al. (2002), and Le and Zidek (2006). The spatial67Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial Interpolationleakage problem occurs if nonnegligible lag{1, lag{2, etc., spatial correlationsexist in the deAR'd residuals. Using those problematic deAR'd residuals tointerpolate at ungauged sites, one may anticipate that the interpolators atany given ungauged site cannot borrow strength from its gauged neighboursdue to the small number of spatial correlation between sites (Zidek et al.,2002; Le & Zidek, 2006). Figures 4.6 plots the spatial correlations of thedetrended and deAR'd residuals between sites, respectively. The substantialloss of the spatial correlation of the deAR'd residuals suggests the use of themultivariate strategy adopted here. In particular, that strategy enables oneto bypass the need for di?cult ?ne scale autocorrelation modelling that willinevitably be inexact and hence induce the lagged spatial cross{correlationsthat can cause that leakage.Figure 4.6: The estimated spatial correlations of: (a){detrended residuals, and(b){deAR'd residuals; between gauged sites.But what kind of multivariate data settings should one select? Le et68Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial Interpolational. (1999) and Zidek et al. (2002) argue for the use of the daily responsevectors with hourly response coordinates instead of the hourly responsesthemselves. One of their arguments, that the spatial leakage problem is neg-ligible for the daily responses but not for the hourly ones, is supported bytheoretical results (Zidek et al., 2002). A related argument is that the dailydeAR'd residuals are approximately independent from each other based onthe AR structure of the detrended residuals, approximately satisfying the in-dependence assumptions for the responses in (4.1){(4.3). These argumentsalso apply to the Chicago's ground{level ozone ?eld, as demonstrated byFigure 4.7 which depicts the partial autocorrelation functions (PACFs) ofthe detrended residuals, indicating an hourly AR(2) process. That impliesthe possible choices of dimensions, 2,:::, or even 6, for the daily responsevectors to achieve adequate temporal separation between them to ensure in-dependence while at the same time, allowing pairwise temporal correlationsbetween hours to be estimated. Choice of the appropriate dimensionalityof the response vectors is based on how severe the spatial correlation leak-age may result. In other words, we use multivariate AR process for thedetrended residuals, avoiding di?erent AR process for responses at di?erentmonitoring locations.For simplicity, let (i:j){hour represent the sub{data matrix for the hourlyresponses from hour i to j across the gauged sites, for i;j 2 f1;:::;pg. Sup-pose one were interested in interpolating say hour 11's ozone level at any?xed ungauged site given all the observed responses at the gauged sites.As we discussed above, the possible response vectors could be the (10:11){hours, :::, or the (6:11){hours. For each choice of these response vectors,the spatial correlations between all the gauge sites are estimated using themultivariate BSP approach. Figure 4.8 shows these estimated spatial corre-lations for the (10:11){hours, :::, (6:11){hours response vectors and that ofthe detrended residuals. Notice how the spatial correlation declines as thedimensions of the response vector increases (leakage). This boxplot showsthat the (10:11){hours response vectors have the smallest number of loss ofspatial correlations between the gauged sites, so becoming the right choicefor a multivariate data model. This is true for other hours as well, strongly69Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationFigure 4.7: The PACF plots for the square{root of hourly ozone concentrations(pppb) at the 14 gauged sites in the Chicago's area AQS database.supporting the use of 2{hour{block as the response vector. Hence, 2{hour{block data are extracted from the Chicago AQS ozone database (2000) toserve as the multivariate responses in a multivariate BSP model designed tointerpolate the hourly ozone concentrations in the ?eld.Prior to implementing the multivariate BSP approach, a small number ofmissing measurements are ?lled in by the spatial regression method. For themodel in (4.1){(4.3), p = 2; n = 123; u = 10; and g = 14: The multivariateBSP approach is repeated 24 times by successively cycling the ?rst hourin the two{hour{block through the day to predict the hourly ozone levelsat the 10 ungauged sites, and the corresponding 95% pointwise predictiveintervals.For example, suppose the hour of \interest" was hour 11. The re-sponse variable at any ?xed day, t; and gauged site, j; could be written70Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationFigure 4.8: Boxplots for the spatial correlations of the detrended residuals (De-trended), and the estimated spatial correlations using the square{root of hourlyozone concentrations during the hours of: 9 A.M. to 10 A.M. (10:11), 8 A.M. to 10A.M. (9:11), 7 A.M. to 10 A.M. (8:11), 6 A.M. to 10 A.M. (7:11), and 5 A.M. to10 A.M. (6:11), respectively.as Y[gj]t = (Y[gj]t;10;Y[gj]t;11) : 1?p: Given the BSP model in (4.1){(4.3) and pri-ors of the parameters, the predictive posterior distribution of Y[u] is givenby (4.7). Notice that l = 0 because the response vector contains no miss-ing values. Two covariates, month with four levels and weekday with sevenlevels, are considered in this approach due to the exploratory data analy-sis described above for this ?eld, which returns h = 11: Let e1 = (0;1)0and E1 = (e10;:::;e10)0 : up ? 1: By a basic property of the multivariatet{distribution and the predictive posterior distribution of Y[u]t in (4.17), theposterior distribution of the interested pollutant at ungauged sites is given71Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationbyY[u]t E1jY(2)t ;H ? t1?u(^[u]t E1; 1? ? up ? l + 1Ptj2E01?ujgE1;? ? ug ? l + 1): (4.20)Let e0;j be the u{dimensional vector such that the jth entry is 1 and 0 oth-erwise, for j = 1;:::;u: Then the unobserved pth pollutant at the ungaugedsite uj; Y[u]t E1e0;j, has the following conditional posterior distributionY[u]t E1e0;jjY(2)t ;H ? t??up?l+1(^[u]t E1e0;j; 1? ? up ? l + 1Ptj2e00;jE01??ujgE1e0;j): (4.21)Hence, the predictive posterior mean and variance of the unobserved pthpollutant are given byE(Y[u]t E1e0;jjY(2)t ;H) = ^[u]t E1e0;jandVar(Y[u]t E1e0;jjY(2)t ;H) = 1? ? up ? l + 1Ptj2e00;jE01?ujgE1e0;j;respectively. Consequently, the pointwise 95% predictive intervals of theunobserved pth pollutant at the ungauged site ui and ?xed day t is given by:E(Y[u]t E1e0;jjY(2)t ;H) ? t??up?l+1(0:025)(Var(Y[u]t E1e0;jjY(2)t ;H))1=2:(4.22)The above procedure is then repeated 24 times to interpolate the hourlyozone levels in this ?eld. The software used for this multivariate BSP ap-proach is EnvioStat.1.0, found at http://enviro.stat.ubc.ca.We next compare our approach with two others, the DLM and NAIVE.72Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationThe DLM approachThe dynamic linear model (DLM) is one alternative approach to the spatialinterpolation in the Chicago area's hourly ozone ?eld. The DLM and its im-plementation have been extensively explored in Chapter 2. In the Chicago'sAQS ozone database, the total number of time points (i.e., the hours), T;is 123 ? 24 = 2952. To compare its interpolation results with those of themultivariate BSP, same gauged and ungauged sites are selected as in Section4.4. Hence the total number of the gauged sites, n; is 14. The initial settingsfor the DLM are given next, following an investigation of the discount factorfor ?rst{order polynomial dynamic models, to optimize this approach.Our investigation indicates that the appropriate prior for the phase pa-rameters a = (a1;a2)0 is N(?0;?0); where ?0 = (1:5;4:5)0 and?0 =? 0:0625 00 0:5625!:The best initial speci?cation for the state parameters turns out to be m0 =(5:15;?0:7510n;0:0510n)0 for location while C0 is a block diagonal matrix withdiagonal entries: 0:1304; 0:0158In; and 0:0003In:The ?rst{order polynomial dynamic model, for t ? 1; is then given by:yt = ?t + "t "t ? N(0;?2) (4.23)?t = ?t?1 + !t !t ? N(0;?2?2?); (4.24)and the initial information: ?0 ? N(0;?20): The measurement of the rate ofadaptation to new data, i.e., the adaptive coe?cient At; converges to theconstant A as t ! 1; where A is a function of r = ?2?; the ratio of the statevariance to the system variance (r 2 [0;1]) (Harvey, 1984; West & Harrison,1997). r is also called the signal{to{noise ratio, large values of r implyinga clear signal in the system. The adaptive coe?cient A re?ects how muchweight to put on the new data in updating the predictive mean for thissimplest DLM: larger value of A represents more weight on the new data;smaller value of A represents less weight on the new data. Consequently, the73Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial Interpolationdiscount factor, ?, is de?ned as 1 ? A: Hence, the larger the discount factoris, the less weight is attached to information provided by the new data inupdating the predictive mean at current time; the smaller ? is, the moreweight is put on the information of the new data. If ? = 0; the dynamicmodel would not put any weight on the new data, resulting in a very poormodel ?t. In other words, there is no signal in the system but only the noisefrom the measurements. It is impossible to distinguish the signal and thenoise in the system. If ? = 1; almost all the weight would be on the newdata, resulting in a non{distinguishable signal and noise. West and Harrison(1997) recommend setting the values of ? between 0.8 and 1. A; r; and ?; arerelated by the equation: r = A2(1 ? A)?1 = (1 ? ?)2??1 (West & Harrison,1997).? 0.80 0.85 0.90 0.95 0.99r 0.05 0.0265 0.0111 0.0026 0.0001Table 4.1: The relationship between the discount factor (?) and the signal{to{noiseratio (r).Table 4.1 shows some of the values of ? and its corresponding r; that is,?2? in (4.23){(4.24). This table provides one way to set the initial values of?2?: For the constant DLM model in Chapter 2 given the common variance?2 the canonical trend has a Gaussian distribution with variance ?2?2y ; theperiodic trend with period 24, a Gaussian distribution with variance ?2?21 ;and the periodic trend with period 12, a Gaussian distribution with variance?2?22 . The variability of the canonical trend is assumed to be larger thanthat of the periodic trends; while the variability of the periodic trend withperiod 24 is assumed to be larger than that of the periodic trend with period12. In other words, one assumes that the canonical trend puts more weighton the information provided by the new data than other periodic trends,and the periodic trend with periodic 24 has more weight than the periodic12 trend. For example, if the discount factors of the canonical, periodic 24,and periodic 12 trends were set to 0.85, 0.90, and 0.95, respectively, thecorresponding values of ?2y ; ?21 ; and ?22 would be 0.0265, 0.0111, and 0.0026,respectively. Those values, divided by 17, are then selected to be the ?xed74Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial Interpolationvalues of the model parameters: ?2y ; ?21 ; and ?22 : [The reason to divide themby 17 were discussed in Section 3.4 to take account of the 17 weeks' timee?ect.] Both ?1 and ?2 are set at 25. The hyperpriors for ?2 and ? areIG(2;2) and IG(1;5), respectively.Those initial settings of the DLM in Section 3.2 improve the DLMmodel's ?t in interpolating the responses at ungauged sites. However, it un-derestimates the variability of the system, leading to a trade{o? between theprecision and the variability of the spatial interpolator. The interpolation re-sults for Chicago's hourly ozone ?eld are shown in Section 4.4. The software,GDLM.1.0., can be freely downloaded from http://enviro.stat.ubc.ca/dlm,to allow our results to be reproduced or for application in other contexts.NAIVE approachOne might well suspect that the responses at the nearby gauged sites af-fect interpolation at ungauged sites. It may seem plausible that for a roughspatio{temporal ?eld, the spatial correlations between the sites are so smallthat the responses at di?erent sites can be viewed to be approximately in-dependent from each other. This leads to NAIVE approach, i.e., the \in-terpolated" values at any ungauged site are the observed responses at thatgauged site closest to the ungauged one.Ungauged Site Gauged Site GCD (km)1 1 18.192 1 5.743 1 6.094 4 8.705 7 12.996 7 19.467 7 13.508 1 13.979 11 15.8610 10 9.33Table 4.2: The greatest circle distance (GCD) between the pairs of ungauged sitesand their closest gauged site(s) in the Chicago's hourly O3 ?eld.75Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationTable 4.2 represents the GCD (in km) between the ungauged sites andtheir closest gauged sites in the Chicago's hourly ozone ?eld. For example,Gauged Site 11 is the closest one to Ungauged Site 9 in terms of the GCDin Table 4.2. The interpolated values at Ungauged Site 9 are the observedresponses at Gauged Site 11 in this approach.NAIVE approach ignores the local characteristics in the ?eld, not takingaccount of the e?ects of the covariates, or other factors such as wind andwind directions among the stations. If NAIVE approach had been provedcomparable or better than the multivariate BSP and DLM, one would preferit owing to its great simplicity. Section 4.4 compares NAIVE approach withthe multivariate BSP and DLM.Comparisons and resultsThis subsection implements the multivariate BSP, DLM, and NAIVE ap-proaches, to interpolate the hourly ozone concentrations at ungauged sitesin the Chicago area. These interpolation results are compared with eachother to assess the model performance of the multivariate BSP, DLM, andNAIVE approaches. Figures 4.9{4.16 plot the interpolation results of thethree approaches at one of the ungauged sites, Ungauged Site 7; duringthe ?rst week to the sixteenth week. In these four ?gures, the dots rep-resent the observed values at this ungauged site, the solid and dotdashedlines represent the predictive mean and 95% pointwise predictive intervalsfor the multivariate BSP approach, the dashed and dotted lines representthe predictive mean and 95% empirical predictive intervals for the DLM ap-proach, and the addition signs represent the interpolated value by NAIVEapproach, i.e., the observed values at Gauged Site 7 in this case (refer toTable 4.2). Those empirical predictive intervals of the DLM are wiggly andmonotonically increasing as time increases, one feature observed in Section3.4. On the other hand, the 95% pointwise predictive intervals of the mul-tivariate BSP do not have such wiggly behavior and capture both the long{and short{term trends at Ungauged Site 7. The interpolation of NAIVEapproach is not as good as those of the DLM and multivariate BSP for most76Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial Interpolationtime points, but it does occasionally do well when the interpolation resultsfor the other two approaches deviate from the observed responses.Figure 4.9: Interpolation at Ungauged Site 7 from the 1st week to the2nd week. The square{root of hourly ozone concentrations are plottedon the vertical axes and hours, on the horizontal axes. [Solid (dot-dashed) lines = interpolation and 95% pointwise predictive intervalsby the BSP; dash (dot) lines = interpolation and 95% predictive inter-vals by the DLM; + = interpolation by NAIVE; and ? = observationsat Ungauged Site 7.]The closer the ungauged sites are to the gauged ones, the better the ex-pected interpolation performance; that is the assumption upon which thesethree approaches rest. Figures 4.17{4.18 plot the interpolation results for77Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationFigure 4.10: Interpolation at Ungauged Site 7 from the 3rd week to the4th week. The square{root of hourly ozone concentrations are plottedon the vertical axes and hours, on the horizontal axes. [Solid (dot-dashed) lines = interpolation and 95% pointwise predictive intervalsby the BSP; dash (dot) lines = interpolation and 95% predictive inter-vals by the DLM; + = interpolation by NAIVE; and ? = observationsat Ungauged Site 7.]the three approaches at Ungauged Site 10 during the 1st and 10th week,respectively. The overestimated predictive variances of the DLM have amonotone increasing trend in the 10th week, compared with those in the1st week. NAIVE approach performs quite di?erently from one time to thenext. For example, Figure 4.17 shows that NAIVE approach produces good78Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationFigure 4.11: Interpolation at Ungauged Site 7 from the 5th week to the6th week. The square{root of hourly ozone concentrations are plottedon the vertical axes and hours, on the horizontal axes. [Solid (dot-dashed) lines = interpolation and 95% pointwise predictive intervalsby the BSP; dash (dot) lines = interpolation and 95% predictive inter-vals by the DLM; + = interpolation by NAIVE; and ? = observationsat Ungauged Site 7.]predictive values that are quite close to the observed responses at UngaugedSite 10 at most times in the 1st week. However, Figure 4.18 shows that thepredicted values for NAIVE approach tend to underestimate the observedresponses. One therefore suspects that other factors contribute to the spatialcorrelation between these sites, for instance, covariate e?ects, such as wind79Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationFigure 4.12: Interpolation at Ungauged Site 7 from the 7th week to the8th week. The square{root of hourly ozone concentrations are plottedon the vertical axes and hours, on the horizontal axes. [Solid (dot-dashed) lines = interpolation and 95% pointwise predictive intervalsby the BSP; dash (dot) lines = interpolation and 95% predictive inter-vals by the DLM; + = interpolation by NAIVE; and ? = observationsat Ungauged Site 7.]and its direction. Two close{by sites may be correlated because they lie inthe same wind direction; on the other hand, two close{by sites may lookindependent of one another if their locations lie on a line orthogonal to thewind direction. Other interpolation results, demonstrated in Figures 4.19{4.21, are also examined to check for the association between the response80Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationFigure 4.13: Interpolation at Ungauged Site 7 from the 9th week to the10th week. The square{root of hourly ozone concentrations are plottedon the vertical axes and hours, on the horizontal axes. [Solid (dot-dashed) lines = interpolation and 95% pointwise predictive intervalsby the BSP; dash (dot) lines = interpolation and 95% predictive inter-vals by the DLM; + = interpolation by NAIVE; and ? = observationsat Ungauged Site 7.]variables at the ungauged site and its closest gauged neighbour.The marginal posterior distribution of the pth pollutant tends to overesti-mate the predictive variance of the process, as observed in Figures 4.17{4.21.The posterior ellipsoid credible regions, constructed from the joint posteriordistributions of the p pollutants, provide well calibrated predictive results81Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationFigure 4.14: Interpolation at Ungauged Site 7 from the 11th weekto the 12th week. The square{root of hourly ozone concentrations areplotted on the vertical axes and hours, on the horizontal axes. [Solid(dotdashed) lines = interpolation and 95% pointwise predictive inter-vals by the BSP; dash (dot) lines = interpolation and 95% predictiveintervals by the DLM; + = interpolation by NAIVE; and ? = observa-tions at Ungauged Site 7.]using the multivariate BSP approach. Figure 4.22 plots the empirical el-lipsoid coverage probabilities across the ungauged sites at various nominallevels. Their averages across the ungauged sites are listed in Table 4.3,showing slightly underestimated predictive variances.Table 4.4 shows the mean square predictive error (MSPE) at each un-82Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationFigure 4.15: Interpolation at Ungauged Site 7 from the 13th weekto the 14th week. The square{root of hourly ozone concentrations areplotted on the vertical axes and hours, on the horizontal axes. [Solid(dotdashed) lines = interpolation and 95% pointwise predictive inter-vals by the BSP; dash (dot) lines = interpolation and 95% predictiveintervals by the DLM; + = interpolation by NAIVE; and ? = observa-tions at Ungauged Site 7.]gauged site obtained by the three approaches. For all the ungauged sites,the NAIVE approach has the largest MSPE and has the poorest model per-formance of the three. Except for Ungauged Sites 2 and 8, the multivariateBSP has the smallest values of the MSPE among these three approaches,overall the best model performance among the three. At Ungauged Sites83Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationFigure 4.16: Interpolation at Ungauged Site 7 from the 15th weekto the 16th week. The square{root of hourly ozone concentrations areplotted on the vertical axes and hours, on the horizontal axes. [Solid(dotdashed) lines = interpolation and 95% pointwise predictive inter-vals by the BSP; dash (dot) lines = interpolation and 95% predictiveintervals by the DLM; + = interpolation by NAIVE; and ? = observa-tions at Ungauged Site 7.]2 and 8; the MSPE of the DLM is slightly smaller than that of the multi-variate BSP, implying possible local variations in the characteristics of theungauged sites. In other words, one might well add a nugget e?ect in themultivariate BSP approach to account for the variations among the responsevariables.84Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationFigure 4.17: The observed square{root of ozone concentrations (pppb) duringthe 1st week, the interpolation using the multivariate BSP, DLM and NAIVE ap-proaches, and the 95% pointwise predictive intervals using the multivariate BSPand DLM approaches at Ungauged Site 10.Figures 4.23 and 4.24 plot the ratios of the MSPEs of NAIVE and theDLM to that of the multivariate BSP, respectively. Figure 4.23 shows thatthe interpolation for the multivariate BSP has smaller MSPE than that forNAIVE approach over all seventeen weeks and ten ungauged sites. Figure4.24 plots the ratio of the MSPE of the DLM interpolator to that of themultivariate BSP, showing that better model performance of the latter overall 17 weeks and most ungauged sites.85Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationFigure 4.18: The observed square{root of ozone concentrations (pppb) duringthe 10th week, the interpolation using the multivariate BSP, DLM and NAIVEapproaches, and the 95% pointwise predictive intervals using the multivariate BSPand DLM approaches at Ungauged Site 10.The coverage probabilities of the multivariate BSP and DLM are com-puted at the 95% nominal level as a further assessment of model perfor-mance. Figures 4.25 and 4.26 show the side{by{side boxplots of these cov-erage probabilities over 10 ungauged sites and 17 weeks, respectively. Thecoverage probabilities of the DLM are much higher than the 95% nominallevel, for most weeks and ungauged sites. In fact, the pointwise coverageprobabilities of the multivariate BSP tend to be closer to that nominal level86Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationFigure 4.19: The observed square{root of ozone concentrations (pppb) duringthe 1st week, the interpolation using the multivariate BSP, DLM and NAIVE ap-proaches, and the 95% pointwise predictive intervals using the multivariate BSPand DLM approaches at Ungauged Site 9.than those of the DLM.Comparing both the MSPEs and coverage probabilities from the threeapproaches, NAIVE approach has the worst model performance among thesethree; the DLM approach does better than NAIVE, but not as good asthe multivariate BSP. Overall, the multivariate BSP has the best modelperformance among these three.Temporal prediction is addressed in the next section by using the mul-87Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationFigure 4.20: The observed square{root of ozone concentrations (pppb) duringthe 1st week, the interpolation using the multivariate BSP, DLM and NAIVE ap-proaches, and the 95% pointwise predictive intervals using the multivariate BSPand DLM approaches at Ungauged Site 8.tivariate BSP approach in the Chicago area's hourly ozone ?eld. Moreover,that model is assessed by comparing with the other two approaches: theDLM and NAIVEcurrency1 (see Section 5.3).88Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationFigure 4.21: The observed square{root of ozone concentrations (pppb) duringthe 1st week, the interpolation using the multivariate BSP, DLM and NAIVE ap-proaches, and the 95% pointwise predictive intervals using the multivariate BSPand DLM approaches at Ungauged Site 2.4.5 Spatial Leakage in the DLMZidek et al. (2002) address the crucial problem caused by the spatial leakage,or the space{time interaction problem in spatially interpolating the hourlyPM10 in Vancouver area using the univariate approach. The spatial leakageoccurred when substantial amount of between{site cross{correlations existsuntil certain time lag. The whitened residuals used in their model comefrom two steps: (i) ?rstly obtaining the detrended residuals by taking o?89Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationFigure 4.22: Boxplot of the simultaneous posterior ellipsoid credibility regions atvarious nominal levels.the covariates e?ect from the response variables, for ?xed site and varyingtime points; (ii) then the deAR'd (or whitened) residuals can be obtainedafter taking account the temporal dependence into the detrended residuals.To be more speci?c, suppose Y (s;t) represents the response variableat site s and time t; for s 2 fs1;:::;spg and t = 1;:::;n: Let zt : 1 ? lbe an l-dimensional covariates vector at time t for all sites and ? : 1 ? l;the regression coe?cients corresponding to the l covariates. The detrended90Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationNominal Coverage (%) 99 95 90 80 70 60 50Ellipsoid Credible Coverage (%) 82 75 71 65 62 57 53Table 4.3: The posterior ellipsoid coverage probabilities at various nominal levels.Ungauged Site MSPE(NAIVE) MSPE(DLM) MSPE(BSP)1 2.96 1.79 1.162 2.17 1.55 1.613 1.62 1.80 1.304 1.65 1.46 0.945 2.52 1.92 1.036 3.08 1.85 0.997 2.13 1.60 0.978 3.74 2.62 2.679 1.42 1.63 1.0110 0.46 0.87 0.38Table 4.4: The mean square predictive error (MSPE) at ungauged sites of themultivariate BSP, DLM, and NAIVE approaches.residuals, E(s;t), are given byE(s;t) = Y (s;t) ? zt?: (4.25)Assuming an AR(q) process for the detrended residuals, the deAR'dresiduals are denoted by e(s;t); and given byE(s;t) =qXi=1?iE(s;t ? i) + e(s;t): (4.26)Theoretical results in Zidek et al. (2002) show the existence of potentialspatial correlations leakage for an AR(1) (detrended) process. The connec-tions between Le{Zidek approach and the DLM allow one use the formermodelling in the DLM framework. Moreover, as a speci?c case of state{space model, the DLM framework can be treated as a bridge to unify theLe{Zidek approach and the general state{space modelling. Our objectiveis to investigate potential spatial correlations leakage in the general state{91Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationFigure 4.23: The ratio of MSPE of the interpolation by NAIVE to that of themultivariate BSP for each of: (a) the 17 weeks; and (b) the 10 ungauged sites.space modelling for this approach. The following content addresses the workfor an AR(p) process in (4.26). To reformulate the Le{Zidek approach asthe DLM, we can assume zt itself having a random Gaussian process withmean 0 and very small variance matrix, ?2zIl: Of course, we can also assumethat zt is ?xed at each time point t, exactly the same as in Zidek et al.(2002). In other words, we consider two cases here:Case (i) zt itself has a stochastic process, for t = 1;:::;n; thatzt = zt?1 + wzt wzt ? N(0;?2zIl): (4.27)92Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationFigure 4.24: The ratio of MSPE of the interpolation by the DLM to that of themultivariate BSP for each of: (a) the 17 weeks; and (b) the 10 ungauged sites.Case (ii) zt is ?xed at ?xed time point t; for t = 1;:::;n:For Case (i), models (4.25){(4.27) can be normalized in the DLM asfollows:Y (s;t) = F0?(s;t) (4.28)?(s;t) = G?(s;t ? 1) + wt; (4.29)where F0 : 1?(q+1) = (1;1;0;:::;0); ?(s;t)0 : 1?(q+1) = (zt?;E(s;t);E(s;t?93Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationFigure 4.25: Side{by{side boxplots of the coverage probabilities of the multivariateBSP and DLM approaches plotted against the 10 ungauged sites, respectively.1);:::;E(s;t ? q + 1)); w0t : 1 ? (q + 1) = (wzt ?;e(s;t);0;:::;0) andG : (q + 1) ? (q + 1) =0BBBBBBBB@1 0 0 ::: 00 ?1 ?2 ::: ?q0 1 0 ::: 0... ... ... ...0 0 0 ::: 11CCCCCCCCA:For Case (ii), we can also formalize models (4.25){(4.27) same as above94Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial InterpolationFigure 4.26: Side{by{side boxplots of the coverage probabilities of the multivariateBSP and DLM approaches plotted against the time span of 17 weeks, respectively.except the ?rst entry in F0 to be zt((zcurrency1t )0zcurrency1t )?1(zcurrency1t )0 and the (1,1) entry in G,zcurrency1t ((zcurrency1t?1)0zcurrency1t?1)?1zcurrency1t?1; and zcurrency1t = zt ? Pqi=1 ?izt?i:For both cases, let c0i : 1 ? (q + 1) such that its ithentry being 1 andothers, 0, for i = 1;:::;q + 1: Then we haveE(s;t ? i) = c0i+2?(s;t) (4.30)e(s;t) = c02w(s;t); (4.31)95Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial Interpolationfor i = 0;:::;q ? 1:Zidek et al. (2002) shows that the spatial correlation leakage occurs forthe AR(1) process if Cor(E(s;t? 1);e(s0;t)) 6= 0: Hence the lag{1 between{site cross{covariance for any two sites s and s0 can be written as:Cov(E(s;t ? 1);e(s0;t)) = Covfc03?(s;t);c02w(s0;t)g= c03Covf?(s;t);w(s0;t)gc2= c03CovfG?(s;t ? 1) + w(s;t);w(s0;t)gc2= c03Cov(w(s;t);w(s0;t))c2= c03Wt(s;s0)c2; (4.32)where Wt : p(p+1)?p(p+1) is the covariance matrix for the vector wt : p(p+1)?1 = (w(s1;t)0;:::;w(sp;t)0)0 at ?xed time point t and Wt(s;s0) : (p+1)?(p + 1); the covariance matrix for w(s;t) and w(s0;t) for ?xed t: Therefore,the lag{1 between{site cross{correlation is not zero if (Wt(s;s0))32 6= 0:In general, for i = 1;:::;q?1; consider the covariance between E(s;t?i)and e(s0;t); we then haveCov(E(s;t ? i);e(s0;t)) = Cov(c0i+2?(s;t);c02w(s0;t))= c0i+2Cov(?(s;t);w(s0;t))c2= c0i+2Wt(s;s0)c2: (4.33)Hence, the lag{i between{site cross{correlation is not zero if (Wt(s;s0))i+2;2 6=0: This result implies that the spatial correlation leakage occurs if, fori = 1;:::;q ? 1, Wt(s;s0)i+2;2 6= 0 for any two sites s and s0: It meansthat the DLM framework cannot avoid the spatial correlation leakage prob-lem for the AR(q) process if such conditions are satis?ed.4.6 Conclusion and DiscussionThe multivariate BSP approach can be used to overcome the spatial correla-tion leakage problem for the ?ltered residuals between the sites; that other-96Chapter 4. Multivariate Bayesian Spatial Prediction and Its Spatial Interpolationwise tends to incorporate a signi?cant temporal structure in those residuals.Unlike the DLM, the multivariate BSP approach does not assume sta-tionarity and isotropy for the underlying process. The multivariate BSP canincorporate the uncertainty about hyperparameters through fully hierarchi-cal Bayesian modelling. On the other hand, the DLM approach needs somehyperparameters to be ?xed to achieve similar objective. The multivari-ate BSP can provide ellipsoid credible regions for multiple pollutants andsimultaneous prediction; the DLM provides predictive intervals for singlepollutant only. The multivariate BSP is much more computationally e?-cient than the DLM. Unlike the latter it can interpolate hourly ground{levelozone over a large number of monitoring stations, an impossible task for theDLM. In other words, it is computationally scalable.97Chapter 5Multivariate BayesianSpatial Prediction and ItsTemporal PredictionNot only are people interested in interpolating responses at ungauged sitesgiven all the information at gauged sites, but also in temporally predictingthem at these gauged sites. In fact, 24{hour ahead ozone forecasting is nowcommonly done in many urban areas. The modeller may be asked: \Whatwill the ozone concentration level be at 2 p.m. tomorrow given all data until10 a.m. today?"; or, \What will the ozone levels be tomorrow if I have allthe measurements up to today?"The multivariate BSP approach can be adapted to answer such questionsby taking a 24{dimensional multivariate response variable formed from thedaily 24 hourly univariate response variables, treated like 24 \species" or\pollutants"; each entry therein represents one measurement for each of thesuccessive 24 hours. The multivariate model is needed due to the fact thatthe daily dependence structure of the 24{dimensional multivariate responsevariable invalidates the row independence assumption of the deAR'd resid-uals in the multivariate BSP approach.To apply the BSP, one must obtain estimates of the hyperscale spatialcovariance of the 24 pollutants given all the observed response variables.However, the sequence of the 24{dimensional vectors of hourly responsesdoes not meet the BSP's assumption of independence. To overcome thisdi?culty, subsequences of vectors are systematically extracted to containresponses separated by 24 hours. This subset of data is then used to predict98Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal Predictioneach one of the 24 response variables at the future day.More speci?cally, two sub{data matrices are formed by selecting these24{dimensional vectors, one sequence for odd days and a second for evendays. Model assumptions hold for each sequence, allowing hyperparam-eters at gauged sites to be estimated by the EM{algorithm via the En-viRo.stat software cited earlier. Each \pair" of estimated hyperparametersets can then be averaged to form approximate \estimates" of hyperparame-ters given all the data. Finally, the one{step ahead prediction is implementedat gauged sites given those estimates of hyperparameters and observed re-sponses. However, to get the 24{hour ahead forecast, obstacles remain thatwe now turn to.In fact, Section 5.1 demonstrates how to construct the above sequencesand the corresponding subdata matrices along with the multivariate settingsof the BSP model to predict each one of the 24 responses next day. Moreover,their predictive posterior distributions are developed and the correspondingpointwise predictive intervals at each gauged site, constructed. Section 5.2illustrates the h{step ahead prediction by the DLM approach. Section 5.3presents the one{day ahead temporal prediction by NAIVEcurrency1 approach. Sec-tion 5.4 shows the results and comparisons of the one{day ahead predictionby the three approaches at gauged sites.5.1 The Multivariate BSP ApproachSuppose Y [gmj ]t;i represents the unobserved ith response variable at day t;gauged site j, and Y [goj]t;i ; the observed response variable, for t = 1;:::;T,i = 1;:::;p, and j = 1;:::;g. Speci?cally in the Chicago's AQS database,T = 120; p = 24; and g = 14: Two cases are considered here to predictthe response variable at say (i) 11 P.M., or (ii) any single hour during the 0A.M. to 10 P.M. period on day 121, the last day in the observation sequence,whose data we set aside to be used for assessment.? Case 1: Predict the response variable at the last hour (i.e.,11 P.M.) of the 121st day.99Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal PredictionCall the corresponding multivariate BSP model \Model{1". One ofthe subdata matrices can be formed by fU(1)t g : 1 ? gp; where U(1)t =(Y [go1]2t?1;1;:::;Y [gog]2t?1;p); the other by V(1)t : 1?gp; where V(1)t = (Y [go1]2t;1 ;:::;Y [gog]2t;p ); for t = 1;:::;59: In other words, the observed response vari-ables from the 1st day to the 119th day are used as the \data". Afterthat, each of the subdata matrices is used with Model{1 to ?nd es-timates of the hyperparameters. Moreover, \approximate" estimatesof the hyperparameters given all the observed response variables atgauged sites are obtained by averaging each pair of the hyperparame-ters estimated by the BSP using these two matrices respectively.? Case 2: Predict the response variable at the (k ? 1)th hour ofthe 121st day, for k = 2;:::;p:Call the corresponding multivariate BSP model \Model{k". One ofthe subdata matrices can be formed by fU(k)t g : 1 ? gp; where U(k)t =(Y [go1]2t?1;k;:::;Y [go1]2t?1;p;Y [go1]2t;1 ;:::;Y [go1]2t;k?1;:::;Y [gog]2t?1;k;:::;Y [gog]2t?1;p;Y [gog]2t;1 ;:::;Y [gog]2t;k?1); the other by fV(k)t g; where V(k)t = (Y [go1]2t;k ;:::;Y [go1]2t;p ;Y [go1]2t+1;1;:::;Y [go1]2t+1;k?1;:::;Y [gog]2t;k ;:::;Y [gog]2t;p ;Y [gog]2t+1;1;:::;Y [gog]2t+1;k?1): One can obtain es-timates of the hyperparameters of Model{k in the same way as in Case1.The covariates, i.e., the weekdays, are constructed by starting from\Mondaycurrency1" for the fU(k)t g and \Tuesdaycurrency1" for the fV(k)t g, k = 1;:::;p;where the \currency1" has been added to signify that the beginning of the day hasbeen shifted successively by 0;1;:::;23 hours, according to which hour ofday 121 is to be predicted. This weekday e?ect is removed from the fU(k)t gsand fV(k)t gs, respectively, and the software, EnviRo.stat, used to implementthese 24 models.Let Y = (Y[u];Y[g]) where Y[g] = (Y[gm]0;Y[go]0)0 and Y[u] : n?up; withY[gm] : m ? gp and Y[go] : (n ? m) ? gp: The temporal prediction problemneeds the predictive posterior distribution of (Y[gm]jY[go];H), m being thetemporal unit to be predicted. Speci?cally, m = 1 in the one{day{aheadprediction at gauged sites.100Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal PredictionLetZ?[g]0 =? ?(1)?(2)!:? m ? gp(n ? m) ? gp!andIn + ZF?1Z0 =? A11 A12A21 A22!:? m ? m m ? (n ? m)(n ? m) ? m (n ? m) ? (n ? m)!:Given all the estimated hyperparameters H = fF;?0; ;currency11;?1;currency10;?00;H0;?0g; the marginal posterior distribution is given byY[gm]jY[go];H ? tm?gp(?(ujg);?(ujg) ?(ujg);?(ujg)); (5.1)where?(ujg) = ?(1) + A12A?122 (Y[go] ? ?(2)) (5.2)?(ujg) = ?1 ? gp + 1?1 ? gp + n ? m + 1(A11 ? A12A?122 A21) (5.3)?(ujg) = 1?1 ? gp + 1fcurrency11 + (Y[go] ? ?(2))0A?122 (Y[go] ? ?(2))g(5.4)?(ujg) = ?1 ? gp + n ? m + 1 (5.5)(Le & Zidek, 2006, p.160{161).To obtain the one{step{ahead temporal prediction at the gauged sites,one needs the predictive posterior distribution of the unobserved responsevariable of interest, that is, the last \species" or \pollutant" in the multi-variate response vector whose role is now being played by an hourly ozoneconcentration. Two di?erent predictive posterior distributions of the lastpollutant (i.e., the pth pollutant) are considered for Model{1 and Model{k,k 2 f2;:::;pg. These two cases follow:? For Model{1, Y[go] has the observed responses from day 1 to day 119,101Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal Predictionand Y[gm] can be written asY[gm] = ((Y [gm1 ]121;1;:::;Y [gmg ]121;p)0;:::;(Y [go1]120;1;:::;Y [gog]120;p)0)0;with Y[gm1:g]121;1:p : 1 ? gp; the unobserved response vector of day 121 andY[go1:g]120;1:p : 1?gp; the observed response vector of day 120. Hence m = 2and n = 121: The predictive posterior distribution of Y[gm] can beobtained by using (5.2){(5.5). To obtain the predictive distribution ofY[gm1:g]121;1:p given Y[go1:g]1:119;1:p and Y[go1:g]120;1:p, one can decompose ?(ujg); ?(ujg)and ?(ujg) as follows:?(ujg) =? ?1r?2r!:? 1 ? gp1 ? gp!and?(ujg)?(ujg) =? B11 B12B21 B22!:? 1 ? 1 1 ? 11 ? 1 1 ? 1!:Hence, the predictive posterior distribution of Y[gm1:g]121;1:p is given byY[gm1:g]121;1:pjY[go1:g]120;1:p;Y[go1:g]1:119;1:p;H ? t1?gp(?1r + B12B?122 (Y[go1:g]120;1:p ? ?2r);(B11 ? B12B?122 B21)?(ujg) + 1 ?(ujg)(Igp+??1(ujg)(Y[go1:g]120 ? ?2r)0B?122 (Y[go1:g]120??2r))): (5.6)Let E1 : gp ? g be a block diagonal matrix with the block ep : p ? 1;having pth diagonal element 1 and all others 0. At Gauged Site j 2f1;:::;gg, the predictive distribution of the pth unobserved responseY[gmj ]121;p; that is, Y[gm1:g]121;1:pE1ej, also has a t{distribution:Y[gmj ]121;pjY[go1:g]120;1:p;Y[go1:g]1:119;1:p;H ? t?(ujg)+1(?currency1E1ej;?currency1e0jE01?currency1E1ej);(5.7)102Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal Predictionwhere ?currency1 = ?1r+B12B?122 (Y[go1:g]120;1:p??2r); ?currency1 = (B11?B12B?122 B21)=(?(ujg)+1) and ?currency1 = ?(ujg)(Igp +??1(ujg)(Y[go1:g]120;1:p ??2r)0B?122 (Y[go1:g]120;1:p ??2r)):? For Model{k, k = 2;:::;p, Y[go] has the observed responses from day1 to day 119, while Y[gm] consists of k ? 1 unobserved responses andp ? k + 1 observed ones at each gauged site. To predict the responsesone{day{ahead at ungauged sites in the ?eld, one has m = 1 andn = 120 in (5.2){(5.5). Let E2j : gp ? p be the matrix obtained bystacking g (p ? p) matrices, in which the jth matrix is Ip and othersare 0. At Gauged Site j 2 f1;:::;gg, we haveY[gm]E2jjY[go];H ? t1?p(?(ujg)E2j;?(ujg) E02j?(ujg)E2j;?(ujg)):(5.8)Notice that Y[gm]E2j is (Y [goj]n?1;k;:::;Y[goj]n?1;p;Y[gmj ]n;1 ;:::;Y[gmj ]n;k?1): Let E3 :p ? p be such a matrix that its entries in the ith row and (p ? i + 1)thcolumn are 1 while all others, 0, for i = 1;:::;p. Multiplying Y[gm]E2jby E3 reverses the order of the pollutants such that the response ofthe last hour locating at the ?rst position of the new response vector,the response of the second last hour locating at the second positionof the new response vector, and so on. In other words, we obtain thefollowing new response vector: (Y [gmj ]n;k?1;:::;Y[gmj ]n;1 ;Y[goj]n?1;p;:::;Y[goj]n?1;k):That new response has the following multivariate t{distribution:Y[gm]E2jE3jY[go];H ? t1?p(?j;?(ujg) ?j;?(ujg)); (5.9)where ?j = ?(ujg)E2jE3 and ?j = E03E02j?(ujg)E2jE3: DecomposeY[gm]E2jE3; ?j and ?j as follows:Y[gm]E2jE3 = (T1c;T2c) : (1 ? (k ? 1);1 ? (p ? k + 1));?j = (?1c;?2c) : (1 ? (k ? 1);1 ? (p ? k + 1));103Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal Predictionand?j =? C11 C12C21 C22!:? (k ? 1) ? (k ? 1) (k ? 1) ? (p ? k + 1)(p ? k + 1) ? (k ? 1) (p ? k + 1) ? (p ? k + 1)!:Hence the unobserved response variable T1c is also t{distributed:T1cjT2c;Y[go];H ? t1?(k?1)(?1c + (T2c ? ?2c)C?122 C21; ?(ujg)?(ujg) + p ? k + 1??(ujg)f1 + (?(ujg)?(ujg))?1(T2c ? ?2c)C?122?(T2c ? ?2c)0g (C11 ? C12C?122 C21);?(ujg) + p ? k + 1): (5.10)The last \pollutant", that is, the ?rst entry of T1c; can be predictedby multiplying T1c with a (k ? 1){dimensional vector E4; in which its?rst entry is 1 and 0 otherwise. Consequently, the predictive posteriordistribution of Y [gmj ]n;k?1 is given as follows:Y [gmj ]n;k?1jY[go];Y[go]n?1;k:p;H ? t?(ujg)+p?k+1((?1c + (T2c ? ?2c)C?122 C21)E4;?(ujg)?(ujg) + p ? k + 1?(ujg)f1+(?(ujg)?(ujg))?1(T2c ? ?2c)C?122?(T2c ? ?2c)0gE04(C11 ? C12C?122 C21)E4):(5.11)The corresponding predictive variance of the (k?1)th hour of the 121stday at Gauged Site j is given by:Var(Y [gmj ]n;k?1)jY[go];Y[go]n?1;k:p;H) =?(ujg)?(ujg) + p ? k ? 1?(ujg)f1+(?(ujg)?(ujg))?1(T2c ? ?2c)C?122?(T2c ? ?2c)0gE04(C11 ? C12C?122C21)E4:104Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal PredictionIt is straightforward to construct the 95% pointwise predictive intervalsat the (k ? 1)th hour of the 121st day at each gauged site from (5.11).5.2 The DLM ApproachAn alternative approach, DLM, can also be used for temporal predictionand indeed would seem the obvious choice, it being an amalgamation ofstate{space time series models. For the measurement and state equations ofthe DLMYt = F0txt + ?t ?t ? N(0;?2exp(?V=?))xt = xt?1 + !t !t ? N(0;?2W)with initial information: x0jD0 ? N(m0;?20C0); one can obtain the pos-terior distribution of the state parameters at the last known time point,T; that is, xT jy1:T ;? ? N(mT ;?2CT ); using the Kalman ?lter, a smooth-ing method and the Metropolis{within{Gibbs sampling algorithm (detailsin Chapter 2).Given the distribution of the state parameters at the last time point,T; the observed responses until time T, y1:T ; and the model parameters,? = f?;?2;a1;a2g, the h{step ahead prediction is given byyT+hjy1:T ;? ? N(F0t+hmT ;?2fF0t+h(CT + hW)Ft+h + exp(?V=?)g);(5.12)for h 2 N: Hence, T = 2880 and h = 1;:::;24 for the one{day aheadprediction. For any ?xed h; the predictive response, yT+h; can also beobtained by the MCMC method. More speci?cally, at iteration j; supposewe have updated the model parameters ?(j); ?2(j); a(j)1 and a(j)2 using theFFBS algorithm. That is, one hasxT jy1:T ;?(j) ? N(mT (j);?2(j)CT (j)):Then, the predictive response at iteration j; yT+h(j); can be drawn from105Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal Prediction(5.12), that is,yT+hjy1:T ;?(j) ? N(Ft+h(j)0mT (j);?2(j)fFt+h(j)0(CT (j) + hW(j))Ft+h(j)+exp(?V=?(j))g):Consequently, the predictive responses are obtained by the sample means offyT+h(j) : j = 1;:::;Jg (J = 500;h = 1;:::;24). The empirical predictiveintervals at the 95% nominal level are obtained by the quantiles of thesesamples.5.3 NAIVEcurrency1 ApproachThe other alternative approach, NAIVEcurrency1, helps us assess the one{day aheadprediction performance of the multivariate BSP and DLM approaches. Thatapproach models the response variable by the grand mean, day e?ect andhour e?ect. To be more speci?c, the response variable used in this approachis the vectorized square{root ozone levels at each of the gauged sites. Usingthe same notation as above, at each gauged site j 2 f1;:::;gg, the responsevariable is Yj1:n = (Y [goj]1;1 ;:::;Y[goj]1;p ;:::;Y[goj]n;1 ;:::;Y[goj ]n;p )0 : 1?np; for n = 120and p = 24. The design matrix X contains three columns: the ?rst columnconsists of 1s, building in long{term linear trend; the second for the days,capturing day e?ect of the week, that is, Monday, Tuesday, etc.; and the lastfor the hours, a substitute for day e?ect but giving more reasonable results.In other words, its \design matrix" can be written asX =0BBBBBBBBBBBBBBB@1 1 1... ... ...1 1 p... ... ...1 n 1... ... ...1 n p1CCCCCCCCCCCCCCCA: (np) ? 3:106Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal PredictionThen our model is given by Y = X? + "; where " is the mean 0 Gaussianprocess to preserve its great simplicity. The coe?cient vector at Gauged Sitej; ?j; is thus estimated by the least squared estimator, cj = (X0X)?1X0Yj1:n:The one{day ahead prediction at Gauged Site j is thus given by bjn+1 =Xn+1 bj; where Xn+1 : p ? 3 as follows:Xn+1 =0BBB@1 n + 1 1... ... ...1 n + 1 p1CCCA:Next the multivariate BSP approach is compared to the DLM and NAIVEcurrency1for one{day{ahead prediction at gauged sites in the Chicago area.5.4 Results and ComparisonsFigures 5.1{ 5.14 plot the temporal predictions of the square{root of ozonelevels on the 121st day by the multivariate BSP, univariate DLM and NAIVEcurrency1approaches, the 95% pointwise predictive intervals for that day by the multi-variate BSP and DLM approaches, and the observations from 114th to 121stdays, at each of these 14 gauged sites. The multivariate BSP is much moreaccurate than either the DLM or NAIVEcurrency1 approaches. In fact, its predictiveperformance is rather good at most gauged sites.Table 5.1 presents the mean square predictive error (MSPE) of the pre-dictive responses on the 121st day at each one of the 14 gauged sites usingthe three approaches. At Gauged Site j; the MSPE of the prediction at hourh can be computed by:MSPEj =24Xh=1(PREDjh ? OBSjh)2;where PREDjh is the predictive response at hour h of the 121st day andOBSjh; the observed response at the same hour of the 121st day, at GaugedSite j. The DLM has the poorest MSPE over all the gauged sites comparedwith NAIVEcurrency1 and the BSP. The NAIVEcurrency1 approach is slightly better than107Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal PredictionGauged Site MSPE(NAIVEcurrency1) MSPE(DLM) MSPE(BSP)1 0.52 9.38 0.502 0.96 7.38 0.403 1.93 4.66 0.404 1.59 4.24 0.495 2.81 2.60 3.006 0.68 2.31 0.747 0.51 4.19 0.228 1.44 7.48 1.019 1.60 7.01 0.5910 0.44 5.50 0.5011 0.83 9.25 0.4912 0.75 3.41 0.4513 1.71 12.27 0.7314 2.44 5.61 1.09Table 5.1: The mean square predictive error (MSPE) of the one{day ahead predic-tion at the 14 gauged sites by the multivariate BSP, DLM, and NAIVEcurrency1 approaches.The BSP dominates in all but 3 cases where it essentially ties with one or anotherof its competitors.the BSP at Gauged Sites 5, 6, and 10. The BSP has the smallest MSPEacross most gauged sites among these three.Figure 5.15 plots the length of the 95% pointwise predictive intervals bythe BSP at the 24 hours of the 121st day. Starting from the middle hoursof that day, i.e., 9 A.M., the predictive error bands tend to increase afterthat until the last hour, 11 P.M., re?ecting the increasing uncertainties dueto the fact that fewer responses are observed as time increases.Figure 5.16 plots the length of the empirical 95% predictive intervalsby the DLM at the 24 hours of the 121st day. These lengths are close toeach other but have the wiggly periodic behaviour across all gauges sites, acharacteristic previously observed in Chapter 2. Though these lengths arevery close to each other, the DLM actually underestimates the predictivevariabilities at the gauged sites as seen in Figure 5.17 which plots the cover-age probabilities of the DLM and BSP approaches, and also shows a slightlyoverestimated predictive variance for the BSP, at the 95% nominal level.108Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal PredictionMoreover, the results in Section 5.1 can be straightforward generalizedto an arbitrary time point, not limiting to the case of 121 days of responsevectors in this chapter.Therefore, we conclude that the multivariate BSP approach is more ac-curate on the one{day ahead prediction at the gauged sites in the Chicagoarea AQS 2000 database than both the NAIVE and DLM approaches.Figure 5.1: The observed square{root of ozone concentrations (pppb) from day114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVEcurrency1approaches, and the 95% pointwise predictive intervals using the multivariate BSPand DLM approaches at Gauged Site 1.5.5 Conclusion and DiscussionThe temporal prediction of the ground{level ozone concentrations in theChicago's hourly ozone ?eld shows the success of the adjusted multivariateBSP approach, comparing with two others: the DLM and NAIVEcurrency1. Thisadjusted approach can be generalized to any time point other than 121 in109Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal PredictionFigure 5.2: The observed square{root of ozone concentrations (pppb) from day114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVEcurrency1approaches, and the 95% pointwise predictive intervals using the multivariate BSPand DLM approaches at Gauged Site 2.the above analysis. That extension is straightforward and will be given in afuture work. Moreover, forecasting more hours will be feasible when moredata are available. This approach could provide a whole new approach tounivariate time series.The potential problem with this adjusted approach is due to the loss ofthe information when only a subset of the whole database is used. Furtherextensions of the correlated response vector in the multivariate BSP mod-elling need to be explored. One possible solution is to develop the dynamicversion of the multivariate BSP, the topic for the next chapters.110Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal PredictionFigure 5.3: The observed square{root of ozone concentrations (pppb) from day114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVEcurrency1approaches, and the 95% pointwise predictive intervals using the multivariate BSPand DLM approaches at Gauged Site 3.111Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal PredictionFigure 5.4: The observed square{root of ozone concentrations (pppb) from day114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVEcurrency1approaches, and the 95% pointwise predictive intervals using the multivariate BSPand DLM approaches at Gauged Site 4.112Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal PredictionFigure 5.5: The observed square{root of ozone concentrations (pppb) from day114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVEcurrency1approaches, and the 95% pointwise predictive intervals using the multivariate BSPand DLM approaches at Gauged Site 5.113Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal PredictionFigure 5.6: The observed square{root of ozone concentrations (pppb) from day114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVEcurrency1approaches, and the 95% pointwise predictive intervals using the multivariate BSPand DLM approaches at Gauged Site 6.114Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal PredictionFigure 5.7: The observed square{root of ozone concentrations (pppb) from day114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVEcurrency1approaches, and the 95% pointwise predictive intervals using the multivariate BSPand DLM approaches at Gauged Site 7.115Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal PredictionFigure 5.8: The observed square{root of ozone concentrations (pppb) from day114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVEcurrency1approaches, and the 95% pointwise predictive intervals using the multivariate BSPand DLM approaches at Gauged Site 8.116Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal PredictionFigure 5.9: The observed square{root of ozone concentrations (pppb) from day114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVEcurrency1approaches, and the 95% pointwise predictive intervals using the multivariate BSPand DLM approaches at Gauged Site 9.117Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal PredictionFigure 5.10: The observed square{root of ozone concentrations (pppb) from day114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVEcurrency1approaches, and the 95% pointwise predictive intervals using the multivariate BSPand DLM approaches at Gauged Site 10.118Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal PredictionFigure 5.11: The observed square{root of ozone concentrations (pppb) from day114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVEcurrency1approaches, and the 95% pointwise predictive intervals using the multivariate BSPand DLM approaches at Gauged Site 11.119Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal PredictionFigure 5.12: The observed square{root of ozone concentrations (pppb) from day114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVEcurrency1approaches, and the 95% pointwise predictive intervals using the multivariate BSPand DLM approaches at Gauged Site 12.120Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal PredictionFigure 5.13: The observed square{root of ozone concentrations (pppb) from day114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVEcurrency1approaches, and the 95% pointwise predictive intervals using the multivariate BSPand DLM approaches at Gauged Site 13.121Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal PredictionFigure 5.14: The observed square{root of ozone concentrations (pppb) from day114 to day 121, the predicted values using the multivariate BSP, DLM and NAIVEcurrency1approaches, and the 95% pointwise predictive intervals using the multivariate BSPand DLM approaches at Gauged Site 14.122Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal PredictionFigure 5.15: The width of the 95% pointwise predictive intervals of the one{dayahead prediction at the 14 gauged sites using the multivariate BSP approach.123Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal PredictionFigure 5.16: The width of the 95% pointwise predictive intervals of the one{dayahead prediction at the 14 gauged sites using the DLM approach.124Chapter 5. Multivariate Bayesian Spatial Prediction and Its Temporal PredictionFigure 5.17: Boxplots of the coverage probabilities using the DLM and multivariateBSP approaches at the 95% nominal level.125Chapter 6Bayesian EmpiricalOrthogonal Function MethodThis chapter contains some theoretical results relevant to the extension ofthe BSP approach in next chapter. The idea is straightforward in thatwe partition the spatial variation into long{term variation, and short{termvariation including measurement errors. We obtain what we call \Bayesian(empirical orthogonal functions) EOFs" in this chapter and extend resultsto incorporate the very ?exible GIW prior to model the staircase patterneddata in Chapters 4{5.Section 6.1 introduces empirical orthogonal functions (EOFs). Section6.2 describes the classical EOFs obtained from spatio{temporal data anddiscusses potential di?culties with this method. Section 6.3 illustrates analternative, corrected EOFs, for a known temporal covariance in the separa-ble space{time covariance structure. Section 6.4 proposes a Bayesian EOFmethod for a general Bayesian hierarchical model. Section 6.5 extends thoseresults to incorporate the GIW prior. Simulation Study 1 is presented inSections 6.2{6.3. Section 6.6 compares those three types of EOFs by Sim-ulation Study 2. Section 6.7 summarizes the results and states conclusionsfor this chapter.6.1 IntroductionOver the years, empirical orthogonal functions (EOFs) have been used ex-tensively for identifying spatial patterns for environmental processes. EOFsimplicitly assume that the samples consist of independent replicates of thespatial ?elds. These have been widely used in astronomy, physics, clima-126Chapter 6. Bayesian Empirical Orthogonal Function Methodtology, and oceanography (see Hannachi et al. (2007) for a review on thissubject in atmospheric science).EOFs deal with data anomalies, that is, deviations of the observed valuesfrom their temporal mean, instead of the responses. These anomalies areobtained by subtracting the time averages of the responses from the localresponses at each time point and site. More precisely, let Zi(t) represent theresponse at site si and at time t; for i = 1;:::;p and t = 1;:::;n: Then theanomaly at site si and time t is de?ned asYi(t) = Zi(t) ? 1nnXt=1Zi(t); (6.1)for i = 1;:::;p and t = 1;:::;n:Wikle (2002) describes EOFs as a commonly used spatio{temporal methodin Climatology. The method is widely used in spatial process analysisto detect spatial patterns in a given ?eld. For a continuous spatial pro-cess observed at discrete time points, the EOFs represent components of aKarhunen{Lo? e (KL) expansion; while in the discrete process, they are thecomponents of a Principal Component Analysis (PCA).EOF and PCA di?er from each other. EOF can be applied on any irreg-ularly located monitoring stations to ?nd both the time series and spatialpatterns. PCA can also be applied on any sites but only obtain the timeseries or spatial patterns. The spatial pattern represent the mode of vari-ability. The time series patterns, also called as the time series amplitudes,re?ect or characterize how the spatial patterns oscillate over time. The timeseries amplitudes are often referred as the expansion coe?cients or principlecomponents. In this thesis, we call the time series amplitudes as expansioncoe?cients, in order to distinguish them from the principal components, theterm often used in PCA. Although both EOF and PCA can be applied onany site, EOF might be inaccurate if it is constructed given unevenly locatedsites.In Climatology literature, some authors de?ne EOF and PCA di?erentlybut some not. We can use in?nitely many orthonormal basis vectors to127Chapter 6. Bayesian Empirical Orthogonal Function Methodconstruct the EOF. However, the eigenvector basis is the only one that allowsthe expansion coe?cients having the PCA property, that is, the expansioncoe?cients of the EOFs are uncorrelated (Bj?rnsson and Venegas, 1997).The KL expansion represents a stochastic process by a linear combina-tion of an in?nite number of orthogonal functions. In the KL expansion,the coe?cients for the orthogonal functions are random variables. The or-thogonal functions are the eigenfunctions of the covariance function for thisprocess. For the anomalies data, suppose Yt = (Yt(s1);:::;Yt(sp))0 : p ? 1represents the anomalies vector at t across all the spatial sites in the region.Let Y = (Y1;:::;Yn) : p ? n be the anomaly matrix.De?nition 6.1.1 (KL expansion) Consider the spatio{temporal processfZ(s;t) : s 2 D;t = 1;:::;ng; where s represents for the spatial locationin the domain D = fs1;:::;spg that of interest while t represents a timepoint. Assume E(Z(s;t)) = 0 and Cov(Z(si;t);Z(sj;t)) = C0(si;sj) for allt: The KL expansion represents the covariance function as an in?nite linearcombination of orthogonal functions, that is,C0(u;v) =1Xj=1?j?j(u)?j(v); (6.2)where f?j : j = 1;:::;1g are the eigenvalues and f?j(:) : j = 1;:::;1g;the orthogonal eigenfunctions.For the complete set of orthonormal basis functions f?j(:) : j = 1;:::;1g;the response can be represented as follows:Z(s;t) =1Xj=1aj(t)?j(s); (6.3)where Var(aj(t)) > Var(aj+1(t)) for j = 1;2;:::; and Cov(ai(t);aj(t)) = 0for i 6= j: In discrete case, the KL expansion is simply obtained through thePCA. Then f?j(s) : s 2 Dg is called the jth EOF and faj(t) : t = 1;:::;ng;the expansion coe?cients corresponding to the jth EOF. In other words,the KL expansion allows one to represent the process by an in?nite set of128Chapter 6. Bayesian Empirical Orthogonal Function Methodseparable orthonormal basis functions such that these orthonormal basisfunctions are optimal in minimizing the mean square variance.We next introduce the term of classical EOFs.6.2 Classical EOFsTo ?nd the principal spatial patterns in a spatial{temporal process, one canuse the classical EOFs 6. The spatial covariance matrix can be estimatedusing the samples of observed anomalies.In the geographical region of interest, zt(s) represents the univariateresponse variable at site s and time t; for s 2 fs1;:::;spg and t = 1;:::;n:Let Zt = (zt(s1);:::;zt(sp))0 : p?1 be the response vector at time t: Assumethe matrix{variate response variable Z = (Z1;:::;Zn) : p ? n that follows amatrix{normal distribution with a separable covariance structure in spaceand time, that is, Z ? Np?n(0;?S ?T ); where ?S : p ? p represents thespatial covariance matrix and ?T : n ? n; the temporal covariance matrix.This separable covariance structure implies no space{time interaction inspatial{temporal processes.Based on these assumptions, the spatial covariance matrix can be esti-mated by ^S = 1nYY0; where Y is the anomaly matrix de?ned in (6.1). Thespectral decomposition theorem implies the existence of a unique decompo-sition for ^S such that ^S = ^ ^2 ^ 0; where ^2 = diagf^21;:::; ^2pg with^?1 > ::: > ^?p > 0 being the eigenvalues for ZZ0; each column of ^? is theeigenvector corresponding to the associated eigenvalue. Hence, we represent?S as?S = ?currency12?0= (?currency1)(?currency1)0= ??0; (6.4)a form of the KL expansion. We then obtain the classical EOFs from ^ =6\Classical" EOFs are usual EOFs used in literature in which it ignores temporalcomponents.129Chapter 6. Bayesian Empirical Orthogonal Function Method^?^currency1:However, the classical EOFs do not e?ciently estimate the populationlevel counter parts in ?S without temporal independence, an unrealistic as-sumption in most cases. Moreover, this unrealistic assumption might providemisleading information about the spatial pattern obtained by the classicalEOFs. In the following section, we will demonstrate this de?ciency in clas-sical EOFs through some simulation studies. Moreover, a potential problemarises about how one can avoid the negative e?ects caused by correlatedsamples, that is, temporally correlated sequences.Next in Simulation Study 1, we show severe problems that classical EOFscan cause in a separable space{time ?eld with known EOF matrix. To dothis, we need to construct the orthogonal matrix O in the spectral decompo-sition theorem. Gram{Schmidt's process is a procedure to obtain an orthog-onal basis. The known EOFs can be constructed using the Gram{Schmidt'sprocess to have a speci?ed diagonal matrix currency1 and an orthonormal basisfunction ?: This construction starts with any set of given orthogonal vec-tors, saying, O1;:::;Ok; for 1 ? k ? n ? 1; and k 2 Z: Lemma 6.2.1 givesthe details.Lemma 6.2.1 Given the orthogonal vectors G1;:::;Gk; with Gj : p?1 forj = 1;:::;k; we obtain an orthogonal matrix G = (G1;:::;Gp) : p ? p byrepeating steps (i){(iii) for j = k + 1;:::;p :(i) Generate a realization yj from Np(0;Ip):(ii) Fit the linear regression model:yj = A0 +j?1Xi=1AiGi + "i;and obtain the estimated coe?cients f ^i : i = 0;:::;j ? 1g:(iii) Denote Gj to be the ?tted residuals yj ? ^0 ? Pj?1i=1 ^iGi such thatGj ? fG1;:::;Gj?1g:130Chapter 6. Bayesian Empirical Orthogonal Function MethodThis lemma gives us the orthonormal basis function by normalizing thegenerated orthogonal vectors G1;:::;Gp: We hereafter use O to representthe orthonormal basis matrix using the Gram{Schmidt type expansion.Since ?S = Ocurrency12O0; we then obtain the spatial covariance function ?Susing the above constructed EOFs.Simulation study 1The objective of this study is to compare three di?erent types of EOFsfor a separable state{space process with known EOFs and known temporalcovariance matrix.Simulated dataIn this study, the spatial region is 18 ? 18 grid locations, grid cell edgesbeing taken to be 6 km. The regular lattice{base grids have 324 grid lo-cations. The spatial{temporal process over this lattice is assumed to be amean 0 process with a separable spatio{temporal covariance structure. Af-ter constructing the EOFs matrix using the Gram{Schmidt's method witha known diagonal matrix, the spatial covariance function is formed by theirproducts as presented by the spectral decomposition theorem. The temporalcovariance function is simply a causal and invertible AR(1) process with thevariance parameter ?2v and AR coe?cient, ?: The autocorrelation function(ACF) for the AR(1) process is given by?(h) =8<:?2v1??2 h = 0?2v1??2 ?jjhjj h ? 1:In this example, we choose two orthogonal vectors G(1) and G(2) to con-struct the EOFs using the Gram{Schmidt method. More speci?cally, forp = 4q;q = 1;2;:::; G(i) is a p{dimensional vector in which its jth entry,G(i)j , is 1 for j 2 [1; p2]; -1 for j 2 [p2 + 1;p]; and 0 otherwise. To motivatedevelopment of our simulation model, we think of surface temperature asour response of interest. This response would be strongly determined by lat-131Chapter 6. Bayesian Empirical Orthogonal Function Methoditude, high in summer in the northern hemisphere while low in the southern.We choose G1 to represent that spatial pattern. We then choose G2 to rep-resent a High{Low elevation spatial pattern such that G2 is a p{dimensionalvector by repeating (1pp=2;?1pp=2) 7 for pp times. The constructed orthog-onal matrix given by these two orthogonal vectors, denoted by G; can beobtained by applying the Gram{Schmidt method in Lemma 6.2.1.The ?rst four diagonal entries in the speci?ed diagonal matrix currency12 areassumed to be 40, 20, 15, and 10. The remaining diagonal entries can beconstructed by a decreasing sequence such that their summation is 15 andthe minimum or last entry, around 0:023:Figures 6.1 and 6.2 plot the contours for simulated data at t = 5 andt = 28 in the region we study, respectively. These graphs show there may bethe North{South and High{Low spatial patterns in this ?eld, respectively.Moreover, this spatio{temporal ?eld varies spatially and temporally.To check the temporal variations in this simulated database, we ran-domly select four grid locations and plot their histograms, ACFs and PACFsin Figure 6.3. These graphs show a very strong autocorrelation in the timeseries data of each of the selected four sites, as expected from the strongAR(1) temporal process in the data.Results and comparisonsWe ?rst compute the classical EOFs and compare them with the true EOFs.Figure 6.4 plots the contours for the true EOFs. Clearly the ?rst spatialpattern is North{South spatial pattern and the second, High{Low elevationspatial pattern, the principle determinants of surface temperature. Figure6.5 plots the contours for the classical EOFs. Although it seems that classicalEOFs can also capture the ?rst two types of spatial patterns, obviously theestimates for the contours are far from the truth. For example, the valuesfor the ?rst true EOF in the northern region is close to be 0:35; while forthe ?rst classical EOF, close to be 0:6 at the northeastern region and 1:0;the northwestern region. Same sort of things happen for the second true7Here 1pp=2 represents the pp=2{dimensional vector of 1, and so as ?1pp=2:132Chapter 6. Bayesian Empirical Orthogonal Function MethodFigure 6.1: Contour plot for the simulated data at day t = 5 in the 18 ? 18 gridlocations. The AR coe?cient in the simulated data is set to be ? = 0:9: (White=-4.0; Black=4.0.)and classical EOFs. We conclude that the classical EOFs fail to capture thetrue spatial patterns in this ?eld due to the ?eld's high autocorrelation.The next section provides an alternative when the temporal covariancematrix, ?T ; is known, a method leading to what we call \corrected" EOFs.6.3 Corrected EOFsThis section presents an alternative to the classical EOFs of Section 6.2,given a known temporal covariance matrix. We call the EOFs obtainedcorrected EOFs, meaning that the temporal dependence structure has beenincorporated. Although assuming the temporal covariance ?T to be knownis unrealistic, the analysis exposes the potentially serious problem that the133Chapter 6. Bayesian Empirical Orthogonal Function MethodFigure 6.2: Contour plot for the simulated data at day t = 28 in the 18 ? 18grid locations. The AR coe?cient in the simulated data is set to be ? = 0:9:(White=-4.0; Black=4.0.)classical EOFs can cause.Given the matrix{variate normal distribution for Y ? Np?n(0;?S ?T );we have Ycurrency1 = Y??1=2T ? Np?n(0;?S In); by a standard property of thematrix{variate normal distribution. It is then straightforward to estimate?S using 1n Pnt=1 Ycurrency1t (Ycurrency1t )0 = 1nYcurrency1(Ycurrency1)0: In other words,^?S = 1nY??1T Y0; (6.5)given the non{singular covariance matrix ?T : The corrected EOFs are thenconstructed using the spectral decomposition theorem, same as the classicalEOFs. To obtain unique EOFs, we restrict the eigenvectors to form theorthonormal matrix that has positive elements in its ?rst row.We now revisit the example in Section 6.2 but obtain the corrected EOFs134Chapter 6. Bayesian Empirical Orthogonal Function MethodFigure 6.3: Histogram (?rst row), ACFs (second row) and PACFs (third row) forthe simulated data at four randomly selected sites in the region. The AR coe?cientin the simulated data is set to be ? = 0:9:and compare them with the classical and true EOFs to see if the correctedEOFs improve our knowledge about the principal spatial patterns in thissimulated database.Simulation study 1 (revisited)Figures 6.4{6.5 demonstrate the ?rst two EOFs for true, classical and cor-rected, respectively. The ?rst two true EOFs seem to have an obvious spatialpattern, seen in the corrected EOFs but not in the classical EOFs. The re-sults demonstrate the potential danger in using the classical EOFs when thespatio{temporal data are highly temporally correlated as well.Table 6.1 represents the percentage of spatial variation for the ?rst10 EOFs by the true, classical, and corrected methods. The corrected135Chapter 6. Bayesian Empirical Orthogonal Function MethodFigure 6.4: Contour plots for the ?rst two true EOF vectors: (a) { 1st EOF; and(b) { 2nd EOF. (White=-1.7; Black=1.4.)EOFs are much closer to the true values than the classical ones. Table6.2 demonstrates the matrix discrepancies8 for the classical and correctedEOFs against the true EOFs. Assuming the temporal covariance functionis known, the corrected EOFs are closer to the true EOFs than the classi-cal ones. The matrix discrepancy between the true and corrected EOFs ismuch smaller than that between the true and classical one. Moreover, thecorrected EOFs better characterize the ?rst two spatial types in this ?eldthan the classical EOFs do.More generally, for an unknown temporal covariance structure, it is im-possible to use the corrected EOFs to identify the principal spatial patterns8The matrix distance between two matrices, A and B, is de?ned through the conceptof the inner product of these two matrices, that is, jjA ? Bjj = tr(AB0). The matrixdiscrepancy between A and B is de?ned asqjjA?BjjjjBjj :136Chapter 6. Bayesian Empirical Orthogonal Function MethodFigure 6.5: Contour plots for the ?rst two classical EOF vectors: (a) { 1st EOF;and (b) { 2nd EOF. (White=-1.7; Black=1.4.)in spatio{temporal ?elds. One may have interest to know whether we canobtain an accurate estimate of the spatial covariance matrix based on theobservations so that the EOFs can better represent the principal spatialpatterns. Moreover, how can one take into account of uncertainties in theestimates?To help answer those questions, we next propose a new Bayesian versionof EOFs by adding a prior distribution for the spatial covariance matrix.6.4 Bayesian EOFsIn practice, the corrected EOFs cannot be computed because of the unknowntemporal dependence structure for the spatial{temporal matrix{variate re-sponse Z : p ? n; where p is the total number of spatial locations in the do-137Chapter 6. Bayesian Empirical Orthogonal Function MethodFigure 6.6: Contour plots for the ?rst two corrected EOF vectors: (a) { 1st EOF;and (b) { 2nd EOF. (White=-1.7; Black=1.4.)main that is of interest and n; the total number of time points. Henceforth,Bayesian EOFs are proposed in this section as an alternative to classicalEOFs.Bayesian EOFs: the underlying modelThe Bayesian EOFs model we adopt here is given byZ = ? 10n + ?X + ?; ? ? Np?n(0;?S ??) (6.6)andX j? ? Np?n(0;Ip ?T ); (6.7)where we assume both spatial and temporal covariance matrices ?S and?T have full rank, that is, nonsingular. Moreover, we deal with two cases138Chapter 6. Bayesian Empirical Orthogonal Function MethodIndex of EOFs True Classical Corrected1 40.000 40.833 39.5072 20.000 20.230 20.1783 15.000 15.890 15.0724 10.000 9.165 10.2675 0.023 0.544 0.1896 0.024 0.519 0.1837 0.024 0.481 0.1788 0.024 0.434 0.1789 0.024 0.431 0.17310 0.024 0.405 0.171Table 6.1: Percentage of spatial variation (%) for the ?rst 10 EOFs by the true,classical, and corrected methods (? = 0:9).Matrix discrepanciesClassical v.s. True 4.347Corrected v.s. True 0.137Table 6.2: Matrix discrepancies for the classical and corrected EOFs against thetrue EOFs.for the temporal covariance matrix ?T : (i) It is randomly distributed withan inverted Wishart distribution; and (ii) It has a semi{parametric form,that is, ?T = ?2?(:;?); with known temporal correlation form for ?(:;:) butunknown parameters ?2 and ?: We assume that the temporal correlationmatrix, ?(:;?); decreases as the di?erence between two time points increases,such that ?(ti ? tj;?) ! 1 as ti ? tj ! 0: In practice, we assume ?2 =1 due to the non{identi?able property for the Kronecker product. Thesemi{parametric form for ?(:;?) is then estimable using the MCMC method.For simplicity, we assume no small{scale spatial variation and measurementerrors, that is, no ? term in (6.6), deferring such re?nements to future work.In (6.6), ? can be treated as a constant matrix for a known ?S by theKarhunen{Loeve (KL) expansion. Given the positive de?nite spatial covari-ance function ?S : p ? p; we have the unitary orthogonal matrix O : p ? p;with its ?rst row being positive, and diagonal matrix currency1 = diag f?1;:::;?pg :p ? p; with ?2s being the eigenvalues for ?S and ?1 > ::: > ?p > 0; such139Chapter 6. Bayesian Empirical Orthogonal Function Methodthat??1S = Ocurrency1?2O0:Hence, we have the Karhunen{Loeve (KL) expansion as follows:??1S = (??11 O1;:::;??1p Op)(??11 O1;:::;??1p Op)0= (?1;:::;?p)(?1;:::;?p)0=pXj=1?j?j0= ??0;where ?j = ??1j Oj; for j = 1;:::;p; and ? = (?1;:::;?p) : p ? p:However, when ?S is a random matrix, the EOFs represented by thecolumns of ? are also random. Moreover, the orthogonal matrix can betreated as either constant or random in the KL expansion. The distributionfor the random orthogonal matrix has been obtained by James (1954a), as aninvariant uniform distribution on the Stiefel manifold (see De?nition 6.4.2).Moreover, he also obtained the independent distribution of the diagonalentries in currency12 in the KL expansion.Then a Bayesian version of EOFs can be obtained either by the MCMCmethod or through the empirical Bayes approach. The ?rst level of a hier-archical model places no restriction on the form of our Bayesian EOFs, andso it is a nonparametric approach. When the prior for the purely spatialcovariance matrix, ?S; has been determined, ? can be obtained using theKL expansion and Lemma 6.4.1 below.Therefore, the Bayesian EOF model (6.6){(6.7) is completed by specify-ing prior distributions for the model parameters: p(?); p(?T ) for Case (i) [orp(?) for Case (ii)] and p(?S): Lacking speci?c prior information, we assumep(?) / 1 and ??1S ? Wp(?S;?S) with ?S and ?S being hyperparameters.For Case (i), we assume ??1T ? Wn(?T ;?T ): The collection of hyper-parameters can be denoted by H1 = f?S;?T ;?S;?T g: The joint posteriordistribution we are interested in is given by p(?;?T ;?SjY):140Chapter 6. Bayesian Empirical Orthogonal Function MethodFor Case (ii), we assume ? ? Nk(?0;?0): The collection of hyperpa-rameters here can be denoted by H2 = f?S;?S;?0;?0g: The joint posteriordistribution we need for inference is then given by p(?;?;?SjY):In summary, we consider the following two Bayesian models to obtainthe Bayesian EOFs for each of the two cases under consideration in whichwe assume no measurement error (?s):(i) The Bayesian model is given byZ = ? 10n + ?X (6.8)X ? Np?n(0;Ip ?T ) (6.9)? = Ocurrency1?1; (6.10)where ??1S = Ocurrency1?2O0 by the above KL expansion. And the priors forthe model parameters are assumed mutually independent and given asfollows:p(?) / 1 (6.11)??1S ? Wp(?S;?S) (6.12)??1T ? Wn(?T ;?T ): (6.13)(ii) The Bayesian model is given as in (6.8){(6.10), but ?T can be writtenas ?(:;?); where ?(:;:) is the known temporal correlation matrix, de-creasing as the di?erence between any two time points increases, and? is an unknown parameter. The priors for ? and ??1S are given in(6.11){(6.12), respectively, and they are mutually independent. Modelspeci?cation is completed with the prior for the temporal covariancefunction given by:? ? Nk(?0;?0): (6.14)Notice that we assume both ?S and ?T are valid covariance matricesand non{singular, that is, the rank for ?S is p and for ?T ; n.141Chapter 6. Bayesian Empirical Orthogonal Function MethodDecomposition of Y ? Np?n(0;?S In)We ?rst describe results from James (1954a), who discovers that the dis-tribution of the mean 0, temporal independent random matrix Y [that is,Y ? Np?n(0;?S In)] can be uniquely decomposed into three indepen-dent parts, speci?cally, one part being Wishart distributed, one part beinguniformly distributed on a Grassmann manifold and the last part being uni-formly distributed on a Stiefel manifold, that is, an orthogonal group in thissetting. He constructs invariant measures on the orthogonal group (thatis, the Haar measure), the Grassmann and Stiefel manifolds. James also?nds the distribution of a non{central Wishart distribution using the Haarmeasure (1954b) and the distribution of the latent roots for a covariancematrix (1960). We now introduce some basic notation from James (1954a)and Chikuse (2004).De?nition 6.4.1 The orthogonal group, O(n); is the set of all orthogonalmatrices with the operation of matrix multiplication.De?nition 6.4.2 The Stiefel manifold, Vk;n = fV : n ? k;V0V = Ikg; is aset of k (k ? n) orthonormal matrices in Rn:De?nition 6.4.3 The Grassmann manifold, Gk;n?k; is the set of all k{dimensional hyperplanes in Rn that pass through the origin.James sates that \. . .the Grassmann and Stiefel manifolds may be re-garded as coset spaces of the orthogonal group" (1954, p.63), an importantproperty on their relationships. The main result from James has been sum-marized in the following lemma.Lemma 6.4.1 (James, 1945) Suppose Y = (y1;:::;yn) ? Np?n(0;?S In): Then we haveY = OLP; (6.15)where O : p?p represents an orthogonal matrix that is uniformly distributedover the Grassmann manifold, P : p ? n; a semi{orthogonal matrix (i.e.,142Chapter 6. Bayesian Empirical Orthogonal Function MethodPP0 = Ip in this case) that is uniformly distributed over the Stiefel manifoldand L : p ? p; a diagonal matrix with diagonal entries fl1;:::;lpg such thatl21;:::;l2p are the eigenvalues for YY0 with l1 > ::: > lp > 0:Mardia and Khatri (1977) develop the exact and asymptotic distribu-tions for the random matrix uniformly distributed on a Stiefel manifold.They also discuss the matrix form of the von Mises{Fisher distribution ona Stiefel manifold. We will not discuss this application of James in thischapter but leave the construction of the posterior distributions on Stiefeland Grassmann manifolds for future research.Theoretical resultsWe present the related inference for Bayesian EOFs in following lemmas andtheorems. All proofs in this section are listed in Appendix C.1.Lemma 6.4.1 and the KL expansion below provide the basis for thetheoretical results needed to obtain Bayesian EOFs. It leads to the priordistribution as a special case when ?S = Ip; as shown in following lemma.Lemma 6.4.2 If ??1S ? Wp(n;Ip) for some n 2 Z+, by the KL expan-sion and Lemma 6.4.1, we have that the ??2's are mutually independentlydistributed with ??2j ? ?2n; for j = 1;:::;p; and O : p ? p is uniformlydistributed on the Stiefel manifold.Lemma 6.4.2 provides one way to sample the random matrix ??1S fromits prior distribution Wp(?S;Ip):Theorem 6.4.1 Consider the data matrix Y : p ? n ? Np?n(0;?S ?T ):Given the nonsingular spatial and temporal covariance matrices ?S and ?T ;let Ycurrency1 = Y??1=2T : Then Ycurrency1 ? Np?n(0;?S In): Consequently, Ycurrency1 =?1=2S OLP; where O, L, and P are given in Lemma 6.4.1. The BayesianEOFs are then obtained as W = 1n?1=2S OL:Theorem 6.4.2 Consider the data matrix Y : p ? n ? Np?n(0;?S ?T ):Suppose the temporal covariance matrix ?T is nonsingular and known. Then143Chapter 6. Bayesian Empirical Orthogonal Function Methodwe have Ycurrency1 = Y??1=2T ? Np?n(0;?S In): Assume ??1S ? Wp(?S;?S):The posterior distribution for the spatial precision matrix ??1S is given asfollows:??1S jY ? Wp(?o;?o); (6.16)where ?o = ?S + n; and?o = ?S ? ?SY(Y0?SY + ?T )?1Y0?S: (6.17)The Bayesian EOFs can be obtained when ?S is estimated or sampled fromits posterior distribution.Consequently from Theorem 6.4.2, we can either obtain the estimates for?S using an empirical method such as that of the Sampson{Guttorp (SG)or sample it from its posterior distribution in (6.16). In other words, we caneither use empirical Bayes or hierarchical Bayesian methods to obtain theestimates for the model parameters.If the spatial covariance matrix were known, valid and nonsingular, wewould have a similar result for the posterior distribution of the temporalcovariance matrix, ?T : The next theorem tells us its posterior distributionif the prior for ??1T is assumed to be Wishart distributed, that is, Case (i).Theorem 6.4.3 Consider the data matrix Y : p ? n ? Np?n(0;?S ?T ):Suppose the spatial covariance matrix ?S is known, valid and nonsingular.Assume the prior for ??1T is Wn(?T ;?T ): Then the posterior distributionfor ??1T is given by??1T jY ? Wn(?currency1;?currency1); (6.18)where ?currency1 = ?T + p and?currency1 = ?T ? ?T Y0(Y?T Y0 + ?S)?1Y?T : (6.19)In Case (ii) where the temporal covariance matrix is assumed to have a144Chapter 6. Bayesian Empirical Orthogonal Function Methodknown parametric form but unknown parameters, we can obtain the pos-terior distribution for these parameters given a valid and known spatialcovariance matrix. This posterior distribution is given in the following the-orem.Theorem 6.4.4 Given the same condition as in Theorem 6.4.3 and thefully Bayesian model in Case (ii) with the priors for ? in (6.14), the condi-tional posterior distributions for ? are given as follows:p(?jY) / exp??12? 1?2 tr(V0V?(:;?)?1) + (? ? ?0)0??10 (? ? ?0)??;(6.20)where V = ??1=2S Y:In practice, both temporal and spatial covariance matrices are unknown,and so conditions in Theorems 6.4.1 and 6.4.4 do not hold. Although thenonsingularity condition for these two covariance matrices might be di?cultto verify due to the challenging numerical problems, we need the conditionto obtain the posterior samples for both matrices in this chapter. Futureresearch will be devoted to improve on the results.To obtain posterior samples for both parameters in the MCMC frame-work, we can either use a mixture of MCMC and empirical Bayes methodsor use pure MCMC runs. We will illustrate the algorithm for both casesnext. But the idea here is to obtain the conditional posterior samples for?; ?S and ?T subsequently. The empirical Bayes method will be used toobtain the estimate for ?S given the data matrix, ?; and ?T : We use theGelman and Rubin R statistics as a device to check the convergence of theMarkov chains (Gelman et al., 2004, p. 296{297). The estimates for themodel parameters are then obtained as the mean of the posterior samplesafter the burn{in period.We next develop the posterior conditional distributions for the modelparameters for Cases (i) and (ii). The Bayesian EOFs can be obtained byTheorem 6.4.1 in which the mean ?eld, spatial and temporal covariance145Chapter 6. Bayesian Empirical Orthogonal Function Methodmatrices are estimated from their posterior distributions or by the empiricalBayes method.Posterior conditional distributionsThe posterior conditional distributions for ?; ??1S ; and ??1T can be obtainedfor Cases (i) and (ii) in the fully Bayesian framework. The proofs of thetheorems in this section are presented in Appendix C.1. We ?rst presentthe conditional posterior distributions for model parameters for Case (i) inthe following theorem.Theorem 6.4.5 Given the Bayesian hierarchical model in (6.8){(6.13), theposterior conditional distributions for these model parameters are given asfollows:(i) The conditional posterior distribution for ? is given by?jZ;?S;?T ? Np(M;?currency1?S) (6.21)where ?currency1 = ftr(10n1n??1T ) g?1 and M = Z??1T 10n?currency1:(ii) The conditional posterior distribution for ??1S is given by??1S jZ;?;?T ? Wp(?1;?1); (6.22)where ?1 = ?S + n;?1 = ?s ? ?sY(Y0?SY + ?T )?1Y0?s; (6.23)and Y = Z ? ? 10n:(iii) The conditional posterior distribution for ??1T is given by??1T jZ;?;?S ? Wn(?2;?2); (6.24)where ?2 = ?T + p;?2 = ?T ? ?T Y0(Y?T Y0 + ?S)Y?T ; (6.25)146Chapter 6. Bayesian Empirical Orthogonal Function Methodand Y = Z ? ? 10n:In the same way, the posterior conditional distributions for the modelparameters for Case (ii) are obtained in the next theorem.Theorem 6.4.6 Given the Bayesian hierarchical model in (6.8){(6.12) and(6.14), the posterior conditional distributions for ? and ??1S are given inTheorem 6.4.5. Let V = ??1=2S (Z? ? 10n): Moreover, the posterior condi-tional distribution for ? is given by:p(?jZ;?;?S) / exp??12htr?VV0?(:;?)?1?+ (? ? ?0)0??10 (? ? ?0)i?:(6.26)After obtaining the estimates for the spatial and temporal covariancematrices, as well as other model parameters for both cases, the BayesianEOFs can then be obtained by Theorem 6.4.1, Lemma 6.4.1 and the KLexpansion.Next we illustrate how the MCMC algorithm can be used to obtain theposterior samples from the joint posterior distribution p(?;??1S ;??1T jZ) andp(?;??1S ;?jZ) for Cases (i) and (ii), respectively.MCMC algorithmsTo obtain the posterior samples from the joint posterior distribution ofmodel parameters, ?;?S and ?T (or ?); the MCMC algorithm is used todraw samples from their posterior distributions. For Case (i), we use Gibbssampling method based on the full conditional distributions we obtainedbefore. For Case (ii), a Metropolis{within{Gibbs algorithm is used becausethe posterior conditional distribution for ? does not have any closed form.Algorithm 6.4.1 For Case (i), Gibbs sampling can be used to draw theposterior samples from p(?;?S;?T jZ) :1. Initialization: set?(1) = Zrow;147Chapter 6. Bayesian Empirical Orthogonal Function Methodsample??1S (1) ? Wp(?S;?S);and??1T (1) ? Wn(?T ;?T ):2. Given the (j ? 1)th values, ?(j?1); ??1S (j?1); ??1T (j?1); and Z:(1) Sample ?(j) from p(?jZ;?S(j?1);?T (j?1)) from (6.21).(2) Sample ??1S (j) from p(??1S jZ;?(j);?T (j?1)) from (6.22).(3) Sample ??1T (j) from p(??1T jZ;?(j);?S(j)) from (6.24).3. Repeat until convergence.The Metropolis{within{Gibbs algorithm is omitted here because we presenta similar result in Chapter 2. In this section, we give the algorithm for a veryspecial case when the temporal process is assumed to be an AR(1) process.Hence we have ? as the parameter that characterizes the AR(1) process.Assume that ? ? N(?0;?2?0): Then the collection of hyperparameters canbe denoted by H = f?0;?2?0;?S;?Sg: Since there is no closed form for theposterior conditional distribution in (6.26), the Metropolis{within{Gibbsalgorithm could then be used to draw posterior samples of interest.The next section includes a straightforward extension on the BayesianEOFs results where its spatial covariance is assumed to have a GIW distri-bution instead of IW (see Le and Zidek (2006), for example).6.5 Extension to the Bayesian EOFsWe can extend the above results about Bayesian EOFs to incorporate theGIW prior for the spatial covariance structure in such a way that the GIWprior re?ects some characteristics of the data matrix. Le & Zidek (1992{2006) develop theoretical results for modelling spatio{temporal processes,i.e., the BSP approach in Chapter 4.148Chapter 6. Bayesian Empirical Orthogonal Function MethodIn such a Gaussian GIW framework, the spatial covariance matrix ?Scan be estimated by the SG{method or Damian SG{method (Damian et al.,2002). Therefore, the estimates for the ?S can be updated at each iterationin the MCMC sampling. This will be carried out in future work.The next section includes two simulation examples that help assess theperformance for the Bayesian, classical and corrected EOFs.6.6 Simulation Study 2The Bayesian EOF models we consider above have a very general structure.Note that ? in (6.6) represents small scale spatial variation or measurementerror. If ? is close to 0; or equivalently to a very small value for ??; wethen have approximately Y = ? 10n + ?X; where ?'s columns are theEOFs for ?S and Xj? ? Np?n(0;Ip ?T (?)): If ?T (?) = In; then we havethe classical EOFs in Section 6.2. If ?T (?) 6= In but known, we then havethe corrected EOFs in Section 6.3. If ?T (?) is unknown, we can use theBayesian EOFs obtained in Section 6.4.The objective in this example is to compare the three di?erent types ofEOFs for a separable state{space process with known spatial and temporalcovariance matrices. To do that, we ?rst simulate the matrix{variate dataset. We then compute these three EOFs and compare them with the trueEOFs by contour plots and the matrix discrepancies.To brie?y review these three types of EOFs, suppose Y : p?n representsthe anomaly matrix for p sites and n time points, and follows a multivariatenormal distribution Np?n(0;?S ?T ). Recall that the classical EOFs es-timate the sample spatial covariance matrix by 1nYY0: Given the temporaldependence structure, that is, the temporal covariance matrix, the correctedEOFs estimate the sample spatial covariance matrix by Y??1T Y0: Given thepriors for the spatial and temporal covariance matrix, the Bayesian EOFsestimate the sample spatial covariance matrix in the hierarchical model bymeans of the corresponding posterior mode.We consider two cases in this section to assess the performance of theEOFs for two di?erent temporal dependence structures. For both cases, we149Chapter 6. Bayesian Empirical Orthogonal Function Methodassume a separable space time covariance structure, that is, an exponentialspatial covariance function and an AR(1) temporal covariance function. Inparticular, the spatial covariance function is given by(?S)ij = exp(?Vij=?); (6.27)where Vij is the Euclidean distance between si and sj; for i;j = 1;:::;p and?; a scale parameter. The temporal covariance function between tk and tlis given by ?2v?2jjtk?tljj; for tk;tl 2 f1;:::;Tg: Note that j?j < 1 indicatesa causal AR(1) process. If ? ' 0; then yt are approximately independent;if ? ' 1; then fyt : t = 1;:::;ng is a highly autocorrelated AR(1) process.We consider Case (i), ? = 0:1; and Case (ii), ? = 0:9:The geographical region in this simulation study is set to be [0:1;1:0] ?[0:1;1:0]: We select 100 grid points in this region to be the locations ofinterest, i.e., p = 100: We then choose n = 120 time points at each of these100 sites.The initial settings for the separable space{time covariance functions aregiven as follows: ? = 0:4; ?2v = 1:0; and ? = 0:1 for Case (i), and 0:9 forCase (ii).We de?ne Y (s;t) = Z(s;t)?^(s); the anomaly at site s and time t, where^(s) = 1n Pnt=1 Z(s;t): We obtain the classical, corrected, and BayesianEOFs using Y (s;t).We now compare the corrected, classical and \true" EOFs in the thesetwo cases: (i) ? = 0:1; and (ii) ? = 0:9; respectively.Simulated dataSuppose Y : p ? n ? Np?n(0;?S ?T ): One way to generate the sim-ulated data is by ?rst simulating Ycurrency1 = Y?T 1=2 = (ycurrency11;:::;ycurrency1n): Thus,ycurrency1t ? Np(0;?S); independently for t = 1;:::;n: We then generate Y byYcurrency1?T ?1=2:An alternative to obtain the simulated data uses James's result and theKL expansion. Given both ?S and ?T ; we ?rst illustrate the way to generatethe simulated data in any given regions. Given the spatial covariance matrix150Chapter 6. Bayesian Empirical Orthogonal Function Method?S; the Karhunen{Loeve expansion gives the unique orthogonal matrix O :p?p with its ?rst row being positive and the unique diagonal matrix currency12 : p?pwith its decreasing diagonal entries being the eigenvalues of ?S; such that?S = Ocurrency12O0= (currency1O0)0currency1O0= (?1O(1);:::;?pO(p))(?1O(1);:::;?pO(p))0= (?(1);:::;?(p))(?(1);:::;?(p))0=pXj=1?(j)(?(j))0= ??0; (6.28)where ? = (?(1);:::;?(p)) : p ? p and ?(j) = ?jO(j) for j = 1;:::;p:Given ?T : n?n; the corresponding Karhunen{Loeve expansion is givenby?T = PL2P0=pXj=1?(j)(?(j))0= ??0; (6.29)where P : n ? n represents the orthogonal matrix with its ?rst row posi-tive; L2 is a diagonal matrix with decreasing but positive diagonal entriesl21;:::;l2n; ? = (?(1);:::;?(n)); and ?(i) = liP(i) for i = 1;:::;n:Consequently, if Y : p?n ? Np?n(0;?S ?T ); and ? ? Np?n(0;Ip In);then Y = ???0 where ? and ? are given by (6.28) and (6.29), respectively.Let ?currency1 = ??0: Then ?currency1 ? Np?n(0;Ip ?T ); and Y = ??currency1:In short, the simulated data can be generated, for known spatial andtemporal covariance matrices, as follows:(i) Uniquely obtain ? and ? as in (6.28) and (6.29) for known ?S and?T ; respectively.(ii) Generate ? ? Np?n(0;Ip In) by n samples independently distributed151Chapter 6. Bayesian Empirical Orthogonal Function Methodfrom the multivariate normal distribution Np(0;Ip):(iii) Obtain the simulated data matrix Y : p ? n by ???0:Results for Case (i): ? = 0:1Table 6.3 demonstrates the percentage of the spatial variance of each spatialpattern found by the EOFs against the total space variance. It shows thatthese percentages for the true, classical or corrected EOFs are quite close toeach other. Table 6.4 presents the matrix discrepancies of the corrected orclassical EOFs to that of the true EOFs. These two matrix discrepancies areclose to each other but the classical one is slightly better than the correctedone.Index of EOFs True Classical Corrected1 33.584 31.913 31.8832 11.122 13.476 13.4573 11.122 8.963 8.9964 5.108 6.193 6.2015 3.853 4.732 4.7476 3.638 3.467 3.4677 2.274 2.733 2.7358 2.274 2.333 2.3299 1.537 1.946 1.94510 1.537 1.719 1.724Table 6.3: Percentage of spatial variation (%) for the ?rst 10 EOFs by the true,classical, and corrected methods (? = 0:1).Matrix discrepanciesClassical v.s. True 0.230Corrected v.s. True 0.233Table 6.4: Matrix discrepancies for the classical and corrected EOFs against thetrue EOFs (? = 0:1).Figures 6.7{6.9 plot the contours for the three types of EOFs. In Figure6.7, the ?rst EOF in (a) shows an ellipsoidal spatial pattern in this ?eld,152Chapter 6. Bayesian Empirical Orthogonal Function Methodwith its mode at the center of this region. The second EOF in (b) shows anorth{east to south{west shifting spatial pattern with the negative mode atthe north{east corner and the positive one at the south{west corner. Thethird EOF in (c) shows an approximately opposite spatial pattern to thatin (b). The remaining three EOFs in (d){(f) also represent certain spatialpatterns in this ?eld. It shows the classical EOFs to be quite similar to thecorrected one due to the very small value of ?: Moreover, the ratio of matrixdiscrepancy between the classical and true EOFs matrix is 0:25, while thatbetween the corrected and true ones, 0:25: It veri?es our expectation thatboth EOFs work quite well since the \true" data are approximately inde-pendent samples across the overall locations. Classical EOFs are supposedto capture the main types of spatial patterns in this case.Figure 6.7: Contour plots for the ?rst 6 true EOFs (? = 0:1): (a) { 1st EOF;(b) { 2nd EOF; (c) { 3rd EOF; (d) { 4th EOF; (e) { 5th EOF; and (f) { 6th EOF.(White=-0.6; Black=0.9.)153Chapter 6. Bayesian Empirical Orthogonal Function MethodFigure 6.8: Contour plots for the ?rst 6 classical EOFs (? = 0:1): (a) { 1st EOF;(b) { 2nd EOF; (c) { 3rd EOF; (d) { 4th EOF; (e) { 5th EOF; and (f) { 6th EOF.(White=-0.6; Black=0.9.)Results for Case (ii): ? = 0:9Table 6.5 plots the percentages of the spatial variations for each of thethree types EOFs: true, classical, and corrected. This graph shows that thecorrected EOFs gives more accurate estimates on the majority diagonal ele-ments of currency1 in the KL expansion. Table 6.6 presents the matrix discrepanciesbetween the corrected and classical EOFs and the true EOFs. It shows thatthe matrix discrepancy of the corrected EOFs against the true ones is muchsmaller than that of the classical EOFs against the true ones, which impliesmore accurate results of the corrected EOFs than those of the classical ones.Figures 6.10{6.11 present the ?rst six classical and corrected EOFs forthis case. Comparing them with Figure 6.7, it is obviously that the cor-rected EOFs can estimate the main types of spatial patterns better than154Chapter 6. Bayesian Empirical Orthogonal Function MethodFigure 6.9: Contour plots for the ?rst 6 corrected EOFs (? = 0:1): (a) { 1st EOF;(b) { 2nd EOF; (c) { 3rd EOF; (d) { 4th EOF; (e) { 5th EOF; and (f) { 6th EOF.(White=-0.6; Black=0.9.)the classical ones. Moreover, the ratio of matrix discrepancy between theclassical and true EOFs matrix is 5:79, while that between the corrected andtrue ones, 0:21: It shows that the classical EOFs are far from the \truth"for the highly autocorrelated database.6.7 ConclusionsWe have proposed Bayesian EOF (BEOF) approach in this chapter andshow the corresponding theoretical results as well as the MCMC algorithmto obtain the posterior samples of the model parameters. We have shownthat the corrected EOF method can be used to obtain better representationof principal spatial patterns than the classical EOF method in two simula-155Chapter 6. Bayesian Empirical Orthogonal Function MethodIndex of EOFs True Classical Corrected1 33.584 33.572 37.7022 11.122 20.789 11.8683 11.122 9.660 11.0054 5.108 7.835 6.2045 3.853 4.341 4.3446 3.638 3.302 3.4057 2.274 2.870 2.0388 2.274 2.287 1.9109 1.537 1.746 1.52110 1.537 1.583 1.460Table 6.5: Percentage of spatial variation (%) for the ?rst 10 EOFs by the true,classical, and corrected methods (? = 0:9).Matrix discrepanciesClassical v.s. True 4.978Corrected v.s. True 0.332Table 6.6: Matrix discrepancies for the classical and corrected EOFs against thetrue EOFs (? = 0:9).tion studies. From where, we conclude that the classical EOF may lead tosevere problems for a highly temporally correlated space{time process. Thecorrected EOF greatly improves the performance of the classical EOF andcaptures the principal spatial patterns better than the classical one. Theimplementation of the BEOF method will be one future work, as well as thecomparisons among the BEOFs, corrected and classical EOFs.Here notice that the BEOF is di?erent than the Bayesian factor analysisproposed by Aguilar and West (2000). In the Bayesian factor analysis theyde?ned, PCA is used for the time{varying ?S: And MCMC algorithm hasto be used to draw posterior samples for the Bayesian factors, which is com-putationally costly. We so extend the BEOF into an extension of the BSPto model the univariate or multivariate responses in spatio{temporal ?elds,which we call generalized Bayesian spatial prediction (GBSP) method.156Chapter 6. Bayesian Empirical Orthogonal Function MethodFigure 6.10: Contour plots for the ?rst 6 classical EOFs (? = 0:9): (a) { 1st EOF;(b) { 2nd EOF; (c) { 3rd EOF; (d) { 4th EOF; (e) { 5th EOF; and (f) { 6th EOF.(White=-1.6; Black=2.2.)157Chapter 6. Bayesian Empirical Orthogonal Function MethodFigure 6.11: Contour plots for the ?rst 6 corrected EOFs (? = 0:9): (a) { 1st EOF;(b) { 2nd EOF; (c) { 3rd EOF; (d) { 4th EOF; (e) { 5th EOF; and (f) { 6th EOF.(White=-1.6; Black=2.2.)158Chapter 7An Extension of the BSP:Bayesian Spatio{TemporalModelsThis chapter proposes an extension of the BSP approach for modelling airpollution data in large spatial{temporal domains, such as the AQS databasewe discussed in Chapters 2 and 3. The motivation for this extension hasbeen addressed in previous studies. We extend the BSP approach becauseof its computational e?ciency and better model performance in spatial in-terpolation and temporal prediction than those of the DLM. We integratethe BSP into the DLM framework because the DLM has a fairly ?exiblestructure for temporal prediction at ungauged and gauged locations andcapability for handling the missing observations at gauged sites.Our proposed model can deal with two types of covariates in this ?eld:time{varying but site invariant covariates and site{speci?c and time{varyingcovariates. We decompose the underlying space{time processes into threeparts: a long{term spatial{temporal term, a short{term main spatial pat-tern and a short{term spatial{temporal term. We also incorporate BayesianEOFs into the new model so we can model gridded data, such as theMAQSIP (Multiscale Air Quality Simulation Platform) simulated data. Wealso extend the univariate Bayesian spatio{temporal model to the multi-variate case. Moreover, we summarize the MCMC method that allows usto draw MC samples from the joint posterior distribution of model param-eters. And henceforth, we are able to interpolate or predict the univariate(or multivariate) pollutant(s) in large spatio{temporal domains.159Chapter 7. An Extension of the BSP: Bayesian Spatio{Temporal ModelsSection 7.1 introduces related research in this ?eld. Section 7.2 proposesa univariate Bayesian spatio{temporal model, an extension of the BSP ap-proach, to model spatio{temporal processes over large domains. Section7.3 discusses the relationships between our model and some others, suchas the DLM proposed by Huerta et al. (2004), the state{space model(SSM) by Wikle and Cressie (1999), Gelfand et al. (2005), and Le{Zidek'sapproach (1992). Section 7.4 presents the multivariate Bayesian spatio{temporal model. Section 7.5 demonstrates use of the MCMC algorithm todraw samples from the joint posterior distribution of the model parameters.Section 7.6 summarizes the advantages for the Bayesian spatio{temporalmodels we propose in this chapter and the remaining future work for ourmodels.7.1 IntroductionIn large spatial{temporal domains, computational e?ciency often emerges asa major di?culty in modelling high{dimensional data. This computationalproblem has been addressed by many approaches.Mardia et al. (1998) propose the kriged Kalman ?lter (KKF) approachin which the mean structure in the observation equation are formed by ?nitecommon ?elds. However, in the discussion following Mardia et al.'s paper,Cressie and Wikle (1998) criticize the KKF approach for its oversmoothingpredictor and lack of speci?c prior structure for some of the model parame-ters, comparing with the SSM proposed by Wikle and Cressie (1999), a fullyhierarchical Bayesian approach. They use EOFs based on the orthogonal ba-sis functions to capture the main spatial patterns in spatial{temporal ?eldsand so tackle the problem of the curse of dimensionality. However, Wikleand Cressie's SSM can only incorporate an AR(1) process, and so wouldbe inapplicable to more general structures of autocorrelated residuals afterremoving the long{term spatial{temporal component from the observationmodel.Gelfand et al. (2005) develop a dynamic spatio{temporal model using thelinear coregionalization method (LMC) to incorporate di?erent correlation160Chapter 7. An Extension of the BSP: Bayesian Spatio{Temporal Modelsstructure for multivariate responses. However, their model is not applicableto autocorrelated detrended residuals. A similar approach can also be seen inLee and Ghosh (2005). Johannesson et al. (2007) propose dynamic multi{resolution spatial models (MRSM) to extend the MRSM and incorporatedynamic temporal features. Moreover, the spatial domain considered intheir approach is recursively partitioned with a \tree{structured parent{children relationship between cells (pixels) at adjacent resolution". Lopes etal. (2007) propose a spatial dynamic factor analysis to model the commonfactors as a latent process with unknown ?nite dimension, much smallerthan the total number of sites. The forward{?ltering{backward{samplingmethod is then applied to the latent process to obtain the posterior samplesfor state parameters in the latent process.Other novel works related to geostatistics can be referred to Wacker-nagel (1998). A fully Bayesian approach on kriging has been investigatedby Handcock and Stein (1993). They state the ordinary kriging can beviewed as the Bayesian kriging under a non{informative prior for the mean.In their approach, they incorporate the unknown covariance structure ina Bayesian framework. The Bayesian kriging approach takes into accountmore uncertainty on the unknown covariance structure and so is capable ofquantifying the performance of the estimated kriging predictor.Le and Zidek developed their approach in accord with a speci?c set ofdesiderata, one of which was computational feasibility. Moreover their ap-plications were commonly made to relatively compact regions such as urbanareas. These two factors led them to use conventional EDA and time se-ries approaches to initially remove regional temporal components by ?ttingidentical model parameters to all sites, making the standard errors of theresulting estimates negligible. In one case where there method was employedto cover a very large spatial domain (Fu et al., 2003) a spatial thin platespline was ?tted over sites so, in e?ect, a di?erent mean level ?(s) was sub-tracted from each site's series so as to center it. Thus in most cases, Leand Zidek focus on modeling small scale variation. At the same time, eventhough the large scale components were handled in a fairly ad hoc way, theapproach itself was very general and very ?exible. Thus for example, it sub-161Chapter 7. An Extension of the BSP: Bayesian Spatio{Temporal Modelssumes the approach of Wikle and Cressie (1999), albeit without the elegantmodeling foundations the latter provide and the substantive rationale theygive for it.Our approach attempts to meet their desideratum of computational sim-plicity while putting their pre?ltering approach into a more formal hierar-chical framework. That retains the advantages of their methodology. At thesame time, it extends the approach of Wikle and Cressie (1999) and includesa number of other approaches for modelling space time processes. Thus itprovides a unifying Bayesian framework wherein computational strategiescan be developed once and for all. That in turn gives modellers much ?exi-bility.Next we propose the extension of the BSP approach, that is, the uni-variate and multivariate Bayesian spatio{temporal models.7.2 Univariate Bayesian Spatio{temporal ModelWe consider the case that the observations are measured at irregular loca-tions in a large spatial domain. We propose this univariate Bayesian spatio{temporal model to spatially interpolate and temporally predict the responsevariables at those irregularly located gauged or ungauged sites. One fu-ture implementation will be to predict ground{level ozone concentrationsover the eastern USA using the hourly ozone AQS database in Section 2.5.Comparing it with Wikle and Cressie's approach, our model decomposesthe spatial{temporal process into the following components: a long{termdynamic spatio{temporal term, a short{term spatial patterns and a short{term spatial temporal component. Note that Wikle and Cressie (1999) as-sume that long{term spatial{temporal components are the averages over alllocations, and so they consider the response to be the detrended residuals.Denote by Z(s;t) the observation at site s and time t: We model thespatio{temporal ?elds by ?rst removing all the linear and seasonal trendsacross all regions in the domain that is of interest, that is, F01(s;t)M(s;t):The remaining detrended residuals are represented by Y (s;t): We then de-compose Y (s;t) into two parts, ?K(s)aK(t) { the remaining long{term spa-162Chapter 7. An Extension of the BSP: Bayesian Spatio{Temporal Modelstial variation, and V (s;t) { the short{term spatial{temporal variation. Inour approach, this short{term spatial{temporal term can be modelled as aBSP term in Le{Zidek's approach, incorporated into a DLM framework.We propose the following model in this section:Z(s;t) = F01(s;t)M(s;t) + Y (s;t) (7.1)Y (s;t) = ?K(s)aK(t) + V (s;t) (7.2)V (s;t) = F02(s;t)?(s;t) + ?(s;t) (7.3)andM(s;t) = G1(s;t)M(s;t ? 1) + ?(s;t) (7.4)aK(t) = H2taK(t ? 1) + ?t (7.5)?(s;t) = G2(s;t)?(s;t ? 1) + !(s;t): (7.6)The above models can incorporate a fairly large class of cases, such assystematic temporal components, regional level covariate e?ects, and locallevel or site{speci?c covariate e?ects. In the coming subsections, we discussin particular the way to deal with di?erent kinds of covariates in our modeland possible choices for the terms to illustrate the covariance structure atthe coarse level, that is, ?K(s): Speci?cally, we consider two types of covari-ates: Type I covariates are the regional level covariates, common over allsites at any ?xed time point, with a site{speci?c coe?cient matrix; Type IIcovariates are the local level covariates, site{speci?c and time{varying, witha common coe?cient vector across all sites at any ?xed time point.The vector form of the above proposed univariate Bayesian spatio{temporalmodel can be written as follows:Zt = F01tMt + Yt (7.7)Yt = ?KaK(t) + Vt (7.8)Vt = F02t?t + ?t (7.9)163Chapter 7. An Extension of the BSP: Bayesian Spatio{Temporal ModelsMt = G1tMt?1 + ?t (7.10)aK(t) = H2taK(t ? 1) + ?t (7.11)?t = G2t?t?1 + !t; (7.12)where Zt = (Z(s1;t);:::;Z(sp;t))0 : p?1; F01t = diagfF1(s1;t);:::;F1(sp;t)g;Mt = (M(s1;t);:::;M(sp;t))0; Yt = (Y (s1;t);:::;Y (sp;t))0; ?K = (?K(s1);:::;?K(sp))0; Vt = (V (s1;t);:::;V (sp;t))0; F02t = diagfF2(s1;t);:::;F2(sp;t)g; ?t = (?(s1;t);:::;?(sp;t))0; ?t = (?(s1;t);:::;?(sp;t))0; ?t =(?(s1;t);:::;?(sp;t))0; ?t = (?(s1;t);:::;?(sp;t))0; !t = (!(s1;t);:::;!(sp;t))0and Gjt = diagfGj(s1;t);:::;Gj(sp;t)g for j = 1;2:Our proposed univariate Bayesian spatio{temporal model is completedby giving the following assumptions on the priors of model parameters:?C = OCcurrency1?2C O0C (7.13)? = OCcurrency1?1C (7.14)?K = ?EKp (7.15)?t ? Np(0;?F ) (7.16)?t ? Nl1(0;W1) (7.17)?t ? NK(0;WK) (7.18)!t ? Nl2(0;W2) (7.19)??C ? Wp(?C;?C) (7.20)??F ? Wp(?F ;?F ); (7.21)where ?C presents the covariance matrix for Y (s;t) at the coarse level andhenceforth ??C, the precision matrix; similarly, ?F represents the covariancematrix at the ?ne level and ??F, the precision matrix. Let EKp : p ? K =(ep;1;:::;ep;K); where ep;j : p?1 represents the p{dimensional vector whosejth entry is 1, the others, 0, for j = 1;:::;K.The initial information can be described as follows:M0 ? N(mM0 ;CM0 ) (7.22)164Chapter 7. An Extension of the BSP: Bayesian Spatio{Temporal ModelsaK(0) ? N(ma0;Ca0) (7.23)?0 ? N(m?0;C?0): (7.24)In the coming subsections, we ?rst discuss possible choices and modelsfor di?erent types of covariates, along with the main spatial term contributedby ?K; and then obtain theoretical results for our proposed model.7.2.1 Type I covariatesWe can incorporate Type I covariates in our model, that is, the covariates arecommon over sites at each ?xed time point, for example, the month e?ect,week day e?ect, and hourly e?ect we considered in Chapter 4. For thesetypes of covariate, we consider their coe?cient matrix (unknown parametermatrix) to be site{speci?c. In this subsection, we show how we deal withthis type of covariate as the ?rst step in (7.1) and (7.4).Suppose the data base has l1 Type I covariates in total. Let ?t =( ?1(t);:::; ?l1(t)) : 1 ? l1 be the l1{dimension covariate vector at time t:The corresponding coe?cient matrix ? : l1 ? p can also be written as ? =(?(s1);:::; ?(sp))0; where ?(si) = (?1(si);:::; ?l1(si)) is the l1{dimensionalcoe?cient column vector at site si for i = 1;:::;p:Let F01(si;t) = ?t and M(si;t) = ?(si): Then (7.1) is equivalent toZ(si;t) = ?t ?(si) + Y (si;t): This shows our model to incorporate Type Icovariates.7.2.2 Type II covariatesWe now show how we can deal with Type II covariates in this subsectionusing the model we proposed above, that is, site speci?c covariates varyingwith time, for example, hourly temperatures, wind speeds and wind direc-tions at each location. The corresponding coe?cients of Type II covariatesare common over all sites at any ?xed time point. We show this type ofcovariates can be dealt with at steps (7.3) and (7.6), along with an AR(1)autocorrelation structure for the detrended residuals V (si;t). Of course, thecase of other systematic temporal components can also be dealt with using165Chapter 7. An Extension of the BSP: Bayesian Spatio{Temporal Modelsa similar approach.Suppose l2 Type II covariates all are of interest. Let ~t(si) = ( ~1;t(si);:::;~Zl2;t(si)) be the l2{dimensional covariate vector at time t and location si; fori = 1;:::;p and t = 1;:::;n: Then its corresponding coe?cients are constantacross all sites at each ?xed time point t; that is, ~0t = (~1;t;:::; ~l2;t) : 1?l2:Let F01(si;t) = ~t(si) and M(si;t) = ~t: Then (7.1) is equivalent toZ(si;t) = ~t(si)~t + Y (si;t); demonstrating our model's ?exibility.7.2.3 Possible choices for ?K(s)?K(s) can be chosen in various ways. For example, it could be formed frompolynomials. It can also be formed from the empirical orthogonal functionsas discussed in Chapter 6. We illustrate some possible choices for ?K(s) inthis subsection.The simplest case would be the polynomials. For example, ?K(s) cancome from a polynomial in latitude and longitude, representing a linearmean surface of the spatial{temporal ?eld of Y (s;t) in the geographicalcoordinates (Stroud et al., 2001). Of course, ?K(s) can be constructedusing other polynomial basis functions, such as the quartic polynomials,that is, fourth degree polynomials, used by Fuentes and Raftery (2005).Or ?K(s) could also be orthogonal basis functions. For example, it couldbe orthogonal polynomials, orthonormal basis functions for cubic splines,bicubic splines, wavelets, or empirical orthogonal functions. For the rea-sons we stated before in Chapter 6, we emphasize EOF functions. Wikleand Cressie (1999) estimate the EOF in their SSM, which could do here.For unknown EOFs, we extend the Bayesian EOFs in Chapter 6 under theunivariate Bayesian spatio{temporal model. We will talk about this in thecoming subsection.7.2.4 Predictive posterior distributionsWe present theoretic results for the predictive posterior distribution of modelparameters used in our model. Moreover, we brie?y summarize the idea ofimplementing the BEOF in our model and the related theoretical results.166Chapter 7. An Extension of the BSP: Bayesian Spatio{Temporal ModelsSuppose ?C denotes the coarse{level covariance matrix for Y (s;t): As-sume ?F is the ?ne{level covariance matrix for the deAR'd residuals, in-cluded in the covariance matrix of ?(s;t):The univariate Bayesian spatio{temporal model we proposed in (7.7){(7.12) can also be written as the following form:Zt = F0t?currency1t + ?t ?t ? Np(0;?F)?currency1t = Gt?currency1t?1 + !currency1t !currency1t ? Nl1+K+l2(0;W);where F0t = (F01t;?K;F02t); ?currency1t = (M0t;aK(t)0;?0t)0; Gt = Block{diagfG1t;H2t;G2tg; !currency1t = (?0t;?0t;!0t)0; and W = Block{diagfW1;WK;W2g:Using standard results for the DLM and referring to Theorem A.2.1 inAppendix A.2, we obtain the corresponding posterior distributions for thestate parameters, that is, ?currency1t ; given the observations until time t and thecoarse{ and ?ne{ level covariance matrices ?C and ?F; respectively, in thefollowing theorem.Theorem 7.2.1 Given the coarse{ and ?ne{ level covariance matrices, ?Cand ?F; respectively, we obtain the following posterior distributions:?currency1t?1jZ1:t?1;?F;?C ? Nl1+K+l2[mt?1;Ct?1]?currency1t jZ1:t?1;?F;?C ? Nl1+K+l2[at?1;Rt?1]ZtjZ1:t?1;?F;?C ? Np[ft;Qt]?currency1t jZ1:t;?F;?C ? Nl1+K+l2[mt;Ct];whereat = Gtmt?1 Rt = GtCt?1G0t + Wft = F0tat Qt = F0tRtFt + ?Fet = Zt ? ft At = RtFtQ?1tmt = at + Atet Ct = Rt ? AtQtA0t;for t = 1;:::;n:Interest lies in the joint posterior distribution p(?currency11:n;?C;?FjZ1:n): Wecan write this joint posterior density as the product of p(?currency11:nj?C;?F;Z1:n)and p(?C;?FjZ1:n): By Theorem 7.2.1, the former posterior distribution can167Chapter 7. An Extension of the BSP: Bayesian Spatio{Temporal Modelsbe obtained as follows:p(?currency11:nj?C;?F;Z1:n) =nYt=1p(?currency1t j?C;?F;Z1:t)p(?currency10j?C;?F)/nYt=1jQtj?1=2 exp(?12nXt=1(?currency1t ? mt)0C?1t (?currency1t ? mt)):The latter is given byp(?C;?FjZ1:n) / p(Z1:nj?C;?F)p(?C)p(?F)=nYt=1p(Ztj?C;?F;Z1:t?1)p(?C)p(?F)/? nYt=1jQtj!?1=2exp(?12nXt=1e0tQ?1t et)p(?C)p(?F);where et and Qt are given in Theorem 7.2.1. Note that F0t can be viewed asa function of ?C; that is, F0t(?C) and so are both et and Qt:Moreover, we obtain the following non{analytic form for the full condi-tional posterior distributions of ?C and ?F :p(?Fj?C;Z1:n) / p(?C;?FjZ1:n)/ p(?F)? nYt=1jQtj!?1=2exp(?12nXt=1e0tQ?1t et)/ j?Fj??F +p2nYt=1jF0tRtFt + ?Fj?12 exp??12htr(?F?F?1)+nXt=1e0t(F0tRtFt + ?F)?1et#); (7.25)and similarly,p(?Cj?F;Z1:n) / j?Cj??C+p2nYt=1jF0t(?C)RtFt(?C) + ?Fj?12 exp??12?"tr(?C?C?1) +nXt=1e0t(?C)?F0t(?C)RtFt(?C)168Chapter 7. An Extension of the BSP: Bayesian Spatio{Temporal Models+?F)?1 et(?C)io:(7.26)These results give us a theoretical basis for making inferences based onthe joint posterior distribution of the model parameters using the MCMCmethod. The way using the MCMC method will be addressed in Section7.5.7.3 The Univariate Bayesian Spatio{temporalModel and Relationships with OthersApproachesWe talk about the ?exibility of our univariate Bayesian spatio{temporalmodel and its relationship with some others approaches.7.3.1 Relationship with the DLM in Huerta et al. (2004)The long{term spatial temporal term, F01(s;t)M(s;t); is very general. It con-tains the case of linear trends and seasonal trends. As a very simple example,the linear trends can be incorporated into our model if we assume F01(s;t) =(1;t) and M(s;t)0 = (?0t;?1t): Another example is the DLM in Huerta etal. (2004), a special case of our model. Assume F01(t) = (1;S1t(a1);S2t(a2))and M(s;t)0 = (?t;?1;s;t;?2;s;t): Then we obtain Z(s;t) = ?t+S1t(a1)?1;s;t+S2t(a2)?2;s;t) + Y (s;t); the DLM of Huerta et al. (2004).7.3.2 Relationship with the SSM in Wikle & Cressie (1999)Our model contains the special case of the model proposed in Wikle andCressie (1999), and so can treat the former as an extension of their model.They remove the linear and seasonal trends before ?tting their model, aprocess incorporated by F01(s;t)M(s;t) in our model. Assuming V (s;t) =?(s;t) + ?(s;t); and removing (7.3), (7.4) and (7.6), we obtain the Wikle{Cressie model.169Chapter 7. An Extension of the BSP: Bayesian Spatio{Temporal ModelsOur model can incorporate more general structures for systematic tem-poral components than Wikle and Cressie's AR(1). To illustrate this gen-erality, we show now an AR(2) process can be dealt with using our model.This would make Wikle and Cressie's model unsuitable for our purpose ofmodelling hourly ozone concentrations' ?elds, although it might work fordaily ozone concentrations.Suppose we consider l1 Type I covariates in (7.3) and AR(2) structurefor the detrended residuals. In other words, we consider X(si;t) = V (si;t)?zt?(si) and W(si;t) = X(si;t) ? ?1(si)X(si;t ? 1) ? ?2(si)X(si;t ? 2): In(7.3), let F02(si;t) = (zt ??1(si)zt?1 ??2(si)zt?2;?1(si);?2(si)) : 1?(l1 +2);and ?(si;t)0 = (?(si)0;V (si;t? 1);V (si;t? 2)) : 1 ? (l1 + 2): We then obtainV (si;t) = F02(si;t)?(si;t) + W(si;t): In (7.6), we further letG2(si;t) =0BB@Il1 0l1 0l1zcurrency1(si;t) ?1(si) ?2(si)00l1 1 01CCA;where zcurrency1(si;t) = zt?1 ? ?1(si)zt?2 ? ?2(si)zt?3: We so obtain (7.6) where!(si;t)0 = (0l1;W(si;t ? 1);0): We have shown that our model can dealwith AR(2) autocorrelation structure for the detrended residuals. Moregenerally, we can show this dynamic linear modelling approach can actuallyaccommodate far more general time series structure, such as ARMA(p;q)processes. However, Wikle and Cressie only consider the AR(1) process.From that point of view, our approach can be viewed as an extension totheir well known model.7.3.3 Relationship with the univariate SSM in Gelfand etal. (2005)We now show the relationship between our model and the univariate SSMproposed by Gelfand et al. (2005). They deal with both types of covariates intheir model. Suppose F01(s;t) = F02(s;t); M(s;t) = ?t; and ?(s;t) = ?(s;t)in our model. We then obtain the same mean function as Gelfend et al..However, the main di?erence between their model and ours lays in the short{170Chapter 7. An Extension of the BSP: Bayesian Spatio{Temporal Modelsterm or small{scale spatial{temporal term. In their approach, they assumesmall{scale spatial{temporal term to be composed by two parts: a core-gionalization spatial{temporal term and a measurement error component.However, Gelfand et al.'s model cannot deal with autocorrelated detrendedresiduals, and so is not applicable in our applications. Our model containsmore general structure than theirs, and we actually decompose the short{term spatial{temporal term into two parts: the principal spatial pattern anda local spatial{temporal pattern.7.3.4 Relationship with the BSP model in Le and Zidek(1992)Our model also contains the Le{Zidek model as a special case. Le and Zidekpre?lter the linear and seasonal trends, as well as the highly autocorrelateddetrended residuals before ?tting their model. This pre?ltering step canbe incorporated through the term F01(s;t)M(s;t) and part of F02(s;t) in ourmodel. Removing (7.2), (7.4) and (7.5), we obtain the Le{Zidek BSP model.Following (7.2), we can model the small spatial{temporal variation termas Le and Zidek do in their book. Let W(s;t) = V (s;t) ? ~(s;t)?(s;t)and X(s;t) = W(s;t) ? ?iW(s;t ? 1): For simplicity, assume ?(s;t) = ?tand ~(s;t); the l2 dimensional site{speci?c covariates vector, for exam-ple hourly temperatures or hourly wind speeds when the response vari-ate is the hourly ozone concentrations in our study. We can write Wt =(W(s1;t);:::;W(sp;t))0; and ~t = ( ~(s1;t)0;:::; ~(sp;t)0)0 : p ? l2: Wethen have Wt = Vt ? ~t?t and Xt = Wt ? diag(?1;:::;?p)Wt?1: Leand Zidek then model X0t in a hierarchical Bayesian framework such thatX0t ? Np(^tB;?F); where ^t represents a q{dimension covariate vector andB : q ? p; the coe?cient matrix. At the second level, Le and Zidek modelB ? Np?q(B0;F?1 ?F): Furthermore, they assume that ?F ? W?1p (?;?):In our model, we also assume that ?t = H??t?1 + !?t ; where !?t ?N(0;W2): We now have thatV (s;t) = ( ~(s;t)H? ? ?i ~(s;t ? 1);?i)(?0t?1;V (s;t ? 1))0 + ~(s;t)!?t171Chapter 7. An Extension of the BSP: Bayesian Spatio{Temporal Models+X(s;t)= F02(s;t)?(s;t) + ~(s;t)!?t + X(s;t); (7.27)where F02(s;t) = ( ~(s;t)H? ? ?i ~(s;t ? 1);?i) : 1 ? (l2 + 1) and ?(s;t) =(?0t?1;V (s;t ? 1))0 : (1 + l2 ? 1):By the Le{Zidek's modelling approach, we then have Vt = F02t?t +~Zt!?t + B^zt + ?t; where ?t ? N(0;?F): Hence we obtain ?t = ~Zt!?t + ?t ?N(0; ~tW2 ~0t + ?F): Note that this is a special case of our model.7.4 A Multivariate Bayesian Spatio{TemporalModelWe now extend our model to a multivariate case. Suppose m di?erent pollu-tants or species are measured in a spatio{temporal ?eld. Let Zj(s;t) be thejth pollutant at site s and time t, for j = 1;:::;m; s 2 fs1;:::;spg; and t =1;:::;n: Let Z(s,t) = (Z1(s;t);:::;Zm(s;t))0 be the vector of observationsat site s and time t: Assume l1j type I covariates and l2j type II covariatesconsidered, for j = 1;:::;m: Those covariates are represented by F1;j(s;t)0 :1?l1j and F2;j(s;t)0 : 1?l2j; respectively. Write Fi(s;t)0 : m?Pmj=1 lij as ablock diagonal matrix with diagonal entries fFi;1(s;t)0;:::;Fi;m(s;t)0g; fori = 1;2: Let M(s;t) = (M1(s;t)0;:::;Mm(s;t)0)0 : Pmj=1 l1j ?1 and ?(s;t) =(?1(s;t)0;:::;?m(s;t)0)0 : Pmj=1 l2j?1: Assume we obtain the ?Kjj : 1?Kj; forj = 1;:::;m; and the corresponding expansion coe?cients aKjj (t) : Kj ? 1,using the multivariate EOF analysis. Let ?currency1(s) = Block{diagf?K11 (s);:::;?Kmm (s)g : m ? Pmj=1 Kj; and acurrency1(t) = (aK11 (t);:::;aKmm (t))0 : Pmj=1 Kj ? 1:We obtain the multivariate Bayesian spatio{temporal model as follows:Z(s;t) = F01(s;t)M(s;t) + Y(s;t) (7.28)Y(s;t) = ?currency1(s)acurrency1(t) + V(s;t) (7.29)V(s;t) = F02(s;t)?(s;t) + ?(s;t) (7.30)172Chapter 7. An Extension of the BSP: Bayesian Spatio{Temporal ModelsandM(s;t) = G1(s;t)M(s;t ? 1) + ?(s;t) (7.31)acurrency1(t) = H2tacurrency1(t ? 1) + ?t (7.32)?(s;t) = G2(s;t)?(s;t ? 1) + !(s;t); (7.33)where Y(s;t) = (Y1(s;t);:::;Ym(s;t))0 : m ? 1; V(s;t) = (V1(s;t);:::;Vm(s;t))0 : m ? 1; ?(s;t) = (?1(s;t);:::;?m(s;t))0 : m ? 1; and !(s;t) =(!1(s;t);:::;!m(s;t))0 : m?1: We assume a separable space{time covariancestructure for the matrix{variate ?t = (?(s1;t);:::;?(sp;t)) : m ? p; that is,?t ? Nm?p(0; ?F);where represents the correlation matrix between the pollutants and ?F;the covariance matrix between the spatial locations at the local{ or ?ne{scale level. We assume that the EOFs in ?currency1 estimate the covariance matrix,that is, ?C:We also assume inverted Wishart priors for ?F and ?C; respectively. Weleave that extension to future work.7.5 MCMC Algorithm on the BayesianSpatio{temporal ModelsIn this section, we discuss MCMC algorithms for sampling from the jointposterior distributions of the model parameters for both univariate and mul-tivariate spatial{temporal models. We do not present a simulation study orapplication, pending completion of future work on implementation. Insteadwe stop with a fairly clear statement of an approach that makes successseem plausible in applications like that one presented earlier to the AQSdata, with hourly ozone concentrations over the entire USA and the wholeof one summer.To make computation feasible, we can estimate the K using an ad hocmethod. Or we can implement the idea of Bayesian EOF Chapter 6 in this173Chapter 7. An Extension of the BSP: Bayesian Spatio{Temporal Modelsnew proposed model, using block MCMC methods to draw samples fromthe full conditional distributions of model parameters.As we can see from the prediction posterior distribution for the univari-ate Bayesian spatio{temporal model, the Metropolis{Hasting method is re-quired to sample from the posterior distributions of the the ?ne{ and coarse{level covariance matrices. That allows the Generalized inverted Wishart(GIW) prior to work at this stage. We leave this extension in future work.We now summarize what we have obtained in this chapter. From thepredictive posterior distribution obtained in Section 7.2, we can use theblock MCMC method to draw samples from its joint posterior distribution.In other words, we iteratively sample from p(?Fj?C;Z1:n); p(?Cj?F;Z1:n);and p(?currency11:nj?F;?C;Z1:n) according to (7.25), (7.26), and Theorem 7.2.1, re-spectively.Algorithm 7.5.1 (Metropolis{within{Gibbs algorithm)1. Initialization: sample??F(1) ? W(?F;?F)??C(1) ? W(?C;?C)?currency11:n(1) ? N(m0;C0):2. Given the (j ? 1)th values, ??F(j?1), ??C(j?1), ?currency11:n(j?1) and the ob-servations Z1:n :(1) Use a Metropolis{Hasting step to sample ??F(j) from p(??Fj??C;?currency11:n;Z1:n):(2) Use a Metropolis{Hasting step to sample ??C(j) from p(??Cj??F;?currency11:n;Z1:n):(3) Sample ?currency11:n(j) from p(?currency11:nj??C;??F;Z1:n):3. Repeat until convergence.The implementation of the MCMC method to sample the model parame-ters in our model will be left to future work. After obtaining the MC samples174Chapter 7. An Extension of the BSP: Bayesian Spatio{Temporal Modelsfor the model parameters after the burn{in period, it is straightforward toobtain the temporal prediction of the responses using the DLM approach.The spatial interpolation problem can then be viewed as a missing data inthe DLM. The latter can be obtained by adding this MCMC block of missingdata in the above algorithm. We leave all the implementation of this modelto future wok.7.6 Results and ConclusionsOur Bayesian spatio{temporal model is very ?exible and powerful. It hasthe following advantages:1. The general structure of the uni?ed Bayesian spatio{temporal modelallows us to remove the long{term systematic temporal variation, forexample, the linear trend, seasonal trends, covariate e?ects for bothtypes. For type I covariates, that is, the site{speci?c covariates, weregress them on site{invariant but time{varying unknown coe?cients,an extension of the Le{Zidek's approach. For type II covariates, thatis, time{varying covariates common over the entire domain, we regressthem on a site{speci?c and time{varying unknown coe?cient matrix,one removed at the process model in Le{Zidek's approach.2. We then decompose Y (s;t) into two parts: the principal local spatialpatterns and the remaining local spatial{temporal variation terms. Inthis step, ?K(s) can also be constructed using other basis functions,such as orthogonal polynomials, thin spline plates, and bicubic splinemethods. However, the eigenvalue basis functions, that is, the basisfunctions for the EOF, do provide a unique solution by the spectraldecomposition theorem. In other words, we can truncate the ?rst KEOFs to represent the main spatial patterns in the spatial{temporal?elds of Y (s;t); that is, the \detrended" residuals.3. The remaining local spatial{temporal variation term is then modelledas a BSP term because of the need for computational feasibility. The175Chapter 7. An Extension of the BSP: Bayesian Spatio{Temporal Modelsgeneral structure of the systematic temporal components can then beincorporated. From this point of view, we obtain an extension for theSSM proposed by Wikle and Cressie (1999).4. Our model allows us to incorporate two di?erent scales of covariancematrices, that is, coarse{ and ?ne{ scale levels of spatial covariancematrices. At the coarse level, we truncate the EOFs using the BayesianEOF method in Chapter 6 to represent the coarse scale long{termspatial patterns. At the ?ne level, the BSP term is used to representthe local scale short{term spatial{temporal patterns.5. Spatial interpolation and temporal prediction turn out to be straight-forward using our model. Gaussian framework allows us to spatiallyinterpolate the responses at ungauged sites. The time evolution of thedynamic models allows us to temporally predict the responses at thegauged and ungauged sites.Future work includes implementing that model to the real database orsimulation study, and an extension involving the GIW prior for the coarse{and ?ne{ scale covariance matrices.176Chapter 8Future WorkThis last chapter presents a summary of this thesis (Section 8.1) and pro-posals for future work based on it (Section 8.2) as well as approaches to betaken to complete that work.8.1 Thesis SummaryWe have implemented the Gaussian DLM proposed by Huerta et al. (2004)to spatially interpolate ground{level ozone concentrations at one cluster ofmonitoring stations in an AQS database (1995). A complete and testedsoftware package has been developed for this implementation. Theoreticalresults have been developed regarding the predictive variance using the ?rst{order polynomial model. Those results explain the monotone behavior of thecoverage probabilities found from the results of spatial interpolation. More-over, we demonstrate how to use discount factors in the DLM to improvethe predictive results and their accuracy. However, we ?nd the approachvery computationally intensive so that it is not scalable to large space{timedomains.We therefore explore a computational simpler alternative, the BSP ap-proach, for spatially interpolating univariate and multivariate responses inspace{time domains. That alternative has been implemented for an AQSdatabase (2000). Moreover, we have found an extension to this approachfor temporal forecasting one{day{ahead ground{level ozone concentrations.After comparing it with the DLM, we ?nd that the latter cannot competewith the BSP for either spatial interpolation or temporal prediction in termsof mean squared prediction error (MSPE).The BSP approach uses empirical Bayes steps to estimate some model177Chapter 8. Future Workparameters within the Bayesian framework. Although these steps simplifycomputation the ad hoc approach would be seen as objectionable from apurely Bayesian perspective and the model is not as ?exible as the DLM.To re?ne the BSP therefore, we put it into a DLM framework so that themodel is more ?exible to update or predict the responses as new data comeon stream, one of the attractive features of dynamic models. Yet we preserveits computational simplicity.A Bayesian version of the EOF method, that is, the BEOF method, hasbeen proposed in this thesis. The BEOF method allows us to representthe principal spatial patterns of spatial{temporal ?elds in a fully Bayesianframework. We have demonstrated potentially severe problems with theclassical EOF method that seems to have been largely ignored. We havecompared the classical and corrected EOFs with true EOFs using the sim-ulated database. Moreover, we have developed an MCMC algorithm forsampling from the joint posterior distribution of model parameters. How-ever, implementation of the BEOF approach on our existing database willbe left to future work.Finally, we have proposed a uni?ed approach to univariate and multivari-ate spatio{temporal modelling within a fully hierarchical Bayesian frame-work so that we can incorporate some interesting features of the BSP, DLMand BEOF approaches. We have provided theoretical results on the jointposterior distribution of model parameters and the corresponding MCMCalgorithm. Implementation of this model in data and simulated data will becompleted in future work.8.2 Future Research PlanWe ?nally propose in the list below, future work based on this thesis as wellas possible directions for completing that work.1. We can extend the DLM to the discount DLM and implement it ona real database. This implementation helps us set the reasonable hy-perpriors for some of the model parameters instead of ?xing them in178Chapter 8. Future Workan ad{hoc way. We can also develop corresponding software for thispurpose.2. We can extend the DLM to incorporate the dependence of model pa-rameters. To do this, a further extension of the discount DLM isneeded so that the time{varying discount factors can be free of thecomputational burden.3. We will extend the one{day{ahead temporal prediction of ground{levelozone concentrations using the BSP approach to arbitrary time points.4. One interesting problem in the BSP approach is to deal with the mono-tone data (double staircase) patterns in two directions. This is a reallydi?cult problem and a route to the solution remains to be found.5. We can implement the BSP in multivariate cases where dependent butunknown structure between the multivariate responses are considered.6. Another future project for the BSP approach involves misaligned data.This problem will be partially addressed when we integrate the BSPinto the DLM framework, so the model we considered in Chapter 7 willbe one possible choice. The DLM framework helps in the predictionof missing data provided the rate of missingness is reasonable.7. We will explore the choice of an optimal starting hour for makingone{day{ahead prediction.8. The Bayesian EOF method needs to be applied to real as opposedto simulated data so that we can ?nd whether the Bayesian and cor-rected EOFs can give seriously discrepant answers, a result that wouldraise concerns about the appropriateness of classical EOFs. This im-plementation is straightforward following the MCMC algorithm wedeveloped.9. The extended BSP, i.e., the Bayesian spatio{temporal model, can beimplemented with either model output or measurements. By combin-ing the two types of data over the entire network of US EPA moni-179Chapter 8. Future Worktoring sites, more accurate spatial predictions should be achievable forground{level ozone concentrations over a large spatio{temporal do-main. This straightforward implementation of our theory follows fromour theoretical results and MCMC algorithm we have already devel-oped. For example, we can combine physical model outputs with sta-tistical ones, in which the MAQSIP (gridded) data, can be consideredat the regional or coarse level and the observations or measurementsat each monitoring station, at the local or ?ne level.10. We can extend Bayesian spatio{temporal models to incorporate dif-ferent prior beliefs for the model parameters, that is, the covariancematrices regarding to di?erent levels and blocks. This extension in-cludes developing the corresponding theoretical results on predictiveand posterior distributions using the GIW prior on model parameters.11. We can use MCMC sampling algorithm to sample a random covariancematrix that follows a GIW prior distribution. This algorithm will helpus to spatially interpolate and temporally predict the responses usingthe Bayesian spatio{termporal models in which the priors of covariancematrices at both coarse and ?ne levels are GIWs.180Bibliography[1] Bj?rnsson, H. & Venegas, S.A. (1997). A manual for EOF and SVDanalyses of climatic data. Centre for Climate and Global Change ResearchReport # 97{1, McGill U.[2] Bretthorst, L. (1988). Bayesian spectral analysis and estimation. NewYork: Springer Verlag.[3] Brown, P.J., Le, N.D., & Zidek, J.V. (1994a). Multivariate spatial inter-polation and exposure to air pollutants. Can. J. Statist., 22, 489{510.[4] Brown, P.J., Le, N.D., & Zidek, J.V. (1994b). Inference for a covariancematrix. In Aspects of uncertainty: A tribute to DV Lindly, (Eds) PRFreeman and AFM Smith. Chichester: Wiley, 77{92.[5] Calder, C.A. (2004). E?cient posterior inference and prediction of space{time processes using dynamic process convolutions. In the Joint Proceed-ings of the Sixth International Symposium on Spatial Accuracy Assess-ment in Natural Resources and Environmental Sciences and the FifteenthAnnual Conference of TIES. Portland, Maine. June 28{July 1, 2004.[6] Calder, C.A., & Cressie, N. (2007). Some topics in convolution{basedspatial modeling. Proceedings of the 56th Session of the InternationalStatistics Institute. Lisbon, Portugal. August 22{29, 2007.[7] Carter, C.K., & Kohn, R. (1994). On gibbs sampling for state spacemodels. Biometrika, 81, 541{553.[8] Chikuse Y. (2006). State space models on special manifolds. Journal ofMultivariate Analysis, 97, 1284{1294.181Bibliography[9] Cressie, N. (1993). Statistics for spatial data. Wiley.[10] Cressie N., & Wikle, C.K. (1998). Strategies for dynamic space{timestatistical modelling: discussion of \The Kriged Kalman ?lter" by Mardiaet. al. Test, 7, 257{264.[11] Damian, D., Sampson, P.D., & Guttorp, P. (2001). Bayesian estimationof semi{parametric non{stationary spatial covariance structures. Environ-metrics, 12, 161{178.[12] Dou, Y.P., Nhu, D.L., & Zidek J.V. (2007). A dynamic linear modelfor hourly ozone concentrations. Technical Report #228, Department ofStatistics, UBC.[13] Fu, A., Le, N.D., & Zidek, J.V. (2003). A statistical characterizationof a simulated Canadian annual maximum rainfall ?eld. Technical report#2003{17, SAMSI.[14] Fuentes, M. (2002). Interpolation of nonstationary air pollution process:a spatial spectral apporach. Statistical Modelling, 2, 281{298.[15] Gamerman, D., & Lopes, H.F. (2006). Markov chain Monte Carlo:stochastic simulation for Bayesian infrence. Chapman & Hall.[16] Gelfand, A.E., Banerjee, S., & Gamerman, D. (2005). Spatial processmodelling for univariate and multivariate dynamic spatial data. Environ-metrics, 16, 465{479.[17] Gelfand, A.E., Schmidt, A.M., Banerjee, S., & Sirmans, C.F. (2004).Nonstationary multivariate process modelling through spatially varyingcoregionalization (with discussion). Test, 13, 263{312.[18] Handcock, M.S., & Stein M.L. (1993). A Bayesian analysis of kriging.Technometrics, 35, 403{410.[19] Hannachi, A., Jolli?e, I.T., & Stephenson, D.B. (2007). Review empir-ical orthogonal functions and related techniques in atmospheric science:a review Int. J. Climatol., 27, 1119{1152.182Bibliography[20] Harvey, A.C. (1984). A uni?ed view of statistical forecasting procedures.J. Forecasting, 3, 245{275.[21] Higdon, D., Swall, J., & Kern, J. (1998). Non{stationary spatial mod-eling. Bayesian statistics 6, eds. J.M. Bernardo, J.O. Berger, A.P. Dawid,and A.F.M. Smith. Oxford: Oxford University Press, 761{768.[22] Huerta, G., Sanso, B., & Stroud, J.R. (2004). A spatio{temporal modelfor mexico city ozone levels. J. R. Statist. Soc. C, 53, 231{248.[23] James A.T. (1954). Normal multivariate analysis and the orthogonalgroup.[24] James, A.T. (1960). The distribution of the latent roots of the covari-ance matrix. The Annals of Mathematical Statistics, 31, 151{158.[25] Johannesson, G., Cressie, N., & Huang, H. (2007). Dynamic multi{resolution spatial models. Environmental and Ecological Statistics, 14,5{25.[26] Kyriakidis, P.C., & Journel, A.G. (1999). Geostatistical space{timemodels: A review. Mathematical Geology, 31, 651{684.[27] Le, N.D., Sun, L., & Zidek, J.V. (2001). Spatial prediction and temporalbackcasting for environmental ?elds having monotone data patterns. Can.J. Statist., 29, 516{529.[28] Le, N.D., Sun, W., & Zidek, J.V. (1997). Bayesian multivariate spatialinterpolation with data missing by design. J. Roy. Stat. Soc., Ser B, 59,501{510.[29] Le, N.D., & Zidek, J.V. (1992). Interpolation with uncertain spatialcovariance: A Bayesian alternative to kriging. J. Mult. Anal., 43, 351{374.[30] Le, N.D., & Zidek, J.V. (2006). Statistical analysis of environmentalspace{time processes. Springer.183Bibliography[31] Lee, H., & Ghosh, S.K. (2005). A reparametrization approach for dy-namic space{time models. Institude of Statistics Mimeo Series # 2587.[32] Li, K.H., Le, N.D., Sun, L., & Zidek, J.V. (1999). Spatial{temporalmodels for ambient hourly PM10 in Vancouver. Environmetrics, 10, 321{338.[33] Lopes, H.F., Salazar, E., & Gamerman, D. (2006). Spatial dynamicfactor analysis. Techical report, Universidade Federal do Rio de Janeiro.[34] Mardia, K.V. (1977). Distributions on Stiefel and Grassman manifolds,and their applications. Advances in Applied Probability, 9, 435{436.[35] Mardia, K.V., Goodall, C., Redfern, E.J., & Alonso, F.J. (1998). Thekriged Kalman ?lter. Test, 7, 217{282.[36] Mardia, K.V., & Khatri, C.G. (1977). Uniform distribution on a Stiefelmanifold. Journal of Multivariate Analysis, 7, 468{473.[37] Moller, J. (2003). Spatial statistics and computational methods.Springer{Verlag.[38] Sampson, P., & Guttorp, P. (1992). Nonparametric estimation of non-stationary spatial structure. J. Amer. Stat. Assoc., 87, 108{119.[39] Stroud, J.R., Muller, P., & Sanso, B. (2001). Dynamic models forspatio{temporal data. J. R. Statist. Soc. B, 63, 673{689.[40] Sun, W., Le, N.D., Zidek, J.V., & Burnett, R. (1998). Assessment ofBayesian multivariate interpolation approach for health impact studies.Environmetrics, 9, 565{586.[41] Wackernagel, H. (1998) Multivariate geostatistics: an introduction withapplications. Springer.[42] West, M., & Harrison, J. (1997). Bayesian forcasting and dynamic mod-els. Springer{Verlag.184Bibliography[43] Wikle, C.K. (2002). Spatio{temporal methods in climatology. To ap-pear: Encyclopedia of Life Support Systems. EOLSS Publishers Co. Ltd.[44] Wikle, C.K., & Cressie, N. (1999). A dimension{reduced approach tospace{time Kalman ?ltering. Biometrika, 86, 815{829.[45] Zidek, J.V., Sun, L., Le, N.D., ? aynak, H. (2002). Contending withspace{time interaction in the spatial prediction of pollution: Vancouver'shourly ambient PM10 ?eld. Environmetrics, 13, 1{19.185Appendix AAdditional Results forChapter 2A.1 Additional Results for Section 2.6.1The joint posterior distribution for x1:T ;? and ?2 is given byp(x1:T ;?;?2jy1:T ) = p(?;?2)p(xT j?;?2;y1:T )TYt=1p(xT?tjxT?t+1;?;?2;y1:T )?TYt=1p(ytj?;?2;y1:t?1)= p(x1:T j?;?2;y1:T )p(?2j?;y1:T )p(?jyT ):Suppose p(?;?2) = p(?)p(?2); that is, the priors for ? and ?2 are indepen-dent with each other.The joint posterior distribution for ? and ?2 can be written as follows:p(?;?2jy1:T ) / p(?)p(?2)(?2)?nT=2TYt=1jQtj?1=2 exp(? 12?2TXt=1e0tQ?1t et):If the prior for ?2 is an inverse gamma distribution with shape parameter? and scale parameter ?; then the posterior distribution for ?2 is also an in-verse gamma distribution with shape parameter ?+ nT2 and scale parameter? + 12 PTt=1 e0tQ?1t et:Hence, the posterior density for ? can be written as follows:p(?jy1:T ) = p(?;?2jy1:T )p(?2j?;y1:T )186Appendix A. Additional Results for Chapter 2/ p(?)TYt=1jQtj?1=2"? + 12TXt=1e0tQ?1t et#?(?+nT=2):Therefore, the posterior density for x1:T is given byp(x1:T j?;?2;y1:T ) = p(xT j?;?2;y1:T )TYt=1p(xT?tjxT?t+1;?;?2;y1:T ):A.2 Additional Results for Section 2.6.2Theorem A.2.1 Under Models (2.14) - (2.15) and initial prior (2.18), forany 1 ? t ? T; conditional on ?; we have(i)(xt?1jy1:t?1;?) ? N[mt?1;?2Ct?1](xtjy1:t?1;?) ? N[at;?2Rt](ytjy1:t?1;?) ? N[ft;?2Qt](xtjy1:t;?) ? N[mt;?2Ct];whereat = mt?1 Rt = Ct?1 + Wft = F0tat Qt = F0tRtFt + V?et = yt ? ft At = RtFtQ?1tmt = at + Atet Ct = Rt ? AtQtA0t:(ii) De?ne Bt = CtR?1t+1: For 0 ? k ? T ? 1;(xT?kjy1:T ;?) ? N[aT (?k);?2RT (?k)]; (A.1)whereaT (?k) = mT?k + BT?k[aT (?k + 1) ? aT?k+1]RT (?k) = CT?k + BT?k[RT (?k + 1) ? RT?k+1]B0T?k;187Appendix A. Additional Results for Chapter 2with aT (0) = mT ; RT (0) = CT ; aT?k(1) = aT?k+1; and RT?k(1) =RT?k+1:A.3 Additional Results for Section 2.6.4The observation equation is given byyt = 10n?t + S1t(a1)?1t + S2t(a2)?2t + ?t; ?t ? N[0;?2?(?)];where ?(?) = exp(?V=?) and V denotes the distance matrix for the mon-itoring sites s1;:::;sn:Given ?; ?2; xt (that is, ?t;?1t and ?2t) and yt; the posterior conditionaldistribution for the constant phase parameters, a1 and a2; is given byp(a1;a2j?;?2;xt;yt) / p(ytj?;?2;xt;a1;a2)p(a1;a2)/ p(Mtj?;?2;?t;?1t;?2t;a1;a2)p(a1;a2);where Mt = yt ? 10n?t ? cos(?t12)?1t ? cos(?t6 )?2t; for t = 1;:::;T:We consider the following two cases for the prior of a = (a1;a2)0 :? Case (i) a standard reference prior: p(a) / 1;? Case (ii) a bivariate normal prior: a ? N(?;?); where ? = (?1;?2)0and ? is a 2 by 2 covariance matrix.Under Case (i), for ?xed t = 1;:::;T; let l1 = sin(?t12)?1t; l2 = sin(?t6 )?2t;m = Mt and S = ?2?(?): The posterior conditional distribution for thephase parameter vector a is now given byp(a1;a2j?;?2;xt;yt) / p(Mtj?;?2;xt;a1;a2;xt;?;?2)p(a1;a2)/ expf?1=2[Mt ? a1 sin(?t12)?1t ? a2 sin(?t6 )?2t]0?(?2?(?))?1[Mt ? a1 sin(?t12)?1t ? a2 sin(?t6 )?2t]g= expf?12(m ? a1l1 ? a2l2)0S?1(m ? a1l1 ? a2l2)g188Appendix A. Additional Results for Chapter 2/ expf?12(a??1a0 ? 2a??1?0)g/ expf?12[a0(l1;l2)0S?1(l1;l2)a ? a0(l1;l2)0S?1m?mS?1(l1;l2)a]g/ expf?12(a ? ?)0??1(a ? ?)g;where??1 = (l1;l2)0S?1(l1;l2); (A.2)? = ?(l1;l2)0S?1m: (A.3)Note that equation (A.3) is equivalent to??1? = (l1;l2)0S?1m: (A.4)More speci?cally, we obtain the following elements in the mean vectorand covariance matrix for the posterior conditional distribution of the phaseparameter vector a:? =" ?11 ?12?12 ?22#; (A.5)??1 = (?11?22 ? ?212)?1 (A.6)= (l01S?1l1)(l02S?1l2) ? (l01S?1l2)2 (A.7)?11 = ?(l02S?1l2) (A.8)?12 = ??(l01S?1l2) (A.9)?22 = ?(l01S?1l1) (A.10)?1 = ?11(l01S?1m) + ?12(l02S?1m) (A.11)?2 = ?12(l01S?1m) + ?22(l02S?1m) (A.12)Therefore, we have the following conclusions for the conditional posteriordistribution of the phase parameter vector a:189Appendix A. Additional Results for Chapter 2(i) If the prior for a is the standard reference prior, that is, p(a1;a2) / 1;we have ?a1a2 jxt;yt;?;?2!? N"? ?1?2!;?#;where ?1;?2 and ? can be formed in equations (A.5) - (A.12) orequations (A.2) - (A.3).(ii) If the prior for a is the bivariate normal distribution with mean vector?0 = (?01;?02)0 and covariance matrix?0 =? ?011 ?0?012 ?022!;we then havep(a1;a2j?;?2;xt;yt) / expf?12(a1 ? ?1;a2 ? ?2)0??1(a1 ? ?1;a2 ? ?2)g expf?12(a1 ? ?01;a2 ? ?02)0(?0)?1?(a1 ? ?01;a2 ? ?02)g/ expf?12(a1 ? ?currency11;a2 ? ?currency12)0?currency1?1(a1 ? ?currency11;a2 ? ?currency12)g;where?currency1 = (??1 + ?0?1)?1 (A.13)?currency1 = ?currency1(??1? + (?0)?1?0): (A.14)From (A.13) and (A.14), we have?currency1 = ? ? ?(? + ?0)?1? = ?0(? + ?0)?1?; (A.15)and?currency1 = ?0(? + ?0)?1? + ?(? + ?0)?1?0: (A.16)190Appendix A. Additional Results for Chapter 2Hence, the posterior conditional distribution for the phase parametersis given by ?aj?;?2;xt;yt?? N(?currency1;?currency1);where ?currency1 and ?currency1 can be referred to equations (A.13){(A.14), or (A.15){(A.16).A.4 Additional Results for Sections 2.7.1 and2.7.2Given the values of the phase parameters, range and variance parameters,and the observations until time t, the joint distribution of ?s1t;?1t is? ?st?1t!? N"? ?s1;t?1?1;t?1!;?2?21 ?currency1(?1);#where?currency1(?) = expf?Vcurrency1=?g =" ?currency111(?) ?currency1 (?)?currency121(?) ?currency122(?)#;with ?currency111(?) a scalar, ?currency112(?) a 1 by n vector, and ?currency122(?) a n by n matrix.We use Vcurrency1 to denote the new distance matrix for the unknown site s andthe monitoring stations s1;:::;sn:We then have the conditional posterior distribution of ?s1t as follows:(?s1tj?s1;t?1;?1t;?1;t?1;yt;?;?2) ? N[?s1;t?1 + ?currency112(?1)?currency122(?1)?1(?1t ??1;t?1);?2?21 (?currency111(?1) ? ?currency122(?1)?1??currency121(?1))]:(A.17)Similarly, the conditional posterior distribution for ?s2t is(?s2tj?s2;t?1;?2t;?2;t?1;yt;?;?2) ? N[?s2;t?1 + ?currency112(?2)?currency122(?2)?1(?2t ??2;t?1);?2?22 (?currency111(?2) ? ?currency122(?2)?1191Appendix A. Additional Results for Chapter 2??currency121(?2))]:(A.18)Using the observation equation as in Model (2.11), we have the condi-tional predictive distribution for yst as follows:(yst jyt;?s1t;?s2t;?1t;?2t;?t;?;?2) ? N[?t + S1t(a1)?s1t + S2t(a2)?s2t + ?currency112(?)??currency122(?)?1(yt ? 1n?t ? S1t(a1)?1t?S2t(a2)?2t);?2(?currency111(?) ? ?currency112(?)??currency122(?)?1?currency112(?))]:(A.19)192Appendix BSoftware for Chapter 3We write the DLM software, GDLM.1.0, using the R and C interface. Thissoftware can be freely downloaded from http://enviro.stat.ubc.ca/dlm/GDLM.1.0.zip or http://enviro.stat.ubc.ca/dlm/GDLM.1.0.tar.gz for win-dows or unix/linux systems, respectively. The instructions for installingthis package in windows and linux/unix is under the folder \INSTALL".This software requires the R software, R ? 2:2:0; and C compiler, thatis, MinGW-3.2.0-rc-3.exe. \FFBS" folder which contains the C codes forthe forward{?ltering{backward{sampling method used in the MCMC algo-rithm. Their header ?les are included under the folder \include". \MAIN"folder contains all the R functions for the models used in this chapter. Areal database example is written in the folder \DEMO" where the databaseis located in the folder \DATA". \OUTPUT" allows one to store the com-putational output from the MCMC algorithm for making some graphicalcomparisons. The basic information for this software has been summarizedin the \Readme" ?le.Two packages in R, MASS and stats, are required in this software to ?tthe DLM within Gaussian framework. We now illustrate several importantfunctions in \MAIN" used in this software.(1) Function \forfun.c" is used as an R interface function with C. It gen-erates two main components to compute the acceptance rate in theMetropolis{within{Gibbs sampling. It has the following arguments:lat, long, mcmc.data.matrix, lambda, a1, a2, gamma.vec, m.init, C.initand BH. Here are the details for each of these arguments at the jthiteration:? \lat" presents the vector of latitudes at the gauged sites.193Appendix B. Software for Chapter 3? \long" presents the vector of longitude at the gauged sites.? \mcmc.data.matrix" presents the MCMC complete data matrixused at each iteration, that is, Y (j?1). By default, its columnsrepresent responses at each of the time points and rows for eachof the sites.? \lambda" presents the range parameter, ?; in the DLM.? \a1" presents the phase parameter corresponding to the 24 hourperiodicity.? \a2" presents the phase parameter corresponding to the 12 hourperiodicity.? \gamma.vec" presents ?xed hyperparameters, that is, ? = (?2y ;?21 ;?1;?22 ;?2):? \m.init" and \C.init" present the initial mean vector and co-variance matrix for the state parameters, that is, m0 and C0;respectively.? \BH" presents the total number of hours included in each day.By default, it is set to be 24.This function has the following outputs:? \quad" presents the summation of the quadratic quantities inthe log{likelihood of the joint posterior density for all the modelparameters, that is, PTt=1 e0tQ?1t et:? \ldet" presents the summation of the log of the determinant quan-tities in the log{likelihood of the joint posterior density for ?, thatis, PTt=1 log jQtj:? \quad.vec" presents the vector of the quadratic quantities calcu-lated at each of the time points, that is, fe0tQ?1t et : t = 1;:::;Tg:? \ldet.vec" presents the vector of the logarithm of the determinantquantities at each of the time points, that is, flog jQtj : t =1;:::;Tg:194Appendix B. Software for Chapter 3(2) Function \?bs.c" is another R interface function with C. It calls a Cfunction in R, \DLMFFBS", to generate the state parameters, xt; ateach MCMC iteration. This function contains the following arguments:? \lat" presents the vector of latitudes at the gauged sites.? \long" presents the vector of longitude at the gauged sites.? \mcmc.data.matrix" presents the mcmc complete data matrixused at each iteration, that is, Y(j?1). By default, its columnsrepresent responses at each of the time points and rows for eachof the sites.? \lambda" presents the range parameter, ?; in the DLM.? \sigma" presents the variance parameter, ?2; in the DLM.? \a1" presents the phase parameter corresponding to the 24 hourperiodicity.? \a2" presents the phase parameter corresponding to the 12 hourperiodicity.? \gamma.vec" presents ?xed hyperparameters, that is, ? = (?2y ;?21 ;?1;?22 ;?2):? \m.init" and \C.init" present the initial mean vector and co-variance matrix for the state parameters, that is, m0 and C0;respectively.? \BH" presents the total number of hours included in each day.By default, it is set to be 24.? \ctr" presents the current index for the iterations in the MCMCalgorithm, that is, j in this case.This function has the following outputs:? \quad" presents the summation of the quadratic quantities inthe log{likelihood of the joint posterior density for all the modelparameters, that is, PTt=1 e0tQ?1t et:195Appendix B. Software for Chapter 3? \ldet" presents the summation of the log of the determinant quan-tities in the log{likelihood of the joint posterior density for themodel parameters, that is, PTt=1 log jQtj:(3) Function \GDLM" is an R function to implement the MCMC algo-rithm for the DLM given by (2.14){(2.15) in Chapter 2. This functioncontains the following arguments:? \srm.data" presents the initial data matrix for the MCMC algo-rithm where the missing values are ?tted by the spatial regressionmethod.? \origin.data" presents the raw data matrix, including the missingvalues. By default, its row represents the stations and columns,the time points.? \job" is a binary variable. \job=0" means this function will onlydo the MCMC sampling; while \job=1" means this function willdo both the MCMC sampling and spatial interpolation at un-gauged sites.? \glat" presents the vector of latitude (in degree) at gauged sites.? \glong" presents the vector of longitude (in degree) at gaugedsites.? \uglat" presents the vector of latitude (in degree) at the ungaugedsites.? \uglong" presents the vector of longitude (in degree) at ungaugedsites.? \gamma.vec" presents ?xed hyperparameters, that is, ? = (?2y ;?21 ;?1;?22 ;?2):? \m.init" and \C.init" present the initial mean vector and co-variance matrix for the state parameters, that is, m0 and C0;respectively.? \BH" presents the total number of hours included in each day.By default, it is set to be 24.196Appendix B. Software for Chapter 3? \ITER" presents the total number of iterations used in the MCMCsampling.? \tuning.para" presents the ?xed tuning parameter, ?2; for theMetropolis{Hasting step.? \alpha.init" and \beta.init" present the hyperprior for the vari-ance parameter, ?2; that is, ?? and ??; respectively.? \eta.init" and \delta.init" present the hyperprior for the rangeparameter, ?:? \mu.phase.init" and \Sigma.phase.init" present the hyperpriorfor the phase parameters, (a1;a2); that is, ?0 and ?0; respectively.? \output.direc.name" presents the directory name that the userwant to store the output.This function has the following outputs:? \lambda" presents the MC samples for the range parameter, ?:? \sigma" presents the MC samples for the variance parameter, ?2:? \phase.a1" and \phase.a2" present the MC samples for the phaseparameters, a1 and a2; respectively.? \accept.ratio" presents the acceptance ratio for the range param-eter, ?, in the Metropolis{Hasting step.? \accept.index" presents the index of accepted iteration in theMetropolis{Hasting step.? \quad" presents the quadratic form at each of the iterations (only? = ?(j) and all the other parameters having their (j ? 1)th iter-ation value).? \log.det" presents the logarithm of the determinant quantity ateach of the iterations (only ? = ?(j) and all the other parametershaving their (j ? 1)th iteration value).? \theta.mat" presents the sampling matrix of the state parame-ters, x1:ITER; from the MCMC algorithm.197Appendix B. Software for Chapter 3(4) Function \GDLM.INIT" is an R function, having almost the samesettings as \GDLM". The di?erence is that \GDLM.INIT" allows theMCMC sampling to start from any given values for model parameters:?; ?2; a1 and a2: This function contains most arguments and sameoutput as that of \GDLM" except for the following four arguments inthe input:? \lambda.init.value" presents the starting values for ?:? \sigma.init.value" presents the starting values for ?2:? \phase.a1.init.value" presents the starting values for a1:? \phase.a2.init.value" presents the starting values for a2:(5) Function \interpolate.fun" is an R function used to interpolate the re-sponse variable at ungauged sites. It contains the following arguments:? \srm.data" presents the initial data matrix for the MCMC algo-rithm where the missing values are ?tted by the spatial regressionmethod.? \origin.data" presents the raw data matrix, including the missingvalues. By default, its row represents the stations and columns,the time points.? \job" is a binary variable. \job=0" means this function will onlydo the MCMC sampling; while \job=1" means this function willdo both the MCMC sampling and spatial interpolation at un-gauged sites.? \glat" presents the vector of latitude (in degree) at gauged sites.? \glong" presents the vector of longitude (in degree) at gaugedsites.? \uglat" presents the vector of latitude (in degree) at ungaugedsites.? \uglong" presents the vector of longitude (in degree) at ungaugedsites.198Appendix B. Software for Chapter 3? \lambda.mcmc" presents the MCMC samples for the range pa-rameter, ?:? \sigma.mcmc" presents the MCMC samples for the variance pa-rameter, ?2:? \phase.a1.mcmc" presents the MCMC samples for the phase pa-rameter, a1:? \phase.a2.mcmc" presents the MCMC samples for the phase pa-rameter, a2:? \BURN.IN" presents the burn{in period setting for the MCMCsamples for the model parameters.? \m.init" and \C.init" present the initial mean vector and co-variance matrix for the state parameters, that is, m0 and C0;respectively.? \gamma.vec" presents ?xed hyperparameters, that is, ? = (?2y ;?21 ;?1;?22 ;?2):? \BH" presents the total number of hours included in each day.By default, it is set to be 24.The output of this function is the interpolated values for the responsevariables at ungauged sites.199Appendix CAdditional Results forChapter 6C.1 Additional Results for Section 6.4De?nition C.1.1 Suppose the random matrix response X : r ? q has amatrix normal distribution, denoted by X ? Nr?q(M;C;?); where C : r ?r > 0; and ? : q ? q > 0: Then the probability density function of X is givenbyp(X) = (2?)?rq=2jCj?q=2j?j?r=2 expf?12tr[(X ? M)0C?1(X ? M)??1]g:(C.1)De?nition C.1.2 Suppose the random matrix X : q ? q is symmetric, pos-itive de?nite and follows an inverted Wishart distribution with degrees offreedom ? and scale matrix S: Then the probability density function of X isgiven byp(X) = kjXj?(?2+q) exp??12trhSX?1i?; (C.2)where S is positive de?nite andk?1 = 2qv=2?q(q?1)=4qYj=1??v ? j ? 12?jSj?v=2;with v = ? + q ? 1:Proof C.1.1 (Lemma 6.4.2) By the KL expansion and Lemma 6.4.1, we200Appendix C. Additional Results for Chapter 6have currency1?2 = O??1S O0 and moreover, currency1?2 ? Wp(n;Ip): Hence, the f??2j :j = 1;:::;pg are mutually independent and ??2j ? W1(n;1); that is, ?2n; forj = 1;:::;p: ?Proof C.1.2 (Theorem 6.4.1) Given Y : p?n ? Np?n(0;?S ?T ); denoteYcurrency1 to be Y??1=2T : Consequently, Ycurrency1 ? Np?n(0;?S In): Similarly, we have??1=2S Ycurrency1 ? Np?n(0;Ip In):By Lemma 6.4.1, ??1=2S Ycurrency1 = OLP; where O represents an orthogonalmatrix that is uniformly distributed over the Grassmann manifold, P, anorthogonal frame that is uniformly distributed over the Stiefel manifold, andL, a diagonal matrix with entries fl1;:::;lpg such that l21;:::;l2p are theeigenvalues for (??1=2S Ycurrency1)(??1=2S Ycurrency1)0: Hence, Ycurrency1 = ??1=2S OLP: Moreover,sinceE[(Ycurrency1)(Ycurrency1)0] = ?1=2S E[OL2O0]?1=2S= n?S;the Bayesian EOFs can then be given by W = 1n?1=2S OL: ?Proof C.1.3 (Theorem 6.4.2) By De?nition C.1.2, we havep(??1S ) / j?Sj?(?S2 +p) expf?12tr(??1S ??1S )g:Given ?T ; Ycurrency1 = Y??1=2T ? Np?n(0;?S In): By De?nition C.1.1, we havep(Ycurrency1j?S) / j?Sj?n=2 expf?12tr[Ycurrency1(Ycurrency1)0??1S ]g:Then the posterior distribution for ??1S given Ycurrency1; that is, Y for known ?Tis given as follows:p(??1S jY) / p(Ycurrency1j?S)p(??1S )/ j?Sj?(?S+n2 +p) exp??12trh(??1S + Ycurrency1(Ycurrency1)0)??1Si?:201Appendix C. Additional Results for Chapter 6In other words, ??1S jY ? Wp(?o;?o); where ?o = ?S + n and?o = f??1S + Ycurrency1(Ycurrency1)0g?1= (??1S + Y??1T Y0)?1= ?S ? ?SY(Y0?SY + ?T )?1Y0?S:?Proof C.1.4 (Theorem 6.4.3) This theorem can be proved similarly as inProof C.1.3, and so omitted here. ?Proof C.1.5 (Theorem 6.4.4) Let V = ??1=2S Y: Then V ? Np?n(0;Ip ?T ): Hence, we havep(Yj?S;?;?) / exp??12trhV0V?(:;?)?1i?:Given the prior for ?; Nk(?;?0); the posterior conditional density for ?can be represented byp(?jY;?S;?) / p(Yj?S;?;?)p(?)/ exp??12htr(VV0?(:;?)?1) + (? ? ?0)0??10 (? ? ?0)i?:?Proof C.1.6 (Theorem 6.4.5)(i) SinceZ ? Np?n(? 10n;?S ?T );and p(?) / 1; we have the posterior conditional distribution for ? asfollows:p(?jZ;?S;?T ) / p(Zj?;?S;?T )p(?)/ expf?12tr[(? 10n ? Z)??1T (? 10n ? Z)0??1S ]g/ expf?12tr[(? 10n)??1T (? 10n)0??1S ? (? 10n)202Appendix C. Additional Results for Chapter 6???1T Z0??1S ? Z??1T (? 10n)0??1S ]g= expf?12tr[(??0tr( 10n1n??1T ) ? ?1n??1T Z0?Z??1T 10n?0)??1S ]g/ expf?12tr[(? ? M)(?currency1)?1(? ? M)0??1S ]g;where ?currency1 = ftr(10n1n??1T ) g?1 and M = Z??1T 10n?currency1: Therefore, wehave?jZ;?S;?T ? N1?p(M;?currency1 ?S);that is, Np(M;?currency1?S) since ?currency1 is a scalar.(ii) Let Y = Z ? ? 10n: Consequently, we have Y??1=2T ? Np?n(0;?S In): By Theorem 6.4.2, the posterior distribution for ??1S jZ;?T isWp(?o;?o); given by (6.16), where Y = Z ? ? 10n:(iii) Similarly as in (ii), by Theorem 6.4.3 the posterior conditional distri-bution for ??1T jZ;?S is given by (6.18), where Y = Z ? ? 10n:?Proof C.1.7 (Theorem 6.4.6) The proof for this theorem follows Theorem6.4.4 but letting V = ??1=2S (Z ? ? 10n): ?203
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Dynamic Bayesian models for modelling environmental...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Dynamic Bayesian models for modelling environmental space-time fields Dou, Yiping 2008-04-01
pdf
Page Metadata
Item Metadata
Title | Dynamic Bayesian models for modelling environmental space-time fields |
Creator |
Dou, Yiping |
Publisher | University of British Columbia |
Date Issued | 2008 |
Description | This thesis addresses spatial interpolation and temporal prediction using air pollution data by several space-time modelling approaches. Firstly, we implement the dynamic linear modelling (DLM) approach in spatial interpolation and find various potential problems with that approach. We develop software to implement our approach. Secondly, we implement a Bayesian spatial prediction (BSP) approach to model spatio-temporal ground-level ozone fields and compare the accuracy of that approach with that of the DLM. Thirdly, we develop a Bayesian version empirical orthogonal function (EOF) method to incorporate the uncertainties due to temporally varying spatial process, and the spatial variations at broad- and fine- scale. Finally, we extend the BSP into the DLM framework to develop a unified Bayesian spatio-temporal model for univariate and multivariate responses. The result generalizes a number of current approaches in this field. |
Extent | 3834182 bytes |
Subject |
Bayesian hierarchical modelling Kalman filter Dynamic linear modelling Bayesian spatial prediction MCMC algorithm Gibbs sampling Wishart distributions Forward filtering Backward sampling Bayesian spatial prediction Bayesian empirical orthogonal functions Bayesian spatial prediction methods |
Genre |
Thesis/Dissertation |
Type |
Text |
File Format | application/pdf |
Language | eng |
Date Available | 2008-04-01 |
Provider | Vancouver : University of British Columbia Library |
DOI | 10.14288/1.0066333 |
URI | http://hdl.handle.net/2429/634 |
Degree |
Doctor of Philosophy - PhD |
Program |
Statistics |
Affiliation |
Science, Faculty of Statistics, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2008-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 24-ubc_2008_spring_dou_yiping.pdf [ 3.66MB ]
- Metadata
- JSON: 24-1.0066333.json
- JSON-LD: 24-1.0066333-ld.json
- RDF/XML (Pretty): 24-1.0066333-rdf.xml
- RDF/JSON: 24-1.0066333-rdf.json
- Turtle: 24-1.0066333-turtle.txt
- N-Triples: 24-1.0066333-rdf-ntriples.txt
- Original Record: 24-1.0066333-source.json
- Full Text
- 24-1.0066333-fulltext.txt
- Citation
- 24-1.0066333.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0066333/manifest