Models and Monitoring Designs forSpatio-temporal Climate Data FieldsbyCamila Maria Casquilho ResendeB.Sc. Actuarial Science, Federal University of Rio de Janeiro, 2008B.Sc. Statistics, Federal University of Rio de Janeiro, 2010M.Sc. Statistics, Federal University of Rio de Janeiro, 2011A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFDOCTOR OF PHILOSOPHYinThe Faculty of Graduate and Postdoctoral Studies(Statistics)THE UNIVERSITY OF BRITISH COLUMBIA(Vancouver)September 2016c© Camila Maria Casquilho Resende 2016AbstractIn this thesis, we describe how appropriately modelling the spatio-tempo-ral mean surface can help resolve complex patterns of nonstationarity andimprove spatial prediction. Nonstationary fields are common in environmen-tal science applications, and even more so in regions with complex terrain.Our analyses focus on the Pacific Northwest, a region where rapid changesand localized weather are very common, and where the terrain plays animportant role in separating often radically different climate and weatherregimes. To this end, we introduce two comparable strategies for spatialprediction. The first is based on a Bayesian spatial prediction method,where an exploratory analysis was performed in order to better understandthe localized weather regimes. The other is based on tackling the anomaliesof expected climate in the Pacific Northwest region, based on the averagevalues of temperature computed over a 30-year range obtained through aclimate analysis system.Secondly, we focus on one of the recent challenges in spatial statisticsapplications, the data fusion problem. There has been an increased need forcombining information from multiple sources that may be on different spa-tial scales. Ensemble modelling is referred to as a statistical post-processingtechnique based on combining multiple computer model outputs in a sta-tistical model with the goal of obtaining probabilistic forecasts. We givean overview of some ensemble modelling strategies, by combining observedtemperature measurements with outputs from an ensemble of determinis-tic climate models. We also provide a comparison between the Bayesianmodel averaging approach and a dynamic Bayesian ensemble strategy forforecasting.iiAbstractFinally, we introduce a novel strategy for the design of monitoring net-work, where the goal is to select a high-quality yet diverse set of locations.The idea of spatial repulsion is brought to this context via the theory ofdeterminantal point processes. Our design strategy is not only able to yieldspatially-balanced designs, but it also has the ability to assess similarity be-tween the potential locations should there be extra sources of informationrelated to the underlying process of interest. We explore its relationship toexisting design methods, such as the entropy-based and space-filling designs.iiiPrefaceThis thesis is an original work of the author, Camila Maria CasquilhoResende, under the supervision of Dr. James V. Zidek and Dr. Nhu D. Le.Dr. Alexandre Bouchard-Coˆte´ provided valuable suggestions at the earlystages of the development of Chapter 8. A version of Chapter 4 was submit-ted to peer review. The manuscript is called “Spatio-temporal modelling oftemperature fields in the Pacific Northwest”, by Casquilho-Resende, C. M.,Le, N. D. and Zidek, J. V. The idea was jointly developed by myself, Dr.Nhu D. Le and Dr. James V. Zidek. I conducted all computational work,derivations, and the majority of the writing. An electronic version can befound online at arxiv:1604.00572.ivTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . vList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ixList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiiAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . xixDedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xx1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 Spatial Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . 52.1 Overview of Geostatistics . . . . . . . . . . . . . . . . . . . . 62.1.1 Handling Nonstationarity . . . . . . . . . . . . . . . . 92.2 Overview of Spatial Point Processes . . . . . . . . . . . . . . 112.3 Lambert Conformal Conic Projection . . . . . . . . . . . . . 143 Approximate Bayesian Inference . . . . . . . . . . . . . . . . 163.1 Laplace’s Method . . . . . . . . . . . . . . . . . . . . . . . . 173.2 Integrated Nested Laplace Approximation . . . . . . . . . . . 184 Temperature Fields in the Pacific Northwest . . . . . . . . 234.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234.1.1 Contributions . . . . . . . . . . . . . . . . . . . . . . 24vTable of Contents4.2 The Pacific Northwest . . . . . . . . . . . . . . . . . . . . . . 254.3 Data Description . . . . . . . . . . . . . . . . . . . . . . . . . 264.3.1 University of Washington (UW) Probcast Group Data 264.3.2 U.S. Global Historical Climatology Network . . . . . 284.3.3 PRISM Climate Group Data . . . . . . . . . . . . . . 334.4 Bayesian Spatial Prediction . . . . . . . . . . . . . . . . . . . 344.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364.5.1 The Spatio–temporal Trend . . . . . . . . . . . . . . 374.5.2 Spatial Correlation in the Residuals . . . . . . . . . . 394.5.3 Spatial Prediction . . . . . . . . . . . . . . . . . . . . 414.6 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . 435 Ensemble Modelling . . . . . . . . . . . . . . . . . . . . . . . . 455.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455.1.1 Contributions . . . . . . . . . . . . . . . . . . . . . . 465.2 The Bayesian Ensemble Melding Model . . . . . . . . . . . . 475.3 Inference for the BEM . . . . . . . . . . . . . . . . . . . . . . 515.3.1 A Stochastic-Partial Differential Equation Model Al-ternative . . . . . . . . . . . . . . . . . . . . . . . . . 525.3.2 Spatial Prediction . . . . . . . . . . . . . . . . . . . . 555.4 Ensemble Modelling of Temperatures in the Pacific Northwest 575.4.1 Data Description . . . . . . . . . . . . . . . . . . . . 575.4.2 Inference . . . . . . . . . . . . . . . . . . . . . . . . . 585.5 Discussion and Future Work . . . . . . . . . . . . . . . . . . 646 Ensemble Forecaster . . . . . . . . . . . . . . . . . . . . . . . . 666.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 666.2 Bayesian Model Averaging . . . . . . . . . . . . . . . . . . . 666.3 Dynamic Bayesian Ensemble Forecaster . . . . . . . . . . . . 696.3.1 Decision Making Ideas . . . . . . . . . . . . . . . . . 696.4 DBEM Forecaster . . . . . . . . . . . . . . . . . . . . . . . . 716.4.1 A DBEM Forecaster Algorithm . . . . . . . . . . . . 716.5 An Empirical Assessment of the DBEM . . . . . . . . . . . . 74viTable of Contents6.5.1 Forecasting Temperature . . . . . . . . . . . . . . . . 746.5.2 Forecasting Temperature Anomalies . . . . . . . . . . 816.6 Discussion and Future Work . . . . . . . . . . . . . . . . . . 867 Determinantal Point Processes . . . . . . . . . . . . . . . . . 897.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 897.2 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 917.3 k-DPPs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 958 Design of Monitoring Networks . . . . . . . . . . . . . . . . . 988.1 Importance of Designing Monitoring Networks . . . . . . . . 988.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 1008.3 A Review of Design Strategies . . . . . . . . . . . . . . . . . 1018.3.1 Space-Filling Designs . . . . . . . . . . . . . . . . . . 1028.3.2 Entropy-Based Designs . . . . . . . . . . . . . . . . . 1038.4 k-DPP Design . . . . . . . . . . . . . . . . . . . . . . . . . . 1108.5.1 k-DPP Sampling Design Strategy . . . . . . . . . . . 1118.7 Comparing k-DPP and SF Sampling Designs . . . . . . . . . 1148.8 Comparing k-DPP and Entropy-Based Designs for Monitor-ing Temperature Fields . . . . . . . . . . . . . . . . . . . . . 1218.9 Discussion and Future Work . . . . . . . . . . . . . . . . . . 1269 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . 1279.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 1289.1.1 Nonstationarity in INLA-SPDE: Inference for the BEM 1289.1.2 Modified DBEM for Forecasting . . . . . . . . . . . . 1299.1.3 Comparison of k-DPP Design with the GeneralizedRandom Tessellation Stratified (GRTS) Design . . . . 1299.1.4 Inference about k-DPP Design Parameters . . . . . . 129Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130viiTable of ContentsAppendicesA Miscellaneous . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145B INLA-SPDE Example . . . . . . . . . . . . . . . . . . . . . . . 146viiiList of Tables4.1 Empirical coverage probabilities and prediction summaries forthe different methods considered: Bayesian spatial predic-tion (BSP), Bayesian spatial prediction with PRISM (BSP –PRISM), and ordinary kriging. The overall MSPE refers tothe mean squared prediction errors (◦C2) averaged over spaceand time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426.1 Forecasting summaries for three selected days February 20th,April 7th and June 5th, using a training set of 25 days. Sum-maries include the root mean squared forecast error (RMSFE),mean absolute error (MAE), the empirical coverage and theaverage length of the 95% credible intervals (CI) for the dif-ferent methods considered: the dynamic Bayesian ensemblemodel and the Bayesian model averaging (BMA). There area total of 109 available stations on Feb 20th, and a total of105 on Apr 7th and June 5th. . . . . . . . . . . . . . . . . . . 756.2 Forecasting summaries across all available time points us-ing a training set of 25 days. There are a total of 77 timepoints. Summaries include the root mean squared forecasterror (RMSFE), mean absolute error (MAE), the empiricalcoverage and the average length of the 95% credible intervalsfor the different methods considered: the dynamic Bayesianensemble model (DBEM) and the Bayesian model averaging(BMA). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76ixList of Tables6.3 Forecasting summaries across the different months, using atraining set of 25 days. Summaries include the root meansquared forecast error (RMSFE), mean absolute error (MAE),the empirical coverage and the average length of the 95% cred-ible intervals (CI) for the different methods considered: thedynamic Bayesian ensemble model and the Bayesian modelaveraging (BMA). . . . . . . . . . . . . . . . . . . . . . . . . 776.4 Summary statistics for the temperature measurements (◦C)over space. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 816.5 Forecasting summaries for three selected days Feb 20th, Apr7th and Jun 5th, using a training set of 25 days. Summariesinclude the root mean squared forecast error (RMSFE), meanabsolute error (MAE), the empirical coverage and the aver-age length of the 95% credible intervals (CI) for the differentmethods considered: the dynamic Bayesian ensemble modeland the Bayesian model averaging (BMA). There are a totalof 109 available stations on Feb 20th, and a total of 105 onApr 7th and June 5th. . . . . . . . . . . . . . . . . . . . . . . 836.6 Forecasting summaries across time using a training set of 25days. Summaries include the root mean squared forecast error(RMSFE), mean absolute error (MAE), the empirical cover-age and the average length of the 95% credible intervals forthe different methods considered: the dynamic Bayesian en-semble model (DBEM) and the Bayesian model averaging(BMA). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 856.7 Forecasting summaries across the different months, using atraining set of 25 days. Summaries include the root meansquared forecast error (RMSFE), mean absolute error (MAE),the empirical coverage and the average length of the 95% cred-ible intervals (CI) for the different methods considered: thedynamic Bayesian ensemble model and the Bayesian modelaveraging (BMA). . . . . . . . . . . . . . . . . . . . . . . . . 85xList of TablesB.1 Posterior summaries for the parameters of the BEM model forthe artificial data, including posterior mean and 95% credi-ble intervals. Note that all parameters were reasonably wellestimated. Under a fairly weak prior specification, the mea-surement error precision was underestimated. . . . . . . . . . 149xiList of Figures4.1 Map of the 833 monitoring stations. Map created using Rlibrary ggmap with tiles by Stamen Design, under CC BY3.0, and data by OpenStreetMap, under CC BY SA. . . . . . 274.2 Data availability. Notice the unsystematic missingness of thedata pattern. . . . . . . . . . . . . . . . . . . . . . . . . . . . 284.3 Locations of 97 stations in the Pacific Northwestern area con-sidered in this study. Map created using R library ggmap withtiles by Stamen Design, under CC BY 3.0, and data by Open-StreetMap, under CC BY SA. . . . . . . . . . . . . . . . . . . 294.4 Averaged site temperatures for different months. Notice thedifferent patterns of temperature variation across the region.Map created using R library ggmap with tiles by Stamen De-sign, under CC BY 3.0, and data by OpenStreetMap, underCC BY SA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304.5 Binned empirical semivariograms with Monte Carlo envelopesin shaded area. These envelopes are obtained by permuta-tions of the data across the sampling locations, and indicateregions of uncorrelated data. . . . . . . . . . . . . . . . . . . 314.6 Latitude and longitude effects changing over time. The shadedarea represents 95% confidence intervals for these effects. . . . 324.7 Locations of the stations selected for training and for valida-tion purposes. Map created using R library ggmap with tilesby Stamen Design, under CC BY 3.0, and data by Open-StreetMap, under CC BY SA. . . . . . . . . . . . . . . . . . . 37xiiList of Figures4.8 Investigating effect of projected latitude (centred) on temper-atures considering different interaction scenarios for longitude(eastern, western), elevation (high, low) and time (February,June). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384.9 Normal quantile-quantile plot of the residual temperaturesof a linear model with spatio-temporal mean function as inEquation 4.3. . . . . . . . . . . . . . . . . . . . . . . . . . . 384.10 Biorthogonal grid for the thin-plate spline characterizing thedeformation of the G-space, using NCDC data set. Solid lineindicates contraction and dashed lines indicate expansion. . . 394.11 Deformation assuming different spline smoothing λ values.Note that when λ = 0, no smoothing is applied. . . . . . . . . 404.12 Estimated dispersions after SG approach in G-space and in D-space. The solid line represents a fitted exponential variogram. 414.13 Map of mean squared prediction errors (◦C2), MSPE, aver-aged over time for the different methods considered: Bayesianspatial prediction (BSP), Bayesian spatial prediction withPRISM (BSP – PRISM), and ordinary kriging (OK). Thered triangles represent the stations used for training purposes.Map created using R library ggmap with tiles by Stamen De-sign, under CC BY 3.0, and data by OpenStreetMap, underCC BY SA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424.14 Mean squared prediction errors (◦C2) averaged across un-gauged stations and across time for the different methodsconsidered: Bayesian spatial prediction (BSP), Bayesian spa-tial prediction with PRISM (BSP - PRISM), and ordinarykriging (OK). . . . . . . . . . . . . . . . . . . . . . . . . . . . 435.1 Map of the 120 stations used for illustration of the BEMmethodology. Map created using R library ggmap with tilesby Stamen Design, under CC BY 3.0, and data by Open-StreetMap, under CC BY SA. . . . . . . . . . . . . . . . . . . 58xiiiList of Figures5.2 Pearson’s correlation coefficients between measurements andmodel outputs. . . . . . . . . . . . . . . . . . . . . . . . . . . 595.3 Triangulation for the BEM data available on Feb 20th. Themesh comprises of 591 edges and was constructed using trian-gles that have a minimum angle of 25, maximum edge lengthof 1◦ within the spatial domain and 2◦ in the extension do-main. The maximum edges were chosen to be less than theapproximate range of the process. The spatial domain was ex-tended to avoid a boundary effect. The monitoring stationsare highlighted in red. . . . . . . . . . . . . . . . . . . . . . . 625.4 Posterior mean for the calibration parameters (aj and bj) foreach member of the ensemble j = 1, . . . , 5 across time (in days). 635.5 Approximate marginal posterior distributions for calibrationparameters (aj and bj) and variances σ2δjfor each member ofthe ensemble j = 1, . . . , 5 for three selected days: February20th, April 7th, and June 5th. . . . . . . . . . . . . . . . . . . 646.1 Mean squared forecast error (MSFE) across space for fore-casts from February 20th to June 30th using the dynamicBayesian ensemble model (DBEM) and the Bayesian modelaveraging (BMA). Both methods assumed a training set of 25days. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 786.2 95% credible intervals of the forecasts for days Feb 20th, Apr7th and June 5th for the different methods considered: thedynamic Bayesian ensemble model (DBEM) and the Bayesianmodel averaging (BMA). The number of stations where theforecasts were obtained were 109, 105 and 105, respectively.The dots represent the true measurement of temperature acrossthe available stations for the selected days. To facilitate vi-sualization, we also coloured the dots based on the differentmethodologies. Note the overall larger error bars observed forthe BMA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79xivList of Figures6.3 Mean squared forecast error (◦C), MSFE, across space forforecasts across the month of June for different number oftraining days using the dynamic Bayesian ensemble model(DBEM) and the Bayesian model averaging (BMA). . . . . . 806.4 Monthly boxplots of observed temperatures over space. . . . . 806.5 Posterior means (solid line) for the mean parameters of theunderlying random field over time. The gray shaded regionrepresent the 95% credible intervals. Note the increase inuncertainty for later days in the series. . . . . . . . . . . . . . 826.6 95% credible intervals of the forecasts for days Feb 20th, Apr7th and June 5th for the different methods considered: thedynamic Bayesian ensemble model (DBEM) and the Bayesianmodel averaging (BMA). The number of stations where theforecasts were obtained were 109, 105 and 105, respectively.The dots represent the true measurement of temperature acrossthe available stations for the selected days. To facilitate vi-sualization, we also coloured the dots based on the differentmethodologies. Note the overall larger error bars observed forthe BMA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 846.7 Mean squared forecast error (◦C), MSFE, across space forforecasts from February 20th to June 30th using the dynamicBayesian ensemble model (DBEM) and the Bayesian modelaveraging (BMA). Both methods assumed a training set of 25days. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 867.1 A set of points in the plane drawn from (left) a DPP char-acterized by an L-ensemble with Gaussian kernel and (right)the same number of points sampled independently. Note theclumping associated to the randomly sampled points in con-trast to the more spatially balanced set of points sampledfrom the DPP. . . . . . . . . . . . . . . . . . . . . . . . . . . 94xvList of Figures8.1 Tropical rainforest data. Locations of Beilshmiedia pendulatrees and elevation (metres above sea level) in the [700, 1000]×[0, 200] metres window of a survey plot in Barro Island. Colouredbackground corresponds to the variation of elevation in thatwindow, as seen in the scale on the right of the plot. Dataavailable from spatstat R package. . . . . . . . . . . . . . . 1148.2 Locations of 20 Beilshmiedia pendula trees selected via aspace-filling and a 20-DPP design strategies. Coloured back-ground corresponds to the variation of elevation (metres abovesea level). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1158.3 Empirical (solid line) and theoretical (dashed line) Ripley’sK. Note that the SF design shows more spatial regularity thanthe DPP design. . . . . . . . . . . . . . . . . . . . . . . . . . 1168.4 Realization of a Mate´rn random field with mean zero, par-tial sill σ2 = 4, range φ = 1 and smoothness ν = 2, in a[0, 10]× [0, 10] domain. Coloured background corresponds tothe observed values of the field (no units associated to them),as seen in the scale on the right of the plot. . . . . . . . . . . 1178.5 Example of sampling locations using a 40-DPP design, ran-dom (uniform) selection, and a space-filling design. . . . . . . 1188.6 Box-plots of posterior means for the model parameters Ψ =(µ, σ2, φ) after repeatedly selecting 40 locations using threedifferent strategies: a 40-DPP with a Gaussian kernel, ran-dom uniform selection, and a space-filling design. This pro-cess was repeated 100 times. The red horizontal lines repre-sent the true values of the parameters. . . . . . . . . . . . . . 1198.7 Box-plots of posterior standard deviations for the model pa-rameters Ψ = (µ, σ2, φ) after repeatedly selecting 40 locationsusing three different strategies: a 40-DPP with a Gaussiankernel, random uniform selection, and a space-filling design.This process was repeated 100 times. . . . . . . . . . . . . . . 120xviList of Figures8.8 Comparison of entropy-based and DPP design strategies. Theentropy solution yielded a log-determinant of 78.70 for the re-stricted conditional hypercovariance matrix for the ungaugedsites considering the optimal set of locations. Here, we il-lustrate the solution of a 10-DPP sampling design strategy.Note the similarity in the choice of new locations across bothdesigns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1248.9 Current maximum log-determinants of the restricted condi-tional hypercovariance matrix for the ungauged sites whenincreasing the number of simulations of the 10-DPP. . . . . . 1258.10 Log-determinants of the restricted conditional hypercovari-ance matrix for the ungauged sites varying the number ofsimulations of a 10-DPP. The gray line represents the log-determinant for the optimal entropy solution. . . . . . . . . . 125B.1 Triangulation for the artificial data. The mesh comprises of486 edges and was constructed using triangles that have aminimum angle of 25, maximum edge length of 0.1 withinthe spatial domain and 0.2 in the extension domain. The 100artificial monitoring locations are highlighted in red. . . . . . 147B.2 Posterior distributions for the mean parameters of the under-lying random field. The gray line represents the true value. . 150B.3 Posterior distribution for the measurement error variance i.e,σ2e . The gray line represents the true value. . . . . . . . . . . 150B.4 Posterior distributions for the additive calibration parametersfor each member of the ensemble i.e, aj , for j = 1, . . . , 5. Thegray line represents the true value. . . . . . . . . . . . . . . . 151B.5 Posterior distributions for the multiplicative calibration pa-rameters for each member of the ensemble i.e, bj , for j =1, . . . , 5. The gray line represents the true value. . . . . . . . 152B.6 Posterior distributions for the variance parameters for eachmember of the ensemble i.e, σ2δj , for j = 1, . . . , 5. The grayline represents the true value. . . . . . . . . . . . . . . . . . . 153xviiList of FiguresB.7 Posterior distributions for covariance parameters, namely, smooth-ness, variance and range, respectively. The gray line repre-sents the true value. . . . . . . . . . . . . . . . . . . . . . . . 153xviiiAcknowledgmentsI am deeply thankful to my supervisors Prof. Jim Zidek and Dr. Nhu Le,for the opportunity of working with them and for their continuous supportthroughout the development of this thesis.I would also like to thank Prof. Alexandre Bouchard-Coˆte´ for the in-valuable discussions about my work and about machine learning (ML) ingeneral. Thanks to the opportunity of being part of the ML reading groupmeetings. This has been instrumental to the broadening of my researchinterests.Thanks to STATMOS/PIMS-UBC for the financial support to attendvaluable workshops that greatly contributed to the development of this work.I am also thankful to my dear friends in UFRJ and UBC who believedin and gave me continuous incentive to carry on this journey, despite myself-criticism. Special thanks goes to my mum, for her unconditional loveand support.xixDedicationA` minha ma˜exxChapter 1IntroductionSpatial statistics focuses on the modelling of processes where geographi-cal information is of interest or relevant to understand an underlying physicalphenomenon. The areas of application of these methodologies are vast, suchas in environmental sciences, forestry, and agriculture. It is often essentialthat we understand these phenomena to better understand their effects innature. Notably in environmental science, there has been a growing needfor understanding the changes in the Earth’s climate as well as increas-ing concerns due to their potential impact on human health. Our work ismainly motivated by these concerns. We focus on diverse objectives relatedto environmental sciences, which we describe below.This thesis starts by providing some background on spatial statistics inChapter 2. In Chapter 3, we provide an overview of approximate methodsfor performing Bayesian inference.In Chapter 4, we analyze temperature fields in the Pacific Northwesternregion. The importance of modelling temperature fields goes beyond theneed to understand a region’s climate and serves too as a starting point forunderstanding their socioeconomic, and health consequences. Particularlydue to the topography of this region, temperature modelling has been rec-ognized to be challenging (Mass, 2008; Salathe´ et al., 2008; Kleiber et al.,2013), and demands flexible spatio-temporal models that are able to handlenonstationarity and changes in trend.Our main message is on how appropriately modelling the spatio-temporalmean can help resolve complex patterns for nonstationarity and improve spa-tial prediction. This is often achieved with an exploratory analysis to betterunderstand the localized changes in trend, instead of simply focusing on themodelling of the spatial covariance structure. We argue that carefully mod-1Chapter 1. Introductionelling the spatio-temportal mean is needed to better represent interestingsmaller-scale trends, especially for regions with a complex terrain like thePacific Northwest, which may not be captured by global climate models.Another contribution is the ability to accommodate features in the meanthat vary over space by extending the spatio-temporal model proposed inLe and Zidek (1992). This methodology is flexible and able to accommodatenonstationarity.We then introduce two comparable strategies for performing spatial pre-diction. The first is based on the extended Bayesian spatial predictionmethod after an exploratory analysis to better understand the local changesin trend, and the realization of the need to account for interacted spatio-temporal features in the mean. The second is based on tackling the anoma-lies of expected climate in the Pacific Northwest, based on the average val-ues of temperature computed over a 30-year range (1981-2010), providedby PRISM Climate Group (Daly et al., 1994, 1997, 2000). For the latterstrategy, we observed a higher mean squared prediction error for out-of-the-sample monitoring stations.Subsequently, in Chapter 5, we explore the data fusion problem, whereour goal is to combine information from multiple sources that might havebeen measured at different spatial scales. In weather studies, data measure-ments are often supplemented by information brought by computer modeloutputs. These computer models simulate physical phenomena in order bet-ter understand complex physical systems. We provide a description of theBayesian Ensemble Melding model (BEM) methodology introduced by Liu(2007) following Fuentes and Raftery (2005). The main idea lies in linkingprocesses on mismatched scales through an underlying “true” process. Oneof the main disadvantages of these methodologies is the computational bur-den faced while performing inference, as noted in many applications (Swalland Davis, 2006; Smith and Cowles, 2007; Foley and Fuentes, 2008). Here,we introduce a scalable inference methodology alternative for the BEM us-ing integrated nested Laplace approximations (INLA) (Rue et al., 2009).Following Lindgren et al. (2011), we take advantage of a Markov represen-tation of the Mate´rn covariance family in a continuous space. We illustrate2Chapter 1. Introductionthe methodology for combining an ensemble of computer model outputs withdata measurements of temperature across the Pacific Northwest.Then in Chapter 6, we introduce a dynamic strategy with the objec-tive of performing forecasting that builds on the BEM model’s ability toaccommodate time. The methodology uses an INLA framework and is com-putationally efficient. We provide a comparison of the DBEM forecastingstrengths with a Bayesian Model Averaging (BMA) (Raftery et al., 2005)alternative. The DBEM methodology is based on a mixture of posteriordistributions in a training set over time. Our empirical studies indicate thatthe DBEM is able take advantage of this smoothing by borrowing strengthof nearby sites, yielding less uncertainty in forecasting intervals, but it un-derperforms the BMA under a lot of uncertainty a posteriori.Afterwards, in Chapter 8 we stress how monitoring networks play animportant role in surveillance of environmental processes. We introduce aflexible monitoring network design strategy based on k-determinantal pointprocesses (DPP) (Kulesza and Taskar, 2012). An overview of DPPs is pro-vided in Chapter 7. The k-DPP design is able to yield a spatially balanceddesign by imposing repulsion on the distances between existing locations andhence avoiding spatial clumping, but also has the ability to assess similaritybetween the potential locations should there be extra sources of informationknown to influence the underlying process of interest. We describe how themethodology is able to handle both designing and redesigning of a monitor-ing network.Moreover, our empirical studies illustrate how the k-DPP sampling de-sign strategy can be used as a spatially balanced sampling design alternativeto the space-filling design (Royle and Nychka, 1998). The main advantage isdue to to the fact that it is essentially a randomized design strategy, whichcan be useful to help mitigate selection bias risks. Due to its flexibility, asampling k-DPP design strategy is particularly suited to the design of mo-bile networks. Another notable characteristic of the k-DPP design objectiveis that it is constructed in such way that is strongly similar to the entropy-based design (Caselton et al., 1992), and can be viewed as a randomizedversion of this design. We introduce a sampling strategy based on k-DPP3Chapter 1. Introductionthat can be particularly useful to approximate the optimal solution for theentropy-based design when the number of combinations is prohibitive, dueto the NP-hardness of this design criterion (Ko et al., 1995).Finally, Chapter 9 reflects on what we have learned in the work reportedin this thesis and in turn the future research to which our work leads.4Chapter 2Spatial StatisticsSpatial statistics comprises a wide range of statistical methods intendedfor the analysis of georeferenced data. Rapid advances in technology con-tribute to the ease of collecting spatially referenced data. In science, spatialdata arise in many applications, including environmental sciences, epidemi-ology, agriculture, and image processing, to name just a few. Notably inenvironmental science, there is a need for understanding the changes in theEarth’s climate as well as their impact in human health. Therefore, spatialstatistics studies are not only required from a scientific perspective, but alsofor regulatory purposes. The spatial statistics literature is often divided intothree main branches: geostatistics, lattice data and point patterns (Cressie,1993; Cressie and Wikle, 2011; Banerjee et al., 2014).In geostatistical studies, the idea is that there exists an underlying spa-tial process that governs a particular physical phenomenon, but data areonly observed at a finite set of locations. The locations, however, are con-sidered fixed. This theory is often used to understand weather phenomena,such as temperature fields, which will be the focus of Chapter 4. In suchapplications, a monitoring network refers to the weather stations at whichthe data are recorded. An overview is provided in Section 2.1.Furthermore, the analysis of lattice data refers to data that are obtainedon subregions that make up a larger space. An example would be pixel valuesfrom remote sensing. Despite the terminology, data need not be observed inregularly spaced locations. For instance, in epidemiological studies, there isan interest in mapping occurrences of diseases, but the information is oftengathered in a provincial or state scale.Finally, point patterns are associated with studies in which the main in-terest lies in the location of event occurrences, and assessing whether there52.1. Overview of Geostatisticsmay be a systematic pattern. Unlike geostatistical studies, there is random-ness associated to the locations. A brief overview is provided in Section2.2.2.1 Overview of GeostatisticsGeostatistics is a branch of spatial statistics in which inference for a spa-tially continuous phenomenon is based on spatially discrete sampled data, forinstance, measurements of temperature obtained at several different weatherstations.In geostatistics, the interest is in a latent continuous process {Y ∗(s) :s ∈ Rd}, where d is the dimension of the space. Geostatistical analysis,however, is based on a real-valued process {Y (s) : s ∈ G}, which is a partialrealization of the process on the whole space, where G ⊂ Rd. The data comefrom measurements at each location si, denoted as Y (si), i = 1, . . . , n.A stochastic process is assumed to be a spatial Gaussian process if,for a finite collection of locations si, i = 1, . . . , n, the joint distributionof Y = (Y (s1), . . . , Y (sn)) is multivariate Gaussian. Gaussian processeshave been central in spatial statistics, particularly in geostatistical studies.Gelfand and Schliep (2016) provides a description of how Gaussian processeshave become “the most valuable tool in geostatistical modelling”. One ofthe main advantages is that it suffices to describe a mean and covariancestructure and, additionally, the marginal and conditional distributions areknown.Having said this, there are other works aimed at addressing scenarios inwhich the normality assumption is not realistic, such as transformations ofthe data as in De Oliveira et al. (1997), the flexibility of generalized linearmodels in a geostatistical framework proposed by Diggle et al. (1998), oreven alternative stochastic representations introduced by a scale mixing ofa Gaussian process as in Palacios and Steel (2006).As pointed out in Diggle and Ribeiro Jr (2007), the basic geostatistical62.1. Overview of Geostatisticsmodel assumes that for each location s,Y (s) = x(s)>β + η(s) + (s), (s) ∼ N(0, τ2), (2.1)where the mean surface µ(s) = x(s)>β is a linear function of some spatially-referenced explanatory variables stored in the vector x(s)>, η(s) is a second-order stationary process with zero mean and variance σ2, as well as anisotropic correlation function, and (s) is an uncorrelated Gaussian processwith zero mean and variance τ2. The variance τ2 is referred to as the nuggeteffect and interpreted as measurement error and small-scale variation. Thequantity τ2 +σ2 is known as the sill whereas the variance σ2 is known as thepartial sill. In the above and throughout this thesis, > denotes transpose.A random process η(s) is second-order stationary ifE[η(s)] = µ(s) = µ, and (2.2)Cov(η(s + h), η(s)) = C(si − sj), (2.3)that is, µ(s) is constant for all s ∈ D and the covariance between any twopoints s and s + h in D depends only in the separation vector h. C(·) isknown as covariance function.Furthermore, a random process η(s) is intrinsic stationary ifE[η(s + h)− η(s)] = 0, and (2.4)Var[η(s + h)− η(s)] = 2γ(h), (2.5)for all s and s + h ∈ D. The quantity 2γ(·) is known as a variogramwhereas γ(·) is known as a semivariogram, which are useful to describespatial dependency. Note that Cov(η(s + h), η(s)) can be written asCov(η(s + h), η(s)) = σ2ρ(||h||), (2.6)where ρ(||h||) = Corr{η(s), η(s + h)}. Hence, another way of defining the72.1. Overview of Geostatisticssemivariogram is through the correlation function ρ, whereγ(h) = σ2(1− ρ(h)), (2.7)where ρ(h) = Corr{η(s), η(s + h)}.When the covariance function depends only on the distance between thesites, that is,Cov(η(s + h), η(s)) = C(||h||), (2.8)the process is known as isotropic. An isotropic and intrinsic stationaryprocess is known as homogenous process.In the classical spatial textbooks Cressie (1993); Cressie and Wikle (2011);Diggle and Ribeiro Jr (2007), there are several examples of isotropic para-metric covariance functions, such as the Mate´rn family. In the Mate´rn fam-ily, the covariance function isC(u) = σ2{2κ−1Γ(κ)}−1(u/φ)κKκ (u/φ) , (2.9)where u is the Euclidean distance between two locations, Γ denotes thegamma function, Kκ is the modified Bessel function of order κ > 0. Theparameters σ2 and φ > 0 are the partial sill and scale, respectively.The smoothness of the process is governed by the parameter κ. Whenκ = 0.5, the Mate´rn covariance function reduces to the exponential, definedasC(u) = σ2 exp (u/φ) . (2.10)It should be noted that the isotropy assumption is usually unrealistic inenvironmental applications and there are numerous works in the literatureaimed at handling nonstationarity. An overview of such methods is providedin Section 2.1.1.One important objective in geostatistics is to perform spatial predictionat unobserved locations. Kriging methods provide the best linear unbiased82.1. Overview of Geostatisticsestimate of the field at unobserved locations. A detailed description of Krig-ing methods can be found in the classical spatial textbooks Cressie (1993);Diggle and Ribeiro Jr (2007); Cressie and Wikle (2011). More recently, therehas been an increased interest in combining multi-source spatially referenceddata. In studies about the weather, for instance, besides the data obtainedfrom monitoring stations, outputs from deterministic climate models couldprovide additional information regarding large-scale variations about theunderlying phenomenon and could ultimately improve spatial prediction.Since the climate model outputs and monitoring data are often on mis-matched scales, there has been a growing interest in studying techniques forhandling this change of support problem. This will be the central focus ofChapter 5.Another important objective in geostatistical studies is deciding whereto position a monitoring station. The design problem will be the centralfocus of Chapter 8.2.1.1 Handling NonstationarityIn environmental applications, the isotropy assumption is often unre-alistic and it is crucial to handle nonstationarity in the spatial modelling.Recently, many techniques have been developed. A simple approach is toconsider locally stationary models, based on the idea that the effects ofnonstationarity in smaller spatial domains may be negligible. Haas (1990)suggests a moving-window technique, based on a circular subregion to wherethe inference is restricted. The idea was later extended to a spatio-temporalcase (Haas, 1995).Higdon et al. (1999) propose a model based on a moving average specifi-cation of a Gaussian process whereas Fuentes and Smith (2001) and Fuentes(2001, 2002) consider a class of nonstationary processes based on mixture oflocal stationary processes. Unlike the moving-window approaches, the modelis defined on the whole region of interest, though locally it still behaves likea stationary process.Another idea is to assume that after some deformation of the space, the92.1. Overview of Geostatisticsprocess may then be assumed stationary. Of particular note is the Sampson-Guttorp approach (Sampson and Guttorp, 1992) to spatial deformation,which considers models of the formCov(Y (s1), Y (s2)) = 2ρθ(||f(s1)− f(s2)||), (2.11)where f is a smooth nonlinear map G → D from the geographical G-space(G ⊂ Rd) to the deformed D-space (G ⊂ Rd).The locations of the sites in the D-space are obtained via a multidimen-sional scaling (MDS) algorithm. A mapping of the sites from the G-spaceinto the D-space is obtained through the minimization problem of the fol-lowing criterion over all monotonic functions δ:minδ∑i<j [δ(dij)− hij ]2∑i<j h2ij, (2.12)where dij and hij denote the observed dispersion and the distance betweenbetween sites i and j in the D-space, respectively.Once the locations of the sites are obtained in the D-space, Sampsonand Guttorp (1992) use thin-plate splines to obtain a smooth mapping ofthe sites from the G-space into the D-space and the δ function is replacedby a smooth function g such that dij ≈ g(hij). It is then possible to obtainestimates of realizations of the spatial process at ungauged locations by firstsmoothly mapping them onto the D-space and subsequently using standardstationary modeling tools.Damian et al. (2001) extended the Sampson and Guttorp (1992) ap-proach in a Bayesian framework, where the locations in the deformed spacedand the unknown parameters are estimated jointly, thus obtaining a smoothextrapolation of the deformed space to the whole region of interest. Schmidtand O’Hagan (2003) then proposed a similar spatial deformation methodbased on a Bayesian model, though the mapping of sites is handled in asingle framework, and unlike Sampson and Guttorp (1992), their predic-tive inferences take into account the uncertainty in the mapping. Schmidtand O’Hagan (2003) argue that the Sampson and Guttorp (1992) method102.2. Overview of Spatial Point Processesdoes not account for uncertainty about the mapping since their predictionmethod is based on some fixed locations in the distorted space.Other approaches consider the idea of decomposing the covariance func-tion for nonstationary processes, such as the work of Nychka and Saltzman(1998), which considered empirical orthogonal functions, and Nychka et al.(2002), based on a wavelet basis decomposition. More recently, Bornn et al.(2012) proposed a dimension expansion approach, based on the idea thatthe underlying field can be more straightforwardly described in a higherdimension.2.2 Overview of Spatial Point ProcessesA point process X is a stochastic mechanism whose realizations consistof countable sets of points, often referred to as events or point patterns.When this process generates a countable set of events in a limited regionD ⊂ Rk, it is called a spatial point process. In practice, the locations wherethose events occur are of special importance and are modelled as randomvariables. In particular, the focus is often on understanding and assessingpatterns in the locations of these events.The analysis of spatial point processes can be seen in various scientificapplications. For instance, often in forestry applications there may be aninterest in determining whether there is a pattern in the locations of a certainspecie of trees in a forest, or in monitoring forest wildfires. Another commonarea where the analysis of spatial point process is found is in epidemiologyapplications, where the goal could be in monitoring whether there exists acluster in the locations of occurrence of a certain disease.Spatial point processes are usually classified based on the pattern of thepoints. A completely random process is when there is no obvious patternor structure in the points. These processes are often modelled using a ho-mogenous Poisson process, and are sometimes referred to as complete spatialrandomness. The notion of homogeneity is due to the assumption that thenumber of points falling in a region B is proportional to its area (or vol-ume) |B| on average. More specifically, a homogeneous Poisson process X112.2. Overview of Spatial Point Processesin D ⊂ Rk with intensity λ > 0 satisfies the following properties:• The random variable Y (B) representing the number of events B ⊂ Dfollows a Poisson distribution with mean µ(B) = λ|B|.• For non-overlapping regions B1, . . . , Bm, the number of events in eachregion are mutually independent random variables.For more general inhomogenous Poisson process, the random variableY (B) representing the number of events B ⊂ D follows a Poisson distribu-tion with mean µ(B) =∫B λ(x)dx.When a pattern does exist, it may due to a clustering of events, in whichcase it would be reasonable to assume that the occurrence of an event in aregion is associated with occurrence of events nearby. Those processes areoften referred to as aggregative point process.On the other hand, when events are rather evenly spaced, it is reasonableto assume that the occurrence of an event in a region is actually preventingthe occurrence of events nearby, that is, repelling events. Those processes areoften called repulsive or regular point processes. In Chapter 7, we introduceone process of such type called called determinantal point processes.In order to describe a spatial point process, we need to define its firstand second order properties. The intensity function is defined asλ(x) = lim|dx|→0{E[N(dx)]|dx|}, (2.13)where dx is an infinitesimal region that contains the point x, N(dx) denotesthe number of events in this infinitesimal region, and |dx| denotes the areaor volume of this infinitesimal region.An estimate of the intensity function can be obtained by assuming aparametric model on the intensity function or by using kernel density esti-mators (Diggle, 1985; Berman and Diggle, 1989; Bivand et al., 2013). In the122.2. Overview of Spatial Point Processesplane,λˆ(x) =1h2N∑j=1M( ||x−xj ||h)q(||x||) , (2.14)where xj ∈ {x1, . . . , xN} is an observed point, M is a bivariate symmetrickernel function, and h > 0 is the bandwidth controlling the amount ofsmoothing in the estimation. In practice, however, we only observe pointsin a window W ∈ D where the point pattern was observed. The number ofpoints in a circle centred on an point inside the window W is not observableif the circle extends beyond W , which creates an edge effect. Hence, q(||x||)is a border correction to compensate for the missingness due to these edgeeffects.The second-order intensity function is defined asλ(xi, xj) = lim|dxi|,|dxj |→0{E[N(dxi)N(dxj)]|dxi||dxj |}. (2.15)where xi and xj denote two events in D. The second-order intensity functionis related to the chances of any pair of events occurring in the vicinities ofxi and xj .When the spatial point process is stationary, its intensity function isconstant, i.e. the mean number of events per unit area is λ(x) = λ, and thesecond-order intensity function is reduced toλ2(x, y) ≡ λ2(xi − xj). (2.16)For a stationary, isotropic process the second-order intensity function isreduced toλ2(x, y) ≡ λ2(||xi − xj ||). (2.17)Another second-order property of a stationary process is given by the132.3. Lambert Conformal Conic Projectionfollowing functionK(t) = λ−1E(N0(t)), (2.18)where N0(t) is the number of events within distance t of an arbitrary event.Ripley’s K is often referred to as slightly modified estimator of the one inRipley (1988), given byKˆ(t) =1n(n− 1) |W |n∑i=1∑j 6=ieijI(dij ≤ t), (2.19)where |W | denotes the area of the observation window, eij is an edge correc-tion weight. The Ripley’s K is a very common exploratory methodology toempirically evaluate inter-point dependencies. It is often compared with thetheoretical K based on a Poisson process given by K(t) = pit2, and servesas a benchmark for no correlation (Baddeley et al., 2015).A thorough statistical description o these processes can be found in Rip-ley (1988); Møller and Waagepetersen (2004); Diggle (2013).2.3 Lambert Conformal Conic ProjectionIt is extremely important to take into consideration the curvature of theEarth, especially for large regions such as the ones studied in this thesis. Inorder to do so, instead of considering the geographical coordinates of theobservations, such as latitude and longitude, we obtain their correspondinglocations based on a particular cartographic projection called the LambertConformal Conic Projection.As described in Snyder (1987), a cartographic projection is a systematictransformation of the latitudes and longitudes of the observations on thesurface of the Earth on a plane. In particular, the Lambert ConformalConic Projection places a cone over the sphere of the Earth and projectsthe surface conformally (i.e. preserving angles locally) onto the cone. Thescale is true along either one or two standard parallels of latitude.Geographical coordinates, with λ and φ denoting longitude and latitude142.3. Lambert Conformal Conic Projectionrespectively, can be transformed into Lambert conformal conic projectioncoordinates byx = ρ sin θ (2.20a)y = ρ0 − ρ cos θ, (2.20b)whereρ =R× Ftann(pi/4 + φ/2)(2.21)θ = n(λ− λ0) (2.22)ρ0 =R× Ftann(pi/4 + φ0/2)(2.23)F = cosφ1 tann(pi/4 + φ1/2)/n (2.24)n =ln(cosφ1/ cosφ2)ln[tan(φ/4 + φ/2)/ tan(φ/4 + φ1/2)], (2.25)φ0 and λ0 denoting the reference latitude and longitude, R the radius ofthe Earth, φ1 and φ2 the standard parallels. If only one standard parallel isused, i.e. φ1 = φ2, then n = sin(φ1).15Chapter 3Approximate BayesianInferenceLet Y be a random vector with density function or probability massfunction given by p(Y|Ψ), where Ψ is a parameter vector characterizingthe distribution of Y. In a Bayesian framework, before observing data, aprobability distribution is assumed for Ψ, namely a prior distribution. Thisprior distribution refers to the initial uncertainty about Ψ. After observingrealizations of Y, namely the data y, and via the Bayes’ theorem, one canobtain the posterior distribution of Ψ as followsp(Ψ|y) = p(y|Ψ)p(Ψ)∫p(y|Ψ)p(Ψ)dΨ . (3.1)Statistical inference in a Bayesian framework is based on the posteriordistribution of Ψ, which contains all probabilistic information about Ψ. Inparticular, suppose that Ψ = (Ψ1, . . . ,Ψk)>. Marginal posterior distribu-tions p(ΨI), where I ⊆ {1, . . . , k}, are obtained byp(ΨI|y) =∫p(Ψ|y)dΨI, (3.2)where I denote the complement of I.A common challenge in Bayesian inference is that often it is not possibleto analytically solve the integral in equations (3.1) or (3.2). Numericalapproximations have been developed mostly in the 1980s, such as Naylorand Smith (1982), Tierney and Kadane (1986), Smith et al. (1987) andTierney et al. (1989). With the recent advances in computational methods,and perhaps motivated by the work of Gelfand and Smith (1990), Markov163.1. Laplace’s Methodchain Monte Carlo (MCMC) methods have been increasingly popular sincethe 1990s.However, for the class of latent Gaussian models, MCMC strategies pro-vide a significant computational burden and the need to deal with the com-mon issue of correlated parameters. In this thesis, we focus on the morerecent and well-established work of Rue et al. (2009), that provides a de-terministic attractive alternative to the computationally intensive MCMCmethods. We describe this methodology in Section 3.2. In the followingSection 3.1, we provide an overview of the Laplace’s method.3.1 Laplace’s MethodIn order to approximate unimodal posteriors, Tierney and Kadane (1986)proposed the use of Laplace method to obtain moments of smooth positivefunctions. For instance, let G be a smooth positive function on a parameterspace. The posterior mean of G(Ψ) can be written asE[G(Ψ)] =∫G(Ψ)p(Ψ|y)dΨ =∫ G(Ψ)l(Ψ; y)p(Ψ)dΨ∫l(Ψ; y)p(Ψ)dΨ, (3.3)where p(Ψ|y) denotes the posterior distribution of Ψ, l(Ψ; y) the likelihoodfunction, and p(Ψ) the prior distribution of Ψ.Firstly, let L(Ψ) = log(G(Ψ)l(Ψ; y)p(Ψ)) denote the logarithm of theintegrand of the numerator in equation 3.3. Expanding L(Ψ) around itsmode Ψ∗ givesL(Ψ) ≈ L(Ψ∗)− 12(Ψ−Ψ∗)>I−1(Ψ∗)(Ψ−Ψ∗), (3.4)where I−1(Ψ∗) = −[∇2L(Ψ)]−1 is minus of the inverse of the Hessian matrixof L(Ψ) evaluated at its mode Ψ∗. Hence,∫g(Ψ)l(Ψ; y)p(Ψ)dΨ ≈∫exp{L(Ψ∗)− 12(Ψ−Ψ∗)>I−1(Ψ∗)(Ψ−Ψ∗)}dΨ= (2pi)−m2 |I(Ψ∗)| 12 exp {L(Ψ∗)} , (3.5)173.2. Integrated Nested Laplace Approximationwhere m is the dimension of the parameter vector Ψ.Similarly, let L˜(Ψ) = log(l(Ψ; y)p(Ψ)) denote the logarithm of the in-tegrand of the denominator in equation 3.3. Expanding L˜(Ψ) around itsmode Ψ˜∗givesL˜(Ψ) ≈ L˜(Ψ˜∗)− 12(Ψ− Ψ˜∗)>I˜−1(Ψ˜∗)(Ψ− Ψ˜∗), (3.6)where I˜−1(Ψ˜∗) = −[∇2L(Ψ)]−1 is minus of the inverse of the Hessian matrixL˜(Ψ) evaluated at its mode Ψ˜∗. Hence,∫l(Ψ; y)p(Ψ)dΨ ≈∫exp{L˜(Ψ˜∗)− 12(Ψ− Ψ˜∗)>I˜−1(Ψ˜∗)(Ψ− Ψ˜∗)}dΨ= (2pi)−m2 |I˜−1(Ψ∗)| 12 exp{L˜(Ψ˜∗)}. (3.7)Finally, equation 3.3 can be approximated asE[G(Ψ)] ≈ |I−1(Ψ∗)| 12|I˜−1(Ψ∗)| 12exp{L(Ψ∗)− L˜(Ψ˜∗)}. (3.8)3.2 Integrated Nested Laplace ApproximationThe integrated nested Laplace approximation (INLA) method was pro-posed by Rue et al. (2009) as a way of performing approximate Bayesianinference for latent Gaussian models. The INLA package for computationin R is available for download at www.r-inla.org (last accessed on June 15,2016). In this section, we provide an overview of the INLA method.Consider the following hierarchical structureyi|x,θ ∼ p(yi|xi,θ) (3.9a)x|θ ∼ p(x|θ) (3.9b)θ ∼ p(θ), (3.9c)where y = (y1, . . . , ynd)′ denotes the observed data, x = (x1, . . . , xn)′ theelements of the latent field, θ a m-dimensional hyperparameter vector, whereyi|x,θ belongs to the exponential family of distributions and x|θ is assumed183.2. Integrated Nested Laplace Approximationto follow a Gaussian distribution. Furthermore, it is assumed that data areconditionally independent given the latent field x. Hence,p(y|x,θ) =n∏i=1p(yi|xi,θ). (3.10)The posterior distribution is then given byp(x,θ|y) ∝ p(θ)p(x|θ)n∏i=1p(yi|xi,θ). (3.11)The latent field x is often of high dimension, but it is assumed that theyhave conditional independence properties. In particular, Rue et al. (2009)consider models that satisfy the Markov property, i.e, xi|x−i only dependson a subset x−i, where x−i = (x1, . . . , xi−1, xi+1, . . . , xn). In this case, thelatent field is a Gaussian random field and its precision matrix is sparse,which is the key ingredient for computational efficiency.In particular, Rue et al. (2009) focus on approximating the followingmarginal distributionsp(xi|y) =∫p(θ|y)p(xi|θ,y)dθ (3.12)p(θj |y) =∫p(θ|y)dθ−j , (3.13)for every i = 1, . . . , n and j = 1, . . . ,m.When the integrals in (3.12) and (3.13) cannot be found analytically, ap-proximations for p(xi|θ,y) and p(θ|y) are obtained and denoted by p˜(xi|θ,y)and p˜(θ|y), respectively. Thus, one can construct the following nested ap-proximationsp˜(xi|y) =∫p˜(θ|y)p˜(xi|θ,y)dθ (3.14)p˜(θj |y) =∫p˜(θ|y)dθ−j . (3.15)Via numerical integration, an approximation can be obtained for p˜(xi|y)193.2. Integrated Nested Laplace Approximationas followsp˜(xi|y) ≈∑jp˜(xi|θj ,y)p˜(θj |y)∆j (3.16)where ∆j denotes the area weights associated with the evaluation points θj .Below, we discuss strategies for approximating p(θ|y), p(xi|θ,y), andfor the marginal likelihood of the data, p(y).Approximation of p(θ|y)The approximation of p(θ|y) is equivalent to the Laplace approximationfor the marginal posterior distribution originally proposed by Tierney andKadane (1986), and is given byp˜(θ|y) ∝ p(x,θ,y)p˜G(x|θ,y)∣∣∣∣x=x∗(θ), (3.17)where p˜G(x|θ,y) is the Gaussian approximation of the full conditional dis-tribution of x and x∗(θ) is the mode of the full conditional x for a givenθ. The proportional sign in (3.17) is due to the fact that the normalizingconstant of p(x,θ|y) is unknown.Note thatp(x|θ,y) ∝ exp{−12x′Q(θ)x +∑ilog(p(yi|xi,θ))}, (3.18)where Q(θ) denotes the precision matrix of the Gaussian latent field withhyperparameters θ. Obtaining the Taylor expansion of second order aboutx∗ givesp˜(x|θ,y) ∝ exp{−12x′(Q(θ) + diag(c))x + d′x}, (3.19)where c and d are the coefficients of the expansion, and diag(·) representsa diagonal matrix.In practice, Rue et al. (2009) note that there is no need to represent203.2. Integrated Nested Laplace Approximationp˜(θ|y) parametrically, rather, explore it sufficiently well in order to select“good” points for the numerical integration. That is done by locating themode θ∗ of p(θ|y) and exploring the log-posterior in order to select pointsin regions of high probability mass for use in the integration.However, should the number of hyperparameters be large, say greaterthan 5, this grid exploration strategy can be very inefficient, with a compu-tational cost that grows exponentially with the number of hyperparameters.For such cases, Rue et al. (2009) propose the use of a central compositedesign (CCD) strategy to help locate the integration points, which can thenbe used to estimate the curvature of p(θ|y) around its mode.Marginal Likelihood ApproximationAn approximation for the marginal likelihood can be obtained from(3.17),p˜(y) =∫p(x,θ,y)p˜G(x|θ,y)∣∣∣∣x=x∗(θ)dθ, (3.20)where p(x,θ,y) = p(θ)p(x|θ)p(y|x,θ).Approximation of p(xi|θ,y)There are different ways of approximating p(xi|θ,y) (Rue et al., 2009)as described below. Throughout the applications in this thesis, we opted fora Laplace approximation.• Gaussian approximation: The simplest method of approximatingp(xi|θ,y) is based on a Gaussian approximationp˜G(xi|θ,y) ≡ N(µi(θ), σ2i (θ)), (3.21)where µi(θ) and σ2i (θ) are the marginal mean and variance fromp˜G(x|θ,y). This approximation can be useful in some cases, but asnoted in Rue and Martino (2007), it may lead to errors in the locationor lack of skewness.213.2. Integrated Nested Laplace Approximation• Laplace approximation: Another way of approximating p(xi|θ,y)is based on a Laplace approximation:p˜LA(xi|θ,y) ∝ p(x,θ,y)p˜GG(x−i|xi,θ,y)∣∣∣∣x−i=x∗−i(xi,θ), (3.22)where x∗−i(xi,θ) is the mode of p(x−i|xi,θ,y) and p˜GG(x−i|xi,θ,y)denotes the Gaussian approximation for x−i|xi,θ,y. A drawback isthe need to evaluate p˜GG for every xi and θ, which is computationallyinefficient. A way of overcoming this is by avoiding the optimizationstep and instead obtaining an approximate mode as followsx∗−i(xi,θ) ≈ Ep˜G(x−i|xi). (3.23)• Simplified Laplace approximation: Rue et al. (2009) introducedyet another way of approximating p(xi|θ,y) based on expanding (3.22)about xi = µi(θ) up to third order, which allows for correcting theGaussian approximation p˜G(xi|θ,y) in terms of location and skewnessin a more computationally efficient way.22Chapter 4Temperature Fields in thePacific Northwest4.1 MotivationMeteorological variables are crucial to understand a region’s climate. Inparticular, a much discussed topic in recent years is that the Earth’s climatehas been changing: global average atmospheric and sea surface temperaturehave increased and extreme temperature events such as heat waves are nowmore frequent. This changing climate has led to concerns about its impacton human health.Extreme temperatures may contribute to cardiovascular and respiratorydiseases, especially among elderly people, as is outlined in A˚stro¨m et al.(2013). Li et al. (2012) studied the relationship between temperature andmorbidity due to extreme heat and revealed that a number of hospital ad-missions in Milwaukee, Wisconsin were detected to be significantly relatedto high temperature. In fact, Robine et al. (2008) estimates an excess deathtoll of 70,000 people due to high temperatures in Europe in 2003 and aWorld Health Organization (WHO) assessment concluded that the modestwarming that has occurred since the 1970s was already causing over 140,000excess deaths annually by the year 2004 (World Health Organization, 2009).The spread of infectious diseases is also now being linked to the climatechange as per Hoberg and Brooks (2015). All of this highlights that theimportance of modelling temperature fields goes well beyond the naturalsciences.Due to the topography of the study region, the modelling of tempera-234.1. Motivationture fields can be particularly challenging. Kleiber et al. (2013) recognizedthe difficulty faced by statistical models in capturing complex spatial vari-ability. By analyzing data from the state of Colorado, Kleiber et al. (2013)developed a bivariate stochastic temperature model for minimum and max-imum temperature via a nonparametric approach. In the Pacific North-west, Salathe´ et al. (2008) focused on the development of a regional climatemodel run at a 15-km grid spacing. The topography of the study regioncontributes much to the complexity of modelling these fields and demandsflexible spatio-temporal models that are able to handle nonstationarity andchanges in trend.4.1.1 ContributionsOne of our contributions is the ability to accommodate features in themean that vary over space by extending the spatio-temporal model pro-posed in Le and Zidek (1992), and easily performing spatial prediction. Thismethodology is described in Section 4.4. Another important feature is itsflexibility due to the fact that no structure is assumed for the spatial covari-ance matrix. The method thus is able to accommodate nonstationarity. Weillustrate our analysis based on the Sampson and Guttorp (1992) methodfor estimating nonstationary spatial covariance structures.Additionally, our work conclusively shows how appropriately modellingthe spatio-temporal mean field can resolve complex patterns for nonsta-tionarity and improve spatial prediction. To this end, we also introducetwo comparable strategies for spatial prediction. The first is based on theextended Bayesian spatial prediction method after a thorough exploratoryanalysis to better understand the local changes in trend, and the need to ac-count for spatio-temporal interactions in the mean. The second is based ontackling the anomalies of expected climate in the Pacific Northwest, basedon the average values of temperature computed over a 30-year range (1981-2010), provided by PRISM Climate Group. These data were obtained usinga climate model called PRISM (Parameter-elevation Relationship on Inde-pendent Slopes Model), described in Daly et al. (1994, 1997, 2000). We244.2. The Pacific Northwestprovide an overview of the PRISM in Section 4.3.3.The outline of this chapter is as follows. We begin by providing a de-scription about the Pacific Northwestern region in Section 4.2, followed by adescription of the data sets in Section 4.3. Then, we introduce the Bayesianspatial prediction methodology in Section 4.4. Finally, we discuss the resultsin Section 4.54.2 The Pacific NorthwestThe Pacific Northwest is the region in the western part of North Amer-ica adjacent to the Northeast Pacific Ocean. It is a rather diverse region,with four mountain ranges dominating it, including the Cascade Range, theOlympic Mountains, the Coast Mountains and parts of the Rocky Moun-tains.This region is known to have a wet and cool climate overall, though inmore inland areas, the climate can be fairly dry, with warmer summers andharsher winters. According to Mass (2008), the Northwest weather and cli-mate are dominated mainly by the Pacific Ocean to the west and the region’smountain ranges that block and deflect low-level air. The ocean moderatesthe air temperatures year-round and serves as a source of moisture, andthe mountains modify precipitation patterns and prevent the entrance ofwintertime cold-air from the continental interior.The terrain is another key element to understand the Pacific Northwestweather. East of the Rocky Mountains is where the coldest air is usuallylocated, but the Rockies preclude this cold air from reaching the Northwestand the the cold air that does manage to cross gets warmer when descend-ing to eastern Washington, Oregon, Cascade Range, British Columbia, andAlaska. The temperatures in this region are thus controlled by the at-mospheric circulation patterns, the proximity to the Pacific Ocean and byelevation.In the literature, spatial modelling in this region is recognized to berather complex and it has been the subject of critical observation by localweather scientists. More recently, Mass (2008) apprises that the weather in254.3. Data Descriptionthe Northwest is often surprising, both in its intensity and in the remarkablecontrasts between nearby locations. Rapid changes and localized weather arevery common in this region and the terrain plays an important role in sep-arating often radically different climate and weather regimes. Mote (2004)noticed an apparent tendency for high-elevation stations to exhibit weakerwarming trends than lower-elevation stations when examining temperaturetrends in this region. In order to better understand this region, Salathe´ et al.(2008) implies that global simulations may indicate large-scale patterns ofchange though they may not capture the effects of narrow mountain ranges.Increases in temperature have been observed throughout the Northwest.Across the region from 1895 to 2011, a regionally averaged warming of about0.7 degrees Celsius (1.3 Fahrenheit) was observed (Kunkel et al., 2013; Moteet al., 2014). This change in climate influences hydrological and biologicalchanges, and ultimately, may lead to economic and social consequences. Allof this shows the importance of understanding the temperature fields in thisregion.4.3 Data DescriptionIn this section we briefly describe the various temperature data setsconsidered in this thesis and their sources. Most map images displayed inthis thesis were obtained via the ggmap R package (R Core Team, 2014;Kahle and Wickham, 2013).4.3.1 University of Washington (UW) Probcast GroupDataThe Probcast data set includes forecasts of surface level temperaturedata 48 hours ahead, initialized at midnight Coordinated Universal Time(UTC). Data are available for download at the University of Washing-ton (UW) Probcast Group web page (http://www.stat.washington.edu/MURI/, last accessed on June 15, 2016.). One of Probcast Group project’smain goals was to create methods for the integration of multisource infor-264.3. Data Descriptionmation, derived from deterministic model outputs, observations, and expertknowledge.The time range availability for the Probcast data is from January 12,2000 to June 30, 2000. This six-month period have become central forthis thesis. Stations whose types refer to ship reports, test automated sur-face observing system reports from the National Weather Service (NWS)or unidentified were considered unreliable by the Probcast Group and thushave not been considered in this analysis. Also, we did not consider fixedbuoys type of stations. Figure 5.1 contains a map with the 833 monitoringstations, spread over the Pacific Northwest.Figure 4.1: Map of the 833 monitoring stations. Map created using R li-brary ggmap with tiles by Stamen Design, under CC BY 3.0, and data byOpenStreetMap, under CC BY SA.The data come from the UW mesoscale short-range ensemble systemfor the Pacific northwestern area, described in detail in Grimit and Mass(2002). It corresponds to a five-member short-range ensemble consistingof different runs of the Pennsylvania State University–National Center forAtmospheric Research fifth generation Mesoscale Model (MM5). The runsdiffer due to the different initial values considered. In these data, the grid-scaled deterministic model outputs have been interpolated to the locationsof the monitoring stations by the Probcast group.274.3. Data DescriptionThe MURI data set is not an integrated spatio-temporal data set in thesense that the locations in which the measurements are available may varyconsiderably for different days whilst some stations have very few measure-ments and model outputs available. The temporal spacing of the observa-tions is highly irregular as it can be seen in Figure 4.2.Figure 4.2: Data availability. Notice the unsystematic missingness of thedata pattern.4.3.2 U.S. Global Historical Climatology NetworkThe U.S. Global Historical Climatology Network - Daily (GHCND) is anintegrated database of climate summaries from land surface stations acrossthe globe, developed for several potential applications, including climateanalysis and studies that require data at a daily time resolution, as describedin Menne et al. (2012) and Lawrimore et al. (2011).Figure 4.3 illustrates the locations of 97 stations where maximum dailytemperature data were downloaded from the GHCND database for the in-vestigation reported in this thesis. Due to the irregular temporal spacing ofthe observations as discussed in Section 4.3.1, we also collect the GHCNDdata. Our goal is to emulate the 48-hour forecasts of surface level temper-ature data from the Probcast group, hence for illustration purposes, theselected spatio-temporal data time frame is from January to June of the284.3. Data Descriptionyear 2000.Figure 4.3: Locations of 97 stations in the Pacific Northwestern area consid-ered in this study. Map created using R library ggmap with tiles by StamenDesign, under CC BY 3.0, and data by OpenStreetMap, under CC BY SA.Figure 4.4 shows contours of average site temperatures for differentmonths, obtained by bivariate linear interpolation. Notice that cooler tem-peratures are observed closer to the Pacific Ocean. Another interestingfeature is the different patterns of temperature variation across the region.Warmer temperatures are generally found east of the Cascades and sincewestern Washington is more exposed to air coming from from Puget Sound,the Straits of Juan de Fuca and Georgia, and the Pacific Ocean, it generallyexperiences cooler temperatures.Our preliminary analysis starts with an exploration of the spatio-temporaltrend, followed by an analysis of the unexplained residuals of the spatio-temporal process.Initially a simple geostatistical model was considered where the spatialtrend is described through a second-order polynomial regression model. Thetemperature measured on day t at location s is denoted as Yt(s), wheret denotes the time in days and s = (s1; s2), the location coordinates, inkm, after a suitable projection of the relevant part of the globe onto aflat surface. We considered projected spatial coordinates using the Lambert294.3. Data DescriptionFigure 4.4: Averaged site temperatures for different months. Notice thedifferent patterns of temperature variation across the region. Map createdusing R library ggmap with tiles by Stamen Design, under CC BY 3.0, anddata by OpenStreetMap, under CC BY SA.conformal conic projection, but for simplicity, we still refer to these projectedcoordinates as simply latitude and longitude.For a fixed time t,Yt(s) = µt(s) + νt(s), νt(s) ∼ N(0, σ2) (4.1)µt(s) = αt + β1ts1 + β2ts2 + β3ts1s2 + β4ts21 + β5ts22.Recall that when the spatial random field is stationary, the semivari-ogram between two locations sk and sl at a fixed time point t is definedasγ(sk, sl) =12E[(Yt(sk)− Yt(sl))2]. (4.2)For illustrative purposes, Figure 4.5 contains the binned empirical semi-304.3. Data Descriptionvariogram obtained separately for each of the two selected days (January 04and June 21). The shaded area corresponds to Monte Carlo envelopes ob-tained by repeatedly recomputing and plotting the semivariance after per-mutations of the temperature data across the sampling locations. Theyindicate regions of uncorrelated data.Figure 4.5: Binned empirical semivariograms with Monte Carlo envelopesin shaded area. These envelopes are obtained by permutations of the dataacross the sampling locations, and indicate regions of uncorrelated data.Figure 4.6 illustrates how the latitude and longitude effects change overtime. The longitude effect has a clearly increasing trend and this is possiblydue to the Cascade mountains that extend from southern British Columbiathrough Washington and Oregon to Northern California. The Cascadesblock the westward movement of most of the cold, dense air that managesto reach eastern Washington and Oregon.Our preliminary analysis indicates that for regions where topographychanges significantly, simple polynomial trends commonly used in practicemay introduce bias in the spatio–temporal residuals resulting in a semivari-ogram with a large squared bias term that can lead to a spurious finding ofnonstationarity when none exists. Thus, our preliminary analysis points tothe need for the improved estimation of the spatial mean model as reportedin the sequel, one that accounts for extra features, notably elevation.In particular, we recognize that our analysis needs to include spatio–temporal interactions as well as a longitude–elevation interaction. The latter314.3. Data DescriptionFigure 4.6: Latitude and longitude effects changing over time. The shadedarea represents 95% confidence intervals for these effects.is due to the effect of the proximity to the Pacific Ocean, which also takesinto consideration the elevation effect due to the mountain ranges whenmoving eastward. Moreover, the longitude effect is assumed to depend onthe elevation as well as how far north the station is located, and this effectmust be allowed to vary over time. Similarly, the latitude effect may varyover time and it is dependent on how far east the weather station is located.The above considerations lead to a spatio–temporal mean (trend) func-tion that may be described as follows:µt(s) ≡ f(long*lat*month,long*elev), (4.3)where f denotes a linear function, s = (long, lat) are projected latitudeand longitude coordinates, month indicates the month in study, and elevthe elevation at s. The ∗ notation is used to indicate that the mean functionincludes the individual, the two-way and, where applicable, the three-wayinteraction effects.However, an alternative method suggests itself, one based on the use ofhistorical temperature averages over the region to account for these complexinteractions, as a representation of the climate in the Pacific Northwest. Wedescribe this alternative in the following Subsection 4.3.3.324.3. Data Description4.3.3 PRISM Climate Group DataPRISM is a climate analysis system that uses point data, a digital eleva-tion model (DEM), i.e. digital representations of cartographic informationin a raster form, and other spatial data to generate gridded estimates of an-nual, monthly and event-based climatic parameters (Daly et al., 1994, 1997).It was developed primarily to interpolate climate elements in physiograph-ically complex landscapes (Daly et al., 2008), and is particularly useful toidentify short and long-term climate patterns.The extrapolation of climate over high elevation ranges is often neededdue to the lack of observations in mountainous regions. The use of PRISMdata would then be ideal for complex regions with mountainous terrain suchas the Pacific Northwest. In the literature, Daly et al. (1994, 1997, 2000)provide a description of the methodology behind PRISM. The main ideais that it calculates linear parameter–elevation relationships, allowing theslope to change locally with elevation. Observations nearer to the targetelevation receive more weight than those further up or down slope. Fortemperature, the elevation of the top of the boundary layer is estimated byusing the elevation of the lowest DEM pixels in the vicinity and adding aclimatological inversion height to this elevation. (Daly et al., 1997)The PRISM data we obtained corresponds to average values for temper-ature computed over a 30-year range (1981-2010), provided by the PRISMClimate Group, Oregon State University, and available online at http://prism.oregonstate.edu (last accessed on June 15, 2016). Our goal is touse these data as a representation of the climate in the Pacific Northwest.Having this information enables a comparison with our observations and ananalysis of anomalies (i.e. differences between actual and expected valuesvia PRISM), which would highlight what could not have been explained bythe expected climate. This will also serve as a baseline comparison withthe more complex mean function proposed in Section 4.3.2 that includesspatio–temporal interactions. Finally, in the sequel, PRISM data are usedto construct a spatio–temporal trend model as an alternative to that sug-gested by the analysis reported in Section 4.3.2.334.4. Bayesian Spatial Prediction4.4 Bayesian Spatial PredictionIn this section, we present an empirical Bayesian spatial prediction (BSP)method built on the assumption that realizations of an underlying randomfield are obtained from measurements made at g gauged stations and thatthe goal is to obtain spatial predictions at the other u ungauged stations.Let Yt ≡ (Y(u)t ,Y(g)t ) denote a p-dimensional row vector (p = u+ g), whereY(u)t and Y(g)t corresponds to the row vectors at the ungauged and gaugedstations, respectively. The variables Yt are assumed to be independentover time, or have passed a pre-filtering preliminary step, such that fort = 1, . . . , n,Yt|zt,B,Σ ∼ Np(ztB,Σ), (4.4)where N denotes a multivariate normal distribution, with subscripts makingthe dimension explicit; the zt is a k-dimensional row vector of covariates andB denotes a (k × p) matrix of regression coefficients.As originally formulated in Le and Zidek (1992), in the BSP modelling,covariates were allowed to vary with time, but not space. Over the ensuingdecade the BSP was extended in a variety of ways as summarized in Le andZidek (2006). In particular, the response vector at each space–time pointcould be multivariate, thus enabling site specific random covariates witha Gaussian distribution to be incorporated in the BSP by first includingthem in the fitted multivariate joint distribution in Equation (8.44) andthen conditioning on them to get the BSP. However, no way had been foundto incorporate site specific nonrandom covariates.However, such covariates are confronted in our analysis of temperaturefields in complex regions, as the spatio-temporal mean function must include,say, topographic features as well as the oftentimes crucial spatio-temporalinteractions. Thus, an extension was needed and the one that we developed,will now be presented.Let Y be a (n × p) response matrix such that Y ≡ (Y1, . . . ,Yn), Z isa (n × k) design matrix and B a (k × p) matrix of regression coefficients.344.4. Bayesian Spatial PredictionAssume thatY|Z,B,Σ ∼ MN n×p(ZB, I,Σ) (4.5)B|B0,Σ,F ∼ MN k×p(B0,F−1,Σ) (4.6)Σ ∼ W−1p (Ξ, δ), (4.7)where F−1 is a positive (k×k) definite matrix, and Ξ a (p×p) hyperparame-ter matrix. Here,MN andW−1 denote the matrix normal and the invertedWishart distributions, respectively, with subscripts making the dimensionsexplicit. We writeB0 =β(1)0 . . . β(p)0β(1)1 . . . β(p)1......β(1)k−1 . . . β(p)k−1 , (4.8)where β(j)0 = α +∑l βzlzlj includes the site–specific covariates at site j,denoted as zlj , j = 1, . . . , p and for i = 1, . . . , k, β(j)i denotes the coefficientsof the non-site specific covariates. The first column of the Z matrix cor-responds to a unit column vector, whereas the subsequent columns wouldcontain the non–site specific covariates. Note that the method entails “bury-ing” the site–specific covariates in the intercepts β(j)0 . In practice, all of theregression parameters are first estimated via least squares, and ultimatelyplugged into the B0 matrix.Denoting the matrices Σgg and Σuu as the covariance matrices of Y(g)and Y(u), respectively, and Σug the cross-covariance, we can partition Σand similarly the hyperparameter matrix Ξ asΣ =(Σuu ΣugΣgu Σgg)and Ξ =(Ξuu ΞugΞgu Ξgg). (4.9)For a fully (proper) Bayesian approach, extra hierarchy levels could bespecified. Nonetheless, the BSP was developed from its inception to save354.5. Resultscomputational time by bypassing this approach. It was recognized that, inpractice, the lack of prior knowledge would inevitably lead to a somewhatarbitrary choice of a convenience prior in this high-dimensional model. Thus,a preliminary empirical Bayes step is required for estimating B0 via a linearregression modelling approach, as suggested by the preliminary analysis inSection 4.3.2.When performing spatial prediction, we use the result showed in Le andZidek (1992) that the conditional distribution of Y(u)t , where t ∈ {1, . . . , n}is given byY(u)t |y(g)t ,Z,B0 ∼ tu(µ(u),dδ − u+ 1Ξu|g, δ − u+ 1), (4.10)whereµ(u) = ztB(u)0 + ΞugΞ−1gg (y(g)t − ztB(g)0 ) (4.11)d = 1 + ztF−1z>t + (y(g)t − ztB(g)0 )Ξ−1gg (y(g)t − ztB(g)0 )> (4.12)Ξu|g = Ξuu −ΞugΞ−1gg Ξgu. (4.13)Here B0 was partitioned as B0 = (B(u)0 ,B(g)0 ) according to the partition ofYt (superscripts denoting the ungauged and gauged parts).To finish model development, the covariance of the residual responses inEquation (8.44), Ξu|g , must be specified. However, in practice and in ourapplications, these residuals will not have a second-order stationary distribu-tion. Thus, we need to handle residuals with a non-stationary distribution.In this work, we adopt the Sampson-Guttorp (SG) warping method (Samp-son and Guttorp, 1992), as described in Section 2.1.1.4.5 ResultsThis section presents the results of applying the BSP method described inSection 4.4. This is implementable using the EnviroStat v0.4-0 R package(Le et al., 2014), including the extension proposed. For this purpose, weinitially selected 64 stations at random for training, leaving the remainder364.5. Resultsof the 97 stations for validation purposes, as illustrated in Figure 4.7.Figure 4.7: Locations of the stations selected for training and for validationpurposes. Map created using R library ggmap with tiles by Stamen Design,under CC BY 3.0, and data by OpenStreetMap, under CC BY SA.Work begins with an analysis of the spatio-temporal trend, as it is de-scribed in the subsequent Section 4.5.1, followed by an analysis of the spatialcorrelation in the residuals, after taking into account the mean trend in Sec-tion 4.5.2.4.5.1 The Spatio–temporal TrendFor the training stations, Figure 4.8 illustrates the effect of projectedlatitude on temperatures considering different scenarios of longitude, eleva-tion and time. Notice that moving north implies that the temperature infact decreases in different rates, depending on your initial scenario. We referto this as the RC-effect, which relates to a phenomenon where the effect oflatitude and longitude change over time, that is, at certain times, widelyseparated sites might be strongly correlated. In a statistical model, thiseffect alerts to the need to include space-time interactions.Figure 4.9 indicates that the Gaussian assumption is reasonably met, sono transformation is required.374.5. Results0102030−5.0 −2.5 0.0 2.5Centred Latitude (km × 10−2)Temperature (°C)ScenariosEastern, Low, JuneWestern, Low, JuneEastern, High, JuneWestern, High, JuneWestern, Low, FebEastern, Low, FebWestern, High, FebEastern, High, FebFigure 4.8: Investigating effect of projected latitude (centred) on tempera-tures considering different interaction scenarios for longitude (eastern, west-ern), elevation (high, low) and time (February, June).Figure 4.9: Normal quantile-quantile plot of the residual temperatures of alinear model with spatio-temporal mean function as in Equation 4.3.The following Subsection 4.5.2 provides an analysis of the spatial corre-lation in the residuals, after taking into account this spatio-temporal trend.384.5. Results4.5.2 Spatial Correlation in the ResidualsAn important diagnostic in applying the SG method, supplied with theEnviroStat v0.4-0 R package (Le et al., 2014), is the biorthogonal gridseen in Figure 4.10. It represents the degree of contracting and expandingof the G-space needed to attain an approximately stationary domain in D-space through deformation. The solid lines indicate contraction and dashedlines, expansion. The expansions can be explained by the abrupt changesin the residual temperatures for nearby regions, due to diverse terrain. Acontraction is seen in Eastern Washington, a basin located between theCascade and Rocky Mountains.0 200 400 kmFigure 4.10: Biorthogonal grid for the thin-plate spline characterizing thedeformation of the G-space, using NCDC data set. Solid line indicates con-traction and dashed lines indicate expansion.Figure 4.11 illustrates the effect of different spline smoothing λ values inthe deformed space. Without any smoothing (λ = 0), the D-space is foldedover on itself, implying that widely separated sites tend to be more correlatedthan sites located between them. To make the results more interpretable,we have chosen λ = 5, a value that keeps more of the gains from deformationseen in Figure 4.12, without folding the G-space.Figure 4.12 contains estimated dispersions after applying the SG ap-394.5. Resultsproach (Sampson and Guttorp, 1992) in G-space and D-space. A more sta-tionary fit is seen in the distorted space, since less variability is seen aroundthe (stationary) variogram line.We repeated the analysis using the PRISM data described in Section4.3.3. This corresponds to average values for temperature computed over a30-year range (1981-2010), and we use these data to represent the expectedclimate in the Pacific Northwest. Instead of estimating trend coefficientsbased on Equation 4.3, we analyze anomalies (i.e. differences between ourobserved values and those via PRISM). The goal is to validate our estimatedtrend by comparing improvements in prediction between these different anal-yses.−5 0 5 10 15−10−6−2024λ = 0D−Plane Coordinates−2 0 2 4 6 8 10−10−6−202λ = 0.5D−Plane Coordinates−2 0 2 4 6 8−8−6−4−202λ = 2D−Plane Coordinates−2 0 2 4 6 8−8−6−4−20λ = 5D−Plane CoordinatesFigure 4.11: Deformation assuming different spline smoothing λ values.Note that when λ = 0, no smoothing is applied.404.5. Resultsllll llllllllllll llllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllll llllllllllllll ll llllllllllllllllllll llllllllllllllllllllllllll llllllllllllllll lllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllll ll lllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll ll lllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllll l llllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllll llllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lll lllllllllllllllll lllllllll lllll lllllll lllllllll llllllllllllllllllllllllllll ll lllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll l llllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllll llllllllllllll lllllllllllllllllllllllllllllllllll lllll lllllllllllll llll lllllll0 200 400 600 800 10000.00.51.01.52.0G−space distance (km)Dispersionllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllll lllllllllllllllllll lllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllll lll llllllllll llllllllllllllllllllll l ll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllll0 5 10 150.51.01.52.0D−space distanceDispersionFigure 4.12: Estimated dispersions after SG approach in G-space and inD-space. The solid line represents a fitted exponential variogram.In the following section, we assess and compare the spatial predictionsmade by our fitted spatio–temporal model with our two alternative ap-proaches for modelling the spatio–temporal trend. We argue that PRISMcaptures the large-scale trend well, but it may not capture the effects ofterrain at smaller scales.4.5.3 Spatial PredictionIn this section we present our assessments of the prediction accuracy ofour fitted hierarchical spatio–temporal model. For validation purposes, wecompare the predicted values with the real values observed for the 33 left-out stations. Figure 4.13 contains a map of the mean squared predictionerrors, averaged over time.The prediction accuracy of the hierarchical spatio-temporal Bayesianmodel is also compared to ordinary kriging. For ordinary kriging, we usedthe geoR v.1.7-4.1 R package (Ribeiro Jr. and Diggle, 2001). The pa-rameter estimates of the Exponential covariance function were obtained viamaximum likelihood for the different time points. Figure 4.14 displays themean squared prediction error for the ungauged stations and at different timepoints, respectively. From Table 4.1, notice that the coverage for our space-time interaction spatial mean is similar when analyzing PRISM anomalies,414.5. Resultsll lllllllllll lllllll ll lllllllllllll lllllllllll lllllll ll lllllllllllll lllllllllll lllllll ll lllllllllllBSP BSP − PRISM OK404550−130 −125 −120 −115 −110−130 −125 −120 −115 −110−130 −125 −120 −115 −110LongitudeLatitude01020MSPEFigure 4.13: Map of mean squared prediction errors (◦C2), MSPE, averagedover time for the different methods considered: Bayesian spatial prediction(BSP), Bayesian spatial prediction with PRISM (BSP – PRISM), and ordi-nary kriging (OK). The red triangles represent the stations used for trainingpurposes. Map created using R library ggmap with tiles by Stamen Design,under CC BY 3.0, and data by OpenStreetMap, under CC BY SA.which serves as a way to characterize the strength of our general temperaturemapping theory.Table 4.1: Empirical coverage probabilities and prediction summaries for thedifferent methods considered: Bayesian spatial prediction (BSP), Bayesianspatial prediction with PRISM (BSP – PRISM), and ordinary kriging. Theoverall MSPE refers to the mean squared prediction errors (◦C2) averagedover space and time.BSP BSP – PRISM Ordinary krigingEmpirical coverageprobabilities of 95% CI0.918 0.921 0.529Overall MSPE 5.396 7.000 14.032(std.error) (2.362) (3.733) (5.823)In addition, Figure 4.14 shows that the mean squared prediction errors424.6. Concluding Remarksacross ungauged stations and across time are, on average, smaller for theBSP method introduced considering the spatio-temporal interactions in themean function as in Equation 4.3. The reason for this could be due tothe fact that PRISM may not be capturing the effects of terrain at smallerscales.Another disadvantage is that the PRISM data are currently not availableat locations outside of the United States. Thus, we advocate that for regionswith complex terrain like the Pacific Northwest, a thorough exploratoryanalysis is crucial to better understand the local changes in trend.lllll01020304050BSP BSP − PRISM OKMSPEMSPE across timellllllllllllll01020304050BSP BSP − PRISM OKMSPE across spaceMSPEFigure 4.14: Mean squared prediction errors (◦C2) averaged across ungaugedstations and across time for the different methods considered: Bayesianspatial prediction (BSP), Bayesian spatial prediction with PRISM (BSP -PRISM), and ordinary kriging (OK).4.6 Concluding RemarksThis chapter focused on the modelling temperature fields in the PacificNorthwest, where rapid changes in temperature and localized weather arecommon particularly due to the complex terrain.We introduced a flexible stochastic spatio-temporal model for daily tem-434.6. Concluding Remarksperatures in the Pacific Northwest that handles nonstationarity. We alsostressed the need for spatio-temporal interactions to understand the tem-perature trends. We believe that global climate models may fail to explaininteresting smaller-scale trends, especially in regions with a complex terrainlike the Pacific Northwest.In addition, we introduced two comparable strategies for spatial pre-diction in regions with a complex terrain. The first is an extension of theBayesian spatial prediction proposed by Le and Zidek (1992). We extendedthis method to take into account spatio-temporal interaction features in themean to capture the localized changes in trend. The second is based ontackling the anomalies of the expected climate in the Pacific Northwest,based on the average values of temperature computed over a 30-year range(1981-2010), provided by PRISM Climate Group. PRISM data, however,are currently not available at locations outside of the United States.This work conclusively shows how appropriately modelling the spatio-temporal mean field can help resolve these complex patterns for nonsta-tionarity and improve spatial prediction in contrast to using simpler meanstructures. This can be seen by larger MSPEs observed for the BSP-PRISM,where anomalies of expected weather were analyzed, instead of investigatingobserved changes in temperature across the Pacific Northwest. Moreover,we would like to emphasize the need to account for nonstationarity in thisregion, as demonstrated by the underperformance of the more traditionalkriging methodology based on stationary models.Our analysis also discovered abrupt changes in the observed tempera-tures for nearby regions due to diverse terrain in a great part of the westernregion, and less variable weather conditions in Eastern Washington, a basinlocated between the Cascade and Rocky Mountains.44Chapter 5Ensemble Modelling5.1 MotivationThe use of computer models has become increasingly common in envi-ronmental science applications. These computer models are used to simulatephysical phenomena in order to better understand complex physical systems.From a statistical perspective, the focus is often on processing informationbrought by their outputs and gathering useful insights about the underlyingsystem. In practice, environmental agencies often depend on these kinds ofinformation for regulatory purposes.It should be noted, however, that these outputs come from deterministicmodels, and hence there are no indications of uncertainty associated withthem. Kennedy and O’Hagan (2001) dealt with uncertainty analysis andintroduced a Bayesian calibration technique by incorporating informationfrom both computer model outputs and monitoring data.One of the recent challenges in spatial statistics applications is in fus-ing information from multiple sources that might have been measured atdifferent spatial scales. This problem is often seen as data fusion and it isalso referred to as the change of support problem. Although the issue ofhandling mismatched spatial scales is a well established problem, recent ad-vances in remote sensing highlights the need for suitable statistical methodsto address it (Nguyen et al., 2012). A comprehensive review of methods fordealing with mismatched spatial data can be found in Gotway and Young(2002) and Gelfand (2010). In this chapter, we build upon Fuentes andRaftery (2005), where we handle different spatial scales by linking themthrough an underlying “true” process at the micro-scale.Notably, in weather studies, data often come from monitoring stations,455.1. Motivationas it was the case in Chapter 4, but supplemented by the inclusion of thecomputer model outputs in the modelling. In this chapter, we will explorethese ideas for combining multiple computer models for a temperature dataset in the Pacific Northwest. As noted in Nychka and Anderson (2010),numerical weather prediction is one of the most successful applications ofthe data fusion problem, where a large set of observations are combinedwith physical models describing the atmosphere evolution and ultimatelyproducing high resolution weather forecasts.Other strategies that do not assume an underlying “true” process includedownscaler models (Berrocal et al., 2010a,b, 2012) and a two-stage regressionapproach (Guillas et al., 2006, 2008; Zidek et al., 2012). The downscalingstrategy handles the station and model outputs observations at mismatchedscales via a regression model with spatially varying coefficients. The two-stage regression approach as in Guillas et al. (2006, 2008) is based on firstregressing the station data on the model outputs and then regressing theestimated residuals of the first step on indicators of time and other temporalcomponents. The work in Zidek et al. (2012) provides an extension basedon an ad-hoc method to allow spatial interpolation of the coefficients of thelinear regression.5.1.1 ContributionsEnsemble modelling is hereby referred to as a statistical post-processingtechnique based on combining multiple computer models outputs in a sta-tistical model with the goal of obtaining probabilistic forecasts.In Section 5.2, we provide a description of the Bayesian Ensemble Meld-ing model (BEM) methodology introduced by Liu (2007) following Fuentesand Raftery (2005). The main idea lies in linking processes on mismatchedscales through an underlying “true” process. One of the main disadvantagesof these methodologies is the computational burden faced while performinginference, as noted in many applications (Swall and Davis, 2006; Smith andCowles, 2007; Foley and Fuentes, 2008).Moreover, simple MCMC strategies are known to be infeasible when465.2. The Bayesian Ensemble Melding Modelhandling large spatial data sets. In this chapter, our main objective is tointroduce a scalable inference methodology alternative for the BEM usingintegrated nested Laplace approximations described in Section 3.2. We fol-low Lindgren et al. (2011) and take advantage of a Markov representation ofthe Mate´rn covariance family, connecting ideas from Gaussian Markov ran-dom fields (GMRFs) and stochastic partial differential equations (SPDEs)in a continuous space.Since the BEM is essentially a spatial model, our ultimate goal is toprovide some background to Chapter 6, where we build upon the ability ofthe BEM to accommodate time and describe a dynamic strategy with theobjective of performing forecasting. In that chapter, we will take advantageof the computational gains of the INLA methodology for performing infer-ence for the BEM. On the other hand, McMillan et al. (2010) proposed aspatio-temporal extension through a specification of the underlying “true”process at a grid cell scale. They made use of MCMC methods for perform-ing inference and due to the large number of grid cells, the computationalburden was still an issue.5.2 The Bayesian Ensemble Melding ModelThe Bayesian Ensemble Melding (BEM) model described in Liu (2007)can be viewed as an extension of the Bayesian model proposed by Fuentesand Raftery (2005), by allowing the combination of the observed measure-ments with outputs from an ensemble of deterministic models.Similarly to Fuentes and Raftery (2005), the BEM model is able tolink processes on mismatched scales through an underlying “true” process{Z(s) : s ∈ Rd}, where d is the dimension of the domain, and through theconsideration of the conceptual processes {Z˜j(s) : s ∈ Rd}, the deterministicmodel output processes, j = 1, . . . , p.The different spatial scales are dealt with by linking the underlying“true” process in a micro-scale with the model outputs, which may be aver-ages at a grid-scale resolution. A similar idea was used in Wikle and Berliner(2005), where they describe it as a conditional change of support solution.475.2. The Bayesian Ensemble Melding ModelWikle and Berliner (2005) handle the mismatched scales by conditioninga true unobserved spatially continuous process on an areal average of theprocess at some support in which there is interest in performing inference.For the BEM model, at a given location s ∈ Rd, the measurement processis modeled asZˆ(s) = Z(s) + e(s), (5.1)where the measurement error process at location s is assumed to be e(s) ∼N(0, σ2e), and independent of Z(s). We represent the realizations of themeasurement process at all locations as the vector Zˆ.In addition, for each j = 1, . . . , p, the j-th output process from thedeterministic models {Z˜j(s) : s ∈ Rd} is modeled as:Z˜j(s) = aj(s) + bj(s)Z(s) + δj(s), (5.2)where for each j = 1, . . . , p, the error term is δj(s) ∼ N(0, σ2δ,j). The param-eter functions aj and bj measure the additive and multiplicative calibrationparameters for the j-th deterministic model. The processes δj(·) are assumedindependent of each other as well as independent of e(s), the measurementerror process.Since the deterministic model outputs are generally measured in sub-regions B1, . . . , Bm (blocks) of the study domain, for each deterministicmodel j = i, . . . , p, and each sub-region i = 1, . . . ,m,Z˜j(Bi) =1|Bi|(∫Biaj(s)ds + bj∫BiZ(s)ds +∫Biδj(s)ds), (5.3)where |Bi| denotes the area of the sub-regions of the study domain for i =1, . . . ,m.One way of approximating∫BiZ(s)ds is by Monte Carlo integration,after obtaining a sample of locations over the B1, . . . , Bm. To this end,suppose that a random sample of L points are obtained in each of sub-region485.2. The Bayesian Ensemble Melding ModelB1, . . . , Bm and thus for i = 1, . . . ,mZ(Bi) =∫BiZ(s)ds (5.4)≈ 1LL∑k=1Z(sk,Bi)ds. (5.5)Originally, Liu (2007) assumed no overlap between these sampling lo-cations and the monitoring sites within the sub-regions B1, . . . , Bm. Werepresent the realizations for the j-th deterministic model output at all mLsampling locations as the vector Z˜j . In the case where the deterministicmodels were previously interpolated at the monitoring locations, Z˜j has thesame dimension as Zˆ.Finally, in order to link the above processes, the “true” underlying pro-cess is modeled asZ(s) = µ(s) + (s), (5.6)where µ(s) is a deterministic mean structure, usually a polynomial functionof s, representing large-scale variation. The mean parameters are denotedas β. The errors (s) are assumed to be correlated with zero mean, andvariance denoted as σ2. Their correlation could be described by a parametriccorrelation functions for stationary processes or a non-stationary structurein a more general case. These correlation parameters are denoted as θ.Realizations of the “true” underlying process at all locations are repre-sented in the vector Z. Key to linking the measurement and deterministicprocesses is that not only these realizations of the “true” underlying processneed be observed at the n monitoring stations, but also at the samplinglocations where the deterministic model processes are observed. In the casewhere the p deterministic model processes are observed at L locations withineach sub-region B1, . . . , Bm, the vector Z has dimension (n+ pmL)× 1.Due to the mismatched dimensions, the linking matrices A0 and Aj , for495.2. The Bayesian Ensemble Melding Modeleach j = 1, . . . , p are introduced such thatdim(A0Z) = dim(Zˆ) = (n× 1) (5.7)dim(AjZ) = dim(Z˜j) = (m× 1). (5.8)First, consider the following auxiliary block matrix, with as many blocksas there are deterministic modelsEj =[0(1) . . .0(j−1)L(j)0(j+1) . . .0(p)], (5.9)and where each of the blocks corresponds to a (m×mL) and the j-th blockis given by the followingL(j) =1L . . .1L . . . 0 . . . 0.......... . ..........0 . . . 0 . . . 1L . . .1L . (5.10)The superscripts are used to differentiate the different blocks in the E ma-trix. There are a total of p blocks, thus Ej has dimensions (m×pmL). Thenwe can finally define the linking matrices asA0 = [In|0(n×pmL)] (5.11)Aj =[0(m×n)|Ej], (5.12)where In denotes a (n×n) identity matrix, and 0(i×j) a (i× j) zero matrix.Hence, A0 has dimensions (n × (n + pmL)) and for each j = 1, . . . , p, Ajhas dimensions (m× (n+ pmL)).In the following Section 5.3, we discuss the inference for the BEM model.505.3. Inference for the BEM5.3 Inference for the BEMLet Ψ = (β,θ, σ2, a1, . . . , ap, b1, . . . , bp, σ2δ1, . . . , σ2δp , σ2e) denote all modelparameters. Summarizing the BEM, note thatZˆ|Z,Ψ ∼ N (A0Z, σ2eIn)Z˜j |Z,Ψ ∼ N (aj1m + bjAjZ, σ2δjIm)Z|Ψ ∼ N (µ,Σ),for j = 1, . . . , p, where 1m denote a unit vector of size m. For modelcompleteness, prior assumptions must be made for Ψ.Here, we consider calibration parameters aj , bj for j = 1, . . . , p are con-stant in space. For particular applications, it might be of relevance to letthese parameters vary across space.The posterior distribution of Ψ and the “true” spatial underlying processis given byp(Ψ,Z|Zˆ, Z˜1:p) ∝ p(Zˆ|Z, σ2e)p∏j=1[p(Z˜j |Z, aj , bj , σ2δj )]p(Z|β,θ)p(Ψ) (5.13)∝ p(Ψ)|(σ2eIn)|−12 |Σ|− 12 [p∏j=1|(σ2δjIm)|−12 ] (5.14)exp{−12[(Zˆ−A0Z)>(σ2eIn)−1(Zˆ−A0Z)+(Z−Xβ)>Σ−1(Z−Xβ)+p∑j=1(Z˜j − 1aj − bjAjZ)>(σ2δjIm)−1(Z˜j − 1aj − bjAjZ)where Zˆ = (Zˆ(s1), . . . , Zˆ(sn))>, and Z˜1:p = (Z˜>1 , . . . , Z˜>p )ᵀ, where for eachj ∈ 1, . . . , p, Z˜j = (Z˜j(B1), . . . , Z˜j(Bm))>.Liu et al. (2011) obtained samples from this posterior distribution byusing MCMC methods. In fact, this has been the most common way ofperforming inference for such point-referenced spatial models. One of themain drawbacks of this approach is the computational burden associatedwith expensive matrix computations in each MCMC iteration, and the slow515.3. Inference for the BEMmixing of the chains.In Section 5.4, we describe an application of the BEM model based onINLA method for performing approximate Bayesian inference, as discussedin Section 3.2. In particular, in Subsection 5.3.1, we follow up on the workof Lindgren et al. (2011) and describe how to represent the BEM model insuch framework.5.3.1 A Stochastic-Partial Differential Equation ModelAlternativeOne of the main disadvantages of the BEM model is the computationalburden associated with factorizing dense covariance structures for estima-tion and spatial prediction purposes. For large data sets, their use is im-practical. Lindgren et al. (2011) proposed an alternative to this problemby taking advantage of a Markov representation of the Mate´rn covariancefamily, connecting ideas from between Gaussian Markov random fields (GM-RFs) and stochastic partial differential equations (SPDEs) in a continuousspace. Examples of applications using the INLA-SPDE approach for geosta-tistical models can be found in Simpson et al. (2012b,a) and Cameletti et al.(2013).In this section, we provide an overview of Lindgren et al. (2011), andthe discussion provided in Simpson et al. (2012b), later connecting the ideasinto the BEM model. The key step entails replacing the dense covariancestructure of a Gaussian field by a GMRF, thus allowing users to take ad-vantage of sparse precision matrices. Ultimately, the INLA methodology(Rue et al., 2009) described in Chapter 3 could then be used for the purposeof inference. The INLA-SPDE approach can be implemented using the INLAR package, available for download at r-inla.org (last accessed on June 15,2016).Instead of the usual definition through the covariance function, Lindgrenet al. (2011) suggest representing a Gaussian random field η with Mate´rn525.3. Inference for the BEMcovariance function as a solution to the SPDE(κ2 −∆)α2 η(s) =W(s), (5.15)α = ν + d/2, κ > 0, ν > 0,where ∆ =∑di=1∂2∂η2iis the Laplacian operator and W a spatial Gaussianwhite noise with unit variance.This is due to the fact that Gaussian random fields with Mate´rn covari-ance function are stationary solutions to (5.15) for any α ≥ d/2 (Whittle,1954, 1963). When α is an integer, the Mate´rn fields are Markovian (Lind-gren et al., 2011). The Mate´rn covariances are parametrized asCov(s, s + h) =σ22ν−1Γ(ν)(κ||h||)νKν(κ||h||). (5.16)Note that the two representations (5.15) and (5.16) are related, in such waythat the Mate´rn smoothness is ν = α− d/2 and the marginal variance σ2 isgiven byσ2 =Γ(ν)Γ(α)(4pi)d/2κ2ν. (5.17)Moreover, an empirically derived spatial range can be obtained from ρ =√8κ , where the spatial correlation decays to approximately 13%.For a finite set of suitable functions {ϕj(s), j = 1, . . . , N}, a solution of(5.15) satisfies∫ϕj(s)(κ2 −∆)α2 η(s)ds =∫ϕj(s)W(ds). (5.18)For instance, consider the case where α = 2 on a two-dimensional domainΩ ⊂ R2. Using Green’s first identity (A.1), and assuming a zero normalderivative of η(s) on the boundary of Ω, we can then rewrite (5.18) as∫Ω[κ2ϕj(s)η(s) +∇ϕj(s) · ∇η(s)]ds =∫Ωϕj(s)W(ds). (5.19)535.3. Inference for the BEMSince the idea is based on using GMRFs to approximate a Gaussian ran-dom field, Lindgren et al. (2011) then suggests that the Gaussian randomfield be approximated by a finite element representation of the solution ofthe SPDE based on a triangulation of the spatial domain. In R-INLA, a De-launey triangulation (DT), which maximizes the minimum interior triangleangle is used. Initial vertices are placed at the locations where observa-tions are available. Then, additional vertices are added to cover the spatialdomain, and finally yielding an irregular grid representation of the origi-nal (continuous) Gaussian process. The finite element representation of thesolution of the SPDE in (5.15) is as followsη(s) =N∑k=1ψk(s)wk, (5.20)for Gaussian weights {wk} and some basis functions {ψk} which are piece-wise linear in each triangle, defined such that ψk is 1 at vertex k and 0 atthe other vertices.In the case where α = 2, we could obtain a set of N equations to solveby substituting (5.20) into (5.19)N∑k=1(∫Ω[κ2ϕj(s)ψk(s) +∇ϕj(s) · ∇ψk(s)]ds)wk =∫Ωϕj(s)W(ds), (5.21)for j = 1, . . . , N . Assuming that the test functions are the same as the basisfunctions, i.e. ϕj(s) = ψk(s), note that the right-hand size of (5.21) becomes∫Ω ψj(s)W(ds), yielding a zero-mean normal distribution with covariances∫Ω ψk(s)ψj(s)ds. Hence (5.21) can be rewritten asw ∼ N (0,Kκ2C−1Kκ2), (5.22)where Kκ2 = κ2C + G. The above notation refers to a normal distributionwith mean and precision given by the above. The matrices Kκ2 , C and G545.3. Inference for the BEMhave entries given by(Kκ2)ij = κ2Cij + Gij (5.23)Cij =∫Ωψi(s)ψj(s)ds (5.24)Gij =∫Ω∇ϕj(s) · ∇ψi(s)ds. (5.25)Since C is dense, sparsity is imposed by replacing it with a diagonalmatrix C˜ with elements given by∫Ω ψi(s)ds, for i = 1, . . . , N . Thereforew yields an approximate GMRF from the continuous Mate´rn field, whichhas a sparse precision matrix and is more computationally efficient than thetypical dense Mate´rn one. In our applications, we assume α = 2, henceour focus here in describing this case. For α equal to other integers, checkLindgren et al. (2011).5.3.2 Spatial PredictionTypical spatial prediction techniques via kriging normally involve esti-mating the parameters of the underlying covariance structure. Then theseparameters are assumed known for use in spatial prediction. In this section,we describe a Bayesian approach to spatial prediction of the BEM whichtakes into account the uncertainty about parameters on these predictions.Conditionally on Ψ, the joint distribution of Zˆ and Z˜1:p is multivariatenormal, as follows:ZˆZ˜1Z˜2...Z˜p∼ Nµˆa1 + b1µ˜a2 + b2µ˜...aj + bjµ˜,ΣZˆ ΣZˆ,Z˜1 ΣZˆ,Z˜2 . . . ΣZˆ,Z˜pΣZ˜1 ΣZ˜1,Z˜2 . . . ΣZ˜1,Z˜p. . ..... . . ΣZ˜p−1,Z˜pΣZ˜p(5.26)where µˆ = (µ(s1), . . . , µ(sn))ᵀ and µ˜ = (∫B1µ(s)ds, . . . ,∫Bmµ(s)ds)ᵀ. Inthe case that outputs have been previously interpolated to the observed555.3. Inference for the BEMlocations, then µ˜ simplifies to µ˜ = (µ(s1), . . . , µ(sn))ᵀ.Regarding the covariance structure, ΣZˆ denotes the covariance matrixfor Zˆ, and for each j = 1, . . . , p, ΣZ˜j denotes the covariance matrix for Z˜j .The matrix ΣZˆ,Z˜j denotes the cross-covariance between the measurementsZˆ and the j-th output Z˜j , while ΣZ˜i,Z˜j , i 6= j, denotes the cross-covariancebetween the i-th and j-th members of the ensemble. For j = 1, . . . , p, andi 6= j, note thatΣZˆ = A0ΣA>0 + σ2eIn (5.27)ΣZ˜j = b2jAjΣA>j + σ2δjIm (5.28)ΣZˆ,Z˜j = bjA0ΣA>j (5.29)ΣZ˜i,Z˜j = bibjAjΣA>j , (5.30)where the A0 and Aj matrices have been described in (5.11) and (5.12).In the case that the deterministic models have been previously inter-polated to the measurement locations, these matrices may not need to bedefined. In the context of prediction, however, we often predict temperaturemeasurements conditionally on the model outputs. In practice, this meansthat over space, there are more sites where model outputs are available thantemperature measurements. In such cases, it might be useful to define a ma-trix A0 responsible for extracting the part of the “true” underlying field thatrelates to the locations where temperature measurements are available.We denote µˇ and Σˇ as the mean vector and covariance matrix of the jointdistribution of Zˇ = (Zˆ>, Z˜>1 , . . . , Z˜>p )> conditionally on Ψ, as described in(5.26). For the purposes of spatial prediction, the goal is to predict Zˆ at agiven set of m new locations x∗. This entails describing the predictive dis-tribution of Zˆ(x∗). Recall from the model definition (5.6), conditionally onΨ, that the distribution of Zˆ(x∗) is normal with mean µˆ(x∗) and covariancematrix A0ΣA>0 + σ2eIm.Now denote the cross-covariance between of Zˆ(x∗) and Zˆ, Z˜1:p condition-ally on Ψ byυ = (ΣZˆ(x∗),Zˆ,ΣZˆ(x∗),Z˜1 , . . . ,ΣZˆ(x∗),Z˜p). (5.31)565.4. Ensemble Modelling of Temperatures in the Pacific NorthwestHence, the distribution of Zˆ(x∗) conditionally on the observed dataZˆ, Z˜1:p and Ψ is normal with mean µˆ(x∗) + υ>Σˇ−1(Zˇ− µˇ) and covarianceΣZˆ(x∗) − υ>Σˇ−1υ.The posterior predictive distribution of Zˆ(x∗) can be obtained as followsp(Zˆ(x∗)|Zˆ, Z˜1:p) =∫p(Zˆ(x∗)|Zˆ, Z˜1:p,Ψ)p(Ψ|Zˆ, Z˜1:p)dΨ. (5.32)After having obtained N simulations from the posterior distribution ofΨ, it is then possible to approximate the integral in (5.32) by the followingRao-Blackwellized estimator:p(Zˆ(x∗)|Zˆ, Z˜1:p) ≈ 1NN∑l=1p(Zˆ(x∗)|Zˆ, Z˜1:p,Ψ(l)), (5.33)where Ψ(l) denotes the l-th simulated value from the posterior distributionof Ψ.5.4 Ensemble Modelling of Temperatures in thePacific Northwest5.4.1 Data DescriptionRecall the Probcast data introduced in Section 4.3.1. They include 48-hour forecasts of surface level temperature data initialized at midnight Co-ordinated Universal Time (UTC). The data come from the UW mesoscaleshort-range ensemble system for the Pacific northwestern area and corre-sponds to a five-member short-range ensemble consisting of different runsof the MM5 model, namely AVN, GEM, ETA, NGM, and NOGAPS. Inthese data, the grid-scaled deterministic model outputs had previously beeninterpolated to the locations of the monitoring stations by the Probcastgroup. The Figure 5.1 illustrates the locations of the 120 stations used inthis chapter for illustration of the BEM methodology.575.4. Ensemble Modelling of Temperatures in the Pacific NorthwestFigure 5.1: Map of the 120 stations used for illustration of the BEM method-ology. Map created using R library ggmap with tiles by Stamen Design, underCC BY 3.0, and data by OpenStreetMap, under CC BY SA.5.4.2 InferenceIn Appendix B, for validation of our approximate inference computation,we discuss the application of the INLA-SPDE methodology for the BEMmodel in an artificial data setting. In this section, we discuss our findingsfor the Probcast Group data set.Model DescriptionSince the deterministic model outputs had already been interpolated tothe locations of the monitoring stations by the Probcast group, the BEMmodel can thus be simplified toZˆ|Z,Ψ ∼ N (Z, σ2eIn) (5.34)Z˜j |Z,Ψ ∼ N (aj1n + bjZ, σ2δjIn) (5.35)Z|Ψ ∼ N (µ,Σ) (5.36)Ψ ∼ p(Ψ), (5.37)585.4. Ensemble Modelling of Temperatures in the Pacific Northwestfor j = 1, . . . , p, and p(Ψ) denotes the prior distribution for Ψ. In the above,N denotes a normal distribution with given mean vector and covariancematrix.Figure 5.2 illustrates the overall correlations between measurements andmodel outputs. Note that the correlation is stronger within members of theensemble. Due to the high correlations, a model that links the measurementswith the model outputs via a common underlying field, such as the BEMmodel, seems like a sensible choice.Figure 5.2: Pearson’s correlation coefficients between measurements andmodel outputs.Mean DescriptionIn a similar manner as described in Section 4.5.1 for a proxy data set cov-ering the same time frame and similar spatial domain, we assume a spatialmean that depends on the interaction between latitude (s1) and longitude(s2), and includes elevation (h(s)). The mean function can thus be summa-rized asµ(s) = β0 + β1s1 + β2s2 + β3s1s2 + β4h(s). (5.38)595.4. Ensemble Modelling of Temperatures in the Pacific NorthwestPrior SpecificationsFor model completeness, in order to carry out the inference procedurefor the BEM for the Probcast data set, we describe our independent andvague prior specifications below. For numerical stability, we specify priorsfor precision parameters (inverse of the variance) in a logarithmic scale.All notation used below for normal priors refer to mean and precision,whereas we refer to a Log-gamma as simply the logarithm of a Gammadistribution.• For the mean parameters α, β1, β2, β3, β4, and calibration parametersaj and bj , for j = 1, . . . , 5, we specified a N (0, 0.01) prior. This yieldsfairly noninformative priors for these parameters.• For log(σ−2δj ), j = 1, . . . , 5, we specified a Log-gamma(0.01, 0.01) prior.• For log(σ−2e ), we specified a Log-gamma(1, 0.01) prior.• For log(σ), we specify a N (0, 0.1) prior, for log(κ) a N (0, 1). Weheuristically specify the prior for the spatial range as a fifth of theapproximate domain diameter. This leads to a fairly vague prior spec-ification for log(σ). As described in Lindgren and Rue (2015), for thisheuristic choice, the precision 1 for the prior of log(κ) gives an ap-proximate 95% prior probability for the range being shorter than thedomain size.BEM Implementation in R-INLAIn this section, we briefly discuss the computational implementation ofthe BEM model using the INLA-SPDE framework. Our approach is similarto Ruiz-Ca´rdenas et al. (2012), where we construct an artificial observa-tional equation with “fake zero” data for the BEM model implementation.This technique has become common while implementing advanced modelsin INLA, as described in the Chapter 8 of Blangiardo and Cameletti (2015).More recently, Rue et al. (2016) provide a review of Bayesian computing us-605.4. Ensemble Modelling of Temperatures in the Pacific Northwesting the R-INLA package, and Lindgren and Rue (2015) focus on the aspectsof the SPDE implementation.Firstly, we construct the following linear predictorsξ0(s) = µ(s) + P (s, s0)g(s0) (5.39)ξj(s) = aj + bjξ0(s), (5.40)for j = 1, . . . ,m, where g(s0) represent the SPDE triangulated mesh basedon a Mate´rn model, and P (s, s0) is a projector matrix. The projector ma-trix is responsible for projecting the process from the mesh vertices to theobserved locations. Figure 5.3 illustrates the triangulation of the spatialdomain for Feb 20th, as described in Section 5.3.1. The triangulation wasperformed similarly for the other days.Using the R-INLA terminology (Rue et al., 2016), the ξ0 linear predictor,is created so that it can be “copied” into the linear predictors ξj , associatedwith the deterministic model outputs. The “copy” feature of R-INLA is keyin the BEM implementation, as it will allow us to link the underlying processto both measurements and model outputs observational models.The ξj(s) predictors are constructed in an independent and identicallydistributed model with low arbitrary precision. This means that, in prac-tice, ξj(s) are free to vary though essentially they will be restricted to thelinear functional described above. This is possible by assuming a Gaussianlikelihood with “fake zero” observations with a high fixed precision. Exam-ples of this approach can be found in Ruiz-Ca´rdenas et al. (2012). Morespecifically, we can describe the BEM observational model as0(s) = ξ0(s) + 0 − zˆ(s) (5.41)z˜j(s) = ξj(s) + j , (5.42)for j = 1, . . . ,m, where 0(s) denotes the “fake zero” observations, 0 repre-sents an independent Normal random effect with zero mean and variance σ2eand, similarly, j independent Normal random effects, each with zero meanand variance σ2δj .615.4. Ensemble Modelling of Temperatures in the Pacific Northwestlllllll ll ll l lll l lll l lll lll ll lll ll ll l ll ll ll ll l llllllll lllll lll ll ll llll l ll ll l ll ll lll l ll ll ll l llll lll l lll llllll llFigure 5.3: Triangulation for the BEM data available on Feb 20th. Themesh comprises of 591 edges and was constructed using triangles that have aminimum angle of 25, maximum edge length of 1◦ within the spatial domainand 2◦ in the extension domain. The maximum edges were chosen to be lessthan the approximate range of the process. The spatial domain was extendedto avoid a boundary effect. The monitoring stations are highlighted in red.Assessment of Calibration of the Model OutputsAn interesting feature of the BEM model is that by linking the deter-ministic model outputs process with the measurement process, it is thenpossible to quantify the uncertainty about these outputs. Of particular in-terest is therefore the calibration parameters (aj and bj) and variances σ2δjassociated to each member of the ensemble, here j = 1, . . . , 5.Figure 5.5 illustrates the approximate marginal posterior distributionsfor these parameters for three selected days. The calibration parameters (ajand bj) are particularly useful to assess deviations from the assumed latentprocess. Figure 5.4 illustrates their variation over time. For the additive625.4. Ensemble Modelling of Temperatures in the Pacific Northwestcalibration parameters (aj), note the slight decreasing trend during colderperiods followed by an increasing trend during warmer periods. On theother hand, the multiplicative calibration parameters (bj) show a decreasingtrend during warmer periods preceded by a slight increasing trend in colderperiods.0 20 40 60 80 100−50510Timea0 20 40 60 80 1000.40.60.81.01.21.4TimebAVNGEMETANGMNOGAPSFigure 5.4: Posterior mean for the calibration parameters (aj and bj) foreach member of the ensemble j = 1, . . . , 5 across time (in days).635.5. Discussion and Future Work−10 −8 −6 −4 −2 0 20.00.20.40.60.8Feb 20aDensity0.5 1.0 1.5 2.00123456Feb 20b0.0 0.5 1.0 1.502468Feb 20σδ2AVNGEMETANGMNOGAPS−15 −10 −5 0 5 100.00.10.20.30.40.5Apr 07aDensity0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.60123456Apr 07b0 1 2 3 401234Apr 07σδ2AVNGEMETANGMNOGAPS−2 0 2 4 60.00.51.01.5Jun 05aDensity0.6 0.8 1.0 1.20510152025Jun 05b0.0 0.5 1.0 1.5 2.0 2.5024681012Jun 05σδ2AVNGEMETANGMNOGAPSFigure 5.5: Approximate marginal posterior distributions for calibrationparameters (aj and bj) and variances σ2δjfor each member of the ensemblej = 1, . . . , 5 for three selected days: February 20th, April 7th, and June 5th.5.5 Discussion and Future WorkIn this chapter, we introduced a scalable inference methodology alterna-tive for the BEM using integrated nested Laplace approximations followingRue et al. (2009); Lindgren et al. (2011). In Chapter 6, we introduce adynamic strategy that builds on the BEM model ability to accommodatetime, with the objective of performing forecasting. Essentially we will apply645.5. Discussion and Future Workthe BEM model sequentially over a set of training days. These posteriorswill then be part of a mixture that will eventually be used for obtaining thepredictive distributions for forecasting.A limitation of the BEM model described in this Chapter is due to thefact that we assumed a Mate´rn covariance structure for the “true” under-lying random field. From Chapter 4, we noted that it is crucial to handlenon-stationarity when modelling temperatures in the Pacific Northwest. Forfuture work, we would like to accommodate this into the BEM inferencestrategy.As in Lindgren et al. (2011), this entails representing a Gaussian randomfield η as a solution to a SPDE with covariance parameters varying overspace, and being written as(κ2(s)−∆)α2 {τ(s)η(s)} =W(s), (5.43)where τ models the variance of the process and is allowed to vary on space.65Chapter 6Ensemble Forecaster6.1 ContributionsIn this chapter, we introduce a dynamic Bayesian ensemble model fore-caster (DBEM), which essentially builds on the ability of the BEM modeldescribed Chapter 5 to accommodate time, with the objective of performingforecasting.This general idea was originally introduced in Liu (2007), but the method-ology was not fully developed due computational challenges, and hence noempirical assessment of the method was made. The main contribution ofthis chapter is in its critical assessment of the DBEM, as well as providinga detailed description of its strategy.Following up on the inference for the BEM model based on an INLAframework described in the previous chapter, we offer a way to solve thecomputational challenges faced by Liu (2007) by making use of some ap-proximations from the INLA framework. Moreover, we provide a compar-ison of its forecasting strengths with a Bayesian Model Averaging (BMA)alternative. The BMA is described in the following Section 6.2.6.2 Bayesian Model AveragingThe main idea behind performing Bayesian model averaging (BMA)comes from realizing the need to take into account the uncertainty overthe model choice in a particular setting. This idea flows quite naturally intoa Bayesian framework. Instead of using a single model, the BMA relies ona mixing over multiple models, where the weights used for combining thequantities of interest are based on posterior model probabilities.666.2. Bayesian Model AveragingConsider ψ a quantity of interest, say, a predictive quantity, and letM1, . . . ,Mp denote the set of the possible models considered. The posteriorprobability for ψ is given by an average of the posterior distributions undereach of the models considered is defined as followsp(ψ|D) =p∑j=1p(ψ, |D,Mj)p(Mj |D)where D denotes the data available.Empirically, this approach has proved superior to simply choosing a sin-gle “best” model based on some criteria. For instance, using a logarithmicscoring rule, Raftery et al. (1997) noted that the BMA had optimum pre-dictive performance. A comprehensive review of the BMA can be found inHoeting et al. (1999).Consider now the situation where an ensemble of computer models out-puts are available and there is interest in embedding them in a statisti-cal model with the goal of obtaining probabilistic forecasts. The Bayesianmodel averaging forecast idea was introduced in Raftery et al. (2005), andit is based on a weighted average of individual deterministic outputs thatconstitute the ensemble, where the weights represent the individual forecastperformance in a training period. The main idea is that there is a “best”model output, and they attempt to quantify the uncertainty about whichmember may be considered “best” using BMA.Denoting by y as a weather quantity of interest and fk the forecast as-sociated with the k-th deterministic model in the ensemble, k = 1, . . . , p,Raftery et al. (2005) apprise that the conditional distribution gk(y|fk) canbe interpreted as a conditional distribution of the data given the k-th deter-ministic model that has the best forecast performance in the ensemble. TheBMA predictive model can therefore be written asp(y|f1, . . . , fk) =p∑k=1wkgk(y|fk), (6.1)where∑pk=1wk = 1 and wk = p(fk|D) is the posterior probability of forecast676.2. Bayesian Model Averagingk being the best based on their predictive performance in a training period.In particular, for the context of surface temperature forecasting, Rafteryet al. (2005) use a normal distribution to approximate the conditional dis-tribution gk(y|fk), centred at linearly calibrated forecastsY |fk ∼ N (ak + bkfk, σ20). (6.2)In contrast to the BEM model described in Section 5.2, they assume thatthe forecasts have been previously interpolated to the observation sites.The BMA predictive mean is given byE[Y |f1, . . . , fp] =p∑k=1wk(ak + bkfk). (6.3)Note that the BMA estimation is done for one location at a time. Forsimplicity, we suppressed the spatial and temporal indexes in the equationsabove. The procedure is as follows. Raftery et al. (2005) first estimateseparately the calibration parameters ak and bk for each member of theensemble via least squares regression using the training data. Then, theyproceed with the estimation of BMA weights wk and the BMA variabilityσ20 simultaneously for the ensemble via the Expectation Maximization (EM)algorithm for the training data. They optimize the estimation of σ0 so that itoptimizes the continuous ranked probability score (CRPS) for the trainingdata by performing a numerical search over possible values of σ0 centredat the maximum likelihood estimates, and keeping the other parametersfixed. The CRPS measures the difference between the predicted and truecumulative distributions, as followsCRPS(F, y∗) =∫ ∞−∞[F (y∗)− 1y≥y∗ ]2dy, (6.4)where F is the cumulative distribution function of the forecast distribution,y∗ is the true value and 1y≥y∗ is a step function that attains the value 1when y ≥ y∗ and zero otherwise. The BMA is implemented in R in theensembleBMA package (Fraley et al., 2013).686.3. Dynamic Bayesian Ensemble ForecasterOn the other hand, Berrocal et al. (2007) extend the BMA by taking intoaccount the possibility of spatial correlation, unlike the basic BMA. Theirspatial BMA strategy describes the predictive distribution for the wholefield as a weighted average of multivariate normal distributions centred atlinearly calibrated members of the ensemble. Similarly, Kleiber et al. (2011)provide a spatially adaptive extension called geostatistical model averagingby first estimating the parameters of the BMA model at each location andthen interpolating the estimates using a geostatistical model.In the following Section 6.3, we introduce our alternative strategy, basedon applying the BEM of Chapter 5 for the purposes of forecasting.6.3 Dynamic Bayesian Ensemble Forecaster6.3.1 Decision Making IdeasThe motivation for introducing the dynamic Bayesian ensemble modelforecaster (DBEM) comes from accommodating time into the BEM modeldescribed in Section 5.2 by borrowing ideas from a decision making modelproposed by Bayarri and DeGroot (1989) to combine opinions from differentexperts.Each of the k experts was assumed to have their own uncertainty about acertain parameter W quantified by the individual prior distributions denotedas pii(w) assigned by each expert i. In this decision making model, it wasassumed that an executive would form their opinion about W through aweighted average of the experts’ opinions. The executive’s prior ispi(w) =k∑i=1α(0)i pii(w), wherek∑i=1α(0)i = 1, α(0)i ≥ 0. (6.5)Furthermore, the k experts and the executive were assumed to jointlyobserve the value of a random variable X whose conditional distributionwhen W = w is denoted as f(·|w). The posterior for expert i ispi∗i (w|x) =pii(w)f(x|w)pi(x), (6.6)696.3. Dynamic Bayesian Ensemble Forecasterwhere pi(x) is the marginal distribution for expert i, that is,pi(x) =∫f(x|w)pii(w)dw. (6.7)The marginal distribution for the executive is given byp(x) =k∑i=1∫α(0)i f(x|w)pii(w)dw. (6.8)Hence, the posterior for the executive ispi∗(w|x) = pi(w)f(x|w)p(x)=k∑i=1α(0)i pi(x)p(x)pii(w)f(x|w)pi(x)=k∑i=1α(1)i pi∗i (w|x). (6.9)Therefore, the posterior for the executive is again a weighted average ofposteriors (pi∗i (w|x), i = 1, . . . , k) for the k experts. The updated weight forexpert i after observing X = x is α(1)i = αipi(x)/p(x), which depends ontheir marginal distribution of X. Note that the expert’s weight will increaseif his/her marginal distribution of x is large.In the following subsection, we describe the dynamic Bayesian ensemblemelding forecaster, which borrows ideas from this decision making strat-egy. Essentially, the measurements Zˆ and the members of the ensembleZ˜1, . . . , Z˜p across space at a given time point are viewed as the X in thisdecision-making process. The first k time point measurements and modeloutputs are used to fit the BEM model described in Section 5.2, and thusgiving the posterior distribution of Ψ for each of these k time points.706.4. DBEM Forecaster6.4 DBEM ForecasterThe dynamic Bayesian ensemble model forecaster borrows ideas fromthe decision making strategy described in Section 6.3.1, with the purpose ofaccommodating time in the BEM model, which is fundamentally a spatialmodel.In order to do so, a set of k time point measurements and model out-puts are used as a training set. These data are used to fit the BEM modeldescribed in Section 5.2, and obtain the posterior distribution of Ψ for eachof these k time points. Cross-validation techniques can be used for choosingthe number of time points. By combining these posterior distributions, an“executive” posterior distribution is obtained with the purpose of represent-ing the uncertainty about Ψ after having observed data across the trainingdays. This “executive” posterior can thus be viewed as a prior for the sub-sequent day which could then be used to obtain the predictive distributionfor this future day.6.4.1 A DBEM Forecaster AlgorithmIn this section, we provide details about the DBEM forecasting strategy,described in Algorithm 1, with the objective of obtaining forecasts.Following up on Section 5.3.2, recall that we introduced the joint distri-bution of Zˆ, Z˜1:p conditionally on the hyperparameters Ψ as a multivariatenormal. Hence, the conditional distribution Zˆk+1|Ψ(l), Z˜1:p,k+1, for eachsample l of Ψ required for Algorithm 1 can be obtained by using the prop-erties of multivariate normal.The main advantage of the DBEM is that it sidesteps the need tobuild time series models to incorporate a temporal component and insteadsmooths the sequence of posteriors by averaging them over time. It alsobuilds upon the Bayesian ensemble model’s ability to forecast temperaturesat future times. In contrast to the BMA approach described in Section6.2, another very desirable feature of the DBEM is that it also allows thecalibration coefficients (ak, bk, for k = 1, . . . , p) to change over time.716.4. DBEM ForecasterAlgorithm 1 DBEM ForecasterLet i = 1, . . . , k index a set of training days.1. The first step is to initialize the weights. For instance, by settingα(0)i =1k , for each i = 1, . . . , k, no preference is given to any timepoint in the training set.2. Obtain M samples from pi(0)i (Ψ|Zˆi, Z˜1:p,i), i.e., the posterior for eachtraining day i, where Z˜1:p,i = (Z˜1,i, . . . , Z˜p,i), and Z˜p,i denotes the dataavailable for the p-th member of the ensemble at the i-th training day.3. Update weights and get samples from the “executive” posterior de-scribed as followspi(1)(Ψ|Zˆk, Z˜1:p,k) =k∑i=1α(1)i pi(0)i (Ψ|Zˆi, Z˜1:p,i), (6.10)whereα(1)i =α(0)i pi(Zˆi, Z˜1:p,i)∑ki=1 α(0)i pi(Zˆi, Z˜1:p,i). (6.11)4. The “executive” posterior is used as a prior in order to obtain thefollowing predictive distribution:f(Zˆk+1|Z˜1:p,k+1) (6.12)=∫f(Zˆk+1|Ψ, Z˜1:p,k+1)pi(2)(Ψ)dΨ (6.13)≈ 1LL∑l=1f(Zˆk+1|Ψ(l), Z˜1:p,k+1), (6.14)where Ψ(l)1 , l = 1, . . . ,M are samples from the executive posterior.In addition, the DBEM is also able to accommodate spatial correlationsince the posterior distributions for the training days are obtained via theBayesian ensemble model, which is essentially a spatial model. In contrast,the BMA model strategy is done for one location at a time, as described inSection 6.2.726.4. DBEM ForecasterThe posteriors in item 2 of the DBEM algorithm can be approximatedefficiently using the INLA-SPDE approach described in Section 5.3.1, thusavoiding the computational burden associated with MCMC strategies. More-over, significant computational benefits come from efficiently obtaining themarginal likelihoods pi(Zˆi, Z˜1:p,i) from R-INLA as described in (3.2).A limitation of this methodology is due to the fact that the marginal log-likelihood approximations may be difficult to compare in situations wherethe number of stations vary significantly among the training days.The marginal log-likelihoods will usually be very negative, which mightyield some difficulty in differentiating the weights that measure the contri-bution of each training day in the mixture. This poor mixing will tend tohappen so long as at least one log-likelihood is significantly greater than therest, and will ultimately dominate the mixing weights. Moreover, anotherlimitation of the methodology described in Liu (2007) is that it does nottake into consideration the order of the observations. Instead, it heavilyrelies on the quality of the data in each of the training days.In order to overcome the poor mixing when obtaining the samples fromthe “executive” posterior, we scale the marginal log-likelihoods by the aver-age of the marginal log-likelihoods in the training set. The weights in part3 of Algorithm 1 are now written asα(1)i =α(0)i exp{m−1 log pi(Zˆi, Z˜1:p,i)}∑ki=1 α(0)i exp{m−1 log pi(Zˆi, Z˜1:p,i)}, (6.15)where m = |∑ki=1 log pi(Zˆi, Z˜1:p,i)/k| is the absolute value of the average ofall the marginal log-likelihoods in the training set. This strategy maintainsthe order of influence of the marginal log-likelihood among the training days,but deflates/inflates the weights to insure a better mixing.In the following section, we describe an empirical assessment of theDBEM methodology for obtaining forecasts. We compare it with the BMAmethodology described in Section 6.2.736.5. An Empirical Assessment of the DBEM6.5 An Empirical Assessment of the DBEMIn this section, we again refer to Probcast data introduced in Section4.3.1. To illustrate the DBEM methodology, Figure 5.1 contains the mapof the 120 stations considered in this study. Our goal is to perform anempirical assessment of the DBEM methodology by comparing it with theBMA approach described in Section 6.2. Recall that the number of availablestations varies across the different time points, as noted in Figure 4.2.In Section 6.5.1, we discuss our findings based on sequentially fitting aBEM (as illustrated in Section 5.4) among the training days, and using aDBEM strategy to obtain forecasts one day ahead. Moreover, in Section6.5.2 we use our knowledge of the behaviour of the temperature fields in thePacific Northwest described in Section 4.5.1 in an attempt to alleviate theinfluence of localized weather.6.5.1 Forecasting TemperatureThe first required step for the application of both the DBEM and theBMA entails specifying the number of training days. It would be advanta-geous to use a shorter training period in order to adapt rapidly to the changesin weather patterns. For the same data set used in this thesis, Raftery et al.(2005) noted that the overall root mean squared forecast error (RMSFE)and the mean absolute error (MAE) of the BMA decreased substantially asthe number of training days increases, up to 25 days, with little change inperformance beyond that. Because of this, we began with a training set of25 days for both methods. We later perform cross-validation and investigatethe effect of varying the number of training days in the forecasts.Table 6.2 contains the forecasting summaries for three selected days: Feb20th, Apr 7th and June 5th. Note that while observing a smaller RMSFEand MAE, and higher empirical coverages of the 95% credible intervals (CIs),the average length of the CIs for the BMA are generally larger than those forthe DBEM. We believe this is due to the strength that the DBEM borrowsfrom neighbouring sites at which a forecast is being made. We intend toexplore this issue in future work. Possibly due to this, note in Figure 6.2746.5. An Empirical Assessment of the DBEMthat DBEM outperformed the BMA for some of the stations.Table 6.1: Forecasting summaries for three selected days February 20th,April 7th and June 5th, using a training set of 25 days. Summaries includethe root mean squared forecast error (RMSFE), mean absolute error (MAE),the empirical coverage and the average length of the 95% credible intervals(CI) for the different methods considered: the dynamic Bayesian ensemblemodel and the Bayesian model averaging (BMA). There are a total of 109available stations on Feb 20th, and a total of 105 on Apr 7th and June 5th.MethodSelectedDaysRMSFE(◦C)MAE(◦C)EmpiricalcoverageAveragelength of95% CI(◦C)DBEMFeb 20thApr 7thJun 5th2.7573.2604.5242.2912.7523.6750.9540.8950.84810.82610.65712.331BMAFeb 20thApr 7thJun 5th2.7933.3042.4522.2242.7081.9730.9820.9431.00012.49612.12113.217The overall forecasting summaries across all time points can be foundin Table 6.2. The average length of the CIs for the BMA are generallylarger than those for the DBEM, but the BMA outperformed the DBEM asmeasured by its smaller RMSFE and MAE, and higher empirical coveragesof the 95% credible intervals.In order to get a better understanding about the forecasting ability of thedifferent methods across time, in Figure 6.1 we illustrate the RMSFEs splitby month. We also describe the performance statistics in Table 6.3. Notethat the DBEM outperformed the BMA during the months of February andMarch, but significantly underperformed it for the month of June.For cross-validation purposes, we also varied k to guide us in the choiceof the number of training days. In Figure 6.3 we illustrate the RMSFEsobtained for the month of June in particular. Note that there is a minorimprovement in performance for the BMA. For the DBEM, however, thenumber of training days does not seem to be a significant factor for perfor-mance improvement.756.5. An Empirical Assessment of the DBEMIn fact, from Algorithm 1, one may note that the forecasting performancewill be highly influenced by the quality of the posterior samples of the “ex-ecutive” posterior. Thus, a limitation of the DBEM is that it will tend tounderperfom under high uncertainty a posteriori. In particular, from thethe 95% CIs in Figure 6.5, we note increases in uncertainty for the meanparameters later in the study. This may explain the observed decrease inperformance of the DBEM for the month of June.Recall that in Chapter 5 we mentioned a limitation of the BEM modeldue to its assumed Mate´rn covariance structure. This may degrade theDBEM’s performance over some time periods, since it limits the ways inwhich strengths can be borrowed over space. The decrease in performanceis particularly exacerbated in June, when there is more discrepancy in tem-peratures, as seen in Figure 6.4 and Table 6.4.Table 6.2: Forecasting summaries across all available time points using atraining set of 25 days. There are a total of 77 time points. Summaries in-clude the root mean squared forecast error (RMSFE), mean absolute error(MAE), the empirical coverage and the average length of the 95% credi-ble intervals for the different methods considered: the dynamic Bayesianensemble model (DBEM) and the Bayesian model averaging (BMA).MethodRMSFE(◦C)MAE(◦C)EmpiricalcoverageAverage lengthof 95% CI (◦C)DBEM 4.003 3.347 0.823 11.240BMA 3.023 2.381 0.942 12.051766.5. An Empirical Assessment of the DBEMTable 6.3: Forecasting summaries across the different months, using a train-ing set of 25 days. Summaries include the root mean squared forecast error(RMSFE), mean absolute error (MAE), the empirical coverage and the av-erage length of the 95% credible intervals (CI) for the different methodsconsidered: the dynamic Bayesian ensemble model and the Bayesian modelaveraging (BMA).MethodSelectedDaysRMSFE(◦C)MAE(◦C)EmpiricalcoverageAveragelength of95% CI(◦C)DBEMFebruaryMarchAprilMayJune2.5423.1493.8163.2476.0842.0382.5423.1012.5885.3540.9650.8850.8550.9180.61710.92210.24111.00911.74512.025BMAFebruaryMarchAprilMayJune2.5683.2453.5542.9162.7952.0382.5872.7362.2792.2170.9830.9130.9120.9600.95412.38511.33912.50613.15111.454776.5. An Empirical Assessment of the DBEMFigure 6.1: Mean squared forecast error (MSFE) across space for forecastsfrom February 20th to June 30th using the dynamic Bayesian ensemblemodel (DBEM) and the Bayesian model averaging (BMA). Both methodsassumed a training set of 25 days.786.5.AnEmpiricalAssessmentoftheDBEM(a) Feb 20th (b) Apr 7th(c) Jun 5thFigure 6.2: 95% credible intervals of the forecasts for days Feb 20th, Apr 7th and June 5th for the differentmethods considered: the dynamic Bayesian ensemble model (DBEM) and the Bayesian model averaging (BMA).The number of stations where the forecasts were obtained were 109, 105 and 105, respectively. The dots representthe true measurement of temperature across the available stations for the selected days. To facilitate visualization,we also coloured the dots based on the different methodologies. Note the overall larger error bars observed for theBMA.796.5. An Empirical Assessment of the DBEMFigure 6.3: Mean squared forecast error (◦C), MSFE, across space for fore-casts across the month of June for different number of training days usingthe dynamic Bayesian ensemble model (DBEM) and the Bayesian modelaveraging (BMA).020401 2 3 4 5 6MonthsTemperature (° C)Figure 6.4: Monthly boxplots of observed temperatures over space.806.5. An Empirical Assessment of the DBEMTable 6.4: Summary statistics for the temperature measurements (◦C) overspace.Month Average (◦C) Std. Dev. (◦C)January 2.581 4.369February 5.531 3.982March 8.433 5.187April 14.992 5.858May 16.778 5.642June 22.374 7.1116.5.2 Forecasting Temperature AnomaliesIn this subsection, we use our knowledge of the behaviour of the temper-ature fields in the Pacific Northwest described in Section 4.5.1 in an attemptto alleviate the influence of localized weather, particularly in warmer peri-ods as seen in the previous Section 6.5.1. Here, we obtain the temperatureanomalies by taking out the spatial mean fitted via least squares. We thenproceed with forecasting these anomalies, and adding the fitted spatial meanback to obtain the forecasts in an observed temperature scale.Table 6.5 contains the forecasting summaries for three selected days: Feb20th, Apr 7th and June 5th. Comparing these results with those in Table6.2, note the reduction in the RMSFE for the DBEM for June 5th. Figure6.6 illustrates the 95% CIs of the forecasts for the selected days across thedifferent stations.An overall assessment across all time points can be found in Table 6.6 aswell as split by month in Table 6.7. In Figure 6.7 we illustrate the RMSFEssplit by month. Note that the DBEM still portrays a significantly poorerperformance than the BMA for the month of June, though compared Section6.5.1, we see an improvement in the RMSFE.Since in Section 6.5.2 the mean parameters were estimated separatelyfor all the training days, these parameters are thus varying with time andwe are implicitly incorporating time there. This could help explain why thisnew approach based on anomalies did not yield a substantial improvementin the results.816.5. An Empirical Assessment of the DBEM−1001020300 25 50 75 100Time (days)Intercept (° C)−2.50.02.55.07.510.00 25 50 75 100Time (days)Latitude−5.0−2.50.02.55.00 25 50 75 100Time (days)Longitude−1.0−0.50.00.50 25 50 75 100Time (days)Latitude−Longitude Interaction−0.0100−0.0075−0.0050−0.00250 25 50 75 100Time (days)ElevationFigure 6.5: Posterior means (solid line) for the mean parameters of theunderlying random field over time. The gray shaded region represent the95% credible intervals. Note the increase in uncertainty for later days in theseries.826.5. An Empirical Assessment of the DBEMTable 6.5: Forecasting summaries for three selected days Feb 20th, Apr 7thand Jun 5th, using a training set of 25 days. Summaries include the rootmean squared forecast error (RMSFE), mean absolute error (MAE), theempirical coverage and the average length of the 95% credible intervals (CI)for the different methods considered: the dynamic Bayesian ensemble modeland the Bayesian model averaging (BMA). There are a total of 109 availablestations on Feb 20th, and a total of 105 on Apr 7th and June 5th.MethodSelectedDaysRMSFE(◦C)MAE(◦C)EmpiricalcoverageAverage lengthof 95% CI (◦C)DBEMFeb 20thApr 7thJun 5th2.6123.0453.5202.0632.4302.7010.9540.8950.86710.89610.47611.404BMAFeb 20thApr 7thJun 5th2.7933.3042.4522.2242.7081.9730.9820.9431.00012.49612.12113.217836.5.AnEmpiricalAssessmentoftheDBEM(a) Feb 20th (b) Apr 7th(c) June 5thFigure 6.6: 95% credible intervals of the forecasts for days Feb 20th, Apr 7th and June 5th for the differentmethods considered: the dynamic Bayesian ensemble model (DBEM) and the Bayesian model averaging (BMA).The number of stations where the forecasts were obtained were 109, 105 and 105, respectively. The dots representthe true measurement of temperature across the available stations for the selected days. To facilitate visualization,we also coloured the dots based on the different methodologies. Note the overall larger error bars observed for theBMA.846.5. An Empirical Assessment of the DBEMTable 6.6: Forecasting summaries across time using a training set of 25days. Summaries include the root mean squared forecast error (RMSFE),mean absolute error (MAE), the empirical coverage and the average lengthof the 95% credible intervals for the different methods considered: the dy-namic Bayesian ensemble model (DBEM) and the Bayesian model averaging(BMA).MethodRMSFE(◦C)MAE(◦C)EmpiricalcoverageAverage lengthof 95% CI (◦C)DBEM 3.647 2.995 0.842 10.812BMA 3.023 2.381 0.942 12.051Table 6.7: Forecasting summaries across the different months, using a train-ing set of 25 days. Summaries include the root mean squared forecast error(RMSFE), mean absolute error (MAE), the empirical coverage and the av-erage length of the 95% credible intervals (CI) for the different methodsconsidered: the dynamic Bayesian ensemble model and the Bayesian modelaveraging (BMA).MethodSelectedDaysRMSFE(◦C)MAE(◦C)EmpiricalcoverageAverage lengthof 95% CI (◦C)DBEMFebruaryMarchAprilMayJune2.6433.1773.5513.3204.8052.1032.5772.8222.6594.1100.9480.8780.8820.9080.69110.90910.14610.93711.59710.702BMAFebruaryMarchAprilMayJune2.5683.2453.5532.9152.7952.0382.5872.7362.2792.2170.9830.9130.9130.9610.95412.38511.34012.50613.15111.454856.6. Discussion and Future WorkFigure 6.7: Mean squared forecast error (◦C), MSFE, across space for fore-casts from February 20th to June 30th using the dynamic Bayesian ensemblemodel (DBEM) and the Bayesian model averaging (BMA). Both methodsassumed a training set of 25 days.6.6 Discussion and Future WorkIn this chapter, we introduced a dynamic alternative for the BEM whichallows us to obtain forecasts, despite the fact that the BEM is a spatialmodel. The DBEM is a promising methodology as it outperformed the BMAfor some time periods. A limitation of this methodology carries over from theprevious chapter and is due to its assumed stationary covariance structure,which ultimately contributed to the decrease in performance for the DBEMparticularly over warmer periods, where the discrepancy in temperaturemeasurements is larger. Following up on this would require handling non-stationarity in the INLA-SPDE modelling.866.6. Discussion and Future WorkAdditionally, we discussed that the DBEM methodology requires thecomputation of mixing weights, which are based on normalized marginallikelihoods across a training set. Difficulties were encountered in differenti-ating amongst these weights when at least one log-likelihood is significantlyhigher than the rest.To deal with this issue, we propose an alternative methodology that isthe subject of current research. We describe the proposed methodology inAlgorithm 2. The main difference is in the added step of first mixing theposteriors on the training set with equal weights, which is then viewed as aprior for the subsequent day. As a consequence, the mixing weights used toyield the “executive” posterior are based on individual marginal likelihoodsof each of the training days but considering the data at a future time. Sam-ples from this “executive” posterior are used to obtain an approximationof the predictive distribution at the subsequent time, and ultimately theforecasts.876.6. Discussion and Future WorkAlgorithm 2 Modified DBEM ForecasterLet i = 1, . . . , k index a set of training days.1. The first step is to initialize the weights, by setting α(0)i =1k , for eachi = 1, . . . , k. At this stage, no preference is given to any time point inthe training set.2. Obtain samples from pii(Ψ|Zˆi, Z˜1:p,i), i.e., the posterior for each train-ing day i, where Z˜1:p,i = (Z˜1,i, . . . , Z˜p,i), and Z˜p,i denotes the dataavailable for the p-th member of the ensemble at the i-th training day.These will now represent “experts” posteriors.3. A baseline “executive” prior for the subsequent day is based on aweighted average of the “experts” posteriors among the training dayspi(0)(Ψ|Zˆ1:k, Z˜1:p,1:k) =k∑i=1α(0)i pii(Ψ|Zˆi, Z˜1:p,i). (6.16)4. The next step will require samples from the “executive” posterior de-scribed below.pi(1)(Ψ|Zˆk+1, Z˜1:p,k+1)=f(Zˆk+1, Z˜1:p,k+1|Ψ)pi(0)(Ψ|Zˆ1:k, Z˜1:p,1:k)p(Zˆk+1, Z˜1:p,k+1)=k∑i=1α(0)i pi(Zˆk+1, Z˜1:p,k+1)p(Zˆk+1, Z˜1:p,k+1)f(Zˆk+1, Z˜1:p,k+1|Ψ)pii(Ψ|Zˆi, Z˜1:p,i)pi(Zˆk+1, Z˜1:p,k+1)=k∑i=1α(1)i pii(Ψ|Zˆk+1, Z˜1:p,k+1), (6.17)where the updated weights are given byα(1)i =α(0)i pi(Zˆk+1, Z˜1:p,k+1)∑ki=1 α(0)i pi(Zˆk+1, Z˜1:p,k+1). (6.18)5. Finally, the forecasts are given by:f(Zˆk+2|Z˜1:p,k+2)=∫f(Zk+2|Ψ, Zˆk+1, Z˜1:p,k+1)pi(1)(Ψ|Zˆk+1, Z˜1:p,k+1)dΨ (6.19)≈ 1LL∑l=1f(Zˆk+2|Ψ(l), Zˆk+1, Z˜1:p,k+1),where Ψ(l)1 , l = 1, . . . ,M correspond to samples frompi(1)(Ψ|Zˆk+1, Z˜1:p,k+1).88Chapter 7Determinantal PointProcessesIn this chapter, we provide an overview of determinantal point processes,including definitions and sampling strategies. The main purpose of thischapter is to motivate the potential use of these processes in the design ofmonitoring networks, and ultimately provide a background for Chapter 8.7.1 MotivationDeterminantal point processes (DPPs) are repulsion point processes andvery appealing in practice due to their exact inference properties. DPPs havebeen explored in random matrix theory since the 1960s. In physics, the Pauliexclusion principle states that two identical subatomic particles, referred toas fermions, cannot occupy the same quantum state simultaneously. Thishas motivated the use of DPPs to model fermions in thermal equilibrium,but with the name of fermion processes (Macchi, 1975). A probabilisticdescription of DPPs can be found in Hough et al. (2006) and Borodin (2009).More recently, DPPs have attracted attention in the machine learningand statistical communities. The work of Kulesza and Taskar (2012) pro-vides a thorough and comprehensible introduction to DPPs. They focusedon applications most relevant to the machine learning community, such assearch results and document summarization.An important characteristic of DPPs is that they assign higher proba-bility to sets of items that are diverse. Its name is due to the fact that thelikelihood of a DPP depends on the determinant of a kernel matrix thatdefines a global measure of similarity between pairs of items (Borodin and897.1. MotivationOlshanski, 2000).There has been an increasing interest in DPPs in the machine learningcommunity. For instance, Kulesza and Taskar (2011b) focused on structuredDPPs when modelling distributions over sets of structures, such as sets ofsequences, trees, or graphs; Kulesza and Taskar (2011a) on a maximum aposteriori inference-based strategy for learning parameters of a DPP, Af-fandi et al. (2012) on Markov DPPs, which are advantageous when interestis to sequentially select multiple diverse sets of items. Gillenwater et al.(2012) proposed an approximate optimization solution algorithm for find-ing the most likely configuration in the DPP modelling framework; Affandiet al. (2013) used a Nystro¨m approximation to project the kernel matrixinto a low-dimensional space and improve the feasibility of sampling DPPsin high-dimensional settings. Finally, Affandi et al. (2014) focused on ap-proximate inference for DPPs and the use of Bayesian methods to learn theirparameters.In the statistical literature, Lavancier et al. (2015) focused on the prob-abilistic properties of DPPs and developed parametric models for analyzingDPPs. Inference was likelihood-based and their methodology was applied tospatial point pattern data sets with the goal of modelling different degreesof repulsiveness. Another recent work in the statistical literature is thatof Shirota and Gelfand (2016), which focused on inference in a Bayesianframework via an approximate Bayesian computation approach.Our motivation is the potential use of these processes in the design ofmonitoring networks. It is reasonable to assume that if observations aremeasured in one given location, a designer would not be interested in ob-taining measurements at nearby locations of that site, unless there is specialinterest in the smoothness of the process or in the measurement error. Thisleads to the idea of spatial repulsion in the design context as it is expectedthat the study region is to be well covered. We explore the idea that not onlya DPP design may be spatially-balanced, but it can also provide a flexibleway of imposing diversity in the selection of locations based on additionalvariables that might be available. We develop these ideas in Chapter 8.For the purposes of using DPPs in the design of monitoring network907.2. Definitionscontext, we restrict ourselves to introducing DPPs on a discrete finite setY = {1, . . . , N}, as in Kulesza and Taskar (2012). We believe it is moreappropriate for our application as, in practice, the domain in which we areallowed to locate monitoring stations is often discretized, since we are usuallyrestricted by the accessibility for the potentially new sites. Moreover, it willbe more computationally efficient to deal with discretized spaces. For somebackground material on more general DPPs defined on a Borel set B ⊆ Rd,check Lavancier et al. (2015).The rest of this chapter is as follows. In Section 7.2 we provide some def-initions regarding DPPs on a discrete set, as well as an algorithm to samplefrom it. Finally, in Section 7.3, we describe the notion of k-DPPS, whichwill be essential for the monitoring of networks context seen in Chapter 8.7.2 DefinitionsA point process P on a discrete set Y = {1, . . . , N} is a probabilitymeasure on 2Y , the set of all possible subsets Y ⊆ Y. Suppose that eachelement i is included with, say, probability pi. In an independent pointprocess, the probability of each subset is thenP(Y ) =∏i∈Ypi∏i/∈Y(1− pi). (7.1)A point process P is called a determinantal point process if, when Y isa random subset drawn according to P, for every A ⊆ Y,P(A ⊆ Y) = det(KA), (7.2)where K is a real, symmetric N × N positive semidefinite matrix indexedby the elements of Y, and KA ≡ [Kij ]i,j∈A, such that det(K∅) = 1, where ∅denotes the empty set.Note that equation (7.2) defines marginal probabilities of inclusion forsubsets A. In particular, if one is interested in the probability of a particular917.2. Definitionsitem i being selected, i.e, when A = {i}, thenP(i ∈ Y) = det(Kii) = Kii. (7.3)Hence, the diagonal entries of K give the marginal probabilities of inclusionfor individual elements.DPPs are parametrized by this marginal kernel matrix K which definesa global measure of similarity between pairs of items, so that more similaritems are less likely to co-occur. Thus, a DPP assigns higher probability tosets of items that are diverse. Note that in the case when A = {i, j},P(i, j ∈ Y) = det(Kii KijKji Kjj)= KiiKjj −KijKji= P(i ∈ Y)P(j ∈ Y)−K2ij , (7.4)so large values of Kij imply that i and j tend not to co-occur.In order to fully characterize a DPP, the eigenvalues of the marginalkernel K need to be bounded above by one. Hence, in practice, it is moreconvenient to characterize DPPs via L-ensembles (Borodin and Rains, 2005;Kulesza and Taskar, 2012), which directly define the probability of observingeach subset of Y. An L-ensemble defines a DPP not through the marginalkernel K, but via a real positive semidefinite matrix L, hereby referred toas L-ensemble, indexed by the elements of Y, such that:PL(Y = Y ) ∝ det(LY ). (7.5)Any positive semidefinite matrix defines a DPP. The normalization con-stant is available in closed form since∑Y⊆Ydet(LY ) = det(L+ I), (7.6)927.2. Definitionswhere I is the N ×N identity matrix. Thus,PL(Y ) = PL(Y = Y ) = det(LY )det(L+ I). (7.7)In contrast to the previous marginal probabilities of inclusion for subsetsA, (7.5) directly considers the probability of exactly observing all possiblerealizations of Y .That being said, alternative representations of DPPs are done eitherthrough the marginal kernel K or the L-ensemble. It is possible to translatebetween them (Macchi, 1975) as followsK = (L+ I)−1L. (7.8)The algorithm to sample from a DPP is based on an orthonormal eigen-decomposition of the marginal kernel, which can be obtained through theeigendecomposition of the positive semidefinite matrix L:L =N∑i=1λnvnv>n (7.9)and a rescaling of eigenvaluesK =N∑i=1λn1 + λnvnv>n . (7.10)In the above, {vn, λn} denote the eigenvectors and eigenvalues of L.Algorithm 3 summarizes how to sample from a DPP, as originally de-scribed in Kulesza and Taskar (2012). In the algorithm, the ei vector denotesa zero vector with 1 in its i-th entry. Note that the DPP sampling algorithmfirst entails sampling a subset of eigenvectors of the L-ensemble, where theirassociated eigenvalues govern their probabilities of selection. Note that thecardinality of the set of selected eigenvectors selected is unknown in advance,which can simply be viewed as sum of N independent Bernoulli random vari-ables.937.2. DefinitionsAlgorithm 3 Sampling from a DPPInput: {vn, λn} eigenvectors and eigenvalues of LJ ← ∅for n = 1, . . . , N doJ ← J ∪ {n} with probability λn1+λnend forV ← {vn}n∈JY ← ∅while |V | > 0 doSelect yi from Y with probability given by 1|V |∑v∈V (v>ei)2Y ← Y ∪ yiV ← V⊥, an orthonormal basis of the subspace of V orthogonal to eiend whileOutput: YAn approximation to a Poisson process in a plane where points are sam-pled independently seen in Figure 7.1 is contrasted with a simulation froma DPP using a Gaussian kernel. The DPP simulation displays much lessclumping, providing a better coverage of that region.lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllDPPllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllll llllll llllllllllllllllllllllllllllllllllllllllllllllllllllllIndependent Point ProcessFigure 7.1: A set of points in the plane drawn from (left) a DPP character-ized by an L-ensemble with Gaussian kernel and (right) the same numberof points sampled independently. Note the clumping associated to the ran-domly sampled points in contrast to the more spatially balanced set of pointssampled from the DPP.947.3. k-DPPsBefore we describe DPPs with fixed cardinality, we need to introducean important way of expressing a DPP as a mixture of elementary DPPs(Kulesza and Taskar, 2012), also commonly known as determinantal projec-tion processes. Elementary DPPs, denoted as PV , are a particular type ofDPP where every eigenvalue of its marginal kernel is either zero or one. Itsmarginal kernel can thus be decomposed asKV =∑v∈Vvv>, (7.11)where V is a set of orthonormal vectors. From this decomposition, notethat elementary DPPs have their cardinality fixed as the cardinality of V .Furthermore, denoting VJ as {vn}n∈J , the mixture is (Kulesza and Taskar,2012)PL(Y ) = 1det(L+ I)∑J⊆{1,...,N}PVj (Y )∏n∈Jλn. (7.12)The notion of elementary DPPs will be particularly useful to define thenormalization constant under fixed cardinality DPPs, which is introducedin the following Section 7.3.7.3 k-DPPsA k-DPP on a discrete set Y = {1, . . . , N} is simply a DPP with fixedcardinality k. The modelling is thus restricted on which elements of size kare part of a random subset of Y. This notion is essential for the monitoringof networks context seen in Chapter 8. In practice, the number of monitoringsites that one can afford to sample from is usually known in advance. Onthe other hand, standard DPPs models may yield subsets of any size.A k-DPP can be obtained by conditioning a standard DPP on the eventthat the set Y has cardinality k, as follows957.3. k-DPPsPkL(Y ) = P(Y = Y | |Y | = k) =det(LY )∑|Y ′|=k det(LY ′), (7.13)where |Y | denotes the cardinality of Y and L is a positive semidefinite matrixindexed by the elements of Y.From the notion of elementary DPPs, note that the normalization con-stant of the k-DPP is given by∑|Y ′|=kdet(LY ′) = det(L+ I)∑|Y ′|=kPL(Y ′) (7.14)=∑|Y ′|=k∑J⊆{1,...,N}PVj (Y ′)∏n∈Jλn (7.15)=∑J⊆{1,...,N}|J |=k∏n∈Jλn (7.16)≡ ENk , (7.17)which can be computed recursively noting thatENk = EN−1k + λNEN−1k−1 , (7.18)where λ1, . . . , λN are the eigenvalues of the L-ensemble.Algorithm 4 summarizes how to sample from a k-DPP. The main differ-ence of the k-DPP algorithm is in its first step, where this time the subsetof eigenvectors is sampled with a fixed cardinality k.967.3. k-DPPsAlgorithm 4 Sampling from a k-DPPInput: size k and {vn, λn} eigenvectors and eigenvalues of LJ ← ∅Compute En1 , . . . , Enk , for n = 0, . . . , Nfor n = N, . . . , 1 doSample u ∼ U [0, 1]if u <λnEn−1k−1EnkthenJ ← J ∪ {n}k ← k − 1if k = 0 thenbreakend ifend ifend forV ← {vn}n∈JY ← ∅while |V | > 0 doSelect yi from Y with probability given by 1|V |∑v∈V (v>ei)2Y ← Y ∪ {yi}V ← V⊥, an orthonormal basis for the subspace of V orthogonal to eiend whileOutput: Y97Chapter 8Design of MonitoringNetworks8.1 Importance of Designing MonitoringNetworksFrom an environmental perspective, monitoring networks play an impor-tant role in surveillance of environmental processes which may impact eitherhuman health or nature. The measurements obtained from the monitoringof temperature, precipitation, and pollutants within a region, for instance,provide critical data for both scientists and governmental agencies that canbe used for many essential objectives, such as• Are the measurements obtained above the regulatory limits?• Is there a trend in a given health outcome that could potentially beassociated with an environmental hazard?With the advances in geographic information systems (GIS) (Goodchildand Haining, 2004; Murray, 2010), modern agriculture now relies on inter-active tools that provide climate and soil information. Such information isvaluable not only to maximize yield but also for the development of moreenvironmentally friendly practices.However, with important objectives come interesting challenges. Manydesign strategies have been developed. The different strategies are based onproviding a good spatial coverage of the domain, by ensuring randomizationand selecting locations at random given a certain probability of inclusion;or even depending on a model used to learn about a particular underlying988.1. Importance of Designing Monitoring Networksphenomenon. A brief description of such methods can be found in Section8.3.Moreover, it is important to note that a monitoring network need notbe static. Not only may a network’s purpose change over time, regulatorybudgets may also allow new sites to be added or may impose a reduction inthe network.Most importantly, we would like to advocate the need for “mobile friendly”design strategies, i.e. approaches suitable for the design of dynamic or mobilenetworks. In the agricultural context, weather conditions or other suddenemerging risks may require a close surveillance of the agriculture fields, andtheir design purpose may quickly and dramatically change over time. Thesurveillance itself can be done by obtaining measurements at certain keylocations in the field using portable devices (Tothill, 2001; Rodriguez-Mozazet al., 2006), such as nutrient meters for obtaining soil macro-nutrient in-formation. Depending on how these new conditions are expected to impacttheir crop, there may be a need to dynamically choose new locations onwhere to measure.In practice, however, stations are often preferentially deployed, possiblyrestricted to the accessibility of the potential new sites and budgetary con-straints. Diggle et al. (2010) drew attention to inference under this scenario.Preferential sampling occurs when the sampling locations are stochasticallydependent on the underlying process. A clear example is in air pollutionmonitoring. Data are sometimes collected at locations where it is anticipatedthat the outcome will have a large or small value (Guttorp and Sampson,2010).Diggle et al. (2010) pointed out that ignoring preferential sampling ingeostatistical models can lead to misleading inferences. In order to adjustfor potential biases, they proposed a shared latent process model for geosta-tistical modelling with preferentially sampled data. This issue is similarlydiscussed in the Bayesian framework by Pati et al. (2011). Gelfand et al.(2012) suggested a simulation-based approach to assess the effects of pref-erential sampling based on information about underlying process and otherfactors known drive that process. Also, Zidek et al. (2014) suggested a bias998.2. Contributionscorrection approach that is able to accommodate changes to the networkdue to preferential sampling over time. Ferreira and Gamerman (2015) usean approach based on utility functions in order to analyze the influence ofpreferential sampling in situations where the goal is to optimize an objectivefunction.The cited recent works have taken a negative view towards preferentialsampling. Though it is clear that the effects of preferential sampling on bothestimation and prediction should not be disregarded, we explore how we canobtain balanced designs yielding a high-quality yet diverse set of monitors.We introduce a flexible design strategy based on k-DPPs (Section 7.3) thatis able to yield a spatially balanced design by imposing repulsion on the dis-tances between the candidate locations and hence avoid spatial clumping,but it also has the ability to assess similarity between the potential loca-tions should there be extra sources of information related to the underlyingprocess of interest.8.2 ContributionsIn this chapter, our main contribution is the introduction of a novelflexible monitoring network design strategy based on k-DPPs. This strategyis able to handle both designing and redesigning a monitoring network. Thek-DPP design can yield spatially balanced designs by imposing repulsion onthe distances between the candidate locations. It is also possible to assesssimilarity and impose repulsion between the potential locations based onextra sources of information that might be related to the phenomenon ofinterest.Since its essence is that of a randomized design, we explore its potentialuse as a randomized alternative to space-filling designs. An overview ofspace-filling designs can be found in Section 8.3.1.Additionally, the k-DPP optimal design objective is remarkably similarto that of entropy design, which is reviewed in Section 8.3.2. Since the op-timization for entropy designs is a NP-hard problem (Ko et al., 1995), weexplore a sampling design for approximating the entropy design optimal so-1008.3. A Review of Design Strategieslution. This strategy explores the stochasticity of k-DPP and the simplicityof obtaining samples from this process.8.3 A Review of Design StrategiesDesign strategies are often divided into the following groups (Zidek andZimmerman, 2010):• Geometry-based designs: Designs of this type are based solely ongeometric considerations and, in general, their intent is to provide agood coverage of the design region. An example is the space-fillingdesign (Cox et al., 1997; Nychka et al., 1997; Royle and Nychka, 1998;Van Groenigen et al., 2000), which we describe in Section 8.3.1. Thesedesigns are particularly useful for exploratory purposes (Mu¨ller, 2005).However, as a non-randomized design strategy, it may be prone topotential sampling biases.• Probability-based designs: These designs are often based on ran-domly sampling locations from the design region. Although they canavoid potential sampling biases due to the randomization, the sampledpoints may be clumped in some particular areas of the design region.Considering that nearby locations tend to share similar characteris-tics, sampling locations too close to each other may not bring valuableinformation about the process of interest.• Model-based designs: As pointed out by Zidek and Zimmerman(2010), environmental monitoring networks are usually based on model-based designs. These designs often optimize a certain characteristicabout the process of interest, such as the reduction of uncertaintyabout model parameters or of the uncertainty about the prediction atunmeasured locations.In the following subsections, we briefly expand on the descriptions forspace-filling and entropy-based designs.1018.3. A Review of Design Strategies8.3.1 Space-Filling DesignsSpace-filling (SF) spatial designs aim at providing a good coverage ofthe design region based on a criterion that is purely geometric-based. SFdesigns thus make use of geometric measures to assess the coverage qual-ity. Outside of the monitoring network context, SF designs have also beenquite extensively explored in computer experiments as a way of selectinginputs (Sacks et al., 1989; Haaland et al., 1994). A survey of SF designs forcomputer experiments can be found in Pronzato and Mu¨ller (2012).Denote by C a set of candidate points, usually based on a fine discretiza-tion of the design region, and let D ⊂ C denote the set of k design points.A metric for the distance of any point s and a particular design D is givenby (Nychka et al., 1997; Royle and Nychka, 1998):dp(s,D) =(∑u∈D||s− u||p)1/p. (8.1)This metric determines how well the design covers the point s. Their overallcoverage criterion is based on minimizing averages of the coverage metricfor every candidate point, given byCp,q(D) =(∑u∈Cdp(s,D)q)1/q, (8.2)for all D ⊂ C. Royle and Nychka (1998) describe a suboptimal solutionbased on a point swapping algorithm. The algorithm uses random startingconfigurations and aim at decreasing the coverage criterion by swappingcandidate and design points until convergence.An advantage of this method is its computational simplicity, which iscurrently implemented in the R package fields (Nychka et al., 2015). Royleand Nychka (1998) point out that the resulting designs are nearly optimalfor spatial prediction purposes.A generalization of the space-filling design is that of Van Groenigen et al.(2000) who propose a weighted means of shortest distances criterion based1028.3. A Review of Design Strategieson minimizing ∫Ad(s)w(s)ds, (8.3)where A ⊆ R2, w(s) is a weight function, and the distance between s ∈ Aand its closest design points is given byd(s) = mini||s− xi||. (8.4)However, since these design strategies are geometry-based, they do notallow inclusion of other sources of information, and are highly dependent onthe coverage criterion. As non-randomized design strategies, they may beprone to potential sampling biases.8.3.2 Entropy-Based DesignsThe uncertainty about Y can be represented by the the entropy of itsdistributionH(Y) = EY[− log(f(Y)h(Y))], (8.5)where h(Y) denotes a reference density, which need not be integrable, al-lowing the entropy to be invariant under one-to-one transformations of thescale of Y (Jaynes, 1963).In hierarchical models for environmental processes, Y is usually definedconditionally on some hyperparameters, which we denote by Ψ. Consideringminimizing uncertainty about Ψ as another design objective (Caselton et al.,1992), the total entropy can be defined as1038.3. A Review of Design StrategiesH(Y,Ψ) = E[− log(f(Y,Ψ)hY,Ψ(Y,Ψ))](8.6)= E[− log(f(Y|Ψ)f(Ψ)hY(Y)hΨ(Ψ))](8.7)= E[− log(f(Y|Ψ)hY(Y))]+ E[− log(f(Ψ)hΨ(Ψ))](8.8)= H(Y|Ψ) +H(Ψ). (8.9)Similarly, we can define the total entropy in the context of the designof monitoring networks by first augmenting the data into Y = (Y(u),Y(g)),where Y(u) denotes the measurements at potential sites, currently ungauged,and Y(g) relates to the existing sites, referred to as gauged locations. Theresult is:H(Y(u),Y(g),Ψ) = E[− log(f(Y(u),Y(g),Ψ)hY(u),Y(g),Ψ(Y(u),Y(g),Ψ))]= E[− log(f(Y(u)|Y(g),Ψ)× f(Ψ|Y(g))× f(Y(g))hY(u)(Y(u))× hΨ(Ψ)× hY(g)(Y(g)))]= E[− log(f(Y(u)|Y(g),Ψ)hY(u)(Y(u)))]+ E[− log(f(Ψ|Y(g))hΨ(Ψ))]+ E[− log(f(Y(g))hY(g)(Y(g)))]= H(Y(u)|Y(g),Ψ) +H(Ψ|Y(g))︸ ︷︷ ︸H(Y(u),Ψ|Y(g))+H(Y(g)). (8.10)The design criterion is based on minimizing H(Y(u),Ψ|Y(g)), whichmeasures the uncertainty about Y(u) and Ψ after Y(g) is observed. Sincethe total entropy H(Y(u),Y(g),Ψ) is fixed, an equivalent criterion is tomaximize H(Y(g)). Moreover, the same criterion of maximizing H(Y(g))would be similarly obtained had we decomposed H(Y(u),Y(g)) instead ofH(Y(u),Y(g),Ψ).1048.3. A Review of Design StrategiesEntropy of Multivariate Normal DistributionsRecall that the log-density of a p-dimensional multivariate normal dis-tribution with mean µ and covariance Σ is given bylog f(Y|µ,Σ) = −p2log(2pi)− 12log det(Σ)− 12(Y − µ)>Σ−1(Y − µ),where det(·) denotes matrix determinant. Without loss of generality, assum-ing a reference measure hY(Y) = 1, thenH(Y) = EY[p2log(2pi) +12log det(Σ) +12(Y − µ)>Σ−1(Y − µ)]=p2log(2pi) +12log det(Σ) +12EY[(Y − µ)>Σ−1(Y − µ)]=p2log(2pi) +12log det(Σ) +p2=p2[log(2pi) + 1] +12log det(Σ). (8.11)Now suppose that we can partition Y as (Y(u),Y(g)). Denoting the ma-trices Σgg and Σuu as the covariance matrices of Y(g) and Y(u), respectively,and Σug the cross-covariance, we can partition Σ asΣ =(Σuu ΣugΣgu Σgg). (8.12)Recall from the properties of the multivariate normal that Y(u)|Y(g) ∼N (µu|g,Σu|g), where Σu|g = Σuu −ΣugΣ−1gg Σgu. Hence, the entropy of Ycan be written asH(Y(u),Y(g)) = H(Y(u)|Y(g)) +H(Y(g)) (8.13)c∝ 12log det(Σu|g) +12log det(Σgg), (8.14)wherec∝ denotes proportionality up to additive constants. The entropycriterion is thus to minimize log det(Σu|g), or equivalently, to maximizelog det(Σgg).1058.3. A Review of Design StrategiesIn the context of monitoring networks, say, when the goal is to augmentthe network, the objective is to find a subset of u+ sites among the u un-gauged ones (also referred to as candidate sites, where C denote the set ofcandidate points) to add to the existing network. We denote the remainingsites that are not the selected as u−. The resulting network will then consistof (Y(u+),Y(g)).Note that Y(u) can be partitioned into (Y(u+),Y(u−)). Proceeding sim-ilarly as above, note thatH(Y(u+),Y(u−),Y(g)) = H(Y(u+),Y(u−)|Y(g)) +H(Y(g))= H(Y(u−)|Y(u+),Y(g)) +H(Y(u+),Y(g)).Since the total entropy H(Y(u+),Y(u−),Y(g)) is fixed, it will be optimal toaugment the network with the u+ sites so as to maximize H(Y(u+),Y(g)).Considering thatH(Y(u+),Y(g)) = H(Y(u+)|Y(g)) +H(Y(g)) (8.15)c∝ 12log det(Σu+|g) +12log det(Σgg), (8.16)it will be optimal to maximize 12 log det(Σu+|g). The entropy criterion foraugmenting the network is thusarg maxu+⊂C12log |Σu|g|. (8.17)Entropy of Multivariate t-DistributionsThe entropy of a multivariate t-distribution can be obtained as a scalemixture of a multivariate normal and an inverted Wishart distribution (Casel-ton et al., 1992; Guttorp et al., 1993). Let Y be a g-dimensional randomvariable such asY|Σ ∼ N (µ,Σ) (8.18)Σ|Ξ, δ ∼ W−1(Ξ, δ). (8.19)1068.3. A Review of Design StrategiesHence,Y ∼ t(µ,Ξδ − g + 1 , δ − g + 1). (8.20)Conditionally on the hyperparameters Ξ and δ, note that the entropy ofY is defined asH(Y) = H(Y|Σ) +H(Σ)−H(Σ|Y), (8.21)since the joint entropy H(Y,Σ) can be decomposed in the following twodifferent ways:H(Y,Σ) = H(Y|Σ) +H(Σ) (8.22)H(Y,Σ) = H(Σ|Y) +H(Y). (8.23)Assuming the following reference measure h(Y,Σ) = |Σ|−(g+1)/2 as inCaselton et al. (1992), we will now describe each component of the entropyof Y.Since Y|Σ is multivariate normal, using results from Section 8.3.2, andsince ΞΣ−1 ∼ W(I, δ), thenH(Y|Σ) = 12[log(2pi) + 1] +12E[log det(Σ) | Ξ]=12[log(2pi) + 1] +12E[log det(ΣΞ−1Ξ) | Ξ]=12[log(2pi) + 1] +12E[log det(ΣΞ−1) | Ξ] + 12log det(Ξ)c∝ 12log det(Ξ). (8.24)1078.3. A Review of Design StrategiesNow notice that the other two components of the entropy of Y in (8.21)are in fact constants. Recall that if Σ|Ξ, δ ∼ W−1(Ξ, δ) then its density canbe written asf(Σ) ∝ det(Ξ)δ/2 det(Σ)− δ+p+12 exp{−12tr(ΞΣ−1)}. (8.25)Due to our choice of reference measure, conditionally on the hyperpa-rameters Ξ and δ, the entropy of Σ isH(Σ) = E[− log(f(Σ)h(Σ))]c∝ −δ2log det(Ξ) +δ2E(log det(Σ) | Ξ) + 12E[tr(ΞΣ−1) | Ξ]= −δ2log det(Ξ) +δ2E[log det(ΣΞ−1Ξ) | Ξ] + 12E[tr(ΞΣ−1) | Ξ]= −δ2log det(Ξ) +δ2E[log det(ΣΞ−1) | Ξ] + δ2log det(Ξ)+12E[tr(ΞΣ−1) | Ξ]c∝ c1(p, δ), (8.26)where c1(p, δ) denotes a constant that depends on p and δ. This is due tothe fact that ΞΣ−1 ∼ W(I, δ).Similarly, conditionally on the hyperparameters Ξ and δ, we will derivethe entropy of Σ|Y. Firstly, note that the marginal posterior of Σ is givenbyΣ|Y ∼ W−1(Ξ + YY>, δ + 1). (8.27)Recall that the reference measure we are using is h(Y,Σ) = h(Y)h(Σ) =|Σ|−(g+1)/2. Hence, conditionally on the hyperparameters Ξ and δ, the1088.3. A Review of Design Strategiesentropy of Σ|Y is given byH(Σ | Y) = E[− log(f(Σ|Y)h(Σ))]c∝ −δ + 12E[log det(Ξ + YY>) | Ξ]+δ + 12E[log det(Σ) | Ξ] + 12E[tr((Ξ + YY>)Σ−1) | Ξ]= −δ + 12log det(Ξ)− δ + 12E[log(Y>Ξ−1Y) | Ξ]+δ + 12E[log det(ΣΞ−1) | Ξ] + δ + 12log det(Ξ)+12E[tr((Ξ + YY>)Σ−1) | Ξ]c∝ −δ + 12E[log(Y>Ξ−1Y) | Ξ] + δ + 12E[log det(ΣΞ−1) | Ξ]+12E[tr(ΞΣ−1) | Ξ] + 12E[tr(Y>Σ−1Y) | Ξ]c∝ c2(p, δ), (8.28)where c2(p, δ) denotes a constant that depends on p and δ. This is due tothe fact thatdet(Ξ + YY>) = det(Ξ)[Y>Ξ−1Y] (8.29)tr((Ξ + YY>)Σ−1) = tr(ΞΣ−1) + tr(YΣ−1Y>), (8.30)ΞΣ−1 ∼ W(I, δ), and that given Ξ, Y>Ξ−1Y follows a F -distribution withdegrees of freedom depending on p and δ.Finally, combining (8.24), (8.26) and (8.28), we conclude that the entropyfor a t(µ, Ξδ−g+1 , δ − g + 1)isH(Y)c∝ 12log det(Ξ). (8.31)1098.4. k-DPP Design8.4 k-DPP DesignWe propose a flexible monitoring network design strategy based on ak-DPP. We use the fact that not only could a k-DPP design be spatially-balanced, but could also provide a flexible way of imposing diversity in theselection of locations based on additional variables that might be available.The methodology depends on the well-established theory of k-DPPs, thoughhere it is described in a design of monitoring network setting.Definition 8.5. A k-DPP design is characterized by an L-ensemble, denotedby L, i.e. any positive semidefinite matrix indexed by a set of n candidatelocations, such that n ≥ k. The design objective is based on the optimalconfiguration under a cardinality constraint, as followsarg maxY⊆Y, |Y |=kdet(LY ), (8.32)where Y = {1, . . . , n} and LY ≡ [Lij ]i,j∈Y .Note that the methodology easily accommodates the objective of reduc-ing the number of sites in a monitoring network. In this case, the elementsof L should be indexed by the n set of existing monitoring locations. Ifthe goal is to select k stations for reduction, then the cardinality constraintin the design objective should be the cardinality of the complementary set,n− k, as followsarg maxY⊆Y, |Y |=n−kdet(LY ), (8.33)where Y = {1, . . . , n}. This allows for the selection of the optimum n − kset of monitors that will remain in the network, while the other k will beshut down.Furthermore, the methodology can also be adapted for use when the goalis to augment the network. Let C be the set of m potential new locations andG the set of n existing monitors. Let Y index the elements of the monitoringnetwork which includes all the n existing monitors as well as the m candidatemonitors.1108.4. k-DPP DesignNote that the probabilities of selection need to be described conditionallyon the existing monitors. We describe these conditional probabilities asfollows.PL(Y = G ∪ C|G ⊆ Y) = PL(Y = G ∪ C)PL(G ⊆ Y) (8.34)∝ PL(Y = G ∪ C) (8.35)∝ det(LGC ), (8.36)where LG is an L-ensemble indexed by the elements of Y − G, and can beobtained as follows (Borodin and Rains, 2005; Kulesza and Taskar, 2012)LG = [(L+ IG¯)−1]G¯ − I, (8.37)where IG¯ is the matrix with ones in the diagonal entries of the elementsof Y − G, and L is an L-ensemble indexed by the elements of Y. Recallthat the subscripts denote matrix restriction. For instance, LGC is simplythe restricted LG , i.e. selecting the rows and columns associated with theelements of candidate set C. Denoting the set of u selected monitors as U,the augmentation design objective can be then described asarg maxU⊆C, |U|=mdet(LGU). (8.38)One of the limitations of the k-DPP design is that the described opti-mization problems are NP-hard (Ko et al., 1995; Kulesza and Taskar, 2012).In the following Section 8.5.1, we describe a sampling strategy tool basedon the k-DPP design that can also be used to select a subset of pointsamong a candidate set of points, which could ultimately be used to obtaina suboptimal approximation.8.5.1 k-DPP Sampling Design StrategyIn order to handle the NP-hard optimization problem, we propose a sam-pling design strategy based on k-DPPs. This sampling design strategy takes1118.4. k-DPP Designadvantage of the ease of sampling from a k-DPP as described in Algorithm4.Definition 8.6. A k-DPP sampling design is a probability-based design thatentails sampling from a k-DPP characterized by an L-ensemble, denoted byL, i.e. any positive semidefinite matrix indexed by a set of n candidatelocations. Every subset of k locations among the candidate points has theopportunity to be sampled with probabilityPkL(Y ) =det(LY )∑|Y ′|=k det(LY ′).A remarkable characteristic of a k-DPP sampling design is that its method-ology is flexible. The design strategies described are governed by the choiceof the L-ensemble, which brings some flexibility for the scope of allowabledesign strategies.Spatially-balanced k-DPP DesignsA spatially-balanced k-DPP design can be easily constructed as an al-ternative to space-filling designs when the only information available are thelocations of the potential new monitors.Result 8.6.1. A spatially-balanced k-DPP design can be used as an alter-native to space-filling designs when the only information available are thelocations of the potential new monitors. It suffices to construct a positivesemidefinite matrix L whose entries depend on a measure of distance betweenthe candidate locations.A simple way to construct an L-ensemble of such type is by using aGaussian kernel, as described below. Note that the positivity of the kernelguarantees that L is indeed a positive semidefinite matrix.• Gaussian radial basis kernel: The (i, j) entries of the L are givenby Li,j = exp{−||si−sj ||2/2σ2}, where ||si−sj || denotes the Euclideandistance between locations si and sj .1128.4. k-DPP DesignIn Section 8.7, we illustrate how spatially-balanced k-DPP designs canbe used as an alternative to space-filling designs.Inclusion of Extra Sources of Information in a k-DPP DesignIn the more general case, a k-DPP sampling design strategy could alsoallow for the inclusion of other potential extra sources of information. Inthe context of environmental monitoring networks, this could include to-pographic or demographic features. For instance, if our goal is to designa monitoring network for a given pollutant, we may consider demographicfeatures correlated with the outcome of interest, such as population size, todiversify the location of the monitors. By using a k-DPP strategy, we wouldbe more likely to choose locations in highly populated areas as well as notso populated ones.Assuming that there exists p standardized features available about agiven location si, such that (f(1)(si), . . . , f(p)(si)) is associated to si, thenLi,j = exp{−p∑l=1||f (l)(si)− f (l)(sj)||22σ2}. (8.39)We find this idea somewhat similar to Schmidt et al. (2011), where theirgoal was to handle nonstationarity by including the effect of covariates in thecovariance structure of spatial processes. In this work, our aim is to selecta diverse set of sampled design points by also taking into consideration theinformation available at the candidate points.Furthermore, one may use an empirical description of the data availableand define L as a sample covariance matrix.Approximation to the Entropy-based Design Objective using ak-DPP Sampling Design StrategyAnother remarkable characteristic of a k-DPP design is its strong similar-ity to the entropy-based design objective in the Gaussian case, as discussedin the following result.1138.7. Comparing k-DPP and SF Sampling DesignsResult 8.6.2. A k-DPP sampling design characterized by an L-ensemblegiven by the predictive covariance structure of the potential new monitorsgiven the existing monitors can be viewed as a randomized version of theentropy design in the Gaussian case.In Section 8.8, we illustrate how a k-DPP sampling design strategy canbe used as an approximation for the entropy-based designs for monitoringtemperature fields.8.7 Comparing k-DPP and SF Sampling DesignsExample 1. Beilshmiedia pendula trees in Barro IslandLet us consider as an example, the locations of Beilshmiedia pendulatrees and elevation for a subset of an original survey plot in Barro Island.The data set bei is available for download in the spatstat R package (Bad-deley and Turner, 2005; Baddeley et al., 2015). Figure 8.1 illustrates a totalof 357 potential trees to be selected from. The objective is to select a subsetof 20 of them.Figure 8.1: Tropical rainforest data. Locations of Beilshmiedia pendula treesand elevation (metres above sea level) in the [700, 1000] × [0, 200] metreswindow of a survey plot in Barro Island. Coloured background correspondsto the variation of elevation in that window, as seen in the scale on the rightof the plot. Data available from spatstat R package.Figure 8.2 illustrates the selected trees using the space-filling method1148.7. Comparing k-DPP and SF Sampling Designsdescribed in Section 8.3.1 as well as one based on a 20-DPP design. The20-DPP design was characterized by an L-ensemble using a Gaussian kerneldepended on both the locations and the elevation at all the potential trees,as in (8.39). Hence the k-DPP diversity was quantified by both the locationsand the elevations. For illustrative purposes, we have assumed an arbitraryvariance σ2 = 0.5.The space-filling design yielded a more spatially balanced design thanthe 20-DPP, as indicated in Figure 8.3. Nevertheless, the DPP design is notas spatially clustered as the candidate locations.Note that even though our interest might be in a spatially-balanceddesign in order to better represent the study region, the DPP design willalso “penalize” distant stations that are too much alike, hence also imposingdiversity with respect to elevation. Here, we use the “penalization” term toreflect reduced probability of selection.(a) Space-Filling (b) 20-DPPFigure 8.2: Locations of 20 Beilshmiedia pendula trees selected via a space-filling and a 20-DPP design strategies. Coloured background corresponds tothe variation of elevation (metres above sea level).1158.7. Comparing k-DPP and SF Sampling Designs0 10 20 30 40 50050001000015000Distances (metres)Estimated Ripley's K(a) All candidates0 10 20 30 40 50050001000015000Distances (metres)Estimated Ripley's K(b) Space-Filling0 10 20 30 40 50050001000015000Distances (metres)Estimated Ripley's K(c) 20-DPPFigure 8.3: Empirical (solid line) and theoretical (dashed line) Ripley’s K.Note that the SF design shows more spatial regularity than the DPP design.Example 2. Estimation Assessment with Artificial DataIn this study, we simulate a realization of a Mate´rn random field withmean zero, partial sill σ2 = 4, range φ = 1 and smoothness ν = 2, asillustrated in Figure 8.4.1168.7. Comparing k-DPP and SF Sampling Designs0 2 4 6 8 100246810−4−202Figure 8.4: Realization of a Mate´rn random field with mean zero, partialsill σ2 = 4, range φ = 1 and smoothness ν = 2, in a [0, 10]× [0, 10] domain.Coloured background corresponds to the observed values of the field (nounits associated to them), as seen in the scale on the right of the plot.For the purposes of this simulation study, the realization was assumedto be an artificial true underlying field. We then repeatedly selected 40locations using three design strategies: 40-DPP, via random uniform selec-tion, and via a space-filling design. The 40-DPP was characterized using aGaussian kernel with fixed variability of 0.5. No other source of informationbut the locations of the potential sites was assumed for any of the strategiesconsidered. The potential sites consisted of a 40× 40 fine grid over the spa-tial domain. Here, the objective is to compare the two spatially-balanceddesign strategies.Figure 8.5 illustrates an example of the sampling locations using thedifferent design strategies. Note that the space-filling design is not a ran-domized design strategy, so the variation in sampling locations is due tovariation in variability in the start configuration points for the point swapalgorithm. Note that these sampling locations would then emulate the en-vironmental monitors and the observed value of the Mate´rn random field atthose locations would reflect the data available.In this study, we assumed no measurement error. The basic geostatisticalmodel assumed that for each location si wasY (si) = µ+ η(si), (8.40)1178.7. Comparing k-DPP and SF Sampling Designsi = 1, . . . , 40, η(si) is a second-order stationary process with zero mean andpartial sill σ2, and Mate´rn correlation function with fixed smoothness ν = 2.(a) 40-DPP (b) Random (c) Space-FillingFigure 8.5: Example of sampling locations using a 40-DPP design, random(uniform) selection, and a space-filling design.We then proceeded with estimating the model parameters using theINLA method, as described in Section 3.2. To complete model specification,we assumed the following independent prior distributions for Ψ = (µ, σ2, φ):µ ∼ N (0, 100) (8.41)σ−2 ∼ Gamma(1, 0.01) (8.42)φ ∼ Gamma(1, 0.01). (8.43)These are fairly vague prior distributions. The notation above N denotes anormal with given mean and variance parameters.Figure 8.6 contains the box-plots of the posterior means for these param-eters, with posterior variability described in Figure 8.7. We have observedslightly more uncertainty a posteriori for the mean and partial sill param-eters. On the other hand, it can be seen that for the vast majority of thesimulations, the SF underestimated the true mean. The estimation perfor-mance of the DPP and SF design about the partial sill and range parametersseem comparable. When the main interest is in spatial regularity, in termsof estimation, we observe that the SF and 40-DPP design yield comparabledesign strategies.1188.7. Comparing k-DPP and SF Sampling Designslll−0.40.00.4DPP Random SFSampling design strategiesPosterior Means of µ(a) Meanlllll2468DPP Random SFSampling design strategiesPosterior Means of σ2(b) Partial sillllllllllllllllllllllll123DPP Random SFSampling design strategiesPosterior Means of φ(c) RangeFigure 8.6: Box-plots of posterior means for the model parameters Ψ =(µ, σ2, φ) after repeatedly selecting 40 locations using three different strate-gies: a 40-DPP with a Gaussian kernel, random uniform selection, and aspace-filling design. This process was repeated 100 times. The red horizon-tal lines represent the true values of the parameters.1198.7. Comparing k-DPP and SF Sampling Designsllll0.300.350.400.45DPP Random SFSampling design strategiesPosterior Standard Deviations of µ(a) Meanlllll2468DPP Random SFSampling design strategiesPosterior Standard Deviations of σ2(b) Partial sillllllllllllllllllllllll123DPP Random SFSampling design strategiesPosterior Standard Deviations of φ(c) RangeFigure 8.7: Box-plots of posterior standard deviations for the model pa-rameters Ψ = (µ, σ2, φ) after repeatedly selecting 40 locations using threedifferent strategies: a 40-DPP with a Gaussian kernel, random uniform se-lection, and a space-filling design. This process was repeated 100 times.1208.8. Comparing k-DPP and Entropy-Based Designs for Monitoring Temperature Fields8.8 Comparing k-DPP and Entropy-BasedDesigns for Monitoring Temperature FieldsAs noted in Section 4.1, temperature is now seen as an environmentalhazard due to its potential negative effects in human health and nature.That leads to a need to ensure that the temperature field is adequatelymonitored. In this section, we compare through a brief case study, thedesigns obtained by the entropy and DPP approaches.For this study we turn to the data described in Section 4.3.2, consistingof 97 stations spread over the Pacific Northwest where measurements ofmaximum daily temperature were obtained for the January to June 2000period. A subset of 64 stations are to be selected among the 97 stations toconstitute as a hypothetical monitoring network. An additional 33 stationsare designated as potential sites for new monitors. In this case study, thegoal is to select a subset of 10 stations from among the 33 to augment thenetwork based on some design criterion.From Section 4.4, recall that we have partitioned the data as Yt ≡(Y(u)t ,Y(g)t ), t = 1, . . . , n. Similarly, we can partition the data for a futuretime f as Yf ≡ (Y(u)f ,Y(g)f ). Denoting y(g)1:n as all the data available at thegauged sites across all times 1, . . . , n, then note that (Le and Zidek, 2006)Y(u)f |y(g)f ,y(g)1:n,Z,B0 ∼ tu(µ(u),dδ − u+ 1Ξu|g, δ − u+ 1), (8.44)whereµ(u) = zfB(u)0 + ΞugΞ−1gg (y(g)f − zfB(g)0 ) (8.45)d = 1 + zfF−1z>f + (y(g)f − zfB(g)0 )Ξ−1gg (y(g)f − zfB(g)0 )> (8.46)Ξu|g = Ξuu −ΞugΞ−1gg Ξgu. (8.47)and thatY(g)f |y(g)1:t ,Z,B0 ∼ tu(µ(g),cδ + n− u− g + 1Ξˆgg, l), (8.48)1218.8. Comparing k-DPP and Entropy-Based Designs for Monitoring Temperature Fieldswhereµ(g) = (I−W)Bˆ(g) + WB(g)0 (8.49)c = 1 + zf (A + F)−1z>f (8.50)A =n∑t=1z>t zt (8.51)W = (A + F)−1F−1 (8.52)Ξˆgg = Ξgg + S + (Bˆ(g) −B(g)0 )>(A−1 + F−1)−1(Bˆ(g) −B(g)0 )(8.53)S =n∑t=1(y(g)t − ztBˆ(g))>(y(g)t − ztBˆ(g)) (8.54)Bˆ(g) =(n∑t=1z>t zt)−1( n∑t=1z>t y(g)t). (8.55)Using the results for the entropy of a multivariate t distribution describedin Section 8.3.2, and denoting the available data as D = (y(g)1:n,Z), condi-tionally on the hyperparameters, the total entropy can thus be decomposedasH(Yf |D) = H(Y(u)f |Y(g)f ,D) +H(Y(g)f |D) (8.56)Proceeding as in the end of Section 8.3.2, recall that in the context ofmonitoring networks, when the goal is to augment the network, the objectiveis to find a subset of u+ sites among the u ungauged ones (also referred toas candidate sites, where C denote the set of candidate points) to add to theexisting network. We denote the remaining sites that are not the selectedas u−. The resulting network will then consist of (Y(u+)f ,Y(g)f ).Note that Y(u)f can be partitioned into (Y(u+)f ,Y(u−)f ). Thus,H(Y(u+)f ,Y(u−)f ,Y(g)f |D)= H(Y(u−)f |Y(u+)f ,Y(g)f ,D) +H(Y(u+)f ,Y(g)f |D) (8.57)Notice that it will be optimal to augment the network with the u+ sites so1228.8. Comparing k-DPP and Entropy-Based Designs for Monitoring Temperature Fieldsas to maximize H(Y(u+),Y(g)f |D). Considering thatH(Y(u+),Y(g)f |D) = H(Y(u+)f |Y(g)f ,D) +H(Y(g)f |D) (8.58)c∝ 12log |Ξu+|g|+12log det(Ξˆgg), (8.59)then an equivalent criterion is to maximize 12 log det(Ξu+|g).In summary, the entropy criterion for augmenting the network is thusarg maxu+⊂C12log det(Ξu|g). (8.60)For the purposes of this case study, we considered an alternate 10-DPPdesign strategy characterized by the same hypercovariance matrix, Ξu|g, inorder to yield comparable design objectives. Similarly as in Section 4.4, weestimate it via the SG warping method based on the hypercovariance matrixamong the gauged sites.We obtained a 10-DPP design based on a simulation strategy of repeat-edly sampling from a 10-DPP. Figure 8.8 illustrates the locations for selectedstations for the two different methods. The entropy solution yielded a log-determinant of 78.70 for the optimal set of locations. Note that the majorityof the new locations were selected in southern Oregon and northern Califor-nia, west of the Cascade mountains. The k-DPP also selected a couple ofsites in Eastern Washington, east of the Cascades.Figure 8.10 illustrates the distribution of the log-determinants for theDPP samples considering different numbers of simulations. Moreover, fromFigure 8.9, note that 10-DPP is very close to but suboptimal compared withthe entropy design.Though yielding a suboptimal solution, when the number of combina-tions is prohibitive, the simulation results indicate that sampling from ak-DPP could be used to obtain approximations to the entropy design. Fur-ther assessment to verify the properties of this method of approximating theentropy based design are needed, as the sampling complexity of the DPPwill also increase for a prohibitive number of combinations.1238.8. Comparing k-DPP and Entropy-Based Designs for Monitoring Temperature Fieldsllllllllllllll llllllllllllllllll lllExisting StationsPotential New StationsSelected New Stations(a) Entropyllllllllllllll llllllllllllllllll lllExisting StationsPotential New StationsSelected New Stations(b) 10-DPPFigure 8.8: Comparison of entropy-based and DPP design strategies. Theentropy solution yielded a log-determinant of 78.70 for the restricted condi-tional hypercovariance matrix for the ungauged sites considering the optimalset of locations. Here, we illustrate the solution of a 10-DPP sampling de-sign strategy. Note the similarity in the choice of new locations across bothdesigns.Ko et al. (1995) use a branch-and-bound algorithm for this optimizationproblem, using a greedy strategy to obtain candidate sets of points. How-ever, the algorithm is impractical for a large number of candidate points. Onthe other hand, Li et al. (2015) suggests an approximate sampling strategyfor discrete k-DPPs that could be useful to alleviate the sampling complexityfor a prohibitivele large number of combinations.1248.8. Comparing k-DPP and Entropy-Based Designs for Monitoring Temperature Fields78.278.478.678.80 20000 40000 60000Number of SimulationsMaximum of Log−DeterminantsFigure 8.9: Current maximum log-determinants of the restricted conditionalhypercovariance matrix for the ungauged sites when increasing the numberof simulations of the 10-DPP.Log−determinantsDensity73 74 75 76 77 78 790.00.20.40.6(a) 100Log−determinantsDensity73 74 75 76 77 78 790.00.20.40.6(b) 1,000Log−determinantsDensity73 74 75 76 77 78 790.00.20.40.6(c) 10,000Log−determinantsDensity73 74 75 76 77 78 790.00.20.40.6(d) 50,000Log−determinantsDensity73 74 75 76 77 78 790.00.20.40.6(e) 100,000Figure 8.10: Log-determinants of the restricted conditional hypercovariancematrix for the ungauged sites varying the number of simulations of a 10-DPP. The gray line represents the log-determinant for the optimal entropysolution.1258.9. Discussion and Future Work8.9 Discussion and Future WorkWe introduced a novel sampling design strategy based on k-determinantalpoint processes. The k-DPP design is a flexible design strategy that is ableto yield spatially-balanced designs, while imposing additional diversity in theselection of locations based on additional features that might be available.The methodology is able to handle designing and redesigning a monitoringnetwork. A summary of the important points discussed is as follows:• The k-DPP design can be used as a spatially-balanced sampling designalternative to the space-filling design.• The k-DPP design objective can be constructed in such a way that isstrongly similar to the entropy-based design objective. It can thus beviewed as a randomized version of an entropy design.• A sampling design strategy based on a k-DPP characterized by thesame hypercovariance matrix of a entropy-based design optimal cri-terion, i.e. Ξu|g, can be used as an approximation for the entropy-based design when the number of combinations is prohibitively large.Though suboptimal, this alternative could be particularly useful dueto the NP-hardness of the entropy-based design optimal criterion.Another randomized spatially balanced design is the generalized randomtessellation stratified (GRTS) design (Stevens Jr and Olsen, 1999, 2003,2004). For future work, we would like to investigate how the k-DPP designcompares to the GRTS.Moreover, in our studies we did not address inference about k-DPP L-ensemble parameters. Recent advances in a Bayesian framework includeAffandi et al. (2014), which can serve as a starting point for future explo-ration. Our next step would be to investigate their methodology for learningparameters, and how these ideas could be used in the redesigning a moni-toring network.126Chapter 9Concluding RemarksThis thesis was motivated by the growing need for understanding thechanges in Earth’s climate as well as a increasing concerns due to their po-tential impact in human health. The focus was mostly on different aspects oftemperature, particularly in the Pacific Northwestern region. Rapid changesand localized weather are very common in this region and the terrain playsan important role in separating often radically different climate and weatherregimes.When the goal is to understand a region’s weather, we advocate perform-ing exploratory analysis to better understand the localized changes in trend,instead of just focusing on the modelling of the spatial covariance structure.We argue that this is needed to better represent interesting smaller-scaletrends, especially for regions with a complex terrain like the Pacific North-west, which may not be captured by global climate models. We also extendedthe spatio-temporal model proposed in Le and Zidek (1992) and describedhow one could accommodate features in the mean that vary over space.In addition, we explored the data fusion problem in order to combineinformation from multiple sources that might have been measured at dif-ferent spatial scales. We saw that for environmental studies, data measure-ments are often supplemented by information brought by computer modeloutputs. We then introduced a scalable inference methodology using theINLA-SPDE approach, and illustrated this methodology for combining anensemble of computer models output with measurements of temperatureacross the Pacific Northwest.Another critical topic tackled was in designing monitoring networks.They play an important role in the surveillance of environmental processes.We introduced a novel flexible monitoring network design strategy based on1279.1. Future Workk-DPPs, and described how the methodology is able to handle both design-ing and redesigning a monitoring network. We illustrated how the k-DPPdesign is able to yield spatially-balanced designs, and could be used as arandomized design alternative to space-filling designs. Moreover, we notedthat the k-DPP design objective is strongly similar to the entropy-based de-sign one, and can be viewed as a randomized version of this design. Finally,we discussed how a sampling strategy based on k-DPPs could be particu-larly useful to approximate entropy-based design optimal solution when thenumber of combinations is prohibitive.9.1 Future WorkIn this section, we address the limitations of the methodologies presentedin this thesis, and introduce potential alternatives, which are currently sub-ject of future research.9.1.1 Nonstationarity in INLA-SPDE: Inference for the BEMIn Chapter 5, we assumed a stationaty Mate´rn covariance structure forthe “true” underlying random field. This assumption is somewhat unreal-istic due to the complex terrain of the Pacific Northwest and its localizedand abrupt changes in climate. We would therefore like to accommodatenonstationarity in the INLA-SPDE inference strategy. This would entail rep-resenting a Gaussian random field η as a solution to a SPDE with covarianceparameters varying over space, and writing it as(κ2(s)−∆)α2 {τ(s)η(s)} =W(s), (9.1)where τ models the variance of the process and is allowed to vary on space.We then aim to investigate whether this leads to improvement in theDBEM application of combining temperature measurements and an ensem-ble of deterministic model outputs for stations spread over the Pacific North-west, as introduced Chapter 6.1289.1. Future Work9.1.2 Modified DBEM for ForecastingIn Chapter 6, we noted that the DBEM is based on a mixture of posteriordistributions based on a training set. The methodology thus requires thecomputation of mixing weights, which are essentially based on normalizedmarginal likelihoods across the training set. Difficulty in differentiating theseweights may be encountered when at least one log-likelihood is significantlyhigher than the rest, and ultimately will dominate the mixing weights. Weintroduced an alternative methodology described in Algorithm 2 that is thesubject of current research.9.1.3 Comparison of k-DPP Design with the GeneralizedRandom Tessellation Stratified (GRTS) DesignIn Chapter 8, we described how the k-DPP design can be used as aspatially balanced sampling design alternative to the space-filling design.Another randomized spatially balanced design is the generalized randomtessellation stratified (GRTS) design (Stevens Jr and Olsen, 1999, 2003,2004). How well the k-DPP design compares with the GRTS is the subjectof future research.9.1.4 Inference about k-DPP Design ParametersIn Chapter 8, we did not address inference about k-DPP L-ensembleparameters. Inference for these parameters could ultimately be used whenredesigning a monitoring network, thus avoiding the need of an arbitraryselection. A starting point for exploration could be the work of Affandiet al. (2014). They proposed using MCMC methods, while focusing onrandom-walk Metropolis-Hastings (Metropolis et al., 1953; Hastings, 1970)and a slice sampler (Neal, 2003). Our next step would be to investigate theirmethodology for learning parameters, and how these ideas could be used inthe redesigning a monitoring network, ultimately avoiding the need for anarbitrary selection.129BibliographyAffandi, R. H., Fox, E. B., Adams, R. P. and B., T. (2014) Learning theParameters of Determinantal Point Process Kernels. In ICML.Affandi, R. H., Kulesza, A., Fox, E. and Taskar, B. (2013) Nystro¨m Approx-imation for Large-Scale Determinantal Processes. In Proceedings of the16th International Conference on Artificial Intelligence and Statistics.Affandi, R. H., Kulesza, A. and Fox, E. B. (2012) Markov determinantalpoint processes. In Proceedings of the 28th Conference on Uncertainty inArtificial Intelligence.A˚stro¨m, C., Orru, H., Rocklo¨v, J., Strandberg, G., Ebi, K. L. and Forsberg,B. (2013) Heat-related respiratory hospital admissions in Europe in achanging climate: a health impact assessment. BMJ Open, 3.Baddeley, A., Rubak, E. and Turner, R. (2015) Spatial Point Patterns:Methodology and Applications with R. London: Chapman and Hall/CRCPress.Baddeley, A. and Turner, R. (2005) spatstat: An R Package for AnalyzingSpatial Point Patterns. Journal of Statistical Software, 12, 1–42.Banerjee, S., Carlin, B. P. and Gelfand, A. E. (2014) Hierarchical Modelingand Analysis for Spatial Data. Chapman and Hall/CRC.Bayarri, M. J. and DeGroot, M. H. (1989) Optimal Reporting of Predictions.Journal of the American Statistical Association, 84, 214–222.Berman, M. and Diggle, P. (1989) Estimating Weighted Integrals of the130BibliographySecond-Order Intensity of a Spatial Point Process. Journal of the RoyalStatistical Society. Series B (Methodological), 51, 81–92.Berrocal, B., Gelfand, A. E. and Holland, D. M. (2010a) A Spatio-TemporalDownscaler for Output From Numerical Models. Journal of Agricultural,Biological, and Environmental Statistics, 15, 176–197.Berrocal, V. J., Gelfand, A. E. and Holland, D. M. (2010b) A bivariatevariate space-time downscaler under space and time misaligment. TheAnnals of Applied Statistics, 4, 1942–1975.— (2012) Space-Time Data fusion Under Error in Computer Model Output:An Application to Modeling Air Quality. Biometrics, 68, 837–848.Berrocal, V. J., Raftery, A. E. and Gneiting, T. (2007) Combining SpatialStatistical and Ensemble Information in Probabilistic Weather Forecasts.Monthly Weather Review, 135, 1386–1402.Bivand, R. S., Pebesma, E. and Go´mez-Rubio, V. (2013) Applied SpatialData Analysis with R, chap. Spatial Point Pattern Analysis, 173–211.New York, NY: Springer New York.Blangiardo, M. and Cameletti, M. (2015) Spatial and spatio-temporalBayesian models with R-INLA. John Wiley & Sons.Bornn, L., Shaddick, G. and Zidek, J. V. (2012) Modeling NonstationaryProcesses Through Dimension Expansion. Journal of the American Sta-tistical Association, 107, 281–289.Borodin, A. (2009) Determinantal Point Processes.Borodin, A. and Olshanski, G. (2000) Distributions on Partitions, Point Pro-cesses, and the Hypergeometric Kernel. Communications in MathematicalPhysics, 211, 335–358.Borodin, A. and Rains, E. M. (2005) Eynard-Mehta Theorem, Schur Pro-cess, and their Pfaffian Analogs. Journal of Statistical Physics, 121, 291–317.131BibliographyCameletti, M., Lindgren, F., Simpson, D. and Rue, H. (2013) Spatio-temporal modeling of particulate matter concentration through the SPDEapproach. AStA Advances in Statistical Analysis, 97, 109–131.Caselton, W. F., Kan, L. and Zidek, J. V. (1992) Statistics in the Envi-ronmental & Earth Sciences, chap. Quality data networks that minimizeentropy. A Hodder Arnold Publication. E. Arnold.Cox, D. D., Cox, L. H. and Ensor, K. B. (1997) Spatial sampling and theenvironment: some issues and directions. Environmental and EcologicalStatistics, 4, 219–233.Cressie, N. and Wikle, C. (2011) Statistics for Spatial Data. J. Wiley.Cressie, N. A. C. (1993) Statistics for Spatial Data. J. Wiley.Daly, C., Halbleib, M., Smith, J. I., Gibson, W. P., Doggett, M. K., Tay-lor, G. H., Curtis, J. and Pasteris, P. P. (2008) Physiographically sensi-tive mapping of climatological temperature and precipitation across theconterminous United States. International Journal of Climatology, 28,2031–2064.Daly, C., Neilson, R. P. and Phillips, D. L. (1994) A Statistical-TopographicModel for Mapping Climatological Precipitation over Mountainous Ter-rain. Journal of Applied Meteorology, 33.Daly, C., Taylor, G. and Gibson, W. (1997) The PRISM Approach to Map-ping Precipitation and Temperature. In 10th Conf. on Applied Climatol-ogy, Reno, NV, Amer. Meteor. Soc., no. 10-12.Daly, C., Taylor, G., Gibson, W., Parzybok, T., Johnson, G. and Pasteris,P. (2000) High-quality spatial climate data sets for the United States andbeyond. Transactions of the ASAE, 43, 1957–1962.Damian, D., Sampson, P. D. and Guttorp, P. (2001) Bayesian estimationof semi-parametric non-stationary spatial covariance structures. Environ-metrics, 12, 161–178.132BibliographyDe Oliveira, V., Kedem, B. and Short, D. A. (1997) Bayesian Prediction ofTransformed Gaussian Random Fields. Journal of the American Statisti-cal Association, 92, 1422–1433.Diggle, P. (1985) A Kernel Method for Smoothing Point Process Data. Jour-nal of the Royal Statistical Society. Series C (Applied Statistics), 34, 138–147.Diggle, P. J. . (2013) Statistical Analysis of Spatial and Spatio-TemporalPoint Patterns. Chapman and Hall/CRC, 3rd edn.Diggle, P. J., Menezes, R. and Su, T.-l. (2010) Geostatistical inference underpreferential sampling. Journal of the Royal Statistical Society: Series C(Applied Statistics), 59, 191–232.Diggle, P. J. and Ribeiro Jr, P. J. (2007) Model-based Geostatistics (SpringerSeries in Statistics). Springer, 1 edn.Diggle, P. J., Tawn, J. A. and Moyeed, R. A. (1998) Model-based Geostatis-tics. Journal of the Royal Statistical Society: Series C (Applied Statistics),47, 299350.Ferreira, G. S. and Gamerman, D. (2015) Optimal Design in Geostatisticsunder Preferential Sampling. Bayesian Anal., 10, 711–735.Foley, K. M. and Fuentes, M. (2008) A statistical framework to combinemultivariate spatial data and physical models for Hurricane surface windprediction. Journal of Agricultural, Biological, and Environmental Statis-tics, 13, 37–59.Fraley, C., Raftery, A. E., Gneiting, T. and M., S. J. (2013) ensembleBMA:An R Package for Probabilistic Forecasting using Ensembles and BayesianModel Averaging. Tech. Rep. 516, University of Washington.Fuentes, M. (2001) A high frequency kriging approach for non-stationaryenvironmental processes. Environmetrics, 12, 469–483.133Bibliography— (2002) Spectral methods for nonstationary spatial processes. Biometrika,89, 197–210.Fuentes, M. and Raftery, A. E. (2005) Model Evaluation and Spatial In-terpolation by Bayesian Combination of Observations with Outputs fromNumerical Models. Biometrics, 61, 36–45.Fuentes, M. and Smith, R. L. (2001) A New Class of Nonstationary SpatialModels. Tech. rep., North Carolina State University.Gelfand, A. E. (2010) Handbook of Spatial Statistics, chap. Misaligned spa-tial data: The change of support problem, 517–539. CRC Press.Gelfand, A. E., Sahu, S. K. and Holland, D. M. (2012) On the effect ofpreferential sampling in spatial prediction. Environmetrics, 23, 565–578.Gelfand, A. E. and Schliep, E. M. (2016) Spatial statistics and Gaussianprocesses: A beautiful marriage. Spatial Statistics.Gelfand, A. E. and Smith, A. F. M. (1990) Sampling-Based Approachesto Calculating Marginal Densities. Journal of the American StatisticalAssociation, 85, 398–409.Gillenwater, J., Kulesza, A. and Taskar, B. (2012) Near-Optimal MAP In-ference for Determinantal Point Processes. In Advances in Neural Infor-mation Processing Systems.Goodchild, M. F. and Haining, R. P. (2004) GIS and spatial data analysis:Converging perspectives. Papers in Regional Science, 83, 363–385.Gotway, C. A. and Young, L. J. (2002) Combining Incompatible SpatialData. Journal of the American Statistical Association, 97, 632–648.Grimit, E. P. and Mass, C. F. (2002) Initial Results of a Mesoscale Short-Range Ensemble Forecasting System over the Pacific Northwest. Weatherand Forecasting, 17, 192–205.134BibliographyGuillas, S., Bao, J., Choi, Y. and Wang, Y. (2008) Statistical correction anddownscaling of chemical transport model ozone forecasts over Atlanta.Atmospheric Environment, 42, 1338 – 1348.Guillas, S., Tiao, G., Wuebbles, D. and Zubrow, A. (2006) Statistical diag-nostic and correction of a chemistry-transport model for the prediction oftotal column ozone. Atmospheric Chemistry and Physics, 6, 525–537.Guttorp, P., Le, N. D., Sampson, P. D. and Zidek, J. V. (1993) Using entropyin the redesign of an environmental monitoring network. Tech. rep., Tech-nical report, Department of Statistics. University of British Columbia.,1992. Tech. Rep. 116.Guttorp, P. and Sampson, P. D. (2010) Discussion of Geostatistical inferenceunder preferential sampling by Diggle, P. J., Menezes, R. and Su, T.Journal of the Royal Statistical Society: Series C (Applied Statistics), 59,191–232.Haaland, P., McMillan, N., Nychka, D. and Welch, W. (1994) Analysis ofSpace-Filling Designs. vol. 26, 111–120.Haas, T. C. (1990) Lognormal and Moving Window Methods of EstimatingAcid Deposition. Journal of the American Statistical Association, 85, pp.950–963.— (1995) Local Prediction of a Spatio-Temporal Process with an Appli-cation to Wet Sulfate Deposition. Journal of the American StatisticalAssociation, 90, pp. 1189–1199.Hastings, H. (1970) Monte Carlo Sampling Methods Using Markov chainsand their Applications. Biometrika, 57, 97–109.Higdon, D., Swall, J. and Kern, J. (1999) Non-stationary spatial modeling.Bayesian statistics, 6, 761–768.Hoberg, E. P. and Brooks, D. R. (2015) Evolution in action: climatechange, biodiversity dynamics and emerging infectious disease. Philosoph-135Bibliographyical Transactions of the Royal Society of London B: Biological Sciences,370.Hoeting, J. A., Madigan, D., Raftery, A. E. and Volinsky, C. T. (1999)Bayesian model averaging: a tutorial (with comments by M. Clyde, DavidDraper and E. I. George, and a rejoinder by the authors. Statist. Sci., 14,382–417.Hough, J. B., Krishnapur, M., Peres, Y. and Virg, B. (2006) DeterminantalProcesses and Independence. Probab. Surveys, 3, 206–229.Jaynes, E. T. (1963) Blendeis University Summer Institute Lectures in The-oretical Physics, Statistical Physics 3, chap. Information theory and sta-tistical mechanics, 181–218. New York: W. A. Benjamin, Inc.Kahle, D. and Wickham, H. (2013) ggmap: Spatial Visualization with gg-plot2. The R Journal, 5, 144–161.Kennedy, M. C. and O’Hagan, A. (2001) Bayesian calibration of computermodels. Journal of the Royal Statistical Society: Series B (StatisticalMethodology), 63, 425–464.Kleiber, W., Katz, R. W. and Rajagopalan, B. (2013) Daily minimum andmaximum temperature simulation over complex terrain. The Annals ofApplied Statistics, 7, 588–612.Kleiber, W., Raftery, A. E., Baars, J., Gneiting, T., Mass, C. F. and Grimit,E. (2011) Locally Calibrated Probabilistic Temperature Forecasting Us-ing Geostatistical Model Averaging and Local Bayesian Model Averaging.Monthly Weather Review, 139, 2630–2649.Ko, C.-W., Lee, J. and Queyranne, M. (1995) An Exact Algorithm for Max-imum Entropy Sampling. Operations Research, 43, pp. 684–691.Kulesza, A. and Taskar, B. (2011a) Learning Determinantal Point Processes.In Conference on Uncertainty in Artificial Intelligence (UAI). Barcelona,Spain.136Bibliography— (2011b) Structured Determinantal Point Processes. In Advances in NeuralInformation Processing Systems 23.— (2012) Determinantal point processes for machine learning. Foundationsand Trends in Machine Learning, 5.Kunkel, K. E., Stevens, L. E., Stevens, S. E., Sun, L., Janssen, E., Wuebbles,D., Redmond, K. T. and Dobson, J. G. (2013) Regional Climate Trendsand Scenarios for the U.S. National Climate Assessment. Part 6. Climateof the Northwest U.S. Tech. rep., U.S. NOAA Technical Report NESDIS142-6. National Oceanic and Atmospheric Administration, National En-vironmental Satellite, Data, and Information Service, Washington, D.C.Lavancier, F., Møller, J. and Rubak, E. (2015) Determinantal point processmodels and statistical inference. Journal of the Royal Statistical Society:Series B (Statistical Methodology).Lawrimore, J. H., Menne, M. J., Gleason, B. E., Williams, C. N., Wuertz,D. B., Vose, R. S. and Rennie, J. (2011) An overview of the Global His-torical Climatology Network monthly mean temperature data set, version3. Journal of Geophysical Research: Atmospheres, 116.Le, N., Zidek, J., White, R., Cubranic, D., with Fortran code for Sampson-Guttorp estimation authored by Paul D. Sampson, Guttorp, P., Meiring,W., Hurley, C., method implementation by H.A. Watts, R.-K.-F. andShampine., L. (2014) EnviroStat: Statistical analysis of environmentalspace-time processes. R package version 0.4-0.Le, N. D. and Zidek, J. (2006) Statistical Analysis of Environmental Space-Time Processes (Springer Series in Statistics). Springer, 1 edn edn.Le, N. D. and Zidek, J. V. (1992) Interpolation with uncertain spatial co-variances: A Bayesian alternative to Kriging. Journal of MultivariateAnalysis, 43, 351 – 374.Li, B., Sain, S., Mearns, L. O., Anderson, H. A., Kovats, S., Ebi, K. L.,Bekkedal, M. Y. V., Kanarek, M. S. and Patz, J. A. (2012) The impact137Bibliographyof extreme heat on morbidity in Milwaukee, Wisconsin. Climatic Change,110, 959–976.Li, C., Jegelka, S. and Sra, S. (2015) Efficient sampling for k-determinantalpoint processes. arXiv preprint arXiv:1509.01618.Lindgren, F. and Rue, H. (2015) Bayesian Spatial Modelling with R-INLA.Journal of Statistical Software, 63, 1–25.Lindgren, F., Rue, H. and Lindstro¨m, J. (2011) An explicit link betweenGaussian fields and Gaussian Markov random fields: the stochastic partialdifferential equation approach. Journal of the Royal Statistical Society:Series B (Statistical Methodology), 73, 423–498.Liu, Z. (2007) Combining measurements with deterministic model out-puts: predicting ground-level ozone. Ph.D. thesis, University of BritishColumbia.Liu, Z., Le, N. D. and Zidek, J. V. (2011) An empirical assessment ofBayesian melding for mapping ozone pollution. Environmetrics, 22, 340–353.Macchi, O. (1975) The Coincidence Approach to Stochastic Point Processes.Advances in Applied Probability, 7, pp. 83–122.Mass, C. (2008) The Weather of the Pacific Northwest. University of Wash-ington Press, 1 edn.McMillan, N. J., Holland, D. M., Morara, M. and Feng, J. (2010) Combiningnumerical model output and particulate data using Bayesian space–timemodeling. Environmetrics, 21, 48–65.Menne, M. J., Durre, I., Vose, R. S., Gleason, B. E. and Houston, T. G.(2012) An Overview of the Global Historical Climatology Network-DailyDatabase. Journal of Atmospheric and Oceanic Technology, 29, 897910.138BibliographyMetropolis, N., Rosenbulth, A., Rosenbulth, M., Teller, A. and Teller, E.(1953) Equation of State Calculations by Fast Computing Machine. Jour-nal of Chemical Physics, 21, 1087–1091.Møller, J. and Waagepetersen, R. P. (2004) Statistical inference and simu-lation for spatial point processes. Chapman & Hall/CRC.Mote, P. (2004) “ How and why is Northwest climate changing?” in ClimateChange, Carbon, and Forestry in Northwestern North America.Mote, P., Snover, A. K., Capalbo, S., Eigenbrode, S. D., Glick, P., Littell, J.,Raymondi, R. and Reeder, S. (2014) Ch. 21: Northwest. Climate ChangeImpacts in the United States: The Third National Climate Assessment.U.S. Global Change Research Program.Mu¨ller, W. G. (2005) A comparison of spatial design methods for correlatedobservations. Environmetrics, 16, 495–505.Murray, A. T. (2010) Advances in location modeling: GIS linkages andcontributions. Journal of Geographical Systems, 12, 335–354.Naylor, J. C. and Smith, A. F. M. (1982) Applications of a Method for theEfficient Computation of Posterior Distributions. Journal of the RoyalStatistical Society Series C, 31, 214–225.Neal, R. M. (2003) Slice sampling. Ann. Statist., 31, 705–767.Nguyen, H., Cressie, N. and Braverman, A. (2012) Spatial Statistical DataFusion for Remote Sensing Applications. Journal of the American Statis-tical Association, 107, 1004–1018.Nychka, D., Furrer, R., Paige, J. and Sain, S. (2015) fields: Tools for spatialdata. R package version 8.4-1.Nychka, D. and Saltzman, N. (1998) Case Studies for Environmental Statis-tics, chap. Design of air quality networks. Lecture Notes in Statistics,Springer Verlag.139BibliographyNychka, D., Wikle, C. and Royle, J. A. (2002) Multiresolution models fornonstationary spatial covariance functions. Statistical Modelling, 2, 315–331.Nychka, D., Yang, Q. and Royle, J. A. (1997) Statistics for the Environment,Vol. 3, Pollution Assessment and Control, chap. Constructing spatial de-signs using regression subset selection. Wiley, New York.Nychka, D. W. and Anderson, J. L. (2010) Handbook of Spatial Statistics,chap. Data assimilation, 241–270. CRC Press.Palacios, M. B. and Steel, M. F. J. (2006) Non-gaussian bayesian geostatisti-cal modelling. Journal of American Statistical Association, 101, 604618.Pati, D., Reich, B. J. and Dunson, D. B. (2011) Bayesian geostatisticalmodelling with informative sampling locations. Biometrika, 98, 35–48.Pronzato, L. and Mu¨ller, W. G. (2012) Design of computer experiments:space filling and beyond. Statistics and Computing, 22, 681–701.R Core Team (2014) R: A Language and Environment for Statistical Com-puting. R Foundation for Statistical Computing, Vienna, Austria.Raftery, A. E., Balabdaoui, F., Gneiting, T. and Polakowski, M. (2005)Using Bayesian model averaging to calibrate forecast ensembles. MonthlyWeather Review, 133, 1155–1174.Raftery, A. E., Madigan, D. and Hoeting, J. A. (1997) Bayesian Model Av-eraging for Linear Regression Models. Journal of the American StatisticalAssociation, 92, 179–191.Ribeiro Jr., P. and Diggle, P. (2001) geoR: a package for geostatistical anal-ysis. R-NEWS, 1, 15–18.Ripley, B. D. (1988) Statistical inference for spatial processes. CambridgeUniversity Press.140BibliographyRobine, J.-M., Cheung, S. L. K., Roy, S. L., Oyen, H. V., Griffiths, C.,Michel, J.-P. and Herrmann, F. R. (2008) Death toll exceeded 70,000 inEurope during the summer of 2003. Comptes Rendus Biologies, 331, 171– 178.Rodriguez-Mozaz, S., Lopez de Alda, M. J. and Barcelo´, D. (2006) Biosen-sors as useful tools for environmental analysis and monitoring. Analyticaland Bioanalytical Chemistry, 386, 1025–1041.Royle, J. A. and Nychka, D. (1998) An Algorithm for the Constructionof Spatial Coverage Designs with Implementation in SPLUS. Comput.Geosci., 24, 479–488.Rue, H. and Martino, S. (2007) Approximate Bayesian inference for hi-erarchical Gaussian Markov random field models. Journal of statisticalplanning and inference, 137(10), 3177–3192.Rue, H., Martino, S. and Chopin, N. (2009) Approximate Bayesian inferencefor latent Gaussian models by using integrated nested Laplace approxi-mations. Journal of the Royal Statistical Society: Series B (StatisticalMethodology), 71, 319–392.Rue, H., Riebler, A., Sørbye, S. H., Illian, J. B., Simpson, D. P. and Lind-gren, F. K. (2016) Bayesian Computing with INLA: A Review. arXivpreprint arXiv:1604.00860.Ruiz-Ca´rdenas, R., Krainski, E. T. and Rue, H. (2012) Direct fitting ofdynamic models using integrated nested Laplace approximations {INLA}.Computational Statistics & Data Analysis, 56, 1808 – 1828.Sacks, J., Welch, W. J., Mitchell, T. J. and Wynn, H. P. (1989) Design andAnalysis of Computer Experiments. Statist. Sci., 4, 409–423.Salathe´, E. P., Steed, R., Mass, C. F. and Zahn, P. H. (2008) A High-Resolution Climate Model for the U.S. Pacific Northwest: Mesoscale Feed-backs and Local Responses to Climate Change. J. Climate, 21, 57085726.141BibliographySampson, P. D. and Guttorp, P. (1992) Nonparametric Estimation of Non-stationary Spatial Covariance Structure. Journal of the American Statis-tical Association, 87, 108–119.Schmidt, A. M., Guttorp, P. and O’Hagan, A. (2011) Considering covariatesin the covariance structure of spatial processes. Environmetrics, 22, 487–500.Schmidt, A. M. and O’Hagan, A. (2003) Bayesian inference for non-stationary spatial covariance structure via spatial deformations. Journalof the Royal Statistical Society: Series B (Statistical Methodology), 65,743–758.Shirota, S. and Gelfand, A. E. (2016) Approximate Bayesian Computationand Model Validation for Repulsive Spatial Point Processes. ArXiv e-prints.Simpson, D., Lindgren, F. and Rue, H. (2012a) In order to make spatialstatistics computationally feasible, we need to forget about the covariancefunction. Environmetrics, 23, 65–74.— (2012b) Think continuous: Markovian Gaussian models in spatial statis-tics. Spatial Statistics, 1, 16 – 29.Smith, A. F. M., Skene, A. M., Shaw, J. E. H. and Naylor, J. C. (1987)Progress with Numerical and Graphical Methods for Practical BayesianStatistics. Journal of the Royal Statistical Society. Series D (The Statis-tician), 36, 75–82.Smith, B. J. and Cowles, M. K. (2007) Correlating point-referenced radonand areal uranium data arising from a common spatial process. Journal ofthe Royal Statistical Society: Series C (Applied Statistics), 56, 313–326.Snyder, J. P. (1987) Map projections–a working manual. United States.Stevens Jr, D. L. and Olsen, A. R. (2003) Variance estimation for spatiallybalanced samples of environmental resources. Environmetrics, 14, 593–610.142Bibliography— (2004) Spatially balanced sampling of natural resources. Journal of theAmerican Statistical Association, 99, 262–278.Stevens Jr, D. L. and Olsen, A, R. (1999) Spatially restricted surveys overtime for aquatic resources. Journal of Agricultural, Biological, and Envi-ronmental Statistics, 415–428.Strauss, W. A. (2008) Partial Differential equations – An Introduction. NewYork: John Wiley and Sons, Inc.Swall, J. L. and Davis, J. M. (2006) A Bayesian statistical approach forthe evaluation of {CMAQ}. Atmospheric Environment, 40, 4883 – 4893.Special issue on Model Evaluation: Evaluation of Urban and RegionalEulerian Air Quality Models.Tierney, L. and Kadane, J. B. (1986) Accurate approximations for poste-rior moments and marginal densities. Journal of the American StatisticsAssociation, 81, 82–86.Tierney, L., Kass, R. E. and Kadane, J. B. (1989) Approximate MarginalDensities of Nonlinear Functions. Biometrika, 76, 425–433.Tothill, I. E. (2001) Biosensors developments and potential applications inthe agricultural diagnosis sector. Computers and Electronics in Agricul-ture, 30, 205 – 218.Van Groenigen, J., Pieters, G. and Stein, A. (2000) Optimizing spatial sam-pling for multivariate contamination in urban areas. Environmetrics, 11,227–244.Whittle, P. (1954) ON STATIONARY PROCESSES IN THE PLANE.Biometrika, 41, 434–449.— (1963) Stochastic-processes in several dimensions. Bulletin of the Inter-national Statistical Institute, 40, 974–994.Wikle, C. K. and Berliner, L. M. (2005) Combining Information across Spa-tial Scales. Technometrics, 47, 80–91.143World Health Organization (2009) Global health risks: mortal-ity and burden of disease attributable to selected major risks.URLhttp://www.who.int/healthinfo/global_burden_disease/GlobalHealthRisks_report_full.pdf. Last Accessed: 2016-06-06.Zidek, J. V., Le, N. D. and Liu, Z. (2012) Combining data and simulated datafor space-time fields: application to ozone. Environmental and EcologicalStatistics, 19, 37–56.Zidek, J. V., Shaddick, G. and Taylor, C. G. (2014) Reducing estimationbias in adaptively changing monitoring networks with preferential siteselection. Ann. Appl. Stat., 8, 1640–1670.Zidek, J. V. and Zimmerman, D. L. (2010) Handbook of Spatial Statistics,chap. Monitoring Network Design, 131–148. CRC Press.144Appendix AMiscellaneousDefinition A.1. Green’s first identity (Chapter 7 of Strauss (2008))Let u and v be scalar functions on some region Ω ⊂ Rd. Suppose that uis twice continuously differentiable, and v once continuously differentiable.Then, ∫∂Ωv∂u∂ndS =∫Ω∇v · ∇u dx +∫Ωv∆u dx, (A.1)where ∆ is the Laplacian operator, ∂Ω is the boundary region Ω, and ∂u∂n isthe directional derivative in the outward normal direction.145Appendix BINLA-SPDE ExampleIn this appendix, we demonstrate the use of R-INLA for the implemen-tation of the BEM model described in Chapter 5. We consider an artificialensemble of five deterministic model outputs. We randomly sampled 100locations within a unit square, and simulated data based on the followingsettings.• Parameters of the “true” underlying process Z– Mean parameters:µ(s) = α+ β1lat + β2long,where α = 1, β1 = 2, β2 = 1.– Covariance parameters: Mate´rn withsmoothness κ = 10;marginal variance σ2 = 1; andrange√8/κ = 0.28• Parameters of measurement process Zˆσ2e = 0.5.• Parameters of ensemble members Z˜j– Additive biases (a1, a2, a3, a4, a5) = (2.0, 3.5, 1.5, 2.5, 3.0).– Multiplicative biases (b1, b2, b3, b4, b5) = (0.8, 1.2, 0.9, 1.1, 1.5).– Variances (σ2δ1 , σ2δ2, σ2δ3 , σ2δ4, σ2δ5) = (2.0, 2.5, 1.5, 2.0, 3.0).146Appendix B. INLA-SPDE ExampleAfter simulating the data, the first step is to create a triangulation ofthe continuous spatial domain, as it is described in Figure B.1. Note thatthe spatial domain was extended to avoid a boundary effect.llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllFigure B.1: Triangulation for the artificial data. The mesh comprises of 486edges and was constructed using triangles that have a minimum angle of25, maximum edge length of 0.1 within the spatial domain and 0.2 in theextension domain. The 100 artificial monitoring locations are highlighted inred.For model completeness, in order to carry out the inference procedure,we describe our independent prior specifications below. For numerical sta-bility, we specify priors for precision parameters (inverse of the variance)in a logarithmic scale. Notation used below for normal priors is mean andprecision, whereas we refer to a Log-gamma as simply the logarithm of aGamma distribution.• For the mean parameters α, β1, β2, and calibration parameters aj andbj , for j = 1, . . . , 5, we specified a N(0, 0.01) prior.• For log(σ−2δj ), j = 1, . . . , 5, we specified a Log-gamma(0.01, 0.01) prior.• For log(σ−2e ), we specified a Log-gamma(1, 0.01) prior.147Appendix B. INLA-SPDE Example• For log(σ), we specify a N(0, 0.1) prior, for log(κ) a N(0, 1). Weheuristically specify the prior for the spatial range as a fifth of theapproximate domain diameter. This leads to a fairly vague prior spec-ification for log(σ). As described in Lindgren and Rue (2015), for thisheuristic choice, the precision 1 for the prior of log(κ) gives an ap-proximate 95% prior probability for the range being shorter than thedomain size.Table B.1 contains the posterior summaries for the parameters of theBEM model for the artificial data, all parameters were reasonably well esti-mated. Under a fairly weak prior specification, the measurement error pre-cision was underestimated. The marginal posterior densities for all modelparameters can be found in Figures B.2, B.3, B.4, B.5, B.6, and B.7.148Appendix B. INLA-SPDE ExampleTable B.1: Posterior summaries for the parameters of the BEM model for theartificial data, including posterior mean and 95% credible intervals. Notethat all parameters were reasonably well estimated. Under a fairly weakprior specification, the measurement error precision was underestimated.Parameter True Value Post. Mean CI (95%)α 1.00 0.87 (-0.78; 2.56)β1 2.00 1.55 (-0.61; 3.60)β2 1.00 2.36 (0.29; 4.62)σ−2e 2.00 3.38 (2.07; 5.27)σ−2 1.00 1.40 (0.65; 2.56)κ 10.0 8.76 (4.63; 13.52)ρ 0.28 0.32 (0.18; 0.53)a1 2.00 1.81 (1.15; 2.45)a2 3.50 3.25 (2.49; 4.01)a3 1.50 1.06 (0.44; 1.65)a4 2.50 2.55 (1.85; 3.26)a5 3.00 2.93 (2.03; 3.85)b1 0.80 0.91 (0.71; 1.11)b2 1.20 1.30 (1.07; 1.53)b3 0.90 1.04 (0.85; 1.24)b4 1.10 1.14 (0.92; 1.36)b5 1.50 1.45 (1.17; 1.73)σ−2δ1 0.50 0.64 (0.47; 0.85)σ−2δ2 0.40 0.47 (0.34; 0.63)σ−2δ3 0.67 0.70 (0.51; 0.94)σ−2δ4 0.50 0.53 (0.39; 0.72)σ−2δ5 0.33 0.28 (0.21; 0.38)149Appendix B. INLA-SPDE Example−5 0 50.00.10.20.30.40.5αDensity−5 0 5 100.00.10.20.30.4β1Density−5 0 5 100.00.10.20.30.4β2DensityFigure B.2: Posterior distributions for the mean parameters of the underly-ing random field. The gray line represents the true value.0.2 0.3 0.4 0.5 0.6012345σe2DensityFigure B.3: Posterior distribution for the measurement error variance i.e,σ2e . The gray line represents the true value.150Appendix B. INLA-SPDE Example−1 0 1 2 3 4 50.00.20.40.60.81.01.21.4a1Density0 2 4 60.00.20.40.60.81.01.2a2Density−2 −1 0 1 2 3 40.00.20.40.60.81.01.2a3Density−1 0 1 2 3 4 5 60.00.20.40.60.81.01.2a4Density0 2 4 60.00.20.40.60.81.0a5DensityFigure B.4: Posterior distributions for the additive calibration parametersfor each member of the ensemble i.e, aj , for j = 1, . . . , 5. The gray linerepresents the true value.151Appendix B. INLA-SPDE Example0.0 0.5 1.0 1.501234b1Density0.5 1.0 1.5 2.0 2.50.00.51.01.52.02.53.03.5b2Density0.0 0.5 1.0 1.5 2.001234b3Density0.0 0.5 1.0 1.5 2.00.00.51.01.52.02.53.03.5b4Density0.0 0.5 1.0 1.5 2.0 2.50.00.51.01.52.02.5b5DensityFigure B.5: Posterior distributions for the multiplicative calibration param-eters for each member of the ensemble i.e, bj , for j = 1, . . . , 5. The gray linerepresents the true value.152Appendix B. INLA-SPDE Example1.0 1.5 2.0 2.50.00.51.01.5σδ12Density1.5 2.0 2.5 3.0 3.50.00.20.40.60.81.01.2σδ22Density1.0 1.5 2.00.00.51.01.5σδ32Density1.5 2.0 2.5 3.00.00.20.40.60.81.01.21.4σδ42Density2.5 3.0 3.5 4.0 4.5 5.0 5.50.00.20.40.6σδ52DensityFigure B.6: Posterior distributions for the variance parameters for eachmember of the ensemble i.e, σ2δj , for j = 1, . . . , 5. The gray line representsthe true value.5 10 150.000.050.100.15κDensity0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.00.00.20.40.60.81.0σ2Density0.2 0.3 0.4 0.5 0.6 0.7012345ρDensityFigure B.7: Posterior distributions for covariance parameters, namely,smoothness, variance and range, respectively. The gray line represents thetrue value.153
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Models and monitoring designs for spatio-temporal climate...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Models and monitoring designs for spatio-temporal climate data fields Casquilho Resende, Camila Maria 2016
pdf
Page Metadata
Item Metadata
Title | Models and monitoring designs for spatio-temporal climate data fields |
Creator |
Casquilho Resende, Camila Maria |
Publisher | University of British Columbia |
Date Issued | 2016 |
Description | In this thesis, we describe how appropriately modelling the spatio-temporal mean surface can help resolve complex patterns of nonstationarity and improve spatial prediction. Nonstationary fields are common in environmental science applications, and even more so in regions with complex terrain. Our analyses focus on the Pacific Northwest, a region where rapid changes and localized weather are very common, and where the terrain plays an important role in separating often radically different climate and weather regimes. To this end, we introduce two comparable strategies for spatial prediction. The first is based on a Bayesian spatial prediction method, where an exploratory analysis was performed in order to better understand the localized weather regimes. The other is based on tackling the anomalies of expected climate in the Pacific Northwest region, based on the average values of temperature computed over a 30-year range obtained through a climate analysis system. Secondly, we focus on one of the recent challenges in spatial statistics applications, the data fusion problem. There has been an increased need for combining information from multiple sources that may be on different spatial scales. Ensemble modelling is referred to as a statistical post-processing technique based on combining multiple computer model outputs in a statistical model with the goal of obtaining probabilistic forecasts. We give an overview of some ensemble modelling strategies, by combining observed temperature measurements with outputs from an ensemble of deterministic climate models. We also provide a comparison between the Bayesian model averaging approach and a dynamic Bayesian ensemble strategy for forecasting. Finally, we introduce a novel strategy for the design of monitoring network, where the goal is to select a high-quality yet diverse set of locations. The idea of spatial repulsion is brought to this context via the theory of determinantal point processes. Our design strategy is not only able to yield spatially-balanced designs, but it also has the ability to assess similarity between the potential locations should there be extra sources of information related to the underlying process of interest. We explore its relationship to existing design methods, such as the entropy-based and space-filling designs. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2016-09-27 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-ShareAlike 4.0 International |
DOI | 10.14288/1.0314917 |
URI | http://hdl.handle.net/2429/59293 |
Degree |
Doctor of Philosophy - PhD |
Program |
Statistics |
Affiliation |
Science, Faculty of Statistics, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2016-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-sa/4.0/ |
Aggregated Source Repository | DSpace |
Download
- Media
- 24-ubc_2016_november_casquilhoresende_camila.pdf [ 15.85MB ]
- Metadata
- JSON: 24-1.0314917.json
- JSON-LD: 24-1.0314917-ld.json
- RDF/XML (Pretty): 24-1.0314917-rdf.xml
- RDF/JSON: 24-1.0314917-rdf.json
- Turtle: 24-1.0314917-turtle.txt
- N-Triples: 24-1.0314917-rdf-ntriples.txt
- Original Record: 24-1.0314917-source.json
- Full Text
- 24-1.0314917-fulltext.txt
- Citation
- 24-1.0314917.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0314917/manifest