An Approximate Spatio-TemporalBayesian Model for Alberta WheatYieldbyEvan Popo↵B.Sc. Hons., The University of British Columbia, 2012A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFMASTER OF SCIENCEinTHE COLLEGE OF GRADUATE STUDIES(Interdisciplinary Studies)THE UNIVERSITY OF BRITISH COLUMBIA(Okanagan)June 2014c Evan Popo↵, 2014AbstractCrop forecasting models are very valuable to a number of agriculturaland government agencies. We investigated the e↵ect of spatial and temporalenvironmental climate covariates on the growth of crop yield (wheat) at theregional scale across the province of Alberta. Model fitting was accomplishedusing data collected during the growing season from climate stations acrossAlberta provided by Agriculture and Agri-Food Canada (AAFC). A bestfitting model was selected which takes into account simplicity (number ofcovariates used) and accuracy (predictive capability based on two selectioncriteria).There have been a number of Bayesian methods for predicting wheatyield. However, many of these methods typically involve extensive algo-rithms such as a Metropolis-Hastings Markov Chain Monte Carlo (MCMC)that adds substantial computational complexity and run-time. We inves-tigated the application of a spatio-temporal Bayesian model entitled theIntegrated Nested Laplace Approximation (INLA). This method o↵ers acomputationally cheaper alternative to the MCMC approach and is capableof handling large data requiring interpolation (data sparsity) with relativeease. By structuring the model to have a sparse precision matrix, INLA isable to simplify posterior marginal estimation of the parameters by incor-porating the Laplace approximation.The INLA model demonstrated strong predictive capabilities when pre-dicting for one year in advance or hind-casting for a single previous year.However, when multiple years of data were removed or predictions weremade for multiple years in advance, INLA struggled to make predictionswhich deviated considerably from the mean of the remaining data. Predic-tive performance in the best fitting model saw a 40% increase in root meaniiAbstractsquared error (RMSE) when moving from one year to two and another 6%increase when moving from two to three years. We conclude that the INLAmodel structure o↵ers valuable information when examining one year in ad-vance but caution should be taken when attempting to forecast for multipleyears in advance.iiiTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . ivList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiiAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . ixChapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . 11.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1Chapter 2: Data Assimilation, Manipulation, and Interpola-tion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52.1 Data Assimilation and Manipulation . . . . . . . . . . . . . . 52.2 Dealing With Missing Data . . . . . . . . . . . . . . . . . . . 6Chapter 3: Model Methodology . . . . . . . . . . . . . . . . . . 113.1 Bayesian Spatio-Temporal Model . . . . . . . . . . . . . . . . 113.2 Gibbs Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . 133.3 The Integrated Nested Laplace Approximation (INLA) . . . . 153.4 Obtaining the marginals ⇡( k|y) . . . . . . . . . . . . . . . . 183.5 Obtaining the marginals ⇡(✓i|y) . . . . . . . . . . . . . . . . . 193.5.1 Using Gaussian Approximations . . . . . . . . . . . . 203.5.2 Using the Laplace Approximation . . . . . . . . . . . 203.5.3 Using the Simplified Laplace Approximation . . . . . 21ivTABLE OF CONTENTSChapter 4: Model Application Methodology . . . . . . . . . . 244.1 Latent Model Descriptions Used in R-INLA . . . . . . . . . . 244.2 Imputing Missing Covariate Values . . . . . . . . . . . . . . . 254.3 Spatial Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 264.4 Spatio-Temporal Models . . . . . . . . . . . . . . . . . . . . . 27Chapter 5: Model Selection . . . . . . . . . . . . . . . . . . . . 31Chapter 6: Results: Regional Distribution of Wheat Yield . . 42Chapter 7: Conclusion . . . . . . . . . . . . . . . . . . . . . . . . 51Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54vList of TablesTable 4.1 Covariate Configurations. . . . . . . . . . . . . . . . . . 30Table 5.1 Deviance information criterion (DIC) and cross vali-dated root mean squared error (CVRMSE) values forthe various models fitted. Smaller values of DIC andCVRMSE indicate a better model fit so Model 2 underconfiguration III had the best DIC value and Model 1under configuration II had the best overall CVRMSEvalue. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35Table 5.2 Correlation matrix for maximum temperature, mini-mum temperature, and growing degree days. Small val-ues indicate no potential problems with collinearity. . . 36Table 5.3 Deviance information criterion (DIC) and cross vali-dated root mean squared error (CVRMSE) values fora subset of models fitted. Model 2 under configurationIII had the best overall DIC value and model 1 underconfiguration I had the best overall CVRMSE value. . . 38Table 5.4 Model validation for spatial only, INLA imputed missingvalues, and average imputed missing values. The modelusing the INLA imputed missing values displayed thebest results in terms of both DIC and CVRMSE. . . . . 40viList of FiguresFigure 2.1 Map of 150 ecodistricts in Alberta and 1272 climatestations. Wheat yield data is unavailable for all whiteecodistricts and is available for all 92 ecodistricts shadedin grey over the period of 1999-2004. . . . . . . . . . . 7Figure 2.2 Total precipitation normalization. The square roottransformation better normalized the precipitation dataopposed to the Log + 1 transformation. . . . . . . . . 10Figure 3.1 Exploration of the posterior marginal for . Assumingtwo dimensions, a grid search is performed along thestandardized axes [14]. . . . . . . . . . . . . . . . . . . 19Figure 6.1 25th and 75th Spatial Percentiles for Fitted MeanWheat Yield over Alberta. Highest wheat yield den-sity is observed throughout the central belt of Alberta.Significant variation of wheat yield is observed overneighbouring ecosdistricts. . . . . . . . . . . . . . . . . 43Figure 6.2 25th and 75th temporal percentiles for fitted meanwheat yield over Alberta covering the years 1999-2004.Highlighted is the temporal variability. The blue linerepresents the mean predicted wheat yield with thelower and upper portions of the grey band representingthe 25th and 75th percentiles, respectively. The great-est mean wheat yield is observed in 1999 and 2004,with a minimum trough year of 2002. . . . . . . . . . . 44viiLIST OF FIGURESFigure 6.3 Probability of ecodistrict wheat yield in Alberta ex-ceeding overall mean threshold. Consistencies with thespatial variability in 6.1 are displayed. With high-est probabilities observed throughout the central beltof Alberta and large variation between neighbouringecodistricts. . . . . . . . . . . . . . . . . . . . . . . . . 46Figure 6.4 Mean wheat yield predictions over Alberta for theyear 2004 - one year in advance. Predicted values dis-played on the left are fitted using the data from 1999-2003. White ecodistricts within the observed data onthe right indicate missing data for the year 2004. . . . 47Figure 6.5 Mean Wheat Yield Predictions over Alberta for theyear 2004-two years in advance. Predicted values dis-played on the left are fitted using the data from 1999-2002. White ecodistricts within the observed data onthe right indicate missing data for the year 2004. . . . 48Figure 6.6 Mean Wheat Yield Predictions over Alberta for theyear 2004-three years in advance. Predicted values dis-played on the left are fitted using the data from 1999-2001. White ecodistricts within the observed data onthe right indicate missing data for the year 2004. . . . 49Figure 6.7 Mean Wheat Yield Predictions over Alberta for theyear 2005-one year ahead of known data. Predictedvalues are fitted using all six years of available datafrom 1999-2004. . . . . . . . . . . . . . . . . . . . . . . 50viiiAcknowledgementsFirst of all, I would like to thank my supervisor, Dr. Jason Loeppky,who supported and guided me throughout the entirety of my Masters De-gree. You showed me the way to statistics and opened the door to a greatopportunity to work at Agriculture and Agri-Food Canada. Thank you foryour patience and for always having my best interests in mind. The lessonsand knowledge I have learnt from you will stick with me for a lifetime.Thank you to Dr. Nathaniel Newlands for allowing me the amazingwork opportunity at Agriculture and Agri-Food Canada and for all of yourhelpful and encouraging insight throughout my Masters. This opportunityand your insight made me excited about statistics and this thesis would havenot been possible without either of them.Thank you to Crystal Parras for being an outstanding research and workpartner throughout the summer of 2013. Your enthusiasm and dedicationis what pushed the project through and I couldn’t have wished for a moreorganized, understanding, and supportive partner.ixChapter 1Introduction1.1 MotivationA reliable model for forecasting crop yield at the regional-scale is cru-cial for guiding decision making by farmers, governments, and agriculturalindustries [1]. For farmers, it provides a reference in order to estimate theirexpected profits for the year. It also helps them judge how to care for theircrops. For example, a model can provide information as to what the optimalamount, and combination, of chemicals in the fertilizer will help their cropsgrow best, how much water to provide their crops throughout the season,and information on the best time to reseed and replace old crops. Forecastsalso provide farmers with important marketing and production informationthat help with decisions influencing their financial well-being further on inthe year [1]. Governments use agricultural forecasts as a way to adequatelyadjust policies and pricing and also to predict profits, which influence thingssuch as Gross Domestic Product (GDP) for the country. In some countries,governments use forecasts to intervene in the interest of protecting theirdomestic agriculture [1]. Agricultural journalists make up another portionof the population who has a need for the forecasts. Agricultural journalistsprovide another way for farmers and the general public to keep up-to-dateon crop forecasts. Food producers and others in the food industry requireagricultural forecasts in order to adequately make purchasing decisions andto aid in storage decisions [1]. For example, if a producer requires a cropand purchases a large quantity of that crop in advance, it can be expensivefor that producer to store all the extra product. However, if a forecast de-termines that a good harvest is approaching, the producer can then waitand buy at a later time to avoid the extra storage costs [1].11.1. MotivationThere are di↵erent models available for predicting wheat yield. Theyrange from basic linear and quadratic regression models, to more sophis-ticated methods [4, 9]. Newlands and Zamar [9] partition the province ofAlberta into Agricultural Statistical Census Regions (CARs) and take ad-vantage of the data from neighbouring CARs in order to obtain a predictedmean wheat yield for each CAR. They use techniques such as a robust leastangle regression (LARS), a Markov Chain Monte Carlo (MCMC) while in-corporating a Metropolis-Hastings step, and a Random Forest Algorithm.A more detailed description of their methods can be found in [9]. Newlandset al. then go on to improve upon this method in a more recent publication[10]. Carew et al. employ the use of a three-stage generalized least squaresprocedure. This procedure is a form of weighted regression, which was usedto account for the heteroscedasticity present between certain variables [4].There has also been some work in comparing various wheat forecastingmethods. These include the methods of Michel and Makowski which com-pare time series models at regional and national scales [8], as well as themethods of Prost et al. which involve comparing stepwise model selectionwith Bayesian model averaging [12].In what follows, we consider a spatio-temporal model within a Bayesianframework. Bayesian methods are beneficial as they allow prior informa-tion to be included into the analysis which accounts for results learned fromprevious studies and/or previous knowledge. Bayesian methods accountfor all sources of uncertainty. Furthermore, a Bayesian approach allowsfor easily obtainable information about populations parameters. For exam-ple, under a Bayesian approach it is easy to obtain the probability thatthe population mean is greater than a certain value. Having such capabili-ties allows for much more intuitive results than what a frequentist analysiswould present. Parameter estimation is accomplished using the IntegratedNested Laplace Approximation (INLA) which is computationally feasiblefor large data. The INLA method has been used before to construct mod-els for spatial and spatio-temporal data, as seen in [3]. To our knowledge,this is the first time the INLA method has been employed for the modellingof crop yield. INLA incorporates the spatial correlation between regions21.1. Motivationas a neighbourhood structure while operating under the Markovian prop-erty. This property says that two regions share a non-zero correlation ifand only if the two regions are neighbours of each other. This implies thatnon-neighbouring regions are assumed to be completely independent of oneanother. Spatial structures of this form are well documented and referred toas Gaussian Markov Random Fields (GMRF)[13]. Unlike previous MCMCmethods which randomly sample to eventually converge on the marginal pos-terior distribution of the parameters in the model; INLA exploits its modelassumptions to obtain a numerical approximation to the marginal posteriorsin question by using a Laplace approximation [3]. A full layout of how theINLA model operates will be introduced in chapter 3. The use of the R-INLApackage in R allows for ease of implementation of the INLA method. TheR-INLA package can be obtained for free download from www.r-inla.org.The INLA method allows for a more ecient, computationally cheaperalternative to the Markov chain Monte Carlo (MCMC) technique mentionedin the previous paragraph that is often implemented when dealing withBayesian modelling [3]. MCMC requires intensive computational poweralong with lengthy simulations in order to converge upon the posterior dis-tribution of the parameters [3]. In some cases the MCMC method may notconverge to the target posterior. INLA uses a combination of an analyticalapproximation and numerical integration to obtain the target distributionwhile removing the aforementioned convergence problems that can arise aswell as the lengthy computation time needed in most cases [16].The aim of this analysis involves obtaining a model using INLA whichaccurately describes the properties of wheat yield throughout the province ofAlberta. Only regions of Alberta for which data in the covariates are readilyavailable were considered. Environmental covariates that are deemed bestat predicting the wheat behaviour were evaluated. When selecting a model,both accuracy and eciency were considered. A final working INLA modelwas chosen based on the model selection criterion outlined in chapter 5. Thefinal model will then be used to obtain a forecast of predicted wheat yieldfor up to three years into the future as well as a year ahead of the rangeof available data. Also, the probability the wheat yield in each ecodistrict31.1. Motivationinvolved in the model exceeds the mean wheat yield over all ecodistricts willbe calculated and included in a map plot. 25% and 75% confidence intervalswere produced for wheat yield. These intervals hind cast for all the years ofavailable data, taking into account the spatial and temporal aspects of thedata.The remainder of this thesis is organized as follows. Chapter 2 high-lights how the data were obtained, structured and organized, as well as thedi↵erent methods for dealing with missing data. Chapter 3 then goes on todescribe the INLA method in detail, along with all mathematical algorithmsbehind the method. Chapter 4 discusses the structure of the di↵erent mod-els used to fit both spatial and spatio-temporal data as well as describes thedi↵erent configurations of covariates used in each model structure. Chapter5 lays out the di↵erent methods used for model selection and describes theprocess of narrowing down the models to one final model. Chapter 6 dis-plays various graphs and results obtained from the final selected model inthe previous chapter. Chapter 7 concludes and summarizes upon conclusionsdrawn from the analysis.4Chapter 2Data Assimilation,Manipulation, andInterpolation2.1 Data Assimilation and ManipulationAll data used throughout the analysis were provided by Agriculture andAgri-Food Canada (hereafter, AAFC). The province of Alberta is dividedinto 150 ecodistricts as defined by Agriculture and Agri-Food Canada. Datafrom the years 1999-2004 for mean wheat yield were provided over the grow-ing season for the 150 ecodistricts (with missing data present). The growingseason is defined to take place between the months of May to September.Ecodistricts where data for wheat yield were missing throughout the entiretime period were removed. This left a total of 92 ecodistricts remaining thatcontained wheat yield data in at least one year. Wheat yield is measured inbushels/acre and contains the following varieties of wheat: Canadian West-ern Extra Strong (CWES), Prairie Red, Soft White, Durum, Hard Red, andWinter wheat.Climate stations are distributed throughout Alberta. Daily year rounddata for maximum temperature (C), minimum temperature (C), and pre-cipitation (mm) are recorded at these stations. A map displaying the ecodis-tricts and climate stations is shown below in Figure 2.1. The grey ecodis-tricts represent ecodistricts where we have data on yield for the period of1999-2004 and the white ecodistricts represent locations where yield data inunavailable. Due to the yearly format of the wheat yield data having only52.2. Dealing With Missing Dataone value per ecodistrict, the data for the other variables was arranged toalso have one yearly value per ecodistrict. Over the days of the growing sea-son the total precipitation is summed and an average of both the maximumand minimum temperature is taken for each climate station. Manipulat-ing the data into a di↵erent format may have resulted in di↵erent modelfits. For example, keeping the temperature and precipitation data as a dailymeasure over the growing season or averaging these data on a monthly scaleover the growing season. In addition, examining precipitation data prior tothe growing season may have di↵erent results than summing total precipi-tation over the growing season. Following, the average of the data from allclimate stations within each ecodistrict is taken to obtain one value for min-imum temperature, maximum temperature and total precipitation for eachecodistrict in each year. Data for water deficit (mm/growing season) andgrowing degree days (GDD) (C-day) were also provided for each ecodis-trict in each year in the analysis. Water deficit is defined as precipitationminus potential evapotranspiration, where potential evapotranspiration isthe representation of the environmental demand for evapotranspiration pro-vided there is adequate water status in the soil profile. Evapotranspirationis the sum of evaporation and plant transpiration from the Earth’s landto the atmosphere. Growing degree days are defined as the average of thedaily maximum and minimum temperature compared to a base temperature(usually 10 C). Defined in an equation:GDD = Tmax + Tmin2 Tbase, (2.1)where Tbase is the base temperature.2.2 Dealing With Missing DataThe INLA method allows for missing values in the response variable(wheat yield). Therefore, there was no need to account for missing valuesfor mean wheat yield before fitting the models. However, INLA is not ableto fit a model with missing values in the covariates and because of this,62.2. Dealing With Missing DataFigure 2.1: Map of 150 ecodistricts in Alberta and 1272 climate stations.Wheat yield data is unavailable for all white ecodistricts and is available forall 92 ecodistricts shaded in grey over the period of 1999-2004.72.2. Dealing With Missing Datamissing values for temperature, water deficit, GDD, and precipitation hadto be imputed. Three di↵erent methods were considered when addressingthe issue of missing covariate data. The benefits of each of these approachesare discussed in chapter 5.1. Nearest neighboursFor each ecodistrict in Alberta with missing data, the average of thedata from the four nearest neighbouring ecodistricts were used to im-pute the missing value.The nearest neighbours were determined by calculating the distancesbetween the centre point of the current ecodistrict and the centrepoints of the other ecodistricts. This resulted in selecting ecodistrictswith the four shortest distances from the current ecodistrict’s centre,otherwise known as the nearest neighbours.Problems with this method stem from the fact that the missing datain Alberta is not randomly scattered throughout the province. Themissing data are most often observed over a large block of ecodistricts.Hence, there were many instances (approximately 1/3 of the time)where the four nearest neighbours of an ecodistrict with missing dataalso contained no data and therefore, the nearest neighbours methodwas not used.2. Averaging dataFor each year, the average of the covariate data for each ecodistrict wastaken (minimum temperature, maximum temperature, water deficit,GDD, and total precipitation). This value was used to fill in all themissing data for the corresponding covariate.3. Prediction using the INLA methodFor each covariate in the model, maximum temperature, minimumtemperature, water deficit, GDD, and total precipitation, the INLAmethod was used to predict missing values. Each covariate was mod-elled separately. For the spatial only model the covariate was the82.2. Dealing With Missing Dataresponse variable and the spatial structure ID was the sole predictorvariable and was modelled using the Besag, York, and Molie (BYM)model [2]. Details on this model will be included in chapter 4 to fol-low. For the spatio-temporal model the covariate was again used asthe response variable. This time the spatial structure ID and the yearvariables were both used as predictor variables for the response. Thespatial ID was again modelled using a BYM model and the year vari-able was modelled using a first order random walk [11], which will alsobe described in more detail in the section to follow.For both the spatial and spatio-temporal models, the fitted values fromthe resulting models were extracted and used to fill in the missing datafor the respective ecodistricts in their given years.After examining the data via histogram for total precipitation, it wasdeemed necessary to attempt to correct for normality as this precipita-tion data is to be used as a response variable in the imputation process.The two transformations considered prior to fitting were a square roottransformation and a log(data+1) transformation. Both of these trans-formations allow for a value of zero to be mapped back to zero withoutany diculties. After viewing histograms of both transformations, thesquare root transformation was chosen as it better normalized the dataas well as preserving the spread of the data. These properties provedfavourable when random e↵ects were introduced via the first orderrandom walk. The square root and log transformations can be seenin figure 2.2 to follow. When the fitted values were extracted, each ofthe values was squared to reverse the transformation.Because the nearest neighbour method was deemed infeasible, a de-tailed description comparing the two remaining data imputation meth-ods will be outlined in Chapter 5. To summarize, in terms of theDeviance Information Criterion (DIC) it can be seen that the modelfits using INLA imputed values performed better than the model fitsusing the averaged data in all cases.92.2. Dealing With Missing DataTotal PrecipitationFrequency0 1000 2000 3000 4000 5000 6000 7000020406080100Square Root of Total PrecipitationFrequency20 40 60 80050100150Log + 1 of Total PrecipitationFrequency5 6 7 8 9050100150200Figure 2.2: Total precipitation normalization. The square root transfor-mation better normalized the precipitation data opposed to the Log + 1transformation.This also held true for essentially every case when examining the vari-ous cross-validated root mean squared errors (CV RMSEs). For thesereasons, imputing values using INLA was selected as favourable overthe data averaging method.10Chapter 3Model Methodology3.1 Bayesian Spatio-Temporal ModelHere we examine methods allowing for the incorporation of both spatialand spatio-temporal data. Spatial data in two-dimensional space are indexedby Y(s)⌘ {y(s), s 2 D 2 R2}, and the spatio-temporal data we observe areindexed by Y(s,t)⌘ {y(s, t), (s, t) 2 D ⇢ R2 ⇥ R}. Here, D can representeither a set of spatial units in the case of point level data, or a continuoussurface in the case of area level data [3]. In what follows, we will consideronly area level spatio-temporal data. In this context, the set of spatiallocations (s11, ..., snT ) were used to examine the spatio-temporal pattern ofmean wheat yield over Alberta in n = 92 spatial ecodistricts and T = 6 timepoints representative of the years 1999 through 2004. The observed meanwheat yield data was represented as y(s) = {y(s11), ..., y(snT )}.We define a spatio-temporal model under a Bayesian framework. Let✓ be the set of parameters representing the expected mean wheat yieldresponse at each spatio-temporal location. These parameters are indexedin such a way that the spatial correlation between ecodistricts is accountedfor. The subscript notation i, j will be used to indicate a spatio-temporalpoint opposed to using the sij index.As the methods involved in this analysis only consider area level data(as opposed to data situated at fixed points), it is possible to apply a neigh-bourhood structure to the data. This is accomplished by operating underthe Markovian property, which states that any two elements of the param-eter vector ✓ share a non-zero correlation if and only if the two ecodistrictsindexed in the parameter vector are neighbours of each other. Defining theprecision matrix for the elements of ✓ to be Q = ⌃1, we have Qij repre-113.1. Bayesian Spatio-Temporal Modelsenting the entry for any pair of elements (i, j). Then, under the Markovianproperty, Qij 6= 0 only if j 2 {i,N (i)}, where N (i) represents the set ofneighbours of ✓i [3]. Spatial structures of this form are referred to as Gaus-sian Markov Random Fields (GMRF) [13].The structure of the Bayesian model is then as follows. The mean of thei-th unit, µi, is linked to an additive predictor ⌘i by means of an appropriatelink function g(·), such that g(µi) = ⌘i (for example, a logistic link func-tion for binomial data). ⌘i is then modelled additively using the followingstructure: ⌘i = ↵ + MXm=1mxmi + LXl=1fl(zli). (3.1)In our context, the link function used will simply be the identity function,so that the mean for observation i (µi = ⌘i). Here, ↵ represents the intercept, = (1, ...,M ) represents the vector of linear coecients corresponding tothe covariates x = (x1, ..., xM ), and f = (f1(·), ..., fM (·)) represents a set offunctions defined on another set of covariates z = (z1, ..., zM ) which may ormay not be the same as those in x.Looking back to equation (3.1), we can now define ✓ more concretely.The parameters of interest are given by ✓ = {↵,, f}. To return to theprevious discussion in this chapter, a GMRF prior is placed on the parame-ter vector ✓. This prior has mean 0 and precision matrix Q. Furthermore,a vector of K hyper-parameters = ( 1, ..., K) contained in the preci-sion matrix Q( ) is specified. This corresponds to ✓ having a multivariatenormal distribution with mean vector µ⇤, whose ith element is equal to↵ + MXm=1mxmi and covariance matrix Q1( ).The objective of the analysis would ideally be to obtain the joint pos-terior distribution of the parameters given the data, ⇡(✓, |y). However,this not only does not have a closed form known distribution, it is alsocomputationally infeasible to obtain a direct estimate of this distribution.Therefore, we must settle for a computationally feasible alternative methodthat involves estimating the marginal posterior distributions for each of the123.2. Gibbs Samplingelements in both the parameter vector ✓ and the hyper-parameter vector .These marginals are represented respectively below [3]:⇡(✓i|y) = Z ⇡( |y)⇡(✓i| ,y) d (3.2)⇡( k|y) = Z ⇡( |y) d k. (3.3)To accomplish this, we will examine two di↵erent techniques. The tradi-tionally used approach of Gibbs sampling, and the Integrated Nested LaplaceApproximation (INLA). Gibbs sampling, while practical and proven to be ef-fective, requires a lot of implementation and execution time especially whendealing with large data. INLA o↵ers a computationally cheaper techniquethat is structured in a matter that allows large data to be handled withrelative ease.The reason for wishing to to stray away from traditional methodologyand to apply the methodology of the INLA approach is to test the tech-nique on the limited region of the province of Alberta in order to investigatewhether these methods can o↵er any computational and accuracy improve-ments when compared to the traditional Gibbs sampling approach on thisregion. If this is the case, this would o↵er valuable insight for longer-termoperational goals regarding agricultural risk management and decision sup-port. The INLA methodology would hope to be extended to a larger projectby Agriculture and Agri-Foods Canada that would expand the model regionto the entirety of the agricultural landscape of Canada.3.2 Gibbs SamplingGibbs Sampling has been the go to method when dealing with spatialmodel parameter estimation. WinBUGS is the traditional statistical soft-ware developed for use in Bayesian methods and for incorporating MCMCmethods such as Gibbs Sampling. The Gibbs Sampling technique proceedsas follows.To begin, we assume that all of the marginals represented by equations133.2. Gibbs Sampling3.2 and 3.3 are of known form (for example Gaussian, Gamma, etc.). Inthis senerio, Gibbs sampling proceeds to sample from the aforementionedmarginals via the following steps:1. Set arbitrary initial values for ✓(0), (0)2. For t=1, ..., T proceed with the following:Step 1: Randomly sample ✓(t)1 from⇡(✓1|✓(t1)2 , ..., ✓(t1)I , (t1)1 , ..., (t1)K ,y)...Step I: Randomly sample ✓(t)Ifrom⇡(✓I |✓(t)1 , ..., ✓(t)I1, (t1)1 , ..., (t1)K ,y)Step I + 1: Randomly sample (t)1 from⇡( 1|✓(t)1 , ..., ✓(t)I , (t1)2 , ..., (t1)K ,y)...Step I + K: Randomly sample (t)Kfrom⇡( K |✓(t)1 , ..., ✓(t)I , (t)1 , ..., (t)K1,y)Gibbs Sampling operates under the fact that as t gets large, samplesfrom each of the marginal distributions of the elements of the parametervector will converge to a sample from the joint posterior distribution of theelements of the parameter vector.However, it is often the case that we do not have a recognizable dis-tribution for some or all of the marginals in the steps highlighted above.Therefore, at any step in which the marginal is of unknown form, we mustemploy the use of the Metropolis Hastings Algorithm to sample for thosemarginals.143.3. The Integrated Nested Laplace Approximation (INLA)The Metropolis Hastings Algorithm proceeds as follows. Note that thisis an example for ✓i, the same would hold true if needed for any of the k’s1. Choose an arbitrary starting value for ✓i (called ✓(0)i )2. Specify a proposal density function for ✓i (presented as q(✓⇤i |✓(t)i ))3. Draw ✓⇤ifrom q(✓⇤i|✓(t1)i)4. Compute the ratio r= ⇡(✓⇤i )q(✓(t1)i |✓⇤i )⇡(✓(t1)i )q(✓⇤i |✓(t1)i )In the case that the proposal density is symmetric, we would havethat:r=⇡(✓⇤i )⇡(✓(t1)i ).5. If r1, this means that the proposal draw has a higher probabilityof occurring than the current value of ✓i (that is ✓(t1)i ) and we willthus accept the proposal draw and set ✓(t)i=✓⇤i. If r1, what happensessentially is that a random coin is flipped. With this, we will set✓(t)i=8<:✓⇤i with probability r✓(t1)iwith probability 1-rFinally, once all the iterations of this chain are complete using directsample or a Metropolis step, in order to obtain the final estimates, typicallywhat is done is that the mode of each sample is determined and this valueis taken as our final estimate for each parameter in each model.3.3 The Integrated Nested LaplaceApproximation (INLA)The Integrated Nested Laplace Approximation [14] is a method devel-oped for use on a group of structured additive regression models, entitledlatent Gaussian models. These models span a large variety of structuresincluding generalized linear models and spatial and spatio-temporal models.Due to the flexible nature of these models, they have been used in a widerange of applications and fields [3].153.3. The Integrated Nested Laplace Approximation (INLA)The first step in the INLA approach is to find a nested approximation for⇡( |y). From this, it will then be possible to obtain the marginals ⇡( k|y).Due to the conditional independence of ✓ and given y induced by theGMRF [13], it can be seen that:⇡(✓, |y) = ⇡(✓|y)⇡( |y).Again, due to the conditional independence relationship, ⇡(✓|y) = ⇡(✓| ,y).Therefore, we obtain: ⇡( |y) = ⇡(✓, |y)⇡(✓| ,y) . (3.4)The above marginal is then approximated by:⇡˜( |y) = ⇡(✓, |y)⇡˜(✓| ,y) ✓=✓⇤( ) , (3.5)where ⇡˜(✓| ,y) is the Gaussian approximation of ⇡(✓| ,y) and ✓ =✓⇤( ) is the mode of ⇡˜(✓| ,y).The Gaussian approximation is constructed based on densities of theform [14]: ⇡(✓) / (12✓TQ✓ +Xi2Ilog {⇡(yi|✓i, )}) . (3.6)For ease of readability, let log {⇡(yi|✓i, )} = gi(✓i). The Gaussian ap-proximation is then constructed by matching the mode of the distributionand the shape of the distribution to that of a Gaussian distribution. Themode is calculated by using either a Newton-Raphson scoring algorithm orthe Fisher scoring algorithm [6]. These algorithms are conducted via thefollowing steps:1. Produce an initial estimate, µ(0), for the mode of ⇡(✓).2. Conduct a Taylor series expansion of gi(✓i) around µ(0)i truncating any163.3. The Integrated Nested Laplace Approximation (INLA)terms higher than second order. This is highlighted below:gi(✓i) ⇡ gi(µ(0)i ) + bi✓i 12ci✓2i (3.7)where bi = f1(µ(0)) and ci = f2(µ(0)).3. Obtain a Gaussian approximation of ⇡(✓) with precision matrix Q +diag(c) and mode µ(1) found as the solution of {Q + diag(c)}µ(1)=b.4. Continue until the approximation converges to a Gaussian distribution.Define the mean of this converged distribution to be ✓⇤ and precisionmatrix Q⇤ = Q + diag(c⇤) [14].The next step is to locate the mode of ⇡˜( |y), by optimizing log{⇡˜( |y)}with respect to using a quasi-Newton method [7]. This updates the sec-ond derivatives of log{⇡˜( |y)} using successive gradient vectors to convergeto the target mode. A popular quasi-Newton method to achieve this isthe Broyden-Flether-Goldbarb-Shanno (BFGS) method. This method startswith an initial estimate of the mode x0 for a function f(x), and an approx-imate Hessian matrix B0 is obtained using finite di↵erences to approximateall the second derivatives of f(x). Define the gradient of f to be rf , whichis also approximated using finite di↵erences. These steps are completed untilconvergence is achieved:1. Obtain a direction pk by solving the equation Bkpk = rf(xk) forpk.2. Minimize the function f(xk+↵pk) for ↵ 2 R+ to obtain the minimum↵k. Then update xk+1 = xk + ↵kpk.3. Set sk = ↵kpk.4. Obtain the di↵erence of successive gradient vectors, yk = rf(xk+1)rf(xk)5. Update the approximate Hessian matrix, Bk+1 = Bk+ykyTkyTk skBksksTkBksTkBksk .173.4. Obtaining the marginals ⇡( k|y)Once convergence is achieved for ⇡˜( |y), define the values of the functionat the mode to be ⇤.Following, the negative Hessian matrix is computed at ⇤. This is againaccomplished using finite di↵erences. Define this matrix as H1.3.4 Obtaining the marginals ⇡( k|y)As is Gaussian distributed, we set ⌃ = H1 to be its covariancematrix. In order to more eciently explore the distribution of log{⇡˜( |y)},standardized variables z will be used instead of . This is accomplished bysetting ⌃ = V⇤VT to be the eigendecomposition of ⌃, and defining asfollows (with, z ⇠ N (0, I)): (z) = ⇤ +V⇤1/2z. (3.8)log{⇡˜( |y)} is now explored using the z-configuration as shown in figure 3.1for a two dimensional vector [14]. The intersection of the two z1 and z2axis represents the mode (z=0). In order to explore log{⇡˜( |y)} and findthe maximum probability density, the following grid search is completed.Beginning at the mode, the density is explored in the positive z1 directionusing a pre-specified step width z as long as the conditionlog [⇡˜ { (0)|y}] log [⇡˜ { (z)|y}] < ⇡ (3.9)holds. The smaller the value of z > 0, the more accurate the exploration is.Where ⇡=2.5 is a pre-defined value depending on the threshold accuracy.Following, the negative z1 direction is explored the same way, and finallythe z2 axis is explored in the same matter as the z1 axis. The result of thisexploration produces the estimates represented by the black dots shown infigure 3.1. Afterwards, all combinations in the grid of black dots are evalu-ated and are included as grey dots provided that they satisfy the conditionin equation (3.9). As all of the points are laid out on a rectangular grid, allof the area weights k are taken to be equal. These weights will be usedduring a numerical integration as seen in equation (3.10) to follow.183.5. Obtaining the marginals ⇡(✓i|y)Figure 3.1: Exploration of the posterior marginal for . Assuming twodimensions, a grid search is performed along the standardized axes [14].These evaluated points { k} will now be used in a numerical integrationto evaluate the posterior marginals ⇡(✓i|y). This is accomplished by usingthe points to construct an interpolant to log{⇡˜( |y)} and evaluating themarginals ⇡˜( k|y) via numerical integration from this interpolant.3.5 Obtaining the marginals ⇡(✓i|y)The next task for the INLA approach is to obtain an estimate for themarginals ⇡(✓i|y). This was accomplished using the following numericalintegration: ⇡˜(✓i|y) ⇡ KXk=1⇡˜(✓i| k,y)⇡˜( k|y)k. (3.10)Having just calculated the ⇡( k|y) and the weights k, the only remainingtask is to estimate the marginals ⇡(✓i| ,y).There are multiple methods for which to estimate these marginals: using193.5. Obtaining the marginals ⇡(✓i|y)Gaussian approximations, Laplace approximations, or a simplified LaplaceApproximation.3.5.1 Using Gaussian ApproximationsUsing Gaussian approximations does not require much more calculationtime from what has already been computed. However, because of this, themethod lacks accuracy when compared to the others. When constructing theestimate for ⇡˜( |y), the Gaussian approximation for ⇡(✓| ,y) was alreadycalculated. Therefore, all that would be left to calculate are the marginalvariances for each ✓i.3.5.2 Using the Laplace ApproximationA more accurate and computationally intensive method then the Gaus-sian approximation is to compute the Laplace approximation. Note: theLaplace approximation will not be used directly for any computations withINLA but some preliminary explanations must be made in order to properlylay out the structure of the simplified Laplace approximation which will beused for calculations. We partitioned the parameter vector as ✓ = (✓i,✓i)and the following was calculated using the Laplace approximation:⇡(✓i| ,y) = ⇡((✓i,✓i)| ,y)⇡(✓i|✓i, ,y) . (3.11)This is approximated at the modal configuration as per equation (3.5). Wedenote the Laplace approximation at the modal configuration using the sub-script LA:⇡(✓i| ,y) ⇡ ⇡˜LA(✓i| ,y) = ⇡(✓, ,y)˜˜⇡(✓i|✓i, ,y) ✓i=✓⇤i(✓i, ) , (3.12)where ˜˜⇡(✓i|✓i, ,y) is the Gaussian approximation of ⇡(✓i|✓i, ,y) (thisis di↵erent than the Gaussian approximation ⇡˜(✓| ,y) computed previ-ously), and ✓⇤i(✓i, ) is the modal configuration.203.5. Obtaining the marginals ⇡(✓i|y)The computational complexity of equation (3.12) arises from the factthat ˜˜⇡(✓i|✓i, ,y) must be re-computed for every value of ✓i. We simpli-fied this procedure by proposing either to remove the need for ˜˜⇡(✓i|✓i, ,y)to be optimized for every value of ✓i altogether or to downgrade the opti-mization to just a subset of the ✓i’s.The first method removes the optimization step by estimating the modalconfiguration from the previously computed Gaussian approximation ⇡˜(✓| ,y):✓⇤i(✓i, ) ⇡ E⇡˜[✓i|✓i], (3.13)where the expected value on the right hand side is computed from ⇡˜(✓| ,y).The second method involves choosing only the ✓j ’s that are close to ✓ias they have the most e↵ect on the marginal of ✓i. The correlation betweenregions ✓j and ✓i decreases as the distances between the nodes increases. Wedefined a space as the set Ri( ) where we only include the regions ✓js thatfall within a specified distance and use only these regions to compute themarginal of ✓i. The points found with the region Ri( ) are then standard-ized using the marginal mean and standard deviation from the Gaussianapproximation found in equation (3.5) and are represented as ✓(s)i. Thesepoints are then used in the construction of the approximation of the marginalfor ✓i.In the computation of the Laplace approximation in equation (3.12) thereare more details which can be found in [14], but for the purposes of explainingthe simplified Laplace approximation for use in INLA, we have all the detailswe need at this point.3.5.3 Using the Simplified Laplace ApproximationThe method used by INLA which best balances the need for eciencyand accuracy is the Simplified Laplace Approximation. This method pro-ceeds by computing a Taylor’s series expansion of the Laplace Approxima-tion and only retaining the first few terms. This Taylor’s series is thenmanipulated via a skewness function which better fits the approximation tothe target distribution.213.5. Obtaining the marginals ⇡(✓i|y)A general guideline of how the Simplified Laplace Approximation is con-structed will be laid out to follow. However, for more details please refer to[14].As specified, the natural logarithm of the numerator of equation (3.12)is approximated via a Taylor’s series expansion truncated to any terms lessthan or equal to order three.log {⇡(✓, ,y)}|✓i=E⇡˜ [✓i|✓i] = 12 ⇣✓(s)i ⌘2+16⇣✓(s)i⌘3 Xj2I\id(3)j{µi( ), }{j( )aij( )}3 + ... (3.14)As we have Gaussian distributed data, the denominator of (3.12) simplyreduces to a constant [14].log˜˜⇡(✓i|✓i, ,y) ✓i=E⇡˜ [✓i|✓i] = constant (3.15)Setting (3)i( ) =Xj2I\id(3)j{µi( ), }{j( )aij( )}3,and combining equations (3.14) and (3.15), we obtain a simplified ap-proximation of equation (3.12) as:logn⇡˜SLA(✓(s)i | ,y)o = constant12 ⇣✓(s)i ⌘2+16 ⇣✓(s)i ⌘3 (3)i ( )+... (3.16)Finally, to complete the approximation, a skew-normal distribution is intro-duced upon (3.16) in order to better emulate the desired distribution. Thisdistribution is given by:⇡SN (z) = 2! ✓z ⇠! ◆✓az ⇠! ◆ , (3.17)where (·) and (·) are the standard normal cumulative distribution and223.5. Obtaining the marginals ⇡(✓i|y)probability density functions respectively. Furthermore, ⇠, ! > 0, and a arethe location, scale, and skewness parameters.23Chapter 4Model ApplicationMethodology4.1 Latent Model Descriptions Used in R-INLAThe structuring of INLA allows for each covariate to be fit using a dif-ferent model or added linearly, with the final model being a combinationof all models used for the covariates and the linear terms (as displayed isequation 3.1). The models used in the fitting of the spatial and the spatio-temporal INLA models were: the Besag, York, and Molie (BYM) model [2],an independent random noise (iid) model, and a first order random walk(rw1) [11]. A list of all the latent models available for R-INLA can be foundat www.r-inla.org/models/latent-models.The BYM model is simply a summation of two models, namely, theBesag model [2] and the iid model. Each of these two models are used toaccount for the fixed and random e↵ects of spatial interaction, respectively.The distribution for the Besag model is given by:zi|zj , i 6= j, ⌧ ⇠ N 0@ 1ni Xi⇠jzj , 1ni⌧1A . (4.1)Here, zi, represents the index of ecodistrict i, and follows a normal dis-tribution with mean 1niXi⇠jzj and variance 1ni⌧. The number of neighboursof ecodistrict i is represented by ni and i ⇠ j indicates that ecodistricts iand j are neighbours [2]. The precision parameter, ⌧ , is given a log gammaprior distribution, as follows:244.2. Imputing Missing Covariate Values⇡(⌧) = ln(⌧)↵1⌧↵(↵) exp( ln(⌧)/), (4.2)where ↵ and are shape parameters taking on values greater than zero.The density function for the iid model is given by:⇡(z|⌧) = nYi=11p2⇡psi⌧ exp✓12si⌧z2i◆ . (4.3)Here, z represents the vector of indices for all ecodistricts, si is a scalingconstant (the default which is used being equal to one), and zi and ⌧ aredefined in equation (4.1).For a Gaussian vector, x = (x1, ..., xn), the first-order random walk isspecified by [11]:xi = xi xi+1 ⇠ N (0, ⌧1). (4.4).4.2 Imputing Missing Covariate ValuesMissing data for ecodistricts i and years j must be imputed. For ease ofreadability, we defined the covariates whose values need to be imputed as: Z1 = Growing Season Maximum Temperature Z2 = Growing Season Minimum Temperature Z3 = Growing Season Water Deficit Z4 = Growing Season Growing Degree Days Z5 = Growing Season Total PrecipitationSpatial imputing and spatio-temporal imputing were modelled in di↵erentforms, with a temporal component being added to the base spatial imputing254.3. Spatial Modelmodel when the aspect of time was introduced. For the spatial only modelthe covariates were imputed using the following models:ZIi = ↵ + i + ⌫i, I = 1, 2, (4.5a)⌘5i = pZ5i = ↵ + i + ⌫i, (4.5b)where, ⌘5i is the link function as described at the beginning of chapter 3,↵ represents the intercept, represents the spatially structured component,and ⌫ represents the random spatial component (each modelled as describedin 2.2).Continuing, the covariates for the spatio-temporal models were imputedusing the following models:ZIij = ↵ + i + ⌫i + j + j , I = 1, 2, 3, 4, (4.6a)⌘5ij = pZ5ij = ↵ + i + ⌫i + j + j , (4.6b)where, j represents the temporally structured component, j representsthe random temporal component and all other components are as definedas above (each modelled as described in 2.2).4.3 Spatial Model SPATIALThe structure of the spatial only model is as follows:yi = ↵ + i + ⌫i + f1(Z1i, iid) + f2(Z2i, iid) + f2(Z3i, rw1) (4.7)where yi represents the mean wheat yield for ecodistrict i, ↵ representsthe intercept, represents the spatially structured component, ⌫ representsthe random spatial component, and the f(Z,M)s represent the model Mfor which the covariate Z is fit [3]. The models are as described in section264.4. Spatio-Temporal Models4.1.4.4 Spatio-Temporal ModelsWhen examining a spatio-temporal structure, there were three base mod-els considered. For each of these three bases, five di↵erent covariate con-figurations were tested on each base. Thus, bringing the total number ofmodels considered to 15. The model selection process involved selectingmodels using a subset of the following covariates: maximum temperature,minimum temperature, total precipitation, water deficit, and growing de-gree days. The base structure of each model is described below.Model 2yij = ↵ + i + ⌫i + ( + i) · j+ covariate functions, (4.8)where yij represents the mean wheat yield for ecodistrict i in the year j, ↵represents the intercept, represents the spatially structured component,⌫ represents the random spatial component, represents the linear globaltime e↵ect, represents the interaction between time and space, and j is avariable for year [3].The spatial components were modelled using a Besag, York, and Molie(BYM) model. Which, as explained previously, separates the structuredand random e↵ects of space into the Besag and iid models, respectively.The model component i was obtained by modelling the spatial componentas a independent random noise (iid) using the years as weights. Finally, was obtained by simply introducing the temporal component as a linearcovariate.The third model includes a spatially structured and spatially randomcomponent as in model 2. However, instead of introducing the temporalcomponent linearly, the temporal component is modelled in a similar matter274.4. Spatio-Temporal Modelsto the spatial component. That is, both a temporally structured and atemporally random component are introduced. This configuration assumesan independence between the spatial and temporal components. This modelis specified as follows: Model 3yij = ↵ + i + ⌫i + j + j+ covariate functions, (4.9)here j represents the temporally structured component, j represents therandom temporal component and all other variables are as defined in equa-tion (4.8) [3].The spatial components were modelled in the same matter as the modelspecified by equation (4.8). The temporally structured component j wasmodelled using a first order random walk (rw1) while incorporating a neigh-bourhood structure. The temporally random component j was modelledsimply as an iid.The first model has the same structure as mentioned in the third modelexcept it also incorporates a term to take into account the interaction be-tween time and space. This model can be seen below:Model 1yij = ↵ + i + ⌫i + j + j + ij+ covariate functions, (4.10)here ij represents the interaction term between space and time. This inter-action term assumes that the two random e↵ects on space and time (⌫i andj) interact. This assumption also specifies neither a spatial nor temporalstructure on the interaction term. i.e. ij is distributed normally with amean of 0 and variance ⌧ (ij ⇠ N (0, ⌧)). All other variables are definedas in equation (4.9) [3].284.4. Spatio-Temporal ModelsThe interaction term it was modelled using an iid model. This was doneby specifying an index number for each observation in the whole data set(552 total representing data for 6 years from 92 ecodistricts) to be inputtedinto the iid model. All other variables were modelled as described in themodel defined in equation (4.9).When being modelled, the various covariates were fit using the followingmodels:• Maximum Temperature - Independent Random Noise• Minimum Temperature - Independent Random Noise• Total Precipitation - First Order Random Walk• Water Deficit - First Order Random Walk• Growing Degree Days - Independent Random NoiseIn table 4.1 on the following page, the covariates used in each configu-ration are displayed. Afterwards, three configurations were attempted buteither achieved much worst results or failed to work at all. These configu-rations are referenced by NOT USED under the configuration column.The covariates measuring aspects of temperature (i.e. maximum tem-perature, minimum temperature, and growing degree days) were modelledusing an independent random noise model because it was hypothesized thattemperature follows a fairly random trend throughout both time and space.Therefore, using this specification, it is assumed that the temperature inone ecodistrict is e↵ected very minimally by the temperature of the ecodis-tricts surrounding it in the current year and the temperature of that sameecodistrict in past years.Covariates measuring aspects of precipitation (i.e. total precipitationand water deficit) were modelled using a first order random walk as it wasdeemed fit that precipitation follows a fairly random trend in terms of start-ing and stopping. However, due to random weather patterns throughoutspace it was appropriate to capture these patterns within the random walkstructure. The structure of the random walk also takes into account random294.4. Spatio-Temporal Modelstrends in precipitation throughout time. For example, if total precipitationincreased for multiple consecutive years within a specific ecodistrict then thetotal precipitation for the ecodistrict in the following year would be given ahigher probability to be modelled as increasing further and the same goes fora steadily declining total precipitation within a given ecodistrict throughouttime.Table 4.1: Covariate Configurations.Configuration Covariates UsedI Maximum Temperature, Minimum Temperature, TotalPrecipitationII Maximum Temperature, Minimum Temperature, TotalPrecipitation, Water DeficitIII Maximum Temperature, Minimum Temperature, TotalPrecipitation, Growing Degree DaysIV Maximum Temperature, Minimum Temperature, TotalPrecipitation, Water Deficit, Growing Degree DaysV Maximum Temperature, Minimum Temperature, WaterDeficitNOT USEDWater Deficit, Growing Degree DaysNOT USEDTotal Precipitation, Growing Degree DaysNOT USED Maximum Temperature, Minimum Temperature, WaterDeficit, Growing Degree Days30Chapter 5Model SelectionModel selection was performed via the use of two criterion: the de-viance information criterion (DIC) [15] and a k-fold cross validated rootmean squared error (CVRMSE).The deviance information criterion is a commonly used criterion in Bayesianmodel selection. The general rule for determining which model contains abetter fit is to choose the model with the smallest DIC [15]. For each of themodels fit, the DIC was calculated using all available data.The DIC is calculated as follows:DIC = pD + D¯. (5.1)Here, pD is a measure of the number of e↵ective parameters in the modeland penalizes models with more complexity (i.e. a larger number of param-eters) and is defined as: pD = D¯ D(✓¯). (5.2)D(✓) is known as the deviance and is defined as:D(✓) = 2 log(p(y|✓)) + C, (5.3)where y are the data, ✓ are the unknown parameters in the model, p(y|✓)is the likelihood and C is a constant with respect to ✓ which always cancelsout when comparing models and therefore does not need to be defined.Furthermore, ✓¯ is the expectation of ✓ and D¯ is the expectation of D(✓)with respect to ✓. Therefore, (5.2) can be viewed as the mean devianceminus the deviance of the means. D¯ will always decrease as the number of31Chapter 5. Model Selectionthe parameters in a model increases, which is why pD is used to compensatefor this and favours models with fewer parameters [15]. The method inwhich the DIC is calculated remains unchanged between the spatial andspatio-temporal models and is included as an output within the R-INLApackage for each model.Even though DIC is a measure of deviance, it is possible to obtain neg-ative values for this criterion. This is true because the probability densityp(y|✓) can be greater than one if it has a very small standard deviation.In this case, the log of the density would be positive thus making equation5.3 negative [15]. Due to this, it is the di↵erence in DIC values betweenthe models that we look at and not the absolute value of DIC for eachmodel. Thus, implying the lowest DIC value regardless of sign defines thebest fitting model.Cross validation was calculated di↵erently for the spatial only and thespatio-temporal models.For the spatial only model, cross validation was performed by randomlysampling 10% of the ecodistricts that contained wheat yield data. Thissample was removed from the data frame and the model was fit on theremaining data. The predicted values for the sample of ecodistricts werecompared to the observed data by calculating the mean squared error. Thisprocess was repeated 5000 times and the k-fold CVRMSE was calculated asthe determining factor of model fit.Mean squared error (MSE) is defined to be:MSEspatial = nXi=1(yi yˆi)2n , (5.4)where yi are the observed mean wheat yield data, yˆi are the predictedmean wheat yield data and n is the size of the 10% sample. The CVRMSEis then calculated as:32Chapter 5. Model SelectionCV RMSEspatial = vuuuut5000Xj=1 MSEj5000 , (5.5)where MSEj refers to the MSE of the jth sample.For the spatio-temporal model, instead of sampling and removing part ofthe data frame, each year was used to partition the data. When calculatingMSE, either one or multiple years were removed and predicted for duringeach of the terms in the calculation. Each possible combination of yearswere removed during each calculation. Therefore, equations (5.4) and (5.5)update to: MSEtemporal = mXj=1 n2Xi=1(yij yˆij)2n2 , (5.6)CV RMSEtemporalvuuuut KXk=1MSEkK . (5.7)Where n2 92 are the number of observed mean wheat yield data pointsin the year j, m are the number of years removed during each iteration, andK= 6m(where 6 is the number of years with obtained data).When performing the cross-validation, there were instances in whichcertain years were removed. INLA was unable to predict the wheat yieldfor those missing years. The spacing of the available data in the adjacentyears to the year in question caused issues for the INLA method in certainsituations. When there are large clumps of ecodistricts with missing datathroughout the map, INLA has a dicult time predicting the wheat yieldfor these ecodistricts. Typically when this occurred, it was in an iterationwhere it was particularly dicult for the models to predict accurately. Theseiterations were removed from all the models in order to avoid biasing the33Chapter 5. Model Selectionvalidation results.When calculating the CVRMSE for both the spatial only and spatio-temporal models, inconceivable outliers for MSE were produced in bothcertain iterations of the spatial only model and for certain year combina-tions in the spatio-temporal models. Outliers of this sort were removedusing the rule-of-thumb outlier rule when plotting the MSE values in a boxplot for each of the respective models. That is, any outliers outside of therange [Q1-(1.5)IQR, Q3+(1.5)IQR] of the MSE data for each of the modelswere removed. Where Q1 is the first quartile, Q3 is the third quartile, andIQR(interquartile range)=Q3-Q1.In table 5.1 to follow, Model refers to the base model structure as definedin equations (4.10), (4.8), and (4.9) respectively. The numbers I, II, III, IV,and V refer to the respective covariate configuration as defined in table 4.1.DIC refers to the respective model-configuration pairing’s DIC value, andCVRMSE1, CVRMSE2, and CVRMSE3 refer to the cross-validation valuescalculated by removing all subsets of 1, 2, and 3 years respectively from eachmodel-configuration pairing.34Chapter 5. Model SelectionTable 5.1: Deviance information criterion (DIC) and cross validated rootmean squared error (CVRMSE) values for the various models fitted. Smallervalues of DIC and CVRMSE indicate a better model fit so Model 2 underconfiguration III had the best DIC value and Model 1 under configurationII had the best overall CVRMSE value.Model I II IIIModel 1-DIC -2292.42 -1698.69 -1798.81Model 2-DIC 855.53 64.94 -2708.32Model 3-DIC -2040.65 -1688.83 -2054.89Model 1-CVRMSE1 40.55 37.09 43.24Model 2-CVRMSE1 72.96 55.50 69.01Model 3-CVRMSE1 51.88 51.36 38.25Model 1-CVRMSE2 56.90 63.25 56.37Model 2-CVRMSE2 78.69 115.29 79.20Model 3-CVRMSE2 58.76 60.48 58.12Model 1-CVRMSE3 60.51 65.79 60.88Model 2-CVRMSE3 143.72 99.87 162.94Model 3-CVRMSE3 62.10 59.15 60.53Model IV VModel 1-DIC 3786.07 -2506.65Model 2-DIC -2243.12 4381.46Model 3-DIC 3884.25 4360.94Model 1-CVRMSE1 44.39 61.26Model 2-CVRMSE1 142.51 61.65Model 3-CVRMSE1 50.08 60.80Model 1-CVRMSE2 55.35 55.91Model 2-CVRMSE2 107.16 75.77Model 3-CVRMSE2 65.95 58.13Model 1-CVRMSE3 62.10 55.21Model 2-CVRMSE3 112.03 188.72Model 3-CVRMSE3 63.94 56.5235Chapter 5. Model SelectionTo determine if there were redundant covariates (i.e. including all redun-dant covariates would achieve no further gains in accuracy), correlation val-ues were calculated between covariates measuring similar properties. Totalprecipitation and water deficit essentially are measuring the same quantity.The correlation coecient between total precipitation and water deficit was0.8366. This suggests a high collinearity between the two covariates thatcannot be ignored. Collinearity between explanatory variables o↵ers sev-eral undesirable consequences under a regression context [5]. Furthermore,since maximum temperature, minimum temperature, and growing degreedays are all measuring certain aspects of temperature, their correlation val-ues amongst the covariates were also examined. The correlations betweenmaximum temperature, minimum temperature, and growing degree days aresummarized in table 5.2 below.Table 5.2: Correlation matrix for maximum temperature, minimum temper-ature, and growing degree days. Small values indicate no potential problemswith collinearity.Max Temp Min Temp GDDMax Temp 1 0.6434 0.5748Min Temp 0.6434 1 0.4054GDD 0.5748 0.4054 1After observing the values in table 5.2, there does not appear to beany troublesome combination of variables that need to be examined fur-ther. However, the high correlation value between the total precipitationand water deficit covariates must be addressed. Since total precipitationand water deficit are highly positively correlated and both these covariatescontribute positively towards wheat growth, there is a high potential thattheir collinearity will cause a problem in the INLA regression model. Itbecomes dicult to distinguish the e↵ect each covariate has individually onthe response variable (mean wheat growth), meaning that the respectivecontributions of the covariates could overlap. Hence, leading to skewed re-sults by under or overestimating mean wheat growth in certain ecodistricts.36Chapter 5. Model SelectionIt is unwise to include both total precipitation and water deficit within thesame model. Therefore, configurations II and IV in table 5.1 will no longerbe considered. Thus, the following configurations in table 5.3 below remain.37Chapter 5. Model SelectionTable 5.3: Deviance information criterion (DIC) and cross validated rootmean squared error (CVRMSE) values for a subset of models fitted. Model2 under configuration III had the best overall DIC value and model 1 underconfiguration I had the best overall CVRMSE value.Model I III VModel 1-DIC -2292.42 -1798.81 -2506.65Model 2-DIC 855.53 -2708.32 4381.46Model 3-DIC -2040.65 -2054.89 4360.94Model 1-CVRMSE1 40.55 43.24 61.26Model 2-CVRMSE1 72.96 69.01 61.65Model 3-CVRMSE1 51.88 38.25 60.80Model 1-CVRMSE2 56.90 56.37 55.91Model 2-CVRMSE2 78.69 79.20 75.77Model 3-CVRMSE2 58.76 58.12 58.13Model 1-CVRMSE3 60.51 60.88 55.21Model 2-CVRMSE3 143.72 162.94 188.72Model 3-CVRMSE3 62.10 60.53 56.52Comparing the three remaining configurations in 5.3, it can be seen thatconfiguration V performs the worst. The only clear stand out position whereconfiguration V was better was in the DIC of model 1. However, due to thesignificantly better (lower) CVRMSE1 values in configurations I and III,this configuration is outdone. Also note that the CVRMSE1 value holdsmuch more precedence over both the CVRMSE2 and CVRMSE3 values asit becomes much harder for each of the models to predict as more years areremoved. Other than the DIC of model 1, there are no other significantplaces where configuration V is strictly better. Due to this, configuration Vwill no longer be considered.Comparing the remaining two configurations I and III, model 1 is betterunder configuration I, model 2 is better under configuration III, and model3 is strictly better under configuration III. To expand, better is defined ashaving the majority of lower DIC, CVRMSE1, CVRMSE2, and CVRMSE3values under one configuration over the other and strictly better is defined ashaving all lower values in the respectively categories under a given configura-38Chapter 5. Model Selectiontion. While model 2 does have the lowest DIC value under configuration IIIwhen compared to the best fits of model 1 under configuration I and model3 under configuration III, model 2 has a significantly worse CVRMSE underall categories than the other best model fits. Therefore, the final two mod-els to be considered are model 1 under configuration I and model 3 underconfiguration III.Comparing these two fits it can been seen that model 1 under configura-tion I has a slightly better DIC, CVRMSE2, and CVRMSE3 values, whilemodel 3 under configuration III has a slightly better CVRMSE1 value. Tofurther decide which model is to be chosen, it is seen that model 1 underconfiguration I has three covariates included in the model (Maximum Tem-perature, Minimum Temperature, and Total Precipitation) where model 3under configuration III has four covariates included in the model (MaximumTemperature, Minimum Temperature, Total Precipitation, and Growing De-gree Days). Therefore, if the models were extremely similar in terms of fit,judging solely on complexity, model 1 under configuration I is favoured asit includes less covariates and is thus a less complex model. Furthermore,the DIC (as described in the beginning of this chapter) is a value createdspecifically for Bayesian models and is included as part of the R-INLA pack-age for comparing di↵erent model fits and will hence be favoured in termsof model selection judgement over CVRMSE when the di↵erences betweentwo models are not drastically di↵erent in either category. Therefore, model1 under configuration I is to be the chosen model to fit.To further investigate the chosen model fit, a di↵erent method of im-puting the missing data was attempted. As described in section 2.2, thismethod was averaging all the collected data from each appropriate covariaterather than using INLA to impute the values. These results are displayedin table 5.4 on the following page.The column titled “Model” refers to the model’s reference name. HereSPATIAL and Model 1 are self explanatory by their reference equationnumbers in the following column and AVG is the same model structureas Model 1 except using the averaged data to impute missing data ratherthan using the INLA model to impute missing data. Eqn is referring to the39Chapter 5. Model Selectionequation in which that model structure is defined in chapter 4. CVRMSE,CVRMSE2, and CVRMSE3 refer to the cross-validation values calculatedby removing all subsets of 1, 2, and 3 years respectively. The SDE1, SDE2,and SDE3 values were calculated as a standardized measure in order to givethe CVRMSE values for 1, 2, and 3 years more context. First, a dummyvalue was calculated by simply predicting the wheat yield for each ecodistrictusing the mean of the wheat yields from the entire data set. Meaning thatwhen a year was removed, every value was filled in using the mean. Fromthis, a CV RMSE was calculated using the same method as described inequation (5.7) for both the Model 1 and AVG models. Finally, the SDEvalues were calculated as the ratio of the CVRMSE from each model overthe respective dummy value. Given this specification, an SDE value of lessthan one indicates that the given model is better at predicting than thedummy model which just predicts using the overall mean. Dummy valueswere calculated as 59.28, 59.26, and 59.25 when one, two, and three yearswere removed when calculating CVRMSE respectively.Table 5.4: Model validation for spatial only, INLA imputed missing values,and average imputed missing values. The model using the INLA imputedmissing values displayed the best results in terms of both DIC and CVRMSE.Model: SPATIAL Model 1 AVGEqn (4.7) (4.10) (4.10)DIC -372.17 -2292.42 122.81CVRMSE1 71.16* 40.55 44.02SDE1 NA 0.68 0.74CVRMSE2 NA 56.90 56.56SDE2 NA 0.96 0.95CVRMSE3 NA 60.51 62.91SDE3 NA 1.02 1.06Model 1 had a significantly better DIC compared to both the spatialonly model and the AVG model. The CVRMSE for the spatial only modelwas significantly worse than both the other models and is even worse thanany of the temporal models dummy value CVRMSE values. The spatial only40Chapter 5. Model Selectionmodel performed worse than the temporal models in terms of model predic-tion capabilities. When examining the CVRMSE values between Model 1and AVG, they compare fairly similarly in magnitude with Model 1 havingthe slight edge. It should also be stated that when removing one year at atime, both models predict significantly better than the dummy value usingonly the mean. This is seen by SDE1 having a value lower than 1. However,when two and three years are removed, both models tend to predict veryclose to the mean value (indicated by SDE2 and SDE3 values close to 1).Hence, caution should be taken when trying to predict for data with two ormore missing years.With Model 1 emerging as the best model in terms of both modelselection criterion, it can be seen that imputing missing values using INLAis a better method than computing the missing values using the average ofthe entire data set. Model 1 will be the final model selected from which todisplay results from in chapter 6 to follow.41Chapter 6Results: RegionalDistribution of Wheat YieldThe plots displayed in figures 6.1 and 6.2 respectively represent the per-centiles for mean wheat yield when examining the spatial and temporalcomponents of the model separately.Figure 6.1 shows the spatial aspect of the model. The two subsequentplots represent the 25th and 75th spatial percentiles, respectively. These val-ues were calculated by taking the 25th and 75th percentile of the six availablefitted values (due to the six available years of data) for each ecodistrict. Thisplot reveals the highest density of wheat yield observed throughout the cen-tral belt of Alberta, with relatively high yields in the southeast and loweryields throughout the northwest portion of the province. The plot also dis-plays the significant variation in wheat yield over neighbouring ecosdistrictsin addition to considerable variation in relation to the expected yield.Figure 6.2 shows the temporal aspect of the model. Highlighted is thetemporal variability of wheat yield over the years of model fitted values. Theblue line represents the mean predicted wheat yield over all ecodistricts inthe respective year. The lower and upper portions of the grey band repre-sent the 25th and 75th percentiles of the wheat yield over all ecodistricts inthe given year, respectively. This plot reveals the greatest mean wheat yieldoccurring in the boundary years of 1999 and 2004, with decreasing meanwheat yield when proceeding towards the middle years finally resulting inthe minimum trough year of 2002.42Chapter6.Results:RegionalDistributionofWheatYield25th Spatial Percentile 75th Spatial Percentile[0,50](50,100](100,150](150,200](200,250](250,300]Figure 6.1: 25th and 75th Spatial Percentiles for Fitted Mean Wheat Yield over Alberta. Highest wheat yielddensity is observed throughout the central belt of Alberta. Significant variation of wheat yield is observed overneighbouring ecosdistricts.43Chapter 6. Results: Regional Distribution of Wheat Yield0501001502002501999 2000 2001 2002 2003 2004YearWheat Yield25%-75% Credible Intervals for Temporal Wheat YieldFigure 6.2: 25th and 75th temporal percentiles for fitted mean wheat yieldover Alberta covering the years 1999-2004. Highlighted is the temporalvariability. The blue line represents the mean predicted wheat yield withthe lower and upper portions of the grey band representing the 25th and75th percentiles, respectively. The greatest mean wheat yield is observed in1999 and 2004, with a minimum trough year of 2002.44Chapter 6. Results: Regional Distribution of Wheat YieldFigure 6.3 considers only the spatial aspect of the model. The numbersshown represent the probability that the wheat yield in the given ecodistrictwill be above the mean wheat yield taken over all ecodistricts for whichobserved data were available. This plot is consistent with the discussionhighlighted for the plot portrayed in 6.1 in that the highest probabilitiesare observed throughout the central belt of Alberta, with lower probabilitiesobserved when moving to the northern and southern portions of the province.In addition, we again see large variation between neighbouring ecodistricts.Figures 6.4, 6.5, and 6.6 all represent the mean wheat yield predictedfor the year 2004. For each plot, the predicted values are displayed in thesegment to the left, where as the observed values are displayed in the segmentto the right. Figure 6.4 represents the prediction when the wheat yield datafor 2004 had been removed (i.e. one year in advance), figure 6.5 for whenthe wheat yield data for 2003 and 2004 had been removed (i.e. two yearsin advance), and finally figure 6.6 for when the wheat yield data for 2002,2003, and 2004 had been removed (i.e. three years in advance). Note thatan “NA” value in the right column represents that wheat yield data for thatecodistrict were missing in the year 2004. Consistent with the numbers forSDE in table 5.4, we can see from the subsequent plots that as we predictfor more years in advance the predictions tend to converge more and moreto the mean of the remaining overall data.Finally, 6.7 represents the mean wheat yield forecasted by the modelusing all six years of available data for one year ahead of known data (2005).This plot is meant as an example to demonstrate the prediction capabilitiesof the model. Due to no data being available in 2005, there is nothing tocompare this result to.45Chapter 6. Results: Regional Distribution of Wheat Yield[0,0.2](0.2,0.4](0.4,0.6](0.6,0.8](0.8,1]Figure 6.3: Probability of ecodistrict wheat yield in Alberta exceeding over-all mean threshold. Consistencies with the spatial variability in 6.1 aredisplayed. With highest probabilities observed throughout the central beltof Alberta and large variation between neighbouring ecodistricts.46Chapter6.Results:RegionalDistributionofWheatYieldPredicted Mean Wheat Yield Observed Mean Wheat YieldNA[0,50](50,100](100,150](150,200](200,250](250,300](300,350]Figure 6.4: Mean wheat yield predictions over Alberta for the year 2004 - one year in advance. Predicted valuesdisplayed on the left are fitted using the data from 1999-2003. White ecodistricts within the observed data on theright indicate missing data for the year 2004.47Chapter6.Results:RegionalDistributionofWheatYieldPredicted Mean Wheat Yield Observed Mean Wheat YieldNA[0,50](50,100](100,150](150,200](200,250](250,300](300,350]Figure 6.5: Mean Wheat Yield Predictions over Alberta for the year 2004-two years in advance. Predicted valuesdisplayed on the left are fitted using the data from 1999-2002. White ecodistricts within the observed data on theright indicate missing data for the year 2004.48Chapter6.Results:RegionalDistributionofWheatYieldPredicted Mean Wheat Yield Observed Mean Wheat YieldNA[0,50](50,100](100,150](150,200](200,250](250,300](300,350]Figure 6.6: Mean Wheat Yield Predictions over Alberta for the year 2004-three years in advance. Predicted valuesdisplayed on the left are fitted using the data from 1999-2001. White ecodistricts within the observed data on theright indicate missing data for the year 2004.49Chapter 6. Results: Regional Distribution of Wheat YieldPredicted Mean Wheat Yield For 2005NA[0,50](50,100](100,150](150,200](200,250](250,300](300,350]Figure 6.7: Mean Wheat Yield Predictions over Alberta for the year 2005-one year ahead of known data. Predicted values are fitted using all six yearsof available data from 1999-2004.50Chapter 7ConclusionThe main objective of the research and analysis was to develop a compu-tationally cheaper and accurate alternative method for modelling crop yielddi↵erent from the traditional MCMC algorithms for estimating model pa-rameters. Data for wheat yield in Alberta was used to test and validate thisobjective. This was accomplished using a spatio-temporal Bayesian modelincorporating a neighbourhood structure via the use of the INLA method.Using the Laplace approximation and exploiting the Gaussian Markov Ran-dom Field properties of the ecodistrict neighbourhood structure, the INLAmethod allowed for a computationally reasonable algorithm for obtainingthe posterior marginals of the model parameters.Crop yield and climate data provided by AAFC was assimilated, manip-ulated, and interpolated. This was because yield data were provided in theform of one data point per year over the growing season at the ecodistrictlevel, this meant that data collected from the multiple climate stations ineach ecodistrict had to be aggregated to the ecodistrict level. Data were pro-vided from each climate station for each day throughout the year meaningthat a single value for each covariate had to be obtained over the growingseason for each year. Finally, some ecodistricts contained no data for wheatyield altogether (either because no wheat grows within that ecodistrict orthe data were just missing) and hence data taken from the climate stationswithin these ecodistricts had to be removed and the shape file of Albertathat was created for use in R-INLA had to be adapted to exclude theseecodistricts.Missing data in the covariates (minimum temperature, maximum tem-perature, water deficit, GDD, and total precipitation) was also resolved.This was a concern due to the fact that the INLA method can deal with51Chapter 7. Conclusionmissing data in the response variable, however it is unable to handle missingdata in the covariates. Three methods were considered to fill in the missingdata. These included taking the average of the four nearest neighbours toeach ecodistrict with missing data, taking the average of the entire dataset for each covariate, and using INLA models treating the covariates asresponse variables and then using the fitted values from these models toreplace the missing data. The nearest neighbour method was dismissed asit was found that the missing data tended to appear in blocks and in someoccasions no average of neighbouring data could be computed. From theremaining two methods, the prediction of the missing data using the INLAmethod performed better than the averaging method in terms of model se-lection criterion and was hence selected as best.Using the Deviance Information Criterion and a K-fold cross validatedroot mean squared error as the two model selection criterion it was de-duced that the set of covariates most suited for explaining the properties ofwheat growth in Alberta while also paying attention to model complexitywere maximum temperature, minimum temperature and total precipitation.Furthermore, the model structure which was chosen involved including fourseparate components for structured and random spatial and temporal e↵ectsas well as an interaction e↵ect between space and time. This structure waschosen opposed to two other model structures where the first excluded theinteraction e↵ect, and the second included a linear time e↵ect and a spacetime interaction e↵ect instead of including separate temporal e↵ects.The INLA method performed quite well in hind casting for a singlepreviously removed year or forecasting for one year in advance. However,as more years are removed, the INLA model tends to predict closer andcloser to the mean of the remaining data. For example, when three yearsare removed in figure 6.6, the model only has three years of data left tobuild from and the predictions will all be very close to the mean of thewheat yield from the years 1999 through 2001 (which is 203.56). Confidencecan be placed in the one year predictions and caution should be taken whenviewing predictions for multiple years.In future work, a model which allowed for in season forecasting could52Chapter 7. Conclusionprove extremely useful. If wheat yield and covariate data were provided ona monthly basis, the INLA model could take on some added complexity byfurther separating the time component into monthly fractions. For example,in July this would allow one to predict the midseason forecast for wheat yieldin August while using the already collected data from May and June insteadof only allowing for a one year in advance prediction. Additional economicindicators relating to crop prices and distribution can also be included ascovariates.53Bibliography[1] Allen, R. G. (1994), “Economic forecasting in agriculture,” Interna-tional Journal of Forecasting, 10, 81–135. ! pages 1[2] Besag, J., York, J., and Mollie´, A. (1991), “Bayesian image restoration,with two applications in spatial statistics,” Annals of the Institute ofStatistical Mathematics, 43, 1–20. ! pages 9, 24[3] Blangiardo, M., Cameletti, M., Baio, G., and Rue, H. (2012), “Spa-tial and Spatio-Temporal models with R-INLA,” Spatial and Spatio-Temporal Epidemiology, 4, 33–49. ! pages 2, 3, 11, 12, 13, 15, 26, 27,28[4] Carew, R., Smith, E. G., and Grant, C. (2009), “Factors InfluencingWheat Yield and Variability,” Journal of Agricultural and Applied Eco-nomics, 41, 625–639. ! pages 2[5] Dobson, A. J. . and Barnett, A. G. . (2008), An Introduction to Gen-eralized Linear Models, Boca Raton, Florida: Chapman & Hall / CRC.! pages 36[6] Fahrmeir, L. and Tutz, G. (2001), Multivariate Statistical ModellingBased on Generalized Linear Models, Berlin: Springer. ! pages 16[7] Lewis, A. S. and Overton, M. L. (2012), “Nonsmooth optimizationvia quasi-Newton methods,” Springer and Mathematical OptimizationSociety, 141, 135–163. ! pages 17[8] Michel, L. and Makowski, D. . (2013), “Comparison of Statistical Mod-els for Analyzing Wheat Yield Time Series,” PLoS ONE. ! pages254Bibliography[9] Newlands, N. K. and Zamar, D. S. (2012), “In-season probabilistic cropyield forecasting, integrating agro-climate, remote-sensing and cropphenology data,” 2012 Joint Statistical Meetings (2012 JSM). ! pages2[10] Newlands, N. K., Zamar, D. S., Kouadio, L. ., Zhang, Y. ., Chipan-shi, A. C. ., Potgieter, A. ., Toure, S. ., and Hill, H. S. . (2014),“An integrated, probabilistic model for improved seasonal forecastingof agricultural crop yield under environmental uncertainty,” Frontiersin Environmental Science. ! pages 2[11] Pearson, K. (1905), “The Problem of the Random Walk,” Nature (Lon-don), 72, 318. ! pages 9, 24, 25[12] Prost, L., Makowski, D. ., and Jeu↵roy, M. H. . (2008), “Comparison ofstepwise selection and Bayesian model averaging for yield gap analysis,”Ecological Modelling, 219, 66–76. ! pages 2[13] Rue, H. and Held, L. (2005), Gaussian Markov Random Fields, BocaRaton, Florida: Chapman & Hall / CRC. ! pages 3, 12, 16[14] Rue, H., Martino, S., and Chopin, N. (2009), “Approximate Bayesianinference for latent Gaussian models by using integrated nested Laplaceapproximations,” Journal of the Royal Statistical Society, 71, 319–392.! pages vii, 15, 16, 17, 18, 19, 21, 22[15] Spiegelhalter, D. J., Best, N. G., Carlin, P., and van der Linde, A.(2002), “Bayesian Measures of Model Complexity and Fit,” Journalof the Royal Statistical Society. Series B (Statistical Methodology), 64,583–639. ! pages 31, 32[16] Taylor, B. M. and Diggle, P. D. (2012), “INLA or MCMC? A Tutorialand Comparative Evaluation for Spatial Prediction in log-Gaussian CoxProcesses,” eprint arXiv:1202.1738. ! pages 355
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- An approximate spatio-temporal Bayesian model for Alberta...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
An approximate spatio-temporal Bayesian model for Alberta wheat yield Popoff, Evan 2014
pdf
Page Metadata
Item Metadata
Title | An approximate spatio-temporal Bayesian model for Alberta wheat yield |
Creator |
Popoff, Evan |
Publisher | University of British Columbia |
Date Issued | 2014 |
Description | Crop forecasting models are very valuable to a number of agricultural and government agencies. We investigated the effect of spatial and temporal environmental climate covariates on the growth of crop yield (wheat) at the regional scale across the province of Alberta. Model fitting was accomplished using data collected during the growing season from climate stations across Alberta provided by Agriculture and Agri-Food Canada (AAFC). A best fitting model was selected which takes into account simplicity (number of covariates used) and accuracy (predictive capability based on two selection criteria). There have been a number of Bayesian methods for predicting wheat yield. However, many of these methods typically involve extensive algorithms such as a Metropolis-Hastings Markov Chain Monte Carlo (MCMC) that adds substantial computational complexity and run-time. We investigated the application of a spatio-temporal Bayesian model entitled the Integrated Nested Laplace Approximation (INLA). This method offers a computationally cheaper alternative to the MCMC approach and is capable of handling large data requiring interpolation (data sparsity) with relative ease. By structuring the model to have a sparse precision matrix, INLA is able to simplify posterior marginal estimation of the parameters by incorporating the Laplace approximation. The INLA model demonstrated strong predictive capabilities when predicting for one year in advance or hind-casting for a it single previous year. However, when multiple years of data were removed or predictions were made for multiple years in advance, INLA struggled to make predictions which deviated considerably from the mean of the remaining data. Predictive performance in the best fitting model saw a 40% increase in root mean squared error (RMSE) when moving from one year to two and another 6% increase when moving from two to three years. We conclude that the INLA model structure offers valuable information when examining one year in advance but caution should be taken when attempting to forecast for multiple years in advance. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2014-06-20 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivs 2.5 Canada |
DOI | 10.14288/1.0074343 |
URI | http://hdl.handle.net/2429/47033 |
Degree |
Master of Science - MSc |
Program |
Interdisciplinary Studies |
Affiliation |
Graduate Studies, College of (Okanagan) |
Degree Grantor | University of British Columbia |
GraduationDate | 2014-09 |
Campus |
UBCO |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/2.5/ca/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2014_fall_popoff_evan.pdf [ 3.51MB ]
- Metadata
- JSON: 24-1.0074343.json
- JSON-LD: 24-1.0074343-ld.json
- RDF/XML (Pretty): 24-1.0074343-rdf.xml
- RDF/JSON: 24-1.0074343-rdf.json
- Turtle: 24-1.0074343-turtle.txt
- N-Triples: 24-1.0074343-rdf-ntriples.txt
- Original Record: 24-1.0074343-source.json
- Full Text
- 24-1.0074343-fulltext.txt
- Citation
- 24-1.0074343.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0074343/manifest