Bayesian Prediction and Inference in Analysis of Computer Experiments

by

Hao Chen

B.Sc., Renmin University of China, 2011

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The Faculty of Graduate Studies (Statistics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

August 2013

© Hao Chen 2013

Abstract

Gaussian Processes (GPs) are commonly used in the analysis of data from a computer experiment. Ideally, the analysis will provide accurate predictions with correct coverage probabilities of credible intervals. A Bayesian method can, in principle, capture all sources of uncertainty and hence give valid inference. Several implementations are available in the literature, differing in choice of priors, etc. In this thesis, we first review three popular Bayesian methods for the analysis of computer experiments. Two prediction criteria are proposed, measuring prediction accuracy and the actual coverage probability of prediction intervals. A simple example shows that the three Bayesian implementations perform quite differently. Motivated by this performance difference, we specify four important factors in the Bayesian analysis and assign levels to each factor based on the three existing implementations. Full factorial experiments are then conducted on the specified factors, both for real computer models and via simulation, with the aim of identifying the significant factors. Emphasis is placed on prediction accuracy, since the actual coverage probability is satisfactory for most factor combinations.

Through the analyses described above, we find that two of the four factors significantly affect prediction accuracy. The best combination of levels of the four factors is also identified.

Preface

This thesis is original, unpublished work by the author, Hao Chen.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication
1 Introduction
  1.1 The Structure of the Thesis
  1.2 Related Work: Three Key Ingredients
    1.2.1 Bayesian Modelling
    1.2.2 Markov Chain Monte Carlo
    1.2.3 Stationary Gaussian Processes
2 Review of Existing Methods
  2.1 Method of Maximum Likelihood Estimation
  2.2 The General Picture for Bayesian Estimation Methods
  2.3 Higdon's Bayesian Method
  2.4 Method of Gaussian Emulation Machine
  2.5 Method of Treed Gaussian Process
3 Tentative Comparison of the Bayesian Methods
  3.1 Two Prediction Criteria
  3.2 Details for the Tentative Comparison
    3.2.1 Data
    3.2.2 Comparison Details
  3.3 Comparison Results
    3.3.1 Prediction Accuracy
    3.3.2 Actual Coverage Probability
  3.4 Conclusion & Discussion
4 In-Depth Comparison: Real Computer Models
  4.1 Comparison Details
    4.1.1 Four Important Factors
    4.1.2 Full Factorial Design
    4.1.3 Some Mathematical Details
    4.1.4 Other Details for the In-Depth Comparison
  4.2 Real Computer Model 1: Borehole
    4.2.1 Prediction Accuracy
    4.2.2 Actual Coverage Probability
  4.3 Real Computer Model 2: PTW
    4.3.1 Prediction Accuracy
    4.3.2 Actual Coverage Probability
  4.4 Real Computer Model 3: G-protein
    4.4.1 Prediction Accuracy
    4.4.2 Actual Coverage Probability
  4.5 Summary of the Real Computer Models
5 In-Depth Comparison: Simulation Study
  5.1 Details for the Simulation Study
    5.1.1 Data Generating Procedures
    5.1.2 Choice of True θi
  5.2 Simulation Scenario 1
    5.2.1 Prediction Accuracy
    5.2.2 Actual Coverage Probability
  5.3 Simulation Scenario 2
    5.3.1 Prediction Accuracy
    5.3.2 Actual Coverage Probability
  5.4 Summary of the Simulation Study
6 Conclusion & Discussion
Bibliography
Appendices
A Posterior Distribution of θi for π(σ²) = IG(α1, α2)
B Posterior Distribution of θi for π(σ²) ∝ 1/σ²

List of Tables

3.1 Acceptance rates for different choices of MCMC length.
3.2 Average actual coverage probability, 90% true CP.
3.3 Average actual coverage probability, 95% true CP.
4.1 Full factorial design.
4.2 2³ factorial design, given any level of the prior on θi.
4.3 ANOVA results for the Borehole computer model.
4.4 ANOVA results for the PTW computer model.
4.5 ANOVA results for the G-protein computer model.
5.1 Actual coverage probability, 90% true CP, scenario 1.
5.2 Actual coverage probability, 95% true CP, scenario 1.
5.3 Actual coverage probability, 90% true CP, scenario 2.
5.4 Actual coverage probability, 95% true CP, scenario 2.

List of Figures

1.1 Trace plot for alpha, beta and tau.
3.1 Traceplot for 15000 total samples.
3.2 Traceplot for 30000 total samples.
3.3 Traceplot for 60000 total samples.
3.4 Empirical densities for 15000 total samples.
3.5 Empirical densities for 30000 total samples.
3.6 Empirical densities for 60000 total samples.
3.7 Results for prediction accuracy.
3.8 Results for actual coverage probability, 90% true CP.
3.9 Results for actual coverage probability, 95% true CP.
4.1 Density plot for Higdon's prior.
4.2 Density plot for GEM's prior.
4.3 Density plot for TGP's prior.
4.4 Prediction accuracy, given Higdon's prior on θi, Borehole.
4.5 Prediction accuracy, given GEM's prior on θi, Borehole.
4.6 Prediction accuracy, given TGP's prior on θi, Borehole.
4.7 Average prediction accuracy, given Higdon's prior on θi, Borehole.
4.8 Average prediction accuracy, given GEM's prior on θi, Borehole.
4.9 Average prediction accuracy, given TGP's prior on θi, Borehole.
4.10 Prediction actual coverage probability, given Higdon's prior on θi, 90% true CP, Borehole.
4.11 Prediction actual coverage probability, given GEM's prior on θi, 90% true CP, Borehole.
4.12 Prediction actual coverage probability, given TGP's prior on θi, 90% true CP, Borehole.
4.13 Prediction actual coverage probability, given Higdon's prior on θi, 95% true CP, Borehole.
4.14 Prediction actual coverage probability, given GEM's prior on θi, 95% true CP, Borehole.
4.15 Prediction actual coverage probability, given TGP's prior on θi, 95% true CP, Borehole.
4.16 Prediction accuracy, given Higdon's prior on θi, PTW.
4.17 Prediction accuracy, given GEM's prior on θi, PTW.
4.18 Prediction accuracy, given TGP's prior on θi, PTW.
4.19 Average prediction accuracy, given Higdon's prior on θi, PTW.
4.20 Average prediction accuracy, given GEM's prior on θi, PTW.
4.21 Average prediction accuracy, given TGP's prior on θi, PTW.
4.22 Prediction actual coverage probability, given Higdon's prior on θi, 90% true CP, PTW.
4.23 Prediction actual coverage probability, given GEM's prior on θi, 90% true CP, PTW.
4.24 Prediction actual coverage probability, given TGP's prior on θi, 90% true CP, PTW.
4.25 Prediction actual coverage probability, given Higdon's prior on θi, 95% true CP, PTW.
4.26 Prediction actual coverage probability, given GEM's prior on θi, 95% true CP, PTW.
4.27 Prediction actual coverage probability, given TGP's prior on θi, 95% true CP, PTW.
4.28 Prediction accuracy, given Higdon's prior on θi, G-protein.
4.29 Prediction accuracy, given GEM's prior on θi, G-protein.
4.30 Prediction accuracy, given TGP's prior on θi, G-protein.
4.31 Average prediction accuracy, given Higdon's prior on θi, G-protein.
4.32 Average prediction accuracy, given GEM's prior on θi, G-protein.
4.33 Average prediction accuracy, given TGP's prior on θi, G-protein.
4.34 Prediction actual coverage probability, given Higdon's prior on θi, 90% true CP, G-protein.
4.35 Prediction actual coverage probability, given GEM's prior on θi, 90% true CP, G-protein.
4.36 Prediction actual coverage probability, given TGP's prior on θi, 90% true CP, G-protein.
4.37 Prediction actual coverage probability, given Higdon's prior on θi, 95% true CP, G-protein.
4.38 Prediction actual coverage probability, given GEM's prior on θi, 95% true CP, G-protein.
4.39 Prediction actual coverage probability, given TGP's prior on θi, 95% true CP, G-protein.
5.1 Norm-RMSE for TGP's prior versus Higdon's prior, scenario 1.
5.2 Norm-RMSE for GEM's prior versus Higdon's prior, scenario 1.
5.3 Norm-RMSE for TGP's prior versus Higdon's prior, scenario 2.
5.4 Norm-RMSE for GEM's prior versus Higdon's prior, scenario 2.

Acknowledgements

First, I would like to acknowledge my special debts to my supervisors, Professor William Welch and Professor Jason Loeppky, who led me to the intriguing field of analysis of computer experiments and have always been strongly supportive of my research career. It is my great pleasure to work with them. I also want to express my heartfelt gratitude to Professor James Zidek, who financially supported me during the summer of 2012 and provided a great deal of guidance and help. In addition, I owe a debt of gratitude to Professor Jerome Sacks, who accepted me as his co-author on a high-quality journal paper and provided much support.

I would also like to express my gratitude to Professors Lang Wu, Jiahua Chen, Rollin Brant, Paul Gustafson, Matías Salibián-Barrera and Yew-Wei Lim for their constant support and excellent teaching. I am also grateful to Peggy Ng, Elaine Salameh, Zin Kurji and Andrea Sollberger for their hard work and kind help. Thanks to everyone for making the department such an amazing place.

Finally, I owe my special thanks to my parents for their support and understanding during my Master's study.

Dedication

To my parents: Mr. Qindong Chen & Mrs. Xiuying Zhao, who are so proud.

Chapter 1
Introduction

Many complex phenomena are extremely difficult to investigate through controlled physical experiments. Instead, computer experiments have become important alternatives for providing insight into such phenomena. Nowadays, computer experiments have been successfully applied in many science and engineering fields, for instance climate change, where traditional laboratory-based experiments are impossible to conduct. In general, a computer experiment is a designed set of runs of computer code, which usually has the following two distinguishing features: (1) deterministic, that is, repeating an identical input does not change the output;
(2) time-consuming, in that a single run may take several hours to complete.

Consider a typical computer model, described by Gramacy and Lee (2008) and Gramacy and Lee (2009). NASA was developing a new reusable rocket booster called the Langley Glide-Back Booster (LGBB). NASA built a computer model able to model the flight characteristics (lift, drag, pitch, side-force, yaw and roll) of the LGBB as a function of 3 inputs: side-slip angle, Mach number and angle of attack. For each input configuration triplet, the computer model yields the six response variables described above. However, even for one set of inputs, it takes the computer model 5-20 hours on a high-end workstation to solve the sets of inviscid Euler equations. A small modification of the input configuration means another 5-20 hours of run time. Hence, it is very helpful to approximate the output of the computer model with much less computation.

Suppose that we have n input configurations as well as the corresponding n outputs from a computer model. The primary goal in the analysis of computer experiments is to predict the output for untried inputs via a statistical model using the available data. The most popular method to emulate the computer code is to treat it as a realization of a Gaussian Process (GP) (Sacks et al. (1989)). Assuming the correlation parameters of the GP are known, the best linear unbiased predictor (BLUP) can be obtained by minimizing the Mean Squared Error (MSE) of the predictor. In practice, one must estimate the correlation parameters from the available data (n inputs and n outputs). Several estimation methods have been proposed so far and, in principle, they can be classified into two categories based on their underlying paradigms: the first is frequentist-based Maximum Likelihood Estimation (MLE) and its improved versions, such as Welch et al. (1992) and Ba and Joseph (2012); the second is the Bayesian-based estimation methods, such as the Treed GP proposed by Gramacy (2005) and the Bayesian method proposed by Higdon et al. (2004, 2008). In this thesis, we emphasize the Bayesian methods, but MLE is also included for comparison.

1.1 The Structure of the Thesis

This thesis is dedicated to foundational aspects of the analysis of computer experiments. In Chapter 1, some related statistical theory, such as Bayes' theorem, is introduced. Four popular parameter estimation methods are reviewed in Chapter 2, with emphasis on the three methods based on the Bayesian framework. In Chapter 3, we present the results of a tentative study whose purpose is to compare the prediction performance of the three Bayesian methods. We observe that, in terms of prediction accuracy, one of the Bayesian methods performs poorly, while the remaining two perform relatively better.

Curiosity has driven us to explore why these Bayesian methods have such different prediction performances. Since different methods have different parameterizations as well as different prior distributions, we specify 4 important factors that can account for the performance difference and fix the other possibly influential factors. Each factor is assigned different levels. Full factorial experiments are then carried out on the specified factors, both through real computer codes (in Chapter 4) and via simulated data (in Chapter 5). Several useful conclusions are drawn and the best combination of factor levels is found. Some concluding remarks are made in Chapter 6.

1.2 Related Work: Three Key Ingredients

In this section, we introduce three important ingredients for this recipe of comparing different Bayesian-based estimation methods. All three concepts are briefly outlined here; in some cases, in-depth analysis is postponed to later chapters.
Readers who are already familiar with Bayesian modelling, Markov chain Monte Carlo (MCMC) and stationary Gaussian Processes may skip this section.

1.2.1 Bayesian Modelling

The basis of Bayesian modelling is the famous Bayes' theorem. In the simplest case, it states that for two events A and B, the probability that A happens conditional on event B, denoted P(A|B), can be expressed as P(A ∩ B)/P(B), provided P(B) > 0.

Let θ denote the parameter(s) of a model. The prior distribution of θ is p(θ). Given data Y, the posterior distribution for θ is obtained by combining the prior with the likelihood p(Y|θ) by Bayes' theorem:

    p(θ|Y) = p(Y|θ) p(θ) / p(Y),    (1.1)

where p(Y) is the marginal distribution of Y. In simple cases, it can be obtained by integrating out θ, i.e., p(Y) = ∫ p(Y|θ) p(θ) dθ.

The prior distribution on θ contains the scientific prior knowledge about θ. However, in some circumstances people do not have much meaningful prior information about the parameters, and a vague (non-informative) prior is then assumed. When combined with a likelihood, families of priors producing posterior distributions in the same family are called conjugate. Conjugate priors are very convenient in practice because they can lead to analytically tractable posterior distributions.

A significant benefit of Bayesian statistics is that it is able to fully quantify the uncertainty. The posterior distribution on θ is a complete summary of the parameters adjusted by the data Y, in contrast with the frequentist-based MLE, which yields only a point estimate and a standard error of that estimate. Nevertheless, this does not mean the Bayesian approach is perfect, because it is often impossible to get an analytical expression for p(θ|Y) in non-textbook examples. Details of Bayesian statistics, including its merits and drawbacks, can be found in Hartigan (1964) and Robert (2001).
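As a concrete illustration of conjugacy, consider a hypothetical Beta-Binomial example (chosen for illustration; it is not an example from the thesis). With a Beta(a, b) prior on a success probability and a binomial likelihood, the posterior is again a Beta distribution, so Bayes' theorem applies in closed form and p(Y) never has to be computed numerically. A minimal sketch:

```python
# Hypothetical Beta-Binomial illustration of conjugacy: a Beta(a, b) prior
# on a success probability p, combined with a binomial likelihood
# (y successes in n trials), yields a Beta(a + y, b + n - y) posterior.

def beta_binomial_posterior(a, b, y, n):
    """Parameters of the Beta posterior for p, given y successes in n trials."""
    return a + y, b + (n - y)

# Uniform Beta(1, 1) prior updated by 7 successes in 10 trials.
a_post, b_post = beta_binomial_posterior(a=1.0, b=1.0, y=7, n=10)
posterior_mean = a_post / (a_post + b_post)  # (a + y) / (a + b + n) = 8/12
```

The entire posterior distribution, not just a point estimate, is available here analytically; this is exactly the convenience that conjugate priors provide.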
1.2.2 Markov Chain Monte Carlo

As we have pointed out, the posterior p(θ|Y) is generally mathematically intractable. The problem largely lies in the marginal distribution of Y, which can, in principle, be obtained by p(Y) = ∫ p(Y|θ) p(θ) dθ. However, when θ is a p-dimensional vector and p is relatively large, the high-dimensional integration becomes infeasible. A popular alternative to the intractable integration is to conduct posterior inference through simulation. Markov chain Monte Carlo (MCMC) is the standard choice (Gelman et al. (1995), Gilks et al. (1996)) for posterior inference by simulation, and is the ubiquitous tool for Bayesian inference in this thesis.

The main idea of MCMC is to establish a Markov chain whose stationary distribution is the desired posterior distribution, and then draw samples from that chain. The procedure for implementing MCMC is outlined below. Usually, we discard the initial stages of the chain, called the burn-in period. We then collect samples from the chain every t states, where t is the thinning parameter; by doing so, the approximate independence of the samples is largely guaranteed. Currently, there are two popular MCMC algorithms: the Gibbs Sampler (Geman and Geman (1984)) and the Metropolis-Hastings (M-H) algorithm (Metropolis et al. (1953), Hastings (1970)); the Gibbs Sampler is actually a particular case of the M-H algorithm. We emphasize the M-H algorithm, since all of the posterior inference in Chapter 4 and Chapter 5 is done with it.

The Gibbs Sampler

The Gibbs Sampler is a sampler from the full conditionals. Suppose we wish to obtain a sample from the posterior distribution p(θ1, ..., θd | Y). The Gibbs Sampler successively and repeatedly simulates from the conditional distribution of each component θi given the other components. Under conditional conjugacy, the simulation step is usually straightforward. We outline the steps below.

  0.  Initialize with θ^(0) = (θ1^(0), ..., θd^(0)).
  1.1 Simulate θ1^(1) from the conditional distribution of θ1 | (θ2^(0), ..., θd^(0)).
  1.2 Simulate θ2^(1) from the conditional distribution of θ2 | (θ1^(1), θ3^(0), ..., θd^(0)).
  ...
  1.d Simulate θd^(1) from the conditional distribution of θd | (θ1^(1), ..., θ(d−1)^(1)).
  2.  Iterate the above procedure.

It is clear that with the Gibbs Sampler, as long as each full conditional has a closed form, we can obtain a sample directly from the posterior distribution without worrying about any integrals. In practice, parameters with conditionally conjugate priors are usually sampled with Gibbs steps.

Metropolis-Hastings Algorithm

The M-H algorithm can be viewed as a generalization of the Gibbs Sampler that applies when the full conditional does not have a closed form. Consider again drawing samples from the posterior distribution p(θ|Y), which is proportional to p(Y|θ) p(θ). Let θ^(i) be the current state. The M-H algorithm proceeds by generating a candidate θ* from a transition (irreducible, aperiodic) kernel q(·|θ^(i)). The next state θ^(i+1) is given by

    θ^(i+1) = θ*     with probability α,
    θ^(i+1) = θ^(i)  with probability 1 − α,    (1.2)

where

    α = min{ 1, [p(Y|θ*) p(θ*) q(θ^(i)|θ*)] / [p(Y|θ^(i)) p(θ^(i)) q(θ*|θ^(i))] },    (1.3)

and α is usually referred to as the M-H acceptance ratio. Iterating the above procedure, the constructed Markov chain converges to its stationary distribution, which is the desired posterior distribution. Posterior inference is conducted based on the samples collected from the chain.

In addition, there are two concepts that are important for ensuring convergence of the M-H algorithm. The first is the proposal density, the probability density of the transition kernel q(·|θ^(i)). The M-H algorithm works best if the proposal density matches the shape of the target density. Since we do not know the shape of the desired posterior distribution, a uniform proposal density has been used in this thesis.
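The M-H update can be sketched in a few lines of code. The following is a minimal illustration, not the thesis implementation: the target is assumed to be an unnormalised standard normal density, and the proposal is a symmetric uniform random walk U(θ − w/2, θ + w/2). Because the proposal is symmetric, q(θ^(i)|θ*) = q(θ*|θ^(i)) and the acceptance ratio reduces to a ratio of target densities.

```python
import math
import random

# Minimal M-H sketch under two illustrative assumptions (not from the
# thesis): an unnormalised N(0, 1) target and a symmetric uniform
# random-walk proposal U(theta - w/2, theta + w/2).

def log_target(theta):
    """Log of an unnormalised N(0, 1) density."""
    return -0.5 * theta * theta

def mh_chain(n_iter, w=4.0, theta0=0.0, seed=1):
    rng = random.Random(seed)
    theta, draws, accepted = theta0, [], 0
    for _ in range(n_iter):
        candidate = theta + rng.uniform(-0.5 * w, 0.5 * w)
        # alpha = min{1, p(theta*) / p(theta_i)}, computed on the log scale
        log_alpha = min(0.0, log_target(candidate) - log_target(theta))
        if rng.random() < math.exp(log_alpha):
            theta, accepted = candidate, accepted + 1
        draws.append(theta)  # on rejection, the current state is repeated
    return draws, accepted / n_iter

draws, acc_rate = mh_chain(20000)
```

Note that a rejected candidate still contributes a draw (the chain repeats its current state); dropping rejections would bias the sample. The step-width w trades off between the two failure modes discussed next.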
The second is the acceptance rate, the fraction of candidate draws that are accepted. We cover some details of the acceptance rate below.

Convergence Diagnostics

From the theory of MCMC, we expect the constructed Markov chain to eventually converge to its stationary distribution, which is also the target posterior distribution. However, there is no guarantee that the chain has converged after H draws. Several methods, both visual and statistical, have been proposed to help judge convergence. Here we introduce one of each kind; both have been applied to check convergence in this thesis.

Intuitively, a direct way to see whether the chain has converged is to see how well the chain is moving around the parameter space. If the chain gets stuck in certain areas of the parameter space, it suggests the chain needs more time to converge to the desired distribution. The traceplot is a plot of the sampled values of a parameter against the iteration number. It enables us to check whether our chain gets stuck in certain areas of the parameter space.

Here, we include a simple example to illustrate how a traceplot can help to judge convergence. This linear regression example was used by Smith (2007) to illustrate the usefulness of the R package boa. The example presents a regression analysis of five (x, y) observations: (1, 1), (2, 3), (3, 3), (4, 3) and (5, 5). The analysis was performed with the following Bayesian model:

    yi ~ N(μi, τ),
    μi = α + β(xi − x̄),

where τ = 1/σ² is the precision. The prior distributions for α, β and τ are

    α ~ N(0, 0.0001),
    β ~ N(0, 0.0001),
    τ ~ Gamma(0.001, 0.001).

Primary interest was placed on the posterior inference for α, β and τ. In Smith (2007), two parallel chains of 200 iterations each were generated in separate runs of the MCMC sampler, started at different initial values. We concentrate on the first chain. The traceplots for α, β and τ are drawn in Figure 1.1; they suggest that after the first few steps, α, β and τ converge.

Figure 1.1: Trace plot for alpha, beta and tau.

Besides the traceplot, the statistical indicator we use to help judge convergence is the acceptance rate, introduced above. If the acceptance rate is too high, say 80%, the successive samples move around the space in very small steps and the chain converges very slowly to the desired distribution. If the acceptance rate is too low, say 10%, the proposals are likely to concentrate in regions of much lower probability density, and again the chain converges very slowly. The desired acceptance rate depends on the target distribution. According to Roberts et al. (1997), the ideal acceptance rate for a one-dimensional normal target distribution is approximately 50%, and for a multidimensional normal target distribution approximately 23%. In this thesis, an acceptance rate of 20%-50% is considered sufficient to indicate convergence.

1.2.3 Stationary Gaussian Processes

Sacks et al. (1989) treated the observations of a deterministic computer code as if they were generated from the following model:

    y(x) = fᵀ(x)β + Z(x).    (1.4)

Here, x is the vector of inputs to a computer model, y(x) is the output, f(x) = (f1(x), f2(x), ..., fk(x))ᵀ contains k known regression functions, β is a vector of parameters with unknown values, and Z(x) follows a Gaussian distribution with zero mean and unknown variance σ². Let x and x' be two sets of inputs. Here, we introduce 3 correlation structures between Z(x) and Z(x'). The key properties we need are that the correlation decreases as the distance l between x and x' increases, with limiting value 1 as l → 0 and limiting value 0 as l → ∞.

- Gaussian Correlation Structure

    R_Gauss(x, x') = exp{ −Σ_{h=1}^{p} θh |xh − x'h|² },    (1.5)

where θh > 0. The correlation parameters θ
control how fast the correlation decays as the distance increases. The smoothness parameters, which control the geometrical properties of the random field, are fixed at 2. The Gaussian correlation structure is popular and often used in applications; see Gramacy and Lee (2008) and Kennedy and O'Hagan (2001).

- Power Exponential Correlation Structure

    R_PowerExp(x, x') = exp{ −Σ_{h=1}^{p} θh |xh − x'h|^{mh} },    (1.6)

where θh > 0 and mh ∈ (0, 2]. The difference between the power exponential structure and the Gaussian structure lies in the smoothness parameters. Instead of fixing them, the power exponential structure introduces a new parameter vector m = (m1, m2, ..., mp). In practice, mh can either be treated as a constant or as a hyperparameter.

- Matérn Correlation Structure

    R_Matern(x, x') = [1 / (Γ(ν) 2^{ν−1})] (√(2ν) l / ψ)^{ν} K_ν(√(2ν) l / ψ),    (1.7)

where l is the distance between x and x', K_ν(·) is the modified Bessel function, and ν and ψ are non-negative parameters. As ν goes to infinity, the Matérn structure converges to the Gaussian correlation structure.

All three correlation structures mentioned above guarantee that the n × n correlation matrix R for the observations is symmetric and positive semidefinite. The diagonal entries of R are all 1, and R is completely determined by the correlation parameters θ = (θ1, θ2, ..., θp). In this thesis, we consider only the Gaussian correlation structure, for its effectiveness and simplicity.

Running a computer model n times at input vectors x(1), x(2), ..., x(n) produces n outputs y = (y(x(1)), y(x(2)), ..., y(x(n)))ᵀ. Given a new input configuration x*, we wish to predict y(x*). According to Sacks et al. (1989), the predictive distribution of y(x*) conditional on β, σ², θ and y is Gaussian:

    N(m(x*), v(x*)),    (1.8)

where

    m(x*) = fᵀ(x*)β + rᵀ(x*)R⁻¹(y − Fβ)    (1.9)

and

    v(x*) = σ²(1 − rᵀ(x*)R⁻¹r(x*)).    (1.10)

Here, F is the n × k matrix with row i containing fᵀ(x(i)), the n × 1 vector r(x*) is obtained from (1.5) with element i given by R(x*, x(i)) for i = 1, ..., n, and the n × n matrix R has element (i, j) equal to R(x(i), x(j)) for i = 1, ..., n and j = 1, ..., n.

We have shown that under (1.4) the predictive distribution of y(x*) is Gaussian, provided β, σ² and θ are known. In addition, the centre of this normal distribution is the usual kriging point predictor. Theoretically speaking, the (1 − α)100% confidence interval constructed according to (1.8) gives the correct coverage probability. In practice, however, one must estimate the model parameters, resulting in extra uncertainty that needs to be assessed.

In the next chapter, we review four popular parameter estimation methods. Emphasis is put on the three Bayesian methods; the frequentist-based MLE is incorporated only for comparison.

Chapter 2
Review of Existing Methods

It has become quite popular to model the output from a deterministic computer model as a realization of a Gaussian Process. Based on (1.4), many researchers (Sacks et al. (1989), Higdon et al. (2004), Bastos and O'Hagan (2009)) assume the outputs y follow a multivariate normal distribution. The density of y given β, σ² and θ is

    L(y | β, σ², θ) = [1 / ((2πσ²)^{n/2} det^{1/2} R)] exp{ −(y − Fβ)ᵀR⁻¹(y − Fβ) / (2σ²) }.    (2.1)

Theoretically, one assumes that the parameters φ = {β, σ², θ} are known. In practice, however, one must estimate φ. In this chapter, we review four popular parameter estimation methods: the first is MLE and the remaining three are Bayesian methods.

2.1 Method of Maximum Likelihood Estimation

Maximum Likelihood Estimation (MLE) is the most popular parameter estimation method within the frequentist paradigm. Based on (2.1), for any fixed θ, we take the first derivatives with respect to β and σ² separately, set the resulting expressions to zero, and obtain the MLEs for β and σ² as

    β̂
= (FᵀR⁻¹F)⁻¹FᵀR⁻¹y    (2.2)

and

    σ̂² = (1/n)(y − Fβ̂)ᵀR⁻¹(y − Fβ̂).    (2.3)

Note that the MLEs for β and σ² are fully determined by θ. Plugging the MLEs of β and σ² into (2.1) gives the profile likelihood, which depends only on θ:

    [1 / ((2πσ̂²)^{n/2} det^{1/2} R)] exp{ −n/2 }.    (2.4)

This profile likelihood must be numerically maximized to yield the MLE for θ. Several numerical methods can achieve this goal, such as the Nelder-Mead simplex (Nelder and Mead (1965)) and the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method (see, for example, Shanno and Kettler (1970)). Welch et al. (1992) also proposed an effective approach to obtain the MLE for θ. Since this thesis emphasizes Bayesian methods, we do not go into the details of the numerical maximization over θ. In addition, several statistical packages are available for the MLE of θ, such as the mlegp package in R and the GaSP software initiated by Welch.

Substituting the MLEs into (1.9), the predictive mean becomes

    m̂(x*) = fᵀ(x*)β̂ + rᵀ(x*)R⁻¹(y − Fβ̂).    (2.5)

Since we use θ̂ instead of the true θ, extra uncertainty has been introduced, and the predictive variance in (1.10) becomes

    v̂(x*) = σ̂²[1 − rᵀ(x*)R⁻¹r(x*)] + σ̂²[f(x*) − FᵀR⁻¹r(x*)]ᵀ(FᵀR⁻¹F)⁻¹[f(x*) − FᵀR⁻¹r(x*)].    (2.6)

The extra uncertainty introduced by replacing θ with θ̂ is non-trivial (see Abt (1999)). However, in practice, this extra uncertainty is often ignored. Consequently, the (1 − α)100% confidence interval based on (2.5) and (2.6) does not incorporate it; that is, the actual coverage probability will be lower than the nominal coverage probability. This analysis is in accordance with the results we observe in later chapters.

2.2 The General Picture for Bayesian Estimation Methods

Compared with MLE, Bayesian estimation methods are able to quantify the extra estimation uncertainty. The GP parameters φ
= {β, σ², θ} are treated as unknown quantities within the Bayesian paradigm, and prior distributions are placed on them. Posterior inference on the GP parameters is usually conducted through MCMC.

Let us begin by expressing the joint posterior distribution of the parameters φ = {β, σ², θ} as

P(φ | y) ∝ P(y | φ) P(φ).   (2.7)

That is, the joint posterior distribution of the parameters is proportional to the product of the likelihood and the joint prior distribution of the parameters. Assuming independent prior distributions, the joint prior is

P(φ) = P(β) P(σ²) P(θ).   (2.8)

Now, there are two approaches that one can take.
• The first approach is to analytically integrate β and σ² out of p(β, σ², θ | y) and sample from the resulting p(θ | y) with the Metropolis–Hastings (M-H) algorithm.
• The second approach is to sample the full posterior using a Gibbs sampler on β and σ², with M-H steps within the Gibbs sampler for θ.

Although the underlying ideas are similar, the joint prior distribution P(φ) differs across Bayesian methods. In the next sections, we review three popular Bayesian methods.

2.3 Higdon's Bayesian Method

The first Bayesian method introduced here was proposed by Higdon et al. (2004, 2008); hereafter, we call it Higdon's method. The prior distributions on the GP parameters φ are listed below.
• An Inverse Gamma (IG(α₁, α₂)) distribution on σ².
• A uniform distribution on β_i, for i = 1, …, k.
• A Beta distribution on ρ_i, where ρ_i = exp(−θ_i/4), for i = 1, …, d.

For the correlation parameters θ, note that Higdon's method in fact transforms each θ_i to a correlation scale ρ_i. After a simple change of variables, this is equivalent to assuming an independent prior on each θ_i with density

π(θ_i) = (1/8) exp(−θ_i/4) [1 − exp(−θ_i/4)]^{−1/2}.   (2.9)

By integrating out β and σ², the marginal posterior distribution of θ is given by

p(θ | y) ∝ ∫∫ p(β, σ², θ) L(y | β, σ², θ) dβ dσ²
        ∝ p(θ) / [ (α₂ + ((n−k)/2) σ̂²_θ)^{α₁ + (n−k)/2} det^{1/2}(R) det^{1/2}(F^T R^{−1} F) ],   (2.10)

where p(θ) is the prior distribution on θ and

σ̂²_θ = (1/(n−k)) (y − Fβ̂)^T R^{−1} (y − Fβ̂),
β̂ = (F^T R^{−1} F)^{−1} F^T R^{−1} y.

The Metropolis–Hastings algorithm is then applied to obtain samples from the posterior distribution of θ.

With the priors specified above, the predictive distribution p(y* | y, θ) is a non-central t distribution with n − k + 2α₁ degrees of freedom. That is, p(y(x*) | θ, y) ~ t(m̂(x*), v̂(x*)), where

m̂(x*) = f^T(x*) β̂ + r^T(x*) R^{−1} (y − Fβ̂)   (2.11)

and

v̂(x*) = [((n−1) σ̂²_θ + 2α₂) / (n − 1 + 2α₁)] · (1 − r^T(x*) R^{−1} r(x*) + h^T (F^T R^{−1} F)^{−1} h),   (2.12)

with h = f(x*) − F^T R^{−1} r(x*). To give readers a better understanding of how Higdon's method makes predictions, we briefly describe the procedure here.

Repeat the following steps M times, starting at t = 1.
1. For each dimension i = 1, 2, …, p, at step t, first transform θ_i^t to ρ_i^t.
2. Using a uniform proposal, either U(ρ_i^t − 0.5w, ρ_i^t + 0.5w) or U((3/4)ρ_i^t, (4/3)ρ_i^t), generate a candidate ρ*_i, which is transformed back to a candidate θ*_i; here w is a specified step-width.
3. Let θ* = (θ_1^t, …, θ_{i−1}^t, θ*_i, θ_{i+1}^t, …, θ_p^t) and θ^t = (θ_1^t, θ_2^t, …, θ_p^t).
4. Generate a number u from U(0, 1).
5. If log(u) < log p(θ* | y) − log p(θ^t | y), set θ_i^{t+1} = θ*_i; otherwise, set θ_i^{t+1} = θ_i^t.

Here, p(θ* | y) and p(θ^t | y) are computed from (2.10). After running the MCMC for M iterations, one obtains an M × p matrix with the i-th row containing ρ^(i). One converts each ρ^(i) back to θ^(i) and computes m̂_i(x*) from (2.11) and v̂_i(x*) from (2.12). The final predictor is defined as

m̂(x*) = m̄(x*) = (1/M) Σ_{i=1}^M m̂_i(x*).

The final predictive variance is obtained through the law of total variance, that is,

v̄(x*) = (1/M) Σ_{i=1}^M v̂_i(x*) + (1/(M−1)) Σ_{i=1}^M [m̂_i(x*) − m̄(x*)]².

The above Bayesian MCMC approach takes the extra uncertainty into account.
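The final predictor and predictive variance are simple averages over the M MCMC draws. The combination step can be sketched as follows (the function name is illustrative, not from any of the packages discussed):

```python
import statistics

def combine_mcmc_predictions(means, variances):
    """Combine per-draw predictive means and variances into a final
    predictor and a final variance via the law of total variance."""
    m_bar = statistics.fmean(means)          # final predictor: average of the m_i
    within = statistics.fmean(variances)     # average of the per-draw variances
    between = statistics.variance(means)     # sample variance of the means, 1/(M-1) scaling
    return m_bar, within + between

# Toy illustration with M = 3 draws:
m, v = combine_mcmc_predictions([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
# m = 2.0 and v = 0.5 + 1.0 = 1.5
```

The `statistics.variance` call uses the sample (M − 1) denominator, matching the second term of the variance formula above.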
Therefore, the (1 − α)100% credible interval constructed from (2.11) and (2.12) should have a much more appropriate actual coverage probability than the confidence interval constructed by MLE. This also agrees with the results we obtain in Chapter 4.

2.4 Method of Gaussian Emulation Machine

Kennedy (2004) proposed another Bayesian approach to emulate real computer models. Software called the Gaussian Emulation Machine (GEM) was created according to this method, which we will refer to as GEM in this thesis. After examining the GEM method carefully, we notice that it is quite similar to Higdon's method; the difference is that GEM assumes different prior distributions on φ = {β, σ², θ}. The assumed priors are listed below.
• A Jeffreys prior on σ², i.e., P(σ²) ∝ 1/σ².
• A uniform distribution on β_i.
• An exponential distribution on θ_i.

In a quite similar way, integrating out β and σ² gives the marginal posterior distribution of θ:

p(θ | y) ∝ ∫∫ p(β, σ², θ) L(y | β, σ², θ) dβ dσ²
        ∝ p(θ) / [ (σ̂²_θ)^{(n−k)/2} det^{1/2}(R) det^{1/2}(F^T R^{−1} F) ],   (2.13)

where p(θ) is the prior distribution on θ and

σ̂²_θ = (1/(n−k)) (y − Fβ̂)^T R^{−1} (y − Fβ̂),
β̂ = (F^T R^{−1} F)^{−1} F^T R^{−1} y.

Sampling from the posterior distribution of θ can again be done with the Metropolis–Hastings algorithm.

With the priors specified above, the predictive distribution p(y* | y, θ) is a non-central t distribution with n − k degrees of freedom. That is, p(y(x*) | θ, y) ~ t(m̂(x*), v̂(x*)), where

m̂(x*) = f^T(x*) β̂ + r^T(x*) R^{−1} (y − Fβ̂)   (2.14)

and

v̂(x*) = σ̂²_θ (1 − r^T(x*) R^{−1} r(x*) + h^T (F^T R^{−1} F)^{−1} h),   (2.15)

with h = f(x*) − F^T R^{−1} r(x*). We skip the detailed MCMC steps, since they are almost identical to those of Higdon's method. After running the MCMC for M iterations, one obtains an M × p matrix with the s-th row containing θ^(s). For each θ^(s), one computes m̂_s(x*) from (2.14) and v̂_s(x*) from (2.15). The final predictor is defined as

m̂(x*) = m̄(x*) = (1/M) Σ_{i=1}^M m̂_i(x*).

The final predictive variance is

v̄(x*) = (1/M) Σ_{i=1}^M v̂_i(x*) + (1/(M−1)) Σ_{i=1}^M [m̂_i(x*) − m̄(x*)]².

2.5 Method of Treed Gaussian Process

Gramacy (2005) proposed a treed GP hierarchical approach for prediction in computer experiments. Generally speaking, the method divides the input vectors x_1, x_2, …, x_n into different regions and fits an independent hierarchical GP model within each region. The partitioning is recursive: a new partition is a sub-partition of a previous partition, just as a tree develops more and more branches. Predictions are made conditional on the tree structure and are finally averaged to yield an ultimate predictor. Here, we introduce the hierarchical GP model without partitioning, i.e., we fit one hierarchical GP model over the whole input space; readers can imagine that our "tree" has no branches at all. Hereafter, we refer to this method as TGP.

The TGP method specifies a new correlation structure

K(x, x′) = R*(x, x′) + g δ_{x,x′},

where

R*(x, x′) = exp{ −Σ_{h=1}^p (1/d_h) |x_h − x′_h|² }.

Here, g is the nugget parameter and δ_{x,x′} is the Kronecker delta function, i.e., δ_{x,x′} = 1 if x = x′ and δ_{x,x′} = 0 otherwise. TGP assumes an exponential prior on the nugget g and an independent mixture of Gamma priors on each d_i, that is,

π(g) ~ Exp(λ)

and

π(d_i) ~ (1/2)(G(1, 20) + G(10, 10)), i = 1, 2, …, p,

with λ treated as known. We also note that TGP in effect works on the 1/θ_i scale. The equivalent prior on θ_i is

π(θ_i) = { (10^{10}/Γ(10)) θ_i^{−9} exp(−10/θ_i) + 20 exp(−20/θ_i) } / (2θ_i²).

The two priors above are specified for the correlation matrix K_{d,g}. The TGP method also assumes priors on β and σ²:
• An Inverse Gamma (IG) prior on σ².
• A normal distribution on β_i.

Since the normal and Inverse Gamma are both conjugate priors, it is not hard to obtain the full conditional distributions for β and σ². Therefore, all of the parameters except g and d can be sampled with the Gibbs sampler.
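The correlation structure K(x, x′) = R*(x, x′) + g δ_{x,x′} defined above is straightforward to evaluate point-wise. Below is a minimal sketch (not the tgp package's own code; the function name is illustrative):

```python
import math

def tgp_correlation(x, x_prime, d, g):
    """K(x, x') = R*(x, x') + g * delta_{x,x'}, with a separable
    squared-exponential R* using range parameters d_h."""
    r_star = math.exp(-sum((a - b) ** 2 / dh
                           for a, b, dh in zip(x, x_prime, d)))
    delta = 1.0 if x == x_prime else 0.0  # Kronecker delta
    return r_star + g * delta

# The nugget lifts the diagonal: at x = x' the value is 1 + g.
k_diag = tgp_correlation((0.2, 0.5), (0.2, 0.5), d=(1.0, 1.0), g=1e-6)
```

With g = 0 and unit range parameters, two points one unit apart in a single coordinate have correlation exp(−1), as the formula for R* requires.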
g and d will be sampled with the Metropolis–Hastings algorithm. The detailed derivations of the full conditionals, along with the prediction formulas, are given by Gramacy (2005).

Although it uses a different MCMC scheme, the underlying idea of TGP is quite similar to those of Higdon's method and GEM, covered in the previous sections. All three methods are constructed within the Bayesian paradigm and depend on numerical methods (MCMC) for posterior inference. In addition, we observe in the next chapter that the (1 − α)100% credible intervals perform much better than the intervals constructed by MLE.

In summary, we have introduced four popular parameter estimation methods for Gaussian Processes: the frequentist MLE and three Bayesian approaches. From now on, we emphasize the three Bayesian methods since, theoretically speaking, they are superior to MLE in that the Bayesian paradigm is able to quantify the extra uncertainties.

Moreover, we searched the literature but failed to find any journal paper that compares the three Bayesian methods. Therefore, we conduct a comprehensive comparison of them. The comparison focuses on two aspects, the prediction accuracy and the prediction actual coverage probability, each measured by its own criterion. A tentative comparison is made in the next chapter, and the formal comparison of the three methods is covered in Chapters 4 and 5.

Chapter 3
Tentative Comparison of the Bayesian Methods

In this chapter we conduct a tentative comparison of the three Bayesian methods. Results from MLE are also included for comparison.

3.1 Two Prediction Criteria

As mentioned before, we compare the Bayesian methods both on prediction accuracy and on prediction actual coverage probability. The first criterion is the "Normalized Root Mean Square Error", or "Norm-RMSE" for short, which quantifies prediction accuracy. The expression is

Norm-RMSE = sqrt( (1/N) Σ_{i=1}^N (ŷ(x_i^ho) − y(x_i^ho))² ) / sqrt( (1/N) Σ_{i=1}^N (ȳ − y(x_i^ho))² ),   (3.1)

where ȳ is the mean of the outputs from the runs in the experimental design, y(x) is the "true value" at the hold-out (test) set of N further runs, and ŷ(x) is the predicted value from the GP. In most cases, Norm-RMSE lies roughly between 0 and 1, since the denominator is usually larger than the numerator (the usual RMSE). A value of 0 means perfect prediction, and a value of 1 or larger indicates that the GP predictor is no better than the trivial predictor ȳ.

The second criterion is the frequentist Actual Coverage Probability (ACP) (Berger et al. (2001)), which measures how well a method quantifies uncertainty. The nominal coverage probability is set to 90% and 95% in this chapter. The actual coverage probability is a proportion: the fraction of the constructed credible intervals that capture the true values. We want the actual coverage probability to be as close to the nominal coverage probability (90%, 95%) as possible. If the actual coverage probability of a method is clearly below the nominal one, we conclude that the method fails to incorporate all of the uncertainties.

3.2 Details for the Tentative Comparison

We mention some details of the tentative comparison below.

3.2.1 Data

The data come from a G-protein computer code, one that has served as a testbed in many contexts. The G-protein computer model was proposed by Yi et al. (2005) to model ligand activation of G-protein in yeast. It involves solving a system of ordinary differential equations (ODEs) with nine parameters that can vary. The differential equations are

dη₁/dt = −λ₁η₁x + λ₂η₂ − λ₃η₁ + λ₅
dη₂/dt = λ₁η₁x − λ₂η₂ − λ₄η₂
dη₃/dt = −λ₆η₂η₃ + λ₈(G_tot − η₃ − η₄)(G_tot − η₃)
dη₄/dt = λ₆η₂η₃ − λ₇η₄,   (3.2)

where η₁, …, η₄ are the concentrations of four chemical species, x is the concentration of the ligand, and λ₁, …, λ₈ is a vector of eight kinetic parameters. G_tot is the total concentration of the G-protein complex, and the output y = (G_tot − η₃)/G_tot is the normalized concentration of the relevant part of the complex after 30 seconds.

We fixed five of the kinetic parameters, allowing only λ₁, λ₆, λ₇, and x to vary. We use the GP model to construct an approximation of y as a function of the transformed variables log(λ₁), log(λ₆), log(λ₇), and log(x), each further transformed to [0, 1]. In this way, the G-protein data have 4 inputs (dimensions) and one scalar output.

3.2.2 Comparison Details

We use the following set-up for this comparison.
• The training set comprises 20 runs, i.e., x points, and the design for the training set is a random Latin Hypercube Design (LHD) (McKay et al. (1979)).
• The testing/hold-out set is a 1000-run random LHD.
• The regression in the GP model is fixed as constant, i.e., y(x) = β + Z(x).
• Two criteria: Norm-RMSE and Actual Coverage Probability.
• Estimation methods considered: MLE, Higdon's method, TGP, and GEM.
• Even for the same method, different designs give different outputs, so we need replications to minimize the effect of randomness. The number of replications is set to 20; i.e., each method is applied to 20 designs and yields 20 values each for Norm-RMSE, 90% ACP, and 95% ACP.

For the three Bayesian methods, in order to decide how to run the MCMC, we perform a small experiment. First, we specify three choices for the total number of MCMC samples and the corresponding burn-in:
• 15,000 total draws with the first 5,000 as burn-in.
• 30,000 total draws with the first 10,000 as burn-in.
• 60,000 total draws with the first 15,000 as burn-in.

We then generate one training set with 41 runs and use the same 1000 testing points as hold-out. The priors on the correlation parameters (θ₁, …, θ₄) are set independently as Exp(0.1). The initial values for (θ₁, …, θ₄) are set at their MLE values, obtained from the mlegp package. The traceplots for each choice are given in Figures 3.1, 3.2, and 3.3; the empirical densities are presented in Figures 3.4, 3.5, and 3.6.

Figure 3.1: Traceplot for 15,000 total samples.
Figure 3.2: Traceplot for 30,000 total samples.
Figure 3.3: Traceplot for 60,000 total samples.
Figure 3.4: Empirical densities for 15,000 total samples.
Figure 3.5: Empirical densities for 30,000 total samples.
Figure 3.6: Empirical densities for 60,000 total samples.

The acceptance rates for each of the choices are given in Table 3.1.

Choice                           θ₁      θ₂      θ₃      θ₄
15,000 with 5,000 as burn-in     0.198   0.382   0.371   0.478
30,000 with 10,000 as burn-in    0.202   0.381   0.372   0.482
60,000 with 15,000 as burn-in    0.199   0.382   0.365   0.479

Table 3.1: Acceptance rates for different choices of MCMC length.

We can see that the empirical densities do not differ much across the choices of MCMC length. The acceptance rates for all three choices are between 20% and 50%. From the traceplots, the chain appears to converge when the number of MCMC samples is set to 60,000; the 15,000 and 30,000 scenarios are too short to guarantee convergence.
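Whatever chain length is chosen, discarding burn-in and thinning a stored chain amount to a single list slice. A minimal sketch:

```python
def postprocess_chain(chain, burn_in, thin):
    """Drop the first `burn_in` draws, then keep every `thin`-th draw."""
    return chain[burn_in::thin]

# Stand-in for 60,000 stored MCMC draws of one parameter:
draws = list(range(60_000))
kept = postprocess_chain(draws, burn_in=15_000, thin=10)
# 45,000 post-burn-in draws thinned by 10 leave 4,500 draws,
# the first of which is draw number 15,000.
```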
Therefore, we decide to set the number of MCMC samples to 60,000, with the first 15,000 deleted as burn-in. The thinning number is chosen as 10, so that the MC samples we obtain are approximately independent.

3.3 Comparison Results

3.3.1 Prediction Accuracy

First, we present the results for Norm-RMSE. In Figure 3.7, the top panel contains the dotplots and the bottom panel contains the boxplots. From Figure 3.7, we can see that MLE and Higdon's method are slightly better than TGP or GEM. In general, however, the performances of the four methods are not substantially different.

3.3.2 Actual Coverage Probability

The results for the actual coverage probability are given in Figures 3.8 and 3.9. Note that each method has 20 ACP values from the 20 repeat designs. The nominal coverage probability is 90% in Figure 3.8 and 95% in Figure 3.9.

Figure 3.7: Results for prediction accuracy.
Figure 3.8: Results for actual coverage probability, 90% true CP.
Figure 3.9: Results for actual coverage probability, 95% true CP.

The red lines in Figures 3.8 and 3.9 represent the correct 90% and 95% coverage probabilities, respectively. From both figures, we can see that Higdon's method and GEM fluctuate around the correct coverage probability, while TGP largely over-covers. It is also apparent that the TGP method shows less variation than the other two Bayesian methods. To remove the variation, we compute the average ACP for each method and report the results in Tables 3.2 and 3.3.

MLE      Higdon's method   GEM      TGP
64.92%   84.16%            88.87%   96.17%

Table 3.2: Average actual coverage probability, 90% true CP.

MLE      Higdon's method   GEM      TGP
72.43%   89.03%            93.03%   97.86%

Table 3.3: Average actual coverage probability, 95% true CP.

All in all, it is apparent that MLE greatly under-covers.
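The ACP values reported in Tables 3.2 and 3.3 are coverage proportions; computing one is a single pass over the hold-out set. A minimal sketch (function name illustrative):

```python
def actual_coverage(intervals, truths):
    """Fraction of (lower, upper) intervals that contain the true value."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, truths))
    return hits / len(truths)

# Toy illustration: two of the three intervals cover the truth.
acp = actual_coverage([(0.0, 1.0), (0.2, 0.4), (1.0, 2.0)], [0.5, 0.3, 3.0])
# acp == 2/3
```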
The under-coverage of MLE is not surprising: from the previous analyses, we already know that the MLE method fails to incorporate all of the uncertainties. In addition, Higdon's method under-covers slightly, while TGP over-covers. The performance of GEM appears to be the best among the four methods compared here.

3.4 Conclusion & Discussion

Our emphasis in this tentative comparison is on the three Bayesian methods. In terms of prediction accuracy, the performances of the three methods are quite similar. In terms of prediction actual coverage probability, GEM is better than Higdon's method and TGP.

However, this comparison is only a superficial one, because each Bayesian method has its own features, such as different parameterizations and different prior distributions on the correlation parameters θ. We do not actually know which factors have significant effects on the prediction results. People may ask questions that we cannot answer through this tentative comparison, such as:
• Does the prior on θ affect the prediction accuracy substantially?
• Does the prior on σ² affect the prediction accuracy or not?
• You use a constant model for the GP. If you use a linear model instead, will it have a significant effect on the prediction performance?
• You fix the number of computer model runs at 40 in your training set. If you increase the number of runs from 40 to, say, 80, will it affect the prediction performance?

From the above discussion, we can see that more sophisticated comparisons and analyses are needed to gain a better understanding of the underlying factors. In the next chapters, we specify four important factors that may affect the prediction performance. For each factor, different levels are assigned based on the three existing Bayesian methods. We then find the important factors that have significant impacts on the prediction performance through full factorial experiments.
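A full factorial experiment enumerates every factor-level combination; with the four factors and their levels as laid out in Chapter 4 (labels follow Table 4.1), the enumeration is one call to itertools.product:

```python
from itertools import product

# Factor levels as labelled in Table 4.1:
theta_priors = ["Hig", "TGP", "GEM"]   # prior on the correlation parameters
sigma_priors = ["IG", "Jeff"]          # prior on sigma^2
regressions = ["Const", "FL"]          # regression term for the GP
run_sizes = ["5d", "10d"]              # number of training runs

design = list(product(theta_priors, sigma_priors, regressions, run_sizes))
# 3 * 2 * 2 * 2 = 24 distinct factor-level combinations
```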
The best combination of factor levels will also be identified.

Chapter 4
In-Depth Comparison: Real Computer Models

In this chapter, we conduct an in-depth comparison on three real computer models. Conclusions drawn from the comparison can help us answer the questions asked in Chapter 3.

4.1 Comparison Details

4.1.1 Four Important Factors

First of all, we specify four factors within the Bayesian paradigm that may have significant effects on the prediction performance measures. The factors are:
• The prior distribution on the correlation parameters θ_i.
• The prior distribution on σ².
• The regression term of the underlying Gaussian Process (GP).
• The number of runs in the training data.

(1) Prior Distribution on the Correlation Parameter θ_i

Since different Bayesian methods parameterize the correlation function differently, we uniformly parameterize it as

R(x, x′) = exp{ −Σ_{h=1}^p θ_h |x_h − x′_h|² }.   (4.1)

Hereafter, we refer to this parameterization as the θ scale. We extract three priors from the three Bayesian methods reviewed previously and call them Higdon's prior, TGP's prior, and GEM's prior. For methods with different parameterizations, we transform their prior distributions into the equivalent prior distributions on the θ scale. These three priors are the three qualitative levels specified for the first factor. We list them below.

• Higdon's prior on θ: independent and identically distributed with density
π(θ_i) = (1/8) exp(−θ_i/4) [1 − exp(−θ_i/4)]^{−1/2}.   (4.2)

• TGP's prior on θ: independent and identically distributed with density
π(θ_i) = { (10^{10}/Γ(10)) θ_i^{−9} exp(−10/θ_i) + 20 exp(−20/θ_i) } / (2θ_i²).   (4.3)

• GEM's prior on θ: independent and identical exponential distributions with rate λ = 0.01,
π(θ_i) = 0.01 exp{−0.01 θ_i}.   (4.4)

We also plot the three densities over two θ_i ranges ([0, 4] and [0, 40]).
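The three densities in (4.2)–(4.4) can be evaluated directly; the sketch below reproduces the values behind the density plots (valid for θ_i > 0; function names are illustrative):

```python
import math

def higdon_prior(t):
    """Density (4.2): Beta prior on rho = exp(-t/4) mapped to the theta scale."""
    return math.exp(-t / 4) / (8 * math.sqrt(1 - math.exp(-t / 4)))

def tgp_prior(t):
    """Density (4.3): Gamma mixture on d = 1/theta mapped to the theta scale."""
    term1 = 10**10 / math.gamma(10) * t**-9 * math.exp(-10 / t)
    term2 = 20 * math.exp(-20 / t)
    return (term1 + term2) / (2 * t**2)

def gem_prior(t):
    """Density (4.4): Exp(0.01), nearly flat over moderate theta ranges."""
    return 0.01 * math.exp(-0.01 * t)

flat = gem_prior(1.0)  # about 0.0099, illustrating the near-flat GEM prior
```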
The plots are given in Figures 4.1, 4.2, and 4.3.

From Figure 4.1, we see that Higdon's prior places extremely heavy weight on small θ_i values. This is actually the desired prior distribution for the correlation parameter, since small θ_i values mean strong correlations between points in the design space. Compared with Higdon's prior, GEM's prior is nearly flat in θ_i (note the different y ranges), which represents its vague preference regarding the magnitude of the correlation parameters. In terms of TGP's prior, since its parameterization is θ_i = 1/d_i, d_i → ∞ implies θ_i → 0. However, from Figure 4.3, TGP's prior forces d_i to stay away from ∞; hence θ_i cannot get small enough. Therefore, we conjecture that Higdon's prior and GEM's prior will perform better than TGP's prior.

Figure 4.1: Density plot for Higdon's prior.
Figure 4.2: Density plot for GEM's prior.
Figure 4.3: Density plot for TGP's prior.

(2) Prior Distribution on σ²

For the second factor, we note that Higdon's method and TGP assume an Inverse Gamma (IG) distribution on σ², while GEM assumes the Jeffreys prior (π(σ²) ∝ 1/σ²). Hence, we specify two levels for the second factor:
• Inverse Gamma prior: σ² ~ IG(α₁ = 0.1, α₂ = 5), where the values of α₁ and α₂ are extracted from TGP.
• Jeffreys prior: π(σ²) ∝ 1/σ².

(3) Regression Terms for the Gaussian Process

We assign two levels for this factor, Constant and Full Linear, i.e.,
• Constant: y(x) = β + Z(x).
• Full Linear: y(x) = Σ_{i=1}^p β_i x_i + Z(x).

(4) Number of Runs in the Training Data

Loeppky et al. (2009) provide evidence that the informal rule of n = 10d is sufficient for most computer models, where d is the input dimension. We specify two levels here: 5d and 10d.
The 10d level represents the sufficient scenario for effectively emulating a computer model, while we deliberately choose the inadequate 5d level to explore what happens when the number of runs is not enough.

In addition, we fix all other potentially influential factors to make sure the comparisons are fair. We mention the fixed factors in the subsequent sections.

4.1.2 Full Factorial Design

From the previous discussion, we have specified four important factors and assigned levels to each. The following summarizes the design of the study.
• Prior on θ_i: Higdon's prior, TGP's prior, and GEM's prior.
• Prior on σ²: IG and Jeffreys prior.
• Regression terms for the GP: Constant and Full Linear.
• Number of training runs: 5d and 10d.

Based on the full factorial design, we have 3 × 2 × 2 × 2 = 24 combinations of the four factors. Table 4.1 summarizes the full factorial design.

No.  Prior on θ  Prior on σ²  Regression  Runs
1    Hig         Jeff         FL          10d
2    Hig         Jeff         FL          5d
3    Hig         Jeff         Const       10d
4    Hig         IG           FL          10d
5    Hig         Jeff         Const       5d
6    Hig         IG           Const       5d
7    Hig         IG           FL          5d
8    Hig         IG           Const       10d
9    TGP         Jeff         FL          10d
10   TGP         Jeff         FL          5d
11   TGP         Jeff         Const       10d
12   TGP         IG           FL          10d
13   TGP         Jeff         Const       5d
14   TGP         IG           Const       5d
15   TGP         IG           FL          5d
16   TGP         IG           Const       10d
17   GEM         Jeff         FL          10d
18   GEM         Jeff         FL          5d
19   GEM         Jeff         Const       10d
20   GEM         IG           FL          10d
21   GEM         Jeff         Const       5d
22   GEM         IG           Const       5d
23   GEM         IG           FL          5d
24   GEM         IG           Const       10d

Table 4.1: Full factorial design.

We can also view Table 4.1 in another way: given the level of the prior on θ_i, the 24-run full factorial design reduces to a 2³ factorial design. The 2³ full factorial design conditional on each level of the θ_i prior is presented in Table 4.2.

No.  Prior on σ²  Regression  Runs
1    Jeff         FL          10d
2    Jeff         FL          5d
3    Jeff         Const       10d
4    IG           FL          10d
5    Jeff         Const       5d
6    IG           Const       5d
7    IG           FL          5d
8    IG           Const       10d

Table 4.2: 2³ factorial design, given any level of the prior on θ_i.

4.1.3 Some Mathematical Details

In this section, we mention some mathematical details.
Like most non-textbook Bayesian problems, the Metropolis–Hastings algorithm is needed here to draw samples from the posterior distribution of θ_i, and the posterior inferences are conducted from the MC samples.

(1) The Marginal Posterior Distribution of θ_i

First, we derive the marginal posterior distribution of θ_i, which we need in the M-H algorithm to sample θ_i. The distribution p(θ | y) is obtained by integrating out β and σ². The prior on β is fixed as a vague normal distribution with zero mean and large variance. We present only the results here; the detailed derivations are given in Appendices A and B.

• For p(σ²) ~ IG(α₁, α₂),

p(θ | y) ∝ p(θ) / [ (α₂ + ((n−k)/2) σ̂²_θ)^{α₁ + (n−k)/2} det^{1/2}(R) det^{1/2}(F^T R^{−1} F) ].   (4.5)

• For p(σ²) ∝ 1/σ²,

p(θ | y) ∝ p(θ) / [ (σ̂²_θ)^{(n−k)/2} det^{1/2}(R) det^{1/2}(F^T R^{−1} F) ].   (4.6)

(2) The Predictive Distribution

Given the sampled θ_i, according to Santner et al. (2003), the predictive distribution is a non-central t distribution. That is,

p(y(x*) | θ, y) ~ t(m̂(x*), v̂(x*)),

where

m̂(x*) = f^T(x*) β̂ + r^T(x*) R^{−1} (y − Fβ̂).   (4.7)

• For σ² ~ IG(α₁, α₂), the degrees of freedom are n − k + 2α₁ and

v̂(x*) = [((n−1) σ̂²_θ + 2α₂) / (n − 1 + 2α₁)] · (1 − r^T(x*) R^{−1} r(x*) + h^T (F^T R^{−1} F)^{−1} h).   (4.8)

• For π(σ²) ∝ 1/σ², the degrees of freedom are n − k and

v̂(x*) = σ̂²_θ (1 − r^T(x*) R^{−1} r(x*) + h^T (F^T R^{−1} F)^{−1} h),   (4.9)

where h and σ̂²_θ were defined in Chapter 2.

We briefly describe the prediction procedure. After running the MCMC for M iterations, one obtains an M × p matrix with the i-th row containing θ^(i). For each sampled θ, one computes m̂_i(x*) from (4.7) and v̂_i(x*) from (4.8) or (4.9). The final predictor is defined as

m̂(x*) = m̄(x*) = (1/M) Σ_{i=1}^M m̂_i(x*).

Each point-wise credible interval is constructed in a semi-parametric way. Suppose one wants the 95% credible interval. For each MC sample, one generates t draws from the non-central t distribution t(m̂_i(x*), v̂_i(x*)). One then combines the M × t generated draws and computes the empirical 2.5% quantile L and the empirical 97.5% quantile U of the combined sample. The 95% CI is constructed as (L, U). A 90% credible interval is obtained similarly.

4.1.4 Other Details for the In-Depth Comparison

The set-up for the in-depth comparison is quite similar to that of the tentative comparison in Chapter 3.
• For each code, the test set is the same 1000 test points.
• The design for the input set is a random Latin Hypercube Design (random LHD) (McKay et al. (1979)).
• Two criteria: Norm-RMSE and Actual Coverage Probability. The nominal coverage probability is set at 90% and 95%.
• Even for the same method, different designs give different outputs, so we use replications to minimize the effect of randomness. The number of replications is 25; i.e., each method yields 25 values each for Norm-RMSE, 90% ACP, and 95% ACP.
• The total number of MCMC samples is 60,000, with the first 15,000 as burn-in. The thinning parameter is 10.
• We do not include nugget terms in the correlation matrix R. If a numerical problem occurs in the MCMC, the generated candidate is rejected during the Metropolis step.

This completes the set-up of the in-depth comparison. The full factorial design on the four factors will help us explore which factors have a significant impact on the prediction performance. In the next sections, we carry out the experiments on outputs from several real computer models to identify the significant factors as well as their best levels.

4.2 Real Computer Model 1: Borehole

In this section, we first work with the Borehole computer model (Morris et al. (1993)), one that has served as a testbed in many contexts. The Borehole data are obtained from a computer model of the flow of water through a borehole drilled from the ground surface through two aquifers.
The response variable of this model is y, the flow rate through the borehole in m³/yr, which is determined by the equation

y = 2π T_u (H_u − H_l) / { ln(r/r_w) [ 1 + 2 L T_u / (ln(r/r_w) r_w² K_w) + T_u/T_l ] },   (4.10)

where the 8 inputs, with their respective ranges of interest and units, are as follows.
• r_w = radius of borehole, r_w ∈ [0.05, 0.15].
• r = radius of influence, r ∈ [100, 50000].
• T_u = transmissivity of upper aquifer, T_u ∈ [63070, 115600].
• H_u = potentiometric head of upper aquifer, H_u ∈ [990, 1110].
• T_l = transmissivity of lower aquifer, T_l ∈ [63.1, 116].
• H_l = potentiometric head of lower aquifer, H_l ∈ [700, 820].
• L = length of borehole, L ∈ [1120, 1680].
• K_w = hydraulic conductivity of borehole, K_w ∈ [9855, 12045].

Note that the Borehole function is not a typical computer model, since it can be expressed as a simple formula; it is used here for demonstration purposes.

4.2.1 Prediction Accuracy

The Norm-RMSE results are presented in Figures 4.4, 4.5, and 4.6. There are 24 combinations, and each combination has 25 Norm-RMSE values.

Figure 4.4: Prediction accuracy, given Higdon's prior on θ_i, Borehole.
Figure 4.5: Prediction accuracy, given GEM's prior on θ_i, Borehole.
Figure 4.6: Prediction accuracy, given TGP's prior on θ_i, Borehole.

From Figures 4.4, 4.5, and 4.6, we make the following observations.
• The performance of Higdon's prior on θ_i is slightly better than that of GEM's prior and much better than that of TGP's prior.
• A large number of runs (10d) gives smaller Norm-RMSE than a small number of runs (5d).
• With TGP's prior on θ_i, a full linear model has smaller Norm-RMSE values than a constant model.
• The prior on σ² does not seem to have much impact on the prediction accuracy.

Considering both the main effects and the interactions, an ANOVA for the Norm-RMSE values is given in Table 4.3.

                       Df   Sum Sq  Mean Sq  F value   Pr(>F)
theta                   2   2.37    1.18     7431.18   0.00 ***
sigma                   1   0.00    0.00        1.45   0.23
reg                     1   0.13    0.13      827.26   0.00 ***
runs                    1   0.37    0.37     2333.58   0.00 ***
theta:sigma             2   0.00    0.00        0.60   0.55
theta:reg               2   0.32    0.16      994.69   0.00 ***
sigma:reg               1   0.00    0.00        0.42   0.52
theta:runs              2   0.11    0.05      320.44   0.00 ***
sigma:runs              1   0.00    0.00        0.42   0.51
reg:runs                1   0.00    0.00       14.85   0.00 ***
theta:sigma:reg         2   0.00    0.00        0.12   0.89
theta:sigma:runs        2   0.00    0.00        0.04   0.96
theta:reg:runs          2   0.01    0.01       46.21   0.00 ***
sigma:reg:runs          1   0.00    0.00        2.26   0.13
theta:sigma:reg:runs    2   0.00    0.00        1.31   0.27
Residuals             576   0.09    0.00

Table 4.3: ANOVA results for the Borehole computer model.

In Table 4.3, theta represents the prior on θ_i, sigma the prior on σ², reg the regression model for the GP, and runs the number of runs. We can see from Table 4.3 that all of the main effects are significant except the prior on σ². Then, for each combination, we take the average of the Normalized RMSE values and draw the interaction plots in Figures 4.7, 4.8, and 4.9.

Figure 4.7: Average prediction accuracy, given Higdon's prior on θ_i, Borehole.
Figure 4.8: Average prediction accuracy, given GEM's prior on θ_i, Borehole.
Figure 4.9: Average prediction accuracy, given TGP's prior on θ_i, Borehole.

From all of the figures and Table 4.3, we draw the following conclusions.
• The prior on θ_i is important for prediction accuracy, and Higdon's prior on θ_i is preferred.
• The effect of the number of runs is important, and a large number of runs (10d) is better than a small number of runs (5d).
• The regression terms for the GP matter only under TGP's prior on θ_i; under TGP's prior, a full linear regression is preferred.
• There is not enough evidence to conclude that the prior on σ² has a statistically significant effect on prediction accuracy.

4.2.2 Actual Coverage Probability

Given a nominal coverage probability, each combination has 25 ACP values. Since the ACP values for each combination contain non-trivial variation, we take their average and report the results in Figures 4.10, 4.11, 4.12 and Figures 4.13, 4.14, 4.15.

Figure 4.10: Prediction actual coverage probability, given Higdon's prior on θ_i, 90% true CP, Borehole.
Figure 4.11: Prediction actual coverage probability, given GEM's prior on θ_i, 90% true CP, Borehole.
Figure 4.12: Prediction actual coverage probability, given TGP's prior on θ_i, 90% true CP, Borehole.
Figure 4.13: Prediction actual coverage probability, given Higdon's prior on θ_i, 95% true CP, Borehole.
Figure 4.14: Prediction actual coverage probability, given GEM's prior on θ_i, 95% true CP, Borehole.
Figure 4.15: Prediction actual coverage probability, given TGP's prior on θ_i, 95% true CP, Borehole.

The red lines in Figures 4.10, 4.11, 4.12 and Figures 4.13, 4.14, 4.15 represent the correct 90% and 95% coverage probabilities, respectively. We observe that the ACP for TGP's prior slightly over-covers. The mean ACP values for Higdon's prior and GEM's prior are satisfactory, since they lie around the nominal coverage probabilities (90%, 95%).

We summarize the conclusions from the above analyses as follows. In terms of prediction accuracy, Higdon's prior on θ_i is highly preferred, and a large number of runs (10d) is preferred to a small number of runs (5d). In terms of the Actual Coverage Probability, the performances of all 24 combinations are acceptable.

4.3 Real Computer Model 2: PTW

In this section, the data we work with are the outputs from a real physical computer model, PTW.
The PTW model, which was proposed by Preston et al. (2003), models the plastic flow of metals and alloys during explosively driven deformation and high-velocity impacts. The model has 11 inputs and a single output measuring the flow stress, the stress required to plastically deform the metal or alloy.

4.3.1 Prediction Accuracy

The Norm-RMSE results are presented in Figures 4.16, 4.17 and 4.18. Again, there are 24 combinations and each combination has 25 Norm-RMSE numbers.

Figure 4.16: Prediction accuracy, given Higdon's prior on θ_i, PTW.
Figure 4.17: Prediction accuracy, given GEM's prior on θ_i, PTW.
Figure 4.18: Prediction accuracy, given TGP's prior on θ_i, PTW.

From Figures 4.16, 4.17 and 4.18, we make the following observations.

- The performance of Higdon's prior on θ_i is better than the performances of TGP's prior and GEM's prior.
- A large number of runs (10d) has smaller Norm-RMSE numbers than a small number of runs (5d).
- With TGP's prior on θ_i, a full linear model has smaller Norm-RMSE numbers than a constant model.
- The prior on σ² does not seem to have much impact on the prediction accuracy.

Considering both the main effects and the interactions, an ANOVA for the Norm-RMSE numbers is given in Table 4.4.

                      Df  Sum Sq  Mean Sq  F value  Pr(>F)
theta                  2    1.84     0.92  1791.28    0.00 ***
sigma                  1    0.00     0.00     0.00    0.95
reg                    1    0.11     0.11   215.38    0.00 ***
runs                   1    1.01     1.01  1962.85    0.00 ***
theta:sigma            2    0.00     0.00     0.14    0.87
theta:reg              2    0.40     0.20   389.00    0.00 ***
sigma:reg              1    0.00     0.00     0.06    0.80
theta:runs             2    0.05     0.02    45.64    0.00 ***
sigma:runs             1    0.00     0.00     0.00    0.95
reg:runs               1    0.00     0.00     0.05    0.81
theta:sigma:reg        2    0.00     0.00     0.36    0.70
theta:sigma:runs       2    0.00     0.00     0.22    0.80
theta:reg:runs         2    0.02     0.01    15.56    0.00 ***
sigma:reg:runs         1    0.00     0.00     0.03    0.86
theta:sigma:reg:runs   2    0.00     0.00     0.10    0.90
Residuals            576    0.30     0.00

Table 4.4: ANOVA results for the PTW computer model.

Again, we observe from Table 4.4 that all of the main effects are significant except for the prior on σ². Then, for each combination, we take its average value of the Normalized RMSE and draw the interaction plots in Figures 4.19, 4.20 and 4.21.

Figure 4.19: Average prediction accuracy, given Higdon's prior on θ_i, PTW.
Figure 4.20: Average prediction accuracy, given GEM's prior on θ_i, PTW.
Figure 4.21: Average prediction accuracy, given TGP's prior on θ_i, PTW.

From all of the figures and Table 4.4, we draw the following conclusions.

- The prior on θ_i is important for prediction accuracy, and TGP's prior is least favoured.
- The effect of the number of runs is significant, and a large number of runs (10d) is better than a small number of runs (5d).
- Regression terms for the GP only matter when TGP's prior is assumed on θ_i; under TGP's prior, a full linear regression is preferred.
- There is not enough evidence to conclude that the prior on σ² has a statistically significant effect on prediction accuracy.

4.3.2 Actual Coverage Probability

Given a nominal coverage probability, each combination has 25 numbers for the ACP.
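Each such ACP number is simply the empirical coverage of the pointwise credible intervals over the test set: the fraction of held-out outputs that fall inside their intervals. A minimal sketch (the function name and arrays here are ours, for illustration only):

```python
import numpy as np

def actual_coverage(y_true, lower, upper):
    """Fraction of test outputs falling inside their pointwise
    credible intervals [lower, upper] -- the ACP criterion."""
    y_true = np.asarray(y_true)
    return float(np.mean((y_true >= np.asarray(lower)) &
                         (y_true <= np.asarray(upper))))

# Hypothetical check: for N(0,1) outputs, +/-1.645 is a 90% interval,
# so the empirical coverage should come out close to 0.90.
rng = np.random.default_rng(1)
y = rng.standard_normal(100_000)
acp = actual_coverage(y, -1.645 * np.ones_like(y), 1.645 * np.ones_like(y))
print(round(acp, 2))  # close to 0.90
```

A well-calibrated method produces ACP values near the nominal level; systematic excess indicates over-coverage (intervals too wide), as seen for TGP's prior.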
As we did for the Borehole data, we take the average value of ACP for each combination and report the average ACP in Figures 4.22-4.24 and Figures 4.25-4.27. The red lines in Figures 4.22-4.24 and Figures 4.25-4.27 represent the correct 90% and 95% coverage probability, respectively.

Figure 4.22: Prediction actual coverage probability, given Higdon's prior on θ_i, 90% true CP, PTW.
Figure 4.23: Prediction actual coverage probability, given GEM's prior on θ_i, 90% true CP, PTW.
Figure 4.24: Prediction actual coverage probability, given TGP's prior on θ_i, 90% true CP, PTW.
Figure 4.25: Prediction actual coverage probability, given Higdon's prior on θ_i, 95% true CP, PTW.
Figure 4.26: Prediction actual coverage probability, given GEM's prior on θ_i, 95% true CP, PTW.
Figure 4.27: Prediction actual coverage probability, given TGP's prior on θ_i, 95% true CP, PTW.

We observe that the ACP for TGP's prior shows slight over-coverage. The ACP values for Higdon's prior and GEM's prior are satisfactory, since they are around the nominal coverage probability (90%, 95%).

The results from PTW are consistent with those obtained from Borehole: in terms of prediction accuracy, Higdon's prior on θ_i is favoured and a large number of runs (10d) is always preferred. The Actual Coverage Probabilities of all 8 combinations under Higdon's prior on θ_i are acceptable.

4.4 Real Computer Model 3: G-protein

In this section, the real outputs are the G-protein dataset, which was used as a testbed in Chapter 3.

4.4.1 Prediction Accuracy

The Norm-RMSE results are presented in Figures 4.28, 4.29 and 4.30. Each combination has 25 Norm-RMSE numbers.

Figure 4.28: Prediction accuracy, given Higdon's prior on θ_i, G-protein.
Figure 4.29: Prediction accuracy, given GEM's prior on θ_i, G-protein.
Figure 4.30: Prediction accuracy, given TGP's prior on θ_i, G-protein.

From Figures 4.28, 4.29 and 4.30, we can see that

- A large number of runs (10d) gives more accurate predictions than a small number of runs (5d).

Considering both the main effects and the interactions, an ANOVA for the Normalized RMSE numbers is given in Table 4.5.

                      Df  Sum Sq  Mean Sq  F value  Pr(>F)
theta                  2    0.03     0.01    10.55    0.00 ***
sigma                  1    0.00     0.00     1.73    0.19
reg                    1    0.03     0.03    28.01    0.00 ***
runs                   1    1.49     1.49  1192.70    0.00 ***
theta:sigma            2    0.03     0.01    10.16    0.00 ***
theta:reg              2    0.03     0.02    13.17    0.00 ***
sigma:reg              1    0.02     0.02    17.37    0.00 ***
theta:runs             2    0.02     0.00     6.87    0.00 **
sigma:runs             1    0.03     0.03    26.99    0.00 ***
reg:runs               1    0.01     0.01     4.89    0.03 *
theta:sigma:reg        2    0.02     0.01     8.09    0.00 ***
theta:sigma:runs       2    0.02     0.01     8.96    0.00 ***
theta:reg:runs         2    0.02     0.01     8.24    0.00 ***
sigma:reg:runs         1    0.01     0.01    11.86    0.00 ***
theta:sigma:reg:runs   2    0.03     0.01    10.76    0.00 ***
Residuals            576    0.72     0.00

Table 4.5: ANOVA results for the G-protein computer model.

The ANOVA of the G-protein data indicates that, except for the main effect of the prior on σ², all of the remaining factors, including all of the 2-factor, 3-factor and even 4-factor interactions, are significant at significance level 0.05. However, we also observe that the p-values for the prior on θ_i and the regression terms are larger than for the Borehole and PTW datasets. From visual inspection of the plots in Figures 4.31, 4.32 and 4.33, we cannot see an obvious performance difference for the two "significant" factors, the prior on θ and the regression terms. The number of runs (10d versus 5d) is important, however.

Figure 4.31: Average prediction accuracy, given Higdon's prior on θ_i, G-protein.
Figure 4.32: Average prediction accuracy, given GEM's prior on θ_i, G-protein.
Figure 4.33: Average prediction accuracy, given TGP's prior on θ_i, G-protein.

From all of the figures and Table 4.5, our conclusion is that only the number of runs is obviously important for the prediction accuracy and, again, a large number (10d) is preferred.

4.4.2 Actual Coverage Probability

For each combination, we report the average value of its ACP in Figures 4.34-4.36 and Figures 4.37-4.39. The red lines in Figures 4.34-4.36 and Figures 4.37-4.39 represent the correct 90% and 95% coverage probability, respectively.

Figure 4.34: Prediction actual coverage probability, given Higdon's prior on θ_i, 90% true CP, G-protein.
Figure 4.35: Prediction actual coverage probability, given GEM's prior on θ_i, 90% true CP, G-protein.
Figure 4.36: Prediction actual coverage probability, given TGP's prior on θ_i, 90% true CP, G-protein.
Figure 4.37: Prediction actual coverage probability, given Higdon's prior on θ_i, 95% true CP, G-protein.
Figure 4.38: Prediction actual coverage probability, given GEM's prior on θ_i, 95% true CP, G-protein.
Figure 4.39: Prediction actual coverage probability, given TGP's prior on θ_i, 95% true CP, G-protein.

We observe that the ACP values for Higdon's prior and GEM's prior show slight under-coverage. The worst case for Higdon's prior is 74.105% for 90% nominal CP and 80.455% for 95% nominal CP. The worst case for GEM's prior is 77.875% for 90% nominal CP and 84.135% for 95% nominal CP. Although the ACP values for Higdon's prior and GEM's prior are not as good as they were in the previous experiments, they are still much better than the MLE. Moreover, we also observe that Higdon's prior and GEM's prior have better ACP with Jeffreys' prior on σ².
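The Norm-RMSE numbers reported throughout this chapter come from the criterion defined in Chapter 3: a root mean squared prediction error over the test set, scaled to be comparable across outputs. A hedged sketch of the flavor of such a criterion, normalizing here by the spread of the true test outputs (the exact normalizer used in Chapter 3 may differ):

```python
import numpy as np

def norm_rmse(y_true, y_pred):
    """Root mean squared prediction error over the test set, divided by
    the standard deviation of the true outputs. NOTE: this normalizer is
    an assumption for illustration; Chapter 3 defines the thesis's exact
    version of the criterion."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return float(rmse / np.std(y_true))

# A perfect predictor gives 0; predicting the overall mean gives ~1,
# so values well below 1 indicate a genuinely informative emulator.
y = np.array([1.0, 2.0, 3.0, 4.0])
print(norm_rmse(y, y))                      # 0.0
print(norm_rmse(y, np.full(4, y.mean())))   # 1.0
```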
4.5 Summary of the Real Computer Models

So far, we have conducted the full factorial experiment on outputs from three real computer models: Borehole, PTW and G-protein. We can draw some common conclusions from the above analyses. First of all, in terms of prediction accuracy, which is measured by Normalized RMSE, we have:

- The prior on θ_i usually has a significant effect on the prediction accuracy, and Higdon's prior on θ_i is preferred.
- The number of runs is significant for the prediction accuracy, and a large sample size (10d) is favoured.
- There is a lack of evidence to conclude that the regression terms for the underlying GP have significant effects if Higdon's prior or GEM's prior on θ_i is assumed.
- There is not enough evidence to conclude that the prior on σ² significantly affects the prediction accuracy.

Secondly, in terms of the prediction Actual Coverage Probability, Higdon's prior and GEM's prior on θ_i combined with Jeffreys' prior on σ² show good coverage probability performance for all three applications.

Chapter 5: In-Depth Comparison: Simulation Study

In this chapter, instead of working on outputs from computer models, we simulate data from a Gaussian Process to conduct the experiments.

5.1 Details for the Simulation Study

From the previous analyses, we know that the prior on σ² does not significantly affect the prediction accuracy, and that the regression model does not matter if Higdon's prior or GEM's prior is assumed on the correlation parameters. Therefore, from this point on we focus on the two significant factors and drop the remaining factors; i.e., we fix the prior on σ² as Jeffreys' prior and fix the regression term for the underlying GP as a constant. The two varying factors and their levels are summarized below.

- Prior on θ_i: 3 levels, Higdon's prior, TGP's prior and GEM's prior.
- Number of runs: 2 levels, 5d and 10d.

Hence, in total, we get 3 × 2 = 6 combinations.

5.1.1 Data Generating Procedures

We simulate data from a Gaussian Process and carry out the full factorial experiments on the simulated data. We fix the dimension at d = 5 and the number of replications at 50. For fixed true θ_i values, the procedure to generate the training data is as follows:

- Step 1. Generate μ from N(0, 300²).
- Step 2. Generate σ² from IG(0.1, 5).
- Step 4. Given the true θ_i values, compute the correlation matrix R.
- Step 5. Generate y from the multivariate normal N(μ, σ²R).
- Step 6. Repeat Step 1 to Step 5 for 50 times.

Then, we generate 1000 testing points based on a random Latin Hypercube Design (LHD), using the R package lhs, as the test set.

5.1.2 Choice of True θ_i

In general, there are two ways to determine the true θ_i: fix the true θ_i beforehand, or sample the true θ_i from a specific distribution at each replication. We tried both ways, but in this thesis we only report the results obtained from the first way, because under the second some large θ_i values are sampled due to randomness, even if the sampling distribution assigns heavy weight to small θ_i values, and very large θ_i values lead to unpredictable functions.

Fixing the True θ_i According to a Canonical Configuration

In order to determine the θ_i, we adopt a two-parameter class of canonical configurations proposed by Loeppky et al. (2009). Let τ = Σ_{i=1}^{d} θ_i and ρ = Σ_{i=1}^{d} θ_i². The two-parameter class of canonical configurations is defined by

\theta_i = \tau\left[\left(1 - \frac{i-1}{d}\right)^b - \left(1 - \frac{i}{d}\right)^b\right], \quad i = 1, \dots, d, \quad \tau \ge 0,\ b \ge 1.   (5.1)

- τ is a scale parameter, which controls the magnitude of the θ_i.
- b is a parameter that controls the sparseness.

In Section 5.2, we let b = 3 and τ = 1, so the true θ_i are

(0.488, 0.296, 0.152, 0.056, 0.008).

We name this simulation scenario 1. Then, in Section 5.3, we keep b = 3 but increase τ to 3, and the true θ_i change to

(1.464, 0.888, 0.456, 0.168, 0.024).

We name this simulation scenario 2.
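Equation (5.1) is easy to implement and check against the two scenarios above; a minimal sketch (the function name is ours):

```python
def canonical_theta(d, tau, b):
    """Two-parameter canonical configuration of Loeppky et al. (2009):
    theta_i = tau * [(1 - (i-1)/d)^b - (1 - i/d)^b], i = 1..d.
    The theta_i are positive, decreasing in i, and telescope to sum tau."""
    return [tau * ((1 - (i - 1) / d) ** b - (1 - i / d) ** b)
            for i in range(1, d + 1)]

# Scenario 1: b = 3, tau = 1
print([round(t, 3) for t in canonical_theta(5, 1, 3)])
# [0.488, 0.296, 0.152, 0.056, 0.008]

# Scenario 2: b = 3, tau = 3 -> each theta_i is simply tripled
print([round(t, 3) for t in canonical_theta(5, 3, 3)])
# [1.464, 0.888, 0.456, 0.168, 0.024]
```

Because the bracketed terms telescope, τ is exactly the sum of the θ_i, which is what makes it a clean scale parameter.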
In both scenarios the sparseness is the same, but the magnitude of the θ_i in scenario 2 is larger than in scenario 1.

Except for the facts that the data are simulated from a Gaussian Process and that only two factors are considered, all of the other details in Chapter 5 are identical to those in Chapter 4.

5.2 Simulation Scenario 1

As discussed above, in this section the true θ_i are fixed as (0.488, 0.296, 0.152, 0.056, 0.008).

5.2.1 Prediction Accuracy

The criterion is still Normalized RMSE. The scatter plots for TGP's prior versus Higdon's prior and for GEM's prior versus Higdon's prior are provided in Figures 5.1 and 5.2.

Figure 5.1: Norm-RMSE for TGP's prior versus Higdon's prior, scenario 1.
Figure 5.2: Norm-RMSE for GEM's prior versus Higdon's prior, scenario 1.

We can see that Higdon's prior on θ is much better than TGP's prior and slightly better than GEM's prior. In addition, n = 10d has better accuracy than n = 5d. These two conclusions agree with those drawn from the analyses of the real computer models.

5.2.2 Actual Coverage Probability

For each combination, we have 50 numbers for its ACP. We take the mean for each combination and report the means in Tables 5.1 and 5.2. The true coverage probability is 90% and 95%, respectively.

                 5d      10d
Higdon's prior   89.9%   90.7%
TGP's prior      96.6%   95.6%
GEM's prior      94.5%   92.4%

Table 5.1: Actual coverage probability, 90% true CP, scenario 1.

                 5d      10d
Higdon's prior   94.1%   95.3%
TGP's prior      98.7%   98.8%
GEM's prior      96.9%   96.4%

Table 5.2: Actual coverage probability, 95% true CP, scenario 1.

From Tables 5.1 and 5.2, the ACP of Higdon's prior is the closest to the true coverage probability among the three priors.
Besides, we also notice that as the number of runs increases, the ACP tends to be closer to the true coverage probability.

5.3 Simulation Scenario 2

In this section, the true θ_i are fixed as (1.464, 0.888, 0.456, 0.168, 0.024).

5.3.1 Prediction Accuracy

The scatter plots for TGP's prior versus Higdon's prior and for GEM's prior versus Higdon's prior are provided in Figures 5.3 and 5.4.

Figure 5.3: Norm-RMSE for TGP's prior versus Higdon's prior, scenario 2.
Figure 5.4: Norm-RMSE for GEM's prior versus Higdon's prior, scenario 2.

From Figures 5.3 and 5.4, we reach exactly the same two conclusions as in the previous analyses, so we do not restate them. Comparing Figures 5.3 and 5.4 with Figures 5.1 and 5.2, it is obvious that the prediction accuracy in scenario 1 is better than in scenario 2. The reason lies in the magnitude of the θ_i: in general, large θ_i values reduce the correlation between points in the input space and thus increase the difficulty of prediction.

5.3.2 Actual Coverage Probability

For each combination, we have 50 numbers for its ACP. We take the mean for each combination and report it in Tables 5.3 and 5.4. The true coverage probability is 90% and 95%, respectively.

                 5d      10d
Higdon's prior   93.6%   91.1%
TGP's prior      94.2%   93.9%
GEM's prior      95.2%   92.8%

Table 5.3: Actual coverage probability, 90% true CP, scenario 2.

                 5d      10d
Higdon's prior   96.4%   95.5%
TGP's prior      97.2%   96.4%
GEM's prior      97.8%   96.5%

Table 5.4: Actual coverage probability, 95% true CP, scenario 2.

From Tables 5.3 and 5.4, the ACP of Higdon's prior is again the closest to the true coverage probability among the three priors, and as the number of runs increases, the ACP tends to be closer to the correct coverage probability.

5.4 Summary of the Simulation Study

From the simulation study, we can safely draw the following conclusions.

- In terms of both the prediction accuracy and the actual coverage probability, Higdon's prior on θ_i is highly preferred.
- In terms of both the prediction accuracy and the actual coverage probability, a large number of runs (n = 10d) is highly preferred.

Chapter 6: Conclusion & Discussion

Bayesian methods are currently popular in the statistical community; in particular, several statistical methods for the analysis of computer experiments exist within the Bayesian paradigm. Before making any concluding remarks, let us first concisely summarize what we have done. We reviewed three well-known Bayesian methods in Chapter 2. The tentative comparison in Chapter 3 suggested that the performances of the three Bayesian approaches are quite different. Motivated by the differences, we specified 4 factors and allocated different levels for the factors according to the three existing methods. With two assessment criteria in mind, full factorial designs were then conducted both on real computer codes in Chapter 4 and on simulated data in Chapter 5, with the purpose of identifying the important factors and their best levels.

Here, we reiterate our findings. Among the 4 factors we specified:

- The prior on the correlation parameters θ_i is significant for the prediction performance, and Higdon's prior, which favours small θ values, is highly preferred.
- The number of runs significantly affects the prediction performance, and a large number (10d) is favoured.
- So far, there is not enough evidence to conclude that the prior on σ² is significant for prediction accuracy.
- So far, there is not enough evidence to conclude that the regression term for the Gaussian Process is important for prediction accuracy when Higdon's prior or GEM's prior is assumed for the correlation parameters.

Through the previous analyses, we not only identified the significant factors but also discovered the best level for each factor. The best combination that we recommend is:

- Higdon's prior for the correlation parameters.
- n = 10d as the number of runs.
- Jeffreys' prior for σ².
- A constant model for the underlying Gaussian Process.

Although the prior on σ² is not significant for prediction accuracy, we recommend Jeffreys' prior on σ² for the benefit of actual coverage probability.

In addition, the constant model is suggested over a full linear model for the Gaussian Process. The full linear model is more computationally intensive than the constant model, and although it has better prediction accuracy under TGP's prior on θ_i, this difference disappears under Higdon's prior on θ_i, which is, in fact, the prior we recommend. Also, since our research does not involve extrapolation of the data, we deem a constant model sufficient for the Gaussian Process.

In terms of approximating the output of a computer model, the Gaussian Process model is currently the most popular statistical method in practice. It is also of paramount importance to search for a suitable prior that can be set as the default for the correlation parameters in a Bayesian analysis. Higdon's prior is a useful informative prior that allocates heavy weight to small θ_i values. From all of the analyses conducted in this thesis, Higdon's prior is highly preferred and can be trusted as a default prior for future analyses.

A large number of runs is favoured throughout our analyses. This makes intuitive sense, since a larger sample contains more information that can be utilized to make more accurate predictions. To some degree, the prediction accuracy improves as the number of runs increases. However, increasing the number of runs also increases the computational cost of the analysis, since the number of rows and columns of the correlation matrix increases accordingly. Let n be the number of runs.
As n becomes very large, the n × n correlation matrix R becomes too "cumbersome" to invert or to use in other matrix manipulations. Currently, we believe that n = 10d, where d is the dimension of the data, is enough for well-behaved problems, such as most of the demonstration examples in this thesis.

In future, we would like to continue our research on the Bayesian approach to the analysis of computer experiments. One improvement building on this thesis would be to add more choices for the correlation structure. We have focused only on the Gaussian correlation structure here. However, it is also recognized (Chen et al. (2013)) that the Gaussian correlation structure performs badly under some circumstances. Therefore, adding more correlation structures, such as the power exponential, is needed in future work.

Bibliography

Abt, M. (1999). Estimating the prediction mean squared error in Gaussian stochastic processes with exponential correlation structure. Scandinavian Journal of Statistics, 26:563-578.

Ba, S. and Joseph, R. (2012). Composite Gaussian process models for emulating expensive functions. Annals of Applied Statistics, 6(4):1838-1860.

Bastos, L. and O'Hagan, A. (2009). Diagnostics for Gaussian process emulators. Technometrics, 51(4):425-438.

Berger, J., Oliveira, V. D., and Sanso, B. (2001). Objective Bayesian analysis of spatially correlated data. Journal of the American Statistical Association, 96:1361-1374.

Chen, H., Loeppky, J., Sacks, J., and Welch, W. (2013). A laboratory to assess analysis methods for computer experiments.

Gelman, A., Carlin, J., Stern, H., and Rubin, D. (1995). Bayesian Data Analysis. Chapman and Hall.

Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721-741.

Gilks, W., Richardson, S., and Spiegelhalter, D. (1996). Markov Chain Monte Carlo in Practice. Chapman and Hall.

Gramacy, R.
(2005). Bayesian Treed Gaussian Process Models. PhD thesis, University of California, Santa Cruz.

Gramacy, R. and Lee, H. (2008). Bayesian treed Gaussian process models with an application to computer modeling. Journal of the American Statistical Association, 103(483):1119-1130.

Gramacy, R. and Lee, H. (2009). Adaptive design and analysis of supercomputer experiments. Technometrics, 51(2):130-145.

Hartigan, J. (1964). Invariant prior distributions. Annals of Mathematical Statistics, 35:836-845.

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57:97-109.

Higdon, D., Gattiker, J., Williams, B., and Rightley, M. (2008). Computer model calibration using high-dimensional output. Journal of the American Statistical Association, 103(482):570-583.

Higdon, D., Kennedy, M., Cavendish, J., Cafeo, J., and Ryne, R. (2004). Combining field observations and simulations for calibration and prediction. SIAM Journal of Scientific Computing, 26:448-466.

Kennedy, M. (2004). Description of the Gaussian process model used in GEM-SA. Technical report, University of Sheffield.

Kennedy, M. and O'Hagan, A. (2001). Bayesian calibration of computer models. Journal of the Royal Statistical Society, Series B, 63(3):425-464.

Loeppky, J., Sacks, J., and Welch, W. (2009). Choosing the sample size of a computer experiment: A practical guide. Technometrics, 51:366-376.

McKay, M., Beckman, R., and Conover, W. (1979). A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2):239-245.

Metropolis, N., Rosenbluth, A., Rosenbluth, M., and Teller, A. (1953). Equations of state calculations by fast computing machines. Journal of Chemical Physics, 21:1087-1091.

Morris, M., Mitchell, T., and Ylvisaker, D. (1993). Bayesian design and analysis of computer experiments: Use of derivatives in surface prediction. Technometrics, 35:243-255.

Nelder, J. A. and Mead, R. (1965).
A simplex method for function minimization. Computer Journal, 7:308-313.

Preston, D., Tonks, D., and Wallace, D. (2003). Model of plastic deformation for extreme loading conditions. Journal of Applied Physics, 93:211-220.

Robert, C. (2001). The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation. Springer.

Roberts, G. O., Gelman, A., and Gilks, W. R. (1997). Weak convergence and optimal scaling of random walk Metropolis algorithms. The Annals of Applied Probability, 7:110-120.

Sacks, J., Welch, W., Mitchell, T., and Wynn, H. (1989). Design and analysis of computer experiments. Statistical Science, 4(4):409-435.

Santner, T., Williams, B., and Notz, W. (2003). The Design and Analysis of Computer Experiments. Springer Series in Statistics. Springer, first edition.

Shanno, D. F. and Kettler, P. C. (1970). Optimal conditioning of quasi-Newton methods. Mathematics of Computation, 24:657-664.

Smith, B. (2007). boa: An R package for MCMC output convergence assessment and posterior inference. Journal of Statistical Software, 21:1-37.

Welch, W., Buck, R., Sacks, J., Wynn, H., Mitchell, T., and Morris, M. (1992). Screening, predicting, and computer experiments. Technometrics, 34(1):15-25.

Yi, T.-M., Fazel, M., Liu, X., Otitoju, T., Goncalves, J., Papachristodoulou, A., Prajna, S., and Doyle, J. (2005). Application of robust model validation using SOSTOOLS to the study of G-protein signaling in yeast. Proceedings of the First Conference on Foundations of Systems Biology in Engineering, FOSBE.

Appendix A: Posterior Distribution of θ_i for π(σ²) = IG(α₁, α₂)

We present the detailed derivation of the posterior distribution of θ_i here. The final results were given by equations (4.5) and (4.6). The posterior distribution of θ_i is obtained by integrating σ² and β out.

(1) If π(σ²) = IG(α₁, α₂) and π(β) = 1, then

p(\theta \mid y) \propto \iint \pi(\theta, \beta, \sigma^2)\, L(y \mid \theta, \beta, \sigma^2)\, d\beta\, d\sigma^2
\propto \iint \pi(\theta)\, \mathrm{IG}(\alpha_1, \alpha_2)\, \frac{1}{(\sigma^2)^{n/2} |R|^{1/2}} \exp\!\left\{-\frac{(y - F\beta)^T R^{-1} (y - F\beta)}{2\sigma^2}\right\} d\beta\, d\sigma^2
\propto \frac{\pi(\theta)}{|R|^{1/2}} \int \frac{\mathrm{IG}(\alpha_1, \alpha_2)}{(\sigma^2)^{n/2}} \int \exp\!\left\{-\frac{(y - F\beta)^T R^{-1} (y - F\beta)}{2\sigma^2}\right\} d\beta\, d\sigma^2.   (1)

Let us first integrate out β. Since \hat\beta = (F^T R^{-1} F)^{-1} F^T R^{-1} y, we have

(y - F\beta)^T R^{-1} (y - F\beta) = \beta^T F^T R^{-1} F \beta - \beta^T F^T R^{-1} y - y^T R^{-1} F \beta + y^T R^{-1} y
= (\beta - \hat\beta)^T (F^T R^{-1} F)(\beta - \hat\beta) - \hat\beta^T F^T R^{-1} F \hat\beta + y^T R^{-1} y.

Therefore, writing E for \exp\{-(y - F\beta)^T R^{-1} (y - F\beta)/(2\sigma^2)\},

\int E\, d\beta = \exp\!\left\{-\frac{y^T R^{-1} y - \hat\beta^T F^T R^{-1} F \hat\beta}{2\sigma^2}\right\} \int \exp\!\left\{-\frac{(\beta - \hat\beta)^T (F^T R^{-1} F)(\beta - \hat\beta)}{2\sigma^2}\right\} d\beta
= \exp\!\left\{-\frac{y^T R^{-1} y - \hat\beta^T F^T R^{-1} F \hat\beta}{2\sigma^2}\right\} \left|\frac{F^T R^{-1} F}{\sigma^2}\right|^{-1/2},   (2)

since, up to a multiplicative constant, \left|\frac{F^T R^{-1} F}{\sigma^2}\right|^{1/2} \exp\!\left\{-\frac{(\beta - \hat\beta)^T (F^T R^{-1} F)(\beta - \hat\beta)}{2\sigma^2}\right\} is the density of N\!\left(\hat\beta, \left[\frac{F^T R^{-1} F}{\sigma^2}\right]^{-1}\right) and integrates to 1.

Combining this with the fact that \hat\sigma^2 = (y - F\hat\beta)^T R^{-1} (y - F\hat\beta)/(n-k), we continue from (2):

\int E\, d\beta = \exp\!\left\{-\frac{(y - F\hat\beta)^T R^{-1} (y - F\hat\beta)}{2\sigma^2}\right\} \left|\frac{F^T R^{-1} F}{\sigma^2}\right|^{-1/2}
= \exp\!\left\{-\frac{(n-k)\hat\sigma^2}{2\sigma^2}\right\} \left|\frac{F^T R^{-1} F}{\sigma^2}\right|^{-1/2}.

Hence, continuing from (1):

p(\theta \mid y) \propto \frac{\pi(\theta)}{|R|^{1/2}} \int \frac{\mathrm{IG}(\alpha_1, \alpha_2)}{(\sigma^2)^{n/2}} \exp\!\left\{-\frac{(n-k)\hat\sigma^2}{2\sigma^2}\right\} \left|\frac{F^T R^{-1} F}{\sigma^2}\right|^{-1/2} d\sigma^2
\propto \frac{\pi(\theta)}{|R|^{1/2} |F^T R^{-1} F|^{1/2}} \int \frac{(\sigma^2)^{-\alpha_1 - 1} \exp\{-\alpha_2/\sigma^2\}}{(\sigma^2)^{(n-k)/2}} \exp\!\left\{-\frac{(n-k)\hat\sigma^2}{2\sigma^2}\right\} d\sigma^2
\propto \frac{\pi(\theta)}{|R|^{1/2} |F^T R^{-1} F|^{1/2}} \int (\sigma^2)^{-\left(\alpha_1 + \frac{n-k}{2} + 1\right)} \exp\!\left\{-\frac{\alpha_2 + \frac{n-k}{2}\hat\sigma^2}{\sigma^2}\right\} d\sigma^2
\propto \frac{\pi(\theta)}{|R|^{1/2} |F^T R^{-1} F|^{1/2}} \cdot \frac{\Gamma\!\left(\alpha_1 + \frac{n-k}{2}\right)}{\left(\alpha_2 + \frac{n-k}{2}\hat\sigma^2\right)^{\alpha_1 + \frac{n-k}{2}}}
\propto \pi(\theta)\, |R|^{-1/2}\, |F^T R^{-1} F|^{-1/2} \left(\alpha_2 + \tfrac{n-k}{2}\hat\sigma^2\right)^{-\left(\alpha_1 + \frac{n-k}{2}\right)},   (3)

since the \mathrm{IG}\!\left(\alpha_1 + \frac{n-k}{2},\ \alpha_2 + \frac{n-k}{2}\hat\sigma^2\right) density integrates to 1.

Equation (3) is the posterior distribution of θ given by (4.5), which allows MCMC.

Appendix B: Posterior Distribution of θ_i for π(σ²) ∝ 1/σ²

(2) If π(σ²) ∝ 1/σ² and π(β) = 1: the Jeffreys prior is the limiting case of the Inverse Gamma as α₁ → 0 and α₂ → 0. Plugging α₁ = α₂ = 0 into equation (3), we have

p(\theta \mid y) \propto \pi(\theta)\, |R|^{-1/2}\, |F^T R^{-1} F|^{-1/2} \left(\tfrac{n-k}{2}\hat\sigma^2\right)^{-\frac{n-k}{2}}
\propto \pi(\theta)\, |R|^{-1/2}\, |F^T R^{-1} F|^{-1/2} \left(\hat\sigma^2\right)^{-\frac{n-k}{2}}.   (4)

Equation (4) is the posterior distribution of θ_i given by (4.6).
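As a concreteness check on equation (4), the log of this marginal posterior is straightforward to evaluate numerically for a flat π(θ), a constant regression term (F a column of ones), and a Gaussian correlation structure. The following sketch is ours (all names and the toy data are hypothetical, and this is an illustration of the formula, not the thesis's implementation):

```python
import numpy as np

def gauss_corr(X, theta):
    """Gaussian correlation: R[a, b] = exp(-sum_i theta_i * (x_ai - x_bi)^2)."""
    diff2 = (X[:, None, :] - X[None, :, :]) ** 2
    return np.exp(-np.tensordot(diff2, theta, axes=([2], [0])))

def log_post_theta(theta, X, y):
    """log p(theta | y) up to an additive constant, per equation (4):
    -0.5*log|R| - 0.5*log|F'R^-1 F| - 0.5*(n-k)*log(sigma2_hat),
    assuming pi(theta) = 1 and a constant regression term."""
    n = len(y)
    F = np.ones((n, 1))
    k = F.shape[1]
    R = gauss_corr(X, np.asarray(theta, dtype=float))
    Ri_y = np.linalg.solve(R, y)
    Ri_F = np.linalg.solve(R, F)
    FtRiF = F.T @ Ri_F                              # F' R^-1 F (1x1 here)
    beta_hat = np.linalg.solve(FtRiF, F.T @ Ri_y)   # GLS estimate of beta
    resid = y - F @ beta_hat
    s2_hat = resid @ np.linalg.solve(R, resid) / (n - k)
    _, logdet_R = np.linalg.slogdet(R)
    _, logdet_FtRiF = np.linalg.slogdet(FtRiF)
    return -0.5 * logdet_R - 0.5 * logdet_FtRiF - 0.5 * (n - k) * np.log(s2_hat)

# Tiny hypothetical data set: d = 2 inputs, n = 6 runs.
rng = np.random.default_rng(0)
X = rng.random((6, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1]
print(log_post_theta([0.5, 0.5], X, y))  # a finite log-density value
```

In an MCMC sampler for θ, this function would supply the (unnormalized) log target density at each proposed θ.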