Open Collections: UBC Theses and Dissertations

Design and analysis of computer experiments: assessing and advancing the state of the art (Chen, Hao, 2018)
Full Text

Design and Analysis of Computer Experiments: Assessing and Advancing the State of the Art

by Hao Chen

M.Sc., Statistics, The University of British Columbia, 2013
B.Sc., Statistics, Renmin University of China, 2011

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in The Faculty of Graduate and Postdoctoral Studies (Statistics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

March 2018

© Hao Chen 2018

Abstract

Computer experiments have been widely used in practice as important supplements to traditional laboratory-based physical experiments in studying complex processes. However, a computer experiment is in general based on a computer model that is expensive to run, so only a limited number of runs is feasible. Sacks et al. (1989) proposed using a Gaussian process (GP) as a statistical surrogate, which has become a standard way to emulate a computer model. In this thesis, we are concerned with the design and analysis of computer experiments based on a GP. We argue that comprehensive, evidence-based assessment strategies are needed when comparing different model options and designs.

We first focus on the regression component and the correlation structure of a GP. We use comprehensive assessment strategies to evaluate the effect of these two factors on prediction accuracy. We also carry out a limited evaluation of empirical Bayes and full Bayes methods, from which we notice that Bayes methods with a squared-exponential correlation structure do not yield satisfactory prediction accuracy in some of the examples considered. Hence, we propose hybrid and full Bayes methods with flexible correlation structures. Through empirical studies, we show that the new Bayes methods with flexible structures not only have better prediction accuracy but also quantify uncertainty better.

In addition, we are interested in assessing the effect of design on prediction accuracy. We consider a number of popular designs in the literature and use several examples to evaluate their performance. It turns out that the performance difference between designs is small for most of the examples considered. The evaluation of designs motivates us to use a sequential design strategy. We compare the performance of two existing sequential search criteria and handle several important issues in using sequential design to estimate an extreme probability or quantile of a floor supporting system. Several useful recommendations are also made.

In summary, the thesis concerns both the design and analysis of computer experiments: we not only assess the performance of existing methods, but also propose new methods motivated by the assessment and provide insights into issues faced by practitioners.

Lay Summary

Computer experiments have been widely used in practice as important supplements to traditional physical experiments in studying complex processes. However, a computer experiment is expensive in terms of its computational time. Hence, the Gaussian process (GP) has been proposed as a statistical surrogate. The scope of the thesis is rather broad: we are concerned with the design and analysis of computer experiments based on a GP. We use comprehensive assessment strategies to evaluate the effect of several factors on the prediction accuracy of the GP model. In addition, we propose new methods motivated by the assessment. Working with an engineering computer model, we also provide insights into issues faced by practitioners.

Preface

This dissertation was written under the supervision of Prof. William J. Welch.
The research questions and the proposed new methods were discussed with Prof. Welch during our weekly research meetings. I am responsible for all major areas of concept formation, mathematical derivation, simulation study, and thesis composition. Prof. Welch spent considerable time helping me formulate the problems, find solutions, and edit the drafts. Prof. Jim Zidek also spent much time providing helpful suggestions and editing the drafts.

Chapter 2 is based on a published paper: Chen, H., Loeppky, J., Sacks, J. and Welch, W. (2016), "Analysis Methods for Computer Experiments: How to Assess and What Counts?", Statistical Science, 31, 40-60. I conducted the simulation study and created the results for the key tables and figures in the article.

Chapter 3 is based on a published paper: Chen, H., Loeppky, J. and Welch, W., "Flexible Correlation Structure for Accurate Prediction and Uncertainty Quantification in Bayesian Gaussian Process Emulation of a Computer Model", SIAM/ASA Journal on Uncertainty Quantification, 5, 598-620. The paper extends my M.Sc. thesis. I searched for and found a suitable prior distribution for the smoothness parameter in the Bayesian analysis. In addition, I drafted the article.

Chapter 4 is based on a manuscript: Chen, H., Loeppky, J., Sacks, J. and Welch, W., "Evaluation of Designs for Computer Experiments". The manuscript will be submitted for journal publication in the future.

Chapter 5 will be converted to a manuscript for future journal publication: Chen, H., and Welch, W., "Sequential Computer Experimental Design for Estimating an Extreme Probability or Quantile". This manuscript is based on a collaborative project between the Department of Statistics of UBC and FP Innovations. The computer model of a floor system used in the article was suggested by Conroy Lum from FP Innovations.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication
1 Introduction
  1.1 Computer Models
  1.2 Analysis of Computer Experiments
    1.2.1 Gaussian Process
    1.2.2 Prediction
    1.2.3 Parameter Estimation
  1.3 Design of Computer Experiments
    1.3.1 Fixed Design
    1.3.2 Review of Two Fixed Designs Utilized in Chapters 2 and 3
    1.3.3 Sequential Design
  1.4 Brief Overview of the Thesis
2 Evaluation of the Analysis Methods for Computer Experiments
  2.1 Fast Computer Models
    2.1.1 Borehole Model
    2.1.2 G-protein Model
    2.1.3 PTW Model
  2.2 Slow Computer Models
    2.2.1 Nilson-Kuusk Model
    2.2.2 Volcano Model
    2.2.3 Sea Ice Model
  2.3 Other Modelling Strategies
    2.3.1 Full Bayes
    2.3.2 Non-stationarity
    2.3.3 Adding a Nugget Term
  2.4 Comments
    2.4.1 Uncertainty of Prediction
    2.4.2 Designs
    2.4.3 Larger Sample Sizes
  2.5 Conclusions and Recommendations
3 Flexible Correlation Structures in Bayesian Gaussian Process
  3.1 Review of Two Full Bayes Methods
  3.2 A Motivating Example: Nilson-Kuusk Model
  3.3 New Bayesian Methods
    3.3.1 Priors for β, σ², and θ
    3.3.2 Power-Exponential-Hybrid
    3.3.3 Power-Exponential-Full
    3.3.4 Matérn-Hybrid
    3.3.5 Marginal Posterior Distribution of the Correlation Parameters
    3.3.6 Metropolis-Hastings Algorithm
  3.4 Applications and Simulation Study
    3.4.1 Nilson-Kuusk Model
    3.4.2 Volcano Model
    3.4.3 Borehole Function
    3.4.4 PTW Model
    3.4.5 Simulations from GPs
  3.5 Conclusions and Discussion
4 Evaluation of Designs for Computer Experiments
  4.1 A Review of Easily Constructed Designs
    4.1.1 Random LHD
    4.1.2 Maximin LHD
    4.1.3 Transformed LHD
    4.1.4 Orthogonal Array-based LHD
    4.1.5 Sobol Sequence
    4.1.6 Uniform Design
    4.1.7 Sparse Grid Design
  4.2 Main Comparison
    4.2.1 Borehole Model
    4.2.2 PTW Model
    4.2.3 Weighted Franke's Function
    4.2.4 Corner-peak Function: Original Scale
    4.2.5 Corner-peak Function: Logarithmic Scale
  4.3 Comparison of Projection Designs
    4.3.1 Borehole Function
    4.3.2 PTW Function
    4.3.3 Weighted Franke's Function
    4.3.4 Corner-peak Function: Logarithmic Scale
  4.4 Non-Space-Filling Design: SGD
  4.5 Discussion
    4.5.1 The Effect of the Regression Component
    4.5.2 Grid Generation
  4.6 Some Concluding Remarks
5 Sequential Computer Experimental Design for Estimating an Extreme Probability or Quantile
  5.1 Introduction
  5.2 Computer Model of a Floor System
  5.3 Sequential Experimental Design
    5.3.1 Sequential Algorithms
    5.3.2 Expected Improvement Criterion
    5.3.3 Hypothesis Testing-Based Criterion (Distance-Based Criterion)
  5.4 An Example: Short Column Function
    5.4.1 Probability Approximation
    5.4.2 Quantile Estimation
  5.5 Application to the Computer Model of the Floor System
    5.5.1 Preliminary Analysis
    5.5.2 Modelling the Input Distribution
    5.5.3 MC Set
    5.5.4 True Probability and Quantile
    5.5.5 Application Results
  5.6 Comments and Discussion
    5.6.1 Diagnostics
    5.6.2 Final Remarks
6 Conclusions and Future Work
  6.1 Summary of the Thesis
  6.2 Future Work
Bibliography
Appendix A: Supplementary Materials for Chapter 2
  A.1 Test Functions in Chapter 2
    A.1.1 Borehole Model
    A.1.2 G-protein Model
  A.2 Results of Normalized Maximum Absolute Errors in Chapter 2
Appendix B: Supplementary Materials for Chapter 3
  B.1 Results of Normalized Maximum Absolute Errors
  B.2 Marginal Posterior Distribution of the Correlation Parameters
  B.3 Posterior Distributions and Acceptance Rates
Appendix C: Supplementary Materials for Chapter 4
  C.1 Results of Normalized Max Absolute Errors for the Main Comparison
  C.2 Results of Normalized Max Absolute Errors for the Comparison of Projection Designs
  C.3 Results of Normalized Maximum Absolute Errors for the Discussion Section

List of Tables

2.1 Nilson-Kuusk model: Normalized RMSE of prediction. The experimental data are from a 100-run LHD.
2.2 Nilson-Kuusk model: Normalized RMSE of prediction. The experimental data are from a 150-run LHD.
3.1 Sample means of the 25 normalized RMSEs from repeat experiments with the Nilson-Kuusk model for four methods.
3.2 Sample means of |ACP − 0.95| from 25 repeat experiments with the Nilson-Kuusk model.
3.3 Actual coverage probability for the volcano model.
3.4 Values of the θ_j for simulation.
4.1 Borehole function: prediction results for SGD and mLHD.
4.2 Corner-peak function and corner-peak function analyzed on the logarithmic scale: prediction results for SGD and mLHD designs.
5.1 RMSE based on the final estimates for probability estimation.
5.2 RMSE based on the final estimates for quantile estimation.
5.3 ANOVA contribution analysis. The 8 main effects explain 99.23% of the total variance. Interactions between two input factors are not shown, as none of them contributes more than 0.5%.
A.1 Borehole function input variables, units, and ranges. All ranges are converted to [0, 1] for statistical modeling.
A.2 G-protein code input variables and ranges. All variables are transformed to log scales on [0, 1] for statistical modeling.
A.3 Nilson-Kuusk model: Normalized maximum absolute error of prediction. The experimental data are from a 100-run LHD.
A.4 Nilson-Kuusk model: Normalized maximum absolute error of prediction. The experimental data are from a 150-run LHD.
B.1 Maximum likelihood estimates of θ_j and p_j from PowExp-Emp for the Nilson-Kuusk model.
B.2 Maximum likelihood estimates of θ_j and ν_j from empirical Bayes and the Matérn correlation for the Nilson-Kuusk model.
B.3 Metropolis-Hastings acceptance rates for the θ_j for the Nilson-Kuusk model.
B.4 Metropolis-Hastings acceptance rates for the p_j for the Nilson-Kuusk model.

List of Figures

2.1 Borehole model: Normalized RMSE of prediction
2.2 Borehole model: Normalized maximum absolute error of prediction
2.3 G-protein model: Normalized RMSE of prediction
2.4 PTW model: Normalized RMSE of prediction
2.5 Nilson-Kuusk model: Normalized RMSE of prediction. The experimental data are from a 150-run LHD base plus 50 random points from a 100-run LHD.
2.6 Volcano model: Normalized holdout RMSE of prediction
2.7 Sea ice model: Normalized holdout RMSE of prediction
2.8 Borehole model: Normalized holdout RMSE of prediction for SqExp-Full (K) and CGP
2.9 G-protein model: Normalized holdout RMSE of prediction for SqExp-Full (K) and CGP
2.10 Nilson-Kuusk model: Normalized holdout RMSE of prediction for SqExp-Full (K) and CGP
2.11 Volcano model: Normalized holdout RMSE of prediction for SqExp-Full (K) and CGP
2.12 Borehole model: Normalized RMSE of prediction with a nugget term
2.13 Friedman function: Normalized RMSE of prediction with a nugget term versus the same models without a nugget term
2.14 Borehole and Nilson-Kuusk models, ACP
2.15 Friedman function, ACP with no nugget term versus the same models with a nugget term
3.1 Normalized RMSE (left panel) and actual coverage probability (right panel) for the Nilson-Kuusk model, the motivating example
3.2 Prior densities: (a) prior on θ_j and (b) prior on p_j
3.3 Normalized RMSE (left panel) and actual coverage probability (right panel) for the Nilson-Kuusk model
3.4 Normalized RMSE for the Volcano model from six methods
3.5 Normalized RMSE (left panel) and actual coverage probability (right panel) for the Borehole function and an 80-point mLHD base design
3.6 Normalized RMSE (left panel) and actual coverage probability (right panel) for the Borehole function and a 200-point mLHD
3.7 Normalized RMSE (left panel) and actual coverage probability (right panel) for the PTW model and a 110-point mLHD
3.8 Normalized RMSE (left panel) and actual coverage probability (right panel) with d = 10 inputs and output simulated from a GP with SqExp correlation
3.9 Normalized RMSE (left panel) and actual coverage probability (right panel) with d = 10 inputs and output simulated from a GP with PowExp correlation and all p_j = 1.8
3.10 Normalized RMSE (left panel) and actual coverage probability (right panel) with d = 10 inputs and output simulated from a GP with Matérn correlation and all δ_j = 1
4.1 Visualization of eight base designs
4.2 Borehole function: Normalized RMSE of prediction for eight base designs
4.3 Borehole function: Normalized RMSE of prediction
4.4 Borehole function: Normalized maximum absolute error of prediction
4.5 PTW function: Normalized RMSE of prediction
4.6 Weighted Franke's function: Normalized RMSE of prediction
4.7 Corner-peak function: Normalized RMSE of prediction
4.8 Corner-peak function analyzed at the logarithmic scale: Normalized RMSE of prediction
4.9 Borehole function: Normalized RMSE of prediction for m2LHD, OA-based LHD
4.10 PTW function: Normalized RMSE of prediction for m2LHD, OA-based LHD
4.11 Weighted Franke function: Normalized RMSE of prediction for m2LHD, OA-based LHD
4.12 Corner-peak function analyzed at the logarithmic scale: Normalized RMSE of prediction for m2LHD, OA-based LHD
4.13 Visualization of two SGD designs
4.14 Corner-peak function analyzed at the logarithmic scale with a full linear GP model: Normalized RMSE of prediction
5.1 Floor with d = 8 joists (supporting beams). The 8 joists act like springs, i.e., they deflect under a load.
5.2 Computer model of a floor system with d = 4 joists.
5.3 Probability estimation with a random initial design. (a) Estimates from the distance-based criterion. (b) Estimates from the EI criterion. The medians over 10 repeat experiments are joined by solid lines. The dotted line is the true probability, 0.0025.
5.4 Probability estimation with a uniform initial design. (a) Estimates from the distance-based criterion. (b) Estimates from the EI criterion. The medians over 10 repeat experiments are joined by solid lines. The dotted line is the true probability, 0.0025.
5.5 The dots are the 20-point initial design. The solid line is the contour f(x) = 0 of interest. The triangles are points added, with the order indicated within each triangle.
5.6 Quantile estimation with a random initial design. (a) Estimates from the distance-based criterion. (b) Estimates from the EI criterion. The medians are joined by solid lines. The dotted line is the true 0.0025 quantile, 0.
5.7 Quantile estimation with a uniform initial design. (a) Estimates from the distance-based criterion. (b) Estimates from the EI criterion. The medians are joined by solid lines. The dotted line is the true 0.0025 quantile, 0.
5.8 The relationship between the MOE of the four middle beams and the response. The solid lines are the estimated main effects, with 95% pointwise confidence intervals as dotted lines. Note that the inputs are between 0.77 × 10^6 and 2.36 × 10^6 (pounds per square inch), which is explained in section 5.5.2.
5.9 Modelling the lower 10% of the data: histogram of the lower 10% of the data, the two-parameter censored Weibull distribution, and the three-parameter censored Weibull distribution.
5.10 The computer model of the floor system with a uniform initial design and the distance-based criterion. (a) Probability estimates. (b) Quantile estimates. The medians over 10 repeat experiments are joined by solid lines. The dotted line is the true probability 0.999 in (a) and the true quantile 3.88057 in (b).
5.11 Diagnostic plots for the short column function with the uniform initial design. (a) Probability estimates. (b) Quantile estimates.
5.12 Diagnostic plots for the floor system computer model. (a) Probability estimates. (b) Quantile estimates.
A.1 G-protein model: Normalized maximum absolute error of prediction
A.2 PTW model: Normalized maximum absolute error of prediction
A.3 Nilson-Kuusk model: Normalized maximum absolute error of prediction. The experimental data are from a 150-run LHD base plus 50 random points from a 100-run LHD
A.4 Volcano model: Normalized maximum absolute error of prediction
A.5 Sea ice model: Normalized maximum absolute error of prediction
A.6 Borehole model: Normalized maximum absolute error of prediction for SqExp-Full (K) and CGP
A.7 G-protein model: Normalized maximum absolute error of prediction for SqExp-Full (K) and CGP
A.8 Nilson-Kuusk model: Normalized maximum absolute error of prediction for SqExp-Full (K) and CGP
A.9 Volcano model: Normalized maximum absolute error of prediction for SqExp-Full (K) and CGP
A.10 Borehole model: Normalized maximum absolute error of prediction with a nugget term
A.11 Friedman function: Normalized maximum absolute error of prediction with a nugget term versus the same models without a nugget term
B.1 Normalized maximum absolute errors for the Nilson-Kuusk model, the motivating example
B.2 Normalized maximum absolute error for the Nilson-Kuusk model
B.3 Normalized maximum absolute error for the Volcano model from six methods
B.4 Normalized maximum absolute error for the Borehole function and an 80-point mLHD base design
B.5 Normalized maximum absolute error for the Borehole function and a 200-point mLHD base design
B.6 Normalized maximum absolute error for the PTW model
B.7 Normalized maximum absolute error with d = 10 inputs and output simulated from a GP with SqExp correlation
B.8 Normalized maximum absolute error with d = 10 inputs and output simulated from a GP with PowExp correlation and all p_j = 1.8
B.9 Normalized maximum absolute error with d = 10 inputs and output simulated from a GP with Matérn correlation and all δ_j = 1
B.10 Empirical posterior density plots of θ_1, ..., θ_5 for the Nilson-Kuusk model with the PowExp-Full method.
B.11 Empirical posterior density plots of p_1, ..., p_5 for the Nilson-Kuusk model with the PowExp-Full method.
B.12 Empirical posterior density plots of θ_1, ..., θ_5 for the Nilson-Kuusk model with the PowExp-Hybrid method.
B.13 Empirical posterior density plots of θ_1, ..., θ_5 for the Nilson-Kuusk model with the Matérn-Hybrid method.
C.1 Borehole function: Normalized maximum absolute error of prediction
C.2 PTW function: Normalized maximum absolute error of prediction
C.3 Weighted Franke's function: Normalized maximum absolute error of prediction
C.4 Corner-peak function: Normalized maximum absolute error of prediction
C.5 Corner-peak function analyzed at the logarithmic scale: Normalized maximum absolute error of prediction
C.6 Borehole function: Normalized maximum absolute error of prediction for m2LHD, OA-based LHD
C.7 PTW function: Normalized maximum absolute error of prediction for m2LHD, OA-based LHD
C.8 Weighted Franke's function: Normalized maximum absolute error of prediction for m2LHD, OA-based LHD
C.9 Corner-peak function analyzed at the logarithmic scale: Normalized maximum absolute error of prediction for m2LHD, OA-based LHD
C.10 Corner-peak function analyzed at the logarithmic scale with a full linear GP model
Glossary

CGP: Composite Gaussian Process.
CP: Corner Peak.
EI: Expected Improvement.
FL: Full Linear.
GP: Gaussian Process.
LHS: Latin Hypercube Sampling.
MLE: Maximum Likelihood Estimation.
mLHD: maximin Latin Hypercube Design.
OA: Orthogonal Array.
OALHD: Orthogonal Array-based Latin Hypercube Design.
rLHD: random Latin Hypercube Design.
RMSE: Root Mean Squared Error.
SGD: Sparse Grid Design.
SL: Select Linear.
trLHD: transformed Latin Hypercube Design.

Acknowledgements

First, I would like to acknowledge my special debt to my supervisor, Professor William J. Welch, who led me to the intriguing field of analysis of computer experiments and has always been strongly supportive of my research career. It is my great pleasure to work with him. I also want to express my heartfelt gratitude to Professor James Zidek and Professor Alexandre Bouchard-Côté, my supervisory committee members, who provided a great deal of guidance and help. In addition, I owe a debt of gratitude to Professor Jerome Sacks and Professor Jason Loeppky, my co-authors on several important journal publications.

I would also like to express my gratitude to Professors Ying MacNab, Lang Wu, Bruce Dunham, Jiahua Chen, Rollin Brant, Paul Gustafson, Matías Salibián-Barrera, and Yew-Wei Lim for their constant mentoring and excellent teaching during my stay at UBC. I am also grateful to Peggy Ng, Elaine Salameh, Ali Hauschildt, and Andrea Sollberger for their hard work and kind help. Thanks to everyone for making the department such an amazing place.

I feel extremely grateful to my girlfriend, Miss Yumian Hu, who is also my best friend. I would like to thank her for all of her support as well as the sacrifices she has made for me. I would not have been able to complete the PhD degree without her company.

Last but not least, I owe my special thanks to my parents for their support and understanding of my PhD study in Canada.

Dedication

To my beloved father and mother, Mr. Chen and Mrs. Zhao, who are so proud.

Chapter 1

Introduction

In this chapter, we outline the research and briefly review some previous work in the relevant fields. The chapter is divided into three parts: in the first part, we discuss what computer experiments are and the objectives of analyzing a computer model; in the second part, designs for computer experiments are discussed; and the structure of the thesis is outlined in the last part.

1.1 Computer Models

Many complex phenomena are extremely difficult to investigate through controlled physical experiments. Instead, computer experiments have become important alternatives for providing insight. Nowadays, computer experiments have been successfully applied in many science and engineering fields, for instance climate change, where traditional laboratory-based experiments are impossible to conduct. In general, a computer experiment is a designed set of runs of a computer model, which usually has the following two distinguishing features: (1) it is deterministic, that is, repeating an identical set of inputs does not change the output; and (2) it is time-consuming, in that a single run may take several hours or even days to complete.

Consider a typical computer model, described by Gramacy and Lee (2008). NASA was developing a new reusable rocket booster called the Langley Glide-Back Booster (LGBB). NASA built a computer model of the flight characteristics (lift, drag, pitch, side-force, yaw, and roll) of the LGBB as a function of three inputs: side-slip angle, Mach number, and angle of attack.
For each input configuration triplet, the computer model yields the six response variables described above. However, even for one set of inputs, it takes the computer model 5-20 hours on a high-end workstation to solve the sets of Euler equations. A small modification of the input configuration means another 5-20 hours of running time. Hence, it would help if one could approximate the output of the computer model with adequate accuracy and much less computational time.

Without loss of generality, we assume each set of inputs is a 1 × d row vector, where d denotes the input dimension, and the corresponding output is a scalar. If there is more than one output, one can treat each independently or reduce more complex output to a few scalar summaries. Now, suppose that we have n input configurations as well as the corresponding n outputs from a computer model. The primary goal in the analysis of computer experiments is to predict the output at untried sets of inputs via a statistical surrogate model.

A popular choice for the surrogate model is the Gaussian process (GP) (Sacks et al., 1989), and the GP is the only surrogate model we consider in this thesis. Assuming the correlation parameters of a GP are known, the best linear unbiased predictor (BLUP) can be obtained by minimizing the mean squared error (MSE) of the predictor. However, in practice one has to estimate all of the unknown parameters using the available data (n sets of inputs and n outputs). In addition, since most real computer models are expensive to evaluate, the sample size n of the available data is usually not large. Therefore, how to carefully design the computer experiment to provide as much information as possible within a limited sample size is of great importance. We elaborate on each of these points throughout the thesis.

1.2 Analysis of Computer Experiments

1.2.1 Gaussian Process

Sacks et al. (1989) treated the output of a deterministic computer model as if it were a realization of the following, possibly non-stationary, regression model:

    Y(x) = f(x)ᵀβ + Z(x).    (1.1)

Here, x is a vector of inputs to the computer model, f(x) = (f_1(x), f_2(x), ..., f_k(x))ᵀ contains k known regression functions, β is a k × 1 column vector of unknown regression parameters, and Z(·) is a Gaussian process defined on the input space X with zero mean and unknown variance σ². The above formulation defines a random function on the domain X. The interpretation of (1.1) is straightforward: the mean of the stochastic process depends on x in a standard regression manner, while the residual function is assumed to follow a stationary GP.

The output of a computer model is deterministic, i.e., no independent random errors exist, so least-squares fitting of a response surface might not be appropriate. Treating a computer model as a realization of a stochastic process provides a statistical framework for design and analysis. In addition, it gives a basis for uncertainty quantification.

Let x and x′ be two sets of inputs. The correlation between Z(x) and Z(x′) is denoted by R(x, x′). Following common practice, R(x, x′) is taken to be a product of one-dimensional correlation functions in the distances h_j = |x_j − x′_j|, i.e., R(x, x′) = ∏_{j=1}^{d} R_j(h_j). In this thesis, we mainly consider four choices of R_j; they are introduced and discussed in detail in chapter 2. In general, we use θ to denote the set of unknown correlation parameters. The correlation matrix R is fully specified by θ.
No matter which specific correlation structure is used, the correlation matrix R is a positive-definite n × n symmetric matrix with all of its diagonal elements equal to 1.

For the regression component in (1.1), we specify three choices for f(x), whose details are also discussed in chapter 2. In practice, without meaningful prior information, how to choose the correlation structure and the regression component remains an open research question. These are the topics of chapter 2.

1.2.2 Prediction

Running a computer model n times at input vectors x^(1), x^(2), ..., x^(n) produces n outputs y = (y(x^(1)), y(x^(2)), ..., y(x^(n)))ᵀ. Given a new input configuration x*, we wish to predict y(x*), whatever correlation structure and regression term are used. Following Sacks et al. (1989), the predictive distribution of y(x*) conditional on β, σ², θ, and y is Gaussian:

    N(m_θ(x*), v_θ(x*)),    (1.2)

where

    m_θ(x*) = f(x*)ᵀβ + r(x*)ᵀR⁻¹(y − Fβ)    (1.3)

and

    v_θ(x*) = σ²(1 − r(x*)ᵀR⁻¹r(x*)).    (1.4)

Here, F is the n × k matrix with row i containing f(x^(i))ᵀ, and r(x*) is the n × 1 vector with element i given by R(x*, x^(i)) for i = 1, ..., n; it is obtained from whichever of the four correlation structures considered in the thesis is in use. The subscript θ in the notation for m_θ(x*) and v_θ(x*) emphasizes that these quantities depend on θ. All parameters are unknown and hence require estimation in practice.

1.2.3 Parameter Estimation

In practice, the parameters β, σ², and θ have to be estimated, leading to further variability in the predictive distribution. In general, the inference paradigm falls into one of two categories. The first, the frequentist viewpoint, treats the unknown parameters as constants, which leads to the method of maximum likelihood estimation (MLE); it is a special case of the empirical Bayes method. The second takes a fully Bayesian perspective, assuming each parameter has a specified prior distribution and using the available data to generate its posterior distribution. We briefly introduce the empirical Bayes approach below, along with its relationship to frequentist theory.

In the case of interest to us, empirical Bayes estimates the unknown parameters by maximizing the joint likelihood function. Based on (1.1), the likelihood function for β, σ², and θ is multivariate normal:

    L(y | β, σ², θ) = (2πσ²)^{−n/2} det(R)^{−1/2} exp{ −(1/(2σ²)) (y − Fβ)ᵀR⁻¹(y − Fβ) }.    (1.5)

For any fixed θ, and hence R, maximizing the likelihood with respect to β and σ² gives their MLEs:

    β̂ = (FᵀR⁻¹F)⁻¹FᵀR⁻¹y    (1.6)

and

    σ̂² = (1/n)(y − Fβ̂)ᵀR⁻¹(y − Fβ̂).    (1.7)

Note that the MLEs for β and σ² depend on the parameters θ through R. Plugging β̂ and σ̂² into (1.5) gives the profile likelihood,

    (2πσ̂²)^{−n/2} det(R)^{−1/2} exp{−n/2},

which must be numerically maximized to yield the MLE of θ.

To obtain predictions, the MLEs of β, σ², and θ are substituted for the true parameter values in (1.2). Extra uncertainty is thereby introduced by the use of β̂, σ̂², θ̂, but only the contribution from β̂ to that added uncertainty is easily quantified. The estimated predictive mean is

    m̂(x*) = f(x*)ᵀβ̂ + r(x*)ᵀR⁻¹(y − Fβ̂)    (1.8)

and the estimate of the predictive variance in (1.4) becomes

    v̂(x*) = σ̂²[1 − r(x*)ᵀR⁻¹r(x*)] + σ̂²[f(x*) − FᵀR⁻¹r(x*)]ᵀ(FᵀR⁻¹F)⁻¹[f(x*) − FᵀR⁻¹r(x*)].    (1.9)
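Equations (1.6)-(1.9) translate almost line by line into code. The following is a minimal NumPy sketch, not from the thesis: it assumes a constant regression term f(x) ≡ 1 and a squared-exponential correlation with given sensitivities θ, and it omits the second term of (1.9); finding the MLE θ̂ would mean minimizing neg_log_profile_lik numerically (e.g., with scipy.optimize.minimize).

    import numpy as np

    def corr_matrix(X1, X2, theta):
        # R(x, x') = prod_j exp(-theta_j * h_j^2), with h_j = |x_j - x'_j|
        h2 = (X1[:, None, :] - X2[None, :, :]) ** 2
        return np.exp(-(h2 * theta).sum(axis=2))

    def fit(X, y, theta):
        # MLEs of beta and sigma^2 for fixed theta, equations (1.6)-(1.7),
        # for a constant regression term (F is a column of ones, k = 1).
        n = len(y)
        R = corr_matrix(X, X, theta) + 1e-10 * np.eye(n)  # tiny jitter for stability
        Ri = np.linalg.inv(R)
        ones = np.ones(n)
        beta = (ones @ Ri @ y) / (ones @ Ri @ ones)       # eq. (1.6) with F = 1
        resid = y - beta
        sigma2 = (resid @ Ri @ resid) / n                 # eq. (1.7)
        return {"X": X, "R": R, "Ri": Ri, "beta": beta,
                "sigma2": sigma2, "resid": resid, "theta": theta}

    def neg_log_profile_lik(log_theta, X, y):
        # Negative log of the profile likelihood below (1.7); minimize this
        # over log(theta) to obtain the MLE of theta.
        m = fit(X, y, np.exp(log_theta))
        _, logdet = np.linalg.slogdet(m["R"])
        return 0.5 * (len(y) * np.log(m["sigma2"]) + logdet)

    def predict(model, Xnew):
        # Plug-in predictive mean (1.8) and the first term of the variance (1.9);
        # the term accounting for the estimation of beta is omitted here.
        r = corr_matrix(Xnew, model["X"], model["theta"])
        mean = model["beta"] + r @ (model["Ri"] @ model["resid"])
        var = model["sigma2"] * (1.0 - np.einsum("ij,jk,ik->i", r, model["Ri"], r))
        return mean, np.maximum(var, 0.0)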
An alternative approach takes a Bayesian perspective, in which each unknown parameter is assumed to have a prior distribution. The method of Markov chain Monte Carlo (MCMC) is then used to draw samples from the posterior distribution, and the sample mean and standard deviation are calculated as estimates of the true output and its standard error, respectively. The basic idea of the Bayesian paradigm is straightforward; however, different researchers assume different prior distributions. Moreover, whether to use a hybrid of the MLE and Bayesian approaches or a fully Bayesian approach to estimate all of the unknown parameters of a GP is another topic that needs discussion. Details of two popular Bayesian implementations, as well as the newly proposed Bayesian methods, are discussed in chapter 3.

1.3 Design of Computer Experiments

We next provide a brief overview of design for computer experiments. Design is essentially the way we select the sets of inputs at which the outputs are evaluated by the computer model. The primary goal is to carefully design an experiment so that it "trains" the best statistical surrogate for achieving a certain goal, such as making predictions at untried sets of inputs. In this thesis, designs are classified into two categories: (1) fixed designs and (2) sequential designs. Given the same sample size n, a fixed design is one in which all of the sets of inputs are generated simultaneously at the beginning, and the design stays fixed throughout the process, i.e., no more information is added. By contrast, a sequential design works with the same sample-size budget but in a more dynamic way: only part of the budget is used at the beginning to form an initial design, and new sets of inputs are added sequentially, one per iteration, until the design reaches size n. If more than one point is added at each iteration, the design is called batch sequential. A sequential design scheme is obviously more flexible than a fixed design, but it is usually more complex.

1.3.1 Fixed Design

Unless otherwise stated, the term "design" in the rest of the thesis refers to a fixed design; we use "sequential design" for a design that takes the sequential approach.

We first discuss fixed designs. An obvious way in which computer experiments differ from traditional physical experiments is that the computer model is deterministic, that is, the same result will always be observed if the same set of inputs is evaluated. Uncertainty arises from the fact that the true relationship between inputs and outputs is unknown, and any functional relationship researchers use, such as the GP model, is only an approximation to the true relationship.

Based on the above observations, Santner et al. (2003) proposed two principles for selecting designs for computer experiments:

1. A design should not take more than one observation at any set of inputs.

2. Because we do not know the true relationship between the response and the inputs, a design should allow one to fit a variety of models and should provide information about all portions of the experimental region.

These two principles require that a good design for computer experiments be spread over the design space. Thus, it is not surprising that the need for a design to be "space-filling" is a widely accepted principle nowadays when the primary interest is to emulate a computer model. Let us now elaborate on "space-filling" designs.
Without loss of generality, at least for rectangular regions, designs are defined on [0, 1]^d. A design based on a random Latin hypercube sample (LHS) was proposed by McKay et al. (1979) to beat a "baseline" design constructed by completely random sampling from U(0, 1) independently in each dimension. Such a design is also called a Latin hypercube design (LHD). A Latin hypercube is the generalization of a square grid to an arbitrary number of dimensions, whereby each cell is the only one in each axis-aligned hyperplane containing it. The authors showed that the variance of the sample mean (1/n) Σ_{i=1}^{n} Y^(i) is smaller using an LHD than using a completely random design, provided y(x_1, ..., x_d) is monotonic in each of its arguments. Stein (1987) further showed that, as n → ∞, an LHD has a smaller variance than a completely random design without the monotonicity condition.

Over the past few decades, many variants of the LHD have appeared, such as the maximin LHD and the minimax LHD. The ideas underlying the maximin design and minimax design were first proposed by Johnson et al. (1990), and those designs were later introduced within the class of LHDs. They can be viewed as a class of "space-filling" designs in which that property is achieved by defining and optimizing a distance measure, although the specific metric may vary from one design to another.

There exists another class of "space-filling" designs in which the property is obtained by using a low-discrepancy sequence (Niederreiter, 1988). Given a set P = {x^(1), ..., x^(n)}, using Niederreiter's notation, the discrepancy is defined as

    D_n(P) = sup_{B ∈ J} | A(B; P)/n − λ_d(B) |,

where λ_d is the d-dimensional Lebesgue measure, A(B; P) is the number of points in P that fall into B, and J is a set of d-dimensional regions in [0, 1]^d, with B a member of J. The discrepancy is low if, for all B, the proportion of points in P falling into B is close to the measure of B.

An obvious application of a low-discrepancy sequence is numerical integration. The integral of a function f can be approximated by the average of the function evaluations at n points:

    ∫₀¹ ··· ∫₀¹ f(x) dx ≈ (1/n) Σ_{i=1}^{n} f(x^(i)).

If the points are chosen randomly from a known distribution, this is the Monte Carlo method. If instead the points are chosen as elements of a low-discrepancy sequence, this is the quasi-Monte Carlo method. Niederreiter (1988) showed that the quasi-Monte Carlo method has a rate of convergence close to O(1/n), whereas the rate for the Monte Carlo method is O(1/√n); see Niederreiter (1992) for more details. To approximate the integral well, the set P = {x^(1), ..., x^(n)} should minimize the size of holes in the space. A Sobol sequence (Sobol, 1967) is another type of "space-filling" design whose space-filling property is achieved by a low-discrepancy sequence. More details about it are given in chapter 4.
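As a quick numerical illustration of the O(1/√n) versus near-O(1/n) rates, the sketch below compares plain Monte Carlo with a scrambled Sobol sequence on a smooth toy integrand. The integrand g and the sample sizes are arbitrary illustrative choices, not from the thesis, and scipy.stats.qmc requires SciPy 1.7 or later.

    import numpy as np
    from scipy.stats import qmc

    def g(x):
        # Smooth toy integrand on [0, 1]^d (an arbitrary illustrative choice).
        return np.exp(x.mean(axis=1))

    d, n = 5, 1024                        # Sobol sample sizes should be powers of 2
    rng = np.random.default_rng(1)

    mc_est = g(rng.random((n, d))).mean()                                  # Monte Carlo
    qmc_est = g(qmc.Sobol(d, scramble=True, seed=1).random(n)).mean()      # quasi-Monte Carlo

    print(f"MC estimate: {mc_est:.5f}   QMC estimate: {qmc_est:.5f}")

Repeating this over many seeds and values of n would show the quasi-Monte Carlo error shrinking at close to the faster rate.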
1.3.2 Review of Two Fixed Designs Utilized in Chapters 2 and 3

We assess the effect of designs in chapter 4, where a detailed review of some fixed designs is given in section 4.1. However, two fixed designs are used frequently in chapters 2 and 3, so we review them here to help readers follow the material in the following chapters.

Random LHD

McKay et al. (1979) introduced the random LHD. Without loss of generality, given a fixed sample size n, a random LHD in 2-d over the unit square can be constructed by the following steps (a code sketch appears at the end of this section):

• Construct an n × n grid of cells over the unit square.

• Construct an n × 2 matrix Z whose columns are independent random permutations of the integers {1, 2, ..., n}.

• Each row of Z gives the row and column indices of a cell on the grid. For the i-th (i = 1, 2, ..., n) row of Z, take a random uniform draw from the corresponding cell. The resulting design is the random LHD.

The construction extends easily to more than two dimensions. More details about the random LHD are given in section 4.1.

Maximin LHD

A maximin LHD can be viewed as a combination of an LHD and a maximin design. The idea of a maximin design is that, for the points to be spread out over the space, no two points should be too close to each other. To be more specific, let D ⊂ X be an arbitrary n-point design consisting of distinct inputs {x^(1), ..., x^(n)}. One way to measure the distance between any two points x^(i), x^(j) is

    ρ(x^(i), x^(j)) = ( Σ_{k=1}^{d} | x_k^(i) − x_k^(j) |^p )^{1/p},    (1.10)

where p = 1 and p = 2 give rectangular and Euclidean distances, respectively; the Euclidean distance, p = 2, is used in this thesis. A maximin design maximizes the minimal distance between any two points in the design space. Given the design D, its minimal distance is defined as

    Φ(D) = min_{x^(i), x^(j) ∈ D} ρ(x^(i), x^(j)).    (1.11)

Intuitively, the criterion makes sure that no two points are too close, and hence that the design is spread out. Johnson et al. (1990) first defined the maximin design. A design that maximizes Φ(D) in (1.11) within the class of LHDs is called a maximin LHD, abbreviated mLHD throughout the thesis.

1.3.3 Sequential Design

A sequential design is an adaptive design: an active learning process that allows the model to update itself "on the fly" as new information is added sequentially. The core of a sequential design is the selection criterion, which selects the next point to be added to the training set; different research objectives usually call for different selection criteria. The expected improvement (EI) criterion (Jones et al., 1998) is a popular one and is considered in this thesis. The basic idea is that, given an improvement function, the criterion selects the next point that maximizes the expectation of the improvement function. The expectation can be interpreted as an average, over possible realizations of the conditional GP, of the "improvement" a new point brings after it is added to the training set.

In general, different improvement functions have been proposed for achieving different statistical objectives; for instance, Jones et al. (1998) used an improvement function for optimization, and Ranjan et al. (2008) proposed a different improvement function for contour estimation. Although the improvement functions differ with the research goals, the underlying idea is similar, as both take the EI as the selection criterion. A comparison of EI and a hypothesis-testing-based method is conducted in chapter 5.
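Here is the promised sketch of the two constructions in section 1.3.2, assuming the unit hypercube. It builds random LHDs exactly as in the steps above and ranks candidates by Φ(D) in (1.11); the crude keep-the-best-of-many random search stands in for the proper optimization (e.g., point exchanges) used to build real maximin LHDs.

    import numpy as np

    def random_lhd(n, d, rng):
        # One independent random permutation per column picks a cell in each
        # axis-aligned slice; a uniform draw then places the point in its cell.
        Z = np.column_stack([rng.permutation(n) for _ in range(d)])
        return (Z + rng.random((n, d))) / n

    def phi(D):
        # Phi(D) of (1.11): minimal pairwise Euclidean distance (p = 2 in (1.10)).
        diff = D[:, None, :] - D[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=2))
        i, j = np.triu_indices(len(D), k=1)
        return dist[i, j].min()

    rng = np.random.default_rng(0)
    candidates = [random_lhd(27, 8, rng) for _ in range(500)]
    mlhd_like = max(candidates, key=phi)      # crude stand-in for an mLHD
    print("Phi of best candidate:", phi(mlhd_like))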
1.4 Brief Overview of the Thesis

The thesis is manuscript-based, in the sense that each chapter is either based on a published journal paper or will be converted into a journal paper for future publication. However, the thesis also has strong connections between chapters. Its structure is as follows.

In chapter 2, we are concerned with some fundamental issues in using a GP as a statistical proxy for complex computer models. First of all, without any prior scientific information, how should one choose the regression component and model the correlation structure of the GP? We use comprehensive, evidence-based assessment strategies to compare such modelling options. Applying the strategies to several computer models shows that a regression model more complex than a constant mean has little impact on prediction accuracy. The choice of correlation function has a modest effect, but there is little to separate two common choices, the power-exponential and the Matérn structures. We also make a limited comparison of Bayesian and empirical Bayes methods.

From the evaluation of Bayesian and empirical Bayes methods in chapter 2, we notice that empirical Bayes methods with a squared-exponential structure do not yield satisfactory prediction accuracy in some of the examples considered. Hence, in chapter 3 we propose Bayesian methods with the power-exponential and Matérn structures, which are much more flexible than the squared-exponential structure. Through extensive empirical studies, we show that the new methods with flexible structures not only have better prediction accuracy but also quantify uncertainty better.

In chapter 4, we are interested in assessing the effect of different designs on the prediction accuracy of a GP model. We consider popular fixed designs from the literature with desirable mathematical properties and use several examples to evaluate their performance. From an intensive simulation study, we find that the performance difference between designs, in terms of prediction accuracy, is small. Other factors have much bigger impacts on prediction accuracy: the sample size n and the transformation of the output Y.

In chapter 5, with a real computer model in wood engineering, we compare the performance of two existing sequential methods for estimating the failure probability of the engineering system and its corresponding quantile. We also discuss several practical issues faced by practitioners, closing the gap between the theory of sequential design and its application in practice. Based on our study, we make some useful suggestions.

Finally, in chapter 6, we make some concluding comments and summarize the future work motivated by the dissertation.

From this brief introduction, it should be clear that the thesis covers a very broad area of computer experiments: we not only review and assess existing methods for the design and analysis of computer experiments, but also extend them to overcome the drawbacks found in the assessment. In summary, the thesis is concerned with fundamental issues in the design and analysis of computer experiments.

Chapter 2

Evaluation of the Analysis Methods for Computer Experiments

Statistical methods based on a regression model plus a zero-mean GP have been widely used for analyzing a deterministic computer model. Our main thrust in this chapter is the practical questions faced by users of the GP model: What regression function and correlation function should be used? Does it matter? Beyond these main points, there are other variations in using a GP, for instance whether an empirical Bayes or a fully Bayesian method should be used to estimate the unknown GP parameters. We also partially address those variations in this chapter.

We will call a GP model with specified regression and correlation functions a Gaussian stochastic process (GaSP) model. For example, GaSP(Const, PowExp) denotes a constant-mean regression term with the power-exponential correlation function. Following common practice, the correlation function is taken to be a product of one-dimensional correlation functions in the distances h_j = |x_j − x′_j|, i.e., R(x, x′) = ∏_{j=1}^{d} R_j(h_j). The four one-dimensional correlation functions and three regression models under consideration are as follows.

• Squared exponential (abbreviated SqExp):

    R_j(h_j) = exp(−θ_j h_j²),    (2.1)

where θ_j > 0. The sensitivity parameter θ_j controls how fast the correlation decays as the distance between x and x′ increases. The process is infinitely differentiable with respect to each x_j, i.e., it is extremely smooth, because of the fixed exponent of 2 in the distance metric. SqExp has been used in many applications; see Gramacy and Lee (2008) and Kennedy and O'Hagan (2001), for instance.
For example, GaSP(Const,PowExp) will denote a constant mean regression term and the power ex-ponential correlation function. Following common practice, the correlationfunction is taken to be a product of 1-d correlation functions in the dis-tances hj = |xj − x′j |, i.e., R(xPx′) =∏yj5)Rj(hj). The four 1-d correlationfunctions and three regression models under consideration are as follows.• Squnrrq rxponrntvny 5noorrvvntrq SqRxp6:Rj(hj) = exp(−jh2j) (2.1)where j S 0. The sensitivity parameter (j) controls how fast thecorrelation decays when the distance between x and x′ increases. The136hTcgee 2. EiTlhTgiba bf ghe 4aTllfif MeghbWf fbe 6bmchgee Ekceeimeagfprocess is infinitely differentiable with respect to each xj , i.e., it isextremely smooth, because of the fixed exponent of 2 in the distancemetric. SqExp has been used in many applications; see Gramacy andLee (2008), Kennedy and O’Hagan (2001), for instance.• Powrr rxponrntvny 5noorrvvntrq PowRxp6:Rj(hj) = exp(−jhjj)(2.2)where j S 0 and j ∈ [1P 2]. The introduced parameters,  =()P 2P O O O P y), govern smoothness, and hence PowExp is a more flex-ible family than SqExp.• Mntrrn:Rj(hj) =1Γ(,j)2,j−)(˜jhj),jK,j(˜jhj)P (2.3)where Γ is the Gamma function, K,j is the modified Bessel functionof order ,j and ˜j S 0. Here ˜j is a sensitivity parameter, and ,jcontrols smoothness. To be consistent with the parameterization ofthe sensitivity parameters of SqExp and PowExp, we redefine ˜j asj = 2√,jR˜j . For simplicity we consider special cases defined by ,j .If ,j = j +)2 with integer j ≥ 0, there are j derivatives with respectto xj . The four special cases considered areRj(hj) =exp(−j |hj |) (j = 0 derivatives)exp(−j |hj |) (j |hj |+ 1) (j = 1 derivative)exp(−j |hj |)()+(j |hj |)2 + j |hj |+ 1)(j = 2 derivatives)exp(−j |hj |2) (j →∞ derivatives)OThe last case gives SqExp. The Mate´rn class was recommended by(Stein (1999) , section 2.7) for its control via ,j S 0 of the differen-tiability of the correlation function with respect to xj , and hence thatof the prediction function. Also note that different coordinates havedifferent ,j to be estimated in practice for the Mate´rn class just like146hTcgee 2. EiTlhTgiba bf ghe 4aTllfif MeghbWf fbe 6bmchgee Ekceeimeagfthe PowExp structure.• Mntrrn-?: Some authors (e.g. Picheny et al. (2013)) fix ,j in theMate´rn correlation function to give some differentiability. The Mate´rn-2 subfamily sets ,j = 2 +)2 for all j, giving 2 derivatives. This is aspecial case of the Mate´rn correlation function.In general,  will denote the set of unknown correlation parameters: =()P O O O P y) (SqExp)()P O O O P yP )P O O O P y) (PowExp)()P O O O P yP )P O O O P y) (Mate´rn)()P O O O P y) (Mate´rn-2)O(2.4)In terms of the regression component in (1.1), we explore three main choices:• Constant (abbreviated Const): (, i.e., k = 1, only the intercept.• Full linear (FL): ( + )x) + O O O+ yxy, i.e., a full linear model in allinput variables with k = y+ 1.• Select linear (SL): linear in xj like FL but only includes selected terms.The proposed algorithm for SL is as follows. For a given correlation familyconstruct a default predictor of outputs for untried inputs with the Constapproach. Decompose the predictive function (Schonlau and Welch, 2005)and identify all main effects that contribute more than 100Ry percent tothe total variation. These become the selected coordinates. Typically, largemain effects have clear linear components. 
For the regression component in (1.1), we explore three main choices:

• Constant (abbreviated Const): β_0, i.e., k = 1, only the intercept.

• Full linear (FL): β_0 + β_1 x_1 + ··· + β_d x_d, i.e., a full linear model in all input variables, with k = d + 1.

• Select linear (SL): linear in the x_j like FL, but including only selected terms.

The proposed algorithm for SL is as follows. For a given correlation family, construct a default predictor of outputs for untried inputs using the Const approach. Decompose the predictive function (Schonlau and Welch, 2005) and identify all main effects that contribute more than 100/d percent to the total variation; these become the selected coordinates. Typically, large main effects have clear linear components, and if a large effect lacks a linear component, little is lost by including a linear term. Inclusion of possible nonlinear trends can be pursued at considerable computational cost; we do not routinely do so, though in section 2.2 we do include a regression model with nonlinear terms in x_j.

Assessing statistical strategies for the analysis of a computer experiment often mimics what is done for physical experiments: a method is proposed, applied in examples, and compared to other methods. Where possible, formal mathematical comparisons are made; otherwise, one needs empirical criteria to assess performance. An initial empirical study for a physical experiment has to rely on the specific data of that experiment and, while different analysis methods may be applied, all are bound by the single data set; there are limited opportunities to vary the sample size or design. Computer experiments, however, provide rich opportunities to investigate the relative merits of an analysis method: one can construct a whole spectrum of "replicate" experiments for a single computer model, avoiding dangerous "anecdotal" reports.

The danger of being misled by anecdotes can be seen in the following example. The borehole function (Morris et al., 1993) is frequently used as a testbed to assess different models. A 27-run orthogonal array (OA) in the 8 input factors was proposed as a design, following Joseph et al. (2008). The 27 runs were analyzed via GaSP with a specific R (the SqExp) and with two choices for the regression function: Const versus SL (x_1 is selected). The details are not important for now. A set of 10,000 test points, selected at random in the 8-dimensional input space, was then predicted. The resulting values of the root mean squared error (RMSE) measure defined in (2.5) of section 2.1 were 0.141 and 0.080 for the Const and SL regression models, respectively.

While the SL approach reduced the RMSE by about 43% relative to a model with the Const component, does this example provide powerful evidence for using regression terms in the GaSP model? Not quite. We replicated the experiment with the same choices of regression terms and correlation function and the same test data, but with training data from a theoretically equivalent 27-run OA design (there are many equivalent OAs, e.g., obtained by permuting the labels between columns of a fixed OA). The RMSE values for the second analysis were 0.073 and 0.465 for the Const and SL models, respectively. The two analyses lead to very different conclusions.

We study the impact on prediction accuracy of the particular model specifications commonly used. The primary goals are two-fold. First, we propose a more evidence-based approach to distinguish what may be important from the unimportant and what may need further exploration. Second, our application of this approach to various examples leads to some specific recommendations.

The rest of the chapter is organized as follows. In sections 2.1 and 2.2, we use various computer models to assess the effect of the two aforementioned factors on prediction accuracy, along with some choices of sample size and design. In section 2.3, some other modelling strategies are considered and compared. We make some comments in section 2.4, and conclusions and recommendations are given in section 2.5.
We study the impact on prediction accuracy of the particular model specifications commonly used. The primary goals are two-fold. First, we propose a more evidence-based approach to distinguish what may be important from the unimportant, and what may need further exploration. Second, our application of this approach to various examples leads to some specific recommendations.

The rest of the chapter is organized as follows. In sections 2.1 and 2.2, we use various computer models to assess the effect of the two aforementioned factors on prediction accuracy, along with some choices of sample size and designs. In section 2.3, some other modelling strategies are considered and compared. We make some comments in section 2.4. Finally, some conclusions and recommendations are made in section 2.5.

2.1 Fast Computer Models

We slightly modify the commonly used root mean squared error (RMSE) to a normalized root mean squared error of prediction over the holdout (test) set (e_rmse,ho) and a normalized maximum absolute error (e_max,ho). These are given below:

    e_rmse,ho = sqrt( (1/m) Σ_{i=1}^m [ ŷ(x_ho^{(i)}) − y(x_ho^{(i)}) ]^2 ) / sqrt( (1/m) Σ_{i=1}^m [ ȳ − y(x_ho^{(i)}) ]^2 ),

    e_max,ho = max_i | ŷ(x_ho^{(i)}) − y(x_ho^{(i)}) | / max_i | ȳ − y(x_ho^{(i)}) |,    (2.5)

where m is the number of runs in the holdout set, x_ho^{(i)} is point i in the holdout set, ŷ(x_ho^{(i)}) is the predicted value, and ȳ is the trivial predictor: the sample mean of y in the training set. The normalization in the denominator puts the normalized RMSE roughly on [0, 1] whatever the scale of y, with 1 indicating no better performance than ȳ. Similarly, worst-case performance is measured by the normalized maximum absolute error.

What are tolerable levels of error? Clearly, these are application-specific, so tighter thresholds would be demanded, say, for optimization than for sensitivity analysis. For general purposes we take the rule of thumb that e_rmse,ho < 0.10 is useful. For the normalized maximum error it is plausible that the threshold could be much larger, say 0.25 or 0.30. These speculations are consequences of the experiences we document later, and are surely not the last word. The value of having thresholds is to provide benchmarks that enable assessing when differences among methods or strategies are practically insignificant versus statistically significant.

For fast codes under our control, large holdout sets can be obtained. Hence, in this section performance is measured through the use of a holdout (test) set of 10,000 points, selected as a random LHD on the input space. We consider three fast computer models in this section.
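Computationally, the two measures in (2.5) are straightforward; a minimal sketch (Python; the names are ours):

    import numpy as np

    def normalized_errors(y_pred, y_holdout, ybar_train):
        # Normalized RMSE and maximum absolute error of prediction, as in (2.5);
        # ybar_train is the sample mean of y in the training set.
        rmse = np.sqrt(np.mean((y_pred - y_holdout) ** 2))
        rmse_trivial = np.sqrt(np.mean((ybar_train - y_holdout) ** 2))
        emax = np.max(np.abs(y_pred - y_holdout))
        emax_trivial = np.max(np.abs(ybar_train - y_holdout))
        return rmse / rmse_trivial, emax / emax_trivial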
2.1.1 Borehole Model

The first model we look at is the borehole model (Morris et al., 1993). It is an 8-dimensional computer model that has served as a test bed in many contexts (e.g., Chen (2013)). A detailed description is given in appendix A.1. Three different designs are available for the experiment: a 27-run, 3-level OA; a 27-run maximin LHD (mLHD); and a 40-run mLHD. As permuting the columns of an OA or an mLHD does not change the internal structure of that design, we generate 25 equivalent designs by permuting the columns of the available designs to avoid "anecdotal" comparisons. In addition, GP parameters are estimated by MLE.

There are 12 possible modelling combinations from the four correlation structures and three regression models outlined previously. The SL choice here is always the term x_1. Its main effect accounts for approximately 80% of the variation in predictions over the 8-dimensional input domain, and the selection algorithm, which starts from a constant regression, chooses x_1 and no other terms across all designs and all repeat experiments. The experimental results are shown in Figure 2.1 for the normalized RMSE and in Figure 2.2 for the normalized maximum absolute error.

[Figure 2.1: Borehole function: Normalized RMSE of prediction, e_rmse,ho, for GaSP with all combinations of three regression models and four correlation functions. There are three base designs: a 27-run OA (top row); a 27-run mLHD (middle row); and a 40-run mLHD (bottom row). For each base design, 25 random permutations of its columns give the 25 values of e_rmse,ho in a boxplot.]

[Figure 2.2: Borehole function: Normalized maximum absolute error of prediction, e_max,ho, for GaSP with all combinations of three regression models and four correlation functions. There are three base designs: a 27-run OA (top row); a 27-run mLHD (middle row); and a 40-run mLHD (bottom row). For each base design, 25 random permutations of its columns give the 25 values of e_max,ho in a boxplot.]

The top row of Figure 2.1 shows results with the 27-run OA design. For a given modelling strategy, 25 random permutations of the columns of the 27-run OA lead to 25 repeat experiments and hence a reference set of 25 values of e_rmse,ho shown as a boxplot. The results are striking. Relative to a constant regression model, the FL regression model has apparently larger e_rmse,ho values for all correlation functions. The SL regression also performs very poorly sometimes, but not always. In addition, the top row of Figure 2.1 shows that the choice of correlation function is far less important than that of the regression model.

The results for the 27-run mLHD in the middle row of Figure 2.1 show that design can have a large effect on accuracy: every analysis model performs better for the 27-run mLHD than for the 27-run OA. Note the vertical scale is different for each row of the figure. The SL regression now performs about the same as the constant regression. There is no substantial difference in accuracy between the correlation functions. Indeed, the impact on accuracy of using the space-filling mLHD design instead of an OA is much more important than differences due to the choice of correlation function.

Increasing the number of runs to a 40-run mLHD (bottom row of Figure 2.1) makes a further substantial improvement in prediction accuracy. All 12 modelling strategies give e_rmse,ho values of about 0.02-0.06 over the 25 repeat experiments. Although there is little systematic difference among strategies, the variation over equivalent designs is still striking in a relative sense. The figure for normalized maximum absolute error (Figure 2.2), which measures the worst-case scenario, tells the same story as Figure 2.1; from now on, all results for normalized maximum absolute error are reported in appendix A.2 to save space.

Stein (1988) showed that as long as the covariance function used to obtain the predictor is compatible with the actual covariance function, the obtained predictor will have the same asymptotic efficiency as the optimal predictor based on the actual covariance function.
Stein (1988) defined two covariance functions to be compatible if the probability measures of two GPs with equal mean functions and those covariance functions are mutually absolutely continuous. However, the "true" covariance function is unknown in practice. In addition, the affordable experimental budget is usually not large for expensive computer models, i.e., n cannot go to infinity. Hence, it is valuable to assess the performance of different correlation structures empirically when the sample size is limited.

2.1.2 G-protein Model

The second application, the G-protein model used in Loeppky et al. (2009) and described in the appendix, consists of a system of ODEs with 4-dimensional input.

Figure 2.3 shows e_rmse,ho for the three regression models (here SL selects x_2 and x_3 as inputs with large effects) and four correlation functions. The holdout set comprises 10,000 points generated from a random LHD. The results in the top row are for a 40-run mLHD. With d = 4, all 24 possible permutations of a single base design lead to 24 data sets and hence 24 e_rmse,ho values. The boxplots in the top row have similar distributions across the 12 modelling strategies. As these empirical distributions have most e_rmse,ho values above 0.1, we try increasing the sample size with an 80-run mLHD, which leads to a substantial decline in the normalized RMSE, with all modelling methods giving e_rmse,ho values of about 0.06 or less.

Thus, for the G-protein application, none of the three choices for the regression component or the four choices for R matters. Again, the variation among equivalent designs is alarmingly large, dwarfing any impact of modelling strategy. The normalized maximum absolute error results are reported in Figure A.1 in the appendix.

2.1.3 PTW Model

Results for a third fast-to-run code, PTW (Preston et al., 2003), are reported in Figure 2.4 for the normalized RMSE. The model has 11 inputs. We took an mLHD with n = 110 as the base design. Prior information from engineers suggested incorporating linear x_1 and x_2 terms; SL also included x_3. No essential differences among the regression or correlation choices emerged, but again there is wide variation over equivalent designs. The results for normalized maximum absolute error are in appendix A.2.

[Figure 2.3: G-protein function: Normalized RMSE of prediction, e_rmse,ho, for GaSP with all combinations of three regression models and four correlation functions. There are two base designs: a 40-run mLHD (top row) and an 80-run mLHD (bottom row). For each base design, 24 random permutations of its columns give the 24 values of e_rmse,ho in a boxplot.]

[Figure 2.4: PTW function: Normalized RMSE of prediction, e_rmse,ho, for GaSP with all combinations of three regression models and four correlation functions. There is one base design: a 110-run mLHD. 25 random permutations of its columns give the 25 values of e_rmse,ho in a boxplot.]

2.2 Slow Computer Models

For complex, costly-to-run models, generating substantial holdout data or output from multiple designs is infeasible.
Being forced to depend on whatever data are at hand leads us to rely on cross-validation methods for generating multiple designs and holdout sets. Our approach in this section is to delete a subset from the full data set, use the remaining data to produce a predictor, and calculate the normalized RMSE and normalized maximum absolute error from predicting the output in the deleted (holdout) subset. Repeating this for a number of subsets (we use 25) gives some measure of variability and accuracy. In effect, we create 25 designs and corresponding holdout sets from a single data set and compare the consequences arising from different choices of predictor. The details described in the applications below differ somewhat depending on the particular application. We report three slow computer models in this section.

2.2.1 Nilson-Kuusk Model

An ecological code modelling reflectance for a plant canopy, developed by Nilson and Kuusk (1989), was used by Bastos and O'Hagan (2009) to illustrate diagnostics for GaSP models. With 5-dimensional input, two computer experiments were performed: the first using a 150-run random LHD and the second with an independently chosen LHD of 100 points.

We carry out three studies based on the same data. The first treats the 100-point LHD as the experiment and the 150-point set as a holdout sample. The second study reverses the roles of the two LHDs. A third study, extending one done by Bastos and O'Hagan (2009), takes the 150-run LHD, augments it with a random sample of 50 points from the 100-point LHD, takes the resulting 200-point subset as the experimental design for training the statistical model, and uses the remaining 50 points from the 100-run LHD to form the holdout set in the calculation of e_rmse,ho. By repeating the sampling of the 50 points 25 times, we get 25 replicate experiments, each with the same base 150 runs but differing with respect to the additional 50 training points and the holdout set.

Plotting y against x_5 shows an obvious nonlinear trend. Therefore, in addition to the linear regression choices we have studied so far, we also incorporate a regression model identified by Bastos and O'Hagan (2009): an intercept, linear terms in the inputs x_1, ..., x_4, and a quartic polynomial in x_5. We label this model "Quartic". All analyses are carried out with the output y on a log scale. The rationale for a log transformation is based on standard diagnostics for GaSP models (Jones et al., 1998); the details are omitted here. Results on e_rmse,ho are reported in Tables 2.1, 2.2 and Figure 2.5. Results on e_max,ho are reported in Tables A.3, A.4 and Figure A.3 in the appendix.
Table 2.1: Nilson-Kuusk model: Normalized RMSE of prediction, e_rmse,ho, for four regression models and four correlation functions. The experimental data are from a 100-run LHD, and the holdout set is from a 150-run LHD.

    Regression      SqExp   PowExp  Matérn-2  Matérn
    Constant        0.116   0.099   0.106     0.102
    Select linear   0.115   0.099   0.106     0.105
    Full linear     0.110   0.099   0.104     0.104
    Quartic         0.118   0.103   0.107     0.106

Table 2.2: Nilson-Kuusk model: Normalized RMSE of prediction, e_rmse,ho, for four regression models and four correlation functions. The experimental data are from a 150-run LHD, and the holdout set is from a 100-run LHD.

    Regression      SqExp   PowExp  Matérn-2  Matérn
    Constant        0.111   0.080   0.087     0.078
    Select linear   0.113   0.080   0.091     0.079
    Full linear     0.109   0.079   0.090     0.076
    Quartic         0.104   0.079   0.088     0.078

[Figure 2.5: Nilson-Kuusk model: Normalized RMSE of prediction, e_rmse,ho, for four regression models and four correlation functions. Twenty-five designs are created from a 150-run LHD base plus 50 random points from a 100-run LHD. The remaining 50 points in the 100-run LHD form the holdout set for each repeat.]

Table 2.1 summarizes the results of the study with the 100-point LHD as training data and the 150-point set as a holdout sample. It shows the choice of regression model is immaterial: the constant mean is as good as any. For the correlation function, SqExp is inferior to the other choices, there is some evidence that Matérn is preferred to Matérn-2, and there is little difference between PowExp and Matérn, the best performers. Similar results pertain when the 150-run LHD is used for training and the 100-run set for testing in Table 2.2. The results for the normalized maximum absolute errors convey a similar message.

The boxplots in Figure 2.5 for the third study are even more striking in exhibiting the inferiority of R = SqExp and the lack of advantage for any of the non-constant regression functions. The large variability in performance among designs and holdout sets is similar to that seen for the fast-code replicate experiments of section 2.1. The perturbations of the experiment, from random sampling here, appear to provide a useful reference set for studying the behaviour of model choices.

The large differences in prediction accuracy among the correlation functions, not seen in section 2.1, deserve some attention. An overly smooth correlation function, the SqExp, does not perform as well as the Matérn and PowExp functions here. The latter two have the flexibility to allow the needed rougher realizations. With the 150-run design and the constant regression model, for instance, the maximum of the log likelihood increases by about 50 when the power exponential is used instead of the squared exponential, with four of the α_j in (2.2) taking values less than 2.

2.2.2 Volcano Model

A computer model studied by Bayarri et al. (2009) models the process of pyroclastic flow (a fast-moving current of hot gas and rock) from a volcanic eruption. The inputs varied are the initial volume, x_1, and direction, x_2, of the eruption. The output, y, is the maximum (over time) height of the flow at a location, measured in meters. A 32-run data set provided by Elaine Spiller (different from that reported by Bayarri et al. (2009) but a similar application) is available. Plotting the data shows the output has a strong trend in x_1, and putting a linear term in the GaSP surrogate, as modelled by Bayarri et al. (2009), is natural. But is it necessary?

The nature of the data suggests a transformation of y could be useful. The one used by Bayarri et al.
(2009) is log(y + 1); i.e., 1 meter is added to avoid zeros before the log transformation is taken. Diagnostic plots (Jones et al., 1998) from using the Const regression and R = SqExp show that the log transform is reasonable, but a square-root transformation is better still. We report analyses for both transformations.

The regression functions considered are constant, select linear (β_0 + β_1 x_1), full linear, and quadratic (β_0 + β_1 x_1 + β_2 x_2 + β_3 x_1^2), because the trend in x_1 appears stronger than linear when looking at main effects from the surrogate obtained using √y and GaSP(Const, PowExp).

Analogous to the approach in the Nilson-Kuusk example, repeat experiments are generated by randomly sampling 25 runs from the 32 available to comprise the design for model fitting. The remaining 7 runs form the holdout set. This is repeated 25 times, giving the 25 e_rmse,ho values in the boxplots of Figure 2.6. The e_max,ho values are reported in Figure A.4 in appendix A.2. The conclusions are much like those in the Nilson-Kuusk example: there is no need to go beyond the Const regression, and PowExp is preferred to SqExp. The failure of SqExp in the two "slow" examples considered thus far is surprising in light of commonly held beliefs that the SqExp structure is adequate.

[Figure 2.6: Volcano model: Normalized holdout RMSE, e_rmse,ho, for four regression models and four correlation functions. Two transformations are considered: log(y + 1) and √y.]

2.2.3 Sea Ice Model

The arctic sea-ice model studied in Chapman et al. (1994) and in Loeppky et al. (2009) has 13 inputs, 4 outputs, and 157 available runs. The previous studies found modest prediction accuracy of GaSP(Const, PowExp) surrogates for two of the outputs (ice mass and ice area) and poor accuracy for the other two (ice velocity and ice range). The question arises whether the use of linear regression terms can increase accuracy to acceptable levels. We randomly select 130 points to form a training set; the remaining 27 points form the holdout set. The whole process is repeated 25 times, which leads to the normalized RMSE results in Figure 2.7 and the normalized maximum absolute error results in Figure A.5. The answer to that question is no: there is no help from the SL or FL regressions, nor from changing R. Indeed, FL sometimes makes accuracy much worse.

[Figure 2.7: Sea ice model: Normalized holdout RMSE, e_rmse,ho, for three regression models and four correlation functions.
The outputs are IceMass, IceArea, IceVelocity and RangeOfArea, which are modelled independently.]

2.3 Other Modelling Strategies

It is clear that we have not studied all possible paths to GaSP modelling that one might take in a computer experiment. In this section we address three others in the same fashion described above.

2.3.1 Full Bayes

A number of full Bayesian approaches have been employed in the literature. They go beyond the statistical formulation using a GP as a prior on the class of functions and assign prior distributions to all parameters, particularly those of the correlation function. For illustration, we consider a Bayesian implementation by Kennedy (2004), labelled the SqExp-Full (K) method. The method is based on the SqExp correlation structure, and the details of the priors it assumes can be found in section 3.1 of chapter 3.

For the borehole application, 25 repeat experiments are constructed for three designs, as in section 2.1. The boxplots of e_rmse,ho in Figure 2.8 compare SqExp-Full (K) with the SqExp and PowExp structures of section 2.1 based on MLEs of all parameters. (The CGP method and its boxplot will be discussed in section 2.3.2.) Unless stated otherwise, Const is used as the regression term. SqExp-Full (K) is less accurate than either GaSP(Const, SqExp) or GaSP(Const, PowExp). The boxplots of e_max,ho are similar to Figure 2.8 and are reported in the appendix (Figure A.6).

[Figure 2.8: Borehole function: Normalized holdout RMSE of prediction, e_rmse,ho, for GaSP(Const, SqExp), GaSP(Const, PowExp), SqExp-Full (K), and CGP. There are three base designs: a 27-run OA (left); a 27-run mLHD (middle); and a 40-run mLHD (right). For each base design, 25 random permutations of its columns give the 25 values of e_rmse,ho in a boxplot.]

Figure 2.9 similarly depicts results for the G-protein model. The boxplots of e_max,ho are similar to Figure 2.9 and are reported in the appendix (Figure A.7). With the 40-run mLHD, the fully Bayesian and empirical Bayes methods all perform about the same, giving only fair prediction accuracy. Increasing n to 80 improves accuracy considerably for all methods (the scales of the two plots are very different), far outweighing any systematic differences between their accuracies.

[Figure 2.9: G-protein: Normalized holdout RMSE of prediction, e_rmse,ho, for GaSP(Const, SqExp), GaSP(Const, PowExp), SqExp-Full (K), and CGP. There are two base designs: a 40-run mLHD (left) and an 80-run mLHD (right). For each base design, all 24 permutations of its columns give the 24 values of e_rmse,ho in a boxplot.]

As we have observed so far, SqExp-Full (K) performs as well as the GaSP methods for G-protein, not so well for borehole with n = 27, but adequately for n = 40. Turning to the slow codes of section 2.2, a different message emerges. Figure 2.10 for the Nilson-Kuusk model is based on 25 repeat designs constructed as for Figure 2.5, with a base design of 150 runs plus 50 randomly chosen from 100. The distributions of e_rmse,ho for SqExp-Full (K) and GaSP(Const, SqExp) are similar, with GaSP(Const, PowExp) showing a clear advantage.
Moreover, few of the Bayes e_rmse,ho values meet the 0.10 threshold, while all the GaSP(Const, PowExp) e_rmse,ho values do. The boxplots of e_max,ho are reported in the appendix (Figure A.8) and are similar to Figure 2.10. SqExp-Full (K) uses the SqExp correlation function, which performed relatively poorly in section 2.2; the disadvantage carries over to the Bayesian method here.

[Figure 2.10: Nilson-Kuusk model: Normalized holdout RMSE of prediction, e_rmse,ho, for GaSP(Const, SqExp), GaSP(Const, PowExp), SqExp-Full (K), and CGP.]

The results in Figure 2.11 for the volcano model are for the 25 repeat experiments described in section 2.2. Here again GaSP(Const, PowExp) dominates Bayes, and for the same reasons as for the Nilson-Kuusk model. For the √y transformation, all but a few GaSP(Const, PowExp) e_rmse,ho values meet the 0.10 threshold, in contrast to Bayes, where all but a few do not. (Note also that error according to the e_rmse,ho criterion is much smaller on the √y scale, supporting its choice.) The e_max,ho results are reported in the appendix (Figure A.9).

[Figure 2.11: Volcano model: Normalized holdout RMSE of prediction, e_rmse,ho, for GaSP(Const, SqExp), GaSP(Const, PowExp), SqExp-Full (K), and CGP.]

These results are striking and suggest that Bayes methods relying on R = SqExp need extension to have the needed roughness determined by the data. The hybrid Bayes-MLE approach employed by Bayarri et al. (2009) estimates the correlation parameters in PowExp by MLE, fixes them, and takes objective priors for β and σ². The mean of the predictive distribution for a holdout output value then gives the same prediction as GaSP(Const, PowExp). Whether this is a good remedy requires further exploration; we will discuss it in the next chapter.

2.3.2 Non-stationarity

The use of stationary GPs as priors in the face of "non-stationary appearing" functions has attracted a measure of concern, despite the fact that L2-differentiable functions can be approximated using PowExp with enough data. Of course, there never are enough data. A relevant question is whether other priors, even stationary ones different from those in sections 2.1 and 2.2, better reflect the conditions and lead to more accurate predictors.

Ba and Joseph (2012) advanced a "composite" GP (CGP) approach. These authors used two GPs, both with SqExp correlation. The first has correlation parameters θ_j in (2.1) constrained to be small, for a gradually varying longer-range trend, while the second has larger values of θ_j for shorter-range behaviour. The second, local GP also has a variance that depends on x, primarily as a way to cope with apparent non-stationary behaviour. Does this composite approach offer an effective improvement over the simpler choices of sections 2.1 and 2.2?

We apply CGP via its R library to the examples studied in sections 2.1 and 2.2, much as was just done for SqExp-Full (K). The comparisons in Figure 2.8 for the borehole function show that GaSP and CGP have similar accuracy for the two 27-run designs. GaSP has smaller error than CGP for the 40-run mLHD, though both methods achieve acceptable accuracy. The results in Figure 2.9 for G-protein show little practical difference between any of the methods, including CGP.
For these two fast-code examples, there is negligible difference between CGP and the GaSP methods. For the models of section 2.2, however, the conclusions are somewhat different. GaSP(Const, PowExp) is clearly much more accurate than CGP for the Nilson-Kuusk model (Figure 2.10) and roughly equivalent for the volcano model (Figure 2.11). All in all, we find no evidence for the effectiveness of a composite GaSP approach. These findings are in accordance with the earlier study by West et al. (1995).

2.3.3 Adding a Nugget Term

A nugget augments the GaSP model in (1.2.1) with an uncorrelated ε term, usually assumed to have a normal distribution with mean zero and constant variance σ²_ε, independent of the correlated process Z(x). This changes the computation of R and r(x) in the conditional prediction (1.3), which no longer interpolates the training data. A nugget term has been widely used for statistical modelling of deterministic computer codes without random error. The reasons offered are that numerical stability is improved, so overcoming computational obstacles, and that a nugget can produce better predictive performance or better confidence or credibility intervals. The evidence, in the literature and presented here, suggests, however, that for deterministic functions the potential advantages of a nugget term are modest. More systematic methods are available to deal with numerical instability if it arises (Ranjan et al., 2011), adding a nugget does not convert a poor predictor into an acceptable one, and other factors may be more important for good statistical properties of intervals (section 2.4). On the other hand, we also do not find that adding a nugget (and estimating it along with the other parameters) is harmful, though it may produce smoothers rather than interpolators. We now elaborate on these points.

First of all, a small nugget is often included to improve the numerical properties of the correlation matrix, R. However, for the space-filling designs used in sections 2.1 and 2.2, Ranjan et al. (2011) showed that ill-conditioning in a no-nugget GaSP will only occur for low-dimensional x, high correlation, and large n. These conditions are not commonly met in initial designs for applications. For instance, none of the computations in this chapter failed due to ill-conditioning, and those computations involved many repetitions of experiments for the various functions and GaSP models. When a design is not space-filling, however, matrix ill-conditioning may indeed occur. For instance, if the sequential design proposed by Bingham et al. (2014) to approximate a contour or to optimize a function is used, a newly added point may be close to existing points, causing numerical problems. If ill-conditioning does occur, the mathematical solution by Ranjan et al. (2011) is an alternative to adding a nugget.
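To illustrate the numerical-stability point, the sketch below builds a product SqExp correlation matrix with an optional nugget. The unit-diagonal parameterization with delta = σ²_ε / σ²_Z is one common convention, assumed here for illustration rather than taken from the thesis software; increasing delta improves the conditioning of the matrix.

    import numpy as np

    def corr_matrix(X, theta, delta=0.0):
        # n x n product SqExp correlation matrix, with nugget ratio delta
        n, d = X.shape
        R = np.ones((n, n))
        for j in range(d):
            hj = X[:, j][:, None] - X[:, j][None, :]
            R *= np.exp(-theta[j] * hj ** 2)
        return (R + delta * np.eye(n)) / (1.0 + delta)   # keeps a unit diagonal

    # Example check: np.linalg.cond(corr_matrix(X, theta, delta=1e-6)) is
    # typically far smaller than with delta = 0 when design points are close.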
Secondly, a nugget term is also sometimes suggested to improve predictive performance. Andrianakis and Challenor (2012) showed mathematically, however, that with a nugget the RMSE of prediction can be as large as that of a least squares fit of just the regression component in (1.2.1). Our empirical findings, choosing the size of σ²_ε via its MLE, are similarly unsupportive of a nugget. Repeating the calculations leading to Figure 2.1 for the borehole function, but fitting a nugget term in all models, shows essentially no difference. The results are reported in Figure 2.12 for the normalized RMSE; the results for maximum absolute error are reported in Figure A.10. The MLE of σ²_ε is either zero or very small relative to the variance of the correlated process. These findings are consistent with those of Ranjan et al. (2011), who found for the borehole function and other applications that constraining the model fit to have at least a modest value of σ²_ε deteriorated predictive performance.

[Figure 2.12: Borehole model: Normalized RMSE of prediction, e_rmse,ho, for GaSP with a nugget estimated by MLE, and with all combinations of three regression models and four correlation functions. There are three base designs: a 27-run OA (top row); a 27-run mLHD (middle row); and a 40-run mLHD (bottom row). For each base design, 25 random permutations of its columns give the 25 values of e_rmse,ho in a boxplot.]

Another example we consider is the Friedman function

    y(x) = 10 sin(π x_1 x_2) + 20 (x_3 − 0.5)^2 + 10 x_4 + 5 x_5,    (2.6)

which, with n = 25 runs, was used by Gramacy and Lee (2012) to show potential advantages of including a nugget term. However, their context (performance criteria, analysis method and design) differs in all respects from ours. Our results in the top row of Figure 2.13 show that the GaSP(Const, SqExp) and GaSP(Const, PowExp) models with n = 25 have highly variable accuracy, with e_rmse,ho values no better and often much worse than 20%. The results for maximum absolute errors are reported in Figure A.11 and are consistent with Figure 2.13. The effect of the nugget is inconsequential. Increasing the sample size to n = 50 makes a dramatic improvement in prediction accuracy, but the effect of a nugget remains negligible.

[Figure 2.13: Friedman function: Normalized RMSE of prediction, e_rmse,ho, for GaSP(Const, SqExp) and GaSP(Const, PowExp) models with no nugget term versus the same models with a nugget. There are two base designs: a 25-run mLHD (top row) and a 50-run mLHD (bottom row). For each base design, 25 random permutations between columns give the 25 values of e_rmse,ho in a boxplot.]
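For reference, the Friedman test function (2.6) is easy to reproduce (a sketch; inputs are taken in [0, 1], as is usual for this test bed):

    import numpy as np

    def friedman(x):
        # Friedman function (2.6); x is a 5-vector with coordinates in [0, 1]
        return (10.0 * np.sin(np.pi * x[0] * x[1]) + 20.0 * (x[2] - 0.5) ** 2
                + 10.0 * x[3] + 5.0 * x[4])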
2.4 Comments

2.4.1 Uncertainty of Prediction

As noted in sections 2.1 and 2.2, our attention is directed at prediction accuracy, the most compelling characteristic in practical settings. For example, where the objective is calibration and validation, the details of uncertainty, as distinct from accuracy, in the emulator of the computer model are absorbed (and usually swamped) by model uncertainties and measurement errors (Bayarri et al., 2007). But for specific predictions it is clearly important to have valid uncertainty statements. Currently, a full assessment of the validity of emulator uncertainty quantification is unavailable. It has long been recognized, however, that the standard error of prediction can be optimistic when MLEs of the parameters in the correlation functions of sections 2.1 and 2.2 are "plugged in", because the uncertainty in the parameter values is not taken into account, and that uncertainty is non-trivial (Abt, 1999).

Bayes credible intervals with full Bayesian methods carry explicit and valid uncertainty statements; hybrid methods using priors on some of the correlation parameters (as distinct from MLEs) may also have reliable credible intervals. But for properties such as the actual coverage probability (ACP), the proportion of points in a test set with true response values covered by intervals of nominal (say) 95% confidence or credibility, the behaviour is far from clear. Chen (2013) compared several Bayes methods with respect to coverage. The results showed variability with respect to equivalent designs like that found above for accuracy, a troubling characteristic pointing to considerable uncertainty about the uncertainty.

In Figure 2.14 we see some of the issues. It gives ACP results for the borehole and Nilson-Kuusk models. The left-hand plot for borehole displays the anticipated under-coverage using plug-in estimates for the correlation parameters. (Confidence intervals here use n − 1 rather than n in the estimate of σ² in the standard error, and t_{n−1} instead of the standard normal.) PowExp is slightly better than SqExp, and SqExp-Full (K) has ACP values close to the nominal 95%. Surprisingly, the plot for the Nilson-Kuusk model on the right of Figure 2.14 paints a different picture. Plug-in with SqExp and SqExp-Full (K) both show under-coverage, while plug-in PowExp has near-ideal properties here. We speculate that the use of the SqExp correlation function by SqExp-Full (K) is again suboptimal for the Nilson-Kuusk application, just as it was for prediction accuracy.

[Figure 2.14: Borehole and Nilson-Kuusk functions: ACP of nominal 95% confidence or credibility intervals for GaSP(Const, SqExp), GaSP(Const, PowExp), and SqExp-Full (K). For the borehole function, 25 random permutations between columns of a 40-run mLHD give the 25 values of ACP in a boxplot. For the Nilson-Kuusk function, 25 designs are created from a 150-run LHD base plus 50 random points from a 100-run LHD. The remaining 50 points in the 100-run LHD form the holdout set for each repeat.]

We also compare models with and without a nugget in terms of the coverage properties for the Friedman function in (2.6). The results in Figure 2.15 show that the problem of substantial under-coverage seen in many of the replicate experiments is not solved by the inclusion of a nugget term. A modest improvement in the distribution of ACP values is seen, particularly for n = 50.

[Figure 2.15: Friedman function: ACP of nominal 95% confidence intervals for GaSP(Const, SqExp) and GaSP(Const, PowExp) models with no nugget term versus the same models with a nugget. There are two base designs: a 25-run mLHD (top row) and a 50-run mLHD (bottom row). For each base design, 25 random permutations between columns give the 25 values of ACP in a boxplot.]
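The ACP computation itself is simple; here is a sketch under the plug-in convention just described (nominal t intervals, e.g., with n − 1 degrees of freedom), with hypothetical inputs:

    import numpy as np
    from scipy import stats

    def actual_coverage(y_pred, y_holdout, se_pred, df, level=0.95):
        # proportion of holdout points whose true values fall inside the
        # nominal t-based prediction intervals
        tcrit = stats.t.ppf(0.5 + level / 2.0, df)
        return np.mean(np.abs(y_holdout - y_pred) <= tcrit * se_pred)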
2.4.2 Designs

The variability in performance over equivalent designs is a striking phenomenon in the analyses of the previous sections and raises questions about how to cope with what seems to be unavoidable bad luck. Are there sequential strategies that can reduce the variability? Are there advantageous design types, more robust to arbitrary symmetries? For example, does it matter whether a random LHD, an mLHD, or an orthogonal LHD is used? The latter question will be addressed in detail in chapter 4. That design has a strong role is both unsurprising and surprising. It is not surprising that care must be taken in planning an experiment; it is surprising and perplexing that equivalent designs can lead to such large differences in performance that are not mediated by good analytic procedures.

2.4.3 Larger Sample Sizes

As noted in sections 2.1 and 2.2, our attention is on experiments where n is small or modest at most. With advances in computing power it becomes more feasible to mount experiments with larger values of n while, at the same time, more complex codes become feasible but only with limited n. Our focus continues to be on the latter and the utility of GaSP models in that context.

As n gets larger, Figure 2.1 illustrates that the differences in accuracy among choices of R and the regression component begin to vanish. Indeed, it is not even clear that using GaSP models for large n is useful; standard function-fitting methods such as splines may well be competitive and easier to compute. In addition, when n is large, non-stationary behaviour can become apparent, encouraging variations in the GaSP methodology such as decomposing the input space (as in Gramacy and Lee (2008)) or using a complex regression component together with a computationally more tractable R (as in Kaufman et al. (2011)). Comparison of alternatives when n is large is yet to be considered.

2.5 Conclusions and Recommendations

We have stressed the importance of going beyond "anecdotes" in making claims for proposed methods. While this point is neither novel nor startling, it is one that is commonly ignored, often because the process of studying consequences under multiple conditions is more laborious. The borehole example (Figure 2.1), for instance, employs 75 experiments arising from 25 repeats of each of 3 base experiments.

The studies in the previous sections lead to the following conclusions:

• There is no evidence that GaSP(Const, PowExp) is ever dominated by the use of regression terms or by other choices of R. Moreover, we have found that the inclusion of regression terms makes the likelihood surface multi-modal, necessitating an increase in computational effort for empirical Bayes or full Bayesian methods. This appears to be due to confounding between the regression terms and the GP paths. GaSP(Const, PowExp) is therefore the method we recommend when faced with a particular model and a set of runs, when the primary goal is to make predictions for untried sets of inputs.

• Choosing R = SqExp, though common, can be unwise. The Matérn function optimized over a few levels of smoothness is a reasonable alternative to PowExp.

• Design matters but cannot be controlled completely. Variability of performance from equivalent designs can be uncomfortably large.

There is not enough evidence to settle the following questions:

• Are full Bayes methods ever more accurate than GaSP(Const, PowExp)? Full Bayes methods relying on R = SqExp were seen to be sometimes inferior, and extensions to accommodate less smooth R, such as PowExp, are needed. This will be addressed in the next chapter.

• Are composite GaSP methods ever better than GaSP(Const, PowExp) in practical settings where the output exhibits non-stationary behaviour?
This is still an open question.

Chapter 3

Flexible Correlation Structures in Bayesian Gaussian Process

This chapter is a follow-up to chapter 2, where we observed that the prediction accuracy of a GP model with the SqExp structure can be worse than that of a GP model with a more flexible structure, such as the PowExp and the Matérn in section 2.2. In practice, the GP parameters β, σ², and the correlation parameters are unknown and hence need to be estimated from the available data. Well-known estimation methods can be broadly classified into two categories based on the underlying paradigm. The first is the empirical Bayes method, where MLEs are used; see, for example, the algorithm of Welch et al. (1992). The second category is Bayesian posterior inference, such as that proposed by Higdon et al. (2008) and the treed GP method used by Gramacy and Lee (2008). The Bayesian methods, in principle, take account of parameter-estimation uncertainty and produce better coverage probabilities for credible intervals. However, in chapter 2, we noted that this is not always true when a GP with a squared exponential correlation function is used. In this chapter, we propose new Bayesian methods with more flexible, power exponential or Matérn, correlation structures.

The focus here is quantifying the uncertainty from the statistical emulator of the computer model, including the contribution from parameter estimation. There are other sources of uncertainty, such as the systematic discrepancy between the computer model and physical measurements of the system (Kennedy and O'Hagan, 2001), not explicitly considered here. The proposed methods have relevance, however, because modelling the computer-model data is an important component of analyses addressing other objectives, such as optimization and contour estimation. In addition, in this chapter we consider the Const regression component only; the regression component therefore reduces to a scalar β.

The rest of the chapter is organized as follows. We first review two full Bayes methods in section 3.1. An example motivating the new Bayesian methods is analyzed in section 3.2. In section 3.3, we propose Bayesian methods with a power exponential or Matérn correlation structure. They are evaluated in section 3.4 through application examples and a simulation study. Some concluding remarks are made in section 3.5.

3.1 Review of Two Full Bayes Methods

Bayes with a squared exponential correlation (Version K)

The first fully Bayes method is by Kennedy (2004), and hence we call it SqExp-Full (K). It has the following independent prior distributions on the GP parameters:

• an improper uniform distribution (normal with infinite variance) on β;

• a Jeffreys prior on σ², i.e., π(σ²) ∝ 1/σ²; and

• an exponential distribution with rate 0.1 on each sensitivity parameter θ_j.

By integrating out β and σ², Handcock and Stein (1993) gave the marginal posterior distribution of the SqExp correlation parameters θ as

    π(θ | y) ∝ ∫∫ π(β) π(σ²) π(θ) L(y | β, σ², θ) dβ dσ²
             ∝ ∏_{j=1}^d π(θ_j) / [ (σ̂²_θ)^{(n−1)/2} det^{1/2}(R) det^{1/2}(1' R^{−1} 1) ],    (3.1)

where π(θ_j) is the prior distribution of θ_j, β̂ is as in (1.6), and σ̂²_θ is given by

    σ̂²_θ = (1 / (n − 1)) (y − 1β̂)' R^{−1} (y − 1β̂).    (3.2)

Except for a change in degrees of freedom, σ̂²_θ in (3.2) is the same as σ̂² in (1.7). The Metropolis-Hastings algorithm is then applied to sample from the posterior distribution of the θ_j.
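The quantities entering (3.1) and (3.2) are quick to compute. In the sketch below, β̂ is the usual generalized least squares estimate for a constant mean, β̂ = (1'R⁻¹1)⁻¹ 1'R⁻¹y, which we take to be the form of (1.6) (an assumption, since (1.6) is not reproduced in this chapter):

    import numpy as np

    def gls_estimates(y, R):
        # beta-hat and sigma^2-hat, as in (3.2), for a constant-mean GP
        n = len(y)
        ones = np.ones(n)
        Rinv_1 = np.linalg.solve(R, ones)
        Rinv_y = np.linalg.solve(R, y)
        beta_hat = (ones @ Rinv_y) / (ones @ Rinv_1)
        resid = y - beta_hat
        sigma2_hat = (resid @ np.linalg.solve(R, resid)) / (n - 1)
        return beta_hat, sigma2_hat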
The predictive distribution conditional on θ is a non-central t distribution with n − 1 degrees of freedom (Kennedy, 2004). That is,

    y(x* | θ, y) ∼ t( m̂_θ(x*), v̂_θ(x*) ),    (3.3)

where m̂_θ(x*) and v̂_θ(x*) were given in (1.8) and (1.9).

Bayes with a squared exponential correlation (Version H)

A second full Bayes implementation based on the SqExp correlation function was described by Higdon et al. (2008), and we call it SqExp-Full (H). The independent prior distributions on the GP parameters are now:

• an improper uniform distribution (normal with infinite variance) on β;

• an inverse-gamma distribution IG(φ_1, φ_2) on σ², with shape parameter φ_1 = 1 and scale φ_2 = 0.0001; and

• a beta(1, 0.1) distribution on ρ_j = exp(−θ_j / 4).

A beta prior on ρ_j is equivalent to a log-beta distribution on θ_j. The parameters β and σ² can be marginalized out of the posterior (Higdon et al., 2008), and a Markov chain Monte Carlo (MCMC) algorithm is applied to sample the ρ_j. The predictive distribution at an untried input vector is the same as (3.3), but with n − 1 + 2φ_1 as the degrees of freedom in the denominator of σ̂²_θ in (3.2) and in the non-central t distribution in (3.3).

Chen (2013) showed that the different priors on σ² in SqExp-Full (K) and SqExp-Full (H) do not substantially affect predictive properties, but the priors for θ_j are an important distinction. Besides these two fully Bayes methods, there exist several other fully Bayes implementations, for example, the treed GP (TGP) method (Gramacy and Lee, 2008). The aim of the chapter, however, is not to compare existing estimation methods; a comprehensive comparison of Bayesian methods can be found in Chen (2013). Rather, the purpose is to introduce Bayesian implementations with flexible correlation structures.
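Putting the common machinery of the two reviewed methods together, the conditional predictive distribution (3.3) yields credible intervals as sketched below. The mean and variance formulas used are the standard kriging ones, consistent with the form of (3.7) later in this chapter when φ_1 = φ_2 = 0; we assume this is what (1.8) and (1.9) contain, since they are not reproduced here.

    import numpy as np
    from scipy import stats

    def predictive_interval(r, R, y, beta_hat, sigma2_hat, level=0.95):
        # r: correlations between x* and the n training inputs
        n = len(y)
        ones = np.ones(n)
        Rinv_r = np.linalg.solve(R, r)
        m_hat = beta_hat + Rinv_r @ (y - beta_hat)            # predictive mean
        v_hat = sigma2_hat * (1.0 - r @ Rinv_r
                + (1.0 - ones @ Rinv_r) ** 2
                / (ones @ np.linalg.solve(R, ones)))          # predictive variance
        tcrit = stats.t.ppf(0.5 + level / 2.0, df=n - 1)      # n - 1 df, per (3.3)
        half = tcrit * np.sqrt(v_hat)
        return m_hat, v_hat, (m_hat - half, m_hat + half)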
3.2 A Motivating Example: Nilson-Kuusk Model

Abt (1999) noted that the extra uncertainty from the use of empirical Bayes plug-in MLEs of the correlation parameters can be non-trivial. In principle, a Bayesian method should quantify parameter-estimation uncertainty and thus produce better coverage probabilities for credible intervals. We show in this section, however, that empirical Bayes versus fully Bayes may be less important for uncertainty quantification than the correlation-function family.

We consider an ecological computer code that models reflectance for a plant canopy. It was developed by Nilson and Kuusk (1989) and analyzed by Bastos and O'Hagan (2009) to illustrate diagnostics for GP emulators. Two computer experiments are available for this code with d = 5 inputs: the first is a 150-point LHD, and the second is an independent 100-point random LHD. We randomly sample 100 points from the 150-point set as the training set for a GP model. The remaining 50 points are added to the 100-point set to form a 150-point holdout set. Twenty-five different random samples are used, so each method considered below produces 25 repeated experiments and 25 sets of results. This approach is similar to that of chapter 2, in which we noted that the variation in results from one design to another can be considerable. Hence, throughout the chapter we report results from multiple designs, obtained either by random sampling or by permuting the columns of a base design.

As recommended in chapter 2, a GP with the Const regression term is chosen as the statistical model to emulate the Nilson-Kuusk model. We consider the following four methods, which have different combinations of correlation structures and inference paradigms:

• SqExp-Emp (abbreviated Sq-Emp in figures), i.e., SqExp correlation and empirical Bayes;

• SqExp-Full (K) (abbreviated Sq-Full (K) in figures), i.e., SqExp correlation and full Bayes with priors from Kennedy (2004), reviewed in section 3.1;

• SqExp-Full (H) (abbreviated Sq-Full (H) in figures), i.e., SqExp correlation and full Bayes with priors from Higdon et al. (2008), reviewed in section 3.1; and

• PowExp-Emp (abbreviated Pow-Emp in figures), i.e., PowExp correlation and empirical Bayes.

To measure prediction accuracy, we use the normalized RMSE and normalized maximum absolute error in (2.5). How well a method quantifies uncertainty is measured by the actual coverage probability (ACP). We set α = 0.05 throughout, for a nominal (correct) coverage probability of 0.95. A number much less than 0.95 is called under-coverage, which indicates that the method does not fully quantify uncertainty.

Results for the Nilson-Kuusk example are presented in Figure 3.1. In the left panel, the boxplots of normalized RMSE over the 25 repeat experiments show that the best prediction accuracy among the four competing methods is from PowExp-Emp; it clearly outperforms all three methods with the SqExp structure. The normalized maximum error results in the appendix show the same pattern. The ACP results in the right panel of Figure 3.1 show that the three methods with the SqExp structure dramatically under-cover. PowExp-Emp has the best performance again.

[Figure 3.1: Normalized RMSE (left panel) and actual coverage probability (right panel) for the Nilson-Kuusk model from four methods: SqExp-Emp, SqExp-Full (K), SqExp-Full (H) and PowExp-Emp. The boxplots show the results from 25 random training-test data splits. The horizontal line in the right panel is the nominal coverage probability, 0.95.]

These accuracy and coverage results are likely due to the effect of the fifth input to the Nilson-Kuusk model. Its estimated main effect (Schonlau and Welch, 2005) from a PowExp-Emp analysis accounts for about 90% of the total variance of the predicted output over the 5-dimensional input space. The median of the 25 MLEs of α_5 in the PowExp correlation function is 1.85, which deviates from the fixed power of 2 in SqExp. All of this suggests that full Bayes methods with the SqExp structure need to be generalized to incorporate more flexibility to deal with a computer model like Nilson-Kuusk, for better accuracy and better uncertainty quantification.

Unfortunately, PowExp-Emp is not a panacea: it does not always work as well as it does for the Nilson-Kuusk example. In chapter 2, we noted that its ACP is much worse than that of the SqExp-Full (K) method for the borehole model. New methods are needed.

3.3 New Bayesian Methods

In this section we outline new Bayesian methods for the PowExp and Matérn correlation functions. We first describe the priors for the parameters treated in a fully Bayesian way; any further correlation parameters for a particular method are estimated by empirical Bayes. We then describe a general approach to implement all methods, by outlining the marginal posterior distribution of all correlation parameters estimated by Bayes' rule and the algorithm for sampling them via MCMC.
3.3.1 Priors for β, σ² and θ

Similar to SqExp-Full (Version H), all the new methods treat β, σ² and the sensitivity parameters θ in a fully Bayesian way. The independent priors are:

• a uniform distribution, U(−1000, 1000), on β, which approximates an improper normal distribution;

• an inverse-gamma IG(φ_1, φ_2) distribution on σ², where φ_1 → 0 and φ_2 → 0 (equivalent to the Jeffreys prior); and

• an independent beta(1, 0.5) distribution on ρ_j = exp(−θ_j / 4), for j = 1, ..., d.

The Jeffreys prior is an improper prior distribution on σ². However, it does not cause any problems when integrating σ² out of the joint posterior distribution of all parameters; see the derivation of the marginal posterior distribution of the correlation parameters in appendix B.2.

A beta(1, 0.5) prior on the ρ_j scale is equivalent to assuming a log-beta distribution for θ_j:

    π(θ_j) = (1/8) exp(−θ_j / 4) (1 − exp(−θ_j / 4))^{−1/2}    (j = 1, ..., d).

Figure 3.2(a) shows that this log-beta prior places heavy weight on small θ_j values. The preference for small values of θ_j reflects a prior belief that the underlying function is predictable, since a small θ_j value means strong correlation between neighbouring points in the design space.

When implementing the algorithm, we actually work with ω_j = ln(ρ_j / (1 − ρ_j)). After the logistic transformation, ω_j is unconstrained, facilitating proposals in MCMC. The implied prior density on ω_j is

    π(ω_j) = (1/2) ( exp(−ω_j) / (1 + exp(−ω_j)) )^{−1/2} ( exp(ω_j) / (1 + exp(ω_j))^2 )    (j = 1, ..., d).    (3.4)

[Figure 3.2: Prior densities: (a) prior on θ_j and (b) prior on α_j.]

The priors above are closely related to the SqExp-Full (H) method of Higdon et al. (2008). Chen (2013) found through simulation that the log-beta prior on θ_j, which favours small θ_j, is highly preferred. The improper normal prior on β and inverse-gamma prior on σ² are well-known conjugate priors, which makes it easy to integrate them out when deriving the marginal posterior of the θ_j, or equivalently the ω_j.

3.3.2 Power-Exponential-Hybrid

We explore two approaches to deal with the smoothness parameters, α, of PowExp. The first takes the empirical Bayes paradigm for these parameters only, plugging in the MLEs of the α_j. As the remaining parameters are treated in a fully Bayesian way (see section 3.3.1), we call this method PowExp-Hybrid (abbreviated Pow-Hyb in figures).

Spiller et al. (2014) recently proposed a Bayesian method in which they fix α_j = 1.9 and estimate log(θ_j) by its posterior mean using a reference prior (Paulo, 2005). This approach differs from the PowExp-Hybrid method we propose here in two respects: (1) we estimate the smoothness parameters instead of fixing them, and (2) we use a different prior distribution on the sensitivity parameters, θ_j.

3.3.3 Power-Exponential-Full

The second new PowExp implementation is fully Bayesian and hence abbreviated PowExp-Full (further abbreviated Pow-Full in figures). Besides the priors specified above, it needs priors on the smoothness parameters α_j. We actually work on the logistic scale τ_j, where

    α_j = 1 + 1 / (1 + exp(−τ_j)).

Like the ω_j for the sensitivity parameters, τ_j is unrestricted, always giving α_j ∈ (1, 2). In practice, we take an independent uniform prior, U(−20, 20), on τ_j. After a transformation of variables, the implied prior distribution for α_j is

    π(α_j) = (1/40) · 1 / ((α_j − 1)(2 − α_j))    (j = 1, ..., d),    (3.5)

as plotted in Figure 3.2(b).
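The transformations and implied prior densities above translate directly into code (a sketch on the θ and α scales; the names are ours):

    import numpy as np

    def rho_to_theta(rho):
        # rho_j = exp(-theta_j / 4)  <=>  theta_j = -4 ln(rho_j)
        return -4.0 * np.log(rho)

    def prior_theta(theta):
        # log-beta prior density on theta_j implied by beta(1, 0.5) on rho_j
        return np.exp(-theta / 4.0) / (8.0 * np.sqrt(1.0 - np.exp(-theta / 4.0)))

    def prior_alpha(alpha):
        # implied prior (3.5) on alpha_j from tau_j ~ U(-20, 20)
        return 1.0 / (40.0 * (alpha - 1.0) * (2.0 - alpha))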
This prior assigns heavy weight to α_j values up to about 1.2 and from about 1.8.

The specification of this prior on α_j (via τ_j) was the result of much trial and error. We tried independent uniform priors on the α_j, which is straightforward, but the results were not satisfactory for the borehole example in section 3.4.3. We also considered independent uniform U(0, π) priors on τ_j, with α_j = 1 + sin(τ_j). This implied prior for α_j has heavy weight on values close to the boundary 2, the qualitative feature we were seeking, but it did not work well for the Nilson-Kuusk model. Several other priors were considered, but only the prior in (3.5), from a logistic transformation, worked for all the examples in section 3.4. It is used for all results reported for PowExp-Full.

3.3.4 Matérn-Hybrid

A Matérn-Hybrid implementation parallels PowExp-Hybrid: the Matérn smoothness parameters ν_j in (2.3) are estimated by MLE, with each dimension allowed its own estimate from the four levels of smoothness. It is applied in sections 3.4.1 and 3.4.2 to two computer codes where the correlation structure is known to be important from the analysis in chapter 2. The method is abbreviated Mat-Hyb in figures.

3.3.5 Marginal posterior distribution of the correlation parameters

For any method, denote by ψ_B the correlation parameters treated in a fully Bayesian way and sampled by MCMC. Any remaining correlation parameters are estimated by empirical Bayes. Thus, for the methods to be compared,

    ψ_B = ω         (SqExp-Full, with all α_j fixed at 2)
          (ω, τ)    (PowExp-Full)
          ω         (PowExp-Hybrid, with all α_j replaced by their MLEs)
          ω         (Matérn-Hybrid, with all ν_j replaced by their MLEs).

All methods have the priors for β, σ², and the ω_j prescribed in section 3.3.1. Recall that ω_j and τ_j are logistic transformations of the sensitivity parameter θ_j and the power α_j, respectively.

We want the marginal posterior distribution of ψ_B after integrating out β and σ². With the priors π(β) ∝ 1 and π(σ²) = IG(φ_1, φ_2), as in section 3.3.1, we have

    π(ψ_B | y) ∝ π(ψ_B) / [ |R|^{1/2} (1' R^{−1} 1)^{1/2} ( φ_2 + ((n − 1)/2) σ̂²_{ψ_B} )^{φ_1 + (n−1)/2} ],    (3.6)

where |R| denotes the determinant of R, and σ̂²_{ψ_B} is as in (1.7) but parameterized in terms of ψ_B. A derivation may be found in the appendix. The prior π(ψ_B) is a product of the independent priors (3.4) for the ω_j and, in the case of PowExp-Full, the independent uniform priors of section 3.3.3 for the τ_j. For PowExp-Hybrid and Matérn-Hybrid, both sides of (3.6) are conditional on the MLEs of the α_j or ν_j, respectively.

The parameters ψ_B are sampled from the marginal posterior distribution by the algorithm in section 3.3.6. According to Santner et al. (2003), the predictive distribution of y(x* | y, ψ_B) is as in (3.3), but again the degrees of freedom are 2φ_1 + n − 1, and the predictive variance changes to

    v̂_{ψ_B}(x*) = ( ((n − 1) σ̂²_{ψ_B} + 2φ_2) / (2φ_1 + n − 1) ) ( 1 − r'(x*) R^{−1} r(x*) + (1 − 1' R^{−1} r(x*))^2 / (1' R^{−1} 1) ).    (3.7)
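A generic sketch of the unnormalized log of (3.6) follows; build_R and log_prior are placeholders to be supplied per method (for PowExp-Full, psi packs both ω and τ):

    import numpy as np

    def log_marginal_posterior(psi, y, build_R, log_prior, phi1=0.0, phi2=0.0):
        # unnormalized log pi(psi_B | y) in (3.6)
        n = len(y)
        ones = np.ones(n)
        R = build_R(psi)                      # correlation matrix from psi_B
        _, logdetR = np.linalg.slogdet(R)
        Rinv_1 = np.linalg.solve(R, ones)
        beta_hat = (ones @ np.linalg.solve(R, y)) / (ones @ Rinv_1)
        resid = y - beta_hat
        sigma2_hat = (resid @ np.linalg.solve(R, resid)) / (n - 1)
        return (log_prior(psi) - 0.5 * logdetR - 0.5 * np.log(ones @ Rinv_1)
                - (phi1 + 0.5 * (n - 1))
                * np.log(phi2 + 0.5 * (n - 1) * sigma2_hat))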
3.3.6 Metropolis-Hastings algorithm

Taking the PowExp-Full method for illustration, we briefly describe how Metropolis-Hastings (M-H) is used to sample the posterior distribution of ψ and ν. At iteration i, the algorithm cycles through the input dimensions, making a proposal for each parameter in turn. The details are in Algorithm 1.

Algorithm 1: Implementation of the M-H algorithm

At iteration i, to update dimension j, denote the current values of ψ and ν by
ψ_(j)^[i] = (ψ_1^[i], …, ψ_{j−1}^[i], ψ_j^[i−1], ψ_{j+1}^[i−1], …, ψ_d^[i−1]) and
ν_(j)^[i] = (ν_1^[i], …, ν_{j−1}^[i], ν_j^[i−1], ν_{j+1}^[i−1], …, ν_d^[i−1]).

1. Use the adaptive proposal q(ψ_j^[i−1]) in (3.8) to generate a candidate ψ_j*.
2. Randomly sample u from U(0, 1).
3. Let ψ* = (ψ_1^[i], …, ψ_{j−1}^[i], ψ_j*, ψ_{j+1}^[i−1], …, ψ_d^[i−1]). If ln(u) < ln π(ψ*, ν_(j)^[i] | y) − ln π(ψ_(j)^[i], ν_(j)^[i] | y), set ψ_j^[i] = ψ_j*; otherwise set ψ_j^[i] = ψ_j^[i−1]. The posterior distribution used here is π(θ_B | y) in (3.6) with θ_B = (ψ, ν). We now have the updated vector ψ_(j+1)^[i] = (ψ_1^[i], …, ψ_j^[i], ψ_{j+1}^[i−1], …, ψ_d^[i−1]).
4. Use the adaptive proposal q(ν_j^[i−1]) in (3.8) to generate a candidate ν_j*.
5. Randomly sample another u from U(0, 1).
6. Let ν* = (ν_1^[i], …, ν_{j−1}^[i], ν_j*, ν_{j+1}^[i−1], …, ν_d^[i−1]). If ln(u) < ln π(ψ_(j+1)^[i], ν* | y) − ln π(ψ_(j+1)^[i], ν_(j)^[i] | y), set ν_j^[i] = ν_j*; otherwise set ν_j^[i] = ν_j^[i−1]. We now have the updated vector ν_(j+1)^[i] = (ν_1^[i], …, ν_j^[i], ν_{j+1}^[i−1], …, ν_d^[i−1]).

For an efficient algorithm, the step length for a proposal is important, and we use the adaptive MCMC algorithm of Roberts and Rosenthal (2009). Let ξ denote the value at iteration i of a parameter under consideration for transition. The candidate value is generated from

    q(ξ) = 0.95 N(ξ, 2.38² s²) + 0.05 N(ξ, 0.1²),   (3.8)

where s² is the sample variance of the sampled values of ξ up to iteration i, which adapts as the algorithm iterates. Efficiency is achieved by this mixture proposal for all the examples we consider in section 3.4, in the sense that almost all acceptance rates are between 0.15 and 0.50 (Rosenthal, 2014). More details may be found in the appendix.

For any method, at iteration i after all dimensions have been updated, θ_B^[i] has been sampled from the posterior. The sample leads to a conditional predictive mean, m̂_{θB^[i]}(x), and conditional predictive variance, v̂_{θB^[i]}(x), computed according to (1.3) and (3.7), respectively. The overall predictor is defined as

    m̂(x) = (1/B) Σ_{i=1}^{B} m̂_{θB^[i]}(x),

where B is the number of samples. The predictive variance is obtained through the law of total variance. That is,

    v̂(x) = (1/B) Σ_{i=1}^{B} v̂_{θB^[i]}(x) + (1/(B−1)) Σ_{i=1}^{B} [m̂_{θB^[i]}(x) − m̂(x)]².

The M-H algorithm is simpler for SqExp-Full, PowExp-Hybrid, and Matérn-Hybrid, because it needs to iterate over the ψ sensitivity parameters only. Thus, these methods take half the computational time per iteration in the M-H algorithm compared with PowExp-Full. (The hybrid methods have to find MLEs of the smoothness parameters, however, before running M-H.)
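As an illustration of the proposal mechanism, here is a minimal R sketch, ours rather than thesis code, of (3.8) together with one accept/reject step for a single logit-scale coordinate; log_post stands for a function such as log_marg_post sketched after (3.7).

```r
## Adaptive mixture proposal (3.8) and one M-H update for coordinate j (sketch).
propose <- function(xi, s2)
  if (runif(1) < 0.95) rnorm(1, xi, 2.38 * sqrt(s2)) else rnorm(1, xi, 0.1)

mh_update <- function(par, j, s2_j, log_post) {
  cand    <- par
  cand[j] <- propose(par[j], s2_j)          # step 1 (or 4) of Algorithm 1
  u       <- runif(1)                        # step 2 (or 5)
  if (log(u) < log_post(cand) - log_post(par)) cand else par  # step 3 (or 6)
}
```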
3.4 Applications and Simulation Study

We now evaluate the performances of six methods: SqExp-Emp, SqExp-Full, PowExp-Emp, PowExp-Full, PowExp-Hybrid, and Matérn-Hybrid. As in section 3.2, normalized RMSE and normalized maximum absolute error are used to evaluate the prediction accuracy, and the ACP is used to assess the uncertainty quantification. The nominal coverage probability is set to 0.95.

3.4.1 Nilson-Kuusk Model

We revisit the Nilson-Kuusk code with repeat experiments generated as in section 3.2. The boxplots of normalized RMSE in the left panel of Figure 3.3 show that the PowExp and Matérn correlation functions lead to noticeably better prediction accuracy than SqExp. Empirical Bayes versus full Bayes versus hybrid Bayes makes little difference here. The results for normalized maximum absolute error in the appendix follow a similar pattern.

Figure 3.3: Normalized RMSE (left panel) and actual coverage probability (right panel) for the Nilson-Kuusk code from six methods: SqExp-Emp, SqExp-Full, PowExp-Emp, PowExp-Full, PowExp-Hybrid and Matérn-Hybrid. The boxplots show the results from 25 random training-test data splits. The horizontal line in the right panel is the nominal coverage probability, 0.95.

Sample means of normalized RMSE over the repeat experiments are reported in Table 3.1 for the four methods giving clearly superior prediction accuracy here: those using either a PowExp or Matérn correlation function. The small differences in sample means and small standard errors of the mean differences also reported in Table 3.1 indicate that these four methods have similar prediction accuracies on average.

                 PowExp-Emp        PowExp-Full       PowExp-Hybrid     Matérn-Hybrid
PowExp-Emp       0.0955            -0.0010 (0.0005)  -0.0002 (0.0002)  -0.0003 (0.0008)
PowExp-Full      0.0010 (0.0005)   0.0965            0.0008 (0.0005)   0.0007 (0.0007)
PowExp-Hybrid    0.0002 (0.0002)   -0.0008 (0.0005)  0.0957            -0.0001 (0.0008)
Matérn-Hybrid    0.0003 (0.0008)   -0.0007 (0.0007)  0.0001 (0.0008)   0.0958

Table 3.1: Sample means of the 25 normalized RMSEs from repeat experiments with the Nilson-Kuusk model for four methods. The sample means are in the diagonal entries of the table. An off-diagonal entry shows the difference in sample means between the row method and the column method, with the standard error of the mean difference in parentheses.

The ACP results in the right panel of Figure 3.3 also demonstrate the advantage of the PowExp and Matérn correlation functions versus SqExp, with the three new Bayesian methods coming closest to the nominal coverage probability of 0.95. Equipping Bayesian methods with a PowExp or Matérn structure is advantageous here in terms of uncertainty quantification relative to SqExp, even if the latter has a fully Bayesian treatment. These findings are confirmed by Table 3.2, which compares the four methods using a PowExp or Matérn structure in terms of |ACP − 0.95|, the absolute deviation of ACP from the nominal coverage probability.

                 PowExp-Emp        PowExp-Full       PowExp-Hybrid     Matérn-Hybrid
PowExp-Emp       0.0388            0.0203 (0.0045)   0.0083 (0.0024)   0.0184 (0.0039)
PowExp-Full      -0.0203 (0.0045)  0.0185            -0.0120 (0.0032)  -0.0019 (0.0027)
PowExp-Hybrid    -0.0083 (0.0024)  0.0120 (0.0032)   0.0305            0.0103 (0.0031)
Matérn-Hybrid    -0.0184 (0.0039)  0.0019 (0.0027)   -0.0103 (0.0031)  0.0204

Table 3.2: Sample means of |ACP − 0.95| from 25 repeat experiments with the Nilson-Kuusk model. The sample means of the four methods are in the diagonal entries of the table. An off-diagonal entry shows the difference in sample means between the row method and the column method, with the standard error of the mean difference in parentheses.

PowExp-Emp stands out with the largest mean absolute deviation from nominal coverage, and the observed mean difference of 0.0203 relative to PowExp-Full is large relative to the standard error of the mean difference.
The differences in averages and their standard errors are all fairly small for PowExp-Full, PowExp-Hybrid, and Matérn-Hybrid, indicating that these methods perform about the same, as was seen in Figure 3.3.

To summarize the results for the Nilson-Kuusk code, use of a flexible correlation structure (PowExp or Matérn) leads to non-trivial improvements in prediction accuracy relative to SqExp. Uncertainty quantification is also much more reliable with PowExp or Matérn, with full or hybrid Bayesian implementations adding to the advantage.

3.4.2 Volcano Model

A computer model of pyroclastic flow from a volcano eruption was studied by Bayarri et al. (2009). The two inputs are the initial volume, x₁, and direction, x₂, of the eruption, and the output, y, is the maximum height of the flow at a specific location.

The non-negative output suggests a transformation may be helpful. Bayarri et al. (2009) transformed y to log(y + 1). In chapter 2, we studied both the log(y + 1) and √y transformations, as we do here. A 32-point data set is available. We sample 25 of the 32 points as a training set and leave the remaining 7 points as a hold-out set for testing. Sampling is repeated 100 times.

The normalized RMSE results are shown in Figure 3.4.

Figure 3.4: Normalized RMSE for the volcano code from six methods: SqExp-Emp, SqExp-Full, PowExp-Emp, PowExp-Full, PowExp-Hybrid and Matérn-Hybrid. Each method has 100 RMSE values from 100 random training-test data splits. Two transformations of y are considered: √y (left panel) and log(y + 1) (right panel).

For both output transformations, methods with a flexible correlation structure (PowExp-Emp, PowExp-Full, PowExp-Hybrid, and Matérn-Hybrid) have higher accuracy than SqExp-Emp and SqExp-Full. Results for normalized maximum absolute error presented in the appendix show a similar pattern.

Table 3.3 reports the ACP of each method, obtained as the number of 95% confidence/credible intervals containing the true output value divided by the total number of predictions, 700.

                   Actual coverage probability
                   √y      log(y + 1)
SqExp-Emp          0.63    0.71
SqExp-Full         0.78    0.82
PowExp-Emp         0.86    0.85
PowExp-Full        0.92    0.91
PowExp-Hybrid      0.90    0.87
Matérn-Hybrid      0.90    0.88

Table 3.3: Actual coverage probability for the volcano code, computed as the number of 95% confidence/credible intervals containing the true output of each method divided by the total number of predictions, 700. The nominal coverage is 0.95.

The ACPs of PowExp-Full, PowExp-Hybrid, and Matérn-Hybrid are closest to 0.95, though they slightly under-cover. The SqExp-Emp approach substantially under-covers, with SqExp-Full faring little better. Thus, again the correlation function is more important than the estimation paradigm here, though Bayesian methods, with a full or hybrid implementation, make further improvements.

3.4.3 Borehole Function

We consider the Borehole function again as a test bed. Two base designs, with 80 or 200 points, are explored; both are approximate mLHDs (McKay et al., 1979) with the maximin criterion adapted to have the desired space-filling properties in all 2-dimensional projections (Welch et al., 1996). The columns of each base design are permuted at random to generate 25 different but equivalent designs and hence 25 repeat experiments. The hold-out set is 10,000 points drawn from one random LHD.
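Permuting columns preserves the one-dimensional and distance-based properties of a design while changing which input variable is assigned to which column. A minimal R sketch of this repeat-experiment device (our illustration):

```r
## 25 equivalent designs from one n x d base design by column permutation.
equivalent_designs <- function(base, n_rep = 25)
  lapply(seq_len(n_rep), function(r) base[, sample(ncol(base))])
```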
For n = 80, the left panel of Figure 3.5 shows that the prediction accuracies of the six competing methods are similar. The ACP performances of the methods are seen to differ, however, in the right panel of Figure 3.5. SqExp-Emp and PowExp-Emp show under-coverage, while the four Bayesian implementations have near-nominal coverage probabilities, with PowExp-Full and Matérn-Hybrid the best.

Figure 3.5: Normalized RMSE (left panel) and actual coverage probability (right panel) for the Borehole function and an 80-point mLHD base design. The boxplots show the results from 25 repeat experiments permuting the columns of the base design. The horizontal line in the right panel is the nominal coverage probability, 0.95.

When the sample size is increased to 200, however, the prediction accuracies depicted in the left panel of Figure 3.6 show that methods using the PowExp correlation structure are noticeably more accurate than those employing SqExp, and slightly better than those using Matérn. A similar pattern is seen for normalized maximum absolute error in the appendix.

Figure 3.6: Normalized RMSE (left panel) and actual coverage probability (right panel) for the Borehole function and a 200-point mLHD. The boxplots show the results from 25 random training-test data splits. The horizontal line in the right panel is the nominal coverage probability, 0.95.

The ACP results depicted in the right panel of Figure 3.6 demonstrate that the SqExp correlation structure leads to under-coverage, even with a Bayesian implementation. A possible explanation is that the impact of any inadequacy of the GP statistical model with SqExp in modelling the borehole function increases with sample size. PowExp-Emp also under-covers, though less, while the three Bayesian methods using PowExp or Matérn again have ACP closest to the nominal coverage probability.

3.4.4 PTW Model

Preston et al. (2003) developed the PTW model to describe the plastic deformation of metals. For our purposes the model contains d = 11 input parameters. We consider a 110-point mLHD as the base design, with its columns permuted at random to generate 25 different but equivalent designs. The hold-out set is 10,000 points generated from one random LHD.

The results are presented in Figure 3.7. The normalized RMSE values in the left panel show the four methods employing the PowExp or Matérn correlation functions slightly outperforming both methods with SqExp.

Figure 3.7: Normalized RMSE (left panel) and actual coverage probability (right panel) for the PTW model and a 110-point mLHD. The boxplots show the results from 25 random repeat experiments permuting the columns of the base design. The horizontal line in the right panel is the nominal coverage probability, 0.95.

The normalized maximum absolute error results in the appendix show a similar pattern.
The right panel of Figure 3.7 indicates, however, thatthe four fully Bayesian or hybrid methods have clearly better uncertainty-quantification properties than the two empirical Bayes methods, which under-cover as expected.HCIC5 himulvtions from GesThe applications so far assess the new Bayesian methods on computercodes, meaning that uncertainty quantification includes model misspecifica-tion in representing a code by a statistical GP model. We now consider idealsituations, where functions are realizations of GPs.The simulation settings are as follows. There are y = 10 inputs (results,not shown, for y = 5 are similar). The true value of  is 0 and 2 = 1; thereis little loss of generality here as the priors proposed in section 3.3.1 allowmuch latitude for location and scale. Three true correlation functions areconsidered: SqExp, PowExp with with all j set to 1.8, and Mate´rn with allj = 1. Thus, the realized functions have various degrees of smoothness. Thetrue values of the j are given in Table 3.4; they are a canonical configurationgenerated by the method of Loeppky et al. (2009) with  = 1 and w = 3.A training sample size of n = 100 gives reasonable prediction accuracy forrealizations from all three GP families.j 1 2 3 4 5 6 7 8 9 10j 0.271 0.217 0.169 0.127 0.091 0.061 0.037 0.019 0.007 0.001Table 3.4: Values of the j for simulation.Given one of the GP settings for generating data, there are 25 repeat ex-periments from different random LHD training designs. A set of 100 trainingpoints is combined with a 2000-point random LHD for the hold-out set (thesame hold-out set is used for all 25 repeats), and a GP realization is sam-pled from the multivariate normal at the 100 + 2000 points for the trainingand hold-out output data. All six of the methods SqExp-Emp, SqExp-Full,PowExp-Emp, PowExp-Full, PowExp-Hybrid, and Mate´rn-Hybrid are fit tothe data, with all relevant parameters estimated. For instance, the j and j643.4. 4cclicTgibaf TaW FimhlTgiba FghWlare estimated for PowExp, by empirical Bayes, full Bayes, or hybrid Bayes.Figure 3.8 shows the performances of the six methods when data aregenerated by a GP with SqExp correlation. As SqExp is a special caseof PowExp and Mate´rn, all trained models assume the correct GP family.In the left panel of Figure 3.8 it is seen that distributions of RMSE overthe repeat simulations are practically identical, i.e., there is no over-fittingpenalty from using PowExp or Mate´rn with any of the inference methodsconsidered. The ACP results in the right panel of Figure 3.8 indicate thatthe empirical Bayes methods SqExp-Emp and PowExp-Emp underestimateprediction uncertainty, whereas the full Bayes or hybrid Bayes methods ob-tain near-nominal ACP.Sq−Emp Sq−Full Pow−Emp Pow−Full Pow−Hyb Mat−Hyb0.000.050.100.150.20Normalized RMSESq−Emp Sq−Full Pow−Emp Pow−Full Pow−Hyb Mat−Hyb0.50.60.70.80.91.0Actual Coverage ProbabilityFigure 3.8: Normalized RMSE (left panel) and actual coverage probability(right panel) with y = 10 inputs and output simulated from a GP with Sq-Exp correlation. The boxplots show the results from 25 random realizationsof the GP. The horizontal line in the right panel is the nominal coverageprobability, 0O95.With output realized from a GP with PowExp correlation and all j =1O8, the RMSE results in the left panel of Figure 3.9 show that assum-ing a PowExp or Mate´rn correlation function, with any of the inferencemethods, is now clearly much more accurate for prediction than using thewrong correlation function, SqExp. 
Figure 3.8 shows the performances of the six methods when data are generated by a GP with SqExp correlation. As SqExp is a special case of PowExp and Matérn, all trained models assume the correct GP family. In the left panel of Figure 3.8 it is seen that the distributions of RMSE over the repeat simulations are practically identical, i.e., there is no over-fitting penalty from using PowExp or Matérn with any of the inference methods considered. The ACP results in the right panel of Figure 3.8 indicate that the empirical Bayes methods SqExp-Emp and PowExp-Emp underestimate prediction uncertainty, whereas the full Bayes and hybrid Bayes methods obtain near-nominal ACP.

Figure 3.8: Normalized RMSE (left panel) and actual coverage probability (right panel) with d = 10 inputs and output simulated from a GP with SqExp correlation. The boxplots show the results from 25 random realizations of the GP. The horizontal line in the right panel is the nominal coverage probability, 0.95.

With output realized from a GP with PowExp correlation and all α_j = 1.8, the RMSE results in the left panel of Figure 3.9 show that assuming a PowExp or Matérn correlation function, with any of the inference methods, is now clearly much more accurate for prediction than using the wrong correlation function, SqExp. The ACP results in the right panel of Figure 3.9 are again driven by the inference method: there is substantial under-coverage for the empirical Bayes methods. SqExp-Full and PowExp-Hybrid slightly under-cover, while PowExp-Full and Matérn-Hybrid have near-nominal ACP.

Figure 3.9: Normalized RMSE (left panel) and actual coverage probability (right panel) with d = 10 inputs and output simulated from a GP with PowExp correlation and all α_j = 1.8. The boxplots show the results from 25 random realizations of the GP. The horizontal line in the right panel is the nominal coverage probability, 0.95.

When the output is realized from a GP with Matérn correlation and all smoothness parameters equal to 1, the left panel of Figure 3.10 demonstrates that the PowExp and Matérn correlation functions again lead to much smaller RMSEs than does SqExp, regardless of the parameter-estimation method. The right panel of Figure 3.10 shows severe under-coverage for SqExp-Emp, some under-coverage for SqExp-Full, and modest under-coverage for PowExp-Emp. PowExp-Full leads to over-coverage here, while PowExp-Hybrid and Matérn-Hybrid have near-nominal ACP.

Figure 3.10: Normalized RMSE (left panel) and actual coverage probability (right panel) with d = 10 inputs and output simulated from a GP with Matérn correlation and all smoothness parameters equal to 1. The boxplots show the results from 25 random realizations of the GP. The horizontal line in the right panel is the nominal coverage probability, 0.95.

Overall, the simulations suggest that the PowExp and Matérn correlation functions are advantageous in terms of prediction accuracy relative to SqExp. When data were generated using SqExp, there was little over-fitting penalty in estimating the extra parameters of the PowExp and Matérn families. On the other hand, the more flexible PowExp and Matérn families sometimes led to much more accurate predictions. The normalized maximum absolute error results reported in the appendix follow similar patterns.

The simulations also point to the empirical-Bayes methods SqExp-Emp and PowExp-Emp under-estimating uncertainty, sometimes substantially. SqExp-Full also led to ACP values smaller than nominal, except when data were generated using a SqExp correlation function. A combination of fully Bayesian or hybrid-Bayesian inference with the PowExp or Matérn families provided more reliable inference, however; such combinations showed only modest over- or under-coverage.

3.5 Conclusions and Discussion

Bayesian implementations with a SqExp correlation structure were extended to allow a PowExp or Matérn structure. To estimate the additional smoothness parameters α_j in PowExp, hybrid and fully Bayesian approaches were proposed and applied. The hybrid method uses MLEs of the α_j, with all other parameters in a GP model handled via Bayes' rule. The fully Bayesian method employs an uninformative uniform prior for a logistic transform of α_j. Similarly, a hybrid approach for the Matérn correlation function uses MLEs for the smoothness parameters.

The work in chapter 2 noted that a fully Bayesian treatment tends to have better coverage probability than an empirical Bayes method for applications such as the borehole function where SqExp is adequate.
The previous work also noted that PowExp can give better prediction accuracy for applications such as the Nilson-Kuusk code, where SqExp is less adequate. The applications and simulations considered here show that advantages in terms of prediction accuracy and uncertainty quantification tend to go hand in hand when a Bayesian method with a flexible correlation function is employed. Moreover, there appears to be little over-fitting penalty from employing PowExp or Matérn when the SqExp special case is adequate.

There are some important details of implementation. First, the priors on the sensitivity parameters θ_j based on previous work (Higdon et al., 2008) assume the inputs x_j are scaled to [0, 1]. These priors assign heavier weight to small values of the θ_j, reflecting a prior view that the function can be usefully predicted. Secondly, such priors are particularly appropriate for a constant-mean model. If a regression model with linear trends in the inputs is used, priors giving more weight to moderate values of θ_j (Kennedy, 2004) may be more appropriate. Finally, we found that MCMC sampling of all correlation parameters, for sensitivity and smoothness, was more efficient with the parameters transformed to a logistic scale.

Chapter 4

Evaluation of Designs for Computer Experiments

From the analyses in chapters 2 and 3, we observe that design can have a big effect on prediction accuracy. For instance, in chapter 2, with the same sample size for fitting a GP emulating the Borehole function, a 27-run mLHD nearly always yields much smaller RMSE values than a 27-run OA over 25 repeat experiments. The pattern persists when the sample size increases to 40 and 80. This observation shows the importance of design, and this chapter concerns the effect of design on prediction accuracy. A simple completely random n by d design can be generated such that every dimension contains n independent realizations from U(0, 1). Such designs are easy to generate, but unless one is lucky, the resulting design might have large holes.

Intuitively, we would like designs to be space-filling when the primary interest is to emulate a computer model using a GP. The reason is that the predictor from a GP (we are assuming the computer model is deterministic and no nugget, or a trivially small nugget, is used in the GP model) is actually an interpolator. The prediction error can be viewed as a function of the location of the untried point relative to the observed points in the design space. Therefore, the prediction accuracy will deteriorate if the untried point is located in a sub-region that is sparsely observed. We believe this also explains why a space-filling design, in which the design points are evenly spread over the design space, is usually preferred in published analyses. There are several methods available to generate space-filling designs, depending on what one means by "evenly spread". As introduced in section 1.3 of chapter 1, there are at least two different ways of achieving the "space-filling" property: either by defining and optimizing a distance measure (e.g., mLHD), or by using a low-discrepancy sequence (e.g., a Sobol sequence).
We will review some of these methods in the next section. Besides what has been covered previously, there are several other types of designs in the literature, such as the maximum utility design (Caselton and Zidek, 1984), the maximum entropy design (Shewry and Wynn, 1987) and the uniform design (Fang et al., 2000). Pronzato and Müller (2012) provided a comprehensive review. Since there exist many different designs, both space-filling (e.g., a random LHD) and less space-filling (e.g., a completely random design), for practitioners to choose from, an interesting question we want to answer is the following: without prior information, which design can be used as a default? We are aware that many of the proposed designs have attractive mathematical properties. For instance, Stein (1987) showed that if n → ∞, the variance of the sample mean (1/n) Σ_{i=1}^{n} y(x^{(i)}) is smaller using a random LHD than using a completely random design generated from U(0, 1) independently in each dimension. That being said, it could be risky to equate attractive theoretical results with better prediction accuracy with a GP. Therefore, the objective of this chapter is practical: first, to assess the effect of design on prediction accuracy with a GP in emulating a computer model; and second, to identify which design can be used as a default without contextual knowledge. In a nutshell, we are interested in evaluating prediction performance in practice; a comparison of theoretical properties among different designs is not the focus of the chapter.

The rest of this chapter is organized as follows. We review several popular classes of designs in section 4.1. Empirical studies comparing different designs are conducted in sections 4.2, 4.3 and 4.4 through several real examples. Some comments are made in section 4.5. The conclusions from the research are reported in section 4.6 and are briefly summarized as follows:

- The prediction accuracy differences between designs are not as big as one might expect. Without prior information, one can use an mLHD (outlined in section 4.1) as the default design in a computer experiment.
- Other factors, such as the sample size and a transformation of y if needed, have much larger effects on prediction accuracy than the choice of design.

4.1 A Review of Easily Constructed Designs

4.1.1 Random LHD

McKay et al. (1979) introduced the random LHD. Without loss of generality, given a fixed sample size n, a random LHD in 2-d over the unit square can be constructed by the following steps (a sketch in R follows at the end of this subsection):

• Construct n × n cells over the unit square.
• Construct an n × 2 matrix, Z, whose columns are independent random permutations of the integers {1, 2, …, n}.
• Each row of Z gives the row and column indices for a cell on the grid. For the i-th (i = 1, 2, …, n) row of Z, take a random uniform draw from the corresponding cell. The resulting design is the random LHD.

The construction easily extends beyond two dimensions. McKay et al. (1979) showed that the variance of the sample mean (1/n) Σ_{i=1}^{n} y(x^{(i)}) is smaller using an LHD than using a completely random design, if y(x_1, …, x_d) is monotonic in each of its arguments. Stein (1987) further showed that if n → ∞, an LHD has a smaller variance than a completely random design without the monotonicity condition. However, it is also well known that a random LHD can still have large holes. An obvious case is if all points lie on the diagonal by chance. We use rLHD to denote a random LHD in this chapter.
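A compact R sketch of this construction for general dimension d (our illustration; the rank z in a column is mapped to a uniform draw in the cell ((z−1)/n, z/n)):

```r
## Random LHD: n points in [0,1]^d.
random_lhd <- function(n, d) {
  Z <- sapply(seq_len(d), function(j) sample(n))  # independent column permutations
  (Z - matrix(runif(n * d), n, d)) / n            # uniform draw within each cell
}
```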
4.1.2 Maximin LHD

An mLHD can be viewed as a combination of an LHD and a maximin design. The idea of a maximin design is that, in order for the points to be spread out over the space, no two points should be too close to each other. To be more specific, let D ⊂ X be an arbitrary n-point design that consists of distinct inputs {x^{(1)}, …, x^{(n)}}. One way to measure the distance between any two points x^{(i)}, x^{(j)} is

    d(x^{(i)}, x^{(j)}) = [ Σ_{k=1}^{d} |x^{(i)}_k − x^{(j)}_k|^p ]^{1/p},   (4.1)

where p = 1 and p = 2 give the rectangular and Euclidean distances, respectively. The Euclidean distance, p = 2, is used in this thesis. A maximin design maximizes the minimal distance between any two points in the design space. Given the design D, its minimal distance is defined as

    Φ(D) = min_{x^{(i)}, x^{(j)} ∈ D} d(x^{(i)}, x^{(j)}).   (4.2)

Intuitively, this makes sure that no two points are too close, and hence the design is spread out. Johnson et al. (1990) first defined the maximin design. A design that maximizes Φ(D) in (4.2) within the class of LHDs is called a maximin LHD, abbreviated as mLHD throughout the thesis. This is an important type of design that will be evaluated in the next section. In this chapter, we use the maximinSLHD(t = 1, ...) function of the R package SLHD (Ba et al., 2015) to generate an mLHD. All function parameters are taken as defaults unless stated otherwise.

In addition, there also exists a variant of the mLHD. In (4.1), instead of considering all d dimensions (d ≥ 2) at once, we consider all projections corresponding to a limited number of input variables. Let S denote a set of projections, usually all subsets of two variables at a time:

    S = {{1, 2}, {1, 3}, …, {d − 1, d}}.

There are d(d − 1)/2 projections here, and we use |S| to denote the number of projections in S. The criterion in (4.2) then changes to

    Φ(D) = (1/|S|) Σ_{s∈S} [ (1/(n(n−1)/2)) Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 1/d_s(x^{(i)}, x^{(j)}) ],   (4.3)

where d_s(·, ·) is the distance (4.1) computed in projection s; a design with good projection properties makes (4.3) small. Joseph et al. (2015) proposed a maximum projection design, which follows the same idea as introduced above. To avoid confusion, we use mLHD to denote an mLHD based on (4.2) and m2LHD to denote an mLHD based on (4.3). The m2LHD design is generated by the R function MaxProLHD() of the package MaxPro.
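Both criteria are one-liners in R; a sketch (ours, with (4.3) written under our reading of the averaged reciprocal-distance form above):

```r
## Maximin criterion (4.2): smallest pairwise Euclidean distance (larger is better).
phi_maximin <- function(D) min(dist(D))

## Projection criterion (4.3): average reciprocal distance over 2-d projections
## (smaller is better).
phi_proj2 <- function(D) {
  pairs <- combn(ncol(D), 2)
  mean(apply(pairs, 2, function(s) mean(1 / as.numeric(dist(D[, s])))))
}
```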
4.1.3 Transformed LHD

The transformed LHD, proposed by Dette and Pepelyshev (2010), is a variant of the LHD. A transformed LHD allocates more points near the boundary of the design space and is therefore not as space-filling as other designs. However, since the largest prediction error often occurs at untried points near the boundary, the transformed LHD is expected to have better prediction accuracy than the rLHD and mLHD, especially when prediction accuracy is measured for the worst case. Dette and Pepelyshev (2010) proposed the following steps to generate a transformed LHD:

Step 1: Generate an mLHD.

Step 2: Define a generalized design supported at the points z^{(1)}, …, z^{(n)}, where z^{(i)} = (z^{(i,1)}, …, z^{(i,d)}), by a transformation of the points x^{(1)}, …, x^{(n)} using the quantile function of the Beta density

    p_{(1+v)/2}(t) = [1/B((1 + v)/2, (1 + v)/2)] t^{(v−1)/2} (1 − t)^{(v−1)/2}.   (4.4)

The transformation from x^{(i,j)} to z^{(i,j)} is effected by defining z as the solution to the equation

    x^{(i,j)} = ∫_0^{z^{(i,j)}} p_{(1+v)/2}(t) dt,   (4.5)

where the parameter v ∈ [0, 1] makes the Beta density u-shaped, for i = 1, …, n, j = 1, …, d. The parameter v can be considered a tuning parameter, which specifies the importance of the boundary. The smaller v is, the more mass is put at the boundary of the interval [0, 1]. The special case of v = 0 yields the arc-sine distribution. For that density, the transformation is given explicitly by

    z^{(i,j)} = [1 − cos(π x^{(i,j)})]/2.   (4.6)

We believe this type of "boundary-focused" design may be useful for problems where behaviour near a boundary is important. We use trLHD to denote a transformed LHD with v = 0 based on an mLHD. Note that the trLHD here is based on the mLHD only, not on the m2LHD from the original paper (Dette and Pepelyshev, 2010).

4.1.4 Orthogonal Array-based LHD

The OA-based LHD was introduced by Tang (1993). The construction of an OA-based LHD depends on the existence of the corresponding OA, so we describe orthogonal arrays first. An n × d matrix, M, is called an OA of strength r, r ≤ d, level s, size n, and index λ if each n × r submatrix of M contains every possible 1 × r row vector of the levels 1, 2, …, s with the same frequency λ. The array is denoted by OA(n, d, s, r). From the definition, it is clear that the sample size is n = λs^r, and an LHD is the special case OA(n, d, n, 1).

Consider an OA(n, d, s, r), denoted M. For each column of M, one replaces the λs^{r−1} positions with entry k by a random permutation of (k − 1)λs^{r−1} + 1, (k − 1)λs^{r−1} + 2, …, (k − 1)λs^{r−1} + λs^{r−1}, for all k = 1, 2, …, s. The design is then mapped to [0, 1]^d. The resulting matrix is an OA-based LHD. Tang (1993) showed that an OA-based LHD yields a smaller variance of the sample mean than a random LHD in an asymptotic sense if y(x_1, …, x_d) is a continuous function on [0, 1]^d. However, OA-based LHDs have a strict restriction on the sample size, n = λs^r. For instance, orthogonal arrays of strength r ≥ 2 do not exist if n is a prime number. This largely restricts their availability in practice.

4.1.5 Sobol Sequence

A Sobol sequence (Sobol, 1967) is an example of a low-discrepancy sequence (Niederreiter, 1988). Given a set P = {x^{(1)}, …, x^{(n)}} and using Niederreiter's notation, the discrepancy is defined as

    D_n(P) = sup_{B∈J} | A(B; P)/n − λ_d(B) |,

where λ_d is the d-dimensional Lebesgue measure, A(B; P) is the number of points in P that fall into B, and J is the set of d-dimensional regions in [0, 1]^d, of which B is a member. The discrepancy is low if, for all B, the proportion of points of P falling into B is close to the measure of B.

An obvious application of a low-discrepancy sequence is numerical integration. The integral of a function f can be approximated by the average of the function evaluations at n points:

    ∫_0^1 ⋯ ∫_0^1 f(x) dx ≈ (1/n) Σ_{i=1}^{n} f(x^{(i)}).

If the points are chosen randomly from a known distribution, this is the Monte Carlo method. If the points are chosen as elements of a low-discrepancy sequence, this is the quasi-Monte Carlo method. Niederreiter (1988) showed that the quasi-Monte Carlo method has a rate of convergence close to O(1/n), whereas the rate for the Monte Carlo method is O(1/√n). See Niederreiter (1992) for more details. To approximate the integral well, the set P = {x^{(1)}, …, x^{(n)}} should minimize the holes in the space, which results in a "space-filling" design.

In R, the sobol function in package randtoolbox can be used to generate a Sobol sequence. In the thesis, we call sobol() with scrambling = 3 and a fixed seed, with the remaining arguments at their defaults.
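For example, the 27-point, 8-dimensional Sobol design used later in this chapter can be generated as follows (the seed value shown is ours for illustration; the thesis fixes its own seed):

```r
library(randtoolbox)
## Scrambled Sobol sequence: 27 points in [0,1]^8 (scrambling = 3)
D_sobol <- sobol(n = 27, dim = 8, scrambling = 3, seed = 4711)
```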
4.1.6 Uniform Design

The uniform design (Fang et al., 2000) is another kind of space-filling design. The basic idea is as follows. Consider d factors over the region [0, 1]^d. We want to find a design, D_u, such that the points are uniformly scattered over the experimental space. This is achieved by minimizing the discrepancy between the distribution function of the uniform distribution on [0, 1]^d, denoted F(x), and the empirical distribution function of the design points, denoted F_n(x). The L_p discrepancy is usually defined as

    D_{n,p}(D_u) = [ ∫_{[0,1]^d} |F(x) − F_n(x)|^p dx ]^{1/p}.

One popular choice is the L_∞ norm, which yields

    D_n(D_u) = sup_x |F(x) − F_n(x)|.

In practice, a uniform design is usually constructed via number-theoretic methods or via algorithms based on a finite candidate set of points. In either case, the construction process is time-consuming and difficult. The difficulty in constructing such a design limits its availability and popularity.

4.1.7 Sparse Grid Design

Before we introduce the sparse grid design (SGD), we consider the lattice design. A lattice design is defined as X = X_1 × X_2 × ⋯ × X_d, where each X_j is a set of one-dimensional points termed a component design. For sets A and B, the Cartesian product, denoted A × B, is defined as the set of all ordered pairs (a, b), where a ∈ A and b ∈ B. If the sample size of X_j is n_j, the sample size of X is Π_{j=1}^{d} n_j. Given the component designs, the lattice design is extremely easy to generate. It also has the appealing property that its correlation matrix takes the form of a Kronecker product, so inverting the correlation matrix R is equivalent to inverting d smaller submatrices, which can dramatically reduce the computational time when the sample size is large.

However, the lattice design has an obvious drawback: if d is large, the sample size becomes exponentially large. For instance, if d = 8 and each dimension has the same component design {0.25, 0.5, 0.75}, the total sample size is 3^8 = 6561.

Plumlee (2014) proposed the sparse grid design as an alternative to the lattice design. To build an SGD, one must first specify, for each j = 1, …, d, a nested sequence of one-dimensional experimental designs denoted X_{j,t}, where X_{j,t} ⊆ X_{j,t+1}, t = 1, 2, …, X_{j,0} = ∅, and each X_{j,t} is a component design. A sparse grid design is then defined as

    X_SG(η) = ∪_{t⃗ ∈ G(η)} X_{1,t_1} × X_{2,t_2} × ⋯ × X_{d,t_d},   (4.7)

where η ≥ d is an integer that represents the level of the construction and G(η) = {t⃗ ∈ N^d | Σ_{j=1}^{d} t_j = η}. Here we use the overhead arrow to distinguish the vector of indices, t⃗ = [t_1, …, t_d], from a scalar index. It is easy to see that the sample size of an SGD is determined by three factors: (1) the dimension, d; (2) the level, η; and (3) the component designs. The component designs used in this chapter are the same as those reported in the appendix of Plumlee (2014): X_{j,1} = {0.5}, X_{j,2}\X_{j,1} = {0.125, 0.875}, X_{j,3}\X_{j,2} = {0.25, 0.75}, X_{j,4}\X_{j,3} = {0, 1}, X_{j,5}\X_{j,4} = {0.375, 0.625}, X_{j,6}\X_{j,5} = {0.1875, 0.8125}, X_{j,7}\X_{j,6} = {0.0625, 0.9375}. Here A\B denotes the objects that belong to set A and not to B. With the component designs given above, the sample size can be expressed as

    n = Σ_{k=0}^{min(d, η−d)} 2^k C(d, k) C(η−d, k),

where C(·, ·) denotes a binomial coefficient. For other types of component designs and their associated sample-size formulas, please refer to Table 1 of Plumlee (2014).
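The sample-size formula is easy to check numerically; a sketch in R (ours) that reproduces the SGD sizes used later in this chapter:

```r
## Sample size of an SGD with the component designs above (Plumlee, 2014).
sgd_size <- function(d, eta) {
  k <- 0:min(d, eta - d)
  sum(2^k * choose(d, k) * choose(eta - d, k))
}
sgd_size(8, 10)   # 145, the first SGD of section 4.4
sgd_size(8, 11)   # 833, the second SGD of section 4.4
```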
4.2 Main Comparison

In this section, we consider the following six designs:

• rLHD
• mLHD
• trLHD
• Sobol
• uniform design
• completely random design

The three other designs, the OA-based LHD, the m2LHD and the SGD, are not included in this section. The OA-based LHD and m2LHD will be considered together in the next section as they are projection-based. The SGD will be discussed in section 4.4. Before any formal comparison is conducted, Figure 4.1 visualizes eight designs with n = 27 and d = 8. Although the OA-based LHD and m2LHD are not discussed in this section, they are included in Figure 4.1 to give readers an overall idea of the appearance of these designs. The SGD is not included in Figure 4.1 because of its strict restriction on sample size. The figure shows only the projections onto the first two input variables. These are the base designs used for the Borehole function, an example we will use as a testbed. The OA-based LHD is of index 3, level 3, strength 2. The OA is downloaded from Neil Sloane's online catalog at http://neilsloane.com/index.html#TABLES.

We observe from Figure 4.1 that the completely random design has large holes. The random LHD improves on this but still does not fill the space equally. As we expect, the trLHD allocates more points to the boundaries, while the mLHD, uniform design and Sobol designs are spread out over the region. For the two projection designs, it seems the m2LHD has a better space-filling property than the OA-based LHD.

Figure 4.1: Completely random design, rLHD, mLHD, trLHD, Sobol, Uniform, OA-based LHD and m2LHD for d = 8 and n = 27. The OA-based LHD is of index 3, level 3, strength 2. The 8-dimensional designs are projected onto their (x_1, x_2) coordinates.

4.2.1 Borehole Model

We consider the Borehole function as a test bed to assess the impact of the design classes. There are 25 replicate experiments for each design considered, obtained by random permutation of the columns of the base design visualized in Figure 4.1. A GaSP(Const, PowExp) model is fitted, and the hold-out set is 10,000 points generated by randomLHS in R. The assessment criteria are the e_rmse,ho and the e_max,ho in (2.5), where ȳ is taken as the mean of the true outputs of the 10,000 points in the hold-out set to provide the same standardization for all the designs.
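Written out, and under our reading of the chapter-2 definitions (prediction error normalized by the corresponding error of the trivial predictor ȳ), the two criteria are:

```r
## Hold-out criteria, assuming the chapter-2 definitions in (2.5).
e_rmse_ho <- function(y, yhat, ybar = mean(y))
  sqrt(mean((y - yhat)^2)) / sqrt(mean((y - ybar)^2))
e_max_ho  <- function(y, yhat, ybar = mean(y))
  max(abs(y - yhat)) / max(abs(y - ybar))
```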
MTia 6bmcTeifbamaximum absolute error shows a similar pattern although the trLHD designand the uniform design have the best overall performances. The results forthe maximum error are reported in Figure C.1 in the appendix.Although the uniform design has a good overall performance with n = 27in the Borehole example, we are not able to find more uniform designs withlarger sample sizes. Therefore, from now on, we will focus on the seven otherdesigns only.In addition, Figure 4.3 shows the zres], hg results for n = 40P 80. Theresults for zeax, hg are reported in Figure 4.4. There is not enough space toreport results for the completely random design in the above figures. How-ever, it performs like the other four designs with larger variations though.There are some similarities in Figures 4.3 and 4.4, that is, given thesame sample size, the difference in prediction accuracy of the four designsis marginal, indicating that the effect of sample size overwhelms the impactof design class. Increasing the sample size from n = 40 to n = 80 reducesthe RMSE values from around 0O03 to about 0O01, a factor of about 1R3.Furthermore, we notice that trLHD is worse than mLHD for zres], hg, buthas a better performance for zeax, hg, which measures the worst case. Asit assigns more points to the boundaries, it is not surprising to see trLHDhas a better performance than the other four design classes in terms of theworst case. Taking all factors into account, mLHD is the recommendeddesign for emulating the Borehole function. Moreover, although we do notreport results for the completely random design, its performance is actuallynot bad, which further reflects sample size is much more important than thechoice of design. From now on, we will report the results for the normalizedmax absolute error in the appendix.ICGCG eil boyzlThe next model we consider is the PTWmodel. We consider n = 128 andn = 256 and with the same settings as in the Borehole example, the resultsof normalized RMSE and normalized max error are reported in Figure 4.5and in Figure C.2 in the appendix, respectively.814.2. MTia 6bmcTeifba1.60 1.9040 80−2.50−2.00−1.50−1.000.000.010.030.10log10(n)nl og 10( No rmal i zed  RMS E)No rmal i zed   RMS ErLHDSobolmLHDtrLHDFigure 4.3: Borehole function: Normalized RMSE of prediction, zres], hg;n = 40P 80 for rLHD, Sobol, mLHD and trLHD designs. For each basedesign, 25 random permutations of its columns give 25 values of zres], hg,displayed as a box plot. Box plots are joined by their means.824.2. MTia 6bmcTeifba1.60 1.9040 80−2.50−2.00−1.50−1.00−0.500.000.010.030.100.32log10(n)nl og 10( No rmal i zed  Ma x Ab so lu te  E rr or )No rmal i zed   Ma x  Ab so lu te   Er r orrLHDSobolmLHDtrLHDFigure 4.4: Borehole function: Normalized maximum absolute error of pre-diction, zeax, hg; n = 40P 80 for rLHD, Sobol, mLHD and trLHD designs.For each base design, 25 random permutations of its columns give 25 valuesof zeax, hg, displayed as a box plot. Box plots are joined by their means.834.2. MTia 6bmcTeifba2.11 2.41128 256−1.40−1.00−0.600.040.100.25log10(n)nl og 10( No rmal i zed  RMS E)No rmal i zed   RMS ErLHDSobolmLHDtrLHDFigure 4.5: PTW function: Normalized RMSE of prediction, zres], hg; n =128P 256 for rLHD, Sobol, mLHD and trLHD designs. For each base design,25 random permutations of its columns give 25 values of zres], hg, displayedas a box plot. Box plots are joined by their means.844.2. 
The performances of the designs are similar to those in the Borehole example: the trLHD has the worst normalized RMSE among the designs, and the mLHD has the best overall performance. By increasing the sample size to 256, the accuracies of all designs improve dramatically, as observed previously. The results for the normalized maximum error show a pattern consistent with the Borehole function: the trLHD has the best overall performance, which suggests the effect of the criterion is also non-trivial. It is therefore recommended to use several criteria, measuring different aspects of accuracy, to obtain a more comprehensive conclusion.

4.2.3 Weighted Franke's Function

We next consider Franke's function (Franke, 1979). It was originally a 2-d function used as a test function in interpolation problems. We extend it to an 8-dimensional function by adding the pairs x_i, x_{i+1} for i = 3, 5, 7. Each pair x_i, x_{i+1} enters in the same form as x_1, x_2, and we further weight each pair. Our version of the weighted Franke's function is

    f(x) = Σ_{i=1,3,5,7} w_i [ 0.75 exp(−(9x_i − 2)²/4 − (9x_{i+1} − 2)²/4)
           + 0.75 exp(−(9x_i + 1)²/49 − (9x_{i+1} + 1)/10)
           + 0.5 exp(−(9x_i − 7)²/4 − (9x_{i+1} − 3)²/4)
           − 0.2 exp(−(9x_i − 4)² − (9x_{i+1} − 7)²) ],   (4.8)

where w_1 = 1 and w_3 = w_5 = w_7 = 0.1. By assigning these weights to the four pairs, the first two variables are the most important factors in the function (an R implementation is sketched at the end of this subsection).

With sample sizes n = 80, 200, 500, we use the same settings as before. The results are presented in Figure 4.6 for normalized RMSE and in Figure C.3 for the normalized maximum absolute error.

Figure 4.6: Weighted Franke's function: Normalized RMSE of prediction, e_rmse,ho; n = 80, 200, 500 for rLHD, Sobol, mLHD and trLHD designs. For each base design, 25 random permutations of its columns give 25 values of e_rmse,ho, displayed as a box plot. Box plots are joined by their means.

From Figure 4.6, we observe a pattern seen for the Borehole function: given the same sample size, the mLHD is recommended, although the performance differences between designs are small. The trLHD proves to be poorer than the other designs. All three examples so far suggest that sample size has a much bigger effect than the choice of design.
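A direct R implementation of (4.8) (our sketch; the 2-d form is the standard Franke function applied to each weighted pair):

```r
## Weighted 8-d Franke test function (4.8); w holds w1, w3, w5, w7.
franke2d <- function(u, v)
  0.75 * exp(-(9 * u - 2)^2 / 4 - (9 * v - 2)^2 / 4) +
  0.75 * exp(-(9 * u + 1)^2 / 49 - (9 * v + 1) / 10) +
  0.50 * exp(-(9 * u - 7)^2 / 4 - (9 * v - 3)^2 / 4) -
  0.20 * exp(-(9 * u - 4)^2 - (9 * v - 7)^2)

weighted_franke <- function(x, w = c(1, 0.1, 0.1, 0.1)) {
  i <- c(1, 3, 5, 7)
  sum(w * franke2d(x[i], x[i + 1]))
}
```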
If some care is taken with appropriate statisticalmodelling, such as the use of the logarithm transformation, we expect theprediction accuracy will greatly improve with the same sample size, n.We first report results with n = 50P 100P 250. As usual, the predictionaccuracy is measured by the normalized RMSE (zres],hg) and normalizedmaximum absolute error (zeax,hg ) on a hold-out set consisting of 10P 000points generated by a random LHD. The results are reported in Figure 4.7 forthe normalized RMSE and in Figure C.4 in the appendix for the normalizedmaximum absolute error.Figure 4.7 demonstrates that accuracy is poor with n = 250 or less for all874.2. MTia 6bmcTeifba1.70 2.00 2.4050 100 250−1.00−0.500.000.501.000.100.321.003.1610.00log10(n)nl og 10( No rmal i zed  RMS E)No rmal i zed   RMS ErLHDSobolmLHDtrLHDFigure 4.7: Corner-peak function: Normalized RMSE of prediction, zres], hg;n = 50P 100P 250 for for rLHD, Sobol, mLHD and trLHD designs. Foreach base design, 25 random permutations of its columns give 25 valuesof zres], hg, displayed as a box plot. Box plots are joined by their means.884.2. MTia 6bmcTeifbadesign classes. With n = 250, the normalized RMSEs are above 0O25, whichindicates the function is difficult to predict on its original scale. However, byincreasing the sample size from 50 to 250, the normalized RMSE decreasesfrom about 0O7 to around 0O3 for most of the designs, suggesting the samplesize has a non-trivial effect on the prediction accuracy even if the functionis relatively hard to emulate. The differences between mLHD and rLHD aresmall and the trLHD performs very poor when sample size is n = 50P 100.However, as we observed before, the trLHD has the best performance interms of the max error when n = 250 in Figure C.4. It is also worth notingthat the variation among equivalent designs within the class of trLHD ishuge.ICGC5 CornzrBpzvk FunxtionO aogvrithmix hxvlzThe corner-peak function in (4.9) is difficult to model because it growsrapidly near the origin. We report results on the corner-peak function after alogarithm transformation in section 4.2.5. In practice, it is often impossibleto clearly identify the bottleneck. But one can easily generate an n = 100mLHD design and compute the ratio of the maximum of y to the minimumof y. Permuting the columns of the mLHD 25 times will yield 25 of the ratiosand the average of the 25 ratios we observed is 390O2. Such a huge numberalso suggests a logarithm transformation is needed for decent predictionaccuracy.Therefore, instead of working on the original y scale, we consider workingwith log(y(x)). When computing RMSE, however we convert back to theoriginal scale. Using the same model settings, the results are reported inFigure 4.8 for the normalized RMSE and in Figure C.5 in the appendix forthe normalized maximum absolute error. Sample sizes considered here aren = 50P 100 only, as n = 100 already yields very good accuracies for all thedesigns.From Figure 4.8, it is clear that a huge improvement has been attainedfrom working on the logarithmic scale: Even with n = 50, the normalizedRMSE is already smaller than 0O1, which was impossible to obtain even with894.2. MTia 6bmcTeifba1.70 2.0050 100−2.00−1.50−1.00−0.500.010.030.100.32log10(n)nl og 10( No rmal i zed  RMS E)No rmal i zed   RMS ErLHDSobolmLHDtrLHDFigure 4.8: Corner-peak function analyzed at the logarithmic scale: Nor-malized RMSE of prediction, zres], hg; n = 50P 100 for rLHD, Sobol, mLHDand trLHD designs. 
For each base design, 25 random permutations of itscolumns give 25 values of zres], hg, displayed as a box plot. Box plots arejoined by their means.904.3. 6bmcTeifba bf Cebjecgiba Defigafn = 250 at the original scale. By increasing the sample size to n = 100, alldesigns considered yield normalized RMSE that are around 0O03. AlthoughtrLHD performs relatively poorly on the original scale, it outperforms otherdesigns on the logarithmic scale. Relative to the other designs, Sobol doesnot perform as well as it does on the original scale. The performance of therLHD is actually very good when n = 100 in Figure 4.8. In terms of themaximum absolute error in Figure C.5, the trLHD still has the best overallperformance.Overall, from the comprehensive analysis of the corner-peak function,we can rank the importance of the following three factors.logarithm tranformation S sample size S design classOICH Compvrison of erojzxtion DzsignsIn this section, we consider the two projection-based designs: m2LHDand OA-based LHD with strength 2. The two designs are evaluated togetherhere mainly because they both have certain projection properties: m2LHD isa variant of mLHD, where optimization is conducted based on (4.3) implyingthat the 2-d projection properties are the focus. Moreover, a strength 2 OA-based LHD means any two dimensions will have all combinations of levelsappear exactly  times, where  is the index of an OA. Thus, both designshave their 2-d projection properties controlled or optimized to a certainextent. In addition, we also include mLHD as a reference.The sample size is mainly determined by OA-based LHD as it has astrict restriction on it, while mLHD and m2LHD are flexible. In general,the maximum sample size of a function is chosen such that the majorityof designs considered attain acceptable accuracies: usually the normalizedRMSE should decrease to or below 0O1.ICHCF Worzholz FunxtionWe now consider two OA-based LHDs with n = 32 (index 2, level 4,strength 2), n = 64 (index 1, level 8, strength 2), respectively. The strength914.3. 6bmcTeifba bf Cebjecgiba Defigafis kept at 2 for all OA-based LHDs. The original OA designs are down-loaded from the online catalog at ettm7,,kbiipilakb.clm,laaio,ikabu.etmi maintained by Neil J. A. Sloane and we convert them to OA-basedLHDs using the first 8 columns only. m2LHDs with the same sample sizesare generated using R package aufidro (Joseph et al., 2015). The mLHD isalso included as the reference. The hold-out set is the same 10,000 pointsgenerated from a random LHD. We generate equivalent designs by randomlypermuting the columns of the base design as before. The results for nor-malized RMSE are reported in Figure 4.9 and results for the normalizedmaximum absolute error are in Figure C.6 in the appendix.We observe from Figure 4.9 that difference between m2LHD and mLHDis actually very small for both sample sizes. The performance of the OA-based LHD is worse than the two other designs and has a larger variation.When the sample size is 64, all of the designs considered have achieved verygood accuracy: the normalized RMSE values are about 0O01. The maximumerror results in Figure C.6 show a similar pattern.ICHCG eil FunxtionThe second function we consider in this section is the PTW function.With n = 128 and 256 and the same settings as above, the results fornormalized RMSE and the normalized maximum absolute error are reportedin Figures 4.10 and C.7, respectively. 
They show same pattern as thatobserved for the Borehole function in section 4.3.1.ICHCH lzightzy Frvnkz's FunxtionWith the same setting as for the PTW function, the results for thenormalized RMSE and the normalized maximum absolute error are reportedin Figures 4.11 and C.8 in the appendix for the weighted Franke’s functiondefined in (4.8).We observe from Figure 4.11 that the performance of the OA-basedLHD is still the worst although we have expanded the Franke function tohave 2-dimensional interactions only and have presumably used designs with924.3. 6bmcTeifba bf Cebjecgiba Defigaf1.51 1.8132 64−2.5−2.0−1.5−1.00.000.010.030.10log10(n)nl og 10( No rmal i zed  RMS E)No rmal i zed   RMS EmLHDm2LHDOA−LHDFigure 4.9: Borehole function: Normalized RMSE of prediction, zres], hg;n = 32P 64 for m2LHD, OA-based LHD designs and mLHD. The two OA-based LHDs are n = 32 (index 2, level 4, strength 2) and n = 64 (index 1,level 8, strength 2). For each base design, 25 random permutations of itscolumns give 25 values of zres], hg, displayed as a box plot. Box plots arejoined by their means.934.3. 6bmcTeifba bf Cebjecgiba Defigaf2.11 2.41128 256−1.5−1.2−0.90.030.060.13log10(n)nl og 10( No rmal i zed  RMS E)No rmal i zed   RMS EmLHDm2LHDOA−LHDFigure 4.10: PTW function: Normalized RMSE of prediction, zres], hg; n =128P 256 for m2LHD, OA-based LHD designs and mLHD. The two OA-basedLHDs are n = 128 (index 2, level 8, strength 2) and n = 256 (index 1, level16, strength 2). For each base design, 25 random permutations of its columnsgive 25 values of zres], hg, displayed as a box plot. Box plots are joined bytheir means.944.3. 6bmcTeifba bf Cebjecgiba Defigaf2.11 2.41 2.71128 256 512−1.3−1.1−0.80.050.090.16log10(n)nl og 10( No rmal i zed  RMS E)No rmal i zed   RMS EmLHDm2LHDOA−LHDFigure 4.11: Weighted Franke function: Normalized RMSE of prediction,zres], hg; n = 128P 256P 512 for m2LHD, OA-based LHD and mLHD. Thethree OA-based LHDs are n = 128 (index 2, level 8, strength 2), n = 256(index 1, level 16, strength 2) and n = 512 (index 2, level 16, strength 2).For each base design, 25 random permutations of its columns give 25 valuesof zres], hg, displayed as a box plot. Box plots are joined by their means.954.4. Aba FcTce Filliag Defiga n F:Dgood 2-dimensional projection properties. Relatively, the OA-based LHD ismuch worse when the sample size increases to 256, but is comparable tomLHD and m2LHD when n = 512. mLHD and m2LHD have the bestoverall performance and the differences between the two designs is actuallyvery small. This agrees with what we observed in previous examples. Theresults for the normalized max absolute error in Figure C.8 show similartrends.ICHCI CornzrBpzvk FunxtionO aogvrithmix hxvlzIn this section, we consider the corner-peak function with the logarith-mic transformation directly, since none of the designs can attain an accept-able prediction accuracy without a transformation of y within a reasonablesample size. The same 10P 000 points are used as the hold-out set and 25equivalent designs are generated by permuting the columns of each base de-sign. Unless mentioned otherwise, all the other settings are kept same as insection 4.3.1. The trLHD is added as another benchmark, as we observed inFigure 4.8 that trLHD has the best performance for the corner-peak func-tion analyzed on the logarithmic scale. 
4.4 Non Space-Filling Design – SGD

The SGD reviewed in Section 4.1 is less space-filling. Because of its special structure, the primary goal of the SGD is to facilitate reasonably fast computation of predictions when the sample size becomes big enough that other designs would not be able to yield predictions within a feasible time frame. That being said, we observe in this section that the prediction accuracy of SGD is usually much worse than that of an mLHD when the sample size is not large. We argue that when the sample size is small, for example n < 1000, and prediction accuracy is a concern, we would not recommend SGD.

Figure 4.13 shows two SGD designs with d = 8 and levels 10 and 11, jittered by a small amount of noise to show replicates. We only plot the first two input variables. It is clear that SGD is no longer a space-filling design, as we can see large holes in Figure 4.13.

Figure 4.13: Two SGD designs with d = 8 and levels 10 and 11, respectively. The designs are jittered by a small amount of noise to show replicates and the sample sizes are n = 145, 833, respectively. Only x1 and x2 are plotted.

Results for two examples are reported as follows. The first example we consider is the Borehole function. Two SGDs are generated: one with n = 145 (level 10) and the other with n = 833 (level 11). These are the designs plotted in Figure 4.13. The component designs used for constructing the two SGDs are those suggested by Plumlee (2014). Moreover, these are two reasonable sample sizes for a comparison with mLHD on the Borehole model, for example. Given an SGD, it is impossible to make equivalent designs by permuting its columns. Therefore, the SGD reports only one number for its normalized RMSE. We include mLHD with the same sample size as a competing design, for which we permute the columns to generate 3 equivalent designs. In addition to the two criteria used so far, normalized RMSE and normalized maximum absolute error, we also report the median absolute prediction error (MAPE). MAPE is one of the criteria considered in the original paper (Plumlee, 2014). It is the median of the absolute prediction errors over all points in the hold-out set (not standardized), which is robust to extreme prediction errors. The three criteria are sketched in code below.
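For concreteness, the three criteria can be computed from the hold-out outputs y and the GP predictions yhat as follows. This is a minimal sketch; it assumes the normalized criteria scale the errors by those of the trivial predictor that always returns the mean of the hold-out outputs, matching the normalization convention used earlier in the chapter.

    # Minimal sketch: three prediction-accuracy criteria for a hold-out set,
    # given true outputs y and predictions yhat.
    accuracy_criteria <- function(y, yhat) {
      c(norm_rmse    = sqrt(mean((y - yhat)^2)) / sqrt(mean((y - mean(y))^2)),
        norm_max_abs = max(abs(y - yhat)) / max(abs(y - mean(y))),
        mape         = median(abs(y - yhat)))   # MAPE is not standardized
    }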
The hold-out set is the same 10,000 points generated from an rLHD as before. The results are reported in Table 4.1.

Table 4.1: Borehole function: prediction results for SGD and mLHD, n = 145, 833. Three prediction criteria are considered. There is one design for SGD. For mLHD, 3 random permutations of its columns give 3 values of each criterion and the mean of the 3 numbers is reported.

                                        n = 145             n = 833
                                     SGD      mLHD       SGD      mLHD
  Normalized RMSE                   0.0610   0.0022     0.0441   0.0002
  Normalized Max Absolute Error     0.1701   0.0067     0.1691   0.0003
  Median Absolute Prediction Error  0.6516   0.0401     0.3247   0.0034

From Table 4.1, the performance difference is clear: given n = 145, the normalized RMSE of SGD is 0.061, about 30 times bigger than that of the mLHD. The story persists for MAPE, the criterion used by Plumlee (2014): the MAPE of SGD is 0.65, 16 times bigger than that of the mLHD. When the sample size increases to n = 833, the difference is even larger: the MAPE of SGD is 0.3247, now 95 times bigger than that of the mLHD.

Even though an SGD, which has computational advantages, is useful when the sample size is really large, its prediction performance for the Borehole model, where the sample size (n = 145, 833) is relatively affordable and the dimension is high (d = 8), is not comparable to that of mLHD. The example suggests that an SGD could potentially have a huge negative impact on prediction accuracy.

The second example on which we report is the d = 10 corner-peak function, both on the original scale and on the logarithmic scale. The sample size for the SGD is n = 221 with the same component designs. An mLHD of the same size is generated as the benchmark. Other settings are kept the same. The results are reported in Table 4.2. Again, for all three criteria considered, the performance of SGD is much worse than that of mLHD, except that the MAPE of SGD is comparable to that of the mLHD for the corner-peak function on its original scale. The MAPE is largely determined by prediction accuracy away from the localized peak, where the function is easy to model by any method.

Table 4.2: Corner-peak function and corner-peak function analyzed on the logarithmic scale: prediction results for SGD and mLHD, n = 221. Three prediction criteria are considered. There is one design for SGD. For mLHD, 3 random permutations of its columns give 3 values of each criterion and the mean of the 3 numbers is reported.

                                                SGD            mLHD
  CP     Normalized RMSE                        0.6626         0.3095
         Normalized Max Absolute Error          0.8794         0.4068
         Median Absolute Prediction Error       2.28 x 10^-5   2.03 x 10^-5
  CPlog  Normalized RMSE                        0.0675         0.0091
         Normalized Max Absolute Error          0.1403         0.0233
         Median Absolute Prediction Error       3.01 x 10^-6   4.64 x 10^-7

4.5 Discussion

4.5.1 The Effect of the Regression Component

We have already discussed the effect of the method of analysis in Chapter 2 of the thesis. Interaction between the method of analysis and the design may exist, which would further complicate the evaluation. We consider using a full linear GP model to refit the corner-peak function analyzed on the logarithmic scale. The corner-peak function is an example where switching from a constant mean GP to a full linear GP might bring some benefit to the prediction accuracy.
In fact, by taking a logarithmic transformation when analyzing the corner-peak function, we have already included the effect of the analysis, whose impact on prediction accuracy is astonishingly large, as the performances of all designs improve dramatically on the logarithmic scale.

With the same settings, the results for normalized RMSE are reported in Figure 4.14 and the results for normalized maximum absolute error are reported in Figure C.10 in the appendix. The vertical axis limits are kept the same as those in Figures 4.8 and C.5 to facilitate direct comparison.

From the two plots, it is very clear that for a small sample size (n = 50), results for the full linear GP are worse, which indicates the full linear model actually hinders the prediction performance. For larger sample sizes, the performances of the two GP models are similar. The observations are consistent with those made in Chapter 2.

Figure 4.14: Corner-peak function analyzed on the logarithmic scale with a full linear GP model: Normalized RMSE of prediction, e_rmse,ho; n = 50, 100 for rLHD, Sobol, mLHD and trLHD designs. For each base design, 25 random permutations of its columns give 25 values of e_rmse,ho, displayed as a box plot. Box plots are joined by their means. The vertical axis limits are kept the same as in Figure 4.8 to facilitate direct comparison.

4.5.2 Grid Generation

Grid generation is the way the points are generated. To be more specific, suppose we want to generate an rLHD with n = 10 in one of the dimensions. There exist at least three choices of points (sketched in code at the end of this subsection):

• Type 1: a random point each in (0, 0.1), (0.1, 0.2), ..., (0.9, 1).
• Type 2: 0/9, 1/9, 2/9, ..., 9/9.
• Type 3: 1/20, 3/20, ..., 19/20.

Therefore, even if researchers call it a random LHD, different people may choose different ways to generate such an rLHD. In short, variants do exist, though they all bear the same name. The major difference among the above three types is whether or not the boundaries are included. We can see that a type 2 grid does include 0 and 1, while type 1 and type 3 grids tend to generate "middle" points. In this chapter, we use the R packages MaxPro and SLHD to generate m2LHD and mLHD, respectively. It turns out that both R packages take type 3 grids. Hence, in order to be consistent, we use type 3 grids as well when generating rLHDs and OA-based LHDs. For example, we convert an OA to an OA-based LHD by (i - 0.5)/n, where n is the sample size and i is an integer in the original OA. In summary, none of the designs considered in this chapter includes the boundaries. By pushing points towards the boundaries, the trLHD designs do have enhanced sampling near the boundary, however.
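The three grid types are easy to state in code. A minimal sketch for one dimension of an n = 10 LHD:

    # Minimal sketch: the three one-dimensional grid choices for an LHD
    # with n = 10 levels; sample(n) returns a random permutation of 1:n.
    n <- 10
    type1 <- (sample(n) - runif(n)) / n   # a random point in each cell
    type2 <- (sample(n) - 1) / (n - 1)    # equally spaced; includes 0 and 1
    type3 <- (sample(n) - 0.5) / n        # cell midpoints; no boundaries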
4.6 Some Concluding Remarks

In this chapter, we first reviewed several different designs in Section 4.1 and used some examples to assess the prediction performance of the designs in Sections 4.2, 4.3 and 4.4. Some important issues were discussed in Section 4.5. The major contributions are summarized as follows.

1. Given the same sample size, we find mLHD has the best overall prediction accuracy. Therefore, mLHD is the default design we suggest if there is no prior information available. Actually, the prediction performance differences between designs are not as large as we might expect. Designs such as rLHD and Sobol, which are trivial to generate, show acceptable performance. For the two projection designs, m2LHD has performance similar to that of mLHD. However, the OA-based LHD is notably worse than mLHD. The performance of SGD is worse than that of mLHD for all the examples considered. However, this should not necessarily be interpreted as a drawback, as its primary objective is fast prediction when the sample size is big enough to disable other designs. Finally, we want to emphasize again that it could be risky to equate attractive theoretical properties of a design with good prediction accuracy.

2. Sample size and transformation of y have a much bigger effect on the prediction performance. In general, increasing the sample size can greatly improve prediction accuracy. Transforming y requires knowing the nature of the function. For instance, we know a logarithmic transformation can help when we are aware that the function grows rapidly in part of the input space. In practice, one can first conduct a pilot experiment on the computer model to see if the function varies by orders of magnitude.

Recall that for the corner-peak function, the challenge is that y grows rapidly when x is close to the origin. Without a logarithmic transformation of y, none of the designs is able to yield a good RMSE. Since the designs considered in this chapter are all "fixed" designs, in the sense that the experimental points are fixed once generated, poor prediction performance on the original scale also suggests the inability of fixed designs to analyze and predict functions showing non-stationarity in a certain region. This motivates the use of a sequential design, which is the topic of the next chapter.

Chapter 5

Sequential Computer Experimental Design for Estimating an Extreme Probability or Quantile

5.1 Introduction

Consider the problem of estimating an extreme tail probability or quantile of the output of a computer model. Let the inputs X = {X_1, ..., X_d} be d random variables; the output Y = y(X), as a function of X, is then a random variable as well. Taking probability estimation as an example, given y_f as a critical value of y, a research problem of engineering interest is to provide an estimate of p_f = P(y(X) > y_f). If y_f is large enough, rendering the said probability close to 0, p_f is a right-tail probability. A left-tail probability is analogous. In addition, tail quantile estimation is the inverse problem of probability estimation: it searches for the 1 - p_f quantile y_f such that P(y(X) < y_f) = 1 - p_f, where p_f is given. In a nutshell, both require an accurate depiction of the tail of the distribution of Y = y(X). The focus of the chapter is to provide a good estimate of an extreme tail probability or quantile when the computer model is expensive to run.

The computer model of interest in this chapter is a model of a floor system. The inputs are d values of the Modulus of Elasticity (MOE) of the d joists and the output is the maximum deflection under a fixed static load. If the maximum deflection exceeds a cut-off deflection, the system will fail. The details of this particular computer model will be introduced in Section 5.2. Moreover, the general research objective discussed in the previous paragraph can be stated in the specific context of the computer model of a floor system as follows: given the cut-off deflection y_f, it is of interest to estimate the failure probability p_f = P(y(X) > y_f).
Conversely, given the failure probability p_f, it is of interest to estimate the quantile y_f such that p_f = P(y(X) > y_f).

If it were feasible to evaluate the computer model many times, the Monte Carlo (MC) method could be used directly to provide an estimate of the tail probability or quantile. Suppose there is a discrete MC set, X_MC = {x_MC^(1), ..., x_MC^(N)}, generated from the distribution of X. How to appropriately choose the input distribution is a topic that will be addressed in Section 5.5. Plugging X_MC into the computer model, one can obtain the N true outputs y(X_MC) = {y(x_MC^(1)), ..., y(x_MC^(N))}. Then the empirical distribution of the output Y is given by

F̂(w) = (1/N) Σ_{i=1}^{N} 1( y(x_MC^(i)) < w ),   (5.1)

where 1(E) = 1 if event E is true and 0 otherwise. The failure probability or quantile is then the (approximate) solution of p_f = 1 - F̂(y_f). According to the famous Glivenko-Cantelli theorem (Cantelli, 1933; Glivenko, 1933), the supremum of the difference between the empirical distribution and the true CDF converges to 0 almost surely under regularity conditions as N increases. However, the novel part of the research is that the computer code is expensive, so it is not affordable to evaluate it many times to obtain a large sample. Consider an expensive computer model with a limited experimental budget of, say, 40 runs. It would be extremely inaccurate to estimate the empirical distribution of Y based on the 40 real outputs directly.

The solution here is to use the Gaussian process (GP) model (Sacks et al., 1989; Santner et al., 2003) as a cheap statistical proxy for the expensive computer model. The primary idea is as follows: first generate a discrete training set X = {x^(1), ..., x^(n)} and obtain y(X) = {y(x^(1)), ..., y(x^(n))} through the computer model. The available data enable one to build a GP model. Then, obtain the predicted values from the trained GP model: ŷ(X_MC) = {ŷ(x_MC^(1)), ..., ŷ(x_MC^(N))}. The failure probability p_f is estimated by 1 - F̂(y_f), where the true outputs y(X_MC) are replaced by ŷ(X_MC). The estimate of the quantile can be obtained in an analogous way.
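A minimal sketch of this plug-in estimate follows. The GP routines gp_fit and gp_predict below are hypothetical placeholders standing in for whatever GP software is used; only the final line is specific to the probability estimate.

    # Minimal sketch: plug-in Monte Carlo estimate of p_f = P(y(X) > y_f),
    # assuming hypothetical gp_fit / gp_predict routines for the surrogate.
    estimate_pf <- function(X_train, y_train, X_mc, y_f) {
      fit  <- gp_fit(X_train, y_train)   # train the GP surrogate (assumed API)
      yhat <- gp_predict(fit, X_mc)      # predictive means over the MC set
      mean(yhat > y_f)                   # estimate of 1 - F(y_f)
    }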
There are several ways to choose the training set for fitting the GP. The first way, which we call a fixed design strategy, is to use up all of the design budget, n, to train one statistical surrogate, which is employed to predict the outputs of the points in the MC set. The fixed strategy is simple and fast, but the estimation accuracy may not be good enough, especially for estimating an extreme tail probability or quantile. The second way, a sequential design strategy, is to use part of the design budget at the beginning to train an initial GP and to sequentially add points into the design space (one at a time), guided by a search criterion, until the budget has been exhausted. Each time a new point is added, both the surrogate model and the probability/quantile estimate are updated. Compared with the fixed design strategy, the second strategy is a dynamic process that allows new information to be added "on the fly" and thus should, intuitively, provide more accurate estimation. Better estimation outcomes using sequential methodology are well recognized; see Jones et al. (1998) and Ranjan et al. (2008) for examples.

The core of a sequential method is the search criterion. Jones et al. (1998) proposed the expected improvement (EI) criterion based on an improvement function for global optimization, and the EI criterion has been popular during the past 20 years. In general, there are different improvement functions for different statistical objectives. Ranjan et al. (2008) introduced an improvement function for contour estimation, i.e., searching for x = {x_1, ..., x_d} such that y(x) = a, where a is the targeted level. Roy and Notz (2014) used two different criteria for estimating a quantile where p_f = 0.2: the first is also based on EI and uses nearly the same improvement function as Ranjan et al. (2008). The second criterion of Roy and Notz (2014) is the so-called hypothesis testing-based criterion, which optimizes a "discrepancy" between the current prediction and the response at any untried input set. The details will be reviewed in Section 5.3.

The work we propose in this chapter takes such a sequential design approach. The focus is to provide answers and recommendations for some practical questions an engineer asks when applying the sequential design strategy. They are as follows.

• There are two search criteria, EI and the hypothesis testing-based criterion, for quantile and probability estimation. Which yields more accurate results, especially for extreme probability and quantile estimation? Roy and Notz (2014) compared the performance of sequential designs and fixed designs and showed that sequential designs produced more accurate quantile estimates. However, they did not consider probability estimation, and none of their examples covered extreme quantile estimation. In this chapter, we explicitly contrast the performances of EI and the hypothesis testing-based criterion for quantile and probability estimation. We conclude that the hypothesis testing-based criterion converges faster to its target and hence is preferred.

• How should the input distribution for X be selected? Note that the distribution of Y is fully determined by the distribution of X = {X_1, ..., X_d} through the deterministic computer model. We explore several different ways of modelling the input distribution.

• How should the MC set be generated? One obvious option is to take a simple random sample from the input distribution. But the required sample size may be computationally inefficient even with GP prediction. In this chapter, we adopt a stratified sampling scheme to generate a smaller MC set such that the extreme probability is well estimated.

• How should the initial design for training be selected? We explore two different choices and compare their performances in an engineering model.

• Based on the hypothesis testing-based criterion, we generate diagnostic plots to check whether the sequence has converged. It is critically important in practice to know when to stop, since the "true" probability and quantile are unknown.

• When estimating the unknown parameters of a GP, we use the Bayesian method proposed in Chapter 3 instead of the MLE paradigm. Both the EI and the hypothesis testing-based criteria contain the standard error of the response at an untried point. Bayesian methods are able to fully quantify the parameter estimation uncertainty. Therefore, it is preferable to use Bayesian methods to train a GP model and predict in sequential design.

Those are the topics we will address in this chapter; they are very practice-oriented. It is clear that this chapter involves several details that have not drawn much attention in the literature.
In summary, the major contribution we make here is to fill in the details that close the gap between the theory of sequential design and its practical use.

The rest of the chapter is organized as follows. In Section 5.2, we introduce the wood computer model used to illustrate the methodology. In Section 5.3, the sequential algorithms and the aforementioned criteria are reviewed. A study based on an engineering model is reported in Section 5.4. The wood computer model is revisited in Section 5.5. Some concluding remarks are made in Section 5.6.

5.2 Computer Model of a Floor System

The application that motivates this research is a numerical model of a floor system (McCutcheon, 1984). The performance characteristic of interest is the floor's maximum deflection under a static load. The d inputs x_1, ..., x_d to the model are the modulus of elasticity (MOE) values, in units of psi, of the d supporting joists. For instance, Figure 5.1 shows a system with d = 8 joists. The load could be a further input, but it will be kept constant in the analysis of Section 5.5 because we are more interested in the relationship between the MOEs and the floor's maximum deflection. Given the joist MOE inputs, the computer model outputs the d deflections of the d joists, and the maximum deflection is taken as the output of interest, y (units: inches); see Figure 5.2. The computer code will be treated as a black box, i.e., we do not take account of the form of the equations in the mathematical representation.

Figure 5.1: Floor with d = 8 joists (supporting beams). The 8 joists act like springs, i.e., they deflect under a load.

Figure 5.2: Computer model of a floor system with d = 4 joists: the MOE inputs and a uniform load enter the computer model, which solves a set of simultaneous equations for the four joist deflections; the maximum deflection y is the output.

The code is deterministic, but if we think of the inputs x_1, ..., x_d as realizations of random variables X_1, ..., X_d, then the output deflection y is a realization of a random variable Y. The reliability problem is to estimate either:

• the failure probability p_f = P(Y > y_f), for a given maximum allowable deflection y_f; or
• the quantile y_f such that P(Y > y_f) = p_f, where p_f is a given small failure probability.

5.3 Sequential Experimental Design

Here we describe the sequential design strategies for estimation of a tail probability or quantile. The heart of a sequential algorithm is the criterion for adding new evaluations to the search, and we contrast existing criteria.

5.3.1 Sequential Algorithms

The basic idea is to apply MC using the GP predictions of y(x_MC^(i)) for all points in the MC set X_MC rather than evaluating the computer model. Given initial training data of n_0 evaluations, y(x^(1)), ..., y(x^(n_0)), and hence a trained GP, Algorithms 2 and 3 compute tail probability and quantile estimates, respectively. They add n_+ points sequentially according to the search criteria in Sections 5.3.2 and 5.3.3.

Note that in Section 5.5, the probability estimate in step 6 of Algorithm 2 will be replaced by the stratified random sampling estimate in (5.10). Step 6 in Algorithm 3 for quantile estimation will be replaced in an analogous way.
Both algorithms are based on the method of Ranjan et al. (2008) for contour estimation, up to the search criterion. Mapping out a given contour where y(x) = y_f is equivalent to separating the MC points into those where y(x_MC^(i)) > y_f versus those where y(x_MC^(i)) <= y_f. In Algorithm 2 it is straightforward to estimate p_f based on the ŷ(x_MC^(i)) (Bichon et al., 2009). For quantile estimation, Algorithm 3 estimates the unknown quantile at each iteration and then chooses the next point as if the estimated quantile were the known contour of interest (Roy and Notz, 2014).

Algorithm 2 Return an estimate of p_f = P(Y > y_f) for a given y_f
1: function ProbabilityEstimate(n_0, n_+, X, y, y_f)
2:   n = n_0
3:   for i = 1 to n_+ do
4:     Use the current training data, X and y, to fit the GP.
5:     Compute predictions ŷ(x_MC^(1)), ..., ŷ(x_MC^(N)) for the MC set.
6:     p̂_f = (1/N) Σ_{i=1}^{N} 1( ŷ(x_MC^(i)) > y_f )
7:     if i < n_+ then
8:       Choose x^(n+1) from X_cand based on a search criterion.
9:       Add x^(n+1) to X.
10:      Evaluate y(x^(n+1)) and add it to y.
11:      n is replaced by n + 1.
12:  return p̂_f

Algorithm 3 Return an estimate of y_f, where P(Y > y_f) = p_f, for a given p_f
1: function QuantileEstimate(n_0, n_+, X, y, p_f)
2:   n = n_0
3:   for i = 1 to n_+ do
4:     Use the current training data, X and y, to fit the GP.
5:     Compute predictions ŷ(x_MC^(1)), ..., ŷ(x_MC^(N)) for the MC set.
6:     Compute ŷ_f such that p_f ≈ (1/N) Σ_{i=1}^{N} 1( ŷ(x_MC^(i)) > ŷ_f )
7:     if i < n_+ then
8:       Choose x^(n+1) from X_cand based on a search criterion.
9:       Add x^(n+1) to X.
10:      Evaluate y(x^(n+1)) and add it to y.
11:      n is replaced by n + 1.
12:  return ŷ_f

5.3.2 Expected Improvement Criterion

The improvement function proposed by Ranjan et al. (2008) for mapping out a contour where y(x) equals the constant y_f can be written as

I(x) = α² s²_{y|θ}(x) − (y(x) − y_f)²  if |y(x) − y_f| < α s_{y|θ}(x), and 0 otherwise,   (5.2)

where s²_{y|θ}(x) is obtained from (1.9). The subscripts emphasize that the predictive variance is conditional on the GP parameters, and α determines the level of confidence. Thus, a large improvement results from a new evaluation at x where the uncertainty measured by s²_{y|θ}(x) is currently large and y(x) turns out to be close to the target y_f. Taking the expectation of I(x) with respect to the predictive distribution of y(x) gives the expected improvement,

E(I(x)) = ( α² s²_{y|θ}(x) − (ŷ(x) − y_f)² ) ( Φ(u₂) − Φ(u₁) )
          + s²_{y|θ}(x) ( (u₂ φ(u₂) − u₁ φ(u₁)) − (Φ(u₂) − Φ(u₁)) )
          + 2 (ŷ(x) − y_f) s_{y|θ}(x) ( φ(u₂) − φ(u₁) ),

where ŷ(x) is the predictive mean in (1.8), u₁ = (y_f − ŷ(x))/s_{y|θ}(x) − α, u₂ = (y_f − ŷ(x))/s_{y|θ}(x) + α, and φ(·) and Φ(·) are the PDF and CDF of the standard normal distribution, respectively. The sequential design criterion for selecting the next point, x*, is to maximize the expected improvement:

x* = argmax_{x ∈ X_cand} E[I(x)],   (5.3)

where X_cand can be either a discrete or a continuous candidate set.
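A minimal sketch of the EI computation over a finite candidate set is given below, assuming vectors of predictive means (yhat) and standard errors (s) are available at the candidate points; α = 1.96 is a common choice for the confidence level, not necessarily the value used elsewhere in this thesis.

    # Minimal sketch: expected improvement for contour/quantile estimation,
    # vectorized over a candidate set; the next point maximizes ei_contour.
    ei_contour <- function(yhat, s, y_f, alpha = 1.96) {
      u1 <- (y_f - yhat) / s - alpha
      u2 <- (y_f - yhat) / s + alpha
      (alpha^2 * s^2 - (yhat - y_f)^2) * (pnorm(u2) - pnorm(u1)) +
        s^2 * ((u2 * dnorm(u2) - u1 * dnorm(u1)) - (pnorm(u2) - pnorm(u1))) +
        2 * (yhat - y_f) * s * (dnorm(u2) - dnorm(u1))
    }
    # x_next <- X_cand[which.max(ei_contour(yhat, s, y_f)), ]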
5.3.3 Hypothesis Testing-Based Criterion (Distance-Based Criterion)

Roy and Notz (2014) defined the "discrepancy" between y_f and y(x) at any untried input set to be

D(x) = ( (y(x) − y_f)² + ε ) / s²_{y|θ}(x)  if s²_{y|θ}(x) ≠ 0, and ∞ otherwise,   (5.4)

where ε > 0. If ε = 0, the expression looks like an F-statistic. Roy and Notz (2014) proposed to choose the next point to minimize the expectation of D(x). The expectation with respect to the predictive distribution of y(x) is

E(D(x)) = ( (ŷ(x) − y_f) / s_{y|θ}(x) )² + ε / s²_{y|θ}(x) + 1  if s²_{y|θ}(x) ≠ 0, and ∞ otherwise.   (5.5)

In this chapter, we work with ε = 0. Minimizing E(D(x)) is then equivalent, when s²_{y|θ}(x) ≠ 0, to

x* = argmin_{x ∈ X_cand} ( (ŷ(x) − y_f)² / s²_{y|θ}(x) )^(1/2) = argmin_{x ∈ X_cand} |ŷ(x) − y_f| / s_{y|θ}(x).   (5.6)

Minimizing (5.6) is also equivalent to maximizing the probability

x* = argmax_{x ∈ X_cand} P(x) = argmax_{x ∈ X_cand} P( Z < −|ŷ(x) − y_f| / s_{y|θ}(x) ),   (5.7)

where Z ~ N(0, 1). We call |ŷ(x) − y_f| / s_{y|θ}(x) the distance between the predicted response and the quantile, adjusted by the prediction standard error; we therefore also call this the distance-based criterion. It is similar in spirit to the hypothesis testing-based criterion. The magnitude of the probability is governed by two terms: (1) how close ŷ(x) is to y_f, and (2) how large s²_{y|θ}(x) is. The criterion tends to select two types of points: (1) given the same predictive variance, the point whose predictive mean is closest to y_f, which favours local search; and (2) given the same predictive mean, the point with the largest predictive variance, which leads to global search. Thus the algorithm combines local and global search. The criterion will not select a point that is either in the initial training set or has already been selected, as its information has already been incorporated.

5.4 An Example: Short Column Function

The short column function models a short column with uncertain material properties, subject to uncertain loads. It was used by Kuschel and Rackwitz (1997) to study the trade-off between structural reliability and cost. We consider only the reliability component of the function. For a given input x = {y, m, p} it is

f(x) = 1 − 4m / (w h² y) − p² / (w² h² y²),   (5.8)

where the output f(x) is the limit state, i.e., the difference between resistance and load, and f(x) < 0 means the system fails. The three inputs (Y, M, P) and their explicit distributions (Kuschel and Rackwitz, 1997) are:

• Y ~ Lognormal(mean = 5, sd = 0.5). Y is the yield stress.
• M ~ N(μ = 2000, σ = 400). M is the bending moment.
• P ~ N(μ = 500, σ = 100). P is the axial force.

The parameters w (width of the cross-section, in mm) and h (depth of the cross-section, in mm) are meaningfully chosen as w = 3, h = 10, such that a simulation based on 10 million points generated from the respective input distributions yields P(f(x) < 0) = 0.0025 with negligible binomial standard error. Therefore, the true probability of system failure is taken to be 0.0025. This is the probability that we aim to approximate using sequential methodology.

Note that the probability of interest in the short column function is actually a lower-tail probability. Hence step 6 of Algorithm 2 changes to p̂_f = (1/N) Σ_{i=1}^{N} 1( ŷ(x_MC^(i)) < 0 ). The quantile estimation in Algorithm 3 changes in a similar way.

5.4.1 Probability Approximation

Suppose the experimental budget is n = 40. The size of the initial design is kept at n_0 = 20. We consider two different choices for the initial design: the first option is to generate it independently from the respective input distributions. The second is to first generate a random LHD on [0, 1] and then map the three variables onto the mean ±3 standard deviations of their respective distributions. We label the first option as "random" and the second as "uniform". In order to estimate the failure probability well, we can see from (5.8) that large m and p and small y in the training set tend to lead to failure. Hence, we speculate that the uniform initial design will perform better, as it over-samples the tails of each distribution.
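For reference, the limit state (5.8) and a direct Monte Carlo check of the failure probability can be sketched as follows. The lognormal parameterization here assumes the stated mean and standard deviation are on the original scale and moment-matches them to the log scale; the sample size is reduced from the 10^7 used in the text to keep the sketch fast.

    # Minimal sketch: the short column limit state (5.8) and a direct MC
    # estimate of P(f(X) < 0) with w = 3, h = 10.
    short_column <- function(y, m, p, w = 3, h = 10) {
      1 - 4 * m / (w * h^2 * y) - p^2 / (w^2 * h^2 * y^2)
    }
    N  <- 1e6                             # the text uses 10^7
    v  <- log(1 + (0.5 / 5)^2)            # moment-matched log-scale variance
    ys <- rlnorm(N, meanlog = log(5) - v / 2, sdlog = sqrt(v))
    ms <- rnorm(N, mean = 2000, sd = 400)
    ps <- rnorm(N, mean = 500, sd = 100)
    mean(short_column(ys, ms, ps) < 0)    # reported to be about 0.0025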
The MC set is 100,000 points independently generated from the respective input distributions. The optimization in (5.3) or (5.7) is done by additionally generating 10,000 points from the respective input distributions to form a finite candidate set and adding the point that optimizes (5.3) or (5.7) over the candidate set. The whole process is repeated 10 times.

The above sample sizes are meaningfully chosen such that they keep a balance between the approximation accuracy and the computational time. A constant mean power exponential GP model is used as the statistical surrogate and inference on the unknown GP parameters is conducted using the full Bayesian method implemented in Chapter 3. The results are shown in Figure 5.3, Figure 5.4 and Table 5.1.

Table 5.1: RMSE based on the final estimates for probability estimation.

  Initial design    EI        Distance-based
  Random            0.00439   0.00014
  Uniform           0.00094   0.00011

From Figure 5.3, after additionally adding 10 points, the distance-based criterion converges to the "true" probability. It stays there and does not jump anywhere else. However, the EI criterion barely shows any sign of convergence. Table 5.1 suggests the distance-based criterion performs better than EI for the random initial design.

Figure 5.3: Probability estimation with the random initial design. (a) Estimates from the distance-based criterion. (b) Estimates from the EI criterion. The medians over 10 repeat experiments are joined by solid lines. The dotted line is the true probability, 0.0025.

Figure 5.4 tells the same story for the uniform initial design: the distance-based criterion outperforms EI. The second row of Table 5.1 confirms this. In addition, if we compare Figure 5.4 to Figure 5.3, the uniform initial design performs better than the random initial design, in agreement with our expectation. It is also interesting that the uniform initial design "helps" more for the EI criterion than for the distance-based criterion.

Figure 5.4: Probability estimation with the uniform initial design. (a) Estimates from the distance-based criterion. (b) Estimates from the EI criterion. The medians over 10 repeat experiments are joined by solid lines. The dotted line is the true probability, 0.0025.

Moreover, we further increased the sample size for the EI criterion with the random initial design only, and observed (though not reported here for brevity) that the EI method eventually converges after adding 30 more points beyond the 20-point initial design, making the total sample size n = 50. This clearly indicates that, compared with the distance-based criterion, the EI criterion converges more slowly. But why is this the case? We explore the performance difference below.
Let us define t₁ = M/Y and t₂ = (P/Y)². Then (5.8) can be rewritten as

f(x) = 1 − (4 / (w h²)) t₁ − (1 / (w² h²)) t₂,

so f is a linear function of t₁ and t₂. We concentrate on one repeat and, in the (t₁, t₂) plane, show the initial points, the contour of interest and the 20 points added: Figure 5.5(a) for EI and Figure 5.5(b) for the distance-based method.

From Figure 5.5, we observe that the initial points are far from the contour of interest and the sequential method manages to explore the unknown space and settle down in the region of interest. However, among the 20 points added, the distance-based criterion places 13 points on the contour at level 0, which is the search target. But from Figure 5.5(a), the EI method adds only 3 points close to the contour, i.e., it wastes many points in uninteresting regions of the input space. This can explain why it does not converge to the "true" probability with 20 points added.

Figure 5.5: The dots are the 20-point initial design. The solid line is the contour f(x) = 0 of interest. The triangles are points added, with the order indicated within each triangle. (a) EI, 20 points added. (b) Distance-based, 20 points added.

5.4.2 Quantile Estimation

Following the mode of analysis used above, we investigate extreme quantile estimation with the same settings as in Section 5.4.1. For the two initial designs and the two search criteria, results are reported in Figure 5.6, Figure 5.7 and Table 5.2.

Table 5.2: RMSE based on the final estimates for quantile estimation.

  Initial design    EI        Distance-based
  Random            0.07592   0.01287
  Uniform           0.04999   0.00991

The results are similar to what we observed in Section 5.4.1. The distance-based criterion leads to estimates that converge faster than those for the EI method for both initial designs. The uniform initial design performs better than the random initial design. Based on these observations, we recommend the uniform initial design and the distance-based criterion.

Figure 5.6: Quantile estimation with the random initial design. (a) Estimates from the distance-based criterion. (b) Estimates from the EI criterion. The medians are joined by solid lines. The dotted line is the true 0.0025 quantile, 0.

Figure 5.7: Quantile estimation with the uniform initial design. (a) Estimates from the distance-based criterion. (b) Estimates from the EI criterion. The medians are joined by solid lines. The dotted line is the true 0.0025 quantile, 0.
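Given these recommendations, the distance-based selection rule (5.6) on a finite candidate set amounts to one line of arithmetic. A minimal sketch, assuming vectors of predictive means yhat and standard errors s at the candidate points:

    # Minimal sketch: choose the next design point by minimizing the
    # standardized distance |yhat - y_f| / s over the candidate set.
    next_point_distance <- function(X_cand, yhat, s, y_f) {
      d <- abs(yhat - y_f) / s
      X_cand[which.min(d), , drop = FALSE]
    }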
5.5 Application to the Computer Model of the Floor System

5.5.1 Preliminary Analysis

Consider the computer model introduced in Section 5.2 with d = 8. Before doing any formal modelling, we carry out a preliminary sensitivity analysis by fitting a constant mean Gaussian process with an n = 20 random Latin hypercube design (LHD) (McKay et al., 1979). We use functional analysis of variance (ANOVA), which takes each variable's main effect as well as all of the two-factor interaction effects into account (Schonlau and Welch, 2005). All higher-order interactions are ignored. The ANOVA decomposition results are reported in Table 5.3.

Table 5.3: ANOVA contribution analysis. The 8 main effects explain 99.23% of the total variance. Interactions between two input factors are not shown, as none of them contributes more than 0.5%.

  Input (MOE)   % Variation
  x1            0.01
  x2            5.33
  x3            12.83
  x4            26.21
  x5            37.51
  x6            16.14
  x7            0.97
  x8            0.23

From Table 5.3, we observe that the four middle beams (x3, x4, x5, x6) have the most important effects on the response, y. The relationship between them and the response is illustrated in Figure 5.8, again using the methods of Schonlau and Welch (2005). The relationships between the four side beams and y are similar to those of the four middle beams but weaker. It is clear from Figure 5.8 that a small MOE value results in a larger deflection. Therefore, the lower tail of the input distribution H is important when estimating the upper-tail probability. This is valuable prior information obtained from the preliminary analysis.

Figure 5.8: The relationship between the MOE of the four middle beams and the response. The solid lines are the estimated main effects, with 95% pointwise confidence intervals as dotted lines. Note that the inputs are between 0.77 × 10^6 and 2.36 × 10^6 pounds per square inch, which will be explained in Section 5.5.2.

Note that in Section 5.4, with the short column function explicitly specified in (5.8), we could determine immediately the critical part of the input space. In practice, however, when the functional relationship between inputs and output is unclear, we can carry out a preliminary study as in this section to examine the effects of the inputs.

5.5.2 Modelling the Input Distribution

We need to specify the input distribution. From the analysis in Section 5.5.1, we know that a small MOE leads to a large deflection. Therefore, the lower tail of the input distribution is critical. A dataset of 580 MOE values measured from 580 boards collected from a mill is available to us. One way to proceed is to fit a parametric input distribution, for example a Weibull. However, training a parametric distribution on all of the 580 MOE values does not emphasize the importance of the lower tail. Therefore, at the beginning of the research, we considered the following two semi-parametric input distributions:

• a mixture distribution, with probability p₁ = 0.1 of sampling from a two-parameter censored Weibull distribution and probability p₂ = 0.9 of sampling with replacement from the upper 90% of the available data. The two-parameter censored Weibull distribution is trained on the lower 10% of the available data, with the rest (90% of the data) treated as right censored (Liu et al., 2018);
• a mixture distribution, with probability p₁ = 0.1 of sampling from a three-parameter censored Weibull distribution and probability p₂ = 0.9 of sampling with replacement from the upper 90% of the available data. The three-parameter censored Weibull distribution is trained on the lower 10% of the available data, with the rest (90% of the data) treated as right censored (Liu et al., 2018).

The unknown parameters were estimated by maximum likelihood, and the fitted density curves are added to the empirical histogram of the lower 10% of the data shown in Figure 5.9. It is clear that there is a non-negligible bump (3 boards) in the lower tail of the dataset, which neither of the semi-parametric distributions is able to characterize. Hence, we propose to use the following non-parametric input distribution, H(x):

H(x) = p₁ G(x) + p₂ S(x),   (5.9)

where G(x) is the empirical distribution of the lower 10% of the dataset and S(x) is the empirical distribution of the upper 90% of the dataset.

Figure 5.9: Modelling the lower 10% of the data: histogram of the lower 10% of the data, with the fitted two-parameter and three-parameter censored Weibull distributions.

p₁ = p₂ = 0.5 is used to deliberately over-sample the lower tail of the input distribution. This will later be corrected using methods for weighted stratified random sampling when computing the probability/quantile estimates. In a nutshell, we use the empirical distribution of the 580 MOE data values as the input distribution, but deliberately over-sample the lower tail when generating the MC set and the finite candidate set.

5.5.3 MC Set

The MC set comprises 12,800 points generated as follows. Each dimension has 2 strata for the mixture distribution: either sampling from G(x) or sampling from S(x). Hence, in total there are 2^8 = 256 strata. For each stratum, we generate 50 different points from the relevant combination of the G(x) and S(x) distributions, and thus we have 256 × 50 = 12,800 points in total to form the MC set. The above sample sizes are meaningfully chosen such that they keep a balance between the approximation accuracy and the computational time.

Working with this MC set, step 6 in Algorithm 2 for probability estimation changes to

p̂_f = Σ_{h=1}^{256} w_h [ (1/n_h) Σ_{i=1}^{n_h} 1( ŷ(x_MC^(i)) > y_f ) ],   (5.10)

where h = 1, 2, ..., 256 indexes the strata and the weights w_h satisfy Σ_{h=1}^{256} w_h = 1. For instance, the combination with all dimensions coming from G(x) has weight w_h = 0.1^8 as its actual stratum weight. For quantile estimation, step 6 in Algorithm 3 also changes, to finding ŷ_f such that p̂_f equals a pre-specified probability, a. In practice, any trial value of y_f gives a p̂_f in (5.10). We minimize |p̂_f − a| numerically with respect to y_f using R's optimize() function to find ŷ_f with 10^-6 tolerance; a code sketch is given at the end of Section 5.5.4.

5.5.4 True Probability and Quantile

With p_f = 0.001, we simulated 10 different MC sets from the mixture distribution in (5.9) and ran the computer model. Over the 10 repeats, the mean of the estimated 0.999 quantiles is 3.88057 inches and the standard error of the mean is 0.00701. Therefore, the "true" 99.9th percentile is taken to be 3.88057 inches.
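A minimal sketch of the stratified estimate (5.10) and its inversion for a quantile follows. Here yhat_by_stratum is assumed to be a list with one vector of GP predictions per stratum, and w the vector of true stratum weights (summing to 1).

    # Minimal sketch: weighted stratified probability estimate (5.10).
    pf_stratified <- function(yhat_by_stratum, w, y_f) {
      sum(w * vapply(yhat_by_stratum, function(yh) mean(yh > y_f), numeric(1)))
    }
    # Quantile: find y_f whose stratified estimate matches a target a, using
    # base R's optimize() with the 1e-6 tolerance mentioned above.
    quantile_stratified <- function(yhat_by_stratum, w, a) {
      optimize(function(yf) abs(pf_stratified(yhat_by_stratum, w, yf) - a),
               interval = range(unlist(yhat_by_stratum)), tol = 1e-6)$minimum
    }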
5.5.5 Application Results

Suppose the design budget is n = 30. The number of points for the initial design is n_0 = 20, and 10 additional points are chosen by optimizing a search criterion and added to the design sequentially. The whole process is repeated K = 10 times with different initial designs. The MC set contains the 12,800 points simulated as described in Section 5.5.3. The finite candidate set is the same as the MC set. For the initial design with only n_0 = 20 points, however, we take a random uniform LHD to cover the input space globally, following the recommendations from the short column model in Section 5.4. A constant mean GP model with the power exponential correlation structure is used as the statistical surrogate, and inference about the unknown GP parameters is conducted using the full Bayesian method implemented in Chapter 3. Results based on the distance-based criterion are reported in Figure 5.10.

Figure 5.10: The computer model of the floor system; the initial design is uniform, with the distance-based criterion. (a) Probability estimates. (b) Quantile estimates. The medians over 10 repeat experiments are joined by solid lines. The dotted line is the true probability 0.001 in (a) and the true quantile 3.88057 in (b).

We can see from Figure 5.10 that, after adding the 10 additional points, both the probability estimate and the quantile estimate are very close to the true values, which demonstrates the effectiveness of the sequential design. Moreover, we estimate the extreme probability and quantile well using merely the carefully designed 12,800 points in the MC and candidate sets with a reweighted input distribution. This is a result of the preliminary analysis, by which we showed a negative association between the MOE of the joists and the associated deflection.

5.6 Comments and Discussion

5.6.1 Diagnostics

We next discuss convergence diagnostics for sequences based on the distance-based criterion. From the formulation in (5.6), we know that as the sample size of the training set increases to infinity, the distance |ŷ(x) − y_f| / s_{y|θ}(x), multiplied by −1 as in (5.7), will go to negative infinity. At each step, after a point is added, we compute the distance criterion point-wise for the MC set and draw a box plot against the sample size of the training set. We would expect a declining trend as the sample size increases. Based on one repeat, the diagnostic plots for the short column function with the uniform initial design and for the floor system model are reported in Figures 5.11 and 5.12.

Figure 5.11: Diagnostic plots for the short column function with the uniform initial design. (a) Probability estimation. (b) Quantile estimation.

From Figures 5.11 and 5.12, the observations are consistent with our expectation, as all plots exhibit a decreasing trend. After adding 20 points to the initial design for the short column function, most of the distances multiplied by −1 as in (5.7) in the MC set are less than −20, i.e., a huge standardized distance between the predictive mean and the contour of interest. The pattern carries over to the computer model of the floor system, which indicates the algorithm is converging. This information is important when one works with an expensive computer code, for which the "true" probability/quantile would not be known.

Figure 5.12: Diagnostic plots for the floor system computer model. (a) Probability estimation. (b) Quantile estimation.
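The diagnostic itself is straightforward to compute. A minimal sketch, assuming a list history with one element per sequential step, each holding the GP predictive means (yhat) and standard errors (s) over the MC set; this data structure is an illustrative assumption, not a fixed interface.

    # Minimal sketch: signed standardized distances -|yhat - y_f| / s for
    # each step of the sequence; one box plot per training-set size should
    # drift downward as the algorithm converges.
    diag_distances <- function(history, y_f) {
      lapply(history, function(h) -abs(h$yhat - y_f) / h$s)
    }
    # boxplot(diag_distances(history, y_f),
    #         names = n0 + seq_along(history), xlab = "training-set size")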
5.6.2 Final Remarks

This chapter illustrates the usefulness of a sequential method in estimating an extreme probability or its associated quantile. Throughout the chapter, we have addressed some very practical issues an engineer faces when using a sequential design. From all of the analyses, we have the following recommendations:

• Use the distance-based criterion, which is more straightforward and converges faster than EI in our study.
• Use the uniform initial design, which over-samples one or both tails (for the floor model only one tail is over-sampled).
• A stratified-sampling MC set is very helpful and is preferred over randomly generating a huge MC set from the input distribution.
• It can be important to do a preliminary analysis to gain some insight into the relationship between inputs and output.
• How to model the input distribution is an important step when estimating an extreme probability/quantile.
• The search criteria considered in this chapter use the standard error of prediction to guide local/global search. As Bayesian methods can provide more realistic uncertainty estimates (Chapter 3), their use is recommended for sequential search.

Chapter 6

Conclusions and Future Work

The Gaussian process model has been widely used as a standard statistical model to analyze complex black-box computer models. However, several fundamental aspects of GPs require in-depth analysis. The aim of this thesis is not only to provide a state-of-the-art assessment of the important aspects and to suggest solutions to outstanding issues, but also to extend existing methods to overcome the drawbacks revealed by the analyses. The scope of the thesis is rather broad: Chapters 2 and 3 discuss the analysis of computer models; Chapters 4 and 5 focus on the design of computer experiments. We provide a summary of each chapter in Section 6.1. Some future work is discussed in Section 6.2.

6.1 Summary of the Thesis

Chapter 2 provides a comprehensive study of the analysis of computer experiments. The primary aim of that chapter is to answer the following two important questions when faced with a computer code and a set of runs, where the primary goal is to make predictions for untried sets of inputs: (1) how to choose the regression terms of a GP; and (2) how to choose the correlation structure of a GP.
Through intensive evaluation of several real computer models, we conclude that a constant mean regression model is adequate for all the examples we have considered, and the PowExp and Matérn correlation functions are recommended. Several other conclusions are:

1. Design matters but cannot be controlled completely. Variability of performance from equivalent designs can be uncomfortably large.

2. The Matérn correlation function, optimized over a few levels of smoothness, is a reasonable alternative to PowExp.

3. Full Bayes methods relying on the SqExp structure were seen to be sometimes inferior, and extensions to accommodate less smooth correlations are needed.

4. There is not enough evidence to conclude that composite GaSP methods are ever better than a constant mean GP model with the PowExp structure in practical settings where the output exhibits non-stationary behaviour.

In Chapter 3, Bayesian implementations with the SqExp correlation structure were extended to allow the PowExp or Matérn structure. To estimate the additional smoothness parameters in PowExp, hybrid and fully Bayesian approaches were proposed and applied. The main contributions are:

1. We combine Bayesian inference with the flexible correlation structures to propose new methods. Bayesian inference can quantify all sources of parameter estimation uncertainty, and the flexible correlation structures can estimate the needed smoothness from the available data. We essentially combine the good properties of each component.

2. The methods we proposed not only have better uncertainty quantification, but also have better prediction accuracy than methods with the SqExp structure in examples where roughness is needed.

Chapter 4 conducts an extensive analysis of the design of computer experiments. We first reviewed several popular design classes and applied different examples to assess the prediction performance of the designs. Our important findings are:

1. Given the same sample size, we find mLHD has the best overall prediction accuracy. Therefore, mLHD is the default design we suggest if there is no prior information available.
2. Sample size and transformation of y have much bigger effects on the prediction performance than the design class does.

Chapter 5 provides insights into how to use sequential design to solve a real engineering problem. In Chapter 4, we evaluated the performances of different design classes; those designs are fixed in the sense that, once generated, they do not change. Chapter 5 discusses sequential design, an adaptive process in which the design updates "on the fly" as new points are sequentially added. With the purpose of estimating a failure probability and its associated quantile, we contrast the performances of two sequential search criteria. Moreover, several other practical issues have also been carefully studied, and some important recommendations are made.

6.2 Future Work

The thesis suggests a number of future projects. First of all, in Chapter 2, we show that there is a lack of information to conclude that the CGP has better performance than a GP with the PowExp structure in practical settings where the output exhibits nonstationary behaviour. There are several methods for dealing with nonstationary outputs in the literature, such as the treed GP (TGP) (Gramacy and Lee, 2008) and the local GP (Gramacy and Apley, 2015). It would be interesting to contrast the performances of those methods with that of a GP with the PowExp structure when nonstationarity exists.

In Chapter 3, we discussed the PowExp and Matérn structures for Bayesian GPs. Other correlation families in the literature, for example the nonstationary covariance function used by Paciorek and Schervish (2004), are worth trying. In other words, we would follow steps similar to those for Bayesian inference of a GP in Chapter 3, but with a nonstationary covariance structure instead of the power exponential or Matérn structure.

In Chapter 4, we expanded Franke's function to have 2-dimensional interactions only and presumably used designs with good 2-dimensional projection properties. But the performances of the projection-based designs are not good for Franke's function. Are there other types of functions for which there would be substantial improvement from the projection-based designs? This remains unknown.

When optimizing the sequential criterion, we used a fixed finite candidate set in Chapter 5. It would be desirable to have a dynamic candidate set, which can update automatically based on the point being added to the training set. Moreover, we acknowledge that both examples used in Chapter 5 are relatively simple. We will seek other engineering models to compare the performance of the existing sequential methods. Those other models may well suggest new research objectives. Therefore, it is possible that the existing sequential methods will need to be modified, or that we will have to propose a completely different criterion to fit the new objective. Whatever the engineering objectives, a sequential method is likely to be advantageous for a complex application.

Bibliography

Abt, M. (1999), "Estimating the Prediction Mean Squared Error in Gaussian Stochastic Processes with Exponential Correlation Structure," Scandinavian Journal of Statistics, 26, 563–578.

Andrianakis, I. and Challenor, P. G. (2012), "The Effect of the Nugget on Gaussian Process Emulators of Computer Models," Computational Statistics & Data Analysis, 56, 4215–4228.

Ba, S. and Joseph, R. (2012), "Composite Gaussian Process Models for Emulating Expensive Functions," Annals of Applied Statistics, 6, 1838–1860.

Ba, S., Myers, W. R., and Brenneman, W. A. (2015), "Optimal Sliced Latin Hypercube Designs," Technometrics, 57, 479–487.

Barthelmann, V., Novak, E., and Ritter, K. (2000), "High Dimensional Polynomial Interpolation on Sparse Grids," Advances in Computational Mathematics, 12, 273–288.

Bastos, L. and O'Hagan, A. (2009), "Diagnostics for Gaussian Process Emulators," Technometrics, 51, 425–438.

Bayarri, M., Berger, J., Paulo, R., Sacks, J., Cafeo, J., Cavendish, J., Lin, C.-H., and Tu, J. (2007), "A Framework for Validation of Computer Models," Technometrics, 49, 138–154.

Bayarri, M. J., Berger, J. O., Calder, E. S., Dalbey, K., Lunagomez, S., Patra, A. K., Pitman, E. B., Spiller, E. T., and Wolpert, R. L. (2009), "Using Statistical and Computer Models to Quantify Volcanic Hazards," Technometrics, 51, 402–413.

Bichon, B. B., Mahadevan, S., and Eldred, M. S. (2009), "Reliability-Based Design Optimization Using Efficient Global Reliability Analysis," 50th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, Palm Springs, California, pp. 1–12.
Bingham, D., Ranjan, P., and Welch, W. (2014), "Design of Computer Experiments for Optimization, Estimation of Function Contours, and Related Objectives," in Statistics in Action: A Canadian Outlook, ed. Lawless, J. F., Boca Raton, Florida: CRC Press, chap. 7, pp. 110-123.

Cantelli, F. P. (1933), "Sulla Determinazione Empirica Delle Leggi di Probabilita," Giornale Istituto Italiano Attuari, 4, 421-424.

Caselton, W. F. and Zidek, J. V. (1984), "Optimal Monitoring Network Designs," Statistics & Probability Letters, 2, 223-227.

Chapman, W. L., Welch, W. J., Bowman, K. P., Sacks, J., and Walsh, J. E. (1994), "Arctic Sea Ice Variability: Model Sensitivities and a Multidecadal Simulation," Journal of Geophysical Research: Oceans, 99, 919-935.

Chen, H. (2013), "Bayesian Prediction and Inference in Analysis of Computer Experiments," Master's thesis, University of British Columbia, Vancouver.

Dette, H. and Pepelyshev, A. (2010), "Generalized Latin Hypercube Design for Computer Experiments," Technometrics, 52, 421-429.

Fang, K.-T., Lin, D. K., Winker, P., and Zhang, Y. (2000), "Uniform Design: Theory and Application," Technometrics, 42, 237-248.

Franke, R. (1979), "A Critical Comparison of Some Methods for Interpolation of Scattered Data," Tech. rep., Naval Postgraduate School, Monterey, CA.

Glivenko, V. (1933), "Sulla Determinazione Empirica Della Legge di Probabilita," Giornale Istituto Italiano Attuari, 4, 92-99.

Gramacy, R. and Lee, H. (2008), "Bayesian Treed Gaussian Process Models With an Application to Computer Modeling," Journal of the American Statistical Association, 103, 1119-1130.

Gramacy, R. B. and Apley, D. W. (2015), "Local Gaussian Process Approximation for Large Computer Experiments," Journal of Computational and Graphical Statistics, 24, 561-578.

Gramacy, R. B. and Lee, H. K. (2012), "Cases for the Nugget in Modeling Computer Experiments," Statistics and Computing, 22, 713-722.

Handcock, M. S. and Stein, M. L. (1993), "A Bayesian Analysis of Kriging," Technometrics, 35, 403-410.

Higdon, D., Gattiker, J., Williams, B., and Rightley, M. (2008), "Computer Model Calibration Using High-Dimensional Output," Journal of the American Statistical Association, 103, 570-583.

Johnson, M., Moore, L., and Ylvisaker, D. (1990), "Minimax and Maximin Distance Designs," Journal of Statistical Planning and Inference, 26, 131-148.

Jones, D. R., Schonlau, M., and Welch, W. J. (1998), "Efficient Global Optimization of Expensive Black-box Functions," Journal of Global Optimization, 13, 455-492.

Joseph, V. R., Gul, E., and Ba, S. (2015), "Maximum Projection Designs for Computer Experiments," Biometrika, 102, 371-380.

Joseph, V. R., Hung, Y., and Sudjianto, A. (2008), "Blind Kriging: a New Method for Developing Metamodels," Journal of Mechanical Design, 130, 031102-1-8.

Kaufman, C. G., Bingham, D., Habib, S., Heitmann, K., and Frieman, J. A. (2011), "Efficient Emulators of Computer Experiments Using Compactly Supported Correlation Functions With an Application to Cosmology," Annals of Applied Statistics, 2470-2492.

Kennedy, M. (2004), "Description of the Gaussian Process Model Used in GEM-SA," Tech. rep., University of Sheffield, available online at http://www.tonyohagan.co.uk/academic/GEM/.

Kennedy, M. and O'Hagan, A. (2001), "Bayesian Calibration of Computer Models," Journal of the Royal Statistical Society, Series B, 63, 425-464.

Kuschel, N. and Rackwitz, R. (1997), "Two Basic Problems in Reliability-Based Structural Optimization," Mathematical Methods of Operations Research, 46, 309-333.
Liu, Y., Salibián-Barrera, M., Zamar, R. H., and Zidek, J. V. (2018), "Using Artificial Censoring to Improve Extreme Tail Quantile Estimates," Journal of the Royal Statistical Society: Series C (Applied Statistics), 1-22. DOI: 10.1111/rssc.12262.

Loeppky, J., Sacks, J., and Welch, W. (2009), "Choosing the Sample Size of a Computer Experiment: A Practical Guide," Technometrics, 51, 366-376.

McCutcheon, W. J. (1984), "Deflections of Uniformly Loaded Floors: a Beam-Spring Analog," Tech. Rep. Research Paper FPL449, United States Department of Agriculture, Madison, WI.

McKay, M., Beckman, R., and Conover, W. (1979), "A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code," Technometrics, 21, 239-245.

Morris, M., Mitchell, T., and Ylvisaker, D. (1993), "Bayesian Design and Analysis of Computer Experiments: Use of Derivatives in Surface Prediction," Technometrics, 35, 243-255.

Niederreiter, H. (1988), "Low Discrepancy and Low Dispersion Sequences," Journal of Number Theory, 30, 51-70.

Niederreiter, H. (1992), Random Number Generation and Quasi-Monte Carlo Methods, CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM, 1st ed.

Nilson, T. and Kuusk, A. (1989), "A Reflectance Model for the Homogeneous Plant Canopy and its Inversion," Remote Sensing of Environment, 27, 157-167.

Paciorek, C. J. and Schervish, M. J. (2004), "Nonstationary Covariance Functions for Gaussian Process Regression," in Advances in Neural Information Processing Systems, pp. 273-280.

Paulo, R. (2005), "Default Priors for Gaussian Processes," Annals of Statistics, 33, 556-582.

Picheny, V., Ginsbourger, D., Richet, Y., and Caplin, G. (2013), "Quantile-Based Optimization of Noisy Computer Experiments With Tunable Precision," Technometrics, 55, 2-13.

Plumlee, M. (2014), "Fast Prediction of Deterministic Functions Using Sparse Grid Experimental Designs," Journal of the American Statistical Association, 109, 1581-1591.

Preston, D., Tonks, D., and Wallace, D. (2003), "Model of Plastic Deformation for Extreme Loading Conditions," Journal of Applied Physics, 93, 211-220.

Pronzato, L. and Müller, W. G. (2012), "Design of Computer Experiments: Space Filling and Beyond," Statistics and Computing, 22, 681-701.

Ranjan, P., Bingham, D., and Michailidis, G. (2008), "Sequential Experimental Design for Contour Estimation from Complex Computer Codes," Technometrics, 50, 527-541.

Ranjan, P., Haynes, R., and Karsten, R. (2011), "A Computationally Stable Approach to Gaussian Process Interpolation of Deterministic Computer Simulation Data," Technometrics, 53, 366-378.

Roberts, G. O. and Rosenthal, J. S. (2009), "Examples of Adaptive MCMC," Journal of Computational and Graphical Statistics, 18, 349-367.

Rosenthal, J. S. (2014), "Optimizing and Adapting the Metropolis Algorithm," in Statistics in Action: A Canadian Outlook, ed. Lawless, J. F., Boca Raton, Florida: CRC Press, chap. 6, pp. 93-108.

Roy, S. and Notz, W. I. (2014), "Estimating Percentiles in Computer Experiments: a Comparison of Sequential-adaptive Designs and Fixed Designs," Journal of Statistical Theory and Practice, 8, 12-29.

Sacks, J., Welch, W., Mitchell, T., and Wynn, H. (1989), "Design and Analysis of Computer Experiments," Statistical Science, 4, 409-435.

Santner, T., Williams, B., and Notz, W. (2003), The Design and Analysis of Computer Experiments, Springer Series in Statistics, Springer Press, 1st ed.
Schonlau, M. and Welch, W. J. (2005), "Screening the Input Variables to a Computer Model via Analysis of Variance and Visualization," in Screening: Methods for Experimentation in Industry, Drug Discovery and Genetics, eds. Dean, A. and Lewis, S., New York: Springer, chap. 10, pp. 308-327.

Sheather, S. J. and Jones, M. C. (1991), "A Reliable Data-based Bandwidth Selection Method for Kernel Density Estimation," Journal of the Royal Statistical Society, Series B, 53, 683-690.

Shewry, M. C. and Wynn, H. P. (1987), "Maximum Entropy Sampling," Journal of Applied Statistics, 14, 165-170.

Sobol, I. (1967), "On the Distribution of Points in a Cube and the Approximate Evaluation of Integrals," USSR Computational Mathematics and Mathematical Physics, 7, 86-112.

Spiller, E. T., Bayarri, M. J., Berger, J. O., Calder, E. S., Patra, A. K., Pitman, E. B., and Wolpert, R. L. (2014), "Automating Emulator Construction for Geophysical Hazard Maps," SIAM/ASA Journal on Uncertainty Quantification, 2, 126-152.

Stein, M. (1987), "Large Sample Properties of Simulations Using Latin Hypercube Sampling," Technometrics, 29, 143-151.

Stein, M. (1988), "Asymptotically Efficient Prediction of a Random Field With a Misspecified Covariance Function," Annals of Statistics, 16, 55-63.

Stein, M. (1999), Interpolation of Spatial Data: Some Theory for Kriging, New York: Springer Press.

Tang, B. (1993), "Orthogonal Array-Based Latin Hypercubes," Journal of the American Statistical Association, 88, 1392-1397.

Welch, W. J., Buck, R. J., Sacks, J., Wynn, H. P., Mitchell, T. J., and Morris, M. D. (1992), "Screening, Predicting, and Computer Experiments," Technometrics, 34, 15-25.

Welch, W. J., Buck, R. J., Sacks, J., Wynn, H. P., Morris, M. D., and Schonlau, M. (1996), "Response to James M. Lucas," Technometrics, 38, 199-203.

West, O. R., Siegrist, R. I., Wynn, T. J., and Jenkins, R. A. (1995), "Measurement Error and Spatial Variability Effects on Characterization of Volatile Organics in the Subsurface," Environmental Science and Technology, 29, 647-656.

Appendix A: Supplemental Materials for Chapter 2

A.1 Test Functions in Chapter 2

A.1.1 Borehole Model

The output y is generated by

\[
y = \frac{2\pi T_u (H_u - H_l)}{\ln(r/r_w)\left(1 + \dfrac{2 L T_u}{\ln(r/r_w)\, r_w^2 K_w} + \dfrac{T_u}{T_l}\right)},
\]

where the 8 input variables and their respective ranges of interest are as in Table A.1.

Variable  Description (units)                        Range
rw        radius of borehole (m)                     [0.05, 0.15]
r         radius of influence (m)                    [100, 50000]
Tu        transmissivity of upper aquifer (m^2/yr)   [63070, 115600]
Hu        potentiometric head of upper aquifer (m)   [990, 1110]
Tl        transmissivity of lower aquifer (m^2/yr)   [63.1, 116]
Hl        potentiometric head of lower aquifer (m)   [700, 820]
L         length of borehole (m)                     [1120, 1680]
Kw        hydraulic conductivity of borehole (m/yr)  [9855, 12045]

Table A.1: Borehole function input variables, units, and ranges. All ranges are converted to [0, 1] for statistical modeling.
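For concreteness, the borehole response is straightforward to evaluate directly; the R sketch below simply codes the formula above on the natural input scales of Table A.1, and the example call at the midpoint of each range is purely illustrative.

```r
# Borehole function: flow rate through a borehole, coded directly from
# the formula above, on the natural (untransformed) input scales.
borehole <- function(rw, r, Tu, Hu, Tl, Hl, L, Kw) {
  lrr <- log(r / rw)
  2 * pi * Tu * (Hu - Hl) /
    (lrr * (1 + 2 * L * Tu / (lrr * rw^2 * Kw) + Tu / Tl))
}

# Illustrative evaluation at the midpoint of each range in Table A.1:
borehole(rw = 0.10, r = 25050, Tu = 89335, Hu = 1050,
         Tl = 89.55, Hl = 760, L = 1400, Kw = 10950)
```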
A.1.2 G-protein Model

The differential equations generating the output y of the G-protein system dynamics are

\[
\begin{aligned}
\dot\eta_1 &= -k_1 \eta_1 x + k_2 \eta_2 - k_3 \eta_1 + k_5, \\
\dot\eta_2 &= k_1 \eta_1 x - k_2 \eta_2 - k_4 \eta_2, \\
\dot\eta_3 &= -k_6 \eta_2 \eta_3 + k_8 (G_{\mathrm{tot}} - \eta_3 - \eta_4)(G_{\mathrm{tot}} - \eta_3), \\
\dot\eta_4 &= k_6 \eta_2 \eta_3 - k_7 \eta_4, \\
y &= (G_{\mathrm{tot}} - \eta_3)/G_{\mathrm{tot}},
\end{aligned}
\]

where \(\eta_1, \ldots, \eta_4\) are concentrations of 4 chemical species, \(\dot\eta_1 = d\eta_1/dt\), etc., and \(G_{\mathrm{tot}} = 10000\) is the (fixed) total concentration of G-protein complex after 30 seconds. The input variables in this system are described in Table A.2. Only d = 4 inputs are varied: we model y as a function of log(x), log(k1), log(k6), and log(k7).

Variable  Description            Range
k1        rate constant          [2 x 10^6, 2 x 10^7]
k2        rate constant          5 x 10^-3 (fixed)
k3        rate constant          1 x 10^-3 (fixed)
k4        rate constant          2 x 10^-3 (fixed)
k5        rate constant          8 (fixed)
k6        rate constant          [3 x 10^-5, 3 x 10^-4]
k7        rate constant          [0.3, 3]
k8        rate constant          1 (fixed)
x         initial concentration  [1.0 x 10^-9, 1.0 x 10^-6]

Table A.2: G-protein code input variables and ranges. All variables are transformed to log scales on [0, 1] for statistical modeling.
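A sketch of how y could be computed in R with the deSolve package follows; the fixed rate constants are taken from Table A.2, the four varied inputs are passed in on their natural scales, and y is read off at t = 30 seconds. The initial concentrations are illustrative assumptions for this sketch, not values taken from the thesis.

```r
library(deSolve)  # provides the ode() initial-value-problem solver

gprotein_y <- function(x, k1, k6, k7, Gtot = 10000) {
  parms <- c(k1 = k1, k2 = 5e-3, k3 = 1e-3, k4 = 2e-3,  # fixed values
             k5 = 8, k6 = k6, k7 = k7, k8 = 1, x = x)    # from Table A.2
  deriv <- function(t, eta, p) {
    with(as.list(c(eta, p)), {
      list(c(-k1 * eta1 * x + k2 * eta2 - k3 * eta1 + k5,
              k1 * eta1 * x - k2 * eta2 - k4 * eta2,
             -k6 * eta2 * eta3 + k8 * (Gtot - eta3 - eta4) * (Gtot - eta3),
              k6 * eta2 * eta3 - k7 * eta4))
    })
  }
  # Assumed initial state (illustrative): all G-protein bound,
  # no active species.
  eta0 <- c(eta1 = 10000, eta2 = 0, eta3 = Gtot, eta4 = 0)
  out <- ode(y = eta0, times = c(0, 30), func = deriv, parms = parms)
  (Gtot - out[nrow(out), "eta3"]) / Gtot  # output y at t = 30
}
```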
A.2 Results of Normalized Maximum Absolute Errors in Chapter 2

[Figure A.1: G-protein model: normalized maximum absolute error of prediction, e_max,ho, for GaSP with all combinations of three regression models (Constant, Select linear, Full linear) and four correlation functions (SqExp, PowExp, Matérn, Matérn-2). There are two base designs: a 40-run mLHD and an 80-run mLHD. For each base design, 24 random permutations of its columns give the 24 values of e_max,ho in a boxplot.]

[Figure A.2: PTW model: normalized maximum absolute error of prediction, e_max,ho, for GaSP with all combinations of three regression models and four correlation functions. There is one base design: a 110-run mLHD. 25 random permutations of its columns give the 25 values of e_max,ho in a boxplot.]

e_max,ho
Regression      Correlation function
model           SqExp   PowExp   Matérn-2   Matérn
Constant        0.35    0.27     0.28       0.28
Select linear   0.35    0.26     0.27       0.24
Full linear     0.30    0.27     0.28       0.28
Quartic         0.29    0.28     0.29       0.31

Table A.3: Nilson-Kuusk model: normalized maximum absolute error of prediction, e_max,ho, for four regression models and four correlation functions. The experimental data are from a 100-run LHD, and the hold-out set is from a 150-run LHD.

e_max,ho
Regression      Correlation function
model           SqExp   PowExp   Matérn-2   Matérn
Constant        0.21    0.16     0.18       0.17
Select linear   0.25    0.16     0.19       0.18
Full linear     0.23    0.16     0.21       0.17
Quartic         0.23    0.16     0.21       0.17

Table A.4: Nilson-Kuusk model: normalized maximum absolute error of prediction, e_max,ho, for four regression models and four correlation functions. The experimental data are from a 150-run LHD, and the hold-out set is from a 100-run LHD.

[Figure A.3: Nilson-Kuusk model: normalized maximum absolute error of prediction, e_max,ho, for four regression models and four correlation functions. Twenty-five designs are created from a 150-run LHD base plus 50 random points from a 100-run LHD. The remaining 50 points in the 100-run LHD form the holdout set for each repeat.]

[Figure A.4: Volcano model: normalized maximum absolute error of prediction, e_max,ho, for four regression models (Constant, Select linear, Full linear, Quadratic) and four correlation functions, with the output analyzed as y and as log(y + 1).]

[Figure A.5: Sea-ice model: normalized maximum absolute error of prediction, e_max,ho, for three regression models and four correlation functions. The outputs IceMass, IceArea, IceVelocity and RangeOfArea are modelled independently.]

[Figure A.6: Borehole function: normalized maximum absolute error of prediction, e_max,ho, for GaSP(Const, Gauss), GaSP(Const, PowerExp), SqExp-Full (K), and CGP. There are three base designs: a 27-run OA, a 27-run mLHD, and a 40-run mLHD. For each base design, 25 random permutations of its columns give the 25 values of e_max,ho in a boxplot.]

[Figure A.7: G-protein model: normalized maximum absolute error of prediction, e_max,ho, for GaSP(Const, Gauss), GaSP(Const, PowerExp), SqExp-Full (K), and CGP. There are two base designs: a 40-run mLHD and an 80-run mLHD. For each base design, all 24 permutations of its columns give the 24 values of e_max,ho in a boxplot.]

[Figure A.8: Nilson-Kuusk model: normalized maximum absolute error of prediction, e_max,ho, for GaSP(Const, Gauss), GaSP(Const, PowerExp), SqExp-Full (K), and CGP.]

[Figure A.9: Volcano model: normalized maximum absolute error of prediction, e_max,ho, for GaSP(Const, Gauss), GaSP(Const, PowerExp), SqExp-Full (K), and CGP, with the output analyzed as y and as log(y + 1).]
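All of the boxplot figures in this and the following appendices report the normalized maximum absolute error of prediction, e_max,ho, over a hold-out set. As a rough R sketch, assuming normalization by the worst-case error of the trivial predictor (the mean of the hold-out responses), which is one plausible reading; the definition actually used is given in Chapter 2:

```r
# Normalized maximum absolute prediction error over a hold-out set.
# The normalization by max |y - mean(y)| is an assumption for this
# sketch; see Chapter 2 for the definition used in the thesis.
e_max_ho <- function(y_hold, y_pred) {
  max(abs(y_pred - y_hold)) / max(abs(y_hold - mean(y_hold)))
}
```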
[Figure A.10: Borehole model: normalized maximum absolute error of prediction, e_max,ho, for GaSP with a nugget estimated by maximum likelihood, and with all combinations of three regression models and four correlation functions. There are three base designs: a 27-run OA, a 27-run mLHD, and a 40-run mLHD. For each base design, 25 random permutations of its columns give the 25 values of e_max,ho in a boxplot.]

[Figure A.11: Friedman function: normalized maximum absolute error of prediction, e_max,ho, for GaSP(Const, SqExp) and GaSP(Const, PowExp) models with no nugget term versus the same models with a nugget. There are two base designs: a 25-run mLHD and a 50-run mLHD. For each base design, 25 random permutations between columns give the 25 values of e_max,ho in a boxplot.]

Appendix B: Supplemental Materials for Chapter 3

B.1 Results of Normalized Maximum Absolute Errors

[Figure B.1: Normalized maximum absolute errors for the Nilson-Kuusk model (motivating example) from four methods: SqExp-Emp, SqExp-Full (K), SqExp-Full (H) and PowExp-Emp. Each method has 25 normalized maximum absolute errors from 25 random training-test data splits.]

[Figure B.2: Normalized maximum absolute error for the Nilson-Kuusk model from six methods: SqExp-Emp, SqExp-Full, PowExp-Emp, PowExp-Full, PowExp-Hybrid and Matérn-Hybrid. Each method has 25 normalized maximum absolute error values from 25 random training-test data splits.]

[Figure B.3: Normalized maximum absolute error for the volcano model from six methods: SqExp-Emp, SqExp-Full, PowExp-Emp, PowExp-Full, PowExp-Hybrid and Matérn-Hybrid. Each method has 100 normalized maximum absolute error values from 100 random training-test data splits. Two transformations of y are considered: sqrt(y) (left panel) and log(y + 1) (right panel).]

[Figure B.4: Normalized maximum absolute error for the borehole function and an 80-point mLHD base design, for the six methods Sq-Emp, Sq-Full, Pow-Emp, Pow-Full, Pow-Hyb and Mat-Hyb. Each method has 25 normalized maximum absolute error values from repeat experiments permuting the columns of the base design.]

[Figure B.5: Normalized maximum absolute error for the borehole function and a 200-point mLHD base design. Each method has 25 normalized maximum absolute error values from repeat experiments permuting the columns of the base design.]

[Figure B.6: Normalized maximum absolute error for the PTW model with an mLHD base design. Each method has 25 normalized maximum absolute error values from repeat experiments permuting the columns of the base design.]

[Figure B.7: Normalized maximum absolute error with d = 10 inputs and output simulated from a GP with SqExp correlation. The boxplots show the results from 25 random realizations of the GP.]

[Figure B.8: Normalized maximum absolute error with d = 10 inputs and output simulated from a GP with PowExp correlation and all αj = 1.8. The boxplots show the results from 25 random realizations of the GP.]

[Figure B.9: Normalized maximum absolute error with d = 10 inputs and output simulated from a GP with Matérn correlation and all νj = 1. The boxplots show the results from 25 random realizations of the GP.]

B.2 Marginal Posterior Distribution of the Correlation Parameters

Here we derive the marginal posterior distribution of θB, the correlation parameters treated in a fully Bayesian way and hence sampled by MCMC. Any remaining correlation parameters are estimated by empirical Bayes, and the expressions below are conditional on their plugged-in MLEs.
We also take priors \(\pi(\beta) \propto 1\) and \(\pi(\sigma^2) = \mathrm{IG}(\phi_1, \phi_2)\).

We need to integrate out \(\beta\) and \(\sigma^2\) from the joint posterior of \(\beta\), \(\sigma^2\), and \(\theta_B\):

\[
\begin{aligned}
\pi(\theta_B \mid \mathbf{y})
&\propto \int\!\!\int \pi(\beta)\,\pi(\sigma^2)\,\pi(\theta_B)\,L(\mathbf{y} \mid \beta, \sigma^2, \theta_B)\,d\beta\,d\sigma^2 \\
&\propto \int\!\!\int \pi(\theta_B)\,\frac{\phi_2^{\phi_1}}{\Gamma(\phi_1)}\,(\sigma^2)^{-\phi_1 - 1}\exp\left(-\frac{\phi_2}{\sigma^2}\right)
\frac{1}{(2\pi\sigma^2)^{n/2}\,|\mathbf{R}|^{1/2}}\exp\left(-\frac{(\mathbf{y} - \mathbf{1}\beta)^{T}\mathbf{R}^{-1}(\mathbf{y} - \mathbf{1}\beta)}{2\sigma^2}\right) d\beta\,d\sigma^2 \\
&= \frac{\pi(\theta_B)}{|\mathbf{R}|^{1/2}}\int \frac{\phi_2^{\phi_1}}{\Gamma(\phi_1)}\,(\sigma^2)^{-\phi_1 - 1}\exp\left(-\frac{\phi_2}{\sigma^2}\right)
\frac{1}{(2\pi\sigma^2)^{n/2}}\int \exp\left(-\frac{(\mathbf{y} - \mathbf{1}\beta)^{T}\mathbf{R}^{-1}(\mathbf{y} - \mathbf{1}\beta)}{2\sigma^2}\right) d\beta\,d\sigma^2. \qquad \text{(B.1)}
\end{aligned}
\]

First, we evaluate the integral with respect to \(\beta\). Rewrite the quadratic form in the integrand as

\[
\begin{aligned}
(\mathbf{y} - \mathbf{1}\beta)^{T}\mathbf{R}^{-1}(\mathbf{y} - \mathbf{1}\beta)
&= \bigl(\mathbf{y} - \mathbf{1}\hat\beta - \mathbf{1}(\beta - \hat\beta)\bigr)^{T}\mathbf{R}^{-1}\bigl(\mathbf{y} - \mathbf{1}\hat\beta - \mathbf{1}(\beta - \hat\beta)\bigr) \\
&= (\mathbf{y} - \mathbf{1}\hat\beta)^{T}\mathbf{R}^{-1}(\mathbf{y} - \mathbf{1}\hat\beta) + (\beta - \hat\beta)\,\mathbf{1}^{T}\mathbf{R}^{-1}\mathbf{1}\,(\beta - \hat\beta). \qquad \text{(B.2)}
\end{aligned}
\]

The cross-product term disappears because

\[
(\mathbf{y} - \mathbf{1}\hat\beta)^{T}\mathbf{R}^{-1}\mathbf{1}
= \mathbf{y}^{T}\mathbf{R}^{-1}\mathbf{1} - \hat\beta\,\mathbf{1}^{T}\mathbf{R}^{-1}\mathbf{1}
= \mathbf{y}^{T}\mathbf{R}^{-1}\mathbf{1} - (\mathbf{1}^{T}\mathbf{R}^{-1}\mathbf{1})^{-1}\mathbf{1}^{T}\mathbf{R}^{-1}\mathbf{y}\;\mathbf{1}^{T}\mathbf{R}^{-1}\mathbf{1} = 0
\]

after substituting \(\hat\beta\) from (1.6). Further simplification of (B.2) follows by noting that

\[
(\mathbf{y} - \mathbf{1}\hat\beta)^{T}\mathbf{R}^{-1}(\mathbf{y} - \mathbf{1}\hat\beta) = (n - 1)\,\hat\sigma^2_{\theta_B}.
\]

Also note that

\[
\int \frac{1}{\sqrt{2\pi}\,\bigl(\mathbf{1}^{T}\mathbf{R}^{-1}\mathbf{1}/\sigma^2\bigr)^{-1/2}}
\exp\left(-\frac{(\beta - \hat\beta)\,\mathbf{1}^{T}\mathbf{R}^{-1}\mathbf{1}\,(\beta - \hat\beta)}{2\sigma^2}\right) d\beta
\]

is the integral of the probability density of \(\beta \sim N\bigl(\hat\beta, (\mathbf{1}^{T}\mathbf{R}^{-1}\mathbf{1}/\sigma^2)^{-1}\bigr)\) and hence integrates to 1. Thus, in (B.1),

\[
\int \exp\left(-\frac{(\mathbf{y} - \mathbf{1}\beta)^{T}\mathbf{R}^{-1}(\mathbf{y} - \mathbf{1}\beta)}{2\sigma^2}\right) d\beta
\propto \exp\left(-\frac{(n - 1)\,\hat\sigma^2_{\theta_B}}{2\sigma^2}\right)\left(\frac{\mathbf{1}^{T}\mathbf{R}^{-1}\mathbf{1}}{\sigma^2}\right)^{-1/2}. \qquad \text{(B.3)}
\]

Substituting (B.3) into (B.1) leaves just the integral with respect to \(\sigma^2\):

\[
\begin{aligned}
\pi(\theta_B \mid \mathbf{y})
&\propto \frac{\pi(\theta_B)}{|\mathbf{R}|^{1/2}\,(\mathbf{1}^{T}\mathbf{R}^{-1}\mathbf{1})^{1/2}}
\int (\sigma^2)^{-\left(\phi_1 + \frac{n-1}{2} + 1\right)}\exp\left(-\frac{\phi_2 + \frac{n-1}{2}\hat\sigma^2_{\theta_B}}{\sigma^2}\right) d\sigma^2 \\
&= \frac{\pi(\theta_B)}{|\mathbf{R}|^{1/2}\,(\mathbf{1}^{T}\mathbf{R}^{-1}\mathbf{1})^{1/2}}\,
\frac{\Gamma\!\left(\phi_1 + \frac{n-1}{2}\right)}{\left(\phi_2 + \frac{n-1}{2}\hat\sigma^2_{\theta_B}\right)^{\phi_1 + \frac{n-1}{2}}},
\end{aligned}
\]

which is the posterior distribution of \(\theta_B\) given in (3.6), up to constants of proportionality. The last line follows since

\[
\int \frac{\left(\phi_2 + \frac{n-1}{2}\hat\sigma^2_{\theta_B}\right)^{\phi_1 + \frac{n-1}{2}}}{\Gamma\!\left(\phi_1 + \frac{n-1}{2}\right)}\,
(\sigma^2)^{-\left(\phi_1 + \frac{n-1}{2} + 1\right)}\exp\left(-\frac{\phi_2 + \frac{n-1}{2}\hat\sigma^2_{\theta_B}}{\sigma^2}\right) d\sigma^2
\]

is the integral of an inverse-gamma probability density for \(\sigma^2\) with parameters \(\phi_1 + \frac{n-1}{2}\) and \(\phi_2 + \frac{n-1}{2}\hat\sigma^2_{\theta_B}\).
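For illustration, the final expression can be evaluated on the log scale in R as sketched below; here R_mat denotes the n x n correlation matrix implied by θB, log_prior is log π(θB), and the Γ(φ1 + (n−1)/2) factor is dropped because it does not depend on θB. This is a minimal sketch of the formula just derived, not the implementation used in the thesis.

```r
# Log marginal posterior of theta_B, up to an additive constant,
# following the final expression above.
log_post_thetaB <- function(y, R_mat, phi1, phi2, log_prior = 0) {
  n <- length(y)
  U <- chol(R_mat)                                         # R = t(U) %*% U
  Rinv <- function(v) backsolve(U, forwardsolve(t(U), v))  # R^{-1} v
  one <- rep(1, n)
  Rinv1 <- Rinv(one)
  beta_hat <- sum(one * Rinv(y)) / sum(one * Rinv1)
  res <- y - beta_hat
  sigma2_hat <- sum(res * Rinv(res)) / (n - 1)   # sigma^2-hat_{theta_B}
  log_prior -
    sum(log(diag(U))) -                 # -(1/2) log |R|
    0.5 * log(sum(one * Rinv1)) -       # -(1/2) log (1' R^{-1} 1)
    (phi1 + (n - 1) / 2) * log(phi2 + (n - 1) * sigma2_hat / 2)
}
```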
B.3 Posterior Distributions and Acceptance Rates

We report posterior distributions for the correlation parameters and their acceptance rates for the Nilson-Kuusk example with a 100-point LHD to train the GP model. The number of MCMC iterations is 10,000, with the first 4,000 as burn-in, and the thinning parameter is set to 10.

For PowExp, Figures B.10 and B.11 show the empirical posterior density plots of the θj and αj for the PowExp-Full method.

[Figure B.10: Empirical posterior density plots of θ1, ..., θ5 for the Nilson-Kuusk model with the PowExp-Full method.]

[Figure B.11: Empirical posterior density plots of α1, ..., α5 for the Nilson-Kuusk model with the PowExp-Full method.]

The Metropolis-Hastings algorithm works on transformed scales, but the parameters are transformed back to the θj and αj scales in the plots. The method proposed by Sheather and Jones (1991) is used to select the bandwidth of a Gaussian kernel density estimator; it is implemented by the width.SJ function in the MASS library in R.

Similarly, Figure B.12 shows the empirical posterior density plots of the θj for the PowExp-Hybrid method.

[Figure B.12: Empirical posterior density plots of θ1, ..., θ5 for the Nilson-Kuusk model with the PowExp-Hybrid method.]

These posterior distributions may be compared with maximum likelihood estimates (also on the θj and αj scales) from empirical Bayes, as shown in Table B.1.

θ̂1    θ̂2    θ̂3    θ̂4    θ̂5    α̂1    α̂2    α̂3    α̂4    α̂5
0.16   0.35   0.02   0.04   8.89   1.26   2.00   1.90   2.00   1.89

Table B.1: Maximum likelihood estimates of the θj and αj from PowExp-Emp for the Nilson-Kuusk model.

Figure B.13 shows the empirical posterior density plots of the θj for the Matérn-Hybrid method.

[Figure B.13: Empirical posterior density plots of θ1, ..., θ5 for the Nilson-Kuusk model with the Matérn-Hybrid method.]
These posterior distributions may be compared with maximum likelihood estimates of the θj and the smoothness parameters νj from empirical Bayes, as shown in Table B.2.

θ̂1    θ̂2    θ̂3    θ̂4    θ̂5    ν̂1    ν̂2    ν̂3    ν̂4    ν̂5
0.11   0.38   0.02   0.50   8.37   0      →∞     →∞     2      2

Table B.2: Maximum likelihood estimates of the θj and νj from empirical Bayes and the Matérn correlation for the Nilson-Kuusk model.

The Metropolis-Hastings acceptance rates for the θj (PowExp-Full, PowExp-Hybrid, and Matérn-Hybrid) are reported in Table B.3, and for the αj (PowExp-Full) in Table B.4.

Method           θ1       θ2       θ3       θ4       θ5
PowExp-Full      30.0%    42.6%    45.1%    43.4%    31.8%
PowExp-Hybrid    41.8%    45.6%    47.6%    43.3%    41.3%
Matérn-Hybrid    18.9%    27.2%    32.0%    23.4%    31.6%

Table B.3: Metropolis-Hastings acceptance rates for the θj for the Nilson-Kuusk model.

Method           α1       α2       α3       α4       α5
PowExp-Full      38.4%    49.4%    45.8%    43.9%    33.2%

Table B.4: Metropolis-Hastings acceptance rates for the αj for the Nilson-Kuusk model.
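The acceptance rates above come from component-at-a-time random-walk Metropolis-Hastings updates; a generic sketch of one such sweep in R is given below. Here log_post stands for the log marginal posterior on the transformed scale (including any Jacobian), and the proposal standard deviations sd_prop are tuning constants; neither name nor value is taken from the thesis.

```r
# One random-walk Metropolis-Hastings sweep over the components of a
# parameter vector; the acceptance indicators allow monitoring of
# per-component acceptance rates such as those in Tables B.3 and B.4.
mh_sweep <- function(state, log_post, sd_prop) {
  lp <- log_post(state)
  accepted <- logical(length(state))
  for (j in seq_along(state)) {
    prop <- state
    prop[j] <- state[j] + rnorm(1, sd = sd_prop[j])  # symmetric proposal
    lp_prop <- log_post(prop)
    if (log(runif(1)) < lp_prop - lp) {              # accept/reject
      state <- prop
      lp <- lp_prop
      accepted[j] <- TRUE
    }
  }
  list(state = state, accepted = accepted)
}
```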
Appendix C: Supplemental Materials for Chapter 4

C.1 Results of Normalized Max Absolute Errors for the Main Comparison

[Figure C.1: Borehole function: normalized maximum absolute error of prediction, e_max,ho, for all 8 base designs (Random, rLHD, mLHD, trLHD, Sobol, Uniform, OA-LHD, m2LHD), n = 27. For each base design, 25 random permutations of its columns give 25 values of e_max,ho, displayed as a box plot. Boxplots are joined by their means.]

[Figure C.2: PTW function: normalized maximum absolute error of prediction, e_max,ho; n = 128, 256 for rLHD, Sobol, mLHD and trLHD designs. For each base design, 25 random permutations of its columns give 25 values of e_max,ho, displayed as a box plot. Boxplots are joined by their means.]

[Figure C.3: Weighted Franke's function: normalized maximum absolute error of prediction, e_max,ho; n = 80, 200 for rLHD, Sobol, mLHD and trLHD designs. For each base design, 25 random permutations of its columns give 25 values of e_max,ho, displayed as a box plot. Boxplots are joined by their means.]

[Figure C.4: Corner-peak function: normalized maximum absolute error of prediction, e_max,ho; n = 50, 100, 250 for rLHD, Sobol, mLHD and trLHD designs. For each base design, 25 random permutations of its columns give 25 values of e_max,ho, displayed as a box plot. Boxplots are joined by their means.]

[Figure C.5: Corner-peak function analyzed on the logarithmic scale: normalized maximum absolute error of prediction, e_max,ho; n = 50, 100 for rLHD, Sobol, mLHD and trLHD designs. For each base design, 25 random permutations of its columns give 25 values of e_max,ho, displayed as a box plot. Boxplots are joined by their means.]

C.2 Results of Normalized Max Absolute Errors for the Comparison of Projection Designs

[Figure C.6: Borehole function: normalized maximum absolute error of prediction, e_max,ho; n = 32, 64 for m2LHD, OA-based LHD designs and mLHD. The two OA-based LHDs are n = 32 (index 2, level 4, strength 2) and n = 64 (index 1, level 8, strength 2). For each base design, 25 random permutations of its columns give 25 values of e_max,ho, displayed as a box plot. Boxplots are joined by their means.]

[Figure C.7: PTW function: normalized maximum absolute error of prediction, e_max,ho; n = 128, 256 for m2LHD, OA-based LHD designs and mLHD. The two OA-based LHDs are n = 128 (index 2, level 8, strength 2) and n = 256 (index 1, level 16, strength 2). For each base design, 25 random permutations of its columns give 25 values of e_max,ho, displayed as a box plot. Boxplots are joined by their means.]

[Figure C.8: Weighted Franke's function: normalized maximum absolute error of prediction, e_max,ho; n = 128, 256, 512 for m2LHD, OA-based LHD designs and mLHD. The three OA-based LHDs are n = 128 (index 2, level 8, strength 2), n = 256 (index 1, level 16, strength 2) and n = 512 (index 2, level 16, strength 2). For each base design, 25 random permutations of its columns give 25 values of e_max,ho, displayed as a box plot. Boxplots are joined by their means.]

[Figure C.9: Corner-peak function analyzed on the logarithmic scale: normalized maximum absolute error of prediction, e_max,ho; n = 64, 128 for m2LHD, OA-based LHD designs and mLHD (the legend also shows trLHD). The two OA-based LHDs are n = 64 (index 4, level 4, strength 2) and n = 128 (index 2, level 8, strength 2). For each base design, 25 random permutations of its columns give 25 values of e_max,ho, displayed as a box plot. Boxplots are joined by their means.]
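For reference, several of the design classes compared in these figures can be generated with standard R packages; the sketch below uses the lhs and randtoolbox packages. This is an illustrative choice: the thesis does not state which implementations were used, and classes such as trLHD and m2LHD are not covered by this sketch.

```r
library(lhs)          # Latin hypercube designs
library(randtoolbox)  # quasi-random sequences

n <- 27; d <- 8                     # e.g., the 27-run borehole designs
rLHD <- randomLHS(n, d)             # random Latin hypercube design
mLHD <- maximinLHS(n, d)            # approximate maximin LHD
sob  <- sobol(n, dim = d)           # first n points of a Sobol sequence
unif <- matrix(runif(n * d), n, d)  # simple random (uniform) design
```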
C.3 Results of Normalized Maximum Absolute Errors for the Discussion Section

[Figure C.10: Corner-peak function analyzed on the logarithmic scale with a full linear GP model: normalized maximum absolute error of prediction, e_max,ho; n = 50, 100 for rLHD, Sobol, mLHD and trLHD designs. For each base design, 25 random permutations of its columns give 25 values of e_max,ho, displayed as a box plot and joined by the means. The vertical axis limits are kept the same as in Figure C.5 to facilitate direct comparison.]
