ESTIMATING DESIGN VALUES FOR EXTREME EVENTS

by

DOUGLAS FREDERICK SPARKS

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES (Department of Civil Engineering)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

April 1985

© DOUGLAS FREDERICK SPARKS, 1985

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Civil Engineering
The University of British Columbia
1956 Main Mall
Vancouver, Canada V6T 1Y3

Date: April 24, 1985

ABSTRACT

Extreme event populations are encountered in all domains of civil engineering. The classical and Bayesian statistical approaches for describing these populations are described and compared. Bayesian frameworks applied to such populations are reviewed and critiqued. The present Bayesian framework is explained from both theoretical and computational points of view.
Engineering judgement and regional analyses can be used to yield a distribution on a parameter set describing a population of extremes. Extraordinary order events, as well as known data, can be used to update the prior parameter distribution through Bayes' theorem. The resulting posterior distribution is used to form a compound distribution, the basis for estimation. Quantile distributions are developed, as are linear transformations of the parameters. Examples from several domains of civil engineering illustrate the flexibility of the computer program which implements the present method. Suggestions are made for further research.

TABLE OF CONTENTS

ABSTRACT ... ii
LIST OF TABLES ... v
LIST OF FIGURES ... v
ACKNOWLEDGEMENTS ... vi
1. REVIEW OF CLASSICAL STATISTICAL ANALYSIS OF EXTREME EVENTS AND THE BAYESIAN FRAMEWORK ... 1
2. LITERATURE REVIEW ... 10
3. PRESENT METHOD ... 26
   3.0 Introduction ... 26
   3.1 Prior Probabilities ... 26
   3.2 Likelihood Functions ... 27
   3.3 Posterior Probabilities ... 30
   3.4 Estimation ... 31
       3.4.1 Expected Return Periods for a Given Event ... 31
       3.4.2 Event for a Given Return Period ... 32
       3.4.3 Quantile Distribution ... 32
   3.5 Refitting ... 33
   3.6 Linear Transformations of the Parameters ... 34
   3.7 Conclusion ... 34
4. EXAMPLE
   4.0 Introduction ... 35
   4.1 Precipitation ... 35
   4.2 Floods ... 36
   4.3 Strength of Materials - Wood ... 37
   4.4 Conclusion ... 38
5. CONCLUSIONS
   5.0 Introduction ... 41
   5.1 Initial Intentions ... 41
   5.2 Further Research ... 42
   5.3 Computer Pitfalls ... 43
       5.3.1 The Output Pitfalls ... 43
       5.3.2 The Language Pitfalls ... 43
       5.3.3 The "Unfriendliness" Pitfall ... 44
   5.4 Summary ... 44
REFERENCES ... 45
APPENDIX 1 ... 50
APPENDIX 2 ... 56

LIST OF TABLES

TABLE I: Summary of Results: Precipitation ... 39
TABLE II: Summary of Results: Floods ... 40

LIST OF FIGURES

FIGURE 1: Frequency Curve showing Compound Distribution ...
47
FIGURE 2: Distribution of Return Period for a Given Event ... 47
FIGURE 3: Distribution of Extreme Events for a Given Period ... 48
FIGURE 4: Discretization of Normal Distributions ... 48
FIGURE 5: Bayesian Frequency Analysis ... 49

ACKNOWLEDGEMENT

I would like to thank Dr. S.O. Russell for his direction and help in the preparation of the computer program and this thesis. I would also like to express my sincere appreciation for NSERC's financial support during its preparation, for the CEGEP de Hauterive, which loaned me computing facilities, and for Srba Krunic, who typed and prepared the manuscript in a very limited period of time. I also wish to thank Denis Dufour for the drawings of the figures. Finally, I would like to thank Sylvie, who gave up evenings and weekends and continually encouraged me while I completed the work.

CHAPTER 1: REVIEW OF CLASSICAL STATISTICAL ANALYSIS OF EXTREME EVENTS AND THE BAYESIAN FRAMEWORK

EXTREME EVENTS IN CIVIL ENGINEERING

Extreme events and their analysis are a part of all disciplines of civil engineering. When designing structures, extreme loads such as earthquakes, extreme temperature gradients and the statistical nature of loading all lend themselves to statistics of extremes. In environmental engineering, peak flows at a water-treatment plant or peak concentrations of effluent constituents can be considered as populations of extreme events. In transportation engineering, peak traffic flows must be evaluated. Municipal engineers must design for periods of extreme demand on the different services provided by the municipality. Analyses of the strength of materials are often based on the weakest specimen in a sample, again an extreme. Finally, in the water resource domain we find many examples of hydrologic extremes: droughts, floods and extreme climatic events.
In this chapter we will briefly review the classical approach to these populations and develop a framework for a Bayesian approach.

CLASSICAL APPROACH

Data

The first step in the classical statistical approach is collecting or selecting the data to be analysed. Data, in this context, consists of a series of observations or measurements. The data must satisfy the following criteria in order to be analysed:

1. It must be random. There can be no systematic distortion of results.
2. The data must be independent. Each data piece must correspond to an event that does not "depend" on previous events.
3. The data must be homogeneous (i.e. come from the same population). This is of critical importance in the design of experiments and data collection programs, where two phenomena can easily be confused (e.g. the "hockey stick" effect of snowmelt and rainfall in peak flows in hydrology).
4. If the data is a time series it must be stationary. If the mean does not change over time the data is "first-order stationary". If the standard deviation does not change over time the data is "second-order stationary". Deviations from stationarity are either trends or jumps.

Tests for these four criteria can be found in most elementary statistics texts. Yet another requirement for the data to be analysed is that it must be real population values. Other information, such as the largest value in a sample or series where the other values are unknown, cannot be used except to help in the choice of an appropriate distribution. This leads us to the second step in the classical approach: choosing a probability distribution.

The Choice of a Distribution

This part of the analysis consists of choosing a probability distribution which can best describe the population concerned.
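As a rough illustration of the stationarity criterion only (a formal test from a statistics text should be used in practice), a split-sample comparison of means can flag a trend or jump in an annual series; the series and the ±2 threshold below are hypothetical:

```python
import math

def split_sample_statistic(series):
    """Crude first-order stationarity check: compare the means of the two
    halves of the series, scaled by the pooled standard error of the
    half-sample means.  A value far outside roughly +/-2 suggests a trend
    or jump in the mean."""
    n = len(series)
    a, b = series[: n // 2], series[n - n // 2 :]
    mean = lambda x: sum(x) / len(x)
    var = lambda x: sum((v - mean(x)) ** 2 for v in x) / (len(x) - 1)
    se = math.sqrt(var(a) / len(a) + var(b) / len(b))
    return (mean(b) - mean(a)) / se

stable = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8, 10.4, 9.9]   # no shift
jumped = [10.0, 11.0, 9.5, 10.5, 20.2, 19.8, 20.4, 19.9] # upward jump
```

For the stable series the statistic stays small, while the jump drives it far above 2, signalling a deviation from (first-order) stationarity.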
The criteria for this choice may be derived from norms published by government agencies or flexibility (two arguments used for the Log-Pearson III in flood frequency analysis), from an understanding of the mechanics of the phenomenon under study (arguments used in the choice of the Weibull or Lognormal distribution for the strength of materials or structures), or finally from convenience (the best argument for the Normal distribution). Often several distributions will be tried and a final choice made after it is seen how well they describe the population. We must then proceed to estimate the parameters of the distribution(s) chosen.

ESTIMATING PARAMETERS

Estimating parameters is the crux of the process; as the inferences derived from these estimates are the outcome of the analysis, it is the most important step. Let us first define several terms:

Parameter: a fixed value, characteristic of a population.
Estimator: a value which we estimate represents the value of the parameter.
Statistic: a value derived from a sample, used to find the estimator.

For further discussion, particularly of the properties of estimators, see Benjamin and Cornell (1970) (Ch. 4). Estimation is the process of getting as close as possible, using observations of a population, to the set of real parameters. There are three estimation or fitting techniques in popular use in the domain of extreme analysis. The oldest and most intuitive is the graphical method. While the least statistical, it remains an important part of any analysis, helping to locate anomalies. A plotting formula is used to locate the data points on appropriate probability paper. Then a "best fit" line is drawn (often by eye).
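As a minimal sketch of a plotting formula, the common Weibull formula p = m/(n+1) assigns each ranked observation an exceedance probability and return period (the flood peaks are hypothetical):

```python
def plotting_positions(data):
    """Rank a series in descending order and assign each value an
    exceedance probability by the Weibull formula p = m/(n+1), where m is
    the rank.  Returns (value, p, return period T = 1/p) tuples ready to
    plot on probability paper."""
    ranked = sorted(data, reverse=True)
    n = len(ranked)
    return [(x, m / (n + 1), (n + 1) / m) for m, x in enumerate(ranked, start=1)]

# Hypothetical annual flood peaks (m^3/s):
positions = plotting_positions([120.0, 85.0, 240.0, 60.0])
```

The largest of the four observations is assigned p = 1/5, i.e. a 5-year return period, independent of any assumed probability distribution.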
Obvious outliers, extreme maxima which may have exceedance probabilities far greater or smaller than indicated by the plotting formula or sample size, can then be located.

A more popular method is the method of moments. Here the population distribution is described by its first moment about the origin (the mean) and its central moments (standard deviation, skew, kurtosis, etc.). The statistics of the data sample are then used as estimators of those parameters.

The third technique is the maximum likelihood method. We will summarize it here; a more complete explanation, along with the estimators' properties, is given by Benjamin and Cornell (1970). The criterion for choosing parameters now becomes optimality. Given a sample x and a population probability density function f(x; θ) which depends on the parameter(s) θ, we ask: what estimator θ̂ for parameter θ has the maximum likelihood of generating the sample x? To find this estimator we first derive a likelihood function for the sample. We then differentiate it, set the derivative equal to zero and solve, ensuring along the way that we have found a maximum. This method has become the most popular because of the various desirable characteristics of the estimator.

Quantile Estimation

Quantiles in extreme value distributions are calculated directly using the Cumulative Distribution Function (CDF) and the desired exceedance probability. This yields one extreme value corresponding to the quantile. Confidence intervals can be obtained only through invoking the central limit theorem.

Inadequacies

Albeit brief, this summary can help us to identify two categories of inadequacies with the classical approach: theoretical and practical.
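As a sketch of the method of moments and of quantile estimation from the CDF, assume a Gumbel (Extreme Value Type I) population; the distribution choice and the sample are purely illustrative:

```python
import math

EULER_GAMMA = 0.5772156649

def gumbel_fit_moments(sample):
    """Method-of-moments fit of a Gumbel distribution: the sample mean and
    standard deviation are equated to u + gamma*b and pi*b/sqrt(6), and
    the resulting location u and scale b are returned."""
    n = len(sample)
    mean = sum(sample) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    b = std * math.sqrt(6.0) / math.pi
    u = mean - EULER_GAMMA * b
    return u, b

def gumbel_quantile(u, b, T):
    """Quantile for return period T, obtained directly from the inverted
    CDF F(x) = exp(-exp(-(x - u)/b)) evaluated at F = 1 - 1/T."""
    return u - b * math.log(-math.log(1.0 - 1.0 / T))

peaks = [52.0, 61.0, 48.0, 75.0, 58.0, 66.0, 70.0, 55.0]  # hypothetical
u, b = gumbel_fit_moments(peaks)
q100 = gumbel_quantile(u, b, 100.0)   # 100-year design value
```

As the text notes, this yields a single value per quantile; any confidence statement about it requires a further (central limit theorem) argument.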
In the theoretical category, one problem is the classical approach's view of uncertainty. Of the three main types of uncertainty described by Benjamin and Cornell (1970), only natural uncertainty is dealt with by classical statistics. Another is the validity of the central limit theorem: when it is invoked to find confidence intervals, those intervals cannot be used to make a probability statement about the quantile.

The practical inadequacies are numerous. Often we may not have enough data to use the classical approach with any accuracy. Or we may have data which does not fit into the classical framework (e.g. extremes in a discontinuous series, or a regression analysis on climatic and physiographic variables affecting the distribution parameters). Another frequent situation is the inability of an engineer to incorporate a good subjective understanding into a statistical process. These inadequacies are the motivating force behind new approaches to statistical analysis. The most common formalized approach is the Bayesian statistical framework.

BAYESIAN STATISTICS

Bayesian statistics rests on the idea that parameters must be considered as random variables, having their own probability distributions. The Bayesian process of estimation consists of uniting all the information we can find into a parameter distribution and then using a compound distribution to describe the original random variable. Bayes' theorem is used to incorporate the different pieces of information into the parameter distribution. By looking at its elements we can see how to apply Bayesian statistics to the extreme value problem.
Bayes' Theorem: The Elements

Let us assume that we can express all we know about some parameter θ, prior to seeing any data, by the distribution g(θ). Let us also assume that the parameter θ describes a population X through the probability distribution f(x|θ). If we then obtain new information or data z about population X, we can revise what we know about θ. We do this through Bayes' rule, which says, in the continuous case,

    g(θ|z) = g(θ) f(z|θ) / ∫ g(θ) f(z|θ) dθ

or in the discrete case

    p(θᵢ|z) = p(θᵢ) f(z|θᵢ) / Σⱼ p(θⱼ) f(z|θⱼ)

These expressions have four parts. The first part is the "prior distribution", described by g(θ) or the p(θᵢ)'s. The second element is the relative likelihood function f(z|θ) or f(z|θᵢ). The third part is the normalizing factor ∫ g(θ) f(z|θ) dθ or Σⱼ p(θⱼ) f(z|θⱼ). The left-hand side of the equation, g(θ|z) or p(θᵢ|z), is the "posterior distribution of θ", the fourth element, which is then used in the compound distribution. In words, then, Bayes' theorem says that values of θ which have a greater likelihood of generating the information z are worth more, and their posterior probabilities will be increased, while those values of θ which are not likely to generate z are worth less and will have reduced posterior probabilities.

PRIOR PROBABILITY DISTRIBUTIONS

The prior distribution summarizes our understanding of the parameter before seeing any measured data. In classical statistics we presume to know nothing. In cases where we have little or no data, the only statistical resource available is a regression analysis; "engineering judgement" is then used to either permit or overrule its conclusions. In a Bayesian context this engineering experience can be used, in concert with regional analyses, to construct a prior distribution.
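The discrete form of the rule can be sketched in a few lines; the parameter values and probabilities below are hypothetical, with a diffuse (equal) prior for simplicity:

```python
def bayes_update(prior, likelihood):
    """Discrete Bayes' rule: the posterior p(theta_i | z) is proportional
    to p(theta_i) * f(z | theta_i); dividing by the normalizing factor
    (the sum of these products) makes the posterior sum to one."""
    unnormalized = {t: p * likelihood[t] for t, p in prior.items()}
    total = sum(unnormalized.values())
    return {t: u / total for t, u in unnormalized.items()}

# Three candidate values of a parameter with equal prior weight; the
# data z is assumed twice as likely under theta = 100 as under the others:
prior = {80.0: 1 / 3, 100.0: 1 / 3, 120.0: 1 / 3}
likelihood = {80.0: 0.1, 100.0: 0.2, 120.0: 0.1}
posterior = bayes_update(prior, likelihood)
```

The value most likely to have generated the data ends up with half the posterior weight, illustrating the "worth more / worth less" reweighting described above.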
In cases where we would like to let the data speak for itself, "diffuse" priors are used. A diffuse prior gives an equal probability to all possible parameter values. The posterior distribution is then proportional to the sample likelihood function. It should be noted that a diffuse prior for one set of parameters will not yield the same results as a diffuse prior on some transformation of this set of parameters. Thus diffuse priors should be used with discretion. Usually we have some information which we can use to construct an informative prior distribution.

LIKELIHOOD FUNCTIONS

The likelihood function used in Bayes' rule is the same one used in the maximum likelihood technique. It is sometimes called the relative likelihood because the integral or sum of the likelihoods is not equal to unity. As this normalization takes place within Bayes' rule, a "relative" likelihood is sufficient. As described for the maximum likelihood method, it is the joint probability distribution of the sample x over the set of parameters, the joint probability being the product of the likelihoods of each sample specimen.

The analyst must first determine the form of the underlying distribution. Underlying distributions describe the population under study, and thus the same choice is presented as in the classical analysis, the criteria for the choice also remaining best fit, the most conservative, governmental norms, or physical understanding of the process. The underlying distribution is then used to formulate probability statements which are dictated by the data. Thus data not included in the classical analysis can be incorporated.
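For instance, the relative likelihood that k of n specimens fall below a threshold a can be written directly from the underlying CDF. The sketch below assumes an exponential underlying distribution purely for illustration:

```python
import math

def censored_likelihood(F_a, k, n):
    """Relative likelihood that k of n independent specimens fall below a
    threshold whose CDF value is F_a = P(x < a | theta).  The binomial
    coefficient is constant in theta, so it can be dropped from a
    *relative* likelihood."""
    return F_a ** k * (1.0 - F_a) ** (n - k)

# Sweep a hypothetical parameter: an exponential underlying distribution
# with mean m has F(a | m) = 1 - exp(-a/m).  Here a = 10, k = 5, n = 40:
a = 10.0
likes = {m: censored_likelihood(1.0 - math.exp(-a / m), 5, 40)
         for m in (40.0, 75.0, 110.0)}
```

The likelihood is largest for the mean that puts roughly k/n of the probability mass below the threshold, so Bayes' rule shifts posterior weight toward such parameter values.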
For example, if we know that five specimens x in a sample of 40 tested below some value a, then we can calculate the likelihood of this event z with the underlying distribution:

    f(z|θ) = (P(x < a | θ))⁵ × (P(x > a | θ))³⁵

The product of these functions (the underlying distribution and the posterior parameter distribution) forms the compound distribution of the extreme event population, which now has probability densities associated with different results. We see this illustrated in Figure 1. Much like a topographic map, the contours of probability density indicate regions of greater confidence. The compound distribution can yield two types of results, as indicated by Sections A-A and B-B of this figure. Section A-A yields the distribution of a return period for a given event, as illustrated in Figure 2. This result can be used to formulate probability statements about the flow, or the expected return period for this flow can be calculated. Section B-B shows the distribution of a quantile. Probability statements can be made concerning the flows corresponding to this return period or, again, the expectation can be taken to calculate the expected value corresponding to this return period.

CONCLUSION

In this chapter we have looked at two theoretical frameworks for describing populations of extreme events. The classical approach considers parameters to have fixed values in the real world and estimation to be the process of best approximating these values. Bayesian statistics treats the parameters as random variables and uses Bayes' theorem to introduce different types of information into a distribution describing these parameters. A compound distribution is then used to make inferences about the population.
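That last step can be sketched as a posterior-weighted mixture; the Gumbel form, the fixed scale, and the posterior weights below are all assumed purely for illustration:

```python
import math

def compound_exceedance(x, posterior, cdf):
    """Exceedance probability of level x under the compound distribution:
    the underlying distribution is averaged over a discretized posterior
    parameter distribution, mapping parameter value -> probability."""
    return sum(p * (1.0 - cdf(x, theta)) for theta, p in posterior.items())

# Assumed Gumbel underlying CDF with uncertain location u (scale fixed at 20):
gumbel_cdf = lambda x, u: math.exp(-math.exp(-(x - u) / 20.0))
posterior = {80.0: 0.25, 100.0: 0.50, 120.0: 0.25}

p_exc = compound_exceedance(200.0, posterior, gumbel_cdf)
T = 1.0 / p_exc   # expected return period of the 200-unit event
```

The compound exceedance probability falls between the values given by the most and least optimistic parameter choices, reflecting the parameter uncertainty that a single fitted distribution would hide.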
To get a better understanding of the variety of applications of the Bayesian approach, let us consider several that have been proposed for the extreme event problem in hydrology, each suggesting ways to overcome one or more of the weaknesses associated with the classical approach.

CHAPTER 2: LITERATURE REVIEW

Bayesian statistics is still in its early stages of application, and the literature concerning applications to extreme events is relatively rare. The six articles reviewed here attempt to address some of the problems discerned with the classical analysis.

Plotting Positions of Annual Flood Extremes Considering Extraordinary Values (Zhang, 1982)

Although this article does not deal with Bayesian analysis, it does suggest an interesting way of incorporating extraordinary values into a fitting process, and as this is one of the properties of the present Bayesian technique it is worthy of examination. The two assumptions necessary for the analysis proposed by Zhang (1982) are the independence and homogeneity of the data, both basic to any analysis. The author, using order statistics, develops optimal plotting formulae for the graphical method of frequency analysis. These plotting formulae assign a return period to an extreme event in a data series. Because of the generality of the basic assumptions, the resultant formulae are independent of the probability distributions of the extreme events. This explains the popularity of the Weibull plotting formula, which results as the special case for continuous samples. For analyses including extraordinary values, a more complicated formula obtains. The resulting empirical or graphical distribution has values plotted in the extreme regions that usually (with sample sizes around 30) remain blank, where we would like to see how the theoretical curves fit.
It is interesting to note the effects of the method. As Zhang observes, the points plotted using the continuous formulae tend to translate along the probability axis when extraordinary values are incorporated into the analysis. Another interesting feature is that the theoretical curve shown is quite conservative in the region that includes extremely large floods; order statistics suggest riskier results. The graphical method has become anachronistic as better computing facilities have enabled statisticians and engineers to use more complicated mathematical models. It nevertheless retains a certain value as a visual check, enabling us to identify outliers and get a feeling for goodness of fit. It is certainly enhanced by the inclusion of extraordinary extreme events, but remains just a check.

Bayesian Inference and Decision Making for Extreme Hydrologic Events (Wood and Iturbe, 1975)

Wood and Iturbe (1975) address the question of uncertainty in the design of structures subject to extreme hydrologic events from a different angle: that of a decision-maker. Fortunately they divide the problem into two parts: inferences about future floods, and decisions concerning preferences and engineering design variables. The former covers the subject of interest in the present thesis. They propose that the Bayesian framework is a better one for making these inferences than the classical one because it realistically treats different types of uncertainty (p. 534) and because it can pool all available information (p. 533). We shall review their approach to see how it does this. They begin by discerning three types of uncertainty: natural, statistical and model uncertainty.
To account for natural uncertainty they choose three different underlying distributions, each describing the annual flood process: the Normal, Lognormal and Exceedance models (the latter uses a Poisson distribution on individual flow peaks exceeding a certain value). They then choose a "conjugate prior" distribution, to account for statistical or parameter uncertainty, to mathematically fit each underlying distribution so that the integral form of Bayes' rule can be evaluated directly, and the posterior distribution's parameters calculated from the prior's parameters and the sample statistics. For the Normal model this yields a gamma distribution as a conjugate prior distribution, having four parameters. The Lognormal model uses the transformation z = ln q to derive the conjugate prior and posterior distributions, whose forms closely resemble those of the Normal model, again with four parameters. The Exceedance model has a gamma-1 prior with two parameters. This implies that the form of the distribution describing parameter uncertainty is limited by the choice of the underlying distribution, and that each model has a different way of looking at prior information.

There are two practical difficulties associated with this approach, both concerning the parameters of the models. First of all, there are often several parameters, and it may be difficult, given that there is often little prior information, to assign values to all of them. Secondly, for the Lognormal and Exceedance models, the user must be familiar with the process and the distributions to be able to give realistic estimates of the parameters.
The regression models used by the authors for the Normal and Lognormal processes overcome these problems, but in Canada, where gauge stations may be farther apart and record lengths shorter, it could be difficult to justify the value of a third or fourth parameter. The authors also use different prior information for the different models. This would seem to jeopardize the validity of the ensuing model comparison, which could reflect the different prior information more than the "truth" of any of the models chosen. It is also difficult to compare the models because the graphs of the results are different; each model has its own graph paper and corresponding "classical analysis".

There are two significant advantages to the conjugate prior approach. The first is its rigour. Because the mathematics yields an antiderivative, the result is a rigorous Bayesian posterior distribution of the extreme events. Inasmuch as the form of the prior distribution reflects the true opinion of the analyst, and the underlying distributions have the full confidence of the analyst, the method combines the information in a way that leaves no errors due to approximations, numerical methods or graphical "goodness of fit" methods. The second advantage is that once the mathematics has been done, the calculations are quite simple. As there are texts in which the mathematics has been done for many conjugate pairs (Raiffa and Schlaifer, 1968), this method is not a difficult one to apply.

Let us turn now to the conclusions obtained by the authors. The main conclusion the authors reach is that, in general, accounting for uncertainty in the parameters will lead to more extreme estimates for a given exceedance probability. This is only true if the same information is given to the classical and Bayesian analyses.
While this trend to conservatism is not unsafe, it may be costly, and thus may push engineers away from the use of Bayesian methods unless we can incorporate more information.

I will include only one remark concerning the Decision Analysis portion of this article. It is an excellent exposition of the techniques used to arrive at an optimal design of a system under three possible population descriptions. It is a great help in decision-making to be able to compare the consequences of choosing one of these underlying distributions. It should be noted, though, that this part of the article is in no way linked to Bayesian statistics, and that the results of a classical analysis could also have been used to rescale the damages. All the reasons for using this procedure to determine design capacities are nevertheless well-founded and should be considered.

Combining Site-Specific and Regional Information: An Empirical Bayes Approach (Kuczera, 1982)

In this article Kuczera (1982) proposes what he calls an Empirical Bayes (EB) method for inferring hydrologic extreme events. The purpose is to combine regionalization model results and site-specific data to estimate some hydrologic quantity for a specific basin. Kuczera compares two "Bayesian" approaches with the site-specific maximum likelihood method. The method of moments is used throughout, in the sense that all hydrologic quantiles are calculated from the moments. Ordinary least squares is used in the regionalization model to find the moments at the site considered. Because estimates of the mean are generally quite stable (an affirmation made without much support in cases where data is lacking) when compared with the effects of errors in the standard deviation, he considers only the distribution of the standard deviation (S) in this study.
The first approach he calls Empirical Bayes with Ordinary Least Squares (EBO). The method assumes the superpopulation of S is inverted gamma, based on the properties of that distribution: unimodal, skewed and bounded at zero. The author then invokes Bayes' theorem through the method of conjugate priors, to yield posterior distributions. It should be noted that his assignment of a Lognormal distribution to the annual flood maxima is a necessary step in this process, as it is in the conjugate family of the inverted gamma distribution.

The second approach he calls the Linear Empirical Bayes estimator (LEBO), the difference being that he uses a different method for weighting the two information sources. Under the assumptions that the hydrologic quantile in question and the estimate for it obtained from the regionalization model both have well-behaved but unspecified distributions, and that the means and variances of each can be estimated (from the sample moments for the former and the method of ordinary least squares for the latter), we can then try to say something about the consequences of choosing an estimator from the pooled data. Citing Hartigan (1969), Kuczera states that under a quadratic loss function (i.e. loss proportional to the square of the error) the best linear estimator of the hydrologic quantile (g) is

    g = M + (1 + D/A)⁻¹ (g° − M)

where g° and D are the mean and variance of the quantile using the site data, and M and A are the mean and variance of the quantile found through the regionalization model.

To address the principal question concerning the relative efficiency of these two EB estimators, the author devises two experiments. The first is a sampling experiment.
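(Before turning to the experiments, the linear estimator quoted above can be sketched in a few lines; it amounts to weighting the site and regional estimates by their reciprocal variances. The numbers here are hypothetical.)

```python
def linear_eb_estimate(g_site, D, M, A):
    """Best linear estimator under quadratic loss, in the form quoted in
    the text: g = M + (1 + D/A)**-1 * (g_site - M).  The weight
    (1 + D/A)**-1 = A/(A + D) tends to 1 as the site variance D shrinks,
    so a long, precise site record dominates the regional estimate M."""
    w = 1.0 / (1.0 + D / A)
    return M + w * (g_site - M)

# Equal variances give the midpoint; a precise site record dominates:
balanced = linear_eb_estimate(100.0, 4.0, 80.0, 4.0)    # halfway, 90.0
site_led = linear_eb_estimate(100.0, 0.01, 80.0, 4.0)   # close to 100
```

This behaviour matches the experimental findings reported below: the longer the site record, the weaker the influence of the regionalization model.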
The second is an example using New England data, where the results obtained from the EBO weighting technique are compared with those obtained using only the sample values in the inverted gamma distribution.

The purpose of the first experiment is to evaluate the relative advantage gained by using the EB methods. The author proposes as a risk estimator the scale-free loss function produced by a log transform: the difference between the log of the variance estimator given by the method in question and the log of the real variance, known from the populations used to generate the data. To make a comparison, Kuczera then calculates risk savings equal to the reductions in risk produced by using an EB method instead of the maximum likelihood method.

This experiment had two goals. The first was to see how the estimators compare. In general the LEBO estimator outperforms the EBO estimator. The reason, according to the author, is that in this experiment we know that the inverted gamma distribution will have problems fitting the unskewed distribution of S used to generate the data. The second goal was to compare the effects of varying the length of site record and the similarity of the populations used in the regionalization model. The results conform to our expectations. The stronger the similarity in the elements of the regionalization model, the greater its influence and the better the EB results. The longer the site record, the more robust the site-specific estimators and the weaker the EB results. It was noted that for sites that deviate from the majority the EB estimators were less effective, and in cases where there was a reasonable length of record they even induced errors.

The second experiment investigated the effect of using the EBO estimator on a site in New England.
The author profits from the chance to show that we get better results from a regionalization model that contains not only flood values but other basin characteristics such as drainage area, main channel slope, etc. The author then makes a graphical comparison of the probability density functions produced with and without the incorporation of the regionalization model. The results again confirm our expectations and the results of the first experiment, in that the effect of the regionalized model is to reduce the variance of the estimate and shift the mode, but these changes diminish as the site record length increases. To summarize, this approach has several attractions as well as liabilities. Despite its apparent theoretical difficulty, the calculations are quite simple. The sampling experiment shows that there are significant improvements when regionalization data is incorporated into the frequency analysis. We can, under a quadratic loss rule, find a simple way to weight the estimates of the moments without specifying underlying distributions or parameter distributions. On the other hand, this approach can lead to problems when the basin under study deviates from the majority in the region. Other problems can develop if parameter distributions are not flexible enough, as was the case for his conjugate prior. Kuczera's risk analysis and sampling experiment give good reasons for respecting a Bayesian approach despite possible pitfalls.

Flood Studies Report - pp. 286-289 (NERC, 1975)

This particular Bayesian analysis is taken from an excerpt of the Flood Studies Report published by the NERC. Rather than looking at the general case of a distribution of annual maxima, it concentrates on the Bayesian estimation of the mean annual maximum. The analysis is based on three assumptions: 1.
that the value of the prior estimate of the mean is equivalent to one or two years of data (pp. 286 and 289). 2. that the mean is normally distributed (p. 288). This is true if the underlying distribution is normal or if the sample size is large enough to invoke the central limit theorem. The sample size required for "large" will be great (n > 30) for skewed distributions and small (n < 30) for distributions which approach a "normal" shape. 3. that the standard deviation of the annual maxima is fixed and known. While this is not specified, it is an integral part of the solution (see p. 618, Benjamin and Cornell). Once these three assumptions are accepted, Bayes theorem is invoked. The weighting formula used is the following:

    u = (N h Q + h0 u0) / (N h + h0)

where N is the sample size, Q and 1/h are the sample mean and variance, and u0 and 1/h0 are the prior mean and variance. In words, the means are weighted by the reciprocal of the variance of the mean. The usefulness of this technique is determined by the validity of the assumptions. The consequence of the first assumption is that as N gets larger the effect of the regional analysis diminishes. Thus the real worth of the regional data is an important question, left unaddressed by the authors. The second assumption is a useful one and can be supported in most cases. Unfortunately it works against the first assumption, in the sense that we feel more confident about the second's veracity in cases where N is large and thus the effect of the regional data is minimal. The third assumption is the hardest to accept. In most cases where we do not know the mean, we will not be sure about the standard deviation, and this is especially true for small N, where the technique would seem the most useful.
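The weighting formula above, u = (N h Q + h0 u0)/(N h + h0), is a precision-weighted average of the sample and prior means and is easily checked numerically. A minimal Python sketch (function name mine, not from the report):

```python
def combined_mean(N, Q, var, u0, var0):
    """Precision-weighted posterior mean of the annual maximum:
    u = (N h Q + h0 u0) / (N h + h0), with h = 1/var, h0 = 1/var0.

    N, Q, var : sample size, sample mean and sample variance
    u0, var0  : prior mean and prior variance
    """
    h, h0 = 1.0 / var, 1.0 / var0
    return (N * h * Q + h0 * u0) / (N * h + h0)
```

With N = 0 the function returns the prior mean, and as N grows the prior's influence vanishes, which is precisely the behaviour criticized in the discussion of the first assumption.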
In summary, this method is a statistically sound way of combining regional analysis and site data, provided we feel confident about the three basic assumptions. Its simplicity allows us to check the importance of a regional analysis. It should prove useful for small projects.

Bayesian Frequency Analysis (Tang, 1980)

Tang (1980) approaches the extreme event frequency problem from yet another angle. First he uses the method of least squares to fit several underlying distributions on their appropriate probability paper. Then he uses a "Bayesian" technique to combine the estimates from these different probabilistic models. Finally, he develops a Bayesian estimator for the risk associated with a given design and any specific model. Let us look at each of these three steps in the analysis. The method of least squares is an optimal solution of the graphical method of classical frequency analysis. Because the location of the points, and thus the optimality, is affected by the plotting formula chosen (in this case the Weibull), the moments of the distribution described by the line will not necessarily agree with the sample moments. This can also affect the standard deviation of the estimate. Another possible problem, noted by Tang, is that scale transformations (e.g. Lognormal, Log Pearson III, etc.) may lead to higher standard deviations of estimates as we move away from the mean. This implies less accuracy. Both of these problems are consequences of the graphical nature of the solution and are thus in some ways anachronistic. The Bayesian technique used by Tang closely resembles that proposed by the Flood Studies Report and the Linear Bayes approach proposed by Kuczera. Again, the estimates of different means are combined using their inverse variances as weights. Some of this method's weaknesses have already been discussed.
In this particular application, it is the independence of the two estimates being combined that seems questionable, because we know that they have been derived from exactly the same data. Nevertheless, the weighted average of the estimates could be accepted, as it does not change in magnitude. The variance, however, is reduced each time we perform the weighting operation, so it will certainly be biased. This fact was noted in the Flood Studies Report. The only situation where the bias could be ignored would be when one variance was much smaller than the other. Tang's comments in this section are quite useful. Other plotting formulae may be used but, as Zhang showed, the Weibull is the optimal one for a continuous sample. The second suggestion, doing a regression on only part of the data, seems particularly interesting from an engineering point of view. Theoretically it is impossible to justify, but it could yield interesting results and is the approach used for "hockey-stick" distributions that characterize two-population extreme events. The final step in Tang's analysis is the specification of a "Bayesian Risk" for a given design capacity. As is the case with most general Bayesian approaches, the final step of the derivation is an integral that must be evaluated numerically. Although Tang does not admit this, he gives us no indication of having solved the integral, nor is a solution included in his example.

Bayesian Estimation of Frequency of Hydrological Events (Cunnane and Nash, 1979)

Cunnane and Nash, like Wood and Iturbe, use a formal Bayesian approach. Unlike the latter, the authors of this article produce a cumulative posterior distribution of a quantile. Let us look at their development of this distribution and its utility.
The first step is to use a regional analysis to identify and quantify the joint prior distribution of the first two moments of the annual maximum flood series. The regional regression analysis suggested lognormal distributions for each and that they were independent. The parameters for the joint distribution were obtained from the means and the residuals. Using Bayes theorem, formulae for first the normalizing factor and then the posterior distribution are derived. Both are integrals which cannot be evaluated analytically, and numerical methods must be used. An interesting technique is suggested for finding limits for the numerical integration, but no quadrature guidelines are given. The advantages of this approach are multiple. First of all, there need be no relationship between the underlying probability model for the extreme event considered and the distributions chosen for its parameters. Thus one of the limitations of the conjugate pair approach is overcome. This leads to two main advantages. The first is that diffuse priors become a possibility. The second is that the regional analysis can be done in conventional and familiar forms, thus enabling the hydrologist to extract the maximum from his prior information. Finally, the posterior distribution of a quantile which results allows us to make probability statements about the quantile, something we could not do in the classical analysis. There are two major disadvantages. The first is that some method of numerical quadrature must be used, along with all the error calculations required to ensure its precision. This is not covered in the article. The second concerns the conclusions drawn from the posterior distribution of the quantile.
Cunnane and Nash present their results in the form of means and standard deviations for different quantiles. However, the expected value of a quantile is not the same as the expected quantile of a given extreme event. The latter is often the information desired (see Chapter 1). With this approach we would need to do repeated numerical integrations inside a technique for finding roots of an equation. This would be very expensive in analytical time. The equation to be solved would be

    [ ∫(u=0..∞) ∫(V=0..∞) p(u,V) F(q|u,V) dV du ] - 1/T = 0

where T = the return period desired (given) and q = the flow for which the expected value of the return period is T (solved for). Practically speaking, there should not be a great difference between the mean of the posterior distribution of the quantile and q, but we cannot be sure.

CONCLUSIONS

There seem to be four general lines of approach in Bayesian applications to statistical analysis of extreme events. The first is the graphical approach. Graphical solutions are proposed by Zhang (order statistics) and Tang (regression analysis). As stated previously, this is a good check but it is no longer rigorous enough. Scales and plotting positions can create biases, as both authors note. The second approach is the conjugate pair. This is the solution proposed by Wood and Iturbe, and the first solution proposed by Kuczera. While mathematically complete, it has several drawbacks. The first of these is the unfamiliarity of the distributions used to describe the parameters. Inverted gamma and regular gamma distributions are not familiar to most civil engineers; thus they are not sure what they are inferring as they estimate or calculate prior probabilities.
The second drawback is that once the underlying probability distribution is chosen there is a very limited choice of distributions for the parameters, and this may limit the choice to an unsuitable distribution. Finally, the form of the prior distribution may push us to find more statistics than we feel comfortable evaluating. These factors inhibit the engineering use of this otherwise excellent Bayesian technique. The third approach is the weighted average using inverse variances as weights. This method is proposed by the Flood Studies Report for the mean annual flood, by Kuczera in the Linear Bayes technique, and by Tang to combine estimates from different underlying distributions. The method's simplicity, its strongest advocate, and its optimality under quadratic loss (see Kuczera) are counterbalanced by either difficulty accepting the basic assumptions, uncertainty of the value of subjective information, or the risk of site-specific losses for sites deviating from the majority. Finally, the fourth approach is the "numerical" Bayesian approximation. This method uses numerical techniques to evaluate the expressions resulting from Bayes rule. It is used by Cunnane and Nash. It allows total flexibility in the choices of underlying probability models and parametric distributions, and then overcomes the difficulty of combining them in integral form by using numerical methods to approximate the integrals. Thus the continuous form of Bayes rule:

    p(u,V|x) = g(u,V) L(u,V|x) / [ ∫∫ g(u,V) L(u,V|x) dV du ]

changes to

    P(u_i, V_j | x) = g(u_i, V_j) L(u_i, V_j | x) / [ Σ_i Σ_j g(u_i, V_j) L(u_i, V_j | x) ]

Probability statements and expected values can then be computed using this posterior distribution of the parameters. These four general approaches are a comprehensive view of Bayesian applications to extreme events.
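The discrete form of Bayes rule used by the fourth, "numerical" approach amounts to multiplying prior weights by likelihoods on a parameter grid and normalizing. A minimal Python sketch (a generic grid, not any of the reviewed programs):

```python
def discrete_posterior(prior, lik):
    """Numerical Bayes rule on a parameter grid:
    P(u_i, V_j | x) is proportional to g(u_i, V_j) * L(u_i, V_j | x),
    normalized so the grid probabilities sum to one."""
    prods = [[g * l for g, l in zip(g_row, l_row)]
             for g_row, l_row in zip(prior, lik)]
    total = sum(sum(row) for row in prods)
    return [[v / total for v in row] for row in prods]
```

Because the normalizing integral becomes a finite sum, the prior and likelihood need share no analytical relationship, which is exactly the flexibility claimed for this approach.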
Terminology may be particular to hydrology, such as regional analysis (understood to mean regression analysis on the super-population) but, aside from these differences, the techniques can be applied to any series of extreme events. Let us now consider an approach which falls into the fourth category and tries to overcome the difficulties of the first three categories, as well as one of those found in Cunnane and Nash's analysis.

CHAPTER 3 : PRESENT METHOD

3.0 INTRODUCTION

In this chapter we will discuss the solution proposed. This solution has two facades. One is the way it relates to Bayesian statistics. We will deal with this in the first part of each section. The other facade, which is the way this approach has been adapted to the computer, is covered in the second part of each section.

3.1 Prior Probabilities

To begin with we must choose a set of parameters. For this study we shall limit ourselves to two: the mean and the standard deviation of the extreme event population. These two parameters are common ones, and this should make subjective estimating not only easier but more accurate. It is fairly safe to consider these parameters as independent, as was verified for the case of annual floods by Cunnane and Nash. For other processes, this assumption should be checked. We will assume that independence holds for the prior case only, and this assumption may be overruled by the data. Two forms are proposed for this distribution: a joint probability of normally distributed random variables, or a "diffuse" prior where all parameter values have equal probability.
Due to the difficulty of specifying means, standard deviations and skewnesses of parameters, LOW, PROBABLE and HIGH estimates of each parameter are entered to describe its distribution. The LOW and HIGH estimates represent the ten percent and ninety percent quantiles; in statistical terms

    P(θ < θ_low) = .10   and   P(θ > θ_high) = .10

The PROBABLE estimate represents the mean of this distribution. To discretize the distribution, two more values are calculated:

    (LOW + PROB)/2   and   (HIGH + PROB)/2

The five values are then placed in vectors, one for each parameter. These two sets of values can form 25 discrete pairs (mean, st. dev.) of parameter values. To designate initial or prior probabilities the two methods proposed become: assigning the probabilities corresponding to a joint distribution of independent normally distributed variables; or assigning equal probability to each pair. Figure 4 shows how the set of probabilities is devised for one of the normally distributed variables, yielding values of .1675 for the LOW and HIGH estimates, .2060 for the intermediate values and .2510 for the PROBABLE. The joint probability for any pair is then just the product of the marginal probabilities associated with each member of the pair. Either these joint values or equal values of 0.04 are then stored in a 5 x 5 "probability matrix".

3.2 Likelihood Functions

Once we have determined the form of the prior distribution, we need to find the relative likelihood of the available data to obtain posterior parameter probabilities. The first step in this process is the choice of an underlying distribution.
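The five-point discretization and the 5 x 5 prior matrix of Section 3.1 can be sketched as follows in Python (function names are mine; the marginal weights are the Figure 4 values quoted above, which as printed sum to .998 rather than exactly one):

```python
# Marginal probabilities for the five representative values
# (LOW, (LOW+PROB)/2, PROB, (HIGH+PROB)/2, HIGH), per Figure 4.
WEIGHTS = [0.1675, 0.2060, 0.2510, 0.2060, 0.1675]

def five_point(low, prob, high):
    """Discretize a parameter into its five representative values."""
    return [low, (low + prob) / 2.0, prob, (high + prob) / 2.0, high]

def prior_matrix(diffuse=False):
    """5 x 5 joint prior: product of the marginals for independent
    normally distributed parameters, or a diffuse 0.04 for each pair."""
    if diffuse:
        return [[0.04] * 5 for _ in range(5)]
    return [[wi * wj for wj in WEIGHTS] for wi in WEIGHTS]
```

For example, five_point(100.0, 150.0, 200.0) yields the vector [100.0, 125.0, 150.0, 175.0, 200.0], and the two such vectors index the 25 (mean, st. dev.) pairs of the probability matrix.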
The criteria for this choice were explained in Chapter 1. The present applications allow five choices: the Normal, Lognormal, Cube-root Normal, Gumbel and Weibull distributions. Each has several applications in the extreme event field of analysis. The method of moments is used to fit the parameters. This means that all parameters are written in terms of the moments of the population, and these equations are then solved for specified values of the moments. The likelihood function for any of these underlying distributions is determined from either the probability density function (pdf), f(x), or the cumulative distribution function (cdf), F(x). Derivations for each set of equations (parameters, pdf and cdf) can be found in Appendix 1. The likelihood function of the data is the product of the likelihoods of each individual piece of data, given the parameter set θ. Four types of data can be processed by the present method. The first is an observed extreme event z1. The likelihood of such an event is the probability density at z1, given θ, or f(z1|θ). The second type of data is an observed event z2 which has been exceeded m times in n tries. The likelihood is a function of the exceedance probability:

    Pe = 1 - F(z2|θ)

The likelihood is described by the following expression

    L(z2|θ) = nCm Pe^m (1 - Pe)^(n-m)

where nCm is the number of possible combinations of n objects taken m at a time. The third type of data consists of a value, z3, not exceeded by the extreme events in n tries. The likelihood is a special case of the second data type, with m = 0. Thus

    L(z3|θ) = (1 - Pe)^n

The fourth type of data is that of an event which is the m-th largest event in n tries. It corresponds to another special case of the second data type: z4 being the m-th largest, it was exceeded m-1 times in n-1 tries.
The likelihood is

    L(z4|θ) = (n-1)C(m-1) Pe^(m-1) (1 - Pe)^(n-m)

The likelihood function for all the data is the product of the likelihoods of the individual pieces of data. This result can be derived from Bayes rule (see Appendix 1). From the likelihood function we can calculate the likelihoods for each pair of parameter values in the discrete case we are considering. The likelihood of any pair should be the volume under the conditional likelihood function surface, shown in Figure 6, over the area that this pair represents. In our discretized case we represent this volume by the likelihood at the point. When divided by the sum of the likelihoods at the other discretized points this would yield an absolute likelihood, but as only a relative one is needed this division is not performed. This relative likelihood is then used in Bayes' rule to calculate posterior probabilities for the parameter pairs. For the latter three data types, some uncertainty may exist in the value of the extreme event Z. In this case LOW, PROBABLE and HIGH estimates can be entered, on the assumption that the value is normally distributed with the LOW and HIGH values corresponding to the tenth and ninetieth percentiles respectively. The "expected" likelihood is then calculated using these three estimates, two intermediate values and the same corresponding probabilities that were developed for a parameter. Mathematically

    E[L(z_i|θ)] = Σ (k=1..5) P_k L(z_ik|θ)

where P_k is the probability associated with one of the five estimated values, z_ik, of the event z_i. The computer treats each type of data separately. Posterior probabilities are calculated after every data type is entered.
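The four data-type likelihoods of Section 3.2 can be sketched as follows in Python (the cdf F(z, θ) and pdf f(z, θ) are assumed supplied by the chosen underlying distribution; function names are illustrative, not the thesis program's subroutines):

```python
from math import comb

def exceedance(F, z, theta):
    """Pe = 1 - F(z | theta)."""
    return 1.0 - F(z, theta)

def lik_observed(f, z, theta):
    """Type 1: an observed extreme event -> density at z."""
    return f(z, theta)

def lik_exceeded(F, z, n, m, theta):
    """Type 2: z exceeded m times in n tries (binomial form)."""
    p = exceedance(F, z, theta)
    return comb(n, m) * p ** m * (1.0 - p) ** (n - m)

def lik_not_exceeded(F, z, n, theta):
    """Type 3: z never exceeded in n tries (type 2 with m = 0)."""
    return lik_exceeded(F, z, n, 0, theta)

def lik_mth_largest(F, z, n, m, theta):
    """Type 4: z is the m-th largest, i.e. exceeded m-1 times in n-1 tries."""
    p = exceedance(F, z, theta)
    return comb(n - 1, m - 1) * p ** (m - 1) * (1.0 - p) ** (n - m)
```

The likelihood matrix L(i,j) is then obtained by evaluating the product of these terms at each of the 25 parameter pairs.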
This permits us to analyse the effects that each type of data has on the extreme event distribution, as we will see in the next chapter.

3.3 Posterior Probabilities

The elements needed to calculate posterior probabilities are now in place: the prior probability matrix P(i,j) for parameters μ_i and σ_j, and the likelihood matrix L(i,j) = L(z|μ_i, σ_j). The posterior probabilities P_z(i,j) are then determined using Bayes rule:

    P_z(i,j) = P(i,j) L(i,j) / [ Σ (i=1..5) Σ (j=1..5) P(i,j) L(i,j) ]

3.4 Estimation

Estimation can be done using the posterior parameter probabilities through the compound distribution, which weights the underlying distributions. Three types of estimation can be done:

• expected extreme event values corresponding to a given exceedance probability;
• the expected exceedance probability of a given event;
• the distribution of a quantile.

3.4.1 Expected Return Periods for a Given Event

Underlying distributions can be written as cumulative probability functions of a population given the parameter set θ. For a given event value, x, we can then determine the exceedance probability associated with each pair of parameters θ_ij, thus forming a set of exceedance probabilities F(x|θ_ij) for the twenty-five parameter pairs. To find the expected return period we then find the expected exceedance probability and invert it. In this case the expected return period is

    E[T_r] = 1 / E[F(x|θ_ij)]

where

    E[F(x|θ_ij)] = Σ (i=1..5) Σ (j=1..5) F(x|θ_ij) P_z(i,j)

3.4.2 Event for a Given Return Period

To estimate events corresponding to predetermined exceedance probabilities, we must return to the definition of a compound distribution. We would like to find the event X whose compounded exceedance probability has a predetermined value.
Even for common distributions this value of X is difficult to solve for directly, hence the popularity of cumulative distribution tables. In the case of a compound distribution, numerical methods must be used. A ZEROIN algorithm used here finds the root of an equation by bisection or interpolation, to a precision determined by the programmer. The result is the value X which corresponds to the compound exceedance probability desired. Unlike the solution proposed by Cunnane and Nash, the principle underlying compound distributions is respected.

3.4.3 Quantile Distribution

For every set of parameter values θ we can solve for the extreme event corresponding to a given exceedance probability or quantile. The probability matrix contains the probability associated with each pair of parameters and hence the probability associated with this extreme event value. The running total of these probabilities, when they are ordered by ascending corresponding events, forms a cumulative probability distribution, or quantile distribution, at this given exceedance probability. Once again the ZEROIN algorithm is used to find the event corresponding to each pair of parameters. A "bubble" sorting algorithm is then used to order the events. Then the running total is calculated. Quantile distributions can be used to make probability statements about the set of extreme events. Care should be used, however, as the relationship between the quantile distribution and the compound distribution has not been studied here. While well-behaved data seems to yield plausible results, care should always be used.

3.5 Refitting

The prior distribution undergoes major changes if the data indicates drastically different preferences in the mean and standard deviation.
In such cases the data tends to increase the weights of either the LOW or HIGH estimates while giving the parameters at the other end of the range zero probabilities. These heavily weighted distributions may constitute boundaries beyond which the data would like to push parameter values. This limitation is a direct consequence of our choice to limit the analysis to a discrete case with only twenty-five pairs of parameters. One way to overcome this limitation is to try to conserve the shape of the posterior distribution and to refit the range of parameter values. The prior distribution chosen was, in its most restrictive case, a bivariate normal distribution of the mean and variance with a covariance equal to zero. To refit the posterior distribution we have used a bivariate normal with covariance unrestrained. Two new parameter vectors are generated using the mean and standard deviation of each parameter along with the covariance. Representative values are again situated at the mean and the 10th and 90th quantiles, with intermediate values as for the prior. New probabilities are calculated using the probability density of the corresponding bivariate normal at each pair. These densities are normalized so that their sum equals unity. While this process does induce some centralizing tendencies, we feel this tendency is much less important than restrictions on the range of possible parameter values.

3.6 Linear Transformations of the Parameters

In some situations, analysts may want to perform linear transformations of the population to describe some dependent or closely related population. This program contains a transform function which performs such arithmetic operations.
As the scalar or shift coefficients are sometimes uncertain, a vector describing a distribution of the coefficient is formed in the same way parameter vectors are formed, and normal distribution probabilities are assigned. A new set of parameter pairs is then generated using the coefficient distribution and the designated arithmetic operation. The resulting parameter pairs and their probabilities are then refitted using the procedure described in Section 3.5. Principal applications involve changing of the time scale for time series or shifting distributions beyond the physical boundaries imposed by the discretization.

3.7 Conclusion

A visual summary of the method is found in Figure 5. Compound distributions can be generated, inferences made, and refitting and linear transformations operated.

CHAPTER 4 : EXAMPLE

4.0 INTRODUCTION

The following examples illustrate some of the uses of the Bayesian technique. While some effort has been made to use established techniques to find prior distributions, this is a vast domain, and in most cases the author has chosen arbitrary prior distributions which intuitively seemed appropriate.

4.1 PRECIPITATION

The monthly precipitation at Vancouver International Airport is the first population considered. The total precipitation for the month of December is used as an example. Mr. Shaeffer of the Atmospheric Service, Vancouver, has postulated (personal communication) that the cube roots of total monthly precipitation are normally distributed. Thus the cube-root normal distribution is chosen. In actual fact we have 43 years of data. All 43 are used as control.
For our example we will consider that we know four years of data, from 1977-1980, and the largest event in the 43 years. A range of means and standard deviations was obtained by finding sample means and variances on three consecutive five-year series of data taken at the beginning of the total sample. A copy of the output summary is included in Appendix 2. Results are tabulated in Table I. The results (confirmed by an examination of the probability matrix) indicate increased preference for a slightly smaller mean and standard deviation than the prior indicated. The known data series confirms the preference for a slightly smaller mean but, after updating with the known extreme in 43 tries, it shifts toward a larger standard deviation. More importantly, the rainfall predicted by the Bayesian method is significantly smaller than that predicted by the control population. This indicates that the results are non-conservative when compared to the full control population. It should also be noted that in that context the extreme order event, rather than rectifying the situation, aggravated it. The known extreme events, however, were consistently trying to rectify this situation.

4.2 FLOODS

A population of annual maximum flows for the Susquehanna River at Harrisburg, Pennsylvania was considered as a second example. The example is treated by classical analysis techniques in Linsley and Franzini (1972). To keep their example as a control, the Gumbel distribution was chosen as the underlying distribution. 76 years of actual data are available. For our example we shall use the 2nd largest of these flows, which occurred in 1889 and had a value of 707,000 cfs, and the last five years of the data series.
This simulates the type of situation often observed where the largest event in recent history may be unknown because it exceeded all measuring techniques, the second largest can be evaluated using a flow-resistance equation such as Manning's equation, but we only have five years of actual data. Again a range of values for the parameters was obtained by calculating sample means and variances for three consecutive five-year samples taken at the beginning of the whole data series. A copy of the output summary is included in Appendix 2. Results are tabulated in Table II. Two initial hypotheses were tested: that little was known about the population (the diffuse prior), and that the high and low estimates of the parameters represented the 10th and 90th quantiles of the engineer's preference for these parameters (the skewed prior). In both cases the prior estimates yielded conservative results, and the data (both types) tended to shift toward the control for the higher return periods. This suggests an appropriate shift toward a smaller standard deviation. While the extreme order event tended to push the lower return period results away from the control, the few known extremes pulled them back. As expected, the diffuse prior information is more conservative, reflecting the lack of preference of the engineer.

4.3 STRENGTH OF MATERIALS - WOOD

Axial tensile strength of spruce-pine-fir lumber was used as a third example population. Using data correlating the modulus of elasticity in bending and axial tensile strength, a range of values for the axial tensile strength parameters was obtained. The diffuse prior option was chosen because of the engineer's lack of preference. Finally, because of the "weak link" type failure of the samples, the lognormal distribution was chosen.
While 86 extreme events were recorded, two data sets of five random events were used for the Bayesian example. The author was asked to provide confidence intervals for the fifth quantile. The fifth quantile for the population fitted using the method of moments is available for control, as is a Bayesian computer run using all the available data. A summary of the latter's output obtained from the example can be found in Appendix 2.

The quantile distributions contained in these summaries yield the following observations:

- prior information did not provide very tight confidence intervals, the five percent intervals being 4.1 and 28.4 MPa;
- the median of this quantile distribution was 13.3 MPa prior to updating and rose only to approximately 14.8 MPa after updating, shifting closer to the fifth quantile of the fitted cumulative distribution function;
- the confidence intervals tightened drastically after updating with the extreme order event and continued to tighten after updating with the series of known data.

4.4 CONCLUSION

As we have seen, this Bayesian technique can be applied to three different domains of civil engineering. The computer program is structured to make modifications, new types of data, and new options easy to add: in most cases the addressing subroutines must be modified, menus modified, and the appropriate CDF and PDF subroutines added to increase the flexibility of the approach. A HELP library is available at most menus to guide the user through the program. While at times the program seems user-friendly, more work could be done to increase its friendliness and to standardize the format for entering event values, parameter estimates, etc.
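The quantile distributions discussed in Section 4.3 can be formed directly from a discretized posterior: each parameter zone implies its own fifth quantile, and the zones' posterior masses accumulate into a distribution on that quantile. The sketch below assumes a normal underlying distribution and a hypothetical three-zone posterior (the thesis's grids are much finer, and the wood example used a lognormal); it is illustrative only:

```python
# hypothetical posterior masses on (mean, sd) zones -- illustrative values
posterior = {
    (28.0, 8.0): 0.2,
    (30.0, 9.8): 0.5,
    (32.0, 12.0): 0.3,
}

Z05 = -1.6449  # standard normal 5th-percentile point

def fifth_quantile(mean, sd):
    """Fifth quantile of a normal underlying distribution."""
    return mean + Z05 * sd

# cumulative quantile distribution: P(q05 <= value)
pairs = sorted((fifth_quantile(m, s), w) for (m, s), w in posterior.items())
cdf, cum = [], 0.0
for q, w in pairs:
    cum += w
    cdf.append((q, cum))

def credible_interval(cdf, lo=0.05, hi=0.95):
    """Smallest tabulated quantiles whose cumulative mass reaches lo and hi."""
    low = next(q for q, c in cdf if c >= lo)
    high = next(q for q, c in cdf if c >= hi)
    return low, high
```

Reading the interval off the accumulated curve is the same operation as reading the 4.1 and 28.4 MPa bounds off the output summaries in Appendix 2.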
TABLE I
SUMMARY OF RESULTS: RAINFALL

                                Updating Stage
Return Period   Prior          Extreme   Known Flows +    Control
(years)         Distribution   Event     Extreme Event    Population
   2.0          162            161       149              184
   5.0          196            191       197              223
  10.0          216            208       225              240
  20.0          237            222       250              280
  50.0          265            242       282              305
 100.0          288            258       305
 200.0          311            274       327

TABLE II
SUMMARY OF RESULTS: FLOODS

SKEWED PRIOR
                                Updating Stage
Return Period   Prior   Extreme Order   Extreme Order and   Control
(years)                 Event           Known Flows
   2            286     295             288                 275
   5            395     406             396                 375
  10            472     481             470                 430
  20            552     556             545                 500
  50            667     659             646                 580
 100            761     740             725                 640
 200            860     824             806                 700

DIFFUSE PRIOR
                                Updating Stage
Return Period   Prior   Extreme Order   Extreme Order and   Control
(years)                 Event           Known Events
   2            287     297             289                 275
   5            399     408             398                 375
  10            478     483             473                 430
  20            561     558             548                 500
  50            680     661             650                 580
 100            777     743             730                 640
 200            879     828             813                 700

CHAPTER 5: CONCLUSIONS

5.0 INTRODUCTION

The conclusions can be divided into three categories. The first is the results of the present method in statistical terms. As is often the case in research, the present method has uncovered more questions than it has answered; thus the second category is further research. Lastly, we will look at some pitfalls involved with this kind of research in conjunction with computing.

5.1 INITIAL INTENTIONS

The literature review, the existing program, and the examples worked through in the present thesis all confirm that Bayesian frameworks can be valid, statistical, useful tools for analysing and fitting a wide variety of populations of extremes encountered in civil engineering. Sufficient analytical work has been provided by a variety of research programs to form the bases of several Bayesian approaches. Computational flexibility, when combined with menu-style programming, can make computer programs available to all users, whatever their familiarity with computing hardware.
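As a concrete illustration of how an engineer's preferences enter such a framework, the skewed prior of Table II read the low and high parameter estimates as the 10th and 90th quantiles of the engineer's preference. Under a normal preference assumption (a simplification for illustration; the thesis's priors are discretized and need not be symmetric), those two judgments determine a mean and standard deviation:

```python
Z90 = 1.2816  # standard normal 90th-percentile point

def normal_from_quantiles(low, high):
    """Normal distribution whose 10th/90th quantiles match low/high."""
    mean = 0.5 * (low + high)
    sd = (high - low) / (2.0 * Z90)
    return mean, sd

# flood-example judgments on the mean annual flood: low 269, high 362
m, s = normal_from_quantiles(269.0, 362.0)
```

Asymmetric judgments, with the probable value closer to one end, would require a skewed family; only the symmetric case is sketched here.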
The reasonable results, along with the obvious improvements on prior information obtained after updating with minimal amounts of data, are very encouraging. Different preferences of the engineer can be included using degrees of diffuse priors. These results should provide inputs for more widespread applications of Bayesian techniques.

5.2 FURTHER RESEARCH

There remain three areas of further research that the author considers of primary importance for the continued applicability of the present method. The first is data-likelihood development. As noted in the examples, in the case of non-conservative prior information, data other than known events did not work to rectify the situation. A more reasonable likelihood function can be obtained by simply adding the extreme order event as a known flow if it has actually been recorded. Further investigation is warranted for these types of data, and other types of data applicable to extreme events could be researched and developed.

The second is computational error estimation. One of the advantages of the Bayesian framework is that it deals with a type of uncertainty beyond the classical inherent population uncertainty: parameter uncertainty. Uncertainty and error induced by discretization should not be neglected. Some discretized models could be compared with models using conjugate distributions. The size of the probability matrix and the effects of increasing and decreasing the number of zones used in the discretization process could be investigated.

Finally, the classical analysis does have the capability of delimiting confidence intervals. The quantile distributions could be used to the same ends and a comparison made.
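The discretization-error question raised above can be probed empirically by computing a posterior summary on successively finer parameter grids and watching it converge. The sketch below uses the five known flows from the flood example's output summary, a moment-fitted Gumbel likelihood, and a uniform prior on the mean with the standard deviation held fixed; the grid bounds and fixed standard deviation are assumptions for illustration, not the thesis's actual zoning:

```python
import math

KNOWN_FLOWS = [412.0, 212.0, 252.0, 494.0, 214.0]  # from the output summary

def gumbel_loglik(data, mean, sd):
    """Log-likelihood of known events under moment-fitted Gumbel parameters."""
    alpha = math.pi / (sd * math.sqrt(6.0))
    a = mean - 0.577 / alpha
    ll = 0.0
    for x in data:
        z = alpha * (x - a)
        ll += math.log(alpha) - z - math.exp(-z)
    return ll

def posterior_mean(n_zones, lo=250.0, hi=350.0, sd=100.0):
    """Posterior mean of mu on an n_zones midpoint grid, uniform prior."""
    mus = [lo + (hi - lo) * (i + 0.5) / n_zones for i in range(n_zones)]
    w = [math.exp(gumbel_loglik(KNOWN_FLOWS, mu, sd)) for mu in mus]
    total = sum(w)
    return sum(mu * wi for mu, wi in zip(mus, w)) / total

coarse, fine = posterior_mean(5), posterior_mean(200)
# the gap |coarse - fine| is one measure of the error induced by coarse zoning
```

Repeating the comparison while varying the number of zones, or against a conjugate-prior solution where one exists, would quantify the discretization error directly.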
Another area that has provided many questions throughout the development of this thesis has been the estimation of appropriate prior distributions. Many statistical techniques exist that could be adapted and explained to help engineers in estimating prior distributions that include a maximum of possible information. The domain of risk aversion, and of overcoming undesired risk aversion through testing and questioning, is one possible area of research in this field. Yet another is the use of multivariate distributions and regression data. The final item of concern in this domain is the development of diffuse priors to a more sophisticated degree. Research into any of the above topics would resolve questions uncovered during the present thesis development and promote the application of Bayesian methods.

5.3 COMPUTER PITFALLS

During the period over which the major part of the work for this thesis was completed, daily computer programming was performed, and several pitfalls for graduate work in practical areas like civil engineering were discovered.

5.3.1 The Output Pitfalls

Civil engineers are not computer scientists, yet we need to see reasonably formatted output. A minimum of work should be done to make output legible for non-users, but no more. Days and months can be and are spent cleaning house, and computer programmers can do this faster and better than civil engineers.

5.3.2 The Language Pitfalls

Computer languages change continually and thus limit the applicability of a researcher's work if care is not taken in the initial stages to choose the hardware on which the software will be used. The present program is written in Fortran 77 and can be used on any VAX computer with the appropriate compiler. It should be noted that it cannot be used on micro-computers in its present state.
This limitation is quite stringent.

5.3.3 The "Unfriendliness" Pitfall

Programs should be adaptable by most potential users and usable by the greatest possible number of those interested in the output. This requires a "friendly" program in many sections, subroutines, or better yet complete programs. Linking should be clear. The author feels that at the research level this implies stringent size restrictions. The present program is very unwieldy from a stranger's point of view. Again, this category of work should fall into the hands of computer scientists.

5.4 SUMMARY

Because of their flexibility, the way they model human decision-making and thought processes, and their intuitive possibilities, Bayesian techniques will continue to interest statisticians and engineers. The present work should prove useful in this endeavour.

LIST OF REFERENCES

1. Benjamin, J.R. and Cornell, C.A., "Probability, Statistics, and Decision for Civil Engineers", McGraw-Hill Book Co. Inc., New York, N.Y., 1970.

2. Box, G.E.P. and Tiao, G.C., "Bayesian Inference in Statistical Analysis", Addison-Wesley Publishing Co., Reading, Mass., 1973.

3. Bury, K.V., "Statistical Models in Applied Science", Wiley, New York, N.Y., 1975.

4. Chow, V.T., "Handbook of Applied Hydrology", McGraw-Hill Book Co. Inc., New York, N.Y., 1964.

5. Cunnane, C. and Nash, J.E., "Bayesian Estimation of Frequency of Hydrological Events", Proceedings of the Warsaw Symposium, July 1971, IASH Publ. No. 100, 1974.

6. Kite, G.W., "Frequency and Risk Analysis in Hydrology", Water Resources Publications, Fort Collins, Colorado, 1977.

7. Kuczera, G., "Combining Site-Specific and Regional Information: An Empirical Bayes Approach", Water Resources Research, Vol. 18, No. 2, April 1982, pp. 306-314.

8. Linsley, R.K. and Franzini, J.B., "Water-Resources Engineering", McGraw-Hill Book Co.
Inc., New York, N.Y., 1979.

9. Linsley, R.K., Kohler, M.A. and Paulhus, J.L., "Hydrology for Engineers", McGraw-Hill Book Co. Inc., New York, N.Y., 1958.

10. Natural Environment Research Council, "Flood Studies Report", NERC, London, 1975.

11. Raiffa, H. and Schlaifer, R., "Applied Statistical Decision Theory", MIT Press, Cambridge, Mass., 1968.

12. Tang, W.H., "Bayesian Frequency Analysis", Journal of the Hydraulics Division, ASCE, Vol. 106, No. HY7, July 1980, pp. 1203-1218.

13. Wood, E.F. and Rodriguez-Iturbe, I., "Bayesian Inference and Decision Making for Extreme Hydrologic Events", Water Resources Research, Vol. 11, No. 4, August 1975, pp. 533-542.

14. Zhang, Y., "Plotting Positions of Annual Flood Extremes Considering Extraordinary Values", Water Resources Research, Vol. 18, No. 4, August 1982, pp. 859-864.

[Figure 1: Frequency curve showing compound distribution]
[Figure 2: Distribution of return period for a given event]
[Figure 3: Distribution of extreme events for a given return period]
[Figure 4: Discretization of the normal distribution]
[Figure 5: Bayesian frequency analysis, basic tools and the Bayesian process: the engineer's experience and known data lead to the choice of prior and underlying distributions; updating yields the posterior distribution, which is examined and then refitted or transformed]
APPENDIX 1

LIKELIHOOD DERIVATIONS

Note: in all cases $\mu$ denotes the estimate of the mean and $\sigma$ the estimate of the standard deviation.

1. NORMAL DISTRIBUTION

PDF: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right]$

CDF: $F(x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{t-\mu}{\sigma}\right)^{2}\right] dt$

2. GUMBEL DISTRIBUTION

PDF: $f(x) = \alpha \exp\left[-\alpha(x-a) - \exp\left(-\alpha(x-a)\right)\right]$

CDF: $F(x) = \exp\left[-\exp\left(-\alpha(x-a)\right)\right]$

with $\alpha = \pi/(\sigma\sqrt{6})$ and $a = \mu - 0.577/\alpha$.

3. WEIBULL DISTRIBUTION

PDF: $f(x) = \frac{k}{B}\left(\frac{x}{B}\right)^{k-1} \exp\left[-\left(\frac{x}{B}\right)^{k}\right]$

CDF: $F(x) = 1 - \exp\left[-\left(\frac{x}{B}\right)^{k}\right]$

To find $k$, solve $\frac{\Gamma(1+2/k)}{\Gamma^{2}(1+1/k)} - 1 - \frac{\sigma^{2}}{\mu^{2}} = 0$; then $B = \frac{\mu}{\Gamma(1+1/k)}$.

4. LOGNORMAL AND CUBE-ROOT NORMAL DISTRIBUTIONS

The two following distributions are monotonically increasing one-to-one transformations of the normal distribution. For any such transformation $y = g(x)$, the transformed probability functions are:

PDF: $f_Y(y) = \left|\frac{d g^{-1}(y)}{dy}\right| f_X\left(g^{-1}(y)\right)$

CDF: $F_Y(y) = F_X\left(g^{-1}(y)\right)$

a) Lognormal Distribution

$x = \ln y = g^{-1}(y) \sim N(\mu_{\ln y}, \sigma_{\ln y})$

PDF: $f_Y(y) = \frac{1}{y} f_X(\ln y) = \frac{1}{y\,\sigma_{\ln y}\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\ln y - \mu_{\ln y}}{\sigma_{\ln y}}\right)^{2}\right]$

CDF: $F_Y(y) = \int_{-\infty}^{\ln y} \frac{1}{\sigma_{\ln y}\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{t - \mu_{\ln y}}{\sigma_{\ln y}}\right)^{2}\right] dt$

Thus with the parameters $\sigma_{\ln y}$ and $\mu_{\ln y}$ we can use a normal table or function for the CDF and multiply the normal PDF by $1/y$. To find these parameters we can use direct integration of the transformation to yield

$\sigma_{\ln y} = \sqrt{\ln\left[(\sigma_y/\mu_y)^{2} + 1\right]}$, $\quad \mu_{\ln y} = \ln(\mu_y) - 0.5\,\sigma_{\ln y}^{2}$.

b) Cube-Root Normal Distribution

$x = \sqrt[3]{y} = g^{-1}(y) \sim N(\mu_{\sqrt[3]{y}}, \sigma_{\sqrt[3]{y}})$, with $\frac{d g^{-1}(y)}{dy} = \frac{1}{3} y^{-2/3}$

PDF: $f_Y(y) = \frac{1}{3 y^{2/3}\,\sigma_{\sqrt[3]{y}}\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\sqrt[3]{y} - \mu_{\sqrt[3]{y}}}{\sigma_{\sqrt[3]{y}}}\right)^{2}\right]$

CDF: $F_Y(y) = \int_{-\infty}^{\sqrt[3]{y}} \frac{1}{\sigma_{\sqrt[3]{y}}\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{t - \mu_{\sqrt[3]{y}}}{\sigma_{\sqrt[3]{y}}}\right)^{2}\right] dt$

To find the parameters we use the Taylor series approach described in Benjamin and Cornell (1970), p. 181. With $g(y) = y^{1/3}$, keeping the first three terms for the mean,

$\mu_{\sqrt[3]{y}} \approx \mu_y^{1/3} - \frac{1}{9}\,\mu_y^{-5/3}\,\sigma_y^{2}$,

and keeping the first two terms for the variance,

$\sigma_{\sqrt[3]{y}} \approx \frac{1}{3}\,\mu_y^{-2/3}\,\sigma_y$.

MULTIPLICATION OF LIKELIHOOD FUNCTIONS

Bayes' rule:

$P_Z(\theta_i) = \frac{P(\theta_i)\,L(Z|\theta_i)}{\sum_j P(\theta_j)\,L(Z|\theta_j)}$

If the data $Z$ come in two parts $Z_1$ and $Z_2$, updating with $Z_1$ first gives

$P_{Z_1}(\theta_i) = \frac{P(\theta_i)\,L(Z_1|\theta_i)}{\sum_j P(\theta_j)\,L(Z_1|\theta_j)}$

and then updating with $Z_2$,

$P_{Z_1 Z_2}(\theta_i) = \frac{P_{Z_1}(\theta_i)\,L(Z_2|\theta_i)}{\sum_j P_{Z_1}(\theta_j)\,L(Z_2|\theta_j)} = \frac{P(\theta_i)\,L(Z_1|\theta_i)\,L(Z_2|\theta_i)}{\sum_j P(\theta_j)\,L(Z_1|\theta_j)\,L(Z_2|\theta_j)}$,

which is the single update obtained with

$L(Z|\theta_i) = L(Z_1|\theta_i)\cdot L(Z_2|\theta_i)$.

APPENDIX 2

OUTPUT SUMMARIES

OUTPUT - EXAMPLE 1
Vancouver International Airport, Mean Monthly Precipitation - December

PRIOR DISTRIBUTION DATA
Estimates of Mean:                Low 132    Probable 164    High 193
Estimates of Standard Deviation:  Low 16.7   Probable 32.5   High 60.6
The underlying distribution used is the cube-root normal.

OUTPUT
Return Period    Event
   2.00         161.93
   5.00         195.88
  10.00         216.09
  20.00         236.65
  50.00         265.49
 100.00         288.23
 200.00         311.26

UPDATING DATA SUMMARY
Mth largest event in N tries: event value 300.2, M = 1, N = 43

OUTPUT
Return Period    Event
   2.00         161.18
   5.00         191.24
  10.00         207.57
  20.00         222.44
  50.00         242.12
 100.00         257.57
 200.00         273.74

KNOWN EVENT DATA
Events: 102.0, 165.0, 94.3, 283.2, 232.3

OUTPUT
Return Period    Event
   2.00         149.36
   5.00         196.53
  10.00         224.58
  20.00         250.00
 100.00         304.58
 200.00         327.42

OUTPUT - EXAMPLE 2
Annual Flood, Susquehanna River (skewed prior)

PRIOR DISTRIBUTION DATA
Estimates of Mean:                Low 269   Probable 290   High 362
Estimates of Standard Deviation:  Low 93    Probable 100   High 215

OUTPUT
Return Period    Event
   2.00         285.9
   5.00         395.3
  10.00         472.2
  20.00         552.3
  50.00         667.0
 100.00         760.8
 200.00         860.0

UPDATING DATA SUMMARY
Mth largest event in N tries: event value 707, M = 2, N = 76

OUTPUT
Return Period    Event
   2.00         294.7
   5.00         405.7
  10.00         480.8
  20.00         556.0
  50.00         659.0
 100.00         740.2
 200.00         824.1

KNOWN EVENT DATA
Events: 412, 212, 252, 494, 214

OUTPUT
Return Period    Event
   2.00         287.5
   5.00         395.9
  10.00         470.2
  20.00         544.7
  50.00         646.0
 100.00         725.2
 200.00         806.2

OUTPUT - EXAMPLE 3
Annual Flood, Susquehanna River (diffuse prior)

PRIOR DISTRIBUTION DATA
Estimates of Mean:                Low 269   Probable 290   High 362
Estimates of Standard Deviation:  Low 93    Probable 100   High 215
The underlying distribution used is the Gumbel distribution.

OUTPUT
Return Period    Event
   2.00         287.3
   5.00         398.9
  10.00         477.9
  20.00         560.6
  50.00         679.5
 100.00         776.6
 200.00         878.5

UPDATING DATA SUMMARY
Mth largest event in N tries: event value 707, M = 2, N = 76

OUTPUT
Return Period    Event
   2.00         297.0
   5.00         407.8
  10.00         482.6
  20.00         557.9
  50.00         661.2
 100.00         742.9
 200.00         827.5

KNOWN EVENT DATA
Events: 412, 212, 252, 494, 214

OUTPUT
Return Period    Event
   2.00         288.5
   5.00         397.9
  10.00         472.9
  20.00         548.1
  50.00         650.5
 100.00         730.4
 200.00         812.4

OUTPUT - EXAMPLE 4
Strength of Materials - Wood, Maximum Tensile Strength Parallel to Grain

PRIOR DISTRIBUTION DATA
Estimates of Mean:                Low 16    Probable 30    High 44
Estimates of Standard Deviation:  Low 5.8   Probable 9.8   High 15.0
The underlying distribution used is the Weibull distribution.

CUMULATIVE QUANTILE DISTRIBUTION (event, exceedance probability):
2.1 (0.04), 4.1 (0.08), 4.1 (0.12), 4.5 (0.16), 5.8 (0.20), 5.8 (0.24),
6.6 (0.27), 7.8 (0.32), 8.2 (0.36), 10.2 (0.41), 10.7 (0.46), 13.0 (0.49),
13.6 (0.53), 13.7 (0.59), 16.6 (0.64), 16.6 (0.68), 19.5 (0.71), 19.7 (0.76),
20.1 (0.81), 23.0 (0.84), 23.0 (0.84), 26.5 (0.92), 26.7 (0.96), 30.0 (1.00),
33.4 (1.02)

UPDATING DATA SUMMARY
Event exceeded M times in N tries: event 14.8, M = 83, N = 87

CUMULATIVE QUANTILE DISTRIBUTION FOR THE RETURN PERIOD 1.05 (event, exceedance probability):
4.1 (0.00), 4.1 (0.00), 4.1 (0.00), 4.5 (0.00), 5.8 (0.00), 6.6 (0.00),
7.8 (0.00), 8.2 (0.00), 10.2 (0.00), 10.7 (0.02), 13.0 (0.08), 13.6 (0.23),
13.7 (0.49), 16.6 (0.70), 16.6 (0.89), 19.5 (0.94), 19.7 (0.95), 20.1 (0.99),
23.0 (0.99), 23.2 (0.99), 26.5 (0.99), 26.7 (0.99), 30.0 (1.00), 33.4 (1.00)

KNOWN EVENT DATA
Events: 33.4, 15.5, 30.7, 37.4, 45.0

CUMULATIVE QUANTILE DISTRIBUTION FOR THE RETURN PERIOD 1.05 (event, exceedance probability):
4.1 (0.00), 4.1 (0.00), 4.1 (0.00), 4.5 (0.00), 5.8 (0.00), 5.8 (0.00),
6.6 (0.00), 7.8 (0.00), 8.2 (0.00), 10.2 (0.00), 10.7 (0.02), 13.0 (0.02),
13.6 (0.12), 13.7 (0.52), 16.6 (0.77), 16.6 (0.94), 19.5 (0.95), 19.7 (0.96),
20.1 (0.99), 23.0 (0.99), 23.2 (0.99), 26.5 (0.99), 26.7 (1.00), 30.0 (1.00),
33.4 (1.00)
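The multiplication-of-likelihoods identity derived in Appendix 1 (updating with $Z_1$ and then $Z_2$ equals a single update with the product likelihood) can be checked numerically on a small discrete parameter set; the masses and likelihood values below are arbitrary illustrations:

```python
def bayes_update(prior, lik):
    """One pass of discrete Bayes' rule: posterior proportional to prior * likelihood."""
    w = [p * l for p, l in zip(prior, lik)]
    total = sum(w)
    return [x / total for x in w]

prior = [0.2, 0.5, 0.3]    # P(theta_i) over three parameter zones
L1 = [0.10, 0.40, 0.20]    # L(Z1 | theta_i), e.g. known events
L2 = [0.30, 0.10, 0.60]    # L(Z2 | theta_i), e.g. an extreme order event

# sequential updating: Z1 first, then Z2
sequential = bayes_update(bayes_update(prior, L1), L2)

# joint updating with the product likelihood L(Z1)*L(Z2)
joint = bayes_update(prior, [a * b for a, b in zip(L1, L2)])
```

The two posteriors agree element by element, which is what justifies feeding the program its data types in any order.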
Item Metadata

Title: Estimating design values for extreme events
Creator: Sparks, Douglas Frederick
Publisher: University of British Columbia
Date Issued: 1985
Subjects: Bayesian statistical decision theory; Engineering design; Structural design
Genre: Thesis/Dissertation
Type: Text
Language: eng
Provider: Vancouver: University of British Columbia Library
Rights: For non-commercial purposes only, such as research, private study and education. Additional conditions apply; see Terms of Use: https://open.library.ubc.ca/terms_of_use.
DOI: 10.14288/1.0062834
URI: http://hdl.handle.net/2429/25109
Degree: Master of Applied Science - MASc
Program: Civil Engineering
Affiliation: Applied Science, Faculty of; Civil Engineering, Department of
Degree Grantor: University of British Columbia