MIXTURE DISTRIBUTIONS AND SPATIAL SCALE EFFECTS ON FLOOD HYDROLOGY

By
AHMED MTIRAOUI
B.A.Sc. University of Laval, 1994
M.A.Sc. University of Laval, 1996

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
in
THE FACULTY OF GRADUATE STUDIES
Forest Resources Management

THE UNIVERSITY OF BRITISH COLUMBIA
April 2004
© Ahmed Mtiraoui, 2004

ABSTRACT

Knowledge of the magnitude and frequency of floods on rivers is necessary for a variety of practical applications, including the design of hydraulic structures such as bridges and culverts, and floodplain management through land-use allocation and flood-protection measures. Design floods estimated by fitted distributions are prone to errors associated with (i) mis-specification of the parent distribution at a single site and (ii) the estimation of flood statistics in regional analysis.

The first part of this thesis deals with the mis-specification of the parent distribution, that is, the model governing the population from which the observed sample of data is supposedly drawn. Traditional flood frequency analysis usually assumes that the flood distribution is homogeneous. However, floods are often generated by heterogeneous distributions composed of a mixture of two or more populations. Differences between the populations may be due to a number of factors, including seasonal variations in the flood-producing mechanisms, changes in weather patterns due to low-frequency climate shifts and/or El Niño/La Niña oscillations, changes in channel routing due to the dominance of within-channel or floodplain flow, and basin variability resulting from changes in antecedent soil moisture. We demonstrated that in many cases not recognizing these physical processes in conventional flood frequency analysis is the main reason why many frequency distributions do not provide an acceptable fit to flood data.
An analysis of flow records from streams across British Columbia (Canada), the Gila River (Arizona, USA), and the River Tees (northern England) indicated that when floods are generated by two or more distinct hydrologic processes, the resulting flood distributions may be multimodal and may not be represented by homogeneous distributions. The analysis indicated that T-year design floods estimated by assuming heterogeneous distributions are much more conservative than those estimated by homogeneous distributions.

Monte Carlo simulations were used in this study to quantify the errors in estimating design floods that are caused by a mis-identification of distributions. A set of homogeneous and heterogeneous parametric and nonparametric distributions was compared. Several variables were also tested, namely the return period, sample size, and combinations of several two parametric distributions. An assessment of the suitability of flood estimation techniques was made based on the effect of these conditions on the accuracy of the estimates. It was found that for high L-skewness (L-Cs) and a heavy-tailed probability density function, both of which are characteristics of flood mixtures in the arid and semi-arid climates of Arizona, the Wakeby and two-component log-normal (TCLN) distributions consistently perform well compared to the nonparametric (NP), Gumbel (EV1) and log-Pearson type III (LP3) distributions. For characteristics of flood mixtures representative of the humid climates of British Columbia, where the heterogeneity results in a flood frequency distribution with a smaller value of L-skewness (L-Cs) and a bimodal probability density function, the Gumbel (EV1) and nonparametric (NP) distributions perform better than the other distributions.

The second part of this thesis provides new insights that serve to improve scientific understanding and professional practice in addressing regional flood hydrology problems.
Currently employed peak flow regionalisation procedures inherently make assumptions of scale invariance. One assumption is that the scaling exponent of the flood quantile-drainage area power relationship is independent of catchment size. A second assumption is that the index flood method is valid, such that growth factors between flood quantiles are independent of catchment size (scale). A third assumption inherent in many regional flood models is the constancy of the L-coefficient of variation (L-Cv) and the L-coefficient of skewness (L-Cs) over homogeneous geographical regions. This study focuses on the spatial scaling patterns of linear moment flood statistics, and offers plausible explanations for observed regional scaling trends in terms of the various precipitation and runoff mechanisms that dominate at different scales and in different climates. The characteristics of these mechanisms are then linked back to the effects that variations in L-moment ratio statistics have on flood quantile estimates and, most importantly, on the tail behaviour of flood frequency distributions.

A regional linear moment analysis of annual maximum daily flows in streams in British Columbia, California, Colorado, and the Walnut Gulch Experimental Watershed is used to demonstrate that these assumptions of scale invariance of flood statistics are invalid. This is because flood statistics depend not only on physiography and climatic conditions, but also to a large extent on the size of the catchment. Scale dependence of flood statistics hampers the estimation of peak flows, in particular for small (< 100 km²) ungauged watersheds.
TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF SYMBOLS
ACKNOWLEDGEMENTS

INTRODUCTION
1.1 MOTIVATION
1.1.1 Errors Associated with Mis-identification of the Parent Distribution at Single Site Analysis
1.1.2 Errors Associated with the Estimation of the Flood Statistics in Regional Analysis
1.2 OBJECTIVES
1.3 THESIS STRUCTURE

STUDY AREA AND HYDROCLIMATIC DATA
2.1 INTRODUCTION
2.2 STUDY AREAS AND HYDROMETRIC DATA
2.2.1 Gila River Basin, Arizona (USA)
2.2.2 Walnut Gulch Experimental Watershed, Arizona (USA)
2.2.3 British Columbia, Canada
2.2.4 Colorado and California (USA)
2.2.5 Tees River Basin, United Kingdom
2.3 SCREENING OF HYDROMETRIC DATABASE
2.3.1 Test of independence
2.3.2 Test for trend
2.3.3 Test for general randomness
2.3.4 Test for homogeneity
2.3.5 Application of nonparametric tests to regional studies
2.4 RECORD LENGTH
2.5 HIGH AND LOW OUTLIER FLOODS
2.6 TREATMENT OF ZERO FLOWS
2.7 POSSIBLE EFFECTS OF CLIMATIC SHIFTS OR VARIABILITY ON STREAMFLOWS
2.8 DATA QUALITY

CAUSES OF MIXTURE IN FLOOD DATA
3.1 INTRODUCTION
3.2 LITERATURE REVIEW
3.3 RESEARCH METHODS
3.3.1 Flood histogram by month of occurrence
3.3.2 Nonparametric PDF and CDF shapes
3.3.3 Antecedent Precipitation and Temperature Indices
3.4 RESULTS AND DISCUSSION
3.4.1 British Columbia
3.4.2 Gila River Basin, Arizona
3.4.3 Tees River Basin, United Kingdom
3.5 CONCLUSIONS

TREATMENT OF MIXTURE DISTRIBUTIONS AT A SINGLE SITE
4.1 INTRODUCTION
4.2 LITERATURE REVIEW
4.3 RESEARCH METHODS
4.3.1 Parametric Flood Frequency Analysis
4.3.2 Parametric Two-Component Distributions Flood Frequency Analysis
4.3.3 Nonparametric Flood Frequency Analysis
4.4 RESULTS AND DISCUSSION
4.4.1 Gila River Basin of Southeast and Central Arizona
4.4.2 Low Moor of the Tees River Basin of northeast England, United Kingdom
4.4.3 British Columbia, Canada
4.4.4 Summary
4.5 CONCLUSIONS

ASSESSMENT OF THE PERFORMANCE OF VARIOUS FLOOD MIXTURE DISTRIBUTIONS USING MONTE CARLO SIMULATIONS
5.1 INTRODUCTION
5.2 LITERATURE REVIEW
5.3 RESEARCH METHODS
5.3.1 Simulation design
5.3.2 Scenarios of Monte Carlo Simulation
5.4 RESULTS AND DISCUSSIONS
5.4.1 Scenario #1
5.4.2 Scenario #2
5.4.4 Scenario #3
5.4.5 Scenario #4
5.5 CONCLUSIONS

SPATIAL SCALE EFFECTS ON REGIONAL FLOOD CHARACTERISTICS
6.1 INTRODUCTION
6.2 LITERATURE REVIEW
6.3 RESEARCH METHODS
6.4 RESULTS
6.5 DISCUSSION OF RESULTS
6.5.1 Scaling of the L-Cv
6.5.2 Scaling of the L-Cs
6.6 CONCLUSIONS

CONCLUSIONS AND RECOMMENDATIONS
7.1 SUMMARY OF THE STUDY AREAS
7.2 CONCLUSIONS
7.3 RECOMMENDATIONS FOR FUTURE STUDIES

REFERENCES
APPENDIX A
APPENDIX B

LIST OF FIGURES

2.1 The Gila River Basin showing the locations of the gauging stations detailed in Table 3.2
2.2 Walnut Gulch Experimental Watershed: location map, rain gauge and watershed locations (from Goodrich et al., 1997)
2.3 Major physiographic regions of British Columbia with regions considered in this study superimposed as shaded areas
2.4 Location map of study regions in California and Colorado (after Pitlick, 1994)
2.5 The River Tees and study reach, England
3.1 Frequency curves, Flat River at Bahama, N.C. (after Potter, 1958)
3.2 Example of three histograms by month of occurrence: (a) floods occurring in one season, (b) floods occurring in two distinct seasons, and (c) floods occurring all year round
3.3 Hypothetical sample of: (a) probability density function (PDF) and (b) cumulative distribution function (CDF) plots from a unimodal distribution
3.4 Hypothetical sample of: (a) probability density function (PDF) and (b) cumulative distribution function (CDF) plots from a bimodal distribution
3.5 Hypothetical sample of: (a) probability density function (PDF) and (b) cumulative distribution function (CDF) plots from a heavy-tailed distribution
3.6 (a) nonparametric probability density function, (b) nonparametric cumulative distribution function, (c) histogram of floods by month of occurrence, and (d) one-week antecedent precipitation index plot for Fishtrap Creek near McLure (WSC STN 08LB024)
3.7 (a) nonparametric probability density function, (b) nonparametric cumulative distribution function, (c) histogram of floods by month of occurrence, and (d) one-week antecedent precipitation index plot for Salmo River near Salmo (WSC STN 08NE074)
3.8 (a) nonparametric probability density function, (b) nonparametric cumulative distribution function, (c) histogram of floods by month of occurrence, and (d) one-week antecedent precipitation index plot for Boundary Creek near Porthill (WSC STN 08NH032)
3.9 (a) nonparametric probability density function, (b) nonparametric cumulative distribution function, (c) histogram of floods by month of occurrence, and (d) one-week antecedent precipitation index plot for Halfway River near Farrell Creek (WSC STN 07FA001)
3.10 (a) nonparametric probability density function, (b) nonparametric cumulative distribution function, (c) histogram of floods by month of occurrence, and (d) one-week antecedent precipitation index plot for Chilliwack River at Vedder Crossing (WSC STN 08MH001)
3.11 (a) nonparametric probability density function, (b) nonparametric cumulative distribution function, (c) histogram of floods by month of occurrence, and (d) one-week antecedent precipitation index plot for Bella Coola River above Burnt Bridge Creek (WSC STN 08FB007)
3.12 (a) nonparametric probability density function, (b) nonparametric cumulative distribution function, (c) histogram of floods by month of occurrence, and (d) one-week antecedent precipitation index plot for Zymagotitz River near Terrace (WSC STN 08EG011)
3.13 (a) nonparametric probability density function, (b) nonparametric cumulative distribution function, (c) histogram of floods by month of occurrence, and (d) one-week antecedent precipitation index plot for Zymoetz River above O.K. Creek (WSC STN 08EF005)
3.14 Empirical mixed population analysis of floods: (a) frequency curves of floods classified by ENSO conditions, and (b) unclassified annual flood frequency curve for the Kitimat River below Hirsch Creek (WSC STN 08FF001)
3.15 Empirical mixed population analysis of floods: (a) frequency curves of floods classified by ENSO conditions, and (b) unclassified annual flood frequency curve for Little Wedeene River below Bowbyes Creek (WSC STN 08FF003)
3.16 Empirical mixed population analysis of floods: (a) frequency curves of floods classified by ENSO conditions, and (b) unclassified annual flood frequency curve for Chilliwack River at Vedder Crossing (WSC STN 08MH001)
3.17 Empirical mixed population analysis of floods: (a) frequency curves of floods classified by ENSO conditions, and (b) unclassified annual flood frequency curve for Kitsumkalum River near Terrace (WSC STN 08EG006)
3.18 Empirical mixed population analysis of floods: (a) frequency curves of floods classified by ENSO conditions, and (b) unclassified annual flood frequency curve for Muskwa River near Fort Nelson (WSC STN 10CD001)
3.19 Empirical mixed population analysis of floods: (a) frequency curves of floods classified by ENSO conditions, and (b) unclassified annual flood frequency curve for Boundary Creek near Porthill (WSC STN 08NH032)
3.20 Empirical mixed population analysis of floods: (a) frequency curves of floods classified by ENSO conditions, and (b) unclassified annual flood frequency curve for Skeena River at Usk (WSC STN 08EF001)
3.21 Empirical mixed population analysis of floods: (a) frequency curves of floods classified by ENSO conditions, and (b) unclassified annual flood frequency curve for Kettle River near Laurier (WSC STN 08NN012)
3.22 Empirical mixed population analysis of floods: (a) frequency curves of floods classified by storm type, and (b) unclassified annual flood frequency curve for San Francisco River at Clifton (USGS STN 09444500) and Gila River at Clifton (USGS STN 09442000)
3.23 San Francisco River at Clifton (USGS STN 09444500): (a) probability density function; (b) flood series estimated by NP method; and (c) histogram by month of occurrence
3.24 Gila River near Clifton (USGS STN 09442000): (a) probability density function; (b) flood series estimated by NP method; and (c) histogram by month of occurrence
3.25 Annual flood series for the Santa Cruz River at Tucson, Arizona. Hydroclimatological year is November 1 to October 31 (from Webb and Betancourt, 1992)
3.26 Variation of L-coefficient of variation (L-Cv) of the annual floods with drainage area during the pre- and post-1960 conditions
3.27 Empirical mixed population analysis of floods for the Gila River near Clifton and San Francisco River at Clifton: (a) frequency curves of floods pre-1960 and post-1960, and (b) unclassified annual flood frequency curve
3.28 Empirical flood frequency curves at Broken Scar and Low Moor using the Cunnane plotting position formula
3.29 Wave speed and travel time through the reach at flood discharges (from Archer, 1989)
3.30 Relationship between upstream and downstream discharges with the ratio of 12-hour mean flow to the peak flow as a parameter (from Archer, 1989)
3.31 Tees River at Low Moor - F3606: (a) probability density function; (b) flood series estimated by NP method; and (c) histogram by month of occurrence
3.32 Tees River at Broken Scar - F3501: (a) probability density function; (b) flood series estimated by NP method; and (c) histogram by month of occurrence
3.33 The nonparametric density function for the flood data at Broken Scar (h is the smoothing factor according to Equation 3.1)
3.34 The nonparametric density function for the flood data at Low Moor (h is the smoothing factor according to Equation 3.1)
4.1 Mixed population analysis of annual floods by the LP3 distribution for the ten streams in the Gila River Basin
4.2 Mixed population analysis of annual floods by the GEV distribution for the twelve streams in the Gila River Basin
4.3 Mixed population analysis of annual floods by the Wakeby distribution for the ten streams in the Gila River Basin
4.4 Mixed population analysis of annual floods by the TCLN distribution for the ten streams in the Gila River Basin
4.5 Mixed population analysis of annual floods by the nonparametric distribution for the ten streams in the Gila River Basin
4.6 Annual flood data at Low Moor fitted by (a) GEV distribution, (b) TCLN distribution and (c) nonparametric distribution. Data are plotted on normal probability paper using the Cunnane formula
4.7 Mixed population analysis of annual floods by the EV1 distribution for the six streams in British Columbia
4.8 Mixed population analysis of annual floods by the TCLN distribution for the six streams in British Columbia
4.9 Mixed population analysis of annual floods by the nonparametric distribution for the six streams in British Columbia
4.10 Effect of plotting position formula on the visual assessment of the LP3 distribution fit to annual floods of the Salt River near Roosevelt
5.1 Flowchart summarizing the Monte Carlo simulation experiment
5.2 Mixture of flood series caused by differences in the (a) mean, (b) variance and (c) mean and variance of the two populations making up the heterogeneous distribution
5.3 Probability density function and cumulative frequency function estimated by NP method for Scenario #1 (a hypothetical mixture similar in characteristics to flood mixture in BC)
5.4 Probability density function and cumulative frequency function estimated by NP method for Scenario #2 (a hypothetical mixture similar in characteristics to flood mixture in BC)
5.5 Probability density function and cumulative frequency function estimated by NP method for Scenario #3 (a hypothetical mixture similar in characteristics to flood mixture in Arizona)
5.6 Probability density function and cumulative frequency function estimated by NP method for Scenario #4 (a hypothetical mixture similar in characteristics to flood mixture in Arizona)
5.7 Scenario #1 - accuracy of estimating design floods for various return periods when the sample size is 20 (generation method is TCLN)
5.8 Scenario #2 - accuracy of estimating design floods for various return periods when the sample size is 20 (generation method is TCLN)
5.9 Scenario #3 - accuracy of estimating design floods for various return periods when the sample size is 20 (generation method is TCLN)
5.10 Scenario #4 - accuracy of estimating design floods for various return periods when the sample size is 20 (generation method is TCLN)
5.11 Scenario #1 - influence of sample size on the accuracy of 100-yr design flood. Data are generated by TCLN
5.12 Scenario #3 - influence of sample size on the accuracy of 100-yr design flood. Data are generated by TCLN
5.13 Accuracy of design flood estimation for various return periods for Scenario #1 of Arizona data using different generators: (a) TCLN, (b) TCEV and (c) Wakeby
5.14 Accuracy of design flood estimation for various return periods for Scenario #3 of BC data using different generators: (a) TCLN, (b) TCEV and (c) Wakeby
6.1 Variation of regional L-coefficient of variation (L-Cv) of flood peaks with drainage area (after Smith, 1992)
6.2 Variation of regional L-coefficient of variation (L-Cv) of flood peaks with drainage area (after Cathcart, 2001)
6.3 Visual representations of the box-plot
6.4 Scaling behavior of the L-coefficient of variation for the annual flood series derived from daily flow data for three physiographic regions
6.5 Scaling behavior of the L-coefficient of variation (L-Cv) for the annual flood series derived from instantaneous flow data for three physiographic regions
6.6 Climate effect on scaling of the L-Cv in Colorado and California
6.7 Box plots of the L-coefficient of variation versus drainage area for (a) Colorado Alpine; (b) Colorado Foothills; (c) North-Central Sierra Nevada; and (d) Coast Range
6.8 Scaling behavior of L-Cv in Walnut Gulch Experimental Watershed - Arizona
6.9 Large data scatter obscures any scaling of the L-coefficient of skewness for the annual flood series derived from daily flow data for three physiographic regions
6.10 Climate effect on scaling of the L-coefficient of skewness (L-Cs) in Colorado and California
6.11 Box plots of the L-coefficient of skewness (L-Cs) versus drainage area for (a) Colorado Alpine; (b) Colorado Foothills; (c) North-Central Sierra Nevada; and (d) Coast Range
6.12 Sensitivity of the flood frequency curve to the L-coefficient of variation (L-Cv) obtained using the Generalized Extreme Value (GEV) distribution
6.13 L-coefficient of variation vs. drainage area for British Columbia: (a) Interior Plateau, (b) Coast, and (c) Columbia/Southern Rocky Mountains

LIST OF TABLES

2.1 General characteristics of the ten selected USGS hydrometric stations in the Gila River basin, Arizona
2.2 General characteristics of the hydrometric stations in Walnut Gulch Experimental Watershed, Arizona
2.3 General characteristics of British Columbia hydrometric stations
2.4 General characteristics of the selected regions in Colorado and California
2.5 Test for independence of annual maximum daily flows at Broken Scar near Darlington
2.6 Test for trend of annual maximum daily flows at Broken Scar near Darlington
2.7 Test for randomness of annual maximum daily flows at Broken Scar near Darlington
2.8 Test for homogeneity of annual maximum daily flows at Broken Scar near Darlington
3.1 Approximate periods of El Niño-Southern Oscillation conditions in equatorial Pacific Ocean (after Webb and Betancourt, 1992)
4.1 Recent flood frequency studies by L-moments that recommend use of the GEV distribution
4.2 The PDF and some characteristics of distributions commonly used in flood frequency analysis
4.3 T-year flood events estimated by the various frequency distributions for the ten stations in Arizona
4.4 T-year flood events estimated by the various frequency distributions for the six stations in BC
5.1 Scenarios for the Monte Carlo simulations design
5.2 Scenario #1 - accuracy of estimating design floods for various T when the sample size is 20 (TCLN generator)
5.3 Scenario #2 - accuracy of estimating design floods for various T when the sample size is 20 (TCLN generator)
5.4 Scenario #3 - accuracy of estimating design floods for various T when the sample size is 20 (TCLN generator)
5.5 Scenario #4 - accuracy of estimating design floods for various T when the sample size is 20 (TCLN generator)

LIST OF SYMBOLS

AFS  Annual flood series
API  Antecedent precipitation index
BIAS  Relative bias
CARFF  Committee on American River Flood Frequencies
CFA  Consolidated frequency analysis
CDF  Cumulative distribution function
CV  Coefficient of variation
ENSO  El Niño-Southern Oscillation
EV1  Extreme Value Type I (Gumbel) distribution
exp  Exponentiation; for example, exp(X) = $e^X$
F(x)  True cumulative nonparametric distribution function
F(X > x)  Probability of exceedance of a flood event
$F_n(x)$  Estimated cumulative nonparametric distribution function
f(x)  Probability distribution function (PDF) of exceedances
$\hat{f}(x)$  Estimated probability density function
$F_1$, $F_2$  Distribution functions for populations 1 and 2
GEV  Generalized extreme value distribution
h  Smoothing factor of the nonparametric density function
IH  United Kingdom Institute of Hydrology
K(.)  Kernel function associated with the nonparametric method
K*(t)  Integral of the kernel function from t to infinity
L-moments  Linear moment statistics of Hosking (1990)
L-Ck  Coefficient of L-kurtosis
L-Cs  L-coefficient of skewness
L-Cv  L-coefficient of variation
LNII  Two-parameter log-normal distribution
LNIII  Three-parameter log-normal distribution
LP3  Three-parameter log-Pearson distribution
LSCV  Least-squares cross-validation technique
MSE  Mean square error
NP  Nonparametric distribution
n  Number of observations in a time series of annual floods
$n_i$  Frequency in class interval i
$Q_T$  Discharge of T-year return period
$Q_I$  Mean annual flood (index flood)
RMSE  Root mean square error
SOI  Southern oscillation index
TCEV  Two-component extreme value distribution
T  Return period in years
USGS  United States Geological Survey
USWRC  United States Water Resources Council
A particular observation in a time series of annual floods
Weight or relative frequency factor for component distributions
$\alpha$  Factor used to weigh the relative importance of each flood population
$\mu_i$ and $\sigma_i$  Mean and standard deviation of the component distribution i
$\mu$, $\sigma$, g, and k  Mean, standard deviation, skewness, and kurtosis of composite distribution

ACKNOWLEDGEMENTS

I wish to gratefully acknowledge the support, advice and encouragement of Dr. Younes Alila throughout my research study. I am also indebted to Dr. Millar, Dr. Chieng and Dr. Hassan for contributing so much in directing this study and for taking the time to serve on my committee. I thankfully acknowledge Forest Renewal of British Columbia and the Natural Sciences and Engineering Research Council of Canada (Grant no. NSERC OGP0194388) for providing financial support to complete this research. I thank K. Hirschboeck, whose work on the hydroclimatology of the Gila River Basin has inspired me. I am also grateful to K. P. Singh of the Illinois State Water Survey for his valuable advice and for providing me with the source code necessary for fitting flood mixtures. I also thank Wendy Merritt and Karen Ward for their editorial comments, which helped improve the final draft of the thesis. Finally, I extend my warmest gratitude to my wife Faiza and my family for giving me strength, encouragement, and continual moral support.

Chapter 1

INTRODUCTION

1.1 MOTIVATION

Many practical problems require quantitative information on, and prediction of, peak flows or floods at a wide range of space and time scales in drainage basins.
The motivation behind the estimation of floods is almost always the engineering design of structures (bridges, culverts, etc.), although estimates of peak flows for any return period are also needed for a variety of other important practical issues. For example, the variation of flows in terms of their magnitude and frequency, or regime, is a primary factor that controls channel form and process as well as the nature of aquatic and riparian ecosystems. Peak flows can be estimated either through a deterministic approach (e.g. the rational method or the unit hydrograph), or through a stochastic approach using flood frequency analysis, in which an assumed distribution is selected to fit data at a gauged watershed with a long record length. Peak flows estimated by the fitted distribution are prone to errors. A possible source of error is the mis-specification of the parent distribution, i.e. the model governing the population from which the observed sample of data is supposedly drawn. Another potential source of error is the deviation of the distribution's parameters from their "true" values because of sampling deficiencies or inappropriate fitting techniques.

1.1.1 Errors Associated with Mis-identification of the Parent Distribution at Single Site Analysis

The first part of this thesis focuses on model errors (errors associated with mis-identification of the parent distribution). Many flood frequency distributions have been investigated (Cunnane, 1989). Scientists have customarily used various at-site and regional statistical measures of goodness-of-fit to discriminate between alternative distributions. A particular "best" fit distribution is selected if the deviations between the observed and computed floods are statistically insignificant. There are three problems associated with statistical goodness-of-fit tests.
Firstly, given the relatively short record lengths of available streamflow data, several distributions with extremely different upper tail characteristics often give an equally good fit in the range of observed floods. This has major implications for the prediction of extreme flood events. Secondly, the deviations between observed and predicted floods are often operationally significant and should not be ignored just because they are statistically insignificant. These deviations may lead to a difference of the order of millions of dollars in engineering design and flood damage, which no professional can justify treating as insignificant. Thirdly, the literature places increasing emphasis on the statistical sophistication of these goodness-of-fit tests, without much reference to the physical processes of floods as they affect the characteristics of the resulting frequency distribution. Klemes (1974) articulated this so eloquently that it is worth quoting verbatim:

"... the main emphasis in stochastic analysis of hydrologic processes, which basically is the domain of pure hydrology, has been on the fitting of various preconceived mathematical models to empirical data rather than on arriving at a proper model from the physical nature of the process itself. ...In trying to improve this situation, the main problem is to find the ways in which the physical features of a phenomenon can be introduced into the analysis."

More recently, Klemes (1999) stated that after a quarter of a century the state-of-science in stochastic hydrology is still the same. For instance, in both engineering practice and the scientific literature, the most commonly used theoretical flood frequency distributions assume that the annual flood series is a random sample drawn from a single homogeneous population (i.e. floods are associated with homogeneous distributions). The validity of such an assumption has often been questioned (e.g. CARFF, 1999).
One of the ways of accounting for the physical nature of flood processes in the selection of the most suitable frequency model is to consider floods to have heterogeneous frequency distributions. Within the same basin, the seasonal variation in flood-producing mechanisms (i.e., hurricanes, thunderstorms, frontal storms, snowmelt, rain-on-snow) often results in heterogeneous (mixed) distributions of floods. The research questions addressed in the first part of this thesis are: (i) What are the benefits of techniques for improving the reliability of flood estimates by mixture distributions when the heterogeneity results in a flood frequency distribution with high L-skewness (L-Cs) and a heavy-tailed probability density function (characteristics of flood mixtures in arid and semi-arid climates)? (ii) What are the gains in improving the reliability of flood estimates by mixture distributions when the heterogeneity results in a flood frequency distribution with a smaller value of the L-coefficient of skewness (L-Cs) and a bimodal probability density function (characteristics of flood mixtures in humid climates)? (iii) In either of the above two cases, which heterogeneous distribution (i.e., nonparametric or two-component parametric) gives the best performance in fitting mixtures in annual floods? (iv) How do the nonparametric and two-component parametric distributions perform over the various ranges of sample sizes and return periods used in practice? These questions have not been answered in any region where mixtures have been identified.

1.1.2 Errors Associated with the Estimation of the Flood Statistics in Regional Analysis

The second part of the thesis addresses errors associated with the estimation of the flood statistics in regional analysis. One widely accepted approach to predicting floods is regional flood frequency analysis (Bobee and Rasmussen, 1995).
Such analyses consist of transferring information between sites within a hydrologically and statistically homogeneous region (the zone of applicability of the regional model), in order to improve the accuracy of design flood estimates at stations with short records and to allow the estimation of floods at ungauged sites.

Over the years, several methodologies have evolved in regional flood frequency analysis that are based on some fundamental assumptions of scale invariance. One approach is the index-flood method of Dalrymple (1960). In this approach, the coefficient of variation (Cv) and coefficient of skewness (Cs) of a flood distribution are assumed to be constant within a hydrologic region and not to change with the size of the catchment. The mean for a particular watershed is, however, estimated from at-site data (i.e., the hydrologic region is considered to be homogeneous in all flood statistics with the exception of the mean).

A second method in regional flood frequency analysis is based on multiple regression. Various regression models have been used in hydrology to predict the statistics of hydrologic variables of individual watersheds. This requires selecting independent variables, such as watershed size, channel length, mean basin elevation, surface storage by lakes and swamps, drainage density, forest coverage, and precipitation, and selecting a regression model (linear, power, logarithmic, exponential, polynomial, etc.). Riggs (1973) reviewed published regional flood-frequency regressions and found that drainage area was the only variable used in many studies. This approach is also based on scale invariance assumptions of other types.
For example, the exponent of the flood quantile-drainage area power relation ($Q_T = kA^{\theta}$) for a peak flow of a given return period is determined across the full range of drainage areas available for analysis, as is the case with many of the United States Geological Survey (USGS) regression equations (Jennings et al., 1994) and flood equations used in BC (Church, 1997; Eaton et al., 2002; and Coulson and Obedkoff, 1998). It is thus assumed that the exponent is scale invariant. The mapping of flood statistics or parameters, another method of regionalization, is also based on scale invariance. Some studies have mapped the coefficient k of the flood quantile-drainage area power relation ($Q_T = kA^{\theta}$) (McKerchar and Pearson, 1990; Church, 1997; and Eaton et al., 2002). This regional factor k represents a first-order approximation of regional runoff variation as affected by climate, physiography, land cover, and geology. However, the mapping of k is based on the assumption of a constant scaling exponent irrespective of physiography, climate, and size of the catchment. Without this assumption, the mapping of k cannot be done. The use of a generalized skew coefficient and the development of skewness coefficient (Cs) maps by the United States Water Resources Council (1976) is another example of mapping flood statistics. The value of this approach was hotly debated (Klemes, 1976 and McCuen, 1979), but never fully resolved. Some recognized that at-site skewness values demonstrated large variability, and although the generalized skew approach is flawed, it provides a practical means of generating skewness estimates that minimize the significant errors associated with small sample sizes.
Others have never accepted this compromise, arguing that the very concept of regionalizing the coefficient of skewness of the time series of maximum flows has serious faults, especially as an aid to engineering design, as it only reflects an overall average tendency and fails to recognize that the skewness is affected by the size of storage in the basin (McCuen, 1979 and McCuen, 2001). Klemes (1976) was also pragmatic about this when he stated: "...First, regional estimates of skewness should be conditioned also on basin area and physiographic features. Second, the coefficient of skewness of annual floods is likely to vary along the course of a river, and reversals of the direction of its change can occur. Similarly, skewness on a tributary may be very different from that on the main stem of the river." In studies where homogeneity in the coefficient of variation could not be justified, the regional shape estimation method has been used. The term 'regional shape estimation' was coined by Hosking and Wallis (1997) to refer to a flood frequency method in which the mean and dispersion of the frequency distribution are estimated from at-site data, while the shape parameter is assumed constant within a pre-specified hydrologic region. In other words, this method assumes that the coefficient of skewness does not change with the size of the catchment. The same concept has also been used in earlier Monte Carlo simulation work by Lettenmaier et al. (1987) and Stedinger and Lu (1995). In some other studies, different flood statistics were assumed to be constant over different spatial scales (Fiorentino et al., 1987 and Gabriele and Arnell, 1991). This situation led to what is becoming known as the hierarchical regional approach, in which different distribution parameters are estimated based on different, but nested, sub-sets of data.
For instance, the skewness could be assumed constant over a super hydrologic region while the coefficient of variation could be considered constant over sub-regions defined within the super hydrologic region. In this instance, the mean, dispersion, and shape parameters for a watershed of interest are estimated from at-site, sub-regional, and regional data, respectively. It is implied in this method that the coefficient of variation and the coefficient of skewness are independent of the size of the catchment. Fractional membership is a regionalization approach that was introduced by Wiltshire (1986a, b). In this technique, a watershed is regarded not as belonging to a particular region but as having fractional membership in several regions. The parameters for the site of interest are estimated by a weighted-average of the corresponding estimates for different regions. A variation of this approach is the region of influence method (Burn, 1990) whereby each site has its own hydrologic homogeneous region that consists of sites expected to have similar frequency distributions. This approach uses a numerical algorithm to estimate flood frequency characteristics for ungauged sites on the basis of data from gauged sites with similar basin characteristics. For an ungauged site of interest, the algorithm selects a subset of gauged sites from the entire data base of all gauged sites. Selection of the subset of gauged sites is based on similarity between basin characteristics of these gauged sites and those of the ungauged site. Once a network of homogeneous gauged watersheds is identified, the estimation at the ungauged site proceeds with conventional procedures such as index flood or regression methods (GREHYS, 1996 and Burn, 1990). Therefore, the region of influence method uses the same scale invariance assumptions as the index flood and multiple regression as discussed above. 
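The two-stage logic described above (select gauged basins similar to the ungauged site, then apply a conventional regression) can be illustrated with a minimal sketch. It is not the algorithm of Burn (1990): the Euclidean distance on standardized attributes, the function names, the choice of basin attributes, and the symbols `k` and `theta` for the coefficient and exponent of the flood quantile-drainage area power relation are all assumptions made for illustration.

```python
import numpy as np

def region_of_influence(target, gauged, n_sites):
    """Indices of the n_sites gauged basins most similar to the target basin.

    Similarity is measured (as an assumption) by Euclidean distance in
    attribute space after scaling each attribute by its standard deviation.
    """
    gauged = np.asarray(gauged, dtype=float)
    target = np.asarray(target, dtype=float)
    scale = gauged.std(axis=0)
    scale[scale == 0] = 1.0  # avoid division by zero for constant attributes
    dist = np.sqrt((((gauged - target) / scale) ** 2).sum(axis=1))
    return np.argsort(dist)[:n_sites]

def fit_power_law(area, q_t):
    """Fit log(Q_T) = log(k) + theta * log(A) by least squares; return (k, theta)."""
    theta, log_k = np.polyfit(np.log(area), np.log(q_t), 1)
    return float(np.exp(log_k)), float(theta)

# Hypothetical gauged basins: columns are log10(area in km^2) and mean elevation (m).
attributes = np.array([[1.0, 500.0], [3.0, 1500.0], [1.1, 520.0], [3.5, 2000.0]])
nearest = region_of_influence([1.05, 510.0], attributes, n_sites=2)

# Synthetic flood quantiles that follow the power law exactly (k = 2.0, theta = 0.8).
area = np.array([10.0, 50.0, 200.0, 1000.0])
k, theta = fit_power_law(area, 2.0 * area ** 0.8)
```

Fitting a single exponent over all selected basins is exactly the scale invariance assumption noted in the text: `theta` is taken to be the same for the smallest and largest drainage areas in the subset.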
The development of the regional flood frequency models reviewed above is often based on three fundamental steps: (i) delineation and testing of the hydrologic homogeneous regions, (ii) identification of a suitable candidate flood frequency distribution, and (iii) development of regional relationships for the transfer of flood statistics from gauged to ungauged catchments within the same "region". The term "hydrologically homogeneous region" has a number of different definitions, but it generally refers to a grouping of basins that exhibits a consistent pattern of hydrologic response, which is often related to consistency in climate and physiography. Consequently, large regions often are not homogeneous because of the spatial variation of climate and physiography, while small regions, on the other hand, may be hydrologically homogeneous, but often contain insufficient data points to adequately define regional patterns. The delineation of homogeneous regions is closely related to the identification of the common regional distribution that applies within each region. A region can only be considered homogeneous if sufficient evidence can be established that data at different sites in the region are drawn from the same parent distribution. Distribution selection in L-moment analysis is performed by comparing L-moment ratios of L-skewness and L-kurtosis to the theoretical values (Hosking and Wallis, 1993). Averages of L-skewness and L-kurtosis within a homogeneous region are often plotted on an L-moment ratio diagram along with theoretical curves for various candidate distributions. If the point corresponding to the regional averages is located near the curve corresponding to a given distribution, that distribution is a reasonable choice for the parent distribution in this region. This averaging of L-skewness and L-kurtosis within the region assumes that these do not change with the size of the watershed.
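As an illustration of the quantities plotted on an L-moment ratio diagram, the sketch below computes sample L-moment ratios from an annual flood series using the standard unbiased probability-weighted-moment estimators, and then forms a regional average. Weighting the average by record length follows common practice in regional L-moment analysis and is an assumption here, as are the function names.

```python
import numpy as np

def sample_l_moment_ratios(x):
    """Return (l1, L-Cv, L-skewness, L-kurtosis) for a sample of at least 4 values."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if n < 4:
        raise ValueError("at least four observations are required")
    i = np.arange(1, n + 1)  # ranks of the ordered sample
    # Unbiased probability-weighted moments b0..b3.
    b0 = x.mean()
    b1 = np.sum((i - 1) / (n - 1) * x) / n
    b2 = np.sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * x) / n
    b3 = np.sum((i - 1) * (i - 2) * (i - 3) / ((n - 1) * (n - 2) * (n - 3)) * x) / n
    # First four sample L-moments.
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2 / l1, l3 / l2, l4 / l2

def regional_average(site_ratios, record_lengths):
    """Record-length-weighted average of per-site ratios (one row per site)."""
    return np.average(np.asarray(site_ratios, dtype=float), axis=0,
                      weights=record_lengths)

# A symmetric toy sample has zero L-skewness.
l1, l_cv, l_cs, l_ck = sample_l_moment_ratios([1.0, 2.0, 3.0, 4.0, 5.0])
```

Averaging the ratios over all sites in a region, regardless of drainage area, is precisely the step whose validity is questioned in the text: it is only meaningful if L-skewness and L-kurtosis do not vary systematically with watershed size.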
In summary, the research questions that are addressed in the second part of the thesis are: (i) How do flood frequency characteristics vary with the size of catchment? In other words, how do dimensionless flood statistics such as the L-coefficient of variation (L-Cv) and L-coefficient of skewness (L-Cs) vary with the spatial scale? (ii) How does the relationship between these flood statistics and the size of the catchment vary with climate, physiography, and hydrologic regime? (iii) What is a plausible physical explanation of these scaling patterns?

1.2 OBJECTIVES

The specific objectives of the thesis are as follows:

• Investigate mixtures that are present in annual flood data using various statistics describing the characteristics of floods. The statistics considered are the month of occurrence of floods, antecedent precipitation and temperature indices, shape of the probability density function, and shape of the cumulative distribution function. Use detailed hydroclimatic data to classify floods into distinct populations and demonstrate that each mode in the nonparametric density function corresponds to a flood generating mechanism.

• Demonstrate that homogeneous distributions such as the three-parameter log-normal (LN3), log-Pearson type III (LP3), and generalized extreme value (GEV) have numerous drawbacks and limitations, and quantify the impact of mixtures on the reliability of homogeneous flood frequency analysis.

• Investigate alternative heterogeneous models such as the nonparametric frequency distributions, classified two-component parametric distributions, or unclassified two-component parametric distributions, which explicitly recognize the characteristics of probability distributions of floods generated by more than one hydrologic process.
• Assess the sensitivity of design flood estimates to mis-specification of the identified parent distribution (homogeneous instead of heterogeneous) using Monte Carlo simulations, and investigate how heterogeneous distributions perform over the ranges of sample sizes and return periods used in practice.

• Investigate how flood frequency characteristics vary with the size of catchment. In other words, how do dimensionless flood statistics such as L-coefficient of variation and L-coefficient of skewness vary with the spatial scale? Investigate how the relationship between these flood statistics and the size of the catchment varies with climate, physiography, and hydrologic regime; test the statistical and operational significance of these scaling relationships; and finally, highlight plausible physical explanations of these scaling patterns.

1.3 THESIS STRUCTURE

From this point onward, this thesis is written in paper format with each chapter, except for Chapter 2, covering a different aspect of flood frequency analysis. One journal article has already been published from this research (Chapter 4). Another one has been accepted for publication upon revision (Chapter 3).

Chapter 2 describes the basins over which the analysis was conducted and details the reasoning behind the selection of different study areas. The data used in this thesis are described, with particular attention given to the period of records, data quality issues, and the screening that was undertaken.

Chapter 3 explores the various physical processes that result in annual flood data with multiple populations within three study areas (BC, Arizona, and England). Heterogeneity could be associated with or caused by different types of storms, channel morphology, El Nino-Southern Oscillation conditions, and decadal-scale climatic variability.

Chapter 4 investigates the implications of using various popular homogeneous distributions of floods for basins that exhibit mixed population characteristics.
This work demonstrates how frequency distributions that explicitly account for floods generated by a mixture of two or more populations are, in many instances, both hydrologically and statistically more appropriate.

Chapter 5 performs a Monte Carlo simulation experiment to identify the best model for fitting annual flood data generated by mixed mechanisms and to quantify the errors in estimating design floods caused by a mis-identification of distributions. Classified two-component parametric distributions, unclassified two-component parametric distributions, and nonparametric distributions were compared. A series of variables were also tested using Monte Carlo simulations, namely the return period, sample size, and several combinations of two parametric distributions. An assessment of the suitability of flood estimation techniques was made based on the effect of these variables on the accuracy of the estimates.

Chapter 6 reviews the current state of the spatial scaling assumptions used in regional flood frequency analysis and discusses the scaling of flood peaks with a focus on the processes and hydrologic variables that control the relationships between flood peaks and the basin scale. The extent to which we can assume constancy in the second- and third-order L-moment ratios (L-Cv and L-Cs) of the annual flood series with the size of the catchment is explicitly tested in this chapter.

Chapter 7 summarizes the study, presents the major findings of the thesis, and gives recommendations for further research.

Chapter 2

STUDY AREA AND HYDROCLIMATIC DATA

2.1 INTRODUCTION

This chapter describes in detail the sites that were selected to examine the performance of flood frequency analysis techniques. They were chosen because they reflect different hydrologic conditions and provide a unique opportunity for investigating different modelling techniques of heterogeneous distribution of floods.
Floods in these study areas are the result of a combination of different climatic processes and physiographic characteristics. The selected sites provide data sets suitable for investigating both spatial and temporal scaling issues. Data sets were selected at the spatial scales of watersheds (Walnut Gulch Experimental Watershed, 150 km²), basins (Gila River Basin, Tees River Basin) and regions (BC, California, and Colorado). The Gila River Basin, located in central and southern Arizona (USA), is arid to semi-arid with a data set that has relatively high skewness and a long period of record. The Walnut Gulch Experimental Watershed is one of the very few heavily instrumented experimental basins established with gauges set up in a nested fashion. There are about 50 years of continuous streamflow and precipitation data for spatial scales ranging from a few hectares to more than 100 km². Three regions in British Columbia were selected: the Coast region, the Fraser and Thompson Plateau region, and the Columbia and Southern Rocky Mountains region. Floods in these three regions are driven by different climatic and physiographic regimes. The Coast region extends most of the length of the Province and includes both the windward and leeward sides of the mountains. The Fraser and Thompson Plateau region provides a contrast with the wetter regions to the west and east. In the third region, the Columbia and Southern Rocky Mountains, precipitation is highly variable, ranging from as low as 500 mm at valley bottom localities to up to 2500 mm on the west-facing slopes of the mountains. The selected three regions in California and two regions in Colorado are characterized by diverse climates but similar physiography. The regions all have high relief and moderate to dense forest cover but different climatic regimes (Pitlick, 1994).
This similarity permits the analysis to focus solely on the effects of climate, rather than consider the effects of physiography, which is required with the BC data. The contrasting differences in the catchment characteristics and channel geomorphology of the upstream and downstream reaches of the Tees River Basin (England, UK) provide a unique opportunity for investigating the resulting effects on the form of flood frequency distribution. In the Tees River Basin, the analysis was performed on the maximum annual flood series recorded from 1969 to 1999 at two gauges: Broken Scar, near Darlington, and Low Moor. The remainder of this chapter (i) introduces these study areas in more detail and discusses the factors and motivation behind the selection of these data and (ii) screens the selected historical peak flow data sets to remove any data points that might unduly influence the results of this study.

2.2 STUDY AREAS AND HYDROMETRIC DATA

2.2.1 Gila River Basin, Arizona (USA)

The arid to semi-arid Gila River Basin is situated in central and southern Arizona (USA) (Figure 2.1). The Gila River Basin experiences different types of circulation patterns with strong seasonal differences in the types of atmospheric processes that influence hydrologic response. It provides a unique opportunity for investigating different modelling techniques of heterogeneous distribution of floods. Local, regional, and global synoptic patterns and atmospheric processes are the source of multiple populations of flood events (Hirschboeck, 1985). The work presented in this thesis has been inspired by, and is an extension of, several previous investigations conducted on the hydroclimatology of the same study region (Bryson and Lowry, 1955; Hales, 1974; Douglas and Fritts, 1973; Hirschboeck, 1985 and 1987; Smith, 1986; and Webb and Betancourt, 1992). Floods in the study area are the result of a combination of different climatic and physiographic processes in each part of the basin.
In the western part, dominated by the Sonoran Desert, ephemeral streams are fed by convective rainfall events. In the central and southern parts, dominated by hilly terrain, ephemeral streams are fed by the more frequent and orographically enhanced rainfall events that produce some of the largest floods in the area. In the northern and northeastern parts of the basin, consisting of the high elevation of the Mogollon Rim, winter orographic precipitation and spring snowmelt feed perennial headwater streams (Hirschboeck, 1985). On a synoptic scale, moisture coming in from the Gulf of California and the Pacific Ocean results in convective thunderstorms and tropical storms during the humid summer months and large frontal storms during the winter months. The location of the basin in a climatic transition zone between temperate and tropical latitudes contributes further to the contrast between the various seasons in terms of both precipitation and streamflow (Hirschboeck, 1985). Hydroclimatological research in southern Arizona has linked various flood-producing storm types to large-scale atmospheric-oceanic interactions (Hansen et al., 1977; Maddox et al., 1980; Hansen and Schwarz, 1981; Hirschboeck, 1985, 1987; and Smith, 1986). Three types of flood-producing storms and associated upper-atmospheric patterns are described below. The summer rainy season in Arizona, also called the "summer monsoon", is characterized by high-intensity precipitation and is preceded by strong zonal flow and aridity under direct influence of subsidence from a subtropical high-pressure cell in the eastern Pacific Ocean, which remains displaced to the south during spring and early summer. Surface heating and orographic effects operating on the unstable moist air result in monsoon storms that are characterized by isolated or complex groups of thunderstorms that have a duration of less than several hours (Maddox et al., 1980 and Hansen and Schwarz, 1981).
The summer rainy season begins near the end of June and early July and persists until late August or early September. Hansen and Schwarz (1981) affirmed that although the Gulf of Mexico may be the source of much of the day-to-day summer precipitation in the southwest, it is not the source of moisture for extreme precipitation. Floods caused by monsoon storms have occurred in almost every year of record. Winter storms in Southern Arizona, which dominate from November through April, originate from large-scale low-pressure frontal systems traveling in the belt of upper air westerly wind flow. These systems may be large enough to cover the entire state with one homogeneous storm (Douglas and Fritts, 1973). The storm path moves southward in conjunction with the seasonal expansion of a low-pressure cell that occurs in the North Pacific. On the whole, the main winter storm path lies to the north of the study area and precipitation is usually related to trailing cold fronts moving across the northern part of the basin. During dry winters, the westerlies follow a path around the north side of the ridge of high pressure off the west Coast of North America and into the Pacific Northwest. In wet winters, this ridge is displaced westward, and a low-pressure trough develops over the western United States (Douglas, 1974). In late summer and early fall, widespread and intense rainfall occurs in southern Arizona due to northeastward penetration of tropical cyclones, which include hurricanes and tropical storms, from the tropical North Pacific Ocean. McDonald (1956) demonstrated that summer precipitation in Arizona has great spatial variability in any one year from station to station, although it shows less temporal variability because individual stations tend to receive summer rains fairly consistently from year to year. Conversely, precipitation occurring in winter months can have a great variability from year to year, but tends to be spatially homogeneous in any one year. 
McDonald (1956) attributed this to the differences in the physical mechanisms that dominate precipitation in the two seasons. Winter frontal storms and summer local convectional storms are the two most common sources of precipitation in the Gila River Basin. However, Douglas and Fritts (1973) and Douglas (1974) reported on other atmospheric mechanisms, which may contribute significantly to precipitation in the study area. Contrary to previous studies, which were conducted on partial duration series of floods in the Gila River Basin, the analysis in this study was performed on the annual flood series. Long-term records from a dozen US Geological Survey stations were selected to cover a wide range of conditions in the basin. The locations of these stations are shown in Figure 2.1 and the general characteristics of their tributary watersheds are listed in Table 2.1. Although most streams in the Gila River Basin are used as sources of irrigation and/or municipal water supplies, no major reservoirs or diversions were located upstream from any of the selected gauging stations.

2.2.2 Walnut Gulch Experimental Watershed, Arizona (USA)

The U.S. Agricultural Research Service (ARS) established the Walnut Gulch Experimental Watershed forty years ago to better understand the behavior of desert watersheds under conditions of drought and flood (Figure 2.2). Walnut Gulch is a major tributary watershed that covers about 150 km² on the east side of the San Pedro Valley and includes the city of Tombstone (Goodrich et al., 1997). Walnut Gulch is the most monitored and automated watershed in the United States. Each of its sub-watersheds contains instruments that measure rainfall, soil moisture, air and soil temperature, humidity, wind velocity and direction (horizontal and vertical), cloud cover, water runoff, and sediment transport (Renard et al., 1993). All of these data are automatically transmitted hourly to the ARS Watershed Research Center in Tucson.
Walnut Gulch's landscape is a mix of grasslands, riparian vegetation, and shrubs. It is dotted with abandoned mines that date back to the late 1800s, when the mining boom was located in the region around Tombstone (Renard, 1990). The site is semi-arid, with a hot summer and a dry winter. Precipitation at the site is highly variable, with intense and localized convective thunderstorms providing two thirds of the annual rainfall. Almost all runoff occurs during the summer monsoon season when intense rainfall from convective storms exceeds infiltration capacity. Summer rainfall has varied from a watershed average of 74 mm to 244 mm (Houser et al., 2000). The maximum recorded summer precipitation at a rain gauge was 336 mm, illustrating the extreme temporal and spatial variability of seasonal rainfall as well as individual events. There are about 30 years of streamflow data for spatial scales ranging from a few hectares to more than 100 km². This set of data is appropriate for investigating both temporal and spatial scaling issues.

2.2.3 British Columbia, Canada

British Columbia is a mountainous region with a climate that is controlled by the north-south orientation of the mountain ranges and their location on the Pacific Coast. The region experiences two types of climate regimes in the winter. The more frequent is milder and consists of moist air flowing eastward from the Pacific Ocean over mountain barriers, causing heavy precipitation on the windward side of the mountains and much drier conditions on the leeward side. Although less frequent, the second winter climate regime is much colder and is caused by dry air flowing southward from the northern arctic region. Precipitation is mainly in the form of rainfall in the southwestern area and snow elsewhere in the province. Frontal storm activities on the Coast dominate annual precipitation in winter, mainly from November to March.
Convective storm activities in the Interior become important in the summer from May to August. In summer, the climate of British Columbia tends to be much drier than in winter due to a persistent high-pressure area developing off the Coast, which restricts the movement of frontal systems over the province. The fraction of winter precipitation that falls as snow rather than rain increases with latitude, elevation, and distance from the Coast, and varies from less than 10% at sea level in the south coastal region to almost 100% in the northern interior (Moore and McKendry, 1996). As the temperature increases in the summer, streamflows are generated by snowmelt (and occasionally rain-on-snowmelt), which dominates the annual runoff in almost all places in the province. The exception is in the Coast Mountain region, where most of the flow is produced in the winter. To investigate the scaling behavior of floods for different climate and physiographic regimes, annual peak flow data were analysed for three physiographic regions as delineated by Holland (1964): the Coast Zone, the Fraser and Thompson Plateaus (part of the Interior Plateau), and the Columbia and Southern Rocky Mountains (Figure 2.3). The Northern and Central Plateaus and Mountains and the Great Plains area in the Northeast are not included in this study due to the sparsity of hydrometric stations and because the three selected regions suffice to highlight extremes in the scaling behavior of regional flood statistics. The Coast Zone extends most of the length of the Province (Figure 2.3) and includes both the windward and leeward sides of the mountains. It receives a high annual precipitation that generally varies between 1250 mm and 5000 mm. Most of the precipitation falls during the winter months.
Streamflows are generated by rain, snowmelt, or rain-on-snow depending on the elevation of the particular watershed (Melone, 1985). The Interior Plateau is characterized by a semi-arid and more continental climate than the Coast. The Fraser and Thompson Plateau region, with a mean annual precipitation ranging from 300 mm in low areas to about 1000 mm over the highlands (Waylen and Woo, 1982, 1984), is considered here to provide a contrast with the wetter regions to the west and east. Precipitation for the Fraser and Thompson Plateaus tends to be evenly distributed throughout the year. Streamflows are generated largely by spring snowmelt and occasionally by rain-on-snow events. In the third region, the Columbia and Southern Rocky Mountains, precipitation is highly variable, ranging from as low as 500 mm at valley bottom localities to as high as 2500 mm on the west-facing slopes of the mountains. Most of the precipitation falls in the winter months and streamflows are generated predominantly from snowmelt. The analysis was conducted on annual maximum daily discharge (Table 2.4) measured at unregulated streams (Environment Canada, 1994). The decision to use daily discharges was based on the fact that the number of stations measuring daily flows is substantially larger than the number of stations measuring instantaneous flows. A basic assumption of flood frequency analysis is that there is no temporal trend in the streamflow data caused by long-term climatic changes. While decadal climate variability in BC has been documented (Moore and McKendry, 1996), with warmer winters and smaller snowpacks prevailing since about 1977, its impact on peak streamflows is less obvious. Alila (1998) detected significant temporal shifts in annual peak flows in 16 out of 41 long-term record stations in BC, warranting the use of an appropriate standard climatic period for all stations. This period was 1960-1996.
However, sampling variability associated with the short data records for most stations dominates over the subtler long-term temporal trend in peak flows. Therefore, it was decided not to limit the data record even further in an attempt to eliminate potential effects of climate change that cannot be detected in the majority of stations.

2.2.4 Colorado and California (USA)

Three regions in California and two regions in Colorado as compiled by Pitlick (1994) (Figure 2.4), which are characterized by diverse climate but have similar physiography, were selected to examine their flood frequency characteristics. The similar physiography permits the analysis to focus solely on the effects of climate. These regions have high relief and moderate to dense forest cover. The Sierra Nevada, Coast Range, and Klamath Mountain regions of California all have floods produced by large-scale frontal systems. Relief in all three regions is more than 2000 meters. The seasonal patterns of precipitation and runoff in these three regions are similar. Most precipitation in northern California falls between October and May in association with Pacific frontal storms (Paulson et al., 1991). Mean annual precipitation tends to be higher on the Coast and lower inland, but the opposite is true of precipitation intensity. The rapid west-east rise in elevation of these mountain ranges produces strong orographic effects, and precipitation at higher elevations, whether falling as rain or snow, can be extreme. For example, in the headwaters of the North Fork Feather River in the Sierra Nevada, more than 500 mm of rain was recorded over a 3-day storm in February 1986 (National Oceanic and Atmospheric Administration [NOAA] climatological data, 1986). Under cooler winter conditions, snow in the Sierra Nevada can accumulate to great depths.
The highest flows in these regions occur during the winter months, although high flows are also associated with spring snowmelt in the Sierra Nevada and Klamath Mountains. Throughout northern California, the largest floods occur in mid-winter when there is heavy rainfall accompanied by partial melting of an existing snowpack (rain-on-snow events). For example, in the Klamath Mountains, a storm in January 1974 produced record floods on many streams, but 3-day rainfall totals for the storm generally did not exceed 300 mm. Climatic data from weather stations in the area indicated that minimum daily temperatures never fell below freezing from the 13th to the 19th of January, and therefore much of the runoff was probably produced from melting of the snowpack. The Foothills and Alpine regions of Colorado have peak flows typically generated by intense thunderstorms and snowmelt, respectively. The Alpine region encompasses the crest of the Front Range and those drainage basins with headwaters above 2300 m. The Foothills region lies further east and includes drainage basins with average elevations between 1500 m and 2300 m (Jarrett and Costa, 1982). Forest cover in this region is moderate to sparse, particularly on steep, south-facing slopes near valley bottoms. Soils are thin and poorly developed throughout both regions. Precipitation in Colorado is derived from frontal storms originating in the Pacific Ocean, and from convective storms originating from the Gulf of Mexico or the Gulf of California (Paulson et al., 1991 and Barry, 1992). In winter, nearly all precipitation falls as snow. In spring, cold, low-pressure systems draw moisture from the Gulf of Mexico, causing upslope conditions that can produce heavy rainfall (Barry, 1992). In summer, convective storms are an almost daily occurrence. Particularly severe thunderstorms, such as the one that generated the Big Thompson flood (McCain et al., 1979), may occur in the foothill and piedmont regions of the state.
Runoff from drainage basins in the Alpine region is highest in May and June when the winter snowpack is melting. Runoff in the Foothills region is produced by spring rainfall and summer thunderstorms. This distinction between snowmelt and thunderstorm generated flows is the basis for dividing the Front Range into two separate hydroclimatic regions (McCain and Jarrett, 1976; Jarrett and Costa, 1982; Kircher et al., 1985). Some streams in this area derive runoff from both snowmelt and rainfall, although Jarrett (1990) indicated that these mixed snowmelt-rainfall events are infrequent and generally do not produce large floods. This thesis utilizes data from streams that derive flow from either snowmelt or rainfall, but not both. The data for the Klamath Mountain region were not included in the study because of an insufficient number of data points for meaningful analysis.

2.2.5 Tees River Basin, United Kingdom

The Tees River Basin is located in the North East of England, United Kingdom (Figure 2.5). It rises to 760 m above sea level on the eastern slopes of Cross Fell, one of the northern peaks of the Pennines. The river flows from west to east with an approximate length of 160 km and a drainage area of 1930 km². The Tees River Basin has many tributaries, with the Rivers Greta, Boulder, Skerne, Leven, Lune, and Billingham Beck being among the most important in terms of flow quantity and quality.

In this study, the analysis was performed on the maximum annual flood series recorded from 1969 to 1999 at two watershed gauges: Broken Scar near Darlington and Low Moor. The two gauges are located 34.6 km apart. The upland catchment area rising on the Pennine plateau, a tributary to Broken Scar, is 818 km² with elevation ranging from 500 m to well above 600 m. The catchment is predominantly covered by peat (semi-decomposed organic matter) and has a mean annual precipitation of 1207 mm. Low soil permeability results in runoff coefficients that exceed 60%.
The soil consists of alluvium, which is deposited every time the river floods and overflows its banks. The river reach above Broken Scar is steeply sloping and consequently has a rapid and flashy response with little opportunity for overbank storage even during extreme floods.

There is a major change in catchment and channel characteristics downstream of Broken Scar. At Low Moor, the total tributary catchment area is 1264 km². The catchment lies generally below 100 m above sea level and has an average rainfall of 665 mm. The average channel slope is 0.92 m/km. The soil is much more permeable, with a runoff coefficient generally less than 10%. Therefore, the contribution to flood flows is small in comparison to flows from the catchment tributary to Broken Scar. Contrary to the river reach upstream of Broken Scar, the river downstream is alluvial and meanders over a wide and well-developed floodplain. The bankfull discharge is approximately 350 m³/sec and the channel capacity to bankfull through the reach is about 7 million m³ (Archer, 1989). There are two principal tributaries to the river reach: Skern Creek draining an area of 250 km² and Clow Beck Creek draining an area of 78 km². Both creeks join the Tees River basin downstream of Broken Scar.

The contrasting differences in the catchment characteristics and channel geomorphology of the upstream and downstream reaches of the Tees River Basin provide a unique opportunity for investigating the resulting effects on the form of flood frequency distribution (Archer, 1989).

2.3 SCREENING OF HYDROMETRIC DATABASE

Regional flood frequency analyses traditionally assume that the sample flood data are a reliable set of measurements of independent, random events drawn from a homogeneous population. To verify these assumptions, the nonparametric tests of independence, trend, randomness, and homogeneity were conducted at each hydrometric station.
The most recent version of the Consolidated Frequency Analysis (Pilon and Harvey, 1994) package was used, with significance levels set to 5% and 1%. A brief description of each test is provided below.

2.3.1 Test of independence

Two events can be considered to be independent if the occurrence of each event is unaffected by the occurrence of the other event. This test assesses independence based on the correlation between data points in a data set. If the correlation coefficient between the N - 1 pairs of the i-th and (i+1)-th members of a data set is not significantly greater than zero, the values of the data set can be assumed to be independent.

2.3.2 Test for trend

If the successive observations of a time series are made during a period of gradually changing conditions, then there will be a noticeable trend in the magnitude of the observations in the series when arranged in chronological order. In the present study, detection of trends in peak flow series was accomplished by visual inspection and by using the nonparametric Spearman rank-order correlation coefficient (SRC) test.

2.3.3 Test for general randomness

A random sample is one that is selected in such a way that any other sample could have resulted with equal likelihood. In a hydrological context, randomness is generally regarded as meaning that the data arise from natural causes. The nonparametric 'runs above and below the median' test for general randomness was employed in this study. The test is based on the number of runs (successions of identical symbols that are followed and preceded by different symbols or by no symbols at all) that a sample exhibits.

2.3.4 Test for homogeneity

Homogeneity means that, excluding random fluctuations, a data series is invariant with respect to time. Testing for homogeneity in time is accomplished using the Mann-Whitney split sample test.
If some more or less abrupt change has occurred during the sampling period, then some difference could be expected between the means of the sub-samples before and after the change. If two sub-samples of approximately the same size are chosen, it would be expected that if there were no changes in conditions, then the sums of the ranks of the two sub-samples would not be significantly different. Assuming a normal distribution and that the two samples have the same variance, the difference of the sub-sample means can be tested for significance using the t-distribution. These assumptions are usually not met in hydrology (Pilon and Harvey, 1994). Therefore, testing for homogeneity was accomplished using the Mann-Whitney split-sample test, which is a function of the sub-sample sizes and their sums of ranks.

2.3.5 Application of nonparametric tests to regional studies

These tests, which were all applied on a site-by-site basis, are not generally intended for regional studies. However, it is necessary to test individual data sets in order to eliminate any major anomalies prior to using the data for regional studies. Another point to note is that the reliability of these tests diminishes as sample size decreases, so small samples containing anomalies are more likely to pass the tests than large samples containing anomalies. While the sampling variability of small samples is not as great a concern when using L-moment statistics, as opposed to using conventional moment statistics, it is still not desirable to use samples that contain large anomalies.

As an example, results of tests for randomness, trend, independence, and homogeneity are detailed in Tables 2.6 to 2.9 for Broken Scar. This watershed has a continuous record of annual maximum daily flows from 1957 to 1998. Test results indicated that the data were random and homogeneous, and did not display significant dependence or trend. This indicates the suitability of the data sets for the analyses undertaken in this thesis.
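The four screening tests of Section 2.3 can be illustrated with a minimal Python sketch. This is not the Consolidated Frequency Analysis package used in the thesis: the function names are hypothetical, the input is assumed to be a plain list of annual peak flows in chronological order, and the normal approximations shown for the runs and Mann-Whitney statistics are standard textbook forms that may differ in detail from those implemented in that package.

```python
import math
from statistics import mean, median


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))
    return num / den


def ranks(x):
    """1-based ranks of x; tied values share their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def lag1_correlation(x):
    """Independence: correlation between the N - 1 pairs (x[i], x[i+1])."""
    return pearson(x[:-1], x[1:])


def spearman_trend(x):
    """Trend: rank correlation between the flows and their chronological order."""
    return pearson(ranks(x), list(range(1, len(x) + 1)))


def runs_test(x):
    """Randomness: number of runs above/below the median and its z-score."""
    med = median(x)
    signs = [v > med for v in x if v != med]  # values equal to the median are dropped
    runs = 1 + sum(1 for s, t in zip(signs, signs[1:]) if s != t)
    n1 = sum(signs)
    n2 = len(signs) - n1
    expected = 2 * n1 * n2 / (n1 + n2) + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    return runs, (runs - expected) / math.sqrt(var)


def mann_whitney_split(x):
    """Homogeneity: z-score of the Mann-Whitney U for first vs. second half."""
    n1 = len(x) // 2
    n2 = len(x) - n1
    r = ranks(x)
    u = sum(r[:n1]) - n1 * (n1 + 1) / 2
    return (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
```

Under these assumptions, an absolute z-score above roughly 1.96 (or 2.58) would flag non-randomness or non-homogeneity at the 5% (or 1%) level in a two-sided test; the two correlation coefficients would need their own significance tests, which are omitted here.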
2.4 RECORD LENGTH

All stations with less than 10 years of record were eliminated from the data set. This 10-year threshold was selected somewhat arbitrarily, but it has some precedent (e.g. Burn et al., 1997) and it serves as a reasonable balance between maximizing the number of stations for analysis and minimizing the sampling error associated with small sample sizes. Burn et al. (1997) argued that while a minimum 10-year record length may not lead to reliable at-site flood frequency information, it does provide useful information for regional flood frequency studies.

2.5 HIGH AND LOW OUTLIER FLOODS

The presence of outliers in a data sample will cause difficulties in obtaining a satisfactory fitting of a homogeneous frequency distribution to the sample. Depending on whether the outliers are high or low, and on the chosen frequency distribution, the estimates of the T-year event will often be under- or overestimated. Techniques are available for appropriately dealing with these outliers, but these outliers must first be detected.

The inclusion of low outliers affects the sample skewness, which becomes small and sometimes even negative (Pilon and Harvey, 1994). The use of distorted sample statistics will lead to errors in the estimation of frequency distribution parameters, which in turn may lead to the under- or overestimation of flood quantiles. Procedures for treating outliers require judgment involving both mathematical and hydrological considerations. The treatment of outliers is somewhat controversial (Haan, 1977), but, for single-site frequency analysis, values identified as low outliers are generally removed from the data set and a conditional probability adjustment is applied (IACWD, 1982). This approach is also appropriate for regional models that incorporate flood quantile estimates, such as the flood index approach.
In this study, the historical flow values for each station were screened for high and low outliers with the modified Grubbs and Beck outlier test, as prescribed by "Bulletin 17B: Guidelines for Determining Flood Flow Frequency" (USIACWD, 1983). This test assumes that the logarithms of the samples are normally distributed and identifies an outlier as any point that falls in the tail of the distribution beyond a critical value.

2.6 TREATMENT OF ZERO FLOWS

During extremely dry years, a number of the smaller watersheds used in the analyses produce no flow. These years of no flow are recorded as zeros in the annual flood series. Inclusion of these zero values in the sample set results in a distortion of the sample statistics. The zero serves as a lower bound, which truncates the potential variation of the peak flow. For instance, a large watershed may experience many different degrees of dry climate over the entire drainage area and thus may have a corresponding variety of low peak flows. However, a small watershed exposed to the same climate regime may experience a series of zero flows, with no variation. In these instances, inclusion of the zeros in the data set will create problems in producing a satisfactory parametric flood frequency distribution. The presence of a series of zeros in the data set will likely result in an underestimation of the true population mean, while exclusion of the zeros will result in a significant overestimation of the mean. Either situation will adversely affect other sample statistics, although the direction of this influence is more difficult to predict.

Another problem with having zeros in the data set occurs when the data are fit to some logarithmic types of parametric distributions. The logarithm of zero is minus infinity, so it is impossible to estimate distribution parameters.
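The outlier screening described in Section 2.5, together with the zero-flow problem noted above, can be sketched as follows. This is an illustrative sketch only, not the software used in the thesis. It assumes the one-sided 10% significance level of Bulletin 17B and uses a commonly quoted closed-form approximation to the tabulated critical value K_N (it reproduces the tabulated 2.036 for N = 10). The function names are hypothetical, and zero flows are set aside before logarithms are taken because log10(0) is undefined.

```python
import math
from statistics import mean, stdev


def grubbs_beck_k(n):
    """Approximate Grubbs-Beck critical value K_N (10% level, assumed)."""
    log_n = math.log10(n)
    return -0.9043 + 3.345 * math.sqrt(log_n) - 0.4046 * log_n


def grubbs_beck_thresholds(flows):
    """Low and high outlier thresholds, in flow units, from the log10 flows."""
    logs = [math.log10(q) for q in flows if q > 0]  # zeros cannot be log-transformed
    k_n = grubbs_beck_k(len(logs))
    m, s = mean(logs), stdev(logs)
    return 10 ** (m - k_n * s), 10 ** (m + k_n * s)


def screen_outliers(flows):
    """Flag low outliers, high outliers, and count zero-flow years."""
    low, high = grubbs_beck_thresholds(flows)
    return {
        "low": [q for q in flows if 0 < q < low],
        "high": [q for q in flows if q > high],
        "zeros": sum(1 for q in flows if q == 0),
    }
```

Flagged values would still require the hydrological judgment discussed above before any removal or conditional probability adjustment is applied.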
Various techniques have been developed for the treatment of zeros in a data set, with the most common techniques utilizing conditional probability theory (Stedinger et al., 1993). These techniques are essentially the same as those utilized for treating low outliers. Pilon and Harvey (1994) present a good discussion of this topic. Most streams that were used in these study areas are perennial and therefore no treatment of zero flows was necessary.

2.7 POSSIBLE EFFECTS OF CLIMATIC SHIFTS OR VARIABILITY ON STREAMFLOWS

A basic assumption of flood frequency analysis is that there is no temporal trend to the streamflow data caused by long-term climatic changes. However, changes in climate may have significant influence on such data. For example, low frequency or decadal climatic fluctuations in snowfall and temperature in BC have been documented (Moore and McKendry, 1996). The influence of these climatic changes on peak streamflow values is, however, less obvious. Changes in the frequency, timing and intensity of precipitation, and temperature can all influence peak streamflow levels.

Trends in flood data series can be produced gradually or instantaneously. For example, a large forest clear-cut in a basin may quickly result in a shift, whereas a gradual forest insect infestation or climate change may produce longer-term trends in the flood-data series. In regional frequency analysis, trends in flood data caused by climate variability can be treated in different ways. For instance, one could use data from only the hydrometric stations with a common record length during a particular period where the climate could be considered stable. However, doing so could be at the expense of reducing the spatial and temporal data coverage of the study region.
For instance, in British Columbia, the bulk of the streamflow data have been collected during the last 40 years and therefore no attempt was made to reduce the hydrometric data from the various stations to a common recording time period. Church (1997) used this same period (1961 to 1990) as the global period for his analysis in a recent regionalization study. Furthermore, since flood mixture, which is a study topic of this thesis, might be the result of a climate changing in time, we have decided not to constrain the available record length of streamflows.

2.8 DATA QUALITY

Improving streamflow measurement techniques is the means of obtaining more accurate streamflow data. Errors in streamflow data are a main source of error in water resources planning and flood forecasting. Streamflow measurements can be negatively affected by either instrumental or human errors. Streamflow is more difficult to measure accurately and continuously than is stage and, as a result, the flood records always contain some error. For instance, it is known that weirs and flumes produce the most accurate discharge measurements. However, even carefully designed and installed flumes may produce discharge values with an accuracy of ±3 to 5% (Beschta et al., 2000). If flume calibration is done with a current meter or similar device rather than a weighing basin, an additional error and a systematic calibration bias are a likely result (Beschta et al., 2000). In addition, there can be another source of error in streamflow measurements as water "leaks" past the flume as some sort of underflow into or out of the watershed across the topographic divide (Ward, 1971).

Streamflow for a gaging station is typically determined from an established stage-discharge relation, or rating curve.
Measurement errors typically result from gauge misreadings or instrument and gauge malfunctions, while the most common calibration errors result from undetected rating curve shifts and the extrapolation of stage-discharge relations. In addition, the largest floods often exceed the measuring range of gauges, so peak flows must be estimated by utilizing the slope-area method and post-flood field measurements, which are inherently inaccurate.

On the other hand, traditional raingauges and snowgauges suffer from a number of technical problems, including undermeasurement due to air turbulence around the gauge, evaporation losses prior to measuring, and losses due to "wetting". The most important of these is turbulence, which reduces the percentage of "true catch" as the wind speed increases (for instance, in winds of 22 m/s the typical gauge catches barely 50% of the true rainfall and only 25% of true snowfall) (Jones, 1997). Windspeed and turbulence increase the higher the gauge stands but, unfortunately, there is no international standard height for gauges. However, the gauge has to be set sufficiently apart from the ground surface to avoid another problem: rain splashing into the gauge. Losses due to wetting, that is, water left clinging to the inlet funnel and collecting bottle during measurement, are also increased by over-frequent measurement, whether it be rain or snow (Jones, 1997).

Furthermore, the natural variability of precipitation and the difficulty of accounting for it within acceptable limits of error (sampling error) is, in addition to instrumental errors, a fundamental problem when measuring this variable. Sampling error involves the location of gauges to sample precipitation on a basin, including errors due to barriers (a hill, a wall of trees, for example), orographic effects, and the peculiar wind patterns in small forest openings (Hewlett, 1982).
Chapter 3

CAUSES OF MIXTURE IN FLOOD DATA

3.1 INTRODUCTION

Traditional flood frequency analysis is based on the assumption that the annual flood series can be considered to be a sample from a single population. Several studies have demonstrated that in some regions the annual flood series is a sample of a mixture of two or more populations. That is, floods are caused by more than one mechanism. Differences between the populations may be due to a number of factors, including seasonal variations in the flood-producing mechanisms, changes in weather patterns due to low frequency climate shifts and/or El-Nino/La-Nina oscillations, and changes in channel routing due to the dominance of within channel or floodplain flow. Not recognizing these physical processes in conventional flood frequency analyses is likely the main reason the commonly used flood frequency distributions often fail to give an acceptable fit to flood data. This is demonstrated in Chapter 4.

Following a review of previous work, this chapter demonstrates that the Gila River Basin of southeast and central Arizona, the Tees River Basin of northeast England, and British Columbia (BC), Canada, are all regions in which annual flood series are the product of a mixture of two or more hydrologically distinct populations. The objectives of this chapter are to: (i) investigate how, in different climates and physiographies, mixture in floods can be caused by totally different processes and (ii) demonstrate how, in each case, different techniques and different sets of hydroclimatic data need to be used to identify the number and nature of these mixtures.

* Part of this chapter has been accepted for publication upon revision as Mtiraoui, A. and Y. Alila. 2004. Effect of river basin geomorphology on the selection of flood frequency distributions. Journal of Hydrologic Engineering.
3.2 LITERATURE REVIEW

Frequency distributions, which are commonly used in hydrology, assume that the annual flood series is a random sample drawn from a single homogeneous population of floods. A number of researchers have shown that this assumption may not be valid and have treated floods as coming from more than one population. For the most part, these researchers approached the problem from a statistical perspective. Only a few have focused on the physical processes contributing to heterogeneous flood distributions.

Hazen (1930) classified floods in the United States as being generated by one of three mechanisms: melting snow, a large storm of wide distribution, and cloud bursts covering only small areas. Potter (1958) was one of the first to consider the evidence for two or more distinct populations of peak runoff by analysing the shape of the cumulative distribution function, which is referred to as the dog-leg on flood frequency curves. He also proposed possible climatic causes for the multiple populations. One of many illustrations of the dog-leg in the frequency curve that might have been selected is shown in Figure 3.1. Stoddart and Watt (1970) added credence to Potter's assumption when they successfully developed a distribution for extreme floods in Ontario by superimposing the distribution functions for summer rainfall floods and winter snowmelt floods.

Singh (1968, 1974) presented a methodology for mathematically simulating heterogeneous distributions in flood data. Although he referred to climate as a probable cause of multiple populations, his approach was to objectively search the streamflow data alone to define a mixture of distributions, rather than to decompose the data on the basis of additional climatic information.

The best documentation has been provided for southwestern Canada by Waylen and Woo (1982; 1983), Woo and Waylen (1984), and Waylen (1985a,b).
For the catchments considered, the annual floods in southwestern Canada are due either to spring snowmelt or fall/winter rain. These mechanisms can effectively be classified using the four- to eight-day antecedent rainfall. In addition to the classification yielding distinct flood distributions, the relative importance of the individual mechanisms also varies spatially in a physically meaningful way. For example, Waylen and Woo (1983) were able to illustrate that orographic effects and distance from the ocean control the relative importance of the two mechanisms in southwestern BC.

Melone (1985) investigated flood-producing mechanisms in BC, and concluded that in the Coastal region, floods do not result from the same flood generating mechanisms on all drainage basins. Floods may be induced by snowmelt in spring or summer, by rainfall or rain-on-snow in fall or winter, and by both snowmelt and rainfall during the year.

Similar mixed mechanisms have been documented in other regions. Elliot et al. (1982) and Jarrett and Costa (1982) used hydrograph analysis to classify snowmelt and rain events in the Colorado Rockies and foothills. None of these studies considered the effect of antecedent temperatures, which may have a large effect on flood mechanisms.

Versace et al. (1982) and Rossi et al. (1984) showed that the high variability of skewness of Italian flood data resulted from the presence of isolated large flood events generated by extreme climatic conditions, which qualified as outliers under the EV1 distribution.

Hirschboeck (1985, 1987) presented a more detailed hydroclimatic approach to separating sub-populations of a flood sequence, whereby various synoptic atmospheric circulation mechanisms and patterns that generated each flood event in the flood sequence were identified.
The author used surface and upper weather maps to classify flood events as arising from snowmelt or tropical, cutoff low, frontal, monsoon widespread, monsoon localized, widespread synoptic, and local convectional storms. These seven storm groups are samples representing the component frequency distributions of a composite heterogeneous distribution of floods. However, due to small sample sizes of the flood series, attempts to test the statistical significance of the difference between the frequency distributions of the various groups proved inconclusive. In addition, frequency analyses of annual floods to obtain parameter estimates of the composite distribution or its components were not conducted given the small sample size.

Sivapalan et al. (1990) showed by numerical simulations that different storm-runoff production mechanisms produce floods of different characteristics. They demonstrated that in low rainfall climates, low-return period floods are dominated by saturation excess runoff and high-return period floods are dominated by infiltration excess runoff.

Waylen and Caviedes (1986) showed that the El Nino-Southern Oscillation (ENSO) might cause mixture in the annual flood data. The anomalous warming of the equatorial eastern and central Pacific Ocean during the ENSO years produces floods that are different from those of the non-ENSO years. Webb and Betancourt (1992) have noted the coincidence between intense El-Nino events and the occurrence of flooding episodes in Arizona during the subsequent winter. They used the flow record at one station in the Gila River Basin to show how the annual floods generated by the ENSO conditions are hydroclimatically and statistically different from those generated by non-ENSO conditions.

Archer (1989) showed that mixture in flood data is related to the geomorphic characteristics of the fluvial system in a river basin.
In basins with well-developed floodplains, less confining terraces, wide valley bottoms, low slopes, and high hydraulic roughness, floods are generated by a mixture of two populations. The first population consists of peak flows that run within the confines of the main channel and the second population consists of those that inundate the floodplains. Despite this, Archer did not use the nonparametric method to define mixture nor attempt to fit frequency distributions that account for mixture in flood data.

The objectives of this chapter are twofold. Firstly, the analysis attempts to explain physically why a mixture of two or more populations generates annual floods in some typical watersheds in BC, Arizona, and the UK. These explanations should look beyond just relating the antecedent precipitation and temperature to each flood event. Secondly, this study investigates how, for each of the above mechanisms, different techniques and different sets of hydroclimatic, atmospheric, and geomorphic data need to be used to identify the number and nature of these mixtures.

3.3 RESEARCH METHODS

This chapter investigates the various physical processes resulting in annual flood data with multiple populations in the three study areas: the Gila River Basin of southeast and central Arizona, the Tees River Basin of northeast England, and British Columbia, Canada.

As mentioned earlier, in British Columbia, Canada, Waylen and Woo (1982) categorized floods as either snowmelt or rain induced. However, they never discussed the possibility of separating data based on ENSO/non-ENSO events, which effectively come from physically different populations. The El Nino-Southern Oscillation (ENSO) conditions represent the anomalous warming of the equatorial eastern and central Pacific Ocean that occurs every 3 to 5 years.
The most commonly used index to differentiate the ENSO from non-ENSO conditions is the Southern Oscillation Index (SOI), which is the difference in sea-level pressure between Darwin, Australia, and Tahiti (Ropelewski and Jones, 1987). One of the problems in the analyses of ENSO-related phenomena is the use of different criteria for identifying ENSO conditions, such as sea-surface temperatures, several versions of the SOI, or Line Island precipitation. When there is a high negative correlation between sea-surface temperature in the eastern Pacific Ocean and the SOI, strong ENSO years are easily defined. Differences arise when defining weaker ENSO years because warming occurs without a large reversal in sea-surface pressure.

In this study, the Darwin-Tahiti pressure difference and the Line Island precipitation index were used to develop a chronology of 20th century ENSO conditions (Table 3.1). The chronology differs only slightly from existing chronologies of ENSO (Table 3.1). It does not have a denotation of strength, and it gives the approximate beginning and ending times for ENSO conditions. Using the classification in Table 3.1, ENSO conditions recurred on the average every 3.8 years for 1900-29, every 4.3 years for 1930-59, and every 3.8 years for 1960-86.

In this study, I have first used the chronology of ENSO in Table 3.1 to classify every single annual flood as El-Nino or La-Nina in many typical watersheds across BC. Secondly, I used this classification to demonstrate that mixture in the annual flood data is strongly affected by the prevalent ENSO conditions. The extent to which the dominance of either ENSO or non-ENSO events in the upper leg of the flood frequency curve has any spatial trends within British Columbia is then investigated.

In Arizona, USA, mixture in flood data may be caused by differences in storm types.
As mentioned previously, Hirschboeck (1985) pioneered a hydroclimatic approach for the analysis of flood data at 30 gauging stations within the Gila River Basin. She used the surface and upper weather maps to demonstrate that floods in southeastern Arizona may be generated by one of seven different rainfall regimes: tropical, cutoff-low, frontal, monsoon widespread, monsoon localized, widespread synoptic, and local convective storms. Her approach is capable of identifying sources of heterogeneity in flood distributions. Unfortunately, its use in the stochastic analysis of floods remains challenging because of the problems associated with limited sample sizes. The challenge in undertaking stochastic analyses of flood series having limited sample sizes is a major reason why heterogeneity in flood distributions has often been excluded from conventional analyses.

To overcome this problem, the above seven generating mechanisms that contribute to flood events in the Gila River Basin were combined and reduced to three main categories: monsoonal storms, frontal systems, and tropical cyclones. Contrary to the study by Hirschboeck (1985), which was conducted on partial duration series of floods, the analysis in this study was performed on the long-term annual flood series at a dozen US Geological Survey stations, which were selected to cover a wide range of conditions within the Gila River Basin. Following this three-storm classification, the annual flood series of the San Francisco River at Clifton and the Gila River at Clifton were split into three groups using the original classified partial duration series data of Hirschboeck (1985).

In the Tees River Basin, UK, the objective was to investigate how the mixture in flood data may be related to the geomorphologic characteristics of rivers (main channel vs. floodplain flows).
The aim of this investigation is to demonstrate that as the flood wave moves downstream in the basin, not only is the type of the distribution changing, but also its parameters. In watersheds with well-developed floodplains, less confining terraces, wide valley bottoms, low slopes, and high hydraulic roughness, there may be two physically different populations of floods. The first population consists of floods that run within the confines of the main channel, whereas the second population consists of those floods that inundate the floodplains. As the floodplains develop in the downstream direction within the same basin, the flood frequency distribution may be changing from a homogeneous (floods generated by a single population) to a heterogeneous (floods generated by a mixture of two populations) distribution. The analysis will be performed on the maximum annual flood series recorded from 1969 to 1999 at two stream gauges located 34.6 km apart: Broken Scar and Low Moor.

To investigate the extent of mixtures in flood data caused by the above different streamflow generation mechanisms, the following techniques will be used.

3.3.1 Flood histogram by month of occurrence

The seasonal sample splitting and the monthly discharge histograms were used as a first attempt to classify annual flood maxima. At each hydrometric station, the histogram of annual flood series by month of occurrence was constructed. The period marked by the dates of formation and loss of snow cover is often used to characterize the period of spring floods. The remaining period of the year characterizes rainfall-induced floods. The histograms in Figure 3.2 show examples of maximum annual flood by month of occurrence: (i) floods occurring in one single season, (ii) floods occurring in two distinct seasons, and (iii) floods occurring all year round. This histogram analysis can only be used when there are two distinct seasons of floods.
When the rain that contributes to the annual floods occurs in the melt season, a seasonal split by time of occurrence was not physically possible.

3.3.2 Nonparametric PDF and CDF shapes

The break in the slope of the empirical flood frequency curve and/or the bimodality in the nonparametric density function are methods that may be used to recognize mixture in the flood data. The break in slope can be used to separate the two populations in the original annual flood time series. The nonparametric density function (Adamowski, 1985) is defined as:

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right) \qquad (3.1)$$

where $x_1$ to $x_n$ are the observations in an annual flood series, $K(\cdot)$ is a kernel function that is itself a probability density function, and $h$ is a smoothing factor. It has been found that the choice of the kernel function is not crucial for the performance of the technique and as such the Gaussian kernel approach (Adamowski, 1989) has been used. The shape of the density function is sensitive to the smoothing parameter $h$. An optimal algorithm that minimizes the integrated mean square error using a cross-validation technique is adopted in this study for estimating the optimal value of $h$ (Labatiuk, 1985).

A unimodal Probability Density Function (PDF) and a straight-line Cumulative Distribution Function (CDF) indicate that the floods may be the result of a single generating mechanism, while a bimodal PDF and a dog-leg CDF indicate the presence of mixture in flood data. It will be demonstrated using hydroclimatic data that each mode corresponds to a distinct flood producing mechanism. As an example, nonparametric frequency analysis (Figures 3.3 to 3.5) clearly showed three distinct PDF shapes. Figure 3.3 is an example of a nonparametric PDF with one mode and a straight line CDF, Figure 3.4 is another example of a nonparametric PDF with two modes and a break (dog-leg) in the CDF, and Figure 3.5 shows a heavy-tailed PDF and several breaks in the CDF plots.
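The kernel estimator of Section 3.3.2 and the check for bimodality can be illustrated with a short sketch. It assumes a Gaussian kernel, as stated in the text, but takes the smoothing factor h as a user-supplied input; the cross-validation procedure used in the thesis to choose h is not reproduced here. The function names and the grid-based mode count are illustrative assumptions.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_density(x, data, h):
    """Kernel density estimate at x: (1 / (n h)) * sum of K((x - x_i) / h)."""
    n = len(data)
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (n * h)


def count_modes(data, h, grid_size=400):
    """Count local maxima of the estimated density on an evenly spaced grid."""
    lo, hi = min(data) - 3 * h, max(data) + 3 * h
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    dens = [kernel_density(g, data, h) for g in grid]
    return sum(
        1 for i in range(1, grid_size - 1) if dens[i - 1] < dens[i] > dens[i + 1]
    )
```

Because the number of modes depends on h, a mode count from a sketch like this is only as reliable as the choice of smoothing factor: a small h can produce spurious modes, and a large h can merge genuine ones.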
These various density shapes were a reflection of either different flood-generating mechanisms or sampling variability. Parametric analysis would never have identified such heterogeneous distributions. A mixture in flood data may sometimes give a unimodal PDF (Figure 3.3), depending on how different the means of the two component distributions are.

3.3.3 Antecedent Precipitation and Temperature Indices

The Antecedent Precipitation Index (API) provides a useful criterion by which to differentiate between the two flood-generating processes. A plot of the antecedent precipitation, accumulated over a fixed number of days, against the annual flood discharge for each year can help in determining whether the two populations are physically different. The data may suggest that an arbitrary value of the total antecedent precipitation will distinguish the two populations of flood data. The effect of short-wave and long-wave radiation on snowmelt varies with temperature, and if this is what causes the flood mixture, the API cannot be used. Instead, the average Antecedent Temperature Index (ATI) may be used to distinguish the different phenomena. Using this demarcation, all observed annual floods below this level were associated with antecedent temperatures below a certain degree. API and ATI indices were used in this study to classify floods in an attempt to introduce more physics into flood hydrology.

3.4 RESULTS AND DISCUSSION

3.4.1 British Columbia

The histograms of floods by month of occurrence were plotted, and it was found that in the rain-dominated Coastal areas of BC, annual floods occur mainly in winter and summer. In the snow-dominated Interior and northern BC areas, annual floods occur mainly in summer. Historically, these latter areas have experienced most of the rainfall that contributes to the annual maxima during the snowmelt season (Hare and Thomas, 1979). The classification of annual extreme floods based on time of occurrence using histogram analysis is therefore not possible in this thesis.
A new technique based on the nonparametric density function was used to detect whether the annual flood series is bimodal, unimodal, or heavy-tailed. The nonparametric probability density function and cumulative distribution function, with the corresponding histograms by month of occurrence, were plotted at typical stations displaying a wide range of climate and physiographic conditions and are shown in Figures 3.6 to 3.13. Finally, antecedent precipitation and temperature data for each flood event are used to differentiate between different flood generating mechanisms for each basin. In rainfall-dominated areas in BC, the data suggest that an arbitrary value of the antecedent precipitation index (API) will distinguish the different flood generating processes. The API analysis can demonstrate that the multimodality is a result of the mixture itself and not sampling variability. A close examination of these plots is provided below.

3.4.1.1 Southeastern Interior of British Columbia

Three representative basins (Fishtrap Creek Basin, Salmo River Basin near Salmo, and Boundary Creek Basin near Porthill) were chosen to reflect the hydrologic conditions in the southeastern Interior of BC. Plots of the histogram by month of occurrence of floods for these basins are shown in Figures 3.6c to 3.8c. Annual floods occur exclusively from April to July. It is hypothesized that these floods are mainly generated by snowmelt or rain-on-snowmelt events. The estimated nonparametric density function and the cumulative distribution function are shown in Figures 3.6a to 3.8a and Figures 3.6b to 3.8b, respectively. The shapes of the flood distributions for these basins are bimodal. This may be caused by sampling variability, but could also be the result of two or more flood generating mechanisms that are physically different. This question can be addressed by plotting the one-week API against the annual flood discharge for each year (Figures 3.6d to 3.8d).
However, the API plot was not useful, and so the average antecedent temperature is used to distinguish two different flood-generating mechanisms. The first cluster of observations, forming the first mode of the probability density function of Figures 3.6a to 3.8a, consists of floods generated by snowmelt and rain-on-snowmelt with an average antecedent temperature different from that of the second cluster, which forms the second mode of the probability density function. Also, possible differences in the dominant energy flux, including both short-wave and long-wave net radiation, convection from the air (sensible energy), vapor condensation (latent energy), and conduction from the ground, as well as the energy contained in rainfall, can cause the snow pack to melt at a different rate. Nevertheless, this is a first order indication that the multimodality of the distribution is the result of mixture in the flood data and not sampling variability.

3.4.1.2 Northeastern-Alberta Plateau of British Columbia

The Halfway River Basin station was chosen to represent the Northeastern-Alberta Plateau. The plot of the histogram by month of occurrence of floods (Figure 3.9c) is similar to that of the Southeastern Interior, in that annual floods occur exclusively within a few months, here from May to August, and are known to be generated mainly by snowmelt and rain-on-snowmelt events. The nonparametric density function and the cumulative distribution function are shown in Figures 3.9a and 3.9b, respectively. The resulting flood distribution has, unlike the case of the Southeastern Interior, three rather than two modes that might be the result of three physically different generating mechanisms. A plot of the one-week antecedent precipitation against the annual flood discharge for each year is given in Figure 3.9d. There are three distinct clusters in the API plot, and each seems to produce a different range of flood magnitude, which in turn produced a PDF that exhibits three modes.
The first cluster of observations, forming the first mode, is hypothesized to be floods generated by snow or rain-on-snow with an antecedent precipitation less than 15 mm. The second cluster, forming the second mode, corresponds to all floods generated by rain-on-snow events that have an antecedent precipitation between 25 and 38 mm, and the third cluster, forming the third mode, consists of floods also generated by rain-on-snow with antecedent precipitation greater than 38 mm. This is only a qualitative assessment, and data are needed in order to support or reject the hypothesis. As in the analysis of the Southeastern Interior region, this is another first order indication that the multimodality of the distribution is the result of mixture in flood data rather than sampling variability.

3.4.1.3 South Coast of British Columbia

Two stations were chosen to represent conditions of the South Coast of BC: the Chilliwack River at Vedder Crossing and the Bella Coola River above Burnt Bridge Creek. Contrary to the previous two regions examined, annual floods of the Chilliwack River occur in two distinct seasons: from May to July and from October to February (Figure 3.10c). The nonparametric density function and the cumulative distribution function estimated for this basin are shown in Figures 3.10a and 3.10b, respectively. As found in the Northeastern-Alberta Plateau region, the flood distribution has three modes that might be the result of three physically different generating mechanisms. A plot of the one-week antecedent precipitation against the annual flood discharge for each year is given in Figure 3.10d. Similar to the Northeastern-Alberta Plateau region, there are three different clusters, although with three different antecedent precipitation index ranges: less than 50 mm, between 50 and 120 mm, and greater than 120 mm. This again demonstrates that the multimodality of the distribution is the result of mixture in flood data and not sampling variability.
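The threshold-based classification described above can be sketched as follows. This is a minimal illustration, not the procedure used to produce the figures: the 50 mm and 120 mm thresholds are the Chilliwack River values quoted in the text, while the records, the function name `classify_by_api`, and the convention that a value equal to a threshold falls in the upper cluster are assumptions made for this sketch.

```python
from bisect import bisect_right

def classify_by_api(api_mm, thresholds):
    """Return the 0-based cluster index for an antecedent precipitation value.

    thresholds must be sorted in ascending order; a value equal to a
    threshold is placed in the upper cluster (an assumed convention).
    """
    return bisect_right(thresholds, api_mm)

# One-week API thresholds (mm) quoted in the text for the Chilliwack River.
chilliwack_thresholds = [50.0, 120.0]

# Hypothetical records: (year, one-week API in mm, annual peak in m3/s).
records = [
    (1971, 12.0, 310.0),
    (1975, 64.0, 520.0),
    (1980, 135.0, 790.0),
    (1984, 48.0, 295.0),
    (1990, 150.0, 860.0),
]

# Group the annual peaks by API cluster.
clusters = {}
for year, api, peak in records:
    clusters.setdefault(classify_by_api(api, chilliwack_thresholds), []).append(peak)

# Mean annual peak within each cluster, for comparison with the PDF modes.
cluster_means = {k: sum(v) / len(v) for k, v in sorted(clusters.items())}
```

Comparing the cluster means with the locations of the modes of the nonparametric density is one simple way to check whether each mode lines up with an API range.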
Annual floods in the Bella Coola River occur mainly from May to October, with fewer floods occurring from November to January (Figure 3.11c). The nonparametric density function and the cumulative distribution function estimated for this basin are shown in Figures 3.11a and 3.11b, respectively. Contrary to the flood distribution for the Chilliwack River, that of the Bella Coola River has two modes that might be the result of two physically different generating mechanisms. A plot of the one-week antecedent precipitation against the annual flood discharge for each year is given in Figure 3.11d. There are two distinct clusters: the first cluster, which forms the first mode, consists of floods hypothesized to be generated by snowmelt with an antecedent precipitation less than 55 mm. The second cluster, forming the second mode, corresponds to rainfall generated floods that have an antecedent precipitation greater than 55 mm. As in the previously analysed regions, this demonstrates that the multimodality of the distribution is the result of mixture in flood data and not sampling variability.

3.4.1.4 North Coast of British Columbia

In the North Coast of BC, two stations were chosen: the Zymagotitz River near Terrace and the Zymoetz River above O.K. Creek. The annual floods in these basins occur in two seasons, from May to July and from September to November, and are generated mainly by rainfall and snowmelt events (Figures 3.12c and 3.13c). The nonparametric density function for the Zymagotitz River shows two modes (Figure 3.12a), while that for the Zymoetz River exhibits three modes (Figure 3.13a). The cumulative distribution functions that were estimated for these two basins are shown in Figures 3.12b and 3.13b. In each basin, the resulting flood distribution is multimodal, and each mode may be the result of a distinct generating mechanism. A plot of the one-week antecedent precipitation against the annual flood discharge for the Zymagotitz River is given in Figure 3.12d.
There are three different clusters, each cluster describing a distinct hydrologic process. The first set of observations, forming the first mode, represents floods hypothesized to be generated by snowmelt with an API less than 30 mm; the second, forming the second mode, corresponds to floods that are hypothesized to be generated by rainfall with an API between 30 and 130 mm; and the third cluster corresponds to floods hypothesized to be generated by rainfall with an API greater than 130 mm. Unlike the Zymagotitz River, the Zymoetz River (Figure 3.13d) has two clusters; the first represents floods hypothesized to be generated by snow and rain-on-snow with an API less than 115 mm. The second cluster corresponds to floods hypothesized to be generated by rainfall with an API greater than 115 mm. Once again, this, as well as all the previous examples, demonstrates that the multimodality of the distribution is the result of mixture in flood data and not sampling variability.

3.4.1.5 ENSO-Related Phenomena

Mixture in BC floods could also be the result of ENSO/Non-ENSO events. Figures 3.14 to 3.21 each show two frequency curves: one for the annual flood events generated during ENSO years and a second for those generated during non-ENSO years. In the north Coastal region of BC, two watersheds were used to show the effect of ENSO/Non-ENSO events: Kitimat River below Hirsch Creek (Figure 3.14) and Little Wedeene River below Bowbyes Creek (Figure 3.15). The lower tail of the composite frequency curves is controlled by a combination of ENSO and non-ENSO events; the upper tails, however, are defined exclusively by non-ENSO events. In the south Coastal region of BC, two stations were used to show the effect of ENSO/Non-ENSO events: Chilliwack River at Vedder Crossing (Figure 3.16) and Kitsumkalum River near Terrace (Figure 3.17).
At these two watersheds, while the lower tail of the composite frequency curves is still controlled by a combination of ENSO and non-ENSO events, the upper tails are defined exclusively by ENSO events. Two other watersheds located in the Interior were also chosen: Muskwa River near Fort Nelson (Figure 3.18) and Boundary Creek near Porthill (Figure 3.19). As was the case for the southern Coast stations, the upper tail of the CDF is defined exclusively by the ENSO events for these two Interior stations. While many major streams in BC have shown similar trends, several others displayed no significant difference between the ENSO and non-ENSO annual flood frequency curves. Examples are shown in Figures 3.20 and 3.21 for two different basins in BC: one on the Coast and the other in the Interior. Different basins seem to be impacted by the ENSO conditions in different ways. In the Interior snow-dominated watersheds, non-ENSO and ENSO conditions are producing what could be a mixture in the flood data. This suggests a strong correlation between the Southern Oscillation Index (SOI) and extreme floods in these watersheds. In some Coastal watersheds, non-ENSO floods seem different from ENSO floods, and the upper tail of the flood frequency curve is dominated by non-ENSO events. Floods in these watersheds occur in both snow and rain seasons. However, many fewer floods occur during the rain season. This suggests that these watersheds have relatively higher elevations and, therefore, receive their precipitation mainly as snow rather than rain (even though they are located on the Coast). On average, El-Nino winters are associated with warmer conditions and below normal snowfall. La-Nina winters are generally accompanied by cooler and snowier conditions. However, these are average conditions, and individual ENSO events will not necessarily conform to this pattern.
For example, in the past, some El-Nino winters have been cooler or snowier than normal and some La-Nina winters have been warmer with less snow than normal (Taylor, 1998). Warmer conditions during El-Nino may be attributed to the enhanced southerly flow associated with an amplified ridge of high pressure over northwestern Canada (Shabbar and Khandekar, 1996). For El-Nino winters, this influence diminishes in a northward direction and is not significant for northern British Columbia. Colder than normal temperatures during La-Nina are likely due to an increase in northwesterly winds associated with a change in the circulation pattern during this phase (Shabbar and Khandekar, 1996). An interesting issue that underlies this study is the extent to which the atmospheric circulation can be distinguished in the annual flood variability. The major conclusion from this investigation is that there are associations between flooding in BC and atmospheric conditions in the equatorial Pacific Ocean. However, the associations are not consistent across the province, with stronger relationships observed in the Interior of BC. Even in this area, the strongest associations occur in the mountainous divisions, suggesting that a combination of precipitation and climate effects impacts snow accumulation.

3.4.2 Gila River Basin, Arizona

Figure 3.22 shows the relative significance of the resulting frequency curve for each type of storm. Because very few annual floods were generated by tropical storms, only the monsoon and frontal flood frequency curves were plotted. There is a substantial difference between the two curves for all return periods. The lower tail of the composite annual frequency curve is controlled by monsoonal storms while its upper tail is dominated mainly by frontal storms. This plot clearly illustrates that the annual flood series is composed of more than one flood population.
This is also demonstrated in Figures 3.23c and 3.24c, which show that the annual floods in these basins occur in two seasons, from July to September and from January to March, and are generated mainly by rainfall events. The nonparametric density function and the cumulative distribution function estimated for these two basins are shown in Figures 3.23a and 3.24a and Figures 3.23b and 3.24b, respectively. Note that these PDFs from Arizona are not bimodal and have long heavy tails that differ dramatically from the PDFs of BC floods. Each stream is associated with a heavy-tailed multi-modal density function, which further attests to the physical reality that more than just one population generates annual floods. Mixture in annual flood data in the Gila River Basin can also be caused by longer-term low frequency climatic fluctuations. Studies have suggested that shifts in the climate of the southwestern US around 1930 and 1960 were driven by decadal-scale variability in the frequency of ENSO conditions (e.g. Namias et al., 1988). While the long-term mean of the SOI has not changed, its variance decreased significantly during the period of 1930-59 (Elliott and Angell, 1988). Consequently, there are some striking differences in the hydroclimatic regimes of the periods 1900-29, 1930-59, and 1960-99. Webb and Betancourt (1992) showed how fall and winter precipitation has been enhanced and summer precipitation has been suppressed during the periods of increased frequency of ENSO conditions before 1930 and after 1960 (Figure 3.25). This has resulted in changes to the long-term characteristics of annual floods on major streams in southern and central Arizona (Slezak-Pearthree and Baker, 1987; Hjalmarson, 1990; and Roeske et al., 1989). Figure 3.26, which was constructed using data from 12 major streams, further illustrates how the decadal climatic variability has a systematic regional impact on the Gila River Basin.
The figure shows how the coefficient of variation of the annual floods varied with drainage area during the pre-1960 and post-1960 conditions. The coefficient of variation has increased on average by 25 percent across the whole Gila River Basin. This result may be due to the fact that the upper atmospheric circulation and weather patterns are large-scale phenomena and therefore affect watersheds of all sizes in the same way. Trend analyses performed by Webb and Betancourt (1992) on data from two stations suggest that, at the 95-percent confidence level, the difference between the pre-1960 and post-1960 distributions is not statistically significant. However, to facilitate the appreciation of the differences in the magnitude of floods estimated separately from the pre-1960 and post-1960 periods, the two flood frequency curves are shown in Figure 3.27 for Gila River near Clifton and San Francisco River at Clifton, respectively. The two periods have annual frequency curves that diverge considerably for events larger than the 2-year return period. Although not statistically significant, there is a physical basis for these differences and they should, therefore, not be ignored. For this particular stream, the differences are the result of a simultaneous increase of about 50 percent in both the mean and coefficient of variation of annual floods during the post-1960 period. In summary, the upper tails of the observed flood frequency curves in the Gila River Basin seem to be consistently dominated by flood generating processes that are different from those controlling the lower tails. This implies that the annual flood series are associated with heterogeneous flood frequency distributions.

3.4.3 Tees River Basin, United Kingdom

The analysis was performed on the maximum annual flood series at two stream gauges, Broken Scar near Darlington and Low Moor, in the Tees River Basin of northeast England, United Kingdom.
Figure 3.28 shows plots of the empirical flood frequency curves at both Broken Scar and Low Moor. It can be seen that there is a break in the slope at bankfull discharge for each curve, but the break is much more pronounced at the downstream station. This change in the shape of the flood frequency curve as the flood wave travels from Broken Scar to Low Moor reflects the change in the geomorphic characteristics along the Tees River Basin (Archer, 1989). The relationship of travel time and wave speed to discharge is shown in Figure 3.29. The upland catchment tributary to the Tees River upstream of Broken Scar is characterized by rugged terrain, impermeable soils, and steeper slopes. Within the confines of a narrow valley, main channel and overbank floods are both associated with relatively high velocities. There is little opportunity for storage, and the generated hydrographs are flashy and peaky over the whole range of floods. However, downstream of Broken Scar, floodplains are reasonably well developed, with wider valleys. Overbank floods are, therefore, associated with much lower flow velocities than are floods running within the main channel. Furthermore, floodplains have a geometry and hydraulic roughness that are different from those of the main channel and play a significant role in the attenuation of extreme floods as they propagate down the river basin. The relationship between peak inflow to the reach and peak outflow is shown in Figure 3.30. The storage in the floodplains restrains flood growth and results in a much more pronounced break in the slope of the flood frequency curve. The break in the slope of a flood frequency curve, or dog-leg shape, has long been used as a subjective indication that floods are generated by a mixture of two populations (Potter, 1958). The month of occurrence of the floods, the shape of the probability density function, and the cumulative distribution function estimated using the nonparametric method of Equation 3.1 were also examined.
Figures 3.31 and 3.32 show (a) the shape of the density function, (b) the cumulative distribution function, and (c) the month of occurrence of the flood data at Broken Scar and Low Moor, respectively. While the probability density function at Broken Scar appears to be unimodal, the PDF at Low Moor has a pronounced bimodality. The first mode corresponds closely to the mean (approximately 220 m³/s) of the observed floods that are smaller than bankfull discharge, and the second mode corresponds closely to the mean (approximately 400 m³/s) of the observed floods that are larger than bankfull discharge. However, the identification of mixture in flood data based solely on the shape of the nonparametric density may be subjective. Chui (1991) indicated that the shape of the density function might be sensitive to the value of the smoothing factor h used in Equation 3.1. Nevertheless, Figures 3.33 and 3.34 show that the unimodal and bimodal features of the flood data at Broken Scar and Low Moor, respectively, are persistent for a wide range of the smoothing factor h. The optimum values of h for these two stations are 30 and 40, respectively.

3.5 CONCLUSIONS

This chapter addressed the various physical processes resulting in annual flood data with multiple populations within the three study areas: the Gila River Basin of southeast and central Arizona, the Tees River Basin of northeast England, and British Columbia, Canada. Floods are thought to be influenced by a number of distinct factors, including: seasonal variations in the flood producing mechanisms (rain vs. snowmelt vs. rain-on-snow), snow melting at substantially different ranges of temperature (short-wave vs. long-wave radiation), changes in weather patterns due to El-Nino/La-Nina oscillations resulting from the anomalous warming of the equatorial eastern and central Pacific Ocean, and/or the low frequency or decadal climatic fluctuations, and changes in channel routing due to the dominance of within channel or floodplain flow.
In BC, it was found that in the rain-dominated Coastal areas, annual floods occur mainly in winter and summer. In the snow-dominated Interior and Northern areas, annual floods occur mainly in the summer. It was also demonstrated that mixture was caused by snowmelt generated under different temperature ranges and by ENSO and non-ENSO conditions. The major conclusion from the latter investigation is that there are associations between flooding in BC and atmospheric conditions in the equatorial Pacific Ocean. These associations are strongest in the Interior and in northern BC. Whether precipitation falls as rain or snow is an important aspect of the flood generation process and explains in part the differences in results between the Coastal and Interior regions in relation to our ENSO investigation in BC. In certain elevation ranges, precipitation type is critically dependent on temperature. Very cold locations may receive more snowfall if the temperature warms but still remains below freezing. In contrast, a location near the rain/snow level may change to predominantly rain with small increases in temperature. In the Gila River Basin, the upper tails of the observed flood-frequency curves seem to be dominated consistently by flood-generating processes that are different from those controlling the lower tails. The mixture in the annual flood data is caused by differences in storm types. There is a substantial difference between the monsoon and frontal flood frequency curves for all return periods. The lower tail of the composite annual frequency curve is controlled by monsoonal storms while its upper tail is dominated mainly by frontal storms. Mixture in annual flood data can also be caused by low frequency climatic fluctuations.
To facilitate the appreciation of the differences in the magnitude of floods estimated separately from the pre-1960 and post-1960 periods, the two flood frequency curves have been constructed for the Gila River near Clifton and the San Francisco River at Clifton. The two periods have annual frequency curves that diverge considerably for events larger than the 2-year return period. For this particular stream, the differences are the result of a simultaneous increase of about 50 percent in both the mean and coefficient of variation of annual floods during the post-1960 period. The nonparametric density functions for Arizona stations are not bimodal and have long heavy tails that differ dramatically from the PDFs of BC floods. Each stream is associated with a heavy-tailed multi-modal density function, which further attests to the physical reality that more than one population generates annual floods. At the two stream gauges of the Tees River Basin of northeast England (Broken Scar and Low Moor), mixture is strongly associated with the change in geomorphic characteristics. There is a break in the slope at bankfull discharge for the empirical flood frequency curves at both Broken Scar and Low Moor, but the break is much more pronounced at the downstream station. The upland catchment tributary to the Tees River (upstream of Broken Scar) is characterized by rugged terrain, impermeable soils, and steeper slopes. Within the confines of a narrow valley, main channel and overbank floods are both associated with relatively high velocities. There is little opportunity for storage, and the generated hydrographs are, therefore, flashy and peaky over the whole range of floods. However, downstream of Broken Scar, floodplains are reasonably well developed, with wider valleys. Overbank floods are, therefore, associated with much lower flow velocities than are floods running within the main channel.
Furthermore, floodplains have a geometry and hydraulic roughness that are different from those of the main channel and play a significant role in the attenuation of extreme floods as they propagate down the river basin. In summary, the upper tails of the observed flood frequency curves in the three study areas seem to be consistently dominated by flood generating processes that are different from those controlling the lower tails. This chapter used a number of characteristics of the hydrologic regime to identify flood mixtures and determine the number of physically different populations in a flood data sample. These characteristics were: (i) the break in the slope of the empirical flood frequency curve and/or the bimodality or heavy tail of the density function estimated by means of the nonparametric method, (ii) the histogram of floods by month of occurrence, (iii) antecedent precipitation and temperature indices, and (iv) ENSO/Non-ENSO phenomena. This work implies that the annual flood series are associated with heterogeneous flood frequency distributions. The question remains whether flood estimation will benefit from the recognition that annual floods in these study areas come from heterogeneous distributions. In previous studies, this question has not been answered in any region where mixtures have been identified. Chapter 4 will demonstrate if and when it is operationally and statistically advantageous to use quantile estimation procedures that exploit the presence of mixtures.

Chapter 4 TREATMENT OF MIXTURE DISTRIBUTIONS AT A SINGLE SITE

4.1 INTRODUCTION

Fitting a continuous mathematical distribution to data sets yields a compact and smoothed representation of the flood frequency distribution revealed by the available data, and a systematic procedure for extrapolation to flood discharges larger than those historically observed.
A variety of distribution functions and fitting methods are available for estimating the magnitude and frequency of floods. In the United States, for instance, the guidelines for frequency analysis presented in Bulletin 17B (IACWD, 1982) were established to provide consistency in the federal flood risk management process. In estimating a flood frequency distribution for the American River, the committee believed it was desirable to follow the spirit of these guidelines, although not necessarily the exact letter (American River Committee, 1998). In flood frequency analysis, a basic assumption is that a series of flood peaks is a random sample from a stationary population. However, hydrologists can never be completely sure that this assumption is true. To overcome some of the deficiencies associated with the use of homogeneous distributions, alternative approaches are investigated from a single site flood frequency perspective in this chapter. These alternative approaches, such as classified two-component parametric distributions, unclassified two-component parametric distributions, and nonparametric methods, explicitly recognize the characteristics of probability distributions of floods generated by more than one hydrologic process. The objectives of this chapter are (i) to investigate the implications of using various routinely-used homogeneous distributions on stream discharges for basins that exhibit mixed population characteristics, using long-term hydroclimatic records from the Gila River Basin of southeast and central Arizona in the United States, British Columbia, Canada, and the United Kingdom, and (ii) to demonstrate how frequency models that explicitly account for floods generated by a mixture of two or more populations are both hydrologically and statistically more appropriate in many instances.

¹ Part of this chapter (only Arizona data) has been published as Alila, Y. and A. Mtiraoui. 2002. Implications of heterogeneous flood frequency distributions on traditional stream discharge prediction techniques. Hydrological Processes, 16(5), 1065-1084. A. Mtiraoui analysed the data, interpreted the results, and wrote the first draft of the manuscript. Y. Alila revised and edited the manuscript.

4.2 LITERATURE REVIEW

Numerous probability distributions have been proposed to fit annual maximum floods, dating back to the work of Horton (1913), who first used the normal distribution to describe the behavior of floods. It was soon recognized, however, that annual flood series are often skewed, which led to the use of the log-normal distribution (Hazen, 1914). The development and popularity of many other skewed distributions then followed, with the most commonly used being the extreme value type I (EV1), the generalized extreme value (GEV), the three parameter log-Pearson (LP3), and the three parameter log-normal distributions (Bobee et al., 1993; Kite, 1978; Pilon and Harvey, 1994; Watt et al., 1989). The proponents of each distribution have been able to show some degree of confirmation for their particular distribution by comparing theoretical results and measured values. However, there is no physical and theoretical basis for justifying the use of one specific distribution for modelling flood data (Bobee et al., 1993), and long-term flood records show no justification for the adoption of a single type of distribution (Benson, 1962b). Many studies of distribution selection have been completed in some countries, most notably in the US (USWRC, 1967) and Britain (NERC, 1975; Institute of Hydrology, 1999), resulting in the general adoption of specific distributions.
In Canada, however, such a study has never been completed, and the selection of a distribution has not been given governmental direction. Rather, distribution selection is left to the individual, although the EV1 distribution is somewhat of a practical standard in this country. Its popularity is a result of its simplicity and ease of application, as well as the considerable exposure and demonstration it receives through the Rainfall Frequency Atlas for Canada (Hogg and Carr, 1985). This atlas adopted the EV1 distribution as the standard for extreme rainfall analysis in Canada.

Some of these distributions have been chosen based on conventional goodness-of-fit tests such as Chi-Square, Kolmogorov-Smirnov (Keeping, 1966), and Akaike Information Criterion (Akaike, 1974). Cunnane (1989), in his operational hydrology report for the World Meteorological Organization (WMO), made an excellent review concerning the choice of a distribution in a region. He observed that conventional goodness-of-fit tests such as Chi-Square and Kolmogorov-Smirnov tests are of little value in this context. Other distributions have been selected based on moment ratio diagrams utilizing the coefficient of skewness and kurtosis (Wu and Goodridge, 1974; Watt and Nozdryn-Plotnicki, 1980). This approach has been the subject of much criticism due to the inherent sampling errors in estimating higher order moments that are associated with existing ranges of record lengths (Hazen, 1924; Wallis et al., 1974). The remaining distributions are based on a regional goodness-of-fit test using L-moments (Hosking and Wallis, 1993). This procedure is more robust than the classical hypothesis testing methods as it uses regional, rather than single site data, to discriminate between alternative distributions (Cong et al., 1993). However, this technique does not discriminate between mixture and non-mixture in flood data (Gingras and Adamowski, 1992).
The distributions mentioned previously assume that the annual flood series is a random sample drawn from a single homogeneous population of floods. A number of researchers have shown that this assumption may not be valid and have treated floods as coming from more than one population. Stoddard and Watt (1970) combined two EV1 distributions to model floods in Southern Ontario, Crippen (1978) and Jarret and Costa (1988) combined two LP3 distributions to model floods in Colorado, and Webb and Betancourt (1992) combined three LP3 distributions to fit floods at one station only in Arizona. Singh (1968, 1974) and Singh and Sinclair (1972) used a mixture of two normal distributions to model annual flood peaks. They used a least squares fit to determine the mixture parameters. Waylen and Woo (1987) and Woo and Waylen (1984) modeled annual flood peaks using a compounded extreme value distribution, classifying the data into spring snowmelt and fall rainfall events, and estimating parameters from the classified data in Ontario and British Columbia, Canada, respectively. They assumed that exceedances that occur during the same season are identically distributed and mutually independent.

Cooke and Mostaghimi (1994) developed procedures for fitting heterogeneous distribution functions to mixed population data. These fitting procedures are different from the use of regression methods for fitting multiple parameter distributions. Their fitting procedures do not utilize any moment above second order, and allow for more than an order of magnitude increase in the number of heterogeneous distributions that can be fitted. Rossi et al. (1984) offered the Two-Component Extreme Value distribution (TCEV) as an alternative distribution for modelling annual floods in Italy. The TCEV is motivated on the premise that floods above a threshold come from two independent processes, each occurring according to a compound Poisson process and having exponentially distributed exceedances.
With such a structure for floods above a threshold, annual floods have a TCEV distribution. A random variable with a TCEV distribution is equivalent to the largest of a censored sample from two independent EV1 distributions, with the censoring occurring at zero. Francis (1998) presented an approach that allows the user to increase the information used in estimating river flood quantiles by incorporating non-systematic information in a regional analysis framework. The TCEV was the distribution function employed, as it fits the statistical features of Mediterranean rivers well. However, the explicit recognition that floods come from heterogeneous distributions would affect the flood estimation procedure when the two flood components are shown to be statistically different.

Hirschboeck and Cruise (1989) demonstrated the usefulness of the heterogeneous distribution approach where mixed populations of observed flood data are present. They further used the same regional flood data divided into non-overlapping 30-year periods and computed coefficients of skewness for each 30-year period as described by Matalas et al. (1975). The mean and standard deviation of these skew coefficients were then compared to those of several common single population probability distributions given by Matalas et al. (1975). The work of Hirschboeck and Cruise (1989) illustrated that single population probability distributions, such as EV1, LN, LP3, Pareto, and Weibull, failed to account adequately for the variability of skewness of Louisiana flood data.

Another approach that explicitly recognizes the characteristics of probability distributions of floods generated by more than one hydrologic process, without need for splitting the various populations, is the use of the nonparametric frequency approach. Adamowski (1985) developed this method for single site analysis.
In this approach, there is no need for a priori choice of distribution, and the parameter estimation problem is much less complex. Gingras and Adamowski (1992) used the nonparametric approach for the regionalization of streamflow in the provinces of Quebec and Ontario.

Using long-term hydroclimatic records from the Gila River Basin of southeast and central Arizona in the United States, the Low Moor of the Tees River Basin of northeast England in the United Kingdom, and British Columbia in Canada, the objectives of this chapter are (i) to investigate the implications of using various popular homogeneous distributions on stream discharges for basins that exhibit mixed data, and (ii) to demonstrate how frequency models that explicitly account for floods generated by a mixture of two or more populations are hydrologically more appropriate in many instances.

4.3 RESEARCH METHODS

In this chapter, the fit of the most commonly used parametric flood frequency distributions to data from the three selected study areas was assessed and compared to the fit of the parametric two-component and nonparametric distributions. No attempt was made to apply any of the statistical goodness-of-fit tests to discriminate between the alternative distributions for two reasons. First, deviations between the observed floods and those predicted by parametric distributions are often operationally significant and should not be ignored just because they are statistically insignificant. Second, as demonstrated in Chapter 3, there is ample physical evidence that one of the fundamental assumptions of frequency analysis, namely that the annual floods are identically distributed, is violated; the use of any particular single population probability distribution is therefore not justified, irrespective of any statistical evidence.
We advocate that the selection of the most plausible distribution for flood frequency analysis should be on the basis of hydrological and physical reasoning, as opposed to the systematic application of statistical goodness-of-fit tests. Unless otherwise stated, all flood frequency curves presented here are plotted on normal probability paper with the Cunnane plotting position formula $T = (n + 0.2)/(m - 0.4)$, where $T$ is the return period in years, $n$ is the total number of annual floods in the sample, and $m$ is the rank when the flood observations are arranged in descending order. A sensitivity analysis of the results to the selected plotting position parameter is discussed in section 4.4.4.

4.3.1 Parametric Flood Frequency Analysis

The assumption used in conventional flood frequency analysis is that annual flood series are drawn from a single homogeneous population. The implications of violating such a fundamental assumption on the fit of the most commonly used homogeneous distributions to flood data, and on the prediction of extreme events in the three study areas, are investigated in this chapter. The following homogeneous distributions are used because of their general adoption: log-Pearson type III (LP3), generalized extreme value (GEV), Gumbel (EV1), and Wakeby (Table 5.1). Where applicable, the method of linear moments (L-moments), as described by Hosking (1990), is used for fitting the data to these distributions. Otherwise, the method of maximum likelihood is used. L-moments have gained considerable momentum for application to the frequency analysis of hydrologic extremes, and are now generally recognized as being superior to traditional methods, such as conventional moments or the method of maximum likelihood (Alila, 1999). More recently, however, Klemes (2000) articulated some cautionary notes about the use of L-moments in hydrologic frequency analysis.
He argued that high outliers in a flood-data series are important for extrapolating to long return period events in engineering design, as they define the upper leg of the flood frequency curve. Therefore, by using L-moments, which are less sensitive to these outliers than conventional moments, practitioners may be missing the most important piece of information in the flood data series. If the annual floods in a sample are identically distributed and sampling variability is the cause of outliers (for instance, a 100-year event in a 10-year sample), then they should not be given an undue weight. If any quantitative historic information can be found for any high outlier, reasonably well-established methodologies referred to as flood frequency analysis with historic information could be used (Pilon and Harvey, 1994). However, in the absence of any historic information, such high outliers are often either removed from the sample or simply ignored and, consequently, the use of the conventional moments would either overestimate or underestimate the T-year flood event. In this case, it is more rational to use a method that is less sensitive to outliers in the data, such as the L-moments. On the other hand, when the outliers are the result of differences in flood generating mechanisms between the lower and upper legs of the flood frequency curve, as demonstrated in Chapter 3, the critical issue becomes the distribution selection (homogeneous vs. heterogeneous) and not the fitting technique (moments vs. L-moments). It is considered in this study that the use of a mixture distribution is a more appropriate way of handling the information provided by the outliers.

4.3.2 Parametric Two-Component Distributions Flood Frequency Analysis

Under conditions where floods are believed to be caused by more than one mechanism, the probability distribution of floods could be very different, and mixtures of probability distributions are used.
Two broad classes of parametric two-component distributions exist.

4.3.2.1 Classified Parametric Two-Component Distributions

The first technique divides an annual or partial duration series of floods into two or more sub-samples, with each sample fitted to a conventional homogeneous distribution. Using the assumption of independence of the flood populations and the additive rule of probability, the composite exceedance probability, $F_T$, is estimated by (Webb and Betancourt, 1992):

$$F_T(X > x) = F_1(X > x) + F_2(X > x) - F_1(X > x)\,F_2(X > x) \tag{4.1}$$

where $F_1(X > x)$ and $F_2(X > x)$ are the exceedance probabilities of the two flood populations. In effect, Equation (4.1) states that the two flood generating mechanisms do not occur in the same year simultaneously; for instance, in any single year, floods may be caused by either tropical or monsoon storms but not both. A flood of magnitude $x$ would have a return period $T_1$ if it belongs to the first population and $T_2$ if it belongs to the second population. The return period ($T$) for a flood of magnitude $x$ can, therefore, be calculated as:

$$T = \frac{T_1 T_2}{T_1 + T_2 - 1} \tag{4.2}$$

Similar equations can be developed for a heterogeneous distribution of three or more populations. However, this can increase the complexity of the problem and make the problem intractable. The disadvantage of this method lies in the uncertainties in flood estimation associated with the reduced sample sizes resulting from the sub-division of the original time series of floods into sub-samples. In addition, the technique may not always be feasible in practice because the long-term meteorological data required for the classification of flood events are often not available.

The second technique uses an annual time series for each flood population or generation mechanism (for instance, snowmelt in spring and rainfall in winter), with each annual time series fitted to a homogeneous distribution. Also based on the assumption of independence of the flood populations and the multiplicative rule of probability, the composite exceedance probability, $F_T$, is estimated by:

$$F_T(X > x) = F_1(X > x)\,F_2(X > x) \tag{4.3}$$

Waylen and Woo (1987) and Woo and Waylen (1984) used Equation (4.3) with two EV1 distributions to model annual low-flows and floods generated by mixed processes in Ontario and British Columbia, Canada, respectively. In effect, Equation (4.3) states that two flood peaks occur sequentially each year: for instance, one caused by snowmelt in spring and the second by rainfall in winter. This technique has been criticized by Rossi et al. (1984) because the assumption of identically distributed flood peaks within a season is not realistic. Floods caused by different types of storms might coexist in the same season (Gupta et al., 1976).

4.3.2.2 Unclassified Parametric Two-Component Distributions

A technique that does not require a priori separation of flood processes, as in the two methods of the classified parametric two-component distributions, considers the annual maximum floods to belong to several populations with distinct homogeneous distributions. According to Moran (1959), a heterogeneous distribution is a mixture of $k$ homogeneous component distributions given by:

$$F(x) = \alpha_1 F_1(x) + \alpha_2 F_2(x) + \dots + \alpha_k F_k(x) \tag{4.4}$$

where $F_1(x), \dots, F_k(x)$ are the cumulative distribution functions of the $k$ component distributions, and $\alpha_1, \dots, \alpha_k$ are parameters designating their relative proportions and satisfying the condition $\alpha_1 + \alpha_2 + \dots + \alpha_k = 1.0$. The application of Equation (4.4) requires prior knowledge about the parent homogeneous distribution for each component. It also requires prior knowledge on the number of components that make up the flood mixture. Such information may be obtained from a classification of floods based on generation mechanism, as described in Chapter 3.
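Returning briefly to the classified case, Equations (4.1) and (4.2) can be checked numerically. The sketch below is illustrative only: the function names and the return periods of 50 and 200 years are hypothetical, and it assumes, as the text does, that the two flood populations are independent.

```python
def composite_exceedance(p1, p2):
    """Equation (4.1): composite exceedance probability of two independent populations."""
    return p1 + p2 - p1 * p2


def composite_return_period(t1, t2):
    """Equation (4.2): composite return period, i.e. Equation (4.1) with p = 1/T."""
    return (t1 * t2) / (t1 + t2 - 1.0)


# Hypothetical example: a flood magnitude that is a 50-year event in the first
# population and a 200-year event in the second population.
t1, t2 = 50.0, 200.0
p = composite_exceedance(1.0 / t1, 1.0 / t2)
t = composite_return_period(t1, t2)
print(f"composite exceedance probability = {p:.4f}")
print(f"composite return period = {t:.1f} years")
```

For component return periods greater than one year, the composite return period is always shorter than either component return period, which is why ignoring one of the populations tends to understate the frequency of a given flood magnitude.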
When the hydroclimatic data needed for classifying floods are not available, a procedure for determining the number of distributions into which a heterogeneous distribution could be decomposed may be used (Medgyessy, 1977). However, it should be stressed that an increase in the number of components makes the fitting technique less robust and less accurate, so the number of distributions should be kept to a minimum. For illustration purposes only, and to reduce the complexity of the analysis, a heterogeneous distribution composed of two log-normal distributions will be used in this chapter. The composite exceedance probability, $F_T$, is estimated by (Hawkins, 1974):

$$F_T(X > x) = \alpha F_1(X > x) + (1 - \alpha) F_2(X > x) \tag{4.5}$$

where

$$F_1(x) = \frac{1}{\sigma_1 \sqrt{2\pi}} \int_{-\infty}^{x} \exp\left[-\frac{(x' - \mu_1)^2}{2\sigma_1^2}\right] dx' \tag{4.6a}$$

$$F_2(x) = \frac{1}{\sigma_2 \sqrt{2\pi}} \int_{-\infty}^{x} \exp\left[-\frac{(x' - \mu_2)^2}{2\sigma_2^2}\right] dx' \tag{4.6b}$$

In the preceding equations, $x$ represents the natural log of the annual floods in m³/sec, subscripts 1 and 2 refer to the two component distributions, $\mu$ and $\sigma^2$ are the sample mean and variance, $\alpha$ and $(1 - \alpha)$ are their relative weights, and $F$ denotes the probability that a particular flood $x$ is not exceeded. The parameters $\alpha$, $\mu_1$, $\sigma_1$, $\mu_2$, and $\sigma_2$ are estimated with a nonlinear optimization algorithm (Singh and Nakashima, 1981). The algorithm minimizes the objective function $\sum(\Delta Z)^2$, where $\Delta Z$ equals the difference between the observed probability of the annual floods computed by the Cunnane plotting position equation and the theoretical probability estimated by Equation (4.5) above. The optimization of the objective function $\sum(\Delta Z)^2$ was conducted subject to the following four constraints (Cohen, 1967):

$$\mu = \alpha \mu_1 + (1 - \alpha)\mu_2 \tag{4.7a}$$

$$\sigma^2 = \alpha \sigma_1^2 + (1 - \alpha)\sigma_2^2 + \alpha(1 - \alpha)(\mu_2 - \mu_1)^2 \tag{4.7b}$$

$$g\,\sigma^3 = \alpha m_1 (3\sigma_1^2 + m_1^2) + (1 - \alpha) m_2 (3\sigma_2^2 + m_2^2) \tag{4.7c}$$

$$k_t\,\sigma^4 = \alpha (3\sigma_1^4 + 6 m_1^2 \sigma_1^2 + m_1^4) + (1 - \alpha)(3\sigma_2^4 + 6 m_2^2 \sigma_2^2 + m_2^4) \tag{4.7d}$$

in which $\mu$, $\sigma$, $g$, and $k_t$ are the mean, standard deviation, skewness, and kurtosis of the annual flood series, $m_1 = \mu_1 - \mu$, and $m_2 = \mu_2 - \mu$.
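To make the objective function concrete, the sketch below evaluates the sum of squared differences between Cunnane plotting-position probabilities and the two-component log-normal probabilities of Equation (4.5) for one trial parameter set. It is a minimal illustration under stated assumptions, not the optimization algorithm of Singh and Nakashima (1981): the flood values, the trial parameters, and all function names are hypothetical, the constraints of Equations (4.7a) to (4.7d) are not enforced, and the probabilities are written in non-exceedance form, which is equivalent to the exceedance form because the weights sum to one.

```python
import math


def normal_cdf(x, mu, sigma):
    """Non-exceedance probability of a normal variate (Equations 4.6a and 4.6b)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def tcln_cdf(x, alpha, mu1, sigma1, mu2, sigma2):
    """Two-component mixture in log space (Equation 4.5), with x = ln(flood)."""
    return alpha * normal_cdf(x, mu1, sigma1) + (1.0 - alpha) * normal_cdf(x, mu2, sigma2)


def cunnane_nonexceedance(n):
    """Empirical non-exceedance probabilities for a sample sorted in ascending order.

    Uses T = (n + 0.2) / (m - 0.4), where m is the rank in descending order.
    """
    probs = []
    for i in range(1, n + 1):   # i is the rank in ascending order
        m = n - i + 1           # corresponding rank in descending order
        probs.append(1.0 - (m - 0.4) / (n + 0.2))
    return probs


def objective(floods, params):
    """Sum of squared differences between empirical and mixture probabilities."""
    xs = sorted(math.log(q) for q in floods)
    probs = cunnane_nonexceedance(len(xs))
    return sum((p - tcln_cdf(x, *params)) ** 2 for x, p in zip(xs, probs))


# Hypothetical annual floods (m3/sec) and a hypothetical trial parameter set
# (alpha, mu1, sigma1, mu2, sigma2); neither is taken from the thesis data.
floods = [120.0, 150.0, 180.0, 210.0, 260.0, 300.0, 900.0, 1500.0, 2400.0, 4000.0]
trial = (0.6, 5.2, 0.4, 7.4, 0.7)
print(f"objective at trial parameters: {objective(floods, trial):.4f}")
```

In practice a constrained nonlinear optimizer would call `objective` repeatedly while adjusting the five parameters; only a single evaluation is shown here.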
This technique of fitting a heterogeneous distribution is superior to the conventional maximum likelihood method, which often does not converge for the typical short sample size of annual floods available in practice (Jiang and Kececioglu, 1992). The difficulty with this approach lies in the doubling (for the case of a mixture of two distributions) of the number of parameters that need to be estimated jointly from the same annual flood series and the associated loss of reliability in estimating floods. However, as explained by Equations (4.7a through 4.7d), Singh and Sinclair (1972) ascertained that fitting Equation (4.5) requires only first and second order moments for each component distribution and, therefore, yields parameter estimates in accordance with those computed from the actual data.

4.3.3 Nonparametric Flood Frequency Analysis

The concept of nonparametric flood frequency analysis was introduced, independently, by both S. Yakowitz and K. Adamowski at an American Geophysical Union meeting in 1985. In this approach there is no need for a priori choice of distribution, and the parameter estimation problem is much less complex. It is clear that the nonparametric distribution is a viable alternative to the determination of design flood estimates, yet its potential has not been fully realized. In mathematical terms, the nonparametric frequency distribution has a density function defined by (Adamowski, 1985):

$$f_n(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) \tag{4.8}$$

where $x_1$ to $x_n$ are the observations in an annual flood series, $K(\cdot)$ is an assumed kernel function that is itself a probability density, and $h$ is a smoothing factor. The choice of the kernel function is subjective, but Anderson (1969) and others have shown that this choice is not very critical to the success of the approach.
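The kernel estimator of Equation (4.8) is short enough to sketch directly. The example below is a hedged illustration rather than the thesis implementation: it assumes the unit-variance Epanechnikov kernel adopted in this chapter, a fixed smoothing factor chosen by hand instead of by cross-validation, and hypothetical flood values and function names. The cumulative estimate is obtained by integrating the kernel from minus infinity, and the design flood by inverting that estimate with bisection.

```python
import math

SQRT5 = math.sqrt(5.0)


def kernel(t):
    """Epanechnikov kernel scaled to unit variance (support |t| <= sqrt(5))."""
    if abs(t) > SQRT5:
        return 0.0
    return 3.0 / (4.0 * SQRT5) * (1.0 - t * t / 5.0)


def kernel_integral(t):
    """Integral of the kernel from minus infinity to t."""
    if t <= -SQRT5:
        return 0.0
    if t >= SQRT5:
        return 1.0
    return 0.5 + 3.0 * t / (4.0 * SQRT5) * (1.0 - t * t / 15.0)


def density(x, sample, h):
    """Kernel density estimate of Equation (4.8)."""
    return sum(kernel((x - xi) / h) for xi in sample) / (len(sample) * h)


def cdf(x, sample, h):
    """Kernel estimate of the non-exceedance probability at x."""
    return sum(kernel_integral((x - xi) / h) for xi in sample) / len(sample)


def quantile(p, sample, h, iterations=100):
    """Invert the kernel CDF by bisection to get the flood with non-exceedance p."""
    lo = min(sample) - SQRT5 * h
    hi = max(sample) + SQRT5 * h
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if cdf(mid, sample, h) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical annual floods (m3/sec) and a hypothetical smoothing factor.
sample = [120.0, 150.0, 180.0, 210.0, 260.0, 300.0, 900.0, 1500.0, 2400.0, 4000.0]
h = 200.0
q50 = quantile(1.0 - 1.0 / 50.0, sample, h)
print(f"50-year flood estimate: {q50:.0f} m3/sec")
```

Because this kernel has bounded support, the estimate can never exceed the largest observation by more than $\sqrt{5}\,h$, so any extrapolation beyond the record is controlled entirely by the smoothing factor.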
In this chapter the Epanechnikov optimal kernel (Adamowski and Feluch, 1990) is used:

$$K(t) = \begin{cases} \dfrac{3}{4\sqrt{5}}\left(1 - \dfrac{t^2}{5}\right) & \text{for } |t| \le \sqrt{5} \\ 0 & \text{otherwise} \end{cases} \tag{4.9}$$

The kernel estimate $F_n(x)$ of the cumulative distribution function $F(x)$ is given by:

$$F_n(x) = \int_{-\infty}^{x} \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{t - x_i}{h}\right) dt = \frac{1}{2} + \frac{1}{n} \sum_{i=1}^{n} K^*\left(\frac{x - x_i}{h}\right) \tag{4.10}$$

where $K^*(t) = \int_{0}^{t} K(u)\,du$, and

$$K^*(t) = \begin{cases} -\dfrac{1}{2} & \text{for } t < -\sqrt{5} \\ \dfrac{3t}{4\sqrt{5}}\left(1 - \dfrac{t^2}{15}\right) & \text{for } -\sqrt{5} \le t \le \sqrt{5} \\ \dfrac{1}{2} & \text{for } t > \sqrt{5} \end{cases} \tag{4.11}$$

In flood frequency analyses, the interest is often in the flood quantile function, which can be derived as the inverse of the cumulative distribution function, $F_n(x)$, or through kernel averaging of the sample flood function (Sheather and Marron, 1990).

The estimation of the smoothing factor $h$ is more critical than the selection of a kernel function. Lall et al. (1993) used Monte Carlo simulations to investigate the performance of several well-known methods for calculating the parameter $h$. The smoothing factor has been estimated in this chapter by the least square cross-validation (LSCV) method that minimizes the data-based score function:

$$\mathrm{LSCV} = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} K^{(2)}(x_i - x_j, h_i, h_j) - \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j \ne i} h_j^{-1} K\left(\frac{x_i - x_j}{h_j}\right) \tag{4.12}$$

where $K^{(2)}(\cdot)$ is the convolution of the kernel with itself, and $h_j$ is the smoothing factor associated with $x_j$. According to Adamowski and Feluch (1990), this method gives an optimal estimate of the density in terms of the integral mean square error (IMSE) given by:

$$\mathrm{IMSE} = E \int \left[f_n(x) - f(x)\right]^2 dx \tag{4.13}$$

where $f_n(x)$ is an estimate of the unknown density. Through extensive Monte Carlo simulations, Adamowski (1985, 1989) showed that the nonparametric frequency approach is as accurate as its parametric counterpart when fitting annual flood and low-flow data generated by a homogeneous distribution.

4.4 RESULTS AND DISCUSSION

In this section, the fit of the most commonly used homogeneous flood frequency distributions is assessed using data from the three selected study areas and compared to the fit of the parametric two-component distributions and the nonparametric distribution.
4.4.1 Gila River Basin of Southeast and Central Arizona

In the US, it was found that the most commonly used distribution is the LP3, which is mandated by the US Water Resources Council (USWRC) as the standard flood frequency model under the assumption that the annual flood series are stationary and identically distributed (USWRC, 1982). Figure 4.1 presents the fit of the LP3 distribution. As the annual flood series has relatively long-term records, no regional adjustment was made to the at-site skewness, although this is required by Bulletin 17B of the USWRC (1982). The tail of the LP3 distribution gave a consistently poor fit to the observed floods.

Two recent investigations have suggested that the GEV distribution might be an improved alternative to the LP3 for flood frequency analysis in the US (Vogel and Wilson, 1996; Vogel et al., 1993), particularly when using L-moments. Figure 4.2 shows the theoretical fit of the GEV distribution. It appears that the GEV weights the central part of the distribution too heavily and consequently misses the upper tail of the observed flood frequency curve. The logarithmic scale on the y-axis of the frequency curves condenses the deviations between the theoretical fit and the observed floods. Consequently, an inexperienced viewer may perceive these deviations as very small or insignificant, whereas they could be quite substantial. Recently, many investigators have used the L-moments to re-assess the suitability of commonly used probability distributions as shown in Table 4.2. Although these studies used flood flows from different parts of the world, most of them recommended using the GEV distribution, including two that used US data. A comparison of Figures 4.1 and 4.2 reveals that the GEV does not provide any closer fit to the observed floods than the LP3, particularly in the extreme tail of the distribution.

The fit of the Wakeby distribution at ten streams is shown on Figure 4.3.
This fit is slightly better than that of any of the distributions discussed above but not consistently for all stations. This is particularly interesting because it is commonly presumed that the Wakeby, as a five-parameter model, should have the flexibility to fit a wide range of flood characteristics (Singh, 1979).

The fit of the annual flood data at the 12 streams in the Gila River Basin using Equation (4.5) is shown in Figure 4.4. The fit is superior to that of commonly used distributions presented in this chapter. While the homogeneous distributions consistently either over-predicted or under-predicted the magnitude of the extreme floods, the upper tail of the proposed heterogeneous distribution mirrors the behavior of the empirical distribution more closely than any of the homogeneous distributions. This improved fit at the upper tail of the distribution enhances the flood predictions in the Gila River Basin where annual floods are produced by mixed hydrologic processes. Table 4.3 gives estimates of the 50, 100, and 200-year return period floods according to the various frequency distributions. The heterogeneous distribution of Equation (4.5) gave estimates of floods that differ from those estimated by the homogeneous distributions by as much as 50, 100, and 150 percent for the 50, 100, and 200-year return period events, respectively. These differences are operationally substantial, as they would have important implications on professional practice. Therefore, they should not be ignored, irrespective of their statistical significance.

Finally, the fit of the nonparametric approach (Figure 4.5) is assessed using data from the Gila River Basin and compared to the fit of the most commonly used parametric flood frequency distributions. The nonparametric approach is based on local averages of flood observations within the annual time series. This approach has the capacity of giving proper weight to tail behavior (Lall et al., 1993).
This is contrary to the parametric approach, which typically weights the central part of the distribution more heavily. As shown in Table 4.3, the nonparametric distribution gave estimates of floods that differ from those estimated by the homogeneous distributions by as much as 50, 100, and 150 percent for the 50, 100, and 200-year return period events, respectively. Again, these differences should not be ignored even if they are not statistically significant. They are operationally very substantial, and from a professional practice perspective, they would have important implications.

4.4.2 Low Moor of the Tees River Basin of northeast England, United Kingdom

In this section, the GEV distribution is used as it is commonly adopted in the UK. The fit of the GEV distribution by the method of L-moments is shown for Low Moor of the Tees River Basin in Figure 4.6a. The fit of the mixture of the two log-normal distributions is shown in Figure 4.6b. This fit is superior to the GEV distribution presented in Figure 4.6a. While the single population distribution over-predicts the magnitude of the extreme floods, the upper tail of the two-component log-normal (TCLN) distribution of Equation 4.5 mirrors the behavior of the observed floods more closely than the GEV distribution. This is the consequence of a physically more appropriate distribution that enhances the flood predictions at Low Moor, where annual floods are produced by a mixture of two populations.

The fit of the nonparametric frequency distribution for the Tees River at Low Moor is shown in Figure 4.6c. The fit of the curve to the data is consistently superior to the GEV distribution, particularly at the upper tail of the distribution.
In this study, using a relatively large sample size of annual floods, it has been shown how the commonly used GEV distribution does not provide a satisfactory fit to the observed data, while the nonparametric approach along with the TCLN distribution seems to give a better fit.

4.4.3 British Columbia, Canada

The Gumbel (EV1) is the most commonly used distribution in BC. The fit of the EV1 by the method of L-moments is shown for six typical streams of British Columbia in Figure 4.7. It appears that the EV1 misses the lower and the upper tails of the observed flood frequency curve.

The frequency analysis of the annual flood data at the same six streams using the TCLN distribution (Equation 4.5) is shown in Figure 4.8. The fit is superior to that of the EV1 distribution. While the homogeneous distribution consistently over-predicted the magnitude of the extreme floods, the upper tail of the proposed heterogeneous distribution mirrors the behavior of the empirical distribution more closely than this EV1 distribution. The improved performance at the upper tail of the distribution therefore enhances flood predictions in BC, which is the result of introducing physical hydrologic reasoning into the selection of the most appropriate type of flood frequency distribution for the combined sample of the annual flood series.

The fit of the nonparametric frequency distribution obtained from the least square cross validation technique for the same six streams is shown in Figure 4.9. The fit of the curve to the data is consistently superior to the EV1 distribution, particularly at the upper tail of the distribution.

Table 4.4 gives estimates of the 50, 100, and 200-year return period floods according to the three frequency distributions. The TCLN of Equation 4.5 and the nonparametric distributions gave estimates of floods that differ from those estimated by the EV1 distribution by as much as 15, 30, and 45 percent for the 50, 100, and 200-year return period events.
4.4.4 Summary

In summary, none of the traditional frequency models provided a satisfactory fit to the observed data in the three study areas, particularly at the upper tail of the flood distribution. It may be argued that the visual assessment of the poor fit of the homogeneous distributions could be attributed to the choice of the plotting position formula. Admittedly, the empirical probabilities of extreme flood events in a data sample may be affected by the plotting position (Viessman and Lewis, 1996). However, it is believed that the conclusions drawn from this study are valid irrespective of the plotting position used in the analysis. Figure 4.10 presents the theoretical fit of the LP3 distribution for the Salt River near Roosevelt, plotted with six different plotting position formulas. It is clear that the LP3 distribution provides a poor fit to the observed floods irrespective of the plotting position formula.

It may be argued that it is not unusual for heterogeneous distribution models, such as the ones used in this study, to fit flood data well because the model can be considered to be a five-parameter frequency distribution. Increasing the number of parameters often has the effect of improving model fit. However, it has been shown with the five-parameter Wakeby distribution that a great level of model complexity does not guarantee a satisfactory fit to the data. The difference between the Wakeby distribution and the ones used in this study is that the Wakeby is homogeneous while the others are heterogeneous. The selection of the log-normal model to describe the component distribution is a pragmatic decision made to restrict the number of parameters that need to be estimated from the combined sample and consequently minimize over-fitting that may arise from over-parameterization. Any other two-parameter distribution could have been selected as long as the combined distribution gave a satisfactory fit to the annual flood series.
Parameter estimation in the treatment of heterogeneous distributions of floods remains a challenge. The parameter values estimated using the least square method adopted in this study are assumed to mirror the true characteristics of the parent heterogeneous distribution of the annual floods in the above study areas. After all, this method does not require the use of more than the first and second order moments of each component distribution as indicated by the constraint Equations (4.7a to 4.7d). The skewness and kurtosis are more reliably estimated from the combined sample of annual floods, which has a longer record length than the sub-samples associated with each flood generating mechanism. Admittedly, however, some over-fitting may cause these parameters to be artifacts of the particular flood sample. The robustness of this parameter estimation technique is best assessed through a Monte Carlo simulation experiment, particularly using the newly developed L-moment statistics as opposed to the conventional moments. Such Monte Carlo simulations are investigated in Chapter 5.

The fit of the nonparametric frequency distribution to the three study areas is consistently superior to any of the parametric homogeneous models discussed above, particularly at the upper tail of the distribution. It is commonly recognized that the nonparametric method has a limited ability to extrapolate beyond the observed data in the flood sample. Its usefulness has therefore been questioned on the premise that when a large amount of data is available, one is able to discriminate between alternative parametric distributions. Using relatively large sample sizes of annual floods, it has been shown how none of the commonly used parametric distributions provides a satisfactory fit to the observed data, while the nonparametric approach seems to be a viable alternative.
4.5 CONCLUSIONS

In this chapter, it has been demonstrated that none of the commonly used homogeneous distributions provides a satisfactory fit to the observed floods in the three study areas, particularly at the upper tail of the empirical distribution. This was the case even for the five-parameter Wakeby distribution, which is the most flexible and versatile of the homogeneous distributions considered. The criteria for selecting an appropriate flood frequency model should not be based on goodness-of-fit testing alone but should also consider physical hydrologic reasoning. A heterogeneous distribution that explicitly accounts for the fact that floods are generated by more than one hydrologically distinct mechanism produced a superior fit that mirrors the upper tail of the empirical distribution much better than several homogeneous distributions. It has been demonstrated how the use of a more appropriate type of distribution (heterogeneous as opposed to homogeneous) might have major implications for the prediction of extreme flood events.

In flood frequency analyses with heterogeneous frequency models, two challenging decisions need to be made. Firstly, one must determine how many component distributions should be used. Hydroclimatic data can often be employed to decide on the number of flood populations. However, the use of more than two components may make the fitting technique less robust and less accurate. One remedy for this problem is to use a regional approach for fitting the heterogeneous distribution, as opposed to a single site approach. Such a technique has been justified in several recent studies (Fiorentino et al., 1985; Gabriele and Arnell, 1991). Secondly, one must select an appropriate parent distribution for each component. Whereas conventional heuristic arguments that are based on asymptotic theory can still be used in support of a particular homogeneous distribution, many of these arguments are not physically based and are largely subjective.
More research on the selection of flood frequency distributions using the physical nature of hydrological processes is urgently needed.

In this chapter, the parameters estimated using the least squares method for fitting the mixture of the two log-normal distributions are assumed to mirror the true characteristics of the parent heterogeneous distribution of the annual floods. However, some overfitting may cause these parameters to be artifacts of the particular flood sample. The robustness of this parameter estimation technique is assessed through a Monte Carlo experiment in the next chapter. Also, the goodness-of-fit of the nonparametric distribution within the range of the observed data does not reflect its ability to extrapolate to floods of larger return periods. The reliability of extrapolating the nonparametric distribution beyond the highest observed flood on record is also assessed through a Monte Carlo experiment in the next chapter.

Chapter 5

ASSESSMENT OF THE PERFORMANCE OF VARIOUS FLOOD MIXTURE DISTRIBUTIONS USING MONTE CARLO SIMULATIONS

5.1 INTRODUCTION

Flood frequency analysis refers to those studies that seek to define the probability distribution of flood peaks and volumes. Although various distribution functions have been suggested, no one function has been accepted universally. Underlying the studies is the assumption that on an annual basis, floods are independently and identically distributed. Thus, reference can be made to the T-year flood, defined as the flood whose magnitude is exceeded with probability 1/T in any given year, where T, measured in units of years, is referred to as the return period. Because the underlying distribution of floods is unknown, only an estimate of the T-year flood conditioned on an assumed distribution of floods can be provided. In one way or another, the estimate of the T-year flood is used to design structural and non-structural measures for purposes of flood damage reduction.
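As a concrete reading of the definition above, the T-year flood is the quantile whose annual non-exceedance probability is F = 1 - 1/T. The following is a minimal sketch that evaluates this quantile under an assumed Gumbel (EV1) distribution. The function names and the location and scale values are hypothetical and chosen only for illustration; they are not taken from any of the study sites.

```python
import math


def non_exceedance_probability(T):
    """Annual non-exceedance probability F = 1 - 1/T for return period T (years)."""
    return 1.0 - 1.0 / T


def ev1_quantile(T, location, scale):
    """T-year flood under an assumed Gumbel (EV1) distribution.

    Uses the EV1 inverse form x(F) = location - scale * ln(-ln F).
    """
    F = non_exceedance_probability(T)
    return location - scale * math.log(-math.log(F))


# Hypothetical EV1 parameters (m^3/s), for illustration only.
LOCATION, SCALE = 500.0, 150.0

for T in (2, 10, 100, 1000):
    F = non_exceedance_probability(T)
    print(f"T = {T:5d} yr  F = {F:.4f}  Q_T = {ev1_quantile(T, LOCATION, SCALE):8.1f}")
```

Replacing `ev1_quantile` with the inverse form of another assumed distribution changes the estimate of the T-year flood, which is why the choice of distribution matters for design.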
Different choices for the assumed distribution of floods lead to different values of the estimate of the T-year flood, where the differences in the values increase with T. Flood frequency analysis has sought to define a best estimate of the T-year flood, based primarily on various criteria of best fit of different assumed distributions to observed flood sequences. Because different estimates of the T-year flood may lead to different designs of flood reduction measures in terms of their type, size, and operation, government agencies in many countries, such as the U.S. Water Resources Council, have sought to choose a particular distribution of best fit for use by all federal agencies concerned with floods (Benson, 1968).

The choice of assumed distribution of floods matters only if different choices lead to different designs. Due to the complexity of design processes and the fact that the processes are conditioned on the different aversions to flood risks by those who have a stake in the reduction of flood damages as well as on site-specific factors, it is difficult to assess the sensitivity of designs to different choices of the assumed distribution of floods. Nonetheless, some assessment of sensitivity would seem to provide a better basis for choosing among different assumed distributions of floods than the use of criteria of best statistical fit to observed flood sequences.

Hydrologists recognize that floods are caused by more than one mechanism and that the probability distribution of floods of differing mechanisms could be very different. Even though much of the work presented in Chapter 4 included methods for conducting frequency analysis on floods classified by mechanisms, simulation studies are needed to determine whether there is any gain from such a procedure. To date, no such study has been reported in the literature.
To gain some insight into the sensitivity of the design floods to the choice of assumed flood distribution, Monte Carlo experiments have been undertaken to assess the performance of the best model for fitting annual flood data generated by mixed mechanisms. These simulations are used to address the following questions:

(i) How much do mixture distributions improve the reliability of flood estimates when the heterogeneity results in a flood frequency distribution with high L-skewness and a heavy-tailed probability density function (characteristics of flood mixtures in arid and semi-arid climates of Arizona)?

(ii) How much do mixture distributions improve the reliability of flood estimates when the heterogeneity results in a flood frequency distribution with a smaller value of L-skewness and a bimodal probability density function (characteristics of flood mixtures in humid climates of British Columbia)?

(iii) In either of the above two cases, which heterogeneous distribution gives the best performance in fitting mixtures in annual floods (i.e., nonparametric or mixed two-component parametric distributions)?

(iv) How do these heterogeneous distributions perform over the range of sample sizes and return periods that are used in practice?

5.2 LITERATURE REVIEW

In general, flood estimates are subject to both modelling and sampling errors, which limit the accuracy of predictions of stochastic flood models. Distinguishing between the different flood models, over a range of conditions, generally requires a simulation approach. Monte Carlo based sampling experiments have been employed in the literature to assess the relative performance of models in terms of certain statistical indices. Simulation procedures fall into two categories: parametric and nonparametric Monte Carlo simulations. In a parametric Monte Carlo simulation experiment, a frequency distribution and its parameter set are first selected.
The selected distribution is then used to synthetically generate flow data sets that are much longer in record than would have been available in practice. The synthetic generation of flows is conducted through a random number generating technique whereby a simulated flow value can be any positive number but a probability is associated with each flow magnitude (Haan, 1977). The simulated data, considered representative of a real-world situation, then serve to enhance our understanding of the performance of alternative flood frequency models.

Benson (1952) and Nash and Amorocho (1966) used Monte Carlo based simulated data distributed in accordance with an EV1 population mainly to study the standard errors of flood estimates for varying sample sizes. Benson (1952) states that with a sample size of 12, the mean can only be estimated to within ±25% of the true mean 95% of the time. In evaluating the performances of a number of estimators, Wallis et al. (1974) used sampling experiments to describe the random sampling behavior of commonly used statistics such as mean, standard deviation, and coefficient of skewness for samples derived from various distributions including LP3 and EV3. They found that the estimates of the coefficient of skewness are not only subject to large sampling errors but are both biased and bounded.

Adamowski (1989) indicated, through parametric Monte Carlo simulation using synthetic flood data that are identically distributed, that nonparametric flood frequency distributions provide more accurate flood estimates than homogeneous distributions for large return periods because the right-hand tail is not dependent on lower flood values. For smaller return period floods, both parametric and nonparametric distributions are equivalent. Bardsley (1989) performed a Monte Carlo simulation with a Gumbel kernel and found good agreement with the theoretical values, although he was concerned about underestimation beyond the 100-year return period.
He selected the Gumbel kernel because a positively skewed form was deemed desirable if the density was not to decrease too rapidly beyond the largest data point. Lettenmaier and Potter (1985) generated random samples based on EV1, LN2, and LP3 distributions to describe a regional flood model in which the flood statistics were considered to be dependent on drainage area.

Following the introduction of the two-component extreme value (TCEV) distribution of Rossi et al. (1984), Beran et al. (1986) discussed the statistical properties of the TCEV distribution based on both observed U.K. flood data and Monte Carlo generated samples and found that the TCEV distribution is suitable as a regional flood model. Unlike the USGS index flood method, this regional model has been parameterized in terms of the coefficient of variation (L-Cv) and was found to perform well only for low values of L-Cv. Arnell and Beran (1987) carried out a series of simulation experiments to compare the robustness of the regional TCEV estimation procedure against the other index type regional estimators, including the regional Probability Weighted Moment (PWM) estimators of the GEV and Wakeby distributions. They found that in terms of bias, TCEV performed well, but in terms of variance the Wakeby distribution was more successful. The focus of this work was not to compare the performance of heterogeneous to homogeneous frequency distributions but to compare different types of regional flood models using different distributions. The synthetic flood data used in the Monte Carlo simulation work by Arnell and Beran (1987) were identically distributed and therefore had no mixture.

The main problems with parametric Monte Carlo simulation are that the parametric distribution of most hydrological variables is unknown and the choice of the parent distribution is usually arbitrary.
Although the robustness of the conclusions of a Monte Carlo simulation study to the hypothesis of a given parent distribution may be studied (Slack et al., 1975; Kuczera, 1982; and Haktanir, 1992), parametric simulation has been criticized for comparing statistical models in an artificial setting (Klemes, 1986; Potter, 1987; and Bobee et al., 1993).

Nonparametric Monte Carlo simulations based on the bootstrap technique (resampling with or without replacement from observed flood data) may seem more attractive since no hypothesis on the statistical distribution of the variable appears necessary to simulate data. However, when resampling with replacement from a reference sample, it is implicitly assumed that the finite reference sample is equivalent to the whole population. When the reference sample is small, this hypothesis may lead to unreasonable conclusions (Rubin, 1981). Flood records in practice are short, and it would be a challenge to conduct nonparametric Monte Carlo simulations using such short records. Fortin et al. (1997) proposed a rational approach for comparing procedures that involve the use of a combination of statistical distribution and a parameter estimation method. The method, based on nonparametric Bayesian preposterior analysis, consists of simulating samples by resampling from a reference series using Polya's model. The method used in their study to compare distribution and estimation combinations has one important limitation: it cannot be used directly to study return periods larger than the size n of the reference samples.

All of the above studies are based on Monte Carlo simulation using homogeneous distributions. To the author's knowledge, the only study that has considered heterogeneous distributions in a Monte Carlo experiment context is that of Leytham (1984).
This study generated mixture data using a two-component normal distribution and explored the small sample properties of both parameter and quantile estimates of fitted mixture distributions. His main objective was to evaluate the performance of the classified two-component flood frequency analysis (Equation 4.1) in comparison to the performance of the unclassified flood frequency analysis (Equation 4.5). He found that for samples of the size generally available for work in mixed flood frequency analysis, parameters estimated from unclassified data tend to be inaccurate and greatly inferior to estimates from the corresponding classified data. Despite this fact, he also found that the properties of quantiles estimated from classified and unclassified samples were in reasonable agreement. In many practical applications, appropriate climatic data are often not available to classify events into different flood populations. As discussed in Chapter 3, this suggests that the ability to classify a sample may not be crucial for the wide range of problems in which flood quantiles are of principal concern.

As illustrated by these numerous studies, Monte Carlo based simulated data can be used to study the predictive abilities of stochastic flood models for a range of realistic population conditions. From this, it is possible to select the most robust stochastic flood model (or models), which has the ability to predict its estimates with minimum bias and maximum efficiency (Cunnane, 1978).

The first objective of this chapter is to demonstrate which heterogeneous distribution can perform better in fitting a mixture of annual floods that result in a flood frequency distribution exhibiting either a high L-skewness and a heavy tailed probability density function or a smaller value of L-skewness and a bimodal probability density function.
The second objective of this chapter is to determine how these heterogeneous distributions perform over the various ranges of sample sizes and return periods.

5.3 RESEARCH METHODS

To obtain meaningful conclusions from Monte Carlo simulation, it is important to properly design such simulations (Bobee et al., 1993). Monte Carlo simulation is applied in this chapter following the procedure summarized in the flowchart of Figure 5.1 with the purpose of evaluating the performance of the best model for fitting annual flood data generated by mixed mechanisms. To estimate the performance of a selected distribution for a given sample size n and return period T, one or more baseline samples of observed data are analysed and used as a reference to simulate many samples of size n. The Monte Carlo simulation procedure is then applied to all simulated samples and $Q_T$ is estimated from each of these simulated samples. Finally, the estimated values of $Q_T$ are compared with a reference value (the population value of $Q_T$), giving a measure of performance of the distribution selection. As explained in the next sub-section, this performance measure depends mainly on the choice of the simulation model.

5.3.1 Simulation design

As a result of the shortness of flood records available in practice and because of the limitations associated with the nonparametric Monte Carlo simulation techniques discussed in Section 5.2 above, we adopted a parametric approach to Monte Carlo simulation. To address the main problem associated with parametric Monte Carlo simulation, namely the sensitivity of Monte Carlo simulation results to the parent population used to generate the data, Monte Carlo simulation has been conducted using several generators: the two-component lognormal distribution (TCLN), the two-component extreme value distribution (TCEV), and the Wakeby distribution. The methodology presented below is applied only to TCLN (the same methodology would apply for other generators).
The Monte Carlo simulation procedure outlined in Figure 5.1 and implemented in this thesis is detailed in the following eight steps:

1) Choose the generation method among TCLN, TCEV, and Wakeby. These distributions were the recommended choice for heterogeneous data (Singh, 1968, 1972, and 1974; Fiorentino, 1985; and Houghton, 1979). It is proposed that the heterogeneity of annual floods be herein considered as essentially composed of two-component parametric homogeneous distributions. The Wakeby distribution is often viewed as a homogeneous distribution that is characterized by five parameters. However, Singh (1979) demonstrated how mathematical manipulation of the Wakeby distribution inverse form (x = x(F)) may lead to the interpretation that it is a mixture of two parametric frequency distributions in the form of:

$$x = x_1 + x_2 + e, \qquad x_1 = -a(1 - F)^{b}, \qquad x_2 = c(1 - F)^{-d}$$

where F is the uniform (0, 1) variate. The equation is written so that the distribution parameters a, b, c, d are always positive and the parameter e, which is a location parameter, is sometimes positive (Houghton, 1978). While the Wakeby as a mixture distribution can be handled mathematically more easily because its inverse function form (x = x(F)) is analytically defined, this is not the case in most other mixed frequency models composed of two parametric distributions. For instance, for the TCLN distribution, the following equation applies:

$$F(x) = \alpha F_1(x) + (1 - \alpha) F_2(x), \qquad 0 < \alpha < 1 \tag{5.1}$$

Subscripts 1 and 2 refer to the first and second component distributions; and $\mu_1$, $\mu_2$, $\sigma_1$, and $\sigma_2$ are the means and square roots of variances of the two component log-normal distributions. To estimate the annual flood for a given probability, F(x), one would normally use the inverse function of Equation (5.1) [x = x(F)]. However, there is no analytical form available for Equations (5.2) or (5.3). Therefore, no closed-form inverse function exists.
To overcome this challenge, Equation (5.1) can be manipulated into the form of:

$$F(x) - [\alpha F_1(x) + (1 - \alpha) F_2(x)] = 0 \quad \text{or} \quad F(x) - \alpha F_1(x) - (1 - \alpha) F_2(x) = 0 \tag{5.4}$$

For a given (known) F(x), Equation (5.4) is only a function of x, and the problem of estimating the flood x for a given probability F(x) reduces to finding the root of Equation (5.4). This is done by standard root finding techniques such as the Newton-Raphson or bisection methods. The bisection method was used since it is guaranteed to converge as long as the root lies between the two initial starting values of the numerical solving algorithm. To generate a random flood value in our Monte Carlo study, a random number between 0 and 1 is chosen. This value is assigned to be the F(x) in Equation (5.4), and the corresponding value of x is calculated. This x is the randomly generated flood value. This process is repeated as often as required, creating a randomly generated flood series. This method is general and can be applied to a mixture of any two parametric homogeneous distributions for which x = x(F) is not available or cannot be derived. For the Wakeby generator used in this Monte Carlo simulation, by contrast, the analytical form is used in the generation.

2) For a specific sample size, decide on the number of replicate samples (5000 samples in this study) to be generated and the parameter values of the flood frequency distribution used as a generator in the Monte Carlo experiment. These values (Table 5.1) were taken to be the population parameters representative of observed mixture data from Arizona and BC, as detailed subsequently in Section 5.3.2.

3) Calculate the true population values ($Q_2$, $Q_5$, $Q_{10}$, $Q_{20}$, $Q_{50}$, $Q_{100}$, $Q_{200}$, $Q_{500}$, and $Q_{1000}$) using the Wakeby, TCEV, and TCLN generating distributions.
To overcome the problem associated with the possibility that Equation (5.1) may have multiple solutions, and therefore the risk that the optimal solution may not be obtained, we have also double-checked the population values of the quantiles using the following method. In the case of $Q_{100}$, for instance, a set of 1000 samples was generated using the population parameters of the mixture, and each of these had a sample size of 100 years. The highest value from each of the 1000 samples was taken and the average of all the maximum values was computed to find the true value of $Q_{100}$.

4) Decide on the heterogeneous distributions to be assessed in this Monte Carlo simulation. The performance of several heterogeneous distributions in fitting the synthetically generated mixture data was assessed. These heterogeneous models are the TCLN, TCEV, and nonparametric distributions. To quantify the gains in adopting heterogeneous distributions and for comparative purposes, the synthetically generated data were also fitted to some of the most commonly used homogeneous distributions, namely EV1, GEV, and LP3.

5) For each generated flood sample, calculate the parameters of each of the distributions selected in step (4) above. For the TCLN distribution, the method of least squares described in Chapter 4 was used. For the TCEV distribution, the maximum likelihood method was used to estimate the parameters. This method is known to have convergence problems. The method used to calculate the NP distribution was described in Chapter 4. For the EV1, GEV, and Wakeby distributions, the method of L-moments was used as described in Chapter 4, where the advantages of this method are fully explained. L-moment solutions are not available for the LP3 distribution, and in this case the method of maximum likelihood was used.
6) Use the fitted distributions of step (5) above to calculate the sample estimates of $Q_2$, $Q_5$, $Q_{10}$, $Q_{20}$, $Q_{50}$, $Q_{100}$, $Q_{200}$, $Q_{500}$, and $Q_{1000}$. The use of various return periods T allows the assessment of the effect of return period on the reliability of a flood frequency analysis. In the case of the TCLN and TCEV distributions, the flood quantiles are estimated using the numerical method detailed in step (3). For the GEV, Wakeby, LP3, EV1, and NP distributions, flood quantiles are estimated using an analytical method, since an analytical form of the inverse function (x = x(F)) exists for these frequency distributions.

7) Repeat steps (2) through (6) for various sample sizes (n = 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100). This allows an assessment of the effect of sample size on the reliability of flood frequency analysis.

8) Evaluate the relative performance of different flood estimators by quantifying the errors in the simulation experiment. The errors computed were the bias and root mean square error (RMSE) for each $Q_T$. Bias refers to the difference between a sample estimate of a statistic and its true population value. An approach is said to be biased if it yields estimates that are on average significantly larger or smaller than the population values. The precision is measured by the variance and refers to the dispersion of estimates about their mean. However, the RMSE (used in this study and referred to as a measure of accuracy) reflects the dispersion of estimates about the population value and comprises both the bias and precision as defined by:

$$\mathrm{MSE} = \mathrm{Bias}^2 + \mathrm{Variance} \tag{5.5}$$

$$\mathrm{RMSE} = (\mathrm{MSE})^{1/2} \tag{5.6}$$

5.3.2 Scenarios of Monte Carlo Simulation

The reader is reminded that the synthetic data in the Monte Carlo simulation are generated to reflect the characteristics of observed mixtures in flood data. Four scenarios are therefore used in this study: two cases that represent BC data, and two that represent Arizona data.
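The generation procedure of step (1) in Section 5.3.1 can be sketched as follows. This is a minimal illustration rather than the implementation used in the thesis: it assumes that each component is parameterized by the mean and standard deviation of ln(x), the mixture parameters in the usage line are hypothetical, and the Wakeby inverse form follows the five-parameter expression attributed to Houghton (1978). The function names are introduced here for illustration only.

```python
import math
import random


def lognormal_cdf(x, mu, sigma):
    """Log-normal CDF; mu and sigma are ASSUMED to be the mean and sd of ln(x)."""
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def tcln_cdf(x, alpha, mu1, sigma1, mu2, sigma2):
    """Two-component log-normal mixture CDF (the form of Equation 5.1)."""
    return alpha * lognormal_cdf(x, mu1, sigma1) + (1.0 - alpha) * lognormal_cdf(x, mu2, sigma2)


def tcln_quantile(F, params, lo=1e-9, hi=1e9, rel_tol=1e-10):
    """Solve tcln_cdf(x) - F = 0 for x by bisection (the form of Equation 5.4).

    The mixture CDF is non-decreasing in x, so bisection converges whenever
    the root lies inside the bracketing interval [lo, hi].
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if tcln_cdf(mid, *params) < F:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
        if hi - lo <= rel_tol * mid:
            break
    return 0.5 * (lo + hi)


def generate_tcln_sample(n, params, rng):
    """Draw n synthetic annual floods by inverting the mixture CDF at uniform variates."""
    return [tcln_quantile(rng.random(), params) for _ in range(n)]


def wakeby_quantile(F, a, b, c, d, e):
    """Analytical Wakeby inverse form: x = e - a(1 - F)^b + c(1 - F)^(-d)."""
    return e - a * (1.0 - F) ** b + c * (1.0 - F) ** (-d)


# Hypothetical mixture parameters (alpha, mu1, sigma1, mu2, sigma2), illustration only.
PARAMS = (0.7, 5.0, 0.3, 6.0, 0.5)
sample = generate_tcln_sample(20, PARAMS, random.Random(42))
q100 = tcln_quantile(1.0 - 1.0 / 100.0, PARAMS)
```

Bisection is slower than Newton-Raphson but needs no derivative and cannot leave the bracket, which matches the reason given in step (1) for preferring it.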
Three alternative ways of looking stochastically at a heterogeneous flood series are presented in Figure 5.2. Figure 5.2a depicts a nonstationary stochastic process with a time-varying mean. The theoretical distribution for the complete time series on the right side of the figure is a complex one which bears no resemblance to any one X(t) distribution. This is the classic representation of a flood series composed of multiple populations, each with a different mean. Figure 5.2b depicts a nonstationary stochastic process with a time-varying variance. Here the mean of the populations composing the mixture remains the same but the variance, or spread around the mean, changes from one component population to another. Figure 5.2c depicts a nonstationary stochastic process with a time-varying mean and variance. The resulting theoretical distribution for the entire time series is complex and reflects multiple populations. Admittedly, the component distributions making up mixtures may also have either the same or different higher order moments (such as L-Cs and L-Ck). However, these differences would result in the same mixture characteristics as displayed in Figures 5.2a to 5.2c. The four scenarios described below and used in this Monte Carlo simulation study represent hypothetical mixtures, but reflect the characteristics of observed flood data. While real data are available, it is often only in relatively short periods of record. The data generated for each scenario are based on real statistical parameters. These parameters are computed from actual flood data that reflect mixture in Arizona and BC. The parameters for the four scenarios are detailed in Table 5.1. Once data are generated using the parameters of Table 5.1, the L-Cv and L-Cs of the generated time series were calculated for each scenario. This is a way of verifying that the L-Cv and L-Cs of the generated mixtures are reflective of observed flood data in BC and Arizona. 
Synthetic data were used to display Figures 5.3 to 5.6 as opposed to observed data because the observed data record is short, and as a result, sampling variability could cause the bimodality in the PDF shape. Synthetic data have a much larger number of observations, and if the bimodality holds, then we can be certain that this is not the result of sampling variability but is rather a mixture in flood data. The PDF and CDF plots presented in Figures 5.3 to 5.6 for the four scenarios used in this Monte Carlo simulation experiment were generated using the TCLN distribution. The following is a detailed description of each scenario:

Scenario #1 is a case that presents a clear bimodal shape (Figure 5.3). This bimodality is the result of a mixture of two populations with different means and different variances. This sort of combination produced annual flood time series with relatively low L-Cv (0.16) and L-Cs (0.19) and is typical of flood data in BC.

Scenario #2 is another case that presents a multimodal shape (Figure 5.4). This bimodality is the result of a mixture of two populations with the same means but different variances. This sort of combination produced annual flood time series with relatively low L-Cv (0.18) and L-Cs (0.22) and is also typical of flood data in BC.

Scenario #3 presents a clear heavy tailed shape (Figure 5.5) that reflects a type of PDF shape that occurs in Arizona. This heavy tail is the result of different means and different variances. This sort of combination produced annual flood time series with relatively high L-Cv (0.33) and L-Cs (0.50). These are the highest values noted in this study.

Scenario #4 is another case that represents Arizona data with a heavy tail shape where the means are the same but the variances are different (Figure 5.6). This sort of combination produced annual flood time series with a relatively high L-Cv (0.35) and a similarly high L-Cs (0.44).
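The L-Cv and L-Cs values quoted for each scenario can be computed from a generated series with the standard sample L-moment estimators. The sketch below uses the unbiased probability weighted moment estimators; it is an illustration under that assumption, not the code used in the thesis, and the function name is hypothetical.

```python
def sample_l_moment_ratios(data):
    """Return (L-Cv, L-Cs) of a sample with at least three values.

    Uses the unbiased probability weighted moments b0, b1, b2 of the
    ordered sample x[0] <= x[1] <= ... <= x[n-1].
    """
    x = sorted(data)
    n = len(x)
    if n < 3:
        raise ValueError("at least three observations are required")
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    l1 = b0                       # first L-moment (mean)
    l2 = 2.0 * b1 - b0            # second L-moment (scale)
    l3 = 6.0 * b2 - 6.0 * b1 + b0 # third L-moment
    return l2 / l1, l3 / l2


# A symmetric toy sample has zero L-skewness.
l_cv, l_cs = sample_l_moment_ratios([1.0, 2.0, 3.0, 4.0, 5.0])
```

Applied to a long synthetic series from the generator, the two ratios can be compared against the target values listed for the scenario.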
5.4 RESULTS AND DISCUSSIONS

Sections 5.4.1 through 5.4.6 refer to Monte Carlo simulation results when the synthetic mixture data are generated by TCLN. Sensitivity of the Monte Carlo simulation results to the generating parent distribution is discussed separately in sub-section 5.4.7.

The reader is reminded that the TCEV as a fitting mixture distribution was disregarded from any further analysis. This is because the fitting of the TCEV by the maximum likelihood method failed most of the time, particularly for the small sample sizes (i.e., < 100 years) used in this Monte Carlo simulation experiment. However, the TCEV was used as a generator of mixture. Relevant results are discussed in Section 5.4.7.

All distributions performed equally well for return periods less than 10 years. Therefore, only the Monte Carlo simulation results for return periods longer than 10 years are presented and discussed in the remainder of this chapter.

5.4.1 Scenario #1

Figure 5.7 displays the variation of the RMSE (%) with the return period when the sample size is 20 (approximately the average record length of observations in BC) for Scenario #1. This is the case where the means and variances of the two populations are different and the resulting PDF shape is clearly bimodal. While the NP, TCLN, and EV1 gave the best results among all distributions, the NP marginally outperformed EV1 and TCLN. The NP performs better when compared to the other distributions, as it is more suited for bimodal shapes than those with a heavy tail PDF. The EV1 also performs reasonably well because it is a two-parameter frequency distribution with a constant theoretical L-Cs value of 0.17 (low), a value that is very close to the L-Cs for the scenario (0.19). The TCLN also performs reasonably well because it is the same distribution used to generate the synthetic data in the Monte Carlo simulation experiment.
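The constant L-Cs of 0.17 quoted for the EV1 distribution can be checked from the standard L-moments of the Gumbel distribution. The derivation below is a short worked check based on the usual textbook expressions, with the Gumbel scale parameter written as $\beta$ (a symbol introduced here, not used in the thesis); it is not taken from the thesis itself.

```latex
\lambda_2 = \beta \ln 2, \qquad
\lambda_3 = \beta \left( 2\ln 3 - 3\ln 2 \right) = \beta \ln\!\left(\tfrac{9}{8}\right),
\qquad
\tau_3 = \frac{\lambda_3}{\lambda_2} = \frac{\ln(9/8)}{\ln 2} \approx 0.1699 .
```

Because the scale parameter cancels, the ratio is the same for every EV1 distribution, which is why the fitted EV1 cannot adapt to parent distributions with a markedly different L-Cs.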
The GEV, LP3, and Wakeby distributions did not perform as well for return periods larger than 10 years, because they (a) are homogeneous distributions and (b) are five- and three-parameter distributions. Moments of order higher than two are known to be very sensitive to the sampling variability associated with short record lengths (~20 years).

Table 5.2 shows the variance, bias and MSE for all fitted distributions. It can be seen that the variance contributes to the MSE substantially more than the bias for all return periods shorter than 100 years. The NP and TCLN are virtually unbiased for return periods less than 200 years and therefore the accuracy in these two cases is dictated by the variance of the quantiles.

5.4.2 Scenario #2

In the case of Scenario #2, where the means of the two populations are the same but the variances are different and the resulting PDF is clearly multimodal, all distributions (apart from TCLN and EV1) produced quantiles with approximately the same accuracy for all ranges of return periods (Figure 5.8). However, TCLN and EV1 outperformed the other distributions, particularly for higher return periods. There is consistency in the results between Scenario #1 and Scenario #2 in that EV1 and TCLN are still reasonably reliable models for fitting the mixture. However, the small change in the population values of L-Cv and L-Cs made a substantial change in the performance of the various models (in particular the NP distribution). The multimodality of the PDF in Scenario #2 explains why the performance of the NP model rapidly deteriorates as the return period increases. The NP is known to have poor extrapolating capabilities beyond the observed record.

As shown in Table 5.3, TCLN was virtually unbiased for all return periods; EV1 and NP became highly biased as the return period increased beyond 100 years. This bias has contributed to the MSE of the EV1 and NP quantiles substantially more than the variance.
5.4.4 Scenario #3

Scenario #3 is representative of Arizona data. Figure 5.9 displays the variation of the RMSE (%) with the return period when the sample size is 20. This is a case where both the means and variances of the two populations are different. The PDF in this case has a heavy-tailed shape and a higher value of L-Cs (0.50), representative of arid to semi-arid climates. The NP distribution gave the worst performance among all distributions. This is not surprising since the parent distribution used to generate the synthetic data is highly skewed (heavy-tailed PDF). The NP is known to have poor extrapolation capabilities under these conditions (small sample size with high skewness). The EV1 also did not perform well because its theoretical L-Cs (0.17) is much lower than the L-Cs (0.50) of the parent distribution. The TCLN distribution gave better performance than the LP3, NP, and EV1, particularly for return periods greater than 200 years. This might be explained by the fact that the data used in the Monte Carlo simulations were also generated by the TCLN. It is important to note that the GEV and Wakeby distributions gave very similar performances and that both outperformed all other distributions. It is believed that these two distributions outperformed the TCLN because they are fitted by L-moments while the TCLN is fitted by the method of least squares.

The bias, variance, and MSE are given in Table 5.4. Comparing Tables 5.2 and 5.3 to Table 5.4, it is concluded that the bias and variance are much higher for Scenario #3 than for Scenarios #1 and #2. This is one more manifestation of the substantial change in the performance of the frequency models as a result of the difference in the L-Cv and L-Cs of the parent generating distribution. The accuracy of EV1 and NP is dominated by the bias for higher return periods. For all other distributions, the accuracy of the quantile is dominated by the variance term.
The TCLN, Wakeby, and GEV are the least biased among all fitted distributions.

5.4.5 Scenario #4

Scenario #4 is the second of the two cases that represent Arizona data. This is a case where the means of the two populations are the same and the variances are different. The PDF in this case has a heavy-tailed shape and a high value of L-Cs (0.40) that is representative of arid to semi-arid climates. As in Scenario #3, the NP gave the worst performance, particularly at higher return periods (Figure 5.10). This is not surprising since the parent distribution used to generate the synthetic data is highly skewed (heavy-tailed PDF). As mentioned previously, the NP is known to have poor extrapolation capabilities under these conditions (small sample size with high skewness). The EV1 also did not perform well because its theoretical L-Cs (0.17) is much lower than the L-Cs (0.44) of the parent distribution. The performance of the TCLN distribution improved over the LP3, NP, and EV1 as the return period increased beyond 100 years. It is important to note that GEV and Wakeby outperformed all other distributions, particularly for higher return periods (> 100 years). As explained for Scenario #3, it is believed that these two distributions outperformed the TCLN because they are fitted by L-moments while the TCLN is fitted by the method of least squares.

The bias, variance, and MSE are given in Table 5.5. Comparing Tables 5.2 and 5.3 to Table 5.5, it is concluded that the bias and variance are much higher for Scenario #4 than for Scenarios #1 and #2. As mentioned for Scenario #3, this is one more manifestation of the substantial change in the performance of frequency models as a result of the difference in the L-Cv and L-Cs of the parent generating distribution. The accuracy of EV1 and NP is dominated by the bias for higher return periods. For all other distributions, the accuracy of the quantile is dominated by the variance term.
The TCLN, Wakeby, and GEV are the least biased among all fitted distributions.

5.4.6 The influence of sample size on the accuracy of flood quantiles

In order to check the influence of sample size on the accuracy of quantiles under the various frequency distributions, the RMSE is plotted as a function of sample size for one return period at a time. Results in this subsection were consistent for all return periods. Therefore, only the results for the 100-year return period quantile are presented and discussed below.

Figure 5.11, which represents Scenario #1, based on BC data, displays the variation of the RMSE with sample size for the 100-year flood quantile. In this plot, the RMSE is presented as a percentage of the flood quantile itself. It can be seen that the accuracy of the 100-year flood estimate improves substantially as the sample size increases. However, the improvement becomes marginal as the sample size increases beyond 60 years. The NP distribution is persistently more accurate than all other distributions for the whole range of sample sizes. However, in comparison to the other distributions, the TCLN and EV1 performed differently for different ranges of sample sizes. For instance, Figure 5.11 shows that the LP3, Wakeby, and GEV outperformed the TCLN and EV1 for sample sizes larger than 60 years. However, for sample sizes smaller than 60 years, the TCLN and EV1 performed better than the LP3, Wakeby, and GEV distributions.

Figure 5.12, which represents Scenario #3, based on Arizona data, displays the variation of the RMSE with sample size for the 100-year flood quantile. In this plot, the RMSE is also presented as a percentage of the flood quantile itself. It can be seen that the Wakeby and GEV distributions were the most accurate among all distributions for the whole range of sample sizes. In contrast to the results of Scenario #1 displayed in Figure 5.11, the relative performance of all distributions was consistent for all sample sizes.
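The diminishing return from longer records can be reproduced in a simplified setting. The sketch below is only indicative: it assumes a pure Gumbel parent (not the mixtures of Scenarios #1 to #4) fitted by an EV1 through L-moments, and the location, scale, and function names are hypothetical.

```python
import math
import numpy as np

EULER_GAMMA = 0.5772156649

def ev1_quantile_lmom(x, T):
    """T-year quantile of an EV1 (Gumbel) fitted by L-moments."""
    x = np.sort(x)
    n = x.size
    b0 = x.mean()
    b1 = np.sum(np.arange(n) / (n - 1) * x) / n
    alpha = (2 * b1 - b0) / math.log(2)
    xi = b0 - EULER_GAMMA * alpha
    return xi - alpha * math.log(-math.log(1 - 1 / T))

def rmse_percent(n, T=100, loc=100.0, scale=30.0, n_sim=2000, seed=0):
    """RMSE of the fitted T-year quantile, as a percentage of the true quantile."""
    rng = np.random.default_rng(seed)
    true_q = loc - scale * math.log(-math.log(1 - 1 / T))
    samples = rng.gumbel(loc, scale, size=(n_sim, n))
    est = np.array([ev1_quantile_lmom(row, T) for row in samples])
    return 100.0 * math.sqrt(np.mean((est - true_q) ** 2)) / true_q

for n in (20, 40, 60, 80, 100):
    print(n, round(rmse_percent(n), 1))
```

In this simplified case the RMSE falls roughly in proportion to the inverse square root of the sample size, so each additional decade of record yields a smaller gain than the previous one.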
5.4.7 Sensitivity of the Monte Carlo simulation results to the generating parent distribution

The Monte Carlo simulation results discussed above were based on the TCLN as the generating parent distribution. To check whether the results are sensitive to the generating distribution, the Monte Carlo simulations were repeated using the TCEV and Wakeby as generating parent distributions. It was found that the results are not sensitive to the generating distribution. One scenario from BC and one from Arizona were used to demonstrate this.

Figure 5.13 represents Scenario #1 of BC data, and it displays the variation of RMSE (%) with the return period for all fitted distributions and for all three generating methods (TCLN, TCEV, and Wakeby) when the sample size is 20. The EV1 distribution was consistently the best, regardless of the generating method used.

Figure 5.14 represents Scenario #3 of Arizona data, and it displays the variation of RMSE (%) with the return period for all fitted distributions and for all three generating methods (TCLN, TCEV, and Wakeby) when the sample size is 20. The Wakeby and GEV distributions were consistently the best, regardless of the generating method used. While the TCLN distribution performs equally well for the TCLN and TCEV generators, it fails when using the Wakeby generator.

5.5 CONCLUSIONS

This chapter analysed the statistical performance of various distributions, so as to ascertain which frequency model was suitable if mixed hydro-climatological processes were present in the observed flood series in Arizona and BC. The results of the performance analysis, in terms of the accuracy of flood quantiles as measured by the RMSE, suggest that for high L-skewness and a heavy-tailed probability density function (characteristics of flood mixtures in the arid and semi-arid climates of Arizona) the GEV, Wakeby, and TCLN distributions consistently perform well over NP, EV1, and LP3.
These conclusions are the same for sample sizes between 20 and 100. The results in this chapter demonstrate that when flood data are generated by a mixture, the LP3 distribution recommended by US federal agencies is not valid.

For characteristics of flood mixtures in the humid climates of British Columbia, where the heterogeneity results in a flood frequency distribution with a smaller value of L-skewness and a bimodal probability density function, the EV1 and NP distributions perform better than the others. These results of the performance analysis in terms of bias and RMSE suggest that when the flood events result from different populations, the EV1 and NP would be better models to describe mixed flood populations. Results in this chapter support the use of EV1 by several BC studies (Russell, 1982; Loukas and Quick, 1995; and Waylen and Woo, 1982, 1983) but are in disagreement with others that recommend the use of distributions such as LP3 and LN3 (e.g. Coulson, 1991).

Chapter 6
SPATIAL SCALE EFFECTS ON REGIONAL FLOOD CHARACTERISTICS

6.1 INTRODUCTION

Over the years, several methodologies have been developed in regional flood frequency analysis that are based on some fundamental assumptions of scale invariance. Spatial scale aspects are explicitly or implicitly introduced in regional flood frequency analysis by making assumptions on how the L-coefficient of variation (L-Cv) and L-coefficient of skewness (L-Cs) of floods change with basin size. These spatial scaling assumptions of scale invariance might well be one of the most significant sources of error in the estimation of floods at ungauged catchments. However, in the literature of regional hydrology, very little effort has been given to investigating this source of error. The objectives of this chapter are:
(i) to investigate how flood frequency characteristics vary with the size of catchment. In other words, how do dimensionless flood statistics such as L-Cv and L-Cs vary with the spatial scale?
(ii) to investigate how the relationship between these flood statistics and the size of the catchment varies with climate, physiography, and hydrologic regime;
(iii) to test the statistical and operational significance of these scaling relationships; and
(iv) to highlight plausible physical explanations of these scaling patterns.

6.2 LITERATURE REVIEW

Scale issues extend from millimetres to thousands of kilometres in space and from milliseconds to geological eras in time. Based on this observation it is not surprising that scale issues in hydrology are increasingly gaining momentum (e.g., Wood et al., 1990; Bloschl and Sivapalan, 1997; Gupta and Waymire, 1998; and Sivapalan et al., 2001).

The scaling behavior of the L-Cv of peak flows has long been a topic of debate, and its beginning can be traced back to the work of Dawdy (1961), who noted that in some regions of the U.S. the slope of the flood frequency curve, and hence the L-Cv, tends to decrease with increasing catchment size. This finding was of particular significance because the L-Cv was generally assumed to be constant in hydrologically homogeneous regions, and its consistency is critical to the use of the index flood approach of peak flow estimation (Dalrymple, 1960) that was being advocated at that time.

The L-Cv, which is the L-moment analogue of the coefficient of variation (the ratio of the standard deviation to the mean) for a series of flood peaks, represents the degree of variability in the flood frequency distribution. Therefore it is one of the key parameters in regional flood frequency analysis (Bloschl and Sivapalan, 1997). The variation in L-Cv with the basin scale has important implications for the scaling theory of flood hydrology. Of primary interest is the change that occurs in the L-Cv as the basin scale changes and the processes that control this scaling pattern.

Some investigators have used an empirical approach to investigate the spatial scaling of the flood statistics.
These regional analyses of floods have made a significant contribution to our understanding of the scaling of hydrological fluxes. Smith (1992) studied the variation of the Cv with area for 104 stations in the central Appalachian region, which includes the Piedmont and Valley and Ridge physiographic provinces in Virginia and Maryland. The author hypothesized a pyramid-shaped scaling pattern, with the variation in the annual floods peaking for catchments with areas of between 50 km² and 100 km², as shown in Figure 6.1. In this study, the issue of homogeneity was given only minor consideration and therefore little effort was placed on delineating the flood data into homogeneous hydrological regions. Smith (1992) mentioned the possibility of heterogeneity in the central Appalachian data as a cause for the scaling trend.

Hosking and Wallis (1997) also examined the same 104 stations of the Appalachian data set used by Smith (1992). Using various statistical tests to measure the heterogeneity, they showed that for smaller basins, less than 60 mi² (155 km²), the flood frequency distribution is determined by more factors than drainage area alone. They found that it was necessary to include the gauge elevation, latitude, and longitude, as well as the basin area, in a cluster analysis procedure to define several homogeneous regions. Hosking and Wallis (1997) concluded that drainage area alone was unable to explain the between-site variation in the frequency distribution of annual maximum streamflows.

Cathcart (2001) is the only study that attempted to delineate a study region based on scaling. He conducted an extensive linear moment analysis of annual flood data in Oregon (USA) to investigate scale aspects of watershed response under various climate and physiographic conditions. He delineated the State of Oregon into twelve regions that are considered to be sufficiently hydrologically homogeneous and found three different scaling patterns of the L-Cv.
The L-Cv demonstrated decreasing, increasing, and V-shaped patterns with increasing scale. The trend of this V-shaped diagram (Figure 6.2) suggests that for smaller basins there is a decrease in the L-Cv as the basin area increases, while for larger basins there is an increase in the L-Cv as the basin area increases. This result is in direct contrast to the pyramid-shaped trend proposed by Smith (1992) and Gupta et al. (1994) as the regional variation in L-Cv.

A decreasing trend across a range of basin scales is consistent with the findings of Dawdy (1961). The pattern of a decreasing L-Cv with increasing basin area is likely attributable to the L-Cv's sensitivity to baseflow. Many of the trendlines have very shallow slopes, which is attributed to the moderating effect of snowmelt. In large basins the baseflow tends to comprise a greater proportion of flood flows than in small basins, and therefore the overall variability of floods is smaller.

The increasing trend is opposite to that which is typically expected and, as it is evident in only one region, may simply be an artifact of the available data. However, it is also possible that it is a true pattern, and this hypothesis is supported by the plausibility of physical explanations, which involve physiographic effects, rainfall intensity effects, and the mixture of flood mechanisms. Goodrich et al. (1997) stated that the L-Cv may increase with basin size due to an increase in the nonlinearity of basin response patterns as channel runoff processes start to dominate hillslope runoff effects. An increasing L-Cv may also be related to variable storm area coverage and the relative degree of mixture in corresponding flood events. Spatial average rainfall values decrease with increasing basin scale, while at the same time variations in the synchronization of sub-basin responses start to increase.
As the spatial coverage of storms is more complete, more commonly occurring in small basins, the annual peak flows more frequently approach the peak flow capacity of the watershed. However, in large basins, the peak flow capacity of the watershed is rarely approached, resulting in a fairly tight clustering of annual peak flows, with a few high outliers. Another plausible cause for the observed increasing L-Cv pattern is the mixing of rainfall and snowmelt flood data, which is expected to increase with basin area. Large basins are expected to experience a greater mixture of snowmelt and rainfall flood events than small basins because they typically extend over a greater range in elevation. Consequently, the upper and lower portions of a large basin are more likely to experience different climatic conditions, and accordingly produce floods by different mechanisms. For instance, the upper area of a basin may be largely responsible for fresh snowmelt floods while the lower area may produce the bulk of winter rainfall generated flows. This phenomenon was documented by Jarrett and Costa (1988), who delineated peak flow data on the basis of basin elevation in the Colorado Rocky Mountains. Other investigators have focused on the numerical modelling approach to investigate the spatial scaling of the flood statistics. Robinson and Sivapalan (1997a) used a simple derived-flood frequency model to investigate possible physical explanations for the two-phase (v-shaped) scaling pattern of Smith (1992). They attribute the increase of L-Cv with area for small catchments to the scaling behavior of the ratio of storm duration to catchment response time, and relate the decrease of L-Cv with area for large catchments to the spatial scaling of rainfall. They do admit, however, that their model involved a number of simplifying assumptions, including the exclusive use of Hortonian-type flow, and that the results largely reflect these assumptions. 
Bloschl and Sivapalan (1997b) revisited the problem with a more complex derived flood frequency model and a different empirical data set. The data comprised peak flows from 489 catchments in Austria that demonstrate a similar two-phase pyramid scaling pattern of the L-Cv, with a peak corresponding to catchment areas of approximately 200 km². They found that different processes affect the L-Cv in different ways, and that there is no unique explanation of the observed relationship between L-Cv and catchment scale. Rather, the different trends result from the complex interplay between processes and their relative importance at different scales and under different climate regimes. They concluded that much of the variability of L-Cv amongst catchments is due to runoff processes rather than rainfall variability. For instance, they determined that increasing the baseflow always decreases the L-Cv, and as large basins tend to have a larger relative baseflow component, this results in a significant decrease in the L-Cv with increasing basin area. In addition, it was found that increasing channel travel times with catchment scale translates into a decreasing trend in the L-Cv with increasing area for small basins, and an increasing trend for large basins. Interestingly, this V-shaped pattern is contrary to the peaked pattern delineated by Smith (1992) but consistent with the empirical findings of Cathcart (2001).

Also, recognizing the limitations of the Robinson and Sivapalan (1997a) analysis, Jothityangkoon and Sivapalan (2001) extended this model by allowing for multiple runoff processes, considering the spatial variability of rainfall, and incorporating the effects of antecedent moisture conditions. They found that the L-Cv is larger whenever there is a multiplicity of runoff processes, as opposed to when a single process dominates.
Furthermore, the L-Cv was shown to increase with catchment area when multiple runoff processes are present in basins dominated by slow subsurface flow, but was conversely shown to decrease in basins dominated by fast runoff.

The models discussed above have progressively become more sophisticated, but they all involve a number of simplifying assumptions, and therefore the results must be treated in the correct context. The current state of knowledge about the factors that influence runoff generation is quite limited and much work is still required before a full understanding of this subject can be gained (McDonnell, 2003 and Sivapalan, 1996). It is doubtful that the current level of rainfall-runoff model sophistication will permit the reproduction of all observed patterns in the L-Cv, which result from complex interactions among numerous influencing factors such as climate, physiography, subsurface conditions, ground cover, and geomorphology.

It was found that the distinct scaling behavior of the L-Cv, which is discussed in detail in Section 6.7, should also persist in the L-Cs, as some studies have shown empirically that there is a monotonically increasing relationship between L-Cs and L-Cv (Svoboda, 1974 and Frind, 1969). The L-Cs and L-Cv are affected in the same way as a result of this empirically proven relation.

The scaling behavior of the L-Cs of floods has received little attention because definite trends are not commonly detected in empirical data sets (Matalas et al., 1975). This is either because scaling relations do not exist, or because the sampling variability associated with third-order moment sample statistics is so pronounced that any trends may be obscured. As a result, the variation of the skewness has been the topic of only limited interest since the early 1970s.
The first significant study into the skewness of runoff was conducted by Klemes (1970), who used a non-linear reservoir model to demonstrate that negatively skewed runoff can result in basins with large storage capacities, even in cases where the distribution of input precipitation is positively skewed. Svoboda (1974) also used a similar approach to examine the effects of storage on the distribution parameters of peak flows, and concluded that as storage increases, the coefficient of skewness decreases and eventually becomes negative when the storage gets sufficiently large. This finding suggests that small basins should tend to exhibit higher skewness values than large basins, as large basins typically have relatively larger storage capacities. It also suggests that skewness should increase with increasing storm severity (McCuen and Hromadka, 1988 and McCuen, 2001), as relative storage capacity is lower with less frequent events.

Hebson and Wood (1986) offered the next significant contribution to the skewness debate when they used a dimensionless flood frequency model to examine the effects of relative climatic and catchment scales on flood frequency response. They concluded that for a given climate, the frequency curves for small catchments tend to be more highly skewed than for large catchments. This finding supports the work of Svoboda (1974), as catchment size and storage are typically related. McCuen and Hromadka (1988) and McCuen (2001) found that many of the common simple rainfall-runoff models imply that the coefficient of skewness decreases as storage increases. They also determined that skewness is likely a function of location due to the spatial variations of types of storm events and the skewness values of the resulting rainfall intensity-duration-frequency curves.
Because L-Cv and L-Cs are affected by many climatic and physiographic parameters other than the size of catchment, this chapter addresses plausible physical explanations of these scaling patterns.

6.3 RESEARCH METHODS

The only way of revealing the real effect of scale on flood statistics, such as L-Cv and L-Cs, is to have a long-term record of streamflow at a network of hydrometric stations that are nested within the same river basin. Ideally, the river basin should be reasonably small to keep to a minimum the effect of other physiographic and climatic parameters on the flood statistics. However, in reality this is not possible because there is no such nested set of hydrometric stations across a large enough range of spatial scales. As a result, in this thesis, the traditional long-term data at non-nested hydrometric stations will be used in an opportunistic way to reveal the scaling behavior of regional flood characteristics. Long-term hydrometric data in a study area that is as uniform and consistent in physiography and climate as possible will be investigated. This is an indirect way of adapting the definition of homogeneity proposed by Gupta et al. (1994), in which a region is considered homogeneous if variations amongst probability distributions of floods for different sites within a region can be explained solely by variations in scale. Cathcart (2001) is the only one who tested the extent to which homogeneous regions could be delineated based on this definition, using flood data from the State of Oregon (Gupta used this definition in a theoretical framework but never with real data). Also, there have been no formal statistical tests for verifying this type of homogeneity within a region.

The following are some of the criteria for selecting the study areas used in this chapter and the corresponding hydrometric data sets:
(i) The study area should be small enough to minimize the effects of factors other than scale such as climate and physiography.
Empirical evidence suggests that drainage area is the most significant indicator variable (Benson, 1964; Pitlick, 1994; and Sumioka et al, 1997) and often the only significant variable (Pilgrim, 1983 and Gupta and Dawdy, 1995) of regional flood behavior. Typically, little reduction in standard error is achieved by incorporating more variables into the regression (Sumioka et al., 1997). This situation is particularly evident when the regression analyses are conducted on hydrologically homogeneous regions, and especially so if regional homogeneity is defined according to scaling behavior. The homogeneous condition ensures that the climatic and physiographic factors that affect peak flows, other than scale, are relatively uniform and therefore computationally insignificant. Even when hydrologic homogeneity is not perfectly achieved, as is always the case in empirical studies, the consideration of drainage area alone is often largely sufficient. This situation is not surprising, given that drainage area is related to many of the other variables typically considered in regional multiple regression studies, such as basin and channel slopes (Gray, 1961 and Cherkauer, 1972), channel length (Lystrom, 1970 and Pitlick, 1994), mean elevation of the study area (Jarrett and Costa, 1988 and Tarboton et al, 1989) and mean annual precipitation (Pitlick, 1994). As small basins are often located in the headwaters of watersheds, they tend to have steeper slopes, higher elevations, and greater precipitation than large basins, which are more likely to extend into the lower reaches of watersheds. In addition, as area is a function of length and width, large areas tend to be wider and have longer channel lengths. Consequently, these variables are indirectly considered by simply using drainage area (Benson, 1962a and b). (ii) The study area should have a large enough number of hydrometric stations to avoid spurious correlation between flood statistics and drainage area. 
As will be discussed in the results section, this proved to be a challenge in this study.
(iii) The record length at each hydrometric station should be long enough to reduce random variability in the sample statistics of floods. Sampling variability has the potential of obscuring any scale-related signature on flood statistics. All stations with less than 10 years of record should be eliminated from the data set. This 10-year period was selected somewhat arbitrarily, but it has some precedent (Burn et al., 1997) and it served as a reasonable balance between maximizing the number of stations for analysis and minimizing the sampling error associated with small sample size.
(iv) Gauged catchments within the study area should span a large enough range of spatial scales to facilitate an investigation into the scaling behavior of peak flows. This will permit meaningful analysis over a full range of scales, as watershed response has been shown to be substantially different at different scales. Large watersheds are more likely to experience numerous events of similar size than small basins.
(v) The hydrometric data should have been collected during the same climatic period. A basic assumption of flood frequency analysis is that there is no temporal trend in the streamflow data caused by long-term climatic changes. Shifts (jumps) and trends in flood-data series can be produced gradually or instantaneously. For example, a large clear-cut in a basin may quickly result in a shift, whereas a gradual forest insect infestation or climate change may produce trends in the flood-data series. In regional frequency analysis, trends in flood data caused by climate variability can be treated in different ways. For instance, one could use data from only the hydrometric stations with a common record length during a particular period where the climate could be considered stable.
However, doing so could be at the expense of reducing the spatial and temporal data coverage of the study region. In BC, for instance, the bulk of the streamflow data has been collected during the last 40 years and therefore no attempt was made to reduce the hydrometric data from the various stations to a common recording time period. Church (1997) used this same period (1961-1990) as the global period for his analysis in a recent regionalization study.

To investigate the effect of the above factors on the scaling relationships, different data sets from study areas of contrasting climates, physiographies, and hydrologic regimes were chosen in this chapter. Through the careful empirical analysis of flood data from Arizona, California, Colorado, and British Columbia, plausible physical explanations for the various empirical scaling relations are given. A detailed description of each of these areas was provided in Chapter 2.

The outline of the research method is as follows:
(i) Estimate the flood statistics from the annual flood series using the L-moments. L-moments are similar to conventional moments but have a number of advantageous statistical properties. Consequently, they have gained considerable momentum world-wide in the analysis of hydrologic extremes (Hosking and Wallis, 1997). L-moments suffer less from sampling variability than conventional moments and are more robust to outliers in the data, and therefore enable more secure inferences to be made from small samples about an underlying probability distribution (Hosking, 1990). These properties are important when analysing the often sparse streamflow databases encountered in practical applications.
(ii) Plot the flood statistics against drainage area and conduct the ad-hoc homogeneity test used by Cathcart (2001) for each study area.
(iii) Develop and compare the simple regional regression plots for data groupings of all areas for assessing the scaling behavior of the L-Cv and L-Cs (plots of Cv/Cs vs.
catchment size). The regression plots were developed with semi-logarithmic scales. This approach was favoured over a more complex modelling approach, such as the use of a derived flood frequency model, due to reservations about the ability of a model to accurately simulate natural scaling patterns.
(iv) Conduct a simulation experiment and draw box-plots to see if the trends of the L-Cv are persistent. The simulation experiment consists of using the GEV distribution to synthetically generate a large number of samples (100) with the same sample size, mean, L-Cv, and L-Cs as the observed data at each gauged watershed. At each station, once the samples were generated, L-moment statistics were computed for each sample. The mean, L-standard deviation, and the minimum and maximum values of L-Cv were computed using the 100 samples generated for each gauged watershed. These values were used to display the box-plots. The box-plot allows exploration of the data and the drawing of informal conclusions about the L-Cv on the L-Cv vs. drainage area plot. At each gauged watershed, the five-number summary consists of the mean, the quartiles, and the minimum and maximum values of the synthetically generated L-coefficients of variation (Figure 6.3).

Below, it is demonstrated that, by giving careful consideration to this definition and approach in delineating hydrologically homogeneous regions for regional flood frequency analysis, convincing scaling relationships can be obtained for the L-Cv (and L-Cs) of floods. Further, it is shown that the climatic and physiographic controls affecting these scaling patterns can be identified.

6.4 RESULTS

To investigate the scaling behavior of floods for different climate regimes and physiography, annual peak flow data were analysed for study areas in Arizona, California, Colorado, and British Columbia.
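The simulation experiment of step (iv) in Section 6.3 can be sketched as follows. This is a hedged illustration rather than the code used in this thesis: it assumes Hosking's approximation for the GEV shape parameter k (valid for k not equal to zero), unbiased probability-weighted-moment estimators, and hypothetical station statistics (a record length of 30 years, a mean of 250, an L-Cv of 0.25, and an L-Cs of 0.20). All function names are invented for the example.

```python
import math
import numpy as np

def l_moment_ratios(x):
    """Return (mean, L-Cv, L-Cs) from unbiased probability-weighted moments."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    j = np.arange(n)  # j = rank - 1
    b0 = x.mean()
    b1 = np.sum(j / (n - 1) * x) / n
    b2 = np.sum(j * (j - 1) / ((n - 1) * (n - 2)) * x) / n
    l1, l2, l3 = b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0
    return l1, l2 / l1, l3 / l2

def gev_from_l_moments(mean, l_cv, l_cs):
    """GEV parameters (xi, alpha, k) matching the given L-moment statistics."""
    l2 = l_cv * mean
    c = 2.0 / (3.0 + l_cs) - math.log(2) / math.log(3)
    k = 7.8590 * c + 2.9554 * c * c          # Hosking's approximation
    g = math.gamma(1.0 + k)
    alpha = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    xi = mean - alpha * (1.0 - g) / k
    return xi, alpha, k

def simulate_l_cv(mean, l_cv, l_cs, n, n_samples=100, seed=0):
    """L-Cv of n_samples synthetic GEV records, each of length n."""
    rng = np.random.default_rng(seed)
    xi, alpha, k = gev_from_l_moments(mean, l_cv, l_cs)
    u = rng.random((n_samples, n))
    x = xi + alpha * (1.0 - (-np.log(u)) ** k) / k   # GEV quantile function
    return np.array([l_moment_ratios(row)[1] for row in x])

def box_plot_summary(values):
    """Minimum, quartiles, mean and maximum used to draw one box."""
    q1, q3 = np.percentile(values, [25, 75])
    return {"min": values.min(), "q1": q1, "mean": values.mean(),
            "q3": q3, "max": values.max()}

# Hypothetical station: 30 years of record, mean 250, L-Cv 0.25, L-Cs 0.20
print(box_plot_summary(simulate_l_cv(250.0, 0.25, 0.20, n=30)))
```

Repeating this for every gauged watershed and plotting each summary against drainage area gives a picture of how much of the scatter in L-Cv could be due to sampling variability alone.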
In British Columbia, annual peak flow data were analysed for three physiographic regions: the Coast Zone, the Fraser and Thompson Plateaus (part of the Interior Plateau), and the Columbia and Southern Rocky Mountains (Figure 2.3). The Northern and Central Plateaus and Mountains and the Great Plains area in the Northeast are not included in this study due to the scarcity of hydrometric stations and because the three selected regions sufficiently highlight extremes in the scaling behavior of the L-Cv in the Province.

There is considerable scatter of the actual data points around the regression lines of L-Cv versus drainage area for most regions. Sampling variability caused by the relatively short record length of the annual flood series may explain a significant portion of this scatter. While this issue can be explored through Monte Carlo analysis, some of the scatter probably arises because factors other than drainage area are ignored in the regression. It was found that L-Cv for annual maximum daily flows varies substantially with catchment scale as the drainage area increases, and this variability is different in different physiographic regions (Figure 6.4).

As the above analysis was conducted for daily rather than instantaneous flows, questions may arise regarding whether scaling patterns for L-Cv, similar to those in the daily peak flow data, are evident in the instantaneous peak flows. The L-moment analysis was therefore repeated for the BC instantaneous peaks. It was found that L-Cv scaling trends indeed appear to be similar for the annual maximum daily and instantaneous peaks (Figure 6.5), although the much larger scatter in the latter data makes it difficult to reliably identify actual scaling trends. This greater degree of scatter is expected as instantaneous flows vary more dramatically in magnitude than do daily averaged flows, an attribute that is compounded by the smaller number of stations measuring instantaneous flows.
However, the similarity in scaling trends suggests that conclusions drawn from an analysis of annual maximum daily flows will be applicable to the instantaneous peaks.

In Colorado and California, the scaling patterns of the L-Cv in the four regions are shown in Figure 6.6. This figure demonstrates the strong effect that moisture input has on the value of the L-Cv. The box-plots (Figure 6.7) are another way to present the trend of L-Cv versus drainage area. With the exception of the Colorado Alpine region curve, which is essentially flat, the trendlines all decrease with increasing basin area. In the Walnut Gulch watershed, the equivalent plot shows a monotonic increase of the L-Cv (Figure 6.8). The climate of Walnut Gulch is semiarid and the majority of runoff is the product of summer storms.

To analyse the influence of scale, climate, and physiography on the L-skewness (L-Cs) of peak flows, and to explain the observed behavior in these terms, a plot of L-Cs versus drainage area is shown in Figure 6.9. Any scaling trend for L-Cs is obscured by the presence of large data scatter. Similar observations were made in exploring evidence for scaling behavior of L-Ck, highlighting current limitations of the British Columbia database of streamflows for constraining scaling behavior in regional flood frequency distributions. However, in Colorado and California, clear trends could be delineated as shown in Figure 6.10 and Figure 6.11. These demonstrate patterns very similar to those detected for the L-Cv (Figure 6.6 and Figure 6.7), and by implication we expect the same pattern to apply to L-Cs and L-Ck in BC despite the large scatter caused by spatial and temporal sampling variability in the regional long-term hydrometric database.
It is obvious from the scatter of the regional data points and the relatively flat slopes of the regression curves that the statistics would not be very conclusive, and it is believed that their inclusion would detract from the strength of the visual messages provided by the plots. Nevertheless, we included regression statistics on these plots. While it can be argued that the trendlines do not have strong statistical significance, "...if one takes the opposite view that there is no real trend in the relationship between L-Cv and area, that is L-Cv is a constant, then from a physical perspective the latter is an even more remarkable result" (Robinson and Sivapalan, 1997a).

To check if a change of the L-Cv on any of these curves is operationally significant, a sensitivity analysis was carried out. In this sensitivity analysis, the mean, L-Cs, and L-Ck values were held constant at 1000 m3/s, 0.24, and 0.17, respectively, while the L-Cv was systematically varied from 0.1 to 0.6. The result of this analysis is shown in Figure 6.12, which presents the flood frequency curves using the GEV distribution. This figure indicates that a small change in the L-Cv can result in substantial changes in the quantile values, particularly for large return periods. In addition, it demonstrates that an increase or decrease in the L-Cv results in a corresponding change in the quantile values. For instance, increasing the L-Cv by 0.10, from 0.30 to 0.40, results in a 14% increase in the 10-year peak flow estimate, a 22% increase in the 100-year quantile, and a 26% increase in the 1000-year quantile. If the observed trends in the L-Cv vs. drainage area plots are physically plausible, then the potential influence of scale on L-Cv cannot be ignored irrespective of its statistical significance.

The hypothesis of hydrologic homogeneity was assessed through a comparison of the residual scatter of the observed L-Cv plots with the results of a Monte Carlo type simulation model.
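Returning briefly to the sensitivity analysis described above, the effect of L-Cv on GEV quantiles can be sketched as follows. This is an illustrative calculation under stated assumptions rather than the procedure used to produce Figure 6.12: the GEV is fitted from the mean, L-Cv, and L-Cs with the common Hosking approximation for the shape parameter, and, because the GEV has only three parameters, L-Ck is not an input here. The function name `gev_quantile` is hypothetical.

```python
import math

def gev_quantile(mean, lcv, lcs, return_period):
    """T-year quantile of a GEV fitted from mean, L-Cv and L-Cs (assumes k != 0)."""
    # Approximate shape parameter from the L-skewness
    c = 2.0 / (3.0 + lcs) - math.log(2) / math.log(3)
    k = 7.8590 * c + 2.9554 * c ** 2
    g = math.gamma(1 + k)
    # Scale and location from the second and first L-moments
    alpha = lcv * mean * k / ((1 - 2 ** (-k)) * g)
    xi = mean - alpha * (1 - g) / k
    # Non-exceedance probability for the given return period
    f = 1.0 - 1.0 / return_period
    return xi + alpha * (1 - (-math.log(f)) ** k) / k

# Percentage change in the quantile when L-Cv rises from 0.30 to 0.40
for t in (10, 100, 1000):
    q30 = gev_quantile(1000.0, 0.30, 0.24, t)
    q40 = gev_quantile(1000.0, 0.40, 0.24, t)
    print(t, round(100.0 * (q40 / q30 - 1.0), 1))
```

Under these assumptions the percentage increases are expected to fall close to the 14%, 22%, and 26% quoted in the text, with the increase growing with return period.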
For illustration purposes, typical homogeneity testing results are presented in this chapter for BC data only (Figure 6.13). The model utilized a GEV distribution with parameters similar to those revealed by the BC data. The delineation process was to visually review the scatter of the observed residuals about the regression curve for the regional scaling plot of the L-Cv. It is contended that if the residual scatter of a regression plot is within the 90% confidence level demonstrated by the Monte Carlo model, then the region can be considered to be hydrologically homogeneous. The degree of heterogeneity within a region is indicated by the amount of residual variation in excess of the confidence level.

6.5 DISCUSSION OF RESULTS

6.5.1 Scaling of the L-Cv

6.5.1.1 Effect of type of climate and physiography

The values of L-Cv, as determined through single-site L-moment analysis for each gauged watershed, were grouped by physiographic region in BC and plotted against drainage area (Figure 6.4). The first observation that can be made from this grouping is that L-Cv is generally higher for the semi-arid Fraser and Thompson Plateau watersheds than for the wetter Coast Zone and Columbia and Southern Rocky Mountain watersheds. Furthermore, the snowmelt-dominated Columbia and Southern Rocky Mountains tend to be characterized by lower values for L-Cv than the rainfall and rain-on-snow dominated Coast region. These differences in the magnitude of the coefficient of variation between physiographic regions are consistent with regional patterns in the Oregon streamflow database (Cathcart, 2001), are greatest for small basins, and diminish with increasing drainage area. At the smallest scale of headwater catchments, climatic and geographic factors exert the greatest control on differences in flood generation between the regions, as precipitation and snowmelt translate directly into streamflow through hillslope runoff.
At the largest basin scales, the large channel component of streamflow generation dampens the variability in the annual flood series as caused by year-to-year variations in precipitation and snowmelt patterns. This is why we see convergence of the three curves as drainage area increases.

L-Cv decreases with increasing drainage area in all three regions. Earlier studies have shown that in many regions the coefficient of variation tends to change with basin area, with the typical pattern being a decrease in L-Cv with increasing area (Dawdy, 1961; Benson, 1962a). This pattern is usually ascribed to the increasing significance of baseflow with increasing basin size (Bloschl and Sivapalan, 1997), or the limited spatial extent of extreme rainfall, which leads to the decrease in L-Cv of average areal rainfall intensity with increasing area (Sivapalan et al., 1990; Fiorentino et al., 2000).

The decrease in L-Cv with drainage area is weakest for the Columbia and Southern Rocky Mountain watersheds (Figure 6.4). The scaling behavior of the L-Cv for this region is therefore closest to what could be considered simple scaling. Ribeiro and Rousselle (1996) and Pandey (1998) provide similar examples using Canadian flood data. Both of these examples involved floods that were largely driven by snowmelt, lending support to the hypothesis that snowmelt floods tend to exhibit simple scaling behavior (simple scaling means the L-Cv is not changing with drainage area). This idea was introduced by Gupta and Dawdy (1995) and is not surprising, as snowmelt rates are more uniform than rainfall intensities and the volume of runoff is almost directly proportional to the amount of snow cover. Consequently, in watersheds that regularly develop an annual snowpack, the runoff from each sub-basin should be reasonably proportional to basin size, which implies that the variation of annual floods should be similar regardless of sub-basin size.
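The statement that simple scaling implies a constant L-Cv can be made explicit with a short derivation. The notation below is an assumption introduced for illustration and does not appear in the text: Q(A) is the annual flood at drainage area A, theta is a scaling exponent, lambda_r(A) is the r-th L-moment, and the equality in the first line is equality in distribution.

```latex
% Simple scaling: floods at different areas differ only by a scale factor
Q(\lambda A) \overset{d}{=} \lambda^{\theta}\, Q(A), \qquad \lambda > 0
\quad\Longrightarrow\quad
Q(A) \overset{d}{=} A^{\theta}\, Q(1).

% L-moments are linear in the expected order statistics, so they scale alike
\lambda_r(A) = A^{\theta}\, \lambda_r(1), \qquad r = 1, 2, \ldots

% The L-Cv is a ratio of two such quantities and is therefore free of A
\text{L-Cv}(A) = \frac{\lambda_2(A)}{\lambda_1(A)}
             = \frac{A^{\theta}\lambda_2(1)}{A^{\theta}\lambda_1(1)}
             = \text{L-Cv}(1).
```

The same cancellation applies to L-Cs and L-Ck. Any systematic trend of L-Cv with drainage area within a homogeneous region is therefore evidence against simple scaling.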
In areas dominated by rainfall-generated floods, it has been suggested that the multi-scaling (multi-scaling means the L-Cv is changing with drainage area) of floods is simply a reflection of the multi-scaling behavior evident in rainfall (Gupta et al., 1994). The spatial variation in rainfall results in different levels of annual flood variation at different scales. Regions that exhibit both rainfall (or rain-on-snow) and snowmelt floods will likely also exhibit multi-scaling if different flood mechanisms dominate at different scales. This may explain the strong scaling patterns for L-Cv in the Interior Plateau region.

Thirty years of professional practice in BC are based on the index flood approach, which assumes that the L-Cv is constant. There may be several reasons why the scaling behavior of L-Cv noted in this study has not been detected or has been simply ignored in earlier regional flood frequency studies in BC: (1) the longer data record used in this study compared to previous studies, (2) the use of L-moments in this study, which are more robust than conventional moments in the presence of sampling variability and are therefore better suited to detecting scaling behavior in the flood statistics, and (3) grouping of some of the data into sub-regions of contrasting climates, allowing scaling trends to be identified across a full range of drainage areas.

6.5.1.2 Effect of form of precipitation and storm type

This study demonstrated the strong effect that storm characteristics and precipitation form have on the value of the L-Cv. Definite patterns of decreasing L-Cv values and trendline slopes are evident as one moves from the convective storm floods of the Colorado Foothills through the frontal storm floods of the Sierra Nevada and Coast Range to the snowmelt floods of the Colorado Alpine. With the exception of the Colorado Alpine, the trendlines all decrease with increasing basin area.
The decreasing trendline is largely attributed to decreases in the strength and variability of effective extreme rainfall intensities, which are most pronounced for floods resulting from convective storm systems. In contrast, the relatively uniform spatial coverage and melt rate of snow that occurs in the low relief Colorado Alpine region results in a flat trendline.

6.5.1.3 Effect of range of spatial scale (drainage area)

Trends in the L-Cv result from the complex interplay between various processes and their relative importance over different ranges in scale. A decreasing trend across a full range of basin scales is consistent with the findings of many studies in the literature (e.g. Dawdy, 1961). The pattern of a decreasing L-Cv with increasing basin area is likely attributable to the L-Cv's sensitivity to baseflow, relative rainfall intensities, and the desynchronization of floods from contributing areas. Many of the trendlines have very shallow slopes, which is attributed to the moderating effect of snowmelt. As the basin size increases, the increasing heterogeneity of basin characteristics and the decreasing storm area coverage (reduced basin average rainfall) serve to attenuate the variability of rainfall (Hamlin, 1983), resulting in less variable runoff, which is manifested as a smaller L-Cv. As a large basin is comprised of many small sub-basins, which contribute flows to the large basin outlet at different times because of varying storm response characteristics, there is often a desynchronization of flow delivery.

The increasing trend is opposite to that which is typically expected and, as it is evident in only one region, one may think that it is simply an artifact of the available data. On the other hand, Cathcart (2001) found increasing trends in two regions. All regions for which increasing trends were identified in this study and that of Cathcart (2001) are arid or semi-arid.
It is also possible that it is a true pattern, and this hypothesis is supported by the plausibility of physical explanations. The L-Cv may increase with basin size due to an increase in the nonlinearity of basin response patterns as channel runoff processes start to dominate hillslope runoff effects (Goodrich et al., 1997). This switch in the significance of runoff mechanisms is largely due to a downstream increase in channel lengths and associated channel losses, along with an increase in the prevalence and effects of floodplains.

An increasing L-Cv may also be related to variable storm area coverage and the relative degree of mixture in corresponding flood events. Spatial average rainfall values decrease with increasing basin scale, while at the same time variations in the synchronization of sub-basin responses start to increase. In addition, with increasing basin scale, the likelihood of achieving the conditions required for peak basin response is reduced. As the spatial coverage of storms is more complete, more often, in small basins, the annual peak flows more frequently approach the peak flow capacity of the watershed. However, in large basins, the peak flow capacity of the watershed is rarely approached, resulting in a fairly tight clustering of annual peak flows, with a few high outliers. These outliers represent the few occasions when extensive storm coverage, high rainfall intensities and wet antecedent conditions combine to produce exceptionally large flows from the larger catchments. This situation is most likely associated with convective storm events, but could also apply to frontal storm systems to a lesser degree. In essence, there are two different flood populations in evidence, and mixing the samples from these two populations may result in high measures of variability.

Another plausible cause for the observed increasing L-Cv pattern is the mixing of rainfall and snowmelt flood data, which is expected to increase with basin area.
Large basins are expected to experience a larger number of mixed snowmelt and rainfall flood events than small basins because they typically extend over a greater range in elevation. Consequently, the upper and lower portions of a large basin are more likely to experience different climatic conditions, and accordingly produce floods by different mechanisms. For instance, the upper area of a basin may be largely responsible for fresh snowmelt floods while the lower area may produce the bulk of winter rainfall generated flows. This may not be the case in Walnut Gulch but was given as a plausible explanation for increased L-Cv with drainage area in Oregon by Cathcart (2001).

It is interesting to note that the V-shape pattern detected in the Oregon data was not evident in these data sets. This may be due to a number of reasons, including that the V-shape is not real, but simply an artifact of the data, or that the climatic and physiographic conditions associated with the Arizona, BC, Colorado, and California data sets are not conducive to producing the V-shape. The persistence of the pattern in Oregon supports the second explanation. The other reason why a V-shape was not seen in these study areas may be that the data do not span a wide enough range of watershed scales, unlike the Oregon data. The pyramid shape trend in the L-Cv (Figure 6.1) that was derived from Smith's (1992) original analysis of the Appalachian data has not been found either. This study casts considerable doubt on the existence of a pyramid shape trend in the L-Cv. An important point that these results highlight is that considerable emphasis should be placed on proper delineation of the flood data into hydrologically homogeneous regions before drawing conclusions regarding the scaling trends in the L-Cv.
6.5.2 Scaling of the L-Cs

As with L-Cv, definite patterns of decreasing L-Cs values and trendline slopes are evident as one moves from the convective storm floods of the Colorado Foothills through the frontal storm floods of the Sierra Nevada and Coast Range to the snowmelt floods of the Colorado Alpine. Furthermore, the trendlines generally decrease with increasing basin area, except for the Colorado Alpine curve, which, although difficult to discern, appears to be increasing. The physical explanations for these patterns are likely similar to those offered for the L-Cv trends, given the strong correlation between the two statistics. The only anomaly in this correlation is with the snowmelt floods of the Colorado Alpine and no satisfactory explanation for this could be found.

A trend of decreasing skewness with increasing basin size is likely related to the effects of basin storage, as suggested by Svoboda (1974), McCuen and Hromadka (1988), and McCuen (2001). A physical explanation for an increase in skewness is not apparent, although it is possibly related to mixture in the data. Unlike the L-Cv, which can be physically interpreted as a measure of flow variability, the skewness is a more abstract concept related to distributional shape, and therefore it is more difficult to visualize what conditions may affect it.

6.6 CONCLUSIONS

The regional L-moment analysis of the study areas used in this chapter revealed that the L-Cv of annual maximum daily flows varies substantially with catchment scale and that this variability is different in different physiographic regions. In different regions, the L-Cv was found to decrease, increase, or remain constant with increasing spatial scale, although the decreasing pattern was the most prevalent.
A decrease of the L-Cv with increasing basin area is likely attributable to proportional increases in baseflows, decreasing average areal rainfall intensities and increasing basin heterogeneity, and the desynchronization of floods from contributing areas. An increase of the L-Cv is likely related to physiographic effects and the mixture of floods from different populations. Constant trends, in turn, are likely related to snowmelt floods.

In some regions, the L-Cv appears to be constant and the reader is cautioned about assuming constancy of this coefficient. The prevalence of the decreasing pattern lends support to the idea that this trend is real and that perhaps its significance is obscured by the large data scatter that is caused by sampling variability and by signals from factors other than basin area. Furthermore, while the slopes of the trendlines may not be statistically significant, they may be operationally and scientifically important, for it has been shown that a small change in the L-Cv can have a substantial impact on quantile estimates, particularly at high return periods. It should also be recognized that the patterns might reveal valuable information about the effects that different physical processes have on peak flows in a region. This information can be used to further the understanding of scaling relations that will allow extrapolation and simplification to be done with some confidence and understanding.

Trends in the L-Cv result from the complex interplay between various processes and their relative importance at different scales and under different climate regimes, and therefore one should expect a variety of different scaling patterns. This implies that the popular index flood method for flood estimation cannot be applied over a full range of spatial scales, but rather is limited to small ranges in basin size.
Furthermore, it challenges the validity of mapping isolines of L-Cv (or Cv values), or even unit area flood quantiles, as isoline maps cannot represent the effects of scale.

Despite the considerable variation evident in the scaling of the L-Cv, three generalities were noted. Firstly, L-Cv values are generally higher in arid and semi-arid regions than in humid regions, which is not surprising given that moisture inputs and antecedent moisture conditions tend to be far less variable in wet climates. Secondly, snowmelt floods appear to demonstrate less variation across spatial scales than rainfall floods, although non-constant pattern behavior is evident in almost all regions, including those with large snowmelt components. This result challenges the idea that snowmelt floods necessarily follow a constant pattern, although this conclusion is qualified with the knowledge that some regions have some mixture in their flood data that may have influenced the results. The third and final generality is that scaling relations for the L-Cv differ at varying spatial scales, and consequently the simple and blind extrapolation of flood hydrology from large basins to small basins, or vice versa, is not recommended. This result raises concerns about the validity of many current modelling practices, both deterministic and stochastic, which typically assume high degrees of scale invariance or spatial consistency in regional flood characteristics.

Chapter 7

CONCLUSIONS AND RECOMMENDATIONS

7.1 SUMMARY OF THE STUDY AREAS

The study areas utilized in this thesis were chosen because they reflect different hydrologic conditions and provide a unique opportunity for investigating different approaches to modelling heterogeneous flood distributions.
The data sets were suitable for examining flood behavior
• in response to different climate regimes,
• with a range of physiography, and
• over a range of temporal and spatial scales.

This research work represents an effort to relate the values and patterns of commonly applied flood statistics to the physical processes that are thought to contribute to the generation of peak flows. The possible links between flood statistics and physical processes identified in this thesis contribute to the further understanding of hydrological systems.

7.2 CONCLUSIONS

The following conclusions, having significant implications for the science of flood hydrology and for current practice of computing floods, are obtained from this study.

1. In regions of contrasting climatic and physiographic conditions, mixture could be caused by any one or a combination of the following: seasonal variations in the flood producing mechanisms (rain vs. snowmelt vs. rain-on-snow), snow melting at substantially different ranges of temperature (shortwave vs. longwave radiation), changes in weather patterns due to El-Nino/La-Nina oscillations resulting from the anomalous warming of the equatorial eastern and central Pacific Ocean, and/or the low frequency or decadal climatic fluctuations and changes in channel routing due to the dominance of within channel or floodplain flow.

2. Traditional approaches such as using the probability density function (PDF) and cumulative distribution function (CDF) shapes can help in identifying mixture in flood data. However, they may be misleading because of sampling variability. In such a case, a physically based approach is necessary to classify the peakflow events into distinct flood populations using any combination of climate, geomorphologic and atmospheric data.

3. It has been demonstrated in this thesis that in the Interior snow dominated watersheds, different temperature ranges causing the melt of snow pack may also produce mixture in flood data.

4.
For the first time, it has been demonstrated that ENSO/Non-ENSO events may produce floods of different characteristics, and the difference is distinct in different locations of the Interior and Coast of BC. For the majority of basins located in the Coast of BC, the lower tail of the composite frequency curve is controlled by a combination of ENSO and non-ENSO events while the upper tail is defined exclusively by the non-ENSO events. For most watersheds located in the Interior of BC, the upper tail of the CDF is defined exclusively by the ENSO events. While many major streams have shown similar trends, several others displayed no significant difference between the ENSO and non-ENSO annual flood frequency curves.

5. In Arizona, on the other hand, different types of storms falling on the same basins may produce floods of a different type. In general terms, heterogeneity in the distribution of floods may be caused by three distinct mechanisms: (i) different storm types, (ii) El Nino-Southern Oscillation (ENSO) conditions resulting from the anomalous warming of the equatorial eastern and central Pacific Ocean, and/or (iii) the low frequency or decadal climatic fluctuations.

6. It was found that many of the commonly used homogeneous distributions do not provide a satisfactory fit to the observed mixture of floods in all study areas, particularly at the upper tail of the empirical distribution. This was the case for even the five-parameter Wakeby distribution, which is the most flexible and versatile of the homogeneous distributions considered.

7. A heterogeneous distribution that explicitly accounts for the fact that floods are generated by more than one hydrologically distinct mechanism produced a superior fit that mirrors the upper tail of the empirical distribution much better than several homogeneous distributions.
It has been demonstrated how the use of a more appropriate type of distribution (heterogeneous as opposed to homogeneous) might have major implications on the prediction of extreme flood events. The heterogeneous distribution gave estimates of floods that differ by as much as 200% from those estimated by the homogeneous distributions. These differences are real and operationally important. Therefore, they should not be ignored, irrespective of their statistical significance.

8. From the results of the Monte Carlo simulations, it was found that: (i) For high L-skewness and a heavy tailed probability density function, both of which are characteristics of flood mixtures in arid and semi arid climates of Arizona, the GEV, Wakeby, and TCLN distributions consistently perform better than the NP, EV1, and LP3 distributions. (ii) For smaller values of L-skewness and a bimodal probability density function, both of which are characteristics of flood mixtures representative of the humid climates of British Columbia, the EV1 and NP distributions perform better than any other distribution.

9. This study revealed that the L-Cv of annual maximum flows in Arizona, BC, California, and Colorado varies substantially with catchment scale and that this variability is different in different physiographic regions. Despite the considerable variation evident in the scaling of the L-Cv, three generalities were noted. Firstly, L-Cv values are generally higher in arid and semi-arid regions than in humid regions, which is not surprising given that moisture inputs and antecedent moisture conditions tend to be far less variable in wet climates. Secondly, snowmelt floods appear to demonstrate less variation across spatial scales than rainfall floods, although non-constant pattern behavior is evident in almost all regions, including those with large snowmelt components.
This result challenges the idea that snowmelt floods necessarily follow a constant pattern, although this conclusion is qualified with the knowledge that some regions have some mixture in their flood data that may have influenced the results. The third and final generality is that scaling relations for the L-Cv vary over different spatial scales, and consequently the simple linear extrapolation of flood relations from large basins to small basins, or vice versa, is not recommended.

10. The last thirty years of professional practice in BC are based on the fundamental assumption that simple scaling of flood statistics holds. For this assumption to be valid, all floods must have a common probability distribution, which implies that the statistical moments of this distribution, such as the coefficient of variation, are independent of drainage area or any other catchment physiographic or climatic parameter. This study has demonstrated that in BC, L-Cv generally decreases with increasing catchment scale and that the strength of this trend depends on physiographic region and the climatic conditions governing flood generation. The quantity and form of precipitation in particular were found to exert strong controls on flood behavior. Differences in L-Cv between physiographic regions are greatest for small watersheds and diminish with increasing drainage area, pointing to a decreasing influence of climatic and geographic factors on flood generation with increasing catchment size.

7.3 RECOMMENDATIONS FOR FUTURE STUDIES

The following text presents recommendations for future research to advance scientific and operational contributions of flood hydrology.
• In addition to using an empirical approach to substantiate or refute the hypotheses addressed in this thesis, it would be useful to investigate whether or not the observed patterns in L-Cv and L-Cs could be produced through deterministic modelling work, in a manner similar to the studies by Robinson and Sivapalan (1997a, b), Bloschl and Sivapalan (1997), and Jothityangkoon and Sivapalan (2001). Such a modelling exercise could help confirm some of the speculation put forward in this thesis to explain the observed scaling trends in the study areas.
• It would be ideal to have a long-term record of streamflow at a network of hydrometric stations that are nested within the same river basin to reveal the real effect of scale on flood statistics, such as L-Cv and L-Cs. The river basin should be reasonably small to keep to a minimum the effect of other physiographic and climatic parameters on the flood statistics.
• It would be useful to repeat the delineation of regional scaling patterns with other data sets in order to see whether similar patterns are revealed. Detection of similar patterns in other data sets would help to further validate the results.

REFERENCES

Adamowski, K., 1985. "Nonparametric kernel estimation of flood frequencies." Water Resources Research, 21(11), 1585-1590.
Adamowski, K., 1989. "A Monte Carlo comparison of parametric and nonparametric estimation of flood frequencies." Journal of Hydrology, 108, 295-308.
Adamowski, K. and W. Feluch, 1990. "Nonparametric flood frequency analysis with historical information." Journal of Hydraulic Engineering, American Society of Civil Engineers, 116(8), 1035-1047.
Akaike, H., 1974. "A new look at the statistical model identification." IEEE Transactions on Automatic Control, AC-19(6), 716-721.
Alila, Y., 1998. "A regional frequency approach for estimating design floods in BC." Forest Renewal BC (HQ 96330-RE) and Science Council of BC (FR-96/97-520) final report, Science Council of BC, Burnaby, BC, Canada.
Alila, Y., 1999. "A hierarchical approach for the regionalization of precipitation annual maxima in Canada." Journal of Geophysical Research, 104(D24), 31645-31655.
Alila, Y. and A. Mtiraoui, 2002. "Implications of heterogeneous flood frequency distributions on traditional stream discharge prediction techniques." Hydrological Processes, 16(5), 1065-1084.
American River Committee, 1998. "Estimating American River-flow frequencies." US Army Corps of Engineers, Sacramento District Office, 1325 J Street, Room 814, Sacramento, CA.
Anderson, G.D., 1969. "A comparison of probability density estimates." Presented at IMS Annual Meeting, 19-22 August, New York.
Archer, D.R., 1989. "Flood wave attenuation due to channel and floodplain storage and effects on flood frequency." In Floods: Hydrological, Sedimentological, and Geomorphological Implications, K. Beven and P. Carling (eds). New York: John Wiley, 37-46.
Arnell, N. and M. Beran, 1987. "Testing the suitability of the two component extreme value distribution for regional flood estimation." Regional Flood Frequency Analysis, Ed. by V.P. Singh, pp. 159-175.
Bardsley, W.E., 1989. "Using historical data in nonparametric flood estimation." Journal of Hydrology, 108, 249-255.
Barry, R.G., 1992. "Mountain weather and climate." Routledge, New York, 402 pp.
Benson, M.A., 1952. "Characteristics of frequency curves based on a theoretical 1,000-year record." USGS Open-File Report, U.S. Geological Survey, Reston, Virginia, USA.
Benson, M.A., 1962a. "Factors influencing the occurrence of floods in a humid region of diverse terrain." Geological Survey Water Supply Paper 1580-B.
Benson, M.A., 1962b. "Evaluation of methods for evaluating the occurrence of floods." Water Supply Paper 1550-A, U.S.
Geological Survey, Reston, Virginia, USA.
Benson, M.A., 1964. "Factors influencing the occurrence of floods in the Southwest." U.S. Geological Survey Water Supply Professional Paper 1580-D, U.S. Geological Survey, Reston, Virginia, USA.
Benson, M.A., 1968. "Uniform flood frequency estimating methods for federal agencies." Water Resources Research, 4(5), 891-908.
Beran, M., J.R.M. Hosking and N. Arnell, 1986. "Comment on 'Two-component extreme value distribution for flood frequency analysis' by Fabio Rossi, Mauro Fiorentino, and Pasquale Versace." Water Resources Research, 22, 263-266.
Beschta, R.L., M.R. Pyles, A.E. Skaugset and C.G. Surfleet, 2000. "Peakflow Responses to Forest Practices in the Western Cascades of Oregon, USA." Journal of Hydrology, 233, 102-120.
Bloschl, G. and M. Sivapalan, 1997. "Process controls on regional flood frequency: coefficient of variation and basin scale." Water Resources Research, 33(12), 2967-2980.
Bobee, B., G. Cavadias, F. Ashkar, J. Bernier and P. Rasmussen, 1993. "Towards a systematic approach to comparing distributions used in flood frequency analysis." Journal of Hydrology, 142, 121-136.
Bobee, B. and P.F. Rasmussen, 1995. "Recent Advances in Flood Frequency Analysis." U.S. National Report to the International Union of Geodesy and Geophysics 1991-1994, pp. 1111-1116.
Bryson, R.A. and W.P. Lowry, 1955. "Synoptic climatology of the Arizona summer precipitation singularity." American Meteorological Society Bulletin, 36, 329-339.
Burn, D.H., 1990. "Evaluation of regional flood frequency analysis with a region of influence approach." Water Resources Research, 26(10), 2257-2265.
Burn, D.H., Z. Zolt and M. Kowalchuk, 1997. "Regionalization of catchments for regional flood frequency analysis." Journal of Hydrologic Engineering, 2(2), 76-82.
CARFF (Committee on American River Flood Frequencies), 1999. "Improving American River flood frequency analyses."
Water Science and Technology Board, Commission on Geosciences, Environment, and Resources, National Research Council, Washington: National Academy Press, 120.
Cathcart, J.G., 2001. "The effects of scale and storm severity on the linearity of watershed response revealed through the regional L-moment analysis of peak flows." Ph.D. Thesis, Institute of Resources and Environment, Resource Management and Environmental Studies, University of British Columbia, Vancouver, BC, Canada.
Cherkauer, D.C., 1972. "Longitudinal profiles of ephemeral streams in southeastern Arizona." Geological Society of America Bulletin, 83, 353-365.
Chui, S.T., 1991. "The effect of discretization error on bandwidth selection for kernel density estimation." Biometrika, 78(2), 436-441.
Church, M., 1997. "Regionalization of Hydrological Estimates for British Columbia." Report prepared for the BC Ministry of Environment, Fisheries Branch, Habitat Division, Victoria, BC, Canada.
Cohen, A.C., 1967. "Estimation in mixtures of two-normal distributions." Technometrics, 9(1), 15-28.
Cong, S., Y. Li, J.L. Vogel and J.C. Schaake, 1993. "Identification of the underlying distribution form of precipitation using regional data." Water Resources Research, 29(4), 1103-1111.
Cooke, R.A. and S. Mostaghimi, 1994. "MIXFIT: A microcomputer-based routine for fitting heterogeneous probability distribution functions to data." Transactions of the ASAE, 37, 1463-1472.
Coulson, C.H., ed., 1991. "Manual of Operational Hydrology in British Columbia." Ministry of Environment, Water Management Division, Hydrology Section, Victoria, BC, Canada.
Coulson, C.H. and W. Obedkoff, 1998. "British Columbia streamflow inventory." Ministry of Environment, Lands and Parks, Water Management Division, Water Inventory Section, Victoria, BC, Canada.
Crippen, J.R., 1978. "Composite log-type III frequency-magnitude curve of annual floods." U.S. Geological Survey Open-File Report 78-352, 5 pp.
Cunnane, C., 1978. "Unbiased plotting position - a review." Journal of Hydrology, 37, 205-222.
Cunnane, C., 1989. "Statistical distributions for flood frequency analysis." Operational Hydrology Report No. 33, WMO-No. 718, World Meteorological Organization, Geneva, Switzerland.
Dalrymple, T., 1960. "Flood frequency analyses." United States Geological Survey, Water Supply Paper 1543-A.
Dawdy, D.R., 1961. "Variation of flood ratios with size of drainage area." U.S. Geological Survey Professional Paper, 424-C, C36.
Douglas, A.V. and H.C. Fritts, 1973. "Tropical cyclones of the eastern North Pacific and their effects on the climate of the Western United States." Final Report, NOAA Contract 1-35241, Laboratory of Tree-Ring Research, University of Arizona, Tucson, 43 pp.
Douglas, J., 1974. "Flood frequency and bridge and culvert sizes for forested mountains of North Carolina." Coweeta Hydrologic Laboratory, Franklin, North Carolina, 1-21.
Eaton, B., M. Church and D. Ham, 2002. "Scaling and regionalization of flood flows in British Columbia, Canada." Hydrological Processes, 16(16), 3245-3263.
Elliot, J.G., R.D. Jarrett and J. Ebling, 1982. "Annual snowmelt and rainfall peak-flow data on selected foothills region streams, South Platte River, Arkansas River and Colorado River Basins, Colorado." USGS Open-File Report, pp. 226.
Elliott, W.P. and J.K. Angell, 1988. "Evidence for changes in Southern Oscillation relationships during the last 100 years." Journal of Climate, 27, 729-737.
Environment Canada, 1994. "HYDAT CD-ROM: surface water and sediment data." Atmospheric Environment Services, Water Survey of Canada, Environment Canada, Ottawa, Canada.
Fiorentino, M., P. Versace and F. Rossi, 1985. "Regional flood frequency analysis using the two-component extreme value distribution." Hydrological Sciences Journal, 30, 51-64.
Fiorentino, M., G. Salvatore, F. Rossi and P. Versace, 1987. "Hierarchical approach for regional flood frequency analysis." D. Reidel Publishing Company, 35-49.
Fiorentino, M. and V. Iacobellis, 2000. "Scaling properties of the flood distribution moments and their dependence on climate." Eos Transactions, 81(48), 2000 Fall Meeting Abstracts, AGU, San Francisco, California, USA.
Fortin, V., J. Bernier and B. Bobee, 1997. "Simulation, Bayes, and bootstrap in statistical hydrology." Water Resources Research, 33, 439-448.
Francis, F., 1998. "Using the TCEV distribution function with systematic and non-systematic data in a regional flood frequency analysis." Stochastic Hydrology and Hydraulics, 12, 267-283.
Frind, E.O., 1969. "Rainfall-runoff relationships expressed by distribution parameters." Journal of Hydrology, 9(4), 405-426.
Gabriele, S. and N. Arnell, 1991. "A hierarchical approach to regional flood frequency analysis." Water Resources Research, 27(6), 1281-1289.
Gingras, D. and K. Adamowski, 1992. "Coupling of nonparametric frequency and L-moment analysis for mixed distribution identification." Water Resources Bulletin, 28(2), 263-271.
Goodrich, D.C., L.J. Lane, R.M. Shillito, S.N. Miller, K.H. Syed and D.A. Woolhiser, 1997. "Linearity of basin response as a function of scale in a semiarid watershed." Water Resources Research, 33(12), 2951-2965.
Gray, D.M., 1961. "Interrelationships of watershed characteristics." Journal of Geophysical Research, 66(4), 1215-1223.
GREHYS, 1996. "Presentation and review of some methods for regional flood frequency analysis." Journal of Hydrology, 186(1-4), 63-84.
Gupta, V.K., L. Duckstein and R.W. Peebles, 1976. "On the joint distribution of the largest floods and its time of occurrence." Water Resources Research, 12(2), 295-304.
Gupta, V.K., O.J. Mesa and D.R. Dawdy, 1994. "Multiscaling theory of flood peaks: regional quantile analysis." Water Resources Research, 30(12), 3405-3421.
Gupta, V.K. and D.R. Dawdy, 1995. "Physical interpretation of regional variations in the scaling exponents of flood quantiles." Hydrological Processes, 9, 347-361.
Gupta, V.K. and E.C. Waymire, 1998. "Spatial variability and scale invariance in hydrologic regionalization." In: Scale Dependence and Scale Invariance in Hydrology, edited by G. Sposito. Cambridge: Cambridge University Press, pp. 88-135.
Haan, C.T., 1977. "Statistical methods in hydrology." Iowa State University Press, Ames, Iowa.
Haktanir, T., 1992. "Comparison of various flood frequency distributions using annual flood peaks data of rivers in Anatolia." Journal of Hydrology, 136, 1-31.
Hales, J.E., 1974. "Southwestern United States summer monsoon source, Gulf of Mexico or Pacific Ocean?" Weatherwise, 24, 148-155.
Hamlin, M.J., 1983. "The significance of rainfall in the study of hydrological processes at basin scale." Journal of Hydrology, 65, 73-94.
Hansen, E.M., F.K. Schwarz and J.T. Riedel, 1977. "Probable maximum precipitation estimates, Colorado River and Great Basin drainages." National Oceanic and Atmospheric Administration Hydrometeorological Report 49, 161 pp.
Hansen, E.M. and F.K. Schwarz, 1981. "Meteorology of important rainstorms in the Colorado River and Great Basin drainages." National Oceanic and Atmospheric Administration Hydrometeorological Report 50, 167 pp.
Hare, F.K. and M.K. Thomas, 1979. "Climate Canada." John Wiley, Toronto, 31-35, 44-54, 94-97.
Hawkins, R.H., 1974. "A note on mixed distributions in hydrology." Proceedings of a Symposium on Statistical Hydrology, U.S. Department of Agriculture, Agricultural Research Service, Misc. Publication No. 1275, 336-335.
Hazen, A., 1914. "Discussion on 'Flood flows' by W.E. Fuller." Transactions, ASCE, 77, 628.
Hazen, A., 1924. "Discussion on theoretical frequency curves and their application to engineering problems." Transactions of the American Society of Civil Engineers, 87, 143-173.
Hazen, A., 1930. "Flood flows." New York: John Wiley.
Hebson, C.S. and E.F. Wood, 1986. "A study of scale effects in flood response." In: Scale Problems in Hydrology, ed. V.K. Gupta et al., D.
Reidel Publishing Company, Dordrecht, the Netherlands, 133-158.
Hewlett, J.D., 1982. "Forests and floods in the light of recent investigation." Proceedings of the Canadian Hydrology Symposium, pp. 543-559.
Hirschboeck, K.K., 1985. "Hydroclimatology of flow events in the Gila River Basin, central and southern Arizona." Ph.D. Dissertation, Department of Geosciences, University of Arizona, Tucson.
Hirschboeck, K.K., 1987. "Hydroclimatically-defined mixed distributions in partial duration flood series." In: Hydrologic Frequency Modelling, V.P. Singh (ed.), D. Reidel Publishing Company, 199-212.
Hirschboeck, K.K. and J.F. Cruise, 1989. "Hydroclimatic regionalization of flooding variability: A combined stochastic-climatic approach." Technical progress report: Year 1, USGS WRR Project No. 14-08-0001-G1754.
Hjalmarson, H.W., 1990. "Flood of October 1983 and history of flooding along the San Francisco River, Clifton, Arizona." U.S. Geological Survey Water-Resources Investigations Report 85-4225-B.
Hogg, W.D. and D.A. Carr, 1985. "Rainfall frequency atlas for Canada." Environment Canada, Atmospheric Environment Service, Ottawa, Ontario, Canada.
Holland, S.S., 1964. "Landforms of British Columbia: A physiographic outline." Bulletin 48, B.C. Department of Mines and Petroleum, Victoria, Canada.
Horton, R.E., 1913. "Frequency of occurrence of Hudson River floods." US Weather Bureau Bulletin, 2, 109-112.
Hosking, J.R.M., 1990. "L-moments: analysis and estimation of distributions using linear combinations of order statistics." Journal of the Royal Statistical Society, B, 52(2), 105-124.
Hosking, J.R.M. and J.R. Wallis, 1993. "Some statistics useful in flood frequency analysis." Water Resources Research, 29(2), 271-281.
Hosking, J.R.M. and J.R. Wallis, 1997. "Regional frequency analysis." Cambridge, UK: Cambridge University Press.
Houghton, J.C., 1978. "Birth of a parent: the Wakeby distribution for modeling flood flows." Water Resources Research, 14(6), 1105-1109.
Houser, P.R., D.C. Goodrich and K.H. Syed, 2000. "Runoff, precipitation, and soil moisture at Walnut Gulch." Chapter 6 in: Spatial Patterns in Catchment Hydrology, Observations and Modeling, Rodger Grayson and Gunter Bloschl (eds.), Cambridge Univ. Press, pp. 125-157.
Interagency Advisory Committee on Water Data (IACWD), 1982. "Guidelines for determining flood flow frequency." Bulletin 17B, Hydrology Subcommittee, U.S. Geological Survey, Reston, Va.
Institute of Hydrology, 1999. "Flood estimation handbook." Institute of Hydrology, Wallingford, England.
Jarrett, R.D. and J.E. Costa, 1982. "Multi-disciplinary approach to the flood hydrology of foothill streams in Colorado." In: International Symposium on Hydrometeorology, edited by A.I. Johnson and R.A. Clark, American Water Resources Association, Bethesda, Maryland, USA.
Jarrett, R.D. and J.E. Costa, 1988. "Evaluation of the flood hydrology in the Colorado Front Range using precipitation, streamflow, and paleoflood data for the Big Thompson River basin." U.S. Geological Survey Water-Resources Investigations Report 87-4117, 37 pp.
Jarrett, R.D., 1990. "Hydrologic and hydraulic research in mountain rivers." Water Resources Bulletin, 26, 419-429.
Jennings, M.E., W.O. Thomas and H.C. Riggs, 1994. "Nationwide summary of U.S. Geological Survey regional regression equations for estimating magnitude and frequency for ungauged sites, 1993." Water Resources Investigations Report 94-4002, US Geological Survey, Reston, Virginia, USA.
Jiang, S. and D. Kececioglu, 1992. "Maximum likelihood estimates, from censored data, for mixed-Weibull distributions." IEEE Transactions on Reliability, 41(2), 248-255.
Jothityangkoon, C. and M. Sivapalan, 2001. "Temporal scales of rainfall-runoff processes and spatial scaling of flood peaks: space-time connection through catchment water balance." Advances in Water Resources, 24(9-10), 1015-1036.
Jones, J.A., 1997. "Global Hydrology: Processes, Resources and Environmental Management."
Harlow, Essex, England, Longman.
Keeping, E.S., 1966. "Distribution-free methods in statistics." Statistical Methods in Hydrology, Proceedings of Hydrology Symposium No. 5, McGill University, Montreal, Canada.
Kircher, J.E., A.F. Choquette and B.D. Richter, 1985. "Estimation of natural streamflow characteristics in western Colorado." U.S. Geological Survey Water Resources Investigations Report 85-4086, 28 pp.
Kite, G.W., 1978. "Frequency and Risk Analyses in Hydrology." Water Resources Publications, Fort Collins, Colorado, USA.
Klemes, V., 1970. "Negatively Skewed Distribution of Runoff." IASH International Symposium on the Results of Research on Representative and Experimental Basins, Wellington, New Zealand, December 1970, 96, 219-236.
Klemes, V., 1974. "Some problems in pure and applied stochastic hydrology." In: Proceedings of a Symposium on Statistical Hydrology, U.S. Department of Agriculture, Agricultural Research Service, Miscellaneous Publication No. 1275, 2-15.
Klemes, V., 1976. "Comment on 'Regional skew in search of a parent'." Water Resources Research, 12(6), 1325-1326.
Klemes, V., 1986. "Dilettantism in Hydrology: Transition or Destiny." Water Resources Research, 22(9), 177S-188S.
Klemes, V., 1999. "Keeping techniques, methods and models in perspective." Journal of Water Resources Planning and Management, Jul/Aug, 181-185.
Klemes, V., 2000. "Tall tales about tails of hydrological distributions: I and II." American Society of Civil Engineers, Journal of Hydrologic Engineering, 5(3), 227-239.
Kuczera, G., 1982. "Effects of sampling uncertainty and spatial correlation on an empirical Bayes procedure for combining site and regional information." Journal of Hydrology, 65(4), 373-398.
Labatiuk, C.W., 1985. "A nonparametric approach to flood frequency analysis." M.A.Sc. Thesis, University of Ottawa.
Lall, U., Y.I. Moon and K. Bosworth, 1993. "Kernel flood frequency estimators: Bandwidth selection and kernel choice."
Water Resources Research, 29(4), 1003-1015.
Lettenmaier, D.P. and K.W. Potter, 1985. "Testing flood frequency estimation methods using a regional flood generation model." Water Resources Research, 21(12), 1903-1914.
Lettenmaier, D.P., J.R. Wallis and E.F. Wood, 1987. "Effect of regional heterogeneity on flood frequency estimation." Water Resources Research, 23(2), 313-323.
Leytham, K.M., 1984. "Maximum likelihood estimates for the parameters of mixture distributions." Water Resources Research, 20(7), 896-902.
Loukas, A. and M.C. Quick, 1995. "Comparison of six extreme flood estimation techniques for ungauged watersheds in Coastal British Columbia." Canadian Water Resources Journal, 20, 17-29.
Lystrom, D.J., 1970. "Evaluation of the streamflow-data program in Oregon." Open-File Report, U.S. Geological Survey, Portland, Oregon, USA.
Maddox, R.A., F. Canova and L.R. Hoxit, 1980. "Meteorological characteristics of flash flood events over the western United States." Monthly Weather Review, 108, 1866-1877.
Matalas, N.C., J.R. Slack and J.R. Wallis, 1975. "Regional skew in search of a parent." Water Resources Research, 11(6), 815-826.
McCain, J.F. and R.D. Jarrett, 1976. "Manual for estimating flood characteristics of natural-flow streams in Colorado." Tech. Manual 1, Colorado Water Conservation Board, Denver, 68 pp.
McCain, J.F., L.R. Hoxit, R.A. Maddox, C.F. Chappell and F. Caracena, 1979. "Storm and flood of July 31-August 1, 1976, in the Big Thompson River and Cache la Poudre River basins, Larimer and Weld Counties, Colorado." U.S. Geological Survey Professional Paper 1115, Part A, 82 pp.
McCuen, R.H. and T.V. Hromadka II, 1988. "Flood skew in hydrologic design on ungauged watersheds." Journal of Irrigation and Drainage Engineering, 114(2), 301-310.
McCuen, R.H., 2001. "Generalized flood skew: Map versus watershed skew." Journal of Hydrologic Engineering, 6(4), 293-299.
McDonald, J.E., 1956. "Variability of precipitation in an arid region."
A survey of characteristics for Arizona. University of Arizona, Institute of Atmospheric Physics, Tech. Report No. 1, 88 pp.
McDonnell, J.J., 2003. "Where does water go when it rains? Moving beyond the variable source area concept of rainfall-runoff response." Hydrological Processes, 17, 1869-1875.
McKerchar, A.L. and C.P. Pearson, 1990. "Maps of flood statistics for regional flood frequency analysis in New Zealand." Hydrological Sciences Journal, 35, 609-21.
Medgyessy, P., 1977. "Decomposition of superpositions of distribution functions." New York: Wiley.
Melone, A.M., 1985. "Flood producing mechanisms in Coastal British Columbia." Canadian Water Resources Journal, 10(3), 47-62.
Moore, R.D. and I.G. McKendry, 1996. "Spring snowpack anomaly patterns and winter climatic variability, British Columbia, Canada." Water Resources Research, 32(3), 623-632.
Moran, P.A.P., 1959. "The theory of storage." Methuen, London.
Mtiraoui, A. and Y. Alila, 2004. "Effect of river basin geomorphology on the selection of flood frequency distributions." Journal of Hydrologic Engineering (under review).
Namias, J., X. Yuan and D.R. Cayan, 1988. "Persistence of North Pacific sea surface temperature and atmospheric flow patterns." Journal of Climate, 1, 682-703.
Nash, J.E. and J. Amorocho, 1966. "The accuracy of the prediction of floods of high return period." Water Resources Research, 2(2), 191-198.
National Oceanic and Atmospheric Administration [NOAA] climatological data, 1986. "Daily weather maps." National Oceanic and Atmospheric Administration, 8 pp.
Natural Environment Research Council (NERC), 1975. "Flood studies report: Volume I." Hydrological Studies, London, England.
Obedkoff, W., 1998. "Streamflow in the southern Interior region." BC Ministry of Environment, Lands and Parks, Water Inventory Section, Resources Inventory Branch, Victoria, BC, Canada.
Pandey, G.R., 1998. "Assessment of scaling behavior of regional floods." Journal of Hydrologic Engineering, 3(3), 169-173.
Paulson, R.W., E.B. Chase, R.S. Roberts and D.W. Moody, 1991. "National water summary, 1988-1989 hydrologic events and floods and droughts." U.S. Geological Survey Water-Supply Paper 2375, 591 pp.
Pilgrim, D.H., 1983. "Some problems in transferring hydrological relationships between small and large drainage basins and between regions." Journal of Hydrology, 65, 49-72.
Pilon, P.J. and K.D. Harvey, 1994. "Consolidated frequency analysis (CFA), Version 3.1, Reference Manual." Environment Canada, Ottawa, Ontario, Canada.
Pitlick, J., 1994. "Relation between peak flows, precipitation, and physiography for five mountainous regions in the Western USA." Journal of Hydrology, 158, 219-240.
Potter, W.D., 1958. "Upper and lower frequency curves for peak rates of runoff." Transactions of the American Geophysical Union, 39(1).
Potter, K.W., 1987. "Research on Flood Frequency Analysis: 1983-1986." Reviews of Geophysics, 25(2), 113-118.
Renard, K.G., L.J. Lane, J.R. Simanton, W.E. Emmerich, J.J. Stone, D.C. Goodrich, M.A. Weltz and D.S. Yakowitz, 1993. "Hydrology of an arid environment: Walnut Gulch studies." Proc. AWRA 29th Annual
Renard, K.G., 1990. "Walnut Gulch experimental watershed." In: Robert Z. Callaham (ed.), Case studies and catalog of watershed projects in western provinces and states. Wildland Resources Center Report 22, University of California, Berkeley, pp. 57-59, 76.
Ribeiro, J. and J. Rousselle, 1996. "Robust simple scaling analysis of flood peaks series." Canadian Journal of Civil Engineering, 23, 1139-1145.
Riggs, H.C., 1973. "Regional analyses of streamflow characteristics." Hydrologic Analysis and Interpretation. Washington: United States Government Printing.
Robinson, J.S. and M. Sivapalan, 1997a. "An Investigation Into the Physical Causes of Scaling and Heterogeneity of Regional Flood Frequency." Water Resources Research, 33(5), 1045-1059.
Robinson, J.S. and M. Sivapalan, 1997b. "Temporal Scales and Hydrologic Regimes: Implications for Flood Frequency Scaling." Water Resources Research, 33(12), 2981-2999.
Roeske, R.H., J.M. Garrett and J.H. Eychaner, 1989. "Floods of October 1983 in southeastern Arizona." U.S. Geological Survey Water-Resources Investigations Report 85-4225-C, 77 pp.
Ropelewski, C.F. and P.D. Jones, 1987. "An extension of the Tahiti-Darwin Southern Oscillation Index." Monthly Weather Review, 115, 2161-2165.
Rossi, F., M. Fiorentino and P. Versace, 1984. "Two-component extreme value distribution for flood frequency analysis." Water Resources Research, 20(7), 847-856.
Rubin, D.B., 1981. "The Bayesian bootstrap." Annals of Statistics, 9, 130-134.
Russell, S.O., 1982. "Flood Probability Estimation." Journal of Hydraulic Engineering, ASCE, 108(1), 63-73.
Shabbar, A. and M. Khandekar, 1996. "The impact of El-Nino-Southern Oscillation on the temperature field over Canada." Atmosphere-Ocean, 34, 401-416.
Sheather, S.J. and J.S. Marron, 1990. "Kernel quantile estimators." Journal of the American Statistical Association, 85, 410-416.
Singh, K.P., 1968. "Hydrologic distributions resulting from mixed populations and their computer simulation." Publication No. 81, International Association of Scientific Hydrology, 671-191.
Singh, K.P., 1974. "A two-distribution method for fitting mixed distributions in hydrology." In: Proceedings of a Symposium on Statistical Hydrology, U.S. Department of Agriculture, Agricultural Research Service, Misc. Publ. No. 1275, 371-382.
Singh, K.P., 1979. "Comments on 'Birth of a parent: the Wakeby distribution for modelling flood flows' by J.C. Houghton." Water Resources Research, 15(5), 1285-1287.
Singh, K.P. and R.A. Sinclair, 1972. "Two-distribution method for flood-frequency analysis." Journal of Hydraulics, American Society of Civil Engineers, 98(HY1), 29-44.
Singh, K.P. and M. Nakashima, 1981. "A new methodology for flood frequency analysis with objective detection and modification of outliers/inliers."
Illinois State Water Survey, Champaign, Illinois, Report No. 272, 145.
Sivapalan, M., M.J. Beven and E.F. Wood, 1990. "On hydrologic similarity, 3. A dimensionless flood frequency model using a generalized geomorphic unit hydrograph and partial area runoff generation." Water Resources Research, 26(1), 43-58.
Sivapalan, M., 1996. "Review of 'Computer models of watershed hydrology', V.P. Singh (ed.), Water Resources Publications, Colorado." Catena, 29(1), 88-90.
Sivapalan, M., C. Jothityangkoon and M. Menabde, 2001. "Linearity and nonlinearity of basin response as a function of scale: discussion of alternative definitions." Water Resources Research, 38(2).
Slack, J.R., J.R. Wallis and N.C. Matalas, 1975. "On the value of information to flood frequency analysis." Water Resources Research, 11(5), 629-647.
Slezak-Pearthree, P.M. and V.R. Baker, 1987. "Channel change along the Rillito Creek system of southeastern Arizona, 1941 through 1983." Arizona Bureau of Geological and Mineral Technology, Geological Survey Branch, Special Paper 6, 58 pp.
Smith, W., 1986. "The effects of eastern North Pacific tropical cyclones on the southwestern United States." National Oceanic and Atmospheric Administration Technical Memorandum NWS WS-197, 229 pp.
Smith, J.A., 1992. "Representation of basin scale in flood peak distributions." Water Resources Research, 28(11), 2993-2999.
Stedinger, J.R., R.M. Vogel and E. Foufoula-Georgiou, 1993. "Frequency analysis of extreme events." In: Handbook of Hydrology, edited by D.R. Maidment. New York: McGraw-Hill, Inc.
Stedinger, J.R. and L.H. Lu, 1995. "Appraisal of regional and index flood quantile estimators." Stochastic Hydrology and Hydraulics, 9, 49-75.
Stoddart, R.B.L. and W.E. Watt, 1970. "Flood frequency prediction for intermediate drainage basins in southern Ontario." C.E. Research Report No. 66, Department of Civil Engineering, Queen's University at Kingston, Ontario.
Sumioka, S.S., D.L. Kresch and K.D. Kasnick, 1997. "Magnitude and frequency of floods in Washington." USGS Water Resources Investigation Report 97-4277.
Svoboda, A., 1974. "Effects of storage on distribution parameters of peak discharges." Journal of Hydrologic Sciences, 1(1-2), 35-54.
Tarboton, D.G., R.L. Bras and I. Rodriguez-Iturbe, 1989. "Scaling and elevation in river networks." Water Resources Research, 25(9), 2037-2051.
Taylor, B., 1998. "Effect of El-Nino/Southern Oscillation (ENSO) on British Columbia and Yukon Winter Weather." Aquatic and Atmospheric Science Division, Pacific and Yukon Region, Environment Canada, Report 98-02.
U.S. Water Resources Council (USWRC), 1967. "A uniform technique for determining flood flow frequencies." Bulletin No. 15, Washington, D.C., USA.
U.S. Water Resources Council (USWRC), 1976. "Guidelines for determining flood flow frequency." Bulletin No. 17, Washington, D.C., USA.
U.S. Water Resources Council (USWRC), 1982. "Guidelines for determining flood flow frequency, Bulletin 17B." Hydrology Subcommittee, US Water Resources Council, Washington, D.C.
U.S. Interagency Advisory Committee on Water Data (USIACWD), Hydrology Subcommittee, 1983. "Guidelines for Determining Flood Flow Frequency." Bulletin No. 17B, issued 1981, revised 1983, US Geological Survey, Office of Water Data Restoration, Reston, Virginia, USA.
Versace, P., M. Fiorentino and F. Rossi, 1982. "Analysis of flood series by stochastic models." In: Time Series Methods in Hydrosciences, Ed. by A.H. El-Shaarawi and R. Esterby, Elsevier Sc. Publ., 315-324.
Viessman, W. and G.L. Lewis, 1996. "Introduction to hydrology." Fourth edition, New York: HarperCollins College Publishers, Chapter 27, 708-750.
Vogel, R.M., J.W.O. Thomas and T.A. McMahon, 1993. "Flood-flow frequency model selection in Southwestern United States." Journal of Water Resources Planning and Management, American Society of Civil Engineers, 119(3), 353-366.
Vogel, R.M. and I. Wilson, 1996. "Probability distribution of annual maximum, mean, and minimum streamflows in the United States." Journal of Hydrologic Engineering, American Society of Civil Engineers, 1(2), 69-76.
Wallis, J.R., N.C. Matalas and J.R. Slack, 1974. "Just a moment!" Water Resources Research, 10(2), 211-219.
Ward, R.C., 1971. "Small experimental watersheds: an appraisal of concepts and research developments." University of Hull, Hull Printers Ltd: Hull, UK.
Watt, W.E. and M.J.N. Nozdryn-Plotnicki, 1980. "Rainfall frequency analysis for urban design." Proceedings, Canadian Hydrology Symposium, National Research Center, Ottawa, Ontario, Canada, 34-52.
Watt, W.E., K.W. Lathem, C.R. Neill, T.L. Richards and J. Rousselle (eds), 1989. "Hydrology of floods in Canada: a guide to planning and design." National Research Council of Canada, Associate Committee on Hydrology, Ottawa, Canada.
Waylen, P.R. and M.K. Woo, 1982. "Prediction of annual floods generated by mixed process." Water Resources Research, 18(4), 1283-1286.
Waylen, P.R. and M.K. Woo, 1983. "Annual floods in southwestern British Columbia, Canada." Journal of Hydrology, 62, 95-105.
Waylen, P.R. and M.K. Woo, 1984. "Regionalization and prediction of floods in the Fraser River catchment, B.C." Water Resources Bulletin, 20(6), 941-949.
Waylen, P.R., 1985a. "A method of predicting daily peak flows in the high-flow season." Journal of Hydrology, 77, 89-105.
Waylen, P.R., 1985b. "Stochastic flood analysis in a region of mixed generating processes." Trans. Institute Britain Geography, 95-108.
Waylen, P.R. and C.N. Caviedes, 1986. "El-Nino and annual floods on the north Peruvian littoral." Journal of Hydrology, 89, 141-156.
Waylen, P.R. and M.K. Woo, 1987. "Annual low flows generated by mixed processes." Hydrological Sciences Journal, 32(3), 371-383.
Webb, R.H. and J.L. Betancourt, 1992. "Climatic variability and flood frequency of the Santa Cruz River, Pima County, Arizona." U.S.
Geological Survey Water-Supply Paper 2379, 40 pp.
Wiltshire, S.E., 1986a. "Regional flood frequency analysis I: homogeneity statistics." Hydrological Sciences Journal, 31(3), 321-333.
Wiltshire, S.E., 1986b. "Regional frequency analysis II: multivariate classification of drainage basins in Britain." Hydrological Sciences Journal, 31(3), 335-345.
Woo, M.K. and P.R. Waylen, 1984. "Areal prediction of annual floods generated by two distinct processes." Hydrological Sciences Journal, 29(1), 75-88.
Wood, E.F., M. Sivapalan and K. Beven, 1990. "Similarity and scale in catchment storm response." Reviews of Geophysics, 28(1), 1-18.
Wu, B. and J.D. Goodridge, 1974. "On the selection of probability distributions for hydrologic frequency analysis." Paper presented at the American Geophysical Union Fall Annual Meeting.

APPENDIX A

Figure 2.1 The Gila River Basin showing the locations of the gauging stations detailed in Table 2.1.
Figure 2.2 Walnut Gulch Experimental Watershed: location map, rain gauge and watershed locations (from Goodrich et al., 1997).
Figure 2.3 Major physiographic regions of British Columbia with regions considered in this study superimposed as shaded areas.
Figure 2.4 Location map of study regions in California and Colorado (Pitlick, 1994).
Figure 2.5 The River Tees and study reach, England (Archer, 1989).
Figure 3.2 Example of three histograms by month of occurrence: (a) floods occurring in one season, (b) floods occurring in two distinct seasons, and (c) floods occurring all year round.
Figure 3.3 Hypothetical sample of (a) probability density function (PDF) and (b) cumulative distribution function (CDF) plots from a unimodal distribution.
Figure 3.4 Hypothetical sample of (a) probability density function (PDF) and (b) cumulative distribution function (CDF) plots from a bimodal distribution.
Figure 3.5 Hypothetical sample of (a) probability density function (PDF) and (b) cumulative distribution function (CDF) plots from a heavy-tailed distribution.
Fishtrap Creek near McLure, WSC STN 08LB024, drainage area: 135 sq. km, 30 observations from 1915 to 1994
Figure 3.6: (a) nonparametric probability density function, (b) nonparametric cumulative distribution function, (c) histogram of floods by month of occurrence, and (d) one-week antecedent precipitation index plot.

Salmo River near Salmo, WSC STN 08NE074, drainage area: 1230 sq. km, 46 observations from 1949 to 1994
Figure 3.7: (a) nonparametric probability density function, (b) nonparametric cumulative distribution function, (c) histogram of floods by month of occurrence, and (d) one-week antecedent precipitation index plot.

Boundary Creek near Porthill, WSC STN 08NH032, drainage area: 251 sq. km, 32 observations from 1959 to 1994
Figure 3.8: (a) nonparametric probability density function, (b) nonparametric cumulative distribution function, (c) histogram of floods by month of occurrence, and (d) one-week antecedent precipitation index plot.

Halfway River near Parrell Creek, WSC STN 07FA001, drainage area: 9400 sq. km, 17 observations from 1962 to 1983
Figure 3.9: (a) nonparametric probability density function, (b) nonparametric cumulative distribution function, (c) histogram of floods by month of occurrence, and (d) one-week antecedent precipitation index plot.

Chilliwack River at Vedder Crossing, WSC STN 08MH001, drainage area: 1230 sq. km, 22 observations from 1969 to 1994
Figure 3.10: (a) nonparametric probability density function, (b) nonparametric cumulative distribution function, (c) histogram of floods by month of occurrence, and (d) one-week antecedent precipitation index plot.

Bella Coola River above Burnt Bridge Creek, WSC STN 08FB007, drainage area: 3730 sq. km, 29 observations from 1965 to 1994
Figure 3.11: (a) nonparametric probability density function, (b) nonparametric cumulative distribution function, (c) histogram of floods by month of occurrence, and (d) one-week antecedent precipitation index plot.

Zymagotitz River near Terrace, WSC STN 08EG011, drainage area: 376 sq. km, 35 observations from 1960 to 1994
Figure 3.12: (a) nonparametric probability density function, (b) nonparametric cumulative distribution function, (c) histogram of floods by month of occurrence, and (d) one-week antecedent precipitation index plot.

Zymoetz River above O.K. Creek, WSC STN 08EF005, drainage area: 2980 sq. km, 32 observations from 1963 to 1994
Figure 3.13: (a) nonparametric probability density function, (b) nonparametric cumulative distribution function, (c) histogram of floods by month of occurrence, and (d) one-week antecedent precipitation index plot.

Figure 3.14 Empirical mixed population analysis of floods: (a) frequency curves of floods classified by ENSO conditions, and (b) unclassified annual flood frequency curve for the Kitimat River below Hirsch Creek (WSC STN 08FF001)
Figure 3.15 Empirical mixed population analysis of floods: (a) frequency curves of floods classified by ENSO conditions, and (b) unclassified annual flood frequency curve for Little Wedeene River below Bowbyes Creek (WSC STN 08FF003).
Figure 3.16 Empirical mixed population analysis of floods: (a) frequency curves of floods classified by ENSO conditions, and (b) unclassified annual flood frequency curve.
Chilliwack River at Vedder Crossing (WSC STN 08MH001)
Figure 3.17 Empirical mixed population analysis of floods: (a) frequency curves of floods classified by ENSO conditions, and (b) unclassified annual flood frequency curve for Kitsumkalum River near Terrace (WSC STN 08EG006).
Figure 3.18 Empirical mixed population analysis of floods: (a) frequency curves of floods classified by ENSO conditions, and (b) unclassified annual flood frequency curve for Muskwa River near Fort Nelson (WSC STN 10CD001)
Figure 3.19 Empirical mixed population analysis of floods: (a) frequency curves of floods classified by ENSO conditions, and (b) unclassified annual flood
frequency curve for Boundary Creek near Porthill (WSC STN 08NH032)
Figure 3.20 Empirical mixed population analysis of floods: (a) frequency curves of floods classified by ENSO conditions, and (b) unclassified annual flood frequency curve for Skeena River at Usk (WSC STN 08EF001)
Figure 3.21 Empirical mixed population analysis of floods: (a) frequency curves of floods classified by ENSO conditions, and (b) unclassified annual flood frequency curve for Kettle River near Laurier (WSC STN 08NN012)
San Francisco River at Clifton, USGS STN 09444500, drainage area: 2766 sq. km, 80 observations from 1891 to 1996
Gila River at Clifton, USGS STN 09442000, drainage area: 4010 sq.
km, 74 observations from 1911 to 1995
Figure 3.22 Empirical mixed population analysis of floods: (a) frequency curves of floods classified by storm type, and (b) unclassified annual flood frequency curve.
Figure 3.23 San Francisco River at Clifton - 9444500 (a) probability density function; (b) flood series estimated by NP method; and (c) histogram by month of occurrences.
Figure 3.24 Gila River near Clifton - 9442000 (a) probability density function; (b) flood series estimated by NP method; and (c) histogram by month of occurrences.
Figure 3.25 Annual flood series for the Santa Cruz River at Tucson, Arizona.
Hydroclimatological year is November 1 to October 31 (from Webb and Betancourt, 1992)
Figure 3.26 Variation of L-coefficient of variation (L-Cv) of the annual floods with drainage area during the pre- and post-1960 conditions.
Figure 3.27 Empirical mixed population analysis of floods for the Gila River near Clifton and San Francisco River at Clifton: (a) frequency curves of floods pre-1960 and post-1960, and (b) unclassified annual flood frequency curve.
Figure 3.28 Empirical flood frequency curves at Broken Scar and Low Moor using the Cunnane plotting position formula.
Figure 3.29 Wave speed and travel time through the reach at flood discharges. The plotted lines represent smoothed 10, 50, and 90 percentiles of grouped data (from Archer, 1989).
Figure 3.30 Relationship between upstream and downstream discharges with the ratio of 12 hour mean flow to the peak flow as a parameter (from Archer, 1989)
Figure 3.31: Tees River at Low Moor - F3606 (a) probability density function; (b) flood series estimated by NP method; and (c) histogram by month of occurrences.
Figure 3.32 Tees River at Broken Scar - F3501 (a) probability density function; (b) flood series estimated by NP method; and (c) histogram by month of occurrences.
Figure 3.33 The nonparametric density function for the flood data at Broken Scar (h is the smoothing factor according to Equation 3.1)
Figure 3.34 The nonparametric density function for the flood data at Low Moor (h is the smoothing factor according to Equation 3.1)
Figure 4.1 Mixed population analysis of annual floods by the LP3 distribution for the ten streams in the Gila River Basin.
Figure 4.1(cont.) Mixed population analysis of annual floods by LP3 mixed distribution for the ten streams in the Gila River Basin.
Figure 4.2 Mixed population analysis of annual floods by the GEV distribution for the ten streams in the Gila River Basin.
Figure 4.2(cont.) Mixed population analysis of annual floods by GEV mixed distribution for the ten streams in the Gila River Basin.
Figure 4.3 Mixed population analysis of annual floods by the Wakeby distribution for the ten streams in the Gila River Basin.
Figure 4.3(cont.)
Mixed population analysis of annual floods by Wakeby mixed distribution for ten streams in the Gila River Basin.
Figure 4.4 Mixed population analysis of annual floods by the TCLN distribution for the ten streams in the Gila River Basin.
Figure 4.4(cont.) Mixed population analysis of annual floods by the TCLN distribution for the ten streams in the Gila River Basin.
Figure 4.5 Mixed population analysis of annual floods by the Nonparametric distribution for the ten streams in the Gila River Basin.
Figure 4.5(cont.) Mixed population analysis of annual floods by Nonparametric distribution for the ten streams in the Gila River Basin.
Figure 4.6 Annual flood data at Low Moor fitted by (a) GEV distribution, (b) TCLN distribution and (c) Nonparametric distribution. Data are plotted on normal probability paper using the Cunnane formula.
Figure 4.7 Mixed population analysis of annual floods by the EV1 distribution for the six streams in British Columbia.
Figure 4.8 Mixed population analysis of annual floods by the TCLN distribution for the six streams in British Columbia.
Figure 4.9 Mixed population analysis of annual floods by the Nonparametric distribution for the six streams in British Columbia.
Figure 4.10 Effect of plotting position formula on the visual assessment of the LP3 distribution fit to annual floods of the Salt River Near Roosevelt.
Figure 5.1 Flowchart summarizing the Monte Carlo simulation experiment. Flowchart steps: set the number of replications "R" and the number of data per station "N"; select the generator (Wakeby, TCLN, or TCEV input parameters); start of synthetic data generation; calculate true value Q_T for T = 2, 5, 10, 1000 yrs; generate a random number between 0 and 1; use selected method to estimate a flood value with P = random number; N = N + 1; fit the different distributions to this station; calculate fitted parameters; for each method, calculate estimated Q_T; quantify the errors; print the errors to output file; STOP.
Figure 5.2 Mixture of flood series caused by differences in the (a) mean, (b) variance, and (c) mean and variance of the two populations making up the heterogeneous distributions.
Figure 5.3 Probability density function and cumulative frequency function estimated by NP method for Scenario #1 (a hypothetical mixture similar in characteristics to flood mixture in BC). Plot annotation: L-Cv = 0.16, L-Cs = 0.19.
Figure 5.4 Probability density function and cumulative frequency function estimated by NP method for Scenario #2 (a hypothetical mixture similar in characteristics to flood mixture in BC)
Figure 5.5 Probability density function and cumulative frequency function estimated by NP method for Scenario #3 (a hypothetical mixture similar in characteristics to flood mixture in Arizona). Plot annotation: L-Cv = 0.33, L-Cs = 0.50.
Figure 5.6 Probability density function and cumulative frequency function estimated by NP method for Scenario #4 (a hypothetical mixture similar in characteristics to flood mixture in Arizona)
Figure 5.7: Scenario #1 - accuracy of estimating design floods for various return periods when the sample size is 20 (generation method is TCLN). Plot annotation: L-Cv = 0.16, L-Cs = 0.19.
Figure 5.8: Scenario #2 - accuracy of estimating design floods for various return periods when the sample size is 20 (generation method is TCLN). Plot annotation: L-Cv = 0.18, L-Cs = 0.22.
Figure 5.9: Scenario #3 - accuracy of estimating design floods for various return periods when the sample size is 20 (generation method is TCLN). Plot annotation: L-Cv = 0.33, L-Cs = 0.50.
Figure 5.10: Scenario #4 - accuracy of estimating design floods for various return periods when the sample size is 20 (generation method is TCLN). Plot annotation: L-Cv = 0.35, L-Cs = 0.44.
Figure 5.11: Scenario #1 - influence of sample size on the accuracy of 100-yr design flood.
Data are generated by TCLN
Figure 5.12: Scenario #3 - influence of sample size on the accuracy of 100-yr design flood. Data are generated by TCLN
Figure 5.13. Accuracy of design flood estimation for various return periods for Scenario #1 of BC data using different generators: (a) TCLN, (b) TCEV, and (c) Wakeby
Figure 5.14. Accuracy of design flood estimation for various return periods for Scenario #3 of Arizona data using different generators: (a) TCLN, (b) TCEV, and (c) Wakeby
Figure 6.1 Variation of regional L-coefficient of variation (L-Cv) of flood peaks with drainage area (after Smith, 1992)
Figure 6.2 Variation of regional L-coefficient of variation (L-Cv) of flood peaks with drainage area (after Cathcart,
2001)
Figure 6.3 Visual representations of the box-plot. Labeled elements: Maximum (Upper Extreme); Mean + SD (Upper Quartiles); Mean; Mean - SD (Lower Quartiles); Minimum (Lower Extreme).
Figure 6.4 Scaling behavior of the L-coefficient of variation for the annual flood series derived from daily flow data for three physiographic regions.
Figure 6.5 Scaling behavior of the L-coefficient of variation (L-Cv) for the annual flood series derived from instantaneous flow data for three physiographic regions.
Figure 6.6 Climate effect on scaling of the L-Cv in Colorado and California
Figure 6.7 Box plots of the L-coefficient of variation versus drainage area for (a) Colorado Alpine; (b) Colorado Foothills; (c) North-Central Sierra Nevada; and (d) Coast Range
Figure 6.8 Scaling behavior of L-Cv in Walnut Gulch Experimental Watershed - Arizona
Figure 6.9 Large data scatter obscures any scaling of the L-coefficient of skew for the annual flood series derived from daily flow data for three physiographic regions.
Figure 6.10 Climate Effect on Scaling of the L-coefficient of Skewness (L-Cs) in Colorado and California
Figure 6.11 Box plots of the L-coefficient of skewness (L-Cs) versus drainage area for (a) Colorado Alpine; (b) Colorado Foothills; (c) North-Central Sierra Nevada; and (d) Coast Range
Figure 6.12 Sensitivity of the flood frequency curve to changes in the L-coefficient of variation (L-Cv) obtained using the Generalized Extreme Value (GEV) distribution. Plot annotation: Mean = 1000 m³/s, L-Cs = 0.24, L-Ck = 0.17.
Figure 6.13. L-Coefficient of variation vs. drainage area for British Columbia (a) Interior Plateau, (b) Coast, and (c) Columbia/Southern Rocky Mountains. The plots show a trend line with lower and upper confidence levels at 90%.

APPENDIX B

Table 2.1 General characteristics of the 10 selected USGS hydrometric stations in the Gila River basin, Arizona.
ID | Name | NOBS | Area (km²) | L-Mean | L-Cv | L-Cs
94420000 | Gila River Near Clifton | 74 | 4010 | 9052 | 0.48 | 0.50
94445000 | San Francisco River At Clifton | 80 | 2766 | 15162 | 0.61 | 0.51
94685000 | San Carlos River Near Peridot | 68 | 1026 | 11223 | 0.48 | 0.39
94705000 | San Pedro River At Palominas | 58 | 737 | 6471 | 0.34 | 0.24
94720000 | San Pedro River Near Redington | 65 | 2927 | 10780 | 0.49 | 0.42
94820000 | Santa Cruz River At Continental | 53 | 1682 | 6451 | 0.55 | 0.55
94825000 | Santa Cruz River At Tucson | 80 | 2222 | 7177 | 0.43 | 0.41
94975000 | Salt River Near Chrysotile | 73 | 2849 | 17790 | 0.53 | 0.42
94985000 | Salt River Near Roosevelt | 73 | 4306 | 28376 | 0.57 | 0.45
95045000 | Oak Creek Near Cornville | 54 | 355 | 7815 | 0.50 | 0.33

Table 2.2 General characteristics of the hydrometric stations in Walnut Gulch Experimental Watershed, Arizona.
ID | Name | NOBS | Area (km²) | L-Mean | L-Cv | L-Cs
9471080 | Walnut Gulch 63.010 Nr Tombstone Az: Usda/Sea | 15 | 6.4 | 493 | 0.59 | 0.40
9456680 | Agricul Resrch Serv Safford Wtrshed W-V | 30 | 1.1 | 137 | 0.55 | 0.38
9451900 | Agricul Resrch Serv Safford Wtrshed W-I | 31 | 0.8 | 118 | 0.50 | 0.31
9445501 | Willow C Nr Point Of Pines(Adjusted) | 23 | 102 | 925 | 0.47 | 0.37
9471087 | Walnut Gulch 63.111 Nr Tombstone Az: Usda/Sea | 20 | 0.2 | 147 | 0.50 | 0.32
9471090 | Walnut Gulch 63.009 Nr Tombstone Az: Usda/Sea | 15 | 9.1 | 948 | 0.52 | 0.26
9471110 | Walnut Gulch 63.015 Nr Tombstone Az: Usda/Sea | 27 | 9.2 | 845 | 0.59 | 0.48
9471120 | Walnut Gulch 63.011 Nr Tombstone Az: Usda/Sea | 19 | 3.2 | 850 | 0.50 | 0.43
9471130 | Walnut Gulch 63.008 Nr Tombstone Az: Usda/Sea | 19 | 5.2 | 890 | 0.48 | 0.38
9471140 | Walnut Gulch 63.006 Nr Tombstone Az: Usda/Sea | 20 | 36.7 | 1798 | 0.45 | 0.31
9471180 | Walnut Gulch 63.003 Nr Tombstone Az: Usda/Sea | 28 | 3.5 | 374 | 0.67 | 0.58
9471185 | Walnut Gulch 63.103 Nr Tombstone Az: Usda/Sea | 19 | 0.3 | 11 | 0.36 | 0.26
9471200 | Walnut Gulch 63.001 Nr Tombstone Az: Usda/ | 25 | 57.7 | 2461 | 0.50 | 0.35
9471195 | Walnut Gulch 63.007 Nr Tombstone Az: Usda/Sea | 16 | 5.2 | 707 | 0.64 | 0.41
9480000 | Santa Cruz River Near Lochiel | 48 | 82.2 | 2191 | 0.51 | 0.41
9480500 | Santa Cruz River Nr. Nogales | 67 | 533 | 5527 | 0.42 | 0.39
9482410 | Rodeo Wash At Tucson | 12 | 7.2 | 319 | 0.40 | 0.22
9483042 | Cemetery Wash At Tucson | 25 | 1.2 | 302 | 0.29 | 0.16
9484000 | Sabino Creek Near Tucson | 65 | 35.5 | 1920 | 0.53 | 0.43
9485000 | Rincon Creek Near Tucson | 44 | 44.8 | 1899 | 0.59 | 0.43
9483100 | Tanque Verde Creek Near Tucson | 15 | 43 | 1524 | 0.32 | 0.19
9471190 | Walnut Gulch 63.002 Nr Tombstone Az: Usda/Sea/ | 28 | 43.9 | 2835 | 0.59 | 0.53

Table 2.3 General characteristics of British Columbia hydrometric stations
Station Name | Station ID | NOBS | Area (km²) | L-Mean | L-Cv | L-Cs
FRASER/THOMPSON PLATEAUS
Moffat Creek Near Horsefly | 08KH019 | 30 | 539 | 23.0 | 0.23 | 0.05
Paul Creek At The Outlet Of P. Lake | 08LB012 | 17 | 56 | 0.9 | 0.40 | 0.16
Barnes Creek Near Ashcroft | 08LF001 | 15 | 100 | 1.5 | 0.39 | 0.33
Cherry Creek Near Kamloops | 08LF005 | 17 | 53 | 1.3 | 0.50 | 0.47
Hat Creek Near Ashcroft | 08LF013 | 12 | 74 | 1.9 | 0.25 | 0.21
Baker Creek At Quesnel | 08KE016 | 31 | 1570 | 40.5 | 0.24 | 0.21
Hat Creek Near Cache Creek | 08LF015 | 34 | 658 | 5.7 | 0.39 | 0.28
Ambusten Creek Near The Mouth | 08LF081 | 18 | 33 | 0.2 | 0.57 | 0.49
Guichon Creek Below Q. Creek | 08LG032 | 30 | 800 | 5.5 | 0.32 | 0.06
South Thompson River At Chase | 08LE031 | 72 | 16200 | 977.9 | 0.13 | 0.09
South Thompson River At M. Creek | 08LE069 | 12 | 16600 | 986.4 | 0.10 | -0.08
Scuitto Creek Near Barnhart Vale | 08LE036 | 12 | 122 | 1.9 | 0.60 | 0.62
Murray Creek Near Spences Bridge | 08LF017 | 13 | 119 | 2.8 | 0.43 | 0.36
Chase Creek Near Chase | 08LE005 | 15 | 279 | 8.5 | 0.35 | -0.11
Tranquille River Near Kamloops | 08LF024 | 13 | 596 | 11.0 | 0.32 | -0.02
Clinton Creek At Clinton | 08LF038 | 11 | 64 | 0.4 | 0.37 | 0.06
Thompson River Near Walhachin | 08LF043 | 15 | 40900 | 2525.3 | 0.11 | 0.22
Thompson River Near Spences B. | 08LF051 | 43 | 54900 | 2783.7 | 0.11 | 0.05
Fiftynine Creek Near Clinton | 08LF080 | 15 | 36 | 0.2 | 0.28 | 0.32
Joe Ross Creek Near The Mouth | 08LF094 | 11 | 101 | 2.4 | 0.41 | 0.46
Chilko River Near Redstone | 08MA001 | 68 | 6940 | 301.3 | 0.11 | 0.18
Chilcotin River Below Big Creek | 08MB005 | 24 | 19300 | 340.1 | 0.15 | 0.25
Guichon Creek Above Tunkwa Lake | 08LG056 | 27 | 78 | 1.2 | 0.30 | 0.11
Fountain Creek Near Lillooet | 08MD001 | 12 | 52 | 0.6 | 0.44 | 0.23
Pavilion Creek Above Diversions | 08MD028 | 15 | 37 | 0.5 | 0.29 | 0.27
Chataway Creek Near The Mouth | 08LG066 | 11 | 32 | 0.5 | 0.18 | -0.23
Lee Creek Above Diversions | 08MD033 | 12 | 36 | 0.2 | 0.41 | 0.52
Town Creek At Lillooet | 08ME013 | 10 | 14 | 0.1 | 0.42 | 0.34
Yalakom River Above Ore Creek | 08ME025 | 12 | 575 | 22.2 | 0.19 | 0.31
Mcgillivray Creek Near Lillooet | 08MF016 | 12 | 47 | 0.3 | 0.36 | 0.04
COLUMBIA / SOUTHERN ROCKY MOUNTAINS
Swift Creek Near The Mouth | 08KA012 | 10 | 132 | 28.0 | 0.15 | 0.09
Michel Creek Above Corbin Creek | 08NK028 | 10 | 35.9 | 6.2 | 0.16 | 0.22
Dennis Creek Near 1780 M Contour | 08NM242 | 10 | 3.73 | 0.7 | 0.17 | 0.27
Bower Creek Near The Mouth | 08NA067 | 11 | | 0.1 | 0.16 | 0.14
Salmo River Near Waneta | 08NE044 | 11 | 1300 | 200.4 | 0.16 | -0.11
Vermilion River Near Radium Hot S. | 08NF004 | 11 | 951 | 128.8 | 0.14 | -0.22
Corn Creek Near Creston | 08NH068 | 11 | 133 | 26.4 | 0.15 | 0.18
Coffee Creek Near Ainsworth | 08NH101 | 11 | 87.3 | 18.4 | 0.22 | -0.24
Two Forty Creek Near Penticton | 08NM240 | 11 | 5 | 0.9 | 0.20 | -0.06
Two Forty-One Creek Near Pendict. | 08NM241 | 11 | 4.5 | 0.9 | 0.21 | 0.08
Horsefly River At Horsefly | 08KH007 | 12 | 2310 | 147.9 | 0.18 | -0.26
Clearwater River At Inlet To C. L. | 08LA009 | 12 | 2380 | 434.6 | 0.16 | -0.30
Hobson Creek Below Bois G. Creek | 08LA018 | 12 | 162 | 44.4 | 0.14 | 0.29
Horsethief Creek Near Wilmer | 08NA005 | 12 | 1850 | 67.9 | 0.17 | -0.16
Kootenai River Near Rexford | 08NG039 | 12 | 21800 | 1496.9 | 0.16 | 0.25
Goose Creek Near Cresant Valley | 08NJ112 | 12 | 83.7 | 13.9 | 0.24 | -0.14
Smoky Creek Above Diversions | 08NJ162 | 12 | 5.59 | 0.4 | 0.14 | 0.14
Elliot Creek Above Diversions | 08NJ165 | 12 | | 0.1 | 0.21 | 0.27
Blueberry Creek Near B. Creek | 08NE073 | 13 | 145 | 14.8 | 0.20 | 0.12
Hosmer Creek Above Diversions | 08NK026 | 13 | 6.4 | 1.1 | 0.16 | -0.01
Elk River Below Weary Creek | 08NK027 | 13 | 334 | 39.6 | 0.13 | 0.22
Moody Creek Near Christina | 08NN021 | 13 | 13.5 | 1.2 | 0.18 | 0.16
Cariboo River Near Keithley Creek | 08KH013 | 14 | 2870 | 391.6 | 0.14 | 0.17
Muller Creek Near The Mouth | 08KB006 | 16 | 134 | 34.4 | 0.15 | 0.13
Sand Creek Near Galloway | 08NG010 | 16 | 135 | 24.4 | 0.18 | -0.02
Little Sand Creek Near Jaffay | 08NG011 | 17 | 116 | 5.2 | 0.17 | 0.27
Phillips Creek Near Rosville | 08NG048 | 17 | 53 | 5.6 | 0.19 | 0.02
Howell Creek Above Cabin Creek | 08NP003 | 17 | 145 | 17.8 | 0.11 | 0.12
Cabin Creek Near The Mouth | 08NP004 | 17 | 93.2 | 17.8 | 0.12 | 0.08
Bugaboo Creek Near Spillimacheen | 08NA001 | 18 | 381 | 63.7 | 0.17 | 0.07
Chuchinka Creek Near The Mouth | 07EE009 | 19 | 311 | 46.8 | 0.18 | 0.11
Carbonate Creek Near Mcmurdo | 08NA037 | 19 | 8.03 | 0.4 | 0.14 | 0.24
Columbia River At Surprise Rapids | 08NB006 | 19 | 14000 | 1459.5 | 0.09 | 0.13
Couldry River In Lot 9380 | 08NP002 | 19 | 118 | 17.2 | 0.19 | 0.04
Mitchell River At Outlet Of M .
L . 08KH014 20 245 45.0 0.11 0.03 Toby Creek Near Athalmer 08NA012 21 684 75.5 0.20 -0.20 Split Creek At The Mouth 08NB016 21 81.3 9.7 0.13 0.19 West Kettle River Below C. Creek 08NN022 21 1170 89.6 0.11 -0.15 Hidden Creek Near The Mouth 08NE114 22 57 12.4 0.16 0.15 Albert River At 1310 M Contour 08NF005 22 70 14.3 0.17 0.34 Palliser River In Lot S149 08NF006 22 653 95.5 0.13 0.37 Mather Creek Below Houle Creek 08NG076 22 136 8.0 0.20 -0.01 St. Mary River Below Morris Creek 08NG077 22 206 50.0 0.11 0.32 Fry Creek Below Carney Creek 08NH130 22 461 130.2 0.10 0.11 Carney Creek Below P. Creek 08NH131 22 118 30.9 0.11 0.13 Lemon Creek Above South L . Creek 08NJ160 22 178 34.0 0.18 0.33 Mckale River Near 940 M Contour 08KA009 23 252 55.4 0.10 0.13 Redfish Creek Near Harrop 08NJ061 23 26.2 7.1 0.18 -0.09 212 Table 2.3 (Cont.) General characteristics of British Columbia hydrometric stations C O L U M B I A / SOUTHERN R O C K Y M O U N T A I N S Staation Name Stations NOBS Area L-Mean L-Cv L-Cs Clearwater River At Outlet Of H . L . 08LA013 24 904 184.8 0.13 0.15 Line Creek At The Mouth 08NK022 24 138 16.5 0.25 0.44 Elk River At Stanley Park 08NK012 25 3520 393.7 0.16 0.18 Fording River At The Mouth 08NK018 25 619 61.6 0.25 0.31 Michel Creek Below Natal 08NK020 25 637 94.9 0.19 0.22 Skookumuck Creek Near S. 08NG051 26 637 80.9 0.16 0.13 Moyie River At Moyie 08NH034 26 251 74.2 0.16 0.29 Elk River At Fernie 08NK002 27 3520 276.5 0.22 0.19 Trapping Creek Near The Mouth 08NN019 29 144 14.6 0.14 -0.06 Duck Creek Near Wynndel 08NH016 30 57 4.5 0.24 0.13 Moyie River Above Negro Creek 08NH120 30 240 48.2 0.17 0.07 West Kettle River Near Mcculloch 08NN015 30 230 35.3 0.16 -0.17 Kicking Horse River At Golden 08NA006 31 684 242.4 0.15 0.23 Kootenay River At Fort Steele 08NG065 31 11400 1013.5 0.13 0.23 Sullivan Creek Near Canyon 08NH115 31 6.22 0.3 0.20 0.16 Slocan River At Slocan City 08NJ014 31 1660 247.7 0.11 0.13 Horsefly River Above Mckinley C. 
08KH010 32 785 108.9 0.12 0.12 Granby River At Grand Forks 08NN002 34 2050 248.1 0.14 -0.02 Mcgregor River At Lower Canyon 08KB003 35 4770 1111.5 0.13 0.27 Murtle River Above Dawson Falls 08LA004 35 1380 197.5 0.12 0.13 Deer Creek At Dear Park 08NE087 36 81 7.4 0.18 0.10 Arrow Creek Near Erickson 08NH084 36 78.7 12.3 0.21 -0.13 Kaslo River Below Kemp Creek 08NH005 37 453 89.1 0.16 0.14 Duncan River Near Howser 08NH001 38 2160 407.9 0.13 0.05 Moose River Near Red Pass 08KA008 40 458 92.2 0.16 0.20 Kootenay River At Newgate 08NG042 41 20000 1647.8 0.16 0.00 Fraser River At Mcbride 08KA005 42 6890 911.0 0.09 0.08 Beaton Creek Near Beaton 08NE008 43 100 12.3 0.19 0.13 Kootenay River At Canal Flats 08NF002 43 5390 520.9 0.16 0.08 Elk River Near Natal 08NK016 43 1870 164.5 0.20 0.18 Clearwater River At Outlet Of C. L . 08LA007 44 2950 621.6 0.11 0.11 Fraser River At Shelley 08KB001 45 32400 3216.0 0.11 0.14 Salmo River Near Salmo 08NE074 46 1230 240.7 0.12 0.15 Big Sheep Creek Near Rossland 08NE039 47 347 47.3 0.15 -0.02 St. Mary River Near Marysville 08NG046 49 1480 289.6 0.14 0.05 Columbia River Near F. Hot Springs 08NA045 50 891 44.6 0.19 0.15 Columbia River At Donald 08NB005 50 14000 703.8 0.13 0.18 Lardeau River At Marblehead 08NH007 52 1620 277.9 0.11 0.11 Kootenay River At Kootenay C. 08NF001 53 420 31.9 0.17 0.03 St. Marry River At Wycliffe 08NG012 53 2360 391.6 0.16 0.27 Quesnel River Near Quesnel 08KH006 56 11500 759.7 0.12 0.13 213 Table 2.3 (Cont.) General characteristics of British Columbia hydrometric stations C O L U M B I A / SOUTHERN R O C K Y M O U N T A I N S Staation Name Stations NOBS Area L-Mean L-Cv L-Cs Kootenay River At Wardner 08NG005 59 13600 1192.5 0.14 -0.03 Kettle River Near Laurier 08NN012 65 9840 586.0 0.13 0.01 Moyie River At Eastport 08NH006 66 1480 142.9 0.18 0.04 Kettle River Near Ferry 08NN013 66 5750 338.4 0.14 0.01 Cariboo River Below K. 
Creek 08KH003 67 3260 383.5 0.13 0.04 Quesnel River At Likely 08KH001 70 5930 394.2 0.13 0.09 Slocan River Near Cresent Valley 08NJ013 71 3320 435.3 0.14 0.14 Columbia River At Nicholson 08NA002 90 6660 436.9 0.14 0.08 C O A S T A L B R I T I S H C O L U M B I A Lingfield Creek Near The Mouth 08MA006 20 98 8.1 0.20 0.28 Big Creek Below Graveyard Creek 08MB007 20 232 19.3 0.23 0.50 Pasayten River Above Calcite C. 08NL069 20 562 68.1 0.21 0.10 Bella Coola River Near H . 08FB002 21 4040 519.6 0.18 0.36 Similkameen River Above G. Creek 08NL070 21 407 71.7 0.16 -0.04 Tulameen River Below Vuich Creek 08NL071 21 256 66.9 0.21 0.20 Buck Creek At The Mouth 08EE013 22 580 42.5 0.19 0.19 Kitsumkalum River Near Terrace 08EG006 22 2180 477.8 0.17 0.24 Mackay Creek At Mon troy al B. 08GA061 22 4 3.5 0.32 0.22 Carnation Creek At The Mouth 08HB048 22 10 15.2 0.20 0.11 Anderson Creek At The Mouth 08MH104 22 27 9.9 0.23 -0.05 Premier Creek Near Queen C. 08OA003 22 0.2 0.20 0.29 Kemano River Above P. Tailrace 08FE003 23 583 353.4 0.33 0.30 Cheakamus River Near Mons 08GA024 23 287 99.7 0.17 0.24 Bridge River At Lajoie Falls 08ME004 23 956 195.9 0.09 0.03 Mosley Creek Near Dumbell Lake 08GD007 24 1550 214.1 0.12 0.12 Soo River Near Pemberton 08MG007 24 283 102.4 0.18 0.34 Skeena River At Glen Vowell 08EB003 25 25900 3198.8 0.16 0.26 Nanika River At Outlet Of K. Lake 08ED001 25 741 123.5 0.20 0.17 Green River Near Rainbow 08MG004 26 195 39.2 0.14 0.40 Iskut River Above Snippaker Creek 08CG004 27 7230 1563.6 0.20 0.45 Pallant Creek Near Queen C. 
08OB002 27, 76.7 44.3 0.26 -0.06 Surprise Creek Near The Mouth O8DAO05 28 221 83.7 0.20 0.23 Bear River Above Bitter Creek 08DC006 28 350 131.0 0.16 0.25 Mahood Creek Near Newton 08MH018 28 18 9.4 0.22 0.21 Unuk River Near Stewart 08DD001 29 1480 554.3 0.19 0.20 Coldwater River Near Brookmere 08LG048 29 316 60.8 0.16 0.18 Skagit River Near Hope 08PA001 29 907 158.7 0.20 0.04 Kispiox River Near Hazelton 08EB004 30 1870 311.5 0.23 0.43 Salloomt River Near Hagensborg 08FB004 30 161 61.6 0.27 0.16 214 Table 2.3 (Cont.) General characteristics of British Columbia hydrometric stations COASTAL BRITISH COLUMBIA Staation Name Stations NOBS Area L-Mean L-Cv L-Cs Nusatsum River Near Hagensborg 08FB005 30 269 95.6 0.21 0.15 Whipsaw Creek Below L. Creek 08NL036 30 85 8.9 0.27 0.29 Bulkley River Near Houston 08EE003 31 2380 118.5 0.21 0.00 Spius Creek Near Canford 08LG008 31 780 85.6 0.21 0.12 Yakoun River Near Port Clements 08OA002 31 477 262.6 0.23 0.20 Zymoetz River Above O.K. Creek 08EF005 32 2980 740.8 0.29 0.44 Chilliwack River Above S. Creek 08MH103 32 645 158.2 0.23 0.43 Morice River Near Houston 08ED002 33 1910 246.1 0.12 0.16 Exchamsiks River Near Terrace 08EG012 33 370 362.5 0.19 0.04 Goathorn Creek Near Telkwa 08EE008 34 132 14.3 0.25 0.25 San Juan River Near Port Renfrew 08HA010 34 580 619.1 0.19 0.02 Ucona River At The Mouth 08HC002 34 185 215.4 0.33 0.25 Zymagotitz River Near Terrace 08EG011 35 376 181.9 0.24 0.22 Tsable River Near Fanny Bay 08HB024 35 113 145.0 0.25 0.06 Zeballos River Near Zeballos 08HE006 35 181 334.9 0.26 0.19 Kanaka Creek Near W. Corners 08MH076 35 48 40.7 0.28 0.22 Iskut River Below Johnson River 08CG001 36 9350 2505.0 0.23 0.46 Coquihalla River Near Hope 08MF003 36 741 238.3 0.22 0.35 North Alouette River At 232nd S. 08MH006 37 37 42.2 0.25 0.13 Slesse Creek Near Vedder Crossing 08MH056 37 860 51.3 0.23 -0.12 Homathko River At The Mouth 08GD004 38 5720 1246.6 0.17 0.34 Campbell River At Outlet Of C.L. 
08HD001 38 1400 421.5 0.20 0.26 Koksilah River At Cowichan Station 08HA003 41 209 125.0 0.29 -0.09 Sumas River Near Huntingdon 08MH029 41 65.91739 24.2 0.25 0.23 Sarita River Near Bamfield 08HB014 43 162 289.0 0.26 0.18 Chemainus River Near Westholme 08HA001 44 355 231.3 0.29 0.11 Harrison River Near Harrison H.S. 08MG013 44 7870 1269.8 0.11 0.09 Nass River Above Shumal Creek 08DB001 56 18500 3705.4 0.15 0.39 Chilliwack River At Vedder Crossing 08MH001 60 1230 303.1 0.23 0.09 Bulkley River At Quick 08EE004 64 7360 577.1 0.14 0.09 Chilliwack River At Outlet Of C. L 08MH016 66 329 70.3 0.15 0.14 Lillooet River Near Pemberton 08MG005 75 2160 538.9 0.14 0.32 Bridge River Below Tyaug. Creek 08ME014 12 3190.0 404.7 0.07 0.05 Browns River Near Courtenay 08HB025 18 86.0 70.1 0.34 0.22 Bulkley River Near Hazelton 08EE001 12 12300.0 889.2 0.16 0.19 Carnation Creek WeirC 17 1.5 3.8 0.23 0.25 Carnation Creek WeirB 24 9.3 27.7 0.19 0.23 Carnation Creek WeirE 23 2.6 9.1 0.35 0.38 Chapman Creek Above Diversion 08GA060 18 65.0 47.5 0.22 0.20 Chehalis River Near Harrison Mil. 08MG001 15 383.0 355.7 0.26 0.18 Clayton Falls Creek Near Mouth 08FB009 15 93.0 52.5 0.21 0.08 Coquitlam River Above Coquitlam 08MH141 13 55.0 77.1 0.20 0.08 215 Table 2.3 (Cont . ) General characteristics of British Columbia hydrometric stations C O A S T A L BRITISH C O L U M B I A Staation Name Stations NOBS Area L-Mean L-Cv L-Cs Cruickshank River Near Mouth 08HB074 13 214.0 209.7 0.28 0.13 Dove Creek Near The Mouth 08HB075 10 41.0 26.9 0.16 -0.16 Elaho River Near The Mouth 08GA071 14 1250.0 644.1 0.23 0.13 Elk Creek At Prairie Central Road 08MF048 14 12.0 2.9 0.37 0.32 Gun Creek Near Minto City 08ME006 13 570.0 59.5 0.12 -0.17 Haslam Creek Near Cassidy 08HB003 15 96.0 33.5 0.34 -0.01 Homathko River At Inlet To Tatla. 
08GD008 20 227.5 9.0 0.22 0.13 Homathko River At Tragedy 08GD006 18 4070.0 579.0 0.11 0.15 Homathko River Below Creek 08GD005 20 1960.0 173.6 0.18 0.37 Hurley River Near Bralorne 08ME011 11 368.0 87.4 0.11 0.27 Jacobs Creek Above Jacobs Lake 08MH108 14 12.0 11.0 0.19 0.19 Kathlyn Creek Above Simpson 08EE010 11 16.1 0.9 0.17 0.00 Kitseguecla River Near Skeena 08EF004 12 728.0 145.4 0.24 0.25 Klinaklini River East Channel 08GE002 17 5780.0 1089.5 0.10 0.39 Kokish River Below Bonanza 08HF003 14 269.0 89.9 0.23 -0.19 Laventie Creek Near The Mouth 08JA015 19 87.0 52.7 0.31 0.40 Macivor Creek Near The Mouth 08JA016 16 53.0 6.4 0.20 -0.02 Mamquam River Above Mashiter 08GA054 19 334.0 152.0 0.22 0.21 Mashiter Creek Near Squamish 08GA057 11 39.0 22.1 0.34 0.41 Murray Creek At 216 Street 08MH129 15 26.0 10.7 0.24 0.24 Nicomekl River At 203 Street 08MH155 10 69.2 36.2 0.22 0.33 Oyster River Below Woodhus C. 08HD011 20 298.0 127.6 0.32 -0.01 Pitt River Near Alvin 08MH017 12 515.0 276.5 0.24 0.48 Richfield Creek Near Topley 08EE009 10 173.0 14.4 0.23 -0.11 Salmon River Above Campbell 08HD015 14 269.0 118.8 0.30 -0.22 Station Creek Above Diversions 08EE028 10 11.0 2.5 0.24 0.55 Telkwa River Below Tsai Creek 08EE020 19 368.0 94.4 0.23 0.33 Tsitika River Below Catherine C. 08HF004 20 360.0 294.0 0.26 0.29 Tulameen River At Coalmont 08NL008 11 1370.0 177.9 0.18 -0.09 Wahleach Creek Near Laidlaw 08MF006 14 65.0 21.2 0.23 0.31 Yorkson Creek Near Walnut 08MH097 18 6.0 3.0 0.22 0.18 Zymoetz River Near Terrace 08EF003 13 3080.0 790.5 0.13 -0.16 216 Table 2.4 General characteristics of the selected regions in Colorado and California ID A M E NOBS AREA L-Mean L-Cv L-Cs C O L O R A D O A L P I N E 6616000 North Fork Michigan River Gould, Co. 32 52 184 0.19 -0.03 6716500 Clear Creek Near Lawson, Co. 43 376 1193 0.28 0.36 6722500 South St. Vrain Creek Near Ward, Co. 24 37 247 0.18 0.13 6725500 Middle Boulder Creek At Nederland, Co. 
52 93 463 0.17 0.08 6733000 Big Thompson River At Estes Park, Co. 50 351 1161 0.24 0.28 6747500 Poudre River 12 507 2094 0.16 0.09 6748600 South Fork Cache La Poudre Rustic, Co. 23 237 538 0.26 0.24 9011000 Colorado River Near Grand Lake, Co. 63 261 745 0.27 0.24 9024000 Fraser River At Winter Park, Co. 82 71 253 0.33 0.11 C O L O R A D O F O O T H I L L S 6710500 Bear Creek At Morrison, Co. 86 420 857 0.63 0.66 6711000 Turkey Creek Near Morrison, Co. 28 128 289 0.66 0.64 6725000 Left Hand Creek At M . Longmont, Co. 18 184 212 0.44 0.31 6730300 Coal Creek Near Plainview, Co. 24 39 165 0.73 0.76 6736000 North Fork Big Tho. River At Drake 30 218 607 0.67 0.79 6739500 Buckhorn Creek 11 343 4278 0.66 0.37 6742000 Little Thompson River 16 256 1270 0.61 0.37 6736500 Big Thompson River P, Nr Drake, Co. 38 712 1534 0.33 0.46 6752000 Cache La Poudre Mo , Nr Ft Collins, Co. 114 2703 3728 0.28 0.36 N O R T H - C E N T R A L SIERRA N E V E D A 11409500 Oregon C Nr North San Juan Ca 56 88 1797 0.39 0.32 11412500 Goodyears C A Goodyears Bar Ca 20 33 602 0.44 0.40 11413000 N Yuba R B l Goodyears Bar Ca 66 640 9933 0.45 0.36 11414000 S. Yuba River 52 133 3758 0.45 0.46 11417100 Poorman Creek 16 59 1423 0.54 0.55 11413100 N Yuba R Ab Slate C Nr Strawb. Ca 19 9 20813 0.50 0.32 11426400 N . Shirtail Creek 29 23 643 0.52 0.45 11427000 N.F. American River 55 876 18173 0.47 0.33 11433100 Long Canyon Creek 32 46 1045 0.59 0.44 C O A S T R A N G E 11472900 Black Butte R Nr Covelo Ca 22 415 12427 0.34 0.27 11473000 M f Eel R Bl Black Butte R Nr Co. 
Ca 16 940 41237 0.42 0.36 11477500 Van Duzen River 16 218 13850 0.22 0.13 11477700 Little Van Duzen R Nr Bridgeville Ca 12 93 6278 0.27 0.17 11478500 Van Duzen R Nr Bridgeville Ca 56 568 21901 0.25 0.14 11481500 Redwood C Nr Blue Lake Ca 26 173 5350 0.30 0.14 11482500 Redwood C A Orick Ca 45 709 22677 0.31 0.17 11529000 Sf Trinity RNrSalyer Ca 19 2299 34799 0.31 0.32 11529800 Willow C Nr Willow C Ca 15 105 4721 0.39 0.41 11529950 Campbell C Nr Hoopa Ca 12 18 752 0.38 0.46 11531000 M f Smith R A Gasquet Ca 15 335 14234 0.20 0.10 11532000 Sf Smith R Nr Crescent City Ca 21 745 52931 0.35 0.31 11532500 Smith River 64 1571 83170 0.25 0.20 217 Table 2.5 Test for independence of annual maximum daily flows at Broken Scar near Darlington Broken Scar near Darlington Drainage area = 818 square km Annual maximum daily flow series 1957 to 1998 Spearman rank order serial correlation coeff = -. 184 d.f.= 39 corresponds to students t =-1.17 critical t value at 5% level = 1.69 - - - - 1% - =2.43 Interpretation: The null hypothesis is that the correlation is zero. At the 5% level of significance, the correlation is not significantly different from zero. That is, the data do not display significant serial dependence. not significant not significant Table 2.6 Test for trend of annual maximum daily flows at Broken Scar near Darlington Broken Scar near Darlington Drainage area = 818 square km Annual maximum daily flow series 1957 to 1998 spearman rank order correlation coeff = -.122 d.f.= 40 corresponds to students t = -.77 critical t value at 5% level =-2.02 not significant - - - - 1% - =-2.70 not significant Interpretation: the null hypothesis is that the serial (lag-one) correlation is zero. At the 5% level of significance, the correlation is not significantly different from zero. That is, the data do not display significant trend. 
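The t statistics in Tables 2.5 and 2.6 can be cross-checked from the tabulated coefficients and degrees of freedom. The sketch below assumes the common large-sample relation t = r_s * sqrt(df / (1 - r_s^2)); the thesis does not state which formula its test program used, so that relation, the no-ties ranking, and the function names `ranks`, `spearman`, and `spearman_t` are illustrative assumptions rather than the author's implementation.

```python
import math

def ranks(values):
    """Rank values from 1..n (assumes no ties, for illustration only)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

def spearman_t(r_s, df):
    """Approximate Student's t for a rank correlation r_s with df degrees of freedom."""
    return r_s * math.sqrt(df / (1.0 - r_s ** 2))

# Independence test: lag-one pairs of a 42-year series give 41 pairs, df = 39.
t_indep = spearman_t(-0.184, 39)
# Trend test: flow against time over 42 years, df = 40.
t_trend = spearman_t(-0.122, 40)
print(round(t_indep, 2), round(t_trend, 2))
```

For a lag-one check on an actual series `q`, one would call `spearman(q[:-1], q[1:])`; for trend, `spearman(list(range(len(q))), q)`. Under the assumed formula the first statistic rounds to the tabulated -1.17, and the second lands within rounding distance of the tabulated -0.77.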

Table 2.7 Test for randomness of annual maximum daily flows at Broken Scar near Darlington
Broken Scar near Darlington
Drainage area = 818 square km
Annual maximum daily flow series 1957 to 1998
The number of runs above and below the median (runab) = 20
The number of observations above the median (n1) = 21
The number of observations below the median (n2) = 21
(Note: z is the standard normal variate.)
For this test, z = 0.63
Critical z value at the 5% level = 1.96 (not significant)
Interpretation: The null hypothesis is that the data are random. At the 5% level of significance, the null hypothesis cannot be rejected. That is, the sample does not depart significantly from randomness.

Table 2.8 Test for homogeneity of annual maximum daily flows at Broken Scar near Darlington
Broken Scar near Darlington
Drainage area = 818 square km
Annual maximum daily flow series 1957 to 1998
Split by time span: subsample 1 sample size = 20, subsample 2 sample size = 22
(Note: z is the standard normal variate.)
For this test, z = -0.78
Critical z value at 5% significance level = -1.65 (not significant)
Critical z value at 1% significance level = -2.33 (not significant)
Interpretation: The null hypothesis is that there is no location difference between the two samples. At the 5% level of significance, there is no significant location difference between the two samples. That is, they appear to be from the same population.
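The runs statistic in Table 2.7 can be approximated from the three tabulated counts. The sketch below assumes the standard normal approximation for the number of runs above and below the median (the Wald-Wolfowitz form); the thesis does not say which variant its program used, so the formula and the helper names `count_runs` and `runs_test_z` are assumptions for illustration.

```python
import math
import statistics

def count_runs(series):
    """Return (runs, n_above, n_below) about the median; values equal to the median are dropped."""
    med = statistics.median(series)
    above = [v > med for v in series if v != med]
    runs = 1 + sum(a != b for a, b in zip(above, above[1:]))
    n_above = sum(above)
    return runs, n_above, len(above) - n_above

def runs_test_z(runs, n1, n2):
    """Standard normal variate for the observed number of runs (normal approximation)."""
    n = n1 + n2
    expected = 2.0 * n1 * n2 / n + 1.0
    variance = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1.0))
    return (runs - expected) / math.sqrt(variance)

# Counts reported in Table 2.7: 20 runs, 21 values above and 21 below the median.
z = runs_test_z(20, 21, 21)
print(abs(z))
```

With the tabulated counts, the expected number of runs is 22 and the magnitude of z comes out near the reported 0.63, well inside the 5% critical value of 1.96.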

Table 3.1 Approximate periods of El-Nino-Southern Oscillation conditions in equatorial Pacific Ocean (after Webb and Betancourt, 1992)
Period of time* (From | To) | El-Nino-Southern Oscillation conditions agree with: Southern Oscillation Index | Line Island Precipitation Index | Quinn and others (1987) | Rasmussen (1984)
Late 1899 | Mid-1900 | Yes | Yes | Yes | Yes
Mid-1902 | Early 1903 | Yes | Yes | Yes | Yes
Early 1905 | Mid-1906 | Yes | Yes | Yes | Yes
Mid-1911 | Mid-1912 | Yes | Yes | Yes | Yes
Mid-1914 | Mid-1915 | Yes | Yes | Yes | Yes
Mid-1918 | Late 1919 | Yes | Yes | Yes | Yes
Mid-1923 | Late 1923 | Yes | Yes | Yes | Yes
Mid-1925 | Mid-1926 | Yes | Yes | Yes | Yes
Mid-1930 | Early 1931 | Yes | Yes | Yes | Yes
Early 1932 | Late 1932 | Yes | Yes | Yes | Yes
Mid-1939 | Early 1942 | Yes | Yes | Yes | Yes
Early 1946 | Late 1946 | Yes | Yes | Yes | Yes
Early 1951 | Late 1951 | Yes | Yes | Yes | Yes
Early 1953 | Late 1953 | Yes | Yes | Yes | Yes
Early 1957 | Mid-1958 | Yes | Yes | Yes | Yes
Mid-1963 | Early 1964 | Yes | Yes | Yes | Yes
Early 1965 | Mid-1966 | Yes | Yes | Yes | Yes
Early 1969 | Late 1969 | Yes | Yes | Yes | Yes
Mid-1972 | Early 1973 | Yes | Yes | Yes | Yes
Mid-1976 | Early 1978 | Yes | Yes | Yes | Yes
Mid-1982 | Mid-1983 | Yes | Yes | Yes | Yes
Mid-1986 | Early 1987... | Yes | Yes | ... | ...
* Note tendency for El-Nino-Southern Oscillation conditions to begin in the early part of the calendar year between 1930 and 1960, compared to midyear before 1930 and after 1960.

Table 4.1 Recent flood frequency studies by L-moments that recommend use of the GEV distribution
Reference | Location | Recommended probability distribution | Number of sites
Vogel and Wilson (1996) | Continental United States | LN3, GEV, and LP3 | 1490
Onoz and Bayazit (1995) | Different parts of the world | GEV | 19
Karim and Chowdhury (1995) | Bangladesh | GEV | 31
Vogel et al. (1993b) | Australia | GEV, GP, LP3 | 61
Vogel et al. (1993a) | Southwestern United States | LN3, LN2, GEV, and LP3 | 383
Gingras and Adamowski (1992) | New Brunswick, Canada | GEV | 53
Pilon and Adamowski (1992) | Nova Scotia, Canada | GEV | 25
Pearson (1991) | South Island, New Zealand | EV1, EV2, GEV | 275
Nathan and Weinmann (1991) | Central Victoria, Australia | GEV | 53

Table 4.2 The PDF and some characteristics of distributions commonly used in flood frequency analysis
Distribution | PDF and/or CDF
Log-Pearson type III (LP3) | $f(x) = \frac{1}{|\beta|\, x\, \Gamma(\alpha)} \left[\frac{\ln x - \xi}{\beta}\right]^{\alpha-1} \exp\left[-\frac{\ln x - \xi}{\beta}\right]$
Gumbel (EV1) | $f(x) = \frac{1}{\alpha} \exp\left\{-\frac{x-\xi}{\alpha}\right\} \exp\left[-\exp\left\{-\frac{x-\xi}{\alpha}\right\}\right]$
Generalized Extreme Value (GEV) | $f(x) = \frac{1}{\alpha} \left[1 - \frac{k(x-\xi)}{\alpha}\right]^{1/k-1} \exp\left\{-\left[1 - \frac{k(x-\xi)}{\alpha}\right]^{1/k}\right\}$
Wakeby | $x(F) = \xi + \frac{\alpha}{\beta}\left[1-(1-F)^{\beta}\right] - \frac{\gamma}{\delta}\left[1-(1-F)^{-\delta}\right]$

Table 4.3 T-year flood events estimated by the various frequency distributions for the ten stations in Arizona
Location / Method  50-year flood (cms)  100-year flood (cms)  200-year flood (cms)
Salt River near Roosevelt, Ariz. 04985000
  Log Pearson Type III 5352 7787 11072
  Generalized extreme value 3936 5380 7306
  Wakeby 3964 5154 6541
  Nonparametric distribution 3320 4010 4070
  TCLN distribution 3544 3951 4364
Salt River near Chrysotile, Ariz. 9497500
  Log Pearson Type III 3115 4474 6258
  Generalized extreme value 2294 3058 4049
  Wakeby 2288 2888 3568
  Nonparametric distribution 2090 2160 2200
  TCLN distribution 2016 2251 2481
San Francisco River at Clifton, Ariz. 9444500
  Log Pearson Type III 3058 4531 6541
  Generalized extreme value 2268 3228 4559
  Wakeby 2325 3171 4247
  Nonparametric distribution 1970 2530 2590
  TCLN distribution 1934 2080 2436
San Pedro River near Redington, Ariz. 9472000
  Log Pearson Type III 1286 1591 1923
  Generalized extreme value 1305 1739 2291
  Wakeby 1271 1758 2427
  Nonparametric distribution 1440 2530 2570
  TCLN distribution 1395 1987 2738
Santa Cruz River at Continental, Ariz. 9482000
  Log Pearson Type III 909 1226 1620
  Generalized extreme value 895 1294 1860
  Wakeby 926 1294 1787
  Nonparametric distribution 960 1270 1290
  TCLN distribution 1030 1417 1894
San Carlos River near Peridot, Ariz. 9468500
  Log Pearson Type III 1294 1594 1917
  Generalized extreme value 1308 1702 2189
  Wakeby 1288 1577 1886
  Nonparametric distribution 1180 1530 1570
  TCLN distribution 1344 1689 2080
Santa Cruz at Tucson, Ariz. 9482500
  Log Pearson Type III 762 954 1178
  Generalized extreme value 787 1034 1345
  Wakeby 725 991 1373
  Nonparametric distribution 1050 1470 1500
  TCLN distribution 779 1045 1425
Oak Creek near Cornville, Ariz. 9504500
  Log Pearson Type III 784 886 977
  Generalized extreme value 892 1124 1399
  Wakeby 864 1014 1167
  Nonparametric distribution 759 787 808
  TCLN distribution 940 1177 1445
San Pedro River at Palominas, Ariz. 9470500
  Log Pearson Type III 504 575 646
  Generalized extreme value 518 606 702
  Wakeby 518 606 697
  Nonparametric distribution 501 616 642
  TCLN distribution 510 588 669
Gila River near Clifton, Ariz. 9442000
  Log Pearson Type III 1068 1370 1724
  Generalized extreme value 1124 1566 2163
  Wakeby 1150 1560 2093
  Nonparametric distribution 1380 1580 1630
  TCLN distribution 1230 1635 2119

Table 4.4 T-year flood events estimated by the various frequency distributions for the six stations in BC
Location / Method  50-year flood (cms)  100-year flood (cms)  200-year flood (cms)
Fishtrap Creek River, BC 08LB024
  Gumbel (Extreme value type I) 14.90 16.20 17.00
  Nonparametric distribution 12.60 13.10 13.50
  TCLN distribution 12.86 13.59 14.28
Kemano River, BC 08FE003
  Gumbel (Extreme value type I) 900.00 1000.00 1100.00
  Nonparametric distribution 857.00 893.00 923.00
  TCLN distribution 959.30 1074.00 1188.00
Halfway River, BC 07FA001
  Gumbel (Extreme value type I) 1420.00 1590.00 1760.00
  Nonparametric distribution 1360.00 1420.00 1460.00
  TCLN distribution 1417.70 1561.20 1703.05
Chilliwack River, BC 08MH001
  Gumbel (Extreme value type I) 1000.00 1095.00 1130.00
  Nonparametric distribution 868.00 889.00 907.00
  TCLN distribution 991.96 1045.80 1094.34
Boundary Creek near Porthill 08NH032
  Gumbel (Extreme value type I) 110.00 140.00 160.00
  Nonparametric distribution 102.00 105.00 108.00
  TCLN distribution 102.57 107.00 111.06
Fording River below Cl. Creek 08NK021
  Gumbel (Extreme value type I) 36.20 39.10 41.00
  Nonparametric distribution 33.30 34.70 35.70
  TCLN distribution 34.65 37.66 40.58

Table 5.1 Scenarios for the Monte-Carlo Simulations
Design Scenario #  α  μ1  μ2  σ1  σ2  L-Cv  L-Cs
1 0.50 2.500 2.700 0.050 0.080 0.16 0.19
2 0.50 2.500 2.500 0.050 0.200 0.18 0.22
3 0.50 2.500 2.700 0.050 0.300 0.33 0.50
4 0.50 2.500 2.500 0.050 0.400 0.35 0.44

Table 5.2: Scenario #1 - accuracy of estimating design floods for various T when the sample size is 20 (TCLN generator)
T  VAR  BIAS^2  MSE | VAR  BIAS^2  MSE
TCLN Analysis | GEV Analysis
1000 27067 609 27676 | 130550 94909 225459
500 20826 385 21211 | 68283 53033 121317
200 14086 191 14277 | 27444 21450 48893
100 9999 101 10100 | 13286 9041 22326
50 6715 46 6761 | 6517 2774 9291
20 3499 35 3534 | 3135 726 3861
10 2232 20 2252 | 2351 509 2860
5 2012 13 2025 | 1948 301 2249
2 1335 8 1343 | 1103 220 1323
LP3 Analysis | EV1 Analysis
1000 291996 215572 507568 | 13374 28491 41865
500 161740 120032 281773 | 11094 18815 29908
200 69766 48915 118681 | 8415 9373 17788
100 34711 21261 55972 | 6641 4599 11240
50 16123 7175 23297 | 5084 1653 6737
20 5377 538 5915 | 3353 515 3868
10 2567 768 3335 | 2289 271 2560
5 1706 320 2026 | 1431 150 1581
2 1129 142 1271 | 624 91 715
WAKEBY Analysis | NP Analysis
1000 87717 4282 91999 | 9943 6114 16056
500 48908 2565 51474 | 8979 3278 12257
200 22546 1367 23913 | 7719 931 8649
100 12277 833 13110 | 6759 129 6889
50 6621 450 7071 | 5779 226 6005
20 3313 151 3464 | 3724 204 3928
10 2539 144 2683 | 2567 87 2654
5 2241 87 2328 | 2095 15 2110
2 1173 45 1218 | 1405 3 1408

Table 5.3: Scenario #2 - accuracy of estimating design floods for various T when the sample size is 20 (TCLN generator)
T  VAR  BIAS^2  MSE | VAR  BIAS^2  MSE
TCLN Analysis | GEV Analysis
1000 287251 5408 292660 | 683619 6 683625
500 196234 3924 200159 | 339482 2200 341682
200 114218 2189 116407 | 132608 5369 137977
100 72324 1340 73665 | 63522 5534 69056
50 42566 663 43229 | 29285 3785 33070
20 15540 299 15839 | 9548 850 10398
10 5936 227 6163 | 3649 623 4272
5 1802 169 1971 | 1228 250 1478
2 408 130 538 | 343 101 444
LP3 Analysis | EV1 Analysis
1000 266763 61568 328331 | 33232 101925 135157
500 156648 45710 202358 | 27098 65752 92849
200 76608 27009 103616 | 19955 33083 53039
100 43506 16140 59646 | 15281 17699 32980
50 23747 7679 31426 | 11232 7557 18789
20 9658 1134 10791 | 6828 1183 8011
10 4334 1035 5369 | 4210 878 5088
5 1652 920 2572 | 2208 163 2371
2 404 325 729 | 590 87 677
WAKEBY Analysis | NP Analysis
1000 1189632 227641 1417273 | 40470 199008 239478
500 469073 47920 516992 | 39700 112936 152636
200 143924 3190 147113 | 38565 40634 79199
100 61570 2174 63744 | 37565 12157 49722
50 27421 2020 29441 | 36203 688 36891
20 9723 1021 10744 | 19242 657 19900
10 4231 265 4496 | 4334 603 4937
5 1529 70 1599 | 2096 230 2326
2 286 30 316 | 301 4 306

Table 5.4: Scenario #3 - accuracy of estimating design floods for various T when the sample size is 20 (TCLN generator)
T  VAR  BIAS^2  MSE | VAR  BIAS^2  MSE
TCLN Analysis | GEV Analysis
1000 8322 2280 10602 | 43268 23478 66746
500 6463 1496 7958 | 24834 15893 40728
200 4407 763 5170 | 10905 8686 19591
100 3129 406 3536 | 5351 4910 10261
50 2095 183 2279 | 2458 2292 4750
20 1149 43 1192 | 1086 388 1474
10 809 19 828 | 1029 385 1414
5 994 8 1002 | 1243 21 1264
2 1582 1 1583 | 1145 1 1146
LP3 Analysis | EV1 Analysis
1000 162922 87531 250452 | 5546 67613 73159
500 93466 53604 147071 | 4623 47541 52164
200 41192 25591 66783 | 3544 26831 30376
100 20151 13070 33221 | 2835 15380 18215
50 8726 5569 14295 | 2217 7300 9517
20 2332 898 3230 | 1539 1385 2924
10 1092 486 1578 | 1130 620 1750
5 1180 202 1382 | 810 24 834
2 1274 31 1305 | 534 14 548
WAKEBY Analysis | NP Analysis
1000 19811 184 19995 | 4966 272 5238
500 12543 181 12724 | 4320 259 4578
200 6835 112 6947 | 3483 154 3637
100 4121 101 4222 | 2871 53 2924
50 2330 87 2417 | 2277 50 2327
20 1122 65 1187 | 1472 18 1490
10 979 62 1041 | 1092 8 1100
5 1297 27 1324 | 900 4 904
2 1284 12 1296 | 1664 2 1666

Table 5.5: Scenario #4 - accuracy of estimating design floods for various T when the sample size is 20 (TCLN generator)
T  VAR  BIAS^2  MSE | VAR  BIAS^2  MSE
TCLN Analysis | GEV Analysis
1000 9808949 77267 9886217 | 19728758 8187032 27915790
500 5572678 37260 5609938 | 7273433 2134033 9407466
200 2517291 13776 2531067 | 1964559 186733 2151292
100 1301371 6061 1307433 | 740030 81328 821358
50 623036 2319 625355 | 284552 39765 324317
20 195437 171 195608 | 83800 36706 120506
10 73464 224 73688 | 33729 19133 52863
5 24745 40 24785 | 12814 1349 14163
2 1581 10 1591 | 2173 590 2763
LP3 Analysis | EV1 Analysis
1000 35430389 3245298 38675686 | 440822 3196855 3637677
500 12974384 803269 13777653 | 360202 1943113 2303316
200 3555006 56108 3611114 | 266058 919182 1185241
100 1347710 43298 1391008 | 204206 467781 671987
50 506249 14246 520495 | 150372 202436 352808
20 132851 18139 150990 | 91394 38209 129603
10 45076 6975 52052 | 55900 8460 64360
5 13561 4320 17881 | 28242 8084 36326
2 1711 1275 2986 | 4554 5089 9643
WAKEBY Analysis | NP Analysis
1000 18596274 6009249 24605524 | 559925 4147230 4707155
500 6864045 1523632 8387677 | 558593 2245026 2803619
200 1901099 122834 2023933 | 556534 766136 1322670
100 749881 84435 834316 | 554734 213067 767801
50 311001 45678 356679 | 552178 15385 567563
20 104368 30101 134469 | 308431 8459 316890
10 45433 15519 60952 | 45076 6975 52052
5 17066 1042 18108 | 31605 650 32255
2 1779 387 2166 | 1257 231 1488
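The accuracy tables above report VAR, BIAS^2, and MSE for each return period, and their rows are consistent with MSE = VAR + BIAS^2 (for example, 27067 + 609 = 27676 in the first TCLN row of Table 5.2). The sketch below shows how such a decomposition can be produced for one case: samples of size 20 drawn from a two-component log-normal generator and analysed with a Gumbel (EV1) fit. Several things here are assumptions and not taken from the thesis: natural logarithms for the generator parameters, method-of-moments fitting for EV1, the replicate count, and all function names. Because of those assumptions it is not expected to reproduce the tabulated magnitudes.

```python
import math
import random
import statistics

def tcln_sample(n, alpha, mu1, mu2, s1, s2, rng):
    """Draw n values from a two-component log-normal mixture (natural logs assumed)."""
    return [math.exp(rng.gauss(mu1, s1) if rng.random() < alpha else rng.gauss(mu2, s2))
            for _ in range(n)]

def tcln_cdf(x, alpha, mu1, mu2, s1, s2):
    """Mixture CDF: weighted sum of two normal CDFs evaluated at ln(x)."""
    phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    y = math.log(x)
    return alpha * phi((y - mu1) / s1) + (1.0 - alpha) * phi((y - mu2) / s2)

def tcln_quantile(p, *params):
    """Invert the mixture CDF by bisection on a logarithmic scale."""
    lo, hi = 1e-9, 1e9
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if tcln_cdf(mid, *params) < p:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def gumbel_quantile(sample, T):
    """T-year quantile from a Gumbel fit by the method of moments (an assumed fitting choice)."""
    scale = statistics.stdev(sample) * math.sqrt(6.0) / math.pi
    loc = statistics.mean(sample) - 0.5772156649 * scale
    return loc - scale * math.log(-math.log(1.0 - 1.0 / T))

def mc_accuracy(T, n=20, reps=2000, params=(0.5, 2.5, 2.7, 0.05, 0.08), seed=1):
    """Return (VAR, BIAS^2, MSE) of the Gumbel T-year estimate under the mixture generator."""
    rng = random.Random(seed)
    true_q = tcln_quantile(1.0 - 1.0 / T, *params)
    est = [gumbel_quantile(tcln_sample(n, *params, rng), T) for _ in range(reps)]
    var = statistics.pvariance(est)
    bias2 = (statistics.mean(est) - true_q) ** 2
    mse = sum((e - true_q) ** 2 for e in est) / reps
    return var, bias2, mse

var, bias2, mse = mc_accuracy(100)
print(var, bias2, mse)
```

The default `params` follow the column order of Scenario #1 in Table 5.1 (mixing weight, two location parameters, two scale parameters). Using the population variance of the replicate estimates makes the identity MSE = VAR + BIAS^2 hold exactly, up to floating-point error.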
Mixture distributions and spatial scale effects on flood hydrology Mtiraoui, Ahmed 2004
Item Metadata
Title | Mixture distributions and spatial scale effects on flood hydrology |
Creator | Mtiraoui, Ahmed |
Date Issued | 2004 |
Description | Knowledge of the magnitude and frequency of floods on rivers is necessary for a variety of practical applications, including the design of hydraulic structures such as bridges and culverts, and floodplain management through land-use allocation and flood-protection measures. Design floods estimated by fitted distributions are prone to errors associated with (i) mis-specification of the parent distribution at a single site and (ii) the estimation of flood statistics in regional analysis. The first part of this thesis deals with the mis-specification of the parent distribution, that is, the model governing the population from which the observed sample of data is supposedly drawn. Usually, traditional flood frequency analysis involves the assumption of homogeneity of the flood distribution. However, floods are often generated by heterogeneous distributions composed of a mixture of two or more populations. Differences between the populations may be due to a number of factors, including seasonal variations in the flood producing mechanisms, changes in weather patterns due to low frequency climate shifts and/or El-Niño/La-Nina oscillations, changes in channel routing due to the dominance of within channel or floodplain flow, and basin variability resulting from changes in antecedent soil moisture. We demonstrated that in many cases not recognizing these physical processes in conventional flood frequency analysis is the main reason why many frequency distributions do not provide an acceptable fit to flood data. An analysis of flow records from streams across British Columbia (Canada), the Gila River (Arizona, USA), and the River Tees (northern England) indicated that when floods are generated by two or more distinct hydrologic processes, the resulting flood distributions may be multimodal and may not be represented by homogeneous distributions. 
Analysis indicated that the T-year design flood estimated by assuming heterogeneous distributions is much more conservative than those estimated by homogeneous distributions. Monte Carlo simulations were used in this study to quantify the errors in estimating design floods that are caused by a mis-identification of distributions. A set of homogeneous and heterogeneous parametric and nonparametric distributions were compared. A series of variables were also tested, namely the return period, sample size, and combinations of several two parametric distributions. An assessment of the suitability of flood estimation techniques was made based on the effect of these conditions on the accuracy of the estimates. It was found that for high L-skewness (L-Cs) and a heavytailed probability density function, both of which are characteristics of flood mixtures in arid and semi arid climates of Arizona, the Wakeby and two-component log-normal (TCLN) distributions consistently perform well compared to the nonparametric (NP), Gumbel (EV1) and log-Pearson type III (LP3) distributions. For characteristics of flood mixtures representative of the humid climates of British Columbia, where the heterogeneity results in a flood frequency distribution with smaller value of L-skewness (L-Cs) and a bimodal probability density function, the Gumbel (EV1) and nonparametric (NP) distributions perform better than the other distributions. The second part of this thesis provides new insights that serve to improve scientific understanding and professional practice in addressing regional flood hydrology problems. Currently employed peak flow regionalisation procedures inherently make assumptions of scale invariance. One assumption is that the scaling exponent of the flood quantiledrainage area power relationship is independent of catchment size. A second assumption is that the index flood method is valid such that growth factors between flood quantiles are independent of catchment size (scale). 
A third assumption inherent in many regional flood models is the constancy in the L-coefficient of variation (L-Cv) and the L-coefficient of skewness (L-Cs) over homogeneous geographical regions. This study focuses on the spatial scaling patterns of linear moment flood statistics, and offers plausible explanations for observed regional scaling trends, in terms of the various precipitation and runoff mechanisms that dominate at different scales and in different climates. The characteristics of these mechanisms are then linked back to the effects that variations in L-moment ratio statistics have on flood quantile estimates, and most importantly, the tail behaviour of flood frequency distributions. A regional linear moment analysis of annual maximum daily flows in streams in British Columbia, California, Colorado, and the Walnut Gulch Experimental Watershed is used to demonstrate that these assumptions of scale invariance of flood statistics are invalid. This is because flood statistics depend not only on physiography and climatic conditions, but also to a large extent on the size of the catchment. Scale dependence of flood statistics hampers the estimation of peak flows, in particular for small (< 100 km²) ungauged watersheds. |
Extent | 12340568 bytes |
Genre | Thesis/Dissertation |
Type | Text |
FileFormat | application/pdf |
Language | eng |
Date Available | 2009-11-30 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0075087 |
URI | http://hdl.handle.net/2429/15992 |
Degree | Doctor of Philosophy - PhD |
Program | Forestry |
Affiliation | Forestry, Faculty of |
Degree Grantor | University of British Columbia |
GraduationDate | 2004-05 |
Campus | UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
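
The abstract's central idea, that a T-year design flood drawn from a mixture of two flood populations can differ from the estimate given by a single homogeneous distribution, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the components are Gumbel (EV1) distributions, the mixture is additive (each annual flood comes from population 1 with probability `p`, otherwise from population 2), and the parameter values and function names are hypothetical. The T-year flood is the value whose mixture CDF equals 1 - 1/T, found here by bisection.

```python
import math


def gumbel_cdf(x, loc, scale):
    """CDF of the Gumbel (EV1) distribution."""
    return math.exp(-math.exp(-(x - loc) / scale))


def gumbel_quantile(T, loc, scale):
    """Closed-form T-year quantile of a single Gumbel distribution."""
    return loc - scale * math.log(-math.log(1.0 - 1.0 / T))


def mixture_cdf(x, p, comp1, comp2):
    """Additive two-component mixture: F(x) = p*F1(x) + (1 - p)*F2(x)."""
    return p * gumbel_cdf(x, *comp1) + (1.0 - p) * gumbel_cdf(x, *comp2)


def mixture_quantile(T, p, comp1, comp2, tol=1e-9):
    """T-year quantile of the mixture, solved by bisection.

    The mixture quantile always lies between the two component
    quantiles, so those are used as the initial bracket.
    """
    target = 1.0 - 1.0 / T
    q1 = gumbel_quantile(T, *comp1)
    q2 = gumbel_quantile(T, *comp2)
    lo, hi = min(q1, q2), max(q1, q2)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, p, comp1, comp2) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical (loc, scale) parameters in m^3/s for two flood populations.
frequent_floods = (100.0, 30.0)
rare_large_floods = (250.0, 80.0)

q100_mixture = mixture_quantile(100, 0.7, frequent_floods, rare_large_floods)
q100_single = gumbel_quantile(100, *frequent_floods)
print(f"100-year flood, mixture: {q100_mixture:.1f}")
print(f"100-year flood, first population only: {q100_single:.1f}")
```

With these assumed parameters the mixture estimate exceeds the single-population estimate because the second population dominates the upper tail. Other mixture formulations, such as taking the annual maximum of two independent processes, lead to a product of CDFs instead of a weighted sum.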