A Probabilistic Inflow Forecasting System for Operation of Hydroelectric Reservoirs in Complex Terrain

by

Dominique R. Bourdin

B.Sc. Atmospheric Science, The University of British Columbia, 2009

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Atmospheric Science)

The University of British Columbia
(Vancouver)

September 2013

© Dominique R. Bourdin, 2013

Abstract

This dissertation presents a reliable probabilistic forecasting system designed to predict inflows to hydroelectric reservoirs. Forecasts are derived from a Member-to-Member (M2M) ensemble in which an ensemble of distributed hydrologic models is driven by the gridded output of an ensemble of numerical weather prediction (NWP) models. Multiple parameter sets for each hydrologic model are optimized using objective functions that favour different aspects of forecast performance. On each forecast day, initial conditions for each differently-optimized hydrologic model are updated using meteorological observations. Thus, the M2M ensemble explicitly samples inflow forecast uncertainty caused by errors in the hydrologic models, their parameterizations, and in the initial and boundary conditions (i.e., meteorological data) used to drive the model forecasts.

Bias is removed from the individual ensemble members using a simple degree-of-mass-balance bias correction scheme. The M2M ensemble is then transformed into a probabilistic inflow forecast by applying appropriate uncertainty models during different seasons of the water year. The uncertainty models apply ensemble model output statistics to correct for deficiencies in M2M spread. Further improvement is found after applying a probability calibration scheme that amounts to a re-labelling of forecast probabilities based on past performance.

Each component of the M2M ensemble has an associated cost in terms of time and/or money. The relative value of each ensemble component is assessed by removing it from the ensemble and comparing the economic gains associated with the reduced ensembles to those achieved using the full M2M system. Relative value is computed using a simple (static) cost-loss decision model in which the reservoir operator takes action (lowers the reservoir level) when significant inflows are predicted with probability exceeding some threshold.

The probabilistic reservoir inflow forecasting system developed in this dissertation is applied to the Daisy Lake hydroelectric reservoir located in the complex terrain of southwestern British Columbia, Canada. The hydroclimatic regime of the case study watershed is such that flashy fall and winter inflows are driven by Pacific frontal systems, while spring and summer inflows are dominated by snow and glacier melt. Various aspects of ensemble and probabilistic forecast performance are evaluated over a period of three water years.

Preface

The main body of this dissertation is comprised of work from two published journal papers (Chapter 1 and a combination of Chapters 2 and 3), one submitted paper (Chapter 4), and one in preparation (Chapter 5). Submitted and published papers have been reformatted to meet dissertation formatting requirements.
Minor editing changes may have been made, but the content is otherwise unaltered.

Funding for this research was provided by the Canadian Natural Sciences and Engineering Research Council (NSERC) in the form of an Alexander Graham Bell Canada Graduate Scholarship. Additional funding was provided by a NSERC Discovery Grant to Professor Stull.

Chapter 1

Sections 1.1 and 1.2.1 of the introductory chapter are adapted from the following published paper:

Bourdin, D. R., S. W. Fleming and R. B. Stull, 2012: Streamflow modelling: A primer on applications, approaches and challenges. Atmosphere-Ocean, 50, 507-536.

I wrote the original manuscript, with significant guidance and contributions from Dr. Fleming and editing by Professor Stull. I wrote the remainder of the chapter.

Chapter 2

Contents of Chapter 2 (Bias-Corrected Short-Range Member-to-Member Ensemble Forecasts of Reservoir Inflow) are adapted from the following published paper:

Bourdin, D. R. and R. B. Stull, 2013: Bias-corrected short-range Member-to-Member ensemble forecasts of reservoir inflow. Journal of Hydrology, 502, 77-88.

I conducted the experiments and analysis and wrote the original chapter with editing by Professor Stull.

Chapter 3

Contents of Chapter 3 (Improving Ensemble Forecasts of Reservoir Inflow by Sampling Uncertainty in the Modelling Chain) are adapted from the same published paper as Chapter 2:

Bourdin, D. R. and R. B. Stull, 2013: Bias-corrected short-range Member-to-Member ensemble forecasts of reservoir inflow. Journal of Hydrology, 502, 77-88.

I conducted the experiments and analysis and wrote the original chapter with editing by Professor Stull.

Chapter 4

Bourdin, D. R., T. N. Nipen and R. B. Stull, 2013: Reliable Probabilistic Forecasts from an Ensemble Reservoir Inflow Forecasting System. Submitted for professional peer review on 6 June 2013. Undergoing revisions.

Additions and modifications to COMPS schemes were implemented by Dr. Thomas Nipen and myself. I carried out the experiments and analysis and wrote the original chapter. Editing was done by Dr. Nipen and Professor Stull.

Chapter 5

Bourdin, D. R. and R. B. Stull, 2013: On the Importance of Sampling Hydrologic Uncertainty: An Economic Analysis. In preparation.

I conducted the experiments and analysis and wrote the original chapter, with editing by Professor Stull.

Appendix A

I wrote the descriptions of verification measures used throughout the dissertation. Editing was done by Professor Stull.

Appendix B

As for Chapter 4.

Appendix C

As for Chapter 4.

Table of contents

Abstract
Preface
Table of contents
List of tables
List of figures
Acknowledgments
Dedication
Chapter 1: Introduction
1.1 Uncertainty in Hydrologic Model Predictions
1.2 Previous Related Work
1.2.1 Ensemble Hydrologic Modelling
1.2.2 Uncertainty Modelling
1.2.3 Statistical Postprocessing
1.3 Dissertation Case Study and Contributions
1.3.1 Sampling Uncertainty in Inflow Forecasts
1.3.2 Bias Correction
1.3.3 Calibrated Probability Forecasts
1.3.4 Economic Analysis
Chapter 2: Bias-Corrected Short-Range Member-to-Member Ensemble Forecasts of Reservoir Inflow
2.1 Introduction
2.2 Case Study Area and Data
2.3 A Member-to-Member (M2M) Ensemble Forecasting System
2.3.1 Numerical Weather Prediction Models
2.3.2 Distributed Hydrologic Models
2.3.3 Downscaling of Meteorological Input
2.4 A Simple Bias Correction Method
2.5 Verification Approach
2.6 Results and Discussion
2.7 Conclusions
Chapter 3: Improving Ensemble Forecasts of Reservoir Inflow by Sampling Uncertainty in the Modelling Chain
3.1 Introduction
3.2 Case Study Area and Data
3.3 A Member-to-Member (M2M) Ensemble Forecasting System
3.3.1 A Multi-NWP Ensemble
3.3.2 A Multi-Hydrologic Model Ensemble
3.3.3 A Multi-Parameter Hydrologic Ensemble
3.3.4 A Multi-State Hydrologic Ensemble
3.3.5 Bias Correction of Inflow Forecasts
3.4 Results and Discussion
3.5 Concluding Remarks
Chapter 4: Reliable Probabilistic Forecasts from an Ensemble Reservoir Inflow Forecasting System
4.1 Introduction
4.2 Case Study
4.2.1 Study Dates and Data
4.2.2 The Member-to-Member (M2M) Ensemble Forecasting System
4.2.3 A COmmunity Modular Post-processing System (COMPS)
4.3 From Ensembles to Calibrated Probability Forecasts
4.3.1 Uncertainty Modelling in the COMPS Framework
4.3.2 Metrics of Probabilistic Forecast Quality
4.3.3 Probability Calibration Method
4.4 Results and Discussion
4.4.1 Performance of the Uncertainty Models
4.4.2 Effect of Probability Calibration
4.5 Concluding Remarks
Chapter 5: On the Importance of Sampling Hydrologic Uncertainty: An Economic Analysis
5.1 Introduction
5.2 Economic Value of Forecasts
5.2.1 A Simple Cost-Loss Decision Model
5.3 Case Study
5.3.1 Study Dates and Data
5.3.2 The Member-to-Member (M2M) Ensemble Forecasting System
5.3.3 Ensemble Reduction Test Cases
5.3.4 Cost-Loss Model Development for Daisy Lake
5.4 Results and Discussion
5.4.1 Quality and Skill of Reduced Ensemble Forecasts
5.4.2 Economic Value of Ensemble Components
5.5 Conclusions
Chapter 6: Conclusions
6.1 Summary of Methods and Procedures
6.2 Summary of Findings
6.3 Potential Applications
6.4 Limitations and Recommendations for Future Work
Bibliography
Appendix A: Forecast Verification Metrics
A.1 Measures-Oriented Verification for Deterministic Forecasts
A.2 Distributions-Oriented Verification for Ensemble and Probabilistic Forecasts
Appendix B: Testing an Adaptive Bias Corrector for Daisy Lake Inflow Forecasts
Appendix C: Bayesian Model Averaging and the M2M Ensemble

List of tables

2.1 Performance of simulated inflows from the WaSiM and WATFLOOD hydrologic models during optimization (1997–2007) and validation (1986–1996) periods. Measures of model performance are described in Section 2.5 and Appendix A.

3.1 Selected model parameters for the WaSiM hydrologic model, as optimized by the DDS algorithm using different objective functions.

3.2 Same as Table 3.1, but for the two primary land classes in the WATFLOOD model.

5.1 Cost-loss contingency table of inflow forecasts and observations. The number of forecast hits is given by a, b is the number of false alarms, c the number of misses, and d the number of correct rejections. Action is taken when a particular inflow exceedance event is forecast to occur, incurring a cost C, while events that were not forecast result in losses L. Correct rejections result in no costs or losses.
5.2 Physical parameter values for the Daisy Lake reservoir for cost-loss calculations. Values are taken from McCollor and Stull (2008b).

5.3 Cost-loss ratios for the Daisy Lake reservoir calculated using the basic model [Eq. (5.11)], including climatological frequency s [Eq. (5.12)], and including a variable electricity market [Eq. (5.13)] with Sm/Sc = 2.5.

5.4 Actual expenses incurred over the two-year evaluation period by using various M2M configurations for decision making at the 70 m3/s threshold. Expenses are calculated for α = 0.036 using the probability threshold, pt, of 0.04. The loss L incurred for each missed forecast at this threshold is $216,543.

5.5 Actual expenses incurred over the two-year evaluation period by using various M2M forecasts for decision making at the 100 m3/s inflow anomaly threshold. Expenses are calculated for α = 0.075 using a probability threshold, pt, of 0.08. L for this threshold is $309,348.

A.1 Contingency table for calculating hit rates and false alarm rates. The number of forecast hits is given by a, b is the number of false alarms, c the number of misses, and d the number of correct rejections.

B.1 A comparison of ensemble mean inflow forecast performance after applying a DMB bias correction computed adaptively for a range of time scales (τ) and computed over a 3-day moving window using the linearly-weighted corrector described in Chapter 2. Smaller values of MAE and RMSE are preferred.

List of figures

2.1 Map of the Cheakamus basin above the Daisy Lake reservoir, located in southwestern BC. ASTER global digital elevation model background map is a product of the Japanese Ministry of Economy, Trade and Industry (METI) and the National Aeronautics and Space Administration (NASA), with higher elevations represented by lighter shades of grey.

2.2 Flowchart illustrating the process of generating updated hydrologic states, simulated inflows, and forecasted inflows for a particular hydrologic model.

2.3 Raw ensemble traces for day 1 (top) and day 2 (bottom) forecasts during the 2009–2010 water year. Traces from the individual hydrologic models exhibit consistent bias, indicating a failure to accurately simulate the hydrologic state within the watershed.

2.4 Results of applying bias correction schemes with varying window lengths to day 1 and day 2 forecasts as measured by ensemble mean verification metrics. Perfect forecasts have DMB, NSE, LNSE and RMSESS of one, and MAE and RMSE of zero.

2.5 Ensemble traces for day 1 (top) and day 2 (bottom) forecasts during the 2009–2010 water year following LDMB3 bias correction.

2.6 Rank histograms for day 1 and day 2 raw and LDMB3 bias-corrected ensemble forecasts.

2.7 ROC curves for day 1 ensemble forecasts for forecasted inflow anomalies greater than -5.0 m3/s (dot-dashed line), 2.7 m3/s (dashed line) and 19.5 m3/s (solid line). The dotted line is the zero-skill line.
2.8 Brier skill score (BSS = 1 is perfect), relative reliability (zero is perfect) and relative resolution (one is perfect) for raw and bias-corrected forecasts for days one and two. The inflow anomaly threshold evaluated here is 19.5 m3/s.

3.1 Map of the Cheakamus watershed showing land-use/land cover classes utilized in the WATFLOOD model. Map derived from data provided by BC Hydro.

3.2 Snow-water equivalent at the Squamish Upper proxy site as simulated by the WaSiM hydrologic model using the MAEo, NSEo and LNSEo parameter sets.

3.3 Flowchart illustrating the process of generating updated hydrologic states, simulated inflows, and forecasted inflows for a particular hydrologic model. Solid lines show the flow of meteorological observations into the model and the production of simulated inflows and updated hydrologic states for the following day. Dashed lines show the flow of NWP forecasts into the model and the resulting 2-day inflow forecasts.

3.4 Daisy Lake inflows during fall and early winter of the 2009–2010 water year simulated by the WaSiM model using the MAEo, NSEo and LNSEo parameter sets.

3.5 As in Figure 3.2, but simulations done by the WATFLOOD hydrologic model.

3.6 The flow of information into and out of the WaSiM model for generating MAEo forecasts. Each model (WaSiM and WATFLOOD) and each parameterization (MAEo, NSEo and LNSEo) generates 12 different daily forecasts in this way for a combined total of 72 unique daily forecasts.

3.7 Performance of day 1 MAEo, NSEo and LNSEo forecasts from the WaSiM and WATFLOOD models driven by the 4-km WRF NWP output fields. Perfect inflow forecasts have NSE and LNSE equal to one (unitless), and MAE of zero m3/s.

3.8 Performance of the bias-corrected M2M ensemble mean with MAEo ensemble members only, and with the addition of the NSEo and LNSEo ensemble members. Perfect inflow forecasts have DMB, NSE, LNSE and RMSESS equal to one (unitless), and MAE and RMSE of zero m3/s.

3.9 Brier skill score (BSS = 1 is perfect), relative reliability (zero is perfect) and relative resolution (one is perfect) for different ensemble forecasts for days 1 and 2. Scores for bias-corrected forecasts are indicated by bar heights, while those for raw forecasts are indicated by triangles. The inflow anomaly threshold evaluated here is 19.5 m3/s.

3.10 ROC diagrams for raw and LDMB3 bias-corrected day 1 forecasts for inflow anomalies greater than -5.0 m3/s (dot-dashed line), 2.7 m3/s (dashed line) and 19.5 m3/s (solid line). The dotted line is the zero-skill line.

3.11 Rank histograms for the bias-corrected MAEo-only and full ensembles. The full ensemble has greater dispersion as indicated by a smaller percentage of observations falling into the extreme bins of the histogram.

3.12 Raw ensemble traces for day 1 (top) and day 2 (bottom) forecasts during the 2009–2010 water year for all hydrologic model parameterizations.

3.13 As in Figure 3.12, but following LDMB3 bias correction.
4.1 The flow of information into and out of the WaSiM model for generating forecasts with the MAE-optimized parameter set. The forecast workflow is indicated by the solid arrows. Dashed arrows illustrate how meteorological observations are used to update the model configuration's hydrologic state for the following day's forecasts. The model configuration is specified by the dash-dotted arrows.

4.2 Rank histograms for the M2M ensemble forecasts at lead times of 1–3 days. The ensemble forecasting system is underdispersive for all forecast horizons as indicated by the large percentage of observations that fall outside the range of the ensemble.

4.3 Empirical distributions of M2M ensemble mean forecast errors (m3/s) for forecast days 1 and 2 during the 2009–2010 water year. Errors computed after a log transformation (LT) of forecasts and observations are generally more Gaussian, though the raw day 2 warm season forecast errors exhibit a Gaussian shape.

4.4 PIT histograms for the storm seasons (top row), warm seasons (middle row), and full water years (bottom row), pooled over the 2010–2011 and 2011–2012 water years. Results are for the uncalibrated EMOS uncertainty model. Calibration deviations D are shown for each histogram, with E[Dp] for comparison. Flatter histograms and therefore lower D are preferred.

4.5 PIT histograms for the storm seasons (top row), warm seasons (middle row), and full water years (bottom row), pooled over the 2010–2011 and 2011–2012 water years. Results are for the uncalibrated log-EMOSv uncertainty model. Calibration deviations D are shown for each histogram, with E[Dp] for comparison.

4.6 PIT histograms for all forecast horizons during the 2010–2011 and 2011–2012 storm seasons using the uncalibrated log-EMOSm uncertainty model.

4.7 Ignorance and continuous ranked probability scores (CRPS) for the various uncertainty models tested. Forecasts are divided into storm season (solid lines) and warm season (dashed lines) for scoring, as each uncertainty model has different calibration characteristics during these times of year. Smaller ignorance scores and CRPS are preferred.

4.8 PIT histograms for EMOS uncertainty model forecasts as in Figure 4.4, but following PIT-based probability calibration with nine smoothing points and τ = 90.

4.9 PIT histograms for log-EMOSv uncertainty model forecasts as in Figure 4.5, but following PIT-based probability calibration with nine smoothing points and τ = 90.

4.10 Same as Figure 4.7, but scores are computed after applying the PIT-based calibration scheme (black). Uncalibrated results (grey) are plotted for comparison. Results for warm season EMOS forecasts probability calibrated using the "carry-forward" method are indicated by the heavy dashed line.

4.11 PIT histogram for full water years after combining raw (no PIT-based calibration applied) storm season forecasts from the log-EMOSv uncertainty model with carry-forward-calibrated EMOS forecasts during the warm season for ideal forecast reliability and sharpness.
5.1 Observed inflows (solid black line) for the 2010–2011 and 2011–2012 years. Anomaly inflow values (solid grey line) are calculated by subtracting the climatological inflows (dashed black line) from the observations. The anomaly thresholds of 70 m3/s and 100 m3/s are indicated by the horizontal dashed grey lines.

5.2 The flow of information into and out of the WaSiM model for generating forecasts with the MAE-optimized parameter set. This process is repeated for each watershed model (WaSiM and WATFLOOD) and each parameterization/state, yielding 72 unique inflow forecasts each day.

5.3 Reservoir schematic diagram for the cost-loss economic model developed in Section 5.3.4 for Daisy Lake. Water that does not spill can be channeled through the penstock to the turbines to produce power and therefore revenue. Figure based on McCollor and Stull (2008b).

5.4 MAE and RMSESS for ensemble median forecasts derived from the various M2M configurations. Perfect deterministic forecasts have MAE of zero and RMSESS of one.

5.5 Ignorance and continuous ranked probability scores for probabilistic forecasts derived from the various M2M configurations. Lower values are preferred for these scores.

5.6 Forecast value as a function of user-specific cost-loss ratio α for the Full 1-day M2M probability forecast. Relative value of zero indicates that the forecasting system offers no benefits over climatology, while perfect forecasts have relative value of one.

5.7 Forecast value as a function of user-specific cost-loss ratio α for the Full M2M probability forecast (black line), and the various reduced ensemble configurations (coloured lines). The range of α valid for Daisy Lake reservoir operation for Sm/Sc from 1 to 10 is indicated by grey shading.

5.8 Same as Figure 5.7, but for an inflow anomaly threshold of 100 m3/s.

C.1 A subset of BMA weights calculated using an adaptive updating scheme (upper panel) and a moving window (middle panel). The weights are stacked such that thicker areas represent larger weights. Results from observation-driven model runs (simulations) made with the different model parameterizations are shown in the lower panel with observed inflows for comparison. Weights calculated using the moving window change with model performance, but with a significant time lag.

Acknowledgments

Firstly, my endless thanks go to my supervisor, Dr. Roland Stull, for his support and guidance. His enthusiasm and encouragement have been inspiring throughout the years. I also wish to thank my supervisory committee members, Dr. Sean Fleming and Dr. Doug McCollor, for their valuable input and expertise.

I would like to thank all my colleagues, past and present, at the University of British Columbia (UBC), Canada, firstly, for their input and expert assistance, and secondly, for making it such an enjoyable place to be. Thanks go to Dr. Rosie Howard, May Wong, Dr. Greg West, Roland Schigas, Bruce Thomson, Dr. Atoossa Bakhshaii, Jesse Mason and Katelyn Wells.
This research would not have been completed in such a timely manner if not for the tireless efforts of George Hicks II and Dr. Henryk Modzelewski in retrieving archived weather forecasts. Thank you to Dr. Thomas Nipen for creating the COMPS framework and making it available for my use, and for answering many questions about probability modelling and calibration.

I am grateful to BC Hydro, and in particular, Scott Weston, for providing hydrometric data for the Cheakamus watershed. Dr. Nicholas Kouwen and Dr. Jörg Schulla provided indispensable guidance during the setup and calibration of the WATFLOOD and WaSiM hydrologic models, respectively. Computer code for linking DDS with WaSiM was modified from that graciously provided by Dr. Thomas Graeff. Computational resources and weather forecasts were provided by the Geophysical Disaster Computational Fluid Dynamics Centre at UBC.

I am especially grateful to my parents, my best friend and sister, Jillian, and family and friends. Your unwavering support has made this possible.

Finally, the primary funding for this research was provided by the Canadian Natural Sciences and Engineering Research Council (NSERC) in the form of an Alexander Graham Bell Canada Graduate Scholarship (CGS-D). Additional funding was provided by the Department of Earth, Ocean and Atmospheric Sciences at the University of British Columbia and a NSERC Discovery Grant awarded to Dr. Roland Stull.

Dedication

Dedicated to my loving parents, Ron and Karen Bourdin

Chapter 1: Introduction

1.1 Uncertainty in Hydrologic Model Predictions

Hydrologic models are simplified representations of a complex physical system: the terrestrial component of the hydrologic cycle. The applications of hydrologic models are many and varied, ranging from the assessment of the impacts of long-term climate or land use change to operational forecasting of streamflows for flood forecasting or hydroelectric reservoir operation. Predictions derived from hydrologic models carry with them some amount of uncertainty. This uncertainty comes not only from the simplification of hydrologic process representation, but also from errors in input data, incomplete knowledge of antecedent conditions, and uncertainty in model parameters.

Model process uncertainty arises from the simplified or incorrect representation of hydrologic processes, or from their omission entirely. Imperfect process representation may be caused by the necessary use of simplified functional relations between hydrologic elements or, alternatively, by our insufficient knowledge of the physics that govern these processes (Kitanidis and Bras, 1980; Niehoff et al., 2002). As data availability and computational speed have increased, the model representation of hydrologic processes has become more accurate (Kouwen et al., 2005). In some cases, however, process understanding is still so limited that we are forced into black-box approaches (Sivapalan et al., 2003).

The equations that govern hydrologic processes contain parameters having values derived from observation, professional experience, or model calibration. Parameter calibration is carried out through trial-and-error by adjusting model parameters until the model output is sufficiently close to observations. Unfortunately, a common impediment to the successful calibration of model parameters is the limited availability of observations with which to compare the various model outputs (e.g., Brun and Band, 2000; Eckhardt and Ulbrich, 2003).
The closeness of fit is measured by an objective function, the choice of which can have an impact on the resulting optimum parameter set (Özelkan and Duckstein, 2001; Wagener, 2003). Different objective functions may be sensitive to different parts of the hydrograph; the choice of a single function will necessarily lead to a biased calibration (Duan et al., 2007; Götzinger and Bárdossy, 2008). The non-unique dependence of model error upon parameter values is commonly known in the hydrological literature as equifinality. The existence of multiple equally plausible parameter sets may suggest that none accurately represents watershed characteristics: the right runoff can be obtained for the wrong reasons.
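The objective functions used for parameter optimization later in the dissertation include the mean absolute error (MAE), the Nash-Sutcliffe efficiency (NSE), and a log-transformed variant (LNSE). For reference, and assuming the standard textbook forms rather than the dissertation's exact formulations, NSE and LNSE for simulated flows Q_t^sim and observed flows Q_t^obs over T time steps can be written:

\[
\mathrm{NSE} = 1 - \frac{\sum_{t=1}^{T}\left(Q_t^{\mathrm{sim}} - Q_t^{\mathrm{obs}}\right)^2}{\sum_{t=1}^{T}\left(Q_t^{\mathrm{obs}} - \overline{Q^{\mathrm{obs}}}\right)^2},
\qquad
\mathrm{LNSE} = 1 - \frac{\sum_{t=1}^{T}\left(\ln Q_t^{\mathrm{sim}} - \ln Q_t^{\mathrm{obs}}\right)^2}{\sum_{t=1}^{T}\left(\ln Q_t^{\mathrm{obs}} - \overline{\ln Q^{\mathrm{obs}}}\right)^2}
\]

Because NSE squares errors in flow units while LNSE squares errors in log space, the former emphasizes high-flow performance and the latter low flows; a perfect simulation has NSE = LNSE = 1. Optimizing a model separately against such functions yields the differently-behaving parameter sets exploited in Chapter 3.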
Hydrologic state describes the conditions within the modelled watershed at any given time (e.g., soil moisture, groundwater storage, snow-water equivalent, lake and stream levels, etc.). This state forms the initial conditions from which hydrologic forecasts are started, and is an important source of uncertainty in the modelling chain. This uncertainty is related to hydrologic model and parameter uncertainty, because model states are updated by the model itself using observed meteorological or hydrologic data. Errors in the measurement of these data can lead to errors in hydrologic state.

In a short-term operational forecast setting, hydrologic models are generally driven by weather model output. It has been reported that the uncertainty in numerical weather prediction (NWP) model output is the largest source of uncertainty in NWP-driven flow forecasts with a time horizon beyond several days, whereas for shorter lead times, uncertainties in the hydrologic model dominate prediction errors (Coulibaly, 2003; Cloke and Pappenberger, 2009). However, the comparative importance of the two forms of error over the two time scales depends on context; for an anticipated heavy rainstorm in a small and rapidly responding catchment, uncertainty around the amount of rainfall expected over the next day may have more impact on forecast quality than hydrologic model error.

Note that distributed hydrologic models are often run with much higher spatial resolution than atmospheric models, requiring the downscaling of meteorological fields from NWP to hydrologic model scale. This introduces an additional source of uncertainty to the modelling chain. Particularly in complex terrain, the process of distributing meteorological data across the watershed should account for temperature lapse rates and the rate of increase of precipitation with elevation, both of which may vary seasonally or on shorter time scales (Alila and Beckers, 2001). Elevation dependence can be incorporated into downscaled NWP fields using regression techniques (e.g., Daly, 2006; Kurtzman and Kadmon, 1999) or inverse-distance weighting with constant linear lapse-rate adjustments (e.g., Leemans and Cramer, 1991; Willmott and Matsura, 1995; Westrick and Mass, 2001).
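As an illustration of the second approach, the sketch below interpolates NWP point temperatures to hydrologic-model cells by inverse-distance weighting with a constant linear lapse-rate adjustment. It is a minimal sketch, not the downscaling scheme actually used in Chapter 2; the function name, arguments, and the -6.5 °C/km default lapse rate are assumptions for this example.

    import numpy as np

    def downscale_temperature(src_xy, src_z, src_t, cell_xy, cell_z,
                              lapse_rate=-6.5e-3, power=2.0):
        """Inverse-distance weighting with a constant linear lapse-rate
        adjustment. src_* are NWP grid-point coordinates (m), elevations (m)
        and temperatures (deg C); cell_* are hydrologic-model cell
        coordinates and elevations (hypothetical argument names)."""
        src_xy, cell_xy = np.asarray(src_xy, float), np.asarray(cell_xy, float)
        # Reduce all source temperatures to sea level along the lapse rate.
        t_sea = np.asarray(src_t, float) - lapse_rate * np.asarray(src_z, float)
        out = np.empty(len(cell_xy))
        for i, (pt, z) in enumerate(zip(cell_xy, np.asarray(cell_z, float))):
            d = np.linalg.norm(src_xy - pt, axis=1)
            if d.min() < 1e-6:                     # cell coincides with a source
                t0 = t_sea[d.argmin()]
            else:
                w = 1.0 / d**power                 # inverse-distance weights
                t0 = (w * t_sea).sum() / w.sum()
            # Lift the interpolated sea-level value to the cell elevation.
            out[i] = t0 + lapse_rate * z
        return out

Precipitation can be treated analogously, with a multiplicative elevation adjustment rather than an additive one, since precipitation amounts must remain non-negative.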
1.2 Previous Related Work

1.2.1 Ensemble Hydrologic Modelling

Research into probabilistic weather forecasts began in the 1960s, building on Lorenz's (1963) work in chaos theory. By the 1980s, ensembles of forecasts based on multiple initial conditions (multi-analysis ensembles) were being made in research mode. The first operational Ensemble Prediction System (EPS) was generated by the US National Centers for Environmental Prediction (NCEP) in 1992 (Sivillo et al., 1997). Ensemble forecasts from the Meteorological Service of Canada combine the multi-analysis and varied-model ensemble approaches (in which the same model is run with alternative physics schemes or parameterizations) to sample a wide range of predictive uncertainty (Environment Canada, 2013). Super-ensembles or grand-ensembles, derived from the combination of ensembles from each of several forecast centres, comprise a truly probabilistic approach, accounting for uncertainties in initial conditions, parameterizations, and model structure (Ross and Krishnamurti, 2005). A similar approach is needed for hydrologic applications to increase ensemble spread and capture the full range of predictive uncertainty (Krzysztofowicz, 2001).

The success of ensemble weather forecasting has led to its adoption in hydrology, primarily through the use of ensemble NWP output to drive a deterministic hydrologic model (Cloke and Pappenberger, 2009). That is, the same hydrologic model is re-run using different weather predictions to generate an ensemble of hydrographs. The first efforts in ensemble streamflow prediction (ESP) used an ensemble of meteorological observations from the climate record for long-term prediction (Day, 1985). ESP methods of this type are still routinely used for seasonal to annual water supply forecasting purposes (i.e., for forecast time scales at which NWP models do not provide skill). Operational weather forecast ensembles such as those distributed by the European Centre for Medium-Range Weather Forecasts (ECMWF) have been applied to flood forecasting in research mode (e.g., Gouweleeuw et al., 2005; Roulin and Vannitsem, 2005).

In NWP, ensembles are commonly generated by running a weather model with multiple sets of varying initial conditions. This follows naturally from the existence of deterministic chaos – a highly non-linear sensitivity to initial conditions – in the weather, and our inability to know the exact state of the atmosphere at any given time (Lorenz, 1963). The possible existence of chaos in hydrologic processes has been investigated with inconclusive results (Sivakumar, 2000; Sivakumar et al., 2001; Khan et al., 2005). Nevertheless, errors in initial conditions are recognized as an important source of uncertainty in hydrologic modelling (Liu and Gupta, 2007). Estimates of state uncertainty are commonly made using the Ensemble Kalman Filter (EnKF) (e.g., Evensen, 1994; Andreadis and Lettenmaier, 2006; Clark et al., 2008), which generates an ensemble of hydrologic states. This method has shown promise in assimilation of remotely sensed snow coverage and snow-water equivalent data in complex terrain (Andreadis and Lettenmaier, 2006). An EnKF variant, the bias-aware Retrospective Ensemble Kalman Filter (REnKF), has been shown to successfully update state variables using observations of discharge by accounting for associated time lags (Pauwels and De Lannoy, 2006). The particle filter (PF) is an alternative data assimilation method that is not subject to limitations of the EnKF such as the use of Gaussian distributions to model non-normally distributed hydrological errors (Moradkhani et al., 2005a; Moradkhani and Sorooshian, 2009). The PF has been used for assimilation of remotely-sensed and in-situ snow water equivalent data and observed streamflow, and has been shown to produce state estimates and subsequent streamflow forecasts of higher quality than the EnKF method (e.g., Moradkhani et al., 2005a; DeChant and Moradkhani, 2011b; Leisenring and Moradkhani, 2011).
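The EnKF analysis step referenced above is compact enough to sketch. Below is a minimal stochastic (perturbed-observation) update in the spirit of Evensen (1994), assuming a linear observation operator H; inflation, localization, and the hydrologic specifics of the cited studies are omitted, and the function name is an assumption for this example.

    import numpy as np

    def enkf_update(X, y_obs, obs_std, H):
        """One stochastic EnKF analysis step with perturbed observations.
        X: (n_state, n_members) ensemble of model states (e.g., soil
        moisture, snow-water equivalent). H: linear observation operator,
        shape (n_obs, n_state). obs_std: observation-error std deviation."""
        y_obs = np.asarray(y_obs, dtype=float)
        n = X.shape[1]
        Xp = X - X.mean(axis=1, keepdims=True)        # state anomalies
        HX = H @ X
        HXp = HX - HX.mean(axis=1, keepdims=True)     # obs-space anomalies
        R = np.eye(len(y_obs)) * obs_std**2           # obs-error covariance
        # Kalman gain K = P H' (H P H' + R)^-1, with covariances estimated
        # from the ensemble; the 1/(n-1) factors are folded into R's term.
        K = (Xp @ HXp.T) @ np.linalg.inv(HXp @ HXp.T + (n - 1) * R)
        # Perturb the observation for each member to preserve spread.
        Y = y_obs[:, None] + obs_std * np.random.randn(len(y_obs), n)
        return X + K @ (Y - HX)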
The existence of equally likely sets of parameter values has long been recognized (Binley et al., 1991) and has led to the development of probabilistic and stochastic methods for estimating parameter uncertainty. For example, the Shuffled Complex Evolution Metropolis algorithm (SCEM-UA; Vrugt et al., 2003b) and the Multi-Objective Shuffled Complex Evolution Metropolis algorithm (MOSCEM-UA; Vrugt et al., 2003a) converge to an ensemble of parameter sets that can be used to infer probabilistic uncertainty. The Simultaneous Optimization and Data Assimilation (SODA) method combines SCEM-UA with an EnKF to improve the treatment of input, output, parameter, and structural uncertainty, resulting in "meaningful prediction uncertainty bounds" (Vrugt et al., 2005, pg. 2). However, these methods require knowledge of a prior distribution of parameter values, which may be difficult to define. In practice, the prior distribution is usually taken to be a noninformative (uniform) distribution, though in some algorithms this can lead to slow convergence to the posterior target distribution (e.g., Kuczera and Parent, 1998). Parameter estimation is affected by uncertainty in measured model input and output (e.g., rainfall and streamflow), and ignoring this uncertainty can lead to biased and misleading model results. The Bayesian total error analysis (BATEA) methodology developed by Kavetski et al. (2006a) requires hydrologic modellers to incorporate all application-specific sources of data uncertainty into the modelling process. The method is effective at identifying and correcting input errors when the user-supplied error models are valid (Kavetski et al., 2006b). However, models of input uncertainty are poorly understood, and the method is computationally demanding. A simpler method for quantifying parameter uncertainty consists of using multiple objective functions to create multiple differently-optimized parameter sets (Duan et al., 2007).

As outlined above, data assimilation applications have primarily focussed on updating hydrologic model states. Recent research has turned to simultaneous estimation of model states and model parameters (Liu et al., 2011, and sources cited therein). Real-time updating of states and parameters allows the hydrologic model to more closely reproduce observed system response (Moradkhani and Sorooshian, 2009). Unlike "batch" parameter calibration techniques, which seek to minimize long-term prediction error over some historical period of calibration data, dual state-parameter estimation improves flexibility and allows for the investigation of temporal variability in model parameters (Moradkhani et al., 2005a). Such methods can also be applied where long historical datasets are unavailable for batch calibration. Both the EnKF and PF data assimilation methods have been applied to the dual state-parameter estimation problem (e.g., Moradkhani et al., 2005a,b; DeChant and Moradkhani, 2011b; Leisenring and Moradkhani, 2011). Liu et al. (2011) note that data assimilation methods in general have not been adequately implemented in operational settings due to a number of challenges, including the availability of observed data, the specification of uncertainty in the data, and computational burden. Recent research has attempted to make the particle filter method more viable for operational prediction, but computational expense is still an issue (Moradkhani et al., 2012).

Equifinality refers not only to the existence of different parameter sets within a model structure that produce acceptable simulation results, but also to the existence of many possible suitable model structures. Beven and Binley (1992) define model structure as including model processes and other considerations such as spatial discretization. Structural uncertainty is typically handled through the use of multiple hydrologic models. Shamseldin et al. (1997) appear to have been the first to apply the multi-model ensemble approach in rainfall-runoff modelling, using four empirical models and a simple lumped, conceptual model. Their results showed that, in general, better discharge predictions could be obtained through model combination. Others have also shown that multi-model ensemble mean hydrologic forecasts are able to outperform even the best single-model forecast within the ensemble (e.g., Coulibaly et al., 2005; Ajami et al., 2006).

In order to generate a truly probabilistic forecasting system, it is necessary to sample all sources of uncertainty in the modelling chain (Krzysztofowicz, 2001). To date, operational and research efforts into probabilistic streamflow forecasts through the use of ensembles have neglected some sources of uncertainty. For example, the US National Weather Service River Forecast System generates operational probabilistic water supply forecasts using the original ESP method of Day (1985) (Franz et al., 2003). Thus, the forecasts account only for uncertainty in meteorological inputs, and ignore non-stationarity. Georgakakos et al. (2004) used multiple calibrated and uncalibrated hydrologic models, some with many parameter sets, to assess streamflow prediction uncertainty. Duan et al. (2007) used three hydrologic models, each calibrated using three different objective functions, to derive a nine-member ensemble that assessed uncertainty arising from model structure and parameter uncertainty. Carpenter and Georgakakos (2006) used a Monte Carlo sampling framework to account for both parametric and radar-rainfall uncertainty. BC Hydro's Absynthe modelling procedure for daily inflow likewise incorporates ensemble weather forecasts and multiple parameter sets for a single hydrologic model (Fleming et al., 2010). Other examples of hydrologic ensembles that incompletely sample uncertainty include but are not limited to: Vrugt et al. (2005); Moradkhani et al. (2005b); Randrianasolo et al. (2010); Thirel et al. (2010); Van den Bergh and Roulin (2010); and De Roo et al. (2011).
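The Member-to-Member ensemble developed in this dissertation (Section 1.3) aims to sample every link in this chain. Structurally, such an ensemble is simply the cross product of its component sets; the sketch below illustrates the idea with placeholder forcing identifiers, using the 12 NWP forcings, 2 hydrologic models, and 3 parameter sets (each parameter set carrying its own observation-updated state) that Chapters 2 and 3 combine into 72 members.

    from itertools import product

    # Placeholder identifiers; the dissertation's actual NWP models,
    # grid scales and downscaling schemes are not named here.
    nwp_forcings = [f"forcing_{i:02d}" for i in range(1, 13)]   # 12 forcings
    hydro_models = ["WaSiM", "WATFLOOD"]                        # 2 models
    param_sets = ["MAEo", "NSEo", "LNSEo"]                      # 3 parameter sets

    # Each (model, parameter set) pair carries its own updated hydrologic
    # state, so state is implied by the last two coordinates.
    members = [{"forcing": f, "model": m, "params": p}
               for f, m, p in product(nwp_forcings, hydro_models, param_sets)]
    assert len(members) == 72   # the full M2M ensemble size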
1.2.2 Uncertainty Modelling

Ensemble forecasting techniques are designed to sample the range of uncertainty in forecasts, but are often found to be underdispersive in both weather and hydrologic forecasting applications (e.g., Eckel and Walters, 1998; Buizza, 1997; Wilson et al., 2007; Olsson and Lindström, 2008; Wood and Schaake, 2008). In order to correct these deficiencies, uncertainty models can be used to fit a probability distribution function (PDF) to the ensemble, whereby the parameters of the distribution are estimated based on statistical properties of the ensemble and past verifying observations. These theoretical distributions reduce the amount of data required to characterize the distribution (for example, from n ensemble members to two parameters describing the mean and spread of a Gaussian distribution), and allow estimation of probabilities for events outside of the range of observed or modelled behaviour (Wilks, 2006).

Uncertainty models make different assumptions about how the ensemble members and observations are generated. A common method for producing probability forecasts is the binned probability ensemble (BPE; Anderson, 1996). The assumption in this case is that the N ensemble members and the unknown verifying observation are drawn from the same unknown probability distribution. The observation then has an equally likely probability of 1/(N + 1) of falling between any two consecutive ranked ensemble members, or outside of this range. Alternatively, centering a Gaussian probability distribution on the ensemble mean with spread proportional to the ensemble variance makes the assumption that the ensemble mean forecast errors are normally distributed (or, equivalently, that the verifying observations are drawn from a normal distribution centred at the ensemble mean). This model also assumes the existence of a spread-skill relationship. That is, the spread of the ensemble members should be related to the accuracy (or skill) of the ensemble mean; when the forecast is more certain, as indicated by low ensemble spread, errors are expected to be small. However, this relationship is often tenuous (e.g., Hamill and Colucci, 1998; Stensrud et al., 1999; Grimit and Mass, 2002). Bayesian Model Averaging (BMA) is an alternative uncertainty model that assigns probability distributions to the individual ensemble members and takes the forecast PDF to be the weighted sum of these distributions (Raftery et al., 2005). The weights indicate the likelihood of each distribution being the correct one, and are based on past performance of the individual ensemble members.
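As a minimal sketch of how the BPE assumption turns an ensemble into event probabilities, the function below estimates the probability that an inflow threshold is exceeded. The linear interpolation inside the bin containing the threshold is one simple choice, not prescribed by Anderson (1996), and the function name is an assumption for this example.

    import numpy as np

    def bpe_exceedance_prob(ensemble, threshold):
        """Binned probability ensemble (Anderson, 1996): the verifying
        observation is assumed equally likely, with probability 1/(N+1),
        to fall in any of the N+1 bins defined by the N ranked members."""
        members = np.sort(np.asarray(ensemble, dtype=float))
        n = len(members)
        above = members[members > threshold]
        below = members[members <= threshold]
        p = len(above) / (n + 1.0)        # bins lying entirely above threshold
        if below.size and above.size:     # add a share of the straddling bin
            lo, hi = below[-1], above[0]
            p += (hi - threshold) / (hi - lo) / (n + 1.0)
        # The unbounded outer bins are not subdivided in this simple sketch.
        return p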
In cases where a forecast PDF is fitted to the ensemble, the shape of the PDF should correspond to the shape of the empirical distribution of the forecast errors. For a simple Gaussian distribution centred on the ensemble mean, the errors of the ensemble mean forecast are used. In the case of BMA, the individual distributions should match the shape of the corresponding ensemble member's forecast errors. Hydrologic variables and their errors are often described as being non-normally distributed, and are therefore transformed into a space in which the errors become normally distributed, and the transformed variable can be modelled using the simple Gaussian PDF (e.g., Duan et al., 2007; Reggiani et al., 2009; Wang et al., 2009). The log-normal distribution, which amounts to fitting a simple Gaussian distribution to log-transformed data, has a long history of use in hydrology, and is still commonly applied today (e.g., Chow, 1954; Stedinger, 1980; Lewis et al., 2000; Steinschneider and Brown, 2011). This distribution is particularly well-suited to streamflow and inflow forecasting, as it assigns probabilities only to positive forecast values.
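A toy version of the log-normal uncertainty model is shown below, assuming for simplicity that the distribution parameters come from the ensemble itself; in the EMOS-based models of Chapter 4 the spread parameters are instead trained on the ensemble and past observations. The function name is hypothetical.

    import numpy as np
    from scipy import stats

    def lognormal_exceedance(ensemble, q):
        """Fit a Gaussian to the log-transformed members (a log-normal
        uncertainty model) and return P(inflow > q). Probability mass is
        assigned only to positive inflows."""
        logs = np.log(np.asarray(ensemble, dtype=float))  # members must be > 0
        mu, sigma = logs.mean(), logs.std(ddof=1)
        return stats.norm.sf(np.log(q), loc=mu, scale=sigma)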
If the uncertainty model assumptions are valid, the resulting probability forecasts should be statistically reliable or calibrated, meaning that an event forecasted to occur with probability p will, over the course of many such forecasts, be observed a fraction p of the time (Murphy, 1973). Otherwise, the probabilistic forecasts cannot be used for risk-based decision making, since the probabilities cannot be taken at face value. Reliability is easily corrected using probability calibration methods, discussed next. Note that the probabilistic definition of calibration differs from that used in hydrologic modelling. In the latter field, calibration is the process of obtaining hydrologic model parameters tuned for a particular watershed (described in Section 1.1). Both probability calibration and hydrologic model calibration are addressed in this dissertation. Thus, to avoid ambiguity, hydrologic model calibration will be referred to as parameter optimization or simply optimization, and the term calibration will be used in the probabilistic sense.

1.2.3 Statistical Postprocessing

Meteorological and hydrologic forecasts contain both systematic and random errors. Systematic error, also known as (unconditional) bias, can arise due to differences between modelled and actual topography, and due to deficiencies in model representation of physical processes. The objective of bias correction is to reduce the systematic error of future forecasts by using statistical relationships between past forecasts and their verifying observations. Random error can be reduced through ensemble averaging, though the full ensemble contains valuable information regarding probabilities of possible future outcomes (Anderson, 1996).

In a multi-model ensemble context, ensemble members derived from different dynamical (NWP and/or hydrologic) models should be corrected by computing bias correction factors for the individual members. In an ensemble where multiple realizations of a single dynamical model are used, application of a single correction factor (e.g., the bias of the ensemble mean) to all members is appropriate. If bias correction is not done prior to multi-model combination, spread and other measures of ensemble performance can be artificially inflated due to the interaction of opposing model biases (Johnson and Swinbank, 2009; Candille et al., 2010). If component EPS biases do not balance, then their combination can result in a degradation of forecast accuracy (Wilson et al., 2007). For this reason, many frameworks for generating reliable probabilistic forecasts through model combination begin with a bias correction step. A linear regression-type bias corrector is built into the BMA framework described by Raftery et al. (2005) to correct the individual meteorological ensemble members prior to combining them into a probabilistic weather forecast. Likewise, the more generalized approach of Johnson and Swinbank (2009) uses the bias of the ensemble mean forecast to correct the individual members from a single dynamical model. Vrugt and Robinson (2007) have suggested that the global regression-based correction of the original BMA framework is too simple to be useful in hydrology, where model errors are non-Gaussian and heteroscedastic (i.e., proportional to flow level), and that local non-linear bias correction should be applied based on modelling errors in the immediate past.

Various methods of statistical calibration have been devised to correct for conditional or distributional biases in probabilistic forecasts. These can generally be split into two groups: ensemble calibration, which adjusts individual ensemble members in order to produce reliable forecasts; and probability calibration, which adjusts the probabilities directly. Examples of ensemble calibration include BMA (Raftery et al., 2005) and generalizations thereof (e.g., Johnson and Swinbank, 2009).

When the BPE is used to generate a probabilistic forecast, information contained in the rank histogram (Anderson, 1996; Talagrand et al., 1997) can be used for probability calibration (Hamill and Colucci, 1997). The probability mass between consecutive ensemble members is adjusted based on how often historical observations fell into that bin. This amounts to shifting the cumulative distribution function (CDF) at each ensemble member to the frequency of historical observations falling below that particular rank. This weighted ranks (WR) method has been found to produce more reliable and generally higher quality probabilistic quantitative precipitation forecasts than the raw ensembles or even model output statistics (MOS) (e.g., Eckel and Walters, 1998; Hamill and Colucci, 1998). The WR method can be generalized to calibrate probabilistic forecasts generated by other probability models, where reliability is assessed using a probability integral transform (PIT) histogram (Gneiting et al., 2005). In this case, the forecast CDF values are relabelled based on the distributions of past PIT values. Nipen and Stull (2011) have shown that this method can improve the reliability and other scores of probabilistic forecasts generated using the BPE and even BMA.
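A bare-bones version of this PIT-based relabelling is sketched below, assuming the calibration curve is simply the interpolated empirical CDF of past PIT values; the scheme applied in Chapter 4 additionally uses a fixed number of smoothing points and a time window (nine points and τ = 90 in the experiments reported there). The function name is an assumption for this example.

    import numpy as np

    def pit_calibrate(raw_prob, past_pits):
        """Relabel a raw forecast CDF value using the empirical
        distribution of past PIT values: if past observations fell below
        the raw p-quantile a fraction f(p) of the time, the calibrated
        probability is f(p)."""
        pits = np.sort(np.asarray(past_pits, dtype=float))
        xp = np.concatenate(([0.0], pits, [1.0]))   # calibration curve, x
        fp = np.linspace(0.0, 1.0, len(pits) + 2)   # calibration curve, y
        return np.interp(raw_prob, xp, fp)

If the past PIT values are already uniformly distributed (a flat PIT histogram), this mapping is close to the identity and the forecast probabilities are left essentially unchanged.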
In hydrologic EPSs, bias correction has focused on the removal of unconditional bias from long-term (e.g., monthly or seasonal) forecasts (Hashino et al., 2007; Wood and Schaake, 2008). Post-processing of short-term hydrologic EPSs has focused on the correction of conditional or distributional bias of probabilistic forecasts to generate reliable probabilities (Seo et al., 2006; Zhao et al., 2011). Bayesian methods have been applied successfully in hydrologic forecasting applications over a range of timescales (e.g., Duan et al., 2007; Reggiani et al., 2009; Wang et al., 2009; Parrish et al., 2012). Probability calibration, on the other hand, has not yet been widely adopted by the hydrologic modelling community. Olsson and Lindström (2008) provide an example of a very simple probability calibration used to improve ensemble spread. Roulin (2007) applied the weighted ranks method to medium-range forecasts of streamflow and found very little improvement to the already reliable forecasting system.

When weather model output is used to drive a hydrologic model, bias correction is often applied to the precipitation forecasts (Kouwen et al., 2005; Yoshitani et al., 2009; Westrick et al., 2002). The importance of post-processing the inputs to hydrologic models has been discussed in the literature (e.g., McCollor and Stull, 2008a; Yuan et al., 2008). Mascaro et al. (2010) have shown that well-calibrated precipitation forecasts will indeed yield reliable probabilistic streamflow predictions despite the nonlinearities in the hydrologic model. They also found that underdispersive precipitation forecasts do not necessarily lead to underdispersive streamflows. Thirel et al. (2008) have shown that underdispersive precipitation forecasts can lead to even more highly underdispersive streamflow forecasts in both short- and medium-range applications. Mascaro et al. (2011) have demonstrated that dispersion in a streamflow forecast is highly dependent on the antecedent rainfall; when the watershed has more initial wetness, the streamflow forecast is increasingly controlled by the deterministic nature of a previous rainfall event. This indicates that the uncertainties in the hydrologic model and its initial conditions are extremely important, and thus further calibration of the downstream model output may be necessary. Olsson and Lindström (2008) have suggested that separate treatment of meteorological and hydrologic errors may be desirable from a scientific standpoint, but that operationally, only adjustment of the final hydrologic output is necessary.

1.3 Dissertation Case Study and Contributions

The ensemble and probabilistic forecasting methods developed in this dissertation are used to predict inflows to the Daisy Lake reservoir, a hydroelectric facility on the upper Cheakamus River in southwestern British Columbia (BC), Canada. This reservoir is operated by the British Columbia Hydro and Power Authority (BC Hydro). The total area of the Cheakamus watershed upstream of the reservoir is 721 km2, approximately 8% of which is glaciated. Elevation within the study basin ranges from 341 m to 2677 m above sea level, with a median elevation of 1401 m. Inflows to the Daisy Lake reservoir are primarily driven by snowmelt during spring and summer, with a small glacial melt component. A secondary inflow peak occurs during the fall and winter storm season, when Pacific frontal systems can bring significant inflows, particularly in the case of rain-on-snow events that can be difficult to predict. The watershed responds rapidly to such events, generating inflow time series with steep rising and falling limbs; watersheds with this type of response are commonly referred to as flashy.

This watershed was selected because it presents various modelling challenges. NWP forecasts are complicated by the region's complex terrain, which can lead to strong orographic gradients in precipitation fields (which can in turn be highly dependent on storm track). Other challenges include cold-air damming episodes and difficulties in forecasting temperature profiles and therefore precipitation phasing. High-resolution NWP models may be able to capture these processes, as they are able to represent complex topography more accurately than models using coarse grid scales. In order to take direct advantage of high-resolution NWP fields to drive the inflow forecasts, distributed hydrologic models are required (as opposed to lumped, conceptual or empirical modelling approaches) (Bourdin et al., 2012). For application to the case study watershed, these hydrologic models must be capable of modelling snow and glacier melt processes and lakes in complex terrain given relatively limited input data.

The main goal of this dissertation is to generate reliable probabilistic forecasts of inflow for a hydroelectric reservoir in complex terrain.
This will be achieved by: sampling all sources of error in the inflow modelling chain, thereby creating an ensemble of inflow forecasts; and by applying statistical post-processing techniques including simple bias correction, uncertainty models, and probability calibration. The dissertation components are described below.

1.3.1 Sampling Uncertainty in Inflow Forecasts

The primary contribution of this dissertation is the creation of a Member-to-Member (M2M) ensemble forecasting system that explicitly attempts to sample all sources of error in the hydrologic modelling chain.

In Chapter 2, a M2M forecasting system is generated by using individual members of a multi-model, multi-grid-scale NWP ensemble to drive an ensemble of distributed hydrologic models. The NWP fields are downscaled using multiple interpolation schemes, thereby generating multiple meteorological forcings from a single NWP output. This ensemble therefore explicitly samples uncertainty in the NWP forecasts, in the processes used to downscale them to the hydrologic model scale, and in the hydrologic model structures.

This M2M ensemble is expanded in Chapter 3 with the addition of multiple hydrologic model parameterizations and multiple hydrologic states or initial conditions, which are used to begin each daily forecast during the case study period. The multi-parameter M2M component is created by optimizing the parameters of the two hydrologic models using three different objective functions, thereby taking advantage of equifinality. The multi-state component is generated by updating the hydrologic state in the watershed each day, using meteorological observations to drive each hydrologic model with each parameterization.

The full M2M ensemble developed in Chapter 3 consists of 72 ensemble members, each of which represents a different possible scenario of inflow to the Daisy Lake reservoir. It is believed that this ensemble is the first example of a short-term hydrologic forecasting system that attempts to explicitly sample all sources of error in the inflow modelling chain, thereby comprising a truly probabilistic forecasting system as defined by Krzysztofowicz (2001).

1.3.2 Bias Correction

Prior to combining the M2M ensemble members into probabilistic or ensemble mean forecasts, a bias correction factor is applied to each individual ensemble member. In a multi-model ensemble context, ensemble members derived from different dynamical (numerical weather prediction and/or hydrologic) models should be corrected by computing bias correction factors for the individual members.

In Chapter 2, a simple bias correction scheme is developed that makes use of the degree of mass balance (DMB). The DMB is simply the ratio of past forecast inflows to past observed inflows, calculated over a moving window. A bias-corrected forecast is generated by dividing the raw forecast by the DMB factor. The use of a multiplicative bias corrector ensures that bias-corrected inflow forecasts never become negative.
Unlike regression-based bias correction schemes, which can require lengthy training periods, the method employed in this study is able to handle the heteroscedastic nature of hydrologic forecast errors (Vrugt and Robinson, 2007).

1.3.3 Calibrated Probability Forecasts

In Chapter 4, the 72-member M2M ensemble forecasting system is transformed into a reliable probabilistic forecast by using suitable uncertainty models and applying probability calibration when necessary.

The uncertainty models are based on the Ensemble Model Output Statistics (EMOS) method of Gneiting et al. (2005). EMOS fits a probability distribution function to the ensemble, whereby the parameters describing the spread of the distribution are estimated based on statistical properties of the ensemble and the verifying observations. In this way, it is possible to implicitly account for any uncertainty that is neglected or simply underestimated by the M2M ensemble.

An intelligent calibration strategy is employed to correct for calibration deficiencies during periods when the uncertainty model produces unreliable forecasts. Calibration is done using the PIT-based method of Nipen and Stull (2011), which relabels forecast probabilities based on the distribution of past PIT values accumulated over a training window. This is the first application of the method to hydrologic forecasting.
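To make the EMOS idea concrete, the sketch below fits a Gaussian predictive distribution whose mean is a linear function of the ensemble mean and whose variance is a linear function of the ensemble variance. This is a simplified illustration only: the two-stage least-squares estimators used here stand in for the minimum-CRPS estimation of Gneiting et al. (2005), a Gaussian may be a poor choice for strictly positive inflows, and all names are illustrative rather than the dissertation's configuration.

```python
import numpy as np
from scipy import stats

def fit_emos(ens_mean, ens_var, obs):
    """Estimate coefficients for the predictive model N(a + b*m, c + d*s2),
    where m and s2 are the ensemble mean and variance.

    Simplified estimation: (a, b) by least squares on the observations,
    then (c, d) by regressing squared residuals on the ensemble variance.
    """
    A = np.column_stack([np.ones_like(ens_mean), ens_mean])
    (a, b), *_ = np.linalg.lstsq(A, obs, rcond=None)
    resid2 = (obs - (a + b * ens_mean)) ** 2
    B = np.column_stack([np.ones_like(ens_var), ens_var])
    (c, d), *_ = np.linalg.lstsq(B, resid2, rcond=None)
    # Guard against non-positive predictive variance.
    return a, b, max(c, 1e-6), max(d, 0.0)

def emos_cdf(x, ens_mean, ens_var, coeffs):
    """Predictive CDF value P(inflow <= x) from the fitted EMOS model."""
    a, b, c, d = coeffs
    return stats.norm.cdf(x, loc=a + b * ens_mean,
                          scale=np.sqrt(c + d * ens_var))
```

Because the spread coefficients are fit to past forecast errors, variance that the raw ensemble underestimates is absorbed into the c + d*s2 term, which is how EMOS implicitly accounts for unsampled uncertainty.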
1.3.4 Economic Analysis

Since the sum total of the costs associated with the full M2M ensemble (e.g., time spent setting up a hydrologic model, or the price paid for high-resolution NWP fields) may be prohibitive for operational forecasting applications, it is prudent to evaluate the economic value of each M2M component. If the price paid for each component is known, such an analysis can be used to determine whether or not they are cost-effective. Murphy (1993) identified three types of forecast "goodness": consistency (i.e., between a forecaster's best judgement and the actual forecast), quality, and value. Value, which is concerned with economic worth to the forecast end user, is the focus of Chapter 5.

In order to determine the economic value of each component of the M2M ensemble (multiple NWP models and grid scales, multiple distributed hydrologic models, multiple model parameterizations and multiple hydrologic states), a simple cost-loss decision model is developed for the Daisy Lake reservoir based on the work of Richardson (2000) and McCollor and Stull (2008b). By comparing the economic value of the full M2M ensemble probability forecasts to that achieved by M2M configurations with individual ensemble components removed, it is possible to estimate the value added by the individual components.

Chapter 2
Bias-Corrected Short-Range Member-to-Member Ensemble Forecasts of Reservoir Inflow

2.1 Introduction

Since the first efforts in ensemble streamflow prediction over two decades ago (Day, 1985), ensemble hydrologic forecasting has grown immensely in scope. Hydrologic ensemble prediction systems (EPSs) have been developed that include multiple weather inputs (Gouweleeuw et al., 2005; Roulin and Vannitsem, 2005), multiple assimilated hydrologic states or initial conditions (Pauwels and De Lannoy, 2006; Clark et al., 2008), multiple parameter sets (Vrugt et al., 2003a,b) and multiple hydrologic models (Shamseldin et al., 1997; Ajami et al., 2006). Such ensembles attempt to sample the range of uncertainty in hydrologic prediction that is caused by errors in these components of the modelling chain (Bourdin et al., 2012).

Meteorological and hydrologic forecasts contain both systematic and random errors. Systematic error, also known as (unconditional) bias, can arise due to differences between modelled and actual topography, deficiencies in model representation of physical processes, and errors in model parameterization. The objective of bias correction is to reduce the systematic error of future forecasts by using statistical relationships between past forecasts and their verifying observations. Random error can be reduced through ensemble averaging, though the full ensemble contains valuable information regarding probabilities of possible future outcomes (Anderson, 1996).

In a multi-model ensemble, members derived from different dynamical (numerical weather prediction and/or hydrologic) models should be individually bias-corrected prior to their combination. In an ensemble where multiple realizations of a single dynamical model are used, a single correction factor (e.g., the bias of the ensemble mean) should be applied to all members. If bias correction is not done prior to multi-model combination, spread and other measures of ensemble performance can be artificially inflated due to the interaction of opposing model biases (Johnson and Swinbank, 2009; Candille et al., 2010). If component EPS biases do not balance, then their combination can result in a degradation of forecast accuracy (Wilson et al., 2007).

For this reason, many frameworks for generating reliable probabilistic forecasts through model combination begin with a bias correction step. For example, a linear regression-type bias corrector is built into the Bayesian Model Averaging (BMA) framework described by Raftery et al. (2005) to correct the individual meteorological ensemble members prior to combining them into a probabilistic weather forecast. Likewise, the more generalized approach of Johnson and Swinbank (2009) uses the bias of the ensemble mean forecast to correct the individual members from a single dynamical model. Vrugt and Robinson (2007) suggested that the global regression-based correction of the original BMA framework is too simple to be useful in hydrology, where model errors are non-Gaussian and heteroscedastic (i.e., proportional to flow level), and that local non-linear bias correction should be applied based on errors in the immediate past. In some applications of BMA, additive bias correction schemes have been used in place of global correction (e.g., Schmeits and Kok, 2010). Parrish et al. (2012) have suggested the combined use of data assimilation methods with BMA as a way of dealing with non-Gaussian error and other shortcomings of the method.

In hydrologic EPSs, bias correction has focused on the removal of unconditional bias from long-term (e.g., monthly or seasonal) forecasts (Hashino et al., 2007; Wood and Schaake, 2008). Post-processing of short-term hydrologic EPSs has focused on the correction of conditional or distributional bias of probabilistic forecasts to generate reliable probabilities (Seo et al., 2006; Zhao et al., 2011). Madadgar et al. (2012) recently developed an ensemble post-processing method suitable for seasonal hydrologic forecasting that is able to remove both unconditional and conditional bias while additionally improving other aspects of ensemble quality.
When weather model output is used to drive a hydrologic model, bias correction is often applied to the precipitation forecasts (Kouwen et al., 2005; Yoshitani et al., 2009; Westrick et al., 2002). The importance of post-processing the inputs to hydrologic models has been discussed in the literature (e.g., McCollor and Stull, 2008a; Yuan et al., 2008). However, Mascaro et al. (2011) have demonstrated that dispersion in streamflow ensemble forecasts is highly dependent on hydrologic state, suggesting that further correction of the end forecast is likely to be required. Indeed, Olsson and Lindström (2008) have suggested that while separate treatment of meteorological and hydrologic errors may be desirable from a scientific standpoint, from an operational point of view, only adjustment of the final hydrologic forecast is strictly necessary.

In this study, we develop a Member-to-Member (M2M) ensemble inflow forecasting system that incorporates multiple weather models driving multiple hydrology models for a total of 24 ensemble members. That is, individual NWP forecast ensemble members are used to drive the individual members of the DH ensemble, hence the term "Member-to-Member". We then assess the quality of both probabilistic and deterministic ensemble mean forecasts before and after applying different bias correction schemes to individual ensemble traces. Evaluation is based on daily inflow forecasts for a hydroelectric reservoir in the complex terrain of southwestern British Columbia, Canada.

2.2 Case Study Area and Data

The M2M ensemble is used to forecast inflows to the Daisy Lake reservoir, a hydroelectric facility on the upper Cheakamus River in southwestern British Columbia (BC), Canada, operated by the British Columbia Hydro and Power Authority (BC Hydro) (Figure 2.1). The total basin area upstream of the reservoir is 721 km², approximately 8% of which is glaciated. Elevation within the study basin ranges from 341 m to 2677 m above sea level with a median elevation of 1401 m. Hindcasts from the M2M system are tested over the 2009–2010 water year, which was characterized by El Niño conditions during the winter months that weakened throughout the spring and shifted into a La Niña state by late summer. In southwestern BC, it is well documented that El Niño episodes generally bring warmer, drier weather, while La Niña episodes are characterized by cooler and wetter than normal conditions (e.g., Mantua et al., 1997; Dettinger et al., 1998; Fleming et al., 2007; Fleming and Whitfield, 2010). The El Niño episode that occurred during the case-study water year was relatively wet for the region of interest, and winter snow accumulation at low elevations was below normal due to above-average temperatures. Note that for this particular hydroclimatic regime, a water year is defined as the period from October 1 of the starting year through September 30 of the following year.

Inflows to the Daisy Lake reservoir are primarily driven by snowmelt during spring and summer with a small glacial melt component. A secondary inflow peak occurs during the fall and winter storm season when Pacific frontal systems can bring significant inflows, particularly in the case of rain-on-snow events that can be difficult to predict. Daily average inflow rates are calculated by BC Hydro using a water balance based on observed reservoir levels and outflows.
The calculated inflows used in this study have undergone quality control and are considered to be of high quality. For the purposes of this study, these values will be referred to as observed inflows. The Water Survey of Canada (WSC) collects streamflow data for the Cheakamus River above Millar Creek (CHK) location (Figure 2.1). This data source was used in various stages of watershed model parameter optimization, but is not used to verify forecasts made by the M2M inflow forecasting system.

The Cheakamus basin upstream of Daisy Lake has limited coverage with respect to meteorological observations, especially at high elevation. BC Hydro operates three data collection platforms (DCP) within the watershed. These are located at the Daisy Lake Dam (CMS), on the Cheakamus River above Millar Creek (CHK) and at the Upper Cheakamus site (CMU). Additional coverage is provided by observing stations at the Whistler (WAE) and Squamish Airport (WSK) stations operated by Environment Canada (EC). All weather station locations and identifiers are shown in Figure 2.1. These observing platforms range in elevation from 52 m above sea level at WSK to 880 m at CMU.

Figure 2.1: Map of the Cheakamus basin above the Daisy Lake reservoir, located in southwestern BC, showing the watershed boundary, lakes, glaciers, streams, and the weather stations CHK, CMS, CMU, WSK, WAE and NUL. The ASTER global digital elevation model background map is a product of the Japanese Ministry of Economy, Trade and Industry (METI) and the National Aeronautics and Space Administration (NASA), with higher elevations represented by lighter shades of grey.

Daily maximum and minimum temperatures and 24-hour accumulated precipitation observations are available for all stations. WAE and WSK additionally provide observations of wind speed and humidity. During the model optimization and validation periods (1986–1997), missing humidity and wind speed observations were filled in using linear regression when one of WSK or WAE was available, and using monthly climatological values when both were missing. BC Hydro DCP data undergoes stringent quality control and has a complete record of observations during this period.
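This gap-filling step can be sketched as follows. The helper below is illustrative (the dissertation does not specify the regression implementation); it assumes simple least-squares regression between the two stations' overlapping records, with a monthly climatology fallback:

```python
import numpy as np

def fill_missing(target, donor, monthly_clim, months):
    """Fill gaps in `target` (e.g., WAE wind speed) from `donor` (e.g., WSK)
    via linear regression on days when both stations report; fall back to
    monthly climatological values when the donor is also missing."""
    target = target.copy()
    both = ~np.isnan(target) & ~np.isnan(donor)
    slope, intercept = np.polyfit(donor[both], target[both], 1)
    for i in np.where(np.isnan(target))[0]:
        if not np.isnan(donor[i]):
            target[i] = slope * donor[i] + intercept
        else:
            target[i] = monthly_clim[months[i] - 1]  # months are 1..12
    return target
```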
An imaginary weather station (designated as "NUL" in Figure 2.1) has been placed in the eastern portion of the watershed to improve meteorological data coverage. Data from the CMU site were copied to the NUL site, which was carefully selected based on elevation, aspect, surrounding terrain, and a comparison of PRISM (Parameter-elevation Regressions on Independent Slopes Model) 1961–1990 climate normal data at the two sites, downscaled to 400 m resolution by ClimateBC (PRISM Climate Group, 2012; Wang et al., 2006). We expect that without NUL, the inverse-distance downscaling of meteorological observations to the hydrologic model grid scale (discussed in Section 2.3.3) would be less accurate in the eastern half of the watershed. Since the nearest real weather stations are located in regions of lower elevation with different aspect, precipitation measurements at these sites will likely do a poor job of characterizing rainfall events in the mountainous terrain of the eastern Cheakamus, even following adjustments for elevation.

Continuous snow pillow observations are available from the WSC for all of the parameter optimization period and six out of ten years during the validation period for the Squamish Upper site, located outside of the western boundary of the Cheakamus watershed at an elevation of 1340 m. A proxy site was selected for verification of simulated snow water equivalent (SWE) at a location just inside the western watershed boundary. Site selection was again based on elevation, aspect, terrain, and a comparison of PRISM-ClimateBC data at the real and proxy locations.

2.3 A Member-to-Member (M2M) Ensemble Forecasting System

The M2M ensemble inflow forecasting system developed and applied in this study incorporates multiple Numerical Weather Prediction (NWP) models, which are downscaled using multiple interpolation schemes, and finally used to drive multiple Distributed Hydrologic (DH) models. That is, individual members of the NWP ensemble drive individual members of the hydrologic ensemble. A description of each of these components follows.

2.3.1 Numerical Weather Prediction Models

The NWP models are taken from the operational ensemble suite run by the Geophysical Disaster Computational Fluid Dynamics Centre (GDCFDC), in the Department of Earth, Ocean and Atmospheric Sciences at the University of British Columbia. The ensemble consists of three independent nested limited-area high-resolution mesoscale models with forecast domains centered over southwestern BC.

The Mesoscale Compressible Community (MC2) model is a fully compressible, semi-implicit, semi-Lagrangian, non-hydrostatic mesoscale model (Benoit et al., 1997). The fifth-generation Pennsylvania State University-National Center for Atmospheric Research Mesoscale Model (MM5) is a fully compressible, non-hydrostatic model designed for mesoscale and regional-scale atmospheric simulation (Grell et al., 1994). Version 3 of the Weather Research and Forecasting (WRF) mesoscale model is also fully compressible and non-hydrostatic and has been developed as a community model (Skamarock et al., 2008).

The coarse-resolution (108-km horizontal grid spacing) outer nests of these three NWP models are initialized using the National Centers for Environmental Prediction (NCEP) North American Mesoscale (NAM) model, which also provides time-varying boundary conditions. All three NWP models produce forecast output at horizontal grid spacings of 36, 12, 4 and 1.3 km. The finer grids, which have smaller model domains due to computational time constraints, are nested inside of the coarse grids from which they receive their time-varying boundary conditions. Due to the relatively small size of the case-study watershed, only NWP output from the three finest grids is used to drive the DH models. The NWP models are initialized at 00 UTC and run out to 60 hours (the 1.3-km MC2 model runs for only 39 hours due to operational time constraints). NWP model forecast hours beginning from 00 PST (08 UTC) are used to drive the DH models.
A multi-model, multi-grid-scale ensemble of weather forecasts consisting of at least six of the total nine members was issued every day throughout the study period, except for a five-day interval (April 9–13, 2010) in which only WRF members were available.

2.3.2 Distributed Hydrologic Models

In order to take advantage of the high-resolution distributed NWP output available, two physically-oriented, distributed hydrologic models have been selected for use based on their suitability to the case-study watershed. Specifically, the models must be able to simulate snowmelt and glacier melt processes and lakes in complex terrain given relatively limited input data. The DH models selected for this study are the Water balance Simulation Model (WaSiM; Schulla, 2012) and WATFLOOD (Kouwen, 2010).

WaSiM is fully distributed and uses physically based algorithms for most process descriptions. Algorithms of varying complexity may be selected by the model developer based on data constraints and knowledge of processes operating in the study watershed. For the current application, potential evapotranspiration (PET) is based on the Penman-Monteith equation (Monteith, 1965), the infiltration model is based on the Green and Ampt approach (Green and Ampt, 1911; Peschke, 1987), and soil water modelling and runoff are based on the TOPMODEL approach of Beven and Kirkby (1979). For sub-daily time steps, snowmelt can be modelled using a simple temperature index algorithm (Anderson, 1973) or a temperature-wind index approach in which melt rate is proportional to wind speed (Schulla, 2012). Because of poor coverage (both spatial and temporal) of wind speed observations in the Cheakamus basin, and because of the importance of snowmelt contributions to Daisy Lake inflows, we use the temperature index algorithm. The model was run using a 1 km grid spacing and an hourly time step. WaSiM uses gridded NWP output of hourly precipitation, temperature, wind speed, humidity and global radiation (the latter three variables being used by the PET module). For the duration of this study, global radiation (total direct and diffuse solar radiation at the ground surface) fields are only available from the MM5 models. Therefore, WaSiM model forecasts from all NWP models incorporate MM5 global radiation fields of the corresponding NWP model grid scale.

The WATFLOOD model similarly incorporates mainly physically based process descriptions, but operates in a semi-distributed manner using the Grouped Response Unit (GRU; Kite and Kouwen, 1992). The GRU approach lumps together Hydrologic Response Units (HRU), which are areas of similar land cover residing within one model grid square. Hydrologic processes are modelled identically for each group of HRU, and the responses of each group are weighted and summed to generate a total GRU outflow (Kouwen et al., 1993). This allows WATFLOOD to preserve sub-grid-scale hydrologic variability (for example, that described by a high-resolution digital elevation model), while computing flows at a grid scale selected based on the availability of meteorological inputs or the desired level of output detail. WATFLOOD uses the Hargreaves equation to estimate PET (Hargreaves and Samani, 1982). Infiltration is modelled by the Philip formula (Philip, 1954), which is identical to the Green and Ampt approach except that it includes the effects of surface ponding. Snowmelt is modelled using the temperature index algorithm (Anderson, 1973).
The NWP model output fields utilized by WATFLOOD are hourly precipitation and temperature.

Parameters of both DH models were optimized on observations of inflows at CMS and streamflows at CHK, using inputs of observed meteorological quantities from the DCP and EC stations (Figure 2.1) to drive model simulations for a period of ten water years (October 1997 – September 2007). To run the models at an hourly time step, daily minimum (TMIN) and maximum (TMAX) temperatures were transformed into hourly temperatures using a sine curve connecting TMIN at 0400 PST to TMAX at 1600 PST; a sketch of this disaggregation is given below. Daily total precipitation was disaggregated for WaSiM by dividing the daily total into 24 equal hourly amounts. WATFLOOD incorporates a built-in disaggregation method whereby 1 mm of precipitation accumulates each hour until the daily total is met, and equal hourly amounts are used if the daily total is greater than 24 mm. Global radiation inputs for the WaSiM model were calculated for each DCP and EC station location using equations from Stull (2000), with adjustment for atmospheric conditions based on Spokas and Forcella (2006).
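A minimal sketch of the two disaggregation rules just described (function names are illustrative, and the sine curve is simplified to a single-day cycle rather than interpolating between successive days' extremes):

```python
import numpy as np

def hourly_temperature(tmin, tmax):
    """Disaggregate daily TMIN/TMAX into 24 hourly values with a sine
    curve reaching TMIN at 0400 PST and TMAX at 1600 PST."""
    hours = np.arange(24)
    mean = 0.5 * (tmax + tmin)
    amp = 0.5 * (tmax - tmin)
    # Phase chosen so the minimum falls at hour 4 and the maximum at hour 16.
    return mean - amp * np.cos(2 * np.pi * (hours - 4) / 24)

def hourly_precip_watflood(daily_total):
    """WATFLOOD-style disaggregation: 1 mm accumulates each hour until the
    daily total is met; equal hourly amounts if the total exceeds 24 mm."""
    if daily_total > 24.0:
        return np.full(24, daily_total / 24.0)
    return np.minimum(1.0, np.maximum(0.0, daily_total - np.arange(24)))
```

For example, a 2.5 mm daily total becomes 1.0, 1.0 and 0.5 mm in the first three hours and zero thereafter, preserving the daily sum.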
Parameter optimization consisted of a multi-stage process beginning with manual tuning of a parameter set previously used for a similar application. Then a series of automated optimizations were run using the Dynamically Dimensioned Search (DDS) algorithm (Tolson and Shoemaker, 2007; Graeff et al., 2012; Francke, 2012). First, each DH model was auto-optimized with 500 DDS runs to tune parameters expected to impact high flows in the basin (e.g., rain/snow partitioning and snowmelt parameters), using the Nash-Sutcliffe Efficiency (NSE; Nash and Sutcliffe, 1970) of simulated flow as an objective function. Continuing from the resulting parameter set, an additional 500 DDS runs were done using the NSE of log-transformed flows to tune parameters affecting low-flow periods (e.g., soil parameters). Finally, three separate trials of 1000 DDS runs each were executed to optimize all previously-tuned parameters on the MAE of simulated CMS inflows only. While parameter optimization should in theory seek to optimize all characteristics of the hydrologic regime, this study is concerned only with generating high-quality inflow forecasts. Simulated CHK streamflow performance statistics still improved for both hydrologic models during the final optimization stage (with the exception of WATFLOOD mean absolute error, which increased by 1.6%).

The best of the three final-stage parameter optimization trials was selected based on performance during an independent validation period of ten water years spanning October 1986 through September 1996. Optimization and validation results for the best trial are shown in Table 2.1. Observed and simulated SWE at the proxy site for the Squamish Upper snow pillow were in good agreement during the optimization and validation periods, with coefficient of determination (R²) values of 0.91 (optimization) and 0.77 (validation) for WaSiM. Values for WATFLOOD were 0.87 and 0.83 respectively. The timing of annual peak and zero-SWE conditions was also acceptable for both models. Snow pillow observations were not used during optimization due to extremely limited coverage, and because the use of a proxy watershed location introduces uncertainty. Also, because there is only one SWE measurement site, optimization of WATFLOOD SWE would be limited to the particular land class in which the proxy site was located, which is somewhat arbitrary. SWE was excluded from WaSiM optimization in order to create a level playing field for the two models.

Table 2.1: Performance of simulated inflows from the WaSiM and WATFLOOD hydrologic models during optimization (1997–2007) and validation (1986–1996) periods. Measures of model performance are described in Section 2.5 and Appendix A.

Performance      WaSiM                         WATFLOOD
Measure          Optimization   Validation     Optimization   Validation
NSE              0.79           0.72           0.75           0.79
LNSE             0.73           0.75           0.79           0.81
R²               0.79           0.73           0.77           0.79
DMB              1.01           0.92           1.07           0.97
MAE (m³/s)       12.9           15.3           12.9           13.8
RMSE (m³/s)      19.8           24.8           21.4           21.6

These optimized models were then used to make ex-post-facto NWP-driven forecasts (or hindcasts) for the 2009–2010 water year to enable an independent verification in a setting similar to real-time operational forecasts. To begin, WaSiM and WATFLOOD were spun up from uniform, snow-free initial conditions for the period of September 1–30, 2009, using observed meteorological data to drive the models. The simulated hydrologic state for each model was saved at the end of this period to be used as an initial condition for the first NWP-driven M2M forecast run on October 1, 2009. Each day of the study period, observed meteorological data are used to drive the hydrologic models to update the model states, producing initial conditions for the day's forecasts. This updating is done to ensure that large hydrologic state errors do not accumulate due to poor NWP forecasts (Westrick et al., 2002). Observation-driven simulated inflows are created as a by-product of the state-updating process for WaSiM and WATFLOOD. Figure 2.2 illustrates this process of generating updated hydrologic states, simulated inflows (driven by observed meteorological data), and forecasted inflows (driven by NWP forecasts) for an individual DH model. Grey arrows represent input used to drive the model from a particular hydrologic state. Black arrows represent model runs initialized from this state. Solid lines show the flow of meteorological observations into the model and the resulting model runs that produce simulated inflows and updated hydrologic states for the following day. Dashed lines show the flow of NWP forecast fields into the model and the resulting 2-day inflow forecasts.
The flow of time is indicated by the dash-dotted line along the top of the figure.

Figure 2.2: Flowchart illustrating the process of generating updated hydrologic states, simulated inflows, and forecasted inflows for a particular hydrologic model.

2.3.3 Downscaling of Meteorological Input

Each DH model incorporates built-in methods for downscaling weather station data or gridded NWP forecast fields to the DH model grid scale. The downscaling or interpolation process introduces uncertainty into the forecasting chain, particularly for low-resolution meteorological inputs. Therefore, an ensemble of downscaling methods has been incorporated into the M2M ensemble in an attempt to account for such errors.

WaSiM has several downscaling methods available for use, a few of which have been applied here: bilinear interpolation, Inverse-Distance Weighting (IDW) where the weight parameter is set to two (i.e., distance-squared), altitude-dependent regression (REGR), and weighted combinations of IDW and REGR. DCP and EC station observations of temperature, wind speed, humidity and global radiation were downscaled using REGR, while precipitation was downscaled using a combination of 25% IDW (with a search radius of 30 km) and 75% REGR. These methods were selected based on the characteristics of each meteorological variable (Klok et al., 2001).

NWP output fields at the 12 km grid scale are downscaled to the WaSiM grid using two different methods (where the same method is used for all meteorological variables): IDW with a search radius of 12 km, and REGR. Outputs from the 4 and 1.3 km grids are downscaled using bilinear interpolation. Other methods were initially included in the M2M ensemble but were found to yield results too similar to those listed above.

WATFLOOD similarly offers an inverse-distance weighting interpolation with a default weighting of distance-squared. This built-in scheme offers an option to incorporate elevation dependence using a constant elevation adjustment rate (EAR); additional smoothing parameters can be specified. For downscaling of station observations, EARs for temperature and precipitation were respectively set to 3 °C/km (decreasing with height) and -0.3 mm/km (increasing with height). The IDW search radius was set to 20 km, and fields were smoothed over a distance of 5 km. These EAR values were selected based on examination of PRISM-ClimateBC fields surrounding DCP stations within the watershed. The temperature EAR is slightly less than slope-air lapse rates measured during clear nights and saturated frontal conditions at Whistler Mountain by Erven (2012).

NWP grids are downscaled using a search radius equal to their grid spacing. Smoothing is applied to the 12-km fields, and two different sets of EARs are used: those used in downscaling meteorological observations for model optimization and state updating, and a measured clear-day temperature slope-air lapse rate of 8 °C/km (Erven, 2012). The 4-km and 1.3-km NWP grids are downscaled without elevation adjustment or smoothing.
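A minimal sketch of IDW downscaling with a constant elevation adjustment, in the spirit of the WATFLOOD scheme just described (smoothing and search-radius handling are omitted, and all names are illustrative):

```python
import numpy as np

def idw_with_ear(x, y, z, values, src_x, src_y, src_z, ear, power=2.0):
    """Interpolate source-point values (stations or NWP grid points) to a
    target point (x, y, z) by inverse-distance weighting, after adjusting
    each source value to the target elevation with a constant elevation
    adjustment rate `ear` (e.g., 3.0 degC/km, decreasing with height)."""
    dist = np.hypot(src_x - x, src_y - y)
    if np.any(dist == 0):                 # target coincides with a source point
        return values[dist == 0][0]
    # Adjust each source value from its own elevation to the target elevation
    # (elevations in metres, adjustment rate per kilometre).
    adjusted = values - ear * (z - src_z) / 1000.0
    weights = dist ** -power
    return np.sum(weights * adjusted) / np.sum(weights)
```

With this sign convention, a target point 1000 m above a station receives a temperature 3 °C colder than the station value before the distance weighting is applied.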
In summary, the full M2M ensemble consists of 24 different combinations of NWP models, downscaling procedures and hydrologic models. WaSiM is driven by 12 different sets of NWP inputs: 12-km NWP model output from MC2, MM5 and WRF downscaled using IDW and REGR, 4-km outputs downscaled using bilinear interpolation, and 1.3-km outputs also downscaled using the bilinear algorithm. There are 12 additional members from the WATFLOOD model: 12-km NWP output from the three NWP models downscaled using IDW with two different temperature lapse rates, 4-km outputs downscaled using no elevation adjustment, and 1.3-km outputs likewise downscaled without elevation dependence. To the best of our knowledge, this is the first example of a short-range ensemble inflow or streamflow forecasting system to incorporate multiple NWP models, multiple downscaling schemes, and multiple hydrologic models.

2.4 A Simple Bias Correction Method

An appropriate measure of bias for volumetric quantities such as precipitation and reservoir inflow is the degree of mass balance (DMB; McCollor and Stull, 2008a). The DMB is a measure of the ratio of simulated or forecasted inflow to the observed inflow over a given period of time and is given by:

\mathrm{DMB}_N = \frac{\sum_{k=1}^{N} f_k}{\sum_{k=1}^{N} o_k}, \qquad (2.1)

where DMB_N is the degree of mass balance over an interval of N days, and f_k and o_k are the forecasted and observed inflows, respectively, for the kth day prior to the current day. A DMB of one indicates a forecast or simulation that is free of volumetric bias.

A bias-corrected inflow forecast is calculated by:

F_{BC} = \frac{F_{\mathrm{Raw}}}{\mathrm{DMB}_N}, \qquad (2.2)

where F_BC is today's bias-corrected daily inflow forecast, F_Raw is today's raw (uncorrected) daily inflow forecast, and DMB_N is the correction factor applied to the raw forecast. Forecast days 1 and 2 are treated separately (i.e., the day 1 forecasts are corrected using a DMB of the day 1 forecasts valid over the past N days, while the day 2 forecasts are corrected using the DMB of the day 2 forecasts valid over the past N days). The use of a multiplicative bias correction factor ensures that corrected inflow forecasts do not become negative.

In order to allow more recent forecast errors to have a bigger impact on the bias correction, an additional bias correction scheme is applied in which the DMB correction factor is a linearly-weighted average of the previous errors. This linearly-weighted DMB, calculated over an interval of N days (denoted LDMB_N), is given by:

\mathrm{LDMB}_N = \sum_{k=1}^{N} w_k \, \frac{f_k}{o_k}, \qquad (2.3)

where k, f_k and o_k are as previously defined and w_k is the weight applied to the error for day k. The weight, given by:

w_k = \frac{N - k + 1}{\sum_{i=1}^{N} i}, \qquad (2.4)

is normalized such that the sum of applied weights is equal to one. The LDMB_N correction factor is applied to the forecast using Eq. (2.2), with DMB_N replaced by LDMB_N.
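A minimal implementation of Eqs. (2.1)–(2.4) (variable names are illustrative):

```python
import numpy as np

def dmb(forecasts, observations):
    """Eq. (2.1): equally-weighted degree of mass balance over the window."""
    return np.sum(forecasts) / np.sum(observations)

def ldmb(forecasts, observations):
    """Eqs. (2.3)-(2.4): linearly-weighted DMB. Arrays are ordered from the
    most recent day (k = 1) to the oldest day (k = N)."""
    n = len(forecasts)
    k = np.arange(1, n + 1)
    weights = (n - k + 1) / (n * (n + 1) / 2)  # sum_{i=1}^{N} i = N(N+1)/2
    return np.sum(weights * forecasts / observations)

def bias_correct(raw_forecast, correction_factor):
    """Eq. (2.2): divide the raw forecast by the (L)DMB factor; the
    multiplicative correction keeps inflow forecasts non-negative."""
    return raw_forecast / correction_factor
```

With N = 3, for example, the weights are 3/6, 2/6 and 1/6 for the most recent, middle and oldest days, respectively.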
This bias corrector handles only the unconditional forecast bias, or the difference between the central locations of the forecasts and observations. Conditional bias, also known as distributional bias or reliability (Appendix A), can be corrected using probability calibration methods (e.g., Hamill and Colucci, 1997; Seo et al., 2006; Zhao et al., 2011; Nipen and Stull, 2011; Madadgar et al., 2012). It will be shown that correction of unconditional bias improves forecast resolution, or the ability of the forecasting system to a priori differentiate future weather outcomes such that different forecasts are associated with distinct verifying observations. This is an important aspect of ensemble forecast quality that cannot be improved by conditional bias correction via probability calibration (Toth et al., 2003).

The DMB and LDMB bias correction schemes described above are applied to each inflow ensemble member separately. Recall that the purpose of bias correction is to correct for systematic errors in the dynamic NWP and DH models. Since each ensemble member is derived from a different NWP model driving a different DH model, individual member bias correction is appropriate in this context. Moving windows of lengths N equal to 3, 7, 15, 30, 45 and 60 days are applied to the M2M ensemble members and compared in Section 2.6. If there are missing forecasts or observations during the past N days, the N most recent days with available forecast-observation pairs are used instead. Thus, even very short training windows are not overly sensitive to missing data.

2.5 Verification Approach

Forecasted hourly inflow rates from each M2M ensemble member were averaged over each forecast day for verification against daily observed inflow rates. Measures-oriented verification statistics are calculated for the M2M ensemble mean at each forecast horizon. Such measures of forecast quality include the DMB as a measure of forecast bias (a DMB of one indicating no bias), and the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) as measures of accuracy (with perfect forecasts having MAE and RMSE of zero). Forecast bias and accuracy can also be determined by a visual assessment of inflow hydrographs. M2M skill is measured relative to a zero-skill persistence forecast using the RMSE Skill Score (RMSESS). Statistical association is measured by the Nash-Sutcliffe Efficiency (NSE) and the NSE of Log-transformed flows (LNSE), which emphasizes forecast quality during periods of low flow.

While measures-oriented verification scores are useful for evaluating the quality of the ensemble mean forecast or individual ensemble members, the true value of an ensemble is best described using distributions-oriented measures (Murphy and Winkler, 1987). Here, we employ the rank histogram, Brier Skill Score (BSS), and the Relative Operating Characteristic (ROC). A description of all measures- and distributions-oriented verification measures is provided in Appendix A.

Both the Brier scores and ROC are calculated for forecast and observation anomaly thresholds relative to climatological inflow values. In order to ensure that the ensemble is not unduly rewarded for making high inflow forecasts during the snowmelt period, where little skill is required to do so, we subtract climatology from the forecasts and observations. This daily climatology is derived from the median of observations on each calendar day over the period 1986–2008. A 15-day running mean is then used to generate a smoothed climatology.
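A minimal sketch of this climatology construction, assuming a 366-entry calendar-day index and wrap-around smoothing at the year boundary (a detail the text does not specify); names are illustrative:

```python
import numpy as np

def smoothed_climatology(inflow_by_day):
    """Build a smoothed daily inflow climatology.

    inflow_by_day : list of 366 arrays, one per calendar day, each holding
                    that day's observed inflows over the historical record
                    (e.g., 1986-2008).
    Returns 366 values: the per-day median, smoothed with a 15-day running
    mean that wraps around the year boundary.
    """
    daily_median = np.array([np.median(v) for v in inflow_by_day])
    # Pad 7 days on each side so the 15-day window is defined everywhere.
    padded = np.concatenate([daily_median[-7:], daily_median, daily_median[:7]])
    kernel = np.ones(15) / 15.0
    return np.convolve(padded, kernel, mode="valid")
```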
BSS, its decomposition, and ROC curves will be calculated for anomaly thresholds having inflow rates of -5.0, 2.7, and 19.5 m³/s. These correspond to the quartiles of 2009–2010 observed inflow anomalies.

When comparing bias correction windows of different lengths, the verification periods include only days where all methods had enough prior forecast-observation pairs for calculation of DMB_N or LDMB_N. This ensures that shorter moving-window corrections that are available earlier in the water year are not penalized (rewarded) for difficult (easy) forecast cases during this period.

2.6 Results and Discussion

The raw ensemble traces for each ensemble member forecast are shown for the entire study period in Figure 2.3. The consistency in forecast bias among WATFLOOD ensemble members and among WaSiM ensemble members indicates bias in the simulations used to generate their initial conditions. Periods of strong positive (negative) M2M forecast bias are consistent with periods during which the daily simulated inflows exhibit positive (negative) bias relative to observed inflows.

This failure to accurately simulate the watershed state may be due to incorrect distribution of meteorological observations during the winter El Niño and summer La Niña episodes. Errors in the wintertime simulations for both models are largely consistent with errors in winter simulations during the optimization and validation periods where El Niño conditions prevailed; the snowmelt-related errors are likewise consistent with those seen during years with La Niña summers.

Figure 2.3: Raw ensemble traces for day 1 (top) and day 2 (bottom) forecasts during the 2009–2010 water year. Traces from the individual hydrologic models exhibit consistent bias, indicating a failure to accurately simulate the hydrologic state within the watershed.

For example, the WATFLOOD model (as set up for this particular study) tends to simulate erroneously high inflows during El Niño winters, and to be late in simulating snowmelt during La Niña summers. The winter errors could be due to distributed temperatures being too warm at high elevations where observations are not available. This would result in wintertime precipitation too often falling as rain, and in an underestimation of the high-elevation snowpack. This hypothesis is supported by snowmelt-driven flows being undersimulated during spring/summer following several of these El Niño winters. Conversely, high-elevation distributed temperatures may be too cold during the La Niña spring and summer. The use of alternative downscaling techniques or temperature EARs for different climate indices may improve these simulations; the investigation of such alternatives is beyond the scope of this work.
Data assimilation methods that update hydrologic state using observed SWE have shown promise for seasonal forecasting (DeChant and Moradkhani, 2011a), but may perform poorly for the Cheakamus basin due to the paucity of representative SWE data.

The DMB and LDMB bias correction methods result in dramatic improvements in M2M ensemble mean forecast quality, with best results for a 3-day moving window (Figure 2.4). For both forecast horizons and all window lengths, the LDMB correction offers improvement over the equally-weighted DMB correction. Moving windows of 45 and 60 days were found to produce bias-corrected ensemble mean forecasts that were worse than the raw output for some performance metrics, and are therefore not shown (raw forecast scores are indicated by the horizontal lines in Figure 2.4). The relatively good DMB of the raw ensemble mean forecasts is likely a result of performing model combination prior to bias correction of the individual ensemble members. That is, a balance is achieved by combining the WATFLOOD ensemble members, which are generally too wet (with DMB values of roughly 1.2 for days 1 and 2), with the WaSiM members, which have a dry bias (DMB of roughly 0.9 for both days).

Figure 2.4: Results of applying bias correction schemes with varying window lengths (3-, 7-, 15- and 30-day) to day 1 and day 2 forecasts, as measured by ensemble mean verification metrics (DMB, NSE, LNSE, MAE, RMSE and RMSESS). Perfect forecasts have DMB, NSE, LNSE and RMSESS of one, and MAE and RMSE of zero.

The bias in the hydrologic state used to start each NWP-driven forecast was found to be the primary contributor to forecast bias. By correcting the individual M2M traces using the bias of the corresponding simulated inflows calculated over the same window lengths, similar improvements to those shown in Figure 2.4 were found for most performance metrics. The importance of this bias source is likely the reason that short correction windows perform so well; since the Cheakamus watershed is mountainous and flashy in nature, only recent forecast errors are likely to play an important role in bias correction for short-term (1–2 day) forecasts.

The LDMB3 bias-corrected ensemble traces (Figure 2.5) no longer exhibit strong bias. They do, however, show an erroneous forecast spike on January 14–15, 2010 that is not as pronounced in Figure 2.3. An examination of NWP ensemble mean forecasts and observations at the CMU weather station reveals that this is caused by a combination of NWP failure and subsequent bias correction. On January 11 and 12, raw inflow forecasts from all models were too low, likely because NWP forecasts were colder and drier than observations, leading to snow accumulation rather than a rain-on-snow inflow event. The raw inflow forecast on January 15 is slightly larger than observed because the NWP forecasts were too warm and wet. This forecast failure, coupled with the large DMB correction resulting from the January 11–12 forecast failure, leads to the false alarm issued by the LDMB3-corrected forecast on January 14–15.

The absence of strong bias in the LDMB3 forecasts is also evident in the bias-corrected rank histograms in Figure 2.6.
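For reference, the rank histogram used here tallies, for each forecast case, the rank of the verifying observation within the sorted ensemble; a flat histogram indicates spread consistent with the observations. A minimal sketch (ties between the observation and ensemble members, which strictly require random rank assignment, are ignored):

```python
import numpy as np

def rank_histogram(ensemble_forecasts, observations):
    """Count how often the observation falls in each of the K+1 bins
    defined by a sorted K-member ensemble.

    ensemble_forecasts : array of shape (n_cases, K)
    observations       : array of shape (n_cases,)
    """
    n_cases, k = ensemble_forecasts.shape
    counts = np.zeros(k + 1, dtype=int)
    for members, obs in zip(ensemble_forecasts, observations):
        # Rank 0: obs below all members; rank K: obs above all members.
        rank = np.searchsorted(np.sort(members), obs)
        counts[rank] += 1
    return counts
```

An L shape (a large count in the lowest bin) means observations usually fall below the whole ensemble, i.e., over-forecasting; large counts in both end bins indicate underdispersion.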
The raw forecast rank histograms exhibit an overall L shape, indicating an over-forecasting bias. The peak in the middle of the raw histograms is due to the fact that the WATFLOOD and WaSiM ensemble members are tightly clustered and have opposing biases. Thus, observations are most likely to fall outside of the range of these ensembles, or somewhere between the clusters. Following bias correction, the L shape is far less pronounced. Both the raw and bias-corrected ensembles are underdispersive; bias correction causes a slight reduction in dispersion.

ROC diagrams (Figure 2.7) for the day 1 raw and LDMB3 bias-corrected ensembles indicate that the bias-corrected ensemble is better able to discriminate between the occurrence and non-occurrence of inflow events of various magnitudes. The DMB3 bias-corrected ensemble performs similarly to the LDMB3-corrected ensemble, with slightly less area under each curve.

Figure 2.8 shows the BSS, relative reliability and relative resolution of raw and bias-corrected forecasts for the 19.5 m³/s (75th percentile) inflow anomaly threshold. LDMB3 is better than DMB3 for day 1 in terms of all three metrics. For day 2 forecasts, however, DMB3 performs better than LDMB3 as measured by the BSS. The decomposition shows that this is because of a deterioration in LDMB3 reliability, which can easily be corrected by further post-processing. Day 2 resolution of LDMB3 forecasts remains superior to that of DMB3, and it is in this attribute that we find the intrinsic value of the forecasting system. These results point to the importance of separately handling unconditional and conditional (distributional) bias or reliability. Reliability can be improved by removing conditional bias, whereas resolution can only be corrected by improving the forecasting "engine" used to generate the ensemble, for example, through unconditional bias correction.

Figure 2.5: Ensemble traces for day 1 (top) and day 2 (bottom) forecasts during the 2009–2010 water year following LDMB3 bias correction.

2.7 Conclusions

Two different bias correction schemes, each trained using windows of varying lengths, have been applied to a 24-member ensemble inflow forecasting system developed for the Daisy Lake reservoir in southwestern British Columbia, Canada. Both bias correction schemes use the degree of mass balance between past inflow forecasts and observations to correct future forecasts. Based on examination of a suite of measures- and distributions-oriented verification metrics, we determined that a linearly-weighted combination of past DMB errors (with more recent errors being weighted more heavily) performs slightly better than an equally-weighted combination, with both methods
The best improvement was obtained for a3-day bias correction training window, likely due to the importance of hydrologic state bias and theflashy, mountainous nature of the case study watershed.The bias correction schemes used in this study are simple and easily implemented in an oper-ational setting. They can be applied to any watershed, but the ideal training window is likely tobe basin-dependent. For example, larger basins with slow response times may have better resultswith longer training periods. There is very little overhead involved in calculating the bias correctionfactors, which require only (N + forecast length) days of past forecast-observation pairs. Thispresents an advantage over regression-based correction methods, which can require years of datafor training. Additionally, unlike global regression, the DMB and LDMB methods described hereincan handle the heteroscedasticity of errors in hydrologic forecasts (Vrugt and Robinson, 2007).The NWP models used to drive the WATFLOOD and WaSiM hydrologic models in this studyhad forecast horizons of two days. At longer forecast horizons, longer training windows may benecessary in order to balance the goals of minimizing both short-memory hydrologic state bias and30Chapter 2: Bias-Corrected Short-Range Member-to-Member Ensemble Forecasts of Reservoir Inflow0 0.2 0.4 0.6 0.8 1.000.20.40.60.81.0False Alarm RateHit Rate  RawLDMB3Figure 2.7: ROC curves for day 1 ensemble forecasts for forecasted inflow anomalies greaterthan -5.0 m3/s (dot-dashed line), 2.7 m3/s (dashed line) and 19.5 m3/s (solid line). Thedotted line is the zero-skill line.NWP errors that may require longer learning periods (McCollor and Stull, 2008a). Using knowledgeof observation-driven simulated inflow bias, it is possible to separate the bias in the M2M ensembleforecasts into that caused by bias in initial conditions, and that caused by the interaction of the NWPand DH models (i.e., total forecast bias is the product of the DMB in the hydrologic state and of theinteracting models). This is an area for potential future study.To date, operational and research applications of ensemble forecasting have dealt with only oneor two error sources at a time (e.g., Vrugt et al., 2005; Moradkhani et al., 2005b; Randrianasoloet al., 2010; Thirel et al., 2010; Van den Bergh and Roulin, 2010; De Roo et al., 2011) and aretherefore underdispersive, failing to sample the full range of possible hydrologic outcomes. TheM2M system presented in this chapter is likewise underdispersive. This will be handled by addingto the M2M system a multi-state component and multi-parameter component. This upgraded M2Mensemble will thereby attempt to explicitly account for all sources of uncertainty in the modelling31Chapter 2: Bias-Corrected Short-Range Member-to-Member Ensemble Forecasts of Reservoir InflowDay 1 Day 200.20.40.60.81  Brier Skill Score RelativeReliability  RelativeResolutionBrier Skill Score RelativeReliability  RelativeResolutionRaw DMB3 LDMB3Figure 2.8: Brier skill score (BSS = 1 is perfect), relative reliability (zero is perfect) and rela-tive resolution (one is perfect) for raw and bias-corrected forecasts for days one and two.The inflow anomaly threshold evaluated here is 19.5 m3/s.chain (Bourdin et al., 2012) and constitute a truly probabilistic forecast as defined by Krzysztofow-icz (2001). 
It is anticipated that further probability calibration (e.g., Nipen and Stull, 2011) will be needed in order to improve the statistical reliability of the ensemble.

Figure 2.8: Brier skill score (BSS = 1 is perfect), relative reliability (zero is perfect) and relative resolution (one is perfect) for raw and bias-corrected forecasts for days one and two. The inflow anomaly threshold evaluated here is 19.5 m³/s.

Chapter 3
Improving Ensemble Forecasts of Reservoir Inflow by Sampling Uncertainty in the Modelling Chain

3.1 Introduction

As simplified representations of complex processes, hydrologic models and their predictions are subject to uncertainty. Making an ensemble of multiple forecasts is a way to sample the range of this uncertainty. Since Lorenz's (1963) work in chaos theory, the quantification of uncertainties in both initial conditions and model processes has become common practice in ensemble weather forecasting. The use of multi-model, multi-analysis super-ensembles represents a truly probabilistic approach, accounting for uncertainties in initial conditions, parameterizations, and model structure (Ross and Krishnamurti, 2005). A similar approach is needed for hydrologic applications to increase ensemble spread and capture the full range of predictive uncertainty.

In order to generate a truly probabilistic forecasting system, it is necessary to sample all sources of uncertainty in the modelling chain (Krzysztofowicz, 2001). In the case of reservoir inflow forecasting, errors are incorporated into the forecasting system by way of the hydrologic models themselves, their parameterizations, and the initial and boundary conditions (i.e., meteorological data) used to drive the models. To date, operational and research efforts into probabilistic streamflow forecasts through the use of ensembles have neglected some sources of uncertainty. For example, the US National Weather Service River Forecast System generates operational probabilistic water supply forecasts using the original Ensemble Streamflow Prediction (ESP) method of Day (1985), whereby streamflow simulation is driven by historical sets of temperature and precipitation data (Franz et al., 2003). Thus, the ESP forecasts account only for uncertainty in meteorological inputs, ignoring non-stationarity. Georgakakos et al. (2004) used multiple hydrologic models, some with many parameter sets, to assess streamflow prediction uncertainty. Duan et al. (2007) used three hydrologic models, each optimized using three different objective functions, to derive a nine-member ensemble that assessed uncertainty arising from model structure and parameter error. Carpenter and Georgakakos (2006) used a Monte Carlo sampling framework to account for both parametric and radar-rainfall uncertainty. The Absynthe modelling procedure used by the BC Hydro and Power Authority (BC Hydro) for daily inflow forecasts likewise incorporates ensemble weather forecasts and multiple parameter sets for a single hydrologic model (Fleming et al., 2010). Other examples of hydrologic ensembles that incompletely sample uncertainty include but are not limited to: Vrugt et al. (2005); Moradkhani et al. (2005b); Randrianasolo et al. (2010); Thirel et al. (2010); Van den Bergh and Roulin (2010), and De Roo et al. (2011).

In this chapter, we present an ensemble reservoir inflow forecasting system that samples forecast uncertainty arising from all sources of error in the modelling chain.
The ensemble consists of multiple Numerical Weather Prediction (NWP) model output grids downscaled using multiple interpolation schemes and subsequently used to drive multiple Distributed Hydrologic (DH) models. Each of these DH models makes use of multiple differently-optimized parameter sets and begins each day's forecast from a set of different initial conditions or hydrologic states. This ensemble thereby comprises a truly probabilistic forecasting system as defined by Krzysztofowicz (2001).

To the best of our knowledge, this Member-to-Member (M2M) ensemble is currently the only example of a short-term hydrologic forecasting system that explicitly attempts to sample all sources of model error. In a previous study (Chapter 2), an ensemble inflow forecasting system consisting of a multi-NWP, multi-DH ensemble with multiple downscaling schemes was evaluated. The focus of this chapter is on evaluating the impact of adding the multi-parameter and multi-state components to this ensemble.

3.2 Case Study Area and Data

The hydrologic ensemble forecasting system developed in this study is used to forecast inflows to the Daisy Lake reservoir, a hydroelectric facility on the upper Cheakamus River in southwestern British Columbia (BC), Canada. The reservoir is operated by BC Hydro. Evaluation of the ensemble is carried out over the 2009–2010 water year, defined as the period from 1 October, 2009 to 30 September, 2010. Fall and winter storm season inflows are primarily driven by precipitation from Pacific frontal systems. Rain-on-snow events can cause significant inflows during this period. During spring and summer, inflows are snowmelt-driven, with some late-season glacier melt contributions.

Daily average inflow rates are calculated by BC Hydro using a water balance based on observed reservoir levels and outflows. The calculated inflows employed in this study have undergone quality control and are considered to be of high quality. For the purposes of this study, these values will be referred to as observed inflows. Hourly forecast inflows to the Daisy Lake reservoir are transformed into daily average inflow rates for verification against these observations.

Evaluation of the ensemble forecasting system is based on a suite of measures- and distributions-oriented verification metrics. Measures-oriented scores are used to evaluate the performance of the ensemble mean, and include the Degree of Mass Balance (DMB) as a measure of forecast bias, the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) as measures of accuracy, and the RMSE Skill Score (RMSESS), which measures forecast skill relative to a zero-skill reference forecast, taken here to be persistence. Statistical association of ensemble mean forecasts is evaluated using the Nash-Sutcliffe Efficiency (NSE; Nash and Sutcliffe, 1970), which emphasizes high-flow periods, and the NSE of Log-transformed flows (LNSE) to evaluate low-flow performance. The full ensemble is evaluated using distributions-oriented measures including the Brier Skill Score (BSS) and its decomposition into relative reliability and relative resolution, the ensemble rank histogram, and the Relative Operating Characteristics (ROC) diagram. Detailed descriptions of these verification scores are given in Appendix A.
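The measures-oriented scores above follow their standard definitions; minimal reference implementations are sketched below (names are illustrative, and the small epsilon guard against zero flows in the log transform is our own assumption, not specified in the text):

```python
import numpy as np

def nse(sim, obs):
    """Nash-Sutcliffe Efficiency: 1 is perfect; 0 matches the obs mean."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - np.mean(obs)) ** 2)

def lnse(sim, obs, eps=1e-6):
    """NSE of log-transformed flows, emphasizing low-flow performance."""
    return nse(np.log(sim + eps), np.log(obs + eps))

def rmse(sim, obs):
    return np.sqrt(np.mean((sim - obs) ** 2))

def rmsess(sim, obs, reference):
    """RMSE skill score relative to a zero-skill reference forecast
    (here, persistence): 1 is perfect, 0 matches the reference."""
    return 1.0 - rmse(sim, obs) / rmse(reference, obs)
```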
3.3 A Member-to-Member (M2M) Ensemble Forecasting System

The Member-to-Member (M2M) ensemble forecasting system used for forecasting inflows to the Daisy Lake reservoir explicitly samples uncertainty arising from errors in the Numerical Weather Prediction (NWP) fields used to drive the Distributed Hydrologic (DH) models, the hydrologic models themselves and their parameterizations, and the hydrologic states or initial conditions used to begin each daily forecast run. The result is an ensemble of 72 unique daily inflow forecasts. A description of each of the M2M ensemble components follows.

3.3.1 A Multi-NWP Ensemble

The NWP models are from the operational ensemble run by the Geophysical Disaster Computational Fluid Dynamics Centre (GDCFDC) in the Department of Earth, Ocean and Atmospheric Sciences at the University of British Columbia. The ensemble includes three independent nested limited-area high-resolution mesoscale models with forecast domains centred over southwestern BC.

The Mesoscale Compressible Community (MC2) model is a fully compressible, semi-implicit, semi-Lagrangian, non-hydrostatic mesoscale model (Benoit et al., 1997). The fifth-generation Pennsylvania State University-National Center for Atmospheric Research Mesoscale Model (MM5) is a fully compressible, non-hydrostatic model designed for mesoscale and regional-scale atmospheric simulation (Grell et al., 1994). Version 3 of the Weather Research and Forecasting (WRF) mesoscale model is also fully compressible and non-hydrostatic and has been developed as a community model (Skamarock et al., 2008).

The coarse resolution (108 km horizontal grid spacing) outer nests of these three NWP models are initialized using the National Centers for Environmental Prediction (NCEP) North American Mesoscale (NAM) model, which also provides time-varying boundary conditions. All three NWP models produce forecast output at horizontal grid spacings of 36, 12, 4 and 1.3 km. The finer grids (which have smaller model domains due to computational constraints) are nested inside of the coarse grids from which they receive their time-varying boundary conditions. Due to the small size of the case-study watershed, NWP output from only the three finest grids is used to drive the DH models. The NWP models are initialized at 00 UTC and run out to 60 hours (the 1.3-km MC2 model runs for only 39 hours due to operational time constraints). NWP model forecasts beginning from 00 PST (08 UTC) are used to drive the DH models from a particular set of initial conditions, or a hydrologic state. The creation of initial conditions for the daily inflow forecasts is described in Section 3.3.4.

3.3.2 A Multi-Hydrologic Model Ensemble

The DH models applied to the case-study watershed are the Water balance Simulation Model (WaSiM; Schulla, 2012) and WATFLOOD (Kouwen, 2010). These models were selected because they are distributed, and therefore able to take direct advantage of high-resolution NWP input. They are also able to simulate snowmelt and glacier melt processes and lakes in complex terrain given relatively limited input data. These features are critical for modelling in the case-study watershed. The optimization of model parameters for each DH model is described in Section 3.3.3.

Both DH models are run at 1 km grid spacing at an hourly time step.
The NWP fields are downscaled to the DH model grid using interpolation schemes built into each DH model. For the WaSiM model, 12-km NWP fields are downscaled using two methods: inverse-distance weighting (IDW); and elevation-dependent regression (Schulla, 2012). The 4-km and 1.3-km NWP fields are downscaled using a bilinear interpolation scheme. WATFLOOD downscaling is done using IDW that incorporates elevation dependence using an optional elevation adjustment rate for both temperature and precipitation. The 12-km fields are downscaled using IDW with two different elevation adjustments, while the 4- and 1.3-km fields do not use the elevation adjustment.

3.3.3 A Multi-Parameter Hydrologic Ensemble

Parameters of both DH models were optimized on observed inflows using meteorological data from weather stations located within the case-study watershed and surrounding area to drive model simulations for a period of ten water years (1997–2007). Parameter optimization consisted of a multi-stage process beginning with manual tuning of a parameter set previously used for a similar application. Then, to generate different parameter sets, a series of automated optimizations were run using the Dynamically Dimensioned Search (DDS) algorithm (Tolson and Shoemaker, 2007; Graeff et al., 2012; Francke, 2012) with different objective functions: the mean absolute error (MAE) of simulated inflow, to minimize overall errors; the Nash-Sutcliffe Efficiency (NSE; Nash and Sutcliffe, 1970) of inflow, to emphasize performance during high-flow events; and the NSE of log-transformed flows (LNSE), to optimize during low-flow periods. This methodology is consistent with that of Duan et al. (2007), who likewise used objective functions favouring different parts of the hydrograph to optimize multiple hydrologic models. The different model parameterizations attempt to explicitly sample uncertainty in the parameter values. The parameter sets optimized using NSE and LNSE of inflows are referred to as the NSEo and LNSEo parameter sets.

The third parameter set, designated MAEo, was generated using a four-step procedure to produce a "best" model parameterization for each hydrologic model. Following the manual tuning step, DDS was applied to optimize parameters expected to impact high flows in the basin (e.g., rain/snow partitioning and snowmelt parameters) using the NSE of simulated flow as an objective function. Continuing from the resulting parameter set, an additional DDS optimization was carried out using the NSE of log-transformed flows to optimize parameters affecting low-flow periods (e.g., soil parameters). These optimizations were based on performance of simulated inflows to Daisy Lake and streamflows at an upstream location (Cheakamus Upper), which are collected by the Water Survey of Canada (WSC). Finally, three separate DDS trials were executed to optimize all tuning parameters on the MAE of simulated inflows only. The best of the three trials was selected based on performance during an independent validation period of ten water years (1986–1996). In this chapter, the full ensemble consisting of forecasts from all three model parameterizations is compared to an ensemble comprised of the single best (MAEo) parameterization for each DH model. This MAEo-only configuration is identical to the multi-NWP, multi-DH ensemble evaluated in Chapter 2. A sketch of the DDS search is given below.
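For reference, the core DDS loop is sketched below in Python. This is a schematic minimization in the spirit of Tolson and Shoemaker (2007): the probability of perturbing each parameter decays as 1 - ln(i)/ln(m), and perturbations are scaled by a fraction r of each parameter's range. The reflect-then-clip bound handling and all names are our simplifications, not the exact implementation used here.

    import numpy as np

    def dds(objective, x0, lo, hi, m=1000, r=0.2, seed=0):
        """Dynamically Dimensioned Search (schematic), minimizing `objective`."""
        rng = np.random.default_rng(seed)
        lo, hi = np.asarray(lo, float), np.asarray(hi, float)
        best = np.asarray(x0, float)
        f_best = objective(best)
        for i in range(1, m + 1):
            # Fewer dimensions are perturbed as the search matures.
            p = 1.0 - np.log(i) / np.log(m)
            mask = rng.random(best.size) < p
            if not mask.any():
                mask[rng.integers(best.size)] = True   # perturb at least one
            cand = best.copy()
            cand[mask] += r * (hi[mask] - lo[mask]) * rng.standard_normal(mask.sum())
            # Reflect at the bounds, then clip any remaining violations.
            cand = np.where(cand < lo, 2 * lo - cand, cand)
            cand = np.where(cand > hi, 2 * hi - cand, cand)
            cand = np.clip(cand, lo, hi)
            f_cand = objective(cand)
            if f_cand <= f_best:                       # greedy acceptance
                best, f_best = cand, f_cand
        return best, f_best

Here the objective would be, for example, the MAE of simulated inflow over the calibration period, or the negative of NSE or LNSE, so that minimization favours the corresponding portion of the hydrograph.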
The set of model parameters selected for optimization in each DH model is the same in all three optimized model parameterizations. These are parameters related to watershed soil properties and the accumulation and melt of snow and glaciers. WATFLOOD allows different land-use types to be assigned different values of parameters (Kite and Kouwen, 1992; Kouwen et al., 1993). WATFLOOD soil parameters were optimized for four different land classes comprising the majority of the watershed area: barren/alpine, old forest, young forest, and logged (Figure 3.1). Model parameters impacting snow and glacier processes were also optimized for the glacier land class. Parameters corresponding to the other land classes in WATFLOOD (wetlands, water and impervious surfaces) were excluded from optimization because they have well-defined values or because they contribute relatively little to the total watershed area.

Figure 3.1: Map of the Cheakamus watershed showing land-use/land cover classes utilized in the WATFLOOD model. Map derived from data provided by BC Hydro.

A selection of readily-interpreted, optimized WaSiM model parameters is shown in Table 3.1. By examining these parameters, we can anticipate how the NSEo and LNSEo WaSiM model runs will differ from those made using the MAEo parameter set. Since the NSEo rain-snow threshold (T_R/S) is lower, we should expect less snow accumulation and flashier precipitation-driven inflow events during the fall and winter. A slightly lower threshold temperature for snowmelt (T_melt) means that melt may begin earlier in the year, and a lower degree-day melt factor (MF) should result in a slower melt rate. Similarly, we anticipate LNSEo simulations to have less snow accumulation than the MAEo and the NSEo, with more rain events during the fall and winter. The timing of spring snowmelt should be approximately the same as in the NSEo simulations, but snowmelt will be slower.

Table 3.1: Selected model parameters for the WaSiM hydrologic model, as optimized by the DDS algorithm using different objective functions.

    Model Parameter     MAEo    NSEo    LNSEo
    T_R/S (°C)          -1.44   -1.75   -1.88
    T_melt (°C)          1.68    1.28    0.87
    MF (mm/day/°C)       1.59    1.48    1.08

Figure 3.2 displays snow water equivalent (SWE) simulations made by WaSiM for the 2009–2010 water year. Observed SWE is from the Squamish Upper snow pillow site operated by the Water Survey of Canada, located just outside the western boundary of the watershed. Simulated SWE is at a proxy location in the watershed selected based on elevation, aspect, terrain, and a comparison of PRISM (Parameter-elevation Regressions on Independent Slopes Model) 1961–1990 climate normal data at the real and proxy sites (PRISM Climate Group, 2012). PRISM data were downscaled to 400 m resolution by ClimateBC (Wang et al., 2006). Results in Figure 3.2 confirm that the NSEo- and LNSEo-based simulations accumulate less snow than MAEo. The melt rate for these simulations is lower than that of the MAEo simulations and the observations.
The timing of the onset of spring snowmelt is similar for all model runs, and is in good agreement with observations.

Figure 3.2: Snow-water equivalent at the Squamish Upper proxy site as simulated by the WaSiM hydrologic model using the MAEo, NSEo and LNSEo parameter sets.

The impact of different parameter values can also be seen in WaSiM simulated inflows. These simulations are driven by observed meteorological data and are a by-product of updating the daily hydrologic state or initial condition. Figure 3.3 illustrates the process of generating updated hydrologic states, simulated inflows, and forecasted inflows (which are driven by NWP fields) for an individual DH model. As anticipated, the fall/winter period displayed in Figure 3.4 shows NSEo and LNSEo simulated inflows to be flashier than the MAEo simulated inflows due to different rain/snow partitioning of precipitation events. These results have not been bias-corrected. The impact of varying soil parameters is more difficult to discern from the inflow record in such a flashy watershed. WaSiM model output allows for a detailed analysis of water storage in the unsaturated and saturated soil zones among many other diagnostic products; such output was not analyzed in this study.

Figure 3.3: Flowchart illustrating the process of generating updated hydrologic states, simulated inflows, and forecasted inflows for a particular hydrologic model. Solid lines show the flow of meteorological observations into the model and the production of simulated inflows and updated hydrologic states for the following day. Dashed lines show the flow of NWP forecasts into the model and the resulting 2-day inflow forecasts.

Figure 3.4: Daisy Lake inflows during fall and early winter of the 2009–2010 water year simulated by the WaSiM model using the MAEo, NSEo and LNSEo parameter sets.

WATFLOOD model parameters for alpine and old forest land-use classes are shown in Table 3.2. These classes are examined because they are the largest classes in the watershed (Figure 3.1), and because the proxy site for comparing modelled and observed SWE is in the old forest class. Note that in WATFLOOD, T_R/S is taken to be T_melt, whereas WaSiM allows a rain/snow mix to occur over a range of temperatures, and its rain/snow and snowmelt temperature thresholds can differ.
WATFLOOD model parameters for the old forest land class indicate that relative to MAEo, both the NSEo and LNSEo models may begin to accumulate snow earlier in the fall, and will do so at a greater rate. Flashy rain-on-snow or winter rainfall events are less likely due to higher threshold temperatures for rain/snow partitioning. NSEo and LNSEo snowmelt should begin later in the spring, with LNSEo having a greater rate of melting. In alpine areas, T_melt for NSEo and LNSEo is only slightly lower than for the MAEo parameter set, so the differences in rain/snow partitioning are likely insignificant. The NSEo melt factor is less than that of the other models, so this land class will contribute less to snowmelt-driven inflows, but its contribution may last longer into the summer.

Table 3.2: Same as Table 3.1, but for the two primary land classes in the WATFLOOD model.

    Model Parameter       Land Class    MAEo    NSEo    LNSEo
    T_melt, T_R/S (°C)    Alpine        -1.90   -2.12   -2.12
                          Old Forest     1.96    2.71    2.84
    MF (mm/h/°C)          Alpine         0.24    0.17    0.24
                          Old Forest     0.11    0.10    0.12

Figure 3.5 shows that SWE simulated at the proxy location for the 2009–2010 water year varies between the different model simulations as expected. That is, the rate of snow accumulation at the site is slightly higher for NSEo and LNSEo than for MAEo, and melt begins later in the season. Once melt begins, the LNSEo SWE drops off faster than the NSEo and MAEo SWE due to the higher degree-day melt factor (note that this is a simplified explanation, as the melt rate is also related to the difference between T_melt and the actual air temperature). In a previous study (Chapter 2), WATFLOOD was found to be late in simulating the onset of snowmelt during La Niña summers; this phenomenon is visible in Figure 3.5. Simulated inflows are not shown because the interaction of inflow contributions from the various land classes makes it difficult to assess the impact of a small number of model parameters on this variable. However, it appears that the MAEo simulated inflows are slightly flashier than the NSEo and LNSEo inflows during the fall and winter, which could be due to the lower MAEo T_R/S temperature in the old forest land class. Additional diagnostic model output that could illuminate the impacts of soil parameters was not examined.

Figure 3.5: As in Figure 3.2, but simulations done by the WATFLOOD hydrologic model.

3.3.4 A Multi-State Hydrologic Ensemble

The multi-state or multi-initial-condition component of the M2M ensemble forecasting system arises as a direct consequence of implementing a multi-parameter component. On each forecast day, the model state is updated by simulating watershed processes using yesterday's meteorological observations to drive the DH models (Figure 3.3). In order to maintain equilibrium in the hydrologic models, the model parameterization used to drive the forecast must match that used in generating these initial conditions. Consider, for example, the WaSiM model parameter m, which describes water recession in the saturated zone. This parameter has a direct impact on the soil saturation deficit, so if the model state is updated with a particular m value, it may generate a hydrologic state with a deficit.
If the forecast beginning from this state uses a different value for m, this deficit may suddenly become a surplus, resulting in an immediate release of water from the saturated zone.

To avoid such discontinuities, the initial model spin-up from uniform, snow-free conditions at the beginning of the case-study water year is done for each hydrologic model using each of the three different parameter sets, with meteorological observations used to drive the models. WATFLOOD and WaSiM each have three different sets of initial conditions: a MAEo state, a NSEo state, and a LNSEo state. Each forecast day, these states are updated using the corresponding parameter sets and newly observed meteorological data, and then forecasts are made from these updated states, again using the corresponding model parameterization. This process is illustrated in Figure 3.3 for a single model/parameterization/state, and sketched in code below. Forecasts made using the MAEo parameter set and beginning from the MAEo state will be referred to as MAEo forecasts, and so forth. These different hydrologic states comprise a limited sampling of the uncertainty space in the models' initial conditions.

While ensemble data assimilation methods are available for hydrologic modelling applications (e.g., Andreadis and Lettenmaier, 2006; Clark et al., 2008), we have opted to limit the component of the M2M ensemble that samples hydrologic state uncertainty to just those states necessitated by the multi-parameter ensemble. This is because of the paucity of observed data available within the watershed for assimilation. DeChant and Moradkhani (2011a) have had some success using assimilation of observed SWE to update the hydrologic state in seasonal forecasting. However, the method was found to be sensitive to the availability of representative observations and would therefore possibly fail to produce an accurate state for the Cheakamus watershed. The Retrospective Ensemble Kalman Filter (REnKF; Pauwels and De Lannoy, 2006) may be worth exploring, though the computational expense of ensemble data assimilation can be prohibitive in an operational forecasting framework. Dual state-parameter estimation frameworks that incorporate data assimilation could also be used for a more complete handling of parameter and initial condition uncertainty (e.g., Moradkhani et al., 2005a; DeChant and Moradkhani, 2011b; Leisenring and Moradkhani, 2011).

The full ensemble including the six different hydrologic models (two distinct DH models, each with three parameterizations/states) and driven by the multi-model, multi-grid-scale NWP ensemble with different downscaling schemes has a total of 72 ensemble members. A sample workflow for generating the MAEo WaSiM forecasts is illustrated in Figure 3.6. Each day, each model configuration (consisting of a hydrologic model and a parameter set/hydrologic state) is driven by 12 different downscaled NWP forecast fields, generating 12 different inflow forecasts. This forecast workflow is indicated by the solid arrows. Dashed arrows illustrate how meteorological observations are used to update the model configuration's hydrologic state for the following day's forecasts. The model configuration is indicated by dash-dotted arrows. This process is repeated for each watershed model (WaSiM and WATFLOOD) and each parameterization/state (MAEo, NSEo and LNSEo), yielding 72 unique inflow forecasts each day.
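In pseudocode form, the daily update/forecast cycle for the full ensemble looks like the sketch below. The callables standing in for the hydrologic models and the dictionary structure are illustrative assumptions; the real WaSiM and WATFLOOD interfaces differ.

    def daily_cycle(models, param_sets, nwp_fields, met_obs, states):
        """One forecast day. `models` maps a model name to a pair of plain
        functions (update_state, run) standing in for WaSiM/WATFLOOD."""
        forecasts = {}
        for m_name, (update_state, run) in models.items():
            for p_name, params in param_sets.items():
                key = (m_name, p_name)
                # Update the state with yesterday's observations using the SAME
                # parameter set that will drive the forecast, to avoid the
                # discontinuities described above.
                states[key] = update_state(states[key], params, met_obs)
                for n_name, nwp in nwp_fields.items():
                    forecasts[(m_name, p_name, n_name)] = run(
                        states[key], params, nwp)
        return forecasts   # 2 models x 3 parameter sets x 12 NWP fields = 72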
3.3.5 Bias Correction of Inflow Forecasts

Prior to combination and evaluation, each of the 72 inflow forecast ensemble members is post-processed to remove unconditional bias. The purpose of bias correction is to correct for systematic errors in the dynamic NWP and DH models. Since each ensemble member is derived from a different NWP model driving a different DH model, individual member bias correction is necessary in this context.

An appropriate measure of bias for volumetric quantities such as precipitation or reservoir inflow is the degree of mass balance (DMB; McCollor and Stull, 2008a). The DMB is a measure of the ratio of simulated or forecasted inflow volume to the observed inflow volume over a given period of time (see Appendix A). The use of a multiplicative bias corrector prevents corrected inflows from becoming negative.

Figure 3.6: The flow of information into and out of the WaSiM model for generating MAEo forecasts. Each model (WaSiM and WATFLOOD) and each parameterization (MAEo, NSEo and LNSEo) generates 12 different daily forecasts in this way for a combined total of 72 unique daily forecasts.

A linearly-weighted DMB bias correction factor calculated over a moving window of three days was found to be ideal in removing bias from the multi-NWP, multi-DH M2M ensemble forecasts evaluated in Chapter 2. That study found bias in the hydrologic state used to start each inflow forecast to be the main contributor to forecast bias. The importance of this bias source is likely the reason that a short correction window performs so well; due to the flashy nature of the case-study watershed, only recent forecast errors are likely to play an important role in bias correction for short-term forecasts. The linearly-weighted three-day DMB bias corrector (LDMB3) developed in Chapter 2 is applied in this study. Uncorrected forecasts are referred to herein as "raw" forecasts.
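A minimal sketch of this corrector follows, assuming linear weights (1, 2, 3) that load most heavily on the most recent day; the precise weighting of McCollor and Stull (2008a) may differ in detail, and the function names are ours.

    import numpy as np

    def ldmb3(fcst_hist, obs_hist):
        """Linearly weighted degree of mass balance over the last three days.
        Inputs are the three most recent verified daily inflows, oldest first."""
        w = np.array([1.0, 2.0, 3.0])
        return (w * fcst_hist).sum() / (w * obs_hist).sum()

    def bias_correct(raw_fcst, fcst_hist, obs_hist):
        # Multiplicative (division) correction keeps corrected inflows positive.
        return raw_fcst / ldmb3(fcst_hist, obs_hist)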
3.4 Results and Discussion

In Chapter 2, a M2M ensemble consisting of the multi-NWP and multi-DH ensemble components with a single "best" (MAEo) model parameterization was evaluated. In this chapter, the impact of adding the multi-parameter and multi-state ensemble components is evaluated by comparison against the smaller ensemble.

An initial analysis of individual forecast ensemble members indicated that the NSEo and LNSEo members performed poorly relative to the MAEo members for many measures-oriented verification scores. Figure 3.7 shows the performance of day 1 inflow forecasts driven by the 4-km WRF model output, evaluated over the 2009–2010 water year. Note that MAE, NSE and LNSE without the subscript "o" refer to verification measures (see Appendix A), while those with the subscript refer to model forecasts from a particular model parameterization/hydrologic state. The differences in performance among differently-optimized forecasts driven by this particular NWP model are consistent with those driven by output from other NWP models and grid scales. Thus, this can be considered a representative sample of the relative performance of the various DH model parameterizations.

Figure 3.7: Performance of day 1 MAEo, NSEo and LNSEo forecasts from the WaSiM and WATFLOOD models driven by the 4-km WRF NWP output fields. Perfect inflow forecasts have NSE and LNSE equal to one (unitless), and MAE of zero m3/s.

The raw LNSEo WaSiM forecasts perform poorly relative to the other optimizations, especially in terms of MAE and NSE. The differences among the raw WATFLOOD members are not as large. Following bias correction using the LDMB3 correction factor, the performance of the WaSiM members is roughly equal. In the case of WATFLOOD, the bias corrector is unable to improve the LNSEo ensemble member performance to the same degree as the MAEo and NSEo members (in terms of MAE and NSE). A comparison of the forecast hydrographs for these ensemble members (not shown) reveals that the characteristics of snowmelt in the LNSEo forecast cause contributions to inflow during the rising limb of the freshet to fluctuate more rapidly than those from the MAEo forecast. This erratic behaviour could hamper the ability of the bias correction scheme to reduce errors during this period.

Based on this initial analysis, it is not immediately obvious whether the addition of the NSEo and, in particular, the LNSEo ensemble members to the M2M inflow forecasting system will improve any characteristics of ensemble performance. Figure 3.8 shows how the bias-corrected ensemble mean forecast performance changes as the NSEo members and LNSEo members are added to the MAEo-only ensemble evaluated in Chapter 2. Adding the NSEo ensemble members improves all measures of ensemble mean performance at all forecast lead times, while the additional inclusion of the LNSEo members diminishes these improvements. The ensemble configuration that includes all three model parameterizations is referred to as the full ensemble.

Figure 3.8: Performance of the bias-corrected M2M ensemble mean with MAEo ensemble members only, and with the addition of the NSEo and LNSEo ensemble members. Perfect inflow forecasts have DMB, NSE, LNSE and RMSESS equal to one (unitless), and MAE and RMSE of zero m3/s.

The Brier Skill Scores (BSS) shown in Figure 3.9 indicate that the addition of the NSEo members to the MAEo-only ensemble improves the BSS for the 19.5 m3/s inflow anomaly threshold on forecast days 1 and 2. Scores are shown for bias-corrected forecasts (indicated by bar heights) and also for raw forecasts (indicated by triangles).
Anomalies are calculated by subtracting the daily climatological median inflow from the forecast, and are used so that the forecasting system is not rewarded for making high inflow forecasts during the snowmelt season, when little skill is required to do so. Adding the LNSEo members offers little to no improvement to the BSS. Decomposition of the BSS into relative reliability and relative resolution components shows that this is because the full ensemble has poor (higher) reliability relative to the MAEo-plus-NSEo ensemble, but greater resolution. In fact, the addition of the NSEo members to the MAEo members results in almost no improvement to the day 2 forecast resolution, which is the most important attribute of an ensemble forecasting system (Toth et al., 2003). Reliability can be corrected by further post-processing to remove conditional bias, whereas resolution can only be corrected by improving the forecast "engine" used to generate the ensemble, for example, through unconditional bias correction. This is clearly indicated by the difference between raw and bias-corrected forecast scores in Figure 3.9. This result demonstrates that while the inclusion of more diverse ensemble members in the M2M ensemble is important for forecast quality, the forecast is only able to reach its full potential following bias correction.

Figure 3.9: Brier skill score (BSS = 1 is perfect), relative reliability (zero is perfect) and relative resolution (one is perfect) for different ensemble forecasts for days 1 and 2. Scores for bias-corrected forecasts are indicated by bar heights, while those for raw forecasts are indicated by triangles. The inflow anomaly threshold evaluated here is 19.5 m3/s.

Relative to the MAEo-only ensemble, inclusion of the NSEo and LNSEo ensemble members also results in improved forecast discrimination for a range of anomaly thresholds, as indicated by the ROC curves in Figure 3.10. Results are shown for both raw and bias-corrected (LDMB3) ensemble forecasts and clearly indicate the importance of bias correction of individual ensemble members prior to combination.

Figure 3.10: ROC diagrams for raw and LDMB3 bias-corrected day 1 forecasts for inflow anomalies greater than -5.0 m3/s (dot-dashed line), 2.7 m3/s (dashed line) and 19.5 m3/s (solid line). The dotted line is the zero-skill line.

Finally, the ensemble rank histograms shown in Figure 3.11 indicate that the full ensemble is, as intended, more dispersive than the MAEo-only ensemble on forecast days 1 and 2 as a result of its improved error sampling. The upper histograms show 48% and 33% of observations falling outside of the range of the day 1 and day 2 MAEo-only forecasts, respectively. These values drop to 26% and 16%, respectively, with the addition of the NSEo and LNSEo ensemble members. Thus, while the 72-member ensemble attempts to explicitly sample all sources of error in the hydrologic modelling chain, it still fails to capture the full range of forecast uncertainty. We expect that the limited sampling of hydrologic state uncertainty is the main cause of this underdispersion.
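For reference, the rank histogram underlying this diagnosis can be tallied as in the minimal sketch below (ours); ties between members and the observation are ignored for simplicity.

    import numpy as np

    def rank_histogram(ens, obs):
        """ens: (n_days, n_members) ensemble forecasts; obs: (n_days,) observations.
        Bins 0 and n_members count observations falling below or above the
        entire ensemble; a flat histogram indicates adequate dispersion."""
        n_members = ens.shape[1]
        ranks = (ens < obs[:, None]).sum(axis=1)    # members below each observation
        counts = np.bincount(ranks, minlength=n_members + 1)
        return counts / counts.sum()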
Dual state-parameter estimation methods could be employed for a more complete handling of parameter and initial condition uncertainty if additional observed data were available within the case study watershed (Moradkhani et al., 2005a,b; DeChant and Moradkhani, 2011b; Leisenring and Moradkhani, 2011), though such methods are computationally expensive.

Figure 3.11: Rank histograms for the bias-corrected MAEo-only and full ensembles. The full ensemble has greater dispersion, as indicated by a smaller percentage of observations falling into the extreme bins of the histogram.

Increased spread is also evident in the raw ensemble hydrograph traces shown in Figure 3.12 as compared with those in Figure 2.3. Ensemble members derived from the same hydrologic model parameterizations have a tendency to cluster together, supporting the finding in Chapter 2 that bias in the model simulation used to generate the daily hydrologic state is the primary contributor to overall forecast bias. This clustering is most visible near the peak of summertime snowmelt-driven inflows in July and August. The spread of the full ensemble is greatly reduced following LDMB3 bias correction (Figure 3.13), but remains larger than that of the MAEo-only ensemble (Figure 2.5).

Figure 3.12: Raw ensemble traces for day 1 (top) and day 2 (bottom) forecasts during the 2009–2010 water year for all hydrologic model parameterizations.

3.5 Concluding Remarks

In this chapter, we have evaluated the impact of incorporating a multi-parameter, multi-state component into a M2M ensemble forecasting system consisting of a multi-NWP, multi-hydrologic model component. The multi-parameter component was achieved by optimizing the WaSiM and WATFLOOD hydrologic models with different objective functions (MAE, NSE and NSE of log-transformed flows). The multi-state component is necessitated by the use of multiple parameterizations in order to avoid discontinuities in the inflow forecasts that could occur due to suddenly changing model parameters. To the best of our knowledge, this is the first example of a short-term hydrologic forecasting ensemble that explicitly attempts to sample all sources of hydrologic uncertainty.

Initial analysis of inflow forecast performance indicated that the addition of the LNSEo ensemble members had a negative impact on the performance of the ensemble mean relative to an ensemble mean comprised of MAEo and NSEo members alone. However, examination of distributions-oriented measures of forecast performance revealed that while the inclusion of the LNSEo ensemble members did result in a deterioration of ensemble reliability, it significantly improved the ensemble resolution.
Recall that reliability can easily be corrected using calibration methods (e.g., Hamill and Colucci, 1997; Nipen and Stull, 2011), whereas resolution can only be corrected by improving the forecasting "engine" used to generate the ensemble. Adding diversity to the M2M ensemble by explicitly attempting to sample the full range of forecast uncertainty is important in improving this most important aspect of ensemble quality. However, it was found that the removal of unconditional forecast bias offered significant additional gains, enabling the ensemble to reach its true potential.

Figure 3.13: As in Figure 3.12, but following LDMB3 bias correction.

Based on these results, future work on the M2M ensemble forecasting system will include bias-corrected ensemble members from all three model parameterizations. While the spread of the full ensemble is greater than the MAEo-only ensemble evaluated in a previous study (Chapter 2), it remains underdispersive. This will be improved upon by applying an appropriate uncertainty model to transform the ensemble into a reliable probabilistic forecast, and by performing probability calibration if and when necessary to further improve reliability.

Chapter 4

Reliable Probabilistic Forecasts from an Ensemble Reservoir Inflow Forecasting System

4.1 Introduction

Forecasts of weather and hydrologic variables are subject to uncertainty due to errors introduced into the modelling chain via imperfect initial and boundary conditions, poor model resolution of terrain and small-scale processes, and the necessary simplification of physical process representation in the models themselves (e.g., Palmer et al., 2005; Bourdin et al., 2012). Deterministic forecasts ignore these errors and may provide forecast users with a false impression of certainty. Probabilistic forecasts expressed as probability distributions are a way of quantifying this uncertainty by indicating the likelihood of occurrence of a range of forecast values. Additionally, reliable probabilistic inflow forecasts enable hydroelectric reservoir managers to set risk-based criteria for decision making and offer potential economic benefits (Krzysztofowicz, 2001).

Ensemble forecasting techniques are designed to sample the range of uncertainty in forecasts, but are often found to be unreliable, with underdispersiveness being a frequently cited deficiency in both weather and hydrologic forecasting applications (e.g., Eckel and Walters, 1998; Buizza, 1997; Wilson et al., 2007; Olsson and Lindström, 2008; Wood and Schaake, 2008). In order to correct these deficiencies, uncertainty models can be used to fit a probability distribution function (PDF) to the ensemble, whereby the parameters of the distribution are estimated based on statistical properties of the ensemble and the verifying observations.
These theoretical fitted distributions reduce the amount of data required to characterize the distribution (for example, from 72 ensemble members to two parameters describing the mean and spread of a Gaussian distribution), and allow estimation of probabilities for events that lie outside of the range of observed or modelled behaviour (Wilks, 2006).

Uncertainty models make different assumptions about how the ensemble members and observations are generated. For example, centering a Gaussian probability distribution on the ensemble mean with spread proportional to the ensemble variance makes the assumption that the ensemble mean forecast errors are normally distributed (or, equivalently, that the verifying observations are drawn from a normal distribution centred at the ensemble mean). This model also assumes the existence of a spread-skill relationship. That is, the spread of the ensemble members should be related to the accuracy (or skill) of the ensemble mean; when the forecast is more certain, as indicated by low ensemble spread, errors are expected to be small. However, this relationship is often tenuous (e.g., Hamill and Colucci, 1998; Stensrud et al., 1999; Grimit and Mass, 2002). If the uncertainty model assumptions are valid, the resulting probability forecasts should be statistically reliable or calibrated, meaning that an event forecasted to occur with probability p will, over the course of many such forecasts, be observed a fraction p of the time (Murphy, 1973). Otherwise, the probabilistic forecasts cannot be used for risk-based decision making, since the probabilities cannot be taken at face value.

Various methods of statistical calibration have been devised to correct for deficiencies in probabilistic forecasts. These can generally be split into two groups: ensemble calibration, which adjusts individual ensemble members in order to produce reliable forecasts; and probability calibration, which adjusts the probabilities directly. Examples of ensemble calibration include Bayesian Model Averaging (BMA; Raftery et al., 2005) and generalizations thereof (e.g., Johnson and Swinbank, 2009). The weighted ranks method (Hamill and Colucci, 1997) and its generalization, the Probability Integral Transform (PIT)-based calibration of Nipen and Stull (2011), are examples of probability calibration that have been shown to improve the reliability and value of forecasts of precipitation, temperature, wind speed, and other meteorological variables. Nipen and Stull (2011) also demonstrated that their method was able to further improve forecasts generated using BMA.

Bayesian methods have been applied successfully in hydrologic forecasting applications over a range of timescales (e.g., Duan et al., 2007; Reggiani et al., 2009; Wang et al., 2009; Parrish et al., 2012). Probability calibration, on the other hand, has not yet been widely adopted by the hydrologic modelling community. Olsson and Lindström (2008) provide an example of a very simple probability calibration used to improve ensemble spread. Roulin (2007) applied the weighted ranks method to medium-range forecasts of streamflow and found very little improvement to the already reliable forecasting system. Quantile mapping (QM) is a similar probability calibration technique, but is suited to seasonal hydrologic forecasting, as it maps forecast probabilities to their corresponding climatological values (Hashino et al., 2007; Madadgar et al., 2012).
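To make the distinction concrete, the sketch below relabels a raw cumulative probability using the empirical distribution of past PIT values, in the spirit of the PIT-based calibration of Nipen and Stull (2011); the function name and the plain empirical-CDF lookup are our own illustrative choices.

    import numpy as np

    def pit_calibrate(p_raw, past_pits):
        """Map a raw forecast probability to the fraction of past PIT values
        at or below it. If past forecasts were already calibrated, the PIT
        values are uniform and this mapping is close to the identity."""
        past = np.sort(np.asarray(past_pits))
        return np.searchsorted(past, p_raw, side="right") / past.size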
In this chapter, we apply two simple uncertainty models to a 72-member ensemble of bias-corrected reservoir inflow forecasts in order to generate probabilistic forecasts. We then test the application of a probability calibration scheme to improve reliability where necessary. All post-processing applied to the ensemble is done via the COmmunity Modular Post-processing System (COMPS). This system was originally described and implemented by Nipen (2012) and is now available as open source at http://wfrt.github.io/Comps/. We have contributed a number of new and existing schemes within the COMPS framework for bias correction, uncertainty modelling and "intelligent" calibration; a description of COMPS and the applied schemes follows.

4.2 Case Study

4.2.1 Study Dates and Data

In this study, various uncertainty models and probability calibration strategies are tested on a 72-member ensemble reservoir inflow forecasting system developed for the Daisy Lake reservoir, a hydroelectric facility on the upper Cheakamus River in southwestern British Columbia (BC), Canada. The reservoir is operated by the BC Hydro and Power Authority (BC Hydro). Evaluation of the ensemble is carried out over the 2010–2011 and 2011–2012 water years. For this particular hydroclimatic regime, a water year is defined as the period from October 1 to September 30. Fall and winter storm season inflows are primarily driven by precipitation from Pacific frontal systems. Rain-on-snow events can result in significant inflows during this period. During the spring and summer, inflows are snowmelt-driven, with some late-season glacier melt contributions.

Daily average inflow rates are calculated by BC Hydro using a water balance based on observed reservoir levels and outflows. The calculated inflows employed in this study are considered to be of high quality. For the purposes of this study, these values will be referred to as observed inflows. Hourly forecasts of inflows to the Daisy Lake reservoir are transformed into daily average inflow rates for verification against these observations.

Ensemble and probability forecasts were generated continuously from the beginning of the 2009–2010 water year through the end of the study period. The first water year (2009–2010) was used to spin up the COMPS model parameters (described in Section 4.2.3 and Section 4.3.1) and is excluded from evaluation.

4.2.2 The Member-to-Member (M2M) Ensemble Forecasting System

The Member-to-Member (M2M) ensemble forecasting system used for forecasting inflows to the Daisy Lake reservoir explicitly samples uncertainty arising from errors in the Numerical Weather Prediction (NWP) input fields used to drive the Distributed Hydrologic (DH) models, the hydrologic models themselves and their parameterizations, and the hydrologic states or initial conditions used to begin each daily forecast run.
The result is an ensemble of 72 unique daily inflow forecasts.

The NWP models are taken from the operational ensemble suite run by the Geophysical Disaster Computational Fluid Dynamics Centre (GDCFDC), in the Department of Earth, Ocean and Atmospheric Sciences at the University of British Columbia. The ensemble consists of three independent nested limited-area high-resolution mesoscale models with forecast domains centred over southwestern BC: the Mesoscale Compressible Community model (MC2; Benoit et al., 1997); the fifth-generation Pennsylvania State University-National Center for Atmospheric Research Mesoscale Model (MM5; Grell et al., 1994); and Version 3 of the Weather Research and Forecasting (WRF) model (Skamarock et al., 2008). Hourly model output fields with grid spacings of 12, 4 and 1.3 km are used for this study.

From the start of the modelling period (October 2009) through March 2012, all NWP models were initialized at 00 UTC using the National Centers for Environmental Prediction (NCEP) North American Mesoscale (NAM) model, which also provides time-varying boundary conditions. In March 2012, the initial/boundary condition for the MM5 and WRF was switched to the NCEP Global Forecast System (GFS) model, while MC2 continued to make use of the NAM.

The NWP (and therefore inflow) forecast horizon varies during the case-study period. From the start of the modelling period through April 2010, all NWP models were run out to 60 hours except for the 1.3-km MC2 model runs, which are limited to 39 hours due to operational time constraints. All WRF grids began producing 84-hour forecasts in late April 2010. In March 2011, the MM5 12-km and 4-km forecasts were extended to 84 hours, enabling them to drive a 3-day inflow forecast. In March 2012, 1.3-km MM5 model output was also made available out to 84 hours.

The Distributed Hydrologic (DH) models applied to the case-study watershed are the Water balance Simulation Model (WaSiM; Schulla, 2012) and WATFLOOD (Kouwen, 2010). These models were selected because they are distributed, and therefore able to take advantage of high-resolution NWP input. They are also able to simulate snow and glacier melt processes and lakes in complex terrain given relatively limited input data. Both DH models are run at 1 km grid spacing at an hourly time step. The required NWP fields are downscaled to the DH model grids using interpolation schemes built into each DH model. For the WaSiM model, 12-km NWP fields (temperature, precipitation, wind speed, humidity, and global radiation) are downscaled using two methods: inverse-distance weighting (IDW) and elevation-dependent regression (Schulla, 2012). The 4-km and 1.3-km NWP fields are downscaled using a bilinear interpolation scheme. WATFLOOD downscaling is done using IDW that incorporates elevation dependence using an optional constant elevation adjustment rate for both temperature and precipitation (these being the only required NWP fields). The 12-km fields are downscaled using IDW with two different elevation adjustments, while the 4- and 1.3-km fields are downscaled without elevation adjustment.

Both WaSiM and WATFLOOD model parameters have been optimized using the Dynamically Dimensioned Search (DDS) algorithm (Tolson and Shoemaker, 2007; Graeff et al., 2012; Francke, 2012).
Optimization of each model was done using three different objective functions: the mean absolute error (MAE) of simulated inflow, to minimize overall errors; the Nash-Sutcliffe Efficiency (NSE; Nash and Sutcliffe, 1970) of inflow, to emphasize performance during high-flow events; and the NSE of log-transformed flows, to optimize performance during low-flow periods. These different parameterizations attempt to sample the uncertainty in the hydrologic models' parameter values. Simulations during the ten-year optimization period (1997–2007) were driven by observed meteorological conditions at several weather stations within the case-study watershed and surrounding area (Figure 2.1).

The multi-state or multi-initial-condition component of the M2M ensemble forecasting system arises as a direct consequence of implementing a multi-parameter component. In forecast mode, the hydrologic state for each model and each model parameterization is updated at the start of the forecast day by driving the model with observed meteorological data. The resulting simulated state is used as the initial condition for the day's forecast run. In order to avoid discontinuities early in the daily forecast cycle, the parameter set used in updating the hydrologic state must match that used in the forecast. Thus, each parameter set has its own hydrologic state for each model, resulting in the creation of six different hydrologic states each day. While each hydrologic model/parameterization is initialized from a deterministic hydrologic state, these initial conditions still provide a small sampling of the hydrologic state uncertainty space. Figure 4.1 illustrates the update/forecast process for a particular parameterization of the WaSiM model. The forecast workflow is indicated by the solid arrows. Dashed arrows illustrate how meteorological observations are used to update the model configuration's hydrologic state for the following day's forecasts. This process is repeated for each watershed model (WaSiM and WATFLOOD) and each parameterization/state, yielding 72 unique inflow forecasts each day.

During the 731-day evaluation period, ensemble forecasts were issued every day for forecast days 1 and 2, while day 3 forecasts were issued on 729 days. Due to NWP model failures, the size of the ensemble forecast issued each day is variable: the day 1 forecasts consisted of a full 72-member ensemble on 456 days, while the day 2 forecasts were complete on 446 days. In the majority of cases, the number of missing day 1 and 2 ensemble members was small (3–6 missing members). The smallest ensemble size for forecast days 1 and 2 during the case-study period was 39 members and occurred on 2 forecast days. Day 3 NWP failures are more common, as model instabilities can result in shortened forecast lead times. A full day 3 ensemble forecast was issued on 684 of the 731 case-study days. There were 12 forecast days when the day 3 ensemble was less than half of its intended size.
Probabilistic forecasts are issued regardless of ensemble size.

Figure 4.1: The flow of information into and out of the WaSiM model for generating forecasts with the MAE-optimized parameter set. The forecast workflow is indicated by the solid arrows. Dashed arrows illustrate how meteorological observations are used to update the model configuration's hydrologic state for the following day's forecasts. The model configuration is specified by the dash-dotted arrows.

4.2.3 A COmmunity Modular Post-processing System (COMPS)

The COmmunity Modular Post-processing System (COMPS) breaks down the process of generating calibrated probabilistic forecasts into a series of steps referred to as components. As implemented by Nipen (2012), COMPS contains components for bias correction, uncertainty modelling, probability calibration, forecast updating (not applied in this study), and verification. The input to the system is a set of predictors: ensemble forecasts of, for example, weather or hydrologic variables at a specific geographical location. The COMPS user selects the schemes to implement for each desired component, creating a specific configuration. COMPS can also be used to generate post-processed deterministic forecasts by bypassing the uncertainty and calibration components (bypass schemes exist for each component).

Each component scheme relies on model parameters that evolve over time. Consider, for example, a simple degree-of-mass-balance [DMB; Eq. (A.1)] bias correction scheme. DMB values less than one indicate that inflows are underforecast, while DMB greater than one indicates an overforecasting bias. Using a moving-window approach, each day the previous N days of forecast-observation pairs are retrieved in order to calculate the bias correction factor DMB_N. Today's bias-corrected forecast is generated by dividing the raw forecast by this value. COMPS maintains computational efficiency by requiring the parameters of its various component models to be computed adaptively rather than over a moving window. Thus, only the last estimate of the parameter value must be retrieved each day, along with the new forecast-observation pair to update the parameter for the next forecast cycle.

Let an ensemble of K raw inflow forecasts be denoted as \xi_{t,k}, where t is a particular time and k is an index between 1 and K. The verifying observation at time t is x_t. An adaptive calculation of the DMB correction factor for ensemble member k is then given by:

DMB_{t+1,k} = \frac{\tau - 1}{\tau} \, DMB_{t,k} + \frac{1}{\tau} \, \frac{\xi_{t,k}}{x_t}    (4.1)

where \tau is a unitless time scale that describes how quickly the impact of new information (\xi_{t,k}/x_t) diminishes over time. Recent information is weighted more heavily; older information (DMB_{t,k}) is never forgotten by the adaptive scheme but becomes less important with time. While \tau is necessarily unitless, for a daily adaptive update it can be interpreted as an e-folding time in days.
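In code, one step of this adaptive update is a single line (the function name is ours):

    def update_dmb(dmb_prev, fcst, obs, tau=3.0):
        """One adaptive update of a member's DMB factor, following Eq. (4.1).
        tau acts like an e-folding time in days for a daily update cycle."""
        return (tau - 1.0) / tau * dmb_prev + (1.0 / tau) * (fcst / obs)

    # Today's bias-corrected member forecast is then raw_fcst / dmb, the
    # multiplicative correction described above.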
We have implemented this scheme in the COMPS framework; results of testing a range of dimensionless time scales (\tau) against the moving-window DMB calculation described in Chapter 2 are given in Appendix B. An adaptive DMB bias corrector with \tau = 3.0 was found to be effective at removing bias for forecast horizons of 1–3 days. Inflow forecast bias is strongly controlled by bias in the hydrologic states from which each day's forecast is begun. This, coupled with the flashy, mountainous nature of the study watershed, suggests that only very recent errors are likely to aid in bias correction, and explains why such a short e-folding time is so effective. Other components of the COMPS system used in this study are described in Sections 4.3.1 and 4.3.3.

4.3 From Ensembles to Calibrated Probability Forecasts

The first step in generating a probabilistic forecast of inflows to the Daisy Lake reservoir from the M2M ensemble is choosing a suitable uncertainty model. As indicated by the rank histograms in Figure 4.2, the M2M ensemble is underdispersive. That is, observations often fall outside of the predicted range of inflows (interpretation of rank histograms is described in Appendix A). This result implies that in spite of the M2M ensemble system explicitly attempting to sample all sources of error in the modelling chain, the amount of uncertainty captured by it is often inadequate. This is a common problem in both weather and hydrologic ensembles (e.g., Eckel and Walters, 1998; Buizza, 1997; Wilson et al., 2007; Olsson and Lindström, 2008; Wood and Schaake, 2008). We suspect that a more complete handling of parameter and, in particular, initial condition uncertainty would improve this characteristic of the M2M ensemble. Dual state-parameter estimation frameworks that incorporate data assimilation could be used to accomplish this goal, though the paucity of observed hydrologic state data within the case study watershed confounds their use, and such methods are computationally demanding (Moradkhani et al., 2005a; DeChant and Moradkhani, 2011b; Leisenring and Moradkhani, 2011).

Figure 4.2: Rank histograms for the M2M ensemble forecasts at lead times of 1–3 days. The ensemble forecasting system is underdispersive for all forecast horizons, as indicated by the large percentage of observations that fall outside the range of the ensemble.

In order to correct this deficiency, uncertainty models can be used to fit a probability distribution function (PDF) to the ensemble, whereby the parameters describing the spread of the distribution are estimated based on statistical properties of the ensemble and the verifying observations. In this way, it is possible to implicitly account for any uncertainty that is neglected or underestimated by the ensemble. The shape of the PDF fitted to the ensemble should correspond to the shape of the empirical distribution of the bias-corrected M2M ensemble mean forecast errors (because we plan to centre the distribution on the bias-corrected M2M mean).
Hydrologic variables and their errors are often described as being non-normally distributed, and are therefore transformed into a space in which the errors become normally distributed, so that the transformed variable can be modelled using a simple Gaussian PDF (e.g., Duan et al., 2007; Reggiani et al., 2009; Wang et al., 2009). The log-normal distribution, which amounts to fitting a Gaussian distribution to log-transformed data, has a long history of use in hydrology, and is still popular today (e.g., Chow, 1954; Stedinger, 1980; Lewis et al., 2000; Steinschneider and Brown, 2011). This distribution is particularly well-suited to streamflow and inflow forecasting, as it only assigns probabilities to positive forecast values.

Observed daily inflows at Daisy Lake exhibit a bimodal distribution, with storm season flows forming a skewed distribution at low flow values, and warm season flows forming a second peak at higher flows. For this reason, forecast errors are analyzed by season. Figure 4.3 shows the distribution of M2M ensemble mean forecast errors during the 2009–2010 storm season (October through April), warm season (May through September) and full water year, before and after log transformation. The full 72-member bias-corrected ensemble described in Section 4.2.2 has been used to calculate the ensemble mean. Log-transformed errors are calculated by taking the natural logarithm of the forecasts and the observations prior to calculating the error (observed minus forecast). A Gaussian distribution is plotted on each empirical distribution, centred over the mean forecast error with the standard deviation given by that of the errors. Despite the small sample size (358 forecast-observation pairs for the full water year, and 206 and 152 for the storm season and warm season, respectively), we can draw some useful conclusions about the distribution of M2M ensemble forecast errors. The non-transformed forecast errors comprise a slightly positively skewed distribution, with the mean of the errors consistently greater than the median. The error distributions during the storm season are characterized by high peaks and long, narrow tails, and are therefore not well modelled by the normal distribution. Day 2 warm season errors do not exhibit skewness and appear to be well modelled by the superposed normal distribution. The log-transformed errors are likewise more normally distributed, with much smoother peaks than their raw counterparts.

Based on these results and on the above-cited literature, we will test the performance of two different uncertainty models for producing reliable inflow forecasts for Daisy Lake: a log-normal uncertainty model is expected to perform well during the storm season, while the Gaussian shape of warm season forecast errors suggests that a non-transformed normal PDF may produce calibrated probability forecasts during this time. The spread of these distributions should be related to forecast skill; when the forecast is less skillful, the uncertainty (as represented by the spread of the PDF) is greater.
Based on these results and on the above-cited literature, we will test the performance of two different uncertainty models for producing reliable inflow forecasts for Daisy Lake: a log-normal uncertainty model is expected to perform well during the storm season, while the Gaussian shape of warm season forecast errors suggests that a non-transformed normal PDF may produce calibrated probability forecasts during this time. The spread of these distributions should be related to forecast skill; when the forecast is less skillful, the uncertainty (as represented by the spread of the PDF) is greater. Since the M2M ensemble is underdispersive (Figure 4.2), we expect that the distributional spread will be best represented by a combination of ensemble spread and information regarding recent errors (Nipen and Stull, 2011; Gneiting et al., 2005).

4.3.1 Uncertainty Modelling in the COMPS Framework

COMPS includes an uncertainty model scheme in which a Gaussian distribution N is fitted to the ensemble using the Ensemble Model Output Statistics (EMOS) method of Gneiting et al. (2005). This uncertainty model assumes that the forecast errors are normally distributed. Incorporating the DMB bias correction scheme, which is applied individually to each of the K ensemble member forecasts (φ_{t,k}) at time t, the EMOS forecast PDF (f_t) is given by:

f_t \sim N\!\left( \frac{1}{K} \sum_{k=1}^{K} \frac{\phi_{t,k}}{\mathrm{DMB}_{t,k}}, \; a_T s_t^2 + b_T \right), \qquad (4.2)

where s_t^2 is the ensemble variance. The first parameter of the Gaussian distribution is the bias-corrected ensemble mean, while the second represents the spread of the distribution and is determined by a least-squares linear regression fit to the variance of the ensemble. The regression parameters a_T and b_T are determined based on past values of the squared error of the bias-corrected ensemble mean during a training period (i.e., they describe the ensemble spread-skill relationship). Users of COMPS can also choose to have this scheme find a linear relationship between the forecast error and the mean of the ensemble, which has been shown to be a good predictor of error for precipitation (Hamill and Colucci, 1998).

We have modified the Gaussian EMOS scheme in COMPS to be able to fit a normal distribution to log-transformed data. This uncertainty model, which assumes forecast errors to be log-normally distributed, will be used to transform the M2M ensemble into a probabilistic forecast, and will be referred to as log-EMOS. We will also test this scheme without log transformation; we expect this uncertainty model (which we refer to simply as EMOS) to produce calibrated forecasts during the warm season, when forecast errors exhibit a normal distribution. It is possible for the non-transformed uncertainty model to assign positive probabilities to negative inflow rates, which makes this model unsuitable for prediction during low-flow periods; this is not a concern during the warm season, when snowmelt-driven inflows are relatively high.
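A minimal sketch of this uncertainty model is given below. It combines the Gaussian fit of Eq. (4.2) with an exponentially weighted least-squares refit of the spread-skill regression, mimicking the adaptive parameter updating described just below; the class name and the moment-based regression bookkeeping are our own illustration, not the COMPS implementation.

```python
import numpy as np

class AdaptiveEMOS:
    """Sketch of the Gaussian EMOS model of Eq. (4.2), with the regression
    var = a * s^2 + b refit from exponentially weighted running moments."""

    def __init__(self, tau=30.0):
        self.tau = tau
        # running moments of (ensemble variance, squared mean error) pairs;
        # starting values are arbitrary spin-up placeholders
        self.m = {"x": 1.0, "y": 1.0, "xx": 2.0, "xy": 1.0}

    def _blend(self, key, value):
        w = 1.0 / self.tau
        self.m[key] = (1.0 - w) * self.m[key] + w * value

    def update(self, ens_var, sq_err):
        """Fold in one day's (ensemble variance, squared mean error) pair."""
        for key, val in (("x", ens_var), ("y", sq_err),
                         ("xx", ens_var ** 2), ("xy", ens_var * sq_err)):
            self._blend(key, val)

    def spread_params(self):
        """Least-squares slope a and intercept b from the running moments."""
        var_x = self.m["xx"] - self.m["x"] ** 2
        cov_xy = self.m["xy"] - self.m["x"] * self.m["y"]
        a = cov_xy / var_x if var_x > 0 else 0.0
        return a, self.m["y"] - a * self.m["x"]

    def forecast(self, corrected_members):
        """Gaussian (mu, sigma) from the bias-corrected ensemble members."""
        mu = float(np.mean(corrected_members))
        a, b = self.spread_params()
        sigma2 = max(a * float(np.var(corrected_members)) + b, 1e-6)
        return mu, sigma2 ** 0.5
```

For the log-EMOS variant, the same machinery would be applied after log-transforming the members and observations.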
In both the EMOS and log-EMOS schemes, the regression parameters in Eq. (4.2) are updated adaptively using a dimensionless time scale of τ = 30. Nipen (2012) found this to be a suitable training period for various meteorological variables. While short training periods allow the uncertainty model to adapt quickly to changes in forecast regime or ensemble configuration, longer periods allow for a more robust estimation of the parameters. Gneiting et al. (2005) similarly found a moving window of 40 days to be a suitable compromise between these competing criteria.

An adaptive updating scheme was also implemented in COMPS for computing the weights in Bayesian Model Averaging (BMA), which can be used to produce calibrated probabilistic forecasts (Raftery et al., 2005). For various reasons, the method was found to be unsuitable for application to the M2M ensemble; results of testing the method are given in Appendix C.

4.3.2 Metrics of Probabilistic Forecast Quality

So long as the assumptions made by the uncertainty model hold true, it will produce calibrated probability forecasts. Probabilistic calibration, or reliability (Murphy, 1973), is a measure of consistency between forecast probabilities and the frequency of occurrence of observed values. That is, events forecast with probability p should, over the course of many such forecasts, be observed to occur a fraction p of the time. This property is evaluated by visualizing the distribution of Probability Integral Transform (PIT) values (Gneiting et al., 2007) in a PIT histogram, which, for perfectly calibrated forecasts, should be approximately flat. PIT values are given by:

P_t = F_t(x_t), \qquad (4.3)

where x_t is the verifying observation at time t, and F_t is the corresponding forecast cumulative distribution function (CDF). The forecast CDF of variable x at time t is given by:

F_t(x) = \int_{-\infty}^{x} f_t(x') \, dx'. \qquad (4.4)

The calibration deviation metric D of Nipen and Stull (2011) provides a more objective measure of calibration (Appendix A). While calibration is a desirable characteristic of probabilistic forecasts, it is not an adequate measure of the usefulness of a forecast. Consider, for example, an uncertainty model that always issues a climatological forecast (i.e., the forecast PDF is always taken as the distribution of the climatological record). Assuming stationarity, such a forecasting system would be perfectly calibrated, but far too vague for decision making. Therefore, we also require our forecast PDFs to concentrate probability in the correct area (i.e., near the verifying observation) on each day. This property can be measured by the ignorance score (Roulston and Smith, 2002). We also employ the Continuous Ranked Probability Score (CRPS), which addresses both calibration and sharpness (Gneiting et al., 2005, 2007). A description of the verification metrics used in this chapter and their interpretation is given in Appendix A.
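For a Gaussian forecast, these diagnostics reduce to a few lines of code. The sketch below computes PIT values [Eq. (4.3)] and a root-mean-square calibration deviation in the spirit of Nipen and Stull (2011); the exact normalizations live in Eqs. (A.8)-(A.9) of Appendix A, which are not reproduced here, so treat the constants below as assumptions.

```python
import numpy as np
from scipy.stats import norm

def pit_values(mu, sigma, obs):
    """Eq. (4.3): the PIT value is the forecast CDF at the observation."""
    return norm.cdf(obs, loc=mu, scale=sigma)

def calibration_deviation(pit, nbins=10):
    """RMS deviation of PIT histogram bin frequencies from flatness
    (paraphrasing the D metric of Nipen and Stull, 2011)."""
    freq, _ = np.histogram(pit, bins=nbins, range=(0.0, 1.0))
    freq = freq / pit.size
    return float(np.sqrt(np.mean((freq - 1.0 / nbins) ** 2)))

def expected_deviation(n, nbins=10):
    """Deviation E[Dp] expected from sampling error alone for a
    perfectly calibrated forecast over n cases (assumed form)."""
    return float(np.sqrt((1.0 - 1.0 / nbins) / (n * nbins)))
```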
4.3.3 Probability Calibration Method

We have carefully selected candidate uncertainty models for the M2M ensemble forecasts of inflows to the Daisy Lake reservoir based on the characteristics of the forecast errors; namely, by ensuring that the uncertainty models' assumptions (regarding how the ensemble and verifying observations are realized) hold true at certain times of the year. During these times, the uncertainty model should be able to produce calibrated forecasts. However, at other times during the water year, or as evaluated over shorter time periods, these assumptions may be false, resulting in poorly calibrated forecasts. It is during these times that probability calibration can offer improvements to the probabilistic forecasting system.

The PIT-based probability calibration scheme implemented within COMPS is that described by Nipen and Stull (2011), with necessary modifications for adaptive parameter calculation (Nipen, 2012). Recall that a necessary condition for reliability is a flat PIT histogram. This is equivalent to requiring the cumulative distribution of PIT values to lie along the 1:1 line of PIT values vs. observed relative frequencies. By constructing an empirical cumulative distribution of PIT values accumulated over a moving window of time points T, we can derive the PIT-based calibration function as:

\Phi_T(p) = \frac{1}{\lVert T \rVert} \sum_{t \in T} H\bigl(p - F_t(x_t)\bigr), \qquad (4.5)

where the PIT value F_t(x_t) is the forecast CDF value at the verifying observation x_t at a time t in the training set T, p is a probability value between 0 and 1, and H is the Heaviside function given in Eq. (A.17).

The probability-calibrated CDF is then calculated by:

\hat{F}_t(x) = \Phi_T(F_t(x)), \qquad (4.6)

which amounts to a relabelling of CDF values F_t(x) to form a new distribution \hat{F}_t(x). The corrected forecast PDF (\hat{f}_t) can be calculated by combining Eq. (4.4) and Eq. (4.6) and invoking the chain rule, yielding:

\hat{f}_t(x) = \phi_T(F_t(x)) \, f_t(x), \qquad (4.7)

where φ_T(p) is defined as the derivative of the calibration function Φ_T(p) with respect to p, and serves as an amplification function for the raw PDF f_t(x).

The calibration curve Φ_T(p) is generated by dividing the p interval [0,1] into any number of bins. Unless otherwise stated, we will use 10 bins, and the individual PIT values along the calibration curve will be updated with a time scale of τ = 90. Note that using more bins requires a longer training period T in order to reduce the curve's sensitivity to sampling errors. Nipen and Stull (2011) found that the calibrator required on the order of 100 data points for optimal results (i.e., to balance the competing objectives of reducing sampling error and using recent data for training the calibrator). Using fewer bins would reduce sampling error, but the calibration curve would be very coarse. Excluding the (constant) end points (0,0) and (1,1) of the calibration curve, these ten bins are defined by nine interior "smoothing points" (p, Φ_p). Modifying Eq. (4.5) for adaptive updating of these points yields:

\Phi_{p,t+1} = \frac{\tau - 1}{\tau} \Phi_{p,t} + \frac{1}{\tau} H\bigl(p - F_t(x_t)\bigr) \qquad (4.8)

(Nipen, 2012).

The PIT-based calibration scheme as implemented in COMPS uses a monotonically increasing cubic spline to create a smooth Φ_T(p) curve with a continuous derivative connecting the smoothing points. This allows it to generate a smoothly varying adjusted PDF for calculating the ignorance score.
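A compact sketch of this adaptive calibrator follows. It maintains the nine smoothing points via Eq. (4.8) and relabels raw CDF values as in Eq. (4.6); linear interpolation stands in for the monotone cubic spline used by COMPS, and the class name is illustrative.

```python
import numpy as np

class PITCalibrator:
    """Sketch of the adaptive PIT-based calibration of Eqs. (4.5)-(4.8)."""

    def __init__(self, tau=90.0, n_points=9):
        self.tau = tau
        self.p = np.linspace(0.1, 0.9, n_points)  # interior points of 10 bins
        self.phi = self.p.copy()                  # start on the 1:1 line

    def update(self, pit):
        """Eq. (4.8): fold one new PIT value into every smoothing point."""
        h = (self.p >= pit).astype(float)         # Heaviside H(p - PIT)
        self.phi = (self.tau - 1.0) / self.tau * self.phi + h / self.tau

    def calibrate_cdf(self, raw_cdf):
        """Eq. (4.6): relabel raw CDF values along the calibration curve.
        (COMPS enforces monotonicity with a cubic spline; np.interp does
        not, so this is only a rough stand-in.)"""
        xs = np.concatenate(([0.0], self.p, [1.0]))
        ys = np.concatenate(([0.0], self.phi, [1.0]))
        return np.interp(raw_cdf, xs, ys)
```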
An important finding of Nipen and Stull (2011) is that applying the calibration during periods when the uncertainty model already produces calibrated forecasts actually degrades reliability. This is due to sampling errors in the PIT histogram used to generate the calibration curve (Brocker and Smith, 2007; Pinson et al., 2010). In such cases, the probability forecast is best left unadjusted. An option for "intelligent calibration" (inteliCal) has therefore been added to the COMPS PIT-based calibration scheme. InteliCal determines whether or not the calibration should be applied by comparing the calibration deviation metric D [Eq. (A.8)] to the calibration deviation expected of perfectly calibrated forecasts (caused by sampling error), given by E[D_p] [Eq. (A.9)] (Nipen and Stull, 2011). When D is sufficiently greater than E[D_p], we anticipate that the forecast will benefit from calibration.

The calibration deviation metric D in Eq. (A.8) is calculated using bin counts computed over a moving window of length ‖T‖. In the adaptive updating framework of COMPS, the only change to the formulation of D is that the bin frequencies b_i/‖T‖ are updated adaptively. We also replace ‖T‖ in Eq. (A.9) with the dimensionless time scale τ. To correct for scaling differences that occur when the moving-window update is replaced with the adaptive updating scheme, a factor of 1.5 is required when comparing D and E[D_p]. This conversion factor was determined through trial and error by comparing D computed using a moving window to that computed adaptively over the range 60 ≤ τ ≤ 250. InteliCal then applies the PIT-based calibration only when:

1.5\,D > ICF \cdot E[D_p]. \qquad (4.9)

The inteliCal adjustment factor (ICF) allows COMPS users to adjust the sensitivity of inteliCal if necessary. The default ICF is 1.0, but as the performance of the inteliCal scheme has not yet been thoroughly tested, the ideal ICF is not known. In this study, we attempt to determine a suitable value for the ICF, though results may be case-specific.
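The decision rule itself is a one-line test; a sketch (function name ours) is:

```python
def apply_pit_calibration(D, E_Dp, icf=1.0):
    """Eq. (4.9): inteliCal's decision rule. The adaptively computed
    calibration deviation D is scaled by 1.5 to match the moving-window
    scale, and calibration is applied only when the result sufficiently
    exceeds the sampling-error baseline E[Dp]."""
    return 1.5 * D > icf * E_Dp
```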
4.4 Results and Discussion

4.4.1 Performance of the Uncertainty Models

Figure 4.4 shows the uncalibrated PIT histograms for the 3-day EMOS uncertainty model forecasts made during the case-study storm seasons (first row), the warm seasons (second row), and for the full water years (bottom row). Calibration deviation (D) is shown on each histogram, with the expected deviation for a perfectly calibrated forecast (E[D_p]) also shown on the day 1 plots (E[D_p] does not change with lead time, as it is only a function of the number of bins in the PIT histogram and the sample size). This figure clearly illustrates how selecting an inappropriate uncertainty model can yield highly uncalibrated results.

Figure 4.4: PIT histograms for the storm seasons (top row), warm seasons (middle row), and full water years (bottom row), pooled over the 2010–2011 and 2011–2012 water years. Results are for the uncalibrated EMOS uncertainty model. Calibration deviations D are shown for each histogram, with E[D_p] for comparison. Flatter histograms, and therefore lower D, are preferred.

The PIT histograms for the storm season show that the EMOS uncertainty model does not concentrate enough probability density at the centre of the distribution. This is readily anticipated given the empirical storm season error distribution in Figure 4.3. During the warm season, which exhibits a more normal distribution of errors, the EMOS uncertainty model is able to produce nearly calibrated probabilistic forecasts. The importance of specifying a time period over which calibration is measured is evident in the PIT histograms for the full water year, which mask the excellent calibration during the warm season.

We have applied the log-EMOS uncertainty model in two different configurations: using ensemble variance as a predictor of the Gaussian spread (log-EMOSv), and using the ensemble mean as a predictor of this spread (log-EMOSm). Precipitation uncertainty has been found to be better explained by the ensemble mean than by measures of ensemble spread (Hamill and Colucci, 1998). Since reservoir inflows are so strongly influenced by precipitation, it was anticipated that storm season inflow uncertainty would likewise be better represented by the ensemble mean.

Figure 4.5 shows uncalibrated PIT histograms for forecast days 1 through 3, broken up by season, for log-EMOSv forecasts. This uncertainty model is, as expected, superior to the EMOS model during the storm season, when errors are log-normally distributed (Figure 4.3), but produces slightly less calibrated forecasts during the warm season. Using the log-EMOSm uncertainty model during the storm season results in more observations falling in the tails of the forecast distribution (Figure 4.6). This may be caused by the behaviour of inflows to the Daisy Lake reservoir during the fall and winter. During the storm season, observed and forecasted ensemble mean inflows can be quite low for several days or even weeks (due to dry weather patterns, or precipitation falling as snow), and can then increase very suddenly when a rain or rain-on-snow event occurs. The ensemble mean-as-spread uncertainty model will have difficulty training for these sudden changes. Also, ensemble mean forecast misses and false alarms will result in the distribution having spread completely unrelated to forecast skill.

The superior performance (relative to EMOS forecasts) of the log-EMOSv forecasts during the storm season is also reflected in this model's ignorance and continuous ranked probability scores (Figure 4.7). Ignorance scores for the EMOS model during the storm season are significantly worse due to the unsuitability of this model during periods when forecast errors are not normally distributed. Conversely, the EMOS model has slightly better ignorance scores during the warm season for days 1 and 2. The fact that these ignorance scores are still higher than those for the EMOS model during the storm season appears to be caused by the uncertainty model generating warm season forecast PDFs with large spread.
The bias-corrected ensemble members themselves exhibit large spread, suggesting that this is a failure of both the M2M ensemble forecasting system and of the regression-based EMOS model as the forecast error characteristics change between seasons. The CRPS values in Figure 4.7 clearly show that the log-EMOSv model performs best during the storm season. The EMOS forecasts have the best day 1 CRPS, but log-EMOSv is better at longer lead times due to these forecasts having better sharpness.

Figure 4.5: PIT histograms for the storm seasons (top row), warm seasons (middle row), and full water years (bottom row), pooled over the 2010–2011 and 2011–2012 water years. Results are for the uncalibrated log-EMOSv uncertainty model. Calibration deviations D are shown for each histogram, with E[D_p] for comparison.

Figure 4.6: PIT histograms for all forecast horizons during the 2010–2011 and 2011–2012 storm seasons using the uncalibrated log-EMOSm uncertainty model.

Figure 4.7: Ignorance and continuous ranked probability scores (CRPS) for the various uncertainty models tested. Forecasts are divided into storm season (solid lines) and warm season (dashed lines) for scoring, as each uncertainty model has different calibration characteristics during these times of year. Smaller ignorance scores and CRPS are preferred.

4.4.2 Effect of Probability Calibration

Figure 4.8 illustrates the power of the PIT-based probability calibration scheme (with the default ICF of 1.0). The method is able to correct for the EMOS uncertainty model's failed assumption of Gaussian forecast errors during the storm season. However, the dimensionless time scale of τ = 90 applied here results in the calibration adjustments necessary during the storm season being propagated into the already well-calibrated warm season. Namely, the calibration has successfully adjusted the storm season predictive distributions to have higher peaks and thicker tails, but has carried this adjustment into the warm season, as indicated by these distributions now being underdispersive with too much probability density at the centre of the distribution. This problem does not occur during the transition from the warm season to the storm season.
This is because, by the end of the already well-calibrated warm season, the calibration deviation as measured by the adaptive scheme is very small, and inteliCal does not apply any calibration to the forecasts.

Figure 4.8: PIT histograms for EMOS uncertainty model forecasts as in Figure 4.4, but following PIT-based probability calibration with nine smoothing points and τ = 90.

The PIT histograms for probability-calibrated log-EMOSv forecasts (with an ICF of 1.0) are shown in Figure 4.9. The deterioration in warm season calibration deviation may again be caused by the lengthy learning period of the scheme and the fact that the raw storm season and warm season PIT histograms exhibit different distributional biases; there is a slight underforecasting bias during the former period, and a tendency to overforecast in the latter. A more likely explanation is that the inteliCal scheme is applying the calibration correction too often, resulting in the introduction of additional calibration deviation caused by sampling error. The raw log-EMOSv calibration deviations D are not significantly different from E[D_p]; these forecasts (particularly during the storm season) may therefore be considered calibrated to within sampling error.

While storm season calibration is improved by the PIT-based calibration scheme in both uncertainty models, the ignorance scores indicate that calibration is somehow shifting the highest concentration of probability in the forecast PDF away from the verifying observation (Figure 4.10). Examination of forecast CDFs on a handful of forecast days (not shown) reveals that, while calibration of the EMOS uncertainty model can yield excellent results with respect to calibration deviation during the storm season, it does so by shifting the forecast PDF such that the verifying observation falls nearer to the tails of the distribution. Warm season EMOS ignorance scores are only worse during the (lengthy) period when the calibrator is adjusting to the new regime. The opposite is true for the log-EMOSv model, where the ignorance scores plotted in Figure 4.10 are similar or better for the probability-calibrated forecasts during the storm season, but are consistently higher during the warm season, where the calibration is making the most adjustments. Calibration of the EMOS forecasts results in improved CRPS during the storm season because these forecasts have more probability mass near the centre of the distribution and are therefore sharper. EMOS warm season scores deteriorate as a result of the introduction of significant calibration deviation. The increased calibration deviation of log-EMOSv forecasts during the warm season contributes to the increased CRPS during this season.
During the storm season, CRPS for this model does not change significantly after application of the PIT-based calibrator.

Figure 4.9: PIT histograms for log-EMOSv uncertainty model forecasts as in Figure 4.5, but following PIT-based probability calibration with nine smoothing points and τ = 90.

Nipen (2012) derived a decomposition of the ignorance score for a set of raw forecasts into two parts: (1) the potential ignorance score of a perfectly calibrated forecast (IGN_pot), and (2) extra ignorance caused by a lack of calibration (IGN_uncal). Ignorance can therefore be reduced by improving the ensemble forecasting system, applying bias correction, or using a more suitable uncertainty model to reduce IGN_pot, or by calibrating the forecast to reduce IGN_uncal. In our comparison of raw and probability-calibrated EMOS and log-EMOSv forecasts, the bias correction and uncertainty model schemes have not undergone any changes. Therefore, changes in ignorance scores can be attributed to changes in IGN_uncal. This suggests that the increased ignorance exhibited by the calibrated log-EMOSv forecasting system can be attributed to overfitting of the calibration curve to sampling errors. While overfitting may play a role in the deterioration of EMOS uncertainty model ignorance after calibration, the main problem in this case is the long lag-time in updating the PIT histogram when the forecasting system's error characteristics transition between seasons. Note that sampling error is likely less significant in verification than it is in calibration, as the verification sample sizes are larger.

Figure 4.10: Same as Figure 4.7, but scores are computed after applying the PIT-based calibration scheme (black). Uncalibrated results (grey) are plotted for comparison. Results for warm season EMOS forecasts probability calibrated using the "carry-forward" method are indicated by the heavy dashed line.

The adjustment factor (ICF) in Eq. (4.9) allows users of COMPS to adjust the inteliCal sensitivity. Since the probability-calibrated log-EMOSv forecasts made with the default ICF of 1.0 exhibit signs of overfitting, we may expect better results with a larger ICF. We tested inteliCal calibration of the storm season log-EMOSv forecasts with ICF ranging from 1.1 to 2.0. This experiment revealed that the greatest improvement in calibration deviation, which occurs for ICF in the range of 1.0 to 1.33, is accompanied by an increase in ignorance. Higher values of ICF prevent the ignorance score from being inflated due to the introduction of sampling error in the calibration, but this is achieved by applying the calibration correction sparingly.
The fact that the log-EMOSv storm season forecast ignorance scores are lowest without any probability calibration supports the earlier suggestion that these forecasts can be considered to be calibrated (within sampling error).

While we can likewise consider the EMOS warm season forecasts to be calibrated, the drastic change in shape of the EMOS uncertainty model's raw (uncalibrated) PIT histograms between seasons suggests an alternative calibration strategy for improving the sharpness of EMOS forecasts. Indeed, even with an ICF of 2.0, the calibrated warm season EMOS PIT histograms exhibit the "U" shape evident in Figure 4.8. This confirms that the problem is not caused by overfitting to sampling error in the calibration curve, but rather by the long lag-time in updating its shape. To avoid the adaptive calibration scheme's long lag-time in generating representative calibration curves, we replaced the calibration parameters at the start of the warm season (taken to be May 1) with those valid at some time during the previous year's warm season. July 29 was selected as the replacement date based on calibration statistics from the 2009–2010 water year. By this date, the PIT histogram is able to reflect the (well-calibrated) characteristics of the EMOS warm season probability forecasts. Note that the choice of May 1 for the start of the warm season is based solely on examination of climatological inflows (i.e., it is the approximate start of the rising limb of the climatological freshet). Whether this date coincides with the start of the snowmelt season in any given year is not known ahead of time.

This calibration strategy, which we refer to as carry-forward (CF) calibration, resulted in significant improvements to warm season forecasts derived from the EMOS uncertainty model (using the default ICF of 1.0). The calibration deviation for day 1 forecasts dropped to 0.011, while deviations for days 2 and 3 dropped to 0.015 and 0.021, respectively. Additionally, ignorance scores for these CF-calibrated forecasts were greatly improved, as shown in Figure 4.10 (heavy dashed line). CRPS also improved as a result of forecasts becoming sharper after calibration. A comparison of raw and CF-calibrated distributional spread (not shown) reveals that the calibration scheme reduces spread, particularly early in the warm season. As the warm season progresses, the change in spread is reduced, and toward the end of the warm season the calibration scheme slightly increases the distributional spread for forecast horizons of 2–3 days. This supports the suggestion in Section 4.4.1 that the regression-based distributional spread fitting has difficulty during the transition period between the storm season and warm season.

Examination of forecast PDFs for a handful of dates throughout the warm season reveals that IGN and CRPS improvements are due to the amplification function [Eq. (4.7)] increasing the height of the forecast PDF near the centre, and reducing the height in the tails (i.e., reducing the spread). Results from the CF calibration were found to be insensitive to increases in ICF up to 1.43. For ICF values greater than this, the calibration is applied too sparingly and ignorance scores show very little improvement. In terms of both calibration deviation and ignorance, an ICF of 1.0 gives the best results in the carry-forward calibration framework.
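Operationally, carry-forward calibration amounts to swapping the calibrator's state on a fixed date. A minimal sketch is given below, assuming the PITCalibrator structure sketched in Section 4.3.3; the dates and attribute names are illustrative, not those of COMPS.

```python
import copy

def carry_forward(calibrator, archived_phi, today, warm_season_start):
    """Carry-forward (CF) calibration: on May 1 (warm_season_start), the
    adaptive calibration-curve state is overwritten with a copy archived
    late in the previous warm season (July 29 in this study), so the
    calibrator skips its long adjustment lag between regimes."""
    if today == warm_season_start:
        calibrator.phi = copy.deepcopy(archived_phi)
    return calibrator
```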
Based on these results, the ideal M2M-based probability forecasting system for Daisy Lake inflows is a combination of two different COMPS configurations: (1) the raw (uncalibrated) log-EMOSv forecasts during the storm season; and (2) the carry-forward-calibrated EMOS forecasts during the warm season. Figure 4.11 shows the PIT histograms resulting from pooling the forecasts from these two model configurations over the 2010–2011 and 2011–2012 water years. The corresponding ignorance scores for forecast days 1, 2 and 3 are 4.38, 4.76 and 4.95, respectively, while CRPS values are 4.44, 5.11 and 5.53.

Figure 4.11: PIT histograms for full water years after combining raw (no PIT-based calibration applied) storm season forecasts from the log-EMOSv uncertainty model with carry-forward-calibrated EMOS forecasts during the warm season, for ideal forecast reliability and sharpness.

4.5 Concluding Remarks

In this chapter, we have transformed a 72-member ensemble forecasting system that explicitly samples all sources of error in the inflow modelling chain into a calibrated probabilistic forecasting system. This work was done exclusively using the COmmunity Modular Post-processing System (COMPS) described and developed by Nipen (2012). COMPS allows its users to implement and apply schemes for bias correction, uncertainty modelling, probability calibration, forecast updating using recent observations, and verification. Any of these components can alternatively be bypassed, making COMPS a flexible post-processing tool for point forecasts of almost any observed phenomenon.

An analysis of inflow forecast error characteristics at the Daisy Lake reservoir enabled us to implement and apply COMPS uncertainty models appropriate at different times of year. During the storm season, a log-normal uncertainty model fit to the M2M ensemble using EMOS yields reliable (calibrated) forecasts; a simple normal EMOS distribution yields calibrated results during the warm season, when errors are normally distributed.

The PIT-based calibration scheme of Nipen and Stull (2011) was generally found to improve calibration at the expense of forecast ignorance. In the case of the well-calibrated log-EMOSv uncertainty model, this is caused by an overfitting of the calibration curve to sampling errors. Seasonal changes in PIT histogram shape for the EMOS uncertainty model caused continuous updating of the calibration curve parameters to produce poorly calibrated forecasts during the warm season. This is because of the long lag-time in adaptively updating the calibration curve.
By replacing these calibration parameters at the start of the warm season with those valid late in the previous year's warm season (a process referred to as carry-forward calibration), we were able to produce sharper forecasts with slightly less calibration deviation, and greatly reduced ignorance and CRPS.

The ideal approach to probabilistic forecasting of inflows to the case-study watershed is therefore a combination of two different configurations: raw (not calibrated) log-normal EMOS uncertainty model forecasts during the storm season (October through April), and Gaussian EMOS uncertainty model forecasts with carry-forward calibration (with an ICF of 1.0) during the warm season (May through September). This combined configuration is easily achieved in an operational forecast setting, whereby both uncertainty models are run continuously throughout the year and the forecast output from the COMPS system is switched at pre-determined dates or, alternatively, when the observed flow characteristics begin to transition. Testing of the newly implemented inteliCal calibration scheme in COMPS indicates that a suitable value for the inteliCal adjustment factor, ICF, may be in the range of 1.43 to 1.67. Whether these results are specific to the case study data is unknown; future work should include further testing of the inteliCal scheme.

While the methods applied and results shown in this chapter are specific to the case-study watershed, there are some general lessons that can be applied in other studies. First and foremost, an analysis of forecast error characteristics goes a long way in determining the ideal uncertainty model. We have shown that when the uncertainty model makes correct assumptions about how forecast errors are distributed around the ensemble mean, probabilistic forecasts derived from the model are reliable, or very nearly so. We have also shown that error characteristics can be strongly regime-dependent. Thus, applications of probabilistic forecasting methods in watersheds with distinct seasonality (for example, a rainy season and a snowmelt-driven season) may benefit from the use of different uncertainty models at different times of year.

Chapter 5

On the Importance of Sampling Hydrologic Uncertainty: An Economic Analysis

5.1 Introduction

Deterministic forecasts can give forecast users a false impression of certainty. In risk-based decision making, deterministic forecast failures can lead to significant economic and societal losses (e.g., Glassheim, 1997). Forecasts expressed in terms of reliable probabilities of a range of possible events can enable rational decision making and provide economic benefits to the decision maker and to society as a whole (Krzysztofowicz, 2001). Roulston et al. (2006) have shown that, when provided with weather forecasts and quantitative estimates of forecast uncertainty, even nonspecialists are able to make decisions that increase economic reward and reduce exposure to risk.
Research has repeatedly illustrated that, over a range of time scales, even imperfect probabilistic weather and hydrologic forecasts are able to provide positive economic value to a wider range of users than deterministic forecasts, and that for most users reliable probability forecasts provide increased economic value (e.g., Richardson, 2000; Zhu et al., 2002; Palmer, 2002; Stensrud and Yussouf, 2003; Roulin, 2007; McCollor and Stull, 2008b).

The Member-to-Member (M2M) ensemble evaluated in this chapter consists of various components, each sampling a different source of uncertainty in the hydrologic modelling chain. The M2M forecasting system includes multiple Numerical Weather Prediction (NWP) models and multiple (nested) NWP grids that are downscaled using multiple interpolation schemes to drive multiple Distributed Hydrologic (DH) models. Each DH model has multiple model parameterizations and uses multiple hydrologic states to begin each daily forecast. Each of these components comes at a price, whether measured in terms of money, hours worked, or computational costs.

Many gridded NWP model output fields are freely available from national forecast centres such as the U.S. National Centers for Environmental Prediction and the Meteorological Service of Canada, though there are computational costs associated with handling these large data sets. Other, generally higher-resolution or custom products may be obtained through a contract with a private or academic weather modelling group, or by purchasing expensive high-performance computers and hiring IT staff to make in-house NWP forecasts. Many hydrologic models are freely available, but support may not be, and setup of these models for a specific watershed can be a time-consuming, and therefore costly, undertaking. Automated model parameter optimization schemes require some manual setup time, but can generally be left to run for the days or weeks required for tuning, so long as the computing resources are available. A multi-parameter ensemble may necessitate a multi-state component to avoid forecast discontinuities when model parameters change suddenly between model runs. The multi-state component is then essentially free, aside from costs associated with additional model run-time, which could be significant.

Since the sum total of the price paid for the full M2M ensemble may be prohibitive for some operational forecasting applications, it is prudent to evaluate the economic value of each M2M component. If the price paid for each component is known, such an analysis can be used to determine whether or not it is cost-effective. Murphy (1993) identified three types of forecast "goodness": consistency (i.e., between a forecaster's best judgement and the actual forecast), quality, and value. Value, which is concerned with economic worth to the forecast end user, is the focus of this chapter.

In this study, the full 72-member M2M (hereafter identified as "Full") ensemble is reduced by eliminating various ensemble components. Each reduced ensemble forecast is transformed into a probabilistic forecast using uncertainty models that fit a probability distribution to the ensemble. The relative economic values of the probabilistic forecasts from the reduced M2M configurations are then compared to those from the Full M2M system to ascertain the value added by each component.
Any sources of uncertainty that are not explicitly or adequately sampled by the ensemble may be implicitly accounted for by the uncertainty model or by subsequent probability calibration. Using this strategy, it may be possible to reduce the ensemble setup cost and computational complexity (and therefore operationally critical forecast run-time) while continuing to generate reliable probabilistic forecasts.

The economic value of the probabilistic forecasts used in this study is estimated based on costs and losses associated with operating the reservoir under the guidance of each ensemble configuration. This value provided to the forecast end user does not include any reductions in value due to computational, time, or monetary costs associated with the various ensemble components. It is therefore up to the individual forecast end user to weigh the value of each ensemble component against the price paid for that component.

5.2 Economic Value of Forecasts

The economic value of reservoir inflow forecasts is controlled by complex interactions between forecast quality, reservoir operation constraints, transmission constraints, demand for electricity, and the highly variable electricity market, among other factors. Dynamic economic models that seek to capture these processes have been developed for specific regions or markets, and are typically used for determining economically optimal strategies for water management.

For example, energy production in the Columbia River system, which is controlled by a number of major storage and run-of-river dams in southwestern Canada and the United States, is modelled by ColSim (Hamlet et al., 2002; Hamlet and Lettenmaier, 1999). ColSim handles a variety of competing system objectives such as hydropower, flood control, flow targets and recreational constraints. The CALVIN (California Value Integrated Network) model (Jenkins et al., 2001; Draper et al., 2003; Pulido-Velazquez et al., 2004) similarly balances different objectives for optimal operation of California's major water supply system.

The Short-Term Optimization Model (STOM) developed by Shawwash (2000) for the British Columbia Hydro and Power Authority (BC Hydro) focuses on operations planning that optimizes hydroelectric resource utilization and trade opportunities at time scales of one day to a week for the entire BC Hydro generating system. STOM determines the optimal tradeoff between the long-term value of water and the returns from spot trading transactions in a competitive electricity market. Operating decisions are driven by the need to meet system electricity demand while meeting other requirements and constraints that are often in competition with the main objective. Other decision support tools have been developed for BC Hydro operations planning at longer time scales for specific power complexes (Druce, 1990) and for system-wide management (Fane, 2003). Although models such as ColSim, CALVIN and STOM allow a thorough, realistic examination of energy production and revenues for different operating strategies (whether driven by weather forecasts or changes operating on longer time scales), their use also entails "enormous data requirements" (Draper et al., 2003, p. 160).

In the absence of suitable complex, dynamic models like those described above, or the data required to drive them, the economic value of forecasts can still be estimated with respect to hydroelectricity production using more simplified decision-making models.
For example, McCollor and Stull (2008b) developed such a model for daily reservoir operation using the static cost-loss model of Richardson (2000). This cost-loss model has also been employed in the evaluation of temperature forecasts for the energy sector (Stensrud and Yussouf, 2003), road-weather forecasts (Thornes and Stephenson, 2001), and severe weather forecasts (Legg and Mylne, 2004), among many other applications.

Krzysztofowicz and Duckstein (1979) employed a similar model of decision making for optimizing the conflicting objectives of flood control and hydroelectric production. However, in this model, decision criteria vary with the decision maker's preferences with respect to these objectives, and can therefore differ from purely economic criteria. Roulin (2007) presents a more dynamic model of decision making in which decisions and actions can change as the event draws nearer and new forecast information becomes available. Georgakakos and Yao (2001) have shown that reservoir management models that make use of forecast ensembles have the potential to improve system performance only if the management model or process uses the forecast information effectively. This can be done by employing adaptive decision systems to determine dynamic operational policies given uncertain forecasts.

In this chapter, the economic value of the M2M ensemble, and of various reduced configurations thereof, will be evaluated using the simple (static) cost-loss model for reservoir operation developed by McCollor and Stull (2008b). A description of the general cost-loss decision-making model follows in Section 5.2.1. Refinement of the decision-making problem for economical operation of the case-study reservoir is illustrated in Section 5.3.4.

5.2.1 A Simple Cost-Loss Decision Model

In the simplified model of reservoir operation developed herein, the operator, when faced with a forecast of a significant inflow event, must decide whether the forecast probability is great enough to warrant taking mitigative action. This action amounts to drafting (i.e., lowering the water level of) the reservoir by routing the water through the turbines and generating electricity, thereby making room for subsequent inflow. If the event does not occur, this action results in an economic cost due to the lowered hydraulic head, which reduces the energy that can be derived from a given volume of water. Conversely, if the reservoir operator does not draft the reservoir and the inflow event does occur, an economic loss is incurred by spilling water rather than running it through the generators.

The reservoir operator's choice depends on: the capacity of the reservoir to take in additional water, the inflow forecast, and operational constraints such as maintaining constant reservoir levels for maximum hydroelectric production or recreational usage, or meeting minimum flow requirements for aquatic habitat. By taking the appropriate action for each forecast, the operator can expect to minimize costs and losses over the long run. The decision-making process can be simplified by making certain assumptions that are outlined in Section 5.3.4.

There are four possible combinations of mitigative action and occurrence of an event, each with its own net cost. These are summarized in Table 5.1.
If the forecast probability of a particular event (where an event is the exceedance of some significant inflow threshold) exceeds some threshold value (p_t), the reservoir operator takes action, incurring a cost C. A loss L occurs if the event was not forecast, but was observed to occur. Operational expenses that occur as a result of using the forecasting system are related to the number of forecast hits a (correct forecasts), the number of false alarms b (no occurrence when forecast), and the number of misses c (event occurs but was not forecast). Correct rejections d (event was neither forecast nor observed) do not result in any expenses.

Table 5.1: Cost-loss contingency table of inflow forecasts and observations. The number of forecast hits is given by a, b is the number of false alarms, c the number of misses, and d the number of correct rejections. Action is taken when a particular inflow exceedance event is forecast to occur, incurring a cost C, while events that were not forecast result in losses L. Correct rejections result in no costs or losses.

                             Observed    Not observed
  Forecast / Action          a ($C)      b ($C)
  Not forecast / No action   c ($L)      d ($0)

The mean expense associated with using a particular forecasting system is then given by:

E_f = \frac{a}{n} C + \frac{b}{n} C + \frac{c}{n} L, \qquad (5.1)

where n is the total number of forecast-observation pairs in the evaluation period (n = a + b + c + d) (Richardson, 2000).

The economic value of the forecasts is assessed by comparing E_f to the mean expense associated with perfect forecasts (E_p), and that associated with operating without any forecast information (E_c):

V = \frac{E_c - E_f}{E_c - E_p}. \qquad (5.2)

Note that this relative value definition is equivalent to a skill score, with a maximum V of one indicating perfect forecasts, and V less than zero indicating forecasts that are less skillful/valuable than climatology (Wilks, 2001; Richardson, 2003).

In the absence of any forecast information, the decision maker will either take protective action every day, incurring a mean expense of C, or they will choose to never protect, in which case losses occur at a rate equal to the climatological base rate of the event (s), resulting in a mean expense of sL. Since this choice depends on which course of action results in the minimum economic risk, E_c is simply min(C, sL). Given perfect forecasts, the decision maker would choose to protect only when the event occurred, yielding a mean expense of E_p = sC.

Upon defining the user-specific cost-loss ratio α = C/L, Eq. (5.2) can be expressed as:

V = \frac{\min(\alpha, s) - \frac{\alpha}{n}(a + b) - \frac{c}{n}}{\min(\alpha, s) - s\alpha}. \qquad (5.3)

Incorporating the definitions of hit rate (H) and false alarm rate (F) given in Appendix A, and setting the climatological base rate to s = (a + c)/n, Eq. (5.3) can be rewritten as:

V = \frac{\min(\alpha, s) - F\alpha(1 - s) + Hs(1 - \alpha) - s}{\min(\alpha, s) - s\alpha}. \qquad (5.4)
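Eqs. (5.1)-(5.3) translate directly into code. The sketch below computes V from contingency counts and a cost-loss ratio; the counts in the usage line are hypothetical, chosen only to exercise the function.

```python
def relative_value(a, b, c, d, alpha):
    """Eqs. (5.1)-(5.3): relative economic value V of a forecast system,
    from contingency counts (a hits, b false alarms, c misses, d correct
    rejections) and the user's cost-loss ratio alpha = C/L. All expenses
    are expressed per unit loss L."""
    n = a + b + c + d
    s = (a + c) / n                        # climatological base rate
    e_f = alpha * (a + b) / n + c / n      # mean expense using the forecasts
    e_c = min(alpha, s)                    # best no-forecast strategy
    e_p = s * alpha                        # expense given perfect forecasts
    return (e_c - e_f) / (e_c - e_p)

# Hypothetical two-year record: 6 hits, 10 false alarms, 2 misses.
print(relative_value(a=6, b=10, c=2, d=682, alpha=0.1))
```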
5.3 Case Study

5.3.1 Study Dates and Data

The M2M ensemble (described in Section 5.3.2) is used to forecast inflows to the Daisy Lake reservoir, a hydroelectric facility on the upper Cheakamus River in the mountainous terrain of southwestern BC, Canada. The reservoir is operated by BC Hydro. Evaluation is carried out over the 2010–2011 and 2011–2012 water years. Forecasts were also generated for the 2009–2010 water year, but these forecasts are excluded from evaluation because their quality may be impacted by the spin-up of uncertainty model and probability calibration parameters.

For this particular hydroclimatic regime, a water year is defined as the period from October 1 to September 30 of the following year. Fall and winter storm season (October–April) inflows are primarily driven by precipitation from Pacific frontal systems. Rain-on-snow events can result in significant inflows during this period. During the spring and summer warm season (May–September), inflows are snowmelt-driven, with some late-season glacier melt contributions. Daily average inflow rates are calculated by BC Hydro using a water balance based on observed reservoir levels and outflows. These calculated inflows are considered to be of high quality for this basin, and will be referred to as observed inflows for the purposes of this study. Hourly forecast inflows to the Daisy Lake reservoir are transformed into daily average inflow rates for verification against these observations.

In the simple cost-loss model developed herein, it is assumed that the hypothetical Daisy Lake reservoir operator is sensitive to daily average anomaly inflow rates of 70 m³/s and 100 m³/s. Anomalies are calculated by subtracting the daily climatological median inflow from the forecast, and are used so that the forecasting system is not unduly rewarded for making high inflow forecasts during the snowmelt season, when relatively little skill is required to do so. Instead, the forecasting system is rewarded for correctly forecasting events that are significantly different from climatology, or a readily anticipated inflow value. The choice of inflow rate threshold is limited by the small sample size of two water years in which no extreme events occurred (Figure 5.1). An anomaly threshold of 100 m³/s corresponds to an absolute inflow threshold of approximately 130–150 m³/s depending on the time of year; based on the full climatological record, this threshold corresponds to a one-in-one-month inflow event. For comparison, inflow events requiring pre-generation or drafting of the Daisy Lake reservoir occur on average once per year (Doug McCollor, personal communication, April 17, 2013).

Figure 5.1: Observed inflows (solid black line) for the 2010–2011 and 2011–2012 water years. Anomaly inflow values (solid grey line) are calculated by subtracting the climatological inflows (dashed black line) from the observations. The anomaly thresholds of 70 m³/s and 100 m³/s are indicated by the horizontal dashed grey lines.

During the evaluation period, the 100 m³/s threshold is exceeded on eight days. Six of these inflow events occur during the spring/summer warm season, and the remaining two events occur during the fall/winter storm season. The climatological base rate (or exceedance probability), s, for
inflow event anomalies exceeding 100 m³/s is approximately 0.011 during the evaluation period. The corresponding base rate for the 70 m³/s threshold is approximately 0.023. During the evaluation period, there are five such storm season events and twelve during the warm season.

In addition to assessing the economic value of the M2M configurations over the case-study period, probabilistic forecast quality will be evaluated using the ignorance score (IGN) and continuous ranked probability score (CRPS). Deterministic ensemble median forecast quality and skill will be measured using the Mean Absolute Error (MAE) and the Root Mean Squared Error Skill Score (RMSESS) for each configuration. For a description of these verification measures and their interpretation, see Appendix A.

5.3.2 The Member-to-Member (M2M) Ensemble Forecasting System

The M2M ensemble forecasting system is designed such that all sources of uncertainty in the inflow modelling chain are sampled. Uncertainty in the forecasts comes from the NWP models used to drive the hydrologic models, the hydrologic models themselves and their parameterizations, and the initial conditions or hydrologic states from which the forecasts are started.

The NWP models are taken from the operational ensemble suite run by the Geophysical Disaster Computational Fluid Dynamics Centre (GDCFDC) in the Department of Earth, Ocean and Atmospheric Sciences at the University of British Columbia. The ensemble consists of three independent nested limited-area high-resolution mesoscale models with forecast domains centred over southwestern BC: the Mesoscale Compressible Community model (MC2; Benoit et al., 1997); the fifth-generation Pennsylvania State University-National Center for Atmospheric Research Mesoscale Model (MM5; Grell et al., 1994); and Version 3 of the Weather Research and Forecasting (WRF) model (Skamarock et al., 2008). Hourly model output fields with grid spacings of 12, 4 and 1.3 km are used for this study.

The NWP models are initialized at 00 UTC. Forecast run time varies during the case-study period. From the start of the evaluation period (October 2010), all NWP models were run out to at least 60 hours except for the 1.3-km MC2 model runs, which are 39 hours due to operational time constraints. The WRF model produced 84-hour forecasts for all grids throughout the study period. In March 2011, the MM5 12-km and 4-km forecasts were extended to 84 hours, enabling them to generate a 3-day inflow forecast. In March 2012, 1.3-km MM5 model output was made available out to 84 hours, resulting in a day-3 inflow forecast ensemble consisting of up to 48 members; forecast days 1 and 2 had at most 72 ensemble members available throughout the three-year forecast period. Due to occasional NWP model failures, the size of the ensemble forecast issued each day is variable.

From the beginning of the case-study period through March 2012, the coarse-resolution (108 km horizontal grid spacing) outer nests of the three NWP models were initialized using the National Centers for Environmental Prediction (NCEP) North American Mesoscale (NAM) model, which also provides time-varying boundary conditions. In March 2012, the initial/boundary condition source for the MM5 and WRF was switched to NCEP's Global Forecast System (GFS) model, while MC2 continued to make use of the NAM.

The Distributed Hydrologic (DH) models applied to the case-study watershed are the Water balance Simulation Model (WaSiM; Schulla, 2012) and WATFLOOD (Kouwen, 2010).
These models were selected because they are distributed, and therefore able to take direct advantage of high-resolution NWP input, and because they are able to simulate snow and glacier melt processes and lakes in complex terrain given relatively limited input data. Both DH models are run at 1-km grid spacing with an hourly time step. The NWP fields are downscaled to the DH model grid using interpolation schemes built into each DH model. For the WaSiM model, 12-km NWP fields are downscaled using two methods: inverse-distance weighting (IDW), and elevation-dependent regression (Schulla, 2012). The 4-km and 1.3-km NWP fields are downscaled using a bilinear interpolation scheme. WATFLOOD downscaling is done using IDW that incorporates elevation dependence through optional elevation adjustment rates for both temperature and precipitation. The 12-km fields are downscaled using IDW with two different sets of elevation adjustments, while the 4- and 1.3-km fields do not use the elevation adjustment.

Both WaSiM and WATFLOOD model parameters have been optimized using the Dynamically Dimensioned Search (DDS) algorithm (Tolson and Shoemaker, 2007; Graeff et al., 2012; Francke, 2012). Three parameter sets were generated for each model by using three different objective functions for DDS optimization: the mean absolute error (MAE) of simulated inflow, to minimize overall errors; the Nash-Sutcliffe Efficiency (NSE; Nash and Sutcliffe, 1970) of inflow, to emphasize performance during high-flow events; and the NSE of log-transformed flows, to optimize performance during low-flow periods. Simulations during the ten-year optimization period (1997–2007) were driven by observed meteorological data at several weather stations within the case-study watershed and surrounding area.

The multi-state or multi-initial-condition component of the M2M ensemble forecasting system arises as a direct consequence of implementing a multi-parameter component. In forecast mode, the hydrologic state for each model and each model parameterization is updated at the start of the forecast day by driving the model with observed meteorological data. The resulting simulated state is used as the initial condition for the day's forecast run. In order to avoid discontinuities early in the daily forecast cycle, the parameter set used for the updating of the hydrologic state must match that used in the forecast. Thus, each parameter set has its own daily hydrologic state for each model. Figure 5.2 illustrates the update/forecast process for a particular parameterization of the WaSiM model. The forecast workflow is indicated by the solid arrows. Dashed arrows illustrate how meteorological observations are used to update the model configuration's hydrologic state for the following day's forecasts. The model configuration is specified by dash-dotted arrows. This process is repeated for each watershed model (WaSiM and WATFLOOD) and each parameterization/state, yielding 72 unique inflow forecasts each day.
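To make the downscaling step concrete, the sketch below shows IDW interpolation with an elevation adjustment, in the spirit of the WATFLOOD scheme described above; the adjustment rate, weighting exponent, and function signature are illustrative assumptions rather than the models' actual settings.

```python
import numpy as np

def idw_downscale(xy_grid, val_grid, z_grid, xy_target, z_target,
                  lapse=-0.0065, power=2.0):
    """Inverse-distance weighting with an elevation adjustment.

    xy_grid : (N, 2) array of NWP grid-point coordinates (m)
    val_grid, z_grid : (N,) arrays of values and grid-point elevations (m)
    lapse : per-metre adjustment rate; the standard atmospheric temperature
            lapse rate is used here only as a placeholder.
    """
    d = np.hypot(xy_grid[:, 0] - xy_target[0], xy_grid[:, 1] - xy_target[1])
    w = 1.0 / np.maximum(d, 1e-6) ** power      # inverse-distance weights
    # shift each source value from its own elevation to the target elevation
    adjusted = val_grid + lapse * (z_target - z_grid)
    return float(np.sum(w * adjusted) / np.sum(w))
```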
This process is repeated for each watershed model (WaSiM and WATFLOOD) and each parameterization/state, yielding 72 unique inflow forecasts each day.

Figure 5.2: The flow of information into and out of the WaSiM model for generating forecasts with the MAE-optimized parameter set. This process is repeated for each watershed model (WaSiM and WATFLOOD) and each parameterization/state, yielding 72 unique inflow forecasts each day.

Ensemble forecasts from the M2M forecasting system are transformed into probabilistic forecasts using the Ensemble Model Output Statistics (EMOS) method of Gneiting et al. (2005) (see Chapter 4). EMOS fits a normal distribution centred on the bias-corrected ensemble mean, with distributional spread based on a combination of ensemble variance and past ensemble mean errors. During the storm season, ensemble mean inflow errors to the Daisy Lake dam were found to exhibit a distribution with a high, narrow peak and slight positive skewness. Thus, prior to fitting the normal distribution to the ensemble, the data are log-transformed. During the warm season, when inflows are driven by snowmelt and glacier melt, forecast errors were found to be normally distributed, and no transformation is required prior to fitting the normal probability distribution. In Chapter 4, these seasonal uncertainty models were shown to produce reliable forecasts (i.e., events forecasted with probability p are, over the course of many such forecasts, observed to occur a fraction p of the time). The probability calibration method of Nipen and Stull (2011) was able to significantly increase warm-season forecast sharpness, thereby decreasing ignorance.
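A minimal sketch of this kind of EMOS fit is given below, assuming a simplified Gaussian form N(a + b*(ensemble mean), c + d*(ensemble variance)) estimated over a training window. The parameter names and the use of maximum likelihood (rather than the minimum-CRPS estimation of Gneiting et al. (2005)) are illustrative choices, not a description of the operational implementation.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    def fit_emos(ens_mean, ens_var, obs):
        # Fit N(a + b*mean, c + d*var) to past ensemble/observation pairs
        # by minimizing the negative log-likelihood (i.e., the ignorance).
        def neg_log_lik(theta):
            a, b, c, d = theta
            mu = a + b * ens_mean
            var = np.maximum(c + d * ens_var, 1e-6)  # keep variance positive
            return -np.sum(norm.logpdf(obs, loc=mu, scale=np.sqrt(var)))
        return minimize(neg_log_lik, x0=[0.0, 1.0, 1.0, 1.0],
                        method="Nelder-Mead").x

    # Storm season: apply the same fit to log-transformed inflows, which is
    # equivalent to issuing a log-normal predictive distribution.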
5.3.3 Ensemble Reduction Test Cases

In order to compare the cost-effectiveness of each of the M2M components described above, the performance of the ensemble with all components, and with individual components removed, is evaluated. Note that the multi-state, multi-parameter (MSP) hydrologic modelling elements described in Section 5.3.2 are inextricably linked and therefore comprise a single M2M component. Since interpolation schemes are built into the hydrologic models, these are considered to be free ensemble components. Recall that the 12-km NWP model inputs to both DH models are downscaled using two different methods, while the higher resolution NWP grids are each downscaled using one method per DH model. In the M2M configurations tested in this economic analysis, 12-km NWP grids are downscaled using both schemes unless otherwise specified.

The NWP ensemble itself is comprised of two ensembles: a multi-model (MM) ensemble and a multi-grid scale (MGS) ensemble. It is likely that ensemble configurations including a MM NWP ensemble would be most commonly exploited in operational settings, as low resolution models are available free-of-charge more commonly than their higher-resolution counterparts; hence the term "the poor-man's ensemble" (Ebert, 2001). In configurations consisting of a single NWP input, the WRF model output has been selected because it is the most current NWP model in the M2M ensemble, and, as a community model, is subject to ongoing development and support. In order to remove any impact of ensemble size on comparisons of high-resolution and low-resolution single-NWP configurations, the 12-km NWP fields are downscaled using one interpolation scheme.

The following reduced M2M configurations are evaluated in this study (where the name of the configuration specifies the component that has been removed):

• −MGS: multi-DH, multi-MSP using 12-km multi-model NWP fields (36 members)
• −MM: multi-DH, multi-MSP using multi-grid scale WRF NWP fields (24 members)
• −MSP: multi-NWP, multi-DH with MAE-optimized parameterizations (24 members)
• −DH (WFLD): multi-NWP, multi-MSP, WATFLOOD DH model (36 members)
• −DH (WaSiM): multi-NWP, multi-MSP, WaSiM DH model (36 members)
• −NWP (HR): multi-DH, multi-MSP with WRF 1.3-km NWP fields (6 members)
• −NWP (LR): multi-DH, multi-MSP with WRF 12-km NWP fields downscaled one way (6 members)

Each of these ensemble configurations is transformed into a probabilistic forecast in the form of a probability density function (PDF) using the EMOS method described in Section 5.3.2 (including warm-season probability calibration). The relative value of each of the reduced ensembles is compared to that of the Full M2M probabilistic forecasts.

In order to be useful for risk-based decision making, probabilistic forecasts must be reliable. Otherwise, their probabilities cannot be taken at face value. Thus, calibration deviations [Eq. (A.8)] for the probability forecasts derived from each reduced M2M configuration were examined during the storm season and warm season. Forecasts with calibration deviations less than 50% greater than their expected values (Nipen and Stull, 2011) were taken to be sufficiently reliable. Using this criterion, all of the configurations listed above produced calibrated forecasts except for the −MSP and −DH (WFLD) storm season forecasts. The calibration method of Nipen and Stull (2011) was applied during storm season periods when the forecasts were deemed to be sufficiently unreliable. The "intelligent" calibration scheme described in Chapter 4 (with ICF = 1.67 [Eq. (4.9)]) improves reliability and prevents ignorance scores from being inflated by sampling error (Nipen, 2012).
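The relabelling at the heart of this PIT-based calibration can be sketched in a few lines: the calibrated cumulative probability is the fraction of past probability integral transform (PIT) values falling at or below the raw cumulative probability. This is a schematic of the Nipen and Stull (2011) idea only; the training-window handling and the ICF-based decision of when to calibrate follow Chapter 4 rather than this simplified code.

    import numpy as np

    def relabel_probability(p_raw, pit_history):
        # Map a raw cumulative probability to a calibrated one using the
        # empirical CDF of past PIT values. For forecasts that are already
        # calibrated, PIT values are uniform and this map is the identity.
        pit = np.sort(np.asarray(pit_history))
        return np.searchsorted(pit, p_raw, side="right") / float(pit.size)

For example, if only 30% of past PIT values lie at or below a raw cumulative probability of 0.5, then events forecast below the raw median have in fact occurred only 30% of the time, and the relabelled probability is 0.3.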
5.3.4 Cost-Loss Model Development for Daisy Lake

In this section, the cost-loss model described in Section 5.2.1 is refined, providing a specific model for the Daisy Lake reservoir. While this represents a great simplification of reservoir management and operational constraints, it can be used to examine the relative value of probabilistic inflow forecasts to the reservoir. This model is taken from McCollor and Stull (2008b).

A schematic of a hydroelectric reservoir is presented in Figure 5.3. The physical characteristics of the reservoir are given by:

h1 = nominal head (m), the height difference between the reservoir outlet and the turbine,
h2 = the difference between the lowered (to prevent spillage) reservoir elevation and the outlet (m),
h3 = the difference between the full reservoir and lowered reservoir elevations (m),
Ar = the surface area of the reservoir (m2).

Inflows (Qin) to the reservoir are either spilled (Qs), or channeled to the turbine via the penstock at a rate Qt.

Figure 5.3: Reservoir schematic diagram for the cost-loss economic model developed in Section 5.3.4 for Daisy Lake. Water that does not spill can be channeled through the penstock to the turbines to produce power and therefore revenue. Figure based on McCollor and Stull (2008b).

The power P (W) produced by the turbine is given by:

P = ηγQtH, (5.5)

where η is the turbine efficiency (expressed as a fraction), γ is the specific weight of water (9807 N/m3 at 5°C), Qt is the flow through the turbine (m3/s), and H is the head, or the difference in elevation (m) between the reservoir level and the turbine. Assuming constant flow conditions, the energy K (J) produced by running the generator for time T (in seconds) is:

K = PT. (5.6)

Given a selling value of S ($/J), the market value V ($) of this energy is:

V = KS = PTS. (5.7)

In this decision model, losses occur in conjunction with large inflow events that are not forecast. If it is assumed that the reservoir is full when this event occurs (i.e., H = h1 + h2 + h3), then this loss is equivalent to the value of the water that spills past the generator without producing power or revenue. The amount of this loss, L ($), which is related to the lost power PL, is given by:

L = PL TS = ηγQs(h1 + h2 + h3)TS. (5.8)

Assuming that lowering the reservoir level by h3 would have prevented this spill, the spilled volume Qs can be defined as:

Qs = Ar h3. (5.9)

In order to prevent losses, the reservoir should be lowered preceding large inflow events. The costs of this preventive action result from operating the reservoir at a lower head H = h1 + h2 relative to operation at H = h1 + h2 + h3:

C = ηγQt(h1 + h2 + h3)TS − ηγQt(h1 + h2)TS = ηγQt h3 TS. (5.10)

If we further assume that flow through the turbine, Qt, is at least equal to the base inflow rate, Qb, and let G represent the reservoir storage volume [Ar(h2 + h3)], then the cost/loss ratio (α = C/L) is:

α = Qb / (Ar h1 + G). (5.11)

Eq. (5.11) can be evaluated for Daisy Lake using the physical parameters listed in Table 5.2, yielding α = 0.00033. This number indicates that the Daisy Lake reservoir operator is highly sensitive to losses associated with spilling, and that they will benefit from taking mitigative action even when events are forecasted with very small probabilities.

Table 5.2: Physical parameter values for the Daisy Lake reservoir for cost-loss calculations. Values are taken from McCollor and Stull (2008b).

  Physical Parameter               Value
  Nominal head h1 (m)              291
  Reservoir storage G (m3)         46 × 10^6
  Reservoir area Ar (m2)           43 × 10^6
  Daily base inflow Qb (m3/day)    4.2 × 10^6

As noted in McCollor and Stull (2008b), it is possible to further refine the basic cost-loss model by assuming that the costs associated with operating the reservoir at a lowered head may be realized until the next inflow event occurs. The length of time until the next event is given by the inverse of the climatological base rate of the event (s). This refined cost-loss ratio is:

α = Qb / [s(Ar h1 + G)]. (5.12)
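The basic ratio is easy to verify numerically from the Table 5.2 parameters; the short computation below reproduces the value quoted above (taking T = 1 day so that Qb enters directly in m3/day).

    # Physical parameters for the Daisy Lake reservoir (Table 5.2).
    Qb = 4.2e6    # daily base inflow (m^3/day)
    Ar = 43e6     # reservoir surface area (m^2)
    h1 = 291.0    # nominal head (m)
    G = 46e6      # reservoir storage volume (m^3)

    alpha = Qb / (Ar * h1 + G)   # Eq. (5.11)
    print(alpha)                 # -> 3.36e-4, quoted as 0.00033 in the text

Scaling by Sm/Sc = 2.5, as in Eq. (5.13) below, turns the Table 5.3 values of 0.014 and 0.030 into 0.036 and 0.075 (to within the rounding of the tabulated entries).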
Further refinement can be made by allowing for a dynamic energy market. That is, we assume that power from spilled water could have been sold at a contract price Sc, while additional power required to meet demand during periods of lowered head will need to be purchased at a market price of Sm. This yields a cost-loss ratio of:

α = SmQb / [Sc s(Ar h1 + G)]. (5.13)

Specific cost-loss ratios for the Daisy Lake reservoir calculated using Equations (5.12) and (5.13) are given in Table 5.3 for inflow anomaly thresholds of 70 m3/s and 100 m3/s. Following from McCollor and Stull (2008b), the ratio of market to contract price (Sm/Sc) is taken to be 2.5 for the case study. In reality, this ratio is highly variable, and market prices can spike such that Sm/Sc approaches 20, though it is more commonly in the range of 1 to 10 (Doug McCollor, personal communication, April 17, 2013). Based on these values, the cost-loss ratio for Daisy Lake can be seen to vary over a range of 0.00033 (using the basic model) to approximately 0.3 (incorporating s for the 100 m3/s threshold and Sm/Sc of 10).

Table 5.3: Cost-loss ratios for the Daisy Lake reservoir calculated using the basic model [Eq. (5.11)], including climatological frequency s [Eq. (5.12)], and including a variable electricity market [Eq. (5.13)] with Sm/Sc = 2.5.

  C/L Model          Anomaly Threshold
                     70 m3/s    100 m3/s
  Basic Model        0.00033    0.00033
  Including s        0.014      0.030
  Including Sm/Sc    0.036      0.075

Note that the dynamic cost-loss model developed by Roulin (2007) is based on the idea that users may be able to reduce the cost of taking action if they have more time to prepare. This reduction in cost results in a reduction in α toward the region where maximum value is achieved (Richardson, 2000). Such a cost reduction would provide little benefit in the case-study region, where α is already very low. Additionally, if the size of the event to which the reservoir operator is sensitive is assumed constant, it follows from Eq. (5.10) that the cost associated with taking action earlier is actually higher, because the reservoir will operate at lower head for a longer period of time T prior to the next inflow event. A more complex dynamic model that allows the inflow threshold to change with time and that accounts for the duration of an inflow event would be more suitable, but is beyond the scope of this research.

5.4 Results and Discussion

5.4.1 Quality and Skill of Reduced Ensemble Forecasts

As shown in Figure 5.4, ensemble median forecasts from the −MGS M2M configuration have the lowest MAE for forecast day 1. As the forecast horizon increases, the importance of including high-resolution NWP guidance becomes more apparent, as illustrated by the superior performance of both −MM and Full M2M forecasts. The same conclusions can be drawn from an examination of RMSESS. At lead times of 1-2 days, the inclusion of a multi-model NWP ensemble is more important than a multi-grid scale NWP ensemble (−MM is worse than −MGS). Sampling any aspect of NWP uncertainty has the potential for significant gains in forecast quality as compared to ensemble configurations where NWP uncertainty is neglected altogether (−NWP).

It has been reported that the uncertainty in NWP model output is the largest source of error in NWP-driven flow forecasts with a time horizon beyond several days, whereas for shorter lead times, uncertainties in the hydrologic model dominate prediction errors (Coulibaly, 2003; Cloke and Pappenberger, 2009).
Note that the comparative importance of NWP and DH model error over different time scales depends on context; for an anticipated heavy rainstorm in a small and flashy catchment, uncertainty around the amount of rainfall expected over the next day may have considerably more impact on tomorrow's streamflow forecast than uncertainty introduced by the hydrologic models. Indeed, the deterministic scores in Figure 5.4 indicate no clear winner with respect to the −NWP and −DH or −MSP ensembles. On average, the −NWP, −DH and −MSP ensembles are very similar for days 1 and 2. Ignoring the large MAE of −DH (WaSiM), it appears that NWP error is most important at a lead time of 3 days. RMSESS also reveals that removal of the NWP ensemble component introduces more large forecast errors at this lead time. Excluding the MSP component does not introduce many large forecast errors. Overall, including an ensemble of different hydrologic modelling approaches is more advantageous than including a multi-state and multi-parameter component for a single hydrologic model for this reservoir.

Figure 5.4: MAE and RMSESS for ensemble median forecasts derived from the various M2M configurations. Perfect deterministic forecasts have MAE of zero and RMSESS of one.

The reliability and sharpness of the various probabilistic forecasting systems are assessed using the ignorance and continuous ranked probability scores shown in Figure 5.5. These scores indicate that the −MGS and Full ensemble probabilistic forecasts are of equally high quality for a lead time of one day, and that beyond this forecast horizon, the inclusion of high-resolution NWP models is important. Also, these results support the finding that including a multi-model NWP ensemble component is more important than a multi-grid scale NWP component, and that either is far superior to ignoring NWP uncertainty altogether.

Figure 5.5: Ignorance and continuous ranked probability scores for probabilistic forecasts derived from the various M2M configurations. Lower values are preferred for these scores.

Given that all of the configurations yield probabilistic forecasts with comparable (small) calibration deviations (D), the lower ignorance scores of the Full M2M forecasts indicate that they are better able to concentrate probability density in the right area (i.e., near the verifying observation) each day. Since the forecast PDFs are centred at the ensemble mean, lower ignorance scores are, as expected, generally associated with lower MAE and higher RMSESS. The slightly inflated ignorance scores of the −DH (WFLD) forecasts relative to the −DH (WaSiM) forecasts are likely caused by the introduction of sampling error by the probability calibration scheme applied to storm-season forecasts. Ensembles that ignore error associated with the hydrologic models (−DH and −MSP) have generally higher ignorance scores (i.e., are worse) than −NWP ensembles.
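For the Gaussian predictive distributions used here, both scores have closed forms (Gneiting et al., 2005), so no numerical integration is required; a minimal sketch is shown below. Analogous closed-form expressions exist for the log-normal storm-season forecasts.

    import numpy as np
    from scipy.stats import norm

    def ignorance(mu, sigma, x):
        # Ignorance: negative log (base 2) of the predictive density at x.
        return -norm.logpdf(x, loc=mu, scale=sigma) / np.log(2.0)

    def crps_gaussian(mu, sigma, x):
        # Closed-form CRPS for a N(mu, sigma^2) forecast and observation x.
        z = (x - mu) / sigma
        return sigma * (z * (2.0 * norm.cdf(z) - 1.0)
                        + 2.0 * norm.pdf(z) - 1.0 / np.sqrt(np.pi))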
The lower CRPS of the Full M2M ensemble forecasts indicates that its forecast PDFs are sharper than those derived from any of the reduced M2M configurations. Again, the exception occurs for day 1 forecasts, where the −MGS ensemble CRPS is slightly lower. The −DH (WaSiM) forecast PDFs have consistently higher spread than the −DH (WFLD) configuration, and this is reflected in their higher CRPS. In fact, forecasts from this configuration have generally higher and more variable spread than any other configuration; analysis of EMOS spread parameters reveals that distributional spread is often taken to be up to 8× the −DH (WaSiM) ensemble variance and is further inflated by the large ensemble mean errors (Figure 5.4). Removal of any multi-model NWP component (−MM or −NWP) also results in high spread, and this is again reflected by higher CRPS values. These results support the findings of Stensrud and Yussouf (2003), who illustrated that for temperature forecasting, multi-model weather ensembles improve the spread-skill relationship (the spread of the ensemble members should be related to the accuracy or skill of the ensemble mean; when the forecast is more certain, as indicated by low ensemble spread, errors are expected to be small).

5.4.2 Economic Value of Ensemble Components

Value curves for an inflow anomaly threshold of 70 m3/s are shown in Figure 5.6 for the Full 1-day M2M probability forecasts. Curves are plotted for probability thresholds pt of 0.02 (dashed line), 0.1, 0.2, ..., 0.9 (solid lines), and 0.98 (dash-dotted line). The heavy solid line is the envelope curve of optimal value achieved when each user chooses the probability threshold that maximizes the value for their α.

Figure 5.6: Forecast value as a function of user-specific cost-loss ratio α for the Full 1-day M2M probability forecast. Relative value of zero indicates that the forecasting system offers no benefits over climatology, while perfect forecasts have relative value of one.

For calibrated forecast probabilities, users will maximize the value they get from the forecasting system by choosing pt = α (Richardson, 2000). Deviations are caused by sampling variability; value curves for operationally insignificant inflows with greater sample sizes (e.g., 40 m3/s, not shown) are able to match pt values to their corresponding α ranges more closely. Note that probability thresholds above 0.7 are rarely exceeded even for the relatively low inflow anomaly threshold of 70 m3/s; the false alarm rate is zero, and it follows from Eq. (5.4) that the value converges to the hit rate H for α > s, indicated by the horizontal line segments in Figure 5.6. Forecasts issued with probability thresholds at or below 0.08 never result in forecast misses, thus the hit rate is one, and it follows from Eq. (5.4) that the value is equal to 1 − F for α < s. The value curves for pt of 0.8 and 0.9 are identical. These characteristics of the value curves are due to sampling limitations.
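The computation behind each of these curves is compact; the sketch below implements the standard static cost-loss relative value of Richardson (2000), which is what Eq. (5.4) expresses, in terms of the hit rate H, false alarm rate F, base rate s, and cost-loss ratio α. The numbers in the usage lines are illustrative only, but they reproduce the two limits noted above.

    def relative_value(H, F, s, alpha):
        # Relative value for a user with cost-loss ratio alpha, given hit
        # rate H, false alarm rate F, and climatological base rate s.
        # 0 = no better than climatology; 1 = perfect forecasts.
        numerator = (min(alpha, s) - F * alpha * (1.0 - s)
                     + H * s * (1.0 - alpha) - s)
        denominator = min(alpha, s) - s * alpha
        return numerator / denominator

    print(relative_value(H=0.8, F=0.0, s=0.02, alpha=0.10))  # 0.8 = H
    print(relative_value(H=1.0, F=0.3, s=0.02, alpha=0.01))  # 0.7 = 1 - F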
The impacts on relative value of removing various components of the M2M ensemble are shown in Figures 5.7 and 5.8. Envelope value curves are plotted for the Full ensemble probabilistic forecasts (solid black line) as well as for each reduced M2M configuration (coloured lines). Note that the envelope curve in Figure 5.7 does not match that in Figure 5.6 because the maximum value has not been constrained by the few pt shown in Figure 5.6. The range of α valid for operation of the Daisy Lake reservoir at each inflow threshold for Sm/Sc from 1 to 10 is indicated by the grey shaded area. Since the uncertainty models used in this study generate inflow forecasts over a continuous range of probability thresholds, differences in relative value between ensembles of varying size are due only to associated changes in forecast quality and not in the available resolution of pt (Richardson, 2000, 2001).

For events exceeding the 70 m3/s threshold, it can be seen in Figure 5.7 that day 1 forecast value is insensitive to most changes in M2M ensemble composition over the range of α valid for Daisy Lake. Only the −DH (WFLD) and −NWP (HR) ensembles result in forecast misses at low pt, resulting in significantly reduced value. The impact of the large spread of the −DH (WaSiM) ensemble can be seen in its slightly lower forecast value at low pt, where there are no forecast misses, but a large number of false alarms, indicating poor discrimination. The importance of including ensemble NWP input is apparent for users with α greater than approximately 0.08, where value decreases due to lower hit rates for inflow events of this size.

Figure 5.7: Forecast value as a function of user-specific cost-loss ratio α for the Full M2M probability forecast (black line), and the various reduced ensemble configurations (coloured lines). The range of α valid for Daisy Lake reservoir operation for Sm/Sc from 1 to 10 is indicated by grey shading.

Figure 5.8: Same as Figure 5.7, but for an inflow anomaly threshold of 100 m3/s.

At forecast lead times of two and three days, −NWP (LR) shows high value for a range of users and significantly greater value for low-α users. This is a result of the high spread of the model's fitted PDF resulting in fewer forecast misses at low probabilities. The same is true of the other −NWP ensembles and the −DH (WaSiM) configuration, relative to the Full ensemble. The increase in spread associated with these configurations also results in the forecasting systems issuing more false alarm forecasts. For low α, it follows from Eq. (5.4) that false alarms are not penalized very heavily, and value is impacted very little. For this reason, most of the M2M configurations show greater value than the Full ensemble over the range of α valid for the Daisy Lake reservoir operator.

Figure 5.8 shows that day 1 forecast value is more sensitive to ensemble configuration at an anomaly inflow threshold of 100 m3/s, particularly at low probability thresholds where most ensembles miss event forecasts more frequently. Over the range of α valid for the case study, the Full ensemble has more value than many other configurations. The −MGS has less value than the −MM ensemble at lead times of 2 and 3 days, supporting the conclusion that high-resolution NWP input becomes increasingly important with forecast lead time, particularly for high-impact events.
At lead times of two and three days, ensembles that neglect sources of NWP uncertainty have higher value than the Full ensemble, as high forecast spread leads to few forecast misses and many false alarms. Additionally, calibration deficiencies (or sampling limitations) for forecasts of high inflows cause maximum value curves to follow the individual value curves for very low pt over a wide range of α.

The actual expense associated with the various ensemble configurations over the two-year evaluation period can be estimated by evaluating the contingency table counts (Table 5.1) for a particular threshold. From Eq. (5.1), the actual expense is:

Eactual = L(aα + bα + c). (5.14)

In this example, the cost-loss ratio that incorporates climatological frequency for the 70 m3/s and 100 m3/s inflow anomaly thresholds and variable market pricing with Sm/Sc = 2.5 is used. Table 5.4 shows the contingency table elements and estimated costs for various ensemble configurations for the 70 m3/s threshold. Contingency table counts are based on forecasts with pt of 0.04 (recall that maximum value occurs for users who take action at pt equal to their α). Results for the 100 m3/s threshold are shown in Table 5.5 for pt = 0.08. In calculating the one-day loss L as in Eq. (5.8), turbine efficiency (η) is taken to be 0.9 (Gulliver, 1991), and the cost of energy (S) is assumed constant at $50/MWh.¹ Since the goal of this study is in estimating the relative value of the various M2M ensemble components, the use of these generic values (which may not be appropriate for the case study watershed and dates) is not a critical issue. Accurate estimates of Eactual are only necessary for comparison against the price paid for ensemble components, which is beyond the scope of this work.

¹Source: U.S. Energy Information Administration (2012). 2011 Brief: Wholesale electricity prices mostly lower in 2011. URL: http://www.eia.gov/todayinenergy/detail.cfm?id=4530. Retrieved May 21, 2013. Price quoted is the approximate average wholesale spot electricity price for the United States.

Table 5.4: Actual expenses incurred over the two-year evaluation period by using various M2M configurations for decision making at the 70 m3/s inflow anomaly threshold. Expenses are calculated for α = 0.036 using the probability threshold, pt, of 0.04. The loss L incurred for each missed forecast at this threshold is $216,543.

         Ensemble         Hit   False Alarm   Miss   Eactual   Cost
         configuration    (a)   (b)           (c)              estimate ($)
  Day 1  Full Ensemble    17    35            0      1.87L     405,400
         −MGS             17    33            0      1.80L     389,800
         −MM              17    33            0      1.80L     389,800
         −MSP             17    36            0      1.91L     413,200
         −DH (WFLD)       16    37            1      2.91L     629,700
         −DH (WaSiM)      16    39            1      2.98L     645,300
         −NWP (LR)        17    35            1      2.87L     621,900
         −NWP (HR)        15    38            2      3.91L     846,300
  Day 2  Full Ensemble    13    41            4      5.94L     1,287,100
         −MGS             13    46            4      6.12L     1,326,100
         −MM              15    50            2      4.34L     939,800
         −MSP             13    46            4      6.12L     1,326,100
         −DH (WFLD)       12    43            5      6.98L     1,511,500
         −DH (WaSiM)      14    64            3      5.81L     1,257,700
         −NWP (LR)        15    53            2      4.45L     963,200
         −NWP (HR)        14    52            3      5.38L     1,164,100
  Day 3  Full Ensemble    11    45            5      7.02L     1,519,300
         −MGS             10    56            6      8.38L     1,813,800
         −MM              12    52            4      6.30L     1,365,100
         −MSP             12    59            4      6.56L     1,419,700
         −DH (WFLD)       8     51            8      10.12L    2,192,300
         −DH (WaSiM)      13    85            3      6.53L     1,413,600
         −NWP (LR)        12    60            4      6.59L     1,427,500
         −NWP (HR)        12    55            4      6.41L     1,388,500
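The cost column of Table 5.4 follows directly from Eq. (5.14); for example, the day-1 Full ensemble entry can be reproduced as below.

    alpha = 0.036     # cost-loss ratio at the 70 m3/s threshold (Table 5.3)
    L = 216_543.0     # one-day loss ($) for a missed event, from Eq. (5.8)

    def actual_expense(a, b, c):
        # Eq. (5.14): hits (a) and false alarms (b) each incur the
        # mitigation cost alpha*L; misses (c) incur the full loss L.
        return L * (a * alpha + b * alpha + c)

    print(actual_expense(a=17, b=35, c=0))   # -> 405,368, i.e. 1.87L,
                                             #    tabulated as 405,400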
At the 70 m3/s threshold, Table 5.4 shows reduced costs associated with the −MGS ensemble at the 1-day forecast horizon. This is in agreement with results from Figure 5.4 and Figure 5.5, which show the −MGS ensemble to be at least as good as the Full ensemble at this lead time. For forecast horizons of 2 and 3 days, ensembles with lower costs than the Full ensemble are those with high spread (e.g., −MM, −NWP and −DH (WaSiM)). These forecasting systems have fewer forecast misses but also a significantly greater number of false alarms, which are not heavily penalized because of the low cost-loss ratio.

Table 5.5: Actual expenses incurred over the two-year evaluation period by using various M2M forecasts for decision making at the 100 m3/s inflow anomaly threshold. Expenses are calculated for α = 0.075 using a probability threshold, pt, of 0.08. L for this threshold is $309,348.

         Ensemble         Hit   False Alarm   Miss   Eactual   Cost
         configuration    (a)   (b)           (c)              estimate ($)
  Day 1  Full Ensemble    7     7             1      2.05L     634,200
         −MGS             5     8             3      3.98L     1,229,700
         −MM              5     11            3      4.20L     1,299,300
         −MSP             7     10            1      2.28L     703,800
         −DH (WFLD)       6     12            2      3.35L     1,036,300
         −DH (WaSiM)      6     9             2      3.13L     966,700
         −NWP (LR)        5     10            3      4.13L     1,276,100
         −NWP (HR)        6     9             2      3.13L     966,700
  Day 2  Full Ensemble    6     12            2      3.35L     1,036,300
         −MGS             5     15            3      4.50L     1,392,100
         −MM              5     13            3      4.35L     1,345,700
         −MSP             6     15            2      3.58L     1,105,900
         −DH (WFLD)       6     15            2      3.58L     1,105,900
         −DH (WaSiM)      5     17            3      4.65L     1,438,500
         −NWP (LR)        6     14            2      3.50L     1,082,700
         −NWP (HR)        5     10            3      4.13L     1,276,100
  Day 3  Full Ensemble    4     12            4      5.20L     1,608,600
         −MGS             4     13            4      5.28L     1,631,800
         −MM              5     12            3      4.28L     1,322,500
         −MSP             4     12            4      5.20L     1,608,600
         −DH (WFLD)       4     7             4      4.83L     1,492,600
         −DH (WaSiM)      4     19            4      5.73L     1,771,000
         −NWP (LR)        4     17            4      5.58L     1,724,600
         −NWP (HR)        5     8             3      3.98L     1,229,700

At the 100 m3/s threshold, the Full ensemble is markedly superior to the reduced configurations for day 1 forecasts, and slightly less so for day 2 forecasts. At a lead time of 3 days, some of the M2M configurations that neglect NWP uncertainty lead to cost reductions of up to 24%. The fact that forecast misses are reduced without inflation of false alarms suggests that these typically high-spread ensembles have a tendency to underforecast extreme events. Observed anomaly inflows above the 100 m3/s threshold during the case-study period are all driven by precipitation events with strong orographic gradients and, in two cases, rain-on-snow contributions. This again points to the importance of including NWP uncertainty in the ensemble, particularly as forecast lead time increases. Unfortunately, given the small sample size evaluated for this inflow anomaly threshold, it is difficult to draw any strong conclusions.

Note that the relative costs in Tables 5.4 and 5.5 do not necessarily correspond to the maximum value curves shown in Figures 5.7 and 5.8. Despite all of the ensembles being calibrated as measured over all inflows, calibration deviation can still exist for particular thresholds. This may lead to non-optimal decision making when action is taken at pt = α. For example, Figure 5.7 suggests that users taking action using day 2 forecasts with α = 0.04 will achieve the least forecast value by using the Full ensemble, and that value would be maximized by using one of −MM, −DH (WFLD), or −NWP (LR).
However, Table 5.4 reveals that due to calibration deficiencies in all of the ensembles at this high threshold (which may be artifacts of sampling limitations), the Full ensemble actually has less associated cost than several configurations, and −DH (WFLD) has the highest cost.

For low α, it follows from Eq. (5.3) that false alarms result in very little decrease in value, whereas forecast misses result in significant costs. As discussed by Murphy (1994), it is impossible to know the economic value of forecasts to a particular user unless you know that a forecast resulted in the user taking action, have detailed knowledge of the decision-making processes of the user, and know the skill of the forecasts. Thus, it is reasonable, given the assumptions made to simplify the Daisy Lake cost-loss model, to assume that the decision maker won't necessarily take action at the low probabilities used in Tables 5.4 and 5.5. In spite of the economic value analysis shown here, the high number of false alarms issued by some of the forecasting systems would likely lead the reservoir operator to begin to ignore these warnings, which would result in significant losses when the event finally did occur.

5.5 Conclusions

It has been argued that in order to produce truly probabilistic forecasts of hydrologic phenomena, it is necessary to sample all sources of error (Krzysztofowicz, 2001). Unfortunately, accounting for each of these sources of error comes at a price, either in terms of money or time spent. Therefore, this chapter has been devoted to an economic analysis of each error-sampling component of the M2M ensemble.

In order to evaluate the economic value of each component of the M2M ensemble, a simple, static cost-loss model has been applied to simulate a hypothetical reservoir operator at the Daisy Lake reservoir who is assumed to be sensitive to certain inflow thresholds. By modelling the decision-making process based on the use of the Full ensemble and on ensemble configurations with individual components removed, it is possible to draw some general conclusions about the relative value of each component. This value provided to the forecast end user does not include any reductions in value due to computational, time, or monetary costs associated with the various ensemble components. It is therefore up to the individual forecast end user to weigh the value of each ensemble component against the price paid for that component.

The impacts of some M2M components cannot be isolated. For example, the cost difference between the Full ensemble and the −DH (WFLD) configuration cannot be attributed solely to the addition or removal of the WaSiM hydrologic model, because each model uses different schemes to downscale the driving NWP fields, and the differently-optimized parameter sets and initial conditions for the models are not equivalent. Thus, it is only possible to make general observations about the relative importance of different types of ensemble configurations.

Based on a comparison of actual operating costs associated with taking action using the variety of M2M ensemble configurations, it can be seen that for inflow anomalies above the 70 m3/s threshold, the inclusion of multiple distributed hydrologic models is worth approximately $120,000 annually for forecast lead times of 1 day (excluding the price paid for this ensemble component). Other configurations give inconclusive results.
For the 100 m3/s threshold, sampling aspects of NWP uncertainty is, on average, worth $280,000/year at a lead time of 1 day, and $119,000/year at a lead time of 2 days. DH uncertainty at this threshold is worth slightly less, at $180,000 and $118,000 annually for forecast days 1 and 2, respectively. The MSP component is worth $35,000/year for both lead times at the 100 m3/s threshold. This analysis could be used to determine whether the case-specific prices associated with including multiple DH models and/or multi-model or multi-grid scale NWP ensembles are offset by the benefits provided by their inclusion.

This economic analysis, though based on a relatively small evaluation sample size, can be used in conjunction with more robust metrics of deterministic and probabilistic forecast quality and skill to draw some useful conclusions for probability forecasting in the case-study watershed. Across all lead times, the Full M2M ensemble is generally superior to any of the reduced ensemble configurations. Exclusion of the multi-grid scale NWP ensemble component is not detrimental at lead times of one day, though the importance of high-resolution NWP model output through the use of a multi-grid scale ensemble is apparent as lead time increases. Including a multi-hydrologic-model component is important at all lead times, and is more important than including multiple parameter sets and hydrologic states (which cannot be separated in this particular case). Forecast false alarms can be avoided by forecast PDFs that do not overestimate spread. In this case study, the Full M2M ensemble is superior to the other configurations in its skill in predicting appropriate forecast spread using the EMOS uncertainty model. Note that while larger ensemble sizes tend to perform better in this respect, ensemble size does not appear to be the primary driver; the −NWP configurations have the smallest ensemble sizes, whereas the distributional spread predicted by the −DH (WaSiM) configuration is often the greatest. Ensemble mean errors, which are used by EMOS to increase PDF spread when ensemble variance is inadequate, appear to be more important in this respect.

Uncertainty in hydrologic model predictions has led to numerous recommendations for the quantification of this uncertainty and the use of probabilistic forecasting frameworks (e.g., Kitanidis and Bras, 1980; Krzysztofowicz, 2001; Beven, 2006; Liu and Gupta, 2007; De Roo et al., 2011). The use of weather ensembles has been shown to provide added skill for flood warning as compared to deterministic forecasts (e.g., Gouweleeuw et al., 2005; Roulin and Vannitsem, 2005; Thirel et al., 2010), and Pappenberger et al. (2008) have shown that super-ensembles combining the ensembles of various forecasting centres have significant added reliability and value. Ensemble mean forecasts derived from multiple hydrologic models have been demonstrated to have overall superior performance to the best individual member, even when models with non-optimized parameters are included (e.g., Shamseldin et al., 1997; Coulibaly et al., 2005; Ajami et al., 2006).
Forecast quality has also been shown to improve with the inclusion of different hydrologic states (e.g., McMillan et al., 2013), differently-optimized hydrologic model parameterizations (e.g., Duan et al., 2007; Vrugt et al., 2003a,b), and combinations of the two (Moradkhani et al., 2005b; Vrugt et al., 2005).

The results presented in this chapter, while specific to the case-study watershed and the models applied, are generally in agreement with the above-cited literature. That is, the importance of sampling all sources of error is clear in terms of forecast quality, skill, and to a lesser extent, value. This study has gone a step beyond its predecessors and attempted to assign an actual monetary value to the importance of sampling the various sources of uncertainty. Whether or not each ensemble component is worthwhile is likely to be case-dependent, but it is anticipated that for rainfall-driven flows, the inclusion of multiple NWP models will be worthwhile. The necessity for potentially costly high-resolution NWP model output may be restricted to predictions in complex terrain where orographic effects are particularly important, or in regions subject to convective rainfall. Including multiple hydrologic models is likely to be advantageous in many applications. In applications where ensemble data assimilation methods are feasible and provide hydrologic state estimates of high quality (e.g., Andreadis and Lettenmaier, 2006; Clark et al., 2008; Pauwels and De Lannoy, 2006), they may contribute added value. This is an area for potential future exploration.

Chapter 6

Conclusions

The goal of this dissertation was to generate reliable probabilistic forecasts of inflow for a hydroelectric reservoir in complex terrain. This has been achieved through a combination of explicit and implicit sampling of the various sources of uncertainty in the hydrologic modelling chain.

6.1 Summary of Methods and Procedures

Predictions derived from hydrologic models are uncertain due to errors in: (1) the hydrologic model's structure; (2) the meteorological data (observed or modelled) used to drive the hydrologic model; (3) the initial conditions or hydrologic state used to start the forecast run; and (4) the parameterization of the hydrologic models.

In order to generate a probabilistic forecast of reservoir inflows to the Daisy Lake reservoir located in southwestern British Columbia, Canada, a forecasting framework has been created in which all of these sources of error are explicitly sampled. The probabilistic forecasting framework was built incrementally throughout the dissertation:

• Reservoir inflow forecast uncertainty stemming from the hydrologic models themselves and the meteorological data used to drive these models was addressed in Chapter 2. This was done by using the individual members of a multi-model, multi-grid scale numerical weather prediction (NWP) ensemble to drive two different distributed hydrologic models. Uncertainty introduced into the modelling chain by the procedures used to downscale the driving data from NWP to hydrologic model grid scale was sampled by using multiple interpolation schemes for the lowest resolution NWP fields, where this uncertainty is greatest.

• In Chapter 3, this Member-to-Member (M2M) forecasting system was expanded to account for errors introduced via hydrologic model parameterization and the hydrologic state or initial conditions used to begin each daily inflow forecast.
Parameter uncertainty was sampled by optimizing inflows simulated by the two hydrologic models using three different objective functions to improve different aspects of inflow forecast quality. The multi-state component of this expanded M2M ensemble arises as a direct result of the multi-parameter component because of the way in which the daily hydrologic state is updated.

• Prior to combining the individual ensemble members into ensemble mean or probabilistic forecasts, a simple bias correction scheme was applied to each ensemble member to remove systematic errors introduced by the dynamical models. The bias correction factor was determined by comparing the total forecasted inflow volume over a training period of specified length to the corresponding total observed inflow volume. A variety of training period lengths were tested in Chapter 2.

• The resulting 72-member inflow forecast ensemble was transformed into a probabilistic forecasting system in Chapter 4 by applying suitable uncertainty models. The uncertainty models fit a probability distribution function (PDF) to the ensemble whereby the spread of the distribution was related to the variance of the ensemble members and recent ensemble mean errors.

• An intelligent probability calibration scheme was applied to the probability forecasts to improve reliability during periods when the uncertainty model produced forecasts deemed to be sufficiently uncalibrated. The probability calibrator relabels forecast probabilities based on the distribution of probability integral transform (PIT) values over some past training period.

• The price paid for generating an ensemble hydrologic forecasting system that explicitly samples all sources of uncertainty in the modelling chain may be excessive for operational forecasting applications. Therefore, the approximate economic value added by each of the M2M components developed in Chapters 2 and 3 was estimated in Chapter 5 using a simple cost-loss decision model adapted specifically for the hydroelectric energy sector.

6.2 Summary of Findings

A number of findings were made based on evaluation of the probabilistic inflow forecasting system and its various components:

• The addition of the multi-state, multi-parameter M2M components to the multi-NWP, multi-distributed-hydrologic-model ensemble increases forecast resolution by improving the forecasting "engine" that generates the ensemble. This is the most important aspect of ensemble forecast quality because, unlike reliability, it cannot be corrected using probability calibration methods. The full 72-member M2M ensemble was found to be underdispersive in spite of attempting to account for all sources of error (Chapter 3).

• Bias in the hydrologic state used to begin each daily inflow forecast was found to be the primary source of bias in the forecast. Because of this, and the flashy mountainous nature of the case study watershed, a short bias-correction training window of three days was found to be ideal for correcting forecast bias and other measures of deterministic (ensemble mean) forecast quality. The bias corrector also significantly improved ensemble forecast resolution and discrimination. A degree-of-mass-balance bias correction scheme that weights more recent information more heavily performed better than a scheme with equal weighting (Chapter 2; see the sketch following this list).

• In the case study watershed, forecast error characteristics were found to change with the seasons.
During the fall-winter storm season, errors are approximately log-normally distributed; a log-normal PDF fitted to the ensemble during this period produced reliable forecasts. Spring-summer inflows are driven by snowmelt, and since forecast errors are normally distributed during this season, reliable forecasts were generated using a simple Gaussian uncertainty model. These ensemble model output statistics (EMOS) uncertainty models were able to correct for the spread-deficiency of the M2M ensemble (Chapter 4).

• Since the probabilistic forecasts were already well calibrated, the PIT-based calibrator had a tendency to increase forecast ignorance scores by introducing sampling error. Examination of forecast error characteristics led to the development of an alternative "carry-forward" calibration strategy that was able to improve forecast sharpness and therefore decrease ignorance during the warm season (Section 4.4.2).

• A sensitivity comparison of the full 72-member M2M ensemble forecasting system to alternative M2M configurations with individual ensemble components removed revealed that explicit sampling of all sources of error improves many facets of forecast quality. At short lead times, a multi-NWP ensemble component was not critical, but was found to become important with increasing lead time (Chapter 5).

• Using a simple cost-loss decision model, NWP uncertainty sampling was found to have the greatest economic value for management of the case study watershed, followed by uncertainty introduced by hydrologic model structure. Ensembles with poor spread-skill relationships were found to have high value in spite of often issuing forecast false alarms, which were not heavily penalized by the cost-loss model (Chapter 5).
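A minimal sketch of a degree-of-mass-balance corrector of the kind referenced above is given below. The geometric weighting is an illustrative stand-in for the recency weighting described in Chapter 2, whose exact form is not reproduced here.

    import numpy as np

    def dmb_correct(fcst_today, past_fcst, past_obs):
        # Degree of mass balance: the weighted ratio of forecast to observed
        # inflow volume over a short training window (e.g., the last 3 days,
        # arrays ordered oldest to newest) is divided out of today's forecast.
        past_fcst = np.asarray(past_fcst, dtype=float)
        past_obs = np.asarray(past_obs, dtype=float)
        weights = 2.0 ** np.arange(past_fcst.size)  # most recent day weighted most
        dmb = np.sum(weights * past_fcst) / np.sum(weights * past_obs)
        return fcst_today / dmb

    # A member that has recently over-forecast inflow volume by 20%
    # (DMB = 1.2) has today's forecast scaled down by that factor.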
6.3 Potential Applications

The methods outlined in this dissertation are simple and can be easily adopted for any number of hydrologic modelling applications. Predictions in ungauged basins (PUB) are highly uncertain, as they are generally based on the premise that data from a gauged basin can be applied in other locations (Sivapalan et al., 2003). The M2M error sampling approach would be a viable method for estimating uncertainty in PUBs because of its relatively small data requirements. Another hydrologic application for the ensemble methods discussed herein is in predicting the impacts of climate change on hydrologic processes using global climate model (GCM) output. Uncertainty in GCM prediction is caused by errors in estimations of future greenhouse gas emissions, climate sensitivity, regional responses, and changes in the intensity and frequency of weather extremes (Eckhardt and Ulbrich, 2003; Wilby et al., 2006). The choices made in estimating these model parameters as well as the choice of global climate model result in a broad range of possible future scenarios (Christensen et al., 2004). In cases where climate change may result in land cover changes (e.g., glacier retreat or changes in vegetation), uncertainty in land cover data used by the hydrologic models should also be incorporated. Note that in climate change applications, statistical post-processing methods would need to assume stationarity, as adaptive updating of correction parameters would be impossible. Statistical corrections based on past data may be invalid in future climates (Hay et al., 2002; Fowler et al., 2007).

Applications of the M2M probabilistic forecasting framework are certainly not limited to hydrology. All uncertain forecasts should be expressed probabilistically in order to convey this uncertainty; any forecasting system involving multiple uncertain components could make use of the simple error-sampling strategy employed in this dissertation, particularly if these uncertainties interact non-linearly. As an example, consider modelling the transport of forest fire smoke. These forecasts are subject to uncertainty in the plume model structure (for example, whether it tracks large-scale puffs or individual particles), in the meteorological forecasts used to drive the plume model, in the assimilation of forest fire data, and in errors in fire fuel loading estimates and the resulting emissions forecasts. A M2M ensemble strategy would be a suitable and relatively simple method of estimating uncertainty in these and other air-quality forecasts.

6.4 Limitations and Recommendations for Future Work

The primary limitation of the analysis in this dissertation is the short forecast lead time afforded by the high-resolution NWP models used in the M2M ensemble. Building on these results, the next stage of research should examine the relative importance of various sources of inflow forecast error at longer lead times. This would be done by adding medium-range NWP forecasts to the M2M ensemble.

At longer lead times, it is anticipated that a different bias correction strategy from that developed in Chapter 2 would be necessary. As the forecast horizon increases, end forecast bias arising from NWP models will likely begin to outweigh that caused by bias in hydrologic state. Longer bias correction training windows may be necessary to correct the NWP errors (e.g., McCollor and Stull, 2008a). Different bias correction schemes should also be tested, such as seasonal degree-of-mass-balance (DMB) calculation or the robust best easy systematic estimator of Woodcock and Engel (2005), both of which are suitable for correcting daily hydrometeorological forecasts (McCollor and Stull, 2008a).

Another expansion of the M2M ensemble forecasting framework that warrants examination is the addition of conceptual and soft computing hydrologic models. Because of the choices made by the developers of physically oriented models such as WaSiM and WATFLOOD, each is good at simulating different parts of the hydrologic cycle. In addition, the subjective choices made in developing models based on soft computing approaches (e.g., auto-regressive methods, artificial neural networks, and fuzzy expert systems) can result in models that perform well at different times (Han et al., 2007). By combining predictions from different models and from different modelling approaches, it is possible to take advantage of the expertise of each of them, theoretically resulting in better overall predictive capabilities.

The methods used in this dissertation to sample uncertainty arising from hydrologic model parameterization and hydrologic state were extremely simple and likely contributed to the lack of dispersiveness exhibited by the full M2M ensemble. Future work should evaluate the merits of more advanced methods of ensemble data assimilation and parameter optimization in a M2M forecasting framework.

The use of data assimilation methods such as ensemble Kalman filtering and particle filters (e.g., Moradkhani et al., 2005b,a; Moradkhani and Sorooshian, 2009; DeChant and Moradkhani, 2011b; Leisenring and Moradkhani, 2011) is confounded by a lack of observed hydrologic state data within the case study watershed.
It is anticipated that the use of such methods would greatly improve the sampling of initial condition uncertainty in the M2M framework, and could potentially correct the underdispersiveness of the ensemble developed in this study. The deployment of additional observing stations within the watershed would make such methods feasible (though their computational complexity is of concern), and could potentially result in forecast (and therefore economic) improvements that outweigh the cost of installation and maintenance of these stations.

Other ensemble parameter optimization methods are also available, including the Shuffled Complex Evolution Metropolis algorithm (SCEM-UA; Vrugt et al., 2003b) and its extension, the Multi-Objective Shuffled Complex Evolution Metropolis algorithm (MOSCEM-UA; Vrugt et al., 2003a). Such methods are based on the idea that a search of the feasible parameter space near the optimum parameter set will reveal many sets that are equally capable of producing simulations and forecasts of high quality. Whether it is preferable to perturb parameter values around their optimum values or to use different objective functions to optimize an ensemble of parameterizations is an area needing further research. Additionally, the dynamically dimensioned search algorithm used in this dissertation was developed specifically for high-dimensional optimization problems associated with distributed hydrologic models, and may be hard to beat in practice (Tolson and Shoemaker, 2007). Dual state-parameter estimation methods that allow parameters to evolve in time could be employed for more complete handling of parameter and initial condition uncertainty if additional observed hydrologic state data were available within the case study watershed (Moradkhani et al., 2005a,b; DeChant and Moradkhani, 2011b; Leisenring and Moradkhani, 2011). Parameter optimization that incorporates knowledge of climate signals such as the El Niño-Southern Oscillation state may also be a worthwhile area of future research, as recommended in Chapter 2.

Another likely contributor to the underdispersiveness of the M2M ensemble is its (implicit) assumption of perfect observations. Ideally, model input uncertainty should be incorporated into the model parameter optimization procedure to avoid biased or misleading model output (Kavetski et al., 2006a). Data uncertainty can be incorporated into hydrologic modelling using Bayesian methods, but computational complexity is a concern (Kavetski et al., 2006b). Neglect of observational error has an additional impact on forecast verification, an impact that could actually have an opposite effect on apparent M2M dispersiveness. That is, it is entirely possible that after accounting for errors in the calculated Daisy Lake inflow "observations", they would fall within the bounds of the M2M ensemble more often.

Determination of candidate PDF shapes in Chapter 4 was based on analysis of empirical ensemble mean forecast error distributions over one year. The storm-season forecast error distribution was found to have a very high peak and a slight positive skew. Based on this and on a review of the literature on probabilistic hydrologic modelling, the log-normal distribution was selected to model the forecast PDF during the storm season, and the method did indeed produce reliable forecasts. An area of potential future study should include testing the performance of other PDF shapes such as the Gamma or Weibull distributions (Wilks, 2006). Alternatively, the Gaussian PDF could be used following data reexpression using a power transformation such as the Box-Cox transformation (Box and Cox, 1964).
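For reference, the Box-Cox family is y = (x^λ − 1)/λ for λ ≠ 0, with y = ln(x) recovered in the limit λ = 0; the short sketch below uses SciPy to estimate λ by maximum likelihood on a set of illustrative (made-up) inflow values.

    import numpy as np
    from scipy import stats

    inflows = np.array([12.0, 35.0, 8.0, 110.0, 42.0])  # illustrative data (m3/s)
    transformed, lam = stats.boxcox(inflows)  # lam fitted by maximum likelihood

    # A Gaussian uncertainty model could then be fitted to the transformed
    # flows; lam near 0 would recover the storm-season log transform.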
Since Chapter 4 is the first application of the inteliCal probability calibrator, it is not yet known how large the calibration deviation should be relative to the expected deviation before the PIT-based calibration is applied. This sensitivity is controlled by the inteliCal adjustment factor (ICF) in Eq. (4.9), whereby higher values of ICF result in less frequent calibration. For nearly-calibrated storm season forecasts in this case study, an ICF of approximately 1.67 seems to balance the apparently competing objectives of improving calibration without increasing ignorance. During the warm season, when calibration is good but ignorance is high, an ICF of 1.0 to 1.43 provides great improvements to forecast ignorance. While an ICF in the range of 1.43 to 1.67 therefore appears to be a suitable compromise for maximizing both storm season and warm season forecast quality, further testing of the inteliCal scheme is required to determine whether these results are case-specific.

Another limitation of Chapter 4 is the way in which the warm season and storm season were defined, and therefore how the uncertainty model was changed between seasons. The strategy employed (whereby the models were switched on pre-defined dates based on climatological flow characteristics) likely had very little impact on the verification metrics in Chapters 4 and 5. However, the change in forecasting system, if not correctly timed, could result in non-optimal forecasts with significant impacts on reservoir operation. An alternative would be to change the uncertainty model when flows are observed to have undergone the transition between seasons.

The simple (static) cost-loss decision model used in Chapter 5 to estimate the economic value provided by the various M2M components exhibited a tendency to give high value to forecasts with large uncertainty bounds. These forecasting systems give fewer forecast misses than their sharper counterparts, but also issue many forecast false alarms. Making use of a wide variety of verification scores can help to weed out such poor forecasting systems. However, estimates of economic value should ideally incorporate more knowledge about the reservoir operator's decision-making process and how they react to false alarms. It is likely that after many instances of the forecasting system "crying wolf", the decision maker will begin to ignore such forecasts. This could result in large economic losses when the event finally did occur.

The analysis in Chapter 5 would also benefit from a more robust verification sample size for operationally significant inflows. This would require a longer record of forecasts and observations. This analysis also ignored costs associated with setting up the full M2M ensemble forecasting system (e.g., time spent on model setup, price paid for high-resolution NWP forecasts), because they were a non-issue in this research setting. Future economic analyses of this type for operational forecasting will require an estimate of the price paid for each component in order to determine the feasibility of the M2M approach.

Verification in this dissertation was based on a comparison of daily average inflow rates. This was necessary given the "observed" inflow data available for verification.
The use of daily averages likely results in an inflation of apparent forecast quality relative to what might be achieved by verifying over shorter averaging periods. Indeed, for quantitative precipitation forecasts (QPF) derived from high-resolution NWP models, lengthening the accumulation period used for verification significantly increases QPF skill because timing differences become less important (Stensrud and Yussouf, 2007; Mass et al., 2002). Hydrologic forecasting applications that make use of high-resolution NWP forecasts and that require forecasts with sub-daily time steps should consider these caveats when carrying out forecast evaluation.

Bibliography

Ajami, N. K., Q. Duan, X. Gao, and S. Sorooshian, 2006: Multimodel combination techniques for analysis of hydrological simulations: Application to distributed model intercomparison project results. Journal of Hydrometeorology, 7, 755-768.
Alila, Y. and J. Beckers, 2001: Using numerical modeling to address hydrologic forest management issues in British Columbia. Hydrological Processes, 15, 3371-3387.
Anderson, E. A., 1973: National Weather Service river forecast system - snow accumulation and ablation model. Tech. Rep. NOAA Technical Memorandum NWS Hydro-17, U.S. Department of Commerce, 217 pp.
Anderson, J. L., 1996: A method for producing and evaluating probabilistic forecasts from ensemble model integrations. Journal of Climate, 9, 1518-1530.
Andreadis, K. M. and D. P. Lettenmaier, 2006: Assimilating remotely sensed snow observations into a macroscale hydrology model. Advances in Water Resources, 29, 872-886.
Benoit, R., M. Desgagné, P. Pellerin, S. Pellerin, Y. Chartier, and S. Desjardins, 1997: The Canadian MC2: A semi-Lagrangian, semi-implicit wideband atmospheric model suited for finescale process studies and simulation. Monthly Weather Review, 125, 2382-2415.
Beven, K., 2006: On undermining the science? Hydrological Processes, 20, 3141-3146.
Beven, K. and A. Binley, 1992: The future of distributed models: Model calibration and uncertainty prediction. Hydrological Processes, 6, 279-289.
Beven, K. J. and M. J. Kirkby, 1979: A physically based, variable contributing area model of basin hydrology. Hydrol. Sci. Bull., 24, 43-69.
Binley, A. M., K. J. Beven, A. Calver, and L. G. Watts, 1991: Changing responses in hydrology: Assessing the uncertainty in physically based model predictions. Water Resources Research, 27, 1253-1261.
Bourdin, D. R., S. W. Fleming, and R. B. Stull, 2012: Streamflow modelling: A primer on applications, approaches and challenges. Atmosphere-Ocean, 50, 507-536.
Box, G. E. P. and D. R. Cox, 1964: An analysis of transformations. Journal of the Royal Statistical Society, 26, 211-252.
Brier, G. W., 1950: Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78, 1-3.
Brocker, J. and L. A. Smith, 2007: Increasing the reliability of reliability diagrams. Weather and Forecasting, 22, 651-661.
Brun, S. E. and L. E. Band, 2000: Simulating runoff behavior in an urbanizing watershed. Computers, Environment and Urban Systems, 24, 5-22.
Buizza, R., 1997: Potential forecast skill of ensemble prediction and spread and skill distributions of the ECMWF ensemble prediction system. Monthly Weather Review, 125, 99-119.
Candille, G., S. Beauregard, and N. Gagnon, 2010: Bias correction and multiensemble in the NAEFS context or how to get a "free calibration" through a multiensemble approach. Monthly Weather Review, 138, 4268-4281.
Carpenter, T. M. and K. P. Georgakakos, 2006: Intercomparison of lumped versus distributed hydrologic model ensemble simulations on operational forecast scales. Journal of Hydrology, 329, 174-185.
Chow, V. T., 1954: The log-probability law and its engineering applications. Proceedings of the American Society of Civil Engineers, 80, 536-1 to 536-25.
Christensen, N. S., A. W. Wood, N. Voisin, D. P. Lettenmaier, and R. N. Palmer, 2004: The effects of climate change on the hydrology and water resources of the Colorado River basin. Climatic Change, 62, 337-363.
Clark, M. P., D. E. Rupp, R. A. Woods, X. Zheng, R. P. Ibbitt, A. G. Slater, J. Schmidt, and M. J. Uddstrom, 2008: Hydrological data assimilation with the ensemble Kalman filter: Use of streamflow observations to update states in a distributed hydrological model. Advances in Water Resources, 31, 1309-1324.
Cloke, H. L. and F. Pappenberger, 2009: Ensemble flood forecasting: A review. Journal of Hydrology, 375, 613-626.
Coulibaly, P., 2003: Impact of meteorological predictions on real-time spring flow forecasting. Hydrological Processes, 17, 3791-3801.
Coulibaly, P., M. Haché, V. Fortin, and B. Bobée, 2005: Improving daily reservoir inflow forecasts with model combination. Journal of Hydrologic Engineering, 10, 91-99.
Daly, C., 2006: Guidelines for assessing the suitability of spatial climate data sets. International Journal of Climatology, 26, 707-721.
Day, G. N., 1985: Extended streamflow forecasting using NWSRFS. Journal of Water Resources Planning and Management, 111, 157-170.
De Roo, A., et al., 2011: Quality control, validation and user feedback of the European Flood Alert System (EFAS). International Journal of Digital Earth, 4, 77-90.
DeChant, C. M. and H. Moradkhani, 2011a: Improving the characterization of initial condition for ensemble streamflow prediction using data assimilation. Hydrology and Earth System Sciences, 15, 3399-3410.
DeChant, C. M. and H. Moradkhani, 2011b: Radiance data assimilation for operational snow and streamflow forecasting. Advances in Water Resources, 34, 351-364.
Dettinger, M. D., D. R. Cayan, H. F. Diaz, and D. M. Meko, 1998: North-south precipitation patterns in western North America on interannual-to-decadal timescales. Journal of Climate, 11, 3095-3111.
Draper, A. J., M. W. Jenkins, K. W. Kirby, J. R. Lund, and R. E. Howitt, 2003: Economic-engineering optimization for California water management. Journal of Water Resources Planning and Management, 129, 155-164.
Druce, D. J., 1990: Incorporating daily flood control objectives into a monthly stochastic dynamic programming model for a hydroelectric complex. Water Resources Research, 26, 5-11.
Duan, Q., N. K. Ajami, X. Gao, and S. Sorooshian, 2007: Multi-model ensemble hydrologic prediction using Bayesian model averaging. Advances in Water Resources, 30, 1371-1386.
Ebert, E. E., 2001: Ability of a poor man's ensemble to predict the probability and distribution of precipitation. Monthly Weather Review, 129, 2461-2480.
Eckel, F. A. and M. K. Walters, 1998: Calibrated probabilistic quantitative precipitation forecasts based on the MRF ensemble. Weather and Forecasting, 13, 1132-1147.
Eckhardt, K. and U. Ulbrich, 2003: Potential impacts of climate change on groundwater recharge and streamflow in a central European low mountain range. Journal of Hydrology, 284, 244-252.
Environment Canada, 2013: Ensemble forecasts: Definition of the control model and perturbed models. URL http://weather.gc.ca/ensemble/verifs/model_e.html, retrieved May 22, 2013.
Erven, L. N., 2012: An observational study of slope air and free air wintertime temperatures in Whistler Valley, British Columbia, Canada. M.S. thesis, Department of Earth and Ocean Sciences, University of British Columbia, 112 pp.
Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. Journal of Geophysical Research, 99, 10143-10162.
Fane, L. A., 2003: Generalized optimization in the British Columbia hydroelectric system. M.S. thesis, Department of Civil Engineering, University of British Columbia, 109 pp.
Fleming, S. W., F. A. Weber, and S. Weston, 2010: Multiobjective, manifoldly constrained Monte Carlo optimization and uncertainty estimation for an operational hydrologic forecast model. American Meteorological Society Annual Meeting, Atlanta, Georgia.
Fleming, S. W. and P. H. Whitfield, 2010: Spatiotemporal mapping of ENSO and PDO surface meteorological signals in British Columbia, Yukon, and Southeast Alaska. Atmosphere-Ocean, 48, 122-131.
Fleming, S. W., P. H. Whitfield, R. D. Moore, and E. J. Quilty, 2007: Regime-dependent streamflow sensitivities to Pacific climate modes across the Georgia-Puget transboundary ecoregion. Hydrological Processes, 21, 3264-3287.
Fowler, H. J., S. Blenkinsop, and C. Tebaldi, 2007: Linking climate change modelling to impacts studies: recent advances in downscaling techniques for hydrological modelling. International Journal of Climatology, 27, 1547-1578.
Francke, T., 2012: Particle swarm optimization and dynamically dimensioned search, optionally using parallel computing based on Rmpi. URL www.rforge.net/ppso/, retrieved May 13, 2013.
Franz, K. J., H. C. Hartmann, S. Sorooshian, and R. Bales, 2003: Verification of National Weather Service ensemble streamflow predictions for water supply forecasting in the Colorado River Basin. Journal of Hydrometeorology, 4, 1105-1118.
Georgakakos, K. P., D.-J. Seo, H. Gupta, J. Schaake, and M. B. Butts, 2004: Towards the characterization of streamflow simulation uncertainty through multimodel ensembles. Journal of Hydrology, 298, 222-241.
Georgakakos, K. P. and H. Yao, 2001: Assessment of Folsom Lake response to historical and potential future climate scenarios 2. Reservoir management. Journal of Hydrology, 249, 176-196.
Glassheim, E., 1997: Fear and loathing in North Dakota. Natural Hazards Observer, 21, 1-4.
Gneiting, T., F. Balabdaoui, and A. E. Raftery, 2007: Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society, 69, 243-268.
Gneiting, T., A. E. Raftery, A. H. Westveld, and T. Goldman, 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Monthly Weather Review, 133, 1098-1118.
Götzinger, J. and A. Bárdossy, 2008: Generic error model for calibration and uncertainty estimation of hydrological models. Water Resources Research, 44, W00B07.
Gouweleeuw, B. T., J. Thielen, G. Franchello, A. P. J. DeRoo, and R. Buizza, 2005: Flood forecasting using medium-range probabilistic weather prediction. Hydrology and Earth System Sciences, 9, 365-380.
Graeff, T., E. Zehe, T. Blume, T. Francke, and B. Schröder, 2012: Predicting event response in a nested catchment with generalized linear models and a distributed watershed model. Hydrological Processes, 26, 3749-3769.
Green, W. H. and G. A. Ampt, 1911: Studies of soil physics. Part 1: The flow of air and water through soils. J. Agric. Soc., 4, 1-24.
Grell, G., J. Dudhia, and D. R. Stauffer, 1994: A description of the fifth-generation Penn State/NCAR mesoscale model (MM5). Tech. Rep. TN-398+STR, NCAR, 121 pp.
Grimit, E. P. and C. F. Mass, 2002: Initial results of a mesoscale short-range ensemble forecasting system over the Pacific Northwest. Weather and Forecasting, 17, 192-205.
Gulliver, J. S., 1991: Hydraulic conveyance design, 5.1-5.81. Hydropower Engineering Handbook, McGraw-Hill, Washington, D.C.
Hamill, T. M., 2001: Interpretation of rank histograms for verifying ensemble forecasts. Monthly Weather Review, 129, 550-560.
Hamill, T. M. and S. J. Colucci, 1997: Verification of Eta-RSM short-range ensemble forecasts. Monthly Weather Review, 125, 1312-1327.
Hamill, T. M. and S. J. Colucci, 1998: Evaluation of Eta-RSM probabilistic precipitation forecasts. Monthly Weather Review, 126, 711-724.
Hamlet, A. F., D. Huppert, and D. P. Lettenmaier, 2002: Economic value of long-lead streamflow forecasts for Columbia River hydropower. Journal of Water Resources Planning and Management, 128, 91-101.
Hamlet, A. F. and D. P. Lettenmaier, 1999: Effects of climate change on hydrology and water resources in the Columbia River basin. Journal of the American Water Resources Association, 35, 1597-1623.
Han, D., L. Chan, and N. Zhu, 2007: Flood forecasting using support vector machines. Journal of Hydroinformatics, 9, 267-276.
Hargreaves, G. H. and Z. A. Samani, 1982: Estimating potential evapotranspiration. ASCE J. Irrigation Drainage Div., 108, 225-230.
Hashino, T., A. A. Bradley, and S. S. Schwartz, 2007: Evaluation of bias-correction methods for ensemble streamflow volume forecasts. Hydrology and Earth System Sciences, 11, 939-950.
Hay, L. E., et al., 2002: Use of regional climate model output for hydrologic simulations. Journal of Hydrometeorology, 3, 571-590.
Hersbach, H., 2000: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15, 559-570.
Jenkins, M. W., et al., 2001: Improving California water management: Optimizing value and flexibility. Tech. rep., Center for Environmental and Resources Engineering, University of California.
Johnson, C. and R. Swinbank, 2009: Medium-range multimodel ensemble combination and calibration. Quarterly Journal of the Royal Meteorological Society, 135, 777-794.
Kavetski, D., G. Kuczera, and S. W. Franks, 2006a: Bayesian analysis of input uncertainty in hydrological modeling: 1. Theory. Water Resources Research, 42, W03407.
Kavetski, D., G. Kuczera, and S. W. Franks, 2006b: Bayesian analysis of input uncertainty in hydrological modeling: 2. Application. Water Resources Research, 42, W03408.
Khan, S., A. R. Ganguly, and S. Saigal, 2005: Detection and predictive modeling of chaos in finite hydrological time series. Nonlinear Processes in Geophysics, 12, 41-53.
Kitanidis, P. K. and R. L. Bras, 1980: Real-time forecasting with a conceptual hydrologic model: 1. Analysis of uncertainty. Water Resources Research, 16, 1025-1033.
Kite, G. W. and N. Kouwen, 1992: Watershed modeling using land classifications. Water Resources Research, 28, 3193-3200.
Klok, E. J., K. Jasper, K. P. Roelofsma, J. Gurtz, and A. Badoux, 2001: Distributed hydrological modelling of a heavily glaciated alpine river basin. Hydrological Sciences Journal, 46, 553-570.
Kouwen, N., 2010: WATFLOOD/WATROUTE hydrological model routing and flow forecasting system. Tech. rep., University of Waterloo. URL www.civil.uwaterloo.ca/watflood/downloads/manual10.pdf, retrieved May 13, 2013.
Kouwen, N., M. Danard, A. Bingeman, W. Luo, F. R. Seglenieks, and E. D. Soulis, 2005: Case study: Watershed modeling with distributed weather model data. Journal of Hydrologic Engineering, 10, 23-38.
Kouwen, N., E. D. Soulis, A. Pietroniro, J. Donald, and R. A. Harrington, 1993: Grouped response units for distributed hydrologic modeling. Journal of Water Resources Planning and Management, 119, 289-305.
Krzysztofowicz, R., 2001: The case for probabilistic forecasting in hydrology. Journal of Hydrology, 249, 2-9.
Krzysztofowicz, R. and L. Duckstein, 1979: Preference criterion for flood control under uncertainty. Water Resources Research, 14, 513-520.
Kuczera, G. and E. Parent, 1998: Monte Carlo assessment of parameter uncertainty in conceptual catchment models: the Metropolis algorithm. Journal of Hydrology, 211, 69-84.
Kurtzman, D. and R. Kadmon, 1999: Mapping of temperature variables in Israel: A comparison of different interpolation methods. Climate Research, 13, 33-43.
Leemans, R. and W. P. Cramer, 1991: The IIASA database for mean monthly values of temperature, precipitation, and cloudiness on a global terrestrial grid. Tech. Rep. RR-91-18, International Institute for Applied Systems Analysis.
Legg, T. P. and K. R. Mylne, 2004: Early warnings of severe weather from ensemble forecast information. Weather and Forecasting, 19, 891-906.
Leisenring, M. and H. Moradkhani, 2011: Snow water equivalent prediction using Bayesian data assimilation methods. Stoch. Environ. Res. Risk Assess., 25, 253-270.
Lewis, D., M. J. Singer, R. A. Dahlgren, and K. W. Tate, 2000: Hydrology in a California oak woodland watershed: a 17-year study. Journal of Hydrology, 240, 106-117.
Liu, Y. and H. V. Gupta, 2007: Uncertainty in hydrologic modeling: Toward an integrated data assimilation framework. Water Resources Research, 43, W07401.
Liu, Y., et al., 2011: Advancing data assimilation in operational hydrologic forecasting: progresses, challenges, and emerging opportunities. Hydrology and Earth System Sciences, 16, 3863-3887.
Lorenz, E. N., 1963: Deterministic nonperiodic flow. Journal of the Atmospheric Sciences, 20, 130-141.
Madadgar, S., H. Moradkhani, and D. Garen, 2012: Towards improved post-processing of hydrologic forecast ensembles. Hydrological Processes, doi:10.1002/hyp.9562.
Mantua, N. J., S. R. Hare, Y. Zhang, J. M. Wallace, and R. C. Francis, 1997: A Pacific interdecadal climate oscillation with impacts on salmon production. Bulletin of the American Meteorological Society, 78, 1069-1079.
Mascaro, G., E. R. Vivoni, and R. Deidda, 2010: Implications of ensemble quantitative precipitation forecast errors on distributed streamflow forecasting. Journal of Hydrometeorology, 11, 69-86.
Mascaro, G., E. R. Vivoni, and R. Deidda, 2011: Impact of basin scale and initial condition on ensemble streamflow forecast uncertainty. 25th Conference on Hydrology, Seattle, WA.
Mass, C. F., D. Ovens, K. Westrick, and B. A. Colle, 2002: Does increasing horizontal resolution produce more skillful forecasts? Bulletin of the American Meteorological Society, 83, 407-430.
McCollor, D. and R. Stull, 2008a: Hydrometeorological accuracy enhancement via postprocessing of numerical weather forecasts in complex terrain. Weather and Forecasting, 23, 131-144.
McCollor, D. and R. Stull, 2008b: Hydrometeorological short-range ensemble forecasts in complex terrain. Part II: Economic evaluation. Weather and Forecasting, 23, 557-574.
McMillan, H., E. Hreinsson, M. P. Clark, S. K. Singh, C. Zammit, and M. J. Uddstrom, 2013: Operational hydrological data assimilation with the recursive ensemble Kalman filter. Hydrology and Earth System Sciences, 17, 21-38.
Monteith, J. L., 1965: Evaporation and environment. Symposium of the Society for Experimental Biology, 19, 205-234.
Moradkhani, H., K.-L. Hsu, H. V. Gupta, and S. Sorooshian, 2005a: Uncertainty assessment of hydrologic model states and parameters: Sequential data assimilation using the particle filter. Water Resources Research, 41, W05012.
Moradkhani, H., S. Sorooshian, H. V. Gupta, and P. R. Houser, 2005b: Dual state-parameter estimation of hydrological models using ensemble Kalman filter. Advances in Water Resources, 28, 135-147.
Moradkhani, H. and S. Sorooshian, 2009: General review of rainfall-runoff modeling: Model calibration, data assimilation, and uncertainty analysis, 1-24. Hydrological Modelling and the Water Cycle, Coupling the Atmospheric and Hydrological Models, Springer, Berlin, Germany.
Moradkhani, H., C. M. DeChant, and S. Sorooshian, 2012: Evolution of ensemble data assimilation for uncertainty quantification using the particle filter-Markov chain Monte-Carlo method. Water Resources Research, 48, W12520.
Murphy, A. H., 1973: A new vector partition of the probability score. Journal of Applied Meteorology, 12, 595-600.
Murphy, A. H., 1993: What is a good forecast? An essay on the nature of goodness in weather forecasting. Weather and Forecasting, 8, 281-293.
Murphy, A. H., 1994: Assessing the economic value of weather forecasts: An overview of methods, results and issues. Meteorological Applications, 1, 69-73.
Murphy, A. H. and R. L. Winkler, 1987: A general framework for forecast verification. Monthly Weather Review, 115, 1330-1338.
Nash, J. E. and I. V. Sutcliffe, 1970: River flow forecasting through conceptual models: Part I - A discussion of principles. Journal of Hydrology, 10, 282-290.
Niehoff, D., U. Fritsch, and A. Bronstert, 2002: Land-use impacts on storm-runoff generation: scenarios of land-use change and simulation of hydrological response in a meso-scale catchment in SW-Germany. Journal of Hydrology, 267, 80-93.
Nipen, T., 2012: A component-based probabilistic weather forecasting system for operational usage. Ph.D. thesis, Department of Earth, Ocean and Atmospheric Sciences, University of British Columbia, 101 pp.
Nipen, T. and R. Stull, 2011: Calibrating probabilistic forecasts from an NWP ensemble. Tellus A, 63, 858-875.
Olsson, J. and G. Lindström, 2008: Evaluation and calibration of operational hydrological ensemble forecasts in Sweden. Journal of Hydrology, 350, 14-24.
Özelkan, E. C. and L. Duckstein, 2001: Fuzzy conceptual rainfall-runoff models. Journal of Hydrology, 253, 41-68.
Palmer, T. N., 2002: The economic value of ensemble forecasts as a tool for risk assessment: From days to decades. Quarterly Journal of the Royal Meteorological Society, 128, 747-774.
Palmer, T. N., G. J. Shutts, R. Hagedorn, F. J. Doblas-Reyes, T. Jung, and M. Leutbecher, 2005: Representing model uncertainty in weather and climate prediction. Annual Review of Earth and Planetary Sciences, 33, 163-193.
Pappenberger, F., J. Bartholmes, J. Thielen, H. L. Cloke, R. Buizza, and A. De Roo, 2008: New dimensions in early flood warning across the globe using grand-ensemble weather predictions. Geophysical Research Letters, 35, L10404.
Parrish, M. A., H. Moradkhani, and C. M. DeChant, 2012: Toward reduction of model uncertainty: Integration of Bayesian model averaging and data assimilation. Water Resources Research, 48, W03519.
Pauwels, V. R. N. and G. J. M. De Lannoy, 2006: Improvement of modeled soil wetness conditions and turbulent fluxes through the assimilation of observed discharge. Journal of Hydrometeorology, 7, 458-477.
Peschke, G., 1987: Soil moisture and runoff components from a physically founded approach. Acta Hydrophysica, 31, 191-205.
Philip, J. R., 1954: An infiltration equation with physical significance. Soil Science, 77, 153-158.
Pinson, P., P. McSharry, and H. Madsen, 2010: Reliability diagrams for non-parametric density forecasts of continuous variables: Accounting for serial correlation. Quarterly Journal of the Royal Meteorological Society, 136, 77-90.
PRISM Climate Group, 2012: Latest PRISM data. URL http://www.prism.oregonstate.edu/, retrieved May 13, 2013.
Pulido-Velazquez, M., M. W. Jenkins, and J. R. Lund, 2004: Economic values for conjunctive use and water banking in southern California. Water Resources Research, 40, W03401.
Raftery, A. E., T. Gneiting, F. Balabdaoui, and M. Polakowski, 2005: Using Bayesian model averaging to calibrate forecast ensembles. Monthly Weather Review, 133, 1155-1174.
Randrianasolo, A., M. H. Ramos, G. Thirel, V. Andréassian, and E. Martin, 2010: Comparing the scores of hydrological ensemble forecasts issued by two different hydrological models. Atmospheric Science Letters, 11, 100-107.
Reggiani, P., M. Renner, A. H. Weerts, and P. A. H. J. M. van Gelder, 2009: Uncertainty assessment via Bayesian revision of ensemble streamflow predictions in the operational River Rhine forecasting system. Water Resources Research, 45, W02428.
Richardson, D. S., 2000: Skill and relative economic value of the ECMWF ensemble prediction system. Quarterly Journal of the Royal Meteorological Society, 126, 649-667.
Richardson, D. S., 2001: Measures of skill and value of ensemble prediction systems, their interrelationship and the effect of ensemble size. Quarterly Journal of the Royal Meteorological Society, 127, 2473-2489.
Richardson, D. S., 2003: Economic value and skill, 165-187. Forecast Verification: A Practitioner's Guide in Atmospheric Science, Wiley, Chichester, England.
Ross, R. S. and T. N. Krishnamurti, 2005: Reduction of forecast error for global numerical weather prediction by the Florida State University (FSU) superensemble. Meteorology and Atmospheric Physics, 88, 215-235.
Roulin, E., 2007: Skill and relative economic value of medium-range hydrological ensemble predictions. Hydrology and Earth System Sciences, 11, 725-737.
Roulin, E. and S. Vannitsem, 2005: Skill of medium-range hydrological ensemble predictions. Journal of Hydrometeorology, 6, 729-744.
Roulston, M. S., G. E. Bolton, A. N. Kleit, and A. L. Sears-Collins, 2006: A laboratory study of the benefits of including uncertainty information in weather forecasts. Weather and Forecasting, 21, 116-122.
Roulston, M. S. and L. A. Smith, 2002: Evaluating probabilistic forecasts using information theory. Monthly Weather Review, 130, 1653-1660.
Schmeits, M. J. and K. J. Kok, 2010: A comparison between raw ensemble output, (modified) Bayesian model averaging, and extended logistic regression using ECMWF ensemble precipitation forecasts. Monthly Weather Review, 138, 4199-4211.
Schulla, J., 2012: Model description WaSiM (Water balance Simulation Model). Tech. rep., Hydrology Software Consulting J. Schulla, 305 pp.
Seo, D.-J., H. D. Herr, and J. C. Schaake, 2006: A statistical post-processor for accounting of hydrologic uncertainty in short-range ensemble streamflow prediction. Hydrology and Earth System Sciences Discussions, 3, 1988-2035.
Shamseldin, A. Y., K. M. O'Connor, and G. C. Liang, 1997: Methods for combining the outputs of different rainfall-runoff models. Journal of Hydrology, 197, 203-229.
Shawwash, Z. K. E., 2000: A decision support system for real-time hydropower scheduling in a competitive power market environment. Ph.D. thesis, Department of Civil Engineering, University of British Columbia, 315 pp.
Sivakumar, B., 2000: Chaos theory in hydrology: important issues and interpretations. Journal of Hydrology, 227, 1-20.
Sivakumar, B., R. Berndtsson, and J. Olsson, 2001: Evidence of chaos in the rainfall-runoff process. Hydrological Sciences Journal, 46, 131-145.
Sivapalan, M., et al., 2003: IAHS decade on Predictions in Ungauged Basins (PUB), 2003-2012: Shaping an exciting future for the hydrologic sciences. Hydrological Sciences Journal, 48, 857-880.
Sivillo, J. K., J. E. Ahlquist, and Z. Toth, 1997: An ensemble forecasting primer. Weather and Forecasting, 12, 809-818.
Skamarock, W. C., et al., 2008: A description of the Advanced Research WRF version 3. Tech. Rep. TN-475+STR, NCAR, 113 pp.
Spokas, K. and F. Forcella, 2006: Estimating hourly incoming solar radiation from limited meteorological data. Weed Science, 54, 182-189.
Stedinger, J. R., 1980: Fitting log normal distributions to hydrologic data. Water Resources Research, 16, 481-490.
Steinschneider, S. and C. Brown, 2011: Influences of North Atlantic climate variability on low-flows in the Connecticut River basin. Journal of Hydrology, 409, 212-224.
Stensrud, D. J., H. E. Brooks, J. Du, M. S. Tracton, and E. Rogers, 1999: Using ensembles for short-range forecasting. Monthly Weather Review, 127, 433-446.
Stensrud, D. J. and N. Yussouf, 2003: Short-range ensemble predictions of 2-m temperature and dewpoint temperature over New England. Monthly Weather Review, 131, 2510-2524.
Stensrud, D. J. and N. Yussouf, 2007: Reliable probabilistic quantitative precipitation forecasts from a short-range ensemble forecasting system. Weather and Forecasting, 22, 3-17.
Stull, R. B., 2000: Meteorology for Scientists and Engineers. 2d ed., Brooks/Cole Thomson Learning, Pacific Grove, CA, 502 pp.
Talagrand, O., R. Vautard, and B. Strauss, 1997: Evaluation of probabilistic prediction systems. Proc. Workshop on Predictability, Reading, United Kingdom, ECMWF, 1-25.
Thirel, G., F. Regimbeau, E. Martin, J. Noilhan, and F. Habets, 2010: Short- and medium-range hydrological ensemble forecasts over France. Atmospheric Science Letters, 11, 72-77.
Thirel, G., F. Rousset-Regimbeau, E. Martin, and F. Habets, 2008: On the impact of short-range meteorological forecasts for ensemble streamflow predictions. Journal of Hydrometeorology, 9, 1301-1317.
Thornes, J. E. and D. B. Stephenson, 2001: How to judge the quality and value of weather forecast products. Meteorological Applications, 8, 307-314.
Tolson, B. A. and C. A. Shoemaker, 2007: Dynamically dimensioned search algorithm for computationally efficient watershed model calibration. Water Resources Research, 43, W01413.
Toth, Z., O. Talagrand, G. Candille, and Y. Zhu, 2003: Probability and ensemble forecasts, 137-163. Forecast Verification: A Practitioner's Guide in Atmospheric Science, Wiley, Chichester, England.
Van den Bergh, J. and E. Roulin, 2010: Hydrological ensemble prediction and verification for the Meuse and Scheldt basins. Atmospheric Science Letters, 11, 64-71.
Vrugt, J. A., C. G. H. Diks, H. V. Gupta, W. Bouten, and J. M. Verstraten, 2005: Improved treatment of uncertainty in hydrologic modeling: Combining the strengths of global optimization and data assimilation. Water Resources Research, 41, W01017.
Vrugt, J. A., H. V. Gupta, L. A. Bastidas, W. Bouten, and S. Sorooshian, 2003a: Effective and efficient algorithm for multiobjective optimization of hydrologic models. Water Resources Research, 39, 1214.
Vrugt, J. A., H. V. Gupta, W. Bouten, and S. Sorooshian, 2003b: A shuffled complex evolution Metropolis algorithm for optimization and uncertainty assessment of hydrologic model parameters. Water Resources Research, 39, 1201.
Vrugt, J. A. and B. A. Robinson, 2007: Treatment of uncertainty using ensemble methods: Comparison of sequential data assimilation and Bayesian model averaging. Water Resources Research, 43, W01411.
Wagener, T., 2003: Evaluation of catchment models. Hydrological Processes, 17, 3375-3378.
Wang, Q. J., D. E. Robertson, and F. H. S. Chiew, 2009: A Bayesian joint probability modeling approach for seasonal forecasting of streamflows at multiple sites. Water Resources Research, 45, W05407.
Wang, T., A. Hamann, D. Spittlehouse, and S. N. Aitken, 2006: Development of scale-free climate data for western Canada for use in resource management. International Journal of Climatology, 26, 383-397.
Westrick, K. J. and C. F. Mass, 2001: An evaluation of a high-resolution hydrometeorological modeling system for prediction of a cool-season flood event in a coastal mountainous watershed. Journal of Hydrometeorology, 2, 161-180.
Westrick, K. J., P. Storck, and C. F. Mass, 2002: Description and evaluation of a hydrometeorological forecast system for mountainous watersheds. Weather and Forecasting, 17, 250-262.
Wilby, R. L., P. G. Whitehead, A. J. Wade, D. Butterfield, R. J. Davis, and G. Watts, 2006: Integrated modelling of climate change impacts on water resources and quality in a lowland catchment: River Kennet, UK. Journal of Hydrology, 330, 204-220.
Wilks, D. S., 2001: A skill score based on economic value for probability forecasts. Meteorological Applications, 8, 209-219.
Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2d ed., Academic Press, 627 pp.
Willmott, C. J. and K. Matsura, 1995: Smart interpolation of annually averaged air temperature in the United States. Journal of Applied Meteorology, 34, 2577-2586.
Wilson, L. J., S. Beauregard, A. E. Raftery, and R. Verret, 2007: Calibrated surface temperature forecasts from the Canadian ensemble prediction system using Bayesian model averaging. Monthly Weather Review, 135, 1364-1385.
Wood, A. W. and J. C. Schaake, 2008: Correcting errors in streamflow forecast ensemble mean and spread. Journal of Hydrometeorology, 9, 132-148.
Woodcock, F. and C. Engel, 2005: Operational consensus forecasts. Weather and Forecasting, 20, 101-111.
Yoshitani, J., Z. Q. Chen, M. L. Kavvas, and K. Fukami, 2009: Atmospheric model-based streamflow forecasting at small, mountainous watersheds by a distributed hydrologic model: Application to a watershed in Japan. Journal of Hydrologic Engineering, 14, 1107-1118.
Yuan, H., J. A. McGinley, P. J. Schultz, C. J. Anderson, and C. Lu, 2008: Short-range precipitation forecasts from time-lagged multimodel ensembles during the HMT-West-2006 campaign. Journal of Hydrometeorology, 9, 477-491.
Zhao, L., Q. Duan, J. Schaake, A. Ye, and J. Xia, 2011: A hydrologic post-processor for ensemble streamflow predictions. Advances in Geosciences, 29, 51-59.
Zhu, Y., Z. Toth, R. Wobus, D. Richardson, and K. Mylne, 2002: The economic value of ensemble-based weather forecasts. Bulletin of the American Meteorological Society, 83, 73-83.

Appendix A: Forecast Verification Metrics

A.1 Measures-Oriented Verification for Deterministic Forecasts

Let m_t be the deterministic (i.e., ensemble mean or median) forecasted inflow at time t. Scores are calculated over all t in the set of time points T. The size of this set is given by ||T||, which, for the case of daily inflow forecasts, can be interpreted as the number of days over which the forecast is evaluated. The verifying observation is given by x_t, and the mean observed value over all t in T is given by x̄.

Degree of Mass Balance (DMB)

An appropriate measure of bias for volumetric quantities such as precipitation and reservoir inflow is the DMB (McCollor and Stull, 2008a). The DMB is a measure of the ratio of simulated or forecasted inflow to the observed inflow over a given period of time and is given by:

    \mathrm{DMB} = \frac{\sum_{t \in T} m_t}{\sum_{t \in T} x_t}    (A.1)

DMB values less than one indicate that inflows are underforecast, while DMB values greater than one indicate a wet forecast bias. A DMB of one is achieved for forecasts that are free of bias as measured over T.

Mean Absolute Error (MAE)

    \mathrm{MAE} = \frac{1}{\|T\|} \sum_{t \in T} |m_t - x_t|    (A.2)

For perfect forecasts the MAE is zero.

Root Mean Square Error (RMSE)

    \mathrm{RMSE} = \sqrt{\frac{1}{\|T\|} \sum_{t \in T} (m_t - x_t)^2}    (A.3)

Perfect forecasts have an RMSE of zero. This score places more emphasis on large inflow forecast errors than does the MAE.

Root Mean Square Error Skill Score (RMSESS)

Forecast skill is determined by comparing the RMSE of the forecast to that of a zero-skill reference forecast (RMSE_ref). We take the reference forecast to be persistence (i.e., the forecast issued today for all lead times is taken to be yesterday's observed inflow).

    \mathrm{RMSESS} = 1 - \frac{\mathrm{RMSE}}{\mathrm{RMSE}_{ref}}    (A.4)

A skill score of zero indicates that the forecast and reference forecast scores are the same, and therefore that the forecasting system offers no advantage over persistence. A positive skill score indicates that the forecast has more skill than persistence, while a negative skill score indicates the opposite. A perfect forecasting system has a skill score of one.

Nash-Sutcliffe Efficiency (NSE)

The Nash-Sutcliffe Efficiency (NSE) is an indicator of statistical association (Nash and Sutcliffe, 1970) that gives a score of one for a perfect forecast. An NSE of zero indicates that the model performs no better than a climatological constant forecast given by x̄. Emphasis is placed on forecast performance during periods of high inflows.

    \mathrm{NSE} = 1 - \frac{\sum_{t \in T} (x_t - m_t)^2}{\sum_{t \in T} (x_t - \bar{x})^2}    (A.5)

The NSE of log-transformed flows (LNSE) provides a measure of low-flow forecast quality.
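For concreteness, the deterministic scores above can be computed as in the following minimal sketch (synthetic data; the persistence reference follows Eq. (A.4)):

# Minimal sketch of the deterministic scores in Eqs. (A.1)-(A.5).
# `m` is the ensemble-mean forecast, `x` the verifying observations.
import numpy as np

def dmb(m, x):  return m.sum() / x.sum()                       # Eq. (A.1)
def mae(m, x):  return np.abs(m - x).mean()                    # Eq. (A.2)
def rmse(m, x): return np.sqrt(((m - x) ** 2).mean())          # Eq. (A.3)
def nse(m, x):  return 1.0 - ((x - m) ** 2).sum() / ((x - x.mean()) ** 2).sum()

rng = np.random.default_rng(1)
x = 50.0 + 20.0 * rng.random(365)           # hypothetical observed inflows (m^3/s)
m = 1.05 * x + rng.normal(0.0, 5.0, 365)    # biased, noisy forecasts

persistence = np.roll(x, 1)                 # yesterday's observation as reference
rmsess = 1.0 - rmse(m[1:], x[1:]) / rmse(persistence[1:], x[1:])   # Eq. (A.4)

print(f"DMB={dmb(m, x):.3f}  MAE={mae(m, x):.2f}  RMSE={rmse(m, x):.2f}  "
      f"RMSESS={rmsess:.3f}  NSE={nse(m, x):.3f}")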
A.2 Distributions-Oriented Verification for Ensemble and Probabilistic Forecasts

In the following, a forecast probability density function (PDF) of variable x, valid at time t, is given by f_t(x). The verifying observation is designated as x_t. Scores are calculated for all t in the set of time points T. The size of this set is given by ||T||, which, for the case of daily inflow forecasts, can be interpreted as the number of days over which the forecast is evaluated.

Reliability

Reliability or calibration (Murphy, 1973) is a measure of consistency between forecast probabilities and the frequency of occurrence of observed values. That is, events forecasted with probability p should, over the course of many such forecasts, be observed to occur a fraction p of the time. A reliable forecasting system exhibits a flat rank histogram or a flat probability integral transform (PIT) histogram (see below); however, a flat histogram is not always a guarantee of a reliable ensemble (Hamill, 2001). Reliability is easily corrected using probability calibration methods (e.g., Hamill and Colucci, 1997; Nipen and Stull, 2011).

Resolution

Resolution is a measure of how well a forecasting system can a priori differentiate future weather outcomes such that different forecasts are associated with distinct verifying observations. This is the most important attribute of a forecast system (Toth et al., 2003), as it cannot be improved through adjustment of probability values. Resolution can only be corrected by improving the forecasting "engine" used to generate the ensemble or probability forecasts. This measure of forecast performance is conditioned on the forecasts in that it examines whether or not an event occurred given that it was predicted with a particular probability. The converse of resolution is discrimination, which is conditioned on observations (see the discussion of the Relative Operating Characteristic).

Rank Histogram

The rank histogram or Talagrand diagram is used to assess calibration when the Binned Probability Ensemble (BPE) uncertainty model is used to generate probabilistic forecasts (Anderson, 1996; Talagrand et al., 1997). In the BPE uncertainty model, it is assumed that the ensemble members and verifying observation are pulled from the same probability distribution. Therefore, each ensemble member is equally likely, and when the observation and K ensemble members are pooled together and ranked, the rank of the observation is a random integer between 1 and K + 1. The rank histogram indicates the frequency with which the observation falls into each bin, defined as the regions between two consecutive ensemble members and above and below the highest and lowest ensemble members. If the BPE assumption is true, and the ensemble members represent the full spread of the probability distribution of the observations, then the probabilities derived from this uncertainty model should be calibrated, and the rank histogram should be flat. A U-shaped diagram indicates a lack of ensemble spread, while L- and J-shaped diagrams indicate over- and under-forecasting biases, respectively.

Probability Integral Transform (PIT) Histogram

The PIT histogram (Gneiting et al., 2005) is analogous to the rank histogram, and is used to assess calibration when a probability forecast is expressed as a fitted PDF. PIT values are given by:

    P_t = F_t(x_t)    (A.6)

where F_t is the corresponding forecast cumulative distribution function (CDF). The forecast CDF of variable x at time t is given by:

    F_t(x) = \int_{-\infty}^{x} f_t(x')\,dx'    (A.7)

For perfectly calibrated forecasts, the PIT histogram will be flat, with equal numbers of observations falling into each equally sized bin. The number of bins is arbitrary and not constrained by ensemble size; for our PIT histograms, we divide the interval [0,1] into 10 equally sized bins. If the PIT histogram is not flat, its shape can be used to diagnose problems with the uncertainty model. For example, as with the rank histogram, a U-shaped histogram is an indication of underdispersion, or inadequate spread in the forecast PDF. Note that for both diagrams, flatness is not always a guarantee of calibration, as opposing biases at different times during the evaluation period can result in flat histograms (Hamill, 2001).
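A minimal sketch of the PIT calculation for Gaussian forecast PDFs follows (hypothetical data, made deliberately underdispersive so the histogram comes out U-shaped):

# Minimal sketch: PIT values (Eq. A.6) and a 10-bin PIT histogram for
# Gaussian forecast PDFs. Forecast centres, spreads, and observations are
# hypothetical stand-ins.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 365
mu = 50.0 + 20.0 * rng.random(n)        # forecast PDF centres
sigma = np.full(n, 8.0)                 # forecast PDF spreads
obs = rng.normal(mu, 10.0)              # observations drawn wider -> underdispersion

pit = stats.norm.cdf(obs, loc=mu, scale=sigma)    # P_t = F_t(x_t)
counts, _ = np.histogram(pit, bins=10, range=(0.0, 1.0))
print(counts)   # heavy end bins (a U shape) indicate too little forecast spread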
Calibration Deviation (D)

A more objective measure of calibration is the calibration deviation metric D of Nipen and Stull (2011), which measures the degree of deviation from a flat PIT histogram:

    D = \sqrt{\frac{1}{B} \sum_{i=1}^{B} \left( \frac{b_i}{\|T\|} - \frac{1}{B} \right)^2}    (A.8)

where i is an integer between 1 and the number of bins B, and b_i is the bin count, or number of observations in bin i. Bin frequencies are given by b_i ||T||^{-1}. Low values of D are preferred, and indicate a small degree of deviation from a flat PIT histogram.

Perfectly reliable forecasts can be expected to exhibit some calibration deviation as a result of sampling limitations (Brocker and Smith, 2007; Pinson et al., 2010). The expected calibration deviation for a perfectly calibrated forecast is given by:

    E[D_p] = \sqrt{\frac{1 - 1/B}{\|T\|\,B}}    (A.9)

When referring to calibration, we will specify a time period over which the calibration metric is computed, and we will not require the forecast to exhibit calibration over shorter time scales. This is important because, as Hamill (2001) points out, a forecast can have different distributional biases during different times of year. Thus, when calibration is computed over a set of time points T, an overforecasting bias during the first half of T combined with an underforecasting bias during the second half can balance to produce a flat histogram.

Ignorance Score (IGN)

While reliability/calibration is a desirable characteristic of probabilistic forecasts, it is not an adequate measure of the usefulness of a forecast. Consider, for example, an uncertainty model that always issues a climatological forecast (i.e., the forecast PDF is always taken as the distribution of the climatological record). Assuming stationarity, such a forecasting system would be perfectly calibrated, but far too vague for decision making. Therefore, we will also require our forecast PDFs (f_t) to concentrate probability in the correct area (i.e., near the verifying observation) on each day. This property can be measured by the ignorance score (Roulston and Smith, 2002), which is defined as:

    \mathrm{IGN} = -\frac{1}{\|T\|} \sum_{t \in T} \log_2 (f_t(x_t))    (A.10)

with lower ignorance scores being preferred. Forecasts are rewarded with low ignorance scores for placing high probability in the vicinity of the verifying observation. Due to the use of the logarithm in the definition of IGN, arithmetic differences between two ignorance scores are more relevant than their ratios.

Nipen (2012) derived a decomposition of the ignorance score for a set of raw forecasts into two parts: (1) the potential ignorance score of a perfectly calibrated forecast (IGN_pot), and (2) extra ignorance caused by a lack of calibration (IGN_uncal). Ignorance can therefore be reduced by improving the ensemble forecasting system, applying bias correction, or using a more suitable uncertainty model to reduce IGN_pot, or by calibrating the forecast to reduce IGN_uncal.
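Continuing the hypothetical Gaussian example from the PIT sketch above, D, E[D_p], and the ignorance score can be computed as:

# Minimal sketch of Eqs. (A.8)-(A.10) using the same synthetic forecasts.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, B = 365, 10
mu = 50.0 + 20.0 * rng.random(n)
sigma = np.full(n, 8.0)
obs = rng.normal(mu, 10.0)

pit = stats.norm.cdf(obs, loc=mu, scale=sigma)
b, _ = np.histogram(pit, bins=B, range=(0.0, 1.0))
D = np.sqrt(np.mean((b / n - 1.0 / B) ** 2))          # Eq. (A.8)
D_expected = np.sqrt((1.0 - 1.0 / B) / (n * B))       # Eq. (A.9)

ign = -np.mean(np.log2(stats.norm.pdf(obs, loc=mu, scale=sigma)))  # Eq. (A.10)
print(f"D={D:.4f}  E[Dp]={D_expected:.4f}  IGN={ign:.2f} bits")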
Brier Score (BS) and Brier Skill Score (BSS)

The Brier Score (Brier, 1950) is one of the most frequently used evaluation scores for ensemble prediction systems; it is defined as the mean square error of the probability forecast:

    \mathrm{BS} = \frac{1}{\|T\|} \sum_{t \in T} (p_t - x_t)^2    (A.11)

where p_t is the probability that the forecasted inflow will exceed a given inflow threshold, and x_t is equal to one if the observed inflow exceeds the threshold, or zero otherwise. The exceedance forecast probability is given by the number of ensemble members that exceed the threshold or the probability of exceedance given by the forecast CDF. A BS of zero indicates a perfect deterministic forecast. The Brier score can also be converted into a skill score:

    \mathrm{BSS} = 1 - \frac{\mathrm{BS}}{\mathrm{BS}_{ref}}    (A.12)

where BS_ref is the Brier score of a low-skill climatological forecast in which the probability of the event for each forecast is equal to x̄. Thus, BS_ref = x̄(1 - x̄).

The BS can also be decomposed into uncertainty, reliability and resolution components as illustrated by Murphy (1973). Uncertainty is a measure of the difficulty in forecasting the event and depends only on observations. Following decomposition, the BSS can be reformulated as:

    \mathrm{BSS} = \frac{\text{resolution}}{\text{uncertainty}} - \frac{\text{reliability}}{\text{uncertainty}} = \text{relative resolution} - \text{relative reliability}    (A.13)

A perfect forecast has a BSS and relative resolution equal to one and a relative reliability of zero.

Brier scores are calculated for forecast and observation anomaly thresholds relative to climatological inflow values. In order to ensure that the ensemble is not unduly rewarded for making high inflow forecasts during the snowmelt period, where little skill is required to do so, we subtract climatology from the forecasts and observations. This daily climatology is derived from the median of observations on each calendar day over the period 1986-2008. A 15-day running mean is then used to generate a smoothed climatology.

Relative Operating Characteristic (ROC)

Given event counts a, b, c and d from Table A.1, the hit rate (H) and false alarm rate (F) are defined as:

    H = \frac{a}{a + c}    (A.14)

    F = \frac{b}{b + d}    (A.15)

Table A.1: Contingency table for calculating hit rates and false alarm rates. The number of forecast hits is given by a, b is the number of false alarms, c the number of misses, and d the number of correct rejections.

                    Observed    Not observed
    Forecast           a             b
    Not forecast       c             d

The ROC diagram (Toth et al., 2003) compares H to F at different forecast probability levels and is an indicator of an ensemble's ability to discriminate between the occurrence or non-occurrence of a forecast event (e.g., exceedance of some threshold inflow). Discrimination is the converse of resolution, and examines the probabilities that were predicted conditioned on whether or not an event was observed to occur.

At a probability threshold of p_t = 0, an exceedance forecast is issued as long as there is at least a 0% chance of exceeding the event threshold (i.e., always). Thus, H and F are both equal to one. As p_t increases, more points are created along the ROC curve. At p_t = 1, H and F are both zero. A ROC curve lying along the 1:1 line indicates no skill. Ideally, the curve should travel along the left and upper axes (i.e., it should have a low F and high H at all probability levels).

As with the Brier Score [Eq. (A.11)], H and F are calculated for inflow anomaly thresholds relative to climatological inflows. Exceedance probabilities are again given by the number of ensemble members that exceed the threshold or the probability of exceedance given by the forecast CDF.
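The sketch below computes the BS, BSS, and a few ROC points for a hypothetical ensemble; for simplicity it applies a raw inflow threshold rather than the anomaly thresholds described above:

# Minimal sketch of Eqs. (A.11)-(A.12) and (A.14)-(A.15) for a synthetic
# K-member ensemble and a single event threshold.
import numpy as np

rng = np.random.default_rng(3)
n, K = 365, 24
truth = 50.0 + 30.0 * rng.random(n)
ens = truth[:, None] + rng.normal(0.0, 12.0, (n, K))   # hypothetical ensemble

threshold = 75.0                                       # event: inflow > threshold
p = (ens > threshold).mean(axis=1)                     # member-count probability
o = (truth > threshold).astype(float)                  # 1 if event observed

bs = np.mean((p - o) ** 2)                             # Eq. (A.11)
clim = o.mean()
bss = 1.0 - bs / (clim * (1.0 - clim))                 # Eq. (A.12), BS_ref = x̄(1-x̄)

for pt in (0.1, 0.5, 0.9):                             # one ROC point per level
    fc = p >= pt
    a, b = np.sum(fc & (o == 1)), np.sum(fc & (o == 0))
    c, d = np.sum(~fc & (o == 1)), np.sum(~fc & (o == 0))
    print(f"pt={pt}: H={a/(a+c):.2f}  F={b/(b+d):.2f}")
print(f"BS={bs:.3f}  BSS={bss:.3f}")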
Continuous Ranked Probability Score (CRPS)

According to Gneiting et al. (2005), probabilistic forecasts should aim to maximize sharpness subject to calibration. Sharpness refers to the spread of the forecast PDFs; forecasts are sharp if their PDFs are narrow relative to low-skill forecasts derived from climatology, for example. A sharp probabilistic forecasting system is more likely to generate binary event exceedance or non-exceedance probabilities near zero or one. The Continuous Ranked Probability Score (CRPS) addresses both calibration and sharpness (Gneiting et al., 2005, 2007) and is given by:

    \mathrm{CRPS} = \frac{1}{\|T\|} \sum_{t \in T} \int_{-\infty}^{\infty} \left[ F_t(x) - H(x - x_t) \right]^2 dx    (A.16)

where H is the Heaviside function defined as:

    H(s) = \begin{cases} 1 & s \ge 0 \\ 0 & s < 0 \end{cases}    (A.17)

This score can be interpreted as an integral of Brier Scores [Eq. (A.11)] over the range of all possible forecast thresholds x. As with the Brier Score [Eq. (A.11)], these thresholds are taken to be anomaly thresholds relative to climatology. For a deterministic forecast, F_t(x) is either zero or one, and the CRPS reduces to the mean absolute error. Hersbach (2000) has shown that the CRPS can be decomposed into a reliability component and a "potential" CRPS component that measures sharpness. Thus, lower CRPS values are preferred, and can be achieved by improving probabilistic forecast reliability and sharpness.
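As a check on Eq. (A.16), the sketch below evaluates the CRPS of a single Gaussian forecast both by direct numerical integration of the definition and with the closed form for a normal distribution given by Gneiting et al. (2005); the data are hypothetical:

# Minimal sketch: CRPS (Eq. A.16) for one Gaussian forecast PDF.
import numpy as np
from scipy import stats

def crps_normal(mu, sigma, x):
    """Closed-form CRPS of a N(mu, sigma^2) forecast (Gneiting et al., 2005)."""
    z = (x - mu) / sigma
    return sigma * (z * (2.0 * stats.norm.cdf(z) - 1.0)
                    + 2.0 * stats.norm.pdf(z) - 1.0 / np.sqrt(np.pi))

mu, sigma, x_obs = 50.0, 8.0, 58.0

grid = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 20001)
integrand = (stats.norm.cdf(grid, mu, sigma) - (grid >= x_obs)) ** 2
dx = grid[1] - grid[0]
crps_numeric = np.sum(integrand) * dx       # brute-force integral of Eq. (A.16)

print(crps_numeric, crps_normal(mu, sigma, x_obs))   # the two should agree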
Appendix B: Testing an Adaptive Bias Corrector for Daisy Lake Inflow Forecasts

Recall from Chapter 2 that a linearly-weighted DMB bias corrector with a moving window of 3 days (LDMB3) was found to be ideal for removing bias and improving other measures of forecast error in the M2M ensemble mean inflow forecasts for the Daisy Lake reservoir. This linear weighting of past errors is similar to the weighting in the adaptive parameter updating scheme employed by COMPS [Eq. (4.1)]. Indeed, a dimensionless time scale of τ = 2.0 gives the same weighting to yesterday's forecast error as the LDMB3 scheme [Eq. (2.4)].

A range of τ from 2.0 to 5.0 has been tested on the M2M inflow forecasts from the 2010-2011 water year. Using this year enabled an evaluation of the impact of bias correction on day 3 forecasts; the previous water year has only a short record of day 3 forecasts. Comparison of the various time scales was based on the mean absolute error (MAE) and root mean squared error (RMSE) of ensemble mean forecasts (see Appendix A for a description of these verification measures). This comparison is shown in Table B.1. Other metrics of ensemble mean performance were found to be relatively insensitive to the choice of τ over the specified range.

For all forecast horizons, the adaptive schemes outperform LDMB3 for τ < 5.0. Day 1 forecasts show the best improvement for the shortest time scales. Day 2 does best for τ between 3.0 and 3.5, while the ideal τ for day 3 appears to be close to 3.0. Comparison for the 2009-2010 water year yielded similar results for days 1 and 2. Based on the superior performance of the adaptive bias corrector relative to the LDMB3 corrector, and on the comparison of the various time scales, M2M COMPS configurations for the Daisy Lake inflow forecasting system will use the adaptive DMB bias corrector with τ = 3.0. A sketch of such a corrector is given after Table B.1.

Table B.1: A comparison of ensemble mean inflow forecast performance after applying a DMB bias correction computed adaptively for a range of time scales (τ) and computed over a 3-day moving window using the linearly-weighted corrector described in Chapter 2. Smaller values of MAE and RMSE are preferred.

    Forecast Day  Metric  τ=2.0  τ=2.5  τ=3.0  τ=3.5  τ=4.0  τ=5.0  LDMB3
    Day 1         MAE      5.6    5.6    5.8    5.9    6.0    6.2    6.2
    Day 1         RMSE     8.8    8.9    9.0    9.1    9.2    9.5    9.7
    Day 2         MAE      7.3    7.2    7.2    7.2    7.2    7.3    8.1
    Day 2         RMSE    12.0   11.9   11.8   11.8   11.9   12.1   12.9
    Day 3         MAE      8.6    8.4    8.3    8.4    8.4    8.5    9.0
    Day 3         RMSE    13.2   12.9   12.9   12.9   13.0   13.1   14.1
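The sketch below illustrates one plausible recursive form of such an adaptive DMB corrector. The exponential-weighting update with timescale τ is an assumption for illustration, not a reproduction of the COMPS implementation [Eq. (4.1)]:

# Minimal sketch of an adaptive degree-of-mass-balance (DMB) corrector.
import numpy as np

def run_adaptive_dmb(forecasts, observations, tau=3.0):
    """Divide each forecast by a recursively updated DMB estimate."""
    dmb = 1.0                                   # start from an unbiased assumption
    corrected = np.empty_like(forecasts)
    for t, (f, o) in enumerate(zip(forecasts, observations)):
        corrected[t] = f / dmb                  # correct using information to date
        dmb = (1.0 - 1.0 / tau) * dmb + (f / o) / tau   # fold in today's ratio
    return corrected

rng = np.random.default_rng(5)
obs = 40.0 + 20.0 * rng.random(200)
fcst = 1.2 * obs + rng.normal(0.0, 3.0, 200)    # 20% wet bias plus noise

corr = run_adaptive_dmb(fcst, obs, tau=3.0)
print(fcst.sum() / obs.sum(), corr.sum() / obs.sum())  # DMB before and after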
At the start of the warm season, the MAEo and NSEo begin to outperform theLNSEo models. The moving window calculation of BMA weights is able to increase the weightsapplied to these models, but it does so with a significant time lag. The adaptive weight calculationon the other hand is unable to resurrect the weights of the MAEo and NSEo members, even with arelatively small ? value.A possible solution would be to allow each member to have its own spread parameter so thatmodels with small weights would be permitted to have large spread. Since the weights are calculatedbased on the value of the model?s forecast PDFs at the observed values during training, this greaterspread would allow the weights to recover more quickly during periods of transition between idealparameterizations. However, for the 72-member M2M ensemble, this would require fitting 144parameters, and would require a lengthy training period to ensure a good fit, resulting in a significanttime lag in weight changes. Another issue with implementing BMA for the M2M ensemble isthat the number of ensemble members for each forecast day is not necessarily constant (e.g., dueto numerical weather prediction model instabilities). When calculating weights using a movingwindow, this issue can be solved by calculating weights based only on the performance of themodels that are actually available for the next forecast. In the adaptive framework, this wouldrequire tracking thousands of possible ensemble member combinations and updating the weightsfor each combination daily. We have therefore opted to exclude BMA from further consideration.136Appendix C: Bayesian Model Averaging and the M2M Ensemble10/01 11/01 12/01 01/01 02/01 03/01 04/01 05/01 06/01 07/01 08/01 09/0100.050.1BMA WeightsWindowed BMA Weights by Day10/01 11/01 12/01 01/01 02/01 03/01 04/01 05/01 06/01 07/01 08/01 09/0100.050.1BMA WeightsAdaptive BMA Weights by Day  MM512kmMAEoMM512kmNSEoMM512kmLNSEoMM54kmMAEoMM54kmNSEoMM54kmLNSEo10/01 11/01 12/01 01/01 02/01 03/01 04/01 05/01 06/01 07/01 08/01 09/01050100150200250300Date (mm/dd)Inflow (m3 /s)WaSiM Simulated Inflows  ObservedMAEoNSEoLNSEo               NO             DATAFigure C.1: A subset of BMA weights calculated using an adaptive updating scheme (upperpanel) and a moving window (middle panel). The weights are stacked such that thickerareas represent larger weights. Results from observation-driven model runs (simula-tions) made with the different model parameterizations are shown in the lower panelwith observed inflows for comparison. Weights calculated using the moving windowchange with model performance, but with a significant time lag.137