SHORT LEAD-TIME STREAMFLOW FORECASTING BY MACHINE LEARNING METHODS, WITH CLIMATE VARIABILITY INCORPORATED by Kabir Rasouli B.Sc., Tabriz University, Azerbaijan, Iran, 2004 M.Sc., Islamic Azad University, Tehran, Iran, 2006 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The Faculty of Graduate Studies (Atmospheric Science) THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) August 2010 c© Kabir Rasouli 2010 Abstract Streamflow fluctuates as a result of different atmospheric, hydrologic, and morphologic mechanisms governing a river watershed. Variability of mete- orological variables such as rainfall, temperature, wind, sea level pressure, humidity, and heating, as well as large scale climate indices like the Arc- tic Oscillation, Pacific/North American Pattern, North Atlantic Oscillation, and El Niño-Southern Oscillation play a role on the availability of water in a given basin. In this study, outputs of the NOAA Global Forecasting System (GFS) model, climate fluctuations, and observed data from meteo- hydrologic stations are used to forecast daily streamflows. Three machine learning methods are used for this purpose: support vector regression (SVR), Gaussian process (GP), and Bayesian neural network (BNN) models, and the results are compared with the multiple linear regression (MLR) model. Lead-time for forecasting varies from 1 to 7 days. This study has been applied to a small coastal watershed in British Columbia, Canada. Model comparisons show the BNN model tends to slightly outperform the GP and SVR models and all three models perform better than the MLR model. The results show that as predictors the observed data and the GFS model outputs are most effective at shorter lead-times while observed data and climate indices are most effective at longer lead-times. When the lead- time increases, climate indices such as the Arctic Oscillation, the North Atlantic Oscillation, and the Niño 3.4 which measures the central equatorial Pacific sea surface temperature (SST) anomalies, become more important in influencing the streamflow variability. The Nash-Sutcliffe forecast skill scores based on the naive methods of climatology, persistence, and a combination of them for all data and the Peirce Skill Score (PSS) and Extreme Dependency Score (EDS) for the streamflow rare events are evaluated for the BNN model. For rare events the skill scores are better when the predictors are the GFS outputs plus locally observed data compared to cases when only observed data or any combination of observed and climate indices are chosen as the predictors. ii Table of Contents Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . viii Chapter 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Objectives of This Research . . . . . . . . . . . . . . . . . . 3 1.3 Organization of the Thesis . . . . . . . . . . . . . . . . . . . 3 Chapter 2 Literature Review . . . . . . . . . . . . . . . . . . . . 5 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.2 Application of Machine Learning Techniques . . . . . . . . . 7 2.3 Physical Processes Inherent in Machine Learning Models . . 8 2.4 Climate Variability and the Snowmelt-streamflow Modeling . 9 2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 Chapter 3 Introduction to Machine Learning Methods . . . 13 3.1 Machine Learning Methods . . . . . . . . . . . . . . . . . . . 13 3.1.1 Bayesian Neural Network for Short-term Streamflow Forecast . . . . . . . . . . . . . . . . . . . . . . . . . 16 3.1.2 Support Vector Regression Coupled with Genetic Al- gorithm (SVRGA) . . . . . . . . . . . . . . . . . . . . 17 3.1.3 Gaussian Processes (GP) . . . . . . . . . . . . . . . . 23 3.2 Input Data Preprocessing (Scaling) . . . . . . . . . . . . . . 26 3.3 Evaluation of the Model Performance and Streamflow Fore- cast Skill Scores . . . . . . . . . . . . . . . . . . . . . . . . . 26 iii 3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 Chapter 4 Case Study and Results . . . . . . . . . . . . . . . . 29 4.1 Case Study Watershed and Data Sources . . . . . . . . . . . 29 4.1.1 Case Study Watershed- Stave River . . . . . . . . . . 29 4.1.2 Data for Case Study . . . . . . . . . . . . . . . . . . . 33 4.2 Model Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 4.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . 34 4.2.2 Model Inputs Selection . . . . . . . . . . . . . . . . . 35 4.3 Model Results . . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.4 Extreme Event Forecasts . . . . . . . . . . . . . . . . . . . . 53 4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 Chapter 5 Conclusion and Recommendation . . . . . . . . . . 57 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 iv List of Tables 4.1 Data used in this study (1983-2001) from the GFS model out- put, local observation and climate indices. The GFS outputs are available twice a day at 12.00 and 24.00 . . . . . . . . . . 32 4.2 Predictor selection in 4 cases of (1) local observation only, (2)Local observations & Climate indices, (3) GFS outputs (twice a day at 12.00 and 24.00) & Local observations, and (4) GFS outputs, Local data, and Large scale climate indices for forecast lead-times of 1 to 7 days (1: predictor selected; 0: predictor not selected) . . . . . . . . . . . . . . . . . . . . 36 v List of Figures 2.1 The UBC watershed model components (regenerated from [29]) 6 3.1 The proposed flowchart for streamflow forecast in a given coastal watershed based on local observations, GFS weather forecasts, and large-scale climate signals. . . . . . . . . . . . . 14 3.2 Nonlinear SVR with ǫ-insensitive tube . . . . . . . . . . . . . 19 3.3 The range of decision variables of the genetic algorithm in binary form. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.4 The optimisation algorithm of the support vector machines’ efficiency coupling with genetic algorithm to forecast short lead-time streamflow (grey step in model training in Figure 3.1) 22 3.5 The operators of evolution a) crossover which generates the two offsprings from two parents selected by the selection prob- ability and b) mutation which changes the genes randomly. . 24 4.1 The study area in British Columbia, Canada (revised from BC Hydro’s maps). . . . . . . . . . . . . . . . . . . . . . . . . 30 4.2 Climatological monthly mean of precipitation, daily maxi- mum temperature, and streamflow discharge at the Stave River above lake station in British Columbia, Canada (1983- 2001). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 4.3 Forecast scores for the linear and non-linear machine learning models for the case where all potential predictors are considered 37 4.4 Forecast scores for the BNN model for the 4 cases where pre- dictors are observed data only, observed data and climate indices, observed and GFS output, and observed data, GFS outputs, and climate indices. . . . . . . . . . . . . . . . . . . 39 4.5 One day lead-time forecast by the BNN model with all the predictors incorporated (1998-2001); red crosses show the ob- served streamflow, solid back line shows the forecast ensemble mean, and light green lines show the 95% prediction intervals. 40 vi 4.6 Two days lead-time forecast by the BNN model with all the predictors incorporated (1998-2001); red crosses show the ob- served streamflow, solid back line shows the forecast ensemble mean, and light green lines show the 95% prediction intervals. 41 4.7 Three days lead-time forecast by the BNN model with all the predictors incorporated (1998-2001); red crosses show the ob- served streamflow, solid back line shows the forecast ensemble mean, and light green lines show the 95% prediction intervals. 42 4.8 Four days lead-time forecast by the BNN model with all the predictors incorporated (1998-2001); red crosses show the ob- served streamflow, solid back line shows the forecast ensemble mean, and light green lines show the 95% prediction intervals. 43 4.9 Five days lead-time forecast by the BNN model with all the predictors incorporated (1998-2001); red crosses show the ob- served streamflow, solid back line shows the forecast ensemble mean, and light green lines show the 95% prediction intervals. 44 4.10 Six days lead-time forecast by the BNN model with all the predictors incorporated (1998-2001); red crosses show the ob- served streamflow, solid back line shows the forecast ensemble mean, and light green lines show the 95% prediction intervals. 45 4.11 Seven days lead-time forecast by the BNN model with all the predictors incorporated (1998-2001); red crosses show the ob- served streamflow, solid back line shows the forecast ensemble mean, and light green lines show the 95% prediction intervals. 46 4.12 The normal Nash-Sutcliffe skill score (relative to climatology forecast) for the different sets of predictors. . . . . . . . . . . 49 4.13 The Nash-Sutcliffe skill score (relative to persistence fore- cast)for the different sets of predictors. . . . . . . . . . . . . . 50 4.14 The Nash-Sutcliffe skill score for the different sets of predic- tors relative to the optimal linear combination of climatology and persistence forecast. . . . . . . . . . . . . . . . . . . . . . 51 4.15 The comparison of the Nash-Sutcliffe skill score relative to climatology, persistence, and combination of both for the case that the local observation data and the GFS outputs are the predictors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 4.16 PSS and EDS scores of high streamflow forecasts using various machine learning methods (BNN, SVRGA and GP) and MLR at lead-times of 1 through 7 days for Stave River above the lake . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 vii Acknowledgements The guidance and commitment of Prof. William Hsieh as a friend and ad- visor is gratefully acknowledged. The research could not have been done without his coordination. His excellent knowledge of Atmospheric science and machine learning methods made it possible for me to work in UBC climate prediction group as a research assistant student. His constant sup- port and nice attitude allowed completion of this research without external pressure or tension. Special thanks should be given to Dr. Alex Cannon. His commitment to learning made my first Canadian experience invaluable. He should also be recognized for his coordination of the research in UBC climate prediction group and for his detailed explanation about BC Hydro and data sources. As my committee member, Dr. Ziad Shawwash deserved thanks for his detailed instruction in each phase of the research, for approv- ing my research and for inspiring me through his passion for hydrology and his excellent experience with BC Hydro. I would also like to thank to Dr. Younes Alila for his encouragement and excellent knowledge of hydrology. The hydrometric and climatic data for the research have provided by hy- drologists in BC Hydro. Here, I would like to thank them all. I also would like to thank my warmhearted friends at UBC climate prediction group Dr. Johannes Jenkner, Dr. Zhen Zeng, and Mr. Carlos Gaitan-Ospina, for their help and support. Thanks to my parents Sabileh and Manouchehr for their love and en- couragement every time I phoned them, even though we were separated by the oceans. viii Chapter 1 Introduction 1.1 Background Rivers play a vital role in water supply, agriculture, power generation, trans- portation, and flood control. To understand its mechanisms, it is worth knowing which processes are occurring during high, medium, and low rates of discharges. Streamflows can be divided into four components: overland flow occurs when precipitation exceeds the infiltration capacity of soil; shal- low subsurface streamflow represents water that infiltrates the soil but is routed relatively quickly to the stream channel; saturated overland flow oc- curs where the water table is close to the surface; and groundwater flow represents relatively deep and slow water pathways and provides water to the streamflows even when there is little or no precipitation [28]. Changes in climate and land use as well as human activities such as dam construction, channelization, logging, and urbanization led to the change in these components and consequently more frequent and intense flood- ing, depletion of groundwater, degradation in water quality and availability, extirpation of species and closure of fisheries [1, 7]. Keeping in mind all these artificial changes and the natural complicated rainfall-runoff processes, groundwater and surfacewater integration makes it difficult to get a good estimate of streamflow fluctuations and consequently the water available for allocating to different sectors such as agriculture, industry, environment, power generation purposes, and urban water supply. In terms of power generation importance in British Columbia, Canada, between 43,000 and 54,000 gigawatt-hours of energy is generated annually from 32 hydroelectric facilities and 2 thermal power plants of BC Hydro (BCH). The capacity of the integrated hydroelectric system is about 11,500 MW, 87% of which is hydroelectric. The economics of hydroelectric plants depend mainly on the reservoir’s storage capacity and the magnitude, fre- quency, the rate of change or flashiness, and predictability or timing of inflows. Accurate short-lead time streamflow forecasts of reservoir inflows enable water supply and power generation planning, scheduling and opera- tion authorities to better manage and optimize the operation of their system 1 and it helps them in making decisions on how to allocate the production of hydroelectric and thermal plants, and on how much energy to buy or sell. The short-lead time streamflow forecasts are also used for flood mitiga- tion which requires active regulation of reservoir storage for optimum use of available water resources. Red Cross data for the period 1971-1995 indicate that river and coastal floods, in an average year affect 60 million people and kill more than 12,700 humans throughout the world. The number of extreme flood disasters requiring international assistance has grown: in the years 1990-1998 it was higher than in the previous 35 years (1950-1985). In Canada, flood events have claimed the lives of at least 200 people during the 20th century, with over 2 billion in damage, which shows that floods in industrialized nations typically result in low human losses but very high economic losses [16]. There are many factors affecting the rainfall-runoff and streamflow fluc- tuation processes. The local micrometeorological factors include soil mois- ture, soil temperature, and snow budget. The latter is sensitive to tem- perature changes and it can be considered in the form of antecedent flow. Antecedent flow functions as a suitable memory of the watershed system to capture the local and physiographic features. There is usually no appropri- ate data for soil moisture and soil temperature and snow depth in higher elevations due to the lack of stations on mountains. Also, inclusion of the retrospective numerical weather forecasts by the NOAA Global Forecasting System (GFS) model [11] for accumulated precipitation, temperature, zonal and meridional winds, relative humidity, sea level pressure, and precipitable water in models [21] can be useful in terms of capturing the governing short and long-term signals in the study area. Changes in the land use, the to- pographic and morphologic features of the study area as well as climate variability have effects on the streamflow fluctuations. In this study an attempt is carried out to analyze the role of locally recorded data, GFS reforecasts, and large scale climate indices on stream- flow fluctuations in order to identify the climate variability incorporation in the hydrological records and to forecast the streamflow more accurately. This could help to reduce the concerns of operating decisions revision. To ap- proximately incorporate snow budget influences on the short-term forecasts, the difference between antecedent and present streamflows, as well as max- imum daily temperature, difference between the maximum and minimum daily temperatures, and precipitation are input into the machine learning models. The studies show that variability in snowpack and melting rates are strongly affecting the diurnal streamflow timing in small basins[19]. Snow- pack determines diurnal streamflow timing in small basins where the hour 2 of daily peak flow generally starts near midnight and then shifts earlier in the day as snow depths and travel times decrease. Sophisticated machine learning techniques which are introduced recently to environmental sciences and applied widely in different learning systems are implemented in this study. 1.2 Objectives of This Research There are many different atmospheric and hydrologic models used in the lit- erature to link climate variability to stream fluctuations. In this study, data generated by the Global Forecasting Systems, GFS model (formerly known as Medium-Range Forecast, MRF), climate indices, and observed data from local meteo-hydrological stations are used to forecast streamflows. An at- tempt is made to forecast streamflows with sophisticated machine learning techniques which are introduced after 1994 and recently applied in different learning systems. These are support vector regression combined with ge- netic algorithm (SVRGA), Bayesian neural network (BNN), and Gaussian process (GP) models, for which their skills in short-term forecasts are com- pared with those of multiple linear regression (MLR) model. The support vector regression (SVR) approach is combined with the genetic algorithm (GA), an efficient nonlinear optimization technique, where the GA is used to find the optimal hyper-parameters for the SVR model. The role of locally recorded data, the GFS model outputs, and large-scale climate indices on streamflow fluctuations is analyzed in order to capture the main signals governing a small coastal basin. 1.3 Organization of the Thesis Chapter 2 provides a literature review on streamflow forecast techniques with a special focus on the hydrometeoric studies and machine learning techniques. The literature review explores in detail the application of arti- ficial intelligent systems in snowmelt runoff modelling. Chapter 3 provides the history of machine learning methods and the mathematical background related to those used in this study. Chapter 4 provides a brief introduc- tion to the case study watershed, data availability, and the criteria used to judge model performance. It also illustrates the applications of the machine learning techniques in a case study watershed for daily streamflow fore- casting, and compares the machine learning models performance with the 3 MLR. Chapter 5 provides the conclusions and recommendations for future research. 4 Chapter 2 Literature Review 2.1 Introduction In essence, the entire process of precipitation-streamflow includes the pre- cipitation in the form of rainfall and snow based on temperature and atmo- spheric variability, soil moisture balance based on infiltration and evapora- tion, timing of different hydrologic components such as surface water, inter- flow, and groundwater, and the river and reservoir storage routing. Each of these events needs strong parameterization or simplification to be modelled which limits the hydrologic modelling. Different types of models are avail- able to tackle this problem, each with its own advantages and disadvantages. The UBC watershed model (UBCWM) introduced in 1977 [29], as an exam- ple has the parameter selection issue in evaluation of rainfall-runoff process. This model is a conceptual model which is being used operationally for forecasting short-term streamflows in basins which are subject to snowmelt floods from the mountain snowpacks. Different components of the UBC wa- tershed model are shown in Figure 2.1. This model estimates snowpack accumulation and depletion and operates entirely from meteorological in- puts of daily maximum and minimum temperatures and precipitation. Soil moisture deficit, groundwater and evapotranspiration are controlled in the model. For those basins regulated by lakes along the river or by river out- put, river and reservoir routing is available in the UBC watershed model. According to the amount of the soil moisture deficit, flow is divided into three (slow, medium, and fast) components. The fast flow occurs when the soil moisture deficit and groundwater percolation are satisfied. The medium and fast components of runoff are routed by unit hydrographs representing transformation of excess rainfall hyetograph into the corresponding direct runoff hydrograph. The slow component of runoff utilizes recession coeffi- cients. The same classification of the rising and falling limbs of flood hydro- graph is implemented in an integrated fashion of river flow prediction by Srinivasulu and Jain [33] for different categories of hydrographs. They used a conceptual model to calculate the base flow by flow recession with adap- 5 Figure 2.1: The UBC watershed model components (regenerated from [29]) 6 tive coefficients and infiltration with the Green-Ampt method. This model also uses a nonlinear reservoir for flood routing and the time area method to translate the effective rainfall input in time. They trained artificial neu- ral networks (ANNs) using elitist real-coded genetic algorithm (ERGA) for rainfall-runoff modeling. For model fusion, the current flow is considered rising if the flow in the previous consecutive two time steps is rising or effec- tive rainfall of the previous time step is greater than zero. Also, the current flow is considered falling if the flow in the previous consecutive two time steps is falling. Most of the conceptual models such as the UBCWM model are suffering from higher parameterization issues. It means one needs to set different parameters so that be able to simulate the rainfall-runoff processes properly which is very difficult due to the complexity of these processes. In compari- son, the newer machine learning models have generalization ability and are more efficient in testing periods even better than some of the conventional models such as neural networks. The machine learning methods can iden- tify the patterns in time series very well. Li [18] compares the performance of the daily ANN model with the operational UBCWM model and hourly ANN models with the Multi-Input Single Output Linear Model (MISOLM), which is a blackbox model. The overall results of the research show that the ANN technique is practicable and effective for realtime streamflow and flood forecasting; the ANN models have higher simulation accuracy than the other referenced models. 2.2 Application of Machine Learning Techniques Moradkhani et al. [24] built a hybrid ANN model (SORB) by combining two ANN architectures: Self Organizing Feature Map (SOFM) and Radial Basis Function (RBF), for one step ahead streamflow forecasting. van Hinsbergen et al. [36] combined the neural networks in an ensemble using Bayesian inference theory and introduced an evidence factor for each model. This can be used as a stopping criterion during training, and as a tool to select and combine different neural networks and also to draw prediction intervals instead of just one predicted value. Wu et al. [41] proposed a hybrid genetic-based SVR model, HGA-SVR, which can optimize the SVR hyperparameters integrating the real valued and integer genetic algorithm, for increasing the predictive accuracy and ca- pability of generalization compared with traditional machine learning mod- els. 7 2.3 Physical Processes Inherent in Machine Learning Models After significant success in the applications of conventional artificial neural networks (ANN) in various aspects of hydrology, the motivation to get rid of the blackbox nature of ANN and explain ANN in the hydrology termi- nology has started to emerge. Recent researches were focused on relating ANN behavior to physical processes, relating ANN architecture to exist- ing hydrological theory or even determining the ANN architecture based on geomorphologic information. Zhang and Govindaraju [43] demonstrated how geomorphologic infor- mation could be incorporated into ANN architecture. In their research, Geomorphologic Instantaneous Unit Hydrograph (GIUH) provided a the- oretical basis for part of the architecture of Geomorphologic-based ANN (GANNs). They interpreted the number of hidden neurons of a three layer BPNN by the number of possible paths in GIUH and therefore, it can be determined by geomorphologic information. The weights of connections be- tween hidden layer and output layer are represented by path probability in GIUH theory and determined by geomorphologic information. The weights of connections between input layer and hidden layer are analogous to the transient holding time when compared with GIUH theory. The inputs to GANN are current and past rainfall excesses. The number of previous rain- fall excesses data points was determined by a trial and error method. The application of GANN in two small watersheds in southern Indiana showed that the GANN is superior to GIUH. What is meaningful in the research is that the GANN has changed the normal structure of the ANN model, as it evaluates ANN from a purely empirical model to, 15 somewhat, physically based model. Pan and Wang [27] derived the unit hydrograph (UH) from the weights of the network. They employed dynamic recurrent neural net- works (also: state space neural networks, SSNN) for hourly time step flood forecasting in the WuTu watershed (204 km2 in Taiwan). The research built connections between physical concepts and ANN parameters by deriving UH from the weights of the network. The research revealed the physical concept in networks, which is traditionally thought to be a blackbox model as it lacks any physical meaning of the weights. The performance of SSNN for short term rainfall-runoff forecasting indicated that the dynamic recurrent network is an appropriate tool. Jain et al. [15] showed that ANN is able of capturing certain physical behaviour of rainfall-runoff processes. The results suggest that the hidden 8 neurons in the ANN rainfall-runoff model approximate various components of the hydrologic system, such as infiltration, base flow, and delayed and quick surface flow, etc., and represent the rising limb and different por- tions of the falling limb of a flow hydrograph. They applied a three layer back propagation ANN with four hidden neurons in the Kentucky River Basin (10239 km2,USA) to simulate five hydrological processes simultane- ously and compared the ANN model with the Conceptual Rainfall Runoff model (CRR). The CRR model consists of two components and has the ability to simulate total flow, base flow, surface flow, soil moisture and in- filtration processes simultaneously. The comparison between ANN behavior (hidden neuron output) and output of CRR model was done by means of correlation analysis and graphical comparison. The comparison indicated that both the ANN model and CRR are good and the performance of the ANN model is better than that of the CRR model. The ANN model captures different components of the physical rainfall-runoff process being modelled through its massively distributed structure. Those four hidden neurons in the ANN model were found to model different components of a hydrological process individually. The inference from the study strongly suggests that an ANN model is not necessarily a blackbox model as it is able to capture certain physical behavior in rainfall-runoff processes. Yu et al. [42] proposed a real-time (one to four hours ahead) flood stage forecasting system based on support vector regression (SVR). They applied a two-step grid search to find the hyperparameters of SVR and also hydro- logical concept of concentration time which is the time difference measured between the center of rainfall hyetograph and the peak stage to determine the order of input predictors. 2.4 Climate Variability and the Snowmelt-streamflow Modeling Changes in the land use, the topographic and morphologic features of the study area as well as climate variability have direct or indirect effects on the streamflow fluctuations. In coastal regions the main part of inflow re- sults from seasonal storms and spring snowmelt. To simulate and forecast the streamflows accurately, study of climate variability is inevitable. Burn [4] studied the climate influences on streamflow timing. He showed that in headwater basins the spring freshet is occurring earlier with timing shift. The El Niño-Southern Oscillation (ENSO), which involves sea surface tem- perature (SST) oscillation in the Pacific Ocean, is associated with low and 9 high winter precipitation totals (for El-Niño warm episodes and La-Niña cool episodes, respectively) in the Pacific Northwest and the opposite for the desert Southwest, USA [17]. Pacific Decadal Oscillation (PDO) which refers to the time history of the leading eigenmode of the North Pacific SST [22] and ENSO have great effects on runoff timing in Mackenzie River basin in Canada and Upper Klamath Lake, Oregon, USA [4, 17] and on American and European climate variations [21]. Kennedy et al. [17] found that runoff timing shift and current trends in climate variability suggest that the inclusion of large-scale climate variables in regression based streamflow models improves the accuracy and robustness of the early season models during a time when total snow accumulation is incomplete. In order to reduce the streamflow forecast uncertainty, they an- alyzed six large-scale climate indices including the Pacific North American Pattern, Southern Oscillation Index, PDO, Multivariate El Niño-Southern Oscillation Index, Niño 3.4, and a revised Trans-Niño Index (TNI) to see their effects on inter-annual variation of the major hydrologic inputs in Up- per Klamath Lake, Oregon, USA. For streamflow forecasting, it is useful to compute a seasonal TNI series instead of a snapshot by a single month that reflects the seasonal character of the Pacific Ocean SST gradient. North Atlantic Oscillation (NAO) is usually defined by the normalized difference in surface pressure between a station in the Azores and a station in Iceland. Makkeasorn et al. [21] studied the global climate variability ef- fects on short-term streamflow forecasting using genetic programming and neural networks. They emphasized the inclusion of sea surface temperature in addition to the spatio-temporal rainfall distribution via the Next Gen- eration Radar (NEXRAD), meteorological data via local weather stations, and historical streamflow data via gauge stations to forecast discharges in a semi-arid watershed in Texas, USA. The resultant of the driving forces and climate variability cause the complexity and highly non-linearity of short- term river streamflows. Schaeffer et al. [31] investigated the climate change impact not only on the changes in the mean of the climatic variables but also on the extremes. They showed that in coastal areas changes in the one-in-10-year tempera- ture events are largely attributed to changes in large-scale circulation, for extremes linked to infrequent wind directions. Temperature extremes in dif- ferent regions were found to be linked by a large-scale circulation anomaly pattern, which resembles the Arctic Oscillation (AO). They also found that for coastal areas, a large-scale change in the mean wind direction causes disproportionally large changes in the extremes associated with infrequent wind directions which are due to a shift towards a preferred positive phase of 10 North Atlantic Oscillation (NAO). Thus, changes in the mean and extremes of climate indices influence to the variations of local micrometeorological factors and consequently on the mean and extremes of river flow. Ambaum et al. [2] investigated the definition and interpretation of the AO and NAO indices. They showed that the NAO reflects the correlation between the surface pressure variability at two centers of action, whereas the AO does not. Hemispheric empirical orthogonal function (EOF) analy- sis on lower tropospheric fields yields very different and unrelated patterns. However, the leading two EOFs of the 850-hPa streamfunction are clearly representative of Pacific North America pattern (PNA) and NAO, respec- tively. An increase in the AO implies a strengthening of the polar jet over northwest Europe and of the subtropical jet there but a weakening of the Pacific jet. This study suggests that in comparison to the AO, it is the NAO and PNA patterns that have separated physical patterns which have the predictability capacity. Fleming et al. [8] investigated teleconnections between AO and monthly streamflow for the glacier and snowmelt-fed rivers in British Columbia and the Yukon, in northern Canada. A competent predictor selection turns out to be an integral part of the skilful forecast models. Clark and Hay [6] applied medium-range numerical weather prediction model output to produce forecasts of streamflow. They found that the fore- cast biases of the Medium-range forecast model (MRF) recently known as the Global Forecast System (GFS) is very high and in some parts of the study area exceeds 100% of the mean of precipitation. This outlines the need for additional processing before using the GFS output for hydrologic forecasts. The model output statistics (MOS) method is used for downscaling the GFS outputs to the local stations into US. The forecasted atmospheric variables (e.g., total column precipitable water, 2-m air temperature) are used as a predictor in a forward screening multiple linear regression model to improve local forecasts of precipitation and temperature in stations. These refore- casts are then entered to the U.S. Geological Survey’s Precipitation-Runoff Modeling System (PRSM) to forecast 8-day lead-time streamflow. This literature review found that the research on the use of machine learning techniques for snowmelt runoff modelling is limited and that the existing researches have shown the robustness of these methods in this area. One of the earliest publications [35] in this area showed that daily ANN model that uses daily precipitation, temperature and snow water equiva- lent (SWE) as inputs provides a high degree of accuracy when compared with a regression model and with a simple conceptual model. Their results indicated that temperature is an important indicator for snowmelt runoff 11 simulation. 2.5 Summary Interaction between atmosphere, ocean, and water resources over the land plays the main role on the availability and variability of water in different climate regimes. Recent studies show an increase of interest on the use of a variety of atmospheric variables as predictors in hydrologic forecasting studies, since the emphasize in conventional methods was only on using lo- cal recorded data as predictors. In this study, the atmospheric variables reforecasted by a numerical weather prediction model as well as indices of climate variability are added to the local observation as predictor in sev- eral machine learning methods. To indirectly incorporate the snow budget influences on the short-term forecasts, the difference between antecedent and present streamflows, as well as the maximum daily temperature, daily temperature changes, and precipitation were used in the machine learning models, as streamflow difference is assumed to represent the snow memory of the river system. Also, temperature changes during a day is important as it controls the snowmelt processes that affects the freezing and thawing of the snow pack. 12 Chapter 3 Introduction to Machine Learning Methods 3.1 Machine Learning Methods The history of learning machines can be classified into four periods [39]: Constructing (i) the first learning machines, (ii) the fundamentals of the theory, (iii) neural networks, and (iv) the alternatives to neural networks. In this study, we will focus in the latter period and introduce two alterna- tives to neural networks. The steps that were carried out in this study are summarized as flowing: 1. Data processing for 4 combinations of 3 categories of the GFS output, local observations, and large-scale climate indices and normalizing the predictand using log transform. 2. Predictor selection based on the stepwise regression method in the Matlab toolbox. 3. Multi-Linear Regression (MLR) model training. 4. Bayesian neural network training using cross-validation and ensemble average. 5. Gaussian process model training. 6. Support vector regression model with genetic algorithm optimization training. 7. Comparing the results obtained from steps 3 through 6. 8. Repeat the steps 2 through 7 for 4 combined data sets of GFS outputs, local observations, and climate indices. 13 Figure 3.1: The proposed flowchart for streamflow forecast in a given coastal watershed based on local observations, GFS weather forecasts, and large- scale climate signals. 14 Figure 3.1 illustrates the flowchart of this study in more detail. Four combinations of local observation, the GFS outputs and climate indices are as follows: • Only locally observed data including daily precipitation and stream- flow, maximum temperature and temperature changes during the day • Observed data and the GFS outputs • Observed data and climate indices • Observed data, the GFS outputs, and climate indices Final predictors are selected by the forward stepwise regression method using the F test which is a statistical test in which the test statistic has an F-distribution under the null hypothesis. It is usually used to compare the statistical models that have been fit to a data set to identify the best model. Following this sequence of steps, the potential usefulness of predictor categories, such as climate indices and local observations are investigated in detail. In conventional neural network modelling [13], a network is trained using a data set (x, y), where x are the predictors and y the predictand, by ad- justing network parameters or weights (including offset or bias parameters) w so as to minimize an objective function (also called cost or loss or error function), such as, E(w) = CED + Ew, (3.1) ED = 1 2 N∑ n=1 (y(xn;w)− tn)2, Ew = ∑ v (αvEw,v), where ED is an error function measuring how close the network output y(x,w) is to the target t, Ew is the weight penalty term to prevent overfitting (i.e. fitting to the noise in the data), and (v) is number of the group of weights. n and N are the order and total number of the training data. The parameters αv control the extent to which the regularizer influences the solution. C is the inverse weight penalty parameter. A small C will strongly suppress the magnitude of w found by the optimization process, thereby yielding a less complex (i.e. less nonlinear) model. The best value for C is 15 commonly chosen upon validating the model performance over independent data not used in training the model. With the optimal C, the model should be neither overfitting nor underfitting the data. This principle is called Empirical Risk Minimization (ERM)[10]. 3.1.1 Bayesian Neural Network for Short-term Streamflow Forecast An alternative to validation in order to find the best value for C is Bayesian neural network (BNN) [20], a neural network based on a Bayesian proba- bilistic formulation. The idea of BNN is to treat the network parameters or weights as random variables, obeying an assumed prior distribution. Once observed data are available, the prior distribution is updated to a posterior distribution using Bayes’ theorem. BNN automatically determines the op- timal value of C without the need of validation data [13]. The Bayesian training method is not based on maximum likelihood technique, but on Bayes’ theorem. It has some advantages: First, error bars can be assigned to the predictions of a network. Second, an automatic procedure for finding C is used, and since all the data are used for training, better models may result [36]. In the structure of Bayesian training, the parameters are not just single values, but a distribution of values representing the various degrees of belief. The goal is then to find the posterior probability distribution for weights after observing the dataset D, denoted by p(w|D), p(w|D) = p(D|w)p(w) p(D) , (3.2) where p(D) is the normalization factor, p(D|w) represents a noise model on the target data and corresponds to the likelihood function, and p(w) is the prior probability of the weights. Gaussian forms of the prior and the likelihood function are as following: p(w) = 1 Zw(α) exp ( − ∑ v αvEw,v ) , (3.3) p(D|w) = 1 ZD(β) exp ( − β 2 N∑ n=1 (y(xn;w)− tn)2 ) , (3.4) where Zw = ∫ exp(− ∑ v αvEw,v)dw and ZD = ∫ exp(−βED)dD are nor- 16 malizing constants and α = (α1, . . . , αv) and β are called hyperparameters as they control the distribution of the other parameters, i.e. the weights w of the network. The prior has zero mean and variances 1/αv for every group (v) of weights, the likelihood function has zero mean and variance 1/β. It can be seen that the exponents in Eqs. 3.3 and 3.4 take the form of the error functions Ew and ED, respectively in Eq. 3.1, resulting in an expression for the posterior: p(w|D) = 1 Zs(α, β) exp (−E(w) (3.5) E(w) = βED + ∑ v αvEw,v where Zs(α, β) = ∫ exp(−E(w))dw is a normalizing constant. Consider now the maximum of the posterior distribution, wMP (the most probable value of weight vector). Although, the most probable values for the weights (the “peak” of the posterior distribution) can be found using normal back- propagation, the entire posterior distribution needs to be evauluated to gen- erate for example error bars on the predictions or to construct an ensemble of networks. Therefore, the posterior needs to be approximated, for example by a Taylor expansion. This estimation of the posterior distribution of the weights can be used to construct error bars and to create an ensemble of networks [36]. In summary, first, different networks with different initial weight values are constructed, then for a model, the initial weight values for the hyper- parameters from their priors are drawn. Next, the model is trained by a scaled conjugate gradient algorithm. Every step of the algorithm, values of the hyperparameters are re-estimated. The stop criterion is calculated for each network every few epochs. In neural networks applications, the number of input and output neurons is completely determined by the research problem itself while the number of hidden neurons is arbitrary. In general, more hidden neurons will provide a network greater capacity to capture the underlying relationships in training data. 3.1.2 Support Vector Regression Coupled with Genetic Algorithm (SVRGA) Conventional neural network approaches have suffered difficulties with gen- eralization, producing models that can overfit the data. This is a conse- 17 quence of the optimisation algorithms used for parameter selection and the statistical measures used to select the ‘best’ model. The foundations of Sup- port Vector Machines (SVM) have been developed and are gaining popular- ity due to many attractive features, and promising empirical performance. The formulation embodies the Structural Risk Minimization (SRM) prin- ciple, which has been shown to be superior, to traditional Empirical Risk Minimization (ERM) principle, employed by conventional neural networks [10]. SRM minimises an upper bound on the expected risk, as opposed to ERM that minimises the error on the training data. It is this difference which equips SVM with a greater ability to generalize, which is the goal in statistical learning. SVMs were first developed to solve the classification problem, and then extended to regression problems [38]. In the SVR algorithm, the input data are mapped into the higher dimen- sional feature space, in which the training data may exhibit linearity, and then, linear regression in this feature space is performed. Let xi be mapped into a feature space by a nonlinear function Φ(xi); the regression function becomes [42] f(w,b) = w · Φ(x) + b. (3.6) The tolerated errors within the extent of the ǫ-tube, as well as the penal- ized losses Lǫ when data are outside of the tube, are defined by Vapnik’s ǫ-insensitive loss function as Lǫ(yi) = { 0 for |yi − (w · Φ(xi) + b)| ≤ ǫ |yi − (w.Φ(xi) + b)| − ǫ for |yi − (w · Φ(xi) + b)| > ǫ. (3.7) Figure 3.2 presents the concept of nonlinear SVR. The nonlinear regression problem and its dual form can be expressed as the following optimization problems: min w,b,ξ,ξ∗ 1 2 w2 + C l∑ i=1 (ξi + ξ ∗ i ), subject to yi − (w.Φ(xi) + b) ≤ ǫ+ ξi, (w.Φ(xi) + b)− yi ≤ ǫ+ ξ ∗ i , ξi, ξ ∗ i ≥ 0, i = 1, 2, . . . , l, (3.8) 18 Figure 3.2: Nonlinear SVR with ǫ-insensitive tube 19 and min αi,ᾱi 1 2 l∑ i,j=1 (αi − ᾱi)(αj − ᾱj)〈Φ(xi).Φ(xj)〉+ ǫ l∑ i=1 (αi + ᾱi)− l∑ i=1 yi(αi − ᾱi), subject to l∑ i=1 (αi − ᾱi) = 0, 0 ≤ αi ≤ C, i = 1, 2, . . . , n, 0 ≤ ᾱi ≤ C, i = 1, 2, . . . , n, (3.9) where ξi and ξ ∗ i are slack variables that specify the upper and the lower training errors subject to an error tolerance ǫ and C is a positive constant that determines the degree of penalized loss when a training error occurs. In this optimization problem, most data examples are expected to be in the ǫ-tube. If a data example is outside the tube, then an error ξi or ξ ∗ i exists, which is to be minimized in the objective function. αi and ᾱi, are Lagrange multipliers which enable the optimization problem to be solved more easily in the dual form. The computation in the input space can be performed using a “kernel” function K(xi,xj) = 〈φ(xi).φ(xj)〉 to yield the inner products in the feature space, circumventing the problems intrinsic in evaluating the feature space as φ can be a very high or infinite dimensional vector. Performance of the SVR model depends on the choice of the kernel func- tion and the hyperparameters. In this study, we use the Gaussian or radial basis kernel function (RBF) kernel, K(x,xi) = exp[− ||x− xi|| 2 2σ2 ], (3.10) with the hyperparameter σ controlling the width of the Gaussian function. SVR with the RBF kernel (SVRRBF) is a nonlinear regression model but with the robust ǫ-insensitive error norm instead of the non-robust mean squared error norm as used in MLR. Finally, the kernel function allows the regression function of nonlinear SVR to be expressed as follows: f(xi) = l∑ i=1 (−αk + ᾱk)K(xi,xk) + b. (3.11) 20 We also tested the linear kernal, the polynomial kernel and the sigmoidal kernel. We used the SVR codes written by Chang and Lin in 2001, download- able from the LibSVMwebsite (http:www.csie.ntu.edu.tw/cjlin/libsvm). The hyperparameters are initially estimated according to [5], and then optimized by genetic algorithms (GA) [9, 12]. For this purpose, the GA optimization model is coupled with SVR to find the optimal hyperparameters and type of kernel function used. Genetic algorithms are a particular class of evolu- tionary algorithms that use techniques inspired by evolutionary biology, e.g. inheritance, mutation, selection, and crossover. GAs is well suited to ma- nipulate the models with varying structures since they can search non-linear solution spaces without requiring gradient information or a prior knowledge of model characteristics. In this study, we tried the grid search method as well, but in comparison to the GA the results were poor and in some cases (e.g. small grid size) it needs longer time than the GA method does. The support vector regression model shown in grey in the flowchart is given in more details here. As discussed above, to find the optimal hyper- parameters and the kernel type of the SVR model, the genetic algorithm as a search approach is implemented in this study. According to this approach, first chromosome of decision variables is defined in binary form as shown in Figure 3.3. Each chromosome includes 4 genes of kernel type, and the 3 Figure 3.3: The range of decision variables of the genetic algorithm in binary form. SVR hyperparameters (C, σ and ǫ). The range of the variables is given in the figure. Figure 3.4 represent the proposed methodology for obtaining the optimal SVR model. As an evolutionary technique, GA starts with the initial generation of individuals or chromosomes. The initial generation are generated adapting to the noise variance by the K-nearest neighbour (KNN) method [5]. According to this, SVR hyper- 21 Figure 3.4: The optimisation algorithm of the support vector machines’ effi- ciency coupling with genetic algorithm to forecast short lead-time streamflow (grey step in model training in Figure 3.1) 22 parameters C, σ, and ǫ are estimated by C = max(ȳ + 3std(y), ȳ − 3std(y)), (3.12) σ = √√√√ n 15 k n 1 5 (k − 1) ∑ (y − yKNN)2 2 , (3.13) ǫ = 3σ √ log(n) n , (3.14) where y and yKNN denote the training data and the corresponding values from the KNN scheme, respectively, with n the number of training samples and k the number of neighbours. The fitness value of the optimization problem is the negative of the mean square error between the SVR forecast and the observation. After initializing the first generation, the fitness value is computed for all chromosomes of the generation and the best chromosome corresponding to highest fitness is kept. Then, the two operators of crossover and mutation are carried out over the current generation to get the new generation or offsprings. The definition of these operators is illustrated in Figure 3.5. Crossover and mutation probability determine random location on the DNA for cutting and bit-flipping, respectively. After imposing the operators to get a new generation, the fitness value is again determined and the best chromosome is compared to that of the prior generation and then the best chromosome is retained. The evolution process is repeated till the termination factor is reached. The final best chromosome gives the optimal value for the decision variables. By using these variables, one can train the SVR model to forecast the streamflow for different combinations of predictors. The entire process is repeated for forecast lead-times of 1 through 7 days. 3.1.3 Gaussian Processes (GP) In statistics, one is usually interested in understanding the nature of data and their relations in the form of models such as linear relations or interde- pendencies to summarize them. However, in machine learning the objective is twofold: formulate the best possible forecast and understand the learn- ing algorithms. Gaussian processes (GP) using these two characteristics provide a principled, practical, probabilistic approach to learning in kernel machines. Rasmussen and Williams [30] show how GP can be interpreted as a Bayesian version of the support vector machine methods. Gaussian 23 Figure 3.5: The operators of evolution a) crossover which generates the two offsprings from two parents selected by the selection probability and b) mutation which changes the genes randomly. 24 process is a collection of random variables, any finite number of which have consistent joint Gaussian distributions. A GP is fully specified by its mean function m(x) and covariance function k(x, x́). This is a natural generaliza- tion of the Gaussian distribution whose mean and covariance is a vector and matrix, respectively. The Gaussian distribution is over vectors, whereas the Gaussian process is over functions. We will write [3]: f ∼ GP (m,k) (3.15) meaning:“the function f is distributed as a GP with mean function m and covariance function k”. In order to understand this process, samples from the function f can be drawn. To work only with finite quantities, only the value of f at n specific locations is chosen. Given the x-values we can evaluate the vector of means and a covariance matrix which defines a regular Gaussian distribution. To clarify the distinction between process and distribution m and k are used for the former and µ and Σ for the latter. We can now generate a random vector from this distribution. This vector will have as coordinates the function values f(x) for the corresponding x’s: f ∼ N(µ,Σ) (3.16) The GP is used as a prior for Bayesian inference. The prior does not depend on the training data, but specifies some properties of the functions. The goal is to derive the simple rules of how to update this prior in the light of the training data. One of the primary goals computing the posterior is that it can be used to make predictions for unseen test cases. Let f be the known function values of the training cases, and let f∗ be a set of function values corresponding to the test set inputs, X∗. Again, we write out the joint distribution of everything we are interested in:[ f f∗ ] ∼ N ( [ µ µ∗ ] , [ Σ Σ∗ ΣT ∗ Σ∗∗ ] ) . (3.17) where we’ve introduced the following shorthand: µ = m(xi), i = 1, . . . , n for the training means and analogously for the test means µ∗; for the covariance we use Σ for training set covariances, Σ∗ for training-test set covariances and Σ∗∗ for test set covariances. Since we know the values for the training set f we are interested in the conditional distribution of f∗ given f which is expressed as: f∗|f ∼ N(µ∗ +Σ T ∗ Σ−1(f − µ),Σ∗∗ −Σ T ∗ Σ−1Σ∗). (3.18) 25 This is the posterior distribution for a specific set of test cases. It is easy to verify (by inspection) that the corresponding posterior process is: f |D ∼ GP (mD, kD), (3.19) mD(x) = m(x) + Σ(X,x) TΣ−1(f −m), kD(x, x́) = k(x, x́)− Σ(X,x) TΣ−1Σ(X, x́), where Σ(X,x) is a vector of covariances between every training case and x. These are the central equations for GP predictions. The only problem in GP application is its memory and computational complexities which mainly appear when the dataset size, n, is high (more than a few thousands). The memory complexity is of O(n2) and computational complexity is of O(n3) due to the size of covariance matrix inversion. 3.2 Input Data Preprocessing (Scaling) A machine learning technique is theoretically able to handle raw data. How- ever, without properly transforming the input data, the training process and the training result may not be the most efficient one. If variables in the train- ing dataset vary greatly in magnitude, the weights and biases have to adapt to the difference in magnitude. The resulting weights and biases are not necessarily reflecting the importance of the input variables and will make the training algorithm run inefficiently. Input data preprocessing/scaling is an efficient way to solve the problem. The commonly used scaling methods include: (i) linear transformation, (ii) statistical standardization, and (iii) mathematical functions (e.g. “log” transform). The transformation usually scales input into certain range between 0 and 1 or even 0.15 to 0.85. Nayak et al. [26] indicated that transferring input variable to normally distributed time series will improve overall neural network performance. Smith and Eli [32] transferred a hydrograph into Fourier series with 21 coefficients that were simulated by a neural network, and found that the prediction of the entire hydrograph to be very accurate for multiple storm events. 3.3 Evaluation of the Model Performance and Streamflow Forecast Skill Scores Verification and evaluation of the linear and nonlinear models performance introduced in the previous section for different schemes is carried out using the pearson correlation coefficient (Corr), mean absolute error (MAE), root 26 mean square error (RMSE), and the Nash-Sutcliffe model efficiency coef- ficient (E) for the entire time series of forecasts and observations, and the Peirce Skill Score (PSS), and Extreme Dependency Score (EDS) for the rare intense events in the streamflow (Figure 3.1). While Corr is a good measure- ment of linear association between the forecasts and observations, it does not take forecast bias into account, and is sensitive to rare events. MAE measures the average magnitude of the forecast errors. Compared to the RMSE, MAE is a much cleaner, hence preferable, measure of the average error [40]. These indices for forecast lead-times of 1 to 7 days are analyzed for all models built on different data sets, and we would expect the nonlinear models to perform better than the multiple linear regression (MLR) model due to the nonlinearity in the rainfall-runoff processes. Skill scores measure the accuracy of the forecasts of interest relative to the accuracy of forecasts based on naive forecasting methods such as cli- matology or persistence. Naive methods which produce the most accurate forecasts are chosen as the reference standard. A climatological forecast is simply the mean of the meteorological variable of interest over an ap- propriate record period and persistence forecast simply gives the value of the variable at time of the forecast for future times. Linear combination of climatology and persistence is considered as a method to produce more accurate forecasts [25]. Here, fi = kx ◦ i + (1− k)x̄, (3.20) denotes a convex linear combination of the persistence forecast (x◦i ) and climatological forecast (x̄) on the ith occasion (i = 1, . . . , n), where k repre- sents a constant (0 ≤ k ≤ 1). When k = 0 the combined forecast is identical to a climatological forecast, when k = 1 the combined forecast is identical to a persistence forecast, and when 0 < k < 1 the combined forecast is a linear combination of climatology and persistence. k is computed based on the au- tocorrelation coefficient of streamflow. In this study, fi is substituted into the Nash-Sutcliffe skill score and the accuracy of the forecasts for 1 to 7 days lead-times are calculated. The revised Nash-Sutcliffe equation becomes: E = 1− ∑ (xobs − xfor) 2∑ (xobs − fi)2 , (3.21) E = 1− MSEfor MSEref , (3.22) where xobs denotes the recorded variable of interest, here streamflow, and 27 xfor is the associated forecasted value and MSEfor represents the machine learning model forecast error and MSEref is the naive model error where the reference can be climatology, persistence or their combination. 3.4 Summary Nonlinearity of the atmospheric-hydrologic process events in general and the rainfall-runoff process in particular require the use of skillful non-linear machine learning models to reveal to some extend the main processes occur- ring during the conversion of precipitation into the streamflow. To obtain this, climate indices, global forecasting system model outputs including dif- ferent atmospheric variables, as well as locally recorded data are selected to forecast the short-term streamflow in a given small coastal watershed. Machine learning methods used in this study are Bayesian neural networks, support vector regression coupled with genetic algorithm, and Gaussian pro- cess. Creating an ensemble of forecasts during Bayesian training increases the forecast accuracy and presents a range of forecast values instead of a sin- gle value. To evaluate the performance of the models, forecast skill scores are calculated based on climatology, persistence, and a combination of both. 28 Chapter 4 Case Study and Results 4.1 Case Study Watershed and Data Sources 4.1.1 Case Study Watershed- Stave River To explore the applicability of machine learning methods to streamflow fore- casting, the Stave River above the lake was selected as a case study water- shed. The catchment is a coastal watershed located in southern British Columbia, Canada. The drainage area of the Stave River above the lake is about 282 km2. The drainage basin of the Stave River catchment and its geographic features and position in British Columbia, Canada, are shown in Figure 4.1. The period of measurement record used in this study is 1983- 2001. According to the climatology monthly flow in this gauge, June has the highest flow volume and December the lowest, which shows the snowmelt- fed flow regime in this catchment, since it has higher values in summer and lower in winter. Figure 4.2 represents the monthly distribution of pre- cipitation, temperature, and streamflow at the Stave River above the lake station. The major source of precipitation in the Stave basin comes during November to June from frontal southwesterly flows of warm moist air aloft. The abrupt rise of the Coast Mountains above the flat or rolling topogra- phy of the Lower Fraser Valley has a significant influence on the moist air, causing about the two thirds of the annual precipitation to occur during this period. Sources of precipitation in the summer are generally weak frontal or convective storms. The monthly average of the variables implies the higher correlation of the streamflow rates with the maximum daily temperature as a consequence of the snowmelt in this watershed during spring and sum- mer seasons. The annual average discharge is 33.78 m3s−1 and total annual precipitation is approximately 3041 mm. The annual mean of the daily maximum temperature and daily average temperatures are 10.3 and 6.8◦C, respectively. 29 Figure 4.1: The study area in British Columbia, Canada (revised from BC Hydro’s maps). 30 Figure 4.2: Climatological monthly mean of precipitation, daily maximum temperature, and streamflow discharge at the Stave River above lake station in British Columbia, Canada (1983-2001). 31 Table 4.1: Data used in this study (1983-2001) from the GFS model output, local observation and climate indices. The GFS outputs are available twice a day at 12.00 and 24.00 No. Variable Index Unit Type 1 maximum temperature Tmax ◦C Obs. 2 temperature change Tmax−min ◦C Obs. 3 precipitation P mm Obs. 4 1 day antecedent flow Qt−1 m 3s−1 Obs. 5 2 day antecedent flow Qt−2 m 3s−1 Obs. 6 seasonal phase sin& cos(phases) - - 7 Arctic Oscillation AO - Clim. 8 Antarctic Oscillation AAO - Clim. 9 North Atlantic Oscillation NAO - Clim. 10 Pacific North American Pattern PNA - Clim. 11 sea surface temperature(Niño 3.4) SST - Clim. 12 sea surface temperature(Niño 3.4) SSTA - Clim. 13 Pacific Decadal Oscillation PDO - Clim. 14 accumulated precipitation apcp mm GFS 15 heating heating J GFS 16 air pressure at 500m z500 pa GFS 17 mean sea level pressure prmsl pa GFS 18 precipitable water pwat - GFS 19 relative humidity rhum % GFS 20 2m air temperature t2m K GFS 21 temperature temp700 K GFS 22 wind amplitude at 10m wind Amp ms−1 GFS 23 wind phase at 10m sin(phase) - GFS 24 wind phase at 10m cos(phase) - GFS 32 4.1.2 Data for Case Study In this study, the observed data from meteo-hydrological stations, reforecast dataset generated by the GFS model (formerly known as Medium-Range Forecast, MRF) available twice a day at 12.00 and 24.00, and the large-scale climate indices are used to forecast streamflows. Reforecast data set have been used for predicting with observed analogs [6], studying atmospheric predictability, integrating into the numerical weather prediction models, and diagnosing model bias where bias is the mean forecast minus the mean veri- fication [11]. The pool of potential predictors important for streamflow fore- casting is depicted in Table 4.1. Data sources include rawinsonde profile, surface marine reports from the Comprehensive Ocean-Atmosphere Data Set (COADS), aircraft observations of wind and temperature, synoptic reports of surface pressure over land, vertical temperature profiles form the Tele- vision Infrared Operation Satellite (TIROS) Operational Vertical Sounder (TOVS) over the ocean, TOVS temperature sounding over land above 100 hPa, surface wind speeds from the special Sensor Microwave Imager (SSM/I) and satellite cloud drift winds [6]. The El Niño-Southern Oscillation (ENSO) as one of the climate indices is an important source of inter-annual variability in weather and climate with a periodicity of 2-7 years. The Niño 3.4 index is used to represent ENSO activity as it measures the central equatorial Pacific sea surface tempera- ture (SST) anomalies [4]. The Pacific Decadal Oscillation (PDO) has been shown to have strong connections with hydrological variables, particularly in the Pacific Northwest. The PDO alternates between warm and cool phases every 20-30 years. For parts of North America, warm phases are generally associated with warmer and drier winters while cool phases are associated with cooler and wetter winters. The PDO was in a warm phase from 1976 until 1999 when it returned to a cool phase. In this study, the PDO time se- ries have been used from the website ftp://ftp.atmos.washington.edu /man- tua/ pnw impacts/INDICES/PDO.latest. The Arctic Oscillation (AO) is the leading principal component of sea level pressure anomalies north of 20◦ latitude. The AO has been reported to have a strong influence on condi- tions over the central Arctic Ocean and the North Atlantic basin and lesser influence over the North Pacific basin. The AO is similar in structure to the North Atlantic Oscillation (NAO). The NAO is the normalized difference in surface pressure between a station in the Azores and a station in Iceland. The NAO has been found to be related to changes in temperature and pre- cipitation in North America. In the North Atlanitic and to a lesser extent in the North Pacific, an increasing NAO index gives an increasing tendency 33 for a smooth transition between the stratospheric and polar tropospheric jets [2]. The most part of the climate indices used in this study have been downloaded from the NOAA/National Weather Service website and Climate Prediction Center (CPC). 4.2 Model Setup 4.2.1 Introduction As mentioned in the last sections, the Stave River basin is a snowmelt-fed watershed. Snowpack data in this watershed is not available for the whole study time period (1983-2001), as data are only available for recent years (2002-2005). In terms of neighbouring watersheds, those that have snowpack data are located far from the Stave River basin, consequently in different climate conditions than the Stave River. Therefore, precipitation in the form of both snow or rainfall is considered in this study. Temperature, rainfall, and the memory of the river system in the form of antecedent flow in the locally observed data are incorporated to forecast the streamflow in this basin. Multiple linear regression (MLR), and three nonlinear models, BNN, GP, and SVRGA are used for this purpose. For the first set of models, the following set of data is implemented: GFSt,Climt, Pt, (Tmax, Tmax−min)t, Qt, Qt−1 (4.1) ⇓ Machine Learning Model ⇓ Qt+l where t and t−1 denote respectively the current day and the preceding day, and l the forecast lead-time, Q is the streamflow, P is precipitation, Tmax and Tmax−min are the daily maximum temperature and the difference between the daily maximum and minimum temperatures. Then, for evaluating the role of reforecast data on the streamflow, the GFS outputs twice a day which are shown in Table 4.1 are added as predictors. Finally, the climate indices are added as predictors. The period of training and testing the models are 1983-1997 and 1998-2001, respectively. 34 4.2.2 Model Inputs Selection Four combinations of predictors are used to determine which predictors have significant effects on streamflow forecasting. There are three sets of predic- tors, the GFS model outputs, local observations and climate indices. The four combination of predictors are: (1) only local observation, (2) local observations and climate indices, (3) local observations and GFS model out- puts, and (4) local observations, climate indices and GFS outputs. Final predictors are selected by the stepwise regression method. Table 4.2 shows the selected predictors for above four combinations and for forecast lead- times from 1 to 7 days. These variables are selected so that include all the potential effective local circumstances, weather, and climate patterns in the study area. 4.3 Model Results The BNN model training is a process of adjusting the weights and biases based on Bayes’ theory. The training process tries to update the set of parameters (weights) once the predictors are entered to the model. This makes the BNN simulated streamflows match observed streamflows as close as possible. In BNN application, the training is also a process of determining the ANN architecture in terms of number of hidden neurons. The number of hidden neurons is found by training the ANN with different number of hidden neurons, then choosing the ANN with the best average performance. Due to the large number of weights and the gradient descent training al- gorithm, training could be easily trapped by a local minimum. To avoid this problem, in this research, an ensemble of 30 BNN models with ran- dom initial weights are generated. Then, the forecast mean and variance are obtained and used to draw the prediction intervals. In this research, the training data are divided into two parts namely training dataset (1983-1997) and evaluation dataset (1998-2001). Four error criteria, i.e. the correlation coefficient, mean absolute error (MAE), root mean squared error (RMSE), and the Nash-Sutcliffe model ef- ficiency coefficient (E) [23], evaluate the model forecast performance. These scores for forecast lead-times of 1 to 7 days are illustrated in Figure 4.3 for the case with all potential predictors implemented. As shown, the nonlinear models perform better than the MLR and the BNN model which uses the average of 30 ensemble members tends to give higher correlation coefficient even in longer lead-times (e.g. Corr = 0.58 and E = 0.30 at 7 day lead-time) and lower MAE and RMSE errors than the other models, especially at the 35 Table 4.2: Predictor selection in 4 cases of (1) local observation only, (2)Lo- cal observations & Climate indices, (3) GFS outputs (twice a day at 12.00 and 24.00) & Local observations, and (4) GFS outputs, Local data, and Large scale climate indices for forecast lead-times of 1 to 7 days (1: predic- tor selected; 0: predictor not selected) Lead-time (day) 1 2 3 4 5 6 7 No. Predictor-case: 1234 1234 1234 1234 1234 1234 1234 1 Tmax 1100 1100 1111 1111 1111 1111 1111 2 Tmax−min 1111 1111 1111 1111 1111 1111 1111 3 P 1111 1110 1111 1111 1111 1111 1111 4 Qt−1 1111 1111 1111 1111 1111 1111 1111 5 Qt−2 1111 1111 1111 1111 1010 1010 1110 6 sin(phases) 1111 1100 1111 1111 1111 1111 1111 7 cos(phases) 1111 1100 1111 1111 1111 1111 1111 8 AO 0000 0100 0101 0101 0001 0000 0100 9 AAO 0000 0000 0000 0000 0000 0000 0000 10 NAO 0100 0000 0100 0100 0000 0101 0001 11 PNA 0000 0001 0001 0001 0000 0000 0000 12 SST 0101 0001 0001 0001 0101 0101 0101 13 SSTA 0101 0001 0001 0001 0001 0101 0101 14 PDO 0000 0000 0000 0001 0000 0000 0000 15 apcp-12 0011 0011 0011 0011 0010 0010 0010 16 apcp-24 0011 0011 0011 0011 0011 0011 0011 17 heating-12 0000 0011 0011 0011 0000 0010 0010 18 heating-24 0011 0011 0010 0011 0011 0011 0011 19 z500 − 12 0011 0011 0011 0011 0011 0011 0011 20 z500 − 24 0011 0011 0011 0011 0011 0011 0011 21 prmsl− 12 0011 0011 0011 0000 0001 0000 0000 22 prmsl− 24 0011 0011 0011 0011 0011 0011 0011 23 pwat − 12 0011 0010 0010 0010 0000 0001 0001 24 pwat − 24 0011 0011 0011 0011 0001 0000 0000 25 rhum-12 0011 0010 0010 0010 0011 0010 0010 26 rhum-24 0011 0001 0001 0000 0000 0000 0000 27 t2m − 12 0010 0001 0000 0011 0001 0011 0011 28 t2m − 24 0011 0011 0011 0001 0011 0011 0001 29 temp700 − 12 0010 0011 0011 0000 0000 0000 0000 30 temp700 − 24 0010 0010 0010 0000 0000 0000 0000 31 wind Amp-12 0011 0001 0000 0010 0000 0010 0010 32 wind Amp-24 0001 0001 0001 0001 0001 0001 0000 33 sin(phase)− 12 0000 0010 0011 0011 0010 0000 0000 34 cos(phase)− 12 0000 0000 0000 0000 0000 0000 0000 35 sin(phase)− 24 0011 0011 0011 0011 0001 0000 0000 36 cos(phase)− 24 0011 0000 0000 0010 0011 0011 0011 36 Figure 4.3: Forecast scores for the linear and non-linear machine learning models for the case where all potential predictors are considered 37 larger lead-times. The MAE values for 1 and 2 days lead-time are 6.5 and 10.1 m3s−1 which are relatively lower than standard deviation of daily dis- charge (34.85 m3s−1). According to the Nash-Sutcliffe efficiency criterion, performance of the BNN, SVRGA, and GP for 1 day forecast lead-time is about 0.79 and for 2 to 6 days is higher than 0.3, which shows the good efficiency of the three nonlinear models in capturing the streamflow signal in the Stave River basin. It must be noted that an efficiency of 1 (E = 1) corresponds to a perfect match of modeled streamflow to the observed data. An efficiency of 0 (E = 0) indicates that the model predictions are as ac- curate as the mean of the observed data, whereas an efficiency less than zero (E < 0) occurs when the observed mean is a better predictor than the model. As shown in Figure 4.3, the BNNmodel tended to slightly outperform the MLR, GP and SVRGA models. Due to this fact, the BNN model is selected to be trained using three categories of predictors. The Nash-Sutcliffe score in Figure 4.4 shows that the observed and GFS data as predictors are best at shorter lead-times (1-4 days). For 6 and 7 day lead-time, the North Atlantic Oscillation climate index is selected as predictor which improves the forecast score in longer lead-times. Correlation coefficient of longer lead-times as shown in Figure 4.4 is the best when observation and climate indices are predictors. It is also true for the MAE and to some extent for the Nash- Sutcliffe scores. It can be extracted that for lead-times of 1 through 4 days using the observation and GFS outputs gives the best results where as for the cases of 5 through 7 days predictors can be chosen from the observation data and climate indices. This shows the effect of climate signals in longer lead forecasts. The effect of the climate indices are also manifested at longer lead-times in the case where the observed and climate indices are considered as predictors. The effects of the climate indices on longer lead- time streamflow forecasts are most likely due to the interannual variability of the accumulated snow in the Stave River basin, associated with the El Niño-Southern Oscillation (ENSO) and other modes of climate variability. Composites of the snow water equivalent anomalies (SWEA) showed that SWEA were negative during El Niño years and positive during La Niña years in the Columbia River Basin in British Columbia [14]. The forecasted streamflow by the BNN model at 1 through 7 day(s) lead-time and the observed time series (Figures 4.5 through 4.11, re- spectively) show generally good agreement, even although for some of the extreme flood events, the BNN model did not perform very well. Rapid tem- perature change during short periods and consequent snowmelt, and rain on snow events can be the reasons for the underestimation of extreme events 38 Figure 4.4: Forecast scores for the BNN model for the 4 cases where pre- dictors are observed data only, observed data and climate indices, observed and GFS output, and observed data, GFS outputs, and climate indices. by our models in the region. For comparing the proposed machine learning methods with the operational UBC Watershed Model, UBCWM applied by BC-Hydro in Canada, due to the lack of overlapping the study time pe- riod used in this research (1983-2001) and the available UBCWM forecasts (2003-2006), we analyzed the correlation coefficient between the observed and forecasted streamflow in Stave River both for the models proposed here and that for the UBCWM model. The UBCWM model results show that the values of correlation coefficient for 2003 through 2006 are 0.856, 0.883, 0.922, and 0.928, respectively, for one day lead-time forecast and 0.659, 0.478, 0.738, and 0.735 for 5 days lead-time forecast. The correlation co- efficient for entire period of 2003-2006 is 0.895 and 0.681 for one day and five days lead-time forecasts, respectively. These values are 0.894 and 0.647 obtained from the BNN model during the test period of 1998-2001 which 39 Figure 4.5: One day lead-time forecast by the BNN model with all the pre- dictors incorporated (1998-2001); red crosses show the observed streamflow, solid back line shows the forecast ensemble mean, and light green lines show the 95% prediction intervals. 40 Figure 4.6: Two days lead-time forecast by the BNN model with all the predictors incorporated (1998-2001); red crosses show the observed stream- flow, solid back line shows the forecast ensemble mean, and light green lines show the 95% prediction intervals. 41 Figure 4.7: Three days lead-time forecast by the BNN model with all the predictors incorporated (1998-2001); red crosses show the observed stream- flow, solid back line shows the forecast ensemble mean, and light green lines show the 95% prediction intervals. 42 Figure 4.8: Four days lead-time forecast by the BNN model with all the predictors incorporated (1998-2001); red crosses show the observed stream- flow, solid back line shows the forecast ensemble mean, and light green lines show the 95% prediction intervals. 43 Figure 4.9: Five days lead-time forecast by the BNN model with all the predictors incorporated (1998-2001); red crosses show the observed stream- flow, solid back line shows the forecast ensemble mean, and light green lines show the 95% prediction intervals. 44 Figure 4.10: Six days lead-time forecast by the BNN model with all the predictors incorporated (1998-2001); red crosses show the observed stream- flow, solid back line shows the forecast ensemble mean, and light green lines show the 95% prediction intervals. 45 Figure 4.11: Seven days lead-time forecast by the BNN model with all the predictors incorporated (1998-2001); red crosses show the observed stream- flow, solid back line shows the forecast ensemble mean, and light green lines show the 95% prediction intervals. 46 are almost comparable with the forecast scores of the UBCWM model for one day and five days lead-time forecast even though the time periods for two models are different. The hydrometric station in this research is located upstream of the reservoir inflow, while in the UBCWM model the time se- ries used for prediction is the reservoir inflow implementing different manual adjustments. Note that the BNN models used here are totally automatic, and by implementing the operational adjustments used by BC-Hydro, the forecast skill scores can probably be improved further. Prediction intervals are calculated based on the model uncertainty. For this purpose, two parts of model variance are considered where one relates to the different initial conditions of BNN ensemble members and the second one tackles the model variance using the approximate Bayesian Approach [37]. If we train an ensemble of K neural network each on different initial conditions, model uncertainty, σ2w for a prediction on a particular input pat- tern x can be estimated simply by calculating σ2w = ∑ k (F k(x,wk)− F̄ (x))2 K − 1 , (4.2) where F k(x,wk) denotes the prediction of the kth neural network , wk is the optimal weight vector for that particular neural network after training with dataset Dk , and F̄ (x) = ∑ k F k(x,wk) K , (4.3) is the mean prediction of the ensemble of K models for datum x. In the Bayesian approach, model fitting should not yield only a point estimate w∗ in parameter space, that fit a particular dataset D best, but should produce a posterior distribution P (w|D,H) describing region in pa- rameterspace that are probable given D and our specific modelling assump- tions H (e.g. the number of hidden neurons). It is assumed that the data is generated by a Gaussian noise process with initial noise level of σ2D = 1/β, then maximizing the log posterior probability of the parameters given the data and our modelling assumptions is equal to minimizing the cost function C(w) = β ∑ n 1 2 (yn − tn)2 + α ∑ i 1 2 w2i . (4.4) 47 The hyperparameters β and α regulate to which extend the output error (the first term) and the size of the weights (the second term) contribute to the cost function, and can be optimized simultaneously with the parameters. If we assume that the posterior distribution P (w|D,α, β,H) after training can be approximated by a Gaussian with mean w∗ then the model variance can be calculated by σ 2 w = g TA−1g, (4.5) g = ∂y ∂w |xn, (4.6) where w∗ is the sensivity of the output to the parameters, and A−1 is the co-variance matrix of the model parameters, which is in fact the inverse of Hessian matrix of the model outputs with respect to its parameters, which is calculated during training [37]. In the approximate Bayesian method one neural network is trained on the entire dataset and uses a Gaussian approx- imation of the posterior distribution of its parameters to obtain confidence estimates. Given an ensemble of K neural networks each trained on differ- ent initial conditions, the model variance can be σ2w for a prediction on a particular input pattern x can be estimated simply by calculating [37] σ2w = σ 2 wa + 1 K ∑ k σ2 wb(k) (4.7) where σ2wa is calculated with Eq. 4.2 and σ 2 wb(k) is calculated from Eq. 4.5. σ2w is selected as the final prediction variance which is almost equivalent to the 95% of the cumulative density function of the ensemble members. The percentage of observed points lying outside the prediction intervals for forecast lead-times of 1 through 7 days are 6.4, 5.2, 8.1, 10.3, 13, 3.6, and 6% respectively which are not far from the theoretical value of 5%. To evaluate the performance of the models, streamflow forecast skill scores are calculated relative to climatology, persistence, and combination of both in four different different combinations of predictors. Skill Score (SS) is defined as SS = 1− MSEfor MSEref , (4.8) where MSEfor denotes the machine learning model forecast error andMSEref is the naive reference model error where the reference method can be clima- tology, persistence or their optimal linear combination. Climatology fore- casts simply issue the long-term mean whereas persistence forecasts issue the latest observed values. SS is zero when the MSE of the proposed machine 48 Figure 4.12: The normal Nash-Sutcliffe skill score (relative to climatology forecast) for the different sets of predictors. 49 Figure 4.13: The Nash-Sutcliffe skill score (relative to persistence fore- cast)for the different sets of predictors. 50 Figure 4.14: The Nash-Sutcliffe skill score for the different sets of predictors relative to the optimal linear combination of climatology and persistence forecast. 51 Figure 4.15: The comparison of the Nash-Sutcliffe skill score relative to climatology, persistence, and combination of both for the case that the local observation data and the GFS outputs are the predictors. 52 leaning model is the same as the reference method error and it equals one when the forecast error is zero and prediction is perfect. Figures 4.12- 4.15 show the Nash-Sutcliffe skill score for the BNN model, where all the scores are above zero which indicating the models have better skills than the cli- matology and persistence forecasts, and the optimal linear combination of two. In short-term streamflow forecasts, the forecast skill score in relative to a naive forecasting method. The naive method of persistence is more accurate than the climatology method (except at long-lead times) and both are less accurate than the combination of the both. That is the reason why the skill scores is the lowest relative to the combination and highest relative to climatology forecasts(Figure 4.15). 4.4 Extreme Event Forecasts To study how good the models are in forecasting the rare events of stream- flow, two skill score indices are used: Peirce Skill Score (PSS) and Extreme Dependency Score (EDS) [34]. PSS and EDS are defined by PSS = a a+ c − b b+ d , (4.9) EDS = 2 ln(a+c n ) ln( a n ) − 1, (4.10) where a denotes the number of occasions in which the event was forecast and observed (hits), b the number of times the event was forecast but not observed (false alarms), c the number of times the event was observed but not forecast (misses) and d the number of times the event was neither forecast nor observed (correct rejections). The sum of a, b, c and d is the total number of cases (n). Both PSS and EDS scores have advantage of the non- vanishing property when the event probability tends to zero and range from -1 (worst forecast) to 1 (perfect forecast). PSS tends to be unduly weighted toward the first term in (Eq. 4.10) for rare events of streamflow, where the second term becomes very small, as “correct rejection” dominates over “false alarms”, as well as over “hits” and “misses”. It must be mentioned that before calculating the scores, since the predictions from the models tended to underestimate the amplitude of the streamflow (see Figures 4.5 and 4.11), the forecast values are rescaled - the rescaling calibration is determined by matching the cumulative distribution of the observed streamflow and that of the predicted values using the training data. Figure 4.16 illustrates the 53 Figure 4.16: PSS and EDS scores of high streamflow forecasts using various machine learning methods (BNN, SVRGA and GP) and MLR at lead-times of 1 through 7 days for Stave River above the lake 54 PSS and EDS score over 1 through 7 days lead-time for different categories of predictors. Each panel shows the score for one model. For both linear and nonlinear models PSS and EDS scores tended to be better when the predictors are the GFS outputs plus locally observed data than when only observed data or any combination of observed and climate indices are used as predictors. In terms of model accuracy, it seems that the BNN model tended to slightly outperform the other models except for SVRGA at 3 days lead-time which performs marginally better than BNN. Both PSS and EDS scores for the linear model (MLR) at 7 days lead-time are better than those for the nonlinear models, indicating that at long-lead time the signal is too weak to allow the complex machine learning methods to improve on the simple MLR. 4.5 Summary The snow dominated Stave River above the lake is selected in this study to explore the applicability of machine learning methods to streamflow forecast- ing. The observed data from meteo-hydrological stations, reforecast dataset generated by the GFS model available twice a day at 12.00 and 24.00, and the large-scale climate indices are used to forecast streamflows. The Arc- tic Oscillation (AO), the leading principal component of sea level pressure anomalies, the NAO, the normalized difference in surface pressure between a station in the Azores and a station in Iceland, and the ENSO Niño 3.4 index which measures the central equatorial Pacific sea surface temperature (SST) are the main dominant climate signals over the study area influencing the streamflow fluctuations. The Stave river basin is a snowmelt-fed watershed. Snowpack data in this watershed is not available. Local observed data, i.e. temperature, precipitation, and the memory of the river system in the form of antecedent flow data, are incorporated to forecast the streamflow in this basin. Multiple linear regression (MLR), and three nonlinear models, BNN, GP, and SVRGA are applied for this purpose. Due to the large number of weights in a neural network model and the gradient descent training algorithm, training could be easily trapped by a local minimum in the cost function. To avoid this problem, an ensemble of 30 BNN models are generated to obtain the forecast mean, variance, and the prediction intervals. The percentage of observed data points of lying outside the 95% prediction intervals ranges between 3.6 to 13% for lead-times of 1 to 7 days, indicating that the prediction intervals are reasonably accurate. Four forecast scores, the correlation coefficient, mean absolute error 55 (MAE), root mean squared error (RMSE), and the Nash-Sutcliffe model efficiency coefficient (E) were used to evaluate the model forecasts. These scores show that for lead-times of 1 to 4 days using the observation and GFS outputs gives the best results, while for lead-times of 5 to 7 days pre- dictors can be chosen from the observation data and climate indices. To evaluate the performance of the models, streamflow forecast skill scores are also calculated relative to climatology, persistence, and the optimal linear combination of both. To study how good the models are in forecasting the rare events of streamflow, two skill score indices are used, Peirce Skill Score (PSS) and Extreme Dependency Score (EDS). For both linear and nonlinear models PSS and EDS scores are better when the predictors are the GFS outputs plus locally observed data in comparison to the cases when only observed data or any combination of observed and climate indices are chosen as the predictor. In terms of model accuracy, the BNN model tended to slightly outperform the other models. 56 Chapter 5 Conclusion and Recommendation Interaction between atmosphere, ocean, and water resources over the land plays the main role on the availability and variability of water in different climate regimes. The hydrologic fluctuations have direct effects on water resources planning and management. Recent studies show the increase of the interest on the application of the different atmospheric variables on hydrologic studies, since the emphasis in conventional methods was on only local recorded data. In this study, the atmospheric variables reforecasted by a numerical weather prediction model are added to the local observation as prediction in several machine learning methods. In this research, an attempt is made to capture the memory of the Stave River basin, in terms of its dependence on the snowmelt utilizing antecedent streamflow, temperature and rainfall as well as the climate variability as indicated by several climate indices, in order to forecast the streamflow effectively. Higher nonlinearity of the atmospheric-hydrologic events in general and the rainfall-runoff process in particular requires the use of non-linear ma- chine learning methods to reveal to some extent the nonlinear processes occurring during the conversion of precipitation into the streamflow. To obtain this, climate indices, global reforecasting system model outputs of various atmospheric variables, as well as locally recorded data are selected as predictors to forecast the short-term streamflow fluctuations in a given small size coastal watershed. Machine learning methods used in this study are Bayesian neural networks, support vector regression coupled with genetic algorithm, and Gaussian process which have advantages to the conventional models. In this study, the SVR model is coupled with genetic algorithm to overcome the hyperparameters and kernel selection issues. Using an ensem- ble of BNN increases the forecast accuracy and gives a prediction interval around the forecasted value. To evaluate the performance of the models, forecast skill scores are calculated relative to climatology, persistence, and the optimal linear combination of both. Two scores (PSS and EDS) based on the contingency table are calculated for assessing the forecast skills of 57 extreme streamflow events. The results show that BNN tends to slightly outperform the other models and can be used successfully in short-term forecasts, and the Nash-Sutcliffe model efficiency coefficient even for 7 days prediction is practically signifi- cant. This index shows that the observed data and the GFS model outputs as predictors are best at shorter lead-times while observed data and cli- mate indices are best at longer lead-times. When the lead-time increases, climate indices such as the Arctic Oscillation (AO), the leading principal component of sea level pressure anomalies, the North Atlantic Oscillation (NAO), the normalized difference in surface pressure between a station in the Azores and a station in Iceland, and ENSO Niño 3.4 which measures the central equatorial Pacific sea surface temperature (SST) anomalies are the main dominant climate signals over the study area on influencing the flow variability. The effects of the climate variability on longer lead-time streamflow forecasts are most likely due to the interannual variability of the accumulated snow in the Stave River basin, in relation to El Niño-Southern Oscillation (ENSO) events. For both linear and nonlinear models, the PSS and EDS scores which evaluate the rare events of streamflow are better when the predictors are the GFS outputs plus locally observed data in comparison to the cases when only observed data or any combination of observed and climate indices are chosen as predictors. In terms of model accuracy, the nonlinear models generally outperform MLR, and the BNN model tends to slightly outperform the other nonlinear models. To analyze if the differences between the forecast scores for different models are statistically and opera- tionally significant it is recommended that error bars be added to the scores calculated in this study. The correlation coefficient shows that the BNN model forecasts for one day lead-time forecast are almost the same as those obtained from the UBCWM model used in BC-hydro, but for five days lead-time forecast even though during some times are comparable, but slightly lower than the UBCWM model scores. These automatic forecast scores can probably be operationally improved by BC-hydro’s adjustments. For further investigation, it is recommended that instead of the rela- tively coarse-resolution numerical weather prediction model of GFS which is a global forecasting system one can use other higher-resolution NWP model outputs to connect the atmospheric variables to the hydrologic regimes. Also, downscaling techniques can be carried out on the NWP model out- puts and then feed to the rainfall-runoff models and generate ensemble of streamflow forecasts to evaluate the uncertainty of streamflow prediction. The dominant climate indices in our study area are AO, NAO, and SST. 58 How these and other climate indices relate to streamflow fluctuations in other parts of the globe can be evaluated. In this study, the predictor selection was done using stepwise regression. Other alternatives include clarifcation and regression trees (CART) [13] and the leading principal components from a principal component analysis of all the potential predictors, which eliminate the inter-dependency present in the predictors. 59 Bibliography [1] Abramovitz, J. N. (1996). Imperiled Waters, Impoverished Future: the Decline of Freshwater Ecosystems. Worldwatch Institute, Washington (DC), USA. [2] Ambaum, M. H. P., Hoskins, B. J., and Stephenson, D. B. (2001). Arctic Oscillation or North Atlantic Oscillation? Journal of Climate, 14(16):3495–3507. [3] Bousquet, O., von Luxburg U., and Ratesch, G. (2004). Advanced Lec- tures on Machine Learning. Springer, Berlin, Heidelberg New York. [4] Burn, D. H. (2008). Climatic Influences on Streamflow Timing in the Headwaters of the Mackenzie River Basin. Journal of Hydrology, 352(1- 2):225 – 238. [5] Cherkassky, Vladimir and Ma, Yunqian (2004). Practical Selection of SVM Parameters and Noise Estimation for SVM Regression. Neural Netw., 17(1):113–126. [6] Clark, M. P. and Hay, L. E. (2004). E.: Use of Medium-Range Numerical Weather Prediction Model Output to Produce Forecasts of Streamflow. J. Hydrometeorol, 5:15–32. [7] Collier, M., Webb, R. H., and Schmidt, J. C. (1996). Dams and Rivers: Primer on the Downstream Effects of Dams. Reston (VA): US Geological Survey. [8] Fleming, S. W., Moore, R. D., and Clarke, G. K. C. (2006). Glacier- mediated Streamflow Teleconnections to the Arctic Oscillation. Interna- tional Journal of Climatology, 26:619–636. [9] Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA. 60 [10] Gunn, S. R. (1998). Support Vector Machines for Classification and Regression. [11] Hamill, T. M., Whitaker, J. S., and Mullen, S. L. (2006). Reforecasts: An Important Dataset for Improving Weather Predictions. Bulletin of the American Meteorological Society, 87(1):33–46. [12] Holland, J. H. (1992). Adaptation in Natural and Artificial Systems. MIT Press, Cambridge, MA, USA. [13] Hsieh, W. W. (2009). Machine Learning Methods in the Environmental Sciences - Neural Networks and Kernels. Cambridge. [14] Hsieh, W. W. and Tang, B. (2001). Interannual Variability of Accumu- lated Snow in the Columbia Basin, British Columbia. Journal of Water Resources Research, 37(6):17531759. [15] Jain, A., Sudheer, K. P., and Srinivasulu, S. (2004). Identification of Physical Processes Inherent in Artificial Neural Network Rainfall Runoff Models. Journal of Hydrological Processes, 18(3):571 – 581. [16] Jakob, M., Church, M., Holm, K., Meredith, D., Peters, N., and Weath- erly, H. (2009). An Appeal for Flood Risk Assessments in Canada. Journal of Innovation, pages 23–26. [17] Kennedy, A. M., Garen, D. C., and Koch, R. W. (2009). The As- sociation Between Climate Teleconnection Indices and Upper Klamath Seasonal Streamflow: Trans-Niño Index. Hydrological Processes, 23:973– 984. [18] Li, J. (2005). A Neural Network Model for Flood Forecasting for Small Hydro Plants. [19] Lundquist, J. D. and Dettinger, M. D. (2005). How Snowpack Hetero- geneity Affects Diurnal Streamflow Timing. Journal of Water Resources Research, 41(W05007):14. [20] MacKay, D. J. C. (1992). Bayesian Interpolation. Neural Comput., 4(3):415–447. [21] Makkeasorn, A., Chang, N., and Zhou, X. (2008). Short-term Stream- flow Forecasting with Global Climate Change Implications - A Compar- ative Study Between Genetic Programming and Neural Network Models. Journal of Hydrology, 352(3-4):336 – 354. 61 [22] Mantua, N. J., Hare, S. R., Zhang, Y., Wallace, J. M., and Francis, R. C. (1997). A Pacific Interdecadal Climate Oscillation with Impacts on Salmon Production. Bulletin of the American Meteorological Society, 78(6):1069–1079. [23] McCuen, R. H., Knight, Z., and Cutter, A. G. (2006). Evaluation of the Nash–Sutcliffe Efficiency Index. Journal of Hydrologic Engineering, 11(6):597–602. [24] Moradkhani, H., lin Hsu, K., Gupta, H. V., and Sorooshian, S. (2004). Improved Streamflow Forecasting Using Self-organizing Radial Basis Function Artificial Neural Networks. Journal of Hydrology, 295(1- 4):246 – 262. [25] Murphy, A. H. (1992). Climatology, Persistence, and Their Linear Com- bination as Standards of Reference in Skill Scores. Weather and Forecast- ing, 7(4):692–698. [26] Nayak, P. C., Sudheer, K. P., Rangan, D. M., and Ramasastri, K. S. (2004). A Neuro-fuzzy Computing Technique for Modeling Hydrological Time Series. Journal of Hydrology, 291(1-2):52 – 66. [27] Pan, T.-Y. and Wang, R.-Y. (2004). State Space Neural Networks for Short Term Rainfall-runoff Forecasting. Journal of Hydrology, 297(1-4):34 – 50. [28] Poff, N., Allan, J., Bain, M., Karr, J., Prestegaard, K., Richter, B., Sparks, R., and Stromberg, J. (1997). The Natural Flow Regime. Bio- science, 47(11):769–784. [29] Quick, M. and Pipes, A. (1977). UBC Watershed Model. Journal of Hydrological Sciences, XXII:153–161. [30] Rasmussen, C. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, USA. [31] Schaeffer, M., Selten, F. M., and Opsteegh, J. D. (2008). Shifts of Means Are not a Proxy for Changes in Extreme Winter Temperatures in Climate Projections. Journal of Climate Dynamics, 25(1):51 – 63. [32] Smith, J. and Eli, R. N. (1995). Neural-Network Models of Rainfall- Runoff Process. Journal of Water Resources Planning and Management, 121(6):499–508. 62 [33] Srinivasulu, S. and Jain, A. (2009). River Flow Prediction Using an Integrated Approach. Journal of Hydrologic Engineering, 14(1):75–83. [34] Stephenson, D. B., Casati, B., Ferro, C. A. T., and Wilson, C. A. (2008). The Extreme Dependency Score: a Non-vanishing Measure for Forecasts of Rare Events. Meteorological Applications, 15(1):41–50. [35] Tokar, A. S. and Johnson, P. A. (1999). Rainfall-Runoff Modeling Using Artificial Neural Networks. Journal of Hydrologic Engineering, 4(3):232– 239. [36] van Hinsbergen, C., van Lint, J., and van Zuylen, H. (2009). Bayesian Committee of Neural Networks to Predict Travel Times with Confi- dence Intervals. Transportation Research Part C: Emerging Technologies, 17(5):498 – 509. [37] van Lint, H. (2003). Confidence Intervals for Real-time Freeway Travel Time Prediction. In 2003 IEEE Intelligent Transportation Systems Pro- ceeding, Vols. 1 & 2, pages 1453–1458. 6th IEEE International Conference on Intelligent Transportation Systems, Shangai, Peoples R China, Oct. 12-15, 2003. [38] Vapnik, V., Golowich, S. E., and Smola, A. J. (1996). Support Vector Method for Function Approximation, Regression Estimation and Signal Processing. pages 281–287. [39] Vapnik, V. N. (1995). The Nature of Statistical Learning Theory. Springer-Verlag New York, Inc., New York, NY, USA. [40] Willmott, C. and Matsuura, K. (2005). Advantages of the Mean Abso- lute Error (MAE) Over the Root Mean Square Error (RMSE) in Assessing Average Model Performance. Climate Research, 30(1):79–82. [41] Wu, C.-H., Tzeng, G.-H., and Lin, R.-H. (2009). A Novel Hybrid Ge- netic Algorithm for Kernel Function and Parameter Otimization in Sup- port Vector Regression. Expert Syst. Appl., 36(3):4725–4735. [42] Yu, P.-S., Chen, S.-T., and Chang, I.-F. (2006). Support Vector Re- gression for Real-time Flood Stage Forecasting. Journal of Hydrology, 328(3-4):704 – 716. The ICWRER - Symposium in Dresden, Germany. [43] Zhang, B. and Govindaraju, R. S. (2003). Geomorphology-based Ar- tificial Neural Networks (GANNs) for Estimation of Direct Runoff over Watersheds. Journal of Hydrology, 273(1-4):18 – 34. 63
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Short lead-time streamflow forecasting by machine learning...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Short lead-time streamflow forecasting by machine learning methods, with climate variability incorporated Rasouli, Kabir 2010
pdf
Page Metadata
Item Metadata
Title | Short lead-time streamflow forecasting by machine learning methods, with climate variability incorporated |
Creator |
Rasouli, Kabir |
Publisher | University of British Columbia |
Date Issued | 2010 |
Description | Streamflow fluctuates as a result of different atmospheric, hydrologic, and morphologic mechanisms governing a river watershed. Variability of meteorological variables such as rainfall, temperature, wind, sea level pressure, humidity, and heating, as well as large scale climate indices like the Arctic Oscillation, Pacific/North American Pattern, North Atlantic Oscillation, and El Niño-Southern Oscillation play a role on the availability of water in a given basin. In this study, outputs of the NOAA Global Forecasting System (GFS) model, climate fluctuations, and observed data from meteohydrologic stations are used to forecast daily streamflows. Three machine learning methods are used for this purpose: support vector regression (SVR), Gaussian process (GP), and Bayesian neural network (BNN) models, and the results are compared with the multiple linear regression (MLR) model. Lead-time for forecasting varies from 1 to 7 days. This study has been applied to a small coastal watershed in British Columbia, Canada. Model comparisons show the BNN model tends to slightly outperform the GP and SVR models and all three models perform better than the MLR model. The results show that as predictors the observed data and the GFS model outputs are most effective at shorter lead-times while observed data and climate indices are most effective at longer lead-times. When the leadtime increases, climate indices such as the Arctic Oscillation, the North Atlantic Oscillation, and the Niño 3.4 which measures the central equatorial Pacific sea surface temperature (SST) anomalies, become more important in influencing the streamflow variability. The Nash-Sutcliffe forecast skill scores based on the naive methods of climatology, persistence, and a combination of them for all data and the Peirce Skill Score (PSS) and Extreme Dependency Score (EDS) for the streamflow rare events are evaluated for the BNN model. For rare events the skill scores are better when the predictors are the GFS outputs plus locally observed data compared to cases when only observed data or any combination of observed and climate indices are chosen as the predictors. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-08-04 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivs 3.0 Unported |
DOI | 10.14288/1.0052642 |
URI | http://hdl.handle.net/2429/27090 |
Degree |
Master of Science - MSc |
Program |
Atmospheric Science |
Affiliation |
Science, Faculty of Earth, Ocean and Atmospheric Sciences, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2010-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/3.0/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2010_fall_rasouli_kabir.pdf [ 1.13MB ]
- Metadata
- JSON: 24-1.0052642.json
- JSON-LD: 24-1.0052642-ld.json
- RDF/XML (Pretty): 24-1.0052642-rdf.xml
- RDF/JSON: 24-1.0052642-rdf.json
- Turtle: 24-1.0052642-turtle.txt
- N-Triples: 24-1.0052642-rdf-ntriples.txt
- Original Record: 24-1.0052642-source.json
- Full Text
- 24-1.0052642-fulltext.txt
- Citation
- 24-1.0052642.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0052642/manifest