INCORPORATING FLOOD CONTROL RULE CURVES OF THE COLUMBIA RIVER HYDROELECTRIC SYSTEM IN A MULTIRESERVOIR REINFORCEMENT LEARNING OPTIMIZATION MODEL by NAZANIN SHABANI B.Sc. Sharif University of Technology, 2003 M.Sc. Iran University of Science and Technology, 2005 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES (Civil Engineering) THE UNIVERSITY OF BRITISH COLUMBIA (VANCOUVER) December 2009 © Nazanin Shabani, 2009 ABSTRACT The main objective of reservoir operation planning is to determine the amount of water released from a reservoir and the amount of energy traded in each time step to make the best use of available resources. This is done by evaluating the trade-off between the immediate and the future profit of power generation. A set of constraints have to be met in operating a reservoir system such as the continuity equations, transmission limits, generation and reservoir limits, flood control limits and load resource balance. Another important issue that needs to be addressed in these problems is uncertainty, which comes from lack of knowledge or certainty about the exact amount of an input parameter because of its spatial and temporal variability, inherent nature of a problem or parameter, errors in measurement due to human or technology inaccuracy and other errors in modeling due to simplification or ignorance. This research successfully implements a Reinforcement Learning (RL) optimization algorithm incorporating some of the operating rules and flood control constraints of the Columbia River Treaty. It considers the main sources of uncertainty in operating a large scale hydropower system: market prices and inflows by using a number of scenarios of historical data on inflow and energy prices in the learning process. The RL method reduces the time and computational effort needed to solve the operational planning problem and can be used to determine the value of water and marginal value of water for the BC Hydro system. The results suggest that the RL algorithm can incorporate a typical flood control rules for multireservoir optimization problems of this type. - ii - TABLE OF CONTENTS ABSTRACT .............................................................................................................................. ii TABLE OF CONTENTS .......................................................................................................... iii LIST OF TABLES ......................................................................................................................v LIST OF FIGURES ...................................................................................................................vi LIST OF ABBREVIATIONS.................................................................................................. viii 1. INTRODUCTION...............................................................................................................1 1.1 BACKGROUND .........................................................................................................1 1.2 PROBLEM DEFINITION ...........................................................................................4 1.3 UNCERTAINTY.........................................................................................................7 1.4 GOALS AND OBJECTIVES ......................................................................................9 2. LITERATURE RIVIEW ...................................................................................................10 2.1 BACKGROUND .......................................................................................................10 2.2 OPTIMIZATION TECHNIQUES .............................................................................10 2.2.1 Linear Programming ..........................................................................................10 2.2.2 Dynamic Programming ......................................................................................13 2.2.3 Evolutionary and Heuristic Methods ..................................................................15 2.2.4 Reinforcement Learning.....................................................................................16 3. THE COLUMBIA RIVER TREATY ................................................................................18 3.1 CHARACTERISTICS OF THE COLUMBIA RIVER...............................................18 3.2 THE COLUMBIA RIVER TREATY HISTORY.......................................................21 3.3 THE CRT PROJECTS...............................................................................................23 3.4 POWER GENERATION PROJECT CATEGORIES.................................................25 3.5 THE CRT OBJECTIVES AND BENEFITS ..............................................................26 3.5.1 Flood Control Benefits .......................................................................................26 3.5.1.1 Rule Curves ...................................................................................................28 3.6 POWER BENEFITS..................................................................................................29 3.7 THE CRT OPERATING PLANS ..............................................................................31 3.8 CRT MODELS..........................................................................................................35 4. MODELING METHODOLOGY ......................................................................................37 4.1 INTRODUCTION .....................................................................................................37 4.2 DYNAMIC PROGRAMMING .................................................................................38 4.3 REINFORCEMENT LEARNING VERSUS OTHER METHODS ............................39 4.4 REINFORCEMENT LEARNING COMPONENTS ..................................................40 4.5 REINFORCEMENT LEARNING APPLICATION ...................................................44 4.6 REINFORCEMENT LEARNING ALGORITHM .....................................................46 4.6.1 Dynamic Programming Methods (DP) ...............................................................46 4.6.1.1 Policy Iteration...............................................................................................46 4.6.1.2 Value Iteration ...............................................................................................47 4.6.2 Monte Carlo Methods (MC)...............................................................................47 4.6.3 Temporal Difference (TD) .................................................................................48 4.6.4 Function Approximation ....................................................................................51 5. CRT-RL MODEL AND RESULTS...................................................................................53 5.1 INTRODUCTION .....................................................................................................53 5.2 MODEL COMPONENTS .........................................................................................53 5.2.1 Time Horizon.....................................................................................................53 5.2.2 State Space.........................................................................................................54 - iii - 5.2.3 Environment ......................................................................................................55 5.2.3.1 Decision Variables of the Environment ..........................................................56 5.2.3.2 Constraints .....................................................................................................57 5.2.3.3 Objective Function.........................................................................................62 5.2.4 Actions ..............................................................................................................63 5.2.5 Rewards .............................................................................................................63 5.2.6 Value Function...................................................................................................64 5.2.7 Scenarios of Stochastic Parameters ....................................................................65 5.3 CRT-RL ALGORITHM ............................................................................................68 5.4 RESULTS .................................................................................................................72 5.4.1 Infeasibility........................................................................................................72 5.4.2 Results of the Value of Water.............................................................................74 5.4.3 The Impact of Modeling the FCC Curves and CRT Discharge Limits ................86 5.4.4 Policy Results ....................................................................................................89 5.4.4.1 Import/Exports Policies ..................................................................................90 5.4.4.2 Diagrams of Reservoir Volumes.....................................................................94 5.4.4.3 Diagram of the Energy ...................................................................................95 5.5 Summary ...................................................................................................................96 6. SUMMARY AND CONCLUSIONS.................................................................................97 6.1 SUMMARY ..............................................................................................................97 6.2 CONCLUSIONS .......................................................................................................99 6.3 FUTURE WORKS ..................................................................................................100 REFERENCES .......................................................................................................................101 - iv - LIST OF TABLES Table 1. Table 2. Characteristic of Columbia River Treaty Plants [35] ..............................................24 Discretized GMS and Mica Reservoir Storages (1000 cms-day).............................55 -v- LIST OF FIGURES Figure 1. B.C. Hydro facility region [2].......................................................................................3 Figure 2. BC Hydro generation-demand balance and transmission lines [2].................................3 Figure 3. Typical variations of reservoir storage and market prices [2] ........................................5 Figure 4. The trade-off between present and future value of water in storage [3] .........................6 Figure 5. Columbia River basin and plants [37].........................................................................19 Figure 6. The Columbia River annual (Jan-July) runoff volumes at Dallas, Oregon from 1925 to 2005 [34] ..................................................................................................................................20 Figure 7. The CRT organization chart [34] ................................................................................23 Figure 8. Hydrographs observed and pre-project flows for the year ending September 30, 1997 [35] ...........................................................................................................................................27 Figure 9. Flood control curve for the Libby reservoir [35] .........................................................28 Figure 10. Rule curves for a typical reservoir operation under the CRT [35]..............................29 Figure 11. B.C. Columbia region [2] .........................................................................................30 Figure 12. Arrow and Duncan outflow with and without AOP regulation [35]...........................32 Figure 13. Comparison between the AOP, DOP and unregulated outflow of Arrow and Duncun in 1999-2000 operating year [35] ..............................................................................................34 Figure 14. Reinforcement learning and dynamic programming comparison [29] .......................40 Figure 15. The interaction between agent and environment in RL [29] ......................................42 Figure 16. DP, MC and TD methods and their value function [29] ............................................51 Figure 17. Mica flood control curves for 69 years......................................................................59 Figure 18. Arrow flood control curves for 69 years ...................................................................59 Figure 19. 69 scenarios of historical data of GMS local natural inflow ......................................66 Figure 20. 69 scenarios of historical data of Mica local natural inflow.......................................66 Figure 21. 69 scenarios of power import price ...........................................................................66 Figure 22. 69 scenarios of power export price ...........................................................................67 Figure 23. Energy load (demand) ..............................................................................................68 Figure 24. Schematic representation of CRT-RL model algorithm using function algorithm, adopted from [4] .......................................................................................................................70 Figure 25. The schematic of CRT-RL Algorithm, adopted from [4]...........................................71 Figure 26. The error defined as the maximum difference in the value function between two successive iterations ..................................................................................................................72 Figure 27. 3-D diagrams of value of water in storage (1st-15th of August and 16th-31st of August) .................................................................................................................................................74 Figure 28. 3-D diagrams of value of water in storage (September and October).........................75 Figure 29. 3-D diagrams of value of water in storage (November and December) .....................75 Figure 30. 3-D diagrams of value of water in storage (January and February)............................75 Figure 31. 3-D diagrams of value of water in storage (March and 1st-15th of April) ...................76 Figure 32. 3-D diagrams of value of water in storage (16th-30th April and May) ........................76 Figure 33. 3-D diagrams of value of water in storage (June and July) ........................................76 Figure 34. 2-D diagram of GMS value of water in storage (1st to 15th August)...........................78 Figure 35. 2-D diagram of GMS value of water in storage (1st to 15th August)...........................78 Figure 36. 2-D diagram of Mica value of water in storage (January)..........................................78 Figure 37. 2-D diagram of GMS value of water in storage (January) .........................................79 Figure 38. 2-D diagram of Mica value of water in storage (May)...............................................79 Figure 39. 2-D diagram of GMS value of water in storage (May) ..............................................79 Figure 40. 2-D diagram of the marginal value of energy for GMS (1-15 August) ......................81 Figure 41. 2-D diagram of the marginal value of energy for Mica (1-15 August).......................81 - vi - Figure 42. 2-D diagram of the marginal value of energy for GMS (January)..............................82 Figure 43. 2-D diagram of the marginal value of energy for Mica (January) ..............................82 Figure 44. 2-D diagram of the marginal value of energy for GMS (May) ..................................83 Figure 45. 2-D diagram of the marginal value of energy for Mica (May) ...................................84 Figure 46. 2-D diagram of the marginal value of energy for GMS when the Mica storage is equal to205,000 cms-d........................................................................................................................85 Figure 47. 2-D diagram of the marginal value of energy for Mica when the GMS storage is equal to 300,000 cms-d.......................................................................................................................85 Figure 48. Value of water in storage with and without FCC and discharge limits for GMS storage of 300,000 cms-d in December..................................................................................................86 Figure 49. Value of water in storage with and without FCC and discharge limits for GMS storage of 300,000 cms-d in 16-30 April ...............................................................................................87 Figure 50. Marginal value of energy with and without FCC and discharge limits for GMS storage of 300,000 cms-d in 16-30 April ...............................................................................................87 Figure 51. The average difference between the marginal value of energy for GMS with and without FCC curves and discharge limits when the Mica storage is equal to 205,000 cms-d ......88 Figure 52. The average difference between the marginal value of energy for Mica with and without FCC curves when the GMS storage is equal to 300,000 cms-d......................................89 Figure 53. Load and other generations for the selected scenario.................................................89 Figure 54. Import and export prices for the selected scenario.....................................................90 Figure 55. Import and export with respect to GMS Storage in 1-15 August periods ...................91 Figure 56. Import and export with respect to Mica Storage in 1-15 August periods ...................91 Figure 57. Import and export with respect to GMS Storage in January ......................................92 Figure 58. Import and export with respect to Mica Storage in January.......................................92 Figure 59. Import and export with respect to GMS storage in May ............................................93 Figure 60. Import and export with respect to Mica storage in May ............................................93 Figure 61. GMS reservoir volume for Mica storage volume=205000 cms-d ..............................94 Figure 62. Mica reservoir volume for GMS storage volume=300000 cms-d ..............................95 Figure 63. Energy production when GMS storage volume=300000 cms-d and Mica storage volume=205000 cms-d..............................................................................................................96 - vii - LIST OF ABBREVIATIONS ACOA Ant Colony optimization algorithm AI Artificial intelligence AMPL A Mathematical Programming Language ANFRL Neural fuzzy reinforcement learning AOP Assured Operating Plans BC British Columbia BPA Bonneville Power Administration C-BT Colorado-Big Thompson cfs Cubic feet per second cms Cubic meter per second cms-d Cubic meter per second-day CRB Columbia River Basin CRC Critical rule curve CRT Columbia River Treaty CRTOM Columbia River Treaty Optimization Model d Day DOP Detailed Operating Plan DP Dynamic Programming ECC Energy Content Curve e.g. For example ELQT Extended linear quadratic Gaussian etc. Etcetera FCC Flood Control Curve GA Genetic Algorithm GOM Generalized Optimization Model GWh Gigawatt-hour H Hour i.e. That is LP Linear programming - viii - Maf Million acre feet MC Monte Carlo MCM Marginal Cost Model MDP Markovian Decision Process MW Megawatt MVW Marginal value of water ORC Operating rule curve ORT Optimal reservoir trajectory Pdf Probability density function PEB Permanent engineering board PWL Piecewise linear RL Reinforcement Learning RLROM Reinforcement Learning Reservoir Optimization Model RTC Real time control SA Simulated Annealing SDP Stochastic Dynamic Programming SDDP Stochastic Dual Dynamic Programming SLP Stochastic Linear Programming SLPR Stochastic Linear Programming with Recourse technique STOM Short Term Optimization Model TD Temporal Difference TSR Treaty Storage Regulations TS Tabu search URC Upper rule curve - ix - ACKNOWLEDGMENTS I would like to express my appreciation to my supervisor, Dr. Ziad K. Shawwash who offered me invaluable assistance, support and guidance during my MASc study. Special thanks to the Principal Engineer in Resource Analysis Group of the Generation Resource Management (GRM) Department at B.C Hydro, Dr. Thomas K. Siu for his help and support for the Grant-in-Aid program between UBC and B.C Hydro. Deepest gratitude to Dr. Alaa Abdallah, the Manager of Planning and Reliability Analysis Group of GRM, who has always given me great encouragement and valuable insight throughout the course of my research. Special thanks and appreciation to Dr. Greg Lawrence for his comments and feedback on my project. Special thanks and appreciation are due to my friend Hamideh A. Riseh, Specialist Engineer in the Engineering Department of B.C Hydro who helped me to understand many aspects of my project. And last but not least, I would like to express my love and gratitude to my beloved husband, Hamid, for his understanding and endless love, throughout the duration of my studies. Without his help I would have never been able to find my way. -x- 1. INTRODUCTION 1.1 BACKGROUND The main objective of electricity producing companies is to provide energy for their customers by converting primary energy sources to electricity. Different power generation facilities are used all over the world to meet the growing energy demand for municipal and industrial consumers. Several advantages and disadvantages exist for each of the generation methods. Some of them can only produce a constant amount of electricity while others are more flexible and can change their generation level quickly. Some are classified as green and can produce renewable energy while others may have adverse environmental effects. The cost of energy produced by each method vary depending on the technology used, efficiency, availability, capital cost and other factors. One of the widely used power production technologies in Canada is hydro power generation. Compared to other technologies, hydropower energy generation has many advantages such as, the flexibility of storing and shifting energy generation and the capability to quickly change the generation production level in real time. Electricity is considered as a non-storable or very costly commodity to store [1]. In the electric power industry there is no possibility to meet demand from stored reserves as in the combustible fuel industries, and electricity power producers must balance demand with supply in real time. Therefore, an ideal power generation facility has to be capable of providing energy within a wide range of output and be able to change its production level in a short period of time. Hydropower generation facilities are considered to be one of the best technologies that are capable of doing this. In these systems water, rather than electricity, can be stored and used in the future. Depending on the head of the reservoir, the generation capacity and turbines discharge capacity, a wide range of electricity can be generated by a hydropower generator. These characteristics make it attractive and practical. In addition, building dams and storing water in reservoirs has other benefits, 1 such as the supply of water for multiple consumption purposes, regulation of river inflows, flood prevention and control, and production of green energy and provision of recreational opportunities. One of the main disadvantages, however, is that hydropower generation requires high initial capital cost and could have some adverse environmental, ecological and social impacts. In British Columbia, almost 90% of the electricity is produced by hydropower plants located throughout the Province with a total generating capacity of 10,700 megawatts (MW) and an average annual energy production of 48,000 GWh [2]. The British Columbia Hydro and Power Authority (B.C. Hydro), is the third largest electricity producer in Canada, owns and manages the power generation operation of all plants in B.C. A total number of 33 generating facilities in several regions of British Columbia contribute to energy production in the Province. The most important of these facilities were built on the Peace River (2 plants, 34% of total energy production) and the Columbia River (3 plants, 31% of total energy production). In addition, two thermal generating facilities, other small hydro plants, Independent Power Producers (IPP) plants and imports/exports provide the balance of energy demand in BC. The BC Hydro dams are also used to control the river flows and reduce the risk of flood damage in the watershed. Figure 1 shows a schematic of the main regions with power generation facilities managed by BC Hydro. Many high voltage transmission lines carry the power from the production source and connect the Province to the electricity markets in Alberta and the US. Figure 2 shows a schematic of the transmission lines and supply-demand balance throughout the province. This study focuses on operations planning of the Columbia and Peace hydropower facilities and discusses the potential impacts of including some of the operating rules and flood control constraints of the Columbia River Treaty between Canada and the US on the marginal value of water of BC Hydro’s generation system. 2 Figure 1. B.C. Hydro facility region [2] Figure 2. BC Hydro generation-demand balance and transmission lines [2] 3 1.2 PROBLEM DEFINITION Operation planning of a hydro power generation system is a well-known problem and it has been addressed by many researchers as discussed in Chapter 2. The main objective of the problem is to determine the amount of water released from a reservoir and energy traded in each time step. Determining the optimal trade-off between the immediate rewards and the future value of water stored in reservoirs is an important problem, particularly when a set of constraints has to be met. The constraints include the continuity equations, transmission limits, generation and reservoir limits, flood control limits and the load resource balance. In a typical operations planning problem, the reservoir level may rise or fall according to a given generation level and the magnitude of its inflows. The objective of controlling reservoir levels is to meet current and future electricity demand and to optimize system efficiency. High reservoir levels are capable of generating more power using the same amount of water, but with higher risk of spilling water in the future. When electricity prices are high, it is desirable to produce energy and make profit by selling the energy surplus if surplus water is available. When energy prices are low, water is preferably stored for use in future production. So the objective of this problem is to meet domestic load demand and firm import/export contracts and make the optimal tradeoff between present benefits, expressed as revenues from energy purchases and sales and the potential expected long-term value of resources expressed as the value of water stored in reservoirs. The decisions that have to be made are: when and how much power to import/export and where and how much water to store in or draft from reservoirs while meeting the domestic load and the firm export/import contracts and also a set of constraints on system operation. Figure 3 shows a schematic of typical annual variations of reservoir storage and market prices. It can be seen from Figure 3 that reservoirs are drawn down during the winter months to meet high demand during these months and then they are refilled during the freshet period, when the inflows are high. Market prices are 4 also typically high during the winter months and are also higher than normal during summer months due to air conditioning load in the west coast. 672 4000 670 3500 668 Reservoir Elevation (m) 664 2500 662 2000 660 1500 658 656 Energy Price ($) 3000 666 1000 654 500 652 650 0 Aug Sep Oct Nov Dec Jan Feb Mar Apr May Jun Jul Reservoir Elevation Price Figure 3. Typical variations of reservoir storage and market prices [2] The trade-off between short term rewards and long-term value of water can be assessed by calculating the Marginal Value of Water which is defined as the value of an extra increment of storage in a given reservoir ($/m3). The main trade-off becomes between the marginal value of water and the present market price, where it would be profitable to produce and sell energy when the market price is higher than the marginal value of water, and conversely, to store water and/or import energy when the current market price is lower than marginal value of water. The marginal value of water is calculated as the first derivative of the value of water function with respect to reservoir storage. The water use is optimized at the point that minimizes the sum of immediate and future costs. At this point, the derivatives of the immediate cost function and future cost function with respect to storage become equal [3]. Figure 4 shows a typical variation of value of water function and marginal value of water with respect to reservoir storage. When inflows are high in wet years, BC Hydro operates its systems to fill the reservoirs and sell surplus energy in excess of domestic demand. If surplus water can not be stored then it could be released without contributing to power production. These releases are called spillage and must be minimized since it results in loss of generation and may cause 5 adverse environmental impacts. In dry years, on the other hand, when the inflows are low, energy demands have to be met by either drafting storage reservoirs, or by running generating stations such as the Burrard thermal generating station or by purchasing electricity on the market, usually at high cost. Figure 4. The trade-off between present and future value of water in storage [3] To achieve optimum operation, the entire system should be operated in an integrated and coordinated manner considering inflow conditions, electricity demand characteristics, current system status, maintenance requirements, specific dam and generating station constraints, and uncertainty in the inflow and price forecasts. The effects of operation in one region on another have also to be taken into consideration as generating in certain manner in one region will have a direct impact on how other systems are operated. 6 1.3 UNCERTAINTY Uncertainty is an important issue when modeling water resources problems. In general terms, uncertainty in modeling means the lack of knowledge or certainty about the exact value of a parameter because of its spatial and temporal variability. There are other kinds of uncertainty, such as the vagueness surrounding the value of an input, and these are not considered in this study. Different sources of uncertainty are possible, such as the inherent nature of a problem or an input parameter, errors in measurement due to human or technological inaccuracy and other errors in modeling due to simplification or ignorance. In deterministic modeling, usually the mean or the “expected value” of a stochastic parameter is used, and these could be significantly different from their real value. In stochastic modeling, realization of the stochastic variables is usually considered using the first moment of a random parameter. Higher moments of a stochastic parameter could also be considered, such as the standard deviation, skewness, kurtosis in addition to serial and cross correlation between the different random parameters. Using these properties, a set of “scenarios” can be generated for one or more random parameters. One method of scenario generation is called the moment matching method. In this method, a set of scenarios containing realization for all the random parameters match their original statistical properties. Other methods of scenario generation can also be used including the Monte-Carlo simulation method. In this method, a large number of samples are generated by sampling the probability distribution function of the parameters. Some probability distribution function parameters, such as the mean and standard deviation, are used if the distribution is normal while others require more moments to be used, such as the case when the Gamma distribution function is used. Considering uncertainty in a model has a number of advantages and disadvantages. One of the advantages is that a stochastic model usually results in a more robust and reliable solution. It also makes the decision makers aware of potential future risks and it will provide them with the option to hedge against them. However, stochastic models are 7 generally large scale compared to deterministic ones. They can also become nonlinear, complex and infeasible and they provide more conservative solutions. Despite these disadvantages, stochastic modeling is attractive and even essential for decision makers, and in this study, the uncertainty in some of the multireservoir model parameters has been considered. There are three main sources of uncertainty that must be considered when operating the BC Hydro power generation system: electricity demand, water inflow and market prices. Electricity demand refers to the amount of electricity consumption by the consumers. Inflow is the amount of water that flows to the reservoirs and can be used for power generation. The main objective of operations planning is to provide electricity to meet the demand without shortage in low inflow periods and make the highest profit during average and high inflow periods. Flow regulation and flood control are other important objectives. Demand is usually low in summer and high in winter, while inflow is usually low in winter and melting of snowpack results in high inflows in late spring and early summer. Fluctuations in demand exist in different hours of weekdays and weekends. In the BC Hydro power generation system, water inflow to the reservoirs is an important factor in assessing the energy production capability of the system and is mainly influenced by weather conditions and it varies from year to year and it therefore can also be considered as a stochastic variable. As a consequence, the amount of power generation varies from one year to the next and the average power generation capability in BC is 45600 GWh, it can vary between 36500 GWh in the driest year to 55900 GWh in the wettest year [4]. Market prices vary considerably at the hourly, daily and seasonally. They are mainly driven by many factors including power generation capacity, availability, transmission capacity and most importantly power use patterns which are driven by weather conditions. 8 1.4 GOALS AND OBJECTIVES The goal of this research is to implement a Reinforcement Learning (RL) optimization algorithm incorporating some of the operating rules and flood control constraints of the Columbia River Treaty and the main sources of uncertainty in operating a large scale hydropower system: market prices and inflows. The RL method reduces the time and computational effort needed to solve the reservoirs operations planning problem and can be used to determine the value of water and the marginal value of water of the BC Hydro system. This thesis is organized as follows: this chapter presented an introduction to the problem and the motivation for the work on this research topic. It also outlined the goals and objectives of this research. Chapter 2 reviews the literature in this field. Chapter 3 presents an introduction to the Columbia River Treaty. Chapter 4 presents the concepts of stochastic optimization used in this research with a focus on Reinforcement Learning and its mathematical formulation. Chapter 5 presents the model details and discusses the result for studies that were carried out using the RL model. Finally Chapter 6 presents summary and conclusions of this study. 9 2. LITERATURE RIVIEW 2.1 BACKGROUND The problem of optimizing operation of a multi-reservoir water resource system (in series or parallel) with multiple objectives has been dealt with in several previous studies. These objectives may include power generation, flood control management, environmental issues and water supply, or a combination of some or all of them. Different methods may be used to deal with this problem depending on the characteristic and objectives of the study. Extensive surveys on previous studies in this field can be found in [5], [6] and [7]. The multi-reservoir optimization problem can be classified into two categories: deterministic and stochastic. Deterministic and stochastic problems can also be classified as linear, integer, mixed integer and non-linear programming techniques. Water resources problems are inherently stochastic and since they typically contain some uncertain parameters, using stochastic optimization methods is essential and appropriate. Several methods have been used to solve stochastic optimization problems such as Stochastic Linear Programming, Stochastic Dynamic Programming, and Stochastic Dual Dynamic Programming. A brief review of the literature on these methods is presented next. 2.2 OPTIMIZATION TECHNIQUES 2.2.1 Linear Programming The Linear Programming (LP) technique has been widely used in the field of water resources management, especially for multi-reservoir system operation and management. The technique has also been used to solve problems that contain non-linear relationships, under certain assumptions and conditions, by linearization or piecewise linearization of 10 non-linear equations or by iteratively solving the problem. A typical objective function of a reservoir operations LP problem is to maximize the value of resources or value of power generation or to minimize the cost of water discharges while meeting a set of constraints. The main advantages of using this optimization method are: the ability of LP algorithm to solve large scale problems, its assured convergence to global optimum solutions, having no need to have an initial solution and its use of a well-developed duality theory for sensitivity analysis and the ease of problem formulation [8]. For large scale problems with large number of constraints and variables, decomposition methods such as the Dantzig-Wolf decomposition technique for large deterministic LP problems and the Bender decomposition technique for large stochastic LP problems can be used to solve these problems in shorter periods of time [5]. In 1968, Loucks developed a stochastic linear programming (SLP) model to determine the optimal operating policy of a single reservoir subject to random serially correlated net inflows which were represented by first-order Markov chains. The transition probabilities of inflows were estimated from historical inflows. The objective function was to minimize the sum of the expected squared deviation form the target reservoir volumes and discharges with all linear constraints. The solution yielded an optimal monthly reservoir release as a function of the reservoir content and inflow. In theory, this model can be extended to any number of reservoirs in series. In practice, however, the number of reservoirs must be small because the execution time increases exponentially with the number of reservoirs, a problem that is similar to those which are solved by Stochastic Dynamic Programming (SDP) methods [9]. In 1985, Birge showed that a large stochastic multistage linear programming problem can be broken down into a series of one-stage linear programming problems using the Benders Decomposition method. Since the one-stage problem is much smaller than the multistage one, the method can be applied to a larger number of reservoirs [10]. In 1989 Hiew, et al. applied the LP technique to eight reservoirs in the Colorado-Big Thompson (C-BT) project in northern Colorado. In their model, the objective function was to maximize firm annual deliveries from the C-BT system to a proposed water 11 storage project, while satisfying all constraints including water supply and power commitments and other constraints such as downstream fishery needs [11]. Linear programming has also been used to solve implicit stochastic optimization problems. Implicit here means that a deterministic optimization problem is solved many times with different inflow scenarios each time. The results of the optimizations can then be fed to a regression model to obtain a closed-loop solution. Karamouz et al. (1992) used this approach to determine a reservoir operating policy that is a linear function of the reservoir content and inflow [12]. In 2000, Shawwash developed an LP short term optimization model (STOM) to maximize the value of BC Hydro resources. The model maximizes the revenues from spot transactions in the US and Alberta electricity market while meeting the hourly load and other constraints. The model uses a linear programming technique in which the inflows, prices, and system load are deterministic inputs. The model was further modified and extended to the medium-term generalized optimization model (GOM), which is capable of handling longer planning periods ranging from hourly to monthly time-steps. This model also allows for variable time-steps that could be specified on daily, weekly, and monthly time-steps, and to include sub-time-steps within specified time-steps. Hence, GOM has the advantage of capturing the variation in load and market prices during weekdays or weekends [3]. Variations of LP problems have been used such as Integer and Mixed-Integer programming. In Integer programming, the variables can only take integer values while in the Mixed-Integer method, some variables are integer and others are real. An example of Mixed-Integer programming application in hydro power generation planning is the unit commitment problem. In this problem, the availability of a generation unit in each time step is a variable which can be set to zero when it is off and 1 when it is on. Tang (2007) applied a mixed-integer linear programming algorithm for a maintenance scheduling problem for B.C. Hydro [13]. 12 2.2.2 Dynamic Programming Another important method for optimizing water resource systems is Dynamic Programming (DP), which is one of the most powerful methods to deal with non-linearity and stochasticity by decomposing a large complex problem into a series of sub-problems. In 1970 Arvanitidis et. al. developed an aggregate model for installations on a river system. The optimal operating policy of the aggregate reservoir, determined with Stochastic Dynamic Programming (SDP), gives the total amount of potential weekly energy releases from the reservoirs, but not the amount of water to release from each reservoir [14]. Turgeon (1981) extended the algorithm and imporoved the results by using two state variables in his SDP problem: the water content of a reservoir and the total energy content of the downstream reservoirs. He established an operating policy for each reservoir that is a function of two state variables [15]. Archibald et al. (1997) improved the method developed by Turgeon (1981) by using four state variables in their SDP problem, and solved the problem with a set of inflow scenarios. The problem can be considered stochastic because there is a conditional probability of switching from one inflow scenario to another at the beginning of each week [16]. Druce (1989, 1990) developed a marginal cost model (MCM) for long term planning of the BC Hydro system. The Stochastic Dynamic Programming (SDP) technique was used to calculate the monthly marginal value of water in the Williston reservoir (GMS plant) for each storage level over a planning period of 4 to 6 years. The SDP model takes into account the uncertainty in inflows, and market prices. The calculated marginal value of stored water in the Williston reservoir is then used as a proxy for the long-term system marginal cost. The model also develops a probabilistic forecast of the system price signals (R ), reservoir storage volumes and the expected spills [17 & 18]. bch Stochastic Dual Dynamic Programming (SDDP) is another method for optimizing operation of a multi-reservoir system. Pereira and Pinto (1985) applied SDDP to a hydropower system of 37 reservoirs in Brazil and determined a monthly operating policy for 5 stages. Inflows were represented by scenario trees with two branches in the first 13 month, four in the second, eight in the third, and so on [19]. Jacobs et al. (1995) used this method in their software named SOCRATES to determine the optimal operating policy of the hydroelectric power plants over four stages [20]. Kelman et al. (1990) developed a Sampling Stochastic Dynamic Programming (SSDP) model that captured the complex temporal and spatial structure of the streamflow process by using a large number of sample streamflow sequences. The best inflow forecast was included in model as a hydrologic state variable to improve the reservoir operating policy [21]. Optimization embedded in a simulation algorithm or re-optimization was used by Johnson et al. (1991) and they showed that in a parallel reservoir system, a simulation model with single-period optimization sub-models maximizing the storage volume at the end of refill period performs better than a model with the optimization sub-models minimizing spills [22]. Nash (2003) developed a stochastic optimization model for the Columbia and the Peace reservoir systems. In the first model, which is the long-term model, the monthly storage value function and the associated marginal values are developed by a DP-LP based model and are then passed to the second model. The second model is the shorter-term model that applies a stochastic linear programming with recourse technique (SLPR) in which the inflows, prices, and demands are defined by a scenario tree. The storage value curves generated by the long-term model are used as a terminal value functions in the shorterterm model. The outputs of the shorter-term model for the two aggregated reservoir systems are the refined marginal value over the shorter periods and the operating decisions for each period [23]. Turgeon (2007) presented a new method, called optimal reservoir trajectory (ORT), for determining the optimal operating policy of a power system of several reservoirs in series. This method is based on the fact that raising the level of a reservoir feeding a power plant is profitable as long as the gain due to the higher head is greater than the loss due to the additional spillage. So at each time-step there is a reservoir storage level at which its expected energy generation is maximized and the ORT algorithm found that. 14 When the number of reservoirs is large, the problem is more complicated since the optimal level of one reservoir depends on the levels of all other reservoirs. Turgeon used this model for two, three and seven reservoirs and the results of two and three reservoirs were compared to the results of stochastic dynamic programming. He concluded that the ORT model is more efficient than SDP but with approximate optimal solution. The author also stated that this model is particularly applicable for large number of reservoirs in series but not reservoirs located on different rivers [7]. 2.2.3 Evolutionary and Heuristic Methods Evolutionary methods, mainly inspired by bio-systems behaviour, are one of the important emerging approaches for optimization. They are able to deal with complex and difficult-to-formulate problems with stochasticity, non-convexity and large scale environment. Although they can not guarantee global optimum solution, they often can reach to an approximate best solution that is acceptable to the decision makers. Genetic Algorithms, artificial neural networks and ant colony optimization algorithms are some important evolutionary methods that have been used in the field of reservoir operation and management problems. Oliveira and Loucks (1997) used a Genetic Algorithm (GA) to evaluate operating rules for multi-reservoir systems and demonstrated that GAs can be used to identify effective operating rules [24]. Wardlaw and Sherif (1999) used a genetic algorithms in a four reservoir system and achieved a global optimum. The results showed that a genetic algorithm could be satisfactorily used in real time operations with stochastic inflows. The authors concluded that the genetic algorithm approach is robust and can be easily applied to complex systems and has the potential of being an alternative to stochastic dynamic programming approaches [25]. Ahmed and Sarma (2005) used a GA model to find the optimal operating policy of a multi-purpose reservoir. They used a synthetic monthly streamflow series of 100 years to derive the operating policy. The results of the derived policy were compared to that of the SDP model on the basis of their performance in reservoir simulation for 20 years of historic monthly streamflow. The authors stated that 15 the GA-erived policies are promising and competitive and can be effectively used for reservoir operation [26]. Afshar and Moeini (2008) presented a constrained formulation of the Ant Colony Optimization Algorithm (ACOA) for the optimization of large scale reservoir operation problem. They used this method to optimally solve the problem of hydropower operation of “Dez” Reservoir in Iran. The authors stated that their model is able to solve large scale reservoir operation problems where the conventional heuristic methods fail to even find a feasible solution [27]. Heuristic methods are iterative search algorithms for combinatorial optimization where it is usually easy to be trapped in local optimum. The main goal of heuristic algorithms is to allow for more time to explore the search space. This task is done by accepting moves, which may degrade the solution quality while enables the algorithm to escape from local optimum and hopefully converge to global optimum. Examples of these methods are Simulated Annealing (SA) technique and Tabu Search (TS) algorithms. In 2004, Teegavarapu applied SA technique to multi-period, multiple-reservoir systems. The technique was tested for application to a benchmark problem of four-reservoir system previously solved using a linear programming formulation and also to a system of four hydropower generating reservoirs in Manitoba, Canada. A limited version of this problem was solved using a mixed integer nonlinear programming and results were compared with those obtained using SA. The authors stated that simulated annealing can be used for obtaining near-optimal solutions for multi-period reservoir operation problems that are computationally intractable [28]. 2.2.4 Reinforcement Learning One of the main problems of working with traditional stochastic optimization methods, such as stochastic dynamic programming, is the ‘curse of dimensionality’ since the dimension of the system grows exponentially when the states space increases. Reinforcement learning is an appropriate method to overcome this problem. This method 16 belongs to field of machine learning techniques and artificial intelligence (AI). An introduction to this method and details of the reinforcement learning technique can be found in Sutton book [29]. Wilson (1996) applied the RL technique in the real-time optimal control of hydraulic networks [30]. Bhattacharya, et al. (2003) applied the RL technique in the real time control (RTC) of Delfland water system in the Netherlands covering an area of about 367 km2 that consists of about 60 polders with 12 pumping stations. They concluded that in all applications involving some sort of control functions (urban drainage systems, polder water level maintenance, and reservoir operation), RL has substantial potential in solving these complex problems [31]. Abolpour et al. (2005) used an Adaptive Neural Fuzzy Reinforcement Learning (ANFRL) to obtain optimal values of the decision variables in water allocation improvement in river basin problems. The authors concluded that the methodology can be used to develop fuzzy rule systems that accurately simulate the behavior of complex river systems within the context of uncertainty [32]. In 2007, Lee et al. used a RL algorithm for stochastic optimization of a multi-reservoir system in South Korea. They considered an objective function for maximizing the average periodic benefits and rewards. The Q-Learning method was applied to develope operating policies in real-time and the results were compared to implicit stochastic dynamic programming and sampling stochastic dynamic programming. The authors concluded that the operating rules derived from Q-Learning did better than the other rules derived by implicit stochastic dynamic programming and stochastic dual dynamic programming [33]. Abdalla, et al., (2007) developed a Reinforcement Learning Reservoir Optimization Model (RLROM) and applied it to the BC Hydro main reservoirs on the Peace and the Columbia Rivers. The model considers the Williston and the Kinbasket reservoir storage levels as state variables and the forward sales as decision variables [4]. This study extends this model as discussed next. 17 3. THE COLUMBIA RIVER TREATY 3.1 CHARACTERISTICS OF THE COLUMBIA RIVER The Columbia River is the most powerful river in North America in terms of power generation. The capability of power production of a river is measured by multiplying its head by its flow. The Columbia River has the highest head in North America while its inflow is lower than some other rivers such as the St. Lawrence, the Mississippi, the Mackenzie, the Yukon and the Nelson Rivers. The high head of the Columbia River is attributed to the fact that a large percentage of the River inflow originates from high mountain lakes in the Canadian Rocky Mountains in British Columbia and Alberta. A number of high dams are also built on the main stem of the river. For example, the Grand Coulee dam, located on Columbia River in the State of Washington is about twice as high as Niagara Falls1. In terms of catchment area, the Columbia River is the fourth largest river in North America with a drainage area of 715,640 Km2. The drainage area is located in British Columbia and in seven western U.S. states: Washington, Oregon, Idaho, Montana, Wyoming, Nevada and Utah. The River originates from Columbia Lake on the west slope of the British Columbia’s Rocky Mountain range. It then passes through the US and empties into the Pacific Ocean at the border between Washington and Oregon as shown in Figure 5. The River is 1,942 km long and flows into the Pacific Ocean near Astoria, Oregon. Approximately 25-50% (depending on the season) of the total runoff of the Columbia River system originates from the Canadian portion of the system, although it only represents 15% of the total Columbia River Basin (CRB) area. The Canadian contribution to inflows of the Columbia River is particularly significant in late summer and it forms about 50% of the river flows The Dalles in Oregon. 1 The information provided in this chapter is taken from references [34, 35 and 36]. 18 Figure 5. Columbia River basin and plants [37] The basin is located in the southern interior climate region, and is affected by both continental and modified maritime conditions. Most of the CRB lies in the Intermountain Region bounded by the Rocky Mountains to the east and north, and the Cascade Mountains to the west. The CRB topography is characterized by a series of mountain ranges that climb to significant elevations. Winter snow accumulation and melt is the most dominant factor influencing the natural hydrology of the Columbia River basin, as most of the precipitation in the area falls 19 between October and March. Natural runoff shows a low flow period in the fall and winter months, and a large spring peak flow, or freshet flows from snow melt. However, the annual hydrograph of the Columbia River over the past 40 years is not characteristic of natural processes, because the flow of the river is heavily regulated by many dams. Higher than natural flows occur in winter due to releases of reservoir storage for cold season energy production, followed by less than natural flows in the spring, therefore reducing the peak flow for flood control. Flood control is achieved by draining the reservoirs in mid-winter and then refilling them in the early summer with the freshet water flows that could otherwise create flooding conditions. Figure 6 shows the Columbia River annual runoff volumes from January to July at the Dalles in Oregon from 1925 to 2005. C o lu m b ia R ive r a t T h e D a lle s , O re g o n An n u a l J a n -J u ly ru n o ff v o lu m e s 15 0% 14 0% 13 0% Jan-July runoff volume (% of normal) 12 0% 11 0% 10 0% 9 0% 8 0% 7 0% 6 0% 5 0% 4 0% 3 0% 2 0% 1 0% 0% 1 92 0 1 93 0 19 40 19 50 19 60 19 70 19 80 19 90 20 00 20 10 Y ea r Figure 6. The Columbia River annual (Jan-July) runoff volumes at Dallas, Oregon from 1925 to 2005 [34] 20 3.2 THE COLUMBIA RIVER TREATY HISTORY The Columbia River Treaty (CRT) was initiated after the occurrence of a large flood in 1948. This flood destroyed Vanport, Oregon, which was a City of about 35,000. Almost 60 people were killed and a large number of homes, farms and dykes were damaged from Trail in BC all the way to Astroia, Oregan. Both the US and Canadian Governments were motivated to think about an agreement to prevent such a disaster from happening again and that could provide benefit of flood control and also power generation for the people of the region. They were also looking for new sources of electricity to meet the growing demand for energy. The two countries were engaged in negotiation during the period 1945-60 and eventually the Treaty was signed on 17 January 1961 by the Canadian Prime Minister Diefenbaker of Canada and President Eisenhower of the US. The US ratified the Treaty shortly but Canada did not. British Columbia requested that the downstream power benefits be sold to the US to provide the needed funds for building the dams on the Columbia and Peace Rivers. First, the government of Canada did not accept such a deal. But after the two governments started negotiation in 1961-64 they signed the Treaty Protocol in January 1964. This protocol clarified some issues and also allowed Canada to sell its entitlement to the US for US$254.4 million for 30 years. With these added agreements, the Treaty was ratified and implemented by the Canadian and the British Columbian government on September 16, 1964. The following is a list of the key historical dates of the CRT: • 1938-42 – The Grand Coulee Dam was built. • 1948 – The Columbia River flood caused much damage in both countries. • 1961-64 – The Columbia River Treaty was signed, ratified, and proclaimed - sale of first 30 years of Canadian Entitlement to the U.S. 21 • 1967-76 – The Duncan, Arrow (Keenleyside), Mica (Kinbasket), and the Libby (Koocanusa) dams were completed and reservoirs were filled for the first time. • 1984 – The Revelstoke dam was built and Non-Treaty Storage Agreement signed. • 2003 - All Treaty Entitlement energy returned to Canada (end of 30-year sale). • 2004 - End of Non-Treaty Storage Agreement release provisions- Water Use Plan process for Mica, Revelstoke, Arrow, and Duncan was completed. The CRT does not have a specified termination date. However at any time after 15 September 2024, both countries have the option to continue or to terminate the Treaty with a 10-year notice. Certain terms of the Treaty continue for the life of the projects, including the called upon flood control provisions, the Libby coordination obligations and the Kootenay River diversion rights. A number of important future dates in the CRT are listed below: • June 30th, 2011 - all Non-Treaty storage must be refilled. • September 15th, 2014 - Either Canada or U.S may give 10-year notice for termination of Columbia River Treaty. • September 15th, 2024 - Earliest termination date for Columbia River Treaty. The Treaty activities are mostly managed by the Canadian and U.S. Entities. The Canadian Entity is BC Hydro and Power Authority, and the U.S. Entity is the Bonneville Power Administration (BPA) and the Northwestern Division Engineer for the U.S. Army Corps of Engineers. The Treaty also established a Permanent Engineering Board, consisting of equal members from Canada and the U.S. that reports to the governments annually on Treaty results, any deviations from the operating plans, and assists the 22 Entities in resolving any disputes. Figure 7 shows the Columbia River Treaty organization chart for Canada and the US. CANADIAN GOVERNMENT Ministry of Foreign Affairs & Trade Ministry Natural Resources BRITISH COLUMBIA GOVERNMENT CANADIAN ENTITY for Art.XIV2j* Treaty Protocol UNITED STATES GOVERNMENT Department of State Department of the Army Department of Energy PERMANENT ENGINEERING BOARD* CANADIAN UNITED STATES CANADIAN ENTITY* PEB ENGINEERING COMMITTEE*** CANADIAN UNITED STATES CANADIAN COORDINATOR & SECRETARY** OPERATING COMMITTEE** CANADIAN UNITED STATES United States ENTITY* United States COORDINATOR & SECRETARY** HYDROMETEOROLOGICAL COMMITTEE** CANADIAN UNITED STATES * created by Treaty or governments, ** created by Entities, *** created by PEB Figure 7. The CRT organization chart [34] 3.3 THE CRT PROJECTS The Treaty required Canada to provide 19.12 km³ of reservoir storage through building three dams: Mica, Arrow, and Duncan. However, Mica dam was built higher than required by the Treaty, and provides 6.17 km³ of non-treaty storage space. Unless otherwise agreed upon, Canada is also required to operate these reservoirs for the 23 optimum power generation and flood control downstream in both countries. The allocation of power storage operations among the three projects is at the Canadian discretion. The Mica Dam was the only dam that was designed and constructed with a powerhouse. All the other Canadian Treaty projects were initially built only for the purpose of regulating the river inflows. In 2002, 35 years after the Keenleyside storage dam was originally completed, the 185 MW Arrow Lakes Hydro project was constructed. The Duncan Dam remains a pure storage project, and has no at-site power generation facilities. The Treaty allowed the U.S. to construct and operate the Libby project on the Kootenai River in Montana (Kootenay in Canada) which provides a further 6.14 km³ of active storage in the Koocanusa reservoir. Water behind the Libby dam floods back 68 km into Canadian territories, while the water released from the dam returns to Canada just upstream of Kootenay Lake. The Libby Dam was completed in 1973 and is operated for power, flood control, and other benefits at-site and downstream in both Canada and the United States, and neither country makes any payment for the resulting downstream benefits. The characteristics of all these dams are presented in Table 1. Table 1. Characteristic of Columbia River Treaty Plants [35] Dam Name DUNCAN ARROW MICA LIBBY Date Treaty Non-Treaty Generator Dam Height Completed Storage Storage Capacity 1967 1.73 km³ None None 40 m (130 ft) 1968 (1.4 Maf) 8.76 km³ 0.31 km³ 170 MW 52 m (170 ft) 1973 (7.1 Maf) 8.63 km³ (0.25 Maf) 6.17 km³ 1,740 MW 198 m (650 ft) 1973 (7.0 Maf) 6.14 km³ (5.0 Maf) None 604 MW 113 m (370 ft) Total CA (5.0 Maf) 19.12 km³ 6.48 km³ 1,910 MW Total US 6.14 km³ None 604 MW Total 25.26 km³ 6.48 km³ 2,514 MW 24 The Revelstoke dam is another dam on the Canadian side of the Columbia River. It was not built under the CRT provisions and was built under a separate agreement with the condition that its operation must not interfere with CRT operation. The Revelstoke dam is located 130 kilometers downstream from Mica and is the second largest power plant in BC Hydro generation facilities. It has 4 generating units and benefits from storage in the Mica reservoir. About 30% of its flow comes from local natural inflow and discharge from Mica dam contribute in the rest of about 70% of the river flow. 3.4 POWER GENERATION PROJECT CATEGORIES There are two major categories of power generation projects on the Columbia River and its tributaries: storage and run-of-river projects. · Storage projects: Storage projects are capable of adjusting the River’s natural inflow pattern to more closely match water and electricity use patterns. In these projects, rainfall runoff and snowmelt can be stored and used when needed. During spring, inflows originating from snowmelt are usually more than required for power generation, irrigation and other uses. This water is stored in the reservoirs until late summer, fall and winter when the demand for electricity is high. Storage dams also regulate the streamflow and prevent flooding. In these projects, reservoir levels may vary significantly during normal river operations. There is typically a large difference between storage reservoirs when it is full and when it is down at its lowest level. For example, the Kinbasket Lake operates over 47 meters range and the Arrow Lake operates over a range of 20 meters. Run-of-River projects: The other category is run-of-river projects, which have no storage and were developed primarily for navigation and hydropower generation. All run-of-river projects provide hydraulic head for power generation and many can also provide sufficient water depth over rapids and other obstacles to permit barrage navigation. Runof-river projects release water at the dam at nearly the same rate as its inflows, and the small storage that backs up behind run-of-river projects is referred to as pondage. 25 Reservoir levels behind these projects vary only 1 to 3.0 meters in normal operations. For example the Revelstoke storage level typically varies by 1.5-3.0 meters. However, it can be drafted to 17 meters in severe conditions. 3.5 THE CRT OBJECTIVES AND BENEFITS The CRT defines different purposes for water use. The primary objectives were ranked according to their priorities as follows: 1) Consumption, defined as water demand for domestic, municipal, stock-water, irrigation, mining and industrial purposes. 2) Flood control and prevention. 3) Firm energy demand must be met. 4) Target volume or reservoir refill by July 31 to maximize the firm energy for the following year. 5) Secondary energy was set as the last priority. In addition to these priorities, fisheries and environmental issues have become an important concern for decision makers. The following sections discuss some of these objectives in some details. 3.5.1 Flood Control Benefits Before any dams were built, there were significant variations in the seasonal streamflow of the Columbia River. For example, streamflows ranged from as low as 396 cms (14,000 cubic feet per second (cfs)) to 15,574 cms (550,000 cfs) at the US/Canadian border. This variation heightened the risks of flooding during the freshet period and drought during the fall and winter. The freshet period extends from late spring and early summer until end of July. After building these dams, the streamflow became regulated. Figure 8 compares the streamflow pattern of observed and pre-project flows at Castlegar and Trail in 1997. 26 C olu mb ia River at Birch bank (Rive r fl ow ga uge l ocated betw een Castlegar an d Trail) ) d n o c e s re p t e fe ic b u c 0 0 0 1 ( e rg a h c s i D 3 50 3 00 <= == == st art of m a jor re gio nal f lood d am ag e ( 280 k cf s ) 2 50 < == == = s t art of m in or re gion al flo od dam a ge (2 25 kc fs) 2 00 < == == = st a rt of lo c alized f lo od da m age (17 0 kcf s) 1 50 1 00 50 0 Oc t No v D ec Ja n Fe b Ma r Apr Ma y P re -p ro j ect flow Jun Ju l Au g Sep Ob serve d fl o w Figure 8. Hydrographs observed and pre-project flows for the year ending September 30, 1997 [35] The CRT includes several flood control rules. According to the CRT, Canada is required to operate at least 10.4 km³ (8.4 Million-Acre-Feet (Maf)) of storage at Mica, Duncan and Arrow for flood control in Canada and the U.S. To compensate for this operation, the U.S. paid $64.4 million for ½ of the present worth of the estimated future flood damages prevented by operation of the Canadian Treaty projects through 2024. The U.S. has also the option to request Canada to evacuate all remaining Canadian storage space if there is a potential major flood and if U.S. storage capacity is insufficient to limit the flow at the Dalles to 16,990 cms (600,000 cfs). In such case, the U.S. must pay $1.875 million for each of the first four requests for this “on-call” additional storage. The U.S. must also coordinate the operation of Libby for flood control in Canada. The Corps of Engineers estimated that the Treaty storage prevented more than US$ 200 million of damages in 1972 and 1974 and have also reduced the 1997 peak flows at the Dalles to 170 kcfs, and prevented about US$ 200 million in flood damages in that year. 27 3.5.1.1 Rule Curves Rule curves are a set of operating values defining reservoir storage that are used to operate the hydro system for specific needs. They help to coordinate the operation of reservoirs under various water conditions. Different types of rule curves are used for different operating objectives in typical hydro studies. In CRT operations, they are defined and used as follows: · The Upper Rule Curve (URC): This curve defines the upper storage limit at a reservoir to minimize the flooding risk. Since the amount of spring runoff is uncertain, a single or base curve is used for the months from August to December for all years. From January to July, a number of rule curves are used, depending on runoff forecast, to calculate the URC for each subsequent operation year. Figure 9 shows a typical URC for a reservoir (the Libby reservoir). Linear interpolation is used to calculate points on the curve if the forecast is between any given URC curves. The URC is also called the Flood Control Curve (FCC) in the BC Hydro system and as used in this study. Required storage space (MAF) Apr-Aug Libby forecast 0 3.5 MAF 1 4.5 MAF 2 5.5 MAF 3 4 6.5 MAF 5 7.5 MAF 6 Oct Nov Dec Jan Feb Mar Apr Figure 9. Flood control curve for the Libby reservoir [35] · The Energy Content Curve (ECC): This curve defines reservoir operation that provides high probability of refill by the end of the operating year. Drafting a 28 reservoir below ECC decreases the probability of refill and is only done to meet the firm load. This curve also has base curve for August to December and a variable curve for January to July base on runoff forecast. · Critical Rule Curve (CRC): This curve defines an operation that drafts the reservoir from full to empty over the number of years in the critical period to make most efficient use of available water. For a 4-year critical period, CRC1 depicts the first critical-year operation; CRC2 depicts the second critical-year operation and so on. Figure 10 shows typical rule curves (full reservoir, FCC, CRC1-4) for a typical reservoir. Figure 10. Rule curves for a typical reservoir operation under the CRT [35] 3.6 POWER BENEFITS In order to take advantage of available water, storage and capacity in the province, BC Hydro integrates the operation of its power production system to optimize the generation opportunities. Large reservoirs on the Peace and Columbia Rivers have a significant influence on the integrated system operation while the effects of other small plants are usually considered to be negligible. Plants in each river are linked hydraulically and they 29 should be coordinated in their flow, generation and storage for flood control and power generation. Figure 11 shows different installations on the Columbia River stem and tributaries. In operating the Columbia River plants, CRT has additional effect on the required coordination of the storage at Mica, Duncan, Arrow and Libby reservoirs and it also sets target for management of the flow at the US boundary. Non-Treaty Storage agreement with the US also impact how BC Hydro regulates the use of storage in Mica and Arrow. Figure 11. B.C. Columbia region [2] There are some other installations on the Columbia River. Two of them are on the Kootenay River, the Duncan Dam, which currently has no generation but impounds 1727 million m3 of live storage in Duncan reservoir, and the Kootenay Canal plant with 528 MW of installed capacity and uses the storage in Duncan lake and Kootenay lake (approximately 987 million m3) which is operated under the Kootenay Canal Agreement with west Kootenay Power. The other facilities are located on the Pend Oreille River system: Seven Mile, with 594 MW of installed generating capacity and limited daily storage capability and the Waneta with 450 MW of installed capacity. The total number of hydroelectric facilities located on the Canadian side of the Columbia River amount to 30 11 installations. They have 20 units and are capable of producing 5,200 MW of energy. These generating facilities are located in the upper Columbia region, the East and West Kootenay River and on the Pend Oreille River [2]. The River plants on the Kootenay River are not owned but are operated by B.C Hydro on behalf of their owners. Six main generation facilities and several smaller installations on the Columbia River system account for almost 50% of the installed capacity and 30% of the storage capacity of the BC Hydro generating system. The CRT includes several rules on power generation of the system. Some of them are: 19.1 km³ of Canadian storage is operated for optimum power generation in Canada and the U.S. Canada receives 50% of the increase in power generation capability downstream in the U.S. due to the operation of Canadian Treaty storage. Canada is required to operate Treaty projects according to specified monthly operating plans. However, Canada has flexibility to operate individual projects for maximum Canadian benefit as long as the overall Canadian storage operation meets Treaty targets. Downstream power and flood control benefits resulting from Libby storage operation belong to the country where they occur, i.e., the U.S. benefits accrue to the U.S. and Canadian benefits accrue to Canada. The CRT made other power projects in Canada and the U.S. feasible, for example Kootenay Canal plants in Canada, the Wells, the Grand Coulee, and the Bonneville projects in the US. 3.7 THE CRT OPERATING PLANS This section outlines the studies and plans that are regularly carried out to implement the CRT rules and conditions, namely: Assured Operating Plan, assessments of downstream benefits, Detailed Operating Plan, Treaty Storage Regulation and Actual Operating Plan. · Assured Operating Plan (AOP) Studies: The Assured Operating Plan determines the Canadian Entitlement amounts and establishes a base operation for Canadian Treaty storage. It includes little direct treatment of other interests that have grown in importance 31 over the years, such as fish protection, irrigation and other environmental concerns. To achieve optimal power benefit and flood protection, an Assured Operating Plan (AOP) for the Canadian Treaty storage for six years is developed annually. These optimum operational studies are based on firm and non-firm energy capacity, and do not include non-power requirements such as fish and recreation. The AOP document is signed by both countries and it becomes the assured operating plan after it is reviewed by the Permanent Engineering Board (PEB). PEB determines if the plan meets the objectives of the CRT and reports their recommendations to both governments. Figure 12 shows the outflow pattern of Arrow and Duncan with and without AOP regulation in 2004. (60-yr Avg.) 140000 120000 Unregulated 100000 04AOP Regulated 80000 60000 40000 JULY JUNE MAY 30-Apr 15-Apr MAR FEB JAN DEC NOV OCT SEP 0 31-Aug 20000 15-Aug Arrow+Duncan Outflow in cfs Arrow+Duncan Treaty Outflows Figure 12. Arrow and Duncan outflow with and without AOP regulation [35] · Assessment of Downstream Power Benefits: Downstream Power Benefits are calculated based on the AOP simulation studies and they do not represent actual storage and power systems operations. Basically, the Canadian Entitlement does not reflect or adjust actual power benefits. Canada is entitled to 50% of the downstream energy and capacity increments in the U.S. resulting from releases in excess to those specified by the 32 Canadian Treaty storage. There is a “first-added” right of the Canadian Treaty storage ahead of other U.S. storage projects benefits built after those outlined by the CRT. The 30-year sale of power benefits expired on 1 April 2003. Now all the energy and capacity of Canadian Entitlement are delivered to the BC border, and BC is the owner and Powerex acts as its agent for selling energy to BC Hydro or other customers in the U.S. · Detailed Operating Plan (DOP) Studies: The CRT allows for preparing a Detailed Operating Plan (DOP) each year to consider fisheries, recreation, and other benefits in addition to power and flood control concerns. DOP provides more advantages to both countries than the AOPs that were prepared to consider only power and flood control objectives. DOP is prepared each year for the following operating years from August to July. DOP must be accepted by agreement otherwise the countries will have to abide by the agreed upon AOP. A mutual agreement has to be made on the DOP for including fisheries and other benefits. However, the Treaty permits the Entities to incorporate a broad range of interests into the DOP. These interests are agreed on immediately prior to the operating year, and they could modify the AOP to produce results that are more advantageous to both countries. For more than 20 years, the DOP's have included a growing number of fish-friendly operations designed to address environmental concerns on both sides of the border. · Treaty Storage Regulations (TSR) Studies: The Treaty Storage Regulation studies implements the rules of the DOP within the current operating year. TSR takes the 6 rule curves and gives priority to flood control, firm power, refill and secondary power. It uses actual inflows to date plus forecast inflows to the end of the operating year. The TSR models the entire Columbia River basin in Canada and the U.S. · Actual Operations Plans Studies (Short term): The TSR studies (monthly storage targets) provide the starting points for the rights and obligations for all operations of Treaty projects. Both countries can agree to deviate from the TSR study by using supplemental agreements. A weekly conference call is made on every Thursday at 11 am between U.S. and BC Entities to discuss the Treaty flow requests for the following week 33 (Saturday to Friday). Treaty flow agreement is then finalized by Friday Noon. Withinweek flow shaping needs are accommodated whenever possible. Mid-week changes are possible by mutual agreement. Canada has rights for flexibility within Canada and transferring water between Mica, Revelstoke, Arrow, and Duncan reservoirs. The optimization of CRT reservoirs in Canada is mainly based on this transfer. Figure 13 shows comparison between the actual AOP treaty, DOP and unregulated outflow of Arrow and Duncun in 1999-2000 operating year. 19 99-0 0 C ana dian (Arr ow + D unc an) Tr eaty Ou tflo w s in kc fs 130 110 A ct ual Trea ty D O P TS R 100 U n regu late d Average Monthly Flow in kcfs 120 90 80 70 60 50 40 30 20 10 0 AUG SE P O CT NOV DE C J AN FE B M AR A PR M AY JU N JU LY A UG SEP 1 99 9 -00 O p e ra tin g Y e ar Figure 13. Comparison between the AOP, DOP and unregulated outflow of Arrow and Duncun in 1999-2000 operating year [35] The CRT regulation pattern can be summarized in the following points: 1. Inflow forecasts are usually issued in early February and are then updated in April, primarily using snow pack accumulated in the period October-February in the Columbia watershed. 2. During the period Aug-Dec, conservative operation of the storage system is carried out in accordance with the treaty rules. This operation consist of: • Drafting reservoirs only to meet Treaty firm load requirements. 34 • No drafting to meet secondary energy requirement is carried out unless it is required to so to meet flood control curves requirements. • Water is conserved until Jan 1st to meet firm loads in later periods when snowpack measurements are low. 3. In average and above-average water years, the flood control and operation curves (as shown in Figure 10) drops down until end of January, and the system is allowed or sometimes is forced to release more water in January depending on the inflow forecast. The three higher ranking objectives, consumptive use, flood control and firm energy, are met and secondary energy can be produced. 4. In below-average water years, flood control and operation curves are kept high. In these years only firm energy demand is met and no secondary energy is produced. 3.8 CRT MODELS Differences and conflicts between objectives and constraints in this multi-purpose, multi-reservoir system on the one hand and uncertainty and variations of the problem parameters on the other hand make modeling and analysis of this problem a difficult and complex task. A number of simulation and optimization models have been developed to describe the system’s behaviour and to determine the optimum operating plans in order to optimize power production while meeting all other system’s constraints. Also these priorities and conflicting objectives pose several constraints on operation of the treaty and non-treaty reservoirs, and they can be treated in many instances as additional and conditional constraints in the optimization model with an overall objective of maximizing the value of resources by making the optimal trade-off between energy trade with the US and Alberta and maximizing the value of water stored in reservoirs while meeting all power and non-power constraints and complying with current and future Columbia River treaty obligations. 35 A deterministic model of the CRT was developed at BC Hydro. The model is based on the work of Shawwash in 2000 [3] and the GOM model and its extension by the BC Hydro-UBC research team. This study incorporates the inflow and price stochastic variables into this modeling system and a Reinforcement Learning approach is used to model CRT under uncertainty. The modeling concept and details are presented in the next chapter. 36 4. MODELING METHODOLOGY 4.1 INTRODUCTION Reinforcement learning (RL) is a learning method that can be used to arrive to an optimal solution. The method is based on interactions between an agent and an environment and on assessing the consequences of making decisions. Learners are not told what to do to achieve the optimal solution, but they discover how to act and improve their behaviour. This method has combined concepts of different disciplines including machine learning, operations research, control theory, psychology and neuroscience [29]. Machine learning, a sub category of artificial intelligence algorithms, is widely used in different areas. There are several machine learning algorithms that are based on performance improvement over time by using input/output data. The methods of machine learning can be categorized into different types, depending on the outcome of the algorithm. The main categories of machine learning methods can be summarized as follows [38]: · Supervised learning algorithms: In these algorithms the objective is to generate or approximate a function which maps the inputs to a desired output according to a predefined target attribute, which is called label variable. · Unsupervised learning algorithms: In these algorithms, the objective is to determine how the data are organized. Here, there is no label variable and algorithm tries to categorize instances without any predefined target attribute. · Semi-supervised learning algorithms: In these algorithms both labelled and unlabelled instances are used to make or approximate a mapping function. · Reinforcement learning algorithms: These algorithms are based on learning from previous actions by receiving feedback from an environment. This 37 method will be used in this study and will be explained in detail in subsequent sections. · Transduction algorithms: Instead of generating a function as in supervised learning, these algorithms try to predict new outputs based on training inputs and outputs and then test the inputs which are available during training. · Learning to learn algorithms: these methods are based on learning inductive bias based on previous experience. The reinforcement learning algorithm which has been applied by Abdalla et. al (2006) [4] to the B.C. Hydro generation system is further developed in this study to incorporate more details on the Columbia River Treaty constraints. To describe the main concept of RL, a good way is to review dynamic programming (DP) first. Knowledge of this well-known method makes explanation of RL easier to understand since it is based on Markovian Decision Process (MDP). A very brief description of DP is presented next. 4.2 DYNAMIC PROGRAMMING Stochastic Dynamic programming (SDP) is a powerful method for solving non-linear and sequential optimization problems such as hydropower operation problems. In this method a complex problem is broken down into a number of smaller and simpler sub-problems. Then these sub-problems are solved recursively. In this method, achievement of the optimal solution is guaranteed. These are the main characteristics of SDP, which make it practical and attractive for researchers and decision makers. In SDP, continuous decision variables are discretized into a number of limited points called the state space in each time step (stage). This changes a problem from a continuous infinite to a discretized finite problem. Then the best decision is made in each stage based on the information and the decisions in previous stages. 38 However, SDP has some well-known difficulties as follows: 1- Calculating the transition probabilities and transition rewards of the system in SDP are usually difficult and computationally expensive, particularly for large scale stochastic models as defining and evaluating the transition probabilities can sometimes be impractical. Gosavi (2003) states that: "Evaluating the transition probabilities often involves evaluating multiple integrals that contain the probability density functions (pdfs) of random variables. It is for this reason that DP is said to be plagued by the curse of modeling" [39]. 2- The required time and computational effort for solving SDP problems increases exponentially as the state discretization and number of stages increase. This problem is known as the “curse of dimensionality” which is a well-known term in SDP studies. One way of overcoming these problems is by applying machine learning algorithms to find an approximate solution to SDP problems. Reinforcement Learning (RL) is one of the important methods of Artificial Intelligence (AI) and machine learning which can be applied to the reservoir operation and management problems [4]. This method has the potential of learning how to control a larger system in a shorter time and less computational effort with or without a formal model of the system. 4.3 REINFORCEMENT LEARNING VERSUS OTHER METHODS · RL and DP: Compared to DP, RL leads to a solution which is approximately or nearly optimal, with significantly less time and computational effort and it overcomes the curse of dimensionality [29]. Figure 14 compares RL and DP algorithms. · RL and Evolutionary Methods: RL is more efficient than other evolutionary methods as Sutton et al. (1998) stated in [29]. They believed that “evolutionary methods 39 ignore much of the useful structure of the reinforcement learning problem: they do not use the fact that the policy they are searching for is a function of states and actions. In some cases this information can be misleading, but more often it should enable more efficient search. Although evolution and learning share many features and can naturally work together, as they do in nature, we do not consider evolutionary methods by themselves to be especially well suited to Reinforcement Learning problems” [29]. Figure 14. Reinforcement learning and dynamic programming comparison [29] Therefore, RL has several advantages compared to other methods. The description of the method and its components are briefly explained next. More details on the RL technique can be found in [29] and [4]. 4.4 REINFORCEMENT LEARNING COMPONENTS As mentioned previously, the RL algorithm is based on learning from previous behaviour 40 for performance improvement. RL has several components defined as follows: · Agent: The learner which makes decision according to received feedback and tries to improve its performance. · Action: the decision that is made in each step by the agent. · Environment: a simulation or optimization model of the system or the real system. · Reward Function: feedback received by the agent from the environment indicating how well the agent behaved in choosing the action. The RL concept can be summarized as follows: an agent takes an action and after interacting with the environment receives a feedback as a reward. These four keywords are the four main components of RL that could be defined by a model. The interaction between the agent and the environment takes place in a sequence of time steps (stages). Like DP, the state space is discretized and this interaction occurs in all the possible states variables in each stage. After interaction, the agent receives two feedbacks: a reward defined by a reward function and the state of the next stage (t+1). The agent saves and updates the reward and little by little learns how to behave to maximize the cumulative rewards. The reward function maps each state-space pair to a single number. Figure 15 shows the schematic of the interaction between the agent and the environment. 41 AGENT State st Reward rt Action at rt+1 ENVIRONMENT st+1 Figure 15. The interaction between agent and environment in RL [29] Other components of RL can be summarized as follows: · Policy: the agent's behaviour at a given time dictates how to act in each state. It can be a simple function or a lookup table, or extensive computation such as a search process. The policy determines the behaviour and is the core information used by a reinforcement learning agent [29]. · Value Function: is a long term desirability of a state or the total amount of available rewards that can be expected after considering the following probable states. The reward itself indicates the immediate or short term desirability of being in a state and can be low while the value of that state is high because it can be followed by high reward states. The reverse could also be true. So value and reward are linked to each other. Values are estimated to achieve more reward and rewards are essential for estimating values. For making/evaluating decisions or choosing actions, high values are considered rather than high rewards. Rewards calculation are easier than value functions since they are received from the environment while value functions should be estimated from the sequences of observations an agent makes over its entire lifetime. Value function estimation is the most dominant part of the RL algorithm. · Exploration/Exploitation: The agent will make the next action taking into 42 consideration the previous reward in order to achieve the best decision. This aspect is named exploitation and it guaranties the convergence to the optimal solution. To prevent the algorithm from getting stuck in a local optimal solution, the agent takes arbitrary actions with a given probability to search for a better solution among a larger number of feasible solutions. This aspect is known as exploration and takes place more frequently at the initial stages of the optimization process. As the agent matures, the probability of exploration will decrease and the model learns from the previous reward to take an action which makes it closer to the optimal solution. Two methods for balancing the exploration and exploitation in RL are the e-greedy and softmax action selection rules. In the e-greedy method the learner selects the highest estimated action value most of the time. The advantage of the e-greedy method is that the probability of choosing every action is infinity when the number of trials is large [29]. The disadvantage of this method is that the algorithm gives an equal weight to all the actions in the exploration process. It does not consider the estimated value of the selected action so the learner equally values the worst action and the second best action. In the softmax method this problem has been solved. In this method, the agent chooses between actions according to their estimated values. The best action is taken greedily and the rest are weighted and ranked based on their value functions [4]. The RL is based on the assumption that the interaction can be modeled within the Markovian Decision processes (MDP) framework. If a state signal from an environment summarizes all the history of past states, it is said to have Markovian property. In other words, if a signal state has Markovian property, the environment responses to it only based on the present state-action pair rather than on the previous ones. So a state signal has a Markovian property if the representation of the transition probability is true [4]: p sas¢ = Pr{s t +1 = s ¢, rt +1 = r | s t , rt , s t -1 , rt -1 ,..., s 0 , r0 } = Pr{s t +1 = s ¢, rt +1 = r | s t , rt } (4.1) 43 Where Pr is the probability of going to state s ¢ from a given state s at time-step t , s is state and r is reward, and psas¢ is the probability of moving to state s for a given state s ¢ . Equation (4.1) states that the probability of moving to state s ¢ and having reward r at time step t only depends on the previous time-step state and reward. The expected value of the immediate reward is: Rss¢ = E{rt +1 | st = s, st +1 = s¢} (4.2) Where Rss ¢ is the immediate reward for moving from state s to state s ¢ and E{ } is the expected value function. These are the basic aspects of an MDP process. If the state and action space in a MDP are finite then it is called a finite Markovian decision process. 4.5 REINFORCEMENT LEARNING APPLICATION Two different applications of the RL method are possible. One is to find an optimal policy, known as optimal control problem and the other is to evaluate an existing policy known as the prediction problem [4]. In the prediction problem, the value function only depends on the state of the environment while in control problems it also depends on the action that has been taken. The equation of calculating value function for a given policy p is: V p (s ) = Ep {Rt | st ì = s} = Ep íå g k rt + k +1 | s t ¥ î k =0 ü = sý þ (4.3) Where V is the value function of state s in policy p , E is the expected value, R is the reward received after t time steps, g is the discount factor, r is the immediate reward, and k is the time horizon. The control equation problem is: 44 ì¥ ü Q p ( s, a) = Ep {Rt | s t = s , a t = a} = Ep íå g k rt + k +1 | st = s , at = a ý î k =0 þ (4.4) In this equation, Q p ( s, a) is the expected return if the user follows policy p , starts from state s and takes action a . The value functions linking state s and its possible successor states can be derived recursively: [ ] V p (s ) = Ep {Rt | s t = s} = å p (s , a )å p sas¢ Rsas¢ + gV p (s ¢) , "s Î S a s¢ (4.5) According to this equation, known as the Bellman equation, the value function of state s is equal to the discounted value function of the expected next state s ¢ plus the expected reward of the stage. The optimal policy is one that maximizes the value function V p (s ) , which constitutes the unique solution to the Bellman equation, that can be defined as: V * ( s ) = max V p ( s ), p "s Î S (4.6) The optimal action-value function Q * (s , a ) in terms of V * ( s ) is: { } [ Q * (s , a ) = Ep rt +1 + gV * (s ¢) | s t = s , a t = a = å p sas¢ Rsas¢ + gV * ( s ¢) s¢ ] (4.7) In the control problem, the optimal actions appear best after a one-step search with the optimal value function V * for each state. The long term optimal action can be found by a one-step-ahead search. The agent can find the best action by having Q * instead of doing a one-step-ahead search. The results of all one-step-ahead searches are stored by the action-value function. This function provides the optimal expected long-term return as a value that is locally and immediately available for each state-action pair [29]. 45 4.6 REINFORCEMENT LEARNING ALGORITHM There are three methods for solving RL [4]. These methods are: 1. Dynamic programming (DP): a planning method, model-based (requires a model of the environment) 2. Monte Carlo techniques (MC): learning method, model-free (learns from basic experience, without using a model). 3. Temporal Difference learning methods (TD): learning method, model-free (learns from basic experience, without using a model). 4.6.1 Dynamic Programming Methods (DP) Dynamic Programming is a powerful method for solving Markovian Decision Processes. As stated previously, DP can explain the theory and fundamentals of RL. DP needs a perfect knowledge of the model of the environment, and recursively use the value functions and the Bellman equation of optimality to obtain the optimal policies. The value function is updated by estimating the value function of successor states. DP problems can be solved by three methods: the policy iteration, the value iteration and the linear programming methods. The first two methods are discussed next. 4.6.1.1 Policy Iteration In the policy iteration method, the policy p is given and the objective is to find its optimal value function. First, the values of all states are initialized and then updated iteratively by successive approximation and by applying the Bellman equation recursively. This iteration stops when the difference between two successive value functions is small enough [29]. 46 The policy can be improved by these estimated action-values. If some actions, which are not in the policy p , add more values than current actions then they are replaced. This method of policy improvement is usually done by taking greedy actions. The policy iteration algorithm is a combination of these two steps: policy evaluation and policy improvement. The algorithm evaluates the value of a given policy and then improves it in a greedy way and again evaluates it and so on until the optimal policy is reached. 4.6.1.2 Value Iteration The value iteration method also combines the policy evaluation and improvement steps to achieve the optimal solution. But the difference is that in the value iteration algorithm, the optimal policy is estimated directly by taking the best actions. In this algorithm all the state space is swept once and then the policy is evaluated by sampling sequences of states, actions, and rewards. Then the algorithm is followed by a policy improvement step. As mentioned earlier, the DP algorithm needs a model of the environment, and specifically a way to derive the transition probabilities and the expected immediate rewards. This model is not available sometimes or could be difficult to be built particularly for large-scale problems. The next two algorithms are based on learning and they do not require a model of the environment. 4.6.2 Monte Carlo Methods (MC) In the Monte Carlo (MC) algorithm the agent estimates the value function based on the experience it gains. MC samples random parameters based on their mean, distribution function and other moments. Having generated these parameters, the expected value function V and action-value function Q are then estimated. When the number of generated random parameters is sufficiently large, the average of the calculated value and 47 action-value functions gets closer to their accurate expected values. MC control methods can be made by switching between policy evaluation and policy improvement for the complete steps of the episode. Each episode contains starts from a state or state-action pair, follows a given policy until the end of the learning process. The calculated value and action-value function from the sampled parameters at the end of the episode are then used for policy evaluation and improvement. To assure that the number of sampled parameters is large enough to capture all the possible actions for the policy improvement, two approaches can be used. They are known as on-policy and off-policy method. In on-policy control method, the agent generally uses a soft policy, meaning that p (s, a ) > 0 for all s Î S and all a Î A . Several different methods of on-policy approach are available [29]. In this method the agent does a continuous exploration and tries to find the best policy in the process. In the off-policy method sufficient exploration is guaranteed by following two policies: the agent uses one arbitrary stochastic policy for interacting with the environment and generates a behaviour policy. The other policy, called the estimated policy, which is a deterministic (e.g. greedy) policy. This helps the agent to cover sufficiently large action space. There are two main differences between MC and DP methods: First, MC methods operate on sample experience and can be used for direct learning from interaction with the environment without a model. Second, MC methods do not estimate their value for one state on the basis of the estimates of successor states [29]. 4.6.3 Temporal Difference (TD) The key idea of the RL can be presented by Temporal Difference (TD) learning methods. 48 This method is a combination of DP and MC methods outlined earlier. In this study, the TD method has been used in the RL algorithm. Similar to MC methods, the TD approaches use experience for solving the prediction problem. Both of these methods do not require a model of the environment. The similarity between TD methods and DP is that they update estimations of the value function without waiting for the final outcome. So the TD methods have advantages of both DP and MC. The agent observes one successor state when it interacts with the environment instead of using all possible state values and weighting them by their probability distribution. Building a model of the environment is not required. Additionally, TD methods do not need to wait for the end of the episode for updating the value function [29]. Different versions of TD algorithms are available. The simplest one is known as one-step TD or TD (0). In this algorithm, the agent waits only for the next time step to update its estimate of the value function. This estimation is based on the value function of the observed state transition V (s t ) and the immediate rewards received from the environment. The following update is performed on each time step: V (st ) ¬ V (st ) + a n [rt +1 + gV (st +1 ) - V (st )] (4.8) In this equation, 0 £ a t £ 1 is a step-size parameter representing the learning rate. This rate decreases as the agent matures. The Equation (4.8) states that: New Estimate ←Old Estimate + Step_Size*[Target – Old Estimate] As it can be seen that the target for the TD algorithm is rt +1 + gV (st +1 ) . Difference between Target and Old Estimate is the error between two successive estimation of the value function. As the agent gets closer to the Target, this error becomes smaller. 49 In prediction problems, in which the policy p is given, it has been proven that the TD algorithm converges to V * if the following two conditions are met: ¥ 1. åa t =0 t = ¥ which guarantees that the step-size is sufficiently large to make the problem independent of the initial condition and random fluctuation. ¥ 2. åa t =0 2 t < ¥ which guarantees that the step-size becomes gradually small enough to converge to an optimal solution [29]. In the control problem in which estimation of the action-value function is needed, TD methods can be used to evaluate or predict the policy iteration algorithm. Similar to the MC methods, and to ensure that sufficient exploration is taken place either the on-policy or the off-policy approaches can be used. Figure 16 schematically compares DP, MC and TD algorithms and how the value function is calculated in these methods. 50 Figure 16. DP, MC and TD methods and their value function [29] 4.6.4 Function Approximation The simplest way for storing the value function for each state (or state-action pair) in DP is by using look up tables. When the problem is finite it is an efficient and straightforward method. But when the state space is large and infinite, which include continuous variables, this method becomes impractical. An alternative for look up tables is to approximate a function using the value function results obtained in the iterative process. This can be done in different ways and have been discussed in [29] in detail: 1. Function fitting (neural network and regression). 2. Function interpolation (K-nearest-neighbours and Kernel methods), 3. State aggregation. Each of these methods has their own advantages, weaknesses and difficulties. In the 51 function fitting approach different functions can be used such as Piecewise Linear function or quadratic function. For the PWL function, interpolation between discretized points of the value function is carried out. For the quadratic function, the general form of value function base on each state can be approximated as: f a ( s) = A + B ´ s + C ´ s 2 (4.9) If parameters A , B and C can be estimated and the value function can then be approximated in each state point. In this case, only these parameters need to be stored instead of all Q-value. These methods make the model capable of handling a large state space and can yield smoother value functions. 52 5. CRT-RL MODEL AND RESULTS 5.1 INTRODUCTION In this study, the reinforcement learning model developed by Abdalla (2007) [4] was extended to optimize the BC Hydro power production system including the Columbia River Treaty operations. In the BC Hydro generation system, the overall objective is to maximize the value of resources, which is a function of immediate energy trade and potential future energy production. The details of the Columbia River Treaty optimization model with the reinforcement learning algorithm, called CRT-RL model in this study, and its results are presented in this chapter. 5.2 MODEL COMPONENTS As outlined in chapter 4, a RL model has several components. These components are problem dependant and should be defined case by case. The components of the CRT-RL model in the current study are described below. The corresponding mathematical notations are also presented. 5.2.1 Time Horizon T : Final time period in a cycle (iteration or episode) t : Current time step (stage) in a cycle (iteration or episode), where t Î {1,..., T } . The model is based on the traditional stochastic dynamic programming framework, which includes a sequence of decisions regarding future opportunities. The problem on hand represents a Markovian Decision Process, which means that the outcome of choosing an action (making a decision) only depends on the present state. Usually the time horizon is considered to be more than one year for long term planning of reservoir operations. In 53 this model, because of multi-year operation of large reservoirs (Mica and GMS), the time horizon is taken to be 3 years. Every year has 14 time steps according to the CRT rules and constraints. August and April have two time steps (one starts at the 1st and the other at 16th day of the month) and the rest of the months have one time step that starts at their first day. So in total there are 42 time steps in a typical study. 5.2.2 State Space J : Number of reservoirs included in the RL model, where { j Î1,..., J } . n j : Number of state discretization for reservoir j . {S }: A set of discretized storage volume for reservoir j , where j { n S j = s1j , s 2j ,..., s j j } in cms-d (cubic meters per second for one day). s j ,t : Storage volume in reservoir j at beginning of period t in cms-d (storage state variables). s j ,t +1 : Storage volume in reservoir j at the end of period t in cms-d. At any time step the state of the system represents the amount of water in the reservoirs. This amount is discretized between the corresponding maximum and minimum possible reservoir storages. The state space is made up by different possible combinations of discretized reservoir storage levels. As mentioned earlier, the BC Hydro power generation facilities are operated in an integrated manner to maximize the overall value of resources and takes advantage of the diverse mix of resources to compensate possible energy shortages in one region by producing electricity in another region. So the Columbia and Peace River reservoirs are considered in this study as the most important power generation facilities in BC. The following five plants are included in the optimization model: Mica, Revelestoke and the Arrow Reservoirs on the Columbia and the GMS and Peace Canyon on the Peace River. To control downstream floods in the Columbia River according to CRT rules, two control points have been included in the model at the town of Trail and at the US border. These 54 control points are considered in the optimizer (environment),which will be described in subsequent section. Discretization of the state space for the five plants included in this study would represent an ideal work, but it will results in a very large state space. It could also unnecessarily make the problem harder to solve since the Revelstoke and Peace Canyon and Arrow reservoirs can, for practical purposes be considered run-of-river projects. So only the two big reservoirs, the Willistone (GMS plant) and the Kinbasket (MCA plant) reservoirs, have been selected to represent the storage in the entire system, and it follows the work of Abdalla (2007) [4]. It could be argued that the Arrow reservoir is large enough to be added to the two reservoir problem, and this work is left for future research. The other reservoirs and control points are hydraulically linked to these reservoirs and the two plants could be considered to be representatives the overall system. The state space is made by discretizing the GMS and the Mica storage volumes at 10 and 9 discrete state respectively that varies between their maximum and minimum reservoir storages. Therefore the state space consists of 90 points in each time step as listed in the Table 2: Table 2. Discretized GMS and Mica Reservoir Storages (1000 cms-day) Plant 1 2 3 4 5 6 7 8 9 10 GMS 75 125 150 200 250 300 350 400 450 470 Mica 125 145 165 185 205 225 245 265 285 5.2.3 Environment The environment can consist of a real system or a model of the system (simulator or optimizer), which interacts with the RL agent and feedback to it in each time step and state space point. Abdalla (2007) used the BC Hydro Generalized Optimization Model (GOM) as an environment in the RL algorithm he has developed [4]. GOM is used in BC Hydro for simulation and optimization of the integrated BC Hydro power generation system. GOM is based on the Short Term Optimization Model (STOM) developed by Shawwash (2000) [3]. GOM contains basic optimization formulation of STOM and it has 55 more flexibility to consider longer time horizon at various time resolutions and it is a deterministic model which incorporates the forecast of system inflows, domestic load and market prices into the optimization. The basic objective function of GOM is to maximize the value of BC Hydro resources. It also includes a detailed hydraulic simulation to calculate the hydraulic balance, generation and turbine limits in each time step. In this study, the GOM was extended for the Columbia River Treaty, called CRTOM (Columbia River Treaty Optimization Model), and is used as the environment. CRTOM was developed by Shawwash (2009) [40] to consider the Columbia River Treaty rules and constraints in the optimization process. CRTOM has variable time steps: hourly, daily, weekly, sub-monthly and/or monthly. Sub-time steps may also be used to divide time steps into shorter time periods. This enables the model to include different load conditions within a time step based on loadduration curves. For example weekly or more than a week time steps have to reflect load difference in weekdays and weekends. Daily and more than a day time steps may contain high, peak and low load hour sub-time steps. In this study, one sub-time step has been considered for weekdays and weekends. Other details on CRTOM components are presented next. 5.2.3.1 Decision Variables of the Environment The CRTOM decision variables are: QTK ,T ,h : Turbine release from reservoir k at time step t and sub-time step h in cms. QS k ,t ,h : Spill (non-power) release from reservoir k at time stept and sub-time step h in cms. G k ,t , h : Generation from plant k at time step t and sub-time step h in MWh. SpotUSt , h : Spot transaction (import/export) to US market at time step t and sub-time step h in MWh. 56 Spot ABt , h : Spot transaction (import/export) to Alberta market at time step t and sub-time step h in MWh. 5.2.3.2 Constraints The optimization model incorporates many constraints, most of them are linear and some have been modeled as piecewise linear equations. The details of these constraints are presented here: i. Hydraulic Continuity Equation This constraint implies that the reservoir storage in each time step is equal to the previous time step storage plus water flowing into the reservoir from upstream reservoirs’ turbine and spill releases and local natural inflow minus water discharged from the reservoir through turbine and spillage. I wj ,t : Local natural inflow to reservoir j in time step t for scenario w where w Î {1,..., W} in cms. C kjT : Elements of the matrices representing the hydraulic connection between the reservoirs of the turbine outflow from reservoir k to reservoir j . C kjS : Elements of the matrices representing the hydraulic connection between the reservoirs of the spill outflow from reservoir k to reservoir j . H t ,h : Number of hours in sub-time step h at time step t and h Î (1,2,..., hn ) . H t : Number of hours in time step t . S j ,t +1 = S j ,1 when t = T . C kjT , C kjS = 0 if there is no physical hydraulic connection between the reservoirs. C kjT , C kjS = 1 if j = k . C kjT , C kjS = -1 for "k ¹ j and reservoir j is physically connected to reservoir k . 57 The equation is: é K hn ù S j ,t +1 = S j ,t + ê ± åå QTk ,t ,h ´ C kjT + QS k ,t , h ´ C kjS ´ H t , h + I wj ,t ´ H t ú / 24 ë k =1 h =1 û ( ) "k ¹ j (5.1) ii. Storage Bound Constraints S kMin ,t : Minimum reservoir storage for reservoir k and time stept . S kMax ,t : Maximum reservoir storage for reservoir k and time stept . fcck ,t : Maximum reservoir storage based on flood control curves for reservoir k and time step t when the reservoir is under the CRT. Reservoir storage must be operated within the storage limits. These limits are calculated based on piecewise linear (PWL) functions relating the storage volume to the reservoir elevation (Forebay) in the GOM model, which can be stated as S k ,t = f (Fbk ,t ) . For plants that are operated under the CRT rules, flood control curves are also considered to obtain these limits. For each inflow scenario, a unique flood control curve exists which is used in the optimization model as a limit on the maximum storage volumes for Mica and Arrow. The Mica and Arrow maximum storage is set to be equal to the minimum of the normal maximum storage limit and the flood control storage limit. Figure 17 and 18 show the flood control curves for Mica and Arrow corresponding to the 69 inflow scenarios of historic records. Max S kMin ,t £ S k ,t £ S k ,t "k ¹ {Mica, Arrow}, t Max S kMin ,t £ S k ,t £ min( S k ,t , fcc k, t ) "k = {Mica, Arrow}, t (5.2) (5.3) 58 280000 270000 260000 250000 240000 230000 220000 J ul J un M ay 1 6 -3 0 A p r 1 -1 5 A p r M ar Feb J an Dec Nov Oct S ep 1 6 -3 1 A u g 1 -1 5 A u g M ic a M ax im u m V a lu e S to rag e (c m s-d ) 290000 Arrow Maxim um Storage Volum e (cm s-d) Figure 17. Mica flood control curves for 69 years 102000 101000 100000 99000 98000 97000 96000 95000 Jul Jun May 16-30 Apr 1-15 Apr Mar Feb Jan Dec Nov Oct Sep 16-31 Aug 1-15 Aug Figure 18. Arrow flood control curves for 69 years iii. Power Generation Constraints G k ,t , h : Power generation of plant k at time step t and sub time step h in MWh. HK k ,t : The amount generation per discharge unit for plant k at time step t in MWH/cms. 59 Power generation in a reservoir is a function of the reservoir elevation (Forebay), turbine discharge and number of units available for commitment. This can be stated as G j ,t = f (FB j ,t , QT j ,t , U j ,t ). This function is not linear but can be presented by a PWL surface function similar to the functions that have been used in GOM and STOM. In the CRTOM, instead of using the PWL curves, the HK tables have been used. HK is defined as the amount of generation for a unit of turbine discharge. HK tables are given in the model as an input. The equation is: Gk ,t ,h = HK k ,t ´ QT j ,t (5.4) iv. Total Power Generation Constraints ORk : The percentage of operating reserve from plant k . GTk ,t ,h : Total power generation and operating reserve from plant k at time step t and sub-time step h in MWh. The operating reserve is a specific level of reserve capacity that should be available at all times to ensure reliable power grid operation. The equation is: GTk ,t ,h = G k ,t ,h + G k ,t ,h ´ ORk (5.5) v. Generation Limit Constraints GkMin ,t : Minimum possible power generation for reservoir k in time step t in MWh. GkMax ,t : Maximum possible power generation for reservoir k in time step t in MWh. Generation for each reservoir k in each time step t and sub-time step h cannot be outside these as represented in the following equation: 60 Max GkMin ,t £ GTk ,t ,h £ Gk ,t "k , t , h (5.6) vi. Load-Resource Balance (LRB) constraints Lt , h : System load at time step t at sub time step h in MWh. Gt¢, h : The fixed and shaped generation from all other small hydro, IPP and thermal plants that are not included as a decision variable in the optimization problem. The amount of energy produced by reservoir k and other small hydro and thermal plants in time step t and sub-time step h plus power purchased from the US/Alberta minus power sold to the US/Alberta has to be equal to system load at that time step and sub-time step. The equation is: K åG k =1 k ,t ,h + Gt¢,h + SpotUSt , h + Spot ABt , h = Lt ,h "t , h (5.7) vii. Spot US and Alberta Transmission Constraints Min T AB , TUSMint , h : The inter-tie transmission limit from BC to Alberta and the US at time step t ,h t at sub time step h in MWh. Max T AB , TUSMax : The inter-tie transmission limit from Alberta and the US to BC at time step t ,h t ,h t at sub time step h in MWh. Transactions between BC Hydro and the US/Alberta in each time and sub-time steps cannot violate the transmission limits as represented in the following equations: TUSMint , h £ SpotUS t , h £ TUSMax t ,h (5.8) Min Max T AB £ Spot ABt , h £ T AB t ,h t ,h (5.9) 61 viii. Turbine Bounds Max QT jMin ,t , QT j ,t : Minimum and Maximum limits in cms on turbine discharge for plant j and time step t . The water discharge from each turbine in each time step cannot violate the limits as represented by the following equation: Max QT jMin ,t £ QT j ,t £ QT j ,t (5.10) ix. Plant Discharge bounds The plant total discharge, which includes turbine and spill discharges, cannot violate the limits. These limits are calculated based on elevation, unit availability and optimum unit commitment curves. The equation can be written as: Q Min £ QT j ,t + QS j ,t £ Q Max (5.11) 5.2.3.3 Objective Function Pt ,h : Market price at time step t for scenario w in $/MWh. MVWkN,t -1 : Marginal Value of Water estimated by RL model in $/m3. The model objective function consists of two terms. The first term calculates the revenue from selling/purchasing energy to/from the US and Alberta. The second term calculates the value of water in storage. It is calculated by multiplying the marginal value of water calculated in iteration N - 1 in RL model (to be described later) by taking the difference 62 between the target (end of period) storage calculated in iteration N - 1 and the storage calculated in the current iteration N . The equation is: Maximize: å [(Spot ht h =1 USt , h ) ] K [( ) + Spot ABt ,h ´ H t ,h ´ Pt ,h + å skN,t +1 - skN,t-+11 ´ MVWkN,t -1 ] "t (5.12) k =1 Where ht is the number of sub time-steps in time-step t. All the above constraints and the objective function are linear or PWL equations. So the linear programming technique can be used as the optimization method in the environment. 5.2.4 Actions In this study, the TD method of RL has been used and therefore no actions need to be taken by the model. The environment, the CRTOM takes the state space and scenarios as inputs and optimizes reward for the agent. This entails that the model allows the environment to choose the optimal action from a continuous range of actions rather than randomly choosing from a set of discretized actions. 5.2.5 Rewards g : Monthly discount factor of future rewards. s ¢ : State vector representing a possible combination of different storage increments of the different reservoirs ( ) j within the state space where s ¢ = s1i1 , s 2i2 ,..., s Jj , with i i j Î {1,..., n j } for i = 1,..., J in cms-d. rt (s¢ ) : Rewards of transition to state s t¢+1 at end of time step t in $-Canadian. The reward function is a feedback from the environment (CRTOM) to the agent at each time step and it shows how well it improved so far. At each time period, the optimization model (CRTOM) determines the optimal control strategy that maximizes the benefits 63 based on the current state of the system. The objective of model is to maximize the cumulative reward received from environment. 5.2.6 Value Function Qt (s¢) : The Q-value function when the system state is s t¢ at the beginning of time step t in cms. f t (s ¢) : Expected value function when the system state is s t¢ at the beginning of the time step (stage) t in $. N : Iteration number or the age of the agent. a n : Learning rate (step size) parameter in iteration n where n Î {1,..., N }. In this study, the Temporal Difference (TD) method is used in the RL algorithm. In this method the value function is calculated by the following equation: V (st ) ¬ V (st ) + a n [rt +1 + gV (st +1 ) - V (st )] (5.13) a n , the learning rate, is used to control the step size of the learning process and rate of model estimation modification of the value function. At the first steps of the learning process, the learning rate is usually high and changes the value function rapidly. As the agent grows up, the learning rate gradually decreases. This helps the model to converge faster [29]. The equation for learning rate, as developed by Even-Dar and Mansour (2003) is [41]: a = 1 Ny (5.14) y is a parameter that ranges between 0.50 and 1.0. Abdalla (2007) used y = 0.845 in his model and in this study is taken to be equal to 0.8 [4]. 64 5.2.7 Scenarios of Stochastic Parameters W : Number of scenarios of random variables, where {w Î 1,..., W} . I kw,t : Inflow of reservoir k at time -step t for scenario w in cms. Pt w : Market price at time step t for scenario w in $/MWh. e~ : Vector of random events that influence the operation of the multi-reservoir system, { } where e~ = I w , Pw . N max : Maximum number of iterations. As mentioned earlier, water resources planning problems are inherently stochastic. The learning process is implemented by iteratively optimizing with different scenarios for the stochastic parameters. The uncertainty exists in some parameters such as inflow, load, and energy price. Abdalla (2007) considered these three parameters as the model’s stochastic parameters. A moment matching method was used to derive 10 scenarios for the learning model [4]. In this study 69 scenarios of inflow are used in CRTOM as scenarios. These scenarios are based on historical data from 1932-2000. Figures 19 and 20 show the 69 scenarios of the local natural inflows to GMS and Mica respectively. The other reservoirs and control points have similar inflow properties. Figures 21 and 22 show the market import and export price of electricity in $/MWh. One scenario of inflow and corresponding market prices is selected randomly in each iteration. The probability of choosing each scenario is the same for all scenarios and is equal to 1/69=1.45%. This number of scenarios seems adequate to cover the most possible price and inflow ranges in the Peace and Columbia rivers and the BC Hydro system. 65 7000 I n flo w (c m s) 6000 5000 4000 3000 2000 1000 0 Jul Jun M ay 1 6 -3 0 A p r 1 -1 5 A p r M ar F eb Jan D ec N ov Oct S ep 1 6 -3 1 A u g 1 -1 5 A u g Figure 19. 69 scenarios of historical data of GMS local natural inflow 3000 2500 Inflow (cms) 2000 1500 1000 500 0 Jul Jun May 16-30 Apr 1-15 Apr Mar Feb Jan Dec Nov Oct Sep 16-31 Aug 1-15 Aug Figure 20. 69 scenarios of historical data of Mica local natural inflow 140.00 Import Price ($/MWh.) 120.00 100.00 80.00 60.00 40.00 20.00 0.00 Jul Jun May 16-30 apr 1-15 Apr Mar Feb Jan Dec Nov Oct Sep 16-31 Aug 1-15 Aug Figure 21. 69 scenarios of power import price 66 140.00 Export Price ($/MWh) 120.00 100.00 80.00 60.00 40.00 20.00 0.00 Jul Jun May 16-30 apr 1-15 Apr Mar Feb Jan Dec Nov Oct Sep 16-31 Aug 1-15 Aug Figure 22. 69 scenarios of power export price Inflow in this study is the main stochastic parameter in the BC Hydro optimization model. It is influenced by the weather conditions including rainfall, snowfall accumulation and temperature patterns, which are considered to be stochastic phenomena. In addition, each inflow scenario corresponds to an energy price scenario. This is because in Pacific Northwest, electricity prices are correlated to runoff volumes in Northwest [4]. For instance in dry years, the available water stored in reservoirs is used to produce firm energy and no secondary energy is generated, the power generation in the fall will be low and the prices will tend to be high in dry years. In wet years enough water is available to produce and sell secondary energy and the energy prices will be low. The inflows and electricity prices are considered to be highly correlated stochastic parameters and are taken as stochastic parameters in the CRT-RL model. Although Abdalla (2007) considered load as another stochastic parameter but it is fixed in this study as shown in Figure 23. The number of runs that was carried out by Abdala was 180 iterations with 10 scenarios. The CRT-RL model was run for 400 iterations, but it is possible to run it for any number of iterations. 67 9000 8000 7000 Load (MWh) 6000 5000 4000 3000 2000 1000 0 Jul Jun May 16-30 Apr 1-15 Apr Mar Feb Jan Dec Nov Oct Sep 16-31 Aug 1-15 Aug Figure 23. Energy load (demand) 5.3 CRT-RL ALGORITHM The CRT-RL algorithm is executed using the following steps: 1. The state space is discretized to cover all the range between minimum and maximum storage volume. 2. The number of time steps is defined. 3. The value function is initiated at zero or to an initial guess. 4. Start a new iteration. 5. At each time step and at each state space point, a scenario of inflow and price is randomly chosen from the historical data sequence. The sequence consists of the study stages (e.g. 42 time steps for 3 years). 6. The state space and scenario vector is used by CRTOM as inputs. 7. CRTOM optimizes the problem and after determining the release and plant generation policies, it determines the maximum profit as a reward and next stage (target) storage as feedback. These amounts are then saved. 8. The value function is then calculated to update the value function tables. 9. The marginal value of water is calculated as a derivative of the value function with respect to storage in the reservoir. The amount of MVW is then updated. 68 10. The model moves to the next state space. 11. If all state space points are visited, the model moves to the next time step. 12. If the end of the time horizon was reached the model goes back to step 4 and it continues until the end of all iterations. The maximum number of iteration is chosen based on the maximum difference between two successive value functions. If this difference is small enough the algorithm will be deemed to have converged. Figure 24 shows a schematic representation of the CRT-RL model algorithm and Figure 25 shows a flow chart of the CRT-RL algorithm. 69 Figure 24. Schematic representation of CRT-RL model algorithm using function algorithm, adopted from [4] 70 Start Construct the Environment (CRTOM) Initialize the Value Function Set N=1 Set t=1 Sample Randomly an Inflow/Price Scenario Choose over the State Space points Go to the next State Point Simulate the System by CRTOM (Get the Decision for Each State) N=N+1 t=t+1 Apply TD Algorithm to Update the Value Function Are all the State Space Points met? Is Planning Horizon Reached? Check Stopping Criteria Generate Optimal Solution Stop Figure 25. The schematic of CRT-RL Algorithm, adopted from [4] 71 5.4 RESULTS The CRTOM-RL model described previously in this chapter was formulated and implemented in AMPL2. The model was run for 90 state space points, 42 stages and 400 iterations. The total number of optimization was 1,512,000 times. It could be done for more number of iterations. However, as Figure 26 shows, after 300 iterations, the model converged to the solution. The convergence criterion is defined as the maximum difference between values of the water for two successive iterations over all stages and state space points. Maximum Difference between Two Successive Values of Water 35 30 Error (%) 25 20 15 10 5 0 0 50 100 150 200 250 300 350 400 450 Number of Iterations Figure 26. The error defined as the maximum difference in the value function between two successive iterations 5.4.1 Infeasibility The CRTOM model sometimes results in infeasible solutions. This infeasibility depends on the scenarios that were randomly selected, the state space points and the stages. First, the model was designed in such a way that infeasible solutions were omitted from the learning process. Therefore the value of water function was not updated when the model 2 Modeling Language for Mathematical Programming 72 achieved an infeasible result. The results showed that in most time-steps the numbers of infeasible results are small (less than 5%). However, there are large numbers of infeasible solutions in May, June and July amounting to 30-80% of the runs in these months. Investigation of this issue and inspecting the results on infeasible solutions and considering Figures 17 and 18, it can be realized that the main source of infeasibility in these months are the flood control curves and the maximum spill limits on the Arrow reservoir, Trail and the US border to comply with the Columbia River Treaty rules and conditions. These limits impose tight constraints on Mica and Arrow maximum storage volumes during the freshet period and in early summer. Even in some scenarios, the state space points of Mica storage are above the maximum allowable storage volume dictated by the flood control curves. Generally, the model becomes infeasible at the maximum state space points for Mica and minimum state space points for GMS in these months. To avoid infeasibility, a penalty cost function has been added to penalize violation from the maximum and minimum reservoirs volume limits. This means that the model is allowed to go beyond the reservoir storage limits (Equations 5.2 and 5.3) at a cost. Violation from maximum storage limits means spill and violation from the minimum limit means purchasing more energy and both of them will entail a cost for the system. The amount of penalty cost is assumed to be equal to 1.2 of the maximum marginal value of water in the system which is 3500 $/cms-d or about 180 $/MWh. Equations (5.2) and (5.3) have therefore were changed to: Min Max Max S kMin ,t - dS k ,t £ S k ,t £ S k ,t + dS k ,t "k ¹ {Mica, Arrow}, t Min Max Max S kMin ,t - dS k ,t £ S k ,t £ min( S k ,t , fcc k, t ) + dS k ,t "k = {Mica, Arrow}, t (5.15) (5.16) Max Where dS kMin are model variables for violation from the minimum and the ,t and dS k ,t maximum reservoir volume limits for plant k in time step t. The objective function was also changed as follows: Maximize: 73 å [(Spot ht h =1 USt , h ] ) K [( ] ) [( K ) Max + Spot ABt ,h ´ H t ,h ´ Pt ,h + å skN,t +1 - skN,t-+11 ´ MVWkN,t -1 - å dS kMin ´ 3500 ,t + dS k ,t k =1 ] k =1 (5.17) Therefore the model decides about the time and amount of violation from the storage limits. Since violation entails incurring some costs for the system and reduces the profit, the model tries to minimize the reservoir volume violations unless it is forced to do this due to infeasibility. Having these penalty terms in the model resulted in feasible runs at all the time. 5.4.2 Results of the Value of Water The results of the model are presented in several diagrams of the value of water in storage in different time steps. Figures 27-33 show the 3-D diagram of value of water in storage for different time-steps. The storage level of the two reservoirs, GMS and MCA are presented in the horizontal axis while the vertical axis represents the value of water in storage. 1-15 August 16-31 August 245000 450000 125000 470000 350000 Williston (GM S) Storage (cm s -d) 400000 250000 300000 150000 185000 200000 75000 0 Kinbas ket (Mica) Storage (cm s -d) 1,250 1,000 750 500 250 0 245000 185000 Williston (GM S) Storage (cm s -d) 450000 500 250 1,500 125000 470000 750 1,750 350000 1,000 2,000 400000 1,250 2,250 250000 1,500 2,500 300000 1,750 2,750 150000 2,000 3,000 200000 2,250 125000 Value of Water (M$) 2,500 75000 2,750 3,000-3,250 2,750-3,000 2,500-2,750 2,250-2,500 2,000-2,250 1,750-2,000 1,500-1,750 1,250-1,500 1,000-1,250 750-1,000 500-750 250-500 0-250 3,250 125000 3,000 Value of Water (M$) 3,000-3,250 2,750-3,000 2,500-2,750 2,250-2,500 2,000-2,250 1,750-2,000 1,500-1,750 1,250-1,500 1,000-1,250 750-1,000 500-750 250-500 0-250 3,250 Kinbask et (M ica) Storage (cm s-d) Figure 27. 3-D diagrams of value of water in storage (1st-15th of August and 16th-31st of August) 74 "t September October 285000 245000 450000 Willis ton (GMS) Storage (cm s-d) 125000 470000 350000 400000 250000 300000 150000 245000 165000 200000 125000 75000 205000 Kinbasket (Mica) Storage (cm s-d) 185000 450000 500 250 0 Williston (GMS) Storage (cms-d) 125000 470000 1,000 750 350000 1,250 400000 1,500 250000 1,750 300000 2,000 150000 2,250 2,750 2,500 2,250 2,000 1,750 1,500 1,250 1,000 750 500 250 0 75000 Value of Water (M$) 2,500 200000 2,750 125000 3,000 3,000-3,250 2,750-3,000 2,500-2,750 2,250-2,500 2,000-2,250 1,750-2,000 1,500-1,750 1,250-1,500 1,000-1,250 750-1,000 500-750 250-500 0-250 3,250 3,000 Valu e o f Water (M $) 3,000-3,250 2,750-3,000 2,500-2,750 2,250-2,500 2,000-2,250 1,750-2,000 1,500-1,750 1,250-1,500 1,000-1,250 750-1,000 500-750 250-500 0-250 3,250 Kinbasket (Mica) Storage (cm s-d) Figure 28. 3-D diagrams of value of water in storage (September and October) December November 3,000-3,250 2,750-3,000 2,500-2,750 2,250-2,500 2,000-2,250 1,750-2,000 1,500-1,750 1,250-1,500 1,000-1,250 750-1,000 500-750 250-500 0-250 3,000 2,750 2,500 2,250 2,000 1,750 1,500 1,250 1,000 750 500 250 0 470000 400000 Willis ton (GM S) Storage (cm s -d) 125000 450000 300000 350000 Kinbasket (Mica) Storage (cm s-d) 200000 125000 185000 250000 75000 245000 125000 470000 Williston (GMS) Storage (cm s-d) 450000 350000 400000 250000 300000 200000 150000 125000 245000 185000 150000 2,250 2,000 1,750 1,500 1,250 1,000 750 500 250 0 75000 Value of Water (M$) 2,750 2,500 3,250 V a lu e o f W a te r (M $ ) 3,250 3,000 3,000-3,250 2,750-3,000 2,500-2,750 2,250-2,500 2,000-2,250 1,750-2,000 1,500-1,750 1,250-1,500 1,000-1,250 750-1,000 500-750 250-500 0-25 0 Kinbas k e t (M ica) Storage (cm s -d) Figure 29. 3-D diagrams of value of water in storage (November and December) 3,000 2,750 2,500 2,250 2,000 1,750 1,500 1,250 1,000 750 500 250 0 125000 470000 400000 Willis ton (GMS) Storage (cm s-d) 450000 350000 250000 150000 200000 75000 245000 185000 300000 Kinbasket (Mica) Storage (cm s-d) 3,000-3,250 2,750-3,000 2,500-2,750 2,250-2,500 2,000-2,250 1,750-2,000 1,500-1,750 1,250-1,500 1,000-1,250 750-1,000 500-750 250-500 0-250 3,250 125000 125000 470000 400000 Williston (GMS) Storage (cms -d) 450000 300000 350000 200000 250000 125000 150000 245000 185000 February Value of Water (M$) 3,000-3,250 2,750-3,000 2,500-2,750 2,250-2,500 2,000-2,250 1,750-2,000 1,500-1,750 1,250-1,500 1,000-1,250 750-1,000 500-750 250-500 0-250 3,250 3,000 2,750 2,500 2,250 2,000 1,750 1,500 1,250 1,000 750 500 250 0 75000 Value of Water (M$) January Kinbas k et (Mica) Storage (cm s -d) Figure 30. 3-D diagrams of value of water in storage (January and February) 75 1-15 April March 3,000 450000 125000 Williston (GMS) Storage (cms-d) 470000 350000 400000 250000 150000 300000 Kinbas k e t (Mica) Storage (cm s -d) 185000 200000 75000 245000 125000 450000 2,750 2,500 2,250 2,000 1,750 1,500 1,250 1,000 750 500 250 0 125000 470000 350000 400000 250000 300000 150000 200000 75000 185000 Willis ton (GMS) Storage (cm s -d) 3,000-3,250 2,750-3,000 2,500-2,750 2,250-2,500 2,000-2,250 1,750-2,000 1,500-1,750 1,250-1,500 1,000-1,250 750-1,000 500-750 250-500 0-250 3,250 245000 125000 Value of Water (M$) 3,000 2,750 2,500 2,250 2,000 1,750 1,500 1,250 1,000 750 500 250 0 Value of Water (M$) 3,000-3,250 2,750-3,000 2,500-2,750 2,250-2,500 2,000-2,250 1,750-2,000 1,500-1,750 1,250-1,500 1,000-1,250 750-1,000 500-750 250-500 0-250 3,250 Kinbasket (Mica) Storage (cms-d) Figure 31. 3-D diagrams of value of water in storage (March and 1st-15th of April) 16-30 April May 3,250 3,000 2,750 2,250 2,000 1,750 1,500 1,250 1,000 750 3,000 2,750 2,500 2,250 2,000 1,750 1,500 1,250 1,000 750 285000 500 245000 125000 470000 400000 Willis ton (GMS) Stor age (cm s-d) 450000 300000 350000 Kinbas k e t (M ica) Storage (cm s -d) 165000 250000 470000 205000 75000 125000 450000 400000 300000 350000 200000 250000 150000 75000 125000 245000 0 165000 Willis ton (GM S) Storage (cm s -d) 285000 250 205000 150000 0 200000 250 125000 500 Kinbask et (M ica) Storage (cm s-d) Figure 32. 3-D diagrams of value of water in storage (16th-30th April and May) 3,000-3,250 2,750-3,000 2,500-2,750 2,250-2,500 2,000-2,250 1,750-2,000 1,500-1,750 1,250-1,500 1,000-1,250 750-1,000 500-750 250-500 0-250 3,250 3,000 2,750 2,500 2,250 2,000 1,750 1,500 1,250 1,000 750 July 3,000-3,250 2,750-3,000 2,500-2,750 2,250-2,500 2,000-2,250 1,750-2,000 1,500-1,750 1,250-1,500 1,000-1,250 750-1,000 500-750 250-500 0-250 3,250 3,000 2,750 2,500 Va lue o f W at er (M$) June 2,250 2,000 1,750 1,500 1,250 1,000 750 285000 250 245000 0 205000 125000 47 000 0 Williston (GMS) Storage (cms-d) 450 000 4 00 000 3 500 00 165000 30 00 00 Kinbask et (Mica) Storage (cm s-d) 500 200 00 0 125000 470000 Williston (GMS) Storage (cms -d) 450000 350000 400000 250000 165000 300000 150000 200000 125000 205000 25 000 0 0 1 250 00 245000 1 50 000 285000 250 7 50 00 500 75000 Value of Water (M$) Value of Water (M$) 2,500 3,000-3,250 2,750-3,000 2,500-2,750 2,250-2,500 2,000-2,250 1,750-2,000 1,500-1,750 1,250-1,500 1,000-1,250 750-1,000 500-750 250-500 0-250 3,250 Value of Water (M$) 3,000-3,250 2,750-3,000 2,500-2,750 2,250-2,500 2,000-2,250 1,750-2,000 1,500-1,750 1,250-1,500 1,000-1,250 750-1,000 500-750 250-500 0-250 Kinbasket (Mica) Storage (cms-d) Figure 33. 3-D diagrams of value of water in storage (June and July) 76 From these diagrams it can be seen that the value function depends on the storage level in both Mica and GMS reservoirs. This dependence, however, varies depending on the month of the year as it is lower in some months such as in the April 16th to 31st and in May (Figures 32) as it can be seen that the surface function is “flatter”, thereby entailing lower marginal value of water. It can be seen that it is steeper and higher in other months such as in November and December (Figure29), and to a lesser extent in January and February. During the first five time steps, the marginal value of water, which is calculated as the slope of the water value functions, is high as can be seen from Figures 27, 28 and 29. Particularly in winter, the electricity demand is high and therefore the energy prices are high as well as shown in Figures 21, 22 and 23. So power generation is profitable for the system and the model gives preference to export energy during these months, and the values of water are high during the winter months as shown in Figures 29 and 30. In the freshet period, on the other hand, the inflows are high and the market prices and loads are low as shown in Figures 19-23. So the marginal value of energy would be low as can be seen from the slope of value of water functions in Figures 31, 32 and 33. In some years the model is forced to spill, and even must violate the maximum reservoirs limits at a cost of the penalty to achieve a feasible optimal solution, and the water value function is low in these months. The following three time-steps have been selected to represent and discuss the value of water results in more details: One in summer (1-15 August), one in winter (January) and one in the freshet period (May). Figures 34 to 39 show the value of water in 2-D diagrams with respect to Mica and GMS storage volumes for these three months respectively. 77 1-15 Aug 3,200 Value of Water (M$) 2,800 2,400 2,000 1,600 1,200 800 400 0 75000 125000 175000 225000 275000 325000 375000 425000 475000 GMS Storage (cm s-d) Mica=125000 cms-d Mica=145000 cms-d Mica=165000 cms-d Mica=185000 cms-d Mica=205000 cms-d Mica=225000 cms-d Mica=245000 cms-d Mica=265000 cms-d Mica=285000 cms-d Figure 34. 2-D diagram of GMS value of water in storage (1st to 15th August) 1-15 Aug 3,200 2,800 Value of Water (M$) 2,400 2,000 1,600 1,200 800 400 0 125000 145000 165000 185000 205000 225000 245000 265000 285000 Mica Storage (cms-d) GMS=75000 cms-d GMS=200000 cms-d GMS=350000 cms-d GMS=470000 cms-d GMS=125000 cms-d GMS=250000 cms-d GMS=400000 cms-d GMS=150000 cms-d GMS=300000 cms-d GMS=450000 cms-d Figure 35. 2-D diagram of GMS value of water in storage (1st to 15th August) Jan 3,200 2,800 Value o f W ater (M $) 2,400 2,000 1,600 1,200 800 400 0 75000 125000 175000 225000 275000 325000 375000 425000 475000 GM S Storage (cm s-d) Mica=125000 cms-d Mica=185000 cms-d Mica=245000 cms-d Mica=145000 cms-d Mica=205000 cms-d Mica=265000 cms-d Mica=165000 cms-d Mica=225000 cms-d Mica=285000 cms-d Figure 36. 2-D diagram of Mica value of water in storage (January) 78 Jan 3,200 2,800 Valu e o f Wat er (M $) 2,400 2,000 1,600 1,200 800 400 0 125000 145000 165000 185000 205000 225000 245000 265000 285000 Mica Storage (cms-d) GMS=75000 cms-d GMS=200000 cms-d GMS=350000 cms-d GMS=470000 cms-d GMS=125000 cms-d GMS=250000 cms-d GMS=400000 cms-d GMS=150000 cms-d GMS=300000 cms-d GMS=450000 cms-d Figure 37. 2-D diagram of GMS value of water in storage (January) May 3,200 2,800 Value of Water (M$) 2,400 2,000 1,600 1,200 800 400 0 75000 125000 175000 225000 275000 325000 375000 425000 475000 GMS Storage (cm s-d) Mica=125000 cms-d Mica=185000 cms-d Mica=245000 cms-d Mica=145000 cms-d Mica=205000 cms-d Mica=265000 cms-d Mica=165000 cms-d Mica=225000 cms-d Mica=285000 cms-d Figure 38. 2-D diagram of Mica value of water in storage (May) May 3,200 2,800 Value of Water (M$) 2,400 2,000 1,600 1,200 800 400 0 125000 145000 165000 185000 205000 225000 245000 265000 285000 M ica Storage (cm s -d) GMS=75000 cms-d GMS=200000 cms-d GMS=350000 cms-d GMS=470000 cms-d GMS=125000 cms-d GMS=250000 cms-d GMS=400000 cms-d GMS=150000 cms-d GMS=300000 cms-d GMS=450000 cms-d Figure 39. 2-D diagram of GMS value of water in storage (May) 79 The 2-D diagrams are plotted to show the value function for different GMS and Mica reservoir storage volumes. Figures 34 and 37 show that the value of water function is steep in 1-15 August and January. However the slope of the value function decreases as the storage in Mica and GMS increases. This is more obvious in May (Figures 38 and 39), which shows that the curves are almost flat at high Mica and GMS reservoir storage levels. This is because of the effect of the available storage volumes on the value of water in the reservoirs. Generally as the reservoir level increase the marginal value, or the unit increase in storage value, of the water decreases, and the results of the CRT-RL model clearly demonstrates this. As mentioned earlier, the marginal value of water is derivative of the value of water function. It is also called the shadow price or the Lagrange multiplier of the mass balance equation for a reservoir (Equation 5.1) [4], in $/cms-day. For a better representation, it is better to convert the marginal value of water to the marginal value of energy in $/MWh to compare it to the electricity price in the market. An example of these results, the marginal value of energy for Mica and GMS with respect to the reservoir storages for 1-15 August, January and May are plotted in 2-D diagrams as shown in Figures 40-45. Figure 40 shows that the GMS marginal value of energy in 1-15 August starts from 130 $/MWh and it decreases as the GMS storage increases. At the maximum GMS storage levels (450,000 and 470,000 cms-d) the GMS marginal value of energy ranges between 90-100 $/MWh. 80 1-15 Aug Marginal Value of Energy ($/MWh) 180 160 140 120 100 80 60 40 20 0 75000 125000 175000 225000 275000 325000 375000 425000 475000 GM S Storage (cm s-d) Mica=125000 cms-d Mica=185000 cms-d Mica=245000 cms-d Mica=145000 cms-d Mica=205000 cms-d Mica=265000 cms-d Mica=165000 cms-d Mica=225000 cms-d Mica=285000 cms-d Figure 40. 2-D diagram of the marginal value of energy for GMS (1-15 August) Figure 41 shows Mica’s marginal value of energy in $/MWh for 1-15 August for different GMS reservoir storages. It can be seen that the maximum Mica marginal value of energy is between 150-165 $/MWh when its reservoir storage is at the minimum level (125,000 cms-d). As the Mica reservoir storage increases the marginal value of energy decreases but it does not go less than 115 $/MWh even when the Mica and GMS reservoir storages are at their maximum levels. 1-15 Aug Marginal Value of Energy ($/MWh) 180 160 140 120 100 80 60 40 20 0 125000 145000 165000 185000 205000 225000 245000 265000 285000 Mica Storage (cms-d) GMS=75000 cms-d GMS=200000 cms-d GMS=350000 cms-d GMS=470000 cms-d GMS=125000 cms-d GMS=250000 cms-d GMS=400000 cms-d GMS=150000 cms-d GMS=300000 cms-d GMS=450000 cms-d Figure 41. 2-D diagram of the marginal value of energy for Mica (1-15 August) Figure 42 shows that the GMS marginal value of energy in January starts at about 120 $/MWh and decreases as the GMS storage increases. At the maximum GMS storages 81 (450,000 and 470,000 cms-d) the GMS marginal value of energy decreases to about 50 $/MWh. January Marginal Value of Energy ($/MWh) 180 160 140 120 100 80 60 40 20 0 75000 125000 175000 225000 275000 325000 375000 425000 475000 GM S Storage (cm s -d) Mica=125000 cms-d Mica=185000 cms-d Mica=245000 cms-d Mica=145000 cms-d Mica=205000 cms-d Mica=265000 cms-d Mica=165000 cms-d Mica=225000 cms-d Mica=285000 cms-d Figure 42. 2-D diagram of the marginal value of energy for GMS (January) Figure 43 shows the Mica marginal value of energy in $/MWh in January for different GMS reservoir storages. It can be seen that the maximum Mica marginal value of energy is between 140-160 $/MWh when its reservoir storage is at the minimum level (125,000 cms-d). As the Mica reservoir storage increases the corresponding marginal value of energy decreases but it does not go less than 60 $/MWh even when the Mica and GMS reservoir storages are at their maximum levels. January Marginal Value of Energy ($/MWh) 180 160 140 120 100 80 60 40 20 0 125000 145000 165000 185000 205000 225000 245000 265000 285000 Mica Storage (cms-d) GMS=75000 cms-d GMS=200000 cms-d GMS=350000 cms-d GMS=470000 cms-d GMS=125000 cms-d GMS=250000 cms-d GMS=400000 cms-d GMS=150000 cms-d GMS=300000 cms-d GMS=450000 cms-d Figure 43. 2-D diagram of the marginal value of energy for Mica (January) 82 Figure 44 shows the GMS marginal value of energy in the freshet period (May) for different Mica reservoir storages. It can be seen that at this time-step, when the GMS reservoir storage volume is more than 450,000 cms-d the corresponding marginal value of energy will be at its lowest level and it even drops to almost zero. When GMS is empty the marginal value of energy is about 120 $/MWh for different Mica storages. It can be seen that the marginal value of energy is not sensitive to Mica’s storage at very low or very high GMS storage levels. May 180 Value of Energy ($/MWh) 160 140 120 100 80 60 40 20 0 75000 125000 175000 225000 275000 325000 375000 425000 475000 GMS Storage (cm s-d) Mica=125000 cms-d Mica=185000 cms-d Mica=245000 cms-d Mica=145000 cms-d Mica=205000 cms-d Mica=265000 cms-d Mica=165000 cms-d Mica=225000 cms-d Mica=285000 cms-d Figure 44. 2-D diagram of the marginal value of energy for GMS (May) Figure 45 shows the Mica marginal value of energy in the freshet period (May) for different GMS reservoir storages. As the results show, the maximum Mica marginal value of energy in this month is about 140 $/MWh for the minimum Mica storage volume. As the Mica reservoir storage increases, the marginal value of energy decreases and it decreases to almost zero when the reservoir storage is more than 245,000 cms-d. It can also be seen that when GMS is nearly full (more than 400,000 cms-d) the Mica marginal value of energy drops down to zero for storage levels of more than 225,000 cms-d, so during the freshet period, as there is high inflow and low market prices and low load, the marginal value of energy is correspondingly low. 83 May Marginal Value of Energy ($/MWh) 180 160 140 120 100 80 60 40 20 0 125000 145000 165000 185000 205000 225000 245000 265000 285000 Mica Storage (cm s -d) GMS=75000 cms-d GMS=200000 cms-d GMS=350000 cms-d GMS=470000 cms-d GMS=125000 cms-d GMS=250000 cms-d GMS=400000 cms-d GMS=150000 cms-d GMS=300000 cms-d GMS=450000 cms-d Figure 45. 2-D diagram of the marginal value of energy for Mica (May) Figures 46 shows the marginal value of energy of GMS in different time-steps and different GMS storage volumes when the Mica reservoir storage is at the medium level (205,000 cms-d). It can be seen that when GMS is empty the marginal value of energy is high for all months particularly in October, November and December. As the GMS reservoir storage level increases, the marginal value of energy decreases. This decrease is higher in March, April and May. When GMS is full the marginal value of energy drops to almost zero in the second half on April and in May. Generally speaking, during the fall and winter the marginal value of energy in GMS is high while during the freshet period it is low and the marginal value of energy for GMS decreases as the reservoir storage increases. Figures 47 shows the marginal value of energy of Mica in different time-steps and different Mica storage volumes when the GMS reservoir storage is at the medium level (300,000 cms-d). Almost the same pattern as GMS can also be seen for Mica. During fall and winter the marginal value of energy in Mica is high while during the freshet period it is low and even drops down to zero in 16th to 30th of April and May when the Mica storage is more than 245,000 cms-d. In general, the marginal value of energy for Mica decreases as the Mica reservoir storage increases. 84 160 140 75000 120 125000 150000 100 200000 80 250000 60 300000 40 400000 20 450000 350000 470000 0 Jul Jun May 16-30 Apr 1-15 Apr Mar Feb Jan Dec Nov Oct Sep 16-31 Aug 1-15 Aug GMS Marginal Value of Energy ($/MWh) 180 Figure 46. 2-D diagram of the marginal value of energy for GMS when the Mica storage is equal to205,000 cms-d Marginal Value of Energy ($/MWh) 180 160 125000 140 145000 120 165000 100 185000 205000 80 225000 60 245000 265000 40 285000 20 0 Jul Jun May 16-30 Apr 1-15 Apr Mar Feb Jan Dec Nov Oct Sep 16-31 Aug 1-15 Aug Figure 47. 2-D diagram of the marginal value of energy for Mica when the GMS storage is equal to 300,000 cms-d The differences between the marginal value of energy in Mica and GMS are: · The highest GMS marginal value of energy is mostly in August other than when GMS is empty (75,000 and 125,000 cms-d) while the highest value of energy in Mica is in October and November. · When GMS is empty its marginal value of energy is almost constant in March and April while the marginal value of energy drops down in these months for all reservoir storages in Mica. · Mica has higher marginal values of energy than that of the GMS. 85 5.4.3 The Impact of Modeling the FCC Curves and CRT Discharge Limits To evaluate the impact of the Columbia River Treaty on the system, the flood control curves for Mica and Arrow and limits on the discharges from the Arrow reservoir, Trail and the US border control points were omitted from the optimization model. To omit the FCC curves, this Equation (5.15) was enforced for all plants and Equation (5.16) was dropped from the optimization model. This frees the model from the CRT rules and constraints. The results show that the value of water is higher in the freshet period while the difference is small in other months. Figures 48 and 49 compare the value of water with respect to Mica storage when GMS storage is 300,000 cms-d with and without the flood control and discharge limits in December and 16-30 April respectively. Figure 48. Value of water in storage with and without FCC and discharge limits for GMS storage of 300,000 cms-d in December 86 V a lu e o f W ater S to rag e ($) M illio n s 3000 2750 2500 2250 2000 1750 1500 125000 145000 165000 185000 205000 225000 245000 265000 285000 Mica Storage (cms-d) with FCC and Discharge Limits without FCC and Discharge Limits Figure 49. Value of water in storage with and without FCC and discharge limits for GMS storage of 300,000 cms-d in 16-30 April In addition, the average marginal value of energy for Mica is higher by about 9 $/MWHr and the difference is highest in the freshet period by up to 34 $/MWh, particularly when the reservoir storage is high as shown in Figure 50. This is a clear indication that the CRT rule and constraints have some implication on the value of resources in the BC Hydro system. Marginal Value of Energy ($/MWh) 140 120 100 80 60 40 20 0 125000 145000 165000 185000 205000 225000 245000 265000 285000 Mica Storage (cms-d) with FCC and Discharge Limits without FCC and Discharge Limits Figure 50. Marginal value of energy with and without FCC and discharge limits for GMS storage of 300,000 cms-d in 16-30 April 87 Figure 51 shows the average difference between the GMS marginal value of energy when Mica reservoir storage is at the medium level (205,000 cms-d) with and without FCC curves and the discharge limits. This figure shows that inclusion of the FCC curves and the discharge limits in the model caused, on the average, a decrease in the GMS marginal value of energy in July, August, September and October while it increases the marginal value of energy in other months. However the difference in GMS marginal value of energy is not more than 1.6 $/MWh. 2 1.5 1 0.5 0 -0.5 -1 -1.5 Jul Jun May 16-30 Apr 1-15 Apr Mar Feb Jan Dec Nov Oct Sep 16-31 Aug 1-15 Aug Difference between GMS Marginal Value of Energy ($/MWh) With FCC-Without FCC Figure 51. The average difference between the marginal value of energy for GMS with and without FCC curves and discharge limits when the Mica storage is equal to 205,000 cms-d Figure 52 shows the average difference between Mica marginal value of energy when GMS reservoir storage is at storage level of 300,000 cms-d with and without FCC curves and the discharge limits. As this figure shows, the FCC inclusion decreases the marginal value of energy in all months. The maximum marginal value of energy decrease at Mica by 20 $/MWh in 1-15 April. 88 0 -5 -10 -15 -20 -25 Jul Jun May 16-30 Apr 1-15 Apr Mar Feb Jan Dec Nov Oct Sep 16-31 Aug 1-15 Aug Difference between Mica Marginal Value of Energy ($/MWh) With FCC-Without FCC Figure 52. The average difference between the marginal value of energy for Mica with and without FCC curves when the GMS storage is equal to 300,000 cms-d 5.4.4 Policy Results This section presents a set of generalized policy results of the CRT-RL model. These policies are expressed as a function of the reservoir storage level. Figure 53 shows the variability of load and other small hydro (non-optimized) generation, and Figure 54 shows the variability of import and export prices. 9000 8000 7000 MWh 6000 5000 Other Gener ations 4000 3000 Load 2000 1000 0 Jul Jun May 16-30 Apr 1-15 Apr Mar Feb Jan Dec Nov Oct Sep 16-31 Aug 1-15 Aug Figure 53. Load and other generations for the selected scenario 89 120.00 100.00 $/MWh 80.00 60.00 40.00 20.00 0.00 Jul Jun May 16-30 apr 1-15 Apr Mar Feb Jan Dec Nov Oct Sep 16-31 Aug 1-15 Aug Export Price Import Price Figure 54. Import and export prices for the selected scenario 5.4.4.1 Import/Exports Policies The results of the total import/export for 3 time-steps (1st-15th August, January and May) are presented in Figures 55-62. In these diagrams, positive numbers indicate exports while negative numbers indicate imports. From Figures 55 and 56, it can be seen that the system is exporting in the 1st-15th August period when reservoir storages in Mica and GMS are high. As shown in Figure 53, the load is not high in this period. Figure 54 also shows that the 1st-15th August period has an average price, therefore it is expected that the system will be in a net exporting mode when the reservoirs are full in this month. When the reservoirs are about or below average levels, the marginal values of energy are more than the market price and therefore the system prefers not to export or even sometimes has a small amount of imports. 90 1-15 Aug 3,200 2,800 2,400 Trade (MWh) 2,000 1,600 1,200 800 400 0 75000 -400 125000 175000 225000 275000 325000 375000 425000 475000 GMS Storage (cm s-d) Mica=125000 cms-d Mica=145000 cms-d Mica=165000 cms-d Mica=185000 cms-d Mica=205000 cms-d Mica=225000 cms-d Mica=245000 cms-d Mica=265000 cms-d Mica=285000 cms-d Figure 55. Import and export with respect to GMS Storage in 1-15 August periods 1-15 Aug 3,200 2,800 2,400 Trade (MWh) 2,000 1,600 1,200 800 400 0 125000 -400 145000 165000 185000 205000 225000 245000 265000 285000 Mica Storage (cm s-d) GMS=75000 cms-d GMS=200000 cms-d GMS=350000 cms-d GMS=470000 cms-d GMS=125000 cms-d GMS=250000 cms-d GMS=400000 cms-d GMS=150000 cms-d GMS=300000 cms-d GMS=450000 cms-d Figure 56. Import and export with respect to Mica Storage in 1-15 August periods Figures 57 and 58 show that in January the system is in export mode most of the times other than when the reservoir storage in Mica and GMS is low. This is due to high demands (Figure 53) and high price levels (Figure 54). Therefore when the reservoirs are empty, the marginal value of energy is higher than the market price, and the system imports energy to satisfy the load. When the reservoir storage levels are high and the marginal value of energy is lower than the market price, it becomes more profitable for the system to produce and sell energy. However, the maximum amount of export in January is lower than that in 1st-15th August period because the amount of small hydro 91 generation is lower in this month as shown in Figure 53 and the system generation production is primarily used to meet the domestic load. Jan 3,200 2,800 2,400 2,000 1,600 T rad e (M Wh ) 1,200 800 400 0 75000 -400 125000 175000 225000 275000 325000 375000 425000 475000 -800 -1,200 -1,600 -2,000 GM S Storage (cm s-d) Mica=125000 cms-d Mica=185000 cms-d Mica=245000 cms-d Mica=145000 cms-d Mica=205000 cms-d Mica=265000 cms-d Mica=165000 cms-d Mica=225000 cms-d Mica=285000 cms-d Figure 57. Import and export with respect to GMS Storage in January Jan 3,200 2,800 2,400 2,000 Trade (MWh) 1,600 1,200 800 400 0 125000 -400 145000 165000 185000 205000 225000 245000 265000 285000 -800 -1,200 -1,600 -2,000 Mica Storage (cms-d) GMS=75000 cms-d GMS=200000 cms-d GMS=350000 cms-d GMS=470000 cms-d GMS=125000 cms-d GMS=250000 cms-d GMS=400000 cms-d GMS=150000 cms-d GMS=300000 cms-d GMS=450000 cms-d Figure 58. Import and export with respect to Mica Storage in January May exhibits the lowest price as shown in Figure 54. Also the demand in this month is not high in this month and it is almost identical to the demand in 1st-15th August period. Figures 59 and 60 show that the system is exporting all the times. However, the amounts of export are not the same for all the reservoir levels. When the reservoirs all empty their marginal value of energy are high as shown in Figures 44 and 45 therefore the model 92 exports at a lower level. While for full reservoir storages, the marginal value of energy is low and it drops sometimes to zero and the system generates power and exports at full capacity to maximize rewards. It should be noted that having a full reservoir in the freshet period is not usually practical. May 3,200 2,800 Trade (MWh) 2,400 2,000 1,600 1,200 800 400 0 75000 125000 175000 225000 275000 325000 375000 425000 475000 GMS Storage (cm s-d) Mica=125000 cms-d Mica=185000 cms-d Mica=245000 cms-d Mica=145000 cms-d Mica=205000 cms-d Mica=265000 cms-d Mica=165000 cms-d Mica=225000 cms-d Mica=285000 cms-d Figure 59. Import and export with respect to GMS storage in May May 3,200 2,800 Trade (MWh) 2,400 2,000 1,600 1,200 800 400 0 125000 145000 165000 185000 205000 225000 245000 265000 285000 M ica Stor age (cm s -d) GMS=75000 cms-d GMS=200000 cms-d GMS=350000 cms-d GMS=470000 cms-d GMS=125000 cms-d GMS=250000 cms-d GMS=400000 cms-d GMS=150000 cms-d GMS=300000 cms-d GMS=450000 cms-d Figure 60. Import and export with respect to Mica storage in May 93 5.4.4.2 Diagrams of Reservoir Volumes Figure 61 shows the optimal GMS reservoir storage policy in a year when the Mica reservoir volume is equal to 205,000 cms-d. The curve legend corresponds to the initial GMS reservoir storage volume at the start of the stage. For example, the pink curve corresponds to the GMS initial reservoir storage equal to 125,000 cms-d at the start of the planning period (stage). It means that if the GMS reservoir storage at the beginning of 16th to 31st of August month is equal to 125,000 cms-d, the optimum operation is to keep it at a higher level of 136,104 cms-d. In March, on the other hand, if the GMS starting reservoir storage is 125,000 cms-d, the reservoir has to be drafted to 104,040 cms-d. It should be noted that these curves do not show the GMS trajectory, but they represent the optimum reservoir draft or fill policy. It can also be seen that the GMS reservoir has to be drafted in winter to produce electricity and the storage levels has to be kept rising in the freshet period to fill the reservoirs to be used in the future. GMS Storage Volume (cms-d) 500000 450000 400000 350000 300000 250000 200000 150000 136104 100000 104040 50000 0 Jul Jun May 16-30 Apr GMS=125000 cms-d GMS=250000 cms-d GMS=400000 cms-d 1-15 Apr Mar Feb Jan Dec Nov Oct Sep 16-31 Aug 1-15 Aug GMS=75000 cms-d GMS=200000 cms-d GMS=350000 cms-d GMS=470000 cms-d GMS=150000 cms-d GMS=300000 cms-d GMS=450000 cms-d Figure 61. GMS reservoir volume for Mica storage volume=205000 cms-d The same optimal storage policy is presented for Mica. Figure 62 shows the Mica reservoir storage in a year when the GMS reservoir volume is equal to 300,000 cms-d. For example, the light blue curve corresponds to the initial Mica reservoir storage equal to 265,000 cms-d. It means that if the Mica reservoir storage at the beginning of 94 September is equal to 265,000 cms-d then the optimum operation is to draft it to 282,581 cms-d. Similarly, if the Mica starting reservoir storage in May is 265,000 cms-d, the optimal policy would be for the reservoir to be filled to 244,958 cms-d level. It can also be seen that the Mica reservoir has to be drafted in winter to produce electricity and has to be kept rising in the freshet period to fill the reservoir to be used in the future. Mica Resevoir Volume (cms-d) 300000 282581 280000 260000 244958 240000 220000 200000 180000 160000 140000 120000 100000 Mica=145000 cms-d Mica=165000 cms-d Mica=185000 cms-d Mica=245000 cms-d Mica=205000 cms-d Mica=265000 cms-d Mica=225000 cms-d Mica=285000 cms-d Jul Jun May 16-30 Apr 1-15 Apr Mar Feb Jan Dec Nov Oct Sep 16-31 Aug 1-15 Aug Mica=125000 cms-d Figure 62. Mica reservoir volume for GMS storage volume=300000 cms-d 5.4.4.3 Diagram of the Energy Figure 63 shows the energy production level for the five plants when the initial GMS reservoir storage level is equal to 300,000 cms-d and the Mica reservoir storage is equal to 205,000 cms-d. It can be seen that the GMS contribution to power generation is higher than other plants. The amount of energy produced is high in the winter months when the load and prices are high and when the ice constraint is in effect, and it is low in the freshet period. 95 6000 Energy Generation (MW) 5000 4000 ARD REV 3000 MCA PCN GMS 2000 1000 0 Jul Jun May 16-30 Apr 1-15 Apr Mar Feb Jan Dec Nov Oct Sep 16-31 Aug 1-15 Aug Figure 63. Energy production when GMS storage volume=300000 cms-d and Mica storage volume=205000 cms-d 5.5 Summary In this chapter, the details of RL-CRT model were presented. Different components of RL model were described in chapter 4 and they were defined and explained in more details in this Chapter. The environment, CRTOM, was described and all the constraints and objective functions that were used in the optimization model were outlined. The results of running the CRT-RL model was presented and explained in detail. 96 6. SUMMARY AND CONCLUSIONS 6.1 SUMMARY Operation planning of hydropower generation systems is a well-known and complex optimization problem that has attracted many investigators. The main objective of the problem is to determine the optimal operation policies to maximize the expected value of the system resources and make the best use of available stored water in reservoirs. These decisions have to be made at each time step: the amount of water released from a reservoir and the amount of energy traded. The ideal operation of a reservoir system is to produce and sell energy when electricity prices are high and store water for future use when energy prices are low while meeting domestic load and firm import/export contracts. Another important issue is to make the optimal trade-off between present benefits, expressed as revenues from energy purchase and sale and the potential expected long-term value of resources expressed as the value of water stored in reservoirs. Therefore the objective of controlling reservoir levels is to meet current and future electricity demand and to optimize system efficiency. The majority of water resources problems deal with uncertainty in some of their input parameters. The uncertainty exists due to spatial and temporal variability, the inherent nature of a problem or an input parameter, errors in measurement, human or technological inaccuracy, and other errors in modeling because of simplification or ignorance. Considering uncertainty in a model may make it larger, nonlinear, complex and infeasible, but its solution provides decision makers with more robust and reliable results and enables them to hedge against possible future risks. In this study, the BC Hydropower generation system, including the Columbia River Treaty constraints, was considered in an optimization model with stochastic parameters. The model used a Reinforcement Learning (RL) optimization algorithm incorporating the main sources of uncertainty, which are market prices and inflows. 97 The Columbia River Treaty, signed between the US and Canada in 1964, defined different rules and constraints on the operation of the Duncan, Mica and the Arrow dams in B.C. The treaty restricts these reservoir operations through flood control curves and release limits, which are set depending on the inflow scenarios. The CRT has different water use priorities: consumption, flood control and prevention, firm energy demand, target volume of the reservoir and the secondary energy. Flood control and power generation are among the most important priorities and were considered in the optimization model in this study. Reinforcement learning (RL) is a learning method for achieving an optimal solution to stochastic problems. The RL concept can be summarized as follows: an agent takes an action and after interacting with the environment receives a feedback as a reward. These keywords are the four main components of RL that could be defined by a model. The interaction between the agent and the environment takes place in a sequence of time-steps (stages). RL is based on Dynamic Programming framework which discretizes the state space and this interaction occurs in all the possible states variables in each stage. After occurrence of agent-environment interaction, the agent receives two feedbacks: a reward defined by a reward function and the state of the next stage (t+1). The agent saves and updates the reward and little by little learns how to behave to maximize the cumulative rewards. The reward function maps each state-space pair to a single number. The RL method reduces the time and computational effort needed to solve the multi reservoir operation planning problem. In this study, a RL based model was used to determine the value of water and the marginal value of water for the BC Hydro system. In this model, the reservoir storages of GMS on the Peace River and Mica on the Columbia River were discretized to make the state space. The Columbia River Treaty Optimization Model was used as the environment. The model was formulated in AMPL and was run for 400 iterations. The results include several diagrams of the value of water and the marginal value of water in different time-steps as well as the total trade and generation for each plant in each stage. 98 6.2 CONCLUSIONS The RL model, developed by Abdalla (2007), was successfully extended to model the operation of the BC Hydro system with Columbia River Treaty constraints and it was run using 69 historical inflow scenarios. The results indicate that the model is able to solve this stochastic large-scale problem within a reasonable time and computational effort. The model provides the value of water in storage with respect to Mica and GMS reservoir storage. It also gives the results of the marginal value of water and the marginal value of energy which are considered to be of key importance in the decision making process at BC Hydro. The results provide the optimal policy for different Mica and GMS reservoir storage levels in different time steps. Having these optimal policies, the amount of water released through turbine discharge and spillage and also the amount of energy produced and traded with the US and Alberta in each time step can be determined. The main objective of this study was to evaluate the effect of the Columbia River Treaty terms and conditions on the BC Hydro generation system. The results showed that incorporating the flood control curves and discharge constraints decrease the value of water in the system. The amount of decrease is higher in the freshet period since the flood control and discharge limits have more significant impact on storage and releases decisions in that period. The marginal value of energy for Mica is lower when the Columbia River Treaty limits are incorporated in the model as shown in Figure 50. Generally, it can be concluded that incorporation of the Columbia River Treaty rules and constraints will results in added cost for the BC Hydro generation system. The CRT-RL model results show that the value of water in storage depends on the Mica and GMS reservoir storage. This dependency varies from one time-step to the next. The marginal value of water is higher during winter when the electricity load and prices are high and is lower in the freshet period when a high volume of water is flowing into reservoirs and also when the energy load and prices are low. The results also show that Mica operation is restricted during the freshet period due to CRT flood control curves. 99 The RL algorithm overcomes the curse of dimensionality problem which has been a wellknown difficulty in implementing the stochastic dynamic programming models for reservoir operations problems. This model can be applied to reservoir operation problems and can enable the decision makers to look at the effect of incorporating stochastic parameters in their policies. 6.3 FUTURE WORKS A number of enhancements and extensions are needed to make the CRT-RL more useful in operating the BC Hydro system: - The model can be further developed to consider methods of function approximations such as quadratic functions and piecewise linear functions. - The model can use online scenario generation methods such as auto-regression. - The BC Hydro inflow forecasting system can be used to provide forecast inflow data and to generate scenarios that can be used in this model. - The multi-agent RL method can be used to increase the model efficiency and decrease the time and computational effort required for algorithm convergence. - The optimization model can be linked to the BC Hydro simulation model to provide detailed simulation of the modeled system. - The CRT-RL model could be extended to include other reservoirs on the main stem of the Columbia River, in particular to include the large storage reservoirs in the US and possibly the Duncan reservoir in BC. 100 REFERENCES 1. Doege, J., P. Schiltknecht, H.J. Lüthi (2006), “Risk Management of Power Portfolios and Valuation of Flexibility“, OR Spectrum, Vol. 28, No. 2, pp. 267-287. 2. BC Hydro Website Information: www.BCHydro.com 3. Shawwash Z. (2000), “A Decision Support System for Real-Time Hydropower Scheduling in a Competitive Power Market Environment”, PhD thesis, Dept. of Civil Engineering, University of British Columbia. 4. Abdalla E. A. (2007), “A reinforcement learning algorithm for operations planning of a hydroelectric power multireservoir system”, PhD thesis, Dept. of Civil Engineering, University of British Columbia. 5. Yeh, W. (1985), “Reservoir management and operation models: A State-of-the-art Review”, Water Resources Research, 21(12), pp. 1797-1818. 6. Labadie, J. W. (2004), “Optimal Operation of Multireservoir Systems: State-of-Art Review,” Journal of Water Resources Planning and Management, ASCE, 130(2), pp. 93-111. 7. Turgeon, A. (2007), “Stochastic Optimization of Multireservoir Operation: The Optimal Reservoir Trajectory Approach” Water Resources Research, 43(5), pp. 405420. 8. Labadie, J. W. (1997), “Reservoir System Optimization Models,” Water Resources Update, 108, Universities Council on Water Resources. 101 9. Loucks, D.P. 1968, “Computer models for reservoir regulations,” Journal of the Sanitary Engineering Division, American Society Civil Engineers 94 SA4, pp. 657669. 10. Birge, J. R. (1985), “Decomposition and Partitioning Methods for Multistage Stochastic Programs,” Oper. Res., 33, pp. 989-1007. 11. Hiew, K., J. Labadie, J. Scott (1989), “Optimal Operational Analyses of the ColoradoBig Thompson Project,” in Computerized Decision Support Systems for Water Managers, J. Labadie et al., (eds.), ASCE, New York, pp. 632-646. 12. Karamouz M., MH. Houck, JW. Delleur (1992), “Optimization and Simulation of Multiple Reservoir Systems,” Journal of Water Resources Planning and Management, Volume 118, Issue 1, pp. 71-81. 13. Tang, Y. (2007), “A mixed integer-linear programming model for solving the hydroelectric unit maintenance scheduling problem”, Master thesis, University of British Columbia. 14. N.V. Arvanitidis, J. Rosing (1970), “Composite Representation of a Multireservoir Hydroelectric Power System,” IEEE Transactions on Power Apparatus and Systems PAS-89 pp. 327–335. 15. Turgeon A. (1981), “Optimal Short-Term Hydro Scheduling From the Principle of Progressive Optimality,” Water Resources Research, 17(3), pp. 481-486. 16. Archibald, T. W., K. I. M. Mckinnon, L. C. Thomas (1997), “An aggregate stochastic dynamic programming model of multiple reservoir systems,” Water Resource Research, 33(2), pp. 333-340. 102 17. Druce, D. J. (1989), “Decision Support for Short Term Export Sales From a Hydroelectric System,” In Computerized Decision Support Systems for Water Managers, J. W. Labadie, L.E. Brazil, I. Corbue, and L.E. Johanson, Eds., ASCE, pp. 490-497. 18. Druce, D. J. (1990), “Incorporating Daily Flood Control Objectives into a Monthly Stochastic Dynamic Programming Model for a Hydroelectric Complex,” Water Resources Research, 26(1), pp. 5-11. 19. Pereira, M. V. F., L. M. V. G. Pinto (1985), “Stochastic Optimization of a Multireservoir Hydroelectric System: A Decomposition Approach,” Water Resources Research, 21(6), pp. 779-792. 20. Jacobs, J. G., J. Grygier, D. Morton, G. Schulz, K. Staschus, and J. Stedinger (1995), “SOCRATES: A System for Scheduling Hydroelectric Generation Under Uncertainty,” in Models for planning under uncertainty, Vladimirou, et al. (eds.), Baltzer Publishers, Bussum, The Netherlands. 21. Kelman, J., J. R. Stedinger, L. A. Cooper, E. Hsu, and S. Yuan (1990), “Sampling Stochastic Dynamic Programming Applied to Reservoir Operation,” Water Resour. Res., 26(3), pp. 447–454. 22. Johnson SA., JR. Stedinger, K. Staschus (1991), “Heuristic operating policies for reservoir system simulation,” Water Resource Research, 27(5): pp. 673–685. 23. Nash, G. (2003), “Optimization of the Operation of a Two-Reservoir Hydropower System,” Ph.D. Thesis, Department of Civil Engineering, University of British Columbia,”. 24. Oliveira, R., D. P. Loucks (1997), “Operating Rules for Multireservoir Systems,” Water Resour. Res., 33(4), pp. 839–852. 103 25. Wardlaw R., M. Sharif (1999), “Evaluation of Genetic Algorithms for Optimal Reservoir System Operation,” Vol. 125, No. 1, pp. 25-33. 26. Ahmed J.A., A.K. Sarma (2005), “Genetic Algorithm for Optimal Operating Policy of a Multipurpose Reservoir,” Water Resources Management, Volume 19, Number 2, pp. 145-161. 27. Afshar M. H., R. Moeini (2008), “Partially and Fully Constrained Ant Algorithms for the Optimal Solution of Large Scale Reservoir Operation Problems,” Water Resources Management, Volume 22, Number 12, pp. 1835-1857. 28. Teegavarapu R.S.V., S.P. Simonovic (2002), “Optimal Operation of Reservoir Systems using Simulated Annealing,” Water Resources Management, Volume 16, Number 5, pp. 401-428. 29. Sutton, R. S., and A. G. Barto (1998), “Reinforcement Learning – An Introduction,” MIT Press, Cambridge, Massachusetts. 30. Wilson, G. (1996), “Reinforcement Learning: A New Technique for the Real-Time Optimal Control of Hydraulic Networks,” Proc. Hydroinformatics Conf., Balkema, The Netherlands. 31. Bhattacharya B., A. H. Lobbrecht, and D. P. Solomatine (2003), “Neural Networks and Reinforcement Learning in Control of Water Systems,” J. Water Resources Planning and Management, Volume 129, Issue 6, pp. 458-465. 32. Abolpour B., M. Javan, and M. Karamouz (2005), “Water allocation improvement in river basin using Adaptive Neural Fuzzy Reinforcement Learning approach,” Applied Soft Computing, Volume 7, Issue 1, pp. 265-285. 104 33. Lee, J.-H., and J. W. Labadie (2007), “Stochastic optimization of multireservoir systems via reinforcement learning,” Water Resource Research, 43, W11408, doi:10.1029/2006WR005627. 34. BC Hydro reports and presentations on Columbia River Treaty. 35. Ketchum K., The Columbia River Treaty -a short summary, a presentation provided by BC Hydro available at: http://www.rdck.bc.ca/publications/pdf/BC%20Hydro%20CRT%20pres.pdf 36. The Columbia River Treaty and Protocols and Related Documents (1964), available at: http://www.empr.gov.bc.ca/EAED/EPB/Documents/1964_treaty_and_protocol.pdf 37. http://www.lrf.org/AboutLR/ColBasinMap.html 38. Christopher M. Bishop (2006), “Pattern Recognition and Machine Learning,” Springer. 39. Gosavi, A. (2003), “Simulation-Based Optimization”, Kluwer Academic Publishers. 40. Shawwash K.Z. (2009), Columbia River Treaty Optimization Model, personal communication. 41. Eva-Dar, E. and Y. Mansour (2003), “Learning Rates for Q-Learning,” Journal of Machine Learning Research, 5, pp. 1-25. 105
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Incorporating flood control rule curves of the Columbia...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Incorporating flood control rule curves of the Columbia River hydroelectric system in a multireservoir… Shabani, Nazanin 2009
pdf
Page Metadata
Item Metadata
Title | Incorporating flood control rule curves of the Columbia River hydroelectric system in a multireservoir reinforcement learning optimization model |
Creator |
Shabani, Nazanin |
Publisher | University of British Columbia |
Date Issued | 2009 |
Description | The main objective of reservoir operation planning is to determine the amount of water released from a reservoir and the amount of energy traded in each time step to make the best use of available resources. This is done by evaluating the trade-off between the immediate and the future profit of power generation. A set of constraints have to be met in operating a reservoir system such as the continuity equations, transmission limits, generation and reservoir limits, flood control limits and load resource balance. Another important issue that needs to be addressed in these problems is uncertainty, which comes from lack of knowledge or certainty about the exact amount of an input parameter because of its spatial and temporal variability, inherent nature of a problem or parameter, errors in measurement due to human or technology inaccuracy and other errors in modeling due to simplification or ignorance. This research successfully implements a Reinforcement Learning (RL) optimization algorithm incorporating some of the operating rules and flood control constraints of the Columbia River Treaty. It considers the main sources of uncertainty in operating a large scale hydropower system: market prices and inflows by using a number of scenarios of historical data on inflow and energy prices in the learning process. The RL method reduces the time and computational effort needed to solve the operational planning problem and can be used to determine the value of water and marginal value of water for the BC Hydro system. The results suggest that the RL algorithm can incorporate a typical flood control rules for multireservoir optimization problems of this type. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-01-04 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0063141 |
URI | http://hdl.handle.net/2429/17470 |
Degree |
Master of Applied Science - MASc |
Program |
Civil Engineering |
Affiliation |
Applied Science, Faculty of Civil Engineering, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2010-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
Aggregated Source Repository | DSpace |
Download
- Media
- 24-ubc_2010_spring_shabani_nazanin.pdf [ 1.12MB ]
- Metadata
- JSON: 24-1.0063141.json
- JSON-LD: 24-1.0063141-ld.json
- RDF/XML (Pretty): 24-1.0063141-rdf.xml
- RDF/JSON: 24-1.0063141-rdf.json
- Turtle: 24-1.0063141-turtle.txt
- N-Triples: 24-1.0063141-rdf-ntriples.txt
- Original Record: 24-1.0063141-source.json
- Full Text
- 24-1.0063141-fulltext.txt
- Citation
- 24-1.0063141.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0063141/manifest