MODELING FORCED OUTAGE IN HYDROPOWER GENERATING UNITS FOR OPERATIONS PLANNING MODEL

by

Abhishek Agrawal

B.Tech., Indian Institute of Technology (BHU), Varanasi, 2015

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (CIVIL ENGINEERING)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

May 2018

© Abhishek Agrawal, 2018

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, a thesis entitled: Modeling Forced Outage in Hydropower Generating Units for Operations Planning Model, submitted by Abhishek Agrawal in partial fulfillment of the requirements for the degree of Master of Applied Science in Civil Engineering.

Examining Committee:
Professor Ziad Shawwash, Supervisor
Professor Gregory Lawrence, Supervisory Committee Member

Abstract

Unplanned outages of generating units, also known as forced outages, are a source of operational uncertainty for hydropower companies such as BC Hydro. Forced outages reduce plant availability and cause loss of system flexibility and revenues. The risk posed by forced outages is represented by the combination of their likelihood of occurrence (frequency) and the severity of each outage event (duration). Utility companies carry out energy studies, using simulation and optimization models, to incorporate different sources of uncertainty and maximize benefits in multi-purpose, multi-reservoir systems. The Department of System Optimization at BC Hydro is developing new quantitative approaches to model the uncertainty of forced outages in its operations planning models and system energy studies.
In this thesis, statistical properties of forced outage datasets are quantified, and different algorithms to generate scenarios of forced outages are developed. The statistical analysis methods and scenario generation algorithms are applied to a major hydroelectric facility in the BC Hydro system with 10 generating units, and the results are presented. Time to Failure and Time to Repair for outage events were obtained and checked for annual trends, seasonality and correlations. Outages of units were also evaluated for homogeneity. The impact of planned outages on forced outages was quantified, and suitable probability distributions were developed to represent the frequency and duration of outages. Three different scenario generation algorithms were developed using Markov and Semi-Markov based processes and Monte Carlo Simulation. It was found that a Semi-Markov based scenario generation algorithm that comprehensively accounts for the impact of planned outages on forced outages is best suited to generate scenarios of forced outages for energy studies and operations planning models.

Lay Summary

Forced (unplanned) outages of generating units in hydropower plants lead to loss of generation capacity. The uncertainty in the occurrence and duration of outages affects the results of energy planning models and causes loss of flexibility and revenues during system operations. Scenarios of forced outages can model the uncertainty in generation capacity. This thesis presents methods that can be used to comprehensively account for the uncertainty of forced outages via scenarios. Recent data on forced outages were obtained and their statistical properties were investigated. Different scenario generation methods were developed to model those properties. Tests were carried out for a BC Hydro generating station. It was found that the impact of planned outages on forced outages should be incorporated when generating future outage scenarios.
The most suitable method was identified and recommended for quantifying uncertainty due to forced outages in generating units.

Preface

The author contributed to building the database, developing the methods and analysing the results for the research presented in Chapters 2, 3 and 4. A version of Chapter 4 was published in the proceedings of the Annual Canadian Dam Association Conference - 2017, under the title “Modeling Forced Outages of Hydropower Generator Units for Reliable Dam Operation” (Agrawal et al. 2017). The author’s research supervisor, Professor Ziad Shawwash, and postdoctoral researcher Dr. Quentin Desreumaux guided the development of the statistical analysis and scenario generation framework. The data on historic outage records were collected from BC Hydro’s Unit Status Record with help from its administrator, Mr. Stan Mathews, who also provided useful insights about outage data. The research objectives were shaped by Prof. Shawwash and the Department of System Optimization at BC Hydro. Dr. Andrew Keats provided valuable feedback on the methods developed in Chapter 3. Mr. Tim Blair and Dr. Dave Bonser helped in verifying the assumptions made and the benchmarking criteria used in Chapter 4.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Dedication
Chapter 1: Introduction
  1.1 Problems due to uncertainty in unit availability
  1.2 Probabilistic analysis of forced outages
  1.3 Goals and Objectives
  1.4 Organization of Thesis
Chapter 2: Literature Review
  2.1 Database Management System for Generating Unit States
  2.2 Unit States in the BC Hydro Unit Status Record
    2.2.1 Available States
    2.2.2 Forced Outage State
    2.2.3 Maintenance Outage
    2.2.4 Planned Outage
    2.2.5 Forced Extensions
  2.3 Statistical analysis of unit states
  2.4 Reliability analysis in Forced outage modeling
    2.4.1 Basic concepts of Reliability Analysis
    2.4.2 Hazard Rate and Bath Tub Curves
    2.4.3 Probability distributions for different hazard rates
    2.4.4 Development of reliability indices for forced outages
    2.4.5 Use of Markov process in Forced Outage modeling
  2.5 Research literature on Modeling of forced outages
    2.5.1 Studies on cost of forced outage and preventive maintenance
    2.5.2 Forced Outage modeling for reserve computation
  2.6 Suitable methods for hydropower units
Chapter 3: Methodology
  3.1 Sources of Unit Unavailability
    3.1.1 Classification and reclassification of outages, forced extensions and de-rates
    3.1.2 Defining Time to Repair and Time to Fail
  3.2 Statistical Analysis
    3.2.1 Trend in Occurrence and duration of outages
    3.2.2 Seasonality of Forced Outages
    3.2.3 Independence of TTR and TTF values
    3.2.4 Impact of planned outage on forced outages
    3.2.5 Probabilistic model of outages
  3.3 Methods of sampling from forced outage distributions
    3.3.1 Monte Carlo Simulations
      3.3.1.1 State Sampling Approach
      3.3.1.2 State Duration Sampling Approach
      3.3.1.3 System State Transition Sampling Approach
  3.4 Methods for scenario generation of forced outages
    3.4.1 Markov Process-State Sampling Method
    3.4.2 Semi-Markov Process for FO-FO model
    3.4.3 Semi-Markov Process for PO-FO model
  3.5 Summary of Methods
Chapter 4: Case Study and Analysis of Results
  4.1 Gordon M. Shrum Generating Station
  4.2 Database of forced outages
    4.2.1 Preliminary Analysis
  4.3 Statistical Analysis of data
    4.3.1 Homogeneity of Units
    4.3.2 Trends in Forced Outage Data
    4.3.3 Seasonality of outages
    4.3.4 Independence of TTR and TTF
    4.3.5 Impact of Planned Outage
    4.3.6 Probabilistic Model for TTR and TTF
  4.4 Results of Scenario Generation
    4.4.1 Markov Process Model
    4.4.2 Semi-Markov Process (FO-FO) model
    4.4.3 Semi-Markov process PO-FO model
  4.5 Summary of scenario generation methods
Chapter 5: Discussion and Conclusions
  5.1 Key takeaways for building an outage database
  5.2 Conclusions from statistical analysis of data
  5.3 Discussions on Scenario Generation Methods
  5.4 Recommendations
  5.5 Future Work
Bibliography

List of Tables

Table 1: Available On-Line States
Table 2: Available Off-line States
Table 3: Forced Out States
Table 4: Maintenance Outage State
Table 5: Planned and Upgrade Outages
Table 6: Forced Extension of Outages

List of Figures

Figure 1: Average Hydropower operational status
Figure 2: Mean FOD for units with different MCR ratings
Figure 3: Mean FOD for units at different age
Figure 4: Mean FOD for units operated differently
Figure 5: Major Component contribution to hydraulic unit ICbF due to unplanned outages
Figure 6: Typical bath tub curve for component failure
Figure 7: Two-state representation of Unit Availability
Figure 8: Multi-state representation of Unit Availability
Figure 9: Trade off curve for Total cost of unit maintenance
Figure 10: Simplified chart for unit availability
Figure 11: Description of TTR and TTF
Figure 12: TTR/TTF definition considering Planned Outages
Figure 13: Difference between ECDF and parametric CDF
Figure 14: System State Duration Method (Billinton and Li 1994)
Figure 15: Two-State model for generating units
Figure 16: Preventing overlap of sampled FO with existing PO
Figure 17: Case of TTF period crossing PO period
Figure 18: TTR curtailment to prevent overlap with PO
Figure 19: TTR curtailment when FO exceeds PO period
Figure 20: Check for TTF value crossing a PO period
Figure 21: Accepted value of TTF and TTR
Figure 22: TTR curtailment and sampling from TTF_PF
Figure 23: Rejecting sampled TTF_FF and sampling from TTF_PF
Figure 24: Peace River System in British Columbia
Figure 25: Normalized Total Duration of Outages
Figure 26: Normalized Number of Outage Events
Figure 27: Statistical distribution of TTR of different Outages
Figure 28: ECDF curves of TTF for each unit
Figure 29: ECDF curves of TTR for each unit
Figure 30: Annual Trend in TTF
Figure 31: Annual Trend in TTR
Figure 32: Seasonality in TTR distribution
Figure 33: Seasonality in TTF distribution
Figure 34: Confidence band on TTF - winter months
Figure 35: Scatter Plot of TTR and TTF
Figure 36: Two State Unit representation
Figure 37: Impact of Planned Outage on Forced Outage
Figure 38: Impact of Planned Outages on TTR
Figure 39: Impact of Planned Outages on TTF
Figure 40: Best Fit distribution - TTR
Figure 41: Best Fit distribution - TTF
Figure 42: Best Fit distribution - TTF-FF
Figure 43: Best Fit distribution - TTF-PF
Figure 44: TTF - Markov Process
Figure 45: TTR - Markov Process
Figure 46: TTF FO-FO Model (Parametric Input Distribution)
Figure 47: TTR FO-FO Model (Parametric Input Distribution)
Figure 48: TTF FO-FO Model (Non-Parametric Distribution)
Figure 49: TTR FO-FO Model (Non-Parametric Distribution)
Figure 50: TTF FO-FO Model (PO Heuristics)
Figure 51: TTR FO-FO Model (PO Heuristics)
Figure 52: TTR from PO-FO model
Figure 53: Bias in higher TTR values
Figure 54: TTF-FF using Parametric distribution
Figure 55: TTF-PF using Parametric distribution
Figure 56: TTF-FF in PO-FO model with bias correction
Figure 57: TTF-PF in PO-FO model with bias corrections

List of Abbreviations

ABNO      Available but Not Operating
AIC       Akaike Information Criterion
BC        British Columbia
BC Hydro  British Columbia Hydro and Power Authority
BCUC      British Columbia Utilities Commission
CEA       Canadian Electrical Association
DAUFOP    Derating Adjusted Utilization Forced Outage Probability
ECDF      Empirical Cumulative Distribution Function
ERIS      Equipment Reliability Information System
ETA       Event Tree Analysis
FD        Forced De-rate
FO        Forced Outage
FOD       Forced Outage Duration
FOR       Forced Outage Rate
FTA       Fault Tree Analysis
GADS      Generating Availability Data System
GMS       Gordon Merritt Shrum
ICbF      Incapability Factor
LOLE      Loss of Load Expectation
LOLP      Loss of Load Probability
MC        Markov Chain
MCR       Maximum Continuous Rating
MCS       Monte Carlo Simulation
MLE       Maximum Likelihood Estimation
MTTR      Mean Time to Repair
MO        Maintenance Outage
MTTF      Mean Time to Fail
NERC      North American Electric Reliability Corporation
PDF       Probability Distribution Function
PO        Planned Outage
RBD       Reliability Block Diagram
SD        Scheduled De-rates
SNL       Speed No Load
TTF       Time to Fail
TTR       Time to Repair
US        United States
USACE     United States Army Corps of Engineers
USR       Unit Status Record
VaR       Value at Risk
WECC      Western Electricity Coordinating Council

Acknowledgements

The research presented in this thesis was supported and funded by grants provided to Dr. Ziad K. Shawwash by BC Hydro and the Natural Sciences and Engineering Research Council of Canada (NSERC; CRDPJ 476296 - 14).

I would like to express my sincere gratitude to my supervisor Prof. Shawwash for his continuous support, patience, mentorship and inspiration throughout my graduate studies and research. He offered me the opportunity to get involved in the hydropower industry and gain valuable experience. I am also grateful to Dr. Gregory A. Lawrence for his support and guidance during my program at UBC and for reviewing this thesis.

I would like to express my immense gratitude to Dr. Quentin Desreumaux for providing invaluable technical guidance during my research. His critique and inputs were instrumental in achieving many of my research objectives. I would also like to thank the BC Hydro operations planning engineers: Mr. Tim Blair, Dr. Dave Bonser, Dr. Andrew Keats and Mr. Amr Ayad. Without their passionate participation and regular feedback, the application of the methods would not have been successful.

I offer my heartfelt gratitude to Jonathan van Groll, Mehretab Tadesse, Luis Galindo and Erica Kennedy for making the Masters program so enjoyable. I would also like to thank my friend Ms. Neha Kothari for her irreplaceable assistance and encouragement during my stay in Vancouver.

Special thanks are owed to my parents and other family members, who have supported me throughout my years of education, both morally and financially. Last but not least, I am grateful to my landlords D.J. and Hari Singh for their love and affection.
Dedication

To my parents, Rajkumar and Laxmi

Chapter 1: Introduction

Utility companies around the world use robust models for risk management and optimization of generating resources in order to participate in competitive electricity markets. System optimization studies are essential for finding a set of operating conditions for multi-purpose, multi-reservoir systems that maximizes net benefits while satisfying all system-wide constraints. BC Hydro, one of the largest utility companies in Canada, manages a multi-purpose, multi-reservoir system for the generation and supply of power. It operates 30 hydroelectric facilities and 2 thermal plants that provide power to its 4 million customers (BC Hydro 2016). Energy studies are executed every month over a 5-year planning horizon to forecast an optimal set of reservoir storage targets and generating station operations under forecasts of market, inflow and weather conditions. These studies also provide forecasts of energy imports and exports, corporate financials and price signals that are used as a decision support tool for hydro and thermal plant operations on a system scale relative to markets.

There is a significant level of data uncertainty in the decisions made during operations planning. In order to make informed decisions in such a context, it is fundamental to properly model the nature and consequences of the uncertainty involved (Barroso and Conejo 2006). Most studies related to planning and optimization of reservoir operations involve managing many types of uncertainty, such as reservoir inflows, electricity demand and market prices. In addition to these natural or market-driven uncertainties, there are operational uncertainties pertaining to the system as a whole, such as the available system capacity. The system capacity is a function of individual unit availabilities. The availability of generating units is affected by the occurrence and duration of planned and unplanned outage events.
The unavailability of a generating unit can be caused by outages that occur suddenly or that are scheduled ahead of time. Scheduled outages, or ‘planned outages’ (PO), are generally treated as deterministic because there is negligible uncertainty in their duration and timing. They are known to system modelers in advance and are therefore accounted for in energy planning studies. However, the complete shutdown of units or a sudden decrease in available generating capacity can also occur due to various unforeseen conditions. Unscheduled shutdowns are called ‘forced outages’ (FO), and temporary reductions of maximum generating capacity are called ‘forced de-rates’ (FD). A generating unit may be forced out, or de-rated, for reasons related to the generator, turbine, water conduit, exciter, transformer, circuit breakers and auxiliary parts, among others (NERC 2015).

1.1 Problems due to uncertainty in unit availability

Regulators in North America and Europe require utility companies to keep contingency reserves to deal with uncertainty in system capacity caused by unplanned outages. These reserves are defined as the capability above the firm system demand required to compensate for outages and errors in load forecasting, and to balance system voltage and frequency. Reserves can be computed as a percentage of the installed capacity, or as the sum of the largest generating unit capacity plus some constant value of the installed capacity (Cepin 2011). The North American Electric Reliability Corporation (NERC) divides the geographical areas under its jurisdiction into regions and sub-regions. These sub-regions must comply with the reserve requirements mandated by NERC.
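The two generic reserve-sizing rules mentioned above can be illustrated with a minimal sketch. The function names, the unit capacities and the 5% fraction are hypothetical, and reading "some constant value of the installed capacity" as a fixed fraction of installed capacity is an assumption made only for this illustration.

```python
def reserve_percent_of_capacity(unit_capacities_mw, fraction):
    """Reserve sized as a fixed fraction of total installed capacity."""
    return fraction * sum(unit_capacities_mw)


def reserve_largest_unit_plus_margin(unit_capacities_mw, fraction):
    """Reserve sized as the largest unit plus a fraction of installed capacity.

    Assumption: the "constant value" in the text is interpreted here as a
    fixed fraction of the installed capacity.
    """
    return max(unit_capacities_mw) + fraction * sum(unit_capacities_mw)


# Hypothetical four-unit plant (MW); values are illustrative only.
units_mw = [100.0, 100.0, 150.0, 250.0]
rule_a = reserve_percent_of_capacity(units_mw, 0.05)
rule_b = reserve_largest_unit_plus_margin(units_mw, 0.05)
print(f"Percent-of-capacity rule: {rule_a:.1f} MW")
print(f"Largest-unit-plus-margin rule: {rule_b:.1f} MW")
```

Both rules are deterministic: neither uses information about how often units actually fail or how long repairs take, which is the gap a statistical representation of outages is meant to address.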
For entities within the western interconnection, such as British Columbia, the reserves are based on a proposal by the Western Electricity Coordinating Council (WECC), which states that the minimum amount of contingency reserve should be the greater of the most severe single contingency, or the sum of 3% of the balancing area load and 3% of the balancing area generation (Milligan et al. 2010). Since forced outages are uncertain events, a statistical representation of outages becomes necessary. There is a cost to keeping conservative operating reserves, because that capacity could otherwise have been allocated to power generation (Bornak 2013).

Beyond the issue of reserve capacity, a correct representation of FO is necessary because it has been established that the net worth of the energy produced by the system is affected by the frequency of outages (failure rates) and by the duration of outages (repair rates) (Ryan et al. 1990). Parrish (2015) discussed methods to quantify the costs of outages at US Army Corps of Engineers (USACE) hydropower facilities. The report argued that the economic impact of unit outages is a function of the plant’s operating strategy, water availability, regional generation mix, and regional electricity demand. Changes in operating strategy due to outages lead to loss of revenues. For example, consider forward contracts, which producers often use to reduce financial risk in energy markets. If a producer has insufficient capacity at the time of delivery because of unit unavailability, it may be forced to buy energy from spot markets to honour its contract, and can incur financial losses as a result (Barroso et al. 2006; Das et al. 2012). Water availability affects the cost of outages because the water level in the reservoir is seasonal, and thus the financial cost of outages is also seasonal. Outages can lead to wasteful spill, which becomes a cost of the outage.
Regional energy demand is important because when electricity demand increases in a region, more expensive generating units are used. This increases the value of the generation lost due to a unit outage during certain periods. Lastly, the regional generation mix matters because, if fossil fuels are used to replace the generation lost to a hydropower unit outage, emission costs may be incurred. It is, therefore, safe to conclude that omitting unplanned outages affects all operating objectives to varying degrees. These adverse impacts demonstrate why it is important to account for uncertainty in unit availability in operations planning studies.

1.2 Probabilistic analysis of forced outages

Understanding the underlying failure processes and predicting when equipment might fail is challenging, since outages can depend on several parameters such as the age of the unit, periodic maintenance, load on the unit, start-stops, and component wear and tear. The relevance of each of these factors is almost impossible to quantify (Braglia et al. 2012). Two methods of analysis of forced outages can be used: deterministic and probabilistic. Deterministic analysis would imply perfect knowledge of future forced outages. Its weakness for failure events is that it will invariably over-predict or under-predict the consequences of events. Probabilistic analysis overcomes this challenge by providing an appropriate mathematical framework for characterising uncertainty and making informed decisions. Probability-based analysis of risks was originally developed for application in the military and aerospace industries. It was then applied to the nuclear industry, using a reliability analysis framework to prevent catastrophic failures. Reliability analysis studies the probability that a system remains in the operating state without failure. However, this concept can be modified and applied to repairable systems such as the generating units of hydropower plants.
If a modeler plans for a severe failure event that is highly unlikely to occur, resources are wasted during regular operations. On the other hand, if the modeler ignores an event because it is less severe, but that event occurs rather frequently, then the system will have insufficient reliability. A probabilistic assessment of generating unit outages can consider both the operational impact of outage events on system operation and the probability of their occurrence (Billinton and Li 1994). A combination of both severity of event and likelihood of occurrence truly represents the risks posed by forced outages. Access to long-term historic unit outage data, a representative probabilistic description of forced outages and improved computational power enable the use of advanced computational algorithms, such as Markov Chain processes and Monte Carlo Simulation, to generate scenarios of forced outages that are statistically similar to the historic forced outage values. These scenarios of forced outages, combined with the planned outage data, can significantly improve operations planning of generating facilities.

1.3 Goals and Objectives

The aim of this research is to identify suitable methods for modeling forced outages of hydropower generating units and to develop scenario generation methods for system optimization studies of the BC Hydro system. To achieve these research objectives, the following tasks were undertaken:
1) Study the literature on modeling of forced outages to find methods relevant for hydropower generating units.
2) Build a database from historical outage records and perform statistical analysis of the outage data to understand the key statistical properties of forced outages that need to be modeled.
3) Develop and test a probabilistic model to represent outages.
4) Develop and test different scenario generation methods to create potential outage scenarios, benchmark the results against historic values, and select the most suitable model.
1.4 Organization of Thesis

This thesis is divided into five chapters. This first chapter gives a brief background of the problem and the rationale for the research objectives. In the second chapter, a literature survey is carried out to analyse different modeling approaches for forced outages and identify the most appropriate modeling approach for hydropower units. That chapter also includes a description of the standard unit status record database, important definitions related to forced outage studies, and the reliability indices used for benchmarking scenarios. The third chapter describes the methodology adopted for this research and is divided into two Sections. The first Section describes the development of the database, followed by the statistical analysis that determines the properties of outages that have to be modeled. At the end of this Section, a probabilistic model for the occurrence and duration of outages is developed. The second Section describes the scenario generation algorithms that were developed using Markov Chain processes and Monte Carlo Simulation. Chapter four describes the case study, taking a BC Hydro generating station as an example. The results of the statistical analysis and the scenario generation algorithms are presented. Chapter five summarises the main conclusions and provides recommendations for future work.

Chapter 2: Literature Review

Modeling forced outages involves statistical analysis of data and application of suitable reliability methods. The purpose of this chapter is to provide background information on both topics and to summarize relevant research work on stochastic modeling of outages. Starting with a description of the database system for generating units, all operating states of a hydropower generating unit are described. Past work on statistical analysis of hydropower unit outages is presented to show how database systems have been used to obtain specific information.
The next part delves into reliability concepts as applied to forced outage studies. It includes basic concepts of reliability, hazard rate, development of reliability indices and application of Markov processes to model forced outages. Some studies are presented to show how researchers have modeled forced outages for different purposes. Finally, the key takeaways from this literature review are summarized to help in designing appropriate methods for a stochastic representation of forced outages for the BC Hydro system.

2.1 Database Management System for Generating Unit States

In system operations, data is collected primarily for two purposes: for the assessment of past performance and for the prediction of future system performance. The two purposes are complementary. To accurately predict future performance or improve upon past performance, it is essential to transform past experience into suitable models. This makes data collection a crucial part of the whole exercise. The methodology of future predictions and the data collection evolve together in an iterative process. The database has to be comprehensive enough to reflect the needs of the predictive methodology; at the same time, it is important for utility companies not to collect unnecessary data, which can lead to computation of irrelevant statistics (Billinton and Li 1994). To standardize reporting procedures and realize a practical database, all major electric power utilities in Canada and the USA have developed a uniform data collection and analysis system. In Canada, comprehensive generating unit outage databases are maintained by the Canadian Electricity Association (CEA), and in the USA by the North American Electric Reliability Corporation (NERC). The CEA's Equipment Reliability Information System (ERIS) and NERC's Generating Availability Data System (GADS) contain a wealth of important information.
GADS maintains complete operating histories on more than 7,700 generating units, representing over 90 percent of the installed generating capacity in the United States and Canada (NERC 2015). The objective of this reporting system is the identification of the state of a generating unit at each hour throughout the year. ERIS started collecting data in 1977. It is structured into three basic components: the Generation Equipment Status Reporting System, the Transmission Equipment Outage Reporting System and the Distribution Equipment Outage Reporting System (Billinton 2001). For the purposes of this thesis, only the generation equipment status is relevant, since it deals with generating unit availability.

2.2 Unit States in the BC Hydro Unit Status Record

BC Hydro has maintained the status of each unit since 1977 in its Unit Status Recording (USR) database, in line with CEA and NERC guidelines. This database holds all operational status changes (state codes), accurate to a one-minute interval, for all of BC Hydro's generating units. Originally, this status record was only needed to satisfy the Canadian Electricity Association's (CEA) reporting requirements. Today, the record and the statistics available from it are also required for a growing number of internal and external applications, including British Columbia Utilities Commission (BCUC) rate hearings, monthly executive meetings, BC Hydro capital planning, and various other internal and external reporting purposes (Unit Status Recording Primary Reference, 2009). The USR is the record that identifies a change in unit status. There are 30 state codes to cover various operating scenarios. Broadly, these states are: Available, Forced Outage, Maintenance Outage, Planned Outage, and Forced Extensions of planned and maintenance outages. The following sub-Sections describe the major operating states and tabulate the different state codes under which a unit can be classified in the USR.
2.2.1 Available States

Available States refers to the operating and non-operating states of a generating unit when it is not in an outage state. The gross maximum electrical output (in megawatts) which a generating unit has been designed for and/or shown by acceptance testing to be capable of producing continuously is called the Maximum Continuous Rating (MCR). Units can generate energy at full MCR, at reduced MCR (derated), or run in synchronous condense (SC) mode. A derating is a reduction of generating unit capacity by more than 2% of its MCR resulting from a component failure or other condition. A derating may be forced or scheduled. If some condition requires that the generating unit be derated at once, or as soon as possible up to and including the very next weekend, it is called a forced derating. A reduction in MCR resulting from a planned outage of a piece of equipment is called a scheduled derating. The synchronous condense (SC) state refers to the case when a hydropower unit is synchronized with the system and operated as a motor, spinning freely in air and drawing power from the grid to provide reactive power/voltage support. Tables 1 to 6 summarize the different unit states as classified by CEA (Canadian Electricity Association 2016); Tables 1 and 2 cover the available states.

1) Available On-Line States:

Table 1: Available On-Line States
State | Name | Definition
11 | Generating | The generating unit is spinning and synchronized with the system and is capable of operating at MCR under normal operating procedures. (Breaker Closed)
11-1 | Condensing (turbine coupled), Hydro only |
11-2 | Speed-No-Load (SNL) Breaker Closed |
12 | Generating under Forced Derating | The generating unit is spinning and synchronized with the system but not capable of carrying its MCR due to a Forced Derating being in effect. (Breaker Closed)
12-1 | Condensing under Forced Derating, Hydro only |
12-2 | SNL Breaker Closed under Forced Derating |
13 | Generating under Scheduled Derating | The generating unit is spinning and synchronized with the system but not capable of carrying its MCR due to a Scheduled Derating being in effect. (Breaker Closed)
13-1 | Condensing under Scheduled Derating, Hydro only |
13-2 | SNL Breaker Closed under Scheduled Derating |

2) Available Off-line States: This represents the set of cases similar to the available on-line states, with the only difference that the unit is not electrically connected to the transmission system for economic reasons.

Table 2: Available Off-line States
State | Name | Definition
14 | Available but not Operating (ABNO) | The generating unit can carry its MCR but is not being operated to supply system load. The unit may be spinning with the breaker open.
14-1 | Condensing (Turbine Un-coupled), Thermal only |
14-2 | SNL Breaker Open, Hydro only |
15 | ABNO under Forced Derating | The generating unit can deliver only part of its MCR due to a forced derating but is not being operated to supply system load.
15-1 | Condensing under Forced Derating, Thermal only |
15-2 | SNL Breaker Open under Forced Derating, Hydro only |
16 | ABNO under Scheduled Derating | The generating unit can deliver only part of its MCR due to a scheduled derating but is not being operated to supply system load.
16-1 | Condensing under Scheduled Derating, Thermal only |
16-2 | SNL Breaker Open under Scheduled Derating, Hydro only |

2.2.2 Forced Outage State

For the purpose of systematic record keeping, a forced outage is defined as the occurrence of a component failure or other condition which requires that the generating unit be removed from service immediately or up to and including the very next weekend. The sub-categories of forced outages are: Sudden Forced Outage (Unit Trip), Immediately Deferrable Forced Outage, Deferrable Forced Outage, and Starting Forced Outage. These are described in more detail in Table 3 below.
Table 3: Forced Out States
State | Name | Description
21-1 | Sudden Forced Outage. Unit Trip | The occurrence of a component failure or other condition which results in the unit being automatically or manually tripped.
21-2 | Immediately Deferrable Forced Outage | The occurrence of a component failure or other condition which requires that the unit be removed from service within 10 minutes.
21-3 | Deferrable Forced Outage | The occurrence of a component failure or other condition which requires that the unit be removed from service from 10 minutes up to and including the very next weekend.
21-4 | Starting Failure Outage | The unsuccessful attempt to bring a unit from a shutdown state to synchronism with the electric system within a specified time interval.

Unit outages can occur for several reasons. The ERIS database system mandates the assignment of a 5-digit component cause code that identifies the specific component, or part thereof, that caused the unit outage. Many of these causes are organized together under major groupings. These major groupings, with examples, are shown below (Canadian Electricity Association, 2016):
a) Buildings and Structures: draft tubes, channels, tunnels, sluice gates, penstock, etc.
b) Power Generation Facilities (Turbines): runner, hub, blades, bearing, wicket gate, etc.
c) Power Generation Facilities (Generators): stator, core iron, slip rings, commutator
d) Electrical Power System: bus duct, cable, switching equipment, transformers
e) Instrumentation and Controls: governor, excitation equipment, power output system
f) Plant Auxiliary Processes and Services: fire protection, water depressing system
g) External Conditions: storms, floods, fire, staff shortage, transmission line outage

2.2.3 Maintenance Outage

A Maintenance Outage (MO) is defined as an outage that can be deferred beyond the end of the next weekend.
An MO can occur at any time during the year, has a flexible start date, may or may not have a predetermined duration, and is usually much shorter than a planned outage. The generating unit is generally removed from service to perform work on specific components. This work is done to prevent a potential forced outage; it can be postponed past the next weekend, but not from season to season.

Table 4: Maintenance Outage State
State | Name | Description
24 | Maintenance Outage | Event that can be deferred beyond the next weekend but not beyond the season

2.2.4 Planned Outage

A Planned Outage is the removal of a generating unit from service for the inspection and/or general overhaul of one or more major equipment groups (e.g. five-year turbine overhaul, annual boiler overhaul). Scheduling these planned outages for maintenance is a complicated task in itself. Constraints on resources, such as working crews, hours, and budget, have to be taken into account, in addition to assessing the impact of the scheduled outages on operations (Dalal et al. 2018). After the scheduling process is complete, the outage period is assigned well in advance and can be postponed from season to season if needed. From a unit availability point of view, planned outages are deterministic in nature, although there can be some uncertainty in the start or end dates and in the duration of the outage.

Table 5: Planned and Upgrade Outages
State | Name | Description
25 | Planned Outage | Removal of a unit from service that can be deferred beyond the next season
26 | Upgrade Outage | Removal of a generating unit from service for prolonged work to make modifications that will alter its performance beyond the original design and/or provide life extension through rehabilitation

2.2.5 Forced Extensions

Sometimes the scheduled duration of a Maintenance or Planned Outage is unexpectedly exceeded during repair work.
The time by which the originally scheduled period is exceeded is called a Forced Extension. The different types of Forced Extensions are described in Table 6.

Table 6: Forced Extension of Outages
State | Name | Description
22 | Forced Extension of a Maintenance Outage | The generating unit has an outage resulting from a condition discovered during a maintenance outage which has forced the extension of the maintenance outage.
23 | Forced Extension of a Planned Outage | Outage resulting from a condition discovered during a planned outage.
27 | Forced Extension of an Upgrade Outage | Outage resulting from a condition discovered during an upgrade outage which has forced the extension of the upgrade outage.

2.3 Statistical analysis of unit states

Data on unit states is collected for two main purposes: first as an assessment of past performance and second as a prediction of future system performance (Billinton and Li 1994). The various sub-classifications presented in the previous Section are required to obtain operating statistics on any generating unit(s). These statistics are used to make decisions both internally, for financial and operational purposes, and externally, for regulatory reasons and comparison with other electric utilities. The unit operation statistics obtained from the USR are used by various departments at BC Hydro for a range of purposes. A water licensing unit may focus on water levels and on discharge from the turbines irrespective of power generation, so it would not differentiate between on-line and off-line states as long as the turbine is releasing water. The maintenance department may use USR data to optimize the duration and timing of planned upgrades based on past statistics of planned and maintenance outages. The department of system optimization would be concerned with the availability of units, without differentiating between available-operating (AO) and available-but-not-operating (ABNO) states.
Some examples are presented to show how these databases have been used by researchers to gain insights into unit operating statistics. Figure 1 shows the average operational status of hydropower generating units reporting to NERC. The chart represents cumulative hours spent in specific states by hydropower units reporting to GADS (USA) from 2000 to 2012. The database for this analysis comprised units with different rated capacities: 258 units with a capacity below 10 MW, 520 units with capacities between 10 and 99 MW, and 126 units with capacities above 100 MW (Oak Ridge National Lab 2014).

Figure 1 Average Hydropower operational status (Hourly breakdown by unit size classes of units reporting to NERC). Source: (Oak Ridge National Lab 2014)

The authors distilled all the operating states into 7 broad categories (4 active and 3 outage states). The active states consist of unit service hours (unit is synchronized to the grid), pumping hours (turbine-generator is used as pump/motor), condensing hours (unit is spinning freely in air to provide reactive power/voltage support) and reserve shutdown hours (Available But Not Operating, ABNO). The outage states consist of Forced Outages, Planned Outages and Maintenance Outages. The sum of hours spent in the active states constitutes the total number of available hours (represented as a purple line in Figure 1). The report concluded that there is a tradeoff between planned and forced outages. For larger units, a significant portion of total outage duration is spent in the planned outage state. This may be a reason why larger units experienced the lowest number of forced outage hours in comparison to smaller units, which experienced an increasing trend in unplanned outages over the 13-year period. A forced outage of a very large unit can be very expensive; therefore, it would make sense for plant operators to invest more in regular inspections to avoid forced outages.
This is just one example of how these databases can be analyzed to answer specific questions. Another example of hydraulic unit outage analysis is found in the generation equipment status report (ERIS-CEA 2012). This report has data on operational performance and forced outages of 456 hydraulic units from 2003 to 2007. The cumulative forced outage duration for all units was 8.4 years, with a mean forced outage duration of around 46 hours per outage event. The report further defined the Incapability Factor (ICbF) as the ratio of Total Equivalent Outage Time, in hours, to the total number of hours the unit is in service, times 100. The ICbF due to unplanned outages (forced outages, forced derates and maintenance outages) for the 456 units was computed to be ~2.4%, and that due to planned outages was computed to be ~6.3%. The report analyzed forced outage information in many different ways. The charts of mean forced outage duration (FOD) under different MCR ratings, years in service and operating factors (unit loading in percentage) are shown in Figure 2 to Figure 4.

Figure 2 Mean FOD for units with different MCR ratings (x-axis: unit MCR range, MW; y-axis: mean FOD, hours). Source: Adapted from (ERIS-CEA 2012)

Figure 3 Mean FOD for units at different age (x-axis: years of service; y-axis: mean FOD, hours). Source: Adapted from (ERIS-CEA 2012)

Figure 4 Mean FOD for units operated differently (x-axis: operating factor, %; y-axis: mean FOD, hours). Source: Adapted from (ERIS-CEA 2012)

Figure 2 shows that bigger units have a relatively shorter duration per outage and are brought back into service more quickly than smaller units. Figure 3 suggests that outage impacts are higher in the first years after a new unit is installed, stabilize during the useful life of the unit, and start increasing again as the unit ages.
Figure 4 shows the mean FOD of units for different operating factors. The operating factor is the ratio of the actual output of a generating unit to its total potential output if it were operated at full capacity for the entire duration. It was observed that units with a higher operating factor undergo less severe outages. Units with a lower operating factor are operated less frequently and reported a much higher mean FOD. This may be because frequent start-stops of a unit can increase the chance of a starting failure. The report also analyzed the contribution of major components to hydraulic unit ICbF due to unplanned outages, as shown in Figure 5. Component failures related to the turbine and generator are the main cause of forced outages in hydropower units.

Figure 5 Major Component contribution to hydraulic unit ICbF due to unplanned outages: Buildings and Structures 7%, Hydro Turbine 37%, Generator 30%, Electrical Power System 11%, Instrumentation and Control 7%, Plant Aux. Processes and Services 1%, External Conditions 7%. Source: (ERIS-CEA, 2012)

The CREATE report (Simonoff et al. 2005) statistically analyzed electricity outages over the period from January 1990 to August 2004 using data obtained from NERC. The authors looked at seasonality and annual trends of several attributes, such as number of incidents, average outage duration, capacity lost, and customers affected. They used three-month averages to analyze seasonal characteristics and twelve-month averages for annual trends. Winter was defined as December through February, spring as March through May, summer as June through August, and autumn as September through November. The report found that the number of incidents was increasing annually at a rate of 8.3%. However, this increase in the number of events did not translate into a corresponding increase in outage duration.
From 1990 to 1993, outage durations were getting shorter on average, but this trend changed in the mid-1990s when the average duration started to increase, and the increase became more pronounced after 2002. Seasonal analysis suggested there were 65-85% more incidents in summer than in the other seasons. The estimated rates for winter, spring, and autumn were found to be similar to each other, with autumn having a slightly lower rate. This was presumably attributable to weather effects such as snow and ice in the winter, thunderstorms in some areas in spring and summer, and, most importantly, intense heat with corresponding air conditioner use in the summer. Autumn had lower load and fewer seasonal weather effects. However, unlike the number of incidents, the duration of outages showed no significant seasonal effect. Apart from these reports, researchers have also emphasized the need for statistical analysis of operating states to understand the characteristics of a system before selecting a modeling approach for outages. Koval and Chowdhury (1994) highlighted the importance of statistical analysis of unit operation patterns and its relevance in the assessment of unit reliability. They argued that past patterns can help in answering relevant questions such as: Are generating units behaving similarly? Are there distinct outage patterns for each of them? Are there more outages in one season than in others? What is the best probabilistic model to represent repair duration and failure interval? The main takeaway from these past works is that analysis of outages starts with the collection of relevant data on unit operating states from standard databases. A database has to be defined for the specific needs of the modeler, and statistical analysis then has to be carried out to find the properties relevant for modeling.
Once the statistical properties of outages are obtained, suitable mathematical approaches can be selected for modeling the uncertainty in forced outages. In the next Section, some of the mathematical tools found in the literature are discussed in detail.

2.4 Reliability analysis in Forced outage modeling

Reliability analysis methods have been increasingly used in academia and industry, mostly due to the availability of historic databases and low-cost computational power. In the analysis of electric power systems, Fault Tree Analysis (FTA) and Event Tree Analysis (ETA) have been widely used to quantify the major factors influencing total system reliability (Endrenyi 1979; Billinton and Allan 1992; Zhou 2015). ETA is a bottom-up analysis approach that defines all potential accident sequences/chains associated with particular initiating events. Begovic, Perkel, and Hartlein (2006) obtained reliability data of generating units and used Monte Carlo Simulation to quantify system reliability. On the other hand, FTA is a top-down failure analysis in which a system failure event is first identified and then all combinations of component failures that can cause that system failure event are analyzed. Van Casteren, Bollen, and Schmieg (2000) used FTA methods in forced outage analysis, where they identified system failures and quantified them using component outage values. A Reliability Block Diagram (RBD) is a diagrammatic method to visualize how component reliability contributes to the success or failure of complex systems. Forced outages are essentially failure events, so they have been investigated by researchers and utilities using concepts of reliability engineering (Barroso and Conejo 2006; Billinton and Allan 1992; Billinton and Ge 2004; Bornak 2013; Cepin 2011; Curley 2013; Finger 1979; Scully et al. 1992). There are two main reasons why reliability analysis has been employed for forced outage related studies.
The first is to compute the operating reserves required for reliable system operations, given that outages are uncertain events. It was realized that deterministic criteria, such as "peak load percentage" or "loss of largest unit", could not quantify the risk of supply shortages in the system and also failed to quantify the worth of the reliability added by reserves. Therefore, stochastic methods have to be explored to quantify the reliability of generating resources (Prada 1999). The second reason is to quantify the optimum cost of preventive maintenance by analyzing the trade-off between the cost of maintenance and the cost of plant unavailability due to forced outages for different cases of failure and repair rates. To better understand the research work in the literature, some of the basic concepts and traditionally used reliability indices are discussed below to provide the required background in reliability analysis of power systems.

2.4.1 Basic concepts of Reliability Analysis

The classical index to measure reliability has been the probability of not failing. Reliability, as the characteristic of an item, is often defined as the probability that it will perform a required function under stated conditions for a stated period of time. Simply put, it is the probability of the system staying in the operating state without failure. Mathematically, reliability can be expressed as:

$$R(t) = P(T > t) \qquad \text{(Equation 1)}$$

where T is a non-negative random variable denoting the failure time, and t is the designated period of time under the given operating conditions.
If f(u) is the probability density function (PDF) of the failure time T, the reliability function can be calculated as:

$$R(t) = \int_{t}^{\infty} f(u)\,du \qquad \text{(Equation 2)}$$

The reliability function R(t) can also be defined as the complement of the cumulative distribution function (CDF) of the failure time, F(t), corresponding to f(u):

$$R(t) = 1 - F(t) = 1 - \int_{0}^{t} f(u)\,du \qquad \text{(Equation 3)}$$

The reliability function is therefore the complementary cumulative distribution of the failure time of a component. Failure time, also referred to as Time to Failure (TTF), is the duration for which the component remained in the in-service/available state before going out of service. For a given time t, F(t) gives the probability that the component has failed by that time, and R(t) = 1 - F(t) gives the probability of exceedance, that is, of the component failing after that time. Since generating units are continuously operated systems that can tolerate failure, a slightly modified term, called availability, is used. Availability is defined as the characteristic of an item expressed by the probability that it will perform a required function under stated conditions at a stated moment of time (Billinton and Allan 1992). This refers to the probability of finding the system in the operating state at some time t in the future. The facility is operative at time t either because it has not failed up to time t, which has probability R(t), or because it has been functioning properly since the last repair, which occurred at time u, where 0 < u < t. Hence, the instantaneous availability is expressed as:

$$A(t) = R(t) + \int_{0}^{t} R(t-u)\,f(u)\,du \qquad \text{(Equation 4)}$$

Since the instantaneous availability is not easy to evaluate, the steady state availability is more commonly used (Zhou, 2015).
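Equations 1 and 3 can be illustrated with an empirical estimate of the reliability function. The sketch below is only an illustration: the function names and the TTF values are hypothetical and are not taken from the thesis data. It estimates R(t) as the fraction of observed times to failure that exceed t, and F(t) as its complement.

```python
def empirical_reliability(ttf_samples, t):
    """Estimate R(t) = P(T > t) as the share of TTF samples exceeding t."""
    if not ttf_samples:
        raise ValueError("at least one TTF sample is required")
    return sum(1 for x in ttf_samples if x > t) / len(ttf_samples)


def empirical_failure_cdf(ttf_samples, t):
    """Estimate F(t) = 1 - R(t), the probability of failure by time t."""
    return 1.0 - empirical_reliability(ttf_samples, t)


# Hypothetical TTF observations in hours (illustrative values only).
ttf_hours = [120.0, 340.0, 55.0, 800.0, 410.0, 95.0, 1500.0, 230.0]

for t in (100.0, 400.0, 1000.0):
    r = empirical_reliability(ttf_hours, t)
    f = empirical_failure_cdf(ttf_hours, t)
    print(f"t = {t:6.0f} h  R(t) = {r:.3f}  F(t) = {f:.3f}")
```

With a real outage record, the sample list would be replaced by the observed in-service durations between consecutive forced outages.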
The steady state availability gives the long-term operational performance of a repairable system and is mathematically defined as the limit of the instantaneous availability function as time approaches infinity:

$$\lim_{t \to \infty} A(t) = \frac{\sum_{i=1}^{N} TTF_i}{\sum_{i=1}^{N} \left(TTF_i + TTR_i\right)} \qquad \text{(Equation 5)}$$

where (TTF_i, TTR_i) are the alternating sequences of time to failure and time to repair being simulated, and N is the number of samples. The estimated steady state availability can be used to calculate the forced outage rate of generating units (Billinton and Allan 1992).

2.4.2 Hazard Rate and Bath Tub Curves

The hazard function h(t) is another way to describe the reliability of components. For a small time interval dt, h(t)dt is the probability that a component that has survived until time t will fail in the next interval dt (Endrenyi 1979). Mathematically, the hazard function is described as:

$$h(t) = \lim_{dt \to 0} \frac{1}{dt}\, P\left[t < T < t + dt \mid T > t\right] \qquad \text{(Equation 6)}$$

where T is the lifetime of the component, after which it fails. Endrenyi also showed that h(t) can be expressed as:

$$h(t) = -\frac{d}{dt} \ln\left[R(t)\right] \qquad \text{(Equation 7)}$$

or,

$$R(t) = e^{-\int_{0}^{t} h(u)\,du} \qquad \text{(Equation 8)}$$

Hazard functions can increase, decrease or remain constant. An increasing hazard function over the lifetime of the unit indicates that components become more prone to failure as they age. A decreasing hazard rate indicates a continuous reduction in the chance of imminent failure as time passes. A constant hazard rate applies to components for which the chance of failure in the next interval dt is the same at any time t. In reliability analysis, electrical/mechanical components are treated as either repairable or non-repairable. The life of a non-repairable component lasts up to its first failure, because repair is uneconomical or infeasible. A constant hazard rate is often assumed to simplify the analysis of non-repairable components.
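Equation 5 lends itself to a short numerical check. The sketch below is a minimal illustration under stated assumptions, not the scenario generation method of this thesis: the function names are hypothetical, TTF and TTR are drawn from exponential distributions purely for convenience, and the forced outage rate is taken as the complement of availability in a two-state (up/down) model.

```python
import random


def steady_state_availability(ttf, ttr):
    """Equation 5: total up time divided by total up time plus repair time."""
    if len(ttf) != len(ttr) or not ttf:
        raise ValueError("ttf and ttr must be non-empty and of equal length")
    up = sum(ttf)
    down = sum(ttr)
    return up / (up + down)


def simulate_cycles(n, mean_ttf, mean_ttr, seed=0):
    """Draw n alternating (TTF, TTR) pairs.

    Assumption: both durations are exponentially distributed; any other
    fitted distribution could be substituted here.
    """
    rng = random.Random(seed)
    ttf = [rng.expovariate(1.0 / mean_ttf) for _ in range(n)]
    ttr = [rng.expovariate(1.0 / mean_ttr) for _ in range(n)]
    return ttf, ttr


# Hypothetical parameters in hours (illustrative values only).
ttf, ttr = simulate_cycles(n=20000, mean_ttf=950.0, mean_ttr=50.0, seed=1)
availability = steady_state_availability(ttf, ttr)
forced_outage_rate = 1.0 - availability  # two-state model assumption
print(f"availability = {availability:.4f}")
print(f"forced outage rate = {forced_outage_rate:.4f}")
```

For the assumed means, the estimate should approach 950 / (950 + 50) = 0.95 as the number of simulated cycles grows.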
Due to this assumption of constant hazard rate, the Poisson process has been used to describe the occurrence of failure events for non-repairable components that have an approximately constant hazard rate (Endrenyi 1979). A Poisson process describes the probability of an isolated event (for example, accidents or telephone calls) occurring a specified number of times in a given interval of time or space when the rate of occurrence (hazard rate) in a continuum of time or space is fixed (Billinton and Allan 1992). For a non-repairable system, the Poisson distribution is used to find the probability of a single arrival of a failure event. Mathematically, the reliability of a non-repairable component over a specified period t, having a constant hazard rate λ, is given by:

$R(t) = e^{-\lambda t}$ (Equation 9)

On the other hand, repairable components are those that can be brought back to service after failure events. The life histories of repairable components consist of alternating operating and repair periods. Generating units fall into the repairable category. They experience "failure" due to forced outages but can be brought back into service after maintenance and/or repair. Many repairable electrical components with preventive and corrective repairs exhibit a bath-tub hazard curve as shown in Figure 6 (Endrenyi 1979).

Figure 6: Typical bath tub curve for component failure. Source: Roy Billinton & Allan, 1992

As shown in the above figure, the hazard curve has three distinct parts, each with its own hazard rate and hence its own probability distribution for failure events. The first phase is called the burn-in phase or infant mortality, where failures are likely to occur due to design issues. Issues are detected and fixed in this initial period, so the hazard rate decreases rapidly. This phase is followed by the useful life period where the chance of failure is low and relatively constant.
In the 'old-age' phase, wear-out failures become the predominant cause of failure and result in an increasing hazard rate (Endrenyi 1979). A generating unit, like any other electrical or mechanical system, can be made to remain within its useful life period for the bulk of its installed period by constant and careful preventive and corrective maintenance. Preventive maintenance is performed in order to keep the unit in a condition that is consistent with the required levels of performance and reliability. This is achieved by regularly checking all the operating systems; cleaning, adjusting and lubricating all the components; replacing the components nearing a wear-out condition; and checking and repairing failed redundant components. Corrective maintenance is required after a forced outage, when the system malfunctions. Its purpose is to restore system operation as soon as possible after failure by replacing, repairing or adjusting the components which have caused the interruption or breakdown of the system (Roy Billinton and Allan 1992).

2.4.3 Probability distributions for different hazard rates

These different types of hazard functions are represented mathematically by selecting an appropriate model from the Weibull family of distributions. The advantage of using the Weibull family is that its shape is flexible, so a wide range of experimental data can be fitted by choosing appropriate parameters. The general form of the Weibull hazard function is:

$h(t) = \dfrac{\beta\, t^{\beta - 1}}{\alpha^{\beta}}$ (Equation 10)

where α and β are constants. If β > 1, the hazard rate increases; if β = 1, the hazard rate is constant; and if β < 1, the hazard rate decreases (Endrenyi 1979).
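The effect of the shape parameter β on Equation 10 can be checked numerically. The following is a minimal sketch, not part of the thesis method; the scale α = 1000 hours and the two evaluation times are assumed values chosen only for illustration.

```python
def weibull_hazard(t, alpha, beta):
    """Weibull hazard h(t) = beta * t**(beta - 1) / alpha**beta, for t > 0."""
    if t <= 0 or alpha <= 0 or beta <= 0:
        raise ValueError("t, alpha and beta must be positive")
    return beta * t ** (beta - 1) / alpha ** beta


# Assumed, illustrative scale parameter (hours); not taken from the thesis data.
ALPHA = 1000.0

for beta in (0.5, 1.0, 2.0):
    early = weibull_hazard(100.0, ALPHA, beta)
    late = weibull_hazard(900.0, ALPHA, beta)
    if late > early:
        trend = "increasing"
    elif late < early:
        trend = "decreasing"
    else:
        trend = "constant"
    print(f"beta={beta}: h(100)={early:.6f}, h(900)={late:.6f} ({trend})")
```

Comparing the hazard at an early and a late time is enough to see the three regimes of the bath-tub curve: decreasing for β < 1, constant for β = 1 and increasing for β > 1.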
The reliability function (i.e., the complementary cumulative distribution of failure time) for the Weibull hazard function is represented by:

$R(t) = \exp\left[-\left(\dfrac{t}{\alpha}\right)^{\beta}\right]$ (Equation 11)

The Gamma distribution has similar properties to those of the Weibull distribution, as it is a two-parameter distribution having a shape parameter β and a scale parameter α. By varying these parameters, a gamma distribution can be fitted to a wide range of experimental data. The reliability function is represented by:

$R(t) = \int_{t}^{\infty} \dfrac{u^{\beta - 1}}{\alpha^{\beta}\,\Gamma(\beta)}\, e^{-u/\alpha}\,du$ (Equation 12)

where α and β are constants and Γ(β) is the Gamma function defined as:

$\Gamma(\beta) = \int_{0}^{\infty} x^{\beta - 1} e^{-x}\,dx, \quad \beta > 0$ (Equation 13)

The exponential distribution is a special case of both the Weibull and Gamma distribution functions. When the parameter β = 1, the hazard function is constant and is given by 1/α. The resulting reliability is the same as the Poisson probability that no failure occurs up to time t, i.e., it evaluates the reliability until the first failure. The reliability function is given by:

$R(t) = e^{-t/\alpha}$ (Equation 14)

While much of the theoretical work has been done on failure time and reliability functions, the repair time is simply checked for suitable fits using trial and error. Roy Billinton and Allan (1992) state that a lognormal distribution can be a good fit to the distribution of component repair times and hence it is finding acceptance in academia for the assessment of repairable systems. The cumulative probability distribution of repair times, using the lognormal distribution, is given by:

$Q(t) = \int_{0}^{t} \dfrac{1}{u\,\sigma\sqrt{2\pi}} \exp\left[-\dfrac{(\ln u - \mu)^{2}}{2\sigma^{2}}\right] du$ (Equation 15)

where μ is the mean and σ is the standard deviation of the natural logarithm of the repair times.

2.4.4 Development of reliability indices for forced outages

Generating unit availability is affected by many factors other than forced outages.
Equipment aging, operational changes from environmental regulations and changes in the relative priority given to different water uses in multipurpose projects are all likely contributors to steady state availability factors (Oak Ridge National Lab, 2014). To quantify the impacts of forced outages on generating units, specific reliability indices have been developed by building on the basic definitions of reliability and availability. Loss of Load Probability (LOLP) is a popular reliability index that incorporates a probabilistic approach. It represents the probability of the net load on the system exceeding its available generation capacity under the assumption that the peak load of each day lasts all day (Endrenyi 1979). In this method, all possible combinations of available and unavailable units are enumerated to evaluate the net system capacity states and their associated probabilities. Any load value can be checked against the calculated system capacity to find the cumulative probability of that load not being met. This measure does not exactly stand for loss of load but rather for a deficiency of installed available capacity. A modified form of LOLP is Loss of Load Expectation (LOLE), which gives the duration of time, rather than a probability measure in percentage, for which a certain load would not be met (Cepin 2011). LOLE is used to analyze outages at the consumer's end, for example 5 hours of load shedding in 1 year. In order to accommodate de-rated states and non-operating states for peaking units, specialized indices such as the Derating Adjusted Utilization Forced Outage Probability (DAUFOP) have been developed (Wang, Ramani, and Davies 2004). This index gives the probability of a generating unit (including its de-rated states) not being available when needed (Roy Billinton and Ge 2004).
The problem with LOLP, LOLE and DAUFOP is that they do not give any indication of the frequency of occurrence or the duration for which an insufficient capacity condition is likely to exist. They can only provide a probability measure associated with every possible plant capacity resulting from individual unit outages. For example, for a plant with 2 generating units with capacities of 100 MW each, the plant capacity can be 0, 100 or 200 MW depending upon how many units are forced out. Using the above indices, such as LOLP, a probability measure can be assigned to each increment of plant capacity to quantify the chances of the load not being met. However, in a multi-reservoir system, the loading decisions on individual units are based on local inflows and the marginal cost of water in the reservoir, which are decided by system optimization studies. Simply put, LOLP and similarly derived indices provide a lumped measure of reliability and do not answer specific questions such as how many outages occurred in a particular period, or what the expected duration of each event in a year was. Hydropower companies use simulation studies and optimization models to plan and operate multi-reservoir systems. To account for the uncertainty in system capacity, scenarios of outages are required that can realistically represent the stochastic nature of the time between failures and the duration of outages as observed in historic data. Frequency and duration of outage events contain these additional physical characteristics and hence provide valuable information for the purposes of modeling (Billinton & Li, 1994). The frequency and duration method uses discrete unit states and transition probabilities to find 'when' and for 'how long' the unit will remain in a particular state (Roy Billinton and Allan 1992). The frequency of encountering a certain state is the probability of being in that state multiplied by the rate of departure from that state (Cepin 2011).
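The two-unit example above can be made concrete with a small enumeration. The sketch below builds the probability of each plant capacity state and the probability that a given load is not met. The forced-out probability of 0.05 per unit, the 150 MW load and the assumption that units fail independently are illustrative choices, not values from the thesis.

```python
from itertools import product


def capacity_distribution(capacities_mw, forced_out_prob):
    """Return {available plant capacity (MW): probability} for independent two-state units."""
    dist = {}
    # Enumerate every combination of unit states (1 = available, 0 = forced out).
    for states in product((0, 1), repeat=len(capacities_mw)):
        prob = 1.0
        capacity = 0.0
        for cap, q, up in zip(capacities_mw, forced_out_prob, states):
            prob *= (1.0 - q) if up else q
            capacity += cap if up else 0.0
        dist[capacity] = dist.get(capacity, 0.0) + prob
    return dist


def prob_load_not_met(dist, load_mw):
    """Probability that the available capacity is below the load (a LOLP-style measure)."""
    return sum(p for capacity, p in dist.items() if capacity < load_mw)


# Assumed inputs: two 100 MW units, each forced out with probability 0.05.
dist = capacity_distribution([100.0, 100.0], [0.05, 0.05])
for capacity in sorted(dist):
    print(f"{capacity:6.1f} MW: {dist[capacity]:.4f}")
print("P(capacity < 150 MW) =", round(prob_load_not_met(dist, 150.0), 4))
```

The table is a lumped measure, as noted above: it says how likely each capacity level is, but nothing about how often the plant enters that level or how long it stays there.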
To summarize, LOLP and related indices provide the probability of load exceeding available generation. However, from an operational modeling perspective, the average number of occurrences and the duration of interruptions per time period are needed. A basic Markov modeling process has been applied to generate scenarios and obtain reliability indices of frequency and duration. The application of a Markov process for assessment of reliability indices is explained in the following sub-Section.

2.4.5 Use of Markov process in Forced Outage modeling

A generating unit can run at full capacity, can run at partial capacity when it is in a de-rated state, or can have zero capacity during forced outages. A Markov process is often used to describe the process of a system changing its state. For a Markov process to be applicable, the system behavior should have two characteristics: a lack of memory and stationarity. For generating units, these criteria are generally assumed to be true (Roy Billinton and Ge 2004). Generally, in a Markov process the probability of being in one state at time step t+1 depends on the state of the system at time t, but not on the states occupied earlier (Finger 1979). This change of state is governed by transition probabilities which, in the case of outages, take the form of the failure rate (λ) and the repair rate (µ). The failure rate is the number of failures per unit available hour and the repair rate is the number of repairs per unit unavailable hour.
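A two-state Markov chain of this kind can be sketched in a few lines. The example below steps through hourly time steps and switches state with fixed per-step probabilities. Treating λ and µ as per-hour transition probabilities is an approximation that assumes both are small relative to the time step, and the numerical values are assumed for illustration only.

```python
import random


def simulate_two_state(n_steps, p_fail, p_repair, seed=None):
    """Simulate unit states (1 = available, 0 = forced out) over n_steps time steps.

    The next state depends only on the current state (lack of memory), and the
    transition probabilities do not change with time (stationarity).
    """
    rng = random.Random(seed)
    state = 1
    states = []
    for _ in range(n_steps):
        states.append(state)
        u = rng.random()
        if state == 1 and u < p_fail:
            state = 0          # failure: available -> forced out
        elif state == 0 and u < p_repair:
            state = 1          # repair: forced out -> available
    return states


# Assumed hourly values: lambda = 0.001 failures/h, mu = 0.05 repairs/h.
states = simulate_two_state(200_000, p_fail=0.001, p_repair=0.05, seed=1)
availability = sum(states) / len(states)
n_outages = sum(1 for a, b in zip(states, states[1:]) if a == 1 and b == 0)
print("simulated availability:", round(availability, 4))
print("number of outage events:", n_outages)
```

For a long enough run, the simulated availability should approach p_repair / (p_fail + p_repair), the stationary probability of the available state for this chain, and the count of 1-to-0 transitions gives the outage frequency over the simulated horizon.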
Depending on the research question, either a two-state model (Figure 7) or a multi-state discrete capacity model (Figure 8) is used.

Figure 7: Two-state representation of Unit Availability

Figure 8: Multi-state representation of Unit Availability

Multi-state discrete models have been used to describe the uncertainty in generation for distributed systems having wind and solar components, as the system capacity is affected by sudden changes in weather as well as by individual component failures (Yan-fu Li and Zio 2012; Yan Li, Cui, and Lin 2017). In the case of hydropower or thermal generating units, a multi-state approach can be used to model different de-rated states. However, it is more common to convert the duration of a de-rated state into an equivalent full forced outage (Roy Billinton and Ge 2004). The application of the Markov method requires the computation of transition probabilities in terms of failure and repair rates. The standard method of quantifying the transition probabilities for generating units was discussed by Endrenyi (1979), who defined the availability of any power system component undergoing normal repair and preventive maintenance. Endrenyi defined the duration of outage events as Time to Repair (TTR) and the duration from the start of unit operation to unit outage as Time to Fail (TTF). Endrenyi further showed that if the rate of occurrence of outages is constant, then over a long time horizon the probability density function of TTF is given by an exponential distribution. The parameter that characterizes this exponential distribution is the failure rate λ or Forced Outage Rate (FOR). FOR is defined as the total number of outages divided by the total duration of time the unit remains in an operating state. FOR is the inverse of Mean Time to Failure (MTTF).
Hence,

$\lambda = \dfrac{1}{MTTF}$ (Equation 16)

The probability density function of TTF is then given by the expression:

$f(t, \lambda) = \lambda\, e^{-\lambda t}$ (Equation 17)

Assuming a constant repair rate µ, given by the inverse of Mean Time to Repair (MTTR):

$\mu = \dfrac{1}{MTTR}$ (Equation 18)

the probability density function of TTR becomes:

$f(t, \mu) = \mu\, e^{-\mu t}$ (Equation 19)

The use of the exponential distribution makes the process a homogeneous Markov process. A Markov process is called homogeneous if the transition probability from t_n to t_{n+1} is independent of n. Mathematically this can be written as:

$P[X_{n+1} = j \mid X_n = i] = P[X_1 = j \mid X_0 = i]$ (Equation 20)

The use of an exponential distribution enables analytical solutions for a homogeneous Markov process and so it has been used widely in power system reliability applications. There have been other good reasons in the past to assume an exponential distribution for values of TTF. Limitations in the availability and quality of data made it difficult to verify an exact distribution. The parameter of the exponential distribution is the failure rate, which is simply the inverse of the mean time to failure. Most importantly, the assumption of constant failure rates, and hence of an exponential distribution, simplified complex problems in power system reliability (Roy Billinton and Allan 1992). A Markov process can be used to find the state of the unit in every time step of a time series. An aggregation of these states creates a realistic scenario of outage events. Hence the required reliability indices, such as frequency and duration of outage, can be computed from the generated data. The generated data can be used as input to energy planning and simulation models, the details of which are provided in the methodology Section. It is important to mention here that the use of an exponential distribution is not an inherent limitation and the Markov process techniques shown above apply equally well to any other distribution that would fit time to failure values.
The essential difference is that the integration for some distributions is rather complex or even impossible analytically, and additional numerical integration techniques must be used. This can make the evaluation tedious or too difficult for hand calculations, and computer solutions are then required. Barroso & Conejo (2006) and Hall & Ringlee (1968) used exponential distributions as described above to model both TTF and TTR. The use of the exponential distribution is specifically attributed to the fact that it can make complex models very tractable. However, some researchers have argued against the use of the exponential distribution and proposed the Weibull distribution, as it can be bell-shaped and provides more modeling flexibility (Anderson and Davison 2005; Van Casteren, Bollen, and Schmieg 2000). An additional difficulty in using the exponential distribution arises from the use of MTTF as its parameter. Ideally, TTF should be computed from the moment the unit begins to operate to the moment it fails (Roy Billinton and Allan 1992). This definition subtracts the periods of planned outages and reserve shutdown states from TTF durations. However, when the scenario generation algorithm is to be run, the modeler has no information on how the unit would be loaded. Forced outage scenarios are supposed to be an input to simulation models, and unit commitment is an output of those models. Hence, defining TTF from the start of unit loading to the beginning of a forced outage is not useful for obtaining a sequence of outages that is supposed to be an input for simulation studies. Also, the assumption of an exponential distribution for Time to Repair has not been found to be realistic (Roy Billinton and Allan 1992). Due to unavailability of parts or manpower, certain repairs may take a very long time, thereby making the lognormal distribution a more suitable choice (Anderson & Davison, 2005).
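When TTF and TTR are not exponential, a simple way to obtain a sequence of outages numerically is to draw the two durations in alternation. The sketch below uses a Weibull distribution for TTF and a lognormal distribution for TTR, following the distributions discussed above. The function name, the parameter values and the truncation of the last outage at the horizon are assumptions made for illustration, not the algorithm developed in this thesis.

```python
import random


def generate_outage_scenario(horizon_h, ttf_scale, ttf_shape, ttr_mu, ttr_sigma, seed=None):
    """Return a list of (start_hour, end_hour) forced outages within the horizon.

    TTF is drawn from a Weibull distribution (scale, shape) and TTR from a
    lognormal distribution (mean and standard deviation of ln(TTR)).
    """
    rng = random.Random(seed)
    t = 0.0
    outages = []
    while True:
        t += rng.weibullvariate(ttf_scale, ttf_shape)   # time to fail
        if t >= horizon_h:
            break
        ttr = rng.lognormvariate(ttr_mu, ttr_sigma)      # time to repair
        end = min(t + ttr, horizon_h)                    # truncate at the horizon
        outages.append((t, end))
        t = end
    return outages


# Assumed parameters: one year of hours, TTF scale 1500 h with shape 0.8,
# ln(TTR) with mean 2.0 and standard deviation 1.0.
scenario = generate_outage_scenario(8760.0, 1500.0, 0.8, 2.0, 1.0, seed=42)
forced_out_hours = sum(end - start for start, end in scenario)
print("number of outages:", len(scenario))
print("forced outage hours:", round(forced_out_hours, 1))
```

Each call with a different seed gives a different scenario, so frequency and duration statistics can be collected across many runs and compared with the historic record.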
2.5 Research literature on Modeling of forced outages

Reliability analysis and scenario generation of forced outages for generating units have been carried out in industry and academia for two major purposes. One is to quantify the cost of preventive maintenance by analyzing the tradeoff between the cost of maintenance and the cost of plant unavailability due to forced outages at different levels of failure and repair rates (Begovic, Perkel, and Hartlein 2006; Binder et al. 1991; Das and Wollenberg 2012; Parrish 2015; Prada 1999; S. Ryan and Mazumdar 1990). The other is to compute the operating reserves required to meet the load based on reliability criteria specified by regulators (Roy Billinton & Ge, 2004; Boomsma, Kristoffersen & Denmark, 2008; Bornak, 2013; Scully et al., 1992). Some past works are reviewed in the following Sections.

2.5.1 Studies on cost of forced outage and preventive maintenance

Binder et al. (1991) present top-down methods of analysis for predicting unit availability via different case studies of electric utility firms. The authors explain that in bottom-up analyses, retrofits are identified and the most cost-effective repairs are conducted sequentially until the funds are exhausted. On the other hand, in top-down analyses, historical trends of unit availabilities along with their overall spending are used for benchmarking a unit's performance. In top-down analyses, different maintenance scenarios are examined to obtain a trade-off curve between availability and cost of maintenance, as shown in Figure 9.

Figure 9: Trade-off curve for total cost of unit maintenance (Binder et al. 1991)

In order to increase system availability, the cost of maintenance has to increase. The optimum is the point which minimizes the sum of the cost of unavailability and the cost of maintenance. In the report, the authors reviewed the methods used by utility firms to quantify the cost of unavailability.
In one case study for Houston Lighting and Power Company (HL&P), they looked at impacts of upgrade spending, unit aging and plant load on unit availability. They developed a forced outage model based on the GADS database to compute duration of forced outages every year. They used regression analysis and the key explanatory variables for annual forced outage hours were found to be annual service hours, service hours per start and number of starts. In another case study, Southern Company Services used records of 85 fossil fuel units to find the factors affecting availability, using multiple regression on their database. They found that forced outage rates of the current year were impacted by the previous year's forced outages, the current year's planned outages and the current year's spending on maintenance, among others.

NERC (1992) conducted a study on coal-fired thermal plants using planned and forced outage data from 1982-1988. It concluded that there are increased chances of forced outage in the week after a long duration planned outage due to boiler tube leaks and turbine vibrations. The study showed that utilities can reduce impacts of forced outages after long planned outages by looking at historical records of specific component failures. While the results may not hold for hydro turbine units, the study emphasized that it is important to understand the behavior of forced outages following planned outages.

Das and Wollenberg (2012) investigated the financial risk in the day-ahead market associated with forced outages and how that risk varied with change in bids and location of generators. To compute the Value at Risk (VaR), the authors looked at a random outage scenario after the generators have been scheduled. The work focused on computing costs of outage for low bids, high bids and normalized bids to help in investment decisions. Outages were simulated just by considering the probability of failure of a generator every hour, in a 24-hour period.
2.5.2 Forced Outage modeling for reserve computation

Finger (1979) developed a simple two-state Markov process to model forced outages and used Loss of Load Probability (LOLP) to quantify the reliability of an electric power system given load demand and generator operating characteristics. The power output of the plant was modeled as a Markov chain with mean TTR and mean TTF as parameters. The generated plant output and load demand were analyzed to derive an equivalent demand curve that shows the load demand not being met (LOLP) for different possible cases of failure rates.

Begovic, Perkel, and Hartlein (2006) used individual component failure probabilities to obtain system reliability. Historical data was used to obtain the parameters of a Weibull distribution for the performance of the overall system. Monte Carlo Simulation was used to randomly sample and synthesize failure events. The authors state that using historic data to quantify the parameters of the distribution is a more robust approach and that Monte Carlo simulation aids in extracting confidence intervals around the forecasted parameters.

Roy Billinton and Ge (2004) described the IEEE proposed four-state model and modified it to make it more realistic based on historic data and modeling using transition probabilities. They state that the basic two-state model, with one state being available and the other being forced out, is good for base load units but not for units used for peaking loads. Peaking units have long shutdown times and thus simple two-state models give unreasonably high forced outage rates. A four-state representation of unit characteristics overcomes this limitation by using the states: in service, reserve shutdown, forced out when needed, and forced out but not needed. The authors argue that the IEEE four-state model does not capture transitions from the reserve shutdown state to the forced out but not needed state.
They argue that unit operating characteristics should be analyzed from historic data for computation of transition rates, and they proposed a modified four-state model to capture all possible transitions. A Markov process was used to obtain the probability of failure. DAUFOP was used as a reliability measure because it provided practical and realistic unit reliability indicators for peaking units. Rondla (2012) similarly developed two-state and four-state models to differentiate forced outage rates in base load and peaking units.

Scully et al. (1992) used a semi-guided Monte Carlo simulation method to obtain scenarios of forced outages of generating units which are 'statistically balanced' at the system level. A two-unit system was considered with a given failure and repair rate. Since there are two units in the system, the system capacity can be 100%, 50% or 0% depending on how many units are out. The authors first considered a simple two-state Markov process to find the unit state at a daily time step. Next, they developed a model to guide the selection of forced outage periods such that the outage periods are statistically reasonable from a 'system' standpoint. The difference between this method and a simple Markov process is that, before scheduling a forced outage in a period, the program checks the total capacity on outage for each day in the period. If too much outage capacity is scheduled on any day of the period, the draw is considered biased. That particular random draw of outage duration is discarded and the program loops back to obtain a new outage period. The authors claimed to obtain statistically balanced forced outage schedules and faster convergence of the Monte Carlo iterations required to produce reliable results. They further showed the use of this method to quantify the benefits of reducing forced outage rates at the system level.
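The accept-or-redraw idea described above can be illustrated with a simplified sketch. It is not a reproduction of the Scully et al. program, whose details are not given here. The daily time step, the geometric-like draws built from exponential samples, the fixed cap on outage capacity per day (max_out_mw) and the retry limit are all assumptions introduced for this example.

```python
import random


def schedule_outages(unit_caps, n_days, fail_rate, repair_rate, max_out_mw,
                     seed=None, max_tries=100):
    """Schedule daily forced outages, redrawing any outage period that would
    push the total capacity on outage above max_out_mw on some day.

    Returns (schedule, out_mw), where schedule[u] is a list of
    (start_day, end_day) periods with end_day exclusive, and out_mw[d] is the
    total capacity on forced outage on day d.
    """
    rng = random.Random(seed)
    out_mw = [0.0] * n_days
    schedule = {u: [] for u in range(len(unit_caps))}
    for u, cap in enumerate(unit_caps):
        day = 0
        while True:
            day += 1 + int(rng.expovariate(fail_rate))        # days until next failure
            if day >= n_days:
                break
            accepted_end = None
            for _ in range(max_tries):
                length = 1 + int(rng.expovariate(repair_rate))  # outage length in days
                end = min(day + length, n_days)
                # Check the capacity already on outage for each day of the period.
                if all(out_mw[d] + cap <= max_out_mw for d in range(day, end)):
                    accepted_end = end
                    break                                       # draw accepted
            if accepted_end is None:
                continue                                        # no acceptable draw found
            for d in range(day, accepted_end):
                out_mw[d] += cap
            schedule[u].append((day, accepted_end))
            day = accepted_end
    return schedule, out_mw


# Assumed inputs: two 100 MW units, one year, at most 100 MW on outage per day.
schedule, out_mw = schedule_outages([100.0, 100.0], 365, 0.02, 0.2, 100.0, seed=7)
print("outages per unit:", {u: len(p) for u, p in schedule.items()})
print("max capacity on outage in any day:", max(out_mw))
```

With the cap set to the capacity of one unit, the two units are never scheduled out on the same day. Raising the cap to 200 MW removes the guidance and reduces the procedure to independent draws per unit.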
Van Casteren, Bollen, and Schmieg (2000) used concepts of Fault Tree Analysis (FTA) to derive system state probabilities using component state durations and transition probabilities. They further compared two reliability assessment methods: a homogeneous Markov model using the exponential distribution and a Semi-Markov model using Weibull distributions. They argued that the use of the Weibull distribution is more realistic and can be made mathematically tractable using a Semi-Markov process. A Markov process moves from one time step to another and finds the state of the system in every time step. In a Semi-Markov model, the duration of time for which the system remains in a given state is also a random variable. So instead of moving from one time step to another, the Semi-Markov process moves from one state to another. The authors show that using a homogeneous model for reliability-based calculations overestimates the interruption costs by overestimating the fraction of longer outages. They claim that the Semi-Markov method alleviates most of the shortcomings of the homogeneous Markov method.

Boomsma, Kristoffersen, and Krogh (2008) developed a prototype model for estimation of the reserve power needed to cover forced outages of power plants and deviations from forecasts of wind and solar power generation. While modeling outages, they considered planned outages to be deterministic and forced outages to be stochastic processes in a simple two-state model. Instead of using a Markov process and generating scenarios using the exponential distribution, the authors used a Weibull distribution for TTF and TTR values. They applied a Semi-Markov process to obtain a sequence of TTR and TTF values for a whole year. The scenario generation algorithm had an exclusion rule to prevent overlap of a generated forced outage with an existing planned outage. An index called Forced Outage Rate (FOR) was defined as the ratio of forced outage hours to the sum of forced outage hours and available hours.
A generated scenario was accepted when the FOR for the generated scenario was close to the FOR calculated from historic data. In his master's thesis, Bornak (2013) used methods similar to those of Boomsma, Kristoffersen, and Krogh (2008) for generating forced outage scenarios to calculate reserves. A Weibull distribution was used for TTR and TTF values. Semi-Markov methods were used with Monte Carlo Simulation to generate forced outages. However, in this case the author did not use any heuristic exclusion rule to prevent overlap of planned and forced outages. He applied this method to obtain outages for generating units in the South African power system, which were then used to analyze the implementation of reserves. Barroso and Conejo (2006) also described a scenario generation algorithm consisting of successive sampling from TTF and TTR distributions until the end of the time horizon to obtain an outage scenario.

2.6 Suitable methods for hydropower units

Based on past works, it is evident that modeling of forced outages should start with the collection of the required data on unit operations. Statistical analysis of outage data is important to understand the properties of outages that need to be modelled. This analysis should answer questions about the choice of distribution for TTR and TTF. It should also provide information about trend, seasonality and independence of outage events, since application of a Markov/Semi-Markov process requires events to be stationary and memoryless. Frequency and duration are appropriate reliability indices to compare historic outage data with generated outage scenarios. Monte Carlo Simulation is being used in the industry to generate scenarios of outages. In this thesis, BC Hydro's generating units are considered base load units and the units undergo periodic planned maintenance. To the best of the author's knowledge, the impact of planned outages on forced outages for hydropower units has not been statistically investigated.
At best, heuristic methods have been applied to prevent overlap of forced and planned outages in scenario generation (Bornak 2013). Appropriate statistical tests should be carried out to quantify the impacts of planned outages on forced outages. In the literature, either a simple two-state model has been used for base load units or a four-state model has been used for peaking units. Planned outages are scheduled well in advance and are known to the modeler during scenario generation of outages. Two-state models move from one forced outage to another and are not designed to account for planned outages. It appears that past researchers subsumed the planned outage state within the "available" state or did not consider it at all. Four-state models for peaking units are not suitable for modeling the impact of planned outages because the "reserve shut down" state is not equivalent to a periodic "planned outage". Moreover, BC Hydro's optimization models aim to load the available units in an order that maximizes revenue based on the marginal cost of water and market prices for electricity. The forced outage scenarios that would be generated would be an input to these optimization models at a stage when the loading on units is unknown. Hence, states like "reserve shut down" and "Forced Out Not Needed" have little relevance in the context of simulation and optimization studies. This problem necessitates modification of the two-state model to account for any impact of planned outages on forced outages. In the next chapter, the methods used for statistical analysis of the database are described. Based on the statistical analysis of data, an appropriate algorithm to generate scenarios of forced outages is developed. A detailed description of Markov processes, Semi-Markov processes and Monte Carlo Simulation is provided. A two-state base case model for scenario generation is developed, which is extended to include the impact of planned outages.
The results of statistical analysis and scenario generation methods are presented in the case study Section.

Chapter 3: Methodology

This chapter presents the method used to analyze the forced outage dataset and outlines the rationale behind the scenario generation algorithm developed and used in this research. The first Section describes available generating unit state data and its classification and reclassification to obtain two important outputs: Time to Fail (TTF) and Time to Repair (TTR) values. The second Section explains the methods used in statistical analysis of the TTR and TTF data. Finally, the third Section presents the scenario generation algorithm developed using Markov and Semi-Markov processes.

3.1 Sources of Unit Unavailability

Statistical analysis of outage data and the development of probabilistic models for scenario generation require the creation of a database. The standardized data recording system for various operating states of generating units was discussed in Section 2.2. A number of unit unavailability states can be identified, including:
• forced outages,
• maintenance outages,
• planned outages,
• upgrade outages,
• forced and scheduled de-rates, and
• forced extensions of maintenance and planned outages.

For all other operating states, the unit may or may not be generating power but is still available when needed. Since the analysis is only concerned with failure of generating units, the various classifications of available states described in Section 2.2.1 are irrelevant. Hence, data need to be collected only for the unavailable states. Quantification of uncertainty in unit availability is of paramount importance for use in system energy studies and in operations planning and optimization. The next sub-Sections outline the assumptions that were made to simplify a generating unit's unavailability states.
3.1.1 Classification and reclassification of outages, forced extensions and de-rates

Generating unit upgrade outages, planned outages and scheduled de-rates are known causes of a unit's unavailability, as described in Section 2.2.4. Information on planned outages is usually made available well in advance, as they require the allocation of manpower and other resources needed to complete the work. A major system upgrade implies that the unit will be scheduled to be out of service for an extended period of time, and the state of the units will be known to the energy studies modelers and operations planners as well. Therefore, planned outages, scheduled deratings and upgrade outages can be treated as deterministic inputs in operations planning and energy studies. Maintenance outages are different from planned and upgrade outages, as explained in Section 2.2.3. By definition, maintenance outages are those events where the unit has to be taken out of service to prevent a forced outage in the near future (NERC 2015). During these outages, a generating unit is generally removed from service in order to perform work on specific components that could be postponed past the next weekend. The plant operator can schedule a maintenance outage during the weekdays following the next weekend, but it cannot be deferred to the next season. The problem identified must be fixed in the current season, implying that the plant operator has no information about maintenance outages that may occur in the next season or the seasons thereafter. This means that if a system optimization study is being carried out for a planning horizon of three to five years, then the system modelers would have no information about future maintenance outages. Therefore, for the purposes of energy studies, maintenance outages can be classified as being as uncertain as forced outages in terms of occurrence and duration. Forced extension of maintenance outages was defined in Section 2.2.5.
It is the period of time by which a maintenance outage was extended because the required repair work could not be completed in its allocated time. If maintenance outages are to be treated as forced outages, there is no point in differentiating between a maintenance outage period and a forced extension of that maintenance outage. Therefore, it is logical to merge the data on forced extensions of maintenance outages with their corresponding maintenance outages.

Forced extensions of planned outages were defined in Section 2.2.5. The decision on whether to include these outages in model studies should be based on a preliminary data analysis of their impact on the system. If the forced extensions of planned outages are not significant in number and duration in comparison to other sources of unit unavailability, then they can be eliminated from the database or merged with the corresponding planned outages.

Forced de-rates are operating states that are caused by a sudden malfunction of parts, similar to forced outages, but they do not require the unit to be shut off completely. During forced de-rates, the unit availability can be anywhere between 0-98% of its Maximum Capacity Rating (MCR), as defined in Section 2.2.1. A common method used to simplify forced de-rates is to convert de-rates to an equivalent outage (Roy Billinton and Allan 1992; NERC 1992). For a derating event, it can be calculated as follows:

Equivalent forced outage hours = Forced de-rate hours x (MCR - Final de-rated capacity) / MCR

For example, if a 100 MW unit was de-rated to 50 MW for 10 hours during a de-rating event, then the equivalent forced outage hours for that derating would be 10 x (100 - 50) / 100 = 5 hours. From an operations perspective, a 10-hour derating of 50 MW is not the same as a 5-hour outage of 100 MW. However, for medium to long term energy planning studies, the conversion method can be used to simplify the problem if de-rates are not a major contributor to system unavailability.
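The conversion of a forced de-rate to equivalent forced outage hours can be sketched in a few lines of Python. This is only an illustrative sketch: the function name and argument names are hypothetical and are not part of any BC Hydro or NERC tool.

```python
def equivalent_forced_outage_hours(derate_hours, mcr_mw, derated_capacity_mw):
    """Convert a forced de-rate event into equivalent full-outage hours.

    All argument names are illustrative assumptions, not names from the thesis.
    """
    if mcr_mw <= 0:
        raise ValueError("MCR must be positive")
    if not 0 <= derated_capacity_mw <= mcr_mw:
        raise ValueError("de-rated capacity must lie between 0 and MCR")
    # Fraction of the Maximum Capacity Rating that was lost during the event.
    lost_fraction = (mcr_mw - derated_capacity_mw) / mcr_mw
    return derate_hours * lost_fraction


# The example from the text: a 100 MW unit de-rated to 50 MW for 10 hours.
example_hours = equivalent_forced_outage_hours(10.0, 100.0, 50.0)
print(example_hours)  # 5.0
```

A de-rate to zero capacity returns the full event duration, so a complete outage is simply the limiting case of the same formula.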
However, if data analysis of different unit states shows that de-ratings are frequent and contribute significantly to a unit's unavailability, then, based on historic data, some de-rated states can be considered in addition to the ON and OFF states. This would lead to a more realistic multi-state modeling of unit states and availability. For example, if a unit of 100 MW capacity has incurred many deratings that reduced its capacity by half, then the unit can be represented by 3 states: 0, 50 and 100 MW. The discretization of unit states should be based on historic data and done in consultation with the modeling group that would use forced outage scenarios in their models.

3.1.2 Defining Time to Repair and Time to Fail

Based on the classification presented in the previous Section, all unavailable unit states can be grouped into two main categories: planned outages and forced outages. Planned outages, upgrade outages and scheduled deratings are deterministic inputs to modeling studies and are simply referred to as planned outages in this thesis. The forced extension of a maintenance outage is merged with its corresponding maintenance outage. Forced de-rates are converted to equivalent forced outages. All these are collectively called forced outages, as they represent the source of uncertainty for unit availability. Such a simplified representation of unit availability has been used in (Boomsma, Kristoffersen, and Krogh 2008; Bornak 2013) and is shown in Figure 10.

Figure 10 Simplified chart for unit availability. Adapted from (Boomsma, Kristoffersen, and Krogh 2008)

Stochastic representation of outages requires the statistical analysis of Time to Fail (TTF) and Time to Repair (TTR) values using the forced outage database. It should be noted that the use of TTF and TTR values fits well with the indices of reliability, i.e., the frequency and duration of outages. TTF values provide information about the frequency of outages and TTR values reflect the duration of outages.
TTF should be computed from the moment the unit begins to operate to the moment it fails (Roy Billinton and Allan 1992). This definition of TTF differentiates between the various Available states, as defined in Section 2.2.1, by including the unit committed state and excluding the reserve shut down state, in which the unit is not generating power. In the historic data set, there is information on the unit's operating status, and the reserve shut down period can therefore be eliminated from TTF values. Looking forward, however, planners do not have any information about when the unit would be committed or in the shut down state. So the ideal way of defining TTF from the historic dataset is not useful. A more useful method to define TTF is to move from one outage to another, so that outage scenarios can be derived irrespective of all other deterministic operating states. Hence, TTF is computed from the end of one forced outage to the beginning of the next forced outage. TTR is the duration of the outage event, as shown in Figure 11.

Figure 11 Description of TTR and TTF

A TTR and TTF database is created from historical records of generating units in order to perform statistical analysis, as discussed in the next Section.

3.2 Statistical Analysis

After obtaining the database on forced outages, statistical tests are used to understand the properties of forced outages that need to be accounted for in generating scenarios. Unit and system performance levels can be examined in a variety of ways, most of which are appropriate for a specific purpose (Binder et al. 1991). If outage events are memoryless, independent and stationary, then a homogeneous Markov process can be used to model such events (Roy Billinton and Allan 1992; Perrica, Goldoni, and Raimondi 2009). The memoryless property assumes that the operating state of a unit in the next time step depends only on the current state of the unit and not on any previous states.
Independence of outage events assumes that the duration of TTR is independent of the duration of TTF. Stationarity assumes that the behavior of the system is not changing over time, i.e., the conditional probability of failure or repair during any fixed interval of time is constant. In addition to these properties, the seasonality of outages and the impact of planned outages are important questions that can be evaluated using statistical tests. The methods used in this analysis are described in the following sub-Sections.

3.2.1 Trend in occurrence and duration of outages

The TTF and TTR depend on many factors, including basic system design, operating conditions, type of repairs, quality of repairs, materials used, etc. (David 1996). Generating units that undergo preventive maintenance have a bath-tub like hazard rate, as discussed in Section 2.4.2. The first few operating years of any unit represent the infant mortality period; therefore, data for the initial years should not be included in the analysis, as it would cause an unrealistically high rate of failure for a plant that is already in its useful life period. The useful life period of the unit is a period of constant hazard rate, i.e., the assumption of stationarity of the random process is valid in this zone. However, an increasing trend in the occurrence and duration of outages is indicative of an ageing unit with increasing wear and tear. Thus, it is essential to analyze trends in outage data to check if the unit is entering a wear out phase.

The evaluation of trends in TTF values was done by computing the number of outages per year and the mean annual TTF values. For analyzing trends in TTR values, the mean and median duration of outages per year were used. Mean TTF and mean TTR have been used in past works for trend analysis (Ellis and Gibson 1991; Koval and Chowdhury 1994).
Median values have been used for additional verification because a single large outage event can skew the mean of outages.

To test the hypothesis that the time series data is stationary, the Kwiatkowski, Phillips, Schmidt, and Shin (KPSS) test can be conducted on both the datasets of mean TTF and mean TTR. In the KPSS test, the time series under investigation is represented as the sum of a deterministic trend, a random walk, and a stationary error. For the time series to be stationary, the variance of the random walk has to be equal to zero. The technical details about this test can be found in (Kwiatkowski et al., 1992).

3.2.2 Seasonality of Forced Outages

The load demand and inflows in hydropower generation have seasonal variations. Since unit operation is highly dependent on these seasonal factors, it is crucial to check for seasonal patterns in outages. These factors have not been covered in much detail in past works. In (Simonoff et al., 2005), the authors look at the number of incidents and average generation capacity lost in each season for assessing seasonality of outage occurrences and durations. Analysis of seasonality is subjective, depending on the question being asked. Outages can span across different seasons, so the generation capacity lost in each season does not provide information on the frequency and severity of outages in those seasons. For hydropower units, the real question in testing seasonality is whether the statistical distributions of TTR and TTF are significantly different from one season to another.

To study the impact of seasonality on TTR and TTF values, the dataset can be divided into four seasons: Winter from Dec.-Feb., Spring from Mar.-May, Summer from Jun.-Aug. and Autumn from Sep.-Nov. Empirical Cumulative Distribution Functions (ECDFs) of TTR and TTF can be plotted for each of the seasons along with the ECDFs for all the months taken together.
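A minimal Python sketch of the seasonal grouping and ECDF computation is given here. It assumes, purely for illustration, that each record is a pair of the month in which the event started and its duration in hours; how an event spanning two seasons is assigned is a modeling choice that this sketch does not settle. The data values are made up.

```python
from collections import defaultdict

# Month-to-season mapping used in the text (Dec.-Feb. is Winter, and so on).
SEASON_BY_MONTH = {12: "Winter", 1: "Winter", 2: "Winter",
                   3: "Spring", 4: "Spring", 5: "Spring",
                   6: "Summer", 7: "Summer", 8: "Summer",
                   9: "Autumn", 10: "Autumn", 11: "Autumn"}


def ecdf(values):
    """Return the sorted values and the cumulative fraction at each of them."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


def seasonal_ecdfs(events):
    """events: iterable of (start_month, duration_hours) pairs (assumed layout)."""
    groups = defaultdict(list)
    for month, duration in events:
        groups[SEASON_BY_MONTH[month]].append(duration)
        groups["All"].append(duration)  # pooled sample for comparison
    return {season: ecdf(vals) for season, vals in groups.items()}


# Made-up TTR records, for illustration only.
sample_events = [(1, 4.0), (2, 12.0), (4, 3.0), (7, 30.0), (7, 6.0), (10, 8.0)]
curves = seasonal_ecdfs(sample_events)
print(curves["Winter"])  # ([4.0, 12.0], [0.5, 1.0])
```

Each returned pair can be drawn as a step plot, with the "All" curve serving as the reference against which the seasonal curves are compared.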
Based on the plot, assumptions can be made regarding seasonality, with inputs from system modelers. If the individual ECDF plots of each season are outside the 95% confidence interval for the ECDF of all months taken together, then this would indicate seasonality in the data. The reason for directly evaluating ECDFs instead of comparing point estimates like mean, median or variance is to prevent any bias and have a clear comparison of data values in every percentile of TTR and TTF. The median would only reflect a single point of the ECDF. The variance would explain the dispersion only. The mean would be impacted by outlier values of extremely high TTR and TTF. Direct analysis of the ECDF overcomes these problems and shows how the comparison fares throughout the range of TTR and TTF.

3.2.3 Independence of TTR and TTF values

Outage events are basically a series of transitions from TTF to TTR and back. Hence, the impact of TTR and TTF on each other has to be analyzed to correctly model failure events. The duration of an outage can depend on the preceding TTF value, and TTR can impact the subsequent TTF. If this phenomenon is demonstrated in the historic records of units, then it must be accounted for in the scenario generation algorithm. The Pearson correlation coefficient was used to study dependence between TTR and preceding TTF values, and between TTR and succeeding TTF values (Perrica, Goldoni, and Raimondi 2009). Scatter plots can be used to visualize these types of relationships. Autocorrelation in TTR and TTF values should also be checked by using the same Pearson correlation coefficient for different lag intervals.

3.2.4 Impact of planned outage on forced outages

Planned outage events are basically preventive maintenance activities which are carried out to keep the system in a condition that is consistent with the required levels of performance and reliability. The objective is to keep the system failure rates from increasing above the design levels.
Since the generating units are undergoing regular maintenance, it is worth looking at the impacts of planned outages on the occurrence and duration of forced outages. In the previous analyses, planned outages have not been accounted for quantitatively. As shown in Figure 11 earlier, a simple two-state model has been used. In this representation, a forced outage was tagged as an "unavailable" state and all other states, including planned outages, were incorporated within the "available" state. Use of this model is based on the assumption that planned outages and forced outages are completely independent of each other. However, planned outages can have an impact on forced outages, as discussed earlier.

To quantify the impact of planned outages on forced outages, the database should be reclassified based on a new definition of TTF. Until now, TTF has been defined as the time from one forced outage to the next forced outage, without consideration of planned outages. This has led to planned outage periods being included in TTF values. Therefore, TTF was redefined as the time from the end of the 'last outage' (planned/forced) to the 'beginning of the next forced outage'. This redefinition helps in classifying forced outages based on whether the preceding outage was a forced or a planned outage. Two sets of TTR and TTF can then be developed: one for forced outages that occur just after planned outages (set 1: PF) and one for forced outages that occur after forced outages (set 2: FF), as shown in Figure 12. These sets are thus called TTF_PF and TTR_PF, and TTF_FF and TTR_FF, for comparison purposes.

Figure 12 TTR/TTF definition considering Planned Outages

Comparison of ECDFs can then be done to see if the distributions of occurrence and duration of outages are impacted by a preceding planned outage. One plot can be created having ECDFs for TTR_PF, TTR_FF and TTR obtained using the older definition. Similarly, another ECDF plot of TTF_PF, TTF_FF and TTF using the older definition can be created and analyzed.
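The reclassification into the PF and FF sets can be sketched as follows. The record layout (start hour, end hour, and a "P" or "F" flag) is a hypothetical one chosen for illustration, the events are assumed to be sorted in time and non-overlapping, and the numbers are made up. The first forced outage in the record has no preceding outage and is skipped in this sketch.

```python
def split_by_preceding_outage(events):
    """Build TTF/TTR sets for forced outages, keyed by the preceding outage type.

    events: chronologically sorted, non-overlapping (start_h, end_h, kind)
    tuples, where kind is "P" (planned) or "F" (forced). Assumed layout.
    """
    sets = {"PF": {"TTF": [], "TTR": []}, "FF": {"TTF": [], "TTR": []}}
    previous = None  # (end time, kind) of the most recent outage of any type
    for start, end, kind in events:
        if kind == "F" and previous is not None:
            prev_end, prev_kind = previous
            label = "PF" if prev_kind == "P" else "FF"
            # Redefined TTF: end of the last outage to start of this forced outage.
            sets[label]["TTF"].append(start - prev_end)
            sets[label]["TTR"].append(end - start)
        previous = (end, kind)
    return sets


# Made-up outage history in hours since the start of the record.
history = [(0, 10, "F"), (110, 130, "F"), (200, 400, "P"),
           (450, 455, "F"), (600, 640, "F")]
result = split_by_preceding_outage(history)
print(result["PF"])  # {'TTF': [50], 'TTR': [5]}
print(result["FF"])  # {'TTF': [100, 145], 'TTR': [20, 40]}
```

The four resulting lists correspond to TTF_PF, TTR_PF, TTF_FF and TTR_FF and can be passed directly to an ECDF routine for comparison.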
Based on these plots and in consultation with system modelers, suitable assumptions can be made regarding the impacts of planned outages on forced outages for modeling and scenario generation.

3.2.5 Probabilistic model of outages

The statistical analysis of historic data is based on the premise that future values of TTR and TTF would be consistent with the past behavior of generation units. These past records should be used to obtain suitable probability distributions for TTR and TTF. Probability distributions can be used to describe the outcomes of a random variable, which in this case are the values of TTR and TTF. The probability distribution of a random variable is represented using its Cumulative Distribution Function (CDF). ECDFs have been used in the statistical analysis Section for analyzing the properties of outages. However, parametric distributions are required for scenario generation algorithms. In the statistical analysis of seasonality, the purpose of using ECDFs was to compare the ECDF of one TTR/TTF dataset with that of another. This comparison using ECDFs is more robust than comparing point estimates. However, ECDFs are not well suited for use in mathematical models. Furthermore, an ECDF is bounded at its lower and upper ends by the lowest and highest values in the historic dataset, whereas parametric CDFs are not. ECDFs are step functions, whereas parametric CDFs are smooth, as illustrated in Figure 13.

Figure 13 Difference between ECDF and parametric CDF

This warrants the development of a parametric CDF representing historical TTF and TTR values that can be used to sample future TTR and TTF values. In Section 2.4.5, it was explained that the exponential distribution is the most commonly used probabilistic model to fit TTF values in electric power systems (Barroso and Conejo 2006; Finger 1979; Scully et al. 1992).
However, Weibull and Gamma distributions provide more flexibility as they use shape parameters, and thus they have been used by many researchers (Boomsma, Kristoffersen, and Krogh 2008; Bornak 2013). The Lognormal distribution is also considered an important tool to describe TTR values (Billinton & Allan 1992; Anderson & Davison 2005). The advantage of the Weibull, Gamma and Lognormal distribution functions is that they do not have a single characteristic shape and can be shaped to represent many distributions that best fit the actual data set.

For forced outage events, the best-fit parametric distributions of TTR, TTF, TTF_FF and TTF_PF can be obtained from empirical data instead of assuming some distribution. A MATLAB code can then be used to find the best fit distribution for TTR and TTF values. The parameters of the distribution are obtained by Maximum Likelihood Estimation (MLE) (Raychaudhuri, 2008). (Sheppard, 2012) developed a code that can fit all the 18 parametric distributions defined in MATLAB to a time series dataset. The best fit among these distributions is obtained using the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC). AIC is derived from information theory and is designed to pick the function that produces a probability distribution with the smallest discrepancy from the true distribution. It is an estimate of a constant plus the relative distance between the unknown true likelihood function of the data and the fitted likelihood function of the model. A lower AIC means a model is considered to be closer to the true distribution. BIC is derived from a large sample asymptotic approximation to the full Bayesian model comparison. It is an estimate of a function of the posterior probability of a model being true, under a certain Bayesian setup. A lower BIC means that a model is considered to be more likely to be the true model. More details about these criteria can be found in (Kuha 2004).
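The idea of fitting by MLE and ranking by AIC or BIC can be illustrated with a small Python sketch. This is not the MATLAB code referred to in the text: it covers only two candidate distributions (exponential and lognormal) whose maximum likelihood estimates have closed forms, and the TTR values are made up for illustration. It uses the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of fitted parameters, n the sample size and L the maximized likelihood.

```python
import math


def fit_exponential(data):
    """Closed-form MLE for an exponential distribution (one parameter)."""
    n, total = len(data), sum(data)
    rate = n / total
    loglik = n * math.log(rate) - rate * total
    return {"name": "exponential", "k": 1, "params": (rate,), "loglik": loglik}


def fit_lognormal(data):
    """Closed-form MLE for a lognormal distribution (two parameters)."""
    n = len(data)
    logs = [math.log(x) for x in data]
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n  # MLE uses the 1/n variance
    loglik = -sum(logs) - 0.5 * n * (math.log(2 * math.pi * var) + 1)
    return {"name": "lognormal", "k": 2, "params": (mu, math.sqrt(var)),
            "loglik": loglik}


def rank_fits(data):
    """Attach AIC and BIC to each candidate and sort by AIC, lowest first."""
    n = len(data)
    fits = [fit_exponential(data), fit_lognormal(data)]
    for fit in fits:
        fit["aic"] = 2 * fit["k"] - 2 * fit["loglik"]
        fit["bic"] = fit["k"] * math.log(n) - 2 * fit["loglik"]
    return sorted(fits, key=lambda f: f["aic"])


# Made-up TTR values in hours, for illustration only.
ttr_hours = [1.0, 2.0, 2.5, 3.0, 4.0, 6.0, 9.0, 15.0, 40.0, 120.0]
ranking = rank_fits(ttr_hours)
for fit in ranking:
    print(fit["name"], round(fit["aic"], 2), round(fit["bic"], 2))
```

Distributions without closed-form estimates, such as Weibull or Gamma, would need a numerical optimizer in place of the two fitting functions; the ranking step stays the same.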
For the purpose of finding the best distribution, it is sufficient to know that the distribution which minimizes AIC and/or BIC is the most suitable fit for that dataset.

In this Section, the methods for establishing the statistical properties of forced outages and for developing probabilistic models of the TTR and TTF datasets were described. The results of this analysis allow the selection of suitable methods for the stochastic representation of forced outages, as discussed in the following Section.

3.3 Methods of sampling from forced outage distributions

Reliability based methods for stochastic modeling can be applied using an analytical, simulation or optimization approach. The difference between analytical, simulation and optimization approaches is in the way in which reliability indices are evaluated. In an analytical approach, the system is represented by a simplified mathematical model and reliability indices are directly evaluated from solutions of equations. On the other hand, simulation and optimization approaches treat the problem as a series of real experiments conducted in simulated time to model the random behavior. Reliability indices are estimated from the results of simulation or optimization studies by counting, for example, the number of outage events. (Billinton and Li 1994) provide a good justification for using simulation methods rather than analytical solutions. In the past, analytical methods had the advantage of being less time consuming compared to simulation methods. However, with modern computing resources, solution time is no longer an issue. Mathematical models can sometimes be an over-simplification of the system, thus becoming unrealistic. Simulation techniques can provide a wide range of output parameters, including all statistical moments and probability density functions, whereas the output from analytical methods is usually limited to expected values.
(Cepin 2011), on the other hand, cautions that the results in both approaches are only as good as the model of the system, the appropriateness of the evaluation technique, and the quality of the assumptions and input data used in the models. (Endrenyi 1979) argued that analytical methods are more suitable for components with instantaneous repairs. However, for components with normal repair times and preventive maintenance, the simulation approach is preferred. The usefulness of modelling a stochastic process rests heavily upon the ability to adequately model uncertainty. The simulation method aims to capture uncertainty through scenarios. In the following Sections, Monte Carlo Simulation methods are described, followed by a description of their application in forced outage scenario generation. Optimization methods are beyond the scope of this study and will be investigated in future studies.

3.3.1 Monte Carlo Simulations

Monte Carlo Simulation (MCS) is a simulation method that uses repeated sampling of a random variable from a probability distribution to generate scenarios of the event as it might happen in real life, thus enabling computation of the desired output. In simple terms, MCS is a methodical way of doing a what-if analysis in experimental situations where the results are not known in advance. It allows for the investigation of the complete range of risk associated with each uncertain input variable (Raychaudhuri 2008). Because of this property, MCS is used in many modeling applications, ranging from natural sciences and social sciences to engineering and finance.

Applying MCS involves four steps. First, a base case is established, which is a deterministic model of the system that uses the most likely values of the input parameters. The next step involves identification of a suitable statistical distribution of the random variable. Next, a set of random variables is sampled from this distribution.
One set of random numbers, consisting of one value for each of the input variables, is generated to provide one set of output values. This represents one scenario. This process is repeated by generating more sets of random numbers to obtain different sets of possible output values (i.e., scenarios). Finally, a statistical analysis is carried out on the generated output to obtain the required statistics on the response of the system.

Failures in power systems are random events; hence, they can be simulated using MCS. Sequential time MCS is applied when system behaviour depends on its history (Alkuhayli, Raghavan, and Chowdhury 2012). In the case of generating units, MCS can be used to randomly sample values of TTR and TTF from their respective probability distributions to generate scenarios of forced outages, which can then be compared against historic data in terms of the frequency and duration of outages. In (Billinton and Li 1994), the authors described the basic principles of three different simulation approaches: a state sampling approach, a state duration sampling approach and a system state transition sampling approach. These methods are described in the following Sections.

3.3.1.1 State Sampling Approach

A system is made up of many individual components. The combination of the component states decides the final state of the system. A power plant can be thought of as a system consisting of individual units as components. In a state sampling approach, the behavior of each generating unit can be modeled to get plant level reliability indices.

Treating individual generating units as components, each unit has two states: available and forced out. Let PF denote the probability of failure of one unit. To obtain the unit state, a random number between 0 and 1 is first generated from a uniform distribution, and the result is then compared with PF. The planning horizon is discretized into smaller time-steps.
If the generated number is less than PF, then the unit is assigned the forced out state, and if it is larger than PF, then it is assigned the available state for that time-step. The sampling is repeated for each subsequent time-step until the end of the planning horizon, for all units.

The main advantage of this approach is that the sampling method is relatively simple, as it entails comparing a uniformly generated random number against the value of PF. It requires little input data, as only component state probabilities are required. The main disadvantage of this method is that the frequency and duration of outages have to be back calculated from the generated scenario of outage and available states.

3.3.1.2 State Duration Sampling Approach

The state duration sampling approach is based on sampling the probability distribution of component state durations. To apply this method, the unit state duration distribution functions are obtained. Here, the duration distribution can be the probability distribution of any state duration, like TTR or TTF. Next, unit states are sampled chronologically for all units. The chronological system state transition process is then created by combining the individual unit state patterns generated.

For the two-state representation of a generating unit, component state duration distribution functions can be obtained for the available state (TTF) and the repair state (TTR). The following steps can then be followed:
1) The initial state of each unit is assumed to be the Available/UP state.
2) The duration of each component residing in its present state is sampled from its respective distribution. If the unit is UP, a value of TTF is sampled. At the end of the TTF period, the unit switches to the forced-out state and a value of TTR is sampled.
3) Step 2 is then repeated for the period of the planning horizon for all units separately.
4) The individual unit states can then be used to obtain the system state transition scenario.

This method is illustrated for a two-component system in Figure 14.
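The four steps of the state duration sampling approach can be sketched in Python as follows. The distributions and their parameters here are placeholders chosen only for illustration (an exponential TTF and a lognormal TTR with made-up parameters); in practice they would be replaced by the distributions fitted to the historic data. Function and variable names are hypothetical.

```python
import random


def sample_unit_history(horizon_h, draw_ttf, draw_ttr, rng):
    """Return chronological (start_h, end_h) forced outages for one unit."""
    outages = []
    clock = 0.0  # Step 1: the unit starts in the UP state at time zero.
    while True:
        clock += draw_ttf(rng)  # Step 2: time spent UP before the next failure.
        if clock >= horizon_h:
            break
        # Step 2 (continued): time spent DOWN, truncated at the horizon.
        end = min(clock + draw_ttr(rng), horizon_h)
        outages.append((clock, end))
        clock = end  # Step 3: repeat until the planning horizon is reached.
    return outages


def units_down(unit_histories, t):
    """Step 4: system state at time t, expressed as the number of units down."""
    return sum(any(s <= t < e for s, e in h) for h in unit_histories)


# Placeholder distributions with made-up parameters (assumptions, not fitted).
def draw_ttf(rng):
    return rng.expovariate(1.0 / 500.0)  # mean TTF of 500 hours


def draw_ttr(rng):
    return rng.lognormvariate(2.0, 1.0)  # lognormal TTR, in hours


HORIZON_H = 8760.0  # one year of hourly resolution
rng = random.Random(2018)  # seeded so that the scenario is reproducible
histories = [sample_unit_history(HORIZON_H, draw_ttf, draw_ttr, rng)
             for _ in range(2)]  # two-component system
print(units_down(histories, 1000.0))
```

Because durations are sampled directly, the frequency and duration of outages in each scenario can be read off the returned lists without any back calculation.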
First, individual unit states are developed, which are then combined to give the system state.

Figure 14 System State Duration Method (Billinton and Li 1994)

The main advantage of this approach is that it allows any state duration distribution to be used for sampling. It also easily provides frequency and duration indices of outage patterns. Its disadvantages are the computational time and memory required, because of the large number of individual component states that must be covered by the sampling algorithm. This method also requires detailed inputs in the form of probability distributions of TTR and TTF for each component.

3.3.1.3 System State Transition Sampling Approach

This method focuses on sampling state transitions of the system rather than sampling individual components in the system. For a power plant with n generating units, there can be n+1 possible states that can be attained by the plant, i.e., 0 units OFF, 1 unit OFF, 2 units OFF, and so on up to n units OFF. If the individual component state durations are assumed to be represented by exponential probability distributions, then the state duration probabilities of the system can be derived as described in (Billinton and Li 1994). To sample the system state durations, a random number in the interval (0, 1) can be generated and compared against the state duration probabilities for the plant. A system state transition sequence can be obtained by sampling at each time step until the end of the planning horizon.

The main advantage of this method is that the reliability indices for the entire system are calculated directly. The major disadvantage of this approach is that the system state probability distributions can only be represented by an exponential distribution, and only if component state durations have exponential distributions.

In summary, there is a trade-off between the state sampling approach and the state duration sampling approach in terms of preparation of the database, computational expense and post processing of output results.
The state sampling approach requires few inputs and has a simple sampling algorithm, but requires computation of reliability indices like frequency and duration of outage events from the generated scenarios. The state duration sampling algorithm requires detailed input distributions for component TTR and TTF. Although it is computationally expensive, it can model any statistical distribution and easily provide the required reliability indices. The system state transition approach comes with very restrictive assumptions, in that it can be used only if state durations follow exponential distributions. Furthermore, it does not provide outage patterns for individual units. Both the state sampling and state duration sampling approaches are applied to generate scenarios of forced outages in the next Section.

3.4 Methods for scenario generation of forced outages

Finally, after analysing the statistical properties, fitting a probability distribution and selecting suitable sampling approaches, appropriate modeling algorithms can be developed to generate scenarios of forced outages. The scenario generation methodology is described using a two-state model based on a Markov Process and a Semi-Markov Process. The Markov Process method uses the state sampling approach to generate scenarios of outages, and the Semi-Markov Process uses the state duration sampling approach. The Semi-Markov method can be further modified to include the impact of planned outages on scenario generation.

3.4.1 Markov Process-State Sampling Method

As has been briefly mentioned in the literature review, in Markov processes, the state of a system in the next time step is only dependent on the state of the system in the current time step and not on any past events. The dependence is quantified using transition probabilities for every possible change of state. If these transition probabilities are stationary over time, then the resulting Markov chains are called time-homogeneous Markov chains (Konstantopoulos, 2009).
The applicability of this method depends on the assumption that outage events are independent and identically distributed. An outage scenario can be considered a series of ON-OFF states at every discrete time step. A series of ONs makes up a TTF and a series of OFFs makes up a TTR. Since this pattern involves the 'transition' from one state to another, a Markov Chain – Monte Carlo Simulation (MC-MCS) process is suitable for scenario generation. Initial research on scenario generation of outages has extensively used Markov processes (Li & Wang 2017; Moatti 1988; Finger 1979; Scully et al. 1992; Endrenyi, 1979; Barroso & Conejo, 2006).

Figure 15 shows a two-state representation of a generating unit, where the 'Up' state is when the unit is available and the 'Down' state is when the unit is under forced outage. The Markov model for forced outages assumes that the probability of switching from Up to Down, and vice versa, is constant in time irrespective of the duration of the existing state (Scully et al. 1992).

Figure 15 Two-State model for generating units

The state changes are also shown in Figure 15. The rate of transition from Up to Down is called the 'failure rate', λ, defined as the mean number of transitions from the Up to the Down state per unit time in the Up state. Similarly, the 'repair rate', µ, is defined as the mean number of transitions from the Down to the Up state per unit time in the Down state (Scully et al. 1992; Endrenyi, 1979). The following equations describe both these parameters.

$$\lambda = \frac{n_{12}}{T_1} = \frac{1}{T_1 / n_{12}} = \frac{1}{\text{Mean Up Time}}$$ (Equation 21)

$$\mu = \frac{n_{21}}{T_2} = \frac{1}{T_2 / n_{21}} = \frac{1}{\text{Mean Down Time}}$$ (Equation 22)

Here, T1 and T2 are the total times spent in State 1 (UP) and State 2 (DOWN) respectively, and n12 is the number of transitions from State 1 (UP) to State 2 (DOWN), which is numerically equal to n21, the number of transitions from State 2 (DOWN) to State 1 (UP).
If F(12) is the frequency of transition from Up to Down, and P1 is the steady state probability of the system being in the Up state, then

$$F(12) = P_1 \, \lambda$$ (Equation 23)

Similarly, if F(21) is the frequency of transition from Down to Up, and P2 is the steady state probability of the system being in the Down state, then

$$F(21) = P_2 \, \mu$$ (Equation 24)

In steady state conditions, the frequency of entering an outage state is the same as that of exiting it, i.e., F(12) = F(21), and P1 + P2 = 1. Therefore,

$$P_1 = \frac{\mu}{\mu + \lambda}$$ (Equation 25)

and

$$P_2 = \frac{\lambda}{\lambda + \mu}$$ (Equation 26)

P2 is also called the Forced Outage Rate (FOR), as it gives the steady state probability of the unit being in a forced out state (Rondla 2012; Scully et al. 1992).

The scenario generation methodology using Markov Chain – Monte Carlo Simulation involves 3 steps: first, the pre-processing of historical data and the computation of transition probabilities; second, scenario generation using repeated sampling; and finally, post-processing of the output unit states to derive the time and duration of outages.

In the pre-processing step, the historical data are processed to obtain transition probabilities. The TTR value for every outage event is rounded off to the nearest integer. This is done because the planning time horizon for future forced outage scenarios is to be discretized in hourly time-steps. This step makes all TTR values less than 0.5 hours equal to 0, and modelers can either eliminate such values or round them up to 1 hour. However, the failure time (TTF) values cannot be rounded off directly. Outages with TTF values less than 0.5 hours cannot be eliminated or merged. This is because a short outage may be quickly followed by a long outage due to various reasons. Merging different types of outages is not a realistic representation of all possible cases. Hence, all TTF values less than 1 hour are made equal to 1 hour, and other TTF values are rounded off to their nearest integer values. This is followed by the computation of the failure rate, repair rate and FOR in the dataset.
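The pre-processing step can be sketched in Python as follows. The sketch assumes that TTR values shorter than half an hour are raised to 1 hour rather than eliminated, which is only one of the two options mentioned in the text, and it rounds half-hour values upward. The function names and the sample values are made up for illustration.

```python
def round_to_hours(values, minimum=1):
    """Round to the nearest whole hour, raising shorter values to `minimum`."""
    # int(v + 0.5) rounds positive values half-up; Python's round() would
    # instead round halves to the nearest even integer.
    return [max(minimum, int(v + 0.5)) for v in values]


def markov_parameters(ttf_hours, ttr_hours):
    """Return (failure rate, repair rate, FOR) from historic TTF and TTR values."""
    ttf = round_to_hours(ttf_hours)
    ttr = round_to_hours(ttr_hours)
    failure_rate = len(ttf) / sum(ttf)  # lambda = 1 / (mean up time)
    repair_rate = len(ttr) / sum(ttr)   # mu = 1 / (mean down time)
    forced_outage_rate = failure_rate / (failure_rate + repair_rate)
    return failure_rate, repair_rate, forced_outage_rate


# Made-up historic values in hours, for illustration only.
ttf_history = [95.2, 204.7, 0.3, 500.0]
ttr_history = [3.6, 0.2, 12.4, 24.0]
lam, mu, forced_rate = markov_parameters(ttf_history, ttr_history)
print(lam, mu, forced_rate)
```

With hourly time-steps, λ and µ computed this way can be read as the per-hour probabilities of leaving the Up and Down states respectively.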
Having quantified the FOR, λ and µ, outage scenarios are created. The planning time horizon is discretized in units of 1 hour; with λ and µ expressed per hour, they are used as the probabilities of a transition occurring within a one-hour step. The initial state of the unit can be given as an input, assumed to be UP, or randomized by drawing a random number x between 0 and 1. To illustrate, if x ≥ FOR (the P2 defined in Equation 26), then the unit is considered available at the start of the study (time = t); otherwise it is considered Forced Out at the start. For the next time step (time = t+1), another random number x is generated between 0 and 1. The following rules are used to assign the state of the unit:

1) If at time t the unit is UP and x ≥ λ, then the state of the unit at t+1 remains UP.
2) If at time t the unit is UP and x < λ, then the state of the unit at t+1 changes to Down.
3) If at time t the unit is Down and x < µ, then the state of the unit at t+1 changes to UP.
4) If at time t the unit is Down and x ≥ µ, then the state of the unit at t+1 remains Down.

Once the state of the unit at t+1 is obtained, the steps are repeated until the end of the time horizon. This gives a series of UP and DOWN states. Multiple scenarios can be created for stochastic modeling purposes. However, the goal of this thesis is to test the accuracy of scenario generation methods, so to ensure statistical significance, a scenario spanning 1000 years is generated. This is based on the assumption that a 1000-year period would make the impact of sampling variability negligible and help in assessing the adequacy of the algorithm used.

The scenarios generated are time series of the unit state, either UP or DOWN, in every hour. Post-processing is then done on the generated data to obtain TTR and TTF values. The TTR and TTF values computed from the outage scenarios are compared against the historic data. Additional statistics can also be computed, such as the annual duration of outages, the mean TTF, the annual number of outages above a certain duration, etc.
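The sampling and post-processing steps can be sketched in Python as shown below. This is a minimal sketch under stated assumptions rather than the thesis code: it assumes that λ and µ are per-hour values no greater than 1, that the Down-to-Up transition is governed by the repair rate µ of Equation 22, and that the final, unfinished run of states is discarded when TTF and TTR are extracted. The function names `simulate_states` and `run_lengths` are hypothetical.

```python
import random


def simulate_states(lam, mu, n_hours, seed=None, initial_state=None):
    """Return a list of hourly states (True = UP, False = DOWN)."""
    rng = random.Random(seed)
    if initial_state is None:
        # Randomized start: UP when the draw is at least the FOR (Equation 26).
        up = rng.random() >= lam / (lam + mu)
    else:
        up = initial_state
    states = []
    for _ in range(n_hours):
        states.append(up)
        x = rng.random()  # uniform draw in [0, 1)
        if up:
            up = x >= lam  # stays UP unless a failure is sampled
        else:
            up = x < mu    # returns to UP when a repair is sampled
    return states


def run_lengths(states):
    """Convert hourly states into lists of TTF and TTR values (in hours)."""
    ttf, ttr = [], []
    current, length = states[0], 0
    for state in states:
        if state == current:
            length += 1
        else:
            (ttf if current else ttr).append(length)
            current, length = state, 1
    # The last run is dropped because its true length is unknown (assumption).
    return ttf, ttr
```

For a long horizon, the fraction of hours spent DOWN in the simulated series should approach the FOR, which gives a simple check on the sampler before the TTF and TTR distributions are compared against historic data.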
This scenario generation method can be further refined to model partially derated states if those states are significant (Lisnianski et al. 2012).

3.4.2 Semi-Markov Process for FO-FO model

Many authors have pointed to the inadequacy of homogeneous Markov processes in modeling TTR and TTF, as they can give unrealistic results (Van Casteren, Bollen, and Schmieg 2000). A Semi-Markov process has been proposed as an alternative to homogeneous Markov chain processes (Bornak 2013).

The distinguishing feature of the Semi-Markov model is the addition of a random variable that represents the duration for which the system remains in each state. In Markov processes, the planning horizon is discretized, and the algorithm moves from one time-step to the next, evaluating the state of the system in every time step depending on the realization of the transition probability. In Semi-Markov processes, the unit remains in a particular state for the duration given by the TTR or TTF. The values of TTR and TTF are drawn from a pre-defined distribution, and the transition between states is modeled using those values. This feature allows the use of any distribution to represent the characteristics of the process. If the TTF and TTR of units are uncorrelated, then the Semi-Markov process is suitable for generating outages over long time periods.

If X(t) is the state of the unit at time t and $S_n$ represents the time of the nth transition, then the duration between transitions can be calculated as $U_n = S_n - S_{n-1}$, which is also a random variable. The distribution of $U_n$ is chosen to best fit the historic data. Hence, $U_n$ depends only on the states $X(S_{n-1})$ and $X(S_n)$ between which the unit switches, and is independent of X(t) for t