UBC Theses and Dissertations

Methods for estimating reliability of water treatment processes : an application to conventional and… Beauchamp, Nicolas 2008

METHODS FOR ESTIMATING RELIABILITY OF WATER TREATMENT PROCESSES: AN APPLICATION TO CONVENTIONAL AND MEMBRANE TECHNOLOGIES

by

NICOLAS BEAUCHAMP
B.Eng., Laval University, 2007

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES (Civil Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)
October 2008
© Nicolas Beauchamp, 2008

ABSTRACT

Water supply systems aim, among other objectives, to protect public health by reducing the concentration of, and potentially eliminating, microorganisms pathogenic to human beings. Yet, because water supply systems are engineered systems facing variable conditions, such as raw water quality or treatment process performance, the quality of the drinking water produced also exhibits variability. The reliability of a treatment system is defined in this context as the probability of producing drinking water that complies with existing microbial quality standards. This thesis examines the concept of reliability for two physicochemical treatment technologies, conventional rapid granular filtration and ultrafiltration, used to remove the protozoan pathogen Cryptosporidium parvum from drinking water. First, fault tree analysis is used as a method of identifying technical hazards related to the operation of these two technologies and to propose ways of minimizing the probability of failure of the systems. This method is used to compile operators’ knowledge into a single logical diagram and allows the identification of important processes which require efficient monitoring and maintenance practices. Second, an existing quantitative microbial risk assessment model is extended to be used in a reliability analysis.
The extended model is used to quantify the reliability of the ultrafiltration system, for which performance is based on full-scale operational data, and to compare it with the reliability of rapid granular filtration systems, for which performance is based on previously published data. This method allows for a sound comparison of the reliability of the two technologies. Several issues remain to be addressed regarding the approaches used to quantify the different input variables of the model. The approaches proposed herein can be applied to other water treatment technologies, to aid in prioritizing interventions to improve system reliability at the operational level, and to determine the data needs for further refinements of the estimates of important variables.

TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
Co-Authorship Statement
1 Introduction
  1.1 Context
  1.2 Pathogens of concern
    1.2.1 Cryptosporidium parvum
  1.3 Risk and reliability in the water industry
    1.3.1 Quantitative microbial risk assessment
    1.3.2 Methods of hazard identification in drinking water
      1.3.2.1 Hazard analysis and critical control point
      1.3.2.2 Fault tree analysis
    1.3.3 Reliability of water utilities
  1.4 Treatment technologies
    1.4.1 Conventional treatment
      1.4.1.1 Pre-treatment, clarification and rapid granular filtration
      1.4.1.2 Performance testing and monitoring
    1.4.2 Low pressure membrane filtration
      1.4.2.1 Classification of membrane technologies
      1.4.2.2 Submerged hollow fibre ultrafiltration membranes
      1.4.2.3 Integrity testing and monitoring
    1.4.3 Disinfection
      1.4.3.1 Chemical disinfection
      1.4.3.2 Ultraviolet radiation
  1.5 Research objectives
  1.6 Thesis outline
  1.7 References
2 Technical Hazard Identification In Water Treatment Using Fault Tree Analysis
  Preface
  2.1 Introduction
  2.2 Background
    2.2.1 Qualitative reliability methods
    2.2.2 Microbial quality of drinking water
  2.3 Method for fault tree analysis of filtration technologies
    2.3.1 Fault tree construction
      2.3.1.1 Defining the system, environment and top event
      2.3.1.2 Building the fault trees
      2.3.1.3 Soliciting operator knowledge
  2.4 Case studies
    2.4.1 Plant A
    2.4.2 Plant B
  2.5 Results of the case studies
    2.5.1 Systems, environments and top events definitions
      2.5.1.1 System definitions
      2.5.1.2 Environment definition
      2.5.1.3 Top event definition
    2.5.2 Fault trees
      2.5.2.1 Cutsets
  2.6 Discussion
    2.6.1 Using fault trees
      2.6.1.1 Data to determine probabilities of occurrence
  2.7 Conclusion
  2.8 References
3 Application and Limitation of a QMRA-Based Reliability Analysis to Assess the Performance of an Ultrafiltration Water Treatment Technology
  Preface
  3.1 Introduction
  3.2 Reliability approach for applying the QMRA model
  3.3 Application of the reliability approach to a full-scale UF plant
    3.3.1 Input variable probability distributions
      3.3.1.1 Infectivity
      3.3.1.2 Tap water consumption
      3.3.1.3 Pathogen concentration
      3.3.1.4 Full-scale UF system performance
    3.3.2 Results of FORM analysis
      3.3.2.1 Reference cases for conventional physicochemical system performance
  3.4 Discussion
    3.4.1 Methodological issues
    3.4.2 Value of the approach in decision-making issues
      3.4.2.1 Risk and regulations
      3.4.2.2 Comparison of treatment technologies
    3.4.3 Reliability issues
      3.4.3.1 Incorporating correlations
      3.4.3.2 Use of FORM
  3.5 Conclusions
  3.6 References
4 Conclusion
  4.1 Summary
    4.1.1 Application of FTA to identify technical hazards
    4.1.2 Reliability analysis to incorporate uncertainties in QMRA
  4.2 Findings
  4.3 Future work
  4.4 References

LIST OF TABLES

Table 1.1: Membranes in drinking water
Table 2.1: Definitions related to the fault tree analysis, Plant A
Table 2.2: Definitions related to the fault tree analysis, Plant B
Table 2.3: Description of intermediary and primary events for membrane filtration failure
Table 2.4: Description of intermediary and primary events for conventional filtration failure
Table 2.5: Classification of cutsets
Table 3.1: Model and input variable assumptions

LIST OF FIGURES

Figure 1.1: Life cycle of Cryptosporidium parvum (Smith and Rose, 1998)
Figure 2.1: Fault tree for waterborne outbreaks (adapted from Risebro et al., 2007)
Figure 2.2: Fault tree construction in relation to soliciting operators’ knowledge
Figure 2.3: Fault tree of Plant A
Figure 2.4: Fault tree of Plant B
Figure 3.1: Probability density and cumulative distribution functions of input variables
Figure 3.2: Total removal (LRVtotal) achieved by the UF plant between May 2005 and June 2007
Figure 3.3: Reliability of an UF plant compared with reference cases for conventional rapid granular filtration (RGF) plants for different risk levels
Figure 3.4: Probability density functions of conventional treatment train removal
Figure 3.5: Importance of random variable for UF plant reliability analysis

ACKNOWLEDGEMENTS

I want to express my gratitude to all the people who supported me through my graduate studies at UBC. I would like to thank my supervisors, Barbara J. Lence and Christian Bouchard, for their invaluable ideas and support, and their independent and open-minded thinking. This project represents a precious learning experience, for which they are mostly responsible. I would not have made it through the ups and downs of graduate studies without the unconditional support of my family. My parents, sister, and brother-in-law provided precious encouragement. I am grateful for the support shown by my friends in Quebec and Vancouver. I also want to thank my girlfriend for providing encouragement when I needed it. Many thanks are given to the managers and operators who devoted time to the project. They offered incredible insights on the everyday operation of water facilities. This thesis would not exist without their contribution. I am grateful to Michael Messner, of the United States Environmental Protection Agency, for his valuable help regarding infectivity models, to Elizabeth Sigalet, of the Interior Health Authority, and to Corinne Ong, of the British Columbia Centre for Disease Control, for their insights regarding microbial water quality, and to Michèle Giddings, of Health Canada, for her help regarding water consumption data. The informative discussions I had with each of them were greatly appreciated. I appreciate the financial support of the Natural Sciences and Engineering Research Council, which made it possible for me to undertake this degree.

CO-AUTHORSHIP STATEMENT

Nicolas Beauchamp was the lead and principal researcher of the work contained in the thesis titled “Methods for Estimating Reliability of Water Treatment Processes: An Application to Conventional and Membrane Technologies.” Dr. Barbara Lence and Dr.
Christian Bouchard, the Research Co-supervisors of the thesis, provided inspiration and technical guidance, and supported the writing of the papers included in the thesis.

1 INTRODUCTION

1.1 Context

Water treatment can be defined as the “manipulation of water from various sources to achieve a water quality that meets specified goals or standards set by the community through its regulatory agencies” (Crittenden et al., 2005). One of the most common uses of water treatment is the production of drinking water for citizens on a municipal scale. The provision of uncontaminated water is believed to be one of the most effective interventions in the history of medicine for reducing the burden of disease on the world population. Bacterial infections such as cholera and typhoid fever, and viral infections such as poliomyelitis, are practically eradicated in countries with high hygiene standards, in part because of the protection provided by treating water intended for human consumption.

Despite the successes achieved by municipal water treatment, waterborne disease outbreaks still occur in most parts of the world, including developed countries. Famous North American outbreaks include the Escherichia coli outbreak in Walkerton, Ontario, in 2000, where approximately 2,300 people were infected and seven died; the Cryptosporidium parvum outbreak in Milwaukee, Wisconsin, in 1993, which caused an estimated 400,000 infections and 50 deaths; and the North Battleford outbreak of Cryptosporidium parvum, which caused between 5,800 and 7,100 infections (Hrudey and Hrudey, 2007).

The objectives of water treatment differ depending on the intended use of the water and on the jurisdiction under which the treatment process is regulated. In municipal drinking water applications, the main objectives are to provide microbiologically and chemically safe water, with acceptable aesthetic quality, in sufficient quantity and at a relatively low cost.
Initially, compliance with water treatment objectives was demonstrated by monitoring treated water for undesired contaminants against a set standard. Many of these rules and regulations are still applied today. For example, in Canada, drinking water must be monitored for the presence of Escherichia coli in samples of 100 mL at a set frequency and all test results must be negative (Federal-Provincial Committee on Drinking Water, 2004). Concerns regarding whether this approach provides sufficient protection resulted in the development of another type of standard, namely performance-based standards. Sampling of small quantities of treated water is believed to be statistically insufficient for detecting disease-causing organisms, especially for contaminants that can be a health concern at very low concentrations. To further guarantee the good quality of drinking water, monitoring the performance of the various treatment processes became the norm.

For example, in chemical disinfection processes, the CT approach is now used under many jurisdictions. In applying the CT approach, one determines the inactivation of a given microorganism achieved by disinfection by measuring the concentration of disinfectant (C) and the contact time (T) of this disinfectant with the contaminated water. For each pathogen and disinfectant, a relationship between the product of C and T and the inactivation is usually provided in the regulations. The required performance, in terms of inactivation, is a function of the source water being treated. Systems using surface water as their source generally have a set of performance standards to achieve that differs from those of groundwater systems. Again, there are local variations among jurisdictions.

This approach to regulating municipal water treatment systems also has limits.
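Before turning to those limits, the CT calculation described above can be illustrated with a short sketch. It is not taken from any regulation: the function names are hypothetical, the CT required per log of inactivation is an invented placeholder, and the linear relation between CT and log inactivation is a simplifying assumption (regulations typically tabulate this relationship by pathogen, disinfectant, temperature, and pH).

```python
# Minimal sketch of a CT-based compliance check.
# ASSUMPTIONS (not from the thesis or any regulation):
#   - log inactivation is taken as proportional to CT;
#   - CT_PER_LOG is an invented placeholder value.

CT_PER_LOG = 10.0  # hypothetical: mg*min/L of CT needed per 1-log inactivation


def ct_value(concentration_mg_per_l: float, contact_time_min: float) -> float:
    """Product of disinfectant concentration (C) and contact time (T)."""
    return concentration_mg_per_l * contact_time_min


def log_inactivation(ct: float, ct_per_log: float = CT_PER_LOG) -> float:
    """Log10 inactivation credited for a given CT under the linear assumption."""
    return ct / ct_per_log


def meets_requirement(concentration_mg_per_l: float,
                      contact_time_min: float,
                      required_log: float,
                      ct_per_log: float = CT_PER_LOG) -> bool:
    """True if the achieved log inactivation reaches the required level."""
    achieved = log_inactivation(
        ct_value(concentration_mg_per_l, contact_time_min), ct_per_log
    )
    return achieved >= required_log


if __name__ == "__main__":
    ct = ct_value(1.0, 24.0)  # 1.0 mg/L residual held for 24 minutes
    print("CT =", ct, "mg*min/L")
    print("log inactivation =", log_inactivation(ct))
    print("meets 2-log requirement:", meets_requirement(1.0, 24.0, 2.0))
```

In practice the placeholder constant would be replaced by a lookup in the applicable regulatory table, and the required log inactivation would depend on the source water category.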
While performance standards do help to improve the quality of drinking water, authorities are required to determine performance standards and to allocate performance credits for the different pathogens of concern, and for different technologies used by water treatment utilities. Indeed, as new pathogens are discovered, the removal capacity of treatment technologies and regulations regarding them must be determined for these newly recognized threats. Also, the water industry is evolving rapidly to address the needs of water treatment facilities, and new technologies are being developed to treat a wider range of pathogens. Each new technology is accompanied by a significant level of uncertainty given the relatively low level of experience with these technologies and the lack of knowledge regarding the new microorganisms that they are to treat. Tools such as risk assessment and reliability analysis have been developed to address such uncertainties. Some methods aim at quantifying uncertainty and others at reducing it. In the context of emerging pathogens and evolving technologies, it is important to advance knowledge regarding these tools and their application to the water treatment industry. This is the general objective of the present work. This project aims to advance techniques for hazard identification and quantification in a water treatment context by applying these tools to two different water treatment technologies used to remove the pathogen Cryptosporidium parvum.

The remainder of this chapter provides background information motivating this work. Section 1.2 provides information regarding the pathogens of concern in drinking water, with a particular focus on Cryptosporidium parvum. Section 1.3 presents the risk and reliability tools used in the drinking water industry. Section 1.4 summarizes the current state of knowledge regarding the treatment technologies investigated in the two case studies examined in this thesis.
Finally, the objectives and outline of the thesis are presented in Sections 1.5 and 1.6, respectively.

1.2 Pathogens of concern

Although this project focuses on Cryptosporidium parvum, many other biological contaminants are a primary concern for water treatment facilities. Waterborne organisms pathogenic to humans can be viruses, bacteria, protozoa, helminths, and algae. Since water is ingested into the digestive tract, most diseases caused by pathogenic organisms are gastroenteric in nature, although some waterborne microorganisms, like the bacterium Legionella pneumophila, cause respiratory diseases. Helminths and algae differ from the more traditional organisms in their pathogenicity. Helminths, or parasitic worms, are macroorganisms in certain stages of their life and can therefore be effectively removed by conventional water filters. Algae are not directly pathogenic to humans, but some species produce endotoxins that cause poisoning in animals and that have been associated with gastroenteritis outbreaks in humans.

Vibrio cholerae, Salmonella spp. and Escherichia coli are probably the best-known bacteria that cause gastroenteritis. Numerous epidemics of each of these have been reported throughout the world at different times since they were recognized as human pathogens. However, their prevalence and virulence are lower in countries where good sanitation and good water treatment are in place than in countries where they are not. This is because chemical disinfectants like chlorine or ozone, widely used in developed countries, are highly effective at inactivating bacteria. Therefore, in many regulations, the presence of a disinfectant residual, i.e., a small concentration of disinfectant, in the distribution system is usually considered a sufficient indicator of safe water with regard to bacterial contaminants.

Less is known about viruses than about bacteria.
Because of their smaller size (< 0.1 µm), viruses could not be detected until the advent of the electron microscope. They are also harder than bacteria to capture in raw or treated water samples. Their relatively simple structure makes them difficult to identify and differentiate. Nevertheless, a few viruses have been identified as waterborne. This is the case for the poliovirus and the hepatitis viruses (A, B, C, D, E, G), which cause, respectively, poliomyelitis, a stiffness of muscles that varies from mild to severe, and hepatitis, an inflammation of the liver. Viruses can also cause gastroenteritis. Examples are the rotavirus family and the human caliciviruses. Traditional chemical disinfection can also be very effective against viruses, provided that an appropriate dosage of disinfectant is used.

Protozoa are unicellular organisms but, unlike bacteria, they have a nucleus and complex life stages. In general, they are found in the environment in a resistant stage, develop in their host, where they reproduce and sometimes cause a disease, and are then excreted back into the environment. The main protozoa of concern in drinking water are Entamoeba histolytica, Giardia lamblia and Cryptosporidium parvum, which all cause gastroenteritis to a more or less severe degree by infecting the human digestive tract. Protozoan parasites are a relatively recent concern in water treatment. For example, in the United States, Giardia lamblia was the principal target of the 1989 Surface Water Treatment Rule (SWTR) (USEPA, 1989), and Cryptosporidium parvum has only been regulated since 1998, with the advent of the Interim Enhanced Surface Water Treatment Rule (IESWTR) (USEPA, 1998) and more recently with the Long Term 2 Enhanced Surface Water Treatment Rule (LT2ESWTR) (USEPA, 2006a). Conventional filtration and disinfection are not as effective against protozoa as they are against bacteria and viruses.
Interest in protozoa grew when they were recognized as endemic human pathogens in both developing and developed parts of the world. Since then it has become clear that monitoring treated water quality or establishing treatment performance requirements based only on the type of source water (i.e., surface or groundwater) is not effective in regulating these types of pathogens.

1.2.1 Cryptosporidium parvum

Cryptosporidium parvum is a protozoan parasite with a relatively complex life cycle (Figure 1.1; Smith and Rose, 1998) that involves both sexual and asexual reproduction. In the natural environment, it is found in a resistant stage, the oocyst, and is an obligate parasite infecting epithelial cells of the intestine of its host. The Cryptosporidium parvum oocyst is spherical with a diameter of 3-7 µm. Different species of Cryptosporidium have been described since the beginning of the 20th century, when Cryptosporidium species were discovered. Cryptosporidium parvum is the predominant species infectious to humans, although other less prevalent species have also caused illnesses in humans (Chappell and Okhuysen, 2002). Cryptosporidium parvum also infects other animals, which act as reservoirs of oocysts and are believed to be important factors in the transmission of the disease. Until recently, two genotypes were recognized, one infecting only humans (genotype 1, or “H” for human) and one infecting many animals, including humans (genotype 2, or “C” for calves). Genotype 1 is now classified as a separate species, Cryptosporidium hominis (Chappell and Okhuysen, 2002). In this thesis, Cryptosporidium parvum includes both species.

Symptoms of cryptosporidiosis are non-bloody diarrhea and abdominal cramps and can include low fever, nausea and vomiting. People of all ages can be infected, although infection is more common and symptoms are more severe in children (Crittenden et al., 2005), suggesting a possible acquired immunity or resistance through exposure.
Hosts do not always develop the illness, infections being asymptomatic in these cases. To date, no antibiotic treatment for cryptosporidiosis has been approved.

Since Cryptosporidium parvum is an intestinal parasite of warm-blooded mammals, its presence is usually linked to a source of fecal contamination, like cattle ranching (Ong et al., 1996), dairy farming (Hansen and Ongerth, 1991), or municipal sewage discharge (Atherholt et al., 1998; Kistemann et al., 2002), although lower levels can still be found in relatively pristine watersheds (Hansen and Ongerth, 1991). Transport of oocysts is generally through runoff and, in the case of combined sewage discharge, through overflow during storm events. The highest concentrations in a stream are therefore generally found downstream of a source of contamination, after rainfall-runoff events (Hansen and Ongerth, 1991; Atherholt et al., 1998; Bradford and Schijven, 2002; Schijven et al., 2005). A strategy proposed to limit overland transport is the use of vegetated buffer strips along streams. It was found that maximizing infiltration of rainwater resulted in the highest removal of oocysts by the soil matrix (Atwill et al., 2002; Davies et al., 2004). Buffer zones tend to reduce water velocities, stabilize slopes and create porous soil matrices, which are all favourable to infiltration and removal of oocysts from the water flow.

Because of the small size and low density of the oocyst, electrostatic forces play an important role in explaining oocyst behaviour in water. It was found that oocysts in natural environments have a negatively charged surface (Ongerth and Pecoraro, 1996). Hence, they tend to travel freely in natural water bodies, repelled by other particles, which are also negatively charged (Medema et al., 1998; Dai and Boll, 2003).
Nevertheless, attached oocysts, which tend to settle more readily than unattached oocysts, can be found in the presence of organic material and microbial flocs like those found in wastewater treatment plant effluents (Medema et al., 1998).

Oocysts are believed to be resistant to most conditions encountered in the environment. Neither pH variation, temperature variation, nor natural decay achieves 100% inactivation of oocysts (Robertson et al., 1992). It is unlikely that natural pH variations provide any inactivation, and a fraction of oocysts can also survive freezing for at least a month. UV light was also proposed as a mechanism for inactivation in natural environments, but its effect was found to be negligible (Antenucci et al., 2005; Brookes et al., 2005), mostly because of light attenuation by organic matter, particles, and canopy. A natural decay over time has been observed in environmental settings. Since long-term sources of oocysts, such as oocysts shielded by cow feces (Schijven et al., 2005), seem to be present in most watersheds studied, the assumption that these oocysts pose a threat to human health is reasonable.

Because of asymptomatic infections and low reporting of the disease, endemic levels are difficult to estimate. Yet, antibodies to Cryptosporidium parvum are found in 30 to 50% of the population of the United States and in 60 to 85% of the population of the developing world (Crittenden et al., 2005), suggesting that most people have been infected at least once in their lifetime. Numerous waterborne outbreaks of cryptosporidiosis have been documented. In North America, the most famous outbreaks are the Milwaukee outbreak of 1993, in Wisconsin, United States, and the North Battleford outbreak of 2001, in Saskatchewan, Canada.
1.3 Risk and reliability in the water industry

A new approach to regulation and design of treatment plants was introduced when it became obvious that not all water treatment facilities were effective against pathogens such as Cryptosporidium parvum (USEPA, 2006a). For those systems where higher levels of contamination exist, improved protection is required to reach an acceptable level of contaminants in the finished water. Because it is not currently economically or technologically feasible to determine concentrations of Cryptosporidium parvum in finished drinking water (USEPA, 2006a), treatment performance requirements for this protozoon are based on health targets and raw water concentration. Quantitative Microbial Risk Assessment (QMRA) is the tool that is used to determine treatment goals. One of the objectives of this project is to further develop QMRA by integrating it with reliability analysis.

1.3.1 Quantitative microbial risk assessment

QMRA is the application of principles of risk assessment to evaluate the consequences of an exposure to microbial infectious agents. It involves the identification of information regarding the presence of pathogenic agents in the immediate environment of the host of concern, the relationship between the dose of pathogens and the response of the host, and the hazards created by the infectious agents. QMRA may be used to address a range of objectives. For example, one might want to know the risk placed on a population exposed to a certain concentration of pathogens through consumption of food or drinking water. It is also possible to use QMRA to estimate the level of infectious agents to which a population was exposed given an observed rate of infection in an outbreak. As described by Smeets (2008), a utility might use QMRA to estimate whether it is complying with the health targets set by its health authorities. 
It may also be used to compare normal endemic levels of infection, when the treatment plant is operating within its limits, to special events where adverse conditions (e.g., source water contamination or treatment failure) have led to high contamination of drinking water for a short period of time. Other uses are: setting operational and critical limits to ensure the microbial safety of the water produced, designing monitoring programs that will verify the treatment performance of a system, preparing corrective actions when the water utility does not comply with the health targets, and comparing different treatment alternatives in the design phase of a facility (Smeets, 2008). The process of QMRA consists of four steps: hazard identification, exposure assessment, dose-response assessment, and risk characterization. The hazard is always defined as the consequences of exposure to a pathogen. Therefore, the first step consists of developing a description of the pathogen of concern, identifying its different infection routes, its effect on human health, and the treatment available for the disease. It is also at this step that the endpoint of the risk assessment can be chosen. The endpoint is the outcome measure of the risk assessment. Different measures have been proposed and applied over the years, including the number of infections, symptomatic or asymptomatic (Rose et al. 1991; Haas et al. 1996; Teunis and Havelaar 2002), the number of fatalities in a year (Murray et al. 2006), and intermediate measures such as the number of illnesses in a year (Perz et al. 1998; Makri et al. 2004) or disability-adjusted life years (DALYs) (Havelaar et al. 2000; Pruss et al. 2002; Ashbolt 2004). The choice of an endpoint depends on many factors, including the goal of the QMRA study and the knowledge and data available to evaluate these endpoints. Exposure assessment involves quantifying the doses of pathogens to which people are exposed. 
In drinking water applications, the dose is a function of the concentration of pathogens and the volume of water ingested. Because of the high sample volumes required to detect Cryptosporidium parvum or other pathogens present at very low levels, the concentration in finished water cannot simply be measured (Haas et al. 1996) but must be estimated based on the concentration of pathogens in raw water and the reduction achieved by treatment. Although necessary, the evaluation of treatment performance introduces uncertainty in the risk assessment. It is the most important source of uncertainty according to a study by Teunis et al. (1997). The next step, dose-response assessment, characterizes the relationship between the dose of pathogens to which one is exposed and the response in terms of the endpoint selected in the hazard identification step. Different models, such as the log-normal model, the single-hit exponential model and the Beta-distributed “infectivity probability” model (Haas, 1983), now known as the Beta-Poisson model, have been proposed and fitted to many pathogen infection data sets to model this relationship. Finally, the risk characterization step integrates the preceding steps as a means of determining whether the risk calculated is a threat to public health, taking into account the uncertainties and variability of the different inputs to the model used. Application of QMRA in the drinking water industry was initiated by Haas (1983), although it was preceded by important work on developing dose-response relationships (Rentdorff, 1954; Furumoto and Mickey, 1967; Gifford and Koch, 1969) and on identifying the impact of the pathogen suspension used in volunteer and animal feed studies (Worcester, 1954; Chang, 1958). The major developments in the application of QMRA for drinking water took place during the 1980s and 1990s (Rose et al., 1991; Haas et al., 1996; Teunis et al., 1997). Haas et al. 
(1999) summarized the state of knowledge of QMRA at the time, reported on the microbial world, the different steps involved in a risk assessment, and the models used in the process of QMRA, and compiled previously published data regarding dose-response relationships. Since then, QMRA has been used to develop technical standards (USEPA, 1989, 1998, 2006a) for water treatment plants. A major shortcoming of QMRA is that it is data intensive. Also, it requires setting an acceptable level of risk as a health target, which is not always a straightforward task because it involves social, technical and economic factors. Lechevallier and Buckley (2007) provide a recent and valuable summary of the current state of knowledge regarding QMRA, including the determinants of acceptable risk. In addition, although the use of QMRA quantifies the variability of treatment process performance, it does not try to identify the causes of this variability. For example, using QMRA, one might identify the cause of a high health risk to be the low performance of the chemical disinfection process, i.e., low CT values. Yet, the method itself cannot be used to indicate why the concentration of disinfectant (C) or the contact time (T) was too low. Is the contact tank under-designed, is there a problem with the disinfectant dosing equipment or generating equipment, or is the control system of the plant deficient? Other methods looking more directly at the treatment technologies need to be employed to mitigate the risk of infection at the operational level.

1.3.2 Methods of hazard identification in drinking water

Apart from QMRA, other qualitative and quantitative methods have been developed to conduct more or less advanced risk assessments. Some are strictly qualitative and aim only at conducting the first step of a risk assessment, namely hazard identification. 
Others are more complete and can be used to further characterize the probabilities of occurrence of different risk scenarios. Fault tree analysis (FTA), described in more detail in Section 1.3.2.2, is the tool used for hazard identification in this project.

1.3.2.1 Hazard analysis and critical control point

The most widely applied method of hazard identification is certainly the Hazard Analysis and Critical Control Point (HACCP) method. This method was developed in the late 1960s in the domain of food safety. The HACCP concept, presented in the General Principles of Food Hygiene (Codex Alimentarius Commission, 2003), is a system for identification and control of significant health hazards from food. Its first application in drinking water was undertaken in 1994 (Havelaar, 1994). Since then, it has been applied widely in the water sector (Hamilton et al. 2006). As stated by Hamilton et al. (2006), “HACCP promotes the proactive management of hazards through the identification of ‘critical control points’ where they can be monitored and reduced.” In some places, it led to improvements in the understanding of the water system, and resulted in changes in operating procedures (Hamilton, 2006). However, various limitations of the HACCP approach have been pointed out. First, it is concerned with hazards, not risks. HACCP should be part of a global risk assessment where the likelihood of the identified hazards is at least estimated. The second and probably the most important pitfall is that HACCP can be used merely to justify existing practices, rather than to identify objectively the critical control points and the best practices to manage them, regardless of what is currently implemented. It then becomes a purely theoretical exercise that does not lead to any risk mitigation. An example of this is the application of the method by Damikouka et al. 
(2007), where the results did not seem to bring any new information and the recommendations remained general.

1.3.2.2 Fault tree analysis

The FTA method is a deductive method taking the form of a vertical diagram listing all the possible events in a logical sequence leading to an undesirable consequence. At the top of the tree is the undesired event, also called the top event or system failure, and each row in the tree contains the possible causes of the events in the row above. Boolean operators (AND or OR) called gates connect events of different rows. This method is used when the possible causes of a given event of interest need to be identified. It has the advantages of providing a logical sequence of events for a system failure and of helping to identify faults that have an impact on the failure, and it is efficient at modeling a large number of events and combinations of events. However, like most qualitative methods, it also has the disadvantage of depending on the knowledge and experience of the analyst, and it requires an adequate level of detail for the desired application because large trees can rapidly become difficult to understand. Finally, fault trees are of interest because it is possible to use them to conduct what are called “quantitative evaluations”. The goal of a quantitative evaluation is to find the probability of the top event and quantify the importance of primary events on system failure using probability theory and Boolean algebra. FTA has been widely used in the power industry but, to the author’s knowledge, its application in drinking water has thus far been limited. Mercer and Hrudey (1990) were the first to apply FTA in a drinking water context. They demonstrate the use of FTA in quantitative risk assessment for an occupational hazard faced by water treatment plant operators. Démotier et al. 
(2003) provide another example of the quantitative use of FTA in a global risk assessment in which they evaluate the probability of producing water that does not comply with standards. They incorporate incomplete information and uncertainty through belief functions to perform a quantitative evaluation of the fault tree for the risk of non-compliance. Deere et al. (2001) acknowledge the importance of identifying events and paths as scenarios leading to the undesired health effect of drinking water contamination and name this the fault tree concept, referring to a fault tree built by Stevens et al. (1995) for the contamination of a water source. Risebro et al. (2007) apply FTA to the causes of waterborne outbreaks. They identify causes of outbreaks and propose a technique to quantitatively assess the importance of the contribution of each cause to waterborne outbreaks. In this project, FTA was preferred over HACCP because the undesirable event was already identified (i.e., microbial contamination of drinking water) and because the deductive approach of FTA fit the need to identify causes and paths of failure. Moreover, FTA has not been applied to examine the operational aspects of different water treatment technologies, and this work therefore presented an opportunity to advance the application of FTA for different water treatment technologies.

1.3.3 Reliability of water utilities

Uncertainty is present in many aspects of engineering, such as design, construction, operation, or monitoring. For example, in civil engineering, varying exterior loads and heterogeneous material properties lead to different structural reactions. Changing rainfall intensities can result in varying loads on a dam system. In drinking water production, a varying raw water quality and evolving plant performance produce a range of treated water qualities. 
Yet uncertainty cannot always be reduced and it is important to take it into account in the different phases of a project. As noted by Haas and Trussell (1998), with uncertain inputs, there is always a possibility that a system performs outside of its specifications. To deal with such uncertainties, the concept of reliability was introduced in civil engineering, first in structural safety, and then in many other fields. Reliability is defined as the likelihood of proper functioning of a system, called success, or non-failure. This is a quantitative definition that demands a probabilistic and/or statistical approach. Two points of view can be taken when looking at the reliability of a treatment plant: inherent reliability and mechanical reliability. As defined by Eisenberg et al. (2001), inherent reliability is “the probability that a treatment plant effluent will meet or exceed a given set of criteria” given that the treatment plant is, in general, operating properly. This definition acknowledges that the effluent quality will vary and that it might temporarily exceed one or more criteria. Also defined by Eisenberg et al. (2001), the goal of mechanical reliability is “to determine key pieces of equipment in the plant whose failure may be related to effluent quality, and to determine the probability that the facility will be functioning according to design specifications.” Analysis of mechanical reliability requires hazard identification, determining the possible faults and key components of the system, and frequency analysis, quantifying the probabilities of these faults leading to failure of the process. Although the tools presented in Sections 1.3.1 and 1.3.2 are usually considered risk assessment tools, they may be viewed through the lens of reliability. QMRA, in attempting to quantify the effluent concentration of pathogens and their consequences, may be classified as an inherent reliability method. 
On the other hand, hazard identification methods like HACCP and FTA may be considered mechanical reliability methods, where the causes of failure are identified, classified and possibly quantified. As previously mentioned, both viewpoints are used and developed in this thesis. Different reliability methods have been developed over the years for determining probabilities of failure when inputs to the system are uncertain. Monte Carlo simulation, the best known and most widely used of these, consists of sampling the uncertain inputs, determining the output of the model, and repeating this process for a large number of samples, until the probability distribution of the output is stable. Monte Carlo simulation has been used extensively in QMRA applications (Perz et al. 1998; Barbeau et al. 2000; Medema et al. 2003; Zmirou-Navier et al. 2006; Jaidi, 2007) and may be used in estimating reliability, although it has not been explicitly referred to as a reliability method. Analytical reliability methods have also been developed; they use information regarding the input probability distributions to compute analytically the reliability and its sensitivities to the input variables. The first method developed is the First-order second-moment (FOSM) method, so named because it uses second-moment information (i.e., the mean and variance) of the input variables. Because this method has very limited applications, a more advanced technique was developed, named the First-order reliability method (FORM). FORM uses marginal distributions of the random input variables and correlations among them to evaluate the probability of failure of a system. This method is used in the present study to implement a reliability approach to drinking water facilities. 
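As a rough illustration of the Monte Carlo procedure described above, the following Python sketch samples two uncertain inputs, computes the model output, and repeats the process many times. It is a minimal sketch under stated assumptions, not the model used in this thesis: the normal distributions placed on the log10 raw water concentration and on the log10 removal, their parameters, the target concentration, and the function names are all hypothetical. Because the output here is a linear combination of two independent normal variables, the reliability is also available in closed form, which gives a simple check on the simulation.

```python
import math
import random


def simulate_reliability(mu_log_raw, sd_log_raw, mu_log_removal, sd_log_removal,
                         log_target, n_samples=100_000, seed=1):
    """Monte Carlo estimate of P(log10 finished-water concentration <= log_target).

    All inputs are hypothetical: raw water concentration and treatment removal
    are both assumed normally distributed on a log10 scale and independent.
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_samples):
        log_raw = rng.gauss(mu_log_raw, sd_log_raw)              # sampled raw water quality
        log_removal = rng.gauss(mu_log_removal, sd_log_removal)  # sampled treatment performance
        log_finished = log_raw - log_removal                     # model output
        if log_finished <= log_target:                           # success = target met
            successes += 1
    return successes / n_samples


def analytical_reliability(mu_log_raw, sd_log_raw, mu_log_removal, sd_log_removal,
                           log_target):
    """Closed-form reliability for the same linear, normally distributed case."""
    mean = mu_log_raw - mu_log_removal
    sd = math.sqrt(sd_log_raw ** 2 + sd_log_removal ** 2)
    z = (log_target - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


if __name__ == "__main__":
    # Hypothetical case: about 0.1 oocysts/L in raw water, 3-log mean removal,
    # and a finished-water target of 0.001 oocysts/L.
    args = (-1.0, 0.5, 3.0, 0.5, -3.0)
    print("Monte Carlo:", simulate_reliability(*args))
    print("Closed form:", analytical_reliability(*args))
```

The closed-form check only exists because this toy case is linear and normal; with other distributions or a nonlinear model, the sampling loop still applies unchanged while the analytical shortcut does not.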
Applying FORM in this way provides a new point of view that may complement or enhance QMRA studies by providing a broader analysis of the reliability of a water treatment facility that is consistent with other disciplines of civil engineering practice. For more details regarding reliability methods, see relevant sources of information (Ang and Tang, 1984; Madsen et al. 1986).

1.4 Treatment technologies

New technologies are inherently not as widespread as conventional ones. Experience gained from long-term observation and assessment is not as abundant for such technologies. In this thesis, it is proposed that analysis of mechanical reliability may provide valuable insights in the face of such limited information, and improve our understanding of the factors that may impede the performance of these technologies, our identification of approaches for preventing or minimizing their malfunction and the data collection needs to support these approaches, and our understanding of the similarities and differences among conventional and new technologies. In the following paragraphs, the water treatment technologies investigated in the case studies for this project are described.

1.4.1 Conventional treatment

The conventional treatment train consists of a series of processes used to treat surface waters. The processes include screening, coagulation, flocculation, clarification, rapid granular filtration, and chemical disinfection. Chemicals for controlling alkalinity can also be added at different stages to control the coagulation or flocculation pH, or as a polishing step, if required. The physicochemical part of the conventional treatment train is described in Section 1.4.1. The disinfection step is discussed further in Section 1.4.3.

1.4.1.1 Pre-treatment, clarification and rapid granular filtration

Screening, coagulant dosage and mixing, and mechanical flocculation are the pre-treatment steps. The goal of pre-treatment is the conditioning of water for the subsequent processes. 
Screening removes undesired objects and large particles that could harm the system. Coagulation and flocculation are the processes that destabilize particles and aggregate them into flocs, respectively. Particles in water generally have a repulsive, negatively charged surface. Accurate dosage of coagulant neutralizes this charge (i.e., coagulation) and allows particles to aggregate when they collide (i.e., flocculation). The goal of forming flocs is to remove them more easily by clarification and filtration. Clarification is a solid-liquid separation process that removes flocs by settling or flotation. Settling takes place in basins where water velocities are low, allowing the flocs previously formed to settle. The sludge accumulating at the bottom of settling tanks is removed periodically. Flotation uses a recirculation flow, saturated with air, to create micro-bubbles that attach to flocs. The bubble-floc aggregate has a lower density than water and rises to the surface. Floating flocs are skimmed off the top of a tank and sent to a waste stream. The clarified water is then sent to the filtration step. Rapid granular filtration removes particles and pathogens present in the effluent of the clarification step by straining large particles and attaching small particles to the filter media by adsorption (e.g., through electrostatic forces and hydration). Most particles are removed by attachment to the filter media. The filter media is rinsed and cleaned of the accumulated particles by backwashing at intervals ranging from several hours to several days.

1.4.1.2 Performance testing and monitoring

The combination of pre-treatment and rapid granular filtration is usually referred to as the physicochemical removal part of the treatment train. Indeed, the performance of each of these steps is related to that of the others (e.g., filtration performance depends on clarification performance) and removal credits for pathogens are usually given to these steps as a whole. 
The standards regulating physicochemical removal are mostly based on experience and empirical data. Monitoring of pathogens, like Cryptosporidium, that pose a health threat at low concentrations is not economically or technically feasible (USEPA, 2006a). Surrogate measures are therefore used to monitor and test the performance of the physicochemical treatment train. Turbidity is the oldest and most widely acknowledged and applied surrogate for monitoring the quality of finished water. In most treatment facilities, it is continuously monitored at the effluent of every filter. In many jurisdictions, standards for physicochemical treatment performance are based on turbidity measurements. Despite the relatively long experience with this treatment train, uncertainties regarding the relation between effluent turbidity and performance remain, especially for newly regulated pathogens like Cryptosporidium. While it is now broadly known that optimized coagulation-flocculation steps are necessary for consistently removing pathogens (Ongerth and Pecoraro, 1995; Edzwald and Kelly, 1998; Huck et al. 2001), there is no direct relation between turbidity removal or effluent values and removal of oocysts. Many attempts have been made to estimate such relationships (Haas et al., 2001; Emelko et al., 2005) and, although some have been found, they are site-specific and not universally applicable. Among the other surrogates proposed, indicator microorganisms, particle monitoring and particle counts also seem to be limited by site-specificity. For more information, Emelko et al. (2005) provide a review of the different studies directed toward understanding surrogate measures for the removal of Cryptosporidium oocysts by granular media filters.
1.4.2 Low pressure membrane filtration

A membrane is a thin layer of material capable of separating particles, colloids or solutes from liquid by sieving and surface interactions when a driving force is applied to pass the water through the membrane. In drinking water treatment, this driving force is a pressure differential generated by a pump or a vacuum. Interest in low-pressure membrane filtration in the drinking water industry has grown because of new regulations regarding pathogenic microorganisms and chemical products such as disinfection by-products. Membranes have the capacity to remove both types of contaminants, i.e., pathogens and disinfection by-product precursors, which makes them a popular choice for municipal drinking water systems. A brief description of membrane technologies is provided here.

1.4.2.1 Classification of membrane technologies

Membranes can be classified based on different characteristics. The first and most important classification is based on pore size. Table 1.1 shows the different classifications and uses based on pore size. The pore size and the surface pore density (number of pores per unit of membrane surface) of a membrane determine, among other things, the pressure that will be required to drive the process and the substances that are filtered, and hence the applications for which this type of membrane is suited. Membranes can also be classified based on the material of which they are made. Different organic and inorganic materials are used for membranes. In low-pressure drinking water applications, organic membranes are generally used. Among these materials are cellulose acetate, polysulfone, polyethersulfone, polypropylene, and polyvinylidene difluoride (PVdF) (Adham et al. 2005). Membranes are also differentiated according to their geometry and the configuration of their smallest operating unit, called a module. 
Flat membrane sheets can be mounted in two types of modules: plate-and-frame modules and spiral-wound modules. Hollow-fibre membranes are found either as submerged modules or in pressurized vessels. For more details on the different types of membrane modules and configurations and their applications, see Mallevialle et al. (1996), Adham et al. (2005), and USEPA (2005). For low pressure membranes (MF/UF), a hollow-fibre module is the most widely applied configuration (USEPA, 2005). The case study in this thesis employs submerged hollow-fibre ultrafiltration membranes, and these are described in more detail here.

1.4.2.2 Submerged hollow fibre ultrafiltration membranes

Hollow fibres are tubes of a few millimetres in diameter. A hollow-fibre module consists of several hundred to a few thousand fibres bundled together. In the case of submerged hollow fibres, a few thousand fibres are attached at both ends to a permeate pipe and are submerged in a tank of the water to be treated. Several modules are arranged in parallel to form cassettes, and these cassettes are in turn arranged in parallel to form a treatment train. Multiple parallel trains are usually present in a plant. When the removal of particles and natural organic matter (NOM) is required, an ultrafiltration plant typically consists of the following steps: coagulation, flocculation, ultrafiltration, and chemical disinfection. The coagulation and flocculation steps can be omitted if the raw water contains a low concentration of NOM. Hence, ultrafiltration in a water treatment plant plays the same role that the clarification and rapid granular filtration processes play, i.e., it eliminates particles, coagulated matter, and pathogens by physical removal.

1.4.2.3 Integrity testing and monitoring

Intact ultrafiltration membranes are an absolute barrier to protozoa and bacteria (Farahbakhsh et al. 
2003), their absolute pore size of 0.1 µm being smaller than the size of these contaminants, which is on the order of >3 µm for protozoan cysts and ~1 µm for bacteria. However, breaches larger than microorganism sizes can result in the contamination of filtered water. Membrane integrity testing and monitoring are therefore critical for ensuring that the membrane system is functioning as required. Continuous integrity monitoring is based on the same principle as quality monitoring of the rapid granular filtrate, i.e., indicators of performance are continuously monitored in the end product. Turbidity, and sometimes particle counts, are monitored in the permeate, and spikes or higher-than-average values indicate a possible integrity breach. However, unlike in rapid granular filtration, performance credits are not granted based on turbidity measurements. In many jurisdictions, credits are given based on direct integrity testing. Different techniques such as pressure-decay tests, diffusive airflow tests, sonic tests or spiked integrity monitoring are used. The most widely used test is the pressure-decay test (Adham et al. 2005). Pressure-decay tests are conducted on membrane trains taken offline. Air is pressurized on the permeate side of the fibres and a valve is closed on the permeate side to maintain the air pressure. Air will flow slowly through pores and breaches for a few minutes, causing a pressure decay, and the pressure loss is measured. Based on compressible flow hydraulics, a relationship between the rate of loss of pressure and the removal achieved by the membranes is then used to estimate the performance of the train (ASTM International, 2003; USEPA, 2005).

1.4.3 Disinfection

Disinfection, usually one of the last steps in a water treatment system, is undertaken to inactivate pathogens that may still be present in the filter effluent. 
In many cases where water is not filtered, disinfection is the only process decontaminating water before it is distributed. Disinfection is not covered here in great detail as this work focuses on filtration technologies used to remove Cryptosporidium oocysts. Many disinfection facilities use free chlorine as a disinfectant, which is considered ineffective at inactivating oocysts at practical doses. A natural extension of this thesis is to apply the risk assessment tools to other chemical disinfectants or UV radiation. Crittenden et al. (2005) provide an overview of disinfection and its related issues.

1.4.3.1 Chemical disinfection

Chemical disinfection is designed to inactivate pathogens that may still be present in the filter effluent and to protect the distribution system from further recontamination and regrowth. These functions are referred to as primary and secondary disinfection, respectively. The most widely used disinfectant is free chlorine, but other chemical disinfectants such as ozone, chlorine dioxide, and chloramines are also used in different circumstances. The inactivation mechanisms of chemical disinfectants are not well understood. However, kinetic models have been fitted well to inactivation data and are widely used today (Crittenden et al. 2005). Performance monitoring of chemical disinfection is based on kinetic models that use the CT concept. The product of the concentration of disinfectant (C) and the time of contact with the infected water (T) is the dose to which pathogens are exposed. Different pathogens exhibit varying tolerances to the spectrum of disinfectants available. Therefore, a relationship between the dose (CT) and the pathogen inactivation needs to be developed for every disinfectant-pathogen pair of concern. Also, since this type of disinfection involves a chemical reaction, it is always influenced by the water temperature and usually influenced by pH. 
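The CT calculation described above can be sketched in Python. This is a minimal illustration only: the first-order Chick-Watson form used to convert CT into log inactivation is one commonly used kinetic model and is an assumption here, since the text does not name a specific model, and the rate constant, the dose, and the function names are hypothetical values chosen for the example. Temperature and pH effects, which would change the rate constant in practice, are ignored.

```python
import math


def ct_value(concentration_mg_per_l, contact_time_min):
    """Disinfectant dose CT in mg*min/L: concentration (C) times contact time (T)."""
    return concentration_mg_per_l * contact_time_min


def log_inactivation(ct, rate_constant):
    """log10 inactivation from CT, assuming first-order Chick-Watson kinetics.

    Assumed model: ln(N/N0) = -k * C * T, so log10(N0/N) = k * CT / ln(10).
    The rate constant k (L/(mg*min)) is specific to each disinfectant-pathogen
    pair and water condition; any value used here is hypothetical.
    """
    return rate_constant * ct / math.log(10)


if __name__ == "__main__":
    ct = ct_value(0.5, 60.0)           # 0.5 mg/L residual held for 60 min
    k = math.log(10) / 10.0            # hypothetical: 1 log per 10 mg*min/L
    print("CT =", ct, "mg*min/L")
    print("log10 inactivation =", log_inactivation(ct, k))
```

In this form, a low computed inactivation only shows that C, T, or both were too low; it does not indicate which piece of equipment or which operating decision was responsible.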
Because each disinfectant-pathogen pair and set of water conditions must be characterized, there is a great deal of literature published regarding chemical disinfection processes. For a summary, see Crittenden et al. (2005).

1.4.3.2 Ultraviolet radiation

A short review of UV radiation is provided here. For more information, see the guidance manual prepared by USEPA (2006b) and other relevant papers (Clancy et al. 2000; Mackey et al. 2002; Qian et al. 2004; Hijnen et al. 2006; Mamane and Linden 2006; Caron et al. 2007). Interest in UV radiation for drinking water application grew when disinfection by-products of chemical disinfectants were discovered, because UV radiation does not form such by-products. It became even more popular recently, with the discovery of its capacity to inactivate protozoan parasites more effectively than conventional disinfectants, and its popularity is expected to grow in the next decade (USEPA, 2006b). UV radiation inactivates pathogens by causing a photochemical reaction that modifies essential components of the microorganisms’ DNA, eliminating their infectivity or capacity to reproduce. The effectiveness of UV radiation is related to the intensity (I) of the light to which pathogens are exposed, and the time of exposure (T). The product of these, IT, similar to the CT concept in chemical disinfection, is called fluence, and is usually expressed in mJ/cm². The relationship between the fluence and the inactivation of a pathogen is determined experimentally using a collimated beam apparatus, which exposes microorganisms to a known UV intensity for a given time. Clancy et al. (2000) provide a complete description of the methodology used in such experiments. However, due to the multiple streamlines in a full-scale UV reactor, pathogens are exposed to a distribution of doses. The performance of a full-scale reactor usually requires validation by bioassay, as described by Mackey et al. (2002). 
Briefly, in a bioassay, the infectivity of a pathogen is determined experimentally before and after pathogenic organisms pass through the full-scale reactor. Given these data, inactivation is computed by comparing pre- and post-UV infectivity, and the reduction equivalent dose (RED) delivered by the reactor, i.e., the dose in a collimated beam apparatus equivalent to the computed reduction, is then determined. During operation, it is also important to consider factors such as transmittance of the water and presence of particles and aggregates of particles. Mamane and Linden (2006) and Caron et al. (2007) demonstrate that particulate matter interferes with the dose received by pathogens. Other factors influencing the dose received by pathogens are lamp aging, possible fouling of the lamp surface, and the consistency of the incoming power supply. Finally, it is important to note that UV radiation, just like chemical disinfection, does not produce, at any given dose, the same level of inactivation for all pathogens. Hijnen et al. (2006) provide a review of the inactivation of pathogens achieved by UV systems.

1.5 Research objectives

The objectives of this thesis are:
• To explore and gain insight into the application of the concepts of hazard, risk, and reliability in the context of the microbial quality of drinking water;
• To further develop the tools used in risk and reliability assessment of drinking water facilities;
• To identify technical and operational hazards for new and conventional physicochemical processes; and
• To quantitatively assess the reliability of new and conventional physicochemical processes.

The first objective of this thesis, apart from being my personal objective and motivation to undertake this degree, is important for stakeholders of the drinking water industry. Not all stakeholders readily understand the concepts of hazard, risk, and reliability.
I think a better understanding and identification of microbial quality, technical and operational hazards, risk of infection, and reliability of treatment systems will improve stakeholders' awareness of the challenges operators face in everyday operation of a treatment facility, of the issues faced by engineers in the design phase of a facility, and of the motivation underpinning new drinking water regulations. Some tools (FTA, QMRA, and reliability analysis) used in the drinking water industry and in other engineering disciplines, as presented earlier, can be employed in reaching these objectives. Further developing and exploring the application of these tools to real cases is in itself an important contribution and constitutes the second objective of this thesis. Improving the protection of public health in the context of emerging pathogens and evolving technologies involves reducing the risk of infection and improving the reliability of a treatment facility. Identifying technical and operational hazards and suggesting strategies to mitigate them is the first step toward improving the protection of public health. These strategies can involve, for example, new watershed management plans, changes in operating procedure, or use of new technologies. It is also valuable to look at ways to quantify the effect of different mitigation approaches. Hence, the third and fourth objectives of this thesis are to identify technical and operational hazards for conventional and new treatment technologies and to quantify their reliability using operational data.

1.6 Thesis outline

This thesis is manuscript-based. It is composed of two manuscripts that will be submitted for publication in peer-reviewed journals. Chapter 1 provided an introduction to the concepts of hazard, risk, reliability, microbial water quality, and water treatment important to the reader as a foundation for understanding the manuscripts.
Chapter 2 establishes the FTA approach as a tool for identifying technical hazards and mitigating risks of infection through drinking water contamination at the operational level, and provides a systematic approach for extracting and organizing information on water treatment systems. The method is applied to two different physicochemical treatment systems, a conventional treatment train and an ultrafiltration treatment system, to provide a qualitative assessment of the reliability of conventional and new technologies. In Chapter 3, a quantitative reliability analysis is undertaken, providing new insights on the previously developed QMRA method, and using membrane plant operational data to provide a basis for estimating the reliability of such treatment systems. The importance of uncertainties in the model inputs is characterized and means for improving our understanding of treatment process reliability are suggested. Finally, Chapter 4 summarizes the findings and limitations of the methods proposed, and discusses future work that should be conducted to address the issues raised by this study.

Table 1.1: Membranes in drinking water

Classification          Pore size* (µm)    Operating pressure* (kPa)
Microfiltration (MF)    0.1                50 - 500
Ultrafiltration (UF)    0.01               50 - 500
Nanofiltration (NF)     0.001              500 - 1500
Reverse osmosis (RO)    Non-porous         5000 - 8000

Drinking water applications: desalinization (monovalent ions); softening (multivalent ions); removing disinfection byproducts and precursors; removing particles and microorganisms (replacing part of typical train).
X  X  X  X  X  X  X  X  X
*Can vary beyond these ranges.

Figure 1.1: Life cycle of Cryptosporidium parvum (Smith and Rose, 1998)

1.7 References

Adham, S., Chiu, K.-p., Gramith, K. and Oppenheimer, J. (2005). Development of a Microfiltration and Ultrafiltration Knowledge Base, AWWA Research Foundation, Denver, 179 p.
Ang, A. H.-S. and Tang, W. H. (1984).
Probability Concepts in Engineering Volume II: Decision, Risk and Reliability, John Wiley & Sons, Inc.
Antenucci, J. P., Brookes, J. D. and Hipsey, M. R. (2005). "A simple model for quantifying Cryptosporidium transport, dilution, and potential risk in reservoirs" Journal of the American Water Works Association, 97, (1), 86-93.
Ashbolt, N. J. (2004). "Risk analysis of drinking water microbial contamination versus disinfection by-products (DBPs)" Toxicology, 198, 255-262.
ASTM International (2003). Standard Practice for Integrity Testing of Water Filtration Membrane Systems, ASTM International, D 6908 - 03, 1-14.
Atherholt, T. B., LeChevallier, M. W., Norton, W. D. and Rosen, J. S. (1998). "Effect of rainfall on Giardia and Crypto" Journal of the American Water Works Association, 90, (9), 66-80.
Atwill, E. R., Hou, L., Karle, B. M., Harter, T., Tate, K. W. and Dahlgren, R. A. (2002). "Transport of Cryptosporidium parvum Oocysts through Vegetated Buffer Strips and Estimated Filtration Efficiency" Applied and Environmental Microbiology, 68, (11), 5517-5527.
Barbeau, B., Payment, P., Coallier, J., Clément, B. and Prévost, M. (2000). "Evaluating the Risk of Infection from the Presence of Giardia and Cryptosporidium in Drinking Water" Quantitative Microbiology, 2, (1), 37.
Bradford, S. A. and Schijven, J. (2002). "Release of Cryptosporidium and Giardia from Dairy Calf Manure: Impact of Solution Salinity" Environmental Science and Technology, 36, (18), 3916-3923.
Brookes, J. D., Hipsey, M. R., Burch, M. D., Regel, R. H., Linden, L. G., Ferguson, C. M. and Antenucci, J. P. (2005). "Relative Value of Surrogate Indicators for Detecting Pathogens in Lakes and Reservoirs" Environmental Science and Technology, 39, (22), 8614-8621.
Caron, E., Chevrefils Jr, G., Barbeau, B., Payment, P. and Prévost, M. (2007). "Impact of microparticles on UV disinfection of indigenous aerobic spores" Water Research, 41, 4546-4556.
Chang, S. L., Berg, G., Busch, K. A., Stevenson, R. E., Clarke, N.
A. and Kabler, P. W. (1958). "Application of the “Most Probable Number” Method for Estimating Concentrations of Animal Viruses by the Tissue Culture Technique" Virology, 6, 27-42.
Chappell, C. L. and Okhuysen, P. C. (2002). "Cryptosporidiosis" Current Opinion in Infectious Diseases, 15, 523-527.
Clancy, J. L., Bukhari, Z., Hargy, T. M., Bolton, J. R., Dussert, B. W. and Marshall, M. M. (2000). "Using UV to inactivate Cryptosporidium" Journal of the American Water Works Association, 92, (9), 97-104.
Codex Alimentarius Commission (2003). Recommended international code of practice: General principles of food hygiene, Food and Agriculture Organization, CAC/RCP 1-1969, Rev. 4-2003, 31p.
Crittenden, J., Trussel, R. R., Hand, D. W., Howe, K. J. and Tchobanoglous, G. (2005). Water treatment principles and design, John Wiley & Sons, Inc., Hoboken, N.J., 1948 p.
Dai, X. and Boll, J. (2003). "Evaluation of Attachment of Cryptosporidium parvum and Giardia lamblia to Soil Particles" Journal of Environmental Quality, 32, (1), 296-304.
Damikouka, I., Katsiri, A. and Tzia, C. (2007). "Application of HACCP principles in drinking water treatment" Desalination, 210, (1-3), 138-145.
Davies, C. M., Ferguson, C. M., Kaucner, C., Krogh, M., Altavilla, N., Deere, D. A. and Ashbolt, N. J. (2004). "Dispersion and Transport of Cryptosporidium Oocysts from Fecal Pats under Simulated Rainfall Events" Applied and Environmental Microbiology, 70, (2), 1151-1159.
Deere, D. A., Stevens, M., Davison, A., Helm, G. and Dufour, A. (2001) "Management Strategies" in: Water Quality Guidelines, Standards and Health: Assessment of risk and risk management for water-related infectious disease, IWA Publishing, 257-288.
Démotier, S., Denoeux, T. and Schon, W. (2003) "Risk assessment in drinking water production using belief functions" in: Symbolic and Quantitative Approaches to Reasoning with Uncertainty, Proceeding, Springer-Verlag Berlin, Berlin, 319-331.
Edzwald, J. K. and Kelly, M. B. (1998).
"Control of Cryptosporidium: from reservoirs to clarifiers to filters" Water Science and Technology, 37, (2), 1-8. Eisenberg, D., Soller, J., Sakaji, R. and Olivieri, A. (2001). "A methodology to evaluate water and wastewater treatment plant reliability" Water Science and Technology, 43, (10), 91-99. Emelko, M. B., Huck, P. M. and Coffey, B. M. (2005). "A review of Cryptosporidium removal by granular media filtration" Journal of the American Water Works Association, 97, (12), 101-115. Farahbakhsh, K., Adham, S. S. and Smith, D. W. (2003). "Monitoring the Integrity of LowPressure Membranes" Journal of the American Water Works Association, 95, (6), 95-107. Federal-Provincial-Territorial Committee on Drinking Water (2004). Guidelines for Canadian Drinking Water Quality: Supporting Documentation — Protozoa: Giardia and Cryptosporidium, Water Quality and Health Bureau - Healthy Environments and Consumer Safety Branch, Furumoto, W. A. and Mickey, R. (1967). "A Mathematical Model for the Infectivity-Dilution Curve of Tobacco Mosaic Virus: Experimental Tests" Virology, 32, 224-233. Gifford, G. E. and Koch, A. L. (1969). "The Interferon Dose Response Curve and its Possible Significance" Journal of Theoretical Biology, 22, 271-283. Haas, C. N. (1983). "Estimation of Risk Due to Low Doses of Microorganisms: a Comparison of Alternative Methodologies" American Journal of Epidemiology, 118, (4), 573-582. Haas, C. N., Crockett, C. S., Rose, J. B., Gerba, C. P. and Fazil, A. M. (1996). "Assessing the Risk Posed by Oocysts in Drinking Water" Journal of the American Water Works Association, 88, (9), 131-136. Haas, C. N. and Trussell, R. R. (1998). "Frameworks for assessing reliability of multiple, independent barriers in potable water reuse" Water Science and Technology, 38, (6), 1-8. Haas, C. N., Rose, J. B. and Gerba, C. P. (1999). Quantitative microbial risk assessment, Wiley, New York, x, 449 p. Haas, C. N., French, K., Finch, G. R. and Guest, R. K. (2001). 
Data Review on the Physical/Chemical Removal of Cryptosporidium, Foundation, A. R., 96p.
Hamilton, P. D., Gale, P. and Pollard, S. J. T. (2006). "A commentary on recent water safety initiatives in the context of water utility risk management" Environment International, 32, (8), 958-966.
Hansen, J. S. and Ongerth, J. E. (1991). "Effects of Time and Watershed Characteristics on the Concentration of Cryptosporidium Oocysts in River Water" Applied and Environmental Microbiology, 57, (10), 2790-2795.
Havelaar, A. H. (1994). "Application of HACCP to drinking water supply" Food Control, 5, (3), 145-152.
Havelaar, A. H., Hollander, A. E. M. D., Teunis, P. F. M., Evers, E. G., Kranen, H. J. V., Versteegh, J. F. M., Koten, J. E. M. V. and Slob, W. (2000). "Balancing the Risks and Benefits of Drinking Water Disinfection: Disability Adjusted Life-Years on the Scale" Environmental Health Perspectives, 108, (4), 315-321.
Hijnen, W. A. M., Beerendonk, E. F. and Medema, G. J. (2006). "Inactivation credit of UV radiation for viruses, bacteria and protozoan (oo)cysts in water: A review" Water Research, 40, 3-22.
Hrudey, S. E. and Hrudey, E. J. (2007). "Published Case Studies of Waterborne Disease Outbreaks—Evidence of a Recurrent Threat" Water Environment Research, 79, (3), 233-245.
Huck, P. M., Coffey, B. M. and O'Melia, C. R. (2001). Filter Operation Effects on Pathogen Passage [Project #490], Denver.
Jaidi, K. (2007). Développement d'un modèle d'analyse des risques microbiologiques (QMRA) permettant le choix de combinaisons de procédés les plus sécuritaires, Génie civil, géologique et des mines, École Polytechnique de Montréal: Montréal. Maîtrise Ès Sciences Appliquées: 237p.
Kistemann, T., Claßen, T., Koch, C., Dangendorf, F., Fischeder, R., Gebel, J., Vacata, V. and Exner, M. (2002). "Microbial Load of Drinking Water Reservoir Tributaries during Extreme Rainfall and Runoff" Applied and Environmental Microbiology, 68, (5), 2188-2197.
LeChevallier, M. W. and Buckley, M.
(2007). Clean water: What is acceptable microbial risk? Microbiology, A. A. o., Washington, D.C., 18p.
Mackey, E. D., Hargy, T. M., Wright, H. B., Malley Jr, J. P. and Cushing, R. S. (2002). "Comparing Cryptosporidium and MS2 bioassays- implications for UV reactor validation" Journal of the American Water Works Association, 94, (2), 62-69.
Madsen, H. O., Krenk, S. and Lind, N. C. (1986). Methods of Structural Safety, Prentice-Hall, Englewood Cliffs, 403.
Makri, A., Modarres, R. and Parkin, R. (2004). "Cryptosporidiosis Susceptibility and Risk: A Case Study" Risk Analysis, 24, (1), 209-220.
Mallevialle, J., Odendaal, P. E. and Wiesner, M. R. (1996). Water Treatment Membrane Processes, McGraw-Hill.
Mamane, H. and Linden, K. G. (2006). "Impact of particle aggregated microbes on UV disinfection. II: Proper absorbance measurement for UV fluence" Journal of Environmental Engineering, 132, (6), 607-615.
Medema, G. J., Schets, F. M., Teunis, P. F. M. and Havelaar, A. H. (1998). "Sedimentation of Free and Attached Cryptosporidium Oocysts and Giardia Cysts in Water" Applied and Environmental Microbiology, 64, (11).
Medema, G. J., Hoogenboezem, W., Veer, A. J. v. d., Ketelaars, H. A. M., Hijnen, W. A. M. and Nobel, P. J. (2003). "Quantitative risk assessment of Cryptosporidium in surface water treatment" Water Science and Technology, 47, (3), 241-247.
Mercer, S. M. and Hrudey, S. E. (1990). "Demonstration of Quantitative Risk Assessment for a Municipal Water Treatment Plant Chlorination Process", Biennial Environmental Specialty Conference, Hamilton, Ont., May 1990. Canadian Society for Civil Engineering.
Murray, R., Uber, J. G. and Janke, R. (2006). "Model for estimating acute health impacts from consumption of contaminated drinking water" Journal of Water Resources Planning and Management, 132, (4), 293-299.
Ong, C., Moorehead, W., Ross, A. and Isaac-Renton, J. L. (1996). "Studies of Giardia spp. and Cryptosporidium spp.
in Two Adjacent Watersheds" Applied and Environmental Microbiology, 62, (8), 2798-2805.
Ongerth, J. E. and Pecoraro, J. P. (1995). "Removing Cryptosporidium using multimedia filters" Journal of the American Water Works Association, 87, (12), 83-89.
Ongerth, J. E. and Pecoraro, J. P. (1996). "Electrophoretic Mobility of Cryptosporidium oocysts and Giardia cysts" Journal of Environmental Engineering, 122, (3), 228-231.
Perz, J. F., Ennever, F. K. and Blancq, S. M. L. (1998). "Cryptosporidium in Tap Water: Comparison of Predicted Risks with Observed Levels of Disease" American Journal of Epidemiology, 147, (3), 289-301.
Pruss, A., Kay, D., Fewtrell, L. and Bartram, J. (2002). "Estimating the burden of disease from water, sanitation, and hygiene at a global level" Environmental Health Perspectives, 110, (5), 537-542.
Qian, S. S., Donnelly, M., Schmelling, D. C., Messner, M., Linden, K. G. and Cotton, C. (2004). "Ultraviolet light inactivation of protozoa in drinking water: a Bayesian meta-analysis" Water Research, 38, (2), 317-326.
Rentdorff, R. C. (1954). "The Experimental Transmission of Human Intestinal Protozoan Parasites: II. Giardia lamblia Cysts Given in Capsules" American Journal of Hygiene, 59, 209-220.
Risebro, H. L., Doria, M. F., Andersson, Y., Medema, G., Osborn, K., Schlosser, O. and Hunter, P. R. (2007). "Fault tree analysis of the causes of waterborne outbreaks" Journal of Water and Health, 05, (Supplement 1), 1-18.
Robertson, L. J., Campbell, A. T. and Smith, H. V. (1992). "Survival of Cryptosporidium parvum Oocysts under Various Environmental Pressures" Applied and Environmental Microbiology, 58, (11), 3494-3500.
Rose, J. B., Haas, C. N. and Regli, S. (1991). "Risk Assessment and Control of Waterborne Giardiasis" American Journal of Public Health, 81, (6), 709-713.
Schijven, J. F., Bradford, S. A. and Yang, S. (2005). "Release of Cryptosporidium and Giardia from Dairy Cattle Manure: Physical Factors" Journal of Environmental Quality, 33, 1499-1508.
Smeets, P. (2008). Stochastic modelling of drinking water treatment in microbial risk assessment, Civil Engineering, Delft University of Technology: Delft. Doctor: 203p.
Smith, H. V. and Rose, J. B. (1998). "Waterborne Cryptosporidiosis: Current Status" Parasitology Today, 14, (1), 14-22.
Stevens, M., McConnell, S., Nadebaum, P. R., Chapman, M., Ananthakumar, S. and McNeil, J. (1995). "Drinking water quality and treatment requirements: a risk-based approach" Water, 22, 12-16.
Teunis, P. F. M., Medema, G. J., Kruidenier, L. and Havelaar, A. H. (1997). "Assessment of the risk of infection by Cryptosporidium and Giardia in drinking water from a surface water source" Water Research, 31, (6), 1333-1346.
Teunis, P. F. M. and Havelaar, A. H. (2002). "Risk assessment for protozoan parasites" International Biodeterioration and Biodegradation, 50, (3-4), 185-193.
United States Environmental Protection Agency (1989). Drinking Water; National Primary Drinking Water Regulations; Filtration, Disinfection; Turbidity, Giardia Lamblia, Viruses, Legionella and Heterotrophic Bacteria; Final Rule, 40 CFR Parts 141 and 142, Federal Register, Vol. 54, No. 124, 27486-27541.
United States Environmental Protection Agency (1998). National Primary Drinking Water Regulations: Interim Enhanced Surface Water Treatment Rule; Final Rule, 40 CFR Parts 141 and 142, Federal Register, Vol. 63, No. 241, 69478-69521.
United States Environmental Protection Agency (2005). Membrane Filtration Guidance Manual, Office of Water.
United States Environmental Protection Agency (2006a). National Primary Drinking Water Regulations: Long Term 2 Enhanced Surface Water Treatment Rule; Final Rule, 40 CFR Parts 9, 141, and 142 [EPA–HQ–OW–2002–0039; FRL–8013–1] RIN 2040-AD37, Federal Register, Vol. 71, No. 3, 654-786.
United States Environmental Protection Agency (2006b). Ultraviolet Disinfection Guidance Manual for the Final Long Term 2 Enhanced Surface Water Treatment Rule, Office of Water.
Worcester, J.
(1954). "How Many Organisms?" Biometrics, 10, (2), 227-234.
Zmirou-Navier, D., Gofti-Laroche, L. and Hartemann, P. (2006). "Waterborne microbial risk assessment: a population-based dose-response function for Giardia spp. (E.MI.R.A study)" BMC Public Health, 6, 122.

2 TECHNICAL HAZARD IDENTIFICATION IN WATER TREATMENT USING FAULT TREE ANALYSIS¹

Preface

This chapter is an independent contribution to the literature and is directed at advancing hazard identification and reliability tools in the water treatment industry. It focuses solely on the hazard identification step and proposes a methodology applicable to different treatment technologies. I believe the reliability of treatment technologies should be assessed at a more detailed level, i.e., mechanical reliability, than that which is usually undertaken in current quantitative risk and reliability assessments, i.e., inherent reliability. This chapter emphasizes the need to mitigate the risk of infection at the operational level. FTA is applied from this standpoint to membrane and conventional technologies.

¹ A version of this chapter will be submitted for publication. Beauchamp, N., Lence, B.J., Bouchard, C. Technical Hazard Identification in Water Treatment Using Fault Tree Analysis.

2.1 Introduction

The water industry has applied quantitative microbial risk assessment (QMRA) to approximate the infection risk caused by the presence of microbial contaminants in drinking water and thereby to develop standards and regulations for new treatment technologies (USEPA, 2006; Medema and Ashbolt, 2006). QMRA requires, among other information, data on the performance of water treatment facilities. Such data may include disinfectant residuals or measurements of removal of indicator microorganisms, which are generally monitored downstream of the treatment unit process of interest. QMRA considers water quality from the point of view of inherent reliability. As defined by Eisenberg et al.
(2001), inherent reliability is “the probability that a treatment plant effluent will meet or exceed a given set of criteria,” assuming that the treatment plant is, in general, “operating properly.” By contrast, the goal of mechanical reliability, also defined by Eisenberg et al. (2001), is “to determine key pieces of equipment in the plant whose failure may be related to effluent quality, and to determine the probability that the facility will be functioning according to design specification.” Analysis of mechanical reliability requires hazard identification, i.e., determining the possible faults and key components of the system, and frequency analysis, i.e., quantifying the probabilities of these faults leading to failure of the process. Because new technologies, such as membrane processes and UV disinfection, are not as widespread as conventional ones, the experience gained from long-term observation and assessment is not as abundant as it is for the latter. In this paper, it is proposed that analysis of mechanical reliability may provide valuable insights in the face of such limited information. Mechanical reliability analysis may improve our understanding of the factors that impede the performance of these technologies, help identify approaches for preventing or minimizing their malfunction and the data collection needed to support these approaches, and clarify the similarities and differences between conventional and new technologies. An approach for undertaking technical hazard identification, in order to support analysis of mechanical reliability, is described. This approach applies fault tree analysis (FTA) and an iterative process for eliciting fault tree components from plant operators, and is demonstrated for plants employing new and conventional water treatment technologies for removing protozoa from surface waters.
The FTA method was chosen to identify the technical hazards because it is efficient at considering many different events that can lead to a system malfunction and it allows for consideration of events of any nature (e.g., mechanical, operational, and natural). The objectives of this work are to advance FTA for use in the water industry by developing a systematic approach for extracting and organizing information regarding water treatment plant systems, and to improve our understanding of the technical hazards at the treatment plant operational level that affect the risk of infection from microbial contaminants in drinking water. To reach these objectives, two case studies of typical midsize plants treating riverine surface water are used to demonstrate the approach. The plants were chosen to provide qualitative assessments of the reliability of conventional and new physicochemical removal processes, namely, pre-treatment and clarification plus rapid granular filtration, and pre-treatment plus membrane ultrafiltration (UF). The remaining sections of this paper are organized as follows. First, information regarding the qualitative reliability methods and the pathogens of concern in drinking water is provided. Next, the steps followed to build the fault trees developed in this work are described, along with the technologies to which the approach is applied. The fault trees and related definitions for the two case studies are then presented and discussed.

2.2 Background

2.2.1 Qualitative reliability methods

Reliability is defined as the probability of success, or of non-failure. In assessing the reliability of a system with multiple components, such as a water treatment system, the reliability of the system depends on the reliability of its components. How one defines the system, its goals, its components, their relationship within the system as a whole, and their contribution to system success or failure, is therefore crucial.
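To make the dependence of system reliability on component reliability concrete, the sketch below computes the reliability of two idealized arrangements: one in which every component must succeed, and one in which redundant components back each other up. It assumes statistically independent components, which is a simplification introduced here for illustration, and the component reliabilities are hypothetical.

```python
from math import prod


def series_reliability(component_reliabilities):
    """System succeeds only if every component succeeds (independence assumed)."""
    return prod(component_reliabilities)


def parallel_reliability(component_reliabilities):
    """System succeeds if at least one redundant component succeeds."""
    return 1.0 - prod(1.0 - r for r in component_reliabilities)


# Hypothetical component reliabilities, for illustration only.
print("Series:", series_reliability([0.99, 0.95, 0.98]))
print("Parallel:", parallel_reliability([0.90, 0.90]))
```

The same components can therefore yield very different system reliabilities depending on how their relationship within the system is defined, which is why the system definition matters.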
In addition to FTA, approaches such as event-tree analysis (ETA), failure modes and effects analysis (FMEA), and the Hazard and Operability (HAZOP) method have been widely used in different industries (reviewed by Gressel and Gideon, 1991), as well as the Hazard Analysis and Critical Control Point (HACCP) method in the food industry (reviewed by Min and Min, 2006). Other example applications of these methods are cited in the analysis of power systems (Chen et al., 2007), dam safety (Hartford and Baecher, 2004), and wastewater treatment (Choudhury et al., 1992). In general, the objectives of these various techniques are to identify the possible failure modes, paths to failure, and consequences of failure, with a focus on one or more of these elements.

Application of FTA in the drinking water industry has so far been limited. Mercer and Hrudey (1990) were the first to apply FTA in a drinking water context. They demonstrate the use of FTA in quantitative risk assessment for an occupational hazard faced by water treatment plant operators. Démotier et al. (2003) provide another example of the quantitative use of a fault tree analysis in a global risk assessment in which they evaluate the probability of producing water that does not comply with standards. They incorporate incomplete information and uncertainty through belief functions to perform a quantitative evaluation of the fault tree for the risk of noncompliance. Deere et al. (2001) acknowledge the importance of identifying events and failure paths as scenarios leading to the undesired health effect of drinking water contamination and name this the fault tree concept, referring to a fault tree built by Stevens et al. (1995) for the contamination of a water source. Risebro et al. (2007) apply FTA to identify the causes of waterborne outbreaks. They also propose a technique to quantitatively assess the importance of the contribution of each cause to waterborne outbreaks.
Figure 2.1 presents an adaptation of the fault tree from Risebro et al. (2007). It can be observed from this fault tree that the conditions necessary for a waterborne outbreak to occur are generally a source of contamination, the failure of the treatment or distribution system, and the inadequate detection of or response to the drinking water contamination. In the present study, we propose to develop a fault tree for the event named “filtration failure” in Figure 2.1. FTA has not been applied to the operational aspects of different water treatment technologies and therefore there is opportunity to advance the applicability of FTA in this context and to gain insight into various filtration treatment technologies.

2.2.2 Microbial quality of drinking water

The main objectives of most modern treatment plants are to provide microbiologically and chemically safe water, with acceptable aesthetic quality, in sufficient quantity and at a relatively low cost. Providing microbiologically safe water remains the priority for most treatment facilities, especially those treating surface waters. Waterborne organisms pathogenic to humans can be viruses, bacteria, protozoa, helminths, and algae. Depending on jurisdictions and raw water quality, precise objectives for removal of these pathogens are set and performance criteria are imposed on treatment facilities. Generally, bacteria, viruses, and protozoa are the targets of regulations. Compliance with water treatment objectives is assured either by monitoring the treated water for undesired contaminants (e.g., E. coli and fecal coliforms) or by demonstration of process performance (e.g., in the case of viruses and protozoa). Interest in protozoa grew when they were recognized as endemic human pathogens in both developing and developed parts of the world. In North America and Europe, there has been a move towards establishing treatment objectives for the protozoa Giardia lamblia and Cryptosporidium parvum.
The lower efficiency of conventional filtration and disinfection against protozoa, compared with their efficiency against bacteria and viruses, created an incentive to develop new technologies to treat water for protozoa. Among these technologies, low-pressure membrane filtration has grown in popularity over the last few years. Like conventional granular filters, membrane filtration aims at physically removing pathogens. The focus of this study is on the physicochemical removal of Cryptosporidium parvum by conventional and UF plants.

2.3 Method for fault tree analysis of filtration technologies

FTA is a deductive method, developed in 1961 by Bell Laboratories and the U.S. Air Force (Ericson, 1999). It takes the form of a vertical inverted logical tree that links all the possible events in a logical sequence leading to an undesirable consequence. At the top of the tree is the undesired event, also called the top event or system failure. Each row in the tree contains the possible causes of the events of the row above it, and all rows are therefore beneath the top event, connected like branches of a tree. Boolean operators (AND or OR) called gates connect events of different rows. The OR gates link events that have a common consequence but that do not have to happen simultaneously for that consequence to happen. The AND gates link events that must happen simultaneously for the consequence to happen. At the bottom of the tree are the events initiating failure, called primary events, which are not developed further in the analysis. Primary events can be either basic events, not developed because they are part of the environment of the system and cannot be controlled, or undeveloped events, not developed further because of their limited consequence or the lack of available information related to them (Hartford and Baecher, 2004).
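The tree structure and gate logic described above can be sketched as a small data structure. In the following Python example, an event without children is treated as a primary event, and an intermediate or top event occurs according to its AND or OR gate. The class name, the event names, and the tree itself are hypothetical and are not taken from the fault trees developed in this work.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Event:
    """A node of the fault tree; a node without children is a primary event."""
    name: str
    gate: str = "OR"  # "AND" or "OR"; ignored for primary events
    children: List["Event"] = field(default_factory=list)

    def occurs(self, active: Set[str]) -> bool:
        """Return True if this event occurs given the active primary events."""
        if not self.children:
            return self.name in active
        results = [child.occurs(active) for child in self.children]
        return all(results) if self.gate == "AND" else any(results)


# Hypothetical tree, for illustration only.
top = Event("top event", "OR", [
    Event("primary event A"),
    Event("intermediate event", "AND", [
        Event("primary event B"),
        Event("primary event C"),
    ]),
])

print(top.occurs({"primary event B"}))                     # B alone
print(top.occurs({"primary event B", "primary event C"}))  # B and C together
```

In this hypothetical tree, event B alone does not trigger the top event because it sits under an AND gate with event C, whereas event A alone is sufficient because it is connected to the top event through an OR gate.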
The distinction between basic and undeveloped events is sometimes artificial and depends on the ability of the analyst to go into more or less detail. In this study, no distinction is made between the two types of primary events. A logical sequence beginning at a primary event, going through the intermediate events, and ending at the undesired top event is called a failure path. A group of primary events (or a single primary event) leading to failure is called a cutset. Minimal cutsets are cutsets in which all events are necessary for the undesired event to occur. Removing one event from a minimal cutset prevents system failure through that cutset.

This method is used when the possible causes of a given event of interest need to be identified. It has the advantages of providing a logical sequence of events for a system failure, of aiding in identifying faults that have an impact on the failure, and of being efficient at describing a large number of events and combinations of events. However, it also has the disadvantage of depending on the knowledge and experience of the analyst. It also requires identifying a level of detail that is sufficient for describing the system in the given application without becoming unduly complicated and producing a large tree that is difficult to understand or use. Finally, fault trees are of interest because it is possible to use them to conduct what are called “quantitative evaluations.” The goal of a quantitative evaluation is to find the probability of the top event and to quantify the importance of primary events for influencing system failure using probability theory and Boolean algebra. Quantitative evaluation requires estimates of the probabilities of occurrence of primary and intermediate events. Steps for developing fault trees have been described previously by Hartford and Baecher (2004). However, as previously noted, the knowledge of the analyst is very important in the construction of a fault tree.
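As an illustration of a quantitative evaluation, the sketch below computes the probability of the top event from a set of minimal cutsets and primary event probabilities. It enumerates every combination of primary event states, which is exact only under the assumption of independent primary events and is practical only for small trees. The cutsets, event names, and probabilities are hypothetical.

```python
from itertools import product


def top_event_probability(min_cutsets, probabilities):
    """Exact top-event probability by enumerating all primary-event states.

    Assumes independent primary events; only practical for small trees.
    """
    events = sorted(probabilities)
    total = 0.0
    for states in product([False, True], repeat=len(events)):
        active = {e for e, on in zip(events, states) if on}
        # The top event occurs if every event of at least one cutset is active.
        if any(set(cutset) <= active for cutset in min_cutsets):
            weight = 1.0
            for e, on in zip(events, states):
                weight *= probabilities[e] if on else 1.0 - probabilities[e]
            total += weight
    return total


# Hypothetical minimal cutsets and probabilities, for illustration only.
min_cutsets = [{"A"}, {"B", "C"}]
probabilities = {"A": 0.01, "B": 0.10, "C": 0.20}
print(top_event_probability(min_cutsets, probabilities))
```

For these hypothetical inputs the result can be checked by hand as 1 - (1 - 0.01)(1 - 0.10 × 0.20) = 0.0298. Setting the probability of one primary event to zero and recomputing gives a simple measure of how much that event contributes to the top event.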
Operators are believed to be the most knowledgeable regarding the mechanical reliability of their plant and therefore their involvement should be solicited in the development of the fault trees. The information and knowledge necessary may be elicited from plant operators using a process in which fault trees are constructed and reviewed by operators in an iterative fashion. The next subsection provides a description of the steps involved in the construction of a fault tree, based on Hartford and Baecher (2004). The following subsection describes the iterative procedure developed and undertaken in this study to solicit operator input in the construction of fault trees for the analysis of operational hazards of filtration technologies. Figure 2.2 shows how these two methodologies relate to each other.

2.3.1 Fault tree construction

2.3.1.1 Defining the system, environment and top event

The first step in every qualitative or quantitative reliability analysis is to define the system to be analyzed. This involves delimiting the spatial boundaries (i.e., the extent of physical limits) and temporal boundaries (i.e., the extent of time) of the system, as well as its objective and performance criteria. If appropriate, the system can also be divided into subsystems at this step. Subsystems must be defined in the same manner as the whole system. The fault trees developed herein identify technical hazards for a risk analysis whose general objective is the mitigation of the risk of infection due to microbial contaminants in drinking water. The trees will be constructed to identify interventions at the operational level. This will influence the subsequent steps described in the following sections and focus the results of the fault tree analysis. The environment describes the conditions under which the system will operate. The second step taken is to define the environmental conditions.
Generally, systems are exposed to various exterior “loads” (e.g., temperature, pressure, forces, contamination, and radiation). A fault tree may only be valid under certain combinations of these loads and needs to be redefined when the conditions change. Although the selection or definition of the top event seems straightforward, it requires careful consideration. The top event must not be defined too broadly, or the fault tree becomes too large and difficult to understand or use. It also has to serve the purpose that the tree plays in the overall analysis. In this work, the fault tree development is motivated by the use of filtration technologies for microbial water quality. This will be the basis of the definition of the top event.

2.3.1.2 Building the fault trees

Once the top event is selected, the next step is to identify its immediate causes. All possible immediate reasons for the top event to occur must be identified. When the analyst believes that all the causes have been identified, the same process is repeated for every cause found, i.e., the analyst searches for immediate causes of the events comprising the second row of the tree. This process is repeated until a desired level of detail is reached. This level of detail is defined based on the available information and the limits of the study. No causes are sought for an event if it is part of the environment of the system, if it cannot be controlled or if there is a lack of information or knowledge to further develop the event. As was previously noted, monitoring is not considered in the fault trees developed herein because it is part of another branch of the more general fault tree (Risebro et al., 2007) shown in Figure 2.1. Similarly, redundancy in equipment and system maintenance, although they may prevent failure events from happening, are not listed in the fault trees. Monitoring, redundancy, and maintenance generally have an influence on the frequency of the events of the fault trees.
Hence, they will become important once the fault tree is used to perform a quantitative analysis. Because it is not our objective to conduct such an analysis, we exclude these types of events from the trees. An example of how they can be used is given in Section 2.6.1.

2.3.1.3 Soliciting operator knowledge

The success of the definition steps and the process of finding causes of events from the top event to the primary events depends on the experience of the plant operators. In this study, we undertook an approach intended to maximize the transfer of knowledge and experience from operators. This approach appears to be useful in conducting such analyses for the water treatment industry because it accommodates situations where knowledge is dispersed among several stakeholders.

First, literature regarding the technology to be analyzed is reviewed, and its potential effectiveness for removing microbial contaminants in drinking water is identified. Reported modes and causes of success and failure at removing microbial contaminants under adverse environmental conditions are also identified. Second, onsite visits to the water treatment plant are undertaken during which plant operating data are collected in cooperation with plant operators. This step includes taking part in shifts with operators, reviewing and analyzing operational data, plant design documents and operation and maintenance manuals, and conducting extensive interviews with crew members. This information is valuable in defining the system, its environment, and the top event adequately.

A series of fault tree development sessions is then conducted with individual plant operators. After describing the motivations and objectives of the project and the fault tree method, the analyst and the operator determine the boundaries of the system to be studied. A simplified flow diagram of the treatment plant is used as a reference during these sessions.
The analyst asks the operator to identify the barriers of the system to microbial contamination of drinking water. Then, the analyst systematically queries the operator regarding the major components of the system, asking how each element may influence the performance of the system. Operators describe sequences of events leading to failure, operator tasks and responsibilities in preventing these events, and challenging situations when the treatment system has been stressed in the past. The closing questions to operators concern the frequency of such events. They are asked to comment on the likelihood of the occurrence of events, and to identify the most and least likely events among the ones they have listed.

Next, the analyst builds a preliminary fault tree based on each operator fault tree development session. The operators are then asked to review their corresponding fault tree to verify its content. Then, the analyst conflates the fault trees and the information collected during the literature review to build one final fault tree and asks the group of treatment plant operators to review the conflated fault tree and comment on its relevance for their plant. The final step is to incorporate these comments in the conflated fault tree.

2.4 Case studies

Fault trees are developed herein for two filtration technologies treating surface water in a Canadian context. The two plants were chosen to provide examples of the application of the approach. Although we attempted to make the fault trees general and applicable to most environments possibly encountered in municipal drinking water applications in Canada, the particular environment of a system often dictates how it is operated. The plants used as case studies do not necessarily face the same challenges as do other plants applying the same technologies.
For this reason, the fault trees are not an exhaustive list of all possible situations encountered in water treatment but a first attempt at collecting information on pathways of failure. Names and locations of plants are not revealed for confidentiality reasons. As the fault trees were developed only for the physicochemical part of the treatment plants, disinfection is only briefly described.

2.4.1 Plant A

Plant A is composed of the following processes: screening, coagulant dosing and static mixing, mechanical flocculation, submerged UF hollow fibre membrane trains, and chlorination. The physicochemical process considers the first five unit processes, i.e., from screening to UF. Pre-treatment is defined as the combination of screening, coagulant dosing and mixing, and mechanical flocculation. The objective of pre-treatment is to condition the water for optimal UF operation. Coagulation and flocculation are the processes of respectively destabilizing particles and aggregating them into flocs. Here, it is not necessary to form large flocs; pin flocs, small enough not to foul the membranes, are ideal. Charge neutralization coagulation is in use at this plant.

Membrane filtration is a physical removal process. Particles, pathogens and flocs are removed by size exclusion. Fibre walls are made of a supporting structure, which constitutes most of the thickness of the fibre, and the active layer, a skin that rejects particles and pathogens by size exclusion. Intact UF membranes are an absolute barrier to protozoan (oo)cysts and bacteria (Farahbakhsh et al., 2003), their absolute pore size of 0.1 µm being smaller than the size of contaminants, which are greater than 3 µm for (oo)cysts and approximately 1 µm for bacteria. Viruses, however, are smaller than the pore size of UF membranes and the removal of viruses is usually lower than the removal of protozoa and bacteria. Breaches larger than microorganism sizes can result in contamination of filtered water.
Membrane integrity testing and monitoring are therefore critical for ensuring that the membrane system is functioning as required. For information regarding integrity tests, see relevant sources (USEPA, 2005; ASTM International, 2003).

The UF step is defined as a system of parallel trains of submerged UF membrane modules. These modules are made of hollow fibres connected to a permeate collector pipe. The permeate, i.e., the filtered water, is collected on the inside of the hollow fibres. All modules of a train are connected to a train permeate collector. Membranes are operated in a constant flow mode, where the permeation flow is fixed according to the water demand of the distribution system. The pressure differential between the feed and permeate sides, created by vacuum pumps, varies to maintain the required flow. Integrity tests are conducted on each train when required (approximately once per week) with an automated procedure. The objective of the UF step is the removal, by sieving, of particles, including coagulated NOM, and pathogens from the water. The performance of UF membranes is assessed by the quality of the permeate (turbidity and particle counts) and by its capacity to remove pathogens (log removal) as determined by integrity tests using the standardized method described in ASTM International (2003).

Disinfection consists of the in-situ chlorine generation facility and chlorine solution reservoir, the dosage equipment, and the chlorine contact tank. Chlorine dosage is adjusted automatically based on the flow and residual concentration at the exit of the contact tank. The objective of disinfection is to inactivate pathogens not removed by previous treatment steps and to protect the distribution system against microbial recontamination.
2.4.2 Plant B

Plant B is what is referred to as a “conventional treatment train.” It is composed of the following processes: screening, coagulant dosing and static mixing, alkalinity control, mechanical flocculation, clarification by dissolved-air flotation (DAF), rapid granular filtration on a dual-media bed, UV radiation, chlorine dosing, and retention in a chlorine contact tank. Pre-treatment, clarification and granular filtration are considered the conventional physicochemical removal process. UV disinfection is an addition to what is usually considered as a conventional treatment train. The DAF clarification process also differs from the more widely used clarification process, i.e., settling.

Pre-treatment consists of screening, followed by coagulant dosing and mixing by a diffusive pump, alkalinity control, and mechanical flocculation. Its objective is similar to that of Plant A. The goal of pre-treatment is the conditioning of water for flotation, i.e., the formation of flocs. The coagulation mode used is sweep coagulation. The DAF process needs small strong flocs. Large flocs may be too heavy to float and fluffy flocs might break when contacting rising bubbles.

Following pre-treatment is clarification by DAF and filtration by rapid granular dual-media filters. The two parallel DAF units at Plant B use a recirculation flow, partially saturated with air in two pressurized saturators, to provide the micro-bubbles required for flotation. Floated flocs are skimmed off the top of the DAF tank and sent to a waste stream. The clarified water is sent to four parallel dual-media granular filters made of 500 mm anthracite and 250 mm silica sand. Filters are operated in a constant head mode, i.e., the head of the water over the filter is kept constant by a valve downstream of each filter. Large particles are removed by straining and small particles by attachment to the filter media by adsorption (e.g., electrostatic forces and hydration).
The filter media is rinsed and cleaned of the accumulated particles by backwashing at intervals of one to three days. The objective of clarification and rapid granular filtration is to remove and retain particles and pathogens present in the effluent of the flocculation tanks. The performance of the physicochemical process is usually regulated as a whole by the quality of the filtrate, where turbidity is used as a surrogate measure for the log removal performance.

Disinfection at Plant B differs slightly from that at Plant A, in that it involves the additional step of UV radiation. The sequence of treatment is UV radiation, chlorine dosing, and retention in a chlorine contact tank. The UV dose is measured by UV sensors and adjusted automatically based on flow. Chlorine dosage is adjusted automatically based on flow and manually based on the residual measured at the exit of the contact tank. The objective of disinfection is the same as in Plant A, i.e., to inactivate pathogens not removed by previous treatment steps and to protect the distribution system against recontamination.

2.5 Results of the case studies

2.5.1 Systems, environments and top events definitions

Table 2.1 and Table 2.2 provide a summary of the system, environment, and top event definitions for Plants A and B, respectively.

2.5.1.1 System definitions

In both case studies, the system considered is the physicochemical process. The fault trees are developed to mitigate the risk of infection by Cryptosporidium parvum at the operational level. The spatial boundaries of the fault trees are from screening of the water to the production of permeate or filtrate (downstream of filtration units). In the demonstration of the fault tree approach presented here, waste streams, disposal of sludge and disinfection steps are not considered. The temporal scope of the study is the duration of normal operation of the treatment plants.
Hence, events in the design phase of the facilities, manufacturing of the components, building of the plants, and initial start-up period are not developed. The objective of the physicochemical process and performance criteria for both case studies are also similar. In both cases, the objective of the physicochemical process consists of physically removing pathogens and coagulated matter from the water. Performance criteria are usually expressed as the maximum concentration of contaminants or their indicators in the filtered water. Technology performance criteria are also used, such as the removal of pathogens achieved by filters and/or disinfection. However, performance credits are not given on the same basis for UF and rapid granular filtration. For UF, integrity tests, mentioned earlier, provide a direct evaluation of the capacity of the system. For granular filters, filter effluent monitoring (turbidity or particle counts) is generally the sole indicator of treatment performance.

2.5.1.2 Environment definition

Any “load” the systems must face is considered part of their environment. The most important aspect of the physical environment to consider is raw water quality. Most drinking water facilities in Canada taking their water from surface watercourses are subject to a hydrological cycle characterized by variations in raw water quality between seasons. In summer and fall, the water quality is driven by runoff generated by rainfall. These events usually increase the turbidity, organic content and microbial contaminant content of the water. Water temperatures in late summer are generally the highest encountered, with values between 15 and 25 °C. In the winter months, accumulation of precipitation in the form of snow causes low flows and lower microbial contamination than those in the summer. Water temperatures are usually slightly above the freezing point, increasing viscosity of the water and decreasing chemical reaction rates.
Most of the accumulated snow melts during spring, creating a boost in transport of contaminants by runoff at that time. The hydrological cycle and its impact on water quality are greatly influenced by land use in the watershed. In this work, the considered water quality variations for both plants are as follows. We considered raw water with a temperature varying from almost freezing to 25 °C, a baseline turbidity between 0.2 and 2 NTU with occasional peaks at values over 10 NTU, a dissolved organic matter concentration high enough to require its partial removal by enhanced coagulation, and a baseline microbial contamination from the watershed with occasional event-driven microbial contamination from rainfall and runoff.

Another “load” to which the plants are both exposed is the demand for water. The daily demand changes over a year, with a minimum in winter and a maximum in summer, and the hourly demand fluctuates over a day, with a minimum at night and a maximum at peak use hours during the day. We considered that the plants were properly designed, i.e., that they adjust to meet these variations.

Each plant is also exposed to specific loads. UF membranes are exposed to cleaning chemicals from maintenance cleanings and permeability recovery cleanings. Maintenance cleanings are conducted with a chlorine solution once a week for approximately 15 minutes. Recovery cleanings are conducted on average three times a year using both a chlorine solution to remove organic foulants and citric acid to remove inorganic foulants. The manufacturer defines limits of exposure to these chemicals in terms of either an instantaneous limit, i.e., the maximum concentration during a cleaning, or a cumulative limit, i.e., the sum of the concentration multiplied by the time of exposure to the chemical over the lifetime of the membrane. Membranes are also exposed to another load: pressure.
Transmembrane pressure (TMP), or the difference in pressures on each side of the membrane, is the force that drives water through the hollow fibre walls. In the case of submerged hollow fibre membranes, a vacuum pump creates the pressure differential. The manufacturer specifies operational limits to TMP beyond which the integrity of the components of the system is not guaranteed. During backpulses, conducted for 30 seconds every 15 minutes, maintenance cleanings, and recovery cleanings, inside-out flows, i.e., water flowing from the inside to the outside of the fibres, are applied to the membranes. The same TMP operational limits apply during these periods.

In addition to the water quality loads defined earlier, Plant B experiences three other loads. First, the raw water alkalinity at Plant B may vary from as low as 1 mg CaCO3/L to 14 mg CaCO3/L, thus requiring adjustment for treatment. Alkalinity control is not an issue for Plant A, and is thus only considered in the fault tree of Plant B. Second, the DAF saturation system experiences considerable air pressure. The required pressure reached by the compressor and the corresponding airflow are a function of the recycle flow of water and the water temperature, and can therefore be considered as a “load” on the system. However, the saturators are operated at constant pressure and recycle flow. As the load on the system is unlikely to vary outside the ranges of acceptable pressure and flow specified by the manufacturer, it is not detailed in the present study. Finally, the backwash flows can be seen as a load applied on filters during their backwash cycle. Backwashes are conducted on every filter every 36 to 72 hours, depending on the time of year. Required backwash flow rates and durations are based on the particles to be removed from the filters and on the size and density of media. Flow must be high enough to wash out accumulated particles but must not dislodge the filter media.
2.5.1.3 Top event definition

The objective of this study is to assess the reliability of technologies used to remove Cryptosporidium oocysts from drinking water. Water treatment processes are sometimes designed to remove organics through the use of enhanced coagulation. However, the failure to physically remove organics by enhanced coagulation and membrane or granular filtration is not an adequate indication that pathogens have not been removed. Particles and pathogens may still be removed while organics are not if the coagulant dose is high enough for particle coagulation but not for NOM enhanced coagulation. Even though treatment plants have more objectives, the focus of this study is the capacity to remove the pathogen Cryptosporidium parvum. Hence, this analysis does not consider the chemical risks or aesthetic concerns. The top event is defined to reflect this focus: “Presence of unexpectedly high Cryptosporidium parvum concentrations in the filtered water.” We acknowledge the fact that physicochemical processes are not expected to remove 100% of pathogens at all times. Some oocysts may not be removed, even when both systems are operated properly. Therefore, the top event considers a minimal concentration as acceptable, but an unexpectedly high concentration as a system failure.

2.5.2 Fault trees

Figure 2.3 and Figure 2.4 present the fault trees for the UF process and for the conventional physicochemical process, respectively. Table 2.3 and Table 2.4 provide detailed information regarding the events and sources (i.e., operators or literature) where the given event is identified. It is important to note that some of the literature sources that do not refer to treatment systems identical to the case studies presented here (e.g., other types of membrane configurations, different coagulants, or other types of clarification processes) were also reviewed.
When appropriate, the observations from these references were extrapolated in order to consider every possible situation that could occur in the case study systems. For example, in the fault tree for Plant A, the occurrence of manufacturing defects has not been mentioned by plant operators, possibly because they have not experienced it, but has been reported previously in literature. Because it is possible that Plant A experiences this event in the future, it is included in the fault tree.

2.5.2.1 Cutsets

Numerous events affect the removal of Cryptosporidium parvum in both systems, with Plant B having slightly more events than Plant A. Indeed, there are 19 different primary events and 16 different minimal cutsets in the membrane fault tree and 29 different primary events and 23 different minimal cutsets in the conventional treatment fault tree. The minimal cutsets identified in each fault tree were categorized based on the process(es) where failure is initiated. For Plant A, failure may be initiated at screening, coagulation (including coagulant mixing), flocculation, and membrane filtration, and for Plant B at screening, coagulation (including coagulant mixing), flocculation (including alkalinity control), DAF, and granular filtration. The number of cutsets for each category is shown in Table 2.5. For Plant A, two failure paths, starting with the events “A component of the processes breaks into the water” and “Membranes are fouled,” can be initiated by more than one process. The total number of categorized cutsets is therefore larger than the initial number of cutsets. For Plant A, the majority of categorized cutsets relate to the membrane filtration process itself, with 13 cutsets out of 19 pertaining to filtration. For Plant B, the process with the most related cutsets is also the filtration process, although it does not represent the majority, with 8 out of 23 cutsets pertaining to filtration.
The flocculation process at Plant B has only one fewer cutset, with 7. If we consider the processes of coagulation and flocculation as a whole, it becomes the step with the most cutsets at Plant B with 10 cutsets out of 23, but not at Plant A, with 5 cutsets out of 19. In brief, there are more failure paths at Plant B than at Plant A, and a larger variety of processes initiate these paths.

2.6 Discussion

A quick glance at Figure 2.3 and Figure 2.4 allows a first general observation about the systems studied: numerous failure paths exist for both systems, with Plant B having more paths than Plant A, despite the fact that the top event is limited to microbial contamination by Cryptosporidium parvum. This highlights the often-overlooked fact that both treatment technologies are complex systems requiring constant supervision and competent operation. To consider other treatment plant objectives, such as the removal of disinfection by-product precursors, would likely result in additional possible modes of failure that would need to be prevented, and would thus reinforce this conclusion.

In Plant A, the majority of events relate to the membrane filtration processes. This comes from the fact that the processes upstream of the membrane filters have less influence on pathogen removal at Plant A in comparison with Plant B, where the processes upstream of the granular filters are responsible for the majority of the cutsets. The fact that the failure paths are distributed among the processes at Plant B but concentrated in the membrane filtration process at Plant A may also indicate that the removal at Plant B depends on the proper functioning of more processes than at Plant A. This is a manifestation of the established fact that under non-optimal coagulation-flocculation conditions, removal of oocysts by conventional treatment is hindered.
In the case of UF, according to operators, non-optimal pre-treatment conditions may cause membrane fouling, which has an immediate impact on the productivity of the system but does not reduce the removal of oocysts until extreme fouling causes an intolerable increase in TMP and damages fibres (Gijsbertsen-Abrahamse et al., 2006). Fouling can also have an impact on the aging of the membrane fibres because of the effect of exposure to abrasive fouling substances and cleaning chemicals required to recover the permeability. However, it is important to note that, for Plant A, this concern is not immediate but that it can cause problems on a long-term basis, whereas for Plant B, the coagulation-flocculation process is continuously adjusted to changes in raw water quality.

It is important to note that the number of cutsets is not an indication of the probability of failure of a process, nor is the higher number of cutsets in one fault tree an indication that this treatment system fails more frequently to achieve the required removal of Cryptosporidium oocysts. It only reflects the number of possible scenarios leading to system failure, as defined previously. The quantification of the probability of occurrence of each scenario may help in determining priorities in addressing the issues raised by the fault trees. This is one of the proposed uses of the fault trees. The following section discusses uses of the fault trees and includes examples of the data required to determine the probabilities of occurrence of events.

2.6.1 Using fault trees

The main objective of constructing a fault tree is to identify potential hazards. The fault trees constructed herein allow for hazard identification, although many hazards were already known by the operators or already found in other references, indicating that the plants studied had well-understood hazard prevention habits and that most hazards had pre-emptively been identified.
However, the fault trees summarize and assemble knowledge that might previously have been dispersed and organize it in a single logical, concise diagram, making this knowledge more accessible. The fault trees constructed herein may be used by operators and potentially by other plants in evaluating the preparedness of treatment systems to face potential harmful events. Failure paths determined by the trees can be examined individually and the current practices at a plant investigated for whether a given variable is monitored or a given reaction protocol is followed if a failure event occurs. Addressing such questions may lead to improvements in operation practices, and in the design of plants where events that may not have been considered initially by the design team may be addressed, leading to a decreased likelihood of failure in both instances.

The tree can also be employed as an educational tool for operators and engineers in training. For example, it is valuable for operators who must operate different treatment systems to realize that, for Cryptosporidium oocyst removal, the operational focus of a membrane system, i.e., the membrane filtration step, is different from the operational focus of a conventional system, i.e., the pre-treatment steps. In addition, the fault tree may be used to diagnose and identify the cause of the failure by deduction in a case where the top event, i.e., high concentration of oocysts in finished water, occurs.

Finally, fault trees may be used to prioritize preventive interventions. By ordering events by their frequency of occurrence, operators can decide to put more effort into preventing frequent events that have a significant impact on the removal of oocysts. This application of fault trees requires quantification of the probability of occurrence of the events. Potential future studies might address this task with the approach proposed in the following section.
2.6.1.1 Data to determine probabilities of occurrence

Quantification of the probability of occurrence of events may be undertaken in a number of ways. For example, Mercer and Hrudey (1990) use operational data with a Bayesian analysis to determine the probability of occurrence of an occupational hazard related to a chlorination process. Liu (2006) suggests the use of historical operational data or fatigue testing to quantify the frequency of membrane integrity breach. Risebro et al. (2007), although they do not quantify the frequency of events in their fault tree, developed a scoring system, based on expert knowledge, to quantify the importance of every factor involved in waterborne disease outbreaks. Other possible calculations can be based on manufacturers’ data, plant operational data such as the operation and maintenance logbooks, or monitoring results. Plant configuration, such as redundancy in equipment, can also be taken into consideration, as shown in the following example.

The fault trees presented here do not incorporate the possibility of preventing events or reacting to failures by monitoring, redundancy or maintenance. These three phenomena are related and are important for quantifying the probability of occurrence of the events in the fault trees. For example, given the event “Dosing equipment fails” in the conventional fault tree, suppose that a fictitious plant is equipped with three dosage pumps for its coagulant needs but uses only two at a time (having one redundant pump). Redundancy and maintenance can prevent failure. If one pump breaks and underdoses coagulant, the redundant pump is put online and replaces the broken pump. In another situation, a planned maintenance, where each pump is inspected while the other two are dosing coagulant, can detect the potential break and repair the defective part before the break happens. Such planned maintenance at a given frequency can prevent failure.
Alternatively, monitoring of the dose of coagulant added and of the resulting quality of the water by a streaming current analyser can alert the operators that one of the pumps is malfunctioning, allowing the plant to switch to the properly functioning pump. When performing a quantitative evaluation of a fault tree, maintenance, monitoring and redundancy will influence the frequency at which an event will happen. Consider a hypothetical situation. A dosage pump fails on average once every two years. This is once every 17 520 hours, or 5.71×10⁻⁵ failures per hour. Maintenance is conducted on the three pumps every week, or every 168 hours. When failure occurs, we suppose that it is detected 95% of the time and corrective actions are taken. For the other 5% of the time, it is not detected and no corrective action is taken, leading to failure. With this information, we can compute the frequency at which a failure is likely to occur. The probability that an individual pump fails in the time interval between maintenances, F(t), is given by the cumulative distribution function (CDF) of an exponential process, where L is the mean rate of failure of the pump (5.71×10⁻⁵ failures per hour) and t is the time interval between maintenance operations (168 hours).

$$F(t) = 1 - \exp(-L \times t) \tag{2.1}$$

The probability that an individual pump breaks is then 9.54×10⁻³. When a pump failure is detected (i.e., 95% of failures are detected), the defective pump is taken offline and the redundant pump is placed online. The probability that a pump breaks and is not detected is 0.05×9.54×10⁻³, i.e., 4.77×10⁻⁴. The probability that neither of the two pumps dosing coagulant fails between maintenances, P(k), is given by a binomial process, where k is the number of pump failures (0), n is the number of pumps running (2) and p is the probability of a non-detected pump failure (4.77×10⁻⁴).
n n−k P (k ) =   × p k × (1 − p ) (2.2) k  The probability of proper functioning of the equipment between maintenances is therefore 0.99905. In other words, problems with the dosage pumps will arise between maintenances 0.095% of the time, or every 7337 days. Without any monitoring, a pump failure would only be detected during maintenance (assuming that the rate of failure detection during maintenance is one). The same binomial process applies in this case, where k is 0, n is 2 and p is 9.54× 10-3. The probability of proper functioning may be calculated based on Equation 2.2 as 0.9810. In other words, undetected problems with the dosage pumps will arise between maintenances 1.9% of the time, or every 369 days. This shows how redundancy becomes more advantageous when monitoring is present. A case requiring a more complex reliability analysis based on operational data is described in the following example, where an operational parameter exceeds the range of safe operation and initiates failure. Examples in the conventional and membrane fault trees are “Rising velocities are too low” and “Permeability is too low,” respectively. In these cases, determining the failure probability is not as straightforward as presented in the previous case. For example, consider permeability, which has been studied extensively in previous literature on membrane treatment.  46  Equation 2.3 is the flux equation for membranes, where J is the flux of water permeating the membrane, in m/s, ∆P is the TMP, in Pascal, µ is the dynamic viscosity of water, in kg/m•s, km is the hydraulic resistance of the membrane and kf is the hydraulic resistance added by reversible and irreversible fouling, in m-1.  J=  ∆P µ (k m + k f  )  (2.3)  Hence, at constant flow, the TMP is a function of water viscosity and resistances. Identifying which of these factors causes failure in a particular situation is difficult because they are all contributing to the absolute value of the pressure. 
It is the combination of J, µ, km, and kf that determines the resulting TMP. Computing the probability that ∆P reaches an unbearable value requires a quantitative reliability analysis, where the variability of the inputs J, µ, km, and kf and their correlations are known. It is likely that this probability will be dynamic, i.e., it will change seasonally, as the water demand and water viscosity evolve, and throughout the life of the membranes, as reversible and irreversible fouling take place. Finally, monitoring of TMP would further reduce the frequency of occurrence of unbearable TMP, and in this example possibly eliminate it if an operational limit is respected. Also, as is the case for Plant A, the vacuum pumps can be designed to maintain lower vacuums so that the TMP does not reach an unbearable level. Obtaining data to conduct a quantitative evaluation of the fault trees is not an easy task. Generally, information should be obtained at the lowest level of the fault tree, i.e., for primary events, which gives the operators the best indication of the potential technical problems they will face. Nonetheless, monitoring of events at other points within the fault tree is also valuable in certain cases. Monitoring can prevent contamination of finished water by oocysts. Although it was not considered in the fault trees, many if not all events could be prevented given adequate monitoring. For Plant A, TMP can be monitored and operational limits programmed in the control system, so that the likelihood of the membrane collapsing because of an increase in TMP is greatly reduced. By the same reasoning, it is recommended that crew members record the exposure of membranes to chemical solutions, so that they know when the operational limits given by the manufacturer are exceeded and effective actions may be taken.
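The quantitative reliability analysis of TMP mentioned above, based on Equation 2.3, could be approached by Monte Carlo simulation. The following Python sketch illustrates the idea only. Every numerical value in it (the flux range, the viscosity range, the membrane resistance, the fouling distribution and the TMP limits) is a hypothetical placeholder and not data from Plant A, and the inputs are sampled independently, which ignores the correlations that a real analysis would need to account for. The names `tmp_pascal` and `exceedance_probability` are illustrative.

```python
import random


def tmp_pascal(flux: float, viscosity: float, k_m: float, k_f: float) -> float:
    """Equation 2.3 rearranged for the TMP: dP = J * mu * (k_m + k_f), in Pa."""
    return flux * viscosity * (k_m + k_f)


def exceedance_probability(tmp_limit: float, n_samples: int = 20_000,
                           seed: int = 1) -> float:
    """Estimate P(TMP > tmp_limit) by sampling the inputs independently."""
    rng = random.Random(seed)
    k_m = 1.0e12  # 1/m, hypothetical clean-membrane resistance
    exceedances = 0
    for _ in range(n_samples):
        flux = rng.uniform(1.0e-5, 2.0e-5)       # m/s, hypothetical demand-driven flux
        viscosity = rng.uniform(0.9e-3, 1.7e-3)  # kg/(m.s), roughly water from 25 to 1 deg C
        k_f = rng.expovariate(1.0 / 1.0e12)      # 1/m, hypothetical fouling, mean 1e12
        if tmp_pascal(flux, viscosity, k_m, k_f) > tmp_limit:
            exceedances += 1
    return exceedances / n_samples


# Hypothetical operational limits, in Pa, chosen only to show the trend.
for limit in (40_000.0, 60_000.0, 80_000.0):
    print(f"P(TMP > {limit / 1000:.0f} kPa) ~ {exceedance_probability(limit):.4f}")
```

In practice the sampling distributions would be fitted to plant records, and the seasonal and ageing effects described above would require distributions that change over time.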
For Plant B, filter run times are easily monitored and maximum run times should be determined for the range of raw water conditions experienced throughout the year, in order to prevent breakthrough. Also, because pH is an important parameter in the flocculation process, pH should be monitored in the flocculation tank and adjusted when out of the acceptable range. This would prevent events of non-optimal pre-treatment from occurring. Some events, however, cannot be economically or technically monitored. This is the case, for example, of seal malfunctions at Plant A, where continuously monitoring the state of every seal isolating the feed side from the permeate side of every membrane module is unrealistic. At Plant B, direct monitoring of the rising velocities of the particle-bubble aggregates is similarly unrealistic. Monitoring of an intermediate event is advised in these cases. At Plants A and B, turbidity and particle counts are currently monitored in the finished water of every filter, as well as between the DAF and the filtration process at Plant B. This allows for the detection of contamination caused by events that are not directly monitored. A precise diagnosis of the problem might be more difficult to attain, but the detection of a problem can at least prompt the operators to react, e.g., by taking the defective train offline until the problem is identified and repaired, thus preventing water contamination.

2.7 Conclusion
The general motivation for this study is to improve our understanding of the technical and operational factors that may impede the removal of Cryptosporidium oocysts by different physicochemical treatment technologies. The iterative approach for extracting and organizing information from water treatment plant operators was successful in identifying many technical and operational hazards.
The resulting fault trees demonstrate that operating any drinking water production system is a complex task where many factors must be taken into account. Regarding the removal of Cryptosporidium oocysts, at Plant A, most initiating events relate to the UF step (13 of 19 cutsets, Table 2.5), while the pre-treatment steps (coagulation and flocculation) are more critical at Plant B (10 of 23 cutsets). However, these findings and most failure modes identified were known by operators or previously reported in the literature. This could be explained by the comprehensive knowledge shown by the operators. Therefore, for a plant following proven design and operating methods, conducting a full-scale mechanical reliability analysis is unlikely to lead to the discovery of unexpected hazards. This is consistent with previous findings by Mercer and Hrudey (1990). The value of the fault trees resides in the fact that they summarize and compile dispersed knowledge in a single tool, and in their use as a diagnostic tool for operators and plant managers, or a design tool for the water industry. The fault trees are representations of the links between the fundamentals of water treatment at a technical level, and the impact these fundamentals might have on the operational reliability of the technologies. Quantification of the probability of events may help to prioritize interventions at the technical and operational levels. Among these interventions, the study shows how monitoring, maintenance and redundancy can be taken into account to minimize the probability of malfunction of different components of the system. In certain cases, obtaining data for quantitative analysis might seem like a daunting task. Future research should focus on expanding the applicability of the fault trees to other situations and on addressing the issue of quantification of the probabilities of occurrence.
Table 2.1: Definitions related to the fault tree analysis, Plant A

System definition
Physical: Physicochemical process, including the following steps: screening, coagulant dosage and static mixing, mechanical flocculation, submerged hollow fibre UF
Temporal: From the beginning of normal operation to shutdown of the plant
Objective: Physically remove pathogens and coagulated matter
Performance criteria: Turbidity, particle counts, log removal

Environment definition
pH: 7 to 8.5
Alkalinity: 30 to 45 mg CaCO3/L
Temperature: 1 to 25 °C
Turbidity: 2 NTU to > 10 NTU
Dissolved organics
Microbial contamination
Varying demand
Membranes are exposed to chlorine solution once per week for maintenance.
Membranes are exposed to chlorine solution and citric acid three times per year.
Membranes are exposed to varying transmembrane pressure.

Top event: Presence of unexpectedly high Cryptosporidium parvum concentrations in the permeate

Table 2.2: Definitions related to the fault tree analysis, Plant B

System definition
Physical: Physicochemical process, including the following steps: screening, coagulant dosage and static mixing, alkalinity control, mechanical flocculation, dissolved-air flotation, and rapid dual-media granular filtration
Temporal: From the beginning of normal operation to shutdown of the plant
Objective: Physically remove pathogens and coagulated matter
Performance criteria: Turbidity, particle counts, log removal

Environment definition
pH: 6 to 7.5
Alkalinity: 2 to 15 mg CaCO3/L
Temperature: 1 to 25 °C
Turbidity: 0.2 NTU to > 10 NTU
Dissolved organics
Microbial contamination
Varying demand
Saturators are exposed to a constant air pressure and airflow.
Filters are exposed to varying backwash flow rates.

Top event Presence of unexpectedly high Cryptosporidium parvum concentrations in the filtrate  50  Table 2.3: Description of intermediary and primary events for membrane filtration failure EVENT ADDITIONAL INFORMATION Membrane skin is damaged and does not An event where the membrane skin itself is damaged by a stressor resulting in a pathogen remove pathogens. removal lower than expected. Membranes suffer manufactured/installation Various factors (e.g., insufficient membrane strength, broken fibres during installation), depending defect. on the manufacturer quality control and installation practices, can result in weak fibres that cannot support normal operational loads (e.g., pressure, flow, temperature, chemical exposure). Membrane skin is worn out. Chemical solution dose breaches membrane The membrane skin material can degrade and lose its efficiency when exposed to certain chemicals. skin. A dose of chemical is expressed as a concentration (C) multiplied by a time of exposure (T). It is similar to the well-known CT concept in disinfection. The presence of a high dose (CT) of chemical can result from different conditions (e.g., chemical spill in raw water, industrial wastewater effluent, long-term exposure to cleaning chemical). Manufacturers usually provide tolerances related to chemical exposure. Particles/solids breaches membrane skin. Abrasive particles are present. Abrasive particles rub against Repeated physical contact of the membrane skin with particles can cause skin breach. The presence membranes. of particles alone is not sufficient. Movement of the particles on the membranes has to be provided. Biochemical agent breaches membrane Microorganisms can grow in a biofilm and secrete abrasive substances, or can grow in a tank (e.g., skin. mussels) and breach the membrane skin. 
This phenomenon depends on many factors such as the microorganisms found in the raw water and the conditions found in the tank (e.g., temperature, pH, pressure, salinity, alkalinity). Each plant should be investigated on a case-study basis, given this phenomenon is suspected. Membrane modules are improperly stored. Storage problem can occur if membrane modules are stored onsite in inadequate conditions. Membranes can dry or be subjected to frost-defrost, which damages the skin. Foreign body cuts membrane fibres. Fibres can be cut, leading to water entering the lumen without going through the active layer. An object enters the tank from the pump stations. Screening device is breached. Screens are exposed to many conditions in the raw water (e.g., high and low flows, ice, small and large debris) and can eventually be breached. An object goes through breached screen and pumps. A component of the processes breaks in the water. An object is dropped in the membrane tank. Membrane bursts. Internal air pressure reaches an unbearable level during an integrity test. Membrane collapses.  SOURCES (1), (2), (8) (2), (4), (5), (6), (8) (1), (8), (9)  (1), (8) (1), (8) (1), (8) (1), (2), (8)  (2), (4), (8) (1) (1) (1)  Remains of a broken mechanical part can enter the flow of water. This depends on the process upstream of the membrane tank and on the particular mechanical equipment used in these processes. Objects can be dropped in tanks at any time if tanks are open to the atmosphere or during maintenance if tanks are generally covered.  (8)  A fibre can break longitudinally, i.e., parallel to its length, by collapsing on itself.  (7)  (1)  51  EVENT Transmembrane pressure reaches an unbearable level. Permeability is too low. Water viscosity is too high. Membranes are fouled. A water hammer occurs. A valve is opened/closed too rapidly. Change in transmembrane pressure causes an air shock. Air is trapped in membrane system after an integrity test A unit is put back in operation. 
Water bypasses membrane filtration (short-circuit). Seal or coupling fails. Seal suffers from manufactural or installation defect. Seal is worn out. Coupling is loose. Movement/vibration of the system loosens coupling. (1) (2) (3) (4) (5) (6) (7) (8) (9)  ADDITIONAL INFORMATION The vacuum inside a fibre is too high and the TMP makes it collapse.  SOURCES (8)  In plants operated in constant flux mode, the TMP varies to accommodate for changes in permeability. The permeability of the membranes can reach a value so low it needs an unbearable TMP to produce the required flow.  (7)  Different factors such as reversible and irreversible fouling, detailed in Section 2.6.1.1, can occur.  Air exposed to a sudden change in pressure changes volume quickly and gives a shock. It is possible that the air pressurized in the fibres for an integrity test is not purged properly. Factors, such as the hydraulic configuration of fibres and modules, the pressure used in the test, the duration of the test, influence the quantity of air imprisoned in the membrane fibres after a test. The restart protocol will also influence the quantity of air imprisoned. A train is put back in operation after it passed an integrity test. An event where membranes are intact but unfiltered water reaches the permeate by other means, resulting in short-circuiting of the filtration. Seal or couplings usually isolate the permeate side from the feed side of the membrane. It can happen that they do not fulfill their function. Various factors, depending on the manufacturer quality control and installation practices, can result in weak seals that cannot support normal operational loads (pressure, flow, temperature, chemical exposure, etc.) Seals wear out normally. They have to be replaced after a more or less long period of time depending on their quality.  (2), (7) (2), (3), (4), (8) (2) (8) (4), (8)  (8) (1), (8) (1), (3), (5) (2), (3) (1), (2) (1) (1)  Plant operators Adham et al. (2005) Atassi et al. 
(2007) Eddy et al. (2007) Johnson and MacCormick (2003) Liu (2006) Galjaard et al. (2007) Gijsbertsen-Abrahamse et al. (2006) Gitis et al. (2005)  52  Table 2.4: Description of intermediary and primary events for conventional filtration failure EVENT ADDITIONAL INFORMATION Particles are not sufficiently removed by An event where particles and contaminants present in filter influent are not strained or filter media attached to the filter media and reach the filter effluent. Filter influent quality is inadequate. DAF does not remove particles as it should. The DAF process is expected to reduce the quantity of particles the filters must remove. This process may fail to produce an effluent of acceptable quality for the filters. Flocs are not removed from the DAF. Floated particles recontaminate Particles previously removed from the water may be entrained in the effluent flow of the DAF effluent. process. Particles accumulate sufficiently in Accumulation of floated particles is normal. They are usually removed by a sludge collection system DAF tank. periodically. Yet, over a certain quantity of accumulated sludge, particles can recontaminate the effluent. Level in DAF tank is too low. Level is not adjusted after a change in flowrate. Skimmer fails. A skimmer is a removal system for floated particles. Flocs are formed but not floated. Rising velocities are too low. Floc-bubble aggregates are too dense to float. Flocs are too large to be floated. The DAF process needs small strong flocs. Large flocs may be too heavy to be floated and fluffy flocs might break when contacting rising bubbles. Mixing is inadequate. Mixing energy is too low. Power outage Mixing speed is too low. Mixer is broken. Different problems, which prevent mixing from happening, can arise depending on the type of mixer used. Mixing time is too long. Effective volume of mixing tank is too big. Coagulant is overdosed. 
See following entry: “Dose of alkalinity control chemical is inadequate.” Dosage problems are similar for both types of chemical. ... Water viscosity is too high. A high water viscosity slows down the floating of particles. The DAF process should be designed to take into account viscosity variations. Not enough bubbles are formed. Recycle water is not sufficiently saturated with air. Saturator fails.  SOURCES (1), (7) (1), (8) (1) (1), (12) (1), (8), (12) (1), (8), (12) (1) (1) (1) (1)  (1) (1), (8) (1) (1) (1) (1)  (3), (10), (12) (1), (12) (1) (1)  53  EVENT Recycle nozzles are plugged. Recycled water contains a plugging agent. Flocs do not form as expected. Conditions for sweep coagulation are outside of the targeted range. The pH in the coagulation tank is out of the sweep coagulation range. Dose of alkalinity control chemical is inadequate. Chemical dosage is inadequate for the present water quality. Organic and particle loading in raw water changes. Chemical dosage is not adjusted to changes in water quality. Chemical is available but not added. Dosing equipment fails. Chemical supply is empty. Coagulant dose is out of the sweep coagulation range. ... Particles bypass the filter media (shortcircuiting) Preferential paths are present in the filter media. Mudballs are present in the filter media. Backwash flow rate is too low. Backwash time is too short. A filter underdrain is clogged.  ADDITIONAL INFORMATION If nozzles are plugged, the recycle flow may be too low to produce enough bubbles. Possible clogging agents are: recirculated flocs, small wood debris, and anything that gets thrown in the water and makes it through the pre-treatment steps of the treatment train.  SOURCES (1), (12) (1) (1)  Coagulation modes are determined by the pH of the water and by the coagulant dosage. These two parameters must be in a given range for the expected coagulation phenomenon to take place. 
The pH of the water is an important parameter for coagulation and floc formation and must be controlled, particularly where the water has a low alkalinity.  (1), (3), (8), (10), (12) (1), (8), (10)  Because of raw water quality changes, the optimal dosage changes in time. Therefore, a set dose might be adequate at some times but not at others.  (1)  The optimum coagulant dosage is a function, among other things, of the amount of organics and particles present in the water to be treated.  (1) (1)  (9) See previous entry: “Dose of alkalinity control chemical is inadequate.” Dosage problems are similar for both types of chemical.  (1), (2), (4), (5), (6), (7), (8), (10)  A portion of the influent, conditioned or not, may take preferential paths in the filter, preventing particles from attaching to or being strained by the filter media.  (1) (1), (5), (10)  Mudballs are accumulations of particles that clog areas of the filters. The effective surface area of the filter is reduced. A low backwash flow rate prevents some particles from being removed from the filter media. These particles may accumulate. A short backwash time can prevent some particles from being removed from the filter media. These particles may accumulate. Clogged orifices prevent water from flowing through certain area of the filter and force it to take other paths. Also, a clogged underdrain prevents the even repartition of backwash water, leading to the same result, i.e., preferential paths.  (1) (1), (9) (1), (9) (10)  54  EVENT Small diameter media accumulates in underdrain. Air binding occurs. A filter surface is clogged. Flocs are too cohesive and foul filter surface. Filter aid polymer is overdosed. Clarification process does not remove particles as it should. … Particles pass through filter media but are not catched. Filter does not provide adsorption sites (breakthrough)). Filter requires backwash. Backwash is started too late. An un-ripened filter treats water. Filter is in ripening stage. 
Ripening water is not sent to waste or recycled. Particles detach from filter media  ADDITIONAL INFORMATION  SOURCES (10)  Because of head loss, low pressure can occur in a filter and promote degassing. Bubbles that attach to filter media create preferential paths.  (11) (9), (11) (5), (9), (10)  (See previous entry.) The physicochemical condition of the filter media is unfavourable to particle attachment and particles pass by without being removed from the water.  (1), (2), (5), (6), (7), (8) (1) The backwash process is started after some water has already passed through the filter media with no attachment capacity. The surface of filter media is still negatively charged and has not caught sufficient particles to reach its optimal removal capacity. A filter that was just backwashed needs to be ripened before it recovers its full capacity. A filter being ripened does not provide consistent pathogen removal. The effluent of a filter undergoing ripening may be mixed with the effluent of other filters and distributed to consumers. Filters accumulate particles in their media. It is possible that particles previously removed by filter media detach themselves and contaminate the filter effluent.  Flow increases suddenly. A filter is taken offline. Filter requires maintenance. Filter requires backwash. Total flow is not adjusted to the remaining number of filters. (1) Plant operators (2) Dugan et al. (2001) (3) Edzwald and Kelly (1998) (4) Edzwald et al. (2000) (5) Emelko et al. (2005) (6) Huck et al. (2001)  (7)  The incoming flow of a filter is reduced to zero. The total effluent flow of a series of filters is not modified when the number of functioning filters is changed. (7) Huck et al. 
(2002) (8) LeChevallier and Au (2004) (9) Pizzi (2004) (10) Pizzi (2005) (11) Scardina and Edwards (2004) (12) Schofield (2001)  (5), (6), (7), (8), (9) (5), (7), (8) (7), (8) (5), (7) (1), (5), (6), (8) (1) (1) (1)

Table 2.5: Classification of cutsets

Process initiating failure                       Plant A   Plant B
Screening                                        2         0
Coagulation                                      2         3
Flocculation                                     2         7
DAF (clarification)                              N/A       5
Membrane filtration/Rapid granular filtration    13        8
Total                                            19        23

Figure 2.1: Fault tree for waterborne outbreaks (adapted from Risebro et al., 2007)

Figure 2.2: Fault tree construction in relation to soliciting operators’ knowledge

Figure 2.3: Fault tree of Plant A

Figure 2.4: Fault tree of Plant B

2.8 References
Adham, S., Chiu, K.-p., Gramith, K. and Oppenheimer, J. (2005). Development of a Microfiltration and Ultrafiltration Knowledge Base, AWWA Research Foundation, Denver, 179 p.
ASTM International (2003). Standard Practice for Integrity Testing of Water Filtration Membrane Systems, ASTM International, D 6908 - 03, 1-14.
Atassi, A., White, M. and Rago, L. M. (2007). "Membrane failure headlines: Facts behind the hype regarding initial problems experienced at low pressure membrane installations", Membrane Technology Conference, Tampa, Florida, March 18-21, 2007. American Water Works Association.
Chen, S. K., Ho, T. K. and Mao, B. H. (2007). "Reliability evaluations of railway power supplies by fault-tree analysis" IET Electric Power Applications, 1, (2), 161-172.
Choudhury, S. H., Yu, S. L. and Haimes, Y. Y. (1992). "Assessing the risk of non-compliance in waste-water treatment" Water Science and Technology, 26, (5-6), 1411-1420.
Deere, D. A., Stevens, M., Davison, A., Helm, G. and Dufour, A. (2001). "Management Strategies" in: Water Quality Guidelines, Standards and Health: Assessment of risk and risk management for water-related infectious disease, IWA Publishing, 257-288.
Démotier, S., Denoeux, T. and Schon, W. (2003). "Risk assessment in drinking water production using belief functions" in: Symbolic and Quantitative Approaches to Reasoning with Uncertainty, Proceeding, Springer-Verlag Berlin, Berlin, 319-331.
Dugan, N. R., Fox, K. R., Owens, J. H. and Miltner, R. J. (2001). "Controlling Cryptosporidium Oocysts Using Conventional Treatment" Journal of the American Water Works Association, 93, (12), 64-76.
Eddy, D., Judd, S. and Adham, S. S. (2007). "Pan-Atlantic Review of Operation of Low-Pressure Membrane Filtration Plants", International Conference on Membranes for Water and Wastewater Treatment, Harrogate, UK.
Edzwald, J. K. and Kelly, M. B. (1998). "Control of Cryptosporidium: from reservoirs to clarifiers to filters" Water Science and Technology, 37, (2), 1-8.
Edzwald, J. K., Tobiason, J. E., Parento, L. M., Kelly, M. B., Kaminski, G. S., Dunn, H. J. and Galant, P. B. (2000). "Giardia and Cryptosporidium removals by clarification and filtration under challenge conditions" Journal of the American Water Works Association, 92, (12), 70-84.
Eisenberg, D., Soller, J., Sakaji, R. and Olivieri, A. (2001). "A methodology to evaluate water and wastewater treatment plant reliability" Water Science and Technology, 43, (10), 91-99.
Emelko, M. B., Huck, P. M. and Coffey, B. M. (2005). "A review of Cryptosporidium removal by granular media filtration" Journal of the American Water Works Association, 97, (12), 101-115.
Ericson, C. (1999). "Fault Tree analysis – A History", The 17th International System Safety Conference, Orlando, Florida, August 16-21, The System Safety Society.
Farahbakhsh, K., Adham, S. S. and Smith, D. W. (2003). "Monitoring the Integrity of Low-Pressure Membranes" Journal of the American Water Works Association, 95, (6), 95-107.
Galjaard, G., Lampe, M. and Kruithof, J. C. (2007). "UF-membrane replacement after 6 years of operation at the UF/RO Heemskerk plant: a matter of critical flux and membrane integrity", Membrane Technology Conference, Tampa, Florida, March 18-21, 2007. American Water Works Association.
Gijsbertsen-Abrahamse, A. J., Cornelissen, E. R. and Hofman, J. A. M. H. (2006). "Fiber failure frequency and causes of hollow fiber integrity loss" Desalination, 194, 251-258.
Gitis, V., Haught, R. C., Clark, R. M., Gund, J. and Lev, O. (2005). "Application of nanoscale probes for the evaluation of the integrity of ultrafiltration membranes" Journal of Membrane Science, 276, 185-192.
Gressel, M. G. and Gideon, J. A. (1991). "An overview of process hazard evaluation techniques" American Industrial Hygiene Association Journal, 52, (4), 158-163.
Hartford, D. N. D. and Baecher, G. B. (2004). Risk and Uncertainty in Dam Safety, Thomas Telford Publishing, London, xxvi, 391 p.
Huck, P. M., Coffey, B. M. and O'Melia, C. R. (2001). Filter Operation Effects on Pathogen Passage [Project #490], Denver.
Huck, P. M., Coffey, B. M., Emelko, M. B., Maurizio, D. D., Slawson, R. M., Anderson, W. B., Van Den Oever, J., Douglas, I. P. and O'Melia, C. R. (2002). "Effect of filter operation on Cryptosporidium removal" Journal of the American Water Works Association, 94, (6), 97-111.
Johnson, W. T. and MacCormick, T. (2003). "Issues of operational integrity in membrane drinking water plants" Water Science and Technology, 3, (5), 73-80.
LeChevallier, M. W. and Au, K.-K. (2004). Water Treatment and Pathogen Control: Process Efficiency in Achieving Safe Drinking-water, IWA Publishing, London, xx, 112 p.
Liu, C. (2006). "A Risk-Based Approach to Determine the Frequency of Integrity Testing for Drinking Water Membrane Systems", Water Quality Technology Conference, Cincinnati, Ohio, November 6-10, 2006. American Water Works Association.
Medema, G. J. and Ashbolt, N. J. (2006). QMRA: its value for risk management, 34 p.
Mercer, S. M. and Hrudey, S. E. (1990). "Demonstration of Quantitative Risk Assessment for a Municipal Water Treatment Plant Chlorination Process", Biennial Environmental Specialty Conference, Hamilton, Ont., May 1990. Canadian Society for Civil Engineering.
Min, S. and Min, D. B. (2006). "The hazard analysis and critical control point (HACCP) system and its implementation in an aseptic thermal juice processing scheme: A review" Food Science and Biotechnology, 15, (5), 651-663.
Pizzi, N. G. (2004). Stories from the road: on-the-job experiences of water treatment plant operators, American Water Works Association, Denver, CO, ix, 110 p.
Pizzi, N. G. (2005). Water Treatment Operator Handbook, American Water Works Association, 251 p.
Risebro, H. L., Doria, M. F., Andersson, Y., Medema, G., Osborn, K., Schlosser, O. and Hunter, P. R. (2007). "Fault tree analysis of the causes of waterborne outbreaks" Journal of Water and Health, 05, (Supplement 1), 1-18.
Scardina, P. and Edwards, M. (2004). "Air binding of granular filter media" Journal of Environmental Engineering, 130, (10), 1126-1138.
Schofield, T. (2001). "Dissolved air flotation in drinking water production" Water Science and Technology, 43, (8), 9-18.
Stevens, M., McConnell, S., Nadebaum, P. R., Chapman, M., Ananthakumar, S. and McNeil, J. (1995). "Drinking water quality and treatment requirements: a risk-based approach" Water, 22, 12-16.
United States Environmental Protection Agency (2005). Membrane Filtration Guidance Manual, Office of Water.
United States Environmental Protection Agency (2006). National Primary Drinking Water Regulations: Long Term 2 Enhanced Surface Water Treatment Rule; Final Rule, 40 CFR Parts 9, 141, and 142 [EPA-HQ-OW-2002-0039; FRL-8013-1] RIN 2040-AD37, Federal Register, Vol. 71, No. 3, 654-786.
3 APPLICATION AND LIMITATION OF A QMRA-BASED RELIABILITY ANALYSIS TO ASSESS THE PERFORMANCE OF AN ULTRAFILTRATION WATER TREATMENT TECHNOLOGY²

Preface

This chapter advances quantitative reliability tools in the water treatment industry. It provides a new point of view related to the QMRA method, which was developed and applied in the last decades in the drinking water industry. Here, reliability, that is, the probability of success, is substituted for the traditional risk of infection as the measure of performance of treatment technologies. The reliability approach is applied to a UF system and compared to a reference case from the literature.

² A version of this chapter will be submitted for publication. Beauchamp, N., Bouchard, C., Lence, B.J. Application and Limitation of a QMRA-Based Reliability Analysis to Assess the Performance of an Ultrafiltration Water Treatment Technology.

3.1 Introduction

Decision making in the water treatment industry has become increasingly complex with the discovery of previously unidentified pathogens and substances, new concerns of consumers, the emergence of novel treatment technologies and the initiation of new standards and regulations. Each new technology is accompanied by a significant level of uncertainty given the relatively low level of experience with these technologies. Nevertheless, decisions involving considerable economic, environmental and social implications, such as the selection of a water supply system, the modification of an existing system, or the development of standards and regulations overseeing the practices of the industry, must be made despite such uncertainty. Risk assessment approaches have been developed to address such uncertainties. As noted by Haas and Trussell (1998), the possibility that an engineered system performs outside prescribed specifications always exists.
In this context, reliability is usually defined as the probability of proper functioning of a system, called success, or non-failure. This is a quantitative definition that demands a probabilistic and/or statistical approach. Two points of view can be adopted when trying to quantitatively evaluate the reliability of an engineered system. First, the system can be seen as a kind of black box and the variability of its output can be measured and quantified. Comparing the output variability with a standard or regulation that differentiates the normal state from the failure state, one can determine the probability of compliance, or reliability. This is called inherent reliability (Eisenberg et al. 2001). Alternatively, a system can be decomposed into many sub-systems assembled in series or in parallel, each with its own probability of failure. The probability of proper functioning of the whole system can be calculated by combining the probabilities of failure of its components. This approach is called mechanical reliability (Eisenberg et al. 2001). In this paper, it is proposed that inherent reliability analysis, which considers variability and uncertainty of the different variables involved in the protection of public health, can provide useful information for decision-makers facing uncertainties.

In the last few decades, an important focus of research in the field of microbial quality of drinking water has been the investigation of protozoan parasites, mostly Giardia spp. and Cryptosporidium spp. These protozoa infect the epithelial cells of warm-blooded animals, including humans. They reproduce in their host and are excreted through feces in a resistant stage, called a cyst (Giardia) or oocyst (Cryptosporidium). The symptoms of giardiasis and cryptosporidiosis, their respective illnesses, are gastrointestinal.
It is now known that (oo)cyst-contaminated water is an important route of infection, along with direct or indirect contact with feces of infected individuals. In source water, (oo)cysts behave like like-sized silt particles and are usually found suspended, except in calm water where sedimentation of small particles occurs (King and Monis, 2007; Brookes et al., 2004). Disinfection using traditional chlorination technologies at practical dosages and retention times has some effect on Giardia cysts but no significant effect on Cryptosporidium oocysts. Ozone is effective against both in warm waters but becomes much less effective under cold water conditions, such as those encountered in Canadian winters. Physical removal or alternative disinfection, such as ultraviolet (UV) radiation, is required to effectively treat water contaminated with protozoan (oo)cysts. Among the new technologies developed to remove (oo)cysts, membrane ultrafiltration (UF) is gaining popularity. Quantitative evaluation of exposure to these pathogens, of the consequences of exposure and of the concentration reduction achievable by the different technologies is required when treatment alternatives are being considered.

Quantitative microbial risk assessment (QMRA) is the application of principles of risk assessment to evaluate the consequences of an exposure to microbial infectious agents. The major developments in the application of QMRA for drinking water took place during the 1980s and 1990s (Haas, 1983; Rose et al. 1991; Haas et al. 1996; Teunis et al. 1997). Haas et al. (1999) summarized the state of knowledge of QMRA at the time, and reported on the microbial world, the different steps involved in a risk assessment, the models used in the process of QMRA, and the previously published data regarding dose-response relationships.
Since then, QMRA has been used in the United States to develop technical standards (USEPA, 1989, 2006) and in Europe to develop a regulation framework (MicroRisk Consortium, 2003). In Canada, although provincial drinking water regulations are strongly influenced by the USEPA approach (Alberta Environment, 2006; MDDEP, 2006), QMRA has not been applied directly in the development of rules and standards.

In this chapter, QMRA is applied to a full-scale UF plant in a Canadian setting, chosen to provide an example of the use of a new physicochemical process removing Cryptosporidium oocysts. To take variability and uncertainties related to the model variables into account, it is proposed to integrate the QMRA model into a reliability-based approach. This approach provides a sound basis for comparing treatment technology performance, quantifying the importance of different model variables, and improving our understanding of the risk associated with microbiological water quality. The approach is demonstrated using operational data from the full-scale UF plant. The resulting reliability is compared with that of conventional treatment train reference cases to demonstrate the possible uses of the method.

The remaining sections of this paper are organized as follows. First, the development of the reliability point of view of the QMRA approach is described. Next, the inputs for the application of the approach for analysing the reliability of a UF plant are described and the results for this application are presented and compared to reliability estimates for conventional treatment under different operating scenarios. A discussion follows, where the results are critically analyzed in light of the objectives and assumptions of the study, and the value of reliability analysis for the different stakeholders is discussed.
3.2 Reliability approach for applying the QMRA model

In the application of QMRA, the most common and widespread model used to estimate the ratio of infection in the population is the exponential model (Haas et al., 1996; Teunis et al., 1997; Barbeau et al., 2000; Medema et al., 2003). The mathematical expression for this model is:

$$P_{inf} = 1 - \exp(-r \times N) \qquad (3.1)$$

where Pinf is the ratio of infection in the population; r is the infectivity parameter, defined as the ratio of ingested pathogens that will survive the host’s defences and start an infection, and N is the mean dose of oocysts. The mean dose, while easier to determine in experimental settings, cannot be directly measured at the consumer’s tap in a drinking water context and is usually expressed as the product of the mean of the treated water concentration of oocysts (Ctreated) and the intake volume of cold drinking water (V). Moreover, Ctreated is impractical to measure for Cryptosporidium parvum, because common concentrations are well below the detection limits of modern sampling methods. The alternative to direct measurement of pathogens in treated water is to measure concentrations in raw water and to estimate the removal achieved by the treatment plant. The resulting dose is shown in Equation 3.2, where Craw is the oocyst concentration in raw water, and LRVtotal is the performance of the whole plant for this pathogen, expressed in a base-10 logarithmic scale (log removal value, LRV), and all other variables are defined above.

$$N = V \times C_{raw} \times 10^{-LRV_{total}} \qquad (3.2)$$

This model is based on two important assumptions. The first is known as the “single hit” hypothesis. That is, two or more pathogenic microorganisms of the same species, even if ingested together, act independently, and any one of them may cause an infection.
Second, it is assumed that pathogens in the treated water are randomly distributed and the quantity of oocysts in a given volume exhibits a Poisson distribution with an average concentration Ctreated. The model applies to pathogens for which these two assumptions are reasonable.

The time scale at which this model may be applied is the choice of the analyst. If one wishes to know the yearly risk of infection, then yearly values of variables V, Craw and LRVtotal must be used. The infectivity parameter, r, is independent of time. In the current study, the model is applied at a daily time scale. However, results are presented as yearly risk of infection to facilitate comparison with previously developed risk of infection standards (Haas, 1996). The implications of the time scale chosen are examined in the Discussion section.

To perform a reliability analysis, a performance function separating the failure domain from the success domain must be defined. In this case, failure is defined as the production of water leading to an unacceptable risk of infection for the consumers. Its mathematical expression is shown in Equation 3.3, where $g(\vec{X})$ is the performance function, $\vec{X}$ is the vector of random input variables, and Pacc is the acceptable risk of infection. Failure occurs when $g(\vec{X}) \le 0$. Finally, the extended expression including all random variables is shown in Equation 3.4.

$$g(\vec{X}) = P_{acc} - P_{inf} \qquad (3.3)$$

$$g(\vec{X}) = P_{acc} - 1 + \exp\left(-r \times V \times C_{raw} \times 10^{-LRV_{total}}\right) \qquad (3.4)$$

Based on the previous classification, this model is categorized as an inherent reliability model. It uses quantification of the system output (treatment plant performance, LRVtotal) and compares it to a standard (i.e., acceptable risk of infection, Pacc) to differentiate failure from success. To investigate system performance in the face of uncertain standards, a range of values of Pacc may be examined. Issues regarding the determination of Pacc will be addressed in Section 3.4.
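As a minimal sketch, Equations 3.1 to 3.4 can be written directly in Python. The function names and all numerical inputs below are hypothetical and chosen only for illustration; they are not the distributions or values used in this study, and the acceptable daily risk is an assumed figure.

```python
import math

def mean_dose(volume_l, c_raw_per_l, lrv_total):
    """Equation 3.2: mean dose N from intake volume, raw water concentration and LRV."""
    return volume_l * c_raw_per_l * 10.0 ** (-lrv_total)

def infection_probability(r, n):
    """Equation 3.1: exponential dose-response model, 1 - exp(-r * N)."""
    return -math.expm1(-r * n)  # expm1 keeps precision when r * N is tiny

def performance(p_acc, r, volume_l, c_raw_per_l, lrv_total):
    """Equations 3.3 and 3.4: g = Pacc - Pinf; failure occurs when g <= 0."""
    n = mean_dose(volume_l, c_raw_per_l, lrv_total)
    return p_acc - infection_probability(r, n)

# Hypothetical single-day inputs (illustrative only, not the plant's data).
P_ACC = 1e-6   # assumed acceptable daily risk of infection
R = 0.03       # assumed infectivity
V = 1.0        # assumed cold tap water intake, L/day
C_RAW = 0.1    # assumed raw water concentration, oocysts/L (10 oocysts/100 L)

g_high_removal = performance(P_ACC, R, V, C_RAW, lrv_total=5.5)
g_low_removal = performance(P_ACC, R, V, C_RAW, lrv_total=2.0)
print("g at LRV 5.5:", g_high_removal, "success" if g_high_removal > 0 else "failure")
print("g at LRV 2.0:", g_low_removal, "success" if g_low_removal > 0 else "failure")
```

With these assumed inputs the high-removal day falls in the success domain and the low-removal day in the failure domain, which is the distinction the performance function is meant to draw.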
Different methods have been developed to assess the reliability of engineered systems. In this study, the First-order reliability method (FORM), developed by Hasofer and Lind (1974), and described in Maier et al. (2001), is used. Briefly, FORM is a method using marginal distributions of random variables and correlations between these variables to numerically evaluate the probability of failure of a system. A hyperplane approximates the performance function (first order approximation), and the joint probability of being on the failure side of the plane is found numerically. The advantages of this method over the widely used Monte Carlo sampling (MCS) method are its computational efficiency, its repeatability, and the other measures, such as the importance of each random variable in the probability of failure, that may be determined during the analytical process. The disadvantage of FORM is that it gives a first order approximation of the probability of failure. However, in many applications of FORM to water engineering problems, FORM results differ only slightly from MCS results, while computing efficiency is improved (Vasquez et al., 2000; Maier et al., 2001; Sarang et al. 2008; Thorndahl et al., 2008). For these reasons, FORM was chosen for this study. The program used to execute the FORM analysis in this study is the Open System for Earthquake Engineering Simulation (OpenSees) developed at the University of California, Berkeley (OpenSees, 2006). The reliability tools of this program, although intended for structural reliability, can be used to solve any reliability problem.

3.3 Application of the reliability approach to a full-scale UF plant

3.3.1 Input variable probability distributions

Figure 3.1 shows the distribution functions of the input variables.
Each graph shows the cumulative distribution function (CDF), i.e., the probability that the random variable is lower than a given value, on the right ordinate, and the probability density function (PDF), i.e., the derivative of the CDF, on the left ordinate. Descriptions of the random variables follow.

3.3.1.1 Infectivity

The USEPA (2005a), in the Economic Analysis for the Long Term 2 Enhanced Surface Water Treatment Rule (LT2ESWTR), estimates different distributions of infectivity, r, by applying different probability models to data from human volunteer challenge studies. Among the models based on the exponential equation (see Equation 3.1), the best fit was found when r followed a Beta distribution. Hence, a Beta distribution is assumed and the infectivity distribution used in the reliability analysis applied herein is shown in Figure 3.1a. This distribution was found using the maximum log-likelihood technique based on results from six volunteer feeding studies presented in USEPA (2005a). The log-likelihood function is:

$$L(\alpha,\beta) = \sum_{i=1}^{6} \ln \int_{r=0}^{1} \mathrm{Beta}(r,\alpha,\beta) \times \prod_{j=1}^{n} \mathrm{Bin}\left(infection_{i,j},\, subject_{i,j},\, 1-\exp(-r \times N)\right) dr \qquad (3.5)$$

where r is the infectivity parameter, α and β are the Beta distribution parameters, infection is the number of subjects infected at dose j in study i, subject is the number of subjects exposed to dose j in study i, i is the index of the volunteer feeding study, j is the index of the dose level of the study, n is the number of dose levels of each study, and N is the dose. Maximizing the function results in the following parameters: α = 0.383, β = 10.829.

3.3.1.2 Tap water consumption

The distribution of daily cold tap water consumption for the Canadian population, used in the reliability analysis, is shown in Figure 3.1b. As suggested by Mons et al. (2007), country-specific tap water consumption data and distributions should be used for QMRA applications when available.
In Canada, the Environmental Health Directorate (EHD) conducted the most complete survey of tap water consumption in 1977 and 1978 (EHD, 1981). The tap water consumption was shown to vary according to factors such as age, geographical location, and level of physical activity. The ratio of tap water consumed “cold”, versus boiled or through food, also varies with age. Statistical distributions of total daily tap water consumption are available (EHD, 1981) for different age groups (less than 6, 6 to 17, and 17 years old and over). In our work, to account for the quantity of boiled water, consumption levels for each age group were multiplied by the ratio of cold to total tap water for this particular age group (82%, 83%, and 45%, respectively). To produce Figure 3.1b, a weighted average of the “cold-water-adjusted” distributions was calculated where the weights were based on the 2005 demographic percentage of each age group (6.4%, 15.2%, and 54.4%, respectively) available through Statistics Canada (2006).

3.3.1.3 Pathogen concentration

Conducting a case study for a known plant requires raw water pathogen concentrations specific to its source water. Monitoring raw water concentrations of protozoa is not common practice in Canada. For the plant studied herein, one monitoring campaign of Cryptosporidium oocysts in raw water was conducted in 1997 before commissioning of the current treatment plant. One sample per month was taken between July and December inclusively, for a total of 6 samples. The results from this sampling campaign were used to obtain the probability distribution shown in Figure 3.1c. For each day, oocysts are assumed to be randomly distributed in the water with an unknown average concentration, Craw, between 0 and 100 oocysts/100L. Each measurement is therefore the result of a Poisson process. For every sample, the likelihood of possible values of Craw was computed, resulting in a probability distribution between 0 and 100 oocysts/100L.
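The per-sample likelihood computation can be sketched as follows, together with an equal-weight average of the resulting distributions. This is only an illustration under stated assumptions: the oocyst counts and sample volumes are hypothetical (the monitoring results are not reproduced here), the concentration grid is arbitrary, recovery efficiency is ignored, and normalizing the likelihood over the grid amounts to assuming that every concentration between 0 and 100 oocysts/100L is equally plausible beforehand.

```python
import math

def concentration_distribution(count, volume_100l, grid):
    """Normalized Poisson likelihood of each candidate concentration (oocysts/100 L)."""
    weights = [
        math.exp(-c * volume_100l) * (c * volume_100l) ** count / math.factorial(count)
        for c in grid
    ]
    total = sum(weights)
    return [w / total for w in weights]

# Candidate mean concentrations from 0 to 100 oocysts/100 L in steps of 0.5.
grid = [i * 0.5 for i in range(201)]

# Hypothetical monthly samples: (oocysts counted, sample volume in units of 100 L).
samples = [(0, 1.0), (1, 1.0), (2, 1.0), (0, 1.0), (5, 1.0), (1, 1.0)]

per_sample = [concentration_distribution(k, v, grid) for k, v in samples]

# Equal-weight average of the six per-sample distributions.
averaged = [sum(column) / len(per_sample) for column in zip(*per_sample)]

most_likely = grid[averaged.index(max(averaged))]
print("most likely concentration on the grid:", most_likely, "oocysts/100 L")
```

The same steps would apply to other source waters with similar monitoring data, substituting the actual counts and volumes.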
Assuming all six distributions are equally likely to represent the variation in daily mean concentration, they were averaged to obtain the distribution shown in Figure 3.1c. It is important to note that although Figure 3.1c represents a possible contamination scenario, it is not necessarily typical of Canadian source waters. Nonetheless, the computational approach could be used to obtain distributions for other source waters with similar monitoring data.

3.3.1.4 Full-scale UF system performance

Full-scale plant data were used to obtain the distribution for the performance variable (LRVtotal). This plant is not necessarily representative of all plants using UF but represents a plausible scenario of performance resulting from the operation of this technology under Canadian jurisdiction. A brief description of the plant is provided with the probability distribution of performance. The treatment facility is not identified for confidentiality reasons.

The treatment process of the plant consists of the following steps: 2.5 mm screening, coagulation, flocculation, UF, and chlorination. The membranes are hollow fibres operated in an outside-in fashion. Membranes are immersed in a water tank and a vacuum is applied on the inside of the fibres, which causes the water to permeate through the fibres and flow to the permeate collector. A total of twelve parallel trains, of six cassettes each, can be operated independently. The filtered water is then chlorinated by the addition of a sodium hypochlorite solution. Chlorinated water passes through a contact tank and ends up in a reservoir from which it is pumped to the distribution system.

Integrity tests are conducted on hollow-fibre UF membranes using the standard pressure-decay test technique described in ASTM International (2003). This test consists of pressurizing the inside of the fibres with air and measuring the decay of pressure occurring when air flows through pores and possibly breaches.
Air will flow through wetted pores and breaches if capillary forces cannot resist the air pressure applied during the test. Because these capillary forces are a function of the diameter of the holes, it is possible to compute the air pressure that will detect holes of 3 µm or more, which is the minimum size of an oocyst, as required in ASTM International (2003) and USEPA (2005b). For the system studied, this pressure is 66 kPa. During operation, it is assumed that water flowing through the detected breaches is untreated. Both the airflow during the integrity testing and the equivalent water flow during water filtration are assumed to pass through the same breaches, with the same characteristics (i.e., breach diameters, lengths, shapes, etc.). Using compressible and incompressible flow hydraulics, the equivalent flow of water is modeled. Assuming both flows are laminar, the Hagen-Poiseuille Equation for laminar flow is used to calculate the equivalent water flow from the pressure-decay rate measured during the integrity test. Details of the mathematical derivation of the equivalent water flow can be found in ASTM International (2003). The log reduction value for this test is computed by Equation 3.6, where LRV is the log reduction value of the unit, Qfilt is the total flow of water through the membrane, Qbreach is the untreated flow through the breach, and CF is the concentration factor, a dimensionless term, dependent on the hydraulic configuration of the system, which takes into account the increase in solids concentration on the feed side of the membrane.

$$LRV = \log_{10}\left(\frac{Q_{filt}}{CF \times Q_{breach}}\right) \qquad (3.6)$$

At this plant, an integrity test is conducted on each train every week. The LRV values were obtained from pressure-decay tests conducted between May 2005 and June 2007. The minimum and maximum LRV for individual trains are 4.01 and 7.81, respectively, with a mean of 5.52 and a standard deviation of 0.507.
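Equation 3.6 is simple to evaluate once the equivalent breach flow is known. The short sketch below uses a hypothetical function name and hypothetical flows and concentration factor; the derivation of the breach flow from the pressure-decay rate (ASTM International, 2003) is not reproduced.

```python
import math

def log_removal_value(q_filt, q_breach, cf):
    """Equation 3.6: LRV = log10(Qfilt / (CF * Qbreach)); both flows in the same units."""
    if q_filt <= 0 or q_breach <= 0 or cf <= 0:
        raise ValueError("flows and concentration factor must be positive")
    return math.log10(q_filt / (cf * q_breach))

# Hypothetical train: 500 m3/h filtered, 0.0015 m3/h bypassing through breaches, CF = 1.
lrv = log_removal_value(q_filt=500.0, q_breach=0.0015, cf=1.0)
print(round(lrv, 2))
```

A tenfold increase in either the breach flow or the concentration factor lowers the computed LRV by exactly one log unit, which follows directly from the form of the equation.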
The result of each integrity test is assumed to represent the performance of the tested train for the entire period between the given test and the previous test. An LRV value is therefore assigned to each train for every day between May 1st 2005 and June 30th 2007. A value of LRVtotal, the average removal achieved by the 12 trains, was computed for each of these days. The resulting historical data of the total removal (LRVtotal) achieved by the UF plant are shown in Figure 3.2. These values were used to obtain the probability distribution of the performance of the whole plant (LRVtotal) as shown in Figure 3.1d.

3.3.2 Results of FORM analysis

The results of the reliability analysis are shown in Figure 3.3. For different values of the acceptable risk, Pacc, the reliability, or probability of success for the UF membrane system is given by the curve “UF Membrane”. The other curves in Figure 3.3 are the results of reliability analyses for different operating scenarios of a conventional physicochemical treatment train, as explained in Section 3.3.2.1. Success, as mentioned previously, is defined as the production of water leading to a risk lower than the acceptable value. Hence, the curves in Figure 3.3 can be viewed as the CDF of the risk of infection, given the input variables described previously. For example, at a reliability level of 0.9, the risk of infection is $10^{-4.93}$. That is, there is a 90% probability that fewer than 1 in 85,114 people become infected by Cryptosporidium parvum by drinking water produced by this UF plant.

3.3.2.1 Reference cases for conventional physicochemical system performance

Because UF is a new treatment technology, it is interesting to compare its reliability with a reference technology considered to be established, i.e., for which experience is more abundant. The conventional physicochemical treatment train, which can be viewed as the traditional technology providing physical removal of oocysts, can be such a reference.
The conventional physicochemical treatment train consists of the following steps: screening, coagulation, flocculation, clarification, and rapid granular media filtration. In many jurisdictions, turbidity of the filter effluent is monitored and used as a surrogate to compute credited log removal of pathogens (Alberta Environment, 2006; MDDEP, 2006; USEPA, 2006). However, in the Microrisk project and in other sources (Huck et al., 2002; Emelko et al., 2005; Smeets et al., 2006), it is recognized that although turbidity is a good indicator of plant optimization, it is not suitable for quantitative assessment of oocyst removal. A summary of issues regarding the use of turbidity as a surrogate can be found in Section 3.4.2.

In this work, rather than using effluent turbidity data to establish a conventional treatment reference performance, an approach developed in the Microrisk project (Smeets et al., 2006) was used. This approach, to the author’s knowledge, is the most comprehensive reference on pathogen removal for use in quantitative risk analysis, and is illustrated in the performance probability distributions shown in Figure 3.4. First, a literature review of 15 studies of Cryptosporidium oocyst removal by bench-, pilot-, and full-scale conventional treatment trains was conducted to obtain the range of possible performances. Based on the full range of removal values obtained in the studies, the basic log-triangular PDF shown in Figure 3.4a is constructed. Then, given the turbidity abatement achieved by a plant, Smeets et al. (2006) suggest that this PDF be adjusted in the following way. When effluent turbidity of the physicochemical process is consistently maintained below 0.1 NTU, the plant is assumed to be performing well and is likely to provide removal in the high range of the basic distribution and therefore the distribution is shifted as in Figure 3.4b.
Conversely, when the filtrate turbidity is unstable, varying by a few tenths of NTU around the regulatory limit, indicating that filtration does not work effectively, the plant is considered to be poorly optimized and the removal is likely to be in the low range of the basic distribution and is therefore shifted as in Figure 3.4c. A third reference case of mean performance is developed in our study, as shown in Figure 3.4d, which may be thought of as an intermediate case between optimal and poor operation. This approach is general because performance distributions, although based on actual measurements from studies, are adjusted arbitrarily to represent different operational scenarios. Turbidity values are used as an indicator to adjust the basic performance distribution based on the degree of plant optimization but do not provide an actual assessment of the achieved removal.

Results of the reliability analyses for the conventional treatment reference cases based on the high, low, and mean performance distributions in Figure 3.4b, c and d, respectively, are shown in Figure 3.3, in addition to the results of UF reliability analysis. They indicate that the UF plant studied provides water with low concentrations of Cryptosporidium parvum more reliably than would the reference conventional trains treating the same water.

In a particular situation, it is possible to compare LRV probability distributions directly instead of conducting the full reliability analysis. If the context is such that all other variables (Craw, V, and r) are fixed and the intention is only to determine which technology provides the highest removal, then LRV probability distributions appear to be sufficient. However, it is important to observe the following. First, using solely LRV probability distributions remains a reliability analysis, although a simple one with a single variable.
It acknowledges variability and recognizes the importance of conferring pathogen removal credits to a system not only based on the capacity of the technology but on the operational reality. Removal assessment techniques for both technologies are nonetheless crucial. This issue is further discussed in Section 3.4.2.2. Second, when the reliability of two plants in different settings (i.e., different Craw, V, and r) is to be assessed, incorporating the QMRA model in the reliability analysis is valuable, because the health impact becomes the only possible point of comparison. In the context of compliance with regulation, conducting the complete reliability analysis is also advised, provided that a standardized approach to reliability analysis is available and a reliability target is determined.

Finally, Figure 3.5 shows the absolute values of the importance vector, which represents sensitivity information regarding the input variables, for the membrane plant FORM analysis. The importance vector is the gradient, expressed as a unit vector, of the performance function $g(\vec{X})$ at the point where it is approximated by a hyperplane. The gradient represents, for an infinitesimal increase in the value of the random variable, the resulting change in the value of the risk of infection. Hence, the higher the value of the importance vector for a given variable, the more sensitive the risk of infection is to this variable. The infectivity parameter is the variable with the highest value regardless of the level of acceptable risk. The raw water concentration is the second most important variable, except for the highest value of acceptable risk, where the plant performance is slightly more important. The intake volume is, in all cases, the least important of the random variables. The importance vectors for the conventional treatment scenarios, not shown here, exhibit almost identical patterns.
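For readers who wish to cross-check a reliability curve of this kind without a FORM code, a crude Monte Carlo sampling (MCS) estimate can be sketched as below. This is not the FORM analysis performed in this study. Only the Beta parameters for r are taken from Section 3.3.1.1; the distributions assumed for V, Craw and LRVtotal are hypothetical stand-ins for Figure 3.1, the model is evaluated at a daily time scale, and the values of the acceptable risk are arbitrary.

```python
import math
import random

def simulate_reliability(p_acc, n_draws=20000, seed=1):
    """Fraction of random draws for which g = Pacc - Pinf is positive (success)."""
    rng = random.Random(seed)  # fixed seed so repeated calls reuse the same draws
    successes = 0
    for _ in range(n_draws):
        r = rng.betavariate(0.383, 10.829)          # infectivity, Section 3.3.1.1
        v = rng.lognormvariate(math.log(0.8), 0.6)  # assumed intake, L/day
        c_raw = rng.uniform(0.0, 1.0)               # assumed, oocysts/L (0 to 100 per 100 L)
        lrv = rng.gauss(5.5, 0.5)                   # assumed plant-level LRV
        p_inf = -math.expm1(-r * v * c_raw * 10.0 ** (-lrv))
        if p_acc - p_inf > 0:
            successes += 1
    return successes / n_draws

# Reliability as a function of the acceptable (daily) risk of infection.
for exponent in (-9, -8, -7, -6):
    p_acc = 10.0 ** exponent
    print(f"Pacc = 1e{exponent}: reliability ~ {simulate_reliability(p_acc):.3f}")
```

Because the same seed is reused for every value of Pacc, the estimated curve is non-decreasing in Pacc, mirroring the CDF interpretation of Figure 3.3. Unlike FORM, this sketch provides no importance vector.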
3.4 Discussion

3.4.1 Methodological issues

Because challenging or addressing previously observed methodological issues related to QMRA was not the objective of this work, the interested reader will find references in Table 3.1 that address these issues in more detail. In the following section, the focus is on the goal of this study, namely on exploring the value of incorporating uncertainty in comparing treatment technologies through reliability analysis. Table 3.1 lists the assumptions on which the QMRA model is based and from which the input probability distributions are obtained. Descriptions of the assumptions made in the present study are given in the second column. The third column specifies if the given assumption is conservative or not, or if its influence is unknown. References are given for discussion of these hypotheses or regarding validation or invalidation of them. They do not necessarily support the assumptions made in the present study but discuss or shed new light on the issues raised by the use of the assumption.

3.4.2 Value of the approach in decision-making issues

Different uses of QMRA have been proposed (Smeets, 2008). These applications can be divided into two broad categories: regulation issues, and design and operational issues. The approach proposed herein has value for both of these sets of issues. The following subsections discuss them in turn.

3.4.2.1 Risk and regulations

Traditionally, compliance with water treatment objectives was assured when monitoring of treated water for undesired contaminants achieved a certain standard, developed to reach a desired health target. Nowadays, it is known that the economically or technologically feasible sampling of treated water is not statistically adequate to detect disease-causing organisms that are a health concern at very low concentrations, such as Cryptosporidium. Instead of sampling-based standards, performance-based standards are now the norm.
Yet, performance and other factors influencing health risk vary. LeChevallier and Buckley (2007) recommend that risk assessors should quantify uncertainties involved in the QMRA process and “define variables in terms of increasing or decreasing degree of certainty.” The model proposed herein is a direct response to this recommendation, where uncertainty is addressed through reliability analysis.

Once variability is acknowledged, policy-makers must first determine the level of risk of infection that is considered “tolerable”, and its associated reliability, or probability of achieving that level of risk of infection. Jaidi (2007) and Medema et al. (2003), in studies focusing on other treatment technologies, recognized this issue, although the use of reliability analysis was not explicitly mentioned in their work. In the past, the value of “acceptable” or “tolerable” risk of infection has been set at 1 infection per 10,000 people per year, or $10^{-4}$, in the US and under other jurisdictions. This value is affected by many factors such as the population at risk, the costs, benefits and willingness to pay to achieve the value, the geographical location, and the perception of the risk (LeChevallier and Buckley, 2007). Its determination is beyond the scope of this work. However, it is difficult to state if the plant analyzed here is safe enough if no standard is set on the “tolerable” risk.

Before one can grant meaning to the result of a single reliability analysis, a target reliability must be established. A possible starting point on which to base the value of this target reliability can be the back calculation of reliability of plants designed according to previously accepted practices. This approach has been used in structural engineering in Canada for what is called “code calibration” (Foschi et al., 1993; Nowak, 1995).
The reliability of previous structures, whose design was based on accepted best practices and had proven to be “safe enough”, was used as a baseline for the reliability-based calibration of new codes. A similar approach could be used to determine the target reliability that should be reached by treatment plants at the level of acceptable risk. While the mean risk of infection can be driven by one or two low-performance events of low probability (Smeets, 2008), reliability has the added value of not being as sensitive to the precise evaluation of these events (Jaidi, 2007). Even if these high-risk events are misevaluated by an order of magnitude or two, the probability of failure is not affected as long as this probability is neither extremely high nor extremely low. That is, the percentiles of the distribution of the risk of infection are more stable than the mean risk of infection. One value of the reliability approach described in the present work therefore lies in the fact that it provides a more stable output while addressing uncertainties.

3.4.2.2 Comparison of treatment technologies

Some authors working in the field of QMRA and in other engineering disciplines suggest that results from reliability analyses should only be used as a means of comparison, because of the various assumptions in the structure of the model and in the input probability distributions (Faber, 2006; Jaidi, 2007). Yet, it is of the utmost importance to recognize that determining LRV distributions for both technologies poses many challenges. For example, in the present comparison, LRVs for the UF plant were obtained from tests conducted weekly, for which the results are computed with a hydraulic model involving many assumptions. On the other hand, the performance distributions for the conventional treatment scenario were derived from a compilation of research studies where removals were computed using influent and effluent sampling data from one-time experiments on different systems.
These two approaches for representing performance differ in the time scale at which the data are obtained and in the techniques used to obtain such data. It is reasonable to ask whether a comparison based on such different sources of information is valid.

In the US and in some Canadian provinces, turbidity data are used to award credits to conventional treatment plants. Turbidity at the effluent of the filters is recorded at a given frequency (every 4 hours or every 15 minutes). Every month, these recorded data are compiled and credits are given based on the turbidity value that is reached 95% of the time. This approach was introduced in the Surface Water Treatment Rule (SWTR) in the US (USEPA, 1989). While the values of the awarded credits are based on studies of the capacity of granular filters and on corresponding effluent turbidity (Al-Ani et al., 1986; Dugan et al., 2001), the recognition of operational variability in this regulatory approach is minimal. In a given month, filter performance will likely be affected by various factors, such as changes in raw water quality and changes in loading rate. The average removal achieved will be influenced by the events of low performance occurring during the 5% of the time when the target turbidity may not be met (Jaidi, 2007; Smeets, 2008) and will likely be different from the removal capacity computed in studies under optimized conditions. Yet, removal credits awarded on a monthly basis are based on studies that sample during a few days at most and that do not necessarily consider such periods of lower removal. Although adverse conditions such as suboptimal coagulation have been simulated in some of the studies (Dugan et al., 2001), these results are not taken into account when removal credits are given on a monthly basis.
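The effect of short low-performance periods on the average removal can be illustrated with a small calculation. The sketch below is not taken from this study: it assumes a constant raw water concentration and flow, and the two LRV values and their time fractions are hypothetical. Under those assumptions, the fraction of oocysts passing the filters is averaged over time and then converted back to a log removal.

```python
import math


def effective_lrv(lrvs, weights):
    """Time-weighted effective log removal value (LRV).

    The passage fractions (10**-LRV) are averaged rather than the LRVs
    themselves, because it is the number of oocysts passing the filters
    that accumulates over time.
    """
    total = sum(weights)
    mean_passage = sum(w * 10.0 ** (-lrv) for lrv, w in zip(lrvs, weights)) / total
    return -math.log10(mean_passage)


# Hypothetical month: 4-log removal 95% of the time, 1-log removal 5% of the time.
lrvs = [4.0, 1.0]
weights = [0.95, 0.05]

monthly_lrv = effective_lrv(lrvs, weights)
print(f"effective monthly LRV: {monthly_lrv:.2f}")
```

In this hypothetical case, the 5% of the time spent at 1-log removal pulls the monthly removal down to about 2.3 log, well below the 4 log achieved the rest of the time.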
When conducting a comparison of operational reliability for two different technologies, the time scale and method used to assess the performance of the technologies should be similar, if not identical. In the present case, and in many regulatory approaches, this condition cannot be met. The results presented here should therefore be interpreted carefully, taking into account that the removals achieved by the UF and conventional technologies have not been assessed with similar methods.

Finally, it is important to mention that time scales are also significant for other variables, such as Craw and V. To our knowledge, this variability has not been explicitly recognized for treatment performance, pathogen concentrations or water intake. Yearly, monthly, daily, and even hourly averages of a single variable can be quite different from one another. Hydrologists are familiar with such variability in the analysis of river flows at different time scales. Thus, when conducting a QMRA-based reliability analysis, the time scale of interest needs to be determined and data need to be collected in accordance with this time scale. As mentioned by Haas and Trussell (1998): "For contaminants with health effects associated with acute, or single dose, exposures, the presence of variability becomes important." We suggest that a relatively short time scale (daily) generally be considered, since risk analysis based on an annual time scale may fail to account for undesirable short-term variations in the risk of infection.

3.4.3 Reliability issues

3.4.3.1 Incorporating correlations

An aspect of reliability analysis that was not considered here is the correlation between random variables. For example, water consumption peaks could occur simultaneously with raw water contamination peaks (or, on the contrary, consumption peaks could coincide with periods of low contamination), which would result in a risk distribution different from that shown in Figure 3.3.
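The possible effect of such a correlation can be sketched with a small Monte Carlo experiment. The example below is illustrative only and is not the FORM analysis used in this study: it assumes an exponential dose-response model, lognormal distributions for Craw and V, a fixed log removal, and hypothetical parameter values throughout.

```python
import math
import random


def reliability(rho, n=20_000, seed=1):
    """Estimate P(daily risk <= tolerable daily risk) by Monte Carlo.

    rho is the assumed correlation between ln(Craw) and ln(V).
    All numerical values below are hypothetical.
    """
    rng = random.Random(seed)
    r = 0.004               # infectivity parameter (hypothetical)
    lrv = 4.0               # fixed total log removal (hypothetical)
    tolerable = 1e-4 / 365  # daily share of a 10^-4 annual risk (hypothetical)
    successes = 0
    for _ in range(n):
        # Two standard normal variates with correlation rho.
        z_c = rng.gauss(0.0, 1.0)
        z_v = rho * z_c + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        c_raw = math.exp(math.log(0.1) + 1.0 * z_c)  # oocysts/L, lognormal
        v = math.exp(math.log(1.0) + 0.5 * z_v)      # L/d, lognormal
        dose = c_raw * 10.0 ** (-lrv) * v            # oocysts ingested per day
        risk = -math.expm1(-r * dose)                # 1 - exp(-r * dose)
        successes += risk <= tolerable
    return successes / n


for rho in (-0.8, 0.0, 0.8):
    print(f"rho = {rho:+.1f}: reliability = {reliability(rho):.3f}")
```

Under these assumptions, a positive correlation between consumption and contamination peaks widens the dose distribution and lowers the reliability, while a negative correlation raises it.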
Correlations between raw water concentrations and performance of the treatment plant could also be observed. For example, Haas and Kaymak (2003) found that inactivation of Giardia muris cysts by ozone decreased as the initial number of cysts in the water decreased. Our data set did not allow the computation of such correlations. Nonetheless, correlations should be considered in future research, as they may have a significant impact on reliability.

3.4.3.2 Use of FORM

The technique used to conduct the reliability analyses may also provide sensitivity information regarding the parameters. Figure 3.5 shows that in our analyses the risk is more sensitive to infectivity, r, and raw water concentration, Craw, and less sensitive to the plant performance, LRVtotal, and water intake, V. Low values of the importance vector do not mean that a variable does not influence the result of the analysis but rather that more is known regarding this variable in comparison with other variables. Given the high values of the importance vectors for infectivity and raw water concentrations, more information regarding them would narrow their respective probability distributions and be the most beneficial when seeking to reduce the range of risk of infection obtained through the reliability analysis. For example, the raw water concentration distribution is highly uncertain, being based on few samples. This distribution would be improved by more comprehensive monitoring campaigns.

3.5 Conclusions

The new reliability point of view for applying QMRA is demonstrated by estimating the reliability of a UF plant based on operational data and by comparing it to the reliability of three reference conventional treatment cases in terms of their ability to provide a “tolerable” risk of infection by Cryptosporidium parvum.
In all cases, the UF plant studied exhibited a higher reliability than the reference conventional treatment cases for a wide range of risk of infection. However, many issues, such as the time scale at which the removals are measured and the methods used to establish the removal for different technologies, temper the strength of this conclusion. It is our understanding that the current regulatory approaches for establishing removal credits do not adequately recognize the variability of the treatment processes and, in the case of conventional treatment, that this approach should be updated. The uncertainty associated with the computation of LRVs from integrity testing on UF membranes should also be quantified and incorporated in the reliability analysis. In general, the prediction of pathogen concentrations in treated water is a task that will require more attention if a QMRA-based reliability approach is widely adopted in Canadian provinces. It is possible, in certain circumstances, to simplify the reliability analysis by considering only the variability of the LRVs. Yet variability needs to be acknowledged, and regulators should base their choices on the operational reality of the different treatment plants. Other issues raised that will need to be addressed by decision-makers are the choice of an “acceptable” or “tolerable” level of risk of infection and its associated reliability, the time scale at which the model should be applied, the amount of data that must be gathered on the different input variables to reduce uncertainty and quantify variability satisfactorily, and the quantification of correlations between these variables. In the end, the method presented here provides a sound basis on which to evaluate one criterion in what can be considered a multi-criteria decision problem.
Choices such as the selection of a water supply system or the modification of an existing system involve many other considerations, such as (in no particular order) cost, environmental impact, other acute or chronic health issues, aesthetic concerns, operability, operator safety, and consumers’ perception of the technology.

Table 3.1: Model and input variable assumptions

Model assumptions

NAME: Single hit hypothesis
CURRENT ASSUMPTION: Oocysts act independently and one oocyst can start an infection.
INFLUENCE: Conservative

NAME: Random distribution of pathogens
CURRENT ASSUMPTION: Oocyst counts in a given volume of water exhibit a Poisson probability distribution.
INFLUENCE: Unknown (Pathogens might be over-dispersed or under-dispersed.)

NAME: Viability of oocysts
CURRENT ASSUMPTION: All oocysts counted in raw water are potentially infectious.
INFLUENCE: Conservative

NAME: Serological and immunological status of the population
CURRENT ASSUMPTION: The dose-response relationship developed in volunteer challenge studies is valid for the whole population.
INFLUENCE: Unknown (Acquired immunity and immunodepressed populations are less and more sensitive, respectively.)

Input variable assumptions

NAME: Infectivity
CURRENT ASSUMPTION: Oocysts found in the environment are a mix of different genotypes. The challenge study results used are representative of the environmental variability.
INFLUENCE: Unknown (More infective or less infective genotypes may exist.)

NAME: Tap water consumption
CURRENT ASSUMPTION: The distributions obtained 30 years ago are still valid and represent the consumption for the population served by the system studied.
INFLUENCE: Unknown (Consumption habits might have changed. Spatial averaging attenuates local variations.)
REFERENCES: EHD (1981); Gofti-Laroche et al. (2001); Mons et al. (2007)

NAME: Pathogen concentration
CURRENT ASSUMPTION: The six samples taken in 1997 are representative of the pathogen concentrations.
INFLUENCE: Unknown (The actual variability of concentrations may differ from those observed.)
REFERENCES: Smith and Thompson (2001); USEPA (2005c); USEPA (2005d) (References discuss sampling methods.)

NAME: Treatment train performance
CURRENT ASSUMPTION: The hydraulic model used in DIT is valid and results are suitable for quantitative risk assessment.
INFLUENCE: Unknown
REFERENCES: ASTM International (2003); Farahbakhsh et al. (2003); Johnson and MacCormick (2003)

REFERENCES for the four model assumptions and for Infectivity, in the order in which they appear in the table: Haas et al. (1996); Haas et al. (1999); Teunis et al. (1997); Okhuysen et al. (1999); Teunis and Havelaar (2000); Haas and Rose (1996); Teunis et al. (1997); Gale (2001); Gale et al. (2004); Teunis et al. (1997); Makri et al. (2004); King and Monis (2007); Dupont et al. (1995); Haas et al. (1999); Okhuysen et al. (1999); Gale (2001); Englehardt and Swartout (2004); Messner et al. (2001); Aboytes et al. (2004); USEPA (2006)

Figure 3.1: Probability density and cumulative distribution functions of input variables. Panels: (a) Infectivity, (b) Daily water consumption (L/d), (c) Concentration (#oocysts/100L), (d) Daily total oocyst log removal. Axes: PDF and CDF.

Figure 3.2: Total removal (LRVtotal) achieved by the UF plant between May 2005 and June 2007. Axes: Total LRV versus date.

Figure 3.3: Reliability of a UF plant compared with reference cases for conventional rapid granular filtration (RGF) plants for different risk levels. Axes: Reliability, P(success), versus acceptable risk (#infection per person per year). Series: UF Membranes, RGF Low, RGF Mean, RGF High.

Figure 3.4: Probability density functions of conventional treatment train removal. Panels: (a) Microrisk basic case, (b) High performance, (c) Low performance, (d) Mean performance. Axes: PDF versus LRV.

Figure 3.5: Importance of random variables for the UF plant reliability analysis. Axes: Value of the importance vector versus value of the acceptable risk (10^x). Series: Infectivity, Volume, Concentration, Total LRV.

3.6 References

Aboytes, R., Di Giovanni, G. D., Abrams, F. A., Rheinecker, C., McElroy, W., Shaw, N. and LeChevallier, M. W. (2004). "Detection of infectious Cryptosporidium in filtered drinking water" Journal of the American Water Works Association, 96, (9), 88-98.
Al-Ani, M. Y., Hendricks, D. W., Logsdon, G. S. and Hibler, C. P. (1986). "Removing Giardia Cysts From Low Turbidity Waters By Rapid Rate Filtration" Journal of the American Water Works Association, 78, (5), 66-73.
Alberta Environment (2006). Standards and guidelines for municipal waterworks, wastewater and storm drainage system, Environmental Assurance Division: Environmental Policy Branch: Drinking Water Branch, 399p.
ASTM International (2003). Standard Practice for Integrity Testing of Water Filtration Membrane Systems, ASTM International, D 6908 - 03, 1-14.
Barbeau, B., Payment, P., Coallier, J., Clément, B. and Prévost, M. (2000). "Evaluating the Risk of Infection from the Presence of Giardia and Cryptosporidium in Drinking Water" Quantitative Microbiology, 2, (1), 37.
Brookes, J. D., Antenucci, J., Hipsey, M., Burch, M. D., Ashbolt, N. J. and Ferguson, C. (2004). "Fate and transport of pathogens in lakes and reservoirs" Environment International, 30, 741-759.
Dugan, N. R., Fox, K. R., Owens, J. H. and Miltner, R. J. (2001). "Controlling Cryptosporidium Oocysts Using Conventional Treatment" Journal of the American Water Works Association, 93, (12), 64-76.
Dupont, H. L., Chappell, C. L., Sterling, C. R., Okhuysen, P. C., Rose, J. B. and Jakubowski, W. (1995). "The infectivity of Cryptosporidium parvum in healthy volunteers" The New England Journal of Medicine, 332, (13), 855-859.
Eisenberg, D., Soller, J., Sakaji, R. and Olivieri, A. (2001).
"A methodology to evaluate water and wastewater treatment plant reliability" Water Science and Technology, 43, (10), 91-99.
Emelko, M. B., Huck, P. M. and Coffey, B. M. (2005). "A review of Cryptosporidium removal by granular media filtration" Journal of the American Water Works Association, 97, (12), 101-115.
Englehardt, J. D. and Swartout, J. (2004). "Predictive Population Dose-Response Assessment for Cryptosporidium parvum: Infection Endpoint" Journal of Toxicology and Environmental Health, 67, 651-666.
Environmental Health Directorate (1981). Tapwater consumption in Canada, Branch, H. P., ISBN: 0-662-12489-8, 83p.
Faber, M. H. (2006). Risk and safety in civil, surveying and environmental engineering: Lecture notes, Swiss Federal Institute of Technology, Zurich, 396.
Farahbakhsh, K., Adham, S. S. and Smith, D. W. (2003). "Monitoring the Integrity of Low-Pressure Membranes" Journal of the American Water Works Association, 95, (6), 95-107.
Foschi, R. O., Folz, B. and Yao, F. (1993). "Reliability-Based Design of Wood Structures - Background to CSA-086.1-M89" Canadian Journal of Civil Engineering, 20, (3), 349-357.
Gale, P. (2001). "Developments in microbiological risk assessment for drinking water" Journal of Applied Microbiology, 91, 191-205.
Gale, P., Pitchers, R. and Gray, P. (2004). "The effect of drinking water treatment on the spatial heterogeneity of micro-organisms: implications for assessment of treatment efficiency and health risk" Water Research, 36, 1640-1648.
Gofti-Laroche, L., Potelon, J. L., Da Silva, E. and Zmirou-Navier, D. (2001). "Description de la consommation d'eau de boisson dans certaines communes françaises (étude E.MI.R.A.)" Revue d'épidémiologie et de santé publique, 49, (5), 411-422.
Haas, C. N. (1983). "Estimation of Risk Due to Low Doses of Microorganisms: a Comparison of Alternative Methodologies" American Journal of Epidemiology, 118, (4), 573-582.
Haas, C. N. (1996).
"Acceptable Microbial Risk" Journal of the American Water Works Association, 88, (12), 8.
Haas, C. N., Crockett, C. S., Rose, J. B., Gerba, C. P. and Fazil, A. M. (1996). "Assessing the Risk Posed by Oocysts in Drinking Water" Journal of the American Water Works Association, 88, (9), 131-136.
Haas, C. N. and Rose, J. B. (1996). "Distribution of Cryptosporidium Oocysts in a Water Supply" Water Research, 30, (10), 2251-2254.
Haas, C. N. and Trussell, R. R. (1998). "Frameworks for assessing reliability of multiple, independent barriers in potable water reuse" Water Science and Technology, 38, (6), 1-8.
Haas, C. N., Rose, J. B. and Gerba, C. P. (1999). Quantitative microbial risk assessment, Wiley, New York, x, 449 p.
Haas, C. N. and Kaymak, B. (2003). "Effect of initial microbial density on inactivation of Giardia muris by ozone" Water Research, 37, (12), 2980-2988.
Hasofer, A. M. and Lind, N. C. (1974). "Exact and Invariant Second-Moment Code Format" Journal of the Engineering Mechanics Division-ASCE, 100, (NEM1), 111-121.
Huck, P. M., Coffey, B. M., Emelko, M. B., Maurizio, D. D., Slawson, R. M., Anderson, W. B., Van Den Oever, J., Douglas, I. P. and O'Melia, C. R. (2002). "Effect of filter operation on Cryptosporidium removal" Journal of the American Water Works Association, 94, (6), 97-111.
Jaidi, K. (2007). Développement d'un modèle d'analyse des risques microbiologiques (QMRA) permettant le choix de combinaisons de procédés les plus sécuritaires, Génie civil, géologique et des mines, École Polytechnique de Montréal: Montréal. Maîtrise Ès Sciences Appliquées: 237p.
Johnson, W. T. and MacCormick, T. (2003). "Issues of operational integrity in membrane drinking water plants" Water Science and Technology, 3, (5), 73-80.
King, B. J. and Monis, P. T. (2007). "Critical processes affecting Cryptosporidium oocyst survival in the environment" Parasitology, 134, 309-323.
LeChevallier, M. W. and Buckley, M. (2007). Clean water: What is acceptable microbial risk?
American Academy of Microbiology, Washington, D.C., 18p.
Maier, H. R., Lence, B. J., Tolson, B. A. and Foschi, R. O. (2001). "First-order reliability method for estimating reliability, vulnerability, and resilience" Water Resources Research, 37, (3), 779-790.
Makri, A., Modarres, R. and Parkin, R. (2004). "Cryptosporidiosis Susceptibility and Risk: A Case Study" Risk Analysis, 24, (1), 209-220.
Medema, G. J., Hoogenboezem, W., Veer, A. J. v. d., Ketelaars, H. A. M., Hijnen, W. A. M. and Nobel, P. J. (2003). "Quantitative risk assessment of Cryptosporidium in surface water treatment" Water Science and Technology, 47, (3), 241-247.
Messner, M. J., Chappell, C. L. and Okhuysen, P. C. (2001). "Risk Assessment for Cryptosporidium: A Hierarchical Bayesian Analysis of Human Dose Response Data" Water Research, 35, (16), 3934-3940.
MicroRisk Consortium (2003). MICRORISK: Relation to EU policy, [www.microrisk.com/publish/cat_index_8.shtml] Retrieved on July 21st 2008.
Ministère du Développement durable, de l'Environnement et des Parcs (2006). Guide de conception des installations de production d’eau potable, Direction des politiques de l'eau, 286p.
Mons, M. N., van der Wielen, J. M. L., Blokker, E. J. M., Sinclair, M. I., Hulshof, K. F. A. M., Dangendorf, F., Hunter, P. R. and Medema, G. J. (2007). "Estimation of the consumption of cold tap water for microbiological risk assessment: an overview of studies and statistical analysis of data" Journal of Water and Health, 5, (Supplement 1), 151-170.
Nowak, A. S. (1995). "Calibration of LRFD Bridge Code" Journal of Structural Engineering-ASCE, 121, (8), 1245-1251.
Okhuysen, P. C., Chappell, C. L., Crabb, J. H., Sterling, C. R. and DuPont, H. L. (1999). "Virulence of Three Distinct Cryptosporidium parvum Isolates for Healthy Adults" The Journal of Infectious Diseases, 180, 1275-1281.
OpenSees (2006). Open System for Earthquake Engineering Simulation - Home page, [http://opensees.berkeley.edu/index.php] Retrieved on July 21st 2008.
Rose, J.
B., Haas, C. N. and Regli, S. (1991). "Risk Assessment and Control of Waterborne Giardiasis" American Journal of Public Health, 81, (6), 709-713.
Sarang, A., Vahedi, A. and Shamsai, A. (2008). "How to quantify sustainable development: A Risk-based approach to water quality management" Environmental Management, 41, (2), 200-220.
Smeets, P., Rietveld, L., Hijnen, W. A. M., Medema, G. J. and Stenström, T.-A. (2006). Efficacy of water treatment processes, MicroRisk Consortium, 70p.
Smeets, P. (2008). Stochastic modelling of drinking water treatment in microbial risk assessment, Civil Engineering, Delft University of Technology: Delft. Doctor: 203p.
Smith, M. and Thompson, K. C. (2001). Cryptosporidium: the analytical challenge, Royal Society of Chemistry, Cambridge, p. cm.
Statistics Canada (2006). Annual Demographic Statistics 2005, Division, D., Catalogue no. 91-213-XIB, ISSN: 1480-8803, 310p.
Teunis, P. F. M., Medema, G. J., Kruidenier, L. and Havelaar, A. H. (1997). "Assessment of the risk of infection by Cryptosporidium and Giardia in drinking water from a surface water source" Water Research, 31, (6), 1333-1346.
Teunis, P. F. M. and Havelaar, A. H. (2000). "The Beta-Poisson Model Is Not a Single-Hit Model" Risk Analysis, 20, (4), 513-520.
Thorndahl, S., Schaarup-Jensen, K. and Jensen, J. B. (2008). "Probabilistic modelling of combined sewer overflow using the First Order Reliability Method" Water Science and Technology, 57, (9), 1337-1344.
United States Environmental Protection Agency (1989). Drinking Water; National Primary Drinking Water Regulations; Filtration, Disinfection; Turbidity, Giardia Lamblia, Viruses, Legionella and Heterotrophic Bacteria; Final Rule, 40 CFR Parts 141 and 142, Federal Register, Vol. 54 No. 124, 27486 - 27541.
United States Environmental Protection Agency (2005a). Economic Analysis for the Final Long Term 2 Enhanced Surface Water Treatment Rule, Office of Water, 319.
United States Environmental Protection Agency (2005b).
Membrane Filtration Guidance Manual, Office of Water.
United States Environmental Protection Agency (2005c). Method 1622: Cryptosporidium in water by filtration/IMS/FA, EPA 815-R-05-001, 67p.
United States Environmental Protection Agency (2005d). Method 1623: Cryptosporidium and Giardia in water by filtration/IMS/FA, EPA 815-R-05-002, 68p.
United States Environmental Protection Agency (2006). National Primary Drinking Water Regulations: Long Term 2 Enhanced Surface Water Treatment Rule; Final Rule, 40 CFR Parts 9, 141, and 142 [EPA-HQ-OW-2002-0039; FRL-8013-1] RIN 2040-AD37, Federal Register, Vol. 71, No. 3, 654-786.
Vasquez, J. A., Maier, H. R., Lence, B. J., Tolson, B. A. and Foschi, R. O. (2000). "Achieving water quality system reliability using genetic algorithms" Journal of Environmental Engineering, 126, (10), 954-962.

4 CONCLUSION

This thesis develops and presents reliability tools in the context of microbial drinking water quality. Applications of techniques for hazard identification and quantification are demonstrated for two different water treatment technologies used to remove the pathogen Cryptosporidium parvum, which has been one of the recent focuses of the water industry. Qualitative FTA and quantitative reliability analysis have applications for the different stakeholders of the industry.

4.1 Summary

The major contributions of this thesis are summarized here.

4.1.1 Application of FTA to identify technical hazards

QMRA in the water industry and other reliability approaches often take the point of view of inherent reliability, looking at the end product and quantifying the consequences of its variability. Increasing water treatment plant reliability, and thus improving the protection of public health, requires taking the mechanical reliability point of view, in which the important processes whose failures affect effluent quality are identified, along with ways to mitigate their impact.
With the advent of new regulations and the use of new treatment technologies to treat new pathogens and to reach new regulatory water quality objectives, there is a need to improve our understanding of the factors that may impede the performance of the different technologies. Different hazard identification methods such as HACCP, HAZOP, FMEA, ETA or FTA have been applied in various industries, with HACCP being the most widely applied in the water industry and FTA having been applied in some cases. Chapter 2 of this thesis provides an approach for soliciting operator knowledge and for building fault trees with the aim of improving the mechanical reliability of treatment technologies employed to remove the pathogen Cryptosporidium parvum from surface water. It describes the steps typically followed to build fault trees, and the iterative process formulated in this project to solicit knowledge from treatment plant operators. The approach is then applied to two case studies involving a new and a conventional physicochemical process, namely a UF system and a rapid granular filtration system. This application represents a first attempt at collecting information about technical hazards of water treatment technologies into a single fault tree. Results such as the differences found between the technologies, the future uses of the fault trees, and the potential extension of this tool are discussed. Findings are reviewed in Section 4.2.

4.1.2 Reliability analysis to incorporate uncertainties in QMRA

With the discovery of previously unidentified pathogens, the emergence of novel treatment technologies, the initiation of new regulations, and the inherent variability of many factors, decision making in the water treatment industry is complex. The need for methods to take uncertainties into account has been pointed out previously. QMRA has been used in the past with uncertain inputs, such as variable treatment plant performances and pathogen concentrations.
Yet, very few authors have proposed reliability analysis as an effective way of incorporating uncertainties in decision-making practice in the water industry. Research and regulations in the field of microbial quality of drinking water have recently focused on the protozoan Cryptosporidium parvum. UF membrane processes have been developed in part to remove small parasites, including protozoa, from the water. In Chapter 3, the risk of infection by Cryptosporidium parvum in a Canadian context is computed for a full-scale UF membrane plant. A reliability approach for applying QMRA is proposed. Reliability analysis is used to take into account the variability and uncertainty of the different variables affecting the risk of infection. Full-scale operational data are used to characterize the performance of the UF system. As a means of comparison, the same approach is applied to conventional physicochemical treatment train reference cases, for which the performance is obtained from previous literature. The inputs to the QMRA model are described by probability distributions. FORM is used to combine the various distributions into the resulting reliability of the water treatment plants. Uncertainties are successfully taken into account, and the technique described can be used in different settings, such as in formulating regulation or in design. Yet, other issues have been identified that will need to be addressed, such as the applicability of the model in the Canadian context, the verification of some QMRA hypotheses, and the quantification of various uncertainties. The findings and possible additions to the field of treatment plant reliability are summarized in Section 4.2.

4.2 Findings

The following findings can be extracted from the results of this thesis.

• Physicochemical treatment technologies designed to remove Cryptosporidium parvum require competent operation to avoid technical hazards.
• In UF plants, for operational hazards leading to contamination of finished water with Cryptosporidium oocysts, hazards to the membrane modules are more numerous than hazards to the other processes in the plant. For conventional physicochemical treatment plants, hazards to the pre-treatment processes are more numerous.
• FTA allows identification of hazards, but most of these were previously known by plant operators. It is nonetheless a useful tool for summarizing technical hazards and for the following uses: prevention, post-failure diagnosis, education, design verification, and prioritizing interventions.
• Quantitative analysis of the fault trees may require a large amount of data but could be useful to prioritize risk-reducing interventions.
• QMRA model assumptions need to be addressed before applying QMRA models to a particular context. This is the case in Canada, where few studies have taken place.
• Based on the assumptions made in this thesis, the UF plant studied provides treated water with a lower risk of infection more reliably than the conventional treatment reference cases established from the Microrisk project.
• The time scale at which the variables Craw, V and LRVtotal are computed may impact the resulting reliability. Since microbial contaminants such as Cryptosporidium parvum have acute short-term health effects, using a short time scale (daily) is advised. Data in subsequent studies should be obtained accordingly.
• Correlations between input variables can influence the results of a reliability analysis. Knowledge about these correlations is currently relatively sparse.
• The FORM technique identified infectivity and raw water concentrations as variables for which additional information would result in a reduction in the range of reliability and provide a more precise estimate of the risk of infection.
• The water treatment industry should seek inspiration from other engineering disciplines where reliability is more commonly used in standard development and system design.

4.3 Future work

In future research, other hazard identification methods should be compared to FTA, and FTA should be applied to treatment plants in different environmental settings, to determine if unknown or unexpected hazards can be identified. FTA may not have detected unknown hazards in the cases described here because the drinking water industry, and specifically the two plants described in the case studies, have implemented good risk reduction practices to the point where most hazards had already been identified. These propositions need to be verified.

In an effort to improve the mechanical reliability of treatment plants, interventions should aim at reducing the probability of occurrence of primary events identified in the fault trees. A first step toward this improvement is the quantification of these probabilities of occurrence. Such quantification would help prioritize interventions toward events of greater probability.

To further develop QMRA-based reliability analysis, the hypotheses of the QMRA model should be addressed and verified in the Canadian context of application. From a regulation perspective, it would be valuable to open the debate on “tolerable” risk and the reliability that should be associated with this level of risk. Insights on these issues could be drawn from engineering disciplines where reliability studies have been undertaken for a longer period of time than they have in the water industry.

Some variability and uncertainties of the inputs have not been addressed in this thesis. Attention should be given to these uncertainties, and it should be determined whether they impact the range of risks significantly. The FORM technique identified infectivity as having the most influence on the reliability results presented in Chapter 3.
Also, few data are usually available to characterize the raw water concentration of Cryptosporidium parvum. More work on this issue would also help reduce uncertainty. The impact of uncertainties on the log removal model of pressure-decay tests for UF technologies could also be quantified. As explained previously, significant improvement is needed in the technique used to quantify oocyst removal by rapid granular filters. There are now many doubts about the validity of turbidity as a surrogate for quantitative risk evaluation, and the recognition of variability in regulatory approaches seems to be minimal.

Finally, it will also be important to determine the correlations between the different inputs of the QMRA model. The degree of influence of these correlations on the results of the reliability analysis needs to be determined.

In the end, it needs to be stated, as mentioned by Hrudey (2001), that the purpose of risk assessment, and similarly of reliability analysis, is to inform, not to make decisions. The risk of infection, and the reliability of a treatment plant for that level of risk, constitute only one criterion on which a treatment process may be evaluated (Longpré et al., 2004; Gregory et al., 2006). Other considerations, such as cost, operability, other health issues, and the perception of the technology, are also relevant. This thesis shows that at least the performance criterion, and certainly other criteria, must be evaluated rigorously to support informed decisions.

4.4 References

Gregory, R., Failing, L., Ohlson, D. and McDaniels, T. L. (2006). "Some pitfalls of an overemphasis on science in environmental risk management decisions", Journal of Risk Research, 9, (7), 717-735.
Hrudey, S. E. (2001). "A Risk Management Approach", Water, 26, (1), 29-32.
Longpré, É., Bouchard, C., Abi-Zeid, I. and Rodriguez, M. (2004). "The development of a multicriteria decision-aid framework for the selection of a drinking water treatment system", 4th international conference on decision-making in urban and civil engineering, Porto, Portugal, October 28-30.
