Open Collections

UBC Theses and Dissertations

Modelling of drinking water treatment and disinfection by-product formation with artificial neural networks de Medeiros Paulino, Rafael 2019

Modelling of drinking water treatment and disinfection by-product formation with artificial neural networks

by

Rafael de Medeiros Paulino

B.Sc., Federal University of Rio Grande do Norte, 2016

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Applied Science in The Faculty of Graduate and Postdoctoral Studies (Civil Engineering)

The University of British Columbia (Vancouver)

August 2019

© Rafael de Medeiros Paulino, 2019

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled: Modelling of drinking water treatment and disinfection by-product formation with artificial neural networks, submitted by Rafael de Medeiros Paulino in partial fulfillment of the requirements for the degree of Master of Applied Science in Civil Engineering.

Examining Committee:
Pierre R. Bérubé, Civil Engineering (University of British Columbia), Supervisor
Caetano C. Dorea, Civil Engineering (University of Victoria), Supervisory Committee Member

Abstract

The source water for Coquitlam Water Treatment Plant (CWTP) originates from a watershed located in the mountains north of the City of Vancouver (BC, Canada), providing approximately 20% of the water demand for the metropolitan area. Treatment at CWTP consists of ozonation, followed by UV for primary disinfection and chlorination for secondary disinfection.

Ozone is used to increase the UV transmittance (UVT) of the water, and to reduce the formation of chlorinated disinfection by-products (DBP) in the distribution system. Ozone addition at the CWTP is currently dosed proportionally to the flow being treated. This approach does not take into consideration the complex interactions that exist between varying raw water characteristics and ozone, and how these impact changes in UVT or DBP formation.
Advanced numerical computational techniques, such as artificial neural networks (ANN), are increasingly being used to objectively identify optimal operating setpoints for complex systems. In the present study, two sets of ANN models were developed to optimize ozone addition for effective UV treatment and control of DBP formation.

The first set, the treatment system operation models, was used to predict pre-chlorination UVT, a surrogate for the DBP formation potential, based on raw water characteristics and CWTP operational setpoints. The second set, the distribution system models, was used to predict the formation of total haloacetic acids (HAA), total trihalomethanes (THM) and two HAA fractions (DCAA and TCAA) in the distribution system based on raw and treated water characteristics.

The treatment models could accurately predict pre-chlorination UVT. A moderate correlation was also observed between the measured and predicted DBP concentrations using the distribution system models, although significant scatter was observed. The scatter was likely due to the small available dataset and the lack of reliable estimates of retention time and chlorine concentration in the distribution system. Scenario analyses with selected models were performed to investigate the possible operational benefits of using these machine learning models to control ozone dosing. The analyses suggested that savings could be achieved, and that a high and constant level of pre-chlorination UVT could be maintained, if ozone were dosed using these artificial neural network models.

Lay summary

The goal of the present study is to use advanced numerical techniques, generally known as machine learning algorithms, to model the operation of the Coquitlam Water Treatment Plant. These models can be used to operate the treatment plant optimally while guaranteeing safe drinking water for the inhabitants of the region.
Preface

This dissertation is original, unpublished, independent work by the author, Rafael de Medeiros Paulino.

Table of contents

Abstract
Lay summary
Preface
Table of contents
List of tables
List of figures
List of abbreviations
1. Introduction
2. Background and Literature review
  2.1. Ozone and its applications in drinking water
  2.2. Disinfection by-products
    2.2.1. Introduction
    2.2.2. Trihalomethanes
    2.2.3. Haloacetic Acids
    2.2.4. DBP formation models
    2.2.5. Drinking water treatment and DBP history in Vancouver
  2.3. Machine learning algorithms
    2.3.1. Introduction
    2.3.2. Basic elements of an ANN
    2.3.3. Activation functions
    2.3.4. Error function, backpropagation and optimizing algorithms
    2.3.5. ANN architecture design and hyperparameters
    2.3.6. Overfitting and underfitting
    2.3.7. Looking into what is inside the “black-box”
  2.4. ANNs in water and wastewater treatment
3. Objectives
4. Materials and methods
  4.1. Problem definition and design approach
  4.2. Basics elements of model development
  4.3. Treatment system operation models
    4.3.1. Introduction
    4.3.2. Data preprocessing
    4.3.3. Inputs and outputs selection
  4.4. Distribution system models
    4.4.1. Introduction
    4.4.2. Data preprocessing
    4.4.3. Inputs and outputs selection
  4.5. Scenario analyses
5. Results and discussion
  5.1. ANN training
  5.2. Treatment models
    5.2.1. Assessment
    5.2.2. Accuracy analyses
    5.2.3. Trend analyses
  5.3. Distribution system models
    5.3.1. Assessment
    5.3.2. Selection of models for further analysis
    5.3.3. Accuracy analyses
    5.3.4. Trends analyses
  5.4. Scenario analyses
    5.4.1. Assessment
    5.4.2. Ozone dosing to control HAA formation
    5.4.3. Ozone dosing to control treated water UVT
6. Conclusions
7. Recommendations
References
Appendix A: Models’ architectures and hyperparameters.
Appendix B: Logic diagram flow of the selection of architectures and hyperparameters of ANN models.
Appendix C: Sample code used in the present study
Appendix D: Logic diagram flow of the procedure used in one of the scenario analyses.
Appendix E: Process flow diagrams from CWTP
Appendix F: Sensors (i.e. tags) used for the development of models

List of tables

Table 1 – Examples of studies that used ANNs in water and wastewater treatment applications
Table 2 – Ranges for which parameters were considered as correct measurements
Table 3 – Treatment model development phase
Table 4 – Distribution models development phase
Table 5 – Correlation coefficients and MSE of the treatment system models
Table 6 – Treatment system operation models multiple linear regressions
Table 7 – Treatment system and operation models’ errors and statistical parameters
Table 8 – Distribution system models multiple linear regressions

List of figures

Figure 1 – Lewis diagrams for the two contributing structures of the ozone molecule resonance hybrid
Figure 2 – Typical ozone decay in water process
Figure 3 – Drinking water treatment and distribution system and DBP sampling stations in Metro Vancouver
Figure 4 – Schematic drawing of CWTP processes
Figure 5 – Isolated measurements and running averages of four consecutive measurements of THMs and HAAs in the metropolitan area of Vancouver
Figure 6 – General ANN architecture representation
Figure 7 – One input, one node and one output ANN representation
Figure 8 – Rectified Linear Unit graphical representation
Figure 9 – Example of the error function behaviour depending on the weights and biases
Figure 10 – Comparison between Adam and other frequently used optimizing algorithms
Figure 11 – Representation of typical behaviours during training with different learning rates
Figure 12 – Representation of typical behaviours for different learning rates over iterations
Figure 13 – Example of underfit, appropriate and overfit models
Figure 14 – Summary of inputs and outputs of the models
Figure 15 – Example of the results of the removal of inconsistencies shorter than 12h
Figure 16 – Example of the results of the removal of data outside of acceptable range
Figure 17 – Typical error vs. learning rate
Figure 18 – Typical error vs. number of steps
Figure 19 – Typical predicted vs. measured treated water UVT
Figure 20 – Slope and intercept of regression through the data for treatment models
Figure 21 – Measured treated water UVT or ΔUVT vs. measured input parameters of interest
Figure 22 – Typical results for predictions vs. inputs of interest analysis
Figure 23 – Typical results for relative variable importance analyses of the treatment system models
Figure 24 – Comparison between Garson’s algorithm and MLR results for treatment models 1,2,3
Figure 25 – Typical box-plot distribution of non-biodegradable DBPs at selected sampling stations in Metro Vancouver
Figure 26 – Seasonal variability of total HAAs at the sampling stations in the City of Maple Ridge
Figure 27 – Typical distribution of total HAAs, DCAA and TCAA at the sampling stations in the City of Maple Ridge
Figure 28 – Typical result of an ANN model that predicts DBP at the City of Maple Ridge
Figure 29 – Typical predicted vs. measured total HAA
Figure 30 – Slope and intercept of regression through the data for distribution models
Figure 31 – Measured total HAA vs. measured input parameters of interest
Figure 32 – Typical results for predictions vs. inputs of interest analysis
Figure 33 – Typical results for relative variable importance analyses of the distribution system models
Figure 34 – Comparison between Garson’s algorithm and MLR results for distribution models 1,2,3
Figure 35 – Scenario analysis for HAA concentration based on ozone dose
Figure 36 – Typical scenario analysis to maintain constant treated water UVT
Figure 37 – Estimated annual O&M costs with UV-disinfection and ozonation
Figure 38 – Estimated annual O&M costs with UV-disinfection and ozonation combined
Figure 39 – Estimated annual O&M savings associated with implementation of ANNs

List of abbreviations

ANN     Artificial neural network
CWTP    Coquitlam Water Treatment Plant
DBP     Disinfection by-product
DOC     Dissolved organic carbon
HAA     Haloacetic acid
MSE     Mean squared error
MLP     Multilayer perceptron
NOM     Natural organic matter
O&M     Operation and maintenance
ReLU    Rectified Linear Unit
TOC     Total organic carbon
THM     Trihalomethane
UVA     Ultraviolet absorbance at 254nm
UVT     Ultraviolet transmittance at 254nm

1. Introduction

Water and wastewater treatment are achieved using an array of physical, chemical and biological treatment systems.
The processes are selected and designed not only to complement one another, but also to provide multiple barriers for the removal of contaminants of concern (Ramalho, 2013). However, the current approach to operating water treatment systems is highly subjective. The operation of most treatment plants relies on operator judgment, flow-proportional setpoints and/or bench-scale tests. These do not provide the objective real-time information essential to optimize the operation of the systems. The complexity is amplified by the fact that the performance of each component of a treatment system relies upon the functioning of the other components, as well as on the raw water characteristics, both of which can change rapidly over time (Zhang & Stanley, 1999; Liu & Ratnaweera, 2016).

As an example of a drinking water treatment system, the Metro Vancouver Coquitlam Water Treatment Plant (CWTP) provides treatment using ozonation, UV radiation and chlorination, supplying approximately 20% of the regional demand for potable water (Metro Vancouver, 2019). Ozone is dosed proportionally to the flow being treated at CWTP, in an attempt to maintain a constant ozone concentration at the point of application. However, because ozone consumption depends on the characteristics of the raw water, which change over time, ozone consumption is highly variable (i.e. ozone is consistently under- or overdosed). The amount of ozone available affects the characteristics of the water being treated, impacting, for example, UV transmittance and therefore the required dose of UV and the extent of disinfection that can be achieved. In addition, ozonation is the only process in the system with the capacity to oxidize natural organic matter. Without ozonation, the concentration of disinfection by-products (DBPs) in sections of the distribution system supplied by CWTP would likely exceed the limits prescribed by the Canadian Drinking Water Quality Guidelines (CDWQG) (Chin & Bérubé, 2005).
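The limitation of flow-proportional dosing described above can be illustrated with a minimal sketch. It assumes a hypothetical dose setpoint and hypothetical raw water ozone demands (neither is a CWTP value), and it treats the residual simply as dose minus demand, which is a deliberate simplification. The function names are illustrative only.

```python
def ozone_feed_kg_per_day(flow_ml_per_day: float, dose_mg_per_l: float) -> float:
    """Ozone mass feed rate for a flow-proportional setpoint.

    1 mg/L applied to 1 ML/d equals 1 kg/d, so the feed rate is the product.
    """
    if flow_ml_per_day < 0 or dose_mg_per_l < 0:
        raise ValueError("flow and dose must be non-negative")
    return flow_ml_per_day * dose_mg_per_l


def ozone_residual_mg_per_l(dose_mg_per_l: float, demand_mg_per_l: float) -> float:
    """Ozone left after the raw water demand is met (simplified; never negative)."""
    return max(dose_mg_per_l - demand_mg_per_l, 0.0)


DOSE = 2.0  # mg/L, hypothetical setpoint (not a CWTP value)

# The feed rate follows the flow, so the applied dose stays constant...
for flow in (300.0, 380.0):  # ML/d, illustrative flows
    print(flow, ozone_feed_kg_per_day(flow, DOSE))

# ...but the residual still changes with the (hypothetical) raw water demand.
for demand in (1.0, 1.5, 2.5):  # mg/L
    print(demand, ozone_residual_mg_per_l(DOSE, demand))
```

Under these assumptions the applied dose is identical at every flow, yet the ozone left over after the demand is met ranges from a surplus to zero, which is the under- and overdosing behaviour that a flow-proportional setpoint cannot correct.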
Because of the complexity of water treatment systems, no universal mechanistic models exist that can be used to objectively control the performance of each process. However, advanced data-driven numerical techniques, often referred to as machine learning algorithms, can be used to identify optimal operating setpoints for complex systems. Due to their predictive and pattern recognition abilities, these algorithms are now being used in a wide variety of applications, from food quality analysis and face recognition to water and wastewater treatment (Heaton, 2008).

Machine learning algorithms are not based on physical, chemical and/or biological mechanisms. Instead, these algorithms recognize, based on “learning” through previous training, that given certain inputs, some operational conditions can be optimized to achieve a certain goal (Chollet, 2017).

The present study investigated the use of artificial neural networks, a particular machine learning algorithm, to identify the optimal ozone dose to apply at the CWTP to effectively control (i.e. limit) DBP formation in sections of the distribution system supplied by the plant and to optimize the costs of ozonation and UV disinfection.

The present report is composed of: i) a literature review, which introduces key concepts necessary for understanding the rest of the study; ii) the major objectives of the present study; iii) materials and methods, which describes how the study was developed; iv) results; v) the major conclusions drawn; and vi) recommendations for future studies.

2. Background and Literature review

2.1. Ozone and its applications in drinking water¹

Ozone is an inorganic molecule consisting of three oxygen atoms (O3). The molecule is described as a hybrid between two different resonance structures, each with a single bond on one side and a double bond on the other (Figure 1).
This resonance, caused by the addition of the third oxygen atom, makes the ozone molecule less stable than its allotrope oxygen gas (O2) and a strong oxidizing agent.

Figure 1 – Lewis diagrams for the two contributing structures of the ozone molecule resonance hybrid. Adapted from (Langlais, et al., 1991)

Ozone is used in drinking water treatment for its oxidizing potential, commonly as a primary disinfectant and/or for taste and odour control. In some cases, ozone is also used when compounds in the water need to be oxidized (Rakness, 2011).

There are several methods to generate ozone, for example, the corona discharge method and ultraviolet-ozone generators. The corona discharge method starts with the vaporization of liquid oxygen. Small amounts of dried air are then injected into the gaseous oxygen. Passing this mixture through a high-voltage electric field excites the oxygen molecules and causes them to split and recombine as ozone molecules. Due to its instability, ozone usually has to be produced at the point of use.

The mechanisms and pathways of ozonation of compounds in water are not yet completely understood. These are very complex processes, often involving multiple steps and depending on numerous variables. Ozone can react directly with the compounds in solution (direct reaction pathway), or it can produce radicals, notably hydroxyl radicals (•OH), that then react with the compounds (indirect reaction pathway).

¹ Unless otherwise indicated, the material presented in this section is drawn from (Gottschalk, et al., 2010).

The direct reaction pathway usually involves reactions between ozone and unsaturated bonds, leading to the splitting of these bonds. Commonly, the reaction rate is proportional to the electron density of the compound, in the case of organic compounds, or to its degree of nucleophilicity, in the case of inorganic compounds. In general, ozone can react much faster with inorganic compounds.
One particular reaction of concern for drinking water is between ozone and bromide, which generates bromate, a potential carcinogen. Amongst organic compounds, aromatic and aliphatic molecules that carry electron-supplying substituents such as hydroxyl or amine groups are examples of those with which ozone reacts faster.

The indirect reaction pathway involves the production of radicals (molecules with unpaired electrons), most of which are highly reactive. The indirect chain mechanisms are commonly divided into three steps: initiation, chain propagation and termination.

Initiation is the decay of ozone to form secondary oxidants, accelerated by compounds known as initiators. One of the most common initiation pathways is the reaction between ozone and hydroxide ions (OH⁻), which leads to the formation of one superoxide anion (O₂⁻) and one hydroperoxyl radical (HO₂), the latter being in acid-base equilibrium with the superoxide anion (Equations 1 and 2).

$$O_3 + OH^- \rightarrow O_2^- + HO_2 \tag{1}$$

$$HO_2 \leftrightarrow O_2^- + H^+ \tag{2}$$

The ozone and the formed radicals continue reacting in the chain propagation step, forming different superoxide radicals, depending on factors such as pH and oxygen availability. Substances that convert hydroxyl radicals into superoxide radicals act as chain carriers, promoting the chain reaction. These substances are known as promoters. Humic acids are promoters commonly encountered in the context of water treatment.

Some organic and inorganic substances, known as scavengers, function as inhibitors, terminating the chain reaction and inhibiting ozone decay. They react with hydroxyl radicals (•OH) to form secondary radicals that do not produce superoxides. Carbonate (CO₃²⁻) and bicarbonate (HCO₃⁻) ions are amongst the most common scavengers.

Generally, direct ozonation plays a major role when the radical reactions are inhibited. Such is the case if the water either does not contain initiators or promoters or if it contains scavengers in higher concentrations.
Therefore, both inorganic and organic compounds play an important role in the ozone decay rate and the preferential reaction pathway.

Regardless of the predominant pathway, ozone decay in water can be described as a two-phase process: an instantaneous ozone demand (IOD), followed by a slower pseudo-first-order decay (Figure 2). The magnitude of the IOD may vary greatly depending on the source of natural organic matter and especially its hydrophilicity (Cho, et al., 2003).

Figure 2 – Typical ozone decay in water process (Zhang, 2007)

2.2. Disinfection by-products²

2.2.1. Introduction

Disinfection by-products (DBPs) are generally defined as chemical compounds that are formed by the reaction of a disinfectant used in water treatment with a compound naturally present in the water. Even though nearly all disinfectants commonly used in water treatment are capable of producing DBPs, for the present study, only those originating from the reaction between chlorinated species and natural organic matter (NOM) will be discussed.

The use of chlorine for disinfection in drinking water treatment started in the late 19th century, but it was only during the 1970s that potentially harmful halogenated compounds were identified as originating from reactions between chlorine compounds and NOM. Currently, more than 30 DBPs have been identified in water containing NOM and disinfected using chlorine compounds.

Numerous countries, including Canada and the United States, have regulations requiring a minimum residual chlorine concentration to be maintained in the distribution system as a means to prevent microbial regrowth, a process known as secondary disinfection. In general, the health risks from the presence of DBPs are considered to be less than the health risks from consuming water that has not been disinfected or has experienced microbial regrowth (Health Canada, 2006).
Studies conducted during the late 1980s in the United States (Krasner, et al., 1989) demonstrated that two classes of chlorinated DBPs were predominant on a weight basis in drinking water: trihalomethanes (THMs) and haloacetic acids (HAAs). These two classes are now regulated and monitored in North America and other parts of the world and are used as indicators for the presence of other DBPs in chlorinated drinking water.

To reduce DBP formation and/or increase the inactivation of certain microorganisms, non-chlorinated disinfectants can be used as primary disinfectants. Because of its strong germicidal ability, disinfection with UV is currently one of the most frequently used non-chlorine-based disinfection technologies.

² Unless otherwise indicated, the material presented in this section is drawn from (Singer, 1999).

Novel UV processes are being developed, but the use of UV for disinfection is still hampered by its high operating costs. UV units also require that the turbidity of the water be low for effective disinfection. In addition, because no chemicals are added to the water, no residual or secondary disinfection is provided (Oram, 2014).

2.2.2. Trihalomethanes

THMs are chemical compounds in which three of the four hydrogen atoms in the methane molecule (CH4) have been replaced with atoms of the halogen family: fluorine, chlorine, bromine, iodine, astatine or tennessine (Ivahnenko & Zogorski, 2006). For the purpose of the present study, the term THMs will refer only to chloroform, bromodichloromethane (BDCM), dibromochloromethane (DBCM) and bromoform, the most common THMs found in drinking water (Health Canada, 2006).

Of these, chloroform has been the most extensively studied. It is generally present in the highest concentration. It is formed by the reaction of chlorine with humic, fulvic and citric acids (Larson & Rockwel, 1979), triclosan (Rule, et al., 2005), resorcinol (Özbelge, 2001) or aldehydes and ketones (Deborde & von Gunten, 2008).
In addition to those compounds, BDCM, DBCM and bromoform can also be formed when brominated species are present in the water.

The health effects of THMs are not yet completely clear. Based on limited evidence, they are considered to be possible human carcinogens and to affect the reproductive system. There is no widely accepted concentration below which THMs are safe for human consumption. Most guidelines impose a very low maximum concentration. Currently in Canada this limit is 100 µg/L (Health Canada, 2006).

2.2.3. Haloacetic Acids

HAAs are carboxylic acids in which halogen atoms have replaced one or more of the methyl hydrogen atoms of acetic acid. Monochloroacetic acid (MCAA), dichloroacetic acid (DCAA), trichloroacetic acid (TCAA), monobromoacetic acid (MBAA) and dibromoacetic acid (DBAA) are the HAAs most commonly found in drinking water. These are often grouped and referred to as HAA5 or total HAA (Health Canada, 2008).

DCAA and TCAA are often present in higher concentrations than the rest and have been the subject of a greater number of studies. They are formed by the reaction of chlorine with humic and fulvic acids present in the water. Just as with BDCM, DBCM and bromoform, MBAA and DBAA can be formed when brominated species are present in the water.

The health effects associated with exposure to HAAs vary by compound. Some species, such as DCAA, TCAA and DBAA, are considered possible human carcinogens. MCAA has been observed to cause liver and kidney changes in animals. A single guideline for total HAA is commonly adopted. In Canada, the limit is currently 80 µg/L (Health Canada, 2008).

The rates at which THMs and HAAs are formed are pH dependent. In general, THM formation increases when pH increases, while HAAs are more predominant at neutral and acidic pH (Hung, et al., 2017).

2.2.4. DBP formation models³

Active research on DBP formation has resulted in a variety of predictive models.
Most of them were developed through laboratory or field-scale experiments using raw or synthetic waters and are, therefore, of an empirical nature. There are also a few studies that developed models based on kinetic relationships. Numerous parameters are commonly used in the development of these predictive models, including: total organic carbon (TOC), dissolved organic carbon (DOC), ultraviolet absorbance at 254 nm (UVA) and/or its counterpart UV transmittance at 254 nm (UVT), specific ultraviolet absorbance (SUVA), pH, temperature, bromide ion concentration, chlorine dose and reaction time.

UVA is a measurement of the amount of ultraviolet light absorbed by a water sample. It is typically measured at 254 nm, but can also be measured at other wavelengths. Measurements are typically performed ex-situ (i.e. in the laboratory) on filtered water samples. UVT, on the other hand, is a measurement of the amount of ultraviolet light that passes through a water sample. UVT is typically measured in-situ on unfiltered water samples. The absolute values of UVA and UVT can differ, especially when the water contains substantial particulate material (e.g. turbidity).

UVA and/or UVT are commonly used as surrogates for disinfection by-product formation potential (RealTech, 2017). Differential UVA and/or UVT resulting from chlorination can also be related to DBP formation potential (Korshin, et al., 2002). However, due to the complexity of the reactions and the number of variables involved, no universal mechanistic model exists that can predict DBP formation.

³ Unless otherwise indicated, the material presented in this section can be referenced to (Chowdhury, et al., 2009).

2.2.5. Drinking water treatment and DBP history in Vancouver

Metro Vancouver is a regional organization responsible for planning and delivering a number of services for the city of Vancouver and multiple other surrounding cities.
The services provided include drinking water and wastewater treatment and solid waste management. Metro Vancouver is also responsible for the distribution and monitoring of drinking water quality.

The source water used by Metro Vancouver comes from three protected watersheds located in the mountains north of the City. There are two main treatment plants that together provide water for approximately 2.5 million inhabitants in the region: Seymour-Capilano Filtration Plant and Coquitlam Water Treatment Plant (Metro Vancouver, 2019).

Metro Vancouver monitors THMs and HAAs at multiple points throughout the distribution system on a quarterly basis. DBP concentrations have been published in annual reports since 2002. To be in accordance with CDWQG, the average of four consecutive quarterly measurements must not exceed 100 µg/L for THMs and 80 µg/L for HAAs (Figure 3).

The Coquitlam Water Treatment Plant (CWTP), the focus of the present study, provides water to approximately five hundred thousand people (20% of the population) in the metropolitan region, treating about 380 million litres of drinking water per day (Metro Vancouver, 2019). Treatment at CWTP originally consisted of ozonation for primary disinfection and chlorination for secondary disinfection. Ozonation prior to chlorination also reduced the DBP formation potential (Chin & Bérubé, 2005). Since 2014, UV has been used after ozonation to provide primary disinfection (Figure 4). Ozonation is still used at CWTP to increase the UV transmittance (UVT) of the raw water prior to UV disinfection, to oxidize taste and odour compounds and to oxidize DBP precursors. Ozone is produced on site through the corona discharge method.

Figure 3 – Drinking water treatment and distribution system and DBP sampling stations in Metro Vancouver
Squares represent the drinking water treatment plants and circles represent sampling stations in which DBPs are monitored in Metro Vancouver.
DBP data from the five black circles were used in this study (see section 4.4.3). Adapted from Metro Vancouver’s GIS – Water Services

Figure 4 – Schematic drawing of CWTP processes

Historically, THMs have been below the CDWQG (Figure 5a). HAAs, on the other hand, have shown higher variance, with multiple isolated measurements above the guideline. Since 2002, the average of four consecutive quarterly measurements has exceeded the guidelines for HAAs on only one occasion (Figure 5b).

Figure 5 – Isolated measurements and running averages of four consecutive measurements of THMs and HAAs in the metropolitan area of Vancouver
Data taken from The Greater Vancouver Water District Quality Control Annual Reports (2002-2017). Dots represent isolated measurements. Lines represent the average of four consecutive measurements. (a) THMs and (b) HAAs.

Analyses of historical data and discussions with Metro Vancouver revealed that HAAs are the DBPs of greater concern in the region, as they were present in higher concentrations throughout the study period and have a stricter regulatory limit.

2.3. Machine learning algorithms⁴

2.3.1. Introduction

The human nervous system is an immensely complex biological neural network. Billions of interconnected neurons control most of our actions, such as breathing, moving and thinking. It is generally understood that the biological neural functions are stored in the neurons and in the connections, or synapses, between them. Learning is viewed as the formation of new connections between neurons and the modification of existing ones.

Artificial neural networks (ANN) are a type of machine learning algorithm that is, to some extent, based on these biological neural networks, mimicking their structures and trying to reproduce their learning capabilities.

ANNs have evolved through both conceptual innovations and implementation developments.
Initial background work in the field dates back to the late 19th century and consists of studies in a multitude of disciplines such as physics, psychology and neurophysiology. These studies contributed to the development of general theories for human learning and conditioning.

The development of the field of ANNs as it is presently known began in the 1940s. By the late 1950s, neural networks were already being applied, for instance, in pattern recognition. Since then, the growth of our understanding of biological learning mechanisms, the development of new mathematical models and the enormous technological improvements in computers and processors have enabled the exponential growth of the field. Currently, ANNs have a multitude of different applications, from aerospace autopilots to animations and special effects and to speech recognition and translation. These machine learning algorithms have become an indispensable part of our lives.

As a rule of thumb, ANNs are used in situations where one or more of the following apply:
i) There is no mechanistic model that can describe the phenomenon precisely and reliably, usually due to its complexity or to the difficulty of tracking and measuring the high number of variables that significantly impact the process (e.g. purity test of olive oil);
ii) The event is, by nature, subject to great randomness and/or human influence (e.g. stock market predictions); and/or
iii) Human brains are capable of doing the task, but an incredible amount of human work would be necessary to perform it on the desired scale (e.g. face recognition of people in airports).

⁴ Unless otherwise indicated, the material presented in this section can be referenced to (Hagan, et al., 2014).

Furthermore, the development of any ANN requires the existence of a reliable, representative and, preferably, vast dataset upon which the model can be trained to identify the impact of each variable on the expected outcomes of the event.
This process is known as learning the mapping function from the inputs. Hereafter, the terms ANNs and machine learning algorithms will refer specifically to methods based on learning data representation, also known as deep learning, as opposed to task-specific algorithms. Additionally, the discussions in the present study will focus on multilayer perceptron (MLP) ANNs, which are a class of feedforward neural networks (i.e. the nodes do not form a cycle; see section 2.3.2).

Machine learning algorithms can be classified into two large groups based on the way they learn: supervised and unsupervised learning. Supervised machine learning, the more commonly used of the two, requires that the possible outputs of the algorithm are known, and that the data used to train the model is labeled with correct answers. Unsupervised machine learning is often more complex and less widely applicable, being almost exclusively restricted to intricate pattern recognition problems addressed with techniques such as k-means clustering, principal and independent component analysis and association rules (Castle, 2017).

Supervised machine learning algorithms are often further divided into two categories: classifiers and regressors. Classifiers are algorithms that estimate the mapping function from the input variables into predetermined discrete groups. Simply put, they are trained to place outputs into categories, which are frequently non-numerical. For example, classifiers are often used in problems such as image recognition (e.g. identifying whether or not a certain object is present in an image). Regression algorithms estimate the mapping function from the input variables to numerical or continuous output variables. Simply put, they are trained to make numerical predictions (e.g. for a given set of inputs, what is the expected value of the output). Regressors are often used in stock market studies and estimations in general.
The present study used regressor algorithms, which were trained under supervised learning conditions.

2.3.2. Basic elements of an ANN

Artificial neural networks try to mimic the structure and functionality of biological neural networks. The general structure of an ANN, hereafter referred to as architecture, consists of three parts: the input layer, the hidden layer(s) and the output layer (Figure 6).

Figure 6 - General ANN architecture representation
Adapted from (Srivastava, 2014)

Each layer is formed by nodes, also known as neurons. The nodes in the input layer are “passive”, meaning that they do not perform transformations on the data. Their function is to bring the initial data to the model for further processing (Zurada, 1992). The nodes in the hidden layer(s) and the output layer, however, are “active” nodes. They receive the data from the previous layer and perform transformations. Typically, an active node consists of two units: a summation function and an activation function (Mayo, 2017). An ANN has only one input layer and one output layer, but the number of hidden layers is variable and affects the predictions of the ANN (see section 2.3.5).

The simplest ANN architecture consists of a single input, one hidden layer with a single node and a single node in the output layer (Figure 7). The hidden layer node “receives” the scalar “p” from the input layer. The summation function multiplies the scalar input “p” by a scalar weight “w” and adds a bias “b”. The output from the summation function (i.e. “wp+b”) is commonly referred to as the summer output. Based on the summer output, the activation function “f” produces a scalar “h”. Similarly, the output layer node receives the scalar “h” and transforms it into a scalar “a” using a summation function and an activation function. The scalar “a” is the model prediction.
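The forward pass through this simplest architecture can be sketched in a few lines. This is only an illustration under stated assumptions: the weights and biases are hypothetical values, ReLU is assumed for the hidden node and an identity activation is assumed for the output node, since the text does not fix a particular "f" at this point.

```python
def relu(x: float) -> float:
    """Assumed hidden-layer activation f: returns 0 for negative inputs."""
    return max(0.0, x)


def identity(x: float) -> float:
    """Assumed output-layer activation: passes the summer output through."""
    return x


def node(value: float, w: float, b: float, f) -> float:
    """One active node: summation function (w*value + b), then activation f."""
    summer_output = w * value + b
    return f(summer_output)


def predict(p: float, w_hidden: float, b_hidden: float,
            w_out: float, b_out: float) -> float:
    """One input, one hidden node, one output node."""
    h = node(p, w_hidden, b_hidden, relu)      # scalar "h" from the hidden node
    a = node(h, w_out, b_out, identity)        # scalar "a", the model prediction
    return a


# Hypothetical weights and biases, chosen only for illustration.
prediction = predict(p=2.0, w_hidden=0.5, b_hidden=-0.25, w_out=3.0, b_out=1.0)
print(prediction)
```

With these illustrative values the hidden node produces h = 0.75 and the output node produces a = 3.25; a negative summer output in the hidden node would instead give h = 0, leaving only the output bias.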
Figure 7 - One input, one node and one output ANN representation
Adapted from (Mayo, 2017)

The efficiency and accuracy of the ANN depend on several different factors, including: the activation function used (see section 2.3.3), the optimizing algorithm and the error function chosen (see section 2.3.4), and the network architecture (i.e. number of layers and nodes; see section 2.3.5). The initial weight and bias, “w” and “b”, are commonly randomly generated and, through iterative processes during the training phase, are adjusted to minimize the errors between the predicted and the expected outputs. Effective training should make this initial randomness have a negligible effect on the final prediction of the model.

The most common method of training an ANN is to randomly divide the available dataset into three parts: training set, cross-validation set and test set. The training set usually contains approximately 60% of the data and is used to train the ANN, based on the inputs and outputs. The cross-validation set, which contains approximately 20% of the data, is used to validate the training and to optimize the architecture and other parameters of the network (see section 2.3.5) to minimize the errors. The test set, which contains the remaining 20% of the data, is used to validate and assess the efficacy of the ANN model.

Being a data-driven type of model, ANNs are impacted by the quality (i.e. range and precision) of the data used for training. It is important to analyze and pre-process the dataset before training the ANNs (Nawi, et al., 2013). Normalizing the data by scaling inputs to a mean of zero and a standard deviation of one, for example, has been reported to improve the performance of some ANN models (Brownlee, 2019).

2.3.3. Activation functions⁵

The activation function is one of the most important components of an ANN. The activation function determines whether the summer output should be discarded or activated.
Activation functions help the models account for non-linearity and interactions between variables. There are numerous activation functions, each with its own advantages and disadvantages. The present study used the Rectified Linear Unit (ReLU) activation function.

ReLU, or variations of this function (e.g. PReLU, RReLU, SELU, CReLU), are currently the most commonly used activation functions in regressor ANN models. Some of their reported advantages are their capacity to handle non-linearities between variables and lower computational requirements when compared to other activation functions (Kakaraparthi, 2019). They are particularly suited for complex ANNs, with multiple inputs and nodes in the hidden layer (Wei, et al., 2018).

The ReLU function is a half-restricted (bounded from below) activation function. It returns zero if the summer output is negative, while for positive values, the value itself is returned (Figure 8). It can be mathematically represented by f(x) = max(0, x) (Sharma, 2017).

Figure 8 - Rectified Linear Unit graphical representation

⁵ Unless otherwise indicated, the material presented in this section can be referenced to (Becker, 2018).

The ReLU function is unbounded on the axis of possible activation values, meaning that there is no upper limit. Because the function returns zero for negative values, its gradient is also zero for negative summer outputs, and the corresponding weights are not updated during training. This can create “dead” nodes, which never get “activated”. This problem can usually be mitigated by adjusting the learning rate (see section 2.3.5) (Kakaraparthi, 2019).

2.3.4. Error function, backpropagation and optimizing algorithms

The accuracy of an ANN is computed through the error, or loss, function (E). This function describes the deviation between the predicted and expected output for different weights and biases. Most commonly, the error function uses mean squared error (MSE) as the error estimator, but other functions can also be used.
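As a small worked example of the MSE error estimator mentioned above, the sketch below averages the squared differences between predicted and expected outputs. The numbers are hypothetical and are not taken from the study's data.

```python
def mean_squared_error(predicted, expected):
    """Average of the squared differences between predictions and targets."""
    if len(predicted) != len(expected) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - e) ** 2 for p, e in zip(predicted, expected)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical expected and predicted outputs, for illustration only.
expected = [40.0, 55.0, 70.0]
predicted = [42.0, 52.0, 70.0]

# Squared errors are 4, 9 and 0, so the MSE is 13 / 3.
mse = mean_squared_error(predicted, expected)
print(mse)
```

Because the errors are squared, the MSE penalizes one large deviation more heavily than several small ones, and it is zero only when every prediction matches its expected value.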
The error magnitude depends on the set of all weights and biases. A simple 2-D representation of an error function is illustrated in Figure 9. As previously mentioned, ANN nodes usually start with randomly generated weight and bias values. The first predicted output will likely not be similar to the expected output. The goal of the training phase is to find the global minimum of the error function. Note that the global minimum of an error function is very rarely zero, meaning that the model will likely never predict the exact expected value.

Figure 9 - Example of the error function behaviour depending on the weights and biases
Adapted from (Zulkifli, 2018)

ANNs learn through an iterative process of changing the weights and biases in each of the active nodes (i.e. nodes in the hidden layers and output layer) and analyzing the impacts on the predicted output. The process of quantifying the error and “distributing” it backwards to train the network is generally called backpropagation, and it occurs during the training phase. The "backwards" part of the name comes from the fact that the weights and biases are optimized first for the output layer and then for the hidden layers, from last to first (McGonagle, et al., 2019).

There are numerous algorithms, referred to as optimizing algorithms, that can be used to minimize the error function. These algorithms are generally classified as first or second order algorithms. First order optimizing algorithms minimize the error function using its gradient values. The gradient at each point of the error surface is a vector that represents the direction and the value of the slope at that point. The most common class of first order optimizing algorithms is called gradient descent. In this type of algorithm, the gradient vectors, ∇, are stored in a Jacobian matrix formed by the first order partial derivatives of the error function with respect to the set of all weights and biases, θ (Walia, 2017).
The Jacobian matrix stores gradients for multiple points on the surface and is used to guide the algorithm towards the minimums on the error surface (Burney, et al., 2007; Isac & Nemeth, 2008; Stranbury, 2014) (Eq 3).

$$\nabla(p_1, p_2, \theta) = \frac{\partial E(p_1, p_2, \theta)}{\partial \theta}$$ (3)

Second order optimizing algorithms use the second order derivative to minimize the error function. Most ANNs use first order algorithms, as they are easier to compute and require less computing time, with rapid convergence for large datasets. Second order techniques are mainly used when the second order derivative is known beforehand (Walia, 2017).

One of the most widely used and accepted optimizing algorithms is the Adam optimizing algorithm. Adam is a first order, gradient descent optimizing algorithm. However, it also adapts the step size for each weight and bias by storing and using exponentially decaying averages of both the past gradients and the past squared gradients (i.e. estimates of their mean and uncentered variance). Benefits of the Adam optimizing algorithm include: high computational efficiency, limited memory requirements and good performance on large datasets (Kingma & Ba, 2014) (Figure 10).

Figure 10 – Comparison between Adam and other frequently used optimizing algorithms. (Kingma & Ba, 2014)
Note: training cost is a synonym of error function.

2.3.5. ANN architecture design and hyperparameters

ANNs with a single node in the hidden layer often cannot be used to model complex systems (Hagan, et al., 2014). Adding more nodes in the hidden layer or adding more hidden layers in series is often required to increase the capacity of ANNs to model complex systems. Note that different nodes can use different activation functions (Hagan, et al., 2014). In an MLP ANN, each node is connected to all nodes in the previous and subsequent layers (Figure 6). Nodes assign weights and biases to each of the inputs.
Increasing the number of nodes in the hidden layer or increasing the number of hidden layers increases the computational requirements of the ANN model. There is no definitive method to find the optimal number of hidden layers and nodes per layer. It is common to empirically test different architectures and settle upon the one that generates the minimum error (Gad, 2017). In ANNs with a single hidden layer, as a rule of thumb, the number of nodes in the hidden layer is chosen to be between the number of nodes in the input layer and the number of nodes in the output layer. Some other empirically derived guidelines can be obtained from the literature (Heaton, 2008). In practice, one hidden layer is generally sufficient to model the majority of systems (Ng, 2015).

In addition to the ANN architecture, the activation functions and the optimizing algorithms, other parameters can impact the accuracy and computational requirements of a model. These are known as hyperparameters and include the learning rate, the batch size and the number of steps.

The learning rate is the step size used by the optimizing algorithm when changing the set of weights and biases from one iteration to the next. The optimal learning rate is the one that makes the model reliably converge to the lowest possible error with the lowest number of iterations (Figure 11a). If the learning rate is lower than the optimal, training takes longer and the model might get “stuck” in a local minimum on the error surface, failing to identify the global minimum (Figure 11b and c). If the learning rate is higher than the optimal, training can be faster, but the model might overshoot the minimum, fail to converge or even diverge (Figure 11d and e). Figure 12 summarizes typical behaviours for different learning rates over training iterations. Guidelines exist to assist in identifying the most appropriate learning rate (Smith, 2017).
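The effect of the learning rate can be illustrated with plain gradient descent on a toy error function. The sketch below assumes a hypothetical one-parameter error surface, E(θ) = θ², which has a single minimum at θ = 0; it is not an ANN and therefore cannot show the local-minimum behaviour, only slow convergence, fast convergence and divergence.

```python
def gradient_descent(learning_rate: float, theta: float = 1.0, steps: int = 20) -> float:
    """Minimise the toy error function E(theta) = theta**2.

    Returns the error after a fixed number of update steps.
    """
    for _ in range(steps):
        gradient = 2.0 * theta                 # dE/dtheta for E = theta**2
        theta -= learning_rate * gradient      # step against the gradient
    return theta ** 2


# Hypothetical learning rates spanning "too low" to "too high" for this toy surface.
for lr in (0.001, 0.1, 0.5, 1.1):
    print(f"learning rate {lr}: error after 20 steps = {gradient_descent(lr):.3e}")
```

For this particular surface each update multiplies θ by (1 - 2 × learning rate), so a rate of 0.001 barely reduces the error in 20 steps, 0.1 reduces it steadily, 0.5 reaches the minimum in one step, and 1.1 makes the error grow at every step. The thresholds are specific to this toy function and do not transfer to a real network.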
Typically, an ANN is trained with different learning rates and then both the error function and the required computing time are analyzed. Learning rates are usually picked between 10^-5 and 10^0 at logarithmic intervals.

Typically, the dataset used to train an ANN contains many measurements (i.e. multiple sets of inputs and corresponding outputs), and cannot be processed all at once. The data is divided into smaller groups, or batches. The batch size defines how many sets of measurements are going to be processed at a time. In other words, the batch size defines, during the training phase, how many sets of inputs are processed before the optimizing algorithm analyzes the error function and adjusts the weights and biases. The number of steps defines how many times these adjustments are going to be made (Hagan, et al., 2014). Once the model has processed the entire dataset, it is said to have completed an epoch, or one full cycle through the training data. Typically, multiple epochs are needed to complete training (Tolotra, 2018). Batch size and number of steps are related. The most appropriate setpoints for these are usually identified in parallel.

Figure 11 - Representation of typical behaviours during training with different learning rates
a: Optimal learning rate; b: Low learning rate; c: Very low learning rate; d: High learning rate; e: Very high learning rate. Adapted from (Zulkifli, 2018)

Figure 12 - Representation of typical behaviours for different learning rates over iterations
Adapted from (Zulkifli, 2018)

Batch size and number of steps have to be properly adjusted to optimize the performance and computational requirements of a model. The ideal batch size is typically between 2 and 32 sets of data⁶, depending on the dataset size (Masters & Luschi, 2018).
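The relationship between batch size, number of steps and epochs described above can be made concrete with a little arithmetic. The sketch below assumes that one step corresponds to one batch and one weight update, and that a final, smaller batch still counts as a step; the dataset size and step count are hypothetical.

```python
import math


def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of weight updates needed to pass once through the training data."""
    return math.ceil(n_samples / batch_size)


def epochs_completed(n_steps: int, n_samples: int, batch_size: int) -> float:
    """How many full passes through the data a given number of steps represents."""
    return n_steps / steps_per_epoch(n_samples, batch_size)


def batches(data, batch_size: int):
    """Yield consecutive slices of the data, one slice per training step."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


# Hypothetical training set of 1000 measurements and a batch size of 32.
print(steps_per_epoch(1000, 32))          # 31 full batches plus one partial batch
print(epochs_completed(3200, 1000, 32))   # number of epochs covered by 3200 steps
print(list(batches(list(range(10)), 4)))
```

Under these assumptions, halving the batch size doubles the number of steps needed to complete the same number of epochs, which is one reason the two setpoints are usually tuned together.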
Larger batch sizes (>512) may lead to degradation of the predictive capability of the model (Keskar, et al., 2017). It is common to select the batch size and number of steps by trial and error.

⁶ Batch sizes are often chosen as a power of two.

2.3.6. Overfitting and underfitting

The true underlying pattern in a dataset is called the signal. Noise, on the other hand, refers to irrelevant information or randomness present in the dataset. Ideally, predictive models should learn to reproduce the signal while ignoring the noise (Silver, 2012). Non-optimized models can suffer from two problems associated with the signal and the noise: overfitting or underfitting.

Overfitting occurs when a model learns to predict the noise associated with the dataset. Overfit models display extremely high accuracy when making predictions on the dataset they were trained with, but their performance decreases significantly when faced with a different dataset, even if it shares the same signal (Patni, 2018; Alencar, 2019). An ANN can be overfit if the model is too complex (i.e. too many input variables or architecture not optimized) or if the training is longer than necessary (i.e. number of steps and batch size not optimized). Overfitting is particularly problematic on small datasets (El Deeb, 2015; IRIC, 2017), as the fewer samples available for training, the more models can fit the data (Alencar, 2019).

Underfitting is the opposite. Underfit models are unable to fully learn the signal from the data because they are too simple, or not optimized, for the dataset. A model with an appropriate fit learns only the signal and not the noise from the training dataset. These models will have similar performances on any dataset with the same signal (Patni, 2018). Figure 13 displays a classic example of underfit, appropriate and overfit models.

Figure 13 - Example of underfit, appropriate and overfit models
The goal of the model in the figure is to separate circles from triangles.
The underfit model presents a sub-optimal, simplistic result. The overfit model learned the noise from the training data and, while it displays good results on this dataset, it will most likely perform poorly on different datasets. Appropriate fit models learn only the signal from the training dataset and have a similar performance on any dataset with similar signals. Adapted from (Patni, 2018)

2.3.7. Looking into what is inside the “black-box”

ANNs are generally considered to be “black-box” models. This is because it is difficult to quantify the importance of the input variables to the magnitude of the predicted outputs. Nonetheless, some approaches exist to quantify the importance of each variable in an ANN (Olden & Jackson, 2002). These attempt to simplify the complex architecture and function of ANNs to a set of weights similar to those of multiple linear regression models. The present study will focus on two approaches: Garson’s algorithm and the connection weight method.

Garson’s algorithm identifies the relative importance of input variables for specific output variables in a supervised ANN by deconstructing the model weights (Garson, 1991; Goh, 1995). The basic idea is that the relative importance, or strength of association, of a specific input for a specific output can be determined by identifying the weights in the nodes between them. The weights are analyzed for each input and their absolute values are scaled relative to all other inputs. A single value is obtained for each input that describes its importance on the outputs of the model (Equation 4). However, studies with artificial datasets for which the relationships between the input and output variables are known have demonstrated that Garson’s algorithm often fails to identify the correct importance of different inputs. This is because Garson’s algorithm uses the absolute value of weights to calculate the importance of the inputs (Olden, et al., 2004).
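The two approaches can be sketched for a single hidden layer, single output ANN using the matrix form that this section gives in Equations 4 and 5. This is a minimal illustration under stated assumptions: A holds the hidden-layer weights (one row per input, one column per hidden node), B holds the output-layer weights (one per hidden node), all weight values are hypothetical, and the absolute value in Equation 4 is read as being applied to each element of the product A·B. Published implementations of Garson's algorithm may apply the absolute values differently.

```python
def connection_weights(A, B):
    """Equation 5 (as read here): CW = A . B, one signed value per input."""
    return [sum(a_ij * b_j for a_ij, b_j in zip(row, B)) for row in A]


def garson_importance(A, B):
    """Equation 4 (as read here): |A . B| scaled so the values sum to 1."""
    magnitudes = [abs(cw) for cw in connection_weights(A, B)]
    total = sum(magnitudes)
    return [m / total for m in magnitudes]


# Hypothetical weights: 3 inputs (rows) and 2 hidden nodes (columns).
A = [[0.5, -1.0],
     [2.0, 0.5],
     [-1.0, -1.0]]
B = [1.0, 2.0]          # output-layer weights, one per hidden node

print(connection_weights(A, B))   # signed contributions: -1.5, 3.0 and -3.0
print(garson_importance(A, B))    # relative importance: about 0.2, 0.4 and 0.4
```

In this example the second and third inputs receive the same relative importance even though their connection weights have opposite signs, which illustrates the information that is lost when only absolute values are used.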
The connection weight method is similar to Garson’s algorithm, but it does not use absolute values and does not scale the importance of each input relative to all other inputs (Equation 5).

Take a single hidden layer and single output ANN as an example. A is defined as the matrix of weights in the hidden layer nodes (different rows of the matrix have the weights of different inputs and different columns have the weights of different nodes). B is defined as the vector of weights of the output layer nodes. The vector containing the relative importance of each input, or the connection weights vector ($\vec{CW}$), is then calculated as follows:

Garson’s algorithm
$$\vec{CW} = \frac{|A \cdot B|}{\sum |A \cdot B|}$$ (4)

Connection weights method
$$\vec{CW} = A \cdot B$$ (5)

The results of Garson’s algorithm, which are normalized to 100% relative importance, are a more intuitive display of the magnitude of the relative importance of each variable. The connection weights method indicates, besides the magnitude of the relative importance, whether the impact of the input on the output is positive or negative.

The relative importance analysis of the input parameters is one of the most studied and debated topics in the field of ANNs, and its results should be interpreted with caution. Amongst other reasons, issues with the repeatability of results are one of the contested aspects of the analysis (Olden, et al., 2004).

2.4. ANNs in water and wastewater treatment

Since their development, ANNs have attracted interest from a variety of scientific disciplines due to their complex pattern recognition and prediction capabilities (Olden, et al., 2004).

Even though the basic underlying mechanisms of water and wastewater treatment are understood, there are no mechanistic models that comprehensively describe most treatment processes. Many scholars have identified the potential of these machine learning algorithms for water and wastewater treatment modelling, optimization and cost analysis.
Table 1 displays an extensive list of studies that used ANNs in the context of water and wastewater treatment and/or distribution. Based on the studies listed in Table 1:
i) Most previous studies developed ANNs to model either a single process within water or wastewater treatment or the treatment as a whole (i.e. predict treated water quality);
ii) Few studies have modelled distribution systems. Most of the ones that did used simulated distribution system data. This is because the modelling of real-world distribution systems introduces several difficulties (e.g. dead-zones, complex decay kinetics, non-uniform disinfectant concentrations) that cannot be completely captured by most modelling approaches (Shimazu, et al., 2005; Platikanov, et al., 2007; Kulkarni & Chellam, 2010);
iii) None of the studies modelled both treatment and distribution of drinking water in parallel;
iv) A single study reported successful modelling of real-world distribution systems using ANNs (Ye, et al., 2011);
v) A single study reported the use of relative importance analysis to identify the dominant input variables, but discussion was limited (Kulkarni & Chellam, 2010); and
vi) No consistent framework was observed in past studies.

The present study proposed to use ANNs to model the processes in a drinking water treatment plant. Additionally, ANN models to predict the impact of treatment on the formation of DBPs at multiple points of a real-world distribution system were developed. Furthermore, the present study proposed a more in-depth investigation of the relative importance analysis of the input variables in ANN models. A comprehensive framework that could be used in the modelling of water and wastewater treatment and distribution using ANNs was proposed.

Table 1 – Examples of studies that used ANNs in water and wastewater treatment applications

Study | ANN application | Findings
(Gagnon, et al., 1997) | Modelling of coagulant dosage in a water treatment plant. | Models have been successfully developed and implemented.
(Baxter, et al., 1999) | Full-scale model for the removal of NOM by enhanced coagulation. | Model predicted effluent colour with a high degree of accuracy.
(Göb, et al., 1999) | Modelling of the kinetics of a photochemical water treatment process. | The model can describe the evolution of the pollutant concentration during irradiation time under various conditions.
(Zhang & Stanley, 1999) | Modelling of coagulant dosage in a water treatment plant. | Model was found to consistently predict the optimum alum dosage for different control actions.
(Gontarskia, et al., 2000) | Modelling of an industrial wastewater treatment plant. | Satisfactory predicted results were obtained for an optimized situation.
(Tupas, 2000) | Prediction of filter effluent particle counts in a water treatment plant. | Models were able to accurately estimate particle counts, especially for the fall season.
(Choi & Park, 2001) | Control of wastewater treatment processes using a hybrid ANN. | The hybrid ANN can be used to extract information from noisy data and to describe complex wastewater treatment processes.
(Baxter, et al., 2002) | Development of a methodology for developing successful ANN models of drinking water treatment processes. | Not applicable.
(Milot, et al., 2002) | Modelling of THM occurrence in drinking water. | ANN models gave similar or better results than other modelling techniques.
(Rodriguez, et al., 2003) | Modelling of THM occurrence in drinking water. | ANNs have a greater ability than MLRs to predict THM formation for most water quality and chlorination conditions.
(Shetty & Chellam, 2003) | Prediction of nanofiltration membrane fouling. | Accurate ANN predictions were possible. Simple ANNs are capable of capturing changes in feed water quality.
(Hamed, et al., 2004) | Prediction of wastewater treatment plant performance based on daily records of selected parameters through various stages of the treatment. | The ANN models were found to provide an efficient and robust tool for predicting WWTP performance.
(Heck, et al., 2004) | Modelling of the effectiveness of ozonation in drinking water. | The network was effective in predicting the outcome of ozone disinfection under conditions not previously encountered in training.
(Mjalli, et al., 2007) | Prediction of wastewater treatment plant performance. | ANNs are capable of capturing the plant operation characteristics with a good degree of accuracy.
(Griffiths, 2010) | Modelling of the filtration process in a water treatment plant (post-filtration particle count). | High correlation was observed between the predicted and measured datasets.
(Khataee & Kasiri, 2010) | Modelling of contaminated water treatment processes by homogeneous and heterogeneous nanocatalysis. | ANN models are an effective and simple approach to describe the behaviour of these complex processes.
(Kulkarni & Chellam, 2010) | Modelling of DBP formation using simulated distribution system data. | Highly accurate ANN models were developed.
(Ye, et al., 2011) | Modelling of DBPs using artificial neural networks. | The performance of the ANNs was “excellent” (r>0.84).
(McArthur & Andrews, 2015) | Modelling of coagulant dosage in a water treatment plant. | Highly accurate ANN models were developed.
(Marzouk & Elkadi, 2016) | Predicting construction costs of water treatment plants based on cost drivers. | Models were validated and their effectiveness was demonstrated.
(Giwa, et al., 2016) | Modelling of the removal of selected contaminants by an electrically-enhanced membrane bioreactor for wastewater treatment. | High correlations for all contaminants were observed.
(Liu & Ratnaweera, 2016) | Modelling of coagulant dosage in a water treatment plant. | ANNs enabled reductions of up to 15% in coagulant consumption.
(Manamperuma, et al., 2017) | Modelling of coagulant dosage in a water treatment plant. | ANNs enabled reductions of up to 30% in coagulant consumption.
(Karadurmuş, et al., 2018) | Modelling of bromate removal in drinking water. | Highly accurate ANN models were developed.

3. Objectives

The overall objective of the present study was to address some of the limitations of existing ANN models by developing a model to optimize ozone dose to minimize DBP formation. The specific objectives were:
• Develop a framework for the use of artificial neural networks to model the treatment and distribution of drinking water;
• Develop ANN models to estimate DBPs in the distribution system based on real-time ozone dose setpoints and raw water characteristics;
• Investigate the relative importance of the inputs in the ANN models; and
• Integrate treatment and distribution models to identify optimal ozone dose setpoints that minimize operational costs while ensuring compliance with CDWQG for DBPs.

4. Materials and methods

4.1. Problem definition and design approach

The current approach to operating water treatment systems is often subjective. The operational complexity is confounded by the fact that the performance of each component of a treatment system depends heavily on the functioning of the other components in the system and on the raw water characteristics, both of which can change rapidly (Zhang & Stanley, 1999; Liu & Ratnaweera, 2016).

CWTP doses ozone proportionally to the flow being treated. However, because ozone consumption depends on the characteristics of the raw water, which change over time, the ozone consumption is variable (see sections 2.1 and 2.2.4).

ANNs were developed that are trained on historical data and are capable of receiving and processing water quality and operational parameters in real time and predicting their impacts on DBP formation. Models for THMs, total HAAs, DCAA and TCAA were developed. Raw water quality data, as well as treatment system data (i.e. post-treatment water quality characteristics and operational setpoints), are available in real time.
Distribution system data (i.e. water quality characteristics, including DBPs), on the other hand, are collected on a quarterly basis (every three months). To account for the discrepancy in the frequency at which data are collected, two models were developed:
i) The treatment system operation model predicts treated water characteristics that impact DBP formation based on raw water characteristics and CWTP operational setpoints; and
ii) The distribution system model predicts the concentration of DBPs at different points of the distribution system based on treated water characteristics.

Theoretically, a single ANN is capable of predicting the concentrations of all these DBPs at the same time. However, to keep the models simple, separate ANNs were developed for THMs, total HAAs, DCAA and TCAA in all stages of the present study. Of the parameters outlined in sections 2.1 and 2.2.4 known to impact either ozonation or DBP precursors in the effluent of treatment systems, the following are measured in real-time at CWTP:
i) Raw water pH;
ii) Raw water temperature;
iii) Raw water turbidity;
iv) Raw water UVT; and
v) Ozone concentrations at the beginning and at the end of the ozone contactor.
These were used as inputs to the treatment system model.

Of the parameters outlined in section 2.2.4 known to impact DBP formation in the distribution system, the following were consistently present in the Water Quality Annual Reports:
i) Treated water pH;
ii) Treated water temperature;
iii) Treated water turbidity; and
iv) Treated water UVA.
These were used as inputs to the distribution system model. A summary of the inputs and outputs used for the models is presented in Figure 14 (see sections 4.3.3 and 4.4.3 for further discussion on inputs and outputs selection).

Figure 14 - Summary of inputs and outputs of the models
* Calculated based on total mass of ozone added and water flow.
** Concentration of ozone at the end of the ozone contactor pipe, prior to quenching.
4.2. Basic elements of model development
All models discussed in the present study were developed using Python 3.0 through the Jupyter Notebook platform. Python was chosen because it is open source and well suited to machine learning applications. Many Python libraries were used throughout the study for multiple purposes. For the development of the ANNs, Tensorflow 1.2.1 was used. Sample codes developed in the present study are illustrated in Appendix C. For most of the data handling and visualization, the SciPy ecosystem and its packages were used. Data analysis was also done in other software such as Tableau 2019.1.

All ANNs developed in the present study were multi-layer perceptrons (see section 2.3.1) that used rectified linear units as activation functions in all nodes (see section 2.3.3) and Adam as their optimizing algorithm (see section 2.3.4). For all models, 60% of the available data was used for training and the remainder for validation and testing.

As discussed in section 2.3.4, ANNs introduce randomness during the training phase. For quality assurance, five ANNs were optimized and trained in parallel for every model. The five ANNs in a model share the same architecture and hyperparameters. The predictions from the five replicate ANNs were averaged to provide a global prediction. Hereafter, the word “model” refers to the global average of five ANNs (e.g. “to train a model” means to train five ANNs and average their results). This approach helps to mitigate the recurrent flaw in which ReLU nodes can become inactive for essentially all inputs, as discussed in section 2.3.3 (Kakaraparthi, 2019). This approach is loosely based on the decision trees and random forest method, for which discussion is found elsewhere (Smarra, et al., 2018).
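The averaging of replicate ANNs described above can be sketched as follows. This is a minimal illustration and not the code of Appendix C: the five "replicates" are hypothetical stand-in functions (`make_replicate` and its offsets are invented for the example), whereas in the study each replicate would be a separately trained network with the same architecture and hyperparameters.

```python
from statistics import mean


def ensemble_predict(replicates, inputs):
    """Return the global prediction: the mean of all replicate predictions."""
    return mean(model(inputs) for model in replicates)


def make_replicate(offset):
    """Hypothetical stand-in for one trained ANN (assumption, for illustration).

    Each stand-in returns the mean of its inputs plus a small offset that
    mimics the run-to-run variation introduced by random initialisation.
    """
    def predict(inputs):
        return sum(inputs) / len(inputs) + offset
    return predict


# Five replicates sharing the same "architecture" but differing slightly.
replicates = [make_replicate(o) for o in (-0.2, -0.1, 0.0, 0.1, 0.2)]

# The "model" in the sense used in the text is the average of the five.
global_prediction = ensemble_predict(replicates, [90.0, 92.0, 94.0])
```

Because the offsets of the stand-ins cancel out, the global prediction sits at the centre of the individual predictions, which is the stabilising effect the averaging is meant to provide.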
To optimize the architecture and hyperparameters of the ANNs, the following procedure was developed (all steps were discussed in section 2.3.5):
i) Set the initial architecture as one hidden layer with a number of nodes between the number of inputs and the number of outputs;
ii) Set the batch size as 2 or 32, depending on the dataset size (treatment system models used 32 as initial batch size and distribution system models used 2 as initial batch size - further discussion in sections 4.3 and 4.4);
iii) Set the number of steps to allow the model to process the whole training dataset at least three times (three epochs);
iv) Test learning rates from 10^-5 to 10^0 at logarithmic intervals and select the optimal (Smith, 2017);
v) Test different architectures to optimize processing time while preserving or improving accuracy of predictions on the cross-validation dataset;
vi) Tune batch size and number of steps to optimize processing time while preserving or improving accuracy of predictions on the cross-validation dataset; and
vii) Fine tune architecture and hyperparameters.
The final architecture and hyperparameters used for each model are presented in Appendix A. A logic diagram with this step by step procedure is displayed in Appendix B.

4.3. Treatment system operation models
4.3.1. Introduction
Historical data from real-time measurements of raw and treated water characteristics and operational setpoints for the period of January-December 2017 were used to train and validate the treatment system models. Data were merged into five-minute averages. Over one hundred thousand measurements for each parameter were initially available. As discussed in section 2.3.2, 60% of the available dataset was used to train the models (training dataset). The remaining 40% was split into two equal-sized datasets, one for cross-validation (cross-validation dataset) and one for testing (test dataset - see section 2.3.2).
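The 60/20/20 split and the "at least three epochs" rule for the number of steps can be sketched as below. This is an illustrative sketch under stated assumptions: the split is assumed to be random with a fixed seed (the text does not state how records were assigned), the dataset size of 100,000 is a round hypothetical number, and `split_dataset` and `steps_for_epochs` are names invented here.

```python
import math
import random


def split_dataset(rows, train_frac=0.6, seed=0):
    """Split rows into training, cross-validation and test subsets.

    train_frac of the rows go to training; the remainder is divided into
    two equal-sized halves. Random shuffling is an assumption of this sketch.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(round(train_frac * len(rows)))
    n_cv = (len(rows) - n_train) // 2
    train = rows[:n_train]
    cv = rows[n_train:n_train + n_cv]
    test = rows[n_train + n_cv:]
    return train, cv, test


def steps_for_epochs(n_train, batch_size, epochs):
    """Smallest number of steps that covers the training set `epochs` times."""
    return math.ceil(epochs * n_train / batch_size)


# Hypothetical dataset of 100,000 five-minute records (indices only).
train, cv, test = split_dataset(range(100_000))
n_steps = steps_for_epochs(len(train), batch_size=32, epochs=3)
```

With these hypothetical numbers the training subset holds 60,000 records, so three passes at a batch size of 32 require 5,625 steps; doubling the number of steps doubles the number of passes.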
Inputs were scaled to a mean of zero and a standard deviation of one (see section 2.3.2). The development of the treatment system model went through two stages: data preprocessing and inputs/outputs selection.

4.3.2. Data preprocessing
Sensor malfunctions and maintenance protocols at CWTP can generate inconsistent data. These can deteriorate the learning and predictive capabilities of the models. All data used to train the treatment models were reviewed to identify and remove these inconsistencies. Water quality parameters are not expected to undergo substantial rapid changes. The algorithm used to review the data identified any value that deviated by more than four standard deviations from the average of the data for the previous twelve hours. These data were discarded unless the duration of the deviation was greater than twelve hours (Figure 15). The algorithm also identified and discarded data outside of an acceptable expected range (Figure 16a and Figure 16b). Table 2 displays the ranges within which the inputs were considered acceptable. These values were chosen based on analysis of the historical data and discussions with Metro Vancouver.

Figure 15 – Example of the results of the removal of inconsistencies shorter than 12h
a: The original dataset; b: Dataset after “cleaning” with the algorithm that removes inconsistencies in the data due to transient changes of short duration.

Table 2 – Ranges for which parameters were considered as correct measurements
Parameter | Range
Raw water temperature (°C) | 2 – 20
Raw water UVT (%) | 50 – 100
Raw water turbidity (NTU) | 0 – 5
Raw water pH (pH units) | 5.0 – 7.5
Initial ozone concentration (mg/L) * | 1.0 – 4.0
Final ozone concentration (mg/L) ** | 0.3 – 1.6
Treated water UVT (%) | 80 – 100
* Calculated based on total mass of ozone added and water flow.
** Concentration of ozone at the end of the ozone contactor pipe, prior to quenching.
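The two cleaning rules described in section 4.3.2 can be sketched as follows. This is a simplified illustration, not the algorithm used in the study: it assumes that the standard deviation is taken over the same trailing window as the average, that flagged values remain in later windows, and that a "deviation" is a run of consecutive flagged values. With five-minute averages, twelve hours correspond to 144 samples; the example uses a window of 6 only to keep the data short. All function names are hypothetical.

```python
from statistics import mean, stdev


def flag_deviations(values, window=144, n_sigma=4.0):
    """Flag values further than n_sigma standard deviations from the mean
    of the previous `window` values (assumption: trailing window only)."""
    flags = [False] * len(values)
    for i in range(window, len(values)):
        reference = values[i - window:i]
        mu, sigma = mean(reference), stdev(reference)
        flags[i] = abs(values[i] - mu) > n_sigma * sigma
    return flags


def drop_short_runs(flags, max_run=144):
    """Return a keep-mask that discards flagged runs of at most max_run
    samples; longer runs are kept as persistent changes."""
    keep = [True] * len(flags)
    i = 0
    while i < len(flags):
        if not flags[i]:
            i += 1
            continue
        j = i
        while j < len(flags) and flags[j]:
            j += 1
        if j - i <= max_run:
            for k in range(i, j):
                keep[k] = False
        i = j
    return keep


def in_range(values, low, high):
    """Mask of values inside the acceptable range (e.g. Table 2)."""
    return [low <= v <= high for v in values]


# Hypothetical raw water UVT series (%) with one transient spike.
uvt = [90.0, 90.2, 89.8, 90.1, 89.9, 90.0, 60.0, 90.1, 90.0]
keep = drop_short_runs(flag_deviations(uvt, window=6), max_run=6)
valid = in_range(uvt, 50.0, 100.0)  # raw water UVT range from Table 2
cleaned = [v for v, k, r in zip(uvt, keep, valid) if k and r]
```

In this example the spike of 60% lies inside the acceptable range of Table 2, so only the deviation rule removes it, which shows why the two checks complement each other.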
Figure 16 - Example of the results of the removal of data outside of acceptable range
a: Replicate of Figure 15b, dataset without short duration deviations; b: Dataset without both short and long duration deviations.

4.3.3. Inputs and outputs selection
A total of four treatment system models were developed. They differ by the combination of the set of inputs used and the prediction (i.e. output). Two sets of inputs were used:
i) Raw water temperature, raw water pH, raw water UVT, raw water turbidity and initial ozone concentration (i.e. calculated based on mass of ozone added and flow); and
ii) Same as i) with the addition of a final ozone concentration (i.e. prior to quenching).
Each set of inputs was used to:
i) Predict treated water UVT directly; and
ii) Predict treated water UVT by predicting ΔUVT and adding that value to raw water UVT.

a) Inputs
The two sets of inputs were used to test the importance of ozone decay on the performance of the models. As discussed in section 2.1, ozone decay in water is typically a two-phase process. The first set of inputs includes only one ozone concentration: the theoretical initial ozone concentration calculated based on the mass of ozone added and flow. This initial ozone concentration was calculated based on other operational setpoints, as presented in Eq. 3. The second set of inputs uses, in addition, a final ozone concentration, measured downstream in the ozone contact chamber, immediately prior to ozone-quenching.

$$[O_3] = \frac{COGM \times \left(\dfrac{100\ \mathrm{g\,O_3/m^3}}{6.99\%}\right)^{*} \times GMF \times \left(\dfrac{\text{Gaseous mixture pressure}}{\text{Atmospheric pressure}}\right)^{**}}{\text{Water flow}} \qquad (3)$$

COGM: concentration, in weight, of ozone in the gaseous oxygen mixture
GMF: gaseous mixture flow
* In accordance with (Ozone Solutions, 2015)
** Factor to account for the difference in pressure between the gas stream and the water stream

b) Outputs
Treated water UV transmittance was the desired output of the treatment models (see section 2.2.4).
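The indirect route to this output, via ΔUVT, amounts to the simple arithmetic sketched below. The function names and the numerical values are hypothetical and only illustrate how a ΔUVT training target would be built from measurements and how a predicted ΔUVT would be converted back to treated water UVT.

```python
def delta_uvt_target(raw_uvt, treated_uvt):
    """Training target for the delta models: treated minus raw UVT (%)."""
    return treated_uvt - raw_uvt


def treated_from_delta(raw_uvt, predicted_delta):
    """Recover treated water UVT (%) from a predicted delta UVT."""
    return raw_uvt + predicted_delta


# Hypothetical measurement pair: raw water at 88.0% UVT, treated at 93.5%.
target = delta_uvt_target(88.0, 93.5)

# A delta model predicting 5.5 points reproduces the treated water UVT.
treated = treated_from_delta(88.0, 5.5)
```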
Two methods of predicting treated water UVT were investigated: directly predicting treated water UVT; and predicting the UVT variation (ΔUVT) caused by the treatment (i.e. treated water UVT minus raw water UVT), and calculating treated water UVT by adding this value to the raw water UVT.

In summary, a total of four treatment models were developed. Models were named “T” for treatment, followed by either “T” or “Δ”, representing whether they directly predicted treated water UVT or ΔUVT, followed by either “1” or “2”, representing whether they used one (i.e. initial) or two (i.e. initial and final) ozone concentrations (Table 3).

Table 3 – Treatment model development phase
Model name | Inputs | Output
T – T1 | Raw water temperature, UVT, turbidity, pH and initial ozone concentration. | Treated water UVT
T – Δ1 | Raw water temperature, UVT, turbidity, pH and initial ozone concentration. | ΔUVT
T – T2 | Raw water temperature, UVT, turbidity, pH, initial ozone concentration and final ozone concentration. | Treated water UVT
T – Δ2 | Raw water temperature, UVT, turbidity, pH, initial ozone concentration and final ozone concentration. | ΔUVT

4.4. Distribution system models
4.4.1. Introduction
Historical data from the Water Quality Annual Reports for the period of 2002-2017 were used to train and validate the distribution models. Approximately 60 measurements for each sampling station were available. As discussed in section 2.3.2, 60% of the available dataset was used to train the models (i.e. training dataset). All of the remaining 40% was used for testing (i.e. test dataset - see section 2.3.2). This was done because of the small size of the available dataset.

Scaling of the inputs hindered the performance of the distribution models. This was likely due to the non-normal distribution of the DBP data (e.g.
HAA had a p-value of 0.017 based on the D’Agostino and Pearson’s normality tests) and the greater inherent variability in the measured values. Inputs were, therefore, not scaled prior to training.

4.4.2. Data preprocessing
Measurements for DBPs and treated water quality parameters are generally not collected at the same time. For the present analysis, it was assumed that a DBP measurement could be paired with water quality data as long as the two were collected within 15 days of each other.

4.4.3. Inputs and outputs selection
A total of twenty-four distribution models were developed. They differ by the combination of the set of inputs used and the prediction (i.e. output). Two sets of inputs were used:
i) Treated water temperature, treated water pH, treated water UVA, treated water turbidity and UVA variation caused by treatment (see section 2.2.4); and
ii) Same as i) with the addition of treated water DOC and TOC.
Each set of inputs was used to:
i) Predict DBPs at GV-098;
ii) Predict DBPs at MPR-434, -435, -438 and -440; and
iii) Predict DBPs at all five selected sampling stations (see replicate of Figure 3 below for further reference).

Note that treatment system models used UVT and distribution models used UVA as inputs. This was because of the nature of the measurements: treated water UVT was measured online in-situ and treated water UVA was measured ex-situ in laboratory conditions. However, because of the low turbidity of the water, ex-situ UVA measurements and in-situ UVT measurements are essentially interchangeable (see section 2.2.4).

a) Inputs
The two sets of inputs were used to test the impact of the addition of DOC and TOC on the performance of the models. DOC and TOC measurements are available in the annual reports but are not currently measured in real-time at CWTP. Retention time estimations and chlorine dose are two parameters often used in DBP formation models (see section 2.2.4) that were not reliably available (i.e.
data could not be paired with DBP measurements for most of the study period).

b) Outputs
DBPs are monitored at multiple sampling stations spread across the region. It was assumed that the eastern portion of the distribution system receives water predominantly from CWTP (i.e. limited influence from Seymour-Capilano Filtration Plant water). Only data from sampling stations located in this region were considered for the present study. DBP data from five different sampling locations were investigated: sampling station GV-098, in the city of Pitt Meadows, administered by Metro Vancouver; and sampling stations MPR-434, MPR-435, MPR-438 and MPR-440, in the city of Maple Ridge, administered by the municipality of Maple Ridge. Metro Vancouver publishes its DBP measurements in the Greater Vancouver Water District Quality Control Annual Reports. DBP data for the four stations in the city of Maple Ridge were provided by the municipality. Metro Vancouver and the city of Maple Ridge, however, have different sampling and sample handling (quenching and transport) procedures. Models were separately developed to predict DBPs exclusively at GV-098, exclusively at MPR stations and at all five sampling stations. As discussed in section 4.1, to keep the models simple, each model predicted only one DBP at a time, so each combination of input set and predicted output generated four models (total THMs, total HAAs, DCAA and TCAA).

Replicate of Figure 3

In summary, a total of twenty-four distribution models were developed. Models were named “THM”, “HAA”, “DCAA” or “TCAA”, based on which DBP they predict, followed by either “B” or “DT”, representing whether they use only the base inputs or, in addition to those, DOC and TOC, followed by either “GV”, “MR” or “A”, representing whether they predict DBPs at GV-098, at the Maple Ridge sampling stations or at all five sampling stations. Table 4 summarizes the distribution models, with a “D” as placeholder for each DBP.
Table 4 – Distribution models development phase
Model name | Inputs | Output
D – B – GV | Treated water temperature, pH, UVA and turbidity and ΔUVA | DBPs at GV-098
D – B – MR | Treated water temperature, pH, UVA and turbidity and ΔUVA | DBPs at Maple Ridge sampling stations
D – B – A | Treated water temperature, pH, UVA and turbidity and ΔUVA | DBPs at all selected sampling stations
D – DT – GV | Treated water temperature, pH, UVA, turbidity, ΔUVA, DOC and TOC | DBPs at GV-098
D – DT – MR | Treated water temperature, pH, UVA, turbidity, ΔUVA, DOC and TOC | DBPs at Maple Ridge sampling stations
D – DT – A | Treated water temperature, pH, UVA, turbidity, ΔUVA, DOC and TOC | DBPs at all selected sampling stations

Another input commonly used in DBP formation models is retention time. Estimating the retention time within distribution systems requires hydraulic simulation models, which are time consuming and not always very accurate (Chowdhury, et al., 2009). Ideally, flow at the sampling stations would be used as an estimation of retention time. However, flow data are not available for the period of 2002 to 2007, and losing some of the already scarce DBP measurements harmed the models more than the addition of flow benefitted them. The models that predict DBPs at more than one sampling station (models “MR” and “A”), though, have to take into account the difference in retention times between these stations. Distance from the treatment plant was used to incorporate this aspect.

4.5. Scenario analyses
Trained treatment and distribution models were used to analyze scenarios. These analyses used the test dataset (see section 2.3.2) of the treatment models (i.e. real-time raw and treated water characteristics and operational setpoints) to estimate DBP formation and ozone consumption. For the scenario analyses, the following was considered:
i) A pre-determined target was set. The target variable of treatment system models was a treated water UVT.
The target variable of distribution models was a DBP concentration in the distribution system;
ii) An algorithm executed the selected model multiple times, varying the control variable within its historical range. The control variable for treatment system models was the initial ozone concentration. The control variable for distribution models was treated water UVA; and
iii) Results were analyzed and compared to current practices.
A logic diagram with the step by step procedure used to perform one of the analyses (section 5.4.2) is displayed in Appendix D. Multiple target treated water UVT values were used for multiple scenario analyses (see discussion in section 5.4.1) and compared to current practices. As discussed in section 2.2.5, HAAs are of greater concern in the distribution system for Metro Vancouver. Based on discussion with Metro Vancouver, the HAA target for the scenario analyses was set as the least conservative of the following values: 60µg/L with a 90% confidence level or 50µg/L with a 99% confidence level (further discussion on confidence levels in section 5.2.2).

5. Results and discussion
5.1. ANN training
The procedure described in section 4.2 was followed to identify the most appropriate architecture and hyperparameters for all models. This section uses model T-T1 to represent the results typically observed during the training phase. Details of the architecture and hyperparameters for the models developed are presented in Appendix A. For model T-T1, the initial number of nodes in the hidden layer was set to four (between five inputs and one output). Batch size was set to 32 (large dataset available). The number of steps was set to 5140, to allow three full cycles through the training data. Learning rates ranging between 10^-5 and 10^0, at logarithmic intervals, were tested and the error was monitored (Figure 17).

Figure 17 – Typical error vs. learning rate
Results presented for model T-T1.
Models experienced high errors for the learning rate of 10^-5, likely because the models converged to local minima of the error function (see section 2.3.5). The error displayed an initial decrease with an increase in learning rate. The optimal learning rate is the one identified just before diminishing returns are observed (Smith, 2017). For the above application, the optimal learning rate was selected to be 0.001. The optimal learning rate for all models was either 0.01 or 0.001.

With provisional learning rate and batch size set, different architectures were tested to optimize both processing time and prediction accuracy. For the above application, the best results were achieved with a single hidden layer with five nodes. To tune batch size and number of steps, the error function at each step was monitored (Figure 18).

Figure 18 - Typical error vs. number of steps
Results presented for model T-T1.

Over 99.9% of the error was minimized within the first one thousand steps. Increasing the number of steps decreased the prediction errors, but it also increased the computational requirements. For the above application, batch size was set to 32 and the number of steps was set to 10280, allowing for six full cycles through the dataset.

5.2. Treatment models
5.2.1. Assessment
The present study evaluated the performance of the models based on two non-independent factors: accuracy and ability to detect and replicate expected and/or observed trends in the dataset (see sections 5.2.2 and 5.2.3, respectively). Unless otherwise indicated, results from the test dataset are presented. The accuracy of the models was quantified based on the ability of a model to predict measured data and minimize the error function (i.e. error is defined as the difference between the predicted value and the measured value).
The parameters used to quantify the accuracy of the models were:
i) Mean squared error;
ii) Coefficient of correlation (Pearson r) of the predicted and measured data;
iii) Slope and intercept of the predicted and measured data;
iv) Repeatability of the results (i.e. whether or not successive executions generated the same result); and
v) Processing time to train the models.
Note that processing time, or computational requirement, is not a direct measurement of accuracy, but, as discussed in sections 2.3.5 and 5.1, there is usually a trade-off between accuracy and computational requirement.

The ability to detect and replicate expected and/or observed trends in the dataset is more subjective and was assessed to gain insight into the ability of a model to replicate overall expected trends based on current knowledge (see sections 2.1 and 2.2.4). The parameters used to assess the ability of the models to detect and replicate expected and/or observed trends in the dataset were:
i) Investigation of existing trends in the measured data;
ii) Data analysis of the predictions of the models compared to the measured values;
iii) Relative variable importance using Garson’s algorithm and the Connection weights method (as discussed in section 2.3.6); and
iv) Multiple linear regressions.

Recall that four treatment system operation models were developed. Models were named “T” for treatment, followed by either “T” or “Δ”, based on whether they predict treated water UVT directly or using ΔUVT, followed by either “1” or “2”, based on how many ozone concentrations they use as inputs. Typical results are presented in sections 5.2.2 and 5.2.3 when the outcomes from all the models were similar.

5.2.2. Accuracy analyses
Typical results for treatment model predicted vs. measured treated water UVT values are presented in Figure 19.

Figure 19 – Typical predicted vs. measured treated water UVT
Results presented for model T-T1.
The confidence interval overlapped with the regression line and cannot be identified in the figure above.

As illustrated, a linear relationship was observed between predicted and measured treated water UVT values. The slopes and intercepts of the regression line through the predicted vs. measured data, hereafter referred to as regression through data for the different models, are presented in Figure 20.

Models that directly predict treated water UVT (T-T1 and T-T2) were relatively accurate, with correlation coefficients greater than 0.96. Their regressions through the data displayed slopes close to the expected value of 1 and intercepts close to the expected value of 0 (Table 5, Figure 20). Models that indirectly predict treated water UVT using ΔUVT (T-Δ1 and T-Δ2) were, generally, more accurate (i.e. higher correlation coefficients, slopes closer to 1 and intercepts closer to 0) than the models that directly predict treated water UVT (Table 5, Figure 20).

Considering both initial and final ozone concentrations in the ozone contactor had mixed impacts on the accuracy of the models. Improvements in correlation coefficient and MSE were observed for both the “T” and “Δ” models. A substantial increase in slope (to values closer to the expected value of 1) and decrease in intercept (to values closer to the expected value of 0) of the regressions through the data were also observed for “T” models. However, considering both initial and final ozone concentrations actually decreased the slope and increased the intercept of the regressions through the data for the “Δ” models.

Table 5 – Correlation coefficients and MSE of the treatment system models
Model | Correlation coefficient | MSE
T-T1 | 0.966 | 0.1733
T-T2 | 0.972 | 0.1730
T-Δ1 | 0.975 | 0.1555
T-Δ2 | 0.979 | 0.1236

The correlation coefficients for all models were statistically different (based on a 99% confidence interval) with the exception of those of models T-T2 and T-Δ1 (Fisher, 1921).
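A comparison of two correlation coefficients based on Fisher's transformation can be sketched as follows. This is a generic textbook form and not necessarily the exact procedure applied in the study: it assumes the two coefficients come from independent samples (an approximation when models are evaluated on the same test dataset), and the coefficients and sample sizes in the example are hypothetical rather than taken from Table 5.

```python
import math
from statistics import NormalDist


def fisher_z_test(r1, n1, r2, n2):
    """Two-sided test for the difference between two Pearson coefficients.

    Each r is mapped to z = atanh(r), which is approximately normal with
    variance 1 / (n - 3). Independence of the samples is assumed here.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z_stat = (z1 - z2) / standard_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z_stat)))
    return z_stat, p_value


# Hypothetical coefficients and sample sizes, for illustration only.
z_stat, p_value = fisher_z_test(0.90, 500, 0.92, 500)

# At a 99% confidence level the difference is significant if p < 0.01.
significant_at_99 = p_value < 0.01
```

Because the standard error shrinks with sample size, very large test datasets make even small differences between coefficients statistically significant, which is worth keeping in mind when reading such comparisons.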
Model T-T1 had the lowest slope and the highest intercept for the regression through the data. No statistical differences between the slopes and intercepts for the regression through the data of models T-T2 and T-Δ1 were observed (based on a 99% confidence interval). Similarly, no statistical differences between the slopes and intercepts for the regression through the data of models T-T2 and T-Δ2 were observed (based on a 99% confidence interval). However, the differences between the slopes and intercepts for the regression through data of models T-Δ1 and T-Δ2 were statistically significant (based on a 99% confidence interval). All slopes were statistically less than 1, indicating that for all models a fraction of the variability observed in the measured data could not be predicted using the selected input parameters. This is most likely due to variability in the measurements themselves, which cannot be accounted for with the current modelling approach.

Figure 20 - Slope and intercept of regression through the data for treatment models
Error bars correspond to the 99% confidence interval of estimated parameters. Ideal slope and intercept values are 1 and 0, respectively. a: Slopes and b: intercepts.

The repeatability of predicted values for all treatment models was similar. Once models were optimized, successive trainings generated stable (i.e. similar) results.

“T” models had lower computational requirements than “Δ” models (computational requirement being proportional to the number of steps required to train the model – Appendix A). However, these differences did not substantially impact computing time (i.e. all treatment models had reasonably low training times, under two minutes). No substantial difference between models with one or two (i.e. initial and final) ozone concentrations was observed with regard to computational requirements.
In summary, the treatment system models could reproduce predictions proportional to the measured values with high accuracy using the selected input parameters. The accuracy analyses revealed that “Δ” models had a higher accuracy than “T” models. Model T-Δ2 was observed to generate the highest correlation coefficient and lowest MSE. Model T-Δ1 was observed to generate the slope and intercept combination closest to the expected values of 1 and 0, respectively. The accuracy of the treatment models was equal to or higher than the highest accuracies reported in the studies discussed in section 2.4 (Table 1). These results indicate that the developments in the ANN field (e.g. new optimizing algorithms and activation functions) should further stimulate the use of ANNs in water and wastewater applications.

5.2.3. Trend analyses
a) Trends in the measured dataset
As a first step in assessing the ability of the models to detect and replicate existing trends in the dataset, the occurrence of trends in all of the measured data was assessed. As previously discussed (sections 2.1 and 2.2.4), raw water pH, temperature, turbidity and UVT were expected to impact either ozonation or treated water UVT. The relationships between these parameters and treated water UVT and ΔUVT for all of the measured dataset are presented in Figure 21.

Figure 21 – Measured treated water UVT or ΔUVT vs. measured input parameters of interest
Due to the large dataset (>80,000 measurements), the density of points was colour-coded: brighter regions have more measurements, darker regions have fewer. It is acknowledged that the relationships between the measured parameters of interest are not necessarily linear. Therefore, Pearson r is not necessarily an accurate representation of the existence of a trend between the parameters. a: Measured treated water UVT vs.
measured raw water UVT; b: Measured ΔUVT vs. measured raw water UVT; c: Measured treated water UVT vs. measured initial ozone concentration; d: Measured ΔUVT vs. measured initial ozone concentration; e: Measured treated water UVT vs. measured raw water temperature; f: Measured ΔUVT vs. measured raw water temperature; g: Measured treated water UVT vs. measured raw water turbidity; h: Measured ΔUVT vs. raw water turbidity; i: Measured treated water UVT vs. measured raw water pH; j: Measured ΔUVT vs. raw water pH.
The present study quantitatively defined relationships as “clear” when the correlation coefficient was greater than 0.55; as “moderate” when the correlation coefficient was between 0.30 and 0.55; and as “limited” when the correlation coefficient was less than 0.30.
As expected, clear relationships between raw water UVT and both treated water UVT and ΔUVT were observed: the greater the raw water UVT, the greater the treated water UVT and the lower the ΔUVT (Figure 21a and b).
Moderate relationships between initial ozone concentration and both treated water UVT and ΔUVT were observed: the greater the initial ozone concentration, the greater the treated water UVT and the greater the ΔUVT (Figure 21c and d). The impact of the initial ozone concentration, especially on ΔUVT, was expected to be higher, because ozonation is the only process at the CWTP that is expected to impact UVT.
A clear correlation between raw water temperature and treated water UVT and a moderate relationship between raw water temperature and ΔUVT were observed: the greater the temperature, the greater the treated water UVT and the lower the ΔUVT (Figure 21e and f).
Clear relationships between raw water turbidity and pH and treated water UVT and ΔUVT were expected. However, these two parameters varied within a very small range in the measured dataset. Therefore, only limited relationships were observed (Figure 21g, h, i and j).
b) Predicted vs.
measured data
The ability of the models to detect expected trends was assessed by comparing the predicted and measured outputs plotted vs. each of the input parameters on the test dataset. Figure 22 displays the results of this analysis.
Figure 22 – Typical results for predictions vs. inputs of interest analysis. Results presented for model T-T1. The x-axis in each graph contains an input, and the y-axes are the corresponding predictions or measured values. On the left column, in black, the predictions; on the right column, in grey, the measured treated water UVT values (similar to Figure 21, but only for the test subset of the data). a: Predicted treated water UVT vs. measured raw water UVT; b: Measured treated water UVT vs. measured raw water UVT; c: Predicted treated water UVT vs. measured initial ozone concentration; d: Measured treated water UVT vs. measured initial ozone concentration; e: Predicted treated water UVT vs. measured raw water temperature; f: Measured treated water UVT vs. measured raw water temperature; g: Predicted treated water UVT vs. measured raw water turbidity; h: Measured treated water UVT vs. raw water turbidity; i: Predicted treated water UVT vs. measured raw water pH; j: Measured treated water UVT vs. raw water pH.
Qualitatively, the trends observed for the measured test dataset (right column of Figure 22) were similar to those observed when considering all of the data (Figure 21). The trends observed for the measured test dataset (right column of Figure 22) were also quantitatively similar to those observed when considering all of the data (i.e. similar correlation coefficients - Figure 21). These results indicate that the trends present in the entire dataset were preserved after the split of the data into training, cross-validation and test datasets.
Qualitatively, the trends observed for the measured test dataset (right column of Figure 22) were generally replicated by the predictions of the models (left column of Figure 22). The trends observed for the measured test dataset (right column of Figure 22) were also quantitatively detected and replicated by the predictions of the models (i.e. similar correlation coefficients - left column of Figure 22).
c) Relative importance analyses and multiple linear regressions
Results from the relative variable importance analyses, based on Garson’s algorithm and the Connection weights method (see section 2.3.6), are presented in Figure 23. Both analyses attributed similar impacts to the different input parameters considered.
The relative importance analyses consistently ranked raw water UVT as the input parameter with the greatest impact on the predicted treated water UVT and ΔUVT (Figure 23). This was consistent with the analysis of trends in the measured data (section 5.2.3a) and the ability of the models to detect these trends (section 5.2.3b).
Figure 23 – Typical results for relative variable importance analyses of the treatment system models. a: Garson’s algorithm results for model T-T1; b: Connection weight method results for model T-T1; c: Garson’s algorithm results for model T-Δ1; d: Connection weight method results for model T-Δ1; e: Garson’s algorithm results for model T-T2; f: Connection weight method results for model T-T2; g: Garson’s algorithm results for model T-Δ2; h: Connection weight method results for model T-Δ2.
The relative importance analyses generally ranked raw water turbidity as the input parameter with the second greatest impact on the predicted treated water UVT and ΔUVT (Figure 23). However, only a limited relationship was observed in the measured dataset (section 5.2.3a).
The relative importance analyses ranked initial ozone concentration or raw water pH as the input parameters with the third greatest impact on the predicted treated water UVT and ΔUVT (Figure 23). The initial ozone concentration impact was consistent with the analysis of trends in the measured data (section 5.2.3a) and the ability of the models to detect these trends (section 5.2.3b). However, only a limited relationship between pH and treated water UVT and ΔUVT was observed in the measured dataset (section 5.2.3a).
As discussed in section 2.3.6, relative importance analysis results should be interpreted with caution. In the present study, the results from both Garson’s algorithm and the Connection weight method varied over successive trainings, even though the predictions of the models remained stable (section 2.3.4 discusses randomness introduced during training and its impacts on the outcomes of a model).
Multiple linear regression (MLR) was used to further investigate the impact of the different input parameters on the predictions of the models. No major multicollinearity was detected (variance inflation factor, VIF, smaller than 10 for all input parameters). Amongst the inputs, raw water UVT and temperature shared the most collinearity (VIF ≈ 6), which is expected since raw water UVT and temperature were subject to substantial seasonal variability. Because the input parameters have different magnitudes, they were standardized to a mean of zero and a standard deviation of one to provide a more intuitive comparison between the coefficients. Correlation coefficients for predicted vs. measured values of MLRs were lower than those for all ANNs across all models (i.e. MLRs displayed lower accuracy than ANNs).
The regression coefficient for the raw water UVT was consistently the greatest, indicating that the MLRs consistently ranked raw water UVT as the input parameter with the greatest impact on the treated water UVT and ΔUVT (Table 6).
This was consistent with the analysis of trends in the measured data (section 5.2.3a) and the ability of the models to detect these trends (section 5.2.3b). The regression coefficient for the initial ozone concentration was generally the second greatest, indicating that the MLRs generally ranked initial ozone concentration as the input parameter with the second greatest impact on treated water UVT and ΔUVT (Table 6). This was consistent with the analysis of trends in the measured data (section 5.2.3a) and the ability of the models to detect these trends (section 5.2.3b). The regression coefficients for raw water temperature, pH and turbidity were generally lower, indicating that the MLRs generally ranked these parameters as inputs of minor importance (Table 6). The pH and turbidity results were consistent with the analysis of trends in the measured data (section 5.2.3a) and the ability of the models to detect these trends (section 5.2.3b). However, the temperature results were not consistent with those analyses, which indicated a clear or moderate relationship between temperature and treated water UVT and ΔUVT in the measured dataset (section 5.2.3a) and in the test dataset (section 5.2.3b).
Table 6 – Treatment system operation models multiple linear regressions
Model    Multiple linear regression coefficients    Pearson r
T-T1    T.UVT = 0.778 × Raw UVT + 0.001 × Turb − 0.088 × pH + 0.098 × Temp + 0.199 × Init ozone    0.933
T-T2    T.UVT = 0.817 × Raw UVT + 0.001 × Turb − 0.091 × pH + 0.044 × Temp + 0.212 × Init ozone − 0.036 × Final ozone    0.933
T-Δ1    ΔUVT = −0.817 × Raw UVT + 0.002 × Turb − 0.162 × pH + 0.179 × Temp + 0.364 × Init ozone    0.934
T-Δ2    ΔUVT = −0.723 × Raw UVT + 0.018 × Turb − 0.166 × pH + 0.080 × Temp + 0.387 × Init ozone − 0.065 × Final ozone    0.938
T.UVT represents treated water UVT; Raw UVT represents raw water UVT; Turb represents raw water turbidity; pH represents raw water pH; Temp represents raw water temperature; Init ozone represents initial ozone concentration; and Final ozone represents final ozone concentration.
A visual representation of the comparison between Garson’s algorithm and MLR results is displayed in Figure 24.
Figure 24 – Comparison between Garson’s algorithm and MLR results for treatment models (see notes 1, 2 and 3). a: Model T-T1; b: Model T-T2; c: Model T-Δ1; d: Model T-Δ2.
1: Importance magnitude by the Connection weights method was similar to Garson’s algorithm. Garson’s algorithm results are scaled to 100% relative importance and are, therefore, more suited for visual representation.
2: MLR coefficients were scaled to 100% relative importance, as follows: $\text{Scaled MLR coefficient} = \dfrac{\text{Original MLR coefficient}}{\text{Sum of original MLR coefficients}}$
3: Absolute values of MLR coefficients were used to match Garson’s algorithm absolute values.
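The scaling described in notes 2 and 3 of Figure 24 can be sketched as follows, using the coefficients reported for model T-T1 in Table 6. The function name is hypothetical, and combining the two notes as "absolute value divided by the sum of absolute values" is an interpretation of the notes rather than a procedure spelled out in the text.

```python
def scale_to_relative_importance(coefficients):
    """Return |coefficient| / sum(|coefficients|) for each input parameter."""
    total = sum(abs(value) for value in coefficients.values())
    return {name: abs(value) / total for name, value in coefficients.items()}


# Standardized MLR coefficients reported for model T-T1 (Table 6).
t_t1 = {
    "Raw UVT": 0.778,
    "Turb": 0.001,
    "pH": -0.088,
    "Temp": 0.098,
    "Init ozone": 0.199,
}

scaled = scale_to_relative_importance(t_t1)
# Print the inputs from most to least important.
for name, share in sorted(scaled.items(), key=lambda item: -item[1]):
    print(f"{name}: {share:.1%}")
```

With these values, raw water UVT accounts for roughly two thirds of the total scaled importance, followed by initial ozone concentration, in line with the ranking discussed in the text.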
In summary, it can be concluded from the relative importance analyses and the MLRs that:
i) Raw water UVT had a consistently high impact on predicted treated water UVT;
ii) Initial ozone concentration was also impactful, but to a lesser extent;
iii) Raw water temperature, turbidity and pH and final ozone concentration were of limited impact; and
iv) The results of the relative importance analyses were not consistent.
d) Discussion
The analyses of the ability of the models to detect and replicate expected and/or observed trends revealed one of the characteristics of most data-driven models: their inability to differentiate between correlation and causation. Treatment at CWTP successfully increases water UV transmittance (i.e. treated water UVT is substantially higher than raw water UVT). The only process that should have an effect on this parameter is ozonation (recall that “treated water UVT” refers to pre-chlorination UVT). Therefore, the majority of the change in UVT is expected to be due to the ozone. However, the effectiveness of ozone is also correlated to other parameters such as pH, temperature and raw water UVT (see section 2.1). The correlation between raw and treated water UVT seems to outweigh the ability of the models to capture the existing causation effect between ozone and UVT variation.
The relatively lower impact of ozone on both treated water UVT and especially on ΔUVT (for both measured and predicted data) is also related to another recurrent flaw of data-driven models: their training is tied to historical data, inheriting its range, precision and limitations (see section 2.3.2). It is possible that a change in ozone dose within the historical operational range (ozone concentrations between 1.0 and 3.5 mg/L) does not significantly impact UVT. Consequently, the ANNs “learn” that ozone dose has a low impact on UVT. It is plausible that ozone doses lower than 1.0 mg/L would cause most of the observed ΔUVT.
Further research, beyond the scope of the present study, is required to investigate the impact of ozone on UVT over a wide range of concentrations, especially below 1.0 mg/L.
In summary, no significant differences were observed between the models with regard to their ability to detect and replicate expected trends. The models that predict treated water UVT using ΔUVT demonstrated higher accuracy than models that directly predict treated water UVT. No major differences were observed for models that considered only initial vs. both initial and final ozone concentrations: each set had advantages on some accuracy parameters and both performed similarly on the trend analyses. The addition of the final ozone concentration as an input variable did not clearly benefit the models. Model T-Δ1 was therefore selected as the best performing treatment model overall due to its high accuracy, ability to detect trends and simplicity. This model was used to perform the scenario analyses discussed in section 5.4.
5.3. Distribution system models
5.3.1. Assessment
The techniques used to evaluate the treatment system models were also used for the distribution system models, with adjustments where applicable. Accuracy and ability to detect and replicate expected and/or observed trends in the dataset, as discussed in section 5.3.1, were assessed. Unless otherwise indicated, results from the test dataset are presented. Recall that a total of twenty-four distribution system models were developed. Models were named “THM”, “HAA”, “DCAA” or “TCAA”, based on which DBP they predict, followed by either “B” or “DT”, representing whether they use only the base inputs or, in addition to those, treated water DOC and TOC, followed by either “GV”, “MR” or “A”, representing whether they predict DBPs at GV-098, at the Maple Ridge sampling stations or at all five sampling stations. Typical results are presented in sections 5.3.3 and 5.3.4 when the outcomes from all the models were similar.
5.3.2.
Selection of models for further analysis
Major differences were observed between models that predict DBPs at the different sampling stations. Exploratory data analysis and modelling suggested substantial discrepancies between the DBP data collected by Metro Vancouver and by the City of Maple Ridge. DBP formation is expected to be significantly impacted by contact time in the distribution system (see section 2.2.4). Therefore, the predictions of the models were expected to be better when combining measurements from different locations than when considering measurements at a single location; however, this was not the case.
Moreover, at any point in time, no consistent trend between the values collected at the different locations was observed. For instance, THM and non-biodegradable HAA fraction values measured by the City of Maple Ridge are expected to be consistently greater than those measured by Metro Vancouver, considering the longer retention time at the Maple Ridge stations than at the Metro Vancouver stations. However, this was not the case (Figure 25). Furthermore, no seasonal variability was observed in the DBP data collected at the Maple Ridge stations (Figure 26).
Figure 25 – Typical box-plot distribution of non-biodegradable DBPs at selected sampling stations in Metro Vancouver. Results presented for total THM. One-way ANOVA p = 0.101. Boxes show the quartiles of the dataset and the whiskers show the rest of the distribution (1.5 times interquartile range).
Figure 26 – Seasonal variability of total HAAs at the sampling stations in the City of Maple Ridge. One-way ANOVA p = 607. Boxes show the quartiles of the dataset and the whiskers show the rest of the distribution (1.5 times interquartile range).
Further exploratory data analysis revealed that there were major inconsistencies in the DBP data measured at the Maple Ridge stations. For example, on multiple occasions, some fractions of HAAs (i.e.
DCAA and TCAA) were higher than total HAA (Figure 27).
Figure 27 – Typical distribution of total HAAs, DCAA and TCAA at the sampling stations in the City of Maple Ridge. Results presented for MPR-434. As discussed in section 2.2.3, total HAA is a sum of five different HAA species, including DCAA and TCAA. The total HAA concentration is, therefore, expected to be higher than the sum of some of its parts. Each pair of bars represents one sampling event.
The accuracy of the distribution models that predict DBPs measured by the City of Maple Ridge was poor. As illustrated in Figure 28, regardless of the input parameters, the predicted HAAs were relatively constant, while the actual measured HAA varied substantially, between 10 and 90 μg/L. This was observed for models that predict DBPs only at Maple Ridge (“MR” models) and models that predict DBPs at all five sampling stations (“A” models), both when considering the base inputs and when considering DOC and TOC in addition to the base inputs, for all DBPs modelled. The differences in sampling and DBP analyses between the City of Maple Ridge and Metro Vancouver, in addition to inherent measurement variability and the lack of a reliable estimate for retention time and chlorine dose, hindered the learning abilities of these models. These poor predictive results were also observed through multiple linear regressions.
For these reasons, the analyses and discussions that follow only focus on the models that predict DBPs at sampling station GV-098 (“GV” models). Also, in the analyses and discussions that follow, the term “distribution system models” only refers to models that predict DBPs at sampling station GV-098.
(Figure 27 axes: DBP concentration in μg/L, 0 to 120, vs. sampling year, 2002 to 2016; series: Total HAA (μg/L) and DCAA + TCAA (μg/L).)
Figure 28 – Typical result of an ANN model that predicts DBP at the City of Maple Ridge. Results presented for model HAA-B-MR.
As previously discussed (section 4.4.3), based on discussions with Metro Vancouver, HAAs were identified as being of greater concern. The analyses and discussions that follow focus mainly on models that predict HAAs. It is worth noting that the prevalence of HAAs over THMs in the distribution system was expected considering the pH of the water in the distribution system was approximately neutral (see section 2.2.3).
5.3.3. Accuracy analyses
Typical results for model predicted vs. measured total HAA values are presented in Figure 29. As illustrated, a linear relationship was observed between the measured and predicted values. The slopes and intercepts of the regression through the data (see section 5.2.2) for the different models are presented in Figure 30.
Figure 29 – Typical predicted vs. measured total HAA. Results presented for model HAA-B-GV.
The distribution system models could produce predictions proportional to the measured values with moderate accuracy, with correlation coefficients of approximately 0.65. The regressions through the data displayed slopes substantially different from the expected value of 1 and intercepts substantially different from the expected value of 0 (Table 7 and Figure 30). The correlation coefficients for models HAA-B-GV and HAA-DT-GV were not statistically different (based on a 99% confidence interval) (Fisher, 1921). No statistical differences between the slopes and intercepts for the regression through the data of models HAA-B-GV and HAA-DT-GV were observed (based on a 99% confidence interval).
Considering DOC and TOC in addition to the base inputs did not substantially impact the accuracy of the predictions, but narrowed the confidence interval of the slope and intercept for the regression through the data.
Table 7 - Treatment system and operation models’ errors and statistical parameters
Model    Correlation coefficient    MSE
HAA-B-GV    0.653    195.17
HAA-DT-GV    0.651    202.82
Figure 30 – Slope and intercept of regression through the data for distribution models. Error bars correspond to the 99% confidence interval of estimated parameters. Ideal slope and intercept values are 1 and 0, respectively. a: Slopes and b: intercepts. (Bar labels as extracted from the figure: slopes 0.57±0.49 and 0.15±0.13; intercepts 28.4±22.35 and 41.2±6.59; category axis: HAA-B-GV, HAA-DT-GV.)
The repeatability of predicted values for both distribution models was similar. Once models were optimized, successive trainings periodically generated different results. It was observed that this variability was dependent on the split of the data into training and test sets (see sections 2.3.2 and 2.3.4). This was likely due to the small dataset available to train the models. When random-seeding (i.e. forcing the algorithm to always generate the same random numbers) was used for the data-splitting, the repeatability of the results improved.
The small datasets also caused the distribution models to attribute a higher importance to each measurement during the training phase. As a consequence, the models likely “learned” to model the noise in the training data, which is expected to have contributed to lower accuracies on the predictions for the test dataset (see section 2.3.6). It is hypothesized that additional reliable DBP data would be extremely beneficial for the accuracy and stability of the distribution models.
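The effect of random-seeding on the data split can be illustrated with a short sketch. The function name, the split fractions and the seed value are placeholders chosen for illustration; the fractions and tooling actually used in the study are not restated in this section.

```python
import random


def split_dataset(records, fractions=(0.70, 0.15, 0.15), seed=42):
    """Shuffle with a seeded generator, then split into
    training, cross-validation and test subsets.

    `fractions` and `seed` are hypothetical defaults.
    """
    rng = random.Random(seed)  # private generator; global random state is untouched
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = int(fractions[0] * len(shuffled))
    n_val = int(fractions[1] * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test subset
    return train, val, test


samples = list(range(100))  # stand-ins for DBP sampling events
first = split_dataset(samples, seed=7)
second = split_dataset(samples, seed=7)
other = split_dataset(samples, seed=8)

print("same seed, same split:", first == second)
print("different seed, same split:", first == other)
```

Fixing the seed makes each subset contain the same sampling events on every run, so differences between successive trainings can no longer come from the split itself. With a small dataset, however, the particular split that the seed happens to produce can still be unrepresentative.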
No substantial difference in computational requirements was observed between models HAA-B-GV and HAA-DT-GV (computational requirement being proportional to the number of steps required to train the model; see Appendix A). All distribution models had reasonably low training times, under two minutes.
In summary, the distribution system models could produce predictions proportional to the measured values with moderate accuracy using the selected input parameters. The accuracy analyses revealed that both models had similar accuracies. For both distribution models, a substantial fraction of the variability observed in the measured data could not be predicted using the selected input parameters. This is most likely due to variability in the measurements themselves, which cannot be accounted for with the current modelling approach. Other possible sources of variability are procedures such as sample collection (and quenching), transport, analysis, and QA/QC. It was also hypothesized that the low number of measurements available to train and validate the distribution models and the lack of reliable retention time estimations and chlorine concentrations affected the predictions and/or stability of the models.
Similar results were observed for the THMs, DCAA and TCAA models (results not presented). However, their accuracy was lower than the accuracy of the HAA models (e.g. typical correlation coefficients for THMs, DCAA and TCAA were, respectively, 0.58, 0.62 and 0.61).
5.3.4. Trends analyses
a) Trends in the measured dataset
As a first step in assessing the ability of the models to detect and replicate existing trends in the dataset, the occurrence of trends in all of the measured data was assessed. As previously discussed (section 2.2.4), treated water pH, temperature, turbidity and UVA were expected to impact DBP formation. The relationships between these parameters and total HAA at GV-098 in the measured dataset are presented in Figure 31.
Figure 31 – Measured total HAA vs. measured input parameters of interest. It is acknowledged that the relationships between the measured parameters of interest are not necessarily linear. Therefore, Pearson r is not necessarily an accurate representation of the existence of a trend between the parameters. a: Measured total HAA at GV-098 vs. treated water UVA; b: Measured total HAA at GV-098 vs. ΔUVA; c: Measured total HAA at GV-098 vs. treated water temperature; d: Measured total HAA at GV-098 vs. treated water turbidity; e: Measured total HAA at GV-098 vs. treated water pH; f: Measured total HAA at GV-098 vs. treated water DOC; g: Measured total HAA at GV-098 vs. treated water TOC.
Significant scatter and no clear correlations could be observed between measured total HAA at GV-098 and treated water quality parameters. It was hypothesized that relationships would be more evident if more measurements had been available. However, further research, beyond the scope of the present study, would be required to assess this hypothesis.
Moderate relationships between treated water temperature and pH and total HAA at GV-098 were observed: the greater the treated water temperature or pH, the greater the total HAA at GV-098 (Figure 31c and e). The temperature impact is consistent with theoretical expectations (Health Canada, 2008). However, HAAs were expected to be present at higher concentrations at lower pH (Hung, et al., 2017). This discrepancy was likely because pH varied within a very small range in the measured dataset.
Clear relationships between treated water UVA, ΔUVA, treated water turbidity, DOC and TOC and total HAA at GV-098 were expected. However, only limited relationships were observed for these parameters (Figure 31a, b, d, f and g).
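The “clear”, “moderate” and “limited” labels used in the trend analyses follow the correlation coefficient thresholds defined in section 5.2.3a (0.55 and 0.30). A minimal sketch of that classification is given below. The function names and the sample values are hypothetical, and applying the thresholds to the absolute value of Pearson r is an assumption, made because inverse relationships are also described as clear in the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def classify_relationship(r):
    """Map a correlation coefficient to the labels used in the trend analyses."""
    magnitude = abs(r)  # assumption: the sign only gives the direction
    if magnitude > 0.55:
        return "clear"
    if magnitude >= 0.30:
        return "moderate"
    return "limited"


# Hypothetical paired observations (not data from the study).
temperature = [4.0, 6.5, 9.0, 12.0, 15.5, 18.0]
total_haa = [31.0, 48.0, 35.0, 52.0, 44.0, 60.0]

r = pearson_r(temperature, total_haa)
print(f"r = {r:.2f} -> {classify_relationship(r)} relationship")
```

As noted in the figure captions, Pearson r only captures linear association, so a “limited” label does not rule out a non-linear relationship.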
b) Predicted vs. measured data
The ability of the models to detect expected trends was assessed by comparing the predicted and measured outputs plotted vs. each of the input parameters. Figure 32 displays the results of this analysis.
Figure 32 – Typical results for predictions vs. inputs of interest analysis. Results presented for model HAA-B-GV. The x-axis in each graph contains an input. The y-axes are the corresponding predictions or measured values. On the left column, in black, the predictions; on the right column, in grey, the measured total HAA values (similar to Figure 31, but only for the test subset of the data). a: Predicted total HAA at GV-098 vs. measured treated water UVA; b: Measured total HAA at GV-098 vs. measured treated water UVA; c: Predicted total HAA at GV-098 vs. measured treated water temperature; d: Measured total HAA at GV-098 vs. measured treated water temperature; e: Predicted total HAA at GV-098 vs. measured treated water pH; f: Measured total HAA at GV-098 vs. measured treated water pH; g: Predicted total HAA at GV-098 vs. measured treated water turbidity; h: Measured total HAA at GV-098 vs. measured treated water turbidity; i: Predicted total HAA at GV-098 vs. measured ΔUVA; j: Measured total HAA at GV-098 vs. measured ΔUVA.
The trends observed for the measured test dataset (right column of Figure 32) were not quantitatively similar to those observed when considering all the data (Figure 31) for the plots vs.
treated water UVA, temperature and pH:
i) The measured test dataset displayed an inverse relationship to what was theoretically expected and to what was observed in the entire dataset: greater treated water UVA was associated with lower total HAA at GV-098;
ii) The measured test dataset displayed a substantially stronger relationship between treated water temperature and measured total HAA at GV-098 than what was observed in the entire dataset; and
iii) The measured test dataset displayed a substantially weaker relationship between treated water pH and measured total HAA at GV-098 than what was observed in the entire dataset.
These results provide a better understanding of the issues with repeatability of results discussed in section 5.3.3. The split of the data impacted the results of the models because different fractions of the dataset (e.g. the test dataset) did not display similar trends to those observed when considering all of the data. Again, this was likely due to the small dataset available to train the models.
Relatively different trends were observed between measured and predicted total HAA at GV-098 for the different input parameters of interest on the test dataset (Figure 32). The predicted total HAA at GV-098 (left column of Figure 32) displayed substantially stronger relationships for all the different input parameters of interest when compared to the relationships observed in the measured total HAA at GV-098 plots (right column of Figure 32).
c) Relative importance analyses and multiple linear regressions
Results from the relative variable importance analyses, based on Garson’s algorithm and the Connection weights method (see section 2.3.6), are presented in Figure 33. Both analyses attributed similar impacts to the different input parameters considered.
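For readers unfamiliar with the two relative importance methods, the sketch below shows their commonly cited formulations for a network with a single hidden layer and a single output. It is an illustration under stated assumptions, not the implementation used in the study: the function names and weight values are hypothetical, bias terms are ignored, and every hidden neuron is assumed to have at least one non-zero weight product.

```python
def garson_importance(w_in, w_out):
    """Garson's algorithm.

    w_in[i][j]: weight from input i to hidden neuron j.
    w_out[j]:   weight from hidden neuron j to the single output.
    Returns unsigned relative importances that sum to 1.
    """
    n_inputs, n_hidden = len(w_in), len(w_out)
    shares = [0.0] * n_inputs
    for j in range(n_hidden):
        # Absolute contribution of each input through hidden neuron j.
        contributions = [abs(w_in[i][j]) * abs(w_out[j]) for i in range(n_inputs)]
        total = sum(contributions)
        for i in range(n_inputs):
            shares[i] += contributions[i] / total
    grand_total = sum(shares)
    return [share / grand_total for share in shares]


def connection_weights(w_in, w_out):
    """Connection weights method: signed sum of weight products per input."""
    return [
        sum(w_in[i][j] * w_out[j] for j in range(len(w_out)))
        for i in range(len(w_in))
    ]


# Hypothetical weights: 3 inputs, 2 hidden neurons, 1 output.
w_in = [[2.0, -1.0], [0.5, 0.5], [-0.5, 1.5]]
w_out = [1.0, -2.0]

relative_importance = garson_importance(w_in, w_out)
signed_importance = connection_weights(w_in, w_out)
print("Garson:", [round(value, 3) for value in relative_importance])
print("Connection weights:", signed_importance)
```

Both methods depend only on the trained weights, which is why importance rankings can change between trainings that start from different random initial weights even when the predictions themselves remain similar.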
Figure 33 – Typical results for relative variable importance analyses of the distribution system models. a: Garson’s algorithm results for model HAA-B-GV; b: Connection weight method results for model HAA-B-GV; c: Garson’s algorithm results for model HAA-DT-GV; d: Connection weight method results for model HAA-DT-GV.
The relative importance analyses consistently ranked treated water temperature as the input parameter with the least impact on the predicted total HAA at GV-098 (Figure 33). This is not consistent with the analysis of trends in the measured data (section 5.3.4a) and the ability of the models to detect these trends (section 5.3.4b), which indicated that treated water temperature was actually the input parameter with the greatest impact on total HAA at GV-098.
The results from both Garson’s algorithm and the Connection weight method varied substantially over successive trainings, even though the predictions of the models remained stable (i.e. similar) after the introduction of random-seeds (see section 5.3.3). All other input parameters were ranked with variable, moderate importance.
Multiple linear regression (MLR) was used to further investigate the impact of the different input parameters on the predictions of the models. No major multicollinearity was detected (variance inflation factor, VIF, was smaller than 10 for all explanatory variables), not even between DOC and TOC (VIF ≈ 4). Because the input parameters have different magnitudes, they were standardized to a mean of zero and a standard deviation of one to provide a more intuitive comparison between the coefficients. Correlation coefficients for predicted vs. measured values of MLRs were lower than those for all ANNs across all models (i.e. MLRs displayed lower accuracy than ANNs).
Table 8 – Distribution system models multiple linear regressions
Model    Multiple linear regression coefficients    Pearson r
HAA-B-GV    HAA = 0.3270 × Treat. UVA + 0.4925 × Temp + 0.2935 × pH + 0.0220 × Turb − 0.1107 × ΔUVA    0.539
HAA-DT-GV    HAA = 0.2295 × Treat. UVA + 0.5415 × Temp + 0.3586 × pH + 0.0025 × Turb − 0.067 × ΔUVA + 0.4527 × DOC − 0.2100 × TOC    0.568
HAA represents total HAA at GV-098; Treat. UVA represents treated water UVA; Temp represents treated water temperature; pH represents treated water pH; Turb represents treated water turbidity; DOC represents treated water DOC; and TOC represents treated water TOC.
The regression coefficient for the treated water temperature was consistently the greatest, indicating that the MLRs consistently ranked treated water temperature as the input parameter with the greatest impact on total HAA at GV-098 (Table 8). This was consistent with the analysis of trends in the measured data (section 5.3.4a) and the ability of the models to detect these trends (section 5.3.4b).
The regression coefficients for the treated water pH and UVA were consistently the second or third greatest, indicating that the MLRs consistently ranked treated water pH or UVA as the input parameters with the second or third greatest impact on total HAA at GV-098 (Table 8). The impact of treated water UVA on total HAA was consistent with the analysis of trends in the measured data (section 5.3.4a) and the ability of the models to detect these trends (section 5.3.4b). The impact of treated water pH on total HAA was substantially stronger than that observed in the analysis of trends in the measured data (section 5.3.4a) and the ability of the models to detect these trends (section 5.3.4b). All other input parameters were ranked with variable, moderate impact on total HAA at GV-098.
A visual representation of the comparison between Garson’s algorithm and MLR results is displayed in Figure 34.
Figure 34 – Comparison between Garson’s algorithm and MLR results for distribution models (see notes 1, 2 and 3). a: Model HAA-B-GV; b: Model HAA-DT-GV.
1: Importance magnitudes from the connection weight method were similar to those from Garson’s algorithm. Garson’s algorithm results are scaled to 100% relative importance and are, therefore, better suited for visual representation.
2: MLR coefficients were scaled to 100% relative importance as follows: Scaled MLR coefficient = Original MLR coefficient / Sum of original MLR coefficients
3: Absolute values of the MLR coefficients were used to match the absolute values produced by Garson’s algorithm.
[Figure 34 plots: Garson's algorithm coefficient and MLR coefficient (both on a 0 to 1 scale) for each input parameter of models HAA-B-GV and HAA-DT-GV.]

It can be concluded from these analyses that:
i) None of the input parameters had a consistently high impact on total HAA at GV-098;
ii) The input parameter of greatest importance for MLR (treated water temperature) was the parameter of least impact based on relative importance analyses; and
iii) Relative importance analyses results were not consistent.

d) Discussion

Consistent with the analysis presented in section 5.2.3, the relative variable importance analyses overlooked some trends, such as the clear relationship, observed both in the measured test dataset and in the predictions of the models, between treated water temperature and total HAA at GV-098. Multiple linear regressions attributed a much higher importance to treated water pH than was observed in either the measured test dataset or the predictions of the models. Similar results were observed in the trend analyses for the THMs, DCAA and TCAA models. In summary, no significant differences were observed between the two models (i.e. HAA-B-GV and HAA-DT-GV) with regard to their ability to detect trends.
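As a worked illustration of the scaling described in note 2 of Figure 34, the short Python sketch below rescales the Table 8 coefficients of model HAA-B-GV to relative importances. It assumes that notes 2 and 3 are combined, i.e. that each absolute coefficient is divided by the sum of the absolute coefficients; the function and variable names are illustrative and are not taken from the code used in the study.

```python
# Standardized MLR coefficients for model HAA-B-GV, as reported in Table 8.
MLR_HAA_B_GV = {
    "Treated water UVA": 0.3270,
    "Treated water temperature": 0.4925,
    "Treated water pH": 0.2935,
    "Treated water turbidity": 0.0220,
    "Delta UVA": -0.1107,
}


def scale_to_relative_importance(coefficients):
    """Scale coefficients so that their absolute values sum to 1 (100%).

    Assumption: the denominator is the sum of the absolute coefficients.
    """
    total = sum(abs(value) for value in coefficients.values())
    return {name: abs(value) / total for name, value in coefficients.items()}


relative_importance = scale_to_relative_importance(MLR_HAA_B_GV)

# Print the inputs from most to least important.
for name, share in sorted(relative_importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.1%}")
```

Under this assumption, treated water temperature accounts for roughly 40% of the scaled MLR importance for model HAA-B-GV, which is the kind of value that is compared against the Garson's algorithm results in Figure 34.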
No significant differences were observed between the two models with regard to accuracy (see section 5.3.3). The addition of treated water DOC and TOC as input variables did not clearly benefit the models. Model HAA-B-GV was therefore selected as the best performing distribution model overall due to its accuracy, ability to detect trends and simplicity. This model was used to perform the scenario analyses discussed in section 5.4.

5.4. Scenario analyses

5.4.1. Assessment

As discussed in section 4.5, scenario analyses were performed to assess the differences in operations between the current ozone dosing approach (i.e. flow based) and an approach that uses ANNs. The test dataset of the treatment models (i.e. real-time raw and treated water characteristics and operational setpoints) was used for the scenario analyses. For the discussion that follows, only models T-Δ1 and HAA-B-GV were considered (see discussion at the end of sections 5.2.3 and 5.3.4). Two analyses were considered: i) Assessment of the required ozone dose to maintain the HAA concentration at GV-098 below a target level; and ii) Assessment of the required ozone dose to maintain the treated water UVT above target levels.

5.4.2. Ozone dosing to control HAA formation

This scenario analysis combined the selected treatment and distribution models, T-Δ1 and HAA-B-GV. A logic diagram outlining the step-by-step procedure used by the algorithm that performed this analysis is presented in Appendix D.

The first scenario analysis identified the ozone concentration required to maintain the HAA concentration in the distribution system at a level equal to or below a target concentration. As discussed in section 4.5, based on discussion with Metro Vancouver, the HAA target for the scenario analyses was set as the least conservative of the following values: 60µg/L with a 90% confidence level or 50µg/L with a 99% confidence level (section 5.3.3, Figure 29).
The 90% confidence interval indicated a 41.7μg/L target, and the 99% confidence interval indicated a 42μg/L target. Therefore, the chosen target HAA concentration was 42μg/L. The ozone concentration predicted by the ANN to maintain the HAA concentration equal to or below the target level was compared to the ozone concentration that is currently applied. Both the proposed and the currently applied ozone concentrations are illustrated in Figure 35.

Figure 35 – Scenario analysis for HAA concentration based on ozone dose

The observations in Figure 35 are average ozone concentrations for five-minute intervals (see section 4.3.1). As discussed (section 2.3.2), the split of the data is random. The test dataset has 20% of the total available data (see section 4.3.1). Therefore, on average, one of every five observations in the full dataset is included in the test set and four are absent. The five-minute intervals are thus most likely non-consecutive (i.e. on average, there are four five-minute intervals between consecutive observations in the test set). This explains part of the noise (i.e. short-term variability) observed in the currently applied ozone setpoint line. The ozone concentration setpoint for the dataset ranged between 1.4mg/L and 2.6mg/L.

The ozone concentration predicted by the ANN to meet the target HAA level was consistently lower than the applied dose. The model indicated that a 1.0mg/L ozone concentration was sufficient to meet the target HAA level, regardless of conditions. Without imposed boundaries, the model would have indicated that no ozone was required to meet the target HAA level. However, because the treatment model was trained for ozone doses ranging between 1.0mg/L and 3.0mg/L, the predictions of the model were confined to this range.

As previously discussed (section 5.3.4), none of the input parameters for the distribution model had a consistent and substantial impact on the predicted HAA.
This is reflected in the above scenario analysis, in which the predicted HAA is not impacted by the raw or treated water characteristics. The predicted setpoint was constant at the lower boundary of the range (i.e. 1.0mg/L) because the model “learned” that low treated water UVA, and consequently low ozone doses, were associated with low total HAAs at GV-098 (see section 5.3.4b and Figure 32a and b).

These results highlight the importance of investigating the trends and the relative importance of input parameters on the predictions of ANN models. Note that the majority of the studies that used ANNs for water or wastewater applications did not report having completed such an assessment (section 2.4). Considering accuracy exclusively, the distribution models had an acceptable performance. However, the trend analyses (section 5.3.4) revealed that theoretically expected trends were not observed in the dataset and, therefore, were neither detected nor replicated by the distribution models. This first scenario analysis indicated that, despite providing a good estimate of HAA concentrations in the distribution system, caution should be used in applying the distribution models developed in the present study to control ozone concentrations using real-time water characteristics and operational setpoints.

5.4.3. Ozone dosing to control treated water UVT

The second scenario analysis identified the ozone concentration required to maintain a target level of treated water UVT. Different target treated water UVTs were tested, ranging between 86% and 96% (historical measured treated water UVT range). Figure 36 displays a typical result for this scenario analysis.

Figure 36 – Typical scenario analysis to maintain constant treated water UVT. Results presented for a 92% treated water UVT target.

The ozone concentration setpoint for the dataset was the same as in Figure 35, ranging between 1.4mg/L and 2.6mg/L.
Again, observations in Figure 36 are average ozone doses for five-minute intervals.

Without imposed boundaries, the model would have indicated that, on some occasions, no ozone was required to meet the target treated water UVT level. However, because the treatment model was trained for ozone doses ranging between 1.0mg/L and 3.0mg/L, the predictions of the model were confined to this range. The ozone concentrations predicted by the ANN indicated that, to maintain a constant treated water UVT, the ozone delivery should be variable, adjusting to changes in raw water characteristics.

An economic evaluation was performed to estimate the costs of operations with different target treated water UVTs. UV-disinfection operation and maintenance (O&M) costs for different target treated water UVTs were estimated based on previous studies (Cotton, et al., 2001). Ozone concentrations were predicted by the model for different target treated water UVTs. Ozonation O&M costs were estimated based on previous studies (Mundy, et al., 2018). The estimated annual O&M costs for UV-disinfection and ozonation are displayed in Figure 37.

Figure 37 – Estimated annual O&M costs with UV-disinfection and ozonation

As expected, the greater the target treated water UVT, the lower the estimated costs for UV-disinfection (section 2.2.4) and the greater the costs for ozonation (section 2.1). For target treated water UVTs lower than 92%, the estimated costs for UV disinfection were dominant. For target treated water UVTs greater than 92%, the estimated costs for ozonation became dominant. The sum of both estimated costs is displayed in Figure 38.
[Figure 37 plot: estimated annual O&M costs (CAD) versus target treated water UVT (%), with one series for UV disinfection and one for ozonation.]

Figure 38 – Estimated annual O&M costs with UV-disinfection and ozonation combined
[Figure 38 plot: estimated UV disinfection and ozonation annual O&M costs (CAD) versus target treated water UVT (%).]

Estimated combined costs indicated that a target treated water UVT of approximately 92% minimized the combined UV disinfection and ozonation O&M costs. This is approximately 1.5 percentage points lower than the average measured treated water UVT during the study period (93.5%). Annual O&M costs for UV disinfection (Cotton, et al., 2001) and ozonation (Mundy, et al., 2018) resulting from the current ozone dosing approach (i.e. flow based) were also estimated. Based on the total cost estimates, savings associated with the implementation of the ozone dosing control using ANNs were calculated and are presented in Figure 39.

Figure 39 – Estimated annual O&M savings associated with implementation of ANNs
[Figure 39 plot: estimated UV disinfection and ozonation annual O&M savings (CAD) versus treated water UVT (%).]

Based on the total cost estimates, the scenario analysis indicated that savings in the order of $65,000 CAD per year can be achieved if ANNs are used to maintain a target treated water UVT of approximately 92%. Additional benefits of the use of ANNs to control ozone dose include:
i) Constant treated water quality, based on UVT, regardless of raw water characteristics;
ii) Optimization of UV-disinfection by controlling UVT;
iii) Increased biological stability of the water because of lower ozone addition (Hu, et al., 1999); and
iv) Theoretically, more controlled DBP levels in the distribution system.
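The dose selection described in section 5.4 can be sketched as a bounded search: candidate ozone doses are confined to the range on which the treatment model was trained (1.0 to 3.0mg/L), and the lowest dose whose predicted treated water UVT meets the target is selected. The Python sketch below is a minimal illustration only. The function names, the 0.05mg/L step and the linear `toy_model` are hypothetical stand-ins; in the study the prediction would come from the trained ANN evaluated at the current raw water characteristics.

```python
from typing import Callable, Optional

# Ozone dose range covered by the training data of the treatment model.
DOSE_MIN_MG_L = 1.0
DOSE_MAX_MG_L = 3.0


def lowest_dose_meeting_target(
    predict_uvt: Callable[[float], float],
    target_uvt: float,
    step: float = 0.05,  # hypothetical search resolution in mg/L
) -> Optional[float]:
    """Return the lowest candidate dose whose predicted UVT meets the target.

    Candidates are confined to [DOSE_MIN_MG_L, DOSE_MAX_MG_L], so the search
    never returns a dose below the trained range. None is returned when no
    dose in the range reaches the target.
    """
    n_candidates = int(round((DOSE_MAX_MG_L - DOSE_MIN_MG_L) / step)) + 1
    for k in range(n_candidates):
        dose = DOSE_MIN_MG_L + k * step
        if predict_uvt(dose) >= target_uvt:
            return dose
    return None


def toy_model(dose: float) -> float:
    """Hypothetical stand-in for the ANN: UVT (%) rising linearly with dose."""
    return 90.0 + 1.5 * dose


dose_for_92 = lowest_dose_meeting_target(toy_model, 92.0)
dose_for_91 = lowest_dose_meeting_target(toy_model, 91.0)
dose_for_95 = lowest_dose_meeting_target(toy_model, 95.0)
```

With this stand-in model, an easily met target returns the lower bound of the range, which mirrors the behaviour reported above when the predicted output is insensitive to the inputs, and an unreachable target returns `None` rather than an extrapolated dose.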
The scenario analysis estimated costs for UV disinfection and ozonation based on results of studies by others. A more in-depth economic evaluation of actual costs for UV disinfection and ozonation at the CWTP, including a sensitivity analysis, would provide a more accurate estimate of expected savings.

6. Conclusions

Listed below are the main conclusions from the present study:

• The use of artificial neural networks (ANNs) in drinking water treatment optimization is promising due to their pattern recognition and prediction capabilities. The present study proposed a framework for the use of artificial neural networks to model the treatment and distribution of drinking water.

• Drinking water treatment at CWTP could be successfully modelled with ANNs. Pre-chlorination UV transmittance was chosen as the output variable, as a surrogate of DBP formation potential. The treatment system models could reproduce predictions proportional to the measured values with high accuracy using the selected input parameters.

• Modelling of DBP concentrations at sampling station GV-098 revealed that models could reproduce predictions proportional to the measured values with moderate accuracy using the selected input parameters. However, a substantial fraction of the variability observed in the measured data could not be predicted using the selected input parameters. It was hypothesized that this variability partially resulted from procedures such as sample collection (and quenching), transport and analysis, which cannot be accounted for with the present modelling approach. Furthermore, it was hypothesized that reliable retention time estimates and chlorine residual measurements could improve modelling of DBP formation.

• Garson’s algorithm and the connection weight method produced results that, for the most part, agreed. However, both significantly over- or underestimated the impact of some input parameters on the outputs of the models. Multiple linear regressions were not an accurate representation of the impact of the inputs on the outputs either. Trend analyses results highlighted the importance of investigating the trends and the relative importance of input parameters on ANN predictions, assessments which the majority of the studies that used ANNs for water or wastewater applications did not report.

• Scenario analyses integrating treatment and distribution models were developed to identify optimal ozone dose setpoints that minimize operational costs while ensuring compliance with CDWQG for DBPs:
  o For the scenario analysis that assessed the required ozone dose to maintain the HAA concentration at GV-098 below a target level, which integrated treatment and distribution models, the predicted ozone setpoint was 1.0mg/L, regardless of conditions. DBPs predicted by the distribution models were not substantially impacted by the raw or treated water characteristics; and
  o For the scenario analyses that assessed the required ozone dose to maintain the treated water UVT above target levels, which only considered the treatment models, the proposed ozone delivery system, using real-time raw water characteristics to predict the appropriate ozone dose, could maintain a constant treated water UVT level equal to the average observed in the study period while bringing moderate savings in UV-disinfection and ozonation O&M and other benefits. Additionally, this scenario analysis indicated that, from an economic standpoint, optimal operation is achieved with a treated water UVT target of approximately 92%. Savings of approximately $65,000 CAD per year could be achieved if ANN models were used to control ozone dose for those conditions.

7. Recommendations

• The impact of ozone doses lower than 1.0mg/L on the UV transmittance of the water used by CWTP should be investigated.
The current study was limited to historical data, with ozone concentrations ranging from 1.0 to 3.0mg/L; and

• DBP formation throughout the distribution system in the region of Vancouver needs to be characterized more extensively. Recommendations include:
  o More frequent measurements of DBPs;
  o Measuring surrogates for DBP formation, such as ΔUVT, at the sampling stations in real time;
  o Studies of hydraulics and retention time in the distribution system.

References

Alencar, R., 2019. Dealing with very small datasets. [Online] Available at: https://www.kaggle.com/rafjaa/dealing-with-very-small-datasets [Accessed 7th July 2019].
Baxter, C., Stanley, S. & Zhang, Q., 1999. Development of a full-scale artificial neural network model for the removal of natural organic matter by enhanced coagulation. Journal of Water Services Research and Technology: Aqua, 48(4), pp. 129-136.
Baxter, C., Stanley, S., Zhang, Q. & Smith, D., 2002. Developing artificial neural network models of water treatment processes: a guide for utilities. Journal of Environmental Engineering and Science, 1(3), pp. 201-211.
Becker, D., 2018. Kaggle - Rectified Linear Units (ReLU) in Deep Learning. [Online] Available at: https://www.kaggle.com/dansbecker/rectified-linear-units-relu-in-deep-learning [Accessed 1st January 2019].
Brownlee, J., 2019. Machine Learning Mastery. [Online] Available at: https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/ [Accessed 14 June 2019].
Burney, S., Jilani, T. & Ardil, C., 2007. A Comparison of First and Second Order Training Algorithms for Artificial Neural Networks. International Journal of Computer and Information Engineering, 1(1), pp. 145-151.
Castle, N., 2017. Supervised vs. Unsupervised Machine Learning. [Online] Available at: https://www.datascience.com/blog/supervised-and-unsupervised-machine-learning-algorithms [Accessed 24 December 2018].
Chin, A. & Bérubé, P., 2005. Removal of disinfection by-product precursors with ozone-UV advanced oxidation process. Water Research, Volume 39, pp. 2136-2144.
Choi, D. & Park, H., 2001. A hybrid artificial neural network as a software sensor for optimal control of a wastewater treatment process. Water Research, 35(16), pp. 3959-3967.
Choi, Y. & Choi, Y., 2010. The effects of UV disinfection on drinking water quality in distribution systems. Water Research, 44(1), pp. 115-122.
Chollet, F., 2017. Deep learning with Python. 1st ed. s.l.:Manning Publications Company.
Cho, M., Kim, H., Cho, S. & Yoon, J., 2003. Investigation of Ozone Reaction in River Waters Causing Instantaneous Ozone Demand. Ozone: Science & Engineering, 25(4), pp. 251-259.
Chowdhury, S., Champagne, P. & McLellan, P., 2009. Models for predicting disinfection byproduct (DBP) formation in drinking water: A chronological review. Science of the Total Environment, Volume 407, pp. 4189-4206.
Cotton, C., Owen, D., Cline, G. & Brodeur, T., 2001. UV disinfection costs for inactivating "Cryptosporidium". Journal (American Water Works Association), 93(6), pp. 82-94.
Deborde, M. & von Gunten, U., 2008. Reaction of chlorine with inorganic and organic compounds during water treatment - kinetics and mechanisms. Water Research, Volume 42, pp. 13-51.
El Deeb, A., 2015. What to do with “small” data? [Online] Available at: https://medium.com/rants-on-machine-learning/what-to-do-with-small-data-d253254d1a89 [Accessed 7th July 2019].
Fisher, R., 1921. On the "Probable Error" of a Coefficient of Correlation Deduced from a Small Sample. Metron, Volume 1, pp. 3-32.
Gad, A., 2017. Brief introduction to deep learning + solving XOR using ANN, s.l.: Menoufia University.
Gagnon, C., Grandjean, B. & Thibault, J., 1997. Modelling of coagulant dosage in a water treatment plant. Artificial Intelligence in Engineering, Volume 11, pp. 401-404.
Garson, G., 1991. Interpreting neural network connection weights. AI Expert, 6(4), pp. 46-51.
Giwa, A. et al., 2016. Experimental investigation and artificial neural networks ANNs modeling of electrically-enhanced membrane bioreactor for wastewater treatment. Journal of Water Process Engineering, Volume 11, pp. 88-97.
Göb, S. et al., 1999. Modeling the kinetics of a photochemical water treatment process by means of artificial neural networks. Chemical Engineering and Processing: Process Intensification, 38(4-6), pp. 373-382.
Goh, A., 1995. Back-propagation neural networks for modeling complex systems. Artificial Intelligence in Engineering, 9(3), pp. 143-151.
Gontarskia, C., Rodrigues, P., Mori, M. & Prenem, L., 2000. Simulation of an industrial wastewater treatment plant using artificial neural networks. Computers & Chemical Engineering, 24(2-7), pp. 1719-1723.
Gottschalk, C., Libra, J. & Saupe, A., 2010. Ozonation of water and wastewater, 2nd Edition. Berlin: WILEY-VCH.
Griffiths, K. A., 2010. The application of artificial neural networks for filtration optimization in drinking water treatment. Toronto: Department of Civil Engineering - University of Toronto.
Hagan, M., Demuth, H., Beale, M. & de Jesus, O., 2014. Neural Network Design. 2nd ed. Oklahoma: Martin Hagan.
Hamed, M., Khalafallah, M. & Hassanien, E., 2004. Prediction of wastewater treatment plant performance using artificial neural networks. Environmental Modelling & Software, 19(10), pp. 919-928.
Health Canada, 2006. Guidelines for Canadian Drinking Water Quality: Trihalomethanes, Ottawa: s.n.
Health Canada, 2008. Guidelines for Canadian Drinking Water Quality: Haloacetic Acids, Ottawa: s.n.
Heaton, J., 2008. Introduction to Neural Networks with Java. 2nd ed. s.l.:Heaton Research.
Heck, S., Ellis, G. & Hoermann, V., 2004. Modeling the Effectiveness of Ozone as a Water Disinfectant Using an Artificial Neural Network. Environmental Engineering Science, 18(3).
Hu, J., Wang, Z., Ng, W. & Ong, S., 1999. The effect of water treatment processes on the biological stability of potable water. Water Research, 33(11), pp. 2587-2592.
Hung, Y., Waters, B., Yemmireddy, V. & Huang, C., 2017. pH effect on the formation of THM and HAA disinfection byproducts and potential control strategies for food processing. ScienceDirect, Volume 16, pp. 2914-2923.
IRIC, 2017. Overfitting and Regularization. [Online] Available at: https://bioinfo.iric.ca/overfitting-and-regularization/ [Accessed 7th July 2019].
Isac, G. & Nemeth, S. Z., 2008. Scalar and Asymptotic Scalar Derivatives. 1st ed. s.l.:Springer Science & Business Media.
Ivahnenko, T. & Zogorski, J., 2006. Sources and occurrence of chloroform and other trihalomethanes in drinking-water supply wells in the United States, 1986-2001, Reston, Virginia: U.S. Department of the Interior, U.S. Geological Survey.
Kakaraparthi, V., 2019. Activation Functions in Neural Networks - Medium. [Online] Available at: https://medium.com/@prateekvishnu/activation-functions-in-neural-networks-bf5c542d5fec [Accessed 05 March 2019].
Karadurmuş, E., Taşkın, N., Göz, E. & Yüceer, M., 2018. Prediction of Bromate Removal in Drinking Water Using Artificial Neural Networks. Ozone: Science & Engineering, 41(2), pp. 118-127.
Keskar, N. et al., 2017. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. 5th International Conference on Learning Representations.
Khataee, A. & Kasiri, M., 2010. Artificial neural networks modeling of contaminated water treatment processes by homogeneous and heterogeneous nanocatalysis. Journal of Molecular Catalysis A: Chemical, 331(1-2), pp. 86-100.
Kingma, D. & Ba, J., 2014. Adam: A Method for Stochastic Optimization. San Diego, s.n.
Korshin, G., Benjamin, M. & Hemingway, O. W., 2002. Development of Differential UV Spectroscopy for DBP, Seattle, WA: AWWA Research Foundation and American Water Works Association.
Krasner, S. et al., 1989. The occurrence of disinfection by-products in US drinking water. Jour, Volume 81(8), p. 41.
Kulkarni, P. & Chellam, S., 2010. Disinfection by-product formation following chlorination of drinking water: Artificial neural networks models and changes in speciation with treatment. Science of the Total Environment, 408(19), pp. 4202-4210.
Langlais, B., Reckhow, D. & Brink, D., 1991. Ozone in Water Treatment: Application and Engineering. 1st ed. s.l.:American Water Works Association.
Larson, R. A. & Rockwell, A., 1979. Chloroform and chlorophenol production by decarboxylation of natural acids during aqueous chlorination. Environmental Science and Technology, Volume 13, pp. 325-329.
Lek, S. et al., 1996. Application of neural networks to modeling nonlinear relationships in ecology. Ecol. Model, Volume 90, pp. 39-52.
Liu, W. & Ratnaweera, H., 2016. Improvement of multi-parameter-based feed-forward coagulant dosing control system with feed-back functionalities. Water Science & Technology, 74(2), pp. 491-499.
Manamperuma, L., Wei, L. & Ratnaweera, H., 2017. Multi-parameter based coagulant dosing control. Water Science & Technology, Volume 75.9, pp. 2157-2162.
Marzouk, M. & Elkadi, M., 2016. Estimating water treatment plant costs using factor analysis and artificial neural networks. Journal of Cleaner Production, 112(5), pp. 4540-4549.
Masters, D. & Luschi, C., 2018. Revisiting Small Batch Training for Deep Neural Networks. Bristol, UK: s.n.
Mayo, M., 2017. Neural Network Foundations, Explained: Activation Function. [Online] Available at: https://www.kdnuggets.com/2017/09/neural-network-foundations-explained-activation-function.html [Accessed 14 February 2019].
McArthur, R. & Andrews, R., 2015. Development of artificial neural networks based confidence intervals and response surfaces for the optimization of coagulation performance. Water Science & Technology: Water Supply, Volume 15.5, pp. 1079-1087.
McGonagle, J., Shaikouski, G. & Williams, C., 2019. Backpropagation - Brilliant. [Online] Available at: https://brilliant.org/wiki/backpropagation/ [Accessed 09 June 2019].
Metro Vancouver, 2019. Drinking Water Facilities. [Online] Available at: http://www.metrovancouver.org/services/water/quality-facilities/facilities-processes/drinking-water-treatment-facilities/Pages/default.aspx [Accessed 13 January 2019].
Milot, J., Rodriguez, M. & Sérodes, J., 2002. Contribution of neural networks for modeling trihalomethanes occurrence in drinking water. Journal of Water Resource Planning and Management, 128(5).
Mjalli, F., Al-Asheh, S. & Alfadala, H., 2007. Use of artificial neural network black-box modeling for the prediction of wastewater treatment plants performance. Journal of Environmental Management, 83(3), pp. 329-338.
Mundy, B. et al., 2018. A review of ozone system costs for municipal applications. Report by the Municipal Committee - IOA Pan American Group. Ozone: Science & Engineering, 40(4), pp. 266-274.
Nawi, N., Atomi, W. & Rehman, M., 2013. The Effect of Data Pre-processing on Optimized Training of Artificial Neural Networks. Procedia Technology, Volume 11, pp. 32-39.
Ng, A., 2015. Machine Learning, Stanford University. [Online] Available at: https://www.coursera.org/learn/machine-learning [Accessed 21 May 2019].
Olden, J., 2000. An artificial neural network approach for studying phytoplankton succession. Hydrobiology, Volume 436, pp. 131-143.
Olden, J. & Jackson, D., 2002. Illuminating the “black box”: a randomization approach for understanding variable contributions in artificial neural networks. Ecol. Modelling, Volume 154, pp. 135-150.
Olden, J., Joy, M. & Death, R., 2004. An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data. Ecol. Modelling, Volume 178, pp. 389-397.
Oram, B., 2014. UV Disinfection Drinking Water - Water Research Center. [Online] Available at: https://www.water-research.net/index.php/water-treatment/water-disinfection/uv-disinfection [Accessed 11 December 2018].
Özbelge, T. A., 2001. A study for chloroform formation in chlorination of resorcinol. Turkish Journal of Engineering and Environmental Science, Volume 25, pp. 289-298.
Ozone Solutions, 2015. Ozone Conversion & Equations. [Online] Available at: https://www.ozonesolutions.com/info/ozone-conversions-equations [Accessed 04 Feb 2019].
Paruelo, J. & Tomasel, F., 1997. Prediction of functional characteristics of ecosystems: a comparison of artificial neural networks and regression models. Ecol. Model, Volume 98, pp. 173-186.
Patni, S., 2018. Generalisation, Training-Validation & Test data. Machine Learning- Part 6. [Online] Available at: https://medium.com/@shubhapatnim86/generalisation-training-validation-test-data-machine-learning-part-6-1de9dbb7d3d5 [Accessed 7th July 2019].
Platikanov, S., Puig, X., Martin, J. & Tauler, R., 2007. Chemometric Modeling and Prediction of Trihalomethane Formation in Barcelona's Water Works Plant. Water Research, 41(15), pp. 3394-4006.
Rakness, K., 2011. Ozone in Drinking Water Treatment: Process Design, Operation, and Optimization. 1st ed. s.l.:American Water Works Association.
Ramalho, R., 2013. Introduction to wastewater treatment processes. 2nd ed. s.l.:Academic Press.
RealTech, 2017. DBP and formation potential monitoring. [Online] Available at: https://realtechwater.com/applications/drinking-water/dbp-formation-potential/ [Accessed 22 May 2019].
Rodriguez, M., Milot, J. & Sérodes, J., 2003. Predicting trihalomethane formation in chlorinated waters using multivariate regression and neural networks. Journal of Water Supply: Research and Technology, 52(3), pp. 199-215.
Rule, K., Ebbett, V. & Vikesland, P., 2005. Formation of chloroform and chlorinated organics by free-chlorine-mediated oxidation of triclosan. Environmental Science and Technology, Volume 190, pp. 3176-3185.
Shah, A., Dotson, A., Linden, K. & Mitch, W., 2011. Impact of UV Disinfection Combined with Chlorination/Chloramination on the Formation of Halonitromethanes and Haloacetonitriles in Drinking Water. Environmental Science Technology, 45(8), pp. 3657-3664.
Sharma, S., 2017. Activation Functions in Neural Networks - Towards Data Science. [Online] Available at: https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6 [Accessed 24 November 2018].
Shetty, G. & Chellam, S., 2003. Artificial neural networks modeling of contaminated water treatment processes by homogeneous and heterogeneous nanocatalysis. Journal of Membrane Science, 217(1-2), pp. 69-86.
Shimazu, H. et al., 2005. Developing a model for disinfection by-products based on multiple regression analysis in a water distribution system. Developing a model for disinfection by-products based on multiple regression analysis in a water distribution system, 54(4), pp. 225-237.
Silver, N., 2012. The Signal and the Noise: Why Most Predictions Fail – but Some Don't. 1st ed. United States: Penguin Group.
Singer, P., 1999. Formation and control of disinfection by-products in drinking water. USA: American Water Works Association.
Smarra, F. et al., 2018. Data-driven model predictive control using random forests for building energy optimization and climate control. Applied Energy, Volume 226, pp. 1252-1272.
Smith, L., 2017. Cyclical Learning Rates for Training Neural Networks. IEEE Winter Conference on Applications of Computer Vision (WACV), March.
Srivastava, T., 2014. How does Artificial Neural Network (ANN) algorithm work? - Analytics Vidhya. [Online] Available at: https://www.analyticsvidhya.com/blog/2014/10/ann-work-simplified/ [Accessed 04 January 2019].
Stansbury, D., 2014. The Clever Machine - Derivation: Error Backpropagation & Gradient Descent for Neural Networks. [Online] Available at: https://theclevermachine.wordpress.com/2014/09/06/derivation-error-backpropagation-gradient-descent-for-neural-networks/ [Accessed 03 Jan 2019].
Tolotra, S., 2018. What Is The Difference Between Step, Batch Size, Epoch, Iteration? Machine Learning Terminology. [Online] Available at: https://tolotra.com/2018/07/25/what-is-the-difference-between-step-batch-size-epoch-iteration-machine-learning-terminology/ [Accessed 09 June 2019].
Tupas, R., 2000. Artificial Neural Network Modelling of Neural Network Performance. Alberta, Edmonton, Canada: M.Sc. Thesis - University of Alberta.
Walia, A., 2017. Towards Data Science - Optimize Gradient Descent. [Online] Available at: https://towardsdatascience.com/types-of-optimization-algorithms-used-in-neural-networks-and-ways-to-optimize-gradient-95ae5d39529f [Accessed 03 Jan 2019].
Wei, J., Bhardwaj, A. & Di, W., 2018. Deep Learning Essentials. s.l.:Packt Publishing.
Wu, G. & Lo, S., 2010. Effects of data normalization and inherent-factor on decision of optimal coagulant dosage in water treatment by artificial neural network. Expert Systems with Applications, 37(7), pp. 4974-4983.
Ye, B. et al., 2011. Formation and modeling of disinfection by-products in drinking water of six cities in China. Journal of Environmental Monitoring, Volume 13, pp. 1271-1275.
Zhang, J., 2007. An Integrated Design Approach for Improving Drinking Water Ozone Disinfection Treatment Based on Computational Fluid Dynamics. Waterloo: University of Waterloo.
Zhang, Q. & Stanley, S., 1999. Real-time water treatment process control with artificial neural networks. Journal of Environmental Engineering, 125(2), pp. 153-160.
Zulkifli, H., 2018. Understanding Learning Rates and How It Improves Performance in Deep Learning. [Online] Available at: https://towardsdatascience.com/understanding-learning-rates-and-how-it-improves-performance-in-deep-learning-d0d4059c1c10 [Accessed 04 January 2019].
Zurada, J., 1992.
Introduction to Artificial Neural Systems. 1st ed. St. Paul, MN: West Publishing Co.

Appendix A: Models' architectures and hyperparameters.

Model           Architecture (Inputs – Hidden Layers – Outputs)   Learning rate   Batch size   # of steps
T-T1            5 – 5 – 1                                         0.001           32           10280
T-T2            5 – 4 – 1                                         0.01            32           10280
T-Δ1            5 – 6 – 1                                         0.01            32           20560
T-Δ2            5 – 6 – 1                                         0.001           32           17848
HAA – ΔU – GV   4 – 4 – 1                                         0.01            2            2772
HAA – DT – GV   4 – 4 – 1                                         0.01            2            2700

Appendix B: Logic diagram flow of the selection of architectures and hyperparameters of ANN models.

[Figure: logic flow diagram]

Appendix C: Sample code used in the present study

a) Data cleaning algorithm

import pandas as pd
import numpy as np

# 'data' is assumed to be a pandas DataFrame, loaded beforehand,
# with 'pH' and 'Dates' columns.
a = 144        # window length (number of readings)
i = a          # end of the current window in x_new
j = a          # index of the first reading after the initial window
k = 0          # number of consecutive rejected readings
x_out = []     # rejected readings held back
d_out = []     # dates of the held-back readings
y = data['pH']
x_new = y[i-a:i].tolist()
d_new = data['Dates'][i-a:i].tolist()
z = 0          # total number of rejected readings
o = 0          # number of times a held-back run was accepted
while j < 104892:
    # Mean and standard deviation of the last 'a' accepted readings
    mean_x = np.mean(x_new[i-a:i])
    std_x = np.std(x_new[i-a:i])
    if mean_x - 4*std_x <= y[j] <= mean_x + 4*std_x:
        # Within 4 standard deviations of the mean: accept the reading
        x_new.append(y[j])
        d_new.append(data['Dates'][j])
        i = i + 1
        j = j + 1
        k = 0
        x_out = []
        d_out = []
    else:
        if k == a:
            # A full window of consecutive rejections is treated as a
            # real change in the signal: accept the held-back readings
            x_new = x_new + x_out
            d_new = d_new + d_out
            i = i + a
            j = j + 1
            o = o + 1
            x_out = []
            d_out = []
            k = 0
        else:
            # Outside the limits: hold the reading back
            x_out.append(y[j])
            d_out.append(data['Dates'][j])
            k = k + 1
            j = j + 1
            z = z + 1

b) ANN training

import numpy as np
import tensorflow as tf    # TensorFlow 1.x (tf.contrib was removed in 2.x)
import tensorflow.contrib.learn as learn
from sklearn import metrics, preprocessing
from sklearn.model_selection import train_test_split

np.random.seed(101)

# Column names follow those used in the scaling step
features = ['0 - UVT', '0 - Turbidity 1', '0 - pH', '0 - Temperature',
            '1 - Initial ozone concentration']
# Drop incomplete rows from inputs and target together so they stay aligned
subset = data[features + ['Delta UVT']].dropna()
X = subset[features].copy()
y = subset['Delta UVT']

# Unscaled data: 60% training, 20% cross-validation, 20% testing
X1NN, X_testNN, y1NN, y_testNN = train_test_split(X, y, test_size=0.2, random_state=101)
X_trainNN, X_crossNN, y_trainNN, y_crossNN = train_test_split(X1NN, y1NN, test_size=0.25, random_state=101)

# Standardize every input and the target (zero mean, unit variance)
for column in features:
    X[column] = preprocessing.scale(X[column])
y = preprocessing.scale(y)

# Same splits applied to the scaled data
X1, X_test, y1, y_test = train_test_split(X, y, test_size=0.2, random_state=101)
X_train, X_cross, y_train, y_cross = train_test_split(X1, y1, test_size=0.25, random_state=101)

# One column of cross-validation predictions per run (18278 rows in the study)
array = np.zeros([len(X_cross), 5])
errorlist = []
optimizer = tf.train.AdamOptimizer(learning_rate=0.01)

# Train the same architecture five times and record each validation error
for i in range(0, 5):
    # 'dimension' is the number of input features
    feature_columns = [tf.contrib.layers.real_valued_column("", dimension=X_train.shape[1])]
    regressor = learn.DNNRegressor(hidden_units=[6], feature_columns=feature_columns, optimizer=optimizer)
    regressor.fit(X_train, y_train, steps=20560, batch_size=32)
    uvt_predictions = regressor.predict(X_cross)
    array[:, i] = list(uvt_predictions)
    # Ignore runs whose predictions are constant
    if array[:, i].std() != 0:
        error = metrics.mean_squared_error(y_cross, array[:, i])
        errorlist = np.append(errorlist, error)

Appendix D: Logic diagram flow of the procedure used in one of the scenario analyses.

[Figure: logic flow diagram]

Appendix E: Process flow diagrams from CWTP

[Figures: process flow diagrams]

Appendix F: Sensors (i.e. tags) used for the development of models

Tag ID                Measurement
Q-AI-22-328           Raw water UVT (%)
WCQ4-AI-001A1         Raw water turbidity (NTU)
Q-AI-22-010           Raw water pH
Q-FI-50-101A1         Water flow (MLD)
Q-TI-22-020/PV.CV     Raw water temperature (°C)
Q-AI-50-015/PV.CV     Ozone concentration in gaseous mixture (%/w)
Q-FI-50-012/PV.CV     Gaseous mixture flow (LPM)
Q-PI-50-010/PV.CV     Gaseous mixture pressure (kPa)
Q-AI-50-630A1/PV.CV   Pre-chlorination UVT (%)
Q-AI-50-102A1         Ozone concentration prior to soda ash (mg/L)
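
The data cleaning algorithm in Appendix C (a) can also be written as a short, self-contained function. The sketch below is illustrative only and is not the code used in the study. It relies on the Python standard library instead of pandas and NumPy. The function name rolling_sigma_filter, its parameters and its return format are hypothetical. It also simplifies the original slightly: a held-back run is accepted as soon as it reaches one full window of consecutive rejections.

```python
from statistics import mean, pstdev


def rolling_sigma_filter(values, window=144, n_sigma=4.0):
    """Return the indices of the readings kept by a rolling mean +/- n_sigma filter.

    Hypothetical helper written for illustration; names and defaults are
    assumptions modelled on Appendix C (a), not taken from the study's code.
    """
    # The first window of readings seeds the statistics and is always kept.
    kept = list(range(min(window, len(values))))
    pending = []  # indices of consecutive rejected readings
    for j in range(window, len(values)):
        recent = [values[idx] for idx in kept[-window:]]
        centre = mean(recent)
        spread = pstdev(recent)  # population standard deviation, as np.std computes by default
        if abs(values[j] - centre) <= n_sigma * spread:
            kept.append(j)
            pending = []  # an accepted reading resets the rejected run
        else:
            pending.append(j)
            if len(pending) == window:
                # A full window of rejections is treated as a real shift
                # in the signal, so the whole run is accepted.
                kept.extend(pending)
                pending = []
    return kept


if __name__ == "__main__":
    # Made-up pH-like readings with one spike at index 5.
    readings = [7.0, 7.2, 7.0, 7.2, 7.1, 20.0, 7.1, 7.0]
    print(rolling_sigma_filter(readings, window=4))  # prints [0, 1, 2, 3, 4, 6, 7]
```

Returning indices rather than values lets the same result select the matching dates, which is the role of d_new in Appendix C (a).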
