UBC Theses and Dissertations

Assessment of source water microbial quality using Bayesian belief networks and data balancing algorithms

Aliashrafi Zagi, Atefeh

Abstract

Cryptosporidium and E. coli are recognized as critical source-water pathogens that carry a risk of mortality. Protecting public health from waterborne risks therefore requires monitoring both organisms in drinking water sources. However, direct measurement of these pathogens is expensive and labor-intensive, which limits the available information and delays risk-based management of water systems. While these challenges hinder real-time monitoring of pathogen levels in ambient waters, AI-based techniques offer a fast and effective alternative to direct measurement. Bayesian Belief Networks (BBNs) are one such data-driven method gaining traction for modelling environmental systems and capturing their uncertainties. BBNs can assist decision-makers by visualizing the interactions among variables in complex systems. In this thesis, BBNs are used to estimate Cryptosporidium and E. coli levels, providing a real-time assessment of the microbial quality of source water and filling the time gap that direct measurement requires. However, available Cryptosporidium data are scarce and unbalanced, mainly indicating absence or non-detectable levels of Cryptosporidium. To overcome this challenge, two data balancing algorithms, Adaptive Synthetic Sampling (ADASYN) and the Synthetic Minority Over-sampling Technique (SMOTE), were applied. The objective was to remove the imbalance in the dataset and train a model that can predict both the presence and the absence of Cryptosporidium based on unbalanced, real measurements. In the current work, a BBN model for Cryptosporidium prediction is trained for the first time on a balanced dataset generated with the ADASYN and SMOTE algorithms. Applying the balancing algorithms increased the prediction accuracy to more than 60%, compared with models developed from unbalanced datasets.
Furthermore, the sensitivity of pathogen levels to different water quality and weather parameters was investigated to improve understanding of the factors influencing source water quality. Although precipitation and temperature had a significant impact on the target parameters, the scale of the impact was highly site-specific. This observation indicates that, besides weather and water quality characteristics, the particular characteristics of each monitoring site appear to affect the levels of Cryptosporidium and E. coli in the studied water sources.
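
The balancing step mentioned in the abstract can be illustrated with a minimal sketch of the core idea behind SMOTE: each synthetic minority sample is placed at a random position on the line segment between an existing minority sample and one of its nearest minority neighbours. This is not the thesis's implementation. The function name `smote_sketch`, the choice of `k`, the feature names, and the toy values are all assumptions made for illustration only.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE's core idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: (turbidity, rainfall) pairs. Values are made up.
absent = [(1.0, 0.0), (1.2, 0.5), (0.8, 0.2), (1.1, 0.1), (0.9, 0.3), (1.3, 0.4)]
present = [(4.0, 12.0), (5.0, 15.0), (4.5, 10.0)]

# Oversample the rare "present" class until both classes have equal size.
new_points = smote_sketch(present, n_new=len(absent) - len(present))
balanced_present = present + new_points
```

ADASYN follows the same interpolation idea but generates more synthetic points around minority samples that have many majority-class neighbours, that is, the samples that are harder to learn. In practice, a maintained implementation such as the `SMOTE` and `ADASYN` classes in the imbalanced-learn package would normally be preferred over a hand-written version; the abstract does not state which implementation was used.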

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International