UBC Theses and Dissertations

Prediction and anomaly detection in water quality with explainable hierarchical learning through parameter sharing

Mohammad Mehr, Ali

Abstract

Decisions about water quality have significant implications for diverse industries and for the general population. In a 2020 study, Guo et al. report that the literature on modeling spatiotemporal variability in surface water quality at large scales, across multiple catchments, is very limited. In this thesis, we introduce a simple, explainable, and transparent machine learning model, derived from linear regression with hierarchical features, for efficient prediction and anomaly detection on large-scale spatiotemporal datasets. Our model learns offsets for the features in the dataset while exploiting a hierarchy among those features. These offsets enable generalization and can be used for anomaly detection. We present theoretical results on such hierarchical models. We also built a water pollution platform for exploratory analysis of water quality data at large scales. We evaluate our model's predictions on the Waterbase - Water Quality dataset from the European Environment Agency and investigate the model's explainability. Finally, we examine the model's performance on classification tasks, analyzing its capacity for regularization and smoothing as the number of observations in the dataset grows.
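
To make the idea of hierarchical offsets concrete, the following is a minimal sketch and not the thesis's actual formulation. It assumes a two-level hierarchy (country, then monitoring site), models each observation as the sum of a global offset, a country offset, and a site offset, and fits them by ridge-regularized least squares. The function name `fit_hierarchical_offsets`, the penalty `lam`, and the toy measurements are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_hierarchical_offsets(levels, y, lam=1.0):
    """Fit y as a sum of one offset per hierarchy level (ridge regression).

    levels: list of integer arrays, ordered coarse to fine; each array has
            one group id per observation (ids start at 0).
    lam:    L2 penalty that shrinks every offset except the global one.
    Returns the weight vector and the design matrix.
    """
    n = len(y)
    blocks = [np.ones((n, 1))]              # column for the global offset
    for ids in levels:
        onehot = np.zeros((n, ids.max() + 1))
        onehot[np.arange(n), ids] = 1.0     # one indicator column per group
        blocks.append(onehot)
    X = np.hstack(blocks)
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0                     # global offset is not penalized
    w = np.linalg.solve(X.T @ X + penalty, X.T @ y)
    return w, X

# Toy data (invented): two countries, four sites, one injected outlier.
country = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
site = np.array([0, 0, 1, 1, 2, 2, 3, 3, 3])
y = np.array([1.0, 1.1, 1.2, 1.0, 3.0, 3.1, 2.9, 3.0, 9.0])

w, X = fit_hierarchical_offsets([country, site], y, lam=1.0)
resid = y - X @ w                           # large |residual| flags anomalies
suspect = int(np.argmax(np.abs(resid)))

# A site never seen in training falls back on the shared coarser offsets.
new_site_pred = w[0] + w[1 + 1]             # global + offset of country 1
print(suspect, new_site_pred)
```

In this sketch, parameter sharing comes from the coarser offsets: every site in a country reuses that country's offset, so sparsely observed sites are shrunk toward their parent, and the residuals give a simple anomaly score.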

License

Attribution-NoDerivatives 4.0 International
