Open Collections

UBC Theses and Dissertations

Improving prediction of water main failures using statistical and machine learning algorithms
Vaags, Eric Henry

Abstract

Models that estimate the likelihood of failure of water mains are widely used to support the repair and replacement strategies of water utilities. Advances in statistics and machine learning have introduced a wide range of models, and improvements in data management have made increasingly complex models more feasible. The datasets used to develop these models are frequently subject to change as strategies for the operation and renewal of water distribution systems evolve. This issue is potentially exacerbated by the nonstationary processes impacting these systems. For water main failure prediction models to be useful in this dynamic context, it may be necessary for utilities to periodically evaluate several models for their dataset, or for researchers to examine the performance of one or more models across multiple datasets. This work presents a framework for the selection and analysis of water main failure prediction models that is intended to enable efficient development of a range of models for a single dataset, or investigation of the performance of models across several datasets. Each step of the framework is described, and recommendations are given for researchers and asset managers attempting to implement the processes defined herein. The framework is investigated using data from four different utilities, where each dataset is highly censored. Through the application of the framework, four models are selected and refined: the Cox Proportional Hazards Model, the Neural Multi-Task Logistic Regression Model, the XGBoost Survival Embeddings Model, and the Random Survival Forests Model. These models are trained on each of the utility datasets and the outputs are compared to assess the efficacy of the framework. Results show that the framework may be used to identify models that are sufficiently robust to achieve high performance using datasets from four different utilities. Of the final selection of models developed through the framework, the lowest performance among all four datasets is a C-index of 0.780. Additionally, the framework establishes at least one model for each utility that performs very well: the C-index of the best model developed for each utility ranges from 0.880 to 0.913.
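
The abstract reports model performance as a C-index (concordance index) on highly censored data. As a rough illustration of what that metric measures, the following is a minimal sketch of Harrell's C-index for right-censored observations. It is not the thesis's implementation: the function name, the toy pipe records, and the choice to skip pairs with tied times are assumptions made for this example.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times       -- observed time for each pipe (failure or end of record)
    events      -- 1 if a failure was observed, 0 if the record is censored
    risk_scores -- model output; a higher score means earlier predicted failure
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        # `a` is the pipe with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the true order is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # model ranked the earlier failure as riskier
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical records: four pipes, two observed failures, two censored.
times = [5, 10, 15, 20]
events = [1, 0, 1, 0]
risk_scores = [0.9, 0.4, 0.05, 0.1]
c_index = concordance_index(times, events, risk_scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the reported range of 0.780 to 0.913. Heavy censoring reduces the number of comparable pairs, so the estimate rests on fewer comparisons than the dataset size might suggest.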

Item Metadata

Title	Improving prediction of water main failures using statistical and machine learning algorithms
Creator	Vaags, Eric Henry
Supervisor	Lence, Barbara J.
Publisher	University of British Columbia
Date Issued	2021
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2021-10-19
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0402556
URI	http://hdl.handle.net/2429/80000
Degree	Master of Applied Science - MASc
Program	Civil Engineering
Affiliation	Applied Science, Faculty of; Civil Engineering, Department of
Degree Grantor	University of British Columbia
Graduation Date	2021-11
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
