Open Collections
UBC Theses and Dissertations
Application of machine learning and information theory to monitor and predict environmental signals
Foroozand, Hossein
Abstract
Environmental signal forecasting is the process of predicting future conditions from past and present data. In general, forecasting builds on the scientific process of uncovering knowledge and interpreting the meaning of those discoveries. In recent decades, data availability has revolutionized the investigation of the natural world and the knowledge generated through that investigation. Recent progress in environmental signal prediction has been driven by 1) methodological improvements in prediction models; and 2) the emergence of new data acquisition techniques and the resulting data sets. This dissertation is divided into two main parts that address both aspects of recent progress (i.e., striving for better models and better data).
Machine learning is a fast-growing branch of data-driven modeling and one of the most influential contributors to model improvement. There are many ways to improve model predictions in this field, and Bootstrap AGGregatING (Bagging), which uses a large collection of models (called an ensemble) instead of a single one, is a widely applied method. Training those models can be computationally expensive. In this research, we propose a method that picks only the most informative samples for model training, achieving equally good performance with a smaller ensemble. For problems where computational effort is a limitation, this could lead to better predictions.
The pursuit of better data relies partly on optimally designing the monitoring network. Monitoring network optimization using information theory measures, like other statistical approaches, faces multiple problems regarding the assumptions made in choosing the objective function and the data discretization. The research undertaken in the second part of the dissertation focuses mainly on investigating how these assumptions affect the optimal network layouts. We propose a single-objective optimization of joint entropy (the network's information content) to maximize information collection. We propose the first application of the K-means quantization method to improve data representativeness in monitoring network design. We introduce information partitioning techniques to improve the network selection process once it reaches its saturation point in terms of achievable information content, and we present a novel framework for a case of high-density rain gauge network design.
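To make the joint entropy objective more concrete, the following is a minimal sketch, not the dissertation's actual algorithm. It assumes the station records are already discretized into integer bins and uses a simple greedy search as the optimizer; the station names, the data, and the functions joint_entropy and greedy_select are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def joint_entropy(columns):
    """Shannon joint entropy (in bits) of one or more discretized series."""
    # Each time step becomes one joint state, e.g. (bin at A, bin at C).
    joint_states = list(zip(*columns))
    n = len(joint_states)
    counts = Counter(joint_states)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def greedy_select(stations, k):
    """Pick k stations, each time adding the one that raises joint entropy most.

    Greedy search is an assumption made for this sketch; ties are broken
    in favor of the alphabetically first station name.
    """
    selected = []
    remaining = sorted(stations)
    for _ in range(k):
        best = max(
            remaining,
            key=lambda name: joint_entropy(
                [stations[s] for s in selected + [name]]
            ),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical, already-discretized records (one integer bin per time step).
stations = {
    "A": [0, 0, 1, 1, 2, 2, 0, 1],
    "B": [0, 0, 1, 1, 2, 2, 0, 1],  # exact copy of A: adds no information
    "C": [0, 1, 0, 1, 0, 1, 0, 1],
}

order = greedy_select(stations, 2)
print(order)
```

In this toy data set, station B duplicates station A, so adding B leaves the joint entropy unchanged and the search prefers C as the second station. How the continuous measurements are binned before this step (the discretization assumption discussed in the abstract) changes the entropy values and can therefore change the selected layout.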
Item Metadata
Title | Application of machine learning and information theory to monitor and predict environmental signals |
Creator | |
Supervisor | |
Publisher | University of British Columbia |
Date Issued | 2021 |
Genre | |
Type | |
Language | eng |
Date Available | 2023-09-30 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0401733 |
URI | |
Degree | |
Program | |
Affiliation | |
Degree Grantor | University of British Columbia |
Graduation Date | 2021-11 |
Campus | |
Scholarly Level | Graduate |
Rights URI | |
Aggregated Source Repository | DSpace |