UBC Theses and Dissertations

Modelling complex biologging data with hidden Markov models
Sidrow, Evan

Abstract

Hidden Markov models (HMMs) are commonly used to identify latent processes from observed time series, but it is challenging to fit them to the large and complex time series collected by modern sensors. Using data from threatened resident killer whales (Orcinus orca) as a case study, this dissertation provides solutions to three common challenges in identifying latent behaviour from biologging data. First, time series are now collected at high frequencies and exhibit fine-scale dependence structures that violate common assumptions of HMMs. I propose a hierarchical approach that divides time series into sequences of curves. A coarse-scale HMM models the sequence of curves while a related fine-scale HMM describes the structure within each curve. The fine-scale model includes autoregression to capture dependence, in addition to moving-window summaries and Fourier analysis to capture intricate fine-scale structures. I use simulation and case studies to show that this framework produces more interpretable state estimates and more accurate parameter estimates than existing methods. Second, many modern biologging time series include labels of the latent process of interest. Labels can improve predictive accuracy, but sparse labels can have a negligible influence on parameter estimates. I introduce a weighted likelihood approach that increases the relative influence of labeled observations. I use this approach to develop two HMMs that model the foraging behaviour of killer whales at different scales. Using cross-validated evaluation metrics, I show that my weighted method produces more accurate and more interpretable models than unweighted methods. Finally, applying HMMs to very large time series is computationally demanding, as expectation-maximization (EM) algorithms typically iterate through the entire data set for every parameter update. I propose a novel optimization algorithm that combines a partial E step with variance-reduced stochastic optimization within the M step. I prove that the algorithm converges under standard regularity conditions and test it empirically using simulations and case studies; it converges in fewer epochs, with less computation time, and to regions of higher likelihood than standard numerical optimization techniques. In total, this dissertation introduces novel methods to develop and train HMMs for complex biologging time series collected by modern sensors.
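
The hierarchical structure described in the first contribution can be illustrated with nested forward recursions: a coarse-scale HMM treats each whole curve as a single "observation", and the emission probability of that curve under a given coarse state is the likelihood of a fine-scale HMM attached to that state. The sketch below is a minimal illustration under the assumption of Gaussian AR(1) state-dependent densities at the fine scale; all names (fine_log_lik, coarse_tpm, fine_params, ...) are hypothetical and this is not the dissertation's code.

```python
# Minimal sketch: nested forward algorithms for a hierarchical HMM in which each
# coarse state carries its own fine-scale HMM with Gaussian AR(1) emissions.
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def fine_log_lik(curve, tpm, delta, phi, mu, sigma):
    """Log-likelihood of one curve under a fine-scale HMM whose state-k density is
    Gaussian AR(1): y_t | state k ~ N(mu[k] + phi[k] * y_{t-1}, sigma[k]^2)."""
    T, K = len(curve), len(delta)
    logp = np.zeros((T, K))
    logp[0] = norm.logpdf(curve[0], mu, sigma)        # no lagged term for the first point
    for t in range(1, T):
        logp[t] = norm.logpdf(curve[t], mu + phi * curve[t - 1], sigma)
    log_alpha = np.log(delta) + logp[0]               # forward recursion in log space
    for t in range(1, T):
        log_alpha = logsumexp(log_alpha[:, None] + np.log(tpm), axis=0) + logp[t]
    return logsumexp(log_alpha)

def coarse_log_lik(curves, coarse_tpm, coarse_delta, fine_params):
    """Coarse-scale forward recursion: the 'emission' log-density of curve j under
    coarse state i is the fine-scale HMM log-likelihood with that state's parameters."""
    N = len(coarse_delta)
    log_b = np.array([[fine_log_lik(c, **fine_params[i]) for i in range(N)]
                      for c in curves])
    log_alpha = np.log(coarse_delta) + log_b[0]
    for j in range(1, len(curves)):
        log_alpha = logsumexp(log_alpha[:, None] + np.log(coarse_tpm), axis=0) + log_b[j]
    return logsumexp(log_alpha)
```

Moving-window summaries or Fourier coefficients of each curve could enter as additional observation streams with their own state-dependent densities; the nesting itself is what keeps the coarse-scale emission of an entire curve tractable.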
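
One generic way to realize the weighted-likelihood idea in the second contribution is to raise the likelihood contribution of labeled time points to a power w >= 1 inside the forward recursion, while also restricting the state at those points to the labeled value. The sketch below, for a simple Gaussian HMM, illustrates that device under these assumptions; it is not necessarily the dissertation's exact formulation, and names such as weighted_log_lik and the encoding labels[t] = -1 for "unlabeled" are hypothetical.

```python
# Minimal sketch: weighted forward recursion for a Gaussian HMM with sparse labels.
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def weighted_log_lik(y, labels, w, tpm, delta, mu, sigma):
    """labels[t] is a known state index or -1 if unlabeled; labeled time points both
    restrict the state and have their emission log-density multiplied by w."""
    K = len(delta)
    logp = norm.logpdf(y[:, None], mu, sigma)          # (T, K) emission log-densities

    def restrict(log_alpha, t):
        """Zero out (in log space) every state other than the labeled one."""
        if labels[t] >= 0:
            mask = np.full(K, -np.inf)
            mask[labels[t]] = 0.0
            return log_alpha + mask
        return log_alpha

    w0 = w if labels[0] >= 0 else 1.0
    log_alpha = restrict(np.log(delta) + w0 * logp[0], 0)
    for t in range(1, len(y)):
        wt = w if labels[t] >= 0 else 1.0
        log_alpha = logsumexp(log_alpha[:, None] + np.log(tpm), axis=0) + wt * logp[t]
        log_alpha = restrict(log_alpha, t)
    return logsumexp(log_alpha)
```

Setting w = 1 recovers an ordinary partially labeled HMM likelihood; increasing w inflates the relative influence of the sparse labels on the parameter estimates, which is the effect the abstract describes.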
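
The third contribution pairs a partial E step with a variance-reduced (SVRG-style) stochastic update in the M step. To keep the sketch short and runnable it is written for a toy two-component Gaussian mixture rather than a full HMM, but the control flow mirrors the general pattern: a full E step and full-data gradient at a periodic snapshot, then minibatch E steps and variance-reduced gradient steps in between. Step sizes, batch sizes, and all names are hypothetical; this is not the dissertation's algorithm.

```python
# Minimal sketch of the variance-reduced stochastic EM pattern on a toy mixture model.
import numpy as np

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)])  # toy observations
mu = np.array([-1.0, 1.0])      # component means to be estimated
pi = np.array([0.5, 0.5])       # mixing weights held fixed; unit variances assumed

def responsibilities(yb, mu):
    """Posterior component probabilities (the E-step quantities) for a batch yb."""
    logp = -0.5 * (yb[:, None] - mu) ** 2 + np.log(pi)
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    return p / p.sum(axis=1, keepdims=True)

def grad_negQ(yb, rb, mu):
    """Gradient (w.r.t. mu) of the negative expected complete-data log-likelihood,
    averaged over the batch, with responsibilities rb held fixed."""
    return -(rb * (yb[:, None] - mu)).mean(axis=0)

step, batch = 0.05, 32
for epoch in range(20):
    mu_snap = mu.copy()
    resp = responsibilities(y, mu_snap)          # full E step at the snapshot
    full_grad = grad_negQ(y, resp, mu_snap)      # full-data gradient at the snapshot
    for _ in range(len(y) // batch):
        idx = rng.choice(len(y), batch, replace=False)
        rb = responsibilities(y[idx], mu)        # partial E step: refresh only this batch
        g = (grad_negQ(y[idx], rb, mu)
             - grad_negQ(y[idx], resp[idx], mu_snap)
             + full_grad)                        # SVRG-style variance-reduced gradient
        mu = mu - step * g
print(mu)  # should move toward the data-generating means (-2, 3)
```

For an HMM, the minibatches would typically be subsequences of the time series and the responsibilities would come from local forward-backward passes, which is where the per-iteration savings over a full EM sweep would come from.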

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International