UBC Theses and Dissertations
Hidden Markov models: multiple processes and model selection
MacKay, Rachel J.
This thesis considers two broad topics in the theory and application of hidden Markov models (HMMs): modelling multiple time series and model selection. Of particular interest is the application of these ideas to data collected on multiple sclerosis patients. Our results are, however, directly applicable to many different contexts in which HMMs are used.
One model selection issue that we address is the problem of estimating the number of hidden states in an HMM. We exploit the relationship between finite mixture models and HMMs to develop a method of consistently estimating the number of hidden states in a stationary HMM. This method involves the minimization of a penalized distance function. Another such issue that we discuss is that of assessing the goodness-of-fit of a stationary HMM. We suggest a graphical technique that compares the empirical and estimated distribution functions, and show that, if the model is misspecified, the proposed plots will signal this lack of fit with high probability when the sample size is large. A unique feature of our technique is the plotting of both the univariate and multivariate distribution functions.
HMMs for multiple processes have not been widely studied. In this context, random effects may be a natural choice for capturing differences among processes. Building on the framework of generalized linear mixed models, we develop the theory required for implementing and interpreting HMMs with random effects and covariates. We consider the case where the random effects appear only in the conditional model for the observed data, as well as the more difficult setting where the random effects appear in the model for the hidden process. We discuss two methods of parameter estimation: direct maximum likelihood estimation and the EM algorithm. Finally, to determine whether the additional complexity introduced by the random effects is warranted, we develop a procedure for testing the significance of their variance components.
We conclude with a discussion of future work, with special attention to the problem of the design and analysis of multiple sclerosis clinical trials.
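As a rough illustration of the goodness-of-fit idea described above, the following sketch compares the empirical distribution function of a series with the univariate distribution function implied by a stationary HMM. It is not the procedure developed in the thesis: the Poisson state-dependent distributions, the transition matrix `P`, the means `lams`, the toy `data`, and all function names are assumptions made purely for illustration, and only the univariate comparison is shown.

```python
import math

def stationary_distribution(P, iters=1000):
    """Approximate the stationary distribution of a transition matrix P
    (list of rows) by repeated multiplication (power iteration)."""
    k = len(P)
    pi = [1.0 / k] * k
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]
    return pi

def poisson_cdf(x, lam):
    """P(X <= x) for X ~ Poisson(lam)."""
    if x < 0:
        return 0.0
    return sum(math.exp(-lam) * lam ** n / math.factorial(n)
               for n in range(int(x) + 1))

def model_cdf(x, pi, lams):
    """Marginal CDF of a stationary Poisson HMM: a finite mixture of the
    state-dependent CDFs weighted by the stationary probabilities."""
    return sum(p * poisson_cdf(x, lam) for p, lam in zip(pi, lams))

def empirical_cdf(x, data):
    """Proportion of observations less than or equal to x."""
    return sum(1 for y in data if y <= x) / len(data)

# Hypothetical two-state model and toy data (assumptions, not thesis values).
P = [[0.9, 0.1],
     [0.2, 0.8]]
lams = [1.0, 5.0]
data = [0, 1, 1, 2, 0, 4, 6, 5, 1, 0, 2, 7]

pi = stationary_distribution(P)

# Pairs (empirical CDF, model CDF) evaluated at each count value; plotted
# against each other, a well-fitting model should lie near the 45-degree line.
pairs = [(empirical_cdf(x, data), model_cdf(x, pi, lams))
         for x in range(max(data) + 1)]
```

In practice `P` and `lams` would be replaced by estimates obtained from the data, and the resulting pairs would be plotted rather than inspected numerically.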