UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

The neutral-to-the-left mixture model La, Sean

Abstract

A useful step in data analysis is clustering, in which observations are grouped together in a hopefully meaningful way. The mainstay model for Bayesian nonparametric clustering is the Dirichlet process mixture model, which has one key advantage of inferring the number of clusters automatically. However, the Dirichlet process mixture model has particular characteristics, such as linear growth in the size of clusters and exchangeability, that may not be suitable modelling choices for some data sets, so there is further research to be done into other Bayesian nonparametric models with characteristics that differ from that of the Dirichlet process mixture model while maintaining automatic inference of the number of clusters. In this thesis, we introduce the Neutral-to-the-Left mixture model, a family of Bayesian nonparametric infinite mixture models which serves as a strict generalization of the Dirichlet process mixture model. This family of mixture models has two key parameters: the distribution of arrival times of new clusters, and the parameters of the stick breaking distribution, whose customization allows the user to inject prior beliefs regarding the structure of the clusters into the model. We describe collapsed Gibbs and Metropolis–Hastings samplers to infer the posterior distribution of clusterings given data. We consider one particular parameterization of the Neutral-to-the-Left mixture model with characteristics that are distinct from that of the Dirichlet process mixture model, evaluate its performance on simulated data, and compare these to results from a Dirichlet process mixture model. Finally, we explore the utility of the Neutral-to-the-Left mixture model on real data by applying the model to cluster tweets.

Item Media

Item Citations and Data

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International