UBC Theses and Dissertations

Inference under finite mixture models: distributed learning and approximate inference

Zhang, Qiong

Abstract

Finite mixture models are widely used to model data that exhibit heterogeneity. In machine learning, they are often used as probabilistic models for cluster analysis. In applications of finite mixtures to real datasets, the most fundamental task is to learn the model parameters. In modern applications, datasets are often too large to be stored in a single facility and are instead distributed across data centres. To learn models on such distributed datasets, split-and-conquer (SC) approaches are often used. SC approaches consist of two steps: they first learn one model locally per data centre and then send these local results to a central machine to be aggregated. Since the parameter spaces of mixtures are non-Euclidean, existing aggregation methods are not appropriate. We develop a novel, computationally efficient aggregation approach for SC learning of finite mixtures and show that the resulting estimator is root-n consistent under general conditions. Experiments show that the proposed approach has statistical performance comparable to the global estimator based on the full dataset, when the latter is feasible, and better statistical and computational performance than some existing methods for learning mixtures on large datasets.

Finite mixtures are also widely used to approximate density functions of various shapes. When mixtures are used in graphical models to approximate density functions, the order of the mixture grows exponentially under the recursive procedures and inference becomes intractable. One way to make inference tractable is mixture reduction, that is, approximating the mixture by one of lower order. We propose a novel reduction approach that minimizes the Composite Transportation Divergence (CTD) between the two mixtures; the resulting optimization problem can be solved by a majorization-minimization (MM) algorithm. We show that many existing reduction algorithms are special cases of our approach, which provides theoretical support for them. Our approach also permits flexible choices of the cost function in the CTD, and this flexibility leads to better performance than existing approaches. We also discuss other related learning issues under finite mixtures.
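To make the split-and-conquer workflow concrete, the sketch below shows the local-fit and pooling steps for Gaussian mixtures. It is a minimal illustration, assuming scikit-learn's GaussianMixture for the local fits; the helper names and the simple sample-size-weighted pooling are illustrative conventions, not the exact aggregation estimator developed in the thesis.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_local_mixtures(data_blocks, order, seed=0):
    """Fit one Gaussian mixture per data block (one block per 'data centre')."""
    return [
        GaussianMixture(n_components=order, random_state=seed).fit(block)
        for block in data_blocks
    ]


def pool_local_mixtures(local_fits, block_sizes):
    """Pool all local components into a single over-sized mixture.

    Local mixing weights are rescaled by relative block sizes, so the pooled
    mixture is the sample-size-weighted combination of the local fits.
    """
    total = float(sum(block_sizes))
    weights = np.concatenate(
        [fit.weights_ * (n / total) for fit, n in zip(local_fits, block_sizes)]
    )
    means = np.concatenate([fit.means_ for fit in local_fits])
    covs = np.concatenate([fit.covariances_ for fit in local_fits])
    return weights, means, covs
```

With `block_sizes = [len(b) for b in data_blocks]`, the pooled mixture has order (number of machines) x (local order); reducing it back to the target order on the central machine is the aggregation step, sketched next.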
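The reduction step can be written as a clustering-style alternation. The sketch below is one illustrative instance, using the KL divergence between Gaussians as the component-level cost, for which the group update is moment matching; the function names and initialization are hypothetical, and the thesis's CTD-based framework admits other cost functions and update rules.

```python
import numpy as np


def kl_gauss(m1, S1, m2, S2):
    """KL divergence between Gaussians N(m1, S1) and N(m2, S2)."""
    d = m1.shape[0]
    S2_inv = np.linalg.inv(S2)
    diff = m2 - m1
    return 0.5 * (
        np.trace(S2_inv @ S1)
        + diff @ S2_inv @ diff
        - d
        + np.log(np.linalg.det(S2) / np.linalg.det(S1))
    )


def reduce_mixture(weights, means, covs, order, n_iter=50, seed=0):
    """Reduce a Gaussian mixture to `order` components.

    Alternates two steps:
      * assignment: send each original component to the closest reduced
        component under the KL cost;
      * update: replace each reduced component by the moment-matched
        Gaussian of its assigned components.
    """
    rng = np.random.default_rng(seed)
    n = len(weights)
    idx = rng.choice(n, size=order, replace=False)
    r_means, r_covs = means[idx].copy(), covs[idx].copy()
    r_weights = np.zeros(order)
    for _ in range(n_iter):
        # assignment step: cost matrix between original and reduced components
        cost = np.array(
            [[kl_gauss(means[i], covs[i], r_means[j], r_covs[j])
              for j in range(order)] for i in range(n)]
        )
        assign = cost.argmin(axis=1)
        # update step: moment matching within each assigned group
        for j in range(order):
            members = np.where(assign == j)[0]
            if members.size == 0:
                continue
            w = weights[members]
            r_weights[j] = w.sum()
            wn = w / w.sum()
            m = (wn[:, None] * means[members]).sum(axis=0)
            S = np.zeros_like(covs[0])
            for k, i in enumerate(members):
                diff = (means[i] - m)[:, None]
                S += wn[k] * (covs[i] + diff @ diff.T)
            r_means[j], r_covs[j] = m, S
    return r_weights, r_means, r_covs
```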

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International