Near-optimal Herding

UBC Theses and Dissertations

Featured Collection

UBC Theses and Dissertations

Near-optimal Herding Samadi, Samira

Abstract

Herding is an algorithm of recent interest in the machine learning community, motivated by inference in Markov random fields. It solves the following Sampling Problem: given a set Χ \subset R^d with mean μ, construct an infinite sequence of points from Χ such that, for every t ≥ 1, the mean of the first t points in that sequence lies within Euclidean distance O(1/t) of μ. The error of a solution to Sampling Problem is defined to be the distance between the empirical mean of the first t samples and the original mean μ. The O(1/t) error bound suppresses the dependence on d and Χ. In this thesis, we study the best dependence on d and |Χ| that can be achieved for the error in Sampling Problem. Known analysis of the Herding algorithm give an error bound that depends on geometric properties of Χ but, even under favorable conditions, this bound depends linearly on d. We first show that any algorithm for the Sampling Problem must have error Ω(√d/t). Afterward, we present a new polynomial-time algorithm that solves the Sampling Problem with error O(√d log^2.5|Χ|/t) assuming that Χ is finite. This implies that our algorithm is optimal to within logarithmic factors. Finally, we prove that the actual error of the Herding Algorithm is strictly worse than the error of our algorithm if we measure the error in the infinity-norm. Our algorithm is randomized and based on recent algorithmic results in discrepancy theory. We implement our algorithm and other potential solutions for the Sampling Problem and evaluate them on various inputs.

Item Metadata

Title	Near-optimal Herding
Creator	Samadi, Samira
Publisher	University of British Columbia
Date Issued	2014
Description	Herding is an algorithm of recent interest in the machine learning community, motivated by inference in Markov random fields. It solves the following Sampling Problem: given a set Χ \subset R^d with mean μ, construct an infinite sequence of points from Χ such that, for every t ≥ 1, the mean of the first t points in that sequence lies within Euclidean distance O(1/t) of μ. The error of a solution to Sampling Problem is defined to be the distance between the empirical mean of the first t samples and the original mean μ. The O(1/t) error bound suppresses the dependence on d and Χ. In this thesis, we study the best dependence on d and \|Χ\| that can be achieved for the error in Sampling Problem. Known analysis of the Herding algorithm give an error bound that depends on geometric properties of Χ but, even under favorable conditions, this bound depends linearly on d. We first show that any algorithm for the Sampling Problem must have error Ω(√d/t). Afterward, we present a new polynomial-time algorithm that solves the Sampling Problem with error O(√d log^2.5\|Χ\|/t) assuming that Χ is finite. This implies that our algorithm is optimal to within logarithmic factors. Finally, we prove that the actual error of the Herding Algorithm is strictly worse than the error of our algorithm if we measure the error in the infinity-norm. Our algorithm is randomized and based on recent algorithmic results in discrepancy theory. We implement our algorithm and other potential solutions for the Sampling Problem and evaluate them on various inputs.
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2014-08-26
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivs 2.5 Canada
DOI	10.14288/1.0166033
URI	http://hdl.handle.net/2429/50167
Degree (Theses)	Master of Science - MSc
Program (Theses)	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2014-09
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/2.5/ca/
Aggregated Source Repository	DSpace

Open Collections

UBC Theses and Dissertations

UBC Theses and Dissertations

Near-optimal Herding Samadi, Samira

Abstract

Item Metadata

Item Media

Item Citations and Data

Rights