UBC Theses and Dissertations


Automated tuning and analysis of non-reversible parallel tempering

Surjanovic, Nikola

Abstract

Non-reversible parallel tempering (NRPT) is an effective algorithm for sampling from distributions with complex geometry, such as those arising from posterior distributions of weakly identifiable and high-dimensional Bayesian models or Gibbs distributions in statistical mechanics. In this work we introduce methods for the automated tuning of NRPT and establish convergence results that explain its observed empirical success. A central feature of all methods that we consider is that they can be fully automated and are robust, enabling them to be used in software with minimal hassle for the user. Furthermore, the methods are all parallelizable and scale well with modern computational resources.

We begin with a study of how to bridge NRPT and variational inference in order to obtain more effective samplers. To do so, we introduce a generalized annealing path connecting the posterior to an adaptively tuned variational reference, where the reference is tuned to minimize the forward (inclusive) KL divergence to the posterior. Our experiments demonstrate large empirical gains on a wide range of realistic Bayesian inference scenarios. The methods are implemented in open-source software and have been applied to address challenging open problems in astrophysics.

Next, in order to reliably implement methods such as variational NRPT, it is necessary to have robust optimizers. We introduce AutoGD: a gradient descent method that automatically adapts the learning rate at each iteration. Our theory establishes the convergence of AutoGD, recovering the optimal rate of standard gradient descent with a fixed learning rate (up to a constant) for a broad class of functions. Empirical results suggest strong performance of the method, along with its extensions to AutoBFGS and AutoLBFGS, on a variety of traditional problems and variational inference optimization tasks.
Finally, to shed light on the empirical success of NRPT, we establish its uniform (geometric) ergodicity under a model of efficient local exploration. We obtain analogous ergodicity results for classical reversible parallel tempering, providing new evidence that NRPT dominates its reversible counterpart. The rates that we obtain reflect real experiments well for distributions where global exploration is not possible without parallel tempering, and we link the theoretical results to useful guidelines for MCMC practitioners.


Rights

Attribution-NonCommercial-NoDerivatives 4.0 International