UBC Theses and Dissertations

Computationally efficient geometric methods for optimization and inference in machine learning
Lin, Wu

Abstract

Optimization and inference are essential ingredients of machine learning (ML). Many optimization and inference problems can be formulated from a probabilistic perspective to exploit the Fisher-Rao geometric structure of a probability family, and methods that leverage this structure can improve their performance. A classic approach to exploiting the Fisher-Rao structure is natural-gradient descent (NGD). However, implementing NGD remains computationally intensive. When a parameter space is constrained, it is also non-trivial to perform NGD while respecting the constraint, and it is more challenging still to perform NGD in structured parameter spaces. This work pushes the boundary of the theory and practice of NGD. We first establish a connection among NGD, mirror descent (MD), and Riemannian gradient descent (RGD). Thanks to this connection, we consider various approaches to better exploit the underlying geometric and algebraic structures of NGD, resulting in computationally efficient and practically useful methods for inference and optimization problems in ML. In the first part of this work, we reduce the computational cost of NGD for exponential-family approximations in variational inference. To do so, we exploit a convex duality in MD by viewing NGD as an MD update. In the second part, we extend the scope of NGD and develop efficient methods by exploiting a similar duality for mixtures of exponential-family approximations; mixtures are more expressive than a single exponential-family distribution for modeling complex distributions. We introduce a new Fisher-Rao metric for mixtures, since the original Fisher-Rao metric becomes singular in the mixture case. The third part of this work addresses the constraint issue in NGD by proposing a new, efficient update scheme inspired by RGD. We focus on performing NGD updates in positive-definite constrained parameter spaces, as these spaces arise in exponential-family distributions and their mixtures. Finally, we enable practical use of NGD methods in structured and constrained parameter spaces by using tools from Riemannian geometry: we propose a structure-preserving NGD update scheme based on matrix Lie groups and moving coordinates. Thanks to this approach, we also develop novel structured second-order methods for unconstrained optimization and adaptive gradient-based methods for deep learning.
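
As a rough illustration of the NGD-MD duality the abstract refers to (a standard identity, sketched here in our own notation rather than the thesis's): for an exponential family $q_\lambda(z) = h(z)\exp(\langle \lambda, T(z)\rangle - A(\lambda))$ with log-partition function $A$, the Fisher information is the Hessian of $A$, so an NGD step in the natural parameter $\lambda$,

$$\lambda_{t+1} = \lambda_t - \beta\, F(\lambda_t)^{-1} \nabla_{\lambda} \mathcal{L}(\lambda_t), \qquad F(\lambda) = \nabla^2 A(\lambda),$$

coincides with a plain gradient step taken with respect to the dual mean parameter $m = \nabla A(\lambda) = \mathbb{E}_{q_\lambda}[T(z)]$:

$$\lambda_{t+1} = \lambda_t - \beta\, \nabla_{m} \mathcal{L}.$$

This convex duality is what lets an NGD update be carried out without forming or inverting the Fisher matrix explicitly.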
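
The positive-definite constraint addressed in the third part can be appreciated with a small generic example (ours; the thesis proposes its own scheme): a plain additive update $S_{t+1} = S_t - \beta\, G$ on a covariance or precision matrix $S$ can leave the positive-definite cone when the step is large, whereas a retraction-style update such as

$$S_{t+1} = S_t^{1/2} \exp\!\big(-\beta\, S_t^{-1/2}\, G\, S_t^{-1/2}\big)\, S_t^{1/2}$$

remains positive-definite for any symmetric $G$ and any step size $\beta > 0$, at the cost of matrix square roots and exponentials that efficient update schemes aim to avoid.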


Rights

Attribution-NonCommercial-NoDerivatives 4.0 International