UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Optimal transport approximation of measures via lower-dimensional structures : principal manifolds and the average distance problem Kobayashi, Forest

Abstract

We study two variational problems encoding the idea of optimally approximating a probability measure $\rho$ via a lower-dimensional set $\Sigma$ of bounded complexity. In both problems we quantify the performance of $\Sigma$ in approximating $\rho$ via the average of the $p$\textsuperscript{th}-power of the distance to $\Sigma$, $\mathbb{E}_{\omega\sim\rho}[d^p(\omega,\Sigma)]$. This yields an optimal transport (OT) interpretation in terms of minimizing the \emph{Monge-Kantorovich $p$-cost} (also called the \emph{Wasserstein $p$-cost}) over measures $\nu$ supported on $\Sigma$. This perspective is essential in framing some of the applications of our work. However, for the theory, it turns out that we can eschew the language of OT in favor of more geometric perspectives. The two problems differ in the way we quantify the ``complexity'' of the approximating sets. In the first problem (Part I) we consider a class of constraints defined in terms of parametrizations $f$ of $\Sigma$. The particular class we choose is the set of $W^{k,q}(\mathbb{R}^m;\mathbb{R}^n)$ Sobolev norms, where $kq>m$ and $n>m$. We build up general theory with these assumptions plus a few technical hypotheses. The generality of the framework allows us to make connections to manifold learning problems with noisy data and certain generative machine learning problems with regularization. One of the key concepts we develop along the way is the ``gradient'' of $\mathbb{E}_{\omega\sim\rho}[d^p(\omega,\Sigma)]$; it has the form of a vector field that we call the \emph{barycenter field}. We use this to develop nontrivial numerical methods for simulating the problem. The barycenter field does not rely on the Sobolev structure, so we can use it in studying the second problem as well. In the second problem (Part II) we restrict ourselves to $(m=1)$-dimensional approximations $\Sigma$ and use a fully parametrization-independent constraint, the Hausdorff 1-measure $\mathcal{H}^1(\Sigma)$. This yields optimizers with richer topological structure than those of the Sobolev problem. Via analysis of the barycenter field, we are able to mostly resolve an open problem regarding the topological characterization of optimizers, proving that for general target dimension $n\geq 2$ and objective exponent $p\in\{2\}\cup(\frac{3+\sqrt{5}}{2},\infty)$ optimizers are finite binary trees. We expect the result also holds for $p\in(2,\frac{3+\sqrt{5}}{2}]$ but were unable to show it with our methods.

Item Citations and Data

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International