UBC Theses and Dissertations

On the structure of optimal martingale transport in higher dimensions. Lim, Tongseok, 2016.


Full Text

On the structure of optimal martingale transport in higher dimensions

by

Tongseok Lim

A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in The Faculty of Graduate and Postdoctoral Studies (Mathematics), The University of British Columbia (Vancouver), April 2016.

© Tongseok Lim, 2016

Abstract

In the first part of this thesis, we study the structure of solutions to the optimal martingale transport problem when the marginals lie in higher-dimensional Euclidean spaces (R^d, d ≥ 2). The problem has been extensively studied in one-dimensional space (R), but few results have been shown in higher dimensions. In this thesis, we propose two conjectures and provide key ideas that lead to solutions in important cases.

In the second part, we study the structure of solutions to the optimal subharmonic martingale transport problem, again when the marginals lie in higher-dimensional Euclidean spaces. First, we show that this problem has an equivalent formulation in terms of the celebrated Skorokhod embedding problem in probability theory. We then describe the fine structure of the solution provided the marginals are radially symmetric. The general case remains unsolved, and its potential solution calls for a deeper understanding of harmonic analysis and Brownian motion in higher-dimensional spaces.

Preface

Chapter 2 of this thesis is based on the paper [21], prepared in collaboration with Nassif Ghoussoub and Young-Heon Kim, as well as my paper [31]. Chapter 3 is based on the paper [20], written in collaboration with Nassif Ghoussoub and Young-Heon Kim. Some text from these papers has been used verbatim in this thesis.

Papers submitted for publication:

[20] N. Ghoussoub, Y-H. Kim, and T. Lim. Optimal Skorokhod embedding between radially symmetric marginals in general dimensions. Preprint.

[21] N. Ghoussoub, Y-H. Kim, and T. Lim. Structure of optimal martingale transport in general dimensions. http://arxiv.org/abs/1508.01806, 2015. pp. 1-27.

[31] T. Lim.
Optimal martingale transport between radially symmetric marginals in general dimensions. Preprint.

Table of Contents

Abstract
Preface
Table of Contents
Acknowledgements
Dedication
1 Introduction
  1.1 Optimal transport problem
  1.2 Martingale transport problem
    1.2.1 Summary of our results
  1.3 Subharmonic martingale transport problem, a.k.a. Skorokhod embedding problem
2 Structure of optimal martingale transport in higher dimensions
  2.1 Introduction
  2.2 Main results
  2.3 Profile of an optimal martingale transport when the dual is attained
    2.3.1 Duality for optimal martingale transports
    2.3.2 Regularity of the dual functions
    2.3.3 Structure of optimal martingale supporting sets when the dual is attained
  2.4 A canonical decomposition of martingale transports
    2.4.1 Irreducible convex pavings associated to martingale supporting sets
    2.4.2 Duality attainment on each irreducible component
    2.4.3 Proof of Theorem 2.2.11
    2.4.4 The disintegration of a martingale transport plan
  2.5 Structural results for general optimal martingale transport plans
    2.5.1 Nikodym sets and martingale transports
    2.5.2 The case where the convex paving has at most 1-codimensional components
    2.5.3 The case of a discrete target
  2.6 Minimization problem under radially symmetric marginals
    2.6.1 Monotonicity principle and stability of µ ∧ ν under every minimizer
    2.6.2 Structure of optimal martingale transport in one dimension
    2.6.3 Structure of optimal martingale transport in higher dimensions
3 Structure of optimal subharmonic martingale transport in higher dimensions
  3.1 Formulation of the problem
    3.1.1 Skorokhod embedding problem
    3.1.2 Subharmonic martingale transport problem
  3.2 Symmetrization of transport plans
  3.3 Randomized stopping times
  3.4 Monotonicity principle and stability of µ ∧ ν
  3.5 Variational lemma and its consequences
    3.5.1 Heuristic observations
    3.5.2 Variational lemma and the main theorem
4 Conclusion
  4.1 Contributions
  4.2 Directions for future work
Bibliography
Appendix A: A proof of Proposition 3.1.3
Appendix B: Continuity of the functions in Theorem 3.5.2

Acknowledgements

I thank my supervisors, Professors Nassif Ghoussoub and Young-Heon Kim, from the UBC Department of Mathematics, for their valuable guidance and support throughout my doctoral study. They have oriented me toward a beautiful and deep area of research, and they have patiently led me and moulded me into becoming a professional mathematician.

I thank Professor Ali Lazrak from the Sauder School of Business for his heartfelt advice and encouragement.

I thank Professor Edwin A. Perkins for his invaluable teaching of probability theory and for generously giving me so much of his time and his ideas.

I thank my mother, Jae-Jeong Lee, for her loving support throughout my studies in Korea and in Canada.

I thank my father, now in heaven, Eui-Taek Lim. He eagerly wanted me to pursue a higher-level education, and for that he supported me with everything he could give. I dedicate this Ph.D. thesis to his memory.

I would also like to thank my uncle Jaekyun Lee and my aunt Soonok Lee for their kind hospitality during my stay in Toronto. Chapter 3 could not have been written without their support.

Last but foremost, I thank my God, who has led my way and will lead me now and forever.

Dedication

To my father, who watches over me from heaven, and my mother.

Chapter 1: Introduction

1.1 Optimal transport problem

The research in my thesis has focused on the structure of probability measures which solve certain optimization problems. The prototype is the optimal mass transport problem: for a given cost function c : R^d × R^d → R and two Borel probability measures µ, ν on R^d, we consider

(1.1.1)  Maximize/Minimize  cost[π] = ∫_{R^d × R^d} c(x, y) dπ(x, y)

over all π ∈ T(µ, ν), where T(µ, ν) is the set of mass transport plans, i.e., the set of probability measures π on R^d × R^d with marginals µ and ν on R^d.
We interpret the transport plan π as follows: for A, B ⊆ R^d, π(A × B) is the amount of mass transported by the plan π from the source domain A to the target range B. An equivalent probabilistic formulation is the following:

(1.1.2)  Maximize/Minimize  E_P c(X, Y)

over all pairs of random variables (X, Y) : Ω → R^d × R^d with given laws X ∼ µ and Y ∼ ν.

In 1781, Gaspard Monge [30] formulated the following question, which was relevant to his work in engineering: given two sets U, V in R^d of equal volume, find the optimal volume-preserving map between them, where optimality is measured against the cost function c(x, y) of transporting a particle at x to y. The optimal map should then minimize the total cost of redistributing the mass of U through V. Much later, Kantorovich, winner of a Nobel prize in 1975 for related work in economics, generalized the Monge problem and proposed the above formulation.

Many researchers capitalized on Kantorovich's pioneering work, including Brenier, Caffarelli, Evans, McCann, Gangbo, Trudinger, Wang, Ambrosio, Otto, and Villani, contributing to the development of the fundamental theory of optimal transport. Their contributions have had far-reaching consequences in many areas of mathematics, including differential geometry, isoperimetry, PDE, infinite-dimensional linear programming, functional analysis, economics, probability, and statistics.

In Monge's original problem, the cost was simply the Euclidean distance c(x, y) = |x − y|. Even for this seemingly simple case, it took two centuries before Sudakov, Evans, Gangbo, Ambrosio, Caffarelli, McCann, Feldman, Trudinger, Wang, and others showed rigorously that an optimal transport map exists. Many deep applications have followed since.

More recently, a new direction emerged in which the transport plans are assumed to be martingales. This is where our main research contribution lies.
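In the discrete case, the Kantorovich formulation (1.1.1) is a finite linear program over the marginal constraints. The following sketch (toy marginals of my own choosing, not from the thesis) computes a minimizing transport plan for c(x, y) = |x − y| with scipy:

```python
import numpy as np
from scipy.optimize import linprog

# Toy discrete marginals (illustrative only, not from the thesis).
xs = np.array([0.0, 1.0])            # support of mu
ys = np.array([0.0, 0.5, 1.5])       # support of nu
mu = np.array([0.5, 0.5])
nu = np.array([0.2, 0.5, 0.3])
C = np.abs(xs[:, None] - ys[None, :])  # cost matrix c(x, y) = |x - y|

m, n = C.shape
# Marginal constraints: row sums of pi equal mu, column sums equal nu.
A_eq = np.zeros((m + n, m * n))
for i in range(m):
    A_eq[i, i * n:(i + 1) * n] = 1.0
for j in range(n):
    A_eq[m + j, j::n] = 1.0
b_eq = np.concatenate([mu, nu])

res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
pi = res.x.reshape(m, n)             # an optimal (not martingale) plan
print(res.fun)                       # minimal transport cost
```

Note that nothing here enforces the martingale constraint discussed next; the linear program only fixes the two marginals.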
In the sequel, I shall describe the problem, its motivation, our contributions, and further directions of research.

1.2 Martingale transport problem

Consider the optimization problem (1.1.1) over MT(µ, ν), where MT(µ, ν) is the set of martingale transport plans, that is, the set of probability measures π on R^d × R^d with marginals µ and ν such that for each x ∈ R^d, its disintegration (π_x)_{x ∈ R^d} with respect to µ has its barycenter at x; in other words, for any convex function ϕ on R^d,

(1.2.1)  ϕ(x) ≤ ∫_{R^d} ϕ(y) dπ_x(y).

Equivalently, we study (1.1.2) over all martingales (X, Y) (i.e., E[Y | X] = X) with given laws X ∼ µ, Y ∼ ν. We recall that a disintegration is a family of indexed probability measures (π_x)_{x ∈ R^d}, which can be interpreted as the conditional probability dπ_x(y) = P(Y = y | X = x), in such a way that dπ(x, y) = dπ_x(y) dµ(x).

The problem originates in mathematical finance. Indeed, in the process of pricing financial options, the usual approach is to postulate a model and then calculate the price of an option according to that model. However, this assumption is not practical, since all we can observe in the market are the traded option prices, not any specific model. We are therefore led to ask what we could infer about the underlying model if we only know vanilla option prices. Mathematically speaking, we are interested in an underlying model (X, Y), where X is the (random) price of the asset at the first date of maturity and Y is the price at the second date of maturity. If we know the call and put prices at many strike prices for these dates of maturity, then we can infer the marginal distributions of the asset price (i.e., X ∼ µ and Y ∼ ν). Since there can be many candidate martingales (X, Y) having the same marginals, we can then look at the ones where the two values Min E_P c(X, Y) and Max E_P c(X, Y) are achieved. This gives the lower (resp., upper) bound on the price of the option c(X, Y).

It was D.
Hobson who first recognized the importance of Skorokhod embeddings (introduced below) in this "model-free" approach to finance and asset pricing. Since then, much related research has been done in this context, for example by Beiglböck, Cox, Dolinsky, Galichon, Henry-Labordère, Hobson, Huesmann, Soner, Touzi, and others [4], [6], [12], [17], [25], [26]. On the other hand, another long line of research on the general martingale transport problem has been carried out by Beiglböck, Henry-Labordère, Hobson, Juillet, Klimmek, Neuberger, Penkner, Touzi, and others [3], [4], [5], [27], [28].

Note that in one dimension the two problems are equivalent, i.e., one can always study martingales induced by stopped Brownian motion. This is not surprising, since in R^1 the class of convex functions coincides with the class of subharmonic functions (see Section 3.1 for more detail).

But the class of convex functions becomes a strict subset of the class of subharmonic functions in dimension d ≥ 2, and in this higher-dimensional case the Skorokhod embedding problem is equivalent to the subharmonic martingale transport problem, which we define and discuss in Chapter 3. We note that the papers cited above all work in one dimension. Mathematically, this means that the marginals µ, ν are probability measures on the real line, or that the random variables X, Y take values in R^1. Financially, this means that the option c(X, Y) depends on only one stock process. It is not hard to expect that the optimal martingale coupling problem (1.1.1) will be much more difficult when the option depends on arbitrarily many stock processes, simultaneously and nonlinearly. Nevertheless, the higher-dimensional problem is very important, not only mathematically but also financially, since many options in the real market indeed depend on a large number of assets, e.g., every stock in the S&P 500 index.

It is therefore important to consider the higher-dimensional case.
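The barycenter constraint (1.2.1) is easy to verify directly on a finite coupling. A toy sketch (the coupling below is my own example, not from the thesis), checking E[Y | X = x] = x for each atom of µ:

```python
import numpy as np

# Toy discrete martingale coupling: mu = (delta_{-1} + delta_1)/2, and each
# source point splits symmetrically, so nu lives on {-2, 0, 2}.
xs = np.array([-1.0, 1.0])
ys = np.array([-2.0, 0.0, 2.0])
pi = np.array([[0.25, 0.25, 0.0],    # pi[i, j] = P(X = xs[i], Y = ys[j])
               [0.0, 0.25, 0.25]])

mu = pi.sum(axis=1)                  # first marginal
for i, x in enumerate(xs):
    cond_mean = (pi[i] @ ys) / mu[i]   # barycenter of the fiber pi_x
    assert np.isclose(cond_mean, x)    # martingale constraint E[Y|X=x] = x
print("martingale constraint verified")
```

For continuous marginals one would test (1.2.1) against a family of convex functions instead; the linear barycenter identity above is the necessary first-moment case.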
In my first paper [31] in this direction, I showed that the optimal martingale transport problem has a unique solution when the marginals µ, ν are radially symmetric, a condition satisfied by important distributions such as Gaussians. Very recently, Prof. Nassif Ghoussoub, Prof. Young-Heon Kim, and I completed a paper [21] that deals with the general case in arbitrarily high dimensions and gives a satisfactory description of the optimal solutions. To our knowledge, these are the first such results to be established in arbitrarily high dimensions. We explain these in detail in Chapter 2.

1.2.1 Summary of our results

In Beiglböck-Juillet [5], Hobson-Neuberger [28], and Hobson-Klimmek [27], it is shown that in problem (1.1.1) with d = 1 and c(x, y) = |x − y|, whenever µ is dispersed (µ << L^1), an optimal martingale transport π is unique for any given ν and, more interestingly, exhibits an extremal property: for each x ∈ R, the disintegration (conditional probability) π_x is concentrated at the two boundary points of an interval.

Now the question becomes: what is the higher-dimensional extension of the above result? We conjecture the following:

Problem 1: Let c(x, y) = |x − y| and assume µ << L^d. If π ∈ MT(µ, ν) solves (1.1.1), then there is a set Γ with π(Γ) = 1 such that for µ-almost every x, the fiber Γ_x := {y | (x, y) ∈ Γ} is contained in the Choquet boundary (i.e., the set of extreme points) of the closed convex hull of Γ_x, i.e.,

Γ_x ⊆ Ext(conv(Γ_x)).

First, I verified that the conjecture holds true under a radial symmetry condition [31]:

Theorem 1.2.1. Assume that µ and ν are radially symmetric on R^d and µ is absolutely continuous with respect to Lebesgue measure (µ << L^d). Let c(x, y) = |x − y|^p for some 0 < p ≤ 1, and let π be a minimizer for problem (1.1.1). Then for µ-a.e. x, the disintegration π_x is concentrated on two points which lie on the one-dimensional subspace spanned by x.
Furthermore, π is unique.

In [31], ideas from the calculus of variations which nicely exploit the radial structure are presented.

However, if the marginals µ, ν do not exhibit any symmetry on R^d, such a two-point splitting result cannot be expected. For example, if µ is concentrated around the origin in R^2 and ν is concentrated around the vertices of a triangle, then obviously every π_x must split into 3 points. The variational calculus of [31] is not applicable here, as it relied on the radial symmetry of the marginals.

Instead, we turn to the dual formulation. In [3], it is shown that if the cost c is lower semi-continuous, then for the minimization problem the "no-duality-gap" result holds:

(1.2.2)  min { ∫_{R^d × R^d} c(x, y) dπ ; π ∈ MT(µ, ν) } = sup { ∫_{R^d} α dµ + ∫_{R^d} β dν ; (α, β) ∈ D_m(µ, ν, c) },

where D_m(µ, ν, c) is the set of all (α, β) ∈ L^1(µ) × L^1(ν) such that for some bounded γ : R^d → R^d,

(1.2.3)  α(x) + β(y) + γ(x) · (y − x) ≤ c(x, y)  for all (x, y) ∈ R^d × R^d.

We note that without the linear term γ(x) · (y − x), (1.2.3) would be the dual formulation of the classical optimal transport problem; this new term makes the problem completely different from the classical case.

By Prokhorov's compactness theorem, it can be shown that the "primal" problem (the minimization problem in (1.2.2)) attains a solution. Our first major contribution in Chapter 2 shows:

Theorem 1.2.2. If the "dual" problem (that is, the supremum problem in (1.2.2)) attains a solution (α, β, γ), then Problem 1 holds true.

Note that even when an optimal dual triplet (α, β, γ) is attained, it need not have any regularity, which makes the analysis of the dual problem difficult. This is reminiscent of PDE theory, where solutions may first be found in generalized spaces. In Chapter 2, the following regularity theorem is developed, one consequence of which is that the variational proof of Theorem 1.2.2 can be activated:

Theorem 1.2.3. Suppose x ↦ c(x, y) is locally Lipschitz (e.g.,
c(x, y) = |x − y|), and assume that the dual problem attains a solution (α, β, γ). Then one can improve their regularity and obtain an extended dual triple (α, β, γ) such that α is locally Lipschitz (thus differentiable a.e.), and a certain natural orthogonal projection of γ is also differentiable a.e.

Now the issue is that it is not clear whether the dual optimization problem attains a solution. In the classical transport problem the dual is essentially always attained, so one can study the dual side of the problem. Unfortunately, this is not the case in the martingale transport problem; an example is given in Chapter 2. The newly introduced linear term γ(x) · (y − x) causes trouble in the dual attainment problem, and it makes the martingale transport problem hard to deal with.

Since duality is not attained in general, we investigate the possibility of decomposing an optimal martingale transport into "irreducible components" on which duality for the "restricted" problem is attained. For this, we establish a canonical decomposition of a martingale transport set, and we prove that each component indeed admits a dual. This is another main accomplishment of this thesis:

Theorem 1.2.4 (Decomposition of martingale plans). Let π ∈ MT(µ, ν) and let Γ be a martingale-supporting set with π(Γ) = 1. Then one can define a natural decomposition map Ξ : Γ → K(R^d), where K(R^d) is the space of closed convex subsets of R^d, such that if π̃ = Ξ_# π denotes the push-forward of π to K(R^d) and I ⊆ K(R^d) is the image of Γ under Ξ, then the following holds:

(1) There exists a disintegration of π along the map Ξ such that

(1.2.4)  π(S) = ∫_I π_C(S) dπ̃(C)  for each Borel set S ⊆ R^d × R^d,

where for π̃-a.e. C ∈ I, π_C is a probability measure supported on Γ_C := Γ ∩ (C × C).

(2) If π ∈ MT(µ, ν) is optimal for problem (1.1.1), then for every C ∈ I, π_C is optimal in MT(µ_C, ν_C), where µ_C, ν_C are the marginals of π_C.
Furthermore, the dual optimal solution is attained for π_C.

(3) If µ_C is absolutely continuous with respect to Lebesgue measure on C, and c(x, y) = |x − y|, then for µ_C-almost all x, Γ_x ⊆ Ext(conv(Γ_x)).

As applications of the above theory, the following theorems are presented; note that the dual attainment assumption of Theorem 1.2.2 is no longer required.

Theorem 1.2.5. Let π be a solution of (1.1.1) with c(x, y) = |x − y|, and suppose µ << L^d. Then for µ-a.e. x, the conditional probability π_x is supported on a set of Hausdorff dimension at most d − 1.

Theorem 1.2.6. If every "irreducible component" in the decomposition theorem has dimension ≥ d − 1 (in particular, when d = 2 this holds automatically), then for µ-a.e. x, Γ_x ⊆ Ext(conv(Γ_x)).

Theorem 1.2.7. Let c(x, y) = |x − y| and suppose ν is discrete, i.e., ν is supported on a countable set. Then for µ-a.e. x, the conditional probability π_x is supported on d + 1 points which are the vertices of a polytope in R^d, and therefore the optimal solution is unique.

In the attempt to remove the dimensional assumption in Theorem 1.2.6, we encounter a bizarre object called a Nikodym set [2]. In Chapter 2, an optimal martingale on a Nikodym set is constructed such that if we apply the decomposition theorem (Theorem 2.2.14), the disintegrated source measure µ_C is singular even though µ << L^d, due to the bizarre structure of the Nikodym set. As µ_C is not absolutely continuous, our regularity theory does not apply, and thus we cannot deduce the extremal property from our theory. This suggests that the martingale transport problem in higher dimensions is indeed connected with some of the deepest problems in analysis, and it makes Problem 1 very intriguing.

Meanwhile, Theorems 1.2.7 and 1.2.1 suggest that, in certain cases, a finer structure of the optimal martingale might be obtained. Inspired by these results, we propose the following problem:

Problem 2 (Minimization): Assume that π is a minimizer for problem (1.1.1) with the distance cost. Then for µ-a.e.
x, the conditional probability π_x is supported on k + 1 points, where k = k(x) ≤ d, that form the vertices of a k-dimensional polytope; therefore, the minimizing solution is unique.

In this regard, we exhibit a simple counterexample showing that in the maximization problem, neither a polytope-type fine structure nor uniqueness of the optimal solution can be expected, even under radially symmetric marginals µ and ν. Therefore, one of my next research goals is to understand why the maximization and minimization problems can be fundamentally different, which is also a very interesting analytic/geometric aspect of the optimal martingale transport problem in higher dimensions.

Next, we introduce another problem, called the Skorokhod embedding problem, and its connection with the martingale transport problem.

1.3 Subharmonic martingale transport problem, a.k.a. Skorokhod embedding problem

Let B be a Brownian motion with initial law µ. Consider the set of stopping times

(1.3.1)  T = {τ : τ is a stopping time and B_τ ∼ ν}.

We say that τ ∈ T solves the Skorokhod embedding problem. Now, we consider

(1.3.2)  Maximize/Minimize  E_P c(B_0, B_τ)  over τ ∈ T.

The aim of the study is to characterize such optimal stopping times.

It turns out that this problem is equivalent to the transport problem (1.1.1) over SMT(µ, ν), where SMT(µ, ν), the set of subharmonic martingale transport plans, is the set of joint probabilities whose disintegration π_x lies in SH(x) for each x. The equivalence stems from the following proposition:

Proposition 1.3.1. Let SH(x) be the set of probability measures with subharmonic barycenter at x; i.e., every ψ ∈ SH(x) satisfies the constraint (1.2.1) for every subharmonic function ϕ. Then for any ψ ∈ SH(x), there exists a stopping time τ such that ψ = Law(B^x_τ), where B^x is a Brownian motion starting at x.

We note that, in order for the above proposition to hold, we need to consider the class of randomized stopping times (see [6]).
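Proposition 1.3.1 can be illustrated numerically in the simplest case: for x = 0 and ψ = (δ_{−1} + δ_{+1})/2, the first exit time of (−1, 1) embeds ψ, with E[τ] = 1. A crude Euler Monte Carlo sketch (step size, path count, and seed are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
dt, n_paths = 1e-3, 500

# Simulate Brownian paths from 0 until they exit (-1, 1); by symmetry
# B_tau is uniform on {-1, +1}, and E[tau] = 1 for this interval.
b = np.zeros(n_paths)
tau = np.zeros(n_paths)
alive = np.ones(n_paths, dtype=bool)
while alive.any():
    n_alive = int(alive.sum())
    b[alive] += np.sqrt(dt) * rng.standard_normal(n_alive)
    tau[alive] += dt
    alive &= np.abs(b) < 1.0   # freeze a path once it has exited

hits = np.sign(b)
print(hits.mean(), tau.mean())  # should be near 0 and 1, respectively
```

The time discretization introduces a small overshoot bias at the boundary; a finer dt (or an exact exit-time sampler) would reduce it. The embedded law here needs no randomization, consistent with the barrier-type structure of Theorem 1.3.2 below.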
By the proposition, we see that the optimal Skorokhod embedding problem and the subharmonic martingale transport problem are equivalent; it also shows the equivalence of the martingale transport problem and the Skorokhod embedding problem in one dimension.

While in the Skorokhod formulation we focus on the random movement of the Brownian motion and its stopping rule, in the mass transport formulation we focus on the distributions (π_x)_x satisfying the defining inequality (1.2.1). It turns out that the interplay between the two viewpoints is very enlightening, as can be seen in Chapter 3.

Since it was initiated by Skorokhod in 1961, the Skorokhod embedding problem has become one of the most important subjects in probability, and there is a large literature on the subject. Most works concern the one-dimensional problem, and one potential reason is that the class of subharmonic functions is much wider than that of convex functions in higher dimensions. We begin tackling the problem with the cost E_P |B_0 − B_τ| and radially symmetric marginals. However, it turns out that even in the radial case, the subtle constraint (1.2.1) with respect to the cone of subharmonic functions makes the problem hard to handle. In Chapter 3, we show that, by applying properties of Brownian motion and stopping times, a certain variational calculus is admissible. As a consequence, we show:

Theorem 1.3.2. Let c(x, y) = |x − y|, and let µ and ν be radially symmetric probability measures on R^d which are in subharmonic order. Then for µ-a.e. x, every optimal stopping time for the Skorokhod embedding problem is the first hitting time of a barrier set; in particular, every optimal stopping time is of natural (i.e.,
non-randomized) type; furthermore, the optimal time is unique.

The naturality (non-randomization property) of the optimal stopping time in the theorem can be seen as a Brenier-type result in Skorokhod embedding, and we see that it reflects the extremal property every optimal solution exhibits.

Chapter 2: Structure of optimal martingale transport in higher dimensions

2.1 Introduction

We study the profile of one-step martingales π on R^d × R^d that optimize the expected value of the modulus of their increment, among all martingales with two given marginals µ and ν in convex order. More precisely, we shall investigate the submanifolds Γ_x of R^d along which a particle x is propagated under such transport plans. These questions originate in mathematical finance and are variations on the original Monge problem, where one considers all couplings of the given marginals and not only those of martingale type [32], [19], [37, 38]. However, unlike solutions of the Monge problem, which are often supported on graphs (such as the well-known Brenier solution [8] for the cost given by the squared distance), the additional martingale constraint forces the transport to split the elements of the initial measure µ. One cannot therefore expect, except in trivial cases, that optimal martingale plans be supported on graphs.

These martingale transport problems are motivated by problems in mathematical finance, which call for no-arbitrage lower (or upper) bounds on the price of a forward starting straddle, given today's vanilla call prices at the two relevant maturities. Just like in the Monge-Kantorovich theory for optimal transport, these problems have dual counterparts, whose financial interpretation amounts to constructing the most (or least) expensive semi-static hedging strategy which sub-replicates the payoff of the forward starting straddle for any realization of the underlying forward price process.

The minimization and maximization problems are quite different, though by now well understood when the
marginals are probability measures on the real line, at least in the case of one-step martingales. We refer to Hobson-Neuberger [28], Hobson-Klimmek [27], and Beiglböck-Juillet [5]. For the multi-step case, see Beiglböck-Henry-Labordère-Penkner [3]. The dynamic case has also been studied by Galichon-Henry-Labordère-Touzi [17] and Dolinsky-Soner [12, 13]. The two cases studied are when the cost is c(x, y) = |x − y|, which is the main focus of this thesis, or when the cost satisfies the so-called Mirrlees condition. Note that the one-dimensional case is closely related to the Skorokhod embedding problem, since real-valued martingales can be realized as adequately stopped Brownian paths. See, for example, Hobson [26], Beiglböck-Cox-Huesmann [6], and Beiglböck-Henry-Labordère-Touzi [4].

Surprisingly, much less is known in the case where the marginals are supported on higher-dimensional Euclidean spaces R^d. In this chapter, we shall tackle the following general optimization problem associated to a cost function c : R^d × R^d → R:

(2.1.1)  Maximize/Minimize  cost[π] = ∫_{R^d × R^d} c(x, y) dπ(x, y)  over π ∈ MT(µ, ν),

where MT(µ, ν) is the set of martingale transport plans, that is, the set of probability measures π on R^d × R^d with marginals µ and ν such that for µ-almost every x ∈ R^d, the components of the disintegration (π_x)_x with respect to µ, i.e., dπ(x, y) = dπ_x(y) dµ(x), have their barycenter at x; in other words, for any convex function ϕ on R^d,

ϕ(x) ≤ ∫_{R^d} ϕ(y) dπ_x(y).

One can also use the probabilistic notation:

(2.1.2)  Maximize/Minimize  E_P c(X, Y)

over all martingales (X, Y) on a probability space (Ω, F, P) into R^d × R^d (i.e., E[Y | X] = X) with laws X ∼ µ and Y ∼ ν (i.e., P(X ∈ A) = µ(A) and P(Y ∈ A) = ν(A) for all Borel sets A in R^d).
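In one dimension, the convex-order condition that makes MT(µ, ν) nonempty (Strassen's theorem, recalled below) can be tested through "call" functions y ↦ (y − a)_+. A small numerical sketch on toy discrete marginals (my own data; a finite grid of course only screens a necessary condition):

```python
import numpy as np

# Toy 1-d marginals (illustrative): mu = delta_0, nu = (delta_{-1}+delta_1)/2.
xs, mu = np.array([0.0]), np.array([1.0])
ys, nu = np.array([-1.0, 1.0]), np.array([0.5, 0.5])

# mu <=_C nu iff the means agree and E_mu (X - a)_+ <= E_nu (Y - a)_+
# for every a; here a ranges over a finite test grid only.
assert np.isclose(xs @ mu, ys @ nu)
for a in np.linspace(-2.0, 2.0, 81):
    call_mu = np.maximum(xs - a, 0.0) @ mu
    call_nu = np.maximum(ys - a, 0.0) @ nu
    assert call_mu <= call_nu + 1e-12
print("convex order holds on the test grid")
```

Financially, the call values here are exactly the vanilla option prices from which the marginals µ, ν are inferred in Section 1.2.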
Note that, in this probabilistic formulation, the disintegration of π can be written as the conditional probability dπ_x(y) = P(Y = y | X = x).

A classical theorem of Strassen [36] states that the set MT(µ, ν) of martingale transports is nonempty if and only if the marginals µ and ν are in convex order, that is, if

1. µ and ν have finite mass and finite first moments, and
2. ∫_{R^d} ϕ dµ ≤ ∫_{R^d} ϕ dν for every convex function ϕ on R^d.

In that case we write µ ≤_C ν, which is sometimes called the Choquet order. Note that x is the barycenter of a measure ν if and only if δ_x ≤_C ν, where δ_x is the Dirac measure at x.

We will mostly consider the Euclidean distance cost c(x, y) = |x − y| unless stated otherwise, although some of the results below hold for more general costs. We shall use the term optimization in problem (2.1.1) whenever the result holds for either maximization or minimization. We shall be more specific otherwise, since it will soon become very clear that the two cases can sometimes be fundamentally different. The following theorem summarizes the main structural result when µ and ν are one-dimensional marginals. Hobson-Neuberger [28] were the first to deal with the maximization case, while Beiglböck-Juillet [5] and Hobson-Klimmek [27] deal with the minimization case. For a subset Γ of R^d × R^d, we denote by Γ_x the fiber Γ_x := {y ∈ R^d ; (x, y) ∈ Γ}.

Theorem 2.1.1 (Beiglböck-Juillet [5], Hobson-Neuberger [28], Hobson-Klimmek [27]). Assume that µ and ν are probability measures in convex order on R, and that µ is continuous. Then there exists a unique optimal martingale transport plan π for the cost function c(x, y) = |x − y|. Moreover, there is a set Γ ⊆ R × R with π(Γ) = 1 such that:

1. in the minimization problem, |Γ_x| ≤ 3 for every x ∈ R. More precisely, π can be decomposed into π_stay + π_go, where π_stay = (Id × Id)_#(µ ∧ ν) (this measure is concentrated on the diagonal of R²) and π_go is concentrated on graph(T_1) ∪ graph(T_2), where T_1, T_2 are two real-valued functions;

2.
in the maximization problem, |Γ_x| ≤ 2 for every x ∈ R, and π is concentrated on graph(T_1) ∪ graph(T_2), where T_1, T_2 are two real-valued functions.

Our main goal in this thesis is to consider higher-dimensional analogues of the above result. In Section 2.6, we show that the above theorem extends, in the case of minimization, to the setting where the marginals are radially symmetric on R^d and c(x, y) = |x − y|^p for 0 < p ≤ 1. The general case is wide open, and our goal is to work towards establishing the following:

Conjecture 1: Consider the cost function c(x, y) = |x − y| and assume that µ is absolutely continuous with respect to Lebesgue measure on R^d (µ << L^d). Suppose π is a martingale transport that optimizes (2.1.1). Then there is a set Γ with π(Γ) = 1 such that for µ-almost every x, Γ_x := {y | (x, y) ∈ Γ} is contained in the Choquet boundary (i.e., the set of extreme points) of the closed convex hull of Γ_x, i.e.,

Γ_x ⊆ Ext(conv(Γ_x)).

Note that for the minimization problem, we can and will assume that µ ∧ ν = 0, since any minimizing martingale transport for problem (2.1.1) must let the support of µ ∧ ν stay put; see [5] or [31] for a proof. One can then easily see that in the one-dimensional case the above conjecture reduces to Theorem 2.1.1, since then the linear span of Γ_x is one-dimensional and the Choquet boundary consists of exactly two points, unless of course Γ_x is a singleton.

We shall be able to prove the above conjecture in many important cases: in particular, when a natural dual optimization problem is attained (Theorem 2.2.4), or when the linear span of Γ_x has full dimension (Theorem 2.2.15).
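For a finite fiber, the Choquet-boundary condition of Conjecture 1 is straightforward to test numerically: Γ_x must contain no point interior to its convex hull. A sketch with scipy (the fiber below is a toy example of my own choosing):

```python
import numpy as np
from scipy.spatial import ConvexHull

# A toy fiber Gamma_x in R^2 consisting of the vertices of a triangle:
# every point is extreme, so Gamma_x = Ext(conv(Gamma_x)).
fiber = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]])
hull = ConvexHull(fiber)
assert set(hull.vertices) == set(range(len(fiber)))

# Appending the barycenter violates the condition: it is an interior
# point of the hull, hence not a vertex.
fiber2 = np.vstack([fiber, fiber.mean(axis=0)])
assert len(fiber2) - 1 not in set(ConvexHull(fiber2).vertices)
print("extreme-point check passed")
```

Note that the second fiber is exactly the configuration a martingale cannot avoid when x is the barycenter of the other three points, which is why the conjecture concerns the target fibers Γ_y-free set Γ and not arbitrary point clouds.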
Another case where the answer is affirmative is in dimension d = 2 (Theorem 2.2.16), provided the two marginals are well separated, which, as noted above, is not very restrictive in the case of minimization.

We actually expect a more rigid structure in the case of minimization. Indeed, we show that in this case, assuming µ ∧ ν = 0, we also have #Γx ≤ 2 for µ-almost all x whenever the marginals are radially symmetric on Rd and c(x, y) = |x − y|p for 0 < p ≤ 1. The general case remains open, and we propose the following:

Conjecture 2 (Minimization): Consider the cost function c(x, y) = |x − y| and assume that µ is absolutely continuous with respect to Lebesgue measure on Rd and that µ ∧ ν = 0. If π is a martingale transport that minimizes (2.1.1), then there is a set Γ with π(Γ) = 1 such that for µ-almost every x, the set Γx consists of k + 1 points that form the vertices of a k-dimensional polytope, where k := k(x) is the dimension of the linear span of Γx; moreover, the minimizing solution is unique.

We shall give a partial answer to the above conjecture under the assumption that the target measure ν is discrete. Actually, in this case the result holds true in both the maximization and minimization cases (Theorem 2.2.17). We note, however, that (unlike in the minimization case) one cannot in higher dimensions always expect the uniqueness of a maximizer (Example 2.2.19), nor a polytope-type structure for the fibers Γx of its support (Example 2.2.18), even when the marginals are radially symmetric.

Just like in the Monge-Kantorovich theory, the above optimization problem (2.1.1) has a dual formulation, which will be crucial to our analysis.
Indeed, one can show (see for example [3]) that if the cost c is lower semi-continuous, then for the minimization problem,

(2.1.3) min{ ∫Rd×Rd c(x, y) dπ ; π ∈ MT(µ, ν) } = sup{ ∫Rd β dν − ∫Rd α dµ ; (α, β) ∈ Dm(µ, ν, c) },

where Dm(µ, ν, c) is the set of all (α, β) ∈ L1(µ) × L1(ν) such that for some γ ∈ Cb(Rd, Rd),

(2.1.4) β(y) − α(x) − γ(x) · (y − x) ≤ c(x, y) for all (x, y) ∈ Rd × Rd.

Similarly, if the cost c is upper semi-continuous, then

(2.1.5) max{ ∫Rd×Rd c(x, y) dπ ; π ∈ MT(µ, ν) } = inf{ ∫Rd β dν − ∫Rd α dµ ; (α, β) ∈ DM(µ, ν, c) },

where DM(µ, ν, c) is the set of all (α, β) ∈ L1(µ) × L1(ν) such that for some γ ∈ Cb(Rd; Rd),

(2.1.6) β(y) − α(x) − γ(x) · (y − x) ≥ c(x, y) for all (x, y) ∈ Rd × Rd.

Note that if π is an optimal martingale measure and the corresponding dual problem is attained at a triplet (α, β, γ), then it is easy to see that there exists a Borel subset Γ ⊆ Rd × Rd of full π-measure such that

(2.1.7) β(y) − α(x) − γ(x) · (y − x) = c(x, y) if and only if (x, y) ∈ Γ.

We shall show that characterizing a concentration set Γ of π by such an identity forces it to have a specific extremal structure. However, unlike in the Monge-Kantorovich theory for mass transport, solutions to the dual problem need not exist, at least in the maximization problem, even in the one-dimensional case, as shown in [3]. Example 2.3.2 below provides another simple counterexample.

We shall actually show that attainment in the dual problem is a very strong assumption. Namely, when x 7→ c(x, y) is locally Lipschitz (uniformly in y), once there is a triplet of functions (α, β, γ) at which the dual is attained, one can improve their regularity: one can associate extensions (ᾱ, β̄, γ̄) such that ᾱ and an appropriate orthogonal projection of γ̄ are differentiable almost everywhere. This additional regularity allows us to show that Conjecture 1 holds in the case of dual attainment (Theorem 2.2.4) for the cost c(x, y) = |x − y|.

We then proceed to build on this result to deal with more general cases.
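The mechanism behind (2.1.3)-(2.1.6) is that the γ term integrates to zero against any martingale transport: each conditional measure πx has barycenter x, so ∫ γ(x) · (y − x) dπ(x, y) = 0 whatever γ is, and only ∫β dν − ∫α dµ survives. A one-dimensional numerical sketch with a hand-built discrete martingale coupling (the encoding {x: (µ({x}), πx)} is ad hoc, for illustration only):

```python
import math

def gamma_term(pi, gamma):
    """Integral of gamma(x) * (y - x) against pi, in one dimension.
    pi is encoded as {x: (mu_mass_at_x, {y: pi_x({y})})}."""
    return sum(px * m * gamma(x) * (y - x)
               for x, (px, cond) in pi.items()
               for y, m in cond.items())

# A hand-built martingale coupling: each conditional pi_x has barycenter x.
pi = {0.0: (0.5, {-1.0: 0.5, 1.0: 0.5}),
      2.0: (0.5, {1.0: 0.5, 3.0: 0.5})}
for x, (_, cond) in pi.items():
    assert abs(sum(m * y for y, m in cond.items()) - x) < 1e-12

print(gamma_term(pi, math.sin))   # 0.0 for any choice of gamma
```

Swapping math.sin for any other function leaves the result at zero, which is exactly why γ drops out of the integrated dual inequality while it still carries information pointwise.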
One of the notable advances here is a decomposition of any Borel set Γ supporting a given optimal martingale transport π into components {ΓC}C∈I, where a dual triplet (αC, βC, γC) is found for each piece, in such a way that (2.1.4) (resp. (2.1.6)) holds in the minimization (resp. maximization) setting, with equality on each piece ΓC. See Theorem 2.2.11 for the precise statement.

What is remarkable is that this decomposition, as well as the duality statements, can be carried out in full generality (i.e., for any cost function) and without any reference to a martingale transport problem, or even to any reference measure. The decomposition is done through an equivalence relation on the projection XΓ of Γ on the first coordinate, induced by a well-chosen irreducible convex paving, that is, a collection of mutually disjoint convex subsets of Rd that covers XΓ. On the other hand, a good analogy for the (pointwise, not measure-theoretic) duality conditions is the role played by c-cyclically monotone sets and their corresponding c-convex potentials in the Monge-Kantorovich theory associated to a given cost function c.

Back to the martingale problem, we then consider the disintegration {πC}C∈I of any martingale measure π along the above-described decomposition of its support Γ (Theorem 2.2.14). This gives a canonical decomposition of the optimal martingale problem into a collection of non-interacting martingale problems, where duality is attained for each piece πC in MT(µC, νC). We note that this result can be seen as a generalization of the decomposition of Beiglböck-Juillet [5] in the one-dimensional case d = 1, where the disintegration comes from restricting the measures µ, ν to open subintervals of R obtained by examining the potential functions of µ and ν. Like theirs, our decomposition applies to any cost function c, not only to c(x, y) = |x − y|. It is, however, quite different, since it depends on the support of the martingale measure π that we start with.
More importantly, our decomposition need not be countable (Example 2.5.2), which creates additional and interesting complications in the higher-dimensional cases.

We shall use the above decomposition/disintegration results to establish the above-stated conjectures under various conditions. For example, Conjecture 1 holds in the case where the dimensions of all components (C)C∈I are at least d − 1, or in dimension d = 2 provided the supports of the marginals µ and ν are well separated from each other (see Theorem 2.2.16).

Remarkably, the results discussed so far do not distinguish between the minimization and maximization problems (except that we assume µ ∧ ν = 0 in the case of minimization). The previously mentioned decomposition can be used to prove Conjecture 2 in either the minimization or the maximization case, provided the target measure ν has countable support (Theorem 2.2.17). However, as mentioned above, we believe that these two problems are quite different, at least in terms of finding finer structural results for each of the cases.

In the next section, we give the precise statements of our results. In Section 2.3, we show the regularity and structural results in the case where the dual problem is attained. In Section 2.4, we establish the decomposition/disintegration results, as well as the existence of dual triplets on each of the components in the decomposition. Section 2.5 is devoted to proving, under various additional conditions, structural results for sets on which optimal martingale transports concentrate.

2.2 Main results

To discuss our main results, we first introduce a few definitions. We also borrow some of the notation from [5].

Definition 2.2.1. For A ⊆ Rd, we shall write V(A) for the lowest-dimensional affine space containing A.
Also define IC(A) := int(conv(A)) and CC(A) := cl(conv(A)), where the interior and closure are taken in the topology of V(A), that is, with respect to the Euclidean metric topology of V(A) (and not with respect to the whole space Rd).

For a Borel set Γ ⊆ Rd × Rd, we write XΓ := projX Γ and YΓ := projY Γ, i.e. XΓ is the projection of Γ on the first coordinate space Rd, and YΓ on the second. For each x ∈ XΓ, we consider the fiber

Γx = {y ∈ Rd | (x, y) ∈ Γ}.

If Γx = {x}, then IC(Γx) = {x}, since the interior of a singleton is the singleton itself in the topology of the 0-dimensional space.

Note that if π is a martingale transport plan in MT(µ, ν), then there exists a Borel set Γ ⊆ Rd × Rd with π(Γ) = 1 such that µ-almost every x ∈ XΓ belongs to IC(Γx). The set Γ may not necessarily be the support of π, which is normally defined as the smallest closed set of full π-measure, but can be constructed in the following way. Consider any Borel set Λ ⊆ Rd × Rd such that π(Λ) = 1. The unique disintegration (πx)x of π with respect to µ yields a well-defined measurable map x → πx on a Borel set E in Rd with µ(E) = 1, such that each x in E is the barycenter of πx and πx(Λx) = 1. It is clear that x ∈ CC(Λx); however, x need not lie in IC(Λx). Note that for any Borel set B in Rd, the map x 7→ πx(B) is Borel measurable. Hence for each r > 0, the set Br := {(x, y) | x ∈ E, πx(Br(y)) > 0} is Borel (here, Br(y) is the open ball with center y and radius r in Rd), and consequently the set Θ := {(x, y) | x ∈ E, y ∈ spt(πx)} = ⋂∞n=1 B1/n is also Borel. Letting Γ := Λ ∩ Θ, it is clear that π(Γ) = 1 and πx(Γx) = 1 for all x ∈ E. Finally, note that the probability measure πx has its barycenter at x and that Γx ⊆ spt(πx), hence x ∈ IC(Γx) for x ∈ E.

This leads us to the following purely set-theoretic definition.

Definition 2.2.2.
Say that a Borel set Γ ⊆ Rd × Rd is a martingale supporting set if

(2.2.1) for every x ∈ XΓ, x ∈ IC(Γx).

We let SMT denote the class of all martingale supporting sets, i.e.

SMT := {Γ ⊆ Rd × Rd : Γ is a Borel set such that for every x ∈ XΓ, x ∈ IC(Γx)}.

Just as martingale supporting sets can be defined independently of transport problems, duality can also be defined without the presence of any prescribed marginals.

Definition 2.2.3. Let c : Rd × Rd → R be a cost function. We shall say that a subset G of Rd × Rd admits a c-dual, or that G is c-dualizable, if there exists a triplet {α, β, γ}, α : XG → R, β : YG → R, γ : XG → Rd, such that the following duality relations (for the maximization problem) hold:

(2.2.2) β(y) − c(x, y) ≥ γ(x) · (y − x) + α(x) ∀x ∈ XG, y ∈ YG,
(2.2.3) β(y) − c(x, y) = γ(x) · (y − x) + α(x) ∀(x, y) ∈ G.

In the minimization problem, the inequality in (2.2.2) is reversed.

We will call the triplet {α, β, γ} a c-dual for G. A function β : YG → R is called a c-dual for G if there exist α and γ such that {α, β, γ} is a c-dual for G.

Our first main result shows that c-dualizable martingale supporting sets enjoy strong structural properties; it will be proved in Sections 2.3.2 and 2.3.3. Recall that in the minimization problem, we can and shall assume, without loss of generality, that µ ∧ ν = 0.

Theorem 2.2.4. Suppose x 7→ c(x, y) is locally Lipschitz, with Lipschitz constants uniformly bounded in y (e.g. c(x, y) = |x − y|). Then the following holds:

1. [Regularity of dual solutions] Let Γ be a Borel set in SMT that admits a c-dual in the sense of Definition 2.2.3 above, and suppose that XΓ ⊆ Ω := IC(YΓ), with Ω an open set in Rd.
Then there exist ᾱ : Ω → R, β̄ : Rd → R, γ̄ : Ω → Rd such that the triplet (ᾱ, β̄, γ̄) is also a c-dual for Γ, with ᾱ locally Lipschitz (thus differentiable a.e.) and γ̄ locally bounded in Ω. If we further assume that y 7→ c(x, y) is locally Lipschitz with Lipschitz constants uniformly bounded in x, then β̄ is also locally Lipschitz. In the case of maximization, β̄ can be taken to be a convex function on Rd whenever the function y 7→ c(x, y) is convex.

2. [Structure of an optimal martingale transport when duality is attained] Assume now that c(x, y) = |x − y| and that µ is absolutely continuous with respect to Lebesgue measure. If π ∈ MT(µ, ν) is a solution of (2.1.1), for either the minimization or the maximization problem, and the dual problem is attained, then for µ-almost all x, Γx is contained in the Choquet boundary of its closed convex hull, i.e. Γx ⊆ Ext(conv(Γx)).

Since duality is not attained in general ([3] or Example 2.3.2 below), we investigate the possibility of decomposing an optimal martingale transport into "irreducible components" on which duality is attained for the restricted problem. First, we introduce the concept of a convex paving.

Definition 2.2.5. Let Φ be a family of mutually disjoint open convex sets in Rd (recall that here, the openness of a set C is with respect to V(C)). Given a set Γ ⊆ Rd × Rd, we shall say that Φ is a convex paving for Γ provided

1. XΓ ⊆ ⋃C∈Φ C;
2. each C ∈ Φ contains at least one element x of XΓ (C is then denoted C(x));
3. for any z, x ∈ XΓ, we have: IC(Γz) ∩ C(x) ≠ ∅ ⇒ IC(Γz) ⊆ C(x).

Note that such a paving clearly defines an equivalence relation on XΓ, by simply defining x ∼Φ x′ if and only if C(x) = C(x′). The corresponding equivalence classes are then [x] = C(x) ∩ XΓ.

There can be many convex pavings of a set Γ ⊆ Rd × Rd; take for example Φ := {Rd}, which however gives no information about Γ. We therefore introduce the following concept.

Definition 2.2.6.
For a fixed set Γ ⊆ Rd × Rd, we shall say that Φ is an irreducible convex paving for Γ if for any other convex paving Ψ for Γ we have the following property: if C ∈ Φ and D ∈ Ψ are such that C ∩ D ≠ ∅, then necessarily C ⊆ D.

Note that an irreducible convex paving for a set Γ is necessarily unique. As for existence, we shall show in Section 2.4 the following result.

Theorem 2.2.7. For every martingale supporting set Γ ∈ SMT, there exists a unique irreducible convex paving for Γ.

Now, a key property of an optimal martingale transport π ∈ MT(µ, ν) is due to Beiglböck and Juillet [5]. It says that there exists a set Γ of full π-measure in Rd × Rd such that each of its finite subsets has a c-dual. This is one of the consequences of the variational lemma in [5], where duality on finite sets is obtained via linear programming (see [5] and [27]). We state it here as a proposition.

Definition 2.2.8. Let c : Rd × Rd → R be a cost function (we do not assume anything about c here). We say that a set G ⊆ Rd × Rd is finitely c-dualizable if every finite subset H ⊆ G admits a c-dual in the sense of Definition 2.2.3.

The following proposition shows that an optimal martingale transport concentrates naturally on a martingale supporting Borel set in SMT that is also finitely c-dualizable.

Proposition 2.2.9. Let π ∈ MT(µ, ν) be an optimal martingale transport for problem (2.1.1). If the cost c is continuous, then there exists a Borel set Γ ∈ SMT such that π(Γ) = 1 and Γ is finitely c-dualizable.

Indeed, it is shown in [39] that there exists a Borel set Λ in Rd × Rd with π(Λ) = 1 that satisfies a certain monotonicity property, which is in fact the martingale counterpart of the c-cyclic monotonicity inherent to the Monge-Kantorovich theory. As mentioned above, by the duality theorem of linear programming, this property is equivalent to saying that Λ is finitely c-dualizable.
Starting with such a Λ, one can use the remark preceding Definition 2.2.2 to find a martingale supporting set Γ ⊆ Λ with π(Γ) = 1. Besides being in SMT, it is clear that Γ is also finitely c-dualizable.

Remark 2.2.10. We note that in the above proposition, the continuity assumption on the cost can be relaxed, in particular when one has the "no duality gap" property (2.1.3) or (2.1.5). See [39] for details.

Since duality is not attained in general, an optimal martingale transport measure is not necessarily concentrated on a c-dualizable set Γ ∈ SMT. On the other hand, in view of the above result, we can and will assume that it is concentrated on a set Γ ∈ SMT that is finitely c-dualizable. This leads to the question of finding "maximal" components of Γ that are c-dualizable. It turns out that this is indeed possible, as we show that if a set G ∈ SMT is such that all of its finite subsets have c-duals, then the same holds for G ∩ (C × Rd), where C is any component of the irreducible convex paving Φ of G.

In view of Theorem 2.2.7, we can now state the following.

Theorem 2.2.11. Let Γ ∈ SMT be finitely c-dualizable. Then there exists an irreducible convex paving Φ for Γ such that for each x ∈ XΓ, the set Γ ∩ (C(x) × Rd) has a c-dual in the sense of Definition 2.2.3, where the sets C(x) are the convex components in Φ.

Remark 2.2.12. Theorem 2.2.11 can be seen as a martingale counterpart of a celebrated result of Rockafellar [33] in the Monge-Kantorovich theory for mass transport, which essentially says that the c-cyclically monotone sets characterizing the supports of optimal transport plans are c-dualizable via a pair of functions, one being c-convex and the other being its c-Legendre transform.
Here, the finitely c-dualizable property replaces cyclic monotonicity, while duality for martingale supporting sets requires three functions, instead of the two in Rockafellar's result. Moreover, in the martingale case, dualizability does not always hold on the whole support, but requires it to be decomposed into irreducible components.

Theorems 2.2.4 and 2.2.11 yield several structural results in general dimensions, such as the following. Note that attainment of the dual problem is not assumed here.

Corollary 2.2.13 (dimensional result). Let π be a solution of the optimization problem (2.1.1) with c(x, y) = |x − y|, and suppose µ is absolutely continuous with respect to Lebesgue measure. Then there is a set Γ with π(Γ) = 1 such that for µ-almost every x in Rd,

(1) the Hausdorff dimension of Γx is at most d − 1, and
(2) whenever dim V(Γx) = d, then Γx ⊆ Ext(conv(Γx)).

Proof. Let Γ be as in Proposition 2.2.9, and first consider those points x with dim V(Γx) = d. In this case, the disjoint sets C(x) in Theorem 2.2.11 are open sets in Rd, and so the restriction of µ to each of these dualizable components is again absolutely continuous. Theorem 2.2.4 can then be applied. Note now that the Choquet boundary has dimension at most d − 1. This shows that for µ-a.e. x in the open set ⋃dim V(C)=d C, we have dim V(Γx) ≤ d − 1. The property obviously holds outside that set as well, so item (1) is also verified.

We now perform a disintegration of π with respect to the decomposition {ΓC}C∈Φ. For Γ ∈ SMT, consider the irreducible components {C; C ∈ Φ} given by Theorem 2.2.7. We shall assume in the sequel that the map Ξ : Γ → K(Rd) defined by (x, y) 7→ C(x) is measurable, where K(Rd) is the space of closed convex subsets of Rd, a separable complete metric space under the Hausdorff metric. We can then show that a martingale transport plan π can be canonically disintegrated into its components given by (Γ ∩ (C(x) × Rd))x∈XΓ.
See Section 2.4.4 for the proof.

Theorem 2.2.14 (Disintegration of martingale plans). Let (µ, ν) be probability measures on Rd in convex order, and let π ∈ MT(µ, ν). In the case of the minimization problem, we assume further that µ ∧ ν = 0. Then there exists a set Γ ∈ SMT with π(Γ) = 1 such that if π̃ = Ξ#π denotes the push-forward of π onto K(Rd), and I ⊆ K(Rd) is the image of Γ under Ξ, then the following holds:

(1) There exists a disintegration of π along the map Ξ such that

(2.2.4) π(S) = ∫I πC(S) dπ̃(C) for each Borel set S ⊆ Rd × Rd,

where for π̃-a.e. C, πC is a probability measure supported on ΓC := Γ ∩ (C × Rd).

(2) For π̃-a.e. C ∈ I, there exist probability measures µC, νC such that the couple (µC, νC) is in convex order, µC is supported on XC := XΓ ∩ C, νC on YΓC, and πC ∈ MT(µC, νC).

(3) If π is optimal for problem (2.1.1) in MT(µ, ν), then for π̃-a.e. C ∈ I, πC is optimal for the same problem in MT(µC, νC). Furthermore, ΓC admits a c-dual in the sense of Definition 2.2.3. In particular, duality is attained for πC.

(4) If µC is absolutely continuous with respect to Lebesgue measure on V(C), and c(x, y) = |x − y|, then for µC-almost all x, Γx ⊆ Ext(conv(Γx)).

The decomposition of Γ into components {C}C∈I given by Theorem 2.2.14 was motivated by a similar one proposed by Beiglböck-Juillet [5] in the one-dimensional case (d = 1). It is, however, quite different, since in their case the decomposition depends only on the marginals µ and ν. Theirs is also a countable partition, which makes the restricted problems much more amenable to analysis. Actually, the intervals in their decomposition are simply the connected components of the set where the potential functions of µ and ν differ on the real line.

In fact, our decomposition coincides with the one performed by Beiglböck-Juillet in the one-dimensional case.
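In one dimension, the Beiglböck-Juillet components can be computed directly from the potential functions uµ(k) = ∫ |x − k| dµ(x): they are the connected components of {uµ < uν}. A toy computation with hand-picked discrete marginals (our own example, not one from the text), for which the strict-inequality set splits into the two intervals (−3, −1) and (1, 3):

```python
def potential(atoms, k):
    """u(k) = integral of |x - k| against a finitely supported measure
    encoded as {point: mass}."""
    return sum(m * abs(x - k) for x, m in atoms.items())

# Marginals in convex order on R: mu on {-2, 2}, nu spreading each half
# further out.  The potentials agree on [-1, 1] (and on the tails), so the
# set {u_mu < u_nu} has two connected components, (-3, -1) and (1, 3).
mu = {-2.0: 0.5, 2.0: 0.5}
nu = {-3.0: 0.25, -1.0: 0.25, 1.0: 0.25, 3.0: 0.25}

ks = [i / 10 for i in range(-40, 41)]
strict = [k for k in ks if potential(nu, k) - potential(mu, k) > 1e-9]
print([k for k in strict if -1 <= k <= 1])   # []: the components are separated
```

Any martingale transport from µ to ν then splits into two non-interacting problems, one per component, mirroring the role of the convex paving in higher dimensions.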
But in the higher-dimensional cases, our decomposition may also depend on the set Γ in SMT on which the martingale transport concentrates. Moreover, our decomposition can be uncountable, and the induced probability measures µC can be Dirac measures (see Example 2.5.2). The relationship between the two decompositions will be investigated in a forthcoming paper [21].

The above decomposition yields certain structural results for the original optimal martingale transport, especially in the case c(x, y) = |x − y|. For the next result, define δ(x) as the radius of the largest open ball contained in C(x), i.e.

δ(x) = sup{r ≥ 0 : B(x, r) ⊆ C(x)} for x ∈ XΓ,

and define the set

E = {x ∈ XΓ : Γx ⊈ Ext(conv(Γx))}.

Theorem 2.2.15. Let c(x, y) = |x − y| and suppose µ << Ld. Let π ∈ MT(µ, ν) be a solution of (2.1.1), and let Γ ∈ SMT be a finitely c-dualizable Borel set such that π(Γ) = 1. Assume the function δ and the set E are measurable, and suppose

(2.2.5) dim(V(C(x))) ≥ d − 1 for µ-a.e. x.
Let c(x, y) = |x − y| (or more generally for c(x, y) = |x − y|pwith p 6= 2), suppose µ is absolutely continuous with respect to the Lebesgue measure,and that ν is discrete; i.e. ν is supported on a countable set. Let pi ∈ MT (µ, ν) bea solution of (2.1.1) and let Γ ∈ SMT be a finitely c-dualizable Borel set such thatpi(Γ) = 1. Then for µ a.e. x, the fiber Γx consists of exactly d + 1 points whichare vertices of a d-dimensional polytope in Rd, and therefore the optimal solution isunique.Now we give a couple of examples, which illustrate that the above stated conjec-tures could be the best structural results we can hope for.Example 2.2.18. The polytope-like structure of the support required in conjecture 2does not hold in general for the corresponding maximization problem. Indeed, since12(|x− y| − 1)2 ≥ 0, we have12|y|2 − 12|x|2 + 1− x · (y − x) ≥ |x− y| on Rd × Rd,(2.2.7)26with equality on the set {(x, y); |x − y| = 1}. The functions α(x) = 12|x|2 − 1,β(y) = 12|y|2 and γ(x) = x then form a dual triplet for the maximization problemwith cost |x − y|. This means that every martingale (X, Y ) with |X − Y | = 1 a.s.is optimal for the maximization problem corresponding to its own marginals X ∼ µand Y ∼ ν. Hence, Γx is not in general a discrete set.Finally, we consider the uniqueness question in conjecture 2, and whether it couldhold for the maximization problem. In [5] it is shown that when d = 1 the solution ofthe martingale transport problem (2.1.1) is unique for both max/min problem underthe assumption that µ is absolutely continuous. Also, it is reported in [31] that inthe minimization problem with radially symmetric marginals (µ, ν), the minimizeris again unique in any dimension. We note however that, unlike the minimizationcase, one cannot expect the uniqueness of a maximizing martingale measure in higherdimensions, even in the radially symmetric case, as the following example indicates.Example 2.2.19. 
Let µ be a radially symmetric probability measure on R2 ≃ C such that µ({0}) = 0. Let z1 = cos(π/4) + i sin(π/4), z2 = cos(π/4) − i sin(π/4), z3 = −z1 and z4 = −z2, and define the probability measures π1 and π2 on C × C whose disintegrations π1x and π2x, for each x ∈ C, x ≠ 0, are given by

π1x = (1/4) δy1(x) + (1/4) δy2(x) + (1/4) δy3(x) + (1/4) δy4(x),
π2x = (1/8) δy1(x) + (3/8) δy2(x) + (1/8) δy3(x) + (3/8) δy4(x),

where yk(x) := x + (x/|x|) zk and δy denotes the Dirac measure at y. Then, by the discussion around Section 2.5.1, one can see that both π1 and π2 are optimal for the maximization problem corresponding to µ and ν := ν1 = ν2, where dνi(y) = ∫C πix(y) dµ(x), i = 1, 2. Hence the maximizer is not unique.

2.3 Profile of an optimal martingale transport when the dual is attained

The goal of this section is to prove Theorem 2.2.4, which shows that dual attainment in the optimization problem (2.1.1) yields good structural results, thanks to regularity properties of the dual functions.

2.3.1 Duality for optimal martingale transports

As mentioned in the introduction, there is a dual formulation for problem (2.1.1), just as in the Monge-Kantorovich theory for (non-martingale) mass transport.

Lemma 2.3.1. (see e.g.
[3]) Let µ and ν be two probability measures on Rd in convex order, and let c : Rd × Rd → R be an upper semi-continuous cost function. Then

(2.3.1) max{ ∫Rd×Rd c(x, y) dπ ; π ∈ MT(µ, ν) } = inf{ ∫Rd β dν − ∫Rd α dµ ; (α, β) ∈ DM(µ, ν, c) },

where DM(µ, ν, c) is the set of all (α, β) ∈ L1(µ) × L1(ν) such that for some γ ∈ Cb(Rd),

(2.3.2) β(y) − α(x) − γ(x) · (y − x) ≥ c(x, y) for all (x, y) ∈ Rd × Rd.

A similar result holds for the cost minimization problem, provided c is lower semi-continuous and the inequality in (2.3.2) is reversed.

Note that if the dual problem is attained at functions α, β and γ in DM(µ, ν, c), then (in the maximization problem)

(2.3.3) β(y) − α(x) − γ(x) · (y − x) ≥ c(x, y) for all (x, y) ∈ Rd × Rd,

and since µ and ν are the marginals of some optimal π in MT(µ, ν), while ∫ γ(x) · (y − x) dπ(x, y) = 0 (by the martingale condition), we have

∫Rd×Rd {β(y) − α(x) − γ(x) · (y − x)} dπ(x, y) = ∫Rd×Rd c(x, y) dπ(x, y).

It follows that

β(y) − α(x) − γ(x) · (y − x) = c(x, y) for π-a.e. (x, y) ∈ Rd × Rd,

so equality holds on a concentration set Γ of π.

Conversely, suppose G ⊆ Rd × Rd and π∗(G) = 1 for some π∗ ∈ MT(µ, ν), and suppose there exists a triplet (α, β, γ) in DM(µ, ν, c) for which equality in (2.3.3) holds on G. Then π∗ is an optimal solution of (2.1.1). Indeed, let π ∈ MT(µ, ν) and let H be such that π(H) = 1. Since µ(XG) = 1 and ν(YG) = 1, we may assume that XH ⊆ XG and YH ⊆ YG, so integrating (2.3.3) against π gives

∫Rd×Rd c(x, y) dπ(x, y) ≤ ∫Rd β(y) dν(y) − ∫Rd α(x) dµ(x)

(again, ∫ γ(x) · (y − x) dπ(x, y) = 0 since π ∈ MT(µ, ν)). On the other hand, integrating (2.3.3) against π∗, where equality holds on G, we get

∫Rd×Rd c(x, y) dπ∗ = ∫Rd×Rd {β(y) − α(x) − γ(x) · (y − x)} dπ∗ = ∫Rd β(y) dν − ∫Rd α(x) dµ.

This shows that π∗ is optimal.

This suggests that dual attainability is actually a property of the support of the optimal martingale transport, and not of the measure itself, which is why we introduced in Definition 2.2.3 the concept of pointwise c-duality for a set.
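The equality mechanism above can be checked on an explicit dual triplet for the distance cost: since ½(|x − y| − 1)2 ≥ 0, the functions β(y) = ½|y|2, γ(x) = x, α(x) = ½|x|2 − ½ satisfy (2.3.3) for c(x, y) = |x − y|, with slack exactly ½(|x − y| − 1)2, hence with equality precisely when |x − y| = 1. A quick numerical confirmation in dimension three (our own sketch; the test points are random):

```python
import random

def slack(x, y):
    """beta(y) - alpha(x) - gamma(x)·(y - x) - |x - y| for the triplet
    beta(y) = |y|^2/2, gamma(x) = x, alpha(x) = |x|^2/2 - 1/2."""
    dist = sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5
    beta = 0.5 * sum(b * b for b in y)
    alpha = 0.5 * sum(a * a for a in x) - 0.5
    gamma_term = sum(a * (b - a) for a, b in zip(x, y))
    return beta - alpha - gamma_term - dist

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(3)]
    y = [random.uniform(-5, 5) for _ in range(3)]
    dist = sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5
    assert slack(x, y) > -1e-9                               # dual feasibility
    assert abs(slack(x, y) - 0.5 * (dist - 1) ** 2) < 1e-9   # exact slack
print("feasible everywhere; equality iff |x - y| = 1")
```

The equality set {|x − y| = 1} is precisely the concentration set Γ of (2.1.7) for this triplet, which is why every unit-step martingale is optimal for its own marginals.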
Note that the triplets (α, β, γ) defining c-duality in Definition 2.2.3 are much more general than the elements of DM(µ, ν, c), and our results indicate that dual attainment in (2.3.1) does not occur in general, even in this most relaxed sense, where the functions (α, β, γ) are allowed to be highly non-regular.

An obvious but important remark is that if G is c-dualizable in the sense of Definition 2.2.3, then any subset H of G is also c-dualizable, since any c-dual triplet for G also works for H. We have also seen above that if G admits a c-dual and supports a martingale transport π∗ in MT(µ, ν), then π∗ is an optimal solution of (2.1.1) for the cost c. Hence only Borel sets on which optimal martingale measures concentrate can admit a c-dual, but not conversely: as remarked in the introduction, there exist optimal martingale measures whose support can never have a c-dual (see [3]). We give here a very simple example.

Example 2.3.2. Let µ = ν be two identical probability measures on the interval [0, 1];
Nevertheless, we shallsee in section 3 that whenever a concentration set admits a dual, then it exhibits aspecial extremal configuration.2.3.2 Regularity of the dual functionsIn Definition 2.2.3, the triplets (α, β, γ) are only assumed to be real-valued functions,which means that Lemma 2.3.5 is not directly applicable. However, we will now showthat dual triplets have regular extensions in such a way that Theorem 2.2.4 item (1)would apply.Lemma 2.3.3 (Theorem 2.2.4 item (1)). Assume that Γ is a Borel martingalesupporting set in SMT with a c-dual triplet {α, β, γ}. Set Ω := IC(YΓ) and supposethat XΓ ⊆ Ω and Ω is open in Rd. Assume that x 7→ c(x, y) is locally Lipschitz,where the local Lipschitz constants are uniformly bounded in y (e.g. c(x, y) = |x−y|).Then, there exists a locally Lipschitz α : Ω→ R, a locally bounded γ : Ω→ Rd, andβ : Rd → R such that {α, β, γ} coincides with {α, β, γ} on XΓ, YΓ, XΓ, respectively(thus, is also a c-dual triplet for Γ). In particular, α is differentiable a.e. in Ω.30Proof. Recall the duality relations for the maximization problem:β(y)− c(x, y) ≥ γ(x) · (y − x) + α(x) ∀x ∈ XΓ, y ∈ YΓ(2.3.4)β(y)− c(x, y) = γ(x) · (y − x) + α(x) ∀(x, y) ∈ Γ.(2.3.5)Let Bx(y) = β(y) − c(x, y). First, we extend the function α on Ω in the followingway:α(x) := sup{c ∈ R : ∃a ∈ Rd such that Bx(y) ≥ a · (y − x) + c ∀ y ∈ YΓ} for x ∈ Ω.Thus α ≥ α on XΓ. Also note that (2.3.5) and the fact x ∈ IC(Γx) imply thatα(x) = α(x) for all x ∈ XΓ, thus it is indeed an extension that is finite on XΓ. Weclaim that it is also finite at every x ∈ Ω. Indeed, for x ∈ Ω = IC(YΓ), we can choose{y1, ..., ys} ⊆ YΓ such thatx ∈ U := IC({y1, ..., ys}) and U is open in Rd.(2.3.6)It is clear that α(z) ≤ M := maxyi Bz(yi) for all z ∈ U , and since x 7→ c(x, y) islocally Lipschitz, we see that α(z) is bounded above for all z near x. In other words,α is locally upper bounded. 
The proof below will show that ᾱ is also locally bounded below.

We now extend γ to Ω as the multivalued map

γ̄(x) := {a ∈ Rd : β(y) − c(x, y) ≥ a · (y − x) + ᾱ(x) for all y ∈ YΓ, and β(y) − c(x, y) = a · (y − x) + ᾱ(x) for all y ∈ Γx}.

For x ∈ XΓ we have ᾱ(x) = α(x), and so γ(x) ∈ γ̄(x). But we must show that γ̄(x) is nonempty for every x ∈ Ω. Choose a sequence {cn} approximating ᾱ(x), with corresponding {an}, i.e. Bx(y) ≥ an · (y − x) + cn and cn ↗ ᾱ(x). By (2.3.6) and Bx(yi) ≤ M, it is clear that {an} must be bounded; therefore γ̄(x) is nonempty and compact. Below, we abuse notation by letting γ̄(x) denote an arbitrary selection of a from γ̄(x).

We now prove that ᾱ is locally Lipschitz. First, fix R > 0 and let x ∈ XΓ, x′ ∈ Ω, y ∈ YΓ with |x|, |x′| < R. By the local Lipschitz property of c in x, i.e.

c(x, y) − c(x′, y) ≤ C|x − x′|

for some C = C(R) > 0 and for all |x|, |x′| < R, we see that

(2.3.7) ᾱ(x) + γ̄(x) · (y − x) ≤ Bx(y) ≤ Bx′(y) + C|x′ − x|.

Thus

ᾱ(x) − C|x′ − x| + γ̄(x) · (x′ − x) + γ̄(x) · (y − x′) ≤ Bx′(y).

The definition of ᾱ gives

(2.3.8) ᾱ(x) − C|x′ − x| + γ̄(x) · (x′ − x) ≤ ᾱ(x′).

In particular, ᾱ is locally bounded below. Knowing that ᾱ is finite everywhere in Ω, the above argument can be repeated, giving (2.3.8) for any x, x′ ∈ Ω:

ᾱ(x) − C|x′ − x| + γ̄(x) · (x′ − x) ≤ ᾱ(x′).

Interchanging x and x′, we get

|ᾱ(x) − ᾱ(x′)| ≤ ((|γ̄(x)| ∨ |γ̄(x′)|) + C) |x′ − x|.

Therefore, the local boundedness of γ̄ will imply that ᾱ is locally Lipschitz. To establish this property of γ̄, use (2.3.6) and let V be a small neighbourhood of x whose closure is contained in U. Since ᾱ is bounded on V, there exists C > 0 such that

(2.3.9) γ̄(z) · (yi − z) ≤ C for all z ∈ V, i = 1, 2, ..., s,

which says that γ̄ is bounded on V.
Hence, α is locally Lipschitz.

Finally, we can extend β to the whole of Rd via the formula

β(y) := sup_{x∈Ω} {c(x, y) + γ(x) · (y − x) + α(x)}.   (2.3.10)

Clearly this extension coincides with β on YΓ.

If now y ↦ c(x, y) is convex, the above definition of β immediately implies its convexity and finiteness on YΓ, hence also on conv(YΓ). Therefore β is continuous in Ω and differentiable a.e. in Ω. Moreover, in the case c(x, y) = |x − y|, we observe that (2.3.5) immediately implies that

x ∉ Γx for all x ∈ XΓ at which β is differentiable.

For the minimization problem, the duality is

β(y) − c(x, y) ≤ γ(x) · (y − x) + α(x)  ∀x ∈ XΓ, y ∈ YΓ,   (2.3.11)

β(y) − c(x, y) = γ(x) · (y − x) + α(x)  ∀(x, y) ∈ Γ.   (2.3.12)

Thus, we define the extensions as

α(x) := inf{c ∈ R : ∃ a ∈ Rd such that Bx(y) ≤ a · (y − x) + c ∀y ∈ YΓ},
γ(x) := {a ∈ Rd : β(y) − c(x, y) ≤ a · (y − x) + α(x) ∀y ∈ YΓ,
         β(y) − c(x, y) = a · (y − x) + α(x) ∀y ∈ Γx},
β(y) := inf_{x∈Ω} {c(x, y) + γ(x) · (y − x) + α(x)}.   (2.3.13)

The same reasoning shows that α is locally Lipschitz. Note, however, that in this case β may fail to be convex even if y ↦ c(x, y) is convex.

Finally, note that if y ↦ c(x, y) is locally Lipschitz with local Lipschitz constants uniformly bounded in x ∈ Ω, then in both the max and min cases the function β given in (2.3.10) and (2.3.13), respectively, is locally Lipschitz in y as long as γ is uniformly bounded on Ω, since it is then a sup or inf of uniformly Lipschitz functions. □

The next lemma shows that, essentially, γ is differentiable whenever α is. This property will be crucial in the proof of Theorem 2.2.4 item (2).

Lemma 2.3.4. Assume the conditions of Lemma 2.3.3, and suppose x ↦ c(x, y) is differentiable at x whenever x ≠ y (e.g. c(x, y) = |x − y|).
Let α : Ω → R, β : Rd → R, and γ : Ω → Rd be a c-dual triplet for Γ, and assume that, for the maximization problem, there is s ∈ Rd such that

α(x′) ≥ s · (x′ − x) + α(x) + o(|x′ − x|) as x′ → x   (2.3.14)

(for the minimization problem the inequality is reversed). Fix x ∈ XΓ, let V be the vector subspace of Rd corresponding to the affine space V(Γx), and assume dim V ≥ 1. Let projV γ denote the orthogonal projection of the value of γ onto V. Then α and projV γ have a directional derivative at x in every direction u ∈ V.

Proof. By the duality assumption for the maximization problem, for all x′ ∈ Ω and all (x, y) ∈ Γ,

c(x′, y) + γ(x′) · (y − x′) + α(x′) ≤ c(x, y) + γ(x) · (y − x) + α(x).   (2.3.15)

Choose a unit vector u ∈ V and let x′ = x + tu. Then (2.3.15) can be rewritten as

(α(x + tu) − α(x))/t ≤ ((γ(x + tu) − γ(x))/t) · (x + tu − y) + γ(x) · u − (c(x + tu, y) − c(x, y))/t  if t > 0,   (2.3.16)

(α(x + tu) − α(x))/t ≥ ((γ(x + tu) − γ(x))/t) · (x + tu − y) + γ(x) · u − (c(x + tu, y) − c(x, y))/t  if t < 0.   (2.3.17)

Let us use the notation Dt,u f(x) = (f(x + tu) − f(x))/t. Now assumption (2.3.14) says that

lim sup_{t↑0} Dt,u α(x) ≤ lim inf_{t↓0} Dt,u α(x).   (2.3.18)

Since x ∈ int(conv(Γx)), there exist y1, ..., yk ∈ Γx \ {x}, p1, ..., pk ≥ 0, q1, ..., qk ≥ 0, Σpi = 1, Σqi = 1, t+ > 0, t− < 0, such that

x + t+ u = Σ pi yi,  x + t− u = Σ qi yi.

Note that the first term on the right-hand side of (2.3.16) and (2.3.17) is linear in y, so, summing over the yi with weights pi or qi, we get (writing γ1(x) := γ(x) · u)

Dt,u α(x) ≤ Dt,u γ1(x)(t − t±) + C±(t)  if t > 0,   (2.3.19)

Dt,u α(x) ≥ Dt,u γ1(x)(t − t±) + C±(t)  if t < 0.   (2.3.20)

Here C+(t), C−(t) are functions of t ≠ 0 which have limits as t → 0, by the differentiability assumption on the cost.
Write C± = lim_{t→0} C±(t), respectively. Taking lim inf_{t↓0} in (2.3.19) and lim sup_{t↑0} in (2.3.20), and recalling that t+ > 0, t− < 0, we have

lim inf_{t↓0} Dt,u α(x) ≤ (−t+) lim sup_{t↓0} Dt,u γ1(x) + C+,
lim inf_{t↓0} Dt,u α(x) ≤ (−t−) lim inf_{t↓0} Dt,u γ1(x) + C−,
lim sup_{t↑0} Dt,u α(x) ≥ (−t+) lim inf_{t↑0} Dt,u γ1(x) + C+,
lim sup_{t↑0} Dt,u α(x) ≥ (−t−) lim sup_{t↑0} Dt,u γ1(x) + C−.

This and (2.3.18) combine to give

lim inf_{t↑0} Dt,u γ1(x) ≤ lim sup_{t↑0} Dt,u γ1(x) ≤ lim inf_{t↓0} Dt,u γ1(x) ≤ lim sup_{t↓0} Dt,u γ1(x) ≤ lim inf_{t↑0} Dt,u γ1(x);

that is, γ1 = γ · u is differentiable at x in the direction u. Knowing this, we then take lim sup_{t↓0} in (2.3.19) and lim inf_{t↑0} in (2.3.20) to get

lim sup_{t↓0} Dt,u α(x) ≤ (−t+) ∇u γ1(x) + C+,
lim inf_{t↑0} Dt,u α(x) ≥ (−t+) ∇u γ1(x) + C+.

Combining this with (2.3.18), we get the differentiability of α at x in the direction u.

Next, choose any unit vector v ∈ V orthogonal to u and let γ2(x) := γ(x) · v. We want to show that ∇u γ2(x) exists. We proceed just as before: for some k ∈ N there exist y1, ..., yk ∈ Γx \ {x}, p1, ..., pk ≥ 0, q1, ..., qk ≥ 0, Σpi = 1, Σqi = 1, t+ > 0, t− < 0, such that

x + t+ v = Σ pi yi,  x + t− v = Σ qi yi.

Summing over the yi with weights pi or qi as before, we get this time

Dt,u α(x) ≤ t Dt,u γ1(x) − t± Dt,u γ2(x) + C±(t)  if t > 0,   (2.3.21)

Dt,u α(x) ≥ t Dt,u γ1(x) − t± Dt,u γ2(x) + C±(t)  if t < 0.   (2.3.22)

Taking lim inf_{t↓0} in (2.3.21) and lim sup_{t↑0} in (2.3.22), and recalling t+ > 0, t− < 0,

∇u α(x) ≤ (−t+) lim sup_{t↓0} Dt,u γ2(x) + C+,
∇u α(x) ≤ (−t−) lim inf_{t↓0} Dt,u γ2(x) + C−,
∇u α(x) ≥ (−t+) lim inf_{t↑0} Dt,u γ2(x) + C+,
∇u α(x) ≥ (−t−) lim sup_{t↑0} Dt,u γ2(x) + C−,

which implies the differentiability of γ2 = γ · v at x in the direction u. Now choose an orthonormal basis {u, v1, ..., vm} of V and write projV γ = (γ · u)u + Σ(γ · vi)vi. We have shown that each component of projV γ is directionally differentiable.

For the minimization problem, we reverse the inequalities in (2.3.14) and (2.3.15) and proceed in the same way.
Hence, the lemma is proved. □

2.3.3 Structure of optimal martingale supporting sets when the dual is attained

In this subsection, we consider c(x, y) = |x − y| and a martingale measure π ∈ MT(µ, ν) which is a solution of (2.1.1) concentrated on a Borel set Γ. We now prove Theorem 2.2.4 item (2), which shows that Conjecture 1 holds for both the max and min problems whenever the optimal martingale is concentrated on a Borel set that admits a c-dual. We first indicate the role of the regularity of the "potential functions" (α, β, γ).

Lemma 2.3.5. Let Γ ∈ SMT, let Ω be an open set in Rd containing XΓ, and let α : Ω → R and γ : Ω → Rd be two functions. For the maximization problem, define β : Rd → R by

β(y) = sup_{x∈Ω} {|x − y| + γ(x) · (y − x) + α(x)};   (2.3.23)

for the minimization problem, set

β(y) = inf_{x∈Ω} {|x − y| + γ(x) · (y − x) + α(x)}.   (2.3.24)

Assume that Γ satisfies

β(y) = |x − y| + γ(x) · (y − x) + α(x) for all (x, y) ∈ Γ.   (2.3.25)

If α and γ are differentiable at x ∈ XΓ, then Γx is supported on the Choquet boundary of the closed convex hull of Γx.

Proof. Define the "tilted cone"

ζ(x, y) = ζx(y) = ζy(x) := |x − y| + γ(x) · (y − x) + α(x).

The duality condition (2.3.25) with (2.3.23) tells us the following: if (x, y) ∈ Γ, then for all x′ ∈ Ω,

ζx′(y) ≤ ζx(y).   (2.3.26)

From (2.3.25) with (2.3.24) we get the reverse inequality.

Note that, since ζx(y) is continuous, the same inequality holds for all y in the closure of Γx. This obviously implies that if y is in the closure of Γx and x ≠ y, then the gradient with respect to x vanishes:

∇ζy(x) = 0,   (2.3.27)

and in fact (2.3.26) also implies that if y is in the closure of Γx, then necessarily x ≠ y. (If x = y, then the function ζy(x) strictly increases as x moves along the direction ∇x[γ(x) · (y − x) + α(x)].) We may call this the non-staying property, or instability, for the maximization problem. For the minimization problem, we assumed without loss of generality that x ∉ Γx; in fact x does not belong to the closure of Γx either, by (2.3.28) below.

Now suppose the lemma is false.
Then we can find {y, y0, ..., ys} ⊆ Γx, for some s ≥ 1, with y = Σ_{i=0}^s pi yi, Σ_{i=0}^s pi = 1. Choose s minimal such that all pi > 0. Taking the directional derivative in the direction u = (x − y)/|x − y| gives

∇u ζy(x) = ∇u ζyi(x) = 0  ∀i = 0, 1, ..., s.

We compute

∇u ζyi(x) = ((x − yi)/|x − yi|) · u + ∇u γ(x) · (yi − x) − γ(x) · u + ∇u α(x).

Then, by linearity of y ↦ ∇u γ(x) · y, the equation ∇u ζy(x) = 0 simply becomes

1 = Σ_{i=0}^s pi ((x − yi)/|x − yi|) · ((x − y)/|x − y|).

As (x − y)/|x − y| is a unit vector and all pi > 0, this can hold only if all the yi lie on the ray emanating from x through y. The minimality of s then implies s = 1; hence {y, y0, y1} ⊆ Γx would lie on a ray emanating from x, which is a contradiction once we prove the following claim:

Γx is supported on the topological boundary of the convex hull of Γx.   (2.3.28)

Recall that here the topology is not the topology of Rd but that of V := V(Γx). If our claim is false, then, assuming first that dim(V) ≥ 2, we can find y ∈ Γx ∩ IC(Γx) as the barycenter of a triangle joining three points y0, y1, y2 of Γx. But the above argument implies that y0, y1, y2 must be aligned, a contradiction. If dim(V) = 1, then, as x ∈ IC(Γx), we can find {y, y0, y1} ⊆ Γx such that x and y are in the interior of the line segment y0y1. But then, again by the above, {y, y0, y1} must lie on a ray (i.e. half-line) emanating from x, a contradiction. Finally, we cannot have dim(V) = 0, since this would simply mean Γx = {x}; but we showed above that x does not lie in the closure of Γx in the case of maximization, while we assumed without loss of generality that x ∉ Γx in the case of minimization. □

Proof of Theorem 2.2.4, item (2). First, we claim that there exists a set Γ′ such that Γ′ ⊆ Γ, π(Γ′) = 1 and XΓ′ ⊆ IC(YΓ′). Let X′ be the set of Lebesgue points of XΓ. Then, as µ ≪ Ld, we have µ(X′) = 1. Let Γ′ := Γ ∩ (X′ × Rd). Then Γ′ ∈ SMT, π(Γ′) = 1 and X′ ⊆ IC(X′) ⊆ IC(YΓ′), as claimed.

Since a c-dual for Γ is also a dual for Γ′, Lemma 2.3.3 applies to Γ′ and allows us to assume that α is differentiable at Ld-a.e.
x in Ω := IC(YΓ′). Lemma 2.3.4 gives that whenever α is differentiable at x, then projV γ is directionally differentiable in V = V(Γx) at x. This enables us to replicate the proof of Lemma 2.3.5, and we conclude that Γx is supported on the Choquet boundary of the closed convex hull of Γx for Ld-a.e. x in Ω, and hence for µ-a.e. x, since µ ≪ Ld and µ(Ω) = 1. □

2.4 A canonical decomposition of martingale transports

We have shown in the last section that Conjecture 1 holds whenever the dual problem is attained. In this section, we shall decompose an optimal martingale transport π into components on which an induced martingale transport problem is defined in such a way that its dual problem is attained. For that, we shall associate an irreducible convex paving Φ to any Borel set Γ ∈ SMT in such a way that dual attainment holds on each Γ ∩ (C(x) × Rd), where C(x) is any component of the convex paving Φ. In the last part, we will use this convex paving to perform the above-mentioned decomposition of π.

2.4.1 Irreducible convex pavings associated to martingale supporting sets

Let Γ be a Borel set in SMT. We start by defining an equivalence relation on XΓ. For each x ∈ X := XΓ, we define inductively an increasing sequence of convex open sets (Cn(x))n in the following way.

Start with the trivial equivalence relation x ∼0 x′ iff x = x′. Let C0(x) := IC(Γx), and recall that if Γx = {x}, then C0(x) = {x}. Now define the following equivalence relation on X: x ∼1 x′ if there exist finitely many x1, ..., xk in X such that the following chain condition holds:

C0(x) ∩ C0(x1) ≠ ∅,
C0(xi) ∩ C0(xi+1) ≠ ∅  ∀i = 1, 2, ..., k − 1,
C0(xk) ∩ C0(x′) ≠ ∅.

We then consider the open convex hull

C1(x) := IC[⋃_{x′∼1x} C0(x′)].

Note that x ∼1 x′ implies C1(x) = C1(x′). Unfortunately, the convex sets C1(x) do not determine the equivalence classes. In particular, they may not be mutually disjoint for elements that are not equivalent under ∼1.
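To see concretely how the sets C1(x) can fail to be disjoint, consider the following small planar sketch (a hypothetical toy, not taken from the thesis; the rectangles and points are invented for illustration). Two C0-sets chain under ∼1, and the open convex hull of their union then meets a third C0-set that intersects neither of the first two:

```python
def rect_overlap(a, b):
    # open axis-aligned rectangles given as (xmin, xmax, ymin, ymax)
    ax0, ax1, ay0, ay1 = a
    bx0, bx1, by0, by1 = b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1

def in_convex_polygon(pt, verts):
    # strict interior test; vertices listed counter-clockwise
    x, y = pt
    n = len(verts)
    for i in range(n):
        x0, y0 = verts[i]
        x1, y1 = verts[(i + 1) % n]
        if (x1 - x0) * (y - y0) - (y1 - y0) * (x - x0) <= 0:
            return False
    return True

S1 = (0, 1, 0, 3)            # C0(x1): a vertical strip
S2 = (0, 3, 0, 1)            # C0(x2): a horizontal strip; S1, S2 overlap, so x1 ~1 x2
S3 = (1.7, 2.1, 1.7, 2.1)    # C0(x3): disjoint from both strips, so x3 is not ~1 x1
assert rect_overlap(S1, S2)
assert not rect_overlap(S1, S3) and not rect_overlap(S2, S3)

# vertices of conv(S1 ∪ S2), i.e. of C1(x1), listed counter-clockwise
hull = [(0, 0), (3, 0), (3, 1), (1, 3), (0, 3)]
# the corner filled in by the hull meets C0(x3): C1(x1) ∩ C1(x3) ≠ ∅ although x1 and x3
# are not ~1-equivalent, which is exactly what forces the passage to ~2 below
assert in_convex_polygon((1.9, 1.9), hull)
```

The convex hull picks up the corner region between the two strips, so the C1-sets intersect without the underlying points being ∼1-equivalent; this is the effect that the iteration below is designed to absorb.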
So, we proceed to define ∼2 in a similar way: x ∼2 x′ if there exist finitely many x1, ..., xk in X such that the following chain condition holds:

C1(x) ∩ C1(x1) ≠ ∅,
C1(xi) ∩ C1(xi+1) ≠ ∅  ∀i = 1, 2, ..., k − 1,
C1(xk) ∩ C1(x′) ≠ ∅;

and we set

C2(x) := IC[⋃_{x′∼2x} C1(x′)].

Again, ∼2 is an equivalence relation, and one can easily see that
• x ∼1 x′ ⇒ x ∼2 x′;
• x ∼2 x′ ⇒ C2(x) = C2(x′);
• C1(x) ⊆ C2(x).

But still, the sets C2(x) may not be mutually disjoint for non-equivalent x's. We continue inductively in a similar fashion, defining equivalence relations ∼n for n = 1, 2, ... and their corresponding classes

Cn(x) := IC[⋃_{x′∼nx} Cn−1(x′)].

It is easy to check that we have the following properties for each n:

x ∼n x′ ⇒ x ∼n+1 x′,
x ∼n x′ ⇒ Cn(x) = Cn(x′),
Cn(x) ⊆ Cn+1(x).

Finally, define the equivalence relation

x ∼ x′ if x ∼n x′ for some n,

and its corresponding convex sets

C(x) := lim_{n→∞} Cn(x) = ⋃_{n=0}^∞ Cn(x).   (2.4.1)

We now show that Φ = {C(x)}x∈X is an irreducible convex paving for Γ.

Theorem 2.4.1. The canonical relation ∼ on XΓ and the components (C(x))x∈XΓ satisfy the following:
1. x ∼ x′ ⇒ C(x) = C(x′), and x ≁ x′ ⇒ C(x) ∩ C(x′) = ∅.
2. The C(x) are mutually disjoint; that is, either C(x) = C(x′) or C(x) ∩ C(x′) = ∅.
3. x′ ∈ X ∩ C(x) if and only if x′ ∼ x.
4. Φ = {C(x)}x∈X is an irreducible convex paving for Γ.
5. Cn(x) = IC[⋃_{x′∼nx} Γx′] for n ≥ 0, and C(x) = IC[⋃_{x′∼x} Γx′].

Proof. The fact that x ∼n x′ ⇒ Cn(x) = Cn(x′) gives the first part of (1). If there exists z ∈ C(x) ∩ C(x′), then there is N such that z ∈ CN(x) ∩ CN(x′), implying x ∼N+1 x′ and verifying the second part of (1), of which (2) and (3) are obvious consequences.

To prove (4), let Ψ be any convex paving of Γ and let z, x ∈ XΓ, D ∈ Ψ be such that C(z) ∩ D(x) ≠ ∅. We must show that C(z) ⊆ D(x). We claim that, for any n ≥ 0,

(∗) Cn(z) ∩ D(x) ≠ ∅ ⇒ Cn(z) ⊆ D(x), for every z, x ∈ XΓ.

Indeed, it is true for n = 0 by definition. Assume that (∗) is true for some n, and suppose Cn+1(z) ∩ D(x) ≠ ∅.
Note that Cn(z) ⊆ D(z), and so if w ∼n+1 z, by (∗) we have Cn(w) ⊆ D(z). As Cn+1(z) = IC[⋃_{w∼n+1z} Cn(w)], this readily implies Cn+1(z) ⊆ D(z); but then D(z) ∩ D(x) ≠ ∅ and hence D(z) = D(x). This proves (∗) for every n ≥ 0. Now if C(z) ∩ D(x) ≠ ∅, then Cn(z) ∩ D(x) ≠ ∅ for all large n; hence by (∗) we get Cn(z) ⊆ D(x). Therefore C(z) ⊆ D(x), which proves the irreducibility of Φ.

For (5), let (Ai)i∈I be any family of sets in Rd, where I is an index set. Then it is easy to see that

IC(Ai) = IC(CC(Ai)), and
CC(⋃_{i∈I} CC(Ai)) = CC(⋃_{i∈I} Ai) = CC(⋃_{i∈I} IC(Ai)).

But note that A ⊆ B does not imply IC(A) ⊆ IC(B) in general. The above implies in particular

IC(⋃_{i∈I} Ai) = IC(⋃_{i∈I} IC(Ai)).

In addition, a simple induction shows that for every n ≥ 0 we have

Cn(x) = IC[⋃_{x′∼nx} Γx′].

Indeed, it is true for n = 0 by definition. Suppose Cn(x) = IC[⋃_{x′∼nx} Γx′]. Then by definition,

Cn+1(x) = IC[⋃_{x′∼n+1x} Cn(x′)] = IC[⋃_{x′∼n+1x} IC(⋃_{x″∼nx′} Γx″)] = IC[⋃_{x′∼n+1x} (⋃_{x″∼nx′} Γx″)].

But ⋃_{x′∼n+1x} ⋃_{x″∼nx′} Γx″ = ⋃_{x′∼n+1x} Γx′; hence Cn+1(x) = IC(⋃_{x′∼n+1x} Γx′), completing the induction.

Finally, we proceed as follows:

C(x) = IC[C(x)] = IC[⋃_{n≥0} Cn(x)] = IC[⋃_{n≥0} IC(⋃_{x′∼nx} Γx′)] = IC[⋃_{n≥0} ⋃_{x′∼nx} Γx′] = IC[⋃_{x′∼x} Γx′],

which completes the proof of (5) and of the theorem. □

2.4.2 Duality attainment on each irreducible component

Let c : Rd × Rd → R be a cost function on which we make no assumptions. Our aim is to prove Theorem 2.2.11, which will follow once we prove the following lemma.

Lemma 2.4.2. Let Γ ∈ SMT and denote X := XΓ. Fix x0 ∈ X and set G := Γ ∩ (C(x0) × Rd), where C(x0) is the convex set corresponding to the equivalence class of x0 given in Section 2.4.1. Then, for each y ∈ YG, there exists a compact interval Ky ⊆ R such that any finite subset H ⊆ G has a c-dual triplet (α, β, γ) with β(y) ∈ Ky for all y ∈ YH.

Note that the above lemma provides a uniform estimate on the way finite subsets of G admit c-duals.
This control on the β component of the c-dual triplets will allow us to use Tychonoff's compactness theorem to deduce that G has a c-dual.

To prove Lemma 2.4.2, we first give an idea of the degrees of freedom we have in choosing β. First, note that if β is a c-dual for G and L : Rd → R is an affine function, then β − L is also a c-dual for G. Letting m = dim(V(YG)), we can find {y0, ..., ym} ⊆ YG such that V({y0, ..., ym}) = V(YG), i.e. {y0, ..., ym} constitute the vertices of an m-dimensional polytope in V(YG). Now, for a given c-dual function β for G, let L : V(YG) → R be the affine function determined by L(yi) = β(yi) for i = 0, 1, ..., m. The function β′ := β − L then satisfies β′(yi) = 0 for all i = 0, 1, ..., m, which means that we have m + 1 degrees of freedom in the values of β. In other words, if we set Kyi = {0} for i = 0, 1, ..., m, then we can find β′ such that β′(yi) ∈ Kyi for each yi. Now we want to observe how the initial values of β control its values at other points y. We shall see that, thanks to the duality relations (2.2.2) and (2.2.3), the control of the value of β propagates well along a given chain inside the equivalence class C(x0).

The proof of Lemma 2.4.2 is involved and requires several key steps. To clarify the idea, we first consider the special case c = 0, where we can establish complete control of the dual functions.

Lemma 2.4.3. Let G ∈ SMT and assume that (α, β, γ) is a 0-dual triplet associated to the set G. In other words,

β(y) ≥ Lx(y)  ∀x ∈ XG, y ∈ YG,   (2.4.2)

β(y) = Lx(y)  ∀(x, y) ∈ G,   (2.4.3)

where, for each x, Lx is the affine function

Lx(y) := γ(x) · (y − x) + α(x).

Then Lx = Lx′ on V(C(x)) whenever x ∼ x′.

Note that (2.4.3) says that if we have control on Lx, then we have control on β for all y ∈ Gx.
In particular, Lemma 2.4.3 implies that if Lx = 0 (we can choose such an Lx without loss of generality), then Lx′ = 0 on V(C(x)) for all x′ ∈ C(x); thus α(x′) = 0 for all x′ ∈ C(x) and β(y) = 0 at each y ∈ Gx′.

The above lemma is a consequence of the following proposition.

Proposition 2.4.4. Let L1, L2 be two affine functions on Rd, and let S1, S2 be sets in Rd. Suppose that L1 ≤ L2 on S1, L2 ≤ L1 on S2, and IC(S1) ∩ IC(S2) ≠ ∅. Then L1 = L2 on V(S1 ∪ S2), the minimal affine space containing the sets S1 and S2.

Proof. This follows from two facts:
1. For affine functions, L ≤ L′ on a set S implies L ≤ L′ on conv(S).
2. If two affine functions L, L′ satisfy L ≤ L′ on a set S and, moreover, L(z) = L′(z) at some interior point z of conv(S), then L = L′ on conv(S), and thus on V(S).

Indeed, apply (1) to the case L = L1, L′ = L2, S = S1, and also to the case L = L2, L′ = L1, S = S2. We get L1 = L2 on conv(S1) ∩ conv(S2). Now, by assumption, IC(S1) ∩ IC(S2) ≠ ∅, and obviously IC(S1) ∩ IC(S2) ⊆ IC(Si), i = 1, 2. Using (2), we then get that L1 = L2 on both conv(Si), i = 1, 2. From this the assertion follows. □

Proof of Lemma 2.4.3. First note that, for each x, x′ ∈ X = XG, conditions (2.4.2) and (2.4.3) yield that Lx′ ≤ Lx on Gx, and Lx ≤ Lx′ on Gx′.

Now, to prove the lemma, it suffices to show that, for each n ∈ {0, 1, 2, ...},

if x ∼n x′ then Lx = Lx′ on V(Cn(x)).   (2.4.4)

Here, by x ∼0 x′ we mean x = x′. We proceed inductively. Our induction hypothesis is (2.4.4) together with

Lz ≤ Lx on Cn(x), for each z ∈ X and x ∈ X.   (2.4.5)

For n = 0, (2.4.4) is trivially satisfied and (2.4.5) follows from (2.4.2) and (2.4.3). Now assume that (2.4.4) and (2.4.5) hold for all n ≤ k. For n = k + 1, if x ∼k+1 x′, then there are x = x0, x1, ..., xm = x′, for some m, such that Ck(xi) ∩ Ck(xi+1) ≠ ∅ for each 0 ≤ i ≤ m − 1.
From this, and using (2.4.4) and (2.4.5), we can apply Proposition 2.4.4 with the choice L1 = Lxi, L2 = Lxi+1, S1 = Ck(xi), S2 = Ck(xi+1), and see that Lxi = Lxi+1 on V(Ck(xi) ∪ Ck(xi+1)), for i = 0, ..., m − 1. Similarly, repeated application of Proposition 2.4.4 eventually yields that Lx = Lxi = Lx′ on each Ck(xi), for i = 1, ..., m. Therefore, Lx = Lx′ on ⋃i Ck(xi), and thus on V(⋃_{i=0}^m Ck(xi)). This holds for any x ∼k+1 x′; applying the result to all z ∼k+1 x ∼k+1 x′, we also see that

Lx = Lx′ on V(⋃_{z∼k+1x} Ck(z)) = V(Ck+1(x)),

verifying (2.4.4) for n = k + 1. As for (2.4.5): for each z ∈ X, from the assumption (2.4.5) for n ≤ k and applying (2.4.4), we have Lz ≤ Lx′ = Lx on Ck(x′) for all x′ ∼k+1 x. For affine functions, this implies Lz ≤ Lx on Ck+1(x). This completes the induction argument, and so the proof. □

We now consider the case of a non-trivial cost c. We first establish a more quantitative version of Proposition 2.4.4.

Proposition 2.4.5. Let L1, L2 be two affine functions on Rd, and let S1, S2 be sets in Rd. Suppose that
• L1 ≤ L2 + δ1 on S1, and L2 ≤ L1 + δ2 on S2, for some constants δ1, δ2 > 0;
• there is a point z in IC(S1) ∩ IC(S2).
Then |L1 − L2| ≤ C on conv(S1 ∪ S2). Here, C = C(z, S1, S2, δ1, δ2) < ∞ as long as z stays in the interior IC(S1) ∩ IC(S2), though as z gets close to a boundary ∂(conv(Si)), i = 1 or 2, the constant C may go to +∞.

Proof. First, convexity and linearity imply that, for each δ, δ′ > 0, we have the following:
1. For affine functions, L ≤ L′ + δ on a set S implies L ≤ L′ + δ on conv(S).
2. If two affine functions L, L′ satisfy L ≤ L′ + δ on a set S and, moreover, L(z) ≥ L′(z) − δ′ at some interior point z of conv(S), then |L − L′| ≤ C = C(z, S, δ, δ′) on conv(S).
Here, the constant C < ∞ depends only on δ, δ′ and the ratio between the minimum distance from z to ∂(conv(S)) and the maximum distance to ∂(conv(S)); as z gets close to ∂(conv(S)), the constant C can go to +∞.

Now, apply (1) to the case L = L1, L′ = L2, S = S1, and also to the case L = L2, L′ = L1, S = S2. Thus we get |L1 − L2| ≤ max(δ1, δ2) at the point z of IC(S1) ∩ IC(S2). Now apply (2) to get |L1 − L2| ≤ C on both conv(Si), i = 1, 2, where C = C(z, S1, S2, δ1, δ2) < ∞. Applying (1) again, we have |L1 − L2| ≤ C on conv(S1 ∪ S2), completing the proof. □

We now introduce the following notation.

Definition 2.4.6. Let G ∈ SMT and let H ⊆ G. Given any triplet {α, β, γ} that forms a c-dual for H, we define, for each x ∈ XH, the affine function

L^H_x(y) = γ(x) · (y − x) + α(x).

The superscript H indicates that L^H_x arises from a c-dual of H. The c-duality conditions satisfied by α, β, γ on H can be written as

β(y) − c(x, y) ≥ L^H_x(y)  ∀x ∈ XH, y ∈ YH,   (2.4.6)

β(y) − c(x, y) = L^H_x(y)  ∀(x, y) ∈ H.   (2.4.7)

For an affine space V, we write, for x, x′ ∈ XH,

L^H_x ≈ L^H_{x′} on V

if there is a bounded set S with V = V(S) and a constant M = M(c, H, S), depending only on H, the cost function c and the set S, such that for every choice of c-dual {α, β, γ} for H we have

|L^H_x − L^H_{x′}| ≤ M on the set S.   (2.4.8)

We say L^H_x ≈ L^H_{x′} at z if (2.4.8) holds for S = {z}.

An immediate observation is that, for x, x′, x″ ∈ XH,

whenever L^H_x ≈ L^H_{x′} and L^H_{x′} ≈ L^H_{x″} on V, then L^H_x ≈ L^H_{x″} on V.

Also, note that if H′ ⊆ H, any c-dual for H is also a c-dual for H′; hence, for any x, x′ ∈ XH′, we necessarily have

L^{H′}_x ≈ L^{H′}_{x′} on V ⇒ L^H_x ≈ L^H_{x′} on V.   (2.4.9)

We shall now prove the analogue of Lemma 2.4.3 in the case of a general cost. We shall again use Proposition 2.4.5 to establish a propagation of control on the affine functions L^H_x along an ordered chain of intersecting convex open sets.
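The quantitative stabilization expressed by Proposition 2.4.5 can be illustrated numerically in one dimension (a hypothetical sketch, not from the thesis; the sets S1 = [0, 2], S2 = [1, 3] and the bound 3δ are chosen for this toy only). Since d = L1 − L2 is affine, the two δ-inequalities reduce to endpoint constraints, and every admissible d stays uniformly bounded on conv(S1 ∪ S2) = [0, 3]:

```python
import random

random.seed(1)
delta = 0.5
worst = 0.0
for _ in range(20000):
    a = random.uniform(-10, 10)   # slope of d = L1 - L2
    b = random.uniform(-10, 10)   # intercept of d
    d = lambda t: a * t + b
    # admissibility: L1 <= L2 + delta on S1 = [0, 2] and L2 <= L1 + delta on S2 = [1, 3];
    # for an affine d these hold iff they hold at the endpoints of each interval
    if d(0) <= delta and d(2) <= delta and d(1) >= -delta and d(3) >= -delta:
        # |d| on conv(S1 ∪ S2) = [0, 3] is attained at an endpoint
        worst = max(worst, abs(d(0)), abs(d(3)))

# by hand: a = d(2) - d(1) <= 2*delta, so d(3) = d(2) + a <= 3*delta, and symmetrically
# d(0) >= -3*delta; hence C = 3*delta is a valid uniform constant for this configuration
assert worst <= 3 * delta + 1e-9
```

The overlap IC(S1) ∩ IC(S2) = (1, 2) is what couples the two constraints; shrinking it (say, taking S2 = [1.99, 3]) permits much steeper admissible slopes and hence a much larger constant C, matching the blow-up behaviour described in the proposition.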
But, since c is not trivial anymore, the control on L^H_x can be achieved only in finitely many steps, since the errors (the constant C in Proposition 2.4.5) can accumulate.

Lemma 2.4.7. Set G := Γ ∩ (C(x0) × Rd) and suppose x, x′ ∈ XG (i.e. x ∼ x′). Then there exists a finite set H ⊆ G such that x, x′ ∈ XH and L^H_x ≈ L^H_{x′} on V(C(x)).

Proof. First observe that it suffices to prove:

Claim 2.4.8. Suppose x ∼ x′ and z ∈ Gx′. Then there exists a finite set H ⊆ G such that x, x′ ∈ XH and L^H_x ≈ L^H_{x′} at z.

The lemma follows when we apply this claim to a set of finitely many (ui, vi), i = 1, ..., m, in G (so x ∼ x′ ∼ ui) with V(C(x)) = V({vi}_{i=1}^m), and use (2.4.9).

We prove the claim by induction on n = 0, 1, 2, 3, .... Our induction hypothesis is: if x ∼n x′ and z ∈ Gx′, then there exists a finite set H ⊆ G such that

(i1) x, x′ ∈ XH, z ∈ YH and YH ⊆ Cn(x);
(i2) L^H_x ≈ L^H_{x′} on V(YH);
(i3) for each finite set F ⊆ G with H ⊆ F, and for w ∈ XF, there is a constant C = C(H, w), depending only on H and w, such that

L^F_w ≤ L^F_x + C on YH.

Notice that the claim follows once we verify (i1)–(i3) for each n, since, in particular, the values of L^H_x and L^H_{x′} at z are estimated by (i2).

Let us start the induction. For the case n = 0, assume x ∼0 x′ (i.e. x = x′) and z ∈ Gx′. Choose H = {(x, z)}. Then (i1) and (i2) are trivially satisfied. Moreover, (i3) holds by (2.4.6) and (2.4.7), where the constant C is estimated by the value c(w, z) − c(x, z). This completes the case n = 0.

Suppose the induction hypothesis holds for n. Assume x ∼n+1 x′ and z ∈ Gx′. Then there is a finite chain of sets Cn(xk), k = 0, 1, ..., m, with x0 = x, xm = x′, such that Cn(xk) ∩ Cn(xk+1) ≠ ∅ for each k. Recall Cn(x) = IC[⋃_{x′∼nx} Gx′]. Thus, for each Cn(xk), it is possible to find a finite set Jk := {(u^i_k, v^i_k)}_{i=1}^{mk} ⊆ G such that xk ∼n u^i_k for all i and IC(YJk) is a good approximation of Cn(xk), i.e. YJk ⊆ Cn(xk) ⊆ V(YJk) and IC(YJk) ∩ IC(YJk+1) ≠ ∅.
Also, we may arrange that z ∈ YJm.

Now apply the induction hypothesis for n to each pair xk, u^i_k (with xk ∼n u^i_k) and find a finite set H^i_k ⊆ G that satisfies (i1)–(i3) for x = xk, x′ = u^i_k, z = v^i_k, and H = H^i_k. Let

Hk := ⋃_i H^i_k.

Then, from (i3) for the H^i_k, we also have (i3) for H = Hk and x = xk. The point of considering Hk is that YJk ⊆ YHk ⊆ Cn(xk), so V(YJk) = V(YHk); hence IC(YHk) ∩ IC(YHk+1) ≠ ∅ as well.

Now, to verify the induction hypothesis for step n + 1, let

H̄ := ⋃_k Hk.

We will show properties (i1)–(i3) for this set H̄. From the construction, x, x′ ∈ XH̄, z ∈ YH̄, and since Cn(xk) ⊆ Cn+1(xk) = Cn+1(x), (i1) readily follows. For (i2), apply the induction hypothesis (i3) for the Hk to Proposition 2.4.5 iteratively: for the pairs YH1 and YH2; then YH1 ∪ YH2 and YH3; ...; then YH1 ∪ ... ∪ YHk and YHk+1; and so on. Then we see that the estimate (2.4.8) holds for S = YH̄; thus

L^H̄_x ≈ L^H̄_{x1} ≈ ... ≈ L^H̄_{xm−1} ≈ L^H̄_{x′} on V(YH̄),   (2.4.10)

verifying (i2).

For (i3), let F be a finite set containing H̄ and let w ∈ XF. Then (i3) for each Hk gives that L^F_w ≤ L^F_{xk} + Ck on YHk, k = 0, 1, ..., m. Now, applying (2.4.10) and recalling (2.4.9), we conclude that there is a constant C = C(H̄, w) such that

L^F_w ≤ L^F_x + C on YH̄.

This completes the induction, and the proof. □

Armed with Lemma 2.4.7, we can now establish Lemma 2.4.2.

Proof of Lemma 2.4.2. Recall that we fixed x0 ∈ X and let G := Γ ∩ (C(x0) × Rd) and V := V(YG). Let m = dim(V). Then we can find

J := {(ui, vi)}_{i=0}^m ⊆ G such that V({vi}_{i=0}^m) = V.

Define the initial choices Kvi = {0}, i = 0, 1, ..., m. We want to define the Ky's to be compatible with these initial choices. For y ∈ YG, choose x(y) ∈ XG such that (x(y), y) ∈ G.
By Lemma 2.4.7 (see especially Claim 2.4.8), for y ∈ YG we can choose a finite set H(y) such that J ∪ {(x(y), y)} ⊆ H(y) and

L^{H(y)}_{x(y)} ≈ L^{H(y)}_{ui} at vi, ∀i = 0, ..., m.

In particular, there exists a constant M, depending only on y and H(y) but not on the choice of dual functions for H(y), such that

|L^{H(y)}_{x(y)}(vi) − L^{H(y)}_{ui}(vi)| ≤ M, ∀i = 0, ..., m.   (2.4.11)

Let (α, β, γ) be a c-dual triplet for the set H(y); it exists because Γ, and thus G, is finitely c-dualizable. By subtracting an appropriate affine function from β, we may assume β(vi) = 0 for all i. Duality then gives

β(y) = c(x(y), y) + L^{H(y)}_{x(y)}(y).

Since L^{H(y)}_{x(y)} is affine and V({vi}_{i=0}^m) = V, the value L^{H(y)}_{x(y)}(y) can be computed from the values L^{H(y)}_{x(y)}(vi). Hence, by (2.4.11), the values L^{H(y)}_{ui}(vi) give an estimate of β(y). Notice that from duality we have

L^{H(y)}_{ui}(vi) = β(vi) − c(ui, vi) = −c(ui, vi).

Thus, there exists a constant N = N(y) such that if β is a c-dual for H(y) with β(vi) = 0 for all i, then −N ≤ β(y) ≤ N. We set Ky = [−N, N].

To get the claim in Lemma 2.4.2, let H be any finite set and denote YH = {y1, ..., ys}. Let

H∗ = H ∪ H(y1) ∪ ... ∪ H(ys).

Now choose β to be a c-dual for H∗ with β(vi) = 0 for all i. Since β is also a c-dual for each H(yj), we have β(yj) ∈ Kyj for all j = 1, ..., s. Finally, note that β is also a c-dual for H, concluding the proof. □

2.4.3 Proof of Theorem 2.2.11

As before, let G = Γ ∩ (C(x0) × Rd) and let V := V(YG) be the ambient space. We first find the desired function β : YG → R from the compactness argument already used in [5]. Indeed, define K := Π_{y∈YG} Ky, where the Ky were obtained in Lemma 2.4.2. This is a subset of the space of all functions from YG to R. In the topology of pointwise convergence, K is compact by Tychonoff's theorem. Now we claim that, for any finite H ⊆ G, the set

ΨH := {β ∈ K : β is a c-dual for H}

is a non-empty closed subset of K.
Indeed, that ΨH is non-empty follows from Lemma 2.4.2, since Γ, and hence G, is finitely c-dualizable: if necessary, one can extend the β found in Lemma 2.4.2 (originally defined on YH) to YG by simply setting β(y) = 0 for y ∉ YH.

To show that ΨH is closed, let {βn} be a sequence of c-dual functions for H, and suppose βn → β pointwise on YG. We need to show that β is also a c-dual for H. For each n, we have functions (αn, γn) such that the duality relations hold:

βn(y) − c(x, y) ≥ γn(x) · (y − x) + αn(x)  ∀x ∈ XH, y ∈ YH,   (2.4.12)

βn(y) − c(x, y) = γn(x) · (y − x) + αn(x)  ∀(x, y) ∈ H.   (2.4.13)

Here, without loss of generality, we may assume that each vector γn(x) is parallel to V(YH). Now, since (βn(y) − c(x, y))_{x∈XH, y∈YH} is uniformly bounded in n, we can choose (αn(x), γn(x)) in such a way that (αn(x), γn(x))n is also uniformly bounded in n. Since XH is finite, we can find a subsequence of (αn, γn) which converges to some (α, γ) at every x ∈ XH. Then (α, β, γ) is clearly a c-dual triplet for H, establishing the claim on ΨH.

Now, for any T ⊆ S, any c-dual for S is also a c-dual for T. We therefore have the finite intersection property for {ΨH}, i.e. ∅ ≠ ΨH1∪...∪Hs ⊆ ⋂_{j=1,...,s} ΨHj. By the compactness of K and the closedness of the ΨH, we deduce that the set ΨG := ⋂_{H⊆G, |H|<∞} ΨH is nonempty.

We claim that any β ∈ ΨG is a c-dual for G. Fix x ∈ XG and β ∈ ΨG. We must show that there exists an affine function Lx on V = V(YG) such that the duality relations hold:

β(y) − c(x, y) ≥ Lx(y)  ∀y ∈ YG,   (2.4.14)

β(y) − c(x, y) = Lx(y)  ∀y ∈ Gx.   (2.4.15)

Choose a finite set Hx ⊆ Gx such that V(Hx) = V(Gx). Observe that, for any finite set F containing H := {x} × Hx,

L^F_x(y) = β(y) − c(x, y) = L^H_x(y)  ∀y ∈ Hx; hence L^F_x(y) = L^H_x(y)  ∀y ∈ V(Gx).

In particular, L^F_x(x) = L^H_x(x), since x ∈ IC(Gx) ⊆ V(Gx). Let us define α(x) = L^H_x(x).

Now we need to construct the last piece, namely γ(x).
For this, in addition to H, we also choose a finite set {(vi, wi)}_{i=1}^m ⊆ G such that x ∈ IC({wi}_{i=1}^m) and V({wi}_{i=1}^m) = V, and define

H̄ := H ∪ {(vi, wi)}_{i=1}^m.

For any finite set F ⊆ G with H̄ ⊆ F, define, according to the duality, the set

γF(x) := {v ∈ V : β(y) − c(x, y) − α(x) ≥ v · (y − x) ∀y ∈ YF,   (2.4.16)
          β(y) − c(x, y) − α(x) = v · (y − x) ∀y ∈ Fx}.   (2.4.17)

The set γF(x) is nonempty because of the finite c-dualizability of G. Now, since x ∈ IC({wi}_{i=1}^m) and V({wi}_{i=1}^m) = V, we deduce from (2.4.16) that γF(x) is a closed and bounded subset of V, hence compact. Again, since every c-dual for S is also a c-dual for T whenever T ⊆ S, the family {γF(x) : F ⊇ H̄} has the finite intersection property. Hence, we can choose

γ(x) ∈ ⋂_{F⊇H̄, |F|<∞} γF(x).

Finally, we show that (2.4.14) and (2.4.15) hold for this choice of (α(x), γ(x)). Indeed, let (x′, y′) ∈ G, and let F = H̄ ∪ {(x′, y′)}. By (2.4.16), we have β(y′) − c(x, y′) ≥ α(x) + γ(x) · (y′ − x), so (2.4.14) holds. Next, let y ∈ Gx and let F = H̄ ∪ {(x, y)}. By (2.4.17), we have β(y) − c(x, y) = α(x) + γ(x) · (y − x), so (2.4.15) holds. This completes the proof of Theorem 2.2.11. □

2.4.4 The disintegration of a martingale transport plan

For a closed convex set U ⊆ Rd, let K(U) be the space of all closed convex subsets in Rd, equipped with the Hausdorff metric, in such a way that it becomes a separable complete metric space (Polish space). This allows for the disintegration of a measure π on X via a measurable map T : X → K(U) (see e.g. [7, Corollary 2.4]) in such a way that each piece of the disintegrated measure, say πC, is a probability measure on T⁻¹(C). In particular, πC(T⁻¹(C)) = 1 for T#π-a.e. C ∈ K(U), ultimately yielding conditional probabilities.

Proof of Theorem 2.2.14. The above discussion and the measurability hypothesis on the map Ξ : Γ → K(Rd) defined by (x, y) ↦ C(x) yield the disintegration of π into πC dπ̃(C) in (2.2.4), with πC supported on ΓC. The measures µC, νC are obtained by taking the marginals of πC.
The martingale and optimality properties of πC, for π̃-a.e. C, follow from the corresponding properties of π and the disintegration (2.2.4). For the optimal martingale transport π, the set ΓC admits a c-dual by Theorem 2.2.11, applied to the finitely c-dualizable Γ. This deals with items 1), 2), and 3) of the theorem. Finally, 4) follows immediately from Theorem 2.4.1. □

2.5 Structural results for general optimal martingale transport plans

One would like to know when µ can be disintegrated into absolutely continuous pieces µC, so as to apply Theorem 2.2.4 on each component of the partition. We start with a counterexample showing that this is not possible in general, at least in dimension d ≥ 3. We then proceed in Section 2.5.2 to prove Theorems 2.2.15 and 2.2.16, which give instances where the reasoning does apply.

2.5.1 Nikodym sets and martingale transports

Ambrosio, Kirchheim, and Pratelli [2] constructed a Nikodym set in R3 having full measure in the unit cube and intersecting each element of a family of pairwise disjoint open lines in only one point. More precisely, they showed the following:

Theorem 2.5.1 (Ambrosio, Kirchheim, and Pratelli [2]). There exist a Borel set MN ⊆ [−1, 1]3 with |[−1, 1]3 − MN| = 0 and a Borel map f = (f1, f2) : MN → [−2, 2]2 × [−2, 2]2 such that the following holds. If, for x ∈ MN, we define the open segment lx connecting (f1(x), −2) to (f2(x), 2), then
• {x} = lx ∩ MN for all x ∈ MN;
• lx ∩ ly = ∅ for all x ≠ y ∈ MN.

Example 2.5.2. One can use the above construction to produce an optimal martingale transport whose equivalence classes are singletons; hence the disintegration of the first marginal along the partition {C(x)} is the Dirac mass δx, which is obviously not absolutely continuous w.r.t.
L1.

Consider the obvious inequality (1/2ε)(|x − y| − ε)² ≥ 0, and its equivalent form

(1/2ε)|y|² ≥ |x − y| + (1/ε) x · (y − x) + (1/2ε)|x|² − ε/2.   (2.5.1)

Thus, letting α_ε(x) = (1/2ε)|x|² − ε/2, β_ε(y) = (1/2ε)|y|² and γ_ε(x) = (1/ε)x, (2.5.1) is a dual relation for the maximization problem; since equality holds in (2.5.1) exactly when |x − y| = ε, every martingale π_ε := (X, Y) with |X − Y| = ε a.s. is optimal with respect to its own marginals X ∼ µ and Y ∼ ν.

Now fix ε > 0 small and let X be a random variable whose distribution µ has uniform density on [−1, 1]^3. We define Y conditionally on X by evenly distributing the mass along the lines l_x of Theorem 2.5.1 at distance ε; that is, Y splits equally into two pieces from x ∈ M_N along l_x at distance ε. Then the martingale (X, Y) is optimal for the maximization problem. But note that in this case each equivalence class [x] is the singleton {x}, so the disintegration of µ along the partitions C(x) is the Dirac mass δ_x, which is obviously not absolutely continuous w.r.t. L1. Hence, the decomposition is not useful in this case. One also notices that the convex sets associated to the irreducible paving of the martingale (X, Y) have codimension 2. We leave it as an open problem whether one can do without assumption (2.2.5) in Theorem 2.2.15.

Remark 2.5.3. By letting ε → 0, the above problem approaches the one considered in Example 2.3.2, that is, the case when the marginals µ = ν are equal, the only martingale transport is the identity, and the value of the maximal cost is zero. On the other hand, note that ∫β_ε(y)dν(y) − ∫α_ε(x)dµ(x) = ε, which means that (α_ε, β_ε, γ_ε) is a minimizing sequence for the dual problem. But none of the sequences α_ε, β_ε, γ_ε converge (neither pointwise nor in L1). This is another manifestation of the non-existence of a dual optimizer in Example 2.3.2. This said, for the minimization problem, we have no example where duality is not attained.

2.5.2 The case where the convex paving has at most 1-codimensional components

We now establish Theorems 2.2.15 and 2.2.16.
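Before moving on, the dual relation (2.5.1) of Example 2.5.2 is easy to sanity-check numerically. The snippet below (my own check, not part of the thesis) takes α_ε(x) = |x|²/2ε − ε/2, the additive constant under which the dual gap equals (|x − y| − ε)²/2ε, and confirms that the gap is nonnegative and vanishes exactly when |x − y| = ε:

```python
import math
import random

EPS = 0.3  # the fixed distance |X - Y| = eps of Example 2.5.2

def dual_gap(x, y):
    # beta(y) - c(x,y) - alpha(x) - gamma(x).(y - x), which should equal
    # (|x - y| - EPS)^2 / (2 EPS) >= 0, with equality iff |x - y| = EPS
    c = math.dist(x, y)
    alpha = sum(v * v for v in x) / (2 * EPS) - EPS / 2
    beta = sum(v * v for v in y) / (2 * EPS)
    gamma_dot = sum(xi * (yi - xi) for xi, yi in zip(x, y)) / EPS
    return beta - c - alpha - gamma_dot

random.seed(0)
pts = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(60)]
min_gap = min(dual_gap(x, y) for x in pts for y in pts)

x0 = (0.2, -0.4, 0.7)
y0 = (0.2 + EPS, -0.4, 0.7)  # exactly EPS away from x0
eq_gap = dual_gap(x0, y0)
```

The equality case is what makes every martingale with |X − Y| = ε a.s. optimal for its own marginals.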
The following lemma describes a "flattening map" and shows that it is bi-Lipschitz.

Lemma 2.5.4. Let Rd = V × W, where V = Rd−1 and W = R. Let δ > 0 and let A be a subset of W. Suppose that for each h ∈ A there is a set D_h contained in a hyperplane H_h with H_h ∩ W = {(0, ..., 0, h)}. Suppose further that the {D_h}_{h∈A} are mutually disjoint and that the projection of each D_h onto V contains the ball B_R with center 0 and radius R in V. Finally, suppose that the angle between H_h and W is bounded: there is η < π/2 such that the normal direction of H_h and the direction of W make an angle less than η for every h ∈ A.

Now define the flattening map F : ∪_h D_h → Rd as follows: for x = (v, w) ∈ D_h, F(v, w) = (v, h). Then F is bi-Lipschitz on the set N := (∪_h D_h) ∩ (B_r × W), where r < R.

Proof. First, note that by the disjointness of the {D_h} the map F is injective, so F^{−1} is well-defined on the image. The lemma is intuitively clear: the map F cannot move two nearby points far apart, because the hyperdiscs {D_h} are disjoint.

From the bounded-angle assumption, F is clearly bi-Lipschitz on each D_h, with the same Lipschitz constant for all h ∈ A. Hence, for x1 = (v1, w1) and x2 = (v2, w2), we may assume that x1, x2 are contained in D_{h1}, D_{h2} respectively, with h1 ≠ h2.

We first consider the case v1 = v2 ∈ V with |v1| = |v2| ≤ r. Let L be the 1-dimensional subspace of V containing 0 and v1. Regarding D_{h1}, D_{h2} as affine functions on V, since their graphs on L ∩ B_R are disjoint and linear and r < R, it is clear that |w1 − w2| ≈ |h1 − h2|; i.e.,

C1|h1 − h2| ≤ |w1 − w2| ≤ C2|h1 − h2| for some C1, C2 > 0.   (2.5.2)

Next, we consider the case v1 ≠ v2. We want to show |x1 − x2| ≈ |F(x1) − F(x2)|, or equivalently, |w1 − w2| ≈ |h1 − h2|. Let L be the line in V through v1 and v2.
Regarding D_{h1}, D_{h2} as affine functions on V, since their graphs on L ∩ B_R are disjoint and linear, it is clear that

|w1 − w2| = |D_{h1}(v1) − D_{h2}(v2)| ≤ max(|D_{h1}(v1) − D_{h2}(v1)|, |D_{h1}(v2) − D_{h2}(v2)|).

But by (2.5.2), we have

max(|D_{h1}(v1) − D_{h2}(v1)|, |D_{h1}(v2) − D_{h2}(v2)|) ≤ C2|h1 − h2|,

which shows that F^{−1} is Lipschitz on F(N). On the other hand, by (2.5.2), we have

|h1 − h2| ≤ (1/C1) min(|D_{h1}(v1) − D_{h2}(v1)|, |D_{h1}(v2) − D_{h2}(v2)|),

and, again regarding D_{h1}, D_{h2} as disjoint linear graphs on L ∩ B_R, we have

min(|D_{h1}(v1) − D_{h2}(v1)|, |D_{h1}(v2) − D_{h2}(v2)|) ≤ |D_{h1}(v1) − D_{h2}(v2)| = |w1 − w2|,

which shows that F is Lipschitz on N. Therefore, the lemma follows.

Proof of Theorem 2.2.15. Write X = X_Γ = X0 ∪ X1, where

X0 = {x ∈ X : dim V(C(x)) = d},  X1 = {x ∈ X : dim V(C(x)) = d − 1}.

Note that X0 = ∪_{x∈X0}(X ∩ C(x)) = X ∩ (∪_{x∈X0} C(x)). Since ∪_{x∈X0} C(x) is open, X0 is measurable and so is X1 = X − X0. Considering a Lebesgue point x of X0, we get that Ld(X0 ∩ C(x)) > 0. But since Γ ∩ (C(x) × Rd) admits a dual, Theorem 2.2.4 yields that Γ_x ⊆ Ext(conv(Γ_x)) for a.e. x in X0 ∩ C(x). Since X0 can be approximated by compact sets from the inside and {C(x)}_{x∈X0} is an open cover of X0, we conclude that Γ_x ⊆ Ext(conv(Γ_x)) for a.e. x in X0.

Let X_δ := {x ∈ X1 : δ(x) > δ} for some δ > 0, let x0 be a Lebesgue point of X_δ, and consider W, the 1-dimensional affine space containing x0 and perpendicular to C(x0). Choose ε > 0 much smaller than δ and let X_{δ,ε} := X_δ ∩ B(x0, ε). Then {C(x) : x ∈ X_{δ,ε}} is a disjoint, (d − 1)-dimensional open convex cover of X_{δ,ε} with C(x) ∩ W ≠ ∅. Let F : ∪_{x∈X_{δ,ε}} C(x) → ∪_{x∈X_{δ,ε}} F(C(x)) be the flattening map with respect to W as in Lemma 2.5.4. Since F is bi-Lipschitz on the appropriate set containing X_{δ,ε}, we get Ld(F(X_{δ,ε})) > 0.

Now let E = {x ∈ X : Γ_x ⊄ Ext(conv(Γ_x))}.
Then, by Theorem 2.2.4, Γ_x ⊆ Ext(conv(Γ_x)) for Ld−1-almost all x in X_{δ,ε} ∩ C(x), hence Ld−1(E ∩ X_{δ,ε} ∩ C(x)) = 0 for each x ∈ X_{δ,ε}. Now {F(C(x)) : x ∈ X_{δ,ε}} is a parallel cover of F(X_{δ,ε}), so by Fubini's theorem applied through the bi-Lipschitz map F, we conclude that Ld(E ∩ X_{δ,ε}) = 0. Hence, as in the previous case, we conclude that Ld(E ∩ X_δ) = 0. Finally, letting δ → 0, we get Ld(E ∩ X1) = 0.

Proof of Corollary 2.2.16. For x ∈ U, consider the problem (2.1.1) with the source µ restricted to a small ball B with center x. By the separation assumption (2.2.6), we have δ(x) > c on X1 ∩ B for some c > 0, hence the same proof as for Theorem 2.2.15 yields the result.

2.5.3 The case of a discrete target

We conclude this section by considering the case of a discrete target. Note that this result is another affirmative sign towards Conjecture 2.

Theorem 2.5.5. Let c(x, y) = |x − y|, suppose µ ≪ Ld, and suppose that ν is discrete, i.e. ν is supported on a countable set. Then for µ-a.e. x, Γ_x consists of d + 1 points which are vertices of a polytope in Rd, and therefore the optimal solution is unique.

Proof. Since the result holds (for more general target measures) when d = 1, we shall assume that d ≥ 2. Let S be the countable support of ν and let J := {E ⊆ S : |E| < ∞ and dim V(E) ≤ d − 1}, where |E| is the cardinality of the set E. Consider V_J := ∪_{E∈J} V(E). Since dim V(E) ≤ d − 1 and J is countable, it follows that Ld(V_J) = 0. Let X := X_Γ \ V_J, so that µ(X) = 1. Now notice that if x ∈ X, then Γ_x must contain the vertices of a polytope which has x in its interior. Now let

K := {E ⊆ S : |E| = d + 2 and E contains the vertices of a d-dimensional polytope}.

Fix F = {y0, y1, ..., yd, y} in K, where y0, y1, ..., yd are vertices of a d-dimensional polytope, and consider the set A := {x ∈ X : F ⊆ Γ_x}. In other words, A = Γ^{y0} ∩ ... ∩ Γ^{yd} ∩ Γ^y, where Γ^y := {x : (x, y) ∈ Γ}.
We shall prove that µ(A) = 0. Indeed, suppose otherwise, that is, µ(A) > 0, and let x0 be a Lebesgue point of A. Let B = A ∩ C(x0) and note that Ld(B) > 0 since C(x0) is open in Rd. Since the set Γ ∩ (C(x0) × Rd) has a c-dual, there exist constants λ0, λ1, ..., λd, λ such that for all x ∈ B, we have

|x − yi| + γ(x) · (yi − x) + α(x) = λi,  i = 0, 1, ..., d,
|x − y| + γ(x) · (y − x) + α(x) = λ.

Also note that {y0, y1, ..., yd, y} ⊆ Ext(conv(Γ_x)) for almost all x ∈ B. Let the coefficients pi be determined by y = Σ_{i=0}^d pi yi and Σ_{i=0}^d pi = 1, and note that some pi may be negative. Then, by the above, the function

g(x) := Σ_{i=0}^d pi|x − yi| − |x − y|

is constant on B, which has positive measure.

We explain why this leads to a contradiction. First, notice that because g is real analytic in Ω := Rd \ {y0, ..., yd, y}, it cannot be constant on any open subset of Ω, since otherwise it would be constant everywhere, which is not the case. Second, without loss of generality assume x0 = 0 and g(0) = 0, and notice that by the real analyticity of g one can write g(x) = P_k(x) + Q(x) for some k ∈ N, where P_k(x) is the first nonzero k-th degree homogeneous polynomial, and Q(x) is a power series of terms of degree greater than k; in particular, Q(x) = O(|x|^{k+1}). Now consider the set

S := {u ∈ S^{d−1} : there exists 0 ≠ x_n → 0 with x_n/|x_n| → u and g(x_n) = 0}.

Then, for each u ∈ S, we have 0 = g(x_n)/|x_n|^k = P_k(x_n/|x_n|) + Q(x_n)/|x_n|^k, showing that P_k(u) = lim_{n→∞} P_k(x_n/|x_n|) = 0. Thus S is a subset of the zero set {u : P_k(u) = 0}. Now, if g vanishes on the set B of which x0 is a Lebesgue point, then S = S^{d−1}, hence P_k = 0, a contradiction. Hence µ(A) = 0. The countability of K now implies the theorem.

For the uniqueness, we use the usual argument, namely that the average of two optimal plans is also optimal, which contradicts the polytope type of their respective supports.

Remark 2.5.6. As we see from the above proof, Theorem 2.5.5 holds for a much more general cost c(x, y) than |x − y|.
Indeed, it is enough (but not necessary) that c(x, y) be analytic in {x ≠ y} and that the function g(x) = Σ_{i=0}^d pi c(x, yi) − c(x, y) be non-constant. In particular, we can choose c(x, y) = |x − y|^p with p ≠ 2.

2.6 Minimization problem under radially symmetric marginals

So far, we have studied the optimal martingale transport problem in higher dimensions, and we conjectured the following extremal property of minimizers.

Conjecture: Consider the cost function c(x, y) = |x − y| and assume that µ is absolutely continuous with respect to Lebesgue measure on Rd, and that µ ∧ ν = 0. If π is a martingale transport that minimizes (2.1.1), then there is a set Γ with π(Γ) = 1 such that for µ-almost every x, the fiber Γ_x := {y : (x, y) ∈ Γ} consists of k + 1 points that form the vertices of a k-dimensional polytope, where k := k(x) is the dimension of the linear span of Γ_x. Finally, the minimizing solution is unique.

We have seen that the conjecture is true when the source µ is absolutely continuous and the target ν is discrete. In this section, we show that the above conjecture is true in the minimization case when the marginals µ and ν are radially symmetric on Rd, and we determine the explicit structure of the minimizer. Many important probability distributions, e.g. the Gaussians, satisfy the radial symmetry assumption.

Theorem 2.6.1. Suppose that µ, ν are radially symmetric probability measures on Rd which are in convex order and µ({0}) = 0. Assume that either µ is absolutely continuous, or that there exists an open ball B_r = {x : |x| < r} such that µ(B_r) = 1 and ν(B_r) = 0. Then there is a unique minimizer π for the problem (2.1.1) with respect to the cost c(x, y) = |x − y|^p, where 0 < p ≤ 1, and for each x, the disintegration π_x is concentrated on the one-dimensional subspace L_x = {ax : a ∈ R}.
Furthermore, if µ is absolutely continuous with respect to Lebesgue measure and µ ∧ ν = 0, then π_x is supported at two points, hence verifying the conjecture.

The organization of this section is as follows. In Section 2.6.1, we introduce the monotonicity principle [5] and establish the stability of the common marginal µ ∧ ν under every minimizer of (2.1.1). In Section 2.6.2, we further apply the monotonicity principle to determine the structure of the minimizer in one dimension. Finally, in Section 2.6.3, we establish the variational lemma and the main theorem.

2.6.1 Monotonicity principle and stability of µ ∧ ν under every minimizer

An important basic tool in optimal transport is the notion of c-cyclical monotonicity. A parallel statement was given in [5].

Definition 2.6.2. Let ϕ be a finite measure supported on a finite set H ⊆ Rd × Rd. Let X_H be the orthogonal projection of H onto the first coordinate space Rd. Then we say that ψ is a competitor of ϕ if ψ has the same marginals as ϕ and, for each x ∈ X_H,

∫_{Rd} y dϕ(x, y) = ∫_{Rd} y dψ(x, y).

Lemma 2.6.3 (Monotonicity principle [5], [39]). Assume that µ, ν are probability measures in convex order and that c : Rd × Rd → R is a Borel measurable cost function. Assume that π ∈ MT(µ, ν) is an optimal martingale transport plan which leads to finite cost.
Then there exists a Borel set Γ ⊆ Rd × Rd with π(Γ) = 1 such that the following monotonicity principle holds: if ϕ is a finite measure on a finite set H ⊆ Γ, then for every competitor ψ of ϕ,

∫ c dϕ ≤ ∫ c dψ.

The meaning of the monotonicity principle is clear: spt(ϕ) ⊆ Γ means that ϕ is a "subplan" of the full transport plan π, and the definition of competitor means that if we replace the subplan ϕ by ψ, then the martingale structure of π is not disrupted. Now, if we had ∫ c dϕ > ∫ c dψ, then we could modify π to have ψ as its subplan, achieving less cost; therefore the current plan π would not be a minimizer.

We recall the notation introduced in [5]: for a set Γ ⊆ Rd × Rd, we write X_Γ := proj_X Γ and Y_Γ := proj_Y Γ, i.e. X_Γ is the projection of Γ onto the first coordinate space Rd, and Y_Γ onto the second. For each x ∈ Rd, we let Γ_x = {y ∈ Rd : (x, y) ∈ Γ}.

Now, as an application of the monotonicity principle, we prove the stability of µ ∧ ν under every optimal martingale transport. [5] discusses the following theorem in the one-dimensional setting with p = 1. We prove it here in general dimension for every 0 < p ≤ 1. Note that radial symmetry of µ, ν is not assumed.

Theorem 2.6.4. Let π be any minimizer of the problem (2.1.1) with cost c(x, y) = |x − y|^p, 0 < p ≤ 1. Then under π the common mass µ ∧ ν stays put, in the sense that if we define D : Rd → Rd × Rd by D(x) = (x, x), then the push-forward measure of µ ∧ ν by the map D is dominated by π, i.e. D_#(µ ∧ ν) ≤ π.

Proof. Suppose the theorem is false. Then there exists a minimizer π such that D_#(µ ∧ ν) ≰ π. Let Γ be a monotone concentration set of π as in Lemma 2.6.3, and let (π_x)_{x∈Rd} be its disintegration.
Since D_#(µ ∧ ν) ≰ π, we can find a point x ∈ spt(µ ∧ ν) such that π_x is not the Dirac mass δ_x and (z, x) ∈ Γ for some z ≠ x. Then, as π_x(Γ_x) = 1, we can find a probability measure ψ_x such that ψ_x ≠ δ_x, ψ_x is supported on a finite subset of Γ_x, and ψ_x has its barycenter at x.

Now, for every 0 < p ≤ 1 and z ∈ Rd, we have |x − y|^p + |z − x|^p ≥ |y − z|^p, hence

∫ |x − y|^p dψ_x(y) + |z − x|^p ≥ ∫ |y − z|^p dψ_x(y).   (2.6.1)

But the inequality is strict since ψ_x ≠ δ_x, a contradiction to the fact that Γ is monotone. Therefore, the theorem holds, and every minimizer π makes the common mass µ ∧ ν stay put.

Remark 2.6.5. By the theorem, we can reduce the problem to one between the disjoint marginals µ̄ := µ − µ ∧ ν and ν̄ := ν − µ ∧ ν. Thus, from now on we will always assume that µ ∧ ν = 0, and therefore, for any minimizer π ∈ MT(µ, ν), we have a monotone set Γ such that π(Γ) = 1 and Γ ∩ ∆ = ∅, where ∆ := {(x, x) : x ∈ Rd}.

2.6.2 Structure of optimal martingale transport in one dimension

In this section, we study the minimization problem (2.1.1) in one dimension, i.e. the marginals µ, ν are defined on the real line R. We consider the cost function c(x, y) = |x − y|^p with 0 < p ≤ 1 and determine the structure of the optimal coupling. In this section, we do not assume that the marginals µ, ν are symmetric with respect to the origin. Recall that we can assume µ ∧ ν = 0. Finally, we say that µ is continuous if µ does not assign positive measure to any point, i.e. µ({x}) = 0 for every x ∈ R.

The following theorem for the one-dimensional case was shown in [5] when p = 1; we extend it to every 0 < p ≤ 1.

Theorem 2.6.6. Assume that µ ∧ ν = 0, µ is continuous, and c(x, y) = |x − y|^p for some 0 < p ≤ 1. Let π be a minimizer for the problem (2.1.1) with d = 1. Then there exists a monotone set Γ such that π(Γ) = 1 and, for every x ∈ X_Γ, we have |Γ_x| = 2. Hence, if we define two functions S : X_Γ → R and T : X_Γ → R by Γ_x = {S(x), T(x)} with S(x) < x < T(x), then π is concentrated on graph(S) ∪ graph(T).
Therefore, the minimizer is unique.

Proof. Let Γ be any monotone concentration set of π and suppose (x, y−), (x, y+), (x′, y′) ∈ Γ with y− < y′ < y+. We claim that neither y− < x′ < x ≤ y′ nor y′ ≤ x < x′ < y+ is possible. To prove the claim, suppose y− < x′ < x ≤ y′ and let 0 < t < 1 be such that t y− + (1 − t) y+ = y′. Consider the function

G(z) = t|z − y−|^p + (1 − t)|z − y+|^p − |z − y′|^p.

If y− < z < y′, then

G(z) = t(z − y−)^p + (1 − t)(y+ − z)^p − (y′ − z)^p.

Taking the derivative, we get

G′(z) = p[t(z − y−)^{p−1} − (1 − t)(y+ − z)^{p−1} + (y′ − z)^{p−1}].

We observe: if 0 < p < 1, then (y′ − z)^{p−1} > (y+ − z)^{p−1}, hence G′(z) > 0; if p = 1, then G′(z) = t − (1 − t) + 1 = 2t > 0. Hence, for y− < x′ < x ≤ y′, we have G(x′) < G(x). In other words,

t(x′ − y−)^p + (1 − t)(y+ − x′)^p + |y′ − x|^p < t(x − y−)^p + (1 − t)(y+ − x)^p + |y′ − x′|^p.

This means that if we define a measure ϕ by ϕ = tδ_{(x,y−)} + (1 − t)δ_{(x,y+)} + δ_{(x′,y′)}, then we obtain a cost-efficient competitor ψ by ψ = tδ_{(x′,y−)} + (1 − t)δ_{(x′,y+)} + δ_{(x,y′)}. Note that ψ satisfies the conditions for being a competitor of ϕ. Hence, by Lemma 2.6.3, the configuration (x, y−), (x, y+), (x′, y′) ∈ Γ with y− < y′ < y+ and y− < x′ < x ≤ y′ cannot occur. The case y′ ≤ x < x′ < y+ is excluded similarly.

Now we follow the proof of [5]. Suppose the set A := {x ∈ R : |Γ_x| ≥ 3} is uncountable. Then we can find (x, y−), (x, y+), (x, y) ∈ Γ with y− < x < y < y+ or y− < y < x < y+ (recall that Γ ∩ ∆ = ∅, where ∆ := {(x, x) : x ∈ R}, since µ ∧ ν = 0). Assume the first case. Then Lemma 3.2 in [5] shows that, for any given ε > 0, we can find (x′, y′) ∈ Γ with x − ε < x′ < x and |y′ − y| < ε, by the uncountability of A. Then, for small ε, we are in the first forbidden configuration. Similarly, if y− < y < x < y+, then we can find (x′, y′) ∈ Γ with x < x′ < x + ε and |y′ − y| < ε, the second forbidden configuration.
Hence, A must be countable, and therefore, by the continuity of µ, A is negligible.

Uniqueness follows by the usual argument: if π1 and π2 are optimal solutions realized by (S1, T1) and (S2, T2) respectively, then the average (π1 + π2)/2 is also optimal and hence must also be realized by two functions (S, T). This implies that S1(x) = S2(x) and T1(x) = T2(x) for µ-a.e. x, yielding uniqueness.

In fact, we can say more about the structure of the optimal martingale. Note that in the following lemma the continuity of µ is not assumed.

Lemma 2.6.7. Let I := (a, b) be an open interval and suppose ν(I) = 0. Let c(x, y) = |x − y|^p for some 0 < p ≤ 1 and let π be a minimizer for the problem (2.1.1). Let (π_x)_x be its disintegration and, for x ∈ I, denote by π+_x the restriction of π_x to [b, ∞) and by π−_x the restriction of π_x to (−∞, a]. If x, x′ ∈ I and x < x′, then sup(spt(π+_{x′})) ≤ inf(spt(π+_x)) and sup(spt(π−_{x′})) ≤ inf(spt(π−_x)). In other words, the set-valued functions spt(π+_x) and spt(π−_x) decrease on I.

Proof. Let Γ be a monotone concentration set of π with Y_Γ ∩ I = ∅. If x, x′ ∈ I ∩ X_Γ and x < x′, then we claim that sup(Γ+_{x′}) ≤ inf(Γ+_x), where Γ+_x := Γ_x ∩ [b, ∞). If not, then we can find y′ > y ≥ b such that (x, y), (x′, y′) ∈ Γ. As π is a martingale, we can also find y′′ ≤ a with (x′, y′′) ∈ Γ. Then the configuration (x, y), (x′, y′), (x′, y′′) ∈ Γ is forbidden by the proof of Theorem 2.6.6, a contradiction. As π_x(Γ_x) = 1 for every x ∈ X_Γ, π+_x has its full mass on Γ+_x, hence sup(spt(π+_{x′})) ≤ inf(spt(π+_x)). The other case, sup(spt(π−_{x′})) ≤ inf(spt(π−_x)), is proved similarly.

We may call the above result a "local decreasing property", as the functions x ↦ spt(π+_x) and x ↦ spt(π−_x) decrease locally, i.e. on any open interval I with ν(I) = 0.
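The derivative computation in the proof of Theorem 2.6.6 is easy to probe numerically. The following snippet (my own sanity check, not part of the thesis) samples the function G on the interval (y−, y′), for randomly chosen y− < y′ < y+ with y′ = t y− + (1 − t) y+, and confirms that it is increasing there for several exponents 0 < p ≤ 1:

```python
import random

def G(z, t, ym, yp, yq, p):
    # G(z) = t|z - ym|^p + (1 - t)|z - yp|^p - |z - yq|^p, with yq = t*ym + (1-t)*yp
    return t * abs(z - ym) ** p + (1 - t) * abs(z - yp) ** p - abs(z - yq) ** p

random.seed(2)
monotone = True
for _ in range(200):
    ym = random.uniform(-5, 0)
    yp = random.uniform(1, 5)          # guarantees yp - ym >= 1
    t = random.uniform(0.05, 0.95)
    yq = t * ym + (1 - t) * yp         # ym < yq < yp
    p = random.choice([0.3, 0.7, 1.0])
    zs = [ym + (yq - ym) * k / 50 for k in range(1, 50)]   # grid inside (ym, yq)
    vals = [G(z, t, ym, yp, yq, p) for z in zs]
    if any(later <= earlier for earlier, later in zip(vals, vals[1:])):
        monotone = False
```

The monotonicity of G is exactly what rules out the two forbidden configurations in the proof.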
Thus, if we make the following assumption, we obtain the global decreasing property for any optimal martingale transport.

Separation Assumption: There is an open interval I such that µ(I) = 1 and ν(I) = 0.

For example, two Gaussian measures µ, ν in convex order satisfy this assumption after µ ∧ ν is subtracted from each marginal. We now observe that the global decreasing property also yields the uniqueness of the optimal solution, without assuming the continuity of µ.

Theorem 2.6.8. Under the separation assumption, the solution of the problem (2.1.1) with c(x, y) = |x − y|^p for some 0 < p ≤ 1 is unique. Moreover, the optimal solution is identical for all 0 < p ≤ 1.

Proof. Let 0 < p ≤ 1 be fixed and let π be a minimizer. The separation assumption yields that (π_x)_{x∈I} decreases on I = (a, b), as shown in Lemma 2.6.7. By the decreasing property, π must take the mass of µ from the left and transport it to fill out ν+ (ν restricted to [b, ∞)) and ν− (ν restricted to (−∞, a]) from the right, in a martingale way. When µ, ν are continuous, this is described by the following equations with functions S(x) ∈ (−∞, a] and T(x) ∈ [b, ∞):

µ((a, x]) = ν−((S(x), a]) + ν+((T(x), ∞)),
∫_a^x t dµ(t) = ∫_{S(x)}^a t dν−(t) + ∫_{T(x)}^∞ t dν+(t).

The first equation expresses the preservation of mass, and the second the preservation of barycenter. In the case of general µ, ν, with functions 0 ≤ λ−(x), λ+(x) ≤ 1 accounting for the atoms at S(x) and T(x), these may be written as

µ((a, x]) = λ−(x) ν−({S(x)}) + ν−((S(x), a]) + λ+(x) ν+({T(x)}) + ν+((T(x), ∞)),
∫_{(a,x]} t dµ(t) = S(x) λ−(x) ν−({S(x)}) + ∫_{(S(x),a]} t dν−(t) + T(x) λ+(x) ν+({T(x)}) + ∫_{(T(x),∞)} t dν+(t).

Note that S(x), T(x), λ−(x), λ+(x) are uniquely determined if S(x), T(x) are chosen as the largest numbers which satisfy the above equations. Furthermore, it is clear that these equations uniquely determine the martingale coupling π.
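To make the mass and barycenter equations concrete, here is a worked instance (my own example, not taken from the thesis): take µ uniform on I = (−1, 1) and ν uniform on (−2, −1) ∪ (1, 2), each of total mass 1. With both densities equal to 1/2, the two equations reduce to S + T = −x and S² + T² = 6 − x², which can be solved in closed form; the snippet checks the residuals and the decreasing property of S and T predicted by Lemma 2.6.7:

```python
from math import sqrt

# mu uniform on (-1, 1); nu uniform on (-2, -1) and (1, 2); all densities 1/2.
# Mass:       (x+1)/2     = (-1 - S)/2 + (2 - T)/2
# Barycenter: (x^2 - 1)/4 = (1 - S^2)/4 + (4 - T^2)/4
# i.e.  S + T = -x  and  S^2 + T^2 = 6 - x^2, solved by:

def S(x):
    return (-x - sqrt(12 - 3 * x * x)) / 2   # lands in [-2, -1]

def T(x):
    return (-x + sqrt(12 - 3 * x * x)) / 2   # lands in [1, 2]

def residuals(x):
    s, t = S(x), T(x)
    mass = (x + 1) / 2 - ((-1 - s) / 2 + (2 - t) / 2)
    bary = (x * x - 1) / 4 - ((1 - s * s) / 4 + (4 - t * t) / 4)
    return mass, bary

xs = [-0.9, -0.5, 0.0, 0.5, 0.9]
max_res = max(abs(r) for x in xs for r in residuals(x))
T_decreasing = all(T(u) > T(v) for u, v in zip(xs, xs[1:]))
S_decreasing = all(S(u) > S(v) for u, v in zip(xs, xs[1:]))
```

Since the construction itself produces a martingale coupling of these two marginals, they are in particular in convex order, and the example satisfies the separation assumption with I = (−1, 1).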
Finally, the above equations are derived only from the decreasing property of π and do not depend on p; therefore the theorem follows.

In particular, if we assume symmetry of µ, ν, then we have:

Corollary 2.6.9. If µ, ν are symmetric with respect to the origin, then the unique coupling in Theorem 2.6.6 or 2.6.8 is also symmetric.

Proof. One can prove this directly, or as follows: let σ be an optimal coupling in Theorem 2.6.6 or 2.6.8, and let τ = (σ + σ′)/2 be the symmetrization of σ, where σ′ is the reflection of σ with respect to the origin. Then τ is also optimal, so by uniqueness, σ = τ.

2.6.3 Structure of optimal martingale transport in higher dimensions

We have studied the structure of the martingale transport in one dimension which minimizes E_P|X − Y|^p, 0 < p ≤ 1, and in particular have shown its uniqueness either when µ is continuous or when the separation assumption holds. In this section, we introduce the notion of symmetrization of a transport plan, and then present a variational argument which reduces the higher-dimensional problem under radially symmetric marginals to the one-dimensional situation.

L- and R-symmetrizations

We first introduce the notion of L-symmetrization.

Definition 2.6.10. Let L be a one-dimensional subspace of Rd and let ψ be a probability measure on Rd. We say that ψ is L-symmetric if it is symmetric with respect to L, i.e. for any Borel set B and any orthogonal matrix M which fixes L, we have ψ(B) = ψ(M(B)).

We say that the probability measures ϕ and ψ are L-equivalent if ϕ(B) = ψ(B) for every B ⊆ Rd which is symmetric with respect to L, i.e. x ∈ B implies z ∈ B for every z with dist(z, L) = dist(x, L) and x − z ⊥ L; in this case we write ϕ ≅_L ψ. Finally, we define L[ψ] to be the unique L-symmetric measure that is L-equivalent to ψ.

Now we turn to the notion of R-symmetrization.
Let S_r = {x : |x| = r} be the sphere in Rd with radius r, let ζ be a probability measure on S_r, and let (π_x)_{x∈S_r} be a family of probability measures on Rd attached to each x ∈ S_r. We can view this as a mass transport plan π with initial mass ζ, by seeing π_x as its disintegration; we denote this by π = (ζ, π_x). Fix a vector w ∈ S_r, and let (M_x)_{x∈S_r} be a choice of orthogonal matrices with the property M_x(x) = w. For u ∈ Rd, let T_u(z) = z + u denote translation by u. Finally, let L_x be the one-dimensional subspace spanned by x ≠ 0.

Now consider the following probability measure:

σ(·) := ∫_{x∈S_r} (M_x ∘ T_{−x} ∘ L_x(π_x))(·) dζ(x).

Note that σ does not depend on the choice of (M_x)_{x∈S_r}, due to the presence of the operator L_x in the definition. Now we define the R-symmetrization operator acting on the transport plan π = (ζ, π_x).

Definition 2.6.11. The R-symmetrization operator is defined by

R[π] = R[(ζ, π_x)] = (U, σ_x),

where σ_x = (M_x ∘ T_{−x})^{−1}(σ) and U is the uniform probability measure on S_r.

Thus, σ is an average of appropriately translated and rotated π_x's with weight ζ, and the R-symmetrization operator uniformly pushes σ back onto S_r. For a transport plan π = (µ, π_x) with a general initial distribution µ, one can similarly apply the R-symmetrization, by applying the above R-symmetrization to each disintegration of µ along the spherical layers S_r.

Finally, we introduce the notion of R-equivalence on the space of probability measures on Rd.

Definition 2.6.12. Probability measures ϕ and ψ are called R-equivalent if they carry the same mass on every annulus, i.e. for any B ⊆ R+ and A_B := {x ∈ Rd : |x| ∈ B}, we have ϕ(A_B) = ψ(A_B).
We write ϕ ≅_R ψ.

Next, we apply these symmetrization ideas to study the structure of optimal martingale transport in higher dimensions when the marginals are radially symmetric.

Variational Lemma and Main Theorem

In this section, we present a variational lemma which allows the martingale transport problem under radial marginals to be reduced to a problem on one-dimensional subspaces, where we can apply the results of the previous section.

Lemma 2.6.13 (Variational Lemma). Consider a cost function of the form c(x, y) = h(|x − y|) and let L_x be the one-dimensional subspace spanned by x, x ≠ 0. Let ϕ be a probability measure on Rd with barycenter at x, and assume that ϕ is not supported on L_x. Suppose that r ↦ h′(r)/r is strictly decreasing for r > 0. Then there exists a probability measure ψ with barycenter at x, supported on L_x and R-equivalent to ϕ, such that

∫_{Rd} h(|x − y|) dϕ(y) > ∫_{Rd} h(|x − y|) dψ(y).   (2.6.2)

For example, h(r) = r^p with 0 < p < 2, or h(r) = −r^p with p > 2, satisfies the assumption of the lemma.

Proof. We may assume that ϕ is L_x-symmetric, as replacing ϕ by L_x[ϕ] does not change the cost ∫_{Rd} h(|x − y|) dϕ(y). It is then sufficient to consider the two-dimensional case d = 2, as we will see. We now explain how to deform ϕ to obtain the measure ψ of the lemma.

For this, consider the family of probability measures ψ(t) supported on the four points z1n(t), z2n(t), z1s(t), z2s(t), where 0 ≤ t ≤ 1 is a parameter. We will see that the cost of ψ(t) strictly decreases as t increases, which is the desired deformation process.

To begin, assume without loss of generality that the barycenter x is a point in R2, x = (0, b), b ≠ 0.
Let z1, z2 ∈ R2 with |z1| = |z2| = r > 0, and write z1 = (a, z), z2 = (−a, z). Now, for 0 ≤ t ≤ 1, let z_n(t) = z + t(r − z) and z_s(t) = z − t(r + z), and let

z1n(t) = (√(r² − z_n(t)²), z_n(t)),  z2n(t) = (−√(r² − z_n(t)²), z_n(t)),
z1s(t) = (√(r² − z_s(t)²), z_s(t)),  z2s(t) = (−√(r² − z_s(t)²), z_s(t)).

Thus the four points z1n(t), z2n(t), z1s(t), z2s(t) lie on the circle of center 0 and radius r, and they are located symmetrically with respect to the vertical axis. Now define ψ(t) and its transportation cost by

ψ(t) = ((r + z)/4r) δ_{z1n(t)} + ((r + z)/4r) δ_{z2n(t)} + ((r − z)/4r) δ_{z1s(t)} + ((r − z)/4r) δ_{z2s(t)},
C(t) = ((r + z)/2r) h(‖z1n(t) − x‖) + ((r − z)/2r) h(‖z1s(t) − x‖).

Thus, C(t) is the cost of transporting the point mass δ_x to ψ(t). Note that ψ(0) = ½δ_{(−a,z)} + ½δ_{(a,z)} and ψ(1) = ((r + z)/2r) δ_{(0,r)} + ((r − z)/2r) δ_{(0,−r)}, so ψ(t) is a continuous deformation from ψ(0) to ψ(1) along the circle of radius r. Now observe that for all 0 ≤ t ≤ 1 the barycenter of ψ(t) is fixed at (0, z), and the ψ(t) are obviously R-equivalent. We will show that C′(t) < 0 if h′(r)/r is strictly decreasing for r > 0. We compute

C′(t) = ((r + z)/2r) · (h′(‖z1n(t) − x‖)/‖z1n(t) − x‖) · ⟨z1n(t) − x, (d/dt) z1n(t)⟩
      + ((r − z)/2r) · (h′(‖z1s(t) − x‖)/‖z1s(t) − x‖) · ⟨z1s(t) − x, (d/dt) z1s(t)⟩,

where ⟨·, ·⟩ is the inner product. Note that ⟨z1n(t), (d/dt) z1n(t)⟩ = ⟨z1s(t), (d/dt) z1s(t)⟩ = 0, while ⟨x, (d/dt) z1n(t)⟩ = b(r − z) and ⟨x, (d/dt) z1s(t)⟩ = −b(r + z), hence

C′(t) = (b(r + z)(r − z)/2r) [h′(‖z1s(t) − x‖)/‖z1s(t) − x‖ − h′(‖z1n(t) − x‖)/‖z1n(t) − x‖].

Now we compute

‖z1n(t) − x‖² = r² + b² − 2b z_n(t) = r² + b² − 2b(z + t(r − z)),
‖z1s(t) − x‖² = r² + b² − 2b z_s(t) = r² + b² − 2b(z − t(r + z)),
‖z1n(t) − x‖² − ‖z1s(t) − x‖² = −4brt.

Hence, we see that ‖z1n(t) − x‖ < ‖z1s(t) − x‖ if b > 0, and ‖z1n(t) − x‖ > ‖z1s(t) − x‖ if b < 0. Thus, in either case, C′(t) < 0. Hence C(0) > C(1), and note that ψ(1) is supported on the line L_x.

Now we return to the general case and assume that ϕ is L_x-symmetric (if not, consider L_x[ϕ]) but not supported on L_x. Consider a pair of points z1, z2 in Rd that are symmetric with respect to L_x, i.e.
|z1| = |z2| = r, ⟨x, z1 − z2⟩ = 0 and (z1 + z2)/2 ∈ L_x. Now, by the L_x-symmetry of ϕ, dϕ(z1) = dϕ(z2), so we can perform the continuous deformation along a great circle of radius r, and by the above computation the cost strictly decreases. Hence, after performing the deformation for all such pairs of points z1, z2 in the support of ϕ, we obtain a measure ψ supported on the line L_x, and we have shown that (2.6.2) holds.

Finally, we apply the symmetrization arguments in conjunction with the variational lemma to conclude the following theorem. Recall that, by Remark 2.6.5, we may assume µ ∧ ν = 0 without loss of generality.

Theorem 2.6.14. Suppose that µ ∧ ν = 0, µ({0}) = 0, and µ, ν are radially symmetric probability measures on Rd which are in convex order. Assume that either µ is absolutely continuous, or that there exists an open ball B_r = {x : |x| < r} such that µ(B_r) = 1 and ν(B_r) = 0. Then there is a unique minimizer π for the problem (2.1.1) with respect to the cost c(x, y) = |x − y|^p, where 0 < p ≤ 1. Furthermore, for each x, the disintegration π_x is concentrated on the one-dimensional subspace L_x.

Proof. Let π be a minimizer. First of all, we claim that for µ-a.e. x the disintegration π_x is concentrated on the line L_x. If not, we apply the variational Lemma 2.6.13 to each π_x and get a competitor ψ_x, so that we obtain another martingale ψ = (µ, ψ_x) having the ψ_x as its disintegrations. Then cost(π) > cost(ψ), but ψ need not be in MT(µ, ν). However, since π_x and ψ_x are R-equivalent, if we apply the R-symmetrization to the martingale ψ, then R[ψ] is in MT(µ, ν), by the radial symmetry of µ and ν. Now cost(π) > cost(ψ) = cost(R[ψ]), a contradiction. Hence the claim is true for every minimizer π ∈ MT(µ, ν). This implies that the problem (2.1.1) decomposes into problems on the one-dimensional subspaces, with respect to the radially disintegrated marginals along each subspace.
Corollary 2.6.9 then says that the solution of each reduced problem is unique, yielding uniqueness for the whole problem.

Remark 2.6.15. When µ and ν are given by density functions f and g on Rd, the radially disintegrated marginals along a one-dimensional subspace have corresponding density functions f0(r) = f(r)|r|^{d−1} and g0(r) = g(r)|r|^{d−1}, −∞ < r < ∞, by the radial symmetry of µ and ν. Let us check that f0 and g0 are in convex order.

By the radial symmetry of f and g, f0 and g0 have the same mass and the same barycenter 0. Thus, we only need to check that

∫_R (r − k)^+ f0(r) dr ≤ ∫_R (r − k)^+ g0(r) dr

for all real k. Let s_k(r) = (r − k)^+ and h_k(r) = ½(s_k(r) + s_k(−r)). Now let H(x) = h_k(|x|) for x ∈ Rd. Then H is a radially symmetric convex function, hence

∫_{Rd} H(x) f(x) dx ≤ ∫_{Rd} H(x) g(x) dx,

and it is clear that

∫_{Rd} H(x) f(x) dx = C ∫_R h_k(r) f(r)|r|^{d−1} dr = C ∫_R (r − k)^+ f0(r) dr,

where C is the surface area of the unit sphere in Rd.

Chapter 3
Structure of optimal subharmonic martingale transport in higher dimensions

3.1 Formulation of the problem

3.1.1 Skorokhod embedding problem

Let µ and ν be probability measures on Rd, d ≥ 2, and let B denote a Brownian motion with initial law µ, i.e. B0 ∼ µ. Consider the set of stopping times

T = {τ : τ is a stopping time and B0 ∼ µ, B_τ ∼ ν}.

Thus, a stopping time τ ∈ T solves the Skorokhod embedding problem with terminal law ν. We note that T contains all randomized stopping times; the definition will be given later.

Next, we introduce the cost function c(x, y) : Rd × Rd → R. Let f : R+ → R with f(0) = 0 be a continuous function. In this chapter, we are interested in costs of the form

c(x, y) = f(|x − y|).

A particular example is f(r) = r^α, 0 < α ≠ 2:

c(x, y) = |x − y|^α.   (3.1.1)

Now we consider the following optimization problem:

Maximize / Minimize E f(|B0 − B_τ|) over τ ∈ T.   (3.1.2)

The purpose of this chapter is to characterize such optimal stopping times, especially when the initial and terminal laws (a.k.a.
marginals) µ and ν are radially symmetric.3.1.2 Subharmonic martingale transport problem[21] considered the following problemMaximize / Minimize Cost[pi] =∫∫c(x, y)dpi(x, y) over pi ∈MT (µ, ν)where MT(µ, ν) (Martingale Transport plan) is the set of joint probabilities of µ andν on Rd×Rd, such that for each x ∈ Rd and pi ∈ MT(µ, ν), the disintegration pix hasits barycenter at x; in other words, for any convex function f on Rd,f(x) ≤∫f(y) dpix(y).(3.1.3)Now we define SMT(µ, ν) (Subharmonic Martingale Transport plan) as above, butthe constraint (3.1.3) must hold for every subharmonic function f . Since every convexfunction is subharmonic, SMT problem is more restricted; SMT(µ, ν) ⊆ MT(µ, ν).Example 3.1.1 (SMT(µ, ν) 6= MT(µ, ν)). Let µ be the uniform measure on the unit76sphere, and let ν have 12mass at the origin and 12mass uniformly on the sphere ofradius 2. Then MT(µ, ν) 6= ∅; for each x on the unit sphere, define the disintegrationpix by sending12mass to the origin and 12mass to the point 2x. Then the resulting piis an MT plan and clearly satisfies the marginal condition. However, if we considera subharmonic function f(z) = log |z| , we have∫f(z) dµ (z) = 0 > −∞ =∫f(z) dν (z)(3.1.4)Hence, SMT(µ, ν) = ∅.Definition 3.1.2. We call a probability measure ψ a SH-measure with barycenterx if (3.1.3) holds for every subharmonic function f , and SH(x) be the set of allSH-measures at x.We denote Bx a Brownian motion starting at x. Let τ be a stopping time.Observe that, for any subharmonic function f , (f(Bxt ), t ≥ 0) is a submartingale,hence the measure Law(Bxτ ) is in SH(x). In fact, the coverse is also true, and we givea proof of the following proposition in the appendix.Proposition 3.1.3. Let ψ ∈ SH(x) and suppose supp(ψ) is compact. Then thereexists a stopping time τ such that ψ = Law(Bxτ ).For the proposition to hold, we need to consider the randomized stopping times;see [6]. 
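The obstruction in Example 3.1.1 can be made concrete with a few lines of arithmetic. The sketch below is ours, not part of the thesis; since f(z) = log |z| depends only on |z|, both integrals reduce to evaluating the radii charged by µ and ν.

```python
import math

# Example 3.1.1 revisited numerically (an illustrative sketch):
# mu = uniform measure on the unit sphere,
# nu = (1/2) * delta_0 + (1/2) * uniform measure on the sphere of radius 2.
# For f(z) = log|z| the integrals depend only on |z|, so no sampling is needed.

def integral_log_abs_mu():
    # every point of the unit sphere has |z| = 1
    return math.log(1.0)  # = 0

def integral_log_abs_nu():
    # half the mass sits at the origin, where log|z| = -infinity
    return 0.5 * (-math.inf) + 0.5 * math.log(2.0)  # = -infinity

print(integral_log_abs_mu())   # 0.0
print(integral_log_abs_nu())   # -inf
# A plan in SMT(mu, nu) would force  integral f dmu <= integral f dnu
# for every subharmonic f, but here 0 > -infinity, so SMT(mu, nu) is empty
# even though MT(mu, nu) is not.
```

This is exactly inequality (3.1.4): the convex-order constraint of MT is satisfied, but the stronger subharmonic-order constraint of SMT fails for log |z|.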
By this proposition, the optimal Skorokhod embedding problem and the optimal subharmonic martingale transport problem are equivalent.

Remark 3.1.4. In d = 2, SH-measures are also called Jensen measures. However, in R2d ≅ Cd with d ≥ 2, the two notions are not equivalent, so in this chapter we will use the term SH-measure.

Remark 3.1.5. For simplicity, we will assume throughout this chapter that the marginals µ and ν are supported in a compact ball K ⊆ Rd. Therefore every measure appearing in the sequel is also supported in K, and for every stopping time τ,

τ = τ ∧ κ, where κ is the first time the Brownian motion hits the boundary ∂K.

In other words, no particle can escape the "universe" K. This assumption serves only to avoid convergence issues; we believe it would suffice to assume appropriate moment conditions on the marginals µ, ν according to α in (3.1.1).

3.2 Symmetrization of transport plans

We recall the notion of symmetrization of transport plans, which will be crucial for our analysis of optimal transport plans under radially symmetric marginals. For the reader's convenience, this section repeats the first part of Section 2.6.3.

Definition 3.2.1. Let L be a one-dimensional subspace of Rd and let ψ be a probability measure on Rd. We say that ψ is L-symmetric if it is symmetric with respect to L, i.e. for any Borel set B and any orthogonal matrix M which fixes L, we have ψ(B) = ψ(M(B)). We say that the probability measures ϕ and ψ are L-equivalent if ϕ(B) = ψ(B) for every B ⊆ Rd which is symmetric with respect to L, i.e. x ∈ B implies z ∈ B for every z with dist(z, L) = dist(x, L) and x − z ⊥ L; we then write ϕ ≅L ψ. Finally, we define L[ψ] to be the unique L-symmetric measure that is L-equivalent to ψ.

Now we turn to the notion of R-symmetrization. Let Sr = {x : |x| = r} be the sphere in Rd of radius r, let ζ be a probability measure on Sr, and let (πx)x∈Sr be a family of probability measures on Rd attached to each x ∈ Sr. We can view this as a mass transport plan π with initial mass ζ, with πx as its disintegration; we denote this by π = (ζ, πx). Fix a vector w ∈ Sr, and let (Mx)x∈Sr be a choice of orthogonal matrices with the property Mx(x) = w. For u ∈ Rd, let Tu(z) = z + u be the translation by u. Finally, let Lx be the one-dimensional subspace spanned by x ≠ 0. Consider the probability measure

σ(·) := ∫_{x∈Sr} (Mx ◦ T−x ◦ Lx[πx])(·) dζ(x).

Note that σ does not depend on the choice of (Mx)x∈Sr, due to the presence of the operator Lx in the definition. We now define the R-symmetrization operator acting on the transport plan π = (ζ, πx).

Definition 3.2.2. The R-symmetrization operator is defined by

R[π] = R[(ζ, πx)] = (U, σx),

where σx = (Mx ◦ T−x)^{−1}(σ) and U is the uniform probability measure on Sr.

Thus σ is an average of appropriately translated and rotated πx's with weight ζ, and the R-symmetrization operator pushes σ back uniformly onto Sr. For a transport plan π = (µ, πx) with a general initial distribution µ, one similarly applies the above R-symmetrization to each disintegration of µ along the spherical layers Sr.

Finally, we introduce the notion of R-equivalence on the space of probability measures on Rd.

Definition 3.2.3. Probability measures ϕ and ψ are called R-equivalent if they carry the same mass on every annulus, i.e. for any B ⊆ R+ and AB := {x ∈ Rd : |x| ∈ B}, we have ϕ(AB) = ψ(AB). We write ϕ ≅R ψ.

3.3 Randomized stopping times

Definition 3.3.1. Let C(R+) = {ω : R+ → Rd | ω(0) = 0, ω continuous} be the space of continuous paths starting at 0. Let Ω := C(R+) × Rd be the probability space, and for (ω, x) ∈ Ω write (ω, x)(t) = ωx(t) := x + ω(t); thus the second component of Ω represents the starting point of the path. Finally, let S be the set of all stopped paths

S = {(fx, s) | fx : [0, s] → Rd is continuous and fx(0) = x}.   (3.3.1)

Definition 3.3.2 (Randomized stopping time; see [6, Definition 4.17]). Let (Ω, F, P) be the (filtered) probability space over the path space Ω, equipped with the probability measure P := W ⊗ µ, where W is the Wiener measure and µ is a probability measure on Rd. Consider its extension

(Ω × [0, 1], F ⊗ B([0, 1]), P ⊗ L1),

where B is the Borel σ-algebra and L1 is the Lebesgue measure. A probability measure ξ on Ω × R+ is called a randomized stopping time if for each u ∈ R+ and ωx ∈ Ω, the random time

ρu(ωx) := inf{t ≥ 0 : ξωx([0, t]) ≥ u}

gives a stopping time adapted to F ⊗ B([0, 1]). Here ξωx is the disintegration of ξ along the path ωx according to P; that is, ξ(dω, dx, dt) = ξωx(dt) µ(dx) W(dω).

Throughout this chapter, the term "stopping time" means a randomized stopping time, and we say that a stopping time is non-randomized if the disintegration ξωx is a Dirac measure on R+ for P-a.e. ωx; thus a non-randomized stopping time is a stopping time in the usual sense of a random variable on Ω.

Next we introduce the conditional randomized stopping time given (fx, s) ∈ S, that is, the normalized stopping measure given that we have followed the path fx up to time s. For (fx, s) ∈ S and ω ∈ C(R+), define the concatenated path fx ⊕ ω by (fx ⊕ ω)(t) = fx(t) if t ≤ s, and (fx ⊕ ω)(t) = fx(s) + ω(t − s) if t > s.

Definition 3.3.3 (Conditional stopping time [6]). The conditional randomized stopping time of ξ given (fx, s) ∈ S, denoted ξ^{(fx,s)}, is defined by

ξ^{(fx,s)}_ω([0, t]) := (ξ_{fx⊕ω}([0, t + s]) − ξ_{fx⊕ω}([0, s])) / (1 − ξ_{fx⊕ω}([0, s])) if ξ_{fx⊕ω}([0, s]) < 1,
ξ^{(fx,s)}_ω({0}) := 1 if ξ_{fx⊕ω}([0, s]) = 1.

Following [6], this is the normalized stopping measure of the "bush" which follows the "stub" (fx, s). Note that ξ_{fx⊕ω}([0, s]) does not depend on the bush ω.

3.4 Monotonicity principle and stability of µ ∧ ν

Imagine that a particle of unit mass at x starts a Brownian motion, reaches y at time t, and continues to travel until the stopping time τ; i.e. Bxt(ω) = y and t < τ(ωx). Then by the strong Markov property, the particle at y will diffuse for the remaining time τ − t and thus create an SH-measure ψy = Law(By_{τ−t}). Imagine also that another unit-mass particle starting at x′ stops at y′ under τ, i.e. Bx′τ(ω′) = y′. In this situation we write

(x, ψy : x′, δy′) ∈ τ   (3.4.1)

and define the cost

C(x, ψy : x′, δy′) = ∫ c(x, z) dψy(z) + c(x′, y′).

More precisely: let Γ ⊆ S be a concentration set of τ, i.e. τ(Γ) = 1. Then we say that

(x, ψy : x′, δy′) ∈ τ with respect to Γ

if there exist (fx, t), (gx′, t′) ∈ Γ and s < t such that y = fx(s), y′ = gx′(t′), and ψy = Law(By(τ^{(fx,s)})).

In [6], the following theorem is proved for more general path-dependent costs.

Theorem 3.4.1 (Monotonicity principle [6]). Suppose the cost is of the form c(x, y) and that τ is an optimal stopping time for the minimization problem (3.1.2). Then there exists a Borel set Γ ⊆ S such that τ(Γ) = 1 and Γ is c-monotone in the following sense: if (x, ψy : x′, δy) ∈ τ with respect to Γ, then

C(x, ψy : x′, δy) ≤ C(x, δy : x′, ψy).   (3.4.2)

We now introduce a variant of the theorem for the case of radially symmetric marginals µ and ν.

Theorem 3.4.2 (Radial monotonicity principle). Suppose the cost is of the form c(x, y) = f(|x − y|) and that τ is an optimal stopping time for the minimization problem (3.1.2), where the marginals µ, ν are radially symmetric. Then there exists a Borel set Γ ⊆ S such that τ(Γ) = 1 and Γ is radially c-monotone in the following sense: if (x, ψy : x′, δy′) ∈ τ with respect to Γ and |y| = |y′|, then

C(x, ψy : x′, δy′) ≤ C(x, δy : x′, ϕy′)   (3.4.3)

for any ϕy′ ∈ SH(y′) which is R-equivalent to ψy.

Remark 3.4.3. The radial monotonicity principle presented here is an adaptation of the principle in [6] to our radially symmetric marginal setting. Its intuitive meaning is clear: suppose the current optimal stopping rule τ lets a particle starting at x diffuse once it reaches y, so that it becomes the probability measure ψy, but stops another particle from x′ at y′; and suppose the opposite inequality

C(x, ψy : x′, δy′) > C(x, δy : x′, ϕy′)   (3.4.4)

holds for some ϕy′ ∈ SH(y′). Then we can "modify" the stopping time τ to a time τ′ under which the particle at x now stops at y, while instead the particle from x′ starts diffusing at y′ until it becomes ϕy′. Inequality (3.4.4) means that the cost of τ′ is smaller than that of τ, but the modified time τ′ may not satisfy the terminal marginal condition ν. However, since ψy ≅R ϕy′ and |y| = |y′|, τ′ is R-equivalent to τ. Hence, regarding a stopping time as an SMT plan, we can apply the symmetrization operator to τ′ to obtain R[τ′]. Since the terminal marginal ν is radially symmetric, R[τ′] also solves the Skorokhod embedding problem, and at smaller cost, a contradiction (note that the symmetrization operator does not alter the cost, as the cost is of the form f(|x − y|)). The monotonicity principle asserts the existence of a set Γ of stopped paths which supports the optimal stopping time and resists any such modification.

We now give a simple application of the monotonicity principle.

Theorem 3.4.4. Let τ be a minimizer of the problem (3.1.2) and assume c(x, y) = |x − y|^α with 0 < α ≤ 1. Then under τ, the common mass µ ∧ ν stays put; i.e. τ = 0 on µ ∧ ν.

Proof. Suppose the theorem is false. Then, heuristically, there is a particle at some x in µ ∧ ν which diffuses to some ψx ∈ SH(x) \ {δx}, so another particle at some y ≠ x has to flow into x to make up for the loss. Mathematically, for Γ ⊆ S with τ(Γ) = 1, there must exist x, y ∈ Rd, x ≠ y, and ψx ∈ SH(x) \ {δx} such that

(x, ψx : y, δx) ∈ τ with respect to Γ.   (3.4.5)

Now for 0 < α ≤ 1 and z ∈ Rd, |x − z|^α + |y − x|^α ≥ |y − z|^α, so

∫ |x − z|^α dψx(z) + |y − x|^α ≥ ∫ |y − z|^α dψx(z).   (3.4.6)

In other words,

C(x, ψx : y, δx) ≥ C(x, δx : y, ψx).   (3.4.7)

But the inequality is strict because ψx ≠ δx; hence by Theorem 3.4.1, τ is not a minimizer.

Note that for the above result we do not need radial symmetry of the marginals µ and ν. By the theorem, when considering the minimization problem with 0 < α ≤ 1, we may assume µ ∧ ν = 0 without loss of generality.

3.5 Variational lemma and its consequences

3.5.1 Heuristic observations

Let Sy,r be the uniform probability measure on the sphere of center y and radius r, the simplest element of SH(y). Choosing c(x, y) = |x − y| for simplicity, consider the "gain" function

G(x) = G(x, y, r) = ∫ |x − z| dSy,r(z) − |x − y|

and its gradient ∇xG. Here G is the increase in cost when the Dirac mass at y diffuses uniformly onto the sphere.

1. If |x − y| < |x′ − y′| and r is fixed, then G(x, y, r) > G(x′, y′, r). Roughly speaking, for the same diffusion, the increase in cost is greater when the distance |x − y| is small. Hence, for cost minimization, we expect that it is better to stop particles near the source x and to let the particles far from x diffuse, insofar as the given marginal condition allows.

2. ∇G(x) points in the direction y − x, so the directional derivative ∇uG(x) satisfies ∇uG(x) < 0 if 〈u, y − x〉 < 0. Hence the gain function decreases as x moves away from the "center of diffusion" y, and we may link this with the monotonicity principle to deduce optimality properties. This is the key idea of this chapter.

3.5.2 Variational lemma and the main theorem

From now on, c(x, y) = |x − y|^α, 0 < α ≠ 2.
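Before developing the formal argument, observation 1 of Section 3.5.1 can be illustrated numerically. The following Monte Carlo sketch is ours, not part of the thesis (the function names and the choices d = 3, r = 1/2, c(x, y) = |x − y| are for illustration only). A standard computation in d = 3 gives E|x − Z| = D + r²/(3D) for Z uniform on the sphere S(y, r) and D = |x − y| ≥ r, so the gain G(x, y, r) = r²/(3D) indeed decreases as x moves away from y.

```python
import math
import random

random.seed(0)

def uniform_sphere(center, r):
    # sample uniformly on the sphere S(center, r) in d = 3
    # by normalizing a standard Gaussian vector
    v = [random.gauss(0.0, 1.0) for _ in range(3)]
    nrm = math.sqrt(sum(c * c for c in v)) or 1.0
    return [center[i] + r * v[i] / nrm for i in range(3)]

def dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def gain(x, y, r, n=100000):
    # G(x, y, r) = E|x - Z| - |x - y|, Z uniform on the sphere S(y, r):
    # the increase in cost when the Dirac mass at y diffuses onto the sphere
    total = sum(dist(x, uniform_sphere(y, r)) for _ in range(n))
    return total / n - dist(x, y)

y, r = [0.0, 0.0, 0.0], 0.5
G_near = gain([1.0, 0.0, 0.0], y, r)  # D = |x - y| = 1, exact gain r^2/(3D) ~ 0.0833
G_far = gain([3.0, 0.0, 0.0], y, r)   # D = 3, exact gain ~ 0.0278
print(G_near > G_far)  # the gain is larger for the particle closer to y
```

With the seed fixed, both estimates agree with the closed-form values r²/(3D) to within about 10⁻², consistent with the heuristic that diffusing far-away particles is cheaper.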
We define the gain function

G(x, ψy) := ∫ |x − z|^α dψy(z) − |x − y|^α for ψy ∈ SH(y),

and note that

G(x, ψy) − G(x′, ϕy′) = C(x, ψy : x′, δy′) − C(x, δy : x′, ϕy′).

Lemma 3.5.1. Let x, y be nonzero vectors in Rd with r = |x|. Let u be a unit tangent vector to the sphere Sr at x such that 〈u, y〉 < 0, and let ψy ∈ SH(y) \ {δy}. Then there exists ϕy ∈ SH(y) which is R-equivalent to ψy such that

G(x, ϕy) = G(x, ψy),
∇uG(x, ϕy) < 0 if 0 < α < 2,
∇uG(x, ϕy) > 0 if α > 2,

where the directional derivative is taken in the x variable.

For the proof, we first present some calculations. Let c(x, z) = |x − z|^α. Taking the partial derivative ∂/∂xd and setting x = 0, we get

h(z) := ∇_{ed} c(x, z)|_{x=0} = −α |z|^{α−2} zd.   (3.5.1)

Taking the Laplacian in z,

∆h(z) = −α(α − 2)(α + d − 2) |z|^{α−4} zd.   (3.5.2)

We see that the function h is strictly superharmonic in the lower half-space {zd < 0} if 0 < α < 2, and strictly subharmonic in the lower half-space {zd < 0} if α > 2.

Proof of Lemma 3.5.1. Without loss of generality, assume x = x1 e1 = (x1, 0, ..., 0) and u = ed. Then 〈u, y〉 < 0 means that y is in the lower half-space. Let H = {z ∈ Rd : zd = 0} and 0 < α < 2. Choose a stopping time τ for the Brownian motion By such that Law(Byτ) = ψy, and let η be the first time By hits H. Let σy = Law(By_{τ∧η}). Then σy is supported in the lower half-space and is nontrivial, hence by the strict superharmonicity in (3.5.2),

∇uG(x, σy) < 0.

Now we modify τ to τ′ as follows: if τ ≤ η, then τ′ = τ. But if τ > η, in other words if a particle from y has landed on H but has not completely stopped under τ, then we symmetrize the remaining part of τ (i.e. the conditional stopping time) with respect to H to obtain τ′. More precisely, let τH be the reflection of the conditional stopping time of τ with respect to H: if τ stops a path emanating from H, then τH stops the reflected path at the same time. Now define τ′ := (τ + τH)/2 as a randomization: before re-starting the Brownian motion on H, we flip a coin and choose either τ or τH.

Define ϕy = Law(By_{τ′}) and observe:

1. ϕy ≅R ψy, by the symmetry with respect to H in the definition of τ′.
2. ∇uG(x, ϕy) = ∇uG(x, σy), since the function z ↦ ∇_{ed}c(x, z) is odd in zd (see (3.5.1)), so the symmetrization in the definition of τ′ does not change ∇uG.

Hence the lemma follows; the case α > 2 is identical, with h now strictly subharmonic in the lower half-space, which reverses the inequality.

Next, we prove the following consequence of the variational lemma.

Theorem 3.5.2. Let x0, x1, y be nonzero vectors in Rd and let r = |x0| = |x1|. Assume |x0 − y| < |x1 − y|. Fix a measure ϕy ∈ SH(y) \ {δy} and define

R_min(x, ϕy) := {ψy ∈ SH(y) : ψy ≅R ϕy and ψy minimizes ∫ |x − z|^α dψy(z)},
R_max(x, ϕy) := {ψy ∈ SH(y) : ψy ≅R ϕy and ψy maximizes ∫ |x − z|^α dψy(z)},
G_min(x) := G(x, ψy) = ∫ |x − z|^α dψy(z) − |x − y|^α for any ψy ∈ R_min(x, ϕy),
G_max(x) := G(x, ψy) = ∫ |x − z|^α dψy(z) − |x − y|^α for any ψy ∈ R_max(x, ϕy).

Then we have

G_min(x0) > G_min(x1) and G_max(x0) < G_max(x1) if 0 < α < 2,
G_min(x0) < G_min(x1) and G_max(x0) > G_max(x1) if α > 2.

Proof. First, we claim that G_min(x) and G_max(x) are continuous; the proof of this plausible fact is given in the appendix. Now let 0 < α < 2, and choose any differentiable curve x(t) : [0, 1] → Sr with x(0) = x0, x(1) = x1 and |x(t) − y| strictly increasing; in other words, choose x(t) so that 〈(d/dt)x(t), y〉 < 0 for all t. For a fixed t ∈ [0, 1], choose any ψy ∈ R_min(x(t), ϕy). Lemma 3.5.1 then gives σy ∈ R_min(x(t), ϕy) with (d/dt)G(x(t), σy) < 0. By definition, G_min(x(s)) ≤ G(x(s), σy) for every s, and G_min(x(t)) = G(x(t), σy). Hence

limsup_{ε↓0} [G_min(x(t + ε)) − G_min(x(t))]/ε ≤ limsup_{ε↓0} [G(x(t + ε), σy) − G(x(t), σy)]/ε = (d/dt)G(x(t), σy) < 0.

We have shown that the function t ↦ G_min(x(t)) is continuous and satisfies the above strict inequality for each t ∈ [0, 1]; hence it must be strictly decreasing, and the claim for G_min follows. The other three cases are shown in the same way.

Finally, armed with Theorem 3.5.2, we establish the main theorem.

Theorem 3.5.3. Suppose µ, ν are radially symmetric probability measures on Rd, d ≥ 2, and µ({0}) = 0. Let c(x, y) = |x − y|^α, where 0 < α ≠ 2. Then the solution of the optimization problem (3.1.2) under the marginals µ, ν is unique, and it is given by a non-randomized stopping time.

Proof. Let τ be a minimizer of (3.1.2). The proof consists of two parts. The first part claims that there exists a Γ ⊆ S on which τ is concentrated such that Γ admits no "forbidden pair", an element of Γ × Γ described below. The second part claims that if Γ admits no forbidden pair, then every stopping time concentrated on Γ is necessarily of non-randomized type. This yields uniqueness immediately.

We begin with the first part. For x, y ≠ 0 in Rd, define the barrier set

Uyx = {z ∈ Rd : |z| = |y| and 〈x, y〉 < 〈x, z〉}.

Now fix 0 < α < 2. The forbidden pairs are

FP = {((f, s), (g, t)) ∈ S × S : f(0) = g(0) ≠ 0 and ∃ s′ < s such that f(s′) ∈ U^{g(t)}_{g(0)}}.

Choose a radially c-monotone Γ for τ as in Theorem 3.4.2. Theorems 3.5.2 and 3.4.2 imply that there is no forbidden pair in Γ, i.e. FP ∩ (Γ × Γ) = ∅.

The second part relies on the oscillation property of Brownian motion. Let Γ be such that FP ∩ (Γ × Γ) = ∅ and let ξ be a stopping time with ξ(Γ) = 1. We claim that ξ must be of non-randomized type. Suppose not. Then there exists an element (fx, s) ∈ Γ such that the conditional stopping time ξ^{(fx,s)} is nonzero. This means that the Brownian motion which has followed the path fx up to time s will continue its motion at y := fx(s). Consider the barrier U := Uyx and note that a Brownian motion starting at y, under any nonzero stopping time, will pass through the surface U before it stops completely, since y lies on the boundary circle of U. Hence there is a stopped path (gy, t) ∈ S such that (fx ⊕ gy, s + t) ∈ Γ and (fx ⊕ gy)(s′) ∈ U for some s′ < s + t. But then the pair ((fx ⊕ gy, s + t), (fx, s)) ∈ Γ × Γ is forbidden, a contradiction.

Therefore every minimizer is of non-randomized type, and uniqueness follows in the usual way: let τ and τ′ be two minimizers. If their disintegrations do not agree, i.e. τωx ≠ τ′ωx for all ωx in some set B with P(B) > 0, then the stopping time (τ + τ′)/2 must be of randomized type; but it is obviously a minimizer, a contradiction.

Remark 3.5.4. Let Γ be the radially c-monotone set on which the optimizer in Theorem 3.5.3 is concentrated. The proof above in fact tells us that, for the minimization problem with 0 < α < 2 or the maximization problem with α > 2, the optimal stopping time applied to the Brownian motion Bx is given by the first hitting time of the union of barriers

Ux := ∪y Uyx, where y = fx(s) for some (fx, s) ∈ Γ,

and, by uniqueness and the radial symmetry of µ and ν, the Ux are congruent under rotation: if M is an orthogonal matrix and M(x) = x′, then M(Ux) = Ux′. For the minimization problem with α > 2 or the maximization problem with 0 < α < 2, we obtain the same type of result, but the barrier is reversed: it is given by

Vx := ∪y Vyx, where y = fx(s) for some (fx, s) ∈ Γ,

where Vyx is the reversed barrier Vyx = {z ∈ Rd : |z| = |y| and 〈x, y〉 > 〈x, z〉}. This is due to the interchange of the superharmonic and subharmonic regions of the derivative (3.5.1) of the cost function, according to α. Note also that, when dealing with the maximization problem, the inequalities (3.4.2) and (3.4.3) in the monotonicity principles must be reversed.

Chapter 4

Conclusion

In this chapter, we summarize the contributions of this thesis and discuss directions for future work.

4.1 Contributions

In Chapter 2, we study the martingale optimal transport problem (2.1.1), especially when the marginals µ, ν lie in the higher-dimensional spaces Rd, d ≥ 2. First, we notice that when the dual problem attains an optimal solution, we can study the dual solution and conclude that Conjecture 1 holds. But in that proof, we need regularity of the dual solution for the variational argument to work.
Thus, our first major contribution is to verify such a regularity improvement whenever a triple (α, β, γ) solves the dual problem.

The question is then whether the dual problem always attains a solution. Unlike in the Monge-Kantorovich mass transport theory, it is known that the martingale transport problem does not always attain a dual solution, revealing a deep aspect of the problem. In this regard, our second major contribution is to develop a general theory of dual attainment, which enables us to decompose a given optimal martingale transport and to study each piece.

However, this partial dual attainment result still does not fully resolve Conjecture 1, due to the presence of certain obstructions such as the Nikodym set. Conjecture 1 therefore remains open, especially in dimension d ≥ 3, and this reveals that the martingale transport problem in higher dimensions is connected to some of the deepest topics in analysis.

Towards Conjecture 2, even under the assumption of full dual attainment we do not see a general argument leading to its resolution, which makes the conjecture even deeper. We find an example showing that Conjecture 2 does not hold for the maximization problem, even under radially symmetric marginals. With the additional assumption that the target ν is discrete, however, we can resolve the conjecture for both the maximization and minimization problems. Another case in which we can show that Conjecture 2 is affirmative is when the marginals µ, ν are radially symmetric, and then only for minimization. Here a sharp distinction between the solutions of the maximization and minimization problems emerges, and this is why we expect Conjecture 2 to hold only for the minimization problem. We should mention that even in dimension 2 the conjecture has not been fully resolved.

In this thesis, we also define the subharmonic martingale transport problem and observe that it is in fact equivalent to the Skorokhod embedding problem, one of the most important problems in probability theory. Inspired by the earlier results on martingale transport, we tackle the Skorokhod problem in higher dimensions in the case of radially symmetric marginals. Instead of appealing to the dual problem, we develop a direct method which involves fine properties of Brownian motion and nicely exploits the radial symmetry of the marginals. It leads to the conclusion that every optimal embedding time must exhibit a certain extremal property, which can be seen as a Brenier-type result for the Skorokhod embedding. The central ideas of two important areas of mathematics, namely mass transport theory in analysis and the Skorokhod embedding theory in probability, thus meet, and this harmony has not yet been fully explored.

4.2 Directions for future work

My further research interest lies in the multi-step martingale transport problem; even in the one-dimensional setting, not much is known about the following problem (see, e.g., [11], [23], [24], [29]). Continuing from Conjectures 1 and 2, we propose the following problems:

Problem 3 (Multi-marginal case). If a martingale (X1, X2, ..., Xn) optimizes E_P c(X1, X2, ..., Xn), where c(x1, ..., xn) : Rd × ... × Rd → R is an appropriate path-dependent cost function and the marginals Xi ∼ µi are known, what can one say about the structure of the martingale (X1, X2, ..., Xn)?

Even further, I am interested in the case where a continuum of marginals is imposed. We note that the classical optimal transport problem with multiple or a continuum of marginals has recently been intensively studied by Gangbo, Swiech, Carlier, Ekeland, Agueh, Pass, Ghoussoub, Kim and others, but the corresponding martingale transport problem is wide open:

Problem 4 (Continuum-marginal case). If a continuum of probabilities (µt)t∈[0,1] (in the right convex order) is given, what can one say about the structure of a martingale (Xt)t∈[0,1] which solves the optimization problem with respect to an appropriate path-dependent cost function and has (µt)t∈[0,1] as its marginals?

Meanwhile, many important results remain hidden in the Skorokhod embedding problem, especially in higher dimensions. Inspired by the result of Chapter 3, we are working on the following more ambitious problem:

Problem 5. Without assuming any symmetry on the marginals µ, ν in Rd, show that every optimal stopping time is of non-randomized type (i.e. the stopping rule can be determined by the Brownian path itself), and that the optimal time is unique.

Even further, while working on the above projects, we see that a general unifying theory can be built, in which each extremal property in the above problems appears as a special case. One of the most important cases should be the plurisubharmonic martingale transport problem, where the defining constraint (1.2.1) holds with respect to the cone of plurisubharmonic functions on Cd, which plays a key role in the theory of several complex variables.
Then, we ask:

Problem 6. If π is an optimal plurisubharmonic martingale transport with respect to an appropriate cost functional, what kind of extremal property must it exhibit?

Problem 7. What is an equivalent formulation of plurisubharmonic martingale transport in terms of Brownian motion, and what would it tell us about the theory of several complex variables?

A final remark is that these optimal martingale problems need not be considered only on Rd, but on other interesting spaces as well, for example on Riemannian manifolds and even on infinite-dimensional spaces. Countless interesting results await discovery.

Bibliography

[1] L. Ambrosio and A. Pratelli. Existence and stability results in the L1 theory of optimal transportation. Optimal Transportation and Applications, Lecture Notes in Mathematics, Vol. 1813, Springer, Berlin Heidelberg (2003) 123-160.
[2] L. Ambrosio, B. Kirchheim, and A. Pratelli. Existence of optimal transport maps for crystalline norms. Duke Mathematical Journal, Vol. 125, No. 2 (2004) 207-241.
[3] M. Beiglböck, P. Henry-Labordere, and F. Penkner. Model-independent bounds for option prices: a linear programming approach. Finance and Stochastics, 17 (2013) 477-501.
[4] M. Beiglböck, P. Henry-Labordere, and N. Touzi. Monotone martingale transport plans and Skorohod embedding. Preprint (2015).
[5] M. Beiglböck and N. Juillet. On a problem of optimal transport under marginal martingale constraints. http://arxiv.org/abs/1208.1509 (2014).
[6] M. Beiglböck, A.M.G. Cox, and M. Huesmann. Optimal transport and Skorokhod embedding. http://arxiv.org/abs/1307.3656, 2014.
[7] S. Bianchini and F. Cavalletti. The Monge problem for distance cost in geodesic spaces. Comm. Math. Phys., Vol. 318, No. 3 (2013) 615-673.
[8] Y. Brenier. Polar factorization and monotone rearrangement of vector-valued functions. Communications on Pure and Applied Mathematics, Vol. 44, No. 4 (1991) 375-417.
[9] S. Bu and W. Schachermayer. Approximation of Jensen measures by image measures under holomorphic functions and applications. Trans. Amer. Math. Soc., 331 (1992) 585-608.
[10] G. Choquet. Forme abstraite du théorème de capacitabilité. Ann. Inst. Fourier (Grenoble), 9 (1959) 83-89.
[11] A.M.G. Cox, J. Obłój, and N. Touzi. The Root solution to the multi-marginal embedding problem: an optimal stopping and time-reversal approach. http://arxiv.org/abs/1505.03169 (2015).
[12] Y. Dolinsky and H.M. Soner. Martingale optimal transport and robust hedging in continuous time. Probab. Theory Relat. Fields, 160 (2014) 391-427.
[13] Y. Dolinsky and H.M. Soner. Martingale optimal transport in the Skorokhod space. Stochastic Processes and their Applications, 125(10) (2015) 3893-3931.
[14] G.A. Edgar. Complex martingale convergence. Banach Spaces, Lecture Notes in Mathematics, Vol. 1166, 38-59.
[15] G.A. Edgar. Analytic martingale convergence. Journal of Functional Analysis, Vol. 69, No. 2 (1986) 268-280.
[16] P. Fischer and J.A.R. Holbrook. Balayage defined by the nonnegative convex functions. Proc. Amer. Math. Soc., 79(3) (1980) 445-448.
[17] A. Galichon, P. Henry-Labordere, and N. Touzi. A stochastic control approach to no-arbitrage bounds given marginals, with an application to lookback options. Annals of Applied Probability, Vol. 24, No. 1 (2014) 312-336.
[18] T.W. Gamelin. Uniform Algebras and Jensen Measures. Cambridge University Press, 1979.
[19] W. Gangbo and R.J. McCann. The geometry of optimal transportation. Acta Math., Vol. 177, No. 2 (1996) 113-161.
[20] N. Ghoussoub, Y-H. Kim, and T. Lim. Optimal Skorokhod embedding between radially symmetric marginals in general dimensions. Preprint (2015).
[21] N. Ghoussoub, Y-H. Kim, and T. Lim. Structure of optimal martingale transport in general dimensions. http://arxiv.org/abs/1508.01806 (2015).
[22] N. Ghoussoub and B. Maurey. Plurisubharmonic martingales and barriers in complex quasi-Banach spaces. Annales de l'institut Fourier, Vol. 39, No. 4 (1989) 1007-1060.
[23] G. Guo, X. Tan, and N. Touzi. Optimal Skorokhod embedding under finitely-many marginal constraints. http://arxiv.org/abs/1506.04063 (2015).
[24] F. Hirsch, C. Profeta, B. Roynette, and M. Yor. Peacocks and Associated Martingales, with Explicit Constructions. Bocconi & Springer Series, Vol. 3. Springer, Milan (2011).
[25] D. Hobson. Robust hedging of the lookback option. Finance and Stochastics, 2 (1998) 329-347.
[26] D. Hobson. The Skorokhod embedding problem and model-independent bounds for option prices. In Paris-Princeton Lectures on Mathematical Finance 2010, Lecture Notes in Math., Vol. 2003, Springer, Berlin (2011) 267-318.
[27] D. Hobson and M. Klimmek. Robust price bounds for the forward starting straddle. Finance and Stochastics, Vol. 19, No. 1 (2014) 189-214.
[28] D. Hobson and A. Neuberger. Robust bounds for forward start options. Mathematical Finance, Vol. 22, No. 1 (2012) 31-56.
[29] S. Kallblad, X. Tan, and N. Touzi. Optimal Skorokhod embedding given full marginals and Azema-Yor peacocks. http://arxiv.org/abs/1503.00500 (2015).
[30] H. Kellerer. Duality theorems for marginal problems. Z. Wahrsch. Verw. Gebiete, 67(4) (1984) 399-432.
[31] T. Lim. Optimal martingale transport between radially symmetric marginals in general dimensions. Preprint, 2015.
[32] G. Monge. Mémoire sur la théorie des déblais et des remblais. Histoire de l'Académie Royale des Sciences de Paris, 1781.
[33] R.T. Rockafellar. Characterization of the subdifferentials of convex functions. Pacific J. Math., 17 (1966) 497-510.
[34] L. Rüschendorf and S.T. Rachev. A characterization of random variables with minimum L2-distance. J. Multivariate Anal., 32(1) (1990) 48-54.
[35] L. Rüschendorf and L. Uckelmann. Numerical and analytical results for the transportation problem of Monge-Kantorovich. Metrika, 51(3) (2000) 245-258 (electronic).
[36] V. Strassen. The existence of probability measures with given marginals. Ann. Math. Statist., 36 (1965) 423-439.
[37] C. Villani. Topics in Optimal Transportation. Graduate Studies in Mathematics, Vol. 58. American Mathematical Society, Providence, RI, 2003.
[38] C. Villani. Optimal Transport: Old and New. Grundlehren der mathematischen Wissenschaften, Vol. 338. Springer, 2009.
[39] D. Zaev. On the Monge-Kantorovich problem with additional linear constraints. http://arxiv.org/abs/1404.4962v1, 2014.

Appendix A

A proof of Proposition 3.1.3

Here we prove the following proposition.

Proposition A.0.1. Let ψ ∈ SH(x) and suppose supp(ψ) is compact. Then there exists a stopping time τ such that ψ = Law(Bxτ).

We introduce the notion of a spherical martingale.

Definition A.0.2. Let U be the uniform probability measure on the unit sphere in Rd. Let (Xi)∞i=1 be i.i.d. random variables on some probability space (Ω, P) with distribution U. We define a spherical martingale by

F0 = x ∈ Rd,   (A.0.1)
Fn(X1, ..., Xn) − Fn−1(X1, ..., Xn−1) = rn(X1, ..., Xn−1) · Xn.   (A.0.2)

Thus, at each time n, a particle splits uniformly onto a sphere of radius rn. (Compare this definition with the analytic martingales in [9].) A simple but important observation is that the push-forward of δx by the spherical martingale Fn can be obtained exactly as Law(Bxτ), where τ is defined as follows: τ1 is the first hitting time of Bx on the sphere S(x, r1) of center x and radius r1. Now Bτ1(ω) = x1 ∈ S(x, r1), and τ2 is defined as the first hitting time of Bx1 on the sphere S(x1, r2). We inductively define τ1 ≤ τ2 ≤ ..., and we see that Fn(P) = Law(Bxτn).

We now reproduce the proof of Proposition 2.1 in [9] for our spherical martingales.

Proposition A.0.3. Let O be an open set in Rd and f : O → R ∪ {−∞} be an upper semicontinuous function.
Define f_0 = f and, for n ≥ 1,

    f_n(x) = inf { ∫ f_{n−1}(x + ry) dU(y) },

where the infimum is taken over all r ≥ 0 such that {x + rB} ⊆ O. Then (f_n)_{n=0}^∞ decreases pointwise to the largest subharmonic function f̂ on O dominated by f.

Proof. The proof proceeds as in [9], but we provide it here for completeness. The sequence (f_n)_{n=0}^∞ obviously decreases. We show upper semicontinuity. Suppose f_{n−1} is upper semicontinuous, and let (x_k)_{k=0}^∞ in O be such that lim_{k→∞} x_k = x_0. If r ≥ 0 is such that {x_0 + rB} ⊆ O, where B is the closed unit ball, then there is k_0 such that {x_k + rB} ⊆ O for k ≥ k_0. The upper semicontinuous function f_{n−1} is bounded above on the relatively compact set ∪_{k=k_0}^∞ {x_k + rB} and, for every z ∈ B, f_{n−1}(x_0 + rz) ≥ lim sup_{k→∞} f_{n−1}(x_k + rz). Hence, by Fatou's lemma,

    ∫ f_{n−1}(x_0 + ry) dU(y) ≥ lim sup_{k→∞} ∫ f_{n−1}(x_k + ry) dU(y) ≥ lim sup_{k→∞} f_n(x_k).

Thus f_n(x_0) ≥ lim sup_{k→∞} f_n(x_k), showing upper semicontinuity of f_n, and hence of f̂ as a decreasing limit of upper semicontinuous functions.

Let g be subharmonic on O with g ≤ f, and suppose g ≤ f_{n−1}. Then, for {x_0 + rB} ⊆ O,

    ∫ f_{n−1}(x_0 + ry) dU(y) ≥ ∫ g(x_0 + ry) dU(y) ≥ g(x_0),

so f_n(x_0) ≥ g(x_0), and hence f̂ ≥ g. Finally, for {x_0 + rB} ⊆ O, by monotone convergence,

    f̂(x_0) = lim_{n→∞} f_n(x_0) ≤ lim_{n→∞} ∫ f_{n−1}(x_0 + ry) dU(y) = ∫ f̂(x_0 + ry) dU(y).

This shows that f̂ is subharmonic, and the proof is complete.

Following G. A. Edgar [14], one can rewrite the definition of f_n as

    f_n(x) = inf { E[f(F_n)] : (F_i)_{i=0}^n is a spherical martingale valued in O with F_0 = x }.    (A.0.3)

The following theorem and its proof are a close analogue of Theorem (A) in [9].

Theorem A.0.4. Any SH-measure in R^d can be approximated by push-forwards of spherical martingales.

Proof. Let ψ ∈ SH(x), and define the set of push-forward measures

    Φ = { F_n(P) : (F_i)_{i=0}^n is a spherical martingale with F_0 = x }.

Then Φ is convex: let (F′_i)_{i=0}^n and (F″_i)_{i=0}^m be two spherical martingales, and we may assume n = m.
Define a spherical martingale (F_i)_{i=0}^{n+1} by letting F_0 = F_1 = x and, for 1 ≤ i ≤ n,

    F_{i+1}(X_1, X_2, ..., X_{i+1}) = F′_i(X_2, ..., X_{i+1})   if X_1 lies in the upper hemisphere,
    F_{i+1}(X_1, X_2, ..., X_{i+1}) = F″_i(X_2, ..., X_{i+1})   if X_1 lies in the lower hemisphere.

Then clearly F_{n+1}(P) = {F′_n(P) + F″_n(P)}/2, hence Φ is convex.

Now if the theorem were false, then by the Hahn-Banach theorem we could find a continuous and bounded function f on O and real numbers a < b such that

    ∫ f dψ ≤ a, while ∫ f ∘ F_n dP ≥ b for every F_n(P) ∈ Φ.

Hence, by (A.0.3), we have f̂(x) ≥ b; but then a ≥ ∫ f dψ ≥ ∫ f̂ dψ ≥ f̂(x) ≥ b, a contradiction.

Proof of Proposition A.0.1: For ψ ∈ SH(x), Theorem A.0.4 gives a sequence {ψ_n} ⊆ Φ such that ψ_n → ψ. We know that ψ_n = Law(B^x_{τ_n}) for a sequence of stopping times τ_n. Since supp(ψ) is compact, the laws Law(B^x_{τ_n}) are also compactly supported; in particular, Eτ_n = E|B^x_{τ_n}|^2 ≤ V for some constant V. Then, by following the proof of Theorem 4.28 in [6], we see that the sequence (τ_n) is tight, and hence by Prokhorov's theorem it has a subsequence (τ_k) which converges to a (randomized) stopping time τ. Now let f be a continuous and bounded function. Then the function (ω, t) → (f ∘ B^x)(ω, t) is continuous and bounded on C(R_+) × R_+, hence Ef(B^x_{τ_k}) → Ef(B^x_τ), which means that Law(B^x_{τ_k}) → Law(B^x_τ). Therefore, Law(B^x_τ) = ψ.

Appendix B

Continuity of the functions in Theorem 3.5.2

Here, we show the continuity of the functions appearing in Theorem 3.5.2.

Lemma B.0.1. The functions G(x) and G(x) in Theorem 3.5.2 are continuous.

Proof. We will use the following elementary fact:

(∗) A sequence {a_n}_{n=1}^∞ of real numbers converges to a if and only if every subsequence {a_k} of {a_n} has a further subsequence {a_l} that converges to a.

Now let x_n → x in R^d, and define

    C(x) := G(x) + |x − y|^α = ∫ |x − z|^α dψ_y(z) for any ψ_y ∈ ℜ(x, ϕ_y).

Let a_n = C(x_n) and a = C(x). We need to show that a_n → a as n → ∞.

Choose any subsequence {a_k} of {a_n}, and choose a corresponding sequence of measures ψ_k ∈ ℜ(x_k, ϕ_y). By compactness, {ψ_k} has a subsequence {ψ_l} which converges to some ψ.
Note that ψ ∈ SH(y) and ψ ≅_R ϕ_y by weak convergence. Now

    ∫ |x_l − z|^α dψ_l(z) − ∫ |x − z|^α dψ(z)
        = [ ∫ (|x_l − z|^α − |x − z|^α) dψ_l(z) ] + [ ∫ |x − z|^α dψ_l(z) − ∫ |x − z|^α dψ(z) ].

The first bracket tends to zero as l → ∞, since |x_l − z|^α − |x − z|^α → 0 uniformly on every compact set in R^d, and the second bracket tends to zero since ψ_l → ψ. Now we claim that

    ∫ |x − z|^α dψ(z) = C(x) = a, i.e. ψ ∈ ℜ(x, ϕ_y).    (B.0.1)

If not, there would exist ρ ∈ ℜ(x, ϕ_y) such that

    ∫ |x − z|^α dψ(z) > ∫ |x − z|^α dρ(z), and hence
    ∫ |x_l − z|^α dψ_l(z) > ∫ |x_l − z|^α dρ(z) for all large l,

a contradiction since ψ_l ∈ ℜ(x_l, ϕ_y). Therefore, a_n → a, and the lemma follows.
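As an aside, the splitting construction of Definition A.0.2 is easy to simulate. The sketch below is only an illustration, not part of the thesis: it fixes constant radii r_1 = 1 and r_2 = 1/2 (whereas in general r_n may depend on X_1, ..., X_{n−1}), runs many paths of a two-step spherical martingale in R^3, and checks the martingale property E[F_n] = x empirically by Monte Carlo.

```python
import math
import random

def uniform_sphere(rnd, d):
    # Uniform point on the unit sphere in R^d: normalize a standard Gaussian vector.
    v = [rnd.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def spherical_martingale_path(x, radii, rnd):
    # One path of the spherical martingale (A.0.1)-(A.0.2):
    # F_0 = x and F_n - F_{n-1} = r_n * X_n with X_n uniform on the unit sphere.
    # Constant radii are a simplification of the general r_n(X_1, ..., X_{n-1}).
    F = list(x)
    for r in radii:
        X = uniform_sphere(rnd, len(F))
        F = [Fc + r * Xc for Fc, Xc in zip(F, X)]
    return F

rnd = random.Random(0)
x, radii = (0.0, 0.0, 0.0), (1.0, 0.5)
samples = [spherical_martingale_path(x, radii, rnd) for _ in range(20000)]
mean = [sum(s[i] for s in samples) / len(samples) for i in range(len(x))]
print([round(m, 2) for m in mean])  # the empirical mean stays near x = 0
```

Each sample lies at distance between r_1 − r_2 and r_1 + r_2 from x, matching the picture of a particle splitting onto successive spheres; Law(F_2) is then exactly Law(B^x_{τ_2}) for the two-stage hitting time described after Definition A.0.2.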

