On the Kernel Average for n Functions

by

Sarah Michelle Moffat

B.Sc., The University of British Columbia, 2007

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The College of Graduate Studies (Mathematics)

THE UNIVERSITY OF BRITISH COLUMBIA (Okanagan)

December 2009

© Sarah Michelle Moffat 2009

Abstract

After an introduction to Hilbert spaces and convex analysis, the proximal average is studied and two smooth operators are provided. The first is a new version of an operator previously supplied by Goebel, while the second one is new and uses the proximal average of a function and a quadratic to find a smooth approximation of the function. Then, the kernel average of two functions is studied, and a reformulation of the proximal average is used to extend the definition of the kernel average to allow for any number of functions. The Fenchel conjugate of this new kernel average is then examined by calculating the conjugate for two specific kernel functions that represent two of the simplest cases that could be considered. A closed-form solution was found for the conjugate of the first kernel function, and it was rewritten in three equivalent forms. A solution was also found for the conjugate of the second kernel function, but the two solutions do not have the same form, which suggests that a general solution for the conjugate of any kernel function will not be found.

Table of Contents

Abstract
Table of Contents
List of Figures
Acknowledgements
Dedication
1 Introduction
2 Hilbert Spaces
  2.1 General Vector Spaces
  2.2 Inner Product Spaces
  2.3 Facts on Maximization and Minimization
3 Convex Analysis
  3.1 Convex Sets
    3.1.1 Cones
    3.1.2 Interiors of Sets
  3.2 Convex Functions
    3.2.1 Subgradients
    3.2.2 Fenchel Conjugate
    3.2.3 Epi-addition and Epi-multiplication
  3.3 Proximity Operators
  3.4 Minimax Theory
4 The Proximal Average
  4.1 Definitions
  4.2 Properties
    4.2.1 Fenchel Conjugate
  4.3 Applications
    4.3.1 Self-dual Smooth Approximations
5 The Kernel Average of Two Functions
  5.1 Definition
  5.2 Properties
6 The Kernel Average of n Functions
  6.1 Motivation
  6.2 The Kernel Average Conjugate
  6.3 P-norm Kernel Conjugate for General p Case when n = 3
  6.4 P-norm Kernel Conjugate when p = 3/2, q = 3, and n = 3
    6.4.1 Case 1: $x \ge y_1$, $x \ge y_1 + y_2$, $x \ge 0$
    6.4.2 Case 2: $x \le y_1$, $x \ge y_1 + y_2$, $x \ge 0$
    6.4.3 Case 3: $x \ge y_1$, $x \le y_1 + y_2$, $x \ge 0$
    6.4.4 Case 4: $x \ge y_1$, $x \ge y_1 + y_2$, $x \le 0$
    6.4.5 Case 5: $x \le y_1$, $x \le y_1 + y_2$, $x \ge 0$
    6.4.6 Case 6: $x \le y_1$, $x \ge y_1 + y_2$, $x \le 0$
    6.4.7 Case 7: $x \ge y_1$, $x \le y_1 + y_2$, $x \le 0$
    6.4.8 Case 8: $x \le y_1$, $x \le y_1 + y_2$, $x \le 0$
    6.4.9 Combining the Eight Cases
    6.4.10 Bringing It All Together
7 Conclusion
Bibliography
Appendices
A Maple Code

List of Figures

3.1 Epigraph of a function
6.1 Case 1 summary
6.2 Case 2 summary
6.3 Case 3 summary
6.4 Case 4 summary
6.5 Case 5 summary
6.6 Case 6 summary
6.7 Case 7 summary
6.8 Case 8 summary
6.9 Overview of the eight cases
6.10 The six regions
6.11 Plots of $g_2^*(y_1, y_2, -y_1 - y_2)$

Acknowledgements

First, I would like to thank my supervisors Heinz Bauschke and Shawn Wang for their excitement about mathematics and their flexibility when it came to fitting mathematics into my life. I'd also like to thank Liangjin Yao and Mason Macklem for all their help, and Richard Taylor, who first inspired me to pursue graduate studies.

Dedication

For my wonderful husband Jim and our daughter Zoe, who was born during the production of this thesis.

Chapter 1  Introduction

When averaging functions, the most natural place to start is the arithmetic average, defined by
\[ A := \lambda f_1 + (1 - \lambda) f_2, \tag{1.1} \]
where $0 \le \lambda \le 1$. This works well if both functions $f_1$ and $f_2$ are everywhere defined. But if the functions are not differentiable at some points, or if their domains do not intersect, then the arithmetic average will not be differentiable, or will be $+\infty$ everywhere. In [2] Bauschke, Lucet, and Trienis define a new average, the proximal average, and discuss the benefits of using the proximal average for these types of cases.
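As a minimal numeric sketch of this failure mode (not part of the thesis; the two quadratic pieces and their domains are purely illustrative), consider two convex functions with disjoint effective domains: the average (1.1) is then $+\infty$ at every point.

import numpy as np

# f1 and f2 have disjoint effective domains, so for every x the
# arithmetic average lam*f1(x) + (1-lam)*f2(x) is identically +inf.
def f1(x):                      # x^2 on [-1, 0], +inf elsewhere
    return x**2 if -1 <= x <= 0 else np.inf

def f2(x):                      # x^2 on [1, 2], +inf elsewhere
    return x**2 if 1 <= x <= 2 else np.inf

lam = 0.5
xs = np.linspace(-2.0, 3.0, 11)
print([lam * f1(x) + (1 - lam) * f2(x) for x in xs])   # every entry is inf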
In particular, the proximal average produces a continuous and differentiable function even if the original functions are non-smooth and their domains do not intersect, provided at least one of the functions is differentiable with a full domain. From the definition of the proximal average, the more general kernel average [3] was defined for averaging two functions based on a kernel function. Both the arithmetic and proximal averages can be derived as special cases of the kernel average. This thesis extends the definition of the kernel average to an arbitrary number of functions and examines the convex conjugate of the kernel average for $n$ functions.

Chapter 2  Hilbert Spaces

In this chapter we give some background material on inner product spaces. The notion of vector spaces and their extensions is central to much of the following thesis, so a quick reminder of some concepts from linear algebra is also included to refresh the reader's memory. For more on vector spaces see [7, Chapters 1 and 7] or [11, Chapter 4].

2.1 General Vector Spaces

Definition 2.1.1 (Vector Space) A vector space consists of a set $V$ with elements called vectors, along with two operations such that the following properties hold:

(1) Vector addition: Let $u, v \in V$; then there is a vector $u + v \in V$ and the following are satisfied.
  (i) Commutativity: $u + v = v + u$, $\forall u, v \in V$.
  (ii) Associativity: $u + (v + w) = (u + v) + w$, $\forall u, v, w \in V$.
  (iii) Zero: there is a vector $0 \in V$ such that $0 + u = u = u + 0$, $\forall u \in V$.
  (iv) Inverses: for each $u \in V$, there is a vector $-u$ such that $u + (-u) = 0$.

(2) Scalar multiplication: Let $u, v \in V$ and $r, s \in \mathbb{R}$; then the following are satisfied.
  (i) Left distributivity: $(r + s)v = rv + sv$.
  (ii) Associativity: $r(sv) = (rs)v$.
  (iii) Right distributivity: $r(u + v) = ru + rv$.
  (iv) Neutral element: $1v = v$.
  (v) Absorbing element: $0v = 0$.
  (vi) Inverse neutral element: $(-1)v = -v$.

Example 2.1.2 The space $\mathbb{R}^n$ consists of vectors $v = (v_1, \dots, v_n)$ with $v_i \in \mathbb{R}$ for $1 \le i \le n$ and operations defined by
\[ (u_1, \dots, u_n) + (v_1, \dots, v_n) := (u_1 + v_1, \dots, u_n + v_n), \qquad r(v_1, \dots, v_n) := (rv_1, \dots, rv_n), \]
where $r \in \mathbb{R}$.

Definition 2.1.3 A subspace of a vector space $V$ is a subset $W$ of $V$ with $W \neq \emptyset$ such that $W$ is a vector space using the operations of $V$.

Definition 2.1.4 Let $S \subseteq V$. The span of $S$ is the smallest subspace containing $S$, and is denoted by $\operatorname{span} S$.

Fact 2.1.5 The subspace spanned by a nonempty set $S \subseteq V$ consists of all linear combinations of the elements of $S$.

Definition 2.1.6 A linear transformation $A$ from a vector space $V$ to a vector space $W$ is a function $A : V \to W$ satisfying
\[ A(r_1 v_1 + r_2 v_2) = r_1 A v_1 + r_2 A v_2, \quad \forall v_1, v_2 \in V \text{ and } \forall r_1, r_2 \in \mathbb{R}. \]

2.2 Inner Product Spaces

We recall the definitions of a norm and an inner product.

Definition 2.2.1 A norm on a vector space $V$ is a function $\|\cdot\| : V \to [0, +\infty[$ with the following properties.
(i) Positive definite: $\|x\| = 0$ if, and only if, $x = 0$.
(ii) Homogeneous: $\|\alpha x\| = |\alpha|\,\|x\|$, $\forall x \in V$ and $\alpha \in \mathbb{R}$.
(iii) Triangle inequality: $\|x + y\| \le \|x\| + \|y\|$, $\forall x, y \in V$.

Definition 2.2.2 An inner product on a vector space $V$ is a function $\langle\cdot,\cdot\rangle : V \times V \to \mathbb{R}$ satisfying the following properties.
(i) Positive definite: $\langle x, x\rangle \ge 0$, $\forall x \in V$, and $\langle x, x\rangle = 0$ only if $x = 0$;
(ii) Symmetry: $\langle x, y\rangle = \langle y, x\rangle$, $\forall x, y \in V$;
(iii) Bilinearity: $\langle \alpha x + \beta y, z\rangle = \alpha\langle x, z\rangle + \beta\langle y, z\rangle$, $\forall x, y, z \in V$ and $\alpha, \beta \in \mathbb{R}$.

We call a vector space paired with an inner product, and the norm induced by $\|x\| := \langle x, x\rangle^{1/2}$, an inner product space.
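The following small sketch (not part of the thesis; it assumes NumPy and uses the standard dot product on $\mathbb{R}^5$) spot-checks that the induced norm satisfies the axioms of Definition 2.2.1, together with the Cauchy–Schwarz inequality.

import numpy as np

rng = np.random.default_rng(0)
inner = lambda x, y: float(x @ y)
norm = lambda x: inner(x, x) ** 0.5              # ||x|| = <x, x>^(1/2)

for _ in range(1000):
    x, y = rng.normal(size=5), rng.normal(size=5)
    a = float(rng.normal())
    assert abs(norm(a * x) - abs(a) * norm(x)) < 1e-9     # homogeneity
    assert norm(x + y) <= norm(x) + norm(y) + 1e-12       # triangle inequality
    assert abs(inner(x, y)) <= norm(x) * norm(y) + 1e-12  # Cauchy-Schwarz
print("norm axioms and Cauchy-Schwarz hold on all samples")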
Definition 2.2.3 In a normed vector space $(V, \|\cdot\|)$, a sequence $(v_n)_{n=1}^\infty$ converges to $v \in V$ if $\lim_{n\to\infty}\|v_n - v\| = 0$.

Definition 2.2.4 A sequence $(v_n)_{n=1}^\infty$ is called a Cauchy sequence if for every $\varepsilon > 0$ there is an integer $N > 0$ such that $\|v_n - v_m\| < \varepsilon$ for all $n, m \ge N$.

Remark 2.2.5 While every convergent sequence is a Cauchy sequence, the converse is not true. For example, consider the sequence
\[ 1,\ \frac{14}{10},\ \frac{141}{100},\ \frac{1414}{1000},\ \frac{14142}{10000},\ \cdots \]
in $\mathbb{Q}$ approaching $\sqrt{2}$. This sequence does not converge in $\mathbb{Q}$.

Definition 2.2.6 An inner product space $V$ is complete if every Cauchy sequence in $V$ converges to some vector $v \in V$.

Definition 2.2.7 A complete inner product space is called a Hilbert space.

Example 2.2.8 The following inner product spaces are Hilbert spaces:
(i) [7, Theorem 4.2.5] $\mathbb{R}^n$ paired with the inner product $\langle x, y\rangle = x_1 y_1 + \cdots + x_n y_n$.
(ii) [7, Theorem 7.5.8] The space $\ell^2$, consisting of all sequences $x = (x_n)_{n=1}^\infty$ such that $\|x\|_2 := \big(\sum_{n=1}^\infty x_n^2\big)^{1/2}$ is finite, with the inner product $\langle x, y\rangle = \sum_{n=1}^\infty x_n y_n$.
(iii) [10, Example 2.2-7] The space $L^2[0,1]$, consisting of all (equivalence classes of) functions $f$ on $[0,1]$ such that $\|f\|_2 := \big(\int_0^1 |f(x)|^2\,dx\big)^{1/2} < +\infty$, paired with the inner product $\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx$.

Definition 2.2.9 [4, Section 1.1] Let $X, V$ be vector spaces and let $A : X \to V$ be a linear operator. The corresponding adjoint linear transformation from $V$ to $X$ is the unique operator $A^*$ such that the identity
\[ \langle Ax, y\rangle = \langle x, A^* y\rangle \tag{2.1} \]
holds for all $x \in X$ and $y \in V$.

Example 2.2.10 Let $X$ be a vector space and $\lambda_1, \dots, \lambda_n \in \mathbb{R}$. Let $A : X^n \to X$ be the linear operator defined by $x = (x_1, \dots, x_n) \mapsto \lambda_1 x_1 + \cdots + \lambda_n x_n$. Then the adjoint of $A$ is the operator $A^* : X \to X^n$ such that $z \mapsto (\lambda_1 z, \dots, \lambda_n z)$.

Proof. Let $x \in X^n$ and $y \in X$. Then
\[ \langle Ax, y\rangle = \langle \lambda_1 x_1 + \cdots + \lambda_n x_n, y\rangle = \sum_{i=1}^n \lambda_i\langle x_i, y\rangle. \]
On the other hand,
\[ \langle x, A^* y\rangle = \langle (x_1, \dots, x_n), (\lambda_1 y, \dots, \lambda_n y)\rangle = \langle x_1, \lambda_1 y\rangle + \cdots + \langle x_n, \lambda_n y\rangle = \sum_{i=1}^n \lambda_i\langle x_i, y\rangle. \]
Since $\langle Ax, y\rangle = \langle x, A^* y\rangle$ for all $x \in X^n$, $y \in X$, and the adjoint is unique, $A^*$ is the adjoint of $A$. ∎

2.3 Facts on Maximization and Minimization

In this section, we assume that $X$ is an inner product space.

Fact 2.3.1 Let $x \in X$. Then
\[ \sup_{y\in X}\langle x, y\rangle = \begin{cases} 0, & \text{if } x = 0; \\ +\infty, & \text{otherwise.} \end{cases} \]

Fact 2.3.2 Let $f$ and $g$ be functions from $X$ to $]-\infty, +\infty]$. Then
(i) $\sup_{x,y\in X}(f(x) + g(y)) = \sup_{x\in X} f(x) + \sup_{y\in X} g(y)$;
(ii) $\inf_{x,y\in X}(f(x) + g(y)) = \inf_{x\in X} f(x) + \inf_{y\in X} g(y)$.

Fact 2.3.3 Let $f$ be a function of two variables $x, y \in X$. Then
(i) $\sup_{x\in X}\sup_{y\in X} f(x,y) = \sup_{y\in X}\sup_{x\in X} f(x,y)$;
(ii) $\inf_{x\in X}\inf_{y\in X} f(x,y) = \inf_{y\in X}\inf_{x\in X} f(x,y)$.

These facts will be used frequently later on.

Chapter 3  Convex Analysis

Let $\mathcal{H}$ denote a real Hilbert space with inner product $\langle\cdot,\cdot\rangle$, norm $\|\cdot\|$, and identity mapping $\operatorname{Id}$. We'll now introduce some necessary convex analysis; for a more in-depth look at convex analysis please see [12].

3.1 Convex Sets

Definition 3.1.1 A set $A \subseteq \mathcal{H}$ is affine if $x \in A$, $y \in A$, and $\theta \in \mathbb{R}$ imply that $\theta x + (1-\theta)y \in A$.

Definition 3.1.2 A set $C \subseteq \mathcal{H}$ is convex if $x \in C$, $y \in C$, and $0 \le \theta \le 1$ imply that $\theta x + (1-\theta)y \in C$. This means that for any two points in a convex set $C$, the line segment joining the two points is also contained in $C$.

Example 3.1.3 The following sets are convex:
(i) Affine sets;
(ii) Halfspaces: a set $H$ is a halfspace if for some $b \in \mathcal{H}$ and $\beta \in \mathbb{R}$, $H = \{x \in \mathcal{H} : \langle x, b\rangle \le \beta\}$;
(iii) Closed balls of radius $r > 0$ centered at a point $x_c$: $B(x_c, r) := \{x \in \mathcal{H} : \|x_c - x\| \le r\}$.

The following definitions describe some important properties of sets that are frequently used in convex analysis.

Definition 3.1.4 The indicator function of a set $C \subseteq \mathcal{H}$ is the function $\iota_C : \mathcal{H} \to [0, +\infty]$ defined by
\[ \iota_C(x) = \begin{cases} 0, & \text{if } x \in C; \\ +\infty, & \text{otherwise.} \end{cases} \]

Definition 3.1.5 The support function of a set $C \subseteq \mathcal{H}$ is the function $\sigma_C : \mathcal{H} \to ]-\infty, +\infty]$ defined by $\sigma_C(x) = \sup_{u\in C}\langle x, u\rangle$.

3.1.1 Cones

A set $C$ is called a cone if for every $x \in C$ and $\theta > 0$ we have $\theta x \in C$. A set $C$ is a convex cone if it is both convex and a cone; that is, for every $x_1, x_2 \in C$ and $\theta_1, \theta_2 \ge 0$, $\theta_1 x_1 + \theta_2 x_2 \in C$.

Definition 3.1.6 The conical hull of a set $C \subseteq \mathcal{H}$ is the set
\[ \operatorname{cone} C = \bigcup_{\lambda > 0}\{\lambda x : x \in C\}. \]

Fact 3.1.7 [5, Section 2.1.5] The conical hull of $C$ is the smallest convex cone that contains the set $C$.

Example 3.1.8 Let $D = \{z = (x, y, 0) \in \mathbb{R}^3 : \|z\| \le 1\}$ be the closed unit disc in $\mathbb{R}^3$. Then $\operatorname{cone} D = \mathbb{R}^2 \times \{0\}$.

Definition 3.1.9 Let $C$ be a nonempty convex set in $\mathcal{H}$. We say that $C$ recedes in the direction of $y$, $y \neq 0$, if and only if $x + \lambda y \in C$ for every $\lambda \ge 0$ and $x \in C$. Directions in which $C$ recedes are also called directions of recession.

Definition 3.1.10 The recession cone of a set $C$ is the set of all vectors $y \in \mathcal{H}$ such that for each $x \in C$, $x + \lambda y \in C$ for all $\lambda \ge 0$. The recession cone of $C$ is denoted by $0^+ C$.

Fact 3.1.11 [12, Theorem 8.1] Let $C$ be a nonempty convex set. Then the recession cone $0^+ C$ is a convex cone containing the origin.

3.1.2 Interiors of Sets

Definition 3.1.12 The interior of a set $C \subseteq \mathcal{H}$ is the set
\[ \operatorname{int} C = \{x \in \mathcal{H} : \exists\,\varepsilon > 0 \text{ such that } B(x, \varepsilon) \subseteq C\}, \]
where $B(x, \varepsilon) = \{y : \|y - x\| < \varepsilon\}$.

Definition 3.1.13 The relative interior of a convex set $C \subseteq \mathcal{H}$ is the set
\[ \operatorname{ri} C = \{x \in \mathcal{H} : \operatorname{cone}(C - x) = \operatorname{span}(C - x)\}. \]

The following example illustrates the need to distinguish between the interior and the relative interior of a set.

Example 3.1.14 Consider again the closed disc $D = \{z = (x, y, 0) \in \mathbb{R}^3 : \|z\| \le 1\}$. We get that $\operatorname{int} D = \emptyset$, since no ball in $\mathbb{R}^3$ can be contained in $D$; however, $\operatorname{ri} D = \{z = (x, y, 0) \in \mathbb{R}^3 : \|z\| < 1\}$.

Definition 3.1.15 A point $x$ is a limit point of a subset $C \subseteq \mathcal{H}$ if there is a sequence $(x_n)_{n=1}^\infty$ with $x_n \in C$ such that $x = \lim_{n\to\infty} x_n$. A set $C \subseteq \mathcal{H}$ is closed if it contains all of its limit points.

Definition 3.1.16 The closure of a set $C \subseteq \mathcal{H}$ is the smallest closed set containing $C$, and is denoted by $\operatorname{cl} C$.

3.2 Convex Functions

The effective domain of a function $f : \mathcal{H} \to ]-\infty, +\infty]$ is the set of points
\[ \operatorname{dom} f = \{x \in \mathcal{H} : f(x) < +\infty\}. \tag{3.1} \]
The set of global minimizers of $f$ is denoted by $\operatorname{argmin}_{x\in\mathcal{H}} f(x)$.

Definition 3.2.1 We call a function $f$ proper if it never takes on the value $-\infty$ and is not identically equal to $+\infty$.

Definition 3.2.2 A function $f$ is lower semicontinuous at a point $x_0$ if
\[ \liminf_{x\to x_0} f(x) \ge f(x_0), \]
where $\liminf$ is as defined in [13, Definition 1.5]. The function is said to be lower semicontinuous if it is lower semicontinuous at every point $x_0 \in \mathcal{H}$.

Definition 3.2.3 A function $f$ is coercive if
\[ \lim_{\|x\|\to\infty} f(x) = \infty. \]
A function is supercoercive if
\[ \lim_{\|x\|\to\infty} \frac{f(x)}{\|x\|} = \infty. \]

Definition 3.2.4 The epigraph of a function $f : \mathcal{H} \to ]-\infty, +\infty]$ is the set
\[ \operatorname{epi} f = \{(x, t) \in \mathcal{H} \times \mathbb{R} : f(x) \le t\}. \]
This is illustrated in Figure 3.1.

Definition 3.2.5 A function $f : \mathcal{H} \to ]-\infty, +\infty]$ is convex if $\operatorname{epi} f$ is a convex set. We denote the set of proper, lower semicontinuous, convex functions on $\mathcal{H}$ by $\Gamma_0(\mathcal{H})$.
While Definition 3.2.5 conveniently ties the notion of a convex function to that of a convex set, in practice a convex function is usually recognized via the following characterization.

Figure 3.1: Epigraph of a function [5, Figure 3.5]

Theorem 3.2.6 [12, Theorem 4.1] A function $f : \mathcal{H} \to ]-\infty, +\infty]$ is convex if and only if for all $x, y \in \mathcal{H}$ and $0 < \theta < 1$,
\[ f(\theta x + (1-\theta)y) \le \theta f(x) + (1-\theta)f(y). \]

A function is said to be strictly convex if the inequality in Theorem 3.2.6 is strict; that is, $f(\theta x + (1-\theta)y) < \theta f(x) + (1-\theta)f(y)$ provided that $x \neq y$.

Fact 3.2.7 [14, Theorem 2.5.1(ii) and Proposition 2.5.6] If a function $f$ is both coercive and strictly convex, then it has a unique minimizer $\bar x$; that is, $\operatorname{argmin}_{x\in\mathcal{H}} f(x) = \{\bar x\}$.

Definition 3.2.8 A function $f : \mathcal{H} \to ]-\infty, +\infty]$ is concave if $-f$ is convex.

Definition 3.2.9 A function $f : \mathcal{H} \to ]-\infty, +\infty]$ is affine if $f$ is finite and both convex and concave.

Example 3.2.10 Let $f : \mathcal{H} \to ]-\infty, +\infty]$ be defined by $x \mapsto \langle a, x\rangle - b$, where $a \in \mathcal{H}$ and $b \in \mathbb{R}$. Then $f$ is an affine function.

Proof. Let $x, y \in \mathcal{H}$ and $\theta \in \mathbb{R}$. Then
\[ f(\theta x + (1-\theta)y) = \langle a, \theta x + (1-\theta)y\rangle - b = \theta(\langle a, x\rangle - b) + (1-\theta)(\langle a, y\rangle - b) = \theta f(x) + (1-\theta)f(y). \]
Therefore the conditions for convexity and concavity are both satisfied and $f$ is affine. ∎

Another method of determining whether a function is convex is to check the first- and second-order conditions for convexity.

Fact 3.2.11 (First-order condition) [5, Section 3.1.3] Let $f : \mathbb{R}^n \to ]-\infty, +\infty]$ be a differentiable function; that is, its gradient $\nabla f$ exists at each point of its open domain. Then $f$ is convex if and only if its domain is convex and
\[ f(y) \ge f(x) + \langle \nabla f(x), y - x\rangle \]
holds for all $x, y \in \operatorname{dom} f$. This condition says that for a convex function the first-order Taylor series approximation is a global underestimator of the function, and conversely, if the Taylor approximation is a global underestimator, then the function is convex.

Fact 3.2.12 (Second-order condition) [5, Section 3.1.4] Let $f : \mathbb{R}^n \to ]-\infty, +\infty]$ be a twice differentiable function; that is, its Hessian or second derivative $\nabla^2 f(x)$ exists at each point of its open domain. Then $f$ is convex if and only if $\operatorname{dom} f$ is convex and its Hessian is positive semidefinite: $\nabla^2 f(x) \succeq 0$.

Remark 3.2.13 A matrix $A \in \mathbb{R}^{n\times n}$ is positive semidefinite if $y^T A y \ge 0$ for all $y \in \mathbb{R}^n$; this is denoted by $A \succeq 0$.

Fact 3.2.14 (Composition with an affine mapping) [5, Section 3.2.2] Suppose $f : \mathbb{R}^n \to \mathbb{R}$, $A \in \mathbb{R}^{n\times m}$, and $b \in \mathbb{R}^n$. Define $g : \mathbb{R}^m \to \mathbb{R}$ by
\[ g(x) = f(Ax + b), \]
with $\operatorname{dom} g = \{x : Ax + b \in \operatorname{dom} f\}$. Then if $f$ is convex, so is $g$; if $f$ is concave, so is $g$.

Definition 3.2.15 Given two functions $f, g$ from $\mathcal{H}$ to $]-\infty, +\infty]$, $f$ is said to majorize $g$ if $f(x) \ge g(x)$ $\forall x \in \mathcal{H}$.

Definition 3.2.16 Let $f$ be a function from $\mathcal{H}$ to $\mathbb{R}$. The lower semicontinuous hull of $f$ is the greatest lower semicontinuous function majorized by $f$, i.e., the function whose epigraph is the closure of the epigraph of $f$.

Definition 3.2.17 Let $f \in \Gamma_0(\mathcal{H})$. The recession function of $f$ is the function $f^\infty : \mathcal{H} \to ]-\infty, +\infty]$ whose epigraph is the recession cone of the epigraph of $f$, $0^+(\operatorname{epi} f)$.

Fact 3.2.18 [14, Theorem 2.1.5(ii)] Let $f \in \Gamma_0(\mathcal{H})$ be such that $\operatorname{int}(\operatorname{dom} f) \neq \emptyset$ and let $t_0 \in \operatorname{dom} f$. The function $\varphi_{t_0} : \operatorname{dom} f \setminus \{t_0\} \to \mathbb{R}$ defined by
\[ \varphi_{t_0}(t) := \frac{f(t) - f(t_0)}{t - t_0} \]
is nondecreasing; if $f$ is strictly convex, then $\varphi_{t_0}$ is increasing.
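As a numeric sketch of Facts 3.2.11 and 3.2.12 (not part of the thesis; the test function and helper names are ours), the log-sum-exp function on $\mathbb{R}^2$ is convex, and both conditions can be spot-checked on random points:

import numpy as np

# Spot-check of Facts 3.2.11 and 3.2.12 for f(x) = log(exp(x1) + exp(x2)).
def f(x):
    return float(np.log(np.sum(np.exp(x))))

def grad(x):
    e = np.exp(x)
    return e / e.sum()

def hess(x):
    p = np.exp(x) / np.exp(x).sum()
    return np.diag(p) - np.outer(p, p)

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    assert f(y) >= f(x) + grad(x) @ (y - x) - 1e-12      # first-order condition
    assert np.linalg.eigvalsh(hess(x)).min() >= -1e-12   # Hessian PSD
print("both convexity conditions hold on all samples")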
Proposition 3.2.19 Let $f \in \Gamma_0(\mathcal{H})$ and $x_0 \in \operatorname{dom} f$. Then the recession function of $f$ is
\[ f^\infty(u) = \lim_{t\to\infty}\frac{f(x_0 + tu) - f(x_0)}{t} \]
for all $u \in \mathcal{H}$.

Proof. Using Definition 3.2.17 and Definition 3.1.10 we have
\[ (u, \lambda) \in 0^+(\operatorname{epi} f) \iff \forall t > 0 : (x_0, f(x_0)) + t(u, \lambda) \in \operatorname{epi} f \iff \forall t > 0 : \frac{f(x_0+tu) - f(x_0)}{t} \le \lambda \iff \sup_{t>0}\frac{f(x_0+tu)-f(x_0)}{t} \le \lambda. \]
Fix $u \in \mathcal{H}$; the function $t \mapsto \frac{f(x_0+tu)-f(x_0)}{t}$ is nondecreasing on $]0, +\infty[$ due to the convexity of $f$ and Fact 3.2.18. Then
\[ (u, \lambda) \in 0^+(\operatorname{epi} f) \iff \lim_{t\to\infty}\frac{f(x_0+tu)-f(x_0)}{t} = \sup_{t>0}\frac{f(x_0+tu)-f(x_0)}{t} \le \lambda. \]
Therefore, for all $u \in \mathcal{H}$, $f^\infty(u) = \lim_{t\to\infty}\frac{f(x_0+tu)-f(x_0)}{t}$. ∎

3.2.1 Subgradients

In order to deal with nonsmooth convex functions, we will now introduce a concept that is analogous to the derivative of a differentiable function.

Definition 3.2.20 We say that $y \in \mathcal{H}$ is a subgradient of a convex function $f$ at the point $x$ if
\[ f(x) + \langle y, z - x\rangle \le f(z) \quad (\forall z \in \mathcal{H}). \tag{3.2} \]
The set of all subgradients of $f$ at $x$ is called the subdifferential of $f$ at $x$ and is denoted by $\partial f(x)$. That is,
\[ \partial f(x) := \{y \in \mathcal{H} : f(x) + \langle y, z - x\rangle \le f(z)\ \forall z \in \mathcal{H}\}. \tag{3.3} \]

Example 3.2.21 Let $f(x) = |x|$. Then
\[ \partial f(x) = \begin{cases} \{-1\}, & \text{if } x < 0; \\ [-1, 1], & \text{if } x = 0; \\ \{1\}, & \text{if } x > 0. \end{cases} \]
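Example 3.2.21 can be verified directly from inequality (3.2); the following sketch (not part of the thesis, assuming NumPy) checks that every $y \in [-1, 1]$ is a subgradient of $|\cdot|$ at $0$, while values outside $[-1, 1]$ are not:

import numpy as np

# y is a subgradient of |.| at 0 iff  y*z <= |z| for all z (inequality (3.2)).
zs = np.linspace(-5, 5, 1001)
for y in np.linspace(-1, 1, 21):
    assert all(y * z <= abs(z) + 1e-12 for z in zs)      # every y in [-1,1] works
assert any(1.5 * z > abs(z) for z in zs)                 # y = 1.5 fails somewhere
print("subdifferential of |x| at 0 is [-1, 1]")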
Fact 3.2.22 [12, Theorem 26.1] Let $f : \mathcal{H} \to ]-\infty, +\infty]$ be a differentiable function. Then $\partial f = \{\nabla f\}$.

Fact 3.2.23 [14, Theorem 2.5.7] If $f$ is a proper convex function, then $x \in \operatorname{dom} f$ is a minimum point for $f$ if and only if $0 \in \partial f(x)$. In particular, if $f$ is differentiable at $x$, then $x$ is a minimum if $\nabla f(x) = 0$.

Fact 3.2.24 (Brøndsted–Rockafellar) [14, Theorem 3.1.2] Let $\mathcal{H}$ be a Hilbert space and $f \in \Gamma_0(\mathcal{H})$. Then $\operatorname{dom} f \subseteq \overline{\operatorname{dom}\partial f}$ and $\operatorname{dom} f^* \subseteq \overline{\operatorname{ran}\partial f}$.

3.2.2 Fenchel Conjugate

Definition 3.2.25 Let $f : \mathcal{H} \to ]-\infty, +\infty]$. The Fenchel conjugate, or convex conjugate, of $f$ is defined as
\[ f^*(y) = \sup_{x\in\operatorname{dom} f}\big(\langle x, y\rangle - f(x)\big) \]
for all $y \in \mathcal{H}$.

It is interesting to note that since $\langle x, y\rangle - f(x)$ is affine with respect to $y$ for a fixed $x$, $f^*$ is the supremum of a set of convex functions and is therefore convex, regardless of whether $f$ is convex or not. In the case where $f$ is a finite, coercive, and twice continuously differentiable function, the Fenchel conjugate is also referred to as the Legendre transform of $f$ [13, Example 11.9].

Proposition 3.2.26 (Fenchel–Young Inequality) Let $f : \mathcal{H} \to ]-\infty, +\infty]$. The following holds:
\[ f(x) + f^*(x^*) \ge \langle x, x^*\rangle \tag{3.4} \]
for all $x, x^* \in \mathcal{H}$.

Proof. Follows directly from the definition of the conjugate function. ∎

Proposition 3.2.27 If $f_1 \le f_2$ then $f_1^* \ge f_2^*$.

Proof. Let $y \in \mathcal{H}$. Since $f_1 \le f_2$, $\langle x, y\rangle - f_1(x) \ge \langle x, y\rangle - f_2(x)$ for all $x \in \mathcal{H}$. Taking the supremum over $x$ of each side yields
\[ \sup_{x\in\mathcal{H}}\big(\langle x, y\rangle - f_1(x)\big) \ge \sup_{x\in\mathcal{H}}\big(\langle x, y\rangle - f_2(x)\big). \]
Therefore $f_1^*(y) \ge f_2^*(y)$. ∎

Proposition 3.2.28 Let $f : \mathcal{H} \to ]-\infty, +\infty]$ and let $c$ be a constant vector. Then $(f(\cdot - c))^*(x^*) = \langle x^*, c\rangle + f^*(x^*)$.

Proof. Let $x^* \in \mathcal{H}$. Then $(f(\cdot - c))^*(x^*) = \sup_x\big(\langle x^*, x\rangle - f(x - c)\big)$. Letting $x' = x - c$, this becomes
\[ (f(\cdot-c))^*(x^*) = \sup_{x'}\big(\langle x^*, x' + c\rangle - f(x')\big) = \langle x^*, c\rangle + \sup_{x'}\big(\langle x^*, x'\rangle - f(x')\big) = \langle x^*, c\rangle + f^*(x^*). \qquad\blacksquare \]

Example 3.2.29 Let $q = \frac12\|\cdot\|^2$. Then $q^* = q$, and $q$ is the only self-conjugate function, i.e. the only $f$ with $f = f^*$.

Proof. From the definition of the conjugate we get
\[ q^*(y) = \sup_x\Big(\langle x, y\rangle - \frac12\|x\|^2\Big). \]
Setting $h(x) = \langle x, y\rangle - \frac12\|x\|^2$ and differentiating, we get $h'(x) = y - x$ and $h''(x) = -\operatorname{Id}$, which is negative definite. Hence $h$ is strictly concave, and therefore the critical point $x = y$ is the maximizer. Setting $x = y$ in the equation above yields $q^*(y) = \frac12\|y\|^2$.

On the other hand, suppose $f$ is a convex function satisfying $f = f^*$. Then $f$ is proper, and using Proposition 3.2.26 we get
\[ \langle x, x\rangle \le f(x) + f^*(x) = 2f(x) \iff q(x) \le f(x). \]
Then by Proposition 3.2.27, $f^* \le q^* = q$. Since $f^* = f$, we get $q \le f \le q$; therefore $f = q$. ∎

Due to its frequent use, from here on we will use the notation
\[ q := \frac12\|\cdot\|^2. \tag{3.5} \]

Example 3.2.30 Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = \frac1p|x|^p$. Then for $p = 1$,
\[ f^*(x^*) = \begin{cases} 0, & \text{if } -1 \le x^* \le 1; \\ +\infty, & \text{otherwise;} \end{cases} \]
and when $p > 1$,
\[ f^*(x^*) = \frac1q|x^*|^q, \quad\text{where } \frac1p + \frac1q = 1. \]

Proof. When $p = 1$, $f(x) = |x|$ and $f^*(x^*) = \sup_{x\in\mathbb{R}}(xx^* - |x|)$. If $x^* \ge 0$,
\[ f^*(x^*) = \sup_{x\in\mathbb{R}_+}(xx^* - x) = \sup_{x\in\mathbb{R}_+} x(x^* - 1) = \begin{cases} 0, & \text{if } x^* \le 1; \\ +\infty, & \text{if } x^* > 1. \end{cases} \]
If $x^* < 0$,
\[ f^*(x^*) = \sup_{x\in\mathbb{R}_-}(xx^* + x) = \sup_{x\in\mathbb{R}_-} x(x^* + 1) = \begin{cases} 0, & \text{if } x^* \ge -1; \\ +\infty, & \text{if } x^* < -1. \end{cases} \]
Altogether,
\[ f^*(x^*) = \begin{cases} 0, & \text{if } -1 \le x^* \le 1; \\ +\infty, & \text{otherwise.} \end{cases} \]
When $p > 1$, $f(x) = \frac1p|x|^p$ and $f^*(x^*) = \sup_{x\in\mathbb{R}}\big(xx^* - \frac1p|x|^p\big)$. Differentiating to find the critical point,
\[ \frac{d}{dx}\Big(xx^* - \frac1p|x|^p\Big) = x^* - |x|^{p-1}\operatorname{sgn}(x) = 0, \]
where $\operatorname{sgn}(x)$ denotes the sign of $x$. Then solving for $x$ yields
\[ x^* = |x|^{p-1}\operatorname{sgn}(x) \Rightarrow |x^*| = |x|^{p-1} \Rightarrow |x| = |x^*|^{\frac{1}{p-1}}. \]
Substituting this back into the definition of the conjugate,
\[ x|x|^{p-1}\operatorname{sgn}(x) - \frac1p|x|^p = |x|\,|x|^{p-1} - \frac1p|x|^p = \Big(\frac{p-1}{p}\Big)|x|^p = \Big(\frac{p-1}{p}\Big)\big(|x^*|^{\frac{1}{p-1}}\big)^p = \Big(\frac{p-1}{p}\Big)|x^*|^{\frac{p}{p-1}}. \]
Letting $q = \frac{p}{p-1}$, we get $f^*(x^*) = \frac1q|x^*|^q$, where $\frac1p + \frac1q = 1$. ∎

Example 3.2.31 Let $f(x) = \frac1p|x - c|^p$ for some constant $c$, with $p > 1$. Then $f$ is convex.

Proof. Let
\[ g(z) = \Big(\frac1q|z|^q\Big)^* = \frac1p|z|^p, \]
where $\frac1p + \frac1q = 1$. Then $g$ is convex since it is a conjugate function, and $f$ is convex by Fact 3.2.14. ∎

Fact 3.2.32 (Biconjugate theorem) [14, Theorem 2.3.3] Assume that $f : \mathcal{H} \to ]-\infty, +\infty]$ is a proper function. Then $f^{**} := (f^*)^* = f$ if and only if $f \in \Gamma_0(\mathcal{H})$.

Fact 3.2.33 (Conjugate of the difference of functions) [8, Theorem 2.1] Let $g \in \Gamma_0(\mathcal{H})$ and let $h \in \Gamma_0(\mathcal{H})$ be such that $h$ and $h^*$ both have full domain. Then
\[ (\forall x^* \in \mathcal{H}) \quad (g - h)^*(x^*) = \sup_{y^*\in\mathcal{H}}\big(g^*(y^*) - h^*(y^* - x^*)\big). \tag{3.6} \]

Definition 3.2.34 Let $f : \mathcal{H} \to ]-\infty, +\infty]$, and let $A$ be a linear transformation from $\mathcal{H}$ to $\mathcal{H}$. Define
\[ Af(x) := \inf\{f(y) : Ay = x\}. \]

Proposition 3.2.35 Let $f : \mathcal{H} \to ]-\infty, +\infty]$, and let $A$ be a linear transformation from $\mathcal{H}$ to $\mathcal{H}$. Then $(Af)^* = f^* \circ A^*$.

Proof. Let $x^* \in \mathcal{H}$. Then
\[ (Af)^*(x^*) = \sup_{x\in\mathcal{H}}\big(\langle x, x^*\rangle - Af(x)\big) = \sup_{x\in\mathcal{H}}\Big(\langle x, x^*\rangle - \inf_{\{y : Ay = x\}} f(y)\Big) = \sup\{\langle x, x^*\rangle - f(y) : (x, y) \in \mathcal{H}\times\mathcal{H},\ Ay = x\} \]
\[ = \sup\{\langle Ay, x^*\rangle - f(y) : y \in \mathcal{H}\} = \sup\{\langle y, A^* x^*\rangle - f(y) : y \in \mathcal{H}\} = f^*(A^* x^*) = (f^* \circ A^*)(x^*). \qquad\blacksquare \]

Fact 3.2.36 (Fenchel's Duality Theorem) [12, Theorem 31.1] Let $f$ and $g$ be proper convex functions on $\mathcal{H}$. Suppose at least one of the following holds:
(i) $\operatorname{ri}(\operatorname{dom} f) \cap \operatorname{ri}(\operatorname{dom} g) \neq \emptyset$;
(ii) $f$ and $g$ are lower semicontinuous, and $\operatorname{ri}(\operatorname{dom} f^*) \cap \operatorname{ri}(\operatorname{dom} g^*) \neq \emptyset$.
Then
\[ \inf_x\big(f(x) + g(x)\big) = \sup_{x^*}\big({-g^*(-x^*)} - f^*(x^*)\big). \]

Fact 3.2.37 [14, Theorem 2.3.1] Let $f : \mathcal{H} \to ]-\infty, +\infty]$ and $\alpha, \beta \in \mathbb{R}$. If $\alpha > 0$, then $(\alpha f)^*(x^*) = \alpha f^*(\alpha^{-1}x^*)$ for every $x^* \in \mathcal{H}$; if $\beta \neq 0$, then $(f(\beta\cdot))^*(x^*) = f^*(\beta^{-1}x^*)$ for every $x^* \in \mathcal{H}$.
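As a quick numeric sanity check of Example 3.2.30 (not part of the thesis; the grid resolution and test points are arbitrary), the conjugate of $\frac1p|\cdot|^p$ can be approximated by maximizing over a fine grid and compared with the closed form $\frac1q|\cdot|^q$:

import numpy as np

# Numeric check of Example 3.2.30: for f(x) = |x|^p / p with p > 1,
# f*(y) = |y|^q / q where 1/p + 1/q = 1.  The supremum is taken on a grid.
p = 3.0
q = p / (p - 1)
xs = np.linspace(-50, 50, 2_000_001)

def conj(y):
    return float(np.max(xs * y - np.abs(xs)**p / p))

for y in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    assert abs(conj(y) - abs(y)**q / q) < 1e-3
print("grid conjugate matches |y|^q / q")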
3.2.3 Epi-addition and Epi-multiplication

Definition 3.2.38 Let $f_1, f_2 \in \Gamma_0(\mathcal{H})$. The infimal convolution, or epi-addition, is defined by
\[ (f_1 \,\Box\, f_2)(x) := \inf_{x_1 + x_2 = x}\big(f_1(x_1) + f_2(x_2)\big) \tag{3.7} \]
for all $x \in \mathcal{H}$. The infimal convolution is said to be exact at a point $x$ if the infimum is attained.

Definition 3.2.39 Let $f \in \Gamma_0(\mathcal{H})$ and $\alpha \ge 0$. We define epi-multiplication by
\[ \alpha \star f = \begin{cases} \alpha f(\cdot/\alpha), & \text{if } \alpha > 0; \\ \iota_{\{0\}}, & \text{if } \alpha = 0. \end{cases} \tag{3.8} \]

Proposition 3.2.40 [1, Proposition 3.1] Let $f, f_1, \dots, f_n \in \Gamma_0(\mathcal{H})$. Then the following properties hold:
(i) $\operatorname{dom}(\alpha \star f) = \alpha(\operatorname{dom} f)$;
(ii) $\operatorname{dom}(f_1 \,\Box\, \cdots \,\Box\, f_n) = \operatorname{dom} f_1 + \cdots + \operatorname{dom} f_n$.

Proposition 3.2.41 Let $\alpha \ge 0$. Then the following hold:
(i) $(\alpha f)^* = \alpha \star f^*$;
(ii) $(\alpha \star f)^* = \alpha f^*$;
(iii) $(f_1 \,\Box\, \cdots \,\Box\, f_n)^* = f_1^* + \cdots + f_n^*$.

Proof. (i) Let $x^* \in \mathcal{H}$; we consider two cases.
(1) If $\alpha > 0$,
\[ (\alpha f)^*(x^*) = \sup_x\big(\langle x, x^*\rangle - \alpha f(x)\big) = \alpha\sup_x\big(\langle x, x^*/\alpha\rangle - f(x)\big) = \alpha f^*(x^*/\alpha). \]
(2) If $\alpha = 0$,
\[ (\alpha f)^*(x^*) = \sup_x\langle x, x^*\rangle = \begin{cases} 0, & \text{if } x^* = 0; \\ +\infty, & \text{otherwise} \end{cases} \tag{3.9} \]
\[ = \iota_{\{0\}}(x^*) = (\alpha \star f^*)(x^*). \]
Altogether, $(\alpha f)^* = \alpha \star f^*$.

(ii)
\[ (\alpha \star f)^*(x^*) = \begin{cases} \sup_x\big(\langle x, x^*\rangle - \alpha f(x/\alpha)\big), & \text{if } \alpha > 0 \\ \sup_x\big(\langle x, x^*\rangle - \iota_{\{0\}}(x)\big), & \text{if } \alpha = 0 \end{cases} = \begin{cases} \alpha\sup_x\big(\langle x/\alpha, x^*\rangle - f(x/\alpha)\big), & \text{if } \alpha > 0 \\ 0, & \text{if } \alpha = 0 \end{cases} = \alpha f^*(x^*). \]

(iii)
\[ (f_1 \,\Box\, \cdots \,\Box\, f_n)^*(x^*) = \sup_x\Big(\langle x, x^*\rangle - \inf_{x_1+\cdots+x_n = x}\big(f_1(x_1) + \cdots + f_n(x_n)\big)\Big) = \sup_{x_1,\dots,x_n}\big(\langle x_1 + \cdots + x_n, x^*\rangle - f_1(x_1) - \cdots - f_n(x_n)\big) \]
\[ = \sup_{x_1}\big(\langle x_1, x^*\rangle - f_1(x_1)\big) + \cdots + \sup_{x_n}\big(\langle x_n, x^*\rangle - f_n(x_n)\big) = (f_1^* + \cdots + f_n^*)(x^*). \qquad\blacksquare \]

Fact 3.2.42 [1, Fact 3.4] The following hold.
(i) If $\operatorname{int dom} f_1 \cap \cdots \cap \operatorname{int dom} f_{n-1} \cap \operatorname{dom} f_n \neq \emptyset$, then $(f_1 + \cdots + f_n)^* = f_1^* \,\Box\, \cdots \,\Box\, f_n^*$ and the infimal convolution is exact.
(ii) If $\operatorname{int dom} f_1^* \cap \cdots \cap \operatorname{int dom} f_{n-1}^* \cap \operatorname{dom} f_n^* \neq \emptyset$, then $f_1 \,\Box\, \cdots \,\Box\, f_n$ is exact and $\operatorname{epi}(f_1 \,\Box\, \cdots \,\Box\, f_n) = (\operatorname{epi} f_1) + \cdots + (\operatorname{epi} f_n)$.

Lemma 3.2.43
\[ \big(\lambda_1 \star (f_1 + q) \,\Box\, \cdots \,\Box\, \lambda_n \star (f_n + q)\big)^* = \lambda_1(f_1^* \,\Box\, q) + \cdots + \lambda_n(f_n^* \,\Box\, q). \]

Proof. Using Proposition 3.2.41(iii), Proposition 3.2.41(ii), Fact 3.2.42(i), and Example 3.2.29, we get that
\[ \big(\lambda_1 \star (f_1+q) \,\Box\, \cdots \,\Box\, \lambda_n \star (f_n+q)\big)^* = \big(\lambda_1 \star (f_1+q)\big)^* + \cdots + \big(\lambda_n \star (f_n+q)\big)^* = \lambda_1(f_1+q)^* + \cdots + \lambda_n(f_n+q)^* \]
\[ = \lambda_1(f_1^* \,\Box\, q^*) + \cdots + \lambda_n(f_n^* \,\Box\, q^*) = \lambda_1(f_1^* \,\Box\, q) + \cdots + \lambda_n(f_n^* \,\Box\, q). \tag{3.10} \]

The following lemma illustrates the beauty of the epi-multiplication notation and will be used for a couple of proofs in the following chapters.

Lemma 3.2.44 Let $f_i : \mathcal{H} \to ]-\infty, +\infty]$ for $1 \le i \le n$. Let $\mathbf{f} = (f_1, \dots, f_n)$, $\mathbf{x} = (x_1, \dots, x_n)$, and $\tilde f(\mathbf{x}) = \sum_i \lambda_i f_i(x_i)$. Then $\tilde f^*(\mathbf{x}^*) = \sum_{i=1}^n (\lambda_i \star f_i^*)(x_i^*)$.

Proof.
\[ \tilde f^*(\mathbf{x}^*) = \sup_{\mathbf{x}\in\mathcal{H}^n}\big\{\langle \mathbf{x}, \mathbf{x}^*\rangle - \tilde f(\mathbf{x})\big\} = \sup_{(x_1,\dots,x_n)}\Big\{\langle x_1, x_1^*\rangle + \cdots + \langle x_n, x_n^*\rangle - \sum_i \lambda_i f_i(x_i)\Big\} \]
\[ = \lambda_1\sup_{x_1}\big\{\langle x_1, x_1^*/\lambda_1\rangle - f_1(x_1)\big\} + \cdots + \lambda_n\sup_{x_n}\big\{\langle x_n, x_n^*/\lambda_n\rangle - f_n(x_n)\big\} = \lambda_1 f_1^*\Big(\frac{x_1^*}{\lambda_1}\Big) + \cdots + \lambda_n f_n^*\Big(\frac{x_n^*}{\lambda_n}\Big) \]
\[ = (\lambda_1 \star f_1^*)(x_1^*) + \cdots + (\lambda_n \star f_n^*)(x_n^*). \qquad\blacksquare \]

3.3 Proximity Operators

Definition 3.3.1 The proximity operator, or proximal mapping, of a function $f \in \Gamma_0(\mathcal{H})$ is defined by
\[ (\forall x \in \mathcal{H}) \quad \operatorname{Prox}_f x = \operatorname{argmin}_{y\in\mathcal{H}}\big(f(y) + q(x - y)\big). \tag{3.11} \]

Fact 3.3.2 [6, Section 2.2] For all $x \in \mathcal{H}$ and for all $p \in \mathcal{H}$,
\[ p = \operatorname{Prox}_f x \iff x - p \in \partial f(p), \]
and $\operatorname{Prox}_f = (\operatorname{Id} + \partial f)^{-1}$.

Remark 3.3.3 Note that since the function $y \mapsto f(y) + q(x - y)$ is strictly convex and supercoercive, it has a unique minimizer, $p = \operatorname{Prox}_f(x)$.
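For a concrete instance of Definition 3.3.1 (a sketch, not from the thesis): on $\mathbb{R}$ the proximity operator of $f = |\cdot|$ is the soft-thresholding map $x \mapsto \operatorname{sgn}(x)\max(|x| - 1, 0)$, which the following brute-force minimization confirms.

import numpy as np

# Prox of f = |.|: argmin_y |y| + (x - y)^2 / 2, compared with the
# soft-thresholding formula sign(x) * max(|x| - 1, 0).
ys = np.linspace(-10, 10, 2_000_001)

def prox_brute(x):
    return float(ys[np.argmin(np.abs(ys) + 0.5 * (x - ys)**2)])

def prox_soft(x):
    return float(np.sign(x) * max(abs(x) - 1.0, 0.0))

for x in [-3.0, -0.4, 0.0, 0.7, 2.5]:
    assert abs(prox_brute(x) - prox_soft(x)) < 1e-4
print("Prox_{|.|} is soft thresholding")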
3.4 Minimax Theory

Minimax problems are optimization problems that involve both minimization and maximization. Let $X$ and $Y$ be arbitrary nonempty subsets of $\mathcal{H}$, and let $F$ be a function from $X \times Y$ to $[-\infty, +\infty]$. Minimax theory deals with problems of the form
\[ \sup_{x\in X}\inf_{y\in Y} F(x, y) \quad\text{or}\quad \inf_{y\in Y}\sup_{x\in X} F(x, y). \]
For more on minimax theory please see Chapters 36 and 37 in [12].

Definition 3.4.1 If $\sup_{x\in X}\inf_{y\in Y} F(x,y) = \inf_{y\in Y}\sup_{x\in X} F(x,y)$, then the common value is called the minimax or the saddle value of $F$.

Definition 3.4.2 $F$ is a concave-convex function if $F(\cdot, y)$ is a concave function on $X$ for all $y \in Y$ and $F(x, \cdot)$ is convex on $Y$ for all $x \in X$. Similarly, $F$ is a convex-concave function if $F(\cdot, y)$ is convex on $X$ for all $y \in Y$ and $F(x, \cdot)$ is concave on $Y$ for all $x \in X$.

The following fact gives us conditions for determining whether the saddle value exists.

Fact 3.4.3 [12, Theorem 37.3] Let $F : \mathbb{R}^m \times \mathbb{R}^n \to ]-\infty, +\infty]$ be a proper concave-convex function with effective domain $X \times Y$. Then either of the following conditions implies that the saddle value of $F$ exists. If both conditions hold, the saddle value must be finite.
(a) The convex functions $F(x, \cdot)$ for $x \in \operatorname{ri} X$ have no common direction of recession.
(b) The convex functions $-F(\cdot, y)$ for $y \in \operatorname{ri} Y$ have no common direction of recession.

Chapter 4  The Proximal Average

When averaging functions, the traditional arithmetic average
\[ \lambda_1 f_1 + \cdots + \lambda_n f_n \tag{4.1} \]
is the natural place to begin. However, when the domains of the functions do not intersect, the result of (4.1) is a function that is everywhere infinity. The proximal average provides a useful method of averaging functions, even when their domains do not intersect. In this chapter, we give a new proof of the self-duality of the proximal average: $(\mathcal{P}(\mathbf{f},\boldsymbol\lambda))^* = \mathcal{P}(\mathbf{f}^*,\boldsymbol\lambda)$. We also supply two self-dual smooth operators, $S_\beta f$ and $T_\beta f$. For this chapter, let $f_1, \dots, f_n \in \Gamma_0(\mathcal{H})$, and let $\lambda_1, \dots, \lambda_n$ be strictly positive real numbers such that $\sum_{i=1}^n \lambda_i = 1$.

4.1 Definitions

Definition 4.1.1 (Proximal Average) Let $\mathbf{f} = (f_1, \dots, f_n)$ and $\boldsymbol\lambda = (\lambda_1, \dots, \lambda_n)$. The $\boldsymbol\lambda$-weighted proximal average of the $n$ functions $f_i$, $1 \le i \le n$, is
\[ \mathcal{P}(\mathbf{f}, \boldsymbol\lambda) = \lambda_1 \star (f_1 + q) \,\Box\, \cdots \,\Box\, \lambda_n \star (f_n + q) - q. \tag{4.2} \]
That is,
\[ (\forall x \in \mathcal{H}) \quad \mathcal{P}(\mathbf{f},\boldsymbol\lambda)(x) = \inf_{\sum_{i=1}^n x_i = x}\ \sum_{i=1}^n \lambda_i\big(f_i(x_i/\lambda_i) + q(x_i/\lambda_i)\big) - q(x). \]

This can be reformulated in two different ways.

Proposition 4.1.2 The proximal average can be equivalently defined by
(i) $\displaystyle \mathcal{P}(\mathbf{f},\boldsymbol\lambda)(x) = \inf_{\sum_i\lambda_i y_i = x}\Big(\sum_i \lambda_i f_i(y_i) + \sum_i \lambda_i q(y_i)\Big) - q(x)$;
(ii) $\mathcal{P}(\mathbf{f},\boldsymbol\lambda) = \big(\lambda_1(f_1+q)^* + \cdots + \lambda_n(f_n+q)^*\big)^* - q$.

Proof. Using the change of variables $y_i = x_i/\lambda_i$ in Definition 4.1.1 we immediately get (i). For (ii), first note that by Proposition 3.2.40(ii),
\[ (\forall i) \quad \operatorname{dom}(f_i^* \,\Box\, q) = (\operatorname{dom} f_i^*) + (\operatorname{dom} q) = \mathcal{H}. \]
Then Fact 3.2.42(i), Proposition 3.2.41(i), and Fact 3.2.32 yield
\[ \big(\lambda_1(f_1+q)^* + \cdots + \lambda_n(f_n+q)^*\big)^* = \big(\lambda_1(f_1+q)^*\big)^* \,\Box\, \cdots \,\Box\, \big(\lambda_n(f_n+q)^*\big)^* = \lambda_1 \star (f_1+q)^{**} \,\Box\, \cdots \,\Box\, \lambda_n \star (f_n+q)^{**} \]
\[ = \lambda_1 \star (f_1+q) \,\Box\, \cdots \,\Box\, \lambda_n \star (f_n+q). \qquad\blacksquare \]

4.2 Properties

Theorem 4.2.1 (Domain)
\[ \operatorname{dom}\mathcal{P}(\mathbf{f},\boldsymbol\lambda) = \lambda_1\operatorname{dom} f_1 + \cdots + \lambda_n\operatorname{dom} f_n. \]

Proof. Using Proposition 3.2.40(i) and Proposition 3.2.40(ii),
\[ \operatorname{dom}\mathcal{P}(\mathbf{f},\boldsymbol\lambda) = \operatorname{dom}\big(\lambda_1\star(f_1+q) \,\Box\, \cdots \,\Box\, \lambda_n\star(f_n+q)\big) = \operatorname{dom}\big(\lambda_1\star(f_1+q)\big) + \cdots + \operatorname{dom}\big(\lambda_n\star(f_n+q)\big) \tag{4.3} \]
\[ = \lambda_1\operatorname{dom}(f_1+q) + \cdots + \lambda_n\operatorname{dom}(f_n+q) = \lambda_1\operatorname{dom} f_1 + \cdots + \lambda_n\operatorname{dom} f_n. \qquad\blacksquare \]

Corollary 4.2.2 If at least one function $f_i$ has full domain and $\lambda_i > 0$, then $\mathcal{P}(\mathbf{f},\boldsymbol\lambda)$ has full domain.
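Before studying the conjugate, here is a minimal numeric sketch (not part of the thesis; the two functions and the grid are illustrative) of the proximal average of $f_1 = |\cdot|$ and $f_2 = q$ on $\mathbb{R}$, evaluated by brute force from Proposition 4.1.2(i):

import numpy as np

# Proximal average of f1 = |.| and f2 = q on R with weights (1/2, 1/2).
lam1 = lam2 = 0.5
q = lambda y: 0.5 * y**2

def prox_avg(x, y1=np.linspace(-20, 20, 400_001)):
    y2 = (x - lam1 * y1) / lam2                  # enforce lam1*y1 + lam2*y2 = x
    vals = lam1 * np.abs(y1) + lam2 * q(y2) + lam1 * q(y1) + lam2 * q(y2) - q(x)
    return float(vals.min())

for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    print(x, round(prox_avg(x), 5))
# the values are finite and vary smoothly in x, even though f1 is
# not differentiable at 0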
4.2.1 Fenchel Conjugate

Next we examine the conjugate of the proximal average. The purpose of this section is to give a new proof of $(\mathcal{P}(\mathbf{f},\boldsymbol\lambda))^* = \mathcal{P}(\mathbf{f}^*,\boldsymbol\lambda)$ without using Toland's Duality Theorem. First we must prove the following lemma, which will also be used later to reformulate the proximal average.

Lemma 4.2.3 The following identity holds:
\[ \sum_{i=1}^n \lambda_i q(y_i) - q\Big(\sum_{i=1}^n \lambda_i y_i\Big) = \frac14\sum_{i=1}^n\sum_{j=1}^n \lambda_i\lambda_j\|y_i - y_j\|^2. \tag{4.4} \]

Proof. Consider first the left-hand side of (4.4):
\[ \sum_{i=1}^n \lambda_i q(y_i) - q\Big(\sum_{i=1}^n\lambda_i y_i\Big) = \frac12\sum_{i=1}^n\lambda_i\|y_i\|^2 - \frac12\Big\|\sum_{i=1}^n\lambda_i y_i\Big\|^2 = \frac12\sum_{i=1}^n\lambda_i\|y_i\|^2 - \frac12\sum_{i=1}^n\lambda_i^2\|y_i\|^2 - \sum_{i<j}\lambda_i\lambda_j\langle y_i, y_j\rangle. \tag{4.5} \]
On the other hand, from the right-hand side we get (using $\sum_j \lambda_j = 1$)
\[ \frac14\sum_{i=1}^n\sum_{j=1}^n\lambda_i\lambda_j\|y_i-y_j\|^2 = \frac14\sum_{i=1}^n\lambda_i\sum_{j=1}^n\big(\lambda_j\|y_i\|^2 - 2\lambda_j\langle y_i, y_j\rangle + \lambda_j\|y_j\|^2\big) \]
\[ = \frac14\sum_{i=1}^n\lambda_i\|y_i\|^2 + \frac14\sum_{j=1}^n\lambda_j\|y_j\|^2 - \frac12\sum_{i=1}^n\sum_{j=1}^n\lambda_i\lambda_j\langle y_i, y_j\rangle = \frac12\sum_{i=1}^n\lambda_i\|y_i\|^2 - \frac12\sum_{i=1}^n\lambda_i^2\|y_i\|^2 - \sum_{i<j}\lambda_i\lambda_j\langle y_i, y_j\rangle. \tag{4.6} \]
Since (4.5) and (4.6) are equal, the proof is complete. ∎

The following lemma is new.

Lemma 4.2.4 Let
\[ g(y_1, \dots, y_n) = \lambda_1 q(y_1) + \cdots + \lambda_n q(y_n) - q(\lambda_1 y_1 + \cdots + \lambda_n y_n) \]
for $(y_1, \dots, y_n) \in \mathcal{H}^n$, where $\sum_{i=1}^n \lambda_i = 1$. Then
\[ g^*(x_1^*, \dots, x_n^*) = \begin{cases} (\lambda_1 \star q)(x_1^*) + \cdots + (\lambda_n \star q)(x_n^*), & \text{if } x_1^* + \cdots + x_n^* = 0; \\ +\infty, & \text{otherwise.} \end{cases} \]

Proof. For every $(x_1^*, \dots, x_n^*) \in \mathcal{H}^n$ we have
\[ g^*(x_1^*, \dots, x_n^*) = \sup_{y_1, \dots, y_n}\Big(\langle x_1^*, y_1\rangle + \cdots + \langle x_n^*, y_n\rangle - \lambda_1 q(y_1) - \cdots - \lambda_n q(y_n) + q(\lambda_1 y_1 + \cdots + \lambda_n y_n)\Big). \tag{4.7} \]
In light of Lemma 4.2.3, the function within the supremum is concave, and therefore solving for critical points will yield the supremum. Taking the partial derivatives with respect to $y_i$ for $i = 1, \dots, n$, and setting them equal to zero, gives
\[ x_i^* - \lambda_i y_i + \lambda_i(\lambda_1 y_1 + \cdots + \lambda_n y_n) = 0, \qquad i = 1, \dots, n. \tag{4.8} \]
Then taking the sum of all of the above equations yields
\[ \sum_{i=1}^n x_i^* - \sum_{i=1}^n \lambda_i y_i + \Big(\sum_{i=1}^n\lambda_i\Big)\Big(\sum_{i=1}^n\lambda_i y_i\Big) = 0 \iff \sum_{i=1}^n x_i^* = 0. \]
So $x_1^* + \cdots + x_n^* = 0$, and if we let $y_1 = x_1^*/\lambda_1, \dots, y_n = x_n^*/\lambda_n$, then $(y_1, \dots, y_n)$ is a solution of (4.8). Consequently, $\lambda_1 y_1 + \cdots + \lambda_n y_n = 0$ and $\langle x_i^*, x_i^*/\lambda_i\rangle = 2\lambda_i q(x_i^*/\lambda_i) = 2(\lambda_i \star q)(x_i^*)$ for $i = 1, \dots, n$. This gives us that
\[ \langle x_1^*, y_1\rangle + \cdots + \langle x_n^*, y_n\rangle - \lambda_1 q(y_1) - \cdots - \lambda_n q(y_n) = 2\sum_{i=1}^n(\lambda_i\star q)(x_i^*) - (\lambda_1\star q)(x_1^*) - \cdots - (\lambda_n\star q)(x_n^*) \]
\[ = (\lambda_1\star q)(x_1^*) + \cdots + (\lambda_n\star q)(x_n^*). \]
If $\sum_i x_i^* \neq 0$, then let $y_1 = y_2 = \cdots = y_n = y$, and (4.7) becomes
\[ g^*(x_1^*, \dots, x_n^*) \ge \sup_y\Big(\Big\langle\sum_{i=1}^n x_i^*, y\Big\rangle - \lambda_1 q(y) - \cdots - \lambda_n q(y) + q(y)\Big) = \sup_y\Big\langle\sum_{i=1}^n x_i^*, y\Big\rangle = +\infty. \]
Thus,
\[ g^*(x_1^*, \dots, x_n^*) = \begin{cases} (\lambda_1\star q)(x_1^*) + \cdots + (\lambda_n\star q)(x_n^*), & \text{if } x_1^* + \cdots + x_n^* = 0; \\ +\infty, & \text{otherwise.} \end{cases} \qquad\blacksquare \]
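The identity (4.4) of Lemma 4.2.3 is easy to stress-test numerically; the following sketch (not part of the thesis, assuming NumPy; dimensions and sample counts are arbitrary) checks it on random weights and vectors:

import numpy as np

# Random check of identity (4.4):
#   sum_i lam_i q(y_i) - q(sum_i lam_i y_i)
#     = (1/4) sum_{i,j} lam_i lam_j ||y_i - y_j||^2.
rng = np.random.default_rng(0)
q = lambda v: 0.5 * float(v @ v)

for _ in range(100):
    n, d = 4, 3
    lam = rng.random(n); lam /= lam.sum()        # weights summing to 1
    Y = rng.normal(size=(n, d))
    lhs = sum(lam[i] * q(Y[i]) for i in range(n)) - q(lam @ Y)
    rhs = 0.25 * sum(lam[i] * lam[j] * float((Y[i]-Y[j]) @ (Y[i]-Y[j]))
                     for i in range(n) for j in range(n))
    assert abs(lhs - rhs) < 1e-10
print("identity (4.4) holds on all samples")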
Remark 4.2.5 It can be noted that $\frac{\partial g}{\partial y_i} = \lambda_i y_i - \lambda_i(\lambda_1 y_1 + \cdots + \lambda_n y_n)$ for $i = 1, \dots, n$, so that $\frac{\partial g}{\partial y_1} + \cdots + \frac{\partial g}{\partial y_n} = 0$. This means that
\[ \operatorname{ran}\partial g \subseteq \{(x_1^*, \dots, x_n^*) : x_1^* + \cdots + x_n^* = 0\}. \]
Conversely, if $x_1^* + \cdots + x_n^* = 0$ and we let $y_1 = x_1^*/\lambda_1, \dots, y_n = x_n^*/\lambda_n$, then $\nabla g(y_1, \dots, y_n) = (x_1^*, \dots, x_n^*)$, and
\[ \{(x_1^*, \dots, x_n^*) : x_1^* + \cdots + x_n^* = 0\} \subseteq \operatorname{ran}\partial g. \]
Therefore $\operatorname{ran}\partial g = \{(x_1^*, \dots, x_n^*) : x_1^* + \cdots + x_n^* = 0\}$. Now, since $\operatorname{ran}\partial g \subseteq \operatorname{dom} g^* \subseteq \overline{\operatorname{ran}\partial g}$ by Fact 3.2.24, and $\operatorname{ran}\partial g = \overline{\operatorname{ran}\partial g}$ here, we obtain $\operatorname{dom} g^* = \{(x_1^*, \dots, x_n^*) : x_1^* + \cdots + x_n^* = 0\}$, as we saw in the previous lemma.

Theorem 4.2.6 (Fenchel Conjugate) [1, Theorem 5.1]
\[ (\mathcal{P}(\mathbf{f},\boldsymbol\lambda))^* = \mathcal{P}(\mathbf{f}^*,\boldsymbol\lambda). \tag{4.9} \]

Proof. Let
\[ f(x) = \mathcal{P}(\mathbf{f},\boldsymbol\lambda)(x) = \inf_{\sum_i\lambda_i y_i = x}\Big(\lambda_1 f_1(y_1) + \cdots + \lambda_n f_n(y_n) + \lambda_1 q(y_1) + \cdots + \lambda_n q(y_n) - q(\lambda_1 y_1 + \cdots + \lambda_n y_n)\Big), \]
and let $A : \mathcal{H}^n \to \mathcal{H}$ be the linear operator defined by $A(x_1, \dots, x_n) = \sum_{i=1}^n \lambda_i x_i$. Then $A^* : \mathcal{H} \to \mathcal{H}^n$, the adjoint of $A$, is given by $A^* z = (\lambda_1 z, \dots, \lambda_n z)$. We can write $f$ as $f = AF$, where
\[ F(\mathbf{y}) = \lambda_1 f_1(y_1) + \cdots + \lambda_n f_n(y_n) + \lambda_1 q(y_1) + \cdots + \lambda_n q(y_n) - q(\lambda_1 y_1 + \cdots + \lambda_n y_n) \]
and $AF(x) := \inf_{\{\mathbf{y} : A\mathbf{y} = x\}} F(\mathbf{y})$. For ease of notation, say that $F = j + g$, where $j(\mathbf{y}) = \lambda_1 f_1(y_1) + \cdots + \lambda_n f_n(y_n)$ and $g(\mathbf{y}) = \lambda_1 q(y_1) + \cdots + \lambda_n q(y_n) - q(\lambda_1 y_1 + \cdots + \lambda_n y_n)$. By Proposition 3.2.35,
\[ f^*(x^*) = (AF)^*(x^*) = (F^* \circ A^*)(x^*). \]
Since $j \in \Gamma_0(\mathcal{H}^n)$ and $\operatorname{dom} g = \mathcal{H}^n$, we have $\operatorname{int}(\operatorname{dom} j) \cap \operatorname{int}(\operatorname{dom} g) \neq \emptyset$, and we can use Fact 3.2.42(i) and Lemma 3.2.44 to get
\[ f^*(x^*) = (j^* \,\Box\, g^*)(A^* x^*), \qquad j^*(\mathbf{y}) = (\lambda_1\star f_1^*)(y_1) + \cdots + (\lambda_n\star f_n^*)(y_n). \]
Then using Lemma 4.2.4 we get
\[ f^*(x^*) = \inf_{y_1, \dots, y_n}\Big(\lambda_1 f_1^*(y_1/\lambda_1) + \cdots + \lambda_n f_n^*(y_n/\lambda_n) + g^*(\lambda_1 x^* - y_1, \dots, \lambda_n x^* - y_n)\Big) \]
\[ = \inf_{(\lambda_1 x^* - y_1) + \cdots + (\lambda_n x^* - y_n) = 0}\Big(\lambda_1 f_1^*(y_1/\lambda_1) + \cdots + \lambda_n f_n^*(y_n/\lambda_n) + \lambda_1 q\Big(\frac{\lambda_1 x^* - y_1}{\lambda_1}\Big) + \cdots + \lambda_n q\Big(\frac{\lambda_n x^* - y_n}{\lambda_n}\Big)\Big) \]
\[ = \inf_{x^* = y_1 + \cdots + y_n}\Big(\lambda_1 f_1^*(y_1/\lambda_1) + \cdots + \lambda_n f_n^*(y_n/\lambda_n) + \lambda_1 q\Big(x^* - \frac{y_1}{\lambda_1}\Big) + \cdots + \lambda_n q\Big(x^* - \frac{y_n}{\lambda_n}\Big)\Big). \]
Expanding the last set of terms in the above equation yields
\[ \lambda_i q\Big(x^* - \frac{y_i}{\lambda_i}\Big) = \frac{\lambda_i}{2}\Big\|x^* - \frac{y_i}{\lambda_i}\Big\|^2 = \frac{\lambda_i}{2}\|x^*\|^2 - \langle x^*, y_i\rangle + \frac{\lambda_i}{2}\Big\|\frac{y_i}{\lambda_i}\Big\|^2 = \lambda_i q(x^*) - \langle x^*, y_i\rangle + \lambda_i q\Big(\frac{y_i}{\lambda_i}\Big) \]
for all $i = 1, \dots, n$. Taking the sum of all of these terms and substituting back into the infimum produces
\[ f^*(x^*) = \inf_{x^* = y_1+\cdots+y_n}\Big(\lambda_1 f_1^*(y_1/\lambda_1) + \cdots + \lambda_n f_n^*(y_n/\lambda_n) + q(x^*) - \langle x^*, y_1 + \cdots + y_n\rangle + \lambda_1 q\Big(\frac{y_1}{\lambda_1}\Big) + \cdots + \lambda_n q\Big(\frac{y_n}{\lambda_n}\Big)\Big) \]
\[ = \inf_{x^* = y_1+\cdots+y_n}\Big(\lambda_1 f_1^*(y_1/\lambda_1) + \cdots + \lambda_n f_n^*(y_n/\lambda_n) + \lambda_1 q\Big(\frac{y_1}{\lambda_1}\Big) + \cdots + \lambda_n q\Big(\frac{y_n}{\lambda_n}\Big) - q(x^*)\Big), \]
since $\langle x^*, y_1 + \cdots + y_n\rangle = \langle x^*, x^*\rangle = 2q(x^*)$. Making the simple change of variables $z_i = y_i/\lambda_i$ for $i = 1, \dots, n$ generates
\[ f^*(x^*) = \inf_{x^* = \lambda_1 z_1 + \cdots + \lambda_n z_n}\Big(\lambda_1 f_1^*(z_1) + \cdots + \lambda_n f_n^*(z_n) + \lambda_1 q(z_1) + \cdots + \lambda_n q(z_n) - q(x^*)\Big) = \mathcal{P}(\mathbf{f}^*,\boldsymbol\lambda)(x^*). \qquad\blacksquare \]

Example 4.2.7 Let $\mathbf{f} = (f, f^*)$ and $\boldsymbol\lambda = \big(\frac12, \frac12\big)$. Then $\mathcal{P}(\mathbf{f},\boldsymbol\lambda) = q$.

Proof. By Theorem 4.2.6, $(\mathcal{P}(\mathbf{f},\boldsymbol\lambda))^* = \mathcal{P}(\mathbf{f}^*,\boldsymbol\lambda)$. Since $\mathbf{f}^* = (f^*, f^{**}) = (f^*, f)$, we get that $\mathcal{P}(\mathbf{f}^*,\boldsymbol\lambda) = \mathcal{P}(\mathbf{f},\boldsymbol\lambda)$; that is, $\mathcal{P}(\mathbf{f},\boldsymbol\lambda)$ is self-conjugate. Therefore, using Example 3.2.29, we see that $\mathcal{P}(\mathbf{f},\boldsymbol\lambda) = q$. ∎

Corollary 4.2.8 $\mathcal{P}(\mathbf{f},\boldsymbol\lambda)$ is convex, lower semicontinuous, and proper.

Proof. We can apply Theorem 4.2.6 twice to see that
\[ (\mathcal{P}(\mathbf{f},\boldsymbol\lambda))^{**} = (\mathcal{P}(\mathbf{f}^*,\boldsymbol\lambda))^* = \mathcal{P}(\mathbf{f},\boldsymbol\lambda). \]
Therefore, in light of Fact 3.2.32, $\mathcal{P}(\mathbf{f},\boldsymbol\lambda) \in \Gamma_0(\mathcal{H})$. ∎

Definition 4.2.9 (Moreau Envelope) The Moreau envelope of $f \in \Gamma_0(\mathcal{H})$ with parameter $\mu > 0$ is
\[ e_\mu f = f \,\Box\, \mu \star q. \]
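For a concrete instance of Definition 4.2.9 (a sketch, not from the thesis): on $\mathbb{R}$ with $f = |\cdot|$, since $(\mu\star q)(y) = \|y\|^2/(2\mu)$, the Moreau envelope is the Huber function, which a brute-force infimum confirms.

import numpy as np

# e_mu|.|(x) = inf_y |y| + (x - y)^2 / (2 mu), the Huber function:
#   x^2 / (2 mu) for |x| <= mu, and |x| - mu/2 otherwise.
mu = 0.5
ys = np.linspace(-10, 10, 2_000_001)

def env_brute(x):
    return float(np.min(np.abs(ys) + (x - ys)**2 / (2 * mu)))

def env_huber(x):
    return x**2 / (2 * mu) if abs(x) <= mu else abs(x) - mu / 2

for x in [-2.0, -0.3, 0.0, 0.4, 1.5]:
    assert abs(env_brute(x) - env_huber(x)) < 1e-6
print("e_mu |.| equals the Huber function")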
Fact 4.2.10 (Moreau Envelope of the Proximal Average) [1, Theorem 6.2]
(i) $e_\mu\mathcal{P}(\mathbf{f},\boldsymbol\lambda) = \lambda_1 e_\mu f_1 + \cdots + \lambda_n e_\mu f_n$;
(ii) $(e_\mu\mathcal{P}(\mathbf{f},\boldsymbol\lambda))^* = \lambda_1\star(e_\mu f_1)^* \,\Box\, \cdots \,\Box\, \lambda_n\star(e_\mu f_n)^*$.

Fact 4.2.11 (Proximal Mapping) [1, Theorem 6.7]
\[ \operatorname{Prox}_{\mathcal{P}(\mathbf{f},\boldsymbol\lambda)} = \lambda_1\operatorname{Prox}_{f_1} + \cdots + \lambda_n\operatorname{Prox}_{f_n}. \]

4.3 Applications

4.3.1 Self-dual Smooth Approximations

A function $f$ on $\mathbb{R}^n$ is smooth if $f$ is finite and differentiable everywhere in $\mathbb{R}^n$. In cases of nondifferentiable convex functions it can be helpful to find a smooth approximation of the function. Here, two smooth approximations are defined using the proximal average. The first smooth operator, $S_\beta f$, has a simple expression in terms of the Moreau envelope, which can be seen in [9]. The second smooth operator, $T_\beta f$, has a simple expression in terms of the proximal average and is new. Both approximations are "self-dual" in the sense that the conjugate of the approximation of $f$ is equal to the approximation of the conjugate of $f$.

Recall that
\[ \mathcal{P}(f_1, \lambda_1, \dots, f_n, \lambda_n) := \big(\lambda_1(f_1+q)^* + \cdots + \lambda_n(f_n+q)^*\big)^* - q. \]
For $0 \le \beta \le 1$ and a proper lower semicontinuous function $f$, define $S_\beta f : \mathbb{R}^n \to ]-\infty, +\infty]$ by
\[ S_\beta f(x) := (1+\beta)^2\,\mathcal{P}\Big(f, \frac{1-\beta}{1+\beta}, q, \frac{2\beta}{1+\beta}\Big)\Big(\frac{x}{1+\beta}\Big) \tag{4.10} \]
for all $x \in \mathbb{R}^n$.

Theorem 4.3.1
(i) $(S_\beta f)^* = S_\beta f^*$.
(ii) When $0 < \beta \le 1$, we have
\[ S_\beta f = (1+\beta)\,\mathcal{P}(f, 1-\beta, 0, \beta) + \beta q = (1-\beta^2)\Big(f \,\Box\, \frac1\beta q\Big) + \beta q. \tag{4.11} \]
Therefore, as $\beta \to 0^+$, $S_\beta f \to \lim_{\beta\to0^+}\big(f \,\Box\, \frac1\beta q\big) = f$.

Proof. (i) By definition,
\[ (S_\beta f)^* = \Big((1+\beta)^2\,\mathcal{P}\Big(f, \tfrac{1-\beta}{1+\beta}, q, \tfrac{2\beta}{1+\beta}\Big)\Big(\tfrac{\cdot}{1+\beta}\Big)\Big)^*. \]
By Fact 3.2.37 we then get
\[ (S_\beta f)^* = (1+\beta)^2\,\Big(\mathcal{P}\Big(f, \tfrac{1-\beta}{1+\beta}, q, \tfrac{2\beta}{1+\beta}\Big)\Big)^*\Big(\frac{(1+\beta)\,\cdot}{(1+\beta)^2}\Big) = (1+\beta)^2\,\mathcal{P}\Big(f^*, \tfrac{1-\beta}{1+\beta}, q, \tfrac{2\beta}{1+\beta}\Big)\Big(\frac{\cdot}{1+\beta}\Big) = S_\beta f^*, \]
using Theorem 4.2.6 and $q^* = q$ (Example 3.2.29) in the middle step.

(ii) For every $x$, by the definition of the proximal average, Proposition 3.2.41(i), and Example 3.2.29,
\[ S_\beta f(x) = (1+\beta)^2\Big[\Big(\frac{1-\beta}{1+\beta}(f+q)^* + \frac{2\beta}{1+\beta}(q+q)^*\Big)^*\Big(\frac{x}{1+\beta}\Big) - \frac12\frac{\|x\|^2}{(1+\beta)^2}\Big] \]
\[ = (1+\beta)^2\Big(\frac{1-\beta}{1+\beta}(f+q)^* + \frac{\beta}{1+\beta}q\Big)^*\Big(\frac{x}{1+\beta}\Big) - q(x) = (1+\beta)\big((1-\beta)(f+q)^* + \beta q\big)^*(x) - q(x). \]
This is the first equality in (4.11), since $\mathcal{P}(f, 1-\beta, 0, \beta) = \big((1-\beta)(f+q)^* + \beta q\big)^* - q$. To continue, apply Fact 3.2.42(i), Fact 3.2.32, Example 3.2.29, and Proposition 3.2.41(i) to
\[ S_\beta f(x) = (1+\beta)\big((1-\beta)(f+q)^* + \beta q\big)^*(x) - q(x) \]
to get
\[ S_\beta f(x) = (1+\beta)\Big[\big((1-\beta)(f+q)^*\big)^* \,\Box\, (\beta q)^*\Big](x) - q(x) = (1+\beta)\Big[(1-\beta)(f+q)\Big(\frac{\cdot}{1-\beta}\Big) \,\Box\, \frac1\beta q\Big](x) - q(x). \]
Using Definition 3.2.38,
\[ S_\beta f(x) = (1+\beta)\inf_u\Big((1-\beta)f\Big(\frac{u}{1-\beta}\Big) + (1-\beta)q\Big(\frac{u}{1-\beta}\Big) + \frac1\beta q(x-u)\Big) - q(x) \]
\[ = (1-\beta^2)\inf_u\Big(f\Big(\frac{u}{1-\beta}\Big) + q\Big(\frac{u}{1-\beta}\Big) + \frac{1}{\beta(1-\beta)}q(x-u) - \frac{1}{1-\beta^2}q(x)\Big). \tag{4.12} \]
Simplifying the portion of (4.12) containing $q$, and using $\frac{1}{(1-\beta)^2} + \frac{1}{\beta(1-\beta)} = \frac{1}{\beta(1-\beta)^2}$ and $\frac{1}{\beta(1-\beta)} - \frac{1}{1-\beta^2} = \frac{1}{\beta(1-\beta^2)}$, we get
\[ q\Big(\frac{u}{1-\beta}\Big) + \frac{1}{\beta(1-\beta)}q(x-u) - \frac{1}{1-\beta^2}q(x) = \frac{1}{2\beta}\Big\|x - \frac{u}{1-\beta}\Big\|^2 + \frac{\beta}{1-\beta^2}\,q(x). \]
Plugging this back into (4.12) gives
\[ S_\beta f(x) = (1-\beta^2)\inf_u\Big(f\Big(\frac{u}{1-\beta}\Big) + \frac1\beta q\Big(x - \frac{u}{1-\beta}\Big)\Big) + \beta q(x) = (1-\beta^2)\inf_w\Big(f(w) + \frac1\beta q(x-w)\Big) + \beta q(x) \]
\[ = (1-\beta^2)\Big(f \,\Box\, \frac1\beta q\Big)(x) + \beta q(x), \]
which is the second equality of (4.11). The convergence result follows from [13, Theorem 1.25]. ∎

Another smooth operator is defined by
\[ T_\beta f := \mathcal{P}(f, 1-\beta, q, \beta). \tag{4.13} \]
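Before analyzing $T_\beta$, here is a minimal numeric sketch (not part of the thesis) of the first operator applied to $f = |\cdot|$ on $\mathbb{R}$, using the second form in (4.11); note that $f \,\Box\, \frac1\beta q$ is exactly the Huber function $e_\beta f$ from Definition 4.2.9.

import numpy as np

# The Moreau envelope e_beta|.| is the Huber function (Definition 4.2.9).
def huber(x, mu):
    return x**2 / (2 * mu) if abs(x) <= mu else abs(x) - mu / 2

def S(x, beta):                 # second form of (4.11) for f = |.|
    return (1 - beta**2) * huber(x, beta) + beta * 0.5 * x**2

for beta in [0.5, 0.1, 0.01]:
    gap = max(abs(S(x, beta) - abs(x)) for x in np.linspace(-2, 2, 401))
    print(f"beta = {beta}: max |S_beta f - f| on [-2, 2] is {gap:.4f}")
# S_beta f is differentiable at 0 for beta > 0, and the gap shrinks as beta -> 0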
Theorem 4.3.2
(i) $(T_\beta f)^* = T_\beta f^*$.
(ii) When $0 < \beta \le 1$, we have
\[ T_\beta f = (1-\beta)\Big(f \,\Box\, \frac{2-\beta}{\beta}q\Big)\Big(\frac{2}{2-\beta}\,\cdot\Big) + \frac{\beta}{2-\beta}q. \]

Proof. (i) This follows from Theorem 4.2.6 and Example 3.2.29:
\[ (T_\beta f)^* = (\mathcal{P}(f, 1-\beta, q, \beta))^* = \mathcal{P}(f^*, 1-\beta, q^*, \beta) = \mathcal{P}(f^*, 1-\beta, q, \beta) = T_\beta f^*. \]

(ii) Applying Proposition 4.1.2(ii), Proposition 3.2.41(i), Example 3.2.29, Fact 3.2.42(i), and Definition 3.2.38,
\[ T_\beta f(x) = \big((1-\beta)(f+q)^* + \beta(2q)^*\big)^*(x) - q(x) = \Big((1-\beta)(f+q)^* + \beta\,\frac{q}{2}\Big)^*(x) - q(x) \]
\[ = \Big[(1-\beta)(f+q)\Big(\frac{\cdot}{1-\beta}\Big) \,\Box\, \frac2\beta q\Big](x) - q(x) = \inf_u\Big((1-\beta)f\Big(\frac{u}{1-\beta}\Big) + (1-\beta)q\Big(\frac{u}{1-\beta}\Big) + \frac2\beta q(x-u)\Big) - q(x). \]
This is equivalent to
\[ T_\beta f(x) = (1-\beta)\inf_u\Big(f\Big(\frac{u}{1-\beta}\Big) + q\Big(\frac{u}{1-\beta}\Big) + \frac{2}{\beta(1-\beta)}q(x-u) - \frac{1}{1-\beta}q(x)\Big). \tag{4.14} \]
Note that
\[ q\Big(\frac{u}{1-\beta}\Big) + \frac{2}{\beta(1-\beta)}q(x-u) - \frac{1}{1-\beta}q(x) = \frac{2-\beta}{2\beta}\Big\|\frac{2x}{2-\beta} - \frac{u}{1-\beta}\Big\|^2 + \frac{\beta}{(1-\beta)(2-\beta)}\,\frac{\|x\|^2}{2}. \]
Substitute this back into (4.14) to get
\[ T_\beta f(x) = (1-\beta)\inf_u\Big(f\Big(\frac{u}{1-\beta}\Big) + \frac{2-\beta}{2\beta}\Big\|\frac{2x}{2-\beta} - \frac{u}{1-\beta}\Big\|^2\Big) + \frac{\beta}{2-\beta}q(x) \]
\[ = (1-\beta)\inf_w\Big(f(w) + \frac{2-\beta}{\beta}q\Big(\frac{2x}{2-\beta} - w\Big)\Big) + \frac{\beta}{2-\beta}q(x) = (1-\beta)\Big(f \,\Box\, \frac{2-\beta}{\beta}q\Big)\Big(\frac{2x}{2-\beta}\Big) + \frac{\beta}{2-\beta}q(x), \]
which proves the desired equality. ∎

Chapter 5  The Kernel Average of Two Functions

5.1 Definition

The kernel average for two functions is given by Bauschke and Wang in [3] as a generalization of the proximal average. A natural extension of the definition of the proximal average, the kernel average replaces the use of $q$ with an arbitrary kernel function $g$. Using the same notation as the previous chapter, we assume $f_1, f_2$, and $g$ are functions in $\Gamma_0(\mathcal{H})$, and $\lambda_1, \lambda_2$ are strictly positive real numbers such that $\lambda_1 + \lambda_2 = 1$.

Definition 5.1.1 (Kernel Average) Let $\mathbf{f} = (f_1, f_2)$ and $\boldsymbol\lambda = (\lambda_1, \lambda_2)$. We define $P(\mathbf{f}, \boldsymbol\lambda, g) : \mathcal{H} \to [-\infty, +\infty]$ at $x \in \mathcal{H}$ by
\[ P(\mathbf{f},\boldsymbol\lambda,g)(x) := \inf_{\lambda_1 y_1 + \lambda_2 y_2 = x}\big(\lambda_1 f_1(y_1) + \lambda_2 f_2(y_2) + \lambda_1\lambda_2\,g(y_1 - y_2)\big) = \inf_{x = z_1 + z_2}\Big(\lambda_1 f_1\Big(\frac{z_1}{\lambda_1}\Big) + \lambda_2 f_2\Big(\frac{z_2}{\lambda_2}\Big) + \lambda_1\lambda_2\,g\Big(\frac{z_1}{\lambda_1} - \frac{z_2}{\lambda_2}\Big)\Big). \tag{5.1} \]
We call this the average of $f_1$ and $f_2$ with respect to the kernel $g$, or the $g$-average of $f_1$ and $f_2$.

Example 5.1.2 (Arithmetic Average) Set $g = \iota_{\{0\}}$. Then
\[ P(\mathbf{f},\boldsymbol\lambda,g)(x) = \inf_{\lambda_1 y_1 + \lambda_2 y_2 = x}\big(\lambda_1 f_1(y_1) + \lambda_2 f_2(y_2) + \lambda_1\lambda_2\,\iota_{\{0\}}(y_1 - y_2)\big) = \inf_{\lambda_1 y_1 + \lambda_2 y_1 = x}\big(\lambda_1 f_1(y_1) + \lambda_2 f_2(y_1)\big) = \lambda_1 f_1(x) + \lambda_2 f_2(x), \]
the arithmetic average.

Lemma 5.1.3 The following equality holds when $\lambda_1 + \lambda_2 = 1$:
\[ \lambda_1\lambda_2\|y_1 - y_2\|^2 = \lambda_1\|y_1\|^2 + \lambda_2\|y_2\|^2 - \|\lambda_1 y_1 + \lambda_2 y_2\|^2. \]

Proof. Let $y_1, y_2 \in \mathcal{H}$. Then by Lemma 4.2.3,
\[ \lambda_1\|y_1\|^2 + \lambda_2\|y_2\|^2 - \|\lambda_1 y_1 + \lambda_2 y_2\|^2 = 2\cdot\frac14\sum_{i=1}^2\sum_{j=1}^2\lambda_i\lambda_j\|y_i - y_j\|^2 = 2\Big(\frac14\lambda_1\lambda_2\|y_1-y_2\|^2 + \frac14\lambda_2\lambda_1\|y_2-y_1\|^2\Big) = \lambda_1\lambda_2\|y_1-y_2\|^2. \qquad\blacksquare \]

Example 5.1.4 (Proximal Average) If $g = q$, then
\[ P(\mathbf{f},\boldsymbol\lambda,g)(x) = \inf_{\lambda_1 y_1 + \lambda_2 y_2 = x}\Big(\lambda_1 f_1(y_1) + \lambda_2 f_2(y_2) + \frac{\lambda_1\lambda_2}{2}\|y_1 - y_2\|^2\Big). \]
Applying Lemma 5.1.3,
\[ P(\mathbf{f},\boldsymbol\lambda,g)(x) = \inf_{\lambda_1 y_1 + \lambda_2 y_2 = x}\Big(\lambda_1 f_1(y_1) + \lambda_2 f_2(y_2) + \frac12\lambda_1\|y_1\|^2 + \frac12\lambda_2\|y_2\|^2 - \frac12\|x\|^2\Big), \]
which is the proximal average with $n = 2$.

Example 5.1.5 Let $f_1 = \iota_{\{a\}}$ and $f_2 = \iota_{\{b\}}$, with $a, b \in \mathbb{R}$. Then
\[ P(\mathbf{f},\boldsymbol\lambda,g)(x) = \begin{cases} \lambda_1\lambda_2\,g(a - b), & \text{if } x = \lambda_1 a + \lambda_2 b; \\ +\infty, & \text{otherwise.} \end{cases} \]

5.2 Properties

Fact 5.2.1 (Fenchel Conjugate) [3, Theorem 2.2] Let $f_1, f_2, g \in \Gamma_0(\mathcal{H})$.
For every $x^* \in \mathcal{H}$,
\[ (P(\mathbf{f},\boldsymbol\lambda,g))^*(x^*) = (\operatorname{cl}\varphi)(\lambda_1 x^*, \lambda_2 x^*), \tag{5.2} \]
where
\[ \varphi(u, v) = \inf_{\lambda_1 z_1 + \lambda_2 z_2 = u + v}\Big(\lambda_1 f_1^*(z_1) + \lambda_2 f_2^*(z_2) + \frac{\lambda_1\lambda_2}{2}\Big(g^*\Big(\frac{u}{\lambda_1\lambda_2} - \frac{z_1}{\lambda_2}\Big) + g^*\Big(\frac{z_2}{\lambda_1} - \frac{v}{\lambda_1\lambda_2}\Big)\Big)\Big). \]
Furthermore, if $(\operatorname{ri dom} f_1 - \operatorname{ri dom} f_2) \cap \operatorname{ri dom} g \neq \emptyset$, then the closure operation in (5.2) can be omitted and we get that
\[ (P(\mathbf{f},\boldsymbol\lambda,g))^* = P(\mathbf{f}^*,\boldsymbol\lambda,(g^*)^\vee), \tag{5.3} \]
where for a given function $g \in \Gamma_0(\mathcal{H})$ we let $g^\vee(x) = g(-x)$, and the infimum in Definition 5.1.1 is attained; that is,
\[ P(\mathbf{f}^*,\boldsymbol\lambda,(g^*)^\vee)(x^*) = \min_{x^* = \lambda_1 z_1 + \lambda_2 z_2}\big(\lambda_1 f_1^*(z_1) + \lambda_2 f_2^*(z_2) + \lambda_1\lambda_2\,g^*(z_2 - z_1)\big). \tag{5.4} \]

Corollary 5.2.2 Let $f_1, f_2, g \in \Gamma_0(\mathcal{H})$, and assume that both $g$ and $g^*$ have full domain. Then both $P(\mathbf{f},\boldsymbol\lambda,g)$ and $P(\mathbf{f}^*,\boldsymbol\lambda,(g^*)^\vee)$ are in $\Gamma_0(\mathcal{H})$ and
\[ (P(\mathbf{f},\boldsymbol\lambda,g))^* = P(\mathbf{f}^*,\boldsymbol\lambda,(g^*)^\vee). \tag{5.5} \]
In particular, for $g = \frac1p\|\cdot\|^p$ with $p > 1$, we have
\[ \Big(P\Big(\mathbf{f},\boldsymbol\lambda,\frac1p\|\cdot\|^p\Big)\Big)^* = P\Big(\mathbf{f}^*,\boldsymbol\lambda,\frac1q\|\cdot\|^q\Big), \tag{5.6} \]
where $\frac1p + \frac1q = 1$.

Can the definition of the kernel average be generalized to $n$ functions? What is the reformulation of Definition 5.1.1? This will be addressed in the next chapter.

Chapter 6  The Kernel Average of n Functions

6.1 Motivation

In this chapter, we look at another reformulation of the proximal average and see how that reformulation can be used to extend the definition of the kernel average to $n$ functions. As in the previous chapters, we assume $f_1, \dots, f_n$ and $g$ are functions in $\Gamma_0(\mathcal{H})$, and $\lambda_1, \dots, \lambda_n$ are strictly positive real numbers such that $\sum_{i=1}^n \lambda_i = 1$. First we will prove a new reformulation of the proximal average.

Theorem 6.1.1 Let $\mathbf{f} = (f_1, \dots, f_n)$ with $f_1, \dots, f_n \in \Gamma_0(\mathcal{H})$, and $\boldsymbol\lambda = (\lambda_1, \dots, \lambda_n)$ with $\lambda_1, \dots, \lambda_n \ge 0$ and $\sum_{i=1}^n \lambda_i = 1$. Define
\[ f := \mathcal{P}(\mathbf{f},\boldsymbol\lambda) = \big(\lambda_1(f_1+q)^* + \cdots + \lambda_n(f_n+q)^*\big)^* - q. \]
Then for every $x \in \mathcal{H}$,
\[ f(x) = \inf_{\lambda_1 y_1 + \cdots + \lambda_n y_n = x}\Big(\lambda_1 f_1(y_1) + \cdots + \lambda_n f_n(y_n) + \frac12\sum_{i<j}\lambda_i\lambda_j\|y_i - y_j\|^2\Big). \]

Proof. By Proposition 4.1.2(i) and Lemma 4.2.3,
\[ f(x) = \inf_{\sum_i\lambda_i y_i = x}\Big(\sum_{i=1}^n\lambda_i f_i(y_i) + \sum_{i=1}^n\lambda_i q(y_i) - q(x)\Big) = \inf_{\sum_i\lambda_i y_i = x}\Big(\sum_{i=1}^n\lambda_i f_i(y_i) + \sum_{i=1}^n\lambda_i q(y_i) - q\Big(\sum_{i=1}^n\lambda_i y_i\Big)\Big) \]
\[ = \inf_{\sum_i\lambda_i y_i = x}\Big(\sum_{i=1}^n\lambda_i f_i(y_i) + \frac14\sum_{i=1}^n\sum_{j=1}^n\lambda_i\lambda_j\|y_i-y_j\|^2\Big) = \inf_{\sum_i\lambda_i y_i = x}\Big(\sum_{i=1}^n\lambda_i f_i(y_i) + \sum_{i<j}\lambda_i\lambda_j\,\frac12\|y_i-y_j\|^2\Big). \qquad\blacksquare \]

This reformulation of the proximal average suggests a generalization where $\frac12\|\cdot\|^2$ is replaced by a function $g$. We'll call this generalization the kernel average of $n$ functions, defined by
\[ Q_g(\mathbf{f},\boldsymbol\lambda)(x) := \inf_{\sum_i\lambda_i y_i = x}\Big(\sum_i\lambda_i f_i(y_i) + \sum_{i<j}\lambda_i\lambda_j\,g(y_i - y_j)\Big). \tag{6.1} \]
This definition is the same as the kernel average when $n = 2$, but extends the kernel average definition by allowing more than two functions. We'll now explore the kernel average for $n$ functions in a bit more detail, and from here on when we refer to the kernel average we mean $Q_g(\mathbf{f},\boldsymbol\lambda)$.

6.2 The Kernel Average Conjugate

We will now consider the kernel average as
\[ Q_g(\mathbf{f},\boldsymbol\lambda)(x) = \inf_{A\mathbf{y} = x}\big\{h(\mathbf{y}) := \tilde f(\mathbf{y}) + \tilde g(\mathbf{y})\big\} = (Ah)(x), \]
where
\[ A(x_1, \dots, x_n) = \sum_{i=1}^n\lambda_i x_i, \qquad (Ah)(x) = \inf_{\{\mathbf{y} : A\mathbf{y} = x\}} h(\mathbf{y}), \qquad \tilde f(\mathbf{y}) = \sum_i\lambda_i f_i(y_i), \qquad \tilde g(\mathbf{y}) = \sum_{i<j}\lambda_i\lambda_j\,g(y_i - y_j). \]
In light of Proposition 3.2.35, we get
\[ Q_g^*(x^*) = (h^* \circ A^*)(x^*) = \big((\tilde f + \tilde g)^* \circ A^*\big)(x^*), \]
so we can see that to get $Q_g^*$ we need to compute $h^* = (\tilde f + \tilde g)^*$, which by Fact 3.2.42 satisfies $h^* = \tilde f^* \,\Box\, \tilde g^*$. It is quite simple to compute $\tilde f^*$, and this was done in Lemma 3.2.44, but $\tilde g^*$ is more challenging.
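Before attacking $\tilde g^*$ analytically, the definition (6.1) itself is straightforward to evaluate by brute force; the following sketch (not part of the thesis; the functions, weights, and grid are illustrative) computes $Q_g(\mathbf{f},\boldsymbol\lambda)$ for $n = 3$ on $\mathbb{R}$ with the kernel $g = \frac1p|\cdot|^p$:

import numpy as np
from itertools import product

# Brute-force evaluation of (6.1) for n = 3 on R with kernel g = |.|^p / p.
p = 1.5
lam = [0.2, 0.3, 0.5]
f1, f2, f3 = abs, lambda y: 0.5 * y**2, lambda y: (y - 1.0)**2
g = lambda t: abs(t)**p / p

def Q(x, grid=np.linspace(-4, 4, 81)):
    best = np.inf
    for y1, y2 in product(grid, grid):
        y3 = (x - lam[0]*y1 - lam[1]*y2) / lam[2]   # constraint sum lam_i y_i = x
        val = (lam[0]*f1(y1) + lam[1]*f2(y2) + lam[2]*f3(y3)
               + lam[0]*lam[1]*g(y1 - y2)
               + lam[0]*lam[2]*g(y1 - y3)
               + lam[1]*lam[2]*g(y2 - y3))
        best = min(best, val)
    return best

for x in [-1.0, 0.0, 0.5, 1.5]:
    print(x, round(Q(x), 4))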
To consider $\tilde g^* = \big(\sum_{i<j}\lambda_i\lambda_j\,g(y_i - y_j)\big)^*$, we'll first begin with the case which gives us the proximal average, where $g = q$, with the hope that this will allow us to find a general formula for any $g$.

Proposition 6.2.1 Let $\mathbf{x} = (x_1, \dots, x_n)$ and
\[ g_1(\mathbf{x}) = \sum_{i=1}^n\sum_{j=1}^n\frac12\|x_i - x_j\|^2 = 2\sum_{i<j}\frac12\|x_i - x_j\|^2. \]
Then
\[ g_1^*(\mathbf{x}^*) = \begin{cases} \dfrac{1}{8n^2}\displaystyle\sum_{i=1}^n\sum_{j=1}^n\|x_i^* - x_j^*\|^2, & \text{if } x_1^* + \cdots + x_n^* = 0; \\ +\infty, & \text{otherwise.} \end{cases} \]

Proof. Let $\lambda_i = \frac1n$ for $i = 1, \dots, n$. Then $g_1$ can be rewritten as
\[ g_1(\mathbf{x}) = \sum_{i=1}^n\sum_{j=1}^n\frac12\|x_i - x_j\|^2 = 2n^2\cdot\frac14\sum_{i=1}^n\sum_{j=1}^n\lambda_i\lambda_j\|x_i - x_j\|^2. \]
Using Lemma 4.2.3,
\[ g_1(\mathbf{x}) = 2n^2\Big(\sum_{i=1}^n\lambda_i q(x_i) - q\Big(\sum_{i=1}^n\lambda_i x_i\Big)\Big) = 2n^2\,g(\mathbf{x}), \]
where $g$ is as defined in Lemma 4.2.4. Then by Fact 3.2.37 and Lemma 4.2.4,
\[ g_1^*(\mathbf{x}^*) = 2n^2\,g^*\Big(\frac{\mathbf{x}^*}{2n^2}\Big) = \begin{cases} 2n^2\sum_{i=1}^n(\lambda_i\star q)\big(\frac{x_i^*}{2n^2}\big), & \text{if } \frac{x_1^*}{2n^2} + \cdots + \frac{x_n^*}{2n^2} = 0; \\ +\infty, & \text{otherwise.} \end{cases} \]
Now $(\lambda_i\star q)\big(\frac{x_i^*}{2n^2}\big) = n\,q\big(\frac{x_i^*}{2n^2}\big) = \frac{1}{4n^3}q(x_i^*)$, so
\[ 2n^2\sum_{i=1}^n(\lambda_i\star q)\Big(\frac{x_i^*}{2n^2}\Big) = \frac{1}{2n}\sum_{i=1}^n q(x_i^*) = \frac12\sum_{i=1}^n\lambda_i q(x_i^*). \]
Applying Lemma 4.2.3,
\[ \frac12\sum_{i=1}^n\lambda_i q(x_i^*) = \frac12 q\Big(\sum_{i=1}^n\lambda_i x_i^*\Big) + \frac18\sum_{i=1}^n\sum_{j=1}^n\lambda_i\lambda_j\|x_i^* - x_j^*\|^2. \]
Since $\frac{x_1^*}{2n^2} + \cdots + \frac{x_n^*}{2n^2} = 0 \iff \lambda_1 x_1^* + \cdots + \lambda_n x_n^* = 0 \iff \sum_i x_i^* = 0$, the first term vanishes on this set and we get
\[ 2n^2\sum_{i=1}^n(\lambda_i\star q)\Big(\frac{x_i^*}{2n^2}\Big) = \frac18\sum_{i=1}^n\sum_{j=1}^n\lambda_i\lambda_j\|x_i^* - x_j^*\|^2 = \frac{1}{8n^2}\sum_{i=1}^n\sum_{j=1}^n\|x_i^* - x_j^*\|^2. \]
Altogether,
\[ g_1^*(\mathbf{x}^*) = \begin{cases} \frac{1}{8n^2}\sum_{i=1}^n\sum_{j=1}^n\|x_i^* - x_j^*\|^2, & \text{if } x_1^* + \cdots + x_n^* = 0; \\ +\infty, & \text{otherwise.} \end{cases} \qquad\blacksquare \]

Proposition 6.2.1 can also be proven directly using critical points.

Proof. By definition,
\[ g_1^*(\mathbf{x}^*) = \sup_{x_1,\dots,x_n}\Big(\langle x_1^*, x_1\rangle + \cdots + \langle x_n^*, x_n\rangle - \sum_{i=1}^n\sum_{j=1}^n\frac12\|x_i - x_j\|^2\Big). \]
Let $\hat g_1(\mathbf{x}) = \langle x_1^*, x_1\rangle + \cdots + \langle x_n^*, x_n\rangle - \sum_i\sum_j\frac12\|x_i - x_j\|^2$. Since $\frac{\partial}{\partial x_i}\sum_i\sum_j\frac12\|x_i - x_j\|^2 = 2\sum_{j=1}^n(x_i - x_j)$, we get
\[ \nabla\hat g_1(\mathbf{x}) = \Big(x_1^* - 2\sum_{l=1}^n(x_1 - x_l),\ \dots,\ x_n^* - 2\sum_{l=1}^n(x_n - x_l)\Big). \]
Setting this equal to zero to solve for the critical points, we see
\[ (x_1^*, \dots, x_n^*) = \Big(2\sum_{l=1}^n(x_1 - x_l),\ \dots,\ 2\sum_{l=1}^n(x_n - x_l)\Big), \]
and therefore that $x_1^* + \cdots + x_n^* = 0$. We also get
\[ x_i^* - x_j^* = 2\sum_{l=1}^n\big((x_i - x_l) - (x_j - x_l)\big) = 2\sum_{l=1}^n(x_i - x_j) = 2n(x_i - x_j), \]
which implies that $x_i - x_j = \frac{1}{2n}(x_i^* - x_j^*)$ for all $i, j$ with $1 \le i, j \le n$. Since $\hat g_1$ is a sum of linear and concave functions, $\hat g_1$ is concave. Thus the critical point is a maximum and we can substitute into $g_1^*$ to find the supremum. Doing this, we find
\[ g_1^*(\mathbf{x}^*) = \langle x_1^*, x_1\rangle + \cdots + \langle x_{n-1}^*, x_{n-1}\rangle + \langle -x_1^* - \cdots - x_{n-1}^*, x_n\rangle - \frac{1}{8n^2}\sum_{i=1}^n\sum_{j=1}^n\|x_i^* - x_j^*\|^2 \]
\[ = \langle x_1^*, x_1 - x_n\rangle + \cdots + \langle x_{n-1}^*, x_{n-1} - x_n\rangle - \frac{1}{8n^2}\sum_i\sum_j\|x_i^* - x_j^*\|^2 \]
\[ = \Big\langle x_1^*, \frac{1}{2n}(x_1^* - x_n^*)\Big\rangle + \cdots + \Big\langle x_{n-1}^*, \frac{1}{2n}(x_{n-1}^* - x_n^*)\Big\rangle - \frac{1}{8n^2}\sum_i\sum_j\|x_i^* - x_j^*\|^2 \]
\[ = \frac{1}{2n}\big(\|x_1^*\|^2 + \cdots + \|x_{n-1}^*\|^2\big) + \frac{1}{2n}\langle -x_1^* - \cdots - x_{n-1}^*, x_n^*\rangle - \frac{1}{8n^2}\sum_i\sum_j\|x_i^* - x_j^*\|^2 \]
\[ = \frac{1}{2n}\sum_{i=1}^n\|x_i^*\|^2 - \frac{1}{8n^2}\sum_i\sum_j\|x_i^* - x_j^*\|^2, \]
using $-x_1^* - \cdots - x_{n-1}^* = x_n^*$. Then using Lemma 4.2.3 (with $\lambda_i = \frac1n$ and $\sum_i x_i^* = 0$) we get
\[ g_1^*(\mathbf{x}^*) = \frac{1}{4n^2}\sum_{i=1}^n\sum_{j=1}^n\|x_i^* - x_j^*\|^2 - \frac{1}{8n^2}\sum_{i=1}^n\sum_{j=1}^n\|x_i^* - x_j^*\|^2 = \frac{1}{8n^2}\sum_{i=1}^n\sum_{j=1}^n\|x_i^* - x_j^*\|^2 \]
when $x_1^* + \cdots + x_n^* = 0$.
If $x_1^* + \cdots + x_n^* \neq 0$, then set $x_1 = \cdots = x_n = x$ and get that
\[ g_1^*(\mathbf{x}^*) \ge \sup_x\Big\langle\sum_{i=1}^n x_i^*, x\Big\rangle = +\infty. \qquad\blacksquare \]

This formula can be written in several equivalent forms using the fact that $\sum_{i=1}^n x_i^* = 0$. Using this, we can see that
\[ \sum_{i=1}^n\sum_{j=1}^n\|x_i^* - x_j^*\|^2 = \sum_{i=1}^n\sum_{j=1}^n\big(\|x_i^*\|^2 + \|x_j^*\|^2 - 2\langle x_i^*, x_j^*\rangle\big) = \sum_{i=1}^n\Big(\sum_{j=1}^n\big(\|x_i^*\|^2 + \|x_j^*\|^2\big) - 2\Big\langle x_i^*, \sum_{j=1}^n x_j^*\Big\rangle\Big) \]
\[ = \sum_{i=1}^n\sum_{j=1}^n\big(\|x_i^*\|^2 + \|x_j^*\|^2\big) - 0 = \sum_{i=1}^n\sum_{j=1}^n\big(\|x_i^*\|^2 + \|x_j^*\|^2\big) + 2\sum_{i=1}^n\Big\langle x_i^*, \sum_{j=1}^n x_j^*\Big\rangle = \sum_{i=1}^n\sum_{j=1}^n\|x_i^* + x_j^*\|^2. \]
And from the first proof of Proposition 6.2.1 we also see that
\[ g_1^*(\mathbf{x}^*) = \frac{1}{8n^2}\sum_{i=1}^n\sum_{j=1}^n\|x_i^* - x_j^*\|^2 = \frac{1}{2n}\sum_{i=1}^n\frac12\|x_i^*\|^2. \]
So the following three formulations for the conjugate of $g_1$ are equivalent:
(1) $\displaystyle g_1^*(\mathbf{x}^*) = \frac{1}{8n^2}\sum_{i=1}^n\sum_{j=1}^n\|x_i^* - x_j^*\|^2 = \sum_{i=1}^n\sum_{j=1}^n\frac12\Big\|\frac{x_i^* - x_j^*}{2n}\Big\|^2$;
(2) $\displaystyle g_1^*(\mathbf{x}^*) = \frac{1}{8n^2}\sum_{i=1}^n\sum_{j=1}^n\|x_i^* + x_j^*\|^2 = \sum_{i=1}^n\sum_{j=1}^n\frac12\Big\|\frac{x_i^* + x_j^*}{2n}\Big\|^2$;
(3) $\displaystyle g_1^*(\mathbf{x}^*) = \frac{1}{2n}\sum_{i=1}^n\frac12\|x_i^*\|^2$;
when $x_1^* + \cdots + x_n^* = 0$, and $g_1^*(\mathbf{x}^*) = +\infty$ otherwise.

The next step we wish to consider is the case where $g = \frac12\sum_{i=1}^n\sum_{j=1}^n\frac1p\|x_i - x_j\|^p$, for both general $p > 1$ and general $n > 1$. The conjugate is known for general $p > 1$ and $n = 2$, so in the next section we begin looking at general $p$ with $n = 3$.

Example 6.2.2 (General $p$, $n = 2$) Let $f(x_1, x_2) = \frac1p\|x_1 - x_2\|^p$ with $p > 1$. Then
\[ f^*(y_1, y_2) = \sup_{x_1, x_2}\Big(\langle x_1, y_1\rangle + \langle x_2, y_2\rangle - \frac1p\|x_1 - x_2\|^p\Big) = \sup_{x_1, x_2}\Big(\langle x_1 - x_2, y_1\rangle + \langle x_2, y_1 + y_2\rangle - \frac1p\|x_1 - x_2\|^p\Big). \]
Using Example 3.2.30, with $\frac1p + \frac1q = 1$, we get
\[ f^*(y_1, y_2) = \begin{cases} \frac1q\|y_1\|^q, & \text{if } y_1 + y_2 = 0; \\ +\infty, & \text{otherwise.} \end{cases} \]

6.3 P-norm Kernel Conjugate for General p Case when n = 3

We now wish to consider the case where $g_2 = \frac12\sum_{i=1}^3\sum_{j=1}^3\frac1p\|x_i - x_j\|^p$; that is,
\[ g_2(x_1, x_2, x_3) = \frac1p\|x_1 - x_2\|^p + \frac1p\|x_2 - x_3\|^p + \frac1p\|x_3 - x_1\|^p. \]
Then $g_2^*(y_1, y_2, y_3)$ is equal to
\[ \sup_{x_1, x_2, x_3}\Big(\langle x_1, y_1\rangle + \langle x_2, y_2\rangle + \langle x_3, y_3\rangle - \frac1p\|x_1 - x_2\|^p - \frac1p\|x_2 - x_3\|^p - \frac1p\|x_3 - x_1\|^p\Big) \]
\[ = \sup_{x_1, x_2}\Big(\langle x_1, y_1\rangle + \langle x_2, y_2\rangle - \frac1p\|x_1 - x_2\|^p + \sup_{x_3}\Big(\langle x_3, y_3\rangle - \frac1p\|x_2 - x_3\|^p - \frac1p\|x_3 - x_1\|^p\Big)\Big). \tag{6.2} \]
Recognizing that
\[ \sup_{x_3}\Big(\langle x_3, y_3\rangle - \frac1p\|x_2 - x_3\|^p - \frac1p\|x_3 - x_1\|^p\Big) = \Big(\frac1p\|\cdot - x_1\|^p + \frac1p\|\cdot - x_2\|^p\Big)^*(y_3), \]
then applying Fact 3.2.42(i),
\[ \sup_{x_3}\Big(\langle x_3, y_3\rangle - \frac1p\|x_2 - x_3\|^p - \frac1p\|x_3 - x_1\|^p\Big) = \Big(\big(\tfrac1p\|\cdot - x_1\|^p\big)^* \,\Box\, \big(\tfrac1p\|\cdot - x_2\|^p\big)^*\Big)(y_3). \tag{6.3} \]
Using Proposition 3.2.28 and Example 3.2.30 we get
\[ \Big(\frac1p\|\cdot - z\|^p\Big)^*(y) = \langle y, z\rangle + \frac1q\|y\|^q. \tag{6.4} \]
Combining (6.4) and (6.3),
\[ \Big(\big(\tfrac1p\|\cdot - x_1\|^p\big)^* \,\Box\, \big(\tfrac1p\|\cdot - x_2\|^p\big)^*\Big)(y_3) = \Big(\Big(\frac1q\|\cdot\|^q + \langle x_1, \cdot\rangle\Big) \,\Box\, \Big(\frac1q\|\cdot\|^q + \langle x_2, \cdot\rangle\Big)\Big)(y_3) \]
\[ = \inf_{u + v = y_3}\Big(\frac1q\|u\|^q + \frac1q\|v\|^q + \langle x_1, u\rangle + \langle x_2, v\rangle\Big). \tag{6.5} \]
Substituting (6.5) back into (6.2) and setting $v = y_3 - u$ yields
\[ g_2^*(y_1, y_2, y_3) = \sup_{x_1, x_2}\inf_u\Big(\langle x_1, y_1\rangle + \langle x_2, y_2\rangle - \frac1p\|x_1 - x_2\|^p + \frac1q\|u\|^q + \frac1q\|u - y_3\|^q + \langle x_1, u\rangle + \langle x_2, y_3 - u\rangle\Big). \]
Looking at the above equation we can see that
\[ F((x_1, x_2), u) = \langle x_1, y_1\rangle + \langle x_2, y_2\rangle - \frac1p\|x_1 - x_2\|^p + \frac1q\|u\|^q + \frac1q\|u - y_3\|^q + \langle x_1, u\rangle + \langle x_2, y_3 - u\rangle \]
is concave-convex, since $F((x_1, x_2), \cdot)$ is a sum of linear and convex functions and $F(\cdot, u)$ is a sum of linear and concave functions. Fix $\mathbf{x} = (x_1, x_2) \in \mathcal{H}\times\mathcal{H}$, and let $F((x_1, x_2), u) = F_{\mathbf{x}}(u)$. Using Fact 3.4.3, we show that a saddle value exists by showing that $0^+(\operatorname{epi} F_{\mathbf{x}}) = \{(0, \lambda) : \lambda \ge 0\}$.
The next step we wish to consider is the case where $g = \frac12 \sum_{i=1}^n \sum_{j=1}^n \frac1p \|x_i - x_j\|^p$ for both general $p > 1$ and general $n > 1$. The conjugate is known for general $p > 1$ and $n = 2$, so in the next section we begin looking at general $p$ with $n = 3$.

Example 6.2.2 (General p, n = 2) Let $f(x_1, x_2) = \frac1p \|x_1 - x_2\|^p$ with $p > 1$. Then
\begin{align*}
f^*(y_1, y_2) &= \sup_{x_1, x_2} \Big( \langle x_1, y_1\rangle + \langle x_2, y_2\rangle - \frac1p \|x_1 - x_2\|^p \Big)\\
&= \sup_{x_1, x_2} \Big( \langle x_1 - x_2, y_1\rangle + \langle x_2, y_1 + y_2\rangle - \frac1p \|x_1 - x_2\|^p \Big).
\end{align*}
And using Example 3.2.30, with $\frac1p + \frac1q = 1$, we get
\[ f^*(y_1, y_2) = \begin{cases} \frac1q \|y_1\|^q, & \text{if } y_1 + y_2 = 0;\\ +\infty, & \text{otherwise.} \end{cases} \]

6.3 P-norm Kernel Conjugate for General p Case when n = 3

We now wish to consider the case where $g_2 = \frac12 \sum_{i=1}^3 \sum_{j=1}^3 \frac1p \|x_i - x_j\|^p$, that is
\[ g_2(x_1, x_2, x_3) = \frac1p \|x_1 - x_2\|^p + \frac1p \|x_2 - x_3\|^p + \frac1p \|x_3 - x_1\|^p. \]
Then $g_2^*(y_1, y_2, y_3)$ is equal to
\begin{align*}
&\sup_{x_1, x_2, x_3} \Big\{ \langle x_1, y_1\rangle + \langle x_2, y_2\rangle + \langle x_3, y_3\rangle - \frac1p \|x_1 - x_2\|^p - \frac1p \|x_2 - x_3\|^p - \frac1p \|x_3 - x_1\|^p \Big\}\\
&= \sup_{x_1, x_2} \Big\{ \langle x_1, y_1\rangle + \langle x_2, y_2\rangle - \frac1p \|x_1 - x_2\|^p + \sup_{x_3}\Big( \langle x_3, y_3\rangle - \frac1p \|x_2 - x_3\|^p - \frac1p \|x_3 - x_1\|^p \Big) \Big\} \quad (6.2)
\end{align*}
Recognizing that $\sup_{x_3}\{ \langle x_3, y_3\rangle - \frac1p \|x_2 - x_3\|^p - \frac1p \|x_3 - x_1\|^p \} = \big( \frac1p \|x_1 - \cdot\|^p + \frac1p \|x_2 - \cdot\|^p \big)^*(y_3)$, then applying Fact 3.2.42(i), this supremum equals the infimal convolution of the conjugates:
\[ \sup_{x_3}\Big\{ \langle x_3, y_3\rangle - \frac1p \|x_2 - x_3\|^p - \frac1p \|x_3 - x_1\|^p \Big\} = \Big( \big( \tfrac1p \|\cdot - x_1\|^p \big)^* \,\Box\, \big( \tfrac1p \|\cdot - x_2\|^p \big)^* \Big)(y_3) \quad (6.3) \]
Using Proposition 3.2.28 and Example 3.2.30 we get
\[ \Big( \frac1p \|x - z\|^p \Big)^*(y) = \langle y, z\rangle + \frac1q \|y\|^q. \quad (6.4) \]
Combining (6.4) and (6.3),
\begin{align*}
\Big( \big( \tfrac1p \|\cdot - x_1\|^p \big)^* \,\Box\, \big( \tfrac1p \|\cdot - x_2\|^p \big)^* \Big)(y_3) &= \Big( \big( \tfrac1q \|\cdot\|^q + \langle x_1, \cdot\rangle \big) \,\Box\, \big( \tfrac1q \|\cdot\|^q + \langle x_2, \cdot\rangle \big) \Big)(y_3)\\
&= \inf_{u + v = y_3} \Big\{ \frac1q \|u\|^q + \frac1q \|v\|^q + \langle x_1, u\rangle + \langle x_2, v\rangle \Big\}. \quad (6.5)
\end{align*}
Substituting (6.5) back into (6.2) and setting $v = y_3 - u$ yields
\[ g_2^*(y_1, y_2, y_3) = \sup_{x_1, x_2} \inf_u \Big[ \langle x_1, y_1\rangle + \langle x_2, y_2\rangle - \frac1p \|x_1 - x_2\|^p + \frac1q \|u\|^q + \frac1q \|u - y_3\|^q + \langle x_1, u\rangle + \langle x_2, y_3 - u\rangle \Big]. \]
Looking at the above equation we can see that
\[ F((x_1, x_2), u) = \langle x_1, y_1\rangle + \langle x_2, y_2\rangle - \frac1p \|x_1 - x_2\|^p + \frac1q \|u\|^q + \frac1q \|u - y_3\|^q + \langle x_1, u\rangle + \langle x_2, y_3 - u\rangle \]
is concave-convex, since $F((x_1, x_2), \cdot)$ is a sum of linear and convex functions and $F(\cdot, u)$ is a sum of linear and concave functions. Fix $x = (x_1, x_2) \in \mathcal H \times \mathcal H$, and let $F_x(u) = F((x_1, x_2), u)$. Using Fact 3.4.3, we show that a saddle value exists by showing that $0^+(\operatorname{epi} F_x) = \{(0, \lambda) : \lambda \ge 0\}$.

By definition, $0^+(\operatorname{epi} F_x) = \operatorname{epi} F_x^\infty$, so $(u, \lambda) \in 0^+(\operatorname{epi} F_x) \Leftrightarrow (u, \lambda) \in \operatorname{epi} F_x^\infty \Leftrightarrow \lambda \ge F_x^\infty(u)$. Using Proposition 3.2.19 to compute $F_x^\infty(u)$ yields
\begin{align*}
F_x^\infty(u) &= \lim_{\lambda \to \infty} \frac{F_x(\lambda u) - F_x(0)}{\lambda}\\
&= \lim_{\lambda \to \infty} \frac{\frac1q \|\lambda u\|^q + \frac1q \|\lambda u - y_3\|^q + \langle x_1, \lambda u\rangle + \langle x_2, y_3 - \lambda u\rangle - \frac1q \|y_3\|^q - \langle x_2, y_3\rangle}{\lambda}\\
&= \lim_{\lambda \to \infty} \frac{\lambda^q \frac1q \Big( \|u\|^q + \big\| u - \frac{y_3}{\lambda} \big\|^q \Big) - \frac1q \|y_3\|^q}{\lambda} + \langle x_1, u\rangle - \langle x_2, u\rangle\\
&= \begin{cases} +\infty, & \text{if } u \ne 0;\\ 0, & \text{if } u = 0. \end{cases}
\end{align*}
Therefore $(u, \lambda) \in \operatorname{epi} F_x^\infty \Leftrightarrow u = 0$ and $\lambda \ge 0$. Since there is no common nonzero direction of recession for $F_x$, we can use Fact 3.4.3 to swap the positions of the infimum and supremum, and combining the appropriate inner product terms produces
\[ g_2^*(y_1, y_2, y_3) = \inf_u \sup_{x_1, x_2} \Big( \langle x_1, y_1 + u\rangle + \langle x_2, y_2 + y_3 - u\rangle - \frac1p \|x_1 - x_2\|^p + \frac1q \|u\|^q + \frac1q \|u - y_3\|^q \Big) \quad (6.6) \]
Considering the inner supremum first, we will fix $x_1$ and let $z = x_2 - x_1$. Then (6.6) becomes
\begin{align*}
&\inf_u \Big( \frac1q \|u\|^q + \frac1q \|u - y_3\|^q + \sup_{x_1} \langle x_1, y_1 + u\rangle + \sup_z \big( \langle x_1 + z, y_2 + y_3 - u\rangle - \frac1p \|z\|^p \big) \Big)\\
&= \inf_u \Big( \frac1q \|u\|^q + \frac1q \|u - y_3\|^q + \sup_{x_1} \langle x_1, y_1 + y_2 + y_3\rangle + \sup_z \big( \langle z, y_2 + y_3 - u\rangle - \frac1p \|z\|^p \big) \Big).
\end{align*}
Here we can see that the supremum on the right is the definition of $(\frac1p \|\cdot\|^p)^*$ evaluated at $y_2 + y_3 - u$. Using Example 3.2.30 and the fact that $y_1 + y_2 + y_3 = 0$ (the supremum over $x_1$ is $+\infty$ otherwise), we then get
\[ g_2^*(y_1, y_2, y_3) = \inf_u \Big( \frac1q \|u\|^q + \frac1q \|u - y_3\|^q + \frac1q \|y_2 + y_3 - u\|^q \Big), \]
where $y_1 + y_2 + y_3 = 0$ and $\frac1p + \frac1q = 1$.

Since $g_2$ is symmetric under permutation of its variables, we interchange $y_1$ and $y_3$ so that the previous description of $g_2^*$ turns into the more symmetric form
\[ g_2^*(y_1, y_2, y_3) = \inf_x \Big( \frac1q \|x - y_1\|^q + \frac1q \|x - (y_1 + y_2)\|^q + \frac1q \|x - (y_1 + y_2 + y_3)\|^q \Big), \]
where $y_1 + y_2 + y_3 = 0$ and $\frac1p + \frac1q = 1$.

In $\mathbb R$, the problem becomes
\[ \min_{x \in \mathbb R} \ \frac1q |x - y_1|^q + \frac1q |x - (y_1 + y_2)|^q + \frac1q |x|^q \quad (6.7) \]
with $\frac1p + \frac1q = 1$ and $y_1 + y_2 + y_3 = 0$. We will define
\[ h(x) := \frac1q |x - y_1|^q + \frac1q |x - (y_1 + y_2)|^q + \frac1q |x|^q, \quad (6.8) \]
so that we are solving $\min_{x \in \mathbb R} h(x)$. Since the case with $p = 2$ and $q = 2$ was already solved, we consider this problem with the next simplest case, $q = 3$, which corresponds to $p = \frac32$.

6.4 P-norm Kernel Conjugate when p = 3/2, q = 3, and n = 3

Considering (6.8) with $q = 3$ gives the problem
\[ \min_{x \in \mathbb R} h(x) \quad (6.9) \]
where
\[ h(x) = \frac13 |x - y_1|^3 + \frac13 |x - (y_1 + y_2)|^3 + \frac13 |x|^3, \quad (6.10) \]
and $y_1, y_2$ are constants. In order to find the optimal value of (6.9) we need to consider eight cases, which cover all possible values of the three absolute values in (6.10). The eight cases are as follows:

(1) $x \ge y_1$, $x \ge y_1 + y_2$, and $x \ge 0$
(2) $x \le y_1$, $x \ge y_1 + y_2$, and $x \ge 0$
(3) $x \ge y_1$, $x \le y_1 + y_2$, and $x \ge 0$
(4) $x \ge y_1$, $x \ge y_1 + y_2$, and $x \le 0$
(5) $x \le y_1$, $x \le y_1 + y_2$, and $x \ge 0$
(6) $x \le y_1$, $x \ge y_1 + y_2$, and $x \le 0$
(7) $x \ge y_1$, $x \le y_1 + y_2$, and $x \le 0$
(8) $x \le y_1$, $x \le y_1 + y_2$, and $x \le 0$

We now consider each case in depth, and set $y = (y_1, y_2)$.
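Before working through the cases, (6.10) can be sanity-checked numerically: for a fixed $(y_1, y_2)$, a crude grid search over $x$ should reproduce whatever the case analysis below predicts. A minimal Maple sketch in the style of Appendix A (the sample point $(y_1, y_2) = (2, 1)$ and the grid are our choices, not from the text):

> h := (x, y1, y2) -> abs(x - y1)^3/3 + abs(x - y1 - y2)^3/3 + abs(x)^3/3:
> evalf(min(seq(h(-10 + k/100., 2, 1), k = 0..2000)));
        # approximately 2.2872; the point (2, 1) satisfies the Case 5 conditions
        # derived below, whose critical value is 30 - 16*sqrt(3), also about 2.2872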
6.4.1 Case 1: $x \ge y_1$, $x \ge y_1 + y_2$, $x \ge 0$

Using the constraints of this case, we define the function to be minimized as
\[ h_{1,y}(x) = \frac13 (x - y_1)^3 + \frac13 (x - (y_1 + y_2))^3 + \frac13 x^3, \]
and we minimize $h_{1,y}$ over the region where $\max\{y_1, y_1 + y_2, 0\} \le x$. Because $h_{1,y}(x)$ is convex with respect to $x$ (see Example 3.2.31), by Fact 3.2.23 any critical point will be the minimizer. Differentiating to solve for critical points yields
\[ \frac{\partial h_{1,y}}{\partial x} = (x - y_1)^2 + (x - (y_1 + y_2))^2 + x^2. \]
Setting $\frac{\partial h_{1,y}}{\partial x} = 0$ and solving for $x$, we get the critical points
\[ x_1 = \frac23 y_1 + \frac13 y_2 + \frac13 \sqrt{-2y_2^2 - 2y_1 y_2 - 2y_1^2}, \qquad x_2 = \frac23 y_1 + \frac13 y_2 - \frac13 \sqrt{-2y_2^2 - 2y_1 y_2 - 2y_1^2}. \]
To see if the critical points are real, the value of $-2y_2^2 - 2y_1 y_2 - 2y_1^2$ must be examined, and we see that
\[ -2y_2^2 - 2y_1 y_2 - 2y_1^2 = -(y_1^2 + y_2^2) - (y_1^2 + 2y_1 y_2 + y_2^2) = -(y_1^2 + y_2^2) - (y_1 + y_2)^2 \le 0. \]
Since $-2y_2^2 - 2y_1 y_2 - 2y_1^2 \le 0$ for all values of $y_1$ and $y_2$, there are no real critical points of $h_{1,y}$, so we next check the boundary of the region $x \ge \max\{0, y_1, y_1 + y_2\}$.

If $\max\{0, y_1, y_1 + y_2\} = y_1 + y_2$, i.e. when $y_1 \ge 0$ and $y_2 \ge 0$, or when $y_2 \ge 0$ and $-y_2 \le y_1 \le 0$, then the minimum value of $h_{1,y}(x)$ is
\[ h_{1,y}(y_1 + y_2) = \frac13 y_2^3 + \frac13 (y_1 + y_2)^3. \]
If $\max\{0, y_1, y_1 + y_2\} = y_1$, or rather when $y_1 \ge 0$ and $y_2 \le 0$, then we get a minimum value at
\[ h_{1,y}(y_1) = \frac13 |y_2|^3 + \frac13 y_1^3. \]
If $\max\{0, y_1, y_1 + y_2\} = 0$, i.e. if $y_1 \le 0$ and $y_2 \le 0$, or $y_2 \ge 0$ and $y_2 \le -y_1$, then the minimum value is
\[ h_{1,y}(0) = \frac13 |y_1|^3 + \frac13 |y_1 + y_2|^3. \]
This case is summarized graphically in Figure 6.1.

Figure 6.1: Case 1 summary
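The completing-the-square identity behind the sign of the discriminant is easy to confirm symbolically; a one-line Maple check (ours, not part of Appendix A):

> simplify(-2*y[2]^2 - 2*y[1]*y[2] - 2*y[1]^2 + (y[1]^2 + y[2]^2) + (y[1] + y[2])^2);
        # returns 0, confirming -2y2^2 - 2y1y2 - 2y1^2 = -(y1^2 + y2^2) - (y1 + y2)^2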
6.4.2 Case 2: $x \le y_1$, $x \ge y_1 + y_2$, $x \ge 0$

The conditions for this case give the following function,
\[ h_{2,y}(x) = \frac13 (y_1 - x)^3 + \frac13 (x - (y_1 + y_2))^3 + \frac13 x^3, \]
for minimization on the region where $\max\{0, y_1 + y_2\} \le x \le y_1$. Again, differentiating to find the critical points gives us
\[ \frac{\partial h_{2,y}}{\partial x} = -(y_1 - x)^2 + (x - (y_1 + y_2))^2 + x^2, \]
and solving $\frac{\partial h_{2,y}}{\partial x} = 0$ yields the two critical points
\[ x_1 = y_2 + \sqrt{-2y_1 y_2}, \qquad x_2 = y_2 - \sqrt{-2y_1 y_2}. \]
Considering the constraints, we see that $y_1 \ge x \ge 0$ and $y_1 + y_2 \le x \le y_1$, so that $y_1 \ge 0$ and $y_2 \le 0$ is the region of interest. This makes $-2y_1 y_2 \ge 0$, so the critical points are real. We need $x \ge 0$, but $x_2 \le 0$ for all $y_1, y_2$ in this region, and if $x_2 = 0$ then $y_2 = y_1 = 0$ and $x_1 = x_2 = 0$. Hence, it suffices to consider only $x_1$.

Next, we check that $x_1$ satisfies the three conditions of this case: $x \ge 0$, $x \le y_1$, and $x \ge y_1 + y_2$. For $x_1 = y_2 + \sqrt{-2y_1 y_2} \ge 0$, it is required that
\[ -y_2 = |y_2| \le \sqrt{-2y_1 y_2} \Leftrightarrow y_2^2 \le -2y_1 y_2 \Leftrightarrow |y_2| \le 2y_1 \Leftrightarrow \frac12 |y_2| \le y_1. \]
For $x_1 \le y_1$, this is equivalent to
\[ y_2 + \sqrt{-2y_1 y_2} \le y_1 \Leftrightarrow -2y_1 y_2 \le (y_1 - y_2)^2 \Leftrightarrow 0 \le y_1^2 + y_2^2, \]
so this condition is always true. For $x_1 \ge y_1 + y_2$, we get
\[ y_2 + \sqrt{-2y_1 y_2} \ge y_1 + y_2 \Leftrightarrow \sqrt{-2y_1 y_2} \ge y_1 \Leftrightarrow 2|y_2| \ge y_1. \]
Since $h_{2,y}$ is convex with respect to $x$, if these three conditions hold then $x_1$ is the minimizer. So if $\frac12 |y_2| \le y_1 \le 2|y_2|$, then the critical point is the minimizer and the minimum value of $h_{2,y}(x)$ is
\[ h_{2,y}(x_1) = -\frac13 y_1 y_2 \big( 3y_1 - 3y_2 - 4\sqrt{-2y_1 y_2} \big). \]
If the conditions for the critical point are not satisfied, then we must check the boundary conditions. In this case the boundary points are $\max\{y_1 + y_2, 0\} \le x \le y_1$, and we consider two subcases:

Subcase 1: $y_1 + y_2 \ge 0$
To determine which is the minimum, we evaluate the difference between the upper boundary value and the lower boundary value. When $y_1 + y_2 \ge 0$, $\max\{y_1 + y_2, 0\} = y_1 + y_2$, so we look at the difference
\begin{align*}
h_{2,y}(y_1) - h_{2,y}(y_1 + y_2) &= \frac13 (-y_2)^3 + \frac13 y_1^3 - \frac13 (-y_2)^3 - \frac13 (y_1 + y_2)^3\\
&= \frac13 y_1^3 - \frac13 (y_1 + y_2)^3 = -\frac13 y_2^3 - y_1 y_2 (y_1 + y_2).
\end{align*}
Since $y_1 \ge 0$, $y_2 \le 0$ and $y_1 + y_2 \ge 0$, the above difference is always positive. This means that the minimum value is
\[ h_{2,y}(y_1 + y_2) = \frac13 (y_1 + y_2)^3 - \frac13 y_2^3. \]

Subcase 2: $y_1 + y_2 \le 0$
With $y_1 + y_2 \le 0$, $\max\{y_1 + y_2, 0\} = 0$, so we calculate the difference
\begin{align*}
h_{2,y}(y_1) - h_{2,y}(0) &= \frac13 y_1^3 + \frac13 (-y_2)^3 - \frac13 y_1^3 - \frac13 (-y_1 - y_2)^3\\
&= \frac13 (-y_2)^3 - \frac13 (-y_1 - y_2)^3 = \frac13 y_1^3 + y_1 y_2 (y_1 + y_2).
\end{align*}
Again, because of the signs of $y_1$, $y_2$, and $y_1 + y_2$, the difference is positive, so the minimum is
\[ h_{2,y}(0) = \frac13 y_1^3 - \frac13 (y_1 + y_2)^3. \]
Case 2 is summarized graphically in Figure 6.2.

Figure 6.2: Case 2 summary
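To spot-check Case 2, pick any point with $\frac12 |y_2| \le y_1 \le 2|y_2|$, say $(y_1, y_2) = (1, -1)$, and compare the closed form for $h_{2,y}(x_1)$ with a direct grid minimization; a hedged Maple sketch (the point, the grid, and the name hx are our choices):

> y1 := 1: y2 := -1:
> evalf(-(1/3)*y1*y2*(3*y1 - 3*y2 - 4*sqrt(-2*y1*y2)));    # closed form: about 0.11438
> hx := x -> abs(x - y1)^3/3 + abs(x - y1 - y2)^3/3 + abs(x)^3/3:
> evalf(min(seq(hx(-3 + k/1000.), k = 0..6000)));          # grid search agrees, up to the grid resolution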
6.4.3 Case 3: $x \ge y_1$, $x \le y_1 + y_2$, $x \ge 0$

For this case, the function we are looking to minimize is
\[ h_{3,y}(x) = \frac13 (x - y_1)^3 + \frac13 (y_1 + y_2 - x)^3 + \frac13 x^3, \]
over the region where $\max\{y_1, 0\} \le x \le y_1 + y_2$. Differentiating $h_{3,y}$ yields
\[ \frac{\partial h_{3,y}}{\partial x} = (x - y_1)^2 - (y_1 + y_2 - x)^2 + x^2, \]
and setting $\frac{\partial h_{3,y}}{\partial x} = 0$ and solving for $x$ gives us the critical points
\[ x_1 = -y_2 + \sqrt{2y_2^2 + 2y_1 y_2}, \qquad x_2 = -y_2 - \sqrt{2y_2^2 + 2y_1 y_2}. \]
Looking at the constraints for this case we can see that $y_1 + y_2 \ge x \ge y_1$, so we must have $y_2 \ge 0$; and $y_1 + y_2 \ge x \ge 0$ gives us that $y_1 + y_2 \ge 0$. Knowing that $y_2 \ge 0$, it is obvious that $x_2 \le 0$, and when $x_2 = 0$ then $y_2 = y_1 = 0$ and $x_1 = x_2 = 0$. So this case can be covered by considering only $x_1$.

For $x_1$ to be a real critical point, we need $2y_2^2 + 2y_1 y_2 \ge 0 \Leftrightarrow 2y_2 (y_1 + y_2) \ge 0$. Since both $y_2 \ge 0$ and $y_1 + y_2 \ge 0$ always hold for this case, this is always true. We also require that the critical point $x_1$ satisfies the conditions for this case. For $x_1 \ge 0$,
\[ -y_2 + \sqrt{2y_2^2 + 2y_1 y_2} \ge 0 \Leftrightarrow 2y_2^2 + 2y_1 y_2 \ge y_2^2 \Leftrightarrow 2y_1 y_2 \ge -y_2^2 \Leftrightarrow 2y_1 \ge -y_2. \]
For $x_1 \ge y_1$,
\[ -y_2 + \sqrt{2y_2^2 + 2y_1 y_2} \ge y_1 \Leftrightarrow \sqrt{2y_2^2 + 2y_1 y_2} \ge y_1 + y_2 \Leftrightarrow 2y_2^2 + 2y_1 y_2 \ge y_1^2 + 2y_1 y_2 + y_2^2 \Leftrightarrow y_2^2 \ge y_1^2 \Leftrightarrow y_2 \ge |y_1|. \]
And for $x_1 \le y_1 + y_2$,
\[ -y_2 + \sqrt{2y_2^2 + 2y_1 y_2} \le y_1 + y_2 \Leftrightarrow 2y_2^2 + 2y_1 y_2 \le y_1^2 + 4y_1 y_2 + 4y_2^2 \Leftrightarrow 0 \le y_1^2 + 2y_1 y_2 + 2y_2^2 \Leftrightarrow 0 \le (y_1 + y_2)^2 + y_2^2. \]
So $x_1 \le y_1 + y_2$ is always true, and $x_1 \ge 0$ and $x_1 \ge y_1$ hold when $2y_1 \ge -y_2$ and $y_2 \ge |y_1|$, respectively. This means that when both $2y_1 \ge -y_2$ and $y_2 \ge |y_1|$ are true, the critical point $x_1$ will produce the minimum,
\[ h_{3,y}(x_1) = \frac13 y_2 (y_1 + y_2) \big( 3y_1 + 6y_2 - 4\sqrt{2y_2 (y_1 + y_2)} \big). \]
Next, we will need to consider the two subcases $y_1 \ge 0$ and $y_1 \le 0$ separately in order to find the minimum value for each region where the critical point does not satisfy the conditions of this case.

Subcase 1: $y_2 \ge 0$, $y_1 \le 0$
If $y_1 \le 0$ and $2y_1 \ge -y_2$, then the critical point $x_1$ is the minimizer and the minimum value is as stated above. If $2y_1 < -y_2$, then there are no critical points and we look at the boundary points, which in this subcase are $\max\{0, y_1\} = 0 \le x \le y_1 + y_2$. Taking the difference of the two potential minimums yields
\[ h_{3,y}(y_1 + y_2) - h_{3,y}(0) = \frac13 y_2^3 + \frac13 (y_1 + y_2)^3 - \frac13 |y_1|^3 - \frac13 (y_1 + y_2)^3 = \frac13 y_2^3 - \frac13 |y_1|^3. \]
Since $y_1 + y_2 \ge 0 \Leftrightarrow y_2 \ge -y_1 \Leftrightarrow y_2 \ge |y_1|$, we have $h_{3,y}(0) \le h_{3,y}(y_1 + y_2)$, so the lower bound is the minimizer and the minimum value is
\[ h_{3,y}(0) = \frac13 (y_1 + y_2)^3 - \frac13 y_1^3. \]

Subcase 2: $y_2 \ge 0$, $y_1 \ge 0$
Since both $y_1$ and $y_2$ are always positive, $2y_1 \ge -y_2$ always holds. Therefore the critical point is good everywhere within the region where $y_2 \ge y_1$. When $y_1 \ge y_2$, the boundary points are $\max\{0, y_1\} = y_1 \le x \le y_1 + y_2$. Taking the difference produces
\[ h_{3,y}(y_1 + y_2) - h_{3,y}(y_1) = \frac13 y_2^3 + \frac13 (y_1 + y_2)^3 - \frac13 y_2^3 - \frac13 y_1^3 = \frac13 (y_1 + y_2)^3 - \frac13 y_1^3 \ge 0. \]
So the minimum value is
\[ h_{3,y}(y_1) = \frac13 y_2^3 + \frac13 y_1^3. \]
Case 3 is summarized graphically in Figure 6.3.

Figure 6.3: Case 3 summary

6.4.4 Case 4: $x \ge y_1$, $x \ge y_1 + y_2$, $x \le 0$

For this case we minimize the function
\[ h_{4,y}(x) = \frac13 (x - y_1)^3 + \frac13 (x - y_1 - y_2)^3 - \frac13 x^3 \]
over the region where $\max\{y_1, y_1 + y_2\} \le x \le 0$. Solving $\frac{\partial h_{4,y}}{\partial x} = (x - y_1)^2 + (x - y_1 - y_2)^2 - x^2 = 0$ yields the critical points
\[ x_1 = 2y_1 + y_2 + \sqrt{2y_1 y_2 + 2y_1^2}, \qquad x_2 = 2y_1 + y_2 - \sqrt{2y_1 y_2 + 2y_1^2}. \]
The constraints for this case lead us to the inequalities $y_1 \le 0$ and $y_1 + y_2 \le 0$, or $y_2 \le -y_1 = |y_1|$. In order for the critical points to be real, we require $2y_1 y_2 + 2y_1^2 \ge 0 \Leftrightarrow |y_1| \ge y_2$, which is always true in this case. Therefore the critical point will give the minimum value.

To satisfy $x \le 0$ and determine which critical point is the one we want, we need $x_1, x_2 \le 0$. Looking first at $x_1$:
\begin{align*}
y_2 + 2y_1 + \sqrt{2y_1 y_2 + 2y_1^2} \le 0 &\Leftrightarrow (-y_2 - 2y_1)^2 \ge 2y_1 y_2 + 2y_1^2 \Leftrightarrow y_2^2 + 4y_1 y_2 + 4y_1^2 \ge 2y_1 y_2 + 2y_1^2\\
&\Leftrightarrow y_2^2 + 2y_1 y_2 + 2y_1^2 \ge 0 \Leftrightarrow (y_1 + y_2)^2 + y_1^2 \ge 0.
\end{align*}
This always holds, so we check the next condition, $x_1 \ge y_1 + y_2$:
\begin{align*}
y_2 + 2y_1 + \sqrt{2y_1 y_2 + 2y_1^2} \ge y_1 + y_2 &\Leftrightarrow y_1 + \sqrt{2y_1 y_2 + 2y_1^2} \ge 0 \Leftrightarrow 2y_1 y_2 + 2y_1^2 \ge y_1^2\\
&\Leftrightarrow 2y_1 y_2 + y_1^2 \ge 0 \Leftrightarrow y_1 (2y_2 + y_1) \ge 0 \Leftrightarrow 2y_2 + y_1 \le 0 \Leftrightarrow y_2 \le \frac12 |y_1|.
\end{align*}
And the final condition, $x_1 \ge y_1$, yields
\begin{align*}
y_2 + 2y_1 + \sqrt{2y_1 y_2 + 2y_1^2} \ge y_1 &\Leftrightarrow y_1 + y_2 + \sqrt{2y_1 y_2 + 2y_1^2} \ge 0 \Leftrightarrow 2y_1 y_2 + 2y_1^2 \ge (-y_1 - y_2)^2\\
&\Leftrightarrow 2y_1 y_2 + 2y_1^2 \ge y_1^2 + 2y_1 y_2 + y_2^2 \Leftrightarrow y_1^2 - y_2^2 \ge 0 \Leftrightarrow |y_1| \ge |y_2|.
\end{align*}
So the critical point $x_1$ is in the region of interest when $y_2 \le \frac12 |y_1|$ and $|y_1| \ge |y_2|$ both hold. Next looking at $x_2$: the condition $x_2 \ge y_1$ gives
\[ 2y_1 + y_2 - \sqrt{2y_1 y_2 + 2y_1^2} \ge y_1 \Leftrightarrow (y_1 + y_2) - \sqrt{2y_1 y_2 + 2y_1^2} \ge 0. \]
This only holds if $y_1 = y_2 = 0$, in which case $x_1 = x_2$, so this point is not in the interior of the region and we do not need to consider it. So when $y_2 \le \frac12 |y_1|$ and $|y_1| \ge |y_2|$ both hold, the minimum value is
\[ h_{4,y}(x_1) = -\frac13 y_1 (y_1 + y_2) \big( 6y_1 + 3y_2 + 4\sqrt{2y_1 (y_1 + y_2)} \big). \]
Next, we examine the boundary conditions $\max\{y_1, y_1 + y_2\} \le x \le 0$ for the two subcases $y_2 \ge 0$ and $y_2 \le 0$, to determine the minimum when the constraints needed for the critical point do not hold.

Subcase 1: $y_2 \ge 0$
When $y_2 \ge 0$, $\max\{y_1, y_1 + y_2\} = y_1 + y_2$, so we look at the difference
\[ h_{4,y}(0) - h_{4,y}(y_1 + y_2) = -\frac13 y_1^3 - \frac13 (y_1 + y_2)^3 - \frac13 y_2^3 + \frac13 (y_1 + y_2)^3 = \frac13 |y_1|^3 - \frac13 y_2^3. \]
Since $|y_1| \ge y_2$, the above difference is positive, so the minimum is
\[ h_{4,y}(y_1 + y_2) = \frac13 y_2^3 - \frac13 (y_1 + y_2)^3. \]

Subcase 2: $y_2 \le 0$
When $y_2 \le 0$, $\max\{y_1, y_1 + y_2\} = y_1$, so we look at the difference
\[ h_{4,y}(0) - h_{4,y}(y_1) = \frac13 |y_1|^3 + \frac13 |y_1 + y_2|^3 - \frac13 |y_2|^3 - \frac13 |y_1|^3 = \frac13 |y_1 + y_2|^3 - \frac13 |y_2|^3. \]
Since $|y_1 + y_2| \ge |y_2|$, the difference is positive and the minimum is
\[ h_{4,y}(y_1) = \frac13 |y_1|^3 + \frac13 |y_2|^3. \]
Case 4 is summarized graphically in Figure 6.4.

Figure 6.4: Case 4 summary
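Since the condition derivations above are easy to get wrong by a sign, it is worth checking on sample points that the Case 4 critical point $x_1$ lands in $[\max\{y_1, y_1 + y_2\}, 0]$ exactly when $y_2 \le \frac12 |y_1|$ and $|y_2| \le |y_1|$. A small Maple sketch (the sample points are our choices, not from the text):

> y1 := -2: y2 := 1/2:                        # satisfies both conditions
> x1 := 2*y1 + y2 + sqrt(2*y1*y2 + 2*y1^2):
> is(max(y1, y1 + y2) <= x1 and x1 <= 0);     # true: x1 is about -1.05, inside [-3/2, 0]
> y1 := -1: y2 := 9/10:                       # violates y2 <= |y1|/2
> x1 := 2*y1 + y2 + sqrt(2*y1*y2 + 2*y1^2):
> is(max(y1, y1 + y2) <= x1);                 # false, as the analysis predicts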
6.4.5 Case 5: $x \le y_1$, $x \le y_1 + y_2$, $x \ge 0$

For this case we look at minimizing
\[ h_{5,y}(x) = \frac13 (y_1 - x)^3 + \frac13 (y_1 + y_2 - x)^3 + \frac13 x^3 \]
over the region where $0 \le x \le \min\{y_1, y_1 + y_2\}$. Differentiating $h_{5,y}$ with respect to $x$ yields
\[ \frac{\partial h_{5,y}}{\partial x} = -(y_1 - x)^2 - (y_1 + y_2 - x)^2 + x^2. \]
Then solving $\frac{\partial h_{5,y}}{\partial x} = 0$ gives us the critical points
\[ x_1 = 2y_1 + y_2 + \sqrt{2y_1 y_2 + 2y_1^2}, \qquad x_2 = 2y_1 + y_2 - \sqrt{2y_1 y_2 + 2y_1^2}. \]
These critical points are the same as in the previous case, but must be reexamined using the new constraints of this case. We see that $y_1 \ge x \ge 0$, so $y_1 \ge 0$, and similarly $y_1 + y_2 \ge 0$. In order for the critical points to be real, we need $2y_1 y_2 + 2y_1^2 \ge 0 \Leftrightarrow y_1 + y_2 \ge 0$, which always holds for this case. Next, we check where the critical points are valid. When we check $x_1 \le y_1$, we see that
\[ 2y_1 + y_2 + \sqrt{2y_1 y_2 + 2y_1^2} \le y_1 \Leftrightarrow y_1 + y_2 + \sqrt{2y_1 y_2 + 2y_1^2} \le 0, \]
but since $y_1 + y_2 \ge 0$ and $\sqrt{2y_1 y_2 + 2y_1^2} \ge 0$, this does not hold except when $y_1 = y_2 = 0$, and in that situation $x_1 = x_2$. Therefore, for this case we need only consider $x_2$.

Looking at $x_2$, we see for $x_2 \ge 0$,
\begin{align*}
2y_1 + y_2 - \sqrt{2y_1 y_2 + 2y_1^2} \ge 0 &\Leftrightarrow (2y_1 + y_2)^2 \ge 2y_1 y_2 + 2y_1^2 \Leftrightarrow 4y_1^2 + 4y_1 y_2 + y_2^2 \ge 2y_1 y_2 + 2y_1^2\\
&\Leftrightarrow 2y_1^2 + 2y_1 y_2 + y_2^2 \ge 0 \Leftrightarrow y_1^2 + (y_1 + y_2)^2 \ge 0.
\end{align*}
This always holds, so next we check $x_2 \le y_1$:
\[ 2y_1 + y_2 - \sqrt{2y_1 y_2 + 2y_1^2} \le y_1 \Leftrightarrow (y_1 + y_2)^2 \le 2y_1 y_2 + 2y_1^2 \Leftrightarrow y_2^2 - y_1^2 \le 0 \Leftrightarrow |y_1| \ge |y_2|. \]
And checking the last condition, we see that for $x_2 \le y_1 + y_2$,
\[ 2y_1 + y_2 - \sqrt{2y_1 y_2 + 2y_1^2} \le y_1 + y_2 \Leftrightarrow y_1 - \sqrt{2y_1 y_2 + 2y_1^2} \le 0 \Leftrightarrow 2y_1 y_2 + 2y_1^2 \ge y_1^2 \Leftrightarrow y_1 (2y_2 + y_1) \ge 0 \Leftrightarrow 2y_2 + y_1 \ge 0. \]
So the critical point $x_2$ is in the region of interest when both $|y_1| \ge |y_2|$ and $2y_2 + y_1 \ge 0$ hold, and the minimum value is
\[ h_{5,y}(x_2) = -\frac13 y_1 (y_1 + y_2) \big( -6y_1 - 3y_2 + 4\sqrt{2y_1 (y_1 + y_2)} \big). \]
Now we look at the boundary conditions $0 \le x \le \min\{y_1, y_1 + y_2\}$ in the two subcases $y_2 \ge 0$ and $y_2 \le 0$ to determine the minimum when the critical point is not in the region of interest, that is, when $|y_1| < |y_2|$ or $2y_2 + y_1 < 0$.

Subcase 1: $y_2 \ge 0$
With $y_2 \ge 0$, $\min\{y_1, y_1 + y_2\} = y_1$, and the difference that we need to consider is
\[ h_{5,y}(y_1) - h_{5,y}(0) = \frac13 y_2^3 + \frac13 y_1^3 - \frac13 y_1^3 - \frac13 (y_1 + y_2)^3 = \frac13 y_2^3 - \frac13 (y_1 + y_2)^3 = -\frac13 y_1^3 - y_1 y_2 (y_1 + y_2) \le 0. \]
So the upper bound is the minimizer and the minimum value is
\[ h_{5,y}(y_1) = \frac13 y_1^3 + \frac13 y_2^3. \]

Subcase 2: $y_2 \le 0$
When $y_2 \le 0$, $\min\{y_1, y_1 + y_2\} = y_1 + y_2$, and we look at the difference
\[ h_{5,y}(y_1 + y_2) - h_{5,y}(0) = \frac13 (-y_2)^3 + \frac13 (y_1 + y_2)^3 - \frac13 y_1^3 - \frac13 (y_1 + y_2)^3 = \frac13 (-y_2)^3 - \frac13 y_1^3. \]
Since $y_2 \le 0$, $y_1 \ge 0$, and $y_1 + y_2 \ge 0$, we have $y_1 \ge |y_2|$ and the above difference is always less than or equal to zero, which makes the upper bound $y_1 + y_2$ the minimizer, with a minimum value of
\[ h_{5,y}(y_1 + y_2) = \frac13 (y_1 + y_2)^3 - \frac13 y_2^3. \]
Case 5 is summarized graphically in Figure 6.5.

Figure 6.5: Case 5 summary

6.4.6 Case 6: $x \le y_1$, $x \ge y_1 + y_2$, $x \le 0$

For this case we have the function
\[ h_{6,y}(x) = \frac13 (y_1 - x)^3 + \frac13 (x - y_1 - y_2)^3 + \frac13 (-x)^3 \]
to minimize. Differentiating with respect to $x$ to get our critical points yields
\[ \frac{\partial h_{6,y}}{\partial x} = -(y_1 - x)^2 + (x - y_1 - y_2)^2 - x^2. \]
Setting this equal to zero and solving for $x$ then gives us the critical points
\[ x_1 = -y_2 + \sqrt{2y_2^2 + 2y_1 y_2}, \qquad x_2 = -y_2 - \sqrt{2y_2^2 + 2y_1 y_2}. \]
Looking at the constraints of the case, notice that $y_1 + y_2 \le x \le y_1$ and $y_1 + y_2 \le x \le 0$ imply that both $y_1 + y_2 \le 0$ and $y_2 \le 0$. For the critical points to be real we need $2y_2^2 + 2y_1 y_2 \ge 0 \Leftrightarrow 2y_2 (y_1 + y_2) \ge 0$, which is always true since $y_2$ and $y_1 + y_2$ are both always negative. It is easy to see that the critical point $x_1$ is always nonnegative, so it does not satisfy the conditions of this case, except when $y_1 = y_2 = 0$, which makes $x_1 = x_2$. So it suffices to check only $x_2$ for this case.

Checking the condition $x_2 \ge y_1 + y_2$:
\begin{align*}
y_1 + y_2 \le -y_2 - \sqrt{2y_2^2 + 2y_1 y_2} &\Leftrightarrow \sqrt{2y_2^2 + 2y_1 y_2} \le -y_1 - 2y_2 \Leftrightarrow 2y_2^2 + 2y_1 y_2 \le y_1^2 + 4y_1 y_2 + 4y_2^2\\
&\Leftrightarrow 0 \le y_1^2 + 2y_1 y_2 + 2y_2^2 \Leftrightarrow 0 \le (y_1 + y_2)^2 + y_2^2.
\end{align*}
This condition always holds. Next, we check $x_2 \le y_1$:
\[ -y_2 - \sqrt{2y_2^2 + 2y_1 y_2} \le y_1 \Leftrightarrow -y_1 - y_2 \le \sqrt{2y_2^2 + 2y_1 y_2} \Leftrightarrow y_1^2 + 2y_1 y_2 + y_2^2 \le 2y_2^2 + 2y_1 y_2 \Leftrightarrow y_1^2 \le y_2^2 \Leftrightarrow |y_1| \le |y_2|. \]
And the last condition is $x_2 \le 0$:
\[ -y_2 - \sqrt{2y_2^2 + 2y_1 y_2} \le 0 \Leftrightarrow -y_2 \le \sqrt{2y_2^2 + 2y_1 y_2} \Leftrightarrow y_2^2 \le 2y_2^2 + 2y_1 y_2 \Leftrightarrow 0 \le y_2 (y_2 + 2y_1) \Leftrightarrow y_2 + 2y_1 \le 0 \Leftrightarrow 2y_1 \le |y_2|. \]
So $x_2$ is the minimizer when both $|y_1| \le |y_2|$ and $2y_1 \le |y_2|$ hold, and the minimum value is
\[ h_{6,y}(x_2) = -\frac13 y_2 (y_1 + y_2) \big( 3y_1 + 6y_2 + 4\sqrt{2y_2 (y_1 + y_2)} \big). \]
Outside this region we check the boundary conditions $y_1 + y_2 \le x \le \min\{0, y_1\}$ to determine the minimizer.

Subcase 1: $y_1 \ge 0$
With $y_1 \ge 0$, $\min\{y_1, 0\} = 0$, so we look at the difference
\[ h_{6,y}(0) - h_{6,y}(y_1 + y_2) = \frac13 y_1^3 + \frac13 (-y_1 - y_2)^3 - \frac13 (-y_2)^3 - \frac13 (-y_1 - y_2)^3 = \frac13 y_1^3 - \frac13 |y_2|^3. \]
Since $y_1 + y_2 \le 0$, we know that $|y_2| \ge y_1$ and the above difference is less than or equal to zero. This makes the upper boundary the minimizer and the minimum value is
\[ h_{6,y}(0) = \frac13 y_1^3 - \frac13 (y_1 + y_2)^3. \]

Subcase 2: $y_1 \le 0$
When $y_1 \le 0$, $\min\{y_1, 0\} = y_1$, and we consider the difference
\[ h_{6,y}(y_1) - h_{6,y}(y_1 + y_2) = \frac13 (-y_2)^3 + \frac13 (-y_1)^3 - \frac13 (-y_2)^3 - \frac13 (-y_1 - y_2)^3 = \frac13 |y_1|^3 - \frac13 |y_1 + y_2|^3. \]
Because $y_1$ and $y_2$ are both negative, $|y_1 + y_2| \ge |y_1|$ and the difference above is less than or equal to zero, giving us a minimum value of
\[ h_{6,y}(y_1) = -\frac13 y_1^3 - \frac13 y_2^3. \]
Case 6 is summarized graphically in Figure 6.6.

Figure 6.6: Case 6 summary
6.4.7 Case 7: $x \ge y_1$, $x \le y_1 + y_2$, $x \le 0$

In this case the equation to be minimized is
\[ h_{7,y}(x) = \frac13 (x - y_1)^3 + \frac13 (y_1 + y_2 - x)^3 + \frac13 (-x)^3, \]
over the region where $y_1 \le x \le \min\{0, y_1 + y_2\}$. Differentiating $h_{7,y}$ with respect to $x$ yields
\[ \frac{\partial h_{7,y}}{\partial x} = (x - y_1)^2 - (y_1 + y_2 - x)^2 - x^2. \]
Setting $\frac{\partial h_{7,y}}{\partial x} = 0$ and solving for $x$ gives us the critical points
\[ x_1 = y_2 + \sqrt{-2y_1 y_2}, \qquad x_2 = y_2 - \sqrt{-2y_1 y_2}. \]
Looking at the constraints we see that $y_1 \le x \le 0$ and $y_1 \le x \le y_1 + y_2$, so we have $y_1 \le 0$ and $y_2 \ge 0$ for this case. For the critical points to be real, $-2y_1 y_2 \ge 0$ must hold, and because of the signs of $y_1$ and $y_2$ this is always true. The critical point $x_1 \ge 0$, so it does not lie in the interior of the region of interest; and when $x_1 = 0$ then $y_2 = 0$, so that $x_1 = x_2$. Therefore we need only consider $x_2$, and we check $x_2$ against the constraints. First, $x_2 \ge y_1$:
\[ y_2 - \sqrt{-2y_1 y_2} \ge y_1 \Leftrightarrow (y_2 - y_1)^2 \ge -2y_1 y_2 \Leftrightarrow y_2^2 + y_1^2 \ge 0. \]
This will always hold, so next we look at $x_2 \le 0$:
\[ y_2 - \sqrt{-2y_1 y_2} \le 0 \Leftrightarrow y_2^2 \le -2y_1 y_2 \Leftrightarrow y_2^2 + 2y_1 y_2 \le 0 \Leftrightarrow y_2 (y_2 + 2y_1) \le 0 \Leftrightarrow y_2 + 2y_1 \le 0 \Leftrightarrow 2|y_1| \ge y_2. \]
And finally at $x_2 \le y_1 + y_2$:
\[ y_2 - \sqrt{-2y_1 y_2} \le y_1 + y_2 \Leftrightarrow y_1^2 \le -2y_1 y_2 \Leftrightarrow |y_1| \le 2y_2 \Leftrightarrow \frac12 |y_1| \le y_2. \]
Then the critical point $x_2$ is the minimizer if $\frac12 |y_1| \le y_2 \le 2|y_1|$, and the minimum value is
\[ h_{7,y}(x_2) = \frac13 y_1 y_2 \big( 3y_1 - 3y_2 + 4\sqrt{-2y_1 y_2} \big). \]
Outside this region, we look at the boundary conditions $y_1 \le x \le \min\{0, y_1 + y_2\}$ to determine the minimizer.

Subcase 1: $y_1 + y_2 \ge 0$
When $y_1 + y_2 \ge 0$, $\min\{0, y_1 + y_2\} = 0$, so we look at the difference
\[ h_{7,y}(0) - h_{7,y}(y_1) = \frac13 (-y_1)^3 + \frac13 (y_1 + y_2)^3 - \frac13 y_2^3 - \frac13 (-y_1)^3 = \frac13 (y_1 + y_2)^3 - \frac13 y_2^3 = \frac13 y_1^3 + y_1 y_2 (y_1 + y_2). \]
Because $y_1 \le 0$ and $y_2 \ge 0$, we have $y_1 + y_2 \le y_2$ and $(y_1 + y_2)^3 \le y_2^3$, so the above difference is always negative and therefore the minimum value is
\[ h_{7,y}(0) = \frac13 (y_1 + y_2)^3 - \frac13 y_1^3. \]

Subcase 2: $y_1 + y_2 \le 0$
Here $\min\{0, y_1 + y_2\} = y_1 + y_2$, so the difference to consider is
\[ h_{7,y}(y_1 + y_2) - h_{7,y}(y_1) = \frac13 y_2^3 + \frac13 (-y_1 - y_2)^3 - \frac13 y_2^3 - \frac13 (-y_1)^3 = \frac13 (-y_1 - y_2)^3 - \frac13 (-y_1)^3 = -\frac13 y_2^3 - y_1 y_2 (y_1 + y_2). \]
Again, because of the signs of $y_1$ and $y_2$, we get that $-y_1 - y_2 \le -y_1$ and $(-y_1 - y_2)^3 \le (-y_1)^3$, so this difference is always negative and the minimum value is
\[ h_{7,y}(y_1 + y_2) = \frac13 y_2^3 - \frac13 (y_1 + y_2)^3. \]
Case 7 is summarized graphically in Figure 6.7.

Figure 6.7: Case 7 summary
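Cases 2 and 7 mirror each other: replacing $(x, y_1, y_2)$ by $(-x, -y_1, -y_2)$ leaves $h(x)$ unchanged, so the Case 7 critical value should be the Case 2 critical value with $(y_1, y_2)$ negated. A one-line symbolic check in Maple (the names c2 and c7 are ours):

> c2 := -(1/3)*y1*y2*(3*y1 - 3*y2 - 4*sqrt(-2*y1*y2)):     # Case 2 critical value
> c7 := (1/3)*y1*y2*(3*y1 - 3*y2 + 4*sqrt(-2*y1*y2)):      # Case 7 critical value
> simplify(c7 - subs({y1 = -y1, y2 = -y2}, c2));           # returns 0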
6.4.8 Case 8: $x \le y_1$, $x \le y_1 + y_2$, $x \le 0$

For the final case the function we minimize is
\[ h_{8,y}(x) = \frac13 (y_1 - x)^3 + \frac13 (y_1 + y_2 - x)^3 + \frac13 (-x)^3, \]
over the region where $x \le \min\{0, y_1, y_1 + y_2\}$. Differentiating $h_{8,y}$ with respect to $x$ yields
\[ \frac{\partial h_{8,y}}{\partial x} = -(y_1 - x)^2 - (y_1 + y_2 - x)^2 - x^2. \]
Setting this equal to zero and solving for $x$ gives the critical points
\[ x_1 = \frac23 y_1 + \frac13 y_2 + \frac13 \sqrt{-2y_1^2 - 2y_1 y_2 - 2y_2^2}, \qquad x_2 = \frac23 y_1 + \frac13 y_2 - \frac13 \sqrt{-2y_1^2 - 2y_1 y_2 - 2y_2^2}. \]
As we saw in Case 1, $-2y_1^2 - 2y_1 y_2 - 2y_2^2 = -(y_1^2 + y_2^2) - (y_1 + y_2)^2 \le 0$, so the critical points are only real when $y_1 = y_2 = 0$. Thus we are only left with the boundary conditions $x \le \min\{0, y_1, y_1 + y_2\}$. If $\min\{0, y_1, y_1 + y_2\} = 0$, then the minimum value is
\[ h_{8,y}(0) = \frac13 y_1^3 + \frac13 (y_1 + y_2)^3. \]
When $\min\{0, y_1, y_1 + y_2\} = y_1$, then the minimum value is
\[ h_{8,y}(y_1) = \frac13 y_2^3 + \frac13 (-y_1)^3. \]
And finally, if $\min\{0, y_1, y_1 + y_2\} = y_1 + y_2$, the minimum value is
\[ h_{8,y}(y_1 + y_2) = \frac13 (-y_2)^3 + \frac13 (-y_1 - y_2)^3. \]
Case 8 is summarized graphically in Figure 6.8.

Figure 6.8: Case 8 summary

6.4.9 Combining the Eight Cases

Now that each possibility has been examined, each case must be compared against the others to determine the global minimum in each region. Cases 2, 3, and 4 and Cases 5, 6, and 7 can be plotted without any overlap, as seen in Figure 6.9. This leaves 12 regions, each with four possible minimizers. Since we can see from equation (6.10) that $h(x)$ is convex with respect to $x$, the critical point will be the minimum in regions where there is a valid critical point. Combining all of the eight cases, we can see that there is a valid critical point for every possible region. We therefore have only six regions, as seen in Figure 6.10. The regions are divided by the lines $y_1 - y_2 = 0$, $2y_1 + y_2 = 0$, and $y_1 + 2y_2 = 0$, and each region has a critical point minimizer. The six regions are defined as follows. Let $y = (y_1, y_2)$; then
\begin{align*}
y &\in A \ \text{if} \ -\tfrac12 y_1 \le y_2 \le y_1,\\
y &\in B \ \text{if} \ -\tfrac12 y_2 \le y_1 \le y_2,\\
y &\in C \ \text{if} \ -2y_2 \le y_1 \le -\tfrac12 y_2,\\
y &\in D \ \text{if} \ y_1 \le y_2 \le -\tfrac12 y_1,\\
y &\in E \ \text{if} \ y_2 \le y_1 \le -\tfrac12 y_2, \ \text{and}\\
y &\in F \ \text{if} \ -2y_1 \le y_2 \le -\tfrac12 y_1. \quad (6.11)
\end{align*}
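As a check that $A$ through $F$ really partition the plane, one can count how many of the six membership tests hold at sample points off the dividing lines; the count should always be exactly 1. A Maple sketch (the predicates and the test points, one per region, are our choices):

> inA := (y1, y2) -> evalb(-(1/2)*y1 <= y2 and y2 <= y1):
> inB := (y1, y2) -> evalb(-(1/2)*y2 <= y1 and y1 <= y2):
> inC := (y1, y2) -> evalb(-2*y2 <= y1 and y1 <= -(1/2)*y2):
> inD := (y1, y2) -> evalb(y1 <= y2 and y2 <= -(1/2)*y1):
> inE := (y1, y2) -> evalb(y2 <= y1 and y1 <= -(1/2)*y2):
> inF := (y1, y2) -> evalb(-2*y1 <= y2 and y2 <= -(1/2)*y1):
> count := (y1, y2) -> add(`if`(t(y1, y2), 1, 0), t in [inA, inB, inC, inD, inE, inF]):
> count(2, 1), count(1, 2), count(-2, 2), count(-3, 1), count(-1, -2), count(3, -2);
        # 1, 1, 1, 1, 1, 1   (the points lie in A, B, C, D, E, F respectively)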
The minimum for each region is outlined below.

(A) When $y \in A$, the minimizer is the critical point $x_2$ from Case 5, and the minimum value is
\[ h_{5,y}(x_2) = h_{5,y}\big( 2y_1 + y_2 - \sqrt{2y_1 y_2 + 2y_1^2} \big) = -\frac13 y_1 (y_1 + y_2) \big( -6y_1 - 3y_2 + 4\sqrt{2y_1 (y_1 + y_2)} \big). \]
(B) When $y \in B$, the minimizer is the critical point $x_1$ from Case 3, and the minimum value is
\[ h_{3,y}(x_1) = h_{3,y}\big( -y_2 + \sqrt{2y_2^2 + 2y_1 y_2} \big) = -\frac13 y_2 (y_1 + y_2) \big( -3y_1 - 6y_2 + 4\sqrt{2y_2 (y_1 + y_2)} \big). \]
(C) When $y \in C$, the minimizer is the critical point $x_2$ from Case 7, and the minimum value is
\[ h_{7,y}(x_2) = h_{7,y}\big( y_2 - \sqrt{-2y_1 y_2} \big) = \frac13 y_1 y_2 \big( 3y_1 - 3y_2 + 4\sqrt{-2y_1 y_2} \big). \]
(D) When $y \in D$, the minimizer is the critical point $x_1$ from Case 4, and the minimum value is
\[ h_{4,y}(x_1) = h_{4,y}\big( 2y_1 + y_2 + \sqrt{2y_1 y_2 + 2y_1^2} \big) = -\frac13 y_1 (y_1 + y_2) \big( 6y_1 + 3y_2 + 4\sqrt{2y_1 (y_1 + y_2)} \big). \]
(E) When $y \in E$, the minimizer is the critical point $x_2$ from Case 6, and the minimum value is
\[ h_{6,y}(x_2) = h_{6,y}\big( -y_2 - \sqrt{2y_2^2 + 2y_1 y_2} \big) = -\frac13 y_2 (y_1 + y_2) \big( 3y_1 + 6y_2 + 4\sqrt{2y_2 (y_1 + y_2)} \big). \]
(F) And when $y \in F$, the minimizer is the critical point $x_1$ from Case 2, and the minimum value is
\[ h_{2,y}(x_1) = h_{2,y}\big( y_2 + \sqrt{-2y_1 y_2} \big) = \frac13 y_1 y_2 \big( -3y_1 + 3y_2 + 4\sqrt{-2y_1 y_2} \big). \]

6.4.10 Bringing It All Together

If you recall, the goal was to solve
\[ \min_{x \in \mathbb R} h(x) = \min_{x \in \mathbb R} \ \frac13 |x - y_1|^3 + \frac13 |x - (y_1 + y_2)|^3 + \frac13 |x|^3 \]
in order to get $g_2^*(y_1, y_2, y_3)$, where $y_1 + y_2 + y_3 = 0$ and
\[ g_2(x_1, x_2, x_3) = \frac23 \|x_1 - x_2\|^{\frac32} + \frac23 \|x_2 - x_3\|^{\frac32} + \frac23 \|x_3 - x_1\|^{\frac32}. \]
The minimizer of $h(x)$ has been found for each of the six regions, and so we have found that
\[ g_2^*(y_1, y_2, -y_1 - y_2) = \begin{cases}
-\frac13 y_1 (y_1 + y_2) \big( -6y_1 - 3y_2 + 4\sqrt{2y_1 (y_1 + y_2)} \big), & \text{if } y \in A;\\
-\frac13 y_2 (y_1 + y_2) \big( -3y_1 - 6y_2 + 4\sqrt{2y_2 (y_1 + y_2)} \big), & \text{if } y \in B;\\
\frac13 y_1 y_2 \big( 3y_1 - 3y_2 + 4\sqrt{-2y_1 y_2} \big), & \text{if } y \in C;\\
-\frac13 y_1 (y_1 + y_2) \big( 6y_1 + 3y_2 + 4\sqrt{2y_1 (y_1 + y_2)} \big), & \text{if } y \in D;\\
-\frac13 y_2 (y_1 + y_2) \big( 3y_1 + 6y_2 + 4\sqrt{2y_2 (y_1 + y_2)} \big), & \text{if } y \in E;\\
\frac13 y_1 y_2 \big( -3y_1 + 3y_2 + 4\sqrt{-2y_1 y_2} \big), & \text{if } y \in F,
\end{cases} \quad (6.12) \]
where the regions $A, \dots, F$ are as defined in (6.11). Examining a plot of the above function and its contour plot, in Figure 6.11, we see that $g_2^*$ is convex, which is what we would expect from a conjugate function, even though $g_2^*$ is not obviously convex upon inspection.

Recall from (6.7) that we had three conjugate variables, $y_1$, $y_2$, and $y_3$, such that $y_1 + y_2 + y_3 = 0$. Making the substitution $y_3 = -(y_1 + y_2)$ allows us to write (6.12) in the more symmetric form
\[ g_2^*(y_1, y_2, y_3) = \begin{cases}
\frac13 y_1 y_3 \big( 3y_3 - 3y_1 + 4\sqrt{-2y_1 y_3} \big), & \text{if } (y_1, y_2) \in A;\\
\frac13 y_2 y_3 \big( 3y_3 - 3y_2 + 4\sqrt{-2y_2 y_3} \big), & \text{if } (y_1, y_2) \in B;\\
\frac13 y_1 y_2 \big( 3y_1 - 3y_2 + 4\sqrt{-2y_1 y_2} \big), & \text{if } (y_1, y_2) \in C;\\
\frac13 y_1 y_3 \big( 3y_1 - 3y_3 + 4\sqrt{-2y_1 y_3} \big), & \text{if } (y_1, y_2) \in D;\\
\frac13 y_2 y_3 \big( 3y_2 - 3y_3 + 4\sqrt{-2y_2 y_3} \big), & \text{if } (y_1, y_2) \in E;\\
\frac13 y_1 y_2 \big( 3y_2 - 3y_1 + 4\sqrt{-2y_1 y_2} \big), & \text{if } (y_1, y_2) \in F,
\end{cases} \quad (6.13) \]
where the regions $A, \dots, F$ are as defined in (6.11). With the $y_3$ variable added back into the equation, we can then recognize that the three boundaries $y_1 - y_2 = 0$, $2y_1 + y_2 = 0$, and $y_1 + 2y_2 = 0$ are equivalent to $y_1 = y_2$, $y_1 = y_3$, and $y_2 = y_3$.
If we consider region $A$, defined by $y_1 \ge y_2 \ge -\frac12 y_1$, and look at the difference
\[ y_2 - y_3 = y_2 - (-y_1 - y_2) = y_1 + 2y_2 \ge y_1 + 2\big( -\tfrac12 y_1 \big) = 0, \]
we see that $y_2 \ge y_3$; combined with $y_1 \ge y_2$, this gives $\min\{y_1, y_2, y_3\} = y_3$ and $\max\{y_1, y_2, y_3\} = y_1$. Thus, we can rewrite using $y_{\max} = \max\{y_1, y_2, y_3\}$ and $y_{\min} = \min\{y_1, y_2, y_3\}$:
\[ g_2^*(y_1, y_2, y_3) = \frac13 y_1 y_3 \big( 3y_3 - 3y_1 + 4\sqrt{-2y_1 y_3} \big) = y_{\max} y_{\min}^2 - y_{\max}^2 y_{\min} + \frac43 y_{\max} y_{\min} \sqrt{-2 y_{\max} y_{\min}}, \]
if $(y_1, y_2) \in A$. Similarly, all of the other regions can be rewritten in the same manner, so that (6.13) can be rewritten using $y_{\max} = \max\{y_1, y_2, y_3\}$ and $y_{\min} = \min\{y_1, y_2, y_3\}$ as
\[ g_2^*(y_1, y_2, y_3) = y_{\max} y_{\min}^2 - y_{\max}^2 y_{\min} + \frac43 y_{\max} y_{\min} \sqrt{-2 y_{\max} y_{\min}}, \quad (6.14) \]
without the need to specify the region.
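The claim that (6.14) needs no region bookkeeping can be spot-checked symbolically: at a region-A point such as $(y_1, y_2) = (2, 1)$ (so $y_3 = -3$), the max–min expression and the region-A entry of (6.12) should coincide. A short Maple sketch (the names ymax, ymin, G, fA are ours):

> ymax := (y1, y2) -> max(y1, y2, -y1 - y2):  ymin := (y1, y2) -> min(y1, y2, -y1 - y2):
> G := (y1, y2) -> ymax(y1, y2)*ymin(y1, y2)^2 - ymax(y1, y2)^2*ymin(y1, y2)
>        + (4/3)*ymax(y1, y2)*ymin(y1, y2)*sqrt(-2*ymax(y1, y2)*ymin(y1, y2)):
> fA := (y1, y2) -> -(1/3)*y1*(y1 + y2)*(-6*y1 - 3*y2 + 4*sqrt(2*y1*(y1 + y2))):
> simplify(G(2, 1) - fA(2, 1));     # returns 0: both equal 30 - 16*sqrt(3)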
Remark 6.4.1 Although $g_2^*$ does not look convex, it is, because it arose as a conjugate function. Convexity can also be shown with calculus if you proceed as follows. We have three variables such that $y_{\max} \ge y_0 \ge y_{\min}$ and $y_{\max} + y_0 + y_{\min} = 0$, so $y_0 = -y_{\max} - y_{\min}$ and hence $y_{\max} \ge -y_{\max} - y_{\min} \ge y_{\min}$. Equivalently, $2y_{\max} + y_{\min} \ge 0$ and $-2y_{\min} - y_{\max} \ge 0$. Now define $x := y_{\max}$ and $y := -y_{\min}$; then both $x \ge 0$ and $y \ge 0$, and we care about the region where
\[ 2x - y \ge 0 \quad \text{and} \quad 2y - x \ge 0. \quad (6.15) \]
Then we get the function
\[ f(x, y) = xy^2 + yx^2 - \frac43 xy\sqrt{2xy}, \]
with
\[ \nabla^2 f(x, y) = \begin{pmatrix} \dfrac{\big( 2\sqrt{xy} - \sqrt2\, y \big) y}{\sqrt{xy}} & \dfrac{2y\sqrt{xy} + 2x\sqrt{xy} - 3\sqrt2\, xy}{\sqrt{xy}}\\[2ex] \dfrac{2y\sqrt{xy} + 2x\sqrt{xy} - 3\sqrt2\, xy}{\sqrt{xy}} & \dfrac{\big( 2\sqrt{xy} - \sqrt2\, x \big) x}{\sqrt{xy}} \end{pmatrix}. \]
It can be shown that the $(1,1)$ and $(2,2)$ elements of $\nabla^2 f(x, y)$ are positive using the constraints in (6.15). The determinant of $\nabla^2 f(x, y)$ can be shown to be positive by assuming $x = a^2$ and $y = b^2$, factoring the resulting equation, and considering the signs of each of the factors. Since the Hessian is positive semidefinite, $f$ is convex.

Figure 6.9: Overview of the eight cases. (a) Case 1; (b) Cases 2, 3, and 4; (c) Cases 5, 6, and 7; (d) Case 8

Figure 6.10: The six regions

Figure 6.11: Plots of $g_2^*(y_1, y_2, -y_1 - y_2)$. (a) The function $g_2^*(y_1, y_2, -y_1 - y_2)$; (b) the contour plot of $g_2^*(y_1, y_2, -y_1 - y_2)$

Chapter 7

Conclusion

The kernel average was previously only defined for two functions. We have used the identity
\[ \sum_{i=1}^n \lambda_i q(y_i) - q\Big( \sum_{i=1}^n \lambda_i y_i \Big) = \frac14 \sum_{i=1}^n \sum_{j=1}^n \lambda_i \lambda_j \|y_i - y_j\|^2 \]
and the definition of the proximal average to define the kernel average for $n$ functions,
\[ Q_g(\mathbf f, \boldsymbol\lambda)(x) := \inf_{\sum \lambda_i y_i = x} \Big( \sum \lambda_i f_i(y_i) + \sum_{i<j} \lambda_i \lambda_j g(y_i - y_j) \Big). \]
We then examined a specific case of the kernel average, namely the proximal average, and its conjugate, and calculated that for
\[ g_1(x) = \sum_{i=1}^n \sum_{j=1}^n \frac12 \|x_i - x_j\|^2 = 2 \sum_{i<j} \frac12 \|x_i - x_j\|^2, \]
the conjugate function is
\[ g_1^*(x^*) = \begin{cases} \dfrac{1}{8n^2} \displaystyle\sum_{i=1}^n \sum_{j=1}^n \|x_i^* - x_j^*\|^2, & \text{if } x_1^* + \cdots + x_n^* = 0;\\[1ex] +\infty, & \text{otherwise.} \end{cases} \]
This can also be written using any of the following equivalent formulations when $x_1^* + \cdots + x_n^* = 0$:

(i) $g_1^*(x^*) = \dfrac{1}{8n^2} \displaystyle\sum_{i=1}^n \sum_{j=1}^n \|x_i^* - x_j^*\|^2 = \displaystyle\sum_{i=1}^n \sum_{j=1}^n \tfrac12 \Big\| \dfrac{x_i^* - x_j^*}{2n} \Big\|^2$

(ii) $g_1^*(x^*) = \dfrac{1}{8n^2} \displaystyle\sum_{i=1}^n \sum_{j=1}^n \|x_i^* + x_j^*\|^2 = \displaystyle\sum_{i=1}^n \sum_{j=1}^n \tfrac12 \Big\| \dfrac{x_i^* + x_j^*}{2n} \Big\|^2$

(iii) $g_1^*(x^*) = \dfrac{1}{2n} \displaystyle\sum_{i=1}^n \tfrac12 \|x_i^*\|^2$.

Next, we computed the conjugate when
\[ g_2(x_1, x_2, x_3) = \frac23 \|x_1 - x_2\|^{\frac32} + \frac23 \|x_2 - x_3\|^{\frac32} + \frac23 \|x_3 - x_1\|^{\frac32}, \]
and found that
\[ g_2^*(y_1, y_2, y_3) = \begin{cases}
\frac13 y_1 y_3 \big( 3y_3 - 3y_1 + 4\sqrt{-2y_1 y_3} \big), & \text{if } (y_1, y_2) \in A;\\
\frac13 y_2 y_3 \big( 3y_3 - 3y_2 + 4\sqrt{-2y_2 y_3} \big), & \text{if } (y_1, y_2) \in B;\\
\frac13 y_1 y_2 \big( 3y_1 - 3y_2 + 4\sqrt{-2y_1 y_2} \big), & \text{if } (y_1, y_2) \in C;\\
\frac13 y_1 y_3 \big( 3y_1 - 3y_3 + 4\sqrt{-2y_1 y_3} \big), & \text{if } (y_1, y_2) \in D;\\
\frac13 y_2 y_3 \big( 3y_2 - 3y_3 + 4\sqrt{-2y_2 y_3} \big), & \text{if } (y_1, y_2) \in E;\\
\frac13 y_1 y_2 \big( 3y_2 - 3y_1 + 4\sqrt{-2y_1 y_2} \big), & \text{if } (y_1, y_2) \in F,
\end{cases} \]
where the regions $A, B, C, D, E$, and $F$ are as defined in (6.11); or equivalently, using $y_{\max} = \max\{y_1, y_2, y_3\}$ and $y_{\min} = \min\{y_1, y_2, y_3\}$,
\[ g_2^*(y_1, y_2, y_3) = y_{\max} y_{\min}^2 - y_{\max}^2 y_{\min} + \frac43 y_{\max} y_{\min} \sqrt{-2 y_{\max} y_{\min}}. \quad (7.1) \]
It was expected that we might find a similar form for $g_2^*$ as was found for $g_1^*$, which would help formulate a closed form solution for $\tilde g^*$ in the general case where
\[ \tilde g(x) = \sum_{i=1}^n \sum_{j=1}^n \lambda_i \lambda_j g(x_i - x_j), \]
for any function $g$. However, due to the fact that there does not appear to be any correlation between the solutions for $g_1^*$ and $g_2^*$, it seems unlikely that a general solution will be found.

Appendix A

Maple Code

The following is the Maple code used to verify the calculations for Chapter 6.
> restart: with(plots):

Case 1:
> h1 := x -> (1/3)*(x - y[1])^3 + (1/3)*(x - y[1] - y[2])^3 + (1/3)*x^3;
\[ x \mapsto \tfrac13 (x - y_1)^3 + \tfrac13 (x - y_1 - y_2)^3 + \tfrac13 x^3 \]
> dh1 := diff(h1(x), x); criticalpoints1 := solve(dh1 = 0, x);
\[ \tfrac23 y_1 + \tfrac13 y_2 + \tfrac13 \sqrt{-2y_2^2 - 2y_1 y_2 - 2y_1^2}, \quad \tfrac23 y_1 + \tfrac13 y_2 - \tfrac13 \sqrt{-2y_2^2 - 2y_1 y_2 - 2y_1^2} \]

Case 2:
> h2 := x -> (1/3)*(y[1] - x)^3 + (1/3)*(x - y[1] - y[2])^3 + (1/3)*x^3;
\[ x \mapsto \tfrac13 (y_1 - x)^3 + \tfrac13 (x - y_1 - y_2)^3 + \tfrac13 x^3 \]
> dh2 := diff(h2(x), x): criticalpoints2 := solve(dh2 = 0, x);
\[ y_2 + \sqrt{-2y_1 y_2}, \quad y_2 - \sqrt{-2y_1 y_2} \]
> h2soln1 := factor(simplify(h2(criticalpoints2[1])));
\[ -\tfrac13 y_1 y_2 \big( 3y_1 - 3y_2 - 4\sqrt2 \sqrt{-y_1 y_2} \big) \]
> h2soln2 := factor(simplify(h2(criticalpoints2[2])));
\[ -\tfrac13 y_1 y_2 \big( 3y_1 - 3y_2 + 4\sqrt2 \sqrt{-y_1 y_2} \big) \]

Case 3:
> h3 := x -> (1/3)*(x - y[1])^3 + (1/3)*(y[1] + y[2] - x)^3 + (1/3)*x^3;
\[ x \mapsto \tfrac13 (x - y_1)^3 + \tfrac13 (y_1 + y_2 - x)^3 + \tfrac13 x^3 \]
> dh3 := diff(h3(x), x): criticalpoints3 := solve(dh3 = 0, x);
\[ -y_2 + \sqrt{2y_2^2 + 2y_1 y_2}, \quad -y_2 - \sqrt{2y_2^2 + 2y_1 y_2} \]
> h3soln1 := factor(simplify(h3(criticalpoints3[1])));
\[ -\tfrac13 y_2 (y_2 + y_1) \big( -3y_1 - 6y_2 + 4\sqrt2 \sqrt{y_2 (y_2 + y_1)} \big) \]
> h3soln2 := factor(simplify(h3(criticalpoints3[2])));
\[ \tfrac13 y_2 (y_2 + y_1) \big( 3y_1 + 6y_2 + 4\sqrt2 \sqrt{y_2 (y_2 + y_1)} \big) \]

Case 4:
> h4 := x -> (1/3)*(x - y[1])^3 + (1/3)*(x - y[1] - y[2])^3 - (1/3)*x^3;
\[ x \mapsto \tfrac13 (x - y_1)^3 + \tfrac13 (x - y_1 - y_2)^3 - \tfrac13 x^3 \]
> dh4 := diff(h4(x), x): criticalpoints4 := solve(dh4 = 0, x);
\[ y_2 + 2y_1 + \sqrt{2y_1 y_2 + 2y_1^2}, \quad y_2 + 2y_1 - \sqrt{2y_1 y_2 + 2y_1^2} \]
> h4soln1 := factor(simplify(h4(criticalpoints4[1])));
\[ -\tfrac13 y_1 (y_2 + y_1) \big( 3y_2 + 6y_1 + 4\sqrt{2y_1 (y_2 + y_1)} \big) \]
> h4soln2 := factor(simplify(h4(criticalpoints4[2])));
\[ \tfrac13 y_1 (y_2 + y_1) \big( -3y_2 - 6y_1 + 4\sqrt{2y_1 (y_2 + y_1)} \big) \]

Case 5:
> h5 := x -> (1/3)*(y[1] - x)^3 + (1/3)*(y[1] + y[2] - x)^3 + (1/3)*x^3;
\[ x \mapsto \tfrac13 (y_1 - x)^3 + \tfrac13 (y_1 + y_2 - x)^3 + \tfrac13 x^3 \]
> dh5 := diff(h5(x), x): criticalpoints5 := solve(dh5 = 0, x);
\[ y_2 + 2y_1 + \sqrt{2y_1 y_2 + 2y_1^2}, \quad y_2 + 2y_1 - \sqrt{2y_1 y_2 + 2y_1^2} \]
> h5soln1 := factor(simplify(h5(criticalpoints5[1])));
\[ \tfrac13 y_1 (y_2 + y_1) \big( 3y_2 + 6y_1 + 4\sqrt{2y_1 (y_2 + y_1)} \big) \]
> h5soln2 := factor(simplify(h5(criticalpoints5[2])));
\[ -\tfrac13 y_1 (y_2 + y_1) \big( -3y_2 - 6y_1 + 4\sqrt{2y_1 (y_2 + y_1)} \big) \]

Case 6:
> h6 := x -> (1/3)*(y[1] - x)^3 + (1/3)*(x - y[1] - y[2])^3 - (1/3)*x^3;
\[ x \mapsto \tfrac13 (y_1 - x)^3 + \tfrac13 (x - y_1 - y_2)^3 - \tfrac13 x^3 \]
> dh6 := diff(h6(x), x): criticalpoints6 := solve(dh6 = 0, x);
\[ -y_2 + \sqrt{2y_2^2 + 2y_1 y_2}, \quad -y_2 - \sqrt{2y_2^2 + 2y_1 y_2} \]
> h6soln1 := factor(simplify(h6(criticalpoints6[1])));
\[ \tfrac13 y_2 (y_2 + y_1) \big( -3y_1 - 6y_2 + 4\sqrt{2y_2 (y_2 + y_1)} \big) \]
> h6soln2 := factor(simplify(h6(criticalpoints6[2])));
\[ -\tfrac13 y_2 (y_2 + y_1) \big( 3y_1 + 6y_2 + 4\sqrt{2y_2 (y_2 + y_1)} \big) \]

Case 7:
> h7 := x -> (1/3)*(x - y[1])^3 + (1/3)*(y[1] + y[2] - x)^3 - (1/3)*x^3;
\[ x \mapsto \tfrac13 (x - y_1)^3 + \tfrac13 (y_1 + y_2 - x)^3 - \tfrac13 x^3 \]
> dh7 := diff(h7(x), x): criticalpoints7 := solve(dh7 = 0, x);
\[ y_2 + \sqrt{-2y_1 y_2}, \quad y_2 - \sqrt{-2y_1 y_2} \]
> h7soln1 := factor(simplify(h7(criticalpoints7[1])));
\[ -\tfrac13 y_1 y_2 \big( 3y_2 - 3y_1 + 4\sqrt{-2y_1 y_2} \big) \]
> h7soln2 := factor(simplify(h7(criticalpoints7[2])));
\[ \tfrac13 y_1 y_2 \big( 3y_1 - 3y_2 + 4\sqrt{-2y_1 y_2} \big) \]

Case 8:
> h8 := x -> (1/3)*(y[1] - x)^3 + (1/3)*(y[1] + y[2] - x)^3 - (1/3)*x^3;
\[ x \mapsto \tfrac13 (y_1 - x)^3 + \tfrac13 (y_1 + y_2 - x)^3 - \tfrac13 x^3 \]
> dh8 := diff(h8(x), x): criticalpoints8 := solve(dh8 = 0, x);
\[ \tfrac23 y_1 + \tfrac13 y_2 + \tfrac13 \sqrt{-2y_2^2 - 2y_1 y_2 - 2y_1^2}, \quad \tfrac23 y_1 + \tfrac13 y_2 - \tfrac13 \sqrt{-2y_2^2 - 2y_1 y_2 - 2y_1^2} \]

Plotting:
> restart: with(plots):
> f1 := (y1, y2) -> -(1/3)*y1*(y2+y1)*(-6*y1-3*y2+4*sqrt(2)*sqrt(y1*(y2+y1))):
> f2 := (y1, y2) -> -(1/3)*y2*(y2+y1)*(-3*y1-6*y2+4*sqrt(2)*sqrt(y2*(y2+y1))):
> f3 := (y1, y2) -> (1/3)*y1*y2*(3*y1-3*y2+4*sqrt(2)*sqrt(-y1*y2)):
> f4 := (y1, y2) -> -(1/3)*y1*(y2+y1)*(6*y1+3*y2+4*sqrt(2)*sqrt(y1*(y2+y1))):
> f5 := (y1, y2) -> -(1/3)*y2*(y2+y1)*(3*y1+6*y2+4*sqrt(2)*sqrt(y2*(y2+y1))):
> f6 := (y1, y2) -> (1/3)*y1*y2*(-3*y1+3*y2+4*sqrt(2)*sqrt(-y1*y2)):
> fpiece := (y1, y2) -> piecewise(
>   0 <= y1 and -(1/2)*y1 <= y2 and y2 <= y1, f1(y1, y2),
>   0 <= y2 and -(1/2)*y2 <= y1 and y1 <= y2, f2(y1, y2),
>   y1 <= 0 and -(1/2)*y1 <= y2 and y2 <= -2*y1, f3(y1, y2),
>   y1 <= 0 and y1 <= y2 and y2 <= -(1/2)*y1, f4(y1, y2),
>   y2 <= 0 and y2 <= y1 and y1 <= -(1/2)*y2, f5(y1, y2),
>   0 <= y1 and -2*y1 <= y2 and y2 <= -(1/2)*y1, f6(y1, y2)):
> plot3d(fpiece(y1, y2), y1 = -10..10, y2 = -10..10, axes = normal);
> plots[contourplot](fpiece(y1, y2), y1 = -10..10, y2 = -10..10, contours = 100);
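A final cross-check that can be appended to the session above: compare fpiece against a direct grid minimization of $h$ at a few sample points (the function h, the helper bruteforce, and the sample points are our additions, not part of the original session):

> h := (x, y1, y2) -> abs(x - y1)^3/3 + abs(x - y1 - y2)^3/3 + abs(x)^3/3:
> bruteforce := (y1, y2) -> min(seq(h(-10 + k/100., y1, y2), k = 0..2000)):
> seq(evalf(fpiece(p[1], p[2]) - bruteforce(p[1], p[2])), p in [[2, 1], [1, 3], [3, -2]]);
        # each difference is approximately 0, up to the 0.01 grid resolution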
