On the Kernel Average for n Functions by Sarah Michelle Moffat B.Sc., The University of British Columbia, 2007 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The College of Graduate Studies (Mathematics) THE UNIVERSITY OF BRITISH COLUMBIA (Okanagan) December 2009 c© Sarah Michelle Moffat 2009 Abstract After an introduction to Hilbert spaces and convex analysis, the proximal average is studied and two smooth operators are provided. The first is a new version of an operator previously supplied by Goebel, while the second one is new and uses the proximal average of a function and a quadratic to find a smooth approximation of the function. Then, the kernel average of two functions is studied and a reformulation of the proximal average is used to extend the definition of the kernel aver- age to allow for any number of functions. The Fenchel conjugate of this new kernel average is then examined by calculating the conjugate for two spe- cific kernel functions that represent two of the simplest cases that could be considered. A closed form solution was found for the conjugate of the first kernel function and it was rewritten in three equivalent forms. A solution was also found for the conjugate of the second kernel function, but the two solutions do not have the same form which suggests that a general solution for the conjugate of any kernel function will not be found. ii Table of Contents Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . vii Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 2 Hilbert Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 2.1 General Vector Spaces . . . . . . . . . . . . . . . . . . . . . . 2 2.2 Inner Product Spaces . . . . . . . . . . . . . . . . . . . . . . 4 2.3 Facts on Maximization and Minimization . . . . . . . . . . . 7 3 Convex Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 8 3.1 Convex Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 3.1.1 Cones . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.1.2 Interiors of Sets . . . . . . . . . . . . . . . . . . . . . 10 3.2 Convex Functions . . . . . . . . . . . . . . . . . . . . . . . . 11 iii Table of Contents 3.2.1 Subgradients . . . . . . . . . . . . . . . . . . . . . . . 17 3.2.2 Fenchel Conjugate . . . . . . . . . . . . . . . . . . . . 18 3.2.3 Epi-addition and Epi-multiplication . . . . . . . . . . 24 3.3 Proximity Operators . . . . . . . . . . . . . . . . . . . . . . . 27 3.4 Minimax Theory . . . . . . . . . . . . . . . . . . . . . . . . . 28 4 The Proximal Average . . . . . . . . . . . . . . . . . . . . . . 30 4.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 4.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 4.2.1 Fenchel Conjugate . . . . . . . . . . . . . . . . . . . . 33 4.3 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 4.3.1 Self-dual Smooth Approximations . . . . . . . . . . . 41 5 The Kernel Average of Two Functions . . . . . . . . . . . . 47 5.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 5.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 6 The Kernel Average of n Functions . . . . . . . . . . . . . . 51 6.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 6.2 The Kernel Average Conjugate . . . . . . . . . . . . . . . . . 53 6.3 P-norm Kernel Conjugate for General p Case when n = 3 . . 59 6.4 P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 . . . 63 6.4.1 Case 1: x ≥ y1, x ≥ y1 + y2, x ≥ 0 . . . . . . . . . . . 64 6.4.2 Case 2: x ≤ y1, x ≥ y1 + y2, x ≥ 0 . . . . . . . . . . . 67 6.4.3 Case 3: x ≥ y1, x ≤ y1 + y2, x ≥ 0 . . . . . . . . . . . 69 6.4.4 Case 4: x ≥ y1, x ≥ y1 + y2, x ≤ 0 . . . . . . . . . . . 73 iv Table of Contents 6.4.5 Case 5: x ≤ y1, x ≤ y1 + y2, x ≥ 0 . . . . . . . . . . . 77 6.4.6 Case 6: x ≤ y1, x ≥ y1 + y2, x ≤ 0 . . . . . . . . . . . 80 6.4.7 Case 7: x ≥ y1, x ≤ y1 + y2, x ≤ 0 . . . . . . . . . . . 84 6.4.8 Case 8: x ≤ y1, x ≤ y1 + y2, x ≤ 0 . . . . . . . . . . . 88 6.4.9 Combining the Eight Cases . . . . . . . . . . . . . . . 90 6.4.10 Bringing It All Together . . . . . . . . . . . . . . . . 92 7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 Appendices A Maple Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 v List of Figures 3.1 Epigraph of a function . . . . . . . . . . . . . . . . . . . . . . 13 6.1 Case 1 summary . . . . . . . . . . . . . . . . . . . . . . . . . 66 6.2 Case 2 summary . . . . . . . . . . . . . . . . . . . . . . . . . 70 6.3 Case 3 summary . . . . . . . . . . . . . . . . . . . . . . . . . 73 6.4 Case 4 summary . . . . . . . . . . . . . . . . . . . . . . . . . 77 6.5 Case 5 summary . . . . . . . . . . . . . . . . . . . . . . . . . 81 6.6 Case 6 summary . . . . . . . . . . . . . . . . . . . . . . . . . 84 6.7 Case 7 summary . . . . . . . . . . . . . . . . . . . . . . . . . 87 6.8 Case 8 summary . . . . . . . . . . . . . . . . . . . . . . . . . 89 6.9 Overview of the eight cases . . . . . . . . . . . . . . . . . . . 97 6.10 The six regions . . . . . . . . . . . . . . . . . . . . . . . . . . 98 6.11 Plots of g∗2(y1, y2,−y1 − y2) . . . . . . . . . . . . . . . . . . . 98 vi Acknowledgements First, I would like to thank my supervisors Heinz Bauschke and ShawnWang for their excitement about mathematics and their flexibility when it came to fitting mathematics into my life. I’d also like to thank Liangjin Yao and Mason Macklem for all their help, and Richard Taylor who first inspired me to pursue graduate studies. vii Dedication For my wonderful husband Jim and our daughter Zoe, who was born during the production of this thesis. viii Chapter 1 Introduction When averaging functions the most natural place to start is the arithmetic average, defined by A := λf1 + (1− λ)f2, (1.1) where 0 ≤ λ ≤ 1. This works well if both functions f1 and f2 are every- where defined. But if the functions are not differentiable at some points or if their domains do not intersect, then the arithmetic average will not be differentiable or will be +∞ everywhere. In [2] Bauschke, Lucet, and Trienis define a new average, the proximal average, and discuss the benefits of using the proximal average for these types of cases. In particular, the proximal average produces a continuous and differentiable function even if the original functions are non-smooth and their domains do not intersect, provided at least one of the functions is differentiable with a full domain. From the definition of the proximal average, the more general kernel av- erage [3] was defined for averaging two functions based on a kernel function. Both the arithmetic and proximal averages can be derived as special cases of the kernel average. This thesis extends the definition of the kernel average to an arbitrary number of functions and examines the convex conjugate of the kernel average for n functions. 1 Chapter 2 Hilbert Spaces In this chapter we give some background material on inner product spaces. The notion of vector spaces and their extensions are central to much of the following thesis, so a quick reminder of some concepts from linear algebra is also included to refresh the reader’s memory. For more on vector spaces see [7, Chapters 1 and 7] or [11, Chapter 4]. 2.1 General Vector Spaces Definition 2.1.1 (Vector Space) A vector space consists of a set V with elements called vectors, along with two operations such that the following properties hold: (1) Vector addition: Let u, v ∈ V , then there is a vector u+ v ∈ V and the following are satisfied. (i) Commutativity: u+ v = v + u, ∀u, v ∈ V. (ii) Associativity: u+ (v + w) = (u+ v) + w, ∀u, v, w ∈ V. (iii) Zero: there is a vector 0 ∈ V such that 0+u = u = u+0, ∀u ∈ V. (iv) Inverses: for each u ∈ V , there is a vector −u such that u+(−u) = 0. 2 2.1. General Vector Spaces (2) Scalar multiplication: Let u, v ∈ V and r, s ∈ R, then the following are satisfied. (i) Left distributivity: (r + s)v = rv + sv. (ii) Associativity: r(sv) = (rs)v. (iii) Right distributivity: r(u+ v) = ru+ rv. (iv) Neutral element: 1v = v. (v) Absorbing element: 0v = 0. (vi) Inverse neutral element: (−1)v = −v. Example 2.1.2 The space Rn consists of vectors v = (v1, · · · , vn) with vi ∈ R for 1 ≤ i ≤ n and operations defined by (u1, · · · , un) + (v1, · · · , vn) := (u1 + v1, · · · , un + vn) r(v1, · · · vn) := (rv1, · · · , rvn), where r ∈ R. Definition 2.1.3 A subspace of a vector space V is a subset W of V with W 6= ∅ and W is a vector space using the operations of V . Definition 2.1.4 Let S ⊆ V , the span of S is the smallest subspace con- taining S and is denoted by spanS. Fact 2.1.5 The subspace spanned by a nonempty set S ⊆ V consists of all linear combinations of the elements of S. 3 2.2. Inner Product Spaces Definition 2.1.6 A linear transformation A from a vector space V to a vector space W is a function A : V →W satisfying A(r1v1 + r2v2) = r1Av1 + r2Av2, ∀v1, v2 ∈ V and ∀r1, r2 ∈ R. 2.2 Inner Product Spaces We recall the definitions of a norm and an inner product. Definition 2.2.1 A norm on a vector space V is a function ‖ · ‖ : V → [0,+∞[ with the following properties. (i) Positive definite: ‖x‖ = 0 if, and only if, x = 0, (ii) Homogeneous: ‖αx‖ = |α|‖x‖, ∀x ∈ V and α ∈ R, (iii) Triangle inequality: ‖x+ y‖ ≤ ‖x‖+ ‖y‖, ∀x, y ∈ V . Definition 2.2.2 An inner product on a vector space V is a function 〈·, ·〉 : V × V → R satisfying the following properties. (i) Positive definite: 〈x, x〉 ≥ 0, ∀x ∈ V and 〈x, x〉 = 0 only if x = 0; (ii) Symmetry: 〈x, y〉 = 〈y, x〉, ∀x, y ∈ V ; (iii) Bilinearity 〈αx+ βy, z〉 = α〈x, z〉+ β〈y, z〉, ∀x, y, z ∈ V and α, β ∈ R. We call a vector space paired with an inner product and norm induced by ‖x‖ := 〈x, x〉1/2, an inner product space. Definition 2.2.3 In a normed vector space (V, ‖ · ‖), a sequence (vn)∞n=1 converges to v ∈ V if lim n→∞ ‖vn − v‖ = 0. 4 2.2. Inner Product Spaces Definition 2.2.4 A sequence (vn)∞n=1 is called a Cauchy sequence if for every ² > 0, there is an integer N > 0 such that ‖vn − vm‖ < ² for all n,m ≥ N . Remark 2.2.5 While every convergent sequence is a Cauchy sequence, the converse is not true. For example, consider the sequence 1, 14 10 , 141 100 , 1414 1000 , 14142 10000 , · · · in Q approaching √ 2. This sequence does not converge in Q. Definition 2.2.6 An inner product space V is complete if every Cauchy sequence in V converges to some vector v ∈ V . Definition 2.2.7 A complete inner product space is called a Hilbert space. Example 2.2.8 The following inner product spaces are Hilbert spaces: (i) [7, Theorem 4.2.5] Rn paired with the inner product 〈x, y〉 = x1y1 + · · ·+ xnyn. (ii) [7, Theorem 7.5.8] The space `2, consisting of all sequences x = (xn)∞n=1 such that ‖x‖2 := ( ∞∑ n=1 x2n )1/2 is finite, with the inner product 〈x, y〉 = ∞∑ n=1 xnyn. (iii) [10, Example 2.2-7] The space L2[0, 1], consisting of all (equivalence classes of) functions f on [0, 1] such that ‖f‖2 := ( ∫ 1 0 |f(x)|2dx) 1 2 < +∞, paired with the inner product 〈f, g〉 = ∫ 10 f(x)g(x)dx. 5 2.2. Inner Product Spaces Definition 2.2.9 [4, Section 1.1] Let X,V be vector spaces and let A : X → V be a linear operator. The corresponding adjoint linear transformation from V to X is the unique operator A∗ such that the following identity holds 〈Ax, y〉 = 〈x,A∗y〉 (2.1) for all x ∈ X and y ∈ V . Example 2.2.10 Let X be a vector space and λ1, · · · , λn ∈ R. Let A : Xn → X be the linear operator defined by x = (x1, · · · , xn) 7→ λ1x1 + · · · + λnxn. Then the adjoint of A is the operator A∗ : X → Xn such that z 7→ (λ1z, · · · , λnz). Proof. Let x ∈ Xn and y ∈ X, then 〈Ax, y〉 = 〈λ1x1 + · · ·+ λnxn, y〉 = n∑ i=1 λi〈xi, y〉. On the other hand, 〈x,A∗y〉 = 〈(x1, · · · , xn), (λ1y, · · · , λny)〉 = 〈x1, λ1y〉+ · · ·+ 〈xn, λny〉 = n∑ i=1 λi〈xi, y〉. Since 〈Ax, y〉 = 〈x,A∗y〉 ∀x ∈ Xn, y ∈ X and the adjoint is unique then A∗ is the adjoint of A. ¥ 6 2.3. Facts on Maximization and Minimization 2.3 Facts on Maximization and Minimization In this section, we assume that X is an inner product space. Fact 2.3.1 Let x ∈ X then sup y∈X 〈x, y〉 = 0, if x = 0; +∞, otherwise. Fact 2.3.2 Let f and g be functions from X → ]−∞,+∞], then (i) sup x,y∈X (f(x) + g(y)) = sup x∈X (f(x)) + sup y∈X (g(y)) (ii) inf x,y∈X (f(x) + g(y)) = inf x∈X (f(x)) + inf y∈X (g(y)) . Fact 2.3.3 Let x, y ∈ X then (i) sup x∈X sup y∈X f(x, y) = sup y∈X sup x∈X f(x, y) (ii) inf x∈X inf y∈X f(x, y) = inf y∈X inf x∈X f(x, y) These will be used frequently later on. 7 Chapter 3 Convex Analysis Let H denote a real Hilbert space with inner product 〈·, ·〉, norm ‖ · ‖, and identity mapping Id. We’ll now introduce some necessary convex analysis, for a more in-depth look at convex analysis please see [12]. 3.1 Convex Sets Definition 3.1.1 A set A ⊆ H is affine if x ∈ A, y ∈ A, and θ ∈ R imply that θx+ (1− θ)y ∈ A. Definition 3.1.2 A set C ⊆ H is convex if x ∈ C, y ∈ C, and 0 ≤ θ ≤ 1 imply that θx+ (1− θ)y ∈ C. This means that for any two points in a convex set C, the line segment joining the two points is also contained in C. Example 3.1.3 The following sets are convex: (i) Affine sets; (ii) Halfspaces: A set H is a halfspace if for some b ∈ H and β ∈ R, H = {x ∈ H : 〈x, b〉 ≤ β}; 8 3.1. Convex Sets (iii) Closed ball of radius r > 0 centered at a point xc: B(xc, r) := {x ∈ H : ‖xc − x‖ ≤ r}. The following definitions describe some important properties of sets that are frequently used in convex analysis. Definition 3.1.4 The indicator function of a set C ⊆ H is the function ιC : H → [0,+∞] defined by ιC(x) = 0, if x ∈ C; +∞, otherwise. Definition 3.1.5 The support function of a set C ⊆ H is the function σC : H → ]−∞,+∞] defined by σC(x) = sup u∈C 〈x, u〉. 3.1.1 Cones A set C is called a cone if for every x ∈ C and θ > 0 we have θx ∈ C. A set C is a convex cone if it is both convex and a cone. That is, for every x1, x2 ∈ C and θ1, θ2 ≥ 0 then θ1x1 + θ2x2 ∈ C. Definition 3.1.6 The conical hull of a set C ⊆ H is the set coneC = ⋃ λ>0 {λx : x ∈ C}. 9 3.1. Convex Sets Fact 3.1.7 [5, Section 2.1.5] The conical hull of C is the smallest convex cone that contains the set C. Example 3.1.8 Let D = {z = (x, y, 0) ∈ R3 : ‖z‖ ≤ 1} be a closed unit disc in R3, then coneD = R2 × {0}. Definition 3.1.9 Let C be a nonempty convex set in H. We say that C recedes in the direction of y, y 6= 0, if and only if x+λy ∈ C for every λ ≥ 0 and x ∈ C. Directions in which C recedes are also called the directions of recession. Definition 3.1.10 The recession cone of a set C is the set of all vectors y ∈ H such that for each x ∈ C, x + λy ∈ C for all λ ≥ 0. The recession cone of C is denoted by 0+C. Fact 3.1.11 [12, Theorem 8.1] Let C be a non-empty convex set. Then the recession cone, 0+C, is a convex cone containing the origin. 3.1.2 Interiors of Sets Definition 3.1.12 The interior of a set C ⊆ H is the set intC = {x ∈ H : ∃² > 0 such that B(x, ²) ⊆ C}, where B(x, ²) = {y : ‖y − x‖ < ²}. Definition 3.1.13 The relative interior of a convex set C ⊆ H is the set riC = {x ∈ H : cone(C − x) = span(C − x)}. 10 3.2. Convex Functions The following example illustrates the need to distinguish between the interior and the relative interior of a set. Example 3.1.14 Consider again the closed disc D = {z = (x, y, 0) ∈ R3 : ‖z‖ ≤ 1}.We get that intD = ∅ since no ball in R3 can be contained in D, however riD = {z = (x, y, 0) ∈ R3 : ‖z‖ < 1}. Definition 3.1.15 A point x is a limit point of a subset C ⊆ H if there is a sequence (xn)∞n=1 with xn ∈ C such that x = limn→∞xn. A set C ⊆ H is closed if it contains all of its limit points. Definition 3.1.16 The closure of a set C ⊆ H is the smallest closed set containing C, and is denoted by clC. 3.2 Convex Functions The effective domain of a function f : H →]−∞,+∞] is the set of points: dom f = {x ∈ H : f(x) < +∞}. (3.1) The set of global minimizers of f is denoted by argminx∈H f(x). Definition 3.2.1 We call a function f proper if it never takes on the value of −∞ and is not identically equal to +∞. Definition 3.2.2 A function f is lower semicontinuous at a point x0 if lim inf x→x0 f(x) ≥ f(x0), 11 3.2. Convex Functions where lim inf is as defined in [13, Definition 1.5]. The function is said to be lower semicontinuous if it is lower semicontinuous at every point x0 ∈ H. Definition 3.2.3 A function f is coercive if lim ‖x‖→∞ f(x) =∞. A function is supercoercive if lim ‖x‖→∞ f(x) ‖x‖ =∞. Definition 3.2.4 The epigraph of a function f : H → ]−∞,+∞] is the set epi f = {(x, t) ∈ H × R : f(x) ≤ t}. This is illustrated in Figure 3.1. Definition 3.2.5 A function f : H → ]−∞,+∞] is convex if epi f is a convex set. We denote the set of proper, lower semicontinuous, convex functions in H by Γ0(H). While Definition 3.2.5 conveniently ties the notion of a convex function to that of a convex set, in practice a convex function is usually synonymous with the following result. 12 3.2. Convex Functions Figure 3.1: Epigraph of a function [5, Figure 3.5] Theorem 3.2.6 [12, Theorem 4.1] A function f : H → ]∞,+∞] is convex if and only if for x, y ∈ H and 0 < θ < 1, f(θx+ (1− θ)y) ≤ θf(x) + (1− θ)f(y). A function is said to be strictly convex if the inequality in Theorem 3.2.6 is strict; that is, that f(θx + (1 − θ)y) < θf(x) + (1 − θ)f(y) provided that x 6= y. Fact 3.2.7 [14, Theorem 2.5.1(ii) and Proposition 2.5.6] If a function f is both coercive and strictly convex then it has a unique minimizer, x̄. That is, argminx∈H f(x) = {x̄}. 13 3.2. Convex Functions Definition 3.2.8 A function f : H → ]∞,+∞] is concave if −f is convex. Definition 3.2.9 A function f : H → ]∞,+∞] is affine if f is finite and both convex and concave. Example 3.2.10 Let f : H → ]∞,+∞] be defined by x 7→ 〈a, x〉 − b where a ∈ H and b ∈ R. Then f is an affine function. Proof. Let x, y ∈ H and θ ∈ R, then f(θx+ (1− θ)y) = 〈a, θx+ (1− θ)y〉 − b = θ(〈a, x〉 − b) + (1− θ)(〈a, y〉 − b) = θf(x) + (1− θ)f(y). Therefore the conditions for convexity and concavity are both satisfied and f is affine. ¥ Another method of determining if a function is convex is by checking the first and second order conditions for convexity. Fact 3.2.11 (First order condition) [5, Section 3.1.3] Let f : Rn → ]−∞,+∞] be a differentiable function; that is, its gradient ∇f exists at each point of its open domain. Then f is convex if and only if its domain is convex and f(y) ≥ f(x) + 〈∇f(x), y − x〉 holds for all x, y ∈ dom f . This condition says that for a convex function the first-order Taylor series approximation is a global underestimator of the function, and conversely if 14 3.2. Convex Functions the Taylor approximation is a global underestimator then the function is convex. Fact 3.2.12 (Second order condition) [5, Section 3.1.4] Let f : Rn → ]−∞,+∞] be a twice differentiable function; that is, its Hessian or second derivative ∇2f(x) exists at each point of it open domain. Then f is convex if and only if dom f is convex and its Hessian is positive semidefinite: ∇2f(x) º 0. Remark 3.2.13 A matrix A ∈ Rm×n is positive semidefinite if yTAy ≥ 0 for all y ∈ Rn, and is denoted by A º 0. Fact 3.2.14 (Composition with affine mapping) [5, Section 3.2.2] Sup- pose f : Rn → R, A ∈ Hm×n, and b ∈ Rn. Define g : Rn → R by g(x) = f(Ax+ b); with dom g = {x : Ax + b ∈ dom f}. Then if f is convex, so is g; if f is concave, so is g. Definition 3.2.15 Given two functions f, g from H → ]−∞,+∞], f is said to majorize g if f(x) ≥ g(x) ∀x ∈ H. Definition 3.2.16 Let f be a function from H → R. The lower semi- continuous hull of f is the greatest lower semi-continuous function majorized by f , i.e, the function whose epigraph is the closure of the epigraph of f . 15 3.2. Convex Functions Definition 3.2.17 Let f ∈ Γ0(H). The recession function of f is the func- tion f∞ : H → ]−∞,+∞] whose epigraph is the recession cone of the epi- graph of f , 0+(epi f). Fact 3.2.18 [14, Theorem 2.1.5 (ii)] Let f ∈ Γ0(H) be such that int(dom f) 6= ∅ and let t0 ∈ dom f . The function ϕt0 : dom f\{t0} → R defined by ϕt0 := f(t)− f(t0) t− t0 , is nondecreasing; if f is strictly convex then ϕt0 is increasing. Proposition 3.2.19 Let f ∈ Γ0(H) and x0 ∈ dom f , then the recession function of f is f∞(u) = lim t→∞ f(x0 + tu)− f(x0) t for all u ∈ H. Proof. Using Definition 3.2.17 and Definition 3.1.10 we have (u, λ) ∈ 0+(epi f)⇔ ∀t > 0 : (x0, f(x0)) + t(u, λ) ∈ epi f ⇔ f(x0 + tu)− f(x0) t ≤ λ ⇔ sup t>0 f(x0 + tu)− f(x0) t ≤ λ. Fix u ∈ H, the function t 7→ f(x0+tu)−f(x0)t is nondecreasing on ]0,+∞] due to the convexity of f and Fact 3.2.18. Then, (u, λ) ∈ 0+(epi f)⇔ lim t→∞ f(x0 + tu)− f(x0) t = sup t>0 f(x0 + tu)− f(x0) t ≤ λ. 16 3.2. Convex Functions Therefore for all u ∈ H, f∞(u) = lim t→∞ f(x0+tu)−f(x0) t . ¥ 3.2.1 Subgradients In order to deal with nonsmooth convex functions, we will now introduce a concept that is analogous to a derivative of a differentiable function. Definition 3.2.20 We say that y ∈ H is a subgradient of a convex function f at the point x if f(x) + 〈y, z − x〉 ≤ f(z) (∀z ∈ H). (3.2) The set of all subgradients of f at x is called the subdifferential of f at x and is denoted by ∂f(x). That is, ∂f(x) := {y ∈ H : f(x) + 〈y, z − x〉 ≤ f(z) ∀z ∈ H}. (3.3) Example 3.2.21 Let f(x) = |x|, then ∂f(x) = {−1}, if x < 0; [−1, 1], if x = 0; {1}, if x > 0. Fact 3.2.22 [12, Theorem 26.1] Let f : H → ]−∞,+∞] be a differentiable function. Then ∂f = {∇f}. Fact 3.2.23 [14, Theorem 2.5.7] If f is a proper convex function, then x ∈ dom f is a minimum point for f if and only if 0 ∈ ∂f(x). In particular, 17 3.2. Convex Functions if f is differentiable at x then x is a minimum if ∇f(x) = 0. Fact 3.2.24 (Bronstead-Rockafellar) [14, Theorem 3.1.2] Let H be a Hilbert space and f ∈ Γ0(H). Then dom f ⊆ dom ∂f and dom f∗ ⊆ ran ∂f . 3.2.2 Fenchel Conjugate Definition 3.2.25 Let f : H → ]−∞,+∞]. The Fenchel conjugate, or convex conjugate, of f is defined as f∗(y) = sup x∈dom f (〈x, y〉 − f(x)) , for all y ∈ H. It is interesting to note that since 〈x, y〉 − f(x) is affine with respect to y for a fixed x, then f∗ is the supremum of a set of convex functions and is therefore convex regardless of whether f is convex or not. In the case where f is a finite, coercive and twice continuously differentiable function, the Fenchel conjugate is also referred to as the Legendre transform of f [13, Example 11.9]. Proposition 3.2.26 (Fenchel-Young Inequality) Let f : H → ]−∞,+∞]. The following holds f(x) + f∗(x∗) ≥ 〈x, x∗〉 (3.4) for all x, x∗ ∈ H. Proof. Follows directly from the definition of the conjugate function. ¥ Proposition 3.2.27 If f1 ≤ f2 then f∗1 ≥ f∗2 . 18 3.2. Convex Functions Proof. Let y ∈ H, since f1 ≤ f2 then 〈x, y〉 − f1(x) ≥ 〈x, y〉 − f2(x) for all x ∈ H. Taking the supremum over x of each side yields sup x∈H (〈x, y〉 − f1(x)) ≥ sup x∈H (〈x, y〉 − f2(x)) . Therefore f∗1 (y) ≥ f∗2 (y). ¥ Proposition 3.2.28 Let f : H → ]−∞,+∞] and c ∈ R be a constant. Then (f(· − c))∗(x∗) = 〈x∗, c〉+ f∗(x∗). Proof. Let x∗ ∈ H, then (f(· − c))∗(x∗) = sup x (〈x∗, x〉 − f(x− c)). Letting x− c = x′, this becomes (f(· − c))∗(x∗) = sup x′ (〈x∗, x′ + c〉 − f(x′)) = 〈x∗, c〉+ sup x′ (〈x∗, x′〉 − f(x′)) = 〈x∗, c〉+ f∗(x∗). ¥ Example 3.2.29 Let q = 12‖ · ‖2, then q∗ = q and this is the only self- conjugate function, i.e. f = f∗. Proof. From the definition of the conjugate we get that q∗(y) = sup x ( 〈x, y〉 − 1 2 ‖x‖2 ) Setting h(x) = 〈x, y〉 − 12‖x‖2 and differentiating, we get h′(x) = y − x and h′′(x) = − Id, which is negative definite: Hence, h is strictly concave and 19 3.2. Convex Functions therefore the critical point x = y is the maximizer. Setting x = y in the equation above yields q∗(y) = 12‖y‖2. On the other hand, suppose f is a convex function satisfying f = f∗. Then f is proper and using Proposition 3.2.26 we get 〈x, x〉 ≤ f(x) + f∗(x) = 2f(x) ⇔ q(x) ≤ f(x) Then by Fact 3.2.27 f∗ ≤ q∗ = q. Since f∗ = f , we get q ≤ f ≤ q, therefore f = q. ¥ Due to its frequent use, from here on we will use the notation that q := 1 2 ‖ · ‖2. (3.5) Example 3.2.30 Let f : R→ R be defined by f(x) = 1p |x|p then for p = 1 f∗(x∗) = 0, if − 1 ≤ x∗ ≤ 1; +∞, otherwise. And when p > 1 f∗(x∗) = 1 q |x∗|q where 1p + 1 q = 1. Proof. When p = 1, f(x) = |x| and f∗(x∗) = sup x∈R (xx∗ − |x|) 20 3.2. Convex Functions If x∗ ≥ 0, f∗(x∗) = sup x∈R+ (xx∗ − x) = sup x∈R+ (x(x∗ − 1)) = +∞, if x∗ > 1; 0, if x∗ ≤ 1. If x∗ < 0, f∗(x∗) = sup x∈R− (xx∗ + x) = sup x∈R− (x(x∗ + 1)) = 0, if x∗ ≥ −1; +∞, if x∗ < −1. Altogether, f∗(x∗) = 0, if − 1 ≤ x∗ ≤ 1; +∞, otherwise. When p > 1, f(x) = 1p |x|p and f∗(x∗) = sup x∈R ( xx∗ − 1p |x|p ) Differentiating to find the critical point, d dx (xx∗ − 1 p |x|p) = x∗ − |x|p−1 sgn(x) = 0, where sgn(x) denotes the sign of x. Then solving for x yields x∗ = |x|p−1 sgn(x)⇒ |x∗| = |x|p−1 ⇒ |x| = |x∗| 1p−1 . Substituting this back into the definition of the conjugate, x|x|p−1 sgn(x)− 1 p |x|p = |x||x|p−1 − 1 p |x|p = |x|p − 1 p |x|p = ( p− 1 p )|x|p = (p− 1 p )(|x∗| 1p−1 )p = (p− 1 p )|x∗| pp−1 . Letting q = pp−1 , we get that f ∗(x∗) = 1q |x∗|q where 1p + 1q = 1. ¥ 21 3.2. Convex Functions Example 3.2.31 Let f(x) = 1p |x − c|p for some constant c with p > 1. Then f is convex. Proof. Let g(z) = ( 1 q |z|q)∗ = 1 p |z|p, where 1p + 1 q = 1. Then g(z) is convex since it is a conjugate function and f(x) is convex by Fact 3.2.14. ¥ Fact 3.2.32 (Biconjugate theorem) [14, Theorem 2.3.3] Assume that f : H → ]−∞,+∞] is a proper function. Then f∗∗ := (f∗)∗ = f if and only if f ∈ Γ0(H). Fact 3.2.33 (Conjugate of the difference of functions) [8, Theorem 2.1] Let g ∈ Γ0(H) and let h ∈ Γ0(H) such that h and h∗ both have full domain. Then (∀x∗ ∈ H)(g − h)∗(x∗) = sup y∗∈H (g∗(y∗)− h∗(y∗ − x∗)) (3.6) Definition 3.2.34 Let f : H → ]−∞,+∞], and A be a linear transforma- tion from H to H. Define Af(x) := inf{f(y) : Ay = x}. Proposition 3.2.35 Let f : H → ]−∞,+∞], and A be a linear transfor- mation from H to H. Then (Af)∗ = f∗ ◦A∗. 22 3.2. Convex Functions Proof. Let x∗ ∈ H, then (Af)∗(x∗) = sup x∈H (〈x, x∗〉 −Af(x)) = sup x∈H ( 〈x, x∗〉 − inf {y:Ay=x} f(y) ) = sup (〈x, x∗〉 − f(y) : (x, y) ∈ H ×H, Ay = x) = sup (〈Ay, x∗〉 − f(y) : y ∈ H) = sup (〈y,A∗x∗〉 − f(y) : y ∈ H) = f∗(A∗x∗) = (f∗ ◦A∗)(x∗). ¥ Fact 3.2.36 (Fenchel’s Duality Theorem) [12, Theorem 31.1] Let f and g be a proper convex functions on H. Suppose at least one of the following holds: (i) ri(dom f) ∩ ri(dom g) 6= ∅, (ii) f and g are lower semicontinuous, and ri(dom f∗) ∩ ri(dom g∗) 6= ∅. Then inf x (f(x) + g(x)) = sup x∗ (−g∗(−x∗)− f∗(x∗)) . Fact 3.2.37 [14, Theorem 2.3.1] Let f : H → ]−∞,+∞] and α, β ∈ R. If α > 0 then (αf)∗(x∗) = αf∗(α−1x∗) for every x∗ ∈ H; if β 6= 0 then (f(β·))∗(x∗) = f∗(β−1x∗) for every x∗ ∈ H. 23 3.2. Convex Functions 3.2.3 Epi-addition and Epi-multiplication Definition 3.2.38 Let f1, f2 ∈ Γ0(H). The infimal convolution, or epi- addition, is defined by (f ¤ g)(x) := inf x1+x2=x (f1(x1) + f2(x2)) (3.7) for all x ∈ H. The infimal convolution is said to be exact at a point x if the infimum is attained. Definition 3.2.39 Let f ∈ Γ0(H) and α ≥ 0. We define epi-multiplication by α? f = αf(·/α), if α > 0; ι{0}, if α = 0. (3.8) Proposition 3.2.40 [1, Proposition 3.1] Let f, f1, · · · , fn ∈ Γ0(H). Then the following properties hold: (i) dom(α? f) = α(dom f) (ii) dom(f1¤ · · ·¤ fn) = (dom f1) + · · ·+ (domfn) Proposition 3.2.41 Let α ≥ 0. Then the following hold: (i) (αf)∗ = α? f∗; (ii) (α? f)∗ = αf∗; (iii) (f1¤ · · ·¤ fn)∗ = f∗1 + · · ·+ f∗n. 24 3.2. Convex Functions Proof. (i) Let x∗ ∈ H, then we consider two cases. (1) If α > 0, (αf)∗(x∗) = sup x (〈x, x∗〉 − αf(x)) = α sup x (〈x, x∗/α〉 − f(x)) = αf∗(x∗/α). (2) If α = 0 (αf)∗(x∗) = sup x (〈x, x∗〉 − 0]) = 0, if x∗ = 0; +∞, otherwise = ι{0}(x∗) = (α? f∗)(x∗). (3.9) Altogether, (αf)∗(x) = (α? f∗)(x∗). (ii) (α? f)∗ = sup x (〈x, x∗〉 − αf(x/α)) if α > 0 sup x (〈x, x∗〉 − ι{0}(x)) if α = 0 = α sup x (〈x/α, x∗〉 − f(x/α)) if α > 0 0 if α = 0 = αf∗(x∗). 25 3.2. Convex Functions (iii) (f1¤ · · ·¤ fn)∗(x∗) = sup x ( 〈x, x∗〉 − inf x1+···+xn=x (f1(x1) + · · ·+ fn(xn)) ) = sup x ( 〈x1 + · · ·+ xn, x∗〉+ sup x1+···+xn=x (−f1(x1)− · · · − fn(xn)) ) = sup x1 (〈x1, x∗〉 − f1(x1)) + · · ·+ sup x1 (〈x1, x∗〉 − f1(x1)) = (f∗1 + · · ·+ f∗n)(x∗). ¥ Fact 3.2.42 [1, Fact 3.4] The following hold. (i) If int dom f1 ∩ · · · int dom fn−1 ∩ dom fn 6= ∅, then (f1 + · · · + fn)∗ = f∗1 ¤ · · ·¤ f∗n and the infimal convolution is exact. (ii) If int dom f∗1 ∩· · · int dom f∗n−1∩dom f∗n 6= ∅, then f1¤ · · ·¤ fn is exact and epi(f1¤ · · ·¤ fn) = (epi f1) + · · ·+ (epi fn). Lemma 3.2.43 (λ1 ?(f1 + q)¤ · · ·¤λn ?(fn + q))∗ = λ1(f∗1 ¤ q) + · · · + λn(f∗n ¤ q) Proof. Using Proposition 3.2.41(iii), Proposition 3.2.41(i), Fact 3.2.42, and Example 3.2.29, we get that (λ1 ?(f1 + q)¤ · · ·¤λn ?(fn + q))∗ = (λ1 ?(f1 + q))∗ + · · ·+ (λn ?(fn + q))∗ = λ1(f1 + q)∗ + · · ·+ λn(fn + q)∗ = λ1(f∗1 ¤ q∗) + · · ·+ λn(f∗n ¤ q∗) = λ1(f∗1 ¤ q) + · · ·+ λn(f∗n ¤ q). (3.10) 26 3.3. Proximity Operators ¥ The following lemma illustrates the beauty of the epi-multiplication no- tation and will be used for a couple of proofs in the following chapters. Lemma 3.2.44 Let fi : H → ]−∞,+∞] for 1 ≤ i ≤ n. Let f = (f1, · · · , fn), x = (x1, · · · , xn) and f̃(x) = ∑ λifi(xi): Then f̃∗(x∗) = n∑ i=1 λi ? f ∗ i (x ∗ i ). Proof. f̃∗(x∗) = sup x∈H {〈x, x∗〉 − f̃(x)} = sup x=(x1,··· ,xn)∈H {〈x1, x∗1〉+ · · · 〈xn, x∗n〉 − ∑ λifi(xi)} = λ1 sup x1 {〈x1, x∗1/λ1〉 − f1(x1)}+ · · ·+ λn sup xn {〈xn, x∗n/λn〉 − fn(xn)} = λ1f∗1 ( x∗1 λ1 ) + · · ·+ λnf∗n( x∗n λn ) = λ1 ? f∗1 (x ∗ 1) + · · ·+ λn ? f∗n(x∗n). ¥ 3.3 Proximity Operators Definition 3.3.1 The proximity operator, or proximal mapping, of a func- tion f ∈ Γ0(H) is defined by (∀x ∈ H) Proxf x = argminy∈H (f(y) + q(x− y)) . (3.11) Fact 3.3.2 [6, Section 2.2] For all x ∈ H and for all p ∈ H p = Proxf x⇔ x− p ∈ ∂f(p), 27 3.4. Minimax Theory and Proxf = (Id+∂f)−1. Remark 3.3.3 Note that since the function y 7→ f(y) + q(x− y) is strictly convex and supercoercive, it has a unique minimizer, p = Proxf (x). 3.4 Minimax Theory Minimax problems are optimization problems that involve both minimiza- tion and maximization. Let X and Y be arbitray subsets of H with X 6= ∅ and Y 6= ∅, and let F be a function from X × Y to [−∞,+∞]. Minimax theory deals with problems of the form sup x∈X inf y∈Y F (x, y) or inf y∈Y sup x∈X F (x, y). For more on minimax theory please see chapters 36 and 37 in [12]. Definition 3.4.1 If sup x∈X inf y∈Y F (x, y) = inf y∈Y sup x∈X F (x, y) then the common value is called the minimax or the saddle-value of F . Definition 3.4.2 F is a concave-convex function if F (·, y) is a concave function on X for all y ∈ Y and F (x, ·) is convex on Y for all x ∈ X. Similarly, F is a convex-concave function if F (·, y) is convex on X for all y ∈ Y and F (x, ·) is concave on Y for all x ∈ X. The following fact gives us conditions for determining whether the saddle- value exists. Fact 3.4.3 [12, Theorem 37.3] Let F : Rm × Rn → ]−∞,+∞] be a proper concave-convex function with effective domain X × Y . Then either of the 28 3.4. Minimax Theory following conditions implies that the saddle-value of F exists. If both condi- tions hold, the saddle-value must be finite. (a) The convex functions F (x, ·) for x ∈ riX have no common direction of recession. (b) The convex functions −F (·, y) for y ∈ riY have no common direction of recession. 29 Chapter 4 The Proximal Average When averaging functions, the traditional arithmetic average λ1f1 + · · ·+ λnfn (4.1) is the natural place to begin. However, when the domains of the functions do not intersect then the result of (4.1) is a function that is everywhere infinity. The proximal average provides a useful method of averaging functions, even when their domains do not intersect. In this chapter, we give a new proof to the self-duality of the proximal average: (P(f ,λ))∗ = P(f∗,λ). We also supply two self-dual smooth operators, Sβf and Tβf . For this chapter, let f1, · · · , fn ∈ Γ0(H), and λ1, · · · , λn be strictly pos- itive real numbers such that n∑ i=1 λi = 1. 30 4.1. Definitions 4.1 Definitions Definition 4.1.1 (Proximal Average) Let f = (f1, · · · , fn) and λ = (λ1, · · · , λn). The λ-weighted proximal average of n functions fi, 1 ≤ i ≤ n, is P(f ,λ) = λ1 ?(f1 + q)¤ · · ·¤λn ?(fn + q)− q. (4.2) That is, (∀x ∈ H) P(f ,λ)(x) = inf n∑ i=1 xi=x n∑ i=1 ( λi(fi(xi/λi) + q(xi/λi)) ) − q(x). This can be reformulated in two different ways. Proposition 4.1.2 The proximal average can be equivalently defined by (i) P(f ,λ)(x) = inf∑ λiyi=x (∑ i λifi(yi) + ∑ i λiq(yi) ) − q(x) (ii) P(f ,λ) = (λ1(f1 + q)∗ + · · ·+ λn(fn + q)∗)∗ − q. Proof. Using the change of variables yi = xi/λi in Definition 4.1.1 we im- mediately get (i). For (ii), first note that by Proposition 3.2.40(ii), (∀i ∈ N) dom(f∗i ¤ q) = (dom f∗i ) + (dom q) = H. 31 4.2. Properties Then Fact 3.2.42(i), Proposition 3.2.41(i), and Fact 3.2.32 yield (λ1(f1 + q)∗ + · · ·+ λn(fn + q)∗)∗ = (λ1(f1 + q)∗)∗¤ · · ·¤(λn(fn + q)∗)∗ = λ1 ?(f1 + q)∗∗¤ · · ·¤λn ?(fn + q)∗∗ = λ1 ?(f1 + q)¤ · · ·¤λn ?(fn + q). ¥ 4.2 Properties Theorem 4.2.1 (Domain) domP(f ,λ) = λ1 dom f1 + · · ·+ λn dom fn Proof. Using Proposition 3.2.40i and Proposition 3.2.40 ii domP(f ,λ) = dom(λ1 ?(f1 + q)¤ · · ·¤λn ?(fn + q)) = dom(λ1 ?(f1 + q)) + · · ·+ dom(λn ?(fn + q)) = λ1 dom(f1 + q) + · · ·+ λn dom(fn + q) = λ1 dom f1 + · · ·+ λn dom fn. (4.3) ¥ Corollary 4.2.2 If at least one function fi has full domain and λi > 0 then P(f ,λ) has full domain. 32 4.2. Properties 4.2.1 Fenchel Conjugate Next we examine the conjugate of the proximal average. The purpose of this section is to give a new proof for (P(f ,λ))∗ = P(f∗,λ) without using Toland’s Duality Theorem. First we must prove the following lemma, which will also be used later to reformulate the proximal average. Lemma 4.2.3 The following identity holds n∑ i=1 λiq(yi)− q( n∑ i=1 λiyi) = 1 4 n∑ i=1 n∑ j=1 λiλj‖yi − yj‖2. (4.4) Proof. Consider first the left hand side of (4.4), n∑ i=1 λiq(yi)− q( n∑ i=1 λiyi) = 1 2 n∑ i=1 λi‖yi‖2 − 12‖ n∑ i=1 λiyi‖2 = 1 2 n∑ i=1 λi‖yi‖2 − 12 n∑ i=1 λ2i ‖yi‖2 − ∑ i6=j λiλj〈yi, yj〉. (4.5) On the other hand, from the right hand side we get, 33 4.2. Properties 1 4 n∑ i=1 n∑ j=1 λiλj‖yi − yj‖2 = 1 4 n∑ i=1 λi ( n∑ j=1 λj‖yi − yj‖2 ) = 1 4 n∑ i=1 λi ( n∑ j=1 (λj‖yi‖2 − 2λj〈yi, yj〉+ λj‖yj‖2) ) = 1 4 n∑ i=1 λi ( n∑ j=1 λj‖yi‖2 − 2 n∑ j=1 λj〈yi, yj〉+ n∑ j=1 λj‖yj‖2 ) = 1 4 n∑ i=1 λi ( ‖yi‖2 − 2 n∑ j=1 λj〈yi, yj〉+ n∑ j=1 λj‖yj‖2 ) = 1 4 ( n∑ i=1 λi‖yi‖2 − 2 n∑ i=1 n∑ j=1 λiλj〈yi, yj〉+ n∑ i=1 λi ( n∑ j=1 λj‖yj‖2 )) = 1 2 n∑ i=1 λi‖yi‖2 − 12 n∑ i=1 n∑ j=1 λiλj〈yi, yj〉 = 1 2 n∑ i=1 λi‖yi‖2 − 12 n∑ i=1 λ2i ‖yi‖2 − ∑ i6=j λiλj〈yi, yj〉. (4.6) Since (4.5) and (4.6) are equal, the proof is complete. ¥ The following lemma is new. Lemma 4.2.4 Let g(y1, · · · , yn) = λ1q(y1) + · · ·+ λnq(yn)− q(λ1y1 + · · ·+ λnyn), 34 4.2. Properties for (y1, · · · , yn) ∈ Hn and n∑ i=1 λn = 1. Then g∗(x∗1, · · · , x∗n) = λ1 ? q(x∗1) + · · ·+ λn ? q(x∗n), if x∗1 + · · ·+ x∗n = 0; +∞, otherwise. Proof. For every (x∗1, · · · , x∗n) ∈ Hn we have g∗(x∗1, · · · , x∗n) = sup y1,··· ,yn ( 〈x∗1, y1〉+ · · ·+ 〈x∗n, yn〉 − λ1q(y1)− · · · − λnq(yn) + q(λ1y1 + · · ·+ λnyn) ) . (4.7) In light of Lemma 4.2.3, the equation within the supremum is concave and therefore solving for critical points will yield the supremum. Taking the partial derivatives, with respect to yi for i = 1...n, and setting them equal to zero gives x∗1 − λ1y1 + (λ1y1 + · · ·+ λnyn)λ1 = 0 ... (4.8) x∗n − λnyn + (λ1y1 + · · ·+ λnyn)λn = 0. Then taking the sum of all of the above equations yields n∑ i=1 x∗i − n∑ i=1 λiyi + n∑ i=1 λi( n∑ i=1 λiyi) = 0 ⇔ n∑ i=1 x∗i − n∑ i=1 λiyi + n∑ i=1 λiyi ⇔ n∑ i=1 x∗i = 0. 35 4.2. Properties So x∗1 + · · · + x∗n = 0 and if we let y1 = x∗1/λ1, · · · , yn = x∗n/λn then (y1, · · · , yn) is a solution to (4.8). Consequently, λ1y1 + · · · + λnyn = 0 and 〈x∗i , x∗i /λi〉 = 2λiq(x∗i /λi) = 2(λi ? q)(x∗i ) for i = 1 · · ·n. This gives us that 〈x∗1, y1〉+ · · ·+ 〈x∗n, yn〉 − λ1q(y1)− · · · − λnq(yn) = 2 n∑ i=1 λi ? q(x∗i )− λ1 ? q(x∗1)− · · · − λn ? q(x∗n) = λ1 ? q(x∗1) + · · ·+ λn ? q(x∗n) If ∑ i x∗i 6= 0 then let y1 = y2 = · · · = yn = y and then (4.7) becomes g∗(x∗1, · · · , x∗n) ≥ sup y ( 〈 n∑ i=1 x∗i , y〉 − λ1q(y)− · · · − λnq(y) + q(y) ) ≥ sup y ( 〈 n∑ i=1 x∗i , y〉 − q(y) + q(y) ) ≥ sup y ( 〈 n∑ i=1 x∗i , y〉 ) = +∞. Thus, g∗(x∗1, · · · , x∗n) = λ1 ? q(x∗1) + · · ·+ λn ? q(x∗n), if x∗1 + · · ·+ x∗n = 0; +∞, otherwise. ¥ Remark 4.2.5 It can be noted that ∂g∂yi = λiyi − (λ1y1 + · · · + λnyn)λi for 36 4.2. Properties i = 1 · · ·n so that ∂g∂y1 + · · ·+ ∂g ∂yn = 0. This means that ran ∂g ⊆ {(x∗1, · · · , x∗n) : x∗1 + · · ·+ x∗n = 0}. Conversely, if x∗1+ · · ·+x∗n = 0 and we let y1 = x∗1/λ1, · · · , yn = x∗n/λn then ∇g(y1, · · · , yn) = (x∗1, · · · , x∗n) and {(x∗1, · · · , x∗n) : x∗1 + · · ·+ x∗n = 0} ⊆ ran ∂g. Therefore ran ∂g = {(x∗1, · · · , x∗n) : x∗1 + · · · + x∗n = 0}. Now since ran ∂g ⊆ dom g∗ ⊆ ran ∂g by Fact 3.2.24 and we have ran ∂g = ran ∂g then dom g∗ = {(x∗1, · · · , x∗n) : x∗1 + · · ·+ x∗n = 0}, as we saw in the previous lemma. Theorem 4.2.6 (Fenchel Conjugate) [1, Theorem 5.1] (P(f ,λ))∗ = P(f∗,λ) (4.9) Proof. Let f(x) = P(f ,λ)(x) = inf n∑ i=1 λiyi=x ( λ1f1(y1) + · · ·λnfn(yn) + λ1q(y1) + · · ·+ q(yn) − q(λ1y1 + · · ·+ λnyn) ) , and let A : Hn → H be a linear operator defined by A = ( λ1, · · · , λn ) , i.e. A(x1, · · · , xn) = n∑ i=1 λixi. Then A∗ : H → Hn, the adjoint of A, is 37 4.2. Properties A∗ = λ1 ... λn , i.e. A∗(z) = (λ1z, · · · , λnz). Then we can write f as f = AF where F (y) = λ1f1(y1)+· · ·+λnfn(yn)+λ1q(y1)+· · ·+λnq(yn)−q(λ1y1+· · ·+λnyn) and AF (y) := inf {x:Ax=y} F (x). For ease of notation, say that F = j + g where j(y) = λ1f1(y1) + · · · + λnfn(yn) and g(y) = λ1q(y1) + · · · + λnq(yn) − q(λ1y1 + · · · + λnyn). By Proposition 3.2.35, f∗(x∗) = (AF )∗(x∗) = (F ∗ ◦A∗)(x∗). Since j ∈ Γ0(H) and dom g = H×· · ·×H, then int(dom f)∩ int(dom g) 6= ∅ and we can use Fact 3.2.42(i) and Lemma 3.2.44 to get f∗(x∗) = (j∗¤ g∗)A∗(x∗) = ( (λ1 ? f∗1 + · · ·+ λn ? f∗n)¤ g∗ ) A∗(x∗). 38 4.2. Properties Then using Lemma 4.2.4 we get f∗(x∗) = ( (λ1 ? f∗1 + · · ·+ λn ? f∗n)¤ g∗ ) (λ1x∗, · · · , λnx∗) = inf y1,··· ,yn ( λ1f ∗ 1 (y1/λ1) + · · ·+ λnf∗n(yn/λn) + g∗(λ1x∗ − y1, · · · , λnx∗ − yn) ) = inf λ1x∗−y1+···+λnx∗−yn=0 ( λ1f ∗ 1 (y1/λ1) + · · ·+ λnf∗n(yn/λn) + λ1q( λ1x ∗ − y1 λ1 ) + · · ·+ λnq(λnx ∗ − yn λn ) ) = inf x∗=y1+···+yn ( λ1f ∗ 1 (y1/λ1) + · · ·+ λnf∗n(yn/λn) + λ1q(x∗ − y1 λ1 ) + · · · + λnq(x∗ − yn λn ) ) . Expanding the last set of terms in the above equation yields λiq(x∗ − yi λi ) = λi 2 ‖x∗ − yi λi ‖2 = λi 2 ‖x∗‖2 − 〈x∗, yi〉+ λi2 ‖ yi λi ‖2 = λiq(x∗)− 〈x∗, yi〉+ λiq( yi λi ) for all i = 1 · · ·n. Taking the sum of all of these terms and substituting back into the infimum equation produces f∗(x∗) = inf x∗=y1+···+yn ( λ1f ∗ 1 (y1/λ1) + · · ·+ λnf∗n(yn/λn) + q(x∗)− 〈x∗, y1 + · · ·+ yn〉 + λ1q( y1 λ1 ) + · · ·+ λnq( yn λn ) ) = inf x∗=y1+···+yn ( λ1f ∗ 1 (y1/λ1) + · · ·+ λnf∗n(yn/λn) + λ1q( y1 λ1 ) + · · ·+ λnq( yn λn ) − q(x∗) ) . 39 4.2. Properties Making the simple change of variable zi = yi/λi for i = 1 · · ·n generates f∗(x∗) = inf x∗=λ1z1+···+λnzn ( λ1f ∗ 1 (z1) + · · ·+ λnf∗n(zn) + λ1q(z1) + · · ·λnq(zn)− q(x∗) ) = P(f∗,λ). ¥ Example 4.2.7 Let f = (f, f∗) and λ = ( 1 2 , 1 2 ) , then P(f ,λ) = q. Proof. By Fact 4.2.6, (P(f ,λ))∗ = P(f∗,λ). Since f∗ = (f∗, f∗∗) = (f∗, f), then we get that P(f∗,λ) = P(f , λ). There- fore, using Example 3.2.29 we see that (P(f ,λ))∗ = q. ¥ Corollary 4.2.8 P(f ,λ) is convex, lower semicontinuous, and proper. Proof. We can apply Fact 4.2.6 twice to see that (P(f ,λ))∗∗ = (P(f∗,λ))∗ = P(f ,λ). Therefore, in light of Fact 3.2.32 P(f ,λ) ∈ Γ0(H). ¥ Definition 4.2.9 (Moreau Envelope) The Moreau envelope of f ∈ Γ0(H) with parameter µ > 0 is eµf = f ¤µ ? q. Fact 4.2.10 (Moreau Envelope of the Proximal Average) [1, Theo- rem 6.2] 40 4.3. Applications (i) eµP(f ,λ) = λ1eµf1 + · · ·+ λneµfn (ii) (eµP(f ,λ))∗ = λ1 ?(eµf1)∗¤ · · ·¤λn ?(eµfn)∗ Fact 4.2.11 (Proximal Mapping) [1, Theorem 6.7] ProxP(f ,λ) = λ1 Proxf1 + · · ·+ λn Proxfn 4.3 Applications 4.3.1 Self-dual Smooth Approximations A function f in Rn is smooth if f is finite and differentiable everywhere in Rn. It can be helpful in cases of nondifferentiable convex functions to find a smooth approximation of the function. Here, two smooth approximations are defined using the proximal average. The first smooth operator, Sβf , has a simple expression in terms of the Moreau envelope which can be seen in [9]. The second smooth operator, Tβf , has a simple expression in terms of the proximal average and is new. Both approximations are ”self-dual” in the sense that the conjugate of the approximation of f is equal to the approximation of the conjugate of f . Recall that, P(f1, λ1, · · · , fn, λn) := (λ1(f1 + q)∗ + · · ·+ λn(fn + q)∗)∗ − q. For 0 ≤ β ≤ 1 and a proper lower-semicontinuous function f , define Sβf : 41 4.3. Applications Rn → ]−∞,+∞] by Sβf(x) := (1 + β)2P(f, 1− β1 + β , q, 2β 1 + β )( x 1 + β ) (4.10) for all x ∈ Rn. Theorem 4.3.1 (i) (Sβf)∗ = Sβf∗ (ii) When 0 < β ≤ 1, we have Sβf = (1 + β)P(1− β, f, β, 0) + βq = (1− β)2(f ¤ 1 β q) + βq. (4.11) Therefore when β → 0, Sβf → lim β→0+ (f ¤ 1βq) = f . Proof. (i) By Theorem 4.2.6, we have (Sβf)∗ = ( (1 + β)2P(f, 1− β 1 + β , q, 2β 1 + β )( · 1 + β ) )∗ By Fact 3.2.37 and Proposition 3.2.41(i) we then get (Sβf)∗ = ( (1 + β)2P(f, 1− β 1 + β , q, 2β 1 + β ) )∗ ((1 + β)·) = (1 + β)2 ( P(f, 1− β 1 + β , q, 2β 1 + β ) )∗ ( (1 + β)· (1 + β)2 ) = (1 + β)2P ( f∗, 1− β 1 + β , q, 2β 1 + β ) ( · (1 + β) ) = Sβf∗. (ii) For every x, by the definition of the proximal average, Proposi- 42 4.3. Applications tion 3.2.41(i), and Example 3.2.29 Sβf(x) = (1 + β)2 [( 1− β 1 + β (f + q)∗ + 2β 1 + β (q+ q)∗ )∗ ( x 1 + β )− 1 2 ‖x‖2 (1 + β)2 ] = (1 + β)2 ( 1− β 1 + β (f + q)∗ + β 1 + β q )∗ ( x 1 + β )− q(x) = (1 + β) ( (1− β)(f + q)∗ + βq )∗ (x)− q(x) = (1 + β) [( (1− β)(f + q)∗ + βq )∗ (x)− q(x) ] + βq(x). This is the first equality in (4.11). To continue, apply Fact 3.2.42(i), Fact 3.2.32, Example 3.2.29, and Proposition 3.2.41(i) to Sβf = (1 + β) ( (1− β)(f + q)∗ + βq )∗ (x)− q(x), to get Sβf(x) = (1 + β) [((1− β)(f + q)∗)∗¤(βq)∗] (x)− q(x) = (1 + β) [ ((1− β)(f + q)( · 1− β ))¤ 1 β q ] (x)− q(x). Using Definition 3.2.38, Sβf(x) = (1 + β) inf u [ (1− β)(f + q)( u 1− β ) + 1 β q(x− u) ] − q(x) = (1 + β) inf u [ (1− β)f( u 1− β ) + (1− β)q( u 1− β ) + 1 β q(x− u) ] − q(x) = (1− β2) inf u [ f( u 1− β ) + q( u 1− β ) + 1 β(1− β)q(x− u)− 1 1− β2 q(x) ] . (4.12) 43 4.3. Applications Simplifying the portion of (4.12) containing q and using 1 (1−β)2+ 1 β(1−β) = 1 β(1−β)2 and 1 β(1−β) − 1(1−β2) = 1β(1−β2) q( u 1− β ) + 1 β(1− β)q(x− u)− 1 1− β2 q(x) = 1 (1− β)2 ‖u‖2 2 + 1 β(1− β) ‖x‖2 2 − 1 β(1− β)〈x, u〉+ 1 β(1− β) ‖u‖2 2 − 1 1− β2 ‖x‖2 2 = 1 β(1− β)2 ‖u‖2 2 − 1 β(1− β)〈x, u〉+ 1 β(1− β2) ‖x‖2 2 = 1 2β ( ‖u‖2 (1− β)2 − 2〈x, u 1− β 〉+ ‖x‖ 2) + ( 1 β(1− β2) − 1 β ) ‖x‖2 2 = 1 2β ‖x− u 1− β ‖ 2 + β 1− β2 ‖x‖2 2 . Plugging this back into (4.12) gives Sβf(x) = (1− β2) inf u ( f( u 1− β ) + 1 β q(x− u 1− β ) + β 1− β2 q(x) ) = (1− β2) inf w ( f(w) + 1 β q(x− w) ) + βq(x) = (1− β2) ( f ¤ 1 β q ) (x) + βq(x), which is the second equality of (4.11). The convergence result follows from [13, Theorem 1.25]. ¥ Another smooth operator is defined by Tβf := P(f, 1− β, q, β). (4.13) Theorem 4.3.2 (i) (Tβf)∗ = Tβf∗ 44 4.3. Applications (ii) When 0 < β ≤ 1, we have Tβf = (1− β) ( f ¤ 2− β β q ) ( 2 2− β ·) + β 2− β q. Proof. (i) This follows from Theorem 4.2.6 and Example 3.2.29 (Tβf)∗ = (P(f, 1− β, q, β))∗ = P(f∗, 1− β, q∗, β) = P(f∗, 1− β, q, β) = Tβf∗ (ii) Applying Proposition 4.1.2(ii), Proposition 3.2.41(i), Example 3.2.29, Fact 3.2.42(i), and Definition 3.2.38, Tβf(x) = ((1− β)(f + q)∗ + β(2q)∗)∗(x)− q(x) = ( (1− β)(f + q)∗ + β q 2 )∗ (x)− q(x) = ( (1− β)(f + q)( · 1− β )¤ 2 β q ) (x)− q(x) = inf u ( (1− β)f( u 1− β ) + (1− β)q( u 1− β ) + 2 β q(x− u) ) − q(x). This is equivalent to, Tβf(x) = (1−β) inf u ( f( u 1− β ) + q( u 1− β ) + 2 β(1− β)q(x− u)− 1 1− β q(x) ) . (4.14) 45 4.3. Applications Note that q( u 1− β ) + 2 β(1− β)q(x− u)− 1 1− β q(x) = 1 (1− β)2 ‖u‖2 2 + 2 β(1− β) ‖x‖2 − 2〈x, u〉+ ‖u‖2 2 − 1 1− β ‖x‖2 2 = 2− β β(1− β)2 ‖u‖2 2 − 2〈x, u〉 β(1− β) + 2− β β(1− β) ‖x‖2 2 = 2− β 2β ( ‖u‖2 (1− β)2 − 2〈 2x 2− β , u 1− β 〉+ ‖ 2x 2− β ‖ 2 ) + ( 2− β β(1− β) − 4 β(2− β) ) ‖x‖2 2 = 2− β 2β ‖ 2x 2− β − u 1− β ‖ 2 + β (1− β)(2− β) ‖x‖2 2 . Substitute this back into (4.14) to get Tβf(x) = (1− β) inf u ( f( u 1− β ) + 2− β 2β ‖ 2x 2− β − u 1− β ‖ 2 ) + β 2− β ‖x‖2 2 = (1− β) inf w ( f(w) + 2− β β q( 2x 2− β − w) ) + β 2− β q(x) = (1− β) ( f ¤ 2− β β q ) ( 2x 2− β ) + β 2− β q(x), which proves the desired equality. ¥ 46 Chapter 5 The Kernel Average of Two Functions 5.1 Definition The kernel average for two functions is given by Bauschke and Wang in [3] as a generalization of the proximal average. A natural extension of the definition of the proximal average, the kernel average replaces the use of q with any kernel function g. Using the same notation as the previous chapter, we assume f1, f2, and g are functions in Γ0(H), and λ1, λ2 are strictly positive real numbers such that λ1 + λ2 = 1. Definition 5.1.1 (Kernel Average) Let f = (f1, f2) and λ = (λ1, λ2), we define P (f ,λ, g) : H → [−∞,+∞] at x ∈ H by P (f ,λ, g)(x) := inf λ1y1+λ2y2=x (λ1f1(y1) + λ2f2(y2) + λ1λ2g(y1 − y2)) = inf x=z1+z2 ( λ1f1( z1 λ1 ) + λ2f2( z2 λ2 ) + λ1λ2g( z1 λ1 − z2 λ2 ) ) . (5.1) We call this the average of f1 and f2 with respect to the kernel g, or the g-average of f1 and f2. 47 5.1. Definition Example 5.1.2 (Arithmetic Average) Set g = ι{0}, then P (f ,λ, g)(x) = inf λ1y1+λ2y2=x (λ1f1(y1) + λ2f2(y2) + λ1λ2ι0(y1 − y2)) = inf λ1y1+λ2y1=x (λ1f1(y1) + λ2f2(y1)) = λ1f1(x) + λ2f2(x) is the arithmetic average. Lemma 5.1.3 The following equality holds when λ1 + λ2 = 1. λ1λ2‖y1 − y2‖2 = λ1‖y1‖2 + λ2‖y2‖2 − ‖λ1y1 + λ2y2‖2. Proof. Let y1, y2 ∈ H then by Lemma 4.2.3, λ1‖y1‖2 + λ2‖y2‖2 − ‖λ1y1 + λ2y2‖2 = 2 ( 1 4 2∑ i=1 2∑ j=1 λ1λj‖yi − yj‖2 ) = 2 ( 1 4 λ1λ2‖y1 − y2‖2 + 14λ2λ1‖y2 − y1‖ 2 ) = λ1λ2‖y1 − y2‖2 ¥ Example 5.1.4 (Proximal Average) If g = q, then P (f ,λ, g)(x) = inf λ1y1+λ2y2=x ( λ1f1(y1) + λ2f2(y2) + λ1λ2 1 2 ‖y1 − y2‖2 ) . 48 5.2. Properties Applying Lemma 5.1.3 P (f ,λ, g) = inf λ1y1+λ2y2=x ( λ1f1(y1)+λ2f2(y2)+ 1 2 λ1‖y1‖2+12λ2‖y2‖ 2−1 2 ‖x‖2 ) , which is the proximal average with n = 2. Example 5.1.5 Let f1 = ι{a} and f2 = ι{b}, with a, b ∈ R. Then P (f ,λ, g) = λ1λ2g(a− b) if x = λ1a+ λ2b +∞ otherwise. 5.2 Properties Fact 5.2.1 (Fenchel Conjugate) [3, Theorem 2.2] Let f1, f2, g ∈ Γ0(H). For every x∗ ∈ H, (P (f ,λ, g))∗(x∗) = (clϕ)(λ1x∗, λ2x∗) (5.2) where ϕ(u, v) = inf λ1z1+λ2z2=u+v ( λ1f ∗ 1 (z1) + λ2f ∗ 2 (z2) + 1 2 λ1λ2(g∗( u λ1λ2 − z1 λ2 ) + g∗( −v λ1λ2 − z2 λ1 )) ) . Furthermore, if (ri dom f1− ri dom f2) ⋂ ri dom g 6= ∅ then the closure oper- ation in (5.2) can be omitted and we get that (P (f ,λ, g))∗(x∗) = P (f∗,λ, (g∗)∨) (5.3) 49 5.2. Properties where for a given function g ∈ Γ0(H), let g∨(x) = g(−x), and the infimum in Definition 5.1.1 is attained, that is P (f∗,λ, (g∗)∨)(x∗) = min x∗=λ1z1+λ2z2 (λ1f∗1 (z1) + λ2f ∗ 2 (z2) + λ1λ2g ∗(z2 − z1)) . (5.4) Corollary 5.2.2 Let f1, f2, g ∈ Γ0(H), and assume that both g and g∗ have full domain. Then both P (f ,λ, g) and P (f∗,λ, (g∗)∨) are in Γ0(H) and (P (f ,λ, g))∗ = P (f∗,λ, (g∗)∨). (5.5) In particular, for g = 1p‖ · ‖p with p > 1, we have (P (f ,λ, 1 p ‖ · ‖p))∗ = P (f∗,λ, 1 q ‖ · ‖q) (5.6) where 1p + 1 q = 1. Can the definition of the kernel average be generalized for n functions? What is the reformulation of Definition 5.1.1? This will be addressed in the next chapter. 50 Chapter 6 The Kernel Average of n Functions 6.1 Motivation In this chapter, we look at another reformulation of the proximal average and see how that reformulation can be used to extend the definition of the kernel average to n functions. Similar to the previous chapters, we assume f1, · · · , fn and g are functions in Γ0(H), and λ1, · · · , λn are strictly positive real numbers such that n∑ i=1 λi = 1. First we will prove a new reformulation of the proximal average. Theorem 6.1.1 Let f = (f1, · · · fn) with f1, · · · , fn ∈ Γ0(H), and λ = (λ1, · · · , λn) with λ1, · · · , λn ≥ 0 and n∑ i=1 λi = 1. Define f := P(f ,λ) = (λ1(f1 + q)∗ + · · ·+ λn(fn + q)∗)∗ − q. Then for every x ∈ H, f(x) = inf λ1y1+···λnyn=x λ1f1(y1) + · · ·+ λnfn(yn) + 12 ∑ i<j λiλj‖yi − yj‖2. 51 6.1. Motivation Proof. By Proposition 4.1.2(i) and Lemma 4.2.3 f(x) = inf n∑ i=1 λiyi=x ( n∑ i=1 λifi(yi) + n∑ i=1 λiq(yi)− q(x) ) = inf n∑ i=1 λiyi=x ( n∑ i=1 λifi(yi) + n∑ i=1 λiq(yi)− q( n∑ i=1 λiyi) ) = inf∑ λiyi=x (∑ λifi(yi) + 1 4 n∑ i=1 n∑ j=1 λiλj‖yi − yj‖2 ) = inf∑ λiyi=x (∑ λifi(yi) + ∑ i<j λiλj 1 2 ‖yi − yj‖2 ) . ¥ This reformulation of the proximal average suggests a generalization where 12‖ · ‖2 is replaced by a function g. We’ll call this generalization the kernel average of n functions, defined by Qg(f ,λ)(x) := inf∑ λiyi=x (∑ λifi(yi) + ∑ i<j λiλjg(yi − yj) ) . (6.1) This definition is the same as the kernel average when n = 2, but extends the kernel average definition by allowing more than two functions. We’ll now explore the kernel average for n functions in a bit more detail, and from here on when we refer to the kernel average we mean Qg(f ,λ). 52 6.2. The Kernel Average Conjugate 6.2 The Kernel Average Conjugate We will now consider the kernel average as Qg(f ,λ)(x) = inf Ay=x {h(y) := f̃(y) + g̃(y)} = (Ah)(x), where A(x1, · · · , xn) = n∑ i=1 λixiAh = inf{Ay = x}{h(x)}, f̃(y) = ∑ λifi(yi), and g̃(y) = ∑ i<j λiλjg(yi − yj). In light of Proposition 3.2.35, we get Q∗g(x ∗) = (h∗ ◦A∗)(x∗) = ( (f̃ + g̃)∗ ◦A∗ ) (x∗), so we can see that to get Q∗g we need to compute h∗ = (f̃ + g̃)∗, which by Fact 3.2.42 we have h∗ = f̃∗¤ g̃∗. It is quite simple to compute f̃∗, and this was done in Lemma 3.2.44, but g̃∗ is more challenging. To consider g̃∗ = ( ∑ i<j λiλjg(yi − yj) )∗, we’ll first begin with the case which gives us the proximal average, where g = q, with the hope that this will allow us to find a general formula for any g. Proposition 6.2.1 Let x = (x1, · · · , xn) and g1(x) = n∑ i=1 n∑ j=1 1 2‖xi − xj‖2 = 53 6.2. The Kernel Average Conjugate 2 ∑ i<j 1 2‖xi − xj‖2 then g∗1(x ∗) = 1 8n2 n∑ i=1 n∑ j=1 ‖x∗i − x∗j‖2, if x∗1 + · · ·x∗n = 0; +∞, otherwise. Proof. Let λi = 1n for i = 1 · · ·n, then g1 can be rewritten as g1(x) = n∑ i=1 n∑ j=1 1 2 ‖xi − xj‖2 = (2n2) · 14 n∑ i=1 n∑ j=1 ‖xi − xj‖2. Using Lemma 4.2.3, g1(x) = 2n2 ( n∑ i=1 λiq(xi)− q( n∑ i=1 λixi) ) = 2n2 · g(x), where g(x) is as defined in Lemma 4.2.4. Then by Proposition 3.2.41(i) and Lemma 4.2.4 g∗1(x ∗) = 2n2g∗( x∗ 2n2 ) = 2n2 ( n∑ i=1 λi ? q( x∗i 2n2 ) ) , if x ∗ 1 2n2 + · · ·+ x∗n 2n2 = 0; +∞, otherwise. 54 6.2. The Kernel Average Conjugate Now, λi ? q( x∗i 2n2 ) = 1nq( nx∗i 2n2 ) = 1 4n3 q(x∗i ) so 2n2 ( n∑ i=1 λi ? q( x∗i 2n2 ) ) = 1 2n n∑ i=1 q(x∗i ) = 1 2 ( n∑ i=1 1 n q(x∗i ) ) = 1 2 ( n∑ i=1 λiq(x∗i ) ) . Applying Lemma 4.2.3 2n2 ( n∑ i=1 λi ? q( x∗i 2n2 ) ) = 1 2 ( q( n∑ i=1 λix ∗ i ) + 1 4 n∑ i=1 n∑ j=1 λiλj‖x∗i − x∗j‖2 ) . Since x ∗ 1 2n2 + · · ·+ x∗n 2n2 = 0⇔ 12n ( λ1x ∗ 1+ · · ·+ λnx∗n ) = 0⇔ n∑ i=1 λix ∗ i = 0, we get 2n2 ( n∑ i=1 λi ? q( x∗i 2n2 ) ) = 1 2 ( 1 4 n∑ i=1 n∑ j=1 λiλj‖x∗i − x∗j‖2 ) = 1 8n2 n∑ i=1 n∑ j=1 ‖x∗i − x∗j‖2. Altogether, g∗1(x ∗) = 1 8n2 n∑ i=1 n∑ j=1 ‖x∗i − x∗j‖2, if x∗1 + · · ·+ x∗n = 0; +∞, otherwise. ¥ Proposition 6.2.1 can also be proven directly using critical points. Proof. 55 6.2. The Kernel Average Conjugate By definition, g∗1(x ∗) = sup x1,··· ,xn ( 〈x∗1, x1〉+ · · ·+ 〈x∗n, xn〉 − n∑ i=1 n∑ j=1 1 2 ‖xi − xj‖2 ) . Let ĝ1(x) = 〈x∗1, x1〉 + · · · + 〈x∗n, xn〉 − n∑ i=1 n∑ j=1 1 2‖xi − xj‖2. Since ∂g1∂xi = 2 n∑ j=1 (xi − xj), we get ∇ĝ1(x) = (x∗1 − 2 n∑ l=1 (x1 − xl), · · · , x∗n − 2 n∑ l=1 (xn − xl)). Setting this equal to zero to solve for the critical points, we see (x∗1, · · · , x∗n) = (2 n∑ l=1 (x1 − xl), · · · , 2 n∑ l=1 (xn − xl)), and therefore that x∗1 + · · ·+ x∗n = 0. And we also get (x∗i − x∗j ) = 2 n∑ l=1 (xi − xl)− (xj − xl) = 2 n∑ l=1 (xi − xj) = 2n(xi − xj), which implies that (xi − xj) = 12n(x∗i − x∗j ) for all i, j with 1 ≤ i, j ≤ n. Since ĝ1 is a sum of linear and concave functions, then ĝ1 is concave. Thus, the critical point is a maximum and we can substitute into g∗1 to find the supremum. 56 6.2. The Kernel Average Conjugate Doing this, we find g∗1(x ∗) = 〈x∗1, x1〉+ · · ·+ 〈x∗n−1, xn−1〉+ 〈−x∗1 − · · · − x∗n−1, xn〉 − n∑ i=1 n∑ j=1 1 2 ( 1 2n )2‖x∗i − x∗j‖2 = 〈x∗1, x1 − xn〉+ · · ·+ 〈x∗n−1, xn−1 − xn〉 − 1 8n2 n∑ i=1 n∑ j=1 ‖x∗i − x∗j‖2 = 〈x∗1, 1 2n (x∗1 − x∗n)〉+ · · ·+ 〈x∗n−1, 1 2n (x∗n−1 − x∗n)〉 − 1 8n2 n∑ i=1 n∑ j=1 ‖x∗i − x∗j‖2 = 1 2n (〈x∗1, x∗1〉+ · · ·+ 〈x∗n−1, x∗n−1〉)− 1 2n 〈x∗1, x∗n〉 − · · · − 1 2n 〈x∗n−1, x∗n〉 − 1 8n2 n∑ i=1 n∑ j=1 ‖x∗i − x∗j‖2 = 1 2n (‖x∗1‖2 + · · ·+ ‖x∗n−1‖2 + 1 2n 〈−x∗1 − · · · − x∗n−1, x∗n〉 − 1 8n2 n∑ i=1 n∑ j=1 ‖x∗i − x∗j‖2 = 1 2n n∑ i=1 ‖x∗i ‖2 − 1 8n2 n∑ i=1 n∑ j=1 ‖x∗i − x∗j‖2 = 1 2n n∑ i=1 ‖x∗i ‖2 − 1 2 ‖x ∗ 1 + · · ·+ x∗n n ‖2 − 1 8n2 n∑ i=1 n∑ j=1 ‖x∗i − x∗j‖2. Then using (4.2.3), we get that g∗1(x ∗) = 1 4n2 n∑ i=1 n∑ j=1 ‖x∗i − x∗j‖2 − 1 8n2 n∑ i=1 n∑ j=1 ‖x∗i − x∗j‖2 = 1 8n2 n∑ i=1 n∑ j=1 ‖x∗i − x∗j‖2 when x∗1+ · · ·+ x∗n = 0. If x∗1+ · · ·+ x∗n 6= 0 then set x1 = · · · = xn = x and 57 6.2. The Kernel Average Conjugate get that g∗1(x ∗) ≥ sup x ( 〈 n∑ i=1 x∗i , x〉 ) = +∞. ¥ This formula can be written in several equivalent forms using the fact that n∑ i=1 x∗i = 0. Using this, we can see that n∑ i=1 n∑ j=1 ‖x∗i − x∗j‖2 = n∑ i=1 n∑ j=1 (‖x∗i ‖2 + ‖x∗j‖2 − 2〈x∗i , x∗j 〉) = n∑ i=1 ( n∑ j=1 (‖x∗i ‖2 + ‖x∗j‖2)− 2〈x∗i , n∑ j=1 x∗j 〉 ) = n∑ i=1 ( n∑ j=1 (‖x∗i ‖2 + ‖x∗j‖2)− 0 ) = n∑ i=1 ( n∑ j=1 (‖x∗i ‖2 + ‖x∗j‖2) + 2〈x∗i , n∑ j=1 x∗j 〉 ) = n∑ i=1 n∑ j=1 ‖x∗i + x∗j‖2. And from the first proof of Proposition 6.2.1 we also see that g∗1(x ∗) = 1 8n2 n∑ i=1 n∑ j=1 ‖x∗i − x∗j‖2 = 1 2n n∑ i=1 1 2 ‖x∗i ‖2. So that the following three formulations for the conjugate of g1 are equiva- lent: (1) g∗1(x∗) = 1 8n2 n∑ i=1 n∑ j=1 ‖x∗i − x∗j‖2 = n∑ i=1 n∑ j=1 1 2‖ x∗i−x∗j 2n ‖2 (2) g∗1(x∗) = 1 8n2 n∑ i=1 n∑ j=1 ‖x∗i + x∗j‖2 = n∑ i=1 n∑ j=1 1 2‖ x∗i+x ∗ j 2n ‖2 58 6.3. P-norm Kernel Conjugate for General p Case when n = 3 (3) g∗1(x∗) = 1 2n n∑ i=1 1 2‖xi‖2 when x∗1 + · · ·+ x∗n = 0 and g∗1(x∗) = +∞ otherwise. The next step we wish to consider is the case where g = 12 n∑ i=1 n∑ j=1 1 p‖xi− xj‖p for both general p > 1 and general n > 1. The conjugate is known for general p > 1 and n = 2, so in the next section we begin looking at general p with n = 3. Example 6.2.2 (General p, n = 2) Let f(x1, x2) = 1p‖x1−x2‖p with p > 1. Then, f∗(y1, y2) = sup x1,x2 ( 〈x1, y1〉+ 〈x2, y2〉 − 1 p ‖x1 − x2‖p ) = sup x1,x2 ( 〈x1 − x2, y1〉+ 〈x2, y1 + y2〉 − 1 p ‖x1 − x2‖p ) . And using Example 3.2.30, with 1p + 1 q = 1, we get f∗(y1, y2) = 1 q‖y1‖q, if y1 + y2 = 0; +∞, otherwise. 6.3 P-norm Kernel Conjugate for General p Case when n = 3 We now wish to consider the case where g2 = 12 3∑ i=1 3∑ j=1 1 p‖xi − xj‖p, that is g2(x1, x2, x3) = 1 p ‖x1 − x2‖p + 1 p ‖x2 − x3‖p + 1 p ‖x3 − x1‖p. 59 6.3. P-norm Kernel Conjugate for General p Case when n = 3 Then g∗2(y1, y2, y3) is equal to sup x1,x2,x3 {x1y1 + x2y2 + x3y3 − 1 p ‖x1 − x2‖p − 1 p ‖x2 − x3‖p − 1 p ‖x3 − x1‖p} = sup x1,x2 {x1y1 + x2y2 − 1 p ‖x1 − x2‖p + sup x3 (x3y3 − 1 p ‖x2 − x3‖p − 1 p ‖x3 − x1‖p)} (6.2) Recognizing that sup x3 {x3y3 − 1p‖x2 − x3‖p − 1p‖x3 − x1‖p} = (1p‖x1 − ·‖p + 1 p‖x2 − ·‖p)∗(y3), then applying Fact 3.2.42(i) sup x3 {x3y3− 1 p ‖x2−x3‖p− 1 p ‖x3−x1‖p} = ((1 p ‖·−x1‖p)∗¤(1 p ‖·−x2‖p)∗)(y3) (6.3) Using Proposition 3.2.28 and Example 3.2.30 we get ( 1 p ‖x− z‖p)∗(y) = 〈y, z〉+ 1 q ‖y‖q. (6.4) Combining (6.4) and (6.3) (( 1 p ‖ · −x1‖p)∗¤(1 p ‖ · −x2‖p)∗)(y3) = (1 q ‖ · ‖q + 〈x1, ·〉)¤(1 q ‖ · ‖q + 〈x2, ·〉) = inf u+v=y3 {1 q ‖u‖q + 1 q ‖v‖q + 〈x1, u〉+ 〈x2, v〉}. (6.5) Substituting (6.5) back into (6.2) and setting v = y3 − u yields g∗2(y1, y2, y3) = sup x1,x2 inf u [〈x1, y1〉+ 〈x2, y2〉 − 1 p ‖x1 − x2‖p + 1 q ‖u‖q + 1 q ‖u− y3‖q + 〈x1, u〉+ 〈x2, y3 − u〉]. 60 6.3. P-norm Kernel Conjugate for General p Case when n = 3 Looking at the above equation we can see that F ((x1, x2), u) = 〈x1, y1〉+〈x2, y2〉−1 p ‖x1−x2‖p+1 q ‖u‖q+1 q ‖u−y3‖q+〈x1, u〉+〈x2, y3−u〉 is concave-convex since F ((x1, x2), ·) is a sum of linear and convex functions and F (·, u) is a sum of linear and concave functions. Fix x = (x1, x2) ∈ H × H, and let F ((x1, x2), u) = Fx(u). Using Fact 3.4.3 we show that a saddle value exists by showing that 0+(epiFx) = {(0, λ) : λ ≥ 0}. By definition, 0+(epiFx) = epiF∞x , so (u, λ) ∈ 0+(epiFx)⇔ (u, λ) ∈ epiF∞x ⇔ λ ≥ F∞x (u). Using Proposition 3.2.19 to compute F∞x (u) yields F∞x (u) = lim λ→∞ ( Fx(λu)− Fx(0) λ ) = lim λ→∞ ( 1 q‖λu‖q + 1q‖λu− y3‖q + 〈x1, λu〉+ 〈x2, y3 − λu〉 − 1q‖y3‖q − 〈x2, y3〉 λ ) = lim λ→∞ ( λq 1q‖u‖q + λq 1q‖u− y3λ ‖q + λ〈x1, u〉+ λ〈x2, y3λ − u〉 λ ) = +∞, if u 6= 0; 0, if u = 0. Therefore (u, λ) ∈ epiF∞x ⇔ u = 0. Since there is no common nonzero direction of recession for Fx, we can use Fact 3.4.3 to swap the positions of 61 6.3. P-norm Kernel Conjugate for General p Case when n = 3 the infimum and supremum, and combining the appropriate inner product terms produces g∗2(y1, y2, y3) = infu supx1,x2 ( 〈x1, y1+u〉+〈x2, y2+y3−u〉−1 p ‖x1−x2‖p+1 q ‖u‖q+1 q ‖u−y3‖q ) . (6.6) Considering the inner supremum first, we will fix x1 and let x2−x1 = z. Then (6.6) becomes inf u ( 1 q ‖u‖q + 1 q ‖u− y3‖q + sup x1 〈x1, y1 + u〉+ sup z 〈x1 + z, y2 + y3 − u〉 − 1 p ‖z‖p ) = inf u ( 1 q ‖u‖q + 1 q ‖u− y3‖q + sup x1 〈x1, y1 + y2 + y3〉+ sup z 〈z, y2 + y3 − u〉 − 1 p ‖z‖p ) . Here we can see that the supremum on the right is the definition of (1p‖·‖p)∗ evaluated at y2+y3−u. Using Example 3.2.30 and the fact that y1+y2+y3 = 0 we then get g∗2(y1, y2, y3) = infu ( 1 q ‖u‖q + 1 q ‖u− y3‖q + 1 q ‖y2 + y3 − u‖q ) where y1 + y2 + y3 = 0 and 1p + 1 q = 1. Since g2 is symmetric under permutation of its variables, we interchange y1 and y3 so that the previous description of g∗2 turns into the more sym- metric form g∗2(y1, y2, y3) = infx ( 1 q ‖x−y1‖q+ 1 q ‖x−(y1+y2)‖q+ 1 q ‖x−(y1+y2+y3)‖q ) where y1 + y2 + y3 = 0 and 1p + 1 q = 1. 62 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 In R, the problem becomes min x∈R ( 1 q |x− y1|q + 1 q |x− (y1 + y2)|q + 1 q |x|q ) (6.7) with 1p + 1 q = 1 and y1 + y2 + y3 = 0. We will define h(x) := 1 q |x− y1|q + 1 q |x− (y1 + y2)|q + 1 q |x|q, (6.8) so that we are solving min x∈R h(x). Since the case with p = 2 and q = 2 was already solved, we consider this problem with the next simplest case, q = 3 which corresponds to p = 32 . 6.4 P-norm Kernel Conjugate when p = 32, q = 3, and n = 3 Considering (6.8) with q = 3 gives the problem min x∈R h(x) (6.9) where h(x) = 1 3 |x− y1|3 + 13 |x− (y1 + y2)| 3 + 1 3 |x|3, (6.10) and y1, y2 are constants. In order to find the optimal value of (6.9) we need to consider eight cases, which cover all possible values of the three absolute values in (6.10). The eight cases are as follows: 63 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 (1) x ≥ y1, x ≥ y1 + y2, and x ≥ 0 (2) x ≤ y1, x ≥ y1 + y2, and x ≥ 0 (3) x ≥ y1, x ≤ y1 + y2, and x ≥ 0 (4) x ≥ y1, x ≥ y1 + y2, and x ≤ 0 (5) x ≤ y1, x ≤ y1 + y2, and x ≥ 0 (6) x ≤ y1, x ≥ y1 + y2, and x ≤ 0 (7) x ≥ y1, x ≤ y1 + y2, and x ≤ 0 (8) x ≤ y1, x ≤ y1 + y2, and x ≤ 0 We now consider each case in depth, and set y = (y1, y2). 6.4.1 Case 1: x ≥ y1, x ≥ y1 + y2, x ≥ 0 Using the constraints of this case, we define the function to be minimized as h1,y(x) = 1 3 (x− y1)3 + 13(x− (y1 + y2)) 3 + 1 3 x3, and we minimize h1,y over the region where max{y1, y1+y2, 0} ≤ x. Because h1,y(x) is convex with respect to x, see Example 3.2.31, then by Fact 3.2.23 any critical point will be the minimizer. Differentiating to solve for critical points yields ∂h1,y ∂x = (x− y1)2 + (x− (y1 + y2))2 + x2. 64 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 Setting ∂h1,y∂x = 0 and solving for x, we get the critical points x1 = 1 3 y2 + 2 3 y1 + 1 3 √ −2y22 − 2y1y2 − 2y21 x2 = 1 3 y2 + 2 3 y1 − 13 √ −2y22 − 2y1y2 − 2y21. To see if the critical points are real, the value of −2y22 − 2y1y2 − 2y21 must be examined and we see that −2y22 − 2y1y2 − 2y21 = −(y21 + y22)− (y21 + 2y1y2 + y22) = −(y21 + y22)− (y1 + y2)2 ≤ 0. Since −2y22 − 2y1y2 − 2y21 ≤ 0 for all values of y1 and y2 then there are no real critical points of h1,y, so we next check the boundary points, x ≥ max{0, y1, y1 + y2}. If max{0, y1, y1 + y2} = y1 + y2, i.e. when y1 ≥ 0 and y2 ≥ 0, or when y2 ≥ 0 and −y2 ≤ y1 ≤ 0, then the minimum value of h1y1,y2(x) is h1,y(y1 + y2) = 1 3 y32 + 1 3 (y1 + y2)3. If max{0, y1, y1 + y2} = y1, or rather when y1 ≥ 0 and y2 ≤ 0, then we get a minimum value at h1,y(y1) = 1 3 |y2|3 + 13y 3 1. If max{0, y1, y1+ y2} = 0, i.e. if y1 ≤ 0 and y2 ≤ 0, or y2 ≥ 0 and y2 ≤ −y1, 65 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 then the minimum value is h1,y(0) = 1 3 |y1|3 + 13 |y1 + y2| 3. This case is summarized graphically in Figure 6.1. Figure 6.1: Case 1 summary 66 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 6.4.2 Case 2: x ≤ y1, x ≥ y1 + y2, x ≥ 0 The conditions for this case give the following function, h2,y(x) = 1 3 (y1 − x)3 + 13(x− (y1 + y2)) 3 + 1 3 x3, for minimization on the region where max{0, y1 + y2} ≤ x ≤ y1. Again, differentiating to find the critical points gives us ∂h2,y ∂x = −(y1 − x)2 + (x− (y1 + y2))2 + x2. And solving ∂h2,y∂x = 0 yields the two critical points x1 = y2 + √ −2y1y2 x2 = y2 − √ −2y1y2. Considering the constraints, we see that y1 ≥ x ≥ 0 and y1+y2 ≤ x ≤ y1, so that y1 ≥ 0 and y2 ≤ 0 is the region of interest. This makes −2y1y2 ≥ 0 so that the critical points are real. We need x ≥ 0, but x2 ≤ 0 for all y1, y2 in this region, and if x2 = 0 then y2 = y1 = 0 and x1 = x2 = 0. Hence, it suffices to consider only x1. Next, we check that x1 satisfies the three conditions of this case: x ≥ 0, x ≤ y1, and x ≥ y1 + y2. For x1 = y2 + √−2y1y2 ≥ 0, it is required that −y2 = |y2| ≤ √ −2y1y2 ⇔ y22 ≤ −2y1y2 ⇔ |y2| ≤ 2y1 ⇔ 1 2 |y2| ≤ y1. 67 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 For x1 ≤ y1, this is equivalent to y2 + √ −2y1y2 ≤ y1 ⇔ −2y1y2 ≤ (y1 − y2)2 ⇔ 0 ≤ y21 + y22. Therefore this condition is always true. For x1 ≥ y1 + y2, we get y2 + √ −2y1y2 ≥ y1 + y2 ⇔ √ −2y1y2 ≥ y1 ⇔ 2|y2| ≥ y1. Since h2,y is convex with respect to x, then if these three conditions hold then x1 is the minimizer. So if 12 |y2| ≤ y1 ≤ 2|y2| then the critical point is the minimizer and we get the minimum value of h2,y(x) is h2,y(x1) = −13y1y2(3y1 − 3y2 − 4 √ −2y1y2) If the conditions for the critical point are not satisfied then we must check the boundary conditions. In this case the boundary points are max{y1 + y2, 0} ≤ x ≤ y1, and we consider two subcases: Subcase 1: y1 + y2 ≥ 0 To determine which is the minimum, we evaluate the difference between the upper boundary value and the lower boundary value. When y1 + y2 ≥ 0, the max{y1 + y2, 0} = y1 + y2 so we look at the difference h2,y(y1)− h2,y(y1 + y2) = 13(−y2) 3 + 1 3 y31 − 1 3 (−y2)3 − 13(y1 + y2) 3 = 1 3 y31 − 1 3 (y1 + y2)3 = −13y 3 2 − y1y2(y1 + y2). 68 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 Since y1 ≥ 0, y2 ≤ 0 and y1+ y2 ≥ 0 the above difference is always positive. This means that the minimum value is h2,y(y1 + y2) = 1 3 (y1 + y2)3 − 13y 3 2. Subcase 2: y1 + y2 ≤ 0 With y1 + y2 ≤ 0 the max{y1 + y2, 0} = 0 so we calculate the difference h2,y(y1)− h2,y(0) = 13y 3 1 + 1 3 (−y2)3 − 13y 3 1 − 1 3 (−y1 − y2)3 = 1 3 (−y2)3 − 13(−y1 − y2) 3 = 1 3 y31 + y1y2(y1 + y2). Again, because of the signs of y1, y2, and y1 + y2 the difference is positive so the minimum is h2,y(0) = 1 3 y31 − 1 3 (y1 + y2)3. Case 2 is summarized graphically in Figure 6.2. 6.4.3 Case 3: x ≥ y1, x ≤ y1 + y2, x ≥ 0 For this case, the function we are looking to minimize is h3,y = 1 3 (x− y1)3 + 13(y1 + y2 − x) 3 + 1 3 x3, over the region where max{y1, 0} ≤ x ≤ y1 + y2. Differentiating h3,y yields ∂h3,y ∂x = (x− y1)2 − (y1 + y2 − x)2 + x2. 69 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 Figure 6.2: Case 2 summary And setting ∂h3,y∂x = 0 and solving for x gives us the critical points x1 = −y2 + √ 2y22 + 2y1y2 x2 = −y2 − √ 2y22 + 2y1y2. Looking at the constraints for this case we can see that y1 + y2 ≥ x ≥ y1, so we must have y2 ≥ 0. And y1 + y2 ≥ x ≥ 0 gives us that y1 + y2 ≥ 0. Knowing that y2 ≥ 0, it is obvious that x2 ≤ 0 and when x2 = 0 then y2 = y1 = 0 and x1 = x2 = 0. So this case can be covered by considering only x1. For x1 to be a real critical point, we need 2y22+2y1y2 ≥ 0⇔ 2y2(y1+y2) ≥ 0. Since both y1 ≥ 0 and y1 + y2 ≥ 0 always holds for this case then this is 70 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 always true. We also require that the critical point x1 satisfies the conditions for this case. For x1 ≥ 0, −y2 + √ 2y22 + 2y1y2 ≥ 0⇔ 2y22 + 2y1y2 ≥ y22 ⇔ 2y1y2 ≥ −y22 ⇔ 2y1 ≥ −y2 For x1 ≥ y1 −y2+ √ 2y22 + 2y1y2 ≥ y1 ⇔ √ 2y22 + 2y1y2 ≥ y1+y2 ⇔ 2y22+2y1y2 ≥ y21+2y1y2+y22 ⇔ y22 ≥ y21 ⇔ y2 ≥ |y1| And for x1 ≤ y1 + y2 − y2 + √ 2y22 + 2y1y2 ≤ y1 + y2 ⇔ 2y22 + 2y1y2 ≤ y21 + 4y1y2 + 4y22 ⇔ 0 ≤ y21 + 2y1y2 + 2y22 ⇔ 0 ≤ (y1 + y2)2 + y22 So x1 ≤ y1+ y2 is always true, and x1 ≥ 0 and x1 ≥ y1 hold when 2y1 ≥ −y2 and y2 ≥ |y1|, respectively. This means that when both 2y1 ≥ −y2 and y2 ≥ |y1| are true, the critical point x1 will produce the minimum, h3,y(x1) = 1 3 y2(y1 + y2)(3y1 + 6y2 − 4 √ 2y2(y1 + y2)). Next, we will need to consider two subcases of y1 ≥ 0 and y1 ≤ 0 separately in order to find the minimum value for each region where the critical point does not satisfy the conditions of this case. 71 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 Subcase 1: y2 ≥ 0, y1 ≤ 0 If y1 ≤ 0 and 2y1 ≥ −y2 then the critical point x1 is the minimizer and the minimum value is as stated above. If 2y1 < −y2 then there are no critical points and we look at the boundary points, which in this subcase are max{0, y1} = 0 ≤ x ≤ y1+y2. Taking the difference of the two potential minimums yields h3,y(y1 + y2)− h3,y(0) = 13y 3 2 + 1 3 (y1 + y2)3 − 13 |y1| 3 − 1 3 (y1 + y2)3 = 1 3 y32 − 1 3 |y1|3. Since y1 + y2 ≥ 0⇔ y2 ≥ −y1 ⇔ y2 ≥ |y1|, then h3,y(0) ≤ h3,y(y1 + y2) and the lower bound is the minimizer and the minimum value is h3,y(0) = 1 3 (y1 + y2)3 − 13y 3 1. Subcase 2: y2 ≥ 0, y1 ≥ 0 Since both y1 and y2 are always positive then 2y1 ≥ −y2 always holds. Therefore the critical point is good everywhere within the region where y2 ≥ y1. When y1 ≥ y2 then the boundary points are max{0, y1} = y1 ≤ x ≤ y1 + y2. Taking the difference produces h3,y(y1 + y2)− h3,y(y1) = 13y 3 2 + 1 3 (y1 + y2)3 − 13y 3 2 − 1 3 y31 = 1 3 (y1 + y2)3 − 13y 3 1 ≥ 0. 72 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 So the minimum value is h3,y(y1) = 1 3 y32 + 1 3 y31. Case 3 is summarized graphically in Figure 6.3. Figure 6.3: Case 3 summary 6.4.4 Case 4: x ≥ y1, x ≥ y1 + y2, x ≤ 0 For this case we minimize the function h4,y(x) = 1 3 (x− y1)3 + 13(x− y1 − y2) 3 − 1 3 x3 73 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 over the region where max{y1, y1 + y2} ≤ x ≤ 0. Solving ∂h4∂x = (x− y1)2 + (x− y1 − y2)2 − x2 = 0 yields the critical points x1 = 2y1 + y2 + √ 2y1y2 + 2y21 x2 = 2y1 + y2 − √ 2y1y2 + 2y21. The constraints for this case lead us to the inequalities y1 ≤ 0 and y1+y2 ≤ 0, or y2 ≤ −y1 = |y1|. In order for the critical points to be real, we require 2y1y2+2y21 ≥ 0⇔ |y1| ≥ y2 which is always true in this case. Therefore the critical point will give the minimum value. To satisfy x ≤ 0 and determine which critical point is the one we want, we need x1, x2 ≤ 0. Looking first at x1: y2 + 2y1 + √ 2y1y2 + 2y21 ≤ 0⇔ (−y2 − 2y1)2 ≥ 2y1y2 + 2y21 ⇔ y22 + 4y1y1 + 4y21 ≥ 2y1y2 + 2y21 ⇔ y22 + 2y1y1 + 2y21 ≥ 0 ⇔ (y1 + y2)2 + y21 ≥ 0. This always holds, so we check the next condition, x1 ≥ y1 + y2: y2 + 2y1 + √ 2y1y2 + 2y21 ≥ y1 + y2 ⇔ y1 + √ 2y1y2 + 2y21 ≥ 0 ⇔ 2y1y2 + 2y21 ≥ y21 ⇔ 2y1y2 + y21 ≥ 0⇔ y1(2y2 + y1) ≥ 0 ⇔ 2y2 + y1 ≤ 0⇔ y2 ≤ 12 |y1|. 74 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 And the final condition, x1 ≥ y1, yields y2 + 2y1 + √ 2y1y2 + 2y21 ≥ y1 ⇔ y1 + y2 + √ 2y1y2 + 2y21 ≥ 0 ⇔ 2y1y2 + 2y21 ≥ (−y1 − y2)2 ⇔ 2y1y2 + 2y21 ≥ y21 + 2y1y2 + y22 ⇔ y21 − y22 ≥ 0⇔ |y1| ≥ |y2|. So the critical point x1 is in the region of interest when y2 ≤ 12 |y1| and |y1| ≥ |y2| both hold. Next looking at x2, if we look at the condition x2 ≥ y1 we see that 2y1 + y2 − √ 2y1y2 + 2y21 ≥ y1 ⇔ (y1 + y2)− √ 2y1y2 + 2y21 ≥ 0. This only holds if y1 = y2 = 0, in which case x1 = x2 so this point is not in the interior of the region and we do not need to consider it. So when y2 ≤ 12 |y1| and |y1| ≥ |y2| both hold then the minimum value is h4,y(x1) = −13y1(y1 + y2)(6y1 + 3y2 + 4 √ 2y1(y1 + y2)). Next, we examine the boundary conditions max{y1, y1+ y2} ≤ x ≤ 0 for the two subcases y2 ≥ 0 and y2 ≤ 0, to determine the minimum when the constraints needed for the critical point do not hold. 75 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 Subcase 1: y2 ≥ 0 When y2 ≥ 0 then the max{y1, y1+y2} = y1+y2, so we look at the difference h4,y(0)− h4,y(y1 + y2) = −13y 3 1 − 1 3 (y1 + y2)3 − 13y 3 2 + 1 3 (y1 + y2)3 = 1 3 |y1|3 − 13y 3 2. Since |y1| ≥ y2 then the above difference is positive, so the minimum is h4,y(y1 + y2) = 1 3 y32 − 1 3 (y1 + y2)3. Subcase 2: y2 ≤ 0 When y2 ≤ 0 then the max{y1, y1 + y2} = y1, so we look at the difference h4,y(0)− h4,y(y1) = 13 |y1| 3 + 1 3 |y1 + y2|3 − 13 |y2| 3 − 1 3 |y1|3 = 1 3 |y1 + y2|3 − 13 |y2| 3. Since |y1 + y2| ≥ |y2| then the difference is positive and the minimum is h4,y(y1) = 1 3 |y1|3 + 13 |y2| 3. Case 4 is summarized graphically in Figure 6.4. 76 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 Figure 6.4: Case 4 summary 6.4.5 Case 5: x ≤ y1, x ≤ y1 + y2, x ≥ 0 For this case we look at minimizing h5,y(x) = 1 3 (y1 − x)3 + 13(y1 + y2 − x) 3 + 1 3 x3, over the region where 0 ≤ x ≤ min{y1, y1 + y2}. Differentiating h5,y with respect to x yields ∂h5,y ∂x = −(y1 − x)2 − (y1 + y2 − x)2 + x2. 77 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 Then solving ∂h5,y∂x = 0 gives us the critical points x1 = 2y1 + y2 + √ 2y1y2 + 2y21 x2 = 2y1 + y2 − √ 2y1y2 + 2y21. These critical points are the same as in the previous case, but must be re- examined using the new constraints of this case. We see that y1 ≥ x ≥ 0, so y1 ≥ 0 and similarly y1 + y2 ≥ 0. In order for the critical points to be real, we need 2y1y2 + 2y21 ≥ 0 ⇔ y1 + y2 ≥ 0 which always holds for this case. Next, we check where the critical points are valid. When we check x1 ≤ y1, we see that 2y1 + y2 + √ 2y1y2 + 2y21 ≤ y1 ⇔ y1 + y2 + √ 2y1y2 + 2y21 ≤ 0, but since y1 + y2 ≥ 0 and √ 2y1y2 + 2y21 ≥ 0 then this does not hold except when y1 = y2 = 0, and it that situation x1 = x2. Therefore, for this case we need only consider x2. Looking at x2, we see for x2 ≥ 0, 2y1 + y2 − √ 2y1y2 + 2y21 ≥ 0⇔ (2y1 + y2)2 ≥ 2y1y2 + 2y21 ⇔ 4y21 + 4y1y2 + y22 ≥ 2y1y2 + 2y21 ⇔ 2y21 + 2y1y2 + y22 ≥ 0 ⇔ y21 + (y1 + y2)2 ≥ 0. This always holds, so next we check x2 ≤ y1: 2y1 + y2 − √ 2y1y2 + 2y21 ≤ y1 ⇔ (y1 + y2)2 ≤ 2y1y2 + 2y21 ⇔ y21 + 2y1y2 + y22 ≤ 2y1y2 + 2y21 ⇔ y22 − y21 ≤ 0⇔ |y1| ≥ |y2|. 78 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 And checking the last condition we see that for x2 ≤ y1 + y2, 2y1 + y2 − √ 2y1y2 + 2y21 ≤ y1 + y2 ⇔ y1 − √ 2y1y2 + 2y21 ≤ 0 ⇔ 2y1y2 + 2y21 ≥ y21 ⇔ 2y1y2 + y21 ≥ 0 ⇔ y1(2y2 + y1) ≥ 0⇔ 2y2 + y1 ≥ 0 So the critical point x2 is in the region of interest when both |y1| ≥ |y2| and 2y2 + y1 ≥ 0 hold, and the minimum value is h5,y(x2) = −13y1(y1 + y2)(−6y1 − 3y2 + 4 √ 2y1(y1 + y2)). Now we look at the boundary conditions 0 ≤ x ≤ min{y1, y1 + y2} in the two subcases y2 ≥ 0 and y2 ≤ 0 to determine the minimum when the critical point is not in the region of interest. That is, when |y1| < |y2| and 2y2 + y1 < 0. Subcase 1: y2 ≥ 0 With y2 ≥ 0 the min{y1, y1 + y2} = y1 and the difference that we need to consider is h5,y(y1)− h5,y(0) = 13y 3 2 + 1 3 y31 − 1 3 y31 − 1 3 (y1 + y2)3 = 1 3 y32 − 1 3 (y1 + y2)3 = −13y 3 1 − y1y2(y1 + y2) ≤ 0 79 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 So the upper bound is the minimizer and the minimum value is h5,y(y1) = 1 3 y31 + 1 3 y32. Subcase 2: y2 ≤ 0 When y2 ≤ 0 then the min{y1, y1+y2} = y1+y2 and we look at the difference h5,y(y1 + y2)− h5,y(0) = 13(−y2) 3 + 1 3 (y1 + y2)3 − 13y 3 1 − 1 3 (y1 + y2)3 = 1 3 (−y2)3 − 13y 3 1 Since y2 ≤ 0, y1 ≥ 0, and y1+ y2 ≥ 0 then y1 ≥ |y2| and the above equation is always less than or equal to zero, which makes the upper bound, y1 + y2 the minimizer with a minimum value of h5,y(y1 + y2) = 1 3 (y1 + y2)3 − 13y 3 2. Case 5 is summarized graphically in Figure 6.5. 6.4.6 Case 6: x ≤ y1, x ≥ y1 + y2, x ≤ 0 For this case we have the function h6,y(x) = 1 3 (y1 − x)3 + 13(x− y1 − y2) 3 + 1 3 (−x)3 to minimize. Differentiating with respect to x to get our critical points yields ∂h6,y ∂x = −(y1 − x)2 + (x− y1 − y2)2 − x2 80 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 Figure 6.5: Case 5 summary Setting this equal to zero and solving for x then gives us the critical points x1 = −y2 + √ 2y22 + 2y1y2 x2 = −y2 − √ 2y22 + 2y1y2. Looking at the constraints of the case, notice that y1 + y2 ≤ x ≤ y1 and y1 + y2 ≤ x ≤ 0 imply that both y1 + y2 ≤ 0 and y2 ≤ 0. For the critical points to be real we need 2y22 + 2y1y2 ≥ 0 ⇔ 2y2(y1 + y2) ≥ 0, which is always true since y2 and y1 + y2 are both always negative. It is easy to see that critical point x1 is always positive, so it does not satisfy the conditions of this case, except when y1 = y2 = 0 which makes x1 = x2. So it suffices to check only x2 for this case. 81 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 Checking the condition x2 ≥ y1 + y2, y1 + y2 ≤ −y2 − √ 2y22 + 2y1y2 ⇔ √ 2y22 + 2y1y2 ≤ −y1 − 2y2 ⇔ 2y22 + 2y1y2 ≤ y21 + 4y1y2 + 4y22 ⇔ 0 ≤ y21 + 2y1y2 + 2y22 ⇔ 0 ≤ (y1 + y2)2 + y22. This condition always holds. Next, we check x2 ≤ y1, −y2 − √ 2y22 + 2y1y2 ≤ y1 ⇔ −y1 − y2 ≤ √ 2y22 + 2y1y2 ⇔ y21 + 2y1y2 + y22 ≤ 2y22 + 2y1y2 ⇔ y21 ≤ y22 ⇔ |y1| ≤ |y2|. And the last condition is x2 ≤ 0, −y2 − √ 2y22 + 2y1y2 ≤ 0⇔ −y2 ≤ √ 2y22 + 2y1y2 ⇔ y22 ≤ 2y22 + 2y1y2 ⇔ 0 ≤ y22 + 2y1y2 ⇔ 0 ≤ y2(y2 + 2y1)⇔ y2 + 2y1 ≤ 0⇔ 2y1 ≤ |y2|. So x2 is the minimizer when both |y1| ≤ |y2| and 2y1 ≤ |y2| hold and the minimum value is h6,y(x2) = −13y2(y1 + y2)(3y1 + 6y2 + 4 √ 2y2(y1 + y2)). Outside this region we check the boundary conditions y1 + y2 ≤ x ≤ min{0, y1} to determine the minimizer. 82 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 Subcase 1: y1 ≥ 0 With y1 ≥ 0 the min{y1, 0} = 0 so we look at the difference h6,y(0)− h6,y(y1 + y2) = 13y 3 1 + 1 3 (−y1 − y2)3 − 13(−y2) 3 − 1 3 (−y1 − y2)3 = 1 3 y31 − 1 3 (−y2)3 = 13y 3 1 − 1 3 |y2|3. Since y1 + y2 ≤ 0 we know that |y2| ≥ y1 and the above equation is less than or equal to zero. This makes the upper boundary the minimum and the minimum value is h6,y(0) = 1 3 y31 − 1 3 (y1 + y2)3. Subcase 2: y1 ≤ 0 When y1 ≤ 0 the min{y1, 0} = y1 and we consider the difference h6,y(y1)− h6,y(y1 + y2) = 13(−y2) 3 + 1 3 (−y1)3 − 13(−y2) 3 − 1 3 (−y1 − y2)3 = 1 3 |y1|3 − 13 |y1 + y2| 3. Because y1 and y2 are both negative then |y1 + y2| ≥ |y1| and the equation above is less than or equal to zero, giving us a minimum value of h6,y(y1) = −13y 3 1 − 1 3 y32. Case 6 is summarized graphically in Figure 6.6. 83 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 Figure 6.6: Case 6 summary 6.4.7 Case 7: x ≥ y1, x ≤ y1 + y2, x ≤ 0 In this case the equation to be minimized is h7,y(x) = 1 3 (x− y1)3 + 13(y1 + y2 − x) 3 + 1 3 (−x)3, over the region where y1 ≤ x ≤ min{0, y1 + y2}. Differentiating h7,y with respect to x yields ∂h7,y ∂x = (x− y1)2 − (y1 + y2 − x)2 − x2. 84 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 Setting ∂h7,y∂x = 0 and solving for x gives us the critical points x1 = y2 + √ −2y1y2 x2 = y2 − √ −2y1y2. Looking at the constraints we see that y1 ≤ x ≤ 0 and y1 ≤ x ≤ y1 + y2 so we have y1 ≤ 0 and y2 ≥ 0 for this case. For the critical points to be real, −2y1y2 ≥ 0 must hold, and because of the signs of y1 and y2 this is always true. The critical point x1 ≤ 0 so it does not lie in the interior of the region of interest. And when x1 = 0 then y1 = y2 = 0 and so x1 = x2. Therefore we need only consider x2, and we check x2 against the constraints. First, x2 ≥ y1, y2 − √ −2y1y2 ≥ y1 ⇔ (y2 − y1)2 ≥ −2y1y2 ⇔ y22 + y21 ≥ 0. This will always hold, so next we look at x2 ≤ 0 y2 − √ −2y1y2 ≤ 0⇔ y22 ≤ −2y1y2 ⇔ y22 + 2y1y2 ≤ 0⇔ y2(y2 + 2y1) ≤ 0 ⇔ y2 + 2y1 ≤ 0⇔ 2|y1| ≥ y2. And finally at x2 ≤ y1 + y2, y2 − √ −2y1y2 ≤ y1 + y2 ⇔ y21 ≤ −2y1y2 ⇔ |y1| ≤ 2y2 ⇔ 1 2 |y1| ≤ y2. Then the critical point x2 is the minimizer if 12 |y1| ≤ y2 ≤ 2|y1|, and the 85 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 minimum value is h7,y = 1 3 y1y2(3y1 − 3y2 + 4 √ −2y1y2). Outside this region, we look at the boundary conditions y1 ≤ x ≤ min{0, y1+ y2} to determine the minimizer. Subcase 1: y1 + y2 ≥ 0 When y1 + y2 ≥ 0 the min{0, y1 + y2} = 0 so we look at the difference h7,y(0)− h7,y(y1) = 13(−y1) 3 + 1 3 (y1 + y2)3 − 13y 3 2 − 1 3 (−y1)3 = 1 3 (y1 + y2)3 − 13y 3 2 = 1 3 y31 + y1y2(y1 + y2). Because y1 ≤ 0, y2 ≥ 0, then y1 + y2 ≤ y2 and (y1 + y2)3 ≤ y32 so that the above equation is always negative and therefore the minimum value is h7,y(0) = 1 3 (y1 + y2)3 − 13y 3 1. Subcase 2: y1 + y2 ≤ 0 Here the min{0, y1 + y2} = y1 + y2 so the difference to consider is h7,y(y1 + y2)− h7,y(y1) = 13y 3 2 + 1 3 (−y1 − y2)3 − 13y 3 2 − 1 3 (−y1)3 = 1 3 (−y1 − y2)3 − 13(−y1) 3 = −1 3 y32 − y1y2(y1 + y2) 86 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 Again, because of the signs of y1 and y2, we get that (−y1 − y2) ≤ −y1 and (−y1 − y2)3 ≤ (−y1)3 so that this equation is always negative so the minimum value is h7,y(y1 + y2) = 1 3 y32 − 1 3 (y1 + y2)3. Case 7 is summarized graphically in Figure 6.7. Figure 6.7: Case 7 summary 87 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 6.4.8 Case 8: x ≤ y1, x ≤ y1 + y2, x ≤ 0 For the final case the function we minimize is h8,y(x) = 1 3 (y1 − x)3 + 13(y1 + y2 − x) 3 + 1 3 (−x)3, over the region where x ≤ min{0, y1, y1 + y2}. Differentiating h8,y with respect to x yields ∂h8,y ∂x = −(y1 − x)2 − (y1 + y2 − x)2 − x2. Setting this equal to zero and solving for x gives the critical points x1 = 2 3 y1 + 1 3 y2 + 1 3 √ −2y21 − 2y1y2 − 2y22 x2 = 2 3 y1 + 1 3 y2 − 13 √ −2y21 − 2y1y2 − 2y22. As we saw in case 1, −2y21 − 2y1y2 − 2y22 = −(y21 + y22)− (y1 + y2)2 ≤ 0 so the critical points are only real when y1 = y2 = 0. Thus we are only left with the boundary conditions x ≤ min{0, y1, y1+y2}. If min{0, y1, y1+y2} = 0 then the minimum value is h8,y(0) = 1 3 y31 + 1 3 (y1 + y2)3. 88 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 When the min{0, y1, y1 + y2} = y1 then the minimum value is h8,y(y1) = 1 3 y32 + 1 3 (−y1)3. And finally, if min{0, y1, y1 + y2} = y1 + y2, the minimum value is h8,yy1 + y2) = 1 3 (−y2)3 + 13(−y1 − y2) 3. Case 8 is summarized graphically in Figure 6.8. Figure 6.8: Case 8 summary 89 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 6.4.9 Combining the Eight Cases Now that each possibility has been examined, each case must be compared against the others to determine the global minimum in each region. Cases 2,3, and 4 and Cases 5,6, and 7 can be plotted without any overlap, as seen in Figure 6.9 on page 97. This leaves 12 regions each with four possible minimizers. Since we can see from equation (6.10) that h(x) is convex with respect to x, the critical point will be the minimum in regions where there is a valid critical point. Combining all of the eight cases, we can see that there is a valid critical point for every possible region. We therefore have only six regions, as seen in Figure 6.10 on page 98. The regions are divided by the lines y1 − y2 = 0, 2y1 + y2 = 0, and y1 + 2y2 = 0, and each region has a critical point minimizer. The six regions are defined as follows. Let y = (y1, y2), then y ∈ A if − 1 2 y1 ≤ y2 ≤ y1, (6.11) y ∈ B if − 1 2 y2 ≤ y1 ≤ y2, y ∈ C if − 2y2 ≤ y1 ≤ −12y2, y ∈ D if y1 ≤ y2 ≤ −12y1, y ∈ E if y2 ≤ y1 ≤ −12y2, and y ∈ F if − 2y1 ≤ y2 ≤ −12y1. The minimum for each region is outlined below 90 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 (A) When y ∈ A, then the minimizer is the critical point x2 from Case 5, and the minimum value is h5,y(x2) = h5,y(2y1 + y2 − √ 2y1y2 + 2y21) = −1 3 y1(y1 + y2)(−6y1 − 3y2 + 4 √ 2y1(y1 + y2)). (B) When y ∈ B, the minimizer is the critical point x1 from Case 3, and the minimum value is h3,y(x1) = h3,y(−y2 + √ 2y22 + 2y1y2) = −1 3 y2(y1 + y2)(−3y1 − 6y2 + 4 √ 2y2(y1 + y2)). (C) When y ∈ C, the minimizer is the critical point x2 from Case 7, and the minimum value is h7,y(x2) = h7,y(y2 − √ −2y1y2) = 1 3 y1y2(3y1 − 3y2 + 4 √ −2y1y2). (D) When y ∈ D, the minimizer is the critical point x1 from Case 4, and the minimum value is h4,y(x1) = h4,y(2y1 + y2 + √ 2y1y2 + 2y21) = −1 3 y1(y1 + y2)(6y1 + 3y2 + 4 √ 2y1(y1 + y2)). (E) When y ∈ E, the minimizer is the critical point x2 from Case 6, and 91 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 the minimum value is h6,y(x2) = h6,y(−y2 − √ 2y22 + 2y1y2) = −1 3 y2(y1 + y2)(3y1 + 6y2 + 4 √ 2y2(y1 + y2)). (F) And when y ∈ F , the minimizer is the critical point x1 from Case 2, and the minimum value is h2,y(x1) = h2,y(y2 + √ −2y1y2) = 1 3 y1y2(−3y1 + 3y2 + 4 √ −2y1y2). 6.4.10 Bringing It All Together If you recall, the goal was to solve min x∈R h(x) = min x∈R ( 1 3 |x− y1|3 + 13 |x− (y1 + y2)| 3 + 1 3 |x|3, ) . in order to get g∗2(y1, y2, y3), where y1 + y2 + y3 = 0 g2(x1, x2, x3) = 2 3 ‖x1 − x2‖ 32 + 23‖x2 − x3‖ 3 2 + 2 3 ‖x3 − x1‖ 32 The minimizer of h(x) has been found for each of the six regions, and so 92 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 we have found that g2(y1, y2,−y1 − y2) = −13y1(y1 + y2)(−6y1 − 3y2 + 4 √ 2y1(y1 + y2)), if y ∈ A; −13y2(y1 + y2)(−3y1 − 6y2 + 4 √ 2y2(y1 + y2)), if y ∈ B; 1 3y1y2(3y1 − 3y2 + 4 √−2y1y2), if y ∈ C; −13y1(y1 + y2)(6y1 + 3y2 + 4 √ 2y1(y1 + y2)), if y ∈ D; −13y2(y1 + y2)(3y1 + 6y2 + 4 √ 2y2(y1 + y2)), if y ∈ E; 1 3y1y2(−3y1 + 3y2 + 4 √−2y1y2), if y ∈ F, (6.12) where the regions A, · · · , F are as defined in (6.11) on page 90. Examining a plot of the above function and its contour plot, in Fig- ure 6.11 on page 98, we see that g∗2 is convex which is what we would expect from a conjugate function, even though g∗2 is not obviously convex upon inspection. Recall from (6.7) on page 63 that we had three conjugate variables, y1, y2, and y3 such that y1 + y2 + y3 = 0. Making the substitution y3 = −(y1+y2) allows us to write (6.12) in the 93 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 more symmetric form g∗2(y1, y2, y3) = 1 3y1y3(3y3 − 3y1 + 4 √−2y1y3), if (y1, y2) ∈ A; 1 3y2y3(3y3 − 3y2 + 4 √−2y2y3), if (y1, y2) ∈ B; 1 3y1y2(3y1 − 3y2 + 4 √−2y1y2), if (y1, y2) ∈ C; 1 3y1y3(3y1 − 3y3 + 4 √−2y1y3), if (y1, y2) ∈ D; 1 3y2y3(3y2 − 3y3 + 4 √−2y2y3), if (y1, y2) ∈ E; 1 3y1y2(3y2 − 3y1 + 4 √−2y1y2), if (y1, y2) ∈ F, (6.13) where the regions A, · · · , F are as defined in (6.11) on page 90. With the y3 variable added back into the equation, we can then recognize that the three boundaries y1 − y2 = 0, 2y1 + y2 = 0, and y1 + 2y2 = 0 are equivalent to y1 = y2, y1 = y3, and y2 = y3. If we consider region A defined by y1 ≥ y2 ≥ −12y1, and look at the difference, y2 − y3 = y2 − (−y1 − y2) = y1 + 2y2 ≥ y1 + 2(−12y1) = 0, so min{y1, y2, y3} = y3 and max{y1, y2, y3} = y1. Thus, we can rewrite using ymax = max{y1, y2, y3} and ymin = min{y1, y2, y3} g∗2(y1, y2, y3) = 1 3 y1y3(3y3 − 3y1 + 4 √ −2y1y3) = ymaxy2min − y2maxymin + 4 3 ymaxymin √ −2ymaxymin, if (y1, y2) ∈ A. 94 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 Similarly, all of the other regions can be rewritten in the same manner so that (6.13) can be rewritten using ymax = max{y1, y2, y3} and ymin = min{y1, y2, y3} as g∗2(y1, y2, y3) = ymaxy 2 min − y2maxymin + 4 3 ymaxymin √ −2ymaxymin, (6.14) without the need to specify the region. Remark 6.4.1 Although g∗2 does not look convex, it is because it arose as a conjugate function. Convexity can also be shown with calculus if you proceed as follows. We have three variables such that ymax ≥ y0 ≥ ymin and ymax + y0 + ymin = 0, so y0 = −ymax − ymin and hence ymax ≥ −ymax − ymin ≥ ymin. Equivalently, 2ymax + ymin ≥ 0 and −2ymin − ymax ≥ 0. Now define x := ymax and y := −ymin, then both x ≥ 0 and y ≥ 0 and we care about the region where 2x− y ≥ 0 and 2y − x ≥ 0. (6.15) Then we get the function f(x, y) = xy2 + yx2 − 4 3 xy √ 2xy, 95 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 with ∇2f(x, y) = (2 √ xy−√2y)y√ xy 2y √ xy+2x √ xy−3√2xy√ xy 2y √ xy+2x √ xy−3√2xy√ xy (2 √ xy−√2x)x√ xy It can be shown that the (1, 1) and (2, 2) elements of ∇2f(x, y) are pos- itive using the constraints in (6.15). The determinant of ∇2f(x, y) can be shown to be positive by assuming x = a2 and y = b2, factoring the resulting equation, and considering the signs of each of the factors. Since the Hessian is positive semidefinite then f is convex. 96 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 (a) Case 1 (b) Cases 2, 3, and 4 (c) Cases 5, 6, and 7 (d) Case 8 Figure 6.9: Overview of the eight cases 97 6.4. P-norm Kernel Conjugate when p = 32 , q = 3, and n = 3 Figure 6.10: The six regions (a) The function g∗2(y1, y2,−y1 − y2) (b) The contour plot of g∗2(y1, y2,−y1− y2) Figure 6.11: Plots of g∗2(y1, y2,−y1 − y2) 98 Chapter 7 Conclusion The kernel average was previously only defined for two functions. We have used the identity n∑ i=1 λiq(yi)− q( n∑ i=1 λiyi) = 1 4 n∑ i=1 n∑ j=1 λiλj‖yi − yj‖2, and the definition of the proximal average to define the kernel average for n functions, Qg(f ,λ)(x) := inf∑ λiyi=x (∑ λifi(yi) + ∑ i<j λiλjg(yi − yj) ) . We then examined a specific case of the kernel average, namely the prox- imal average, and its conjugate and calculated that for g1(x) = n∑ i=1 n∑ j=1 1 2 ‖xi − xj‖2 = 2 ∑ i<j 1 2 ‖xi − xj‖2, the conjugate function is g∗1(x ∗) = 1 8n2 n∑ i=1 n∑ j=1 ‖x∗i − x∗j‖2, if x∗1 + · · ·+ x∗n = 0; +∞, otherwise. 99 Chapter 7. Conclusion This can also be written using any of the following equivalent formulations when x∗1 + · · ·+ x∗n = 0: (i) g∗1(x∗) = 1 8n2 n∑ i=1 n∑ j=1 ‖x∗i − x∗j‖2 = n∑ i=1 n∑ j=1 1 2‖ x∗i−x∗j 2n ‖2 (ii) g∗1(x∗) = 1 8n2 n∑ i=1 n∑ j=1 ‖x∗i + x∗j‖2 = n∑ i=1 n∑ j=1 1 2‖ x∗i+x ∗ j 2n ‖2 (iii) g∗1(x∗) = 1 2n n∑ i=1 1 2‖xi‖2. Next, we computed the conjugate when g2(x1, x2, x3) = 2 3 ‖x1 − x2‖ 32 + 23‖x2 − x3‖ 3 2 + 2 3 ‖x3 − x1‖ 32 , and found that g∗2(y1, y2, y3) = 1 3y1y3(3y3 − 3y1 + 4 √−2y1y3), if (y1, y2) ∈ A; 1 3y2y3(3y3 − 3y2 + 4 √−2y2y3), if (y1, y2) ∈ B; 1 3y1y2(3y1 − 3y2 + 4 √−2y1y2), if (y1, y2) ∈ C; 1 3y1y3(3y1 − 3y3 + 4 √−2y1y3), if (y1, y2) ∈ D; 1 3y2y3(3y2 − 3y3 + 4 √−2y2y3), if (y1, y2) ∈ E; 1 3y1y2(3y2 − 3y1 + 4 √−2y1y2), if (y1, y2) ∈ F, where the regions A,B,C,D,E, and F are as defined in (6.11), or equiva- lently using ymax = max{y1, y2, y3} and ymin = min{y1, y2, y3}, g∗2(y1, y2, y3) = ymaxy 2 min − y2maxymin + 4 3 ymaxymin √ −2ymaxymin. (7.1) 100 Chapter 7. Conclusion It was expected that we might find a similar form for g∗2 as was found for g∗1, which would help formulate a closed form solution for g̃∗ in the general case where g̃(x) = n∑ i=1 n∑ j=1 λiλjg(xi − xj), for any function g. However, due to the fact that there does not appear to be any correlation between the solutions for g∗1 and g∗2, it seems unlikely that a general solution will be found. 101 Bibliography [1] H. H. Bauschke, R. Goebel, Y. Lucet and X. Wang, “The proximal aver- age: basic theory,” SIAM Journal on Optimization, vol 19(2), pp. 766- 785, 2008. [2] H. H. Bauschke, Y. Lucet and M. Trienis, “How to transform one convex function continuously into another,” SIAM Review, vol 50, pp. 115-132, 2008. [3] H. H. Bauschke and X. Wang, “The kernel average for two convex func- tions and its application to the extension and representation of mono- tone operators,” Transactions of the American Mathematical Society 361, pp. 5947-5965, 2009 [4] J. M. Borwein and A. S. Lewis, Convex Analysis and Nonlinear Opti- mization, Springer, 2006. [5] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge Uni- versity Press, 2004. [6] P. L. Combettes and J.-C. Pesquet, “A proximal decomposition method for solving convex variational inverse problems,” Inverse Problems, vol. 24(6), 2008. 102 [7] K. R. Davidson and A. P. Donsig, Real Analysis with Real Applications, Prentice Hall, 2002. [8] R. Ellaia and J.-B. Hiriart-Urruty, “The conjugate of the difference of functions,” Journal of Optimization Theory and Applications, vol 49(3), pp.493-498, 1986. [9] R. Goebel, “Self-dual smoothing of convex and saddle functions,” Jour- nal of Convex Analysis, vol 15(1), pp.179-190, 2008. [10] E. Kreyszig, Introductory Functional Analysis with Applications, John Wiley and Sons, 1978. [11] C. D. Meyer, Matrix Analysis and Applied Linear Algebra, SIAM, 2000. [12] R. T. Rockafellar, Convex Analysis, Princeton University Press, 1970. [13] R. T. Rockafellar and R. J-B Wets, Variational Analysis, Springer- Verlag, 2004. [14] C. Zălinescu, Convex Analysis in General Vector Spaces, World Scien- tific Publishing, 2002. 103 Appendix A Maple Code The following is the Maple code used to verify the calculations for chapter 6. >r e s t a r t : with ( p l o t s ) : Case 1 : > h1 := x−> (1/3)∗ ( x−y [1 ] ) ˆ3+(1/3 )∗ ( x−y [1]−y [2 ] )ˆ3+(1/3)∗ x ˆ3 ; x− > 1 3 (x− y1)3 + 13(x− y1 − y2) 3 + 1 3 x3 > dh1 := d i f f ( h1 (x ) , x ) ; c r i t i c a l p o i n t s 1 := so l v e ( dh1 = 0 , x ) ; 1 3 y2 + 2 3 y1 + 1 3 √ −2y22 − 2y1y2 − 2y21, 1 3 y2 + 2 3 y1 − 13 √ −2y22 − 2y1y2 − 2y21 Case 2 : > h2 := x−> (1/3)∗ ( y [1]−x )ˆ3+(1/3)∗(x−y [1]−y [2 ] )ˆ3+(1/3)∗ x ˆ3 ; x− > 1 3 (y1 − x)3 + 13(x− y1 − y2) 3 + 1 3 x3 > dh2 := d i f f ( h2 (x ) , x ) : c r i t i c a l p o i n t s 2 := so l v e ( dh2 = 0 , x ) ; 104 Appendix A. Maple Code y2 + √ −2y1y2, y2 − √ −2y1y2 > h2so ln1 := f a c t o r ( s imp l i f y ( h2 ( c r i t i c a l p o i n t s 2 [ 1 ] ) ) ) ; −1 3 y1y2(3y1 − 3y2 − 4 √ 2 √−y1y2) > h2so ln2 := f a c t o r ( s imp l i f y ( h2 ( c r i t i c a l p o i n t s 2 [ 2 ] ) ) ) ; −1 3 y1y2(3y1 − 3y2 + 4 √ 2 √−y1y2) Case 3 : > h3 := x−> (1/3)∗ ( x−y [1 ] ) ˆ3+(1/3 )∗ ( y [1 ]+y [2]−x)ˆ3+(1/3)∗x ˆ3 ; x− > 1 3 (x− y1)3 + 13(y1 + y2 − x) 3 + 1 3 x3 >dh3 := d i f f ( h3 (x ) , x ) : c r i t i c a l p o i n t s 3 := so l v e ( dh3 = 0 , x ) ; −y2 + √ 2y22 + 2y1y2,−y2 − √ 2y22 + 2y1y2 >h3so ln1 := f a c t o r ( s imp l i f y ( h3 ( c r i t i c a l p o i n t s 3 [ 1 ] ) ) ) ; −1 3 y2(y2 + y1)(−3y1 − 6y2 + 4 √ 2 √ y2(y2 + y1)) >h3so ln2 := f a c t o r ( s imp l i f y ( h3 ( c r i t i c a l p o i n t s 3 [ 2 ] ) ) ) ; 105 Appendix A. Maple Code 1 3 y2(y2 + y1)(3y1 + 6y2 + 4 √ 2 √ y2(y2 + y1)) Case 4 : > h4 := x−> (1/3)∗ ( x−y [1 ] ) ˆ3+(1/3 )∗ ( x−y [1]−y [2 ] )ˆ3 − (1/3)∗x ˆ3 ; x− > 1 3 (x− y1)3 + 13(x− y1 − y2) 3 + 1 3 x3 >dh4 := d i f f ( h4 (x ) , x ) : c r i t i c a l p o i n t s 4 := so l v e ( dh4 = 0 , x ) ; y2 + 2y1 + √ 2y1y2 + 2y21, y2 + 2y1 − √ 2y1y2 + 2y21 >h4so ln1 := f a c t o r ( s imp l i f y ( h4 ( c r i t i c a l p o i n t s 4 [ 1 ] ) ) ) ; −1 3 y1(y2 + y1)(3y2 + 6y1 + 4 √ 2y1(y2 + y1)) >h4so ln2 := f a c t o r ( s imp l i f y ( h4 ( c r i t i c a l p o i n t s 4 [ 2 ] ) ) ) ; 1 3 y1(y2 + y1)(−3y2 − 6y1 + 4 √ 2y1(y2 + y1)) Case 5 : > h5 := x−> (1/3)∗ ( y [1]−x )ˆ3+(1/3)∗( y [1 ]+y [2]−x)ˆ3+(1/3)∗x ˆ3 ; x− > 1 3 (y1 − x)3 + 13(y1 + y2 − x) 3 + 1 3 x3 106 Appendix A. Maple Code >dh5 := d i f f ( h5 (x ) , x ) : c r i t i c a l p o i n t s 5 := so l v e ( dh5 = 0 , x ) ; y2 + 2y1 + √ 2y1y2 + 2y21, y2 + 2y1 − √ 2y1y2 + 2y21 >h5so ln1 := f a c t o r ( s imp l i f y ( h5 ( c r i t i c a l p o i n t s 5 [ 1 ] ) ) ) ; 1 3 y1(y2 + y1)(3y2 + 6y1 + 4 √ 2y1(y2 + y1)) >h5so ln2 := f a c t o r ( s imp l i f y ( h5 ( c r i t i c a l p o i n t s 5 [ 2 ] ) ) ) ; −1 3 y1(y2 + y1)(−3y2 − 6y1 + 4 √ 2y1(y2 + y1)) Case 6 : > h6 := x−> (1/3)∗ ( y [1]−x )ˆ3+(1/3)∗(x−y [1]−y [2 ] )ˆ3 − (1/3)∗x ˆ3 ; x− > 1 3 (y1 − x)3 + 13(x− y1 − y2) 3 − 1 3 x3 dh6 := d i f f ( h6 (x ) , x ) : c r i t i c a l p o i n t s 6 := so l v e ( dh6 = 0 , x ) ; −y2 + √ 2y22 + 2y1y2,−y2 − √ 2y22 + 2y1y2 >h6so ln1 := f a c t o r ( s imp l i f y ( h6 ( c r i t i c a l p o i n t s 6 [ 1 ] ) ) ) ; 1 3 y2(y2 + y1)(−3y1 − 6y2 + 4 √ 2y2(y2 + y1)) 107 Appendix A. Maple Code >h6so ln2 := f a c t o r ( s imp l i f y ( h6 ( c r i t i c a l p o i n t s 6 [ 2 ] ) ) ) ; −1 3 y2(y2 + y1)(3y1 + 6y2 + 4 √ 2y2(y2 + y1)) Case 7 : > h7 := x−> (1/3)∗ ( x−y [1 ] ) ˆ3+(1/3 )∗ ( y [1 ]+y [2]−x)ˆ3−(1/3)∗x ˆ3 ; x− > 1 3 (x− y1)3 + 13(y1 + y2 − x) 3 − 1 3 x3 >dh7 := d i f f ( h7 (x ) , x ) : c r i t i c a l p o i n t s 7 := so l v e ( dh7 = 0 , x ) ; y2 + √ −2y1y2, y2 − √ −2y1y2 >h7so ln1 := f a c t o r ( s imp l i f y ( h7 ( c r i t i c a l p o i n t s 7 [ 1 ] ) ) ) ; −1 3 y1y2(3y2 − 3y1 + 4 √ −2y1y2) >h7so ln2 := f a c t o r ( s imp l i f y ( h7 ( c r i t i c a l p o i n t s 7 [ 2 ] ) ) ) ; 1 3 y1y2(3y1 − 3y2 + 4 √ −2y1y2) Case 8 : > h8 := x−> (1/3)∗ ( y [1]−x )ˆ3+(1/3)∗( y [1 ]+y [2]−x)ˆ3−(1/3)∗x ˆ3 ; 108 Appendix A. Maple Code x− > 1 3 (y1 − x)3 + 13(y1 + y2 − x) 3 − 1 3 x3 >dh8 := d i f f ( h8 (x ) , x ) : c r i t i c a l p o i n t s 8 := so l v e ( dh8 = 0 , x ) ; 1 3 y2 + 2 3 y1 + 1 3 √ −2y22 − 2y1y2 − 2y21, 1 3 y2 + 2 3 y1 − 13 √ −2y22 − 2y1y2 − 2y21 Plo t t i ng : >r e s t a r t : with ( p l o t s ) : > f 1 := (y1 , y2)−> −(1/3)∗y1 ∗( y2+y1)∗(−6∗y1−3∗y2+4∗ s q r t (2)∗ s q r t ( y1 ∗( y2+y1 ) ) ) : > f 2 := (y1 , y2)−> −(1/3)∗y2 ∗( y2+y1)∗(−3∗y1−6∗y2+4∗ s q r t (2)∗ s q r t ( y2 ∗( y2+y1 ) ) ) : > f 3 := (y1 , y2)−> (1/3)∗ y1∗y2 ∗(3∗y1−3∗y2+4∗ s q r t (2)∗ s q r t (−y1 ∗y2 ) ) : > f 4 := (y1 , y2)−> −(1/3)∗y1 ∗( y2+y1 )∗ (6∗ y1+3∗y2+4∗ s q r t (2)∗ s q r t ( y1 ∗( y2+y1 ) ) ) : > f 5 := (y1 , y2)−> −(1/3)∗y2 ∗( y2+y1 )∗ (3∗ y1+6∗y2+4∗ s q r t (2)∗ s q r t ( y2 ∗( y2+y1 ) ) ) : > f 6 := (y1 , y2)−> (1/3)∗ y1∗y2∗(−3∗y1+3∗y2+4∗ s q r t (2)∗ s q r t (−y1 ∗y2 ) ) : > f p i e c e := (y1 , y2)−> p i e c ew i s e (0 <= y1 and −(1/2)∗y1 <= y2 and y2 <= y1 , f 1 ( y1 , y2 ) , 0 <= y2 and −(1/2)∗y2 <= y1 and y1 <= y2 , f 2 ( y1 , y2 ) , y1 <= 0 and −(1/2)∗y1 <= y2 and y2 <= −2∗y1 , f 3 ( y1 , y2 ) , y1 <= 0 and y1 <= y2 and y2 <= −(1/2)∗y1 , 109 Appendix A. Maple Code f 4 ( y1 , y2 ) , y2 <= 0 and y2 <= y1 and y1 <= −(1/2)∗y2 , f 5 ( y1 , y2 ) , 0 <= y1 and −2∗y1 <= y2 and y2 <= −(1/2)∗y1 , f 6 ( y1 , y2 ) ) : >plot3d ( f p i e c e ( y1 , y2 ) , y1=−10..10 , y2=−10..10 , axes=normal ) ; >p l o t s [ contourp lo t ] ( f p i e c e ( y1 , y2 ) , y1=−10..10 , y2=−10..10 , contours =100); 110
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- On the kernel average for n functions
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
On the kernel average for n functions Moffat, Sarah Michelle 2009
pdf
Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
Page Metadata
Item Metadata
Title | On the kernel average for n functions |
Creator |
Moffat, Sarah Michelle |
Publisher | University of British Columbia |
Date Issued | 2009 |
Description | After an introduction to Hilbert spaces and convex analysis, the proximal average is studied and two smooth operators are provided. The first is a new version of an operator previously supplied by Goebel, while the second one is new and uses the proximal average of a function and a quadratic to find a smooth approximation of the function. Then, the kernel average of two functions is studied and a reformulation of the proximal average is used to extend the definition of the kernel average to allow for any number of functions. The Fenchel conjugate of this new kernel average is then examined by calculating the conjugate for two specific kernel functions that represent two of the simplest cases that could be considered. A closed form solution was found for the conjugate of the first kernel function and it was rewritten in three equivalent forms. A solution was also found for the conjugate of the second kernel function, but the two solutions do not have the same form which suggests that a general solution for the conjugate of any kernel function will not be found. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-03-16 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0069333 |
URI | http://hdl.handle.net/2429/21932 |
Degree |
Master of Science - MSc |
Program |
Mathematics |
Affiliation |
Arts and Sciences, Irving K. Barber School of (Okanagan) Computer Science, Mathematics, Physics and Statistics, Department of (Okanagan) |
Degree Grantor | University of British Columbia |
GraduationDate | 2010-05 |
Campus |
UBCO |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2010_spring_moffat_sarah.pdf [ 1.27MB ]
- Metadata
- JSON: 24-1.0069333.json
- JSON-LD: 24-1.0069333-ld.json
- RDF/XML (Pretty): 24-1.0069333-rdf.xml
- RDF/JSON: 24-1.0069333-rdf.json
- Turtle: 24-1.0069333-turtle.txt
- N-Triples: 24-1.0069333-rdf-ntriples.txt
- Original Record: 24-1.0069333-source.json
- Full Text
- 24-1.0069333-fulltext.txt
- Citation
- 24-1.0069333.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
data-media="{[{embed.selectedMedia}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0069333/manifest