UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

On the kernel average for n functions Moffat, Sarah Michelle 2009-03-16

You don't seem to have a PDF reader installed, try download the pdf

Item Metadata

Download

Media
24-ubc_2010_spring_moffat_sarah.pdf [ 1.27MB ]
Metadata
JSON: 24-1.0069333.json
JSON-LD: 24-1.0069333-ld.json
RDF/XML (Pretty): 24-1.0069333-rdf.xml
RDF/JSON: 24-1.0069333-rdf.json
Turtle: 24-1.0069333-turtle.txt
N-Triples: 24-1.0069333-rdf-ntriples.txt
Original Record: 24-1.0069333-source.json
Full Text
24-1.0069333-fulltext.txt
Citation
24-1.0069333.ris

Full Text

On the Kernel Average for n FunctionsbySarah Michelle MofiatB.Sc., The University of British Columbia, 2007A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFMASTER OF SCIENCEinThe College of Graduate Studies(Mathematics)THE UNIVERSITY OF BRITISH COLUMBIA(Okanagan)December 2009c Sarah Michelle Mofiat 2009AbstractAfter an introduction to Hilbert spaces and convex analysis, the proximalaverage is studied and two smooth operators are provided. The flrst is anew version of an operator previously supplied by Goebel, while the secondone is new and uses the proximal average of a function and a quadratic toflnd a smooth approximation of the function.Then, the kernel average of two functions is studied and a reformulationof the proximal average is used to extend the deflnition of the kernel aver-age to allow for any number of functions. The Fenchel conjugate of this newkernel average is then examined by calculating the conjugate for two spe-ciflc kernel functions that represent two of the simplest cases that could beconsidered. A closed form solution was found for the conjugate of the flrstkernel function and it was rewritten in three equivalent forms. A solutionwas also found for the conjugate of the second kernel function, but the twosolutions do not have the same form which suggests that a general solutionfor the conjugate of any kernel function will not be found.iiTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . viiDedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 Hilbert Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22.1 General Vector Spaces . . . . . . . . . . . . . . . . . . . . . . 22.2 Inner Product Spaces . . . . . . . . . . . . . . . . . . . . . . 42.3 Facts on Maximization and Minimization . . . . . . . . . . . 73 Convex Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 83.1 Convex Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . 83.1.1 Cones . . . . . . . . . . . . . . . . . . . . . . . . . . . 93.1.2 Interiors of Sets . . . . . . . . . . . . . . . . . . . . . 103.2 Convex Functions . . . . . . . . . . . . . . . . . . . . . . . . 11iiiTable of Contents3.2.1 Subgradients . . . . . . . . . . . . . . . . . . . . . . . 173.2.2 Fenchel Conjugate . . . . . . . . . . . . . . . . . . . . 183.2.3 Epi-addition and Epi-multiplication . . . . . . . . . . 243.3 Proximity Operators . . . . . . . . . . . . . . . . . . . . . . . 273.4 Minimax Theory . . . . . . . . . . . . . . . . . . . . . . . . . 284 The Proximal Average . . . . . . . . . . . . . . . . . . . . . . 304.1 Deflnitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324.2.1 Fenchel Conjugate . . . . . . . . . . . . . . . . . . . . 334.3 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . 414.3.1 Self-dual Smooth Approximations . . . . . . . . . . . 415 The Kernel Average of Two Functions . . . . . . . . . . . . 475.1 Deflnition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496 The Kernel Average of n Functions . . . . . . . . . . . . . . 516.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 516.2 The Kernel Average Conjugate . . . . . . . . . . . . . . . . . 536.3 P-norm Kernel Conjugate for General p Case when n = 3 . . 596.4 P-norm Kernel Conjugate when p = 32, q = 3, and n = 3 . . . 636.4.1 Case 1: x ‚ y1, x ‚ y1 +y2, x ‚ 0 . . . . . . . . . . . 646.4.2 Case 2: x • y1, x ‚ y1 +y2, x ‚ 0 . . . . . . . . . . . 676.4.3 Case 3: x ‚ y1, x • y1 +y2, x ‚ 0 . . . . . . . . . . . 696.4.4 Case 4: x ‚ y1, x ‚ y1 +y2, x • 0 . . . . . . . . . . . 73ivTable of Contents6.4.5 Case 5: x • y1, x • y1 +y2, x ‚ 0 . . . . . . . . . . . 776.4.6 Case 6: x • y1, x ‚ y1 +y2, x • 0 . . . . . . . . . . . 806.4.7 Case 7: x ‚ y1, x • y1 +y2, x • 0 . . . . . . . . . . . 846.4.8 Case 8: x • y1, x • y1 +y2, x • 0 . . . . . . . . . . . 886.4.9 Combining the Eight Cases . . . . . . . . . . . . . . . 906.4.10 Bringing It All Together . . . . . . . . . . . . . . . . 927 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102AppendicesA Maple Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104vList of Figures3.1 Epigraph of a function . . . . . . . . . . . . . . . . . . . . . . 136.1 Case 1 summary . . . . . . . . . . . . . . . . . . . . . . . . . 666.2 Case 2 summary . . . . . . . . . . . . . . . . . . . . . . . . . 706.3 Case 3 summary . . . . . . . . . . . . . . . . . . . . . . . . . 736.4 Case 4 summary . . . . . . . . . . . . . . . . . . . . . . . . . 776.5 Case 5 summary . . . . . . . . . . . . . . . . . . . . . . . . . 816.6 Case 6 summary . . . . . . . . . . . . . . . . . . . . . . . . . 846.7 Case 7 summary . . . . . . . . . . . . . . . . . . . . . . . . . 876.8 Case 8 summary . . . . . . . . . . . . . . . . . . . . . . . . . 896.9 Overview of the eight cases . . . . . . . . . . . . . . . . . . . 976.10 The six regions . . . . . . . . . . . . . . . . . . . . . . . . . . 986.11 Plots of g⁄2(y1;y2;¡y1 ¡y2) . . . . . . . . . . . . . . . . . . . 98viAcknowledgementsFirst, I would like to thank my supervisors Heinz Bauschke and Shawn Wangfor their excitement about mathematics and their  exibility when it cameto fltting mathematics into my life. I’d also like to thank Liangjin Yao andMason Macklem for all their help, and Richard Taylor who flrst inspired meto pursue graduate studies.viiDedicationFor my wonderful husband Jim and our daughter Zoe, who was born duringthe production of this thesis.viiiChapter 1IntroductionWhen averaging functions the most natural place to start is the arithmeticaverage, deflned byA := ‚f1 +(1¡‚)f2; (1.1)where 0 • ‚ • 1. This works well if both functions f1 and f2 are every-where deflned. But if the functions are not difierentiable at some pointsor if their domains do not intersect, then the arithmetic average will notbe difierentiable or will be +1 everywhere. In [2] Bauschke, Lucet, andTrienis deflne a new average, the proximal average, and discuss the benefltsof using the proximal average for these types of cases. In particular, theproximal average produces a continuous and difierentiable function even ifthe original functions are non-smooth and their domains do not intersect,provided at least one of the functions is difierentiable with a full domain.From the deflnition of the proximal average, the more general kernel av-erage [3] was deflned for averaging two functions based on a kernel function.Both the arithmetic and proximal averages can be derived as special cases ofthe kernel average. This thesis extends the deflnition of the kernel averageto an arbitrary number of functions and examines the convex conjugate ofthe kernel average for n functions.1Chapter 2Hilbert SpacesIn this chapter we give some background material on inner product spaces.The notion of vector spaces and their extensions are central to much of thefollowing thesis, so a quick reminder of some concepts from linear algebra isalso included to refresh the reader’s memory. For more on vector spaces see[7, Chapters 1 and 7] or [11, Chapter 4].2.1 General Vector SpacesDeflnition 2.1.1 (Vector Space) A vector space consists of a set V withelements called vectors, along with two operations such that the followingproperties hold:(1) Vector addition: Let u;v 2 V, then there is a vector u+v 2 V and thefollowing are satisfled.(i) Commutativity: u+v = v +u; 8u;v 2 V:(ii) Associativity: u+(v +w) = (u+v)+w; 8u;v;w 2 V:(iii) Zero: there is a vector 0 2 V such that 0+u = u = u+0; 8u 2 V:(iv) Inverses: for each u 2 V, there is a vector ¡u such that u+(¡u) =0:22.1. General Vector Spaces(2) Scalar multiplication: Let u;v 2 V and r;s 2R, then the following aresatisfled.(i) Left distributivity: (r +s)v = rv +sv.(ii) Associativity: r(sv) = (rs)v.(iii) Right distributivity: r(u+v) = ru+rv.(iv) Neutral element: 1v = v.(v) Absorbing element: 0v = 0.(vi) Inverse neutral element: (¡1)v = ¡v.Example 2.1.2 The spaceRn consists of vectors v = (v1;¢¢¢ ;vn) with vi 2R for 1 • i • n and operations deflned by(u1;¢¢¢ ;un)+(v1;¢¢¢ ;vn) := (u1 +v1;¢¢¢ ;un +vn)r(v1;¢¢¢vn) := (rv1;¢¢¢ ;rvn);where r 2R.Deflnition 2.1.3 A subspace of a vector space V is a subset W of V withW 6=? and W is a vector space using the operations of V.Deflnition 2.1.4 Let S  V, the span of S is the smallest subspace con-taining S and is denoted by spanS.Fact 2.1.5 The subspace spanned by a nonempty set S  V consists of alllinear combinations of the elements of S.32.2. Inner Product SpacesDeflnition 2.1.6 A linear transformation A from a vector space V to avector space W is a function A : V ! W satisfyingA(r1v1 +r2v2) = r1Av1 +r2Av2; 8v1;v2 2 V and 8r1;r2 2R:2.2 Inner Product SpacesWe recall the deflnitions of a norm and an inner product.Deflnition 2.2.1 A norm on a vector space V is a function k¢ k : V ![0;+1[ with the following properties.(i) Positive deflnite: kxk = 0 if, and only if, x = 0,(ii) Homogeneous: kfixk = jfijkxk; 8x 2 V and fi 2R,(iii) Triangle inequality: kx+yk•kxk+kyk; 8x;y 2 V.Deflnition 2.2.2 An inner product on a vector space V is a function h¢;¢i :V £V !R satisfying the following properties.(i) Positive deflnite: hx;xi‚ 0; 8x 2 V and hx;xi = 0 only if x = 0;(ii) Symmetry: hx;yi = hy;xi; 8x;y 2 V;(iii) Bilinearity hfix+fly;zi = fihx;zi+flhy;zi; 8x;y;z 2 V and fi;fl 2R.We call a vector space paired with an inner product and norm induced bykxk := hx;xi1=2, an inner product space.Deflnition 2.2.3 In a normed vector space (V;k ¢ k), a sequence (vn)1n=1converges to v 2 V if limn!1kvn ¡vk = 0.42.2. Inner Product SpacesDeflnition 2.2.4 A sequence (vn)1n=1 is called a Cauchy sequence if forevery † > 0, there is an integer N > 0 such that kvn ¡ vmk < † for alln;m ‚ N.Remark 2.2.5 While every convergent sequence is a Cauchy sequence, theconverse is not true. For example, consider the sequence1; 1410; 141100; 14141000; 1414210000;¢¢¢in Q approaching p2. This sequence does not converge in Q.Deflnition 2.2.6 An inner product space V is complete if every Cauchysequence in V converges to some vector v 2 V.Deflnition 2.2.7 A complete inner product space is called a Hilbert space.Example 2.2.8 The following inner product spaces are Hilbert spaces:(i) [7, Theorem 4.2.5] Rn paired with the inner product hx;yi = x1y1 +¢¢¢+xnyn.(ii) [7, Theorem 7.5.8] The space ‘2, consisting of all sequences x = (xn)1n=1such that kxk2 := 1Pn=1x2n¶1=2is flnite, with the inner product hx;yi =1Pn=1xnyn.(iii) [10, Example 2.2-7] The space L2[0;1], consisting of all (equivalenceclasses of) functions f on [0;1] such that kfk2 := (R10 jf(x)j2dx)12 <+1, paired with the inner product hf;gi = R10 f(x)g(x)dx.52.2. Inner Product SpacesDeflnition 2.2.9 [4, Section 1.1] Let X;V be vector spaces and let A : X !V be a linear operator. The corresponding adjoint linear transformationfrom V to X is the unique operator A⁄ such that the following identity holdshAx;yi = hx;A⁄yi (2.1)for all x 2 X and y 2 V.Example 2.2.10 Let X be a vector space and ‚1;¢¢¢ ;‚n 2 R. Let A :Xn ! X be the linear operator deflned by x = (x1;¢¢¢ ;xn) 7! ‚1x1 +¢¢¢+ ‚nxn. Then the adjoint of A is the operator A⁄ : X ! Xn such thatz 7! (‚1z;¢¢¢ ;‚nz).Proof. Let x 2 Xn and y 2 X, thenhAx;yi = h‚1x1 +¢¢¢+‚nxn;yi =nXi=1‚ihxi;yi:On the other hand,hx;A⁄yi = h(x1;¢¢¢ ;xn);(‚1y;¢¢¢ ;‚ny)i= hx1;‚1yi+¢¢¢+hxn;‚nyi =nXi=1‚ihxi;yi:Since hAx;yi = hx;A⁄yi 8x 2 Xn;y 2 X and the adjoint is unique then A⁄is the adjoint of A. ¥62.3. Facts on Maximization and Minimization2.3 Facts on Maximization and MinimizationIn this section, we assume that X is an inner product space.Fact 2.3.1 Let x 2 X thensupy2Xhx;yi =8>><>>:0; if x = 0;+1; otherwise.Fact 2.3.2 Let f and g be functions from X ! ]¡1;+1], then(i) supx;y2X(f(x)+g(y)) = supx2X(f(x))+ supy2X(g(y))(ii) infx;y2X(f(x)+g(y)) = infx2X(f(x))+ infy2X(g(y)):Fact 2.3.3 Let x;y 2 X then(i) supx2Xsupy2Xf(x;y) = supy2Xsupx2Xf(x;y)(ii) infx2Xinfy2Xf(x;y) = infy2Xinfx2Xf(x;y)These will be used frequently later on.7Chapter 3Convex AnalysisLet H denote a real Hilbert space with inner product h¢;¢i, norm k¢k, andidentity mapping Id. We’ll now introduce some necessary convex analysis,for a more in-depth look at convex analysis please see [12].3.1 Convex SetsDeflnition 3.1.1 A set A  H is a–ne if x 2 A, y 2 A, and  2R implythat  x+(1¡ )y 2 A.Deflnition 3.1.2 A set C  H is convex if x 2 C, y 2 C, and 0 •  • 1imply that  x+(1¡ )y 2 C.This means that for any two points in a convex set C, the line segmentjoining the two points is also contained in C.Example 3.1.3 The following sets are convex:(i) A–ne sets;(ii) Halfspaces: A set H is a halfspace if for some b 2 H and fl 2 R,H = fx 2H : hx;bi• flg;83.1. Convex Sets(iii) Closed ball of radius r > 0 centered at a point xc: B(xc;r) := fx 2H :kxc ¡xk• rg:The following deflnitions describe some important properties of sets that arefrequently used in convex analysis.Deflnition 3.1.4 The indicator function of a set C  H is the function¶C : H ! [0;+1] deflned by¶C(x) =8>><>>:0; if x 2 C;+1; otherwise:Deflnition 3.1.5 The support function of a set C  H is the function C : H ! ]¡1;+1] deflned by C(x) = supu2Chx;ui:3.1.1 ConesA set C is called a cone if for every x 2 C and  > 0 we have  x 2 C. Aset C is a convex cone if it is both convex and a cone. That is, for everyx1;x2 2 C and  1; 2 ‚ 0 then  1x1 + 2x2 2 C.Deflnition 3.1.6 The conical hull of a set C  H is the setconeC =[‚>0f‚x : x 2 Cg:93.1. Convex SetsFact 3.1.7 [5, Section 2.1.5] The conical hull of C is the smallest convexcone that contains the set C.Example 3.1.8 Let D = fz = (x;y;0) 2 R3 : kzk • 1g be a closed unitdisc in R3, then coneD =R2 £f0g.Deflnition 3.1.9 Let C be a nonempty convex set in H. We say that Crecedes in the direction of y, y 6= 0, if and only if x+‚y 2 C for every ‚ ‚ 0and x 2 C. Directions in which C recedes are also called the directions ofrecession.Deflnition 3.1.10 The recession cone of a set C is the set of all vectorsy 2 H such that for each x 2 C, x + ‚y 2 C for all ‚ ‚ 0. The recessioncone of C is denoted by 0+C.Fact 3.1.11 [12, Theorem 8.1] Let C be a non-empty convex set. Then therecession cone, 0+C, is a convex cone containing the origin.3.1.2 Interiors of SetsDeflnition 3.1.12 The interior of a set C  H is the setintC = fx 2H : 9† > 0 such that B(x;†)  Cg;where B(x;†) = fy : ky ¡xk < †g.Deflnition 3.1.13 The relative interior of a convex set C  H is the setriC = fx 2H : cone(C ¡x) = span(C ¡x)g:103.2. Convex FunctionsThefollowingexampleillustratestheneedtodistinguishbetweentheinteriorand the relative interior of a set.Example 3.1.14 Consider again the closed disc D = fz = (x;y;0) 2R3 :kzk • 1g.We get that intD =? since no ball in R3 can be contained in D,however riD = fz = (x;y;0) 2R3 : kzk < 1g.Deflnition 3.1.15 A point x is a limit point of a subset C  H if thereis a sequence (xn)1n=1 with xn 2 C such that x = limn!1xn. A set C  H isclosed if it contains all of its limit points.Deflnition 3.1.16 The closure of a set C  H is the smallest closed setcontaining C, and is denoted by clC.3.2 Convex FunctionsThe efiective domain of a function f : H !]¡1;+1] is the set of points:domf = fx 2H : f(x) < +1g: (3.1)The set of global minimizers of f is denoted by argminx2Hf(x).Deflnition 3.2.1 We call a function f proper if it never takes on the valueof ¡1 and is not identically equal to +1.Deflnition 3.2.2 A function f is lower semicontinuous at a point x0 ifliminfx!x0f(x) ‚ f(x0);113.2. Convex Functionswhere liminf is as deflned in [13, Deflnition 1.5]. The function is said to belower semicontinuous if it is lower semicontinuous at every point x0 2H.Deflnition 3.2.3 A function f is coercive iflimkxk!1f(x) = 1:A function is supercoercive iflimkxk!1f(x)kxk = 1:Deflnition 3.2.4 The epigraph of a function f : H ! ]¡1;+1] is the setepif = f(x;t) 2H£R: f(x) • tg:This is illustrated in Figure 3.1.Deflnition 3.2.5 A function f : H ! ]¡1;+1] is convex if epif is aconvex set.We denote the set of proper, lower semicontinuous, convex functions in Hby ¡0(H). While Deflnition 3.2.5 conveniently ties the notion of a convexfunction to that of a convex set, in practice a convex function is usuallysynonymous with the following result.123.2. Convex FunctionsFigure 3.1: Epigraph of a function[5, Figure 3.5]Theorem 3.2.6 [12, Theorem 4.1] A function f : H ! ]1;+1] is convexif and only if for x;y 2H and 0 <  < 1,f( x+(1¡ )y) •  f(x)+(1¡ )f(y):A function is said to be strictly convex if the inequality in Theorem 3.2.6 isstrict; that is, that f( x + (1 ¡  )y) <  f(x) + (1 ¡  )f(y) provided thatx 6= y.Fact 3.2.7 [14, Theorem 2.5.1(ii) and Proposition 2.5.6] If a function f isboth coercive and strictly convex then it has a unique minimizer, „x. That is,argminx2Hf(x) = f„xg.133.2. Convex FunctionsDeflnition 3.2.8 A function f : H ! ]1;+1] is concave if ¡f is convex.Deflnition 3.2.9 A function f : H ! ]1;+1] is a–ne if f is flnite andboth convex and concave.Example 3.2.10 Let f : H ! ]1;+1] be deflned by x 7! ha;xi¡b wherea 2H and b 2R. Then f is an a–ne function.Proof. Let x;y 2H and  2R, thenf( x+(1¡ )y) = ha; x+(1¡ )yi¡b=  (ha;xi¡b)+(1¡ )(ha;yi¡b)=  f(x)+(1¡ )f(y):Therefore the conditions for convexity and concavity are both satisfled andf is a–ne. ¥Another method of determining if a function is convex is by checking theflrst and second order conditions for convexity.Fact 3.2.11 (First order condition) [5, Section 3.1.3] Let f : Rn !]¡1;+1] be a difierentiable function; that is, its gradient rf exists ateach point of its open domain. Then f is convex if and only if its domainis convex andf(y) ‚ f(x)+hrf(x);y ¡xiholds for all x;y 2 domf.This condition says that for a convex function the flrst-order Taylor seriesapproximation is a global underestimator of the function, and conversely if143.2. Convex Functionsthe Taylor approximation is a global underestimator then the function isconvex.Fact 3.2.12 (Second order condition) [5, Section 3.1.4] Let f : Rn !]¡1;+1] be a twice difierentiable function; that is, its Hessian or secondderivative r2f(x) exists at each point of it open domain. Then f is convexif and only if domf is convex and its Hessian is positive semideflnite:r2f(x) ” 0:Remark 3.2.13 A matrix A 2 Rm£n is positive semideflnite if yTAy ‚ 0for all y 2Rn, and is denoted by A ” 0.Fact 3.2.14 (Composition with a–ne mapping) [5, Section 3.2.2] Sup-pose f :Rn !R, A 2Hm£n, and b 2Rn. Deflne g :Rn !R byg(x) = f(Ax+b);with domg = fx : Ax + b 2 domfg. Then if f is convex, so is g; if f isconcave, so is g.Deflnition 3.2.15 Given two functions f;g from H ! ]¡1;+1], f is saidto majorize g if f(x) ‚ g(x) 8x 2H.Deflnition 3.2.16 Let f be a function from H ! R. The lower semi-continuous hull of f is the greatest lower semi-continuous function majorizedby f, i.e, the function whose epigraph is the closure of the epigraph of f.153.2. Convex FunctionsDeflnition 3.2.17 Let f 2 ¡0(H). The recession function of f is the func-tion f1 : H ! ]¡1;+1] whose epigraph is the recession cone of the epi-graph of f, 0+(epif).Fact 3.2.18 [14, Theorem 2.1.5 (ii)] Let f 2 ¡0(H) be such that int(domf) 6=? and let t0 2 domf. The function ’t0 : domfnft0g!R deflned by’t0 := f(t)¡f(t0)t¡t0;is nondecreasing; if f is strictly convex then ’t0 is increasing.Proposition 3.2.19 Let f 2 ¡0(H) and x0 2 domf, then the recessionfunction of f isf1(u) = limt!1 f(x0 +tu)¡f(x0)tfor all u 2H.Proof. Using Deflnition 3.2.17 and Deflnition 3.1.10 we have(u;‚) 2 0+(epif) ,8t > 0 : (x0;f(x0))+t(u;‚) 2 epif, f(x0 +tu)¡f(x0)t • ‚, supt>0f(x0 +tu)¡f(x0)t • ‚:Fix u 2H, the function t 7! f(x0+tu)¡f(x0)t is nondecreasing on ]0;+1] dueto the convexity of f and Fact 3.2.18. Then,(u;‚) 2 0+(epif) , limt!1 f(x0 +tu)¡f(x0)t = supt>0f(x0 +tu)¡f(x0)t • ‚:163.2. Convex FunctionsTherefore for all u 2H, f1(u) = limt!1 f(x0+tu)¡f(x0)t . ¥3.2.1 SubgradientsIn order to deal with nonsmooth convex functions, we will now introduce aconcept that is analogous to a derivative of a difierentiable function.Deflnition 3.2.20 We say that y 2H is a subgradient of a convex functionf at the point x iff(x)+hy;z ¡xi• f(z) (8z 2H): (3.2)The set of all subgradients of f at x is called the subdifierential of f at x andis denoted by @f(x). That is,@f(x) := fy 2H : f(x)+hy;z ¡xi• f(z) 8z 2Hg: (3.3)Example 3.2.21 Let f(x) = jxj, then@f(x) =8>>>>>><>>>>>>:f¡1g; if x < 0;[¡1;1]; if x = 0;f1g; if x > 0:Fact 3.2.22 [12, Theorem 26.1] Let f : H ! ]¡1;+1] be a difierentiablefunction. Then @f = frfg.Fact 3.2.23 [14, Theorem 2.5.7] If f is a proper convex function, thenx 2 domf is a minimum point for f if and only if 0 2 @f(x). In particular,173.2. Convex Functionsif f is difierentiable at x then x is a minimum if rf(x) = 0.Fact 3.2.24 (Bronstead-Rockafellar) [14, Theorem 3.1.2] Let H be aHilbert space and f 2 ¡0(H). Then domf  dom@f and domf⁄  ran@f.3.2.2 Fenchel ConjugateDeflnition 3.2.25 Let f : H ! ]¡1;+1]. The Fenchel conjugate, orconvex conjugate, of f is deflned asf⁄(y) = supx2domf(hx;yi¡f(x));for all y 2H.It is interesting to note that since hx;yi¡ f(x) is a–ne with respect to yfor a flxed x, then f⁄ is the supremum of a set of convex functions andis therefore convex regardless of whether f is convex or not. In the casewhere f is a flnite, coercive and twice continuously difierentiable function,the Fenchel conjugate is also referred to as the Legendre transform of f [13,Example 11.9].Proposition 3.2.26 (Fenchel-Young Inequality) Let f : H ! ]¡1;+1].The following holdsf(x)+f⁄(x⁄) ‚hx;x⁄i (3.4)for all x;x⁄ 2H.Proof. Follows directly from the deflnition of the conjugate function. ¥Proposition 3.2.27 If f1 • f2 then f⁄1 ‚ f⁄2.183.2. Convex FunctionsProof. Let y 2 H, since f1 • f2 then hx;yi¡f1(x) ‚ hx;yi¡f2(x) for allx 2H. Taking the supremum over x of each side yieldssupx2H(hx;yi¡f1(x)) ‚ supx2H(hx;yi¡f2(x)):Therefore f⁄1(y) ‚ f⁄2(y). ¥Proposition 3.2.28 Let f : H ! ]¡1;+1] and c 2 R be a constant.Then(f(¢¡c))⁄(x⁄) = hx⁄;ci+f⁄(x⁄):Proof. Let x⁄ 2 H, then (f(¢¡c))⁄(x⁄) = supx(hx⁄;xi¡f(x¡c)). Lettingx¡c = x0, this becomes(f(¢¡c))⁄(x⁄) = supx0¡hx⁄;x0 +ci¡f(x0)¢ = hx⁄;ci+supx0¡hx⁄;x0i¡f(x0)¢= hx⁄;ci+f⁄(x⁄):¥Example 3.2.29 Let q = 12k ¢ k2, then q⁄ = q and this is the only self-conjugate function, i.e. f = f⁄.Proof. From the deflnition of the conjugate we get thatq⁄(y) = supx hx;yi¡ 12kxk2¶Setting h(x) = hx;yi¡ 12kxk2 and difierentiating, we get h0(x) = y ¡x andh00(x) = ¡Id, which is negative deflnite: Hence, h is strictly concave and193.2. Convex Functionstherefore the critical point x = y is the maximizer. Setting x = y in theequation above yields q⁄(y) = 12kyk2:On the other hand, suppose f is a convex function satisfying f = f⁄. Thenf is proper and using Proposition 3.2.26 we gethx;xi• f(x)+f⁄(x) = 2f(x), q(x) • f(x)Then by Fact 3.2.27 f⁄ • q⁄ = q. Since f⁄ = f, we get q • f • q, thereforef = q. ¥Due to its frequent use, from here on we will use the notation thatq := 12k¢k2: (3.5)Example 3.2.30 Let f :R!R be deflned by f(x) = 1pjxjp then for p = 1f⁄(x⁄) =8>><>>:0; if ¡1 • x⁄ • 1;+1; otherwise:And when p > 1f⁄(x⁄) = 1qjx⁄jqwhere 1p + 1q = 1.Proof. When p = 1, f(x) = jxj and f⁄(x⁄) = supx2R(xx⁄ ¡jxj)203.2. Convex FunctionsIfx⁄ ‚ 0, f⁄(x⁄) = supx2R+(xx⁄ ¡x) = supx2R+(x(x⁄ ¡1)) =8>><>>:+1; if x⁄ > 1;0; if x⁄ • 1:Ifx⁄ < 0, f⁄(x⁄) = supx2R¡(xx⁄ +x) = supx2R¡(x(x⁄ +1)) =8>><>>:0; if x⁄ ‚¡1;+1; if x⁄ < ¡1:Altogether,f⁄(x⁄) =8>><>>:0; if ¡1 • x⁄ • 1;+1; otherwise:When p > 1, f(x) = 1pjxjp and f⁄(x⁄) = supx2R‡xx⁄ ¡ 1pjxjp·Difierentiating to flnd the critical point,ddx(xx⁄ ¡ 1pjxjp) = x⁄ ¡jxjp¡1 sgn(x) = 0;where sgn(x) denotes the sign of x. Then solving for x yieldsx⁄ = jxjp¡1 sgn(x) )jx⁄j = jxjp¡1 )jxj = jx⁄j 1p¡1:Substituting this back into the deflnition of the conjugate,xjxjp¡1 sgn(x)¡ 1pjxjp = jxjjxjp¡1 ¡ 1pjxjp = jxjp ¡ 1pjxjp= (p¡1p )jxjp = (p¡1p )(jx⁄j 1p¡1)p = (p¡1p )jx⁄j pp¡1:Letting q = pp¡1, we get that f⁄(x⁄) = 1qjx⁄jq where 1p + 1q = 1. ¥213.2. Convex FunctionsExample 3.2.31 Let f(x) = 1pjx ¡ cjp for some constant c with p > 1.Then f is convex.Proof. Letg(z) = (1qjzjq)⁄ = 1pjzjp;where 1p + 1q = 1: Then g(z) is convex since it is a conjugate function andf(x) is convex by Fact 3.2.14. ¥Fact 3.2.32 (Biconjugate theorem) [14, Theorem 2.3.3] Assume thatf : H ! ]¡1;+1] is a proper function. Then f⁄⁄ := (f⁄)⁄ = f if andonly if f 2 ¡0(H).Fact 3.2.33 (Conjugate of the difierence of functions) [8, Theorem 2.1]Let g 2 ¡0(H) and let h 2 ¡0(H) such that h and h⁄ both have full domain.Then(8x⁄ 2H)(g ¡h)⁄(x⁄) = supy⁄2H(g⁄(y⁄)¡h⁄(y⁄ ¡x⁄)) (3.6)Deflnition 3.2.34 Let f : H ! ]¡1;+1], and A be a linear transforma-tion from H to H. DeflneAf(x) := infff(y) : Ay = xg:Proposition 3.2.35 Let f : H ! ]¡1;+1], and A be a linear transfor-mation from H to H. Then(Af)⁄ = f⁄ –A⁄:223.2. Convex FunctionsProof. Let x⁄ 2H, then(Af)⁄(x⁄) = supx2H(hx;x⁄i¡Af(x)) = supx2H hx;x⁄i¡ inffy:Ay=xgf(y)¶= sup(hx;x⁄i¡f(y) : (x;y) 2H£H;Ay = x)= sup(hAy;x⁄i¡f(y) : y 2H)= sup(hy;A⁄x⁄i¡f(y) : y 2H)= f⁄(A⁄x⁄) = (f⁄ –A⁄)(x⁄):¥Fact 3.2.36 (Fenchel’s Duality Theorem) [12, Theorem 31.1] Let f andg be a proper convex functions on H. Suppose at least one of the followingholds:(i) ri(domf)\ri(domg) 6=?,(ii) f and g are lower semicontinuous, and ri(domf⁄)\ri(domg⁄) 6=?.Theninfx (f(x)+g(x)) = supx⁄(¡g⁄(¡x⁄)¡f⁄(x⁄)):Fact 3.2.37 [14, Theorem 2.3.1] Let f : H ! ]¡1;+1] and fi;fl 2 R.If fi > 0 then (fif)⁄(x⁄) = fif⁄(fi¡1x⁄) for every x⁄ 2 H; if fl 6= 0 then(f(fl¢))⁄(x⁄) = f⁄(fl¡1x⁄) for every x⁄ 2H.233.2. Convex Functions3.2.3 Epi-addition and Epi-multiplicationDeflnition 3.2.38 Let f1;f2 2 ¡0(H). The inflmal convolution, or epi-addition, is deflned by(f ⁄g)(x) := infx1+x2=x(f1(x1)+f2(x2)) (3.7)for all x 2H.The inflmal convolution is said to be exact at a point x if the inflmum isattained.Deflnition 3.2.39 Let f 2 ¡0(H) and fi ‚ 0. We deflne epi-multiplicationbyfi?f =8>><>>:fif(¢=fi); if fi > 0;¶f0g; if fi = 0:(3.8)Proposition 3.2.40 [1, Proposition 3.1] Let f;f1;¢¢¢ ;fn 2 ¡0(H). Thenthe following properties hold:(i) dom(fi?f) = fi(domf)(ii) dom(f1⁄¢¢¢⁄fn) = (domf1)+¢¢¢+(domfn)Proposition 3.2.41 Let fi ‚ 0. Then the following hold:(i) (fif)⁄ = fi?f⁄;(ii) (fi?f)⁄ = fif⁄;(iii) (f1⁄¢¢¢⁄fn)⁄ = f⁄1 +¢¢¢+f⁄n:243.2. Convex FunctionsProof. (i) Let x⁄ 2H, then we consider two cases.(1) If fi > 0,(fif)⁄(x⁄) = supx(hx;x⁄i¡fif(x)) = fisupx(hx;x⁄=fii¡f(x))= fif⁄(x⁄=fi):(2) If fi = 0(fif)⁄(x⁄) = supx(hx;x⁄i¡0]) =8>><>>:0; if x⁄ = 0;+1; otherwise= ¶f0g(x⁄) = (fi?f⁄)(x⁄):(3.9)Altogether, (fif)⁄(x) = (fi?f⁄)(x⁄).(ii)(fi?f)⁄ =8>><>>:supx(hx;x⁄i¡fif(x=fi)) if fi > 0supx¡hx;x⁄i¡¶f0g(x)¢ if fi = 0=8>><>>:fisupx(hx=fi;x⁄i¡f(x=fi)) if fi > 00 if fi = 0= fif⁄(x⁄):253.2. Convex Functions(iii)(f1⁄¢¢¢⁄fn)⁄(x⁄) = supx hx;x⁄i¡ infx1+¢¢¢+xn=x(f1(x1)+¢¢¢+fn(xn))¶= supx hx1 +¢¢¢+xn;x⁄i+ supx1+¢¢¢+xn=x(¡f1(x1)¡¢¢¢¡fn(xn))¶= supx1(hx1;x⁄i¡f1(x1))+¢¢¢+supx1(hx1;x⁄i¡f1(x1))= (f⁄1 +¢¢¢+f⁄n)(x⁄):¥Fact 3.2.42 [1, Fact 3.4] The following hold.(i) If intdomf1 \¢¢¢intdomfn¡1 \domfn 6= ?, then (f1 +¢¢¢+ fn)⁄ =f⁄1 ⁄¢¢¢⁄f⁄n and the inflmal convolution is exact.(ii) If intdomf⁄1\¢¢¢intdomf⁄n¡1\domf⁄n 6=?, then f1⁄¢¢¢⁄fn is exactand epi(f1⁄¢¢¢⁄fn) = (epif1)+¢¢¢+(epifn):Lemma 3.2.43 (‚1 ?(f1 + q)⁄¢¢¢⁄‚n ?(fn + q))⁄ = ‚1(f⁄1 ⁄q) + ¢¢¢ +‚n(f⁄n⁄q)Proof. Using Proposition 3.2.41(iii), Proposition 3.2.41(i), Fact 3.2.42, andExample 3.2.29, we get that(‚1 ?(f1 + q)⁄¢¢¢⁄‚n ?(fn + q))⁄ = (‚1 ?(f1 + q))⁄ +¢¢¢+(‚n ?(fn + q))⁄= ‚1(f1 + q)⁄ +¢¢¢+‚n(fn + q)⁄= ‚1(f⁄1 ⁄q⁄)+¢¢¢+‚n(f⁄n⁄q⁄)= ‚1(f⁄1 ⁄q)+¢¢¢+‚n(f⁄n⁄q):(3.10)263.3. Proximity Operators¥The following lemma illustrates the beauty of the epi-multiplication no-tation and will be used for a couple of proofs in the following chapters.Lemma 3.2.44 Let fi : H ! ]¡1;+1] for 1 • i • n. Let f = (f1;¢¢¢ ;fn),x = (x1;¢¢¢ ;xn) and ~f(x) = P‚ifi(xi): Then ~f⁄(x⁄) =nPi=1‚i ?f⁄i (x⁄i).Proof.~f⁄(x⁄) = supx2Hfhx;x⁄i¡ ~f(x)g= supx=(x1;¢¢¢;xn)2Hfhx1;x⁄1i+¢¢¢hxn;x⁄ni¡X‚ifi(xi)g= ‚1 supx1fhx1;x⁄1=‚1i¡f1(x1)g+¢¢¢+‚n supxnfhxn;x⁄n=‚ni¡fn(xn)g= ‚1f⁄1(x⁄1‚1)+¢¢¢+‚nf⁄n(x⁄n‚n) = ‚1 ?f⁄1(x⁄1)+¢¢¢+‚n ?f⁄n(x⁄n):¥3.3 Proximity OperatorsDeflnition 3.3.1 The proximity operator, or proximal mapping, of a func-tion f 2 ¡0(H) is deflned by(8x 2H) Proxf x = argminy2H(f(y)+ q(x¡y)): (3.11)Fact 3.3.2 [6, Section 2.2] For all x 2H and for all p 2Hp = Proxf x , x¡p 2 @f(p);273.4. Minimax TheoryandProxf = (Id+@f)¡1:Remark 3.3.3 Note that since the function y 7! f(y)+q(x¡y) is strictlyconvex and supercoercive, it has a unique minimizer, p = Proxf(x).3.4 Minimax TheoryMinimax problems are optimization problems that involve both minimiza-tion and maximization. Let X and Y be arbitray subsets of H with X 6=?and Y 6= ?, and let F be a function from X £Y to [¡1;+1]. Minimaxtheory deals with problems of the form supx2Xinfy2YF(x;y) or infy2Ysupx2XF(x;y).For more on minimax theory please see chapters 36 and 37 in [12].Deflnition 3.4.1 If supx2Xinfy2YF(x;y) = infy2Ysupx2XF(x;y) then the commonvalue is called the minimax or the saddle-value of F.Deflnition 3.4.2 F is a concave-convex function if F(¢;y) is a concavefunction on X for all y 2 Y and F(x;¢) is convex on Y for all x 2 X.Similarly, F is a convex-concave function if F(¢;y) is convex on X for ally 2 Y and F(x;¢) is concave on Y for all x 2 X.Thefollowingfactgivesusconditionsfordeterminingwhetherthesaddle-value exists.Fact 3.4.3 [12, Theorem 37.3] Let F :Rm £Rn ! ]¡1;+1] be a properconcave-convex function with efiective domain X £ Y. Then either of the283.4. Minimax Theoryfollowing conditions implies that the saddle-value of F exists. If both condi-tions hold, the saddle-value must be flnite.(a) The convex functions F(x;¢) for x 2 riX have no common direction ofrecession.(b) The convex functions ¡F(¢;y) for y 2 riY have no common directionof recession.29Chapter 4The Proximal AverageWhen averaging functions, the traditional arithmetic average‚1f1 +¢¢¢+‚nfn (4.1)is the natural place to begin. However, when the domains of the functions donot intersect then the result of (4.1) is a function that is everywhere inflnity.The proximal average provides a useful method of averaging functions, evenwhen their domains do not intersect.In this chapter, we give a new proof to the self-duality of the proximalaverage:(P(f;‚))⁄ = P(f⁄;‚):We also supply two self-dual smooth operators, Sflf and Tflf.For this chapter, let f1;¢¢¢ ;fn 2 ¡0(H), and ‚1;¢¢¢ ;‚n be strictly pos-itive real numbers such thatnPi=1‚i = 1.304.1. Deflnitions4.1 DeflnitionsDeflnition 4.1.1 (Proximal Average) Let f = (f1;¢¢¢ ;fn) and ‚ = (‚1;¢¢¢ ;‚n).The ‚-weighted proximal average of n functions fi, 1 • i • n, isP(f;‚) = ‚1 ?(f1 + q)⁄¢¢¢⁄‚n ?(fn + q)¡q: (4.2)That is,(8x 2H) P(f;‚)(x) = infnPi=1xi=xnXi=1 ‚i(fi(xi=‚i)+ q(xi=‚i))¶¡q(x):This can be reformulated in two difierent ways.Proposition 4.1.2 The proximal average can be equivalently deflned by(i) P(f;‚)(x) = infP‚iyi=x Pi‚ifi(yi)+Pi‚iq(yi)¶¡q(x)(ii) P(f;‚) = (‚1(f1 + q)⁄ +¢¢¢+‚n(fn + q)⁄)⁄ ¡q:Proof. Using the change of variables yi = xi=‚i in Deflnition 4.1.1 we im-mediately get (i).For (ii), flrst note that by Proposition 3.2.40(ii),(8i 2N)dom(f⁄i ⁄q) = (domf⁄i )+(domq) = H:314.2. PropertiesThen Fact 3.2.42(i), Proposition 3.2.41(i), and Fact 3.2.32 yield(‚1(f1 + q)⁄ +¢¢¢+‚n(fn + q)⁄)⁄ = (‚1(f1 + q)⁄)⁄⁄¢¢¢⁄(‚n(fn + q)⁄)⁄= ‚1 ?(f1 + q)⁄⁄⁄¢¢¢⁄‚n ?(fn + q)⁄⁄= ‚1 ?(f1 + q)⁄¢¢¢⁄‚n ?(fn + q):¥4.2 PropertiesTheorem 4.2.1 (Domain)domP(f;‚) = ‚1 domf1 +¢¢¢+‚n domfnProof. Using Proposition 3.2.40i and Proposition 3.2.40 iidomP(f;‚) = dom(‚1 ?(f1 + q)⁄¢¢¢⁄‚n ?(fn + q))= dom(‚1 ?(f1 + q))+¢¢¢+dom(‚n ?(fn + q))= ‚1 dom(f1 + q)+¢¢¢+‚n dom(fn + q)= ‚1 domf1 +¢¢¢+‚n domfn:(4.3)¥Corollary 4.2.2 If at least one function fi has full domain and ‚i > 0 thenP(f;‚) has full domain.324.2. Properties4.2.1 Fenchel ConjugateNext we examine the conjugate of the proximal average. The purpose ofthis section is to give a new proof for (P(f;‚))⁄ = P(f⁄;‚) without usingToland’s Duality Theorem. First we must prove the following lemma, whichwill also be used later to reformulate the proximal average.Lemma 4.2.3 The following identity holdsnXi=1‚iq(yi)¡q(nXi=1‚iyi) = 14nXi=1nXj=1‚i‚jkyi ¡yjk2: (4.4)Proof. Consider flrst the left hand side of (4.4),nXi=1‚iq(yi)¡q(nXi=1‚iyi)= 12nXi=1‚ikyik2 ¡ 12knXi=1‚iyik2= 12nXi=1‚ikyik2 ¡ 12nXi=1‚2ikyik2 ¡Xi6=j‚i‚jhyi;yji: (4.5)On the other hand, from the right hand side we get,334.2. Properties14nXi=1nXj=1‚i‚jkyi ¡yjk2= 14nXi=1‚i nXj=1‚jkyi ¡yjk2¶= 14nXi=1‚i nXj=1(‚jkyik2 ¡2‚jhyi;yji+‚jkyjk2)¶= 14nXi=1‚i nXj=1‚jkyik2 ¡2nXj=1‚jhyi;yji+nXj=1‚jkyjk2¶= 14nXi=1‚i kyik2 ¡2nXj=1‚jhyi;yji+nXj=1‚jkyjk2¶= 14 nXi=1‚ikyik2 ¡2nXi=1nXj=1‚i‚jhyi;yji+nXi=1‚i nXj=1‚jkyjk2¶¶= 12nXi=1‚ikyik2 ¡ 12nXi=1nXj=1‚i‚jhyi;yji= 12nXi=1‚ikyik2 ¡ 12nXi=1‚2ikyik2 ¡Xi6=j‚i‚jhyi;yji: (4.6)Since (4.5) and (4.6) are equal, the proof is complete. ¥The following lemma is new.Lemma 4.2.4 Letg(y1;¢¢¢ ;yn) = ‚1q(y1)+¢¢¢+‚nq(yn)¡q(‚1y1 +¢¢¢+‚nyn);344.2. Propertiesfor (y1;¢¢¢ ;yn) 2Hn andnPi=1‚n = 1. Theng⁄(x⁄1;¢¢¢ ;x⁄n) =8>><>>:‚1 ?q(x⁄1)+¢¢¢+‚n ?q(x⁄n); if x⁄1 +¢¢¢+x⁄n = 0;+1; otherwise.Proof. For every (x⁄1;¢¢¢ ;x⁄n) 2Hn we haveg⁄(x⁄1;¢¢¢ ;x⁄n) = supy1;¢¢¢;yn hx⁄1;y1i+¢¢¢+hx⁄n;yni¡‚1q(y1)¡¢¢¢¡‚nq(yn)+ q(‚1y1 +¢¢¢+‚nyn)¶: (4.7)In light of Lemma 4.2.3, the equation within the supremum is concave andtherefore solving for critical points will yield the supremum. Taking thepartial derivatives, with respect to yi for i = 1:::n, and setting them equalto zero givesx⁄1 ¡‚1y1 +(‚1y1 +¢¢¢+‚nyn)‚1 = 0... (4.8)x⁄n ¡‚nyn +(‚1y1 +¢¢¢+‚nyn)‚n = 0:Then taking the sum of all of the above equations yieldsnXi=1x⁄i ¡nXi=1‚iyi +nXi=1‚i(nXi=1‚iyi) = 0,nXi=1x⁄i ¡nXi=1‚iyi +nXi=1‚iyi ,nXi=1x⁄i = 0:354.2. PropertiesSo x⁄1 + ¢¢¢ + x⁄n = 0 and if we let y1 = x⁄1=‚1;¢¢¢ ;yn = x⁄n=‚n then(y1;¢¢¢ ;yn) is a solution to (4.8). Consequently, ‚1y1 + ¢¢¢ + ‚nyn = 0and hx⁄i;x⁄i=‚ii = 2‚iq(x⁄i=‚i) = 2(‚i ?q)(x⁄i) for i = 1¢¢¢n. This gives usthathx⁄1;y1i+¢¢¢+hx⁄n;yni¡‚1q(y1)¡¢¢¢¡‚nq(yn)= 2nXi=1‚i ?q(x⁄i)¡‚1 ?q(x⁄1)¡¢¢¢¡‚n ?q(x⁄n)= ‚1 ?q(x⁄1)+¢¢¢+‚n ?q(x⁄n)If Pix⁄i 6= 0 then let y1 = y2 = ¢¢¢ = yn = y and then (4.7) becomesg⁄(x⁄1;¢¢¢ ;x⁄n) ‚ supy hnXi=1x⁄i;yi¡‚1q(y)¡¢¢¢¡‚nq(y)+ q(y)¶‚ supy hnXi=1x⁄i;yi¡q(y)+ q(y)¶‚ supy hnXi=1x⁄i;yi¶= +1:Thus,g⁄(x⁄1;¢¢¢ ;x⁄n) =8>><>>:‚1 ?q(x⁄1)+¢¢¢+‚n ?q(x⁄n); if x⁄1 +¢¢¢+x⁄n = 0;+1; otherwise.¥Remark 4.2.5 It can be noted that @g@yi = ‚iyi ¡(‚1y1 +¢¢¢+ ‚nyn)‚i for364.2. Propertiesi = 1¢¢¢n so that @g@y1 +¢¢¢+ @g@yn = 0. This means thatran@g  f(x⁄1;¢¢¢ ;x⁄n) : x⁄1 +¢¢¢+x⁄n = 0g:Conversely, if x⁄1+¢¢¢+x⁄n = 0 and we let y1 = x⁄1=‚1;¢¢¢ ;yn = x⁄n=‚n thenrg(y1;¢¢¢ ;yn) = (x⁄1;¢¢¢ ;x⁄n) andf(x⁄1;¢¢¢ ;x⁄n) : x⁄1 +¢¢¢+x⁄n = 0g ran@g:Therefore ran@g = f(x⁄1;¢¢¢ ;x⁄n) : x⁄1 +¢¢¢+ x⁄n = 0g. Now since ran@g  domg⁄  ran@g by Fact 3.2.24 and we have ran@g = ran@g then domg⁄ =f(x⁄1;¢¢¢ ;x⁄n) : x⁄1 +¢¢¢+x⁄n = 0g, as we saw in the previous lemma.Theorem 4.2.6 (Fenchel Conjugate) [1, Theorem 5.1](P(f;‚))⁄ = P(f⁄;‚) (4.9)Proof. Letf(x) = P(f;‚)(x) = infnPi=1‚iyi=x ‚1f1(y1)+¢¢¢‚nfn(yn)+‚1q(y1)+¢¢¢+ q(yn)¡q(‚1y1 +¢¢¢+‚nyn)¶;and let A : Hn ! H be a linear operator deflned by A = ‚1; ¢¢¢ ; ‚n¶,i.e. A(x1;¢¢¢ ;xn) =nPi=1‚ixi. Then A⁄ : H ! Hn, the adjoint of A, is374.2. PropertiesA⁄ =0BBBB@‚1...‚n1CCCCA, i.e. A⁄(z) = (‚1z;¢¢¢ ;‚nz). Then we can write f as f = AFwhereF(y) = ‚1f1(y1)+¢¢¢+‚nfn(yn)+‚1q(y1)+¢¢¢+‚nq(yn)¡q(‚1y1+¢¢¢+‚nyn)andAF(y) := inffx:Ax=ygF(x):For ease of notation, say that F = j + g where j(y) = ‚1f1(y1) + ¢¢¢ +‚nfn(yn) and g(y) = ‚1q(y1) + ¢¢¢ + ‚nq(yn) ¡ q(‚1y1 + ¢¢¢ + ‚nyn). ByProposition 3.2.35,f⁄(x⁄) = (AF)⁄(x⁄) = (F⁄ –A⁄)(x⁄):Since j 2 ¡0(H) and domg = H£¢¢¢£H, then int(domf)\int(domg) 6=?and we can use Fact 3.2.42(i) and Lemma 3.2.44 to getf⁄(x⁄) = (j⁄⁄g⁄)A⁄(x⁄)= (‚1 ?f⁄1 +¢¢¢+‚n ?f⁄n)⁄g⁄¶A⁄(x⁄):384.2. PropertiesThen using Lemma 4.2.4 we getf⁄(x⁄) = (‚1 ?f⁄1 +¢¢¢+‚n ?f⁄n)⁄g⁄¶(‚1x⁄;¢¢¢ ;‚nx⁄)= infy1;¢¢¢;yn ‚1f⁄1(y1=‚1)+¢¢¢+‚nf⁄n(yn=‚n)+g⁄(‚1x⁄ ¡y1;¢¢¢ ;‚nx⁄ ¡yn)¶= inf‚1x⁄¡y1+¢¢¢+‚nx⁄¡yn=0 ‚1f⁄1(y1=‚1)+¢¢¢+‚nf⁄n(yn=‚n)+‚1q(‚1x⁄ ¡y1‚1 )+¢¢¢+‚nq(‚nx⁄ ¡yn‚n )¶= infx⁄=y1+¢¢¢+yn ‚1f⁄1(y1=‚1)+¢¢¢+‚nf⁄n(yn=‚n)+‚1q(x⁄ ¡ y1‚1)+¢¢¢+‚nq(x⁄ ¡ yn‚n)¶:Expanding the last set of terms in the above equation yields‚iq(x⁄ ¡ yi‚i) = ‚i2 kx⁄ ¡ yi‚ik2 = ‚i2 kx⁄k2 ¡hx⁄;yii+ ‚i2 kyi‚ik2= ‚iq(x⁄)¡hx⁄;yii+‚iq(yi‚i)for all i = 1¢¢¢n. Taking the sum of all of these terms and substituting backinto the inflmum equation producesf⁄(x⁄) = infx⁄=y1+¢¢¢+yn ‚1f⁄1(y1=‚1)+¢¢¢+‚nf⁄n(yn=‚n)+ q(x⁄)¡hx⁄;y1 +¢¢¢+yni+‚1q(y1‚1)+¢¢¢+‚nq(yn‚n)¶= infx⁄=y1+¢¢¢+yn ‚1f⁄1(y1=‚1)+¢¢¢+‚nf⁄n(yn=‚n)+‚1q(y1‚1)+¢¢¢+‚nq(yn‚n)¡q(x⁄)¶:394.2. PropertiesMaking the simple change of variable zi = yi=‚i for i = 1¢¢¢n generatesf⁄(x⁄) = infx⁄=‚1z1+¢¢¢+‚nzn ‚1f⁄1(z1)+¢¢¢+‚nf⁄n(zn)+‚1q(z1)+¢¢¢‚nq(zn)¡q(x⁄)¶= P(f⁄;‚):¥Example 4.2.7 Let f = (f;f⁄) and ‚ = ¡12; 12¢, then P(f;‚) = q.Proof. By Fact 4.2.6,(P(f;‚))⁄ = P(f⁄;‚):Since f⁄ = (f⁄;f⁄⁄) = (f⁄;f), then we get that P(f⁄;‚) = P(f;‚). There-fore, using Example 3.2.29 we see that (P(f;‚))⁄ = q: ¥Corollary 4.2.8 P(f;‚) is convex, lower semicontinuous, and proper.Proof. We can apply Fact 4.2.6 twice to see that(P(f;‚))⁄⁄ = (P(f⁄;‚))⁄ = P(f;‚):Therefore, in light of Fact 3.2.32 P(f;‚) 2 ¡0(H). ¥Deflnition 4.2.9 (Moreau Envelope) The Moreau envelope of f 2 ¡0(H)with parameter „ > 0 ise„f = f ⁄„?q:Fact 4.2.10 (Moreau Envelope of the Proximal Average) [1, Theo-rem 6.2]404.3. Applications(i) e„P(f;‚) = ‚1e„f1 +¢¢¢+‚ne„fn(ii) (e„P(f;‚))⁄ = ‚1 ?(e„f1)⁄⁄ ¢¢¢⁄‚n ?(e„fn)⁄Fact 4.2.11 (Proximal Mapping) [1, Theorem 6.7]ProxP(f;‚) = ‚1 Proxf1 +¢¢¢+‚n Proxfn4.3 Applications4.3.1 Self-dual Smooth ApproximationsA function f in Rn is smooth if f is flnite and difierentiable everywhere inRn. It can be helpful in cases of nondifierentiable convex functions to flnda smooth approximation of the function. Here, two smooth approximationsare deflned using the proximal average. The flrst smooth operator, Sflf,has a simple expression in terms of the Moreau envelope which can be seenin [9]. The second smooth operator, Tflf, has a simple expression in termsof the proximal average and is new. Both approximations are "self-dual"in the sense that the conjugate of the approximation of f is equal to theapproximation of the conjugate of f.Recall that,P(f1;‚1;¢¢¢ ;fn;‚n) := (‚1(f1 + q)⁄ +¢¢¢+‚n(fn + q)⁄)⁄ ¡q:For 0 • fl • 1 and a proper lower-semicontinuous function f, deflne Sflf :414.3. ApplicationsRn ! ]¡1;+1] bySflf(x) := (1+fl)2P(f; 1¡fl1+fl;q; 2fl1+fl)( x1+fl) (4.10)for all x 2Rn.Theorem 4.3.1 (i) (Sflf)⁄ = Sflf⁄(ii) When 0 < fl • 1, we haveSflf = (1+fl)P(1¡fl;f;fl;0)+flq = (1¡fl)2(f ⁄ 1flq)+flq: (4.11)Therefore when fl ! 0, Sflf ! limfl!0+(f ⁄ 1flq) = f.Proof. (i) By Theorem 4.2.6, we have(Sflf)⁄ = (1+fl)2P(f; 1¡fl1+fl;q; 2fl1+fl)( ¢1+fl)¶⁄By Fact 3.2.37 and Proposition 3.2.41(i) we then get(Sflf)⁄ = (1+fl)2P(f; 1¡fl1+fl;q; 2fl1+fl)¶⁄((1+fl)¢)= (1+fl)2 P(f; 1¡fl1+fl;q; 2fl1+fl)¶⁄((1+fl)¢(1+fl)2)= (1+fl)2P f⁄; 1¡fl1+fl;q; 2fl1+fl¶( ¢(1+fl))= Sflf⁄:(ii) For every x, by the deflnition of the proximal average, Proposi-424.3. Applicationstion 3.2.41(i), and Example 3.2.29Sflf(x) = (1+fl)2• 1¡fl1+fl(f + q)⁄ + 2fl1+fl(q + q)⁄¶⁄( x1+fl)¡ 12 kxk2(1+fl)2‚= (1+fl)2 1¡fl1+fl(f + q)⁄ + fl1+flq¶⁄( x1+fl)¡q(x)= (1+fl) (1¡fl)(f + q)⁄ +flq¶⁄(x)¡q(x)= (1+fl)• (1¡fl)(f + q)⁄ +flq¶⁄(x)¡q(x)‚+flq(x):Thisistheflrstequalityin(4.11). Tocontinue, applyFact3.2.42(i), Fact3.2.32,Example 3.2.29, and Proposition 3.2.41(i) toSflf = (1+fl) (1¡fl)(f + q)⁄ +flq¶⁄(x)¡q(x);to getSflf(x) = (1+fl)[((1¡fl)(f + q)⁄)⁄⁄(flq)⁄](x)¡q(x)= (1+fl)•((1¡fl)(f + q)( ¢1¡fl))⁄ 1flq‚(x)¡q(x):Using Deflnition 3.2.38,Sflf(x) = (1+fl)infu•(1¡fl)(f + q)( u1¡fl)+ 1flq(x¡u)‚¡q(x)= (1+fl)infu•(1¡fl)f( u1¡fl)+(1¡fl)q( u1¡fl)+ 1flq(x¡u)‚¡q(x)= (1¡fl2)infu•f( u1¡fl)+ q( u1¡fl)+ 1fl(1¡fl)q(x¡u)¡ 11¡fl2q(x)‚:(4.12)434.3. ApplicationsSimplifying the portion of (4.12) containing q and using 1(1¡fl)2+ 1fl(1¡fl) =1fl(1¡fl)2 and1fl(1¡fl) ¡1(1¡fl2) =1fl(1¡fl2)q( u1¡fl)+ 1fl(1¡fl)q(x¡u)¡ 11¡fl2q(x)= 1(1¡fl)2 kuk22 +1fl(1¡fl)kxk22 ¡1fl(1¡fl)hx;ui+1fl(1¡fl)kuk22 ¡11¡fl2kxk22= 1fl(1¡fl)2 kuk22 ¡1fl(1¡fl)hx;ui+1fl(1¡fl2)kxk22= 12fl( kuk2(1¡fl)2 ¡2hx;u1¡fli+kxk2)+( 1fl(1¡fl2) ¡1fl)kxk22= 12flkx¡ u1¡flk2 + fl1¡fl2 kxk22 :Plugging this back into (4.12) givesSflf(x) = (1¡fl2)infu f( u1¡fl)+ 1flq(x¡ u1¡fl)+ fl1¡fl2q(x)¶= (1¡fl2)infw f(w)+ 1flq(x¡w)¶+flq(x)= (1¡fl2) f ⁄ 1flq¶(x)+flq(x);which is the second equality of (4.11). The convergence result follows from[13, Theorem 1.25]. ¥Another smooth operator is deflned byTflf := P(f;1¡fl;q;fl): (4.13)Theorem 4.3.2 (i) (Tflf)⁄ = Tflf⁄444.3. Applications(ii) When 0 < fl • 1, we haveTflf = (1¡fl) f ⁄ 2¡flfl q¶( 22¡fl¢)+ fl2¡flq:Proof. (i) This follows from Theorem 4.2.6 and Example 3.2.29(Tflf)⁄ = (P(f;1¡fl;q;fl))⁄ = P(f⁄;1¡fl;q⁄;fl) = P(f⁄;1¡fl;q;fl)= Tflf⁄(ii) Applying Proposition 4.1.2(ii), Proposition 3.2.41(i), Example 3.2.29,Fact 3.2.42(i), and Deflnition 3.2.38,Tflf(x) = ((1¡fl)(f + q)⁄ +fl(2q)⁄)⁄(x)¡q(x)=‡(1¡fl)(f + q)⁄ +flq2·⁄(x)¡q(x)= (1¡fl)(f + q)( ¢1¡fl)⁄ 2flq¶(x)¡q(x)= infu (1¡fl)f( u1¡fl)+(1¡fl)q( u1¡fl)+ 2flq(x¡u)¶¡q(x):This is equivalent to,Tflf(x) = (1¡fl)infu f( u1¡fl)+ q( u1¡fl)+ 2fl(1¡fl)q(x¡u)¡ 11¡flq(x)¶:(4.14)454.3. ApplicationsNote thatq( u1¡fl)+ 2fl(1¡fl)q(x¡u)¡ 11¡flq(x)= 1(1¡fl)2 kuk22 +2fl(1¡fl)kxk2 ¡2hx;ui+kuk22 ¡11¡flkxk22= 2¡flfl(1¡fl)2 kuk22 ¡2hx;uifl(1¡fl) +2¡flfl(1¡fl)kxk22= 2¡fl2fl kuk2(1¡fl)2 ¡2h2x2¡fl;u1¡fli+k2x2¡flk2¶+ 2¡flfl(1¡fl) ¡4fl(2¡fl)¶kxk22= 2¡fl2fl k 2x2¡fl ¡ u1¡flk2 + fl(1¡fl)(2¡fl)kxk22 :Substitute this back into (4.14) to getTflf(x) = (1¡fl)infu f( u1¡fl)+ 2¡fl2fl k 2x2¡fl ¡ u1¡flk2¶+ fl2¡fl kxk22= (1¡fl)infw f(w)+ 2¡flfl q( 2x2¡fl ¡w)¶+ fl2¡flq(x)= (1¡fl) f ⁄ 2¡flfl q¶( 2x2¡fl)+ fl2¡flq(x);which proves the desired equality. ¥46Chapter 5The Kernel Average of TwoFunctions5.1 DeflnitionThe kernel average for two functions is given by Bauschke and Wang in[3] as a generalization of the proximal average. A natural extension of thedeflnition of the proximal average, the kernel average replaces the use of qwith any kernel function g. Using the same notation as the previous chapter,weassumef1;f2;andg arefunctionsin¡0(H), and‚1;‚2 arestrictlypositivereal numbers such that ‚1 +‚2 = 1.Deflnition 5.1.1 (Kernel Average) Let f = (f1;f2) and ‚ = (‚1;‚2),we deflne P(f;‚;g) : H ! [¡1;+1] at x 2H byP(f;‚;g)(x) := inf‚1y1+‚2y2=x(‚1f1(y1)+‚2f2(y2)+‚1‚2g(y1 ¡y2))= infx=z1+z2 ‚1f1(z1‚1)+‚2f2(z2‚2)+‚1‚2g(z1‚1¡ z2‚2)¶:(5.1)We call this the average of f1 and f2 with respect to the kernel g, or theg-average of f1 and f2.475.1. DeflnitionExample 5.1.2 (Arithmetic Average) Set g = ¶f0g, thenP(f;‚;g)(x) = inf‚1y1+‚2y2=x(‚1f1(y1)+‚2f2(y2)+‚1‚2¶0(y1 ¡y2))= inf‚1y1+‚2y1=x(‚1f1(y1)+‚2f2(y1))= ‚1f1(x)+‚2f2(x)is the arithmetic average.Lemma 5.1.3 The following equality holds when ‚1 +‚2 = 1.‚1‚2ky1 ¡y2k2 = ‚1ky1k2 +‚2ky2k2 ¡k‚1y1 +‚2y2k2:Proof. Let y1;y2 2H then by Lemma 4.2.3,‚1ky1k2 +‚2ky2k2 ¡k‚1y1 +‚2y2k2 = 2 142Xi=12Xj=1‚1‚jkyi ¡yjk2¶= 2 14‚1‚2ky1 ¡y2k2 + 14‚2‚1ky2 ¡y1k2¶= ‚1‚2ky1 ¡y2k2¥Example 5.1.4 (Proximal Average) If g = q, thenP(f;‚;g)(x) = inf‚1y1+‚2y2=x ‚1f1(y1)+‚2f2(y2)+‚1‚212ky1 ¡y2k2¶:485.2. PropertiesApplying Lemma 5.1.3P(f;‚;g) = inf‚1y1+‚2y2=x ‚1f1(y1)+‚2f2(y2)+12‚1ky1k2+12‚2ky2k2¡12kxk2¶;which is the proximal average with n = 2.Example 5.1.5 Let f1 = ¶fag and f2 = ¶fbg, with a;b 2R. ThenP(f;‚;g) =8>><>>:‚1‚2g(a¡b) if x = ‚1a+‚2b+1 otherwise:5.2 PropertiesFact 5.2.1 (Fenchel Conjugate) [3, Theorem 2.2] Let f1;f2;g 2 ¡0(H).For every x⁄ 2H,(P(f;‚;g))⁄(x⁄) = (cl’)(‚1x⁄;‚2x⁄) (5.2)where’(u;v) = inf‚1z1+‚2z2=u+v ‚1f⁄1(z1)+‚2f⁄2(z2)+ 12‚1‚2(g⁄( u‚1‚2¡ z1‚2)+g⁄( ¡v‚1‚2¡ z2‚1))¶:Furthermore, if (ridomf1¡ridomf2)Tridomg 6=? then the closure oper-ation in (5.2) can be omitted and we get that(P(f;‚;g))⁄(x⁄) = P(f⁄;‚;(g⁄)_) (5.3)495.2. Propertieswhere for a given function g 2 ¡0(H), let g_(x) = g(¡x), and the inflmumin Deflnition 5.1.1 is attained, that isP(f⁄;‚;(g⁄)_)(x⁄) = minx⁄=‚1z1+‚2z2(‚1f⁄1(z1)+‚2f⁄2(z2)+‚1‚2g⁄(z2 ¡z1)):(5.4)Corollary 5.2.2 Let f1;f2;g 2 ¡0(H), and assume that both g and g⁄ havefull domain. Then both P(f;‚;g) and P(f⁄;‚;(g⁄)_) are in ¡0(H) and(P(f;‚;g))⁄ = P(f⁄;‚;(g⁄)_): (5.5)In particular, for g = 1pk¢kp with p > 1, we have(P(f;‚; 1pk¢kp))⁄ = P(f⁄;‚; 1qk¢kq) (5.6)where 1p + 1q = 1.Can the deflnition of the kernel average be generalized for n functions?What is the reformulation of Deflnition 5.1.1? This will be addressed in thenext chapter.50Chapter 6The Kernel Average of nFunctions6.1 MotivationIn this chapter, we look at another reformulation of the proximal averageand see how that reformulation can be used to extend the deflnition of thekernel average to n functions. Similar to the previous chapters, we assumef1;¢¢¢ ;fn and g are functions in ¡0(H), and ‚1;¢¢¢ ;‚n are strictly positivereal numbers such thatnPi=1‚i = 1. First we will prove a new reformulationof the proximal average.Theorem 6.1.1 Let f = (f1;¢¢¢fn) with f1;¢¢¢ ;fn 2 ¡0(H), and ‚ =(‚1;¢¢¢ ;‚n) with ‚1;¢¢¢ ;‚n ‚ 0 andnPi=1‚i = 1. Deflnef := P(f;‚) = ¡‚1(f1 + q)⁄ +¢¢¢+‚n(fn + q)⁄¢⁄ ¡q:Then for every x 2H,f(x) = inf‚1y1+¢¢¢‚nyn=x‚1f1(y1)+¢¢¢+‚nfn(yn)+ 12Xi<j‚i‚jkyi ¡yjk2:516.1. MotivationProof. By Proposition 4.1.2(i) and Lemma 4.2.3f(x) = infnPi=1‚iyi=x¡ nXi=1‚ifi(yi)+nXi=1‚iq(yi)¡q(x)¢= infnPi=1‚iyi=x¡ nXi=1‚ifi(yi)+nXi=1‚iq(yi)¡q(nXi=1‚iyi)¢= infP‚iyi=x¡X‚ifi(yi)+14nXi=1nXj=1‚i‚jkyi ¡yjk2¢= infP‚iyi=x¡X‚ifi(yi)+Xi<j‚i‚j12kyi ¡yjk2¢:¥This reformulation of the proximal average suggests a generalizationwhere 12k ¢ k2 is replaced by a function g. We’ll call this generalizationthe kernel average of n functions, deflned byQg(f;‚)(x) := infP‚iyi=x¡X‚ifi(yi)+Xi<j‚i‚jg(yi ¡yj)¢: (6.1)This deflnition is the same as the kernel average when n = 2, but extendsthe kernel average deflnition by allowing more than two functions. We’ll nowexplore the kernel average for n functions in a bit more detail, and from hereon when we refer to the kernel average we mean Qg(f;‚).526.2. The Kernel Average Conjugate6.2 The Kernel Average ConjugateWe will now consider the kernel average asQg(f;‚)(x) = infAy=xfh(y) := ~f(y)+ ~g(y)g = (Ah)(x);whereA(x1;¢¢¢ ;xn) =nXi=1‚ixiAh = inffAy = xgfh(x)g;~f(y) = X‚ifi(yi); and~g(y) =Xi<j‚i‚jg(yi ¡yj):In light of Proposition 3.2.35, we getQ⁄g(x⁄) = (h⁄ –A⁄)(x⁄)= (~f + ~g)⁄ –A⁄¶(x⁄);so we can see that to get Q⁄g we need to compute h⁄ = (~f + ~g)⁄, which byFact 3.2.42 we have h⁄ = ~f⁄⁄~g⁄. It is quite simple to compute ~f⁄, and thiswas done in Lemma 3.2.44, but ~g⁄ is more challenging.To consider ~g⁄ = ¡Pi<j‚i‚jg(yi ¡ yj)¢⁄, we’ll flrst begin with the casewhich gives us the proximal average, where g = q, with the hope that thiswill allow us to flnd a general formula for any g.Proposition 6.2.1 Let x = (x1;¢¢¢ ;xn) and g1(x) =nPi=1nPj=112kxi ¡xjk2 =536.2. The Kernel Average Conjugate2Pi<j12kxi ¡xjk2 theng⁄1(x⁄) =8>><>>:18n2nPi=1nPj=1kx⁄i ¡x⁄jk2; if x⁄1 +¢¢¢x⁄n = 0;+1; otherwise.Proof. Let ‚i = 1n for i = 1¢¢¢n, then g1 can be rewritten asg1(x) =nXi=1nXj=112kxi ¡xjk2 = (2n2)¢ 14nXi=1nXj=1kxi ¡xjk2:Using Lemma 4.2.3,g1(x) = 2n2 nXi=1‚iq(xi)¡q(nXi=1‚ixi)¶= 2n2 ¢g(x);where g(x) is as deflned in Lemma 4.2.4. Then by Proposition 3.2.41(i) andLemma 4.2.4g⁄1(x⁄) = 2n2g⁄( x⁄2n2)=8>><>>:2n2 nPi=1‚i ?q( x⁄i2n2)¶; if x⁄12n2 +¢¢¢+ x⁄n2n2 = 0;+1; otherwise.546.2. The Kernel Average ConjugateNow, ‚i ?q( x⁄i2n2) = 1nq(nx⁄i2n2) = 14n3q(x⁄i) so2n2 nXi=1‚i ?q( x⁄i2n2)¶= 12nnXi=1q(x⁄i)= 12 nXi=11nq(x⁄i)¶= 12 nXi=1‚iq(x⁄i)¶:Applying Lemma 4.2.32n2 nXi=1‚i ?q( x⁄i2n2)¶= 12 q(nXi=1‚ix⁄i)+ 14nXi=1nXj=1‚i‚jkx⁄i ¡x⁄jk2¶:Since x⁄12n2 +¢¢¢+ x⁄n2n2 = 0 , 12n ‚1x⁄1 +¢¢¢+‚nx⁄n¶= 0 ,nPi=1‚ix⁄i = 0, weget2n2 nXi=1‚i ?q( x⁄i2n2)¶= 12 14nXi=1nXj=1‚i‚jkx⁄i ¡x⁄jk2¶= 18n2nXi=1nXj=1kx⁄i ¡x⁄jk2:Altogether,g⁄1(x⁄) =8>><>>:18n2nPi=1nPj=1kx⁄i ¡x⁄jk2; if x⁄1 +¢¢¢+x⁄n = 0;+1; otherwise.¥Proposition 6.2.1 can also be proven directly using critical points. Proof.556.2. The Kernel Average ConjugateBy deflnition,g⁄1(x⁄) = supx1;¢¢¢;xn hx⁄1;x1i+¢¢¢+hx⁄n;xni¡nXi=1nXj=112kxi ¡xjk2¶:Let ^g1(x) = hx⁄1;x1i + ¢¢¢ + hx⁄n;xni ¡nPi=1nPj=112kxi ¡ xjk2. Since @g1@xi =2nPj=1(xi ¡xj), we getr^g1(x) = (x⁄1 ¡2nXl=1(x1 ¡xl);¢¢¢ ;x⁄n ¡2nXl=1(xn ¡xl)):Setting this equal to zero to solve for the critical points, we see(x⁄1;¢¢¢ ;x⁄n) = (2nXl=1(x1 ¡xl);¢¢¢ ;2nXl=1(xn ¡xl));and therefore that x⁄1 +¢¢¢+x⁄n = 0. And we also get(x⁄i ¡x⁄j) = 2nXl=1(xi ¡xl)¡(xj ¡xl) = 2nXl=1(xi ¡xj) = 2n(xi ¡xj);which implies that (xi ¡ xj) = 12n(x⁄i ¡ x⁄j) for all i;j with 1 • i;j • n.Since ^g1 is a sum of linear and concave functions, then ^g1 is concave. Thus,the critical point is a maximum and we can substitute into g⁄1 to flnd thesupremum.566.2. The Kernel Average ConjugateDoing this, we flndg⁄1(x⁄) = hx⁄1;x1i+¢¢¢+hx⁄n¡1;xn¡1i+h¡x⁄1 ¡¢¢¢¡x⁄n¡1;xni¡nXi=1nXj=112¡ 12n¢2kx⁄i ¡x⁄jk2= hx⁄1;x1 ¡xni+¢¢¢+hx⁄n¡1;xn¡1 ¡xni¡ 18n2nXi=1nXj=1kx⁄i ¡x⁄jk2= hx⁄1; 12n(x⁄1 ¡x⁄n)i+¢¢¢+hx⁄n¡1; 12n(x⁄n¡1 ¡x⁄n)i¡ 18n2nXi=1nXj=1kx⁄i ¡x⁄jk2= 12n(hx⁄1;x⁄1i+¢¢¢+hx⁄n¡1;x⁄n¡1i)¡ 12nhx⁄1;x⁄ni¡¢¢¢¡ 12nhx⁄n¡1;x⁄ni¡ 18n2nXi=1nXj=1kx⁄i ¡x⁄jk2= 12n(kx⁄1k2 +¢¢¢+kx⁄n¡1k2 + 12nh¡x⁄1 ¡¢¢¢¡x⁄n¡1;x⁄ni¡ 18n2nXi=1nXj=1kx⁄i ¡x⁄jk2= 12nnXi=1kx⁄ik2 ¡ 18n2nXi=1nXj=1kx⁄i ¡x⁄jk2= 12nnXi=1kx⁄ik2 ¡ 12kx⁄1 +¢¢¢+x⁄nn k2 ¡ 18n2nXi=1nXj=1kx⁄i ¡x⁄jk2:Then using (4.2.3), we get thatg⁄1(x⁄) = 14n2nXi=1nXj=1kx⁄i ¡x⁄jk2 ¡ 18n2nXi=1nXj=1kx⁄i ¡x⁄jk2= 18n2nXi=1nXj=1kx⁄i ¡x⁄jk2when x⁄1 +¢¢¢+x⁄n = 0. If x⁄1 +¢¢¢+x⁄n 6= 0 then set x1 = ¢¢¢ = xn = x and576.2. The Kernel Average Conjugateget thatg⁄1(x⁄) ‚ supx hnXi=1x⁄i;xi¶= +1:¥This formula can be written in several equivalent forms using the factthatnPi=1x⁄i = 0. Using this, we can see thatnXi=1nXj=1kx⁄i ¡x⁄jk2 =nXi=1nXj=1(kx⁄ik2 +kx⁄jk2 ¡2hx⁄i;x⁄ji)=nXi=1 nXj=1(kx⁄ik2 +kx⁄jk2)¡2hx⁄i;nXj=1x⁄ji¶=nXi=1 nXj=1(kx⁄ik2 +kx⁄jk2)¡0¶=nXi=1 nXj=1(kx⁄ik2 +kx⁄jk2)+2hx⁄i;nXj=1x⁄ji¶=nXi=1nXj=1kx⁄i +x⁄jk2:And from the flrst proof of Proposition 6.2.1 we also see thatg⁄1(x⁄) = 18n2nXi=1nXj=1kx⁄i ¡x⁄jk2 = 12nnXi=112kx⁄ik2:So that the following three formulations for the conjugate of g1 are equiva-lent:(1) g⁄1(x⁄) = 18n2nPi=1nPj=1kx⁄i ¡x⁄jk2 =nPi=1nPj=112kx⁄i¡x⁄j2n k2(2) g⁄1(x⁄) = 18n2nPi=1nPj=1kx⁄i +x⁄jk2 =nPi=1nPj=112kx⁄i+x⁄j2n k2586.3. P-norm Kernel Conjugate for General p Case when n = 3(3) g⁄1(x⁄) = 12nnPi=112kxik2when x⁄1 +¢¢¢+x⁄n = 0 and g⁄1(x⁄) = +1 otherwise.The next step we wish to consider is the case where g = 12nPi=1nPj=11pkxi¡xjkp for both general p > 1 and general n > 1. The conjugate is known forgeneral p > 1 and n = 2, so in the next section we begin looking at generalp with n = 3.Example 6.2.2 (General p, n = 2) Let f(x1;x2) = 1pkx1¡x2kp with p >1. Then,f⁄(y1;y2) = supx1;x2 hx1;y1i+hx2;y2i¡ 1pkx1 ¡x2kp¶= supx1;x2 hx1 ¡x2;y1i+hx2;y1 +y2i¡ 1pkx1 ¡x2kp¶:And using Example 3.2.30, with 1p + 1q = 1, we getf⁄(y1;y2) =8>><>>:1qky1kq; if y1 +y2 = 0;+1; otherwise.6.3 P-norm Kernel Conjugate for General p Casewhen n = 3We now wish to consider the case where g2 = 123Pi=13Pj=11pkxi ¡xjkp, that isg2(x1;x2;x3) = 1pkx1 ¡x2kp + 1pkx2 ¡x3kp + 1pkx3 ¡x1kp:596.3. P-norm Kernel Conjugate for General p Case when n = 3Then g⁄2(y1;y2;y3) is equal tosupx1;x2;x3fx1y1 +x2y2 +x3y3 ¡ 1pkx1 ¡x2kp ¡ 1pkx2 ¡x3kp ¡ 1pkx3 ¡x1kpg= supx1;x2fx1y1 +x2y2 ¡ 1pkx1 ¡x2kp +supx3(x3y3 ¡ 1pkx2 ¡x3kp ¡ 1pkx3 ¡x1kp)g(6.2)Recognizing that supx3fx3y3 ¡ 1pkx2 ¡x3kp ¡ 1pkx3 ¡x1kpg = (1pkx1 ¡¢kp +1pkx2 ¡¢kp)⁄(y3), then applying Fact 3.2.42(i)supx3fx3y3¡1pkx2¡x3kp¡1pkx3¡x1kpg = ((1pk¢¡x1kp)⁄⁄(1pk¢¡x2kp)⁄)(y3)(6.3)Using Proposition 3.2.28 and Example 3.2.30 we get(1pkx¡zkp)⁄(y) = hy;zi+ 1qkykq: (6.4)Combining (6.4) and (6.3)((1pk¢¡x1kp)⁄⁄(1pk¢¡x2kp)⁄)(y3) = (1qk¢kq +hx1;¢i)⁄(1qk¢kq +hx2;¢i)= infu+v=y3f1qkukq + 1qkvkq +hx1;ui+hx2;vig:(6.5)Substituting (6.5) back into (6.2) and setting v = y3 ¡u yieldsg⁄2(y1;y2;y3) = supx1;x2infu [hx1;y1i+hx2;y2i¡ 1pkx1 ¡x2kp + 1qkukq + 1qku¡y3kq+hx1;ui+hx2;y3 ¡ui]:606.3. P-norm Kernel Conjugate for General p Case when n = 3Looking at the above equation we can see thatF((x1;x2);u) = hx1;y1i+hx2;y2i¡1pkx1¡x2kp+1qkukq+1qku¡y3kq+hx1;ui+hx2;y3¡uiis concave-convex since F((x1;x2);¢) is a sum of linear and convex functionsand F(¢;u) is a sum of linear and concave functions.Fix x = (x1;x2) 2 H £ H, and let F((x1;x2);u) = Fx(u). UsingFact 3.4.3 we show that a saddle value exists by showing that 0+(epiFx) =f(0;‚) : ‚ ‚ 0g.By deflnition, 0+(epiFx) = epiF1x , so(u;‚) 2 0+(epiFx) , (u;‚) 2 epiF1x, ‚ ‚ F1x (u):Using Proposition 3.2.19 to compute F1x (u) yieldsF1x (u) = lim‚!1 Fx(‚u)¡Fx(0)‚¶= lim‚!1 1qk‚ukq + 1qk‚u¡y3kq +hx1;‚ui+hx2;y3 ¡‚ui¡ 1qky3kq ¡hx2;y3i‚¶= lim‚!1 ‚q 1qkukq +‚q 1qku¡y3‚ kq +‚hx1;ui+‚hx2; y3‚ ¡ui‚¶=8>><>>:+1; if u 6= 0;0; if u = 0:Therefore (u;‚) 2 epiF1x , u = 0. Since there is no common nonzerodirection of recession for Fx, we can use Fact 3.4.3 to swap the positions of616.3. P-norm Kernel Conjugate for General p Case when n = 3the inflmum and supremum, and combining the appropriate inner productterms producesg⁄2(y1;y2;y3) = infu supx1;x2 hx1;y1+ui+hx2;y2+y3¡ui¡1pkx1¡x2kp+1qkukq+1qku¡y3kq¶:(6.6)Considering the inner supremum flrst, we will flx x1 and let x2¡x1 = z.Then (6.6) becomesinfu 1qkukq + 1qku¡y3kq +supx1hx1;y1 +ui+supzhx1 +z;y2 +y3 ¡ui¡ 1pkzkp¶= infu 1qkukq + 1qku¡y3kq +supx1hx1;y1 +y2 +y3i+supzhz;y2 +y3 ¡ui¡ 1pkzkp¶:Here we can see that the supremum on the right is the deflnition of (1pk¢kp)⁄evaluated at y2+y3¡u. Using Example 3.2.30 and the fact that y1+y2+y3 =0 we then getg⁄2(y1;y2;y3) = infu 1qkukq + 1qku¡y3kq + 1qky2 +y3 ¡ukq¶where y1 +y2 +y3 = 0 and 1p + 1q = 1.Since g2 is symmetric under permutation of its variables, we interchangey1 and y3 so that the previous description of g⁄2 turns into the more sym-metric formg⁄2(y1;y2;y3) = infx 1qkx¡y1kq+1qkx¡(y1+y2)kq+1qkx¡(y1+y2+y3)kq¶where y1 +y2 +y3 = 0 and 1p + 1q = 1.626.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3In R, the problem becomesminx2R 1qjx¡y1jq + 1qjx¡(y1 +y2)jq + 1qjxjq¶(6.7)with 1p + 1q = 1 and y1 +y2 +y3 = 0. We will deflneh(x) := 1qjx¡y1jq + 1qjx¡(y1 +y2)jq + 1qjxjq; (6.8)so that we are solving minx2Rh(x). Since the case with p = 2 and q = 2 wasalready solved, we consider this problem with the next simplest case, q = 3which corresponds to p = 32.6.4 P-norm Kernel Conjugate when p = 32, q = 3,and n = 3Considering (6.8) with q = 3 gives the problemminx2Rh(x) (6.9)whereh(x) = 13jx¡y1j3 + 13jx¡(y1 +y2)j3 + 13jxj3; (6.10)and y1;y2 are constants.In order to flnd the optimal value of (6.9) we need to consider eightcases, which cover all possible values of the three absolute values in (6.10).The eight cases are as follows:636.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3(1) x ‚ y1, x ‚ y1 +y2, and x ‚ 0(2) x • y1, x ‚ y1 +y2, and x ‚ 0(3) x ‚ y1, x • y1 +y2, and x ‚ 0(4) x ‚ y1, x ‚ y1 +y2, and x • 0(5) x • y1, x • y1 +y2, and x ‚ 0(6) x • y1, x ‚ y1 +y2, and x • 0(7) x ‚ y1, x • y1 +y2, and x • 0(8) x • y1, x • y1 +y2, and x • 0We now consider each case in depth, and set y = (y1;y2).6.4.1 Case 1: x ‚ y1, x ‚ y1 +y2, x ‚ 0Using the constraints of this case, we deflne the function to be minimized ash1;y(x) = 13(x¡y1)3 + 13(x¡(y1 +y2))3 + 13x3;and we minimize h1;y over the region where maxfy1;y1+y2;0g• x. Becauseh1;y(x) is convex with respect to x, see Example 3.2.31, then by Fact 3.2.23any critical point will be the minimizer. Difierentiating to solve for criticalpoints yields@h1;y@x = (x¡y1)2 +(x¡(y1 +y2))2 +x2:646.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3Setting @h1;y@x = 0 and solving for x, we get the critical pointsx1 = 13y2 + 23y1 + 13q¡2y22 ¡2y1y2 ¡2y21x2 = 13y2 + 23y1 ¡ 13q¡2y22 ¡2y1y2 ¡2y21:To see if the critical points are real, the value of ¡2y22 ¡2y1y2 ¡2y21 mustbe examined and we see that¡2y22 ¡2y1y2 ¡2y21 = ¡(y21 +y22)¡(y21 +2y1y2 +y22)= ¡(y21 +y22)¡(y1 +y2)2 • 0:Since ¡2y22 ¡ 2y1y2 ¡ 2y21 • 0 for all values of y1 and y2 then there areno real critical points of h1;y, so we next check the boundary points, x ‚maxf0;y1;y1 +y2g.If maxf0;y1;y1 + y2g = y1 + y2, i.e. when y1 ‚ 0 and y2 ‚ 0, or wheny2 ‚ 0 and ¡y2 • y1 • 0, then the minimum value of h1y1;y2(x) ish1;y(y1 +y2) = 13y32 + 13(y1 +y2)3:If maxf0;y1;y1 + y2g = y1, or rather when y1 ‚ 0 and y2 • 0, then we geta minimum value ath1;y(y1) = 13jy2j3 + 13y31:If maxf0;y1;y1 +y2g = 0, i.e. if y1 • 0 and y2 • 0, or y2 ‚ 0 and y2 •¡y1,656.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3then the minimum value ish1;y(0) = 13jy1j3 + 13jy1 +y2j3:This case is summarized graphically in Figure 6.1.Figure 6.1: Case 1 summary666.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 36.4.2 Case 2: x • y1, x ‚ y1 +y2, x ‚ 0The conditions for this case give the following function,h2;y(x) = 13(y1 ¡x)3 + 13(x¡(y1 +y2))3 + 13x3;for minimization on the region where maxf0;y1 + y2g • x • y1. Again,difierentiating to flnd the critical points gives us@h2;y@x = ¡(y1 ¡x)2 +(x¡(y1 +y2))2 +x2:And solving @h2;y@x = 0 yields the two critical pointsx1 = y2 +p¡2y1y2x2 = y2 ¡p¡2y1y2:Considering the constraints, we see that y1 ‚ x ‚ 0 and y1+y2 • x • y1,so that y1 ‚ 0 and y2 • 0 is the region of interest. This makes ¡2y1y2 ‚ 0so that the critical points are real. We need x ‚ 0, but x2 • 0 for all y1;y2in this region, and if x2 = 0 then y2 = y1 = 0 and x1 = x2 = 0. Hence, itsu–ces to consider only x1.Next, we check that x1 satisfles the three conditions of this case: x ‚ 0,x • y1, and x ‚ y1 +y2. For x1 = y2 +p¡2y1y2 ‚ 0, it is required that¡y2 = jy2j•p¡2y1y2 , y22 •¡2y1y2 ,jy2j• 2y1 , 12jy2j• y1:676.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3For x1 • y1, this is equivalent toy2 +p¡2y1y2 • y1 ,¡2y1y2 • (y1 ¡y2)2 , 0 • y21 +y22:Therefore this condition is always true. For x1 ‚ y1 +y2, we gety2 +p¡2y1y2 ‚ y1 +y2 ,p¡2y1y2 ‚ y1 , 2jy2j‚ y1:Since h2;y is convex with respect to x, then if these three conditions holdthen x1 is the minimizer. So if 12jy2j • y1 • 2jy2j then the critical point isthe minimizer and we get the minimum value of h2;y(x) ish2;y(x1) = ¡13y1y2(3y1 ¡3y2 ¡4p¡2y1y2)If the conditions for the critical point are not satisfled then we must checkthe boundary conditions. In this case the boundary points are maxfy1 +y2;0g• x • y1, and we consider two subcases:Subcase 1: y1 +y2 ‚ 0To determine which is the minimum, we evaluate the difierence between theupper boundary value and the lower boundary value. When y1 + y2 ‚ 0,the maxfy1 +y2;0g = y1 +y2 so we look at the difierenceh2;y(y1)¡h2;y(y1 +y2) = 13(¡y2)3 + 13y31 ¡ 13(¡y2)3 ¡ 13(y1 +y2)3= 13y31 ¡ 13(y1 +y2)3 = ¡13y32 ¡y1y2(y1 +y2):686.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3Since y1 ‚ 0, y2 • 0 and y1 +y2 ‚ 0 the above difierence is always positive.This means that the minimum value ish2;y(y1 +y2) = 13(y1 +y2)3 ¡ 13y32:Subcase 2: y1 +y2 • 0With y1 +y2 • 0 the maxfy1 +y2;0g = 0 so we calculate the difierenceh2;y(y1)¡h2;y(0) = 13y31 + 13(¡y2)3 ¡ 13y31 ¡ 13(¡y1 ¡y2)3= 13(¡y2)3 ¡ 13(¡y1 ¡y2)3 = 13y31 +y1y2(y1 +y2):Again, because of the signs of y1, y2, and y1 + y2 the difierence is positiveso the minimum ish2;y(0) = 13y31 ¡ 13(y1 +y2)3:Case 2 is summarized graphically in Figure 6.2.6.4.3 Case 3: x ‚ y1, x • y1 +y2, x ‚ 0For this case, the function we are looking to minimize ish3;y = 13(x¡y1)3 + 13(y1 +y2 ¡x)3 + 13x3;over the region where maxfy1;0g• x • y1 +y2. Difierentiating h3;y yields@h3;y@x = (x¡y1)2 ¡(y1 +y2 ¡x)2 +x2:696.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3Figure 6.2: Case 2 summaryAnd setting @h3;y@x = 0 and solving for x gives us the critical pointsx1 = ¡y2 +q2y22 +2y1y2x2 = ¡y2 ¡q2y22 +2y1y2:Looking at the constraints for this case we can see that y1 + y2 ‚ x ‚ y1,so we must have y2 ‚ 0. And y1 + y2 ‚ x ‚ 0 gives us that y1 + y2 ‚ 0.Knowing that y2 ‚ 0, it is obvious that x2 • 0 and when x2 = 0 theny2 = y1 = 0 and x1 = x2 = 0. So this case can be covered by consideringonly x1.Forx1 tobearealcriticalpoint, weneed2y22+2y1y2 ‚ 0 , 2y2(y1+y2) ‚0. Since both y1 ‚ 0 and y1 +y2 ‚ 0 always holds for this case then this is706.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3always true. We also require that the critical point x1 satisfles the conditionsfor this case. For x1 ‚ 0,¡y2 +q2y22 +2y1y2 ‚ 0 , 2y22 +2y1y2 ‚ y22 , 2y1y2 ‚¡y22 , 2y1 ‚¡y2For x1 ‚ y1¡y2+q2y22 +2y1y2 ‚ y1 ,q2y22 +2y1y2 ‚ y1+y2 , 2y22+2y1y2 ‚ y21+2y1y2+y22, y22 ‚ y21 , y2 ‚jy1jAnd for x1 • y1 +y2¡y2 +q2y22 +2y1y2 • y1 +y2 , 2y22 +2y1y2 • y21 +4y1y2 +4y22, 0 • y21 +2y1y2 +2y22 , 0 • (y1 +y2)2 +y22So x1 • y1 +y2 is always true, and x1 ‚ 0 and x1 ‚ y1 hold when 2y1 ‚¡y2 and y2 ‚jy1j, respectively. This means that when both 2y1 ‚¡y2 andy2 ‚jy1j are true, the critical point x1 will produce the minimum,h3;y(x1) = 13y2(y1 +y2)(3y1 +6y2 ¡4p2y2(y1 +y2)):Next, we will need to consider two subcases of y1 ‚ 0 and y1 • 0separately in order to flnd the minimum value for each region where thecritical point does not satisfy the conditions of this case.716.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3Subcase 1: y2 ‚ 0, y1 • 0If y1 • 0 and 2y1 ‚ ¡y2 then the critical point x1 is the minimizer andthe minimum value is as stated above. If 2y1 < ¡y2 then there are nocritical points and we look at the boundary points, which in this subcaseare maxf0;y1g = 0 • x • y1+y2. Taking the difierence of the two potentialminimums yieldsh3;y(y1 +y2)¡h3;y(0) = 13y32 + 13(y1 +y2)3 ¡ 13jy1j3 ¡ 13(y1 +y2)3= 13y32 ¡ 13jy1j3:Since y1 +y2 ‚ 0 , y2 ‚¡y1 , y2 ‚jy1j, then h3;y(0) • h3;y(y1 +y2) andthe lower bound is the minimizer and the minimum value ish3;y(0) = 13(y1 +y2)3 ¡ 13y31:Subcase 2: y2 ‚ 0, y1 ‚ 0Since both y1 and y2 are always positive then 2y1 ‚ ¡y2 always holds.Therefore the critical point is good everywhere within the region wherey2 ‚ y1. When y1 ‚ y2 then the boundary points are maxf0;y1g = y1 •x • y1 +y2. Taking the difierence producesh3;y(y1 +y2)¡h3;y(y1) = 13y32 + 13(y1 +y2)3 ¡ 13y32 ¡ 13y31= 13(y1 +y2)3 ¡ 13y31 ‚ 0:726.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3So the minimum value ish3;y(y1) = 13y32 + 13y31:Case 3 is summarized graphically in Figure 6.3.Figure 6.3: Case 3 summary6.4.4 Case 4: x ‚ y1, x ‚ y1 +y2, x • 0For this case we minimize the functionh4;y(x) = 13(x¡y1)3 + 13(x¡y1 ¡y2)3 ¡ 13x3736.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3over the region where maxfy1;y1 +y2g • x • 0. Solving @h4@x = (x¡y1)2 +(x¡y1 ¡y2)2 ¡x2 = 0 yields the critical pointsx1 = 2y1 +y2 +q2y1y2 +2y21x2 = 2y1 +y2 ¡q2y1y2 +2y21:Theconstraintsforthiscaseleadustothe inequalitiesy1 • 0andy1+y2 • 0,or y2 • ¡y1 = jy1j. In order for the critical points to be real, we require2y1y2 +2y21 ‚ 0 ,jy1j‚ y2 which is always true in this case. Therefore thecritical point will give the minimum value. To satisfy x • 0 and determinewhich critical point is the one we want, we need x1;x2 • 0. Looking flrst atx1:y2 +2y1 +q2y1y2 +2y21 • 0 , (¡y2 ¡2y1)2 ‚ 2y1y2 +2y21, y22 +4y1y1 +4y21 ‚ 2y1y2 +2y21 , y22 +2y1y1 +2y21 ‚ 0, (y1 +y2)2 +y21 ‚ 0:This always holds, so we check the next condition, x1 ‚ y1 +y2:y2 +2y1 +q2y1y2 +2y21 ‚ y1 +y2 , y1 +q2y1y2 +2y21 ‚ 0, 2y1y2 +2y21 ‚ y21 , 2y1y2 +y21 ‚ 0 , y1(2y2 +y1) ‚ 0, 2y2 +y1 • 0 , y2 • 12jy1j:746.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3And the flnal condition, x1 ‚ y1, yieldsy2 +2y1 +q2y1y2 +2y21 ‚ y1 , y1 +y2 +q2y1y2 +2y21 ‚ 0, 2y1y2 +2y21 ‚ (¡y1 ¡y2)2 , 2y1y2 +2y21 ‚ y21 +2y1y2 +y22, y21 ¡y22 ‚ 0 ,jy1j‚jy2j:So the critical point x1 is in the region of interest when y2 • 12jy1j andjy1j‚jy2j both hold.Next looking at x2, if we look at the condition x2 ‚ y1 we see that2y1 +y2 ¡q2y1y2 +2y21 ‚ y1 , (y1 +y2)¡q2y1y2 +2y21 ‚ 0:This only holds if y1 = y2 = 0, in which case x1 = x2 so this point is notin the interior of the region and we do not need to consider it. So wheny2 • 12jy1j and jy1j‚jy2j both hold then the minimum value ish4;y(x1) = ¡13y1(y1 +y2)(6y1 +3y2 +4p2y1(y1 +y2)):Next, we examine the boundary conditions maxfy1;y1+y2g• x • 0 forthe two subcases y2 ‚ 0 and y2 • 0, to determine the minimum when theconstraints needed for the critical point do not hold.756.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3Subcase 1: y2 ‚ 0When y2 ‚ 0 then the maxfy1;y1+y2g = y1+y2, so we look at the difierenceh4;y(0)¡h4;y(y1 +y2) = ¡13y31 ¡ 13(y1 +y2)3 ¡ 13y32 + 13(y1 +y2)3= 13jy1j3 ¡ 13y32:Since jy1j‚ y2 then the above difierence is positive, so the minimum ish4;y(y1 +y2) = 13y32 ¡ 13(y1 +y2)3:Subcase 2: y2 • 0When y2 • 0 then the maxfy1;y1 +y2g = y1, so we look at the difierenceh4;y(0)¡h4;y(y1) = 13jy1j3 + 13jy1 +y2j3 ¡ 13jy2j3 ¡ 13jy1j3= 13jy1 +y2j3 ¡ 13jy2j3:Since jy1 +y2j‚jy2j then the difierence is positive and the minimum ish4;y(y1) = 13jy1j3 + 13jy2j3:Case 4 is summarized graphically in Figure 6.4.766.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3Figure 6.4: Case 4 summary6.4.5 Case 5: x • y1, x • y1 +y2, x ‚ 0For this case we look at minimizingh5;y(x) = 13(y1 ¡x)3 + 13(y1 +y2 ¡x)3 + 13x3;over the region where 0 • x • minfy1;y1 + y2g. Difierentiating h5;y withrespect to x yields@h5;y@x = ¡(y1 ¡x)2 ¡(y1 +y2 ¡x)2 +x2:776.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3Then solving @h5;y@x = 0 gives us the critical pointsx1 = 2y1 +y2 +q2y1y2 +2y21x2 = 2y1 +y2 ¡q2y1y2 +2y21:These critical points are the same as in the previous case, but must be re-examined using the new constraints of this case. We see that y1 ‚ x ‚ 0, soy1 ‚ 0 and similarly y1 +y2 ‚ 0. In order for the critical points to be real,we need 2y1y2 + 2y21 ‚ 0 , y1 + y2 ‚ 0 which always holds for this case.Next, we check where the critical points are valid.When we check x1 • y1, we see that 2y1 + y2 +p2y1y2 +2y21 • y1 ,y1 + y2 +p2y1y2 +2y21 • 0, but since y1 + y2 ‚ 0 and p2y1y2 +2y21 ‚ 0then this does not hold except when y1 = y2 = 0, and it that situationx1 = x2. Therefore, for this case we need only consider x2.Looking at x2, we see for x2 ‚ 0,2y1 +y2 ¡q2y1y2 +2y21 ‚ 0 , (2y1 +y2)2 ‚ 2y1y2 +2y21, 4y21 +4y1y2 +y22 ‚ 2y1y2 +2y21 , 2y21 +2y1y2 +y22 ‚ 0, y21 +(y1 +y2)2 ‚ 0:This always holds, so next we check x2 • y1:2y1 +y2 ¡q2y1y2 +2y21 • y1 , (y1 +y2)2 • 2y1y2 +2y21, y21 +2y1y2 +y22 • 2y1y2 +2y21 , y22 ¡y21 • 0 ,jy1j‚jy2j:786.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3And checking the last condition we see that for x2 • y1 +y2,2y1 +y2 ¡q2y1y2 +2y21 • y1 +y2 , y1 ¡q2y1y2 +2y21 • 0, 2y1y2 +2y21 ‚ y21 , 2y1y2 +y21 ‚ 0, y1(2y2 +y1) ‚ 0 , 2y2 +y1 ‚ 0So the critical point x2 is in the region of interest when both jy1j‚jy2j and2y2 +y1 ‚ 0 hold, and the minimum value ish5;y(x2) = ¡13y1(y1 +y2)(¡6y1 ¡3y2 +4p2y1(y1 +y2)):Now we look at the boundary conditions 0 • x • minfy1;y1 + y2g inthe two subcases y2 ‚ 0 and y2 • 0 to determine the minimum when thecritical point is not in the region of interest. That is, when jy1j < jy2j and2y2 +y1 < 0.Subcase 1: y2 ‚ 0With y2 ‚ 0 the minfy1;y1 + y2g = y1 and the difierence that we need toconsider ish5;y(y1)¡h5;y(0) = 13y32 + 13y31 ¡ 13y31 ¡ 13(y1 +y2)3= 13y32 ¡ 13(y1 +y2)3 = ¡13y31 ¡y1y2(y1 +y2) • 0796.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3So the upper bound is the minimizer and the minimum value ish5;y(y1) = 13y31 + 13y32:Subcase 2: y2 • 0When y2 • 0 then the minfy1;y1+y2g = y1+y2 and we look at the difierenceh5;y(y1 +y2)¡h5;y(0) = 13(¡y2)3 + 13(y1 +y2)3 ¡ 13y31 ¡ 13(y1 +y2)3= 13(¡y2)3 ¡ 13y31Since y2 • 0, y1 ‚ 0, and y1 +y2 ‚ 0 then y1 ‚jy2j and the above equationis always less than or equal to zero, which makes the upper bound, y1 +y2the minimizer with a minimum value ofh5;y(y1 +y2) = 13(y1 +y2)3 ¡ 13y32:Case 5 is summarized graphically in Figure 6.5.6.4.6 Case 6: x • y1, x ‚ y1 +y2, x • 0For this case we have the functionh6;y(x) = 13(y1 ¡x)3 + 13(x¡y1 ¡y2)3 + 13(¡x)3to minimize. Difierentiating with respect to x to get our critical points yields@h6;y@x = ¡(y1 ¡x)2 +(x¡y1 ¡y2)2 ¡x2806.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3Figure 6.5: Case 5 summarySetting this equal to zero and solving for x then gives us the critical pointsx1 = ¡y2 +q2y22 +2y1y2x2 = ¡y2 ¡q2y22 +2y1y2:Looking at the constraints of the case, notice that y1 + y2 • x • y1 andy1 + y2 • x • 0 imply that both y1 + y2 • 0 and y2 • 0. For the criticalpoints to be real we need 2y22 + 2y1y2 ‚ 0 , 2y2(y1 + y2) ‚ 0, which isalways true since y2 and y1 +y2 are both always negative. It is easy to seethat critical point x1 is always positive, so it does not satisfy the conditionsof this case, except when y1 = y2 = 0 which makes x1 = x2. So it su–cesto check only x2 for this case.816.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3Checking the condition x2 ‚ y1 +y2,y1 +y2 •¡y2 ¡q2y22 +2y1y2 ,q2y22 +2y1y2 •¡y1 ¡2y2, 2y22 +2y1y2 • y21 +4y1y2 +4y22, 0 • y21 +2y1y2 +2y22 , 0 • (y1 +y2)2 +y22:This condition always holds. Next, we check x2 • y1,¡y2 ¡q2y22 +2y1y2 • y1 ,¡y1 ¡y2 •q2y22 +2y1y2, y21 +2y1y2 +y22 • 2y22 +2y1y2, y21 • y22 ,jy1j•jy2j:And the last condition is x2 • 0,¡y2 ¡q2y22 +2y1y2 • 0 ,¡y2 •q2y22 +2y1y2, y22 • 2y22 +2y1y2 , 0 • y22 +2y1y2, 0 • y2(y2 +2y1) , y2 +2y1 • 0 , 2y1 •jy2j:So x2 is the minimizer when both jy1j•jy2j and 2y1 •jy2j hold and theminimum value ish6;y(x2) = ¡13y2(y1 +y2)(3y1 +6y2 +4p2y2(y1 +y2)):Outside this region we check the boundary conditions y1 + y2 • x •minf0;y1g to determine the minimizer.826.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3Subcase 1: y1 ‚ 0With y1 ‚ 0 the minfy1;0g = 0 so we look at the difierenceh6;y(0)¡h6;y(y1 +y2) = 13y31 + 13(¡y1 ¡y2)3 ¡ 13(¡y2)3 ¡ 13(¡y1 ¡y2)3= 13y31 ¡ 13(¡y2)3 = 13y31 ¡ 13jy2j3:Since y1 + y2 • 0 we know that jy2j ‚ y1 and the above equation is lessthan or equal to zero. This makes the upper boundary the minimum andthe minimum value ish6;y(0) = 13y31 ¡ 13(y1 +y2)3:Subcase 2: y1 • 0When y1 • 0 the minfy1;0g = y1 and we consider the difierenceh6;y(y1)¡h6;y(y1 +y2) = 13(¡y2)3 + 13(¡y1)3 ¡ 13(¡y2)3 ¡ 13(¡y1 ¡y2)3= 13jy1j3 ¡ 13jy1 +y2j3:Because y1 and y2 are both negative then jy1 +y2j ‚ jy1j and the equationabove is less than or equal to zero, giving us a minimum value ofh6;y(y1) = ¡13y31 ¡ 13y32:Case 6 is summarized graphically in Figure 6.6.836.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3Figure 6.6: Case 6 summary6.4.7 Case 7: x ‚ y1, x • y1 +y2, x • 0In this case the equation to be minimized ish7;y(x) = 13(x¡y1)3 + 13(y1 +y2 ¡x)3 + 13(¡x)3;over the region where y1 • x • minf0;y1 + y2g. Difierentiating h7;y withrespect to x yields@h7;y@x = (x¡y1)2 ¡(y1 +y2 ¡x)2 ¡x2:846.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3Setting @h7;y@x = 0 and solving for x gives us the critical pointsx1 = y2 +p¡2y1y2x2 = y2 ¡p¡2y1y2:Looking at the constraints we see that y1 • x • 0 and y1 • x • y1 +y2so we have y1 • 0 and y2 ‚ 0 for this case. For the critical points to be real,¡2y1y2 ‚ 0 must hold, and because of the signs of y1 and y2 this is alwaystrue. The critical point x1 • 0 so it does not lie in the interior of the regionof interest. And when x1 = 0 then y1 = y2 = 0 and so x1 = x2. Thereforewe need only consider x2, and we check x2 against the constraints. First,x2 ‚ y1,y2 ¡p¡2y1y2 ‚ y1 , (y2 ¡y1)2 ‚¡2y1y2 , y22 +y21 ‚ 0:This will always hold, so next we look at x2 • 0y2 ¡p¡2y1y2 • 0 , y22 •¡2y1y2 , y22 +2y1y2 • 0 , y2(y2 +2y1) • 0, y2 +2y1 • 0 , 2jy1j‚ y2:And flnally at x2 • y1 +y2,y2 ¡p¡2y1y2 • y1 +y2 , y21 •¡2y1y2 ,jy1j• 2y2 , 12jy1j• y2:Then the critical point x2 is the minimizer if 12jy1j • y2 • 2jy1j, and the856.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3minimum value ish7;y = 13y1y2(3y1 ¡3y2 +4p¡2y1y2):Outsidethisregion, welookattheboundaryconditionsy1 • x • minf0;y1+y2g to determine the minimizer.Subcase 1: y1 +y2 ‚ 0When y1 +y2 ‚ 0 the minf0;y1 +y2g = 0 so we look at the difierenceh7;y(0)¡h7;y(y1) = 13(¡y1)3 + 13(y1 +y2)3 ¡ 13y32 ¡ 13(¡y1)3= 13(y1 +y2)3 ¡ 13y32 = 13y31 +y1y2(y1 +y2):Because y1 • 0, y2 ‚ 0, then y1 + y2 • y2 and (y1 + y2)3 • y32 so that theabove equation is always negative and therefore the minimum value ish7;y(0) = 13(y1 +y2)3 ¡ 13y31:Subcase 2: y1 +y2 • 0Here the minf0;y1 +y2g = y1 +y2 so the difierence to consider ish7;y(y1 +y2)¡h7;y(y1) = 13y32 + 13(¡y1 ¡y2)3 ¡ 13y32 ¡ 13(¡y1)3= 13(¡y1 ¡y2)3 ¡ 13(¡y1)3 = ¡13y32 ¡y1y2(y1 +y2)866.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3Again, because of the signs of y1 and y2, we get that (¡y1 ¡ y2) • ¡y1and (¡y1 ¡ y2)3 • (¡y1)3 so that this equation is always negative so theminimum value ish7;y(y1 +y2) = 13y32 ¡ 13(y1 +y2)3:Case 7 is summarized graphically in Figure 6.7.Figure 6.7: Case 7 summary876.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 36.4.8 Case 8: x • y1, x • y1 +y2, x • 0For the flnal case the function we minimize ish8;y(x) = 13(y1 ¡x)3 + 13(y1 +y2 ¡x)3 + 13(¡x)3;over the region where x • minf0;y1;y1 + y2g. Difierentiating h8;y withrespect to x yields@h8;y@x = ¡(y1 ¡x)2 ¡(y1 +y2 ¡x)2 ¡x2:Setting this equal to zero and solving for x gives the critical pointsx1 = 23y1 + 13y2 + 13q¡2y21 ¡2y1y2 ¡2y22x2 = 23y1 + 13y2 ¡ 13q¡2y21 ¡2y1y2 ¡2y22:As we saw in case 1,¡2y21 ¡2y1y2 ¡2y22 = ¡(y21 +y22)¡(y1 +y2)2 • 0so the critical points are only real when y1 = y2 = 0. Thus we are only leftwith the boundary conditions x • minf0;y1;y1+y2g. If minf0;y1;y1+y2g =0 then the minimum value ish8;y(0) = 13y31 + 13(y1 +y2)3:886.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3When the minf0;y1;y1 +y2g = y1 then the minimum value ish8;y(y1) = 13y32 + 13(¡y1)3:And flnally, if minf0;y1;y1 +y2g = y1 +y2, the minimum value ish8;yy1 +y2) = 13(¡y2)3 + 13(¡y1 ¡y2)3:Case 8 is summarized graphically in Figure 6.8.Figure 6.8: Case 8 summary896.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 36.4.9 Combining the Eight CasesNow that each possibility has been examined, each case must be comparedagainst the others to determine the global minimum in each region. Cases2,3, and 4 and Cases 5,6, and 7 can be plotted without any overlap, as seenin Figure 6.9 on page 97. This leaves 12 regions each with four possibleminimizers.Since we can see from equation (6.10) that h(x) is convex with respectto x, the critical point will be the minimum in regions where there is a validcritical point. Combining all of the eight cases, we can see that there isa valid critical point for every possible region. We therefore have only sixregions, as seen in Figure 6.10 on page 98. The regions are divided by thelines y1 ¡ y2 = 0, 2y1 + y2 = 0, and y1 + 2y2 = 0, and each region has acritical point minimizer.The six regions are deflned as follows. Let y = (y1;y2), theny 2 A if ¡ 12y1 • y2 • y1; (6.11)y 2 B if ¡ 12y2 • y1 • y2;y 2 C if ¡2y2 • y1 •¡12y2;y 2 D if y1 • y2 •¡12y1;y 2 E if y2 • y1 •¡12y2; andy 2 F if ¡2y1 • y2 •¡12y1:The minimum for each region is outlined below906.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3(A) When y 2 A, then the minimizer is the critical point x2 from Case 5,and the minimum value ish5;y(x2) = h5;y(2y1 +y2 ¡q2y1y2 +2y21)= ¡13y1(y1 +y2)(¡6y1 ¡3y2 +4p2y1(y1 +y2)):(B) When y 2 B, the minimizer is the critical point x1 from Case 3, andthe minimum value ish3;y(x1) = h3;y(¡y2 +q2y22 +2y1y2)= ¡13y2(y1 +y2)(¡3y1 ¡6y2 +4p2y2(y1 +y2)):(C) When y 2 C, the minimizer is the critical point x2 from Case 7, andthe minimum value ish7;y(x2) = h7;y(y2 ¡p¡2y1y2)= 13y1y2(3y1 ¡3y2 +4p¡2y1y2):(D) When y 2 D, the minimizer is the critical point x1 from Case 4, andthe minimum value ish4;y(x1) = h4;y(2y1 +y2 +q2y1y2 +2y21)= ¡13y1(y1 +y2)(6y1 +3y2 +4p2y1(y1 +y2)):(E) When y 2 E, the minimizer is the critical point x2 from Case 6, and916.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3the minimum value ish6;y(x2) = h6;y(¡y2 ¡q2y22 +2y1y2)= ¡13y2(y1 +y2)(3y1 +6y2 +4p2y2(y1 +y2)):(F) And when y 2 F, the minimizer is the critical point x1 from Case 2,and the minimum value ish2;y(x1) = h2;y(y2 +p¡2y1y2)= 13y1y2(¡3y1 +3y2 +4p¡2y1y2):6.4.10 Bringing It All TogetherIf you recall, the goal was to solveminx2Rh(x) = minx2R 13jx¡y1j3 + 13jx¡(y1 +y2)j3 + 13jxj3;¶:in order to get g⁄2(y1;y2;y3), where y1 +y2 +y3 = 0g2(x1;x2;x3) = 23kx1 ¡x2k32 + 23kx2 ¡x3k32 + 23kx3 ¡x1k32The minimizer of h(x) has been found for each of the six regions, and so926.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3we have found that g2(y1;y2;¡y1 ¡y2) =8>>>>>>>>>>>>>>>>>><>>>>>>>>>>>>>>>>>>:¡13y1(y1 +y2)(¡6y1 ¡3y2 +4p2y1(y1 +y2)); if y 2 A;¡13y2(y1 +y2)(¡3y1 ¡6y2 +4p2y2(y1 +y2)); if y 2 B;13y1y2(3y1 ¡3y2 +4p¡2y1y2); if y 2 C;¡13y1(y1 +y2)(6y1 +3y2 +4p2y1(y1 +y2)); if y 2 D;¡13y2(y1 +y2)(3y1 +6y2 +4p2y2(y1 +y2)); if y 2 E;13y1y2(¡3y1 +3y2 +4p¡2y1y2); if y 2 F;(6.12)where the regions A;¢¢¢ ;F are as deflned in (6.11) on page 90.Examining a plot of the above function and its contour plot, in Fig-ure 6.11 on page 98, we see that g⁄2 is convex which is what we wouldexpect from a conjugate function, even though g⁄2 is not obviously convexupon inspection.Recall from (6.7) on page 63 that we had three conjugate variables,y1;y2; and y3 such that y1 +y2 +y3 = 0.Making the substitution y3 = ¡(y1+y2) allows us to write (6.12) in the936.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3more symmetric formg⁄2(y1;y2;y3) =8>>>>>>>>>>>>>>>>>><>>>>>>>>>>>>>>>>>>:13y1y3(3y3 ¡3y1 +4p¡2y1y3); if (y1;y2) 2 A;13y2y3(3y3 ¡3y2 +4p¡2y2y3); if (y1;y2) 2 B;13y1y2(3y1 ¡3y2 +4p¡2y1y2); if (y1;y2) 2 C;13y1y3(3y1 ¡3y3 +4p¡2y1y3); if (y1;y2) 2 D;13y2y3(3y2 ¡3y3 +4p¡2y2y3); if (y1;y2) 2 E;13y1y2(3y2 ¡3y1 +4p¡2y1y2); if (y1;y2) 2 F;(6.13)where the regions A;¢¢¢ ;F are as deflned in (6.11) on page 90.With the y3 variable added back into the equation, we can then recognizethat the three boundaries y1 ¡y2 = 0, 2y1 + y2 = 0, and y1 + 2y2 = 0 areequivalent to y1 = y2, y1 = y3, and y2 = y3. If we consider region A deflnedby y1 ‚ y2 ‚¡12y1, and look at the difierence,y2 ¡y3 = y2 ¡(¡y1 ¡y2) = y1 +2y2‚ y1 +2(¡12y1) = 0;so minfy1;y2;y3g = y3 and maxfy1;y2;y3g = y1. Thus, we can rewrite usingymax = maxfy1;y2;y3g and ymin = minfy1;y2;y3gg⁄2(y1;y2;y3) = 13y1y3(3y3 ¡3y1 +4p¡2y1y3)= ymaxy2min ¡y2maxymin + 43ymaxyminp¡2ymaxymin;if (y1;y2) 2 A.946.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3Similarly, all of the other regions can be rewritten in the same mannerso that (6.13) can be rewritten using ymax = maxfy1;y2;y3g and ymin =minfy1;y2;y3g asg⁄2(y1;y2;y3) = ymaxy2min ¡y2maxymin + 43ymaxyminp¡2ymaxymin; (6.14)without the need to specify the region.Remark 6.4.1 Although g⁄2 does not look convex, it is because it arose as aconjugate function. Convexity can also be shown with calculus if you proceedas follows.We have three variables such that ymax ‚ y0 ‚ ymin and ymax + y0 +ymin = 0, so y0 = ¡ymax ¡ymin and henceymax ‚¡ymax ¡ymin ‚ ymin:Equivalently, 2ymax + ymin ‚ 0 and ¡2ymin ¡ymax ‚ 0. Now deflne x :=ymax and y := ¡ymin, then both x ‚ 0 and y ‚ 0 and we care about theregion where2x¡y ‚ 0 and 2y ¡x ‚ 0: (6.15)Then we get the functionf(x;y) = xy2 +yx2 ¡ 43xyp2xy;956.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3withr2f(x;y) =0B@ (2pxy¡p2y)ypxy 2ypxy+2xpxy¡3p2xypxy2ypxy+2xpxy¡3p2xypxy(2pxy¡p2x)xpxy1CAIt can be shown that the (1;1) and (2;2) elements of r2f(x;y) are pos-itive using the constraints in (6.15). The determinant of r2f(x;y) can beshown to be positive by assuming x = a2 and y = b2, factoring the resultingequation, and considering the signs of each of the factors. Since the Hessianis positive semideflnite then f is convex.966.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3(a) Case 1 (b) Cases 2, 3, and 4(c) Cases 5, 6, and 7 (d) Case 8Figure 6.9: Overview of the eight cases976.4. P-norm Kernel Conjugate when p = 32, q = 3, and n = 3Figure 6.10: The six regions(a) The function g⁄2(y1;y2;¡y1 ¡y2) (b) The contour plot of g⁄2(y1;y2;¡y1 ¡y2)Figure 6.11: Plots of g⁄2(y1;y2;¡y1 ¡y2)98Chapter 7ConclusionThe kernel average was previously only deflned for two functions. We haveused the identitynXi=1‚iq(yi)¡q(nXi=1‚iyi) = 14nXi=1nXj=1‚i‚jkyi ¡yjk2;and the deflnition of the proximal average to deflne the kernel average for nfunctions,Qg(f;‚)(x) := infP‚iyi=x¡X‚ifi(yi)+Xi<j‚i‚jg(yi ¡yj)¢:We then examined a speciflc case of the kernel average, namely the prox-imal average, and its conjugate and calculated that forg1(x) =nXi=1nXj=112kxi ¡xjk2 = 2Xi<j12kxi ¡xjk2;the conjugate function isg⁄1(x⁄) =8>><>>:18n2nPi=1nPj=1kx⁄i ¡x⁄jk2; if x⁄1 +¢¢¢+x⁄n = 0;+1; otherwise.99Chapter 7. ConclusionThis can also be written using any of the following equivalent formulationswhen x⁄1 +¢¢¢+x⁄n = 0:(i) g⁄1(x⁄) = 18n2nPi=1nPj=1kx⁄i ¡x⁄jk2 =nPi=1nPj=112kx⁄i¡x⁄j2n k2(ii) g⁄1(x⁄) = 18n2nPi=1nPj=1kx⁄i +x⁄jk2 =nPi=1nPj=112kx⁄i+x⁄j2n k2(iii) g⁄1(x⁄) = 12nnPi=112kxik2:Next, we computed the conjugate wheng2(x1;x2;x3) = 23kx1 ¡x2k32 + 23kx2 ¡x3k32 + 23kx3 ¡x1k32;and found thatg⁄2(y1;y2;y3) =8>>>>>>>>>>>>>>>>>><>>>>>>>>>>>>>>>>>>:13y1y3(3y3 ¡3y1 +4p¡2y1y3); if (y1;y2) 2 A;13y2y3(3y3 ¡3y2 +4p¡2y2y3); if (y1;y2) 2 B;13y1y2(3y1 ¡3y2 +4p¡2y1y2); if (y1;y2) 2 C;13y1y3(3y1 ¡3y3 +4p¡2y1y3); if (y1;y2) 2 D;13y2y3(3y2 ¡3y3 +4p¡2y2y3); if (y1;y2) 2 E;13y1y2(3y2 ¡3y1 +4p¡2y1y2); if (y1;y2) 2 F;where the regions A;B;C;D;E; and F are as deflned in (6.11), or equiva-lently using ymax = maxfy1;y2;y3g and ymin = minfy1;y2;y3g,g⁄2(y1;y2;y3) = ymaxy2min ¡y2maxymin + 43ymaxyminp¡2ymaxymin: (7.1)100Chapter 7. ConclusionIt was expected that we might flnd a similar form for g⁄2 as was found forg⁄1, which would help formulate a closed form solution for ~g⁄ in the generalcase where~g(x) =nXi=1nXj=1‚i‚jg(xi ¡xj);for any function g. However, due to the fact that there does not appearto be any correlation between the solutions for g⁄1 and g⁄2, it seems unlikelythat a general solution will be found.101Bibliography[1] H. H. Bauschke, R. Goebel, Y. Lucet and X. Wang, \The proximal aver-age: basic theory," SIAM Journal on Optimization, vol 19(2), pp. 766-785, 2008.[2] H. H. Bauschke, Y. Lucet and M. Trienis, \How to transform one convexfunction continuously into another," SIAM Review, vol 50, pp. 115-132,2008.[3] H. H. Bauschke and X. Wang, \The kernel average for two convex func-tions and its application to the extension and representation of mono-tone operators," Transactions of the American Mathematical Society361, pp. 5947-5965, 2009[4] J. M. Borwein and A. S. Lewis, Convex Analysis and Nonlinear Opti-mization, Springer, 2006.[5] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge Uni-versity Press, 2004.[6] P. L. Combettes and J.-C. Pesquet, \A proximal decomposition methodfor solving convex variational inverse problems," Inverse Problems, vol.24(6), 2008.102[7] K. R. Davidson and A. P. Donsig, Real Analysis with Real Applications,Prentice Hall, 2002.[8] R. Ellaia and J.-B. Hiriart-Urruty, \The conjugate of the difierence offunctions," Journal of Optimization Theory and Applications, vol 49(3),pp.493-498, 1986.[9] R. Goebel, \Self-dual smoothing of convex and saddle functions," Jour-nal of Convex Analysis, vol 15(1), pp.179-190, 2008.[10] E. Kreyszig, Introductory Functional Analysis with Applications, JohnWiley and Sons, 1978.[11] C. D. Meyer, Matrix Analysis and Applied Linear Algebra, SIAM, 2000.[12] R. T. Rockafellar, Convex Analysis, Princeton University Press, 1970.[13] R. T. Rockafellar and R. J-B Wets, Variational Analysis, Springer-Verlag, 2004.[14] C. Z‚alinescu, Convex Analysis in General Vector Spaces, World Scien-tiflc Publishing, 2002.103Appendix AMaple CodeThe following is the Maple code used to verify the calculations for chapter6.>restart : with( plots ):Case 1:> h1 := x¡> (1/3)⁄(x¡y[1])^3+(1/3)⁄(x¡y[1]¡y[2])^3+(1/3)⁄x^3;x¡ > 13(x¡y1)3 + 13(x¡y1 ¡y2)3 + 13x3> dh1 := diff (h1(x) , x); criticalpoints1 := solve (dh1 = 0, x);13y2 +23y1 +13q¡2y22 ¡2y1y2 ¡2y21; 13y2 + 23y1 ¡ 13q¡2y22 ¡2y1y2 ¡2y21Case 2:> h2 := x¡> (1/3)⁄(y[1]¡x)^3+(1/3)⁄(x¡y[1]¡y[2])^3+(1/3)⁄x^3;x¡ > 13(y1 ¡x)3 + 13(x¡y1 ¡y2)3 + 13x3> dh2 := diff (h2(x) , x): criticalpoints2 := solve (dh2 = 0, x);104Appendix A. Maple Codey2 +p¡2y1y2;y2 ¡p¡2y1y2> h2soln1 := factor ( simplify (h2( criticalpoints2 [1])));¡13y1y2(3y1 ¡3y2 ¡4p2p¡y1y2)> h2soln2 := factor ( simplify (h2( criticalpoints2 [2])));¡13y1y2(3y1 ¡3y2 +4p2p¡y1y2)Case 3:> h3 := x¡> (1/3)⁄(x¡y[1])^3+(1/3)⁄(y[1]+y[2]¡x)^3+(1/3)⁄x^3;x¡ > 13(x¡y1)3 + 13(y1 +y2 ¡x)3 + 13x3>dh3 := diff (h3(x) , x): criticalpoints3 := solve (dh3 = 0, x);¡y2 +q2y22 +2y1y2;¡y2 ¡q2y22 +2y1y2>h3soln1 := factor ( simplify (h3( criticalpoints3 [1])));¡13y2(y2 +y1)(¡3y1 ¡6y2 +4p2py2(y2 +y1))>h3soln2 := factor ( simplify (h3( criticalpoints3 [2])));105Appendix A. Maple Code13y2(y2 +y1)(3y1 +6y2 +4p2py2(y2 +y1))Case 4:> h4 := x¡> (1/3)⁄(x¡y[1])^3+(1/3)⁄(x¡y[1]¡y[2])^3¡(1/3)⁄x^3;x¡ > 13(x¡y1)3 + 13(x¡y1 ¡y2)3 + 13x3>dh4 := diff (h4(x) , x): criticalpoints4 := solve (dh4 = 0, x);y2 +2y1 +q2y1y2 +2y21;y2 +2y1 ¡q2y1y2 +2y21>h4soln1 := factor ( simplify (h4( criticalpoints4 [1])));¡13y1(y2 +y1)(3y2 +6y1 +4p2y1(y2 +y1))>h4soln2 := factor ( simplify (h4( criticalpoints4 [2])));13y1(y2 +y1)(¡3y2 ¡6y1 +4p2y1(y2 +y1))Case 5:> h5 := x¡> (1/3)⁄(y[1]¡x)^3+(1/3)⁄(y[1]+y[2]¡x)^3+(1/3)⁄x^3;x¡ > 13(y1 ¡x)3 + 13(y1 +y2 ¡x)3 + 13x3106Appendix A. Maple Code>dh5 := diff (h5(x) , x): criticalpoints5 := solve (dh5 = 0, x);y2 +2y1 +q2y1y2 +2y21;y2 +2y1 ¡q2y1y2 +2y21>h5soln1 := factor ( simplify (h5( criticalpoints5 [1])));13y1(y2 +y1)(3y2 +6y1 +4p2y1(y2 +y1))>h5soln2 := factor ( simplify (h5( criticalpoints5 [2])));¡13y1(y2 +y1)(¡3y2 ¡6y1 +4p2y1(y2 +y1))Case 6:> h6 := x¡> (1/3)⁄(y[1]¡x)^3+(1/3)⁄(x¡y[1]¡y[2])^3¡(1/3)⁄x^3;x¡ > 13(y1 ¡x)3 + 13(x¡y1 ¡y2)3 ¡ 13x3dh6 := diff (h6(x) , x): criticalpoints6 := solve (dh6 = 0, x);¡y2 +q2y22 +2y1y2;¡y2 ¡q2y22 +2y1y2>h6soln1 := factor ( simplify (h6( criticalpoints6 [1])));13y2(y2 +y1)(¡3y1 ¡6y2 +4p2y2(y2 +y1))107Appendix A. Maple Code>h6soln2 := factor ( simplify (h6( criticalpoints6 [2])));¡13y2(y2 +y1)(3y1 +6y2 +4p2y2(y2 +y1))Case 7:> h7 := x¡> (1/3)⁄(x¡y[1])^3+(1/3)⁄(y[1]+y[2]¡x)^3¡(1/3)⁄x^3;x¡ > 13(x¡y1)3 + 13(y1 +y2 ¡x)3 ¡ 13x3>dh7 := diff (h7(x) , x): criticalpoints7 := solve (dh7 = 0, x);y2 +p¡2y1y2;y2 ¡p¡2y1y2>h7soln1 := factor ( simplify (h7( criticalpoints7 [1])));¡13y1y2(3y2 ¡3y1 +4p¡2y1y2)>h7soln2 := factor ( simplify (h7( criticalpoints7 [2])));13y1y2(3y1 ¡3y2 +4p¡2y1y2)Case 8:> h8 := x¡> (1/3)⁄(y[1]¡x)^3+(1/3)⁄(y[1]+y[2]¡x)^3¡(1/3)⁄x^3;108Appendix A. Maple Codex¡ > 13(y1 ¡x)3 + 13(y1 +y2 ¡x)3 ¡ 13x3>dh8 := diff (h8(x) , x): criticalpoints8 := solve (dh8 = 0, x);13y2 +23y1 +13q¡2y22 ¡2y1y2 ¡2y21; 13y2 + 23y1 ¡ 13q¡2y22 ¡2y1y2 ¡2y21Plotting :>restart : with( plots ):> f1 := (y1 ,y2)¡> ¡(1/3)⁄y1⁄(y2+y1)⁄(¡6⁄y1¡3⁄y2+4⁄sqrt (2)⁄sqrt (y1⁄(y2+y1 ))):> f2 := (y1 ,y2)¡> ¡(1/3)⁄y2⁄(y2+y1)⁄(¡3⁄y1¡6⁄y2+4⁄sqrt (2)⁄sqrt (y2⁄(y2+y1 ))):> f3 := (y1 ,y2)¡> (1/3)⁄y1⁄y2⁄(3⁄y1¡3⁄y2+4⁄sqrt (2)⁄ sqrt(¡y1⁄y2 )):> f4 := (y1 ,y2)¡> ¡(1/3)⁄y1⁄(y2+y1)⁄(6⁄y1+3⁄y2+4⁄sqrt (2)⁄sqrt (y1⁄(y2+y1 ))):> f5 := (y1 ,y2)¡> ¡(1/3)⁄y2⁄(y2+y1)⁄(3⁄y1+6⁄y2+4⁄sqrt (2)⁄sqrt (y2⁄(y2+y1 ))):> f6 := (y1 ,y2)¡> (1/3)⁄y1⁄y2⁄(¡3⁄y1+3⁄y2+4⁄sqrt (2)⁄ sqrt(¡y1⁄y2 )):>fpiece := (y1 ,y2)¡> piecewise (0 <= y1 and ¡(1/2)⁄y1 <= y2and y2 <= y1 , f1 (y1 ,y2) , 0 <= y2 and ¡(1/2)⁄y2 <= y1 andy1 <= y2 , f2 (y1 ,y2) , y1 <= 0 and ¡(1/2)⁄y1 <= y2 and y2 <=¡2⁄y1 , f3 (y1 ,y2) , y1 <= 0 and y1 <= y2 and y2 <= ¡(1/2)⁄y1 ,109Appendix A. Maple Codef4 (y1 ,y2) , y2 <= 0 and y2 <= y1 and y1 <= ¡(1/2)⁄y2 ,f5 (y1 ,y2) , 0 <= y1 and ¡2⁄y1 <= y2 and y2 <= ¡(1/2)⁄y1 ,f6 (y1 ,y2 )):>plot3d( fpiece (y1 ,y2) , y1=¡10..10, y2=¡10..10, axes=normal );>plots [ contourplot ]( fpiece (y1 ,y2) , y1=¡10..10, y2=¡10..10,contours=100);110

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0069333/manifest

Comment

Related Items